Merge branch 'master' into spacy.io

2025-08-01 19:00:20 +03:00 · 2019-08-01 18:37:20 +02:00 · 2019-08-01 18:37:20 +02:00 · dcad9a14c5
commit dcad9a14c5
parent d8fcebf386 0f76e0022d
3 changed files with 23 additions and 21 deletions
--- a/website/docs/api/doc.md
+++ b/website/docs/api/doc.md
@@ -651,7 +651,7 @@ The L2 norm of the document's vector representation.
 | `text_with_ws`                          | unicode      | An alias of `Doc.text`, provided for duck-type compatibility with `Span` and `Token`.                                                                                                                                                                                                      |
 | `mem`                                   | `Pool`       | The document's local memory heap, for all C data it owns.                                                                                                                                                                                                                                  |
 | `vocab`                                 | `Vocab`      | The store of lexical types.                                                                                                                                                                                                                                                                |
-| `tensor` <Tag variant="new">2</Tag>     | object       | Container for dense vector representations.                                                                                                                                                                                                                                                |
+| `tensor` <Tag variant="new">2</Tag>     | `ndarray`    | Container for dense vector representations.                                                                                                                                                                                                                                                |
 | `cats` <Tag variant="new">2</Tag>       | dictionary   | Maps either a label to a score for categories applied to whole document, or `(start_char, end_char, label)` to score for categories applied to spans. `start_char` and `end_char` should be character offsets, label can be either a string or an integer ID, and score should be a float. |
 | `user_data`                             | -            | A generic storage area, for user custom data.                                                                                                                                                                                                                                              |
 | `lang` <Tag variant="new">2.1</Tag>     | int          | Language of the document's vocabulary.                                                                                                                                                                                                                                                     |
--- a/website/docs/api/span.md
+++ b/website/docs/api/span.md
@@ -463,22 +463,23 @@ The L2 norm of the span's vector representation.

 ## Attributes {#attributes}

-| Name           | Type         | Description                                                                                                    |
-| -------------- | ------------ | -------------------------------------------------------------------------------------------------------------- |
-| `doc`          | `Doc`        | The parent document.                                                                                           |
-| `sent`         | `Span`       | The sentence span that this span is a part of.                                                                 |
-| `start`        | int          | The token offset for the start of the span.                                                                    |
-| `end`          | int          | The token offset for the end of the span.                                                                      |
-| `start_char`   | int          | The character offset for the start of the span.                                                                |
-| `end_char`     | int          | The character offset for the end of the span.                                                                  |
-| `text`         | unicode      | A unicode representation of the span text.                                                                     |
-| `text_with_ws` | unicode      | The text content of the span with a trailing whitespace character if the last token has one.                   |
-| `orth`         | int          | ID of the verbatim text content.                                                                               |
-| `orth_`        | unicode      | Verbatim text content (identical to `Span.text`). Exists mostly for consistency with the other attributes.     |
-| `label`        | int          | The span's label.                                                                                              |
-| `label_`       | unicode      | The span's label.                                                                                              |
-| `lemma_`       | unicode      | The span's lemma.                                                                                              |
-| `ent_id`       | int          | The hash value of the named entity the token is an instance of.                                                |
-| `ent_id_`      | unicode      | The string ID of the named entity the token is an instance of.                                                 |
-| `sentiment`    | float        | A scalar value indicating the positivity or negativity of the span.                                            |
-| `_`            | `Underscore` | User space for adding custom [attribute extensions](/usage/processing-pipelines#custom-components-attributes). |
+| Name                                    | Type         | Description                                                                                                    |
+| --------------------------------------- | ------------ | -------------------------------------------------------------------------------------------------------------- |
+| `doc`                                   | `Doc`        | The parent document.                                                                                           |
+| `tensor` <Tag variant="new">2.1.7</Tag> | `ndarray`    | The span's slice of the parent `Doc`'s tensor.                                                                 |
+| `sent`                                  | `Span`       | The sentence span that this span is a part of.                                                                 |
+| `start`                                 | int          | The token offset for the start of the span.                                                                    |
+| `end`                                   | int          | The token offset for the end of the span.                                                                      |
+| `start_char`                            | int          | The character offset for the start of the span.                                                                |
+| `end_char`                              | int          | The character offset for the end of the span.                                                                  |
+| `text`                                  | unicode      | A unicode representation of the span text.                                                                     |
+| `text_with_ws`                          | unicode      | The text content of the span with a trailing whitespace character if the last token has one.                   |
+| `orth`                                  | int          | ID of the verbatim text content.                                                                               |
+| `orth_`                                 | unicode      | Verbatim text content (identical to `Span.text`). Exists mostly for consistency with the other attributes.     |
+| `label`                                 | int          | The span's label.                                                                                              |
+| `label_`                                | unicode      | The span's label.                                                                                              |
+| `lemma_`                                | unicode      | The span's lemma.                                                                                              |
+| `ent_id`                                | int          | The hash value of the named entity the token is an instance of.                                                |
+| `ent_id_`                               | unicode      | The string ID of the named entity the token is an instance of.                                                 |
+| `sentiment`                             | float        | A scalar value indicating the positivity or negativity of the span.                                            |
+| `_`                                     | `Underscore` | User space for adding custom [attribute extensions](/usage/processing-pipelines#custom-components-attributes). |
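
The new `tensor` rows describe `Span.tensor` and `Token.tensor` as slices of the parent `Doc`'s tensor. A minimal sketch of that relationship follows, assuming the tensor holds one row per token, that a span covers rows `start` up to but not including `end`, and that a token maps to the single row at its index. The diff itself only says "slice", so the exact indexing is an assumption. Nested lists stand in for the `ndarray` so the sketch runs without spaCy or NumPy, and `span_tensor` and `token_tensor` are hypothetical helpers, not spaCy functions.

```python
# Stand-in for Doc.tensor: one row per token, each row a small feature vector.
# Real spaCy tensors are NumPy ndarrays; nested lists are used here only so
# the sketch runs without spaCy or NumPy installed.
doc_tensor = [
    [0.1, 0.2, 0.3],  # token 0
    [0.4, 0.5, 0.6],  # token 1
    [0.7, 0.8, 0.9],  # token 2
    [1.0, 1.1, 1.2],  # token 3
]


def span_tensor(tensor, start, end):
    """Hypothetical helper: rows for tokens start..end-1 (end is exclusive)."""
    return tensor[start:end]


def token_tensor(tensor, i):
    """Hypothetical helper: the single row for the token at index i."""
    return tensor[i]


# A span over tokens 1 and 2 gets two rows; token 2 gets one row.
span_rows = span_tensor(doc_tensor, 1, 3)
token_row = token_tensor(doc_tensor, 2)
print(len(span_rows), len(token_row))
```

Under these assumptions, the last row of the span's slice and the row for token 2 are the same data, which is the sense in which both attributes are views onto the document-level tensor rather than separately computed values.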
--- a/website/docs/api/token.md
+++ b/website/docs/api/token.md
@@ -417,6 +417,7 @@ The L2 norm of the token's vector representation.
 | `orth`                                       | int          | ID of the verbatim text content.                                                                                                                                                                                              |
 | `orth_`                                      | unicode      | Verbatim text content (identical to `Token.text`). Exists mostly for consistency with the other attributes.                                                                                                                   |
 | `vocab`                                      | `Vocab`      | The vocab object of the parent `Doc`.                                                                                                                                                                                         |
+| `tensor` <Tag variant="new">2.1.7</Tag>      | `ndarray`    | The token's slice of the parent `Doc`'s tensor.                                                                                                                                                                               |
 | `head`                                       | `Token`      | The syntactic parent, or "governor", of this token.                                                                                                                                                                           |
 | `left_edge`                                  | `Token`      | The leftmost token of this token's syntactic descendants.                                                                                                                                                                     |
 | `right_edge`                                 | `Token`      | The rightmost token of this token's syntactic descendants.                                                                                                                                                                    |
@@ -424,7 +425,7 @@ The L2 norm of the token's vector representation.
 | `ent_type`                                   | int          | Named entity type.                                                                                                                                                                                                            |
 | `ent_type_`                                  | unicode      | Named entity type.                                                                                                                                                                                                            |
 | `ent_iob`                                    | int          | IOB code of named entity tag. `3` means the token begins an entity, `2` means it is outside an entity, `1` means it is inside an entity, and `0` means no entity tag is set.                                                  |  |
-| `ent_iob_`                                   | unicode      | IOB code of named entity tag. "B" means the token begins an entity, "I" means it is inside an entity, "O" means it is outside an entity, and "" means no entity tag is set.                                                  |
+| `ent_iob_`                                   | unicode      | IOB code of named entity tag. "B" means the token begins an entity, "I" means it is inside an entity, "O" means it is outside an entity, and "" means no entity tag is set.                                                   |
 | `ent_id`                                     | int          | ID of the entity the token is an instance of, if any. Currently not used, but potentially for coreference resolution.                                                                                                         |
 | `ent_id_`                                    | unicode      | ID of the entity the token is an instance of, if any. Currently not used, but potentially for coreference resolution.                                                                                                         |
 | `lemma`                                      | int          | Base form of the token, with no inflectional suffixes.                                                                                                                                                                        |