Merge branch 'master' into spacy.io

This commit is contained in:
Ines Montani 2019-08-01 18:37:20 +02:00
commit dcad9a14c5
3 changed files with 23 additions and 21 deletions


@ -651,7 +651,7 @@ The L2 norm of the document's vector representation.
| `text_with_ws` | unicode | An alias of `Doc.text`, provided for duck-type compatibility with `Span` and `Token`. | | `text_with_ws` | unicode | An alias of `Doc.text`, provided for duck-type compatibility with `Span` and `Token`. |
| `mem` | `Pool` | The document's local memory heap, for all C data it owns. | | `mem` | `Pool` | The document's local memory heap, for all C data it owns. |
| `vocab` | `Vocab` | The store of lexical types. | | `vocab` | `Vocab` | The store of lexical types. |
| `tensor` <Tag variant="new">2</Tag> | object | Container for dense vector representations. | | `tensor` <Tag variant="new">2</Tag> | `ndarray` | Container for dense vector representations. |
| `cats` <Tag variant="new">2</Tag> | dictionary | Maps either a label to a score for categories applied to whole document, or `(start_char, end_char, label)` to score for categories applied to spans. `start_char` and `end_char` should be character offsets, label can be either a string or an integer ID, and score should be a float. | | `cats` <Tag variant="new">2</Tag> | dictionary | Maps either a label to a score for categories applied to whole document, or `(start_char, end_char, label)` to score for categories applied to spans. `start_char` and `end_char` should be character offsets, label can be either a string or an integer ID, and score should be a float. |
| `user_data` | - | A generic storage area, for user custom data. | | `user_data` | - | A generic storage area, for user custom data. |
| `lang` <Tag variant="new">2.1</Tag> | int | Language of the document's vocabulary. | | `lang` <Tag variant="new">2.1</Tag> | int | Language of the document's vocabulary. |


@ -463,22 +463,23 @@ The L2 norm of the span's vector representation.
## Attributes {#attributes} ## Attributes {#attributes}
| Name | Type | Description | | Name | Type | Description |
| -------------- | ------------ | -------------------------------------------------------------------------------------------------------------- | | --------------------------------------- | ------------ | -------------------------------------------------------------------------------------------------------------- |
| `doc` | `Doc` | The parent document. | | `doc` | `Doc` | The parent document. |
| `sent` | `Span` | The sentence span that this span is a part of. | | `tensor` <Tag variant="new">2.1.7</Tag> | `ndarray` | The span's slice of the parent `Doc`'s tensor. |
| `start` | int | The token offset for the start of the span. | | `sent` | `Span` | The sentence span that this span is a part of. |
| `end` | int | The token offset for the end of the span. | | `start` | int | The token offset for the start of the span. |
| `start_char` | int | The character offset for the start of the span. | | `end` | int | The token offset for the end of the span. |
| `end_char` | int | The character offset for the end of the span. | | `start_char` | int | The character offset for the start of the span. |
| `text` | unicode | A unicode representation of the span text. | | `end_char` | int | The character offset for the end of the span. |
| `text_with_ws` | unicode | The text content of the span with a trailing whitespace character if the last token has one. | | `text` | unicode | A unicode representation of the span text. |
| `orth` | int | ID of the verbatim text content. | | `text_with_ws` | unicode | The text content of the span with a trailing whitespace character if the last token has one. |
| `orth_` | unicode | Verbatim text content (identical to `Span.text`). Exists mostly for consistency with the other attributes. | | `orth` | int | ID of the verbatim text content. |
| `label` | int | The span's label. | | `orth_` | unicode | Verbatim text content (identical to `Span.text`). Exists mostly for consistency with the other attributes. |
| `label_` | unicode | The span's label. | | `label` | int | The span's label. |
| `lemma_` | unicode | The span's lemma. | | `label_` | unicode | The span's label. |
| `ent_id` | int | The hash value of the named entity the token is an instance of. | | `lemma_` | unicode | The span's lemma. |
| `ent_id_` | unicode | The string ID of the named entity the token is an instance of. | | `ent_id` | int | The hash value of the named entity the token is an instance of. |
| `sentiment` | float | A scalar value indicating the positivity or negativity of the span. | | `ent_id_` | unicode | The string ID of the named entity the token is an instance of. |
| `_` | `Underscore` | User space for adding custom [attribute extensions](/usage/processing-pipelines#custom-components-attributes). | | `sentiment` | float | A scalar value indicating the positivity or negativity of the span. |
| `_` | `Underscore` | User space for adding custom [attribute extensions](/usage/processing-pipelines#custom-components-attributes). |
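
The `tensor` rows added for `Span` and `Token` in this commit describe slices of the parent `Doc`'s tensor. The sketch below illustrates that relationship only. It assumes the tensor holds one row per token and uses a plain list of lists as a stand-in for the `ndarray`. The helper names `span_tensor` and `token_tensor` are hypothetical and are not part of spaCy.

```python
# Stand-in for Doc.tensor: one row of floats per token.
# (Assumption: a real Doc.tensor is an ndarray with one row per token.)
doc_tensor = [
    [0.1, 0.2],  # token 0
    [0.3, 0.4],  # token 1
    [0.5, 0.6],  # token 2
    [0.7, 0.8],  # token 3
]


def span_tensor(tensor, start, end):
    """Hypothetical helper: rows for the tokens in [start, end), i.e. a span's token offsets."""
    return tensor[start:end]


def token_tensor(tensor, i):
    """Hypothetical helper: the single row belonging to the token at index i."""
    return tensor[i]


span_rows = span_tensor(doc_tensor, 1, 3)  # rows for tokens 1 and 2
token_row = token_tensor(doc_tensor, 2)    # row for token 2
```

With a real `ndarray`, the same slicing would typically return a view rather than a copy, unlike the list used here.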


@ -417,6 +417,7 @@ The L2 norm of the token's vector representation.
| `orth` | int | ID of the verbatim text content. | | `orth` | int | ID of the verbatim text content. |
| `orth_` | unicode | Verbatim text content (identical to `Token.text`). Exists mostly for consistency with the other attributes. | | `orth_` | unicode | Verbatim text content (identical to `Token.text`). Exists mostly for consistency with the other attributes. |
| `vocab` | `Vocab` | The vocab object of the parent `Doc`. | | `vocab` | `Vocab` | The vocab object of the parent `Doc`. |
| `tensor` <Tag variant="new">2.1.7</Tag> | `ndarray` | The token's slice of the parent `Doc`'s tensor. |
| `head` | `Token` | The syntactic parent, or "governor", of this token. | | `head` | `Token` | The syntactic parent, or "governor", of this token. |
| `left_edge` | `Token` | The leftmost token of this token's syntactic descendants. | | `left_edge` | `Token` | The leftmost token of this token's syntactic descendants. |
| `right_edge` | `Token` | The rightmost token of this token's syntactic descendants. | | `right_edge` | `Token` | The rightmost token of this token's syntactic descendants. |
@ -424,7 +425,7 @@ The L2 norm of the token's vector representation.
| `ent_type` | int | Named entity type. | | `ent_type` | int | Named entity type. |
| `ent_type_` | unicode | Named entity type. | | `ent_type_` | unicode | Named entity type. |
| `ent_iob` | int | IOB code of named entity tag. `3` means the token begins an entity, `2` means it is outside an entity, `1` means it is inside an entity, and `0` means no entity tag is set. | | | `ent_iob` | int | IOB code of named entity tag. `3` means the token begins an entity, `2` means it is outside an entity, `1` means it is inside an entity, and `0` means no entity tag is set. | |
| `ent_iob_` | unicode | IOB code of named entity tag. "B" means the token begins an entity, "I" means it is inside an entity, "O" means it is outside an entity, and "" means no entity tag is set. | | `ent_iob_` | unicode | IOB code of named entity tag. "B" means the token begins an entity, "I" means it is inside an entity, "O" means it is outside an entity, and "" means no entity tag is set. |
| `ent_id` | int | ID of the entity the token is an instance of, if any. Currently not used, but potentially for coreference resolution. | | `ent_id` | int | ID of the entity the token is an instance of, if any. Currently not used, but potentially for coreference resolution. |
| `ent_id_` | unicode | ID of the entity the token is an instance of, if any. Currently not used, but potentially for coreference resolution. | | `ent_id_` | unicode | ID of the entity the token is an instance of, if any. Currently not used, but potentially for coreference resolution. |
| `lemma` | int | Base form of the token, with no inflectional suffixes. | | `lemma` | int | Base form of the token, with no inflectional suffixes. |
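
The `ent_iob` and `ent_iob_` rows in this hunk describe the same four states, once as integer codes and once as strings. A minimal sketch of that correspondence, taken directly from the two descriptions, follows. The names `IOB_CODES` and `iob_to_str` are hypothetical and only illustrate the mapping.

```python
# Integer IOB codes and their string forms, as given in the
# `ent_iob` / `ent_iob_` descriptions. Names here are illustrative only.
IOB_CODES = {
    0: "",   # no entity tag is set
    1: "I",  # token is inside an entity
    2: "O",  # token is outside an entity
    3: "B",  # token begins an entity
}


def iob_to_str(code):
    """Hypothetical helper: translate an integer IOB code to its string form."""
    return IOB_CODES[code]


labels = [iob_to_str(code) for code in (3, 1, 2, 0)]
```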