From 0f76e0022d7907f28f0ec82fb1ca7856322c64e5 Mon Sep 17 00:00:00 2001
From: Ines Montani
Date: Thu, 1 Aug 2019 18:37:09 +0200
Subject: [PATCH] Update .tensor docs [ci skip]

---
 website/docs/api/doc.md   |  2 +-
 website/docs/api/span.md  | 39 ++++++++++++++++++++-------------------
 website/docs/api/token.md |  3 ++-
 3 files changed, 23 insertions(+), 21 deletions(-)

diff --git a/website/docs/api/doc.md b/website/docs/api/doc.md
index b1306ef91..431d3a092 100644
--- a/website/docs/api/doc.md
+++ b/website/docs/api/doc.md
@@ -651,7 +651,7 @@ The L2 norm of the document's vector representation.
 | `text_with_ws` | unicode | An alias of `Doc.text`, provided for duck-type compatibility with `Span` and `Token`. |
 | `mem` | `Pool` | The document's local memory heap, for all C data it owns. |
 | `vocab` | `Vocab` | The store of lexical types. |
-| `tensor` 2 | object | Container for dense vector representations. |
+| `tensor` 2 | `ndarray` | Container for dense vector representations. |
 | `cats` 2 | dictionary | Maps either a label to a score for categories applied to whole document, or `(start_char, end_char, label)` to score for categories applied to spans. `start_char` and `end_char` should be character offsets, label can be either a string or an integer ID, and score should be a float. |
 | `user_data` | - | A generic storage area, for user custom data. |
 | `lang` 2.1 | int | Language of the document's vocabulary. |
diff --git a/website/docs/api/span.md b/website/docs/api/span.md
index 7187a32a3..524ec412d 100644
--- a/website/docs/api/span.md
+++ b/website/docs/api/span.md
@@ -463,22 +463,23 @@ The L2 norm of the span's vector representation.
 
 ## Attributes {#attributes}
 
-| Name | Type | Description |
-| -------------- | ------------ | -------------------------------------------------------------------------------------------------------------- |
-| `doc` | `Doc` | The parent document. |
-| `sent` | `Span` | The sentence span that this span is a part of. |
-| `start` | int | The token offset for the start of the span. |
-| `end` | int | The token offset for the end of the span. |
-| `start_char` | int | The character offset for the start of the span. |
-| `end_char` | int | The character offset for the end of the span. |
-| `text` | unicode | A unicode representation of the span text. |
-| `text_with_ws` | unicode | The text content of the span with a trailing whitespace character if the last token has one. |
-| `orth` | int | ID of the verbatim text content. |
-| `orth_` | unicode | Verbatim text content (identical to `Span.text`). Exists mostly for consistency with the other attributes. |
-| `label` | int | The span's label. |
-| `label_` | unicode | The span's label. |
-| `lemma_` | unicode | The span's lemma. |
-| `ent_id` | int | The hash value of the named entity the token is an instance of. |
-| `ent_id_` | unicode | The string ID of the named entity the token is an instance of. |
-| `sentiment` | float | A scalar value indicating the positivity or negativity of the span. |
-| `_` | `Underscore` | User space for adding custom [attribute extensions](/usage/processing-pipelines#custom-components-attributes). |
+| Name | Type | Description |
+| --------------------------------------- | ------------ | -------------------------------------------------------------------------------------------------------------- |
+| `doc` | `Doc` | The parent document. |
+| `tensor` 2.1.7 | `ndarray` | The span's slice of the parent `Doc`'s tensor. |
+| `sent` | `Span` | The sentence span that this span is a part of. |
+| `start` | int | The token offset for the start of the span. |
+| `end` | int | The token offset for the end of the span. |
+| `start_char` | int | The character offset for the start of the span. |
+| `end_char` | int | The character offset for the end of the span. |
+| `text` | unicode | A unicode representation of the span text. |
+| `text_with_ws` | unicode | The text content of the span with a trailing whitespace character if the last token has one. |
+| `orth` | int | ID of the verbatim text content. |
+| `orth_` | unicode | Verbatim text content (identical to `Span.text`). Exists mostly for consistency with the other attributes. |
+| `label` | int | The span's label. |
+| `label_` | unicode | The span's label. |
+| `lemma_` | unicode | The span's lemma. |
+| `ent_id` | int | The hash value of the named entity the token is an instance of. |
+| `ent_id_` | unicode | The string ID of the named entity the token is an instance of. |
+| `sentiment` | float | A scalar value indicating the positivity or negativity of the span. |
+| `_` | `Underscore` | User space for adding custom [attribute extensions](/usage/processing-pipelines#custom-components-attributes). |
diff --git a/website/docs/api/token.md b/website/docs/api/token.md
index 592a9cca5..78e7513c3 100644
--- a/website/docs/api/token.md
+++ b/website/docs/api/token.md
@@ -417,6 +417,7 @@ The L2 norm of the token's vector representation.
 | `orth` | int | ID of the verbatim text content. |
 | `orth_` | unicode | Verbatim text content (identical to `Token.text`). Exists mostly for consistency with the other attributes. |
 | `vocab` | `Vocab` | The vocab object of the parent `Doc`. |
+| `tensor` 2.1.7 | `ndarray` | The token's slice of the parent `Doc`'s tensor. |
 | `head` | `Token` | The syntactic parent, or "governor", of this token. |
 | `left_edge` | `Token` | The leftmost token of this token's syntactic descendants. |
 | `right_edge` | `Token` | The rightmost token of this token's syntactic descendants. |
@@ -424,7 +425,7 @@ The L2 norm of the token's vector representation.
 | `ent_type` | int | Named entity type. |
 | `ent_type_` | unicode | Named entity type. |
 | `ent_iob` | int | IOB code of named entity tag. `3` means the token begins an entity, `2` means it is outside an entity, `1` means it is inside an entity, and `0` means no entity tag is set. | |
-| `ent_iob_` | unicode | IOB code of named entity tag. "B" means the token begins an entity, "I" means it is inside an entity, "O" means it is outside an entity, and "" means no entity tag is set. |
+| `ent_iob_` | unicode | IOB code of named entity tag. "B" means the token begins an entity, "I" means it is inside an entity, "O" means it is outside an entity, and "" means no entity tag is set. |
 | `ent_id` | int | ID of the entity the token is an instance of, if any. Currently not used, but potentially for coreference resolution. |
 | `ent_id_` | unicode | ID of the entity the token is an instance of, if any. Currently not used, but potentially for coreference resolution. |
 | `lemma` | int | Base form of the token, with no inflectional suffixes. |