Remove references to textcat spans (#5360)

Remove references to unimplemented `TextCategorizer` span labels in `GoldParse` and `Doc`.
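
As a rough illustration of the document-level format described by the updated docs, the sketch below checks that a gold-standard `cats` dictionary uses plain string labels with values `1.0` or `0.0`, and rejects the span-style `(start_char, end_char, label)` keys that are no longer documented. `validate_cats` is a hypothetical helper written for this example and is not part of spaCy.

```python
from typing import Dict, Hashable


def validate_cats(cats: Dict[Hashable, float]) -> Dict[str, float]:
    """Hypothetical helper (not a spaCy API): validate document-level gold cats."""
    for label, value in cats.items():
        # Span-style tuple keys were never implemented, so only strings are accepted.
        if not isinstance(label, str):
            raise TypeError(f"cats keys must be string labels, got {label!r}")
        # Gold values are 1.0 (positive) or 0.0 (negative); missing labels are simply omitted.
        if value not in (0.0, 1.0):
            raise ValueError(f"cats value for {label!r} must be 1.0 or 0.0, got {value!r}")
    return dict(cats)


# Document-level labels: accepted.
checked = validate_cats({"POSITIVE": 1.0, "NEGATIVE": 0.0})

# Span-style key: rejected.
try:
    validate_cats({(0, 5, "POSITIVE"): 1.0})
    span_error = None
except TypeError as err:
    span_error = str(err)
```

Labels that should be treated as missing are left out of the dictionary rather than given a placeholder value.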
2026-02-14 11:10:40 +03:00 · 2020-04-27 18:01:12 +02:00
commit 792aa7b6ab
parent f8ac5b9f56
2 changed files with 8 additions and 10 deletions
--- a/website/docs/api/doc.md
+++ b/website/docs/api/doc.md
@@ -653,7 +653,7 @@ The L2 norm of the document's vector representation.
 | `mem`                                   | `Pool`       | The document's local memory heap, for all C data it owns.                                                                                                                                                                                                                                  |
 | `vocab`                                 | `Vocab`      | The store of lexical types.                                                                                                                                                                                                                                                                |
 | `tensor` <Tag variant="new">2</Tag>     | `ndarray`    | Container for dense vector representations.                                                                                                                                                                                                                                                |
-| `cats` <Tag variant="new">2</Tag>       | dictionary   | Maps either a label to a score for categories applied to whole document, or `(start_char, end_char, label)` to score for categories applied to spans. `start_char` and `end_char` should be character offsets, label can be either a string or an integer ID, and score should be a float. |
+| `cats` <Tag variant="new">2</Tag>       | dict         | Maps a label to a score for categories applied to the document. The label is a string and the score should be a float.                                                                                     |
 | `user_data`                             | -            | A generic storage area, for user custom data.                                                                                                                                                                                                                                              |
 | `lang` <Tag variant="new">2.1</Tag>     | int          | Language of the document's vocabulary.                                                                                                                                                                                                                                                     |
 | `lang_` <Tag variant="new">2.1</Tag>    | unicode      | Language of the document's vocabulary.                                                                                                                                                                                                                                                     |
--- a/website/docs/api/goldparse.md
+++ b/website/docs/api/goldparse.md
@@ -7,12 +7,10 @@ source: spacy/gold.pyx

 ## GoldParse.\_\_init\_\_ {#init tag="method"}

-Create a `GoldParse`. Unlike annotations in `entities`, label annotations in
-`cats` can overlap, i.e. a single word can be covered by multiple labelled
-spans. The [`TextCategorizer`](/api/textcategorizer) component expects true
-examples of a label to have the value `1.0`, and negative examples of a label to
-have the value `0.0`. Labels not in the dictionary are treated as missing – the
-gradient for those labels will be zero.
+Create a `GoldParse`. The [`TextCategorizer`](/api/textcategorizer) component
+expects true examples of a label to have the value `1.0`, and negative examples
+of a label to have the value `0.0`. Labels not in the dictionary are treated as
+missing – the gradient for those labels will be zero.

 | Name        | Type        | Description                                                                                                                                                                                                                            |
 | ----------- | ----------- | -------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------- |
@@ -22,8 +20,8 @@ gradient for those labels will be zero.
 | `heads`     | iterable    | A sequence of integers, representing syntactic head offsets.                                                                                                                                                                           |
 | `deps`      | iterable    | A sequence of strings, representing the syntactic relation types.                                                                                                                                                                      |
 | `entities`  | iterable    | A sequence of named entity annotations, either as BILUO tag strings, or as `(start_char, end_char, label)` tuples, representing the entity positions. If BILUO tag strings, you can specify missing values by setting the tag to None. |
-| `cats`      | dict        | Labels for text classification. Each key in the dictionary may be a string or an int, or a `(start_char, end_char, label)` tuple, indicating that the label is applied to only part of the document (usually a sentence).              |
-| `links`     | dict        | Labels for entity linking. A dict with `(start_char, end_char)` keys, and the values being dicts with `kb_id:value` entries, representing external KB IDs mapped to either 1.0 (positive) or 0.0 (negative).                           |
+| `cats`      | dict        | Labels for text classification. Each key in the dictionary is a string label for the category and each value is `1.0` (positive) or `0.0` (negative).                                                                                  |
+| `links`     | dict        | Labels for entity linking. A dict with `(start_char, end_char)` keys, and the values being dicts with `kb_id:value` entries, representing external KB IDs mapped to either `1.0` (positive) or `0.0` (negative).                       |
 | **RETURNS** | `GoldParse` | The newly constructed object.                                                                                                                                                                                                          |

 ## GoldParse.\_\_len\_\_ {#len tag="method"}
@@ -53,7 +51,7 @@ Whether the provided syntactic annotations form a projective dependency tree.
 | `ner`                                | list | The named entity annotations as BILUO tags.                                                                                                              |
 | `cand_to_gold`                       | list | The alignment from candidate tokenization to gold tokenization.                                                                                          |
 | `gold_to_cand`                       | list | The alignment from gold tokenization to candidate tokenization.                                                                                          |
-| `cats` <Tag variant="new">2</Tag>    | list | Entries in the list should be either a label, or a `(start, end, label)` triple. The tuple form is used for categories applied to spans of the document. |
+| `cats` <Tag variant="new">2</Tag>    | dict | Keys in the dictionary are string category labels with values `1.0` or `0.0`.                                                                            |
 | `links` <Tag variant="new">2.2</Tag> | dict | Keys in the dictionary are `(start_char, end_char)` triples, and the values are dictionaries with `kb_id:value` entries.                                 |

 ## Utilities {#util}