Mirror of https://github.com/explosion/spaCy.git, synced 2025-10-31 16:07:41 +03:00
	Document Assigned Attributes of Pipeline Components (#9041)
* Add textcat docs
* Add NER docs
* Add Entity Linker docs
* Add assigned fields docs for the tagger

  This also adds a preamble, since there wasn't one.
* Add morphologizer docs
* Add dependency parser docs
* Update entityrecognizer docs

  This is a little weird because `Doc.ents` is the only thing assigned to, but it's actually a bidirectional property.
* Add token fields for entityrecognizer
* Fix section name
* Add entity ruler docs
* Add lemmatizer docs
* Add sentencizer/recognizer docs
* Update website/docs/api/entityrecognizer.md

  Co-authored-by: Adriane Boyd <adrianeboyd@gmail.com>
* Update website/docs/api/entityruler.md

  Co-authored-by: Adriane Boyd <adrianeboyd@gmail.com>
* Update website/docs/api/tagger.md

  Co-authored-by: Adriane Boyd <adrianeboyd@gmail.com>
* Update website/docs/api/entityruler.md

  Co-authored-by: Adriane Boyd <adrianeboyd@gmail.com>
* Update type for Doc.ents

  This was `Tuple[Span, ...]` everywhere but `Tuple[Span]` seems to be correct.
* Run prettier
* Apply suggestions from code review

  Co-authored-by: Sofie Van Landeghem <svlandeg@users.noreply.github.com>
* Run prettier
* Add transformers section

  This basically just moves and renames the "custom attributes" section from the bottom of the page to be consistent with "assigned attributes" on other pages. I looked at moving the paragraph just above the section into the section, but it includes the unrelated registry additions, so it seemed better to leave it unchanged.
* Make table header consistent

Co-authored-by: Adriane Boyd <adrianeboyd@gmail.com>
Co-authored-by: Sofie Van Landeghem <svlandeg@users.noreply.github.com>
Parent: f803a84571
Commit: ba6a37d358
				|  | @ -555,8 +555,8 @@ consists of either two or three subnetworks: | ||||||
| 
 | 
 | ||||||
| <Accordion title="spacy.TransitionBasedParser.v1 definition" spaced> | <Accordion title="spacy.TransitionBasedParser.v1 definition" spaced> | ||||||
| 
 | 
 | ||||||
| [TransitionBasedParser.v1](/api/legacy#TransitionBasedParser_v1) had the exact same signature,  | [TransitionBasedParser.v1](/api/legacy#TransitionBasedParser_v1) had the exact | ||||||
| but the `use_upper` argument was `True` by default. | same signature, but the `use_upper` argument was `True` by default. | ||||||
| 
 | 
 | ||||||
| </Accordion> | </Accordion> | ||||||
| 
 | 
 | ||||||
|  |  | ||||||
|  | @ -25,6 +25,20 @@ current state. The weights are updated such that the scores assigned to the set | ||||||
| of optimal actions is increased, while scores assigned to other actions are | of optimal actions is increased, while scores assigned to other actions are | ||||||
| decreased. Note that more than one action may be optimal for a given state. | decreased. Note that more than one action may be optimal for a given state. | ||||||
| 
 | 
 | ||||||
|  | ## Assigned Attributes {#assigned-attributes} | ||||||
|  | 
 | ||||||
|  | Dependency predictions are assigned to the `Token.dep` and `Token.head` fields. | ||||||
|  | Besides the dependencies themselves, the parser decides sentence boundaries, | ||||||
|  | which are saved in `Token.is_sent_start` and accessible via `Doc.sents`. | ||||||
|  | 
 | ||||||
|  | | Location              | Value                                                                                                                                         | | ||||||
|  | | --------------------- | --------------------------------------------------------------------------------------------------------------------------------------------- | | ||||||
|  | | `Token.dep`           | The type of dependency relation (hash). ~~int~~                                                                                               | | ||||||
|  | | `Token.dep_`          | The type of dependency relation. ~~str~~                                                                                                      | | ||||||
|  | | `Token.head`          | The syntactic parent, or "governor", of this token. ~~Token~~                                                                                 | | ||||||
|  | | `Token.is_sent_start` | A boolean value indicating whether the token starts a sentence. After the parser runs this will be `True` or `False` for all tokens. ~~bool~~ | | ||||||
|  | | `Doc.sents`           | An iterator over sentences in the `Doc`, determined by `Token.is_sent_start` values. ~~Iterator[Span]~~                                       | | ||||||
|  | 
 | ||||||
| ## Config and implementation {#config} | ## Config and implementation {#config} | ||||||
| 
 | 
 | ||||||
| The default config is defined by the pipeline component factory and describes | The default config is defined by the pipeline component factory and describes | ||||||
|  |  | ||||||
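The dependency parser hunk above says that `Doc.sents` is determined by the `Token.is_sent_start` values. The following is a minimal sketch of that relationship in plain Python. It does not use spaCy and is not spaCy's implementation; the function `iter_sentences` and the sample data are hypothetical names introduced only for illustration.

```python
from typing import Iterator, List, Tuple


def iter_sentences(
    words: List[str], is_sent_start: List[bool]
) -> Iterator[Tuple[int, int]]:
    """Yield (start, end) token offsets, one pair per sentence.

    A new sentence begins at every token whose flag is True. The first
    token always opens a sentence, so its flag is not inspected.
    """
    start = 0
    for i in range(1, len(words)):
        if is_sent_start[i]:
            yield (start, i)
            start = i
    if words:
        # Close the final sentence at the end of the document.
        yield (start, len(words))


# Hypothetical per-token flags, as a parser or sentencizer might set them.
words = ["Hello", "world", ".", "This", "is", "spaCy", "."]
flags = [True, False, False, True, False, False, False]

sentences = [" ".join(words[s:e]) for s, e in iter_sentences(words, flags)]
print(sentences)
```

Because sentences are derived from the per-token flags, any component that sets `Token.is_sent_start` for every token (parser, sentence recognizer, or sentencizer) makes `Doc.sents` usable.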
|  | @ -571,9 +571,9 @@ objects, if the entity recognizer has been applied. | ||||||
| > assert ents[0].text == "Mr. Best" | > assert ents[0].text == "Mr. Best" | ||||||
| > ``` | > ``` | ||||||
| 
 | 
 | ||||||
| | Name        | Description                                                           | | | Name        | Description                                                      | | ||||||
| | ----------- | --------------------------------------------------------------------- | | | ----------- | ---------------------------------------------------------------- | | ||||||
| | **RETURNS** | Entities in the document, one `Span` per entity. ~~Tuple[Span, ...]~~ | | | **RETURNS** | Entities in the document, one `Span` per entity. ~~Tuple[Span]~~ | | ||||||
| 
 | 
 | ||||||
| ## Doc.spans {#spans tag="property"} | ## Doc.spans {#spans tag="property"} | ||||||
| 
 | 
 | ||||||
|  |  | ||||||
|  | @ -16,6 +16,16 @@ plausible candidates from that `KnowledgeBase` given a certain textual mention, | ||||||
| and a machine learning model to pick the right candidate, given the local | and a machine learning model to pick the right candidate, given the local | ||||||
| context of the mention. | context of the mention. | ||||||
| 
 | 
 | ||||||
|  | ## Assigned Attributes {#assigned-attributes} | ||||||
|  | 
 | ||||||
|  | Predictions, in the form of knowledge base IDs, will be assigned to | ||||||
|  | `Token.ent_kb_id_`. | ||||||
|  | 
 | ||||||
|  | | Location           | Value                             | | ||||||
|  | | ------------------ | --------------------------------- | | ||||||
|  | | `Token.ent_kb_id`  | Knowledge base ID (hash). ~~int~~ | | ||||||
|  | | `Token.ent_kb_id_` | Knowledge base ID. ~~str~~        | | ||||||
|  | 
 | ||||||
| ## Config and implementation {#config} | ## Config and implementation {#config} | ||||||
| 
 | 
 | ||||||
| The default config is defined by the pipeline component factory and describes | The default config is defined by the pipeline component factory and describes | ||||||
|  |  | ||||||
|  | @ -20,6 +20,24 @@ your entities will be close to their initial tokens. If your entities are long | ||||||
| and characterized by tokens in their middle, the component will likely not be a | and characterized by tokens in their middle, the component will likely not be a | ||||||
| good fit for your task. | good fit for your task. | ||||||
| 
 | 
 | ||||||
|  | ## Assigned Attributes {#assigned-attributes} | ||||||
|  | 
 | ||||||
|  | Predictions will be saved to `Doc.ents` as a tuple. Each label will also be | ||||||
|  | reflected in each underlying token, where it is saved in the `Token.ent_type` | ||||||
|  | and `Token.ent_iob` fields. Note that by definition each token can only have one | ||||||
|  | label. | ||||||
|  | 
 | ||||||
|  | When setting `Doc.ents` to create training data, all the spans must be valid and | ||||||
|  | non-overlapping, or an error will be thrown. | ||||||
|  | 
 | ||||||
|  | | Location          | Value                                                             | | ||||||
|  | | ----------------- | ----------------------------------------------------------------- | | ||||||
|  | | `Doc.ents`        | The annotated spans. ~~Tuple[Span]~~                              | | ||||||
|  | | `Token.ent_iob`   | An enum encoding of the IOB part of the named entity tag. ~~int~~ | | ||||||
|  | | `Token.ent_iob_`  | The IOB part of the named entity tag. ~~str~~                     | | ||||||
|  | | `Token.ent_type`  | The label part of the named entity tag (hash). ~~int~~            | | ||||||
|  | | `Token.ent_type_` | The label part of the named entity tag. ~~str~~                   | | ||||||
|  | 
 | ||||||
| ## Config and implementation {#config} | ## Config and implementation {#config} | ||||||
| 
 | 
 | ||||||
| The default config is defined by the pipeline component factory and describes | The default config is defined by the pipeline component factory and describes | ||||||
|  |  | ||||||
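The entity recognizer hunk above states that each entity label is also reflected on the underlying tokens (IOB part and label part), that each token can have only one label, and that spans must be valid and non-overlapping. The sketch below illustrates why those constraints go together. It is plain Python, not spaCy code; `spans_to_token_tags` and the `(start, end, label)` tuple layout are assumptions made for this example.

```python
from typing import List, Tuple

# (start token index, end token index exclusive, label): hypothetical layout
Span = Tuple[int, int, str]


def spans_to_token_tags(n_tokens: int, spans: List[Span]) -> List[Tuple[str, str]]:
    """Project entity spans onto tokens as (iob, label) pairs.

    Tokens outside any entity get ("O", ""). The first token of an entity
    gets "B", the remaining tokens get "I".
    """
    tags = [("O", "")] * n_tokens
    for start, end, label in sorted(spans):
        if not (0 <= start < end <= n_tokens):
            raise ValueError(f"invalid span: {(start, end, label)}")
        if any(iob != "O" for iob, _ in tags[start:end]):
            # A token can carry only one label, so overlaps cannot be stored.
            raise ValueError(f"overlapping span: {(start, end, label)}")
        tags[start] = ("B", label)
        for i in range(start + 1, end):
            tags[i] = ("I", label)
    return tags


words = ["Mr.", "Best", "flew", "to", "New", "York"]
tags = spans_to_token_tags(len(words), [(0, 2, "PERSON"), (4, 6, "GPE")])
for word, (iob, label) in zip(words, tags):
    print(word, iob, label)
```

Passing two spans that share a token, for example `(0, 2, "PERSON")` and `(1, 3, "ORG")`, raises a `ValueError` in this sketch, which mirrors the documented requirement that spans assigned to `Doc.ents` do not overlap.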
|  | @ -15,6 +15,27 @@ used on its own to implement a purely rule-based entity recognition system. For | ||||||
| usage examples, see the docs on | usage examples, see the docs on | ||||||
| [rule-based entity recognition](/usage/rule-based-matching#entityruler). | [rule-based entity recognition](/usage/rule-based-matching#entityruler). | ||||||
| 
 | 
 | ||||||
|  | ## Assigned Attributes {#assigned-attributes} | ||||||
|  | 
 | ||||||
|  | This component assigns predictions basically the same way as the | ||||||
|  | [`EntityRecognizer`](/api/entityrecognizer). | ||||||
|  | 
 | ||||||
|  | Predictions can be accessed under `Doc.ents` as a tuple. Each label will also be | ||||||
|  | reflected in each underlying token, where it is saved in the `Token.ent_type` | ||||||
|  | and `Token.ent_iob` fields. Note that by definition each token can only have one | ||||||
|  | label. | ||||||
|  | 
 | ||||||
|  | When setting `Doc.ents` to create training data, all the spans must be valid and | ||||||
|  | non-overlapping, or an error will be thrown. | ||||||
|  | 
 | ||||||
|  | | Location          | Value                                                             | | ||||||
|  | | ----------------- | ----------------------------------------------------------------- | | ||||||
|  | | `Doc.ents`        | The annotated spans. ~~Tuple[Span]~~                              | | ||||||
|  | | `Token.ent_iob`   | An enum encoding of the IOB part of the named entity tag. ~~int~~ | | ||||||
|  | | `Token.ent_iob_`  | The IOB part of the named entity tag. ~~str~~                     | | ||||||
|  | | `Token.ent_type`  | The label part of the named entity tag (hash). ~~int~~            | | ||||||
|  | | `Token.ent_type_` | The label part of the named entity tag. ~~str~~                   | | ||||||
|  | 
 | ||||||
| ## Config and implementation {#config} | ## Config and implementation {#config} | ||||||
| 
 | 
 | ||||||
| The default config is defined by the pipeline component factory and describes | The default config is defined by the pipeline component factory and describes | ||||||
|  |  | ||||||
|  | @ -105,7 +105,8 @@ and residual connections. | ||||||
| 
 | 
 | ||||||
| ### spacy.TransitionBasedParser.v1 {#TransitionBasedParser_v1} | ### spacy.TransitionBasedParser.v1 {#TransitionBasedParser_v1} | ||||||
| 
 | 
 | ||||||
| Identical to [`spacy.TransitionBasedParser.v2`](/api/architectures#TransitionBasedParser) | Identical to | ||||||
|  | [`spacy.TransitionBasedParser.v2`](/api/architectures#TransitionBasedParser) | ||||||
| except that the `use_upper` argument was set to `True` by default. | except that the `use_upper` argument was set to `True` by default. | ||||||
| 
 | 
 | ||||||
| ### spacy.TextCatEnsemble.v1 {#TextCatEnsemble_v1} | ### spacy.TextCatEnsemble.v1 {#TextCatEnsemble_v1} | ||||||
|  |  | ||||||
|  | @ -31,6 +31,15 @@ available in the pipeline and runs _before_ the lemmatizer. | ||||||
| 
 | 
 | ||||||
| </Infobox> | </Infobox> | ||||||
| 
 | 
 | ||||||
|  | ## Assigned Attributes {#assigned-attributes} | ||||||
|  | 
 | ||||||
|  | Lemmas generated by rules or predicted will be saved to `Token.lemma`. | ||||||
|  | 
 | ||||||
|  | | Location       | Value                     | | ||||||
|  | | -------------- | ------------------------- | | ||||||
|  | | `Token.lemma`  | The lemma (hash). ~~int~~ | | ||||||
|  | | `Token.lemma_` | The lemma. ~~str~~        | | ||||||
|  | 
 | ||||||
| ## Config and implementation | ## Config and implementation | ||||||
| 
 | 
 | ||||||
| The default config is defined by the pipeline component factory and describes | The default config is defined by the pipeline component factory and describes | ||||||
|  |  | ||||||
|  | @ -15,6 +15,16 @@ coarse-grained POS tags following the Universal Dependencies | ||||||
| [FEATS](https://universaldependencies.org/format.html#morphological-annotation) | [FEATS](https://universaldependencies.org/format.html#morphological-annotation) | ||||||
| annotation guidelines. | annotation guidelines. | ||||||
| 
 | 
 | ||||||
|  | ## Assigned Attributes {#assigned-attributes} | ||||||
|  | 
 | ||||||
|  | Predictions are saved to `Token.morph` and `Token.pos`. | ||||||
|  | 
 | ||||||
|  | | Location      | Value                                     | | ||||||
|  | | ------------- | ----------------------------------------- | | ||||||
|  | | `Token.pos`   | The UPOS part of speech (hash). ~~int~~   | | ||||||
|  | | `Token.pos_`  | The UPOS part of speech. ~~str~~          | | ||||||
|  | | `Token.morph` | Morphological features. ~~MorphAnalysis~~ | | ||||||
|  | 
 | ||||||
| ## Config and implementation {#config} | ## Config and implementation {#config} | ||||||
| 
 | 
 | ||||||
| The default config is defined by the pipeline component factory and describes | The default config is defined by the pipeline component factory and describes | ||||||
|  |  | ||||||
|  | @ -105,11 +105,11 @@ representation. | ||||||
| 
 | 
 | ||||||
| ## Attributes {#attributes} | ## Attributes {#attributes} | ||||||
| 
 | 
 | ||||||
| | Name          | Description                                                                                                                  | | | Name          | Description                                                                                                                    | | ||||||
| | ------------- | ---------------------------------------------------------------------------------------------------------------------------- | ---------- | | | ------------- | ------------------------------------------------------------------------------------------------------------------------------ | | ||||||
| | `FEATURE_SEP` | The [FEATS](https://universaldependencies.org/format.html#morphological-annotation) feature separator. Default is `          | `. ~~str~~ | | | `FEATURE_SEP` | The [FEATS](https://universaldependencies.org/format.html#morphological-annotation) feature separator. Default is `|`. ~~str~~ | | ||||||
| | `FIELD_SEP`   | The [FEATS](https://universaldependencies.org/format.html#morphological-annotation) field separator. Default is `=`. ~~str~~ | | | `FIELD_SEP`   | The [FEATS](https://universaldependencies.org/format.html#morphological-annotation) field separator. Default is `=`. ~~str~~   | | ||||||
| | `VALUE_SEP`   | The [FEATS](https://universaldependencies.org/format.html#morphological-annotation) value separator. Default is `,`. ~~str~~ | | | `VALUE_SEP`   | The [FEATS](https://universaldependencies.org/format.html#morphological-annotation) value separator. Default is `,`. ~~str~~   | | ||||||
| 
 | 
 | ||||||
| ## MorphAnalysis {#morphanalysis tag="class" source="spacy/tokens/morphanalysis.pyx"} | ## MorphAnalysis {#morphanalysis tag="class" source="spacy/tokens/morphanalysis.pyx"} | ||||||
| 
 | 
 | ||||||
|  |  | ||||||
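The attributes table in the hunk above lists three separators for the Universal Dependencies FEATS format: `|` between features, `=` between a field and its values, and `,` between values. As a rough illustration of how they combine, here is a small standalone parser. The function `parse_feats` is hypothetical and is not part of spaCy's API; only the three default separator values come from the table.

```python
from typing import Dict, List

# Default separators, as listed in the attributes table.
FEATURE_SEP = "|"
FIELD_SEP = "="
VALUE_SEP = ","


def parse_feats(feats: str) -> Dict[str, List[str]]:
    """Split a FEATS string into a mapping of field name to list of values."""
    # "_" is the CoNLL-U placeholder for "no features".
    if not feats or feats == "_":
        return {}
    parsed: Dict[str, List[str]] = {}
    for feature in feats.split(FEATURE_SEP):
        field, _, values = feature.partition(FIELD_SEP)
        parsed[field] = values.split(VALUE_SEP)
    return parsed


result = parse_feats("Case=Nom|Number=Sing|PronType=Dem,Rel")
print(result)
```

The old side of this hunk shows why the feature separator is awkward to document: a literal `|` inside a Markdown table cell was read as a column boundary.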
|  | @ -149,8 +149,8 @@ patterns = [nlp("health care reform"), nlp("healthcare reform")] | ||||||
| </Infobox> | </Infobox> | ||||||
| 
 | 
 | ||||||
| | Name           | Description                                                                                                                                                | | | Name           | Description                                                                                                                                                | | ||||||
| | -------------- | ---------------------------------------------------------------------------------------------------------------------------------------------------------- | --- | | | -------------- | ---------------------------------------------------------------------------------------------------------------------------------------------------------- | | ||||||
| | `match_id`     | An ID for the thing you're matching. ~~str~~                                                                                                               |     | | | `match_id`     | An ID for the thing you're matching. ~~str~~                                                                                                               |  | | ||||||
| | `docs`         | `Doc` objects of the phrases to match. ~~List[Doc]~~                                                                                                       | | | `docs`         | `Doc` objects of the phrases to match. ~~List[Doc]~~                                                                                                       | | ||||||
| | _keyword-only_ |                                                                                                                                                            | | | _keyword-only_ |                                                                                                                                                            | | ||||||
| | `on_match`     | Callback function to act on matches. Takes the arguments `matcher`, `doc`, `i` and `matches`. ~~Optional[Callable[[Matcher, Doc, int, List[tuple], Any]]~~ | | | `on_match`     | Callback function to act on matches. Takes the arguments `matcher`, `doc`, `i` and `matches`. ~~Optional[Callable[[Matcher, Doc, int, List[tuple], Any]]~~ | | ||||||
|  |  | ||||||
|  | @ -80,7 +80,7 @@ Docs with `has_unknown_spaces` are skipped during scoring. | ||||||
| > ``` | > ``` | ||||||
| 
 | 
 | ||||||
| | Name        | Description                                                                                                         | | | Name        | Description                                                                                                         | | ||||||
| | ----------- | ------------------------------------------------------------------------------------------------------------------- | ------------------------------------------------------------------------------------------------------ | | | ----------- | ------------------------------------------------------------------------------------------------------------------- | | ||||||
| | `examples`  | The `Example` objects holding both the predictions and the correct gold-standard annotations. ~~Iterable[Example]~~ | | | `examples`  | The `Example` objects holding both the predictions and the correct gold-standard annotations. ~~Iterable[Example]~~ | | ||||||
| | **RETURNS** | `Dict`                                                                                                              | A dictionary containing the scores `token_acc`, `token_p`, `token_r`, `token_f`. ~~Dict[str, float]]~~ | | | **RETURNS** | `Dict`                                                                                                              | A dictionary containing the scores `token_acc`, `token_p`, `token_r`, `token_f`. ~~Dict[str, float]]~~ | | ||||||
| 
 | 
 | ||||||
|  |  | ||||||
|  | @ -12,6 +12,16 @@ api_trainable: true | ||||||
| A trainable pipeline component for sentence segmentation. For a simpler, | A trainable pipeline component for sentence segmentation. For a simpler, | ||||||
| rule-based strategy, see the [`Sentencizer`](/api/sentencizer). | rule-based strategy, see the [`Sentencizer`](/api/sentencizer). | ||||||
| 
 | 
 | ||||||
|  | ## Assigned Attributes {#assigned-attributes} | ||||||
|  | 
 | ||||||
|  | Predicted values will be assigned to `Token.is_sent_start`. The resulting | ||||||
|  | sentences can be accessed using `Doc.sents`. | ||||||
|  | 
 | ||||||
|  | | Location              | Value                                                                                                                          | | ||||||
|  | | --------------------- | ------------------------------------------------------------------------------------------------------------------------------ | | ||||||
|  | | `Token.is_sent_start` | A boolean value indicating whether the token starts a sentence. This will be either `True` or `False` for all tokens. ~~bool~~ | | ||||||
|  | | `Doc.sents`           | An iterator over sentences in the `Doc`, determined by `Token.is_sent_start` values. ~~Iterator[Span]~~                        | | ||||||
|  | 
 | ||||||
| ## Config and implementation {#config} | ## Config and implementation {#config} | ||||||
| 
 | 
 | ||||||
| The default config is defined by the pipeline component factory and describes | The default config is defined by the pipeline component factory and describes | ||||||
|  |  | ||||||
|  | @ -13,6 +13,16 @@ performed by the [`DependencyParser`](/api/dependencyparser), so the | ||||||
| `Sentencizer` lets you implement a simpler, rule-based strategy that doesn't | `Sentencizer` lets you implement a simpler, rule-based strategy that doesn't | ||||||
| require a statistical model to be loaded. | require a statistical model to be loaded. | ||||||
| 
 | 
 | ||||||
|  | ## Assigned Attributes {#assigned-attributes} | ||||||
|  | 
 | ||||||
|  | Calculated values will be assigned to `Token.is_sent_start`. The resulting | ||||||
|  | sentences can be accessed using `Doc.sents`. | ||||||
|  | 
 | ||||||
|  | | Location              | Value                                                                                                                          | | ||||||
|  | | --------------------- | ------------------------------------------------------------------------------------------------------------------------------ | | ||||||
|  | | `Token.is_sent_start` | A boolean value indicating whether the token starts a sentence. This will be either `True` or `False` for all tokens. ~~bool~~ | | ||||||
|  | | `Doc.sents`           | An iterator over sentences in the `Doc`, determined by `Token.is_sent_start` values. ~~Iterator[Span]~~                        | | ||||||
|  | 
 | ||||||
| ## Config and implementation {#config} | ## Config and implementation {#config} | ||||||
| 
 | 
 | ||||||
| The default config is defined by the pipeline component factory and describes | The default config is defined by the pipeline component factory and describes | ||||||
|  | @ -28,7 +38,7 @@ how the component should be configured. You can override its settings via the | ||||||
| > ``` | > ``` | ||||||
| 
 | 
 | ||||||
| | Setting       | Description                                                                                                                                            | | | Setting       | Description                                                                                                                                            | | ||||||
| | ------------- | ------------------------------------------------------------------------------------------------------------------------------------------------------ | ------ | | | ------------- | ------------------------------------------------------------------------------------------------------------------------------------------------------ | | ||||||
| | `punct_chars` | Optional custom list of punctuation characters that mark sentence ends. See below for defaults if not set. Defaults to `None`. ~~Optional[List[str]]~~ | `None` | | | `punct_chars` | Optional custom list of punctuation characters that mark sentence ends. See below for defaults if not set. Defaults to `None`. ~~Optional[List[str]]~~ | `None` | | ||||||
| 
 | 
 | ||||||
| ```python | ```python | ||||||
|  |  | ||||||
|  | @ -8,6 +8,21 @@ api_string_name: tagger | ||||||
| api_trainable: true | api_trainable: true | ||||||
| --- | --- | ||||||
| 
 | 
 | ||||||
|  | A trainable pipeline component to predict part-of-speech tags for any | ||||||
|  | part-of-speech tag set. | ||||||
|  | 
 | ||||||
|  | In the pre-trained pipelines, the tag schemas vary by language; see the | ||||||
|  | [individual model pages](/models) for details. | ||||||
|  | 
 | ||||||
|  | ## Assigned Attributes {#assigned-attributes} | ||||||
|  | 
 | ||||||
|  | Predictions are assigned to `Token.tag`. | ||||||
|  | 
 | ||||||
|  | | Location     | Value                              | | ||||||
|  | | ------------ | ---------------------------------- | | ||||||
|  | | `Token.tag`  | The part of speech (hash). ~~int~~ | | ||||||
|  | | `Token.tag_` | The part of speech. ~~str~~        | | ||||||
|  | 
 | ||||||
| ## Config and implementation {#config} | ## Config and implementation {#config} | ||||||
| 
 | 
 | ||||||
| The default config is defined by the pipeline component factory and describes | The default config is defined by the pipeline component factory and describes | ||||||
|  |  | ||||||
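Several tables in this commit pair an integer attribute described as a hash (`Token.tag`, `Token.dep`, `Token.lemma`, `Token.ent_kb_id`) with a string attribute that has a trailing underscore (`Token.tag_` and so on). The toy class below sketches the general idea of such a pairing: strings are stored once and referenced by an integer key. It is only an illustration under stated assumptions: `ToyStringStore` is a hypothetical name, and `hashlib.blake2b` is a stand-in for whatever hash function spaCy actually uses.

```python
import hashlib
from typing import Dict


class ToyStringStore:
    """Map strings to stable 64-bit integer keys and back."""

    def __init__(self) -> None:
        self._by_hash: Dict[int, str] = {}

    def add(self, string: str) -> int:
        # Stand-in hash: 8-byte BLAKE2b digest interpreted as an integer.
        digest = hashlib.blake2b(string.encode("utf8"), digest_size=8).digest()
        key = int.from_bytes(digest, "little")
        self._by_hash[key] = string
        return key

    def __getitem__(self, key: int) -> str:
        return self._by_hash[key]


store = ToyStringStore()
tag = store.add("NN")   # integer form, analogous to Token.tag
tag_ = store[tag]       # string form, analogous to Token.tag_
print(tag, tag_)
```

Adding the same string twice returns the same key, so integer comparisons can stand in for string comparisons.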
|  | @ -29,6 +29,22 @@ only. | ||||||
| 
 | 
 | ||||||
| </Infobox> | </Infobox> | ||||||
| 
 | 
 | ||||||
|  | ## Assigned Attributes {#assigned-attributes} | ||||||
|  | 
 | ||||||
|  | Predictions will be saved to `doc.cats` as a dictionary, where the key is the | ||||||
|  | name of the category and the value is a score between 0 and 1 (inclusive). For | ||||||
|  | `textcat` (exclusive categories), the scores will sum to 1, while for | ||||||
|  | `textcat_multilabel` there is no particular guarantee about their sum. | ||||||
|  | 
 | ||||||
|  | Note that when assigning values to create training data, the score of each | ||||||
|  | category must be 0 or 1. Using other values, for example to create a document | ||||||
|  | that is a little bit in category A and a little bit in category B, is not | ||||||
|  | supported. | ||||||
|  | 
 | ||||||
|  | | Location   | Value                                 | | ||||||
|  | | ---------- | ------------------------------------- | | ||||||
|  | | `Doc.cats` | Category scores. ~~Dict[str, float]~~ | | ||||||
|  | 
 | ||||||
| ## Config and implementation {#config} | ## Config and implementation {#config} | ||||||
| 
 | 
 | ||||||
| The default config is defined by the pipeline component factory and describes | The default config is defined by the pipeline component factory and describes | ||||||
|  |  | ||||||
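The text categorizer hunk above says that scores lie between 0 and 1, that they sum to 1 for exclusive categories, and that there is no guarantee about the sum for `textcat_multilabel`. One common way to obtain exactly these properties is a softmax over all categories versus an independent sigmoid per category. The sketch below shows that contrast. It is an assumption for illustration, not taken from the diff, and the function names and raw scores are hypothetical.

```python
import math
from typing import Dict


def exclusive_scores(logits: Dict[str, float]) -> Dict[str, float]:
    """Softmax: categories compete, so the scores sum to 1."""
    peak = max(logits.values())  # subtract the max for numerical stability
    exps = {label: math.exp(value - peak) for label, value in logits.items()}
    total = sum(exps.values())
    return {label: value / total for label, value in exps.items()}


def multilabel_scores(logits: Dict[str, float]) -> Dict[str, float]:
    """Sigmoid: each category is scored independently in [0, 1]."""
    return {label: 1.0 / (1.0 + math.exp(-value)) for label, value in logits.items()}


# Hypothetical raw model outputs for one document.
logits = {"SPORTS": 2.0, "POLITICS": 0.5, "TECH": -1.0}

exclusive = exclusive_scores(logits)
multilabel = multilabel_scores(logits)
print(sum(exclusive.values()))   # approximately 1.0
print(sum(multilabel.values()))  # not constrained to 1.0
```

Training annotations are different from predictions: as the hunk notes, each category value assigned to `Doc.cats` when creating training data must be 0 or 1.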
|  | @ -38,12 +38,21 @@ attributes. We also calculate an alignment between the word-piece tokens and the | ||||||
| spaCy tokenization, so that we can use the last hidden states to set the | spaCy tokenization, so that we can use the last hidden states to set the | ||||||
| `Doc.tensor` attribute. When multiple word-piece tokens align to the same spaCy | `Doc.tensor` attribute. When multiple word-piece tokens align to the same spaCy | ||||||
| token, the spaCy token receives the sum of their values. To access the values, | token, the spaCy token receives the sum of their values. To access the values, | ||||||
| you can use the custom [`Doc._.trf_data`](#custom-attributes) attribute. The | you can use the custom [`Doc._.trf_data`](#assigned-attributes) attribute. The | ||||||
| package also adds the function registries [`@span_getters`](#span_getters) and | package also adds the function registries [`@span_getters`](#span_getters) and | ||||||
| [`@annotation_setters`](#annotation_setters) with several built-in registered | [`@annotation_setters`](#annotation_setters) with several built-in registered | ||||||
| functions. For more details, see the | functions. For more details, see the | ||||||
| [usage documentation](/usage/embeddings-transformers). | [usage documentation](/usage/embeddings-transformers). | ||||||
| 
 | 
 | ||||||
|  | ## Assigned Attributes {#assigned-attributes} | ||||||
|  | 
 | ||||||
|  | The component sets the following | ||||||
|  | [custom extension attribute](/usage/processing-pipeline#custom-components-attributes): | ||||||
|  | 
 | ||||||
|  | | Location         | Value                                                                    | | ||||||
|  | | ---------------- | ------------------------------------------------------------------------ | | ||||||
|  | | `Doc._.trf_data` | Transformer tokens and outputs for the `Doc` object. ~~TransformerData~~ | | ||||||
|  | 
 | ||||||
| ## Config and implementation {#config} | ## Config and implementation {#config} | ||||||
| 
 | 
 | ||||||
| The default config is defined by the pipeline component factory and describes | The default config is defined by the pipeline component factory and describes | ||||||
|  | @ -98,7 +107,7 @@ https://github.com/explosion/spacy-transformers/blob/master/spacy_transformers/p | ||||||
| Construct a `Transformer` component. One or more subsequent spaCy components can | Construct a `Transformer` component. One or more subsequent spaCy components can | ||||||
| use the transformer outputs as features in its model, with gradients | use the transformer outputs as features in its model, with gradients | ||||||
| backpropagated to the single shared weights. The activations from the | backpropagated to the single shared weights. The activations from the | ||||||
| transformer are saved in the [`Doc._.trf_data`](#custom-attributes) extension | transformer are saved in the [`Doc._.trf_data`](#assigned-attributes) extension | ||||||
| attribute. You can also provide a callback to set additional annotations. In | attribute. You can also provide a callback to set additional annotations. In | ||||||
| your application, you would normally use a shortcut for this and instantiate the | your application, you would normally use a shortcut for this and instantiate the | ||||||
| component using its string name and [`nlp.add_pipe`](/api/language#create_pipe). | component using its string name and [`nlp.add_pipe`](/api/language#create_pipe). | ||||||
|  | @ -205,7 +214,7 @@ modifying them. | ||||||
| 
 | 
 | ||||||
| Assign the extracted features to the `Doc` objects. By default, the | Assign the extracted features to the `Doc` objects. By default, the | ||||||
| [`TransformerData`](/api/transformer#transformerdata) object is written to the | [`TransformerData`](/api/transformer#transformerdata) object is written to the | ||||||
| [`Doc._.trf_data`](#custom-attributes) attribute. Your `set_extra_annotations` | [`Doc._.trf_data`](#assigned-attributes) attribute. Your `set_extra_annotations` | ||||||
| callback is then called, if provided. | callback is then called, if provided. | ||||||
| 
 | 
 | ||||||
| > #### Example | > #### Example | ||||||
|  | @ -383,7 +392,7 @@ are wrapped into the | ||||||
| [FullTransformerBatch](/api/transformer#fulltransformerbatch) object. The | [FullTransformerBatch](/api/transformer#fulltransformerbatch) object. The | ||||||
| `FullTransformerBatch` then splits out the per-document data, which is handled | `FullTransformerBatch` then splits out the per-document data, which is handled | ||||||
| by this class. Instances of this class are typically assigned to the | by this class. Instances of this class are typically assigned to the | ||||||
| [`Doc._.trf_data`](/api/transformer#custom-attributes) extension attribute. | [`Doc._.trf_data`](/api/transformer#assigned-attributes) extension attribute. | ||||||
| 
 | 
 | ||||||
| | Name      | Description                                                                                                                                                                                                                                                                                                                                               | | | Name      | Description                                                                                                                                                                                                                                                                                                                                               | | ||||||
| | --------- | --------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------- | | | --------- | --------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------- | | ||||||
|  | @ -549,12 +558,3 @@ The following built-in functions are available: | ||||||
| | Name                                           | Description                           | | | Name                                           | Description                           | | ||||||
| | ---------------------------------------------- | ------------------------------------- | | | ---------------------------------------------- | ------------------------------------- | | ||||||
| | `spacy-transformers.null_annotation_setter.v1` | Don't set any additional annotations. | | | `spacy-transformers.null_annotation_setter.v1` | Don't set any additional annotations. | | ||||||
| 
 |  | ||||||
| ## Custom attributes {#custom-attributes} |  | ||||||
| 
 |  | ||||||
| The component sets the following |  | ||||||
| [custom extension attributes](/usage/processing-pipeline#custom-components-attributes): |  | ||||||
| 
 |  | ||||||
| | Name             | Description                                                              | |  | ||||||
| | ---------------- | ------------------------------------------------------------------------ | |  | ||||||
| | `Doc._.trf_data` | Transformer tokens and outputs for the `Doc` object. ~~TransformerData~~ | |  | ||||||
|  |  | ||||||
|  | @ -321,7 +321,7 @@ performed in chunks to avoid consuming too much memory. You can set the | ||||||
| > ``` | > ``` | ||||||
| 
 | 
 | ||||||
| | Name           | Description                                                                 | | | Name           | Description                                                                 | | ||||||
| | -------------- | --------------------------------------------------------------------------- | ----------------------------------------------------------------------------------------------------------------------- | | | -------------- | --------------------------------------------------------------------------- | | ||||||
| | `queries`      | An array with one or more vectors. ~~numpy.ndarray~~                        | | | `queries`      | An array with one or more vectors. ~~numpy.ndarray~~                        | | ||||||
| | _keyword-only_ |                                                                             | | | _keyword-only_ |                                                                             | | ||||||
| | `batch_size`   | The batch size to use. Default to `1024`. ~~int~~                           | | | `batch_size`   | The batch size to use. Default to `1024`. ~~int~~                           | | ||||||
|  |  | ||||||
|  | @ -21,14 +21,14 @@ Create the vocabulary. | ||||||
| > vocab = Vocab(strings=["hello", "world"]) | > vocab = Vocab(strings=["hello", "world"]) | ||||||
| > ``` | > ``` | ||||||
| 
 | 
 | ||||||
| | Name                                        | Description                                                                                                                                            | | | Name                                        | Description                                                                                                                                             | | ||||||
| | ------------------------------------------- | ------------------------------------------------------------------------------------------------------------------------------------------------------ | | | ------------------------------------------- | ------------------------------------------------------------------------------------------------------------------------------------------------------- | | ||||||
| | `lex_attr_getters`                          | A dictionary mapping attribute IDs to functions to compute them. Defaults to `None`. ~~Optional[Dict[str, Callable[[str], Any]]]~~                     | | | `lex_attr_getters`                          | A dictionary mapping attribute IDs to functions to compute them. Defaults to `None`. ~~Optional[Dict[str, Callable[[str], Any]]]~~                      | | ||||||
| | `strings`                                   | A [`StringStore`](/api/stringstore) that maps strings to hash values, and vice versa, or a list of strings. ~~Union[List[str], StringStore]~~          | | | `strings`                                   | A [`StringStore`](/api/stringstore) that maps strings to hash values, and vice versa, or a list of strings. ~~Union[List[str], StringStore]~~           | | ||||||
| | `lookups`                                   | A [`Lookups`](/api/lookups) that stores the `lexeme_norm` and other large lookup tables. Defaults to `None`. ~~Optional[Lookups]~~                     | | | `lookups`                                   | A [`Lookups`](/api/lookups) that stores the `lexeme_norm` and other large lookup tables. Defaults to `None`. ~~Optional[Lookups]~~                      | | ||||||
| | `oov_prob`                                  | The default OOV probability. Defaults to `-20.0`. ~~float~~                                                                                            | | | `oov_prob`                                  | The default OOV probability. Defaults to `-20.0`. ~~float~~                                                                                             | | ||||||
| | `vectors_name` <Tag variant="new">2.2</Tag> | A name to identify the vectors table. ~~str~~                                                                                                          | | | `vectors_name` <Tag variant="new">2.2</Tag> | A name to identify the vectors table. ~~str~~                                                                                                           | | ||||||
| | `writing_system`                            | A dictionary describing the language's writing system. Typically provided by [`Language.Defaults`](/api/language#defaults). ~~Dict[str, Any]~~         | | | `writing_system`                            | A dictionary describing the language's writing system. Typically provided by [`Language.Defaults`](/api/language#defaults). ~~Dict[str, Any]~~          | | ||||||
| | `get_noun_chunks`                           | A function that yields base noun phrases used for [`Doc.noun_chunks`](/api/doc#noun_chunks). ~~Optional[Callable[[Union[Doc, Span], Iterator[Span]]]]~~ | | | `get_noun_chunks`                           | A function that yields base noun phrases used for [`Doc.noun_chunks`](/api/doc#noun_chunks). ~~Optional[Callable[[Union[Doc, Span], Iterator[Span]]]]~~ | | ||||||
| 
 | 
 | ||||||
| ## Vocab.\_\_len\_\_ {#len tag="method"} | ## Vocab.\_\_len\_\_ {#len tag="method"} | ||||||
|  |  | ||||||