	Document Assigned Attributes of Pipeline Components (#9041)
* Add textcat docs
* Add NER docs
* Add Entity Linker docs
* Add assigned fields docs for the tagger

  This also adds a preamble, since there wasn't one.

* Add morphologizer docs
* Add dependency parser docs
* Update entityrecognizer docs

  This is a little weird because `Doc.ents` is the only thing assigned to,
  but it's actually a bidirectional property.

* Add token fields for entityrecognizer
* Fix section name
* Add entity ruler docs
* Add lemmatizer docs
* Add sentencizer/recognizer docs
* Update website/docs/api/entityrecognizer.md

  Co-authored-by: Adriane Boyd <adrianeboyd@gmail.com>

* Update website/docs/api/entityruler.md

  Co-authored-by: Adriane Boyd <adrianeboyd@gmail.com>

* Update website/docs/api/tagger.md

  Co-authored-by: Adriane Boyd <adrianeboyd@gmail.com>

* Update website/docs/api/entityruler.md

  Co-authored-by: Adriane Boyd <adrianeboyd@gmail.com>

* Update type for Doc.ents

  This was `Tuple[Span, ...]` everywhere but `Tuple[Span]` seems to be correct.

* Run prettier
* Apply suggestions from code review

  Co-authored-by: Sofie Van Landeghem <svlandeg@users.noreply.github.com>

* Run prettier
* Add transformers section

  This basically just moves and renames the "custom attributes" section from
  the bottom of the page to be consistent with "assigned attributes" on other
  pages. I looked at moving the paragraph just above the section into the
  section, but it includes the unrelated registry additions, so it seemed
  better to leave it unchanged.

* Make table header consistent

Co-authored-by: Adriane Boyd <adrianeboyd@gmail.com>
Co-authored-by: Sofie Van Landeghem <svlandeg@users.noreply.github.com>
Parent: f803a84571
Commit: ba6a37d358
@@ -555,8 +555,8 @@ consists of either two or three subnetworks:
 
 <Accordion title="spacy.TransitionBasedParser.v1 definition" spaced>
 
-[TransitionBasedParser.v1](/api/legacy#TransitionBasedParser_v1) had the exact same signature,
-but the `use_upper` argument was `True` by default.
+[TransitionBasedParser.v1](/api/legacy#TransitionBasedParser_v1) had the exact
+same signature, but the `use_upper` argument was `True` by default.
 
 </Accordion>
 
@@ -25,6 +25,20 @@ current state. The weights are updated such that the scores assigned to the set
 of optimal actions is increased, while scores assigned to other actions are
 decreased. Note that more than one action may be optimal for a given state.
 
+## Assigned Attributes {#assigned-attributes}
+
+Dependency predictions are assigned to the `Token.dep` and `Token.head` fields.
+Besides the dependencies themselves, the parser decides sentence boundaries,
+which are saved in `Token.is_sent_start` and accessible via `Doc.sents`.
+
+| Location              | Value                                                                                                                                          |
+| --------------------- | ---------------------------------------------------------------------------------------------------------------------------------------------- |
+| `Token.dep`           | The type of dependency relation (hash). ~~int~~                                                                                                |
+| `Token.dep_`          | The type of dependency relation. ~~str~~                                                                                                       |
+| `Token.head`          | The syntactic parent, or "governor", of this token. ~~Token~~                                                                                  |
+| `Token.is_sent_start` | A boolean value indicating whether the token starts a sentence. After the parser runs, this will be `True` or `False` for all tokens. ~~bool~~ |
+| `Doc.sents`           | An iterator over sentences in the `Doc`, determined by `Token.is_sent_start` values. ~~Iterator[Span]~~                                        |
+
 ## Config and implementation {#config}
 
 The default config is defined by the pipeline component factory and describes
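A minimal sketch of reading the fields documented above, assuming the
`en_core_web_sm` pipeline is installed:

```python
import spacy

nlp = spacy.load("en_core_web_sm")
doc = nlp("Autonomous cars shift insurance liability toward manufacturers.")

# Token.dep_ holds the relation label, Token.head its syntactic governor.
for token in doc:
    print(token.text, token.dep_, token.head.text)

# The parser also sets sentence boundaries, exposed via Doc.sents.
print([sent.text for sent in doc.sents])
```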
@@ -571,9 +571,9 @@ objects, if the entity recognizer has been applied.
 > assert ents[0].text == "Mr. Best"
 > ```
 
-| Name        | Description                                                            |
-| ----------- | ---------------------------------------------------------------------- |
-| **RETURNS** | Entities in the document, one `Span` per entity. ~~Tuple[Span, ...]~~  |
+| Name        | Description                                                      |
+| ----------- | ---------------------------------------------------------------- |
+| **RETURNS** | Entities in the document, one `Span` per entity. ~~Tuple[Span]~~ |
 
 ## Doc.spans {#spans tag="property"}
 
@@ -16,6 +16,16 @@ plausible candidates from that `KnowledgeBase` given a certain textual mention,
 and a machine learning model to pick the right candidate, given the local
 context of the mention.
 
+## Assigned Attributes {#assigned-attributes}
+
+Predictions, in the form of knowledge base IDs, will be assigned to
+`Token.ent_kb_id_`.
+
+| Location           | Value                             |
+| ------------------ | --------------------------------- |
+| `Token.ent_kb_id`  | Knowledge base ID (hash). ~~int~~ |
+| `Token.ent_kb_id_` | Knowledge base ID. ~~str~~        |
+
 ## Config and implementation {#config}
 
 The default config is defined by the pipeline component factory and describes
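A hedged sketch of reading the assigned IDs; `./my_nel_pipeline` is a
hypothetical path to a pipeline trained with an `entity_linker` component, not
a shipped model:

```python
import spacy

# "./my_nel_pipeline" is a placeholder for your own trained pipeline.
nlp = spacy.load("./my_nel_pipeline")
doc = nlp("Ada Lovelace was born in London.")

# Tokens inside linked entities carry the predicted knowledge base ID.
for token in doc:
    print(token.text, token.ent_kb_id_)
```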
@@ -20,6 +20,24 @@ your entities will be close to their initial tokens. If your entities are long
 and characterized by tokens in their middle, the component will likely not be a
 good fit for your task.
 
+## Assigned Attributes {#assigned-attributes}
+
+Predictions will be saved to `Doc.ents` as a tuple. Each label will also be
+reflected in each underlying token, where it is saved in the `Token.ent_type`
+and `Token.ent_iob` fields. Note that by definition each token can only have
+one label.
+
+When setting `Doc.ents` to create training data, all the spans must be valid
+and non-overlapping, or an error will be thrown.
+
+| Location          | Value                                                             |
+| ----------------- | ----------------------------------------------------------------- |
+| `Doc.ents`        | The annotated spans. ~~Tuple[Span]~~                              |
+| `Token.ent_iob`   | An enum encoding of the IOB part of the named entity tag. ~~int~~ |
+| `Token.ent_iob_`  | The IOB part of the named entity tag. ~~str~~                     |
+| `Token.ent_type`  | The label part of the named entity tag (hash). ~~int~~            |
+| `Token.ent_type_` | The label part of the named entity tag. ~~str~~                   |
+
 ## Config and implementation {#config}
 
 The default config is defined by the pipeline component factory and describes
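A minimal sketch of both directions, assuming `en_core_web_sm` is installed:

```python
import spacy
from spacy.tokens import Span

nlp = spacy.load("en_core_web_sm")
doc = nlp("Apple is opening its first big office in San Francisco.")

# Reading: Doc.ents holds the spans, and each token mirrors its label.
print([(ent.text, ent.label_) for ent in doc.ents])
print([(token.text, token.ent_iob_, token.ent_type_) for token in doc])

# Writing, e.g. to create training data: the spans must be valid and
# non-overlapping, or an error is raised.
doc = nlp.make_doc("I like London.")
doc.ents = [Span(doc, 2, 3, label="GPE")]
```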
@@ -15,6 +15,27 @@ used on its own to implement a purely rule-based entity recognition system. For
 usage examples, see the docs on
 [rule-based entity recognition](/usage/rule-based-matching#entityruler).
 
+## Assigned Attributes {#assigned-attributes}
+
+This component assigns predictions basically the same way as the
+[`EntityRecognizer`](/api/entityrecognizer).
+
+Predictions can be accessed under `Doc.ents` as a tuple. Each label will also
+be reflected in each underlying token, where it is saved in the
+`Token.ent_type` and `Token.ent_iob` fields. Note that by definition each
+token can only have one label.
+
+When setting `Doc.ents` to create training data, all the spans must be valid
+and non-overlapping, or an error will be thrown.
+
+| Location          | Value                                                             |
+| ----------------- | ----------------------------------------------------------------- |
+| `Doc.ents`        | The annotated spans. ~~Tuple[Span]~~                              |
+| `Token.ent_iob`   | An enum encoding of the IOB part of the named entity tag. ~~int~~ |
+| `Token.ent_iob_`  | The IOB part of the named entity tag. ~~str~~                     |
+| `Token.ent_type`  | The label part of the named entity tag (hash). ~~int~~            |
+| `Token.ent_type_` | The label part of the named entity tag. ~~str~~                   |
+
 ## Config and implementation {#config}
 
 The default config is defined by the pipeline component factory and describes
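A minimal sketch with a blank pipeline and a single pattern:

```python
import spacy

nlp = spacy.blank("en")
ruler = nlp.add_pipe("entity_ruler")
ruler.add_patterns([{"label": "ORG", "pattern": "Apple"}])

doc = nlp("Apple is hiring.")
print([(ent.text, ent.label_) for ent in doc.ents])
print([(token.text, token.ent_iob_, token.ent_type_) for token in doc])
```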
@@ -105,7 +105,8 @@ and residual connections.
 
 ### spacy.TransitionBasedParser.v1 {#TransitionBasedParser_v1}
 
-Identical to [`spacy.TransitionBasedParser.v2`](/api/architectures#TransitionBasedParser)
+Identical to
+[`spacy.TransitionBasedParser.v2`](/api/architectures#TransitionBasedParser)
 except the `use_upper` was set to `True` by default.
 
 ### spacy.TextCatEnsemble.v1 {#TextCatEnsemble_v1}
@@ -31,6 +31,15 @@ available in the pipeline and runs _before_ the lemmatizer.
 
 </Infobox>
 
+## Assigned Attributes {#assigned-attributes}
+
+Lemmas, whether generated by rules or predicted, will be saved to `Token.lemma`.
+
+| Location       | Value                     |
+| -------------- | ------------------------- |
+| `Token.lemma`  | The lemma (hash). ~~int~~ |
+| `Token.lemma_` | The lemma. ~~str~~        |
+
 ## Config and implementation
 
 The default config is defined by the pipeline component factory and describes
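A minimal sketch, assuming `en_core_web_sm` is installed:

```python
import spacy

nlp = spacy.load("en_core_web_sm")
doc = nlp("The striped bats were hanging on their feet.")

# Token.lemma_ is the string lemma; Token.lemma is its hash.
print([(token.text, token.lemma_) for token in doc])
```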
@@ -15,6 +15,16 @@ coarse-grained POS tags following the Universal Dependencies
 [FEATS](https://universaldependencies.org/format.html#morphological-annotation)
 annotation guidelines.
 
+## Assigned Attributes {#assigned-attributes}
+
+Predictions are saved to `Token.morph` and `Token.pos`.
+
+| Location      | Value                                     |
+| ------------- | ----------------------------------------- |
+| `Token.pos`   | The UPOS part of speech (hash). ~~int~~   |
+| `Token.pos_`  | The UPOS part of speech. ~~str~~          |
+| `Token.morph` | Morphological features. ~~MorphAnalysis~~ |
+
 ## Config and implementation {#config}
 
 The default config is defined by the pipeline component factory and describes
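A minimal sketch, assuming a pipeline that includes a morphologizer, such as
`de_core_news_sm`:

```python
import spacy

nlp = spacy.load("de_core_news_sm")
doc = nlp("Wo bist du?")

# Token.pos_ is the UPOS tag; Token.morph is a MorphAnalysis of FEATS features.
for token in doc:
    print(token.text, token.pos_, token.morph)
```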
@@ -105,11 +105,11 @@ representation.
 
 ## Attributes {#attributes}
 
-| Name          | Description                                                                                                                  |
-| ------------- | ---------------------------------------------------------------------------------------------------------------------------- | ---------- |
-| `FEATURE_SEP` | The [FEATS](https://universaldependencies.org/format.html#morphological-annotation) feature separator. Default is `          | `. ~~str~~ |
-| `FIELD_SEP`   | The [FEATS](https://universaldependencies.org/format.html#morphological-annotation) field separator. Default is `=`. ~~str~~ |
-| `VALUE_SEP`   | The [FEATS](https://universaldependencies.org/format.html#morphological-annotation) value separator. Default is `,`. ~~str~~ |
+| Name          | Description                                                                                                                    |
+| ------------- | ------------------------------------------------------------------------------------------------------------------------------ |
+| `FEATURE_SEP` | The [FEATS](https://universaldependencies.org/format.html#morphological-annotation) feature separator. Default is `|`. ~~str~~ |
+| `FIELD_SEP`   | The [FEATS](https://universaldependencies.org/format.html#morphological-annotation) field separator. Default is `=`. ~~str~~   |
+| `VALUE_SEP`   | The [FEATS](https://universaldependencies.org/format.html#morphological-annotation) value separator. Default is `,`. ~~str~~   |
 
 ## MorphAnalysis {#morphanalysis tag="class" source="spacy/tokens/morphanalysis.pyx"}
 
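These separators compose FEATS strings such as `Case=Nom|Number=Sing`, as a
quick check of the class attributes shows:

```python
from spacy.morphology import Morphology

print(Morphology.FEATURE_SEP)  # "|"
print(Morphology.FIELD_SEP)    # "="
print(Morphology.VALUE_SEP)    # ","
```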
@@ -149,8 +149,8 @@ patterns = [nlp("health care reform"), nlp("healthcare reform")]
 </Infobox>
 
 | Name           | Description                                                                                                                                                |
-| -------------- | ---------------------------------------------------------------------------------------------------------------------------------------------------------- | --- |
-| `match_id`     | An ID for the thing you're matching. ~~str~~                                                                                                               |     |
+| -------------- | ---------------------------------------------------------------------------------------------------------------------------------------------------------- |
+| `match_id`     | An ID for the thing you're matching. ~~str~~                                                                                                               |
 | `docs`         | `Doc` objects of the phrases to match. ~~List[Doc]~~                                                                                                       |
 | _keyword-only_ |                                                                                                                                                            |
 | `on_match`     | Callback function to act on matches. Takes the arguments `matcher`, `doc`, `i` and `matches`. ~~Optional[Callable[[Matcher, Doc, int, List[tuple], Any]]~~ |
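A minimal sketch of `PhraseMatcher.add` with the arguments above:

```python
import spacy
from spacy.matcher import PhraseMatcher

nlp = spacy.blank("en")
matcher = PhraseMatcher(nlp.vocab)

# nlp.make_doc creates the pattern Docs without running the whole pipeline.
patterns = [nlp.make_doc(text) for text in ["health care reform", "healthcare reform"]]
matcher.add("HEALTH", patterns)

doc = nlp("He discussed healthcare reform at length.")
for match_id, start, end in matcher(doc):
    print(nlp.vocab.strings[match_id], doc[start:end].text)
```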
@@ -80,7 +80,7 @@ Docs with `has_unknown_spaces` are skipped during scoring.
 > ```
 
 | Name        | Description                                                                                                          |
-| ----------- | --------------------------------------------------------------------------------------------------------------------- | ------------------------------------------------------------------------------------------------------ |
+| ----------- | --------------------------------------------------------------------------------------------------------------------- |
 | `examples`  | The `Example` objects holding both the predictions and the correct gold-standard annotations. ~~Iterable[Example]~~ |
 | **RETURNS** | `Dict`                                                                                                               | A dictionary containing the scores `token_acc`, `token_p`, `token_r`, `token_f`. ~~Dict[str, float]]~~ |
 
@@ -12,6 +12,16 @@ api_trainable: true
 A trainable pipeline component for sentence segmentation. For a simpler,
 rule-based strategy, see the [`Sentencizer`](/api/sentencizer).
 
+## Assigned Attributes {#assigned-attributes}
+
+Predicted values will be assigned to `Token.is_sent_start`. The resulting
+sentences can be accessed using `Doc.sents`.
+
+| Location              | Value                                                                                                                           |
+| --------------------- | ------------------------------------------------------------------------------------------------------------------------------- |
+| `Token.is_sent_start` | A boolean value indicating whether the token starts a sentence. This will be either `True` or `False` for all tokens. ~~bool~~ |
+| `Doc.sents`           | An iterator over sentences in the `Doc`, determined by `Token.is_sent_start` values. ~~Iterator[Span]~~                         |
+
 ## Config and implementation {#config}
 
 The default config is defined by the pipeline component factory and describes
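A minimal sketch, assuming `en_core_web_sm`, where the trained `senter`
component ships disabled by default:

```python
import spacy

# Load without the parser and enable the senter instead.
nlp = spacy.load("en_core_web_sm", exclude=["parser"])
nlp.enable_pipe("senter")

doc = nlp("This is a sentence. This is another one.")
print([sent.text for sent in doc.sents])
```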
@@ -13,6 +13,16 @@ performed by the [`DependencyParser`](/api/dependencyparser), so the
 `Sentencizer` lets you implement a simpler, rule-based strategy that doesn't
 require a statistical model to be loaded.
 
+## Assigned Attributes {#assigned-attributes}
+
+Calculated values will be assigned to `Token.is_sent_start`. The resulting
+sentences can be accessed using `Doc.sents`.
+
+| Location              | Value                                                                                                                           |
+| --------------------- | ------------------------------------------------------------------------------------------------------------------------------- |
+| `Token.is_sent_start` | A boolean value indicating whether the token starts a sentence. This will be either `True` or `False` for all tokens. ~~bool~~ |
+| `Doc.sents`           | An iterator over sentences in the `Doc`, determined by `Token.is_sent_start` values. ~~Iterator[Span]~~                         |
+
 ## Config and implementation {#config}
 
 The default config is defined by the pipeline component factory and describes
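A minimal sketch using a blank pipeline:

```python
import spacy

nlp = spacy.blank("en")
nlp.add_pipe("sentencizer")

doc = nlp("This is a sentence. This is another one.")
# Every token receives an explicit True/False Token.is_sent_start value.
print([(token.text, token.is_sent_start) for token in doc])
print([sent.text for sent in doc.sents])
```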
@@ -28,7 +38,7 @@ how the component should be configured. You can override its settings via the
 > ```
 
 | Setting       | Description                                                                                                                                             |
-| ------------- | ------------------------------------------------------------------------------------------------------------------------------------------------------- | ------ |
+| ------------- | ------------------------------------------------------------------------------------------------------------------------------------------------------- |
 | `punct_chars` | Optional custom list of punctuation characters that mark sentence ends. See below for defaults if not set. Defaults to `None`. ~~Optional[List[str]]~~ | `None` |
 
 ```python
@@ -8,6 +8,21 @@ api_string_name: tagger
 api_trainable: true
 ---
 
+A trainable pipeline component to predict part-of-speech tags for any
+part-of-speech tag set.
+
+In the pre-trained pipelines, the tag schemas vary by language; see the
+[individual model pages](/models) for details.
+
+## Assigned Attributes {#assigned-attributes}
+
+Predictions are assigned to `Token.tag`.
+
+| Location     | Value                              |
+| ------------ | ---------------------------------- |
+| `Token.tag`  | The part of speech (hash). ~~int~~ |
+| `Token.tag_` | The part of speech. ~~str~~        |
+
 ## Config and implementation {#config}
 
 The default config is defined by the pipeline component factory and describes
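A minimal sketch, assuming `en_core_web_sm` is installed:

```python
import spacy

nlp = spacy.load("en_core_web_sm")
doc = nlp("I saw the book on the table.")

# Token.tag_ is the fine-grained tag string; Token.tag is its hash.
print([(token.text, token.tag_) for token in doc])
```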
@@ -29,6 +29,22 @@ only.
 
 </Infobox>
 
+## Assigned Attributes {#assigned-attributes}
+
+Predictions will be saved to `doc.cats` as a dictionary, where the key is the
+name of the category and the value is a score between 0 and 1 (inclusive). For
+`textcat` (exclusive categories), the scores will sum to 1, while for
+`textcat_multilabel` there is no particular guarantee about their sum.
+
+Note that when assigning values to create training data, the score of each
+category must be 0 or 1. Using other values, for example to create a document
+that is a little bit in category A and a little bit in category B, is not
+supported.
+
+| Location   | Value                                 |
+| ---------- | ------------------------------------- |
+| `Doc.cats` | Category scores. ~~Dict[str, float]~~ |
+
 ## Config and implementation {#config}
 
 The default config is defined by the pipeline component factory and describes
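A hedged sketch of reading the scores; `./my_textcat_pipeline` is a
hypothetical path to a pipeline trained with a `textcat` component:

```python
import spacy

# "./my_textcat_pipeline" is a placeholder for your own trained pipeline.
nlp = spacy.load("./my_textcat_pipeline")
doc = nlp("This was a great movie!")

# For exclusive categories the scores sum to 1.
print(doc.cats)  # e.g. {"POSITIVE": 0.95, "NEGATIVE": 0.05}
```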
@@ -38,12 +38,21 @@ attributes. We also calculate an alignment between the word-piece tokens and the
 spaCy tokenization, so that we can use the last hidden states to set the
 `Doc.tensor` attribute. When multiple word-piece tokens align to the same spaCy
 token, the spaCy token receives the sum of their values. To access the values,
-you can use the custom [`Doc._.trf_data`](#custom-attributes) attribute. The
+you can use the custom [`Doc._.trf_data`](#assigned-attributes) attribute. The
 package also adds the function registries [`@span_getters`](#span_getters) and
 [`@annotation_setters`](#annotation_setters) with several built-in registered
 functions. For more details, see the
 [usage documentation](/usage/embeddings-transformers).
 
+## Assigned Attributes {#assigned-attributes}
+
+The component sets the following
+[custom extension attribute](/usage/processing-pipeline#custom-components-attributes):
+
+| Location         | Value                                                                    |
+| ---------------- | ------------------------------------------------------------------------ |
+| `Doc._.trf_data` | Transformer tokens and outputs for the `Doc` object. ~~TransformerData~~ |
+
 ## Config and implementation {#config}
 
 The default config is defined by the pipeline component factory and describes
@@ -98,7 +107,7 @@ https://github.com/explosion/spacy-transformers/blob/master/spacy_transformers/p
 Construct a `Transformer` component. One or more subsequent spaCy components can
 use the transformer outputs as features in its model, with gradients
 backpropagated to the single shared weights. The activations from the
-transformer are saved in the [`Doc._.trf_data`](#custom-attributes) extension
+transformer are saved in the [`Doc._.trf_data`](#assigned-attributes) extension
 attribute. You can also provide a callback to set additional annotations. In
 your application, you would normally use a shortcut for this and instantiate the
 component using its string name and [`nlp.add_pipe`](/api/language#create_pipe).
@@ -205,7 +214,7 @@ modifying them.
 
 Assign the extracted features to the `Doc` objects. By default, the
 [`TransformerData`](/api/transformer#transformerdata) object is written to the
-[`Doc._.trf_data`](#custom-attributes) attribute. Your `set_extra_annotations`
+[`Doc._.trf_data`](#assigned-attributes) attribute. Your `set_extra_annotations`
 callback is then called, if provided.
 
 > #### Example
@@ -383,7 +392,7 @@ are wrapped into the
 [FullTransformerBatch](/api/transformer#fulltransformerbatch) object. The
 `FullTransformerBatch` then splits out the per-document data, which is handled
 by this class. Instances of this class are typically assigned to the
-[`Doc._.trf_data`](/api/transformer#custom-attributes) extension attribute.
+[`Doc._.trf_data`](/api/transformer#assigned-attributes) extension attribute.
 
 | Name      | Description                                                                                                                                               |
 | --------- | --------------------------------------------------------------------------------------------------------------------------------------------------------- |
@@ -549,12 +558,3 @@ The following built-in functions are available:
 | Name                                           | Description                           |
 | ---------------------------------------------- | ------------------------------------- |
 | `spacy-transformers.null_annotation_setter.v1` | Don't set any additional annotations. |
-
-## Custom attributes {#custom-attributes}
-
-The component sets the following
-[custom extension attributes](/usage/processing-pipeline#custom-components-attributes):
-
-| Name             | Description                                                              |
-| ---------------- | ------------------------------------------------------------------------ |
-| `Doc._.trf_data` | Transformer tokens and outputs for the `Doc` object. ~~TransformerData~~ |
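A minimal sketch, assuming `spacy-transformers` and the `en_core_web_trf`
pipeline are installed:

```python
import spacy

nlp = spacy.load("en_core_web_trf")
doc = nlp("Transformer features are stored on the doc.")

trf_data = doc._.trf_data          # TransformerData for this Doc
print(trf_data.tensors[0].shape)   # word-piece level output tensor
```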
@@ -321,7 +321,7 @@ performed in chunks to avoid consuming too much memory. You can set the
 > ```
 
 | Name           | Description                                                                 |
-| -------------- | --------------------------------------------------------------------------- | ----------------------------------------------------------------------------------------------------------------------- |
+| -------------- | --------------------------------------------------------------------------- |
 | `queries`      | An array with one or more vectors. ~~numpy.ndarray~~                        |
 | _keyword-only_ |                                                                             |
 | `batch_size`   | The batch size to use. Default to `1024`. ~~int~~                           |
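A minimal sketch of `most_similar` on a toy table:

```python
import numpy
from spacy.vectors import Vectors

# Three 2d vectors keyed by strings.
data = numpy.asarray([[1.0, 0.0], [0.9, 0.1], [0.0, 1.0]], dtype="f")
vectors = Vectors(data=data, keys=["cat", "kitten", "car"])

queries = numpy.asarray([[1.0, 0.05]], dtype="f")
keys, best_rows, scores = vectors.most_similar(queries, n=2, batch_size=1024)
print(keys, scores)
```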
@@ -21,14 +21,14 @@ Create the vocabulary.
 > vocab = Vocab(strings=["hello", "world"])
 > ```
 
-| Name                                        | Description                                                                                                                                            |
-| ------------------------------------------- | ------------------------------------------------------------------------------------------------------------------------------------------------------ |
-| `lex_attr_getters`                          | A dictionary mapping attribute IDs to functions to compute them. Defaults to `None`. ~~Optional[Dict[str, Callable[[str], Any]]]~~                     |
-| `strings`                                   | A [`StringStore`](/api/stringstore) that maps strings to hash values, and vice versa, or a list of strings. ~~Union[List[str], StringStore]~~          |
-| `lookups`                                   | A [`Lookups`](/api/lookups) that stores the `lexeme_norm` and other large lookup tables. Defaults to `None`. ~~Optional[Lookups]~~                     |
-| `oov_prob`                                  | The default OOV probability. Defaults to `-20.0`. ~~float~~                                                                                            |
-| `vectors_name` <Tag variant="new">2.2</Tag> | A name to identify the vectors table. ~~str~~                                                                                                          |
-| `writing_system`                            | A dictionary describing the language's writing system. Typically provided by [`Language.Defaults`](/api/language#defaults). ~~Dict[str, Any]~~         |
+| Name                                        | Description                                                                                                                                             |
+| ------------------------------------------- | ------------------------------------------------------------------------------------------------------------------------------------------------------- |
+| `lex_attr_getters`                          | A dictionary mapping attribute IDs to functions to compute them. Defaults to `None`. ~~Optional[Dict[str, Callable[[str], Any]]]~~                      |
+| `strings`                                   | A [`StringStore`](/api/stringstore) that maps strings to hash values, and vice versa, or a list of strings. ~~Union[List[str], StringStore]~~           |
+| `lookups`                                   | A [`Lookups`](/api/lookups) that stores the `lexeme_norm` and other large lookup tables. Defaults to `None`. ~~Optional[Lookups]~~                      |
+| `oov_prob`                                  | The default OOV probability. Defaults to `-20.0`. ~~float~~                                                                                             |
+| `vectors_name` <Tag variant="new">2.2</Tag> | A name to identify the vectors table. ~~str~~                                                                                                           |
+| `writing_system`                            | A dictionary describing the language's writing system. Typically provided by [`Language.Defaults`](/api/language#defaults). ~~Dict[str, Any]~~          |
 | `get_noun_chunks`                           | A function that yields base noun phrases used for [`Doc.noun_chunks`](/api/doc#noun_chunks). ~~Optional[Callable[[Union[Doc, Span], Iterator[Span]]]]~~ |
 
 ## Vocab.\_\_len\_\_ {#len tag="method"}