Rename heading attribute

The `new` attribute was causing some weird issues, so it has been renamed to `version`.
This commit is contained in:
Marcus Blättermann 2022-11-14 20:53:09 +01:00
parent 5f5d09f9dc
commit 94aa3629bb
No known key found for this signature in database
GPG Key ID: A1E1F04008AC450D
49 changed files with 225 additions and 225 deletions
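The change is mechanical: the `new` field in each page's frontmatter and the `new` attribute in heading metadata become `version`, with the value left unchanged. For example, `new: 3` becomes `version: 3` in a page's frontmatter, and a heading like `## validate {id="validate",new="2",tag="command"}` becomes `## validate {id="validate",version="2",tag="command"}`.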

View File

@ -2,7 +2,7 @@
title: AttributeRuler
tag: class
source: spacy/pipeline/attributeruler.py
new: 3
version: 3
teaser: 'Pipeline component for rule-based token attribute assignment'
api_string_name: attribute_ruler
api_trainable: false

View File

@ -87,7 +87,7 @@ $ python -m spacy info [model] [--markdown] [--silent] [--exclude]
| `--help`, `-h` | Show help message and available arguments. ~~bool (flag)~~ |
| **PRINTS** | Information about your spaCy installation. |
## validate {id="validate",new="2",tag="command"}
## validate {id="validate",version="2",tag="command"}
Find all trained pipeline packages installed in the current environment and
check whether they are compatible with the currently installed version of spaCy.
@ -110,12 +110,12 @@ $ python -m spacy validate
| ---------- | -------------------------------------------------------------------- |
| **PRINTS** | Details about the compatibility of your installed pipeline packages. |
## init {id="init",new="3"}
## init {id="init",version="3"}
The `spacy init` CLI includes helpful commands for initializing training config
files and pipeline directories.
### init config {id="init-config",new="3",tag="command"}
### init config {id="init-config",version="3",tag="command"}
Initialize and save a [`config.cfg` file](/usage/training#config) using the
**recommended settings** for your use case. It works just like the
@ -147,7 +147,7 @@ $ python -m spacy init config [output_file] [--lang] [--pipeline] [--optimize] [
| `--help`, `-h` | Show help message and available arguments. ~~bool (flag)~~ |
| **CREATES** | The config file for training. |
### init fill-config {id="init-fill-config",new="3"}
### init fill-config {id="init-fill-config",version="3"}
Auto-fill a partial [.cfg file](/usage/training#config) with **all default
values**, e.g. a config generated with the
@ -183,7 +183,7 @@ $ python -m spacy init fill-config [base_path] [output_file] [--diff]
| `--help`, `-h` | Show help message and available arguments. ~~bool (flag)~~ |
| **CREATES** | Complete and auto-filled config file for training. |
### init vectors {id="init-vectors",new="3",tag="command"}
### init vectors {id="init-vectors",version="3",tag="command"}
Convert [word vectors](/usage/linguistic-features#vectors-similarity) for use
with spaCy. Will export an `nlp` object that you can use in the
@ -215,7 +215,7 @@ $ python -m spacy init vectors [lang] [vectors_loc] [output_dir] [--prune] [--tr
| `--help`, `-h` | Show help message and available arguments. ~~bool (flag)~~ |
| **CREATES** | A spaCy pipeline directory containing the vocab and vectors. |
### init labels {id="init-labels",new="3",tag="command"}
### init labels {id="init-labels",version="3",tag="command"}
Generate JSON files for the labels in the data. This helps speed up the training
process, since spaCy won't have to preprocess the data to extract the labels.
@ -287,12 +287,12 @@ $ python -m spacy convert [input_file] [output_dir] [--converter] [--file-type]
| `ner` / `conll` | NER with IOB/IOB2/BILUO tags, one token per line with columns separated by whitespace. The first column is the token and the final column is the NER tag. Sentences are separated by blank lines and documents are separated by the line `-DOCSTART- -X- O O`. Supports CoNLL 2003 NER format. See [sample data](%%GITHUB_SPACY/extra/example_data/ner_example_data). |
| `iob` | NER with IOB/IOB2/BILUO tags, one sentence per line with tokens separated by whitespace and annotation separated by `\|`, either `word\|B-ENT` or `word\|POS\|B-ENT`. See [sample data](%%GITHUB_SPACY/extra/example_data/ner_example_data). |
## debug {id="debug",new="3"}
## debug {id="debug",version="3"}
The `spacy debug` CLI includes helpful commands for debugging and profiling your
configs, data and implementations.
### debug config {id="debug-config",new="3",tag="command"}
### debug config {id="debug-config",version="3",tag="command"}
Debug a [`config.cfg` file](/usage/training#config) and show validation errors.
The command will create all objects in the tree and validate them. Note that
@ -893,7 +893,7 @@ $ python -m spacy debug profile [model] [inputs] [--n-texts]
| `--help`, `-h` | Show help message and available arguments. ~~bool (flag)~~ |
| **PRINTS** | Profiling information for the pipeline. |
### debug model {id="debug-model",new="3",tag="command"}
### debug model {id="debug-model",version="3",tag="command"}
Debug a Thinc [`Model`](https://thinc.ai/docs/api-model) by running it on a
sample text and checking how it updates its internal weights and parameters.
@ -1061,7 +1061,7 @@ $ python -m spacy train [config_path] [--output] [--code] [--verbose] [--gpu-id]
| overrides | Config parameters to override. Should be options starting with `--` that correspond to the config section and value to override, e.g. `--paths.train ./train.spacy`. ~~Any (option/flag)~~ |
| **CREATES** | The final trained pipeline and the best trained pipeline. |
### Calling the training function from Python {id="train-function",new="3.2"}
### Calling the training function from Python {id="train-function",version="3.2"}
The training CLI exposes a `train` helper function that lets you run the
training just like `spacy train`. Usually it's easier to use the command line
@ -1084,7 +1084,7 @@ directly, but if you need to kick off training from code this is how to do it.
| `use_gpu` | Which GPU to use. Defaults to -1 for no GPU. ~~int~~ |
| `overrides` | Values to override config settings. ~~Dict[str, Any]~~ |
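Not part of the commit, but for orientation: calling this helper from Python looks roughly like the sketch below (assuming spaCy v3.2+; the config and data paths are placeholders).

```python
from spacy.cli.train import train

# Rough equivalent of:
#   python -m spacy train config.cfg --output ./output --paths.train ./train.spacy --paths.dev ./dev.spacy
train(
    "config.cfg",
    output_path="./output",
    use_gpu=-1,  # -1 = train on CPU
    overrides={"paths.train": "./train.spacy", "paths.dev": "./dev.spacy"},
)
```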
## pretrain {id="pretrain",new="2.1",tag="command,experimental"}
## pretrain {id="pretrain",version="2.1",tag="command,experimental"}
Pretrain the "token to vector" ([`Tok2vec`](/api/tok2vec)) layer of pipeline
components on raw text, using an approximate language-modeling objective.
@ -1132,7 +1132,7 @@ $ python -m spacy pretrain [config_path] [output_dir] [--code] [--resume-path] [
| overrides | Config parameters to override. Should be options starting with `--` that correspond to the config section and value to override, e.g. `--training.dropout 0.2`. ~~Any (option/flag)~~ |
| **CREATES** | The pretrained weights that can be used to initialize `spacy train`. |
## evaluate {id="evaluate",new="2",tag="command"}
## evaluate {id="evaluate",version="2",tag="command"}
Evaluate a trained pipeline. Expects a loadable spaCy pipeline (package name or
path) and evaluation data in the
@ -1162,7 +1162,7 @@ $ python -m spacy evaluate [model] [data_path] [--output] [--code] [--gold-prepr
| `--help`, `-h` | Show help message and available arguments. ~~bool (flag)~~ |
| **CREATES** | Training results and optional metrics and visualizations. |
## find-threshold {id="find-threshold",new="3.5",tag="command"}
## find-threshold {id="find-threshold",version="3.5",tag="command"}
Runs prediction trials for a trained model with varying thresholds to maximize
the specified metric. The search space for the threshold is traversed linearly
@ -1281,7 +1281,7 @@ $ python -m spacy package [input_dir] [output_dir] [--code] [--meta-path] [--cre
| `--help`, `-h` | Show help message and available arguments. ~~bool (flag)~~ |
| **CREATES** | A Python package containing the spaCy pipeline. |
## project {id="project",new="3"}
## project {id="project",version="3"}
The `spacy project` CLI includes subcommands for working with
[spaCy projects](/usage/projects), end-to-end workflows for building and
@ -1543,7 +1543,7 @@ $ python -m spacy project dvc [project_dir] [workflow] [--force] [--verbose] [--
| `--help`, `-h` | Show help message and available arguments. ~~bool (flag)~~ |
| **CREATES** | A `dvc.yaml` file in the project directory, based on the steps defined in the given workflow. |
## huggingface-hub {id="huggingface-hub",new="3.1"}
## huggingface-hub {id="huggingface-hub",version="3.1"}
The `spacy huggingface-hub` CLI includes commands for uploading your trained
spaCy pipelines to the [Hugging Face Hub](https://huggingface.co/).

View File

@ -3,7 +3,7 @@ title: Corpus
teaser: An annotated corpus
tag: class
source: spacy/training/corpus.py
new: 3
version: 3
---
This class manages annotated corpora and can be used for training and

View File

@ -14,7 +14,7 @@ vocabulary data. For an overview of label schemes used by the models, see the
[models directory](/models). Each trained pipeline documents the label schemes
used in its components, depending on the data it was trained on.
## Training config {id="config",new="3"}
## Training config {id="config",version="3"}
Config files define the training process and pipeline and can be passed to
[`spacy train`](/api/cli#train). They use
@ -257,7 +257,7 @@ Also see the usage guides on the
## Training data {id="training"}
### Binary training format {id="binary-training",new="3"}
### Binary training format {id="binary-training",version="3"}
> #### Example
>
@ -466,7 +466,7 @@ gold_dict = {"entities": [(0, 12, "PERSON")],
example = Example.from_dict(doc, gold_dict)
```
## Lexical data for vocabulary {id="vocab-jsonl",new="2"}
## Lexical data for vocabulary {id="vocab-jsonl",version="2"}
This data file can be provided via the `vocab_data` setting in the
`[initialize]` block of the training config to pre-define the lexical data to

View File

@ -2,7 +2,7 @@
title: DependencyMatcher
teaser: Match subtrees within a dependency parse
tag: class
new: 3
version: 3
source: spacy/matcher/dependencymatcher.pyx
---

View File

@ -155,7 +155,7 @@ applied to the `Doc` in order. Both [`__call__`](/api/dependencyparser#call) and
| `batch_size` | The number of documents to buffer. Defaults to `128`. ~~int~~ |
| **YIELDS** | The processed documents in order. ~~Doc~~ |
## DependencyParser.initialize {id="initialize",tag="method",new="3"}
## DependencyParser.initialize {id="initialize",tag="method",version="3"}
Initialize the component for training. `get_examples` should be a function that
returns an iterable of [`Example`](/api/example) objects. **At least one example
@ -432,7 +432,7 @@ The labels currently added to the component.
| ----------- | ------------------------------------------------------ |
| **RETURNS** | The labels added to the component. ~~Tuple[str, ...]~~ |
## DependencyParser.label_data {id="label_data",tag="property",new="3"}
## DependencyParser.label_data {id="label_data",tag="property",version="3"}
The labels currently added to the component and their internal meta information.
This is the data generated by [`init labels`](/api/cli#init-labels) and used by

View File

@ -115,7 +115,7 @@ Get the number of tokens in the document.
| ----------- | --------------------------------------------- |
| **RETURNS** | The number of tokens in the document. ~~int~~ |
## Doc.set_extension {id="set_extension",tag="classmethod",new="2"}
## Doc.set_extension {id="set_extension",tag="classmethod",version="2"}
Define a custom attribute on the `Doc` which becomes available via `Doc._`. For
details, see the documentation on
@ -140,7 +140,7 @@ details, see the documentation on
| `setter` | Setter function that takes the `Doc` and a value, and modifies the object. Is called when the user writes to the `Doc._` attribute. ~~Optional[Callable[[Doc, Any], None]]~~ |
| `force` | Force overwriting existing attribute. ~~bool~~ |
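Not part of the commit, but a minimal `Doc.set_extension` sketch illustrating the API documented here (assuming spaCy v3.x; the attribute name is an arbitrary example):

```python
import spacy
from spacy.tokens import Doc

# Register a custom attribute with a default value; it becomes available as doc._.source_url
Doc.set_extension("source_url", default=None)

nlp = spacy.blank("en")
doc = nlp("Hello world")
doc._.source_url = "https://example.com"
print(doc._.source_url)
```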
## Doc.get_extension {id="get_extension",tag="classmethod",new="2"}
## Doc.get_extension {id="get_extension",tag="classmethod",version="2"}
Look up a previously registered extension by name. Returns a 4-tuple
`(default, method, getter, setter)` if the extension is registered. Raises a
@ -160,7 +160,7 @@ Look up a previously registered extension by name. Returns a 4-tuple
| `name` | Name of the extension. ~~str~~ |
| **RETURNS** | A `(default, method, getter, setter)` tuple of the extension. ~~Tuple[Optional[Any], Optional[Callable], Optional[Callable], Optional[Callable]]~~ |
## Doc.has_extension {id="has_extension",tag="classmethod",new="2"}
## Doc.has_extension {id="has_extension",tag="classmethod",version="2"}
Check whether an extension has been registered on the `Doc` class.
@ -177,7 +177,7 @@ Check whether an extension has been registered on the `Doc` class.
| `name` | Name of the extension to check. ~~str~~ |
| **RETURNS** | Whether the extension has been registered. ~~bool~~ |
## Doc.remove_extension {id="remove_extension",tag="classmethod",new="2.0.12"}
## Doc.remove_extension {id="remove_extension",tag="classmethod",version="2.0.12"}
Remove a previously registered extension.
@ -195,7 +195,7 @@ Remove a previously registered extension.
| `name` | Name of the extension. ~~str~~ |
| **RETURNS** | A `(default, method, getter, setter)` tuple of the removed extension. ~~Tuple[Optional[Any], Optional[Callable], Optional[Callable], Optional[Callable]]~~ |
## Doc.char_span {id="char_span",tag="method",new="2"}
## Doc.char_span {id="char_span",tag="method",version="2"}
Create a `Span` object from the slice `doc.text[start_idx:end_idx]`. Returns
`None` if the character indices don't map to a valid span using the default
@ -219,7 +219,7 @@ alignment mode `"strict"`.
| `alignment_mode` | How character indices snap to token boundaries. Options: `"strict"` (no snapping), `"contract"` (span of all tokens completely within the character span), `"expand"` (span of all tokens at least partially covered by the character span). Defaults to `"strict"`. ~~str~~ |
| **RETURNS** | The newly constructed object or `None`. ~~Optional[Span]~~ |
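Not part of the commit, but a minimal `Doc.char_span` sketch (assuming spaCy v3.x):

```python
import spacy

nlp = spacy.blank("en")
doc = nlp("Apple is looking at buying a U.K. startup")

# Characters 0-5 line up exactly with the token "Apple", so a Span is returned;
# offsets that cut through a token would return None under the default "strict" mode.
span = doc.char_span(0, 5, label="ORG")
print(span.text, span.label_)  # Apple ORG
```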
## Doc.set_ents {id="set_ents",tag="method",new="3"}
## Doc.set_ents {id="set_ents",tag="method",version="3"}
Set the named entities in the document.
@ -379,7 +379,7 @@ array of attributes.
| `exclude` | String names of [serialization fields](#serialization-fields) to exclude. ~~Iterable[str]~~ |
| **RETURNS** | The `Doc` itself. ~~Doc~~ |
## Doc.from_docs {id="from_docs",tag="staticmethod",new="3"}
## Doc.from_docs {id="from_docs",tag="staticmethod",version="3"}
Concatenate multiple `Doc` objects to form a new one. Raises an error if the
`Doc` objects do not all share the same `Vocab`.
@ -408,7 +408,7 @@ Concatenate multiple `Doc` objects to form a new one. Raises an error if the
| `exclude` <Tag variant="new">3.3</Tag> | String names of Doc attributes to exclude. Supported: `spans`, `tensor`, `user_data`. ~~Iterable[str]~~ |
| **RETURNS** | The new `Doc` object containing the other docs, or `None` if `docs` is empty or `None`. ~~Optional[Doc]~~ |
## Doc.to_disk {id="to_disk",tag="method",new="2"}
## Doc.to_disk {id="to_disk",tag="method",version="2"}
Save the current state to a directory.
@ -424,7 +424,7 @@ Save the current state to a directory.
| _keyword-only_ | |
| `exclude` | String names of [serialization fields](#serialization-fields) to exclude. ~~Iterable[str]~~ |
## Doc.from_disk {id="from_disk",tag="method",new="2"}
## Doc.from_disk {id="from_disk",tag="method",version="2"}
Loads state from a directory. Modifies the object in place and returns it.
@ -498,7 +498,7 @@ deprecated [`JSON training format`](/api/data-formats#json-input).
| `underscore` | Optional list of string names of custom `Doc` attributes. Attribute values need to be JSON-serializable. Values will be added to an `"_"` key in the data, e.g. `"_": {"foo": "bar"}`. ~~Optional[List[str]]~~ |
| **RETURNS** | The data in JSON format. ~~Dict[str, Any]~~ |
## Doc.from_json {id="from_json",tag="method",new="3.3.1"}
## Doc.from_json {id="from_json",tag="method",version="3.3.1"}
Deserializes a document from JSON, i.e. generates a document from the provided
JSON data as generated by [`Doc.to_json()`](/api/doc#to_json).
@ -520,7 +520,7 @@ JSON data as generated by [`Doc.to_json()`](/api/doc#to_json).
| `validate` | Whether to validate the JSON input against the expected schema for detailed debugging. Defaults to `False`. ~~bool~~ |
| **RETURNS** | A `Doc` corresponding to the provided JSON. ~~Doc~~ |
## Doc.retokenize {id="retokenize",tag="contextmanager",new="2.1"}
## Doc.retokenize {id="retokenize",tag="contextmanager",version="2.1"}
Context manager to handle retokenization of the `Doc`. Modifications to the
`Doc`'s tokenization are stored, and then made all at once when the context

View File

@ -1,7 +1,7 @@
---
title: DocBin
tag: class
new: 2.2
version: 2.2
teaser: Pack Doc objects for binary serialization
source: spacy/tokens/_serialize.py
---
@ -150,7 +150,7 @@ Deserialize the `DocBin`'s annotations from a bytestring.
| `bytes_data` | The data to load from. ~~bytes~~ |
| **RETURNS** | The loaded `DocBin`. ~~DocBin~~ |
## DocBin.to_disk {id="to_disk",tag="method",new="3"}
## DocBin.to_disk {id="to_disk",tag="method",version="3"}
Save the serialized `DocBin` to a file. Typically uses the `.spacy` extension
and the result can be used as the input data for
@ -168,7 +168,7 @@ and the result can be used as the input data for
| -------- | -------------------------------------------------------------------------- |
| `path` | The file path, typically with the `.spacy` extension. ~~Union[str, Path]~~ |
## DocBin.from_disk {id="from_disk",tag="method",new="3"}
## DocBin.from_disk {id="from_disk",tag="method",version="3"}
Load a serialized `DocBin` from a file. Typically uses the `.spacy` extension.

View File

@ -2,7 +2,7 @@
title: EditTreeLemmatizer
tag: class
source: spacy/pipeline/edit_tree_lemmatizer.py
new: 3.3
version: 3.3
teaser: 'Pipeline component for lemmatization'
api_base_class: /api/pipe
api_string_name: trainable_lemmatizer
@ -138,7 +138,7 @@ and [`pipe`](/api/edittreelemmatizer#pipe) delegate to the
| `batch_size` | The number of documents to buffer. Defaults to `128`. ~~int~~ |
| **YIELDS** | The processed documents in order. ~~Doc~~ |
## EditTreeLemmatizer.initialize {id="initialize",tag="method",new="3"}
## EditTreeLemmatizer.initialize {id="initialize",tag="method",version="3"}
Initialize the component for training. `get_examples` should be a function that
returns an iterable of [`Example`](/api/example) objects. **At least one example
@ -371,7 +371,7 @@ identifiers of edit trees.
| ----------- | ------------------------------------------------------ |
| **RETURNS** | The labels added to the component. ~~Tuple[str, ...]~~ |
## EditTreeLemmatizer.label_data {id="label_data",tag="property",new="3"}
## EditTreeLemmatizer.label_data {id="label_data",tag="property",version="3"}
The labels currently added to the component and their internal meta information.
This is the data generated by [`init labels`](/api/cli#init-labels) and used by

View File

@ -2,7 +2,7 @@
title: EntityLinker
tag: class
source: spacy/pipeline/entity_linker.py
new: 2.2
version: 2.2
teaser: 'Pipeline component for named entity linking and disambiguation'
api_base_class: /api/pipe
api_string_name: entity_linker
@ -161,7 +161,7 @@ applied to the `Doc` in order. Both [`__call__`](/api/entitylinker#call) and
| `batch_size` | The number of documents to buffer. Defaults to `128`. ~~int~~ |
| **YIELDS** | The processed documents in order. ~~Doc~~ |
## EntityLinker.set_kb {id="set_kb",tag="method",new="3"}
## EntityLinker.set_kb {id="set_kb",tag="method",version="3"}
The `kb_loader` should be a function that takes a `Vocab` instance and creates
the `KnowledgeBase`, ensuring that the strings of the knowledge base are synced
@ -183,7 +183,7 @@ with the current vocab.
| ----------- | ---------------------------------------------------------------------------------------------------------------- |
| `kb_loader` | Function that creates a [`KnowledgeBase`](/api/kb) from a `Vocab` instance. ~~Callable[[Vocab], KnowledgeBase]~~ |
## EntityLinker.initialize {id="initialize",tag="method",new="3"}
## EntityLinker.initialize {id="initialize",tag="method",version="3"}
Initialize the component for training. `get_examples` should be a function that
returns an iterable of [`Example`](/api/example) objects. **At least one example

View File

@ -151,7 +151,7 @@ applied to the `Doc` in order. Both [`__call__`](/api/entityrecognizer#call) and
| `batch_size` | The number of documents to buffer. Defaults to `128`. ~~int~~ |
| **YIELDS** | The processed documents in order. ~~Doc~~ |
## EntityRecognizer.initialize {id="initialize",tag="method",new="3"}
## EntityRecognizer.initialize {id="initialize",tag="method",version="3"}
Initialize the component for training. `get_examples` should be a function that
returns an iterable of [`Example`](/api/example) objects. **At least one example
@ -427,7 +427,7 @@ The labels currently added to the component.
| ----------- | ------------------------------------------------------ |
| **RETURNS** | The labels added to the component. ~~Tuple[str, ...]~~ |
## EntityRecognizer.label_data {id="label_data",tag="property",new="3"}
## EntityRecognizer.label_data {id="label_data",tag="property",version="3"}
The labels currently added to the component and their internal meta information.
This is the data generated by [`init labels`](/api/cli#init-labels) and used by

View File

@ -2,7 +2,7 @@
title: EntityRuler
tag: class
source: spacy/pipeline/entityruler.py
new: 2.1
version: 2.1
teaser: 'Pipeline component for rule-based named entity recognition'
api_string_name: entity_ruler
api_trainable: false
@ -96,7 +96,7 @@ be a token pattern (list) or a phrase pattern (string). For example:
| `ent_id_sep` | Separator used internally for entity IDs. Defaults to `"\|\|"`. ~~str~~ |
| `patterns` | Optional patterns to load in on initialization. ~~Optional[List[Dict[str, Union[str, List[dict]]]]]~~ |
## EntityRuler.initialize {id="initialize",tag="method",new="3"}
## EntityRuler.initialize {id="initialize",tag="method",version="3"}
Initialize the component with data. Used before training to load in rules
from a [pattern file](/usage/rule-based-matching/#entityruler-files). This
@ -210,7 +210,7 @@ of dicts) or a phrase pattern (string). For more details, see the usage guide on
| ---------- | ---------------------------------------------------------------- |
| `patterns` | The patterns to add. ~~List[Dict[str, Union[str, List[dict]]]]~~ |
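Not part of the commit, but a minimal sketch of adding patterns to an `EntityRuler` (assuming spaCy v3.x; the labels and patterns are arbitrary examples):

```python
import spacy

nlp = spacy.blank("en")
ruler = nlp.add_pipe("entity_ruler")
patterns = [
    {"label": "ORG", "pattern": "Apple"},                                     # phrase pattern (string)
    {"label": "GPE", "pattern": [{"LOWER": "san"}, {"LOWER": "francisco"}]},  # token pattern (list of dicts)
]
ruler.add_patterns(patterns)

doc = nlp("Apple opened an office in San Francisco")
print([(ent.text, ent.label_) for ent in doc.ents])
# [('Apple', 'ORG'), ('San Francisco', 'GPE')]
```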
## EntityRuler.remove {id="remove",tag="method",new="3.2.1"}
## EntityRuler.remove {id="remove",tag="method",version="3.2.1"}
Remove a pattern by its ID from the entity ruler. A `ValueError` is raised if
the ID does not exist.
@ -307,7 +307,7 @@ All labels present in the match patterns.
| ----------- | -------------------------------------- |
| **RETURNS** | The string labels. ~~Tuple[str, ...]~~ |
## EntityRuler.ent_ids {id="ent_ids",tag="property",new="2.2.2"}
## EntityRuler.ent_ids {id="ent_ids",tag="property",version="2.2.2"}
All entity IDs present in the `id` properties of the match patterns.

View File

@ -3,7 +3,7 @@ title: Example
teaser: A training instance
tag: class
source: spacy/training/example.pyx
new: 3.0
version: 3.0
---
An `Example` holds the information for one training instance. It stores two
@ -282,7 +282,7 @@ Split one `Example` into multiple `Example` objects, one for each sentence.
| ----------- | ---------------------------------------------------------------------------- |
| **RETURNS** | List of `Example` objects, one for each original sentence. ~~List[Example]~~ |
## Alignment {id="alignment-object",new="3"}
## Alignment {id="alignment-object",version="3"}
Calculate alignment tables between two tokenizations.

View File

@ -5,7 +5,7 @@ teaser:
(ontology)
tag: class
source: spacy/kb/kb.pyx
new: 2.2
version: 2.2
---
The `KnowledgeBase` object is an abstract class providing a method to generate

View File

@ -5,7 +5,7 @@ teaser:
information in-memory.
tag: class
source: spacy/kb/kb_in_memory.pyx
new: 3.5
version: 3.5
---
The `InMemoryLookupKB` class inherits from [`KnowledgeBase`](/api/kb) and

View File

@ -44,7 +44,7 @@ information in [`Language.meta`](/api/language#meta) and not to configure the
| `create_tokenizer` | Optional function that receives the `nlp` object and returns a tokenizer. ~~Callable[[Language], Callable[[str], Doc]]~~ |
| `batch_size` | Default batch size for [`pipe`](#pipe) and [`evaluate`](#evaluate). Defaults to `1000`. ~~int~~ |
## Language.from_config {id="from_config",tag="classmethod",new="3"}
## Language.from_config {id="from_config",tag="classmethod",version="3"}
Create a `Language` object from a loaded config. Will set up the tokenizer and
language data, add pipeline components based on the pipeline and add pipeline
@ -76,7 +76,7 @@ spaCy loads a model under the hood based on its
| `validate` | Whether to validate the component config and arguments against the types expected by the factory. Defaults to `True`. ~~bool~~ |
| **RETURNS** | The initialized object. ~~Language~~ |
## Language.component {id="component",tag="classmethod",new="3"}
## Language.component {id="component",tag="classmethod",version="3"}
Register a custom pipeline component under a given name. This allows
initializing the component by name using
@ -209,7 +209,7 @@ tokenization is skipped but the rest of the pipeline is run.
| `n_process` | Number of processors to use. Defaults to `1`. ~~int~~ |
| **YIELDS** | Documents in the order of the original text. ~~Doc~~ |
## Language.set_error_handler {id="set_error_handler",tag="method",new="3"}
## Language.set_error_handler {id="set_error_handler",tag="method",version="3"}
Define a callback that will be invoked when an error is thrown during processing
of one or more documents. Specifically, this function will call
@ -231,7 +231,7 @@ being processed, and the original error.
| --------------- | -------------------------------------------------------------------------------------------------------------- |
| `error_handler` | A function that performs custom error handling. ~~Callable[[str, Callable[[Doc], Doc], List[Doc], Exception]~~ |
## Language.initialize {id="initialize",tag="method",new="3"}
## Language.initialize {id="initialize",tag="method",version="3"}
Initialize the pipeline for training and return an
[`Optimizer`](https://thinc.ai/docs/api-optimizers). Under the hood, it uses the
@ -282,7 +282,7 @@ objects.
| `sgd` | An optimizer. Will be created via [`create_optimizer`](#create_optimizer) if not set. ~~Optional[Optimizer]~~ |
| **RETURNS** | The optimizer. ~~Optimizer~~ |
## Language.resume_training {id="resume_training",tag="method,experimental",new="3"}
## Language.resume_training {id="resume_training",tag="method,experimental",version="3"}
Continue training a trained pipeline. Create and return an optimizer, and
initialize "rehearsal" for any pipeline component that has a `rehearse` method.
@ -342,7 +342,7 @@ and custom registered functions if needed. See the
| `component_cfg` | Optional dictionary of keyword arguments for components, keyed by component names. Defaults to `None`. ~~Optional[Dict[str, Dict[str, Any]]]~~ |
| **RETURNS** | The updated `losses` dictionary. ~~Dict[str, float]~~ |
## Language.rehearse {id="rehearse",tag="method,experimental",new="3"}
## Language.rehearse {id="rehearse",tag="method,experimental",version="3"}
Perform a "rehearsal" update from a batch of data. Rehearsal updates teach the
current model to make predictions similar to an initial model, to try to address
@ -409,7 +409,7 @@ their original weights after the block.
| -------- | ------------------------------------------------------ |
| `params` | A dictionary of parameters keyed by model ID. ~~dict~~ |
## Language.add_pipe {id="add_pipe",tag="method",new="2"}
## Language.add_pipe {id="add_pipe",tag="method",version="2"}
Add a component to the processing pipeline. Expects a name that maps to a
component factory registered using
@ -458,7 +458,7 @@ component, adds it to the pipeline and returns it.
| `validate` <Tag variant="new">3</Tag> | Whether to validate the component config and arguments against the types expected by the factory. Defaults to `True`. ~~bool~~ |
| **RETURNS** | The pipeline component. ~~Callable[[Doc], Doc]~~ |
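Not part of the commit, but a minimal `Language.add_pipe` sketch (assuming spaCy v3.x; the chosen components are arbitrary examples):

```python
import spacy

nlp = spacy.blank("en")
# Add a built-in component by the name of its registered factory
nlp.add_pipe("sentencizer")
# Components can also be positioned, e.g. first=True or before/after another component
nlp.add_pipe("entity_ruler", first=True)
print(nlp.pipe_names)  # ['entity_ruler', 'sentencizer']
```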
## Language.create_pipe {id="create_pipe",tag="method",new="2"}
## Language.create_pipe {id="create_pipe",tag="method",version="2"}
Create a pipeline component from a factory.
@ -487,7 +487,7 @@ To create a component and add it to the pipeline, you should always use
| `validate` <Tag variant="new">3</Tag> | Whether to validate the component config and arguments against the types expected by the factory. Defaults to `True`. ~~bool~~ |
| **RETURNS** | The pipeline component. ~~Callable[[Doc], Doc]~~ |
## Language.has_factory {id="has_factory",tag="classmethod",new="3"}
## Language.has_factory {id="has_factory",tag="classmethod",version="3"}
Check whether a factory name is registered on the `Language` class or subclass.
Will check for
@ -514,7 +514,7 @@ the `Language` base class, available to all subclasses.
| `name` | Name of the pipeline factory to check. ~~str~~ |
| **RETURNS** | Whether a factory of that name is registered on the class. ~~bool~~ |
## Language.has_pipe {id="has_pipe",tag="method",new="2"}
## Language.has_pipe {id="has_pipe",tag="method",version="2"}
Check whether a component is present in the pipeline. Equivalent to
`name in nlp.pipe_names`.
@ -536,7 +536,7 @@ Check whether a component is present in the pipeline. Equivalent to
| `name` | Name of the pipeline component to check. ~~str~~ |
| **RETURNS** | Whether a component of that name exists in the pipeline. ~~bool~~ |
## Language.get_pipe {id="get_pipe",tag="method",new="2"}
## Language.get_pipe {id="get_pipe",tag="method",version="2"}
Get a pipeline component for a given component name.
@ -552,7 +552,7 @@ Get a pipeline component for a given component name.
| `name` | Name of the pipeline component to get. ~~str~~ |
| **RETURNS** | The pipeline component. ~~Callable[[Doc], Doc]~~ |
## Language.replace_pipe {id="replace_pipe",tag="method",new="2"}
## Language.replace_pipe {id="replace_pipe",tag="method",version="2"}
Replace a component in the pipeline and return the new component.
@ -580,7 +580,7 @@ and instead expects the **name of a component factory** registered using
| `validate` <Tag variant="new">3</Tag> | Whether to validate the component config and arguments against the types expected by the factory. Defaults to `True`. ~~bool~~ |
| **RETURNS** | The new pipeline component. ~~Callable[[Doc], Doc]~~ |
## Language.rename_pipe {id="rename_pipe",tag="method",new="2"}
## Language.rename_pipe {id="rename_pipe",tag="method",version="2"}
Rename a component in the pipeline. Useful to create custom names for
pre-defined and pre-loaded components. To change the default name of a component
@ -598,7 +598,7 @@ added to the pipeline, you can also use the `name` argument on
| `old_name` | Name of the component to rename. ~~str~~ |
| `new_name` | New name of the component. ~~str~~ |
## Language.remove_pipe {id="remove_pipe",tag="method",new="2"}
## Language.remove_pipe {id="remove_pipe",tag="method",version="2"}
Remove a component from the pipeline. Returns the removed component name and
component function.
@ -615,7 +615,7 @@ component function.
| `name` | Name of the component to remove. ~~str~~ |
| **RETURNS** | A `(name, component)` tuple of the removed component. ~~Tuple[str, Callable[[Doc], Doc]]~~ |
## Language.disable_pipe {id="disable_pipe",tag="method",new="3"}
## Language.disable_pipe {id="disable_pipe",tag="method",version="3"}
Temporarily disable a pipeline component so it's not run as part of the
pipeline. Disabled components are listed in
@ -641,7 +641,7 @@ does nothing.
| ------ | ----------------------------------------- |
| `name` | Name of the component to disable. ~~str~~ |
## Language.enable_pipe {id="enable_pipe",tag="method",new="3"}
## Language.enable_pipe {id="enable_pipe",tag="method",version="3"}
Enable a previously disabled component (e.g. via
[`Language.disable_pipes`](/api/language#disable_pipes)) so it's run as part of
@ -663,7 +663,7 @@ already enabled, this method does nothing.
| ------ | ---------------------------------------- |
| `name` | Name of the component to enable. ~~str~~ |
## Language.select_pipes {id="select_pipes",tag="contextmanager, method",new="3"}
## Language.select_pipes {id="select_pipes",tag="contextmanager, method",version="3"}
Disable one or more pipeline components. If used as a context manager, the
pipeline will be restored to the initial state at the end of the block.
@ -706,7 +706,7 @@ As of spaCy v3.0, the `disable_pipes` method has been renamed to `select_pipes`:
| `enable` | Name(s) of pipeline component(s) that will not be disabled. ~~Optional[Union[str, Iterable[str]]]~~ |
| **RETURNS** | The disabled pipes that can be restored by calling the object's `.restore()` method. ~~DisabledPipes~~ |
## Language.get_factory_meta {id="get_factory_meta",tag="classmethod",new="3"}
## Language.get_factory_meta {id="get_factory_meta",tag="classmethod",version="3"}
Get the factory meta information for a given pipeline component name. Expects
the name of the component **factory**. The factory meta is an instance of the
@ -728,7 +728,7 @@ information about the component and its default provided by the
| `name` | The factory name. ~~str~~ |
| **RETURNS** | The factory meta. ~~FactoryMeta~~ |
## Language.get_pipe_meta {id="get_pipe_meta",tag="method",new="3"}
## Language.get_pipe_meta {id="get_pipe_meta",tag="method",version="3"}
Get the factory meta information for a given pipeline component name. Expects
the name of the component **instance** in the pipeline. The factory meta is an
@ -751,7 +751,7 @@ contains the information about the component and its default provided by the
| `name` | The pipeline component name. ~~str~~ |
| **RETURNS** | The factory meta. ~~FactoryMeta~~ |
## Language.analyze_pipes {id="analyze_pipes",tag="method",new="3"}
## Language.analyze_pipes {id="analyze_pipes",tag="method",version="3"}
Analyze the current pipeline components and show a summary of the attributes
they assign and require, and the scores they set. The data is based on the
@ -840,7 +840,7 @@ token.ent_iob, token.ent_type
| `pretty` | Pretty-print the results as a table. Defaults to `False`. ~~bool~~ |
| **RETURNS** | Dictionary containing the pipe analysis, keyed by `"summary"` (component meta by pipe), `"problems"` (attribute names by pipe) and `"attrs"` (pipes that assign and require an attribute, keyed by attribute). ~~Optional[Dict[str, Any]]~~ |
## Language.replace_listeners {id="replace_listeners",tag="method",new="3"}
## Language.replace_listeners {id="replace_listeners",tag="method",version="3"}
Find [listener layers](/usage/embeddings-transformers#embedding-layers)
(connecting to a shared token-to-vector embedding component) of a given pipeline
@ -911,7 +911,7 @@ information is expressed in the [`config.cfg`](/api/data-formats#config).
| ----------- | --------------------------------- |
| **RETURNS** | The meta data. ~~Dict[str, Any]~~ |
## Language.config {id="config",tag="property",new="3"}
## Language.config {id="config",tag="property",version="3"}
Export a trainable [`config.cfg`](/api/data-formats#config) for the current
`nlp` object. Includes the current pipeline, all configs used to create the
@ -932,7 +932,7 @@ subclass of the built-in `dict`. It supports the additional methods `to_disk`
| ----------- | ---------------------- |
| **RETURNS** | The config. ~~Config~~ |
## Language.to_disk {id="to_disk",tag="method",new="2"}
## Language.to_disk {id="to_disk",tag="method",version="2"}
Save the current state to a directory. Under the hood, this method delegates to
the `to_disk` methods of the individual pipeline components, if available. This
@ -951,7 +951,7 @@ will be saved to disk.
| _keyword-only_ | |
| `exclude` | Names of pipeline components or [serialization fields](#serialization-fields) to exclude. ~~Iterable[str]~~ |
## Language.from_disk {id="from_disk",tag="method",new="2"}
## Language.from_disk {id="from_disk",tag="method",version="2"}
Loads state from a directory, including all data that was saved with the
`Language` object. Modifies the object in place and returns it.
@ -1117,7 +1117,7 @@ serialization by passing in the string names via the `exclude` argument.
| `meta` | The meta data, available as [`Language.meta`](/api/language#meta). |
| ... | String names of pipeline components, e.g. `"ner"`. |
## FactoryMeta {id="factorymeta",new="3",tag="dataclass"}
## FactoryMeta {id="factorymeta",version="3",tag="dataclass"}
The `FactoryMeta` contains the information about the component and its default
provided by the [`@Language.component`](/api/language#component) or

View File

@ -2,7 +2,7 @@
title: Lemmatizer
tag: class
source: spacy/pipeline/lemmatizer.py
new: 3
version: 3
teaser: 'Pipeline component for lemmatization'
api_string_name: lemmatizer
api_trainable: false

View File

@ -3,7 +3,7 @@ title: Lookups
teaser: A container for large lookup tables and dictionaries
tag: class
source: spacy/lookups.py
new: 2.2
version: 2.2
---
This class allows convenient access to large lookup tables and dictionaries,

View File

@ -143,7 +143,7 @@ the match.
| `with_alignments` <Tag variant="new">3.0.6</Tag> | Return match alignment information as part of the match tuple as `List[int]` with the same length as the matched span. Each entry denotes the corresponding index of the token in the pattern. If `as_spans` is set to `True`, this setting is ignored. Defaults to `False`. ~~bool~~ |
| **RETURNS** | A list of `(match_id, start, end)` tuples, describing the matches. A match tuple describes a span `doc[start:end]`. The `match_id` is the ID of the added match pattern. If `as_spans` is set to `True`, a list of `Span` objects is returned instead. ~~Union[List[Tuple[int, int, int]], List[Span]]~~ |
## Matcher.\_\_len\_\_ {id="len",tag="method",new="2"}
## Matcher.\_\_len\_\_ {id="len",tag="method",version="2"}
Get the number of rules added to the matcher. Note that this only returns the
number of rules (identical with the number of IDs), not the number of individual
@ -162,7 +162,7 @@ patterns.
| ----------- | ---------------------------- |
| **RETURNS** | The number of rules. ~~int~~ |
## Matcher.\_\_contains\_\_ {id="contains",tag="method",new="2"}
## Matcher.\_\_contains\_\_ {id="contains",tag="method",version="2"}
Check whether the matcher contains rules for a match ID.
@ -180,7 +180,7 @@ Check whether the matcher contains rules for a match ID.
| `key` | The match ID. ~~str~~ |
| **RETURNS** | Whether the matcher contains rules for this match ID. ~~bool~~ |
## Matcher.add {id="add",tag="method",new="2"}
## Matcher.add {id="add",tag="method",version="2"}
Add a rule to the matcher, consisting of an ID key, one or more patterns, and an
optional callback function to act on the matches. The callback function will
@ -226,7 +226,7 @@ patterns = [[{"TEXT": "Google"}, {"TEXT": "Now"}], [{"TEXT": "GoogleNow"}]]
| `on_match` | Callback function to act on matches. Takes the arguments `matcher`, `doc`, `i` and `matches`. ~~Optional[Callable[[Matcher, Doc, int, List[tuple], Any]]~~ |
| `greedy` <Tag variant="new">3</Tag> | Optional filter for greedy matches. Can either be `"FIRST"` or `"LONGEST"`. ~~Optional[str]~~ |
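Not part of the commit, but a minimal `Matcher.add` sketch using the patterns above (assuming spaCy v3.x):

```python
import spacy
from spacy.matcher import Matcher

nlp = spacy.blank("en")
matcher = Matcher(nlp.vocab)
patterns = [[{"TEXT": "Google"}, {"TEXT": "Now"}], [{"TEXT": "GoogleNow"}]]
matcher.add("GoogleNow", patterns, greedy="LONGEST")

doc = nlp("I like Google Now better than GoogleNow")
for match_id, start, end in matcher(doc):
    print(nlp.vocab.strings[match_id], doc[start:end].text)
```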
## Matcher.remove {id="remove",tag="method",new="2"}
## Matcher.remove {id="remove",tag="method",version="2"}
Remove a rule from the matcher. A `KeyError` is raised if the match ID does not
exist.
@ -244,7 +244,7 @@ exist.
| ----- | --------------------------------- |
| `key` | The ID of the match rule. ~~str~~ |
## Matcher.get {id="get",tag="method",new="2"}
## Matcher.get {id="get",tag="method",version="2"}
Retrieve the pattern stored for a key. Returns the rule as an
`(on_match, patterns)` tuple containing the callback and available patterns.

View File

@ -2,7 +2,7 @@
title: Morphologizer
tag: class
source: spacy/pipeline/morphologizer.pyx
new: 3
version: 3
teaser: 'Pipeline component for predicting morphological features'
api_base_class: /api/tagger
api_string_name: morphologizer
@ -403,7 +403,7 @@ coarse-grained POS as the feature `POS`.
| ----------- | ------------------------------------------------------ |
| **RETURNS** | The labels added to the component. ~~Tuple[str, ...]~~ |
## Morphologizer.label_data {id="label_data",tag="property",new="3"}
## Morphologizer.label_data {id="label_data",tag="property",version="3"}
The labels currently added to the component and their internal meta information.
This is the data generated by [`init labels`](/api/cli#init-labels) and used by

View File

@ -3,7 +3,7 @@ title: PhraseMatcher
teaser: Match sequences of tokens, based on documents
tag: class
source: spacy/matcher/phrasematcher.pyx
new: 2
version: 2
---
The `PhraseMatcher` lets you efficiently match large terminology lists. While
@ -155,7 +155,7 @@ patterns = [nlp("health care reform"), nlp("healthcare reform")]
| _keyword-only_ | |
| `on_match` | Callback function to act on matches. Takes the arguments `matcher`, `doc`, `i` and `matches`. ~~Optional[Callable[[Matcher, Doc, int, List[tuple], Any]]~~ |
## PhraseMatcher.remove {id="remove",tag="method",new="2.2"}
## PhraseMatcher.remove {id="remove",tag="method",version="2.2"}
Remove a rule from the matcher by match ID. A `KeyError` is raised if the key
does not exist.

View File

@ -100,7 +100,7 @@ applied to the `Doc` in order. Both [`__call__`](/api/pipe#call) and
| `batch_size` | The number of documents to buffer. Defaults to `128`. ~~int~~ |
| **YIELDS** | The processed documents in order. ~~Doc~~ |
## TrainablePipe.set_error_handler {id="set_error_handler",tag="method",new="3"}
## TrainablePipe.set_error_handler {id="set_error_handler",tag="method",version="3"}
Define a callback that will be invoked when an error is thrown during processing
of one or more documents with either [`__call__`](/api/pipe#call) or
@ -122,7 +122,7 @@ processed, and the original error.
| --------------- | -------------------------------------------------------------------------------------------------------------- |
| `error_handler` | A function that performs custom error handling. ~~Callable[[str, Callable[[Doc], Doc], List[Doc], Exception]~~ |
## TrainablePipe.get_error_handler {id="get_error_handler",tag="method",new="3"}
## TrainablePipe.get_error_handler {id="get_error_handler",tag="method",version="3"}
Retrieve the callback that performs error handling for this component's
[`__call__`](/api/pipe#call) and [`pipe`](/api/pipe#pipe) methods. If no custom
@ -141,7 +141,7 @@ returned that simply reraises the exception.
| ----------- | ---------------------------------------------------------------------------------------------------------------- |
| **RETURNS** | The function that performs custom error handling. ~~Callable[[str, Callable[[Doc], Doc], List[Doc], Exception]~~ |
## TrainablePipe.initialize {id="initialize",tag="method",new="3"}
## TrainablePipe.initialize {id="initialize",tag="method",version="3"}
Initialize the component for training. `get_examples` should be a function that
returns an iterable of [`Example`](/api/example) objects. The data examples are
@ -240,7 +240,7 @@ predictions and gold-standard annotations, and update the component's model.
| `losses` | Optional record of the loss during training. Updated using the component name as the key. ~~Optional[Dict[str, float]]~~ |
| **RETURNS** | The updated `losses` dictionary. ~~Dict[str, float]~~ |
## TrainablePipe.rehearse {id="rehearse",tag="method,experimental",new="3"}
## TrainablePipe.rehearse {id="rehearse",tag="method,experimental",version="3"}
Perform a "rehearsal" update from a batch of data. Rehearsal updates teach the
current model to make predictions similar to an initial model, to try to address
@ -287,7 +287,7 @@ This method needs to be overwritten with your own custom `get_loss` method.
| `scores` | Scores representing the model's predictions. |
| **RETURNS** | The loss and the gradient, i.e. `(loss, gradient)`. ~~Tuple[float, float]~~ |
## TrainablePipe.score {id="score",tag="method",new="3"}
## TrainablePipe.score {id="score",tag="method",version="3"}
Score a batch of examples.

View File

@ -70,7 +70,7 @@ components to the end of the pipeline and after all other components.
| `doc` | The `Doc` object to process, e.g. the `Doc` in the pipeline. ~~Doc~~ |
| **RETURNS** | The modified `Doc` with merged entities. ~~Doc~~ |
## merge_subtokens {id="merge_subtokens",tag="function",new="2.1"}
## merge_subtokens {id="merge_subtokens",tag="function",version="2.1"}
Merge subtokens into a single token. Also available via the string name
`"merge_subtokens"`. As of v2.1, the parser is able to predict "subtokens" that
@ -110,7 +110,7 @@ end of the pipeline and after all other components.
| `label` | The subtoken dependency label. Defaults to `"subtok"`. ~~str~~ |
| **RETURNS** | The modified `Doc` with merged subtokens. ~~Doc~~ |
## token_splitter {id="token_splitter",tag="function",new="3.0"}
## token_splitter {id="token_splitter",tag="function",version="3.0"}
Split tokens longer than a minimum length into shorter tokens. Intended for use
with transformer pipelines where long spaCy tokens lead to input text that
@ -132,7 +132,7 @@ exceed the transformer model max length.
| `split_length` | The length of the split tokens. Defaults to `5`. ~~int~~ |
| **RETURNS** | The modified `Doc` with the split tokens. ~~Doc~~ |
## doc_cleaner {id="doc_cleaner",tag="function",new="3.2.1"}
## doc_cleaner {id="doc_cleaner",tag="function",version="3.2.1"}
Clean up `Doc` attributes. Intended for use at the end of pipelines with
`tok2vec` or `transformer` pipeline components that store tensors and other

View File

@ -72,7 +72,7 @@ core pipeline components, the individual score names start with the `Token` or
| `examples` | The `Example` objects holding both the predictions and the correct gold-standard annotations. ~~Iterable[Example]~~ |
| **RETURNS** | A dictionary of scores. ~~Dict[str, Union[float, Dict[str, float]]]~~ |
## Scorer.score_tokenization {id="score_tokenization",tag="staticmethod",new="3"}
## Scorer.score_tokenization {id="score_tokenization",tag="staticmethod",version="3"}
Scores the tokenization:
@ -93,7 +93,7 @@ Docs with `has_unknown_spaces` are skipped during scoring.
| `examples` | The `Example` objects holding both the predictions and the correct gold-standard annotations. ~~Iterable[Example]~~ |
| **RETURNS** | A dictionary containing the scores `token_acc`, `token_p`, `token_r`, `token_f`. ~~Dict[str, float]~~ |
## Scorer.score_token_attr {id="score_token_attr",tag="staticmethod",new="3"}
## Scorer.score_token_attr {id="score_token_attr",tag="staticmethod",version="3"}
Scores a single token attribute. Tokens with missing values in the reference doc
are skipped during scoring.
@ -114,7 +114,7 @@ are skipped during scoring.
| `missing_values` | Attribute values to treat as missing annotation in the reference annotation. Defaults to `{0, None, ""}`. ~~Set[Any]~~ |
| **RETURNS** | A dictionary containing the score `{attr}_acc`. ~~Dict[str, float]~~ |
## Scorer.score_token_attr_per_feat {id="score_token_attr_per_feat",tag="staticmethod",new="3"}
## Scorer.score_token_attr_per_feat {id="score_token_attr_per_feat",tag="staticmethod",version="3"}
Scores a single token attribute per feature for a token attribute in the
Universal Dependencies
@ -138,7 +138,7 @@ scoring.
| `missing_values` | Attribute values to treat as missing annotation in the reference annotation. Defaults to `{0, None, ""}`. ~~Set[Any]~~ |
| **RETURNS** | A dictionary containing the micro PRF scores under the key `{attr}_micro_p/r/f` and the per-feature PRF scores under `{attr}_per_feat`. ~~Dict[str, Dict[str, float]]~~ |
## Scorer.score_spans {id="score_spans",tag="staticmethod",new="3"}
## Scorer.score_spans {id="score_spans",tag="staticmethod",version="3"}
Returns PRF scores for labeled or unlabeled spans.
@ -160,7 +160,7 @@ Returns PRF scores for labeled or unlabeled spans.
| `allow_overlap` | Defaults to `False`. Whether or not to allow overlapping spans. If set to `False`, the alignment will automatically resolve conflicts. ~~bool~~ |
| **RETURNS** | A dictionary containing the PRF scores under the keys `{attr}_p`, `{attr}_r`, `{attr}_f` and the per-type PRF scores under `{attr}_per_type`. ~~Dict[str, Union[float, Dict[str, float]]]~~ |
## Scorer.score_deps {id="score_deps",tag="staticmethod",new="3"}
## Scorer.score_deps {id="score_deps",tag="staticmethod",version="3"}
Calculate the UAS, LAS, and LAS per type scores for dependency parses. Tokens
with missing values for the `attr` (typically `dep`) are skipped during scoring.
@ -194,7 +194,7 @@ with missing values for the `attr` (typically `dep`) are skipped during scoring.
| `missing_values` | Attribute values to treat as missing annotation in the reference annotation. Defaults to `{0, None, ""}`. ~~Set[Any]~~ |
| **RETURNS** | A dictionary containing the scores: `{attr}_uas`, `{attr}_las`, and `{attr}_las_per_type`. ~~Dict[str, Union[float, Dict[str, float]]]~~ |
## Scorer.score_cats {id="score_cats",tag="staticmethod",new="3"}
## Scorer.score_cats {id="score_cats",tag="staticmethod",version="3"}
Calculate PRF and ROC AUC scores for a doc-level attribute that is a dict
containing scores for each label like `Doc.cats`. The returned dictionary
@ -241,7 +241,7 @@ The reported `{attr}_score` depends on the classification properties:
| `threshold` | Cutoff to consider a prediction "positive". Defaults to `0.5` for multi-label, and `0.0` (i.e. whatever's highest scoring) otherwise. ~~float~~ |
| **RETURNS** | A dictionary containing the scores, with inapplicable scores as `None`. ~~Dict[str, Optional[float]]~~ |
## Scorer.score_links {id="score_links",tag="staticmethod",new="3"}
## Scorer.score_links {id="score_links",tag="staticmethod",version="3"}
Returns PRF for predicted links on the entity level. To disentangle the
performance of the NEL from the NER, this method only evaluates NEL links for
@ -264,7 +264,7 @@ entities that overlap between the gold reference and the predictions.
| `negative_labels` | The string values that refer to no annotation (e.g. "NIL"). ~~Iterable[str]~~ |
| **RETURNS** | A dictionary containing the scores. ~~Dict[str, Optional[float]]~~ |
## get_ner_prf {id="get_ner_prf",new="3"}
## get_ner_prf {id="get_ner_prf",version="3"}
Compute micro-PRF and per-entity PRF scores.

View File

@ -2,7 +2,7 @@
title: SentenceRecognizer
tag: class
source: spacy/pipeline/senter.pyx
new: 3
version: 3
teaser: 'Pipeline component for sentence segmentation'
api_base_class: /api/tagger
api_string_name: senter
@ -211,7 +211,7 @@ Delegates to [`predict`](/api/sentencerecognizer#predict) and
| `losses` | Optional record of the loss during training. Updated using the component name as the key. ~~Optional[Dict[str, float]]~~ |
| **RETURNS** | The updated `losses` dictionary. ~~Dict[str, float]~~ |
## SentenceRecognizer.rehearse {id="rehearse",tag="method,experimental",new="3"}
## SentenceRecognizer.rehearse {id="rehearse",tag="method,experimental",version="3"}
Perform a "rehearsal" update from a batch of data. Rehearsal updates teach the
current model to make predictions similar to an initial model to try to address

View File

@ -93,7 +93,7 @@ Get the number of tokens in the span.
| ----------- | ----------------------------------------- |
| **RETURNS** | The number of tokens in the span. ~~int~~ |
## Span.set_extension {id="set_extension",tag="classmethod",new="2"}
## Span.set_extension {id="set_extension",tag="classmethod",version="2"}
Define a custom attribute on the `Span` which becomes available via `Span._`.
For details, see the documentation on
@ -118,7 +118,7 @@ For details, see the documentation on
| `setter` | Setter function that takes the `Span` and a value, and modifies the object. Is called when the user writes to the `Span._` attribute. ~~Optional[Callable[[Span, Any], None]]~~ |
| `force` | Force overwriting existing attribute. ~~bool~~ |
## Span.get_extension {id="get_extension",tag="classmethod",new="2"}
## Span.get_extension {id="get_extension",tag="classmethod",version="2"}
Look up a previously registered extension by name. Returns a 4-tuple
`(default, method, getter, setter)` if the extension is registered. Raises a
@ -138,7 +138,7 @@ Look up a previously registered extension by name. Returns a 4-tuple
| `name` | Name of the extension. ~~str~~ |
| **RETURNS** | A `(default, method, getter, setter)` tuple of the extension. ~~Tuple[Optional[Any], Optional[Callable], Optional[Callable], Optional[Callable]]~~ |
## Span.has_extension {id="has_extension",tag="classmethod",new="2"}
## Span.has_extension {id="has_extension",tag="classmethod",version="2"}
Check whether an extension has been registered on the `Span` class.
@ -155,7 +155,7 @@ Check whether an extension has been registered on the `Span` class.
| `name` | Name of the extension to check. ~~str~~ |
| **RETURNS** | Whether the extension has been registered. ~~bool~~ |
## Span.remove_extension {id="remove_extension",tag="classmethod",new="2.0.12"}
## Span.remove_extension {id="remove_extension",tag="classmethod",version="2.0.12"}
Remove a previously registered extension.
@ -173,7 +173,7 @@ Remove a previously registered extension.
| `name` | Name of the extension. ~~str~~ |
| **RETURNS** | A `(default, method, getter, setter)` tuple of the removed extension. ~~Tuple[Optional[Any], Optional[Callable], Optional[Callable], Optional[Callable]]~~ |
## Span.char_span {id="char_span",tag="method",new="2.2.4"}
## Span.char_span {id="char_span",tag="method",version="2.2.4"}
Create a `Span` object from the slice `span.text[start:end]`. Returns `None` if
the character indices don't map to a valid span.
@ -235,7 +235,7 @@ ancestor is found, e.g. if span excludes a necessary ancestor.
| ----------- | --------------------------------------------------------------------------------------- |
| **RETURNS** | The lowest common ancestor matrix of the `Span`. ~~numpy.ndarray[ndim=2, dtype=int32]~~ |
## Span.to_array {id="to_array",tag="method",new="2"}
## Span.to_array {id="to_array",tag="method",version="2"}
Given a list of `M` attribute IDs, export the tokens to a numpy `ndarray` of
shape `(N, M)`, where `N` is the length of the document. The values will be
@ -256,7 +256,7 @@ shape `(N, M)`, where `N` is the length of the document. The values will be
| `attr_ids` | A list of attributes (int IDs or string names) or a single attribute (int ID or string name). ~~Union[int, str, List[Union[int, str]]]~~ |
| **RETURNS** | The exported attributes as a numpy array. ~~Union[numpy.ndarray[ndim=2, dtype=uint64], numpy.ndarray[ndim=1, dtype=uint64]]~~ |
## Span.ents {id="ents",tag="property",new="2.0.13",model="ner"}
## Span.ents {id="ents",tag="property",version="2.0.13",model="ner"}
The named entities that fall completely within the span. Returns a tuple of
`Span` objects.
@ -520,7 +520,7 @@ sent = doc[sent.start : max(sent.end, span.end)]
| ----------- | ------------------------------------------------------- |
| **RETURNS** | The sentence span that this span is a part of. ~~Span~~ |
## Span.sents {id="sents",tag="property",model="sentences",new="3.2.1"}
## Span.sents {id="sents",tag="property",model="sentences",version="3.2.1"}
Returns a generator over the sentences the span belongs to. This property is
only available when [sentence boundaries](/usage/linguistic-features#sbd) have

View File

@ -2,7 +2,7 @@
title: SpanCategorizer
tag: class,experimental
source: spacy/pipeline/spancat.py
new: 3.1
version: 3.1
teaser: 'Pipeline component for labeling potentially overlapping spans of text'
api_base_class: /api/pipe
api_string_name: spancat
@ -239,7 +239,7 @@ Delegates to [`predict`](/api/spancategorizer#predict) and
| `losses` | Optional record of the loss during training. Updated using the component name as the key. ~~Optional[Dict[str, float]]~~ |
| **RETURNS** | The updated `losses` dictionary. ~~Dict[str, float]~~ |
## SpanCategorizer.set_candidates {id="set_candidates",tag="method", new="3.3"}
## SpanCategorizer.set_candidates {id="set_candidates",tag="method", version="3.3"}
Use the suggester to add a list of [`Span`](/api/span) candidates to a list of
[`Doc`](/api/doc) objects. This method is intended to be used for debugging

View File

@ -2,7 +2,7 @@
title: SpanGroup
tag: class
source: spacy/tokens/span_group.pyx
new: 3
version: 3
---
A group of arbitrary, potentially overlapping [`Span`](/api/span) objects that
@ -125,7 +125,7 @@ changes to be reflected in the span group.
| `i` | The item index. ~~int~~ |
| **RETURNS** | The span at the given index. ~~Span~~ |
## SpanGroup.\_\_setitem\_\_ {id="setitem",tag="method", new="3.3"}
## SpanGroup.\_\_setitem\_\_ {id="setitem",tag="method", version="3.3"}
Set a span in the span group.
@ -144,7 +144,7 @@ Set a span in the span group.
| `i` | The item index. ~~int~~ |
| `span` | The new value. ~~Span~~ |
## SpanGroup.\_\_delitem\_\_ {id="delitem",tag="method", new="3.3"}
## SpanGroup.\_\_delitem\_\_ {id="delitem",tag="method", version="3.3"}
Delete a span from the span group.
@ -161,7 +161,7 @@ Delete a span from the span group.
| ---- | ----------------------- |
| `i` | The item index. ~~int~~ |
## SpanGroup.\_\_add\_\_ {id="add",tag="method", new="3.3"}
## SpanGroup.\_\_add\_\_ {id="add",tag="method", version="3.3"}
Concatenate the current span group with another span group and return the result
in a new span group. Any `attrs` from the first span group will have precedence
@ -182,7 +182,7 @@ over `attrs` in the second.
| `other` | The span group or spans to concatenate. ~~Union[SpanGroup, Iterable[Span]]~~ |
| **RETURNS** | The new span group. ~~SpanGroup~~ |
## SpanGroup.\_\_iadd\_\_ {id="iadd",tag="method", new="3.3"}
## SpanGroup.\_\_iadd\_\_ {id="iadd",tag="method", version="3.3"}
Append an iterable of spans or the content of a span group to the current span
group. Any `attrs` in the other span group will be added for keys that are not
@ -241,7 +241,7 @@ group.
| ------- | -------------------------------------------------------- |
| `spans` | The spans to add. ~~Union[SpanGroup, Iterable["Span"]]~~ |
## SpanGroup.copy {id="copy",tag="method", new="3.3"}
## SpanGroup.copy {id="copy",tag="method", version="3.3"}
Return a copy of the span group.
View File
@ -2,7 +2,7 @@
title: SpanRuler
tag: class
source: spacy/pipeline/span_ruler.py
new: 3.3
version: 3.3
teaser: 'Pipeline component for rule-based span and named entity recognition'
api_string_name: span_ruler
api_trainable: false
View File
@ -90,7 +90,7 @@ store will always include an empty string `""` at position `0`.
| ---------- | ------------------------------ |
| **YIELDS** | A string in the store. ~~str~~ |
## StringStore.add {id="add",tag="method",new="2"}
## StringStore.add {id="add",tag="method",version="2"}
Add a string to the `StringStore`.
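For example, a minimal sketch:

```python
import spacy

nlp = spacy.blank("en")
stringstore = nlp.vocab.strings
coffee_hash = stringstore.add("coffee")      # returns the string's hash
print(coffee_hash, stringstore[coffee_hash])
```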
@ -110,7 +110,7 @@ Add a string to the `StringStore`.
| `string` | The string to add. ~~str~~ |
| **RETURNS** | The string's hash value. ~~int~~ |
## StringStore.to_disk {id="to_disk",tag="method",new="2"}
## StringStore.to_disk {id="to_disk",tag="method",version="2"}
Save the current state to a directory.
@ -124,7 +124,7 @@ Save the current state to a directory.
| ------ | ------------------------------------------------------------------------------------------------------------------------------------------ |
| `path` | A path to a directory, which will be created if it doesn't exist. Paths may be either strings or `Path`-like objects. ~~Union[str, Path]~~ |
## StringStore.from_disk {id="from_disk",tag="method",new="2"}
## StringStore.from_disk {id="from_disk",tag="method",version="2"}
Loads state from a directory. Modifies the object in place and returns it.
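A hedged sketch of the save/load round trip (the path is illustrative):

```python
from spacy.strings import StringStore

stringstore = StringStore(["apple", "orange"])
stringstore.to_disk("/tmp/strings")                  # directory path is illustrative
restored = StringStore().from_disk("/tmp/strings")   # modifies in place, returns self
print("apple" in restored)
```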
View File
@ -127,7 +127,7 @@ applied to the `Doc` in order. Both [`__call__`](/api/tagger#call) and
| `batch_size` | The number of documents to buffer. Defaults to `128`. ~~int~~ |
| **YIELDS** | The processed documents in order. ~~Doc~~ |
## Tagger.initialize {id="initialize",tag="method",new="3"}
## Tagger.initialize {id="initialize",tag="method",version="3"}
Initialize the component for training. `get_examples` should be a function that
returns an iterable of [`Example`](/api/example) objects. **At least one example
@ -228,7 +228,7 @@ Delegates to [`predict`](/api/tagger#predict) and
| `losses` | Optional record of the loss during training. Updated using the component name as the key. ~~Optional[Dict[str, float]]~~ |
| **RETURNS** | The updated `losses` dictionary. ~~Dict[str, float]~~ |
## Tagger.rehearse {id="rehearse",tag="method,experimental",new="3"}
## Tagger.rehearse {id="rehearse",tag="method,experimental",version="3"}
Perform a "rehearsal" update from a batch of data. Rehearsal updates teach the
current model to make predictions similar to an initial model, to try to address
@ -410,7 +410,7 @@ The labels currently added to the component.
| ----------- | ------------------------------------------------------ |
| **RETURNS** | The labels added to the component. ~~Tuple[str, ...]~~ |
## Tagger.label_data {id="label_data",tag="property",new="3"}
## Tagger.label_data {id="label_data",tag="property",version="3"}
The labels currently added to the component and their internal meta information.
This is the data generated by [`init labels`](/api/cli#init-labels) and used by
View File
@ -2,7 +2,7 @@
title: TextCategorizer
tag: class
source: spacy/pipeline/textcat.py
new: 2
version: 2
teaser: 'Pipeline component for text classification'
api_base_class: /api/pipe
api_string_name: textcat
@ -172,7 +172,7 @@ applied to the `Doc` in order. Both [`__call__`](/api/textcategorizer#call) and
| `batch_size` | The number of documents to buffer. Defaults to `128`. ~~int~~ |
| **YIELDS** | The processed documents in order. ~~Doc~~ |
## TextCategorizer.initialize {id="initialize",tag="method",new="3"}
## TextCategorizer.initialize {id="initialize",tag="method",version="3"}
Initialize the component for training. `get_examples` should be a function that
returns an iterable of [`Example`](/api/example) objects. **At least one example
@ -275,7 +275,7 @@ Delegates to [`predict`](/api/textcategorizer#predict) and
| `losses` | Optional record of the loss during training. Updated using the component name as the key. ~~Optional[Dict[str, float]]~~ |
| **RETURNS** | The updated `losses` dictionary. ~~Dict[str, float]~~ |
## TextCategorizer.rehearse {id="rehearse",tag="method,experimental",new="3"}
## TextCategorizer.rehearse {id="rehearse",tag="method,experimental",version="3"}
Perform a "rehearsal" update from a batch of data. Rehearsal updates teach the
current model to make predictions similar to an initial model to try to address
@ -317,7 +317,7 @@ predicted scores.
| `scores` | Scores representing the model's predictions. |
| **RETURNS** | The loss and the gradient, i.e. `(loss, gradient)`. ~~Tuple[float, float]~~ |
## TextCategorizer.score {id="score",tag="method",new="3"}
## TextCategorizer.score {id="score",tag="method",version="3"}
Score a batch of examples.
@ -472,7 +472,7 @@ The labels currently added to the component.
| ----------- | ------------------------------------------------------ |
| **RETURNS** | The labels added to the component. ~~Tuple[str, ...]~~ |
## TextCategorizer.label_data {id="label_data",tag="property",new="3"}
## TextCategorizer.label_data {id="label_data",tag="property",version="3"}
The labels currently added to the component and their internal meta information.
This is the data generated by [`init labels`](/api/cli#init-labels) and used by
View File
@ -1,7 +1,7 @@
---
title: Tok2Vec
source: spacy/pipeline/tok2vec.py
new: 3
version: 3
teaser: null
api_base_class: /api/pipe
api_string_name: tok2vec
View File
@ -39,7 +39,7 @@ The number of unicode characters in the token, i.e. `token.text`.
| ----------- | ------------------------------------------------------ |
| **RETURNS** | The number of unicode characters in the token. ~~int~~ |
## Token.set_extension {id="set_extension",tag="classmethod",new="2"}
## Token.set_extension {id="set_extension",tag="classmethod",version="2"}
Define a custom attribute on the `Token` which becomes available via `Token._`.
For details, see the documentation on
@ -64,7 +64,7 @@ For details, see the documentation on
| `setter` | Setter function that takes the `Token` and a value, and modifies the object. Is called when the user writes to the `Token._` attribute. ~~Optional[Callable[[Token, Any], None]]~~ |
| `force` | Force overwriting existing attribute. ~~bool~~ |
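For instance, a minimal sketch with a made-up attribute name:

```python
import spacy
from spacy.tokens import Token

# "is_fruit" is a hypothetical attribute used only for illustration
Token.set_extension("is_fruit", default=False, force=True)

nlp = spacy.blank("en")
doc = nlp("I like apples")
doc[2]._.is_fruit = True
print(doc[2]._.is_fruit)
```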
## Token.get_extension {id="get_extension",tag="classmethod",new="2"}
## Token.get_extension {id="get_extension",tag="classmethod",version="2"}
Look up a previously registered extension by name. Returns a 4-tuple
`(default, method, getter, setter)` if the extension is registered. Raises a
@ -84,7 +84,7 @@ Look up a previously registered extension by name. Returns a 4-tuple
| `name` | Name of the extension. ~~str~~ |
| **RETURNS** | A `(default, method, getter, setter)` tuple of the extension. ~~Tuple[Optional[Any], Optional[Callable], Optional[Callable], Optional[Callable]]~~ |
## Token.has_extension {id="has_extension",tag="classmethod",new="2"}
## Token.has_extension {id="has_extension",tag="classmethod",version="2"}
Check whether an extension has been registered on the `Token` class.
@ -101,7 +101,7 @@ Check whether an extension has been registered on the `Token` class.
| `name` | Name of the extension to check. ~~str~~ |
| **RETURNS** | Whether the extension has been registered. ~~bool~~ |
## Token.remove_extension {id="remove_extension",tag="classmethod",new="2.0.11"}
## Token.remove_extension {id="remove_extension",tag="classmethod",version="2.0.11"}
Remove a previously registered extension.
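A small sketch covering `has_extension` and `remove_extension` together (the attribute name is made up):

```python
from spacy.tokens import Token

Token.set_extension("is_fruit", default=False, force=True)
print(Token.has_extension("is_fruit"))    # True
removed = Token.remove_extension("is_fruit")
print(Token.has_extension("is_fruit"))    # False
```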
View File
@ -70,7 +70,7 @@ for name in pipeline:
nlp.from_disk(data_path) # 4. Load in the binary data
```
### spacy.blank {id="spacy.blank",tag="function",new="2"}
### spacy.blank {id="spacy.blank",tag="function",version="2"}
Create a blank pipeline of a given language class. This function is the twin of
`spacy.load()`.
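For example:

```python
import spacy

nlp_en = spacy.blank("en")   # blank English pipeline: tokenizer only
nlp_de = spacy.blank("de")
print(nlp_en.pipe_names)     # [] because no components have been added yet
```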
@ -134,7 +134,7 @@ list of available terms, see [`glossary.py`](%%GITHUB_SPACY/spacy/glossary.py).
| `term` | Term to explain. ~~str~~ |
| **RETURNS** | The explanation, or `None` if not found in the glossary. ~~Optional[str]~~ |
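For example, `spacy.explain` can be called on tag, part-of-speech and entity labels:

```python
import spacy

print(spacy.explain("VBZ"))       # "verb, 3rd person singular present"
print(spacy.explain("GPE"))       # "Countries, cities, states"
print(spacy.explain("nonsense"))  # None, since the term isn't in the glossary
```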
### spacy.prefer_gpu {id="spacy.prefer_gpu",tag="function",new="2.0.14"}
### spacy.prefer_gpu {id="spacy.prefer_gpu",tag="function",version="2.0.14"}
Allocate data and perform operations on [GPU](/usage/#gpu), if available. If
data has already been allocated on CPU, it will not be moved. Ideally, this
@ -162,7 +162,7 @@ ensure that the model is loaded on the correct device. See
| `gpu_id` | Device index to select. Defaults to `0`. ~~int~~ |
| **RETURNS** | Whether the GPU was activated. ~~bool~~ |
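A typical usage sketch, calling it before loading any pipelines:

```python
import spacy

activated = spacy.prefer_gpu()        # True if a GPU was activated
nlp = spacy.load("en_core_web_sm")    # assumes this package is installed
print("GPU:", activated)
```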
### spacy.require_gpu {id="spacy.require_gpu",tag="function",new="2.0.14"}
### spacy.require_gpu {id="spacy.require_gpu",tag="function",version="2.0.14"}
Allocate data and perform operations on [GPU](/usage/#gpu). Will raise an error
if no GPU is available. If data has already been allocated on CPU, it will not
@ -190,7 +190,7 @@ ensure that the model is loaded on the correct device. See
| `gpu_id` | Device index to select. Defaults to `0`. ~~int~~ |
| **RETURNS** | `True` ~~bool~~ |
### spacy.require_cpu {id="spacy.require_cpu",tag="function",new="3.0.0"}
### spacy.require_cpu {id="spacy.require_cpu",tag="function",version="3.0.0"}
Allocate data and perform operations on CPU. If data has already been allocated
on GPU, it will not be moved. Ideally, this function should be called right
@ -221,7 +221,7 @@ ensure that the model is loaded on the correct device. See
As of v2.0, spaCy comes with a built-in visualization suite. For more info and
examples, see the usage guide on [visualizing spaCy](/usage/visualizers).
### displacy.serve {id="displacy.serve",tag="method",new="2"}
### displacy.serve {id="displacy.serve",tag="method",version="2"}
Serve a dependency parse tree or named entity visualization to view it in your
browser. Will run a simple web server.
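For example (assumes `en_core_web_sm` is installed; the port shown is the default):

```python
import spacy
from spacy import displacy

nlp = spacy.load("en_core_web_sm")            # assumes this package is installed
doc = nlp("This is a sentence about Berlin.")
displacy.serve(doc, style="dep", port=5000)   # starts a simple local web server
```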
@ -248,7 +248,7 @@ browser. Will run a simple web server.
| `port` | Port to serve visualization. Defaults to `5000`. ~~int~~ |
| `host` | Host to serve visualization. Defaults to `"0.0.0.0"`. ~~str~~ |
### displacy.render {id="displacy.render",tag="method",new="2"}
### displacy.render {id="displacy.render",tag="method",version="2"}
Render a dependency parse tree or named entity visualization.
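A minimal sketch (assumes `en_core_web_sm` is installed):

```python
import spacy
from spacy import displacy

nlp = spacy.load("en_core_web_sm")                    # assumes this package is installed
doc = nlp("Google was founded in California.")
html = displacy.render(doc, style="ent", page=True)   # returns HTML markup as a string
```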
@ -273,7 +273,7 @@ Render a dependency parse tree or named entity visualization.
| `jupyter` | Explicitly enable or disable "[Jupyter](http://jupyter.org/) mode" to return markup ready to be rendered in a notebook. Detected automatically if `None` (default). ~~Optional[bool]~~ |
| **RETURNS** | The rendered HTML markup. ~~str~~ |
### displacy.parse_deps {id="displacy.parse_deps",tag="method",new="2"}
### displacy.parse_deps {id="displacy.parse_deps",tag="method",version="2"}
Generate dependency parse in `{'words': [], 'arcs': []}` format. For use with
the `manual=True` argument in `displacy.render`.
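A hedged sketch of the two-step, manual rendering flow (assumes `en_core_web_sm` is installed):

```python
import spacy
from spacy import displacy

nlp = spacy.load("en_core_web_sm")        # assumes this package is installed
doc = nlp("This is a sentence.")
parsed = displacy.parse_deps(doc)         # {"words": [...], "arcs": [...]}
html = displacy.render(parsed, style="dep", manual=True)
```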
@ -295,7 +295,7 @@ the `manual=True` argument in `displacy.render`.
| `options` | Dependency parse specific visualisation options. ~~Dict[str, Any]~~ |
| **RETURNS** | Generated dependency parse keyed by words and arcs. ~~dict~~ |
### displacy.parse_ents {id="displacy.parse_ents",tag="method",new="2"}
### displacy.parse_ents {id="displacy.parse_ents",tag="method",version="2"}
Generate named entities in `[{start: i, end: i, label: 'label'}]` format. For
use with the `manual=True` argument in `displacy.render`.
@ -317,7 +317,7 @@ use with the `manual=True` argument in `displacy.render`.
| `options` | NER-specific visualisation options. ~~Dict[str, Any]~~ |
| **RETURNS** | Generated entities keyed by text (original text) and ents. ~~dict~~ |
### displacy.parse_spans {id="displacy.parse_spans",tag="method",new="2"}
### displacy.parse_spans {id="displacy.parse_spans",tag="method",version="2"}
Generate spans in `[{start_token: i, end_token: i, label: 'label'}]` format. For
use with the `manual=True` argument in `displacy.render`.
@ -419,7 +419,7 @@ span. If you wish to link an entity to their URL then consider using the
should redirect you to their Wikidata page, in this case
`https://www.wikidata.org/wiki/Q95`.
## registry {id="registry",source="spacy/util.py",new="3"}
## registry {id="registry",source="spacy/util.py",version="3"}
spaCy's function registry extends
[Thinc's `registry`](https://thinc.ai/docs/api-config#registry) and allows you
@ -494,7 +494,7 @@ See the [`Transformer`](/api/transformer) API reference and
| [`span_getters`](/api/transformer#span_getters) | Registry for functions that take a batch of `Doc` objects and return a list of `Span` objects to process by the transformer, e.g. sentences. |
| [`annotation_setters`](/api/transformer#annotation_setters) | Registry for functions that create annotation setters. Annotation setters are functions that take a batch of `Doc` objects and a [`FullTransformerBatch`](/api/transformer#fulltransformerbatch) and can set additional annotations on the `Doc`. |
## Loggers {id="loggers",source="spacy/training/loggers.py",new="3"}
## Loggers {id="loggers",source="spacy/training/loggers.py",version="3"}
A logger records the training results. When a logger is created, two functions
are returned: one for logging the information for each training step, and a
@ -572,7 +572,7 @@ start decreasing across epochs.
## Readers {id="readers"}
### File readers {id="file-readers",source="github.com/explosion/srsly",new="3"}
### File readers {id="file-readers",source="github.com/explosion/srsly",version="3"}
The following file readers are provided by our serialization library
[`srsly`](https://github.com/explosion/srsly). All registered functions take one
@ -628,7 +628,7 @@ label sets.
| `require` | Whether to require the file to exist. If set to `False` and the labels file doesn't exist, the loader will return `None` and the `initialize` method will extract the labels from the data. Defaults to `False`. ~~bool~~ |
| **CREATES** | The list of labels. ~~List[str]~~ |
### Corpus readers {id="corpus-readers",source="spacy/training/corpus.py",new="3"}
### Corpus readers {id="corpus-readers",source="spacy/training/corpus.py",version="3"}
Corpus readers are registered functions that load data and return a function
that takes the current `nlp` object and yields [`Example`](/api/example) objects
@ -696,7 +696,7 @@ JSONL file. Also see the [`JsonlCorpus`](/api/corpus#jsonlcorpus) class.
| `limit` | Limit corpus to a subset of examples, e.g. for debugging. Defaults to `0` for no limit. ~~int~~ |
| **CREATES** | The corpus reader. ~~JsonlCorpus~~ |
## Batchers {id="batchers",source="spacy/training/batchers.py",new="3"}
## Batchers {id="batchers",source="spacy/training/batchers.py",version="3"}
A data batcher implements a batching strategy that essentially turns a stream of
items into a stream of batches, with each batch consisting of one item or a list
@ -783,7 +783,7 @@ sequences in the batch.
| `get_length` | Optional function that receives a sequence item and returns its length. Defaults to the built-in `len()` if not set. ~~Optional[Callable[[Any], int]]~~ |
| **CREATES** | The batcher that takes an iterable of items and returns batches. ~~Callable[[Iterable[Any]], Iterable[List[Any]]]~~ |
## Augmenters {id="augmenters",source="spacy/training/augment.py",new="3"}
## Augmenters {id="augmenters",source="spacy/training/augment.py",version="3"}
Data augmentation is the process of applying small modifications to the training
data. It can be especially useful for punctuation and case replacement for
@ -838,7 +838,7 @@ useful for making the model less sensitive to capitalization.
| `level` | The percentage of texts that will be augmented. ~~float~~ |
| **CREATES** | A function that takes the current `nlp` object and an [`Example`](/api/example) and yields augmented `Example` objects. ~~Callable[[Language, Example], Iterator[Example]]~~ |
## Callbacks {id="callbacks",source="spacy/training/callbacks.py",new="3"}
## Callbacks {id="callbacks",source="spacy/training/callbacks.py",version="3"}
The config supports [callbacks](/usage/training#custom-code-nlp-callbacks) at
several points in the lifecycle that can be used to modify the `nlp` object.
@ -887,7 +887,7 @@ backprop passes.
| `backprop_color` | Color identifier for backpropagation passes. Defaults to `-1`. ~~int~~ |
| **CREATES** | A function that takes the current `nlp` and wraps forward/backprop passes in NVTX ranges. ~~Callable[[Language], Language]~~ |
### spacy.models_and_pipes_with_nvtx_range.v1 {id="models_and_pipes_with_nvtx_range",tag="registered function",new="3.4"}
### spacy.models_and_pipes_with_nvtx_range.v1 {id="models_and_pipes_with_nvtx_range",tag="registered function",version="3.4"}
> #### Example config
>
@ -975,7 +975,7 @@ This method was previously available as `spacy.gold.offsets_from_biluo_tags`.
| `tags` | A sequence of [BILUO](/usage/linguistic-features#accessing-ner) tags with each tag describing one token. Each tag string will be of the form of either `""`, `"O"` or `"{action}-{label}"`, where action is one of `"B"`, `"I"`, `"L"`, `"U"`. ~~List[str]~~ |
| **RETURNS** | A sequence of `(start, end, label)` triples. `start` and `end` will be character-offset integers denoting the slice into the original string. ~~List[Tuple[int, int, str]]~~ |
### training.biluo_tags_to_spans {id="biluo_tags_to_spans",tag="function",new="2.1"}
### training.biluo_tags_to_spans {id="biluo_tags_to_spans",tag="function",version="2.1"}
Encode per-token tags following the
[BILUO scheme](/usage/linguistic-features#accessing-ner) into
@ -1131,7 +1131,7 @@ custom language class, you can register it using the
| `lang` | Two-letter language code, e.g. `"en"`. ~~str~~ |
| **RETURNS** | The respective subclass. ~~Language~~ |
### util.lang_class_is_loaded {id="util.lang_class_is_loaded",tag="function",new="2.1"}
### util.lang_class_is_loaded {id="util.lang_class_is_loaded",tag="function",version="2.1"}
Check whether a `Language` subclass is already loaded. `Language` subclasses are
loaded lazily to avoid expensive setup code associated with the language data.
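A minimal sketch of the lazy-loading behavior:

```python
from spacy import util

print(util.lang_class_is_loaded("en"))   # False until the class is used
lang_cls = util.get_lang_class("en")     # triggers the lazy import
print(util.lang_class_is_loaded("en"))   # True
```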
@ -1149,7 +1149,7 @@ loaded lazily to avoid expensive setup code associated with the language data.
| `name` | Two-letter language code, e.g. `"en"`. ~~str~~ |
| **RETURNS** | Whether the class has been loaded. ~~bool~~ |
### util.load_model {id="util.load_model",tag="function",new="2"}
### util.load_model {id="util.load_model",tag="function",version="2"}
Load a pipeline from a package or data path. If called with a string name, spaCy
will assume the pipeline is a Python package and import and call its `load()`
@ -1177,7 +1177,7 @@ and create a `Language` object. The model data will then be loaded in via
| `config` <Tag variant="new">3</Tag> | Config overrides as nested dict or flat dict keyed by section values in dot notation, e.g. `"nlp.pipeline"`. ~~Union[Dict[str, Any], Config]~~ |
| **RETURNS** | `Language` class with the loaded pipeline. ~~Language~~ |
### util.load_model_from_init_py {id="util.load_model_from_init_py",tag="function",new="2"}
### util.load_model_from_init_py {id="util.load_model_from_init_py",tag="function",version="2"}
A helper function to use in the `load()` method of a pipeline package's
[`__init__.py`](https://github.com/explosion/spacy-models/tree/master/template/model/xx_model_name/__init__.py).
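The typical pattern looks roughly like this (the package layout is illustrative):

```python
# __init__.py of a hypothetical pipeline package
from spacy.util import load_model_from_init_py

def load(**overrides):
    return load_model_from_init_py(__file__, **overrides)
```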
@ -1202,7 +1202,7 @@ A helper function to use in the `load()` method of a pipeline package's
| `config` <Tag variant="new">3</Tag> | Config overrides as nested dict or flat dict keyed by section values in dot notation, e.g. `"nlp.pipeline"`. ~~Union[Dict[str, Any], Config]~~ |
| **RETURNS** | `Language` class with the loaded pipeline. ~~Language~~ |
### util.load_config {id="util.load_config",tag="function",new="3"}
### util.load_config {id="util.load_config",tag="function",version="3"}
Load a pipeline's [`config.cfg`](/api/data-formats#config) from a file path. The
config typically includes details about the components and how they're created,
@ -1222,7 +1222,7 @@ as well as all training settings and hyperparameters.
| `interpolate` | Whether to interpolate the config and replace variables like `${paths.train}` with their values. Defaults to `False`. ~~bool~~ |
| **RETURNS** | The pipeline's config. ~~Config~~ |
### util.load_meta {id="util.load_meta",tag="function",new="3"}
### util.load_meta {id="util.load_meta",tag="function",version="3"}
Get a pipeline's [`meta.json`](/api/data-formats#meta) from a file path and
validate its contents. The meta typically includes details about author,
@ -1239,7 +1239,7 @@ licensing, data sources and version.
| `path` | Path to the pipeline's `meta.json`. ~~Union[str, Path]~~ |
| **RETURNS** | The pipeline's meta data. ~~Dict[str, Any]~~ |
### util.get_installed_models {id="util.get_installed_models",tag="function",new="3"}
### util.get_installed_models {id="util.get_installed_models",tag="function",version="3"}
List all pipeline packages installed in the current environment. This will
include any spaCy pipeline that was packaged with
@ -1274,7 +1274,7 @@ Check if string maps to a package installed via pip. Mainly used to validate
| `name` | Name of package. ~~str~~ |
| **RETURNS** | `True` if installed package, `False` if not. ~~bool~~ |
### util.get_package_path {id="util.get_package_path",tag="function",new="2"}
### util.get_package_path {id="util.get_package_path",tag="function",version="2"}
Get path to an installed package. Mainly used to resolve the location of
[pipeline packages](/usage/models). Currently imports the package to find its
@ -1292,7 +1292,7 @@ path.
| `package_name` | Name of installed package. ~~str~~ |
| **RETURNS** | Path to pipeline package directory. ~~Path~~ |
### util.is_in_jupyter {id="util.is_in_jupyter",tag="function",new="2"}
### util.is_in_jupyter {id="util.is_in_jupyter",tag="function",version="2"}
Check if user is running spaCy from a [Jupyter](https://jupyter.org) notebook by
detecting the IPython kernel. Mainly used for the
@ -1362,7 +1362,7 @@ Compile a sequence of infix rules into a regex object.
| `entries` | The infix rules, e.g. [`lang.punctuation.TOKENIZER_INFIXES`](%%GITHUB_SPACY/spacy/lang/punctuation.py). ~~Iterable[Union[str, Pattern]]~~ |
| **RETURNS** | The regex object to be used for [`Tokenizer.infix_finditer`](/api/tokenizer#attributes). ~~Pattern~~ |
### util.minibatch {id="util.minibatch",tag="function",new="2"}
### util.minibatch {id="util.minibatch",tag="function",version="2"}
Iterate over batches of items. `size` may be an iterator, so that batch-size can
vary on each step.
@ -1381,7 +1381,7 @@ vary on each step.
| `size` | The batch size(s). ~~Union[int, Sequence[int]]~~ |
| **YIELDS** | The batches. |
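For example:

```python
from spacy.util import minibatch

items = list(range(10))
for batch in minibatch(items, size=4):
    print(batch)   # [0, 1, 2, 3] then [4, 5, 6, 7] then [8, 9]
```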
### util.filter_spans {id="util.filter_spans",tag="function",new="2.1.4"}
### util.filter_spans {id="util.filter_spans",tag="function",version="2.1.4"}
Filter a sequence of [`Span`](/api/span) objects and remove duplicates or
overlaps. Useful for creating named entities (where one token can only be part
@ -1402,7 +1402,7 @@ of one entity) or when merging spans with
| `spans` | The spans to filter. ~~Iterable[Span]~~ |
| **RETURNS** | The filtered spans. ~~List[Span]~~ |
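A short sketch with made-up, overlapping candidate spans:

```python
import spacy
from spacy.util import filter_spans

nlp = spacy.blank("en")
doc = nlp("New York City is big")
spans = [doc[0:3], doc[0:2], doc[3:4]]   # overlapping candidates
print(filter_spans(spans))               # longest non-overlapping spans are kept
```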
### util.get_words_and_spaces {id="get_words_and_spaces",tag="function",new="3"}
### util.get_words_and_spaces {id="get_words_and_spaces",tag="function",version="3"}
Given a list of words and a text, reconstruct the original tokens and return a
list of words and spaces that can be used to create a [`Doc`](/api/doc#init).
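A minimal sketch of the round trip into a `Doc`:

```python
from spacy.tokens import Doc
from spacy.util import get_words_and_spaces
from spacy.vocab import Vocab

words, spaces = get_words_and_spaces(["Hello", "world", "!"], "Hello world!")
doc = Doc(Vocab(), words=words, spaces=spaces)
print(doc.text)   # "Hello world!"
```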
View File
@ -3,7 +3,7 @@ title: Transformer
teaser: Pipeline component for multi-task learning with transformer models
tag: class
source: github.com/explosion/spacy-transformers/blob/master/spacy_transformers/pipeline_component.py
new: 3
version: 3
api_base_class: /api/pipe
api_string_name: transformer
---
View File
@ -3,7 +3,7 @@ title: Vectors
teaser: Store, save and load word vectors
tag: class
source: spacy/vectors.pyx
new: 2
version: 2
---
Vectors data is kept in the `Vectors.data` attribute, which should be an
@ -356,7 +356,7 @@ supported for `floret` mode.
| `sort` | Whether to sort the entries returned by score. Defaults to `True`. ~~bool~~ |
| **RETURNS** | The most similar entries as a `(keys, best_rows, scores)` tuple. ~~Tuple[numpy.ndarray, numpy.ndarray, numpy.ndarray]~~ |
## Vectors.get_batch {id="get_batch",tag="method",new="3.2"}
## Vectors.get_batch {id="get_batch",tag="method",version="3.2"}
Get the vectors for the provided keys efficiently as a batch.
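A hedged sketch, assuming a pipeline package with vectors such as `en_core_web_md` is installed:

```python
import spacy

nlp = spacy.load("en_core_web_md")                     # assumes a package with vectors
batch = nlp.vocab.vectors.get_batch(["cat", "dog"])    # one row per key
print(batch.shape)
```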
View File
@ -122,7 +122,7 @@ using `token.check_flag(flag_id)`.
| `flag_id` | An integer between `1` and `63` (inclusive), specifying the bit at which the flag will be stored. If `-1`, the lowest available bit will be chosen. ~~int~~ |
| **RETURNS** | The integer ID by which the flag value can be checked. ~~int~~ |
## Vocab.reset_vectors {id="reset_vectors",tag="method",new="2"}
## Vocab.reset_vectors {id="reset_vectors",tag="method",version="2"}
Drop the current vector table. Because all vectors must be the same width, you
have to call this to change the size of the vectors. Only one of the `width` and
@ -140,7 +140,7 @@ have to call this to change the size of the vectors. Only one of the `width` and
| `width` | The new width. ~~int~~ |
| `shape` | The new shape. ~~int~~ |
## Vocab.prune_vectors {id="prune_vectors",tag="method",new="2"}
## Vocab.prune_vectors {id="prune_vectors",tag="method",version="2"}
Reduce the current vector table to `nr_row` unique entries. Words mapped to the
discarded vectors will be remapped to the closest vector among those remaining.
@ -165,7 +165,7 @@ cosines are calculated in minibatches to reduce memory usage.
| `batch_size` | Batch of vectors for calculating the similarities. Larger batch sizes might be faster, while temporarily requiring more memory. ~~int~~ |
| **RETURNS** | A dictionary keyed by removed words mapped to `(string, score)` tuples, where `string` is the entry the removed word was mapped to, and `score` the similarity score between the two words. ~~Dict[str, Tuple[str, float]]~~ |
## Vocab.deduplicate_vectors {id="deduplicate_vectors",tag="method",new="3.3"}
## Vocab.deduplicate_vectors {id="deduplicate_vectors",tag="method",version="3.3"}
> #### Example
>
@ -176,7 +176,7 @@ cosines are calculated in minibatches to reduce memory usage.
Remove any duplicate rows from the current vector table, maintaining the
mappings for all words in the vectors.
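A minimal sketch, assuming a pipeline with vectors such as `en_core_web_md` is installed:

```python
import spacy

nlp = spacy.load("en_core_web_md")   # assumes a pipeline with vectors is installed
nlp.vocab.deduplicate_vectors()      # removes identical rows, keeps key mappings
```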
## Vocab.get_vector {id="get_vector",tag="method",new="2"}
## Vocab.get_vector {id="get_vector",tag="method",version="2"}
Retrieve a vector for a word in the vocabulary. Words can be looked up by string
or hash value. If the current vectors do not contain an entry for the word, a
@ -194,7 +194,7 @@ or hash value. If the current vectors do not contain an entry for the word, a
| `orth` | The hash value of a word, or its unicode string. ~~Union[int, str]~~ |
| **RETURNS** | A word vector. Size and shape are determined by the `Vocab.vectors` instance. ~~numpy.ndarray[ndim=1, dtype=float32]~~ |
## Vocab.set_vector {id="set_vector",tag="method",new="2"}
## Vocab.set_vector {id="set_vector",tag="method",version="2"}
Set a vector for a word in the vocabulary. Words can be referenced by string or
hash value.
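A small sketch; the vector width of 300 is arbitrary and only for illustration:

```python
import numpy
import spacy

nlp = spacy.blank("en")
nlp.vocab.set_vector("apple", numpy.random.uniform(-1, 1, (300,)))
print(nlp.vocab.has_vector("apple"))
```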
@ -210,7 +210,7 @@ hash value.
| `orth` | The hash value of a word, or its unicode string. ~~Union[int, str]~~ |
| `vector` | The vector to set. ~~numpy.ndarray[ndim=1, dtype=float32]~~ |
## Vocab.has_vector {id="has_vector",tag="method",new="2"}
## Vocab.has_vector {id="has_vector",tag="method",version="2"}
Check whether a word has a vector. Returns `False` if no vectors are loaded.
Words can be looked up by string or hash value.
@ -227,7 +227,7 @@ Words can be looked up by string or hash value.
| `orth` | The hash value of a word, or its unicode string. ~~Union[int, str]~~ |
| **RETURNS** | Whether the word has a vector. ~~bool~~ |
## Vocab.to_disk {id="to_disk",tag="method",new="2"}
## Vocab.to_disk {id="to_disk",tag="method",version="2"}
Save the current state to a directory.
@ -243,7 +243,7 @@ Save the current state to a directory.
| _keyword-only_ | |
| `exclude` | String names of [serialization fields](#serialization-fields) to exclude. ~~Iterable[str]~~ |
## Vocab.from_disk {id="from_disk",tag="method",new="2"}
## Vocab.from_disk {id="from_disk",tag="method",version="2"}
Loads state from a directory. Modifies the object in place and returns it.
View File
@ -114,12 +114,12 @@ example, to add a tag for the documented type or mark features that have been
introduced in a specific version or require statistical models to be loaded.
Tags are also available as standalone `<Tag />` components.
| Argument | Example | Result |
| -------- | -------------------------- | ----------------------------------------- |
| `tag` | `{tag="method"}` | <Tag>method</Tag> |
| `new` | `{new="3"}` | <Tag variant="new">3</Tag> |
| `model` | `{model="tagger, parser"}` | <Tag variant="model">tagger, parser</Tag> |
| `hidden` | `{hidden="true"}` | |
| Argument | Example | Result |
| --------- | -------------------------- | ----------------------------------------- |
| `tag` | `{tag="method"}` | <Tag>method</Tag> |
| `version` | `{version="3"}` | <Tag variant="new">3</Tag> |
| `model` | `{model="tagger, parser"}` | <Tag variant="model">tagger, parser</Tag> |
| `hidden` | `{hidden="true"}` | |
## Elements {id="elements"}
@ -165,7 +165,7 @@ import Tag from 'components/tag'
> ```jsx
> <Tag>method</Tag>
> <Tag variant="new">4</Tag>
> <Tag variant="version">4</Tag>
> <Tag variant="model">tagger, parser</Tag>
> ```
View File
@ -121,7 +121,7 @@ $ pip install -U %%SPACY_PKG_NAME%%SPACY_PKG_FLAGS
$ python -m spacy validate
```
### Run spaCy with GPU {id="gpu",new="2.0.14"}
### Run spaCy with GPU {id="gpu",version="2.0.14"}
As of v2.0, spaCy comes with neural network models that are implemented in our
machine learning library, [Thinc](https://thinc.ai). For GPU support, we've been
View File
@ -76,7 +76,7 @@ print(token.morph) # 'Case=Nom|Number=Sing|Person=1|PronType=Prs'
print(token.morph.get("PronType")) # ['Prs']
```
### Statistical morphology {id="morphologizer",new="3",model="morphologizer"}
### Statistical morphology {id="morphologizer",version="3",model="morphologizer"}
spaCy's statistical [`Morphologizer`](/api/morphologizer) component assigns the
morphological features and coarse-grained part-of-speech tags as `Token.morph`
@ -118,7 +118,7 @@ print(doc[2].morph) # 'Case=Nom|Person=2|PronType=Prs'
print(doc[2].pos_) # 'PRON'
```
## Lemmatization {id="lemmatization",model="lemmatizer",new="3"}
## Lemmatization {id="lemmatization",model="lemmatizer",version="3"}
spaCy provides two pipeline components for lemmatization:
@ -959,7 +959,7 @@ nlp.tokenizer.add_special_case("...gimme...?", [{"ORTH": "...gimme...?"}])
assert len(nlp("...gimme...?")) == 1
```
#### Debugging the tokenizer {id="tokenizer-debug",new="2.2.3"}
#### Debugging the tokenizer {id="tokenizer-debug",version="2.2.3"}
A working implementation of the pseudo-code above is available for debugging as
[`nlp.tokenizer.explain(text)`](/api/tokenizer#explain). It returns a list of
@ -1287,7 +1287,7 @@ tokenizer** it will be using at runtime. See the docs on
</Infobox>
#### Training with custom tokenization {id="custom-tokenizer-training",new="3"}
#### Training with custom tokenization {id="custom-tokenizer-training",version="3"}
spaCy's [training config](/usage/training#config) describes the settings,
hyperparameters, pipeline and tokenizer used for constructing and training the
@ -1456,7 +1456,7 @@ tokenizations add up to the same string. For example, you'll be able to align
</Infobox>
## Merging and splitting {id="retokenization",new="2.1"}
## Merging and splitting {id="retokenization",version="2.1"}
The [`Doc.retokenize`](/api/doc#retokenize) context manager lets you merge and
split tokens. Modifications to the tokenization are stored and performed all at
@ -1709,7 +1709,7 @@ your `Doc` using custom components _before_ it's parsed. Depending on your text,
this may also improve parse accuracy, since the parser is constrained to predict
parses consistent with the sentence boundaries.
### Statistical sentence segmenter {id="sbd-senter",model="senter",new="3"}
### Statistical sentence segmenter {id="sbd-senter",model="senter",version="3"}
The [`SentenceRecognizer`](/api/sentencerecognizer) is a simple statistical
component that only provides sentence boundaries. Along with being faster and
@ -1810,7 +1810,7 @@ doc = nlp(text)
print("After:", [sent.text for sent in doc.sents])
```
## Mappings & Exceptions {id="mappings-exceptions",new="3"}
## Mappings & Exceptions {id="mappings-exceptions",version="3"}
The [`AttributeRuler`](/api/attributeruler) manages **rule-based mappings and
exceptions** for all token-level attributes. As the number of
View File
@ -74,7 +74,7 @@ import Languages from 'widgets/languages.js'
<Languages />
### Multi-language support {id="multi-language",new="2"}
### Multi-language support {id="multi-language",version="2"}
> ```python
> # Standard import
@ -96,7 +96,7 @@ To train a pipeline using the neutral multi-language class, you can set
import the `MultiLanguage` class directly, or call
[`spacy.blank("xx")`](/api/top-level#spacy.blank) for lazy-loading.
### Chinese language support {id="chinese",new="2.3"}
### Chinese language support {id="chinese",version="2.3"}
The Chinese language class supports three word segmentation options, `char`,
`jieba` and `pkuseg`.
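A hedged sketch of how the segmenter could be selected (the exact config nesting is an assumption here; `jieba` must be installed separately):

```python
from spacy.lang.zh import Chinese

# default: character segmentation
nlp_char = Chinese()

# jieba word segmentation (requires the jieba package)
cfg = {"nlp": {"tokenizer": {"segmenter": "jieba"}}}
nlp_jieba = Chinese.from_config(cfg)
```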
View File
@ -461,7 +461,7 @@ run as part of the pipeline.
| `nlp.component_names` | All component names, including disabled components. |
| `nlp.disabled` | Names of components that are currently disabled. |
### Sourcing components from existing pipelines {id="sourced-components",new="3"}
### Sourcing components from existing pipelines {id="sourced-components",version="3"}
Pipeline components that are independent can also be reused across pipelines.
Instead of adding a new blank component, you can also copy an existing component
@ -518,7 +518,7 @@ nlp.add_pipe("ner", source=source_nlp)
print(nlp.pipe_names)
```
### Analyzing pipeline components {id="analysis",new="3"}
### Analyzing pipeline components {id="analysis",version="3"}
The [`nlp.analyze_pipes`](/api/language#analyze_pipes) method analyzes the
components in the current pipeline and outputs information about them like the
@ -838,7 +838,7 @@ make your factory a separate function. That's also how spaCy does it internally.
</Accordion>
### Language-specific factories {id="factories-language",new="3"}
### Language-specific factories {id="factories-language",version="3"}
There are many use cases where you might want your pipeline components to be
language-specific. Sometimes this requires entirely different implementation per
@ -1197,7 +1197,7 @@ object is saved to disk, which will run the component's `to_disk` method. When
the pipeline is loaded back into spaCy later to use it, the `from_disk` method
will load the data back in.
## Python type hints and validation {id="type-hints",new="3"}
## Python type hints and validation {id="type-hints",version="3"}
spaCy's configs are powered by our machine learning library Thinc's
[configuration system](https://thinc.ai/docs/usage-config), which supports
@ -1267,7 +1267,7 @@ nlp.add_pipe("debug", config={"log_level": "DEBUG"})
doc = nlp("This is a text...")
```
## Trainable components {id="trainable-components",new="3"}
## Trainable components {id="trainable-components",version="3"}
spaCy's [`TrainablePipe`](/api/pipe) class helps you implement your own
trainable components that have their own model instance, make predictions over
@ -1384,7 +1384,7 @@ into your spaCy pipeline, see the usage guide on
</Infobox>
## Extension attributes {id="custom-components-attributes",new="2"}
## Extension attributes {id="custom-components-attributes",version="2"}
spaCy allows you to set any custom attributes and methods on the `Doc`, `Span`
and `Token`, which become available as `Doc._`, `Span._` and `Token._` for
View File
@ -1,6 +1,6 @@
---
title: Projects
new: 3
version: 3
menu:
- ['Intro & Workflow', 'intro']
- ['Directory & Assets', 'directory']
View File
@ -218,7 +218,7 @@ spaCy processes your text and why your pattern matches, or why it doesn't.
</Infobox>
#### Extended pattern syntax and attributes {id="adding-patterns-attributes-extended",new="2.1"}
#### Extended pattern syntax and attributes {id="adding-patterns-attributes-extended",version="2.1"}
Instead of mapping to a single value, token patterns can also map to a
**dictionary of properties**. For example, to specify that the value of a lemma
@ -251,7 +251,7 @@ following rich comparison attributes are available:
| `INTERSECTS` | Attribute value (for `MORPH` or custom list attributes) has a non-empty intersection with a list. ~~Any~~ |
| `==`, `>=`, `<=`, `>`, `<` | Attribute value is equal, greater or equal, smaller or equal, greater or smaller. ~~Union[int, float]~~ |
#### Regular expressions {id="regex",new="2.1"}
#### Regular expressions {id="regex",version="2.1"}
In some cases, only matching tokens and token attributes isn't enough. For
example, you might want to match different spellings of a word, without having
@ -402,7 +402,7 @@ This quirk in the semantics is corrected in spaCy v2.1.0.
</Infobox>
#### Using wildcard token patterns {id="adding-patterns-wildcard",new="2"}
#### Using wildcard token patterns {id="adding-patterns-wildcard",version="2"}
While the token attributes offer many options to write highly specific patterns,
you can also use an empty dictionary, `{}` as a wildcard representing **any
@ -416,7 +416,7 @@ character, but no whitespace so you'll know it will be handled as one token.
[{"ORTH": "User"}, {"ORTH": "name"}, {"ORTH": ":"}, {}]
```
#### Validating and debugging patterns {id="pattern-validation",new="2.1"}
#### Validating and debugging patterns {id="pattern-validation",version="2.1"}
The `Matcher` can validate patterns against a JSON schema with the option
`validate=True`. This is useful for debugging patterns during development, in
@ -927,7 +927,7 @@ as a stream.
</Infobox>
### Matching on other token attributes {id="phrasematcher-attrs",new="2.1"}
### Matching on other token attributes {id="phrasematcher-attrs",version="2.1"}
By default, the `PhraseMatcher` will match on the verbatim token text, e.g.
`Token.text`. By setting the `attr` argument on initialization, you can change
@ -991,7 +991,7 @@ to match phrases with the same sequence of punctuation and non-punctuation
tokens as the pattern. But this can easily get confusing and doesn't have much
of an advantage over writing one or two token patterns.
## Dependency Matcher {id="dependencymatcher",new="3",model="parser"}
## Dependency Matcher {id="dependencymatcher",version="3",model="parser"}
The [`DependencyMatcher`](/api/dependencymatcher) lets you match patterns within
the dependency parse using
@ -1272,7 +1272,7 @@ of patterns such as `{}` that match any token in the sentence.
</Infobox>
## Rule-based entity recognition {id="entityruler",new="2.1"}
## Rule-based entity recognition {id="entityruler",version="2.1"}
The [`EntityRuler`](/api/entityruler) is a component that lets you add named
entities based on pattern dictionaries, which makes it easy to combine
@ -1343,7 +1343,7 @@ doc = nlp("MyCorp Inc. is a company in the U.S.")
print([(ent.text, ent.label_) for ent in doc.ents])
```
#### Validating and debugging EntityRuler patterns {id="entityruler-pattern-validation",new="2.1.8"}
#### Validating and debugging EntityRuler patterns {id="entityruler-pattern-validation",version="2.1.8"}
The entity ruler can validate patterns against a JSON schema with the config
setting `"validate"`. See details under
@ -1353,7 +1353,7 @@ setting `"validate"`. See details under
ruler = nlp.add_pipe("entity_ruler", config={"validate": True})
```
### Adding IDs to patterns {id="entityruler-ent-ids",new="2.2.2"}
### Adding IDs to patterns {id="entityruler-ent-ids",version="2.2.2"}
The [`EntityRuler`](/api/entityruler) can also accept an `id` attribute for each
pattern. Using the `id` attribute allows multiple patterns to be associated with
@ -1427,7 +1427,7 @@ all pipeline components will be restored and deserialized including the enti
ruler. This lets you ship powerful pipeline packages with binary weights _and_
rules included!
### Using a large number of phrase patterns {id="entityruler-large-phrase-patterns",new="2.2.4"}
### Using a large number of phrase patterns {id="entityruler-large-phrase-patterns",version="2.2.4"}
<!-- TODO: double-check that this still works if the ruler is added to the pipeline on creation, and include suggestion if needed -->
@ -1455,7 +1455,7 @@ with nlp.select_pipes(enable="tagger"):
ruler.add_patterns(patterns)
```
## Rule-based span matching {id="spanruler",new="3.3.1"}
## Rule-based span matching {id="spanruler",version="3.3.1"}
The [`SpanRuler`](/api/spanruler) is a generalized version of the entity ruler
that lets you add spans to `doc.spans` or `doc.ents` based on pattern
View File
@ -49,7 +49,7 @@ the language class, creates and adds the pipeline components based on the config
and _then_ loads in the binary data. You can read more about this process
[here](/usage/processing-pipelines#pipelines).
## Serializing Doc objects efficiently {id="docs",new="2.2"}
## Serializing Doc objects efficiently {id="docs",version="2.2"}
If you're working with lots of data, you'll probably need to pass analyses
between machines, either to use something like [Dask](https://dask.org) or
@ -292,9 +292,9 @@ custom components to spaCy automatically.
</Infobox>
<!-- ## Initializing components with data {id="initialization",new="3"} -->
<!-- ## Initializing components with data {id="initialization",version="3"} -->
## Using entry points {id="entry-points",new="2.1"}
## Using entry points {id="entry-points",version="2.1"}
Entry points let you expose parts of a Python package you write to other Python
packages. This lets one application easily customize the behavior of another, by
@ -540,7 +540,7 @@ pipeline packages you [train](/usage/training), which could then specify
`lang = snk` in their `config.cfg` without spaCy raising an error because the
language is not available in the core library.
### Custom displaCy colors via entry points {id="entry-points-displacy",new="2.2"}
### Custom displaCy colors via entry points {id="entry-points-displacy",version="2.2"}
If you're training a named entity recognition model for a custom domain, you may
end up training different labels that don't have pre-defined colors in the
View File
@ -518,7 +518,7 @@ replace_listeners = ["model.tok2vec"]
</Infobox>
### Using predictions from preceding components {id="annotating-components",new="3.1"}
### Using predictions from preceding components {id="annotating-components",version="3.1"}
By default, components are updated in isolation during training, which means
that they don't see the predictions of any earlier components in the pipeline. A
@ -1657,7 +1657,7 @@ typically give you everything you need to train fully custom pipelines with
</Infobox>
### Training from a Python script {id="api-train",new="3.2"}
### Training from a Python script {id="api-train",version="3.2"}
If you want to run the training from a Python script instead of using the
[`spacy train`](/api/cli#train) CLI command, you can call into the
View File
@ -1,7 +1,7 @@
---
title: Visualizers
teaser: Visualize dependencies and entities in your browser or in a notebook
new: 2
version: 2
menu:
- ['Dependencies', 'dep']
- ['Named Entities', 'ent']
@ -79,7 +79,7 @@ For a list of all available options, see the
![displaCy visualizer (compact mode)](/images/displacy-compact.svg)
### Visualizing long texts {id="dep-long-text",new="2.0.12"}
### Visualizing long texts {id="dep-long-text",version="2.0.12"}
Long texts can become difficult to read when displayed in one row, so it's often
better to visualize them sentence-by-sentence instead. As of v2.0.12, `displacy`
View File
@ -83,7 +83,7 @@ const Headline = ({
Component,
id,
name,
new: version,
version,
model,
tag,
source,
@ -136,7 +136,7 @@ const Headline = ({
Headline.propTypes = {
Component: PropTypes.oneOfType([PropTypes.element, PropTypes.string]).isRequired,
id: PropTypes.oneOfType([PropTypes.string, PropTypes.oneOf([false])]),
new: PropTypes.string,
version: PropTypes.string,
model: PropTypes.string,
source: PropTypes.string,
tag: PropTypes.string,