Mirror of https://github.com/explosion/spaCy.git (synced 2025-01-26 09:14:32 +03:00)
Rename heading attribute
`new` was causing some weird issues, so renaming it to `version`
parent 5f5d09f9dc
commit 94aa3629bb
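The change is mechanical: every `new` frontmatter key and every `new="..."` prop in the docs' section headings becomes `version`, keeping the same value. A representative excerpt drawn from the hunks below (not the complete changeset):

```diff
 title: AttributeRuler
 tag: class
 source: spacy/pipeline/attributeruler.py
-new: 3
+version: 3

-## validate {id="validate",new="2",tag="command"}
+## validate {id="validate",version="2",tag="command"}
```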
@@ -2,7 +2,7 @@
title: AttributeRuler
tag: class
source: spacy/pipeline/attributeruler.py
-new: 3
+version: 3
teaser: 'Pipeline component for rule-based token attribute assignment'
api_string_name: attribute_ruler
api_trainable: false
@@ -87,7 +87,7 @@ $ python -m spacy info [model] [--markdown] [--silent] [--exclude]
| `--help`, `-h` | Show help message and available arguments. ~~bool (flag)~~ |
| **PRINTS** | Information about your spaCy installation. |

-## validate {id="validate",new="2",tag="command"}
+## validate {id="validate",version="2",tag="command"}

Find all trained pipeline packages installed in the current environment and
check whether they are compatible with the currently installed version of spaCy.

@@ -110,12 +110,12 @@ $ python -m spacy validate
| ---------- | -------------------------------------------------------------------- |
| **PRINTS** | Details about the compatibility of your installed pipeline packages. |

-## init {id="init",new="3"}
+## init {id="init",version="3"}

The `spacy init` CLI includes helpful commands for initializing training config
files and pipeline directories.

-### init config {id="init-config",new="3",tag="command"}
+### init config {id="init-config",version="3",tag="command"}

Initialize and save a [`config.cfg` file](/usage/training#config) using the
**recommended settings** for your use case. It works just like the

@@ -147,7 +147,7 @@ $ python -m spacy init config [output_file] [--lang] [--pipeline] [--optimize] [
| `--help`, `-h` | Show help message and available arguments. ~~bool (flag)~~ |
| **CREATES** | The config file for training. |

-### init fill-config {id="init-fill-config",new="3"}
+### init fill-config {id="init-fill-config",version="3"}

Auto-fill a partial [.cfg file](/usage/training#config) with **all default
values**, e.g. a config generated with the

@@ -183,7 +183,7 @@ $ python -m spacy init fill-config [base_path] [output_file] [--diff]
| `--help`, `-h` | Show help message and available arguments. ~~bool (flag)~~ |
| **CREATES** | Complete and auto-filled config file for training. |

-### init vectors {id="init-vectors",new="3",tag="command"}
+### init vectors {id="init-vectors",version="3",tag="command"}

Convert [word vectors](/usage/linguistic-features#vectors-similarity) for use
with spaCy. Will export an `nlp` object that you can use in the

@@ -215,7 +215,7 @@ $ python -m spacy init vectors [lang] [vectors_loc] [output_dir] [--prune] [--tr
| `--help`, `-h` | Show help message and available arguments. ~~bool (flag)~~ |
| **CREATES** | A spaCy pipeline directory containing the vocab and vectors. |

-### init labels {id="init-labels",new="3",tag="command"}
+### init labels {id="init-labels",version="3",tag="command"}

Generate JSON files for the labels in the data. This helps speed up the training
process, since spaCy won't have to preprocess the data to extract the labels.

@@ -287,12 +287,12 @@ $ python -m spacy convert [input_file] [output_dir] [--converter] [--file-type]
| `ner` / `conll` | NER with IOB/IOB2/BILUO tags, one token per line with columns separated by whitespace. The first column is the token and the final column is the NER tag. Sentences are separated by blank lines and documents are separated by the line `-DOCSTART- -X- O O`. Supports CoNLL 2003 NER format. See [sample data](%%GITHUB_SPACY/extra/example_data/ner_example_data). |
| `iob` | NER with IOB/IOB2/BILUO tags, one sentence per line with tokens separated by whitespace and annotation separated by `\|`, either `word\|B-ENT`or`word\|POS\|B-ENT`. See [sample data](%%GITHUB_SPACY/extra/example_data/ner_example_data). |

-## debug {id="debug",new="3"}
+## debug {id="debug",version="3"}

The `spacy debug` CLI includes helpful commands for debugging and profiling your
configs, data and implementations.

-### debug config {id="debug-config",new="3",tag="command"}
+### debug config {id="debug-config",version="3",tag="command"}

Debug a [`config.cfg` file](/usage/training#config) and show validation errors.
The command will create all objects in the tree and validate them. Note that

@@ -893,7 +893,7 @@ $ python -m spacy debug profile [model] [inputs] [--n-texts]
| `--help`, `-h` | Show help message and available arguments. ~~bool (flag)~~ |
| **PRINTS** | Profiling information for the pipeline. |

-### debug model {id="debug-model",new="3",tag="command"}
+### debug model {id="debug-model",version="3",tag="command"}

Debug a Thinc [`Model`](https://thinc.ai/docs/api-model) by running it on a
sample text and checking how it updates its internal weights and parameters.

@@ -1061,7 +1061,7 @@ $ python -m spacy train [config_path] [--output] [--code] [--verbose] [--gpu-id]
| overrides | Config parameters to override. Should be options starting with `--` that correspond to the config section and value to override, e.g. `--paths.train ./train.spacy`. ~~Any (option/flag)~~ |
| **CREATES** | The final trained pipeline and the best trained pipeline. |

-### Calling the training function from Python {id="train-function",new="3.2"}
+### Calling the training function from Python {id="train-function",version="3.2"}

The training CLI exposes a `train` helper function that lets you run the
training just like `spacy train`. Usually it's easier to use the command line

@@ -1084,7 +1084,7 @@ directly, but if you need to kick off training from code this is how to do it.
| `use_gpu` | Which GPU to use. Defaults to -1 for no GPU. ~~int~~ |
| `overrides` | Values to override config settings. ~~Dict[str, Any]~~ |

-## pretrain {id="pretrain",new="2.1",tag="command,experimental"}
+## pretrain {id="pretrain",version="2.1",tag="command,experimental"}

Pretrain the "token to vector" ([`Tok2vec`](/api/tok2vec)) layer of pipeline
components on raw text, using an approximate language-modeling objective.

@@ -1132,7 +1132,7 @@ $ python -m spacy pretrain [config_path] [output_dir] [--code] [--resume-path] [
| overrides | Config parameters to override. Should be options starting with `--` that correspond to the config section and value to override, e.g. `--training.dropout 0.2`. ~~Any (option/flag)~~ |
| **CREATES** | The pretrained weights that can be used to initialize `spacy train`. |

-## evaluate {id="evaluate",new="2",tag="command"}
+## evaluate {id="evaluate",version="2",tag="command"}

Evaluate a trained pipeline. Expects a loadable spaCy pipeline (package name or
path) and evaluation data in the

@@ -1162,7 +1162,7 @@ $ python -m spacy evaluate [model] [data_path] [--output] [--code] [--gold-prepr
| `--help`, `-h` | Show help message and available arguments. ~~bool (flag)~~ |
| **CREATES** | Training results and optional metrics and visualizations. |

-## find-threshold {id="find-threshold",new="3.5",tag="command"}
+## find-threshold {id="find-threshold",version="3.5",tag="command"}

Runs prediction trials for a trained model with varying tresholds to maximize
the specified metric. The search space for the threshold is traversed linearly

@@ -1281,7 +1281,7 @@ $ python -m spacy package [input_dir] [output_dir] [--code] [--meta-path] [--cre
| `--help`, `-h` | Show help message and available arguments. ~~bool (flag)~~ |
| **CREATES** | A Python package containing the spaCy pipeline. |

-## project {id="project",new="3"}
+## project {id="project",version="3"}

The `spacy project` CLI includes subcommands for working with
[spaCy projects](/usage/projects), end-to-end workflows for building and

@@ -1543,7 +1543,7 @@ $ python -m spacy project dvc [project_dir] [workflow] [--force] [--verbose] [--
| `--help`, `-h` | Show help message and available arguments. ~~bool (flag)~~ |
| **CREATES** | A `dvc.yaml` file in the project directory, based on the steps defined in the given workflow. |

-## huggingface-hub {id="huggingface-hub",new="3.1"}
+## huggingface-hub {id="huggingface-hub",version="3.1"}

The `spacy huggingface-cli` CLI includes commands for uploading your trained
spaCy pipelines to the [Hugging Face Hub](https://huggingface.co/).
@@ -3,7 +3,7 @@ title: Corpus
teaser: An annotated corpus
tag: class
source: spacy/training/corpus.py
-new: 3
+version: 3
---

This class manages annotated corpora and can be used for training and
@@ -14,7 +14,7 @@ vocabulary data. For an overview of label schemes used by the models, see the
[models directory](/models). Each trained pipeline documents the label schemes
used in its components, depending on the data it was trained on.

-## Training config {id="config",new="3"}
+## Training config {id="config",version="3"}

Config files define the training process and pipeline and can be passed to
[`spacy train`](/api/cli#train). They use

@@ -257,7 +257,7 @@ Also see the usage guides on the

## Training data {id="training"}

-### Binary training format {id="binary-training",new="3"}
+### Binary training format {id="binary-training",version="3"}

> #### Example
>

@@ -466,7 +466,7 @@ gold_dict = {"entities": [(0, 12, "PERSON")],
example = Example.from_dict(doc, gold_dict)
```

-## Lexical data for vocabulary {id="vocab-jsonl",new="2"}
+## Lexical data for vocabulary {id="vocab-jsonl",version="2"}

This data file can be provided via the `vocab_data` setting in the
`[initialize]` block of the training config to pre-define the lexical data to
@@ -2,7 +2,7 @@
title: DependencyMatcher
teaser: Match subtrees within a dependency parse
tag: class
-new: 3
+version: 3
source: spacy/matcher/dependencymatcher.pyx
---
@@ -155,7 +155,7 @@ applied to the `Doc` in order. Both [`__call__`](/api/dependencyparser#call) and
| `batch_size` | The number of documents to buffer. Defaults to `128`. ~~int~~ |
| **YIELDS** | The processed documents in order. ~~Doc~~ |

-## DependencyParser.initialize {id="initialize",tag="method",new="3"}
+## DependencyParser.initialize {id="initialize",tag="method",version="3"}

Initialize the component for training. `get_examples` should be a function that
returns an iterable of [`Example`](/api/example) objects. **At least one example

@@ -432,7 +432,7 @@ The labels currently added to the component.
| ----------- | ------------------------------------------------------ |
| **RETURNS** | The labels added to the component. ~~Tuple[str, ...]~~ |

-## DependencyParser.label_data {id="label_data",tag="property",new="3"}
+## DependencyParser.label_data {id="label_data",tag="property",version="3"}

The labels currently added to the component and their internal meta information.
This is the data generated by [`init labels`](/api/cli#init-labels) and used by
@@ -115,7 +115,7 @@ Get the number of tokens in the document.
| ----------- | --------------------------------------------- |
| **RETURNS** | The number of tokens in the document. ~~int~~ |

-## Doc.set_extension {id="set_extension",tag="classmethod",new="2"}
+## Doc.set_extension {id="set_extension",tag="classmethod",version="2"}

Define a custom attribute on the `Doc` which becomes available via `Doc._`. For
details, see the documentation on

@@ -140,7 +140,7 @@ details, see the documentation on
| `setter` | Setter function that takes the `Doc` and a value, and modifies the object. Is called when the user writes to the `Doc._` attribute. ~~Optional[Callable[[Doc, Any], None]]~~ |
| `force` | Force overwriting existing attribute. ~~bool~~ |

-## Doc.get_extension {id="get_extension",tag="classmethod",new="2"}
+## Doc.get_extension {id="get_extension",tag="classmethod",version="2"}

Look up a previously registered extension by name. Returns a 4-tuple
`(default, method, getter, setter)` if the extension is registered. Raises a

@@ -160,7 +160,7 @@ Look up a previously registered extension by name. Returns a 4-tuple
| `name` | Name of the extension. ~~str~~ |
| **RETURNS** | A `(default, method, getter, setter)` tuple of the extension. ~~Tuple[Optional[Any], Optional[Callable], Optional[Callable], Optional[Callable]]~~ |

-## Doc.has_extension {id="has_extension",tag="classmethod",new="2"}
+## Doc.has_extension {id="has_extension",tag="classmethod",version="2"}

Check whether an extension has been registered on the `Doc` class.

@@ -177,7 +177,7 @@ Check whether an extension has been registered on the `Doc` class.
| `name` | Name of the extension to check. ~~str~~ |
| **RETURNS** | Whether the extension has been registered. ~~bool~~ |

-## Doc.remove_extension {id="remove_extension",tag="classmethod",new="2.0.12"}
+## Doc.remove_extension {id="remove_extension",tag="classmethod",version="2.0.12"}

Remove a previously registered extension.

@@ -195,7 +195,7 @@ Remove a previously registered extension.
| `name` | Name of the extension. ~~str~~ |
| **RETURNS** | A `(default, method, getter, setter)` tuple of the removed extension. ~~Tuple[Optional[Any], Optional[Callable], Optional[Callable], Optional[Callable]]~~ |

-## Doc.char_span {id="char_span",tag="method",new="2"}
+## Doc.char_span {id="char_span",tag="method",version="2"}

Create a `Span` object from the slice `doc.text[start_idx:end_idx]`. Returns
`None` if the character indices don't map to a valid span using the default

@@ -219,7 +219,7 @@ alignment mode `"strict".
| `alignment_mode` | How character indices snap to token boundaries. Options: `"strict"` (no snapping), `"contract"` (span of all tokens completely within the character span), `"expand"` (span of all tokens at least partially covered by the character span). Defaults to `"strict"`. ~~str~~ |
| **RETURNS** | The newly constructed object or `None`. ~~Optional[Span]~~ |

-## Doc.set_ents {id="set_ents",tag="method",new="3"}
+## Doc.set_ents {id="set_ents",tag="method",version="3"}

Set the named entities in the document.

@@ -379,7 +379,7 @@ array of attributes.
| `exclude` | String names of [serialization fields](#serialization-fields) to exclude. ~~Iterable[str]~~ |
| **RETURNS** | The `Doc` itself. ~~Doc~~ |

-## Doc.from_docs {id="from_docs",tag="staticmethod",new="3"}
+## Doc.from_docs {id="from_docs",tag="staticmethod",version="3"}

Concatenate multiple `Doc` objects to form a new one. Raises an error if the
`Doc` objects do not all share the same `Vocab`.

@@ -408,7 +408,7 @@ Concatenate multiple `Doc` objects to form a new one. Raises an error if the
| `exclude` <Tag variant="new">3.3</Tag> | String names of Doc attributes to exclude. Supported: `spans`, `tensor`, `user_data`. ~~Iterable[str]~~ |
| **RETURNS** | The new `Doc` object that is containing the other docs or `None`, if `docs` is empty or `None`. ~~Optional[Doc]~~ |

-## Doc.to_disk {id="to_disk",tag="method",new="2"}
+## Doc.to_disk {id="to_disk",tag="method",version="2"}

Save the current state to a directory.

@@ -424,7 +424,7 @@ Save the current state to a directory.
| _keyword-only_ | |
| `exclude` | String names of [serialization fields](#serialization-fields) to exclude. ~~Iterable[str]~~ |

-## Doc.from_disk {id="from_disk",tag="method",new="2"}
+## Doc.from_disk {id="from_disk",tag="method",version="2"}

Loads state from a directory. Modifies the object in place and returns it.

@@ -498,7 +498,7 @@ deprecated [`JSON training format`](/api/data-formats#json-input).
| `underscore` | Optional list of string names of custom `Doc` attributes. Attribute values need to be JSON-serializable. Values will be added to an `"_"` key in the data, e.g. `"_": {"foo": "bar"}`. ~~Optional[List[str]]~~ |
| **RETURNS** | The data in JSON format. ~~Dict[str, Any]~~ |

-## Doc.from_json {id="from_json",tag="method",new="3.3.1"}
+## Doc.from_json {id="from_json",tag="method",version="3.3.1"}

Deserializes a document from JSON, i.e. generates a document from the provided
JSON data as generated by [`Doc.to_json()`](/api/doc#to_json).

@@ -520,7 +520,7 @@ JSON data as generated by [`Doc.to_json()`](/api/doc#to_json).
| `validate` | Whether to validate the JSON input against the expected schema for detailed debugging. Defaults to `False`. ~~bool~~ |
| **RETURNS** | A `Doc` corresponding to the provided JSON. ~~Doc~~ |

-## Doc.retokenize {id="retokenize",tag="contextmanager",new="2.1"}
+## Doc.retokenize {id="retokenize",tag="contextmanager",version="2.1"}

Context manager to handle retokenization of the `Doc`. Modifications to the
`Doc`'s tokenization are stored, and then made all at once when the context
@@ -1,7 +1,7 @@
---
title: DocBin
tag: class
-new: 2.2
+version: 2.2
teaser: Pack Doc objects for binary serialization
source: spacy/tokens/_serialize.py
---

@@ -150,7 +150,7 @@ Deserialize the `DocBin`'s annotations from a bytestring.
| `bytes_data` | The data to load from. ~~bytes~~ |
| **RETURNS** | The loaded `DocBin`. ~~DocBin~~ |

-## DocBin.to_disk {id="to_disk",tag="method",new="3"}
+## DocBin.to_disk {id="to_disk",tag="method",version="3"}

Save the serialized `DocBin` to a file. Typically uses the `.spacy` extension
and the result can be used as the input data for

@@ -168,7 +168,7 @@ and the result can be used as the input data for
| -------- | -------------------------------------------------------------------------- |
| `path` | The file path, typically with the `.spacy` extension. ~~Union[str, Path]~~ |

-## DocBin.from_disk {id="from_disk",tag="method",new="3"}
+## DocBin.from_disk {id="from_disk",tag="method",version="3"}

Load a serialized `DocBin` from a file. Typically uses the `.spacy` extension.
@@ -2,7 +2,7 @@
title: EditTreeLemmatizer
tag: class
source: spacy/pipeline/edit_tree_lemmatizer.py
-new: 3.3
+version: 3.3
teaser: 'Pipeline component for lemmatization'
api_base_class: /api/pipe
api_string_name: trainable_lemmatizer

@@ -138,7 +138,7 @@ and [`pipe`](/api/edittreelemmatizer#pipe) delegate to the
| `batch_size` | The number of documents to buffer. Defaults to `128`. ~~int~~ |
| **YIELDS** | The processed documents in order. ~~Doc~~ |

-## EditTreeLemmatizer.initialize {id="initialize",tag="method",new="3"}
+## EditTreeLemmatizer.initialize {id="initialize",tag="method",version="3"}

Initialize the component for training. `get_examples` should be a function that
returns an iterable of [`Example`](/api/example) objects. **At least one example

@@ -371,7 +371,7 @@ identifiers of edit trees.
| ----------- | ------------------------------------------------------ |
| **RETURNS** | The labels added to the component. ~~Tuple[str, ...]~~ |

-## EditTreeLemmatizer.label_data {id="label_data",tag="property",new="3"}
+## EditTreeLemmatizer.label_data {id="label_data",tag="property",version="3"}

The labels currently added to the component and their internal meta information.
This is the data generated by [`init labels`](/api/cli#init-labels) and used by
@@ -2,7 +2,7 @@
title: EntityLinker
tag: class
source: spacy/pipeline/entity_linker.py
-new: 2.2
+version: 2.2
teaser: 'Pipeline component for named entity linking and disambiguation'
api_base_class: /api/pipe
api_string_name: entity_linker

@@ -161,7 +161,7 @@ applied to the `Doc` in order. Both [`__call__`](/api/entitylinker#call) and
| `batch_size` | The number of documents to buffer. Defaults to `128`. ~~int~~ |
| **YIELDS** | The processed documents in order. ~~Doc~~ |

-## EntityLinker.set_kb {id="set_kb",tag="method",new="3"}
+## EntityLinker.set_kb {id="set_kb",tag="method",version="3"}

The `kb_loader` should be a function that takes a `Vocab` instance and creates
the `KnowledgeBase`, ensuring that the strings of the knowledge base are synced

@@ -183,7 +183,7 @@ with the current vocab.
| ----------- | ---------------------------------------------------------------------------------------------------------------- |
| `kb_loader` | Function that creates a [`KnowledgeBase`](/api/kb) from a `Vocab` instance. ~~Callable[[Vocab], KnowledgeBase]~~ |

-## EntityLinker.initialize {id="initialize",tag="method",new="3"}
+## EntityLinker.initialize {id="initialize",tag="method",version="3"}

Initialize the component for training. `get_examples` should be a function that
returns an iterable of [`Example`](/api/example) objects. **At least one example
@@ -151,7 +151,7 @@ applied to the `Doc` in order. Both [`__call__`](/api/entityrecognizer#call) and
| `batch_size` | The number of documents to buffer. Defaults to `128`. ~~int~~ |
| **YIELDS** | The processed documents in order. ~~Doc~~ |

-## EntityRecognizer.initialize {id="initialize",tag="method",new="3"}
+## EntityRecognizer.initialize {id="initialize",tag="method",version="3"}

Initialize the component for training. `get_examples` should be a function that
returns an iterable of [`Example`](/api/example) objects. **At least one example

@@ -427,7 +427,7 @@ The labels currently added to the component.
| ----------- | ------------------------------------------------------ |
| **RETURNS** | The labels added to the component. ~~Tuple[str, ...]~~ |

-## EntityRecognizer.label_data {id="label_data",tag="property",new="3"}
+## EntityRecognizer.label_data {id="label_data",tag="property",version="3"}

The labels currently added to the component and their internal meta information.
This is the data generated by [`init labels`](/api/cli#init-labels) and used by
@@ -2,7 +2,7 @@
title: EntityRuler
tag: class
source: spacy/pipeline/entityruler.py
-new: 2.1
+version: 2.1
teaser: 'Pipeline component for rule-based named entity recognition'
api_string_name: entity_ruler
api_trainable: false

@@ -96,7 +96,7 @@ be a token pattern (list) or a phrase pattern (string). For example:
| `ent_id_sep` | Separator used internally for entity IDs. Defaults to `"\|\|"`. ~~str~~ |
| `patterns` | Optional patterns to load in on initialization. ~~Optional[List[Dict[str, Union[str, List[dict]]]]]~~ |

-## EntityRuler.initialize {id="initialize",tag="method",new="3"}
+## EntityRuler.initialize {id="initialize",tag="method",version="3"}

Initialize the component with data and used before training to load in rules
from a [pattern file](/usage/rule-based-matching/#entityruler-files). This

@@ -210,7 +210,7 @@ of dicts) or a phrase pattern (string). For more details, see the usage guide on
| ---------- | ---------------------------------------------------------------- |
| `patterns` | The patterns to add. ~~List[Dict[str, Union[str, List[dict]]]]~~ |

-## EntityRuler.remove {id="remove",tag="method",new="3.2.1"}
+## EntityRuler.remove {id="remove",tag="method",version="3.2.1"}

Remove a pattern by its ID from the entity ruler. A `ValueError` is raised if
the ID does not exist.

@@ -307,7 +307,7 @@ All labels present in the match patterns.
| ----------- | -------------------------------------- |
| **RETURNS** | The string labels. ~~Tuple[str, ...]~~ |

-## EntityRuler.ent_ids {id="ent_ids",tag="property",new="2.2.2"}
+## EntityRuler.ent_ids {id="ent_ids",tag="property",version="2.2.2"}

All entity IDs present in the `id` properties of the match patterns.
@@ -3,7 +3,7 @@ title: Example
teaser: A training instance
tag: class
source: spacy/training/example.pyx
-new: 3.0
+version: 3.0
---

An `Example` holds the information for one training instance. It stores two

@@ -282,7 +282,7 @@ Split one `Example` into multiple `Example` objects, one for each sentence.
| ----------- | ---------------------------------------------------------------------------- |
| **RETURNS** | List of `Example` objects, one for each original sentence. ~~List[Example]~~ |

-## Alignment {id="alignment-object",new="3"}
+## Alignment {id="alignment-object",version="3"}

Calculate alignment tables between two tokenizations.
@@ -5,7 +5,7 @@ teaser:
  (ontology)
tag: class
source: spacy/kb/kb.pyx
-new: 2.2
+version: 2.2
---

The `KnowledgeBase` object is an abstract class providing a method to generate
@@ -5,7 +5,7 @@ teaser:
  information in-memory.
tag: class
source: spacy/kb/kb_in_memory.pyx
-new: 3.5
+version: 3.5
---

The `InMemoryLookupKB` class inherits from [`KnowledgeBase`](/api/kb) and
@@ -44,7 +44,7 @@ information in [`Language.meta`](/api/language#meta) and not to configure the
| `create_tokenizer` | Optional function that receives the `nlp` object and returns a tokenizer. ~~Callable[[Language], Callable[[str], Doc]]~~ |
| `batch_size` | Default batch size for [`pipe`](#pipe) and [`evaluate`](#evaluate). Defaults to `1000`. ~~int~~ |

-## Language.from_config {id="from_config",tag="classmethod",new="3"}
+## Language.from_config {id="from_config",tag="classmethod",version="3"}

Create a `Language` object from a loaded config. Will set up the tokenizer and
language data, add pipeline components based on the pipeline and add pipeline

@@ -76,7 +76,7 @@ spaCy loads a model under the hood based on its
| `validate` | Whether to validate the component config and arguments against the types expected by the factory. Defaults to `True`. ~~bool~~ |
| **RETURNS** | The initialized object. ~~Language~~ |

-## Language.component {id="component",tag="classmethod",new="3"}
+## Language.component {id="component",tag="classmethod",version="3"}

Register a custom pipeline component under a given name. This allows
initializing the component by name using

@@ -209,7 +209,7 @@ tokenization is skipped but the rest of the pipeline is run.
| `n_process` | Number of processors to use. Defaults to `1`. ~~int~~ |
| **YIELDS** | Documents in the order of the original text. ~~Doc~~ |

-## Language.set_error_handler {id="set_error_handler",tag="method",new="3"}
+## Language.set_error_handler {id="set_error_handler",tag="method",version="3"}

Define a callback that will be invoked when an error is thrown during processing
of one or more documents. Specifically, this function will call

@@ -231,7 +231,7 @@ being processed, and the original error.
| --------------- | -------------------------------------------------------------------------------------------------------------- |
| `error_handler` | A function that performs custom error handling. ~~Callable[[str, Callable[[Doc], Doc], List[Doc], Exception]~~ |

-## Language.initialize {id="initialize",tag="method",new="3"}
+## Language.initialize {id="initialize",tag="method",version="3"}

Initialize the pipeline for training and return an
[`Optimizer`](https://thinc.ai/docs/api-optimizers). Under the hood, it uses the

@@ -282,7 +282,7 @@ objects.
| `sgd` | An optimizer. Will be created via [`create_optimizer`](#create_optimizer) if not set. ~~Optional[Optimizer]~~ |
| **RETURNS** | The optimizer. ~~Optimizer~~ |

-## Language.resume_training {id="resume_training",tag="method,experimental",new="3"}
+## Language.resume_training {id="resume_training",tag="method,experimental",version="3"}

Continue training a trained pipeline. Create and return an optimizer, and
initialize "rehearsal" for any pipeline component that has a `rehearse` method.

@@ -342,7 +342,7 @@ and custom registered functions if needed. See the
| `component_cfg` | Optional dictionary of keyword arguments for components, keyed by component names. Defaults to `None`. ~~Optional[Dict[str, Dict[str, Any]]]~~ |
| **RETURNS** | The updated `losses` dictionary. ~~Dict[str, float]~~ |

-## Language.rehearse {id="rehearse",tag="method,experimental",new="3"}
+## Language.rehearse {id="rehearse",tag="method,experimental",version="3"}

Perform a "rehearsal" update from a batch of data. Rehearsal updates teach the
current model to make predictions similar to an initial model, to try to address

@@ -409,7 +409,7 @@ their original weights after the block.
| -------- | ------------------------------------------------------ |
| `params` | A dictionary of parameters keyed by model ID. ~~dict~~ |

-## Language.add_pipe {id="add_pipe",tag="method",new="2"}
+## Language.add_pipe {id="add_pipe",tag="method",version="2"}

Add a component to the processing pipeline. Expects a name that maps to a
component factory registered using

@@ -458,7 +458,7 @@ component, adds it to the pipeline and returns it.
| `validate` <Tag variant="new">3</Tag> | Whether to validate the component config and arguments against the types expected by the factory. Defaults to `True`. ~~bool~~ |
| **RETURNS** | The pipeline component. ~~Callable[[Doc], Doc]~~ |

-## Language.create_pipe {id="create_pipe",tag="method",new="2"}
+## Language.create_pipe {id="create_pipe",tag="method",version="2"}

Create a pipeline component from a factory.

@@ -487,7 +487,7 @@ To create a component and add it to the pipeline, you should always use
| `validate` <Tag variant="new">3</Tag> | Whether to validate the component config and arguments against the types expected by the factory. Defaults to `True`. ~~bool~~ |
| **RETURNS** | The pipeline component. ~~Callable[[Doc], Doc]~~ |

-## Language.has_factory {id="has_factory",tag="classmethod",new="3"}
+## Language.has_factory {id="has_factory",tag="classmethod",version="3"}

Check whether a factory name is registered on the `Language` class or subclass.
Will check for

@@ -514,7 +514,7 @@ the `Language` base class, available to all subclasses.
| `name` | Name of the pipeline factory to check. ~~str~~ |
| **RETURNS** | Whether a factory of that name is registered on the class. ~~bool~~ |

-## Language.has_pipe {id="has_pipe",tag="method",new="2"}
+## Language.has_pipe {id="has_pipe",tag="method",version="2"}

Check whether a component is present in the pipeline. Equivalent to
`name in nlp.pipe_names`.

@@ -536,7 +536,7 @@ Check whether a component is present in the pipeline. Equivalent to
| `name` | Name of the pipeline component to check. ~~str~~ |
| **RETURNS** | Whether a component of that name exists in the pipeline. ~~bool~~ |

-## Language.get_pipe {id="get_pipe",tag="method",new="2"}
+## Language.get_pipe {id="get_pipe",tag="method",version="2"}

Get a pipeline component for a given component name.

@@ -552,7 +552,7 @@ Get a pipeline component for a given component name.
| `name` | Name of the pipeline component to get. ~~str~~ |
| **RETURNS** | The pipeline component. ~~Callable[[Doc], Doc]~~ |

-## Language.replace_pipe {id="replace_pipe",tag="method",new="2"}
+## Language.replace_pipe {id="replace_pipe",tag="method",version="2"}

Replace a component in the pipeline and return the new component.

@@ -580,7 +580,7 @@ and instead expects the **name of a component factory** registered using
| `validate` <Tag variant="new">3</Tag> | Whether to validate the component config and arguments against the types expected by the factory. Defaults to `True`. ~~bool~~ |
| **RETURNS** | The new pipeline component. ~~Callable[[Doc], Doc]~~ |

-## Language.rename_pipe {id="rename_pipe",tag="method",new="2"}
+## Language.rename_pipe {id="rename_pipe",tag="method",version="2"}

Rename a component in the pipeline. Useful to create custom names for
pre-defined and pre-loaded components. To change the default name of a component

@@ -598,7 +598,7 @@ added to the pipeline, you can also use the `name` argument on
| `old_name` | Name of the component to rename. ~~str~~ |
| `new_name` | New name of the component. ~~str~~ |

-## Language.remove_pipe {id="remove_pipe",tag="method",new="2"}
+## Language.remove_pipe {id="remove_pipe",tag="method",version="2"}

Remove a component from the pipeline. Returns the removed component name and
component function.

@@ -615,7 +615,7 @@ component function.
| `name` | Name of the component to remove. ~~str~~ |
| **RETURNS** | A `(name, component)` tuple of the removed component. ~~Tuple[str, Callable[[Doc], Doc]]~~ |

-## Language.disable_pipe {id="disable_pipe",tag="method",new="3"}
+## Language.disable_pipe {id="disable_pipe",tag="method",version="3"}

Temporarily disable a pipeline component so it's not run as part of the
pipeline. Disabled components are listed in

@@ -641,7 +641,7 @@ does nothing.
| ------ | ----------------------------------------- |
| `name` | Name of the component to disable. ~~str~~ |

-## Language.enable_pipe {id="enable_pipe",tag="method",new="3"}
+## Language.enable_pipe {id="enable_pipe",tag="method",version="3"}

Enable a previously disabled component (e.g. via
[`Language.disable_pipes`](/api/language#disable_pipes)) so it's run as part of

@@ -663,7 +663,7 @@ already enabled, this method does nothing.
| ------ | ---------------------------------------- |
| `name` | Name of the component to enable. ~~str~~ |

-## Language.select_pipes {id="select_pipes",tag="contextmanager, method",new="3"}
+## Language.select_pipes {id="select_pipes",tag="contextmanager, method",version="3"}

Disable one or more pipeline components. If used as a context manager, the
pipeline will be restored to the initial state at the end of the block.

@@ -706,7 +706,7 @@ As of spaCy v3.0, the `disable_pipes` method has been renamed to `select_pipes`:
| `enable` | Name(s) of pipeline component(s) that will not be disabled. ~~Optional[Union[str, Iterable[str]]]~~ |
| **RETURNS** | The disabled pipes that can be restored by calling the object's `.restore()` method. ~~DisabledPipes~~ |

-## Language.get_factory_meta {id="get_factory_meta",tag="classmethod",new="3"}
+## Language.get_factory_meta {id="get_factory_meta",tag="classmethod",version="3"}

Get the factory meta information for a given pipeline component name. Expects
the name of the component **factory**. The factory meta is an instance of the

@@ -728,7 +728,7 @@ information about the component and its default provided by the
| `name` | The factory name. ~~str~~ |
| **RETURNS** | The factory meta. ~~FactoryMeta~~ |

-## Language.get_pipe_meta {id="get_pipe_meta",tag="method",new="3"}
+## Language.get_pipe_meta {id="get_pipe_meta",tag="method",version="3"}

Get the factory meta information for a given pipeline component name. Expects
the name of the component **instance** in the pipeline. The factory meta is an

@@ -751,7 +751,7 @@ contains the information about the component and its default provided by the
| `name` | The pipeline component name. ~~str~~ |
| **RETURNS** | The factory meta. ~~FactoryMeta~~ |

-## Language.analyze_pipes {id="analyze_pipes",tag="method",new="3"}
+## Language.analyze_pipes {id="analyze_pipes",tag="method",version="3"}

Analyze the current pipeline components and show a summary of the attributes
they assign and require, and the scores they set. The data is based on the

@@ -840,7 +840,7 @@ token.ent_iob, token.ent_type
| `pretty` | Pretty-print the results as a table. Defaults to `False`. ~~bool~~ |
| **RETURNS** | Dictionary containing the pipe analysis, keyed by `"summary"` (component meta by pipe), `"problems"` (attribute names by pipe) and `"attrs"` (pipes that assign and require an attribute, keyed by attribute). ~~Optional[Dict[str, Any]]~~ |

-## Language.replace_listeners {id="replace_listeners",tag="method",new="3"}
+## Language.replace_listeners {id="replace_listeners",tag="method",version="3"}

Find [listener layers](/usage/embeddings-transformers#embedding-layers)
(connecting to a shared token-to-vector embedding component) of a given pipeline

@@ -911,7 +911,7 @@ information is expressed in the [`config.cfg`](/api/data-formats#config).
| ----------- | --------------------------------- |
| **RETURNS** | The meta data. ~~Dict[str, Any]~~ |

-## Language.config {id="config",tag="property",new="3"}
+## Language.config {id="config",tag="property",version="3"}

Export a trainable [`config.cfg`](/api/data-formats#config) for the current
`nlp` object. Includes the current pipeline, all configs used to create the

@@ -932,7 +932,7 @@ subclass of the built-in `dict`. It supports the additional methods `to_disk`
| ----------- | ---------------------- |
| **RETURNS** | The config. ~~Config~~ |

-## Language.to_disk {id="to_disk",tag="method",new="2"}
+## Language.to_disk {id="to_disk",tag="method",version="2"}

Save the current state to a directory. Under the hood, this method delegates to
the `to_disk` methods of the individual pipeline components, if available. This

@@ -951,7 +951,7 @@ will be saved to disk.
| _keyword-only_ | |
| `exclude` | Names of pipeline components or [serialization fields](#serialization-fields) to exclude. ~~Iterable[str]~~ |

-## Language.from_disk {id="from_disk",tag="method",new="2"}
+## Language.from_disk {id="from_disk",tag="method",version="2"}

Loads state from a directory, including all data that was saved with the
`Language` object. Modifies the object in place and returns it.

@@ -1117,7 +1117,7 @@ serialization by passing in the string names via the `exclude` argument.
| `meta` | The meta data, available as [`Language.meta`](/api/language#meta). |
| ... | String names of pipeline components, e.g. `"ner"`. |

-## FactoryMeta {id="factorymeta",new="3",tag="dataclass"}
+## FactoryMeta {id="factorymeta",version="3",tag="dataclass"}

The `FactoryMeta` contains the information about the component and its default
provided by the [`@Language.component`](/api/language#component) or
@@ -2,7 +2,7 @@
title: Lemmatizer
tag: class
source: spacy/pipeline/lemmatizer.py
-new: 3
+version: 3
teaser: 'Pipeline component for lemmatization'
api_string_name: lemmatizer
api_trainable: false
@@ -3,7 +3,7 @@ title: Lookups
teaser: A container for large lookup tables and dictionaries
tag: class
source: spacy/lookups.py
-new: 2.2
+version: 2.2
---

This class allows convenient access to large lookup tables and dictionaries,
@@ -143,7 +143,7 @@ the match.
| `with_alignments` <Tag variant="new">3.0.6</Tag> | Return match alignment information as part of the match tuple as `List[int]` with the same length as the matched span. Each entry denotes the corresponding index of the token in the pattern. If `as_spans` is set to `True`, this setting is ignored. Defaults to `False`. ~~bool~~ |
| **RETURNS** | A list of `(match_id, start, end)` tuples, describing the matches. A match tuple describes a span `doc[start:end`]. The `match_id` is the ID of the added match pattern. If `as_spans` is set to `True`, a list of `Span` objects is returned instead. ~~Union[List[Tuple[int, int, int]], List[Span]]~~ |

-## Matcher.\_\_len\_\_ {id="len",tag="method",new="2"}
+## Matcher.\_\_len\_\_ {id="len",tag="method",version="2"}

Get the number of rules added to the matcher. Note that this only returns the
number of rules (identical with the number of IDs), not the number of individual

@@ -162,7 +162,7 @@ patterns.
| ----------- | ---------------------------- |
| **RETURNS** | The number of rules. ~~int~~ |

-## Matcher.\_\_contains\_\_ {id="contains",tag="method",new="2"}
+## Matcher.\_\_contains\_\_ {id="contains",tag="method",version="2"}

Check whether the matcher contains rules for a match ID.

@@ -180,7 +180,7 @@ Check whether the matcher contains rules for a match ID.
| `key` | The match ID. ~~str~~ |
| **RETURNS** | Whether the matcher contains rules for this match ID. ~~bool~~ |

-## Matcher.add {id="add",tag="method",new="2"}
+## Matcher.add {id="add",tag="method",version="2"}

Add a rule to the matcher, consisting of an ID key, one or more patterns, and an
optional callback function to act on the matches. The callback function will

@@ -226,7 +226,7 @@ patterns = [[{"TEXT": "Google"}, {"TEXT": "Now"}], [{"TEXT": "GoogleNow"}]]
| `on_match` | Callback function to act on matches. Takes the arguments `matcher`, `doc`, `i` and `matches`. ~~Optional[Callable[[Matcher, Doc, int, List[tuple], Any]]~~ |
| `greedy` <Tag variant="new">3</Tag> | Optional filter for greedy matches. Can either be `"FIRST"` or `"LONGEST"`. ~~Optional[str]~~ |

-## Matcher.remove {id="remove",tag="method",new="2"}
+## Matcher.remove {id="remove",tag="method",version="2"}

Remove a rule from the matcher. A `KeyError` is raised if the match ID does not
exist.

@@ -244,7 +244,7 @@ exist.
| ----- | --------------------------------- |
| `key` | The ID of the match rule. ~~str~~ |

-## Matcher.get {id="get",tag="method",new="2"}
+## Matcher.get {id="get",tag="method",version="2"}

Retrieve the pattern stored for a key. Returns the rule as an
`(on_match, patterns)` tuple containing the callback and available patterns.
@@ -2,7 +2,7 @@
title: Morphologizer
tag: class
source: spacy/pipeline/morphologizer.pyx
-new: 3
+version: 3
teaser: 'Pipeline component for predicting morphological features'
api_base_class: /api/tagger
api_string_name: morphologizer

@@ -403,7 +403,7 @@ coarse-grained POS as the feature `POS`.
| ----------- | ------------------------------------------------------ |
| **RETURNS** | The labels added to the component. ~~Tuple[str, ...]~~ |

-## Morphologizer.label_data {id="label_data",tag="property",new="3"}
+## Morphologizer.label_data {id="label_data",tag="property",version="3"}

The labels currently added to the component and their internal meta information.
This is the data generated by [`init labels`](/api/cli#init-labels) and used by
@@ -3,7 +3,7 @@ title: PhraseMatcher
teaser: Match sequences of tokens, based on documents
tag: class
source: spacy/matcher/phrasematcher.pyx
-new: 2
+version: 2
---

The `PhraseMatcher` lets you efficiently match large terminology lists. While

@@ -155,7 +155,7 @@ patterns = [nlp("health care reform"), nlp("healthcare reform")]
| _keyword-only_ | |
| `on_match` | Callback function to act on matches. Takes the arguments `matcher`, `doc`, `i` and `matches`. ~~Optional[Callable[[Matcher, Doc, int, List[tuple], Any]]~~ |

-## PhraseMatcher.remove {id="remove",tag="method",new="2.2"}
+## PhraseMatcher.remove {id="remove",tag="method",version="2.2"}

Remove a rule from the matcher by match ID. A `KeyError` is raised if the key
does not exist.
@@ -100,7 +100,7 @@ applied to the `Doc` in order. Both [`__call__`](/api/pipe#call) and
| `batch_size` | The number of documents to buffer. Defaults to `128`. ~~int~~ |
| **YIELDS** | The processed documents in order. ~~Doc~~ |

-## TrainablePipe.set_error_handler {id="set_error_handler",tag="method",new="3"}
+## TrainablePipe.set_error_handler {id="set_error_handler",tag="method",version="3"}

Define a callback that will be invoked when an error is thrown during processing
of one or more documents with either [`__call__`](/api/pipe#call) or

@@ -122,7 +122,7 @@ processed, and the original error.
| --------------- | -------------------------------------------------------------------------------------------------------------- |
| `error_handler` | A function that performs custom error handling. ~~Callable[[str, Callable[[Doc], Doc], List[Doc], Exception]~~ |

-## TrainablePipe.get_error_handler {id="get_error_handler",tag="method",new="3"}
+## TrainablePipe.get_error_handler {id="get_error_handler",tag="method",version="3"}

Retrieve the callback that performs error handling for this component's
[`__call__`](/api/pipe#call) and [`pipe`](/api/pipe#pipe) methods. If no custom

@@ -141,7 +141,7 @@ returned that simply reraises the exception.
| ----------- | ---------------------------------------------------------------------------------------------------------------- |
| **RETURNS** | The function that performs custom error handling. ~~Callable[[str, Callable[[Doc], Doc], List[Doc], Exception]~~ |

-## TrainablePipe.initialize {id="initialize",tag="method",new="3"}
+## TrainablePipe.initialize {id="initialize",tag="method",version="3"}

Initialize the component for training. `get_examples` should be a function that
returns an iterable of [`Example`](/api/example) objects. The data examples are

@@ -240,7 +240,7 @@ predictions and gold-standard annotations, and update the component's model.
| `losses` | Optional record of the loss during training. Updated using the component name as the key. ~~Optional[Dict[str, float]]~~ |
| **RETURNS** | The updated `losses` dictionary. ~~Dict[str, float]~~ |

-## TrainablePipe.rehearse {id="rehearse",tag="method,experimental",new="3"}
+## TrainablePipe.rehearse {id="rehearse",tag="method,experimental",version="3"}

Perform a "rehearsal" update from a batch of data. Rehearsal updates teach the
current model to make predictions similar to an initial model, to try to address

@@ -287,7 +287,7 @@ This method needs to be overwritten with your own custom `get_loss` method.
| `scores` | Scores representing the model's predictions. |
| **RETURNS** | The loss and the gradient, i.e. `(loss, gradient)`. ~~Tuple[float, float]~~ |

-## TrainablePipe.score {id="score",tag="method",new="3"}
+## TrainablePipe.score {id="score",tag="method",version="3"}

Score a batch of examples.
@@ -70,7 +70,7 @@ components to the end of the pipeline and after all other components.
| `doc` | The `Doc` object to process, e.g. the `Doc` in the pipeline. ~~Doc~~ |
| **RETURNS** | The modified `Doc` with merged entities. ~~Doc~~ |

-## merge_subtokens {id="merge_subtokens",tag="function",new="2.1"}
+## merge_subtokens {id="merge_subtokens",tag="function",version="2.1"}

Merge subtokens into a single token. Also available via the string name
`"merge_subtokens"`. As of v2.1, the parser is able to predict "subtokens" that

@@ -110,7 +110,7 @@ end of the pipeline and after all other components.
| `label` | The subtoken dependency label. Defaults to `"subtok"`. ~~str~~ |
| **RETURNS** | The modified `Doc` with merged subtokens. ~~Doc~~ |

-## token_splitter {id="token_splitter",tag="function",new="3.0"}
+## token_splitter {id="token_splitter",tag="function",version="3.0"}

Split tokens longer than a minimum length into shorter tokens. Intended for use
with transformer pipelines where long spaCy tokens lead to input text that

@@ -132,7 +132,7 @@ exceed the transformer model max length.
| `split_length` | The length of the split tokens. Defaults to `5`. ~~int~~ |
| **RETURNS** | The modified `Doc` with the split tokens. ~~Doc~~ |

-## doc_cleaner {id="doc_cleaner",tag="function",new="3.2.1"}
+## doc_cleaner {id="doc_cleaner",tag="function",version="3.2.1"}

Clean up `Doc` attributes. Intended for use at the end of pipelines with
`tok2vec` or `transformer` pipeline components that store tensors and other
@@ -72,7 +72,7 @@ core pipeline components, the individual score names start with the `Token` or
| `examples` | The `Example` objects holding both the predictions and the correct gold-standard annotations. ~~Iterable[Example]~~ |
| **RETURNS** | A dictionary of scores. ~~Dict[str, Union[float, Dict[str, float]]]~~ |

-## Scorer.score_tokenization {id="score_tokenization",tag="staticmethod",new="3"}
+## Scorer.score_tokenization {id="score_tokenization",tag="staticmethod",version="3"}

Scores the tokenization:

@@ -93,7 +93,7 @@ Docs with `has_unknown_spaces` are skipped during scoring.
| `examples` | The `Example` objects holding both the predictions and the correct gold-standard annotations. ~~Iterable[Example]~~ |
| **RETURNS** | `Dict` | A dictionary containing the scores `token_acc`, `token_p`, `token_r`, `token_f`. ~~Dict[str, float]]~~ |

-## Scorer.score_token_attr {id="score_token_attr",tag="staticmethod",new="3"}
+## Scorer.score_token_attr {id="score_token_attr",tag="staticmethod",version="3"}

Scores a single token attribute. Tokens with missing values in the reference doc
are skipped during scoring.

@@ -114,7 +114,7 @@ are skipped during scoring.
| `missing_values` | Attribute values to treat as missing annotation in the reference annotation. Defaults to `{0, None, ""}`. ~~Set[Any]~~ |
| **RETURNS** | A dictionary containing the score `{attr}_acc`. ~~Dict[str, float]~~ |

-## Scorer.score_token_attr_per_feat {id="score_token_attr_per_feat",tag="staticmethod",new="3"}
+## Scorer.score_token_attr_per_feat {id="score_token_attr_per_feat",tag="staticmethod",version="3"}

Scores a single token attribute per feature for a token attribute in the
Universal Dependencies

@@ -138,7 +138,7 @@ scoring.
| `missing_values` | Attribute values to treat as missing annotation in the reference annotation. Defaults to `{0, None, ""}`. ~~Set[Any]~~ |
| **RETURNS** | A dictionary containing the micro PRF scores under the key `{attr}_micro_p/r/f` and the per-feature PRF scores under `{attr}_per_feat`. ~~Dict[str, Dict[str, float]]~~ |

-## Scorer.score_spans {id="score_spans",tag="staticmethod",new="3"}
+## Scorer.score_spans {id="score_spans",tag="staticmethod",version="3"}

Returns PRF scores for labeled or unlabeled spans.

@@ -160,7 +160,7 @@ Returns PRF scores for labeled or unlabeled spans.
| `allow_overlap` | Defaults to `False`. Whether or not to allow overlapping spans. If set to `False`, the alignment will automatically resolve conflicts. ~~bool~~ |
| **RETURNS** | A dictionary containing the PRF scores under the keys `{attr}_p`, `{attr}_r`, `{attr}_f` and the per-type PRF scores under `{attr}_per_type`. ~~Dict[str, Union[float, Dict[str, float]]]~~ |

-## Scorer.score_deps {id="score_deps",tag="staticmethod",new="3"}
+## Scorer.score_deps {id="score_deps",tag="staticmethod",version="3"}

Calculate the UAS, LAS, and LAS per type scores for dependency parses. Tokens
with missing values for the `attr` (typically `dep`) are skipped during scoring.

@@ -194,7 +194,7 @@ with missing values for the `attr` (typically `dep`) are skipped during scoring.
| `missing_values` | Attribute values to treat as missing annotation in the reference annotation. Defaults to `{0, None, ""}`. ~~Set[Any]~~ |
| **RETURNS** | A dictionary containing the scores: `{attr}_uas`, `{attr}_las`, and `{attr}_las_per_type`. ~~Dict[str, Union[float, Dict[str, float]]]~~ |

-## Scorer.score_cats {id="score_cats",tag="staticmethod",new="3"}
+## Scorer.score_cats {id="score_cats",tag="staticmethod",version="3"}

Calculate PRF and ROC AUC scores for a doc-level attribute that is a dict
containing scores for each label like `Doc.cats`. The returned dictionary

@@ -241,7 +241,7 @@ The reported `{attr}_score` depends on the classification properties:
| `threshold` | Cutoff to consider a prediction "positive". Defaults to `0.5` for multi-label, and `0.0` (i.e. whatever's highest scoring) otherwise. ~~float~~ |
| **RETURNS** | A dictionary containing the scores, with inapplicable scores as `None`. ~~Dict[str, Optional[float]]~~ |

-## Scorer.score_links {id="score_links",tag="staticmethod",new="3"}
+## Scorer.score_links {id="score_links",tag="staticmethod",version="3"}

Returns PRF for predicted links on the entity level. To disentangle the
performance of the NEL from the NER, this method only evaluates NEL links for

@@ -264,7 +264,7 @@ entities that overlap between the gold reference and the predictions.
| `negative_labels` | The string values that refer to no annotation (e.g. "NIL"). ~~Iterable[str]~~ |
| **RETURNS** | A dictionary containing the scores. ~~Dict[str, Optional[float]]~~ |

-## get_ner_prf {id="get_ner_prf",new="3"}
+## get_ner_prf {id="get_ner_prf",version="3"}

Compute micro-PRF and per-entity PRF scores.
@@ -2,7 +2,7 @@
title: SentenceRecognizer
tag: class
source: spacy/pipeline/senter.pyx
-new: 3
+version: 3
teaser: 'Pipeline component for sentence segmentation'
api_base_class: /api/tagger
api_string_name: senter

@@ -211,7 +211,7 @@ Delegates to [`predict`](/api/sentencerecognizer#predict) and
| `losses` | Optional record of the loss during training. Updated using the component name as the key. ~~Optional[Dict[str, float]]~~ |
| **RETURNS** | The updated `losses` dictionary. ~~Dict[str, float]~~ |

-## SentenceRecognizer.rehearse {id="rehearse",tag="method,experimental",new="3"}
+## SentenceRecognizer.rehearse {id="rehearse",tag="method,experimental",version="3"}

Perform a "rehearsal" update from a batch of data. Rehearsal updates teach the
current model to make predictions similar to an initial model to try to address
@ -93,7 +93,7 @@ Get the number of tokens in the span.
|
|||
| ----------- | ----------------------------------------- |
|
||||
| **RETURNS** | The number of tokens in the span. ~~int~~ |
|
||||
|
||||
## Span.set_extension {id="set_extension",tag="classmethod",new="2"}
|
||||
## Span.set_extension {id="set_extension",tag="classmethod",version="2"}
|
||||
|
||||
Define a custom attribute on the `Span` which becomes available via `Span._`.
|
||||
For details, see the documentation on
|
||||
|
@ -118,7 +118,7 @@ For details, see the documentation on
|
|||
| `setter` | Setter function that takes the `Span` and a value, and modifies the object. Is called when the user writes to the `Span._` attribute. ~~Optional[Callable[[Span, Any], None]]~~ |
|
||||
| `force` | Force overwriting existing attribute. ~~bool~~ |
|
||||
|
||||
## Span.get_extension {id="get_extension",tag="classmethod",new="2"}
|
||||
## Span.get_extension {id="get_extension",tag="classmethod",version="2"}
|
||||
|
||||
Look up a previously registered extension by name. Returns a 4-tuple
|
||||
`(default, method, getter, setter)` if the extension is registered. Raises a
|
||||
|
@ -138,7 +138,7 @@ Look up a previously registered extension by name. Returns a 4-tuple
|
|||
| `name` | Name of the extension. ~~str~~ |
|
||||
| **RETURNS** | A `(default, method, getter, setter)` tuple of the extension. ~~Tuple[Optional[Any], Optional[Callable], Optional[Callable], Optional[Callable]]~~ |
|
||||
|
||||
## Span.has_extension {id="has_extension",tag="classmethod",new="2"}
|
||||
## Span.has_extension {id="has_extension",tag="classmethod",version="2"}
|
||||
|
||||
Check whether an extension has been registered on the `Span` class.
|
||||
|
||||
|
@ -155,7 +155,7 @@ Check whether an extension has been registered on the `Span` class.
|
|||
| `name` | Name of the extension to check. ~~str~~ |
|
||||
| **RETURNS** | Whether the extension has been registered. ~~bool~~ |
|
||||
|
||||
## Span.remove_extension {id="remove_extension",tag="classmethod",new="2.0.12"}
|
||||
## Span.remove_extension {id="remove_extension",tag="classmethod",version="2.0.12"}
|
||||
|
||||
Remove a previously registered extension.
|
||||
|
||||
|
@ -173,7 +173,7 @@ Remove a previously registered extension.
|
|||
| `name` | Name of the extension. ~~str~~ |
|
||||
| **RETURNS** | A `(default, method, getter, setter)` tuple of the removed extension. ~~Tuple[Optional[Any], Optional[Callable], Optional[Callable], Optional[Callable]]~~ |
|
||||
|
||||
## Span.char_span {id="char_span",tag="method",new="2.2.4"}
|
||||
## Span.char_span {id="char_span",tag="method",version="2.2.4"}
|
||||
|
||||
Create a `Span` object from the slice `span.text[start:end]`. Returns `None` if
|
||||
the character indices don't map to a valid span.
|
||||
|
@ -235,7 +235,7 @@ ancestor is found, e.g. if span excludes a necessary ancestor.
|
|||
| ----------- | --------------------------------------------------------------------------------------- |
|
||||
| **RETURNS** | The lowest common ancestor matrix of the `Span`. ~~numpy.ndarray[ndim=2, dtype=int32]~~ |
|
||||
|
||||
## Span.to_array {id="to_array",tag="method",new="2"}
|
||||
## Span.to_array {id="to_array",tag="method",version="2"}
|
||||
|
||||
Given a list of `M` attribute IDs, export the tokens to a numpy `ndarray` of
|
||||
shape `(N, M)`, where `N` is the length of the document. The values will be
|
||||
|
@ -256,7 +256,7 @@ shape `(N, M)`, where `N` is the length of the document. The values will be
|
|||
| `attr_ids` | A list of attributes (int IDs or string names) or a single attribute (int ID or string name). ~~Union[int, str, List[Union[int, str]]]~~ |
|
||||
| **RETURNS** | The exported attributes as a numpy array. ~~Union[numpy.ndarray[ndim=2, dtype=uint64], numpy.ndarray[ndim=1, dtype=uint64]]~~ |
|
||||
|
||||
## Span.ents {id="ents",tag="property",new="2.0.13",model="ner"}
|
||||
## Span.ents {id="ents",tag="property",version="2.0.13",model="ner"}
|
||||
|
||||
The named entities that fall completely within the span. Returns a tuple of
|
||||
`Span` objects.
|
||||
|
@ -520,7 +520,7 @@ sent = doc[sent.start : max(sent.end, span.end)]
|
|||
| ----------- | ------------------------------------------------------- |
|
||||
| **RETURNS** | The sentence span that this span is a part of. ~~Span~~ |
|
||||
|
||||
## Span.sents {id="sents",tag="property",model="sentences",new="3.2.1"}
|
||||
## Span.sents {id="sents",tag="property",model="sentences",version="3.2.1"}
|
||||
|
||||
Returns a generator over the sentences the span belongs to. This property is
|
||||
only available when [sentence boundaries](/usage/linguistic-features#sbd) have
|
||||
|
|
|
@ -2,7 +2,7 @@
|
|||
title: SpanCategorizer
|
||||
tag: class,experimental
|
||||
source: spacy/pipeline/spancat.py
|
||||
new: 3.1
|
||||
version: 3.1
|
||||
teaser: 'Pipeline component for labeling potentially overlapping spans of text'
|
||||
api_base_class: /api/pipe
|
||||
api_string_name: spancat
|
||||
|
@ -239,7 +239,7 @@ Delegates to [`predict`](/api/spancategorizer#predict) and
|
|||
| `losses` | Optional record of the loss during training. Updated using the component name as the key. ~~Optional[Dict[str, float]]~~ |
|
||||
| **RETURNS** | The updated `losses` dictionary. ~~Dict[str, float]~~ |
|
||||
|
||||
## SpanCategorizer.set_candidates {id="set_candidates",tag="method", new="3.3"}
|
||||
## SpanCategorizer.set_candidates {id="set_candidates",tag="method", version="3.3"}
|
||||
|
||||
Use the suggester to add a list of [`Span`](/api/span) candidates to a list of
|
||||
[`Doc`](/api/doc) objects. This method is intended to be used for debugging
|
||||
|
|
|
@ -2,7 +2,7 @@
|
|||
title: SpanGroup
|
||||
tag: class
|
||||
source: spacy/tokens/span_group.pyx
|
||||
new: 3
|
||||
version: 3
|
||||
---
|
||||
|
||||
A group of arbitrary, potentially overlapping [`Span`](/api/span) objects that
|
||||
|
@ -125,7 +125,7 @@ changes to be reflected in the span group.
|
|||
| `i` | The item index. ~~int~~ |
|
||||
| **RETURNS** | The span at the given index. ~~Span~~ |
|
||||
|
||||
## SpanGroup.\_\_setitem\_\_ {id="setitem",tag="method", new="3.3"}
|
||||
## SpanGroup.\_\_setitem\_\_ {id="setitem",tag="method", version="3.3"}
|
||||
|
||||
Set a span in the span group.
|
||||
|
||||
|
@ -144,7 +144,7 @@ Set a span in the span group.
|
|||
| `i` | The item index. ~~int~~ |
|
||||
| `span` | The new value. ~~Span~~ |
|
||||
|
||||
## SpanGroup.\_\_delitem\_\_ {id="delitem",tag="method", new="3.3"}
|
||||
## SpanGroup.\_\_delitem\_\_ {id="delitem",tag="method", version="3.3"}
|
||||
|
||||
Delete a span from the span group.
|
||||
|
||||
|
@ -161,7 +161,7 @@ Delete a span from the span group.
|
|||
| ---- | ----------------------- |
|
||||
| `i` | The item index. ~~int~~ |
|
||||
|
||||
## SpanGroup.\_\_add\_\_ {id="add",tag="method", new="3.3"}
|
||||
## SpanGroup.\_\_add\_\_ {id="add",tag="method", version="3.3"}
|
||||
|
||||
Concatenate the current span group with another span group and return the result
|
||||
in a new span group. Any `attrs` from the first span group will have precedence
|
||||
|
@ -182,7 +182,7 @@ over `attrs` in the second.
|
|||
| `other` | The span group or spans to concatenate. ~~Union[SpanGroup, Iterable[Span]]~~ |
|
||||
| **RETURNS** | The new span group. ~~SpanGroup~~ |
|
||||
|
||||
## SpanGroup.\_\_iadd\_\_ {id="iadd",tag="method", new="3.3"}
|
||||
## SpanGroup.\_\_iadd\_\_ {id="iadd",tag="method", version="3.3"}
|
||||
|
||||
Append an iterable of spans or the content of a span group to the current span
|
||||
group. Any `attrs` in the other span group will be added for keys that are not
|
||||
|
@ -241,7 +241,7 @@ group.
|
|||
| ------- | -------------------------------------------------------- |
|
||||
| `spans` | The spans to add. ~~Union[SpanGroup, Iterable["Span"]]~~ |
|
||||
|
||||
## SpanGroup.copy {id="copy",tag="method", new="3.3"}
|
||||
## SpanGroup.copy {id="copy",tag="method", version="3.3"}
|
||||
|
||||
Return a copy of the span group.
|
||||
|
||||
|
|
|
@ -2,7 +2,7 @@
|
|||
title: SpanRuler
|
||||
tag: class
|
||||
source: spacy/pipeline/span_ruler.py
|
||||
new: 3.3
|
||||
version: 3.3
|
||||
teaser: 'Pipeline component for rule-based span and named entity recognition'
|
||||
api_string_name: span_ruler
|
||||
api_trainable: false
|
||||
|
|
|
@ -90,7 +90,7 @@ store will always include an empty string `""` at position `0`.
|
|||
| ---------- | ------------------------------ |
|
||||
| **YIELDS** | A string in the store. ~~str~~ |
|
||||
|
||||
## StringStore.add {id="add",tag="method",new="2"}
|
||||
## StringStore.add {id="add",tag="method",version="2"}
|
||||
|
||||
Add a string to the `StringStore`.
|
||||
|
||||
|
@ -110,7 +110,7 @@ Add a string to the `StringStore`.
|
|||
| `string` | The string to add. ~~str~~ |
|
||||
| **RETURNS** | The string's hash value. ~~int~~ |
|
||||
|
||||
## StringStore.to_disk {id="to_disk",tag="method",new="2"}
|
||||
## StringStore.to_disk {id="to_disk",tag="method",version="2"}
|
||||
|
||||
Save the current state to a directory.
|
||||
|
||||
|
@ -124,7 +124,7 @@ Save the current state to a directory.
|
|||
| ------ | ------------------------------------------------------------------------------------------------------------------------------------------ |
|
||||
| `path` | A path to a directory, which will be created if it doesn't exist. Paths may be either strings or `Path`-like objects. ~~Union[str, Path]~~ |
|
||||
|
||||
## StringStore.from_disk {id="from_disk",tag="method",new="2"}
|
||||
## StringStore.from_disk {id="from_disk",tag="method",version="2"}
|
||||
|
||||
Loads state from a directory. Modifies the object in place and returns it.
|
||||
|
||||
|
|
|
@ -127,7 +127,7 @@ applied to the `Doc` in order. Both [`__call__`](/api/tagger#call) and
|
|||
| `batch_size` | The number of documents to buffer. Defaults to `128`. ~~int~~ |
|
||||
| **YIELDS** | The processed documents in order. ~~Doc~~ |
|
||||
|
||||
## Tagger.initialize {id="initialize",tag="method",new="3"}
|
||||
## Tagger.initialize {id="initialize",tag="method",version="3"}
|
||||
|
||||
Initialize the component for training. `get_examples` should be a function that
|
||||
returns an iterable of [`Example`](/api/example) objects. **At least one example
|
||||
|
@ -228,7 +228,7 @@ Delegates to [`predict`](/api/tagger#predict) and
|
|||
| `losses` | Optional record of the loss during training. Updated using the component name as the key. ~~Optional[Dict[str, float]]~~ |
|
||||
| **RETURNS** | The updated `losses` dictionary. ~~Dict[str, float]~~ |
|
||||
|
||||
## Tagger.rehearse {id="rehearse",tag="method,experimental",new="3"}
|
||||
## Tagger.rehearse {id="rehearse",tag="method,experimental",version="3"}
|
||||
|
||||
Perform a "rehearsal" update from a batch of data. Rehearsal updates teach the
|
||||
current model to make predictions similar to an initial model, to try to address
|
||||
|
@ -410,7 +410,7 @@ The labels currently added to the component.
|
|||
| ----------- | ------------------------------------------------------ |
|
||||
| **RETURNS** | The labels added to the component. ~~Tuple[str, ...]~~ |
|
||||
|
||||
## Tagger.label_data {id="label_data",tag="property",new="3"}
|
||||
## Tagger.label_data {id="label_data",tag="property",version="3"}
|
||||
|
||||
The labels currently added to the component and their internal meta information.
|
||||
This is the data generated by [`init labels`](/api/cli#init-labels) and used by
|
||||
|
|
|
@ -2,7 +2,7 @@
|
|||
title: TextCategorizer
|
||||
tag: class
|
||||
source: spacy/pipeline/textcat.py
|
||||
new: 2
|
||||
version: 2
|
||||
teaser: 'Pipeline component for text classification'
|
||||
api_base_class: /api/pipe
|
||||
api_string_name: textcat
|
||||
|
@ -172,7 +172,7 @@ applied to the `Doc` in order. Both [`__call__`](/api/textcategorizer#call) and
|
|||
| `batch_size` | The number of documents to buffer. Defaults to `128`. ~~int~~ |
|
||||
| **YIELDS** | The processed documents in order. ~~Doc~~ |
|
||||
|
||||
## TextCategorizer.initialize {id="initialize",tag="method",new="3"}
|
||||
## TextCategorizer.initialize {id="initialize",tag="method",version="3"}
|
||||
|
||||
Initialize the component for training. `get_examples` should be a function that
|
||||
returns an iterable of [`Example`](/api/example) objects. **At least one example
|
||||
|
@ -275,7 +275,7 @@ Delegates to [`predict`](/api/textcategorizer#predict) and
|
|||
| `losses` | Optional record of the loss during training. Updated using the component name as the key. ~~Optional[Dict[str, float]]~~ |
|
||||
| **RETURNS** | The updated `losses` dictionary. ~~Dict[str, float]~~ |
|
||||
|
||||
## TextCategorizer.rehearse {id="rehearse",tag="method,experimental",new="3"}
|
||||
## TextCategorizer.rehearse {id="rehearse",tag="method,experimental",version="3"}
|
||||
|
||||
Perform a "rehearsal" update from a batch of data. Rehearsal updates teach the
|
||||
current model to make predictions similar to an initial model to try to address
|
||||
|
@ -317,7 +317,7 @@ predicted scores.
|
|||
| `scores` | Scores representing the model's predictions. |
|
||||
| **RETURNS** | The loss and the gradient, i.e. `(loss, gradient)`. ~~Tuple[float, float]~~ |
|
||||
|
||||
## TextCategorizer.score {id="score",tag="method",new="3"}
|
||||
## TextCategorizer.score {id="score",tag="method",version="3"}
|
||||
|
||||
Score a batch of examples.
|
||||
|
||||
|
@ -472,7 +472,7 @@ The labels currently added to the component.
|
|||
| ----------- | ------------------------------------------------------ |
|
||||
| **RETURNS** | The labels added to the component. ~~Tuple[str, ...]~~ |
|
||||
|
||||
## TextCategorizer.label_data {id="label_data",tag="property",new="3"}
|
||||
## TextCategorizer.label_data {id="label_data",tag="property",version="3"}
|
||||
|
||||
The labels currently added to the component and their internal meta information.
|
||||
This is the data generated by [`init labels`](/api/cli#init-labels) and used by
|
||||
|
|
|
@ -1,7 +1,7 @@
|
|||
---
|
||||
title: Tok2Vec
|
||||
source: spacy/pipeline/tok2vec.py
|
||||
new: 3
|
||||
version: 3
|
||||
teaser: null
|
||||
api_base_class: /api/pipe
|
||||
api_string_name: tok2vec
|
||||
|
|
|
@ -39,7 +39,7 @@ The number of unicode characters in the token, i.e. `token.text`.
|
|||
| ----------- | ------------------------------------------------------ |
|
||||
| **RETURNS** | The number of unicode characters in the token. ~~int~~ |
|
||||
|
||||
## Token.set_extension {id="set_extension",tag="classmethod",new="2"}
|
||||
## Token.set_extension {id="set_extension",tag="classmethod",version="2"}
|
||||
|
||||
Define a custom attribute on the `Token` which becomes available via `Token._`.
|
||||
For details, see the documentation on
|
||||
|
@ -64,7 +64,7 @@ For details, see the documentation on
|
|||
| `setter` | Setter function that takes the `Token` and a value, and modifies the object. Is called when the user writes to the `Token._` attribute. ~~Optional[Callable[[Token, Any], None]]~~ |
|
||||
| `force` | Force overwriting existing attribute. ~~bool~~ |
|
||||
|
||||
## Token.get_extension {id="get_extension",tag="classmethod",new="2"}
|
||||
## Token.get_extension {id="get_extension",tag="classmethod",version="2"}
|
||||
|
||||
Look up a previously registered extension by name. Returns a 4-tuple
|
||||
`(default, method, getter, setter)` if the extension is registered. Raises a
|
||||
|
@ -84,7 +84,7 @@ Look up a previously registered extension by name. Returns a 4-tuple
|
|||
| `name` | Name of the extension. ~~str~~ |
|
||||
| **RETURNS** | A `(default, method, getter, setter)` tuple of the extension. ~~Tuple[Optional[Any], Optional[Callable], Optional[Callable], Optional[Callable]]~~ |
|
||||
|
||||
## Token.has_extension {id="has_extension",tag="classmethod",new="2"}
|
||||
## Token.has_extension {id="has_extension",tag="classmethod",version="2"}
|
||||
|
||||
Check whether an extension has been registered on the `Token` class.
|
||||
|
||||
|
@ -101,7 +101,7 @@ Check whether an extension has been registered on the `Token` class.
|
|||
| `name` | Name of the extension to check. ~~str~~ |
|
||||
| **RETURNS** | Whether the extension has been registered. ~~bool~~ |
|
||||
|
||||
## Token.remove_extension {id="remove_extension",tag="classmethod",new="2.0.11"}
|
||||
## Token.remove_extension {id="remove_extension",tag="classmethod",version="2.0.11"}
|
||||
|
||||
Remove a previously registered extension.
|
||||
|
||||
|
|
|
@ -70,7 +70,7 @@ for name in pipeline:
|
|||
nlp.from_disk(data_path) # 4. Load in the binary data
|
||||
```
|
||||
|
||||
### spacy.blank {id="spacy.blank",tag="function",new="2"}
|
||||
### spacy.blank {id="spacy.blank",tag="function",version="2"}
|
||||
|
||||
Create a blank pipeline of a given language class. This function is the twin of
|
||||
`spacy.load()`.
|
||||
|
@ -134,7 +134,7 @@ list of available terms, see [`glossary.py`](%%GITHUB_SPACY/spacy/glossary.py).
|
|||
| `term` | Term to explain. ~~str~~ |
|
||||
| **RETURNS** | The explanation, or `None` if not found in the glossary. ~~Optional[str]~~ |
|
||||
|
||||
### spacy.prefer_gpu {id="spacy.prefer_gpu",tag="function",new="2.0.14"}
|
||||
### spacy.prefer_gpu {id="spacy.prefer_gpu",tag="function",version="2.0.14"}
|
||||
|
||||
Allocate data and perform operations on [GPU](/usage/#gpu), if available. If
|
||||
data has already been allocated on CPU, it will not be moved. Ideally, this
|
||||
|
@ -162,7 +162,7 @@ ensure that the model is loaded on the correct device. See
|
|||
| `gpu_id` | Device index to select. Defaults to `0`. ~~int~~ |
|
||||
| **RETURNS** | Whether the GPU was activated. ~~bool~~ |
|
||||
|
||||
### spacy.require_gpu {id="spacy.require_gpu",tag="function",new="2.0.14"}
|
||||
### spacy.require_gpu {id="spacy.require_gpu",tag="function",version="2.0.14"}
|
||||
|
||||
Allocate data and perform operations on [GPU](/usage/#gpu). Will raise an error
|
||||
if no GPU is available. If data has already been allocated on CPU, it will not
|
||||
|
@ -190,7 +190,7 @@ ensure that the model is loaded on the correct device. See
|
|||
| `gpu_id` | Device index to select. Defaults to `0`. ~~int~~ |
|
||||
| **RETURNS** | `True` ~~bool~~ |
|
||||
|
||||
### spacy.require_cpu {id="spacy.require_cpu",tag="function",new="3.0.0"}
|
||||
### spacy.require_cpu {id="spacy.require_cpu",tag="function",version="3.0.0"}
|
||||
|
||||
Allocate data and perform operations on CPU. If data has already been allocated
|
||||
on GPU, it will not be moved. Ideally, this function should be called right
|
||||
|
@ -221,7 +221,7 @@ ensure that the model is loaded on the correct device. See
|
|||
As of v2.0, spaCy comes with a built-in visualization suite. For more info and
|
||||
examples, see the usage guide on [visualizing spaCy](/usage/visualizers).
|
||||
|
||||
### displacy.serve {id="displacy.serve",tag="method",new="2"}
|
||||
### displacy.serve {id="displacy.serve",tag="method",version="2"}
|
||||
|
||||
Serve a dependency parse tree or named entity visualization to view it in your
|
||||
browser. Will run a simple web server.
|
||||
|
@ -248,7 +248,7 @@ browser. Will run a simple web server.
|
|||
| `port` | Port to serve visualization. Defaults to `5000`. ~~int~~ |
|
||||
| `host` | Host to serve visualization. Defaults to `"0.0.0.0"`. ~~str~~ |
|
||||
|
||||
### displacy.render {id="displacy.render",tag="method",new="2"}
|
||||
### displacy.render {id="displacy.render",tag="method",version="2"}
|
||||
|
||||
Render a dependency parse tree or named entity visualization.
|
||||
|
||||
|
@ -273,7 +273,7 @@ Render a dependency parse tree or named entity visualization.
|
|||
| `jupyter` | Explicitly enable or disable "[Jupyter](http://jupyter.org/) mode" to return markup ready to be rendered in a notebook. Detected automatically if `None` (default). ~~Optional[bool]~~ |
|
||||
| **RETURNS** | The rendered HTML markup. ~~str~~ |
|
||||
|
||||
### displacy.parse_deps {id="displacy.parse_deps",tag="method",new="2"}
|
||||
### displacy.parse_deps {id="displacy.parse_deps",tag="method",version="2"}
|
||||
|
||||
Generate dependency parse in `{'words': [], 'arcs': []}` format. For use with
|
||||
the `manual=True` argument in `displacy.render`.
|
||||
|
@ -295,7 +295,7 @@ the `manual=True` argument in `displacy.render`.
|
|||
| `options` | Dependency parse specific visualisation options. ~~Dict[str, Any]~~ |
|
||||
| **RETURNS** | Generated dependency parse keyed by words and arcs. ~~dict~~ |
|
||||
|
||||
### displacy.parse_ents {id="displacy.parse_ents",tag="method",new="2"}
|
||||
### displacy.parse_ents {id="displacy.parse_ents",tag="method",version="2"}
|
||||
|
||||
Generate named entities in `[{start: i, end: i, label: 'label'}]` format. For
|
||||
use with the `manual=True` argument in `displacy.render`.
|
||||
|
@ -317,7 +317,7 @@ use with the `manual=True` argument in `displacy.render`.
|
|||
| `options` | NER-specific visualisation options. ~~Dict[str, Any]~~ |
|
||||
| **RETURNS** | Generated entities keyed by text (original text) and ents. ~~dict~~ |
|
||||
|
||||
### displacy.parse_spans {id="displacy.parse_spans",tag="method",new="2"}
|
||||
### displacy.parse_spans {id="displacy.parse_spans",tag="method",version="2"}
|
||||
|
||||
Generate spans in `[{start_token: i, end_token: i, label: 'label'}]` format. For
|
||||
use with the `manual=True` argument in `displacy.render`.
|
||||
|
@ -419,7 +419,7 @@ span. If you wish to link an entity to their URL then consider using the
|
|||
should redirect you to their Wikidata page, in this case
|
||||
`https://www.wikidata.org/wiki/Q95`.
|
||||
|
||||
## registry {id="registry",source="spacy/util.py",new="3"}
|
||||
## registry {id="registry",source="spacy/util.py",version="3"}
|
||||
|
||||
spaCy's function registry extends
|
||||
[Thinc's `registry`](https://thinc.ai/docs/api-config#registry) and allows you
|
||||
|
@ -494,7 +494,7 @@ See the [`Transformer`](/api/transformer) API reference and
|
|||
| [`span_getters`](/api/transformer#span_getters) | Registry for functions that take a batch of `Doc` objects and return a list of `Span` objects to process by the transformer, e.g. sentences. |
|
||||
| [`annotation_setters`](/api/transformer#annotation_setters) | Registry for functions that create annotation setters. Annotation setters are functions that take a batch of `Doc` objects and a [`FullTransformerBatch`](/api/transformer#fulltransformerbatch) and can set additional annotations on the `Doc`. |
|
||||
|
||||
## Loggers {id="loggers",source="spacy/training/loggers.py",new="3"}
|
||||
## Loggers {id="loggers",source="spacy/training/loggers.py",version="3"}
|
||||
|
||||
A logger records the training results. When a logger is created, two functions
|
||||
are returned: one for logging the information for each training step, and a
|
||||
|
@ -572,7 +572,7 @@ start decreasing across epochs.
|
|||
|
||||
## Readers {id="readers"}
|
||||
|
||||
### File readers {id="file-readers",source="github.com/explosion/srsly",new="3"}
|
||||
### File readers {id="file-readers",source="github.com/explosion/srsly",version="3"}
|
||||
|
||||
The following file readers are provided by our serialization library
|
||||
[`srsly`](https://github.com/explosion/srsly). All registered functions take one
|
||||
|
@ -628,7 +628,7 @@ label sets.
|
|||
| `require` | Whether to require the file to exist. If set to `False` and the labels file doesn't exist, the loader will return `None` and the `initialize` method will extract the labels from the data. Defaults to `False`. ~~bool~~ |
|
||||
| **CREATES** | The list of labels. ~~List[str]~~ |
|
||||
|
||||
### Corpus readers {id="corpus-readers",source="spacy/training/corpus.py",new="3"}
|
||||
### Corpus readers {id="corpus-readers",source="spacy/training/corpus.py",version="3"}
|
||||
|
||||
Corpus readers are registered functions that load data and return a function
|
||||
that takes the current `nlp` object and yields [`Example`](/api/example) objects
|
||||
|
@ -696,7 +696,7 @@ JSONL file. Also see the [`JsonlCorpus`](/api/corpus#jsonlcorpus) class.
|
|||
| `limit` | Limit corpus to a subset of examples, e.g. for debugging. Defaults to `0` for no limit. ~~int~~ |
|
||||
| **CREATES** | The corpus reader. ~~JsonlCorpus~~ |
|
||||
|
||||
## Batchers {id="batchers",source="spacy/training/batchers.py",new="3"}
|
||||
## Batchers {id="batchers",source="spacy/training/batchers.py",version="3"}
|
||||
|
||||
A data batcher implements a batching strategy that essentially turns a stream of
|
||||
items into a stream of batches, with each batch consisting of one item or a list
|
||||
|
@ -783,7 +783,7 @@ sequences in the batch.
|
|||
| `get_length` | Optional function that receives a sequence item and returns its length. Defaults to the built-in `len()` if not set. ~~Optional[Callable[[Any], int]]~~ |
|
||||
| **CREATES** | The batcher that takes an iterable of items and returns batches. ~~Callable[[Iterable[Any]], Iterable[List[Any]]]~~ |
|
||||
|
||||
## Augmenters {id="augmenters",source="spacy/training/augment.py",new="3"}
|
||||
## Augmenters {id="augmenters",source="spacy/training/augment.py",version="3"}
|
||||
|
||||
Data augmentation is the process of applying small modifications to the training
|
||||
data. It can be especially useful for punctuation and case replacement – for
|
||||
|
@ -838,7 +838,7 @@ useful for making the model less sensitive to capitalization.
|
|||
| `level` | The percentage of texts that will be augmented. ~~float~~ |
|
||||
| **CREATES** | A function that takes the current `nlp` object and an [`Example`](/api/example) and yields augmented `Example` objects. ~~Callable[[Language, Example], Iterator[Example]]~~ |
|
||||
|
||||
## Callbacks {id="callbacks",source="spacy/training/callbacks.py",new="3"}
|
||||
## Callbacks {id="callbacks",source="spacy/training/callbacks.py",version="3"}
|
||||
|
||||
The config supports [callbacks](/usage/training#custom-code-nlp-callbacks) at
|
||||
several points in the lifecycle that can be used modify the `nlp` object.
|
||||
|
@ -887,7 +887,7 @@ backprop passes.
|
|||
| `backprop_color` | Color identifier for backpropagation passes. Defaults to `-1`. ~~int~~ |
|
||||
| **CREATES** | A function that takes the current `nlp` and wraps forward/backprop passes in NVTX ranges. ~~Callable[[Language], Language]~~ |
|
||||
|
||||
### spacy.models_and_pipes_with_nvtx_range.v1 {id="models_and_pipes_with_nvtx_range",tag="registered function",new="3.4"}
|
||||
### spacy.models_and_pipes_with_nvtx_range.v1 {id="models_and_pipes_with_nvtx_range",tag="registered function",version="3.4"}
|
||||
|
||||
> #### Example config
|
||||
>
|
||||
|
@ -975,7 +975,7 @@ This method was previously available as `spacy.gold.offsets_from_biluo_tags`.
|
|||
| `tags` | A sequence of [BILUO](/usage/linguistic-features#accessing-ner) tags with each tag describing one token. Each tag string will be of the form of either `""`, `"O"` or `"{action}-{label}"`, where action is one of `"B"`, `"I"`, `"L"`, `"U"`. ~~List[str]~~ |
|
||||
| **RETURNS** | A sequence of `(start, end, label)` triples. `start` and `end` will be character-offset integers denoting the slice into the original string. ~~List[Tuple[int, int, str]]~~ |
|
||||
|
||||
### training.biluo_tags_to_spans {id="biluo_tags_to_spans",tag="function",new="2.1"}
|
||||
### training.biluo_tags_to_spans {id="biluo_tags_to_spans",tag="function",version="2.1"}
|
||||
|
||||
Encode per-token tags following the
|
||||
[BILUO scheme](/usage/linguistic-features#accessing-ner) into
|
||||
|
@ -1131,7 +1131,7 @@ custom language class, you can register it using the
|
|||
| `lang` | Two-letter language code, e.g. `"en"`. ~~str~~ |
|
||||
| **RETURNS** | The respective subclass. ~~Language~~ |
|
||||
|
||||
### util.lang_class_is_loaded {id="util.lang_class_is_loaded",tag="function",new="2.1"}
|
||||
### util.lang_class_is_loaded {id="util.lang_class_is_loaded",tag="function",version="2.1"}
|
||||
|
||||
Check whether a `Language` subclass is already loaded. `Language` subclasses are
|
||||
loaded lazily to avoid expensive setup code associated with the language data.
|
||||
|
@ -1149,7 +1149,7 @@ loaded lazily to avoid expensive setup code associated with the language data.
|
|||
| `name` | Two-letter language code, e.g. `"en"`. ~~str~~ |
|
||||
| **RETURNS** | Whether the class has been loaded. ~~bool~~ |
|
||||
|
||||
### util.load_model {id="util.load_model",tag="function",new="2"}
|
||||
### util.load_model {id="util.load_model",tag="function",version="2"}
|
||||
|
||||
Load a pipeline from a package or data path. If called with a string name, spaCy
|
||||
will assume the pipeline is a Python package and import and call its `load()`
|
||||
|
@ -1177,7 +1177,7 @@ and create a `Language` object. The model data will then be loaded in via
|
|||
| `config` <Tag variant="new">3</Tag> | Config overrides as nested dict or flat dict keyed by section values in dot notation, e.g. `"nlp.pipeline"`. ~~Union[Dict[str, Any], Config]~~ |
|
||||
| **RETURNS** | `Language` class with the loaded pipeline. ~~Language~~ |
|
||||
|
||||
### util.load_model_from_init_py {id="util.load_model_from_init_py",tag="function",new="2"}
|
||||
### util.load_model_from_init_py {id="util.load_model_from_init_py",tag="function",version="2"}
|
||||
|
||||
A helper function to use in the `load()` method of a pipeline package's
|
||||
[`__init__.py`](https://github.com/explosion/spacy-models/tree/master/template/model/xx_model_name/__init__.py).
|
||||
|
@ -1202,7 +1202,7 @@ A helper function to use in the `load()` method of a pipeline package's
|
|||
| `config` <Tag variant="new">3</Tag> | Config overrides as nested dict or flat dict keyed by section values in dot notation, e.g. `"nlp.pipeline"`. ~~Union[Dict[str, Any], Config]~~ |
|
||||
| **RETURNS** | `Language` class with the loaded pipeline. ~~Language~~ |
|
||||
|
||||
### util.load_config {id="util.load_config",tag="function",new="3"}
|
||||
### util.load_config {id="util.load_config",tag="function",version="3"}
|
||||
|
||||
Load a pipeline's [`config.cfg`](/api/data-formats#config) from a file path. The
|
||||
config typically includes details about the components and how they're created,
|
||||
|
@ -1222,7 +1222,7 @@ as well as all training settings and hyperparameters.
|
|||
| `interpolate` | Whether to interpolate the config and replace variables like `${paths.train}` with their values. Defaults to `False`. ~~bool~~ |
|
||||
| **RETURNS** | The pipeline's config. ~~Config~~ |
|
||||
|
||||
### util.load_meta {id="util.load_meta",tag="function",new="3"}
|
||||
### util.load_meta {id="util.load_meta",tag="function",version="3"}
|
||||
|
||||
Get a pipeline's [`meta.json`](/api/data-formats#meta) from a file path and
|
||||
validate its contents. The meta typically includes details about author,
|
||||
|
@ -1239,7 +1239,7 @@ licensing, data sources and version.
|
|||
| `path` | Path to the pipeline's `meta.json`. ~~Union[str, Path]~~ |
|
||||
| **RETURNS** | The pipeline's meta data. ~~Dict[str, Any]~~ |
|
||||
|
||||
### util.get_installed_models {id="util.get_installed_models",tag="function",new="3"}
|
||||
### util.get_installed_models {id="util.get_installed_models",tag="function",version="3"}
|
||||
|
||||
List all pipeline packages installed in the current environment. This will
|
||||
include any spaCy pipeline that was packaged with
|
||||
|
@ -1274,7 +1274,7 @@ Check if string maps to a package installed via pip. Mainly used to validate
|
|||
| `name` | Name of package. ~~str~~ |
|
||||
| **RETURNS** | `True` if installed package, `False` if not. ~~bool~~ |
|
||||
|
||||
### util.get_package_path {id="util.get_package_path",tag="function",new="2"}
|
||||
### util.get_package_path {id="util.get_package_path",tag="function",version="2"}
|
||||
|
||||
Get path to an installed package. Mainly used to resolve the location of
|
||||
[pipeline packages](/usage/models). Currently imports the package to find its
|
||||
|
@ -1292,7 +1292,7 @@ path.
|
|||
| `package_name` | Name of installed package. ~~str~~ |
|
||||
| **RETURNS** | Path to pipeline package directory. ~~Path~~ |
|
||||
|
||||
### util.is_in_jupyter {id="util.is_in_jupyter",tag="function",new="2"}
|
||||
### util.is_in_jupyter {id="util.is_in_jupyter",tag="function",version="2"}
|
||||
|
||||
Check if user is running spaCy from a [Jupyter](https://jupyter.org) notebook by
|
||||
detecting the IPython kernel. Mainly used for the
|
||||
|
@ -1362,7 +1362,7 @@ Compile a sequence of infix rules into a regex object.
|
|||
| `entries` | The infix rules, e.g. [`lang.punctuation.TOKENIZER_INFIXES`](%%GITHUB_SPACY/spacy/lang/punctuation.py). ~~Iterable[Union[str, Pattern]]~~ |
|
||||
| **RETURNS** | The regex object to be used for [`Tokenizer.infix_finditer`](/api/tokenizer#attributes). ~~Pattern~~ |
|
||||
|
||||
### util.minibatch {id="util.minibatch",tag="function",new="2"}
|
||||
### util.minibatch {id="util.minibatch",tag="function",version="2"}
|
||||
|
||||
Iterate over batches of items. `size` may be an iterator, so that batch-size can
|
||||
vary on each step.
|
||||
|
@ -1381,7 +1381,7 @@ vary on each step.
|
|||
| `size` | The batch size(s). ~~Union[int, Sequence[int]]~~ |
|
||||
| **YIELDS** | The batches. |
|
||||
|
||||
### util.filter_spans {id="util.filter_spans",tag="function",new="2.1.4"}
|
||||
### util.filter_spans {id="util.filter_spans",tag="function",version="2.1.4"}
|
||||
|
||||
Filter a sequence of [`Span`](/api/span) objects and remove duplicates or
|
||||
overlaps. Useful for creating named entities (where one token can only be part
|
||||
|
@ -1402,7 +1402,7 @@ of one entity) or when merging spans with
|
|||
| `spans` | The spans to filter. ~~Iterable[Span]~~ |
|
||||
| **RETURNS** | The filtered spans. ~~List[Span]~~ |
|
||||
|
||||
### util.get_words_and_spaces {id="get_words_and_spaces",tag="function",new="3"}
|
||||
### util.get_words_and_spaces {id="get_words_and_spaces",tag="function",version="3"}
|
||||
|
||||
Given a list of words and a text, reconstruct the original tokens and return a
|
||||
list of words and spaces that can be used to create a [`Doc`](/api/doc#init).
|
||||
|
|
|
@ -3,7 +3,7 @@ title: Transformer
|
|||
teaser: Pipeline component for multi-task learning with transformer models
|
||||
tag: class
|
||||
source: github.com/explosion/spacy-transformers/blob/master/spacy_transformers/pipeline_component.py
|
||||
new: 3
|
||||
version: 3
|
||||
api_base_class: /api/pipe
|
||||
api_string_name: transformer
|
||||
---
|
||||
|
|
|
@ -3,7 +3,7 @@ title: Vectors
|
|||
teaser: Store, save and load word vectors
|
||||
tag: class
|
||||
source: spacy/vectors.pyx
|
||||
new: 2
|
||||
version: 2
|
||||
---
|
||||
|
||||
Vectors data is kept in the `Vectors.data` attribute, which should be an
|
||||
|
@ -356,7 +356,7 @@ supported for `floret` mode.
|
|||
| `sort` | Whether to sort the entries returned by score. Defaults to `True`. ~~bool~~ |
|
||||
| **RETURNS** | The most similar entries as a `(keys, best_rows, scores)` tuple. ~~Tuple[numpy.ndarray, numpy.ndarray, numpy.ndarray]~~ |
|
||||
|
||||
## Vectors.get_batch {id="get_batch",tag="method",new="3.2"}
|
||||
## Vectors.get_batch {id="get_batch",tag="method",version="3.2"}
|
||||
|
||||
Get the vectors for the provided keys efficiently as a batch.
|
||||
|
||||
|
|
|
@ -122,7 +122,7 @@ using `token.check_flag(flag_id)`.
|
|||
| `flag_id` | An integer between `1` and `63` (inclusive), specifying the bit at which the flag will be stored. If `-1`, the lowest available bit will be chosen. ~~int~~ |
|
||||
| **RETURNS** | The integer ID by which the flag value can be checked. ~~int~~ |
|
||||
|
||||
## Vocab.reset_vectors {id="reset_vectors",tag="method",new="2"}
|
||||
## Vocab.reset_vectors {id="reset_vectors",tag="method",version="2"}
|
||||
|
||||
Drop the current vector table. Because all vectors must be the same width, you
|
||||
have to call this to change the size of the vectors. Only one of the `width` and
|
||||
|
@ -140,7 +140,7 @@ have to call this to change the size of the vectors. Only one of the `width` and
|
|||
| `width` | The new width. ~~int~~ |
|
||||
| `shape` | The new shape. ~~int~~ |
|
||||
|
||||
## Vocab.prune_vectors {id="prune_vectors",tag="method",new="2"}
|
||||
## Vocab.prune_vectors {id="prune_vectors",tag="method",version="2"}
|
||||
|
||||
Reduce the current vector table to `nr_row` unique entries. Words mapped to the
|
||||
discarded vectors will be remapped to the closest vector among those remaining.
|
||||
|
@ -165,7 +165,7 @@ cosines are calculated in minibatches to reduce memory usage.
|
|||
| `batch_size` | Batch of vectors for calculating the similarities. Larger batch sizes might be faster, while temporarily requiring more memory. ~~int~~ |
|
||||
| **RETURNS** | A dictionary keyed by removed words mapped to `(string, score)` tuples, where `string` is the entry the removed word was mapped to, and `score` the similarity score between the two words. ~~Dict[str, Tuple[str, float]]~~ |
|
||||
|
||||
## Vocab.deduplicate_vectors {id="deduplicate_vectors",tag="method",new="3.3"}
|
||||
## Vocab.deduplicate_vectors {id="deduplicate_vectors",tag="method",version="3.3"}
|
||||
|
||||
> #### Example
|
||||
>
|
||||
|
@ -176,7 +176,7 @@ cosines are calculated in minibatches to reduce memory usage.
|
|||
Remove any duplicate rows from the current vector table, maintaining the
|
||||
mappings for all words in the vectors.
|
||||
|
||||
## Vocab.get_vector {id="get_vector",tag="method",new="2"}
|
||||
## Vocab.get_vector {id="get_vector",tag="method",version="2"}
|
||||
|
||||
Retrieve a vector for a word in the vocabulary. Words can be looked up by string
|
||||
or hash value. If the current vectors do not contain an entry for the word, a
|
||||
|
@ -194,7 +194,7 @@ or hash value. If the current vectors do not contain an entry for the word, a
|
|||
| `orth` | The hash value of a word, or its unicode string. ~~Union[int, str]~~ |
|
||||
| **RETURNS** | A word vector. Size and shape are determined by the `Vocab.vectors` instance. ~~numpy.ndarray[ndim=1, dtype=float32]~~ |
|
||||
|
||||
## Vocab.set_vector {id="set_vector",tag="method",new="2"}
|
||||
## Vocab.set_vector {id="set_vector",tag="method",version="2"}
|
||||
|
||||
Set a vector for a word in the vocabulary. Words can be referenced by string or
|
||||
hash value.
|
||||
|
@ -210,7 +210,7 @@ hash value.
|
|||
| `orth` | The hash value of a word, or its unicode string. ~~Union[int, str]~~ |
|
||||
| `vector` | The vector to set. ~~numpy.ndarray[ndim=1, dtype=float32]~~ |
|
||||
|
||||
## Vocab.has_vector {id="has_vector",tag="method",new="2"}
|
||||
## Vocab.has_vector {id="has_vector",tag="method",version="2"}
|
||||
|
||||
Check whether a word has a vector. Returns `False` if no vectors are loaded.
|
||||
Words can be looked up by string or hash value.
|
||||
|
@ -227,7 +227,7 @@ Words can be looked up by string or hash value.
|
|||
| `orth` | The hash value of a word, or its unicode string. ~~Union[int, str]~~ |
|
||||
| **RETURNS** | Whether the word has a vector. ~~bool~~ |
|
||||
|
||||
## Vocab.to_disk {id="to_disk",tag="method",new="2"}
|
||||
## Vocab.to_disk {id="to_disk",tag="method",version="2"}
|
||||
|
||||
Save the current state to a directory.
|
||||
|
||||
|
@ -243,7 +243,7 @@ Save the current state to a directory.
|
|||
| _keyword-only_ | |
|
||||
| `exclude` | String names of [serialization fields](#serialization-fields) to exclude. ~~Iterable[str]~~ |
|
||||
|
||||
## Vocab.from_disk {id="from_disk",tag="method",new="2"}
|
||||
## Vocab.from_disk {id="from_disk",tag="method",version="2"}
|
||||
|
||||
Loads state from a directory. Modifies the object in place and returns it.
|
||||
|
||||
|
|
|
@ -114,12 +114,12 @@ example, to add a tag for the documented type or mark features that have been
|
|||
introduced in a specific version or require statistical models to be loaded.
|
||||
Tags are also available as standalone `<Tag />` components.
|
||||
|
||||
| Argument | Example | Result |
|
||||
| -------- | -------------------------- | ----------------------------------------- |
|
||||
| `tag` | `{tag="method"}` | <Tag>method</Tag> |
|
||||
| `new` | `{new="3"}` | <Tag variant="new">3</Tag> |
|
||||
| `model` | `{model="tagger, parser"}` | <Tag variant="model">tagger, parser</Tag> |
|
||||
| `hidden` | `{hidden="true"}` | |
|
||||
| Argument | Example | Result |
|
||||
| --------- | -------------------------- | ----------------------------------------- |
|
||||
| `tag` | `{tag="method"}` | <Tag>method</Tag> |
|
||||
| `version` | `{version="3"}` | <Tag variant="new">3</Tag> |
|
||||
| `model` | `{model="tagger, parser"}` | <Tag variant="model">tagger, parser</Tag> |
|
||||
| `hidden` | `{hidden="true"}` | |
|
||||
|
||||
## Elements {id="elements"}
|
||||
|
||||
|
@ -165,7 +165,7 @@ import Tag from 'components/tag'
|
|||
|
||||
> ```jsx
|
||||
> <Tag>method</Tag>
|
||||
> <Tag variant="new">4</Tag>
|
||||
> <Tag variant="version">4</Tag>
|
||||
> <Tag variant="model">tagger, parser</Tag>
|
||||
> ```
|
||||
|
||||
|
|
|
@ -121,7 +121,7 @@ $ pip install -U %%SPACY_PKG_NAME%%SPACY_PKG_FLAGS
|
|||
$ python -m spacy validate
|
||||
```
|
||||
|
||||
### Run spaCy with GPU {id="gpu",new="2.0.14"}
|
||||
### Run spaCy with GPU {id="gpu",version="2.0.14"}
|
||||
|
||||
As of v2.0, spaCy comes with neural network models that are implemented in our
|
||||
machine learning library, [Thinc](https://thinc.ai). For GPU support, we've been
|
||||
|
|
|
@ -76,7 +76,7 @@ print(token.morph) # 'Case=Nom|Number=Sing|Person=1|PronType=Prs'
|
|||
print(token.morph.get("PronType")) # ['Prs']
|
||||
```
|
||||
|
||||
### Statistical morphology {id="morphologizer",new="3",model="morphologizer"}
|
||||
### Statistical morphology {id="morphologizer",version="3",model="morphologizer"}
|
||||
|
||||
spaCy's statistical [`Morphologizer`](/api/morphologizer) component assigns the
|
||||
morphological features and coarse-grained part-of-speech tags as `Token.morph`
|
||||
|
@ -118,7 +118,7 @@ print(doc[2].morph) # 'Case=Nom|Person=2|PronType=Prs'
|
|||
print(doc[2].pos_) # 'PRON'
|
||||
```
|
||||
|
||||
## Lemmatization {id="lemmatization",model="lemmatizer",new="3"}
|
||||
## Lemmatization {id="lemmatization",model="lemmatizer",version="3"}
|
||||
|
||||
spaCy provides two pipeline components for lemmatization:
|
||||
|
||||
|
@ -959,7 +959,7 @@ nlp.tokenizer.add_special_case("...gimme...?", [{"ORTH": "...gimme...?"}])
|
|||
assert len(nlp("...gimme...?")) == 1
|
||||
```
|
||||
|
||||
#### Debugging the tokenizer {id="tokenizer-debug",new="2.2.3"}
|
||||
#### Debugging the tokenizer {id="tokenizer-debug",version="2.2.3"}
|
||||
|
||||
A working implementation of the pseudo-code above is available for debugging as
|
||||
[`nlp.tokenizer.explain(text)`](/api/tokenizer#explain). It returns a list of
|
||||
|
@ -1287,7 +1287,7 @@ tokenizer** it will be using at runtime. See the docs on
|
|||
|
||||
</Infobox>
|
||||
|
||||
#### Training with custom tokenization {id="custom-tokenizer-training",new="3"}
|
||||
#### Training with custom tokenization {id="custom-tokenizer-training",version="3"}
|
||||
|
||||
spaCy's [training config](/usage/training#config) describes the settings,
|
||||
hyperparameters, pipeline and tokenizer used for constructing and training the
|
||||
|
@ -1456,7 +1456,7 @@ tokenizations add up to the same string. For example, you'll be able to align
|
|||
|
||||
</Infobox>
|
||||
|
||||
## Merging and splitting {id="retokenization",new="2.1"}
|
||||
## Merging and splitting {id="retokenization",version="2.1"}
|
||||
|
||||
The [`Doc.retokenize`](/api/doc#retokenize) context manager lets you merge and
|
||||
split tokens. Modifications to the tokenization are stored and performed all at
|
||||
|
@ -1709,7 +1709,7 @@ your `Doc` using custom components _before_ it's parsed. Depending on your text,
|
|||
this may also improve parse accuracy, since the parser is constrained to predict
|
||||
parses consistent with the sentence boundaries.
|
||||
|
||||
### Statistical sentence segmenter {id="sbd-senter",model="senter",new="3"}
|
||||
### Statistical sentence segmenter {id="sbd-senter",model="senter",version="3"}
|
||||
|
||||
The [`SentenceRecognizer`](/api/sentencerecognizer) is a simple statistical
|
||||
component that only provides sentence boundaries. Along with being faster and
|
||||
|
@ -1810,7 +1810,7 @@ doc = nlp(text)
|
|||
print("After:", [sent.text for sent in doc.sents])
|
||||
```
|
||||
|
||||
## Mappings & Exceptions {id="mappings-exceptions",new="3"}
|
||||
## Mappings & Exceptions {id="mappings-exceptions",version="3"}
|
||||
|
||||
The [`AttributeRuler`](/api/attributeruler) manages **rule-based mappings and
|
||||
exceptions** for all token-level attributes. As the number of
|
||||
|
|
|
@ -74,7 +74,7 @@ import Languages from 'widgets/languages.js'
|
|||
|
||||
<Languages />
|
||||
|
||||
### Multi-language support {id="multi-language",new="2"}
|
||||
### Multi-language support {id="multi-language",version="2"}
|
||||
|
||||
> ```python
|
||||
> # Standard import
|
||||
|
@ -96,7 +96,7 @@ To train a pipeline using the neutral multi-language class, you can set
|
|||
import the `MultiLanguage` class directly, or call
|
||||
[`spacy.blank("xx")`](/api/top-level#spacy.blank) for lazy-loading.
|
||||
|
||||
### Chinese language support {id="chinese",new="2.3"}
|
||||
### Chinese language support {id="chinese",version="2.3"}
|
||||
|
||||
The Chinese language class supports three word segmentation options, `char`,
|
||||
`jieba` and `pkuseg`.
|
||||
|
|
|
@ -461,7 +461,7 @@ run as part of the pipeline.
|
|||
| `nlp.component_names` | All component names, including disabled components. |
|
||||
| `nlp.disabled` | Names of components that are currently disabled. |
|
||||
|
||||
### Sourcing components from existing pipelines {id="sourced-components",new="3"}
|
||||
### Sourcing components from existing pipelines {id="sourced-components",version="3"}
|
||||
|
||||
Pipeline components that are independent can also be reused across pipelines.
|
||||
Instead of adding a new blank component, you can also copy an existing component
|
||||
|
@ -518,7 +518,7 @@ nlp.add_pipe("ner", source=source_nlp)
|
|||
print(nlp.pipe_names)
|
||||
```
|
||||
|
||||
### Analyzing pipeline components {id="analysis",new="3"}
|
||||
### Analyzing pipeline components {id="analysis",version="3"}
|
||||
|
||||
The [`nlp.analyze_pipes`](/api/language#analyze_pipes) method analyzes the
|
||||
components in the current pipeline and outputs information about them like the
|
||||
|
@ -838,7 +838,7 @@ make your factory a separate function. That's also how spaCy does it internally.
|
|||
|
||||
</Accordion>
|
||||
|
||||
### Language-specific factories {id="factories-language",new="3"}
|
||||
### Language-specific factories {id="factories-language",version="3"}
|
||||
|
||||
There are many use cases where you might want your pipeline components to be
|
||||
language-specific. Sometimes this requires entirely different implementation per
|
||||
|
@ -1197,7 +1197,7 @@ object is saved to disk, which will run the component's `to_disk` method. When
|
|||
the pipeline is loaded back into spaCy later to use it, the `from_disk` method
|
||||
will load the data back in.
|
||||
|
||||
## Python type hints and validation {id="type-hints",new="3"}
|
||||
## Python type hints and validation {id="type-hints",version="3"}
|
||||
|
||||
spaCy's configs are powered by our machine learning library Thinc's
|
||||
[configuration system](https://thinc.ai/docs/usage-config), which supports
|
||||
|
@ -1267,7 +1267,7 @@ nlp.add_pipe("debug", config={"log_level": "DEBUG"})
|
|||
doc = nlp("This is a text...")
|
||||
```
|
||||
|
||||
## Trainable components {id="trainable-components",new="3"}
|
||||
## Trainable components {id="trainable-components",version="3"}
|
||||
|
||||
spaCy's [`TrainablePipe`](/api/pipe) class helps you implement your own
|
||||
trainable components that have their own model instance, make predictions over
|
||||
|
@ -1384,7 +1384,7 @@ into your spaCy pipeline, see the usage guide on
|
|||
|
||||
</Infobox>
|
||||
|
||||
## Extension attributes {id="custom-components-attributes",new="2"}
|
||||
## Extension attributes {id="custom-components-attributes",version="2"}
|
||||
|
||||
spaCy allows you to set any custom attributes and methods on the `Doc`, `Span`
|
||||
and `Token`, which become available as `Doc._`, `Span._` and `Token._` – for
|
||||
|
|
|
@ -1,6 +1,6 @@
|
|||
---
|
||||
title: Projects
|
||||
new: 3
|
||||
version: 3
|
||||
menu:
|
||||
- ['Intro & Workflow', 'intro']
|
||||
- ['Directory & Assets', 'directory']
|
||||
|
|
|
@ -218,7 +218,7 @@ spaCy processes your text – and why your pattern matches, or why it doesn't.
|
|||
|
||||
</Infobox>
|
||||
|
||||
#### Extended pattern syntax and attributes {id="adding-patterns-attributes-extended",new="2.1"}
|
||||
#### Extended pattern syntax and attributes {id="adding-patterns-attributes-extended",version="2.1"}
|
||||
|
||||
Instead of mapping to a single value, token patterns can also map to a
|
||||
**dictionary of properties**. For example, to specify that the value of a lemma
|
||||
|
@ -251,7 +251,7 @@ following rich comparison attributes are available:
|
|||
| `INTERSECTS` | Attribute value (for `MORPH` or custom list attributes) has a non-empty intersection with a list. ~~Any~~ |
|
||||
| `==`, `>=`, `<=`, `>`, `<` | Attribute value is equal, greater or equal, smaller or equal, greater or smaller. ~~Union[int, float]~~ |
|
||||
|
||||
#### Regular expressions {id="regex",new="2.1"}
|
||||
#### Regular expressions {id="regex",version="2.1"}
|
||||
|
||||
In some cases, only matching tokens and token attributes isn't enough – for
|
||||
example, you might want to match different spellings of a word, without having
|
||||
|
@ -402,7 +402,7 @@ This quirk in the semantics is corrected in spaCy v2.1.0.
|
|||
|
||||
</Infobox>
|
||||
|
||||
#### Using wildcard token patterns {id="adding-patterns-wildcard",new="2"}
|
||||
#### Using wildcard token patterns {id="adding-patterns-wildcard",version="2"}
|
||||
|
||||
While the token attributes offer many options to write highly specific patterns,
|
||||
you can also use an empty dictionary, `{}` as a wildcard representing **any
|
||||
|
@ -416,7 +416,7 @@ character, but no whitespace – so you'll know it will be handled as one token.
|
|||
[{"ORTH": "User"}, {"ORTH": "name"}, {"ORTH": ":"}, {}]
|
||||
```
|
||||
|
||||
#### Validating and debugging patterns {id="pattern-validation",new="2.1"}
|
||||
#### Validating and debugging patterns {id="pattern-validation",version="2.1"}
|
||||
|
||||
The `Matcher` can validate patterns against a JSON schema with the option
|
||||
`validate=True`. This is useful for debugging patterns during development, in
|
||||
|
@ -927,7 +927,7 @@ as a stream.
|
|||
|
||||
</Infobox>
|
||||
|
||||
### Matching on other token attributes {id="phrasematcher-attrs",new="2.1"}
|
||||
### Matching on other token attributes {id="phrasematcher-attrs",version="2.1"}
|
||||
|
||||
By default, the `PhraseMatcher` will match on the verbatim token text, e.g.
|
||||
`Token.text`. By setting the `attr` argument on initialization, you can change
|
||||
|
@ -991,7 +991,7 @@ to match phrases with the same sequence of punctuation and non-punctuation
|
|||
tokens as the pattern. But this can easily get confusing and doesn't have much
|
||||
of an advantage over writing one or two token patterns.
|
||||
|
||||
## Dependency Matcher {id="dependencymatcher",new="3",model="parser"}
|
||||
## Dependency Matcher {id="dependencymatcher",version="3",model="parser"}
|
||||
|
||||
The [`DependencyMatcher`](/api/dependencymatcher) lets you match patterns within
|
||||
the dependency parse using
|
||||
|
@ -1272,7 +1272,7 @@ of patterns such as `{}` that match any token in the sentence.
|
|||
|
||||
</Infobox>
|
||||
|
||||
## Rule-based entity recognition {id="entityruler",new="2.1"}
|
||||
## Rule-based entity recognition {id="entityruler",version="2.1"}
|
||||
|
||||
The [`EntityRuler`](/api/entityruler) is a component that lets you add named
|
||||
entities based on pattern dictionaries, which makes it easy to combine
|
||||
|
@ -1343,7 +1343,7 @@ doc = nlp("MyCorp Inc. is a company in the U.S.")
|
|||
print([(ent.text, ent.label_) for ent in doc.ents])
|
||||
```
|
||||
|
||||
#### Validating and debugging EntityRuler patterns {id="entityruler-pattern-validation",new="2.1.8"}
|
||||
#### Validating and debugging EntityRuler patterns {id="entityruler-pattern-validation",version="2.1.8"}
|
||||
|
||||
The entity ruler can validate patterns against a JSON schema with the config
|
||||
setting `"validate"`. See details under
|
||||
|
@ -1353,7 +1353,7 @@ setting `"validate"`. See details under
|
|||
ruler = nlp.add_pipe("entity_ruler", config={"validate": True})
|
||||
```
|
||||
|
||||
### Adding IDs to patterns {id="entityruler-ent-ids",new="2.2.2"}
|
||||
### Adding IDs to patterns {id="entityruler-ent-ids",version="2.2.2"}
|
||||
|
||||
The [`EntityRuler`](/api/entityruler) can also accept an `id` attribute for each
|
||||
pattern. Using the `id` attribute allows multiple patterns to be associated with
|
||||
|
@ -1427,7 +1427,7 @@ all pipeline components will be restored and deserialized – including the enti
|
|||
ruler. This lets you ship powerful pipeline packages with binary weights _and_
|
||||
rules included!
|
||||
|
||||
### Using a large number of phrase patterns {id="entityruler-large-phrase-patterns",new="2.2.4"}
|
||||
### Using a large number of phrase patterns {id="entityruler-large-phrase-patterns",version="2.2.4"}
|
||||
|
||||
<!-- TODO: double-check that this still works if the ruler is added to the pipeline on creation, and include suggestion if needed -->
|
||||
|
||||
|
@ -1455,7 +1455,7 @@ with nlp.select_pipes(enable="tagger"):
|
|||
ruler.add_patterns(patterns)
|
||||
```
|
||||
|
||||
## Rule-based span matching {id="spanruler",new="3.3.1"}
|
||||
## Rule-based span matching {id="spanruler",version="3.3.1"}
|
||||
|
||||
The [`SpanRuler`](/api/spanruler) is a generalized version of the entity ruler
|
||||
that lets you add spans to `doc.spans` or `doc.ents` based on pattern
|
||||
|
|
|
@ -49,7 +49,7 @@ the language class, creates and adds the pipeline components based on the config
|
|||
and _then_ loads in the binary data. You can read more about this process
|
||||
[here](/usage/processing-pipelines#pipelines).
|
||||
|
||||
## Serializing Doc objects efficiently {id="docs",new="2.2"}
|
||||
## Serializing Doc objects efficiently {id="docs",version="2.2"}
|
||||
|
||||
If you're working with lots of data, you'll probably need to pass analyses
|
||||
between machines, either to use something like [Dask](https://dask.org) or
|
||||
|
@ -292,9 +292,9 @@ custom components to spaCy automatically.
|
|||
|
||||
</Infobox>
|
||||
|
||||
<!-- ## Initializing components with data {id="initialization",new="3"} -->
|
||||
<!-- ## Initializing components with data {id="initialization",version="3"} -->
|
||||
|
||||
## Using entry points {id="entry-points",new="2.1"}
|
||||
## Using entry points {id="entry-points",version="2.1"}
|
||||
|
||||
Entry points let you expose parts of a Python package you write to other Python
|
||||
packages. This lets one application easily customize the behavior of another, by
|
||||
|
@ -540,7 +540,7 @@ pipeline packages you [train](/usage/training), which could then specify
|
|||
`lang = snk` in their `config.cfg` without spaCy raising an error because the
|
||||
language is not available in the core library.
|
||||
|
||||
### Custom displaCy colors via entry points {id="entry-points-displacy",new="2.2"}
|
||||
### Custom displaCy colors via entry points {id="entry-points-displacy",version="2.2"}
|
||||
|
||||
If you're training a named entity recognition model for a custom domain, you may
|
||||
end up training different labels that don't have pre-defined colors in the
|
||||
|
|
|
@ -518,7 +518,7 @@ replace_listeners = ["model.tok2vec"]
|
|||
|
||||
</Infobox>
|
||||
|
||||
### Using predictions from preceding components {id="annotating-components",new="3.1"}
|
||||
### Using predictions from preceding components {id="annotating-components",version="3.1"}
|
||||
|
||||
By default, components are updated in isolation during training, which means
|
||||
that they don't see the predictions of any earlier components in the pipeline. A
|
||||
|
@ -1657,7 +1657,7 @@ typically give you everything you need to train fully custom pipelines with
|
|||
|
||||
</Infobox>
|
||||
|
||||
### Training from a Python script {id="api-train",new="3.2"}
|
||||
### Training from a Python script {id="api-train",version="3.2"}
|
||||
|
||||
If you want to run the training from a Python script instead of using the
|
||||
[`spacy train`](/api/cli#train) CLI command, you can call into the
|
||||
|
|
|
@ -1,7 +1,7 @@
|
|||
---
|
||||
title: Visualizers
|
||||
teaser: Visualize dependencies and entities in your browser or in a notebook
|
||||
new: 2
|
||||
version: 2
|
||||
menu:
|
||||
- ['Dependencies', 'dep']
|
||||
- ['Named Entities', 'ent']
|
||||
|
@ -79,7 +79,7 @@ For a list of all available options, see the
|
|||
|
||||
![displaCy visualizer (compact mode)](/images/displacy-compact.svg)
|
||||
|
||||
### Visualizing long texts {id="dep-long-text",new="2.0.12"}
|
||||
### Visualizing long texts {id="dep-long-text",version="2.0.12"}
|
||||
|
||||
Long texts can become difficult to read when displayed in one row, so it's often
|
||||
better to visualize them sentence-by-sentence instead. As of v2.0.12, `displacy`
|
||||
|
|
|
@ -83,7 +83,7 @@ const Headline = ({
|
|||
Component,
|
||||
id,
|
||||
name,
|
||||
new: version,
|
||||
version,
|
||||
model,
|
||||
tag,
|
||||
source,
|
||||
|
@ -136,7 +136,7 @@ const Headline = ({
|
|||
Headline.propTypes = {
|
||||
Component: PropTypes.oneOfType([PropTypes.element, PropTypes.string]).isRequired,
|
||||
id: PropTypes.oneOfType([PropTypes.string, PropTypes.oneOf([false])]),
|
||||
new: PropTypes.string,
|
||||
version: PropTypes.string,
|
||||
model: PropTypes.string,
|
||||
source: PropTypes.string,
|
||||
tag: PropTypes.string,
|
||||