Update docs

Ines Montani 2020-08-29 12:36:05 +02:00
parent e0b4984aa4
commit 66d76f5126
5 changed files with 214 additions and 97 deletions


@@ -74,15 +74,16 @@ your config and check that it's valid, you can run the

Defines the `nlp` object, its tokenizer and
[processing pipeline](/usage/processing-pipelines) component names.

| Name                      | Description |
| ------------------------- | ----------- |
| `lang`                    | Model language [ISO code](https://en.wikipedia.org/wiki/List_of_ISO_639-1_codes). Defaults to `null`. ~~str~~ |
| `pipeline`                | Names of pipeline components in order. Should correspond to sections in the `[components]` block, e.g. `[components.ner]`. See docs on [defining components](/usage/training#config-components). Defaults to `[]`. ~~List[str]~~ |
| `disabled`                | Names of pipeline components that are loaded but disabled by default and not run as part of the pipeline. Should correspond to components listed in `pipeline`. After a model is loaded, disabled components can be enabled using [`Language.enable_pipe`](/api/language#enable_pipe). ~~List[str]~~ |
| `load_vocab_data`         | Whether to load additional lexeme and vocab data from [`spacy-lookups-data`](https://github.com/explosion/spacy-lookups-data) if available. Defaults to `true`. ~~bool~~ |
| `before_creation`         | Optional [callback](/usage/training#custom-code-nlp-callbacks) to modify the `Language` subclass before it's initialized. Defaults to `null`. ~~Optional[Callable[[Type[Language]], Type[Language]]]~~ |
| `after_creation`          | Optional [callback](/usage/training#custom-code-nlp-callbacks) to modify the `nlp` object right after it's initialized. Defaults to `null`. ~~Optional[Callable[[Language], Language]]~~ |
| `after_pipeline_creation` | Optional [callback](/usage/training#custom-code-nlp-callbacks) to modify the `nlp` object after the pipeline components have been added. Defaults to `null`. ~~Optional[Callable[[Language], Language]]~~ |
| `tokenizer`               | The tokenizer to use. Defaults to [`Tokenizer`](/api/tokenizer). ~~Callable[[str], Doc]~~ |
### components {#config-components tag="section"}


@@ -357,35 +357,6 @@ their original weights after the block.

| -------- | ------------------------------------------------------ |
| `params` | A dictionary of parameters keyed by model ID. ~~dict~~ |
## Language.create_pipe {#create_pipe tag="method" new="2"}
Create a pipeline component from a factory.
<Infobox title="Changed in v3.0" variant="warning">
As of v3.0, the [`Language.add_pipe`](/api/language#add_pipe) method also takes
the string name of the factory, creates the component, adds it to the pipeline
and returns it. The `Language.create_pipe` method is now mostly used internally.
To create a component and add it to the pipeline, you should always use
`Language.add_pipe`.
</Infobox>
> #### Example
>
> ```python
> parser = nlp.create_pipe("parser")
> ```
| Name | Description |
| ------------------------------------- | --------------------------------------------------------------------------------------------------------------------------------------------------------------------------- |
| `factory_name` | Name of the registered component factory. ~~str~~ |
| `name` | Optional unique name of pipeline component instance. If not set, the factory name is used. An error is raised if the name already exists in the pipeline. ~~Optional[str]~~ |
| _keyword-only_ | |
| `config` <Tag variant="new">3</Tag> | Optional config parameters to use for this component. Will be merged with the `default_config` specified by the component factory. ~~Optional[Dict[str, Any]]~~ |
| `validate` <Tag variant="new">3</Tag> | Whether to validate the component config and arguments against the types expected by the factory. Defaults to `True`. ~~bool~~ |
| **RETURNS** | The pipeline component. ~~Callable[[Doc], Doc]~~ |
## Language.add_pipe {#add_pipe tag="method" new="2"}

Add a component to the processing pipeline. Expects a name that maps to a
@@ -434,6 +405,35 @@ component, adds it to the pipeline and returns it.

| `validate` <Tag variant="new">3</Tag> | Whether to validate the component config and arguments against the types expected by the factory. Defaults to `True`. ~~bool~~ |
| **RETURNS**                           | The pipeline component. ~~Callable[[Doc], Doc]~~ |
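For instance, the v3 usage described above might look like this (a short,
illustrative sketch):

```python
import spacy

nlp = spacy.blank("en")
# In v3, add_pipe takes the factory name, creates the component,
# adds it to the pipeline and returns it
parser = nlp.add_pipe("parser")
# Optionally pass a custom instance name and config overrides
ruler = nlp.add_pipe("entity_ruler", name="my_ruler", config={"overwrite_ents": True})
```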
## Language.create_pipe {#create_pipe tag="method" new="2"}
Create a pipeline component from a factory.
<Infobox title="Changed in v3.0" variant="warning">
As of v3.0, the [`Language.add_pipe`](/api/language#add_pipe) method also takes
the string name of the factory, creates the component, adds it to the pipeline
and returns it. The `Language.create_pipe` method is now mostly used internally.
To create a component and add it to the pipeline, you should always use
`Language.add_pipe`.
</Infobox>
> #### Example
>
> ```python
> parser = nlp.create_pipe("parser")
> ```
| Name | Description |
| ------------------------------------- | --------------------------------------------------------------------------------------------------------------------------------------------------------------------------- |
| `factory_name` | Name of the registered component factory. ~~str~~ |
| `name` | Optional unique name of pipeline component instance. If not set, the factory name is used. An error is raised if the name already exists in the pipeline. ~~Optional[str]~~ |
| _keyword-only_ | |
| `config` <Tag variant="new">3</Tag> | Optional config parameters to use for this component. Will be merged with the `default_config` specified by the component factory. ~~Optional[Dict[str, Any]]~~ |
| `validate` <Tag variant="new">3</Tag> | Whether to validate the component config and arguments against the types expected by the factory. Defaults to `True`. ~~bool~~ |
| **RETURNS** | The pipeline component. ~~Callable[[Doc], Doc]~~ |
## Language.has_factory {#has_factory tag="classmethod" new="3"}

Check whether a factory name is registered on the `Language` class or subclass.
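A minimal sketch of the check (assuming the built-in `"ner"` factory has been
registered by importing spaCy, and that no factory named `"my_component"`
exists):

```python
import spacy  # importing spaCy registers the built-in component factories
from spacy.language import Language

assert Language.has_factory("ner")               # built-in factory, registered on Language
assert not Language.has_factory("my_component")  # name that was never registered
```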
@@ -561,6 +561,54 @@ component function.

| `name`      | Name of the component to remove. ~~str~~ |
| **RETURNS** | A `(name, component)` tuple of the removed component. ~~Tuple[str, Callable[[Doc], Doc]]~~ |
## Language.disable_pipe {#disable_pipe tag="method" new="3"}
Temporarily disable a pipeline component so it's not run as part of the
pipeline. Disabled components are listed in
[`nlp.disabled`](/api/language#attributes) and included in
[`nlp.components`](/api/language#attributes), but not in
[`nlp.pipeline`](/api/language#pipeline), so they're not run when you process a
`Doc` with the `nlp` object. If the component is already disabled, this method
does nothing.
> #### Example
>
> ```python
> nlp.add_pipe("ner")
> nlp.add_pipe("textcat")
> assert nlp.pipe_names == ["ner", "textcat"]
> nlp.disable_pipe("ner")
> assert nlp.pipe_names == ["textcat"]
> assert nlp.component_names == ["ner", "textcat"]
> assert nlp.disabled == ["ner"]
> ```
| Name | Description |
| ------ | ----------------------------------------- |
| `name` | Name of the component to disable. ~~str~~ |
## Language.enable_pipe {#enable_pipe tag="method" new="3"}
Enable a previously disabled component (e.g. one disabled via
[`Language.disable_pipe`](/api/language#disable_pipe)) so it's run as part of
the pipeline, [`nlp.pipeline`](/api/language#pipeline). If the component is
already enabled, this method does nothing.
> #### Example
>
> ```python
> nlp.disable_pipe("ner")
> assert "ner" in nlp.disabled
> assert "ner" not in nlp.pipe_names
> nlp.enable_pipe("ner")
> assert "ner" not in nlp.disabled
> assert "ner" in nlp.pipe_names
> ```
| Name | Description |
| ------ | ---------------------------------------- |
| `name` | Name of the component to enable. ~~str~~ |
## Language.select_pipes {#select_pipes tag="contextmanager, method" new="3"}

Disable one or more pipeline components. If used as a context manager, the
@@ -568,7 +616,9 @@ pipeline will be restored to the initial state at the end of the block.
Otherwise, a `DisabledPipes` object is returned that has a `.restore()` method
you can use to undo your changes. You can specify either `disable` (as a list or
string) or `enable`. In the latter case, all components not in the `enable`
list will be disabled. Under the hood, this method calls into
[`disable_pipe`](/api/language#disable_pipe) and
[`enable_pipe`](/api/language#enable_pipe).

> #### Example
>
@@ -860,18 +910,21 @@ available to the loaded object.

## Attributes {#attributes}

| Name                                          | Description |
| --------------------------------------------- | ----------- |
| `vocab`                                       | A container for the lexical types. ~~Vocab~~ |
| `tokenizer`                                   | The tokenizer. ~~Tokenizer~~ |
| `make_doc`                                    | Callable that takes a string and returns a `Doc`. ~~Callable[[str], Doc]~~ |
| `pipeline`                                    | List of `(name, component)` tuples describing the current processing pipeline, in order. ~~List[Tuple[str, Callable[[Doc], Doc]]]~~ |
| `pipe_names` <Tag variant="new">2</Tag>       | List of pipeline component names, in order. ~~List[str]~~ |
| `pipe_labels` <Tag variant="new">2.2</Tag>    | List of labels set by the pipeline components, if available, keyed by component name. ~~Dict[str, List[str]]~~ |
| `pipe_factories` <Tag variant="new">2.2</Tag> | Dictionary of pipeline component names, mapped to their factory names. ~~Dict[str, str]~~ |
| `factories`                                   | All available factory functions, keyed by name. ~~Dict[str, Callable[[...], Callable[[Doc], Doc]]]~~ |
| `factory_names` <Tag variant="new">3</Tag>    | List of all available factory names. ~~List[str]~~ |
| `components` <Tag variant="new">3</Tag>       | List of all available `(name, component)` tuples, including components that are currently disabled. ~~List[Tuple[str, Callable[[Doc], Doc]]]~~ |
| `component_names` <Tag variant="new">3</Tag> | List of all available component names, including components that are currently disabled. ~~List[str]~~ |
| `disabled` <Tag variant="new">3</Tag> | Names of components that are currently disabled and don't run as part of the pipeline. ~~List[str]~~ |
| `path` <Tag variant="new">2</Tag> | Path to the model data directory, if a model is loaded. Otherwise `None`. ~~Optional[Path]~~ |
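As a quick illustration of how these attributes relate to each other (a small
sketch using a blank pipeline):

```python
import spacy

nlp = spacy.blank("en")
nlp.add_pipe("tagger")
nlp.add_pipe("ner")
nlp.disable_pipe("ner")

assert nlp.pipe_names == ["tagger"]              # only enabled components are run
assert nlp.component_names == ["tagger", "ner"]  # disabled components are still included
assert nlp.disabled == ["ner"]
```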
## Class attributes {#class-attributes}


@@ -23,6 +23,14 @@ path, spaCy will assume it's a data directory, load its

information to construct the `Language` class. The data will be loaded in via
[`Language.from_disk`](/api/language#from_disk).
<Infobox variant="warning" title="Changed in v3.0">
As of v3.0, the `disable` keyword argument specifies components to load but
disable, instead of components to not load at all. Those components can now be
specified separately using the new `exclude` keyword argument.
</Infobox>
> #### Example
>
> ```python
@@ -30,16 +38,17 @@ information to construct the `Language` class. The data will be loaded in via
> nlp = spacy.load("/path/to/en") # string path
> nlp = spacy.load(Path("/path/to/en")) # pathlib Path
>
> nlp = spacy.load("en_core_web_sm", exclude=["parser", "tagger"])
> ```
| Name                                 | Description |
| ------------------------------------ | ----------- |
| `name`                               | Model to load, i.e. package name or path. ~~Union[str, Path]~~ |
| _keyword-only_                       | |
| `disable`                            | Names of pipeline components to [disable](/usage/processing-pipelines#disabling). Disabled pipes will be loaded but they won't be run unless you explicitly enable them by calling [`nlp.enable_pipe`](/api/language#enable_pipe). ~~List[str]~~ |
| `exclude` <Tag variant="new">3</Tag> | Names of pipeline components to [exclude](/usage/processing-pipelines#disabling). Excluded components won't be loaded. ~~List[str]~~ |
| `config` <Tag variant="new">3</Tag>  | Optional config overrides, either as nested dict or dict keyed by section value in dot notation, e.g. `"components.name.value"`. ~~Union[Dict[str, Any], Config]~~ |
| **RETURNS** | A `Language` object with the loaded model. ~~Language~~ |
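For example, a component loaded with `disable` stays available and can be
switched back on later (a short sketch, assuming the `en_core_web_sm` package
is installed):

```python
import spacy

# Load the model, but don't run the named entity recognizer by default
nlp = spacy.load("en_core_web_sm", disable=["ner"])
assert "ner" in nlp.component_names and "ner" not in nlp.pipe_names

# Explicitly enable it again later on
nlp.enable_pipe("ner")
assert "ner" in nlp.pipe_names
```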
Essentially, `spacy.load()` is a convenience wrapper that reads the model's
[`config.cfg`](/api/data-formats#config), uses the language and pipeline
@@ -562,17 +571,18 @@ and create a `Language` object. The model data will then be loaded in via
>
> ```python
> nlp = util.load_model("en_core_web_sm")
> nlp = util.load_model("en_core_web_sm", exclude=["ner"])
> nlp = util.load_model("/path/to/data")
> ```
| Name                                 | Description |
| ------------------------------------ | ----------- |
| `name`                               | Package name or model path. ~~str~~ |
| `vocab` <Tag variant="new">3</Tag>   | Optional shared vocab to pass in on initialization. If `True` (default), a new `Vocab` object will be created. ~~Union[Vocab, bool]~~ |
| `disable`                            | Names of pipeline components to [disable](/usage/processing-pipelines#disabling). Disabled pipes will be loaded but they won't be run unless you explicitly enable them by calling [`nlp.enable_pipe`](/api/language#enable_pipe). ~~List[str]~~ |
| `exclude` <Tag variant="new">3</Tag> | Names of pipeline components to [exclude](/usage/processing-pipelines#disabling). Excluded components won't be loaded. ~~List[str]~~ |
| `config` <Tag variant="new">3</Tag>  | Config overrides as nested dict or flat dict keyed by section values in dot notation, e.g. `"nlp.pipeline"`. ~~Union[Dict[str, Any], Config]~~ |
| **RETURNS**                          | The `Language` object with the loaded model. ~~Language~~ |
### util.load_model_from_init_py {#util.load_model_from_init_py tag="function" new="2"}
@@ -588,13 +598,14 @@ A helper function to use in the `load()` method of a model package's
> return load_model_from_init_py(__file__, **overrides)
> ```
| Name                                 | Description |
| ------------------------------------ | ----------- |
| `init_file`                          | Path to the model's `__init__.py`, i.e. `__file__`. ~~Union[str, Path]~~ |
| `vocab` <Tag variant="new">3</Tag>   | Optional shared vocab to pass in on initialization. If `True` (default), a new `Vocab` object will be created. ~~Union[Vocab, bool]~~ |
| `disable`                            | Names of pipeline components to [disable](/usage/processing-pipelines#disabling). Disabled pipes will be loaded but they won't be run unless you explicitly enable them by calling [`nlp.enable_pipe`](/api/language#enable_pipe). ~~List[str]~~ |
| `exclude` <Tag variant="new">3</Tag> | Names of pipeline components to [exclude](/usage/processing-pipelines#disabling). Excluded components won't be loaded. ~~List[str]~~ |
| `config` <Tag variant="new">3</Tag>  | Config overrides as nested dict or flat dict keyed by section values in dot notation, e.g. `"nlp.pipeline"`. ~~Union[Dict[str, Any], Config]~~ |
| **RETURNS**                          | The `Language` object with the loaded model. ~~Language~~ |
### util.load_config {#util.load_config tag="function" new="3"}


@@ -235,38 +235,54 @@ available pipeline components and component functions.

| `tok2vec`     | [`Tok2Vec`](/api/tok2vec)         | Assign token-to-vector embeddings. |
| `transformer` | [`Transformer`](/api/transformer) | Assign the tokens and outputs of a transformer model. |
### Disabling, excluding and modifying components {#disabling}

If you don't need a particular component of the pipeline, for example the
tagger or the parser, you can **disable or exclude** it. This can sometimes make
a big difference and improve loading and inference speed. There are two
different mechanisms you can use:

1. **Disable:** The component and its data will be loaded with the model, but it
   will be disabled by default and not run as part of the processing pipeline.
   To run it, you can explicitly enable it by calling
   [`nlp.enable_pipe`](/api/language#enable_pipe). When you save out the `nlp`
   object, the disabled component will be included but disabled by default.
2. **Exclude:** Don't load the component and its data with the model. Once the
   model is loaded, there will be no reference to the excluded component.

Disabled and excluded component names can be provided to
[`spacy.load`](/api/top-level#spacy.load) as a list.

<!-- TODO: update with info on our models shipped with optional components -->

> #### 💡 Models with optional components
>
> The `disable` mechanism makes it easy to distribute models with optional
> components that you can enable or disable at runtime. For instance, your model
> may include a statistical _and_ a rule-based component for sentence
> segmentation, and you can choose which one to run depending on your use case.

```python
# Load the model without the entity recognizer
nlp = spacy.load("en_core_web_sm", exclude=["ner"])
# Load the tagger and parser but don't enable them
nlp = spacy.load("en_core_web_sm", disable=["tagger", "parser"])
# Explicitly enable the tagger later on
nlp.enable_pipe("tagger")
```
<Infobox variant="warning" title="Changed in v3.0">

As of v3.0, the `disable` keyword argument specifies components to load but
disable, instead of components to not load at all. Those components can now be
specified separately using the new `exclude` keyword argument.

</Infobox>

As a shortcut, you can use the [`nlp.select_pipes`](/api/language#select_pipes)
context manager to temporarily disable certain components for a given block. At
the end of the `with` block, the disabled pipeline components will be restored
automatically. Alternatively, `select_pipes` returns an object that lets you
call its `restore()` method to restore the disabled components when needed. This
can be useful if you want to prevent unnecessary code indentation of large
@@ -295,6 +311,14 @@ with nlp.select_pipes(enable="parser"):
    doc = nlp("I will only be parsed")
```
The [`nlp.pipe`](/api/language#pipe) method also supports a `disable` keyword
argument if you only want to disable components during processing:
```python
for doc in nlp.pipe(texts, disable=["tagger", "parser"]):
    # Do something with the doc here
```
Finally, you can also use the [`remove_pipe`](/api/language#remove_pipe) method
to remove pipeline components from an existing pipeline, the
[`rename_pipe`](/api/language#rename_pipe) method to rename them, or the
@@ -308,6 +332,31 @@ nlp.rename_pipe("ner", "entityrecognizer")
nlp.replace_pipe("tagger", my_custom_tagger)
```
The `Language` object exposes different [attributes](/api/language#attributes)
that let you inspect all available components and the components that currently
run as part of the pipeline.
> #### Example
>
> ```python
> nlp = spacy.blank("en")
> nlp.add_pipe("ner")
> nlp.add_pipe("textcat")
> assert nlp.pipe_names == ["ner", "textcat"]
> nlp.disable_pipe("ner")
> assert nlp.pipe_names == ["textcat"]
> assert nlp.component_names == ["ner", "textcat"]
> assert nlp.disabled == ["ner"]
> ```
| Name | Description |
| --------------------- | ---------------------------------------------------------------- |
| `nlp.pipeline` | `(name, component)` tuples of the processing pipeline, in order. |
| `nlp.pipe_names` | Pipeline component names, in order. |
| `nlp.components` | All `(name, component)` tuples, including disabled components. |
| `nlp.component_names` | All component names, including disabled components. |
| `nlp.disabled` | Names of components that are currently disabled. |
### Sourcing pipeline components from existing models {#sourced-components new="3"}

Pipeline components that are independent can also be reused across models.
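For instance, a trained component from an existing model can be added to a new
pipeline via the `source` argument of [`nlp.add_pipe`](/api/language#add_pipe)
(a brief sketch, assuming the `en_core_web_sm` package is installed):

```python
import spacy

# The model whose trained "ner" component we want to reuse
source_nlp = spacy.load("en_core_web_sm")

nlp = spacy.blank("en")
# Copy the component (and its weights) from the source model
nlp.add_pipe("ner", source=source_nlp)
assert nlp.pipe_names == ["ner"]
```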


@@ -254,12 +254,15 @@ The following methods, attributes and commands are new in spaCy v3.0.

| [`Token.lex`](/api/token#attributes)                                                                                      | Access a token's [`Lexeme`](/api/lexeme). |
| [`Token.morph`](/api/token#attributes) [`Token.morph_`](/api/token#attributes)                                            | Access a token's morphological analysis. |
| [`Language.select_pipes`](/api/language#select_pipes)                                                                     | Context manager for enabling or disabling specific pipeline components for a block. |
| [`Language.disable_pipe`](/api/language#disable_pipe) [`Language.enable_pipe`](/api/language#enable_pipe)                 | Disable or enable a loaded pipeline component (but don't remove it). |
| [`Language.analyze_pipes`](/api/language#analyze_pipes)                                                                   | [Analyze](/usage/processing-pipelines#analysis) components and their interdependencies. |
| [`Language.resume_training`](/api/language#resume_training)                                                               | Experimental: continue training a pretrained model and initialize "rehearsal" for components that implement a `rehearse` method to prevent catastrophic forgetting. |
| [`@Language.factory`](/api/language#factory) [`@Language.component`](/api/language#component)                             | Decorators for [registering](/usage/processing-pipelines#custom-components) pipeline component factories and simple stateless component functions. |
| [`Language.has_factory`](/api/language#has_factory)                                                                       | Check whether a component factory is registered on a language class. |
| [`Language.get_factory_meta`](/api/language#get_factory_meta) [`Language.get_pipe_meta`](/api/language#get_factory_meta)  | Get the [`FactoryMeta`](/api/language#factorymeta) with component metadata for a factory or instance name. |
| [`Language.config`](/api/language#config)                                                                                 | The [config](/usage/training#config) used to create the current `nlp` object. An instance of [`Config`](https://thinc.ai/docs/api-config#config) that can be saved to disk and used for training. |
| [`Language.components`](/api/language#attributes) [`Language.component_names`](/api/language#attributes)                  | All available components and component names, including disabled components that are not run as part of the pipeline. |
| [`Language.disabled`](/api/language#attributes)                                                                           | Names of disabled components that are not run as part of the pipeline. |
| [`Pipe.score`](/api/pipe#score)                                                                                           | Method on trainable pipeline components that returns a dictionary of evaluation scores. |
| [`registry`](/api/top-level#registry)                                                                                     | Function registry to map functions to string names that can be referenced in [configs](/usage/training#config). |
| [`util.load_meta`](/api/top-level#util.load_meta) [`util.load_config`](/api/top-level#util.load_config)                   | Updated helpers for loading a model's [`meta.json`](/api/data-formats#meta) and [`config.cfg`](/api/data-formats#config). |