mirror of
https://github.com/explosion/spaCy.git
synced 2025-01-12 18:26:30 +03:00
Update docs
parent
e0b4984aa4
commit
66d76f5126
|
@ -74,15 +74,16 @@ your config and check that it's valid, you can run the
|
||||||
Defines the `nlp` object, its tokenizer and
|
Defines the `nlp` object, its tokenizer and
|
||||||
[processing pipeline](/usage/processing-pipelines) component names.
|
[processing pipeline](/usage/processing-pipelines) component names.
|
||||||
|
|
||||||
| Name | Description |
|
| Name | Description |
|
||||||
| ------------------------- | -------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------- |
|
| ------------------------- | ---------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------- |
|
||||||
| `lang` | Model language [ISO code](https://en.wikipedia.org/wiki/List_of_ISO_639-1_codes). Defaults to `null`. ~~str~~ |
|
| `lang` | Model language [ISO code](https://en.wikipedia.org/wiki/List_of_ISO_639-1_codes). Defaults to `null`. ~~str~~ |
|
||||||
| `pipeline` | Names of pipeline components in order. Should correspond to sections in the `[components]` block, e.g. `[components.ner]`. See docs on [defining components](/usage/training#config-components). Defaults to `[]`. ~~List[str]~~ |
|
| `pipeline` | Names of pipeline components in order. Should correspond to sections in the `[components]` block, e.g. `[components.ner]`. See docs on [defining components](/usage/training#config-components). Defaults to `[]`. ~~List[str]~~ |
|
||||||
| `load_vocab_data` | Whether to load additional lexeme and vocab data from [`spacy-lookups-data`](https://github.com/explosion/spacy-lookups-data) if available. Defaults to `true`. ~~bool~~ |
|
| `disabled` | Names of pipeline components that are loaded but disabled by default and not run as part of the pipeline. Should correspond to components listed in `pipeline`. After a model is loaded, disabled components can be enabled using [`Language.enable_pipe`](/api/language#enable_pipe). ~~List[str]~~ |
|
||||||
| `before_creation` | Optional [callback](/usage/training#custom-code-nlp-callbacks) to modify `Language` subclass before it's initialized. Defaults to `null`. ~~Optional[Callable[[Type[Language]], Type[Language]]]~~ |
|
| `load_vocab_data` | Whether to load additional lexeme and vocab data from [`spacy-lookups-data`](https://github.com/explosion/spacy-lookups-data) if available. Defaults to `true`. ~~bool~~ |
|
||||||
| `after_creation` | Optional [callback](/usage/training#custom-code-nlp-callbacks) to modify `nlp` object right after it's initialized. Defaults to `null`. ~~Optional[Callable[[Language], Language]]~~ |
|
| `before_creation` | Optional [callback](/usage/training#custom-code-nlp-callbacks) to modify `Language` subclass before it's initialized. Defaults to `null`. ~~Optional[Callable[[Type[Language]], Type[Language]]]~~ |
|
||||||
| `after_pipeline_creation` | Optional [callback](/usage/training#custom-code-nlp-callbacks) to modify `nlp` object after the pipeline components have been added. Defaults to `null`. ~~Optional[Callable[[Language], Language]]~~ |
|
| `after_creation` | Optional [callback](/usage/training#custom-code-nlp-callbacks) to modify `nlp` object right after it's initialized. Defaults to `null`. ~~Optional[Callable[[Language], Language]]~~ |
|
||||||
| `tokenizer` | The tokenizer to use. Defaults to [`Tokenizer`](/api/tokenizer). ~~Callable[[str], Doc]~~ |
|
| `after_pipeline_creation` | Optional [callback](/usage/training#custom-code-nlp-callbacks) to modify `nlp` object after the pipeline components have been added. Defaults to `null`. ~~Optional[Callable[[Language], Language]]~~ |
|
||||||
|
| `tokenizer` | The tokenizer to use. Defaults to [`Tokenizer`](/api/tokenizer). ~~Callable[[str], Doc]~~ |
|
||||||
|
|
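The relationship between `pipeline` and `disabled` described in the table can be checked with a small, dependency-free sketch. The `[nlp]` block below is hypothetical and the component names are illustrative; spaCy reads configs through its own config system, so the standard-library `configparser` is used here only as an assumption to keep the example self-contained.

```python
import configparser
import json

# Hypothetical [nlp] block; the component names are illustrative only.
CONFIG_TEXT = """
[nlp]
lang = "en"
pipeline = ["tagger", "parser", "ner"]
disabled = ["parser"]
"""

parser = configparser.ConfigParser()
parser.read_string(CONFIG_TEXT)

# The values are JSON-style literals, so decode each one.
nlp_block = {key: json.loads(value) for key, value in parser["nlp"].items()}

# Every name in `disabled` should also be listed in `pipeline`.
unknown = [n for n in nlp_block["disabled"] if n not in nlp_block["pipeline"]]

# Components that would run by default, in pipeline order.
active = [n for n in nlp_block["pipeline"] if n not in nlp_block["disabled"]]
```

Here `unknown` stays empty because `parser` is listed in `pipeline`, and `active` contains only the components that are not disabled.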
||||||
### components {#config-components tag="section"}
|
### components {#config-components tag="section"}
|
||||||
|
|
||||||
|
|
|
@ -357,35 +357,6 @@ their original weights after the block.
|
||||||
| -------- | ------------------------------------------------------ |
|
| -------- | ------------------------------------------------------ |
|
||||||
| `params` | A dictionary of parameters keyed by model ID. ~~dict~~ |
|
| `params` | A dictionary of parameters keyed by model ID. ~~dict~~ |
|
||||||
|
|
||||||
## Language.create_pipe {#create_pipe tag="method" new="2"}
|
|
||||||
|
|
||||||
Create a pipeline component from a factory.
|
|
||||||
|
|
||||||
<Infobox title="Changed in v3.0" variant="warning">
|
|
||||||
|
|
||||||
As of v3.0, the [`Language.add_pipe`](/api/language#add_pipe) method also takes
|
|
||||||
the string name of the factory, creates the component, adds it to the pipeline
|
|
||||||
and returns it. The `Language.create_pipe` method is now mostly used internally.
|
|
||||||
To create a component and add it to the pipeline, you should always use
|
|
||||||
`Language.add_pipe`.
|
|
||||||
|
|
||||||
</Infobox>
|
|
||||||
|
|
||||||
> #### Example
|
|
||||||
>
|
|
||||||
> ```python
|
|
||||||
> parser = nlp.create_pipe("parser")
|
|
||||||
> ```
|
|
||||||
|
|
||||||
| Name | Description |
|
|
||||||
| ------------------------------------- | --------------------------------------------------------------------------------------------------------------------------------------------------------------------------- |
|
|
||||||
| `factory_name` | Name of the registered component factory. ~~str~~ |
|
|
||||||
| `name` | Optional unique name of pipeline component instance. If not set, the factory name is used. An error is raised if the name already exists in the pipeline. ~~Optional[str]~~ |
|
|
||||||
| _keyword-only_ | |
|
|
||||||
| `config` <Tag variant="new">3</Tag> | Optional config parameters to use for this component. Will be merged with the `default_config` specified by the component factory. ~~Optional[Dict[str, Any]]~~ |
|
|
||||||
| `validate` <Tag variant="new">3</Tag> | Whether to validate the component config and arguments against the types expected by the factory. Defaults to `True`. ~~bool~~ |
|
|
||||||
| **RETURNS** | The pipeline component. ~~Callable[[Doc], Doc]~~ |
|
|
||||||
|
|
||||||
## Language.add_pipe {#add_pipe tag="method" new="2"}
|
## Language.add_pipe {#add_pipe tag="method" new="2"}
|
||||||
|
|
||||||
Add a component to the processing pipeline. Expects a name that maps to a
|
Add a component to the processing pipeline. Expects a name that maps to a
|
||||||
|
@ -434,6 +405,35 @@ component, adds it to the pipeline and returns it.
|
||||||
| `validate` <Tag variant="new">3</Tag> | Whether to validate the component config and arguments against the types expected by the factory. Defaults to `True`. ~~bool~~ |
|
| `validate` <Tag variant="new">3</Tag> | Whether to validate the component config and arguments against the types expected by the factory. Defaults to `True`. ~~bool~~ |
|
||||||
| **RETURNS** | The pipeline component. ~~Callable[[Doc], Doc]~~ |
|
| **RETURNS** | The pipeline component. ~~Callable[[Doc], Doc]~~ |
|
||||||
|
|
||||||
|
## Language.create_pipe {#create_pipe tag="method" new="2"}
|
||||||
|
|
||||||
|
Create a pipeline component from a factory.
|
||||||
|
|
||||||
|
<Infobox title="Changed in v3.0" variant="warning">
|
||||||
|
|
||||||
|
As of v3.0, the [`Language.add_pipe`](/api/language#add_pipe) method also takes
|
||||||
|
the string name of the factory, creates the component, adds it to the pipeline
|
||||||
|
and returns it. The `Language.create_pipe` method is now mostly used internally.
|
||||||
|
To create a component and add it to the pipeline, you should always use
|
||||||
|
`Language.add_pipe`.
|
||||||
|
|
||||||
|
</Infobox>
|
||||||
|
|
||||||
|
> #### Example
|
||||||
|
>
|
||||||
|
> ```python
|
||||||
|
> parser = nlp.create_pipe("parser")
|
||||||
|
> ```
|
||||||
|
|
||||||
|
| Name | Description |
|
||||||
|
| ------------------------------------- | --------------------------------------------------------------------------------------------------------------------------------------------------------------------------- |
|
||||||
|
| `factory_name` | Name of the registered component factory. ~~str~~ |
|
||||||
|
| `name` | Optional unique name of pipeline component instance. If not set, the factory name is used. An error is raised if the name already exists in the pipeline. ~~Optional[str]~~ |
|
||||||
|
| _keyword-only_ | |
|
||||||
|
| `config` <Tag variant="new">3</Tag> | Optional config parameters to use for this component. Will be merged with the `default_config` specified by the component factory. ~~Optional[Dict[str, Any]]~~ |
|
||||||
|
| `validate` <Tag variant="new">3</Tag> | Whether to validate the component config and arguments against the types expected by the factory. Defaults to `True`. ~~bool~~ |
|
||||||
|
| **RETURNS** | The pipeline component. ~~Callable[[Doc], Doc]~~ |
|
||||||
|
|
||||||
## Language.has_factory {#has_factory tag="classmethod" new="3"}
|
## Language.has_factory {#has_factory tag="classmethod" new="3"}
|
||||||
|
|
||||||
Check whether a factory name is registered on the `Language` class or subclass.
|
Check whether a factory name is registered on the `Language` class or subclass.
|
||||||
|
@ -561,6 +561,54 @@ component function.
|
||||||
| `name` | Name of the component to remove. ~~str~~ |
|
| `name` | Name of the component to remove. ~~str~~ |
|
||||||
| **RETURNS** | A `(name, component)` tuple of the removed component. ~~Tuple[str, Callable[[Doc], Doc]]~~ |
|
| **RETURNS** | A `(name, component)` tuple of the removed component. ~~Tuple[str, Callable[[Doc], Doc]]~~ |
|
||||||
|
|
||||||
|
## Language.disable_pipe {#disable_pipe tag="method" new="3"}
|
||||||
|
|
||||||
|
Temporarily disable a pipeline component so it's not run as part of the
|
||||||
|
pipeline. Disabled components are listed in
|
||||||
|
[`nlp.disabled`](/api/language#attributes) and included in
|
||||||
|
[`nlp.components`](/api/language#attributes), but not in
|
||||||
|
[`nlp.pipeline`](/api/language#pipeline), so they're not run when you process a
|
||||||
|
`Doc` with the `nlp` object. If the component is already disabled, this method
|
||||||
|
does nothing.
|
||||||
|
|
||||||
|
> #### Example
|
||||||
|
>
|
||||||
|
> ```python
|
||||||
|
> nlp.add_pipe("ner")
|
||||||
|
> nlp.add_pipe("textcat")
|
||||||
|
> assert nlp.pipe_names == ["ner", "textcat"]
|
||||||
|
> nlp.disable_pipe("ner")
|
||||||
|
> assert nlp.pipe_names == ["textcat"]
|
||||||
|
> assert nlp.component_names == ["ner", "textcat"]
|
||||||
|
> assert nlp.disabled == ["ner"]
|
||||||
|
> ```
|
||||||
|
|
||||||
|
| Name | Description |
|
||||||
|
| ------ | ----------------------------------------- |
|
||||||
|
| `name` | Name of the component to disable. ~~str~~ |
|
||||||
|
|
||||||
|
## Language.enable_pipe {#enable_pipe tag="method" new="3"}
|
||||||
|
|
||||||
|
Enable a previously disabled component (e.g. via
|
||||||
|
[`Language.disable_pipe`](/api/language#disable_pipe)) so it's run as part of
|
||||||
|
the pipeline, [`nlp.pipeline`](/api/language#pipeline). If the component is
|
||||||
|
already enabled, this method does nothing.
|
||||||
|
|
||||||
|
> #### Example
|
||||||
|
>
|
||||||
|
> ```python
|
||||||
|
> nlp.disable_pipe("ner")
|
||||||
|
> assert "ner" in nlp.disabled
|
||||||
|
> assert "ner" not in nlp.pipe_names
|
||||||
|
> nlp.enable_pipe("ner")
|
||||||
|
> assert "ner" not in nlp.disabled
|
||||||
|
> assert "ner" in nlp.pipe_names
|
||||||
|
> ```
|
||||||
|
|
||||||
|
| Name | Description |
|
||||||
|
| ------ | ---------------------------------------- |
|
||||||
|
| `name` | Name of the component to enable. ~~str~~ |
|
||||||
|
|
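The bookkeeping behind `disable_pipe` and `enable_pipe` can be pictured with a toy model. This is a sketch under assumptions, not spaCy's implementation: `ToyPipeline` is a hypothetical class, components are plain string functions standing in for `Callable[[Doc], Doc]`, and raising `ValueError` for unknown names is an illustrative choice.

```python
from typing import Callable, List, Tuple

Component = Callable[[str], str]  # stand-in for Callable[[Doc], Doc]


class ToyPipeline:
    """Toy model of the bookkeeping only; not spaCy's Language class."""

    def __init__(self) -> None:
        self.components: List[Tuple[str, Component]] = []  # all components
        self.disabled: List[str] = []  # names that are currently switched off

    def add_pipe(self, name: str, component: Component) -> None:
        self.components.append((name, component))

    @property
    def component_names(self) -> List[str]:
        return [name for name, _ in self.components]

    @property
    def pipeline(self) -> List[Tuple[str, Component]]:
        # Only components that are not disabled are part of the pipeline.
        return [(n, c) for n, c in self.components if n not in self.disabled]

    @property
    def pipe_names(self) -> List[str]:
        return [name for name, _ in self.pipeline]

    def disable_pipe(self, name: str) -> None:
        if name not in self.component_names:
            raise ValueError(f"Unknown component: {name}")
        if name not in self.disabled:  # already disabled: do nothing
            self.disabled.append(name)

    def enable_pipe(self, name: str) -> None:
        if name not in self.component_names:
            raise ValueError(f"Unknown component: {name}")
        if name in self.disabled:  # already enabled: do nothing
            self.disabled.remove(name)


toy = ToyPipeline()
toy.add_pipe("ner", lambda text: text)
toy.add_pipe("textcat", lambda text: text)
toy.disable_pipe("ner")
names_while_disabled = toy.pipe_names
toy.enable_pipe("ner")
names_after_enable = toy.pipe_names
```

The key point is that disabling never removes a component: it stays in `components` and `component_names`, and only drops out of `pipeline` and `pipe_names` until it is enabled again.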
||||||
## Language.select_pipes {#select_pipes tag="contextmanager, method" new="3"}
|
## Language.select_pipes {#select_pipes tag="contextmanager, method" new="3"}
|
||||||
|
|
||||||
Disable one or more pipeline components. If used as a context manager, the
|
Disable one or more pipeline components. If used as a context manager, the
|
||||||
|
@ -568,7 +616,9 @@ pipeline will be restored to the initial state at the end of the block.
|
||||||
Otherwise, a `DisabledPipes` object is returned, that has a `.restore()` method
|
Otherwise, a `DisabledPipes` object is returned, that has a `.restore()` method
|
||||||
you can use to undo your changes. You can specify either `disable` (as a list or
|
you can use to undo your changes. You can specify either `disable` (as a list or
|
||||||
string), or `enable`. In the latter case, all components not in the `enable`
|
string), or `enable`. In the latter case, all components not in the `enable`
|
||||||
list, will be disabled.
|
list, will be disabled. Under the hood, this method calls into
|
||||||
|
[`disable_pipe`](/api/language#disable_pipe) and
|
||||||
|
[`enable_pipe`](/api/language#enable_pipe).
|
||||||
|
|
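How a `select_pipes`-style context manager could be layered on `disable_pipe` and `enable_pipe` is sketched below. It is a simplified illustration under assumptions: `ToyNLP` and this standalone `select_pipes` function are hypothetical, only the context-manager form is covered (not the returned `DisabledPipes` object), and requiring exactly one of `disable` or `enable` is a choice made for the sketch.

```python
from contextlib import contextmanager
from typing import Iterator, List, Optional, Sequence, Union


class ToyNLP:
    """Minimal stand-in that tracks component names only (hypothetical)."""

    def __init__(self, names: Sequence[str]) -> None:
        self.component_names = list(names)
        self.disabled: List[str] = []

    @property
    def pipe_names(self) -> List[str]:
        return [n for n in self.component_names if n not in self.disabled]

    def disable_pipe(self, name: str) -> None:
        if name not in self.disabled:
            self.disabled.append(name)

    def enable_pipe(self, name: str) -> None:
        if name in self.disabled:
            self.disabled.remove(name)


def _as_list(value: Union[str, Sequence[str]]) -> List[str]:
    # Accept either a single name or a sequence of names.
    return [value] if isinstance(value, str) else list(value)


@contextmanager
def select_pipes(
    nlp: ToyNLP,
    *,
    disable: Optional[Union[str, Sequence[str]]] = None,
    enable: Optional[Union[str, Sequence[str]]] = None,
) -> Iterator[List[str]]:
    if (disable is None) == (enable is None):
        raise ValueError("Specify exactly one of `disable` or `enable`")
    if enable is not None:
        keep = _as_list(enable)
        to_disable = [n for n in nlp.pipe_names if n not in keep]
    else:
        # Skip names that are already disabled so they are not re-enabled later.
        to_disable = [n for n in _as_list(disable) if n in nlp.pipe_names]
    for name in to_disable:
        nlp.disable_pipe(name)
    try:
        yield to_disable
    finally:
        # Restore only what this block switched off.
        for name in to_disable:
            nlp.enable_pipe(name)


nlp = ToyNLP(["tagger", "parser", "ner"])
with select_pipes(nlp, enable="parser"):
    names_inside = nlp.pipe_names
names_outside = nlp.pipe_names
```

Tracking only the components that the block itself disabled means a component that was already disabled beforehand stays disabled after the block ends.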
||||||
> #### Example
|
> #### Example
|
||||||
>
|
>
|
||||||
|
@ -860,18 +910,21 @@ available to the loaded object.
|
||||||
|
|
||||||
## Attributes {#attributes}
|
## Attributes {#attributes}
|
||||||
|
|
||||||
| Name | Description |
|
| Name | Description |
|
||||||
| --------------------------------------------- | ---------------------------------------------------------------------------------------------------------------------------- |
|
| --------------------------------------------- | ---------------------------------------------------------------------------------------------------------------------------------------------- |
|
||||||
| `vocab` | A container for the lexical types. ~~Vocab~~ |
|
| `vocab` | A container for the lexical types. ~~Vocab~~ |
|
||||||
| `tokenizer` | The tokenizer. ~~Tokenizer~~ |
|
| `tokenizer` | The tokenizer. ~~Tokenizer~~ |
|
||||||
| `make_doc` | Callable that takes a string and returns a `Doc`. ~~Callable[[str], Doc]~~ |
|
| `make_doc` | Callable that takes a string and returns a `Doc`. ~~Callable[[str], Doc]~~ |
|
||||||
| `pipeline` | List of `(name, component)` tuples describing the current processing pipeline, in order. ~~List[str, Callable[[Doc], Doc]]~~ |
|
| `pipeline` | List of `(name, component)` tuples describing the current processing pipeline, in order. ~~List[Tuple[str, Callable[[Doc], Doc]]]~~ |
|
||||||
| `pipe_names` <Tag variant="new">2</Tag> | List of pipeline component names, in order. ~~List[str]~~ |
|
| `pipe_names` <Tag variant="new">2</Tag> | List of pipeline component names, in order. ~~List[str]~~ |
|
||||||
| `pipe_labels` <Tag variant="new">2.2</Tag> | List of labels set by the pipeline components, if available, keyed by component name. ~~Dict[str, List[str]]~~ |
|
| `pipe_labels` <Tag variant="new">2.2</Tag> | List of labels set by the pipeline components, if available, keyed by component name. ~~Dict[str, List[str]]~~ |
|
||||||
| `pipe_factories` <Tag variant="new">2.2</Tag> | Dictionary of pipeline component names, mapped to their factory names. ~~Dict[str, str]~~ |
|
| `pipe_factories` <Tag variant="new">2.2</Tag> | Dictionary of pipeline component names, mapped to their factory names. ~~Dict[str, str]~~ |
|
||||||
| `factories` | All available factory functions, keyed by name. ~~Dict[str, Callable[[...], Callable[[Doc], Doc]]]~~ |
|
| `factories` | All available factory functions, keyed by name. ~~Dict[str, Callable[[...], Callable[[Doc], Doc]]]~~ |
|
||||||
| `factory_names` <Tag variant="new">3</Tag> | List of all available factory names. ~~List[str]~~ |
|
| `factory_names` <Tag variant="new">3</Tag> | List of all available factory names. ~~List[str]~~ |
|
||||||
| `path` <Tag variant="new">2</Tag> | Path to the model data directory, if a model is loaded. Otherwise `None`. ~~Optional[Path]~~ |
|
| `components` <Tag variant="new">3</Tag> | List of all available `(name, component)` tuples, including components that are currently disabled. ~~List[Tuple[str, Callable[[Doc], Doc]]]~~ |
|
||||||
|
| `component_names` <Tag variant="new">3</Tag> | List of all available component names, including components that are currently disabled. ~~List[str]~~ |
|
||||||
|
| `disabled` <Tag variant="new">3</Tag> | Names of components that are currently disabled and don't run as part of the pipeline. ~~List[str]~~ |
|
||||||
|
| `path` <Tag variant="new">2</Tag> | Path to the model data directory, if a model is loaded. Otherwise `None`. ~~Optional[Path]~~ |
|
||||||
|
|
||||||
## Class attributes {#class-attributes}
|
## Class attributes {#class-attributes}
|
||||||
|
|
||||||
|
|
|
@ -23,6 +23,14 @@ path, spaCy will assume it's a data directory, load its
|
||||||
information to construct the `Language` class. The data will be loaded in via
|
information to construct the `Language` class. The data will be loaded in via
|
||||||
[`Language.from_disk`](/api/language#from_disk).
|
[`Language.from_disk`](/api/language#from_disk).
|
||||||
|
|
||||||
|
<Infobox variant="warning" title="Changed in v3.0">
|
||||||
|
|
||||||
|
As of v3.0, the `disable` keyword argument specifies components to load but
|
||||||
|
disable, instead of components to not load at all. Those components can now be
|
||||||
|
specified separately using the new `exclude` keyword argument.
|
||||||
|
|
||||||
|
</Infobox>
|
||||||
|
|
||||||
> #### Example
|
> #### Example
|
||||||
>
|
>
|
||||||
> ```python
|
> ```python
|
||||||
|
@ -30,16 +38,17 @@ information to construct the `Language` class. The data will be loaded in via
|
||||||
> nlp = spacy.load("/path/to/en") # string path
|
> nlp = spacy.load("/path/to/en") # string path
|
||||||
> nlp = spacy.load(Path("/path/to/en")) # pathlib Path
|
> nlp = spacy.load(Path("/path/to/en")) # pathlib Path
|
||||||
>
|
>
|
||||||
> nlp = spacy.load("en_core_web_sm", disable=["parser", "tagger"])
|
> nlp = spacy.load("en_core_web_sm", exclude=["parser", "tagger"])
|
||||||
> ```
|
> ```
|
||||||
|
|
||||||
| Name | Description |
|
| Name | Description |
|
||||||
| ----------------------------------- | ------------------------------------------------------------------------------------------------------------------------------------------------------------------ |
|
| ------------------------------------ | ---------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------- |
|
||||||
| `name` | Model to load, i.e. package name or path. ~~Union[str, Path]~~ |
|
| `name` | Model to load, i.e. package name or path. ~~Union[str, Path]~~ |
|
||||||
| _keyword-only_ | |
|
| _keyword-only_ | |
|
||||||
| `disable` | Names of pipeline components to [disable](/usage/processing-pipelines#disabling). ~~List[str]~~ |
|
| `disable` | Names of pipeline components to [disable](/usage/processing-pipelines#disabling). Disabled pipes will be loaded but they won't be run unless you explicitly enable them by calling [`nlp.enable_pipe`](/api/language#enable_pipe). ~~List[str]~~ |
|
||||||
| `config` <Tag variant="new">3</Tag> | Optional config overrides, either as nested dict or dict keyed by section value in dot notation, e.g. `"components.name.value"`. ~~Union[Dict[str, Any], Config]~~ |
|
| `exclude` <Tag variant="new">3</Tag> | Names of pipeline components to [exclude](/usage/processing-pipelines#disabling). Excluded components won't be loaded. ~~List[str]~~ |
|
||||||
| **RETURNS** | A `Language` object with the loaded model. ~~Language~~ |
|
| `config` <Tag variant="new">3</Tag> | Optional config overrides, either as nested dict or dict keyed by section value in dot notation, e.g. `"components.name.value"`. ~~Union[Dict[str, Any], Config]~~ |
|
||||||
|
| **RETURNS** | A `Language` object with the loaded model. ~~Language~~ |
|
||||||
|
|
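The difference between `disable` and `exclude` at load time can be summarized with a toy loader. This is a sketch of the semantics described in the table, not of `spacy.load` itself: `toy_load` and the keys of the dictionary it returns are hypothetical names chosen to mirror the `Language` attributes.

```python
from typing import Dict, Iterable, List


def toy_load(
    available: List[str],
    *,
    disable: Iterable[str] = (),
    exclude: Iterable[str] = (),
) -> Dict[str, List[str]]:
    """Toy model of which components end up loaded, disabled and active."""
    excluded = set(exclude)
    to_disable = set(disable)
    # Excluded components are never loaded, so nothing refers to them afterwards.
    loaded = [name for name in available if name not in excluded]
    # Disabled components are loaded but do not run by default.
    disabled = [name for name in loaded if name in to_disable]
    active = [name for name in loaded if name not in to_disable]
    return {"component_names": loaded, "disabled": disabled, "pipe_names": active}


result = toy_load(["tagger", "parser", "ner"], disable=["tagger"], exclude=["ner"])
```

In this model the excluded `ner` is absent from every list, while the disabled `tagger` is still listed under `component_names` and could be switched on later.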
||||||
Essentially, `spacy.load()` is a convenience wrapper that reads the model's
|
Essentially, `spacy.load()` is a convenience wrapper that reads the model's
|
||||||
[`config.cfg`](/api/data-formats#config), uses the language and pipeline
|
[`config.cfg`](/api/data-formats#config), uses the language and pipeline
|
||||||
|
@ -562,17 +571,18 @@ and create a `Language` object. The model data will then be loaded in via
|
||||||
>
|
>
|
||||||
> ```python
|
> ```python
|
||||||
> nlp = util.load_model("en_core_web_sm")
|
> nlp = util.load_model("en_core_web_sm")
|
||||||
> nlp = util.load_model("en_core_web_sm", disable=["ner"])
|
> nlp = util.load_model("en_core_web_sm", exclude=["ner"])
|
||||||
> nlp = util.load_model("/path/to/data")
|
> nlp = util.load_model("/path/to/data")
|
||||||
> ```
|
> ```
|
||||||
|
|
||||||
| Name | Description |
|
| Name | Description |
|
||||||
| ----------------------------------- | ---------------------------------------------------------------------------------------------------------------------------------------------- |
|
| ------------------------------------ | ---------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------- |
|
||||||
| `name` | Package name or model path. ~~str~~ |
|
| `name` | Package name or model path. ~~str~~ |
|
||||||
| `vocab` <Tag variant="new">3</Tag> | Optional shared vocab to pass in on initialization. If `True` (default), a new `Vocab` object will be created. ~~Union[Vocab, bool]~~. |
|
| `vocab` <Tag variant="new">3</Tag> | Optional shared vocab to pass in on initialization. If `True` (default), a new `Vocab` object will be created. ~~Union[Vocab, bool]~~. |
|
||||||
| `disable` | Names of pipeline components to disable. ~~Iterable[str]~~ |
|
| `disable` | Names of pipeline components to [disable](/usage/processing-pipelines#disabling). Disabled pipes will be loaded but they won't be run unless you explicitly enable them by calling [`nlp.enable_pipe`](/api/language#enable_pipe). ~~List[str]~~ |
|
||||||
| `config` <Tag variant="new">3</Tag> | Config overrides as nested dict or flat dict keyed by section values in dot notation, e.g. `"nlp.pipeline"`. ~~Union[Dict[str, Any], Config]~~ |
|
| `exclude` <Tag variant="new">3</Tag> | Names of pipeline components to [exclude](/usage/processing-pipelines#disabling). Excluded components won't be loaded. ~~List[str]~~ |
|
||||||
| **RETURNS** | `Language` class with the loaded model. ~~Language~~ |
|
| `config` <Tag variant="new">3</Tag> | Config overrides as nested dict or flat dict keyed by section values in dot notation, e.g. `"nlp.pipeline"`. ~~Union[Dict[str, Any], Config]~~ |
|
||||||
|
| **RETURNS** | `Language` class with the loaded model. ~~Language~~ |
|
||||||
|
|
||||||
### util.load_model_from_init_py {#util.load_model_from_init_py tag="function" new="2"}
|
### util.load_model_from_init_py {#util.load_model_from_init_py tag="function" new="2"}
|
||||||
|
|
||||||
|
@ -588,13 +598,14 @@ A helper function to use in the `load()` method of a model package's
|
||||||
> return load_model_from_init_py(__file__, **overrides)
|
> return load_model_from_init_py(__file__, **overrides)
|
||||||
> ```
|
> ```
|
||||||
|
|
||||||
| Name | Description |
|
| Name | Description |
|
||||||
| ----------------------------------- | ---------------------------------------------------------------------------------------------------------------------------------------------- |
|
| ------------------------------------ | ---------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------- |
|
||||||
| `init_file` | Path to model's `__init__.py`, i.e. `__file__`. ~~Union[str, Path]~~ |
|
| `init_file` | Path to model's `__init__.py`, i.e. `__file__`. ~~Union[str, Path]~~ |
|
||||||
| `vocab` <Tag variant="new">3</Tag> | Optional shared vocab to pass in on initialization. If `True` (default), a new `Vocab` object will be created. ~~Union[Vocab, bool]~~. |
|
| `vocab` <Tag variant="new">3</Tag> | Optional shared vocab to pass in on initialization. If `True` (default), a new `Vocab` object will be created. ~~Union[Vocab, bool]~~. |
|
||||||
| `disable` | Names of pipeline components to disable. ~~Iterable[str]~~ |
|
| `disable` | Names of pipeline components to [disable](/usage/processing-pipelines#disabling). Disabled pipes will be loaded but they won't be run unless you explicitly enable them by calling [`nlp.enable_pipe`](/api/language#enable_pipe). ~~List[str]~~ |
|
||||||
| `config` <Tag variant="new">3</Tag> | Config overrides as nested dict or flat dict keyed by section values in dot notation, e.g. `"nlp.pipeline"`. ~~Union[Dict[str, Any], Config]~~ |
|
| `exclude` <Tag variant="new">3</Tag> | Names of pipeline components to [exclude](/usage/processing-pipelines#disabling). Excluded components won't be loaded. ~~List[str]~~ |
|
||||||
| **RETURNS** | `Language` class with the loaded model. ~~Language~~ |
|
| `config` <Tag variant="new">3</Tag> | Config overrides as nested dict or flat dict keyed by section values in dot notation, e.g. `"nlp.pipeline"`. ~~Union[Dict[str, Any], Config]~~ |
|
||||||
|
| **RETURNS** | `Language` class with the loaded model. ~~Language~~ |
|
||||||
|
|
||||||
### util.load_config {#util.load_config tag="function" new="3"}
|
### util.load_config {#util.load_config tag="function" new="3"}
|
||||||
|
|
||||||
|
|
|
@ -235,38 +235,54 @@ available pipeline components and component functions.
|
||||||
| `tok2vec` | [`Tok2Vec`](/api/tok2vec) | Assign token-to-vector embeddings. |
|
| `tok2vec` | [`Tok2Vec`](/api/tok2vec) | Assign token-to-vector embeddings. |
|
||||||
| `transformer` | [`Transformer`](/api/transformer) | Assign the tokens and outputs of a transformer model. |
|
| `transformer` | [`Transformer`](/api/transformer) | Assign the tokens and outputs of a transformer model. |
|
||||||
|
|
||||||
### Disabling and modifying pipeline components {#disabling}
|
### Disabling, excluding and modifying components {#disabling}
|
||||||
|
|
||||||
If you don't need a particular component of the pipeline – for example, the
|
If you don't need a particular component of the pipeline – for example, the
|
||||||
tagger or the parser, you can **disable loading** it. This can sometimes make a
|
tagger or the parser, you can **disable or exclude** it. This can sometimes make
|
||||||
big difference and improve loading speed. Disabled component names can be
|
a big difference and improve loading and inference speed. There are two
|
||||||
provided to [`spacy.load`](/api/top-level#spacy.load),
|
different mechanisms you can use:
|
||||||
[`Language.from_disk`](/api/language#from_disk) or the `nlp` object itself as a
|
|
||||||
list:
|
1. **Disable:** The component and its data will be loaded with the model, but it
|
||||||
|
will be disabled by default and not run as part of the processing pipeline.
|
||||||
|
To run it, you can explicitly enable it by calling
|
||||||
|
[`nlp.enable_pipe`](/api/language#enable_pipe). When you save out the `nlp`
|
||||||
|
object, the disabled component will be included but disabled by default.
|
||||||
|
2. **Exclude:** Don't load the component and its data with the model. Once the
|
||||||
|
model is loaded, there will be no reference to the excluded component.
|
||||||
|
|
||||||
|
Disabled and excluded component names can be provided to
|
||||||
|
[`spacy.load`](/api/top-level#spacy.load) as a list.
|
||||||
|
|
||||||
|
<!-- TODO: update with info on our models shipped with optional components -->
|
||||||
|
|
||||||
|
> #### 💡 Models with optional components
|
||||||
|
>
|
||||||
|
> The `disable` mechanism makes it easy to distribute models with optional
|
||||||
|
> components that you can enable or disable at runtime. For instance, your model
|
||||||
|
> may include a statistical _and_ a rule-based component for sentence
|
||||||
|
> segmentation, and you can choose which one to run depending on your use case.
|
||||||
|
|
||||||
```python
|
```python
|
||||||
### Disable loading
|
# Load the model without the entity recognizer
|
||||||
|
nlp = spacy.load("en_core_web_sm", exclude=["ner"])
|
||||||
|
|
||||||
|
# Load the tagger and parser but don't enable them
|
||||||
nlp = spacy.load("en_core_web_sm", disable=["tagger", "parser"])
|
nlp = spacy.load("en_core_web_sm", disable=["tagger", "parser"])
|
||||||
|
# Explicitly enable the tagger later on
|
||||||
|
nlp.enable_pipe("tagger")
|
||||||
```
|
```
|
||||||
|
|
||||||
In some cases, you do want to load all pipeline components and their weights,
|
<Infobox variant="warning" title="Changed in v3.0">
|
||||||
because you need them at different points in your application. However, if you
|
|
||||||
only need a `Doc` object with named entities, there's no need to run all
|
|
||||||
pipeline components on it – that can potentially make processing much slower.
|
|
||||||
Instead, you can use the `disable` keyword argument on
|
|
||||||
[`nlp.pipe`](/api/language#pipe) to temporarily disable the components **during
|
|
||||||
processing**:
|
|
||||||
|
|
||||||
```python
|
As of v3.0, the `disable` keyword argument specifies components to load but
|
||||||
### Disable for processing
|
disable, instead of components to not load at all. Those components can now be
|
||||||
for doc in nlp.pipe(texts, disable=["tagger", "parser"]):
|
specified separately using the new `exclude` keyword argument.
|
||||||
# Do something with the doc here
|
|
||||||
```
|
|
||||||
|
|
||||||
If you need to **execute more code** with components disabled – e.g. to reset
|
</Infobox>
|
||||||
the weights or update only some components during training – you can use the
|
|
||||||
[`nlp.select_pipes`](/api/language#select_pipes) context manager. At the end of
|
As a shortcut, you can use the [`nlp.select_pipes`](/api/language#select_pipes)
|
||||||
the `with` block, the disabled pipeline components will be restored
|
context manager to temporarily disable certain components for a given block. At
|
||||||
|
the end of the `with` block, the disabled pipeline components will be restored
|
||||||
automatically. Alternatively, `select_pipes` returns an object that lets you
|
automatically. Alternatively, `select_pipes` returns an object that lets you
|
||||||
call its `restore()` method to restore the disabled components when needed. This
|
call its `restore()` method to restore the disabled components when needed. This
|
||||||
can be useful if you want to prevent unnecessary code indentation of large
|
can be useful if you want to prevent unnecessary code indentation of large
|
||||||
|
@ -295,6 +311,14 @@ with nlp.select_pipes(enable="parser"):
|
||||||
doc = nlp("I will only be parsed")
|
doc = nlp("I will only be parsed")
|
||||||
```
|
```
|
||||||
|
|
||||||
|
The [`nlp.pipe`](/api/language#pipe) method also supports a `disable` keyword
|
||||||
|
argument if you only want to disable components during processing:
|
||||||
|
|
||||||
|
```python
|
||||||
|
for doc in nlp.pipe(texts, disable=["tagger", "parser"]):
|
||||||
|
# Do something with the doc here
|
||||||
|
```
|
||||||
|
|
||||||
Finally, you can also use the [`remove_pipe`](/api/language#remove_pipe) method
|
Finally, you can also use the [`remove_pipe`](/api/language#remove_pipe) method
|
||||||
to remove pipeline components from an existing pipeline, the
|
to remove pipeline components from an existing pipeline, the
|
||||||
[`rename_pipe`](/api/language#rename_pipe) method to rename them, or the
|
[`rename_pipe`](/api/language#rename_pipe) method to rename them, or the
|
||||||
|
@ -308,6 +332,31 @@ nlp.rename_pipe("ner", "entityrecognizer")
|
||||||
nlp.replace_pipe("tagger", my_custom_tagger)
|
nlp.replace_pipe("tagger", my_custom_tagger)
|
||||||
```
|
```
|
||||||
|
|
||||||
|
The `Language` object exposes different [attributes](/api/language#attributes)
|
||||||
|
that let you inspect all available components and the components that currently
|
||||||
|
run as part of the pipeline.
|
||||||
|
|
||||||
|
> #### Example
|
||||||
|
>
|
||||||
|
> ```python
|
||||||
|
> nlp = spacy.blank("en")
|
||||||
|
> nlp.add_pipe("ner")
|
||||||
|
> nlp.add_pipe("textcat")
|
||||||
|
> assert nlp.pipe_names == ["ner", "textcat"]
|
||||||
|
> nlp.disable_pipe("ner")
|
||||||
|
> assert nlp.pipe_names == ["textcat"]
|
||||||
|
> assert nlp.component_names == ["ner", "textcat"]
|
||||||
|
> assert nlp.disabled == ["ner"]
|
||||||
|
> ```
|
||||||
|
|
||||||
|
| Name | Description |
|
||||||
|
| --------------------- | ---------------------------------------------------------------- |
|
||||||
|
| `nlp.pipeline` | `(name, component)` tuples of the processing pipeline, in order. |
|
||||||
|
| `nlp.pipe_names` | Pipeline component names, in order. |
|
||||||
|
| `nlp.components` | All `(name, component)` tuples, including disabled components. |
|
||||||
|
| `nlp.component_names` | All component names, including disabled components. |
|
||||||
|
| `nlp.disabled` | Names of components that are currently disabled. |
|
||||||
|
|
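The relationship between these attributes can be sketched with a small stand-in class. This is a minimal illustration in plain Python, not spaCy's implementation: `ToyPipeline` and its internals are hypothetical, and only the attribute and method names (`pipeline`, `pipe_names`, `components`, `component_names`, `disabled`, `disable_pipe`, `enable_pipe`) are taken from the table and example above. The assumption is that disabled components stay registered but are skipped at call time.

```python
from typing import Callable, Dict, List, Set, Tuple

# A component is modelled here as a function from a dict "doc" to a dict "doc".
Component = Callable[[Dict], Dict]


class ToyPipeline:
    """Hypothetical stand-in for the bookkeeping described above (not spaCy code)."""

    def __init__(self) -> None:
        self._components: List[Tuple[str, Component]] = []
        self._disabled: Set[str] = set()

    def add_pipe(self, name: str, component: Component) -> None:
        self._components.append((name, component))

    def disable_pipe(self, name: str) -> None:
        if name not in self.component_names:
            raise ValueError(f"No component named {name!r}")
        self._disabled.add(name)

    def enable_pipe(self, name: str) -> None:
        if name not in self.component_names:
            raise ValueError(f"No component named {name!r}")
        self._disabled.discard(name)

    @property
    def components(self) -> List[Tuple[str, Component]]:
        # All registered components, including disabled ones.
        return list(self._components)

    @property
    def component_names(self) -> List[str]:
        return [name for name, _ in self._components]

    @property
    def pipeline(self) -> List[Tuple[str, Component]]:
        # Only the components that actually run, in order.
        return [(n, c) for n, c in self._components if n not in self._disabled]

    @property
    def pipe_names(self) -> List[str]:
        return [name for name, _ in self.pipeline]

    @property
    def disabled(self) -> List[str]:
        return [n for n in self.component_names if n in self._disabled]

    def __call__(self, doc: Dict) -> Dict:
        # Disabled components are skipped but remain registered.
        for _, component in self.pipeline:
            doc = component(doc)
        return doc


toy = ToyPipeline()
toy.add_pipe("ner", lambda doc: {**doc, "ner": True})
toy.add_pipe("textcat", lambda doc: {**doc, "textcat": True})
toy.disable_pipe("ner")
result = toy({"text": "hello"})
```

In this sketch, `component_names` always lists everything that was added, while `pipe_names` shrinks and `disabled` grows as components are switched off. Calling `enable_pipe` reverses that without re-adding the component.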
||||||
### Sourcing pipeline components from existing models {#sourced-components new="3"}
|
### Sourcing pipeline components from existing models {#sourced-components new="3"}
|
||||||
|
|
||||||
Pipeline components that are independent can also be reused across models.
|
Pipeline components that are independent can also be reused across models.
|
||||||
|
|
|
@ -254,12 +254,15 @@ The following methods, attributes and commands are new in spaCy v3.0.
|
||||||
| [`Token.lex`](/api/token#attributes) | Access a token's [`Lexeme`](/api/lexeme). |
|
| [`Token.lex`](/api/token#attributes) | Access a token's [`Lexeme`](/api/lexeme). |
|
||||||
| [`Token.morph`](/api/token#attributes) [`Token.morph_`](/api/token#attributes) | Access a token's morphological analysis. |
|
| [`Token.morph`](/api/token#attributes) [`Token.morph_`](/api/token#attributes) | Access a token's morphological analysis. |
|
||||||
| [`Language.select_pipes`](/api/language#select_pipes) | Context manager for enabling or disabling specific pipeline components for a block. |
|
| [`Language.select_pipes`](/api/language#select_pipes) | Context manager for enabling or disabling specific pipeline components for a block. |
|
||||||
|
| [`Language.disable_pipe`](/api/language#disable_pipe) [`Language.enable_pipe`](/api/language#enable_pipe) | Disable or enable a loaded pipeline component (but don't remove it). |
|
||||||
| [`Language.analyze_pipes`](/api/language#analyze_pipes) | [Analyze](/usage/processing-pipelines#analysis) components and their interdependencies. |
|
| [`Language.analyze_pipes`](/api/language#analyze_pipes) | [Analyze](/usage/processing-pipelines#analysis) components and their interdependencies. |
|
||||||
| [`Language.resume_training`](/api/language#resume_training) | Experimental: continue training a pretrained model and initialize "rehearsal" for components that implement a `rehearse` method to prevent catastrophic forgetting. |
|
| [`Language.resume_training`](/api/language#resume_training) | Experimental: continue training a pretrained model and initialize "rehearsal" for components that implement a `rehearse` method to prevent catastrophic forgetting. |
|
||||||
| [`@Language.factory`](/api/language#factory) [`@Language.component`](/api/language#component) | Decorators for [registering](/usage/processing-pipelines#custom-components) pipeline component factories and simple stateless component functions. |
|
| [`@Language.factory`](/api/language#factory) [`@Language.component`](/api/language#component) | Decorators for [registering](/usage/processing-pipelines#custom-components) pipeline component factories and simple stateless component functions. |
|
||||||
| [`Language.has_factory`](/api/language#has_factory) | Check whether a component factory is registered on a language class. |
|
| [`Language.has_factory`](/api/language#has_factory) | Check whether a component factory is registered on a language class. |
|
||||||
| [`Language.get_factory_meta`](/api/language#get_factory_meta) [`Language.get_pipe_meta`](/api/language#get_factory_meta) | Get the [`FactoryMeta`](/api/language#factorymeta) with component metadata for a factory or instance name. |
|
| [`Language.get_factory_meta`](/api/language#get_factory_meta) [`Language.get_pipe_meta`](/api/language#get_factory_meta) | Get the [`FactoryMeta`](/api/language#factorymeta) with component metadata for a factory or instance name. |
|
||||||
| [`Language.config`](/api/language#config) | The [config](/usage/training#config) used to create the current `nlp` object. An instance of [`Config`](https://thinc.ai/docs/api-config#config) and can be saved to disk and used for training. |
|
| [`Language.config`](/api/language#config) | The [config](/usage/training#config) used to create the current `nlp` object. An instance of [`Config`](https://thinc.ai/docs/api-config#config) and can be saved to disk and used for training. |
|
||||||
|
| [`Language.components`](/api/language#attributes) [`Language.component_names`](/api/language#attributes) | All available components and component names, including disabled components that are not run as part of the pipeline. |
|
||||||
|
| [`Language.disabled`](/api/language#attributes) | Names of disabled components that are not run as part of the pipeline. |
|
||||||
| [`Pipe.score`](/api/pipe#score) | Method on trainable pipeline components that returns a dictionary of evaluation scores. |
|
| [`Pipe.score`](/api/pipe#score) | Method on trainable pipeline components that returns a dictionary of evaluation scores. |
|
||||||
| [`registry`](/api/top-level#registry) | Function registry to map functions to string names that can be referenced in [configs](/usage/training#config). |
|
| [`registry`](/api/top-level#registry) | Function registry to map functions to string names that can be referenced in [configs](/usage/training#config). |
|
||||||
| [`util.load_meta`](/api/top-level#util.load_meta) [`util.load_config`](/api/top-level#util.load_config) | Updated helpers for loading a model's [`meta.json`](/api/data-formats#meta) and [`config.cfg`](/api/data-formats#config). |
|
| [`util.load_meta`](/api/top-level#util.load_meta) [`util.load_config`](/api/top-level#util.load_config) | Updated helpers for loading a model's [`meta.json`](/api/data-formats#meta) and [`config.cfg`](/api/data-formats#config). |
|
||||||
|
|