mirror of https://github.com/explosion/spaCy.git, synced 2025-01-26 17:24:41 +03:00

Update documentation

parent 99842387cb
commit 99af9e7125

@ -833,6 +833,51 @@ token.ent_iob, token.ent_type
| `pretty` | Pretty-print the results as a table. Defaults to `False`. ~~bool~~ |
| **RETURNS** | Dictionary containing the pipe analysis, keyed by `"summary"` (component meta by pipe), `"problems"` (attribute names by pipe) and `"attrs"` (pipes that assign and require an attribute, keyed by attribute). ~~Optional[Dict[str, Any]]~~ |

## Language.replace_listeners {#replace_listeners tag="method" new="3"}

Find [listener layers](/usage/embeddings-transformers#embedding-layers)
(connecting to a shared token-to-vector embedding component) of a given pipeline
component model and replace them with a standalone copy of the token-to-vector
layer. A listener layer allows components to connect to a shared
token-to-vector embedding component like [`Tok2Vec`](/api/tok2vec) or
[`Transformer`](/api/transformer). Replacing listeners can be useful when
training a pipeline with components sourced from an existing pipeline: if
multiple components (e.g. tagger, parser, NER) listen to the same
token-to-vector component, but some of them are frozen and not updated, their
performance may degrade significantly as the token-to-vector component is
updated with new data. To prevent this, listeners can be replaced with a
standalone token-to-vector layer that is owned by the component and doesn't
change if the component isn't updated.

This method is typically not called directly and only executed under the hood
when loading a config with
[sourced components](/usage/training#config-components) that define
`replace_listeners`.

> ```python
> ### Example
> import spacy
>
> nlp = spacy.load("en_core_web_sm")
> nlp.replace_listeners("tok2vec", "tagger", ["model.tok2vec"])
> ```
>
> ```ini
> ### config.cfg (excerpt)
> [training]
> frozen_components = ["tagger"]
>
> [components]
>
> [components.tagger]
> source = "en_core_web_sm"
> replace_listeners = ["model.tok2vec"]
> ```

| Name           | Description |
| -------------- | ----------- |
| `tok2vec_name` | Name of the token-to-vector component, typically `"tok2vec"` or `"transformer"`. ~~str~~ |
| `pipe_name`    | Name of the pipeline component to replace listeners for. ~~str~~ |
| `listeners`    | The paths to the listeners, relative to the component config, e.g. `["model.tok2vec"]`. Typically, implementations will only connect to one tok2vec component, `model.tok2vec`, but in theory, custom models can use multiple listeners. The value here can either be an empty list to not replace any listeners, or a _complete_ list of the paths to all listener layers used by the model. ~~Iterable[str]~~ |

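To make the `listeners` argument more concrete, here is a minimal sketch of what replacing listeners means for a component's config, using plain nested dicts. It is not spaCy's implementation: the helpers `get_by_path`, `set_by_path`, `count_listeners` and `replace_listener_configs` are hypothetical, the configs are simplified, and listener blocks are assumed to be recognizable by the `spacy.Tok2VecListener.v1` architecture (so `Transformer` listeners aren't counted). The real method also swaps the listener layers in the component's model for a copy of the token-to-vector model; the sketch only covers the config side.

```python
import copy
from typing import Any, Dict, Iterable

# Hypothetical helpers for illustration only, not part of spaCy's API.
# Assumption: listener blocks are identified by this architecture name.
LISTENER_ARCH = "spacy.Tok2VecListener.v1"


def get_by_path(cfg: Dict[str, Any], path: str) -> Any:
    """Resolve a dotted path like "model.tok2vec" in a nested dict."""
    node: Any = cfg
    for key in path.split("."):
        node = node[key]  # KeyError if the path doesn't exist
    return node


def set_by_path(cfg: Dict[str, Any], path: str, value: Any) -> None:
    """Overwrite the value stored at a dotted path in a nested dict."""
    *parents, last = path.split(".")
    node: Any = cfg
    for key in parents:
        node = node[key]
    node[last] = value


def count_listeners(cfg: Any) -> int:
    """Count the nested blocks that use the listener architecture."""
    if not isinstance(cfg, dict):
        return 0
    own = 1 if cfg.get("@architectures") == LISTENER_ARCH else 0
    return own + sum(count_listeners(value) for value in cfg.values())


def replace_listener_configs(
    tok2vec_cfg: Dict[str, Any],
    pipe_cfg: Dict[str, Any],
    listeners: Iterable[str],
) -> Dict[str, Any]:
    """Return a copy of pipe_cfg in which every listener block is replaced
    by a standalone copy of the tok2vec component's model config."""
    paths = list(listeners)
    new_cfg = copy.deepcopy(pipe_cfg)
    if not paths:
        # An empty list means: don't replace any listeners.
        return new_cfg
    if len(paths) != count_listeners(new_cfg):
        # A partial list can't produce a consistent config.
        raise ValueError("listeners must list the paths to all listener layers")
    for path in paths:
        get_by_path(new_cfg, path)  # fail early on a wrong path
        set_by_path(new_cfg, path, copy.deepcopy(tok2vec_cfg["model"]))
    return new_cfg


# Simplified, illustrative component configs (not copied from a real pipeline).
tok2vec_cfg = {
    "factory": "tok2vec",
    "model": {"@architectures": "spacy.HashEmbedCNN.v2", "width": 96},
}
tagger_cfg = {
    "factory": "tagger",
    "model": {
        "@architectures": "spacy.Tagger.v1",
        "tok2vec": {"@architectures": LISTENER_ARCH, "width": 96, "upstream": "*"},
    },
}
new_tagger_cfg = replace_listener_configs(tok2vec_cfg, tagger_cfg, ["model.tok2vec"])
print(new_tagger_cfg["model"]["tok2vec"]["@architectures"])  # spacy.HashEmbedCNN.v2
```

Counting listener blocks is what makes the _complete_ list matter in this sketch: a partial list would leave some listeners in place while others are replaced, so it's rejected instead of silently producing a mixed config.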

## Language.meta {#meta tag="property"}

Meta data for the `Language` class, including name, version, data sources,

@ -419,13 +419,29 @@ pipeline = ["parser", "ner", "textcat", "custom"]
frozen_components = ["parser", "custom"]
```

<Infobox variant="warning" title="Shared Tok2Vec listener layer">

When the components in your pipeline
[share an embedding layer](/usage/embeddings-transformers#embedding-layers), the
**performance** of your frozen component will be **degraded** if you continue
training other layers with the same underlying `Tok2Vec` instance. As a rule of
thumb, ensure that your frozen components are truly **independent** in the
pipeline.

To automatically replace a shared token-to-vector listener with an independent
copy of the token-to-vector layer, you can use the `replace_listeners` setting
of a sourced component, pointing to the listener layer(s) in the config. For
more details on how this works under the hood, see
[`Language.replace_listeners`](/api/language#replace_listeners).

```ini
[training]
frozen_components = ["tagger"]

[components.tagger]
source = "en_core_web_sm"
replace_listeners = ["model.tok2vec"]
```

</Infobox>

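The rule of thumb above can be partly checked before training. The sketch below is a hypothetical heuristic, not a spaCy feature: `frozen_without_replaced_listeners` reads a config like the excerpt above with Python's standard `configparser` and lists frozen, sourced components that don't set `replace_listeners`. It assumes the config uses no `${...}` variables and that values such as `["tagger"]` are valid JSON, and it can't tell whether a flagged component actually listens to a shared token-to-vector component, so a flagged name is only a prompt to check that component's `model` config.

```python
import configparser
import json
from typing import List

# The config excerpt from the box above.
CONFIG = """
[training]
frozen_components = ["tagger"]

[components.tagger]
source = "en_core_web_sm"
replace_listeners = ["model.tok2vec"]
"""


def frozen_without_replaced_listeners(cfg_text: str) -> List[str]:
    """Hypothetical heuristic: list frozen, sourced components that don't
    set replace_listeners and may therefore still share a listener."""
    # Assumes no ${...} variables (interpolation is switched off) and
    # JSON-compatible values such as ["tagger"] or "en_core_web_sm".
    parser = configparser.ConfigParser(interpolation=None)
    parser.read_string(cfg_text)
    frozen = json.loads(parser.get("training", "frozen_components", fallback="[]"))
    flagged = []
    for name in frozen:
        section = f"components.{name}"
        if not parser.has_option(section, "source"):
            continue  # only sourced components are checked here
        replaced = json.loads(parser.get(section, "replace_listeners", fallback="[]"))
        if not replaced:
            flagged.append(name)
    return flagged


print(frozen_without_replaced_listeners(CONFIG))  # []
```

Removing the `replace_listeners` line from the excerpt would make the helper return `["tagger"]`.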