Update documentation

2025-11-01 16:37:45 +03:00 · 2021-01-29 18:45:48 +11:00 · 99af9e7125
commit 99af9e7125
parent 99842387cb
2 changed files with 65 additions and 4 deletions
--- a/website/docs/api/language.md
+++ b/website/docs/api/language.md
@@ -833,6 +833,51 @@ token.ent_iob, token.ent_type
 | `pretty`       | Pretty-print the results as a table. Defaults to `False`. ~~bool~~                                                                                                                                                                          |
 | **RETURNS**    | Dictionary containing the pipe analysis, keyed by `"summary"` (component meta by pipe), `"problems"` (attribute names by pipe) and `"attrs"` (pipes that assign and require an attribute, keyed by attribute). ~~Optional[Dict[str, Any]]~~ |
 ## Language.replace_listeners {#replace_listeners tag="method" new="3"}
 Find [listener layers](/usage/embeddings-transformers#embedding-layers)
 (connecting to a shared token-to-vector embedding component) of a given pipeline
 component model and replace them with a standalone copy of the token-to-vector
 layer. The listener layer allows other components to connect to a shared
 token-to-vector embedding component like [`Tok2Vec`](/api/tok2vec) or
 [`Transformer`](/api/transformer). Replacing listeners can be useful when
 training a pipeline with components sourced from an existing pipeline: if
 multiple components (e.g. tagger, parser, NER) listen to the same
 token-to-vector component, but some of them are frozen and not updated, their
 performance may degrade significantly as the token-to-vector component is updated
 with new data. To prevent this, listeners can be replaced with a standalone
 token-to-vector layer that is owned by the component and doesn't change if the
 component isn't updated.
 This method is typically not called directly and only executed under the hood
 when loading a config with
 [sourced components](/usage/training#config-components) that define
 `replace_listeners`.
 > ```python
 > ### Example
 > nlp = spacy.load("en_core_web_sm")
 > nlp.replace_listeners("tok2vec", "tagger", ["model.tok2vec"])
 > ```
 >
 > ```ini
 > ### config.cfg (excerpt)
 > [training]
 > frozen_components = ["tagger"]
 >
 > [components]
 >
 > [components.tagger]
 > source = "en_core_web_sm"
 > replace_listeners = ["model.tok2vec"]
 > ```
 | Name           | Description                                                                                                                                                                                                                                                                                                                                                                                                    |
 | -------------- | -------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------- |
 | `tok2vec_name` | Name of the token-to-vector component, typically `"tok2vec"` or `"transformer"`.~~str~~                                                                                                                                                                                                                                                                                                                        |
 | `pipe_name`    | Name of pipeline component to replace listeners for. ~~str~~                                                                                                                                                                                                                                                                                                                                                   |
 | `listeners`    | The paths to the listeners, relative to the component config, e.g. `["model.tok2vec"]`. Typically, implementations will only connect to one tok2vec component, `model.tok2vec`, but in theory, custom models can use multiple listeners. The value here can either be an empty list to not replace any listeners, or a _complete_ list of the paths to all listener layers used by the model. ~~Iterable[str]~~ |
 ## Language.meta {#meta tag="property"}
 Meta data for the `Language` class, including name, version, data sources,
--- a/website/docs/usage/training.md
+++ b/website/docs/usage/training.md
@@ -419,13 +419,29 @@ pipeline = ["parser", "ner", "textcat", "custom"]
 frozen_components = ["parser", "custom"]
 ```
-<Infobox variant="warning" title="Shared Tok2Vec layer">
+<Infobox variant="warning" title="Shared Tok2Vec listener layer">
 When the components in your pipeline
 [share an embedding layer](/usage/embeddings-transformers#embedding-layers), the
-**performance** of your frozen component will be **degraded** if you continue training
+**performance** of your frozen component will be **degraded** if you continue
-other layers with the same underlying `Tok2Vec` instance. As a rule of thumb,
+training other layers with the same underlying `Tok2Vec` instance. As a rule of
-ensure that your frozen components are truly **independent** in the pipeline.
+thumb, ensure that your frozen components are truly **independent** in the
 pipeline.
 To automatically replace a shared token-to-vector listener with an independent
 copy of the token-to-vector layer, you can use the `replace_listeners` setting
 of a sourced component, pointing to the listener layer(s) in the config. For
 more details on how this works under the hood, see
 [`Language.replace_listeners`](/api/language#replace_listeners).
 ```ini
 [training]
 frozen_components = ["tagger"]
 [components.tagger]
 source = "en_core_web_sm"
 replace_listeners = ["model.tok2vec"]
 ```
 </Infobox>
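
The reasoning behind `replace_listeners` can be illustrated with a toy model. The sketch below is not spaCy code and does not use the spaCy API: `ToyTok2Vec`, `ToyListener`, `ToyComponent` and `replace_listener` are hypothetical names invented for illustration, and "training" is simulated by changing a single number. It only shows why a frozen component that listens to a shared layer changes its output when that layer is updated, and why swapping the listener for a standalone copy prevents this.

```python
import copy


class ToyTok2Vec:
    """Hypothetical stand-in for a shared token-to-vector layer."""

    def __init__(self):
        self.scale = 1.0  # the only "weight" of this toy layer

    def embed(self, tokens):
        # One number per token: its length times the current weight.
        return [len(token) * self.scale for token in tokens]


class ToyListener:
    """Hypothetical listener: owns no weights, forwards to the shared layer."""

    def __init__(self, upstream):
        self.upstream = upstream

    def embed(self, tokens):
        return self.upstream.embed(tokens)


class ToyComponent:
    """Hypothetical pipeline component whose prediction depends on its tok2vec."""

    def __init__(self, tok2vec):
        self.tok2vec = tok2vec

    def predict(self, tokens):
        return sum(self.tok2vec.embed(tokens))


def replace_listener(component):
    """Swap a listener for a standalone deep copy of the layer it listens to."""
    if isinstance(component.tok2vec, ToyListener):
        component.tok2vec = copy.deepcopy(component.tok2vec.upstream)


shared = ToyTok2Vec()
frozen_tagger = ToyComponent(ToyListener(shared))
trained_ner = ToyComponent(ToyListener(shared))

tokens = ["frozen", "tagger"]
before = frozen_tagger.predict(tokens)

# Give the frozen component its own copy, then simulate training
# that updates the shared layer.
replace_listener(frozen_tagger)
shared.scale = 2.0

after = frozen_tagger.predict(tokens)
ner_after = trained_ner.predict(tokens)
```

In this toy setup, `frozen_tagger` keeps producing the same prediction after the shared layer changes, while `trained_ner`, which still listens to the shared layer, follows the update. Without the `replace_listener` call, the frozen component's output would have shifted along with the shared weights.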