Dev docs: listeners (#9061)

* Start Listeners documentation

* intro tabel of different architectures

* initialization, linking, dim inference

* internal comm (WIP)

* expand internal comm section

* frozen components and replacing listeners

* various small fixes

* fix content table

* fix link
Sofie Van Landeghem 2021-08-30 14:56:35 +02:00 committed by GitHub
parent 1e9b4b55ee
commit 5af88427a2

@ -0,0 +1,220 @@
# Listeners
1. [Overview](#1-overview)
2. [Initialization](#2-initialization)
- [A. Linking listeners to the embedding component](#2a-linking-listeners-to-the-embedding-component)
- [B. Shape inference](#2b-shape-inference)
3. [Internal communication](#3-internal-communication)
- [A. During prediction](#3a-during-prediction)
- [B. During training](#3b-during-training)
- [C. Frozen components](#3c-frozen-components)
4. [Replacing listener with standalone](#4-replacing-listener-with-standalone)
## 1. Overview
Trainable spaCy components typically use some sort of `tok2vec` layer as part of the `model` definition.
This `tok2vec` layer produces embeddings and is either a standard `Tok2Vec` layer, or a Transformer-based one.
Both versions can be used either inline/standalone, which means that they are defined and used
by only one specific component (e.g. NER), or
[shared](https://spacy.io/usage/embeddings-transformers#embedding-layers),
in which case the embedding functionality becomes a separate component that can
feed embeddings to multiple components downstream, using a listener pattern.
| Type | Usage | Model Architecture |
| ------------- | ---------- | -------------------------------------------------------------------------------------------------- |
| `Tok2Vec` | standalone | [`spacy.Tok2Vec`](https://spacy.io/api/architectures#Tok2Vec) |
| `Tok2Vec` | listener | [`spacy.Tok2VecListener`](https://spacy.io/api/architectures#Tok2VecListener) |
| `Transformer` | standalone | [`spacy-transformers.Tok2VecTransformer`](https://spacy.io/api/architectures#Tok2VecTransformer) |
| `Transformer` | listener | [`spacy-transformers.TransformerListener`](https://spacy.io/api/architectures#TransformerListener) |
Here we discuss the listener pattern and its implementation in code in more detail.
## 2. Initialization
### 2A. Linking listeners to the embedding component
To allow sharing a `tok2vec` layer, a separate `tok2vec` component needs to be defined in the config:
```
[components.tok2vec]
factory = "tok2vec"
[components.tok2vec.model]
@architectures = "spacy.Tok2Vec.v2"
```
A listener can then be set up by making sure the correct `upstream` name is defined, referring to the
name of the `tok2vec` component (which equals the factory name by default), or `*` as a wildcard:
```
[components.ner.model.tok2vec]
@architectures = "spacy.Tok2VecListener.v1"
upstream = "tok2vec"
```
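For illustration, two downstream components could share the same embeddings by each defining a listener that
points to the same upstream `tok2vec` component (a hypothetical config snippet, component names are just examples):
```
[components.tagger.model.tok2vec]
@architectures = "spacy.Tok2VecListener.v1"
upstream = "tok2vec"

[components.ner.model.tok2vec]
@architectures = "spacy.Tok2VecListener.v1"
upstream = "tok2vec"
```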
When an [`nlp`](https://github.com/explosion/spaCy/blob/master/extra/DEVELOPER_DOCS/Language.md) object is
initialized or deserialized, it will make sure to link each `tok2vec` component to its listeners. This is
implemented in the method `nlp._link_components()` which loops over each
component in the pipeline and calls `find_listeners()` on a component if that method is defined.
The [`tok2vec` component](https://github.com/explosion/spaCy/blob/master/spacy/pipeline/tok2vec.py)'s implementation
of this `find_listeners()` method will specifically identify sublayers of a model definition that are of type
`Tok2VecListener` with a matching upstream name and will then add that listener to the internal `self.listener_map`.
If it's a Transformer-based pipeline, a
[`transformer` component](https://github.com/explosion/spacy-transformers/blob/master/spacy_transformers/pipeline_component.py)
has a similar implementation but its `find_listeners()` function will specifically look for `TransformerListener`
sublayers of downstream components.
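In simplified pseudo-code (a sketch, not the actual spaCy source), the linking step looks roughly like this:
```
# Simplified sketch of the linking step, not the actual implementation.
def link_components(pipeline):
    for name, proc in pipeline:
        if not hasattr(proc, "find_listeners"):
            continue  # only tok2vec/transformer components host listeners
        for other_name, other_proc in pipeline:
            if other_name != name:
                # find_listeners() walks the other component's model, looking for
                # listener sublayers whose upstream name matches this component
                # (or the wildcard "*"), and registers them in the listener_map.
                proc.find_listeners(other_proc)
```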
### 2B. Shape inference
Typically, the output dimension `nO` of a listener's model equals the `nO` (or `width`) of the upstream embedding layer.
For a standard `Tok2Vec`-based component, this is usually known up-front and defined as such in the config:
```
[components.ner.model.tok2vec]
@architectures = "spacy.Tok2VecListener.v1"
width = ${components.tok2vec.model.encode.width}
```
A `transformer` component, however, only knows its `nO` dimension after the HuggingFace transformer
has been set with the function `model.attrs["set_transformer"]`,
[implemented](https://github.com/explosion/spacy-transformers/blob/master/spacy_transformers/layers/transformer_model.py)
by `set_pytorch_transformer`.
This is why, upon linking of the transformer listeners, the `transformer` component also makes sure to set
the listener's output dimension correctly.
This shape inference mechanism also needs to happen with resumed/frozen components, which means that for some CLI
commands (`assemble` and `train`), we need to call `nlp._link_components` even before initializing the `nlp`
object. To cover all use-cases and avoid negative side effects, the code base ensures that performing the
linking twice is not harmful.
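Conceptually, the dimension forwarding amounts to something like this (a sketch with simplified attribute names,
not the actual implementation):
```
# Sketch: once the HuggingFace weights are loaded, the hidden width is known
# and can be pushed down to every registered listener that has no nO yet.
def forward_output_dim(transformer_model, listeners):
    width = transformer_model.get_dim("nO")  # only known after loading the HF model
    for listener in listeners:
        if not listener.has_dim("nO"):
            listener.set_dim("nO", width)
```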
## 3. Internal communication
The internal communication between a listener and its downstream components is organized by sending and
receiving information across the components - either directly or implicitly.
The details are different depending on whether the pipeline is currently training, or predicting.
Either way, the `tok2vec` or `transformer` component always needs to run before the listener.
### 3A. During prediction
When the `Tok2Vec` pipeline component is called, its `predict()` method is executed to produce the results,
which are then stored by `set_annotations()` in the `doc.tensor` field of the document(s).
Similarly, the `Transformer` component stores the produced embeddings
in `doc._.trf_data`. Next, the `forward` pass of a
[`Tok2VecListener`](https://github.com/explosion/spaCy/blob/master/spacy/pipeline/tok2vec.py)
or a
[`TransformerListener`](https://github.com/explosion/spacy-transformers/blob/master/spacy_transformers/layers/listener.py)
accesses these fields on the `Doc` directly. Both listener implementations have a fallback mechanism for when these
properties were not set on the `Doc`: in that case an all-zero tensor is produced and returned.
We need this fallback mechanism to enable shape inference methods in Thinc, but the code
is slightly risky and at times might hide another bug - so it's a good spot to be aware of.
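As a rough sketch (simplified from the actual listener code), the prediction-time forward pass of a `Tok2VecListener`
does something like this:
```
# Rough sketch of the listener's forward pass at prediction time (simplified).
def listener_forward(model, docs, is_train=False):
    width = model.get_dim("nO")
    outputs = []
    for doc in docs:
        if doc.tensor.size:
            # embeddings were stored earlier by the tok2vec component's set_annotations()
            outputs.append(doc.tensor)
        else:
            # fallback: return an all-zero tensor, e.g. during shape inference
            outputs.append(model.ops.alloc2f(len(doc), width))
    return outputs, lambda d_outputs: []
```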
### 3B. During training
During training, the `update()` methods of the `Tok2Vec` and `Transformer` components don't necessarily set the
annotations on the `Doc` (though since spaCy 3.1 they can, if they are part of the `annotating_components` list in the config).
Instead, we rely on a caching mechanism between the original embedding component and its listener.
Specifically, the produced embeddings are sent to the listeners by calling `listener.receive()` and uniquely
identifying the batch of documents with a `batch_id`. This `receive()` call also sends the appropriate `backprop`
call to ensure that gradients from the downstream component flow back to the trainable `Tok2Vec` or `Transformer`
network.
We rely on the `nlp` object properly batching the data and sending each batch through the pipeline in sequence,
which means that only one such batch needs to be kept in memory for each listener.
When the downstream component runs and the listener should produce embeddings, it accesses the batch in memory,
runs the backpropagation, and returns the results and the gradients.
There are two ways in which this mechanism can fail; both are detected by `verify_inputs()`:
- `E953` if a different batch is in memory than the requested one - signaling some kind of out-of-sync state of the
training pipeline.
- `E954` if no batch is in memory at all - signaling that the pipeline is probably not set up correctly.
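A conceptual sketch of this handshake (simplified names; `make_batch_id` is a hypothetical stand-in for how a batch
of `Doc` objects is identified):
```
# Conceptual sketch of the training-time caching mechanism (simplified).
def make_batch_id(docs):
    # hypothetical stand-in for uniquely identifying a batch of Docs
    return tuple(id(doc) for doc in docs)

class SketchListener:
    def __init__(self):
        self._batch_id = None
        self._outputs = None
        self._backprop = None

    def receive(self, batch_id, outputs, backprop):
        # called by the upstream tok2vec/transformer during its update()
        self._batch_id = batch_id
        self._outputs = outputs
        self._backprop = backprop

    def verify_inputs(self, docs):
        if self._batch_id is None and self._outputs is None:
            raise ValueError("E954: no batch in memory")  # pipeline not set up correctly
        if self._batch_id != make_batch_id(docs):
            raise ValueError("E953: cached batch is out of sync with the requested one")
        return True
```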
#### Training with multiple listeners
One `Tok2Vec` or `Transformer` component may be listened to by several downstream components, e.g.
a tagger and a parser could be sharing the same embeddings. In this case, we need to be careful about how we do
the backpropagation. When the `Tok2Vec` or `Transformer` sends out data to the listener with `receive()`, they will
send an `accumulate_gradient` function call to all listeners, except the last one. This function will keep track
of the gradients received so far. Only the final listener in the pipeline will get an actual `backprop` call that
will initiate the backpropagation of the `tok2vec` or `transformer` model with the accumulated gradients.
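In simplified pseudo-code, the difference between the two callbacks looks roughly like this (a sketch, not the
actual implementation):
```
# Sketch of gradient accumulation across multiple listeners (simplified).
def send_to_listeners(listeners, batch_id, outputs, backprop_embedding):
    accumulated = []

    def accumulate_gradient(d_outputs):
        # all listeners except the last one only collect their gradients
        accumulated.append(d_outputs)
        return []

    def backprop(d_outputs):
        # the last listener triggers the real backward pass of the shared
        # tok2vec/transformer model, using the sum of all collected gradients
        total = d_outputs
        for grads in accumulated:
            total = [t + g for t, g in zip(total, grads)]
        return backprop_embedding(total)

    for listener in listeners[:-1]:
        listener.receive(batch_id, outputs, accumulate_gradient)
    listeners[-1].receive(batch_id, outputs, backprop)
```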
### 3C. Frozen components
The listener pattern can get particularly tricky in combination with frozen components. To detect components
with listeners that are not frozen consistently, `init_nlp()` (which is called by `spacy train`) goes through
the listeners and their upstream components and warns in two scenarios.
#### The Tok2Vec or Transformer is frozen
If the `Tok2Vec` or `Transformer` was already trained,
e.g. by [pretraining](https://spacy.io/usage/embeddings-transformers#pretraining),
it could be a valid use-case to freeze the embedding architecture and only train downstream components such
as a tagger or a parser. This used to be impossible before spaCy 3.1, but is supported since then by putting the
embedding component in the [`annotating_components`](https://spacy.io/usage/training#annotating-components)
list of the config (see the example snippet below). This works like any other "annotating component" because it relies on the `Doc` attributes.
However, if the `Tok2Vec` or `Transformer` is frozen, and not present in `annotating_components`, and a related
listener isn't frozen, then a `W086` warning is shown and further training of the pipeline will likely end with `E954`.
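For reference, a training config that freezes a pretrained `tok2vec` but still lets it set annotations for its
listeners could contain the following (an illustrative snippet, not a complete config):
```
[training]
frozen_components = ["tok2vec"]
annotating_components = ["tok2vec"]
```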
#### The downstream component is frozen
If a downstream component that listens to the `Tok2Vec` or `Transformer` is frozen, but the `Tok2Vec` or
`Transformer` itself isn't, the performance of the frozen component will be degraded after training, because the
embeddings it relies on will have changed. In this case, a `W087` warning is shown, explaining
how to use the `replace_listeners` functionality to prevent this problem.
## 4. Replacing listener with standalone
The [`replace_listeners`](https://spacy.io/api/language#replace_listeners) functionality changes the architecture
of a downstream component from using a listener pattern to a standalone `tok2vec` or `transformer` layer,
effectively making the downstream component independent of any other components in the pipeline.
It is implemented by `nlp.replace_listeners()` and typically executed by `nlp.from_config()`.
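It can also be called directly, for example to make an `ner` component independent of a shared `tok2vec`
(assuming a pipeline that actually contains these two components):
```
import spacy

nlp = spacy.load("en_core_web_sm")  # any pipeline with a shared tok2vec and a listening ner
# Replace the ner component's listener (config path "model.tok2vec") with a
# standalone copy of the shared tok2vec layer.
nlp.replace_listeners("tok2vec", "ner", ["model.tok2vec"])
```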
Internally, it first fetches the `Model` of the component that creates the embeddings:
```
tok2vec = self.get_pipe(tok2vec_name)
tok2vec_model = tok2vec.model
```
This model is either a [`Tok2Vec` model](https://github.com/explosion/spaCy/blob/master/spacy/ml/models/tok2vec.py) or a
[`TransformerModel`](https://github.com/explosion/spacy-transformers/blob/master/spacy_transformers/layers/transformer_model.py).
In the case of the `tok2vec`, this model can be copied as-is into the configuration and architecture of the
downstream component. However, for the `transformer`, this doesn't work.
The reason is that the `TransformerListener` architecture chains the listener with
[`trfs2arrays`](https://github.com/explosion/spacy-transformers/blob/master/spacy_transformers/layers/trfs2arrays.py):
```
model = chain(
    TransformerListener(upstream_name=upstream),
    trfs2arrays(pooling, grad_factor),
)
```
but the standalone `Tok2VecTransformer` has an additional `split_trf_batch` chained in between the model
and `trfs2arrays`:
```
model = chain(
    TransformerModel(name, get_spans, tokenizer_config),
    split_trf_batch(),
    trfs2arrays(pooling, grad_factor),
)
```
So you can't just take the model from the listener and drop it into the component internally. You need to
adjust both the model and the config. To facilitate this, `nlp.replace_listeners()` will check whether additional
[functions](https://github.com/explosion/spacy-transformers/blob/master/spacy_transformers/layers/_util.py) are
[defined](https://github.com/explosion/spacy-transformers/blob/master/spacy_transformers/layers/transformer_model.py)
in `model.attrs`, and if so, it will essentially call these to make the appropriate changes:
```
replace_func = tok2vec_model.attrs["replace_listener_cfg"]
new_config = replace_func(tok2vec_cfg["model"], pipe_cfg["model"]["tok2vec"])
...
new_model = tok2vec_model.attrs["replace_listener"](new_model)
```
The new config and model are then properly stored on the `nlp` object.
Note that this functionality (running the replacement for a transformer listener) was broken prior to
`spacy-transformers` 1.0.5.