Update docs [ci skip]
Commit 37ebff6997 (parent 8685229891)

@@ -9,23 +9,23 @@ menu:
next: /usage/training
---

spaCy supports a number of **transfer and multi-task learning** workflows that
can often help improve your pipeline's efficiency or accuracy. Transfer learning
refers to techniques such as word vector tables and language model pretraining.
These techniques can be used to import knowledge from raw text into your
pipeline, so that your models are able to generalize better from your annotated
examples.

You can convert **word vectors** from popular tools like
[FastText](https://fasttext.cc) and [Gensim](https://radimrehurek.com/gensim),
or you can load in any pretrained **transformer model** if you install
[`spacy-transformers`](https://github.com/explosion/spacy-transformers). You can
also do your own language model pretraining via the
[`spacy pretrain`](/api/cli#pretrain) command. You can even **share** your
transformer or other contextual embedding model across multiple components,
which can make long pipelines several times more efficient. To use transfer
learning, you'll need at least a few annotated examples for what you're trying
to predict. Otherwise, you could try using a "one-shot learning" approach using
[vectors and similarity](/usage/linguistic-features#vectors-similarity).

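As a rough illustration of that similarity-based fallback, the snippet below
compares two texts using the word vectors of a medium English pipeline. It
assumes a vectors-enabled package such as `en_core_web_md` is installed; the
package name is only an example.

```python
import spacy

# Any pipeline that ships with word vectors will do; en_core_web_md is one option
nlp = spacy.load("en_core_web_md")

doc1 = nlp("I like salty fries and hamburgers.")
doc2 = nlp("Fast food tastes very good.")

# Cosine similarity of the averaged word vectors of the two documents
print(doc1.similarity(doc2))
```
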
<Accordion title="What’s the difference between word vectors and language models?" id="vectors-vs-language-models">

@@ -70,35 +70,13 @@ of performance.

## Shared embedding layers {#embedding-layers}

spaCy lets you share a single embedding layer and reuse it across multiple
components. This is also known as **multi-task learning**. Sharing weights
between components can make your pipeline run a lot faster and result in a much
smaller model size, as you only need a single copy of the embeddings. However,
it can make the pipeline less modular and make it more difficult to swap
components or retrain parts of the pipeline, since all components depend on the
same weights.

Multi-task learning is sometimes less consistent, and the results are generally
harder to reason about (as there's more going on). You'll usually want to
compare your accuracy against a single-task approach to understand whether the
weight-sharing is impacting your accuracy, and whether you can address the
problem by adjusting the hyper-parameters. We are not currently aware of any
foolproof recipe.

![Pipeline components using a shared embedding component vs. independent embedding layers](../images/tok2vec.svg)

@@ -107,10 +85,27 @@ on. <!-- TODO: Once rehearsal is tested, mention it here. -->

| ✅ **smaller:** models only need to include a single copy of the embeddings | ❌ **larger:** models need to include the embeddings for each component |
| ✅ **faster:** embed the documents once for your whole pipeline | ❌ **slower:** rerun the embedding for each component |
| ❌ **less composable:** all components require the same embedding component in the pipeline | ✅ **modular:** components can be moved and swapped freely |
| ?? **accuracy:** weight sharing may increase or decrease accuracy, depending on your task and data, but usually the impact is small |

You can share a single token-to-vector embedding model between multiple
components using the [`Tok2Vec`](/api/tok2vec) component. Other components in
your pipeline can "connect" to this component by including a **listener layer**
like [Tok2VecListener](/api/architectures#Tok2VecListener) within their model.

![Pipeline components listening to shared embedding component](../images/tok2vec-listener.svg)

At the beginning of training, the [`Tok2Vec`](/api/tok2vec) component will grab
a reference to the relevant listener layers in the rest of your pipeline. When
it processes a batch of documents, it will pass forward its predictions to the
listeners, allowing the listeners to **reuse the predictions** when they are
eventually called. A similar mechanism is used to pass gradients from the
listeners back to the model. The [`Transformer`](/api/transformer) component and
[TransformerListener](/api/architectures#TransformerListener) layer do the same
thing for transformer models, making it easy to share a single transformer model
across your whole pipeline.

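As a rough sketch of how these pieces fit together (assuming spaCy v3), here is
one way to wire up a shared `tok2vec` component and a tagger that listens to it
using [`Language.add_pipe`](/api/language#add_pipe) with config overrides. The
architecture names are real registered functions, but the exact values, such as
the width, are illustrative; in a real project this wiring would normally live
in the training config.

```python
import spacy

nlp = spacy.blank("en")

# The shared embedding component. It runs first in the pipeline and embeds each
# batch of documents once.
nlp.add_pipe("tok2vec")

# A tagger that doesn't embed tokens itself: its tok2vec sub-network is a
# listener that connects to the upstream "tok2vec" component.
tagger_model = {
    "@architectures": "spacy.Tagger.v1",
    "tok2vec": {
        "@architectures": "spacy.Tok2VecListener.v1",
        "width": 96,            # should match the shared tok2vec's output width
        "upstream": "tok2vec",  # name of the component to listen to
    },
}
nlp.add_pipe("tagger", config={"model": tagger_model})

print(nlp.pipe_names)  # ['tok2vec', 'tagger']
```

Note that the shared component has to come before its listeners in the pipeline,
so that its predictions are available when the downstream components run.
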
<!-- TODO: show example of implementation via config, side by side -->

<!-- TODO: Once rehearsal is tested, mention it here. -->

## Using transformer models {#transformers}

@@ -1550,7 +1550,7 @@ import Vectors101 from 'usage/101/\_vectors-similarity.md'

### Adding word vectors {#adding-vectors}

Custom word vectors can be trained using a number of open-source libraries, such
as [Gensim](https://radimrehurek.com/gensim), [FastText](https://fasttext.cc),
or Tomas Mikolov's original
[Word2vec implementation](https://code.google.com/archive/p/word2vec/). Most
word vector libraries output an easy-to-read text-based format, where each line
consists of the word followed by its vector.

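For example, here is a minimal sketch of producing a file in that format with
[Gensim](https://radimrehurek.com/gensim) (assuming Gensim 4.x; the toy corpus
and the file name are placeholders):

```python
from gensim.models import Word2Vec

# Toy corpus: in practice you'd stream tokenized sentences from your own raw text
sentences = [["dogs", "chase", "cats"], ["cats", "chase", "mice"]]

model = Word2Vec(sentences, vector_size=300, min_count=1)

# Write the vectors in the plain word2vec text format: a header line with the
# vocabulary size and dimensionality, then one "word value1 value2 ..." line per word
model.wv.save_word2vec_format("custom_vectors.txt", binary=False)
```

A file in this format can then be converted for use with a spaCy pipeline, for
example with spaCy v3's `init vectors` CLI command.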