mirror of
https://github.com/explosion/spaCy.git
synced 2025-01-24 16:24:16 +03:00
Docs for pretrain architectures (#6605)
* document pretraining architectures
* formatting
* bit more info
* small fixes
parent bf9096437e
commit 82ae95267a
@ -5,6 +5,7 @@ source: spacy/ml/models
menu:
  - ['Tok2Vec', 'tok2vec-arch']
  - ['Transformers', 'transformers']
  - ['Pretraining', 'pretrain']
  - ['Parser & NER', 'parser']
  - ['Tagging', 'tagger']
  - ['Text Classification', 'textcat']
@ -426,6 +427,71 @@ one component.
| `grad_factor` | Reweight gradients from the component before passing them upstream. You can set this to `0` to "freeze" the transformer weights with respect to the component, or use it to make some components more significant than others. Leaving it at `1.0` is usually fine. ~~float~~ |
| **CREATES** | The model using the architecture. ~~Model[List[Doc], List[Floats2d]]~~ |

## Pretraining architectures {#pretrain source="spacy/ml/models/multi_task.py"}

The `spacy pretrain` command lets you initialize a `Tok2Vec` layer in your
pipeline with information from raw text. To this end, additional layers are
added to build a network for a temporary task that forces the `Tok2Vec` layer to
learn something about sentence structure and word co-occurrence statistics. Two
pretraining objectives are available, both of which are variants of the cloze
task [Devlin et al. (2018)](https://arxiv.org/abs/1810.04805) introduced for
BERT.

For more information, see the section on
[pretraining](/usage/embeddings-transformers#pretraining).

### spacy.PretrainVectors.v1 {#pretrain_vectors}

> #### Example config
>
> ```ini
> [pretraining]
> component = "tok2vec"
> ...
>
> [pretraining.objective]
> @architectures = "spacy.PretrainVectors.v1"
> maxout_pieces = 3
> hidden_size = 300
> loss = "cosine"
> ```

Predict the word's vector from a static embeddings table as the pretraining
objective for a `Tok2Vec` layer.

| Name | Description |
| --------------- | --------------------------------------------------------------------------------------------------------------------------------------------------------- |
| `maxout_pieces` | The number of maxout pieces to use. Recommended values are `2` or `3`. ~~int~~ |
| `hidden_size` | Size of the hidden layer of the model. ~~int~~ |
| `loss` | The loss function can be either "cosine" or "L2". We typically recommend using "cosine". ~~str~~ |
| **CREATES** | A callable function that can create the Model, given the `vocab` of the pipeline and the `tok2vec` layer to pretrain. ~~Callable[[Vocab, Model], Model]~~ |

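To make the difference between the two `loss` options more concrete, here is a
minimal, illustrative sketch in plain Python. The function names `cosine_loss`
and `l2_loss` are hypothetical, and the formulas are the textbook definitions
(one minus cosine similarity, and squared Euclidean distance). This is an
assumption about what the two options mean in general, not a reproduction of
spaCy's implementation, which operates on batches of vectors.

```python
import math
from typing import Sequence


def cosine_loss(predicted: Sequence[float], target: Sequence[float]) -> float:
    """One minus cosine similarity: depends only on the angle between vectors.

    Hypothetical helper for illustration only; not spaCy's implementation.
    """
    dot = sum(p * t for p, t in zip(predicted, target))
    norm_p = math.sqrt(sum(p * p for p in predicted))
    norm_t = math.sqrt(sum(t * t for t in target))
    if norm_p == 0.0 or norm_t == 0.0:
        # Assumption: treat a zero vector as having no similarity at all.
        return 1.0
    return 1.0 - dot / (norm_p * norm_t)


def l2_loss(predicted: Sequence[float], target: Sequence[float]) -> float:
    """Squared Euclidean distance: sensitive to direction and magnitude."""
    return sum((p - t) ** 2 for p, t in zip(predicted, target))


target = [1.0, 2.0, 2.0]
scaled = [2.0, 4.0, 4.0]  # same direction as the target, twice the length

print(cosine_loss(scaled, target))
print(l2_loss(scaled, target))
```

Under these definitions, a prediction that points in the right direction but
has the wrong length is not penalized by the cosine loss, whereas the L2 loss
penalizes it.
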
### spacy.PretrainCharacters.v1 {#pretrain_chars}

> #### Example config
>
> ```ini
> [pretraining]
> component = "tok2vec"
> ...
>
> [pretraining.objective]
> @architectures = "spacy.PretrainCharacters.v1"
> maxout_pieces = 3
> hidden_size = 300
> n_characters = 4
> ```

Predict some number of leading and trailing UTF-8 bytes as the pretraining
objective for a `Tok2Vec` layer.

| Name | Description |
| --------------- | --------------------------------------------------------------------------------------------------------------------------------------------------------- |
| `maxout_pieces` | The number of maxout pieces to use. Recommended values are `2` or `3`. ~~int~~ |
| `hidden_size` | Size of the hidden layer of the model. ~~int~~ |
| `n_characters` | The window of characters. For example, if `n_characters = 2`, the model will try to predict the first two and last two characters of the word. ~~int~~ |
| **CREATES** | A callable function that can create the Model, given the `vocab` of the pipeline and the `tok2vec` layer to pretrain. ~~Callable[[Vocab, Model], Model]~~ |

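As a rough illustration of what the prediction targets for this objective look
like, the sketch below extracts the leading and trailing UTF-8 bytes of a word.
The helper `leading_trailing_bytes` is hypothetical and not part of spaCy, and
how spaCy treats words shorter than the window is not covered here.

```python
from typing import Tuple


def leading_trailing_bytes(word: str, n_characters: int) -> Tuple[bytes, bytes]:
    """Return the first and last `n_characters` UTF-8 bytes of `word`.

    Hypothetical helper for illustration only; not spaCy's implementation.
    """
    if n_characters < 1:
        raise ValueError("n_characters must be at least 1")
    encoded = word.encode("utf8")
    return encoded[:n_characters], encoded[-n_characters:]


# ASCII word: every character is a single byte.
print(leading_trailing_bytes("pretraining", 2))

# Non-ASCII word: "é" takes two bytes in UTF-8, so the trailing
# two bytes cover only that one character.
print(leading_trailing_bytes("café", 2))
```

The second call shows why the distinction between bytes and characters matters:
for non-ASCII text, a window of two bytes can correspond to a single character.
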
## Parser & NER architectures {#parser}

### spacy.TransitionBasedParser.v2 {#TransitionBasedParser source="spacy/ml/models/parser.py"}
@ -713,34 +713,39 @@ layer = "tok2vec"

#### Pretraining objectives {#pretraining-details}

Two pretraining objectives are available, both of which are variants of the
cloze task [Devlin et al. (2018)](https://arxiv.org/abs/1810.04805) introduced
for BERT. The objective can be defined and configured via the
`[pretraining.objective]` config block.

> ```ini
> ### Characters objective
> [pretraining.objective]
> type = "characters"
> @architectures = "spacy.PretrainCharacters.v1"
> maxout_pieces = 3
> hidden_size = 300
> n_characters = 4
> ```
>
> ```ini
> ### Vectors objective
> [pretraining.objective]
> type = "vectors"
> @architectures = "spacy.PretrainVectors.v1"
> maxout_pieces = 3
> hidden_size = 300
> loss = "cosine"
> ```

- **Characters:** The `"characters"` objective asks the model to predict some
  number of leading and trailing UTF-8 bytes for the words. For instance,
  with `n_characters = 2`, the model will try to predict the first two and
  last two characters of the word.
- **Vectors:** The `"vectors"` objective asks the model to predict the word's
  vector, from a static embeddings table. This requires a word vectors model to
  be trained and loaded. The vectors objective can optimize either a cosine or
  an L2 loss. We've generally found cosine loss to perform better.
- [`PretrainCharacters`](/api/architectures#pretrain_chars): The `"characters"`
  objective asks the model to predict some number of leading and trailing UTF-8
  bytes for the words. For instance, with `n_characters = 2`, the model will
  try to predict the first two and last two characters of the word.

- [`PretrainVectors`](/api/architectures#pretrain_vectors): The `"vectors"`
  objective asks the model to predict the word's vector, from a static
  embeddings table. This requires a word vectors model to be trained and loaded.
  The vectors objective can optimize either a cosine or an L2 loss. We've
  generally found cosine loss to perform better.

These pretraining objectives use a trick that we term **language modelling with
approximate outputs (LMAO)**. The motivation for the trick is that predicting an