Merge pull request #6646 from svlandeg/feature/cli-docs [ci skip]

Ines Montani 2021-01-05 13:52:49 +11:00 committed by GitHub
commit 3614472e29
2 changed files with 20 additions and 7 deletions

website/docs/api/cli.md

@@ -790,6 +790,12 @@ in the section `[paths]`.
 </Infobox>
 
+> #### Example
+>
+> ```cli
+> $ python -m spacy train config.cfg --output ./output --paths.train ./train --paths.dev ./dev
+> ```
+
 ```cli
 $ python -m spacy train [config_path] [--output] [--code] [--verbose] [--gpu-id] [overrides]
 ```
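The `--paths.train` and `--paths.dev` overrides in the added example fill in variables from the config's `[paths]` section that the hunk context mentions. A minimal sketch of such a block, assuming the usual convention of leaving the values `null` so they must be supplied on the command line:

```ini
[paths]
# Left as null here; filled in at runtime by CLI overrides,
# e.g. --paths.train ./train --paths.dev ./dev
train = null
dev = null
```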
@@ -808,15 +814,16 @@ $ python -m spacy train [config_path] [--output] [--code] [--verbose] [--gpu-id]
 
 ## pretrain {#pretrain new="2.1" tag="command,experimental"}
 
 Pretrain the "token to vector" ([`Tok2vec`](/api/tok2vec)) layer of pipeline
-components on [raw text](/api/data-formats#pretrain), using an approximate
-language-modeling objective. Specifically, we load pretrained vectors, and train
-a component like a CNN, BiLSTM, etc to predict vectors which match the
-pretrained ones. The weights are saved to a directory after each epoch. You can
-then include a **path to one of these pretrained weights files** in your
+components on raw text, using an approximate language-modeling objective.
+Specifically, we load pretrained vectors, and train a component like a CNN,
+BiLSTM, etc to predict vectors which match the pretrained ones. The weights are
+saved to a directory after each epoch. You can then include a **path to one of
+these pretrained weights files** in your
 [training config](/usage/training#config) as the `init_tok2vec` setting when you
 train your pipeline. This technique may be especially helpful if you have little
 labelled data. See the usage docs on
-[pretraining](/usage/embeddings-transformers#pretraining) for more info.
+[pretraining](/usage/embeddings-transformers#pretraining) for more info. To read
+the raw text, a [`JsonlCorpus`](/api/top-level#jsonlcorpus) is typically used.
 
 <Infobox title="Changed in v3.0" variant="warning">
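For context on the `init_tok2vec` setting the paragraph refers to: a hedged sketch of how one of the per-epoch weights files might be wired into the training config (the file name `model999.bin` is only an illustration, not something this commit specifies):

```ini
[paths]
init_tok2vec = "pretrain_output/model999.bin"

[initialize]
# Load the pretrained Tok2vec weights before training starts
init_tok2vec = ${paths.init_tok2vec}
```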
@@ -830,6 +837,12 @@ auto-generated by setting `--pretraining` on
 </Infobox>
 
+> #### Example
+>
+> ```cli
+> $ python -m spacy pretrain config.cfg ./output_pretrain --paths.raw_text ./data.jsonl
+> ```
+
 ```cli
 $ python -m spacy pretrain [config_path] [output_dir] [--code] [--resume-path] [--epoch-resume] [--gpu-id] [overrides]
 ```
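The `data.jsonl` file passed via `--paths.raw_text` in the added example is newline-delimited JSON, the format `JsonlCorpus` reads. A minimal sketch of such a file, with one object containing a `"text"` key per line (sentences invented for illustration):

```json
{"text": "Can I ask where you work now and what you do, and if you enjoy it?"}
{"text": "They may be under the wrong impression entirely."}
```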

website/docs/api/data-formats.md

@@ -148,7 +148,7 @@ This section defines a **dictionary** mapping of string keys to functions. Each
 function takes an `nlp` object and yields [`Example`](/api/example) objects. By
 default, the two keys `train` and `dev` are specified and each refer to a
 [`Corpus`](/api/top-level#Corpus). When pretraining, an additional `pretrain`
-section is added that defaults to a [`JsonlCorpus`](/api/top-level#JsonlCorpus).
+section is added that defaults to a [`JsonlCorpus`](/api/top-level#jsonlcorpus).
 You can also register custom functions that return a callable.
 
 | Name | Description |
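To make the `[corpora]` mapping concrete, here is a sketch of the three readers the paragraph describes, assuming the paths are resolved from the config's `[paths]` section:

```ini
[corpora.train]
@readers = "spacy.Corpus.v1"
path = ${paths.train}

[corpora.dev]
@readers = "spacy.Corpus.v1"
path = ${paths.dev}

# Only present when pretraining: reads raw text, one JSON object per line
[corpora.pretrain]
@readers = "spacy.JsonlCorpus.v1"
path = ${paths.raw_text}
```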