mirror of https://github.com/explosion/spaCy.git, synced 2025-11-04 09:57:26 +03:00

	Merge pull request #6646 from svlandeg/feature/cli-docs [ci skip]
commit 3614472e29

@@ -790,6 +790,12 @@ in the section `[paths]`.
 
 </Infobox>
 
+> #### Example
+>
+> ```cli
+> $ python -m spacy train config.cfg --output ./output --paths.train ./train --paths.dev ./dev
+> ```
+
 ```cli
 $ python -m spacy train [config_path] [--output] [--code] [--verbose] [--gpu-id] [overrides]
 ```
@@ -808,15 +814,16 @@ $ python -m spacy train [config_path] [--output] [--code] [--verbose] [--gpu-id]
 ## pretrain {#pretrain new="2.1" tag="command,experimental"}
 
 Pretrain the "token to vector" ([`Tok2vec`](/api/tok2vec)) layer of pipeline
-components on [raw text](/api/data-formats#pretrain), using an approximate
-language-modeling objective. Specifically, we load pretrained vectors, and train
-a component like a CNN, BiLSTM, etc to predict vectors which match the
-pretrained ones. The weights are saved to a directory after each epoch. You can
-then include a **path to one of these pretrained weights files** in your
+components on raw text, using an approximate language-modeling objective.
+Specifically, we load pretrained vectors, and train a component like a CNN,
+BiLSTM, etc to predict vectors which match the pretrained ones. The weights are
+saved to a directory after each epoch. You can then include a **path to one of
+these pretrained weights files** in your
 [training config](/usage/training#config) as the `init_tok2vec` setting when you
 train your pipeline. This technique may be especially helpful if you have little
 labelled data. See the usage docs on
-[pretraining](/usage/embeddings-transformers#pretraining) for more info.
+[pretraining](/usage/embeddings-transformers#pretraining) for more info. To read
+the raw text, a [`JsonlCorpus`](/api/top-level#jsonlcorpus) is typically used.
 
 <Infobox title="Changed in v3.0" variant="warning">
 
@@ -830,6 +837,12 @@ auto-generated by setting `--pretraining` on
 
 </Infobox>
 
+> #### Example
+>
+> ```cli
+> $ python -m spacy pretrain config.cfg ./output_pretrain --paths.raw_text ./data.jsonl
+> ```
+
 ```cli
 $ python -m spacy pretrain [config_path] [output_dir] [--code] [--resume-path] [--epoch-resume] [--gpu-id] [overrides]
 ```
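
Note on the added `pretrain` example: it passes `./data.jsonl` as `--paths.raw_text`. As a minimal sketch of how such a file might be produced, assuming the raw text is stored as one JSON object per line with a `"text"` key (the helper names `write_raw_text` and `read_raw_text` are hypothetical and not part of spaCy):

```python
import json
from pathlib import Path


def write_raw_text(texts, path):
    """Write one JSON object per line, each holding a raw text under "text".

    Assumption: a JSONL file with a "text" key per line is what the
    pretraining corpus reader expects; check the data-formats docs.
    """
    path = Path(path)
    with path.open("w", encoding="utf8") as f:
        for text in texts:
            # ensure_ascii=False keeps non-ASCII characters readable in the file
            f.write(json.dumps({"text": text}, ensure_ascii=False) + "\n")
    return path


def read_raw_text(path):
    """Read the texts back, skipping empty lines."""
    with Path(path).open(encoding="utf8") as f:
        return [json.loads(line)["text"] for line in f if line.strip()]
```

Calling `write_raw_text(["Some raw text.", "More raw text."], "data.jsonl")` would create a file that could then be passed via `--paths.raw_text ./data.jsonl`, provided the config reads its raw text from that path.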

@@ -148,7 +148,7 @@ This section defines a **dictionary** mapping of string keys to functions. Each
 function takes an `nlp` object and yields [`Example`](/api/example) objects. By
 default, the two keys `train` and `dev` are specified and each refer to a
 [`Corpus`](/api/top-level#Corpus). When pretraining, an additional `pretrain`
-section is added that defaults to a [`JsonlCorpus`](/api/top-level#JsonlCorpus).
+section is added that defaults to a [`JsonlCorpus`](/api/top-level#jsonlcorpus).
 You can also register custom functions that return a callable.
 
 | Name       | Description                                                                                                                                                                 |
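
Note on the section touched by the last hunk: it describes a dictionary that maps string keys to functions, each taking an `nlp` object and yielding examples, and mentions custom functions that return a callable. The shape of that pattern can be sketched in plain Python. Everything here is a stand-in for illustration: `make_reader` is a hypothetical factory, `str.split` plays the role of `nlp`, and tuples play the role of `Example` objects; spaCy's real registry and classes are not used.

```python
from typing import Callable, Dict, Iterable, Iterator, List, Tuple

# Stand-in for an Example: (tokens, annotations)
Pair = Tuple[List[str], dict]


def make_reader(records: Iterable[Tuple[str, dict]]) -> Callable[[Callable], Iterator[Pair]]:
    """Hypothetical factory: returns a callable that takes `nlp` and yields examples."""
    records = list(records)

    def read(nlp: Callable[[str], List[str]]) -> Iterator[Pair]:
        for text, annotations in records:
            # The reader uses the passed-in `nlp` to process each raw text
            yield nlp(text), annotations

    return read


# Dictionary mapping string keys to reader functions, mirroring `train` and `dev`
corpora: Dict[str, Callable] = {
    "train": make_reader([("I like cats", {"label": "POSITIVE"})]),
    "dev": make_reader([("I dislike rain", {"label": "NEGATIVE"})]),
}

# `str.split` is only a stand-in for a real `nlp` object
train_examples = list(corpora["train"](str.split))
```

The factory-returns-callable design lets settings such as the data source be fixed when the reader is created, while the `nlp` object is supplied later, when the data is actually read.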