mirror of https://github.com/explosion/spaCy.git (synced 2025-10-31 07:57:35 +03:00)

	Improve init-model docs (see #4137)
parent 198b7e9789
commit 25c2b4b9a5
				|  | @ -584,8 +584,8 @@ data. | |||
| ```python | ||||
| ### Entry structure | ||||
| { | ||||
|     "orth": string, | ||||
|     "id": int, | ||||
|     "orth": string,     # the word text | ||||
|     "id": int,          # can correspond to row in vectors table | ||||
|     "lower": string, | ||||
|     "norm": string, | ||||
|     "shape": string | ||||
|  |  | |||
|  | @ -347,14 +347,17 @@ tokenization can be provided. | |||
| 
 | ||||
| Create a new model directory from raw data, like word frequencies, Brown | ||||
| clusters and word vectors. This command is similar to the `spacy model` command | ||||
| in v1.x. | ||||
| in v1.x. Note that in order to populate the model's vocab, you need to pass in a | ||||
| JSONL-formatted [vocabulary file](/api/annotation#vocab-jsonl) as | ||||
| `--jsonl-loc` with optional `id` values that correspond to the vectors table. | ||||
| Just loading in vectors will not automatically populate the vocab. | ||||
| 
 | ||||
| <Infobox title="Deprecation note" variant="warning"> | ||||
| 
 | ||||
| As of v2.1.0, the `--freqs-loc` and `--clusters-loc` arguments are deprecated and have | ||||
| been replaced with the `--jsonl-loc` argument, which lets you pass in a | ||||
| [newline-delimited JSON](http://jsonlines.org/) (JSONL) file containing one | ||||
| lexical entry per line. For more details on the format, see the | ||||
| [JSONL](http://jsonlines.org/) file containing one lexical entry per line. For | ||||
| more details on the format, see the | ||||
| [annotation specs](/api/annotation#vocab-jsonl). | ||||
| 
 | ||||
| </Infobox> | ||||
|  | @ -368,7 +371,7 @@ $ python -m spacy init-model [lang] [output_dir] [--jsonl-loc] [--vectors-loc] | |||
| | ----------------------- | ---------- | ----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------- | | ||||
| | `lang`                  | positional | Model language [ISO code](https://en.wikipedia.org/wiki/List_of_ISO_639-1_codes), e.g. `en`.                                                                                                                                                                      | | ||||
| | `output_dir`            | positional | Model output directory. Will be created if it doesn't exist.                                                                                                                                                                                                      | | ||||
| | `--jsonl-loc`, `-j`     | option     | Optional location of JSONL-formatted vocabulary file with lexical attributes.                                                                                                                                                                                     | | ||||
| | `--jsonl-loc`, `-j`     | option     | Optional location of JSONL-formatted [vocabulary file](/api/annotation#vocab-jsonl) with lexical attributes.                                                                                                                                                      | | ||||
| | `--vectors-loc`, `-v`   | option     | Optional location of vectors file. Should be a tab-separated file in Word2Vec format where the first column contains the word and the remaining columns the values. File can be provided in `.txt` format or as a zipped text file in `.zip` or `.tar.gz` format. | | ||||
| | `--prune-vectors`, `-V` | flag       | Number of vectors to prune the vocabulary to. Defaults to `-1` for no pruning.                                                                                                                                                                                    | | ||||
| | **CREATES**             | model      | A spaCy model containing the vocab and vectors.                                                                                                                                                                                                                   | | ||||
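
The JSONL vocabulary file described above could be generated along the following lines. This is a minimal sketch, not spaCy's own tooling: the helper names `simple_shape`, `make_entries` and `write_vocab_jsonl` are hypothetical, the shape function is a rough stand-in for a real word-shape feature, and the `id` values are assumed to be the row numbers of a vectors table listing the same words in the same order.

```python
import json
from pathlib import Path


def simple_shape(text):
    """Rough stand-in for a word shape: letters become x/X, digits become d."""
    out = []
    for ch in text:
        if ch.isalpha():
            out.append("X" if ch.isupper() else "x")
        elif ch.isdigit():
            out.append("d")
        else:
            out.append(ch)
    return "".join(out)


def make_entries(words):
    """Build one lexical entry per word, using the fields from the entry structure."""
    entries = []
    for row, word in enumerate(words):
        entries.append(
            {
                "orth": word,            # the word text
                "id": row,               # assumed: row of this word in the vectors table
                "lower": word.lower(),
                "norm": word.lower(),    # assumption: no custom normalization
                "shape": simple_shape(word),
            }
        )
    return entries


def write_vocab_jsonl(entries, path):
    """Write the entries as newline-delimited JSON, one lexical entry per line."""
    with Path(path).open("w", encoding="utf8") as f:
        for entry in entries:
            f.write(json.dumps(entry, ensure_ascii=False) + "\n")


entries = make_entries(["Apple", "the", "2019"])
write_vocab_jsonl(entries, "vocab.jsonl")
```

The resulting `vocab.jsonl` could then be passed to the command via `--jsonl-loc`, for example `python -m spacy init-model en ./output --jsonl-loc vocab.jsonl`, where `./output` is a placeholder output directory.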
|  |  | |||