Mirror of https://github.com/explosion/spaCy.git, synced 2025-01-27 01:34:30 +03:00
Improve init-model docs (see #4137)
parent 198b7e9789
commit 25c2b4b9a5
@@ -584,8 +584,8 @@ data.
 ```python
 ### Entry structure
 {
-"orth": string,
-"id": int,
+"orth": string, # the word text
+"id": int, # can correspond to row in vectors table
 "lower": string,
 "norm": string,
 "shape": string
@@ -347,14 +347,17 @@ tokenization can be provided.
 
 Create a new model directory from raw data, like word frequencies, Brown
 clusters and word vectors. This command is similar to the `spacy model` command
-in v1.x.
+in v1.x. Note that in order to populate the model's vocab, you need to pass in a
+JSONL-formatted [vocabulary file](<(/api/annotation#vocab-jsonl)>) as
+`--jsonl-loc` with optional `id` values that correspond to the vectors table.
+Just loading in vectors will not automatically populate the vocab.
 
 <Infobox title="Deprecation note" variant="warning">
 
 As of v2.1.0, the `--freqs-loc` and `--clusters-loc` are deprecated and have
 been replaced with the `--jsonl-loc` argument, which lets you pass in a a
-[newline-delimited JSON](http://jsonlines.org/) (JSONL) file containing one
-lexical entry per line. For more details on the format, see the
+[JSONL](http://jsonlines.org/) file containing one lexical entry per line. For
+more details on the format, see the
 [annotation specs](/api/annotation#vocab-jsonl).
 
 </Infobox>
@@ -368,7 +371,7 @@ $ python -m spacy init-model [lang] [output_dir] [--jsonl-loc] [--vectors-loc]
 | ----------------------- | ---------- | ----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------- |
 | `lang` | positional | Model language [ISO code](https://en.wikipedia.org/wiki/List_of_ISO_639-1_codes), e.g. `en`. |
 | `output_dir` | positional | Model output directory. Will be created if it doesn't exist. |
-| `--jsonl-loc`, `-j` | option | Optional location of JSONL-formatted vocabulary file with lexical attributes. |
+| `--jsonl-loc`, `-j` | option | Optional location of JSONL-formatted [vocabulary file](/api/annotation#vocab-jsonl) with lexical attributes. |
 | `--vectors-loc`, `-v` | option | Optional location of vectors file. Should be a tab-separated file in Word2Vec format where the first column contains the word and the remaining columns the values. File can be provided in `.txt` format or as a zipped text file in `.zip` or `.tar.gz` format. |
 | `--prune-vectors`, `-V` | flag | Number of vectors to prune the vocabulary to. Defaults to `-1` for no pruning. |
 | **CREATES** | model | A spaCy model containing the vocab and vectors. |
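The docs changed in this commit describe a JSONL vocabulary file with one lexical entry per line, where `id` can correspond to a row in the vectors table. As a rough illustration, the sketch below writes such a file using only the five fields visible in the "Entry structure" hunk (`orth`, `id`, `lower`, `norm`, `shape`). The helper names `word_shape`, `make_entry` and `write_vocab_jsonl` are hypothetical and not part of spaCy, the shape function is a simplified stand-in rather than spaCy's own, and the hunk is truncated, so the full format in the annotation specs may include further fields.

```python
import json
from pathlib import Path


def word_shape(text):
    """Simplified shape feature (an assumption, not spaCy's implementation):
    uppercase letters -> X, lowercase letters -> x, digits -> d."""
    chars = []
    for ch in text:
        if ch.isalpha():
            chars.append("X" if ch.isupper() else "x")
        elif ch.isdigit():
            chars.append("d")
        else:
            chars.append(ch)
    return "".join(chars)


def make_entry(word, row):
    """Build one lexical entry with the fields shown in the diff above."""
    return {
        "orth": word,            # the word text
        "id": row,               # can correspond to a row in the vectors table
        "lower": word.lower(),
        "norm": word.lower(),    # assumption: no custom normalization
        "shape": word_shape(word),
    }


def write_vocab_jsonl(words, path):
    """Write one JSON object per line; the row index is used as the id."""
    with Path(path).open("w", encoding="utf8") as f:
        for row, word in enumerate(words):
            f.write(json.dumps(make_entry(word, row)) + "\n")
```

Calling `write_vocab_jsonl(["the", "Apple", "2019"], "vocab.jsonl")` produces a three-line file. Following the command signature in the last hunk, that file could then be passed as `python -m spacy init-model en ./output_dir --jsonl-loc vocab.jsonl --vectors-loc vectors.txt`, where the output directory and file names are placeholders.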