spaCy/spacy/cli
Adriane Boyd eed4b785f5 Load vocab lookups tables at beginning of training
Similar to how vectors are handled, move the vocab lookups to be loaded
at the start of training rather than when the vocab is initialized,
since the vocab doesn't have access to the full config when it's
created.

The option moves from `nlp.load_vocab_data` to `training.lookups`.

Typically these tables will come from `spacy-lookups-data`, but any
`Lookups` object can be provided.

The loading from `spacy-lookups-data` is now strict, so configs for each
language should specify the exact tables required. This also makes it
easier to control whether the larger clusters and probs tables are
included.
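
As a rough illustration of what "strict" loading implies, here is a minimal sketch in plain Python. It is not spaCy's implementation: `AVAILABLE_TABLES` and `load_tables_strict` are hypothetical names and the table contents are made up for the example. The assumed behaviour is that only the requested tables are returned and that a requested table which is not available raises an error instead of being skipped silently.

```python
from typing import Dict, Iterable

# Hypothetical stand-in for the tables a lookups package ships for one language.
# The entries are illustrative only, not real data.
AVAILABLE_TABLES: Dict[str, Dict[str, str]] = {
    "lexeme_norm": {"gonna": "going to"},
    "lexeme_prob": {},
}


def load_tables_strict(
    available: Dict[str, Dict[str, str]], required: Iterable[str]
) -> Dict[str, Dict[str, str]]:
    """Return exactly the required tables, failing if any of them is missing."""
    required = list(required)
    missing = [name for name in required if name not in available]
    if missing:
        # Strict mode: a missing table is an error, not a silent no-op.
        raise ValueError(f"Missing required lookups tables: {missing}")
    # Tables that were not requested (e.g. the larger ones) are left out.
    return {name: available[name] for name in required}
```

Under these assumptions, requesting only `["lexeme_norm"]` leaves the other tables unloaded, which is how listing tables explicitly gives control over what is included.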

To load `lexeme_norm` from `spacy-lookups-data`:

```
[training.lookups]
@misc = "spacy.LoadLookupsData.v1"
lang = ${nlp.lang}
tables = ["lexeme_norm"]
```
2020-09-18 15:59:16 +02:00
| Name | Last commit | Date |
| --- | --- | --- |
| project | Fix sparse checkout and error handling | 2020-09-14 14:12:58 +02:00 |
| templates | generalize corpora, dot notation for dev and train corpus | 2020-09-17 11:38:59 +02:00 |
| __init__.py | "model" terminology consistency in docs | 2020-09-03 13:13:03 +02:00 |
| _util.py | Fix sparse checkout and error handling | 2020-09-14 14:12:58 +02:00 |
| convert.py | Renaming gold & annotation_setter (#6042) | 2020-09-09 10:31:03 +02:00 |
| debug_config.py | Update docs links in codebase | 2020-09-04 12:58:50 +02:00 |
| debug_data.py | Renaming gold & annotation_setter (#6042) | 2020-09-09 10:31:03 +02:00 |
| debug_model.py | Tidy up and auto-format [ci skip] | 2020-09-13 10:55:36 +02:00 |
| download.py | Update docs links in codebase | 2020-09-04 12:58:50 +02:00 |
| evaluate.py | Renaming gold & annotation_setter (#6042) | 2020-09-09 10:31:03 +02:00 |
| info.py | Update docs links in codebase | 2020-09-04 12:58:50 +02:00 |
| init_config.py | Use consistent shortcut | 2020-09-17 16:57:02 +02:00 |
| init_model.py | Tidy up and auto-format [ci skip] | 2020-09-13 10:55:36 +02:00 |
| package.py | Support overwriting name on spacy package | 2020-09-11 11:38:28 +02:00 |
| pretrain.py | cleanup and formatting | 2020-09-17 11:48:04 +02:00 |
| profile.py | Update docs links in codebase | 2020-09-04 12:58:50 +02:00 |
| train.py | Load vocab lookups tables at beginning of training | 2020-09-18 15:59:16 +02:00 |
| validate.py | Update docs links in codebase | 2020-09-04 12:58:50 +02:00 |