spaCy/spacy
Adriane Boyd 4192e71599
Sync vocab in vectors and components sourced in configs (#9335)
Since a component may reference anything in the vocab, share the full
vocab when loading source components and vectors (which will include
`strings` as of #8909).

When loading a source component from a config, save and restore the
vocab state after loading source pipelines, in particular to preserve
the original state without vectors, since `[initialize.vectors]
= null` skips rather than resets the vectors.

The vocab references are not synced for components loaded with
`Language.add_pipe(source=)` because the pipelines are already loaded
and not necessarily with the same vocab. A warning could be added in
`Language.create_pipe_from_source` that it may be necessary to save and
reload before training, but it's a rare enough case that this kind of
warning may be too noisy overall.
2021-10-04 12:19:02 +02:00
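Here is a minimal sketch of the config case the commit message describes. It assumes a spaCy v3 training config in the same `.cfg` format as `default_config.cfg` in this directory. The `ner` component is sourced from an existing pipeline, and the `[initialize]` block sets `vectors = null`. The package name `en_core_web_sm` is only an illustrative assumption. Per the commit message, `vectors = null` skips vector initialization rather than resetting the vectors. That is why the vocab state is saved before loading the source pipelines and restored afterwards.

```ini
# Sketch only: a spaCy v3 training config that sources a trained
# component from an existing pipeline. "en_core_web_sm" is an
# illustrative package name, not something this commit requires.
[nlp]
lang = "en"
pipeline = ["ner"]

[components]

[components.ner]
# Reuse the trained "ner" component from the source pipeline. Per the
# commit message, source components are loaded sharing the full vocab
# (including strings) so anything the component references is present.
source = "en_core_web_sm"

[initialize]
# null skips vector initialization; it does not reset vectors that are
# already in the vocab, hence the save/restore of the vocab state
# around loading the source pipeline.
vectors = null
```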
cli avoid crash when unicode in title (#9254) 2021-09-22 21:01:34 +02:00
displacy Adjust kb_id visualizer templating and docs 2021-09-23 11:59:02 +02:00
lang Fix verbs list in lang/fr/tokenizer_exceptions.py (#9033) 2021-08-25 15:55:09 +02:00
matcher Pass alignments to Matcher callbacks (#9001) 2021-09-02 12:58:05 +02:00
ml Correct parser.py use_upper param info (#9180) 2021-09-10 16:19:58 +02:00
pipeline Auto-format code with black (#9065) 2021-08-27 11:42:27 +02:00
tests Sync thinc install dep in setup, fix test packaging (#9336) 2021-09-30 19:02:10 +02:00
tokens Auto-format code with black (#9346) 2021-10-01 11:17:11 +02:00
training Sync vocab in vectors and components sourced in configs (#9335) 2021-10-04 12:19:02 +02:00
__init__.pxd
__init__.py Tidy up and auto-format 2021-07-18 15:44:56 +10:00
__main__.py Tidy up 2020-06-22 00:45:40 +02:00
about.py Prepare for v3.1.3 (#9200) 2021-09-14 11:03:51 +02:00
attrs.pxd Merge branch 'develop' into master-tmp 2020-05-21 18:39:06 +02:00
attrs.pyx Remove unsupported attrs from attrs.IDS (#8132) 2021-06-02 19:16:57 +10:00
compat.py Auto-detect package dependencies in spacy package (#8948) 2021-08-17 14:05:13 +02:00
default_config_pretraining.cfg pretrain architectures (#6451) 2020-12-08 14:41:03 +08:00
default_config.cfg Add training option to set annotations on update (#7767) 2021-04-26 16:53:53 +02:00
errors.py Raise E983 early on in docbin init (#9247) 2021-09-27 20:43:03 +02:00
glossary.py Add glossary entry for _SP (#8983) 2021-08-20 12:04:02 +02:00
kb.pxd Replace cpdef variables with cdef (#7834) 2021-04-26 16:54:02 +02:00
kb.pyx KB & NEL to/from bytes (#8113) 2021-05-20 18:11:30 +10:00
language.py Sync vocab in vectors and components sourced in configs (#9335) 2021-10-04 12:19:02 +02:00
lexeme.pxd Fix Lexeme.from_ptr 2020-08-10 16:43:37 +02:00
lexeme.pyi Add stub files for main cython classes (#8427) 2021-08-07 12:30:03 +02:00
lexeme.pyx fix 's typo's across code base (#8384) 2021-06-15 10:57:08 +02:00
lookups.py Tidy up code 2021-06-28 12:08:15 +02:00
morphology.pxd Clean up Morphology imports and definitions (#7441) 2021-04-26 16:54:23 +02:00
morphology.pyx Clean up Morphology imports and definitions (#7441) 2021-04-26 16:54:23 +02:00
parts_of_speech.pxd Add support for Universal Dependencies v2.0 2017-03-03 13:17:34 +01:00
parts_of_speech.pyx Drop Python 2.7 and 3.5 (#4828) 2019-12-22 01:53:56 +01:00
pipe_analysis.py Tidy up and auto-format 2020-09-29 21:39:28 +02:00
py.typed Add py.typed 2021-03-16 09:48:31 +01:00
schemas.py Support list values and INTERSECTS in Matcher (#8784) 2021-08-02 19:39:26 +02:00
scorer.py Tidy up code 2021-06-28 12:08:15 +02:00
strings.pxd Remove 'cleanup' of strings (#6007) 2020-09-01 16:12:15 +02:00
strings.pyi Add stub files for main cython classes (#8427) 2021-08-07 12:30:03 +02:00
strings.pyx Make vocab update in get_docs deterministic (#7603) 2021-04-09 11:53:13 +02:00
structs.pxd Add SpanGroup and Graph container types to represent arbitrary annotations (#6696) 2021-01-14 17:30:41 +11:00
symbols.pxd introduce token.has_head and refer to MISSING_DEP_ (WIP) 2021-01-12 17:17:06 +01:00
symbols.pyx introduce token.has_head and refer to MISSING_DEP_ (WIP) 2021-01-12 17:17:06 +01:00
tokenizer.pxd Replace cpdef variables with cdef (#7834) 2021-04-26 16:54:02 +02:00
tokenizer.pyx Pass excludes when serializing vocab (#8824) 2021-08-03 14:42:44 +02:00
typedefs.pxd Merge remote-tracking branch 'upstream/master' into chore/update-develop-from-master 2020-11-25 11:49:34 +01:00
typedefs.pyx Tidy up rest 2017-10-27 21:07:59 +02:00
util.py Handle spacy-legacy in package CLI for dependencies (#9163) 2021-09-08 11:46:40 +02:00
vectors.pyx Fix vectors data on GPU (#7626) 2021-04-19 18:30:03 +10:00
vocab.pxd Replace cpdef variables with cdef (#7834) 2021-04-26 16:54:02 +02:00
vocab.pyi Add stub files for main cython classes (#8427) 2021-08-07 12:30:03 +02:00
vocab.pyx Skip vector ngram backoff if minn is not set (#7925) 2021-05-06 18:34:35 +10:00