spaCy/spacy
Adriane Boyd 7abfa25035
Don't use the same vocab for source models (#8388)
* Don't use the same vocab for source models

The source models should not be loaded with the vocab from the current
pipeline because this loads the vectors from the source model into the
current vocab.

The strings are all copied in `Language.create_pipe_from_source`, so if
the vectors are configured correctly in the current pipeline, the
sourced component will work as expected. If there is a vector mismatch,
a warning is shown. (It's not possible to inspect whether the vectors
are actually used by the component, so a warning is the best option.)

* Update comment on source model loading
2021-06-21 09:33:33 +02:00
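
The behaviour described in the commit message above can be sketched with a toy model. This is not spaCy code: `ToyVocab`, `ToyPipeline` and `source_component` are hypothetical names invented for illustration, and the real logic lives in `Language.create_pipe_from_source`. The sketch only assumes what the message states: the source model keeps its own vocab, the strings are copied into the current pipeline, the vectors are not, and a warning is shown when the vectors differ.

```python
import warnings
from dataclasses import dataclass, field


@dataclass
class ToyVocab:
    """Hypothetical stand-in for a vocab: a string store plus a vectors table."""
    strings: set = field(default_factory=set)
    vectors: dict = field(default_factory=dict)


@dataclass
class ToyPipeline:
    """Hypothetical stand-in for a pipeline that owns a vocab and named components."""
    vocab: ToyVocab = field(default_factory=ToyVocab)
    components: dict = field(default_factory=dict)


def source_component(current, source, name):
    """Copy one component from `source` into `current` without sharing vocabs."""
    # A mismatch can only be reported: from the outside there is no way to
    # tell whether the component actually uses the vectors.
    if current.vocab.vectors != source.vocab.vectors:
        warnings.warn(f"Vectors differ for sourced component '{name}'")
    # Copy the strings only; the source vectors stay in the source vocab.
    current.vocab.strings |= source.vocab.strings
    current.components[name] = source.components[name]


# The source pipeline is built with its own vocab, not the current one.
source = ToyPipeline(
    vocab=ToyVocab(strings={"apple", "ORG"}, vectors={"apple": (0.1, 0.2)}),
    components={"ner": "trained-ner"},
)
current = ToyPipeline()

with warnings.catch_warnings(record=True) as caught:
    warnings.simplefilter("always")
    source_component(current, source, "ner")
```

After the call, the current vocab in this toy model holds the copied strings but no vectors, and one warning has been recorded because the two vectors tables differ. Had the source been loaded with the current vocab instead, its vectors would have ended up there as a side effect, which is the problem the commit addresses.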
cli Update package CLI handling of README and LICENSE (#8422) 2021-06-18 15:48:53 +02:00
displacy Also exclude user hooks in displacy conversion (#7419) 2021-03-12 09:41:59 +01:00
lang Fix non-deterministic deduplication in Greek lemmatizer (#8421) 2021-06-17 09:11:01 +02:00
matcher Merge remote-tracking branch 'upstream/develop' into chore/develop-into-master-v3.1 2021-06-15 15:05:17 +02:00
ml Resizable textcat (#7862) 2021-06-16 11:45:00 +02:00
pipeline Support negative examples in partial NER annotations (#8106) 2021-06-17 17:33:00 +10:00
tests Don't use the same vocab for source models (#8388) 2021-06-21 09:33:33 +02:00
tokens Merge remote-tracking branch 'upstream/develop' into chore/develop-into-master-v3.1 2021-06-15 15:05:17 +02:00
training Fix setting empty entities in Example.from_dict (#8426) 2021-06-18 10:41:50 +02:00
__init__.pxd * Seems to be working after refactor. Need to wire up more POS tag features, and wire up save/load of POS tags. 2014-10-24 02:23:42 +11:00
__init__.py Show warning if entity_ruler runs without patterns (#7807) 2021-06-04 17:37:38 +02:00
__main__.py Tidy up 2020-06-22 00:45:40 +02:00
about.py Set version to v3.1.0.dev0 (#8379) 2021-06-14 11:17:35 +02:00
attrs.pxd Merge branch 'develop' into master-tmp 2020-05-21 18:39:06 +02:00
attrs.pyx Remove unsupported attrs from attrs.IDS (#8132) 2021-06-02 19:16:57 +10:00
compat.py Use Literal type for nr_feature_tokens 2020-09-23 16:00:03 +02:00
default_config_pretraining.cfg pretrain architectures (#6451) 2020-12-08 14:41:03 +08:00
default_config.cfg Add training option to set annotations on update (#7767) 2021-04-26 16:53:53 +02:00
errors.py Support negative examples in partial NER annotations (#8106) 2021-06-17 17:33:00 +10:00
glossary.py Add Chinese PTB tags to glossary (#7993) 2021-05-06 18:43:03 +10:00
kb.pxd Replace cpdef variables with cdef (#7834) 2021-04-26 16:54:02 +02:00
kb.pyx KB & NEL to/from bytes (#8113) 2021-05-20 18:11:30 +10:00
language.py Don't use the same vocab for source models (#8388) 2021-06-21 09:33:33 +02:00
lexeme.pxd Fix Lexeme.from_ptr 2020-08-10 16:43:37 +02:00
lexeme.pyx fix 's typo's across code base (#8384) 2021-06-15 10:57:08 +02:00
lookups.py Update load_lookups return type and docstring (#7907) 2021-04-27 09:13:39 +02:00
morphology.pxd Clean up Morphology imports and definitions (#7441) 2021-04-26 16:54:23 +02:00
morphology.pyx Clean up Morphology imports and definitions (#7441) 2021-04-26 16:54:23 +02:00
parts_of_speech.pxd Add support for Universal Dependencies v2.0 2017-03-03 13:17:34 +01:00
parts_of_speech.pyx Drop Python 2.7 and 3.5 (#4828) 2019-12-22 01:53:56 +01:00
pipe_analysis.py Tidy up and auto-format 2020-09-29 21:39:28 +02:00
py.typed Add py.typed 2021-03-16 09:48:31 +01:00
schemas.py Add training option to set annotations on update (#7767) 2021-04-26 16:53:53 +02:00
scorer.py Extend score_spans for overlapping & non-labeled spans (#7209) 2021-04-08 12:19:17 +02:00
strings.pxd Remove 'cleanup' of strings (#6007) 2020-09-01 16:12:15 +02:00
strings.pyx Make vocab update in get_docs deterministic (#7603) 2021-04-09 11:53:13 +02:00
structs.pxd Add SpanGroup and Graph container types to represent arbitrary annotations (#6696) 2021-01-14 17:30:41 +11:00
symbols.pxd introduce token.has_head and refer to MISSING_DEP_ (WIP) 2021-01-12 17:17:06 +01:00
symbols.pyx introduce token.has_head and refer to MISSING_DEP_ (WIP) 2021-01-12 17:17:06 +01:00
tokenizer.pxd Replace cpdef variables with cdef (#7834) 2021-04-26 16:54:02 +02:00
tokenizer.pyx Fix tokenizer cache flushing (#7836) 2021-04-22 18:14:57 +10:00
typedefs.pxd Merge remote-tracking branch 'upstream/master' into chore/update-develop-from-master 2020-11-25 11:49:34 +01:00
typedefs.pyx Tidy up rest 2017-10-27 21:07:59 +02:00
util.py Merge remote-tracking branch 'upstream/develop' into chore/develop-into-master-v3.1 2021-06-15 15:05:17 +02:00
vectors.pyx Fix vectors data on GPU (#7626) 2021-04-19 18:30:03 +10:00
vocab.pxd Replace cpdef variables with cdef (#7834) 2021-04-26 16:54:02 +02:00
vocab.pyx Skip vector ngram backoff if minn is not set (#7925) 2021-05-06 18:34:35 +10:00