spaCy/spacy
Adriane Boyd 03fefa37e2
Add overwrite settings for more components (#9050)
* Add overwrite settings for more components

For pipeline components where it's relevant and not already implemented,
add an explicit `overwrite` setting that controls whether
`set_annotations` overwrites existing annotations.

For the `morphologizer`, add an additional setting `extend`, which
controls whether the existing features are preserved.

* +overwrite, +extend: overwrite values of existing features, add any new
features
* +overwrite, -extend: overwrite completely, removing any existing
features
* -overwrite, +extend: keep values of existing features, add any new
features
* -overwrite, -extend: do not modify the existing value if set

In all cases an unset value will be set by `set_annotations`.
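
The four `overwrite`/`extend` combinations above can be sketched with plain dictionaries. This is only an illustration of the described semantics, not the morphologizer's actual code: the `merge_morph` helper and the `Feats` alias are hypothetical names, and features are modelled as a simple `dict` instead of spaCy's morphology objects.

```python
from typing import Dict, Optional

# Hypothetical alias: a feature bundle such as {"Case": "Nom", "Number": "Sing"}.
Feats = Dict[str, str]


def merge_morph(
    existing: Optional[Feats],
    predicted: Feats,
    *,
    overwrite: bool,
    extend: bool,
) -> Feats:
    """Return the features a token would end up with (illustrative only)."""
    if not existing:
        # An unset value is always filled in, whatever the settings are.
        return dict(predicted)
    if overwrite and extend:
        # Predicted values win on conflicts; other existing features are kept.
        return {**existing, **predicted}
    if overwrite:
        # Overwrite completely, dropping every existing feature.
        return dict(predicted)
    if extend:
        # Existing values win on conflicts; new features are added.
        return {**predicted, **existing}
    # Neither flag set: leave the existing value untouched.
    return dict(existing)


existing = {"Case": "Nom", "Number": "Sing"}
predicted = {"Number": "Plur", "Gender": "Fem"}

for overwrite in (True, False):
    for extend in (True, False):
        result = merge_morph(existing, predicted, overwrite=overwrite, extend=extend)
        print(f"overwrite={overwrite}, extend={extend}: {result}")
```

With `existing` and `predicted` as above, only the `Number` feature conflicts, so it is the one whose value depends on `overwrite`, while `extend` decides whether the result is a union of both feature sets.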

Preserve current overwrite defaults:

* True: morphologizer, entity linker
* False: tagger, sentencizer, senter

* Add backwards compat overwrite settings

* Put empty line back

Removed by accident in last commit

* Set backwards-compatible defaults in __init__

Because the `TrainablePipe` serialization methods update `cfg`, there's
no straightforward way to detect whether models serialized with a
previous version are missing the overwrite settings.

Detection would be possible in the sentencizer because it has separate
serialization methods. However, to keep the changes parallel, this also
sets the default in `__init__`.
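
As a rough sketch of why a default set in `__init__` is enough here: if loading only updates `cfg` with whatever was serialized, a key that an older model never wrote simply keeps its constructor default. The `SketchPipe` class, its `load_cfg` method, and the `BACKWARD_OVERWRITE` constant below are hypothetical stand-ins for illustration, not spaCy's actual classes or serialization code.

```python
from typing import Any, Dict

# Hypothetical constant: the behaviour that models saved before this change relied on.
BACKWARD_OVERWRITE = False


class SketchPipe:
    """Toy component whose settings live in a `cfg` dict (illustrative only)."""

    def __init__(self, *, overwrite: bool = BACKWARD_OVERWRITE) -> None:
        # The backwards-compatible default is set here, before any loading happens.
        self.cfg: Dict[str, Any] = {"overwrite": overwrite}

    def load_cfg(self, serialized: Dict[str, Any]) -> "SketchPipe":
        # Loading updates cfg in place, so keys absent from the serialized
        # data keep the value assigned in __init__.
        self.cfg.update(serialized)
        return self


# A model saved by an older version has no "overwrite" key in its cfg.
old_model = SketchPipe().load_cfg({"labels": ["A", "B"]})
# A model saved after the change carries its own setting.
new_model = SketchPipe().load_cfg({"labels": ["A", "B"], "overwrite": True})

print(old_model.cfg["overwrite"], new_model.cfg["overwrite"])
```

The trade-off of this approach is that, after loading, a missing key and an explicitly saved default look the same.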

* Remove traces

Co-authored-by: Paul O'Leary McCann <polm@dampfkraft.com>
2021-09-30 15:35:55 +02:00
cli avoid crash when unicode in title (#9254) 2021-09-22 21:01:34 +02:00
displacy Adjust kb_id visualizer templating and docs 2021-09-23 11:59:02 +02:00
lang Add an Irish lemmatiser, based on BuNaMo (#9102) 2021-09-30 14:18:47 +02:00
matcher Merge remote-tracking branch 'upstream/master' into develop 2021-09-27 09:10:45 +02:00
ml Correct parser.py use_upper param info (#9180) 2021-09-10 16:19:58 +02:00
pipeline Add overwrite settings for more components (#9050) 2021-09-30 15:35:55 +02:00
tests Add overwrite settings for more components (#9050) 2021-09-30 15:35:55 +02:00
tokens Merge remote-tracking branch 'upstream/master' into develop 2021-09-27 09:10:45 +02:00
training Move WandB loggers into spacy-loggers (#9223) 2021-09-29 11:12:50 +02:00
__init__.pxd * Seems to be working after refactor. Need to wire up more POS tag features, and wire up save/load of POS tags. 2014-10-24 02:23:42 +11:00
__init__.py Tidy up and auto-format 2021-07-18 15:44:56 +10:00
__main__.py Tidy up 2020-06-22 00:45:40 +02:00
about.py Prepare for v3.1.3 (#9200) 2021-09-14 11:03:51 +02:00
attrs.pxd Merge branch 'develop' into master-tmp 2020-05-21 18:39:06 +02:00
attrs.pyx Update Cython string types (#9143) 2021-09-13 17:02:17 +02:00
compat.py Auto-detect package dependencies in spacy package (#8948) 2021-08-17 14:05:13 +02:00
default_config_pretraining.cfg Add new parameter for saving every n epoch in pretraining (#8912) 2021-08-12 11:14:48 +02:00
default_config.cfg Add training option to set annotations on update (#7767) 2021-04-26 16:53:53 +02:00
errors.py Merge remote-tracking branch 'upstream/master' into develop 2021-09-27 09:10:45 +02:00
glossary.py Add glossary entry for _SP (#8983) 2021-08-20 12:04:02 +02:00
kb.pxd Replace cpdef variables with cdef (#7834) 2021-04-26 16:54:02 +02:00
kb.pyx Update Cython string types (#9143) 2021-09-13 17:02:17 +02:00
language.py Merge remote-tracking branch 'upstream/master' into develop 2021-09-27 09:10:45 +02:00
lexeme.pxd Fix Lexeme.from_ptr 2020-08-10 16:43:37 +02:00
lexeme.pyi Add stub files for main cython classes (#8427) 2021-08-07 12:30:03 +02:00
lexeme.pyx Update Cython string types (#9143) 2021-09-13 17:02:17 +02:00
lookups.py Tidy up code 2021-06-28 12:08:15 +02:00
morphology.pxd Clean up Morphology imports and definitions (#7441) 2021-04-26 16:54:23 +02:00
morphology.pyx Clean up Morphology imports and definitions (#7441) 2021-04-26 16:54:23 +02:00
parts_of_speech.pxd Add support for Universal Dependencies v2.0 2017-03-03 13:17:34 +01:00
parts_of_speech.pyx Drop Python 2.7 and 3.5 (#4828) 2019-12-22 01:53:56 +01:00
pipe_analysis.py Tidy up and auto-format 2020-09-29 21:39:28 +02:00
py.typed Add py.typed 2021-03-16 09:48:31 +01:00
schemas.py Add new parameter for saving every n epoch in pretraining (#8912) 2021-08-12 11:14:48 +02:00
scorer.py Refactor scoring methods to use registered functions (#8766) 2021-08-10 15:13:39 +02:00
strings.pxd Update Cython string types (#9143) 2021-09-13 17:02:17 +02:00
strings.pyi Add stub files for main cython classes (#8427) 2021-08-07 12:30:03 +02:00
strings.pyx Update Cython string types (#9143) 2021-09-13 17:02:17 +02:00
structs.pxd Add SpanGroup and Graph container types to represent arbitrary annotations (#6696) 2021-01-14 17:30:41 +11:00
symbols.pxd introduce token.has_head and refer to MISSING_DEP_ (WIP) 2021-01-12 17:17:06 +01:00
symbols.pyx introduce token.has_head and refer to MISSING_DEP_ (WIP) 2021-01-12 17:17:06 +01:00
tokenizer.pxd Remove two attributes marked for removal in 3.1 (#9150) 2021-09-15 23:07:21 +02:00
tokenizer.pyx Update Cython string types (#9143) 2021-09-13 17:02:17 +02:00
typedefs.pxd Merge remote-tracking branch 'upstream/master' into chore/update-develop-from-master 2020-11-25 11:49:34 +01:00
typedefs.pyx Tidy up rest 2017-10-27 21:07:59 +02:00
util.py Merge remote-tracking branch 'upstream/master' into develop 2021-09-27 09:10:45 +02:00
vectors.pyx Fix vectors data on GPU (#7626) 2021-04-19 18:30:03 +10:00
vocab.pxd Remove two attributes marked for removal in 3.1 (#9150) 2021-09-15 23:07:21 +02:00
vocab.pyi Remove two attributes marked for removal in 3.1 (#9150) 2021-09-15 23:07:21 +02:00
vocab.pyx Remove two attributes marked for removal in 3.1 (#9150) 2021-09-15 23:07:21 +02:00