spaCy/spacy/pipeline
Adriane Boyd 03fefa37e2
Add overwrite settings for more components (#9050)
* Add overwrite settings for more components

For pipeline components where it's relevant and not already implemented,
add an explicit `overwrite` setting that controls whether
`set_annotations` overwrites existing annotations.

For the `morphologizer`, add an additional setting `extend`, which
controls whether the existing features are preserved.

* +overwrite, +extend: overwrite values of existing features, add any new
features
* +overwrite, -extend: overwrite completely, removing any existing
features
* -overwrite, +extend: keep values of existing features, add any new
features
* -overwrite, -extend: do not modify the existing value if set

In all cases an unset value will be set by `set_annotations`.
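
The four combinations above can be sketched in plain Python. This is an illustration only, not the morphologizer's code: the `merge_feats` helper, the `Feats` alias, and the dict representation of morphological features are assumptions made for this example.

```python
from typing import Dict

# Hypothetical representation: feature name -> value, e.g. {"Case": "Nom"}
Feats = Dict[str, str]


def merge_feats(existing: Feats, predicted: Feats, *, overwrite: bool, extend: bool) -> Feats:
    """Combine existing and predicted features per the overwrite/extend table."""
    if not existing:
        # An unset value is always set, whatever the settings are.
        return dict(predicted)
    if overwrite and extend:
        # Predicted values win; features only present in `existing` are kept.
        return {**existing, **predicted}
    if overwrite:
        # Replace completely, dropping any existing features.
        return dict(predicted)
    if extend:
        # Existing values win; features only present in `predicted` are added.
        return {**predicted, **existing}
    # Neither flag set: leave the existing value untouched.
    return dict(existing)


existing = {"Case": "Nom", "Number": "Sing"}
predicted = {"Number": "Plur", "Gender": "Fem"}

both = merge_feats(existing, predicted, overwrite=True, extend=True)
only_overwrite = merge_feats(existing, predicted, overwrite=True, extend=False)
only_extend = merge_feats(existing, predicted, overwrite=False, extend=True)
neither = merge_feats(existing, predicted, overwrite=False, extend=False)
unset = merge_feats({}, predicted, overwrite=False, extend=False)
```

With these inputs, `Number` ends up as `Plur` only when `overwrite` is on, and `Gender` is added only when `extend` is on or the old value is replaced wholesale.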

Preserve current overwrite defaults:

* True: morphologizer, entity linker
* False: tagger, sentencizer, senter

* Add backwards compat overwrite settings

* Put empty line back

Removed by accident in last commit

* Set backwards-compatible defaults in __init__

Because the `TrainablePipe` serialization methods update `cfg`, there's
no straightforward way to detect whether models serialized with a
previous version are missing the overwrite settings.

Detection would be possible in the sentencizer because it has separate
serialization methods. However, to keep the changes parallel, the
sentencizer also gets its default set in `__init__`.
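
A minimal sketch of why a default set in `__init__` covers older models, assuming deserialization merges the stored settings into `cfg`. `SketchPipe`, the JSON format, and the `BACKWARD_OVERWRITE` constant are hypothetical stand-ins for this example, not the actual `TrainablePipe` code.

```python
import json

# Hypothetical constant holding the pre-change behaviour of a component.
BACKWARD_OVERWRITE = False


class SketchPipe:
    """Toy component whose settings live in a `cfg` dict."""

    def __init__(self, *, overwrite: bool = BACKWARD_OVERWRITE):
        # The default is recorded up front, before any data is loaded.
        self.cfg = {"overwrite": overwrite}

    def to_bytes(self) -> bytes:
        return json.dumps(self.cfg).encode("utf8")

    def from_bytes(self, data: bytes) -> "SketchPipe":
        # Loading updates cfg in place, so keys absent from older
        # serialized data keep the value assigned in __init__.
        self.cfg.update(json.loads(data.decode("utf8")))
        return self


# Data written by an older version has no "overwrite" key at all.
old_data = json.dumps({"labels": ["A", "B"]}).encode("utf8")
old_pipe = SketchPipe().from_bytes(old_data)

# Data written by a newer version carries the setting explicitly.
new_data = SketchPipe(overwrite=True).to_bytes()
new_pipe = SketchPipe().from_bytes(new_data)
```

In this sketch the old model keeps `overwrite=False` without any version check, while the newer model's stored value replaces the default.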

* Remove traces

Co-authored-by: Paul O'Leary McCann <polm@dampfkraft.com>
2021-09-30 15:35:55 +02:00
_parser_internals Update Cython string types (#9143) 2021-09-13 17:02:17 +02:00
__init__.py Add SpanCategorizer component (#6747) 2021-06-24 12:35:27 +02:00
attributeruler.py Document scorers in registry and components from #8766 (#8929) 2021-08-12 12:50:03 +02:00
dep_parser.pyx Document scorers in registry and components from #8766 (#8929) 2021-08-12 12:50:03 +02:00
entity_linker.py Add overwrite settings for more components (#9050) 2021-09-30 15:35:55 +02:00
entityruler.py Document scorers in registry and components from #8766 (#8929) 2021-08-12 12:50:03 +02:00
functions.py Tidy up and auto-format 2021-02-13 12:55:56 +11:00
lemmatizer.py Document scorers in registry and components from #8766 (#8929) 2021-08-12 12:50:03 +02:00
morphologizer.pyx Add overwrite settings for more components (#9050) 2021-09-30 15:35:55 +02:00
multitask.pyx Replace negative rows with 0 in StaticVectors (#7674) 2021-04-22 18:04:15 +10:00
ner.pyx Document scorers in registry and components from #8766 (#8929) 2021-08-12 12:50:03 +02:00
pipe.pxd TrainablePipe (#6213) 2020-10-08 21:33:49 +02:00
pipe.pyx Refactor scoring methods to use registered functions (#8766) 2021-08-10 15:13:39 +02:00
sentencizer.pyx Add overwrite settings for more components (#9050) 2021-09-30 15:35:55 +02:00
senter.pyx Add overwrite settings for more components (#9050) 2021-09-30 15:35:55 +02:00
spancat.py Merge remote-tracking branch 'upstream/master' into develop 2021-09-27 09:10:45 +02:00
tagger.pyx Add overwrite settings for more components (#9050) 2021-09-30 15:35:55 +02:00
textcat_multilabel.py Refactor scoring methods to use registered functions (#8766) 2021-08-10 15:13:39 +02:00
textcat.py Document scorers in registry and components from #8766 (#8929) 2021-08-12 12:50:03 +02:00
tok2vec.py Ensemble textcat with listener (#8012) 2021-05-31 18:21:06 +10:00
trainable_pipe.pxd Refactor scoring methods to use registered functions (#8766) 2021-08-10 15:13:39 +02:00
trainable_pipe.pyx Pass excludes when serializing vocab (#8824) 2021-08-03 14:42:44 +02:00
transition_parser.pxd TrainablePipe (#6213) 2020-10-08 21:33:49 +02:00
transition_parser.pyx Document scorers in registry and components from #8766 (#8929) 2021-08-12 12:50:03 +02:00