spaCy/spacy/training
Adriane Boyd 1f663b7c33 Address issues with source with component names and replacing listeners (#12701)
When sourcing a component, the object from the original pipeline is added to the new pipeline as the same object. Because both pipelines then hold this one object, several of its attributes cannot be in sync with the original pipeline and the new pipeline at the same time:

* component.name
* component.listener_map / component.listening_components for tok2vec and transformer
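
The shared-object problem can be illustrated with a small toy model. This is only a sketch: `Component`, `Pipeline`, `add_pipe` as written here, and `out_of_sync` are hypothetical stand-ins for illustration, not spaCy's actual classes or behavior.

```python
class Component:
    """Toy stand-in for a pipeline component (not spaCy's class)."""

    def __init__(self, name):
        self.name = name


class Pipeline:
    """Toy pipeline that stores (name, component) pairs."""

    def __init__(self):
        self.components = []

    def add_pipe(self, component, name):
        # Assumption for this sketch: adding a component writes the
        # pipeline-level name onto the component object itself.
        component.name = name
        self.components.append((name, component))

    def out_of_sync(self):
        # Names under which this pipeline knows a component whose own
        # .name attribute says something else.
        return [n for n, c in self.components if c.name != n]


original = Pipeline()
tagger = Component("tagger")
original.add_pipe(tagger, "tagger")

new = Pipeline()
new.add_pipe(tagger, "sourced_tagger")  # same object, different name

print(original.out_of_sync())  # ['tagger']
print(new.out_of_sync())       # []
```

Since there is only one object, only one of the two pipelines can agree with `component.name` at any given time.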

When running replace_listeners on a component, the config is not updated correctly if the state of the component is incorrect for the current pipeline. In particular, changes that should be applied from model.attrs["replace_listener_cfg"] (as used in spacy-transformers) are not applied, because:

* find_listeners relies on component.name to set the name in the listener_map
* replace_listeners relies on listener_map to determine how to modify the configs
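
The dependency chain in the two points above can be sketched with plain dictionaries. This is a simplified toy model under stated assumptions: the two functions below only borrow the names `find_listeners` and `replace_listeners` and do not reproduce spaCy's signatures or logic, and the dictionary layout is invented for illustration.

```python
def find_listeners(upstream, component):
    # Register the component's listeners under its *current* name.
    listeners = upstream["listener_map"].setdefault(component["name"], [])
    listeners.extend(component["listeners"])


def replace_listeners(upstream, pipe_name, config):
    # The config is only modified if the listener map knows the
    # component under the name the pipeline uses for it.
    if pipe_name not in upstream["listener_map"]:
        return False
    config[pipe_name]["tok2vec"] = "standalone copy"
    upstream["listener_map"].pop(pipe_name)
    return True


tok2vec = {"listener_map": {}}
# The pipeline calls this component "ner_sourced", but the object
# still carries the name from the pipeline it was sourced from.
ner = {"name": "ner", "listeners": ["listener_0"]}
config = {"ner_sourced": {"tok2vec": "listener"}}

find_listeners(tok2vec, ner)
stale_result = replace_listeners(tok2vec, "ner_sourced", config)

# After syncing the name and rebuilding the map, the lookup succeeds.
ner["name"] = "ner_sourced"
tok2vec["listener_map"] = {}
find_listeners(tok2vec, ner)
synced_result = replace_listeners(tok2vec, "ner_sourced", config)

print(stale_result, synced_result)  # False True
```

In this sketch, a stale name makes the first call skip the config update entirely, which mirrors the failure mode described above.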

In addition, there are several places where pipeline components are modified and the listener map and/or internal component names aren't currently updated.

In cases where there is a component shared by two pipelines that cannot be in sync, this PR chooses to prioritize the most recently modified or initialized pipeline. There is no actual solution with the current source behavior that will make both pipelines usable, so the current pipeline is updated whenever components are added/renamed/removed or the pipeline is initialized for training.
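
A minimal sketch of the "most recently modified pipeline wins" policy, again as a toy model: `_sync_names` is a hypothetical helper invented for this example, and the classes are not spaCy's.

```python
class Component:
    def __init__(self, name):
        self.name = name


class Pipeline:
    def __init__(self):
        self._pipes = {}  # pipeline-level name -> component

    def _sync_names(self):
        # Hypothetical helper: overwrite each component's name with
        # the name this pipeline uses for it.
        for name, component in self._pipes.items():
            component.name = name

    def add_pipe(self, component, name):
        self._pipes[name] = component
        self._sync_names()

    def rename_pipe(self, old, new):
        self._pipes[new] = self._pipes.pop(old)
        self._sync_names()


shared = Component("ner")
first, second = Pipeline(), Pipeline()

first.add_pipe(shared, "ner")
second.add_pipe(shared, "ner_copy")
name_after_second = shared.name  # second pipeline was modified last

first.rename_pipe("ner", "ner_v2")
name_after_first = shared.name   # first pipeline was modified last

print(name_after_second, name_after_first)  # ner_copy ner_v2
```

Whichever pipeline was touched last determines the state of the shared object; the other pipeline is left out of sync until it is modified or initialized again.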
2023-06-28 10:03:27 +02:00
converters Auto-format code with black (#10377) 2022-02-25 10:00:21 +01:00
__init__.pxd Renaming gold & annotation_setter (#6042) 2020-09-09 10:31:03 +02:00
__init__.py Add spacy.PlainTextCorpusReader.v1 (#12122) 2023-01-26 11:33:22 +01:00
align.pyx Fix alignment for 1-to-1 tokens and lowercasing (#6476) 2020-12-08 14:25:16 +08:00
alignment_array.pxd Alignment: use a simplified ragged type for performance (#10319) 2022-04-01 09:02:06 +02:00
alignment_array.pyx Backport parser/alignment optimizations from feature/refactor-parser (#10952) 2022-06-24 13:39:52 +02:00
alignment.py Alignment: use a simplified ragged type for performance (#10319) 2022-04-01 09:02:06 +02:00
augment.py Preserve missing entity annotation in augmenters (#11540) 2022-09-27 10:16:51 +02:00
batchers.py 🏷 Add Mypy check to CI and ignore all existing Mypy errors (#9167) 2021-10-14 15:21:40 +02:00
callbacks.py Have logging calls use string formatting types (#12215) 2023-02-02 11:15:22 +01:00
corpus.py Have logging calls use string formatting types (#12215) 2023-02-02 11:15:22 +01:00
example.pxd Make a pre-check to speed up alignment cache (#6139) 2020-09-24 18:13:39 +02:00
example.pyx Cast to uint64 for all array-based doc representations (#11933) 2022-12-12 08:45:35 +01:00
gold_io.pyx Fix is_sent_start when converting from JSON (fix #7635) (#7655) 2021-04-08 18:24:52 +10:00
initialize.py Address issues with source with component names and replacing listeners (#12701) 2023-06-28 10:03:27 +02:00
iob_utils.py Preserve missing entity annotation in augmenters (#11540) 2022-09-27 10:16:51 +02:00
loggers.py New console logger with expanded progress tracking (#11972) 2022-12-23 15:21:44 +01:00
loop.py Have logging calls use string formatting types (#12215) 2023-02-02 11:15:22 +01:00
pretrain.py Add model-last saving mechanism to pretraining (#12459) 2023-04-03 15:28:52 +02:00