spaCy/spacy/tests/serialize
single-fingal 6c6b8da7cc
Fix: De/Serialize SpanGroups including the SpanGroup keys (#10707)
* fix: De/Serialize `SpanGroups` including the SpanGroup keys

This prevents the loss of `SpanGroup`s that have the same `.name` as other `SpanGroup`s within the same `SpanGroups` object when the `SpanGroups` are serialized and deserialized.

Fixes #10685

* Maintain backwards compatibility for serialized `SpanGroups`

(serialized either as a list of `SpanGroup`s or as `b''`)

* Add tests for `SpanGroups` deserialization backwards-compatibility

* Move a `SpanGroups` de/serialization test (test_issue10685)
  to tests/serialize/test_serialize_spangroups.py

* Output a warning when deserializing a `SpanGroups` object whose `SpanGroup`s have duplicate `.name` values

* Minor refactor

* `SpanGroups.from_bytes` handles only `list` and `dict` types with
`dict` as the expected default
* For lists, keep first rather than last value encountered
* Update error message
* Rename and update tests

* Update to preserve list serialization of SpanGroups

To avoid breaking compatibility of serialized `Doc` and `DocBin` with
earlier versions of spaCy v3, revert to a list-only serialization,
but overwrite the names only for serialization so that the `SpanGroups`
keys override the `SpanGroup` names.

* Preserve object identity and current key overwrite

* Preserve SpanGroup object identity
* Preserve last rather than first span group from SpanGroup list
  format without SpanGroups keys

* Update inline comments

* Fix types

* Add type info for SpanGroup.copy

* Deserialize `SpanGroup`s as copies when a single `SpanGroup` is the
value for more than one `SpanGroups` key.

This is because we serialize `SpanGroups` as dicts (to maintain backward
and forward compatibility) and we can't assume that `SpanGroup`s with the
same bytes/serialization were the same (identical) object before
serialization.

* Update spacy/tokens/_dict_proxies.py

* Add more SpanGroups serialization tests

Test that serialized SpanGroups maintain their Span order

* Small clarification on older spaCy versions

* Update spacy/tests/serialize/test_serialize_span_groups.py

Co-authored-by: Adriane Boyd <adrianeboyd@gmail.com>
Co-authored-by: Sofie Van Landeghem <svlandeg@users.noreply.github.com>
2022-06-02 15:56:27 +02:00
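The final scheme described in the commit message (list-only serialization in which each `SpanGroups` key overrides the `SpanGroup`'s `.name` on a copy, `b''` still accepted on load, last-wins for duplicate names in the legacy list format with a warning, and fresh objects on deserialization) can be modeled in plain Python. This is a simplified sketch, not spaCy's code in `spacy/tokens/_dict_proxies.py`: `ToySpanGroup`, `groups_to_bytes` and `groups_from_bytes` are hypothetical names, spans are reduced to `(start, end)` offsets, the warning text is made up, and JSON stands in for spaCy's actual binary format.

```python
import json
import warnings
from dataclasses import dataclass, field, replace
from typing import Dict, List, Tuple


@dataclass
class ToySpanGroup:
    """Hypothetical stand-in for a SpanGroup: a name plus (start, end) token offsets."""
    name: str
    spans: List[Tuple[int, int]] = field(default_factory=list)


def groups_to_bytes(groups: Dict[str, ToySpanGroup]) -> bytes:
    """Serialize as a *list* of groups (the backward-compatible format),
    writing each dict key into the serialized name so same-named groups stay distinct."""
    payload = []
    for key, group in groups.items():
        # Rename a shallow copy rather than mutating, so the caller's object keeps its .name.
        renamed = replace(group, name=key) if group.name != key else group
        payload.append({"name": renamed.name, "spans": renamed.spans})
    return json.dumps(payload).encode("utf8")


def groups_from_bytes(data: bytes) -> Dict[str, ToySpanGroup]:
    """Accept b'' or a serialized list; for duplicate names the last entry wins."""
    payload = json.loads(data) if data else []
    groups: Dict[str, ToySpanGroup] = {}
    for item in payload:
        name = item["name"]
        if name in groups:
            warnings.warn(f"Duplicate span group name {name!r}; keeping the last one.")
        # Each entry becomes a new object, so two keys that shared one group
        # before serialization come back as independent copies (span order kept).
        groups[name] = ToySpanGroup(name, [tuple(s) for s in item["spans"]])
    return groups


shared = ToySpanGroup("ents", [(0, 2), (3, 4)])
groups = {"ents": shared, "ents_backup": shared, "other": ToySpanGroup("ents", [(5, 6)])}
restored = groups_from_bytes(groups_to_bytes(groups))
```

Because the keys are written into the names, the `"other"` group survives the round trip even though its original `.name` collides with `"ents"`, and `restored["ents"]` and `restored["ents_backup"]` are separate objects with equal contents.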
__init__.py Revert #4334 2019-09-29 17:32:12 +02:00
test_resource_warning.py Tidy up tests 2020-10-15 10:20:21 +02:00
test_serialize_config.py Tagger: use unnormalized probabilities for inference (#10197) 2022-03-15 14:15:31 +01:00
test_serialize_doc.py Migrate regression tests into the main test suite (#9655) 2021-12-04 20:34:48 +01:00
test_serialize_docbin.py Migrate regression tests into the main test suite (#9655) 2021-12-04 20:34:48 +01:00
test_serialize_extension_attrs.py Merge branch 'master' into develop 2020-02-18 14:47:23 +01:00
test_serialize_kb.py Update flake8 version in reqs and CI 2021-06-28 11:29:36 +02:00
test_serialize_language.py Tagger: use unnormalized probabilities for inference (#10197) 2022-03-15 14:15:31 +01:00
test_serialize_pipeline.py Migrate regression tests into the main test suite (#9655) 2021-12-04 20:34:48 +01:00
test_serialize_span_groups.py Fix: De/Serialize SpanGroups including the SpanGroup keys (#10707) 2022-06-02 15:56:27 +02:00
test_serialize_tokenizer.py Add tokenizer option to allow Matcher handling for all rules (#10452) 2022-03-24 13:21:32 +01:00
test_serialize_vocab_strings.py Migrate regression tests into the main test suite (#9655) 2021-12-04 20:34:48 +01:00