Mirror of https://github.com/explosion/spaCy.git (synced 2025-11-02 00:47:52 +03:00)
* fix: De/Serialize `SpanGroups` including the SpanGroup keys

  This prevents the loss of `SpanGroup`s that have the same `.name` as other `SpanGroup`s within the same `SpanGroups` object (upon de/serialization of the `SpanGroups`).

  Fixes #10685
* Maintain backwards compatibility for serialized `SpanGroups` (serialized as: a list of `SpanGroup`s, or `b''`)
* Add tests for `SpanGroups` deserialization backwards-compatibility
* Move a `SpanGroups` de/serialization test (`test_issue10685`) to tests/serialize/test_serialize_spangroups.py
* Output a warning if deserializing a `SpanGroups` with duplicate `.name`-d `SpanGroup`s
* Minor refactor
* `SpanGroups.from_bytes` handles only `list` and `dict` types with `dict` as the expected default
* For lists, keep first rather than last value encountered
* Update error message
* Rename and update tests
* Update to preserve list serialization of SpanGroups

  To avoid breaking compatibility of serialized `Doc` and `DocBin` with earlier versions of spaCy v3, revert back to a list-only serialization, but update the names just for serialization so that the `SpanGroups` keys override the `SpanGroup` names.
* Preserve object identity and current key overwrite
* Preserve SpanGroup object identity
* Preserve last rather than first span group from SpanGroup list format without SpanGroups keys
* Update inline comments
* Fix types
* Add type info for `SpanGroup.copy`
* Deserialize `SpanGroup`s as copies when a single SpanGroup is the value for more than 1 `SpanGroups` key.

  This is because we serialize `SpanGroups` as dicts (to maintain backward- and forward-compatibility) and we can't assume `SpanGroup`s with the same bytes/serialization were the same (identical) object, pre-serialization.
* Update spacy/tokens/_dict_proxies.py
* Add more SpanGroups serialization tests

  Test that serialized SpanGroups maintain their Span order
* small clarification on older spaCy version
* Update spacy/tests/serialize/test_serialize_span_groups.py

Co-authored-by: Adriane Boyd <adrianeboyd@gmail.com>
Co-authored-by: Sofie Van Landeghem <svlandeg@users.noreply.github.com>

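The behaviour described in the commit message can be illustrated with a small, standard-library-only sketch. It is not the code in `_dict_proxies.py` and it does not use spaCy: `ToyGroup`, `dump_groups`, and `load_groups` are hypothetical names, and JSON stands in for whatever byte format the library actually uses. The sketch assumes the dict-based layout from the final "Deserialize `SpanGroup`s as copies" item (serialized group mapped to the keys that hold it), plus the legacy list layout, where only one group per name survives and the last one wins.

```python
import json
import warnings
from dataclasses import dataclass, field
from typing import Dict, List, Tuple


@dataclass
class ToyGroup:
    """Hypothetical stand-in for a span group: a name plus ordered spans."""

    name: str
    spans: List[Tuple[int, int]] = field(default_factory=list)

    def dumps(self) -> str:
        return json.dumps({"name": self.name, "spans": self.spans})

    @classmethod
    def loads(cls, payload: str) -> "ToyGroup":
        raw = json.loads(payload)
        # JSON turns tuples into lists, so convert them back.
        return cls(raw["name"], [tuple(s) for s in raw["spans"]])


def dump_groups(groups: Dict[str, ToyGroup]) -> str:
    # Map each serialized group to every key that holds it, so a key that
    # differs from the group's own name is not lost on a round trip.
    msg: Dict[str, List[str]] = {}
    for key, group in groups.items():
        msg.setdefault(group.dumps(), []).append(key)
    return json.dumps(msg)


def load_groups(data: str) -> Dict[str, ToyGroup]:
    msg = json.loads(data) if data else []
    out: Dict[str, ToyGroup] = {}
    if isinstance(msg, list):
        # Legacy list layout: keys were never stored, so fall back to names.
        for payload in msg:
            group = ToyGroup.loads(payload)
            if group.name in out:
                warnings.warn(f"duplicate group name {group.name!r}; keeping the last one")
            out[group.name] = group
    else:
        for payload, keys in msg.items():
            # Build a separate object per key: equal payloads do not prove
            # that the originals were the same object before serialization.
            for key in keys:
                out[key] = ToyGroup.loads(payload)
    return out


# Two groups share a name but live under different keys; both survive.
groups = {
    "a": ToyGroup("shared", [(0, 1), (2, 4)]),
    "b": ToyGroup("shared", [(5, 6)]),
}
restored = load_groups(dump_groups(groups))
```

Keying the payload by the container's keys rather than by each group's name is what keeps `"a"` and `"b"` distinct here; the legacy branch shows why a name-only list cannot do the same.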
| File | | |
|---|---|---|
| __init__.pxd | ||
| __init__.py | ||
| _dict_proxies.py | ||
| _retokenize.pyi | ||
| _retokenize.pyx | ||
| _serialize.py | ||
| doc.pxd | ||
| doc.pyi | ||
| doc.pyx | ||
| graph.pxd | ||
| graph.pyx | ||
| morphanalysis.pxd | ||
| morphanalysis.pyi | ||
| morphanalysis.pyx | ||
| span_group.pxd | ||
| span_group.pyi | ||
| span_group.pyx | ||
| span.pxd | ||
| span.pyi | ||
| span.pyx | ||
| token.pxd | ||
| token.pyi | ||
| token.pyx | ||
| underscore.py | ||