spaCy/spacy/tests/serialize
Adriane Boyd f4339f9bff
Fix tokenizer cache flushing (#7836)
* Fix tokenizer cache flushing

Fix/simplify tokenizer init detection in order to fix cache flushing
when properties are modified.

* Remove init reloading logic

* Remove logic disabling `_reload_special_cases` on init
  * Setting `rules` last in `__init__` (as before) means that setting
    other properties doesn't reload any special cases
  * Reset `rules` first in `from_bytes` so that setting other properties
    during deserialization doesn't reload any special cases
    unnecessarily
* Reset all properties in `Tokenizer.from_bytes` to allow any settings
  to be `None`

* Also reset special matcher when special cache is flushed

* Remove duplicate special case validation

* Add test for special cases flushing

* Extend test for tokenizer deserialization of None values
2021-04-22 18:14:57 +10:00
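
The ordering described in the commit message can be sketched with a toy class. This is a minimal illustration only, not spaCy's `Tokenizer`: `ToyTokenizer`, `suffix_chars`, `loads`, and `from_dict` are hypothetical names invented for this example. It assumes that every property setter flushes the cache and reloads the special cases, so setting `rules` last (in `__init__`) or resetting it first (when deserializing) avoids reloading special cases unnecessarily.

```python
class ToyTokenizer:
    """Toy stand-in for illustration only; not spaCy's Tokenizer."""

    def __init__(self, rules=None, suffix_chars=None):
        self._cache = {}
        self._specials = {}
        self._rules = {}
        self.loads = 0  # how many special cases have been (re)loaded
        # Other properties first: `_rules` is still empty, so the reload
        # triggered by this setter has nothing to load.
        self.suffix_chars = suffix_chars
        # `rules` last, so each special case is loaded exactly once.
        self.rules = rules

    @property
    def suffix_chars(self):
        return self._suffix_chars

    @suffix_chars.setter
    def suffix_chars(self, value):
        self._suffix_chars = value
        self._reload_special_cases()

    @property
    def rules(self):
        return self._rules

    @rules.setter
    def rules(self, value):
        self._rules = dict(value or {})
        self._reload_special_cases()

    def _reload_special_cases(self):
        # Flush the token cache and the special-case table together,
        # then rebuild the table from the current rules.
        self._cache.clear()
        self._specials.clear()
        for string, pieces in self._rules.items():
            self._specials[string] = list(pieces)
            self.loads += 1

    def from_dict(self, data):
        # Reset `rules` first so the other setters do not reload old rules.
        self.rules = None
        self.suffix_chars = data.get("suffix_chars")  # may be None
        self.rules = data.get("rules")  # may be None
        return self

    def __call__(self, text):
        tokens = []
        for chunk in text.split():
            if chunk not in self._cache:
                self._cache[chunk] = self._split(chunk)
            tokens.extend(self._cache[chunk])
        return tokens

    def _split(self, chunk):
        if chunk in self._specials:
            return list(self._specials[chunk])
        trailing = []
        suffixes = self._suffix_chars or ""
        # Peel configured suffix characters off the end of the chunk.
        while len(chunk) > 1 and chunk[-1] in suffixes:
            trailing.insert(0, chunk[-1])
            chunk = chunk[:-1]
        return [chunk] + trailing


tok = ToyTokenizer(rules={"a.": ["a."]}, suffix_chars=".")
before = tok("a. b.")    # "a." is kept whole by the special case
tok.suffix_chars = None  # changing a property flushes the cache
after = tok("a. b.")     # "b." is no longer split
```

Without the flush in `_reload_special_cases`, the second call would return the stale cached split for `b.`, which is the kind of behavior the new special-cases flushing test is meant to catch.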
| File | Last commit | Date |
| --- | --- | --- |
| `__init__.py` | Revert #4334 | 2019-09-29 17:32:12 +02:00 |
| `test_resource_warning.py` | Tidy up tests | 2020-10-15 10:20:21 +02:00 |
| `test_serialize_config.py` | Ensure hyphen in config file works as string value (#7642) | 2021-04-12 14:35:57 +02:00 |
| `test_serialize_doc.py` | Add SpanGroup and Graph container types to represent arbitrary annotations (#6696) | 2021-01-14 17:30:41 +11:00 |
| `test_serialize_extension_attrs.py` | Merge branch 'master' into develop | 2020-02-18 14:47:23 +01:00 |
| `test_serialize_kb.py` | consistently use registry as callable | 2021-03-02 17:56:28 +01:00 |
| `test_serialize_language.py` | Remove dead and/or deprecated code (#5710) | 2020-07-06 13:06:25 +02:00 |
| `test_serialize_pipeline.py` | multi-label textcat component (#6474) | 2021-01-06 13:07:14 +11:00 |
| `test_serialize_tokenizer.py` | Fix tokenizer cache flushing (#7836) | 2021-04-22 18:14:57 +10:00 |
| `test_serialize_vocab_strings.py` | Make vocab update in get_docs deterministic (#7603) | 2021-04-09 11:53:13 +02:00 |