spaCy/spacy/tests/tokenizer
Adriane Boyd f4339f9bff
Fix tokenizer cache flushing (#7836)
* Fix tokenizer cache flushing

Fix/simplify tokenizer init detection in order to fix cache flushing
when properties are modified.

* Remove init reloading logic

* Remove logic disabling `_reload_special_cases` on init
  * Setting `rules` last in `__init__` (as before) means that setting
    other properties doesn't reload any special cases
  * Reset `rules` first in `from_bytes` so that setting other properties
    during deserialization doesn't reload any special cases
    unnecessarily
* Reset all properties in `Tokenizer.from_bytes` to allow any settings
  to be `None`

* Also reset special matcher when special cache is flushed

* Remove duplicate special case validation

* Add test for special cases flushing

* Extend test for tokenizer deserialization of None values
2021-04-22 18:14:57 +10:00
..
__init__.py Revert #4334 2019-09-29 17:32:12 +02:00
sun.txt Revert #4334 2019-09-29 17:32:12 +02:00
test_exceptions.py Tidy up and auto-format 2020-02-18 15:38:18 +01:00
test_explain.py Update Tokenizer.explain with special matches (#7749) 2021-04-19 19:08:20 +10:00
test_naughty_strings.py Merge branch 'develop' into master-tmp 2020-09-04 13:15:36 +02:00
test_tokenizer.py Fix tokenizer cache flushing (#7836) 2021-04-22 18:14:57 +10:00
test_urls.py Merge branch 'develop' into master-tmp 2020-06-20 15:52:00 +02:00
test_whitespace.py Merge branch 'develop' into master-tmp 2020-09-04 13:15:36 +02:00