mirror of
https://github.com/explosion/spaCy.git
synced 2025-10-24 04:31:17 +03:00
* Fix tokenizer cache flushing
Fix/simplify tokenizer init detection in order to fix cache flushing
when properties are modified.
* Remove init reloading logic
* Remove logic disabling `_reload_special_cases` on init
* Setting `rules` last in `__init__` (as before) means that setting
other properties doesn't reload any special cases
* Reset `rules` first in `from_bytes` so that setting other properties
during deserialization doesn't reload any special cases
unnecessarily
* Reset all properties in `Tokenizer.from_bytes` to allow any settings
to be `None`
* Also reset special matcher when special cache is flushed
* Remove duplicate special case validation
* Add test for special cases flushing
* Extend test for tokenizer deserialization of None values
|
||
|---|---|---|
| .. | ||
| __init__.py | ||
| sun.txt | ||
| test_exceptions.py | ||
| test_explain.py | ||
| test_naughty_strings.py | ||
| test_tokenizer.py | ||
| test_urls.py | ||
| test_whitespace.py | ||