mirror of
https://github.com/explosion/spaCy.git
synced 2025-01-15 12:06:25 +03:00
5861308910
Handle tokenizer special cases more generally by using the Matcher internally to match special cases after the affix/token_match tokenization is complete.

Instead of only matching special cases while processing balanced or nearly balanced prefixes and suffixes, this recognizes special cases in a wider range of contexts:

* Allows arbitrary numbers of prefixes/affixes around special cases
* Allows special cases separated by infixes

Existing tests/settings that couldn't be preserved as before:

* The emoticon '")' is no longer a supported special case
* The emoticon ':)' in "example:)" is a false positive again

When merged with #4258 (or the relevant cache bugfix), the affix and token_match properties should be modified to flush and reload all special cases to use the updated internal tokenization with the Matcher.

---|---|---|
__init__.py | ||
test_issue1-1000.py | ||
test_issue1001-1500.py | ||
test_issue1501-2000.py | ||
test_issue2001-2500.py | ||
test_issue2501-3000.py | ||
test_issue3001-3500.py | ||
test_issue3521.py | ||
test_issue3526.py | ||
test_issue3531.py | ||
test_issue3540.py | ||
test_issue3549.py | ||
test_issue3555.py | ||
test_issue3611.py | ||
test_issue3625.py | ||
test_issue3803.py | ||
test_issue3830.py | ||
test_issue3839.py | ||
test_issue3869.py | ||
test_issue3879.py | ||
test_issue3880.py | ||
test_issue3882.py | ||
test_issue3951.py | ||
test_issue3959.py | ||
test_issue3962.py | ||
test_issue3972.py | ||
test_issue4002.py | ||
test_issue4030.py | ||
test_issue4054.py | ||
test_issue4104.py | ||
test_issue4120.py | ||
test_issue4133.py | ||
test_issue4190.py |
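
The commit message above describes matching special cases only after affix tokenization has finished. The following is a minimal pure-Python sketch of that idea, not spaCy's implementation: the rule sets (`PREFIX_CHARS`, `SUFFIX_CHARS`, `INFIX_RE`, `SPECIAL_CASES`) and the helper names are hypothetical and chosen only for illustration, and a simple longest-span lookup stands in for the Matcher.

```python
import re

# Hypothetical rules for illustration; these are not spaCy's real rule sets.
PREFIX_CHARS = {"(", '"', ":"}
SUFFIX_CHARS = {")", '"', ":", ",", "."}
INFIX_RE = re.compile(r"(-)")  # the capture group keeps the infix as a token
SPECIAL_CASES = {
    "don't": ["do", "n't"],
    "can't": ["ca", "n't"],
    ":)": [":)"],
}


def split_affixes(chunk):
    """Split one whitespace-delimited chunk into prefix, infix, and suffix tokens."""
    head, tail = [], []
    while chunk and chunk[0] in PREFIX_CHARS:
        head.append(chunk[0])
        chunk = chunk[1:]
    while chunk and chunk[-1] in SUFFIX_CHARS:
        tail.append(chunk[-1])
        chunk = chunk[:-1]
    middle = [part for part in INFIX_RE.split(chunk) if part]
    # Suffixes were collected from the outside in, so reverse them.
    return head + middle + tail[::-1]


def apply_special_cases(tokens, special_cases):
    """Replace any run of tokens whose joined text is a special case."""
    out, i = [], 0
    while i < len(tokens):
        # Try the longest span first so longer special cases win.
        for j in range(len(tokens), i, -1):
            text = "".join(tokens[i:j])
            if text in special_cases:
                out.extend(special_cases[text])
                i = j
                break
        else:
            out.append(tokens[i])
            i += 1
    return out


def tokenize(text):
    """Affix tokenization first, special-case matching second."""
    tokens = []
    for chunk in text.split():
        tokens.extend(apply_special_cases(split_affixes(chunk), SPECIAL_CASES))
    return tokens


print(" ".join(tokenize('(("don\'t"))')))  # ( ( " do n't " ) )
print(" ".join(tokenize("don't-can't")))   # do n't - ca n't
print(" ".join(tokenize("example:)")))     # example :)
```

Because matching happens on the finished token sequence, the special case is found under any number of surrounding affixes and on either side of an infix. The last call shows the trade-off in this sketch: the adjacent tokens `:` and `)` are rejoined into `:)` even when they trail an ordinary word, which is analogous to the false positive the commit message mentions.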