mirror of
https://github.com/explosion/spaCy.git
synced 2024-12-24 00:46:28 +03:00
cbc2cee2c8
* Move prefix and suffix detection for URL_PATTERN Move prefix and suffix detection for `URL_PATTERN` into the tokenizer. Remove associated lookahead and lookbehind from `URL_PATTERN`. Fix tokenization for Hungarian given new modified handling of prefixes and suffixes. * Match a wider range of URI schemes |
||
---|---|---|
.. | ||
__init__.py | ||
examples.py | ||
punctuation.py | ||
stop_words.py | ||
tokenizer_exceptions.py |