mirror of
https://github.com/explosion/spaCy.git
synced 2025-10-24 20:51:30 +03:00
* Move prefix and suffix detection for URL_PATTERN Move prefix and suffix detection for `URL_PATTERN` into the tokenizer. Remove associated lookahead and lookbehind from `URL_PATTERN`. Fix tokenization for Hungarian given new modified handling of prefixes and suffixes. * Match a wider range of URI schemes |
||
|---|---|---|
| .. | ||
| __init__.py | ||
| sun.txt | ||
| test_exceptions.py | ||
| test_naughty_strings.py | ||
| test_tokenizer.py | ||
| test_urls.py | ||
| test_whitespace.py | ||