spaCy/spacy/tests/tokenizer
adrianeboyd cbc2cee2c8 Improve URL_PATTERN and handling in tokenizer (#4374)
* Move prefix and suffix detection for URL_PATTERN

Move prefix and suffix detection for `URL_PATTERN` into the tokenizer.
Remove associated lookahead and lookbehind from `URL_PATTERN`.

Fix tokenization for Hungarian given new modified handling of prefixes
and suffixes.

* Match a wider range of URI schemes
2019-10-05 13:00:09 +02:00
..
__init__.py Revert #4334 2019-09-29 17:32:12 +02:00
sun.txt Revert #4334 2019-09-29 17:32:12 +02:00
test_exceptions.py Revert #4334 2019-09-29 17:32:12 +02:00
test_naughty_strings.py Revert #4334 2019-09-29 17:32:12 +02:00
test_tokenizer.py Revert #4334 2019-09-29 17:32:12 +02:00
test_urls.py Improve URL_PATTERN and handling in tokenizer (#4374) 2019-10-05 13:00:09 +02:00
test_whitespace.py Revert #4334 2019-09-29 17:32:12 +02:00