spaCy/spacy/tests/tokenizer
adrianeboyd de69bc6509 Fix and improve URL pattern (#4882)
* match domains longer than `hostname.domain.tld` like `www.foo.co.uk`
* expand allowed characters in domain names while only matching
lowercase TLDs so that "this.That" isn't matched as a URL and can be
split on the period as an infix (relevant for at least English, German,
and Tatar)
2020-01-06 14:58:30 +01:00
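
The behaviour described in the commit message above can be illustrated with a small, self-contained sketch. This is not spaCy's actual URL pattern. `SIMPLE_URL`, `looks_like_url`, and `split_infix_period` are hypothetical names, and the regular expressions are deliberately simplified stand-ins. The sketch only assumes what the message states: any number of dot-separated domain labels is accepted, labels may contain uppercase letters, and the TLD must be lowercase.

```python
import re

# Hypothetical, simplified pattern for illustration only (NOT spaCy's URL pattern).
SIMPLE_URL = re.compile(
    r"(?:https?://)?"                                   # optional scheme
    r"(?:[A-Za-z0-9](?:[A-Za-z0-9-]*[A-Za-z0-9])?\.)+"  # one or more labels, each followed by a dot
    r"[a-z]{2,}"                                        # TLD: lowercase letters only
    r"(?:/\S*)?"                                        # optional path
)

# Hypothetical infix rule: a period between a lowercase and an uppercase letter.
INFIX_PERIOD = re.compile(r"(?<=[a-z])(\.)(?=[A-Z])")


def looks_like_url(text: str) -> bool:
    """Return True if the whole string matches the simplified URL pattern."""
    return SIMPLE_URL.fullmatch(text) is not None


def split_infix_period(text: str) -> list:
    """Split on an infix period unless the string looks like a URL."""
    if looks_like_url(text):
        return [text]
    return [piece for piece in INFIX_PERIOD.split(text) if piece]


if __name__ == "__main__":
    for sample in ["hostname.domain.tld", "www.foo.co.uk", "this.That"]:
        print(sample, looks_like_url(sample), split_infix_period(sample))
```

Because the TLD part is matched case-sensitively, "this.That" fails the URL check and stays available for splitting on the period, while "www.foo.co.uk" is kept as a single token.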
__init__.py              Revert #4334                                              2019-09-29 17:32:12 +02:00
sun.txt                  Revert #4334                                              2019-09-29 17:32:12 +02:00
test_exceptions.py       Revert #4334                                              2019-09-29 17:32:12 +02:00
test_explain.py          Detect more empty matches in tokenizer.explain() (#4675)  2019-11-20 16:31:29 +01:00
test_naughty_strings.py  Revert #4334                                              2019-09-29 17:32:12 +02:00
test_tokenizer.py        Revert #4334                                              2019-09-29 17:32:12 +02:00
test_urls.py             Fix and improve URL pattern (#4882)                       2020-01-06 14:58:30 +01:00
test_whitespace.py       Revert #4334                                              2019-09-29 17:32:12 +02:00