spaCy/spacy/lang/hu
adrianeboyd cbc2cee2c8 Improve URL_PATTERN and handling in tokenizer (#4374)
* Move prefix and suffix detection for URL_PATTERN

Move prefix and suffix detection for `URL_PATTERN` into the tokenizer.
Remove associated lookahead and lookbehind from `URL_PATTERN`.

Fix tokenization for Hungarian given the modified handling of prefixes
and suffixes.

* Match a wider range of URI schemes
2019-10-05 13:00:09 +02:00
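
The commit message above describes stripping prefixes and suffixes in the tokenizer rather than guarding the URL pattern with lookahead and lookbehind. A minimal sketch of that idea follows, using only Python's `re` module. The names `URL_RE`, `PREFIX_CHARS`, `SUFFIX_CHARS`, and `split_affixes` are hypothetical and are not spaCy's actual patterns or API; the real `URL_PATTERN` is considerably more detailed.

```python
import re

# Hypothetical, simplified stand-ins for illustration only (not spaCy's real patterns).
# The URL pattern has no lookahead/lookbehind; it accepts any "scheme://rest" form.
URL_RE = re.compile(r"[A-Za-z][A-Za-z0-9+.\-]*://\S+")
PREFIX_CHARS = "([{\"'"
SUFFIX_CHARS = ")]}\"'.,;:!?"


def split_affixes(chunk):
    """Strip leading and trailing punctuation, then classify the remaining core."""
    prefixes, suffixes = [], []
    # Peel off prefix characters one at a time from the left.
    while chunk and chunk[0] in PREFIX_CHARS:
        prefixes.append(chunk[0])
        chunk = chunk[1:]
    # Peel off suffix characters one at a time from the right,
    # keeping them in their original left-to-right order.
    while chunk and chunk[-1] in SUFFIX_CHARS:
        suffixes.insert(0, chunk[-1])
        chunk = chunk[:-1]
    # Only the stripped core is matched against the URL pattern.
    is_url = URL_RE.fullmatch(chunk) is not None
    return prefixes, chunk, suffixes, is_url


print(split_affixes("(https://example.com/path)."))
```

Because the surrounding punctuation is removed before matching, the pattern itself can stay simple, and a generic scheme rule covers inputs such as `svn+ssh://host/repo` as well as `http` and `https`. One trade-off of this sketch is that a URL legitimately ending in a suffix character loses that character.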
__init__.py Move lookup tables out of the core library (#4346) 2019-10-01 00:01:27 +02:00
examples.py 💫 Tidy up and auto-format .py files (#2983) 2018-11-30 17:03:03 +01:00
punctuation.py Improve URL_PATTERN and handling in tokenizer (#4374) 2019-10-05 13:00:09 +02:00
stop_words.py 💫 Tidy up and auto-format .py files (#2983) 2018-11-30 17:03:03 +01:00
tokenizer_exceptions.py Fix regex deprecation warnings 2019-02-21 11:56:47 +01:00