spaCy/spacy/lang/fi
Antti Ajanki e1f777b151
Improvements for Finnish tokenizer (#4985)
* don't split on a colon; the colon is used to attach suffixes to abbreviations
* tokenize on any of LIST_HYPHENS (except a single hyphen), not just on --
* simplify infix rules by merging similar rules
2020-02-10 20:32:43 -05:00
__init__.py              Improvements to the Finnish language data (#4738)   2019-12-03 12:55:28 +01:00
examples.py              💫 Tidy up and auto-format .py files (#2983)         2018-11-30 17:03:03 +01:00
lex_attrs.py             Improvements to the Finnish language data (#4738)   2019-12-03 12:55:28 +01:00
punctuation.py           Improvements for Finnish tokenizer (#4985)          2020-02-10 20:32:43 -05:00
stop_words.py            💫 Tidy up and auto-format .py files (#2983)         2018-11-30 17:03:03 +01:00
tokenizer_exceptions.py  Improvements for Finnish tokenizer (#4985)          2020-02-10 20:32:43 -05:00
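
The infix behaviour described in the #4985 commit message (no split on a colon, no split on a single hyphen, split on the other hyphen-like strings) can be illustrated with a small standalone sketch. This is not the code in punctuation.py: the `HYPHENS` list below is an assumed stand-in for spaCy's `LIST_HYPHENS`, whose actual contents may differ, and `split_infixes` is a hypothetical helper that ignores the surrounding-character context a real infix rule would check.

```python
import re

# Assumed stand-in for spaCy's LIST_HYPHENS; the real list may differ.
# "\u2013" is an en dash, "\u2014" is an em dash.
HYPHENS = ["-", "\u2013", "\u2014", "--", "---", "~"]

# Split on every hyphen-like string except the single ASCII hyphen.
# Longest alternatives go first so "---" is tried before "--".
SPLIT_ON = sorted((h for h in HYPHENS if h != "-"), key=len, reverse=True)

# The capture group makes re.split keep the matched separator as its own piece.
INFIX_RE = re.compile("(" + "|".join(re.escape(h) for h in SPLIT_ON) + ")")


def split_infixes(token):
    """Hypothetical helper: split one whitespace-delimited token on infixes."""
    return [part for part in INFIX_RE.split(token) if part]


# The colon is not in the pattern, so suffixed abbreviations stay whole.
print(split_infixes("EU:n"))          # ['EU:n']
# A single hyphen inside a compound is not a split point.
print(split_infixes("linja-auto"))    # ['linja-auto']
# A double hyphen is split off as a separate token.
print(split_infixes("sana--toinen"))  # ['sana', '--', 'toinen']
```

Keeping the single hyphen out of the split set is what lets compounds such as `linja-auto` survive as one token, while leaving the colon out of the pattern entirely preserves forms such as `EU:n`.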