spaCy/spacy/lang/tr
Duygu Altinok 0e55f806dd
Turkish tokenization improvements (#6268)
* added single and paired orth variants

* added token match

* added long text tokenization test

* inverted init

* normalized lemmas to lowercase

* more abbrevs

* tests for ordinals and abbrevs

* separated period abbrevs into another list

* fixed typo

* added ordinal and abbrev tests

* added number tests for dates

* minor refinement

* added inflected abbrevs regex

* added percentage and inflection

* cosmetics

* added token match

* added url inflection tests

* excluded url tokens from custom pattern

* removed url match import
2020-10-29 09:43:17 +01:00
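The commit bullets mention a token match and a regex for inflected abbreviations (and, later, percentages). In Turkish, suffixes attach to abbreviations and numbers after an apostrophe, as in "ABD'de" ("in the USA") or "%50'si" ("50% of it"). spaCy's tokenizer accepts a `token_match` callable for strings that should stay a single token. The sketch below shows the idea in plain Python. The patterns, the character classes and the `token_match` function here are hypothetical illustrations and are not copied from `tokenizer_exceptions.py`.

```python
import re

# Hypothetical character classes for Turkish upper- and lowercase letters.
TR_UPPER = "A-ZÇĞİÖŞÜ"
TR_LOWER = "a-zçğıöşü"

# Hypothetical pattern: an uppercase abbreviation, an apostrophe
# (straight or curly), then a lowercase suffix, e.g. "ABD'de".
INFLECTED_ABBREV = re.compile(rf"^[{TR_UPPER}]+['’][{TR_LOWER}]+$")

# Hypothetical pattern: an inflected percentage, e.g. "%50'si" or "%12,5'i".
INFLECTED_PERCENT = re.compile(rf"^%\d+(?:[.,]\d+)?['’][{TR_LOWER}]+$")


def token_match(text: str):
    """Return a match object if `text` should stay one token, else None.

    Same shape as a spaCy `token_match` callable (string in, truthy or
    falsy out); the patterns themselves are assumptions for illustration.
    """
    return INFLECTED_ABBREV.match(text) or INFLECTED_PERCENT.match(text)


for word in ["ABD'de", "TBMM'nin", "%50'si", "abd'de", "ABD"]:
    print(word, bool(token_match(word)))
```

Both patterns are anchored with `^` and `$`, so a string is kept whole only if it matches from start to end. Per the commit, the real Turkish pattern also excludes URL tokens, which this sketch does not attempt.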
__init__.py Turkish tokenization improvements (#6268) 2020-10-29 09:43:17 +01:00
examples.py 💫 Tidy up and auto-format .py files (#2983) 2018-11-30 17:03:03 +01:00
lex_attrs.py Ordinal numbers for Turkish (#6142) 2020-10-07 10:25:37 +02:00
morph_rules.py Turkish tag map and morph rules addition (#6141) 2020-10-07 10:27:36 +02:00
stop_words.py 💫 Tidy up and auto-format .py files (#2983) 2018-11-30 17:03:03 +01:00
syntax_iterators.py Turkish language syntax iterators (#6191) 2020-10-07 11:07:52 +02:00
tokenizer_exceptions.py Turkish tokenization improvements (#6268) 2020-10-29 09:43:17 +01:00