Grey Murav
a9756963e6
Extend list of abbreviations for ru language (#10282)
* Extend list of abbreviations for ru language
Extended the list of abbreviations for the ru language that may influence tokenization.
* black formatting
Co-authored-by: thomashacker <EdwardSchmuhl@web.de>
2022-02-17 15:48:50 +01:00
Ines Montani
a624ae0675
Remove POS, TAG and LEMMA from tokenizer exceptions
2020-07-22 23:09:01 +02:00
Ines Montani
b507f61629
Tidy up and move noun_chunks, token_match, url_match
2020-07-22 22:18:46 +02:00
Ines Montani
db55577c45
Drop Python 2.7 and 3.5 (#4828)
* Remove unicode declarations
* Remove Python 3.5 and 2.7 from CI
* Don't require pathlib
* Replace compat helpers
* Remove OrderedDict
* Use f-strings
* Set Cython compiler language level
* Fix typo
* Re-add OrderedDict for Table
* Update setup.cfg
* Revert CONTRIBUTING.md
* Revert lookups.md
* Revert top-level.md
* Small adjustments and docs [ci skip]
2019-12-22 01:53:56 +01:00
Ines Montani
f580302673
Tidy up and auto-format
2019-08-20 17:36:34 +02:00
Vadim Mazaev
cacd859dcd
Added tag map, fixed test failures, added more exceptions
2017-11-26 20:54:48 +03:00
Vadim Mazaev
52ee1f9bf9
Updated Russian language, added lemmatizer, norm exceptions and lex attrs
2017-11-21 11:44:46 +03:00
Vadim Mazaev
a0739a06d4
Restored Russian support from the v1.10 branch
2017-11-17 17:06:15 +03:00