Commit Graph

4 Commits

Author SHA1 Message Date
Adriane Boyd
6f314f99c4
Use Latin normalization for Serbian attrs (#12608)
* Use Latin normalization for Serbian attrs

Use Latin normalization for Serbian `NORM`, `PREFIX`, and `SUFFIX`.

* Update NORMs in tokenizer exceptions and related tests

* Add tests for all custom lex attrs

* Remove unused imports
2023-05-08 12:33:56 +02:00
Ines Montani
db55577c45
Drop Python 2.7 and 3.5 (#4828)
* Remove unicode declarations

* Remove Python 3.5 and 2.7 from CI

* Don't require pathlib

* Replace compat helpers

* Remove OrderedDict

* Use f-strings

* Set Cython compiler language level

* Fix typo

* Re-add OrderedDict for Table

* Update setup.cfg

* Revert CONTRIBUTING.md

* Revert lookups.md

* Revert top-level.md

* Small adjustments and docs [ci skip]
2019-12-22 01:53:56 +01:00
Ines Montani
a8752a569d Auto-format [ci skip] 2019-08-22 11:44:39 +02:00
Pavle Vidanović
60e10a9f93 Serbian language improvement (#4169)
* Serbian stopwords added. (cyrillic alphabet)

* spaCy Contribution agreement included.

* Test initialize updated

* Serbian language code update. --bugfix

* Tokenizer exceptions added. Init file updated.

* Norm exceptions and lexical attributes added.

* Examples added.

* Tests added.

* sr_lang examples update.

* Tokenizer exceptions updated. (Serbian)
2019-08-22 11:43:07 +02:00