spaCy/spacy/tests/lang
Adriane Boyd 30d31fd335
Update Russian and Ukrainian lemmatizers (#11811)
* pymorphy2 issues #11620, #11626, #11625:
- #11620: pymorphy2_lookup
- #11626: handle multiple forms pointing to the same normal form and handle empty POS tags
- #11625: matching DET that are labelled as PRON by pymorphy2

* Move lemmatizer algorithm changes back into RussianLemmatizer

* Fix uk pymorphy3_lookup mode init

* Move and update tests for ru/uk lookup lemmatizer modes

* Fix typo

* Remove traces of previous behavior for uninflected POS

* Refactor to private generic-looking pymorphy methods

* Remove xfailed uk lemmatizer cases

* Update spacy/lang/ru/lemmatizer.py

Co-authored-by: Richard Hudson <richard@explosion.ai>

Co-authored-by: Dmytro S Lituiev <d.lituiev@gmail.com>
2022-11-25 11:12:46 +01:00
af New tests for a number of alpha languages (#9703) 2021-11-28 21:59:23 +01:00
am Tidy up and auto-format 2021-01-15 11:57:36 +11:00
ar Remove POS, TAG and LEMMA from tokenizer exceptions 2020-07-22 23:09:01 +02:00
bg Handle Cyrillic combining diacritics (#10837) 2022-06-28 15:35:32 +02:00
bn Drop Python 2.7 and 3.5 (#4828) 2019-12-22 01:53:56 +01:00
ca Update Catalan tokenizer (#9297) 2021-09-27 14:42:30 +02:00
cs Remove unicode declarations and update language data 2020-09-04 13:19:16 +02:00
da Merge remote-tracking branch 'upstream/master' into chore/update-develop-from-master-rc3 2021-01-14 11:49:58 +01:00
de Tidy up and auto-format 2020-09-29 21:39:28 +02:00
dsb Add Lower Sorbian support. (#10431) 2022-03-07 16:57:14 +01:00
el Tidy up and auto-format 2020-09-29 21:39:28 +02:00
en Remove English exceptions with mismatched features (#10873) 2022-06-03 09:44:04 +02:00
es Migrate regression tests into the main test suite (#9655) 2021-12-04 20:34:48 +01:00
et New tests for a number of alpha languages (#9703) 2021-11-28 21:59:23 +01:00
eu Merge branch 'develop' into master-tmp 2020-05-21 18:39:06 +02:00
fa Tidy up and auto-format 2020-09-29 21:39:28 +02:00
fi Auto-format code with black (#10333) 2022-02-21 09:15:42 +01:00
fr Revert "Bump sudachipy version (#9917)" (#10071) 2022-01-17 10:38:37 +01:00
ga Drop Python 2.7 and 3.5 (#4828) 2019-12-22 01:53:56 +01:00
grc add punctuation to grc (#11426) 2022-09-27 11:38:56 +02:00
gu Remove unicode declarations and tidy up 2020-06-21 22:34:10 +02:00
he Merge branch 'develop' into master-tmp 2020-09-04 13:15:36 +02:00
hi Migrate regression tests into the main test suite (#9655) 2021-12-04 20:34:48 +01:00
hr New tests for a number of alpha languages (#9703) 2021-11-28 21:59:23 +01:00
hsb Add Upper Sorbian support. (#10432) 2022-03-07 16:20:39 +01:00
hu 🏷 Add Mypy check to CI and ignore all existing Mypy errors (#9167) 2021-10-14 15:21:40 +02:00
hy Remove unicode declarations and tidy up 2020-06-21 22:34:10 +02:00
id Tidy up and auto-format 2020-09-29 21:39:28 +02:00
is New tests for a number of alpha languages (#9703) 2021-11-28 21:59:23 +01:00
it Revert "Bump sudachipy version (#9917)" (#10071) 2022-01-17 10:38:37 +01:00
ja Migrate regression tests into the main test suite (#9655) 2021-12-04 20:34:48 +01:00
ko Handle unknown tags in KoreanTokenizer tag map (#10536) 2022-03-24 11:25:36 +01:00
ky Update Cython string types (#9143) 2021-09-13 17:02:17 +02:00
la Auto-format code with black (#11427) 2022-09-02 11:43:20 +02:00
lb Remove POS, TAG and LEMMA from tokenizer exceptions 2020-07-22 23:09:01 +02:00
lg luganda language extension (#10847) 2022-08-23 13:09:36 +02:00
lt Merge branch 'master' into tmp/sync 2020-03-26 13:38:14 +01:00
lv New tests for a number of alpha languages (#9703) 2021-11-28 21:59:23 +01:00
mk Tidy up and auto-format 2021-01-05 13:41:53 +11:00
ml Remove unicode declarations and tidy up 2020-06-21 22:34:10 +02:00
nb Tidy up and auto-format 2020-09-29 21:39:28 +02:00
ne Tidy up and auto-format 2020-09-29 21:39:28 +02:00
nl Fix Dutch noun chunks to skip overlapping spans (#11275) 2022-08-10 09:49:08 +02:00
pl Merge branch 'develop' into master-tmp 2020-05-21 18:39:06 +02:00
pt Portuguese noun chunks review (#9559) 2021-11-04 23:55:49 +01:00
ro Drop Python 2.7 and 3.5 (#4828) 2019-12-22 01:53:56 +01:00
ru Update Russian and Ukrainian lemmatizers (#11811) 2022-11-25 11:12:46 +01:00
sa Tidy up and auto-format 2020-09-29 21:39:28 +02:00
sk New tests for a number of alpha languages (#9703) 2021-11-28 21:59:23 +01:00
sl Updates to Slovenian language (#11162) 2022-08-05 10:10:18 +02:00
sq New tests for a number of alpha languages (#9703) 2021-11-28 21:59:23 +01:00
sr Un-xfail passing tests 2019-12-25 18:02:20 +01:00
sv Migrate regression tests into the main test suite (#9655) 2021-12-04 20:34:48 +01:00
ta Basic tests for the Tamil language (#10629) 2022-04-07 14:47:37 +02:00
th Update custom tokenizer APIs and pickling (#8972) 2021-08-19 14:37:47 +02:00
ti Update Tigrinya ትግርኛ language support (#8900) 2021-08-10 13:55:08 +02:00
tl Add initial Tagalog (tl) tests (#9582) 2021-11-02 08:35:49 +01:00
tr removing print statements from the test suite (#10712) 2022-04-27 09:14:25 +02:00
tt Merge branch 'master' into develop 2020-02-18 14:47:23 +01:00
uk Update Russian and Ukrainian lemmatizers (#11811) 2022-11-25 11:12:46 +01:00
ur Drop Python 2.7 and 3.5 (#4828) 2019-12-22 01:53:56 +01:00
vi Update custom tokenizer APIs and pickling (#8972) 2021-08-19 14:37:47 +02:00
xx New tests for a number of alpha languages (#9703) 2021-11-28 21:59:23 +01:00
yo Drop Python 2.7 and 3.5 (#4828) 2019-12-22 01:53:56 +01:00
zh Tidy up and auto-format 2020-10-03 17:20:18 +02:00
__init__.py Revert #4334 2019-09-29 17:32:12 +02:00
test_attrs.py Intify IOB (#9738) 2022-01-20 13:19:38 +01:00
test_initialize.py Fix Azerbaijani init, extend lang init tests (#8656) 2021-07-09 15:36:35 +02:00
test_lemmatizers.py Update Catalan language data (#8308) 2021-06-11 10:21:22 +02:00