spaCy/spacy/tests/lang
Paul O'Leary McCann 58bdd8607b
Bump sudachipy version (#9917)
* Edited Slovenian stop words list (#9707)

* Noun chunks for Italian (#9662)

* added it vocab

* copied portuguese

* added possessive determiner

* added conjed NPs

* added nmoded NPs

* test misc

* more examples

* fixed typo

* fixed parenthesis

* fixed comma

* comma fix

* added syntax iters

* fix some index problems

* fixed index

* corrected heads for test case

* fixed test case

* fixed determiner gender

* cleaned up leftovers

* added example with apostrophe

* French NP review (#9667)

* adapted from pt

* added basic tests

* added fr vocab

* fixed noun chunks

* more examples

* typo fix

* changed naming

* changed the naming

* typo fix

* Add Japanese kana characters to default exceptions (fix #9693) (#9742)

This includes the main kana, or phonetic characters, used in Japanese.

There are some supplemental kana blocks in Unicode outside the BMP that
could also be included, but because they are rarely used I have omitted
them for now; they may be worth adding later. The omitted blocks are:

- Kana Supplement
- Kana Extended (A and B)
- Small Kana Extension

* Remove NER words from stop words in Norwegian (#9820)

The default stop words for Norwegian Bokmål (nb) in spaCy contain important entities, e.g. France, Germany, Russia, Sweden and USA, police districts, important units of time such as months and days of the week, and organisations.

Nobody expects to find these among the default stop words. There is a danger that users who follow the general recommendation to filter out stop words will unknowingly remove important entities from their data.

See the explanation in https://github.com/explosion/spaCy/issues/3052#issuecomment-986756711 and the follow-up comment https://github.com/explosion/spaCy/issues/3052#issuecomment-986951831

* Bump sudachipy version

* Update sudachipy versions

* Bump versions

Bumping to the most recent dictionary just to keep things current.
Bumping sudachipy to 5.2 because older versions don't support recent
dictionaries.

Co-authored-by: Sofie Van Landeghem <svlandeg@users.noreply.github.com>
Co-authored-by: Richard Hudson <richard@explosion.ai>
Co-authored-by: Duygu Altinok <duygu@explosion.ai>
Co-authored-by: Haakon Meland Eriksen <haakon.eriksen@far.no>
2022-01-17 08:16:22 +01:00
af New tests for a number of alpha languages (#9703) 2021-11-28 21:59:23 +01:00
am Tidy up and auto-format 2021-01-15 11:57:36 +11:00
ar Remove POS, TAG and LEMMA from tokenizer exceptions 2020-07-22 23:09:01 +02:00
bg Tidy up with flake8: imports, comparisons, etc. 2021-06-28 12:08:15 +02:00
bn Drop Python 2.7 and 3.5 (#4828) 2019-12-22 01:53:56 +01:00
ca Update Catalan tokenizer (#9297) 2021-09-27 14:42:30 +02:00
cs Remove unicode declarations and update language data 2020-09-04 13:19:16 +02:00
da Merge remote-tracking branch 'upstream/master' into chore/update-develop-from-master-rc3 2021-01-14 11:49:58 +01:00
de Tidy up and auto-format 2020-09-29 21:39:28 +02:00
el Tidy up and auto-format 2020-09-29 21:39:28 +02:00
en Migrate regression tests into the main test suite (#9655) 2021-12-04 20:34:48 +01:00
es Migrate regression tests into the main test suite (#9655) 2021-12-04 20:34:48 +01:00
et New tests for a number of alpha languages (#9703) 2021-11-28 21:59:23 +01:00
eu Merge branch 'develop' into master-tmp 2020-05-21 18:39:06 +02:00
fa Tidy up and auto-format 2020-09-29 21:39:28 +02:00
fi Tidy up code 2021-06-28 12:08:15 +02:00
fr Bump sudachipy version (#9917) 2022-01-17 08:16:22 +01:00
ga Drop Python 2.7 and 3.5 (#4828) 2019-12-22 01:53:56 +01:00
grc Added ancient Greek language support (#8606) 2021-07-15 10:27:17 +02:00
gu Remove unicode declarations and tidy up 2020-06-21 22:34:10 +02:00
he Merge branch 'develop' into master-tmp 2020-09-04 13:15:36 +02:00
hi Migrate regression tests into the main test suite (#9655) 2021-12-04 20:34:48 +01:00
hr New tests for a number of alpha languages (#9703) 2021-11-28 21:59:23 +01:00
hu 🏷 Add Mypy check to CI and ignore all existing Mypy errors (#9167) 2021-10-14 15:21:40 +02:00
hy Remove unicode declarations and tidy up 2020-06-21 22:34:10 +02:00
id Tidy up and auto-format 2020-09-29 21:39:28 +02:00
is New tests for a number of alpha languages (#9703) 2021-11-28 21:59:23 +01:00
it Bump sudachipy version (#9917) 2022-01-17 08:16:22 +01:00
ja Migrate regression tests into the main test suite (#9655) 2021-12-04 20:34:48 +01:00
ko Update custom tokenizer APIs and pickling (#8972) 2021-08-19 14:37:47 +02:00
ky Update Cython string types (#9143) 2021-09-13 17:02:17 +02:00
lb Remove POS, TAG and LEMMA from tokenizer exceptions 2020-07-22 23:09:01 +02:00
lt Merge branch 'master' into tmp/sync 2020-03-26 13:38:14 +01:00
lv New tests for a number of alpha languages (#9703) 2021-11-28 21:59:23 +01:00
mk Tidy up and auto-format 2021-01-05 13:41:53 +11:00
ml Remove unicode declarations and tidy up 2020-06-21 22:34:10 +02:00
nb Tidy up and auto-format 2020-09-29 21:39:28 +02:00
ne Tidy up and auto-format 2020-09-29 21:39:28 +02:00
nl Adding noun_chunks to the DUTCH language model (nl) (#8529) 2021-07-14 14:01:02 +02:00
pl Merge branch 'develop' into master-tmp 2020-05-21 18:39:06 +02:00
pt Portuguese noun chunks review (#9559) 2021-11-04 23:55:49 +01:00
ro Drop Python 2.7 and 3.5 (#4828) 2019-12-22 01:53:56 +01:00
ru Tidy up tests and docs 2020-09-21 20:43:54 +02:00
sa Tidy up and auto-format 2020-09-29 21:39:28 +02:00
sk New tests for a number of alpha languages (#9703) 2021-11-28 21:59:23 +01:00
sl New tests for a number of alpha languages (#9703) 2021-11-28 21:59:23 +01:00
sq New tests for a number of alpha languages (#9703) 2021-11-28 21:59:23 +01:00
sr Un-xfail passing tests 2019-12-25 18:02:20 +01:00
sv Migrate regression tests into the main test suite (#9655) 2021-12-04 20:34:48 +01:00
th Update custom tokenizer APIs and pickling (#8972) 2021-08-19 14:37:47 +02:00
ti Update Tigrinya ትግርኛ language support (#8900) 2021-08-10 13:55:08 +02:00
tl Add initial Tagalog (tl) tests (#9582) 2021-11-02 08:35:49 +01:00
tr Tidy up and auto-format 2021-01-05 13:41:53 +11:00
tt Merge branch 'master' into develop 2020-02-18 14:47:23 +01:00
uk Tidy up with flake8: imports, comparisons, etc. 2021-06-28 12:08:15 +02:00
ur Drop Python 2.7 and 3.5 (#4828) 2019-12-22 01:53:56 +01:00
vi Update custom tokenizer APIs and pickling (#8972) 2021-08-19 14:37:47 +02:00
xx New tests for a number of alpha languages (#9703) 2021-11-28 21:59:23 +01:00
yo Drop Python 2.7 and 3.5 (#4828) 2019-12-22 01:53:56 +01:00
zh Tidy up and auto-format 2020-10-03 17:20:18 +02:00
__init__.py Revert #4334 2019-09-29 17:32:12 +02:00
test_attrs.py Migrate regression tests into the main test suite (#9655) 2021-12-04 20:34:48 +01:00
test_initialize.py Fix Azerbaijani init, extend lang init tests (#8656) 2021-07-09 15:36:35 +02:00
test_lemmatizers.py Update Catalan language data (#8308) 2021-06-11 10:21:22 +02:00