spaCy/spacy/tests/lang
Stanisław Giziński 1448ad100c Improved Polish tokenizer and stop words. (#2974)
* Improved stop words list

* Removed some wrong stop words from list

* Improved stop words list

* Removed some wrong stop words from list

* Improved Polish Tokenizer (#38)

* Add tests for Polish tokenizer

* Add Polish tokenizer exceptions

* Don't split any words containing hyphens

* Fix test case with wrong model answer

* Remove commented out line of code until better solution is found

* Add source SRX's license

* Rename exception_list.py to match spaCy conventions

* Add a brief explanation of where the exception list comes from

* Add newline after each exception

* Rename COPYING.txt to LICENSE

* Delete old files

* Add header to the license

* Agreements signed

* Stanisław Giziński agreement

* Krzysztof Kowalczyk - signed agreement

* Mateusz Olko agreement

* Add DoomCoder's contributor agreement

* Improve like_num checking for Polish

* Add like_num tests

* all from SI system added

* Final licence and removed splitting exceptions

* Added Polish stop words to LEX_ATTRS

* Add encoding info to pl tokenizer exceptions
2019-02-08 14:27:21 +11:00
..
ar Add Arabic language (#2314) 2018-05-15 00:27:19 +02:00
bn update bengali token rules for hyphen and digits (#2731) 2018-09-05 21:49:00 +02:00
ca Catalan Language Support (#2940) 2018-11-26 15:25:47 +01:00
da Add Danish lemmatizer (#2184) 2018-04-07 19:07:28 +02:00
de Update tests for pytest 4.x (#2965) 2018-11-26 18:14:57 +01:00
el Add support for Greek language (#2535) 2018-07-10 13:48:38 +02:00
en Update tests for pytest 4.x (#2965) 2018-11-26 18:14:57 +01:00
es Move language-specific tests to tests/lang 2017-05-09 00:02:37 +02:00
fi Move language-specific tests to tests/lang 2017-05-09 00:02:37 +02:00
fr Fix small typo bug in French regexp + relevant unit test (#2980) 2018-11-29 20:16:13 +01:00
ga merge 2017-10-31 22:55:59 +00:00
he Move language-specific tests to tests/lang 2017-05-09 00:02:37 +02:00
hu Update tests for pytest 4.x (#2965) 2018-11-26 18:14:57 +01:00
id added {pre,suf,in}fix tests 2017-08-20 13:43:00 +07:00
ja Add Japanese lemmas (#2543) 2018-07-13 10:55:14 +02:00
nb Move language-specific tests to tests/lang 2017-05-09 00:02:37 +02:00
pl Improved Polish tokenizer and stop words. (#2974) 2019-02-08 14:27:21 +11:00
ro Updates to Romanian support (#2354) 2018-05-24 11:40:00 +02:00
ru Added tag map, fixed tests fails, added more exceptions 2017-11-26 20:54:48 +03:00
sv Updates to Swedish Language (#3164) 2019-01-16 13:45:50 +01:00
th add thai in spacy2 2017-09-26 21:36:27 +07:00
tr Adds Turkish Lemmatization 2017-12-01 17:04:32 +03:00
tt Add Tatar Language Support (#2444) 2018-06-19 10:17:53 +02:00
uk Ukrainian language added. Small fixes in Russian (#3241) 2019-02-07 21:05:11 +01:00
ur Add Urdu Language Support (#2430) 2018-06-22 11:14:03 +02:00
__init__.py Remove imports in /lang/__init__.py 2017-05-08 23:58:07 +02:00
test_attrs.py added lex test for is_currency 2018-02-11 18:50:50 +01:00