spaCy/spacy/tests/lang
Rokas Ramanauskas 61ce126d4c Lithuanian language support (#3895)
* initial LT lang support

* Added more stopwords. Started setting up some basic test environment (not complete)

* Initial morph rules for LT lang

* Closes #1 Adds tokenizer exceptions for Lithuanian

* Closes #5 Punctuation rules. Closes #6 Lexical Attributes

* test: add native examples to basic tests

* feat: add tag map for lt lang

* fix: remove undefined tag attribute 'Definite'

* feat: add lemmatizer for lt lang

* refactor: add new instances to lt lang morph rules; use tags from tag map

* refactor: add morph rules to lt lang defaults

* refactor: only keep nouns, verbs, adverbs and adjectives in lt lang lemmatizer lookup

* refactor: add capitalized words to lt lang lemmatizer

* refactor: add more num words to lt lang lex attrs

* refactor: update lt lang stop word set

* refactor: add new instances to lt lang tokenizer exceptions

* refactor: remove comments from lt lang init file

* refactor: use function instead of lambda in lt lex lang getter

* refactor: remove conversion to dict in lt init when dict is already provided

* chore: rename lt 'test_basic' to 'test_text'

* feat: add more lt text tests

* feat: add lemmatizer tests

* refactor: remove unused imports, add newline to end of file

* chore: add contributor agreement

* chore: change 'en' to 'lt' in lt example description

* fix: add missing encoding info

* style: add newline to end of file

* refactor: use python2 compatible syntax

* style: reformat code using black
2019-07-08 10:25:22 +02:00
ar Tidy up and format remaining files 2018-11-30 17:43:08 +01:00
bn 💫 Port master changes over to develop (#2979) 2018-11-29 16:30:29 +01:00
ca Improve Italian & Urdu tokenization accuracy (#3228) 2019-02-04 22:39:25 +01:00
da 💫 Tidy up and auto-format tests (#2967) 2018-11-27 01:09:36 +01:00
de 💫 Port master changes over to develop (#2979) 2018-11-29 16:30:29 +01:00
el 💫 Tidy up and auto-format tests (#2967) 2018-11-27 01:09:36 +01:00
en Fix/irreg adverbs extension (#3499) 2019-03-28 13:23:33 +01:00
es 💫 Tidy up and auto-format tests (#2967) 2018-11-27 01:09:36 +01:00
fi 💫 Tidy up and auto-format tests (#2967) 2018-11-27 01:09:36 +01:00
fr Clean up of char classes, few tokenizer fixes and faster default French tokenizer (#3293) 2019-02-20 22:10:13 +01:00
ga 💫 Tidy up and auto-format tests (#2967) 2018-11-27 01:09:36 +01:00
he 💫 Tidy up and auto-format tests (#2967) 2018-11-27 01:09:36 +01:00
hu Tidy up and format remaining files 2018-11-30 17:43:08 +01:00
id 💫 Tidy up and auto-format tests (#2967) 2018-11-27 01:09:36 +01:00
it Improve Italian & Urdu tokenization accuracy (#3228) 2019-02-04 22:39:25 +01:00
ja Tags are joined with a comma and padded with asterisks (#3491) 2019-03-28 16:17:31 +01:00
lt Lithuanian language support (#3895) 2019-07-08 10:25:22 +02:00
nb 💫 Tidy up and auto-format tests (#2967) 2018-11-27 01:09:36 +01:00
nl Improved Dutch language resources and Dutch lemmatization (#3409) 2019-04-03 14:13:26 +02:00
pl Tidy up and fix small bugs and typos 2019-02-08 14:14:49 +01:00
pt 💫 Tidy up and auto-format tests (#2967) 2018-11-27 01:09:36 +01:00
ro 💫 Tidy up and auto-format tests (#2967) 2018-11-27 01:09:36 +01:00
ru Replacing regex library with re to increase tokenization speed (#3218) 2019-02-01 18:05:22 +11:00
sv Tidy up and fix small bugs and typos 2019-02-08 14:14:49 +01:00
th 💫 Tidy up and auto-format tests (#2967) 2018-11-27 01:09:36 +01:00
tr 💫 Tidy up and auto-format tests (#2967) 2018-11-27 01:09:36 +01:00
tt 💫 Tidy up and auto-format tests (#2967) 2018-11-27 01:09:36 +01:00
uk Merge branch 'master' into develop 2019-02-25 15:54:55 +01:00
ur Improve Italian & Urdu tokenization accuracy (#3228) 2019-02-04 22:39:25 +01:00
__init__.py Remove imports in /lang/__init__.py 2017-05-08 23:58:07 +02:00
test_attrs.py 💫 Tidy up and auto-format tests (#2967) 2018-11-27 01:09:36 +01:00
test_initialize.py Fix noqa [ci skip] 2019-03-07 12:25:00 +01:00