spaCy/spacy/lang/pl
Stanisław Giziński 1448ad100c Improved Polish tokenizer and stop words. (#2974)
* Improved stop words list

* Removed some wrong stop words from list

* Improved stop words list

* Removed some wrong stop words from list

* Improved Polish Tokenizer (#38)

* Add tests for Polish tokenizer

* Add Polish tokenizer exceptions

* Don't split any words containing hyphens

* Fix test case with wrong model answer

* Remove commented out line of code until better solution is found

* Add the source SRX license

* Rename exception_list.py to match spaCy conventions

* Add a brief explanation of where the exception list comes from

* Add newline after each exception

* Rename COPYING.txt to LICENSE

* Delete old files

* Add header to the license

* Agreements signed

* Stanisław Giziński agreement

* Krzysztof Kowalczyk - signed agreement

* Mateusz Olko agreement

* Add DoomCoder's contributor agreement

* Improve like-number checking in Polish

* like num tests added

* all from SI system added

* Final licence and removed splitting exceptions

* Added Polish stop words to LEX_ATTRS

* Add encoding info to pl tokenizer exceptions
2019-02-08 14:27:21 +11:00
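The like-number change above follows the pattern that spaCy's per-language lex_attrs.py modules generally use: a `like_num(text)` function exposed to the tokenizer as `LEX_ATTRS = {LIKE_NUM: like_num}`. What follows is a minimal standalone sketch of such a check, assuming that pattern. It does not import spaCy, and `_NUM_WORDS` is a hypothetical, illustrative subset of Polish numerals, not the list added in this commit.

```python
# Hypothetical sketch of a Polish like_num check, modelled on the usual
# spaCy lex_attrs.py pattern. _NUM_WORDS is an illustrative subset only,
# not the word list shipped in spacy/lang/pl/lex_attrs.py.
_NUM_WORDS = {
    "zero", "jeden", "dwa", "trzy", "cztery", "pięć", "sześć", "siedem",
    "osiem", "dziewięć", "dziesięć", "sto", "tysiąc", "milion", "miliard",
}


def like_num(text: str) -> bool:
    """Return True if the token text looks like a number."""
    # Drop thousands/decimal separators, so "1.000,5" becomes "10005".
    cleaned = text.replace(",", "").replace(".", "")
    if cleaned.isdigit():
        return True
    # Simple fractions such as "3/4".
    if cleaned.count("/") == 1:
        num, denom = cleaned.split("/")
        if num.isdigit() and denom.isdigit():
            return True
    # Spelled-out numerals, compared case-insensitively.
    return text.lower() in _NUM_WORDS


if __name__ == "__main__":
    for word in ["42", "1.000,5", "3/4", "Pięć", "kot"]:
        print(word, like_num(word))
```

In spaCy itself the function would be registered through `LEX_ATTRS` rather than called directly, so that `token.like_num` picks it up.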
__init__.py Improved polish tokenizer and stop words. (#2974) 2019-02-08 14:27:21 +11:00
_tokenizer_exceptions_list.py Improved polish tokenizer and stop words. (#2974) 2019-02-08 14:27:21 +11:00
examples.py Add language example sentences (see #1107) 2017-08-19 12:22:29 +02:00
lex_attrs.py Improved polish tokenizer and stop words. (#2974) 2019-02-08 14:27:21 +11:00
polish_srx_rules_LICENSE.txt Improved polish tokenizer and stop words. (#2974) 2019-02-08 14:27:21 +11:00
punctuation.py Improved polish tokenizer and stop words. (#2974) 2019-02-08 14:27:21 +11:00
stop_words.py Improved polish tokenizer and stop words. (#2974) 2019-02-08 14:27:21 +11:00
tokenizer_exceptions.py Improved polish tokenizer and stop words. (#2974) 2019-02-08 14:27:21 +11:00
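The commit keeps abbreviations from the SRX-derived exception list (_tokenizer_exceptions_list.py, loaded by tokenizer_exceptions.py) as single tokens and stops splitting words that contain hyphens. The sketch below illustrates both behaviours with a toy tokenizer that loosely imitates a special-case/prefix/suffix loop. It is an assumption-laden simplification, not spaCy's tokenizer: it has no infix handling at all, and `_EXCEPTIONS`, `_PREFIXES`, `_SUFFIXES` and `tokenize` are hypothetical names with illustrative contents.

```python
from typing import List

# Hypothetical sample of Polish abbreviations that must keep their period.
# The real, much longer list lives in _tokenizer_exceptions_list.py.
_EXCEPTIONS = {"np.", "tzn.", "itd.", "itp.", "ul.", "ok."}

# Single characters peeled off the start/end of a whitespace chunk.
# "-" is deliberately absent, so hyphenated words stay whole.
_PREFIXES = set("(\"'„«")
_SUFFIXES = set(")\"'”»,.;:!?")


def tokenize(text: str) -> List[str]:
    tokens: List[str] = []
    for chunk in text.split():
        suffixes: List[str] = []
        while chunk:
            # Special cases are checked before each split, so "np.," still
            # yields "np." once the trailing comma has been removed.
            if chunk.lower() in _EXCEPTIONS:
                break
            if chunk[0] in _PREFIXES:
                tokens.append(chunk[0])
                chunk = chunk[1:]
                continue
            if chunk[-1] in _SUFFIXES:
                suffixes.insert(0, chunk[-1])
                chunk = chunk[:-1]
                continue
            break
        if chunk:
            tokens.append(chunk)
        tokens.extend(suffixes)
    return tokens


if __name__ == "__main__":
    print(tokenize("Kupiłem np. jabłka, gruszki itd."))
    print(tokenize("Spotkanie polsko-niemieckie („ważne”)."))
```

Checking exceptions inside the loop, rather than once per chunk, is what lets an abbreviation survive when other punctuation is attached to it.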