Commit Graph

22 Commits

Author SHA1 Message Date
Stanisław Giziński
1448ad100c Improved polish tokenizer and stop words. (#2974)
* Improved stop words list

* Removed some wrong stop words from list

* Improved stop words list

* Removed some wrong stop words from list

* Improved Polish Tokenizer (#38)

* Add tests for polish tokenizer

* Add polish tokenizer exceptions

* Don't split any words containing hyphens

* Fix test case with wrong model answer

* Remove commented out line of code until better solution is found

* Add source srx' license

* Rename exception_list.py to match spaCy conventionality

* Add a brief explanation of where the exception list comes from

* Add newline after each exception

* Rename COPYING.txt to LICENSE

* Delete old files

* Add header to the license

* Agreements signed

* Stanisław Giziński agreement

* Krzysztof Kowalczyk - signed agreement

* Mateusz Olko agreement

* Add DoomCoder's contributor agreement

* Improve like number checking in polish lang


* like num tests added

* all from SI system added

* Final licence and removed splitting exceptions

* Added polish stop words to LEX_ATTRS

* Add encoding info to pl tokenizer exceptions
2019-02-08 14:27:21 +11:00
tyburam
476472d181 Lex _attrs for polish language (#2750)
* Signed spaCy contributor agreement

* Added polish version of english lex_attrs
2018-09-10 11:53:57 +02:00
Ines Montani
68226109f4 Merge pull request #2142 from jimregan/polish-more-tokens
more exceptions
2018-03-24 19:06:44 +01:00
Jim O'Regan
efe037e8be more exceptions 2018-03-24 00:05:27 +00:00
Jim O'Regan
c3e6cee17a use inan in polimorf tagset conversion 2017-11-29 23:15:47 +00:00
Jim O'Regan
b32575e78c imports 2017-11-29 23:03:41 +00:00
Jim O'Regan
3696ce6a7b add UD mapping 2017-11-29 22:59:19 +00:00
Jim O'Regan
076a6fc60a symbols 2017-11-29 20:11:20 +00:00
Jim O'Regan
834ba3c69a (semi generated) Polimorf mapping 2017-11-29 20:08:24 +00:00
ines
18c859500b Add missing imports 2017-11-01 23:02:51 +01:00
ines
819e30a26e Tidy up tokenizer exceptions 2017-11-01 23:02:45 +01:00
ines
7e424a1804 Don't copy exception dicts if not necessary and tidy up 2017-10-31 21:05:29 +01:00
ines
8ce6f96180 Don't make copies of language data components 2017-10-11 15:34:55 +02:00
ines
0c2343d73a Tidy up language data 2017-10-11 02:22:49 +02:00
ines
1fe5e1a4d1 Add language example sentences (see #1107)
da, de, en, es, fr, he, it, nb, pl, pt, sv
2017-08-19 12:22:29 +02:00
Jim Regan
d81ceb0cd5 Merge branch 'develop' into polish 2017-06-26 22:42:27 +01:00
Jim O'Regan
2f84c73585 a start 2017-06-26 22:40:04 +01:00
Jim O'Regan
28d7f0a672 reference 2017-06-26 22:38:28 +01:00
ines
4c643d74c5 Add norm exceptions to other Language classes 2017-06-03 22:29:21 +02:00
ines
924e8506de Move Defaults subclass to module scope (necessary for pickling) 2017-05-20 19:02:27 +02:00
ines
a4a37a783e Remove import from non-existing module 2017-05-13 16:00:09 +02:00
ines
ca65993d59 Add basic Polish Language class 2017-05-12 09:25:37 +02:00