spaCy/spacy/lang
Matthew Honnibal 04395ffa49 Bring English tag_map in line with UD Treebank
I wrote a small script to read the UD English training data and check
that our tag map and morph rules were resulting in the best POS map.
This hadn't been done for some time, and there have been various changes
to the UD schema since it was last done. After these changes we should
see much better agreement between our POS assignments and the UD POS
tags.
2019-03-21 13:53:44 +01:00
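
The script mentioned in the commit message is not part of this listing. As a rough illustration of the kind of check it describes, the sketch below reads CoNLL-U text, finds the most frequent UD POS tag for each fine-grained tag, and reports where a tag map disagrees. The function names, the toy sample, and the simplified string-keyed tag map (`{"TAG": {"pos": "UPOS"}}`) are all assumptions made for this example, not spaCy's actual code or data.

```python
from collections import Counter, defaultdict


def read_conllu_tags(text):
    """Yield (xpos, upos) pairs from CoNLL-U formatted text."""
    for line in text.splitlines():
        line = line.strip()
        # Skip blank lines (sentence breaks) and comment lines.
        if not line or line.startswith("#"):
            continue
        cols = line.split("\t")
        # Skip malformed rows, multiword-token ranges ("1-2") and empty nodes ("1.1").
        if len(cols) != 10 or not cols[0].isdigit():
            continue
        # CoNLL-U columns: ID FORM LEMMA UPOS XPOS FEATS HEAD DEPREL DEPS MISC
        yield cols[4], cols[3]


def best_pos_map(pairs):
    """Map each fine-grained tag to the UD POS it co-occurs with most often."""
    counts = defaultdict(Counter)
    for xpos, upos in pairs:
        counts[xpos][upos] += 1
    return {xpos: c.most_common(1)[0][0] for xpos, c in counts.items()}


def find_disagreements(tag_map, pairs):
    """Return {tag: (mapped_pos, majority_pos)} where the tag map and data differ."""
    disagreements = {}
    for tag, majority in best_pos_map(pairs).items():
        mapped = tag_map.get(tag, {}).get("pos")
        if mapped != majority:
            disagreements[tag] = (mapped, majority)
    return disagreements


# Toy sample invented for this example; real input would be a UD training file.
SAMPLE = "\n".join([
    "# text = went to school to learn",
    "1\twent\tgo\tVERB\tVBD\t_\t0\troot\t_\t_",
    "2\tto\tto\tADP\tTO\t_\t3\tcase\t_\t_",
    "3\tschool\tschool\tNOUN\tNN\t_\t1\tobl\t_\t_",
    "4\tto\tto\tPART\tTO\t_\t5\tmark\t_\t_",
    "5\tlearn\tlearn\tVERB\tVB\t_\t1\tadvcl\t_\t_",
    "",
    "# text = to school",
    "1\tto\tto\tADP\tTO\t_\t2\tcase\t_\t_",
    "2\tschool\tschool\tNOUN\tNN\t_\t0\troot\t_\t_",
])

# Hypothetical, simplified tag map; not the contents of en/tag_map.py.
TOY_TAG_MAP = {
    "VBD": {"pos": "VERB"},
    "VB": {"pos": "VERB"},
    "NN": {"pos": "NOUN"},
    "TO": {"pos": "PART"},
}

if __name__ == "__main__":
    for tag, (mapped, majority) in find_disagreements(
        TOY_TAG_MAP, read_conllu_tags(SAMPLE)
    ).items():
        print(f"{tag}: tag map says {mapped}, data majority is {majority}")
```

A majority vote is only one possible definition of the "best" mapping; tags whose UD POS depends on context would still need morph rules or similar per-token handling.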
..
af 💫 Add base Language classes for more languages (#3276) 2019-02-15 01:31:19 +11:00
ar Add writing_system to ArabicDefaults (experimental) 2019-03-11 14:22:23 +01:00
bg 💫 Add base Language classes for more languages (#3276) 2019-02-15 01:31:19 +11:00
bn Merge branch 'master' into develop 2019-02-18 10:03:32 +01:00
ca Merge branch 'master' into develop 2019-02-17 17:51:17 +01:00
cs 💫 Add base Language classes for more languages (#3276) 2019-02-15 01:31:19 +11:00
da Replacing regex library with re to increase tokenization speed (#3218) 2019-02-01 18:05:22 +11:00
de Replacing regex library with re to increase tokenization speed (#3218) 2019-02-01 18:05:22 +11:00
el Replacing regex library with re to increase tokenization speed (#3218) 2019-02-01 18:05:22 +11:00
en Bring English tag_map in line with UD Treebank 2019-03-21 13:53:44 +01:00
es 💫 Tidy up and auto-format .py files (#2983) 2018-11-30 17:03:03 +01:00
et 💫 Add base Language classes for more languages (#3276) 2019-02-15 01:31:19 +11:00
fa Add support for vocab.writing_system property (#3390) 2019-03-11 15:23:20 +01:00
fi 💫 Tidy up and auto-format .py files (#2983) 2018-11-30 17:03:03 +01:00
fr Tiny correction in french lookup dictionary (#3427) 2019-03-19 13:00:19 +01:00
ga 💫 Tidy up and auto-format .py files (#2983) 2018-11-30 17:03:03 +01:00
he Auto-format [ci skip] 2019-03-11 17:10:50 +01:00
hi 💫 Tidy up and auto-format .py files (#2983) 2018-11-30 17:03:03 +01:00
hr 💫 Tidy up and auto-format .py files (#2983) 2018-11-30 17:03:03 +01:00
hu Fix regex deprecation warnings 2019-02-21 11:56:47 +01:00
id Replacing regex library with re to increase tokenization speed (#3218) 2019-02-01 18:05:22 +11:00
is 💫 Add base Language classes for more languages (#3276) 2019-02-15 01:31:19 +11:00
it Clean up of char classes, few tokenizer fixes and faster default French tokenizer (#3293) 2019-02-20 22:10:13 +01:00
ja Don't set extension attribute in Japanese (closes #3398) 2019-03-12 13:30:33 +01:00
kn Fix typo 2019-02-14 12:26:56 +01:00
lt 💫 Add base Language classes for more languages (#3276) 2019-02-15 01:31:19 +11:00
lv 💫 Add base Language classes for more languages (#3276) 2019-02-15 01:31:19 +11:00
nb Replacing regex library with re to increase tokenization speed (#3218) 2019-02-01 18:05:22 +11:00
nl 💫 Tidy up and auto-format .py files (#2983) 2018-11-30 17:03:03 +01:00
pl Tidy up and fix small bugs and typos 2019-02-08 14:14:49 +01:00
pt 💫 Tidy up and auto-format .py files (#2983) 2018-11-30 17:03:03 +01:00
ro 💫 Tidy up and auto-format .py files (#2983) 2018-11-30 17:03:03 +01:00
ru Merge branch 'master' into develop 2019-02-25 15:54:55 +01:00
si 💫 Tidy up and auto-format .py files (#2983) 2018-11-30 17:03:03 +01:00
sk 💫 Add base Language classes for more languages (#3276) 2019-02-15 01:31:19 +11:00
sl 💫 Add base Language classes for more languages (#3276) 2019-02-15 01:31:19 +11:00
sq 💫 Add base Language classes for more languages (#3276) 2019-02-15 01:31:19 +11:00
sv Tidy up and fix small bugs and typos 2019-02-08 14:14:49 +01:00
ta Remove stray print statement (closes #3342) 2019-02-27 15:35:04 +01:00
te 💫 Tidy up and auto-format .py files (#2983) 2018-11-30 17:03:03 +01:00
th Merge branch 'master' into develop 2019-02-07 20:54:07 +01:00
tl Tidy up and fix small bugs and typos 2019-02-08 14:14:49 +01:00
tr Merge branch 'master' into develop 2019-02-07 20:54:07 +01:00
tt Replacing regex library with re to increase tokenization speed (#3218) 2019-02-01 18:05:22 +11:00
uk Add missing " (closes #3343) 2019-02-27 16:37:03 +01:00
ur Improve Italian & Urdu tokenization accuracy (#3228) 2019-02-04 22:39:25 +01:00
vi 💫 Tidy up and auto-format .py files (#2983) 2018-11-30 17:03:03 +01:00
xx 💫 Tidy up and auto-format .py files (#2983) 2018-11-30 17:03:03 +01:00
zh Auto-format [ci skip] 2019-03-11 17:10:50 +01:00
__init__.py Remove imports in /lang/__init__.py 2017-05-08 23:58:07 +02:00
char_classes.py Revert hyphens 2019-03-09 12:51:53 +01:00
lex_attrs.py Replacing regex library with re to increase tokenization speed (#3218) 2019-02-01 18:05:22 +11:00
norm_exceptions.py 💫 Tidy up and auto-format .py files (#2983) 2018-11-30 17:03:03 +01:00
punctuation.py Clean up of char classes, few tokenizer fixes and faster default French tokenizer (#3293) 2019-02-20 22:10:13 +01:00
tag_map.py 💫 Tidy up and auto-format .py files (#2983) 2018-11-30 17:03:03 +01:00
tokenizer_exceptions.py Tidy up and fix small bugs and typos 2019-02-08 14:14:49 +01:00