spaCy

mirror of https://github.com/explosion/spaCy.git synced 2025-03-27 21:34:12 +03:00


Hiroshi Matsuda 150a39ccca Japanese model: add user_dict entries and small refactor (#5573)	2020-06-22 14:32:25 +02:00
* user_dict fields: adding inflections, reading_forms, sub_tokens
  deleting: unidic_tags
  improve code readability around the token alignment procedure
* add test cases, replace fugashi with sudachipy in conftest
* move bunsetu.py to spaCy Universe as a pipeline component BunsetuRecognizer
* tag is space -> both surface and tag are spaces
* consider len(text)==0
ar	Revert #4334	2019-09-29 17:32:12 +02:00
bn	Revert #4334	2019-09-29 17:32:12 +02:00
ca	Revert #4334	2019-09-29 17:32:12 +02:00
da	Reduce stored lexemes data, move feats to lookups (#5238)	2020-05-19 15:59:14 +02:00
de	Tidy up and auto-format	2020-05-21 14:14:01 +02:00
el	Tidy up and auto-format	2020-05-21 14:14:01 +02:00
en	Add missing pronouns/determiners (#5569)	2020-06-10 18:47:04 +02:00
es	Tidy up and auto-format	2020-05-21 14:14:01 +02:00
eu	Add __init__.py to eu and hy tests (#5278)	2020-04-08 20:03:06 +02:00
fa	Fix syntax iterators for Persian (#5437)	2020-05-14 16:51:03 +02:00
fi	add two abbreviations and some additional unit tests (#5040)	2020-02-22 14:12:32 +01:00
fr	Tidy up and auto-format	2020-05-21 14:14:01 +02:00
ga	Revert #4334	2019-09-29 17:32:12 +02:00
gu	Tidy up and auto-format	2020-05-21 14:14:01 +02:00
he	Revert #4334	2019-09-29 17:32:12 +02:00
hu	Tidy up and auto-format	2020-03-25 12:28:12 +01:00
hy	Add missing declaration	2020-05-21 17:30:05 +02:00
id	Tidy up and auto-format	2020-05-21 14:14:01 +02:00
it	Revert #4334	2019-09-29 17:32:12 +02:00
ja	Japanese model: add user_dict entries and small refactor (#5573)	2020-06-22 14:32:25 +02:00
ko	Revert #4334	2019-09-29 17:32:12 +02:00
lb	Reduce stored lexemes data, move feats to lookups (#5238)	2020-05-19 15:59:14 +02:00
lt	Improve Lithuanian tokenization (#5205)	2020-03-25 11:28:12 +01:00
ml	Tidy up and auto-format	2020-05-21 14:14:01 +02:00
nb	Tidy up and auto-format	2020-05-21 14:14:01 +02:00
ne	Add Nepali Language (#5622)	2020-06-22 10:25:46 +02:00
nl	Move lookup tables out of the core library (#4346)	2019-10-01 00:01:27 +02:00
pl	Update Polish tokenizer for UD_Polish-PDB (#5432)	2020-05-19 15:59:55 +02:00
pt	Revert #4334	2019-09-29 17:32:12 +02:00
ro	Move lookup tables out of the core library (#4346)	2019-10-01 00:01:27 +02:00
ru	Revert #4334	2019-09-29 17:32:12 +02:00
sr	Move lookup tables out of the core library (#4346)	2019-10-01 00:01:27 +02:00
sv	Tidy up and auto-format	2020-05-21 14:14:01 +02:00
th	Revert #4334	2019-09-29 17:32:12 +02:00
tr	Move lookup tables out of the core library (#4346)	2019-10-01 00:01:27 +02:00
tt	Add trailing whitespace to multiline test text (#4877)	2020-01-06 14:58:59 +01:00
uk	Revert #4334	2019-09-29 17:32:12 +02:00
ur	Revert #4334	2019-09-29 17:32:12 +02:00
yo	Adding support for Yoruba Language (#4614)	2019-12-21 14:11:50 +01:00
zh	Tidy up and auto-format	2020-05-21 14:14:01 +02:00
__init__.py	Revert #4334	2019-09-29 17:32:12 +02:00
test_attrs.py	Tidy up and auto-format	2019-12-21 19:04:17 +01:00
test_initialize.py	Adding support for Yoruba Language (#4614)	2019-12-21 14:11:50 +01:00