spaCy/spacy/tests/lang/ja
Hiroshi Matsuda 150a39ccca
Japanese model: add user_dict entries and small refactor (#5573)
* user_dict fields: adding inflections, reading_forms, sub_tokens
  deleting: unidic_tags
  improve code readability around the token alignment procedure

* add test cases, replace fugashi with sudachipy in conftest

* move bunsetu.py to spaCy Universe as a pipeline component BunsetuRecognizer

* tag is space -> both surface and tag are spaces

* consider len(text)==0
2020-06-22 14:32:25 +02:00
__init__.py            Revert #4334                                                      2019-09-29 17:32:12 +02:00
test_lemmatization.py  Add Japanese Model (#5544)                                        2020-06-04 19:15:43 +02:00
test_serialize.py      Update Japanese tokenizer config and add serialization (#5562)    2020-06-08 16:29:05 +02:00
test_tokenizer.py      Japanese model: add user_dict entries and small refactor (#5573)  2020-06-22 14:32:25 +02:00