Mirror of https://github.com/explosion/spaCy.git (synced 2025-10-28 06:31:12 +03:00)
Rework Chinese language initialization

* Create a `ChineseTokenizer` class
* Modify jieba post-processing to handle whitespace correctly
* Modify non-jieba character tokenization to handle whitespace correctly
* Add a `create_tokenizer()` method to `ChineseDefaults`
* Load lexical attributes
* Update Chinese tag_map for UD v2
* Add very basic Chinese tests
* Test tokenization with and without jieba
* Test `like_num` attribute
* Fix `try_jieba_import()`
* Fix zh code formatting
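The whitespace fixes above revolve around one idea: a word segmenter like jieba can emit whitespace runs as segments, but spaCy represents whitespace as per-token `spaces` flags (the `Doc(words=..., spaces=...)` convention) rather than as tokens. A minimal sketch of that post-processing step, with an illustrative `align_tokens` helper standing in for the real `ChineseTokenizer` logic (names and shapes here are assumptions, not spaCy's actual implementation):

```python
def align_tokens(text, words):
    """Drop whitespace-only segments from a segmenter's output and record
    whether each remaining token is followed by a space, so that joining
    the tokens with their space flags reconstructs the original text."""
    out_words = []
    spaces = []
    pos = 0  # position in the original text covered so far
    for word in words:
        if word.isspace():
            # Whitespace segment: mark the preceding token as space-followed
            # instead of emitting a whitespace token.
            if spaces:
                spaces[-1] = True
            pos += len(word)
            continue
        # Segments must tile the original text without gaps or overlaps.
        assert text[pos:pos + len(word)] == word, "segmenter must cover text"
        out_words.append(word)
        spaces.append(False)
        pos += len(word)
    return out_words, spaces

# Example with pre-segmented output, as a jieba-like segmenter might produce:
words, spaces = align_tokens("我爱 北京", ["我", "爱", " ", "北京"])
# words  -> ["我", "爱", "北京"]
# spaces -> [False, True, False]
```

The same flags are what spaCy needs to guarantee `doc.text` round-trips to the input, for both the jieba and the character-by-character code paths mentioned in the commit.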
| File |
|---|
| __init__.py |
| test_text.py |
| test_tokenizer.py |