Mirror of https://github.com/explosion/spaCy.git
Synced 2024-11-11 12:18:04 +03:00

Commit 0b9a5f4074
Rework Chinese language initialization

* Create a `ChineseTokenizer` class
* Modify jieba post-processing to handle whitespace correctly
* Modify non-jieba character tokenization to handle whitespace correctly
* Add a `create_tokenizer()` method to `ChineseDefaults`
* Load lexical attributes
* Update Chinese tag_map for UD v2
* Add very basic Chinese tests
* Test tokenization with and without jieba
* Test `like_num` attribute
* Fix `try_jieba_import()`
* Fix zh code formatting
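The whitespace handling mentioned in the commit message is the easiest part to illustrate. Below is a minimal, hypothetical sketch, not the `ChineseTokenizer` added in this commit: it shows how a jieba-backed tokenizer can build a spaCy `Doc` while tracking trailing whitespace so that `doc.text` round-trips the input. The class name and the single-space assumption are my own.

```python
# Hypothetical sketch only -- not the ChineseTokenizer added in this commit.
# Assumes jieba is installed and that runs of whitespace are single spaces.
import jieba
from spacy.tokens import Doc
from spacy.vocab import Vocab


class JiebaSketchTokenizer:
    def __init__(self, vocab: Vocab):
        self.vocab = vocab

    def __call__(self, text: str) -> Doc:
        words, spaces = [], []
        for segment in jieba.cut(text):
            if segment.isspace():
                # jieba yields whitespace as its own segment; fold it into
                # the previous token's trailing-space flag so doc.text
                # reproduces the input (leading whitespace is dropped here).
                if spaces:
                    spaces[-1] = True
                continue
            words.append(segment)
            spaces.append(False)  # Chinese tokens normally have no space after
        return Doc(self.vocab, words=words, spaces=spaces)


tokenizer = JiebaSketchTokenizer(Vocab())
doc = tokenizer("我喜欢 自然语言处理")
print([token.text for token in doc], repr(doc.text))
```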
Directory contents:

* __init__.py
* test_text.py
* test_tokenizer.py
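For the test files listed above, a plausible shape for the `like_num` check described in the commit message might look like the sketch below; the test name, parametrized tokens, and expected values are my assumptions, not the actual contents of test_text.py.

```python
# Hypothetical sketch of a like_num test; not the actual spaCy test code.
import pytest
from spacy.lang.zh import Chinese


@pytest.mark.parametrize("text,match", [("十", True), ("狗", False)])
def test_zh_like_num(text, match):
    nlp = Chinese()  # per the commit message, falls back to character
    doc = nlp(text)  # tokenization when jieba is unavailable
    assert doc[0].like_num == match
```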