Mirror of https://github.com/explosion/spaCy.git (synced 2025-10-28 06:31:12 +03:00)
Rework Chinese language initialization

* Create a `ChineseTokenizer` class
* Modify jieba post-processing to handle whitespace correctly
* Modify non-jieba character tokenization to handle whitespace correctly
* Add a `create_tokenizer()` method to `ChineseDefaults`
* Load lexical attributes
* Update Chinese tag_map for UD v2
* Add very basic Chinese tests
* Test tokenization with and without jieba
* Test `like_num` attribute
* Fix `try_jieba_import()`
* Fix zh code formatting
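The whitespace fixes above revolve around one idea: a word segmenter like jieba can emit whitespace runs as segments, but spaCy represents whitespace as per-token `spaces` flags (the `Doc(words=..., spaces=...)` convention) rather than as tokens. A minimal sketch of that post-processing step, with an illustrative `align_tokens` helper standing in for the real `ChineseTokenizer` logic (names and shapes here are assumptions, not spaCy's actual implementation):

```python
def align_tokens(text, words):
    """Drop whitespace-only segments from a segmenter's output and record
    whether each remaining token is followed by a space, so that joining
    the tokens with their space flags reconstructs the original text."""
    out_words = []
    spaces = []
    pos = 0  # position in the original text covered so far
    for word in words:
        if word.isspace():
            # Whitespace segment: mark the preceding token as space-followed
            # instead of emitting a whitespace token.
            if spaces:
                spaces[-1] = True
            pos += len(word)
            continue
        # Segments must tile the original text without gaps or overlaps.
        assert text[pos:pos + len(word)] == word, "segmenter must cover text"
        out_words.append(word)
        spaces.append(False)
        pos += len(word)
    return out_words, spaces

# Example with pre-segmented output, as a jieba-like segmenter might produce:
words, spaces = align_tokens("我爱 北京", ["我", "爱", " ", "北京"])
# words  -> ["我", "爱", "北京"]
# spaces -> [False, True, False]
```

The same flags are what spaCy needs to guarantee `doc.text` round-trips to the input, for both the jieba and the character-by-character code paths mentioned in the commit.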
| File |
|---|
| __init__.py |
| test_text.py |
| test_tokenizer.py |