Mirror of https://github.com/explosion/spaCy.git (synced 2024-11-11 04:08:09 +03:00)
Commit 0b9a5f4074
Rework Chinese language initialization

* Create a `ChineseTokenizer` class
* Modify jieba post-processing to handle whitespace correctly
* Modify non-jieba character tokenization to handle whitespace correctly
* Add a `create_tokenizer()` method to `ChineseDefaults`
* Load lexical attributes
* Update Chinese tag_map for UD v2
* Add very basic Chinese tests
* Test tokenization with and without jieba
* Test `like_num` attribute
* Fix try_jieba_import()
* Fix zh code formatting
Files in this directory:

* `__init__.py`
* `examples.py`
* `lex_attrs.py`
* `stop_words.py`
* `tag_map.py`
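
The commit message above names a `ChineseTokenizer` class, jieba-based segmentation with whitespace post-processing, a character-by-character fallback, a `create_tokenizer()` factory on `ChineseDefaults`, and a `try_jieba_import()` helper. The sketch below is a rough illustration of how those pieces could fit together; it is not the actual spaCy code at this commit, and the whitespace handling, error message, and class wiring are assumptions.

```python
# Minimal sketch only; the real spaCy implementation at commit 0b9a5f4074
# may differ in structure and details.
from spacy.tokens import Doc


def try_jieba_import():
    """Import jieba lazily, with a readable error if it is not installed."""
    try:
        import jieba
        return jieba
    except ImportError:
        raise ImportError(
            "jieba is required for word segmentation; install it with "
            "`pip install jieba` or fall back to character tokenization"
        )


class ChineseTokenizer:
    def __init__(self, vocab, use_jieba=True):
        self.vocab = vocab
        self.use_jieba = use_jieba
        self.jieba_seg = try_jieba_import() if use_jieba else None

    def __call__(self, text):
        if self.use_jieba:
            # jieba emits whitespace as its own segments; post-process so a
            # space becomes the `spaces` flag of the preceding token instead.
            segments = [s for s in self.jieba_seg.cut(text, cut_all=False) if s]
        else:
            # Non-jieba fallback: one token per character, same whitespace rule.
            segments = list(text)
        words, spaces = self._attach_whitespace(segments)
        return Doc(self.vocab, words=words, spaces=spaces)

    @staticmethod
    def _attach_whitespace(segments):
        words, spaces = [], []
        for seg in segments:
            if seg.isspace():
                # Record trailing whitespace on the previous token rather than
                # creating a standalone whitespace token.
                if spaces:
                    spaces[-1] = True
                continue
            words.append(seg)
            spaces.append(False)
        return words, spaces
```

Under spaCy v2-style language data, such a tokenizer would typically be returned from a `create_tokenizer()` classmethod on `ChineseDefaults` (for example `return ChineseTokenizer(nlp.vocab)`), which matches the wiring the commit message describes; the exact signature shown here is an assumption.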