spaCy

mirror of https://github.com/explosion/spaCy.git synced 2025-10-04 02:46:40 +03:00

Adriane Boyd 39ebcd9ec9 Refactor Chinese tokenizer configuration (#5736)	2020-07-19 13:34:37 +02:00

* Refactor Chinese tokenizer configuration

  Refactor `ChineseTokenizer` configuration so that it uses a single `segmenter` setting to choose between character segmentation, jieba, and pkuseg.

  * replace `use_jieba`, `use_pkuseg`, `require_pkuseg` with the setting `segmenter` with the supported values: `char`, `jieba`, `pkuseg`
  * make the default segmenter plain character segmentation `char` (no additional libraries required)

* Fix Chinese serialization test to use char default

* Warn if attempting to customize other segmenter

  Add a warning if `Chinese.pkuseg_update_user_dict` is called when another segmenter is selected.
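
The idea of a single `segmenter` setting replacing several boolean flags can be sketched as below. This is a minimal illustration, not spaCy's actual `ChineseTokenizer` code: the helper `segment` and the constant `SUPPORTED_SEGMENTERS` are hypothetical names, dropping whitespace in `char` mode is a simplification made here, and the `jieba` and `pkuseg` branches assume those optional packages are installed.

```python
from typing import List

# The three values named in the commit message.
SUPPORTED_SEGMENTERS = ("char", "jieba", "pkuseg")


def segment(text: str, segmenter: str = "char") -> List[str]:
    """Hypothetical helper: split `text` according to one `segmenter` setting."""
    if segmenter not in SUPPORTED_SEGMENTERS:
        raise ValueError(
            f"Unknown segmenter {segmenter!r}; expected one of {SUPPORTED_SEGMENTERS}"
        )
    if segmenter == "char":
        # Plain character segmentation: no additional libraries required.
        # Assumption for this sketch: whitespace characters are dropped.
        return [ch for ch in text if not ch.isspace()]
    if segmenter == "jieba":
        import jieba  # optional dependency, imported only when selected

        return [word for word in jieba.cut(text, cut_all=False) if word.strip()]
    # Remaining case: segmenter == "pkuseg"
    import pkuseg  # optional dependency, imported only when selected

    return list(pkuseg.pkuseg().cut(text))


print(segment("我爱北京"))  # → ['我', '爱', '北', '京']
```

Because the optional libraries are imported only inside their own branch, the default `char` mode runs with the standard library alone, which mirrors the motivation the commit gives for making `char` the default.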
__init__.py	Refactor Chinese tokenizer configuration (#5736)	2020-07-19 13:34:37 +02:00
examples.py	Tidy up and auto-format	2020-02-18 15:38:18 +01:00
lex_attrs.py	Drop Python 2.7 and 3.5 (#4828)	2019-12-22 01:53:56 +01:00
stop_words.py	Drop Python 2.7 and 3.5 (#4828)	2019-12-22 01:53:56 +01:00
tag_map.py	Merge branch 'develop' into master-tmp	2020-06-20 15:52:00 +02:00