Mirror of https://github.com/explosion/spaCy.git, synced 2025-10-24 20:51:30 +03:00
Refactor Chinese tokenizer configuration

Refactor the `ChineseTokenizer` configuration so that it uses a single `segmenter` setting to choose between character segmentation, jieba, and pkuseg.

* Replace `use_jieba`, `use_pkuseg`, and `require_pkuseg` with a single `segmenter` setting with the supported values `char`, `jieba`, and `pkuseg`.
* Make plain character segmentation (`char`) the default segmenter, since it requires no additional libraries.
* Fix the Chinese serialization test to use the `char` default.
* Warn if attempting to customize another segmenter: a warning is now raised if `Chinese.pkuseg_update_user_dict` is called when a segmenter other than pkuseg is selected.
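The single-setting dispatch described above can be sketched in plain Python. This is an illustrative standalone sketch, not spaCy's actual implementation: the class name, method names, and error handling here are assumptions made for the example.

```python
import warnings


class ChineseTokenizerSketch:
    """Illustrative sketch of dispatching on a single `segmenter` setting
    (not spaCy's real ChineseTokenizer)."""

    SUPPORTED = ("char", "jieba", "pkuseg")

    def __init__(self, segmenter="char"):
        # One setting replaces the old use_jieba/use_pkuseg/require_pkuseg flags.
        if segmenter not in self.SUPPORTED:
            raise ValueError(f"segmenter must be one of {self.SUPPORTED}")
        self.segmenter = segmenter

    def segment(self, text):
        if self.segmenter == "char":
            # Default: plain character segmentation, no extra libraries needed.
            return list(text)
        # In the real tokenizer, jieba or pkuseg would be loaded and called here.
        raise NotImplementedError(
            f"{self.segmenter!r} backend is not wired up in this sketch"
        )

    def pkuseg_update_user_dict(self, words):
        # Mirrors the new behavior: warn when customizing a non-pkuseg segmenter.
        if self.segmenter != "pkuseg":
            warnings.warn(
                "pkuseg_update_user_dict is only supported when segmenter='pkuseg'"
            )
            return
        # The real implementation would forward `words` to pkuseg's user dict.
```

With the default `char` segmenter, `ChineseTokenizerSketch().segment(text)` simply splits the input into individual characters, which is why no third-party segmentation library is required out of the box.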
| Name |
|---|
| 101 |
| _benchmarks-choi.md |
| facts-figures.md |
| index.md |
| linguistic-features.md |
| models.md |
| processing-pipelines.md |
| projects.md |
| rule-based-matching.md |
| saving-loading.md |
| spacy-101.md |
| training.md |
| v2-1.md |
| v2-2.md |
| v2-3.md |
| v2.md |
| v3.md |
| vectors-embeddings.md |
| visualizers.md |