spaCy/website/docs
Adriane Boyd 39ebcd9ec9
Refactor Chinese tokenizer configuration (#5736)
* Refactor Chinese tokenizer configuration

Refactor `ChineseTokenizer` configuration so that it uses a single
`segmenter` setting to choose between character segmentation, jieba, and
pkuseg.

* replace `use_jieba`, `use_pkuseg`, and `require_pkuseg` with a single
`segmenter` setting that supports the values `char`, `jieba`, and `pkuseg`
* make plain character segmentation (`char`) the default segmenter (no
additional libraries required)
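
As a rough illustration of the single-setting design, here is a minimal
sketch in plain Python. The names `SUPPORTED_SEGMENTERS` and `segment` are
hypothetical and are not part of spaCy; only the three values and the `char`
default come from this commit. The `char` branch is a simplified stand-in
that splits on every character, and the `jieba` and `pkuseg` branches are
deliberately left out because they need external libraries.

```python
# Hypothetical sketch, not spaCy's implementation.
SUPPORTED_SEGMENTERS = ("char", "jieba", "pkuseg")


def segment(text, segmenter="char"):
    """Split `text` according to a single `segmenter` setting."""
    if segmenter not in SUPPORTED_SEGMENTERS:
        raise ValueError(
            f"Unknown segmenter {segmenter!r}; expected one of {SUPPORTED_SEGMENTERS}"
        )
    if segmenter == "char":
        # Plain character segmentation: one token per character,
        # no additional libraries required.
        return list(text)
    # jieba and pkuseg depend on third-party packages and are not sketched here.
    raise NotImplementedError(f"{segmenter!r} requires an external library")


tokens = segment("我爱北京")  # uses the default, "char"
```

A single string setting makes the three modes mutually exclusive, which
avoids having to define what a combination of several boolean flags means.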

* Fix Chinese serialization test to use char default

* Warn if attempting to customize a segmenter other than pkuseg

Add a warning if `Chinese.pkuseg_update_user_dict` is called when
another segmenter is selected.
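
The warning behavior could look roughly like the sketch below. The class
`SketchChineseTokenizer` and its `user_dict` attribute are hypothetical
stand-ins, and skipping the update after warning is an assumption; the commit
message only states that a warning is added.

```python
import warnings


class SketchChineseTokenizer:
    """Hypothetical stand-in for illustration; not spaCy's ChineseTokenizer."""

    def __init__(self, segmenter="char"):
        self.segmenter = segmenter
        self.user_dict = set()  # assumed storage for user-supplied words

    def pkuseg_update_user_dict(self, words):
        if self.segmenter != "pkuseg":
            # Another segmenter is selected, so the pkuseg user dictionary
            # would have no effect: warn and (by assumption) skip the update.
            warnings.warn(
                f"Segmenter is {self.segmenter!r}, not 'pkuseg'; "
                "the user dictionary was not updated.",
                UserWarning,
            )
            return
        self.user_dict.update(words)
```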
2020-07-19 13:34:37 +02:00
api Merge pull request #5747 from explosion/feature/refactor-config-args 2020-07-14 00:00:22 +02:00
images Update example and training docs 2020-07-07 20:30:12 +02:00
models Update sidebar menu 2020-07-09 11:44:09 +02:00
usage Refactor Chinese tokenizer configuration (#5736) 2020-07-19 13:34:37 +02:00
index.md 💫 Update website (#3285) 2019-02-17 19:31:19 +01:00
styleguide.md 💫 v2.1.0 launch updates (only merge on launch!) (#3414) 2019-03-18 16:07:26 +01:00