spaCy/spacy/tests/lang
Adriane Boyd 39ebcd9ec9
Refactor Chinese tokenizer configuration (#5736)
* Refactor Chinese tokenizer configuration

Refactor `ChineseTokenizer` configuration so that it uses a single
`segmenter` setting to choose between character segmentation, jieba, and
pkuseg.

* replace `use_jieba`, `use_pkuseg`, and `require_pkuseg` with a single
`segmenter` setting that supports the values `char`, `jieba`, and `pkuseg`
* make plain character segmentation (`char`) the default segmenter (no
additional libraries required)
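
As a rough illustration of what a single `segmenter` setting looks like, here is a standalone sketch that does not use spaCy. Only the setting name and its three values come from the commit message; the function `segment`, the constant `SUPPORTED_SEGMENTERS`, and the decision to skip whitespace are assumptions made for this example.

```python
# Hypothetical sketch, not spaCy's ChineseTokenizer.
SUPPORTED_SEGMENTERS = ("char", "jieba", "pkuseg")


def segment(text, segmenter="char"):
    """Split text according to the chosen segmenter (illustrative only)."""
    if segmenter not in SUPPORTED_SEGMENTERS:
        raise ValueError(
            f"Unknown segmenter {segmenter!r}; expected one of {SUPPORTED_SEGMENTERS}"
        )
    if segmenter == "char":
        # Plain character segmentation: one token per character.
        # Skipping whitespace is a simplification made for this sketch.
        return [ch for ch in text if not ch.isspace()]
    # jieba and pkuseg are optional third-party libraries; this sketch
    # deliberately does not call them.
    raise NotImplementedError(f"{segmenter!r} needs an external library")
```

With the default value, `segment("我爱北京")` yields one token per character and needs no extra dependencies, which is the behaviour the new default is meant to provide.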

* Fix Chinese serialization test to use char default

* Warn if attempting to customize other segmenter

Add a warning if `Chinese.pkuseg_update_user_dict` is called when
another segmenter is selected.
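
The warning described above could be sketched roughly as follows. The class `SketchChineseTokenizer`, the warning text, and the choice to return without updating anything are assumptions for illustration; only the method name `pkuseg_update_user_dict` and the condition (another segmenter is selected) come from the commit message.

```python
import warnings


class SketchChineseTokenizer:
    """Hypothetical stand-in, not spaCy's ChineseTokenizer."""

    def __init__(self, segmenter="char"):
        self.segmenter = segmenter
        self.user_words = set()

    def pkuseg_update_user_dict(self, words):
        # Customizing the pkuseg user dictionary only makes sense when
        # pkuseg is the active segmenter; otherwise warn and do nothing.
        if self.segmenter != "pkuseg":
            warnings.warn(
                "pkuseg_update_user_dict has no effect: "
                f"segmenter is {self.segmenter!r}",
                UserWarning,
            )
            return
        self.user_words.update(words)
```

A warning rather than an exception keeps existing calling code running while still signalling that the customization was ignored.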
2020-07-19 13:34:37 +02:00
ar Tidy up and auto-format 2020-02-18 15:38:18 +01:00
bn Drop Python 2.7 and 3.5 (#4828) 2019-12-22 01:53:56 +01:00
ca Drop Python 2.7 and 3.5 (#4828) 2019-12-22 01:53:56 +01:00
da Merge branch 'develop' into master-tmp 2020-05-21 18:39:06 +02:00
de Remove unicode declarations and tidy up 2020-06-21 22:34:10 +02:00
el Remove unicode declarations and tidy up 2020-06-21 22:34:10 +02:00
en Merge branch 'develop' into master-tmp 2020-06-20 15:52:00 +02:00
es Remove unicode declarations and tidy up 2020-06-21 22:34:10 +02:00
eu Merge branch 'develop' into master-tmp 2020-05-21 18:39:06 +02:00
fa Remove unicode declarations and tidy up 2020-06-21 22:34:10 +02:00
fi Merge branch 'master' into tmp/sync 2020-03-26 13:38:14 +01:00
fr Remove unicode declarations and tidy up 2020-06-21 22:34:10 +02:00
ga Drop Python 2.7 and 3.5 (#4828) 2019-12-22 01:53:56 +01:00
gu Remove unicode declarations and tidy up 2020-06-21 22:34:10 +02:00
he Drop Python 2.7 and 3.5 (#4828) 2019-12-22 01:53:56 +01:00
hu Tidy up and auto-format 2020-03-25 12:28:12 +01:00
hy Remove unicode declarations and tidy up 2020-06-21 22:34:10 +02:00
id Remove unicode declarations and tidy up 2020-06-21 22:34:10 +02:00
it Drop Python 2.7 and 3.5 (#4828) 2019-12-22 01:53:56 +01:00
ja Tidy up and auto-format 2020-06-21 22:38:04 +02:00
ko Drop Python 2.7 and 3.5 (#4828) 2019-12-22 01:53:56 +01:00
lb Merge branch 'develop' into master-tmp 2020-05-21 18:39:06 +02:00
lt Merge branch 'master' into tmp/sync 2020-03-26 13:38:14 +01:00
ml Remove unicode declarations and tidy up 2020-06-21 22:34:10 +02:00
nb Remove unicode declarations and tidy up 2020-06-21 22:34:10 +02:00
nl Drop Python 2.7 and 3.5 (#4828) 2019-12-22 01:53:56 +01:00
pl Merge branch 'develop' into master-tmp 2020-05-21 18:39:06 +02:00
pt Drop Python 2.7 and 3.5 (#4828) 2019-12-22 01:53:56 +01:00
ro Drop Python 2.7 and 3.5 (#4828) 2019-12-22 01:53:56 +01:00
ru Un-xfail passing tests 2019-12-25 18:02:20 +01:00
sr Un-xfail passing tests 2019-12-25 18:02:20 +01:00
sv Remove unicode declarations and tidy up 2020-06-21 22:34:10 +02:00
th Drop Python 2.7 and 3.5 (#4828) 2019-12-22 01:53:56 +01:00
tr Move lookup tables out of the core library (#4346) 2019-10-01 00:01:27 +02:00
tt Merge branch 'master' into develop 2020-02-18 14:47:23 +01:00
uk Drop Python 2.7 and 3.5 (#4828) 2019-12-22 01:53:56 +01:00
ur Drop Python 2.7 and 3.5 (#4828) 2019-12-22 01:53:56 +01:00
yo Drop Python 2.7 and 3.5 (#4828) 2019-12-22 01:53:56 +01:00
zh Refactor Chinese tokenizer configuration (#5736) 2020-07-19 13:34:37 +02:00
__init__.py Revert #4334 2019-09-29 17:32:12 +02:00
test_attrs.py Drop Python 2.7 and 3.5 (#4828) 2019-12-22 01:53:56 +01:00
test_initialize.py Drop Python 2.7 and 3.5 (#4828) 2019-12-22 01:53:56 +01:00