spaCy/spacy
Adriane Boyd 39ebcd9ec9
Refactor Chinese tokenizer configuration (#5736)
* Refactor Chinese tokenizer configuration

Refactor `ChineseTokenizer` configuration so that it uses a single
`segmenter` setting to choose between character segmentation, jieba, and
pkuseg.

* replace `use_jieba`, `use_pkuseg`, and `require_pkuseg` with a single
`segmenter` setting that accepts `char`, `jieba`, or `pkuseg`
* make plain character segmentation (`char`) the default segmenter (no
additional libraries required)

* Fix Chinese serialization test to use char default

* Warn if attempting to customize other segmenter

Add a warning if `Chinese.pkuseg_update_user_dict` is called when
another segmenter is selected.
2020-07-19 13:34:37 +02:00
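
The behaviour described in the commit message above can be illustrated with a small, self-contained sketch. It does not import spaCy and is not the library's implementation: the class `SketchChineseTokenizer`, the constant `SUPPORTED_SEGMENTERS`, and the message texts are hypothetical names chosen for illustration. Only the setting name `segmenter`, its three values, the `char` default, and the method name `pkuseg_update_user_dict` come from the commit message.

```python
import warnings

# Values named in the commit message; the constant name itself is hypothetical.
SUPPORTED_SEGMENTERS = ("char", "jieba", "pkuseg")


class SketchChineseTokenizer:
    """Illustrative stand-in only, not spaCy's ChineseTokenizer."""

    def __init__(self, segmenter="char"):
        # One setting takes the place of several boolean flags.
        if segmenter not in SUPPORTED_SEGMENTERS:
            raise ValueError(
                f"Unknown segmenter {segmenter!r}; "
                f"expected one of {SUPPORTED_SEGMENTERS}"
            )
        self.segmenter = segmenter
        self.user_dict = set()

    def __call__(self, text):
        if self.segmenter == "char":
            # Plain character segmentation needs no extra library.
            return list(text)
        # jieba and pkuseg are third-party packages; this sketch leaves them out.
        raise NotImplementedError(
            f"The {self.segmenter!r} segmenter is not part of this sketch"
        )

    def pkuseg_update_user_dict(self, words):
        # Assumption: warn and leave the dictionary untouched when a
        # different segmenter is selected.
        if self.segmenter != "pkuseg":
            warnings.warn(
                "Cannot update the pkuseg user dictionary: "
                f"the current segmenter is {self.segmenter!r}"
            )
            return
        self.user_dict.update(words)


tokenizer = SketchChineseTokenizer()  # defaults to "char"
print(tokenizer("中文"))
```

Using one enumerated setting means conflicting combinations such as two flags enabled at once cannot be expressed, and an unsupported value fails at construction time.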
cli Improve tag map initialization and updating (#5764) 2020-07-19 13:13:57 +02:00
displacy Remove object subclassing 2020-07-12 14:03:23 +02:00
gold cleanup components API (#5726) 2020-07-09 19:43:39 +02:00
lang Refactor Chinese tokenizer configuration (#5736) 2020-07-19 13:34:37 +02:00
matcher Remove object subclassing 2020-07-12 14:03:23 +02:00
ml fix doc.to_utf8 on GPU (#5757) 2020-07-13 23:05:33 +02:00
pipeline Update morphologizer (#5766) 2020-07-19 11:10:51 +02:00
syntax Explicitly delete objects after parser.update to free GPU memory (#5748) 2020-07-10 22:35:20 +02:00
tests Refactor Chinese tokenizer configuration (#5736) 2020-07-19 13:34:37 +02:00
tokens Add morph to morphology in Doc.from_array (#5762) 2020-07-14 14:07:35 +02:00
__init__.pxd * Seems to be working after refactor. Need to wire up more POS tag features, and wire up save/load of POS tags. 2014-10-24 02:23:42 +11:00
__init__.py Remove dead and/or deprecated code (#5710) 2020-07-06 13:06:25 +02:00
__main__.py Tidy up 2020-06-22 00:45:40 +02:00
about.py Set version to v3.0.0a4 2020-07-10 22:40:12 +02:00
attrs.pxd Merge branch 'develop' into master-tmp 2020-05-21 18:39:06 +02:00
attrs.pyx Merge branch 'develop' into master-tmp 2020-05-21 18:39:06 +02:00
compat.py Merge branch 'develop' into refactor/remove-symlinks 2020-02-18 17:22:20 +01:00
errors.py Refactor Chinese tokenizer configuration (#5736) 2020-07-19 13:34:37 +02:00
glossary.py unicode -> str consistency 2020-05-24 17:20:58 +02:00
gold.pyx Improve spacy.gold (no GoldParse, no json format!) (#5555) 2020-06-26 19:34:12 +02:00
kb.pxd Tidy up and avoid absolute spacy imports in core 2020-05-21 20:05:03 +02:00
kb.pyx Merge branch 'develop' into master-tmp 2020-06-20 15:52:00 +02:00
language.py Remove object subclassing 2020-07-12 14:03:23 +02:00
lemmatizer.py Remove object subclassing 2020-07-12 14:03:23 +02:00
lexeme.pxd Merge branch 'develop' into master-tmp 2020-05-21 18:39:06 +02:00
lexeme.pyx Merge branch 'develop' into master-tmp 2020-06-20 15:52:00 +02:00
lookups.py Remove object subclassing 2020-07-12 14:03:23 +02:00
morphology.pxd Tidy up compiler flags and imports (#5071) 2020-03-02 11:48:10 +01:00
morphology.pyx Improve tag map initialization and updating (#5764) 2020-07-19 13:13:57 +02:00
parts_of_speech.pxd Add support for Universal Dependencies v2.0 2017-03-03 13:17:34 +01:00
parts_of_speech.pyx Drop Python 2.7 and 3.5 (#4828) 2019-12-22 01:53:56 +01:00
pipe_analysis.py unicode -> str consistency 2020-05-24 17:20:58 +02:00
schemas.py Don't use file paths in schemas 2020-07-12 12:32:08 +02:00
scorer.py Remove object subclassing 2020-07-12 14:03:23 +02:00
strings.pxd Tidy up compiler flags and imports (#5071) 2020-03-02 11:48:10 +01:00
strings.pyx unicode -> str consistency [ci skip] 2020-05-24 18:51:10 +02:00
structs.pxd Merge branch 'develop' into master-tmp 2020-05-21 18:39:06 +02:00
symbols.pxd Merge branch 'develop' into master-tmp 2020-05-21 18:39:06 +02:00
symbols.pyx Merge branch 'develop' into master-tmp 2020-05-21 18:39:06 +02:00
tokenizer.pxd Remove dead and/or deprecated code (#5710) 2020-07-06 13:06:25 +02:00
tokenizer.pyx Remove dead and/or deprecated code (#5710) 2020-07-06 13:06:25 +02:00
typedefs.pxd Update spaCy for thinc 8.0.0 (#4920) 2020-01-29 17:06:46 +01:00
typedefs.pyx Tidy up rest 2017-10-27 21:07:59 +02:00
util.py Merge pull request #5747 from explosion/feature/refactor-config-args 2020-07-14 00:00:22 +02:00
vectors.pyx Remove object subclassing 2020-07-12 14:03:23 +02:00
vocab.pxd Merge branch 'develop' into master-tmp 2020-05-21 18:39:06 +02:00
vocab.pyx Remove dead and/or deprecated code (#5710) 2020-07-06 13:06:25 +02:00