spaCy/website/docs
Adriane Boyd 2a558a7cdc
Switch to mecab-ko as default Korean tokenizer (#11294)
* Switch to mecab-ko as default Korean tokenizer

Switch to the (confusingly named) mecab-ko Python module for default Korean
tokenization.

Maintain the previous `natto-py` tokenizer as
`spacy.KoreanNattoTokenizer.v1`.

* Temporarily run tests with mecab-ko tokenizer

* Fix types

* Fix duplicate test names

* Update requirements test

* Revert "Temporarily run tests with mecab-ko tokenizer"

This reverts commit d2083e7044.

* Add mecab_args setting, fix pickle for KoreanNattoTokenizer

* Fix length check

* Update docs

* Formatting

* Update natto-py error message

Co-authored-by: Paul O'Leary McCann <polm@dampfkraft.com>
2022-08-26 10:11:18 +02:00
api Merge remote-tracking branch 'upstream/develop' into chore/update-v4-from-develop 2022-08-24 20:43:07 +02:00
images Docs for v3.3 (#10628) 2022-04-28 14:09:35 +02:00
models oov confusion fix (#10828) 2022-05-23 09:15:51 +02:00
usage Switch to mecab-ko as default Korean tokenizer (#11294) 2022-08-26 10:11:18 +02:00
index.md 💫 Update website (#3285) 2019-02-17 19:31:19 +01:00
styleguide.md Update styleguide [ci skip] 2020-09-14 11:25:57 +02:00