Update Japanese docs and pin for sudachipy

2026-03-06 12:51:26 +03:00 · 2020-10-02 10:12:44 +02:00 · 2020-10-02 10:12:44 +02:00 · 351f352cdc
commit 351f352cdc
parent 7670df04dd
2 changed files with 5 additions and 6 deletions
--- a/setup.cfg
+++ b/setup.cfg
@@ -84,7 +84,7 @@ cuda102 =
     cupy-cuda102>=5.0.0b4,<9.0.0
 # Language tokenizers with external dependencies
 ja =
-    sudachipy>=0.4.5
+    sudachipy>=0.4.9
     sudachidict_core>=20200330
 ko =
     natto-py==0.9.0
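As a rough sketch (not part of this commit), the raised lower bound in the `ja` extra can be checked at runtime with the standard library's `importlib.metadata`. The helpers `release_tuple`, `meets_minimum` and `installed_sudachipy` are hypothetical. Version parsing is deliberately simplified: only the leading numeric release segment is compared, so this is not a full PEP 440 comparison.

```python
import re
from importlib.metadata import PackageNotFoundError, version
from typing import Optional, Tuple

# Lower bound copied from the `ja` extra in setup.cfg: sudachipy>=0.4.9
MIN_SUDACHIPY: Tuple[int, ...] = (0, 4, 9)


def release_tuple(ver: str) -> Tuple[int, ...]:
    """Return the leading numeric release segment, e.g. "0.4.9" -> (0, 4, 9).

    Simplification: suffixes such as "rc1" or ".dev0" are dropped, so this
    is not a full PEP 440 comparison.
    """
    match = re.match(r"\d+(?:\.\d+)*", ver)
    if match is None:
        raise ValueError(f"unparseable version string: {ver!r}")
    return tuple(int(part) for part in match.group(0).split("."))


def meets_minimum(ver: str, minimum: Tuple[int, ...] = MIN_SUDACHIPY) -> bool:
    """True if `ver` is at least `minimum`.

    Tuples compare lexicographically, so (0, 4, 10) > (0, 4, 9).
    """
    return release_tuple(ver) >= minimum


def installed_sudachipy() -> Optional[str]:
    """Installed sudachipy version string, or None if it is not installed."""
    try:
        return version("sudachipy")
    except PackageNotFoundError:
        return None


if __name__ == "__main__":
    installed = installed_sudachipy()
    if installed is None:
        print("sudachipy is not installed")
    elif meets_minimum(installed):
        print(f"sudachipy {installed} satisfies >=0.4.9")
    else:
        print(f"sudachipy {installed} is older than the 0.4.9 pin")
```

Comparing integer tuples instead of raw strings is the point of the sketch: as plain strings, `"0.4.10" >= "0.4.9"` evaluates to `False`.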
--- a/website/docs/usage/models.md
+++ b/website/docs/usage/models.md
@@ -199,20 +199,19 @@ nlp.tokenizer.initialize(pkuseg_model="/path/to/pkuseg_model")
 >
 > # Load SudachiPy with split mode B
 > cfg = {"split_mode": "B"}
-> nlp = Japanese(meta={"tokenizer": {"config": cfg}})
+> nlp = Japanese.from_config({"nlp": {"tokenizer": cfg}})
 > ```

 The Japanese language class uses
 [SudachiPy](https://github.com/WorksApplications/SudachiPy) for word
 segmentation and part-of-speech tagging. The default Japanese language class and
-the provided Japanese pipelines use SudachiPy split mode `A`. The `meta`
-argument of the `Japanese` language class can be used to configure the split
-mode to `A`, `B` or `C`.
+the provided Japanese pipelines use SudachiPy split mode `A`. The tokenizer
+config can be used to set the split mode to `A`, `B` or `C`.

 <Infobox variant="warning">

 If you run into errors related to `sudachipy`, which is currently under active
-development, we suggest downgrading to `sudachipy==0.4.5`, which is the version
+development, we suggest downgrading to `sudachipy==0.4.9`, which is the version
 used for training the current [Japanese pipelines](/models/ja).

 </Infobox>
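The split-mode configuration described in the updated docs can be sketched as a small builder for the partial config passed to `Japanese.from_config`. The helper `japanese_nlp_config` is hypothetical. It only reproduces the nested `{"nlp": {"tokenizer": {...}}}` shape from the updated example and rejects modes other than `A`, `B` or `C`. Actually loading the pipeline assumes spaCy and SudachiPy are installed, so that step is left as a comment.

```python
from typing import Any, Dict

# SudachiPy split modes: A = shortest units, C = longest units, B = in between.
SPLIT_MODES = ("A", "B", "C")


def japanese_nlp_config(split_mode: str = "A") -> Dict[str, Any]:
    """Build the partial config for Japanese.from_config (hypothetical helper).

    Validates the mode and nests it under nlp -> tokenizer, mirroring
    {"nlp": {"tokenizer": {"split_mode": "B"}}} from the docs example.
    """
    if split_mode not in SPLIT_MODES:
        raise ValueError(f"split_mode must be one of {SPLIT_MODES}, got {split_mode!r}")
    return {"nlp": {"tokenizer": {"split_mode": split_mode}}}


if __name__ == "__main__":
    cfg = japanese_nlp_config("B")
    print(cfg)
    # With spaCy and SudachiPy installed, the config would then be used as:
    #   from spacy.lang.ja import Japanese
    #   nlp = Japanese.from_config(cfg)
```

Validating the mode up front gives a clear error for a typo such as `"D"` before any tokenizer is constructed.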