Add Finnish, Korean, and Swedish models and Korean support notes (#10355)

* Add Finnish, Korean, and Swedish models to website
* Add Korean language support notes
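
The model names added to `languages.json` in this commit can be sanity-checked with a short script. The sketch below is a minimal illustration, not part of spaCy or the website build: `ADDED_MODELS`, `NAME_PATTERN`, and `check_models` are hypothetical names, and the pattern assumes that every package here is named `<lang>_core_news_<size>` with a size of `sm`, `md`, or `lg`.

```python
import re

# Model lists copied from the languages.json hunks in this commit.
ADDED_MODELS = {
    "fi": ["fi_core_news_sm", "fi_core_news_md", "fi_core_news_lg"],
    "ko": ["ko_core_news_sm", "ko_core_news_md", "ko_core_news_lg"],
    "sv": ["sv_core_news_sm", "sv_core_news_md", "sv_core_news_lg"],
}

# Assumption: names follow <lang>_core_news_<size>, with size in sm/md/lg.
NAME_PATTERN = re.compile(r"^(?P<lang>[a-z]{2,3})_core_news_(?P<size>sm|md|lg)$")


def check_models(models_by_lang):
    """Return (lang, name) pairs whose name breaks the assumed pattern
    or whose language prefix does not match the entry's language code."""
    problems = []
    for lang, names in models_by_lang.items():
        for name in names:
            match = NAME_PATTERN.match(name)
            if match is None or match.group("lang") != lang:
                problems.append((lang, name))
    return problems


print(check_models(ADDED_MODELS))  # prints []
```

An empty list means every added name carries the language code of the entry it was added to.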
2025-07-17 11:42:30 +03:00 · 2022-03-07 17:03:45 +01:00 · 2022-03-07 17:03:45 +01:00 · b2bbefd0b5
commit b2bbefd0b5
parent 5ca0dbae76
2 changed files with 61 additions and 7 deletions
--- a/website/docs/usage/models.md
+++ b/website/docs/usage/models.md
@@ -259,6 +259,45 @@ used for training the current [Japanese pipelines](/models/ja).
 </Infobox>
+### Korean language support {#korean}
+> #### mecab-ko tokenizer
+>
+> ```python
+> nlp = spacy.blank("ko")
+> ```
+The default MeCab-based Korean tokenizer requires:
+- [mecab-ko](https://bitbucket.org/eunjeon/mecab-ko/src/master/README.md)
+- [mecab-ko-dic](https://bitbucket.org/eunjeon/mecab-ko-dic)
+- [natto-py](https://github.com/buruzaemon/natto-py)
+For some Korean datasets and tasks, the
+[rule-based tokenizer](/usage/linguistic-features#tokenization) is better-suited
+than MeCab. To configure a Korean pipeline with the rule-based tokenizer:
+> #### Rule-based tokenizer
+>
+> ```python
+> config = {"nlp": {"tokenizer": {"@tokenizers": "spacy.Tokenizer.v1"}}}
+> nlp = spacy.blank("ko", config=config)
+> ```
+```ini
+### config.cfg
+[nlp]
+lang = "ko"
+tokenizer = {"@tokenizers" = "spacy.Tokenizer.v1"}
+```
+<Infobox>
+The [Korean trained pipelines](/models/ko) use the rule-based tokenizer, so no
+additional dependencies are required.
+</Infobox>
 ## Installing and using trained pipelines {#download}
 The easiest way to download a trained pipeline is via spaCy's
@@ -417,10 +456,10 @@ doc = nlp("This is a sentence.")
 <Infobox title="Tip: Preview model info" emoji="💡">
 You can use the [`info`](/api/cli#info) command or
-[`spacy.info()`](/api/top-level#spacy.info) method to print a pipeline
-package's meta data before loading it. Each `Language` object with a loaded
-pipeline also exposes the pipeline's meta data as the attribute `meta`. For
-example, `nlp.meta['version']` will return the package version.
+[`spacy.info()`](/api/top-level#spacy.info) method to print a pipeline package's
+meta data before loading it. Each `Language` object with a loaded pipeline also
+exposes the pipeline's meta data as the attribute `meta`. For example,
+`nlp.meta['version']` will return the package version.
 </Infobox>
--- a/website/meta/languages.json
+++ b/website/meta/languages.json
@@ -114,7 +114,12 @@
        {
            "code": "fi",
            "name": "Finnish",
-            "has_examples": true
+            "has_examples": true,
+            "models": [
+                "fi_core_news_sm",
+                "fi_core_news_md",
+                "fi_core_news_lg"
+            ]
        },
        {
            "code": "fr",
@@ -227,7 +232,12 @@
                }
            ],
            "example": "이것은 문장입니다.",
-            "has_examples": true
+            "has_examples": true,
+            "models": [
+                "ko_core_news_sm",
+                "ko_core_news_md",
+                "ko_core_news_lg"
+            ]
        },
        {
            "code": "ky",
@@ -388,7 +398,12 @@
        {
            "code": "sv",
            "name": "Swedish",
-            "has_examples": true
+            "has_examples": true,
+            "models": [
+                "sv_core_news_sm",
+                "sv_core_news_md",
+                "sv_core_news_lg"
+            ]
        },
        {
            "code": "ta",