Add Finnish, Korean, and Swedish models and Korean support notes (#10355)

* Add Finnish, Korean, and Swedish models to website

* Add Korean language support notes
Adriane Boyd 2022-03-07 17:03:45 +01:00 committed by GitHub
parent 5ca0dbae76
commit b2bbefd0b5
GPG Key ID: 4AEE18F83AFDEB23
2 changed files with 61 additions and 7 deletions

@@ -259,6 +259,45 @@ used for training the current [Japanese pipelines](/models/ja).
</Infobox>

### Korean language support {#korean}

> #### mecab-ko tokenizer
>
> ```python
> nlp = spacy.blank("ko")
> ```

The default MeCab-based Korean tokenizer requires:

- [mecab-ko](https://bitbucket.org/eunjeon/mecab-ko/src/master/README.md)
- [mecab-ko-dic](https://bitbucket.org/eunjeon/mecab-ko-dic)
- [natto-py](https://github.com/buruzaemon/natto-py)

For some Korean datasets and tasks, the
[rule-based tokenizer](/usage/linguistic-features#tokenization) is better-suited
than MeCab. To configure a Korean pipeline with the rule-based tokenizer:

> #### Rule-based tokenizer
>
> ```python
> config = {"nlp": {"tokenizer": {"@tokenizers": "spacy.Tokenizer.v1"}}}
> nlp = spacy.blank("ko", config=config)
> ```

```ini
### config.cfg
[nlp]
lang = "ko"
tokenizer = {"@tokenizers" = "spacy.Tokenizer.v1"}
```

<Infobox>

The [Korean trained pipelines](/models/ko) use the rule-based tokenizer, so no
additional dependencies are required.

</Infobox>
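The dependency list and the rule-based alternative in the notes above suggest a simple fallback: keep the default MeCab-based tokenizer when its Python binding can be found, and switch to `spacy.Tokenizer.v1` otherwise. The following is a minimal sketch of that idea, not part of this commit. It assumes `natto` is the import name of natto-py, and `korean_config_overrides` is a hypothetical helper that spaCy does not provide. Finding the module does not prove that mecab-ko and mecab-ko-dic are installed, so treat the check as a first filter only.

```python
import importlib.util


def korean_config_overrides(binding: str = "natto") -> dict:
    """Return config overrides to pass to spacy.blank("ko", config=...).

    Hypothetical helper, not part of spaCy. `binding` is assumed to be the
    import name of natto-py.
    """
    if importlib.util.find_spec(binding) is not None:
        # Binding found: keep spaCy's default MeCab-based Korean tokenizer.
        return {}
    # Binding missing: fall back to the rule-based tokenizer.
    return {"nlp": {"tokenizer": {"@tokenizers": "spacy.Tokenizer.v1"}}}


overrides = korean_config_overrides()
# With spaCy installed: nlp = spacy.blank("ko", config=overrides)
```

The override dictionary has the same shape as the `config` in the rule-based tokenizer snippet, so it can be passed to `spacy.blank` unchanged.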
## Installing and using trained pipelines {#download}

The easiest way to download a trained pipeline is via spaCy's
@@ -417,10 +456,10 @@ doc = nlp("This is a sentence.")
<Infobox title="Tip: Preview model info" emoji="💡">

You can use the [`info`](/api/cli#info) command or
[`spacy.info()`](/api/top-level#spacy.info) method to print a pipeline package's
meta data before loading it. Each `Language` object with a loaded pipeline also
exposes the pipeline's meta data as the attribute `meta`. For example,
`nlp.meta['version']` will return the package version.

</Infobox>

@@ -114,7 +114,12 @@
{
    "code": "fi",
    "name": "Finnish",
    "has_examples": true,
    "models": [
        "fi_core_news_sm",
        "fi_core_news_md",
        "fi_core_news_lg"
    ]
},
{
    "code": "fr",
@@ -227,7 +232,12 @@
        }
    ],
    "example": "이것은 문장입니다.",
    "has_examples": true,
    "models": [
        "ko_core_news_sm",
        "ko_core_news_md",
        "ko_core_news_lg"
    ]
},
{
    "code": "ky",
@@ -388,7 +398,12 @@
{
    "code": "sv",
    "name": "Swedish",
    "has_examples": true,
    "models": [
        "sv_core_news_sm",
        "sv_core_news_md",
        "sv_core_news_lg"
    ]
},
{
    "code": "ta",