mirror of
https://github.com/explosion/spaCy.git
synced 2025-02-28 09:30:38 +03:00
Add Finnish, Korean, and Swedish models and Korean support notes (#10355)
* Add Finnish, Korean, and Swedish models to website
* Add Korean language support notes
parent 5ca0dbae76
commit b2bbefd0b5
@@ -259,6 +259,45 @@ used for training the current [Japanese pipelines](/models/ja).
 </Infobox>

+### Korean language support {#korean}
+
+> #### mecab-ko tokenizer
+>
+> ```python
+> nlp = spacy.blank("ko")
+> ```
+
+The default MeCab-based Korean tokenizer requires:
+
+- [mecab-ko](https://bitbucket.org/eunjeon/mecab-ko/src/master/README.md)
+- [mecab-ko-dic](https://bitbucket.org/eunjeon/mecab-ko-dic)
+- [natto-py](https://github.com/buruzaemon/natto-py)
+
+For some Korean datasets and tasks, the
+[rule-based tokenizer](/usage/linguistic-features#tokenization) is better-suited
+than MeCab. To configure a Korean pipeline with the rule-based tokenizer:
+
+> #### Rule-based tokenizer
+>
+> ```python
+> config = {"nlp": {"tokenizer": {"@tokenizers": "spacy.Tokenizer.v1"}}}
+> nlp = spacy.blank("ko", config=config)
+> ```
+
+```ini
+### config.cfg
+[nlp]
+lang = "ko"
+tokenizer = {"@tokenizers" = "spacy.Tokenizer.v1"}
+```
+
+<Infobox>
+
+The [Korean trained pipelines](/models/ko) use the rule-based tokenizer, so no
+additional dependencies are required.
+
+</Infobox>
+
 ## Installing and using trained pipelines {#download}

 The easiest way to download a trained pipeline is via spaCy's
@@ -417,10 +456,10 @@ doc = nlp("This is a sentence.")
 <Infobox title="Tip: Preview model info" emoji="💡">

 You can use the [`info`](/api/cli#info) command or
-[`spacy.info()`](/api/top-level#spacy.info) method to print a pipeline
-package's meta data before loading it. Each `Language` object with a loaded
-pipeline also exposes the pipeline's meta data as the attribute `meta`. For
-example, `nlp.meta['version']` will return the package version.
+[`spacy.info()`](/api/top-level#spacy.info) method to print a pipeline package's
+meta data before loading it. Each `Language` object with a loaded pipeline also
+exposes the pipeline's meta data as the attribute `meta`. For example,
+`nlp.meta['version']` will return the package version.

 </Infobox>
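Review note (not part of the commit): the Korean section added in the first hunk describes two tokenizer setups. Below is a minimal sketch of the rule-based variant, assuming spaCy v3 is installed. The helper names `rule_based_ko_config` and `tokenize_korean` are hypothetical; only the config dict and the `spacy.blank("ko", config=config)` call are taken from the docs in the diff.

```python
def rule_based_ko_config():
    """Config override from the added docs: use spaCy's rule-based tokenizer
    instead of the default MeCab-based Korean tokenizer."""
    return {"nlp": {"tokenizer": {"@tokenizers": "spacy.Tokenizer.v1"}}}


def tokenize_korean(text):
    """Tokenize text with a blank Korean pipeline built from the override.

    Hypothetical helper: spaCy is imported lazily so that the config helper
    above stays usable on machines without spaCy installed.
    """
    import spacy

    nlp = spacy.blank("ko", config=rule_based_ko_config())
    return [token.text for token in nlp(text)]
```

With spaCy available, `tokenize_korean("이것은 문장입니다.")` should return the token texts without mecab-ko, mecab-ko-dic, or natto-py being installed, which is the point of the override described in the added section.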
@@ -114,7 +114,12 @@
 {
 "code": "fi",
 "name": "Finnish",
-"has_examples": true
+"has_examples": true,
+"models": [
+"fi_core_news_sm",
+"fi_core_news_md",
+"fi_core_news_lg"
+]
 },
 {
 "code": "fr",
@@ -227,7 +232,12 @@
 }
 ],
 "example": "이것은 문장입니다.",
-"has_examples": true
+"has_examples": true,
+"models": [
+"ko_core_news_sm",
+"ko_core_news_md",
+"ko_core_news_lg"
+]
 },
 {
 "code": "ky",
@@ -388,7 +398,12 @@
 {
 "code": "sv",
 "name": "Swedish",
-"has_examples": true
+"has_examples": true,
+"models": [
+"sv_core_news_sm",
+"sv_core_news_md",
+"sv_core_news_lg"
+]
 },
 {
 "code": "ta",
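Review note (not part of the commit): each JSON hunk adds a comma after `"has_examples": true` because a `models` array now follows it. The sketch below checks that with the standard library, along with the naming pattern of the new package names. `LANGUAGES_JSON`, `misnamed_models`, and `parses_as_json` are hypothetical names, and the entries are trimmed to the fields visible in the hunks; the real file has more languages and fields.

```python
import json

# Entries as they read after the commit, trimmed to fields visible in the hunks.
# The variable name and the trimmed layout are illustrative assumptions.
LANGUAGES_JSON = """
[
  {"code": "fi", "name": "Finnish", "has_examples": true,
   "models": ["fi_core_news_sm", "fi_core_news_md", "fi_core_news_lg"]},
  {"code": "ko", "example": "이것은 문장입니다.", "has_examples": true,
   "models": ["ko_core_news_sm", "ko_core_news_md", "ko_core_news_lg"]},
  {"code": "sv", "name": "Swedish", "has_examples": true,
   "models": ["sv_core_news_sm", "sv_core_news_md", "sv_core_news_lg"]}
]
"""


def misnamed_models(entries):
    """Return (code, model) pairs whose package name lacks the '<code>_' prefix."""
    return [
        (entry["code"], model)
        for entry in entries
        for model in entry.get("models", [])
        if not model.startswith(entry["code"] + "_")
    ]


def parses_as_json(text):
    """Return True if text is valid JSON, False otherwise."""
    try:
        json.loads(text)
    except json.JSONDecodeError:
        return False
    return True


entries = json.loads(LANGUAGES_JSON)
print(misnamed_models(entries))  # empty when every name matches its language code

# Without the comma the commit adds, the object no longer parses.
broken = '{"has_examples": true "models": ["fi_core_news_sm"]}'
print(parses_as_json(broken))
```

A check along these lines could run before the website build to catch a missing comma or a package listed under the wrong language code.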