//- 💫 DOCS > USAGE > MODELS > LANGUAGE SUPPORT

p spaCy currently provides models for the following languages:

+table(["Language", "Code", "Language data", "Models"])
    for models, code in MODELS
        - var count = Object.keys(models).length
        +row
            +cell=LANGUAGES[code]
            +cell #[code=code]
            +cell
                +src(gh("spaCy", "spacy/lang/" + code)) #[code lang/#{code}]
            +cell
                +a("/models/" + code) #{count} #{(count == 1) ? "model" : "models"}

+h(3, "alpha-support") Alpha tokenization support

p
    |  Work has started on the following languages. You can help by
    |  #[+a("/usage/adding-languages#language-data") improving the existing language data]
    |  and extending the tokenization patterns.

+aside("Usage note")
    |  Note that the alpha languages don't yet come with a language model. In
    |  order to use them, you have to import them directly, or use
    |  #[+api("spacy#blank") #[code spacy.blank]]:

    +code.o-no-block.
        import spacy
        from spacy.lang.fi import Finnish

        nlp = Finnish()              # use the language class directly
        nlp = spacy.blank('fi')      # or create a blank instance

+table(["Language", "Code", "Language data"])
    - var sorted_langs = Object.assign({}, ...Object.keys(LANGUAGES).filter(key => !MODELS[key]).sort().map(key => ({ [key]: LANGUAGES[key] })))
    for lang, code in sorted_langs
        +row
            +cell #{LANGUAGES[code]}
            +cell #[code=code]
            +cell
                +src(gh("spaCy", "spacy/lang/" + code)) #[code lang/#{code}]

+infobox("Dependencies")
    .o-block-small Some language tokenizers require external dependencies.

    +list.o-no-block
        +item #[strong Chinese]: #[+a("https://github.com/fxsjy/jieba") Jieba]
        +item #[strong Japanese]: #[+a("https://github.com/taku910/mecab") MeCab] with #[+a("http://unidic.ninjal.ac.jp/back_number#unidic_cwj") Unidic]
        +item #[strong Thai]: #[+a("https://github.com/wannaphongcom/pythainlp") pythainlp]
        +item #[strong Vietnamese]: #[+a("https://github.com/trungtv/pyvi") Pyvi]
        +item #[strong Russian]: #[+a("https://github.com/kmike/pymorphy2") pymorphy2]

+h(3, "multi-language") Multi-language support
    +tag-new(2)

p
    |  As of v2.0, spaCy supports models trained on more than one language.
    |  This is especially useful for named entity recognition. The language ID
    |  used for multi-language or language-neutral models is #[code xx]. The
    |  language class, a generic subclass containing only the base language
    |  data, can be found in #[+src(gh("spaCy", "spacy/lang/xx")) #[code lang/xx]].

p
    |  To load your model with the neutral, multi-language class, simply set
    |  #[code "language": "xx"] in your
    |  #[+a("/usage/training#models-generating") model package]'s
    |  #[code meta.json]. You can also import the class directly, or call
    |  #[+api("util#get_lang_class") #[code util.get_lang_class()]] for
    |  lazy-loading.

+code("Standard import").
    from spacy.lang.xx import MultiLanguage
    nlp = MultiLanguage()

+code("With lazy-loading").
    from spacy.util import get_lang_class
    nlp = get_lang_class('xx')()    # resolve the class lazily, then instantiate it
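p
    |  Since #[code xx] is registered like any other language code, the
    |  #[+api("spacy#blank") #[code spacy.blank]] shortcut shown above for the
    |  alpha languages should also give you a blank multi-language pipeline.
    |  Here's a minimal sketch (the example text is purely illustrative):

+code("With spacy.blank").
    import spacy

    nlp = spacy.blank('xx')           # blank multi-language pipeline
    assert nlp.lang == 'xx'           # language ID of the MultiLanguage class
    doc = nlp(u"This text could be written in any language.")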