spaCy/website/docs/api
BLKSerene 7b1d6e58ff
Remove dependency on langcodes (#13760)
This PR removes the dependency on langcodes introduced in #9342.

While the introduction of langcodes allows a significantly wider range of language codes, there are some unexpected side effects:

    zh-Hant (Traditional Chinese) should be mapped to zh instead of None, as spaCy's Chinese model is based on pkuseg, which supports tokenization of both Simplified and Traditional Chinese.
    Since spaCy may add a model for Norwegian Nynorsk in the future, mapping no (macrolanguage Norwegian) to nb (Norwegian Bokmål) might be misleading. In that case, the user should be asked to specify nb or nn (Norwegian Nynorsk) explicitly or consult the docs.
    The same applies to regional variants of languages such as en_gb and en_us.

Overall, IMHO, introducing an extra dependency just to convert language codes is overkill. Most users probably just need conversion between 2- and 3-letter ISO codes, and a simple dictionary lookup should suffice for that.

With this PR, ISO 639-1 and ISO 639-3 codes are supported. ISO 639-2/B codes (bibliographic codes, which are disfavored and not used in ISO 639-3) and deprecated ISO 639-1/2 codes are also supported to maximize backward compatibility.
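The "simple dictionary lookup" idea can be sketched as follows. This is a minimal illustration, not spaCy's actual implementation: the function `normalize_lang` and the abridged tables below are hypothetical, covering only a handful of real ISO 639-1, ISO 639-3, ISO 639-2/B and deprecated ISO 639-1 codes. Under these assumed tables, the macrolanguage code `no` is deliberately left unmapped, in line with the argument above.

```python
from typing import Optional

# Hypothetical, abridged tables for illustration only; these are NOT
# spaCy's actual lookup data.
ISO_639_1 = {"de", "en", "fr", "he", "id", "nb", "nl", "nn", "yi", "zh"}

# ISO 639-3 -> ISO 639-1
ISO_639_3_TO_1 = {
    "deu": "de", "eng": "en", "fra": "fr", "heb": "he", "ind": "id",
    "nob": "nb", "nld": "nl", "nno": "nn", "yid": "yi", "zho": "zh",
}

# ISO 639-2/B (bibliographic) -> ISO 639-1
ISO_639_2B_TO_1 = {"chi": "zh", "dut": "nl", "fre": "fr", "ger": "de"}

# Deprecated ISO 639-1 codes -> current ISO 639-1 codes
DEPRECATED_639_1 = {"in": "id", "iw": "he", "ji": "yi"}


def normalize_lang(code: str) -> Optional[str]:
    """Map a language code to ISO 639-1 using plain dict lookups, or return None."""
    code = code.strip().lower()
    if code in ISO_639_1:
        return code
    for table in (DEPRECATED_639_1, ISO_639_3_TO_1, ISO_639_2B_TO_1):
        if code in table:
            return table[code]
    # Unknown codes, including the macrolanguage "no", are left unmapped so the
    # caller has to choose explicitly, e.g. between "nb" and "nn".
    return None


print(normalize_lang("GER"), normalize_lang("nob"), normalize_lang("no"))
```

This sketch matches exact codes only. It does not parse BCP 47 tags such as `zh-Hant` or regional variants such as `en_gb`; how those are handled is outside its scope.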
2025-05-28 17:21:46 +02:00
architectures.mdx Add spacy.TextCatParametricAttention.v1 (#13201) 2024-01-02 10:03:06 +01:00
attributeruler.mdx Website migration from Gatsby to Next (#12058) 2023-01-11 17:30:07 +01:00
attributes.mdx Fix typos in docs (#13466) 2024-04-29 11:10:17 +02:00
basevectors.mdx Support registered vectors (#12492) 2023-08-01 15:46:08 +02:00
cli.mdx Remove dependency on langcodes (#13760) 2025-05-28 17:21:46 +02:00
coref.mdx corrected example code (#12466) 2023-03-27 11:32:49 +02:00
corpus.mdx Add spacy.PlainTextCorpusReader.v1 (#12122) 2023-01-26 11:33:22 +01:00
curatedtransformer.mdx Docs: update trf_data examples and pipeline design info (#13164) 2023-12-04 15:15:54 +01:00
cython-classes.mdx Website migration from Gatsby to Next (#12058) 2023-01-11 17:30:07 +01:00
cython-structs.mdx Website migration from Gatsby to Next (#12058) 2023-01-11 17:30:07 +01:00
cython.mdx fix typos (#13813) 2025-05-26 16:05:29 +02:00
data-formats.mdx Website migration from Gatsby to Next (#12058) 2023-01-11 17:30:07 +01:00
dependencymatcher.mdx docs(REL_OP): modify docs for REL_OPs to match Semgrex's update on CoreNLP v4.5.2 (#12531) 2023-04-17 13:14:01 +02:00
dependencyparser.mdx Website migration from Gatsby to Next (#12058) 2023-01-11 17:30:07 +01:00
doc.mdx Backslash fixes in docs (#12213) 2023-02-01 10:15:38 +01:00
docbin.mdx Website migration from Gatsby to Next (#12058) 2023-01-11 17:30:07 +01:00
edittreelemmatizer.mdx Website migration from Gatsby to Next (#12058) 2023-01-11 17:30:07 +01:00
entitylinker.mdx Fix typos in docs (#13466) 2024-04-29 11:10:17 +02:00
entityrecognizer.mdx Website migration from Gatsby to Next (#12058) 2023-01-11 17:30:07 +01:00
entityruler.mdx Fix typos in docs (#13466) 2024-04-29 11:10:17 +02:00
example.mdx Website migration from Gatsby to Next (#12058) 2023-01-11 17:30:07 +01:00
index.mdx Website migration from Gatsby to Next (#12058) 2023-01-11 17:30:07 +01:00
inmemorylookupkb.mdx Update inmemorylookupkb.mdx (#12586) 2023-05-02 12:51:13 +02:00
kb.mdx API docs: Rename kb_in_memory to inmemorylookupkb, add to sidebar (#12128) 2023-01-19 13:29:17 +01:00
language.mdx Remove dependency on langcodes (#13760) 2025-05-28 17:21:46 +02:00
large-language-models.mdx Fix typo (#13657) [ci skip] 2024-10-23 12:06:36 +02:00
legacy.mdx Add TextCatReduce.v1 (#13181) 2023-12-21 11:00:06 +01:00
lemmatizer.mdx Website migration from Gatsby to Next (#12058) 2023-01-11 17:30:07 +01:00
lexeme.mdx Website migration from Gatsby to Next (#12058) 2023-01-11 17:30:07 +01:00
lookups.mdx Website migration from Gatsby to Next (#12058) 2023-01-11 17:30:07 +01:00
matcher.mdx Website migration from Gatsby to Next (#12058) 2023-01-11 17:30:07 +01:00
morphologizer.mdx Tagger label smoothing (#12293) 2023-03-22 12:17:56 +01:00
morphology.mdx fix docs for MorphAnalysis.__contains__ (#13433) 2024-05-02 16:46:41 +02:00
phrasematcher.mdx Website migration from Gatsby to Next (#12058) 2023-01-11 17:30:07 +01:00
pipe.mdx Website migration from Gatsby to Next (#12058) 2023-01-11 17:30:07 +01:00
pipeline-functions.mdx Website migration from Gatsby to Next (#12058) 2023-01-11 17:30:07 +01:00
scorer.mdx Add scorer option to return per-component scores (#12540) 2023-05-12 15:36:54 +02:00
sentencerecognizer.mdx Website migration from Gatsby to Next (#12058) 2023-01-11 17:30:07 +01:00
sentencizer.mdx Website migration from Gatsby to Next (#12058) 2023-01-11 17:30:07 +01:00
span-resolver.mdx Website migration from Gatsby to Next (#12058) 2023-01-11 17:30:07 +01:00
span.mdx Fix typos in docs (#13466) 2024-04-29 11:10:17 +02:00
spancategorizer.mdx Fix spancat typo. (#13095) 2023-10-31 13:45:10 +01:00
spanfinder.mdx Update max_length default in span finder docs (#12803) 2023-07-07 10:17:41 +02:00
spangroup.mdx Website migration from Gatsby to Next (#12058) 2023-01-11 17:30:07 +01:00
spanruler.mdx fix (#12881) 2023-08-03 08:37:43 +02:00
stringstore.mdx Add info to stringstore and vocab (#12471) 2023-03-27 13:15:14 +02:00
tagger.mdx Tagger label smoothing (#12293) 2023-03-22 12:17:56 +01:00
textcategorizer.mdx Website migration from Gatsby to Next (#12058) 2023-01-11 17:30:07 +01:00
tok2vec.mdx Website migration from Gatsby to Next (#12058) 2023-01-11 17:30:07 +01:00
token.mdx Website migration from Gatsby to Next (#12058) 2023-01-11 17:30:07 +01:00
tokenizer.mdx Website migration from Gatsby to Next (#12058) 2023-01-11 17:30:07 +01:00
top-level.mdx Remove dependency on langcodes (#13760) 2025-05-28 17:21:46 +02:00
transformer.mdx Fix typos in docs (#13466) 2024-04-29 11:10:17 +02:00
vectors.mdx Support registered vectors (#12492) 2023-08-01 15:46:08 +02:00
vocab.mdx Clarify vocab docs (#13273) 2024-01-26 10:58:48 +01:00