spaCy/spacy
BLKSerene 7b1d6e58ff
Remove dependency on langcodes (#13760)
This PR removes the dependency on langcodes introduced in #9342.

While the introduction of langcodes allows a significantly wider range of language codes, there are some unexpected side effects:

    zh-Hant (Traditional Chinese) should be mapped to zh instead of None, as spaCy's Chinese model is based on pkuseg, which supports tokenization of both Simplified and Traditional Chinese.
    Since spaCy may have a model for Norwegian Nynorsk in the future, mapping no (macrolanguage Norwegian) to nb (Norwegian Bokmål) might be misleading. In that case, the user should be asked to specify nb or nn (Norwegian Nynorsk) explicitly or to consult the documentation.
    The same applies to regional variants of languages, such as en_gb and en_us.

Overall, IMHO, introducing an extra dependency just for converting language codes is overkill. Most users probably only need conversion between 2-letter and 3-letter ISO codes, for which a simple dictionary lookup should suffice.

With this PR, ISO 639-1 and ISO 639-3 codes are supported. ISO 639-2/B codes (bibliographic codes, which are disfavored and not used in ISO 639-3) and deprecated ISO 639-1/2 codes are also supported to maximize backward compatibility.
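
For illustration, the dictionary-lookup approach could look roughly like the sketch below. This is not the code from this PR: the function name `normalize_lang_code`, the table names, and the heavily truncated tables are all hypothetical, and the sketch assumes that tags with region or script subtags (en_gb, zh-Hant) are handled elsewhere.

```python
from typing import Optional

# Hypothetical, heavily truncated tables for illustration only;
# they are NOT spaCy's actual data.
SUPPORTED_ISO_639_1 = {"de", "en", "fr", "he", "id", "nb", "nn", "yi", "zh"}

# ISO 639-3 (three-letter) codes mapped to ISO 639-1 codes.
ISO_639_3_TO_1 = {
    "deu": "de", "eng": "en", "fra": "fr",
    "nno": "nn", "nob": "nb", "zho": "zh",
}

# ISO 639-2/B (bibliographic) codes that differ from the ISO 639-3 codes.
ISO_639_2B_TO_1 = {"chi": "zh", "fre": "fr", "ger": "de"}

# Deprecated ISO 639-1 codes mapped to their current replacements.
DEPRECATED_TO_CURRENT = {"in": "id", "iw": "he", "ji": "yi"}


def normalize_lang_code(code: str) -> Optional[str]:
    """Return a two-letter code for `code`, or None if it is not recognized.

    Hypothetical helper: plain dictionary lookups, no third-party dependency.
    """
    key = code.strip().lower()
    if key in SUPPORTED_ISO_639_1:
        return key
    for table in (ISO_639_3_TO_1, ISO_639_2B_TO_1, DEPRECATED_TO_CURRENT):
        if key in table:
            return table[key]
    # Unknown or ambiguous codes (e.g. the macrolanguage "no") are not guessed.
    return None
```

In this sketch the macrolanguage code no is deliberately left unmapped, so a caller would have to choose nb or nn explicitly.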
2025-05-28 17:21:46 +02:00
..
cli fix typos (#13813) 2025-05-26 16:05:29 +02:00
displacy Fix displacy span stacking (#13068) 2023-11-02 12:02:18 +01:00
kb Update __all__ fields (#13063) 2023-10-16 10:17:47 +02:00
lang Python 3.13 support (#13823) 2025-05-22 13:47:21 +02:00
matcher Python 3.13 support (#13823) 2025-05-22 13:47:21 +02:00
ml Python 3.13 support (#13823) 2025-05-22 13:47:21 +02:00
pipeline fix: match hyphenated words to lemmas in index_table (e.g. "co-authored" -> "co-author") (#13816) 2025-05-27 01:20:26 +02:00
tests Remove dependency on langcodes (#13760) 2025-05-28 17:21:46 +02:00
tokens Test and fix issue13769 2025-05-28 17:04:23 +02:00
training Python 3.13 support (#13823) 2025-05-22 13:47:21 +02:00
__init__.pxd * Seems to be working after refactor. Need to wire up more POS tag features, and wire up save/load of POS tags. 2014-10-24 02:23:42 +11:00
__init__.py Python 3.13 support (#13823) 2025-05-22 13:47:21 +02:00
__main__.py Tidy up 2020-06-22 00:45:40 +02:00
about.py Increment version 2025-05-22 13:58:00 +02:00
attrs.pxd ci: add cython linter (#12694) 2023-07-19 12:03:31 +02:00
attrs.pyx Add profile=False to currently unprofiled cython 2023-09-28 17:09:41 +02:00
compat.py Configure isort to use the Black profile, recursively isort the spacy module (#12721) 2023-06-14 17:48:41 +02:00
default_config_pretraining.cfg Add new parameter for saving every n epoch in pretraining (#8912) 2021-08-12 11:14:48 +02:00
default_config.cfg Support registered vectors (#12492) 2023-08-01 15:46:08 +02:00
errors.py Make Language.pipe workers exit cleanly (#13321) 2024-02-12 14:39:38 +01:00
glossary.py Configure isort to use the Black profile, recursively isort the spacy module (#12721) 2023-06-14 17:48:41 +02:00
language.py Remove dependency on langcodes (#13760) 2025-05-28 17:21:46 +02:00
lexeme.pxd Python 3.13 support (#13823) 2025-05-22 13:47:21 +02:00
lexeme.pyi Configure isort to use the Black profile, recursively isort the spacy module (#12721) 2023-06-14 17:48:41 +02:00
lexeme.pyx Python 3.13 support (#13823) 2025-05-22 13:47:21 +02:00
lookups.py Configure isort to use the Black profile, recursively isort the spacy module (#12721) 2023-06-14 17:48:41 +02:00
morphology.pxd ci: add cython linter (#12694) 2023-07-19 12:03:31 +02:00
morphology.pyx Fix allocation of non-transient strings in StringStore (#13713) 2024-12-11 13:06:53 +01:00
parts_of_speech.pxd ci: add cython linter (#12694) 2023-07-19 12:03:31 +02:00
parts_of_speech.pyx Python 3.13 support (#13823) 2025-05-22 13:47:21 +02:00
pipe_analysis.py Configure isort to use the Black profile, recursively isort the spacy module (#12721) 2023-06-14 17:48:41 +02:00
py.typed Add py.typed 2021-03-16 09:48:31 +01:00
registrations.py Python 3.13 support (#13823) 2025-05-22 13:47:21 +02:00
schemas.py Merge remote-tracking branch 'upstream/master' into chore/update-develop-from-master-v3.7-1 2023-09-28 15:09:06 +02:00
scorer.py Update for numpy 2.0 deprecations (#13103) 2023-11-06 08:47:53 +01:00
strings.pxd Fix memory zones 2024-09-09 13:49:41 +02:00
strings.pyi Configure isort to use the Black profile, recursively isort the spacy module (#12721) 2023-06-14 17:48:41 +02:00
strings.pyx Fix allocation of non-transient strings in StringStore (#13713) 2024-12-11 13:06:53 +01:00
structs.pxd ci: add cython linter (#12694) 2023-07-19 12:03:31 +02:00
symbols.pxd ci: add cython linter (#12694) 2023-07-19 12:03:31 +02:00
symbols.pyx Python 3.13 support (#13823) 2025-05-22 13:47:21 +02:00
tokenizer.pxd Support 'memory zones' for user memory management (#13621) 2024-09-09 11:19:39 +02:00
tokenizer.pyx Python 3.13 support (#13823) 2025-05-22 13:47:21 +02:00
ty.py Configure isort to use the Black profile, recursively isort the spacy module (#12721) 2023-06-14 17:48:41 +02:00
typedefs.pxd Configure isort to use the Black profile, recursively isort the spacy module (#12721) 2023-06-14 17:48:41 +02:00
typedefs.pyx Add profile=False to currently unprofiled cython 2023-09-28 17:09:41 +02:00
util.py Remove dependency on langcodes (#13760) 2025-05-28 17:21:46 +02:00
vectors.pyx Python 3.13 support (#13823) 2025-05-22 13:47:21 +02:00
vocab.pxd Support 'memory zones' for user memory management (#13621) 2024-09-09 11:19:39 +02:00
vocab.pyi Support 'memory zones' for user memory management (#13621) 2024-09-09 11:19:39 +02:00
vocab.pyx Fix memory zones 2024-09-09 13:49:41 +02:00