spaCy/spacy/tests/lang
Jeff Adolphe 41e07772dc
Added Haitian Creole (ht) Language Support to spaCy (#13807)
This PR adds official support for Haitian Creole (ht) to spaCy's spacy/lang module.
It includes:

    Added all core language data files for spacy/lang/ht:
        tokenizer_exceptions.py
        punctuation.py
        lex_attrs.py
        syntax_iterators.py
        lemmatizer.py
        stop_words.py
        tag_map.py

    Unit tests for the tokenizer and noun chunking (test_tokenizer.py, test_noun_chunking.py, etc.). All 58 tests I added under spacy/tests/lang/ht pass with pytest.

    Basic tokenizer rules adapted for Haitian Creole orthography and informal contractions.

    Custom like_num attribute supporting Haitian number formats (e.g., "3yèm").

    Support for common informal apostrophe usage (e.g., "m'ap", "n'ap", "di'm").

    Ensured no breakages in other language modules.

    Followed spaCy coding style (PEP8, Black).

This provides a foundation for Haitian Creole NLP development using spaCy.
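
To illustrate the like_num item above, here is a minimal, self-contained sketch of how such a check might accept a digit-plus-suffix ordinal like "3yèm". It is not the code from this PR: the function body, the abbreviated word list `_NUM_WORDS`, and the suffix pattern `_ORDINAL_RE` are all assumptions made for illustration, and the real spacy/lang/ht/lex_attrs.py may differ.

```python
import re

# Hypothetical, abbreviated list of Haitian Creole number words (assumption;
# the actual lex_attrs.py is likely longer and may use other spellings).
_NUM_WORDS = {
    "zewo", "en", "de", "twa", "kat", "senk",
    "sis", "sèt", "uit", "nèf", "dis",
}

# Digits followed by the ordinal suffix seen in the example "3yèm".
# Treating "yèm" as the only suffix is an assumption of this sketch.
_ORDINAL_RE = re.compile(r"^\d+yèm$")


def like_num(text: str) -> bool:
    """Return True if the token text looks like a number."""
    text = text.strip().lower()
    # Drop a single leading sign or approximation mark.
    if text.startswith(("+", "-", "±", "~")):
        text = text[1:]
    # Plain digits, optionally with thousands or decimal separators.
    if text.replace(",", "").replace(".", "").isdigit():
        return True
    # Simple fractions such as "1/2".
    if text.count("/") == 1:
        num, denom = text.split("/")
        if num.isdigit() and denom.isdigit():
            return True
    # Spelled-out number words.
    if text in _NUM_WORDS:
        return True
    # Digit-based ordinals such as "3yèm".
    return bool(_ORDINAL_RE.match(text))


if __name__ == "__main__":
    words = "Li rive 3yèm nan kous la".split()
    print([w for w in words if like_num(w)])  # prints ['3yèm']
```

In spaCy, a function of this shape is typically registered under the LIKE_NUM key of a language's LEX_ATTRS dictionary, so the tokenizer exposes the result as token.like_num.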
2025-05-28 17:23:38 +02:00
af New tests for a number of alpha languages (#9703) 2021-11-28 21:59:23 +01:00
am Tidy up and auto-format 2021-01-15 11:57:36 +11:00
ar Remove POS, TAG and LEMMA from tokenizer exceptions 2020-07-22 23:09:01 +02:00
bg Handle Cyrillic combining diacritics (#10837) 2022-06-28 15:35:32 +02:00
bn Configure isort to use the Black profile, recursively isort the spacy module (#12721) 2023-06-14 17:48:41 +02:00
bo Format 2024-09-09 11:22:52 +02:00
ca Update requirements, fixing windows crashes (#13727) 2025-01-13 16:39:46 +01:00
cs Remove unicode declarations and update language data 2020-09-04 13:19:16 +02:00
da Configure isort to use the Black profile, recursively isort the spacy module (#12721) 2023-06-14 17:48:41 +02:00
de Tidy up and auto-format 2020-09-29 21:39:28 +02:00
dsb Add Lower Sorbian support. (#10431) 2022-03-07 16:57:14 +01:00
el Tidy up and auto-format 2020-09-29 21:39:28 +02:00
en Configure isort to use the Black profile, recursively isort the spacy module (#12721) 2023-06-14 17:48:41 +02:00
es Configure isort to use the Black profile, recursively isort the spacy module (#12721) 2023-06-14 17:48:41 +02:00
et New tests for a number of alpha languages (#9703) 2021-11-28 21:59:23 +01:00
eu Merge branch 'develop' into master-tmp 2020-05-21 18:39:06 +02:00
fa Tidy up and auto-format 2020-09-29 21:39:28 +02:00
fi Configure isort to use the Black profile, recursively isort the spacy module (#12721) 2023-06-14 17:48:41 +02:00
fo Feature/nn and fo language extensions (#13116) 2023-11-20 07:49:59 +01:00
fr Configure isort to use the Black profile, recursively isort the spacy module (#12721) 2023-06-14 17:48:41 +02:00
ga Configure isort to use the Black profile, recursively isort the spacy module (#12721) 2023-06-14 17:48:41 +02:00
grc Configure isort to use the Black profile, recursively isort the spacy module (#12721) 2023-06-14 17:48:41 +02:00
gu Remove unicode declarations and tidy up 2020-06-21 22:34:10 +02:00
he Configure isort to use the Black profile, recursively isort the spacy module (#12721) 2023-06-14 17:48:41 +02:00
hi Configure isort to use the Black profile, recursively isort the spacy module (#12721) 2023-06-14 17:48:41 +02:00
hr New tests for a number of alpha languages (#9703) 2021-11-28 21:59:23 +01:00
hsb Add Upper Sorbian support. (#10432) 2022-03-07 16:20:39 +01:00
ht Added Haitian Creole (ht) Language Support to spaCy (#13807) 2025-05-28 17:23:38 +02:00
hu Configure isort to use the Black profile, recursively isort the spacy module (#12721) 2023-06-14 17:48:41 +02:00
hy Configure isort to use the Black profile, recursively isort the spacy module (#12721) 2023-06-14 17:48:41 +02:00
id Configure isort to use the Black profile, recursively isort the spacy module (#12721) 2023-06-14 17:48:41 +02:00
is New tests for a number of alpha languages (#9703) 2021-11-28 21:59:23 +01:00
it Configure isort to use the Black profile, recursively isort the spacy module (#12721) 2023-06-14 17:48:41 +02:00
ja Update requirements, fixing windows crashes (#13727) 2025-01-13 16:39:46 +01:00
kmr Format 2024-09-09 11:22:52 +02:00
ko Update requirements, fixing windows crashes (#13727) 2025-01-13 16:39:46 +01:00
ky Configure isort to use the Black profile, recursively isort the spacy module (#12721) 2023-06-14 17:48:41 +02:00
la Configure isort to use the Black profile, recursively isort the spacy module (#12721) 2023-06-14 17:48:41 +02:00
lb Remove POS, TAG and LEMMA from tokenizer exceptions 2020-07-22 23:09:01 +02:00
lg luganda language extension (#10847) 2022-08-23 13:09:36 +02:00
lt Merge branch 'master' into tmp/sync 2020-03-26 13:38:14 +01:00
lv New tests for a number of alpha languages (#9703) 2021-11-28 21:59:23 +01:00
mk Configure isort to use the Black profile, recursively isort the spacy module (#12721) 2023-06-14 17:48:41 +02:00
ml Remove unicode declarations and tidy up 2020-06-21 22:34:10 +02:00
ms Configure isort to use the Black profile, recursively isort the spacy module (#12721) 2023-06-14 17:48:41 +02:00
nb Configure isort to use the Black profile, recursively isort the spacy module (#12721) 2023-06-14 17:48:41 +02:00
ne Tidy up and auto-format 2020-09-29 21:39:28 +02:00
nl Configure isort to use the Black profile, recursively isort the spacy module (#12721) 2023-06-14 17:48:41 +02:00
nn Feature/nn and fo language extensions (#13116) 2023-11-20 07:49:59 +01:00
pl Update requirements, fixing windows crashes (#13727) 2025-01-13 16:39:46 +01:00
pt Configure isort to use the Black profile, recursively isort the spacy module (#12721) 2023-06-14 17:48:41 +02:00
ro Configure isort to use the Black profile, recursively isort the spacy module (#12721) 2023-06-14 17:48:41 +02:00
ru Configure isort to use the Black profile, recursively isort the spacy module (#12721) 2023-06-14 17:48:41 +02:00
sa Tidy up and auto-format 2020-09-29 21:39:28 +02:00
sk New tests for a number of alpha languages (#9703) 2021-11-28 21:59:23 +01:00
sl Updates to Slovenian language (#11162) 2022-08-05 10:10:18 +02:00
sq New tests for a number of alpha languages (#9703) 2021-11-28 21:59:23 +01:00
sr Configure isort to use the Black profile, recursively isort the spacy module (#12721) 2023-06-14 17:48:41 +02:00
sv Configure isort to use the Black profile, recursively isort the spacy module (#12721) 2023-06-14 17:48:41 +02:00
ta Configure isort to use the Black profile, recursively isort the spacy module (#12721) 2023-06-14 17:48:41 +02:00
th Configure isort to use the Black profile, recursively isort the spacy module (#12721) 2023-06-14 17:48:41 +02:00
ti Update Tigrinya ትግርኛ language support (#8900) 2021-08-10 13:55:08 +02:00
tl Configure isort to use the Black profile, recursively isort the spacy module (#12721) 2023-06-14 17:48:41 +02:00
tr Configure isort to use the Black profile, recursively isort the spacy module (#12721) 2023-06-14 17:48:41 +02:00
tt Configure isort to use the Black profile, recursively isort the spacy module (#12721) 2023-06-14 17:48:41 +02:00
uk Configure isort to use the Black profile, recursively isort the spacy module (#12721) 2023-06-14 17:48:41 +02:00
ur Drop Python 2.7 and 3.5 (#4828) 2019-12-22 01:53:56 +01:00
vi Configure isort to use the Black profile, recursively isort the spacy module (#12721) 2023-06-14 17:48:41 +02:00
xx New tests for a number of alpha languages (#9703) 2021-11-28 21:59:23 +01:00
yo Configure isort to use the Black profile, recursively isort the spacy module (#12721) 2023-06-14 17:48:41 +02:00
zh Configure isort to use the Black profile, recursively isort the spacy module (#12721) 2023-06-14 17:48:41 +02:00
__init__.py Revert #4334 2019-09-29 17:32:12 +02:00
test_attrs.py Configure isort to use the Black profile, recursively isort the spacy module (#12721) 2023-06-14 17:48:41 +02:00
test_initialize.py Add Kurdish Kurmanji language (#13561) 2024-09-09 11:15:40 +02:00
test_lemmatizers.py Configure isort to use the Black profile, recursively isort the spacy module (#12721) 2023-06-14 17:48:41 +02:00