spaCy/spacy
Jeff Adolphe 41e07772dc
Added Haitian Creole (ht) Language Support to spaCy (#13807)
This PR adds official support for Haitian Creole (ht) to spaCy's spacy/lang module.
It includes:

    Added all core language data files for spacy/lang/ht:
        tokenizer_exceptions.py
        punctuation.py
        lex_attrs.py
        syntax_iterators.py
        lemmatizer.py
        stop_words.py
        tag_map.py

    Unit tests for the tokenizer and noun chunking (test_tokenizer.py, test_noun_chunking.py, etc.). All 58 tests I added under spacy/tests/lang/ht pass with pytest.

    Basic tokenizer rules adapted for Haitian Creole orthography and informal contractions.

    Custom like_num attribute supporting Haitian number formats (e.g., "3yèm").

    Support for common informal apostrophe usage (e.g., "m'ap", "n'ap", "di'm").

    Ensured no breakages in other language modules.

    Followed spaCy coding style (PEP8, Black).

This provides a foundation for Haitian Creole NLP development using spaCy.
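
To illustrate the like_num item above, here is a minimal, self-contained sketch of how a lexical attribute getter might accept an ordinal such as "3yèm". It is not the code shipped in spacy/lang/ht/lex_attrs.py: the `_NUM_WORDS` set is a tiny hypothetical sample, and the regular expression assumes that ordinals are written as digits followed by "yèm", which is the only format the PR description mentions.

```python
import re

# Hypothetical sample of number words, for illustration only;
# the real module may use a different (and much longer) list.
_NUM_WORDS = {"en", "de", "twa", "kat", "senk"}

# Assumption: an ordinal is one or more digits followed by "yèm", as in "3yèm".
_ORDINAL_RE = re.compile(r"^\d+yèm$")


def like_num(text: str) -> bool:
    """Return True if the token text looks like a number."""
    text = text.strip().lower()
    # Drop a single leading sign or approximation marker.
    if text.startswith(("+", "-", "±", "~")):
        text = text[1:]
    # Plain integers, optionally with thousands/decimal separators.
    if text.replace(",", "").replace(".", "").isdigit():
        return True
    # Simple fractions such as "1/2".
    if text.count("/") == 1:
        num, denom = text.split("/")
        if num.isdigit() and denom.isdigit():
            return True
    # Digit-based ordinals such as "3yèm".
    if _ORDINAL_RE.match(text):
        return True
    # Spelled-out number words.
    return text in _NUM_WORDS
```

In spaCy, a function of this shape is typically registered under the `LIKE_NUM` key of a language's `LEX_ATTRS` dictionary, after which the result is exposed as `token.like_num`.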
2025-05-28 17:23:38 +02:00
cli fix typos (#13813) 2025-05-26 16:05:29 +02:00
displacy Fix displacy span stacking (#13068) 2023-11-02 12:02:18 +01:00
kb Update __all__ fields (#13063) 2023-10-16 10:17:47 +02:00
lang Added Haitian Creole (ht) Language Support to spaCy (#13807) 2025-05-28 17:23:38 +02:00
matcher Python 3.13 support (#13823) 2025-05-22 13:47:21 +02:00
ml Python 3.13 support (#13823) 2025-05-22 13:47:21 +02:00
pipeline fix: match hyphenated words to lemmas in index_table (e.g. "co-authored" -> "co-author") (#13816) 2025-05-27 01:20:26 +02:00
tests Added Haitian Creole (ht) Language Support to spaCy (#13807) 2025-05-28 17:23:38 +02:00
tokens Test and fix issue13769 2025-05-28 17:04:23 +02:00
training Python 3.13 support (#13823) 2025-05-22 13:47:21 +02:00
__init__.pxd * Seems to be working after refactor. Need to wire up more POS tag features, and wire up save/load of POS tags. 2014-10-24 02:23:42 +11:00
__init__.py Python 3.13 support (#13823) 2025-05-22 13:47:21 +02:00
__main__.py Tidy up 2020-06-22 00:45:40 +02:00
about.py Increment version 2025-05-22 13:58:00 +02:00
attrs.pxd ci: add cython linter (#12694) 2023-07-19 12:03:31 +02:00
attrs.pyx Add profile=False to currently unprofiled cython 2023-09-28 17:09:41 +02:00
compat.py Configure isort to use the Black profile, recursively isort the spacy module (#12721) 2023-06-14 17:48:41 +02:00
default_config_pretraining.cfg Add new parameter for saving every n epoch in pretraining (#8912) 2021-08-12 11:14:48 +02:00
default_config.cfg Support registered vectors (#12492) 2023-08-01 15:46:08 +02:00
errors.py Make Language.pipe workers exit cleanly (#13321) 2024-02-12 14:39:38 +01:00
glossary.py Configure isort to use the Black profile, recursively isort the spacy module (#12721) 2023-06-14 17:48:41 +02:00
language.py Remove dependency on langcodes (#13760) 2025-05-28 17:21:46 +02:00
lexeme.pxd Python 3.13 support (#13823) 2025-05-22 13:47:21 +02:00
lexeme.pyi Configure isort to use the Black profile, recursively isort the spacy module (#12721) 2023-06-14 17:48:41 +02:00
lexeme.pyx Python 3.13 support (#13823) 2025-05-22 13:47:21 +02:00
lookups.py Configure isort to use the Black profile, recursively isort the spacy module (#12721) 2023-06-14 17:48:41 +02:00
morphology.pxd ci: add cython linter (#12694) 2023-07-19 12:03:31 +02:00
morphology.pyx Fix allocation of non-transient strings in StringStore (#13713) 2024-12-11 13:06:53 +01:00
parts_of_speech.pxd ci: add cython linter (#12694) 2023-07-19 12:03:31 +02:00
parts_of_speech.pyx Python 3.13 support (#13823) 2025-05-22 13:47:21 +02:00
pipe_analysis.py Configure isort to use the Black profile, recursively isort the spacy module (#12721) 2023-06-14 17:48:41 +02:00
py.typed Add py.typed 2021-03-16 09:48:31 +01:00
registrations.py Python 3.13 support (#13823) 2025-05-22 13:47:21 +02:00
schemas.py Merge remote-tracking branch 'upstream/master' into chore/update-develop-from-master-v3.7-1 2023-09-28 15:09:06 +02:00
scorer.py Update for numpy 2.0 deprecations (#13103) 2023-11-06 08:47:53 +01:00
strings.pxd Fix memory zones 2024-09-09 13:49:41 +02:00
strings.pyi Configure isort to use the Black profile, recursively isort the spacy module (#12721) 2023-06-14 17:48:41 +02:00
strings.pyx Fix allocation of non-transient strings in StringStore (#13713) 2024-12-11 13:06:53 +01:00
structs.pxd ci: add cython linter (#12694) 2023-07-19 12:03:31 +02:00
symbols.pxd ci: add cython linter (#12694) 2023-07-19 12:03:31 +02:00
symbols.pyx Python 3.13 support (#13823) 2025-05-22 13:47:21 +02:00
tokenizer.pxd Support 'memory zones' for user memory management (#13621) 2024-09-09 11:19:39 +02:00
tokenizer.pyx Python 3.13 support (#13823) 2025-05-22 13:47:21 +02:00
ty.py Configure isort to use the Black profile, recursively isort the spacy module (#12721) 2023-06-14 17:48:41 +02:00
typedefs.pxd Configure isort to use the Black profile, recursively isort the spacy module (#12721) 2023-06-14 17:48:41 +02:00
typedefs.pyx Add profile=False to currently unprofiled cython 2023-09-28 17:09:41 +02:00
util.py Remove dependency on langcodes (#13760) 2025-05-28 17:21:46 +02:00
vectors.pyx Python 3.13 support (#13823) 2025-05-22 13:47:21 +02:00
vocab.pxd Support 'memory zones' for user memory management (#13621) 2024-09-09 11:19:39 +02:00
vocab.pyi Support 'memory zones' for user memory management (#13621) 2024-09-09 11:19:39 +02:00
vocab.pyx Fix memory zones 2024-09-09 13:49:41 +02:00