spaCy/spacy/lang/ht/examples.py
Jeff Adolphe 41e07772dc
Added Haitian Creole (ht) Language Support to spaCy (#13807)
This PR adds official support for Haitian Creole (ht) to spaCy's spacy/lang module.
It includes:

    Added all core language data files for spacy/lang/ht:
        tokenizer_exceptions.py
        punctuation.py
        lex_attrs.py
        syntax_iterators.py
        lemmatizer.py
        stop_words.py
        tag_map.py

    Unit tests for tokenizer and noun chunking (test_tokenizer.py, test_noun_chunking.py, etc.). Passed all 58 pytest spacy/tests/lang/ht tests that I've created.

    Basic tokenizer rules adapted for Haitian Creole orthography and informal contractions.

    Custom like_num atrribute supporting Haitian number formats (e.g., "3yèm").

    Support for common informal apostrophe usage (e.g., "m'ap", "n'ap", "di'm").

    Ensured no breakages in other language modules.

    Followed spaCy coding style (PEP8, Black).

This provides a foundation for Haitian Creole NLP development using spaCy.
2025-05-28 17:23:38 +02:00

19 lines
540 B
Python

"""
Example sentences to test spaCy and its language models.
>>> from spacy.lang.ht.examples import sentences
>>> docs = nlp.pipe(sentences)
"""
sentences = [
"Apple ap panse achte yon demaraj nan Wayòm Ini pou $1 milya dola",
"Machin otonòm fè responsablite asirans lan ale sou men fabrikan yo",
"San Francisco ap konsidere entèdi robo ki livre sou twotwa yo",
"Lond se yon gwo vil nan Wayòm Ini",
"Kote ou ye?",
"Kilès ki prezidan Lafrans?",
"Ki kapital Etazini?",
"Kile Barack Obama te fèt?",
]