Christoph Purschke
a7ee4b6f17
new tests & tokenization fixes (#4734)
- added some tests for tokenization issues
- fixed some issues with tokenization of words with hyphen infix
- rewrote the "tokenizer_exceptions.py" file (stemming from the German version)
2019-12-01 23:08:21 +01:00
Ines Montani
6e303de717
Auto-format
2019-11-20 13:15:24 +01:00
Christoph Purschke
433748e867
Fix basic language support for Luxembourgish (by adding punctuation.py) (#4648)
* Update __init__.py
* Create punctuation.py
* Update tokenizer_exceptions.py
* Create questoph.md
* Update questoph.md
* Update test_text.py
* Update test_text.py
* Update test_text.py
* Update test_text.py
2019-11-15 16:16:47 +01:00
Ines Montani
181c01f629
Tidy up and auto-format
2019-10-18 11:27:38 +02:00
Peter Gilles
428887b8f2
Initial commit: New language Luxembourgish (lb) (#4424)
* new language: Luxembourgish (lb)
* update
* update
* Update and rename .github/CONTRIBUTOR_AGREEMENT.md to .github/contributors/PeterGilles.md
* Update and rename .github/contributors/PeterGilles.md to .github/CONTRIBUTOR_AGREEMENT.md
* Update norm_exceptions.py
* Delete README.md
* moved test_lemma.py
* deactivated 'lemma_lookup = LOOKUP'
* update
* Update conftest.py
* update
* tests updated
* import unicode_literals
* Update spacy/tests/lang/lb/test_text.py
Co-Authored-By: Ines Montani <ines@ines.io>
* Create PeterGilles.md
2019-10-14 12:27:50 +02:00