While working on an unrelated task I got warnings about an unsupported
escape sequence (`"\("`) in the tokenizer exceptions. Making the
tokenizer exceptions a raw string makes this warning go away.
The specific string that triggered this is `¯\(ツ)/¯`.
* pytest file for issue4104 established
* edited default lookup english lemmatizer for spun; fixes issue 4102
* eliminated parameterization and sorted dictionary dependnency in issue 4104 test
* added contributor agreement
* Norwegian fix
Add support for alternative past tense verb form (vaska).
* Norwegian months
Add all Norwegian months to tokenizer excpetions.
* More Norwegian abbreviations
Add more Norwegian abbreviations to tokenizer_exceptions.
* Contributor agreement khellan
Add signed contributor agreement for khellan (Knut O. Hellan).
* initial LT lang support
* Added more stopwords. Started setting up some basic test environment (not complete)
* Initial morph rules for LT lang
* Closes#1 Adds tokenizer exceptions for Lithuanian
* Closes#5 Punctuation rules. Closes#6 Lexical Attributes
* test: add native examples to basic tests
* feat: add tag map for lt lang
* fix: remove undefined tag attribute 'Definite'
* feat: add lemmatizer for lt lang
* refactor: add new instances to lt lang morph rules; use tags from tag map
* refactor: add morph rules to lt lang defaults
* refactor: only keep nouns, verbs, adverbs and adjectives in lt lang lemmatizer lookup
* refactor: add capitalized words to lt lang lemmatizer
* refactor: add more num words to lt lang lex attrs
* refactor: update lt lang stop word set
* refactor: add new instances to lt lang tokenizer exceptions
* refactor: remove comments form lt lang init file
* refactor: use function instead of lambda in lt lex lang getter
* refactor: remove conversion to dict in lt init when dict is already provided
* chore: rename lt 'test_basic' to 'test_text'
* feat: add more lt text tests
* feat: add lemmatizer tests
* refactor: remove unused imports, add newline to end of file
* chore: add contributor agreement
* chore: change 'en' to 'lt' in lt example description
* fix: add missing encoding info
* style: add newline to end of file
* refactor: use python2 compatible syntax
* style: reformat code using black
* Update norm_exceptions.py
Extended the Currency set to include Franc, Indian Rupee, Bangladeshi Taka, Korean Won, Mexican Dollar, and Egyptian Pound
* Fix formatting [ci skip]
* Adding Marathi language details and folder to it
* Adding few changes and running tests
* Adding few changes and running tests
* Update __init__.py
mh -> mr
* Rename spacy/lang/mh/__init__.py to spacy/lang/mr/__init__.py
* mh -> mr
* test sPacy commit to git fri 04052019 10:54
* change Data format from my format to master format
* ทัทั้งนี้ ---> ทั้งนี้
* delete stop_word translate from Eng
* Adjust formatting and readability
* add Thai norm_exception
* Add Dobita21 SCA
* editรึ : หรือ,
* Update Dobita21.md
* Auto-format
* Integrate norms into language defaults
* add acronym and some norm exception words
* add lex_attrs
* Add lexical attribute getters into the language defaults
* fix LEX_ATTRS
Co-authored-by: Donut <dobita21@gmail.com>
Co-authored-by: Ines Montani <ines@ines.io>
* test sPacy commit to git fri 04052019 10:54
* change Data format from my format to master format
* ทัทั้งนี้ ---> ทั้งนี้
* delete stop_word translate from Eng
* Adjust formatting and readability
* add Thai norm_exception
* Add Dobita21 SCA
* editรึ : หรือ,
* Update Dobita21.md
* Auto-format
* Integrate norms into language defaults
* add acronym and some norm exception words
* test sPacy commit to git fri 04052019 10:54
* change Data format from my format to master format
* ทัทั้งนี้ ---> ทั้งนี้
* delete stop_word translate from Eng
* Adjust formatting and readability
* add Thai norm_exception
* Add Dobita21 SCA
* editรึ : หรือ,
* Update Dobita21.md
* Auto-format
* Integrate norms into language defaults
* test sPacy commit to git fri 04052019 10:54
* change Data format from my format to master format
* ทัทั้งนี้ ---> ทั้งนี้
* delete stop_word translate from Eng
* Adjust formatting and readability
## Checklist
<!--- Before you submit the PR, go over this checklist and make sure you can
tick off all the boxes. [] -> [x] -->
- [ ] I have submitted the spaCy Contributor Agreement.
- [ ] I ran the tests, and all new and existing tests passed.
- [ ] My changes don't require a change to the documentation, or if they do, I've added all required information.
Co-authored-by: Ines Montani <ines@ines.io>
* added tag_map for indonesian
* changed tag map from .py to .txt to see if tests pass
* added symbols import
* added utf8 encoding flag
* added missing SCONJ symbol
* Auto-format
* Remove unused imports
* Make tag map available in Indonesian defaults