* Adding Mindmeld to Universe JSON
Mindmeld is a conversational AI platform for deep-domain voice interfaces and chatbots. https://www.mindmeld.com/
* Signing contribution agreement.
Co-authored-by: kunshar2 <kunshar2@cisco.com>
* Include Macedonian language
* Fix indentation at char_classes.py
* Fix indentation at char_classes.py
* Add Macedonian tests, update lex_attrs and char_classes
* Import unicode literals for python 2
* Hindi: Adds tests for lexical attributes (norm and like_num)
* Signs and sdds the contributor agreement
* Add ordinal numbers to be tagged as like_num
* Adds alternate pronunciation for 31 and 39
* Regression test for issue 6207
* Fix issue 6207
* Sign contributor agreement
* Minor adjustments to test
Co-authored-by: Adriane Boyd <adrianeboyd@gmail.com>
* create contributor agreement
* Update Indonesian example. (see #1107)
Update Indonesian examples with more proper phrases. the current phrases contains sensitive and violent words.
* Update stop_words.py
Hebrew STOP WORDS
* Update stop_words.py
* contributor
* contributor
* add some common domain extentions
support human number 1K/1M....
* support human number 1K/1M....
* hebrew number tokenize
1K/1M implement in EN
* test human tokenize fix
* test
* heb like num
revert human number change
* heb like num
* fix the wrong hash url in adding-languages.md file
change the #101 url hash path to #language-data
* filled in the spaCy Contributor Agreement
filled in the spaCy Contributor Agreement
* Added Examples for Tamil Sentences
#### Description
This PR add example sentences for the Tamil language which were missing as per issue #1107
#### Type of Change
This is an enhancement.
* Accepting spaCy Contributor Agreement
* Signed on my behalf as an individual
* added setting for neighbour sentence in NEL
* added spaCy contributor agreement
* added multi sentence also for training
* made the try-except block smaller
* added contextualSpellCheck in spacy universe meta
* removed extra formatting by code
* updated with permanent links
* run json linter used by spacy
* filled SCA
* updated the description
* Limiting noun_chunks for specific langauges
* Limiting noun_chunks for specific languages
Contributor Agreement
* Addressing review comments
* Removed unused fixtures and imports
* Add fa_tokenizer in test suite
* Use fa_tokenizer in test
* Undo extraneous reformatting
Co-authored-by: adrianeboyd <adrianeboyd@gmail.com>
Reconstruction of the original PR #4697 by @MiniLau.
Removes unused `SENT_END` symbol and `IS_SENT_END` from `Matcher` schema
because the Matcher is only going to be able to support `IS_SENT_START`.