* spaCy CLI info method causing backward compatibility issues #6791
Fix backward compatibility by giving the exclude argument of the info
method a default value.
* Setting an empty list as a default argument is dangerous,
so set the default to None and replace it with an empty list if it is None.
Reference : https://nikos7am.com/posts/mutable-default-arguments/
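A minimal sketch of the pitfall described above. The function names
(`info_buggy`, `info_fixed`) and the `item` parameter are hypothetical and
only illustrate the pattern; they are not spaCy's actual `info` signature.

```python
def info_buggy(item, exclude=[]):
    # The default list is created once, when the function is defined,
    # and is then shared by every call that omits `exclude`.
    exclude.append(item)
    return exclude


def info_fixed(item, exclude=None):
    # Using None as the sentinel gives each call its own fresh list.
    if exclude is None:
        exclude = []
    exclude.append(item)
    return exclude


first_buggy = info_buggy("a")
second_buggy = info_buggy("b")  # same list object as first_buggy
first_fixed = info_fixed("a")
second_fixed = info_fixed("b")  # independent list
```

With the mutable default, state leaks from one call into the next; with the
None sentinel, each call starts from an empty list.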
* Adding contributor agreement for user werew
* [DependencyMatcher] Comment and clean code
* [DependencyMatcher] Use defaultdicts
* [DependencyMatcher] Simplify _retrieve_tree method
* [DependencyMatcher] Remove prepended underscores
* [DependencyMatcher] Address TODO and move grouping of token's positions out of the loop
* [DependencyMatcher] Remove _nodes attribute
* [DependencyMatcher] Use enumerate in _retrieve_tree method
* [DependencyMatcher] Clean unused vars and use camel_case naming
* [DependencyMatcher] Memoize node+operator map
* Add root property to Token
* [DependencyMatcher] Groups matches by root
* [DependencyMatcher] Remove unused _keys_to_token attribute
* [DependencyMatcher] Use a list to map tokens to matcher's keys
* [DependencyMatcher] Remove recursion
* [DependencyMatcher] Use a generator to retrieve matches
* [DependencyMatcher] Remove unused memory pool
* [DependencyMatcher] Hide private methods and attributes
* [DependencyMatcher] Improvements to the matches validation
* Apply suggestions from code review
Co-authored-by: Matthew Honnibal <honnibal+gh@gmail.com>
* [DependencyMatcher] Fix keys_to_position_maps
* Remove Token.root property
* [DependencyMatcher] Remove functools' lru_cache
Co-authored-by: Matthew Honnibal <honnibal+gh@gmail.com>
* Update stop_words.py
Added three additional stopwords: "a" and "o", which mean "the", and "e", which means "and"
* Create cristianasp.md
* zero edit to push CI
Co-authored-by: Sofie Van Landeghem <svlandeg@users.noreply.github.com>
* add syntax iterators for danish
* add test noun chunks for danish syntax iterators
* add contributor agreement
* update da syntax iterators to remove nested chunks
* add tests for da noun chunks
* Fix test
* add missing import
* fix example
* Prevent overlapping noun chunks
Prevent overlapping noun chunks by tracking the end index of the
previous noun chunk span.
Co-authored-by: Adriane Boyd <adrianeboyd@gmail.com>
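The idea of tracking the end index of the previous noun chunk can be sketched
as follows. This is an illustration only, not the actual syntax iterator code:
`non_overlapping` is a hypothetical helper, spans are plain half-open
`(start, end)` tuples, and candidates are assumed to arrive in document order.

```python
def non_overlapping(candidates):
    """Yield (start, end) spans, skipping any span that starts
    before the previously yielded span has ended."""
    prev_end = 0  # end index of the last span that was kept
    for start, end in candidates:
        if start < prev_end:
            # Overlaps the previous chunk, so drop it.
            continue
        prev_end = end
        yield start, end


chunks = list(non_overlapping([(0, 3), (2, 4), (4, 6), (5, 7)]))
```

Because only one integer is carried between iterations, the check adds no
extra pass over the candidates.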
* Add Amharic to spaCy
* clean up
* Add some PRON_LEMMA
* add Tigrinya support
* remove text_noun_chunks
* Tigrinya Support
* added some more details for ti
* fix unit test
* add amharic char range
* changes from review
* amharic and tigrinya share same unicode block
* get rid of _amharic/_tigrinya in char_classes
Co-authored-by: Josiah Solomon <jsolomon@meteorcomm.com>
* Avoid a SyntaxError in self-attentive-parser
Fix the use of quotation marks in the self-attentive-parser example in spaCy Universe
* Create forest1988.md
Fill in the spaCy contributor agreement
* Adding Mindmeld to Universe JSON
Mindmeld is a conversational AI platform for deep-domain voice interfaces and chatbots. https://www.mindmeld.com/
* Signing contribution agreement.
Co-authored-by: kunshar2 <kunshar2@cisco.com>
* Include Macedonian language
* Fix indentation at char_classes.py
* Add Macedonian tests, update lex_attrs and char_classes
* Import unicode literals for python 2
* Regression test for issue 6207
* Fix issue 6207
* Sign contributor agreement
* Minor adjustments to test
Co-authored-by: Adriane Boyd <adrianeboyd@gmail.com>
* Hindi: Adds tests for lexical attributes (norm and like_num)
* Signs and adds the contributor agreement
* Add ordinal numbers to be tagged as like_num
* Adds alternate pronunciation for 31 and 39
Co-authored-by: Adriane Boyd <adrianeboyd@gmail.com>
* create contributor agreement
* Update Indonesian example. (see #1107)
Update Indonesian examples with more appropriate phrases. The current phrases contain sensitive and violent words.
* Update stop_words.py
Hebrew STOP WORDS
* Update stop_words.py
* contributor
* contributor
* add some common domain extensions
support human number 1K/1M....
* support human number 1K/1M....
* hebrew number tokenize
1K/1M implement in EN
* test human tokenize fix
* test
* heb like num
revert human number change
* heb like num