Instead of silently using only the first token in each matched span:
* Forbid `OP: ?/*/+` through `DependencyMatcher` validation
* As a fail-safe, add warning if a token match that's not exactly one
token long is found by a token pattern.
* add error handler for pipe methods
* add unit tests
* remove pipe method that are the same as their base class
* have Language keep track of a default error handler
* cleanup
* formatting
* small refactor
* add documentation
* Initial Spanish lemmatizer
* Handle merged verb+pron(s) multi-word tokens
* Use VERB for AUX rule lookup
* Add morph to lemma cache key
* Fix aux lookups, minor refactoring
* Improve verb+pron handling
* Move verb+pron handling into its own method
* Check for exceptions (primarily for se)
* Collect pronouns in the same (not reversed) order
* Only add modified possible lemmas
* Fix `spacy.util.minibatch` when the size iterator is finished (#6745)
* Skip 0-length matches (#6759)
Add hack to prevent matcher from returning 0-length matches.
* support IS_SENT_START in PhraseMatcher (#6771)
* support IS_SENT_START in PhraseMatcher
* add unit test and friendlier error
* use IDS.get instead
* ensure span.text works for an empty span (#6772)
* Remove unicode_literals
Co-authored-by: Santiago Castro <bryant@montevideo.com.uy>
Co-authored-by: Sofie Van Landeghem <svlandeg@users.noreply.github.com>
* Allow output_path to be None during training
* Fix cat scoring (?)
* Improve error message for weighted None score
* Improve messages
So we can call this in other places etc.
* FIx output path check
* Use latest wasabi
* Revert "Improve error message for weighted None score"
This reverts commit 7059926763.
* Exclude None scores from final score by default
It's otherwise very difficult to keep track of the score weights if we modify a config programmatically, source components etc.
* Update warnings and use logger.warning
* Spacy Cli info method causing backward compatibility issues #6791
fix backward compatibility by setting default value to exclude in info
method.
* setting empty list as default argument is dangerous.
so setting default to None and then setting it to emptylist, if None.
Reference : https://nikos7am.com/posts/mutable-default-arguments/
* Adding contributor agreement for user werew
* [DependencyMatcher] Comment and clean code
* [DependencyMatcher] Use defaultdicts
* [DependencyMatcher] Simplify _retrieve_tree method
* [DependencyMatcher] Remove prepended underscores
* [DependencyMatcher] Address TODO and move grouping of token's positions out of the loop
* [DependencyMatcher] Remove _nodes attribute
* [DependencyMatcher] Use enumerate in _retrieve_tree method
* [DependencyMatcher] Clean unused vars and use camel_case naming
* [DependencyMatcher] Memoize node+operator map
* Add root property to Token
* [DependencyMatcher] Groups matches by root
* [DependencyMatcher] Remove unused _keys_to_token attribute
* [DependencyMatcher] Use a list to map tokens to matcher's keys
* [DependencyMatcher] Remove recursion
* [DependencyMatcher] Use a generator to retrieve matches
* [DependencyMatcher] Remove unused memory pool
* [DependencyMatcher] Hide private methods and attributes
* [DependencyMatcher] Improvements to the matches validation
* Apply suggestions from code review
Co-authored-by: Matthew Honnibal <honnibal+gh@gmail.com>
* [DependencyMatcher] Fix keys_to_position_maps
* Remove Token.root property
* [DependencyMatcher] Remove functools' lru_cache
Co-authored-by: Matthew Honnibal <honnibal+gh@gmail.com>
* warn when frozen components break listener pattern
* few notes in the documentation
* update arg name
* formatting
* cleanup
* specify listeners return type