* Add edit tree lemmatizer
Co-authored-by: Daniël de Kok <me@danieldk.eu>
* Hide edit tree lemmatizer labels
* Use relative imports
* Switch to single quotes in error message
* Type annotation fixes
Co-authored-by: Sofie Van Landeghem <svlandeg@users.noreply.github.com>
* Reformat edit_tree_lemmatizer with black
* EditTreeLemmatizer.predict: take Iterable
Co-authored-by: Sofie Van Landeghem <svlandeg@users.noreply.github.com>
* Validate edit trees during deserialization
This change also changes the serialized representation. Rather than
mirroring the deep C structure, we use a simple flat union of the match
and substitution node types.
* Move edit_trees to _edit_tree_internals
* Fix invalid edit tree format error message
* edit_tree_lemmatizer: remove outdated TODO comment
* Rename factory name to trainable_lemmatizer
* Ignore type instead of casting truths to List[Union[Ints1d, Floats2d, List[int], List[str]]] for thinc v8.0.14
* Switch to Tagger.v2
* Add documentation for EditTreeLemmatizer
* docs: Fix 3.2 -> 3.3 somewhere
* trainable_lemmatizer documentation fixes
* docs: EditTreeLemmatizer is in edit_tree_lemmatizer.py
Co-authored-by: Daniël de Kok <me@danieldk.eu>
Co-authored-by: Daniël de Kok <me@github.danieldk.eu>
Co-authored-by: Sofie Van Landeghem <svlandeg@users.noreply.github.com>
* Fix infix as prefix in Tokenizer.explain
Update `Tokenizer.explain` to align with the `Tokenizer` algorithm:
* skip infix matches that are prefixes in the current substring
* Update tokenizer pseudocode in docs
* Update Tokenizer.explain with special matches
Update `Tokenizer.explain` and the pseudo-code in the docs to include
the processing of special cases that contain affixes or whitespace.
* Handle optional settings in explain
* Add test for special matches in explain
Add test for `Tokenizer.explain` for special cases containing affixes.
* raise NotImplementedError when noun_chunks iterator is not implemented
* bring back, fix and document span.noun_chunks
* formatting
Co-authored-by: Matthew Honnibal <honnibal+gh@gmail.com>