spaCy/website/docs/api
Adriane Boyd 85778dfcf4
Add edit tree lemmatizer (#10231)
* Add edit tree lemmatizer

Co-authored-by: Daniël de Kok <me@danieldk.eu>

* Hide edit tree lemmatizer labels

* Use relative imports

* Switch to single quotes in error message

* Type annotation fixes

Co-authored-by: Sofie Van Landeghem <svlandeg@users.noreply.github.com>

* Reformat edit_tree_lemmatizer with black

* EditTreeLemmatizer.predict: take Iterable

Co-authored-by: Sofie Van Landeghem <svlandeg@users.noreply.github.com>

* Validate edit trees during deserialization

This also alters the serialized representation. Rather than
mirroring the deep C structure, we use a simple flat union of the match
and substitution node types.

* Move edit_trees to _edit_tree_internals

* Fix invalid edit tree format error message

* edit_tree_lemmatizer: remove outdated TODO comment

* Rename factory name to trainable_lemmatizer

* Ignore type instead of casting truths to List[Union[Ints1d, Floats2d, List[int], List[str]]] for thinc v8.0.14

* Switch to Tagger.v2

* Add documentation for EditTreeLemmatizer

* docs: Fix 3.2 -> 3.3 somewhere

* trainable_lemmatizer documentation fixes

* docs: EditTreeLemmatizer is in edit_tree_lemmatizer.py

Co-authored-by: Daniël de Kok <me@danieldk.eu>
Co-authored-by: Daniël de Kok <me@github.danieldk.eu>
Co-authored-by: Sofie Van Landeghem <svlandeg@users.noreply.github.com>
2022-03-28 11:13:50 +02:00
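
The commit message above mentions validating edit trees during deserialization and serializing them as a flat union of match and substitution node types. The following is a minimal sketch of that idea in plain Python, not spaCy's actual implementation. Every name here (`MatchNode`, `SubstNode`, `node_from_dict`, `trees_from_list`, the field names, and the `NO_TREE` sentinel) is an assumption made for illustration.

```python
from dataclasses import dataclass
from typing import Any, Dict, List, Union

NO_TREE = -1  # hypothetical sentinel meaning "no child tree"


@dataclass(frozen=True)
class MatchNode:
    """Interior node: keeps a matched span and refers to child trees by index."""
    prefix_len: int
    suffix_len: int
    prefix_tree: int
    suffix_tree: int


@dataclass(frozen=True)
class SubstNode:
    """Leaf node: replaces the string `orig` with `subst`."""
    orig: str
    subst: str


Node = Union[MatchNode, SubstNode]

MATCH_KEYS = {"prefix_len", "suffix_len", "prefix_tree", "suffix_tree"}
SUBST_KEYS = {"orig", "subst"}


def node_from_dict(data: Dict[str, Any], n_trees: int) -> Node:
    """Validate one flat record and return the node type its keys identify."""
    keys = set(data)
    if keys == SUBST_KEYS:
        if not all(isinstance(data[k], str) for k in SUBST_KEYS):
            raise ValueError(f"substitution fields must be strings: {data!r}")
        return SubstNode(orig=data["orig"], subst=data["subst"])
    if keys == MATCH_KEYS:
        # `type(...) is int` also rejects booleans, which subclass int.
        if not all(type(data[k]) is int for k in MATCH_KEYS):
            raise ValueError(f"match fields must be integers: {data!r}")
        if data["prefix_len"] < 0 or data["suffix_len"] < 0:
            raise ValueError(f"lengths must be non-negative: {data!r}")
        for key in ("prefix_tree", "suffix_tree"):
            if data[key] != NO_TREE and not 0 <= data[key] < n_trees:
                raise ValueError(f"{key} is out of range: {data!r}")
        return MatchNode(**data)
    raise ValueError(f"not a match or substitution node: {data!r}")


def trees_from_list(items: List[Dict[str, Any]]) -> List[Node]:
    """Deserialize a flat list of records, validating each one."""
    return [node_from_dict(item, len(items)) for item in items]
```

Because the records are flat and children are referenced by index, each one can be checked on its own without recursing through a nested structure. A record whose keys match neither node type is rejected with a `ValueError`.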
architectures.md Tagger: use unnormalized probabilities for inference (#10197) 2022-03-15 14:15:31 +01:00
attributeruler.md Document scorers in registry and components from #8766 (#8929) 2021-08-12 12:50:03 +02:00
cli.md Fix references to config file in the docs & UX (#9961) 2022-01-04 14:31:26 +01:00
corpus.md Add shuffle parameter to Corpus API docs (#10220) 2022-02-07 14:55:53 +01:00
cython-classes.md Update docs, types and API consistency 2020-08-17 16:45:24 +02:00
cython-structs.md Update docs, types and API consistency 2020-08-17 16:45:24 +02:00
cython.md Update docs [ci skip] 2020-09-12 17:05:10 +02:00
data-formats.md Fix references to config file in the docs & UX (#9961) 2022-01-04 14:31:26 +01:00
dependencymatcher.md doc fixes 2020-09-12 17:38:54 +02:00
dependencyparser.md Fix types in API docs for moves in parser and ner (#10464) 2022-03-08 13:51:11 +01:00
doc.md Token sent attributes more consistent (#10164) 2022-02-08 08:35:37 +01:00
docbin.md Fix point typo on docbin docs (#9097) 2021-08-31 10:55:44 +02:00
edittreelemmatizer.md Add edit tree lemmatizer (#10231) 2022-03-28 11:13:50 +02:00
entitylinker.md Fix entity linker batching (#9669) 2022-03-04 09:17:36 +01:00
entityrecognizer.md Fix types in API docs for moves in parser and ner (#10464) 2022-03-08 13:51:11 +01:00
entityruler.md Add link to pattern file info in EntityRuler.initialize docs (#10091) 2022-01-19 10:45:11 +01:00
example.md Extend score_spans for overlapping & non-labeled spans (#7209) 2021-04-08 12:19:17 +02:00
index.md Update v3 docs 2020-07-03 16:48:21 +02:00
kb.md Tidy up docs 2021-06-28 12:08:15 +02:00
language.md Merge remote-tracking branch 'upstream/develop' into chore/switch-to-master-v3.2.0 2021-11-03 15:32:18 +01:00
legacy.md Clean up loggers docs (#10351) 2022-02-25 16:29:12 +01:00
lemmatizer.md Add edit tree lemmatizer (#10231) 2022-03-28 11:13:50 +02:00
lexeme.md fix 's typo's across code base (#8384) 2021-06-15 10:57:08 +02:00
lookups.md Update docs, types and API consistency 2020-08-17 16:45:24 +02:00
matcher.md Add NORM to Matcher feature in docs (#10560) 2022-03-28 10:35:47 +02:00
morphologizer.md Update overwrite and scorer in API docs (#9384) 2021-10-11 10:35:07 +02:00
morphology.md Document Assigned Attributes of Pipeline Components (#9041) 2021-09-01 12:09:39 +02:00
phrasematcher.md 🏷 Add Mypy check to CI and ignore all existing Mypy errors (#9167) 2021-10-14 15:21:40 +02:00
pipe.md Document scorers in registry and components from #8766 (#8929) 2021-08-12 12:50:03 +02:00
pipeline-functions.md Add doc_cleaner component (#9659) 2021-11-23 15:33:33 +01:00
scorer.md Add micro PRF for morph scoring (#9546) 2021-10-29 10:29:29 +02:00
sentencerecognizer.md Update overwrite and scorer in API docs (#9384) 2021-10-11 10:35:07 +02:00
sentencizer.md Update overwrite and scorer in API docs (#9384) 2021-10-11 10:35:07 +02:00
span.md Clarify Span.ents documentation (#10154) 2022-01-31 08:41:42 +01:00
spancategorizer.md Save span candidates produced by spancat suggesters (#10413) 2022-03-14 16:46:58 +01:00
spangroup.md Warn and document spangroup.doc weakref (#8980) 2021-08-20 11:06:19 +02:00
stringstore.md Update docs, types and API consistency 2020-08-17 16:45:24 +02:00
tagger.md Document Tagger neg_prefix, fix typo (#9821) 2021-12-07 09:42:40 +01:00
textcategorizer.md Fix Scorer.score_cats for missing labels (#9443) 2021-12-29 11:04:39 +01:00
tok2vec.md Tidy up docs 2021-06-28 12:08:15 +02:00
token.md Token sent attributes more consistent (#10164) 2022-02-08 08:35:37 +01:00
tokenizer.md Add tokenizer option to allow Matcher handling for all rules (#10452) 2022-03-24 13:21:32 +01:00
top-level.md Add displacy support for overlapping Spans (#10332) 2022-03-16 18:14:34 +01:00
transformer.md Update docs for spacy-transformers v1.1 data classes (#9361) 2021-10-18 14:16:58 +02:00
vectors.md Fix Vectors.n_keys for floret vectors (#10394) 2022-03-01 09:21:25 +01:00
vocab.md Update docs for Vocab.get_vector (#10486) 2022-03-15 09:10:47 +01:00