* Added ENT_ID and ENT_KB_ID into the list of the attributes that Matcher matches on
* Added ENT_ID and ENT_KB_ID to TEST_PATTERNS in test_pattern_validation.py. Disabled tests that I added before
* Update website/docs/api/matcher.md
* Format
* Remove skipped tests
Co-authored-by: Adriane Boyd <adrianeboyd@gmail.com>
* added error string
* added serialization test
* added more to if statements
* wrote file to tempdir
* added tempdir
* changed parameter a bit
* Update spacy/tests/pipeline/test_entity_ruler.py
Co-authored-by: Sofie Van Landeghem <svlandeg@users.noreply.github.com>
If the predicted docs are missing annotation according to
`has_annotation`, treat the docs as having no predictions rather than
raising errors when the annotation is missing.
The motivation for this is a combined tokenization+sents scorer for a
component where the sents annotation is optional. To provide a single
scorer in the component factory, it needs to be possible for the scorer
to continue despite missing sents annotation in the case where the
component is not annotating sents.
* Add note on batch contract
Using listeners requires batches to be consistent. This is obvious if
you understand how the listener works, but it wasn't clearly stated in
the Docs, and was subtle enough that the EntityLinker missed it.
There is probably a clearer way to explain what the actual requirement
is, but I figure this is a good start.
* Rewrite to clarify role of caching
Exclude strings from `Vector.to_bytes()` comparions for v3.2+ `Vectors`
that now include the string store so that the source vector comparison
is only comparing the vectors and not the strings.
* Clarify how to fill in init_tok2vec after pretraining
* Ignore init_tok2vec arg in pretraining
* Update docs, config setting
* Remove obsolete note about not filling init_tok2vec early
This seems to have also caught some lines that needed cleanup.