Fix bug where `Morphologizer.get_loss` treated misaligned annotation as
`EMPTY_MORPH` rather than ignoring it. Remove unneeded default `EMPTY_MORPH`
mappings.
* Update pip in CI
* Use --prefer-binary
* Use `--prefer-binary`
* Delete all installed packages before testing source install
* sdist install with --only-binary :all:
For the `DependencyMatcher`:
* Fix on_match callback so that it is called once per matched pattern
* Fix results so that patterns with empty match lists are not returned
* Replace pytokenizations with internal alignment
Replace pytokenizations with internal alignment algorithm that is
restricted to only allow differences in whitespace and capitalization.
* Rename `spacy.training.align` to `spacy.training.alignment` to contain
the `Alignment` dataclass
* Implement `get_alignments` in `spacy.training.align`
* Refactor trailing whitespace handling
* Remove unnecessary exception for empty docs
Allow a non-empty whitespace-only doc to be aligned with an empty doc
* Remove empty docs exceptions completely
* Handle missing reference values in scorer
Handle missing values in reference doc during scoring where it is
possible to detect an unset state for the attribute. If no reference
docs contain annotation, `None` is returned instead of a score. `spacy
evaluate` displays `-` for missing scores and the missing scores are
saved as `None`/`null` in the metrics.
Attributes without unset states:
* `token.head`: relies on `token.dep` to recognize unset values
* `doc.cats`: unable to handle missing annotation
Additional changes:
* add optional `has_annotation` check to `score_scans` to replace
`doc.sents` hack
* update `score_token_attr_per_feat` to handle missing and empty morph
representations
* fix bug in `Doc.has_annotation` for normalization of `IS_SENT_START`
vs. `SENT_START`
* Fix import
* Update return types