* Move coref scoring code to scorer.py
Includes some renames to make names less generic.
* Refactor coval code to remove ternary expressions
* Black formatting
* Add header
* Make scorers into registered scorers
* Small test fixes
* Skip coref tests when torch not present
Coref can't be loaded without Torch, so none of its tests can run.
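A skip of this kind can be sketched with the standard library alone. The test class and its contents below are hypothetical and are not the actual spaCy tests; they only illustrate the pattern.

```python
import importlib.util
import unittest

# True only if the torch package can be located; nothing is imported here.
has_torch = importlib.util.find_spec("torch") is not None


@unittest.skipUnless(has_torch, "coref requires torch")
class TestCoref(unittest.TestCase):  # hypothetical test case
    def test_torch_is_importable(self):
        import torch  # only reached when torch is installed

        self.assertTrue(hasattr(torch, "tensor"))
```

In a pytest-based suite the same condition would typically be passed to `pytest.mark.skipif`.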
* Fix remaining type issues
Some of this just involves ignoring types in thorny areas. Two main
issues:
1. Some things have weird types due to indirection/ArgsKwargs
2. xp2torch return type seems to have changed at some point
* Update spacy/scorer.py
Co-authored-by: kadarakos <kadar.akos@gmail.com>
* Small changes from review
* Be specific about the ValueError
* Type fix
Co-authored-by: kadarakos <kadar.akos@gmail.com>
* Fix TODO about typing
Fix was simple: just request an array2f.
* Add type ignore
Maxout has a more restrictive type than the residual layer expects (only
Floats2d vs any Floats).
* Various cleanup
This moves a lot of lines around but doesn't change any functionality.
Details:
1. use `continue` to reduce indentation
2. move sentence doc building inside conditional since it's otherwise
unused
3. reduce some temporary assignments
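As an illustration of item 1 only, here is a minimal sketch of how `continue` flattens a nested loop without changing behavior. Both functions are hypothetical and do not come from the coref code.

```python
def nonempty_lengths_nested(items):
    """Original shape: each condition adds a level of indentation."""
    out = []
    for item in items:
        if item is not None:
            if len(item) > 0:
                out.append(len(item))
    return out


def nonempty_lengths_flat(items):
    """Same behavior: uninteresting items are skipped up front."""
    out = []
    for item in items:
        if item is None or len(item) == 0:
            continue
        out.append(len(item))
    return out
```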
* Make changes to typing
* Correction
* Format with black
* Corrections based on review
* Bumped Thinc dependency version
* Bumped blis requirement
* Correction for older Python versions
* Update spacy/ml/models/textcat.py
Co-authored-by: Daniël de Kok <me@github.danieldk.eu>
* Corrections based on review feedback
* Readd deleted docstring line
Co-authored-by: Daniël de Kok <me@github.danieldk.eu>
Torch is required for the coref/spanpred models but shouldn't be
required for spaCy in general.
The one tricky part of this is that one function in coref_util relied on
torch, but that file was imported in several places. Since the function
was only used in one place, I moved it there.
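For comparison, a related way to keep a shared module importable without Torch is to defer the import into the function body. This is a different technique from the move described above, and the helper below is hypothetical, not a function from coref_util.

```python
def get_device_name():  # hypothetical helper
    """Return a torch device name; torch is imported only when called."""
    try:
        import torch
    except ImportError as err:
        raise ImportError("This helper requires torch") from err
    return "cuda" if torch.cuda.is_available() else "cpu"
```

Importing the module that defines this helper never touches Torch; only calling the helper does.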
* Tagger: use unnormalized probabilities for inference
Using unnormalized softmax avoids use of the relatively expensive exp function,
which can significantly speed up non-transformer models (e.g. I got a speedup
of 27% on a German tagging + parsing pipeline).
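Skipping normalization is safe for prediction because softmax preserves the ordering of its inputs: the largest raw score is also the largest probability. A minimal pure-Python sketch follows. The helper names and example scores are illustrative and are not spaCy or Thinc code.

```python
import math


def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    shift = max(scores)
    exps = [math.exp(s - shift) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def argmax(values):
    """Index of the largest value."""
    return max(range(len(values)), key=values.__getitem__)


logits = [1.2, -0.3, 3.4, 0.7]  # made-up unnormalized scores for one token
# The predicted class is the same with or without the exp/normalize step.
assert argmax(logits) == argmax(softmax(logits))
```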
* Add spacy.Tagger.v2 with configurable normalization
Normalization of probabilities is disabled by default to improve
performance.
* Update documentation, models, and tests to spacy.Tagger.v2
* Move Tagger.v1 to spacy-legacy
* docs/architectures: run prettier
* Unnormalized softmax is now a Softmax_v2 option
* Require thinc 8.0.14 and spacy-legacy 3.0.9
The span predictor component is initialized but not used at all now.
The plan is to work on it after the word-level clustering part is trainable
end-to-end.