* Move coref scoring code to scorer.py
Includes some renames to make names less generic.
* Refactor coval code to remove ternary expressions
* Black formatting
* Add header
* Make scorers into registered scorers
* Small test fixes
* Skip coref tests when torch not present
Coref can't be loaded without Torch, so nothing works.
* Fix remaining type issues
Some of this just involves ignoring types in thorny areas. Two main
issues:
1. Some things have weird types due to indirection/ argskwargs
2. xp2torch return type seems to have changed at some point
* Update spacy/scorer.py
Co-authored-by: kadarakos <kadar.akos@gmail.com>
* Small changes from review
* Be specific about the ValueError
* Type fix
Co-authored-by: kadarakos <kadar.akos@gmail.com>
This is necessary because one of the three old methods relied on scipy
for some complex problem solving. LEA is generally better for
evaluations.
The downside is that this means evaluations aren't comparable with many
papers, but canonical scoring can be supported using external eval
scripts or other methods.
The way fake batching works is that the pipeline component calls the
model repeatedly in a loop internally. It feels like this should break
something, but it worked in testing.
Another issue is that this changes the signature of some of the pipeline
functions, though I don't think that's an issue.
Tested with batch size of 2, so more testing is needed, but this is a
start.
This calculates scores as an average of three metrics. As noted in the
code, these metrics all have issues, but we want to use them to match up
with prior work.
This should be replaced with some simpler default scoring and the scorer
here should be moved to an external project to be passed in just for
generating the traditional scores.
This rewrites the loss to not use the Thinc crossentropy code at all.
The main difference here is that the negative predictions are being
masked out (= marginalized over), but negative gradient is still being
reflected.
I'm still not sure this is exactly right but models seem to train
reliably now.