Temporary work-around for scoring a subset of components (#6090)
* Try hacking the scorer to work around sentence boundaries
* Upd scorer
* Set dev version
* Upd scorer hack
* Fix version
* Improve comment on hack
parent d32ce121be
commit bbdb5f62b7
@@ -270,6 +270,18 @@ class Scorer:
         for example in examples:
             pred_doc = example.predicted
             gold_doc = example.reference
+            # TODO
+            # This is a temporary hack to work around the problem that the scorer
+            # fails if you have examples that are not fully annotated for all
+            # the tasks in your pipeline. For instance, you might have a corpus
+            # of NER annotations that does not set sentence boundaries, but the
+            # pipeline includes a parser or senter, and then the score_weights
+            # are used to evaluate that component. When the scorer attempts
+            # to read the sentences from the gold document, it fails.
+            try:
+                list(getter(gold_doc, attr))
+            except ValueError:
+                continue
             # Find all labels in gold and doc
             labels = set(
                 [k.label_ for k in getter(gold_doc, attr)]
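The comment in the diff describes the failure mode concretely: a gold Doc that only carries NER annotations has no sentence boundaries, so asking it for sentence spans raises a ValueError, and the try/except above simply skips that example instead of crashing the evaluation. Below is a minimal illustrative sketch of that behaviour, not code from the commit; the blank pipeline and example text are assumptions.

import spacy

# A "gold" doc built from NER-only annotations: no parser, senter or
# sentencizer has run, so no sentence boundaries are set on it.
nlp = spacy.blank("en")
gold_doc = nlp("Apple is hiring engineers in London")

try:
    # Roughly what the scorer does when a getter asks for sentence spans:
    # iterating .sents on a doc without boundaries raises a ValueError.
    sents = list(gold_doc.sents)
except ValueError:
    # The work-around in the diff skips such examples rather than letting
    # the whole evaluation fail.
    sents = None

print(sents)  # None -> this example would be skipped by the scorer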