Mirror of https://github.com/explosion/spaCy.git, synced 2024-12-25 01:16:28 +03:00
Temporary work-around for scoring a subset of components (#6090)
* Try hacking the scorer to work around sentence boundaries
* Upd scorer
* Set dev version
* Upd scorer hack
* Fix version
* Improve comment on hack
parent d32ce121be
commit bbdb5f62b7
```diff
@@ -270,6 +270,18 @@ class Scorer:
         for example in examples:
             pred_doc = example.predicted
             gold_doc = example.reference
+            # TODO
+            # This is a temporary hack to work around the problem that the scorer
+            # fails if you have examples that are not fully annotated for all
+            # the tasks in your pipeline. For instance, you might have a corpus
+            # of NER annotations that does not set sentence boundaries, but the
+            # pipeline includes a parser or senter, and then the score_weights
+            # are used to evaluate that component. When the scorer attempts
+            # to read the sentences from the gold document, it fails.
+            try:
+                list(getter(gold_doc, attr))
+            except ValueError:
+                continue
             # Find all labels in gold and doc
             labels = set(
                 [k.label_ for k in getter(gold_doc, attr)]
```
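The skip pattern added in this diff can be sketched standalone. This is a minimal illustration, not spaCy's actual API: `GoldDoc`, `score_examples`, and the getter here are hypothetical stand-ins that mimic how a spaCy `Doc` raises `ValueError` when sentence boundaries have not been set.

```python
class GoldDoc:
    """Toy stand-in for a gold-standard document (hypothetical, not spaCy's Doc)."""

    def __init__(self, spans, has_sents):
        self.spans = spans
        self.has_sents = has_sents

    @property
    def sents(self):
        # Mimic spaCy: reading sentences without set boundaries raises ValueError.
        if not self.has_sents:
            raise ValueError("Sentence boundaries unset")
        return iter(self.spans)


def score_examples(gold_docs, getter):
    """Collect only the gold docs that the given attribute getter can read."""
    scored = []
    for gold_doc in gold_docs:
        # Temporary work-around from the commit: skip examples whose
        # annotations the getter cannot read (e.g. missing sentence
        # boundaries), instead of letting the whole scoring run fail.
        try:
            list(getter(gold_doc))
        except ValueError:
            continue
        scored.append(gold_doc)
    return scored


docs = [GoldDoc(["Sentence one."], True), GoldDoc(["Sentence two."], False)]
kept = score_examples(docs, lambda d: d.sents)
assert len(kept) == 1  # the doc without sentence boundaries is skipped
```

The trade-off, as the commit message acknowledges, is that partially annotated examples are silently excluded from that component's score rather than being scored on the annotations they do have.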