spaCy/scorer.md at e95d9caa878a39070ed47af3b390e161d6486f40

mirror of https://github.com/explosion/spaCy.git synced 2024-12-27 10:26:35 +03:00

Sofie Van Landeghem 75a202ce65

* small fix in example imports

* throw error when train_corpus or dev_corpus is not a string

* small fix in custom logger example

* limit macro_auc to labels with 2 annotations

* fix typo

* also create parents of output_dir if need be

* update documentation of textcat scores

* refactor TextCatEnsemble

* fix tests for new AUC definition

* bump to 3.0.0a42

* update docs

* rename to spacy.TextCatEnsemble.v2

* spacy.TextCatEnsemble.v1 in legacy

* cleanup

* small fix

* update to 3.0.0rc2

* fix import that got lost in merge

* cursed IDE

* fix two typos

2020-10-18 14:50:41 +02:00


title	teaser	tag	source
Scorer	Compute evaluation scores	class	spacy/scorer.py

The Scorer computes evaluation scores. It's typically created by Language.evaluate. In addition, the Scorer provides a number of methods for evaluating Token and Doc attributes.

Scorer.__init__

Create a new Scorer.

Example

import spacy
from spacy.scorer import Scorer

# Default scoring pipeline
scorer = Scorer()

# Provided scoring pipeline
nlp = spacy.load("en_core_web_sm")
scorer = Scorer(nlp)

Name	Description
`nlp`	The pipeline to use for scoring, where each pipeline component may provide a scoring method. If none is provided, then a default pipeline for the multi-language code `xx` is constructed containing: `senter`, `tagger`, `morphologizer`, `parser`, `ner`, `textcat`. ~~Language~~

Scorer.score

Calculate the scores for a list of Example objects using the scoring methods provided by the components in the pipeline.

The returned Dict contains the scores provided by the individual pipeline components. For the scoring methods provided by the Scorer and used by the core pipeline components, the individual score names start with the Token or Doc attribute being scored:

token_acc, token_p, token_r, token_f
sents_p, sents_r, sents_f
tag_acc, pos_acc, morph_acc, morph_per_feat, lemma_acc
dep_uas, dep_las, dep_las_per_type
ents_p, ents_r, ents_f, ents_per_type
cats_macro_auc, cats_macro_f

Example

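from spacy.scorer import Scorer
# examples: an iterable of Example objects (predictions + gold annotations)
scorer = Scorer()
scores = scorer.score(examples)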

Name	Description
`examples`	The `Example` objects holding both the predictions and the correct gold-standard annotations. ~~Iterable[Example]~~
RETURNS	A dictionary of scores. ~~Dict[str, Union[float, Dict[str, float]]]~~
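As a rough illustration of how the per-component scores end up in one dictionary, the pure-Python sketch below loops over a pipeline and merges whatever each component's `score` method returns. It assumes each component exposes `score(examples)` returning a flat dict; `merge_component_scores` and the `Fake*` classes are hypothetical stand-ins, not spaCy API, and the real `Scorer.score` may pass additional arguments to the components.

```python
from typing import Any, Dict, List, Tuple


def merge_component_scores(
    pipeline: List[Tuple[str, Any]], examples: List[Any]
) -> Dict[str, Any]:
    """Hypothetical sketch: collect the scores of every component with a `score` method."""
    scores: Dict[str, Any] = {}
    for _name, component in pipeline:
        score_fn = getattr(component, "score", None)
        if callable(score_fn):  # components without a scoring method are skipped
            scores.update(score_fn(examples))
    return scores


class FakeTagger:  # hypothetical component
    def score(self, examples):
        return {"tag_acc": 0.9}


class FakeNER:  # hypothetical component
    def score(self, examples):
        return {"ents_p": 0.8, "ents_r": 0.8, "ents_f": 0.8}


class FakeComponentWithoutScore:  # hypothetical component with no scoring method
    pass


pipeline = [
    ("tagger", FakeTagger()),
    ("ner", FakeNER()),
    ("custom", FakeComponentWithoutScore()),
]
scores = merge_component_scores(pipeline, examples=[])
print(scores)
```

In this sketch, a later component overwrites any key already reported by an earlier one.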

Scorer.score_tokenization

Scores the tokenization:

token_acc: number of correct tokens / number of gold tokens
token_p, token_r, token_f: precision, recall and F-score for token character spans

Example

scores = Scorer.score_tokenization(examples)

Name	Description
`examples`	The `Example` objects holding both the predictions and the correct gold-standard annotations. ~~Iterable[Example]~~
RETURNS	A dictionary containing the scores `token_acc`, `token_p`, `token_r` and `token_f`. ~~Dict[str, float]~~
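The sketch below illustrates the `token_p`/`token_r`/`token_f` definition by treating each token as a `(start_char, end_char)` span and comparing the predicted and gold span sets. It is a pure-Python approximation, not spaCy's implementation: `char_spans` and `token_prf` are hypothetical helpers, tokens are located in the text with a simple left-to-right search, and `token_acc` (which relies on spaCy's token alignment) is left out.

```python
from typing import Dict, Iterable, Set, Tuple

Span = Tuple[int, int]  # (start_char, end_char) of one token


def char_spans(tokens: Iterable[str], text: str) -> Set[Span]:
    """Hypothetical helper: find each token in `text` left to right, return its offsets."""
    spans = set()
    pos = 0
    for tok in tokens:
        start = text.index(tok, pos)
        end = start + len(tok)
        spans.add((start, end))
        pos = end
    return spans


def token_prf(pred: Set[Span], gold: Set[Span]) -> Dict[str, float]:
    """PRF over token character spans: a token is correct only if both offsets match."""
    tp = len(pred & gold)
    p = tp / len(pred) if pred else 0.0
    r = tp / len(gold) if gold else 0.0
    f = 2 * p * r / (p + r) if p + r else 0.0
    return {"token_p": p, "token_r": r, "token_f": f}


text = "I can't go."
gold = char_spans(["I", "ca", "n't", "go", "."], text)
pred = char_spans(["I", "can't", "go."], text)
print(token_prf(pred, gold))
```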

Scorer.score_token_attr

Scores a single token attribute.

Example

scores = Scorer.score_token_attr(examples, "pos")
print(scores["pos_acc"])

Name	Description
`examples`	The `Example` objects holding both the predictions and the correct gold-standard annotations. ~~Iterable[Example]~~
`attr`	The attribute to score. ~~str~~
keyword-only
`getter`	Defaults to `getattr`. If provided, `getter(token, attr)` should return the value of the attribute for an individual `Token`. ~~Callable[[Token, str], Any]~~
RETURNS	A dictionary containing the score `{attr}_acc`. ~~Dict[str, float]~~
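To show what the `getter` parameter does, here is a pure-Python sketch of attribute accuracy over tokens that are assumed to be aligned one-to-one. `FakeToken`, `attr_accuracy` and the custom getter `merge_propn` are hypothetical; with spaCy, the default `getattr` would read the attribute from a real `Token`.

```python
from dataclasses import dataclass
from typing import Any, Callable, Dict, List


@dataclass
class FakeToken:  # hypothetical stand-in for spacy.tokens.Token
    text: str
    pos: str


def attr_accuracy(
    pred: List[FakeToken],
    gold: List[FakeToken],
    attr: str,
    getter: Callable[[Any, str], Any] = getattr,
) -> Dict[str, float]:
    """Hypothetical sketch: share of aligned tokens whose attribute matches the gold value."""
    pairs = list(zip(pred, gold))  # assumes identical tokenization
    correct = sum(getter(p, attr) == getter(g, attr) for p, g in pairs)
    return {f"{attr}_acc": correct / len(pairs) if pairs else 0.0}


def merge_propn(token: Any, attr: str) -> Any:
    """Hypothetical custom getter that treats PROPN and NOUN as the same tag."""
    value = getattr(token, attr)
    return "NOUN" if value == "PROPN" else value


gold = [FakeToken("Paris", "PROPN"), FakeToken("is", "AUX"), FakeToken("big", "ADJ")]
pred = [FakeToken("Paris", "NOUN"), FakeToken("is", "AUX"), FakeToken("big", "NOUN")]
print(attr_accuracy(pred, gold, "pos"))                      # default getter
print(attr_accuracy(pred, gold, "pos", getter=merge_propn))  # custom getter
```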

Scorer.score_token_attr_per_feat

Scores a token attribute in the Universal Dependencies FEATS format separately for each feature.

Example

scores = Scorer.score_token_attr_per_feat(examples, "morph")
print(scores["morph_per_feat"])

Name	Description
`examples`	The `Example` objects holding both the predictions and the correct gold-standard annotations. ~~Iterable[Example]~~
`attr`	The attribute to score. ~~str~~
keyword-only
`getter`	Defaults to `getattr`. If provided, `getter(token, attr)` should return the value of the attribute for an individual `Token`. ~~Callable[[Token, str], Any]~~
RETURNS	A dictionary containing the per-feature PRF scores under the key `{attr}_per_feat`. ~~Dict[str, Dict[str, float]]~~
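For FEATS-style attributes such as `morph`, each token's value is a string like `Case=Nom|Number=Sing`, and the per-feature scores can be understood as PRF computed separately for each feature name. The sketch below assumes identical tokenization on both sides and scores each feature over `(token index, value)` pairs; `parse_feats` and `per_feat_scores` are hypothetical helpers, and the `p`/`r`/`f` keys are this sketch's own naming.

```python
from collections import defaultdict
from typing import Dict, List, Set, Tuple


def parse_feats(feats: str) -> Dict[str, str]:
    """Hypothetical helper: split 'Case=Nom|Number=Sing' into {'Case': 'Nom', ...}."""
    if not feats or feats == "_":
        return {}
    return dict(pair.split("=", 1) for pair in feats.split("|"))


def prf(pred: Set, gold: Set) -> Dict[str, float]:
    tp = len(pred & gold)
    p = tp / len(pred) if pred else 0.0
    r = tp / len(gold) if gold else 0.0
    f = 2 * p * r / (p + r) if p + r else 0.0
    return {"p": p, "r": r, "f": f}


def per_feat_scores(pred_feats: List[str], gold_feats: List[str]) -> Dict[str, Dict[str, float]]:
    """Hypothetical sketch: PRF per feature over (token index, value) pairs."""
    pred_sets: Dict[str, Set[Tuple[int, str]]] = defaultdict(set)
    gold_sets: Dict[str, Set[Tuple[int, str]]] = defaultdict(set)
    for i, (p, g) in enumerate(zip(pred_feats, gold_feats)):
        for feat, value in parse_feats(p).items():
            pred_sets[feat].add((i, value))
        for feat, value in parse_feats(g).items():
            gold_sets[feat].add((i, value))
    feats = set(pred_sets) | set(gold_sets)
    return {feat: prf(pred_sets[feat], gold_sets[feat]) for feat in sorted(feats)}


gold = ["Number=Sing|Person=1", "Mood=Ind|Tense=Pres", "Number=Plur"]
pred = ["Number=Sing|Person=3", "Mood=Ind|Tense=Past", "Number=Plur"]
scores = {"morph_per_feat": per_feat_scores(pred, gold)}
print(scores["morph_per_feat"]["Number"])
```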

Scorer.score_spans

Returns PRF scores for labeled or unlabeled spans.

Example

scores = Scorer.score_spans(examples, "ents")
print(scores["ents_f"])

Name	Description
`examples`	The `Example` objects holding both the predictions and the correct gold-standard annotations. ~~Iterable[Example]~~
`attr`	The attribute to score. ~~str~~
keyword-only
`getter`	Defaults to `getattr`. If provided, `getter(doc, attr)` should return the `Span` objects for an individual `Doc`. ~~Callable[[Doc, str], Iterable[Span]]~~
RETURNS	A dictionary containing the PRF scores under the keys `{attr}_p`, `{attr}_r`, `{attr}_f` and the per-type PRF scores under `{attr}_per_type`. ~~Dict[str, Union[float, Dict[str, float]]]~~
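The core of labeled span scoring can be sketched as set comparison over `(label, start, end)` tuples, where a span only counts as a true positive if all three match. This pure-Python version covers only the labeled case, uses hypothetical helper names, and reproduces the `{attr}_p`, `{attr}_r`, `{attr}_f` and `{attr}_per_type` keys; it is not spaCy's code.

```python
from typing import Dict, Set, Tuple

Span = Tuple[str, int, int]  # (label, start_token, end_token)


def prf(tp: int, n_pred: int, n_gold: int) -> Dict[str, float]:
    p = tp / n_pred if n_pred else 0.0
    r = tp / n_gold if n_gold else 0.0
    f = 2 * p * r / (p + r) if p + r else 0.0
    return {"p": p, "r": r, "f": f}


def labeled_span_scores(pred: Set[Span], gold: Set[Span], attr: str = "ents") -> Dict:
    """Hypothetical sketch: a span is correct only if label, start and end all match."""
    overall = prf(len(pred & gold), len(pred), len(gold))
    per_type = {}
    for label in sorted({span[0] for span in pred | gold}):
        p = {span for span in pred if span[0] == label}
        g = {span for span in gold if span[0] == label}
        per_type[label] = prf(len(p & g), len(p), len(g))
    return {
        f"{attr}_p": overall["p"],
        f"{attr}_r": overall["r"],
        f"{attr}_f": overall["f"],
        f"{attr}_per_type": per_type,
    }


gold = {("PERSON", 0, 2), ("GPE", 5, 6)}
pred = {("PERSON", 0, 2), ("ORG", 5, 6)}  # right boundaries, wrong label
scores = labeled_span_scores(pred, gold)
print(scores["ents_f"], scores["ents_per_type"]["PERSON"])
```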

Scorer.score_deps

Calculate the UAS, LAS, and LAS per type scores for dependency parses.

Example

def dep_getter(token, attr):
    dep = getattr(token, attr)
    dep = token.vocab.strings.as_string(dep).lower()
    return dep

scores = Scorer.score_deps(
    examples,
    "dep",
    getter=dep_getter,
    ignore_labels=("p", "punct")
)
print(scores["dep_uas"], scores["dep_las"])

Name	Description
`examples`	The `Example` objects holding both the predictions and the correct gold-standard annotations. ~~Iterable[Example]~~
`attr`	The attribute to score. ~~str~~
keyword-only
`getter`	Defaults to `getattr`. If provided, `getter(token, attr)` should return the value of the attribute for an individual `Token`. ~~Callable[[Token, str], Any]~~
`head_attr`	The attribute containing the head token. ~~str~~
`head_getter`	Defaults to `getattr`. If provided, `head_getter(token, attr)` should return the head for an individual `Token`. ~~Callable[[Token, str], Token]~~
`ignore_labels`	Labels to ignore while scoring (e.g. `"punct"`). ~~Iterable[str]~~
RETURNS	A dictionary containing the scores: `{attr}_uas`, `{attr}_las`, and `{attr}_las_per_type`. ~~Dict[str, Union[float, Dict[str, float]]]~~
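For reference, the textbook definitions are: UAS is the share of scored tokens with the correct head, and LAS additionally requires the correct label. The sketch below implements those definitions in plain Python, assuming one-to-one aligned tokens and excluding tokens whose gold label is in `ignore_labels` (compared case-insensitively, mirroring the lowercasing getter in the example above). `attachment_scores` is hypothetical, and spaCy's own computation over aligned docs may differ in details.

```python
from typing import Dict, Iterable, List, Tuple

Arc = Tuple[int, str]  # (head index, dependency label) for one token


def attachment_scores(
    pred: List[Arc], gold: List[Arc], ignore_labels: Iterable[str] = ()
) -> Dict[str, float]:
    """Hypothetical sketch of textbook UAS/LAS over aligned tokens."""
    ignore = {label.lower() for label in ignore_labels}
    total = heads_ok = both_ok = 0
    for (p_head, p_dep), (g_head, g_dep) in zip(pred, gold):
        if g_dep.lower() in ignore:  # skip e.g. punctuation arcs
            continue
        total += 1
        if p_head == g_head:
            heads_ok += 1
            if p_dep.lower() == g_dep.lower():
                both_ok += 1
    return {
        "dep_uas": heads_ok / total if total else 0.0,
        "dep_las": both_ok / total if total else 0.0,
    }


# "She ate the apple ." with heads as token indices (the root points to itself)
gold = [(1, "nsubj"), (1, "ROOT"), (3, "det"), (1, "dobj"), (1, "punct")]
pred = [(1, "nsubj"), (1, "ROOT"), (3, "amod"), (2, "dobj"), (1, "punct")]
print(attachment_scores(pred, gold, ignore_labels=("p", "punct")))
```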

Scorer.score_cats

Calculate PRF and ROC AUC scores for a doc-level attribute that is a dict containing scores for each label, such as Doc.cats. The returned dictionary contains the following scores:

{attr}_micro_p, {attr}_micro_r and {attr}_micro_f: each instance across each label is weighted equally
{attr}_macro_p, {attr}_macro_r and {attr}_macro_f: the average values across evaluations per label
{attr}_macro_auc: the ROC AUC averaged across labels
{attr}_f_per_type and {attr}_auc_per_type: each contains a dictionary of scores, keyed by label
A final {attr}_score and corresponding {attr}_score_desc (text description)

The reported {attr}_score depends on the classification properties:

binary exclusive with positive label: {attr}_score is set to the F-score of the positive label
3+ exclusive classes, macro-averaged F-score: {attr}_score = {attr}_macro_f
multilabel, macro-averaged AUC: {attr}_score = {attr}_macro_auc

Example

labels = ["LABEL_A", "LABEL_B", "LABEL_C"]
scores = Scorer.score_cats(
    examples,
    "cats",
    labels=labels
)
print(scores["cats_macro_auc"])

Name	Description
`examples`	The `Example` objects holding both the predictions and the correct gold-standard annotations. ~~Iterable[Example]~~
`attr`	The attribute to score. ~~str~~
keyword-only
`getter`	Defaults to `getattr`. If provided, `getter(doc, attr)` should return the cats for an individual `Doc`. ~~Callable[[Doc, str], Dict[str, float]]~~
`labels`	The set of possible labels. Defaults to `[]`. ~~Iterable[str]~~
`multi_label`	Whether the attribute allows multiple labels. Defaults to `True`. ~~bool~~
`positive_label`	The positive label for a binary task with exclusive classes. Defaults to `None`. ~~Optional[str]~~
RETURNS	A dictionary containing the scores, with inapplicable scores as `None`. ~~Dict[str, Optional[float]]~~
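The sketch below walks through these score definitions in plain Python: per-label PRF, micro PRF over the pooled counts, macro averages, a per-label ROC AUC computed as the probability that a gold-positive document outscores a gold-negative one, and the choice of `{attr}_score`. Several details are assumptions rather than spaCy behavior: every label is thresholded independently at 0.5 (exclusive classes may be handled differently in spaCy), a label's AUC is `None` unless it has both positive and negative gold examples, and the `_score_desc` strings are made up. `cats_scores` and `roc_auc` are hypothetical helpers.

```python
from typing import Dict, List, Optional

Cats = Dict[str, float]  # label -> score, like Doc.cats


def prf(tp: int, fp: int, fn: int) -> Dict[str, float]:
    p = tp / (tp + fp) if tp + fp else 0.0
    r = tp / (tp + fn) if tp + fn else 0.0
    f = 2 * p * r / (p + r) if p + r else 0.0
    return {"p": p, "r": r, "f": f}


def roc_auc(pos: List[float], neg: List[float]) -> Optional[float]:
    """Chance that a gold-positive doc outscores a gold-negative one (ties count 0.5)."""
    if not pos or not neg:
        return None  # assumed undefined unless the label has both gold classes
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0 for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


def cats_scores(
    pred: List[Cats],
    gold: List[Cats],
    labels: List[str],
    multi_label: bool = True,
    positive_label: Optional[str] = None,
    threshold: float = 0.5,  # assumed decision threshold
    attr: str = "cats",
) -> Dict:
    """Hypothetical sketch of the score_cats outputs, not spaCy's implementation."""
    counts = {label: [0, 0, 0] for label in labels}  # tp, fp, fn per label
    by_gold = {label: ([], []) for label in labels}  # pred scores for gold pos / neg
    for p_cats, g_cats in zip(pred, gold):
        for label in labels:
            score = p_cats.get(label, 0.0)
            is_pred = score >= threshold
            is_gold = g_cats.get(label, 0.0) >= threshold
            if is_pred and is_gold:
                counts[label][0] += 1
            elif is_pred:
                counts[label][1] += 1
            elif is_gold:
                counts[label][2] += 1
            by_gold[label][0 if is_gold else 1].append(score)
    per_type = {label: prf(*counts[label]) for label in labels}
    micro = prf(*(sum(c[i] for c in counts.values()) for i in range(3)))
    auc_per_type = {label: roc_auc(*by_gold[label]) for label in labels}
    defined = [v for v in auc_per_type.values() if v is not None]
    n = len(labels)
    results = {
        f"{attr}_micro_p": micro["p"],
        f"{attr}_micro_r": micro["r"],
        f"{attr}_micro_f": micro["f"],
        f"{attr}_macro_p": sum(s["p"] for s in per_type.values()) / n,
        f"{attr}_macro_r": sum(s["r"] for s in per_type.values()) / n,
        f"{attr}_macro_f": sum(s["f"] for s in per_type.values()) / n,
        f"{attr}_macro_auc": sum(defined) / len(defined) if defined else None,
        f"{attr}_f_per_type": per_type,
        f"{attr}_auc_per_type": auc_per_type,
    }
    # Pick the headline score by task type (description strings are illustrative).
    if not multi_label and positive_label is not None and n == 2:
        results[f"{attr}_score"] = per_type[positive_label]["f"]
        results[f"{attr}_score_desc"] = f"F ({positive_label})"
    elif not multi_label:
        results[f"{attr}_score"] = results[f"{attr}_macro_f"]
        results[f"{attr}_score_desc"] = "macro F"
    else:
        results[f"{attr}_score"] = results[f"{attr}_macro_auc"]
        results[f"{attr}_score_desc"] = "macro AUC"
    return results


labels = ["A", "B"]
gold = [{"A": 1.0, "B": 0.0}, {"A": 0.0, "B": 1.0}, {"A": 1.0, "B": 1.0}]
pred = [{"A": 0.9, "B": 0.45}, {"A": 0.6, "B": 0.7}, {"A": 0.8, "B": 0.4}]
scores = cats_scores(pred, gold, labels)
print(scores["cats_micro_f"], scores["cats_macro_f"], scores["cats_score"])
```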

Scorer.score_links

Returns PRF scores for predicted links at the entity level. To disentangle the performance of the NEL from the NER, this method only evaluates NEL links for entities that overlap between the gold reference and the predictions.

Example

scores = Scorer.score_links(
    examples,
    negative_labels=["NIL", ""]
)
print(scores["nel_micro_f"])

Name	Description
`examples`	The `Example` objects holding both the predictions and the correct gold-standard annotations. ~~Iterable[Example]~~
keyword-only
`negative_labels`	The string values that refer to no annotation (e.g. "NIL"). ~~Iterable[str]~~
RETURNS	A dictionary containing the scores. ~~Dict[str, Optional[float]]~~
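The entity-level link scoring can be sketched as follows, assuming that an entity "overlaps" when its character offsets match exactly and that links are given as KB ID strings. The counting rules (true negatives ignored, a wrong KB ID counted as both a false positive and a false negative) are this sketch's assumptions; `link_prf` is hypothetical, and only the `nel_micro_*` keys are reproduced.

```python
from typing import Dict, Iterable, Tuple

Offsets = Tuple[int, int]  # (start_char, end_char) of an entity


def link_prf(
    pred: Dict[Offsets, str],
    gold: Dict[Offsets, str],
    negative_labels: Iterable[str] = ("NIL", ""),
) -> Dict[str, float]:
    """Hypothetical sketch: score KB IDs only for entities present in both gold and pred."""
    negative = set(negative_labels)
    tp = fp = fn = 0
    for offsets, pred_id in pred.items():
        if offsets not in gold:
            continue  # entity missing from gold: an NER error, not scored here
        gold_id = gold[offsets]
        if gold_id in negative and pred_id in negative:
            continue  # true negative: no link expected, none predicted
        if gold_id == pred_id:
            tp += 1
        elif gold_id in negative:
            fp += 1  # linked an entity that should have no link
        elif pred_id in negative:
            fn += 1  # missed a link
        else:
            fp += 1  # wrong KB ID counts as a false positive
            fn += 1  # and as a false negative
    p = tp / (tp + fp) if tp + fp else 0.0
    r = tp / (tp + fn) if tp + fn else 0.0
    f = 2 * p * r / (p + r) if p + r else 0.0
    return {"nel_micro_p": p, "nel_micro_r": r, "nel_micro_f": f}


gold = {(0, 5): "Q1", (10, 16): "Q2", (20, 25): "NIL", (30, 35): "Q4"}
pred = {(0, 5): "Q1", (10, 16): "Q9", (20, 25): "NIL", (40, 45): "Q5"}
print(link_prf(pred, gold))
```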
