---
title: Scorer
teaser: Compute evaluation scores
tag: class
source: spacy/scorer.py
---

The `Scorer` computes evaluation scores. It's typically created by
[`Language.evaluate`](/api/language#evaluate). In addition, the `Scorer`
provides a number of evaluation methods for evaluating [`Token`](/api/token) and
[`Doc`](/api/doc) attributes.

## Scorer.\_\_init\_\_ {#init tag="method"}

Create a new `Scorer`.

> #### Example
>
> ```python
> import spacy
> from spacy.scorer import Scorer
>
> # Default scoring pipeline
> scorer = Scorer()
>
> # Provided scoring pipeline
> nlp = spacy.load("en_core_web_sm")
> scorer = Scorer(nlp)
> ```

| Name  | Description                                                                                                                                                                                                                                                                         |
| ----- | ----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------- |
| `nlp` | The pipeline to use for scoring, where each pipeline component may provide a scoring method. If none is provided, then a default pipeline for the multi-language code `xx` is constructed containing: `senter`, `tagger`, `morphologizer`, `parser`, `ner`, `textcat`. ~~Language~~ |

## Scorer.score {#score tag="method"}

Calculate the scores for a list of [`Example`](/api/example) objects using the
scoring methods provided by the components in the pipeline.

The returned `Dict` contains the scores provided by the individual pipeline
components. For the scoring methods provided by the `Scorer` and used by the core
pipeline components, the individual score names start with the `Token` or `Doc`
attribute being scored:

- `token_acc`, `token_p`, `token_r`, `token_f`
- `sents_p`, `sents_r`, `sents_f`
- `tag_acc`, `pos_acc`, `morph_acc`, `morph_per_feat`, `lemma_acc`
- `dep_uas`, `dep_las`, `dep_las_per_type`
- `ents_p`, `ents_r`, `ents_f`, `ents_per_type`
- `textcat_macro_auc`, `textcat_macro_f`

> #### Example
>
> ```python
> scorer = Scorer()
> scores = scorer.score(examples)
> ```

| Name        | Description                                                                                                         |
| ----------- | ------------------------------------------------------------------------------------------------------------------- |
| `examples`  | The `Example` objects holding both the predictions and the correct gold-standard annotations. ~~Iterable[Example]~~ |
| **RETURNS** | A dictionary of scores. ~~Dict[str, Union[float, Dict[str, float]]]~~                                               |

## Scorer.score_tokenization {#score_tokenization tag="staticmethod" new="3"}

Scores the tokenization:

- `token_acc`: number of correct tokens / number of gold tokens
- `token_p`, `token_r`, `token_f`: precision, recall and F-score for token
  character spans

> #### Example
>
> ```python
> scores = Scorer.score_tokenization(examples)
> ```

| Name        | Description                                                                                                         |
| ----------- | ------------------------------------------------------------------------------------------------------------------- |
| `examples`  | The `Example` objects holding both the predictions and the correct gold-standard annotations. ~~Iterable[Example]~~ |
| **RETURNS** | A dictionary containing the scores `token_acc`, `token_p`, `token_r`, `token_f`. ~~Dict[str, float]~~               |

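The precision, recall and F-score over token character spans can be illustrated
with a plain-Python sketch. This is not spaCy's implementation: the helper name
`prf_from_spans` and the `(start, end)` offset tuples are assumptions made for
illustration only.

```python
def prf_from_spans(pred_spans, gold_spans):
    """Return (precision, recall, F-score) for two collections of (start, end) spans."""
    pred, gold = set(pred_spans), set(gold_spans)
    tp = len(pred & gold)  # tokens with identical character offsets
    precision = tp / len(pred) if pred else 0.0
    recall = tp / len(gold) if gold else 0.0
    denom = precision + recall
    f_score = 2 * precision * recall / denom if denom else 0.0
    return precision, recall, f_score


# Hypothetical tokenizations of the text "I can't go"
gold_tokens = [(0, 1), (2, 4), (4, 7), (8, 10)]  # "I", "ca", "n't", "go"
pred_tokens = [(0, 1), (2, 7), (8, 10)]          # "I", "can't", "go"

precision, recall, f_score = prf_from_spans(pred_tokens, gold_tokens)
```

Two of the three predicted tokens match a gold token exactly, so the sketch
yields a precision of 2/3 and a recall of 2/4.
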
## Scorer.score_token_attr {#score_token_attr tag="staticmethod" new="3"}

Scores a single token attribute.

> #### Example
>
> ```python
> scores = Scorer.score_token_attr(examples, "pos")
> print(scores["pos_acc"])
> ```

| Name           | Description                                                                                                                                                   |
| -------------- | ------------------------------------------------------------------------------------------------------------------------------------------------------------- |
| `examples`     | The `Example` objects holding both the predictions and the correct gold-standard annotations. ~~Iterable[Example]~~                                           |
| `attr`         | The attribute to score. ~~str~~                                                                                                                               |
| _keyword-only_ |                                                                                                                                                               |
| `getter`       | Defaults to `getattr`. If provided, `getter(token, attr)` should return the value of the attribute for an individual `Token`. ~~Callable[[Token, str], Any]~~ |
| **RETURNS**    | A dictionary containing the score `{attr}_acc`. ~~Dict[str, float]~~                                                                                          |

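The `getter(token, attr)` contract and the resulting `{attr}_acc` key can be
sketched without spaCy. The stand-in tokens built with `SimpleNamespace` and the
helpers `attr_accuracy` and `lower_getter` are hypothetical; they only mirror
the documented behavior and assume that predicted and gold tokens are already
aligned one-to-one.

```python
from types import SimpleNamespace


def attr_accuracy(pred_tokens, gold_tokens, attr, getter=getattr):
    """Fraction of aligned tokens whose attribute value matches the gold value."""
    pairs = list(zip(pred_tokens, gold_tokens))
    correct = sum(getter(p, attr) == getter(g, attr) for p, g in pairs)
    return {f"{attr}_acc": correct / len(pairs) if pairs else 0.0}


def lower_getter(token, attr):
    # Custom getter: compare attribute values case-insensitively
    return getattr(token, attr).lower()


gold = [SimpleNamespace(pos="NOUN"), SimpleNamespace(pos="VERB"), SimpleNamespace(pos="ADJ")]
pred = [SimpleNamespace(pos="noun"), SimpleNamespace(pos="VERB"), SimpleNamespace(pos="ADV")]

strict = attr_accuracy(pred, gold, "pos")
relaxed = attr_accuracy(pred, gold, "pos", getter=lower_getter)
```

With the default getter only one of three tokens matches. The custom getter
normalizes case before comparing, so two of three match.
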
## Scorer.score_token_attr_per_feat {#score_token_attr_per_feat tag="staticmethod" new="3"}

Scores a single token attribute feature by feature, for an attribute whose
values are in the Universal Dependencies
[FEATS](https://universaldependencies.org/format.html#morphological-annotation)
format.

> #### Example
>
> ```python
> scores = Scorer.score_token_attr_per_feat(examples, "morph")
> print(scores["morph_per_feat"])
> ```

| Name           | Description                                                                                                                                                   |
| -------------- | ------------------------------------------------------------------------------------------------------------------------------------------------------------- |
| `examples`     | The `Example` objects holding both the predictions and the correct gold-standard annotations. ~~Iterable[Example]~~                                           |
| `attr`         | The attribute to score. ~~str~~                                                                                                                               |
| _keyword-only_ |                                                                                                                                                               |
| `getter`       | Defaults to `getattr`. If provided, `getter(token, attr)` should return the value of the attribute for an individual `Token`. ~~Callable[[Token, str], Any]~~ |
| **RETURNS**    | A dictionary containing the per-feature PRF scores under the key `{attr}_per_feat`. ~~Dict[str, Dict[str, float]]~~                                           |

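In the FEATS format, features are `Key=Value` pairs joined by `|`, and `_`
stands for "no features". The following sketch shows one way to split such
strings and tally per-feature counts, from which PRF scores could then be
derived. The helpers `parse_feats` and `per_feat_counts` are hypothetical, and
counting a wrong value as both a false positive and a false negative is an
assumption of this sketch, not a statement about spaCy's internals.

```python
from collections import defaultdict


def parse_feats(feats):
    """Split a FEATS string such as "Case=Nom|Number=Sing" into a dict."""
    if not feats or feats == "_":
        return {}
    return dict(pair.split("=", 1) for pair in feats.split("|"))


def per_feat_counts(pred_feats, gold_feats):
    """Tally true positives, false positives and false negatives per feature."""
    counts = defaultdict(lambda: {"tp": 0, "fp": 0, "fn": 0})
    for pred, gold in zip(pred_feats, gold_feats):
        pred_d, gold_d = parse_feats(pred), parse_feats(gold)
        for feat in pred_d.keys() | gold_d.keys():
            if pred_d.get(feat) == gold_d.get(feat):
                counts[feat]["tp"] += 1
            else:
                # Assumption: a wrong value is both spurious and missed
                if feat in pred_d:
                    counts[feat]["fp"] += 1
                if feat in gold_d:
                    counts[feat]["fn"] += 1
    return dict(counts)


# One FEATS string per token, predicted and gold tokens aligned
gold = ["Case=Nom|Number=Sing", "Number=Plur", "_"]
pred = ["Case=Nom|Number=Plur", "Number=Plur", "Tense=Past"]
counts = per_feat_counts(pred, gold)
```
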
## Scorer.score_spans {#score_spans tag="staticmethod" new="3"}

Returns PRF scores for labeled or unlabeled spans.

> #### Example
>
> ```python
> scores = Scorer.score_spans(examples, "ents")
> print(scores["ents_f"])
> ```

| Name           | Description                                                                                                                                                                                 |
| -------------- | ------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------- |
| `examples`     | The `Example` objects holding both the predictions and the correct gold-standard annotations. ~~Iterable[Example]~~                                                                         |
| `attr`         | The attribute to score. ~~str~~                                                                                                                                                             |
| _keyword-only_ |                                                                                                                                                                                             |
| `getter`       | Defaults to `getattr`. If provided, `getter(doc, attr)` should return the `Span` objects for an individual `Doc`. ~~Callable[[Doc, str], Iterable[Span]]~~                                  |
| **RETURNS**    | A dictionary containing the PRF scores under the keys `{attr}_p`, `{attr}_r`, `{attr}_f` and the per-type PRF scores under `{attr}_per_type`. ~~Dict[str, Union[float, Dict[str, float]]]~~ |

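How the overall and per-type keys relate can be sketched in plain Python. The
helpers `prf` and `score_labeled_spans` and the `(start, end, label)` triples
are hypothetical stand-ins for `Span` objects, and the exact shape of the
per-type entries is an assumption of this sketch.

```python
from collections import Counter


def prf(tp, fp, fn):
    """Precision, recall and F-score from raw counts."""
    p = tp / (tp + fp) if tp + fp else 0.0
    r = tp / (tp + fn) if tp + fn else 0.0
    f = 2 * p * r / (p + r) if p + r else 0.0
    return {"p": p, "r": r, "f": f}


def score_labeled_spans(pred, gold, attr="ents"):
    """Score (start, end, label) triples overall and per label."""
    pred, gold = set(pred), set(gold)
    tp, fp, fn = Counter(), Counter(), Counter()
    for span in pred | gold:
        label = span[2]
        if span in pred and span in gold:
            tp[label] += 1
        elif span in pred:
            fp[label] += 1
        else:
            fn[label] += 1
    overall = prf(sum(tp.values()), sum(fp.values()), sum(fn.values()))
    scores = {f"{attr}_{name}": value for name, value in overall.items()}
    labels = sorted(set(tp) | set(fp) | set(fn))
    scores[f"{attr}_per_type"] = {
        label: prf(tp[label], fp[label], fn[label]) for label in labels
    }
    return scores


gold = [(0, 2, "PERSON"), (5, 6, "ORG"), (8, 9, "GPE")]
pred = [(0, 2, "PERSON"), (5, 6, "GPE"), (8, 9, "GPE")]
scores = score_labeled_spans(pred, gold)
```

A span only counts as correct if its boundaries and its label both match, so
the mislabeled span at `(5, 6)` is a false negative for `ORG` and a false
positive for `GPE`.
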
## Scorer.score_deps {#score_deps tag="staticmethod" new="3"}

Calculate the UAS, LAS, and LAS per type scores for dependency parses.

> #### Example
>
> ```python
> def dep_getter(token, attr):
>     dep = getattr(token, attr)
>     dep = token.vocab.strings.as_string(dep).lower()
>     return dep
>
> scores = Scorer.score_deps(
>     examples,
>     "dep",
>     getter=dep_getter,
>     ignore_labels=("p", "punct")
> )
> print(scores["dep_uas"], scores["dep_las"])
> ```

| Name            | Description                                                                                                                                                   |
| --------------- | ------------------------------------------------------------------------------------------------------------------------------------------------------------- |
| `examples`      | The `Example` objects holding both the predictions and the correct gold-standard annotations. ~~Iterable[Example]~~                                           |
| `attr`          | The attribute to score. ~~str~~                                                                                                                               |
| _keyword-only_  |                                                                                                                                                               |
| `getter`        | Defaults to `getattr`. If provided, `getter(token, attr)` should return the value of the attribute for an individual `Token`. ~~Callable[[Token, str], Any]~~ |
| `head_attr`     | The attribute containing the head token. ~~str~~                                                                                                              |
| `head_getter`   | Defaults to `getattr`. If provided, `head_getter(token, attr)` should return the head for an individual `Token`. ~~Callable[[Token, str], Token]~~            |
| `ignore_labels` | Labels to ignore while scoring (e.g. `"punct"`). ~~Iterable[str]~~                                                                                            |
| **RETURNS**     | A dictionary containing the scores: `{attr}_uas`, `{attr}_las`, and `{attr}_las_per_type`. ~~Dict[str, Union[float, Dict[str, float]]]~~                      |

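The difference between the unlabeled (UAS) and labeled (LAS) attachment scores,
and the effect of `ignore_labels`, can be illustrated with a simplified sketch.
It assumes identical tokenization in prediction and gold and computes plain
accuracies, so spaCy's own computation may differ in detail. The helper
`uas_las` and the `(head_index, dep_label)` pairs are hypothetical.

```python
def uas_las(pred, gold, ignore_labels=()):
    """Compute unlabeled and labeled attachment scores.

    `pred` and `gold` are aligned lists of (head_index, dep_label) pairs,
    one pair per token.
    """
    ignore = {label.lower() for label in ignore_labels}
    total = unlabeled = labeled = 0
    for (pred_head, pred_dep), (gold_head, gold_dep) in zip(pred, gold):
        if gold_dep.lower() in ignore:
            continue  # e.g. skip punctuation
        total += 1
        if pred_head == gold_head:
            unlabeled += 1  # correct head
            if pred_dep.lower() == gold_dep.lower():
                labeled += 1  # correct head and correct label
    if total == 0:
        return {"dep_uas": 0.0, "dep_las": 0.0}
    return {"dep_uas": unlabeled / total, "dep_las": labeled / total}


# "She eats fish ." with heads as token indices (the root points to itself)
gold = [(1, "nsubj"), (1, "ROOT"), (1, "dobj"), (1, "punct")]
pred = [(1, "nsubj"), (1, "ROOT"), (1, "nmod"), (2, "punct")]

with_punct = uas_las(pred, gold)
without_punct = uas_las(pred, gold, ignore_labels=("p", "punct"))
```

Ignoring punctuation removes the misattached final token from the count, which
raises UAS from 3/4 to 3/3 in this example, while the wrong label on "fish"
still lowers LAS.
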
## Scorer.score_cats {#score_cats tag="staticmethod" new="3"}

Calculate PRF and ROC AUC scores for a doc-level attribute that is a dict
containing scores for each label like `Doc.cats`. The reported overall score
depends on the scorer settings:

1. **all:** `{attr}_score` (one of `{attr}_f` / `{attr}_macro_f` /
   `{attr}_macro_auc`), `{attr}_score_desc` (text description of the overall
   score), `{attr}_f_per_type`, `{attr}_auc_per_type`
2. **binary exclusive with positive label:** `{attr}_p`, `{attr}_r`, `{attr}_f`
3. **3+ exclusive classes**, macro-averaged F-score: `{attr}_macro_f`
4. **multilabel**, macro-averaged AUC: `{attr}_macro_auc`

> #### Example
>
> ```python
> labels = ["LABEL_A", "LABEL_B", "LABEL_C"]
> scores = Scorer.score_cats(
>     examples,
>     "cats",
>     labels=labels
> )
> print(scores["cats_macro_auc"])
> ```

| Name             | Description                                                                                                                                        |
| ---------------- | -------------------------------------------------------------------------------------------------------------------------------------------------- |
| `examples`       | The `Example` objects holding both the predictions and the correct gold-standard annotations. ~~Iterable[Example]~~                                |
| `attr`           | The attribute to score. ~~str~~                                                                                                                    |
| _keyword-only_   |                                                                                                                                                    |
| `getter`         | Defaults to `getattr`. If provided, `getter(doc, attr)` should return the cats for an individual `Doc`. ~~Callable[[Doc, str], Dict[str, float]]~~ |
| `labels`         | The set of possible labels. Defaults to `[]`. ~~Iterable[str]~~                                                                                    |
| `multi_label`    | Whether the attribute allows multiple labels. Defaults to `True`. ~~bool~~                                                                         |
| `positive_label` | The positive label for a binary task with exclusive classes. Defaults to `None`. ~~Optional[str]~~                                                 |
| **RETURNS**      | A dictionary containing the scores, with inapplicable scores as `None`. ~~Dict[str, Optional[float]]~~                                             |

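Which score ends up as the overall `{attr}_score` follows from the settings
listed above. The sketch below restates that selection as a small function. The
helper `overall_score_key` is hypothetical, and falling back to the
macro-averaged F-score for two exclusive classes without a positive label is an
assumption, since the list above does not cover that case.

```python
def overall_score_key(attr, labels, multi_label=True, positive_label=None):
    """Pick which score acts as the overall `{attr}_score`."""
    if multi_label:
        # Multilabel: macro-averaged AUC
        return f"{attr}_macro_auc"
    if len(labels) == 2 and positive_label is not None:
        # Binary exclusive with a positive label: F-score of that label
        if positive_label not in labels:
            raise ValueError(f"positive_label {positive_label!r} is not in labels")
        return f"{attr}_f"
    # Exclusive classes otherwise: macro-averaged F-score (assumed fallback)
    return f"{attr}_macro_f"


multilabel_key = overall_score_key("cats", ["LABEL_A", "LABEL_B", "LABEL_C"])
binary_key = overall_score_key(
    "cats", ["POS", "NEG"], multi_label=False, positive_label="POS"
)
exclusive_key = overall_score_key(
    "cats", ["LABEL_A", "LABEL_B", "LABEL_C"], multi_label=False
)
```

Because `multi_label` defaults to `True`, a call that only passes `labels`
reports the macro-averaged AUC as its overall score.
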
## Scorer.score_links {#score_links tag="staticmethod" new="3"}

Returns PRF for predicted links on the entity level. To disentangle the
performance of the NEL from the NER, this method only evaluates NEL links for
entities that overlap between the gold reference and the predictions.

> #### Example
>
> ```python
> scores = Scorer.score_links(
>     examples,
>     negative_labels=["NIL", ""]
> )
> print(scores["nel_micro_f"])
> ```

| Name              | Description                                                                                                         |
| ----------------- | ------------------------------------------------------------------------------------------------------------------- |
| `examples`        | The `Example` objects holding both the predictions and the correct gold-standard annotations. ~~Iterable[Example]~~ |
| _keyword-only_    |                                                                                                                     |
| `negative_labels` | The string values that refer to no annotation (e.g. `"NIL"`). ~~Iterable[str]~~                                     |
| **RETURNS**       | A dictionary containing the scores. ~~Dict[str, Optional[float]]~~                                                  |

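Restricting the evaluation to entities present in both the gold reference and
the predictions can be sketched as follows. This is an illustration under
stated assumptions, not spaCy's implementation: the helper `score_links_sketch`,
the dictionaries mapping `(start, end)` offsets to knowledge base identifiers,
the identifiers themselves, and the `nel_micro_p` and `nel_micro_r` keys are all
hypothetical. Only `nel_micro_f` appears in the example above.

```python
def score_links_sketch(pred_ents, gold_ents, negative_labels=("NIL", "")):
    """PRF over KB links, restricted to entities found in both gold and prediction.

    `pred_ents` and `gold_ents` map (start, end) offsets to a KB identifier.
    """
    negative = set(negative_labels)
    tp = fp = fn = 0
    # Only spans predicted by the NER *and* present in gold are evaluated
    for span in pred_ents.keys() & gold_ents.keys():
        pred_id, gold_id = pred_ents[span], gold_ents[span]
        if gold_id in negative:
            if pred_id not in negative:
                fp += 1  # linked something that should stay unlinked
        elif pred_id == gold_id:
            tp += 1
        elif pred_id in negative:
            fn += 1  # missed a gold link
        else:
            # Wrong link: a spurious prediction and a missed gold link
            fp += 1
            fn += 1
    p = tp / (tp + fp) if tp + fp else 0.0
    r = tp / (tp + fn) if tp + fn else 0.0
    f = 2 * p * r / (p + r) if p + r else 0.0
    return {"nel_micro_p": p, "nel_micro_r": r, "nel_micro_f": f}


gold = {(0, 2): "Q1", (5, 6): "Q2", (8, 9): "NIL", (11, 12): "Q4"}
pred = {(0, 2): "Q1", (5, 6): "Q3", (8, 9): "Q9", (14, 15): "Q5"}
scores = score_links_sketch(pred, gold)
```

The entities at `(11, 12)` and `(14, 15)` are NER errors rather than linking
errors, so they do not affect the result.
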