---
title: Scorer
teaser: Compute evaluation scores
tag: class
source: spacy/scorer.py
---

The `Scorer` computes evaluation scores. It's typically created by
[`Language.evaluate`](/api/language#evaluate).

In addition, the `Scorer` provides a number of evaluation methods for
evaluating `Token` and `Doc` attributes.

## Scorer.\_\_init\_\_ {#init tag="method"}

Create a new `Scorer`.

> #### Example
>
> ```python
> import spacy
> from spacy.scorer import Scorer
>
> # default scoring pipeline
> scorer = Scorer()
>
> # provided scoring pipeline
> nlp = spacy.load("en_core_web_sm")
> scorer = Scorer(nlp)
> ```

| Name        | Type       | Description |
| ----------- | ---------- | ----------- |
| `nlp`       | `Language` | The pipeline to use for scoring, where each pipeline component may provide a scoring method. If none is provided, then a default pipeline for the multi-language code `xx` is constructed containing: `senter`, `tagger`, `morphologizer`, `parser`, `ner`, `textcat`. |
| **RETURNS** | `Scorer`   | The newly created object. |

## Scorer.score {#score tag="method"}

Calculate the scores for a list of [`Example`](/api/example) objects using the
scoring methods provided by the components in the pipeline.

The returned `Dict` contains the scores provided by the individual pipeline
components. For the scoring methods provided by the `Scorer` and used by the
core pipeline components, the individual score names start with the `Token` or
`Doc` attribute being scored: `token_acc`, `token_p/r/f`, `sents_p/r/f`,
`tag_acc`, `pos_acc`, `morph_acc`, `morph_per_feat`, `lemma_acc`, `dep_uas`,
`dep_las`, `dep_las_per_type`, `ents_p/r/f`, `ents_per_type`,
`textcat_macro_auc`, `textcat_macro_f`.

> #### Example
>
> ```python
> scorer = Scorer()
> scorer.score(examples)
> ```

| Name        | Type                | Description |
| ----------- | ------------------- | ----------- |
| `examples`  | `Iterable[Example]` | The `Example` objects holding both the predictions and the correct gold-standard annotations. |
| **RETURNS** | `Dict`              | A dictionary of scores. |
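
For a fuller picture, here is a minimal end-to-end sketch. It assumes a
trained pipeline and the released v3 import path for `Example`
(`spacy.training`); the text and entity annotations are hypothetical:

```python
import spacy
from spacy.scorer import Scorer
from spacy.training import Example

nlp = spacy.load("en_core_web_sm")
scorer = Scorer(nlp)

# Pair each predicted doc with its gold-standard annotations
examples = []
for text, annots in [
    ("Apple is looking at U.K. startups.",
     {"entities": [(0, 5, "ORG"), (20, 24, "GPE")]}),
]:
    examples.append(Example.from_dict(nlp(text), annots))

scores = scorer.score(examples)
print(scores["ents_f"], scores["token_acc"])
```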

## Scorer.score_tokenization {#score_tokenization tag="staticmethod"}

Scores the tokenization:

* `token_acc`: # correct tokens / # gold tokens
* `token_p/r/f`: PRF for token character spans

| Name        | Type                | Description |
| ----------- | ------------------- | ----------- |
| `examples`  | `Iterable[Example]` | The `Example` objects holding both the predictions and the correct gold-standard annotations. |
| **RETURNS** | `Dict`              | A dictionary containing the scores `token_acc/p/r/f`. |
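
As a sketch, the static method can be called directly, reusing the
hypothetical `examples` list built above:

```python
from spacy.scorer import Scorer

# PRF is computed over (token.idx, token.idx + len(token)) character spans
scores = Scorer.score_tokenization(examples)
print(scores["token_acc"], scores["token_f"])
```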

## Scorer.score_token_attr {#score_token_attr tag="staticmethod"}

Scores a single token attribute.

| Name        | Type                | Description |
| ----------- | ------------------- | ----------- |
| `examples`  | `Iterable[Example]` | The `Example` objects holding both the predictions and the correct gold-standard annotations. |
| `attr`      | `str`               | The attribute to score. |
| `getter`    | `callable`          | Defaults to `getattr`. If provided, `getter(token, attr)` should return the value of the attribute for an individual `Token`. |
| **RETURNS** | `Dict`              | A dictionary containing the score `attr_acc`. |
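
For example, a sketch scoring coarse-grained part-of-speech tags, again with
the hypothetical `examples` from above; the score is keyed by the attribute
name, here `pos_acc`:

```python
from spacy.scorer import Scorer

# Per-token accuracy for the "pos" attribute, returned under "pos_acc"
scores = Scorer.score_token_attr(examples, "pos")
print(scores["pos_acc"])
```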

## Scorer.score_token_attr_per_feat {#score_token_attr_per_feat tag="staticmethod"}

Scores a single token attribute per feature for a token attribute in UFEATS
format.

| Name        | Type                | Description |
| ----------- | ------------------- | ----------- |
| `examples`  | `Iterable[Example]` | The `Example` objects holding both the predictions and the correct gold-standard annotations. |
| `attr`      | `str`               | The attribute to score. |
| `getter`    | `callable`          | Defaults to `getattr`. If provided, `getter(token, attr)` should return the value of the attribute for an individual `Token`. |
| **RETURNS** | `Dict`              | A dictionary containing the per-feature PRF scores under the key `attr_per_feat`. |
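
A sketch scoring morphological features, assuming the attribute's string form
is UFEATS-formatted (e.g. `Case=Nom|Number=Sing`) and reusing the hypothetical
`examples` from above:

```python
from spacy.scorer import Scorer

# Per-feature PRF (Case, Number, ...), returned under "morph_per_feat"
scores = Scorer.score_token_attr_per_feat(examples, "morph")
print(scores["morph_per_feat"])
```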

## Scorer.score_spans {#score_spans tag="staticmethod"}

Returns PRF scores for labeled or unlabeled spans.

| Name        | Type                | Description |
| ----------- | ------------------- | ----------- |
| `examples`  | `Iterable[Example]` | The `Example` objects holding both the predictions and the correct gold-standard annotations. |
| `attr`      | `str`               | The attribute to score. |
| `getter`    | `callable`          | Defaults to `getattr`. If provided, `getter(doc, attr)` should return the `Span` objects for an individual `Doc`. |
| **RETURNS** | `Dict`              | A dictionary containing the PRF scores under the keys `attr_p/r/f` and the per-type PRF scores under `attr_per_type`. |
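
For example, named entities can be scored as labeled spans via `Doc.ents`; a
sketch with the hypothetical `examples` from above:

```python
from spacy.scorer import Scorer

# Labeled spans from Doc.ents: PRF under "ents_p/r/f" plus "ents_per_type";
# Doc.sents can be scored the same way with attr="sents"
scores = Scorer.score_spans(examples, "ents")
print(scores["ents_f"], scores["ents_per_type"])
```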

## Scorer.score_deps {#score_deps tag="staticmethod"}

Calculate the UAS, LAS, and LAS per type scores for dependency parses.

| Name            | Type                | Description |
| --------------- | ------------------- | ----------- |
| `examples`      | `Iterable[Example]` | The `Example` objects holding both the predictions and the correct gold-standard annotations. |
| `attr`          | `str`               | The attribute containing the dependency label. |
| `getter`        | `callable`          | Defaults to `getattr`. If provided, `getter(token, attr)` should return the value of the attribute for an individual `Token`. |
| `head_attr`     | `str`               | The attribute containing the head token. |
| `head_getter`   | `callable`          | Defaults to `getattr`. If provided, `head_getter(token, attr)` should return the head for an individual `Token`. |
| `ignore_labels` | `Tuple`             | Labels to ignore while scoring (e.g., `punct`). |
| **RETURNS**     | `Dict`              | A dictionary containing the scores: `attr_uas`, `attr_las`, and `attr_las_per_type`. |
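
A sketch using `Token.dep` and `Token.head`; the custom getter, which
normalizes labels to lowercase strings so they can be matched against
`ignore_labels`, is an illustrative assumption, and `examples` is the
hypothetical list from above:

```python
from spacy.scorer import Scorer

def dep_getter(token, attr):
    # Return the dependency label as a lowercased string
    dep = getattr(token, attr)
    return token.vocab.strings.as_string(dep).lower()

# Scores are keyed as "dep_uas", "dep_las" and "dep_las_per_type"
scores = Scorer.score_deps(
    examples,
    "dep",
    getter=dep_getter,
    ignore_labels=("p", "punct"),
)
print(scores["dep_uas"], scores["dep_las"])
```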

## Scorer.score_cats {#score_cats tag="staticmethod"}

Calculate PRF and ROC AUC scores for a doc-level attribute that is a dict
containing scores for each label, like `Doc.cats`.

| Name             | Type                | Description |
| ---------------- | ------------------- | ----------- |
| `examples`       | `Iterable[Example]` | The `Example` objects holding both the predictions and the correct gold-standard annotations. |
| `attr`           | `str`               | The attribute to score. |
| `getter`         | `callable`          | Defaults to `getattr`. If provided, `getter(doc, attr)` should return the cats for an individual `Doc`. |
| `labels`         | `Iterable[str]`     | The set of possible labels. Defaults to `[]`. |
| `multi_label`    | `bool`              | Whether the attribute allows multiple labels. Defaults to `True`. |
| `positive_label` | `str`               | The positive label for a binary task with exclusive classes. Defaults to `None`. |
| **RETURNS**      | `Dict`              | A dictionary containing the scores: 1) for binary exclusive with positive label: `attr_p/r/f`; 2) for 3+ exclusive classes, macro-averaged fscore: `attr_macro_f`; 3) for multilabel, macro-averaged AUC: `attr_macro_auc`; 4) for all: `attr_f_per_type`, `attr_auc_per_type`. |
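
A sketch for a binary task with mutually exclusive classes, scored over
`Doc.cats`; the label names are hypothetical, and `examples` is assumed to
carry gold-standard `cats` annotations:

```python
from spacy.scorer import Scorer

# Binary exclusive classes with a positive label: PRF is returned
# under "cats_p", "cats_r" and "cats_f"
scores = Scorer.score_cats(
    examples,
    "cats",
    labels=["POSITIVE", "NEGATIVE"],
    multi_label=False,
    positive_label="POSITIVE",
)
print(scores["cats_f"], scores["cats_f_per_type"])
```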