---
title: Scorer
teaser: Compute evaluation scores
tag: class
source: spacy/scorer.py
---

The `Scorer` computes evaluation scores. It's typically created by
[`Language.evaluate`](/api/language#evaluate). In addition, the `Scorer`
provides a number of methods for evaluating [`Token`](/api/token) and
[`Doc`](/api/doc) attributes.

## Scorer.\_\_init\_\_ {#init tag="method"}

Create a new `Scorer`.

> #### Example
>
> ```python
> import spacy
> from spacy.scorer import Scorer
>
> # Default scoring pipeline
> scorer = Scorer()
>
> # Provided scoring pipeline
> nlp = spacy.load("en_core_web_sm")
> scorer = Scorer(nlp)
> ```

| Name  | Description                                                                                                                                                                                                                                                                          |
| ----- | ------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------ |
| `nlp` | The pipeline to use for scoring, where each pipeline component may provide a scoring method. If none is provided, then a default pipeline for the multi-language code `xx` is constructed containing: `senter`, `tagger`, `morphologizer`, `parser`, `ner`, `textcat`. ~~Language~~ |

## Scorer.score {#score tag="method"}

Calculate the scores for a list of [`Example`](/api/example) objects using the
scoring methods provided by the components in the pipeline.

The returned `Dict` contains the scores provided by the individual pipeline
components.
For the scoring methods provided by the `Scorer` and used by the core pipeline
components, the individual score names start with the `Token` or `Doc` attribute
being scored:

- `token_acc`, `token_p`, `token_r`, `token_f`
- `sents_p`, `sents_r`, `sents_f`
- `tag_acc`, `pos_acc`, `morph_acc`, `morph_per_feat`, `lemma_acc`
- `dep_uas`, `dep_las`, `dep_las_per_type`
- `ents_p`, `ents_r`, `ents_f`, `ents_per_type`
- `textcat_macro_auc`, `textcat_macro_f`

> #### Example
>
> ```python
> scorer = Scorer()
> scores = scorer.score(examples)
> ```

| Name        | Description                                                                                                         |
| ----------- | ------------------------------------------------------------------------------------------------------------------- |
| `examples`  | The `Example` objects holding both the predictions and the correct gold-standard annotations. ~~Iterable[Example]~~ |
| **RETURNS** | A dictionary of scores. ~~Dict[str, Union[float, Dict[str, float]]]~~                                               |

## Scorer.score_tokenization {#score_tokenization tag="staticmethod" new="3"}

Scores the tokenization:

- `token_acc`: number of correct tokens / number of gold tokens
- `token_p`, `token_r`, `token_f`: precision, recall and F-score for token
  character spans

> #### Example
>
> ```python
> scores = Scorer.score_tokenization(examples)
> ```

| Name        | Description                                                                                                         |
| ----------- | ------------------------------------------------------------------------------------------------------------------- |
| `examples`  | The `Example` objects holding both the predictions and the correct gold-standard annotations. ~~Iterable[Example]~~ |
| **RETURNS** | A dictionary containing the scores `token_acc`, `token_p`, `token_r`, `token_f`. ~~Dict[str, float]~~               |

## Scorer.score_token_attr {#score_token_attr tag="staticmethod" new="3"}

Scores a single token attribute.

> #### Example
>
> ```python
> scores = Scorer.score_token_attr(examples, "pos")
> print(scores["pos_acc"])
> ```

| Name           | Description                                                                                                                                                   |
| -------------- | ------------------------------------------------------------------------------------------------------------------------------------------------------------- |
| `examples`     | The `Example` objects holding both the predictions and the correct gold-standard annotations. ~~Iterable[Example]~~                                           |
| `attr`         | The attribute to score. ~~str~~                                                                                                                               |
| _keyword-only_ |                                                                                                                                                               |
| `getter`       | Defaults to `getattr`. If provided, `getter(token, attr)` should return the value of the attribute for an individual `Token`. ~~Callable[[Token, str], Any]~~ |
| **RETURNS**    | A dictionary containing the score `{attr}_acc`. ~~Dict[str, float]~~                                                                                          |

## Scorer.score_token_attr_per_feat {#score_token_attr_per_feat tag="staticmethod" new="3"}

Scores a single token attribute per feature, for a token attribute in the
Universal Dependencies
[FEATS](https://universaldependencies.org/format.html#morphological-annotation)
format.

> #### Example
>
> ```python
> scores = Scorer.score_token_attr_per_feat(examples, "morph")
> print(scores["morph_per_feat"])
> ```

| Name           | Description                                                                                                                                                   |
| -------------- | ------------------------------------------------------------------------------------------------------------------------------------------------------------- |
| `examples`     | The `Example` objects holding both the predictions and the correct gold-standard annotations. ~~Iterable[Example]~~                                           |
| `attr`         | The attribute to score. ~~str~~                                                                                                                               |
| _keyword-only_ |                                                                                                                                                               |
| `getter`       | Defaults to `getattr`. If provided, `getter(token, attr)` should return the value of the attribute for an individual `Token`. ~~Callable[[Token, str], Any]~~ |
| **RETURNS**    | A dictionary containing the per-feature PRF scores under the key `{attr}_per_feat`. ~~Dict[str, Dict[str, float]]~~                                           |

## Scorer.score_spans {#score_spans tag="staticmethod" new="3"}

Returns PRF scores for labeled or unlabeled spans.

> #### Example
>
> ```python
> scores = Scorer.score_spans(examples, "ents")
> print(scores["ents_f"])
> ```

| Name           | Description                                                                                                                                                                                   |
| -------------- | --------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------- |
| `examples`     | The `Example` objects holding both the predictions and the correct gold-standard annotations. ~~Iterable[Example]~~                                                                           |
| `attr`         | The attribute to score. ~~str~~                                                                                                                                                               |
| _keyword-only_ |                                                                                                                                                                                               |
| `getter`       | Defaults to `getattr`. If provided, `getter(doc, attr)` should return the `Span` objects for an individual `Doc`. ~~Callable[[Doc, str], Iterable[Span]]~~                                    |
| **RETURNS**    | A dictionary containing the PRF scores under the keys `{attr}_p`, `{attr}_r`, `{attr}_f` and the per-type PRF scores under `{attr}_per_type`. ~~Dict[str, Union[float, Dict[str, float]]]~~ |

## Scorer.score_deps {#score_deps tag="staticmethod" new="3"}

Calculate the UAS, LAS, and LAS per type scores for dependency parses.

> #### Example
>
> ```python
> def dep_getter(token, attr):
>     dep = getattr(token, attr)
>     dep = token.vocab.strings.as_string(dep).lower()
>     return dep
>
> scores = Scorer.score_deps(
>     examples,
>     "dep",
>     getter=dep_getter,
>     ignore_labels=("p", "punct")
> )
> print(scores["dep_uas"], scores["dep_las"])
> ```

| Name            | Description                                                                                                                                                   |
| --------------- | ------------------------------------------------------------------------------------------------------------------------------------------------------------- |
| `examples`      | The `Example` objects holding both the predictions and the correct gold-standard annotations. ~~Iterable[Example]~~                                           |
| `attr`          | The attribute to score. ~~str~~                                                                                                                               |
| _keyword-only_  |                                                                                                                                                               |
| `getter`        | Defaults to `getattr`. If provided, `getter(token, attr)` should return the value of the attribute for an individual `Token`. ~~Callable[[Token, str], Any]~~ |
| `head_attr`     | The attribute containing the head token. ~~str~~                                                                                                              |
| `head_getter`   | Defaults to `getattr`. If provided, `head_getter(token, attr)` should return the head for an individual `Token`. ~~Callable[[Token, str], Token]~~            |
| `ignore_labels` | Labels to ignore while scoring (e.g. `"punct"`). ~~Iterable[str]~~                                                                                            |
| **RETURNS**     | A dictionary containing the scores: `{attr}_uas`, `{attr}_las`, and `{attr}_las_per_type`. ~~Dict[str, Union[float, Dict[str, float]]]~~                      |

## Scorer.score_cats {#score_cats tag="staticmethod" new="3"}

Calculate PRF and ROC AUC scores for a doc-level attribute that is a dict
containing scores for each label like `Doc.cats`. The reported overall score
depends on the scorer settings:

1. **all:** `{attr}_score` (one of `{attr}_f` / `{attr}_macro_f` /
   `{attr}_macro_auc`), `{attr}_score_desc` (text description of the overall
   score), `{attr}_f_per_type`, `{attr}_auc_per_type`
2. **binary exclusive with positive label:** `{attr}_p`, `{attr}_r`, `{attr}_f`
3. **3+ exclusive classes**, macro-averaged F-score: `{attr}_macro_f`
4. **multilabel**, macro-averaged AUC: `{attr}_macro_auc`

> #### Example
>
> ```python
> labels = ["LABEL_A", "LABEL_B", "LABEL_C"]
> scores = Scorer.score_cats(
>     examples,
>     "cats",
>     labels=labels
> )
> print(scores["cats_macro_auc"])
> ```

| Name             | Description                                                                                                                                        |
| ---------------- | -------------------------------------------------------------------------------------------------------------------------------------------------- |
| `examples`       | The `Example` objects holding both the predictions and the correct gold-standard annotations. ~~Iterable[Example]~~                                |
| `attr`           | The attribute to score. ~~str~~                                                                                                                    |
| _keyword-only_   |                                                                                                                                                    |
| `getter`         | Defaults to `getattr`. If provided, `getter(doc, attr)` should return the cats for an individual `Doc`. ~~Callable[[Doc, str], Dict[str, float]]~~ |
| `labels`         | The set of possible labels. Defaults to `[]`. ~~Iterable[str]~~                                                                                    |
| `multi_label`    | Whether the attribute allows multiple labels. Defaults to `True`. ~~bool~~                                                                         |
| `positive_label` | The positive label for a binary task with exclusive classes. Defaults to `None`. ~~Optional[str]~~                                                 |
| **RETURNS**      | A dictionary containing the scores, with inapplicable scores as `None`. ~~Dict[str, Optional[float]]~~                                             |
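
To make the macro-averaged F-score for exclusive classes more concrete, the following is a minimal pure-Python sketch of the arithmetic. It is not spaCy's implementation: the helper names `prf` and `macro_f` and the toy label lists are hypothetical and exist only for illustration, and the sketch assumes each document has exactly one gold and one predicted label.

```python
from typing import Dict, List, Sequence, Tuple


def prf(tp: int, fp: int, fn: int) -> Tuple[float, float, float]:
    """Precision, recall and F1 from raw counts (0.0 when undefined)."""
    p = tp / (tp + fp) if tp + fp else 0.0
    r = tp / (tp + fn) if tp + fn else 0.0
    f = 2 * p * r / (p + r) if p + r else 0.0
    return p, r, f


def macro_f(gold: List[str], pred: List[str], labels: Sequence[str]) -> Dict[str, float]:
    """Per-label F1 plus their unweighted mean (hypothetical helper)."""
    per_label: Dict[str, float] = {}
    for label in labels:
        pairs = list(zip(gold, pred))
        tp = sum(g == label and p == label for g, p in pairs)
        fp = sum(g != label and p == label for g, p in pairs)
        fn = sum(g == label and p != label for g, p in pairs)
        per_label[label] = prf(tp, fp, fn)[2]
    scores = dict(per_label)
    # Macro-averaging gives every label the same weight, regardless of frequency.
    scores["macro_f"] = sum(per_label.values()) / len(per_label) if per_label else 0.0
    return scores


# Toy data: one gold and one predicted label per document.
gold = ["LABEL_A", "LABEL_B", "LABEL_C", "LABEL_A"]
pred = ["LABEL_A", "LABEL_B", "LABEL_A", "LABEL_A"]
result = macro_f(gold, pred, ["LABEL_A", "LABEL_B", "LABEL_C"])
print(result)
```

Because the average is unweighted, a rare label that is never predicted correctly (`LABEL_C` in the toy data) lowers the overall score as much as a frequent one would.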