Restore v2 token_acc score implementation (#12073)

In the v3 scorer refactoring, `token_acc` was implemented incorrectly:
it used `fscore` where it should use `precision`, i.e. the number of
correctly aligned tokens / the number of predicted tokens.

Fix the docs to reflect that the measure uses the number of predicted
tokens rather than the number of gold tokens.
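
To make the difference concrete, here is a minimal sketch using spaCy's
`PRFScore` counters. The counts are hypothetical, chosen to reproduce the
before/after values in the test change below: one of two predicted tokens
aligns one-to-one with the gold tokens, and (judging from the old/new test
values) no false negatives are recorded for this counter, which is why
`fscore` came out inflated.

```python
# A minimal sketch, assuming spaCy's PRFScore (plain tp/fp/fn counters
# with precision/recall/fscore properties). Counts are hypothetical.
from spacy.scorer import PRFScore

acc_score = PRFScore()
acc_score.tp = 1  # predicted tokens aligned one-to-one with gold
acc_score.fp = 1  # predicted tokens without a one-to-one alignment
# fn stays 0, so recall is 1.0 and fscore overstates the accuracy

print(acc_score.precision)  # 1 / (1 + 1) = 0.5 -> the restored token_acc
print(acc_score.fscore)     # 2*1 / (2*1 + 1 + 0) ~= 0.667 -> the old value
```
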
Adriane Boyd 2023-01-11 08:01:47 +01:00 committed by GitHub
parent 19650ebb52
commit 9e0322de1a
3 changed files with 3 additions and 3 deletions

@@ -174,7 +174,7 @@ class Scorer:
             prf_score.score_set(pred_spans, gold_spans)
         if len(acc_score) > 0:
             return {
-                "token_acc": acc_score.fscore,
+                "token_acc": acc_score.precision,
                 "token_p": prf_score.precision,
                 "token_r": prf_score.recall,
                 "token_f": prf_score.fscore,

@@ -110,7 +110,7 @@ def test_tokenization(sented_doc):
     )
     example.predicted[1].is_sent_start = False
     scores = scorer.score([example])
-    assert scores["token_acc"] == approx(0.66666666)
+    assert scores["token_acc"] == 0.5
     assert scores["token_p"] == 0.5
     assert scores["token_r"] == approx(0.33333333)
     assert scores["token_f"] == 0.4

@@ -76,7 +76,7 @@ core pipeline components, the individual score names start with the `Token` or
 Scores the tokenization:
-- `token_acc`: number of correct tokens / number of gold tokens
+- `token_acc`: number of correct tokens / number of predicted tokens
- `token_p`, `token_r`, `token_f`: precision, recall and F-score for token
   character spans
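
As an end-to-end illustration of the documented scores, a minimal sketch
with a made-up tokenization mismatch; the sentence, token splits, and
expected values are illustrative assumptions, not part of this change:

```python
# A minimal sketch of scoring a tokenization mismatch end to end.
import spacy
from spacy.scorer import Scorer
from spacy.tokens import Doc
from spacy.training import Example

nlp = spacy.blank("en")
# Same underlying text, different token boundaries: the gold standard
# splits "can't" into "ca" + "n't", the prediction keeps it whole.
gold = Doc(nlp.vocab, words=["I", "ca", "n't", "go"],
           spaces=[True, False, True, False])
pred = Doc(nlp.vocab, words=["I", "can't", "go"],
           spaces=[True, True, False])

example = Example(pred, gold)
scores = Scorer.score_tokenization([example])
# Expected with this fix: token_acc = 2/3, since 2 of the 3 predicted
# tokens ("I", "go") align one-to-one with gold tokens. token_p/r/f are
# precision, recall and F-score over token character spans.
print(scores["token_acc"], scores["token_p"],
      scores["token_r"], scores["token_f"])
```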