From 574fd53289c80d43676643c5be985658b1ee48a3 Mon Sep 17 00:00:00 2001
From: Matthew Honnibal
Date: Tue, 18 Aug 2020 13:51:08 +0200
Subject: [PATCH] Add precision/recall description

---
 website/docs/usage/training.md | 15 +++++++++++++++
 1 file changed, 15 insertions(+)

diff --git a/website/docs/usage/training.md b/website/docs/usage/training.md
index fc96a76c1..acfa4afa8 100644
--- a/website/docs/usage/training.md
+++ b/website/docs/usage/training.md
@@ -454,7 +454,22 @@ components are weighted equally.
 | **UAS** / **LAS** | Unlabeled and labeled attachment score for the dependency parser, i.e. the percentage of correct arcs. Should increase. |
 | **Words per second** (WPS) | Prediction speed in words per second. Should stay stable. |
 
+Precision and recall are two common measurements of a model's accuracy. You
+need precision and recall statistics whenever your model can return a variable
+number of predictions, as in this situation there are two different ways your
+model can be "accurate".
+
+Precision refers to the percentage of predicted annotations that were correct,
+while recall refers to the percentage of reference annotations recovered.
+A model that only returns one entity for a document will have precision 1.0 if
+that entity is correct, but might have low recall if it has missed lots of
+other correct entities. F-score is the harmonic mean of precision and recall.
+The harmonic mean is used instead of the arithmetic mean so that systems with
+very low precision or very low recall will score lower than systems that
+achieve a balance of the two.
+
+
 Note that if the development data has raw text, some of the gold-standard
 entities might not align to the predicted tokenization. These tokenization