From 574fd53289c80d43676643c5be985658b1ee48a3 Mon Sep 17 00:00:00 2001
From: Matthew Honnibal <honnibal+gh@gmail.com>
Date: Tue, 18 Aug 2020 13:51:08 +0200
Subject: [PATCH] Add precision/recall description

---
 website/docs/usage/training.md | 15 +++++++++++++++
 1 file changed, 15 insertions(+)

diff --git a/website/docs/usage/training.md b/website/docs/usage/training.md
index fc96a76c1..acfa4afa8 100644
--- a/website/docs/usage/training.md
+++ b/website/docs/usage/training.md
@@ -454,7 +454,22 @@ components are weighted equally.
 | **UAS** / **LAS**          | Unlabeled and labeled attachment score for the dependency parser, i.e. the percentage of correct arcs. Should increase. |
 | **Words per second** (WPS) | Prediction speed in words per second. Should stay stable.                                                               |
 
+Precision and recall are two common measurements of a model's accuracy. You
+need both statistics whenever your model can return a variable number of
+predictions, because in that situation there are two different ways your
+model can be "accurate".
+
+Precision refers to the fraction of predicted annotations that were correct,
+while recall refers to the fraction of reference annotations recovered.
+A model that only returns one entity for a document will have precision 1.0 if
+that entity is correct, but might have low recall if it has missed lots of
+other correct entities. F-score is the harmonic mean of precision and recall.
+The harmonic mean is used instead of the arithmetic mean so that systems with
+very low precision or very low recall will score lower than systems that
+achieve a balance of the two.
+
 <!-- TODO: is this still relevant? -->
+<!-- Yes (MH) -->
 
 Note that if the development data has raw text, some of the gold-standard
 entities might not align to the predicted tokenization. These tokenization