Edits to the training 101 section

2025-07-15 02:32:37 +03:00 · 2020-07-26 13:42:08 +02:00 · 2020-07-26 13:42:08 +02:00 · e6a7deb7cc
commit e6a7deb7cc
parent 520d25cb50
1 changed file with 18 additions and 12 deletions
--- a/website/docs/usage/101/_training.md
+++ b/website/docs/usage/101/_training.md
@@ -1,26 +1,30 @@
-spaCy's models are **statistical** and every "decision" they make – for example,
+spaCy's tagger, parser, text categorizer and many other components are powered
+by **statistical models**. Every "decision" these components make – for example,
 which part-of-speech tag to assign, or whether a word is a named entity – is a
-**prediction**. This prediction is based on the examples the model has seen
+**prediction** based on the model's current **weight values**. The weight
+values are estimated based on examples the model has seen
 during **training**. To train a model, you first need training data – examples
 of text, and the labels you want the model to predict. This could be a
 part-of-speech tag, a named entity or any other information.
-The model is then shown the unlabelled text and will make a prediction. Because
+Training is an iterative process in which the model's predictions are compared 
-we know the correct answer, we can give the model feedback on its prediction in
+against the reference annotations in order to estimate the **gradient of the
-the form of an **error gradient** of the **loss function** that calculates the
+loss**. The gradient of the loss is then used to calculate the gradient of the
-difference between the training example and the expected output. The greater the
+weights through [backpropagation](https://thinc.ai/backprop101). The gradients
-difference, the more significant the gradient and the updates to our model.
+indicate how the weight values should be changed so that the model's
+predictions become more similar to the reference labels over time.
 > - **Training data:** Examples and their annotations.
 > - **Text:** The input text the model should predict a label for.
 > - **Label:** The label the model should predict.
-> - **Gradient:** Gradient of the loss function calculating the difference
+> - **Gradient:** The direction and rate of change for a numeric value.
->   between input and expected output.
+>   Minimising the gradient of the weights should result in predictions that
+>   are closer to the reference labels on the training data.
 ![The training process](../../images/training.svg)
 When training a model, we don't just want it to memorize our examples – we want
-it to come up with a theory that can be **generalized across other examples**.
+it to come up with a theory that can be **generalized across unseen data**.
 After all, we don't just want the model to learn that this one instance of
 "Amazon" right here is a company – we want it to learn that "Amazon", in
 contexts _like this_, is most likely a company. That's why the training data
@@ -34,5 +38,7 @@ it's learning the right things, you don't only need **training data** – you'll
 also need **evaluation data**. If you only test the model with the data it was
 trained on, you'll have no idea how well it's generalizing. If you want to train
 a model from scratch, you usually need at least a few hundred examples for both
-training and evaluation. To update an existing model, you can already achieve
+training and evaluation. A good rule of thumb is that you should have 10
-decent results with very few examples – as long as they're representative.
+samples for each significant figure of accuracy you report.
+If you only have 100 samples and your model predicts 92 of them correctly, you
+would report accuracy of 0.9 rather than 0.92.
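
Note on the first hunk: the loop described in the new text (compare predictions with reference labels, compute the gradient of the loss, derive the gradient of the weights, adjust the weights) can be sketched in plain Python. This is only an illustration under stated assumptions, not spaCy's or Thinc's implementation: the model is a single-layer logistic classifier, so backpropagation reduces to one application of the chain rule, and every name here (`EXAMPLES`, `predict`, `loss`, `train`, `learn_rate`) is hypothetical. Note that the loop lowers the loss by stepping against the gradient; the gradient is the signal for the update, not the quantity being minimised.

```python
import math

# Toy training data (hypothetical): ([bias, feature], reference label).
EXAMPLES = [
    ([1.0, 2.0], 1),
    ([1.0, 1.5], 1),
    ([1.0, -1.0], 0),
    ([1.0, -2.0], 0),
]


def predict(weights, features):
    """Return the predicted probability of label 1 for one example."""
    score = sum(w * x for w, x in zip(weights, features))
    return 1.0 / (1.0 + math.exp(-score))


def loss(weights, examples):
    """Mean cross-entropy between predictions and reference labels."""
    total = 0.0
    for features, label in examples:
        p = predict(weights, features)
        total -= label * math.log(p) + (1 - label) * math.log(1 - p)
    return total / len(examples)


def train(examples, n_iter=200, learn_rate=0.5):
    """Iteratively adjust the weights by following the gradient downhill."""
    weights = [0.0] * len(examples[0][0])
    for _ in range(n_iter):
        grads = [0.0] * len(weights)
        for features, label in examples:
            # Gradient of the loss with respect to the model's raw score.
            d_score = predict(weights, features) - label
            # Chain rule: gradient of the loss with respect to each weight.
            for i, x in enumerate(features):
                grads[i] += d_score * x / len(examples)
        # Step against the gradient so the loss decreases.
        weights = [w - learn_rate * g for w, g in zip(weights, grads)]
    return weights


if __name__ == "__main__":
    trained = train(EXAMPLES)
    print("loss before:", loss([0.0, 0.0], EXAMPLES))
    print("loss after: ", loss(trained, EXAMPLES))
```

Evaluating `loss` on the training examples alone says nothing about generalization, which is why the second hunk asks for separate evaluation data.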