Edits to the training 101 section

Matthew Honnibal 2020-07-26 13:42:08 +02:00
parent 520d25cb50
commit e6a7deb7cc

@@ -1,26 +1,30 @@
-spaCy's models are **statistical** and every "decision" they make – for example,
+spaCy's tagger, parser, text categorizer and many other components are powered
+by **statistical models**. Every "decision" these components make – for example,
 which part-of-speech tag to assign, or whether a word is a named entity – is a
-**prediction**. This prediction is based on the examples the model has seen
+**prediction** based on the model's current **weight values**. The weight
+values are estimated based on examples the model has seen
 during **training**. To train a model, you first need training data – examples
 of text, and the labels you want the model to predict. This could be a
 part-of-speech tag, a named entity or any other information.

-The model is then shown the unlabelled text and will make a prediction. Because
-we know the correct answer, we can give the model feedback on its prediction in
-the form of an **error gradient** of the **loss function** that calculates the
-difference between the training example and the expected output. The greater the
-difference, the more significant the gradient and the updates to our model.
+Training is an iterative process in which the model's predictions are compared
+against the reference annotations in order to estimate the **gradient of the
+loss**. The gradient of the loss is then used to calculate the gradient of the
+weights through [backpropagation](https://thinc.ai/backprop101). The gradients
+indicate how the weight values should be changed so that the model's
+predictions become more similar to the reference labels over time.

 > - **Training data:** Examples and their annotations.
 > - **Text:** The input text the model should predict a label for.
 > - **Label:** The label the model should predict.
-> - **Gradient:** Gradient of the loss function calculating the difference
->   between input and expected output.
+> - **Gradient:** The direction and rate of change for a numeric value.
+>   Minimising the gradient of the weights should result in predictions that
+>   are closer to the reference labels on the training data.

 ![The training process](../../images/training.svg)

 When training a model, we don't just want it to memorize our examples – we want
-it to come up with a theory that can be **generalized across other examples**.
+it to come up with a theory that can be **generalized across unseen data**.
 After all, we don't just want the model to learn that this one instance of
 "Amazon" right here is a company – we want it to learn that "Amazon", in
 contexts _like this_, is most likely a company. That's why the training data
@@ -34,5 +38,7 @@ it's learning the right things, you don't only need **training data** – you'll
 also need **evaluation data**. If you only test the model with the data it was
 trained on, you'll have no idea how well it's generalizing. If you want to train
 a model from scratch, you usually need at least a few hundred examples for both
-training and evaluation. To update an existing model, you can already achieve
-decent results with very few examples – as long as they're representative.
+training and evaluation. A good rule of thumb is that you should have 10
+samples for each significant figure of accuracy you report.
+If you only have 100 samples and your model predicts 92 of them correctly, you
+would report accuracy of 0.9 rather than 0.92.
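
The revised training paragraph describes a loop with four steps:

- compare predictions with reference labels,
- estimate the gradient of the loss,
- backpropagate it to the weights,
- move the weights against their gradients.

A minimal sketch of that loop follows. It assumes a single-layer logistic-regression model trained with plain stochastic gradient descent on a tiny toy dataset. It illustrates the general technique only and is not spaCy's or Thinc's training code. The names `TRAIN_DATA`, `predict`, `mean_loss` and `train` are hypothetical.

```python
import math

# Hypothetical toy data: (features, reference label). The label is 1 exactly
# when the first feature is 1, so a linear model can learn it.
TRAIN_DATA = [
    ([0.0, 0.0], 0),
    ([0.0, 1.0], 0),
    ([1.0, 0.0], 1),
    ([1.0, 1.0], 1),
]


def predict(weights, bias, features):
    """Return the model's predicted probability that the label is 1."""
    score = bias + sum(w * x for w, x in zip(weights, features))
    return 1.0 / (1.0 + math.exp(-score))


def mean_loss(weights, bias, data):
    """Mean cross-entropy between the predictions and the reference labels."""
    total = 0.0
    for features, label in data:
        prob = predict(weights, bias, features)
        total -= label * math.log(prob) + (1 - label) * math.log(1.0 - prob)
    return total / len(data)


def train(data, epochs=100, learn_rate=0.5):
    """Fit the weights with plain stochastic gradient descent."""
    weights = [0.0] * len(data[0][0])
    bias = 0.0
    for _ in range(epochs):
        for features, label in data:
            # Compare the prediction with the reference label. For a sigmoid
            # output with cross-entropy loss, the gradient of the loss with
            # respect to the score is (prediction - label).
            d_score = predict(weights, bias, features) - label
            # Backpropagate through score = bias + sum(w * x) (chain rule):
            # d_loss/d_w = d_score * x and d_loss/d_bias = d_score.
            d_weights = [d_score * x for x in features]
            # Step each weight against its gradient.
            weights = [w - learn_rate * dw for w, dw in zip(weights, d_weights)]
            bias -= learn_rate * d_score
    return weights, bias


initial_loss = mean_loss([0.0, 0.0], 0.0, TRAIN_DATA)
weights, bias = train(TRAIN_DATA)
final_loss = mean_loss(weights, bias, TRAIN_DATA)
print(f"loss before: {initial_loss:.3f}, after: {final_loss:.3f}")
```

In a multi-layer network, the same chain-rule step is repeated layer by layer, which is the part backpropagation automates. The single-layer case above just keeps each step visible.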