mirror of
https://github.com/explosion/spaCy.git
synced 2024-12-25 17:36:30 +03:00
Add training 101
parent abed463bbb
commit 2f40d6e7e7
@@ -1,3 +1,52 @@
//- 💫 DOCS > USAGE > SPACY 101 > TRAINING

+under-construction
p
    | spaCy's models are #[strong statistical] and every "decision" they make –
    | for example, which part-of-speech tag to assign, or whether a word is a
    | named entity – is a #[strong prediction]. This prediction is based
    | on the examples the model has seen during #[strong training]. To train
    | a model, you first need training data – examples of text, and the
    | labels you want the model to predict. This could be a part-of-speech tag,
    | a named entity or any other information.
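
p
    | As a minimal sketch of what "examples of text, and the labels you want
    | the model to predict" could look like in Python: the tuple layout, the
    | #[code entities] key and the helper #[code labelled_spans] are
    | assumptions made for illustration, not a format defined in this section.

```python
# Hypothetical training data: each example pairs a text with the labels
# we want the model to predict. Labels are (start, end, label) character spans.
TRAIN_DATA = [
    ("Amazon opened a new office in Berlin",
     {"entities": [(0, 6, "ORG"), (30, 36, "GPE")]}),
    ("I ordered a book from Amazon",
     {"entities": [(22, 28, "ORG")]}),
]


def labelled_spans(text, annotations):
    """Return (substring, label) pairs so the offsets can be checked by eye."""
    return [(text[start:end], label)
            for start, end, label in annotations["entities"]]


for text, annotations in TRAIN_DATA:
    print(labelled_spans(text, annotations))
```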

p
    | The model is then shown the unlabelled text and will make a prediction.
    | Because we know the correct answer, we can give the model feedback on its
    | prediction in the form of an #[strong error gradient] of the
    | #[strong loss function] that calculates the difference between the model's
    | prediction and the expected output. The greater the difference, the more
    | significant the gradient and the updates to our model.
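
p
    | The predict, compare and update cycle can be sketched with a toy
    | logistic model in plain Python. This is not spaCy's implementation:
    | the functions #[code predict] and #[code update], the feature values
    | and the learning rate are all assumptions chosen to keep the example
    | small.

```python
import math


def predict(weights, features):
    """Probability that the example has the label, via a logistic function."""
    score = sum(w * x for w, x in zip(weights, features))
    return 1.0 / (1.0 + math.exp(-score))


def update(weights, features, expected, learn_rate=0.5):
    """One training step: predict, compare with the expected output, adjust."""
    predicted = predict(weights, features)
    # For the log loss, the gradient with respect to the score is
    # (predicted - expected): the larger the error, the larger the update.
    error = predicted - expected
    new_weights = [w - learn_rate * error * x
                   for w, x in zip(weights, features)]
    loss = -math.log(predicted if expected == 1 else 1.0 - predicted)
    return new_weights, loss


weights = [0.0, 0.0]
features, expected = [1.0, 1.0], 1  # hypothetical features; 1 = "is an entity"
losses = []
for _ in range(5):
    weights, loss = update(weights, features, expected)
    losses.append(loss)

# The loss shrinks as the weights move towards the correct answer.
print([round(value, 3) for value in losses])
```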

+aside
    | #[strong Training data:] Examples and their annotations.#[br]
    | #[strong Text:] The input text the model should predict a label for.#[br]
    | #[strong Label:] The label the model should predict.#[br]
    | #[strong Gradient:] Gradient of the loss function calculating the
    | difference between the prediction and the expected output.

+image
    include ../../../assets/img/docs/training.svg
    .u-text-right
        +button("/assets/img/docs/training.svg", false, "secondary").u-text-tag View large graphic

p
    | When training a model, we don't just want it to memorise our examples –
    | we want it to come up with a theory that can be
    | #[strong generalised across other examples]. After all, we don't just want
    | the model to learn that this one instance of "Amazon" right here is a
    | company – we want it to learn that "Amazon", in contexts #[em like this],
    | is most likely a company. That's why the training data should always be
    | representative of the data we want to process. A model trained on
    | Wikipedia, where sentences in the first person are extremely rare, will
    | likely perform badly on Twitter. Similarly, a model trained on romantic
    | novels will likely perform badly on legal text.

p
    | This also means that in order to know how the model is performing,
    | and whether it's learning the right things, you don't only need
    | #[strong training data] – you'll also need #[strong evaluation data]. If
    | you only test the model with the data it was trained on, you'll have no
    | idea how well it's generalising. If you want to train a model from scratch,
    | you usually need at least a few hundred examples for both training and
    | evaluation. To update an existing model, you can already achieve decent
    | results with very few examples – as long as they're representative.
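
p
    | To illustrate why held-out evaluation data matters, here is a small
    | sketch in plain Python. The helpers #[code split_examples] and
    | #[code accuracy], the toy examples and the "memoriser" model are
    | hypothetical and only serve to show the effect: a model that merely
    | memorises its training examples looks perfect on them and fails on
    | examples it has not seen.

```python
import random


def split_examples(examples, eval_fraction=0.2, seed=0):
    """Shuffle a copy of the examples and hold out a fraction for evaluation."""
    shuffled = list(examples)
    random.Random(seed).shuffle(shuffled)
    n_eval = max(1, int(len(shuffled) * eval_fraction))
    return shuffled[n_eval:], shuffled[:n_eval]


def accuracy(predict, examples):
    """Fraction of examples for which predict(text) returns the expected label."""
    correct = sum(1 for text, label in examples if predict(text) == label)
    return correct / len(examples)


# Hypothetical toy data: ten distinct texts, all labelled "ORG".
examples = [("Amazon shares rose %d%%" % i, "ORG") for i in range(10)]
train, dev = split_examples(examples)

# A "model" that only memorises the exact texts it was trained on.
memory = dict(train)


def memoriser(text):
    return memory.get(text)


print("training accuracy:", accuracy(memoriser, train))
print("evaluation accuracy:", accuracy(memoriser, dev))
```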

@@ -252,6 +252,12 @@ include _spacy-101/_serialization

include _spacy-101/_training

+infobox
    | To learn more about #[strong training and updating] models, how to create
    | training data and how to improve spaCy's named entity recognition models,
    | see the usage guides on #[+a("/docs/usage/training") training] and
    | #[+a("/docs/usage/training-ner") training the named entity recognizer].

+h(2, "architecture") Architecture

+under-construction