Add notes on catastrophic forgetting (see #1496)

ines 2017-11-06 13:17:02 +01:00
parent e68d31bffa
commit 2dca9e71a1
2 changed files with 16 additions and 0 deletions


@@ -40,6 +40,10 @@ from spacy.gold import GoldParse, minibatch
 LABEL = 'ANIMAL'
 # training data
+# Note: If you're using an existing model, make sure to mix in examples of
+# other entity types that spaCy correctly recognized before. Otherwise, your
+# model might learn the new type, but "forget" what it previously knew.
+# https://explosion.ai/blog/pseudo-rehearsal-catastrophic-forgetting
 TRAIN_DATA = [
     ("Horses are too tall and they pretend to care about your feelings",
      [(0, 6, 'ANIMAL')]),


@@ -144,3 +144,15 @@ p
 | novel symbol, #[code -PRON-], which is used as the lemma for
 | all personal pronouns. For more info on this, see the
 | #[+api("annotation#lemmatization") annotation specs] on lemmatization.
++h(3, "catastrophic-forgetting") NER model doesn't recognise other entities anymore after training
+p
+| If your training data only contained new entities and you didn't mix in
+| any examples the model previously recognised, it can cause the model to
+| "forget" what it had previously learned. This is also referred to as the
+| #[+a("https://explosion.ai/blog/pseudo-rehearsal-catastrophic-forgetting", true) "catastrophic forgetting problem"].
+| A solution is to pre-label some text, and mix it with the new text in
+| your updates. You can also do this by running spaCy over some text,
+| extracting a bunch of entities the model previously recognised correctly,
+| and adding them to your training examples.
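
The mixing step described in the added documentation could be sketched roughly as below. This is not part of the commit: the helper names `make_revision_examples` and `mix_examples` are hypothetical, and the sketch only assumes that `nlp` is a callable returning an object with `.ents`, where each entity exposes `start_char`, `end_char` and `label_` (as spaCy's `Doc` and `Span` do). It also keeps every prediction the model makes, so checking that those entities are actually correct is left as a manual step.

```python
import random


def make_revision_examples(nlp, texts):
    """Label raw texts with the existing model's own entity predictions.

    Hypothetical helper: `nlp` is any callable whose result has `.ents`,
    each with `start_char`, `end_char` and `label_` attributes.
    The predictions are NOT verified here; review them before training.
    """
    examples = []
    for text in texts:
        doc = nlp(text)
        offsets = [(ent.start_char, ent.end_char, ent.label_)
                   for ent in doc.ents]
        if offsets:  # skip texts where the model found no entities
            examples.append((text, offsets))
    return examples


def mix_examples(new_examples, revision_examples, seed=0):
    """Combine new-label and revision examples and shuffle them together."""
    mixed = list(new_examples) + list(revision_examples)
    random.Random(seed).shuffle(mixed)  # seeded for reproducibility
    return mixed
```

With a loaded pipeline, the result of `mix_examples(TRAIN_DATA, make_revision_examples(nlp, texts))` has the same `(text, [(start, end, label)])` shape as `TRAIN_DATA` in the example script, so it could be passed to the existing training loop in its place. How many revision examples to mix in per new example is a tuning choice that this sketch does not address.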