spaCy/spacy/tests/pipeline
Daniël de Kok eec5ccd72f
Language.update: ensure that tok2vec gets updated (#12136)
* `Language.update`: ensure that tok2vec gets updated

The components in a pipeline can be updated independently. However,
tok2vec implementations are an exception to this, since they depend on
listeners for their gradients. The update method of a tok2vec
implementation computes the tok2vec forward and passes this along with a
backprop function to the listeners. This backprop function accumulates
gradients for all the listeners. There are two ways in which the
accumulated gradients can be used to update the tok2vec weights:

1. Call the `finish_update` method of tok2vec *after* the `update`
   method is called on all of the pipes that use a tok2vec listener.
2. Pass an optimizer to the `update` method of tok2vec. In this
   case, tok2vec will give the last listener a special backprop
   function that calls `finish_update` on the tok2vec.

Unfortunately, `Language.update` did neither of these. Instead, it
immediately called `finish_update` on every pipe after `update`. As a
result, the tok2vec weights were updated before any gradients had been
accumulated from the listeners, and the listeners' gradients were only
applied in the next call to `Language.update` (when `finish_update` was
called on tok2vec again).

This change fixes this issue by passing the optimizer to the `update`
method of trainable pipes, leading to use of the second strategy
outlined above.
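
The ordering problem and the two strategies can be illustrated with a toy
simulation. This is only a sketch, not spaCy's implementation: `ToyTok2Vec`,
`plain_sgd` and `run` are hypothetical names, the "gradients" are plain
floats, and the listener bookkeeping is simplified to a counter.

```python
class ToyTok2Vec:
    """Toy stand-in for a tok2vec component (hypothetical, not spaCy code)."""

    def __init__(self):
        self.weight = 0.0
        self.pending = 0.0        # gradient accumulated from listeners
        self.listeners_left = 0
        self.sgd = None
        self.applied = []         # gradient used by each finish_update call

    def update(self, num_listeners, sgd=None):
        # A real tok2vec would run its forward pass here and hand the
        # output plus a backprop callback to its listeners.
        self.listeners_left = num_listeners
        self.sgd = sgd

    def backprop(self, grad):
        # Called by each listener; accumulates gradients.
        self.pending += grad
        self.listeners_left -= 1
        # Strategy 2: with an optimizer, the last listener triggers the update.
        if self.sgd is not None and self.listeners_left == 0:
            self.finish_update(self.sgd)

    def finish_update(self, sgd):
        self.applied.append(self.pending)
        self.weight = sgd(self.weight, self.pending)
        self.pending = 0.0


def plain_sgd(weight, grad, lr=0.1):
    """Hypothetical optimizer: a single plain SGD step."""
    return weight - lr * grad


def run(strategy, listener_grads, steps=2):
    """Return the gradient that each finish_update call actually applied."""
    tok2vec = ToyTok2Vec()
    for _ in range(steps):
        if strategy == "buggy":
            # Old behaviour: finish_update right after tok2vec.update,
            # before any listener has backpropagated.
            tok2vec.update(len(listener_grads))
            tok2vec.finish_update(plain_sgd)
            for grad in listener_grads:
                tok2vec.backprop(grad)
        elif strategy == "deferred":
            # Strategy 1: finish_update after all listening pipes updated.
            tok2vec.update(len(listener_grads))
            for grad in listener_grads:
                tok2vec.backprop(grad)
            tok2vec.finish_update(plain_sgd)
        elif strategy == "optimizer":
            # Strategy 2: pass the optimizer to tok2vec.update.
            tok2vec.update(len(listener_grads), sgd=plain_sgd)
            for grad in listener_grads:
                tok2vec.backprop(grad)
        else:
            raise ValueError(f"unknown strategy: {strategy}")
    return tok2vec.applied


print(run("buggy", [1.0, 2.0]))      # first step applies an empty gradient
print(run("deferred", [1.0, 2.0]))
print(run("optimizer", [1.0, 2.0]))
```

In the toy "buggy" ordering the first step applies a zero gradient and the
listeners' gradients only arrive one step late, which mirrors the behaviour
described above. Both other orderings apply the full accumulated gradient
within the same step.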

The main updating loop in `Language.update` is also simplified by using
the `TrainableComponent` protocol consistently.

* Train loop: `sgd` is `Optional[Optimizer]`, do not pass `False`

* Language.update: call pipe finish_update after all pipe updates

This keeps updates correct and fast when multiple components update the
same parameters.

* Add comment why we moved `finish_update` to a separate loop
2023-02-03 15:22:25 +01:00
__init__.py Revert #4334 2019-09-29 17:32:12 +02:00
test_analysis.py Simplify pipe analysis 2020-08-01 13:40:06 +02:00
test_annotates_on_update.py Language.update: ensure that tok2vec gets updated (#12136) 2023-02-03 15:22:25 +01:00
test_attributeruler.py Refactor scoring methods to use registered functions (#8766) 2021-08-10 15:13:39 +02:00
test_edit_tree_lemmatizer.py Merge remote-tracking branch 'upstream/master' into update-v4-from-master-1 2023-01-27 08:29:09 +01:00
test_entity_linker.py Move Entity Linker v1 to spacy-legacy (#12006) 2023-02-01 09:47:56 +01:00
test_entity_ruler.py update tests from master to follow v4 principles (2) 2023-01-11 19:04:06 +01:00
test_functions.py Add doc_cleaner component (#9659) 2021-11-23 15:33:33 +01:00
test_initialize.py Test with default value 2020-09-29 17:00:40 +02:00
test_lemmatizer.py Tidy up and auto-format 2021-07-18 15:44:56 +10:00
test_models.py Make stable private modules public and adjust names (#11353) 2022-08-30 13:56:35 +02:00
test_morphologizer.py Add TrainablePipe.{distill,get_teacher_student_loss} (#12016) 2023-01-16 10:25:53 +01:00
test_pipe_factories.py Auto-format code with black (#10795) 2022-05-13 19:02:08 +02:00
test_pipe_methods.py Remove all references to "begin_training" (#11943) 2022-12-08 11:43:52 +01:00
test_sentencizer.py Refactor Docs.is_ flags (#6044) 2020-09-17 00:14:01 +02:00
test_senter.py Add TrainablePipe.{distill,get_teacher_student_loss} (#12016) 2023-01-16 10:25:53 +01:00
test_span_ruler.py Rename language codes (Icelandic, multi-language) (#12149) 2023-01-31 17:30:43 +01:00
test_spancat.py Merge branch 'copy_master' into copy_v4 2022-12-05 08:56:15 +01:00
test_tagger.py Fix batching regression (#12094) 2023-01-18 18:28:30 +01:00
test_textcat.py Fix batching regression (#12094) 2023-01-18 18:28:30 +01:00
test_tok2vec.py Merge the parser refactor into v4 (#10940) 2023-01-18 11:27:45 +01:00