spaCy/spacy/tests/pipeline
Daniël de Kok 5e297aa20e
Add TrainablePipe.{distill,get_teacher_student_loss} (#12016)
* Add `TrainablePipe.{distill,get_teacher_student_loss}`

This change adds two methods:

- `TrainablePipe::distill` which performs a training step of a
   student pipe on a teacher pipe, giving a batch of `Doc`s.
- `TrainablePipe::get_teacher_student_loss` computes the loss
  of a student relative to the teacher.

The `distill` or `get_teacher_student_loss` methods are also implemented
in the tagger, edit tree lemmatizer, and parser pipes, to enable
distillation in those pipes and as an example for other pipes.

* Fix stray `Beam` import

* Fix incorrect import

* Apply suggestions from code review

Co-authored-by: Sofie Van Landeghem <svlandeg@users.noreply.github.com>

* Apply suggestions from code review

Co-authored-by: Sofie Van Landeghem <svlandeg@users.noreply.github.com>

* TrainablePipe.distill: use `Iterable[Example]`

* Add Pipe.is_distillable method

* Add `validate_distillation_examples`

This first calls `validate_examples` and then checks that the
student/teacher tokens are the same.

* Update distill documentation

* Add distill documentation for all pipes that support distillation

* Fix incorrect identifier

* Apply suggestions from code review

Co-authored-by: Sofie Van Landeghem <svlandeg@users.noreply.github.com>

* Add comment to explain `is_distillable`

Co-authored-by: Sofie Van Landeghem <svlandeg@users.noreply.github.com>
2023-01-16 10:25:53 +01:00
..
__init__.py Revert #4334 2019-09-29 17:32:12 +02:00
test_analysis.py Simplify pipe analysis 2020-08-01 13:40:06 +02:00
test_annotates_on_update.py Tidy up and auto-format 2021-07-18 15:44:56 +10:00
test_attributeruler.py Refactor scoring methods to use registered functions (#8766) 2021-08-10 15:13:39 +02:00
test_edit_tree_lemmatizer.py Add TrainablePipe.{distill,get_teacher_student_loss} (#12016) 2023-01-16 10:25:53 +01:00
test_entity_linker.py Merge remote-tracking branch 'upstream/master' into chore/update-v4-from-master-4 2022-11-03 09:42:36 +01:00
test_entity_ruler.py update tests from master to follow v4 principles (2) 2023-01-11 19:04:06 +01:00
test_functions.py Add doc_cleaner component (#9659) 2021-11-23 15:33:33 +01:00
test_initialize.py Test with default value 2020-09-29 17:00:40 +02:00
test_lemmatizer.py Tidy up and auto-format 2021-07-18 15:44:56 +10:00
test_models.py Make stable private modules public and adjust names (#11353) 2022-08-30 13:56:35 +02:00
test_morphologizer.py Add TrainablePipe.{distill,get_teacher_student_loss} (#12016) 2023-01-16 10:25:53 +01:00
test_pipe_factories.py Auto-format code with black (#10795) 2022-05-13 19:02:08 +02:00
test_pipe_methods.py Remove all references to "begin_training" (#11943) 2022-12-08 11:43:52 +01:00
test_sentencizer.py Refactor Docs.is_ flags (#6044) 2020-09-17 00:14:01 +02:00
test_senter.py Add TrainablePipe.{distill,get_teacher_student_loss} (#12016) 2023-01-16 10:25:53 +01:00
test_span_ruler.py Add SpanRuler component (#9880) 2022-06-02 13:12:53 +02:00
test_spancat.py Merge branch 'copy_master' into copy_v4 2022-12-05 08:56:15 +01:00
test_tagger.py Add TrainablePipe.{distill,get_teacher_student_loss} (#12016) 2023-01-16 10:25:53 +01:00
test_textcat.py Add TrainablePipe.{distill,get_teacher_student_loss} (#12016) 2023-01-16 10:25:53 +01:00
test_tok2vec.py Auto-format code with black (#11687) 2022-10-21 11:54:17 +02:00