spaCy/spacy
Daniël de Kok 5e297aa20e
Add TrainablePipe.{distill,get_teacher_student_loss} (#12016)
* Add `TrainablePipe.{distill,get_teacher_student_loss}`

This change adds two methods:

- `TrainablePipe::distill`, which performs a training step of a
  student pipe on a teacher pipe, given a batch of `Doc`s.
- `TrainablePipe::get_teacher_student_loss`, which computes the loss
  of a student relative to the teacher.

The `distill` and/or `get_teacher_student_loss` methods are also
implemented in the tagger, edit tree lemmatizer, and parser pipes, both
to enable distillation in those pipes and to serve as an example for
other pipes.
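
As a rough illustration of what a loss "of a student relative to the
teacher" can look like, here is a minimal standalone sketch. It does not
use spaCy and is not the code from this change: the function names and
the choice of cross-entropy over softmax-normalized scores are
assumptions made for the example only.

```python
import math
from typing import List, Sequence


def softmax(scores: Sequence[float]) -> List[float]:
    """Turn raw scores into a probability distribution."""
    highest = max(scores)  # subtract the max for numerical stability
    exps = [math.exp(s - highest) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def teacher_student_loss(
    teacher_scores: Sequence[float], student_scores: Sequence[float]
) -> float:
    """Hypothetical helper: cross-entropy of the student's distribution
    measured against the teacher's distribution for a single token."""
    if len(teacher_scores) != len(student_scores):
        raise ValueError("teacher and student must score the same classes")
    teacher_probs = softmax(teacher_scores)
    student_probs = softmax(student_scores)
    return -sum(t * math.log(s) for t, s in zip(teacher_probs, student_probs))


# A student that agrees with the teacher gets a lower loss than one
# that prefers a different class.
agree = teacher_student_loss([2.0, 0.0], [2.0, 0.0])
disagree = teacher_student_loss([2.0, 0.0], [0.0, 2.0])
```

In this sketch the teacher's output plays the role that gold labels play
in ordinary supervised training, which is why no annotated data appears.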

* Fix stray `Beam` import

* Fix incorrect import

* Apply suggestions from code review

Co-authored-by: Sofie Van Landeghem <svlandeg@users.noreply.github.com>

* Apply suggestions from code review

Co-authored-by: Sofie Van Landeghem <svlandeg@users.noreply.github.com>

* TrainablePipe.distill: use `Iterable[Example]`

* Add Pipe.is_distillable method

* Add `validate_distillation_examples`

This first calls `validate_examples` and then checks that the
student/teacher tokens are the same.
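
The token check described above can be sketched as follows. This is a
standalone illustration, not the code from this change: the helper name
`check_same_tokens` is hypothetical, and each example is represented as
a plain pair of token lists rather than a spaCy `Example`.

```python
from typing import Iterable, List, Tuple


def check_same_tokens(pairs: Iterable[Tuple[List[str], List[str]]]) -> None:
    """Hypothetical helper: raise if any (teacher, student) pair of
    token lists differs, since the two outputs could not be aligned."""
    for index, (teacher_tokens, student_tokens) in enumerate(pairs):
        if teacher_tokens != student_tokens:
            raise ValueError(
                f"Example {index}: teacher and student tokens differ"
            )


# Identical tokenization passes silently.
check_same_tokens([(["I", "like", "cats"], ["I", "like", "cats"])])
```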

* Update distill documentation

* Add distill documentation for all pipes that support distillation

* Fix incorrect identifier

* Apply suggestions from code review

Co-authored-by: Sofie Van Landeghem <svlandeg@users.noreply.github.com>

* Add comment to explain `is_distillable`

Co-authored-by: Sofie Van Landeghem <svlandeg@users.noreply.github.com>
2023-01-16 10:25:53 +01:00
cli Merge branch 'copy_master' into copy_v4 2023-01-11 18:40:55 +01:00
displacy improve ux for displacy when the serve port is in use (#11948) 2023-01-10 15:52:57 +09:00
kb Refactor KB for easier customization (#11268) 2022-09-08 10:38:07 +02:00
lang Merge remote-tracking branch 'upstream/master' into chore/v4-merge-master-20221222 2022-12-22 10:08:54 +01:00
matcher Merge branch 'copy_master' into copy_v4 2023-01-11 18:40:55 +01:00
ml Add TrainablePipe.{distill,get_teacher_student_loss} (#12016) 2023-01-16 10:25:53 +01:00
pipeline Add TrainablePipe.{distill,get_teacher_student_loss} (#12016) 2023-01-16 10:25:53 +01:00
tests Add TrainablePipe.{distill,get_teacher_student_loss} (#12016) 2023-01-16 10:25:53 +01:00
tokens Merge remote-tracking branch 'upstream/master' into chore/v4-merge-master-20221222 2022-12-22 10:08:54 +01:00
training Add TrainablePipe.{distill,get_teacher_student_loss} (#12016) 2023-01-16 10:25:53 +01:00
__init__.pxd * Seems to be working after refactor. Need to wire up more POS tag features, and wire up save/load of POS tags. 2014-10-24 02:23:42 +11:00
__init__.py Simplify and clarify enable/disable behavior of spacy.load() (#11459) 2022-09-27 14:22:36 +02:00
__main__.py Tidy up 2020-06-22 00:45:40 +02:00
about.py Set version to v3.5.0 2022-11-25 12:05:25 +01:00
attrs.pxd Consolidate and freeze symbols (#11352) 2022-09-02 09:08:40 +02:00
attrs.pyx Consolidate and freeze symbols (#11352) 2022-09-02 09:08:40 +02:00
compat.py Custom component types in spacy.ty (#9469) 2021-10-21 15:31:06 +02:00
default_config_pretraining.cfg Add new parameter for saving every n epoch in pretraining (#8912) 2021-08-12 11:14:48 +02:00
default_config.cfg Add training.before_update callback (#11739) 2022-11-23 17:54:58 +01:00
errors.py Add TrainablePipe.{distill,get_teacher_student_loss} (#12016) 2023-01-16 10:25:53 +01:00
glossary.py Add glossary entry for root (#10821) 2022-05-20 09:56:32 +02:00
language.py Remove all references to "begin_training" (#11943) 2022-12-08 11:43:52 +01:00
lexeme.pxd Delete unused imports for StringStore (#12040) 2023-01-03 17:43:09 +01:00
lexeme.pyi Remove sentiment extension (#11722) 2022-11-23 13:09:32 +01:00
lexeme.pyx Remove sentiment extension (#11722) 2022-11-23 13:09:32 +01:00
lookups.py Fix issues for Mypy 0.950 and Pydantic 1.9.0 (#10786) 2022-05-25 09:33:54 +02:00
morphology.pxd Morphology/Morphologizer optimizations and refactoring (#11024) 2022-07-15 11:14:08 +02:00
morphology.pyx Morphology/Morphologizer optimizations and refactoring (#11024) 2022-07-15 11:14:08 +02:00
parts_of_speech.pxd Consolidate and freeze symbols (#11352) 2022-09-02 09:08:40 +02:00
parts_of_speech.pyx Drop Python 2.7 and 3.5 (#4828) 2019-12-22 01:53:56 +01:00
pipe_analysis.py 🏷 Add Mypy check to CI and ignore all existing Mypy errors (#9167) 2021-10-14 15:21:40 +02:00
py.typed Add py.typed 2021-03-16 09:48:31 +01:00
schemas.py Merge branch 'copy_master' into copy_v4 2023-01-11 18:40:55 +01:00
scorer.py Restore v2 token_acc score implementation (#12073) 2023-01-11 08:01:47 +01:00
strings.pxd StringStore refactoring (#11344) 2022-10-06 10:51:06 +02:00
strings.pyi StringStore refactoring (#11344) 2022-10-06 10:51:06 +02:00
strings.pyx StringStore refactoring (#11344) 2022-10-06 10:51:06 +02:00
structs.pxd Morphology/Morphologizer optimizations and refactoring (#11024) 2022-07-15 11:14:08 +02:00
symbols.pxd Consolidate and freeze symbols (#11352) 2022-09-02 09:08:40 +02:00
symbols.pyx Consolidate and freeze symbols (#11352) 2022-09-02 09:08:40 +02:00
tokenizer.pxd Delete unused imports for StringStore (#12040) 2023-01-03 17:43:09 +01:00
tokenizer.pyx Update/remove old Matcher syntax (#11370) 2022-08-30 15:40:31 +02:00
ty.py Custom component types in spacy.ty (#9469) 2021-10-21 15:31:06 +02:00
typedefs.pxd Merge remote-tracking branch 'upstream/master' into chore/update-develop-from-master 2020-11-25 11:49:34 +01:00
typedefs.pyx Tidy up rest 2017-10-27 21:07:59 +02:00
util.py Merge branch 'copy_master' into copy_v4 2023-01-11 18:40:55 +01:00
vectors.pyx Add equality definition for vectors (#11806) 2022-11-16 09:44:42 +01:00
vocab.pxd Cleanup Cython structs (#11337) 2022-08-22 15:52:24 +02:00
vocab.pyi Cleanup Cython structs (#11337) 2022-08-22 15:52:24 +02:00
vocab.pyx Merge branch 'copy_master' into copy_v4 2022-12-05 08:56:15 +01:00