spaCy/spacy/training
Daniël de Kok 8a5814bf2c
Add distillation loop (#12542)
* Add distillation initialization and loop

* Fix up configuration keys

* Add docstring

* Type annotations

* init_nlp_distill -> init_nlp_student

* Do not resolve dot name distill corpus in initialization

(Since we don't use it.)

* student: do not request use of optimizer in student pipe

We finish up the updates once in the training loop instead.

Also add the necessary logic to `Language.distill` to mirror
`Language.update`.

* Correctly determine sort key in subdivide_batch

* Fix _distill_loop docstring wrt. stopping condition

* _distill_loop: fix distill_data docstring

Make similar changes in train_while_improving, since it also had
incorrect types and missing type annotations.

* Move `set_{gpu_allocator,seed}_from_config` to spacy.util

* Update Language.update docs for the sgd argument

* Type annotation

Co-authored-by: Madeesh Kannan <shadeMe@users.noreply.github.com>

2023-04-21 13:49:40 +02:00
converters Rename language codes (Icelandic, multi-language) (#12149) 2023-01-31 17:30:43 +01:00
__init__.pxd Renaming gold & annotation_setter (#6042) 2020-09-09 10:31:03 +02:00
__init__.py Merge remote-tracking branch 'upstream/master' into update-v4-from-master-1 2023-01-27 08:29:09 +01:00
align.pyx Fix alignment for 1-to-1 tokens and lowercasing (#6476) 2020-12-08 14:25:16 +08:00
alignment_array.pxd Alignment: use a simplified ragged type for performance (#10319) 2022-04-01 09:02:06 +02:00
alignment_array.pyx Backport parser/alignment optimizations from feature/refactor-parser (#10952) 2022-06-24 13:39:52 +02:00
alignment.py Alignment: use a simplified ragged type for performance (#10319) 2022-04-01 09:02:06 +02:00
augment.py Preserve missing entity annotation in augmenters (#11540) 2022-09-27 10:16:51 +02:00
batchers.py Fix batching regression (#12094) 2023-01-18 18:28:30 +01:00
callbacks.py Have logging calls use string formatting types (#12215) 2023-02-02 11:15:22 +01:00
corpus.py Have logging calls use string formatting types (#12215) 2023-02-02 11:15:22 +01:00
example.pxd Make a pre-check to speed up alignment cache (#6139) 2020-09-24 18:13:39 +02:00
example.pyx Merge the parser refactor into v4 (#10940) 2023-01-18 11:27:45 +01:00
gold_io.pyx Fix is_sent_start when converting from JSON (fix #7635) (#7655) 2021-04-08 18:24:52 +10:00
initialize.py Add distillation loop (#12542) 2023-04-21 13:49:40 +02:00
iob_utils.py Preserve missing entity annotation in augmenters (#11540) 2022-09-27 10:16:51 +02:00
loggers.py New console logger with expanded progress tracking (#11972) 2022-12-23 15:21:44 +01:00
loop.py Add distillation loop (#12542) 2023-04-21 13:49:40 +02:00
pretrain.py Clarify how to fill in init_tok2vec after pretraining (#9639) 2021-11-18 15:38:30 +01:00