* Add distillation initialization and loop
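For orientation, here is a minimal sketch of the kind of loop this adds, assuming the `Language.distill` entry point introduced in this PR (`student.distill(teacher, examples, sgd=..., losses=...)`); the teacher/student setup and the tiny in-memory corpus are placeholders, not the actual `_distill_loop` internals:

```python
# Minimal distillation loop sketch; the models and toy corpus are
# illustrative stand-ins for the real initialization and loop code.
import spacy
from spacy.training import Example

teacher = spacy.load("en_core_web_sm")  # frozen pipeline providing targets
student = spacy.blank("en")
student.add_pipe("tagger")              # student pipe to distill into

texts = ["A short example.", "Another short example."]
# Distillation learns from the teacher's predictions, so raw text is
# enough: the teacher-annotated doc serves as the reference.
examples = [Example(student.make_doc(t), teacher(t)) for t in texts]

# Initialize the student from the examples so it picks up the teacher's
# tag labels.
optimizer = student.initialize(lambda: examples)

losses = {}
for step in range(10):
    student.distill(teacher, examples, sgd=optimizer, losses=losses)
print(losses)
```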
* Fix up configuration keys
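The keys in question live in the training config; the section sketched below is a hypothetical example (names like `[distillation]`, `corpus`, and `max_steps` are assumptions modeled on the existing `[training]` block, not a verbatim copy), parsed with Thinc's real `Config` class:

```python
# Hypothetical shape of a distillation config section; key names are
# assumptions for illustration only.
from thinc.api import Config

cfg = Config().from_str("""
[distillation]
corpus = "corpora.distillation"
dropout = 0.1
max_steps = 10000

[distillation.optimizer]
@optimizers = "Adam.v1"
""")
print(cfg["distillation"]["corpus"])
```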
* Add docstring
* Type annotations
* `init_nlp_distill` -> `init_nlp_student`
* Do not resolve the distill corpus dot name during initialization
(since initialization does not use it).
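For reference, the resolution step that initialization now skips looks roughly like this; `spacy.util.resolve_dot_names` is the real helper, while the registered reader and the `corpora.distillation` key are stand-ins:

```python
# Illustrative: how a corpus dot name resolves to a registered reader.
# After this change the student's init path skips this step and leaves
# resolution to the training loop, which actually consumes the corpus.
from thinc.api import Config
from spacy.util import registry, resolve_dot_names

@registry.readers("demo_distill_corpus.v1")
def demo_corpus():
    def read(nlp):
        return []  # a real corpus would yield Example objects
    return read

config = Config().from_str("""
[corpora]

[corpora.distillation]
@readers = "demo_distill_corpus.v1"
""")

(distill_corpus,) = resolve_dot_names(config, ["corpora.distillation"])
```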
* student: do not request use of the optimizer in the student pipe
We finish up the updates once in the training loop instead.
Also add the necessary logic to `Language.distill` to mirror
`Language.update`.
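In outline, the mirrored logic looks something like the sketch below; this is a rough approximation rather than the actual `Language.distill` body, though `TrainablePipe.distill` and `finish_update` are the real hooks:

```python
# Rough sketch of the division of labor described above: student pipes
# accumulate gradients with sgd=None, and the optimizer is applied once
# per trainable pipe afterwards, mirroring Language.update.
from spacy.pipeline import TrainablePipe

def distill_mirroring_update(student, teacher, examples, sgd, losses):
    for name, proc in student.pipeline:
        if hasattr(proc, "distill"):
            # No optimizer here: gradients only accumulate.
            proc.distill(teacher.get_pipe(name), examples, sgd=None, losses=losses)
    for name, proc in student.pipeline:
        if isinstance(proc, TrainablePipe):
            proc.finish_update(sgd)  # finish up the updates once, here
    return losses
```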
* Correctly determine the sort key in `subdivide_batch`
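For context, a `subdivide_batch`-style helper generally sorts examples by length so each sub-batch holds similarly sized documents; which doc supplies the length is exactly the kind of detail this fix corrects, and the choice below (the student's `predicted` doc) is an assumption for demonstration, not the confirmed fix:

```python
# Illustrative subdivide_batch: sort by document length, then split the
# batch into roughly equal sub-batches for gradient accumulation.
from typing import Iterable, Iterator, List
from spacy.training import Example

def subdivide_batch(batch: Iterable[Example], accumulate_gradient: int) -> Iterator[List[Example]]:
    examples = sorted(batch, key=lambda eg: len(eg.predicted))  # assumed sort key
    sub_len = max(1, len(examples) // accumulate_gradient)
    for i in range(0, len(examples), sub_len):
        yield examples[i : i + sub_len]
```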
* Fix `_distill_loop` docstring w.r.t. the stopping condition
* `_distill_loop`: fix `distill_data` docstring
Make similar changes in `train_while_improving`, since it also had
incorrect types and missing type annotations.
* Move `set_{gpu_allocator,seed}_from_config` to `spacy.util`
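The approximate shape of the two helpers after the move is sketched below; the bodies are an approximation, but `set_gpu_allocator` and `fix_random_seed` are real Thinc functions:

```python
# Approximate shape of the moved helpers; details of error handling in
# the real implementations may differ.
from thinc.api import Config, fix_random_seed, set_gpu_allocator

def set_gpu_allocator_from_config(config: Config, use_gpu: int) -> None:
    # Route GPU memory through the configured allocator (e.g. "pytorch").
    allocator = config["training"].get("gpu_allocator")
    if use_gpu >= 0 and allocator:
        set_gpu_allocator(allocator)

def set_seed_from_config(config: Config) -> None:
    # Seed all random number generators from [training] seed.
    seed = config["training"].get("seed")
    if seed is not None:
        fix_random_seed(seed)
```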
* Update `Language.update` docs for the `sgd` argument
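A usage sketch of the documented behavior: pass an `Optimizer` explicitly, or omit `sgd` and the pipeline falls back to its internally created optimizer (the tiny textcat example is purely illustrative):

```python
# sgd argument usage: explicit optimizer vs. the internal default.
import spacy
from spacy.training import Example

nlp = spacy.blank("en")
nlp.add_pipe("textcat")

doc = nlp.make_doc("very nice")
example = Example.from_dict(doc, {"cats": {"POSITIVE": 1.0, "NEGATIVE": 0.0}})

optimizer = nlp.initialize(lambda: [example])

nlp.update([example], sgd=optimizer)  # explicit optimizer
nlp.update([example])                 # sgd not set: internal optimizer is used
```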
* Type annotation
---------
Co-authored-by: Madeesh Kannan <shadeMe@users.noreply.github.com>