spaCy/spacy
Daniël de Kok 8a5814bf2c
Add distillation loop (#12542)
* Add distillation initialization and loop

* Fix up configuration keys

* Add docstring

* Type annotations

* init_nlp_distill -> init_nlp_student

* Do not resolve dot name distill corpus in initialization

(Since we don't use it.)

* student: do not request use of optimizer in student pipe

We finish up the updates once in the training loop instead.

Also add the necessary logic to `Language.distill` to mirror
`Language.update`.

* Correctly determine sort key in subdivide_batch

* Fix _distill_loop docstring wrt. stopping condition

* _distill_loop: fix distill_data docstring

Make similar changes in train_while_improving, since it also had
incorrect types and missing type annotations.

* Move `set_{gpu_allocator,seed}_from_config` to spacy.util

* Update Language.update docs for the sgd argument

* Type annotation

Co-authored-by: Madeesh Kannan <shadeMe@users.noreply.github.com>

2023-04-21 13:49:40 +02:00
cli Merge branch 'master' into sync/master-into-v4 2023-03-06 16:27:56 +01:00
displacy Auto-format code with black (#12100) 2023-01-13 10:12:10 +01:00
kb Entity linking: use SpanGroup instead of Iterable[Span] for mentions (#12344) 2023-03-20 12:25:18 +01:00
lang Merge branch 'master' into sync/master-into-v4 2023-03-02 16:24:15 +01:00
matcher Merge branch 'master' into sync/master-into-v4 2023-03-02 16:24:15 +01:00
ml Entity linking: use SpanGroup instead of Iterable[Span] for mentions (#12344) 2023-03-20 12:25:18 +01:00
pipeline Entity linking: use SpanGroup instead of Iterable[Span] for mentions (#12344) 2023-03-20 12:25:18 +01:00
tests Add distillation loop (#12542) 2023-04-21 13:49:40 +02:00
tokens Enforce that Span.start/end(_char) remain valid and in sync (#12268) 2023-04-06 16:01:59 +02:00
training Add distillation loop (#12542) 2023-04-21 13:49:40 +02:00
__init__.pxd * Seems to be working after refactor. Need to wire up more POS tag features, and wire up save/load of POS tags. 2014-10-24 02:23:42 +11:00
__init__.py Simplify and clarify enable/disable behavior of spacy.load() (#11459) 2022-09-27 14:22:36 +02:00
__main__.py Tidy up 2020-06-22 00:45:40 +02:00
about.py Set version to v4.0.0.dev0 (#12126) 2023-01-19 09:25:34 +01:00
attrs.pxd Consolidate and freeze symbols (#11352) 2022-09-02 09:08:40 +02:00
attrs.pyx Consolidate and freeze symbols (#11352) 2022-09-02 09:08:40 +02:00
compat.py Drop python 3.6/3.7, remove unneeded compat (#12187) 2023-01-27 15:48:20 +01:00
default_config_distillation.cfg Add the configuration schema for distillation (#12201) 2023-01-31 13:06:02 +01:00
default_config_pretraining.cfg Add new parameter for saving every n epoch in pretraining (#8912) 2021-08-12 11:14:48 +02:00
default_config.cfg Add training.before_update callback (#11739) 2022-11-23 17:54:58 +01:00
errors.py Enforce that Span.start/end(_char) remain valid and in sync (#12268) 2023-04-06 16:01:59 +02:00
glossary.py Add glossary entry for root (#10821) 2022-05-20 09:56:32 +02:00
language.py Add distillation loop (#12542) 2023-04-21 13:49:40 +02:00
lexeme.pxd Delete unused imports for StringStore (#12040) 2023-01-03 17:43:09 +01:00
lexeme.pyi Remove sentiment extension (#11722) 2022-11-23 13:09:32 +01:00
lexeme.pyx Refactor lexeme mem passing (#12125) 2023-01-25 12:50:21 +09:00
lookups.py Fix issues for Mypy 0.950 and Pydantic 1.9.0 (#10786) 2022-05-25 09:33:54 +02:00
morphology.pxd Morphology/Morphologizer optimizations and refactoring (#11024) 2022-07-15 11:14:08 +02:00
morphology.pyx Morphology/Morphologizer optimizations and refactoring (#11024) 2022-07-15 11:14:08 +02:00
parts_of_speech.pxd Consolidate and freeze symbols (#11352) 2022-09-02 09:08:40 +02:00
parts_of_speech.pyx Drop Python 2.7 and 3.5 (#4828) 2019-12-22 01:53:56 +01:00
pipe_analysis.py 🏷 Add Mypy check to CI and ignore all existing Mypy errors (#9167) 2021-10-14 15:21:40 +02:00
py.typed Add py.typed 2021-03-16 09:48:31 +01:00
schemas.py Add distillation loop (#12542) 2023-04-21 13:49:40 +02:00
scorer.py Rename language codes (Icelandic, multi-language) (#12149) 2023-01-31 17:30:43 +01:00
strings.pxd StringStore refactoring (#11344) 2022-10-06 10:51:06 +02:00
strings.pyi Clean up Vocab constructor (#12290) 2023-03-19 23:41:20 +01:00
strings.pyx StringStore refactoring (#11344) 2022-10-06 10:51:06 +02:00
structs.pxd Morphology/Morphologizer optimizations and refactoring (#11024) 2022-07-15 11:14:08 +02:00
symbols.pxd Consolidate and freeze symbols (#11352) 2022-09-02 09:08:40 +02:00
symbols.pyx Consolidate and freeze symbols (#11352) 2022-09-02 09:08:40 +02:00
tokenizer.pxd Refactor lexeme mem passing (#12125) 2023-01-25 12:50:21 +09:00
tokenizer.pyx Refactor lexeme mem passing (#12125) 2023-01-25 12:50:21 +09:00
ty.py Add Language.distill (#12116) 2023-01-30 12:44:11 +01:00
typedefs.pxd Merge remote-tracking branch 'upstream/master' into chore/update-develop-from-master 2020-11-25 11:49:34 +01:00
typedefs.pyx Tidy up rest 2017-10-27 21:07:59 +02:00
util.py Add distillation loop (#12542) 2023-04-21 13:49:40 +02:00
vectors.pyx Remove names for vectors (#12243) 2023-02-08 14:37:42 +01:00
vocab.pxd Refactor lexeme mem passing (#12125) 2023-01-25 12:50:21 +09:00
vocab.pyi Clean up Vocab constructor (#12290) 2023-03-19 23:41:20 +01:00
vocab.pyx Clean up Vocab constructor (#12290) 2023-03-19 23:41:20 +01:00