spaCy/spacy/pipeline
Daniël de Kok e27c60a702
Reimplement distillation with oracle cut size (#12214)
* Improve the correctness of _parse_patch

* If there are no more actions, do not attempt to make further
  transitions, even if not all states are final.
* Assert that the number of actions for a step is the same as
  the number of states.

* Reimplement distillation with oracle cut size

The code for distillation with an oracle cut size was not reimplemented
after the parser refactor. We did not notice because we did not have
tests for this functionality. This change brings back the functionality
and adds this to the parser tests.

* Rename states2actions to _states_to_actions for consistency

* Test distillation max cuts in NER

* Mark parser/NER tests as slow

* Typo

* Fix invariant in _states_diff_to_actions

* Rename _init_batch -> _init_batch_from_teacher

* Ninja edit the ninja edit

* Check that we raise an exception when we pass the incorrect number of actions

* Remove unnecessary get

Co-authored-by: Madeesh Kannan <shadeMe@users.noreply.github.com>

* Write out condition more explicitly

---------

Co-authored-by: Madeesh Kannan <shadeMe@users.noreply.github.com>
2023-02-21 15:47:18 +01:00
_edit_tree_internals Refactor error messages to remove hardcoded strings (#10729) 2022-05-02 13:38:46 +02:00
_parser_internals Merge the parser refactor into v4 (#10940) 2023-01-18 11:27:45 +01:00
__init__.py Replace EntityRuler with SpanRuler implementation (#11320) 2022-10-24 09:11:35 +02:00
attribute_ruler.py Make stable private modules public and adjust names (#11353) 2022-08-30 13:56:35 +02:00
dep_parser.py Merge the parser refactor into v4 (#10940) 2023-01-18 11:27:45 +01:00
edit_tree_lemmatizer.py Format 2023-01-27 08:29:46 +01:00
entity_linker.py Cleanup/remove backwards compat overwrite settings (#11888) 2023-02-02 14:13:38 +01:00
entityruler.py Merge branch 'copy_master' into copy_v4 2023-01-11 18:40:55 +01:00
functions.py Add doc_cleaner component (#9659) 2021-11-23 15:33:33 +01:00
lemmatizer.py Merge remote-tracking branch 'upstream/master' into chore/update-develop-from-master-v3.2-1 2021-10-26 11:53:50 +02:00
morphologizer.pyx Cleanup/remove backwards compat overwrite settings (#11888) 2023-02-02 14:13:38 +01:00
ner.py Format 2023-01-27 08:29:46 +01:00
pipe.pxd TrainablePipe (#6213) 2020-10-08 21:33:49 +02:00
pipe.pyi Add Pipe.hide_labels to omit labels from pipeline meta (#10175) 2022-02-05 17:59:24 +01:00
pipe.pyx Add TrainablePipe.{distill,get_teacher_student_loss} (#12016) 2023-01-16 10:25:53 +01:00
sentencizer.pyx Cleanup/remove backwards compat overwrite settings (#11888) 2023-02-02 14:13:38 +01:00
senter.pyx Cleanup/remove backwards compat overwrite settings (#11888) 2023-02-02 14:13:38 +01:00
span_ruler.py Merge branch 'copy_master' into copy_v4 2023-01-11 18:40:55 +01:00
spancat.py Drop python 3.6/3.7, remove unneeded compat (#12187) 2023-01-27 15:48:20 +01:00
tagger.pyx Cleanup/remove backwards compat overwrite settings (#11888) 2023-02-02 14:13:38 +01:00
textcat_multilabel.py Merge branch 'copy_master' into copy_v4 2023-01-11 18:40:55 +01:00
textcat.py Merge branch 'copy_master' into copy_v4 2023-01-03 13:34:05 +01:00
tok2vec.py Prevent tok2vec to broadcast to listeners when predicting (#11385) 2022-09-12 15:36:48 +02:00
trainable_pipe.pxd Store activations in Docs when save_activations is enabled (#11002) 2022-09-13 09:51:12 +02:00
trainable_pipe.pyx Add Language.distill (#12116) 2023-01-30 12:44:11 +01:00
transition_parser.pyx Reimplement distillation with oracle cut size (#12214) 2023-02-21 15:47:18 +01:00