spaCy/spacy
Daniël de Kok e27c60a702
Reimplement distillation with oracle cut size (#12214)
* Improve the correctness of _parse_patch

* If there are no more actions, do not attempt to make further
  transitions, even if not all states are final.
* Assert that the number of actions for a step is the same as
  the number of states.

* Reimplement distillation with oracle cut size

The code for distillation with an oracle cut size was not reimplemented
after the parser refactor. We did not notice, because we did not have
tests for this functionality. This change brings back the functionality
and adds this to the parser tests.

* Rename states2actions to _states_to_actions for consistency

* Test distillation max cuts in NER

* Mark parser/NER tests as slow

* Typo

* Fix invariant in _states_diff_to_actions

* Rename _init_batch -> _init_batch_from_teacher

* Ninja edit the ninja edit

* Check that we raise an exception when we pass the incorrect number or actions

* Remove unnecessary get

Co-authored-by: Madeesh Kannan <shadeMe@users.noreply.github.com>

* Write out condition more explicitly

---------

Co-authored-by: Madeesh Kannan <shadeMe@users.noreply.github.com>
2023-02-21 15:47:18 +01:00
..
cli Remove names for vectors (#12243) 2023-02-08 14:37:42 +01:00
displacy Auto-format code with black (#12100) 2023-01-13 10:12:10 +01:00
kb API docs: Rename kb_in_memory to inmemorylookupkb, add to sidebar (#12128) 2023-01-19 13:29:17 +01:00
lang Rename language codes (Icelandic, multi-language) (#12149) 2023-01-31 17:30:43 +01:00
matcher Drop python 3.6/3.7, remove unneeded compat (#12187) 2023-01-27 15:48:20 +01:00
ml Reimplement distillation with oracle cut size (#12214) 2023-02-21 15:47:18 +01:00
pipeline Reimplement distillation with oracle cut size (#12214) 2023-02-21 15:47:18 +01:00
tests Reimplement distillation with oracle cut size (#12214) 2023-02-21 15:47:18 +01:00
tokens Make Span.char_span optional args keyword-only (#12257) 2023-02-15 12:34:33 +01:00
training Remove names for vectors (#12243) 2023-02-08 14:37:42 +01:00
__init__.pxd * Seems to be working after refactor. Need to wire up more POS tag features, and wire up save/load of POS tags. 2014-10-24 02:23:42 +11:00
__init__.py Simplify and clarify enable/disable behavior of spacy.load() (#11459) 2022-09-27 14:22:36 +02:00
__main__.py Tidy up 2020-06-22 00:45:40 +02:00
about.py Set version to v4.0.0.dev0 (#12126) 2023-01-19 09:25:34 +01:00
attrs.pxd Consolidate and freeze symbols (#11352) 2022-09-02 09:08:40 +02:00
attrs.pyx Consolidate and freeze symbols (#11352) 2022-09-02 09:08:40 +02:00
compat.py Drop python 3.6/3.7, remove unneeded compat (#12187) 2023-01-27 15:48:20 +01:00
default_config_distillation.cfg Add the configuration schema for distillation (#12201) 2023-01-31 13:06:02 +01:00
default_config_pretraining.cfg Add new parameter for saving every n epoch in pretraining (#8912) 2021-08-12 11:14:48 +02:00
default_config.cfg Add training.before_update callback (#11739) 2022-11-23 17:54:58 +01:00
errors.py Rename language codes (Icelandic, multi-language) (#12149) 2023-01-31 17:30:43 +01:00
glossary.py Add glossary entry for root (#10821) 2022-05-20 09:56:32 +02:00
language.py Remove names for vectors (#12243) 2023-02-08 14:37:42 +01:00
lexeme.pxd Delete unused imports for StringStore (#12040) 2023-01-03 17:43:09 +01:00
lexeme.pyi Remove sentiment extension (#11722) 2022-11-23 13:09:32 +01:00
lexeme.pyx Refactor lexeme mem passing (#12125) 2023-01-25 12:50:21 +09:00
lookups.py Fix issues for Mypy 0.950 and Pydantic 1.9.0 (#10786) 2022-05-25 09:33:54 +02:00
morphology.pxd Morphology/Morphologizer optimizations and refactoring (#11024) 2022-07-15 11:14:08 +02:00
morphology.pyx Morphology/Morphologizer optimizations and refactoring (#11024) 2022-07-15 11:14:08 +02:00
parts_of_speech.pxd Consolidate and freeze symbols (#11352) 2022-09-02 09:08:40 +02:00
parts_of_speech.pyx Drop Python 2.7 and 3.5 (#4828) 2019-12-22 01:53:56 +01:00
pipe_analysis.py 🏷 Add Mypy check to CI and ignore all existing Mypy errors (#9167) 2021-10-14 15:21:40 +02:00
py.typed Add py.typed 2021-03-16 09:48:31 +01:00
schemas.py Add the configuration schema for distillation (#12201) 2023-01-31 13:06:02 +01:00
scorer.py Rename language codes (Icelandic, multi-language) (#12149) 2023-01-31 17:30:43 +01:00
strings.pxd StringStore refactoring (#11344) 2022-10-06 10:51:06 +02:00
strings.pyi StringStore refactoring (#11344) 2022-10-06 10:51:06 +02:00
strings.pyx StringStore refactoring (#11344) 2022-10-06 10:51:06 +02:00
structs.pxd Morphology/Morphologizer optimizations and refactoring (#11024) 2022-07-15 11:14:08 +02:00
symbols.pxd Consolidate and freeze symbols (#11352) 2022-09-02 09:08:40 +02:00
symbols.pyx Consolidate and freeze symbols (#11352) 2022-09-02 09:08:40 +02:00
tokenizer.pxd Refactor lexeme mem passing (#12125) 2023-01-25 12:50:21 +09:00
tokenizer.pyx Refactor lexeme mem passing (#12125) 2023-01-25 12:50:21 +09:00
ty.py Add Language.distill (#12116) 2023-01-30 12:44:11 +01:00
typedefs.pxd Merge remote-tracking branch 'upstream/master' into chore/update-develop-from-master 2020-11-25 11:49:34 +01:00
typedefs.pyx Tidy up rest 2017-10-27 21:07:59 +02:00
util.py Use tempfile.TemporaryDirectory (#12285) 2023-02-16 11:08:55 +01:00
vectors.pyx Remove names for vectors (#12243) 2023-02-08 14:37:42 +01:00
vocab.pxd Refactor lexeme mem passing (#12125) 2023-01-25 12:50:21 +09:00
vocab.pyi Remove names for vectors (#12243) 2023-02-08 14:37:42 +01:00
vocab.pyx Remove names for vectors (#12243) 2023-02-08 14:37:42 +01:00