spaCy/spacy
Daniël de Kok b052b1b47f
Fix batching regression (#12094)
* Fix batching regression

Some time ago, the spaCy v4 branch switched to the new Thinc v9
schedule. However, this introduced an error in how batching is handed.

In the PR, the batchers were changed to keep track of their step,
so that the step can be passed to the schedule. However, the issue
is that the training loop repeatedly calls the batching functions
(rather than using an infinite generator/iterator). So, the step and
therefore the schedule would be reset each epoch. Before the schedule
switch we didn't have this issue, because the old schedules were
stateful.

This PR fixes this issue by reverting the batching functions to use
a (stateful) generator. Their registry functions do accept a `Schedule`
and we convert `Schedule`s to generators.

* Update batcher docs

* Docstring fixes

* Make minibatch take iterables again as well

* Bump thinc requirement to 9.0.0.dev2

* Use type declaration

* Convert another comment into a proper type declaration
2023-01-18 18:28:30 +01:00
..
cli Merge the parser refactor into v4 (#10940) 2023-01-18 11:27:45 +01:00
displacy improve ux for displacy when the serve port is in use (#11948) 2023-01-10 15:52:57 +09:00
kb Refactor KB for easier customization (#11268) 2022-09-08 10:38:07 +02:00
lang Merge remote-tracking branch 'upstream/master' into chore/v4-merge-master-20221222 2022-12-22 10:08:54 +01:00
matcher Merge branch 'copy_master' into copy_v4 2023-01-11 18:40:55 +01:00
ml Merge the parser refactor into v4 (#10940) 2023-01-18 11:27:45 +01:00
pipeline Merge the parser refactor into v4 (#10940) 2023-01-18 11:27:45 +01:00
tests Fix batching regression (#12094) 2023-01-18 18:28:30 +01:00
tokens Merge remote-tracking branch 'upstream/master' into chore/v4-merge-master-20221222 2022-12-22 10:08:54 +01:00
training Fix batching regression (#12094) 2023-01-18 18:28:30 +01:00
__init__.pxd * Seems to be working after refactor. Need to wire up more POS tag features, and wire up save/load of POS tags. 2014-10-24 02:23:42 +11:00
__init__.py Simplify and clarify enable/disable behavior of spacy.load() (#11459) 2022-09-27 14:22:36 +02:00
__main__.py Tidy up 2020-06-22 00:45:40 +02:00
about.py Set version to v3.5.0 2022-11-25 12:05:25 +01:00
attrs.pxd Consolidate and freeze symbols (#11352) 2022-09-02 09:08:40 +02:00
attrs.pyx Consolidate and freeze symbols (#11352) 2022-09-02 09:08:40 +02:00
compat.py Custom component types in spacy.ty (#9469) 2021-10-21 15:31:06 +02:00
default_config_pretraining.cfg Add new parameter for saving every n epoch in pretraining (#8912) 2021-08-12 11:14:48 +02:00
default_config.cfg Add training.before_update callback (#11739) 2022-11-23 17:54:58 +01:00
errors.py Merge the parser refactor into v4 (#10940) 2023-01-18 11:27:45 +01:00
glossary.py Add glossary entry for root (#10821) 2022-05-20 09:56:32 +02:00
language.py Remove all references to "begin_training" (#11943) 2022-12-08 11:43:52 +01:00
lexeme.pxd Delete unused imports for StringStore (#12040) 2023-01-03 17:43:09 +01:00
lexeme.pyi Remove sentiment extension (#11722) 2022-11-23 13:09:32 +01:00
lexeme.pyx Remove sentiment extension (#11722) 2022-11-23 13:09:32 +01:00
lookups.py Fix issues for Mypy 0.950 and Pydantic 1.9.0 (#10786) 2022-05-25 09:33:54 +02:00
morphology.pxd Morphology/Morphologizer optimizations and refactoring (#11024) 2022-07-15 11:14:08 +02:00
morphology.pyx Morphology/Morphologizer optimizations and refactoring (#11024) 2022-07-15 11:14:08 +02:00
parts_of_speech.pxd Consolidate and freeze symbols (#11352) 2022-09-02 09:08:40 +02:00
parts_of_speech.pyx Drop Python 2.7 and 3.5 (#4828) 2019-12-22 01:53:56 +01:00
pipe_analysis.py 🏷 Add Mypy check to CI and ignore all existing Mypy errors (#9167) 2021-10-14 15:21:40 +02:00
py.typed Add py.typed 2021-03-16 09:48:31 +01:00
schemas.py Merge branch 'copy_master' into copy_v4 2023-01-11 18:40:55 +01:00
scorer.py Restore v2 token_acc score implementation (#12073) 2023-01-11 08:01:47 +01:00
strings.pxd StringStore refactoring (#11344) 2022-10-06 10:51:06 +02:00
strings.pyi StringStore refactoring (#11344) 2022-10-06 10:51:06 +02:00
strings.pyx StringStore refactoring (#11344) 2022-10-06 10:51:06 +02:00
structs.pxd Morphology/Morphologizer optimizations and refactoring (#11024) 2022-07-15 11:14:08 +02:00
symbols.pxd Consolidate and freeze symbols (#11352) 2022-09-02 09:08:40 +02:00
symbols.pyx Consolidate and freeze symbols (#11352) 2022-09-02 09:08:40 +02:00
tokenizer.pxd Delete unused imports for StringStore (#12040) 2023-01-03 17:43:09 +01:00
tokenizer.pyx Update/remove old Matcher syntax (#11370) 2022-08-30 15:40:31 +02:00
ty.py Custom component types in spacy.ty (#9469) 2021-10-21 15:31:06 +02:00
typedefs.pxd Merge remote-tracking branch 'upstream/master' into chore/update-develop-from-master 2020-11-25 11:49:34 +01:00
typedefs.pyx Tidy up rest 2017-10-27 21:07:59 +02:00
util.py Fix batching regression (#12094) 2023-01-18 18:28:30 +01:00
vectors.pyx Add equality definition for vectors (#11806) 2022-11-16 09:44:42 +01:00
vocab.pxd Cleanup Cython structs (#11337) 2022-08-22 15:52:24 +02:00
vocab.pyi Cleanup Cython structs (#11337) 2022-08-22 15:52:24 +02:00
vocab.pyx Merge branch 'copy_master' into copy_v4 2022-12-05 08:56:15 +01:00