spaCy/spacy/training
Daniël de Kok b052b1b47f
Fix batching regression (#12094)
* Fix batching regression

Some time ago, the spaCy v4 branch switched to the new Thinc v9
schedule. However, this introduced an error in how batching is handled.

In the PR, the batchers were changed to keep track of their step,
so that the step can be passed to the schedule. However, the issue
is that the training loop repeatedly calls the batching functions
(rather than using an infinite generator/iterator). So, the step and
therefore the schedule would be reset each epoch. Before the schedule
switch we didn't have this issue, because the old schedules were
stateful.
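
A minimal sketch of the reset, assuming a toy stateless schedule and a
simplified batcher. `toy_schedule` and `stepwise_batcher` are hypothetical
names for illustration, not the spaCy or Thinc code:

```python
from typing import Callable, Iterable, Iterator, List


def toy_schedule(step: int) -> int:
    # Hypothetical stateless schedule: batch size grows with the step, capped at 4.
    return min(1 + step, 4)


def stepwise_batcher(
    items: Iterable[int], size: Callable[[int], int]
) -> Iterator[List[int]]:
    # The step counter is a local variable, so it restarts at 0 on every call.
    step = 0
    batch: List[int] = []
    for item in items:
        batch.append(item)
        if len(batch) == size(step):
            yield batch
            batch = []
            step += 1
    if batch:
        yield batch


data = list(range(6))
# The training loop calls the batcher once per epoch, so each epoch starts
# from step 0 and sees the same batch sizes again.
epoch_sizes = [
    [len(b) for b in stepwise_batcher(data, toy_schedule)] for _ in range(2)
]
print(epoch_sizes)
```

Both epochs produce identical batch sizes, so the schedule never advances
past what fits in a single epoch.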

This PR fixes this issue by reverting the batching functions to use
a (stateful) generator. Their registry functions still accept a
`Schedule`, which is converted to a generator before batching.
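
A rough sketch of the generator-based approach, under the assumption that a
schedule can be treated as a function of the step. `toy_schedule`,
`schedule_to_generator` and `generator_batcher` are hypothetical
illustrations, not the actual spaCy or Thinc API:

```python
import itertools
from typing import Callable, Iterable, Iterator, List


def toy_schedule(step: int) -> int:
    # Hypothetical stateless schedule: batch size grows with the step, capped at 4.
    return min(1 + step, 4)


def schedule_to_generator(
    schedule: Callable[[int], int], start: int = 0
) -> Iterator[int]:
    # Wrap the stateless schedule in an infinite generator that owns the step.
    step = start
    while True:
        yield schedule(step)
        step += 1


def generator_batcher(
    items: Iterable[int], sizes: Iterator[int]
) -> Iterator[List[int]]:
    # The size generator is created once by the caller, so its position
    # survives repeated calls to this function.
    iterator = iter(items)
    while True:
        batch = list(itertools.islice(iterator, next(sizes)))
        if not batch:
            break
        yield batch


data = list(range(6))
sizes = schedule_to_generator(toy_schedule)  # created once, shared by all epochs
epoch_sizes = [
    [len(b) for b in generator_batcher(data, sizes)] for _ in range(2)
]
print(epoch_sizes)
```

Because the same `sizes` generator is reused, the second epoch continues
from where the first one stopped instead of starting over. Note that this
sketch also consumes one value from the generator when it detects the end
of the data.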

* Update batcher docs

* Docstring fixes

* Make minibatch take iterables again as well

* Bump thinc requirement to 9.0.0.dev2

* Use type declaration

* Convert another comment into a proper type declaration
2023-01-18 18:28:30 +01:00
converters Auto-format code with black (#10377) 2022-02-25 10:00:21 +01:00
__init__.pxd Renaming gold & annotation_setter (#6042) 2020-09-09 10:31:03 +02:00
__init__.py Add TrainablePipe.{distill,get_teacher_student_loss} (#12016) 2023-01-16 10:25:53 +01:00
align.pyx Fix alignment for 1-to-1 tokens and lowercasing (#6476) 2020-12-08 14:25:16 +08:00
alignment_array.pxd Alignment: use a simplified ragged type for performance (#10319) 2022-04-01 09:02:06 +02:00
alignment_array.pyx Backport parser/alignment optimizations from feature/refactor-parser (#10952) 2022-06-24 13:39:52 +02:00
alignment.py Alignment: use a simplified ragged type for performance (#10319) 2022-04-01 09:02:06 +02:00
augment.py Preserve missing entity annotation in augmenters (#11540) 2022-09-27 10:16:51 +02:00
batchers.py Fix batching regression (#12094) 2023-01-18 18:28:30 +01:00
callbacks.py 🏷 Add Mypy check to CI and ignore all existing Mypy errors (#9167) 2021-10-14 15:21:40 +02:00
corpus.py Auto-format code with black (#9664) 2021-11-12 10:00:03 +01:00
example.pxd Make a pre-check to speed up alignment cache (#6139) 2020-09-24 18:13:39 +02:00
example.pyx Merge the parser refactor into v4 (#10940) 2023-01-18 11:27:45 +01:00
gold_io.pyx Fix is_sent_start when converting from JSON (fix #7635) (#7655) 2021-04-08 18:24:52 +10:00
initialize.py Clean up warnings in the test suite (#11331) 2022-08-22 12:04:30 +02:00
iob_utils.py Preserve missing entity annotation in augmenters (#11540) 2022-09-27 10:16:51 +02:00
loggers.py New console logger with expanded progress tracking (#11972) 2022-12-23 15:21:44 +01:00
loop.py Pass step=0 to Schedule class to yield initial learning rate (#12078) 2023-01-09 20:15:02 +01:00
pretrain.py Clarify how to fill in init_tok2vec after pretraining (#9639) 2021-11-18 15:38:30 +01:00