spaCy/spacy
Daniël de Kok 128197a5fc
Properly clean up pipe multiprocessing workers (#13259)
Before this change, the workers of a `pipe` call with `n_process != 1`
were stopped by calling `terminate` on the processes. However,
terminating a process can leave queues, pipes, and other concurrent data
structures in an invalid state.

With this change, we stop using terminate and take the following approach
instead:

* When all documents are processed, the parent process puts a
  sentinel in the queue of each worker.
* The parent process then calls `join` on each worker process to
  let them finish up gracefully.
* Worker processes break from the queue processing loop when the
  sentinel is encountered, so that they exit.
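
The three steps above can be sketched as follows. This is a minimal
illustration, not the code from `language.py`: it uses threads and
`queue.Queue` in place of worker processes so that it runs as a single
file, and the names `WorkDone`, `worker`, and `process_all` are
hypothetical.

```python
import queue
import threading


class WorkDone:
    """Hypothetical end-of-work sentinel (name chosen for this sketch)."""


def worker(work_queue, results):
    # Queue processing loop: runs until the sentinel is encountered.
    while True:
        item = work_queue.get()
        if isinstance(item, WorkDone):
            break  # sentinel seen: leave the loop so the worker exits
        results.put(item.upper())  # stand-in for running the pipeline


def process_all(texts, n_workers=2):
    work_queues = [queue.Queue() for _ in range(n_workers)]
    results = queue.Queue()
    workers = [
        threading.Thread(target=worker, args=(q, results)) for q in work_queues
    ]
    for w in workers:
        w.start()

    # Hand out the work round-robin, one queue per worker.
    for i, text in enumerate(texts):
        work_queues[i % n_workers].put(text)

    # One sentinel per worker queue. The queue is FIFO, so each worker
    # only sees the sentinel after all of its earlier work.
    for q in work_queues:
        q.put(WorkDone())

    # join instead of terminate: each worker finishes up on its own.
    for w in workers:
        w.join()

    out = []
    while not results.empty():
        out.append(results.get())
    return sorted(out), [w.is_alive() for w in workers]
```

Calling `process_all(["a", "b", "c"])` returns the upper-cased items
together with a list showing that no worker is still alive after `join`.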

We need special handling when one of the workers encounters an error and
the error handler is set to raise an exception. In this case, we cannot
rely on the sentinel to finish all workers -- the queue is a FIFO queue
and there may be other work queued up before the sentinel. We use the
following approach to handle error scenarios:

* The parent puts the end-of-work sentinel in the queue of each worker.
* The parent closes the reading end of the channel of each worker.
* Then:
  - If the worker was waiting for work, it will encounter the sentinel
    and break from the processing loop.
  - If the worker was processing a batch, it will attempt to write
    results to the channel. This will fail because the channel was
    closed by the parent and the worker will break from the processing
    loop.
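
The two error-path cases can be sketched in the same spirit. This is a
hedged illustration under several assumptions: a thread stands in for
the worker process, `multiprocessing.Pipe(duplex=False)` stands in for
the result channel, and the names `WorkDone`, `worker`,
`shutdown_on_error`, and the two events used to make the timing
deterministic are all hypothetical.

```python
import multiprocessing as mp
import queue
import threading


class WorkDone:
    """Hypothetical end-of-work sentinel (name chosen for this sketch)."""


def worker(work_queue, sender, started, may_send, log):
    while True:
        item = work_queue.get()
        if isinstance(item, WorkDone):
            log.append("sentinel")  # worker was waiting for work
            break
        started.set()    # tell the parent that a batch is in progress
        may_send.wait()  # stand-in for a batch that takes a while
        try:
            sender.send(item.upper())
        except OSError:  # BrokenPipeError is a subclass of OSError
            log.append("broken pipe")  # parent closed the reading end
            break
    sender.close()


def shutdown_on_error(busy):
    receiver, sender = mp.Pipe(duplex=False)
    work_queue = queue.Queue()
    started, may_send = threading.Event(), threading.Event()
    log = []
    t = threading.Thread(
        target=worker, args=(work_queue, sender, started, may_send, log)
    )
    t.start()
    if busy:
        work_queue.put("batch")
        started.wait()  # make sure the worker is mid-batch

    # Error path: queue the sentinel, then close the reading end.
    work_queue.put(WorkDone())
    receiver.close()
    may_send.set()  # let a busy worker attempt its write
    t.join()
    return log
```

With `busy=False` the worker leaves the loop through the sentinel. With
`busy=True` the write fails and the worker leaves the loop without ever
reading the sentinel that is still queued behind it.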
2024-01-23 18:33:04 +01:00
cli Update TextCatBOW to use the fixed SparseLinear layer (#13149) 2023-11-29 09:11:54 +01:00
displacy Fix displacy span stacking (#13068) 2023-11-02 12:02:18 +01:00
kb Update __all__ fields (#13063) 2023-10-16 10:17:47 +02:00
lang Feature/nn and fo language extensions (#13116) 2023-11-20 07:49:59 +01:00
matcher Update __all__ fields (#13063) 2023-10-16 10:17:47 +02:00
ml Add spacy.TextCatParametricAttention.v1 (#13201) 2024-01-02 10:03:06 +01:00
pipeline Add TextCatReduce.v1 (#13181) 2023-12-21 11:00:06 +01:00
tests Add spacy.TextCatParametricAttention.v1 (#13201) 2024-01-02 10:03:06 +01:00
tokens Type documentation fixes for Doc (#13187) 2023-12-18 09:00:47 +01:00
training Update __all__ fields (#13063) 2023-10-16 10:17:47 +02:00
__init__.pxd
__init__.py Revert "Load the cli module lazily for spacy.info (#12962)" 2023-10-04 12:33:33 +02:00
__main__.py Tidy up 2020-06-22 00:45:40 +02:00
about.py Set version to v3.7.2 (#13066) 2023-10-16 15:10:55 +02:00
attrs.pxd ci: add cython linter (#12694) 2023-07-19 12:03:31 +02:00
attrs.pyx Add profile=False to currently unprofiled cython 2023-09-28 17:09:41 +02:00
compat.py Configure isort to use the Black profile, recursively isort the spacy module (#12721) 2023-06-14 17:48:41 +02:00
default_config_pretraining.cfg Add new parameter for saving every n epoch in pretraining (#8912) 2021-08-12 11:14:48 +02:00
default_config.cfg Support registered vectors (#12492) 2023-08-01 15:46:08 +02:00
errors.py Add TextCatReduce.v1 (#13181) 2023-12-21 11:00:06 +01:00
glossary.py Configure isort to use the Black profile, recursively isort the spacy module (#12721) 2023-06-14 17:48:41 +02:00
language.py Properly clean up pipe multiprocessing workers (#13259) 2024-01-23 18:33:04 +01:00
lexeme.pxd Configure isort to use the Black profile, recursively isort the spacy module (#12721) 2023-06-14 17:48:41 +02:00
lexeme.pyi Configure isort to use the Black profile, recursively isort the spacy module (#12721) 2023-06-14 17:48:41 +02:00
lexeme.pyx Add profile=False to currently unprofiled cython 2023-09-28 17:09:41 +02:00
lookups.py Configure isort to use the Black profile, recursively isort the spacy module (#12721) 2023-06-14 17:48:41 +02:00
morphology.pxd ci: add cython linter (#12694) 2023-07-19 12:03:31 +02:00
morphology.pyx Add profile=False to currently unprofiled cython 2023-09-28 17:09:41 +02:00
parts_of_speech.pxd ci: add cython linter (#12694) 2023-07-19 12:03:31 +02:00
parts_of_speech.pyx Add profile=False to currently unprofiled cython 2023-09-28 17:09:41 +02:00
pipe_analysis.py Configure isort to use the Black profile, recursively isort the spacy module (#12721) 2023-06-14 17:48:41 +02:00
py.typed Add py.typed 2021-03-16 09:48:31 +01:00
schemas.py Merge remote-tracking branch 'upstream/master' into chore/update-develop-from-master-v3.7-1 2023-09-28 15:09:06 +02:00
scorer.py Update for numpy 2.0 deprecations (#13103) 2023-11-06 08:47:53 +01:00
strings.pxd Configure isort to use the Black profile, recursively isort the spacy module (#12721) 2023-06-14 17:48:41 +02:00
strings.pyi Configure isort to use the Black profile, recursively isort the spacy module (#12721) 2023-06-14 17:48:41 +02:00
strings.pyx Add profile=False to currently unprofiled cython 2023-09-28 17:09:41 +02:00
structs.pxd ci: add cython linter (#12694) 2023-07-19 12:03:31 +02:00
symbols.pxd ci: add cython linter (#12694) 2023-07-19 12:03:31 +02:00
symbols.pyx Add profile=False to currently unprofiled cython 2023-09-28 17:09:41 +02:00
tokenizer.pxd ci: add cython linter (#12694) 2023-07-19 12:03:31 +02:00
tokenizer.pyx Update Tokenizer.explain for special cases with whitespace (#13086) 2023-11-06 17:29:59 +01:00
ty.py Configure isort to use the Black profile, recursively isort the spacy module (#12721) 2023-06-14 17:48:41 +02:00
typedefs.pxd Configure isort to use the Black profile, recursively isort the spacy module (#12721) 2023-06-14 17:48:41 +02:00
typedefs.pyx Add profile=False to currently unprofiled cython 2023-09-28 17:09:41 +02:00
util.py Warn about reloading dependencies after downloading models (#13081) 2023-11-10 08:05:07 +01:00
vectors.pyx Remove profile=True from currently profiled cython 2023-09-28 17:09:41 +02:00
vocab.pxd ci: add cython linter (#12694) 2023-07-19 12:03:31 +02:00
vocab.pyi Configure isort to use the Black profile, recursively isort the spacy module (#12721) 2023-06-14 17:48:41 +02:00
vocab.pyx Remove profile=True from currently profiled cython 2023-09-28 17:09:41 +02:00