* Add workflow files for cibuildwheel
* Add config for cibuildwheel
* Set version for experimental prerelease
* Try updating cython
* Skip 32-bit windows builds
* Revert "Try updating cython"
This reverts commit c1b794ab5c.
* Try to import cibuildwheel settings from previous setup
* fix type annotation in docs
* only restore entities after loss calculation
* restore entities of sample in initialization
* rename overfitting function
* fix EL scorer
* Relax test
* fix formatting
* Update spacy/pipeline/entity_linker.py
Co-authored-by: Raphael Mitsch <r.mitsch@outlook.com>
* rename to _ensure_ents
* further rename
* allow for scorer to be None
---------
Co-authored-by: Raphael Mitsch <r.mitsch@outlook.com>
The 'direct' option in 'spacy download' is supposed to download only from our model releases repository. However, users were able to pass in a relative path, allowing downloads from arbitrary repositories. As a result, a service that used the direct option with strings sourced from user input would let users install arbitrary packages.
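One way to guard against this kind of path escape is to resolve the user-supplied filename against the releases URL and reject anything that ends up outside it. The following is a minimal sketch under stated assumptions, not necessarily the check used in this change: `BASE_URL` and `build_download_url` are hypothetical names, and the URL is a placeholder.

```python
from urllib.parse import urljoin

# Hypothetical placeholder for the model releases URL (an assumption).
BASE_URL = "https://example.invalid/releases/download/"


def build_download_url(filename: str) -> str:
    """Join filename onto BASE_URL and refuse results outside of it."""
    # urljoin resolves "..", absolute paths and full URLs, so the check
    # below sees the location that would actually be requested.
    url = urljoin(BASE_URL, filename)
    if not url.startswith(BASE_URL):
        raise ValueError(f"Download from {filename!r} rejected.")
    return url
```

A plain package filename stays under `BASE_URL`, while `../../other/pkg.whl` or a full URL pointing at another host raises `ValueError`.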
* TextCatParametricAttention.v1: set key transform dimensions
This is necessary for tok2vec implementations that initialize
lazily (e.g. curated transformers).
* Add lazily-initialized tok2vec to simulate transformers
Add a lazily-initialized tok2vec to the tests and test the current
textcat models with it.
Fix some additional issues found using this test.
* isort
* Add `test.` prefix to `LazyInitTok2Vec.v1`
The doc/token extension serialization tests add extensions that are not
serializable with pickle. This didn't cause issues before due to the
implicit run order of tests. However, test ordering has changed with
pytest 8.0.0, leading to failed tests in test_language.
Update the fixtures in the extension serialization tests to do proper
teardown and remove the extensions.
macOS now uses port 5000 for the AirPlay receiver functionality, so this
test will always fail on a macOS desktop (unless AirPlay receiver
functionality is disabled like in CI).
Before this change, the workers of a `pipe` call with `n_process != 1` were
stopped by calling `terminate` on the processes. However, terminating a
process can leave queues, pipes, and other concurrent data structures in
an invalid state.
With this change, we stop using terminate and take the following approach
instead:
* When all documents are processed, the parent process puts a
sentinel in the queue of each worker.
* The parent process then calls `join` on each worker process to
let them finish up gracefully.
* Worker processes break from the queue processing loop when the
sentinel is encountered, so that they exit.
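The orderly shutdown described above can be sketched with the standard library as follows. This is an illustration under stated assumptions rather than the actual implementation: `_WORK_DONE`, `_worker` and `process_all` are hypothetical names, upper-casing stands in for running the pipeline, and the `fork` start method is assumed (POSIX only).

```python
import multiprocessing as mp

_WORK_DONE = None  # hypothetical sentinel meaning "no more work"


def _worker(tasks, results):
    while True:
        item = tasks.get()
        if item is _WORK_DONE:
            break  # sentinel seen: leave the loop so the process exits
        index, text = item
        results.put((index, text.upper()))  # stand-in for real processing


def process_all(texts, n_process=2):
    ctx = mp.get_context("fork")  # assumption: POSIX with fork available
    results = ctx.Queue()
    queues = [ctx.Queue() for _ in range(n_process)]
    procs = [ctx.Process(target=_worker, args=(q, results)) for q in queues]
    for proc in procs:
        proc.start()
    for index, text in enumerate(texts):
        queues[index % n_process].put((index, text))
    out = [None] * len(texts)
    for _ in texts:  # collect every result before shutting down
        index, value = results.get()
        out[index] = value
    for q in queues:  # all documents processed: one sentinel per worker
        q.put(_WORK_DONE)
    for proc in procs:  # join instead of terminate
        proc.join()
    return out, [proc.exitcode for proc in procs]
```

Because each worker leaves its loop on its own, `join` returns once the worker has exited and no queue is abandoned mid-operation.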
We need special handling when one of the workers encounters an error and
the error handler is set to raise an exception. In this case, we cannot
rely on the sentinel to finish all workers -- the queue is a FIFO queue
and there may be other work queued up before the sentinel. We use the
following approach to handle error scenarios:
* The parent puts the end-of-work sentinel in the queue of each worker.
* The parent closes the reading end of the channel of each worker.
* Then:
- If the worker was waiting for work, it will encounter the sentinel
and break from the processing loop.
- If the worker was processing a batch, it will attempt to write
results to the channel. This will fail because the channel was
closed by the parent and the worker will break from the processing
loop.
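The two worker-side outcomes can be sketched as below. For brevity the worker loop runs in the same process as the simulated parent, which is an assumption of this sketch; in practice the loop would be the target of a worker process. `WORK_DONE`, `worker_loop` and `simulate` are hypothetical names, and the sketch assumes a POSIX platform, where writing to a pipe with no reader raises `BrokenPipeError`.

```python
import queue
from multiprocessing import Pipe

WORK_DONE = None  # hypothetical end-of-work sentinel


def worker_loop(tasks, sender):
    """Run until shutdown and return the reason the loop ended."""
    while True:
        batch = tasks.get()
        if batch is WORK_DONE:
            return "sentinel"  # worker was waiting for work
        result = [text.upper() for text in batch]  # stand-in for real work
        try:
            sender.send(result)
        except BrokenPipeError:
            # The reading end is closed, so remaining work is skipped.
            return "channel closed"


def simulate(pending_batches):
    tasks = queue.Queue()
    receiver, sender = Pipe(duplex=False)
    for batch in pending_batches:
        tasks.put(batch)
    # Parent-side error shutdown: sentinel first, then close the reading end.
    tasks.put(WORK_DONE)
    receiver.close()
    try:
        return worker_loop(tasks, sender)
    finally:
        sender.close()
```

With nothing queued ahead of the sentinel, `simulate([])` ends on the sentinel; with batches still queued, the first attempt to send a result fails and the loop ends without working through the rest of the queue.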