1. Issue: Unresolved Import (`ai_insights`)
- Problem: The script imported a module named `ai_insights` that could not be resolved, producing an unresolved-import error.
- Resolution: The `ai_insights` import was removed where it was unnecessary, or its module path was corrected so the import resolves. The script was also refactored so that any AI-driven functionality previously dependent on `ai_insights` is correctly integrated or replaced with equivalent logic.
2. Issue: Cognitive Complexity Reduction
- Problem: The original script had a function that exceeded the allowed cognitive complexity limit. High cognitive complexity can make code difficult to understand and maintain.
- Resolution: The complex function was broken into smaller, more manageable sub-functions and its logic simplified where possible, reducing cognitive complexity while preserving the original behavior and making the code easier to read and maintain.
3. Issue: Validation of Component Attributes
- Problem: The script had potential issues with the validation of component attributes such as `assigns` and `requires`, which could cause errors if they were invalid or improperly formatted.
- Resolution: The validation logic was enhanced to ensure that attributes provided to components were correctly validated. This included checking for invalid attributes, ensuring proper formatting, and handling edge cases like custom extension attributes. Error messages were improved to provide clearer guidance on how to fix issues.
4. Issue: Pipeline Analysis Enhancements
- Problem: The pipeline analysis feature in the script needed enhancements to better handle the analysis and reporting of pipeline components.
- Resolution: AI-driven insights were integrated into the pipeline analysis process. This involved adding functionality to provide more detailed and accurate analysis of pipeline components, including the detection of potential issues and the generation of more informative summaries. The reporting format was also improved for better readability.
5. Issue: Improved Error Handling
- Problem: The original script had basic error handling, which might not have been sufficient to catch and address all potential issues.
- Resolution: The error handling mechanisms were upgraded to include AI-driven predictive error handling. This involved preemptive checks before executing critical parts of the code, as well as more robust exception handling to catch and manage errors more effectively. The script now includes AI-generated suggestions for resolving issues when errors are encountered.
6. Issue: Refactoring for Readability and Maintainability
- Problem: Certain parts of the script were complex and difficult to read, which could hinder future maintenance and updates.
- Resolution: The script was refactored to improve readability and maintainability. This included reorganizing code into logical sections, renaming variables and functions for clarity, and adding comments to explain key parts of the code. The overall structure was improved to make it easier for developers to understand and work with the codebase.
These changes collectively enhanced the functionality, readability, and maintainability of the script, while also integrating AI-driven features to improve performance and error handling. The result is a more robust and user-friendly codebase that aligns with modern coding standards.
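The attribute-validation idea described in item 3 could be sketched as follows. The set of known attribute names and the `ext_` prefix for custom extension attributes are illustrative assumptions, not the script's actual API:

```python
# Hedged sketch of attribute validation: reject unknown component
# attributes but allow custom extensions behind a reserved prefix.
# KNOWN_ATTRS and the "ext_" convention are assumptions for illustration.
KNOWN_ATTRS = {"assigns", "requires", "scores", "default_score_weights"}

def validate_attrs(attrs: dict) -> list[str]:
    """Return a list of human-readable problems; empty means valid."""
    problems = []
    for name in attrs:
        if name in KNOWN_ATTRS or name.startswith("ext_"):
            continue
        problems.append(
            f"invalid attribute {name!r}; expected one of "
            f"{sorted(KNOWN_ATTRS)} or an 'ext_'-prefixed extension"
        )
    return problems
```

Returning a list of messages rather than raising on the first problem lets the caller report every invalid attribute at once, which matches the goal of clearer error guidance.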
* Add workflow files for cibuildwheel
* Add config for cibuildwheel
* Set version for experimental prerelease
* Try updating cython
* Skip 32-bit windows builds
* Revert "Try updating cython"
This reverts commit c1b794ab5c.
* Try to import cibuildwheel settings from previous setup
* fix type annotation in docs
* only restore entities after loss calculation
* restore entities of sample in initialization
* rename overfitting function
* fix EL scorer
* Relax test
* fix formatting
* Update spacy/pipeline/entity_linker.py
Co-authored-by: Raphael Mitsch <r.mitsch@outlook.com>
* rename to _ensure_ents
* further rename
* allow for scorer to be None
---------
Co-authored-by: Raphael Mitsch <r.mitsch@outlook.com>
The 'direct' option in 'spacy download' is supposed to download only from our model releases repository. However, users were able to pass in a relative path, allowing downloads from arbitrary repositories. This meant that a service that sourced strings from user input and used the direct option would let users install arbitrary packages.
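The general shape of such a fix is to validate the user-supplied value as a plain release name before building a URL from it. A minimal sketch, where the pattern and helper name are assumptions and not spaCy's actual validation code:

```python
import re

# Accept only "<name>-<version>" release names, e.g. "en_core_web_sm-3.7.1".
# Anything containing path separators or ".." fails this pattern, so a
# value like "../../other-repo" cannot escape the releases repository.
NAME_RE = re.compile(r"^[a-z_]+-\d+\.\d+\.\d+$")

def validate_direct_name(name: str) -> str:
    if not NAME_RE.fullmatch(name):
        raise ValueError(f"unsafe model spec for --direct: {name!r}")
    return name
```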
* TextCatParametricAttention.v1: set key transform dimensions
This is necessary for tok2vec implementations that initialize
lazily (e.g. curated transformers).
* Add lazily-initialized tok2vec to simulate transformers
Add a lazily-initialized tok2vec to the tests and test the current
textcat models with it.
Fix some additional issues found using this test.
* isort
* Add `test.` prefix to `LazyInitTok2Vec.v1`
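For context, parametric attention pools token vectors with a learned query applied to transformed keys; the key transform's dimensions are exactly what must be set explicitly when the upstream tok2vec initializes lazily. A minimal NumPy sketch, not thinc's actual implementation:

```python
import numpy as np

def parametric_attention_pool(vectors, key_W, query):
    # vectors: (n_tokens, width) tok2vec output.
    # key_W: (width, key_width) key transform; its dimensions must be
    # known up front even when the tok2vec layer initializes lazily.
    # query: (key_width,) learned parameter vector.
    scores = vectors @ key_W @ query
    weights = np.exp(scores - scores.max())
    weights /= weights.sum()              # softmax over tokens
    return weights @ vectors              # attention-weighted pooling
```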
The doc/token extension serialization tests add extensions that are not
serializable with pickle. This didn't cause issues before due to the
implicit run order of tests. However, test ordering has changed with
pytest 8.0.0, leading to failed tests in test_language.
Update the fixtures in the extension serialization tests to do proper
teardown and remove the extensions.
macOS now uses port 5000 for the AirPlay receiver functionality, so this
test will always fail on a macOS desktop (unless AirPlay receiver
functionality is disabled like in CI).
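One common way to avoid hard-coding a port that the OS may already occupy is to ask the kernel for a free ephemeral port; a small sketch of that idea (not necessarily what this test does):

```python
import socket

def free_port() -> int:
    # Binding to port 0 lets the OS pick an unused ephemeral port,
    # sidestepping conflicts like AirPlay's use of 5000 on macOS.
    with socket.socket() as s:
        s.bind(("127.0.0.1", 0))
        return s.getsockname()[1]
```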
Before this change, the workers of a `pipe` call with n_process != 1 were
stopped by calling `terminate` on the processes. However, terminating a
process can leave queues, pipes, and other concurrent data structures in
an invalid state.
With this change, we stop using terminate and take the following approach
instead:
* When all documents are processed, the parent process puts a
sentinel in the queue of each worker.
* The parent process then calls `join` on each worker process to
let them finish up gracefully.
* Worker processes break from the queue processing loop when the
sentinel is encountered, so that they exit.
We need special handling when one of the workers encounters an error and
the error handler is set to raise an exception. In this case, we cannot
rely on the sentinel to finish all workers -- the queue is a FIFO queue
and there may be other work queued up before the sentinel. We use the
following approach to handle error scenarios:
* The parent puts the end-of-work sentinel in the queue of each worker.
* The parent closes the reading-end of the channel of each worker.
* Then:
- If the worker was waiting for work, it will encounter the sentinel
and break from the processing loop.
- If the worker was processing a batch, it will attempt to write
results to the channel. This will fail because the channel was
closed by the parent and the worker will break from the processing
loop.
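The sentinel protocol in the normal (non-error) path can be illustrated with a thread-based sketch; spaCy's implementation uses processes and channels, but the shutdown logic is the same:

```python
import queue
import threading

_SENTINEL = object()  # end-of-work marker, one per worker

def worker(inbox, outbox):
    # Break from the loop when the sentinel arrives, so the worker
    # exits on its own instead of being terminated externally.
    while True:
        item = inbox.get()
        if item is _SENTINEL:
            break
        outbox.put(item * 2)  # stand-in for processing a batch

inbox, outbox = queue.Queue(), queue.Queue()
t = threading.Thread(target=worker, args=(inbox, outbox))
t.start()
for doc in [1, 2, 3]:
    inbox.put(doc)
inbox.put(_SENTINEL)   # parent signals end of work...
t.join()               # ...then joins instead of terminating
results = [outbox.get() for _ in range(3)]
```

Because the queue is FIFO, the sentinel is only seen after all queued work, which is why the error path additionally closes the result channel to unblock workers mid-batch.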
* Add spacy.TextCatParametricAttention.v1
This layer is a simplification of the ensemble classifier that
only uses parametric attention. We have found empirically that with a
sufficient amount of training data, using the ensemble classifier with
BoW does not provide significant improvement in classifier accuracy.
However, plugging in a BoW classifier does reduce GPU training and
inference performance substantially, since it uses a CPU-only kernel.
* Fix merge fallout
* Add TextCatReduce.v1
This is a textcat classifier that pools the vectors generated by a
tok2vec implementation and then applies a classifier to the pooled
representation. Three reductions are supported for pooling: first, max,
and mean. When multiple reductions are enabled, the reductions are
concatenated before providing them to the classification layer.
This model is a generalization of the TextCatCNN model, which only
supports mean reductions and is a bit of a misnomer, because it can also
be used with transformers. This change also reimplements TextCatCNN.v2
using the new TextCatReduce.v1 layer.
* Doc fixes
Co-authored-by: Sofie Van Landeghem <svlandeg@users.noreply.github.com>
* Fully specify `TextCatCNN` <-> `TextCatReduce` equivalence
* Move TextCatCNN docs to legacy, in prep for moving to spacy-legacy
* Add back a test for TextCatCNN.v2
* Replace TextCatCNN in pipe configurations and templates
* Add an infobox to the `TextCatReduce` section with a `TextCatCNN` anchor
* Add last reduction (`use_reduce_last`)
* Remove non-working TextCatCNN Netlify redirect
* Revert layer changes for the quickstart
* Revert one more quickstart change
* Remove unused import
* Fix docstring
* Fix setting name in error message
---------
Co-authored-by: Sofie Van Landeghem <svlandeg@users.noreply.github.com>
Co-authored-by: Adriane Boyd <adrianeboyd@gmail.com>
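The pooling that `TextCatReduce.v1` describes, with first, max, and mean reductions concatenated when several are enabled, can be sketched in NumPy (illustrative only, not thinc's implementation):

```python
import numpy as np

def reduce_concat(vectors, use_first=True, use_max=True, use_mean=True):
    # vectors: (n_tokens, width) array produced by a tok2vec layer.
    parts = []
    if use_first:
        parts.append(vectors[0])           # first-token reduction
    if use_max:
        parts.append(vectors.max(axis=0))  # elementwise max over tokens
    if use_mean:
        parts.append(vectors.mean(axis=0)) # mean over tokens
    # Enabled reductions are concatenated before the classification layer.
    return np.concatenate(parts)
```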
* Update README.md to include links for GPU processing, LLM, and spaCy's blog.
* Create ojo4f3.md
* corrected README to most current version with links to GPU processing, LLMs, and the spaCy blog.
* Delete .github/contributors/ojo4f3.md
* changed LLM icon
Co-authored-by: Adriane Boyd <adrianeboyd@gmail.com>
* Apply suggestions from code review
---------
Co-authored-by: Adriane Boyd <adrianeboyd@gmail.com>