* Use minor version for compatibility check
* Use minor version of compatibility table
* Soften warning message about incompatible models
* Add test for presence of current version in compatibility table
* Add test for download compatibility table
* Use minor version of lower pin in error message if possible
* Fall back to spacy_git_version if available
* Fix unknown version string
* Copy rather than move files to top-level of package
* Add all files to `MANIFEST.in` (primarily for older versions of pip)
* Include the `README.md` contents as `long_description` in the setup
* implement textcat resizing for TextCatCNN
* resizing textcat in-place
* simplify code
* ensure predictions for old textcat labels remain the same after resizing (WIP)
* fix for softmax
* store softmax as attr
* fix ensemble weight copy and cleanup
* restructure slightly
* adjust documentation, update tests and quickstart templates to use latest versions
* extend unit test slightly
* revert unnecessary edits
* fix typo
* ensemble architecture won't be resizable for now
* use resizable layer (WIP)
* revert using resizable layer
* resizable container while avoid shape inference trouble
* cleanup
* ensure model continues training after resizing
* use fill_b parameter
* use fill_defaults
* resize_layer callback
* format
* bump thinc to 8.0.4
* bump spacy-legacy to 3.0.6
The behavior of `spacy.Corpus.v1` is unexpected enough for `max_length
!= 0` that `0` is a better default for users creating a new config with
the quickstart.
If not, documents are skipped, sometimes the entire corpus is skipped,
and sometimes documents are (quite unexpectedly for your average user)
split into sentences.
* Check for unsupported cats values
* Only show labels if train/dev mismatched
* Don't show label counts (only counting positive labels seems odd)
* Use warnings for mismatched train/dev labels
This came up in #7878, but if --resume-path is a directory then loading
the weights will fail. On Linux this will give a straightforward error
message, but on Windows it gives "Permission Denied", which is
confusing.
* Fix percent unk display
This was showing (ratio %), so 10% would show as 0.10%. Fix by
multiplying ration by 100.
Might want to add a warning if this is over a threshold.
* Only show whole-integer percents
* Add empty lines at the end of Python files
* Only prepend the lang code if it's not there already
* Update spacy/cli/package.py
* fix whitespace stripping
* Replace negative rows with 0 in StaticVectors
Replace negative row indices with 0-vectors in `StaticVectors`.
* Increase versions related to StaticVectors
* Increase versions of all architctures and layers related to
`StaticVectors`
* Improve efficiency of 0-vector operations
Parallel `spacy-legacy` PR: https://github.com/explosion/spacy-legacy/pull/5
* Update config defaults to new versions
* Update docs
* Terminology: deprecated vs obsolete
Typically, deprecated is used for functionality that is bound to become unavailable but that can still be used. Obsolete is used for features that have been removed. In E941, I think what is meant is "obsolete" since loading a model by a shortcut simply does not work anymore (and throws an error). This is different from downloading a model with a shortcut, which is deprecated but still works.
In light of this, perhaps all other error codes should be checked as well.
* clarify that the link command is removed and not just deprecated
Co-authored-by: svlandeg <sofie.vanlandeghem@gmail.com>
* Update debug data further for v3
* Remove new/existing label distinction (new labels are not immediately
distinguishable because the pipeline is already initialized)
* Warn on missing labels in training data for all components except parser
* Separate textcat and textcat_multilabel sections
* Add section for morphologizer
* Reword missing label warnings
* add multi-label textcat to menu
* add infobox on textcat API
* add info to v3 migration guide
* small edits
* further fixes in doc strings
* add infobox to textcat architectures
* add textcat_multilabel to overview of built-in components
* spelling
* fix unrelated warn msg
* Add textcat_multilabel to quickstart [ci skip]
* remove separate documentation page for multilabel_textcategorizer
* small edits
* positive label clarification
* avoid duplicating information in self.cfg and fix textcat.score
* fix multilabel textcat too
* revert threshold to storage in cfg
* revert threshold stuff for multi-textcat
Co-authored-by: Ines Montani <ines@ines.io>
* Add hint for --gpu-id to CLI device info
If the user has `cupy` and an available GPU, add a hint about using
`--gpu-id 0` to the CLI output.
* Undo change to original CPU message
Now that `nlp.evaluate()` does not modify the examples, rerun the
pipeline on the (limited) texts in order to provide the predicted
annotation in the displacy output option.
* add capture argument to project_run and run_commands
* git bump to 3.0.1
* Set version to 3.0.1.dev0
Co-authored-by: Matthew Honnibal <honnibal+gh@gmail.com>
When `--no-cache-dir` is present, it prevents caching to properly function.
If the user still wants to do this, there is the possibility to pass options with `user_pip_args`.
But you should not enforce options like these. In my case this is preventing some docker build (using buildkit caching) to have proper caching of models.
* Allow output_path to be None during training
* Fix cat scoring (?)
* Improve error message for weighted None score
* Improve messages
So we can call this in other places etc.
* FIx output path check
* Use latest wasabi
* Revert "Improve error message for weighted None score"
This reverts commit 7059926763.
* Exclude None scores from final score by default
It's otherwise very difficult to keep track of the score weights if we modify a config programmatically, source components etc.
* Update warnings and use logger.warning