* Init
* Fix return type for mypy
* adjust types and improve setting new attributes
* Add underscore changes to json conversion
* Add test and underscore changes to from_docs
* add underscore changes and test to span.to_doc
* update return values
Co-authored-by: Sofie Van Landeghem <svlandeg@users.noreply.github.com>
* Add types to function
Co-authored-by: Sofie Van Landeghem <svlandeg@users.noreply.github.com>
* adjust formatting
Co-authored-by: Sofie Van Landeghem <svlandeg@users.noreply.github.com>
* shorten return type
Co-authored-by: Sofie Van Landeghem <svlandeg@users.noreply.github.com>
* add helper function to improve readability
* Improve code and add comments
* rerun azure tests
* Fix tests for json conversion
Co-authored-by: Sofie Van Landeghem <svlandeg@users.noreply.github.com>
* Remove experimental multi-task components
These are incomplete implementations and are not usable in their current state.
* Remove orphaned error message
* Switch ubuntu-latest to ubuntu-20.04 in main tests (#11928)
* Switch ubuntu-latest to ubuntu-20.04 in main tests
* Only use 20.04 for 3.6
* Revert "Switch ubuntu-latest to ubuntu-20.04 in main tests (#11928)"
This reverts commit 77c0fd7b17.
Co-authored-by: Paul O'Leary McCann <polm@dampfkraft.com>
* Remove old model shortcuts
* Remove error, docs warnings about shortcuts
* Fix import in util
Accidentally deleted the whole import and not just the old part...
* Change universe example to v3 style
* Switch ubuntu-latest to ubuntu-20.04 in main tests (#11928)
* Switch ubuntu-latest to ubuntu-20.04 in main tests
* Only use 20.04 for 3.6
* Update some model loading in Universe
* Add v2 tag to neuralcoref
* Use the spacy-version feature instead of a v2 tag
Co-authored-by: svlandeg <svlandeg@github.com>
If spacy-transformers is not installed and you run `init
config` with the GPU flag, you get an error. The `use_transformers`
flag in the config is conflated with the GPU flag, so the code tries to
access transformers config info that may not exist.
There may be a better way to do this, but this change stops the error.
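A minimal sketch of the idea of keeping the two flags separate, not necessarily how the actual patch does it. The helper names `transformers_available` and `should_use_transformers` are hypothetical; only the `importlib.util.find_spec` call and the `spacy_transformers` import name are real.

```python
import importlib.util


def transformers_available() -> bool:
    """Return True if the spacy-transformers package can be imported."""
    # find_spec returns None for a missing top-level package.
    return importlib.util.find_spec("spacy_transformers") is not None


def should_use_transformers(gpu_requested: bool, installed: bool) -> bool:
    """Hypothetical helper: asking for GPU alone must not imply transformers."""
    # Transformer config is only looked up when the package is present.
    return gpu_requested and installed


# Example: GPU requested, but the package may or may not be installed.
use_transformers = should_use_transformers(True, transformers_available())
```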
* Support local filesystem remotes for projects
* Fix support for local filesystem remotes for projects
* Use `FluidPath` instead of `Pathy` to support both filesystem and
remote paths
* Create missing parent directories if required for local filesystem
* Add a more general `_file_exists` method to support `Pathy`,
`Path`, and `smart_open`-compatible URLs
* Add explicit `smart_open` dependency starting with support for
`compression` flag
* Update `pathy` dependency to exclude older versions that aren't
compatible with required `smart_open` version
* Update docs to refer to `Pathy` instead of `smart_open` for project
remotes (technically you can still push to any `smart_open`-compatible
path but you can't pull from them)
* Add tests for local filesystem remotes
* Update pathy for general BlobStat sorting
* Add import
* Remove _file_exists since only Pathy remotes are supported
* Format CLI docs
* Clean up merge
* pymorphy2 issues #11620, #11626, #11625:
- #11620: pymorphy2_lookup
- #11626: handle multiple forms pointing to the same normal form + handling empty POS tag
- #11625: matching DET that are labelled as PRON by pymorphy2
* Move lemmatizer algorithm changes back into RussianLemmatizer
* Fix uk pymorphy3_lookup mode init
* Move and update tests for ru/uk lookup lemmatizer modes
* Fix typo
* Remove traces of previous behavior for uninflected POS
* Refactor to private generic-looking pymorphy methods
* Remove xfailed uk lemmatizer cases
* Update spacy/lang/ru/lemmatizer.py
Co-authored-by: Richard Hudson <richard@explosion.ai>
Co-authored-by: Dmytro S Lituiev <d.lituiev@gmail.com>
* Add `training.before_update` callback
This callback can be used to implement training paradigms such as gradual (un)freezing of components (e.g. the Transformer) after a certain number of training steps, to mitigate catastrophic forgetting during fine-tuning.
* Fix type annotation, default config value
* Generalize arguments passed to the callback
* Update schema
* Pass `epoch` to callback, rename `current_step` to `step`
* Add test
* Simplify test
* Replace config string with `spacy.blank`
* Apply suggestions from code review
Co-authored-by: Adriane Boyd <adrianeboyd@gmail.com>
* Cleanup imports
Co-authored-by: Adriane Boyd <adrianeboyd@gmail.com>
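A rough sketch of the gradual-unfreezing use case for `training.before_update`, written against a toy stand-in rather than the real pipeline object. It assumes the callback receives the pipeline plus a dict carrying at least `step` and `epoch`; `ToyPipeline`, `make_unfreeze_callback`, and `run_toy_loop` are hypothetical names used only for illustration.

```python
from typing import Any, Callable, Dict, List, Set


class ToyPipeline:
    """Hypothetical stand-in for the pipeline object; not a spaCy class."""

    def __init__(self, frozen: Set[str]) -> None:
        self.frozen = set(frozen)


def make_unfreeze_callback(
    component: str, unfreeze_at_step: int
) -> Callable[[ToyPipeline, Dict[str, Any]], None]:
    """Build a callback that unfreezes `component` once `step` is reached."""

    def before_update(nlp: ToyPipeline, info: Dict[str, Any]) -> None:
        # Assumption: `info` carries at least "step" and "epoch".
        if info["step"] >= unfreeze_at_step:
            nlp.frozen.discard(component)

    return before_update


def run_toy_loop(
    nlp: ToyPipeline,
    callback: Callable[[ToyPipeline, Dict[str, Any]], None],
    n_steps: int,
) -> List[bool]:
    """Call the callback before each fake update and record the frozen state."""
    history = []
    for step in range(n_steps):
        callback(nlp, {"step": step, "epoch": 0})
        history.append("transformer" in nlp.frozen)
    return history


history = run_toy_loop(
    ToyPipeline({"transformer"}), make_unfreeze_callback("transformer", 2), 4
)
```

In this toy run the component stays frozen for steps 0 and 1 and is trainable from step 2 onward.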
* remove sentiment attribute
* remove sentiment from docs
* add test for backwards compatibility
* replace from_disk with from_bytes
* Fix docs and format file
* Fix formatting
* Check textcat values for validity
* Fix error numbers
* Clean up vals reference
* Check category value validity through training
`_validate_categories` is called in `update`, which the multilabel
component inherits from the single-label component.
* Formatting
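A toy sketch of why validating inside `update` covers both components. The classes below are hypothetical stand-ins, and the concrete rules (values must be 0 or 1, at most one positive category for single-label data) are assumptions made for illustration, not taken from this log.

```python
from typing import Dict, List

Cats = Dict[str, float]


def _check_binary(cats: Cats) -> None:
    # Assumption: a valid category value is exactly 0 or 1.
    for label, val in cats.items():
        if val not in (0.0, 1.0):
            raise ValueError(f"Invalid value {val!r} for category {label!r}")


class ToySingleLabel:
    def _validate_categories(self, batch: List[Cats]) -> None:
        for cats in batch:
            _check_binary(cats)
            # Assumption: single-label data has at most one positive category.
            if sum(cats.values()) > 1:
                raise ValueError(f"More than one positive category: {cats}")

    def update(self, batch: List[Cats]) -> int:
        # Validation runs on every update, so bad values surface during training.
        self._validate_categories(batch)
        return len(batch)


class ToyMultiLabel(ToySingleLabel):
    # `update` is inherited; only the validation rule differs.
    def _validate_categories(self, batch: List[Cats]) -> None:
        for cats in batch:
            _check_binary(cats)
```

Because `ToyMultiLabel` inherits `update`, its own `_validate_categories` is picked up automatically on every training step.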
* Add equality definition for vectors
This re-uses the check from sourcing components.
* Use the equality check
* Format
Co-authored-by: Adriane Boyd <adrianeboyd@gmail.com>
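A minimal sketch of an equality definition in this spirit: cheap comparisons first, then a comparison of the serialized form. `ToyVectors` is a hypothetical stand-in, and the exact fields compared here are assumptions rather than the real implementation.

```python
import json
from typing import Dict, List, Tuple


class ToyVectors:
    """Hypothetical stand-in for a vectors table; not the real class."""

    def __init__(self, rows: List[List[float]], key2row: Dict[str, int]) -> None:
        self.rows = [list(r) for r in rows]
        self.key2row = dict(key2row)

    @property
    def shape(self) -> Tuple[int, int]:
        width = len(self.rows[0]) if self.rows else 0
        return (len(self.rows), width)

    def to_bytes(self) -> bytes:
        # Deterministic serialization so equal tables give equal bytes.
        payload = {"rows": self.rows, "key2row": self.key2row}
        return json.dumps(payload, sort_keys=True).encode("utf8")

    def __eq__(self, other: object) -> bool:
        if not isinstance(other, ToyVectors):
            return NotImplemented
        # Faster checks first; the byte comparison only runs if they pass.
        return (
            self.shape == other.shape
            and self.key2row == other.key2row
            and self.to_bytes() == other.to_bytes()
        )


a = ToyVectors([[1.0, 2.0]], {"cat": 0})
b = ToyVectors([[1.0, 2.0]], {"cat": 0})
c = ToyVectors([[1.0, 3.0]], {"cat": 0})
```

With this definition, callers can write `a == b` instead of repeating the byte comparison at each call site.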