* Init
* Fix return type for mypy
* adjust types and improve setting new attributes
* Add underscore changes to json conversion
* Add test and underscore changes to from_docs
* add underscore changes and test to span.to_doc
* update return values
Co-authored-by: Sofie Van Landeghem <svlandeg@users.noreply.github.com>
* Add types to function
Co-authored-by: Sofie Van Landeghem <svlandeg@users.noreply.github.com>
* adjust formatting
Co-authored-by: Sofie Van Landeghem <svlandeg@users.noreply.github.com>
* shorten return type
Co-authored-by: Sofie Van Landeghem <svlandeg@users.noreply.github.com>
* add helper function to improve readability
* Improve code and add comments
* rerun azure tests
* Fix tests for json conversion
Co-authored-by: Sofie Van Landeghem <svlandeg@users.noreply.github.com>
* Convert all individual values explicitly to uint64 for array-based doc representations
* Temporarily test with latest numpy v1.24.0rc
* Remove unnecessary conversion from attr_t
* Reduce number of individual casts
* Convert specifically from int32 to uint64
* Revert "Temporarily test with latest numpy v1.24.0rc"
This reverts commit eb0e3c5006.
* Also use int32 in tests
* Extend to wasabi v1.1
* Temporarily run mypy and tests with newest wasabi
* Temporarily skip check requirements test
* Revert "Temporarily skip check requirements test"
This reverts commit 44f4ce20a8.
* Revert "Temporarily run mypy and tests with newest wasabi"
This reverts commit e677a2257c.
* Remove experimental multi-task components
These are incomplete implementations and are not usable in their current state.
* Remove orphaned error message
* Switch ubuntu-latest to ubuntu-20.04 in main tests (#11928)
* Switch ubuntu-latest to ubuntu-20.04 in main tests
* Only use 20.04 for 3.6
* Revert "Switch ubuntu-latest to ubuntu-20.04 in main tests (#11928)"
This reverts commit 77c0fd7b17.
Co-authored-by: Paul O'Leary McCann <polm@dampfkraft.com>
* Remove old model shortcuts
* Remove error, docs warnings about shortcuts
* Fix import in util
Accidentally deleted the whole import and not just the old part...
* Change universe example to v3 style
* Switch ubuntu-latest to ubuntu-20.04 in main tests (#11928)
* Switch ubuntu-latest to ubuntu-20.04 in main tests
* Only use 20.04 for 3.6
* Update some model loading in Universe
* Add v2 tag to neuralcoref
* Use the spacy-version feature instead of a v2 tag
Co-authored-by: svlandeg <svlandeg@github.com>
Strings in replacement nodes where not added to the `StringStore`
when `EditTreeLemmatizer` was initialized from a set of labels. The
corresponding test did not capture this because it added the strings
through the examples that were passed to the initialization.
This change fixes both this bug in the initialization as the 'shadowing'
of the bug in the test.
If you don't have spacy-transformers installed, but try to use `init
config` with the GPU flag, you'll get an error. The issue is that the
`use_transformers` flag in the config is conflated with the GPU flag,
and then there's an attempt to access transformers config info that may
not exist.
There may be a better way to do this, but this stops the error.
* Support local filesystem remotes for projects
* Fix support for local filesystem remotes for projects
* Use `FluidPath` instead of `Pathy` to support both filesystem and
remote paths
* Create missing parent directories if required for local filesystem
* Add a more general `_file_exists` method to support both `Pathy`,
`Path`, and `smart_open`-compatible URLs
* Add explicit `smart_open` dependency starting with support for
`compression` flag
* Update `pathy` dependency to exclude older versions that aren't
compatible with required `smart_open` version
* Update docs to refer to `Pathy` instead of `smart_open` for project
remotes (technically you can still push to any `smart_open`-compatible
path but you can't pull from them)
* Add tests for local filesystem remotes
* Update pathy for general BlobStat sorting
* Add import
* Remove _file_exists since only Pathy remotes are supported
* Format CLI docs
* Clean up merge
* pymorph2 issues #11620, #11626, #11625:
- #11620: pymorphy2_lookup
- #11626: handle multiple forms pointing to the same normal form + handling empty POS tag
- #11625: matching DET that are labelled as PRON by pymorhp2
* Move lemmatizer algorithm changes back into RussianLemmatizer
* Fix uk pymorphy3_lookup mode init
* Move and update tests for ru/uk lookup lemmatizer modes
* Fix typo
* Remove traces of previous behavior for uninflected POS
* Refactor to private generic-looking pymorphy methods
* Remove xfailed uk lemmatizer cases
* Update spacy/lang/ru/lemmatizer.py
Co-authored-by: Richard Hudson <richard@explosion.ai>
Co-authored-by: Dmytro S Lituiev <d.lituiev@gmail.com>
Co-authored-by: Richard Hudson <richard@explosion.ai>
* Add `training.before_update` callback
This callback can be used to implement training paradigms like gradual (un)freezing of components (e.g: the Transformer) after a certain number of training steps to mitigate catastrophic forgetting during fine-tuning.
* Fix type annotation, default config value
* Generalize arguments passed to the callback
* Update schema
* Pass `epoch` to callback, rename `current_step` to `step`
* Add test
* Simplify test
* Replace config string with `spacy.blank`
* Apply suggestions from code review
Co-authored-by: Adriane Boyd <adrianeboyd@gmail.com>
* Cleanup imports
Co-authored-by: Adriane Boyd <adrianeboyd@gmail.com>
* remove sentiment attribute
* remove sentiment from docs
* add test for backwards compatibility
* replace from_disk with from_bytes
* Fix docs and format file
* Fix formatting