* Support local filesystem remotes for projects
* Fix support for local filesystem remotes for projects
* Use `FluidPath` instead of `Pathy` to support both filesystem and
remote paths
* Create missing parent directories if required for local filesystem
* Add a more general `_file_exists` method to support both `Pathy`,
`Path`, and `smart_open`-compatible URLs
* Add explicit `smart_open` dependency starting with support for
`compression` flag
* Update `pathy` dependency to exclude older versions that aren't
compatible with required `smart_open` version
* Update docs to refer to `Pathy` instead of `smart_open` for project
remotes (technically you can still push to any `smart_open`-compatible
path but you can't pull from them)
* Add tests for local filesystem remotes
* Update pathy for general BlobStat sorting
* Add import
* Remove _file_exists since only Pathy remotes are supported
* Format CLI docs
* Clean up merge
* Add `training.before_update` callback
This callback can be used to implement training paradigms like gradual (un)freezing of components (e.g: the Transformer) after a certain number of training steps to mitigate catastrophic forgetting during fine-tuning.
* Fix type annotation, default config value
* Generalize arguments passed to the callback
* Update schema
* Pass `epoch` to callback, rename `current_step` to `step`
* Add test
* Simplify test
* Replace config string with `spacy.blank`
* Apply suggestions from code review
Co-authored-by: Adriane Boyd <adrianeboyd@gmail.com>
* Cleanup imports
Co-authored-by: Adriane Boyd <adrianeboyd@gmail.com>
* Update textcat scorer threshold behavior
For `textcat` (with exclusive classes) the scorer should always use a
threshold of 0.0 because there should be one predicted label per doc and
the numeric score for that particular label should not matter.
* Rename to test_textcat_multilabel_threshold
* Remove all uses of threshold for multi_label=False
* Update Scorer.score_cats API docs
* Add tests for score_cats with thresholds
* Update textcat API docs
* Fix types
* Convert threshold back to float
* Fix threshold type in docstring
* Improve formatting in Scorer API docs
* added spacy-cleaner to the spaCy universe
* Move data to righ section of universe.json
* Cleanup
- fix typo ("replacers")
- spaCy doesn't need to be marked as code
- lemma of "Hello" is lower case
Co-authored-by: Paul O'Leary McCann <polm@dampfkraft.com>
* Fix flag handling in dvc
Prior to this commit, if a flag (--verbose or --quiet) was passed to
DVC, it would be added to the end of the generated dvc command line.
This would result in the command being interpreted as part of the actual
command to run, rather than an argument to dvc. This would result in
command lines like:
spacy project run preprocess --verbose
That would fail with an error that there's no such directory as
`--verbose`.
This change puts the flags at the front of the dvc command so that they
are interpreted correctly. It removes the `run_dvc_commands` function,
which had been reduced to just a for loop and wasn't used elsewhere.
A separate problem is that there's no way to specify the quiet behaviour
to dvc from the command line, though it's unclear if that's a bug.
* Add dvc quiet flag to docs
* Handle case in DVC where no commands are appropriate
If only have commands with no deps or outputs (admittedly unlikely), you
get a weird error about the dvc file not existing. This gives explicit
output instead.
* Add support for quiet flag
* Fix command execution
Commands are strings now because they're joined further up.
* Fix example code for spacy-wordnet
It looks like in the most recent version, 0.1.0, it's no longer possible
to pass the lang parameter to the component separately. Doing so will
raise an error.
* Apply suggestions from code review
Co-authored-by: Sofie Van Landeghem <svlandeg@users.noreply.github.com>
* Cleanup
* More cleanup
Co-authored-by: Sofie Van Landeghem <svlandeg@users.noreply.github.com>
* Add experimental coref docs
* Docs cleanup
* Apply suggestions from code review
Co-authored-by: Sofie Van Landeghem <svlandeg@users.noreply.github.com>
* Apply changes from code review
* Fix prettier formatting
It seems a period after a number made this think it was a list?
* Update docs on examples for initialize
* Add docs for coref scorers
* Remove 3.4 notes from coref
There won't be a "new" tag until it's in core.
* Add docs for span cleaner
* Fix docs
* Fix docs to match spacy-experimental
These weren't properly updated when the code was moved out of spacy
core.
* More doc fixes
* Formatting
* Update architectures
* Fix links
* Fix another link
Co-authored-by: Sofie Van Landeghem <svlandeg@users.noreply.github.com>
Co-authored-by: svlandeg <svlandeg@github.com>
* Remove side effects from Doc.__init__()
* Changes based on review comment
* Readd test
* Change interface of Doc.__init__()
* Simplify test
Co-authored-by: Adriane Boyd <adrianeboyd@gmail.com>
* Update doc.md
Co-authored-by: Adriane Boyd <adrianeboyd@gmail.com>
* Update cupy extras:
* Extend to v11
* Add `cupy-cuda11x` and `cupy-wheel`
* Update quickstart to use `cupy-wheel` for CUDA 10.2+
* Rename cuda-wheel to cuda-autodetect, remove repeated CUDA in menu
* replicate bug with tok2vec in annotating components
* add overfitting test with a frozen tok2vec
* remove broadcast from predict and check doc.tensor instead
* remove broadcast
* proper error
* slight rephrase of documentation