* Support local filesystem remotes for projects
* Fix support for local filesystem remotes for projects
* Use `FluidPath` instead of `Pathy` to support both filesystem and
remote paths
* Create missing parent directories if required for local filesystem
* Add a more general `_file_exists` method to support both `Pathy`,
`Path`, and `smart_open`-compatible URLs
* Add explicit `smart_open` dependency starting with support for
`compression` flag
* Update `pathy` dependency to exclude older versions that aren't
compatible with required `smart_open` version
* Update docs to refer to `Pathy` instead of `smart_open` for project
remotes (technically you can still push to any `smart_open`-compatible
path but you can't pull from them)
* Add tests for local filesystem remotes
* Update pathy for general BlobStat sorting
* Add import
* Remove _file_exists since only Pathy remotes are supported
* Format CLI docs
* Clean up merge
* Add `training.before_update` callback
This callback can be used to implement training paradigms like gradual (un)freezing of components (e.g: the Transformer) after a certain number of training steps to mitigate catastrophic forgetting during fine-tuning.
* Fix type annotation, default config value
* Generalize arguments passed to the callback
* Update schema
* Pass `epoch` to callback, rename `current_step` to `step`
* Add test
* Simplify test
* Replace config string with `spacy.blank`
* Apply suggestions from code review
Co-authored-by: Adriane Boyd <adrianeboyd@gmail.com>
* Cleanup imports
Co-authored-by: Adriane Boyd <adrianeboyd@gmail.com>
* remove sentiment attribute
* remove sentiment from docs
* add test for backwards compatibility
* replace from_disk with from_bytes
* Fix docs and format file
* Fix formatting
* Update textcat scorer threshold behavior
For `textcat` (with exclusive classes) the scorer should always use a
threshold of 0.0 because there should be one predicted label per doc and
the numeric score for that particular label should not matter.
* Rename to test_textcat_multilabel_threshold
* Remove all uses of threshold for multi_label=False
* Update Scorer.score_cats API docs
* Add tests for score_cats with thresholds
* Update textcat API docs
* Fix types
* Convert threshold back to float
* Fix threshold type in docstring
* Improve formatting in Scorer API docs
* Replace EntityRuler with SpanRuler implementation
Remove `EntityRuler` and rename the `SpanRuler`-based
`future_entity_ruler` to `entity_ruler`.
Main changes:
* It is no longer possible to load patterns on init as with
`EntityRuler(patterns=)`.
* The older serialization formats (`patterns.jsonl`) are no longer
supported and the related tests are removed.
* The config settings are only stored in the config, not in the
serialized component (in particular the `phrase_matcher_attr` and
overwrite settings).
* Add migration guide to EntityRuler API docs
* docs update
* Minor edit
Co-authored-by: svlandeg <svlandeg@github.com>
* added spacy-cleaner to the spaCy universe
* Move data to righ section of universe.json
* Cleanup
- fix typo ("replacers")
- spaCy doesn't need to be marked as code
- lemma of "Hello" is lower case
Co-authored-by: Paul O'Leary McCann <polm@dampfkraft.com>
* Fix flag handling in dvc
Prior to this commit, if a flag (--verbose or --quiet) was passed to
DVC, it would be added to the end of the generated dvc command line.
This would result in the command being interpreted as part of the actual
command to run, rather than an argument to dvc. This would result in
command lines like:
spacy project run preprocess --verbose
That would fail with an error that there's no such directory as
`--verbose`.
This change puts the flags at the front of the dvc command so that they
are interpreted correctly. It removes the `run_dvc_commands` function,
which had been reduced to just a for loop and wasn't used elsewhere.
A separate problem is that there's no way to specify the quiet behaviour
to dvc from the command line, though it's unclear if that's a bug.
* Add dvc quiet flag to docs
* Handle case in DVC where no commands are appropriate
If only have commands with no deps or outputs (admittedly unlikely), you
get a weird error about the dvc file not existing. This gives explicit
output instead.
* Add support for quiet flag
* Fix command execution
Commands are strings now because they're joined further up.
* Fix example code for spacy-wordnet
It looks like in the most recent version, 0.1.0, it's no longer possible
to pass the lang parameter to the component separately. Doing so will
raise an error.
* Apply suggestions from code review
Co-authored-by: Sofie Van Landeghem <svlandeg@users.noreply.github.com>
* Cleanup
* More cleanup
Co-authored-by: Sofie Van Landeghem <svlandeg@users.noreply.github.com>
* `strings`: Remove unused `hash32_utf8` function
* `strings`: Make `hash_utf8` and `decode_Utf8Str` private
* `strings`: Reorganize private functions
* 'strings': Raise error when non-string/-int types are passed to functions that don't accept them
* `strings`: Add `items()` method, add type hints, remove unused methods, restrict inputs to specific types, reorganize methods
* `Morphology`: Use `StringStore.items()` to enumerate features when pickling
* `test_stringstore`: Update pre-Python 3 tests
* Update `StringStore` docs
* Fix `get_string_id` imports
* Replace redundant test with tests for type checking
* Rename `_retrieve_interned_str`, remove `.get` default arg
* Add `get_string_id` to `strings.pyi`
Remove `mypy` ignore directives from imports of the above
* `strings.pyi`: Replace functions that consume `Union`-typed params with overloads
* `strings.pyi`: Revert some function signatures
* Update `SYMBOLS_BY_INT` lookups and error codes post-merge
* Revert clobbered change introduced in a previous merge
* Remove unnecessary type hint
* Invert tuple order in `StringStore.items()`
* Add test for `StringStore.items()`
* Revert "`Morphology`: Use `StringStore.items()` to enumerate features when pickling"
This reverts commit 1af9510ceb.
* Rename `keys` and `key_map`
* Add `keys()` and `values()`
* Add comment about the inverted key-value semantics in the API
* Fix type hints
* Implement `keys()`, `values()`, `items()` without generators
* Fix type hints, remove unnecessary boxing
* Update docs
* Simplify `keys/values/items()` impl
* `mypy` fix
* Fix error message, doc fixes
* Add experimental coref docs
* Docs cleanup
* Apply suggestions from code review
Co-authored-by: Sofie Van Landeghem <svlandeg@users.noreply.github.com>
* Apply changes from code review
* Fix prettier formatting
It seems a period after a number made this think it was a list?
* Update docs on examples for initialize
* Add docs for coref scorers
* Remove 3.4 notes from coref
There won't be a "new" tag until it's in core.
* Add docs for span cleaner
* Fix docs
* Fix docs to match spacy-experimental
These weren't properly updated when the code was moved out of spacy
core.
* More doc fixes
* Formatting
* Update architectures
* Fix links
* Fix another link
Co-authored-by: Sofie Van Landeghem <svlandeg@users.noreply.github.com>
Co-authored-by: svlandeg <svlandeg@github.com>