* Fix multiple extensions and character offset
* Rename token_start/end to start/end
* Refactor Doc.from_json based on review
* Iterate over user_data items
* Only add non-empty underscore entries
Co-authored-by: Adriane Boyd <adrianeboyd@gmail.com>
* Fix flag handling in dvc
Prior to this commit, if a flag (--verbose or --quiet) was passed to
DVC, it would be added to the end of the generated dvc command line.
This would result in the command being interpreted as part of the actual
command to run, rather than an argument to dvc. This would result in
command lines like:
spacy project run preprocess --verbose
That would fail with an error that there's no such directory as
`--verbose`.
This change puts the flags at the front of the dvc command so that they
are interpreted correctly. It removes the `run_dvc_commands` function,
which had been reduced to just a for loop and wasn't used elsewhere.
A separate problem is that there's no way to specify the quiet behaviour
to dvc from the command line, though it's unclear if that's a bug.
* Add dvc quiet flag to docs
* Handle case in DVC where no commands are appropriate
If only have commands with no deps or outputs (admittedly unlikely), you
get a weird error about the dvc file not existing. This gives explicit
output instead.
* Add support for quiet flag
* Fix command execution
Commands are strings now because they're joined further up.
* Fix example code for spacy-wordnet
It looks like in the most recent version, 0.1.0, it's no longer possible
to pass the lang parameter to the component separately. Doing so will
raise an error.
* Apply suggestions from code review
Co-authored-by: Sofie Van Landeghem <svlandeg@users.noreply.github.com>
* Cleanup
* More cleanup
Co-authored-by: Sofie Van Landeghem <svlandeg@users.noreply.github.com>
* empty commit
* restrict importlib-metadata to lower than 5.0.0
* restrict importlib-metadata also for validate CI step
* set fixed version for CI
* try flake8 5.0.4 in CI validation step
* from importlib-metadata from requirements again
* Change enable/disable behavior so that arguments take precedence over config options. Extend error message on conflict. Add warning message in case of overwriting config option with arguments.
* Fix tests in test_serialize_pipeline.py to reflect changes to handling of enable/disable.
* Fix type issue.
* Move comment.
* Move comment.
* Issue UserWarning instead of printing wasabi message. Adjust test.
* Added pytest.warns(UserWarning) for expected warning to fix tests.
* Update warning message.
* Move type handling out of fetch_pipes_status().
* Add global variable for default value. Use id() to determine whether used values are default value.
* Fix default value for disable.
* Rename DEFAULT_PIPE_STATUS to _DEFAULT_EMPTY_PIPES.
* add punctuation to grc
Add support for special editorial punctuation that is common in ancient Greek texts. Ancient Greek texts, as found in digital and print form, have been largely edited by scholars. Restorations and improvements are normally marked with special characters that need to be handled properly by the tokenizer.
* add unit tests
* simplify regex
* move generic quotes to char classes
* rename unit test
* fix regex
Co-authored-by: Adriane Boyd <adrianeboyd@gmail.com>
Co-authored-by: svlandeg <svlandeg@github.com>
Co-authored-by: Sofie Van Landeghem <svlandeg@users.noreply.github.com>
Co-authored-by: Adriane Boyd <adrianeboyd@gmail.com>
* Add experimental coref docs
* Docs cleanup
* Apply suggestions from code review
Co-authored-by: Sofie Van Landeghem <svlandeg@users.noreply.github.com>
* Apply changes from code review
* Fix prettier formatting
It seems a period after a number made this think it was a list?
* Update docs on examples for initialize
* Add docs for coref scorers
* Remove 3.4 notes from coref
There won't be a "new" tag until it's in core.
* Add docs for span cleaner
* Fix docs
* Fix docs to match spacy-experimental
These weren't properly updated when the code was moved out of spacy
core.
* More doc fixes
* Formatting
* Update architectures
* Fix links
* Fix another link
Co-authored-by: Sofie Van Landeghem <svlandeg@users.noreply.github.com>
Co-authored-by: svlandeg <svlandeg@github.com>
Preserve both `-` and `O` annotation in augmenters rather than relying
on `Example.to_dict`'s default support for one option outside of labeled
entity spans.
This is intended as a temporary workaround for augmenters for v3.4.x.
The behavior of `Example` and related IOB utils could be improved in the
general case for v3.5.
* Remove side effects from Doc.__init__()
* Changes based on review comment
* Readd test
* Change interface of Doc.__init__()
* Simplify test
Co-authored-by: Adriane Boyd <adrianeboyd@gmail.com>
* Update doc.md
Co-authored-by: Adriane Boyd <adrianeboyd@gmail.com>
* Update cupy extras:
* Extend to v11
* Add `cupy-cuda11x` and `cupy-wheel`
* Update quickstart to use `cupy-wheel` for CUDA 10.2+
* Rename cuda-wheel to cuda-autodetect, remove repeated CUDA in menu
* replicate bug with tok2vec in annotating components
* add overfitting test with a frozen tok2vec
* remove broadcast from predict and check doc.tensor instead
* remove broadcast
* proper error
* slight rephrase of documentation
* Enable Cython<->Python bindings for `Pipe` and `TrainablePipe` methods
* `pipes_with_nvtx_range`: Skip hooking methods whose signature cannot be ascertained
When loading pipelines from a config file, the arguments passed to individual pipeline components is validated by `pydantic` during init. For this, the validation model attempts to parse the function signature of the component's c'tor/entry point so that it can check if all mandatory parameters are present in the config file.
When using the `models_and_pipes_with_nvtx_range` as a `after_pipeline_creation` callback, the methods of all pipeline components get replaced by a NVTX range wrapper **before** the above-mentioned validation takes place. This can be problematic for components that are implemented as Cython extension types - if the extension type is not compiled with Python bindings for its methods, they will have no signatures at runtime. This resulted in `pydantic` matching the *wrapper's* parameters with the those in the config and raising errors.
To avoid this, we now skip applying the wrapper to any (Cython) methods that do not have signatures.
* new error message when 'project run assets'
* new error message when 'project run assets'
* Update spacy/cli/project/run.py
Co-authored-by: Sofie Van Landeghem <svlandeg@users.noreply.github.com>
Co-authored-by: Sofie Van Landeghem <svlandeg@users.noreply.github.com>
Due to problems with the javascript conversion in the website
quickstart, remove the `has_letters` setting to simplify generating
`attrs` for the default `tok2vec`.
Additionally reduce `PREFIX` as in the trained pipelines.