* empty commit
* restrict importlib-metadata to lower than 5.0.0
* restrict importlib-metadata also for validate CI step
* set fixed version for CI
* try flake8 5.0.4 in CI validation step
* from importlib-metadata from requirements again
* Change enable/disable behavior so that arguments take precedence over config options. Extend error message on conflict. Add warning message in case of overwriting config option with arguments.
* Fix tests in test_serialize_pipeline.py to reflect changes to handling of enable/disable.
* Fix type issue.
* Move comment.
* Move comment.
* Issue UserWarning instead of printing wasabi message. Adjust test.
* Added pytest.warns(UserWarning) for expected warning to fix tests.
* Update warning message.
* Move type handling out of fetch_pipes_status().
* Add global variable for default value. Use id() to determine whether used values are default value.
* Fix default value for disable.
* Rename DEFAULT_PIPE_STATUS to _DEFAULT_EMPTY_PIPES.
* add punctuation to grc
Add support for special editorial punctuation that is common in ancient Greek texts. Ancient Greek texts, as found in digital and print form, have been largely edited by scholars. Restorations and improvements are normally marked with special characters that need to be handled properly by the tokenizer.
* add unit tests
* simplify regex
* move generic quotes to char classes
* rename unit test
* fix regex
Co-authored-by: Adriane Boyd <adrianeboyd@gmail.com>
Co-authored-by: svlandeg <svlandeg@github.com>
Co-authored-by: Sofie Van Landeghem <svlandeg@users.noreply.github.com>
Co-authored-by: Adriane Boyd <adrianeboyd@gmail.com>
* Add experimental coref docs
* Docs cleanup
* Apply suggestions from code review
Co-authored-by: Sofie Van Landeghem <svlandeg@users.noreply.github.com>
* Apply changes from code review
* Fix prettier formatting
It seems a period after a number made this think it was a list?
* Update docs on examples for initialize
* Add docs for coref scorers
* Remove 3.4 notes from coref
There won't be a "new" tag until it's in core.
* Add docs for span cleaner
* Fix docs
* Fix docs to match spacy-experimental
These weren't properly updated when the code was moved out of spacy
core.
* More doc fixes
* Formatting
* Update architectures
* Fix links
* Fix another link
Co-authored-by: Sofie Van Landeghem <svlandeg@users.noreply.github.com>
Co-authored-by: svlandeg <svlandeg@github.com>
Preserve both `-` and `O` annotation in augmenters rather than relying
on `Example.to_dict`'s default support for one option outside of labeled
entity spans.
This is intended as a temporary workaround for augmenters for v3.4.x.
The behavior of `Example` and related IOB utils could be improved in the
general case for v3.5.
* Remove side effects from Doc.__init__()
* Changes based on review comment
* Readd test
* Change interface of Doc.__init__()
* Simplify test
Co-authored-by: Adriane Boyd <adrianeboyd@gmail.com>
* Update doc.md
Co-authored-by: Adriane Boyd <adrianeboyd@gmail.com>
* Update cupy extras:
* Extend to v11
* Add `cupy-cuda11x` and `cupy-wheel`
* Update quickstart to use `cupy-wheel` for CUDA 10.2+
* Rename cuda-wheel to cuda-autodetect, remove repeated CUDA in menu
* replicate bug with tok2vec in annotating components
* add overfitting test with a frozen tok2vec
* remove broadcast from predict and check doc.tensor instead
* remove broadcast
* proper error
* slight rephrase of documentation
* Enable Cython<->Python bindings for `Pipe` and `TrainablePipe` methods
* `pipes_with_nvtx_range`: Skip hooking methods whose signature cannot be ascertained
When loading pipelines from a config file, the arguments passed to individual pipeline components is validated by `pydantic` during init. For this, the validation model attempts to parse the function signature of the component's c'tor/entry point so that it can check if all mandatory parameters are present in the config file.
When using the `models_and_pipes_with_nvtx_range` as a `after_pipeline_creation` callback, the methods of all pipeline components get replaced by a NVTX range wrapper **before** the above-mentioned validation takes place. This can be problematic for components that are implemented as Cython extension types - if the extension type is not compiled with Python bindings for its methods, they will have no signatures at runtime. This resulted in `pydantic` matching the *wrapper's* parameters with the those in the config and raising errors.
To avoid this, we now skip applying the wrapper to any (Cython) methods that do not have signatures.
* new error message when 'project run assets'
* new error message when 'project run assets'
* Update spacy/cli/project/run.py
Co-authored-by: Sofie Van Landeghem <svlandeg@users.noreply.github.com>
Co-authored-by: Sofie Van Landeghem <svlandeg@users.noreply.github.com>
Due to problems with the javascript conversion in the website
quickstart, remove the `has_letters` setting to simplify generating
`attrs` for the default `tok2vec`.
Additionally reduce `PREFIX` as in the trained pipelines.
* Add dev docs on satellite packages
* Apply suggestions from code review
Co-authored-by: Sofie Van Landeghem <svlandeg@users.noreply.github.com>
* Add displacy link
Co-authored-by: Sofie Van Landeghem <svlandeg@users.noreply.github.com>
* Add a dry run flag to download
* Remove --dry-run, add --url option to `spacy info` instead
* Make mypy happy
* Print only the URL, so it's easier to use in scripts
* Don't add the egg hash unless downloading an sdist
* Update spacy/cli/info.py
Co-authored-by: Sofie Van Landeghem <svlandeg@users.noreply.github.com>
* Add two implementations of requirements
* Clean up requirements sample slightly
This should make mypy happy
* Update URL help string
* Remove requirements option
* Add url option to docs
* Add URL to spacy info model output, when available
* Add types-setuptools to testing reqs
* Add types-setuptools to requirements
* Add "compatible", expand docstring
* Update spacy/cli/info.py
Co-authored-by: Adriane Boyd <adrianeboyd@gmail.com>
* Run prettier on CLI docs
* Update docs
Add a sidebar about finding download URLs, with some examples of the new
command.
* Add download URLs to table on model page
* Apply suggestions from code review
Co-authored-by: Adriane Boyd <adrianeboyd@gmail.com>
* Updates from review
* download url -> download link
* Update docs
Co-authored-by: Sofie Van Landeghem <svlandeg@users.noreply.github.com>
Co-authored-by: Adriane Boyd <adrianeboyd@gmail.com>
* `Matcher`: Better type checking of values in `SetPredicate`
`SetPredicate`: Emit warning and return `False` on unexpected value types
* Rename `value_type_mismatch` variable
* Inline warning
* Remove unexpected type warning from `_SetPredicate`
* Ensure that `str` values are not interpreted as sequences
Check elements of sequence values for convertibility to `str` or `int`
* Add more `INTERSECT` and `IN` test cases
* Test for inputs with multiple characters
* Return `False` early instead of using a boolean flag
* Remove superfluous `int` check, parentheses
* Apply suggestions from code review
Co-authored-by: Adriane Boyd <adrianeboyd@gmail.com>
* Appy suggestions from code review
* Clarify test comment
Co-authored-by: Adriane Boyd <adrianeboyd@gmail.com>
* adding unit test for spacy.load with disable/exclude string arg
* allow pure strings in from_config
* update docs
* upstream type adjustements
* docs update
* make docstring more consistent
* Update spacy/language.py
Co-authored-by: Adriane Boyd <adrianeboyd@gmail.com>
* two more cleanups
* fix type in internal method
Co-authored-by: Adriane Boyd <adrianeboyd@gmail.com>