Commit Graph

15636 Commits

Author SHA1 Message Date
Paul O'Leary McCann
71c71d3760 Fix types 2022-09-29 13:58:24 +09:00
Paul O'Leary McCann
6dfa8bde2f Merge branch 'master' into fix/windows-quoting 2022-09-28 19:47:55 +09:00
Paul O'Leary McCann
30c90febf1 Attempt to handle command rewriting on Windows
This handles command rewriting on Windows to ensure the same Python
executable is re-used by using the `executable` argument to
subprocess.run. This has the advantage that it avoids any need to escape
the path of sys.executable. Still needs testing.
2022-09-28 19:26:27 +09:00
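
The idea in 30c90febf1 can be sketched roughly as follows. This is only an illustration, assuming the command is already a list of arguments; the helper name `same_python_kwargs` is hypothetical and not taken from spaCy. The only real API relied on is the `executable` argument of `subprocess.run` together with `sys.executable`.

```python
import sys
from typing import Any, Dict, List


def same_python_kwargs(command: List[str]) -> Dict[str, Any]:
    """Build keyword arguments for subprocess.run (hypothetical helper).

    If the command starts with a bare "python", the argument list is left
    untouched and `executable` points at the running interpreter, so the
    interpreter path never has to be quoted or escaped.
    """
    kwargs: Dict[str, Any] = {"args": command}
    if command and command[0] == "python":
        kwargs["executable"] = sys.executable
    return kwargs


kwargs = same_python_kwargs(["python", "-m", "spacy", "info"])
```

The resulting dictionary could then be passed on as `subprocess.run(**kwargs)`; commands that do not start with `python` are returned without an `executable` entry.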
Raphael Mitsch
aea16719be
Simplify and clarify enable/disable behavior of spacy.load() (#11459)
* Change enable/disable behavior so that arguments take precedence over config options. Extend error message on conflict. Add warning message in case of overwriting config option with arguments.

* Fix tests in test_serialize_pipeline.py to reflect changes to handling of enable/disable.

* Fix type issue.

* Move comment.

* Move comment.

* Issue UserWarning instead of printing wasabi message. Adjust test.

* Added pytest.warns(UserWarning) for expected warning to fix tests.

* Update warning message.

* Move type handling out of fetch_pipes_status().

* Add global variable for default value. Use id() to determine whether used values are default value.

* Fix default value for disable.

* Rename DEFAULT_PIPE_STATUS to _DEFAULT_EMPTY_PIPES.
2022-09-27 14:22:36 +02:00
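
A minimal sketch of the sentinel-default pattern mentioned in aea16719be, under the assumption that an identity check (equivalent to comparing `id()` values) is used to tell "argument not passed" apart from "empty list passed". The function `resolve_disabled` and its warning text are hypothetical; only the name `_DEFAULT_EMPTY_PIPES` comes from the commit message.

```python
import warnings
from typing import Iterable, List

# Sentinel default: a dedicated object, so identity tells us whether the
# caller passed anything at all (an empty list is a valid argument).
_DEFAULT_EMPTY_PIPES: List[str] = []


def resolve_disabled(
    config_disabled: Iterable[str],
    disable: Iterable[str] = _DEFAULT_EMPTY_PIPES,
) -> List[str]:
    """Return the component names to disable (hypothetical helper)."""
    config_disabled = list(config_disabled)
    if disable is _DEFAULT_EMPTY_PIPES:
        # Nothing was passed explicitly: fall back to the config value.
        return config_disabled
    disable = list(disable)
    if config_disabled and disable != config_disabled:
        # The explicit argument takes precedence over the config option.
        warnings.warn(
            "Argument overrides the 'disabled' value from the config.",
            UserWarning,
        )
    return disable
```

With this layout an explicit `disable=[]` still counts as an argument and wins over the config, which a plain truthiness check could not distinguish.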
Taniguchi Yasufumi
9557b0fb01
Add spacy-partial-tagger to spaCy Universe (#11538) 2022-09-27 14:11:50 +02:00
Paul O'Leary McCann
a44b7d4622
Add experimental coref docs (#11291)
* Add experimental coref docs

* Docs cleanup

* Apply suggestions from code review

Co-authored-by: Sofie Van Landeghem <svlandeg@users.noreply.github.com>

* Apply changes from code review

* Fix prettier formatting

It seems a period after a number made this think it was a list?

* Update docs on examples for initialize

* Add docs for coref scorers

* Remove 3.4 notes from coref

There won't be a "new" tag until it's in core.

* Add docs for span cleaner

* Fix docs

* Fix docs to match spacy-experimental

These weren't properly updated when the code was moved out of spacy
core.

* More doc fixes

* Formatting

* Update architectures

* Fix links

* Fix another link

Co-authored-by: Sofie Van Landeghem <svlandeg@users.noreply.github.com>
Co-authored-by: svlandeg <svlandeg@github.com>
2022-09-27 18:11:23 +09:00
Adriane Boyd
877671e09a
Preserve missing entity annotation in augmenters (#11540)
Preserve both `-` and `O` annotation in augmenters rather than relying
on `Example.to_dict`'s default support for one option outside of labeled
entity spans.

This is intended as a temporary workaround for augmenters for v3.4.x.
The behavior of `Example` and related IOB utils could be improved in the
general case for v3.5.
2022-09-27 10:16:51 +02:00
Paul O'Leary McCann
936a5f0506
Fix English pipeline names in 3.4 release notes (#11542) 2022-09-27 08:25:24 +02:00
Richard Hudson
6f692a06d5
Remove side effects from Doc.__init__() (#11506)
* Remove side effects from Doc.__init__()

* Changes based on review comment

* Readd test

* Change interface of Doc.__init__()

* Simplify test

Co-authored-by: Adriane Boyd <adrianeboyd@gmail.com>

* Update doc.md

Co-authored-by: Adriane Boyd <adrianeboyd@gmail.com>
2022-09-26 15:58:21 +02:00
Basile Dura
f40d2fac29
fix: remove duplicate v3.2 (#11530) 2022-09-23 13:18:51 +02:00
Raphael Mitsch
af9b01ef97
Add dependency check to project step runs (#11226)
* Add dependency check to project step running.

* Fix dependency mismatch warning.

* Remove newline.

* Add types-setuptools to setup.cfg.

* Move types-setuptools to test requirements. Move warnings into _validate_requirements(). Handle file reading in project_run().

* Remove newline formatting for output of package conflicts.

* Show full version conflict message instead of just package name.

* Update spacy/cli/project/run.py

Co-authored-by: Adriane Boyd <adrianeboyd@gmail.com>

* Fix typo.

* Re-add rephrasing of message for conflicting packages. Remove requirements path redundancy.

* Update spacy/cli/project/run.py

Co-authored-by: Adriane Boyd <adrianeboyd@gmail.com>

* Update spacy/cli/project/run.py

Co-authored-by: Adriane Boyd <adrianeboyd@gmail.com>

* Print unified message for requirement conflicts and missing requirements.

* Update spacy/cli/project/run.py

Co-authored-by: Adriane Boyd <adrianeboyd@gmail.com>

* Fix warning message.

* Print conflict/missing messages individually.

* Print conflict/missing messages individually.

* Add check_requirements setting in project.yml to disable requirements check.

* Update website/docs/usage/projects.md

Co-authored-by: Adriane Boyd <adrianeboyd@gmail.com>

* Update website/docs/usage/projects.md

Co-authored-by: Adriane Boyd <adrianeboyd@gmail.com>

* Update description of project.yml structure in projects.md.

* Update website/docs/usage/projects.md

Co-authored-by: Sofie Van Landeghem <svlandeg@users.noreply.github.com>

* Prettify projects docs.

Co-authored-by: Adriane Boyd <adrianeboyd@gmail.com>
Co-authored-by: Sofie Van Landeghem <svlandeg@users.noreply.github.com>
2022-09-16 16:54:31 +02:00
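
The requirements check in af9b01ef97 could look something like the sketch below. Note the swap: the commit mentions `types-setuptools`, which suggests a setuptools-based check, whereas this sketch uses the standard library's `importlib.metadata` and only detects missing packages, not version conflicts. The function name `find_missing_requirements` is hypothetical.

```python
import re
from importlib import metadata
from typing import Iterable, List


def find_missing_requirements(requirements: Iterable[str]) -> List[str]:
    """Return requirement lines whose package is not installed.

    Hypothetical helper: it only checks for presence and does not
    compare version specifiers.
    """
    missing = []
    for line in requirements:
        line = line.strip()
        # Skip blanks, comments and pip options such as "-r other.txt".
        if not line or line.startswith("#") or line.startswith("-"):
            continue
        # The distribution name is everything before the first version
        # specifier, extras bracket, environment marker or whitespace.
        name = re.split(r"[<>=!~\[;\s]", line, maxsplit=1)[0]
        if not name:
            continue
        try:
            metadata.version(name)
        except metadata.PackageNotFoundError:
            missing.append(line)
    return missing
```

A caller would typically read `requirements.txt` line by line, pass the lines in, and print one message per returned entry.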
github-actions[bot]
279358be63
Auto-format code with black (#11513)
Co-authored-by: explosion-bot <explosion-bot@users.noreply.github.com>
2022-09-16 11:50:19 +02:00
Sofie Van Landeghem
df0b815c23
more explicit Example constructor example (#11489)
* make constructor example for Example more explicit

* shorten example and add spaces
2022-09-16 09:26:33 +02:00
Sofie Van Landeghem
0509f90874
add dot (#11500) 2022-09-15 17:29:42 +02:00
Sofie Van Landeghem
ca1ad67458
disable mypy run for Python 3.10 (#11508) 2022-09-15 15:51:19 +02:00
Adriane Boyd
7c98245c0c
Add levenshtein from polyleven (#11418)
Add a simple levenshtein distance function using the implementation from
the polyleven library as `spacy.matcher.levenshtein`.
2022-09-14 17:05:22 +02:00
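
For readers unfamiliar with the metric added in 7c98245c0c, here is a plain-Python sketch of Levenshtein distance. It is not the polyleven implementation that the commit wraps; the function name `edit_distance` is chosen here only to avoid confusion with `spacy.matcher.levenshtein`.

```python
def edit_distance(a: str, b: str) -> int:
    """Levenshtein distance via dynamic programming over two rows."""
    # Keep the shorter string in the inner loop to save memory.
    if len(a) < len(b):
        a, b = b, a
    previous = list(range(len(b) + 1))
    for i, char_a in enumerate(a, start=1):
        current = [i]
        for j, char_b in enumerate(b, start=1):
            cost = 0 if char_a == char_b else 1
            current.append(
                min(
                    previous[j] + 1,         # deletion
                    current[j - 1] + 1,      # insertion
                    previous[j - 1] + cost,  # substitution
                )
            )
        previous = current
    return previous[-1]
```

For example, turning "kitten" into "sitting" takes two substitutions and one insertion, so the distance is 3.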
Richard Hudson
3f0c3ad7d3
Correct alignment example and documentation (#11491)
* Correct example and documentation

* Added altered example.md

* Changes based on review + apply prettier

* Remove unnecessary 'the'

Co-authored-by: Madeesh Kannan <shadeMe@users.noreply.github.com>

2022-09-14 09:36:55 +02:00
Adriane Boyd
6be6913ba5
Update cupy extras (#11279)
* Update cupy extras:

* Extend to v11
* Add `cupy-cuda11x` and `cupy-wheel`
* Update quickstart to use `cupy-wheel` for CUDA 10.2+

* Rename cuda-wheel to cuda-autodetect, remove repeated CUDA in menu
2022-09-13 09:04:53 +02:00
Madeesh Kannan
0ec9a696e6
Fix config validation failures caused by NVTX pipeline wrappers (#11460)
* Enable Cython<->Python bindings for `Pipe` and `TrainablePipe` methods

* `pipes_with_nvtx_range`: Skip hooking methods whose signature cannot be ascertained

When loading pipelines from a config file, the arguments passed to individual pipeline components are validated by `pydantic` during init. For this, the validation model attempts to parse the function signature of the component's constructor/entry point so that it can check if all mandatory parameters are present in the config file.

When using `models_and_pipes_with_nvtx_range` as an `after_pipeline_creation` callback, the methods of all pipeline components get replaced by an NVTX range wrapper **before** the above-mentioned validation takes place. This can be problematic for components that are implemented as Cython extension types - if the extension type is not compiled with Python bindings for its methods, they will have no signatures at runtime. This resulted in `pydantic` matching the *wrapper's* parameters with those in the config and raising errors.

To avoid this, we now skip applying the wrapper to any (Cython) methods that do not have signatures.
2022-09-12 14:55:41 +02:00
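
The fix described in 0ec9a696e6 can be illustrated with a small sketch: wrap a method only when its signature can be inspected. Both helper names are hypothetical, and `with_range` is a stand-in that merely forwards the call rather than opening a real NVTX range.

```python
import functools
import inspect
from typing import Any, Callable


def with_range(func: Callable) -> Callable:
    """Stand-in for an NVTX range wrapper; it only forwards the call."""

    @functools.wraps(func)
    def wrapped(*args: Any, **kwargs: Any) -> Any:
        return func(*args, **kwargs)

    return wrapped


def wrap_if_inspectable(func: Any, make_wrapper: Callable) -> Any:
    """Apply `make_wrapper` only if `func` exposes a signature."""
    try:
        inspect.signature(func)
    except (TypeError, ValueError):
        # No signature available at runtime: leave the object alone so
        # later validation never sees the wrapper's parameters instead.
        return func
    return make_wrapper(func)
```

`inspect.signature` raises `ValueError` when no signature can be determined and `TypeError` for unsupported objects, so both are treated as "skip this one".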
Paul O'Leary McCann
e2265f0864 Handle case in DVC where no commands are appropriate
If you only have commands with no deps or outputs (admittedly unlikely), you
get a weird error about the dvc file not existing. This gives explicit
output instead.
2022-09-12 18:24:55 +09:00
Paul O'Leary McCann
b5a0518b06 Update project_run docstring
The docstring indicated that normal use of the function called DVC to
determine if the function should be re-run, but that is not the case.
2022-09-12 17:47:57 +09:00
Paul O'Leary McCann
e75c3f56c6 Add dvc quiet flag to docs 2022-09-12 15:24:57 +09:00
Paul O'Leary McCann
0989ccdc8a Add test for run_commands 2022-09-12 15:22:43 +09:00
Paul O'Leary McCann
e13e11e1d8 Fix flag handling in dvc
Prior to this commit, if a flag (--verbose or --quiet) was passed to
DVC, it would be added to the end of the generated dvc command line.
As a result, the flag was interpreted as part of the actual command to
run, rather than as an argument to dvc, producing command lines like:

    spacy project run preprocess --verbose

That would fail with an error that there's no such directory as
`--verbose`.

This change puts the flags at the front of the dvc command so that they
are interpreted correctly. It removes the `run_dvc_commands` function,
which had been reduced to just a for loop and wasn't used elsewhere.

A separate problem is that there's no way to specify the quiet behaviour
to dvc from the command line, though it's unclear if that's a bug.
2022-09-12 15:06:09 +09:00
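
The ordering problem fixed in e13e11e1d8 can be shown with a short sketch. It assumes a `dvc run`-style invocation where the wrapped project command comes last; the helper `build_dvc_command` and its parameters are hypothetical and do not mirror spaCy's actual function.

```python
from typing import List, Sequence


def build_dvc_command(
    dvc_args: Sequence[str],
    project_cmd: Sequence[str],
    flags: Sequence[str] = (),
) -> List[str]:
    """Assemble a dvc command line with flags placed up front.

    Everything after the dvc options is treated as the command to run,
    so flags meant for dvc must come before it, not after.
    """
    return ["dvc", "run", *flags, *dvc_args, *project_cmd]


cmd = build_dvc_command(
    ["-n", "preprocess"],
    ["python", "-m", "spacy", "project", "run", "preprocess"],
    flags=["--verbose"],
)
```

Appending `flags` after `project_cmd` instead would reproduce the bug, since `--verbose` would become the last argument of the wrapped command.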
Paul O'Leary McCann
0b6f0643eb Fix dvc 2022-09-12 14:54:28 +09:00
kadarakos
6b83fee58d
Assets message (#11458)
* new error message when 'project run assets'

* new error message when 'project run assets'

* Update spacy/cli/project/run.py

Co-authored-by: Sofie Van Landeghem <svlandeg@users.noreply.github.com>

2022-09-09 17:17:10 +02:00
Adriane Boyd
8a86a35eab
Remove has_letters in config template (#11465)
Due to problems with the javascript conversion in the website
quickstart, remove the `has_letters` setting to simplify generating
`attrs` for the default `tok2vec`.

Additionally reduce `PREFIX` as in the trained pipelines.
2022-09-09 15:10:04 +02:00
github-actions[bot]
0c72c6bb2c
Auto-format code with black (#11468)
Co-authored-by: explosion-bot <explosion-bot@users.noreply.github.com>
2022-09-09 11:21:17 +02:00
Madeesh Kannan
aac9a58c29
Add docs for the spacy.models_and_pipes_with_nvtx_range.v1 callback (#11463)
* Add docs for the `spacy.models_and_pipes_with_nvtx_range.v1` callback

* Add `new` tag
2022-09-09 10:46:01 +02:00
Paul O'Leary McCann
2602a30d32
Fix DVC command example (#11457)
This command doesn't have the project dir, but it's required.
2022-09-08 13:42:47 +02:00
Paul O'Leary McCann
df53e964d2 Fix mypy 2022-09-08 15:09:10 +09:00
Paul O'Leary McCann
b29c0c6083 Clean up run_command
This cleans up run_command to separate Windows and non-Windows argument
prep. It still needs to be verified on Windows.

One other change is to the structure of E970. E970 assumed that if a
command was not found we would always have the command name, but that is
not true on Windows if the input command is a string, which we can't
split reliably.
2022-09-08 14:31:28 +09:00
Paul O'Leary McCann
65ad661914 Remove join_command
Also fixes issue with run_commands on Windows.
2022-09-08 13:14:05 +09:00
Paul O'Leary McCann
515d5c65d5
Add dev docs on satellite packages (#11435)
* Add dev docs on satellite packages

* Apply suggestions from code review

Co-authored-by: Sofie Van Landeghem <svlandeg@users.noreply.github.com>

* Add displacy link

Co-authored-by: Sofie Van Landeghem <svlandeg@users.noreply.github.com>
2022-09-07 15:24:22 +02:00
Paul O'Leary McCann
ff0522f8da Fix asent pip package name 2022-09-06 19:19:05 +09:00
Paul O'Leary McCann
977dc33312
Add a way to get the URL to download a pipeline to the CLI (#11175)
* Add a dry run flag to download

* Remove --dry-run, add --url option to `spacy info` instead

* Make mypy happy

* Print only the URL, so it's easier to use in scripts

* Don't add the egg hash unless downloading an sdist

* Update spacy/cli/info.py

Co-authored-by: Sofie Van Landeghem <svlandeg@users.noreply.github.com>

* Add two implementations of requirements

* Clean up requirements sample slightly

This should make mypy happy

* Update URL help string

* Remove requirements option

* Add url option to docs

* Add URL to spacy info model output, when available

* Add types-setuptools to testing reqs

* Add types-setuptools to requirements

* Add "compatible", expand docstring

* Update spacy/cli/info.py

Co-authored-by: Adriane Boyd <adrianeboyd@gmail.com>

* Run prettier on CLI docs

* Update docs

Add a sidebar about finding download URLs, with some examples of the new
command.

* Add download URLs to table on model page

* Apply suggestions from code review

Co-authored-by: Adriane Boyd <adrianeboyd@gmail.com>

* Updates from review

* download url -> download link

* Update docs

Co-authored-by: Sofie Van Landeghem <svlandeg@users.noreply.github.com>
Co-authored-by: Adriane Boyd <adrianeboyd@gmail.com>
2022-09-02 11:58:21 +02:00
github-actions[bot]
71884d0942
Auto-format code with black (#11427)
Co-authored-by: explosion-bot <explosion-bot@users.noreply.github.com>
2022-09-02 11:43:20 +02:00
Madeesh Kannan
d1760ebe02
Better handling of unexpected types in SetPredicate (#11312)
* `Matcher`: Better type checking of values in `SetPredicate`
`SetPredicate`: Emit warning and return `False` on unexpected value types

* Rename `value_type_mismatch` variable

* Inline warning

* Remove unexpected type warning from `_SetPredicate`

* Ensure that `str` values are not interpreted as sequences
Check elements of sequence values for convertibility to `str` or `int`

* Add more `INTERSECT` and `IN` test cases

* Test for inputs with multiple characters

* Return `False` early instead of using a boolean flag

* Remove superfluous `int` check, parentheses

* Apply suggestions from code review

Co-authored-by: Adriane Boyd <adrianeboyd@gmail.com>

* Apply suggestions from code review

* Clarify test comment

Co-authored-by: Adriane Boyd <adrianeboyd@gmail.com>
2022-09-02 09:09:48 +02:00
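
One point from d1760ebe02, that `str` values must not be treated as sequences of characters, can be sketched as below. The function `as_value_set` is hypothetical and is not the matcher's `_SetPredicate` code; it only illustrates the type checks and the early return on unexpected types.

```python
from typing import Any, Optional, Set, Union


def as_value_set(value: Any) -> Optional[Set[Union[str, int]]]:
    """Return the items in `value` as a set, or None for unexpected types."""
    # A str is iterable, but here it counts as one value, not as a
    # sequence of single characters.
    if isinstance(value, (str, int)):
        return {value}
    if not isinstance(value, (list, tuple, set, frozenset)):
        return None
    # Every element must itself be a str or an int.
    if not all(isinstance(item, (str, int)) for item in value):
        return None
    return set(value)
```

A predicate built on top of this could return `False` as soon as `None` comes back, instead of carrying a boolean flag through the rest of the comparison.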
Paul O'Leary McCann
0f1d97d87d Formatting 2022-09-02 14:44:16 +09:00
Paul O'Leary McCann
0a0b9714c0 Ignore type of shlex.join
Mypy throws an error for any reference to a module member that doesn't
exist, even with a clear hasattr check like in this code.

https://github.com/python/mypy/issues/5059
2022-09-02 14:42:57 +09:00
Paul O'Leary McCann
5cfd601697 shlex.join is only available in 3.8+, so only use when present
Fallback is former behaviour of joining with a space.
2022-09-02 14:06:13 +09:00
Paul O'Leary McCann
21a4182b41 Import shlex 2022-09-02 13:37:49 +09:00
Paul O'Leary McCann
a53735f018 Use shlex.join in run_command
Previously the command string was built by joining with a space, which
wouldn't work for anything requiring quotes. The command string built
this way is only used for debugging purposes, but shlex.join will
always be valid on Linux, and will usually be valid on Windows. It's
unclear how to do better on Windows.
2022-09-02 13:33:30 +09:00
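
A minimal sketch of the behaviour described in a53735f018 and the later fallback in 5cfd601697: use `shlex.join` when it exists (Python 3.8+), otherwise join with a space. The function name `join_command` is used here for illustration only.

```python
import shlex
from typing import List


def join_command(parts: List[str]) -> str:
    """Build a display string for a command list."""
    if hasattr(shlex, "join"):
        # Quotes each argument so the string is valid in a POSIX shell.
        return shlex.join(parts)
    # Older Pythons: plain joining, which loses quoting information.
    return " ".join(parts)


joined = join_command(["echo", "hello world"])
```

On Python 3.8 or later, `shlex.split(joined)` gives back the original two-element list, which a plain space join would not guarantee.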
Paul O'Leary McCann
2fe360d7f3 Change command used when running on Windows
Before this commit the string version of a command was used when running
on Windows. This changes it so the input - string or list of strings -
is passed directly.

In the case the input was a string, nothing changes.

In the case the input was a list, Python will internally build a command
string such that each element of the list is quoted correctly for the
Windows exec call. We can't call that method directly because it's
considered an implementation detail, see below.

https://bugs.python.org/issue10838

An example of when this behavior is correct is if we build a command
line list and sys.executable needs to be quoted because it contains
spaces or something. Before this change the arguments would just be
joined with a space, which would not work.
2022-09-02 13:23:36 +09:00
Paul O'Leary McCann
edfa32da47 Add comment about shlex use on Windows 2022-09-02 13:23:18 +09:00
Paul O'Leary McCann
80c276c5f7 Remove split_command and handle Windows commands
The main change here is the preprocessing of commands run in projects,
before being passed to the actual command running function.

Previously commands were split and the first argument was checked, and
if it was python or pip it was rewritten to use sys.executable. This
change makes it so that preprocessing is not done on Windows, because
there is no reliable way to split the string the same way the Windows
interpreter would.

It is possible to handle a limited set of commands in Windows, like
literal "python" declarations, even without full shell parsing, but then
quoting related to sys.executable has to be handled.
2022-09-02 13:13:40 +09:00
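
The preprocessing that 80c276c5f7 describes for non-Windows platforms might be sketched like this. It assumes POSIX-style splitting with `shlex.split`; the function name `rewrite_python_command` is hypothetical, and the `-m pip` form for pip is an assumption about how "use sys.executable" would be applied.

```python
import shlex
import sys
from typing import List


def rewrite_python_command(command: str) -> List[str]:
    """Split a command and point python/pip at the current interpreter."""
    parts = shlex.split(command)
    if parts and parts[0] == "python":
        parts[0] = sys.executable
    elif parts and parts[0] == "pip":
        # Assumed form: run pip as a module of the same interpreter.
        parts = [sys.executable, "-m", "pip", *parts[1:]]
    return parts
```

Because `shlex.split` follows POSIX shell rules, the same function cannot be trusted on Windows, which is why the commit skips this step there.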
Paul O'Leary McCann
5058607896 Remove use of split_command in dvc
In order to be safe on Windows, this uses string concatenation instead
of relying on shlex.split to create a command list.
2022-09-02 12:51:26 +09:00
Adriane Boyd
78f5503a29
Check for any non-Doc returned value for components (#11424) 2022-09-01 19:37:23 +02:00
Sofie Van Landeghem
8fc0efc502
Allow string argument for disable/enable/exclude (#11406)
* adding unit test for spacy.load with disable/exclude string arg

* allow pure strings in from_config

* update docs

* upstream type adjustments

* docs update

* make docstring more consistent

* Update spacy/language.py

Co-authored-by: Adriane Boyd <adrianeboyd@gmail.com>

* two more cleanups

* fix type in internal method

Co-authored-by: Adriane Boyd <adrianeboyd@gmail.com>
2022-08-31 09:02:34 +02:00
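
Accepting either a string or a collection of names, as in 8fc0efc502, usually comes down to a small normalisation step. The helper below is a hypothetical sketch, not the code in `spacy/language.py`.

```python
from typing import Iterable, List, Union


def as_name_list(names: Union[str, Iterable[str]]) -> List[str]:
    """Normalise a single name or an iterable of names to a list."""
    if isinstance(names, str):
        # Without this check, list("ner") would yield ["n", "e", "r"].
        return [names]
    return list(names)
```

With such a step in place, passing `disable="ner"` and `disable=["ner"]` would be handled identically downstream.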
Daniël de Kok
3f4b4b7b4f
Fix test_{prefer,require}_gpu (#11390)
* Fix `test_{prefer,require}_gpu`

These tests assumed that GPUs are only supported with CuPy, but since Thinc 8.1
we also support Metal Performance Shaders.

* test_misc: arrange thinc imports to be together
2022-08-30 14:21:02 +02:00