* Add a dry run flag to download
* Remove --dry-run, add --url option to `spacy info` instead
* Make mypy happy
* Print only the URL, so it's easier to use in scripts
* Don't add the egg hash unless downloading an sdist
* Update spacy/cli/info.py
Co-authored-by: Sofie Van Landeghem <svlandeg@users.noreply.github.com>
* Add two implementations of requirements
* Clean up requirements sample slightly
This should make mypy happy
* Update URL help string
* Remove requirements option
* Add url option to docs
* Add URL to spacy info model output, when available
* Add types-setuptools to testing reqs
* Add types-setuptools to requirements
* Add "compatible", expand docstring
* Update spacy/cli/info.py
Co-authored-by: Adriane Boyd <adrianeboyd@gmail.com>
* Run prettier on CLI docs
* Update docs
Add a sidebar about finding download URLs, with some examples of the new
command.
* Add download URLs to table on model page
* Apply suggestions from code review
Co-authored-by: Adriane Boyd <adrianeboyd@gmail.com>
* Updates from review
* download url -> download link
* Update docs
Co-authored-by: Sofie Van Landeghem <svlandeg@users.noreply.github.com>
Co-authored-by: Adriane Boyd <adrianeboyd@gmail.com>
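A hypothetical sketch of the URL logic behind "print only the URL" and "don't add the egg hash unless downloading an sdist". It assumes the GitHub release base URL and the `name-version/name-version.suffix` filename layout. `build_download_url` is an illustrative name, not spaCy's API. The CLI usage it mirrors would be something like `spacy info en_core_web_sm --url`.

```python
# Hypothetical sketch: BASE_URL, the filename layout and build_download_url
# are assumptions for illustration, not spaCy's actual implementation.
BASE_URL = "https://github.com/explosion/spacy-models/releases/download"


def build_download_url(base_url: str, name: str, version: str, sdist: bool = False) -> str:
    """Return the archive URL for name==version; only sdists get an #egg= fragment."""
    suffix = ".tar.gz" if sdist else "-py3-none-any.whl"
    url = f"{base_url}/{name}-{version}/{name}-{version}{suffix}"
    if sdist:
        # The #egg= fragment tells pip which project (and version) a bare URL provides
        url += f"#egg={name}=={version}"
    return url


# Print only the URL, so the output is easy to consume from scripts
print(build_download_url(BASE_URL, "en_core_web_sm", "3.4.0"))
```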
* `Matcher`: Better type checking of values in `SetPredicate`
`SetPredicate`: Emit warning and return `False` on unexpected value types
* Rename `value_type_mismatch` variable
* Inline warning
* Remove unexpected type warning from `_SetPredicate`
* Ensure that `str` values are not interpreted as sequences
Check elements of sequence values for convertibility to `str` or `int`
* Add more `INTERSECT` and `IN` test cases
* Test for inputs with multiple characters
* Return `False` early instead of using a boolean flag
* Remove superfluous `int` check, parentheses
* Apply suggestions from code review
Co-authored-by: Adriane Boyd <adrianeboyd@gmail.com>
* Apply suggestions from code review
* Clarify test comment
Co-authored-by: Adriane Boyd <adrianeboyd@gmail.com>
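A simplified, hypothetical sketch of the `SetPredicate` value checks described above. `set_predicate` is an illustrative stand-in, not spaCy's class. A `str` is compared as one value rather than iterated as characters. Sequence elements must be `str` or `int`. Unexpected types return `False` early.

```python
from typing import Any, Set, Union

Value = Union[str, int]


def set_predicate(value: Any, pattern: Set[Value], op: str) -> bool:
    """Hypothetical stand-in for Matcher's IN / INTERSECTS checks."""
    if op == "IN":
        # IN compares a single scalar; a whole str counts as one value
        return isinstance(value, (str, int)) and value in pattern
    if op == "INTERSECTS":
        if isinstance(value, (str, int)):
            # Never iterate over the characters of a str
            values = {value}
        elif isinstance(value, (list, tuple, set)):
            # Return False early if any element is not a str or int
            if not all(isinstance(v, (str, int)) for v in value):
                return False
            values = set(value)
        else:
            # Unexpected value type
            return False
        return bool(values & pattern)
    raise ValueError(f"Unsupported operator: {op}")


print(set_predicate("ab", {"a", "b"}, "INTERSECTS"))  # False: "ab" is one value
print(set_predicate(["a", "x"], {"a"}, "INTERSECTS"))  # True
```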
Mypy throws an error for any reference to a module member that doesn't
exist, even with a clear hasattr check like in this code.
https://github.com/python/mypy/issues/5059
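One common workaround for the mypy behaviour above is to fetch the member with `getattr` after the `hasattr` check. This is a sketch, assuming the type check targets a Python version whose stubs lack `shlex.join` (added in Python 3.8). `join_command` is an illustrative name.

```python
import shlex
from typing import Callable, List


def join_command(command: List[str]) -> str:
    """Join a command for display, preferring shlex.join when it exists."""
    if hasattr(shlex, "join"):
        # Writing shlex.join directly can still make mypy report a missing
        # module attribute (python/mypy#5059); getattr sidesteps that check.
        join: Callable[[List[str]], str] = getattr(shlex, "join")
        return join(command)
    # Fallback for Pythons without shlex.join
    return " ".join(shlex.quote(part) for part in command)


print(join_command(["echo", "a b"]))  # echo 'a b'
```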
Previously the command string was built by joining with a space, which
wouldn't work for anything requiring quotes. The command string built
this way is only used for debugging purposes, but shlex.join will
always be valid on Linux, and will usually be valid on Windows. It's
unclear how to do better on Windows.
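A short illustration of why joining with a space is not enough. The command list is a made-up example. Quoting each element, which is equivalent to `shlex.join`, round-trips through `shlex.split`. The naive join does not.

```python
import shlex
from typing import List

command: List[str] = ["python", "-c", "print('hello world')"]

naive = " ".join(command)
quoted = " ".join(shlex.quote(part) for part in command)  # same as shlex.join

# Splitting the naive string loses the inner quotes and changes the argument
print(shlex.split(naive) == command)  # False
print(shlex.split(quoted) == command)  # True
```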
Before this commit the string version of a command was used when running
on Windows. This changes it so the input - string or list of strings -
is passed directly.
In the case the input was a string, nothing changes.
In the case the input was a list, Python will internally build a command
string such that each element of the list is quoted correctly for
the Windows exec call. We can't call that method directly because it's
considered an implementation detail, see below.
https://bugs.python.org/issue10838
An example of when this behavior is correct is if we build a command
line list and sys.executable needs to be quoted because it contains
spaces, for example. Before this change the arguments would just be
joined with a space, which would not work.
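A hypothetical sketch of passing the input through unchanged. `run_command` is illustrative, not spaCy's function. The POSIX string-splitting branch is an assumption added so that the helper also works off Windows. With a list, Python quotes each element itself, so an argument containing spaces arrives intact.

```python
import os
import shlex
import subprocess
import sys
from typing import List, Union


def run_command(command: Union[str, List[str]]) -> "subprocess.CompletedProcess[str]":
    """Hypothetical helper: pass a list (or, on Windows, a string) straight through."""
    if isinstance(command, str) and os.name != "nt":
        # Without a shell, POSIX would treat the whole string as the program
        # name, so split it first (assumed behaviour for non-Windows)
        command = shlex.split(command)
    # With a list, subprocess builds the Windows command line itself,
    # quoting each element, so paths with spaces stay single arguments
    return subprocess.run(command, capture_output=True, text=True, check=True)


result = run_command(
    [sys.executable, "-c", "import sys; print(sys.argv[1])", "an argument with spaces"]
)
print(result.stdout.strip())  # an argument with spaces
```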
The main change here is the preprocessing of commands run in projects,
before being passed to the actual command running function.
Previously commands were split and the first argument was checked, and
if it was python or pip it was rewritten to use sys.executable. This
change makes it so that preprocessing is not done on Windows, because
there is no reliable way to split the string the same way the Windows
interpreter would.
It is possible to handle a limited set of commands in Windows, like
literal "python" declarations, even without full shell parsing, but then
quoting related to sys.executable has to be handled.
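A hypothetical sketch of the preprocessing step as described: split the command, and point a leading `python` or `pip` at `sys.executable`. Windows is skipped entirely. The function name and the exact set of rewritten words (`python3`, `pip3`) are assumptions.

```python
import os
import shlex
import sys
from typing import List, Union


def preprocess_command(command: str) -> Union[str, List[str]]:
    """Hypothetical sketch of rewriting project commands before running them."""
    if os.name == "nt":
        # No reliable way to split the way the Windows shell would: pass through
        return command
    args = shlex.split(command)
    if args and args[0] in ("python", "python3"):
        # Run with the current interpreter
        args[0] = sys.executable
    elif args and args[0] in ("pip", "pip3"):
        # Use the pip belonging to the current interpreter
        args = [sys.executable, "-m", "pip"] + args[1:]
    return args
```

On Linux or macOS, `preprocess_command("pip install x")` would give `[sys.executable, "-m", "pip", "install", "x"]`; on Windows the string comes back untouched.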
* adding unit test for spacy.load with disable/exclude string arg
* allow pure strings in from_config
* update docs
* upstream type adjustments
* docs update
* make docstring more consistent
* Update spacy/language.py
Co-authored-by: Adriane Boyd <adrianeboyd@gmail.com>
* two more cleanups
* fix type in internal method
Co-authored-by: Adriane Boyd <adrianeboyd@gmail.com>
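A minimal sketch of the normalization that lets `disable`/`exclude` accept a pure string, so that a call such as `spacy.load("en_core_web_sm", disable="parser")` is treated like `disable=["parser"]`. `normalize_component_names` is a hypothetical helper, not spaCy's internal name.

```python
from typing import Iterable, List, Union


def normalize_component_names(names: Union[str, Iterable[str]]) -> List[str]:
    """Hypothetical helper: accept one name as a str or several as an iterable."""
    if isinstance(names, str):
        # A bare str is one component name; iterating it would yield characters
        return [names]
    return list(names)


print(normalize_component_names("parser"))  # ['parser']
print(normalize_component_names(["parser", "ner"]))  # ['parser', 'ner']
```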
* Fix `test_{prefer,require}_gpu`
These tests assumed that GPUs are only supported with CuPy, but since Thinc 8.1
we also support Metal Performance Shaders.
* test_misc: arrange thinc imports to be together
* Add lang folder for la (Latin)
* Add Latin lang classes
* Add minimal tokenizer exceptions
* Add minimal stopwords
* Add minimal lex_attrs
* Update stopwords, tokenizer exceptions
* Add la tests; register la_tokenizer in conftest.py
* Update spacy/lang/la/lex_attrs.py
Remove duplicate form in Latin lex_attrs
Co-authored-by: Sofie Van Landeghem <svlandeg@users.noreply.github.com>
* Update natto-py version spec (#11222)
* Update natto-py version spec
* Update setup.cfg
Co-authored-by: Adriane Boyd <adrianeboyd@gmail.com>
Co-authored-by: Adriane Boyd <adrianeboyd@gmail.com>
* Add scorer to textcat API docs config settings (#11263)
* Update docs for pipeline initialize() methods (#11221)
* Update documentation for dependency parser
* Update documentation for trainable_lemmatizer
* Update documentation for entity_linker
* Update documentation for ner
* Update documentation for morphologizer
* Update documentation for senter
* Update documentation for spancat
* Update documentation for tagger
* Update documentation for textcat
* Update documentation for tok2vec
* Run prettier on edited files
* Apply similar changes in transformer docs
* Remove need to say annotated example explicitly
I removed the need to say "Must contain at least one annotated Example"
because it's often a given that Examples will contain some gold-standard
annotation.
* Run prettier on transformer docs
* chore: add 'concepCy' to spacy universe (#11255)
* chore: add 'concepCy' to spacy universe
* docs: add 'slogan' to concepCy
* Support full prerelease versions in the compat table (#11228)
* Support full prerelease versions in the compat table
* Fix types
* adding spans to doc_annotation in Example.to_dict (#11261)
* adding spans to doc_annotation in Example.to_dict
* to_dict compatible with from_dict: tuples instead of spans
* use strings for label and kb_id
* Simplify test
* Update data formats docs
Co-authored-by: Stefanie Wolf <stefanie.wolf@vitecsoftware.com>
Co-authored-by: Adriane Boyd <adrianeboyd@gmail.com>
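A hypothetical sketch of serializing span groups as plain tuples with string labels and KB IDs, so that the result can be read back the way `from_dict` expects. `SimpleSpan` stands in for spaCy's `Span`. The `(start_char, end_char, label, kb_id)` order is an assumption based on the description above.

```python
from dataclasses import dataclass
from typing import Dict, List, Tuple


@dataclass
class SimpleSpan:
    """Hypothetical stand-in for spaCy's Span with string label and KB ID."""
    start_char: int
    end_char: int
    label_: str
    kb_id_: str = ""


def spans_to_dict(
    span_groups: Dict[str, List[SimpleSpan]]
) -> Dict[str, List[Tuple[int, int, str, str]]]:
    # Tuples instead of Span objects; strings instead of hashes for label/kb_id
    return {
        key: [(s.start_char, s.end_char, s.label_, s.kb_id_) for s in spans]
        for key, spans in span_groups.items()
    }


print(spans_to_dict({"sc": [SimpleSpan(0, 5, "ORG")]}))  # {'sc': [(0, 5, 'ORG', '')]}
```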
* Fix regex invalid escape sequences (#11276)
* Add W605 to the errors raised by flake8 in the CI (#11283)
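For context on W605: a backslash sequence such as `"\d"` in an ordinary string literal is an invalid escape sequence. A raw string passes the backslash through to the regex engine unchanged. The pattern below is a made-up example.

```python
import re

# r"\d" is the same two characters as "\\d": a backslash followed by d
pattern = re.compile(r"\d+\.\d+")
print(pattern.findall("version 3.4 and 2.10"))  # ['3.4', '2.10']
```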
* Clean up automated label-based issue handling (#11284)
* Clean up automated label-based issue handling
1. upgrade tiangolo/issue-manager to latest
2. move needs-more-info to tiangolo
3. change needs-more-info close time to 7 days
4. delete old needs-more-info config
* Use old, longer message
* Fix label name
* Fix Dutch noun chunks to skip overlapping spans (#11275)
* Add test for overlapping noun chunks
* Skip overlapping noun chunks
* Update spacy/tests/lang/nl/test_noun_chunks.py
Co-authored-by: Sofie Van Landeghem <svlandeg@users.noreply.github.com>
Co-authored-by: Sofie Van Landeghem <svlandeg@users.noreply.github.com>
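A minimal sketch of skipping overlapping noun chunks by tracking where the last accepted chunk ended. It assumes that candidates arrive sorted by start and that spans are `(start, end)` token offsets with an exclusive end. `skip_overlapping` is a hypothetical name.

```python
from typing import Iterable, Iterator, Tuple


def skip_overlapping(spans: Iterable[Tuple[int, int]]) -> Iterator[Tuple[int, int]]:
    """Yield spans in order, dropping any that overlap an earlier yielded span."""
    prev_end = 0
    for start, end in spans:
        if start < prev_end:
            # Starts inside the previous chunk: skip it
            continue
        yield start, end
        prev_end = end


print(list(skip_overlapping([(0, 3), (2, 4), (3, 5)])))  # [(0, 3), (3, 5)]
```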
* Docs: displaCy documentation - data types, `parse_{deps,ents,spans}`, spans example (#10950)
* add in spans example and parse references
* rm autoformatter
* rm extra ents copy
* TypedDict draft
* type fixes
* restore non-documentation files
* docs update
* fix spans example
* fix hyperlinks
* add parse example
* example fix + argument fix
* fix api arg in docs
* fix bad variable replacement
* fix spacing in style
Co-authored-by: Sofie Van Landeghem <svlandeg@users.noreply.github.com>
* fix spacing on table
* fix spacing on table
* rm temp files
Co-authored-by: Sofie Van Landeghem <svlandeg@users.noreply.github.com>
* include span_ruler for default warning filter (#11333)
* Add uk pipelines to website (#11332)
* Check for . in factory names (#11336)
* Make fixes for PR #11349
* Fix roman numeral coverage in #11349
Co-authored-by: Patrick J. Burns <patricks@diyclassics.org>
Co-authored-by: Sofie Van Landeghem <svlandeg@users.noreply.github.com>
Co-authored-by: Paul O'Leary McCann <polm@dampfkraft.com>
Co-authored-by: Adriane Boyd <adrianeboyd@gmail.com>
Co-authored-by: Lj Miranda <12949683+ljvmiranda921@users.noreply.github.com>
Co-authored-by: Jules Belveze <32683010+JulesBelveze@users.noreply.github.com>
Co-authored-by: stefawolf <wlf.ste@gmail.com>
Co-authored-by: Stefanie Wolf <stefanie.wolf@vitecsoftware.com>
Co-authored-by: Peter Baumgartner <5107405+pmbaumgartner@users.noreply.github.com>
* Fix lookup usage (fix #11347)
Before using the lookups table in the French (and Catalan) lemmatizers,
there's a check to see if the current term is in the table. But it's
checking a string against hashes, so it's always false. Also the table
lookup function is designed so you don't have to do that anyway.
* Use the lookup table directly
* Use string, not token
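A hypothetical illustration of the bug above. If a table stores its keys as integer hashes, testing a plain string against those keys is always false. A lookup that hashes the key itself, with the word as fallback, needs no membership check. `HashKeyedTable` is illustrative, and Python's `hash()` stands in for spaCy's string hashing.

```python
from typing import Optional, Union


class HashKeyedTable(dict):
    """Hypothetical stand-in for a lookups table whose keys are stored as hashes."""

    @staticmethod
    def _key(key: Union[str, int]) -> int:
        # hash() stands in for spaCy's 64-bit string hash here
        return hash(key) if isinstance(key, str) else key

    def set(self, key: Union[str, int], value: str) -> None:
        self[self._key(key)] = value

    def lookup(self, key: Union[str, int], default: Optional[str] = None) -> Optional[str]:
        return dict.get(self, self._key(key), default)


table = HashKeyedTable()
table.set("chevaux", "cheval")
word = "chevaux"
print(word in table.keys())  # False: a str never equals an int hash
print(table.lookup(word, word))  # cheval
print(table.lookup("chats", "chats"))  # chats (falls back to the word)
```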
* Add token and span custom attributes to to_json()
* Change logic for to_json
* Add functionality to from_json
* Small adjustments
* Move token/span attributes to new dict key
* Fix test
* Fix the same test but much better
* Add backwards compatibility tests and adjust logic
* Add test to check if attributes not set in underscore are not saved in the json
* Add tests for json compatibility
* Adjust test names
* Fix tests and clean up code
* Fix assert json tests
* small adjustment
* adjust naming and code readability
* Adjust naming, added more tests and changed logic
* Fix typo
* Adjust errors, naming, and small test optimization
* Fix byte tests
* Fix bytes tests
* Change naming and json structure
* update schema
* Update spacy/schemas.py
Co-authored-by: Adriane Boyd <adrianeboyd@gmail.com>
* Update spacy/tokens/doc.pyx
Co-authored-by: Adriane Boyd <adrianeboyd@gmail.com>
* Update spacy/tokens/doc.pyx
Co-authored-by: Adriane Boyd <adrianeboyd@gmail.com>
* Update spacy/schemas.py
Co-authored-by: Adriane Boyd <adrianeboyd@gmail.com>
* Update schema for underscore attributes
* Adjust underscore schema
* adjust schema tests
Co-authored-by: Adriane Boyd <adrianeboyd@gmail.com>
* Added examples for Slovene
* Update spacy/lang/sl/examples.py
Co-authored-by: Adriane Boyd <adrianeboyd@gmail.com>
* Corrected a typo in one of the sentences
* Updated support for Slovenian
* Some minor changes to corrections
* Added forint currency
* Corrected HYPHENS_PERMITTED regex and some formatting
* Minor changes
* Un-xfail tokenizer test
* Format
Co-authored-by: Luka Dragar <D20124481@mytudublin.ie>
Co-authored-by: Adriane Boyd <adrianeboyd@gmail.com>