Commit Graph

15833 Commits

Author SHA1 Message Date
Sofie Van Landeghem
29649589fc
remove dtype (#11615) 2022-10-11 15:25:05 +02:00
Sofie Van Landeghem
ef74f8f5e4
Fix mypy error in edittree lemmatizer (#11612)
* cleanup imports

* try limiting Thinc to previous release

* remove Model specification

* fix code and revert Thinc constraint
2022-10-11 14:15:22 +02:00
Adriane Boyd
8cd77dd54c
Sync flake8 version across requirements (#11580) 2022-10-04 11:23:04 +02:00
Sofie Van Landeghem
b187076a2d
fix docs (#11573) 2022-10-03 17:01:04 +02:00
Sofie Van Landeghem
3033babe98
Merge pull request #11571 from svlandeg/copy_develop
update develop with latest from master, incl CI fix
2022-10-03 14:05:51 +02:00
svlandeg
83425d4f6f Merge branch 'copy_master' into copy_develop 2022-10-03 13:06:31 +02:00
Sofie Van Landeghem
70e21dfcad
PR to test importlib-metadata (#11569)
* empty commit

* restrict importlib-metadata to lower than 5.0.0

* restrict importlib-metadata also for validate CI step

* set fixed version for CI

* try flake8 5.0.4 in CI validation step

* remove importlib-metadata from requirements again
2022-10-03 13:04:03 +02:00
Paul O'Leary McCann
087cc74c6a
Remove mention of 1.7 from issue template (#11570)
It's rare to have anyone using v1 anymore, so this message is no longer
helpful.
2022-10-03 11:53:21 +02:00
Sofie Van Landeghem
bf6e43ab2f
Merge pull request #11563 from svlandeg/develop_copy
update develop with latest from master
2022-10-03 09:34:38 +02:00
svlandeg
9c8cdb403e Merge branch 'master_copy' into develop_copy 2022-09-30 15:40:26 +02:00
Gabriele Picco
ff9002b726
Add Zshot Spacy plugin (#11557)
* Add Zshot Spacy plugin

Add Zshot (Zero and Few shot named entity & relationships recognition) Spacy plugin

* Update website/meta/universe.json

Co-authored-by: Adriane Boyd <adrianeboyd@gmail.com>

* Update website/meta/universe.json

Co-authored-by: Adriane Boyd <adrianeboyd@gmail.com>

Co-authored-by: Adriane Boyd <adrianeboyd@gmail.com>
2022-09-29 17:34:44 +02:00
Sofie Van Landeghem
bcda8bc1e7
update mypy to latest version (#11546)
* update mypy and disable it for python 3.6

* ignoring mypy's type redefinition error
2022-09-29 14:24:40 +02:00
Paul O'Leary McCann
ba63f57f81
Update docs to reflect Doc input to Language (#11555) 2022-09-29 18:50:29 +09:00
Adriane Boyd
6d7630c5d3
Allow overriding spacy_version in spacy package meta (#11552) 2022-09-29 10:44:06 +02:00
Kevin Humphreys
bf4b353ce5 handle sets inside regex operator 2022-09-28 16:08:37 -07:00
Peter Baumgartner
e794d4ae39
debug data Spancat Table Improvements (#11504)
* update

* fix format function

* pull out _format_number

* format with black
2022-09-28 17:16:05 +02:00
Raphael Mitsch
aea16719be
Simplify and clarify enable/disable behavior of spacy.load() (#11459)
* Change enable/disable behavior so that arguments take precedence over config options. Extend error message on conflict. Add warning message in case of overwriting config option with arguments.

* Fix tests in test_serialize_pipeline.py to reflect changes to handling of enable/disable.

* Fix type issue.

* Move comment.

* Move comment.

* Issue UserWarning instead of printing wasabi message. Adjust test.

* Added pytest.warns(UserWarning) for expected warning to fix tests.

* Update warning message.

* Move type handling out of fetch_pipes_status().

* Add global variable for default value. Use id() to determine whether used values are default value.

* Fix default value for disable.

* Rename DEFAULT_PIPE_STATUS to _DEFAULT_EMPTY_PIPES.
2022-09-27 14:22:36 +02:00
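
The precedence rule described in the commit above (arguments win over config options, with a UserWarning on override, and an identity check against a module-level default) can be sketched roughly as follows. This is a minimal illustration, not spaCy's actual code: the function `resolve_disabled` and its parameters are hypothetical, and only the name `_DEFAULT_EMPTY_PIPES` comes from the commit message. The exact condition that triggers the warning is also an assumption.

```python
import warnings
from typing import Iterable, List

# Module-level sentinel used as the default argument value. A list is used so
# that a caller passing a fresh `[]` is a different object from the default.
_DEFAULT_EMPTY_PIPES: List[str] = []


def resolve_disabled(
    config_disabled: Iterable[str],
    disable: Iterable[str] = _DEFAULT_EMPTY_PIPES,
) -> List[str]:
    """Hypothetical helper: decide which pipes end up disabled."""
    config_disabled = list(config_disabled)
    # Identity check (equivalent to comparing id() values): was the argument
    # left at its default, or did the caller pass something explicitly?
    if disable is _DEFAULT_EMPTY_PIPES:
        return config_disabled
    disable = list(disable)
    # Assumption: warn only when an explicit argument overrides a non-empty
    # config value that differs from it.
    if config_disabled and sorted(disable) != sorted(config_disabled):
        warnings.warn(
            f"Config value {config_disabled} overridden by argument {disable}.",
            UserWarning,
        )
    # The explicit argument takes precedence over the config option.
    return disable
```

With this sketch, `resolve_disabled(["ner"])` falls back to the config value, while `resolve_disabled(["ner"], disable=[])` returns an empty list and emits a UserWarning, because the explicitly passed empty list is not the sentinel object.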
Taniguchi Yasufumi
9557b0fb01
Add spacy-partial-tagger to spaCy Universe (#11538) 2022-09-27 14:11:50 +02:00
Jacobo Myerston
3e8bc1272f
add punctuation to grc (#11426)
* add punctuation to grc

Add support for special editorial punctuation that is common in ancient Greek texts. Ancient Greek texts, as found in digital and print form, have been largely edited by scholars. Restorations and improvements are normally marked with special characters that need to be handled properly by the tokenizer.

* add unit tests

* simplify regex

* move generic quotes to char classes

* rename unit test

* fix regex

Co-authored-by: Adriane Boyd <adrianeboyd@gmail.com>

Co-authored-by: svlandeg <svlandeg@github.com>
Co-authored-by: Sofie Van Landeghem <svlandeg@users.noreply.github.com>
Co-authored-by: Adriane Boyd <adrianeboyd@gmail.com>
2022-09-27 11:38:56 +02:00
Paul O'Leary McCann
a44b7d4622
Add experimental coref docs (#11291)
* Add experimental coref docs

* Docs cleanup

* Apply suggestions from code review

Co-authored-by: Sofie Van Landeghem <svlandeg@users.noreply.github.com>

* Apply changes from code review

* Fix prettier formatting

It seems a period after a number made this think it was a list?

* Update docs on examples for initialize

* Add docs for coref scorers

* Remove 3.4 notes from coref

There won't be a "new" tag until it's in core.

* Add docs for span cleaner

* Fix docs

* Fix docs to match spacy-experimental

These weren't properly updated when the code was moved out of spacy
core.

* More doc fixes

* Formatting

* Update architectures

* Fix links

* Fix another link

Co-authored-by: Sofie Van Landeghem <svlandeg@users.noreply.github.com>
Co-authored-by: svlandeg <svlandeg@github.com>
2022-09-27 18:11:23 +09:00
Adriane Boyd
877671e09a
Preserve missing entity annotation in augmenters (#11540)
Preserve both `-` and `O` annotation in augmenters rather than relying
on `Example.to_dict`'s default support for one option outside of labeled
entity spans.

This is intended as a temporary workaround for augmenters for v3.4.x.
The behavior of `Example` and related IOB utils could be improved in the
general case for v3.5.
2022-09-27 10:16:51 +02:00
Paul O'Leary McCann
936a5f0506
Fix English pipeline names in 3.4 release notes (#11542) 2022-09-27 08:25:24 +02:00
Richard Hudson
6f692a06d5
Remove side effects from Doc.__init__() (#11506)
* Remove side effects from Doc.__init__()

* Changes based on review comment

* Readd test

* Change interface of Doc.__init__()

* Simplify test

Co-authored-by: Adriane Boyd <adrianeboyd@gmail.com>

* Update doc.md

Co-authored-by: Adriane Boyd <adrianeboyd@gmail.com>
2022-09-26 15:58:21 +02:00
Basile Dura
f40d2fac29
fix: remove duplicate v3.2 (#11530) 2022-09-23 13:18:51 +02:00
Kevin Humphreys
0da324ab5b reinstate FUZZY operator
with length-based distance function
2022-09-22 19:26:52 -07:00
Kevin Humphreys
eab96f7c03 fix min distance 2022-09-22 15:37:19 -07:00
Raphael Mitsch
af9b01ef97
Add dependency check to project step runs (#11226)
* Add dependency check to project step running.

* Fix dependency mismatch warning.

* Remove newline.

* Add types-setuptools to setup.cfg.

* Move types-setuptools to test requirements. Move warnings into _validate_requirements(). Handle file reading in project_run().

* Remove newline formatting for output of package conflicts.

* Show full version conflict message instead of just package name.

* Update spacy/cli/project/run.py

Co-authored-by: Adriane Boyd <adrianeboyd@gmail.com>

* Fix typo.

* Re-add rephrasing of message for conflicting packages. Remove requirements path redundancy.

* Update spacy/cli/project/run.py

Co-authored-by: Adriane Boyd <adrianeboyd@gmail.com>

* Update spacy/cli/project/run.py

Co-authored-by: Adriane Boyd <adrianeboyd@gmail.com>

* Print unified message for requirement conflicts and missing requirements.

* Update spacy/cli/project/run.py

Co-authored-by: Adriane Boyd <adrianeboyd@gmail.com>

* Fix warning message.

* Print conflict/missing messages individually.

* Print conflict/missing messages individually.

* Add check_requirements setting in project.yml to disable requirements check.

* Update website/docs/usage/projects.md

Co-authored-by: Adriane Boyd <adrianeboyd@gmail.com>

* Update website/docs/usage/projects.md

Co-authored-by: Adriane Boyd <adrianeboyd@gmail.com>

* Update description of project.yml structure in projects.md.

* Update website/docs/usage/projects.md

Co-authored-by: Sofie Van Landeghem <svlandeg@users.noreply.github.com>

* Prettify projects docs.

Co-authored-by: Adriane Boyd <adrianeboyd@gmail.com>
Co-authored-by: Sofie Van Landeghem <svlandeg@users.noreply.github.com>
2022-09-16 16:54:31 +02:00
github-actions[bot]
279358be63
Auto-format code with black (#11513)
Co-authored-by: explosion-bot <explosion-bot@users.noreply.github.com>
2022-09-16 11:50:19 +02:00
Sofie Van Landeghem
df0b815c23
more explicit Example constructor example (#11489)
* make constructor example for Example more explicit

* shorten example and add spaces
2022-09-16 09:26:33 +02:00
Kevin Humphreys
4a677acf5d don't allow more edits than characters 2022-09-15 16:14:24 -07:00
Kevin Humphreys
252e9ab3af exclude whitespace tokens 2022-09-15 15:50:07 -07:00
Kevin Humphreys
a1c984043a remove polyleven 2022-09-15 12:42:17 -07:00
Kevin Humphreys
711f16cc82 Merge branch 'master' into rapidfuzz 2022-09-15 11:54:16 -07:00
Sofie Van Landeghem
d5c8498f2f
disable mypy run for Python 3.10 (#11508) (#11511) 2022-09-15 17:41:25 +02:00
Sofie Van Landeghem
0509f90874
add dot (#11500) 2022-09-15 17:29:42 +02:00
Sofie Van Landeghem
ca1ad67458
disable mypy run for Python 3.10 (#11508) 2022-09-15 15:51:19 +02:00
Kevin Humphreys
b393525b50 Merge branch 'rapidfuzz' of https://github.com/kwhumphreys/spaCy into rapidfuzz 2022-09-14 15:56:18 -07:00
Kevin Humphreys
b7599dfb2f fuzzy match only on oov tokens 2022-09-14 15:54:05 -07:00
Adriane Boyd
7c98245c0c
Add levenshtein from polyleven (#11418)
Add a simple levenshtein distance function using the implementation from
the polyleven library as `spacy.matcher.levenshtein`.
2022-09-14 17:05:22 +02:00
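
For readers unfamiliar with the metric named in the commit above, a plain-Python Levenshtein distance might look like the sketch below. It is a textbook dynamic-programming version written for illustration only. It is not the polyleven implementation and not the function exposed as `spacy.matcher.levenshtein`, and the name `levenshtein_sketch` is hypothetical.

```python
def levenshtein_sketch(a: str, b: str) -> int:
    """Minimum number of single-character insertions, deletions and
    substitutions needed to turn `a` into `b`."""
    # Keep the shorter string in `b` so each row stays as small as possible.
    if len(a) < len(b):
        a, b = b, a
    # previous[j] holds the distance between the processed prefix of `a`
    # and the first j characters of `b`.
    previous = list(range(len(b) + 1))
    for i, ch_a in enumerate(a, start=1):
        current = [i]  # distance from a[:i] to the empty string
        for j, ch_b in enumerate(b, start=1):
            cost = 0 if ch_a == ch_b else 1
            current.append(
                min(
                    previous[j] + 1,         # delete ch_a
                    current[j - 1] + 1,      # insert ch_b
                    previous[j - 1] + cost,  # substitute, or match for free
                )
            )
        previous = current
    return previous[-1]
```

For example, `levenshtein_sketch("kitten", "sitting")` returns 3. The result can never exceed the length of the longer input.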
Richard Hudson
3f0c3ad7d3
Correct alignment example and documentation (#11491)
* Correct example and documentation

* Added altered example.md

* Changes based on review + apply prettier

* Remove unnecessary 'the'

Co-authored-by: Madeesh Kannan <shadeMe@users.noreply.github.com>

Co-authored-by: Madeesh Kannan <shadeMe@users.noreply.github.com>
2022-09-14 09:36:55 +02:00
Adriane Boyd
6be6913ba5
Update cupy extras (#11279)
* Update cupy extras:

* Extend to v11
* Add `cupy-cuda11x` and `cupy-wheel`
* Update quickstart to use `cupy-wheel` for CUDA 10.2+

* Rename cuda-wheel to cuda-autodetect, remove repeated CUDA in menu
2022-09-13 09:04:53 +02:00
Kevin Humphreys
a6d26a0195 switch to polyleven
(Python package)
2022-09-12 16:45:51 -07:00
Kevin Humphreys
568a843c09 revert changes added for fuzzy param 2022-09-12 16:45:51 -07:00
Kevin Humphreys
3591a69d35 switch to FUZZYn predicates
use Levenshtein distance.
remove fuzzy param.
remove rapidfuzz_capi.
2022-09-12 16:45:51 -07:00
Kevin Humphreys
974e5f9902 case fix 2022-09-12 16:45:51 -07:00
Kevin Humphreys
e636f4941b simplify fuzzy sets 2022-09-12 16:45:51 -07:00
Kevin Humphreys
9c0f9368a9 handle fuzzy sets 2022-09-12 16:45:51 -07:00
Kevin Humphreys
0859e391c6 remove unnecessary dependency 2022-09-12 16:45:50 -07:00
Kevin Humphreys
ee25d434b6 tidying 2022-09-12 16:45:50 -07:00
Kevin Humphreys
3dba984db9 fix type properly 2022-09-12 16:45:50 -07:00