15737 Commits

Author SHA1 Message Date
Paul O'Leary McCann
201731df2d Move spans2ints to util 2022-07-03 15:12:53 +09:00
Paul O'Leary McCann
1dacecbbfb Run black 2022-07-03 14:49:02 +09:00
Paul O'Leary McCann
5192ac1617 Clean tests. 2022-07-03 14:48:42 +09:00
Paul O'Leary McCann
7c1bf2fa1f
Merge pull request #11062 from explosion/autoblack
Auto-format code with black
2022-07-03 14:35:53 +09:00
Paul O'Leary McCann
79720886fa Merge branch 'feature/coref' into fix/coref-alignment
Had to renumber error message.
2022-07-01 19:09:29 +09:00
Paul O'Leary McCann
c59aeeb0ae
Merge pull request #11043 from kadarakos/feature/coref
Merging master into Feature/coref
2022-07-01 19:04:21 +09:00
explosion-bot
7e55a51314 Auto-format code with black 2022-07-01 08:04:32 +00:00
Paul O'Leary McCann
e8fdbfc65e Minor fix in Lemmatizer docs 2022-07-01 14:28:03 +09:00
Madeesh Kannan
eaf66e7431
Add NVTX ranges to TrainablePipe components (#10965)
* `TrainablePipe`: Add NVTX range decorator

* Annotate `TrainablePipe` subclasses with NVTX ranges

* Export function signature to allow introspection of args in tests

* Revert "Annotate `TrainablePipe` subclasses with NVTX ranges"

This reverts commit d8684f7372.

* Revert "Export function signature to allow introspection of args in tests"

This reverts commit f4405ca3ad.

* Revert "`TrainablePipe`: Add NVTX range decorator"

This reverts commit 26536eb6b8.

* Add `spacy.pipes_with_nvtx_range` pipeline callback

* Show warnings for all missing user-defined pipe functions that need to be annotated
Fix imports, typos

* Rename `DEFAULT_ANNOTATABLE_PIPE_METHODS` to `DEFAULT_NVTX_ANNOTATABLE_PIPE_METHODS`
Reorder import

* Walk model nodes directly whilst applying NVTX ranges
Ignore pipe method wrapper when applying range
2022-06-30 11:28:12 +02:00
Adriane Boyd
3fe9f47de4
Revert "disable failing test because Stanford servers are down (#11015)" (#11054)
This reverts commit f8116078ce.
2022-06-30 11:24:54 +02:00
Adriane Boyd
3bc1fe0a78
Update cupy extras (#11055)
* Add cuda116 and cuda117 extras

* Revert "remove `cuda116` extra from install widget (#11012)"

This reverts commit e7b498fb1f.

* Add cuda117 to quickstart
2022-06-30 11:24:37 +02:00
Shen Qin
be00db6645
Addition of min_max quantifier in matcher {n,m} (#10981)
* Min_max_operators
1. Modified API and Usage for spaCy website to include min_max operator
2. Modified matcher.pyx to include min_max function {n,m} and its variants
3. Modified schemas.py to include min_max validation error
4. Added test cases to test_matcher_api.py, test_matcher_logic.py and test_pattern_validation.py

* attempt to fix mypy/pydantic compat issue

* formatting

* Update spacy/tests/matcher/test_pattern_validation.py

Co-authored-by: Source-Shen <82353723+Source-Shen@users.noreply.github.com>
Co-authored-by: svlandeg <svlandeg@github.com>
Co-authored-by: Adriane Boyd <adrianeboyd@gmail.com>
2022-06-30 11:01:58 +02:00
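The `{n,m}` quantifier and its variants mentioned in the commit above can be illustrated with a small parser. This is only a sketch: `parse_quantifier` is a hypothetical helper, not the code added to `matcher.pyx`, and the accepted forms (`{n}`, `{n,m}`, `{n,}`, `{,m}`) are an assumption based on how the variants are usually described.

```python
import re
from typing import Optional, Tuple

# Hypothetical helper (not spaCy's implementation): turn a quantifier string
# such as "{2,4}" into (minimum, maximum). maximum is None when unbounded.
_QUANTIFIER = re.compile(r"^\{(\d*)(,?)(\d*)\}$")


def parse_quantifier(op: str) -> Tuple[int, Optional[int]]:
    match = _QUANTIFIER.match(op)
    if match is None:
        raise ValueError(f"Not a min/max quantifier: {op!r}")
    low, comma, high = match.groups()
    if not comma:
        # "{n}": exactly n repetitions
        if not low:
            raise ValueError(f"Empty quantifier: {op!r}")
        return int(low), int(low)
    if not low and not high:
        # "{,}" gives neither bound
        raise ValueError(f"Quantifier needs at least one bound: {op!r}")
    minimum = int(low) if low else 0          # "{,m}": zero up to m
    maximum = int(high) if high else None     # "{n,}": n or more
    if maximum is not None and minimum > maximum:
        raise ValueError(f"Minimum exceeds maximum: {op!r}")
    return minimum, maximum
```

A validation step like the one added to `schemas.py` would presumably reject the same malformed inputs that raise `ValueError` here, though the exact error reporting is not shown in the commit message.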
Adriane Boyd
4581a4f53f
Run mypy for python 3.10 (#11052) 2022-06-29 20:03:36 +02:00
Daniël de Kok
0ff14aabce
vectors: avoid expensive comparisons between numpy ints and Python ints (#10992)
* vectors: avoid expensive comparisons between numpy ints and Python ints

* vectors: avoid failure on lists of ints

* Convert another numpy int to Python
2022-06-29 12:58:31 +02:00
Paul O'Leary McCann
dd812ca84a Handle case with nothing to score in span predictor
This case was not handled correctly. It may be desirable to make changes
in the coref component to make sure this doesn't happen, but the span
predictor should also handle this kind of data intelligently internally.

Note that something is still weird because the span predictor seems to
not be learning.
2022-06-29 19:30:37 +09:00
kadarakos
0076f0f617 span predictor device fix 2022-06-29 06:58:47 +00:00
Peter Baumgartner
dd038b536c
fix to horizontal space (#10994) 2022-06-28 20:42:40 +02:00
Adriane Boyd
24f4908fce
Update vector handling in similarity methods (#11013)
Distinguish between vectors that are 0 vs. missing vectors when warning
about missing vectors.

Update `Doc.has_vector` to match `Span.has_vector` and
`Token.has_vector` for cases where the vocab has vectors but none of the
tokens in the container have vectors.
2022-06-28 19:50:47 +02:00
Madeesh Kannan
1d5cad0b42
Example.get_aligned_parse: Handle unit and zero length vectors correctly (#11026)
* `Example.get_aligned_parse`: Do not squeeze gold token idx vector
Correctly handle zero-size vectors passed to `np.vectorize`

* Add tests

* Use `Doc` ctor to initialize attributes

* Remove unintended change

Co-authored-by: Adriane Boyd <adrianeboyd@gmail.com>

* Remove unused import

Co-authored-by: Adriane Boyd <adrianeboyd@gmail.com>
2022-06-28 19:42:58 +02:00
Richard Hudson
a9559e7435
Handle Cyrillic combining diacritics (#10837)
* Handle Russian, Ukrainian and Bulgarian

* Corrections

* Correction

* Correction to comment

* Changes based on review

* Correction

* Reverted irrelevant change in punctuation.py

* Remove unnecessary group

* Reverted accidental change
2022-06-28 15:35:32 +02:00
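For readers unfamiliar with combining diacritics, here is a minimal standard-library sketch of what they look like in Cyrillic text. It does not reproduce the tokenizer change in the commit above; the example word and the helper name `has_combining_marks` are illustrative assumptions.

```python
import unicodedata

# U+0301 COMBINING ACUTE ACCENT is often used to mark stress in Russian,
# Ukrainian and Bulgarian text. It is a separate code point that attaches
# to the preceding letter.
COMBINING_ACUTE = "\u0301"

plain = "\u0437\u0430\u043c\u043e\u043a"  # the Cyrillic word "замок"
# Same word with a stress mark inserted after the second letter
stressed = plain[:2] + COMBINING_ACUTE + plain[2:]


def has_combining_marks(text: str) -> bool:
    """Hypothetical helper: True if any character is a combining mark."""
    return any(unicodedata.combining(ch) != 0 for ch in text)
```

The stressed form is one code point longer than the plain form, so any character-class rule that only lists base Cyrillic letters would treat the accent as a foreign character unless it accounts for combining marks.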
Zackere
8ffff18ac4
Try cloning repo from main & master (#10843)
* Try cloning repo from main & master

* fixup! Try cloning repo from main & master

* fixup! fixup! Try cloning repo from main & master

* refactor clone and check for repo:branch existence

* spacing fix

* make mypy happy

* type util function

* Update spacy/cli/project/clone.py

Co-authored-by: Sofie Van Landeghem <svlandeg@users.noreply.github.com>

Co-authored-by: Peter Baumgartner <5107405+pmbaumgartner@users.noreply.github.com>
Co-authored-by: Sofie Van Landeghem <svlandeg@users.noreply.github.com>
2022-06-28 09:11:15 -04:00
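The idea in the commit above, checking whether a `repo:branch` exists and falling back from `main` to `master`, can be sketched as follows. The function names are hypothetical and this is not the code in `spacy/cli/project/clone.py`; the only external behavior assumed is that `git ls-remote --exit-code` returns a non-zero status when no matching ref is found.

```python
import subprocess
from typing import Callable, Optional, Sequence


def remote_branch_exists(repo: str, branch: str) -> bool:
    """Hypothetical check: ask the remote whether refs/heads/<branch> exists."""
    result = subprocess.run(
        ["git", "ls-remote", "--exit-code", "--heads", repo, f"refs/heads/{branch}"],
        stdout=subprocess.DEVNULL,
        stderr=subprocess.DEVNULL,
    )
    return result.returncode == 0


def pick_branch(
    branch_exists: Callable[[str], bool],
    candidates: Sequence[str] = ("main", "master"),
) -> Optional[str]:
    """Return the first candidate branch that exists, or None if none do."""
    for branch in candidates:
        if branch_exists(branch):
            return branch
    return None
```

Passing the existence check in as a callable keeps `pick_branch` testable without network access; in real use one might call `pick_branch(lambda b: remote_branch_exists(repo_url, b))`, where `repo_url` is whatever repository is being cloned.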
kadarakos
1a782592c4 make sure same device 2022-06-28 12:53:20 +00:00
kadarakos
9f9453865a Merge branch 'master' into feature/coref 2022-06-28 10:27:35 +00:00
Paul O'Leary McCann
d1ff933e9b Test works
This may not be done yet, as the test is just for consistency, and not
overfitting correctly yet.
2022-06-28 19:15:33 +09:00
Paul O'Leary McCann
ef5762d78e Bad hack to get tests to run
This changes the tok2vec size in coref to hardcoded 64 to get tests to
run. This should be reverted and hopefully replaced with proper shape
inference.
2022-06-28 19:06:13 +09:00
Paul O'Leary McCann
af6d5ae2fe Initial test of mismatched tokenization
This runs, but the results are nonsense because the indices are off.
2022-06-28 19:05:47 +09:00
Eric Holscher
308a612ec9
Remove simply (#11017)
I was reading this page, and as a relative beginner, nothing about it was simple :)
2022-06-27 09:45:22 +02:00
github-actions[bot]
4155a59d47
Auto-format code with black (#11022)
Co-authored-by: explosion-bot <explosion-bot@users.noreply.github.com>
2022-06-27 09:35:35 +02:00
Adriane Boyd
738b38064f
Merge pull request #11021 from adrianeboyd/chore/v3.4.0
Set version to v3.4.0
2022-06-24 14:54:16 +02:00
Madeesh Kannan
8f1ba4de58
Backport parser/alignment optimizations from feature/refactor-parser (#10952) 2022-06-24 13:39:52 +02:00
Adriane Boyd
d9320db7db Temporarily skip tests that require models/compat 2022-06-24 11:20:53 +02:00
Adriane Boyd
bffe54d02b Set version to v3.4.0 2022-06-24 08:48:58 +02:00
Peter Baumgartner
9738b69c0e
Update Code Conventions.md (#11018) 2022-06-24 15:11:29 +09:00
Dmytro Sadovnychyi
4cd8b4cc22
Fix some of the broken links on universe pages (#11011)
Currently some of the "AUTHOR INFO" links (e.g. here[0]) are broken:

```
https://github.com/https://github.com/explosion
```

[0] https://spacy.io/universe/project/spacy-experimental


Also one remains broken with `https://szegedai.github.io/`.
2022-06-23 17:53:00 +02:00
Sofie Van Landeghem
f8116078ce
disable failing test because Stanford servers are down (#11015) 2022-06-23 10:57:46 +02:00
Adriane Boyd
d4e3f43639
Update thinc version to switch back to blis v0.7 (#11014) 2022-06-23 09:50:25 +02:00
Adriane Boyd
f1197d9175
Add API docs for token attribute symbols (#10836)
* Add API docs for token attribute symbols

* Remove NBSP's

* Fix typo

* Rephrase

Co-authored-by: svlandeg <svlandeg@github.com>
2022-06-23 08:16:38 +02:00
Peter Baumgartner
3335bb9d0c
remove cuda116 extra from install widget (#11012) 2022-06-23 08:15:28 +02:00
jademlc
bed23ff291
Update serialization methods code block (#11004)
* Update serialization methods code block

* Update website/docs/usage/saving-loading.md

Co-authored-by: Adriane Boyd <adrianeboyd@gmail.com>
2022-06-22 20:45:26 +02:00
Paul O'Leary McCann
16894e665d
Refactor Coval Scoring code (#10875)
* Move coref scoring code to scorer.py

Includes some renames to make names less generic.

* Refactor coval code to remove ternary expressions

* Black formatting

* Add header

* Make scorers into registered scorers

* Small test fixes

* Skip coref tests when torch not present

Coref can't be loaded without Torch, so nothing works.

* Fix remaining type issues

Some of this just involves ignoring types in thorny areas. Two main
issues:

1. Some things have weird types due to indirection/ argskwargs
2. xp2torch return type seems to have changed at some point

* Update spacy/scorer.py

Co-authored-by: kadarakos <kadar.akos@gmail.com>

* Small changes from review

* Be specific about the ValueError

* Type fix

Co-authored-by: kadarakos <kadar.akos@gmail.com>
2022-06-22 16:05:52 +09:00
Sofie Van Landeghem
0fa004c4cd the 'new' indicator wants a 'number' (#10997) 2022-06-21 22:01:16 +02:00
Philip Vollet
1ae13b2a70
Merge pull request #10991 from Lucaterre/master
updated spacy universe for spacyfishing
2022-06-21 10:33:26 +02:00
Daniël de Kok
0271306f16
Use thinc-apple-ops>=0.1.0.dev0 with apple extras (#10904)
* Use thinc-apple-ops>=0.1.0.dev0 with `apple` extras

Also test with thinc-apple-ops that is at least 0.1.0.dev0.

* Check thinc-apple-ops on macOS with Python 3.10

Co-authored-by: Adriane Boyd <adrianeboyd@gmail.com>

* Use `pip install --pre` for installing thinc-apple-ops in CI

Co-authored-by: Adriane Boyd <adrianeboyd@gmail.com>
2022-06-21 08:26:59 +02:00
Victoria
a08ca064e5
Update linguistic-features.md (#10993)
Change link for downloading fasttext word vectors
2022-06-21 15:03:41 +09:00
Lucaterre
2820d7dd8d correct typo in universe.json for 'code_example' key : pipe name 'entityfishing' 2022-06-20 15:26:23 +02:00
Lucaterre
cdad815c68 updated spacy universe for spacyfishing 2022-06-20 14:28:49 +02:00
Sofie Van Landeghem
f00254ae27
add counts to verbose list of NER labels (#10957) 2022-06-20 09:48:40 +02:00
Raphael Mitsch
4c058eb40a
enable argument for spacy.load() (#10784)
* Enable flag on spacy.load: foundation for include, enable arguments.

* Enable flag on spacy.load: fixed tests.

* Enable flag on spacy.load: switched from pretrained model to empty model with added pipes for tests.

* Enable flag on spacy.load: switched to more consistent error on misspecification of component activity. Test refactoring. Added  to default config.

* Enable flag on spacy.load: added support for fields not in pipeline.

* Enable flag on spacy.load: removed serialization fields from supported fields.

* Enable flag on spacy.load: removed 'enable' from config again.

* Enable flag on spacy.load: relaxed checks in _resolve_component_activation_status() to allow non-standard pipes.

* Enable flag on spacy.load: fixed relaxed checks for _resolve_component_activation_status() to allow non-standard pipes. Extended tests.

* Enable flag on spacy.load: comments w.r.t. resolution workarounds.

* Enable flag on spacy.load: remove include fields. Update website docs.

* Enable flag on spacy.load: updates w.r.t. changes in master.

* Implement Doc.from_json(): update docstrings.

Co-authored-by: Adriane Boyd <adrianeboyd@gmail.com>

* Implement Doc.from_json(): remove newline.

Co-authored-by: Adriane Boyd <adrianeboyd@gmail.com>

* Implement Doc.from_json(): change error message for E1038.

Co-authored-by: Adriane Boyd <adrianeboyd@gmail.com>

* Enable flag on spacy.load: wrapped docstring for _resolve_component_status() at 80 chars.

* Enable flag on spacy.load: changed examples for enable flag.

* Remove newline.

Co-authored-by: Sofie Van Landeghem <svlandeg@users.noreply.github.com>

* Fix docstring for Language._resolve_component_status().

* Rename E1038 to E1042.

Co-authored-by: Adriane Boyd <adrianeboyd@gmail.com>
Co-authored-by: Sofie Van Landeghem <svlandeg@users.noreply.github.com>
2022-06-17 20:24:13 +01:00
Sofie Van Landeghem
eaeca5eb6a
account for NER labels with a hyphen in the name (#10960)
* account for NER labels with a hyphen in the name

* cleanup

* fix docstring

* add return type to helper method

* shorter method and few more occurrences

* use helper method across repo

* fix circular import

* partial revert to avoid circular import
2022-06-17 20:02:37 +01:00
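The problem addressed by the commit above can be shown in a few lines: splitting a tag such as `I-my-label` on every hyphen breaks the label apart, whereas splitting only on the first hyphen keeps it intact. `split_tag` below is a hypothetical stand-in for the helper method the commit mentions, whose actual name and signature are not given in the message.

```python
from typing import Tuple


def split_tag(tag: str) -> Tuple[str, str]:
    """Hypothetical helper: split 'B-my-label' into ('B', 'my-label').

    Only the first hyphen separates the prefix from the label, so hyphens
    inside the label itself are preserved. A bare 'O' tag yields ('O', '').
    """
    prefix, _, label = tag.partition("-")
    return prefix, label


# Naive splitting loses the label boundary:
naive = "I-my-label".split("-")
# Splitting on the first hyphen only keeps the label whole:
careful = split_tag("I-my-label")
```

Here `naive` has three pieces while `careful` has exactly a prefix and a label, which is why a single shared helper used across the repository avoids subtle inconsistencies.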
github-actions[bot]
6313787fb6
Auto-format code with black (#10977)
Co-authored-by: explosion-bot <explosion-bot@users.noreply.github.com>
2022-06-17 19:41:55 +01:00