Commit Graph

15299 Commits

Author SHA1 Message Date
kadarakos
647d1e188e Spancat speed improvement (#12577)
* avoid nesting then flattening

* mypy fix

* Apply suggestions from code review

* Add type for indices

* Run full matrix for mypy

* Add back modified type: ignore

* Revert "Run full matrix for mypy"

This reverts commit e218873d04.

---------

Co-authored-by: Adriane Boyd <adrianeboyd@gmail.com>
2023-05-25 09:04:48 +02:00
Adriane Boyd
d2464d7bc9 Switch from azure to GHA 2023-05-25 08:52:02 +02:00
Adriane Boyd
6e8ab15445
Merge pull request #11964 from adrianeboyd/backport/v3.2.5
Backport bug fixes to v3.2.x
2022-12-14 18:33:05 +01:00
Adriane Boyd
427de63f0a Set version to v3.2.5 2022-12-13 13:21:53 +01:00
Adriane Boyd
386a3e69da CI and precommit hooks: switch to flake8==5.0.4 2022-12-13 13:21:41 +01:00
Adriane Boyd
b449d355d5 CI: Install thinc-apple-ops through extra (#11963) 2022-12-13 13:21:41 +01:00
Paul O'Leary McCann
e73755e49f Switch ubuntu-latest to ubuntu-20.04 in main tests (#11928)
* Switch ubuntu-latest to ubuntu-20.04 in main tests

* Only use 20.04 for 3.6
2022-12-13 13:21:41 +01:00
Adriane Boyd
41afbb2f89 Modernize and simplify CI steps (#11738)
* Use `build` instead of `python setup.py sdist`
* Remove in-place build with `setup.py`
* Remove `gpu` parameter and GPU tests
* Keep `architecture` and `num_build_jobs` in azure steps with CI
  defaults
* Fix use of `num_build_jobs` parameters
* Remove now-unused `prefix` parameter
* Test imports and CLI before installing test requirements
  * Remove `*.egg-info` directory in addition to source directory for a
    warning-free `import spacy`
2022-12-13 13:21:41 +01:00
Adriane Boyd
571ef56fa9 Modify similarity tests to avoid spurious warnings 2022-12-13 13:21:41 +01:00
Adriane Boyd
1a5352e423 Clean up warnings in the test suite (#11331) 2022-12-13 13:21:41 +01:00
Adriane Boyd
e3ef798e03 Rename test helper method with non-test_ name (#11701) 2022-12-12 11:09:14 +01:00
Adriane Boyd
8cfc4c7325 Cast to uint64 for all array-based doc representations (#11933)
* Convert all individual values explicitly to uint64 for array-based doc representations

* Temporarily test with latest numpy v1.24.0rc

* Remove unnecessary conversion from attr_t

* Reduce number of individual casts

* Convert specifically from int32 to uint64

* Revert "Temporarily test with latest numpy v1.24.0rc"

This reverts commit eb0e3c5006.

* Also use int32 in tests
2022-12-12 11:09:14 +01:00
Paul O'Leary McCann
3ac7230abd Config generation fails for GPU without transformers (#11899)
If you don't have spacy-transformers installed, but try to use `init
config` with the GPU flag, you'll get an error. The issue is that the
`use_transformers` flag in the config is conflated with the GPU flag,
and then there's an attempt to access transformers config info that may
not exist.

There may be a better way to do this, but this stops the error.
2022-12-12 11:09:14 +01:00
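The conflation described in the commit message above can be illustrated with a minimal sketch. This is not the actual spaCy patch: `has_package` and `resolve_use_transformers` are hypothetical names, and the sketch assumes the importable module for spacy-transformers is called `spacy_transformers`.

```python
from importlib.util import find_spec


def has_package(module_name: str) -> bool:
    """Return True if a top-level module can be located without importing it."""
    return find_spec(module_name) is not None


def resolve_use_transformers(gpu: bool, transformers_installed: bool) -> bool:
    # Hypothetical helper: asking for the GPU must not by itself imply
    # transformers; the package also has to be installed.
    return gpu and transformers_installed


# GPU requested: only use transformer settings if the package is available.
use_transformers = resolve_use_transformers(
    gpu=True,
    transformers_installed=has_package("spacy_transformers"),  # assumed module name
)
print(use_transformers)
```

Keeping the two inputs separate means the transformer-specific config is only looked up when it can exist.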
Paul O'Leary McCann
0de7892033 Add in errors used in the beam code that were removed at some point (#11935)
I don't think there's any way to use the beam code at the moment, but as
long as it's around the errors it refers to should also be present.
2022-12-12 11:09:14 +01:00
Adriane Boyd
21204f17c7 Add smart_open requirement, update deprecated options (#11864)
* Switch from deprecated `ignore_ext` to `compression`
* Add upload/download test for local files
2022-12-12 11:09:14 +01:00
Adriane Boyd
a8b883fead Fix spancat for zero suggestions (#11860)
* Add test for spancat predict with zero suggestions

* Fix spancat for zero suggestions

* Undo changes to extract_spans

* Use .sum() as in update
2022-12-12 11:09:14 +01:00
Adriane Boyd
cca1e21ad6 Revert "Add click pin to avoid typer issues (#10573)"
This reverts commit 9966e08f32.
2022-12-12 11:09:14 +01:00
Adriane Boyd
346a25f587 Support env var for num build jobs (#11073) 2022-07-04 20:51:02 +02:00
Adriane Boyd
9a566e7d2b Extend build constraints for aarch64 2022-07-04 13:31:48 +02:00
Adriane Boyd
b50fe5ec68
Merge pull request #10577 from adrianeboyd/chore/backport-click-pin-v3.2.x
Backport click pin, set version to v3.2.4
2022-03-29 17:46:35 +02:00
Adriane Boyd
259ad994e2 Set version to v3.2.4 2022-03-29 14:59:29 +02:00
Adriane Boyd
03bee62568 Add click pin to avoid typer issues (#10573) 2022-03-29 14:58:57 +02:00
Adriane Boyd
b2f34b1507
Merge pull request #10399 from adrianeboyd/chore/undo-blis-test
Revert temporary blis test
2022-03-01 16:14:01 +01:00
Adriane Boyd
19b16f047f Revert "Test spacy v3.2.3 with blis v0.7.6"
This reverts commit bee99548e0.
2022-03-01 13:38:03 +01:00
Adriane Boyd
b6fa6ef94d Revert "Fix requirements in setup.cfg"
This reverts commit 9de43ab0a8.
2022-03-01 13:37:52 +01:00
Adriane Boyd
9de43ab0a8 Fix requirements in setup.cfg 2022-03-01 13:25:05 +01:00
Adriane Boyd
bee99548e0 Test spacy v3.2.3 with blis v0.7.6 2022-03-01 13:19:12 +01:00
Adriane Boyd
99425de369
Set version to v3.2.3 (#10392) 2022-02-28 12:54:33 +01:00
Adriane Boyd
b31993e03c
Merge pull request #10389 from adrianeboyd/chore/v3.2-backport-10324-2
Fix Tok2Vec for empty batches (#10324)
2022-02-28 11:18:25 +01:00
Adriane Boyd
f606e1d044 Fix Tok2Vec for empty batches (#10324)
* Add test for tok2vec with vectors and empty docs

* Add shortcut for empty batch in Tok2Vec.predict

* Avoid types
2022-02-28 09:08:05 +01:00
Adriane Boyd
bbaf41fb3b
Set version to v3.2.2 (#10262) 2022-02-11 11:45:26 +01:00
Edward
7961a0a959
Fix typo in errors (#10256) 2022-02-10 13:45:46 +01:00
Ryn Daniels
2d6cabb23c
Fix the date command and the matrix failure mode (#10254) 2022-02-10 12:06:30 +01:00
Peter Baumgartner
ee662ec381
Raise error in spacy package when model name is not a valid python identifier (#10192)
* MultiHashEmbed vector docs correction

* raise error for invalid identifier as model name

* more succinct error message

* update success message

* permitted package name + double underscore

* clarify package name error

* clarify underscore run message

* tweak language + simplify underscore run

* cleanup underscore run warning

* spacing correction

* Update spacy/tests/test_cli.py

Co-authored-by: Adriane Boyd <adrianeboyd@gmail.com>
2022-02-10 08:15:23 +01:00
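A rough sketch of the check named in the commit title above, using the standard `str.isidentifier()` method. The function name `validate_model_name` and the error wording are hypothetical; the real `spacy package` command may differ in both, and the underscore handling mentioned in the commit body is not covered here.

```python
def validate_model_name(name: str) -> str:
    """Return the name unchanged, or raise if it is not a valid Python identifier."""
    if not name.isidentifier():
        # Hypothetical message; the wording in spaCy itself is not reproduced here.
        raise ValueError(f"Model name '{name}' is not a valid Python identifier.")
    return name


for candidate in ["my_pipeline", "my-pipeline", "123pipeline"]:
    try:
        validate_model_name(candidate)
        print(candidate, "ok")
    except ValueError as err:
        print(candidate, "rejected:", err)
```

Hyphens and leading digits fail the check because the resulting package could not be imported by name.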
Ryn Daniels
3877f78ff9
fix the syntax for the slow/gpu test crons (#10244) 2022-02-09 11:21:20 +01:00
John Boy
10c77af83d
add textnets to spaCy universe (#10216)
https://github.com/jboynyc/textnets/issues/38
2022-02-09 15:04:26 +09:00
Ines Montani
7b883da9fd
Merge pull request #10239 from explosion/docs/spacy-tailored-pipelines [ci skip] 2022-02-08 18:04:01 +01:00
Ramon Ziai
6477dafac2
fix(phrasematcher.pyi): change type annotation of docs in add() to List[Doc] (#10235)
https://github.com/explosion/spaCy/issues/10234
2022-02-08 13:37:27 +01:00
Ines Montani
f2c2b97e56 Add spaCy Tailored Pipelines 2022-02-08 11:46:42 +01:00
Adriane Boyd
a9ee5bff98
Support mixed case model package names (#10223) 2022-02-08 10:52:46 +01:00
Ryn Daniels
f939da0bfa
Add github actions for slow and gpu tests (#10225)
* Add github actions for slow and gpu tests

* change weekly GPU tests to also run slow tests, and change the time

* only run the tests if there were commits in the past day
2022-02-08 10:05:35 +01:00
Sofie Van Landeghem
deb143fa70
Token sent attributes more consistent (#10164)
* remove duplicate line

* add sent start/end token attributes to the docs

* let has_annotation work with IS_SENT_END

* elif instead of if

* add has_annotation test for sent attributes

* fix typo

* remove duplicate is_sent_start entry in docs
2022-02-08 08:35:37 +01:00
Peter Baumgartner
836f689cc7
YAML multiline tip for project.yml files (#10187)
* MultiHashEmbed vector docs correction

* add in multi-line tip

* convert to sidebar tip
2022-02-08 08:35:09 +01:00
Kenneth Enevoldsen
e4625d2fc3
Added Augmenty to universe (#10229)
* Added Augmenty to universe

* Update website/meta/universe.json

Co-authored-by: Adriane Boyd <adrianeboyd@gmail.com>

* Update website/meta/universe.json

Co-authored-by: Sofie Van Landeghem <svlandeg@users.noreply.github.com>

Co-authored-by: Adriane Boyd <adrianeboyd@gmail.com>
Co-authored-by: Sofie Van Landeghem <svlandeg@users.noreply.github.com>
2022-02-08 08:32:11 +01:00
Lj Miranda
42072f4468
Add spancat pipeline in spacy debug data (#10070)
* Setup debug data for spancat

* Add check for missing labels

* Add low-level data warning error

* Improve logic when compiling the gold train data

* Implement check for negative examples

* Remove breakpoint

* Remove ws_ents and missing entity checks

* Fix mypy errors

* Make variable name spans_key consistent

* Rename pipeline -> component for consistency

* Account for missing labels per spans_key

* Cleanup variable names for consistency

* Improve brevity of conditional statements

* Remove unused variables

* Include spans_key as an argument for _get_examples

* Add a conditional check for spans_key

* Update spancat debug data based on new API

- Instead of using _get_labels_from_model(), I'm now using
_get_labels_from_spancat() (cf. https://github.com/explosion/spaCy/pull/10079)

- The way information is displayed was also changed (text -> table)

* Rename model_labels to ensure mypy works

* Update wording on warning messages

Use "span type" instead of "entity type" in wording the warning messages.
This is because Spans aren't necessarily entities.

* Update component type into a Literal

This is to make it clear that the component parameter should only accept
either 'spancat' or 'ner'.

* Update checks to include actual model span_keys

Instead of looking at everything in the data, we only check those
span_keys from the actual spancat component. Instead of doing the filter
inside the for-loop, I just made another dictionary,
data_labels_in_component to hold this value.

* Update spacy/cli/debug_data.py

* Show label counts only when verbose is True

Co-authored-by: Adriane Boyd <adrianeboyd@gmail.com>
2022-02-07 15:03:36 +01:00
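The span-key filtering described in the commit message above ("we only check those span_keys from the actual spancat component") could look roughly like the following. The dictionary name `data_labels_in_component` comes from the message; the function name, argument names, and sample data are hypothetical.

```python
from typing import Dict, Set


def filter_labels_by_component(
    data_labels: Dict[str, Dict[str, int]],
    component_span_keys: Set[str],
) -> Dict[str, Dict[str, int]]:
    """Keep only the label counts whose spans_key is used by the component."""
    return {
        spans_key: label_counts
        for spans_key, label_counts in data_labels.items()
        if spans_key in component_span_keys
    }


# Hypothetical label counts found in the training data, grouped by spans_key.
data_labels = {
    "sc": {"PERSON": 12, "ORG": 7},
    "unused_key": {"MISC": 3},
}
data_labels_in_component = filter_labels_by_component(data_labels, {"sc"})
print(data_labels_in_component)
```

Building the filtered dictionary once, before the checks run, avoids repeating the membership test inside every loop.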
Lj Miranda
72fece712f
Add shuffle parameter to Corpus API docs (#10220)
* Add shuffle parameter to Corpus API docs

* Update website/docs/api/corpus.md

Co-authored-by: Adriane Boyd <adrianeboyd@gmail.com>
2022-02-07 14:55:53 +01:00
Adriane Boyd
63e1e4e8f6
Fix debug data check for ents that cross sents (#10188)
* Fix debug data check for ents that cross sents

* Use aligned sent starts to have the same indices for the NER and sent
start annotation
* Add a temporary, insufficient hack for the case where a
sentence-initial reference token is split into multiple tokens in the
predicted doc, since `Example.get_aligned("SENT_START")` currently
aligns `True` to all the split tokens.

* Improve test example

* Use Example.get_aligned_sent_starts

* Add test for crossing entity
2022-02-07 08:53:30 +01:00
github-actions[bot]
91ccacea12
Auto-format code with black (#10209)
* Auto-format code with black

* add black requirement to dev dependencies and pin to 22.x

* ignore black dependency for comparison with setup.cfg

Co-authored-by: explosion-bot <explosion-bot@users.noreply.github.com>
Co-authored-by: svlandeg <svlandeg@github.com>
2022-02-06 16:30:30 +01:00
Adriane Boyd
0668a449ba
Add Pipe.hide_labels to omit labels from pipeline meta (#10175) 2022-02-05 17:59:24 +01:00
Adriane Boyd
6f551043e4
Use paths.vectors for vectors in init config (#10146)
So that overriding `paths.vectors` works consistently in generated
configs, set vectors model in `paths.vectors` and always refer to this
path in `initialize.vectors`.
2022-02-04 21:09:48 +01:00
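To illustrate why pointing `initialize.vectors` at `paths.vectors` makes overrides consistent, here is a toy resolver in plain Python. It only mimics `${section.key}` references for this one case; it is not spaCy's config system, and `resolve` and the override format are hypothetical.

```python
import re

# Toy stand-in for a generated config: initialize.vectors refers to paths.vectors.
CONFIG = {
    "paths": {"vectors": None},
    "initialize": {"vectors": "${paths.vectors}"},
}

REFERENCE = re.compile(r"^\$\{(\w+)\.(\w+)\}$")


def resolve(config, overrides):
    """Apply dotted overrides, then substitute whole-value ${section.key} references."""
    merged = {section: dict(values) for section, values in config.items()}
    for dotted_key, value in overrides.items():
        section, key = dotted_key.split(".", 1)
        merged[section][key] = value
    for values in merged.values():
        for key, value in values.items():
            match = REFERENCE.match(value) if isinstance(value, str) else None
            if match:
                values[key] = merged[match.group(1)][match.group(2)]
    return merged


resolved = resolve(CONFIG, {"paths.vectors": "my_vectors"})
print(resolved["initialize"]["vectors"])
```

Because there is a single source of truth, overriding `paths.vectors` is enough; nothing else in the config needs to change.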