Commit Graph

15845 Commits

Author SHA1 Message Date
Paul O'Leary McCann
b76222e56a
Raise Typer limit (#11720)
* Raise typer limit to <0.7.0

* Raise limit to <0.8.0
2022-11-07 08:11:55 +01:00
Adriane Boyd
ea326cf47d
Fix types for Span.id and Span.id_ (#11744) 2022-11-07 08:11:13 +01:00
richardpaulhudson
a972791c9a Removed extraneous import 2022-11-04 17:47:04 +01:00
richardpaulhudson
6e069c91f6 Correct .pyi file 2022-11-04 12:50:07 +01:00
richardpaulhudson
28a93fd3e3 Another correction 2022-11-04 12:44:22 +01:00
richardpaulhudson
8d703963d3 Correct error 2022-11-04 12:40:03 +01:00
richardpaulhudson
f97d6e6826 Updated example config 2022-11-04 12:36:14 +01:00
richardpaulhudson
dcfc810033 Remove extraneous import 2022-11-04 11:31:18 +01:00
github-actions[bot]
bbf64cfc43
Auto-format code with black (#11749)
Co-authored-by: explosion-bot <explosion-bot@users.noreply.github.com>
2022-11-04 11:17:43 +01:00
richardpaulhudson
750628a623 Fix mypy problem 2022-11-04 11:00:33 +01:00
richardpaulhudson
f0dc60691a Switch to 64-bit hashes 2022-11-04 10:17:25 +01:00
richardpaulhudson
7f1873ad81 Everything working after refactoring 2022-11-04 09:33:06 +01:00
richardpaulhudson
5d210a0f3b Tidy up code 2022-11-03 21:26:47 +01:00
richardpaulhudson
aaaed55459 Save end_search_idx in variable 2022-11-03 21:06:37 +01:00
richard@explosion.ai
5d32dd6246 Intermediate state 2022-11-03 20:54:07 +01:00
richard@explosion.ai
7db2770c05 Intermediate state 2022-11-03 15:23:50 +01:00
richard@explosion.ai
b462f85a73 Correction 2022-11-03 13:37:53 +01:00
Adriane Boyd
40e1000db0
Restore Doc attr getter values in Doc.to_json (#11700) 2022-11-03 11:49:08 +01:00
richard@explosion.ai
c7a960f19e Performance improvement 2022-11-03 11:17:07 +01:00
Paul O'Leary McCann
db56600536
Fix default parameters for load functions (fix #11706) (#11713)
* Fix default parameters for load functions

Some load functions used SimpleFrozenList() directly instead of the
_DEFAULT_EMPTY_PIPES parameter. That mostly worked as intended, but
the changes in #11459 check for equality using identity, not value, so a
warning is incorrectly raised sometimes, as in #11706.

This change just has all the load functions use the singleton value
instead.

* Add test that there are no warnings on module-based load

This will succeed due to changes in this branch, but local tests with
the latest release failed as intended.

* Try reverting commit and see if CI changes

There is an error in CI that is probably unrelated.

Revert "Fix default parameters for load functions"

This reverts commit dc46b35687.

* Revert "Try reverting commit and see if CI changes"

This reverts commit 2514ed07ef.

Co-authored-by: Adriane Boyd <adrianeboyd@gmail.com>
2022-11-03 10:52:59 +01:00
richard@explosion.ai
deba504173 Add FNV1A conformity tests 2022-11-03 10:19:38 +01:00
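
The commit above does not show the tests themselves. As a rough illustration of what a conformity check for 64-bit FNV-1a could look like, here is a minimal pure-Python sketch. The function name `fnv1a_64` is hypothetical and this is not the implementation used in the branch; the offset basis and prime are the standard published FNV constants.

```python
FNV64_OFFSET_BASIS = 0xCBF29CE484222325
FNV64_PRIME = 0x100000001B3
MASK64 = 0xFFFFFFFFFFFFFFFF


def fnv1a_64(data: bytes) -> int:
    """Return the 64-bit FNV-1a hash of data."""
    h = FNV64_OFFSET_BASIS
    for byte in data:
        # FNV-1a XORs the byte in first, then multiplies by the prime.
        h ^= byte
        # Python ints are unbounded, so wrap to 64 bits explicitly.
        h = (h * FNV64_PRIME) & MASK64
    return h
```

A conformity test would compare the output against published reference values: the empty input hashes to the offset basis itself, and `b"a"` hashes to `0xAF63DC4C8601EC8C`.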
Adriane Boyd
1211552f0e
Modernize and simplify CI steps (#11738)
* Use `build` instead of `python setup.py sdist`
* Remove in-place build with `setup.py`
* Remove `gpu` parameter and GPU tests
* Keep `architecture` and `num_build_jobs` in azure steps with CI
  defaults
* Fix use of `num_build_jobs` parameters
* Remove now-unused `prefix` parameter
* Test imports and CLI before installing test requirements
  * Remove `*.egg-info` directory in addition to source directory for a
    warning-free `import spacy`
* Switch `thinc-apple-ops` test to python 3.11 (as most recent python
  that is tested across platforms)
2022-11-03 09:29:46 +01:00
richard@explosion.ai
557799358c Switch to FNV1A hashing 2022-11-02 20:04:43 +01:00
richard@explosion.ai
e7626f423a Generate Numpy array at end 2022-11-02 17:11:20 +01:00
Ryn Daniels
2fb7e4dc74
More version updates for github action deprecation warnings (#11705)
* More version updates for github action deprecation warnings

* fix the deprecated set-output commands

* bump explosion-bot to run on ubuntu-latest
2022-11-02 15:36:30 +01:00
Adriane Boyd
420b1d854b
Update textcat scorer threshold behavior (#11696)
* Update textcat scorer threshold behavior

For `textcat` (with exclusive classes) the scorer should always use a
threshold of 0.0 because there should be one predicted label per doc and
the numeric score for that particular label should not matter.

* Rename to test_textcat_multilabel_threshold

* Remove all uses of threshold for multi_label=False

* Update Scorer.score_cats API docs

* Add tests for score_cats with thresholds

* Update textcat API docs

* Fix types

* Convert threshold back to float

* Fix threshold type in docstring

* Improve formatting in Scorer API docs
2022-11-02 15:35:04 +01:00
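
The reasoning in the commit message above (one predicted label per doc for exclusive classes, so the numeric score of the winning label does not matter) can be sketched as follows. This is not the `Scorer.score_cats` implementation; `predict_labels` is a hypothetical helper, and the default threshold of 0.5 for the multilabel case is an assumption made for the example.

```python
from typing import Dict, Set


def predict_labels(
    cats: Dict[str, float], multi_label: bool, threshold: float = 0.5
) -> Set[str]:
    """Turn per-label scores into a set of predicted labels."""
    if not multi_label:
        # Exclusive classes: always exactly one label, the highest-scoring
        # one, however low its score is. No threshold is applied.
        return {max(cats, key=cats.get)}
    # Multilabel: each label is decided independently against the threshold.
    return {label for label, score in cats.items() if score >= threshold}


scores = {"SPORTS": 0.40, "POLITICS": 0.35, "TECH": 0.25}
exclusive = predict_labels(scores, multi_label=False)
multilabel = predict_labels(scores, multi_label=True)
```

With these scores the exclusive case still predicts `SPORTS`, although no score reaches 0.5, while the multilabel case predicts nothing.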
Adriane Boyd
f7edd84b44
Switch CI to Python 3.11.0 (#11737) 2022-11-02 13:42:20 +01:00
richardpaulhudson
bbf058029a Intermediate state 2022-11-01 20:46:55 +01:00
richardpaulhudson
2552340fb8 Get rid of memory views 2022-11-01 14:05:35 +01:00
Aaron Zipp
d25f09468c
Spelling mistake in rule-based-matching.md (#11717)
Changed retokenize to retokenizer
2022-10-31 13:27:12 +09:00
richardpaulhudson
749da9d348 Speed improvements 2022-10-28 14:42:42 +02:00
richardpaulhudson
217ff36559 Tests passing again after refactoring 2022-10-28 13:31:14 +02:00
Paul O'Leary McCann
d61e742960
Handle Docs with no entities in EntityLinker (#11640)
* Handle docs with no entities

If a whole batch contains no entities it won't make it to the model, but
it's possible for individual Docs to have no entities. Before this
commit, those Docs would cause an error when attempting to concatenate
arrays because the dimensions didn't match.

It turns out the process of preparing the Ragged at the end of the span
maker forward was a little different from list2ragged, which just uses
the flatten function directly. Letting list2ragged do the conversion
avoids the dimension issue.

This did not come up before because in NEL demo projects it's typical
for data with no entities to be discarded before it reaches the NEL
component.

This includes a simple direct test that shows the issue and checks it's
resolved. It doesn't check if there are any downstream changes, so a
more complete test could be added. A full run was tested by adding an
example with no entities to the Emerson sample project.

* Add a blank instance to default training data in tests

Rather than adding a specific test, since not failing on instances with
no entities is basic functionality, it makes sense to add it to the
default set.

* Fix without modifying architecture

If the architecture is modified this would have to be a new version, but
this change isn't big enough to merit that.
2022-10-28 10:25:34 +02:00
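
As background for the commit message above, a ragged structure stores all rows in one flat sequence plus a list of per-doc lengths, so a doc with no entities is simply an entry of length zero. The sketch below shows that layout in plain Python. It is not Thinc's `list2ragged` or the span maker from the commit; `to_ragged` and `from_ragged` are hypothetical helpers written for illustration.

```python
from typing import List, Sequence, Tuple

Row = Tuple[float, ...]


def to_ragged(docs: Sequence[Sequence[Row]]) -> Tuple[List[Row], List[int]]:
    """Flatten per-doc rows into one list and record each doc's length."""
    data: List[Row] = []
    lengths: List[int] = []
    for rows in docs:
        data.extend(rows)          # an empty doc adds no rows
        lengths.append(len(rows))  # but still gets a length entry of 0
    return data, lengths


def from_ragged(data: Sequence[Row], lengths: Sequence[int]) -> List[List[Row]]:
    """Rebuild the per-doc lists from the flat data and the lengths."""
    out: List[List[Row]] = []
    start = 0
    for n in lengths:
        out.append(list(data[start:start + n]))
        start += n
    return out


# Three docs; the middle one has no entities.
docs = [[(1.0, 2.0), (3.0, 4.0)], [], [(5.0, 6.0)]]
data, lengths = to_ragged(docs)
```

Because the empty doc contributes a length of 0 instead of a differently shaped empty array, the round trip through `from_ragged(data, lengths)` restores all three docs, including the empty one.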
richardpaulhudson
5d151b4abe Correction 2022-10-27 21:05:22 +02:00
richardpaulhudson
13e417e8d1 Intermediate state 2022-10-27 20:59:30 +02:00
richardpaulhudson
c140bd6083 Correction 2022-10-27 18:19:19 +02:00
richardpaulhudson
a1b8697aab Changes after review discussion — intermed. state 2022-10-27 18:03:25 +02:00
Paul O'Leary McCann
6b78135b9e
Add warning to install widget for M1 GPUs (#11666)
* Add warning to install widget for M1 GPUs

* Use Thinc tracking issue instead

* Update website/src/widgets/quickstart-install.js

Co-authored-by: Adriane Boyd <adrianeboyd@gmail.com>

* Underline URL in warning

* Update website/src/widgets/quickstart-install.js

Co-authored-by: Adriane Boyd <adrianeboyd@gmail.com>

* Don't install cupy on m1 gpus

Co-authored-by: Adriane Boyd <adrianeboyd@gmail.com>
2022-10-27 15:08:24 +02:00
Adriane Boyd
865691d169
Adjust default attrs for textcat configs (#11698) 2022-10-26 08:43:00 +02:00
Ryn Daniels
a9139907a9
update github actions to deal with deprecations (#11702) 2022-10-26 08:15:13 +02:00
Adriane Boyd
0a9859ba01
Reduce python 3.10 in CI to one OS (#11703) 2022-10-25 19:38:23 +02:00
Adriane Boyd
8740e4341f
Update languages and version in README and website (#11694) 2022-10-25 14:54:54 +02:00
Adriane Boyd
88d35450dc
Rename test helper method with non-test_ name (#11701) 2022-10-25 14:53:18 +02:00
richardpaulhudson
7d8258bec8 Correct documentation 2022-10-21 14:35:40 +02:00
richardpaulhudson
100d66a052 Fix error codes 2022-10-21 12:48:03 +02:00
Richard Hudson
34e8bc620d
Merge branch 'master' into feature/etl 2022-10-21 12:46:02 +02:00
richardpaulhudson
42b7b8d509 Major refactoring 2022-10-21 12:01:24 +02:00
github-actions[bot]
84d9cb6b38
Auto-format code with black (#11687)
Co-authored-by: explosion-bot <explosion-bot@users.noreply.github.com>
2022-10-21 11:54:17 +02:00
richardpaulhudson
f7d9942e7c Intermediate state 2022-10-20 21:48:53 +02:00
Adriane Boyd
fb280001cc
Merge pull request #11678 from adrianeboyd/chore/update-develop-from-master-v3.5
Update develop from master before v3.5
2022-10-20 15:45:19 +02:00