15759 Commits

Author SHA1 Message Date
richardpaulhudson
79b2843a3f Simple changes based on review comments 2022-12-12 11:10:10 +01:00
richardpaulhudson
ec1426700e Avoid memcpy by writing directly to numpy data buf 2022-11-11 08:45:58 +01:00
richardpaulhudson
42f8563d0d Remove unnecessary variable definition 2022-11-10 11:40:19 +01:00
richardpaulhudson
5b29568fb7 Fix wild pointer problem 2022-11-10 11:37:03 +01:00
richardpaulhudson
54bdc11353 Merge branch 'master' of https://github.com/explosion/spaCy into feature/etl 2022-11-09 12:24:36 +01:00
richardpaulhudson
999c0fc6c6 Format with black 2022-11-09 11:43:17 +01:00
richardpaulhudson
6a5b671261 Add full stop 2022-11-09 11:41:52 +01:00
richardpaulhudson
35d0c217d2 Final touches 2022-11-09 11:40:54 +01:00
Adriane Boyd
03eebe9d1c
Update warning, add tests for project requirements check (#11777)
* Update warning, add tests for project requirements check

* Make warning more general for differences between PEP 508 and pip
* Add tests for _check_requirements

* Parameterize test
2022-11-09 10:59:28 +01:00
Raphael Mitsch
20bbbe3e44
Revert disable/disabled merging behavior (#11745)
* Merge disable with disabled. Adjust warnings, errors and tests.

* Replace any() with set operation.

* Update spacy/tests/pipeline/test_pipe_methods.py

* Update docs.

* Remove reference to config entry nlp.enabled from docs.

Co-authored-by: Adriane Boyd <adrianeboyd@gmail.com>
2022-11-08 14:58:10 +01:00
Adriane Boyd
2e3cfd758e
Use python 3.10 for GHA universe alert (#11768) 2022-11-08 12:46:19 +09:00
Adriane Boyd
e116395f89
Add fallback in requirements check, only check once (#11735)
* Add fallback in requirements check, only check once

* Rename to skip_requirements_check

* Update spacy/cli/project/run.py

Co-authored-by: Paul O'Leary McCann <polm@dampfkraft.com>
2022-11-07 14:46:08 +01:00
Adriane Boyd
6105f20d8a
Switch CI to python 3.11 (#11765) 2022-11-07 13:25:40 +01:00
Adriane Boyd
e91b47a226
Check for unsafe paths in tarfile.extractall (CVE-2007-4559) (#11746)
* Adding tarfile member sanitization to extractall()

* Format

* Simplify and add error message

* Fix import

* Add comment about CVE

Co-authored-by: TrellixVulnTeam <charles.mcfarland@trellix.com>
2022-11-07 10:43:34 +01:00
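
The member sanitization described in e91b47a226 can be illustrated with a minimal sketch. This is not the code from that commit: the helper names `is_within_directory` and `safe_extract` are assumptions made for illustration, and the check only covers path traversal through member names (it does not inspect symlink targets).

```python
import os
import tarfile


def is_within_directory(directory, target):
    """Return True if `target` resolves to a location inside `directory`."""
    abs_directory = os.path.abspath(directory)
    abs_target = os.path.abspath(target)
    # commonpath compares whole path components, so "/data/x" does not
    # count as a parent of "/data/xy".
    return os.path.commonpath([abs_directory, abs_target]) == abs_directory


def safe_extract(tar, path):
    """Extract `tar` into `path`, refusing members that would escape it.

    Hypothetical helper: every member name is checked before anything
    is written to disk.
    """
    for member in tar.getmembers():
        member_path = os.path.join(path, member.name)
        if not is_within_directory(path, member_path):
            raise ValueError(f"Unsafe path in tar archive: {member.name}")
    tar.extractall(path)
```

A member named `../evil.txt` is rejected before extraction starts, while ordinary relative names pass through to `extractall` unchanged. Newer Python versions also accept a `filter` argument on `extractall`, which may be preferable where it is available.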
Paul O'Leary McCann
b76222e56a
Raise Typer limit (#11720)
* Raise typer limit to <0.7.0

* Raise limit to <0.8.0
2022-11-07 08:11:55 +01:00
Adriane Boyd
ea326cf47d
Fix types for Span.id and Span.id_ (#11744) 2022-11-07 08:11:13 +01:00
richardpaulhudson
a972791c9a Removed extraneous import 2022-11-04 17:47:04 +01:00
richardpaulhudson
6e069c91f6 Correct .pyi file 2022-11-04 12:50:07 +01:00
richardpaulhudson
28a93fd3e3 Another correction 2022-11-04 12:44:22 +01:00
richardpaulhudson
8d703963d3 Correct error 2022-11-04 12:40:03 +01:00
richardpaulhudson
f97d6e6826 Updated example config 2022-11-04 12:36:14 +01:00
richardpaulhudson
dcfc810033 Remove extraneous import 2022-11-04 11:31:18 +01:00
github-actions[bot]
bbf64cfc43
Auto-format code with black (#11749)
Co-authored-by: explosion-bot <explosion-bot@users.noreply.github.com>
2022-11-04 11:17:43 +01:00
richardpaulhudson
750628a623 Fix mypy problem 2022-11-04 11:00:33 +01:00
richardpaulhudson
f0dc60691a Switch to 64-bit hashes 2022-11-04 10:17:25 +01:00
richardpaulhudson
7f1873ad81 Everything working after refactoring 2022-11-04 09:33:06 +01:00
richardpaulhudson
5d210a0f3b Tidy up code 2022-11-03 21:26:47 +01:00
richardpaulhudson
aaaed55459 Save end_search_idx in variable 2022-11-03 21:06:37 +01:00
richard@explosion.ai
5d32dd6246 Intermediate state 2022-11-03 20:54:07 +01:00
richard@explosion.ai
7db2770c05 Intermediate state 2022-11-03 15:23:50 +01:00
richard@explosion.ai
b462f85a73 Correction 2022-11-03 13:37:53 +01:00
Adriane Boyd
40e1000db0
Restore Doc attr getter values in Doc.to_json (#11700) 2022-11-03 11:49:08 +01:00
richard@explosion.ai
c7a960f19e Performance improvement 2022-11-03 11:17:07 +01:00
Paul O'Leary McCann
db56600536
Fix default parameters for load functions (fix #11706) (#11713)
* Fix default parameters for load functions

Some load functions used SimpleFrozenList() directly instead of the
_DEFAULT_EMPTY_PIPES parameter. That mostly worked as intended, but
the changes in #11459 check for equality using identity, not value, so a
warning is incorrectly raised sometimes, as in #11706.

This change just has all the load functions use the singleton value
instead.

* Add test that there are no warnings on module-based load

This will succeed due to changes in this branch, but local tests with
the latest release failed as intended.

* Try reverting commit and see if CI changes

There is an error in CI that is probably unrelated.

Revert "Fix default parameters for load functions"

This reverts commit dc46b35687.

* Revert "Try reverting commit and see if CI changes"

This reverts commit 2514ed07ef.

Co-authored-by: Adriane Boyd <adrianeboyd@gmail.com>
2022-11-03 10:52:59 +01:00
richard@explosion.ai
deba504173 Add FNV1A conformity tests 2022-11-03 10:19:38 +01:00
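
For reference, the published FNV-1a algorithm is small enough to sketch in pure Python. The function name `fnv1a_64` is an assumption for illustration, and the 64-bit parameters are chosen to match the later "Switch to 64-bit hashes" commit; the branch's actual implementation is not shown in this log and may differ.

```python
FNV64_OFFSET_BASIS = 0xCBF29CE484222325
FNV64_PRIME = 0x100000001B3
MASK64 = 0xFFFFFFFFFFFFFFFF


def fnv1a_64(data: bytes) -> int:
    """64-bit FNV-1a: XOR each byte into the hash, then multiply by the prime."""
    h = FNV64_OFFSET_BASIS
    for byte in data:
        h ^= byte
        h = (h * FNV64_PRIME) & MASK64  # keep the result within 64 bits
    return h
```

A conformity test in this style compares the output against known vectors: the empty input hashes to the offset basis, and `fnv1a_64(b"a")` is `0xAF63DC4C8601EC8C`.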
Adriane Boyd
1211552f0e
Modernize and simplify CI steps (#11738)
* Use `build` instead of `python setup.py sdist`
* Remove in-place build with `setup.py`
* Remove `gpu` parameter and GPU tests
* Keep `architecture` and `num_build_jobs` in azure steps with CI
  defaults
* Fix use of `num_build_jobs` parameters
* Remove now-unused `prefix` parameter
* Test imports and CLI before installing test requirements
  * Remove `*.egg-info` directory in addition to source directory for a
    warning-free `import spacy`
* Switch `thinc-apple-ops` test to python 3.11 (as most recent python
  that is tested across platforms)
2022-11-03 09:29:46 +01:00
richard@explosion.ai
557799358c Switch to FNV1A hashing 2022-11-02 20:04:43 +01:00
richard@explosion.ai
e7626f423a Generate Numpy array at end 2022-11-02 17:11:20 +01:00
Ryn Daniels
2fb7e4dc74
More version updates for github action deprecation warnings (#11705)
* More version updates for github action deprecation warnings

* fix the deprecated set-output commands

* bump explosion-bot to run on ubuntu-latest
2022-11-02 15:36:30 +01:00
Adriane Boyd
420b1d854b
Update textcat scorer threshold behavior (#11696)
* Update textcat scorer threshold behavior

For `textcat` (with exclusive classes) the scorer should always use a
threshold of 0.0 because there should be one predicted label per doc and
the numeric score for that particular label should not matter.

* Rename to test_textcat_multilabel_threshold

* Remove all uses of threshold for multi_label=False

* Update Scorer.score_cats API docs

* Add tests for score_cats with thresholds

* Update textcat API docs

* Fix types

* Convert threshold back to float

* Fix threshold type in docstring

* Improve formatting in Scorer API docs
2022-11-02 15:35:04 +01:00
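
The reasoning in 420b1d854b (one predicted label per doc for exclusive classes, so the score itself should not matter) can be sketched as follows. This is not the `Scorer.score_cats` implementation: the function `predicted_labels` and its parameters are hypothetical and only illustrate the distinction.

```python
def predicted_labels(cats, multi_label, threshold=0.5):
    """Return the set of labels treated as predicted for one doc.

    `cats` maps label -> score. Hypothetical helper for illustration only.
    """
    if not cats:
        return set()
    if not multi_label:
        # Exclusive classes: exactly one label wins, whatever its score,
        # so the threshold is ignored.
        return {max(cats, key=cats.get)}
    # Multilabel: each label is judged independently against the threshold.
    return {label for label, score in cats.items() if score >= threshold}
```

With scores such as `{"A": 0.4, "B": 0.35, "C": 0.25}`, the exclusive case still yields `{"A"}` even though no score reaches 0.5, whereas the multilabel case yields an empty set at that threshold.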
Adriane Boyd
f7edd84b44
Switch CI to Python 3.11.0 (#11737) 2022-11-02 13:42:20 +01:00
richardpaulhudson
bbf058029a Intermediate state 2022-11-01 20:46:55 +01:00
richardpaulhudson
2552340fb8 Get rid of memory views 2022-11-01 14:05:35 +01:00
Aaron Zipp
d25f09468c
Spelling mistake in rule-based-matching.md (#11717)
Changed retokenize to retokenizer
2022-10-31 13:27:12 +09:00
richardpaulhudson
749da9d348 Speed improvements 2022-10-28 14:42:42 +02:00
richardpaulhudson
217ff36559 Tests passing again after refactoring 2022-10-28 13:31:14 +02:00
Paul O'Leary McCann
d61e742960
Handle Docs with no entities in EntityLinker (#11640)
* Handle docs with no entities

If a whole batch contains no entities it won't make it to the model, but
it's possible for individual Docs to have no entities. Before this
commit, those Docs would cause an error when attempting to concatenate
arrays because the dimensions didn't match.

It turns out the process of preparing the Ragged at the end of the span
maker forward was a little different from list2ragged, which just uses
the flatten function directly. Letting list2ragged do the conversion
avoids the dimension issue.

This did not come up before because in NEL demo projects it's typical
for data with no entities to be discarded before it reaches the NEL
component.

This includes a simple direct test that shows the issue and checks it's
resolved. It doesn't check if there are any downstream changes, so a
more complete test could be added. A full run was tested by adding an
example with no entities to the Emerson sample project.

* Add a blank instance to default training data in tests

Rather than adding a specific test, since not failing on instances with
no entities is basic functionality, it makes sense to add it to the
default set.

* Fix without modifying architecture

If the architecture is modified this would have to be a new version, but
this change isn't big enough to merit that.
2022-10-28 10:25:34 +02:00
richardpaulhudson
5d151b4abe Correction 2022-10-27 21:05:22 +02:00
richardpaulhudson
13e417e8d1 Intermediate state 2022-10-27 20:59:30 +02:00
richardpaulhudson
c140bd6083 Correction 2022-10-27 18:19:19 +02:00