Commit Graph

15871 Commits

Author SHA1 Message Date
github-actions[bot]
89bfd06fbd
Auto-format code with black (#11826)
Co-authored-by: explosion-bot <explosion-bot@users.noreply.github.com>
2022-11-18 18:24:13 +09:00
Paul O'Leary McCann
e3173bd86d
Remove spikex from Universe (#11825) 2022-11-18 08:24:22 +01:00
Adriane Boyd
a83463c5e0
Add transformer recommendation for ca (#11819)
Model recommendation from @cayorodriguez.
2022-11-18 08:15:27 +01:00
Paul O'Leary McCann
75bb7ad541
Check textcat values for validity (#11763)
* Check textcat values for validity

* Fix error numbers

* Clean up vals reference

* Check category value validity through training

The _validate_categories is called in update, which for multilabel is
inherited from the single label component.

* Formatting
2022-11-17 10:25:01 +01:00
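A rough sketch of the kind of check this commit describes. The log does not say which values count as valid, so the rule used here (each category value must be 0 or 1) is an assumption, and `validate_categories` is a hypothetical stand-in for the `_validate_categories` method named above rather than its real implementation.

```python
from typing import Dict, Iterable


def validate_categories(all_cats: Iterable[Dict[str, float]]) -> None:
    """Raise ValueError if any category value is not 0 or 1.

    Assumption: valid values are exactly 0 and 1 (ints or floats).
    """
    for cats in all_cats:
        for label, value in cats.items():
            if value not in (0, 1):
                raise ValueError(
                    f"Category {label!r} has value {value!r}; expected 0 or 1."
                )
```

Because the multilabel component inherits `update` from the single-label one (as the message notes), a check placed there would cover both.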
Adriane Boyd
317b6ef99c
Update to mypy 0.990 (#11801) 2022-11-16 14:09:10 +01:00
Paul O'Leary McCann
c0c54e44bc
Add equality definition for vectors (#11806)
* Add equality definition for vectors

This re-uses the check from sourcing components.

* Use the equality check

* Format

Co-authored-by: Adriane Boyd <adrianeboyd@gmail.com>
2022-11-16 09:44:42 +01:00
Sofie Van Landeghem
caa9efad59
prevent rewriting an already raw URL (#11810) 2022-11-15 14:15:00 +01:00
Denis Bezykornov
7e684ad691
Update russian tokenizer exceptions (#11753)
* Fix typos, add couple of new abbreviations, remove nonbreaking spaces

* Remove space from abbreviation

Co-authored-by: Adriane Boyd <adrianeboyd@gmail.com>
2022-11-15 11:37:25 +01:00
Peter Baumgartner
9baa686f82
remove migration support form (#11802) 2022-11-14 16:53:14 +01:00
Paul O'Leary McCann
bb523d4d91
Remove spacy-ray from docs (#11781)
* Remove spacy ray from cli docs

* Remove more ray docs

* Remove ray from universe
2022-11-14 19:58:38 +09:00
Edward
3478ff1eb0
remove new v2 tags (#11780) 2022-11-14 17:41:01 +09:00
github-actions[bot]
188a7d00eb
Auto-format code with black (#11792)
Co-authored-by: explosion-bot <explosion-bot@users.noreply.github.com>
2022-11-11 09:58:31 +01:00
richardpaulhudson
ec1426700e Avoid memcpy by writing directly to numpy data buf 2022-11-11 08:45:58 +01:00
richardpaulhudson
42f8563d0d Remove unnecessary variable definition 2022-11-10 11:40:19 +01:00
richardpaulhudson
5b29568fb7 Fix wild pointer problem 2022-11-10 11:37:03 +01:00
Jacobo Myerston
322b5dc1df
Add greCy to Universe (#11774)
* Update universe.json

* Update universe.json

fixes Github value
2022-11-10 13:21:20 +09:00
richardpaulhudson
54bdc11353 Merge branch 'master' of https://github.com/explosion/spaCy into feature/etl 2022-11-09 12:24:36 +01:00
richardpaulhudson
999c0fc6c6 Format with black 2022-11-09 11:43:17 +01:00
richardpaulhudson
6a5b671261 Add full stop 2022-11-09 11:41:52 +01:00
richardpaulhudson
35d0c217d2 Final touches 2022-11-09 11:40:54 +01:00
Adriane Boyd
03eebe9d1c
Update warning, add tests for project requirements check (#11777)
* Update warning, add tests for project requirements check

* Make warning more general for differences between PEP 508 and pip
* Add tests for _check_requirements

* Parameterize test
2022-11-09 10:59:28 +01:00
Raphael Mitsch
20bbbe3e44
Revert disable/disabled merging behavior (#11745)
* Merge disable with disabled. Adjust warnings, errors and tests.

* Replace any() with set operation.

* Update spacy/tests/pipeline/test_pipe_methods.py

Co-authored-by: Adriane Boyd <adrianeboyd@gmail.com>

* Update docs.

* Remove reference to config entry nlp.enabled from docs.

Co-authored-by: Adriane Boyd <adrianeboyd@gmail.com>
2022-11-08 14:58:10 +01:00
Adriane Boyd
2e3cfd758e
Use python 3.10 for GHA universe alert (#11768) 2022-11-08 12:46:19 +09:00
Adriane Boyd
e116395f89
Add fallback in requirements check, only check once (#11735)
* Add fallback in requirements check, only check once

* Rename to skip_requirements_check

* Update spacy/cli/project/run.py

Co-authored-by: Paul O'Leary McCann <polm@dampfkraft.com>

Co-authored-by: Paul O'Leary McCann <polm@dampfkraft.com>
2022-11-07 14:46:08 +01:00
Adriane Boyd
6105f20d8a
Switch CI to python 3.11 (#11765) 2022-11-07 13:25:40 +01:00
Adriane Boyd
e91b47a226
Check for unsafe paths in tarfile.extractall (CVE-2007-4559) (#11746)
* Adding tarfile member sanitization to extractall()

* Format

* Simplify and add error message

* Fix import

* Add comment about CVE

Co-authored-by: TrellixVulnTeam <charles.mcfarland@trellix.com>
2022-11-07 10:43:34 +01:00
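The member sanitization mentioned in this commit can be sketched as below. This is a minimal illustration of the general technique, not the code from the commit; the helper names `is_within_directory` and `safe_extract` are assumptions. It only inspects member names and does not resolve symlinks inside the archive.

```python
import os
import tarfile


def is_within_directory(directory: str, target: str) -> bool:
    """Return True if `target` lies inside `directory` after normalization."""
    abs_directory = os.path.abspath(directory)
    abs_target = os.path.abspath(target)
    return os.path.commonpath([abs_directory, abs_target]) == abs_directory


def safe_extract(tar: tarfile.TarFile, path: str) -> None:
    """Extract all members, refusing any whose name would escape `path`."""
    for member in tar.getmembers():
        member_path = os.path.join(path, member.name)
        if not is_within_directory(path, member_path):
            raise ValueError(f"Unsafe path in tar file: {member.name}")
    # Only reached if every member name stays under `path`.
    tar.extractall(path)
```

A member named `../evil.txt` is rejected before anything is written, which is the path traversal that CVE-2007-4559 describes.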
Paul O'Leary McCann
b76222e56a
Raise Typer limit (#11720)
* Raise typer limit to <0.7.0

* Raise limit to <0.8.0
2022-11-07 08:11:55 +01:00
Adriane Boyd
ea326cf47d
Fix types for Span.id and Span.id_ (#11744) 2022-11-07 08:11:13 +01:00
richardpaulhudson
a972791c9a Removed extraneous import 2022-11-04 17:47:04 +01:00
richardpaulhudson
6e069c91f6 Correct .pyi file 2022-11-04 12:50:07 +01:00
richardpaulhudson
28a93fd3e3 Another correction 2022-11-04 12:44:22 +01:00
richardpaulhudson
8d703963d3 Correct error 2022-11-04 12:40:03 +01:00
richardpaulhudson
f97d6e6826 Updated example config 2022-11-04 12:36:14 +01:00
richardpaulhudson
dcfc810033 Remove extraneous import 2022-11-04 11:31:18 +01:00
github-actions[bot]
bbf64cfc43
Auto-format code with black (#11749)
Co-authored-by: explosion-bot <explosion-bot@users.noreply.github.com>
2022-11-04 11:17:43 +01:00
richardpaulhudson
750628a623 Fix mypy problem 2022-11-04 11:00:33 +01:00
richardpaulhudson
f0dc60691a Switch to 64-bit hashes 2022-11-04 10:17:25 +01:00
richardpaulhudson
7f1873ad81 Everything working after refactoring 2022-11-04 09:33:06 +01:00
richardpaulhudson
5d210a0f3b Tidy up code 2022-11-03 21:26:47 +01:00
richardpaulhudson
aaaed55459 Save end_search_idx in variable 2022-11-03 21:06:37 +01:00
richard@explosion.ai
5d32dd6246 Intermediate state 2022-11-03 20:54:07 +01:00
richard@explosion.ai
7db2770c05 Intermediate state 2022-11-03 15:23:50 +01:00
richard@explosion.ai
b462f85a73 Correction 2022-11-03 13:37:53 +01:00
Adriane Boyd
40e1000db0
Restore Doc attr getter values in Doc.to_json (#11700) 2022-11-03 11:49:08 +01:00
richard@explosion.ai
c7a960f19e Performance improvement 2022-11-03 11:17:07 +01:00
Paul O'Leary McCann
db56600536
Fix default parameters for load functions (fix #11706) (#11713)
* Fix default parameters for load functions

Some load functions used SimpleFrozenList() directly instead of the
_DEFAULT_EMPTY_PIPES parameter. That mostly worked as intended, but
the changes in #11459 check for equality using identity, not value, so a
warning is incorrectly raised sometimes, as in #11706.

This change just has all the load functions use the singleton value
instead.

* Add test that there are no warnings on module-based load

This will succeed due to changes in this branch, but local tests with
the latest release failed as intended.

* Try reverting commit and see if CI changes

There is an error in CI that is probably unrelated.

Revert "Fix default parameters for load functions"

This reverts commit dc46b35687.

* Revert "Try reverting commit and see if CI changes"

This reverts commit 2514ed07ef.

Co-authored-by: Adriane Boyd <adrianeboyd@gmail.com>
2022-11-03 10:52:59 +01:00
richard@explosion.ai
deba504173 Add FNV1A conformity tests 2022-11-03 10:19:38 +01:00
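For reference, a conformity test of this kind compares an implementation against published FNV-1a test vectors. The sketch below assumes the 64-bit variant (a later commit in this list switches to 64-bit hashes); the function name `fnv1a_64` is hypothetical and this pure-Python version is for illustration only, not the code under test in the commit.

```python
FNV64_OFFSET_BASIS = 0xCBF29CE484222325
FNV64_PRIME = 0x100000001B3
MASK_64 = 0xFFFFFFFFFFFFFFFF


def fnv1a_64(data: bytes) -> int:
    """Compute the 64-bit FNV-1a hash of `data`."""
    h = FNV64_OFFSET_BASIS
    for byte in data:
        h ^= byte                        # FNV-1a: XOR first ...
        h = (h * FNV64_PRIME) & MASK_64  # ... then multiply, wrapping at 64 bits
    return h
```

Hashing the empty input returns the offset basis unchanged, and `fnv1a_64(b"a")` gives `0xAF63DC4C8601EC8C`, two values that a conformity test could check.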
Adriane Boyd
1211552f0e
Modernize and simplify CI steps (#11738)
* Use `build` instead of `python setup.py sdist`
* Remove in-place build with `setup.py`
* Remove `gpu` parameter and GPU tests
* Keep `architecture` and `num_build_jobs` in azure steps with CI
  defaults
* Fix use of `num_build_jobs` parameters
* Remove now-unused `prefix` parameter
* Test imports and CLI before installing test requirements
  * Remove `*.egg-info` directory in addition to source directory for a
    warning-free `import spacy`
* Switch `thinc-apple-ops` test to python 3.11 (as most recent python
  that is tested across platforms)
2022-11-03 09:29:46 +01:00
richard@explosion.ai
557799358c Switch to FNV1A hashing 2022-11-02 20:04:43 +01:00
richard@explosion.ai
e7626f423a Generate Numpy array at end 2022-11-02 17:11:20 +01:00