Commit Graph

260 Commits

Author SHA1 Message Date
svlandeg
c27679f210 Merge branch 'master' into feat/update_v4 2024-05-14 17:42:48 +02:00
Sofie Van Landeghem
ecd85d2618
Update Typer pin and GH actions (#13471)
* update gh actions

* pin typer upperbound to 1.0.0
2024-04-29 13:28:46 +02:00
Sofie Van Landeghem
6d6c10ab9c
Fix CI (#13469)
* Remove hardcoded architecture setting

* update classifiers to include Python 3.12
2024-04-29 10:18:07 +02:00
Daniël de Kok
f5918d4353
Update to Thinc 9.0.0 and set version to 4.0.0.dev3 (#13448)
* Update to Thinc 9.0.0 and set version to 4.0.0.dev3

* Set minimum Python version to 3.9
2024-04-22 09:40:55 +02:00
Daniël de Kok
5bd141013b
Remove apple from extras (#13439)
Account for merging of `thinc-apple-ops` into `thinc`.
2024-04-17 13:43:27 +02:00
Sofie Van Landeghem
f5e85fa05a
allow weasel 0.4.x (#13409) 2024-04-04 12:55:08 +02:00
Sofie Van Landeghem
d410d95b52
remove smart_open requirement as it's taken care of via Weasel (#13391) 2024-03-22 18:21:20 +01:00
Daniël de Kok
70e2f2a14a
Update spacy-legacy dependency to 4.0.0.dev1 (#13270)
This release is compatible with the parser refactor backout.
2024-01-25 18:24:22 +01:00
Daniël de Kok
36ee709390 Remove setup_requires from setup.cfg 2024-01-24 15:02:02 +01:00
Daniël de Kok
81beaea70e Merge remote-tracking branch 'upstream/master' into maintenance/v4-merge-master-20240119 2024-01-19 12:34:29 +01:00
Daniël de Kok
7351f6bbeb Update thinc dependency to 9.0.0.dev4 2024-01-16 15:56:09 +01:00
Daniël de Kok
e2a3952de5
Add spacy.TextCatParametricAttention.v1 (#13201)
* Add spacy.TextCatParametricAttention.v1

This layer provides is a simplification of the ensemble classifier that
only uses paramteric attention. We have found empirically that with a
sufficient amount of training data, using the ensemble classifier with
BoW does not provide significant improvement in classifier accuracy.
However, plugging in a BoW classifier does reduce GPU training and
inference performance substantially, since it uses a GPU-only kernel.

* Fix merge fallout
2024-01-02 10:03:06 +01:00
Adriane Boyd
6e54360a3d
Remove pathy dependency, update docs for cloudpathlib in Weasel (#13035) 2023-10-05 08:50:22 +02:00
Matthew Hoffman
483d4a5bc0
Allow spacy-transformers v1.3.x in transformers extra (#13025)
Co-authored-by: Adriane Boyd <adrianeboyd@gmail.com>
2023-09-29 08:22:56 +02:00
Adriane Boyd
406794a081 Merge remote-tracking branch 'upstream/master' into chore/update-develop-from-master-v3.7-1 2023-09-28 15:09:06 +02:00
Adriane Boyd
ff4215f1c7
Drop support for python 3.6 (#13009)
* Drop support for python 3.6

* Update docs
2023-09-25 14:48:38 +02:00
Adriane Boyd
198488ee86
Extend to weasel v0.3 (#12908)
* Extend to weasel v0.3

* Clean up unused imports in test_cli
2023-08-16 17:36:53 +02:00
Adriane Boyd
6a4aa43164
Extend to thinc v8.2 (#12897) 2023-08-11 13:05:46 +02:00
Adriane Boyd
9622c11529
Extend to weasel v0.2 (#12902) 2023-08-11 10:59:51 +02:00
Adriane Boyd
060241a8d5 Revert "Extend to spacy-transformers v1.3.x (#12877)"
This reverts commit e5773e0c69.
2023-08-10 11:42:09 +02:00
Adriane Boyd
c4e378df97
Update CuPy extras (#12890)
* Add `cuda12x` for `cupy-cuda12x`.
* Drop `cuda-autodetect` from quickstart, set default to `cuda11x`
instead.
2023-08-08 12:58:28 +02:00
Adriane Boyd
245e2ddc25
Allow pydantic v2 using transitional v1 support (#12888) 2023-08-08 11:27:28 +02:00
Adriane Boyd
e5773e0c69
Extend to spacy-transformers v1.3.x (#12877) 2023-08-02 09:35:16 +02:00
Adriane Boyd
9ffa5d8a15
Remove ray extra (#12870) 2023-07-28 15:48:36 +02:00
Adriane Boyd
5888afa884
Update numpy build constraints for numpy 1.25 (#12839)
* Update numpy build constraints for numpy 1.25

Starting in numpy 1.25 (see
https://github.com/numpy/numpy/releases/tag/v1.25.0), the numpy C API is
backwards-compatible by default.

For python 3.9+, we should be able to drop the specific numpy build
requirements and use `numpy>=1.25`, which is currently
backwards-compatible to `numpy>=1.19`.

In the future, the python <3.9 requirements could be dropped and the
lower numpy pin could correspond to the oldest supported version for the
current lower python pin.

* Turn off fail-fast

* Revert "Turn off fail-fast"

This reverts commit 4306f516bc.

* Update for python 3.6

* Fix typo
2023-07-24 10:32:56 +02:00
Sofie Van Landeghem
b1b20bf69d
Replace projects functionality with weasel (#12769)
* Setting up weasel branch (#12456)

* remove project-specific functionality

* remove project-specific tests

* remove project-specific schemas

* remove project-specific information in about

* remove project-specific functions in util.py

* remove project-specific error strings

* remove project-specific CLI commands

* black formatting

* restore some functions that are used beyond projects

* remove project imports

* remove imports

* remove remote_storage tests

* remove one more project unit test

* update for PR 12394

* remove get_hash and get_checksum

* remove upload_ and download_file methods

* remove ensure_pathy

* revert clumsy fingers

* reinstate E970

* feat: use weasel as spacy project command (#12473)

* feat: use weasel as spacy project command

* build: use constrained requirement for weasel

* feat: add weasel to the library requirements

* build: update weasel to new version

* build: use specific weasel tag

* build: use weasel-0.1.0rc1 from PyPI

* fix: remove weasel from requirements.txt

* fix: requirements.txt and setup.cfg need to reflect each other

* feat: remove legacy spacy project code

* bump version

* further merge fixes

* isort

---------

Co-authored-by: Basile Dura <bdura@users.noreply.github.com>
2023-07-07 09:10:27 +02:00
Daniël de Kok
50c5e9a2dd Merge remote-tracking branch 'upstream/master' into sync-v4-master-20230612 2023-06-12 15:57:10 +02:00
Basile Dura
f96b9e03df
build: bump typer version to accept >=0.3<0.10 (#12631) 2023-05-15 08:06:58 +02:00
Adriane Boyd
476a2e7a0a
Allow cupy 12.0 for extras (#12490) 2023-03-31 13:48:15 +02:00
Paul O'Leary McCann
e656189ec3
Change GPU efficient textcat to use CNN, not BOW in generated configs (#11900)
* Change GPU efficient textcat to use CNN, not BOW

If you generate a config with a textcat component using GPU
(transformers), the defaut option (efficiency) uses a BOW architecture,
which does not use tok2vec features. While that can make sense as part
of a larger pipeline, in the case of just a transformer and a textcat,
that means the transformer is doing a lot of work for no purpose.

This changes it so that the CNN architecture is used instead. It could
also be changed to be the same as the accuracy config, which uses the
ensemble architecture.

* Add the transformer when using a textcat with GPU

* Switch ubuntu-latest to ubuntu-20.04 in main tests (#11928)

* Switch ubuntu-latest to ubuntu-20.04 in main tests

* Only use 20.04 for 3.6

* Require thinc v8.1.7

* Require thinc v8.1.8

* Break up longer expression

---------

Co-authored-by: Adriane Boyd <adrianeboyd@gmail.com>
2023-03-07 17:47:45 +01:00
Adriane Boyd
ec45f704b1
Drop python 3.6/3.7, remove unneeded compat (#12187)
* Drop python 3.6/3.7, remove unneeded compat

* Remove unused import

* Minimal python 3.8+ docs updates
2023-01-27 15:48:20 +01:00
Adriane Boyd
8548d4d16e Merge remote-tracking branch 'upstream/master' into update-v4-from-master-1 2023-01-27 08:29:09 +01:00
Daniël de Kok
b052b1b47f
Fix batching regression (#12094)
* Fix batching regression

Some time ago, the spaCy v4 branch switched to the new Thinc v9
schedule. However, this introduced an error in how batching is handed.

In the PR, the batchers were changed to keep track of their step,
so that the step can be passed to the schedule. However, the issue
is that the training loop repeatedly calls the batching functions
(rather than using an infinite generator/iterator). So, the step and
therefore the schedule would be reset each epoch. Before the schedule
switch we didn't have this issue, because the old schedules were
stateful.

This PR fixes this issue by reverting the batching functions to use
a (stateful) generator. Their registry functions do accept a `Schedule`
and we convert `Schedule`s to generators.

* Update batcher docs

* Docstring fixes

* Make minibatch take iterables again as well

* Bump thinc requirement to 9.0.0.dev2

* Use type declaration

* Convert another comment into a proper type declaration
2023-01-18 18:28:30 +01:00
Albert Villanova del Moral
25373d8e8e
Fix required maximum version of typing-extensions (#12036)
* Fix required maximum version of typing-extensions

* Restrict to <4.5.0, sync minimum pin

Co-authored-by: Adriane Boyd <adrianeboyd@gmail.com>
2023-01-13 10:44:02 +01:00
svlandeg
b2fd9490e3 Merge branch 'copy_master' into copy_v4 2023-01-11 18:40:55 +01:00
Adriane Boyd
e0168ccce9
Allow spacy-transformers v1.2.x in transformers extra (#12092) 2023-01-11 13:54:58 +01:00
svlandeg
6852adc8b7 Merge branch 'copy_master' into copy_v4 2023-01-03 13:34:05 +01:00
Adriane Boyd
ef9e504eac
Rename modified textcat scorer to v2 (#11971)
As a follow-up to #11696, rename the modified scorer to v2 and move the
v1 scorer to `spacy-legacy`.
2022-12-29 14:01:08 +01:00
Daniël de Kok
20b63943f5
Adjust to new Schedule class and pass scores to Optimizer (#12008)
* Adjust to new `Schedule` class and pass scores to `Optimizer`

Requires https://github.com/explosion/thinc/pull/804

* Bump minimum Thinc requirement to 9.0.0.dev1
2022-12-29 08:03:24 +01:00
Adriane Boyd
64d2d27c5d
Add classifier for python 3.11 (#12013) 2022-12-22 10:53:16 +01:00
Daniël de Kok
207565a788 Merge remote-tracking branch 'upstream/master' into chore/v4-merge-master-20221222 2022-12-22 10:08:54 +01:00
Daniël de Kok
f9308aae13
Fix v4 branch to build against Thinc v9 (#11921)
* Move `thinc.extra.search` to `spacy.pipeline._parser_internals`

Backport of:
https://github.com/explosion/spaCy/pull/11317

Co-authored-by: Madeesh Kannan <shadeMe@users.noreply.github.com>

* Replace references to `thinc.backends.linalg` with `CBlas`

Backport of:
https://github.com/explosion/spaCy/pull/11292

Co-authored-by: Madeesh Kannan <shadeMe@users.noreply.github.com>

* Use cross entropy from `thinc.legacy`

* Require thinc>=9.0.0.dev0,<9.1.0

Co-authored-by: Madeesh Kannan <shadeMe@users.noreply.github.com>
2022-12-17 14:32:19 +01:00
Adriane Boyd
8c291ace0c
Extend to wasabi v1.1 (#11945)
* Extend to wasabi v1.1

* Temporarily run mypy and tests with newest wasabi

* Temporarily skip check requirements test

* Revert "Temporarily skip check requirements test"

This reverts commit 44f4ce20a8.

* Revert "Temporarily run mypy and tests with newest wasabi"

This reverts commit e677a2257c.
2022-12-12 08:38:36 +01:00
svlandeg
04fea09ffd Merge branch 'copy_master' into copy_v4 2022-12-05 08:56:15 +01:00
Adriane Boyd
1ebe7db07c
Support local filesystem remotes for projects (#11762)
* Support local filesystem remotes for projects

* Fix support for local filesystem remotes for projects
  * Use `FluidPath` instead of `Pathy` to support both filesystem and
    remote paths
  * Create missing parent directories if required for local filesystem
  * Add a more general `_file_exists` method to support both `Pathy`,
    `Path`, and `smart_open`-compatible URLs
* Add explicit `smart_open` dependency starting with support for
  `compression` flag
* Update `pathy` dependency to exclude older versions that aren't
  compatible with required `smart_open` version
* Update docs to refer to `Pathy` instead of `smart_open` for project
  remotes (technically you can still push to any `smart_open`-compatible
  path but you can't pull from them)
* Add tests for local filesystem remotes

* Update pathy for general BlobStat sorting

* Add import

* Remove _file_exists since only Pathy remotes are supported

* Format CLI docs

* Clean up merge
2022-11-29 11:40:58 +01:00
Adriane Boyd
681ec20914
Add smart_open requirement, update deprecated options (#11864)
* Switch from deprecated `ignore_ext` to `compression`
* Add upload/download test for local files
2022-11-25 13:00:57 +01:00
Paul O'Leary McCann
b76222e56a
Raise Typer limit (#11720)
* Raise typer limit to <0.7.0

* Raise limit to <0.8.0
2022-11-07 08:11:55 +01:00
Adriane Boyd
103b24fb25 Merge remote-tracking branch 'upstream/master' into chore/update-v4-from-master 2022-10-21 09:13:32 +02:00
Adriane Boyd
6b5a3e7219
Extend to pydantic v1.10 (#11635)
* Update types in `spacy.schemas` for updated pydantic+mypy
2022-10-14 08:16:49 +02:00
svlandeg
e3027c65b8 Merge branch 'copy_develop' into copy_v4 2022-10-03 14:12:16 +02:00