spaCy

mirror of https://github.com/explosion/spaCy.git synced 2024-11-10 19:57:17 +03:00

Author	SHA1	Message	Date
Matthew Honnibal	3ccec6af7a	Update thinc pin	2024-09-02 12:35:56 +02:00
Matthew Honnibal	77abf0828a	Pin numpy to v2	2024-09-02 10:09:55 +02:00
svlandeg	c27679f210	Merge branch 'master' into feat/update_v4	2024-05-14 17:42:48 +02:00
Sofie Van Landeghem	ecd85d2618	Update Typer pin and GH actions (#13471 ) * update gh actions * pin typer upperbound to 1.0.0	2024-04-29 13:28:46 +02:00
Daniël de Kok	f5918d4353	Update to Thinc 9.0.0 and set version to 4.0.0.dev3 (#13448 ) * Update to Thinc 9.0.0 and set version to 4.0.0.dev3 * Set minimum Python version to 3.9	2024-04-22 09:40:55 +02:00
Sofie Van Landeghem	f5e85fa05a	allow weasel 0.4.x (#13409 )	2024-04-04 12:55:08 +02:00
Sofie Van Landeghem	d410d95b52	remove smart_open requirement as it's taken care of via Weasel (#13391 )	2024-03-22 18:21:20 +01:00
Daniël de Kok	70e2f2a14a	Update `spacy-legacy` dependency to 4.0.0.dev1 (#13270 ) This release is compatible with the parser refactor backout.	2024-01-25 18:24:22 +01:00
Daniël de Kok	c621e251b8	Typing fixes	2024-01-24 12:20:01 +01:00
Daniël de Kok	81beaea70e	Merge remote-tracking branch 'upstream/master' into maintenance/v4-merge-master-20240119	2024-01-19 12:34:29 +01:00
Daniël de Kok	7351f6bbeb	Update thinc dependency to 9.0.0.dev4	2024-01-16 15:56:09 +01:00
Daniël de Kok	e2a3952de5	Add spacy.TextCatParametricAttention.v1 (#13201 ) * Add spacy.TextCatParametricAttention.v1 This layer is a simplification of the ensemble classifier that only uses parametric attention. We have found empirically that with a sufficient amount of training data, using the ensemble classifier with BoW does not provide significant improvement in classifier accuracy. However, plugging in a BoW classifier does reduce GPU training and inference performance substantially, since it uses a GPU-only kernel. * Fix merge fallout	2024-01-02 10:03:06 +01:00
Adriane Boyd	6e54360a3d	Remove pathy dependency, update docs for cloudpathlib in Weasel (#13035 )	2023-10-05 08:50:22 +02:00
Adriane Boyd	b4990395f9	Update mypy requirements	2023-09-28 17:12:42 +02:00
Adriane Boyd	406794a081	Merge remote-tracking branch 'upstream/master' into chore/update-develop-from-master-v3.7-1	2023-09-28 15:09:06 +02:00
Adriane Boyd	ff4215f1c7	Drop support for python 3.6 (#13009 ) * Drop support for python 3.6 * Update docs	2023-09-25 14:48:38 +02:00
Adriane Boyd	198488ee86	Extend to weasel v0.3 (#12908 ) * Extend to weasel v0.3 * Clean up unused imports in test_cli	2023-08-16 17:36:53 +02:00
Adriane Boyd	6a4aa43164	Extend to thinc v8.2 (#12897 )	2023-08-11 13:05:46 +02:00
Adriane Boyd	9622c11529	Extend to weasel v0.2 (#12902 )	2023-08-11 10:59:51 +02:00
Adriane Boyd	245e2ddc25	Allow pydantic v2 using transitional v1 support (#12888 )	2023-08-08 11:27:28 +02:00
Adriane Boyd	5888afa884	Update numpy build constraints for numpy 1.25 (#12839 ) * Update numpy build constraints for numpy 1.25 Starting in numpy 1.25 (see https://github.com/numpy/numpy/releases/tag/v1.25.0), the numpy C API is backwards-compatible by default. For python 3.9+, we should be able to drop the specific numpy build requirements and use `numpy>=1.25`, which is currently backwards-compatible to `numpy>=1.19`. In the future, the python <3.9 requirements could be dropped and the lower numpy pin could correspond to the oldest supported version for the current lower python pin. * Turn off fail-fast * Revert "Turn off fail-fast" This reverts commit `4306f516bc`. * Update for python 3.6 * Fix typo	2023-07-24 10:32:56 +02:00
svlandeg	0e3b6a87d6	Merge branch 'upstream_master' into sync_v4	2023-07-19 16:37:31 +02:00
svlandeg	79ec68f01b	Merge branch 'upstream_master' into sync_develop	2023-07-19 12:08:52 +02:00
Basile Dura	b0228d8ea6	ci: add cython linter (#12694 ) * chore: add cython-linter dev dependency * fix: lexeme.pyx * fix: morphology.pxd * fix: tokenizer.pxd * fix: vocab.pxd * fix: morphology.pxd (line length) * ci: add cython-lint * ci: fix cython-lint call * Fix kb/candidate.pyx. * Fix kb/kb.pyx. * Fix kb/kb_in_memory.pyx. * Fix kb. * Fix training/ partially. * Fix training/. Ignore trailing whitespaces and too long lines. * Fix ml/. * Fix matcher/. * Fix pipeline/. * Fix tokens/. * Fix build errors. Fix vocab.pyx. * Fix cython-lint install and run. * Fix lexeme.pyx, parts_of_speech.pxd, vectors.pyx. Temporarily disable cython-lint execution. * Fix attrs.pyx, lexeme.pyx, symbols.pxd, isort issues. * Make cython-lint install conditional. Fix tokenizer.pyx. * Fix remaining files. Reenable cython-lint check. * Readded parentheses. * Fix test_build_dependencies(). * Add explanatory comment to cython-lint execution. --------- Co-authored-by: Raphael Mitsch <r.mitsch@outlook.com>	2023-07-19 12:03:31 +02:00
Sofie Van Landeghem	b1b20bf69d	Replace projects functionality with weasel (#12769 ) * Setting up weasel branch (#12456) * remove project-specific functionality * remove project-specific tests * remove project-specific schemas * remove project-specific information in about * remove project-specific functions in util.py * remove project-specific error strings * remove project-specific CLI commands * black formatting * restore some functions that are used beyond projects * remove project imports * remove imports * remove remote_storage tests * remove one more project unit test * update for PR 12394 * remove get_hash and get_checksum * remove upload_ and download_file methods * remove ensure_pathy * revert clumsy fingers * reinstate E970 * feat: use weasel as spacy project command (#12473) * feat: use weasel as spacy project command * build: use constrained requirement for weasel * feat: add weasel to the library requirements * build: update weasel to new version * build: use specific weasel tag * build: use weasel-0.1.0rc1 from PyPI * fix: remove weasel from requirements.txt * fix: requirements.txt and setup.cfg need to reflect each other * feat: remove legacy spacy project code * bump version * further merge fixes * isort --------- Co-authored-by: Basile Dura <bdura@users.noreply.github.com>	2023-07-07 09:10:27 +02:00
Daniël de Kok	bf92ca4f10	Merge remote-tracking branch 'upstream/master' into v4-isort	2023-06-26 12:43:00 +02:00
Daniël de Kok	e2b70df012	Configure isort to use the Black profile, recursively isort the `spacy` module (#12721 ) * Use isort with Black profile * isort all the things * Fix import cycles as a result of import sorting * Add DOCBIN_ALL_ATTRS type definition * Add isort to requirements * Remove isort from build dependencies check * Typo	2023-06-14 17:48:41 +02:00
Daniël de Kok	d82e167aea	Remove Python 3.7 builds	2023-06-12 16:16:03 +02:00
Daniël de Kok	50c5e9a2dd	Merge remote-tracking branch 'upstream/master' into sync-v4-master-20230612	2023-06-12 15:57:10 +02:00
Basile Dura	f96b9e03df	build: bump typer version to accept >=0.3<0.10 (#12631 )	2023-05-15 08:06:58 +02:00
Paul O'Leary McCann	e656189ec3	Change GPU efficient textcat to use CNN, not BOW in generated configs (#11900 ) * Change GPU efficient textcat to use CNN, not BOW If you generate a config with a textcat component using GPU (transformers), the default option (efficiency) uses a BOW architecture, which does not use tok2vec features. While that can make sense as part of a larger pipeline, in the case of just a transformer and a textcat, that means the transformer is doing a lot of work for no purpose. This changes it so that the CNN architecture is used instead. It could also be changed to be the same as the accuracy config, which uses the ensemble architecture. * Add the transformer when using a textcat with GPU * Switch ubuntu-latest to ubuntu-20.04 in main tests (#11928) * Switch ubuntu-latest to ubuntu-20.04 in main tests * Only use 20.04 for 3.6 * Require thinc v8.1.7 * Require thinc v8.1.8 * Break up longer expression --------- Co-authored-by: Adriane Boyd <adrianeboyd@gmail.com>	2023-03-07 17:47:45 +01:00
Raphael Mitsch	1ea31552be	Merge branch 'master' into sync/master-into-v4 # Conflicts: # requirements.txt # spacy/pipeline/entity_linker.py # spacy/util.py # website/docs/api/entitylinker.mdx	2023-03-02 16:24:15 +01:00
Adriane Boyd	9d920bafcf	Extend mypy to v1.0.x (#12245 )	2023-02-08 14:33:16 +01:00
Adriane Boyd	9a454676f3	Use black version constraints from requirements.txt (#12220 )	2023-02-03 11:44:10 +01:00
Adriane Boyd	ec45f704b1	Drop python 3.6/3.7, remove unneeded compat (#12187 ) * Drop python 3.6/3.7, remove unneeded compat * Remove unused import * Minimal python 3.8+ docs updates	2023-01-27 15:48:20 +01:00
Adriane Boyd	8548d4d16e	Merge remote-tracking branch 'upstream/master' into update-v4-from-master-1	2023-01-27 08:29:09 +01:00
Daniël de Kok	b052b1b47f	Fix batching regression (#12094 ) * Fix batching regression Some time ago, the spaCy v4 branch switched to the new Thinc v9 schedule. However, this introduced an error in how batching is handled. In the PR, the batchers were changed to keep track of their step, so that the step can be passed to the schedule. However, the issue is that the training loop repeatedly calls the batching functions (rather than using an infinite generator/iterator). So, the step and therefore the schedule would be reset each epoch. Before the schedule switch we didn't have this issue, because the old schedules were stateful. This PR fixes this issue by reverting the batching functions to use a (stateful) generator. Their registry functions do accept a `Schedule` and we convert `Schedule`s to generators. * Update batcher docs * Docstring fixes * Make minibatch take iterables again as well * Bump thinc requirement to 9.0.0.dev2 * Use type declaration * Convert another comment into a proper type declaration	2023-01-18 18:28:30 +01:00
Albert Villanova del Moral	25373d8e8e	Fix required maximum version of typing-extensions (#12036 ) * Fix required maximum version of typing-extensions * Restrict to <4.5.0, sync minimum pin Co-authored-by: Adriane Boyd <adrianeboyd@gmail.com>	2023-01-13 10:44:02 +01:00
svlandeg	6852adc8b7	Merge branch 'copy_master' into copy_v4	2023-01-03 13:34:05 +01:00
Adriane Boyd	ef9e504eac	Rename modified textcat scorer to v2 (#11971 ) As a follow-up to #11696, rename the modified scorer to v2 and move the v1 scorer to `spacy-legacy`.	2022-12-29 14:01:08 +01:00
Daniël de Kok	20b63943f5	Adjust to new `Schedule` class and pass scores to `Optimizer` (#12008 ) * Adjust to new `Schedule` class and pass scores to `Optimizer` Requires https://github.com/explosion/thinc/pull/804 * Bump minimum Thinc requirement to 9.0.0.dev1	2022-12-29 08:03:24 +01:00
Daniël de Kok	207565a788	Merge remote-tracking branch 'upstream/master' into chore/v4-merge-master-20221222	2022-12-22 10:08:54 +01:00
Daniël de Kok	f9308aae13	Fix v4 branch to build against Thinc v9 (#11921 ) * Move `thinc.extra.search` to `spacy.pipeline._parser_internals` Backport of: https://github.com/explosion/spaCy/pull/11317 Co-authored-by: Madeesh Kannan <shadeMe@users.noreply.github.com> * Replace references to `thinc.backends.linalg` with `CBlas` Backport of: https://github.com/explosion/spaCy/pull/11292 Co-authored-by: Madeesh Kannan <shadeMe@users.noreply.github.com> * Use cross entropy from `thinc.legacy` * Require thinc>=9.0.0.dev0,<9.1.0 Co-authored-by: Madeesh Kannan <shadeMe@users.noreply.github.com>	2022-12-17 14:32:19 +01:00
Adriane Boyd	8c291ace0c	Extend to wasabi v1.1 (#11945 ) * Extend to wasabi v1.1 * Temporarily run mypy and tests with newest wasabi * Temporarily skip check requirements test * Revert "Temporarily skip check requirements test" This reverts commit `44f4ce20a8`. * Revert "Temporarily run mypy and tests with newest wasabi" This reverts commit `e677a2257c`.	2022-12-12 08:38:36 +01:00
Adriane Boyd	1ebe7db07c	Support local filesystem remotes for projects (#11762 ) * Support local filesystem remotes for projects * Fix support for local filesystem remotes for projects * Use `FluidPath` instead of `Pathy` to support both filesystem and remote paths * Create missing parent directories if required for local filesystem * Add a more general `_file_exists` method to support both `Pathy`, `Path`, and `smart_open`-compatible URLs * Add explicit `smart_open` dependency starting with support for `compression` flag * Update `pathy` dependency to exclude older versions that aren't compatible with required `smart_open` version * Update docs to refer to `Pathy` instead of `smart_open` for project remotes (technically you can still push to any `smart_open`-compatible path but you can't pull from them) * Add tests for local filesystem remotes * Update pathy for general BlobStat sorting * Add import * Remove _file_exists since only Pathy remotes are supported * Format CLI docs * Clean up merge	2022-11-29 11:40:58 +01:00
Adriane Boyd	681ec20914	Add smart_open requirement, update deprecated options (#11864 ) * Switch from deprecated `ignore_ext` to `compression` * Add upload/download test for local files	2022-11-25 13:00:57 +01:00
Adriane Boyd	317b6ef99c	Update to mypy 0.990 (#11801 )	2022-11-16 14:09:10 +01:00
Paul O'Leary McCann	b76222e56a	Raise Typer limit (#11720 ) * Raise typer limit to <0.7.0 * Raise limit to <0.8.0	2022-11-07 08:11:55 +01:00
Adriane Boyd	6b5a3e7219	Extend to pydantic v1.10 (#11635 ) * Update types in `spacy.schemas` for updated pydantic+mypy	2022-10-14 08:16:49 +02:00
Adriane Boyd	8cd77dd54c	Sync flake8 version across requirements (#11580 )	2022-10-04 11:23:04 +02:00

384 Commits