Commit Graph

16309 Commits

Author SHA1 Message Date
Matthew Honnibal
ae6910b09b Bump version 2024-09-06 22:23:41 +02:00
Matthew Honnibal
3bc5846e83 Fix serialization for uk trf model 2024-09-06 22:23:25 +02:00
Matthew Honnibal
2a37f97365 Increment version 2024-09-04 14:31:07 +02:00
Matthew Honnibal
3ee1b2bd1f Fix Spanish lemmatizer 2024-09-04 14:29:34 +02:00
Matthew Honnibal
6f7590bbf1 Revert "Fix apparent bug in Spanish lemmatizer. Not sure why this emerges in v4 not in v3"
This reverts commit 64b22be76e.
2024-09-04 14:26:39 +02:00
Matthew Honnibal
64b22be76e Fix apparent bug in Spanish lemmatizer. Not sure why this emerges in v4 not in v3 2024-09-04 14:22:13 +02:00
Matthew Honnibal
4eec3bfad1 Bump version 2024-09-02 13:16:15 +02:00
Matthew Honnibal
9e7421d45f Relax cupy-cuda pins to allow numpy v2 2024-09-02 13:15:53 +02:00
Matthew Honnibal
b9ecb15439 Bump version 2024-09-02 12:36:28 +02:00
Matthew Honnibal
3ccec6af7a Update thinc pin 2024-09-02 12:35:56 +02:00
Matthew Honnibal
a5ba7e4716 Bump dev version 2024-09-02 10:10:43 +02:00
Matthew Honnibal
d558e79823 Pin numpy to v2 2024-09-02 10:10:14 +02:00
Matthew Honnibal
77abf0828a Pin numpy to v2 2024-09-02 10:09:55 +02:00
Matthew Honnibal
304a8539e9 Bump dev version 2024-09-02 01:45:38 +02:00
Matthew Honnibal
f4c8fdfaad Update cli.package for removed spacy.vectors.name attr 2024-09-01 16:43:49 +02:00
Sofie Van Landeghem
818fdb537e
Merge pull request #13490 from svlandeg/feat/update_v4
Update v4 branch with latest from master
2024-05-14 22:41:17 +02:00
svlandeg
e32a394ff0 fix the fix for textcat init functionality 2024-05-14 18:45:51 +02:00
svlandeg
5992e927b9 fix textcat init functionality 2024-05-14 18:38:11 +02:00
svlandeg
c27679f210 Merge branch 'master' into feat/update_v4 2024-05-14 17:42:48 +02:00
Sofie Van Landeghem
c195ca4f9c
fix docs for MorphAnalysis.__contains__ (#13433) 2024-05-02 16:46:41 +02:00
Sofie Van Landeghem
d3a232f773
Update LICENSE to include 2024 (#13472) 2024-04-30 09:17:59 +02:00
Sofie Van Landeghem
ecd85d2618
Update Typer pin and GH actions (#13471)
* update gh actions

* pin typer upperbound to 1.0.0
2024-04-29 13:28:46 +02:00
Alex Strick van Linschoten
045cd43c3f
Fix typos in docs (#13466)
* fix typos

* prettier formatting

---------

Co-authored-by: svlandeg <svlandeg@github.com>
2024-04-29 11:10:17 +02:00
Sofie Van Landeghem
74836524e3
Bump to v5 (#13470) 2024-04-29 10:36:31 +02:00
Sofie Van Landeghem
6d6c10ab9c
Fix CI (#13469)
* Remove hardcoded architecture setting

* update classifiers to include Python 3.12
2024-04-29 10:18:07 +02:00
Sofie Van Landeghem
287deee02c
remove empty file (#13458) 2024-04-26 10:04:16 +02:00
Daniël de Kok
b2ca7253d2
Document TrainablePipe.save_activations (#13452)
* Document `TrainablePipe.save_activations`

* Fully qualified links

Co-authored-by: Sofie Van Landeghem <svlandeg@users.noreply.github.com>

* prettier

---------

Co-authored-by: Sofie Van Landeghem <svlandeg@users.noreply.github.com>
2024-04-23 09:21:23 +02:00
Daniël de Kok
f5918d4353
Update to Thinc 9.0.0 and set version to 4.0.0.dev3 (#13448)
* Update to Thinc 9.0.0 and set version to 4.0.0.dev3

* Set minimum Python version to 3.9
2024-04-22 09:40:55 +02:00
Daniël de Kok
5bd141013b
Remove apple from extras (#13439)
Account for merging of `thinc-apple-ops` into `thinc`.
2024-04-17 13:43:27 +02:00
Daniël de Kok
8696861c8c
Update spacy-curated-transformers docs for spaCy 4 (#13440)
- Update model constructors to v2 and add `dtype` argument.
- Update to `PyTorchCheckpointLoader` to `v2`.
- Add `transformer_discriminative.v1`.
2024-04-16 12:06:58 +02:00
Sofie Van Landeghem
2e2334632b
Fix use_gold_ents behaviour for EntityLinker (#13400)
* fix type annotation in docs

* only restore entities after loss calculation

* restore entities of sample in initialization

* rename overfitting function

* fix EL scorer

* Relax test

* fix formatting

* Update spacy/pipeline/entity_linker.py

Co-authored-by: Raphael Mitsch <r.mitsch@outlook.com>

* rename to _ensure_ents

* further rename

* allow for scorer to be None

---------

Co-authored-by: Raphael Mitsch <r.mitsch@outlook.com>
2024-04-16 12:00:22 +02:00
Joe Schiff
2e96797696
Convert properties to decorator syntax (#13390) 2024-04-16 11:51:14 +02:00
Daniël de Kok
fbc14aea45
Add distill subcommand (#13431)
* Add distill subcommand

This subcommand distills a student model from a teacher model.

* Fixes from Sofie

Co-authored-by: Sofie Van Landeghem <svlandeg@users.noreply.github.com>

* Type and doc fixes

* Wording

* distill: document missing `-o`

* Wording

* Small fix

---------

Co-authored-by: Sofie Van Landeghem <svlandeg@users.noreply.github.com>
2024-04-11 19:33:46 +02:00
Raphael Mitsch
304b9331e6
Modify EL batching to doc-wise streaming approach (#12367)
* Convert Candidate from Cython to Python class.

* Format.

* Fix .entity_ typo in _add_activations() usage.

* Change type for mentions to look up entity candidates for to SpanGroup from Iterable[Span].

* Update docs.

* Update spacy/kb/candidate.py

Co-authored-by: Sofie Van Landeghem <svlandeg@users.noreply.github.com>

* Update doc string of BaseCandidate.__init__().

* Update spacy/kb/candidate.py

Co-authored-by: Sofie Van Landeghem <svlandeg@users.noreply.github.com>

* Rename Candidate to InMemoryCandidate, BaseCandidate to Candidate.

* Adjust Candidate to support and mandate numerical entity IDs.

* Format.

* Fix docstring and docs.

* Update website/docs/api/kb.mdx

Co-authored-by: Sofie Van Landeghem <svlandeg@users.noreply.github.com>

* Rename alias -> mention.

* Refactor Candidate attribute names. Update docs and tests accordingly.

* Refacor Candidate attributes and their usage.

* Format.

* Fix mypy error.

* Update error code in line with v4 convention.

* Modify EL batching system.

* Update leftover get_candidates() mention in docs.

* Format docs.

* Format.

* Update spacy/kb/candidate.py

Co-authored-by: Sofie Van Landeghem <svlandeg@users.noreply.github.com>

* Updated error code.

* Simplify interface for int/str representations.

* Update website/docs/api/kb.mdx

Co-authored-by: Sofie Van Landeghem <svlandeg@users.noreply.github.com>

* Rename 'alias' to 'mention'.

* Port Candidate and InMemoryCandidate to Cython.

* Remove redundant entry in setup.py.

* Add abstract class check.

* Drop storing mention.

* Update spacy/kb/candidate.pxd

Co-authored-by: Sofie Van Landeghem <svlandeg@users.noreply.github.com>

* Fix entity_id refactoring problems in docstrings.

* Drop unused InMemoryCandidate._entity_hash.

* Update docstrings.

* Move attributes out of Candidate.

* Partially fix alias/mention terminology usage. Convert Candidate to interface.

* Remove prior_prob from supported properties in Candidate. Introduce KnowledgeBase.supports_prior_probs().

* Update docstrings related to prior_prob.

* Update alias/mention usage in doc(strings).

* Update spacy/ml/models/entity_linker.py

Co-authored-by: Sofie Van Landeghem <svlandeg@users.noreply.github.com>

* Update spacy/ml/models/entity_linker.py

Co-authored-by: Sofie Van Landeghem <svlandeg@users.noreply.github.com>

* Mention -> alias renaming. Drop Candidate.mentions(). Drop InMemoryLookupKB.get_alias_candidates() from docs.

* Update docstrings.

* Fix InMemoryCandidate attribute names.

* Update spacy/kb/kb.pyx

Co-authored-by: Sofie Van Landeghem <svlandeg@users.noreply.github.com>

* Update spacy/ml/models/entity_linker.py

Co-authored-by: Sofie Van Landeghem <svlandeg@users.noreply.github.com>

* Update W401 test.

* Update spacy/errors.py

Co-authored-by: Sofie Van Landeghem <svlandeg@users.noreply.github.com>

* Update spacy/kb/kb.pyx

Co-authored-by: Sofie Van Landeghem <svlandeg@users.noreply.github.com>

* Use Candidate output type for toy generators in the test suite to mimick best practices

* fix docs

* fix import

* Fix merge leftovers.

* Update spacy/kb/kb.pyx

Co-authored-by: Sofie Van Landeghem <svlandeg@users.noreply.github.com>

* Update spacy/kb/kb.pyx

Co-authored-by: Sofie Van Landeghem <svlandeg@users.noreply.github.com>

* Update website/docs/api/kb.mdx

Co-authored-by: Sofie Van Landeghem <svlandeg@users.noreply.github.com>

* Update website/docs/api/entitylinker.mdx

Co-authored-by: Sofie Van Landeghem <svlandeg@users.noreply.github.com>

* Update spacy/kb/kb_in_memory.pyx

Co-authored-by: Sofie Van Landeghem <svlandeg@users.noreply.github.com>

* Update website/docs/api/inmemorylookupkb.mdx

Co-authored-by: Sofie Van Landeghem <svlandeg@users.noreply.github.com>

* Update get_candidates() docstring.

* Reformat imports in entity_linker.py.

* Drop valid_ent_idx_per_doc.

* Update docs.

* Format.

* Simplify doc loop in predict().

* Remove E1044 comment.

* Fix merge errors.

* Format.

* Format.

* Format.

* Fix merge error & tests.

* Format.

* Apply suggestions from code review

Co-authored-by: Madeesh Kannan <shadeMe@users.noreply.github.com>

* Use type alias.

* isort.

* isort.

* Lint.

* Add typedefs.pyx.

* Fix typedef import.

* Fix type aliases.

* Format.

* Update docstring and type usage.

* Add info on get_candidates(), get_candidates_batched().

* Readd get_candidates info to v3 changelog.

* Update website/docs/api/entitylinker.mdx

Co-authored-by: Sofie Van Landeghem <svlandeg@users.noreply.github.com>

* Update factory functions for backwards compatibility.

* Format.

* Ignore mypy error.

* Fix mypy error.

* Format.

* Add test for multiple docs with multiple entities.

---------

Co-authored-by: Sofie Van Landeghem <svlandeg@users.noreply.github.com>
Co-authored-by: Madeesh Kannan <shadeMe@users.noreply.github.com>
Co-authored-by: svlandeg <svlandeg@github.com>
2024-04-09 11:39:18 +02:00
Sofie Van Landeghem
f5e85fa05a
allow weasel 0.4.x (#13409) 2024-04-04 12:55:08 +02:00
Yaseen
21aea59001
Update code.module.sass to make code title sticky (#13379) 2024-03-26 12:15:25 +01:00
Sofie Van Landeghem
4dc5fe5469
Renamed main branch back to v4 for now (#13395)
* Update gputests.yml

* Update slowtests.yml
2024-03-26 09:53:07 +01:00
Ines Montani
1252370f69 Move DocSearch key to env var [ci skip] 2024-03-25 10:17:57 +01:00
Sofie Van Landeghem
d410d95b52
remove smart_open requirement as it's taken care of via Weasel (#13391) 2024-03-22 18:21:20 +01:00
Matthew Honnibal
0518c36f04
Sanitize direct download (#13313)
The 'direct' option in 'spacy download' is supposed to only download from our model releases repository. However, users were able to pass in a relative path, allowing download from arbitrary repositories. This meant that a service that sourced strings from user input and which used the direct option would allow users to install arbitrary packages.
2024-02-20 13:17:51 +01:00
Daniël de Kok
bff8725f4b
Set version to 3.7.4 (#13327) 2024-02-14 14:46:28 +01:00
Daniël de Kok
fdfdbcd9f4
Make Language.pipe workers exit cleanly (#13321)
Also warn when any worker exited with a non-zero exit code and modify
test to ensure that workers exit cleanly by default.
2024-02-12 14:39:38 +01:00
Daniël de Kok
14bd9d89a3
Update example that shows model in requirments (#13302)
See #13293.
2024-02-11 19:46:43 +01:00
Adriane Boyd
afb22ad491
Remove debug data normalization for span analysis (#13203)
* Remove debug data normalization for span analysis

As a result of this normalization, `debug data` could show a user tokens
that do not exist in their data.

* Update spacy/cli/debug_data.py

---------

Co-authored-by: svlandeg <svlandeg@github.com>
2024-02-06 14:14:55 +01:00
Daniël de Kok
e1249d3722
Test if closing explicitly solves recursive lock issues (#13304) 2024-02-05 10:07:03 +01:00
Daniël de Kok
1052cba9f3
Merge pull request #13299 from danieldk/copy/master
Sync main with latests changes from master (v3)
2024-02-04 15:40:55 +01:00
Daniël de Kok
40422ff904
Set version to 3.7.3 (#13301) 2024-02-02 13:51:26 +01:00
Daniël de Kok
2dbb332cea
TextCatParametricAttention.v1: set key transform dimensions (#13249)
* TextCatParametricAttention.v1: set key transform dimensions

This is necessary for tok2vec implementations that initialize
lazily (e.g. curated transformers).

* Add lazily-initialized tok2vec to simulate transformers

Add a lazily-initialized tok2vec to the tests and test the current
textcat models with it.

Fix some additional issues found using this test.

* isort

* Add `test.` prefix to `LazyInitTok2Vec.v1`
2024-02-02 13:01:59 +01:00
Daniël de Kok
2d4067d021 Test if closing explicitly solves recursive lock issues 2024-02-02 11:39:07 +01:00
Daniël de Kok
d84068e460
Run slow tests: v4 -> main (#13290)
* Run slow tests: v4 -> main

* Also update the branch in GPU tests
2024-01-30 13:58:28 +01:00