Matthew Honnibal
ae6910b09b
Bump version
2024-09-06 22:23:41 +02:00
Matthew Honnibal
3bc5846e83
Fix serialization for uk trf model
2024-09-06 22:23:25 +02:00
Matthew Honnibal
2a37f97365
Increment version
2024-09-04 14:31:07 +02:00
Matthew Honnibal
3ee1b2bd1f
Fix Spanish lemmatizer
2024-09-04 14:29:34 +02:00
Matthew Honnibal
6f7590bbf1
Revert "Fix apparent bug in Spanish lemmatizer. Not sure why this emerges in v4 not in v3"
This reverts commit 64b22be76e.
2024-09-04 14:26:39 +02:00
Matthew Honnibal
64b22be76e
Fix apparent bug in Spanish lemmatizer. Not sure why this emerges in v4 not in v3
2024-09-04 14:22:13 +02:00
Matthew Honnibal
4eec3bfad1
Bump version
2024-09-02 13:16:15 +02:00
Matthew Honnibal
9e7421d45f
Relax cupy-cuda pins to allow numpy v2
2024-09-02 13:15:53 +02:00
Matthew Honnibal
b9ecb15439
Bump version
2024-09-02 12:36:28 +02:00
Matthew Honnibal
3ccec6af7a
Update thinc pin
2024-09-02 12:35:56 +02:00
Matthew Honnibal
a5ba7e4716
Bump dev version
2024-09-02 10:10:43 +02:00
Matthew Honnibal
d558e79823
Pin numpy to v2
2024-09-02 10:10:14 +02:00
Matthew Honnibal
77abf0828a
Pin numpy to v2
2024-09-02 10:09:55 +02:00
Matthew Honnibal
304a8539e9
Bump dev version
2024-09-02 01:45:38 +02:00
Matthew Honnibal
f4c8fdfaad
Update cli.package for removed spacy.vectors.name attr
2024-09-01 16:43:49 +02:00
Sofie Van Landeghem
818fdb537e
Merge pull request #13490 from svlandeg/feat/update_v4
Update v4 branch with latest from master
2024-05-14 22:41:17 +02:00
svlandeg
e32a394ff0
fix the fix for textcat init functionality
2024-05-14 18:45:51 +02:00
svlandeg
5992e927b9
fix textcat init functionality
2024-05-14 18:38:11 +02:00
svlandeg
c27679f210
Merge branch 'master' into feat/update_v4
2024-05-14 17:42:48 +02:00
Sofie Van Landeghem
c195ca4f9c
fix docs for MorphAnalysis.__contains__ (#13433)
2024-05-02 16:46:41 +02:00
Sofie Van Landeghem
d3a232f773
Update LICENSE to include 2024 (#13472)
2024-04-30 09:17:59 +02:00
Sofie Van Landeghem
ecd85d2618
Update Typer pin and GH actions (#13471)
* update gh actions
* pin typer upperbound to 1.0.0
2024-04-29 13:28:46 +02:00
Alex Strick van Linschoten
045cd43c3f
Fix typos in docs (#13466)
* fix typos
* prettier formatting
---------
Co-authored-by: svlandeg <svlandeg@github.com>
2024-04-29 11:10:17 +02:00
Sofie Van Landeghem
74836524e3
Bump to v5 (#13470)
2024-04-29 10:36:31 +02:00
Sofie Van Landeghem
6d6c10ab9c
Fix CI (#13469)
* Remove hardcoded architecture setting
* update classifiers to include Python 3.12
2024-04-29 10:18:07 +02:00
Sofie Van Landeghem
287deee02c
remove empty file (#13458)
2024-04-26 10:04:16 +02:00
Daniël de Kok
b2ca7253d2
Document `TrainablePipe.save_activations` (#13452)
* Document `TrainablePipe.save_activations`
* Fully qualified links
Co-authored-by: Sofie Van Landeghem <svlandeg@users.noreply.github.com>
* prettier
---------
Co-authored-by: Sofie Van Landeghem <svlandeg@users.noreply.github.com>
2024-04-23 09:21:23 +02:00
Daniël de Kok
f5918d4353
Update to Thinc 9.0.0 and set version to 4.0.0.dev3 (#13448)
* Update to Thinc 9.0.0 and set version to 4.0.0.dev3
* Set minimum Python version to 3.9
2024-04-22 09:40:55 +02:00
Daniël de Kok
5bd141013b
Remove `apple` from extras (#13439)
Account for merging of `thinc-apple-ops` into `thinc`.
2024-04-17 13:43:27 +02:00
Daniël de Kok
8696861c8c
Update `spacy-curated-transformers` docs for spaCy 4 (#13440)
- Update model constructors to v2 and add `dtype` argument.
- Update `PyTorchCheckpointLoader` to `v2`.
- Add `transformer_discriminative.v1`.
2024-04-16 12:06:58 +02:00
Sofie Van Landeghem
2e2334632b
Fix use_gold_ents behaviour for EntityLinker (#13400)
* fix type annotation in docs
* only restore entities after loss calculation
* restore entities of sample in initialization
* rename overfitting function
* fix EL scorer
* Relax test
* fix formatting
* Update spacy/pipeline/entity_linker.py
Co-authored-by: Raphael Mitsch <r.mitsch@outlook.com>
* rename to _ensure_ents
* further rename
* allow for scorer to be None
---------
Co-authored-by: Raphael Mitsch <r.mitsch@outlook.com>
2024-04-16 12:00:22 +02:00
Joe Schiff
2e96797696
Convert properties to decorator syntax (#13390)
2024-04-16 11:51:14 +02:00
Daniël de Kok
fbc14aea45
Add distill subcommand (#13431)
* Add distill subcommand
This subcommand distills a student model from a teacher model.
* Fixes from Sofie
Co-authored-by: Sofie Van Landeghem <svlandeg@users.noreply.github.com>
* Type and doc fixes
* Wording
* distill: document missing `-o`
* Wording
* Small fix
---------
Co-authored-by: Sofie Van Landeghem <svlandeg@users.noreply.github.com>
2024-04-11 19:33:46 +02:00
Raphael Mitsch
304b9331e6
Modify EL batching to doc-wise streaming approach (#12367)
* Convert Candidate from Cython to Python class.
* Format.
* Fix .entity_ typo in _add_activations() usage.
* Change the type of the mentions used to look up entity candidates from Iterable[Span] to SpanGroup.
* Update docs.
* Update spacy/kb/candidate.py
Co-authored-by: Sofie Van Landeghem <svlandeg@users.noreply.github.com>
* Update doc string of BaseCandidate.__init__().
* Update spacy/kb/candidate.py
Co-authored-by: Sofie Van Landeghem <svlandeg@users.noreply.github.com>
* Rename Candidate to InMemoryCandidate, BaseCandidate to Candidate.
* Adjust Candidate to support and mandate numerical entity IDs.
* Format.
* Fix docstring and docs.
* Update website/docs/api/kb.mdx
Co-authored-by: Sofie Van Landeghem <svlandeg@users.noreply.github.com>
* Rename alias -> mention.
* Refactor Candidate attribute names. Update docs and tests accordingly.
* Refactor Candidate attributes and their usage.
* Format.
* Fix mypy error.
* Update error code in line with v4 convention.
* Modify EL batching system.
* Update leftover get_candidates() mention in docs.
* Format docs.
* Format.
* Update spacy/kb/candidate.py
Co-authored-by: Sofie Van Landeghem <svlandeg@users.noreply.github.com>
* Updated error code.
* Simplify interface for int/str representations.
* Update website/docs/api/kb.mdx
Co-authored-by: Sofie Van Landeghem <svlandeg@users.noreply.github.com>
* Rename 'alias' to 'mention'.
* Port Candidate and InMemoryCandidate to Cython.
* Remove redundant entry in setup.py.
* Add abstract class check.
* Drop storing mention.
* Update spacy/kb/candidate.pxd
Co-authored-by: Sofie Van Landeghem <svlandeg@users.noreply.github.com>
* Fix entity_id refactoring problems in docstrings.
* Drop unused InMemoryCandidate._entity_hash.
* Update docstrings.
* Move attributes out of Candidate.
* Partially fix alias/mention terminology usage. Convert Candidate to interface.
* Remove prior_prob from supported properties in Candidate. Introduce KnowledgeBase.supports_prior_probs().
* Update docstrings related to prior_prob.
* Update alias/mention usage in doc(strings).
* Update spacy/ml/models/entity_linker.py
Co-authored-by: Sofie Van Landeghem <svlandeg@users.noreply.github.com>
* Update spacy/ml/models/entity_linker.py
Co-authored-by: Sofie Van Landeghem <svlandeg@users.noreply.github.com>
* Mention -> alias renaming. Drop Candidate.mentions(). Drop InMemoryLookupKB.get_alias_candidates() from docs.
* Update docstrings.
* Fix InMemoryCandidate attribute names.
* Update spacy/kb/kb.pyx
Co-authored-by: Sofie Van Landeghem <svlandeg@users.noreply.github.com>
* Update spacy/ml/models/entity_linker.py
Co-authored-by: Sofie Van Landeghem <svlandeg@users.noreply.github.com>
* Update W401 test.
* Update spacy/errors.py
Co-authored-by: Sofie Van Landeghem <svlandeg@users.noreply.github.com>
* Update spacy/kb/kb.pyx
Co-authored-by: Sofie Van Landeghem <svlandeg@users.noreply.github.com>
* Use Candidate output type for toy generators in the test suite to mimic best practices
* fix docs
* fix import
* Fix merge leftovers.
* Update spacy/kb/kb.pyx
Co-authored-by: Sofie Van Landeghem <svlandeg@users.noreply.github.com>
* Update spacy/kb/kb.pyx
Co-authored-by: Sofie Van Landeghem <svlandeg@users.noreply.github.com>
* Update website/docs/api/kb.mdx
Co-authored-by: Sofie Van Landeghem <svlandeg@users.noreply.github.com>
* Update website/docs/api/entitylinker.mdx
Co-authored-by: Sofie Van Landeghem <svlandeg@users.noreply.github.com>
* Update spacy/kb/kb_in_memory.pyx
Co-authored-by: Sofie Van Landeghem <svlandeg@users.noreply.github.com>
* Update website/docs/api/inmemorylookupkb.mdx
Co-authored-by: Sofie Van Landeghem <svlandeg@users.noreply.github.com>
* Update get_candidates() docstring.
* Reformat imports in entity_linker.py.
* Drop valid_ent_idx_per_doc.
* Update docs.
* Format.
* Simplify doc loop in predict().
* Remove E1044 comment.
* Fix merge errors.
* Format.
* Format.
* Format.
* Fix merge error & tests.
* Format.
* Apply suggestions from code review
Co-authored-by: Madeesh Kannan <shadeMe@users.noreply.github.com>
* Use type alias.
* isort.
* isort.
* Lint.
* Add typedefs.pyx.
* Fix typedef import.
* Fix type aliases.
* Format.
* Update docstring and type usage.
* Add info on get_candidates(), get_candidates_batched().
* Re-add get_candidates info to v3 changelog.
* Update website/docs/api/entitylinker.mdx
Co-authored-by: Sofie Van Landeghem <svlandeg@users.noreply.github.com>
* Update factory functions for backwards compatibility.
* Format.
* Ignore mypy error.
* Fix mypy error.
* Format.
* Add test for multiple docs with multiple entities.
---------
Co-authored-by: Sofie Van Landeghem <svlandeg@users.noreply.github.com>
Co-authored-by: Madeesh Kannan <shadeMe@users.noreply.github.com>
Co-authored-by: svlandeg <svlandeg@github.com>
2024-04-09 11:39:18 +02:00
Sofie Van Landeghem
f5e85fa05a
allow weasel 0.4.x (#13409)
2024-04-04 12:55:08 +02:00
Yaseen
21aea59001
Update code.module.sass to make code title sticky (#13379)
2024-03-26 12:15:25 +01:00
Sofie Van Landeghem
4dc5fe5469
Renamed main branch back to v4 for now (#13395)
* Update gputests.yml
* Update slowtests.yml
2024-03-26 09:53:07 +01:00
Ines Montani
1252370f69
Move DocSearch key to env var [ci skip]
2024-03-25 10:17:57 +01:00
Sofie Van Landeghem
d410d95b52
remove smart_open requirement as it's taken care of via Weasel (#13391)
2024-03-22 18:21:20 +01:00
Matthew Honnibal
0518c36f04
Sanitize direct download (#13313)
The 'direct' option in 'spacy download' is supposed to only download from our model releases repository. However, users were able to pass in a relative path, allowing download from arbitrary repositories. This meant that a service that sourced strings from user input and which used the direct option would allow users to install arbitrary packages.
2024-02-20 13:17:51 +01:00
Daniël de Kok
bff8725f4b
Set version to 3.7.4 (#13327)
2024-02-14 14:46:28 +01:00
Daniël de Kok
fdfdbcd9f4
Make `Language.pipe` workers exit cleanly (#13321)
Also warn when any worker exited with a non-zero exit code and modify
test to ensure that workers exit cleanly by default.
2024-02-12 14:39:38 +01:00
Daniël de Kok
14bd9d89a3
Update example that shows model in requirements (#13302)
See #13293.
2024-02-11 19:46:43 +01:00
Adriane Boyd
afb22ad491
Remove debug data normalization for span analysis (#13203)
* Remove debug data normalization for span analysis
As a result of this normalization, `debug data` could show a user tokens
that do not exist in their data.
* Update spacy/cli/debug_data.py
---------
Co-authored-by: svlandeg <svlandeg@github.com>
2024-02-06 14:14:55 +01:00
Daniël de Kok
e1249d3722
Test if closing explicitly solves recursive lock issues (#13304)
2024-02-05 10:07:03 +01:00
Daniël de Kok
1052cba9f3
Merge pull request #13299 from danieldk/copy/master
Sync main with latest changes from master (v3)
2024-02-04 15:40:55 +01:00
Daniël de Kok
40422ff904
Set version to 3.7.3 (#13301)
2024-02-02 13:51:26 +01:00
Daniël de Kok
2dbb332cea
`TextCatParametricAttention.v1`: set key transform dimensions (#13249)
* TextCatParametricAttention.v1: set key transform dimensions
This is necessary for tok2vec implementations that initialize
lazily (e.g. curated transformers).
* Add lazily-initialized tok2vec to simulate transformers
Add a lazily-initialized tok2vec to the tests and test the current
textcat models with it.
Fix some additional issues found using this test.
* isort
* Add `test.` prefix to `LazyInitTok2Vec.v1`
2024-02-02 13:01:59 +01:00
Daniël de Kok
2d4067d021
Test if closing explicitly solves recursive lock issues
2024-02-02 11:39:07 +01:00
Daniël de Kok
d84068e460
Run slow tests: v4 -> main (#13290)
* Run slow tests: v4 -> main
* Also update the branch in GPU tests
2024-01-30 13:58:28 +01:00