svlandeg
5992e927b9
fix textcat init functionality
2024-05-14 18:38:11 +02:00
svlandeg
c27679f210
Merge branch 'master' into feat/update_v4
2024-05-14 17:42:48 +02:00
Sofie Van Landeghem
c195ca4f9c
fix docs for MorphAnalysis.__contains__ ( #13433 )
2024-05-02 16:46:41 +02:00
Sofie Van Landeghem
d3a232f773
Update LICENSE to include 2024 ( #13472 )
2024-04-30 09:17:59 +02:00
Sofie Van Landeghem
ecd85d2618
Update Typer pin and GH actions ( #13471 )
...
* update gh actions
* pin typer upperbound to 1.0.0
2024-04-29 13:28:46 +02:00
Alex Strick van Linschoten
045cd43c3f
Fix typos in docs ( #13466 )
...
* fix typos
* prettier formatting
---------
Co-authored-by: svlandeg <svlandeg@github.com>
2024-04-29 11:10:17 +02:00
Sofie Van Landeghem
74836524e3
Bump to v5 ( #13470 )
2024-04-29 10:36:31 +02:00
Sofie Van Landeghem
6d6c10ab9c
Fix CI ( #13469 )
...
* Remove hardcoded architecture setting
* update classifiers to include Python 3.12
2024-04-29 10:18:07 +02:00
Sofie Van Landeghem
287deee02c
remove empty file ( #13458 )
2024-04-26 10:04:16 +02:00
Daniël de Kok
b2ca7253d2
Document TrainablePipe.save_activations
( #13452 )
...
* Document `TrainablePipe.save_activations`
* Fully qualified links
Co-authored-by: Sofie Van Landeghem <svlandeg@users.noreply.github.com>
* prettier
---------
Co-authored-by: Sofie Van Landeghem <svlandeg@users.noreply.github.com>
2024-04-23 09:21:23 +02:00
Daniël de Kok
f5918d4353
Update to Thinc 9.0.0 and set version to 4.0.0.dev3 ( #13448 )
...
* Update to Thinc 9.0.0 and set version to 4.0.0.dev3
* Set minimum Python version to 3.9
2024-04-22 09:40:55 +02:00
Daniël de Kok
5bd141013b
Remove apple
from extras ( #13439 )
...
Account for merging of `thinc-apple-ops` into `thinc`.
2024-04-17 13:43:27 +02:00
Daniël de Kok
8696861c8c
Update spacy-curated-transformers
docs for spaCy 4 ( #13440 )
...
- Update model constructors to v2 and add `dtype` argument.
- Update to `PyTorchCheckpointLoader` to `v2`.
- Add `transformer_discriminative.v1`.
2024-04-16 12:06:58 +02:00
Sofie Van Landeghem
2e2334632b
Fix use_gold_ents behaviour for EntityLinker ( #13400 )
...
* fix type annotation in docs
* only restore entities after loss calculation
* restore entities of sample in initialization
* rename overfitting function
* fix EL scorer
* Relax test
* fix formatting
* Update spacy/pipeline/entity_linker.py
Co-authored-by: Raphael Mitsch <r.mitsch@outlook.com>
* rename to _ensure_ents
* further rename
* allow for scorer to be None
---------
Co-authored-by: Raphael Mitsch <r.mitsch@outlook.com>
2024-04-16 12:00:22 +02:00
Joe Schiff
2e96797696
Convert properties to decorator syntax ( #13390 )
2024-04-16 11:51:14 +02:00
Daniël de Kok
fbc14aea45
Add distill subcommand ( #13431 )
...
* Add distill subcommand
This subcommand distills a student model from a teacher model.
* Fixes from Sofie
Co-authored-by: Sofie Van Landeghem <svlandeg@users.noreply.github.com>
* Type and doc fixes
* Wording
* distill: document missing `-o`
* Wording
* Small fix
---------
Co-authored-by: Sofie Van Landeghem <svlandeg@users.noreply.github.com>
2024-04-11 19:33:46 +02:00
Raphael Mitsch
304b9331e6
Modify EL batching to doc-wise streaming approach ( #12367 )
...
* Convert Candidate from Cython to Python class.
* Format.
* Fix .entity_ typo in _add_activations() usage.
* Change type for mentions to look up entity candidates for to SpanGroup from Iterable[Span].
* Update docs.
* Update spacy/kb/candidate.py
Co-authored-by: Sofie Van Landeghem <svlandeg@users.noreply.github.com>
* Update doc string of BaseCandidate.__init__().
* Update spacy/kb/candidate.py
Co-authored-by: Sofie Van Landeghem <svlandeg@users.noreply.github.com>
* Rename Candidate to InMemoryCandidate, BaseCandidate to Candidate.
* Adjust Candidate to support and mandate numerical entity IDs.
* Format.
* Fix docstring and docs.
* Update website/docs/api/kb.mdx
Co-authored-by: Sofie Van Landeghem <svlandeg@users.noreply.github.com>
* Rename alias -> mention.
* Refactor Candidate attribute names. Update docs and tests accordingly.
* Refacor Candidate attributes and their usage.
* Format.
* Fix mypy error.
* Update error code in line with v4 convention.
* Modify EL batching system.
* Update leftover get_candidates() mention in docs.
* Format docs.
* Format.
* Update spacy/kb/candidate.py
Co-authored-by: Sofie Van Landeghem <svlandeg@users.noreply.github.com>
* Updated error code.
* Simplify interface for int/str representations.
* Update website/docs/api/kb.mdx
Co-authored-by: Sofie Van Landeghem <svlandeg@users.noreply.github.com>
* Rename 'alias' to 'mention'.
* Port Candidate and InMemoryCandidate to Cython.
* Remove redundant entry in setup.py.
* Add abstract class check.
* Drop storing mention.
* Update spacy/kb/candidate.pxd
Co-authored-by: Sofie Van Landeghem <svlandeg@users.noreply.github.com>
* Fix entity_id refactoring problems in docstrings.
* Drop unused InMemoryCandidate._entity_hash.
* Update docstrings.
* Move attributes out of Candidate.
* Partially fix alias/mention terminology usage. Convert Candidate to interface.
* Remove prior_prob from supported properties in Candidate. Introduce KnowledgeBase.supports_prior_probs().
* Update docstrings related to prior_prob.
* Update alias/mention usage in doc(strings).
* Update spacy/ml/models/entity_linker.py
Co-authored-by: Sofie Van Landeghem <svlandeg@users.noreply.github.com>
* Update spacy/ml/models/entity_linker.py
Co-authored-by: Sofie Van Landeghem <svlandeg@users.noreply.github.com>
* Mention -> alias renaming. Drop Candidate.mentions(). Drop InMemoryLookupKB.get_alias_candidates() from docs.
* Update docstrings.
* Fix InMemoryCandidate attribute names.
* Update spacy/kb/kb.pyx
Co-authored-by: Sofie Van Landeghem <svlandeg@users.noreply.github.com>
* Update spacy/ml/models/entity_linker.py
Co-authored-by: Sofie Van Landeghem <svlandeg@users.noreply.github.com>
* Update W401 test.
* Update spacy/errors.py
Co-authored-by: Sofie Van Landeghem <svlandeg@users.noreply.github.com>
* Update spacy/kb/kb.pyx
Co-authored-by: Sofie Van Landeghem <svlandeg@users.noreply.github.com>
* Use Candidate output type for toy generators in the test suite to mimick best practices
* fix docs
* fix import
* Fix merge leftovers.
* Update spacy/kb/kb.pyx
Co-authored-by: Sofie Van Landeghem <svlandeg@users.noreply.github.com>
* Update spacy/kb/kb.pyx
Co-authored-by: Sofie Van Landeghem <svlandeg@users.noreply.github.com>
* Update website/docs/api/kb.mdx
Co-authored-by: Sofie Van Landeghem <svlandeg@users.noreply.github.com>
* Update website/docs/api/entitylinker.mdx
Co-authored-by: Sofie Van Landeghem <svlandeg@users.noreply.github.com>
* Update spacy/kb/kb_in_memory.pyx
Co-authored-by: Sofie Van Landeghem <svlandeg@users.noreply.github.com>
* Update website/docs/api/inmemorylookupkb.mdx
Co-authored-by: Sofie Van Landeghem <svlandeg@users.noreply.github.com>
* Update get_candidates() docstring.
* Reformat imports in entity_linker.py.
* Drop valid_ent_idx_per_doc.
* Update docs.
* Format.
* Simplify doc loop in predict().
* Remove E1044 comment.
* Fix merge errors.
* Format.
* Format.
* Format.
* Fix merge error & tests.
* Format.
* Apply suggestions from code review
Co-authored-by: Madeesh Kannan <shadeMe@users.noreply.github.com>
* Use type alias.
* isort.
* isort.
* Lint.
* Add typedefs.pyx.
* Fix typedef import.
* Fix type aliases.
* Format.
* Update docstring and type usage.
* Add info on get_candidates(), get_candidates_batched().
* Readd get_candidates info to v3 changelog.
* Update website/docs/api/entitylinker.mdx
Co-authored-by: Sofie Van Landeghem <svlandeg@users.noreply.github.com>
* Update factory functions for backwards compatibility.
* Format.
* Ignore mypy error.
* Fix mypy error.
* Format.
* Add test for multiple docs with multiple entities.
---------
Co-authored-by: Sofie Van Landeghem <svlandeg@users.noreply.github.com>
Co-authored-by: Madeesh Kannan <shadeMe@users.noreply.github.com>
Co-authored-by: svlandeg <svlandeg@github.com>
2024-04-09 11:39:18 +02:00
Sofie Van Landeghem
f5e85fa05a
allow weasel 0.4.x ( #13409 )
2024-04-04 12:55:08 +02:00
Yaseen
21aea59001
Update code.module.sass to make code title sticky ( #13379 )
2024-03-26 12:15:25 +01:00
Sofie Van Landeghem
4dc5fe5469
Renamed main branch back to v4 for now ( #13395 )
...
* Update gputests.yml
* Update slowtests.yml
2024-03-26 09:53:07 +01:00
Ines Montani
1252370f69
Move DocSearch key to env var [ci skip]
2024-03-25 10:17:57 +01:00
Sofie Van Landeghem
d410d95b52
remove smart_open requirement as it's taken care of via Weasel ( #13391 )
2024-03-22 18:21:20 +01:00
Matthew Honnibal
0518c36f04
Sanitize direct download ( #13313 )
...
The 'direct' option in 'spacy download' is supposed to only download from our model releases repository. However, users were able to pass in a relative path, allowing download from arbitrary repositories. This meant that a service that sourced strings from user input and which used the direct option would allow users to install arbitrary packages.
2024-02-20 13:17:51 +01:00
Daniël de Kok
bff8725f4b
Set version to 3.7.4 ( #13327 )
2024-02-14 14:46:28 +01:00
Daniël de Kok
fdfdbcd9f4
Make Language.pipe
workers exit cleanly ( #13321 )
...
Also warn when any worker exited with a non-zero exit code and modify
test to ensure that workers exit cleanly by default.
2024-02-12 14:39:38 +01:00
Daniël de Kok
14bd9d89a3
Update example that shows model in requirments ( #13302 )
...
See #13293 .
2024-02-11 19:46:43 +01:00
Adriane Boyd
afb22ad491
Remove debug data normalization for span analysis ( #13203 )
...
* Remove debug data normalization for span analysis
As a result of this normalization, `debug data` could show a user tokens
that do not exist in their data.
* Update spacy/cli/debug_data.py
---------
Co-authored-by: svlandeg <svlandeg@github.com>
2024-02-06 14:14:55 +01:00
Daniël de Kok
e1249d3722
Test if closing explicitly solves recursive lock issues ( #13304 )
2024-02-05 10:07:03 +01:00
Daniël de Kok
1052cba9f3
Merge pull request #13299 from danieldk/copy/master
...
Sync main with latests changes from master (v3)
2024-02-04 15:40:55 +01:00
Daniël de Kok
40422ff904
Set version to 3.7.3 ( #13301 )
2024-02-02 13:51:26 +01:00
Daniël de Kok
2dbb332cea
TextCatParametricAttention.v1
: set key transform dimensions (#13249 )
...
* TextCatParametricAttention.v1: set key transform dimensions
This is necessary for tok2vec implementations that initialize
lazily (e.g. curated transformers).
* Add lazily-initialized tok2vec to simulate transformers
Add a lazily-initialized tok2vec to the tests and test the current
textcat models with it.
Fix some additional issues found using this test.
* isort
* Add `test.` prefix to `LazyInitTok2Vec.v1`
2024-02-02 13:01:59 +01:00
Daniël de Kok
2d4067d021
Test if closing explicitly solves recursive lock issues
2024-02-02 11:39:07 +01:00
Daniël de Kok
d84068e460
Run slow tests: v4 -> main ( #13290 )
...
* Run slow tests: v4 -> main
* Also update the branch in GPU tests
2024-01-30 13:58:28 +01:00
Sofie Van Landeghem
89a43f39b7
update universe description ( #13291 )
2024-01-30 13:49:49 +01:00
Daniël de Kok
68d7841df5
Extension serialization attr tests: add teardown ( #13284 )
...
The doc/token extension serialization tests add extensions that are not
serializable with pickle. This didn't cause issues before due to the
implicit run order of tests. However, test ordering has changed with
pytest 8.0.0, leading to failed tests in test_language.
Update the fixtures in the extension serialization tests to do proper
teardown and remove the extensions.
2024-01-29 13:51:56 +01:00
Eliana Vornov
00e938a7c3
add custom code support to CLI speed benchmark ( #13247 )
...
* add custom code support to CLI speed benchmark
* sort imports
* better copying for warmup docs
2024-01-26 13:29:22 +01:00
Sofie Van Landeghem
68b85ea950
Clarify data_path loading for apply CLI command ( #13272 )
...
* attempt to clarify additional annotations on .spacy file
* suggestion by Daniël
* pipeline instead of pipe
2024-01-26 12:10:05 +01:00
Sofie Van Landeghem
7496e03a2c
Clarify vocab docs ( #13273 )
...
* add line to ensure that apple is in fact in the vocab
* add that the vocab may be empty
2024-01-26 10:58:48 +01:00
Daniël de Kok
70e2f2a14a
Update spacy-legacy
dependency to 4.0.0.dev1 ( #13270 )
...
This release is compatible with the parser refactor backout.
2024-01-25 18:24:22 +01:00
Daniël de Kok
ce9ea9629f
Set version to v4.0.0.dev2 ( #13269 )
2024-01-25 12:54:23 +01:00
Daniël de Kok
bbf38d4d0f
Merge pull request #13250 from danieldk/maintenance/v4-merge-master-20240119
...
Merge `master` into `v4`
2024-01-24 19:30:23 +01:00
Daniël de Kok
9e97c730be
Fix up requirements test
...
To account for buil dependencies being removed from `setup.cfg`.
2024-01-24 17:18:49 +01:00
Daniël de Kok
36ee709390
Remove setup_requires
from setup.cfg
2024-01-24 15:02:02 +01:00
Daniël de Kok
e722284ff4
Construct TextCatEnsemble.v2 using helper function
2024-01-24 14:59:01 +01:00
Daniël de Kok
ce4ea5ffa7
Py_UNICODE is not compatible with 3.12
2024-01-24 13:08:56 +01:00
Daniël de Kok
c621e251b8
Typing fixes
2024-01-24 12:20:01 +01:00
Sofie Van Landeghem
a493981163
fix typo ( #13254 )
2024-01-24 09:29:57 +01:00
Daniël de Kok
82ef6783a8
Merge remote-tracking branch 'upstream/master' into maintenance/v4-merge-master-20240119
2024-01-24 09:09:01 +01:00
Daniël de Kok
a8894a8946
Merge pull request #13240 from mauricesvp/patch-1
...
Fix typo in method name
2024-01-23 20:49:21 +01:00
Daniël de Kok
afac7fb650
test_find_available_port: use port 5001 ( #13255 )
...
macOS now uses port 5000 for the AirPlay receiver functionality, so this
test will always fail on a macOS desktop (unless AirPlay receiver
functionality is disabled like in CI).
2024-01-23 20:11:16 +01:00