svlandeg
c27679f210
Merge branch 'master' into feat/update_v4
2024-05-14 17:42:48 +02:00
Sofie Van Landeghem
c195ca4f9c
fix docs for MorphAnalysis.__contains__ ( #13433 )
2024-05-02 16:46:41 +02:00
Alex Strick van Linschoten
045cd43c3f
Fix typos in docs ( #13466 )
...
* fix typos
* prettier formatting
---------
Co-authored-by: svlandeg <svlandeg@github.com>
2024-04-29 11:10:17 +02:00
Daniël de Kok
b2ca7253d2
Document `TrainablePipe.save_activations` ( #13452 )
...
* Document `TrainablePipe.save_activations`
* Fully qualified links
Co-authored-by: Sofie Van Landeghem <svlandeg@users.noreply.github.com>
* prettier
---------
Co-authored-by: Sofie Van Landeghem <svlandeg@users.noreply.github.com>
2024-04-23 09:21:23 +02:00
Daniël de Kok
5bd141013b
Remove apple from extras ( #13439 )
...
Account for merging of `thinc-apple-ops` into `thinc`.
2024-04-17 13:43:27 +02:00
Daniël de Kok
8696861c8c
Update spacy-curated-transformers docs for spaCy 4 ( #13440 )
...
- Update model constructors to v2 and add `dtype` argument.
- Update `PyTorchCheckpointLoader` to `v2`.
- Add `transformer_discriminative.v1`.
2024-04-16 12:06:58 +02:00
Sofie Van Landeghem
2e2334632b
Fix use_gold_ents behaviour for EntityLinker ( #13400 )
...
* fix type annotation in docs
* only restore entities after loss calculation
* restore entities of sample in initialization
* rename overfitting function
* fix EL scorer
* Relax test
* fix formatting
* Update spacy/pipeline/entity_linker.py
Co-authored-by: Raphael Mitsch <r.mitsch@outlook.com>
* rename to _ensure_ents
* further rename
* allow for scorer to be None
---------
Co-authored-by: Raphael Mitsch <r.mitsch@outlook.com>
2024-04-16 12:00:22 +02:00
Daniël de Kok
fbc14aea45
Add distill subcommand ( #13431 )
...
* Add distill subcommand
This subcommand distills a student model from a teacher model.
* Fixes from Sofie
Co-authored-by: Sofie Van Landeghem <svlandeg@users.noreply.github.com>
* Type and doc fixes
* Wording
* distill: document missing `-o`
* Wording
* Small fix
---------
Co-authored-by: Sofie Van Landeghem <svlandeg@users.noreply.github.com>
2024-04-11 19:33:46 +02:00
Raphael Mitsch
304b9331e6
Modify EL batching to doc-wise streaming approach ( #12367 )
...
* Convert Candidate from Cython to Python class.
* Format.
* Fix .entity_ typo in _add_activations() usage.
* Change the type of the mentions used to look up entity candidates from Iterable[Span] to SpanGroup.
* Update docs.
* Update spacy/kb/candidate.py
Co-authored-by: Sofie Van Landeghem <svlandeg@users.noreply.github.com>
* Update doc string of BaseCandidate.__init__().
* Update spacy/kb/candidate.py
Co-authored-by: Sofie Van Landeghem <svlandeg@users.noreply.github.com>
* Rename Candidate to InMemoryCandidate, BaseCandidate to Candidate.
* Adjust Candidate to support and mandate numerical entity IDs.
* Format.
* Fix docstring and docs.
* Update website/docs/api/kb.mdx
Co-authored-by: Sofie Van Landeghem <svlandeg@users.noreply.github.com>
* Rename alias -> mention.
* Refactor Candidate attribute names. Update docs and tests accordingly.
* Refactor Candidate attributes and their usage.
* Format.
* Fix mypy error.
* Update error code in line with v4 convention.
* Modify EL batching system.
* Update leftover get_candidates() mention in docs.
* Format docs.
* Format.
* Update spacy/kb/candidate.py
Co-authored-by: Sofie Van Landeghem <svlandeg@users.noreply.github.com>
* Updated error code.
* Simplify interface for int/str representations.
* Update website/docs/api/kb.mdx
Co-authored-by: Sofie Van Landeghem <svlandeg@users.noreply.github.com>
* Rename 'alias' to 'mention'.
* Port Candidate and InMemoryCandidate to Cython.
* Remove redundant entry in setup.py.
* Add abstract class check.
* Drop storing mention.
* Update spacy/kb/candidate.pxd
Co-authored-by: Sofie Van Landeghem <svlandeg@users.noreply.github.com>
* Fix entity_id refactoring problems in docstrings.
* Drop unused InMemoryCandidate._entity_hash.
* Update docstrings.
* Move attributes out of Candidate.
* Partially fix alias/mention terminology usage. Convert Candidate to interface.
* Remove prior_prob from supported properties in Candidate. Introduce KnowledgeBase.supports_prior_probs().
* Update docstrings related to prior_prob.
* Update alias/mention usage in doc(strings).
* Update spacy/ml/models/entity_linker.py
Co-authored-by: Sofie Van Landeghem <svlandeg@users.noreply.github.com>
* Update spacy/ml/models/entity_linker.py
Co-authored-by: Sofie Van Landeghem <svlandeg@users.noreply.github.com>
* Mention -> alias renaming. Drop Candidate.mentions(). Drop InMemoryLookupKB.get_alias_candidates() from docs.
* Update docstrings.
* Fix InMemoryCandidate attribute names.
* Update spacy/kb/kb.pyx
Co-authored-by: Sofie Van Landeghem <svlandeg@users.noreply.github.com>
* Update spacy/ml/models/entity_linker.py
Co-authored-by: Sofie Van Landeghem <svlandeg@users.noreply.github.com>
* Update W401 test.
* Update spacy/errors.py
Co-authored-by: Sofie Van Landeghem <svlandeg@users.noreply.github.com>
* Update spacy/kb/kb.pyx
Co-authored-by: Sofie Van Landeghem <svlandeg@users.noreply.github.com>
* Use Candidate output type for toy generators in the test suite to mimic best practices
* fix docs
* fix import
* Fix merge leftovers.
* Update spacy/kb/kb.pyx
Co-authored-by: Sofie Van Landeghem <svlandeg@users.noreply.github.com>
* Update spacy/kb/kb.pyx
Co-authored-by: Sofie Van Landeghem <svlandeg@users.noreply.github.com>
* Update website/docs/api/kb.mdx
Co-authored-by: Sofie Van Landeghem <svlandeg@users.noreply.github.com>
* Update website/docs/api/entitylinker.mdx
Co-authored-by: Sofie Van Landeghem <svlandeg@users.noreply.github.com>
* Update spacy/kb/kb_in_memory.pyx
Co-authored-by: Sofie Van Landeghem <svlandeg@users.noreply.github.com>
* Update website/docs/api/inmemorylookupkb.mdx
Co-authored-by: Sofie Van Landeghem <svlandeg@users.noreply.github.com>
* Update get_candidates() docstring.
* Reformat imports in entity_linker.py.
* Drop valid_ent_idx_per_doc.
* Update docs.
* Format.
* Simplify doc loop in predict().
* Remove E1044 comment.
* Fix merge errors.
* Format.
* Format.
* Format.
* Fix merge error & tests.
* Format.
* Apply suggestions from code review
Co-authored-by: Madeesh Kannan <shadeMe@users.noreply.github.com>
* Use type alias.
* isort.
* isort.
* Lint.
* Add typedefs.pyx.
* Fix typedef import.
* Fix type aliases.
* Format.
* Update docstring and type usage.
* Add info on get_candidates(), get_candidates_batched().
* Readd get_candidates info to v3 changelog.
* Update website/docs/api/entitylinker.mdx
Co-authored-by: Sofie Van Landeghem <svlandeg@users.noreply.github.com>
* Update factory functions for backwards compatibility.
* Format.
* Ignore mypy error.
* Fix mypy error.
* Format.
* Add test for multiple docs with multiple entities.
---------
Co-authored-by: Sofie Van Landeghem <svlandeg@users.noreply.github.com>
Co-authored-by: Madeesh Kannan <shadeMe@users.noreply.github.com>
Co-authored-by: svlandeg <svlandeg@github.com>
2024-04-09 11:39:18 +02:00
Daniël de Kok
14bd9d89a3
Update example that shows model in requirements ( #13302 )
...
See #13293 .
2024-02-11 19:46:43 +01:00
Daniël de Kok
1052cba9f3
Merge pull request #13299 from danieldk/copy/master
...
Sync main with latest changes from master (v3)
2024-02-04 15:40:55 +01:00
Eliana Vornov
00e938a7c3
add custom code support to CLI speed benchmark ( #13247 )
...
* add custom code support to CLI speed benchmark
* sort imports
* better copying for warmup docs
2024-01-26 13:29:22 +01:00
Sofie Van Landeghem
68b85ea950
Clarify data_path loading for apply CLI command ( #13272 )
...
* attempt to clarify additional annotations on .spacy file
* suggestion by Daniël
* pipeline instead of pipe
2024-01-26 12:10:05 +01:00
Sofie Van Landeghem
7496e03a2c
Clarify vocab docs ( #13273 )
...
* add line to ensure that apple is in fact in the vocab
* add that the vocab may be empty
2024-01-26 10:58:48 +01:00
Sofie Van Landeghem
a493981163
fix typo ( #13254 )
2024-01-24 09:29:57 +01:00
Daniël de Kok
82ef6783a8
Merge remote-tracking branch 'upstream/master' into maintenance/v4-merge-master-20240119
2024-01-24 09:09:01 +01:00
Raphael Mitsch
575c405ae3
Fix LLM docs on task factories.
2024-01-19 16:48:54 +01:00
Raphael Mitsch
256468c414
Merge branch 'docs/llm_main' into chore/sync-master-with-llm_main
...
# Conflicts:
# website/docs/api/large-language-models.mdx
2024-01-19 16:34:35 +01:00
Raphael Mitsch
91c24c0285
Merge pull request #13251 from explosion/docs/llm_develop
...
Sync `docs/llm_main` with `docs/llm_develop`
2024-01-19 12:56:38 +01:00
Daniël de Kok
81beaea70e
Merge remote-tracking branch 'upstream/master' into maintenance/v4-merge-master-20240119
2024-01-19 12:34:29 +01:00
Raphael Mitsch
0062c22c35
Updated docs w.r.t. infinite doc length changes ( #13214 )
...
* Updated docs w.r.t. infinite doc length.
* Fix typo.
* fix typos
* Fix table formatting.
* Update formatting.
---------
Co-authored-by: Sofie Van Landeghem <svlandeg@users.noreply.github.com>
2024-01-05 14:20:58 +01:00
Daniël de Kok
e2a3952de5
Add spacy.TextCatParametricAttention.v1 ( #13201 )
...
* Add spacy.TextCatParametricAttention.v1
This layer is a simplification of the ensemble classifier that
only uses parametric attention. We have found empirically that with a
sufficient amount of training data, using the ensemble classifier with
BoW does not provide significant improvement in classifier accuracy.
However, plugging in a BoW classifier does reduce GPU training and
inference performance substantially, since it uses a CPU-only kernel.
* Fix merge fallout
2024-01-02 10:03:06 +01:00
Daniël de Kok
7718886fa3
TransitionBasedParser.v2 in run example output
...
Co-authored-by: Adriane Boyd <adrianeboyd@gmail.com>
2023-12-21 11:14:35 +01:00
Daniël de Kok
7ebba86402
Add TextCatReduce.v1 ( #13181 )
...
* Add TextCatReduce.v1
This is a textcat classifier that pools the vectors generated by a
tok2vec implementation and then applies a classifier to the pooled
representation. Three reductions are supported for pooling: first, max,
and mean. When multiple reductions are enabled, the reductions are
concatenated before providing them to the classification layer.
This model is a generalization of the TextCatCNN model, which only
supports mean reductions and is a bit of a misnomer, because it can also
be used with transformers. This change also reimplements TextCatCNN.v2
using the new TextCatReduce.v1 layer.
* Doc fixes
Co-authored-by: Sofie Van Landeghem <svlandeg@users.noreply.github.com>
* Fully specify `TextCatCNN` <-> `TextCatReduce` equivalence
* Move TextCatCNN docs to legacy, in prep for moving to spacy-legacy
* Add back a test for TextCatCNN.v2
* Replace TextCatCNN in pipe configurations and templates
* Add an infobox to the `TextCatReduce` section with a `TextCatCNN` anchor
* Add last reduction (`use_reduce_last`)
* Remove non-working TextCatCNN Netlify redirect
* Revert layer changes for the quickstart
* Revert one more quickstart change
* Remove unused import
* Fix docstring
* Fix setting name in error message
---------
Co-authored-by: Sofie Van Landeghem <svlandeg@users.noreply.github.com>
Co-authored-by: Adriane Boyd <adrianeboyd@gmail.com>
2023-12-21 11:00:06 +01:00
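The pooling described in the `TextCatReduce.v1` commit message above can be illustrated with a small sketch. This is not spaCy's implementation: it uses plain Python lists instead of Thinc layers, the function name `reduce_and_concat` is hypothetical, and the flag names other than `use_reduce_last` (which the commit mentions) as well as the concatenation order are assumptions made for illustration.

```python
from typing import List, Sequence


def reduce_and_concat(
    token_vectors: Sequence[Sequence[float]],
    use_reduce_first: bool = False,
    use_reduce_last: bool = False,
    use_reduce_max: bool = False,
    use_reduce_mean: bool = False,
) -> List[float]:
    """Pool per-token vectors of one document and concatenate the results.

    Hypothetical helper: the order first, last, max, mean is an assumption.
    """
    if not token_vectors:
        raise ValueError("need at least one token vector")
    # Transpose so that each entry holds one dimension across all tokens.
    columns = list(zip(*token_vectors))
    pooled: List[float] = []
    if use_reduce_first:
        pooled.extend(token_vectors[0])
    if use_reduce_last:
        pooled.extend(token_vectors[-1])
    if use_reduce_max:
        pooled.extend(max(col) for col in columns)
    if use_reduce_mean:
        pooled.extend(sum(col) / len(col) for col in columns)
    if not pooled:
        raise ValueError("enable at least one reduction")
    return pooled


# Two tokens with two-dimensional vectors.
doc = [[1.0, 4.0], [3.0, 2.0]]
print(reduce_and_concat(doc, use_reduce_max=True, use_reduce_mean=True))
# prints [3.0, 4.0, 2.0, 3.0]
```

With only the mean reduction enabled, the sketch mirrors the mean-only pooling that the commit attributes to `TextCatCNN`; the pooled vector would then be passed to a classification layer.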
Daniël de Kok
57203fa0fc
Fix TransitionBasedParser version in transformer embeddings docs
2023-12-19 09:28:20 +01:00
Raphael Mitsch
d56ee65ddf
Document spacy-llm's TranslationTask ( #13183 )
...
* Describe translation task.
* Fix references to examples and template.
* Format.
2023-12-11 17:41:04 +01:00
Raphael Mitsch
e79a9c5acd
Document spacy-llm's RawTask ( #13180 )
...
* Add section on RawTask.
* Fix API docs.
* Update website/docs/api/large-language-models.mdx
Co-authored-by: Sofie Van Landeghem <svlandeg@users.noreply.github.com>
---------
Co-authored-by: Sofie Van Landeghem <svlandeg@users.noreply.github.com>
2023-12-11 17:14:12 +01:00
Daniël de Kok
e5ec45cb7e
Revert "Merge the parser refactor into v4 ( #10940 )"
...
This reverts commit a183db3cef.
2023-12-08 20:23:08 +01:00
Raphael Mitsch
9fcd2bfa08
Add info on endpoint arg. ( #13169 )
2023-12-05 12:46:29 +01:00
Raphael Mitsch
a25a3b996b
Merge pull request #13173 from explosion/docs/llm_main
...
Sync `llm_develop` with `llm_main`
2023-12-04 16:46:21 +01:00
Raphael Mitsch
55ed2b4e82
Add documentation for EL task ( #12988 )
...
* Add documentation for EL task.
* Fix EL factory name.
* Add llm_entity_linker_mentio.
* Apply suggestions from code review
Co-authored-by: Madeesh Kannan <shadeMe@users.noreply.github.com>
* Update EL task docs.
* Update EL task docs.
* Update EL task docs.
* Update EL task docs.
* Update EL task docs.
* Update EL task docs.
* Update EL task docs.
* Update EL task docs.
* Update EL task docs.
* Apply suggestions from code review
Co-authored-by: Sofie Van Landeghem <svlandeg@users.noreply.github.com>
* Incorporate feedback.
* Format.
* Fix link to KB data.
---------
Co-authored-by: Sofie Van Landeghem <svlandeg@users.noreply.github.com>
Co-authored-by: Madeesh Kannan <shadeMe@users.noreply.github.com>
2023-12-04 15:23:28 +01:00
Adriane Boyd
e467573550
Docs: update trf_data examples and pipeline design info ( #13164 )
2023-12-04 15:15:54 +01:00
Raphael Mitsch
0e43fca036
Add Claude-2.1 mention. ( #13167 )
2023-12-01 16:48:35 +01:00
Daniël de Kok
da7ad97519
Update `TextCatBOW` to use the fixed `SparseLinear` layer ( #13149 )
...
* Update `TextCatBOW` to use the fixed `SparseLinear` layer
A while ago, we fixed the `SparseLinear` layer to use all available
parameters: https://github.com/explosion/thinc/pull/754
This change updates `TextCatBOW` to `v3` which uses the new
`SparseLinear_v2` layer. This results in a sizeable improvement on a
text categorization task that was tested.
While at it, this `spacy.TextCatBOW.v3` also adds the `length_exponent`
option to make it possible to change the hidden size. Ideally, we'd just
have an option called `length`. But the way that `TextCatBOW` uses
hashes results in a non-uniform distribution of parameters when the
length is not a power of two.
* Replace TextCatBOW `length_exponent` parameter by `length`
We now round up the length to the next power of two if it isn't
a power of two.
* Remove some tests for TextCatBOW.v2
* Fix missing import
2023-11-29 09:11:54 +01:00
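The rounding rule from the `TextCatBOW` commit message above (round `length` up to the next power of two if it is not one already) can be sketched as follows. This is an illustration only, not the code from the commit; the helper name `next_power_of_two` is hypothetical.

```python
def next_power_of_two(length: int) -> int:
    """Return the smallest power of two that is >= length.

    Hypothetical helper illustrating the rounding described in the commit.
    """
    if length < 1:
        raise ValueError("length must be a positive integer")
    # (length - 1).bit_length() is the number of bits needed to represent
    # length - 1, so shifting 1 by that amount yields the next power of two.
    return 1 << (length - 1).bit_length()


for requested in (1, 3, 262144, 262145):
    print(requested, "->", next_power_of_two(requested))
```

A value that is already a power of two, such as 262144, is returned unchanged, while 262145 is rounded up to 524288.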
Raphael Mitsch
b2e831d966
LLM docs: OpenAI model update ( #13119 )
...
* Update supported OpenAI models.
* Update with new GPT-3.5 and GPT-4 versions.
* Add links to OpenAI model docs.
2023-11-08 17:55:16 +01:00
Adriane Boyd
513bbd5fa3
Add preferred use of build for package CLI ( #13109 )
...
Build with `build` if available. Warn and fall back to previous
`setup.py`-based builds if `build` build fails.
2023-11-08 17:35:24 +01:00
Sofie Van Landeghem
a804b83a4b
Update llm docs to clarify task-specific factories ( #13082 )
...
* fix typo
* add examples to specify custom model for task-specific factory
2023-10-31 22:07:07 +01:00
Sofie Van Landeghem
48248c62b6
Clarify EL example in docs ( #13071 )
...
* add comment that pipeline is a custom one
* add link to NEL tutorial
* prettier
* revert prettier reformat
* revert prettier reformat (2)
* fix typo
Co-authored-by: Raphael Mitsch <r.mitsch@outlook.com>
---------
Co-authored-by: Raphael Mitsch <r.mitsch@outlook.com>
2023-10-31 21:58:29 +01:00
Raphael Mitsch
0c15876502
Fix spancat typo. ( #13095 )
2023-10-31 13:45:10 +01:00
Raphael Mitsch
9deaac9786
Add note in docs on score_weight config if using a non-default spans_key for SpanCat ( #13093 )
...
* Add note on score_weight if using a non-default spans_key for SpanCat.
* Fix formatting.
* Fix formatting.
* Fix typo.
* Use warning infobox.
* Fix infobox formatting.
2023-10-30 17:02:08 +01:00
Raphael Mitsch
d72029d9c8
Add binary examples for Textcat task in spacy-llm ( #13051 )
...
* Add examples for binary classification.
* Fix example.
* Remove binary textcat example. Format.
* Rephrase.
2023-10-11 12:23:38 +02:00
Ines Montani
b83f1e3724
Inline displaCy visualizations in docs ( #13050 ) [ci skip]
2023-10-06 14:22:43 +02:00
Raphael Mitsch
be29216fe2
Merge pull request #13044 from explosion/docs/llm_main
...
Sync `master` with `docs/llm_main`
2023-10-05 16:10:19 +02:00
Raphael Mitsch
1162fcf099
Add Mistral mentions. ( #13037 )
2023-10-05 14:44:38 +02:00
Raphael Mitsch
862f8254e8
Add docs on Azure OpenAI support in spacy-llm ( #13043 )
...
* Add gpt-3.5-turbo-instruct to list of supported OpenAI models.
* Update `spacy-llm` task argument docs w.r.t. task refactoring (#12995 )
* Update task arguments w.r.t. task refactoring in 0.5.0.
* Add disclaimer w.r.t. gated models/Llama 2.
* Update website/docs/api/large-language-models.mdx
* Update website/docs/api/large-language-models.mdx
* Update docs w.r.t. PaLM support. (#13018 )
* Add info on spacy.Azure.v1.
* Attempt to fix netlify check fails.
* Attempt to fix netlify check fails.
* Attempt to fix netlify check fails.
* Attempt to fix netlify check fails.
* Attempt to fix netlify check fails.
* Attempt to fix netlify check fails.
* Attempt to fix netlify check fails.
* Attempt to fix netlify check fails.
* Attempt to fix netlify check fails.
* Format.
2023-10-05 13:18:27 +02:00
Raphael Mitsch
1dec138e61
Update docs w.r.t. PaLM support. ( #13018 )
2023-10-05 08:50:41 +02:00
Adriane Boyd
6e54360a3d
Remove pathy dependency, update docs for cloudpathlib in Weasel ( #13035 )
2023-10-05 08:50:22 +02:00
Raphael Mitsch
734826db79
Update `spacy-llm` task argument docs w.r.t. task refactoring ( #12995 )
...
* Update task arguments w.r.t. task refactoring in 0.5.0.
* Add disclaimer w.r.t. gated models/Llama 2.
* Update website/docs/api/large-language-models.mdx
* Update website/docs/api/large-language-models.mdx
2023-10-05 08:45:25 +02:00
Adriane Boyd
160e61772e
Docs for v3.7.0 ( #13029 )
...
* Docs for v3.7.0
* Minor fixes
* Extend Weasel notes
* Minor edits
* Update version in README
2023-10-01 21:40:07 +02:00
Adriane Boyd
406794a081
Merge remote-tracking branch 'upstream/master' into chore/update-develop-from-master-v3.7-1
2023-09-28 15:09:06 +02:00