spaCy

mirror of https://github.com/explosion/spaCy.git synced 2025-04-24 19:11:58 +03:00

Author	SHA1	Message	Date
svlandeg	c27679f210	Merge branch 'master' into feat/update_v4	2024-05-14 17:42:48 +02:00
Sofie Van Landeghem	c195ca4f9c	fix docs for MorphAnalysis.__contains__ (#13433)	2024-05-02 16:46:41 +02:00
Alex Strick van Linschoten	045cd43c3f	Fix typos in docs (#13466) * fix typos * prettier formatting --------- Co-authored-by: svlandeg <svlandeg@github.com>	2024-04-29 11:10:17 +02:00
Daniël de Kok	b2ca7253d2	Document `TrainablePipe.save_activations` (#13452) * Document `TrainablePipe.save_activations` * Fully qualified links Co-authored-by: Sofie Van Landeghem <svlandeg@users.noreply.github.com> * prettier --------- Co-authored-by: Sofie Van Landeghem <svlandeg@users.noreply.github.com>	2024-04-23 09:21:23 +02:00
Daniël de Kok	5bd141013b	Remove `apple` from extras (#13439) Account for merging of `thinc-apple-ops` into `thinc`.	2024-04-17 13:43:27 +02:00
Daniël de Kok	8696861c8c	Update `spacy-curated-transformers` docs for spaCy 4 (#13440) - Update model constructors to v2 and add `dtype` argument. - Update `PyTorchCheckpointLoader` to `v2`. - Add `transformer_discriminative.v1`.	2024-04-16 12:06:58 +02:00
Sofie Van Landeghem	2e2334632b	Fix use_gold_ents behaviour for EntityLinker (#13400) * fix type annotation in docs * only restore entities after loss calculation * restore entities of sample in initialization * rename overfitting function * fix EL scorer * Relax test * fix formatting * Update spacy/pipeline/entity_linker.py Co-authored-by: Raphael Mitsch <r.mitsch@outlook.com> * rename to _ensure_ents * further rename * allow for scorer to be None --------- Co-authored-by: Raphael Mitsch <r.mitsch@outlook.com>	2024-04-16 12:00:22 +02:00
Daniël de Kok	fbc14aea45	Add distill subcommand (#13431) * Add distill subcommand This subcommand distills a student model from a teacher model. * Fixes from Sofie Co-authored-by: Sofie Van Landeghem <svlandeg@users.noreply.github.com> * Type and doc fixes * Wording * distill: document missing `-o` * Wording * Small fix --------- Co-authored-by: Sofie Van Landeghem <svlandeg@users.noreply.github.com>	2024-04-11 19:33:46 +02:00
Raphael Mitsch	304b9331e6	Modify EL batching to doc-wise streaming approach (#12367) * Convert Candidate from Cython to Python class. * Format. * Fix .entity_ typo in _add_activations() usage. * Change type for mentions to look up entity candidates for to SpanGroup from Iterable[Span]. * Update docs. * Update spacy/kb/candidate.py Co-authored-by: Sofie Van Landeghem <svlandeg@users.noreply.github.com> * Update doc string of BaseCandidate.__init__(). * Update spacy/kb/candidate.py Co-authored-by: Sofie Van Landeghem <svlandeg@users.noreply.github.com> * Rename Candidate to InMemoryCandidate, BaseCandidate to Candidate. * Adjust Candidate to support and mandate numerical entity IDs. * Format. * Fix docstring and docs. * Update website/docs/api/kb.mdx Co-authored-by: Sofie Van Landeghem <svlandeg@users.noreply.github.com> * Rename alias -> mention. * Refactor Candidate attribute names. Update docs and tests accordingly. * Refactor Candidate attributes and their usage. * Format. * Fix mypy error. * Update error code in line with v4 convention. * Modify EL batching system. * Update leftover get_candidates() mention in docs. * Format docs. * Format. * Update spacy/kb/candidate.py Co-authored-by: Sofie Van Landeghem <svlandeg@users.noreply.github.com> * Updated error code. * Simplify interface for int/str representations. * Update website/docs/api/kb.mdx Co-authored-by: Sofie Van Landeghem <svlandeg@users.noreply.github.com> * Rename 'alias' to 'mention'. * Port Candidate and InMemoryCandidate to Cython. * Remove redundant entry in setup.py. * Add abstract class check. * Drop storing mention. * Update spacy/kb/candidate.pxd Co-authored-by: Sofie Van Landeghem <svlandeg@users.noreply.github.com> * Fix entity_id refactoring problems in docstrings. * Drop unused InMemoryCandidate._entity_hash. * Update docstrings. * Move attributes out of Candidate. * Partially fix alias/mention terminology usage. Convert Candidate to interface. 
* Remove prior_prob from supported properties in Candidate. Introduce KnowledgeBase.supports_prior_probs(). * Update docstrings related to prior_prob. * Update alias/mention usage in doc(strings). * Update spacy/ml/models/entity_linker.py Co-authored-by: Sofie Van Landeghem <svlandeg@users.noreply.github.com> * Update spacy/ml/models/entity_linker.py Co-authored-by: Sofie Van Landeghem <svlandeg@users.noreply.github.com> * Mention -> alias renaming. Drop Candidate.mentions(). Drop InMemoryLookupKB.get_alias_candidates() from docs. * Update docstrings. * Fix InMemoryCandidate attribute names. * Update spacy/kb/kb.pyx Co-authored-by: Sofie Van Landeghem <svlandeg@users.noreply.github.com> * Update spacy/ml/models/entity_linker.py Co-authored-by: Sofie Van Landeghem <svlandeg@users.noreply.github.com> * Update W401 test. * Update spacy/errors.py Co-authored-by: Sofie Van Landeghem <svlandeg@users.noreply.github.com> * Update spacy/kb/kb.pyx Co-authored-by: Sofie Van Landeghem <svlandeg@users.noreply.github.com> * Use Candidate output type for toy generators in the test suite to mimic best practices * fix docs * fix import * Fix merge leftovers. * Update spacy/kb/kb.pyx Co-authored-by: Sofie Van Landeghem <svlandeg@users.noreply.github.com> * Update spacy/kb/kb.pyx Co-authored-by: Sofie Van Landeghem <svlandeg@users.noreply.github.com> * Update website/docs/api/kb.mdx Co-authored-by: Sofie Van Landeghem <svlandeg@users.noreply.github.com> * Update website/docs/api/entitylinker.mdx Co-authored-by: Sofie Van Landeghem <svlandeg@users.noreply.github.com> * Update spacy/kb/kb_in_memory.pyx Co-authored-by: Sofie Van Landeghem <svlandeg@users.noreply.github.com> * Update website/docs/api/inmemorylookupkb.mdx Co-authored-by: Sofie Van Landeghem <svlandeg@users.noreply.github.com> * Update get_candidates() docstring. * Reformat imports in entity_linker.py. * Drop valid_ent_idx_per_doc. * Update docs. * Format. * Simplify doc loop in predict(). * Remove E1044 comment. 
* Fix merge errors. * Format. * Format. * Format. * Fix merge error & tests. * Format. * Apply suggestions from code review Co-authored-by: Madeesh Kannan <shadeMe@users.noreply.github.com> * Use type alias. * isort. * isort. * Lint. * Add typedefs.pyx. * Fix typedef import. * Fix type aliases. * Format. * Update docstring and type usage. * Add info on get_candidates(), get_candidates_batched(). * Re-add get_candidates info to v3 changelog. * Update website/docs/api/entitylinker.mdx Co-authored-by: Sofie Van Landeghem <svlandeg@users.noreply.github.com> * Update factory functions for backwards compatibility. * Format. * Ignore mypy error. * Fix mypy error. * Format. * Add test for multiple docs with multiple entities. --------- Co-authored-by: Sofie Van Landeghem <svlandeg@users.noreply.github.com> Co-authored-by: Madeesh Kannan <shadeMe@users.noreply.github.com> Co-authored-by: svlandeg <svlandeg@github.com>	2024-04-09 11:39:18 +02:00
Yaseen	21aea59001	Update code.module.sass to make code title sticky (#13379)	2024-03-26 12:15:25 +01:00
Ines Montani	1252370f69	Move DocSearch key to env var [ci skip]	2024-03-25 10:17:57 +01:00
Daniël de Kok	14bd9d89a3	Update example that shows model in requirements (#13302) See #13293.	2024-02-11 19:46:43 +01:00
Daniël de Kok	1052cba9f3	Merge pull request #13299 from danieldk/copy/master Sync main with latest changes from master (v3)	2024-02-04 15:40:55 +01:00
Eliana Vornov	00e938a7c3	add custom code support to CLI speed benchmark (#13247) * add custom code support to CLI speed benchmark * sort imports * better copying for warmup docs	2024-01-26 13:29:22 +01:00
Sofie Van Landeghem	68b85ea950	Clarify data_path loading for apply CLI command (#13272) * attempt to clarify additional annotations on .spacy file * suggestion by Daniël * pipeline instead of pipe	2024-01-26 12:10:05 +01:00
Sofie Van Landeghem	7496e03a2c	Clarify vocab docs (#13273) * add line to ensure that apple is in fact in the vocab * add that the vocab may be empty	2024-01-26 10:58:48 +01:00
Sofie Van Landeghem	a493981163	fix typo (#13254)	2024-01-24 09:29:57 +01:00
Daniël de Kok	82ef6783a8	Merge remote-tracking branch 'upstream/master' into maintenance/v4-merge-master-20240119	2024-01-24 09:09:01 +01:00
Raphael Mitsch	575c405ae3	Fix LLM docs on task factories.	2024-01-19 16:48:54 +01:00
Raphael Mitsch	256468c414	Merge branch 'docs/llm_main' into chore/sync-master-with-llm_main # Conflicts: # website/docs/api/large-language-models.mdx	2024-01-19 16:34:35 +01:00
Raphael Mitsch	91c24c0285	Merge pull request #13251 from explosion/docs/llm_develop Sync `docs/llm_main` with `docs/llm_develop`	2024-01-19 12:56:38 +01:00
Daniël de Kok	81beaea70e	Merge remote-tracking branch 'upstream/master' into maintenance/v4-merge-master-20240119	2024-01-19 12:34:29 +01:00
Raphael Mitsch	0062c22c35	Updated docs w.r.t. infinite doc length changes (#13214) * Updated docs w.r.t. infinite doc length. * Fix typo. * fix typos * Fix table formatting. * Update formatting. --------- Co-authored-by: Sofie Van Landeghem <svlandeg@users.noreply.github.com>	2024-01-05 14:20:58 +01:00
Daniël de Kok	e2a3952de5	Add spacy.TextCatParametricAttention.v1 (#13201) * Add spacy.TextCatParametricAttention.v1 This layer is a simplification of the ensemble classifier that only uses parametric attention. We have found empirically that with a sufficient amount of training data, using the ensemble classifier with BoW does not provide significant improvement in classifier accuracy. However, plugging in a BoW classifier does reduce GPU training and inference performance substantially, since it uses a GPU-only kernel. * Fix merge fallout	2024-01-02 10:03:06 +01:00
Daniël de Kok	7718886fa3	TransitionBasedParser.v2 in run example output Co-authored-by: Adriane Boyd <adrianeboyd@gmail.com>	2023-12-21 11:14:35 +01:00
Daniël de Kok	7ebba86402	Add TextCatReduce.v1 (#13181) * Add TextCatReduce.v1 This is a textcat classifier that pools the vectors generated by a tok2vec implementation and then applies a classifier to the pooled representation. Three reductions are supported for pooling: first, max, and mean. When multiple reductions are enabled, the reductions are concatenated before providing them to the classification layer. This model is a generalization of the TextCatCNN model, which only supports mean reductions and is a bit of a misnomer, because it can also be used with transformers. This change also reimplements TextCatCNN.v2 using the new TextCatReduce.v1 layer. * Doc fixes Co-authored-by: Sofie Van Landeghem <svlandeg@users.noreply.github.com> * Fully specify `TextCatCNN` <-> `TextCatReduce` equivalence * Move TextCatCNN docs to legacy, in prep for moving to spacy-legacy * Add back a test for TextCatCNN.v2 * Replace TextCatCNN in pipe configurations and templates * Add an infobox to the `TextCatReduce` section with a `TextCatCNN` anchor * Add last reduction (`use_reduce_last`) * Remove non-working TextCatCNN Netlify redirect * Revert layer changes for the quickstart * Revert one more quickstart change * Remove unused import * Fix docstring * Fix setting name in error message --------- Co-authored-by: Sofie Van Landeghem <svlandeg@users.noreply.github.com> Co-authored-by: Adriane Boyd <adrianeboyd@gmail.com>	2023-12-21 11:00:06 +01:00
Daniël de Kok	57203fa0fc	Fix `TransitionBasedParser` version in transformer embeddings docs	2023-12-19 09:28:20 +01:00
Raphael Mitsch	d56ee65ddf	Document `spacy-llm`'s `TranslationTask` (#13183) * Describe translation task. * Fix references to examples and template. * Format.	2023-12-11 17:41:04 +01:00
Raphael Mitsch	e79a9c5acd	Document `spacy-llm`'s `RawTask` (#13180) * Add section on RawTask. * Fix API docs. * Update website/docs/api/large-language-models.mdx Co-authored-by: Sofie Van Landeghem <svlandeg@users.noreply.github.com> --------- Co-authored-by: Sofie Van Landeghem <svlandeg@users.noreply.github.com>	2023-12-11 17:14:12 +01:00
Daniël de Kok	e5ec45cb7e	Revert "Merge the parser refactor into `v4` (#10940)" This reverts commit `a183db3cef`.	2023-12-08 20:23:08 +01:00
Raphael Mitsch	9fcd2bfa08	Add info on endpoint arg. (#13169)	2023-12-05 12:46:29 +01:00
Raphael Mitsch	a25a3b996b	Merge pull request #13173 from explosion/docs/llm_main Sync `llm_develop` with `llm_main`	2023-12-04 16:46:21 +01:00
Raphael Mitsch	55ed2b4e82	Add documentation for EL task (#12988) * Add documentation for EL task. * Fix EL factory name. * Add llm_entity_linker_mentio. * Apply suggestions from code review Co-authored-by: Madeesh Kannan <shadeMe@users.noreply.github.com> * Update EL task docs. * Update EL task docs. * Update EL task docs. * Update EL task docs. * Update EL task docs. * Update EL task docs. * Update EL task docs. * Update EL task docs. * Update EL task docs. * Apply suggestions from code review Co-authored-by: Sofie Van Landeghem <svlandeg@users.noreply.github.com> * Incorporate feedback. * Format. * Fix link to KB data. --------- Co-authored-by: Sofie Van Landeghem <svlandeg@users.noreply.github.com> Co-authored-by: Madeesh Kannan <shadeMe@users.noreply.github.com>	2023-12-04 15:23:28 +01:00
Adriane Boyd	e467573550	Docs: update trf_data examples and pipeline design info (#13164)	2023-12-04 15:15:54 +01:00
Raphael Mitsch	0e43fca036	Add Claude-2.1 mention. (#13167)	2023-12-01 16:48:35 +01:00
Daniël de Kok	da7ad97519	Update `TextCatBOW` to use the fixed `SparseLinear` layer (#13149) * Update `TextCatBOW` to use the fixed `SparseLinear` layer A while ago, we fixed the `SparseLinear` layer to use all available parameters: https://github.com/explosion/thinc/pull/754 This change updates `TextCatBOW` to `v3` which uses the new `SparseLinear_v2` layer. This results in a sizeable improvement on a text categorization task that was tested. While at it, this `spacy.TextCatBOW.v3` also adds the `length_exponent` option to make it possible to change the hidden size. Ideally, we'd just have an option called `length`. But the way that `TextCatBOW` uses hashes results in a non-uniform distribution of parameters when the length is not a power of two. * Replace TextCatBOW `length_exponent` parameter by `length` We now round up the length to the next power of two if it isn't a power of two. * Remove some tests for TextCatBOW.v2 * Fix missing import	2023-11-29 09:11:54 +01:00
Ines Montani	8f69e56a5a	Add swag [ci skip]	2023-11-20 14:42:01 +01:00
Lise	b6e022381d	Feature/nn and fo language extensions (#13116) * add language extensions for Norwegian Nynorsk and Faroese * update docstring for nn/examples.py * use relative imports * add fo and nn tokenizers to pytest fixtures * add unittests for fo and nn and fix bug in nn * remove module docstring from fo/__init__.py * add comments about example sentences' origin * add license information to Faroese data credit * format unittests using black * add __init__ files to tests/lang/nn and tests/lang/fo * fix import order and use relative imports in fo/__init__.py and nn/__init__.py * Make the tests a bit more compact * Add fo and nn to website languages * Add note about jul. * Add "jul." as exception --------- Co-authored-by: Adriane Boyd <adrianeboyd@gmail.com>	2023-11-20 07:49:59 +01:00
ajbond	9f2ce6bb00	Add Redfield NLP Nodes to the spaCy Universe (#13133)	2023-11-17 09:48:02 +01:00
Raphael Mitsch	b2e831d966	LLM docs: OpenAI model update (#13119) * Update supported OpenAI models. * Update with new GPT-3.5 and GPT-4 versions. * Add links to OpenAI model docs.	2023-11-08 17:55:16 +01:00
Adriane Boyd	513bbd5fa3	Add preferred use of build for package CLI (#13109) Build with `build` if available. Warn and fall back to previous `setup.py`-based builds if the `build`-based build fails.	2023-11-08 17:35:24 +01:00
Sofie Van Landeghem	a804b83a4b	Update llm docs to clarify task-specific factories (#13082) * fix typo * add examples to specify custom model for task-specific factory	2023-10-31 22:07:07 +01:00
Sofie Van Landeghem	48248c62b6	Clarify EL example in docs (#13071) * add comment that pipeline is a custom one * add link to NEL tutorial * prettier * revert prettier reformat * revert prettier reformat (2) * fix typo Co-authored-by: Raphael Mitsch <r.mitsch@outlook.com> --------- Co-authored-by: Raphael Mitsch <r.mitsch@outlook.com>	2023-10-31 21:58:29 +01:00
Raphael Mitsch	0c15876502	Fix spancat typo. (#13095)	2023-10-31 13:45:10 +01:00
Raphael Mitsch	9deaac9786	Add note in docs on `score_weight` config if using a non-default `spans_key` for SpanCat (#13093) * Add note on score_weight if using a non-default span_key for SpanCat. * Fix formatting. * Fix formatting. * Fix typo. * Use warning infobox. * Fix infobox formatting.	2023-10-30 17:02:08 +01:00
Raphael Mitsch	d72029d9c8	Add binary examples for Textcat task in `spacy-llm` (#13051) * Add examples for binary classification. * Fix example. * Remove binary textcat example. Format. * Rephrase.	2023-10-11 12:23:38 +02:00
Ines Montani	65e7bd54f5	Update usage sidebar and nav alert [ci skip]	2023-10-06 14:36:37 +02:00
Ines Montani	b83f1e3724	Inline displaCy visualizations in docs (#13050) [ci skip]	2023-10-06 14:22:43 +02:00
Raphael Mitsch	be29216fe2	Merge pull request #13044 from explosion/docs/llm_main Sync `master` with `docs/llm_main`	2023-10-05 16:10:19 +02:00
Raphael Mitsch	1162fcf099	Add Mistral mentions. (#13037)	2023-10-05 14:44:38 +02:00


3241 Commits