spaCy

mirror of https://github.com/explosion/spaCy.git synced 2026-03-07 05:11:27 +03:00

Author	SHA1	Message	Date
Ines Montani	0b43518611	Add spacy-layout [ci skip]	2024-11-19 10:43:53 +01:00
Matthew Honnibal	7a7f191220	Usage page on memory management, explaining memory zones and doc_cleaner (#13643) [ci skip] Co-authored-by: Ines Montani <ines@ines.io>	2024-10-23 12:43:18 +02:00
Ikko Eltociear Ashimine	e5910847a9	docs: update rule-based-matching.mdx (#13665) [ci skip]	2024-10-23 12:43:18 +02:00
Sergei Pashakhin	5056e6b3cf	Fix typo (#13657) [ci skip]	2024-10-23 12:43:18 +02:00
thjbdvlt	54791f664f	universe-pipeline-solipCysme-french (#13627) [ci skip]	2024-10-11 11:26:41 +02:00
Ines Montani	95d56aad14	Fix universe.json [ci skip]	2024-10-11 11:24:50 +02:00
aravind-mc	c2e424347f	Update universe.json to add my spaCy online course (#13632) [ci skip]	2024-10-11 11:22:15 +02:00
Ines Montani	a60be278e4	Fix landing banner links [ci skip]	2024-10-11 11:19:20 +02:00
William Mattingly	2e1afb740e	Added Date spaCy to universe (#13415) [ci skip] Co-authored-by: Ines Montani <ines@ines.io>	2024-09-10 14:29:50 +02:00
William Mattingly	375a466784	added spacy whisper to universe (#13418) [ci skip] Co-authored-by: Ines Montani <ines@ines.io>	2024-09-10 14:29:50 +02:00
William Mattingly	fb2151c505	added spacy annoy to universe (#13416) [ci skip] Co-authored-by: Ines Montani <ines@ines.io>	2024-09-10 14:29:49 +02:00
William Mattingly	3577bf5b3d	updated universe for number spacy (#13424) [ci skip] Co-authored-by: Ines Montani <ines@ines.io>	2024-09-10 14:29:49 +02:00
William Mattingly	5cf338b480	added bagpipes-spacy to universe (#13425) [ci skip] Co-authored-by: Ines Montani <ines@ines.io>	2024-09-10 14:29:49 +02:00
thjbdvlt	bdbf1cb30e	universe-project-presque (#13515) [ci skip] Co-authored-by: Ines Montani <ines@ines.io>	2024-09-10 14:29:49 +02:00
thjbdvlt	b90e33469e	universe-package-quelquhui (#13514) [ci skip] Co-authored-by: Ines Montani <ines@ines.io>	2024-09-10 14:29:49 +02:00
Oren Halvani	573b2f2c09	Added: Constituent-Treelib to: universe.json (#13432) [ci skip] Co-authored-by: Halvani <>	2024-09-10 14:29:49 +02:00
William Mattingly	1f94cecc33	added gliner-spacy to universe (#13417) [ci skip] Co-authored-by: Sofie Van Landeghem <svlandeg@users.noreply.github.com> Co-authored-by: Ines Montani <ines@ines.io>	2024-09-10 14:29:49 +02:00
Ines Montani	1e0cc44994	Add case study [ci skip]	2024-06-26 09:41:37 +02:00
Sofie Van Landeghem	04ad6d179a	fix docs for MorphAnalysis.__contains__ (#13433)	2024-05-02 16:48:07 +02:00
Alex Strick van Linschoten	b33123f5a2	Fix typos in docs (#13466) * fix typos * prettier formatting --------- Co-authored-by: svlandeg <svlandeg@github.com>	2024-04-29 11:11:42 +02:00
Yaseen	01a1bce560	Update code.module.sass to make code title sticky (#13379)	2024-03-26 12:18:18 +01:00
Ines Montani	c5d0f1e376	Move DocSearch key to env var [ci skip]	2024-03-25 10:18:12 +01:00
Daniël de Kok	c39596aabb	Update example that shows model in requirments (#13302) See #13293.	2024-02-12 14:36:18 +01:00
Daniël de Kok	0c46b69022	Merge branch 'master' into spacy.io	2024-02-05 11:53:49 +01:00
Daniël de Kok	e1249d3722	Test if closing explicitly solves recursive lock issues (#13304)	2024-02-05 10:07:03 +01:00
Daniël de Kok	40422ff904	Set version to 3.7.3 (#13301)	2024-02-02 13:51:26 +01:00
Daniël de Kok	2dbb332cea	`TextCatParametricAttention.v1`: set key transform dimensions (#13249) * TextCatParametricAttention.v1: set key transform dimensions This is necessary for tok2vec implementations that initialize lazily (e.g. curated transformers). * Add lazily-initialized tok2vec to simulate transformers Add a lazily-initialized tok2vec to the tests and test the current textcat models with it. Fix some additional issues found using this test. * isort * Add `test.` prefix to `LazyInitTok2Vec.v1`	2024-02-02 13:01:59 +01:00
Daniël de Kok	d84068e460	Run slow tests: v4 -> main (#13290) * Run slow tests: v4 -> main * Also update the branch in GPU tests	2024-01-30 13:58:28 +01:00
Sofie Van Landeghem	89a43f39b7	update universe description (#13291)	2024-01-30 13:49:49 +01:00
Daniël de Kok	68d7841df5	Extension serialization attr tests: add teardown (#13284) The doc/token extension serialization tests add extensions that are not serializable with pickle. This didn't cause issues before due to the implicit run order of tests. However, test ordering has changed with pytest 8.0.0, leading to failed tests in test_language. Update the fixtures in the extension serialization tests to do proper teardown and remove the extensions.	2024-01-29 13:51:56 +01:00
Sofie Van Landeghem	9b1b091b1a	Clarify data_path loading for apply CLI command (#13272) * attempt to clarify additional annotations on .spacy file * suggestion by Daniël * pipeline instead of pipe	2024-01-26 15:56:49 +01:00
Sofie Van Landeghem	259b9dd593	Clarify vocab docs (#13273) * add line to ensure that apple is in fact in the vocab * add that the vocab may be empty	2024-01-26 15:56:43 +01:00
Eliana Vornov	00e938a7c3	add custom code support to CLI speed benchmark (#13247) * add custom code support to CLI speed benchmark * sort imports * better copying for warmup docs	2024-01-26 13:29:22 +01:00
Sofie Van Landeghem	68b85ea950	Clarify data_path loading for apply CLI command (#13272) * attempt to clarify additional annotations on .spacy file * suggestion by Daniël * pipeline instead of pipe	2024-01-26 12:10:05 +01:00
Sofie Van Landeghem	7496e03a2c	Clarify vocab docs (#13273) * add line to ensure that apple is in fact in the vocab * add that the vocab may be empty	2024-01-26 10:58:48 +01:00
Sofie Van Landeghem	c749eb5570	fix typo (#13254)	2024-01-24 09:32:57 +01:00
Sofie Van Landeghem	a493981163	fix typo (#13254)	2024-01-24 09:29:57 +01:00
Daniël de Kok	a8894a8946	Merge pull request #13240 from mauricesvp/patch-1 Fix typo in method name	2024-01-23 20:49:21 +01:00
Daniël de Kok	afac7fb650	test_find_available_port: use port 5001 (#13255) macOS now uses port 5000 for the AirPlay receiver functionality, so this test will always fail on a macOS desktop (unless AirPlay receiver functionality is disabled like in CI).	2024-01-23 20:11:16 +01:00
Daniël de Kok	5a2ad4af4b	Merge remote-tracking branch 'upstream/master' into patch-1	2024-01-23 19:53:20 +01:00
Daniël de Kok	128197a5fc	Properly clean up pipe multiprocessing workers (#13259) Before this change, the workers of pipe call with n_process != 1 were stopped by calling `terminate` on the processes. However, terminating a process can leave queues, pipes, and other concurrent data structures in an invalid state. With this change, we stop using terminate and take the following approach instead: * When the all documents are processed, the parent process puts a sentinel in the queue of each worker. * The parent process then calls `join` on each worker process to let them finish up gracefully. * Worker processes break from the queue processing loop when the sentinel is encountered, so that they exit. We need special handling when one of the workers encounters an error and the error handler is set to raise an exception. In this case, we cannot rely on the sentinel to finish all workers -- the queue is a FIFO queue and there may be other work queued up before the sentinel. We use the following approach to handle error scenarios: * The parent puts the end-of-work sentinel in the queue of each worker. * The parent closes the reading-end of the channel of each worker. * Then: - If the worker was waiting for work, it will encounter the sentinel and break from the processing loop. - If the worker was processing a batch, it will attempt to write results to the channel. This will fail because the channel was closed by the parent and the worker will break from the processing loop.	2024-01-23 18:33:04 +01:00
Raphael Mitsch	465848cbc6	Fix LLM docs on task factories. (cherry picked from commit `575c405ae3`)	2024-01-19 16:58:26 +01:00
Raphael Mitsch	3b3b5cdc63	Merge pull request #13253 from explosion/chore/sync-master-with-llm_main Sync `master` with `docs/llm_main`	2024-01-19 16:50:43 +01:00
Raphael Mitsch	575c405ae3	Fix LLM docs on task factories.	2024-01-19 16:48:54 +01:00
Raphael Mitsch	256468c414	Merge branch 'docs/llm_main' into chore/sync-master-with-llm_main # Conflicts: # website/docs/api/large-language-models.mdx	2024-01-19 16:34:35 +01:00
Raphael Mitsch	91c24c0285	Merge pull request #13251 from explosion/docs/llm_develop Sync `docs/llm_main` with `docs/llm_develop`	2024-01-19 12:56:38 +01:00
maurice	c608baeecc	Fix typo in method name	2024-01-16 21:54:54 +01:00
Raphael Mitsch	0062c22c35	Updated docs w.r.t. infinite doc length changes (#13214) * Updated docs w.r.t. infinite doc length. * Fix typo. * fix typo's * Fix table formatting. * Update formatting. --------- Co-authored-by: Sofie Van Landeghem <svlandeg@users.noreply.github.com>	2024-01-05 14:20:58 +01:00
Daniël de Kok	e2a3952de5	Add spacy.TextCatParametricAttention.v1 (#13201) * Add spacy.TextCatParametricAttention.v1 This layer provides is a simplification of the ensemble classifier that only uses paramteric attention. We have found empirically that with a sufficient amount of training data, using the ensemble classifier with BoW does not provide significant improvement in classifier accuracy. However, plugging in a BoW classifier does reduce GPU training and inference performance substantially, since it uses a GPU-only kernel. * Fix merge fallout	2024-01-02 10:03:06 +01:00
Daniël de Kok	7ebba86402	Add TextCatReduce.v1 (#13181) * Add TextCatReduce.v1 This is a textcat classifier that pools the vectors generated by a tok2vec implementation and then applies a classifier to the pooled representation. Three reductions are supported for pooling: first, max, and mean. When multiple reductions are enabled, the reductions are concatenated before providing them to the classification layer. This model is a generalization of the TextCatCNN model, which only supports mean reductions and is a bit of a misnomer, because it can also be used with transformers. This change also reimplements TextCatCNN.v2 using the new TextCatReduce.v1 layer. * Doc fixes Co-authored-by: Sofie Van Landeghem <svlandeg@users.noreply.github.com> * Fully specify `TextCatCNN` <-> `TextCatReduce` equivalence * Move TextCatCNN docs to legacy, in prep for moving to spacy-legacy * Add back a test for TextCatCNN.v2 * Replace TextCatCNN in pipe configurations and templates * Add an infobox to the `TextCatReduce` section with an `TextCatCNN` anchor * Add last reduction (`use_reduce_last`) * Remove non-working TextCatCNN Netlify redirect * Revert layer changes for the quickstart * Revert one more quickstart change * Remove unused import * Fix docstring * Fix setting name in error message --------- Co-authored-by: Sofie Van Landeghem <svlandeg@users.noreply.github.com> Co-authored-by: Adriane Boyd <adrianeboyd@gmail.com>	2023-12-21 11:00:06 +01:00
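
Note on 7ebba86402 (TextCatReduce.v1): the commit message describes pooling per-token vectors with first, max, mean (and later last) reductions and concatenating the enabled ones before classification. The sketch below illustrates that idea only, in plain Python. It is not spaCy's implementation: the function name `reduce_tokens`, its keyword arguments, and the concatenation order (first, last, max, mean) are all assumptions made for illustration.

```python
from typing import List, Sequence

Vector = List[float]


def reduce_tokens(
    token_vectors: Sequence[Vector],
    *,
    use_first: bool = False,
    use_last: bool = False,
    use_max: bool = False,
    use_mean: bool = True,
) -> Vector:
    """Pool per-token vectors into a single document vector.

    Hypothetical helper: each enabled reduction yields one vector of the
    token width, and the results are concatenated in a fixed (assumed) order.
    """
    if not token_vectors:
        raise ValueError("need at least one token vector")
    if not (use_first or use_last or use_max or use_mean):
        raise ValueError("enable at least one reduction")

    pooled: Vector = []
    if use_first:
        pooled.extend(token_vectors[0])
    if use_last:
        pooled.extend(token_vectors[-1])
    if use_max:
        # Element-wise maximum over all tokens.
        pooled.extend(max(column) for column in zip(*token_vectors))
    if use_mean:
        # Element-wise average over all tokens.
        n = len(token_vectors)
        pooled.extend(sum(column) / n for column in zip(*token_vectors))
    return pooled


# Three tokens, each with a 2-dimensional vector.
doc = [[1.0, 4.0], [3.0, 0.0], [2.0, 2.0]]
mean_only = reduce_tokens(doc)
all_four = reduce_tokens(
    doc, use_first=True, use_last=True, use_max=True, use_mean=True
)
```

With only the mean enabled, the output has the token width (2 here); with all four reductions enabled it is four times as wide (8 here), which is the size the downstream classifier would have to accept.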
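
Note on 128197a5fc (worker cleanup): the commit message describes stopping workers by queueing a sentinel and calling `join` rather than calling `terminate`. The following is a minimal sketch of that normal-path shutdown protocol, under loud assumptions: it uses threads and `queue.Queue` instead of processes so that it stays self-contained, the names `_SENTINEL`, `worker`, and `process_all` are hypothetical, and the error-handling path (closing the reading end of each worker's channel) is not covered.

```python
import queue
import threading

# Hypothetical end-of-work marker; any value that real work items can never
# take would do.
_SENTINEL = None


def worker(inbox: queue.Queue, outbox: queue.Queue) -> None:
    """Process items until the sentinel arrives, then exit the loop."""
    while True:
        item = inbox.get()
        if item is _SENTINEL:
            break  # graceful exit: nothing is left half-written
        outbox.put(item.upper())  # stand-in for real per-document work


def process_all(texts, n_workers=2):
    inboxes = [queue.Queue() for _ in range(n_workers)]
    outbox = queue.Queue()
    workers = [
        threading.Thread(target=worker, args=(inbox, outbox))
        for inbox in inboxes
    ]
    for w in workers:
        w.start()

    # Distribute the work round-robin over the per-worker queues.
    for i, text in enumerate(texts):
        inboxes[i % n_workers].put(text)

    # Queues are FIFO, so each sentinel is seen only after all of that
    # worker's queued items have been processed.
    for inbox in inboxes:
        inbox.put(_SENTINEL)

    # Wait for every worker to finish instead of killing it.
    for w in workers:
        w.join()

    results = []
    while not outbox.empty():
        results.append(outbox.get())
    return sorted(results), [w.is_alive() for w in workers]


results, still_alive = process_all(["a", "b", "c"])
```

Because the sentinel sits behind any queued work, this approach alone cannot stop workers promptly after an error, which is why the commit message describes separate handling for that case.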


16269 Commits