spaCy

mirror of https://github.com/explosion/spaCy.git synced 2025-08-05 21:00:19 +03:00

Author	SHA1	Message	Date
William Mattingly	30f1f33e78	Added Date spaCy to universe (#13415 ) [ci skip] Co-authored-by: Ines Montani <ines@ines.io>	2024-09-10 14:29:03 +02:00
William Mattingly	f1a5ff9dba	added spacy whisper to universe (#13418 ) [ci skip] Co-authored-by: Ines Montani <ines@ines.io>	2024-09-10 14:28:00 +02:00
William Mattingly	c80dacd046	added spacy annoy to universe (#13416 ) [ci skip] Co-authored-by: Ines Montani <ines@ines.io>	2024-09-10 14:26:21 +02:00
William Mattingly	7fbbb2002a	updated universe for number spacy (#13424 ) [ci skip] Co-authored-by: Ines Montani <ines@ines.io>	2024-09-10 14:25:23 +02:00
William Mattingly	89c1774d43	added bagpipes-spacy to universe (#13425 ) [ci skip] Co-authored-by: Ines Montani <ines@ines.io>	2024-09-10 14:24:06 +02:00
thjbdvlt	081e4e385d	universe-project-presque (#13515 ) [ci skip] Co-authored-by: Ines Montani <ines@ines.io>	2024-09-10 14:21:41 +02:00
thjbdvlt	0190e669c5	universe-package-quelquhui (#13514 ) [ci skip] Co-authored-by: Ines Montani <ines@ines.io>	2024-09-10 14:17:33 +02:00
Oren Halvani	54dc4ee8fb	Added: Constituent-Treelib to: universe.json (#13432 ) [ci skip] Co-authored-by: Halvani <>	2024-09-10 14:13:36 +02:00
William Mattingly	5a7ad5572c	added gliner-spacy to universe (#13417 ) [ci skip] Co-authored-by: Sofie Van Landeghem <svlandeg@users.noreply.github.com> Co-authored-by: Ines Montani <ines@ines.io>	2024-09-10 14:12:52 +02:00
marinelay	b18cc94451	Delete unnecessary method (#13441 ) Co-authored-by: marinelay <marinelay@gmail.com>	2024-09-09 20:57:13 +02:00
Matthew Honnibal	4cc3ebe74e	Format	2024-09-09 20:56:01 +02:00
Matthew Honnibal	a019315534	Fix memory zones	2024-09-09 13:49:41 +02:00
Matthew Honnibal	59ac7e6bdb	Format	2024-09-09 11:22:52 +02:00
Matthew Honnibal	b65491b641	Set version to v3.8.0.dev0	2024-09-09 11:20:23 +02:00
Matthew Honnibal	1b8d560d0e	Support 'memory zones' for user memory management (#13621 ) Add a context manager nlp.memory_zone(), which will begin memory_zone() blocks on the vocab, string store, and potentially other components. Example usage: ``` with nlp.memory_zone(): for doc in nlp.pipe(texts): do_something(doc) # do_something(doc) <-- Invalid ``` Once the memory_zone() block expires, spaCy will free any shared resources that were allocated for the text-processing that occurred within the memory_zone. If you create Doc objects within a memory zone, it's invalid to access them once the memory zone is expired. The purpose of this is that spaCy creates and stores Lexeme objects in the Vocab that can be shared between multiple Doc objects. It also interns strings. Normally, spaCy can't know when all Doc objects using a Lexeme are out-of-scope, so new Lexemes accumulate in the vocab, causing memory pressure. Memory zones solve this problem by telling spaCy "okay none of the documents allocated within this block will be accessed again". This lets spaCy free all new Lexeme objects and other data that were created during the block. The mechanism is general, so memory_zone() context managers can be added to other components that could benefit from them, e.g. pipeline components. I experimented with adding memory zone support to the tokenizer as well, for its cache. However, this seems unnecessarily complicated. It makes more sense to just stick a limit on the cache size. This lets spaCy benefit from the efficiency advantage of the cache better, because we can maintain a (bounded) cache even if only small batches of documents are being processed.	2024-09-09 11:19:39 +02:00
ykyogoku	608f65ce40	add Tibetan (#13510 )	2024-09-09 11:18:03 +02:00
Muzaffer Cikay	acbf2a428f	Add Kurdish Kurmanji language (#13561 ) * Add Kurdish Kurmanji language * Add lex_attrs	2024-09-09 11:15:40 +02:00
Mark Liberko	55db9c2e87	Added gd language folder (#13570 ) Implemented a foundational Scottish Gaelic (gd) language option with tokenizer_exceptions and stop_words files.	2024-09-09 11:14:09 +02:00
Matthew Honnibal	319e02545c	Set version to 3.7.6	2024-08-20 12:16:08 +02:00
Matthew Honnibal	a8accc3396	Use cibuildwheel to build wheels (#13603 ) * Add workflow files for cibuildwheel * Add config for cibuildwheel * Set version for experimental prerelease * Try updating cython * Skip 32-bit windows builds * Revert "Try updating cython" This reverts commit `c1b794ab5c`. * Try to import cibuildwheel settings from previous setup	2024-08-20 12:15:05 +02:00
Ines Montani	8cda27aefa	Add case study [ci skip]	2024-06-26 09:41:23 +02:00
Matthew Honnibal	f78e5ce732	Disable extra CI	2024-06-21 14:32:00 +02:00
Sofie Van Landeghem	a6d0fc3602	Remove typing-extensions from requirements (#13516 )	2024-05-31 19:20:46 +02:00
Sofie Van Landeghem	82fc2ecfa5	Bump version to 3.7.5 (#13493 )	2024-05-15 12:11:33 +02:00
Sofie Van Landeghem	c195ca4f9c	fix docs for MorphAnalysis.__contains__ (#13433 )	2024-05-02 16:46:41 +02:00
Sofie Van Landeghem	d3a232f773	Update LICENSE to include 2024 (#13472 )	2024-04-30 09:17:59 +02:00
Sofie Van Landeghem	ecd85d2618	Update Typer pin and GH actions (#13471 ) * update gh actions * pin typer upperbound to 1.0.0	2024-04-29 13:28:46 +02:00
Alex Strick van Linschoten	045cd43c3f	Fix typos in docs (#13466 ) * fix typos * prettier formatting --------- Co-authored-by: svlandeg <svlandeg@github.com>	2024-04-29 11:10:17 +02:00
Sofie Van Landeghem	74836524e3	Bump to v5 (#13470 )	2024-04-29 10:36:31 +02:00
Sofie Van Landeghem	6d6c10ab9c	Fix CI (#13469 ) * Remove hardcoded architecture setting * update classifiers to include Python 3.12	2024-04-29 10:18:07 +02:00
Sofie Van Landeghem	2e2334632b	Fix use_gold_ents behaviour for EntityLinker (#13400 ) * fix type annotation in docs * only restore entities after loss calculation * restore entities of sample in initialization * rename overfitting function * fix EL scorer * Relax test * fix formatting * Update spacy/pipeline/entity_linker.py Co-authored-by: Raphael Mitsch <r.mitsch@outlook.com> * rename to _ensure_ents * further rename * allow for scorer to be None --------- Co-authored-by: Raphael Mitsch <r.mitsch@outlook.com>	2024-04-16 12:00:22 +02:00
Joe Schiff	2e96797696	Convert properties to decorator syntax (#13390 )	2024-04-16 11:51:14 +02:00
Sofie Van Landeghem	f5e85fa05a	allow weasel 0.4.x (#13409 )	2024-04-04 12:55:08 +02:00
Yaseen	21aea59001	Update code.module.sass to make code title sticky (#13379 )	2024-03-26 12:15:25 +01:00
Sofie Van Landeghem	4dc5fe5469	Renamed main branch back to v4 for now (#13395 ) * Update gputests.yml * Update slowtests.yml	2024-03-26 09:53:07 +01:00
Ines Montani	1252370f69	Move DocSearch key to env var [ci skip]	2024-03-25 10:17:57 +01:00
Sofie Van Landeghem	d410d95b52	remove smart_open requirement as it's taken care of via Weasel (#13391 )	2024-03-22 18:21:20 +01:00
Matthew Honnibal	0518c36f04	Sanitize direct download (#13313 ) The 'direct' option in 'spacy download' is supposed to only download from our model releases repository. However, users were able to pass in a relative path, allowing download from arbitrary repositories. This meant that a service that sourced strings from user input and which used the direct option would allow users to install arbitrary packages.	2024-02-20 13:17:51 +01:00
Daniël de Kok	bff8725f4b	Set version to 3.7.4 (#13327 )	2024-02-14 14:46:28 +01:00
Daniël de Kok	fdfdbcd9f4	Make `Language.pipe` workers exit cleanly (#13321 ) Also warn when any worker exited with a non-zero exit code and modify test to ensure that workers exit cleanly by default.	2024-02-12 14:39:38 +01:00
Daniël de Kok	14bd9d89a3	Update example that shows model in requirements (#13302 ) See #13293.	2024-02-11 19:46:43 +01:00
Daniël de Kok	e1249d3722	Test if closing explicitly solves recursive lock issues (#13304 )	2024-02-05 10:07:03 +01:00
Daniël de Kok	40422ff904	Set version to 3.7.3 (#13301 )	2024-02-02 13:51:26 +01:00
Daniël de Kok	2dbb332cea	`TextCatParametricAttention.v1`: set key transform dimensions (#13249 ) * TextCatParametricAttention.v1: set key transform dimensions This is necessary for tok2vec implementations that initialize lazily (e.g. curated transformers). * Add lazily-initialized tok2vec to simulate transformers Add a lazily-initialized tok2vec to the tests and test the current textcat models with it. Fix some additional issues found using this test. * isort * Add `test.` prefix to `LazyInitTok2Vec.v1`	2024-02-02 13:01:59 +01:00
Daniël de Kok	d84068e460	Run slow tests: v4 -> main (#13290 ) * Run slow tests: v4 -> main * Also update the branch in GPU tests	2024-01-30 13:58:28 +01:00
Sofie Van Landeghem	89a43f39b7	update universe description (#13291 )	2024-01-30 13:49:49 +01:00
Daniël de Kok	68d7841df5	Extension serialization attr tests: add teardown (#13284 ) The doc/token extension serialization tests add extensions that are not serializable with pickle. This didn't cause issues before due to the implicit run order of tests. However, test ordering has changed with pytest 8.0.0, leading to failed tests in test_language. Update the fixtures in the extension serialization tests to do proper teardown and remove the extensions.	2024-01-29 13:51:56 +01:00
Eliana Vornov	00e938a7c3	add custom code support to CLI speed benchmark (#13247 ) * add custom code support to CLI speed benchmark * sort imports * better copying for warmup docs	2024-01-26 13:29:22 +01:00
Sofie Van Landeghem	68b85ea950	Clarify data_path loading for apply CLI command (#13272 ) * attempt to clarify additional annotations on .spacy file * suggestion by Daniël * pipeline instead of pipe	2024-01-26 12:10:05 +01:00
Sofie Van Landeghem	7496e03a2c	Clarify vocab docs (#13273 ) * add line to ensure that apple is in fact in the vocab * add that the vocab may be empty	2024-01-26 10:58:48 +01:00
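
The memory-zone idea described in commit 1b8d560d0e can be illustrated with a toy model. The sketch below is not spaCy's implementation and does not use spaCy at all: `ToyStringStore` and its methods are hypothetical names invented for illustration. It only shows the general pattern the commit message describes, assuming a store that interns strings and a context manager that frees whatever was newly added once the block exits.

```python
from contextlib import contextmanager


class ToyStringStore:
    """Hypothetical stand-in for an interning table (NOT spaCy's StringStore)."""

    def __init__(self):
        self._ids = {}           # interned string -> integer id
        self._next_id = 0
        self._transient = None   # strings added inside the active zone, else None

    def add(self, text):
        """Intern `text` and return its id; remember it if a zone is active."""
        if text not in self._ids:
            self._ids[text] = self._next_id
            self._next_id += 1
            if self._transient is not None:
                self._transient.append(text)
        return self._ids[text]

    def __contains__(self, text):
        return text in self._ids

    def __len__(self):
        return len(self._ids)

    @contextmanager
    def memory_zone(self):
        """Free every string first interned inside the block when it exits."""
        if self._transient is not None:
            raise RuntimeError("nested zones are not supported in this sketch")
        self._transient = []
        try:
            yield self
        finally:
            for text in self._transient:
                del self._ids[text]
            self._transient = None


store = ToyStringStore()
store.add("apple")               # added outside any zone, so it persists

with store.memory_zone():
    store.add("apple")           # already interned, not tracked by the zone
    store.add("pear")            # new inside the zone, freed on exit
    inside = "pear" in store     # True while the zone is active

after = "pear" in store          # False once the zone has expired
```

Entries that existed before the block survive it, which mirrors the commit's point that only data created during the zone is released. Anything that still refers to a freed entry after the block is invalid, which is why the commit message warns against using Doc objects outside their memory zone.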


16154 Commits