spaCy

mirror of https://github.com/explosion/spaCy.git synced 2025-09-14 16:12:39 +03:00

Author	SHA1	Message	Date
Matthew Honnibal	0cdcfe56cb	Set version to v3.8.2	2024-10-01 16:47:24 +02:00
Matthew Honnibal	9c5b61bdff	isort	2024-10-01 12:38:51 +02:00
Matthew Honnibal	725ccbac39	Format	2024-10-01 12:38:02 +02:00
Matthew Honnibal	a8837beab7	Set version to v3.8.1	2024-10-01 12:37:11 +02:00
Matthew Honnibal	114b4894fb	Fix --require-parent default	2024-09-29 15:50:31 +02:00
Matthew Honnibal	dec13b4258	Fix inverted cli arg	2024-09-29 15:50:05 +02:00
Matthew Honnibal	c03f060527	Allow positive option --require-parent	2024-09-29 14:30:14 +02:00
Matthew Honnibal	6255cb985f	Include version constraint in parent package requirement	2024-09-29 14:22:21 +02:00
Matthew Honnibal	3b165a8716	Simplify setting to require parent package	2024-09-29 14:19:10 +02:00
Matthew Honnibal	969832f5d6	Fix package	2024-09-29 14:00:11 +02:00
Matthew Honnibal	8ce53a6bbe	Syntax	2024-09-29 13:51:44 +02:00
Matthew Honnibal	6fa0d709d5	Support option to not depend on parent package in spacy package	2024-09-29 13:51:04 +02:00
Matthew Honnibal	5010fcbd3a	Fix numpy constant	2024-09-14 13:13:11 +02:00
Matthew Honnibal	de4f19f3a3	Fix version	2024-09-14 13:12:44 +02:00
Matthew Honnibal	3d03565498	Replace numpy floats in evaluate and update	2024-09-14 12:55:53 +02:00
Matthew Honnibal	0576a1ff56	Fix numpy floats in meta.json	2024-09-14 12:54:08 +02:00
Matthew Honnibal	2f1e7ed09a	Lint	2024-09-14 11:36:27 +02:00
Matthew Honnibal	e2dc9b79e1	Format	2024-09-14 11:29:40 +02:00
Matthew Honnibal	3c3d75015b	Set version to v3.7.7	2024-09-14 11:27:32 +02:00
Matthew Honnibal	50aa3b5cbe	Merge branch 'master' of https://github.com/explosion/spaCy	2024-09-14 11:09:44 +02:00
Matthew Honnibal	8266031454	Merge numpy version update	2024-09-14 11:08:35 +02:00
Matthew Honnibal	69ecb85fad	Set version to v3.8.1	2024-09-13 10:43:40 +02:00
Matthew Honnibal	b427597fc8	Set version to v3.8.0	2024-09-11 21:32:26 +02:00
Matthew Honnibal	c068e1de1b	Fix dependencies	2024-09-11 15:57:52 +02:00
marinelay	b18cc94451	Delete unnecessary method (#13441) Co-authored-by: marinelay <marinelay@gmail.com>	2024-09-09 20:57:13 +02:00
Matthew Honnibal	4cc3ebe74e	Format	2024-09-09 20:56:01 +02:00
Matthew Honnibal	a019315534	Fix memory zones	2024-09-09 13:49:41 +02:00
Matthew Honnibal	59ac7e6bdb	Format	2024-09-09 11:22:52 +02:00
Matthew Honnibal	b65491b641	Set version to v3.8.0.dev0	2024-09-09 11:20:23 +02:00
Matthew Honnibal	1b8d560d0e	Support 'memory zones' for user memory management (#13621) Add a context manager nlp.memory_zone(), which will begin memory_zone() blocks on the vocab, string store, and potentially other components. Example usage: ``` with nlp.memory_zone(): for doc in nlp.pipe(texts): do_something(doc) # do_something(doc) <-- Invalid ``` Once the memory_zone() block expires, spaCy will free any shared resources that were allocated for the text-processing that occurred within the memory_zone. If you create Doc objects within a memory zone, it's invalid to access them once the memory zone is expired. The purpose of this is that spaCy creates and stores Lexeme objects in the Vocab that can be shared between multiple Doc objects. It also interns strings. Normally, spaCy can't know when all Doc objects using a Lexeme are out-of-scope, so new Lexemes accumulate in the vocab, causing memory pressure. Memory zones solve this problem by telling spaCy "okay none of the documents allocated within this block will be accessed again". This lets spaCy free all new Lexeme objects and other data that were created during the block. The mechanism is general, so memory_zone() context managers can be added to other components that could benefit from them, e.g. pipeline components. I experimented with adding memory zone support to the tokenizer as well, for its cache. However, this seems unnecessarily complicated. It makes more sense to just stick a limit on the cache size. This lets spaCy benefit from the efficiency advantage of the cache better, because we can maintain a (bounded) cache even if only small batches of documents are being processed.	2024-09-09 11:19:39 +02:00
ykyogoku	608f65ce40	add Tibetan (#13510)	2024-09-09 11:18:03 +02:00
Muzaffer Cikay	acbf2a428f	Add Kurdish Kurmanji language (#13561) * Add Kurdish Kurmanji language * Add lex_attrs	2024-09-09 11:15:40 +02:00
Mark Liberko	55db9c2e87	Added gd language folder (#13570) Implemented a foundational Scottish Gaelic (gd) language option with tokenizer_exceptions and stop_words files.	2024-09-09 11:14:09 +02:00
Matthew Honnibal	319e02545c	Set version to 3.7.6	2024-08-20 12:16:08 +02:00
Matthew Honnibal	a8accc3396	Use cibuildwheel to build wheels (#13603) * Add workflow files for cibuildwheel * Add config for cibuildwheel * Set version for experimental prerelease * Try updating cython * Skip 32-bit windows builds * Revert "Try updating cython" This reverts commit `c1b794ab5c`. * Try to import cibuildwheel settings from previous setup	2024-08-20 12:15:05 +02:00
Sofie Van Landeghem	82fc2ecfa5	Bump version to 3.7.5 (#13493)	2024-05-15 12:11:33 +02:00
Alex Strick van Linschoten	045cd43c3f	Fix typos in docs (#13466) * fix typos * prettier formatting --------- Co-authored-by: svlandeg <svlandeg@github.com>	2024-04-29 11:10:17 +02:00
Sofie Van Landeghem	2e2334632b	Fix use_gold_ents behaviour for EntityLinker (#13400) * fix type annotation in docs * only restore entities after loss calculation * restore entities of sample in initialization * rename overfitting function * fix EL scorer * Relax test * fix formatting * Update spacy/pipeline/entity_linker.py Co-authored-by: Raphael Mitsch <r.mitsch@outlook.com> * rename to _ensure_ents * further rename * allow for scorer to be None --------- Co-authored-by: Raphael Mitsch <r.mitsch@outlook.com>	2024-04-16 12:00:22 +02:00
Joe Schiff	2e96797696	Convert properties to decorator syntax (#13390)	2024-04-16 11:51:14 +02:00
Matthew Honnibal	0518c36f04	Sanitize direct download (#13313) The 'direct' option in 'spacy download' is supposed to only download from our model releases repository. However, users were able to pass in a relative path, allowing download from arbitrary repositories. This meant that a service that sourced strings from user input and which used the direct option would allow users to install arbitrary packages.	2024-02-20 13:17:51 +01:00
Daniël de Kok	bff8725f4b	Set version to 3.7.4 (#13327)	2024-02-14 14:46:28 +01:00
Daniël de Kok	fdfdbcd9f4	Make `Language.pipe` workers exit cleanly (#13321) Also warn when any worker exited with a non-zero exit code and modify test to ensure that workers exit cleanly by default.	2024-02-12 14:39:38 +01:00
Daniël de Kok	e1249d3722	Test if closing explicitly solves recursive lock issues (#13304)	2024-02-05 10:07:03 +01:00
Daniël de Kok	40422ff904	Set version to 3.7.3 (#13301)	2024-02-02 13:51:26 +01:00
Daniël de Kok	2dbb332cea	`TextCatParametricAttention.v1`: set key transform dimensions (#13249) * TextCatParametricAttention.v1: set key transform dimensions This is necessary for tok2vec implementations that initialize lazily (e.g. curated transformers). * Add lazily-initialized tok2vec to simulate transformers Add a lazily-initialized tok2vec to the tests and test the current textcat models with it. Fix some additional issues found using this test. * isort * Add `test.` prefix to `LazyInitTok2Vec.v1`	2024-02-02 13:01:59 +01:00
Daniël de Kok	68d7841df5	Extension serialization attr tests: add teardown (#13284) The doc/token extension serialization tests add extensions that are not serializable with pickle. This didn't cause issues before due to the implicit run order of tests. However, test ordering has changed with pytest 8.0.0, leading to failed tests in test_language. Update the fixtures in the extension serialization tests to do proper teardown and remove the extensions.	2024-01-29 13:51:56 +01:00
Eliana Vornov	00e938a7c3	add custom code support to CLI speed benchmark (#13247) * add custom code support to CLI speed benchmark * sort imports * better copying for warmup docs	2024-01-26 13:29:22 +01:00
Daniël de Kok	a8894a8946	Merge pull request #13240 from mauricesvp/patch-1 Fix typo in method name	2024-01-23 20:49:21 +01:00
Daniël de Kok	afac7fb650	test_find_available_port: use port 5001 (#13255) macOS now uses port 5000 for the AirPlay receiver functionality, so this test will always fail on a macOS desktop (unless AirPlay receiver functionality is disabled like in CI).	2024-01-23 20:11:16 +01:00
Daniël de Kok	5a2ad4af4b	Merge remote-tracking branch 'upstream/master' into patch-1	2024-01-23 19:53:20 +01:00


9422 Commits
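
The memory-zone idea described in commit 1b8d560d0e can be pictured with a small stand-alone toy. This is only a sketch under stated assumptions: `ToyStringStore` and its methods are hypothetical names invented for illustration, it does not import spaCy, and it is not how spaCy implements `nlp.memory_zone()`. It only shows the general pattern of a context manager that discards entries interned while the block was open and keeps everything interned outside it.

```python
from contextlib import contextmanager


class ToyStringStore:
    """Hypothetical toy interning table; NOT spaCy's StringStore."""

    def __init__(self):
        self._permanent = set()   # strings kept for the store's lifetime
        self._transient = None    # a set while a zone is open, else None

    def add(self, text):
        # Strings already known permanently are never duplicated.
        if text in self._permanent:
            return text
        if self._transient is not None:
            # Inside a zone: new strings are only kept until the zone ends.
            self._transient.add(text)
        else:
            self._permanent.add(text)
        return text

    def __contains__(self, text):
        in_zone = self._transient is not None and text in self._transient
        return text in self._permanent or in_zone

    def __len__(self):
        extra = len(self._transient) if self._transient is not None else 0
        return len(self._permanent) + extra

    @contextmanager
    def memory_zone(self):
        # Assumption of this toy: zones do not nest.
        if self._transient is not None:
            raise RuntimeError("memory zones cannot be nested in this toy")
        self._transient = set()
        try:
            yield self
        finally:
            # Drop everything that was interned during the block.
            self._transient = None


store = ToyStringStore()
store.add("the")  # interned outside any zone, so it persists

with store.memory_zone():
    for word in "the quick brown fox".split():
        store.add(word)
    size_inside = len(store)

size_after = len(store)
```

In this toy, `size_inside` counts the one permanent string plus the three new ones, and `size_after` drops back to the permanent string alone, which mirrors the commit's point that anything created inside the block must not be used once the block has ended.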