spaCy

mirror of https://github.com/explosion/spaCy.git synced 2026-03-07 13:21:46 +03:00

Author	SHA1	Message	Date
Matthew Honnibal	bc3337d539	Fix doc string	2024-04-16 12:24:45 +02:00
Matthew Honnibal	ef18829fd9	Fix comment	2024-04-16 12:23:25 +02:00
Matthew Honnibal	0c8393ef8e	Fix comment	2024-04-16 12:13:54 +02:00
Matthew Honnibal	42a373db5c	Fix error	2024-04-16 12:13:34 +02:00
svlandeg	0fd797e33c	fix warning numbers	2023-07-05 10:35:41 +02:00
svlandeg	8a79a71190	Merge branch 'master' into feat/add-pipe-instance	2023-07-05 10:35:25 +02:00
Tom Aarsen	eab929361d	Use 'exclude' instead of 'disable' (#12783 ) as suggested by @svlandeg	2023-07-04 11:45:13 +02:00
Marcus Blättermann	bd239511a4	Fix problem with missing syntax highlighting languages causing runtime crash on the website (#12781 ) * Fix problem with universe pages using `docker` language * Fix problem with universe pages using `r` language * Add fallback, in case code language is unknown	2023-07-03 10:24:25 +02:00
Daniël de Kok	57a230c6e4	Remove section about parallel training with Ray (#12770 ) The Ray integration is currently broken, having these docs around suggest that this functionality is currently available.	2023-06-28 17:09:57 +02:00
Adriane Boyd	fb0da3e097	Support custom token/lexeme attribute for vectors (#12625 ) * Support custom token/lexeme attribute for vectors * Fix imports * Back off to ORTH without Vectors.attr * Fallback if vectors.attr doesn't exist * Update docs	2023-06-28 09:43:14 +02:00
Adriane Boyd	337a360cc7	Use spans_ prefix for default span finder scores (#12753 )	2023-06-27 19:32:17 +02:00
Adriane Boyd	65f6c9cd10	Support overriding registered functions in configs (#12623 ) Support overriding registered functions in configs. Previously the registry name was parsed as a section name rather than as a registry name.	2023-06-27 17:36:33 +02:00
Adriane Boyd	c067b5264c	Address issues with source with component names and replacing listeners (#12701 ) When sourcing a component, the object from the original pipeline is added to the new pipeline as the same object. This creates a situation where there are several attributes that cannot be in sync between the original pipeline and the new pipeline at the same time for this one object: * component.name * component.listener_map / component.listening_components for tok2vec and transformer When running replace_listeners on a component, the config is not updated correctly if the state of the component is incorrect for the current pipeline (in particular changes that should be applied from model.attrs["replace_listener_cfg"] as used in spacy-transformers) due to the fact that: * find_listeners relies on component.name to set the name in the listener_map * replace_listeners relies on listener_map to determine how to modify the configs In addition, there are several places where pipeline components are modified and the listener map and/or internal component names aren't currently updated. In cases where there is a component shared by two pipelines that cannot be in sync, this PR chooses to prioritize the most recently modified or initialized pipeline. There is no actual solution with the current source behavior that will make both pipelines usable, so the current pipeline is updated whenever components are added/renamed/removed or the pipeline is initialized for training.	2023-06-27 10:47:07 +02:00
svlandeg	9fcbc8eb67	add pipe_instances also to load_model_from_init_py	2023-06-26 11:46:09 +02:00
svlandeg	4cc5bd3ef5	fix	2023-06-26 11:25:16 +02:00
svlandeg	dcd8a765fd	Merge branch 'master' into feat/add-pipe-instance	2023-06-26 11:04:40 +02:00
Adriane Boyd	e1664217f5	Add spancat_singlelabel to debug data CLI (#12749 )	2023-06-26 10:25:20 +02:00
Adriane Boyd	cb4fdc83e4	Merge pull request #12742 from adrianeboyd/chore/v3.6.0 Set version to v3.6.0	2023-06-21 15:34:28 +02:00
Adriane Boyd	34971bcbd1	Set version to v3.6.0	2023-06-21 12:59:36 +02:00
Adriane Boyd	dd5e00c735	Temporarily skip tests for compat table	2023-06-21 12:59:36 +02:00
Sofie Van Landeghem	d3ac8e897c	default value for phrasematcher in pyi (#12714 )	2023-06-21 10:10:13 +02:00
Tom Aarsen	93983f08fc	Add SpanMarker for NER to spaCy universe (#12730 ) * Add SpanMarker for NER to spaCy universe * Escape the newlines in the text in the code example Or at least, attempt to * Remove now unnecessary import * Disable NER pipeline component in code example	2023-06-20 16:47:44 +02:00
David Berenstein	53c400bd7a	docs: added reference to `spacy-setfit` to the spaCy Universe (#12737 ) * docs: added reference to spacy-setfit * removed package import after adding factory entry points to packages	2023-06-19 15:52:07 +02:00
Ziad Amerr	3125b97ace	Fixed e941 link rendering by removing the dot (#12735 )	2023-06-19 13:31:08 +02:00
Marcus Blättermann	7e4b38c841	Fix #12716 does not update the `config` generation section (#12718 ) This is a really odd bug, where Firefox doesn't re-render the `code` element, even though `children` changed. Two things fixed that: - remove the `language-ini` `className` - replace the `code` block with a `div` Both are not ideal. Therefor this solution adds an inner `div` that now has the classes while still maintaining the semantic `code` element. I couldn't find any explanation for why this is happening and why it only happens in Firefox. I assume it is a bug caused by one of our many dependencies (or their interplay) To make matters worse: This bug doesn't occure when running the site in dev mode. You have to build and serve the site to recreate it.	2023-06-19 09:34:28 +02:00
Daniël de Kok	e73c1a89bf	CI: add isort --check to validate job (#12727 )	2023-06-15 23:10:25 +01:00
Daniël de Kok	e2b70df012	Configure isort to use the Black profile, recursively isort the `spacy` module (#12721 ) * Use isort with Black profile * isort all the things * Fix import cycles as a result of import sorting * Add DOCBIN_ALL_ATTRS type definition * Add isort to requirements * Remove isort from build dependencies check * Typo	2023-06-14 17:48:41 +02:00
Matthew Honnibal	77a08591ad	Format	2023-06-13 10:06:00 +02:00
Jacobo Myerston	daa6e0339f	Update universe.json (#12709 ) * Update universe.json * Update universe.json add some missing commas in the greCy's description.	2023-06-12 13:55:20 +02:00
Matthew Honnibal	9753484b94	Fix conflicts	2023-06-11 13:27:46 +02:00
Matthew Honnibal	afbdd8259a	Fix find missing pipes	2023-06-10 17:54:37 +02:00
Matthew Honnibal	b9730a64cb	Format	2023-06-10 16:56:10 +02:00
Matthew Honnibal	4332d12ce2	Support adding pipeline component by instance	2023-06-10 16:55:52 +02:00
Matthew Honnibal	aa0d747739	Support adding pipeline component by instance	2023-06-10 16:55:13 +02:00
Matthew Honnibal	6f821efaf3	Add errors for pipe instance problems	2023-06-10 16:53:59 +02:00
Sofie Van Landeghem	d65e3c31a6	use system-independent commands (#12693 )	2023-06-08 11:43:36 +02:00
Adriane Boyd	0f9d2b01fb	Set version v3.6.0.dev1 (#12703 )	2023-06-07 16:23:14 +02:00
kadarakos	c003aac29a	SpanFinder into spaCy from experimental (#12507 ) * span finder integrated into spacy from experimental * black * isort * black * default spankey constant * black * Update spacy/pipeline/spancat.py Co-authored-by: Adriane Boyd <adrianeboyd@gmail.com> * rename * rename * max_length and min_length as Optional[int] and strict checking * black * mypy fix for integer type infinity * revert line order * implement all comparison operators for inf int * avoid two for loops over all docs by not precomputing * interleave thresholding with span creation * black * revert to not interleaving (relized its faster) * black * Update spacy/errors.py Co-authored-by: Adriane Boyd <adrianeboyd@gmail.com> * update dosctring * enforce that the gold and predicted documents have the same text * new error for ensuring reference and predicted texts are the same * remove todo * adjust test * black * handle misaligned tokenization * return correct variable * failing overfit test * only use a single spans_key like in spancat * black * remove debug lines * typo * remove comment * remove near duplicate reduntant method * use the 'spans_key' variable name everywhere * Update spacy/pipeline/span_finder.py Co-authored-by: Adriane Boyd <adrianeboyd@gmail.com> * flaky test fix suggestion, hand set bias terms * only test suggester and test result exhaustively * make it clear that the span_finder_suggester is more general (not specific to span_finder) * Update spacy/tests/pipeline/test_span_finder.py Co-authored-by: Adriane Boyd <adrianeboyd@gmail.com> * Apply suggestions from code review * remove question comment * move preset_spans_suggester test to spancat tests * Add docs and unify default configs for spancat and span finder * Add `allow_overlap=True` to span finder scorer * Fix offset bug in set_annotations * Ignore labels in span finder scorer * Format * Add span_finder to quickstart template * Move settings to self.cfg, store min/max unset as None * Remove debugging * Update docstrings and docs * Update spacy/pipeline/span_finder.py Co-authored-by: Sofie Van Landeghem <svlandeg@users.noreply.github.com> * Fix imports --------- Co-authored-by: Adriane Boyd <adrianeboyd@gmail.com> Co-authored-by: Sofie Van Landeghem <svlandeg@users.noreply.github.com>	2023-06-07 15:52:28 +02:00
Basile Dura	c3c064ace4	fix: `InitializableComponent` type hints (#12692 ) * fix: InitializableComponent type hints * fix: avoid circular dependency * style: clean imports in language.py * style: use relative imports Co-authored-by: Adriane Boyd <adrianeboyd@gmail.com> * fix: apply black --------- Co-authored-by: Adriane Boyd <adrianeboyd@gmail.com>	2023-06-02 14:29:52 +02:00
Adriane Boyd	c4112a1da3	Require that all SpanGroup spans are from the current doc (#12569 ) * Require that all SpanGroup spans are from the current doc The restriction on only adding spans from the current doc were already implemented for all operations except for `SpanGroup.__init__`. Initialize copied spans for `SpanGroup.copy` with `Doc.char_span` in order to validate the character offsets and to make it possible to copy spans between documents with differing tokenization. Currently there is no validation that the document texts are identical, but the span char offsets must be valid spans in the target doc, which prevents you from ending up with completely invalid spans. * Undo change in test_beam_overfitting_IO	2023-06-01 19:19:17 +02:00
Isabel Zimmerman	05df59fd4a	[DOCS] add vetiver to spacy universe (#12557 ) * add vetiver to spacy universe * remove image * update logo to render correctly in thumbnail * apply Basil's suggestion Co-authored-by: Basile Dura <bdura@users.noreply.github.com> * refer to the same model --------- Co-authored-by: Sofie Van Landeghem <svlandeg@users.noreply.github.com> Co-authored-by: Basile Dura <bdura@users.noreply.github.com>	2023-06-01 17:11:18 +02:00
Adriane Boyd	c936db2faf	Address numpy 1.25 deprecations in test suite (#12684 ) * Address upcoming numpy v1.25 deprecations in test suite * Temporarily test most recent numpy prerelease in CI * Revert "Temporarily test most recent numpy prerelease in CI" This reverts commit `d75a66e55e`.	2023-05-31 17:23:07 +02:00
Adriane Boyd	9b7a59c325	Revert "CI: Disable fail-fast (#12658 )" (#12676 ) This reverts commit `1f088cbf4a`.	2023-05-26 10:57:02 +02:00
Vinit Ravishankar	f0e0206b77	update universe for spacypdfreader (#12661 )	2023-05-23 13:28:48 +02:00
Adriane Boyd	1f088cbf4a	CI: Disable fail-fast (#12658 ) While the typing_extensions/pydantic `Literal` bugs are being sorted out, disable fail-fast so the rest of the CI is available for development purposes.	2023-05-23 10:48:06 +02:00
Basile Dura	6ea4155487	feat: add comparison operators in `span.pyi` (#12652 ) * feat: add comparison operators in span.pyi remove Cython-specific `__richcmp__` * fix: comparison operators should be defined for any other object	2023-05-23 08:50:37 +02:00
Victoria	6930a6bf45	Add spaCy VSCode extension materials (#12592 )	2023-05-19 14:38:53 +02:00
Basile Dura	95fd46b1dd	feat: add type hinting on SpanGroup.__iter__ (#12642 )	2023-05-17 14:20:00 +02:00
Adriane Boyd	df083f91a5	Add Malay to website languages (#12643 )	2023-05-17 13:13:43 +02:00
Sani	873c16a4df	Malay language support (#12602 ) * add malay lang * fix token len * black format * reformat conftest malay * remove exceptions not exist in dbp * format code	2023-05-17 12:45:21 +02:00

15963 Commits