spaCy

mirror of https://github.com/explosion/spaCy.git synced 2025-09-12 15:12:39 +03:00

Author	SHA1	Message	Date
Adriane Boyd	dd5e00c735	Temporarily skip tests for compat table	2023-06-21 12:59:36 +02:00
Sofie Van Landeghem	d3ac8e897c	default value for phrasematcher in pyi (#12714)	2023-06-21 10:10:13 +02:00
Tom Aarsen	93983f08fc	Add SpanMarker for NER to spaCy universe (#12730) * Add SpanMarker for NER to spaCy universe * Escape the newlines in the text in the code example Or at least, attempt to * Remove now unnecessary import * Disable NER pipeline component in code example	2023-06-20 16:47:44 +02:00
David Berenstein	53c400bd7a	docs: added reference to `spacy-setfit` to the spaCy Universe (#12737) * docs: added reference to spacy-setfit * removed package import after adding factory entry points to packages	2023-06-19 15:52:07 +02:00
Ziad Amerr	3125b97ace	Fixed e941 link rendering by removing the dot (#12735)	2023-06-19 13:31:08 +02:00
Marcus Blättermann	7e4b38c841	Fix #12716 does not update the `config` generation section (#12718) This is a really odd bug, where Firefox doesn't re-render the `code` element, even though `children` changed. Two things fixed that: - remove the `language-ini` `className` - replace the `code` block with a `div` Both are not ideal. Therefor this solution adds an inner `div` that now has the classes while still maintaining the semantic `code` element. I couldn't find any explanation for why this is happening and why it only happens in Firefox. I assume it is a bug caused by one of our many dependencies (or their interplay) To make matters worse: This bug doesn't occure when running the site in dev mode. You have to build and serve the site to recreate it.	2023-06-19 09:34:28 +02:00
Daniël de Kok	e73c1a89bf	CI: add isort --check to validate job (#12727)	2023-06-15 23:10:25 +01:00
Daniël de Kok	e2b70df012	Configure isort to use the Black profile, recursively isort the `spacy` module (#12721) * Use isort with Black profile * isort all the things * Fix import cycles as a result of import sorting * Add DOCBIN_ALL_ATTRS type definition * Add isort to requirements * Remove isort from build dependencies check * Typo	2023-06-14 17:48:41 +02:00
Jacobo Myerston	daa6e0339f	Update universe.json (#12709) * Update universe.json * Update universe.json add some missing commas in the greCy's description.	2023-06-12 13:55:20 +02:00
Sofie Van Landeghem	d65e3c31a6	use system-independent commands (#12693)	2023-06-08 11:43:36 +02:00
Adriane Boyd	0f9d2b01fb	Set version v3.6.0.dev1 (#12703)	2023-06-07 16:23:14 +02:00
kadarakos	c003aac29a	SpanFinder into spaCy from experimental (#12507) * span finder integrated into spacy from experimental * black * isort * black * default spankey constant * black * Update spacy/pipeline/spancat.py Co-authored-by: Adriane Boyd <adrianeboyd@gmail.com> * rename * rename * max_length and min_length as Optional[int] and strict checking * black * mypy fix for integer type infinity * revert line order * implement all comparison operators for inf int * avoid two for loops over all docs by not precomputing * interleave thresholding with span creation * black * revert to not interleaving (relized its faster) * black * Update spacy/errors.py Co-authored-by: Adriane Boyd <adrianeboyd@gmail.com> * update dosctring * enforce that the gold and predicted documents have the same text * new error for ensuring reference and predicted texts are the same * remove todo * adjust test * black * handle misaligned tokenization * return correct variable * failing overfit test * only use a single spans_key like in spancat * black * remove debug lines * typo * remove comment * remove near duplicate reduntant method * use the 'spans_key' variable name everywhere * Update spacy/pipeline/span_finder.py Co-authored-by: Adriane Boyd <adrianeboyd@gmail.com> * flaky test fix suggestion, hand set bias terms * only test suggester and test result exhaustively * make it clear that the span_finder_suggester is more general (not specific to span_finder) * Update spacy/tests/pipeline/test_span_finder.py Co-authored-by: Adriane Boyd <adrianeboyd@gmail.com> * Apply suggestions from code review * remove question comment * move preset_spans_suggester test to spancat tests * Add docs and unify default configs for spancat and span finder * Add `allow_overlap=True` to span finder scorer * Fix offset bug in set_annotations * Ignore labels in span finder scorer * Format * Add span_finder to quickstart template * Move settings to self.cfg, store min/max unset as None * Remove debugging * Update docstrings and docs * Update spacy/pipeline/span_finder.py Co-authored-by: Sofie Van Landeghem <svlandeg@users.noreply.github.com> * Fix imports --------- Co-authored-by: Adriane Boyd <adrianeboyd@gmail.com> Co-authored-by: Sofie Van Landeghem <svlandeg@users.noreply.github.com>	2023-06-07 15:52:28 +02:00
Basile Dura	c3c064ace4	fix: `InitializableComponent` type hints (#12692) * fix: InitializableComponent type hints * fix: avoid circular dependency * style: clean imports in language.py * style: use relative imports Co-authored-by: Adriane Boyd <adrianeboyd@gmail.com> * fix: apply black --------- Co-authored-by: Adriane Boyd <adrianeboyd@gmail.com>	2023-06-02 14:29:52 +02:00
Adriane Boyd	c4112a1da3	Require that all SpanGroup spans are from the current doc (#12569) * Require that all SpanGroup spans are from the current doc The restriction on only adding spans from the current doc were already implemented for all operations except for `SpanGroup.__init__`. Initialize copied spans for `SpanGroup.copy` with `Doc.char_span` in order to validate the character offsets and to make it possible to copy spans between documents with differing tokenization. Currently there is no validation that the document texts are identical, but the span char offsets must be valid spans in the target doc, which prevents you from ending up with completely invalid spans. * Undo change in test_beam_overfitting_IO	2023-06-01 19:19:17 +02:00
Isabel Zimmerman	05df59fd4a	[DOCS] add vetiver to spacy universe (#12557) * add vetiver to spacy universe * remove image * update logo to render correctly in thumbnail * apply Basil's suggestion Co-authored-by: Basile Dura <bdura@users.noreply.github.com> * refer to the same model --------- Co-authored-by: Sofie Van Landeghem <svlandeg@users.noreply.github.com> Co-authored-by: Basile Dura <bdura@users.noreply.github.com>	2023-06-01 17:11:18 +02:00
Adriane Boyd	c936db2faf	Address numpy 1.25 deprecations in test suite (#12684) * Address upcoming numpy v1.25 deprecations in test suite * Temporarily test most recent numpy prerelease in CI * Revert "Temporarily test most recent numpy prerelease in CI" This reverts commit `d75a66e55e`.	2023-05-31 17:23:07 +02:00
Adriane Boyd	9b7a59c325	Revert "CI: Disable fail-fast (#12658)" (#12676) This reverts commit `1f088cbf4a`.	2023-05-26 10:57:02 +02:00
Vinit Ravishankar	f0e0206b77	update universe for spacypdfreader (#12661)	2023-05-23 13:28:48 +02:00
Adriane Boyd	1f088cbf4a	CI: Disable fail-fast (#12658) While the typing_extensions/pydantic `Literal` bugs are being sorted out, disable fail-fast so the rest of the CI is available for development purposes.	2023-05-23 10:48:06 +02:00
Basile Dura	6ea4155487	feat: add comparison operators in `span.pyi` (#12652) * feat: add comparison operators in span.pyi remove Cython-specific `__richcmp__` * fix: comparison operators should be defined for any other object	2023-05-23 08:50:37 +02:00
Victoria	6930a6bf45	Add spaCy VSCode extension materials (#12592)	2023-05-19 14:38:53 +02:00
Basile Dura	95fd46b1dd	feat: add type hinting on SpanGroup.__iter__ (#12642)	2023-05-17 14:20:00 +02:00
Adriane Boyd	df083f91a5	Add Malay to website languages (#12643)	2023-05-17 13:13:43 +02:00
Sani	873c16a4df	Malay language support (#12602) * add malay lang * fix token len * black format * reformat conftest malay * remove exceptions not exist in dbp * format code	2023-05-17 12:45:21 +02:00
Lj Miranda	58779c24ef	Remove shorthand for output-file in spacy apply (#12636) The output-file argument is positional, so can't use a shorthand like -o.	2023-05-17 12:36:29 +02:00
David Berenstein	83b6f488cb	universe: Update examples Adept Augementation (#12620) * Update universe.json * chore: changed readme example as suggested by Vincent Warmerdam (koaning)	2023-05-15 14:09:33 +02:00
Adriane Boyd	3dc445df8d	Fix new tags in docs for v3.5.x (#12629) * Fix new tags in docs for v3.5.x * Fix new tag	2023-05-15 12:06:58 +02:00
Basile Dura	2dd8825f09	docs: add comment on `offset_x` argument (#12630)	2023-05-15 11:42:47 +02:00
Basile Dura	f96b9e03df	build: bump typer version to accept >=0.3<0.10 (#12631)	2023-05-15 08:06:58 +02:00
Adriane Boyd	3637148c4d	Add scorer option to return per-component scores (#12540) * Add scorer option to return per-component scores Add `per_component` option to `Language.evaluate` and `Scorer.score` to return scores keyed by `tokenizer` (hard-coded) or by component name. Add option to `evaluate` CLI to score by component. Per-component scores can only be saved to JSON. * Update help text and messages	2023-05-12 15:36:54 +02:00
Kenneth Enevoldsen	88680a6eed	docs: remove invalid huggingface-hub push argument (#12624)	2023-05-12 09:40:28 +02:00
Adriane Boyd	b5af0fe836	Revert "Use Latin normalization for Serbian attrs (#12608)" (#12621) This reverts commit `6f314f99c4`. We are reverting this until we can support this normalization more consistently across vectors, training corpora, and lemmatizer data.	2023-05-11 11:54:16 +02:00
royashcenazi	3252f6b13f	Parsigs universe 3 (#12617) * parsigs universe * added model installation explanation in the description * Update website/meta/universe.json Co-authored-by: Basile Dura <bdura@users.noreply.github.com> * added model installement instruction in the code example * added biomedical category --------- Co-authored-by: Basile Dura <bdura@users.noreply.github.com>	2023-05-10 13:49:51 +02:00
royashcenazi	a56ab98e3c	parsigs universe (#12616) * parsigs universe * added model installation explanation in the description * Update website/meta/universe.json Co-authored-by: Basile Dura <bdura@users.noreply.github.com> * added model installement instruction in the code example --------- Co-authored-by: Basile Dura <bdura@users.noreply.github.com>	2023-05-10 13:19:28 +02:00
David Berenstein	d11b549195	chore: added adept-augmentations to the spacy universe (#12609) * chore: added adept-augmentations to the spacy universe * Apply suggestions from code review Co-authored-by: Basile Dura <bdura@users.noreply.github.com> * Update universe.json --------- Co-authored-by: Basile Dura <bdura@users.noreply.github.com>	2023-05-10 13:16:16 +02:00
Patrick J. Burns	15f16db6ca	Fix typo (#12615)	2023-05-09 15:52:34 +02:00
Patrick J. Burns	eb3960a15a	Add LatinCy models to universe.json (#12597) * Add LatinCy models to universe.json * Update website/meta/universe.json Add install code for LatinCy models to 'code_example' Co-authored-by: Adriane Boyd <adrianeboyd@gmail.com> * Update LatinCy ‘code_example’ in website/meta/universe.json Co-authored-by: Adriane Boyd <adrianeboyd@gmail.com> --------- Co-authored-by: Adriane Boyd <adrianeboyd@gmail.com>	2023-05-09 12:02:45 +02:00
Adriane Boyd	1279b464bb	In initialize only calculate current vectors hash if needed (#12607)	2023-05-08 16:51:58 +02:00
Adriane Boyd	6f314f99c4	Use Latin normalization for Serbian attrs (#12608) * Use Latin normalization for Serbian attrs Use Latin normalization for Serbian `NORM`, `PREFIX`, and `SUFFIX`. * Update NORMs in tokenizer exceptions and related tests * Add tests for all custom lex attrs * Remove unused imports	2023-05-08 12:33:56 +02:00
Adriane Boyd	cbc6bcf434	Merge pull request #12604 from adrianeboyd/chore/v3.6.0.dev0 Set version to v3.6.0.dev0	2023-05-08 10:05:15 +02:00
Adriane Boyd	46ce66021a	Temporarily skip download CLI related tests in CI	2023-05-08 09:17:33 +02:00
Adriane Boyd	fbd12eb4a4	Set version to v3.6.0.dev0	2023-05-08 09:10:35 +02:00
Adriane Boyd	dbc71ecd44	Remove #egg from download URLs (#12567) The current URLs will become invalid in pip 25.0. According to the pip docs, the egg= URLs are currently only needed for editable VCS installs.	2023-05-04 17:13:12 +02:00
Kenneth Enevoldsen	73698326df	Update inmemorylookupkb.mdx (#12586) Example does not refer to the in memory lookup	2023-05-02 12:51:13 +02:00
Lj Miranda	298e6036b7	Add spans in spacy benchmark (#12575) * Add spans in spacy benchmark The current implementation of spaCy benchmark accuracy / spacy evaluate doesn't include the "spans" type, so calling the command doesn't render the HTML displaCy file needed. This PR attempts to fix that by creating a new parameter for "spans" and calling the appropriate displaCy value. * Reformat file with black * Add tests for evaluate * Fix spans -> span for displacy style * Update test to check render instead * Update source so mypy passes * Add parser information to avoid warnings	2023-04-28 14:32:52 +02:00
Adriane Boyd	6817e3d372	CI: Only run test suite once with thinc-apple-ops for macos python 3.11 (#12436) * CI: Only run test suite once with thinc-apple-ops for macos python 3.11 * Adjust syntax * Try alternate syntax * Try alternate syntax * Try alternate syntax	2023-04-28 14:29:51 +02:00
kadarakos	34d1164b0e	Spancat speed improvement (#12577) * avoid nesting then flattening * mypy fix * Apply suggestions from code review * Add type for indices * Run full matrix for mypy * Add back modified type: ignore * Revert "Run full matrix for mypy" This reverts commit `e218873d04`. --------- Co-authored-by: Adriane Boyd <adrianeboyd@gmail.com>	2023-04-27 15:27:13 +02:00
Victoria	a8dfc66135	Add spacy-wasm to universe (#12572) * add spacy-wasm to universe * add tag	2023-04-26 14:18:40 +02:00
moxley01	070fa16545	add spacysee project (#12568)	2023-04-25 12:30:19 +02:00
Adriane Boyd	68da580a4c	CI: Disable Azure (#12560)	2023-04-21 15:05:53 +02:00

16137 Commits