spaCy

mirror of https://github.com/explosion/spaCy.git synced 2025-07-10 16:22:29 +03:00

Author	SHA1	Message	Date
Adriane Boyd	b8d40cae3e	Support overriding registered functions in configs (#12623 ) Support overriding registered functions in configs. Previously the registry name was parsed as a section name rather than as a registry name.	2023-06-28 10:03:27 +02:00
Adriane Boyd	1f663b7c33	Address issues with source with component names and replacing listeners (#12701 ) When sourcing a component, the object from the original pipeline is added to the new pipeline as the same object. This creates a situation where there are several attributes that cannot be in sync between the original pipeline and the new pipeline at the same time for this one object: * component.name * component.listener_map / component.listening_components for tok2vec and transformer When running replace_listeners on a component, the config is not updated correctly if the state of the component is incorrect for the current pipeline (in particular changes that should be applied from model.attrs["replace_listener_cfg"] as used in spacy-transformers) due to the fact that: * find_listeners relies on component.name to set the name in the listener_map * replace_listeners relies on listener_map to determine how to modify the configs In addition, there are several places where pipeline components are modified and the listener map and/or internal component names aren't currently updated. In cases where there is a component shared by two pipelines that cannot be in sync, this PR chooses to prioritize the most recently modified or initialized pipeline. There is no actual solution with the current source behavior that will make both pipelines usable, so the current pipeline is updated whenever components are added/renamed/removed or the pipeline is initialized for training.	2023-06-28 10:03:27 +02:00
Adriane Boyd	0b190c5d42	Address numpy 1.25 deprecations in test suite (#12684 ) * Address upcoming numpy v1.25 deprecations in test suite * Temporarily test most recent numpy prerelease in CI * Revert "Temporarily test most recent numpy prerelease in CI" This reverts commit `d75a66e55e`.	2023-06-28 10:03:27 +02:00
Basile Dura	6c2f601812	build: bump typer version to accept >=0.3<0.10 (#12631 )	2023-06-28 09:54:16 +02:00
Adriane Boyd	512241e124	Set version to v3.5.3 (#12628 )	2023-05-12 11:50:08 +02:00
Adriane Boyd	424e917c6c	Merge pull request #12626 from adrianeboyd/backport/v3.5.3-1 Backports for v3.5.3	2023-05-12 11:18:12 +02:00
Kenneth Enevoldsen	9beaec6a03	docs: remove invalid huggingface-hub push argument (#12624 )	2023-05-12 09:55:22 +02:00
royashcenazi	931a46308f	Parsigs universe 3 (#12617 ) * parsigs universe * added model installation explanation in the description * Update website/meta/universe.json Co-authored-by: Basile Dura <bdura@users.noreply.github.com> * added model installation instruction in the code example * added biomedical category --------- Co-authored-by: Basile Dura <bdura@users.noreply.github.com>	2023-05-12 09:55:22 +02:00
royashcenazi	7c49d251c7	parsigs universe (#12616 ) * parsigs universe * added model installation explanation in the description * Update website/meta/universe.json Co-authored-by: Basile Dura <bdura@users.noreply.github.com> * added model installation instruction in the code example --------- Co-authored-by: Basile Dura <bdura@users.noreply.github.com>	2023-05-12 09:55:22 +02:00
David Berenstein	81488fa88b	chore: added adept-augmentations to the spacy universe (#12609 ) * chore: added adept-augmentations to the spacy universe * Apply suggestions from code review Co-authored-by: Basile Dura <bdura@users.noreply.github.com> * Update universe.json --------- Co-authored-by: Basile Dura <bdura@users.noreply.github.com>	2023-05-12 09:55:22 +02:00
Patrick J. Burns	54d9198e62	Fix typo (#12615 )	2023-05-12 09:55:22 +02:00
Patrick J. Burns	7ae4fc19a1	Add LatinCy models to universe.json (#12597 ) * Add LatinCy models to universe.json * Update website/meta/universe.json Add install code for LatinCy models to 'code_example' Co-authored-by: Adriane Boyd <adrianeboyd@gmail.com> * Update LatinCy ‘code_example’ in website/meta/universe.json Co-authored-by: Adriane Boyd <adrianeboyd@gmail.com> --------- Co-authored-by: Adriane Boyd <adrianeboyd@gmail.com>	2023-05-12 09:55:22 +02:00
Adriane Boyd	2cfbc1209d	In initialize only calculate current vectors hash if needed (#12607 )	2023-05-12 09:55:22 +02:00
Adriane Boyd	42e5043816	Remove #egg from download URLs (#12567 ) The current URLs will become invalid in pip 25.0. According to the pip docs, the egg= URLs are currently only needed for editable VCS installs.	2023-05-12 09:55:22 +02:00
Kenneth Enevoldsen	4e1db35f6e	Update inmemorylookupkb.mdx (#12586 ) Example does not refer to the in memory lookup	2023-05-12 09:55:22 +02:00
Lj Miranda	9ec12fcfde	Add spans in spacy benchmark (#12575 ) * Add spans in spacy benchmark The current implementation of spaCy benchmark accuracy / spacy evaluate doesn't include the "spans" type, so calling the command doesn't render the HTML displaCy file needed. This PR attempts to fix that by creating a new parameter for "spans" and calling the appropriate displaCy value. * Reformat file with black * Add tests for evaluate * Fix spans -> span for displacy style * Update test to check render instead * Update source so mypy passes * Add parser information to avoid warnings	2023-05-12 09:55:22 +02:00
Adriane Boyd	139368d9ce	CI: Only run test suite once with thinc-apple-ops for macos python 3.11 (#12436 ) * CI: Only run test suite once with thinc-apple-ops for macos python 3.11 * Adjust syntax * Try alternate syntax * Try alternate syntax * Try alternate syntax	2023-05-12 09:55:22 +02:00
kadarakos	0de1f8bf73	Spancat speed improvement (#12577 ) * avoid nesting then flattening * mypy fix * Apply suggestions from code review * Add type for indices * Run full matrix for mypy * Add back modified type: ignore * Revert "Run full matrix for mypy" This reverts commit `e218873d04`. --------- Co-authored-by: Adriane Boyd <adrianeboyd@gmail.com>	2023-05-12 09:55:22 +02:00
Victoria	1f8f910554	Add spacy-wasm to universe (#12572 ) * add spacy-wasm to universe * add tag	2023-05-12 09:55:22 +02:00
moxley01	e9945ccd04	add spacysee project (#12568 )	2023-05-12 09:55:22 +02:00
Adriane Boyd	664a53ffbe	CI: Disable Azure (#12560 )	2023-05-12 09:55:22 +02:00
Adriane Boyd	e05b2ccc7c	Add default option to MorphAnalysis.get (#12545 ) * Add default to MorphAnalysis.get Similar to `dict`, allow a `default` option for `MorphAnalysis.get` for the user to provide a default return value if the field is not found. The default return value remains `[]`, which is not the same as `dict.get`, but is already established as this method's default return value with the return type `List[str]`. However the new `default` option does not enforce that the user-provided default is actually `List[str]`. * Restore test case	2023-05-12 09:55:22 +02:00
Adriane Boyd	357fdd4871	Load exceptions last in Tokenizer.from_bytes (#12553 ) In `Tokenizer.from_bytes`, the exceptions should be loaded last so that they are only processed once as part of loading the model. The exceptions are tokenized as phrase matcher patterns in the background and the internal tokenization needs to be synced with all the remaining tokenizer settings. If the exceptions are not loaded last, there are speed regressions for `Tokenizer.from_bytes/disk` vs. `Tokenizer.add_special_case` as the caches are reloaded more than necessary during deserialization.	2023-05-12 09:55:22 +02:00
Sofie Van Landeghem	7bf1db87ad	fix typo (#12543 )	2023-05-12 09:55:22 +02:00
TAN Long	b0e5aed5ed	perf(REL_OP): Replace some token.children with token.rights or token.lefts (#12528 ) Co-authored-by: Tan Long <tanloong@foxmail.com>	2023-05-12 09:55:22 +02:00
TAN Long	6be67db59f	docs(REL_OP): modify docs for REL_OPs to match Semgrex's update on CoreNLP v4.5.2 (#12531 ) Co-authored-by: Tan Long <tanloong@foxmail.com>	2023-05-12 09:55:22 +02:00
andyjessen	18a2a88a95	Add category to spaCy project (#12506 ) ScispaCy fits within biomedical domain. Consider adding this category.	2023-05-12 09:55:22 +02:00
Adriane Boyd	aea4a96f92	Set version to v3.5.2 (#12508 )	2023-04-06 17:30:39 +02:00
Adriane Boyd	e4bbdf7b50	Merge pull request #12494 from adrianeboyd/backport/v3.5.2-1 Backports for v3.5.2	2023-04-06 16:18:59 +02:00
Madeesh Kannan	f66d55fe5b	`Docs`: Fix rule-based matching example that expands named entities (#12495 )	2023-04-06 11:48:04 +02:00
Edward	9fbb8ee912	Add more information to custom code docs (#12491 ) * Add info to sections * Update website/docs/usage/training.mdx --------- Co-authored-by: Adriane Boyd <adrianeboyd@gmail.com>	2023-04-06 11:48:04 +02:00
Will Frey	314a7cea73	Fix invalid ConsoleLogger.v3 example config (#12498 ) Replace `progress_bar = "all_steps"` with `progress_bar = "eval"`, which is consistent with the default behavior for `spacy.ConsoleLogger.v1` and `spacy.ConsoleLogger.v2`.	2023-04-06 11:48:04 +02:00
Edward	2fbd080a03	Add model-last saving mechanism to pretraining (#12459 ) * Adjust pretrain command * change naming and add finally block * Add unit test * Add unit test assertions * Update spacy/training/pretrain.py Co-authored-by: Adriane Boyd <adrianeboyd@gmail.com> * change finally block * Add to docs * Update website/docs/usage/embeddings-transformers.mdx * Add flag to skip saving model-last --------- Co-authored-by: Adriane Boyd <adrianeboyd@gmail.com>	2023-04-03 15:28:52 +02:00
Adriane Boyd	bbf232e355	Add Span.kb_id/Span.id strings to Doc/DocBin serialization if set (#12493 ) * Add Span.kb_id/Span.id strings to Doc/DocBin serialization if set * Format	2023-04-03 15:28:52 +02:00
Adriane Boyd	0ec4dc5c29	Remove redundant strings.add for Doc.char_span (#12429 )	2023-04-03 15:28:52 +02:00
Adriane Boyd	a5406a6c45	Allow cupy 12.0 for extras (#12490 )	2023-04-03 15:28:52 +02:00
Adriane Boyd	57ee1212de	Fix pickle for ngram suggester (#12486 )	2023-04-03 15:28:52 +02:00
Ye Lei (叶磊)	b228875600	Allow passing a Span to displacy.parse_deps (#12477 ) * Allow passing a Span to displacy.parse_deps * Update docstring Co-authored-by: Adriane Boyd <adrianeboyd@gmail.com> * Update API docs --------- Co-authored-by: Adriane Boyd <adrianeboyd@gmail.com>	2023-04-03 15:28:52 +02:00
Raphael Mitsch	8d064872ff	Fix Span.sents for edge case of Span being the only Span in the last sentence of a Doc. (#12484 )	2023-04-03 15:28:52 +02:00
kadarakos	26da226a39	Fix spancat-singlelabel score (#12469 ) * debug argmax sort and add span scores * add missing tests for spanscores	2023-04-03 15:28:52 +02:00
Edward	888332dfb2	Add info to stringstore and vocab (#12471 )	2023-04-03 15:28:52 +02:00
Adriane Boyd	1b4a67bc54	Restrict github workflows to explosion (#12470 )	2023-04-03 15:28:52 +02:00
sloev / Johannes Valbjørn	79dcef17f7	add spacy_onnx_sentiment_english to universe (#12422 ) * add spacy_onnx_sentiment_english to universe * rename to sentimental-onix * fix comma json error * fix typo * typo fix Co-authored-by: Adriane Boyd <adrianeboyd@gmail.com> * mention need to download model before example works Co-authored-by: Adriane Boyd <adrianeboyd@gmail.com> --------- Co-authored-by: Adriane Boyd <adrianeboyd@gmail.com>	2023-04-03 15:28:52 +02:00
Prajakta Darade	0ecbeff1a6	corrected example code (#12466 )	2023-04-03 15:28:52 +02:00
kadarakos	4380d750f9	add explanation about overwriting behaviour (#12464 ) * add explanation about overwriting behaviour * Update website/docs/api/spancategorizer.mdx Co-authored-by: Adriane Boyd <adrianeboyd@gmail.com> * Update website/docs/api/spancategorizer.mdx Co-authored-by: Adriane Boyd <adrianeboyd@gmail.com> * Update website/docs/api/spancategorizer.mdx Co-authored-by: Adriane Boyd <adrianeboyd@gmail.com> * format --------- Co-authored-by: Adriane Boyd <adrianeboyd@gmail.com>	2023-04-03 15:28:52 +02:00
Adriane Boyd	2953e7b7ce	Support floret for PretrainVectors (#12435 ) * Support floret for PretrainVectors * Format	2023-04-03 15:28:52 +02:00
Ines Montani	d2d9e9e139	Add user survey alert to the top (#12452 ) * Add user survey alert to the top * Shorter --------- Co-authored-by: Sofie Van Landeghem <svlandeg@users.noreply.github.com>	2023-04-03 15:28:52 +02:00
Adriane Boyd	f1a42b6fcc	CI: Separate spacy universe validation into a separate workflow (#12440 ) * Separate spacy universe validation into a separate workflow * Fix new workflow name	2023-04-03 15:28:52 +02:00
Adriane Boyd	f9c0220ea5	CI: Switch PR back to paths-ignore (#12438 ) Switch PR tests back to paths-ignore but include changes to `.github` for all PRs rather than trying to figure out complicated includes+excludes. Changes to `.github` are relatively rare and should not be a huge burden for the CI.	2023-04-03 15:28:52 +02:00
Adriane Boyd	6183906a0b	Remove autoblack workflow (#12437 ) Now that all PRs have `black` formatting validation, we no longer need the autoblack workflow.	2023-04-03 15:28:52 +02:00

15907 Commits