spaCy

mirror of https://github.com/explosion/spaCy.git synced 2025-09-10 22:22:39 +03:00

Author	SHA1	Message	Date
Adriane Boyd	b5af0fe836	Revert "Use Latin normalization for Serbian attrs (#12608 )" (#12621 ) This reverts commit `6f314f99c4`. We are reverting this until we can support this normalization more consistently across vectors, training corpora, and lemmatizer data.	2023-05-11 11:54:16 +02:00
royashcenazi	3252f6b13f	Parsigs universe 3 (#12617 ) * parsigs universe * added model installation explanation in the description * Update website/meta/universe.json Co-authored-by: Basile Dura <bdura@users.noreply.github.com> * added model installement instruction in the code example * added biomedical category --------- Co-authored-by: Basile Dura <bdura@users.noreply.github.com>	2023-05-10 13:49:51 +02:00
royashcenazi	a56ab98e3c	parsigs universe (#12616 ) * parsigs universe * added model installation explanation in the description * Update website/meta/universe.json Co-authored-by: Basile Dura <bdura@users.noreply.github.com> * added model installement instruction in the code example --------- Co-authored-by: Basile Dura <bdura@users.noreply.github.com>	2023-05-10 13:19:28 +02:00
David Berenstein	d11b549195	chore: added adept-augmentations to the spacy universe (#12609 ) * chore: added adept-augmentations to the spacy universe * Apply suggestions from code review Co-authored-by: Basile Dura <bdura@users.noreply.github.com> * Update universe.json --------- Co-authored-by: Basile Dura <bdura@users.noreply.github.com>	2023-05-10 13:16:16 +02:00
Patrick J. Burns	15f16db6ca	Fix typo (#12615 )	2023-05-09 15:52:34 +02:00
Patrick J. Burns	eb3960a15a	Add LatinCy models to universe.json (#12597 ) * Add LatinCy models to universe.json * Update website/meta/universe.json Add install code for LatinCy models to 'code_example' Co-authored-by: Adriane Boyd <adrianeboyd@gmail.com> * Update LatinCy ‘code_example’ in website/meta/universe.json Co-authored-by: Adriane Boyd <adrianeboyd@gmail.com> --------- Co-authored-by: Adriane Boyd <adrianeboyd@gmail.com>	2023-05-09 12:02:45 +02:00
Adriane Boyd	1279b464bb	In initialize only calculate current vectors hash if needed (#12607 )	2023-05-08 16:51:58 +02:00
Adriane Boyd	6f314f99c4	Use Latin normalization for Serbian attrs (#12608 ) * Use Latin normalization for Serbian attrs Use Latin normalization for Serbian `NORM`, `PREFIX`, and `SUFFIX`. * Update NORMs in tokenizer exceptions and related tests * Add tests for all custom lex attrs * Remove unused imports	2023-05-08 12:33:56 +02:00
Adriane Boyd	cbc6bcf434	Merge pull request #12604 from adrianeboyd/chore/v3.6.0.dev0 Set version to v3.6.0.dev0	2023-05-08 10:05:15 +02:00
Adriane Boyd	46ce66021a	Temporarily skip download CLI related tests in CI	2023-05-08 09:17:33 +02:00
Adriane Boyd	fbd12eb4a4	Set version to v3.6.0.dev0	2023-05-08 09:10:35 +02:00
Adriane Boyd	dbc71ecd44	Remove #egg from download URLs (#12567 ) The current URLs will become invalid in pip 25.0. According to the pip docs, the egg= URLs are currently only needed for editable VCS installs.	2023-05-04 17:13:12 +02:00
Kenneth Enevoldsen	73698326df	Update inmemorylookupkb.mdx (#12586 ) Example does not refer to the in memory lookup	2023-05-02 12:51:13 +02:00
Lj Miranda	298e6036b7	Add spans in spacy benchmark (#12575 ) * Add spans in spacy benchmark The current implementation of spaCy benchmark accuracy / spacy evaluate doesn't include the "spans" type, so calling the command doesn't render the HTML displaCy file needed. This PR attempts to fix that by creating a new parameter for "spans" and calling the appropriate displaCy value. * Reformat file with black * Add tests for evaluate * Fix spans -> span for displacy style * Update test to check render instead * Update source so mypy passes * Add parser information to avoid warnings	2023-04-28 14:32:52 +02:00
Adriane Boyd	6817e3d372	CI: Only run test suite once with thinc-apple-ops for macos python 3.11 (#12436 ) * CI: Only run test suite once with thinc-apple-ops for macos python 3.11 * Adjust syntax * Try alternate syntax * Try alternate syntax * Try alternate syntax	2023-04-28 14:29:51 +02:00
kadarakos	34d1164b0e	Spancat speed improvement (#12577 ) * avoid nesting then flattening * mypy fix * Apply suggestions from code review * Add type for indices * Run full matrix for mypy * Add back modified type: ignore * Revert "Run full matrix for mypy" This reverts commit `e218873d04`. --------- Co-authored-by: Adriane Boyd <adrianeboyd@gmail.com>	2023-04-27 15:27:13 +02:00
Victoria	a8dfc66135	Add spacy-wasm to universe (#12572 ) * add spacy-wasm to universe * add tag	2023-04-26 14:18:40 +02:00
moxley01	070fa16545	add spacysee project (#12568 )	2023-04-25 12:30:19 +02:00
Adriane Boyd	68da580a4c	CI: Disable Azure (#12560 )	2023-04-21 15:05:53 +02:00
Victoria	e115408514	remove survey link (#12559 )	2023-04-21 10:22:26 +02:00
Patrick J. Burns	ab4ba04c32	Update LatinDefaults for lang 'la' (#12538 ) * Add noun chunking to la syntax iterators * Expand list of numeral, ordinal words * Expand abbreviations in la tokenizer_exceptions * Add example sents * Update spacy/lang/la/syntax_iterators.py Reorganize la syntax iterators Co-authored-by: Sofie Van Landeghem <svlandeg@users.noreply.github.com> * Minor updates based on review * fix call --------- Co-authored-by: Sofie Van Landeghem <svlandeg@users.noreply.github.com>	2023-04-20 16:55:40 +02:00
Adriane Boyd	b60b027927	Add default option to MorphAnalysis.get (#12545 ) * Add default to MorphAnalysis.get Similar to `dict`, allow a `default` option for `MorphAnalysis.get` for the user to provide a default return value if the field is not found. The default return value remains `[]`, which is not the same as `dict.get`, but is already established as this method's default return value with the return type `List[str]`. However the new `default` option does not enforce that the user-provided default is actually `List[str]`. * Restore test case	2023-04-20 14:06:32 +02:00
Adriane Boyd	dc0a1a9808	Load exceptions last in Tokenizer.from_bytes (#12553 ) In `Tokenizer.from_bytes`, the exceptions should be loaded last so that they are only processed once as part of loading the model. The exceptions are tokenized as phrase matcher patterns in the background and the internal tokenization needs to be synced with all the remaining tokenizer settings. If the exceptions are not loaded last, there are speed regressions for `Tokenizer.from_bytes/disk` vs. `Tokenizer.add_special_case` as the caches are reloaded more than necessary during deserialization.	2023-04-20 11:30:34 +02:00
Sofie Van Landeghem	8e6a3d58d8	fix typo (#12543 )	2023-04-19 10:59:33 +02:00
TAN Long	923d24e885	perf(REL_OP): Replace some token.children with token.rights or token.lefts (#12528 ) Co-authored-by: Tan Long <tanloong@foxmail.com>	2023-04-17 13:16:34 +02:00
TAN Long	119f959218	docs(REL_OP): modify docs for REL_OPs to match Semgrex's update on CoreNLP v4.5.2 (#12531 ) Co-authored-by: Tan Long <tanloong@foxmail.com>	2023-04-17 13:14:01 +02:00
andyjessen	02259fa195	Add category to spaCy project (#12506 ) ScispaCy fits within biomedical domain. Consider adding this category.	2023-04-07 15:31:04 +02:00
Madeesh Kannan	6db20b354f	`Docs`: Fix rule-based matching example that expands named entities (#12495 )	2023-04-06 11:45:58 +02:00
Edward	c95d320d28	Add more information to custom code docs (#12491 ) * Add info to sections * Update website/docs/usage/training.mdx --------- Co-authored-by: Adriane Boyd <adrianeboyd@gmail.com>	2023-04-06 11:45:19 +02:00
Will Frey	8d4129e177	Fix invalid ConsoleLogger.v3 example config (#12498 ) Replace `progress_bar = "all_steps"` with `progress_bar = "eval"`, which is consistent with the default behavior for `spacy.ConsoleLogger.v1` and `spacy.ConsoleLogger.v2`.	2023-04-04 20:53:07 +02:00
Edward	de32011e4c	Add model-last saving mechanism to pretraining (#12459 ) * Adjust pretrain command * chane naming and add finally block * Add unit test * Add unit test assertions * Update spacy/training/pretrain.py Co-authored-by: Adriane Boyd <adrianeboyd@gmail.com> * change finally block * Add to docs * Update website/docs/usage/embeddings-transformers.mdx * Add flag to skip saving model-last --------- Co-authored-by: Adriane Boyd <adrianeboyd@gmail.com>	2023-04-03 15:24:03 +02:00
Adriane Boyd	4a1ec332de	Add Span.kb_id/Span.id strings to Doc/DocBin serialization if set (#12493 ) * Add Span.kb_id/Span.id strings to Doc/DocBin serialization if set * Format	2023-04-03 15:11:12 +02:00
Adriane Boyd	4538ceb507	Remove redundant strings.add for Doc.char_span (#12429 )	2023-04-03 11:38:56 +02:00
Adriane Boyd	476a2e7a0a	Allow cupy 12.0 for extras (#12490 )	2023-03-31 13:48:15 +02:00
Adriane Boyd	69e20ce03d	Fix pickle for ngram suggester (#12486 )	2023-03-31 13:43:51 +02:00
Adriane Boyd	140d53649d	Convert values to numpy for label smoothing tests (#12472 )	2023-03-31 13:41:41 +02:00
Ye Lei (叶磊)	ce258670b7	Allow passing a Span to displacy.parse_deps (#12477 ) * Allow passing a Span to displacy.parse_deps * Update docstring Co-authored-by: Adriane Boyd <adrianeboyd@gmail.com> * Update API docs --------- Co-authored-by: Adriane Boyd <adrianeboyd@gmail.com>	2023-03-31 09:44:01 +02:00
Raphael Mitsch	d85df9d577	Fix Span.sents for edge case of Span being the only Span in the last sentence of a Doc. (#12484 )	2023-03-29 18:54:47 +02:00
kadarakos	372a90885e	Fix spancat-singlelabel score (#12469 ) * debug argmax sort and add span scores * add missing tests for spanscores	2023-03-29 08:38:11 +02:00
Edward	dba4e7bece	Add info to stringstore and vocab (#12471 )	2023-03-27 13:15:14 +02:00
Adriane Boyd	2fba21be63	Restrict github workflows to explosion (#12470 )	2023-03-27 12:44:04 +02:00
sloev / Johannes Valbjørn	fd072533e7	add spacy_onnx_sentiment_english to universe (#12422 ) * add spacy_onnx_sentiment_english to universe * rename to sentimental-onix * fix comma json error * fix typo * typo fix Co-authored-by: Adriane Boyd <adrianeboyd@gmail.com> * mention need to download model before example works Co-authored-by: Adriane Boyd <adrianeboyd@gmail.com> --------- Co-authored-by: Adriane Boyd <adrianeboyd@gmail.com>	2023-03-27 11:35:14 +02:00
Prajakta Darade	ae7779e830	corrected example code (#12466 )	2023-03-27 11:32:49 +02:00
kadarakos	d1474fdd91	add explanation about overwriting behaviour (#12464 ) * add explanation about overwriting behaviour * Update website/docs/api/spancategorizer.mdx Co-authored-by: Adriane Boyd <adrianeboyd@gmail.com> * Update website/docs/api/spancategorizer.mdx Co-authored-by: Adriane Boyd <adrianeboyd@gmail.com> * Update website/docs/api/spancategorizer.mdx Co-authored-by: Adriane Boyd <adrianeboyd@gmail.com> * format --------- Co-authored-by: Adriane Boyd <adrianeboyd@gmail.com>	2023-03-27 10:27:11 +02:00
Adriane Boyd	fac457a509	Support floret for PretrainVectors (#12435 ) * Support floret for PretrainVectors * Format	2023-03-24 16:28:51 +01:00
Adriane Boyd	d0bd3f5ee4	Update Serbian tokenization for UD Serbian SET (#12442 )	2023-03-24 16:26:40 +01:00
Vinit Ravishankar	28de85737f	Tagger label smoothing (#12293 ) * add label smoothing * use True/False instead of floats * add entropy to debug data * formatting * docs * change test to check difference in distributions * Update website/docs/api/tagger.mdx Co-authored-by: Adriane Boyd <adrianeboyd@gmail.com> * Update spacy/pipeline/tagger.pyx Co-authored-by: Adriane Boyd <adrianeboyd@gmail.com> * bool -> float * update docs * fix seed * black * update tests to use label_smoothing = 0.0 * set default to 0.0, update quickstart * Update spacy/pipeline/tagger.pyx Co-authored-by: Adriane Boyd <adrianeboyd@gmail.com> * update morphologizer, tagger test * fix morph docs * add url to docs --------- Co-authored-by: Adriane Boyd <adrianeboyd@gmail.com>	2023-03-22 12:17:56 +01:00
Ines Montani	b479f8bfa5	Add user survey alert to the top (#12452 ) * Add user survey alert to the top * Shorter --------- Co-authored-by: Sofie Van Landeghem <svlandeg@users.noreply.github.com>	2023-03-22 11:09:37 +01:00
Adriane Boyd	54c614e116	CI: Separate spacy universe validation into a separate workflow (#12440 ) * Separate spacy universe validation into a separate workflow * Fix new workflow name	2023-03-17 10:59:53 +01:00
Adriane Boyd	5f72d6c836	CI: Switch PR back to paths-ignore (#12438 ) Switch PR tests back to paths-ignore but include changes to `.github` for all PRs rather than trying to figure out complicated includes+excludes. Changes to `.github` are relatively rare and should not be a huge burden for the CI.	2023-03-17 10:01:49 +01:00
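
Note on b60b027927 (#12545): the commit message says `MorphAnalysis.get` gains a `default` option, that the fallback stays `[]` when no default is given, and that a user-provided default is not checked to be `List[str]`. A minimal sketch of that behavior follows. `MorphSketch` is a hypothetical pure-Python stand-in written for illustration only, not spaCy's implementation, and its parsing of the feature string is an assumption.

```python
from typing import Any, Dict, List, Optional


class MorphSketch:
    """Hypothetical stand-in for illustration; NOT spaCy's MorphAnalysis."""

    def __init__(self, feats: str) -> None:
        # Assumption: features arrive as a UD-style string,
        # e.g. "Case=Nom|Number=Sing".
        self._fields: Dict[str, List[str]] = {}
        for pair in feats.split("|"):
            if "=" in pair:
                field, values = pair.split("=", 1)
                self._fields[field] = values.split(",")

    def get(self, field: str, default: Optional[Any] = None) -> Any:
        # Known field: return its list of values.
        if field in self._fields:
            return self._fields[field]
        # Unknown field: fall back to [] unless the caller supplied a default.
        # The default is returned as-is; its type is not validated.
        return [] if default is None else default


morph = MorphSketch("Case=Nom|Number=Sing")
case_values = morph.get("Case")
missing_plain = morph.get("Tense")
missing_with_default = morph.get("Tense", ["Unknown"])
missing_non_list = morph.get("Tense", "n/a")
```

Unlike `dict.get`, the fallback here is `[]` rather than `None`, which matches the commit's point that the established return type is kept for callers who pass no default.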

16106 Commits