spaCy

mirror of https://github.com/explosion/spaCy.git synced 2024-11-16 22:57:22 +03:00

Author	SHA1	Message	Date
Sofie Van Landeghem	ddffd09602	Trainable lemmatizer docs link (#12795 ) * add an anchor to the trainable lemmatizer section * add requirement for morphologizer,tagger to rule-based lemmatizer * morphologizer only	2023-07-07 15:18:16 +02:00
Adriane Boyd	1a55661cfb	Update website binder version to v3.6 (#12805 )	2023-07-07 10:52:33 +02:00
Adriane Boyd	41dba5bd34	Update max_length default in span finder docs (#12803 )	2023-07-07 10:17:41 +02:00
Adriane Boyd	4e19ec7eb8	Docs for v3.6.0 (#12792 ) * Docs for v3.6.0 * Add sl performance * Add da trf note	2023-07-06 12:58:25 +02:00
Tom Aarsen	eab929361d	Use 'exclude' instead of 'disable' (#12783 ) as suggested by @svlandeg	2023-07-04 11:45:13 +02:00
Marcus Blättermann	bd239511a4	Fix problem with missing syntax highlighting languages causing runtime crash on the website (#12781 ) * Fix problem with universe pages using `docker` language * Fix problem with universe pages using `r` language * Add fallback, in case code language is unknown	2023-07-03 10:24:25 +02:00
Daniël de Kok	57a230c6e4	Remove section about parallel training with Ray (#12770 ) The Ray integration is currently broken, having these docs around suggest that this functionality is currently available.	2023-06-28 17:09:57 +02:00
Adriane Boyd	fb0da3e097	Support custom token/lexeme attribute for vectors (#12625 ) * Support custom token/lexeme attribute for vectors * Fix imports * Back off to ORTH without Vectors.attr * Fallback if vectors.attr doesn't exist * Update docs	2023-06-28 09:43:14 +02:00
Tom Aarsen	93983f08fc	Add SpanMarker for NER to spaCy universe (#12730 ) * Add SpanMarker for NER to spaCy universe * Escape the newlines in the text in the code example Or at least, attempt to * Remove now unnecessary import * Disable NER pipeline component in code example	2023-06-20 16:47:44 +02:00
David Berenstein	53c400bd7a	docs: added reference to `spacy-setfit` to the spaCy Universe (#12737 ) * docs: added reference to spacy-setfit * removed package import after adding factory entry points to packages	2023-06-19 15:52:07 +02:00
Marcus Blättermann	7e4b38c841	Fix #12716 does not update the `config` generation section (#12718 ) This is a really odd bug, where Firefox doesn't re-render the `code` element, even though `children` changed. Two things fixed that: - remove the `language-ini` `className` - replace the `code` block with a `div` Both are not ideal. Therefore this solution adds an inner `div` that now has the classes while still maintaining the semantic `code` element. I couldn't find any explanation for why this is happening and why it only happens in Firefox. I assume it is a bug caused by one of our many dependencies (or their interplay). To make matters worse: This bug doesn't occur when running the site in dev mode. You have to build and serve the site to recreate it.	2023-06-19 09:34:28 +02:00
Jacobo Myerston	daa6e0339f	Update universe.json (#12709 ) * Update universe.json * Update universe.json add some missing commas in the greCy's description.	2023-06-12 13:55:20 +02:00
kadarakos	c003aac29a	SpanFinder into spaCy from experimental (#12507 ) * span finder integrated into spacy from experimental * black * isort * black * default spankey constant * black * Update spacy/pipeline/spancat.py Co-authored-by: Adriane Boyd <adrianeboyd@gmail.com> * rename * rename * max_length and min_length as Optional[int] and strict checking * black * mypy fix for integer type infinity * revert line order * implement all comparison operators for inf int * avoid two for loops over all docs by not precomputing * interleave thresholding with span creation * black * revert to not interleaving (relized its faster) * black * Update spacy/errors.py Co-authored-by: Adriane Boyd <adrianeboyd@gmail.com> * update dosctring * enforce that the gold and predicted documents have the same text * new error for ensuring reference and predicted texts are the same * remove todo * adjust test * black * handle misaligned tokenization * return correct variable * failing overfit test * only use a single spans_key like in spancat * black * remove debug lines * typo * remove comment * remove near duplicate reduntant method * use the 'spans_key' variable name everywhere * Update spacy/pipeline/span_finder.py Co-authored-by: Adriane Boyd <adrianeboyd@gmail.com> * flaky test fix suggestion, hand set bias terms * only test suggester and test result exhaustively * make it clear that the span_finder_suggester is more general (not specific to span_finder) * Update spacy/tests/pipeline/test_span_finder.py Co-authored-by: Adriane Boyd <adrianeboyd@gmail.com> * Apply suggestions from code review * remove question comment * move preset_spans_suggester test to spancat tests * Add docs and unify default configs for spancat and span finder * Add `allow_overlap=True` to span finder scorer * Fix offset bug in set_annotations * Ignore labels in span finder scorer * Format * Add span_finder to quickstart template * Move settings to self.cfg, store min/max unset as None * Remove debugging * Update 
docstrings and docs * Update spacy/pipeline/span_finder.py Co-authored-by: Sofie Van Landeghem <svlandeg@users.noreply.github.com> * Fix imports --------- Co-authored-by: Adriane Boyd <adrianeboyd@gmail.com> Co-authored-by: Sofie Van Landeghem <svlandeg@users.noreply.github.com>	2023-06-07 15:52:28 +02:00
Isabel Zimmerman	05df59fd4a	[DOCS] add vetiver to spacy universe (#12557 ) * add vetiver to spacy universe * remove image * update logo to render correctly in thumbnail * apply Basil's suggestion Co-authored-by: Basile Dura <bdura@users.noreply.github.com> * refer to the same model --------- Co-authored-by: Sofie Van Landeghem <svlandeg@users.noreply.github.com> Co-authored-by: Basile Dura <bdura@users.noreply.github.com>	2023-06-01 17:11:18 +02:00
Vinit Ravishankar	f0e0206b77	update universe for spacypdfreader (#12661 )	2023-05-23 13:28:48 +02:00
Victoria	6930a6bf45	Add spaCy VSCode extension materials (#12592 )	2023-05-19 14:38:53 +02:00
Adriane Boyd	df083f91a5	Add Malay to website languages (#12643 )	2023-05-17 13:13:43 +02:00
Lj Miranda	58779c24ef	Remove shorthand for output-file in spacy apply (#12636 ) The output-file argument is positional, so can't use a shorthand like -o.	2023-05-17 12:36:29 +02:00
David Berenstein	83b6f488cb	universe: Update examples Adept Augementation (#12620 ) * Update universe.json * chore: changed readme example as suggested by Vincent Warmerdam (koaning)	2023-05-15 14:09:33 +02:00
Adriane Boyd	3dc445df8d	Fix new tags in docs for v3.5.x (#12629 ) * Fix new tags in docs for v3.5.x * Fix new tag	2023-05-15 12:06:58 +02:00
Basile Dura	2dd8825f09	docs: add comment on `offset_x` argument (#12630 )	2023-05-15 11:42:47 +02:00
Adriane Boyd	3637148c4d	Add scorer option to return per-component scores (#12540 ) * Add scorer option to return per-component scores Add `per_component` option to `Language.evaluate` and `Scorer.score` to return scores keyed by `tokenizer` (hard-coded) or by component name. Add option to `evaluate` CLI to score by component. Per-component scores can only be saved to JSON. * Update help text and messages	2023-05-12 15:36:54 +02:00
Kenneth Enevoldsen	88680a6eed	docs: remove invalid huggingface-hub push argument (#12624 )	2023-05-12 09:40:28 +02:00
royashcenazi	3252f6b13f	Parsigs universe 3 (#12617 ) * parsigs universe * added model installation explanation in the description * Update website/meta/universe.json Co-authored-by: Basile Dura <bdura@users.noreply.github.com> * added model installement instruction in the code example * added biomedical category --------- Co-authored-by: Basile Dura <bdura@users.noreply.github.com>	2023-05-10 13:49:51 +02:00
royashcenazi	a56ab98e3c	parsigs universe (#12616 ) * parsigs universe * added model installation explanation in the description * Update website/meta/universe.json Co-authored-by: Basile Dura <bdura@users.noreply.github.com> * added model installement instruction in the code example --------- Co-authored-by: Basile Dura <bdura@users.noreply.github.com>	2023-05-10 13:19:28 +02:00
David Berenstein	d11b549195	chore: added adept-augmentations to the spacy universe (#12609 ) * chore: added adept-augmentations to the spacy universe * Apply suggestions from code review Co-authored-by: Basile Dura <bdura@users.noreply.github.com> * Update universe.json --------- Co-authored-by: Basile Dura <bdura@users.noreply.github.com>	2023-05-10 13:16:16 +02:00
Patrick J. Burns	15f16db6ca	Fix typo (#12615 )	2023-05-09 15:52:34 +02:00
Patrick J. Burns	eb3960a15a	Add LatinCy models to universe.json (#12597 ) * Add LatinCy models to universe.json * Update website/meta/universe.json Add install code for LatinCy models to 'code_example' Co-authored-by: Adriane Boyd <adrianeboyd@gmail.com> * Update LatinCy ‘code_example’ in website/meta/universe.json Co-authored-by: Adriane Boyd <adrianeboyd@gmail.com> --------- Co-authored-by: Adriane Boyd <adrianeboyd@gmail.com>	2023-05-09 12:02:45 +02:00
Kenneth Enevoldsen	73698326df	Update inmemorylookupkb.mdx (#12586 ) Example does not refer to the in memory lookup	2023-05-02 12:51:13 +02:00
Victoria	a8dfc66135	Add spacy-wasm to universe (#12572 ) * add spacy-wasm to universe * add tag	2023-04-26 14:18:40 +02:00
moxley01	070fa16545	add spacysee project (#12568 )	2023-04-25 12:30:19 +02:00
Victoria	e115408514	remove survey link (#12559 )	2023-04-21 10:22:26 +02:00
Adriane Boyd	b60b027927	Add default option to MorphAnalysis.get (#12545 ) * Add default to MorphAnalysis.get Similar to `dict`, allow a `default` option for `MorphAnalysis.get` for the user to provide a default return value if the field is not found. The default return value remains `[]`, which is not the same as `dict.get`, but is already established as this method's default return value with the return type `List[str]`. However the new `default` option does not enforce that the user-provided default is actually `List[str]`. * Restore test case	2023-04-20 14:06:32 +02:00
TAN Long	119f959218	docs(REL_OP): modify docs for REL_OPs to match Semgrex's update on CoreNLP v4.5.2 (#12531 ) Co-authored-by: Tan Long <tanloong@foxmail.com>	2023-04-17 13:14:01 +02:00
andyjessen	02259fa195	Add category to spaCy project (#12506 ) ScispaCy fits within biomedical domain. Consider adding this category.	2023-04-07 15:31:04 +02:00
Madeesh Kannan	6db20b354f	`Docs`: Fix rule-based matching example that expands named entities (#12495 )	2023-04-06 11:45:58 +02:00
Edward	c95d320d28	Add more information to custom code docs (#12491 ) * Add info to sections * Update website/docs/usage/training.mdx --------- Co-authored-by: Adriane Boyd <adrianeboyd@gmail.com>	2023-04-06 11:45:19 +02:00
Will Frey	8d4129e177	Fix invalid ConsoleLogger.v3 example config (#12498 ) Replace `progress_bar = "all_steps"` with `progress_bar = "eval"`, which is consistent with the default behavior for `spacy.ConsoleLogger.v1` and `spacy.ConsoleLogger.v2`.	2023-04-04 20:53:07 +02:00
Edward	de32011e4c	Add model-last saving mechanism to pretraining (#12459 ) * Adjust pretrain command * chane naming and add finally block * Add unit test * Add unit test assertions * Update spacy/training/pretrain.py Co-authored-by: Adriane Boyd <adrianeboyd@gmail.com> * change finally block * Add to docs * Update website/docs/usage/embeddings-transformers.mdx * Add flag to skip saving model-last --------- Co-authored-by: Adriane Boyd <adrianeboyd@gmail.com>	2023-04-03 15:24:03 +02:00
Ye Lei (叶磊)	ce258670b7	Allow passing a Span to displacy.parse_deps (#12477 ) * Allow passing a Span to displacy.parse_deps * Update docstring Co-authored-by: Adriane Boyd <adrianeboyd@gmail.com> * Update API docs --------- Co-authored-by: Adriane Boyd <adrianeboyd@gmail.com>	2023-03-31 09:44:01 +02:00
Edward	dba4e7bece	Add info to stringstore and vocab (#12471 )	2023-03-27 13:15:14 +02:00
sloev / Johannes Valbjørn	fd072533e7	add spacy_onnx_sentiment_english to universe (#12422 ) * add spacy_onnx_sentiment_english to universe * rename to sentimental-onix * fix comma json error * fix typo * typo fix Co-authored-by: Adriane Boyd <adrianeboyd@gmail.com> * mention need to download model before example works Co-authored-by: Adriane Boyd <adrianeboyd@gmail.com> --------- Co-authored-by: Adriane Boyd <adrianeboyd@gmail.com>	2023-03-27 11:35:14 +02:00
Prajakta Darade	ae7779e830	corrected example code (#12466 )	2023-03-27 11:32:49 +02:00
kadarakos	d1474fdd91	add explanation about overwriting behaviour (#12464 ) * add explanation about overwriting behaviour * Update website/docs/api/spancategorizer.mdx Co-authored-by: Adriane Boyd <adrianeboyd@gmail.com> * Update website/docs/api/spancategorizer.mdx Co-authored-by: Adriane Boyd <adrianeboyd@gmail.com> * Update website/docs/api/spancategorizer.mdx Co-authored-by: Adriane Boyd <adrianeboyd@gmail.com> * format --------- Co-authored-by: Adriane Boyd <adrianeboyd@gmail.com>	2023-03-27 10:27:11 +02:00
Vinit Ravishankar	28de85737f	Tagger label smoothing (#12293 ) * add label smoothing * use True/False instead of floats * add entropy to debug data * formatting * docs * change test to check difference in distributions * Update website/docs/api/tagger.mdx Co-authored-by: Adriane Boyd <adrianeboyd@gmail.com> * Update spacy/pipeline/tagger.pyx Co-authored-by: Adriane Boyd <adrianeboyd@gmail.com> * bool -> float * update docs * fix seed * black * update tests to use label_smoothing = 0.0 * set default to 0.0, update quickstart * Update spacy/pipeline/tagger.pyx Co-authored-by: Adriane Boyd <adrianeboyd@gmail.com> * update morphologizer, tagger test * fix morph docs * add url to docs --------- Co-authored-by: Adriane Boyd <adrianeboyd@gmail.com>	2023-03-22 12:17:56 +01:00
Ines Montani	b479f8bfa5	Add user survey alert to the top (#12452 ) * Add user survey alert to the top * Shorter --------- Co-authored-by: Sofie Van Landeghem <svlandeg@users.noreply.github.com>	2023-03-22 11:09:37 +01:00
Adriane Boyd	2ce9a220db	Fix --verbose for spacy find-threshold (#12418 )	2023-03-14 17:16:49 +01:00
Lj Miranda	913d74f509	Add spancat_singlelabel pipeline for multiclass and non-overlapping span labelling tasks (#11365 ) * [wip] Update * [wip] Update * Add initial port * [wip] Update * Fix all imports * Add spancat_exclusive to pipeline * [WIP] Update * [ci skip] Add breakpoint for debugging * Use spacy.SpanCategorizer.v1 as default archi * Update spacy/pipeline/spancat_exclusive.py Co-authored-by: kadarakos <kadar.akos@gmail.com> * [ci skip] Small updates * Use Softmax v2 directly from thinc * Cache the label map * Fix mypy errors However, I ignored line 370 because it opened up a bunch of type errors that might be trickier to solve and might lead to a more complicated codebase. * avoid multiplication with 1.0 Co-authored-by: kadarakos <kadar.akos@gmail.com> * Update spacy/pipeline/spancat_exclusive.py Co-authored-by: Sofie Van Landeghem <svlandeg@users.noreply.github.com> * Update component versions to v2 * Add scorer to docstring * Add _n_labels property to SpanCategorizer Instead of using len(self.labels) in initialize() I am using a private property self._n_labels. This achieves implementation parity and allows me to delete the whole initialize() method for spancat_exclusive (since it's now the same with spancat). * Inherit from SpanCat instead of TrainablePipe This commit changes the inheritance structure of Exclusive_Spancat, now it's inheriting from SpanCategorizer than TrainablePipe. This allows me to remove duplicate methods that are already present in the parent function. 
* Revert documentation link to spancat * Fix init call for exclusive spancat * Update spacy/pipeline/spancat_exclusive.py Co-authored-by: Adriane Boyd <adrianeboyd@gmail.com> * Import Suggester from spancat * Include zero_init.v1 for spancat * Implement _allow_extra_label to use _n_labels To ensure that spancat / spancat_exclusive cannot be resized after initialization, I inherited the _allow_extra_label() method from spacy/pipeline/trainable_pipe.pyx and used self._n_labels instead of len(self.labels) for checking. I think that changing it locally is a better solution rather than forcing each class that inherits TrainablePipe to use the self._n_labels attribute. Also note that I turned-off black formatting in this block of code because it reads better without the overhang. * Extend existing tests to spancat_exclusive In this commit, I extended the existing tests for spancat to include spancat_exclusive. I parametrized the test functions with 'name' (similar var name with textcat and textcat_multilabel) for each applicable test. TODO: Add overfitting tests for spancat_exclusive * Update documentation for spancat * Turn on formatting for allow_extra_label * Remove initializers in default config * Use DEFAULT_EXCL_SPANCAT_MODEL I also renamed spancat_exclusive_default_config into spancat_excl_default_config because black does some not pretty formatting changes. * Update documentation Update grammar and usage Co-authored-by: Adriane Boyd <adrianeboyd@gmail.com> * Clarify docstring for Exclusive_SpanCategorizer * Remove mypy ignore and typecast labels to list * Fix documentation API * Use a single variable for tests * Update defaults for number of rows Co-authored-by: Adriane Boyd <adrianeboyd@gmail.com> * Put back initializers in spancat config Whenever I remove model.scorer.init_w and model.scorer.init_b, I encounter an error in the test: SystemError: <method '__getitem__' of 'dict' objects> returned a result with an error set. 
My Thinc version is 8.1.5, but I can't seem to check what's causing the error. * Update spancat_exclusive docstring * Remove init_W and init_B parameters This commit is expected to fail until the new Thinc release. * Require thinc>=8.1.6 for serializable Softmax defaults * Handle zero suggestions to make tests pass I'm not sure if this is the most elegant solution. But what should happen is that the _make_span_group function MUST return an empty SpanGroup if there are no suggestions. The error happens when the 'scores' variable is empty. We cannot get the 'predicted' and other downstream vars. * Better approach for handling zero suggestions * Update website/docs/api/spancategorizer.md Co-authored-by: Adriane Boyd <adrianeboyd@gmail.com> * Update spancategorizer headers * Apply suggestions from code review Co-authored-by: Sofie Van Landeghem <svlandeg@users.noreply.github.com> * Add default value in negative_weight in docs * Add default value in allow_overlap in docs * Update how spancat_exclusive is constructed In this commit, I added the following: - Put the default values of negative_weight and allow_overlap in the default_config dictionary. 
- Rename make_spancat -> make_exclusive_spancat * Run prettier on spancategorizer.mdx * Change exactly one -> at most one * Add suggester documentation in Exclusive_SpanCategorizer * Add suggester to spancat docstrings * merge multilabel and singlelabel spancat * rename spancat_exclusive to singlelable * wire up different make_spangroups for single and multilabel * black * black * add docstrings * more docstring and fix negative_label * don't rely on default arguments * black * remove spancat exclusive * replace single_label with add_negative_label and adjust inference * mypy * logical bug in configuration check * add spans.attrs[scores] * single label make_spangroup test * bugfix * black * tests for make_span_group with negative labels * refactor make_span_group * black * Update spacy/tests/pipeline/test_spancat.py Co-authored-by: Adriane Boyd <adrianeboyd@gmail.com> * remove duplicate declaration * Update spacy/pipeline/spancat.py Co-authored-by: Adriane Boyd <adrianeboyd@gmail.com> * raise error instead of just print * make label mapper private * update docs * run prettier * Update website/docs/api/spancategorizer.mdx Co-authored-by: Adriane Boyd <adrianeboyd@gmail.com> * Update website/docs/api/spancategorizer.mdx Co-authored-by: Adriane Boyd <adrianeboyd@gmail.com> * Update spacy/pipeline/spancat.py Co-authored-by: Adriane Boyd <adrianeboyd@gmail.com> * Update spacy/pipeline/spancat.py Co-authored-by: Adriane Boyd <adrianeboyd@gmail.com> * Update spacy/pipeline/spancat.py Co-authored-by: Adriane Boyd <adrianeboyd@gmail.com> * Update spacy/pipeline/spancat.py Co-authored-by: Adriane Boyd <adrianeboyd@gmail.com> * don't keep recomputing self._label_map for each span * typo in docs * Intervals to private and document 'name' param * Update spacy/pipeline/spancat.py Co-authored-by: Adriane Boyd <adrianeboyd@gmail.com> * Update spacy/pipeline/spancat.py Co-authored-by: Adriane Boyd <adrianeboyd@gmail.com> * add Tag to new features * replace tags * revert * revert * 
revert * revert * Update website/docs/api/spancategorizer.mdx Co-authored-by: Adriane Boyd <adrianeboyd@gmail.com> * Update website/docs/api/spancategorizer.mdx Co-authored-by: Adriane Boyd <adrianeboyd@gmail.com> * prettier * Fix merge * Update website/docs/api/spancategorizer.mdx * remove references to 'single_label' * remove old paragraph * Add spancat_singlelabel to config template * Format * Extend init config tests --------- Co-authored-by: kadarakos <kadar.akos@gmail.com> Co-authored-by: Sofie Van Landeghem <svlandeg@users.noreply.github.com> Co-authored-by: Adriane Boyd <adrianeboyd@gmail.com>	2023-03-09 10:30:59 +01:00
Victoria	4fdf356b29	Add links in website and readme for survey (#12385 )	2023-03-09 10:01:18 +01:00
Marcus Blättermann	b309336712	Make sure to run Python setup before NPM dev mode (#12384 )	2023-03-08 11:59:10 +01:00

3098 Commits