spaCy

mirror of https://github.com/explosion/spaCy.git synced 2024-12-28 19:06:33 +03:00

Author	SHA1	Message	Date
Adriane Boyd	95075298f5	Update pex Makefile defaults (#12832 ) * Update pex Makefile defaults - switch to python 3.8 - only install spacy-lookups-data for extra packages * Update website for pex defaults	2023-07-18 09:29:04 +02:00
Ian Thompson	ef20e114e0	Typo fix in `Language.replace_listeners` docs (#12823 ) * modified: spacy/language.py - corrected typo in docstring for :method:`Language.replace_listeners` - added noqa comment on unused local variable assignment in :method:`Language.from_config` as I wasn't sure if it should be unassigned modified: website/docs/api/language.mdx - corrected typo in `Language.replace_listeners` markdown * modified: spacy/language.py - removed noqa comment --------- Co-authored-by: Ian Thompson <ian.thompson@hrblock.com>	2023-07-14 09:45:54 +02:00
Sofie Van Landeghem	ddffd09602	Trainable lemmatizer docs link (#12795 ) * add an anchor to the trainable lemmatizer section * add requirement for morphologizer,tagger to rule-based lemmatizer * morphologizer only	2023-07-07 15:18:16 +02:00
Adriane Boyd	1a55661cfb	Update website binder version to v3.6 (#12805 )	2023-07-07 10:52:33 +02:00
Adriane Boyd	41dba5bd34	Update max_length default in span finder docs (#12803 )	2023-07-07 10:17:41 +02:00
Adriane Boyd	4e19ec7eb8	Docs for v3.6.0 (#12792 ) * Docs for v3.6.0 * Add sl performance * Add da trf note	2023-07-06 12:58:25 +02:00
Tom Aarsen	eab929361d	Use 'exclude' instead of 'disable' (#12783 ) as suggested by @svlandeg	2023-07-04 11:45:13 +02:00
Marcus Blättermann	bd239511a4	Fix problem with missing syntax highlighting languages causing runtime crash on the website (#12781 ) * Fix problem with universe pages using `docker` language * Fix problem with universe pages using `r` language * Add fallback, in case code language is unknown	2023-07-03 10:24:25 +02:00
Daniël de Kok	57a230c6e4	Remove section about parallel training with Ray (#12770 ) The Ray integration is currently broken, having these docs around suggest that this functionality is currently available.	2023-06-28 17:09:57 +02:00
Adriane Boyd	fb0da3e097	Support custom token/lexeme attribute for vectors (#12625 ) * Support custom token/lexeme attribute for vectors * Fix imports * Back off to ORTH without Vectors.attr * Fallback if vectors.attr doesn't exist * Update docs	2023-06-28 09:43:14 +02:00
Tom Aarsen	93983f08fc	Add SpanMarker for NER to spaCy universe (#12730 ) * Add SpanMarker for NER to spaCy universe * Escape the newlines in the text in the code example Or at least, attempt to * Remove now unnecessary import * Disable NER pipeline component in code example	2023-06-20 16:47:44 +02:00
David Berenstein	53c400bd7a	docs: added reference to `spacy-setfit` to the spaCy Universe (#12737 ) * docs: added reference to spacy-setfit * removed package import after adding factory entry points to packages	2023-06-19 15:52:07 +02:00
Marcus Blättermann	7e4b38c841	Fix #12716 does not update the `config` generation section (#12718 ) This is a really odd bug, where Firefox doesn't re-render the `code` element, even though `children` changed. Two things fixed that: - remove the `language-ini` `className` - replace the `code` block with a `div` Both are not ideal. Therefor this solution adds an inner `div` that now has the classes while still maintaining the semantic `code` element. I couldn't find any explanation for why this is happening and why it only happens in Firefox. I assume it is a bug caused by one of our many dependencies (or their interplay) To make matters worse: This bug doesn't occure when running the site in dev mode. You have to build and serve the site to recreate it.	2023-06-19 09:34:28 +02:00
Jacobo Myerston	daa6e0339f	Update universe.json (#12709 ) * Update universe.json * Update universe.json add some missing commas in the greCy's description.	2023-06-12 13:55:20 +02:00
kadarakos	c003aac29a	SpanFinder into spaCy from experimental (#12507 ) * span finder integrated into spacy from experimental * black * isort * black * default spankey constant * black * Update spacy/pipeline/spancat.py Co-authored-by: Adriane Boyd <adrianeboyd@gmail.com> * rename * rename * max_length and min_length as Optional[int] and strict checking * black * mypy fix for integer type infinity * revert line order * implement all comparison operators for inf int * avoid two for loops over all docs by not precomputing * interleave thresholding with span creation * black * revert to not interleaving (relized its faster) * black * Update spacy/errors.py Co-authored-by: Adriane Boyd <adrianeboyd@gmail.com> * update dosctring * enforce that the gold and predicted documents have the same text * new error for ensuring reference and predicted texts are the same * remove todo * adjust test * black * handle misaligned tokenization * return correct variable * failing overfit test * only use a single spans_key like in spancat * black * remove debug lines * typo * remove comment * remove near duplicate reduntant method * use the 'spans_key' variable name everywhere * Update spacy/pipeline/span_finder.py Co-authored-by: Adriane Boyd <adrianeboyd@gmail.com> * flaky test fix suggestion, hand set bias terms * only test suggester and test result exhaustively * make it clear that the span_finder_suggester is more general (not specific to span_finder) * Update spacy/tests/pipeline/test_span_finder.py Co-authored-by: Adriane Boyd <adrianeboyd@gmail.com> * Apply suggestions from code review * remove question comment * move preset_spans_suggester test to spancat tests * Add docs and unify default configs for spancat and span finder * Add `allow_overlap=True` to span finder scorer * Fix offset bug in set_annotations * Ignore labels in span finder scorer * Format * Add span_finder to quickstart template * Move settings to self.cfg, store min/max unset as None * Remove debugging * Update docstrings and docs * Update spacy/pipeline/span_finder.py Co-authored-by: Sofie Van Landeghem <svlandeg@users.noreply.github.com> * Fix imports --------- Co-authored-by: Adriane Boyd <adrianeboyd@gmail.com> Co-authored-by: Sofie Van Landeghem <svlandeg@users.noreply.github.com>	2023-06-07 15:52:28 +02:00
Isabel Zimmerman	05df59fd4a	[DOCS] add vetiver to spacy universe (#12557 ) * add vetiver to spacy universe * remove image * update logo to render correctly in thumbnail * apply Basil's suggestion Co-authored-by: Basile Dura <bdura@users.noreply.github.com> * refer to the same model --------- Co-authored-by: Sofie Van Landeghem <svlandeg@users.noreply.github.com> Co-authored-by: Basile Dura <bdura@users.noreply.github.com>	2023-06-01 17:11:18 +02:00
Vinit Ravishankar	f0e0206b77	update universe for spacypdfreader (#12661 )	2023-05-23 13:28:48 +02:00
Victoria	6930a6bf45	Add spaCy VSCode extension materials (#12592 )	2023-05-19 14:38:53 +02:00
Adriane Boyd	df083f91a5	Add Malay to website languages (#12643 )	2023-05-17 13:13:43 +02:00
Lj Miranda	58779c24ef	Remove shorthand for output-file in spacy apply (#12636 ) The output-file argument is positional, so can't use a shorthand like -o.	2023-05-17 12:36:29 +02:00
David Berenstein	83b6f488cb	universe: Update examples Adept Augementation (#12620 ) * Update universe.json * chore: changed readme example as suggested by Vincent Warmerdam (koaning)	2023-05-15 14:09:33 +02:00
Adriane Boyd	3dc445df8d	Fix new tags in docs for v3.5.x (#12629 ) * Fix new tags in docs for v3.5.x * Fix new tag	2023-05-15 12:06:58 +02:00
Basile Dura	2dd8825f09	docs: add comment on `offset_x` argument (#12630 )	2023-05-15 11:42:47 +02:00
Adriane Boyd	3637148c4d	Add scorer option to return per-component scores (#12540 ) * Add scorer option to return per-component scores Add `per_component` option to `Language.evaluate` and `Scorer.score` to return scores keyed by `tokenizer` (hard-coded) or by component name. Add option to `evaluate` CLI to score by component. Per-component scores can only be saved to JSON. * Update help text and messages	2023-05-12 15:36:54 +02:00
Kenneth Enevoldsen	88680a6eed	docs: remove invalid huggingface-hub push argument (#12624 )	2023-05-12 09:40:28 +02:00
royashcenazi	3252f6b13f	Parsigs universe 3 (#12617 ) * parsigs universe * added model installation explanation in the description * Update website/meta/universe.json Co-authored-by: Basile Dura <bdura@users.noreply.github.com> * added model installement instruction in the code example * added biomedical category --------- Co-authored-by: Basile Dura <bdura@users.noreply.github.com>	2023-05-10 13:49:51 +02:00
royashcenazi	a56ab98e3c	parsigs universe (#12616 ) * parsigs universe * added model installation explanation in the description * Update website/meta/universe.json Co-authored-by: Basile Dura <bdura@users.noreply.github.com> * added model installement instruction in the code example --------- Co-authored-by: Basile Dura <bdura@users.noreply.github.com>	2023-05-10 13:19:28 +02:00
David Berenstein	d11b549195	chore: added adept-augmentations to the spacy universe (#12609 ) * chore: added adept-augmentations to the spacy universe * Apply suggestions from code review Co-authored-by: Basile Dura <bdura@users.noreply.github.com> * Update universe.json --------- Co-authored-by: Basile Dura <bdura@users.noreply.github.com>	2023-05-10 13:16:16 +02:00
Patrick J. Burns	15f16db6ca	Fix typo (#12615 )	2023-05-09 15:52:34 +02:00
Patrick J. Burns	eb3960a15a	Add LatinCy models to universe.json (#12597 ) * Add LatinCy models to universe.json * Update website/meta/universe.json Add install code for LatinCy models to 'code_example' Co-authored-by: Adriane Boyd <adrianeboyd@gmail.com> * Update LatinCy ‘code_example’ in website/meta/universe.json Co-authored-by: Adriane Boyd <adrianeboyd@gmail.com> --------- Co-authored-by: Adriane Boyd <adrianeboyd@gmail.com>	2023-05-09 12:02:45 +02:00
Kenneth Enevoldsen	73698326df	Update inmemorylookupkb.mdx (#12586 ) Example does not refer to the in memory lookup	2023-05-02 12:51:13 +02:00
Victoria	a8dfc66135	Add spacy-wasm to universe (#12572 ) * add spacy-wasm to universe * add tag	2023-04-26 14:18:40 +02:00
moxley01	070fa16545	add spacysee project (#12568 )	2023-04-25 12:30:19 +02:00
Victoria	e115408514	remove survey link (#12559 )	2023-04-21 10:22:26 +02:00
Adriane Boyd	b60b027927	Add default option to MorphAnalysis.get (#12545 ) * Add default to MorphAnalysis.get Similar to `dict`, allow a `default` option for `MorphAnalysis.get` for the user to provide a default return value if the field is not found. The default return value remains `[]`, which is not the same as `dict.get`, but is already established as this method's default return value with the return type `List[str]`. However the new `default` option does not enforce that the user-provided default is actually `List[str]`. * Restore test case	2023-04-20 14:06:32 +02:00
TAN Long	119f959218	docs(REL_OP): modify docs for REL_OPs to match Semgrex's update on CoreNLP v4.5.2 (#12531 ) Co-authored-by: Tan Long <tanloong@foxmail.com>	2023-04-17 13:14:01 +02:00
andyjessen	02259fa195	Add category to spaCy project (#12506 ) ScispaCy fits within biomedical domain. Consider adding this category.	2023-04-07 15:31:04 +02:00
Madeesh Kannan	6db20b354f	`Docs`: Fix rule-based matching example that expands named entities (#12495 )	2023-04-06 11:45:58 +02:00
Edward	c95d320d28	Add more information to custom code docs (#12491 ) * Add info to sections * Update website/docs/usage/training.mdx --------- Co-authored-by: Adriane Boyd <adrianeboyd@gmail.com>	2023-04-06 11:45:19 +02:00
Will Frey	8d4129e177	Fix invalid ConsoleLogger.v3 example config (#12498 ) Replace `progress_bar = "all_steps"` with `progress_bar = "eval"`, which is consistent with the default behavior for `spacy.ConsoleLogger.v1` and `spacy.ConsoleLogger.v2`.	2023-04-04 20:53:07 +02:00
Edward	de32011e4c	Add model-last saving mechanism to pretraining (#12459 ) * Adjust pretrain command * chane naming and add finally block * Add unit test * Add unit test assertions * Update spacy/training/pretrain.py Co-authored-by: Adriane Boyd <adrianeboyd@gmail.com> * change finally block * Add to docs * Update website/docs/usage/embeddings-transformers.mdx * Add flag to skip saving model-last --------- Co-authored-by: Adriane Boyd <adrianeboyd@gmail.com>	2023-04-03 15:24:03 +02:00
Ye Lei (叶磊)	ce258670b7	Allow passing a Span to displacy.parse_deps (#12477 ) * Allow passing a Span to displacy.parse_deps * Update docstring Co-authored-by: Adriane Boyd <adrianeboyd@gmail.com> * Update API docs --------- Co-authored-by: Adriane Boyd <adrianeboyd@gmail.com>	2023-03-31 09:44:01 +02:00
Edward	dba4e7bece	Add info to stringstore and vocab (#12471 )	2023-03-27 13:15:14 +02:00
sloev / Johannes Valbjørn	fd072533e7	add spacy_onnx_sentiment_english to universe (#12422 ) * add spacy_onnx_sentiment_english to universe * rename to sentimental-onix * fix comma json error * fix typo * typo fix Co-authored-by: Adriane Boyd <adrianeboyd@gmail.com> * mention need to download model before example works Co-authored-by: Adriane Boyd <adrianeboyd@gmail.com> --------- Co-authored-by: Adriane Boyd <adrianeboyd@gmail.com>	2023-03-27 11:35:14 +02:00
Prajakta Darade	ae7779e830	corrected example code (#12466 )	2023-03-27 11:32:49 +02:00
kadarakos	d1474fdd91	add explanation about overwriting behaviour (#12464 ) * add explanation about overwriting behaviour * Update website/docs/api/spancategorizer.mdx Co-authored-by: Adriane Boyd <adrianeboyd@gmail.com> * Update website/docs/api/spancategorizer.mdx Co-authored-by: Adriane Boyd <adrianeboyd@gmail.com> * Update website/docs/api/spancategorizer.mdx Co-authored-by: Adriane Boyd <adrianeboyd@gmail.com> * format --------- Co-authored-by: Adriane Boyd <adrianeboyd@gmail.com>	2023-03-27 10:27:11 +02:00
Vinit Ravishankar	28de85737f	Tagger label smoothing (#12293 ) * add label smoothing * use True/False instead of floats * add entropy to debug data * formatting * docs * change test to check difference in distributions * Update website/docs/api/tagger.mdx Co-authored-by: Adriane Boyd <adrianeboyd@gmail.com> * Update spacy/pipeline/tagger.pyx Co-authored-by: Adriane Boyd <adrianeboyd@gmail.com> * bool -> float * update docs * fix seed * black * update tests to use label_smoothing = 0.0 * set default to 0.0, update quickstart * Update spacy/pipeline/tagger.pyx Co-authored-by: Adriane Boyd <adrianeboyd@gmail.com> * update morphologizer, tagger test * fix morph docs * add url to docs --------- Co-authored-by: Adriane Boyd <adrianeboyd@gmail.com>	2023-03-22 12:17:56 +01:00
Ines Montani	b479f8bfa5	Add user survey alert to the top (#12452 ) * Add user survey alert to the top * Shorter --------- Co-authored-by: Sofie Van Landeghem <svlandeg@users.noreply.github.com>	2023-03-22 11:09:37 +01:00
Adriane Boyd	2ce9a220db	Fix --verbose for spacy find-threshold (#12418 )	2023-03-14 17:16:49 +01:00
Lj Miranda	913d74f509	Add spancat_singlelabel pipeline for multiclass and non-overlapping span labelling tasks (#11365 ) * [wip] Update * [wip] Update * Add initial port * [wip] Update * Fix all imports * Add spancat_exclusive to pipeline * [WIP] Update * [ci skip] Add breakpoint for debugging * Use spacy.SpanCategorizer.v1 as default archi * Update spacy/pipeline/spancat_exclusive.py Co-authored-by: kadarakos <kadar.akos@gmail.com> * [ci skip] Small updates * Use Softmax v2 directly from thinc * Cache the label map * Fix mypy errors However, I ignored line 370 because it opened up a bunch of type errors that might be trickier to solve and might lead to a more complicated codebase. * avoid multiplication with 1.0 Co-authored-by: kadarakos <kadar.akos@gmail.com> * Update spacy/pipeline/spancat_exclusive.py Co-authored-by: Sofie Van Landeghem <svlandeg@users.noreply.github.com> * Update component versions to v2 * Add scorer to docstring * Add _n_labels property to SpanCategorizer Instead of using len(self.labels) in initialize() I am using a private property self._n_labels. This achieves implementation parity and allows me to delete the whole initialize() method for spancat_exclusive (since it's now the same with spancat). * Inherit from SpanCat instead of TrainablePipe This commit changes the inheritance structure of Exclusive_Spancat, now it's inheriting from SpanCategorizer than TrainablePipe. This allows me to remove duplicate methods that are already present in the parent function. * Revert documentation link to spancat * Fix init call for exclusive spancat * Update spacy/pipeline/spancat_exclusive.py Co-authored-by: Adriane Boyd <adrianeboyd@gmail.com> * Import Suggester from spancat * Include zero_init.v1 for spancat * Implement _allow_extra_label to use _n_labels To ensure that spancat / spancat_exclusive cannot be resized after initialization, I inherited the _allow_extra_label() method from spacy/pipeline/trainable_pipe.pyx and used self._n_labels instead of len(self.labels) for checking. I think that changing it locally is a better solution rather than forcing each class that inherits TrainablePipe to use the self._n_labels attribute. Also note that I turned-off black formatting in this block of code because it reads better without the overhang. * Extend existing tests to spancat_exclusive In this commit, I extended the existing tests for spancat to include spancat_exclusive. I parametrized the test functions with 'name' (similar var name with textcat and textcat_multilabel) for each applicable test. TODO: Add overfitting tests for spancat_exclusive * Update documentation for spancat * Turn on formatting for allow_extra_label * Remove initializers in default config * Use DEFAULT_EXCL_SPANCAT_MODEL I also renamed spancat_exclusive_default_config into spancat_excl_default_config because black does some not pretty formatting changes. * Update documentation Update grammar and usage Co-authored-by: Adriane Boyd <adrianeboyd@gmail.com> * Clarify docstring for Exclusive_SpanCategorizer * Remove mypy ignore and typecast labels to list * Fix documentation API * Use a single variable for tests * Update defaults for number of rows Co-authored-by: Adriane Boyd <adrianeboyd@gmail.com> * Put back initializers in spancat config Whenever I remove model.scorer.init_w and model.scorer.init_b, I encounter an error in the test: SystemError: <method '__getitem__' of 'dict' objects> returned a result with an error set. My Thinc version is 8.1.5, but I can't seem to check what's causing the error. * Update spancat_exclusive docstring * Remove init_W and init_B parameters This commit is expected to fail until the new Thinc release. * Require thinc>=8.1.6 for serializable Softmax defaults * Handle zero suggestions to make tests pass I'm not sure if this is the most elegant solution. But what should happen is that the _make_span_group function MUST return an empty SpanGroup if there are no suggestions. The error happens when the 'scores' variable is empty. We cannot get the 'predicted' and other downstream vars. * Better approach for handling zero suggestions * Update website/docs/api/spancategorizer.md Co-authored-by: Adriane Boyd <adrianeboyd@gmail.com> * Update spancategorizer headers * Apply suggestions from code review Co-authored-by: Sofie Van Landeghem <svlandeg@users.noreply.github.com> * Add default value in negative_weight in docs * Add default value in allow_overlap in docs * Update how spancat_exclusive is constructed In this commit, I added the following: - Put the default values of negative_weight and allow_overlap in the default_config dictionary. - Rename make_spancat -> make_exclusive_spancat * Run prettier on spancategorizer.mdx * Change exactly one -> at most one * Add suggester documentation in Exclusive_SpanCategorizer * Add suggester to spancat docstrings * merge multilabel and singlelabel spancat * rename spancat_exclusive to singlelable * wire up different make_spangroups for single and multilabel * black * black * add docstrings * more docstring and fix negative_label * don't rely on default arguments * black * remove spancat exclusive * replace single_label with add_negative_label and adjust inference * mypy * logical bug in configuration check * add spans.attrs[scores] * single label make_spangroup test * bugfix * black * tests for make_span_group with negative labels * refactor make_span_group * black * Update spacy/tests/pipeline/test_spancat.py Co-authored-by: Adriane Boyd <adrianeboyd@gmail.com> * remove duplicate declaration * Update spacy/pipeline/spancat.py Co-authored-by: Adriane Boyd <adrianeboyd@gmail.com> * raise error instead of just print * make label mapper private * update docs * run prettier * Update website/docs/api/spancategorizer.mdx Co-authored-by: Adriane Boyd <adrianeboyd@gmail.com> * Update website/docs/api/spancategorizer.mdx Co-authored-by: Adriane Boyd <adrianeboyd@gmail.com> * Update spacy/pipeline/spancat.py Co-authored-by: Adriane Boyd <adrianeboyd@gmail.com> * Update spacy/pipeline/spancat.py Co-authored-by: Adriane Boyd <adrianeboyd@gmail.com> * Update spacy/pipeline/spancat.py Co-authored-by: Adriane Boyd <adrianeboyd@gmail.com> * Update spacy/pipeline/spancat.py Co-authored-by: Adriane Boyd <adrianeboyd@gmail.com> * don't keep recomputing self._label_map for each span * typo in docs * Intervals to private and document 'name' param * Update spacy/pipeline/spancat.py Co-authored-by: Adriane Boyd <adrianeboyd@gmail.com> * Update spacy/pipeline/spancat.py Co-authored-by: Adriane Boyd <adrianeboyd@gmail.com> * add Tag to new features * replace tags * revert * revert * revert * revert * Update website/docs/api/spancategorizer.mdx Co-authored-by: Adriane Boyd <adrianeboyd@gmail.com> * Update website/docs/api/spancategorizer.mdx Co-authored-by: Adriane Boyd <adrianeboyd@gmail.com> * prettier * Fix merge * Update website/docs/api/spancategorizer.mdx * remove references to 'single_label' * remove old paragraph * Add spancat_singlelabel to config template * Format * Extend init config tests --------- Co-authored-by: kadarakos <kadar.akos@gmail.com> Co-authored-by: Sofie Van Landeghem <svlandeg@users.noreply.github.com> Co-authored-by: Adriane Boyd <adrianeboyd@gmail.com>	2023-03-09 10:30:59 +01:00
Victoria	4fdf356b29	Add links in website and readme for survey (#12385 )	2023-03-09 10:01:18 +01:00
Marcus Blättermann	b309336712	Make sure to run Python setup before NPM dev mode (#12384 )	2023-03-08 11:59:10 +01:00
Raphael Mitsch	6aa6b86d49	Make generation of empty `KnowledgeBase` instances configurable in `EntityLinker` (#12320 ) * Make empty_kb() configurable. * Format. * Update docs. * Be more specific in KB serialization test. * Update KB serialization tests. Update docs. * Remove doc update for batched candidate generation. * Fix serialization of subclassed KB in tests. * Format. * Update docstring. * Update docstring. * Switch from pickle to json for custom field serialization.	2023-03-01 16:02:55 +01:00
kadarakos	56aa0cc75f	Displacy doc fix (#12352 ) * more details for color setting * more details for color setting * prettier	2023-03-01 15:38:23 +01:00
Raphael Mitsch	efbc3d37b3	Update docs w.r.t. spacy.CandidateBatchGenerator.v1. (#12350 )	2023-03-01 11:01:35 +01:00
Adriane Boyd	33864f1d07	Add new tags in docs for #12334 (#12348 )	2023-03-01 10:46:13 +01:00
TAN Long	071667376a	Add new REL_OPs: `>+`, `>-`, `<+`, and `<-` (#12334 ) * Add immediate left/right child/parent dependency relations * Add tests for new REL_OPs: `>+`, `>-`, `<+`, and `<-`. --------- Co-authored-by: Tan Long <tanloong@foxmail.com>	2023-02-28 14:36:33 +01:00
Adriane Boyd	4539fbae17	Revert "Fix FUZZY operator definition (#12318 )" (#12336 ) This reverts commit `daedc45d05`. The default length depends on the length of the pattern string and was correct for this example.	2023-02-27 09:48:36 +01:00
andyjessen	daedc45d05	Fix FUZZY operator definition (#12318 ) * Fix FUZZY operator definition The default length of the FUZZY operator is 2 and not 3. * adjust edit distance in matcher usage docs too --------- Co-authored-by: svlandeg <svlandeg@github.com>	2023-02-23 09:37:40 +01:00
Raphael Mitsch	2d4fb94ba0	Fix wrong file name in docs for rule-based matcher. (#12262 )	2023-02-09 12:58:14 +01:00
Raphael Mitsch	d38a88f0f3	Remove negation. (#12252 )	2023-02-08 14:18:33 +01:00
Sofie Van Landeghem	4c60afb946	Backslash fixes in docs (#12213 ) * backslash fixes * revert unrelated change	2023-02-01 10:15:38 +01:00
Paul O'Leary McCann	8932f4dc35	Add extra flag to assets docs (#12194 ) * Add extra flag to assets docs For some reason this wasn't included. * Add new tag to docs	2023-01-30 10:05:23 +01:00
Sofie Van Landeghem	bd739e67d6	explain KB change and how to remedy (#12189 )	2023-01-27 15:13:20 +01:00
Adriane Boyd	5f8a398bb9	Add span_id to Span.char_span, update Doc/Span.char_span docs (#12196 ) * Add span_id to Span.char_span, update Doc/Span.char_span docs `Span.char_span(id=)` should be removed in the future. * Also use Union[int, str] in Doc docstring	2023-01-27 15:09:17 +01:00
Simon Gurcke	774c10fa39	Add alignment_mode argument to Span.char_span() (#12145 ) * Add alignment_mode argument to Span.char_span() * Update website * Update spacy/tokens/span.pyx Co-authored-by: Adriane Boyd <adrianeboyd@gmail.com> * Add test Co-authored-by: Adriane Boyd <adrianeboyd@gmail.com>	2023-01-27 11:43:40 +01:00
Daniël de Kok	8d69874afb	Add `spacy.PlainTextCorpusReader.v1` (#12122 ) * Add `spacy.PlainTextCorpusReader.v1` This is a corpus reader that reads plain text corpora with the following format: - UTF-8 encoding - One line per document. - Blank lines are ignored. It is useful for applications where we deal with very large corpora, such as distillation, and don't want to deal with the space overhead of serialized formats. Additionally, many large corpora already use such a text format, keeping the necessary preprocessing to a minimum. * Update spacy/training/corpus.py Co-authored-by: Adriane Boyd <adrianeboyd@gmail.com> * docs: add version to `PlainTextCorpus` * Add docstring to registry function * Add plain text corpus tests * Only strip newline/carriage return * Add return type _string_to_tmp_file helper * Use a temporary directory in place of file name Different OS auto delete/sharing semantics are just wonky. * This will be new in 3.5.1 (rather than 4) * Test improvements from code review Co-authored-by: Adriane Boyd <adrianeboyd@gmail.com>	2023-01-26 11:33:22 +01:00
Marcus Blättermann	a37117abd0	Fix text colors in docs (#12186 )	2023-01-26 10:30:24 +01:00
Marcus Blättermann	056b73468c	Load components dynamically (decrease initial file size for docs) (#12175 ) * Extract `CodeBlock` component into own file * Extract `InlineCode` component into own file * Extract `TypeAnnotation` component into own file * Convert named `export` to `default export` * Remove unused `export` * Simplify `TypeAnnotation` to remove dependency for Prism * Load `Code` component dynamically * Extract `MarkdownToReact` component into own file * WIP Code Dynamic * Load `MarkdownToReact` component dynamically * Extract `htmlToReact` to own file * Load `htmlToReact` component dynamically * Dynamically load `Juniper`	2023-01-25 17:30:41 +01:00
Marcus Blättermann	11f10fff60	Fix frontpage image (#12184 )	2023-01-25 13:17:35 +01:00
Marcus Blättermann	5a6000fb8b	Fix text color in docs (#12183 ) * Fix text color on landing page * Fix code color	2023-01-25 13:14:32 +01:00
Adriane Boyd	8ea15240ca	Update binder version to v3.5 (#12153 )	2023-01-25 13:14:23 +01:00
Marcus Blättermann	99a05734a8	Add `aria-label` to quickstart widget (#12179 )	2023-01-25 11:46:55 +01:00
Marcus Blättermann	0298b1a863	WEB-28 Increase contrast of grey text (#12178 ) * Use transparent colors to increase contrast on darker backgrounds * Increase color contrast of grey text	2023-01-25 11:46:43 +01:00
Marcus Blättermann	3062fae2ca	Fix broken URL (#12176 )	2023-01-25 11:42:19 +01:00
Marcus Blättermann	57ba37bc52	Fix regression with links in prompts (#12172 )	2023-01-25 08:51:40 +01:00
Marcus Blättermann	05a3685849	Fix broken syntax for type annotations (#12171 )	2023-01-25 08:51:25 +01:00
Marcus Blättermann	f3c586f74a	Fix navigation alert (#12169 ) Fixes a regression introduced in #12163	2023-01-24 16:40:40 +01:00
Marcus Blättermann	49237f05a6	Fix `aria-hidden` element (#12163 ) * Rename CSS class to make use more clear * Rename component prop to improve code readability * Fix `aria-hidden` directly on a link element This link wouldn't have been clickable by screenreaders * Refactor component This removes a unnessary `div` and a duplicate link Co-authored-by: Ines Montani <ines@ines.io>	2023-01-24 14:44:47 +01:00
Marcus Blättermann	0a70696923	Fix wrong HTML element attribute (#12151 ) Originally introduced in `62b9c9c6d7` Original error: Warning: Invalid DOM property `class`. Did you mean `className`? React doesn't have `class`, it uses `className`.	2023-01-24 14:35:31 +01:00
Marcus Blättermann	9555e7aecf	Remove unnessary links (#12159 ) There is no need to link to the image we are already viewing and this is also considered an accessibility issue.	2023-01-24 14:01:00 +01:00
Marcus Blättermann	031f6c7b60	WEB-27 Add `alt` tags to images (#12166 ) * Update spaCy badge `alt` text * Add `next/image` component to Universe * Add missing `alt`texts	2023-01-24 13:56:14 +01:00
Marcus Blättermann	c9beb47ab7	Increase contrast of text and theme color (#12165 )	2023-01-24 13:55:20 +01:00
Marcus Blättermann	a7d6a62f7c	Remove zoom locking (#12164 ) * Fix missing comma * Activate user zoom for website This is recommended by lighthouse: > Disabling zooming is problematic for users with low vision who rely on screen magnification to properly see the contents of a web page. Learn more. Also iOS already ignores this attribute anyway.	2023-01-24 13:54:49 +01:00
Marcus Blättermann	48159e1d60	Update explosion logo (#12162 ) This fixes a misalignment of the explosion logo	2023-01-24 13:53:51 +01:00
Marcus Blättermann	7160f7835d	Fix GitHub badge (#12161 ) * Extract component * Remove rounded border form GitHub Stars badge * Add `alt` text	2023-01-24 13:53:28 +01:00
Marcus Blättermann	3aa61e615f	Add missing label (#12160 )	2023-01-24 13:52:55 +01:00
Marcus Blättermann	fcedcd54a8	WEB-30 spaCy pattern in `.png` (#12158 ) * Fix gap in landing pattern at the top * Replace `.jpg` patterns with `.png` This drastically reduces file size (for the landing page from 221kb to 57kb) while doubling the resolution to look sharper on retina displays.	2023-01-24 13:51:39 +01:00
Edward	e9048fd4a1	Add how to load probability tables to existing models to spaCy docs (#12051 ) * add section about adding tables to models * change to lexeme_norm * Change syntax * change to _prob * Update website/docs/usage/saving-loading.mdx Co-authored-by: Adriane Boyd <adrianeboyd@gmail.com> Co-authored-by: Adriane Boyd <adrianeboyd@gmail.com>	2023-01-24 10:01:22 +01:00
Marcus Blättermann	8a3ca77d9e	Fix broken social media image (#12137 )	2023-01-20 16:57:43 +01:00
Sofie Van Landeghem	0f5d8a27f2	3.5 usage page (#12057 ) * skeleton * Fill in non-CLI details from release notes draft * Add TODO for fuzzy matching * Website updates for v3-5 draft * Fill in usage examples * Add fuzzy matching to intro * Fix fuzzy examples * Shell example formatting * Fix typo * Format * Remove trailing periods in internal list * Update * Fix spacing for nested lists * Update InMemoryLookupKB link Co-authored-by: Adriane Boyd <adrianeboyd@gmail.com> Co-authored-by: Ines Montani <ines@ines.io>	2023-01-19 16:13:04 +01:00
Adriane Boyd	3b8918e166	API docs: Rename kb_in_memory to inmemorylookupkb, add to sidebar (#12128 ) * API docs: Rename kb_in_memory to inmemorylookupkb, add to sidebar * adjust to mdx * linkout to InMemoryLookupKB at first occurrence in kb.mdx * fix links to docs * revert Azure trigger setting (I'll make a separate PR) Co-authored-by: svlandeg <svlandeg@github.com>	2023-01-19 13:29:17 +01:00
Adriane Boyd	a9910b6081	Update years in website landing page (#12107 ) * Update years in website landing page * Update website/pages/index.tsx Co-authored-by: Sofie Van Landeghem <svlandeg@users.noreply.github.com> Co-authored-by: Sofie Van Landeghem <svlandeg@users.noreply.github.com>	2023-01-19 11:08:02 +01:00
Sofie Van Landeghem	7d88c55eeb	update docs for apply (#12127 ) * update docs for apply * prettier	2023-01-19 10:37:09 +01:00
Adriane Boyd	28fd589b85	Move all website gitignore settings to website/.gitignore (#12120 )	2023-01-18 21:46:19 +01:00
Daniël de Kok	668ec989ad	Update Dockerfile to work with Next.js (#12119 ) * Update Dockerfile to work with Next.js - Update to Node 18 - Do not run as root, this also works better with Node privilege-dropping. - Update README with new run instructions and adding the `--rm` flag to avoid leaving a bunch of unused Docker containers. - Also change README to recommend building the image locally. Image builds are pretty fast and the uploaded images get outdated pretty quickly. * Add .dockerignore to avoid sending large build contexts * Typo	2023-01-18 18:15:47 +01:00
Adriane Boyd	794cea6907	Fix comments and examples for levenshtein_compare (#12113 )	2023-01-18 08:02:33 +01:00
Paul O'Leary McCann	a3b15c9f53	Clarify how `--code` arg works (#12102 ) * Clarify how `--code` arg works This adds a few sentences to the docs to clarify how the `--code` argument works, including an explanation of how to load custom components in your own code. * Add link to spacy.load docs	2023-01-17 19:30:02 +09:00
Daniël de Kok	319eb508b5	Add a `spacy benchmark speed` subcommand (#11902 ) * Add a `spacy evaluate speed` subcommand This subcommand reports the mean batch performance of a model on a data set with a 95% confidence interval. For reliability, it first performs some warmup rounds. Then it will measure performance on batches with randomly shuffled documents. To avoid having too many spaCy commands, `speed` is a subcommand of `evaluate` and accuracy evaluation is moved to its own `evaluate accuracy` subcommand. * Fix import cycle * Restore `spacy evaluate`, make `spacy benchmark speed` an alias * Add documentation for `spacy benchmark` * CREATES -> PRINTS * WPS -> words/s * Disable formatting of benchmark speed arguments * Fail with an error message when trying to speed bench empty corpus * Make it clearer that `benchmark accuracy` is a replacement for `evaluate` * Fix docstring webpage reference * tests: check `evaluate` output against `benchmark accuracy`	2023-01-12 11:55:21 +01:00
Paul O'Leary McCann	8e558095a1	Clean up displacy port-related error messages, docs (#12089 ) * Clean up displacy port-related error messages, docs There were some issues in the error messages and docs in #11948. 1. the error messages didn't specify the port argument to displacy.serve correctly 2. the docs didn't mark the auto select argument as new This addresses those issues. * Update website/docs/api/top-level.md Co-authored-by: Raphael Mitsch <r.mitsch@outlook.com> * Apply prettier Co-authored-by: Raphael Mitsch <r.mitsch@outlook.com>	2023-01-12 14:54:09 +09:00

3150 Commits