spaCy

mirror of https://github.com/explosion/spaCy.git synced 2025-08-04 04:10:20 +03:00

Author	SHA1	Message	Date
Daniël de Kok	677c1a3507	Speed up the StateC::L feature function (#10019 ) * Speed up the StateC::L feature function This function gets the n-th most-recent left-arc with a particular head. Before this change, StateC::L would construct a vector of all left-arcs with the given head and then pick the n-th most recent from that vector. Since the number of left-arcs strongly correlates with the doc length and the feature is constructed for every transition, this can make transition-parsing quadratic. With this change StateC::L: - Searches left-arcs backwards. - Stops early when the n-th matching transition is found. - Does not construct a vector (reducing memory pressure). This change doesn't avoid the linear search when the transition that is queried does not occur in the left-arcs. Regardless, performance is improved quite a bit with very long docs: Before (N: Time): 400: 3.3, 800: 5.4, 1600: 11.6, 3200: 30.7. After (N: Time): 400: 3.2, 800: 5.0, 1600: 9.5, 3200: 23.2. We can probably do better with more tailored data structures, but I first wanted to make a low-impact PR. Found while investigating #9858. * StateC::L: simplify loop	2022-01-13 09:29:58 +01:00
Daniël de Kok	28299644fc	Speed up the StateC::L feature function (#10019 ) * Speed up the StateC::L feature function This function gets the n-th most-recent left-arc with a particular head. Before this change, StateC::L would construct a vector of all left-arcs with the given head and then pick the n-th most recent from that vector. Since the number of left-arcs strongly correlates with the doc length and the feature is constructed for every transition, this can make transition-parsing quadratic. With this change StateC::L: - Searches left-arcs backwards. - Stops early when the n-th matching transition is found. - Does not construct a vector (reducing memory pressure). This change doesn't avoid the linear search when the transition that is queried does not occur in the left-arcs. Regardless, performance is improved quite a bit with very long docs: Before (N: Time): 400: 3.3, 800: 5.4, 1600: 11.6, 3200: 30.7. After (N: Time): 400: 3.2, 800: 5.0, 1600: 9.5, 3200: 23.2. We can probably do better with more tailored data structures, but I first wanted to make a low-impact PR. Found while investigating #9858. * StateC::L: simplify loop	2022-01-13 09:03:55 +01:00
jsnfly	176a90edee	Fix texcat loss scaling (#9904 ) (#10002 ) * add failing test for issue 9904 * remove division by batch size and summation before applying the mean Co-authored-by: jonas <jsnfly@gmx.de>	2022-01-13 09:03:23 +01:00
Sofie Van Landeghem	d8a3012539	Merge pull request #10037 from explosion/master Update develop with master	2022-01-12 12:29:23 +01:00
Ryn Daniels	057b8c64c0	Check for assets with size of 0 bytes (#10026 ) * Check for assets with size of 0 bytes * Update spacy/cli/project/assets.py Co-authored-by: Sofie Van Landeghem <svlandeg@users.noreply.github.com> Co-authored-by: Sofie Van Landeghem <svlandeg@users.noreply.github.com>	2022-01-12 10:34:23 +01:00
Sofie Van Landeghem	5ba4171b19	Update LICENSE to include 2022 [ci skip]	2022-01-07 09:24:07 +01:00
Ines Montani	005e23a525	Merge pull request #9989 from explosion/docs/update-algolia-search-api [ci skip]	2022-01-05 14:14:42 +01:00
Ines Montani	a437ca6737	Update website to use new Algolia search API	2022-01-05 13:21:06 +01:00
Sofie Van Landeghem	067a44a417	Merge pull request #9987 from explosion/master Update develop with commits from master	2022-01-05 11:49:50 +01:00
Lj Miranda	00e7bf5ffd	Add a few docs to the default_config.cfg (#9981 ) * Clarify patience hyperparameter The current value for patience doesn't seem to indicate that it's pointing to the number of steps. It may be useful to specify that explicitly. Ref: https://github.com/explosion/spaCy/discussions/7450 Ref: https://github.com/explosion/spaCy/discussions/7465 * Update docs for max_steps	2022-01-05 09:16:40 +01:00
Duygu Altinok	55cf492218	Feat/debug data warn spread ents (#9960 ) * added check for crossing boundaries * formatted blacked * Rephrasing slightly Co-authored-by: Sofie Van Landeghem <svlandeg@users.noreply.github.com>	2022-01-04 18:22:10 +01:00
Sofie Van Landeghem	56dcb39fb7	Fix references to config file in the docs & UX (#9961 ) * doc fixes around config file * fix typo * clarify default	2022-01-04 14:31:26 +01:00
Sofie Van Landeghem	029a48e340	fix type of lexeme.rank (#9979 )	2022-01-04 13:15:25 +01:00
Sam Edwardes	6f65e2b544	Added spacypdfreader to universe.json (#9963 )	2022-01-03 16:34:36 +09:00
Richard Hudson	cc21eac88a	Use \n rather than linesep for consistency with wasabi	2021-12-29 13:33:56 +01:00
Richard Hudson	85da92f041	Ignore Windows carriage return characters	2021-12-29 12:16:45 +01:00
Paul O'Leary McCann	f40e237c5a	Remove denomme from universe (#9952 ) Package seems to have been deleted.	2021-12-29 11:41:29 +01:00
Richard Hudson	f7f9cc72e7	Fixed supports_ansi problem for Windows tests	2021-12-29 11:22:48 +01:00
Florian Cäsar	86e71e7b19	Fix Scorer.score_cats for missing labels (#9443 ) * Fix Scorer.score_cats for missing labels * Add test case for Scorer.score_cats missing labels * semantic nitpick * black formatting * adjust test to give different results depending on multi_label setting * fix loss function according to whether or not missing values are supported * add note to docs * small fixes * make mypy happy * Update spacy/pipeline/textcat.py Co-authored-by: Florian Cäsar <florian.caesar@pm.me> Co-authored-by: Sofie Van Landeghem <svlandeg@users.noreply.github.com> Co-authored-by: svlandeg <svlandeg@github.com>	2021-12-29 11:04:39 +01:00
Richard Hudson	264ead3274	Removed incorrect automatically added import statement	2021-12-29 10:11:48 +01:00
Sofie Van Landeghem	b8106e0f95	Merge pull request #9951 from explosion/master Update develop branch with master	2021-12-29 10:11:43 +01:00
Richard Hudson	8e55efcbd9	Check SUPPORTS_ANSI when rendering	2021-12-29 09:30:35 +01:00
Richard Hudson	08370604d3	Change order of imports Co-authored-by: Sofie Van Landeghem <svlandeg@users.noreply.github.com>	2021-12-29 09:22:06 +01:00
Richard Hudson	678bc61086	Apply suggestions from code review Co-authored-by: Sofie Van Landeghem <svlandeg@users.noreply.github.com>	2021-12-29 09:21:23 +01:00
Richard Hudson	e3e8495b41	Updated requirements.txt	2021-12-29 08:47:56 +01:00
Yoav Vollansky	9d63dfacfc	Update UNIVERSE.md (#9941 ) typo	2021-12-27 13:46:04 +01:00
Peter Baumgartner	72abf9e102	MultiHashEmbed vector docs correction (#9918 )	2021-12-27 11:18:08 +01:00
Richard Hudson	92943f8a23	Removed unused import	2021-12-23 17:47:56 +01:00
Richard Hudson	2cae470180	More type corrections	2021-12-23 17:35:47 +01:00
Richard Hudson	106fb53509	More type corrections	2021-12-23 17:24:28 +01:00
Richard Hudson	5c850b2ac3	Corrected types	2021-12-23 17:01:43 +01:00
Richard Hudson	e713aa0938	Add surrounding tokens functionality	2021-12-23 16:13:40 +01:00
Duygu Altinok	7ec1452f5f	added ellided forms (#9878 ) * added ellided forms * rearranged a bit * rearranged a bit * added stopword tests * blacked tests file	2021-12-23 13:41:01 +01:00
Andrew Janco	3cfeb518ee	Handle "_" value for token pos in conllu data (#9903 ) * change '_' to '' to allow Token.pos, when no value for token pos in conllu data * Minor code style Co-authored-by: Adriane Boyd <adrianeboyd@gmail.com>	2021-12-21 15:46:33 +01:00
Adriane Boyd	837d241b68	Make floret murmurhash endian-neutral (#9735 )	2021-12-20 17:11:31 +01:00
Adriane Boyd	1163073756	Remove outdated patterns MANIFEST.in (#9912 )	2021-12-20 16:40:20 +01:00
Adriane Boyd	18e5638af0	Extend cupy to v10.x (#9911 ) * Add extra for `cupy-cuda115`	2021-12-20 15:48:35 +01:00
Sofie Van Landeghem	7847839003	Merge pull request #9891 from explosion/master Update develop with master	2021-12-17 14:01:27 +01:00
Daniël de Kok	93e9bf681f	Merge pull request #9873 from danieldk/temporarily-pin-mypy Pin mypy to 0.910 until there is a compatible pydantic version	2021-12-16 10:28:31 +01:00
Daniël de Kok	b08f1ac17d	Pin mypy to 0.910 until there is a compatible pydantic version	2021-12-16 09:31:45 +01:00
Adriane Boyd	94fbd88521	Use dict.copy().items() instead of list(.items()) (#9868 )	2021-12-16 09:17:33 +01:00
Edward	018827e9fd	Add healthsea to universe (#9838 ) * Add healthsea to universe * Update website/meta/universe.json * Add thumbnail * Update website/meta/universe.json Co-authored-by: Sofie Van Landeghem <svlandeg@users.noreply.github.com>	2021-12-15 17:57:19 +01:00
antonpibm	ac45ae3779	Update Tokenizer documentation to reflect token_match and url_match signatures (#9859 )	2021-12-15 09:34:33 +01:00
Ines Montani	ba0fa7a64e	Support Google Sheets embeds in docs (#9861 )	2021-12-15 09:27:08 +01:00
Richard Hudson	ed788c5def	Add render_instances function	2021-12-08 19:24:32 +01:00
Richard Hudson	bd00611259	Add render_text	2021-12-08 17:47:29 +01:00
Richard Hudson	49f3fd39b9	Refactoring	2021-12-08 16:42:39 +01:00
Richard Hudson	183d535ef4	Add permitted values	2021-12-08 14:58:02 +01:00
Richard Hudson	9f7f234b0f	Added tabular view	2021-12-08 14:30:38 +01:00
Richard Hudson	e04950ef3c	Fixed problems with non-projective trees	2021-12-07 12:04:41 +01:00
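
The StateC::L commits (677c1a3507, 28299644fc) describe replacing a "collect all matches, then index" lookup with a backward scan that stops early. The sketch below illustrates that idea in plain Python. It is not spaCy's actual code, which lives in the parser's C++/Cython state: the `Arc` type and both function names are hypothetical, and it assumes left-arcs are stored in the order they were created, so that scanning from the end visits the most recent arc first.

```python
from typing import List, NamedTuple


class Arc(NamedTuple):
    """Hypothetical stand-in for a left-arc record (not spaCy's real struct)."""
    head: int
    child: int


def nth_left_child_naive(left_arcs: List[Arc], head: int, idx: int) -> int:
    """Old approach as described: build a list of all matches, then index it."""
    matches = [arc.child for arc in left_arcs if arc.head == head]
    if idx < 1 or idx > len(matches):
        return -1
    return matches[-idx]


def nth_left_child(left_arcs: List[Arc], head: int, idx: int) -> int:
    """New approach as described: scan backwards, stop at the idx-th match.

    Returns -1 when fewer than idx arcs have the given head. In that case
    the whole list is still scanned, which matches the commit's caveat
    about the remaining linear search.
    """
    if idx < 1:
        return -1
    for arc in reversed(left_arcs):  # most recent arc first
        if arc.head == head:
            idx -= 1
            if idx == 0:
                return arc.child  # early exit, no intermediate list built
    return -1


arcs = [Arc(5, 1), Arc(5, 3), Arc(7, 6), Arc(5, 4)]
most_recent = nth_left_child(arcs, head=5, idx=1)
second_most_recent = nth_left_child(arcs, head=5, idx=2)
```

Both functions return the same result for every query; the second avoids allocating a list and, when a match exists near the end, touches only a few arcs.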

15817 Commits