spaCy

mirror of https://github.com/explosion/spaCy.git synced 2026-01-22 16:24:26 +03:00

Author	SHA1	Message	Date
Paul O'Leary McCann	00d481dd12	Stack the mention scorer In the reference implementations, there's usually a function to build a ffnn of arbitrary depth, consisting of a stack of Linear >> Relu >> Dropout. In practice the depth is always 1 in coref-hoi, but in earlier iterations of the model, which are more similar to our model here (since we aren't using attention or even necessarily BERT), using a small depth like 2 was common. This hard-codes a stack of 2. In brief tests this allows similar performance to the unstacked version with much smaller embedding sizes. The depth of the stack could be made into a hyperparameter.	2021-08-09 18:04:42 +09:00
Paul O'Leary McCann	56803d3909	Change mention limit to match reference implementations This generally means fewer spans are considered, which makes individual steps in training faster but can make training take longer to find the good spans.	2021-08-08 19:55:52 +09:00
Paul O'Leary McCann	1d1679d431	Minor speedup This continue should be a break. The current form doesn't cause errors but using a break will be a bit faster.	2021-07-21 19:50:10 +09:00
Paul O'Leary McCann	a151c62d13	Add sentence map test	2021-07-19 13:05:26 +09:00
Paul O'Leary McCann	3ed0fae671	Add multi-sentence mention test Also formatting.	2021-07-19 13:00:16 +09:00
Paul O'Leary McCann	8bd0474730	Run black	2021-07-18 20:20:22 +09:00
Paul O'Leary McCann	bc081c24fa	Add full traditional scoring This calculates scores as an average of three metrics. As noted in the code, these metrics all have issues, but we want to use them to match up with prior work. This should be replaced with some simpler default scoring and the scorer here should be moved to an external project to be passed in just for generating the traditional scores.	2021-07-18 20:13:10 +09:00
Paul O'Leary McCann	a4531be099	Add simple mention test	2021-07-18 19:15:32 +09:00
Paul O'Leary McCann	9b63cbb775	Add extract spans import	2021-07-15 18:16:53 +09:00
Paul O'Leary McCann	e9626e38c1	Fix serialization test This test was failing not because the thing it was testing wasn't working, but because of the way span equality works. Span equality relies on doc equality, and doc equality is object identity, so spans from different docs will never be equal.	2021-07-14 18:37:34 +09:00
Paul O'Leary McCann	4a9dc00d86	Use relative indices for mentions Was using batch absolute indices to manage mentions, but extract_spans expects doc-relative ones.	2021-07-14 18:36:18 +09:00
Paul O'Leary McCann	3684f7fdfd	Remove comment from fixed test	2021-07-14 18:22:14 +09:00
Paul O'Leary McCann	f1796e4af7	Fix mention list bug There was an off-by-one error in how mentions are generated that would affect mentions at the end of a sentence. This was pretty nasty.	2021-07-14 18:19:00 +09:00
Paul O'Leary McCann	80a17071d3	Remove unused code	2021-07-11 18:46:39 +09:00
Paul O'Leary McCann	447c7070e3	Fix loss Accidentally deleted it	2021-07-10 22:45:25 +09:00
Paul O'Leary McCann	c25ec292a9	Cleanup	2021-07-10 22:42:55 +09:00
Paul O'Leary McCann	e00bd422d9	Fix span embeds Some of the lengths and backprop weren't right. Also various cleanup.	2021-07-10 21:38:53 +09:00
Paul O'Leary McCann	d7d317a1b5	Clean up span embedding code This is now cleaner and significantly faster. There are still some messy parts in the code (particularly variable names), will get to that later.	2021-07-10 19:59:08 +09:00
Paul O'Leary McCann	dc1f974d39	Merge branch 'master' into feature/coref	2021-07-10 18:10:40 +09:00
Paul O'Leary McCann	f34915c1e8	Use scatter_add to speed up span embed backprop This was the slowest part of the code, and using scatter_add here probably reduces the runtime by 50%.	2021-07-10 18:08:51 +09:00
Ines Montani	616f4de034	Merge pull request #8674 from polm/fix/autoblack-no-forks [ci skip] Make the autoblack job not run on forks	2021-07-10 16:41:59 +10:00
Paul O'Leary McCann	b8cdbb4bb6	Make the autoblack job not run on forks The autoblack job is an occasional cleanup job. If it runs on forks and those PRs are accepted the git history will be weird and that doesn't help anyone. The way to make the job not run on forks is a little non-obvious but based on this thread: https://github.com/prisma/prisma/issues/3539	2021-07-10 15:38:20 +09:00
Ines Montani	d4fecdfb82	Merge pull request #8665 from rynoV/patch-1 [ci skip]	2021-07-10 10:52:15 +10:00
Ines Montani	50000d37e4	Avoid double parentheses [ci skip]	2021-07-10 10:52:01 +10:00
Calum Sieppert	e2d53aa1a6	Typo fixes	2021-07-09 10:25:56 -06:00
Adriane Boyd	d8805a1073	Fix ru/uk lemmatizer mp with spawn (#8657) Use an instance variable instead of a class variable for the morphological analyzer so that multiprocessing with spawn is possible.	2021-07-09 15:36:56 +02:00
Adriane Boyd	b8e720fdb9	Fix Azerbaijani init, extend lang init tests (#8656) * Extend langs in initialize tests * Fix az init	2021-07-09 15:36:35 +02:00
Ines Montani	1c0ed22d1e	Merge pull request #8573 from julien-talkair/code-quality-pre-commit	2021-07-09 23:09:24 +10:00
Ines Montani	bbca56687f	Merge pull request #8655 from explosion/autoblack Auto-format code with black	2021-07-09 23:08:05 +10:00
explosion-bot	334f1f98d8	Auto-format code with black	2021-07-09 08:06:06 +00:00
Adriane Boyd	1ee5bee29d	Add Macedonian models to website (#8637)	2021-07-08 09:32:14 +02:00
Paul O'Leary McCann	d0b041aff4	Switch to using Thinc tuplify The tuplify code here was added to Thinc proper and that's been released, so no need to have it here any more.	2021-07-08 16:08:36 +09:00
Paul O'Leary McCann	1d9209d43a	Merge pull request #8547 from mylibrar/update-universe Add forte to universe.json	2021-07-08 14:59:49 +09:00
Ines Montani	39c8f7949e	Add code preview for textcat_multilabel [ci skip]	2021-07-08 13:33:25 +10:00
Ines Montani	bcd2be40b5	Merge pull request #8634 from rynoV/patch-1 [ci skip]	2021-07-08 12:52:59 +10:00
Calum Sieppert	889c187bc2	Typo fixes	2021-07-07 16:53:04 -06:00
julien-talkair	833f7f2918	👷 configure flake8 pre-commit * uses setup.cfg for flake8 configuration during pre-commit	2021-07-07 21:31:46 +02:00
Ines Montani	530b5d72f6	Merge pull request #8624 from adrianeboyd/docs/v3-1-usage-updates [ci skip] Update v3.1 usage docs	2021-07-07 16:50:36 +10:00
Adriane Boyd	6db647dfe0	Update v3.1 usage docs	2021-07-07 08:43:33 +02:00
Sofie Van Landeghem	64fac754fe	add spacy prefix to ngram_suggester.v1 (#8623)	2021-07-07 08:09:30 +02:00
julien-talkair	82b01964fa	🚨 adjust flake8 sensitivity * pass arguments to flake8 * reproduce arguments from CI config	2021-07-06 22:41:54 +02:00
Sofie Van Landeghem	733e8ceea9	fix spancat initialize with labels (#8620)	2021-07-06 19:08:25 +02:00
Sofie Van Landeghem	608fc1d623	avoid msg var implicitness (#8619) * avoid msg var implicitness * rename local msg * Add CI tests for debug data and train * Adjust debug data CLI test Co-authored-by: Adriane Boyd <adrianeboyd@gmail.com>	2021-07-06 19:08:08 +02:00
Sofie Van Landeghem	e7d747e3ee	TransitionBasedParser.v1 to legacy (#8586) * TransitionBasedParser.v1 to legacy * register sublayers * bump spacy-legacy to 3.0.7	2021-07-06 15:26:45 +02:00
Ines Montani	04a9ade40f	Merge pull request #8466 from explosion/docs/new-in-v3-1 [ci skip]	2021-07-06 22:20:24 +10:00
Luca Dorigo	e8ef4a46d5	Add the right return type for Language.pipe and an overload for the as_tuples case (#8441) * Add the right return type for Language.pipe and an overload for the as_tuples version * Reformat, tidy up Co-authored-by: Adriane Boyd <adrianeboyd@gmail.com>	2021-07-06 14:18:40 +02:00
Sofie Van Landeghem	b9f59118bf	Fix silent evaluation (#8581) * fix silentness * sneak in docs typo fix * pass silent boolean instead	2021-07-06 14:16:19 +02:00
Sofie Van Landeghem	3daf57d70c	Small spancat fixes (#8614) * two small fixes + additional tests * rename	2021-07-06 14:15:41 +02:00
Ines Montani	327f83573a	Move scores per type handling into util function (#8590)	2021-07-06 13:02:37 +02:00
Adriane Boyd	5fd0b5207e	Fix vectors check for sourced components (#8559) * Fix vectors check for sourced components Since vectors are not loaded when components are sourced, store a hash for the vectors of each sourced component and compare it to the loaded vectors after the vectors are loaded from the `[initialize]` block. * Pop temporary info * Remove stored hash in remove_pipe * Add default for pop * Add additional convert/debug/assemble CLI tests	2021-07-06 12:43:17 +02:00

14763 Commits