Commit Graph

22 Commits

Author SHA1 Message Date
Paul O'Leary McCann
8bd0474730 Run black 2021-07-18 20:20:22 +09:00
Paul O'Leary McCann
bc081c24fa Add full traditional scoring
This calculates scores as an average of three metrics. As noted in the
code, these metrics all have issues, but we want to use them to match up
with prior work.

This should be replaced with some simpler default scoring, and the scorer
here should be moved to an external project, to be passed in just for
generating the traditional scores.
2021-07-18 20:13:10 +09:00
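
As an aside, a minimal sketch of what "an average of three metrics" looks like, assuming the usual CoNLL-2012 trio of MUC, B-cubed, and CEAF-e (the commit does not name the three metrics, so the keys below are hypothetical):

```python
# Hypothetical sketch: average the F1 of three traditional coref metrics.
# The metric keys follow the CoNLL-2012 convention; the real scorer may
# use different names.
def conll_average(scores: dict) -> float:
    return sum(scores[name]["f"] for name in ("muc", "b_cubed", "ceaf_e")) / 3.0

# conll_average({"muc": {"f": 0.6}, "b_cubed": {"f": 0.5}, "ceaf_e": {"f": 0.55}})
# -> 0.55
```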
Paul O'Leary McCann
80a17071d3 Remove unused code 2021-07-11 18:46:39 +09:00
Paul O'Leary McCann
447c7070e3 Fix loss
Accidentally deleted it
2021-07-10 22:45:25 +09:00
Paul O'Leary McCann
e00bd422d9 Fix span embeds
Some of the lengths and backprop weren't right.

Also various cleanup.
2021-07-10 21:38:53 +09:00
Paul O'Leary McCann
8f66176b2d Fix loss?
This rewrites the loss to not use the Thinc crossentropy code at all.
The main difference here is that the negative predictions are masked out
(i.e. marginalized over), but the negative gradient is still reflected.

I'm still not sure this is exactly right, but models seem to train
reliably now.
2021-07-05 18:17:10 +09:00
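
To illustrate the marginalization this message describes (a sketch, not the actual spaCy code): per mention, the probability mass over all gold antecedents is summed before taking the log, so negative candidates carry no positive target but still receive gradient.

```python
import numpy as np

def marginal_coref_loss(scores: np.ndarray, gold_mask: np.ndarray):
    """Illustrative only. scores: (n_mentions, n_candidates) antecedent
    scores; gold_mask: same shape, 1.0 where a candidate is a gold
    antecedent."""
    # Softmax over candidates for each mention.
    exp = np.exp(scores - scores.max(axis=1, keepdims=True))
    probs = exp / exp.sum(axis=1, keepdims=True)
    # Marginalize: total probability assigned to any gold antecedent.
    marginal = (probs * gold_mask).sum(axis=1)
    loss = -np.log(marginal + 1e-8).sum()
    # d/ds_k = p_k - (p_k * m_k) / marginal: negative candidates (m_k = 0)
    # contribute p_k, so they still get gradient despite being masked out
    # of the target.
    d_scores = probs - (probs * gold_mask) / (marginal[:, None] + 1e-8)
    return loss, d_scores
```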
Paul O'Leary McCann
2d3c559dc4 On initialize, use just two samples
Coref docs tend to be long, and using 10 samples on a smallish GPU can
cause out-of-memory errors.
2021-07-03 18:43:03 +09:00
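
Roughly what this looks like in a component's initialize method (a sketch; only Thinc's model.initialize and spaCy's Example.predicted are real API here):

```python
from itertools import islice

def initialize(self, get_examples, *, nlp=None):
    # Use only two sample docs instead of the usual ten: coref docs are
    # long, and a larger sample can OOM a small GPU.
    sample_docs = [eg.predicted for eg in islice(get_examples(), 2)]
    self.model.initialize(X=sample_docs)
```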
Paul O'Leary McCann
f2e0e9dc28 Move placeholder handling into model code 2021-07-03 18:38:48 +09:00
Paul O'Leary McCann
a62121e3b4 Expose more hyperparameters 2021-06-17 21:21:46 +09:00
Paul O'Leary McCann
67d9ebc922 Transpose before calculating loss 2021-06-04 17:56:08 +09:00
svlandeg
04b55bf054 removing unused imports 2021-05-27 16:31:38 +02:00
svlandeg
910026582d set versions to v1 instead of v0 2021-05-27 16:17:20 +02:00
Paul O'Leary McCann
a484245f35 Remove references to coref_er 2021-05-24 19:08:45 +09:00
Paul O'Leary McCann
d6389b133d Don't use a generator for no reason 2021-05-24 19:06:15 +09:00
Paul O'Leary McCann
f6652c9252 Add new coref scoring
This is closer to the traditional evaluation method. That uses an
average of three scores; this just uses the B-cubed metric for now
(nothing special about B-cubed, just picked one).

The scoring implementation comes from the coval project. It relies on
scipy, which is one issue, and is rather involved, which is another.

Besides being comparable with traditional evaluations, this scoring is
relatively fast.
2021-05-21 15:56:40 +09:00
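
For reference, B-cubed computes a per-mention precision and recall from the overlap between each mention's predicted and gold clusters. A self-contained sketch, not coval's implementation (which is more involved, as the message notes):

```python
def b_cubed(gold_clusters, pred_clusters):
    # Each cluster is a set of hashable mention ids; each mention is
    # assumed to appear in at most one cluster per side.
    gold_of = {m: c for c in gold_clusters for m in c}
    pred_of = {m: c for c in pred_clusters for m in c}
    # Per-mention precision: share of the predicted cluster that is gold.
    p = sum(len(c & gold_of.get(m, set())) / len(c) for m, c in pred_of.items())
    # Per-mention recall: share of the gold cluster that was predicted.
    r = sum(len(c & pred_of.get(m, set())) / len(c) for m, c in gold_of.items())
    p /= max(len(pred_of), 1)
    r /= max(len(gold_of), 1)
    f1 = 2 * p * r / (p + r) if p + r else 0.0
    return p, r, f1
```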
Paul O'Leary McCann
e1b4a85bb9 Fix loss
The loss was being returned as a single-element array, which caused
training to crash when it attempted to serialize the loss to JSON.
2021-05-21 15:46:50 +09:00
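
The fix for this class of bug is typically a one-line cast (a sketch, not the actual diff):

```python
# Return a plain Python float rather than a (1,)-shaped array, so the
# training loop can serialize the loss to JSON.
loss = float(loss_array)
```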
Paul O'Leary McCann
d22acee4f7 Fix backprop
Training seems to actually run now!
2021-05-18 20:09:27 +09:00
Paul O'Leary McCann
2486b8ad4d Fix pipeline initialize 2021-05-18 19:56:27 +09:00
Paul O'Leary McCann
e303628205 Attempt to use registry correctly 2021-05-17 14:52:48 +09:00
Paul O'Leary McCann
91b111467b Minor fixes 2021-05-17 14:52:30 +09:00
Paul O'Leary McCann
7c42a8c90a Migrate coref code
This includes the coref code that was being tested separately, modified
to work in spaCy. It hasn't been tested yet and presumably still needs
fixes.

In particular, the evaluation code is currently omitted. It's unclear at
the moment whether we want to use a complex scorer similar to the
official one, or a simpler scorer using more modern evaluation methods.
2021-05-15 21:36:10 +09:00
Sofie Van Landeghem
e0c45c669a Native coref component (#7243)
* initial coref_er pipe

* matcher more flexible

* base coref component without actual model

* initial setup of coref_er.score

* rename to include_label

* preliminary score_clusters method

* apply scoring in coref component

* IO fix

* return None loss for now

* rename to CoreferenceResolver

* some preliminary unit tests

* use registry as callable
2021-03-03 13:50:14 +01:00