75 Commits

Author SHA1 Message Date
Paul O'Leary McCann
838f50192b Black formatting 2022-05-25 19:20:03 +09:00
Paul O'Leary McCann
2a8efda689 Code review suggestions, cleanup 2022-05-25 19:18:26 +09:00
Paul O'Leary McCann
c9233a5a1f Import torch from thinc 2022-05-24 17:28:27 +09:00
Paul O'Leary McCann
b1118cee58 Move epsilon 2022-05-24 15:59:08 +09:00
Paul O'Leary McCann
9da16df96e Add guards around torch import
Torch is required for the coref/spanpred models but shouldn't be
required for spaCy in general.

The one tricky part of this is that one function in coref_util relied on
torch, but that file was imported in several places. Since the function
was only used in one place I moved it there.
2022-05-24 15:16:25 +09:00
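
The guard described in this commit can be sketched roughly as follows. This is a minimal illustration, not the code from the commit: `optional_import`, `require`, and `build_coref_model` are hypothetical names, and the real change may rely on helpers from Thinc instead.

```python
import importlib
from types import ModuleType
from typing import Optional


def optional_import(name: str) -> Optional[ModuleType]:
    """Return the named module, or None if it is not installed."""
    try:
        return importlib.import_module(name)
    except ImportError:
        return None


# Module-level guard: importing this file never fails just because torch is absent.
torch = optional_import("torch")


def require(module: Optional[ModuleType], name: str, feature: str) -> ModuleType:
    """Raise a clear error only when the optional dependency is actually needed."""
    if module is None:
        raise ImportError(f"{feature} requires '{name}', which is not installed")
    return module


def build_coref_model():
    # Hypothetical factory: only this code path needs torch.
    th = require(torch, "torch", "the coref model")
    return th.nn.Linear(4, 2)
```

Keeping the torch-dependent function next to its only caller, as the commit does, means shared utility modules can be imported without triggering the guard at all.
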
Paul O'Leary McCann
2e8f0e9168 Rename coref params 2022-05-16 16:50:10 +09:00
Paul O'Leary McCann
13481fbcc2 Remove unused param, add TODOs about typing 2022-05-13 19:29:28 +09:00
kadarakos
7cf6bcca0e merge misery 2022-05-10 17:19:16 +00:00
kadarakos
e512874c80 small refactor and docs 2022-05-10 16:40:31 +00:00
Paul O'Leary McCann
33f4f90ff0 Formatting 2022-05-10 19:09:52 +09:00
Paul O'Leary McCann
41fc092674 Split span predictor model into its own file 2022-05-10 19:08:21 +09:00
svlandeg
6b51258a58 clean up unused imports + black formatting 2022-05-09 13:34:50 +02:00
kadarakos
b53113e3b8
Preparing span predictor for predicting from gold (#10547)
Note this is squashed because rebasing had conflicts.

* remove unnecessary .device

* span predictor debug start

* gearing up SpanPredictor for gold-heads

* merge SpanPredictor attributes

* remove useless extra prefix and device from spanpredictor

* make sure predicted and reference stay aligned

* handle empty head_ids

* handle empty clusters

* addressing suggestions by @polm

* nicer restore

* fix score overwriting bug

* prepare for aligned heads-spans training

* span accuracy score

* update with eg.predicted as other components

* add backprop callback to spanpredictor

* report start- and end-accuracies separately

* fixing scorer

Co-authored-by: Kádár Ákos <akos@onyx.uvt.nl>
2022-04-13 19:42:49 +09:00
Kádár Ákos
2a1ad4c5d2 add backprop callback to spanpredictor 2022-04-08 14:56:44 +02:00
Kádár Ákos
4fc40340f9 handle empty head_ids 2022-03-28 11:28:21 +02:00
Kádár Ákos
83ac0477c8 remove useless extra prefix and device from spanpredictor 2022-03-24 16:44:50 +01:00
Kádár Ákos
1c5dabcb47 merge SpanPredictor attributes 2022-03-24 16:23:12 +01:00
Kádár Ákos
a872c69ffb merge 2022-03-24 16:10:04 +01:00
Kádár Ákos
706b2e6f25 gearing up SpanPredictor for gold-heads 2022-03-24 16:06:20 +01:00
Kádár Ákos
150e7c46d7 conflict 2022-03-23 11:27:02 +01:00
Kádár Ákos
1eaf8fb0cf span predictor debug start 2022-03-23 11:24:27 +01:00
Paul O'Leary McCann
eec00ce60d Fix various sizes in SpanPredictor FFNN 2022-03-23 16:20:31 +09:00
Paul O'Leary McCann
2190cbc0e6 Add progress on SpanPredictor component
This isn't working. There is a CUDA error in the torch code during
initialization and it's not clear why.
2022-03-19 19:39:49 +09:00
Kádár Ákos
db422abf01 remove unnecessary .device 2022-03-18 16:24:26 +01:00
Paul O'Leary McCann
0275ae29de Remove stale comment 2022-03-16 20:09:12 +09:00
Paul O'Leary McCann
6974f55daa Hack for transformer listener size 2022-03-16 15:15:53 +09:00
Paul O'Leary McCann
d0ae2590db Delete all the coref-hoi code 2022-03-15 20:05:24 +09:00
Paul O'Leary McCann
abdc7d87af Clean up util code
Moved everything into coref_util.py, deleted wl-specific file.
2022-03-15 19:59:44 +09:00
Paul O'Leary McCann
8eadf3781b Training runs now
Evaluation needs fixing, and code still needs cleanup.
2022-03-14 19:02:17 +09:00
Paul O'Leary McCann
d22a002641 Forward/backward pass works
Evaluate does not work - predict hasn't been updated
2022-03-14 17:26:27 +09:00
Paul O'Leary McCann
c4f9c24738 The coref model is able to be loaded
The span predictor component is initialized but not used at all now.
Plan is to work on it after the word level clustering part is trainable
end-to-end.
2022-03-09 19:31:11 +09:00
Paul O'Leary McCann
35cc2b138f Add span predictor code
Accidentally omitted before
2022-03-08 18:13:26 +09:00
Paul O'Leary McCann
1c697b4011 Remove references to config
Replaced with model arguments
2022-03-08 18:13:09 +09:00
Paul O'Leary McCann
c0cd5025e3 Start bringing in wl-coref
This absolutely does not work. First step here is getting over most of
the code in roughly the files we want it in. After the code has been
pulled over it can be restructured to match spaCy and cleaned up.
2022-03-06 20:00:15 +09:00
Paul O'Leary McCann
00d481dd12 Stack the mention scorer
In the reference implementations, there's usually a function to build a
ffnn of arbitrary depth, consisting of a stack of Linear >> Relu >>
Dropout. In practice the depth is always 1 in coref-hoi, but in earlier
iterations of the model, which are more similar to our model here (since
we aren't using attention or even necessarily BERT), using a small depth
like 2 was common. This hard-codes a stack of 2.

In brief tests this allows similar performance to the unstacked version
with much smaller embedding sizes.

The depth of the stack could be made into a hyperparameter.
2021-08-09 18:04:42 +09:00
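
To illustrate the last point, here is a plain-Python sketch of a Linear >> Relu >> Dropout stack with the depth exposed as a parameter. It is not the Thinc code from the commit: all function names are made up for this example, and the final one-unit output layer is an assumption about how a mention score is produced.

```python
import random
from typing import List, Optional, Tuple

Vector = List[float]
Matrix = List[List[float]]
Layer = Tuple[Matrix, Vector]  # (weights, bias)


def init_linear(n_out: int, n_in: int, rng: random.Random) -> Layer:
    W = [[rng.uniform(-0.1, 0.1) for _ in range(n_in)] for _ in range(n_out)]
    return W, [0.0] * n_out


def linear(x: Vector, W: Matrix, b: Vector) -> Vector:
    return [sum(w * xi for w, xi in zip(row, x)) + bi for row, bi in zip(W, b)]


def relu(x: Vector) -> Vector:
    return [max(0.0, v) for v in x]


def dropout(x: Vector, rate: float, rng: random.Random, train: bool) -> Vector:
    # Inverted dropout: a no-op at inference time.
    if not train or rate == 0.0:
        return list(x)
    keep = 1.0 - rate
    return [v / keep if rng.random() < keep else 0.0 for v in x]


def build_scorer(n_in: int, hidden: int, depth: int, rng: random.Random):
    """Build `depth` hidden blocks plus an assumed one-unit output layer."""
    if depth < 1:
        raise ValueError("depth must be at least 1")
    sizes = [n_in] + [hidden] * depth
    blocks = [init_linear(sizes[i + 1], sizes[i], rng) for i in range(depth)]
    return blocks, init_linear(1, hidden, rng)


def score(x: Vector, scorer, rate: float = 0.3, train: bool = False,
          rng: Optional[random.Random] = None) -> float:
    blocks, (W_out, b_out) = scorer
    rng = rng or random.Random()
    for W, b in blocks:
        # One block of the stack: Linear >> Relu >> Dropout.
        x = dropout(relu(linear(x, W, b)), rate, rng, train)
    return linear(x, W_out, b_out)[0]
```

Calling `build_scorer(n_in, hidden, depth=2, rng)` corresponds to the hard-coded stack of 2 described above; other depths only change the `depth` argument.
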
Paul O'Leary McCann
56803d3909 Change mention limit to match reference implementations
This generally means fewer spans are considered, which makes individual
steps in training faster but can make training take longer to find the
good spans.
2021-08-08 19:55:52 +09:00
Paul O'Leary McCann
8bd0474730 Run black 2021-07-18 20:20:22 +09:00
Paul O'Leary McCann
9b63cbb775 Add extract spans import 2021-07-15 18:16:53 +09:00
Paul O'Leary McCann
4a9dc00d86 Use relative indices for mentions
Was using batch absolute indices to manage mentions, but extract_spans
expects doc-relative ones.
2021-07-14 18:36:18 +09:00
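
The index conversion behind this fix might look like the sketch below. It assumes spans are end-exclusive `(start, end)` token pairs and that the docs in a batch are concatenated in order; `to_doc_relative` is a hypothetical helper, not a function from the commit.

```python
from bisect import bisect_right
from itertools import accumulate
from typing import List, Tuple

Span = Tuple[int, int]  # (start, end) token indices, end exclusive


def to_doc_relative(spans: List[Span], doc_lengths: List[int]) -> List[List[Span]]:
    """Convert batch-absolute spans into per-doc lists of doc-relative spans."""
    # offsets[i] is the batch-absolute index of doc i's first token;
    # the final entry is the total number of tokens in the batch.
    offsets = [0] + list(accumulate(doc_lengths))
    per_doc: List[List[Span]] = [[] for _ in doc_lengths]
    for start, end in spans:
        doc_i = bisect_right(offsets, start) - 1
        if (not 0 <= start < end
                or doc_i >= len(doc_lengths)
                or end > offsets[doc_i + 1]):
            raise ValueError(f"span {(start, end)} does not fit in a single doc")
        base = offsets[doc_i]
        per_doc[doc_i].append((start - base, end - base))
    return per_doc
```

For two docs of lengths 5 and 3, the batch-absolute span `(5, 7)` becomes `(0, 2)` in the second doc.
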
Paul O'Leary McCann
c25ec292a9 Cleanup 2021-07-10 22:42:55 +09:00
Paul O'Leary McCann
e00bd422d9 Fix span embeds
Some of the lengths and backprop weren't right.

Also various cleanup.
2021-07-10 21:38:53 +09:00
Paul O'Leary McCann
d7d317a1b5 Clean up span embedding code
This is now cleaner and significantly faster. There's still some messy
parts in the code (particularly variable names), will get to that later.
2021-07-10 19:59:08 +09:00
Paul O'Leary McCann
f34915c1e8 Use scatter_add to speed up span embed backprop
This was the slowest part of the code, and using scatter_add here
probably reduces the runtime by 50%.
2021-07-10 18:08:51 +09:00
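
As a rough illustration of what scatter-add does in this backward pass, the sketch below accumulates span gradients back onto token rows. It is written in plain Python for clarity and assumes, purely for the example, that each span embedding is the sum of its token vectors. The speedup mentioned above comes from a vectorized implementation (NumPy's `np.add.at` offers the same semantics), which this loop does not reproduce.

```python
from typing import List

Matrix = List[List[float]]


def scatter_add(table: Matrix, indices: List[int], values: Matrix) -> None:
    """In place: table[indices[i]] += values[i]; repeated indices accumulate."""
    if len(indices) != len(values):
        raise ValueError("indices and values must have the same length")
    for idx, row in zip(indices, values):
        target = table[idx]
        for j, v in enumerate(row):
            target[j] += v


def span_embed_backprop(d_spans: Matrix, span_tokens: List[List[int]],
                        n_tokens: int, width: int) -> Matrix:
    """Route each span's gradient to every token that contributed to it."""
    d_tokens = [[0.0] * width for _ in range(n_tokens)]
    indices: List[int] = []
    values: Matrix = []
    for d_span, tokens in zip(d_spans, span_tokens):
        for t in tokens:
            indices.append(t)
            values.append(d_span)
    # A token shared by several spans receives the sum of their gradients.
    scatter_add(d_tokens, indices, values)
    return d_tokens
```

The accumulation on repeated indices is the important property: a plain indexed assignment would keep only the last gradient written to a token that appears in several spans.
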
Paul O'Leary McCann
d0b041aff4 Switch to using Thinc tuplify
The tuplify code here was added to Thinc proper and that's been
released, so no need to have it here any more.
2021-07-08 16:08:36 +09:00
Paul O'Leary McCann
eb5820b593 Improve take_vecs implementation
This pulls out references to needed bits so that other parts (the larger
embeddings) can be freed before backprop.
2021-07-05 21:08:42 +09:00
Paul O'Leary McCann
13bef2ddb6 Add width prior feature
Not necessary for convergence, but in coref-hoi this seems to add a few
f1 points.

Note that there are two width-related features in coref-hoi. This is a
"prior" that is added to mention scores. The other width related feature
is appended to the span embedding representation for other layers to
reference.
2021-07-05 21:06:28 +09:00
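
A simplified sketch of the "prior" variant follows. It assumes the prior is a table of one learned scalar per span width, with longer spans sharing the last entry. That table lookup, and the name `add_width_prior`, are assumptions made for illustration; the reference implementation may compute the width score differently.

```python
from typing import List, Tuple

Span = Tuple[int, int]  # (start, end) token indices, end exclusive


def add_width_prior(mention_scores: List[float], spans: List[Span],
                    width_scores: List[float]) -> List[float]:
    """Add a per-width scalar to each mention score.

    width_scores[i] is the (learned) prior for spans of width i + 1;
    wider spans are clipped to the last entry.
    """
    max_width = len(width_scores)
    out = []
    for score, (start, end) in zip(mention_scores, spans):
        width = end - start
        if width < 1:
            raise ValueError(f"empty or inverted span: {(start, end)}")
        bucket = min(width, max_width) - 1
        out.append(score + width_scores[bucket])
    return out
```

Because the prior is added to the score rather than appended to the span embedding, it only shifts the ranking of candidate mentions and is not visible to later layers.
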
Paul O'Leary McCann
8f66176b2d Fix loss?
This rewrites the loss to not use the Thinc crossentropy code at all.
The main difference here is that the negative predictions are being
masked out (= marginalized over), but negative gradient is still being
reflected.

I'm still not sure this is exactly right but models seem to train
reliably now.
2021-07-05 18:17:10 +09:00
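
For orientation, one common formulation of a coreference loss is the marginal log-likelihood over all gold antecedents of a mention. The sketch below implements that formulation for a single mention. Whether it matches the rewrite in this commit exactly is an assumption, and `marginal_loss` is a hypothetical name.

```python
import math
from typing import List, Tuple


def logsumexp(xs: List[float]) -> float:
    m = max(xs)
    return m + math.log(sum(math.exp(x - m) for x in xs))


def marginal_loss(scores: List[float], gold: List[bool]) -> Tuple[float, List[float]]:
    """Loss and gradient for one mention's antecedent scores.

    loss = logsumexp(all scores) - logsumexp(gold scores)
    grad = softmax(all scores) - softmax(gold scores, zero elsewhere)
    """
    if len(scores) != len(gold):
        raise ValueError("scores and gold must have the same length")
    if not any(gold):
        raise ValueError("at least one candidate must be gold")
    log_all = logsumexp(scores)
    log_gold = logsumexp([s for s, g in zip(scores, gold) if g])
    grad = []
    for s, g in zip(scores, gold):
        p_all = math.exp(s - log_all)
        # Non-gold candidates are masked out of the second term only.
        p_gold = math.exp(s - log_gold) if g else 0.0
        grad.append(p_all - p_gold)
    return log_all - log_gold, grad
```

In this formulation non-gold candidates still receive a positive gradient from the first term, which pushes their scores down even though they are masked out of the gold term.
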
Paul O'Leary McCann
5db28ec2fd Tweak mention limit calculation
The calculation of this in the coref-hoi code is hard to follow. Based
on comments and variable names it sounds like it's using the doc length,
but it might actually be the number of mentions? Number of mentions
should be much larger and seems more correct, but might want to revisit
this.
2021-07-03 21:13:32 +09:00
Paul O'Leary McCann
865caedebd Remove XXX comment
Comment wondered if there should be some subtraction to avoid double
counting, but it probably doesn't matter because the diagonal is 0.
2021-07-03 18:40:38 +09:00
Paul O'Leary McCann
f2e0e9dc28 Move placeholder handling into model code 2021-07-03 18:38:48 +09:00