Commit Graph

14572 Commits

Author SHA1 Message Date
Paul O'Leary McCann
13bef2ddb6 Add width prior feature
Not necessary for convergence, but in coref-hoi this seems to add a few
f1 points.

Note that there are two width-related features in coref-hoi. This is a
"prior" that is added to mention scores. The other width related feature
is appended to the span embedding representation for other layers to
reference.
2021-07-05 21:06:28 +09:00
Paul O'Leary McCann
8f66176b2d Fix loss?
This rewrites the loss to not use the Thinc crossentropy code at all.
The main difference here is that the negative predictions are being
masked out (= marginalized over), but negative gradient is still being
reflected.

I'm still not sure this is exactly right but models seem to train
reliably now.
2021-07-05 18:17:10 +09:00
Paul O'Leary McCann
5db28ec2fd Tweak mention limit calculation
The calculation of this in the coref-hoi code is hard to follow. Based
on comments and variable names it sounds like it's using the doc length,
but it might actually be the number of mentions? Number of mentions
should be much larger and seems more correct, but might want to revisit
this.
2021-07-03 21:13:32 +09:00
Paul O'Leary McCann
2d3c559dc4 On initialize, use just two samples
Coref docs are kind of long, and using 10 samples on a smallish GPU can
cause OOMs.
2021-07-03 18:43:03 +09:00
Paul O'Leary McCann
251a5b43ac Minor fix in crossing spans code
I think this was technically incorrect but harmless. The reason the code
here is different than the reference in coref-hoi is that the indices
there are such that they get +1 at the end of processing, while the code
here handles indices directly.
2021-07-03 18:41:46 +09:00
Paul O'Leary McCann
865caedebd Remove XXX comment
Comment wondered if there should be some subtraction to avoid double
counting, but it probably doesn't matter because the diagonal is 0.
2021-07-03 18:40:38 +09:00
Paul O'Leary McCann
d74fa82c80 Fix axis handling in topk
In practice this is only ever used with axis=1, so it wasn't causing
issues, even though it was wrong.
2021-07-03 18:39:25 +09:00
Paul O'Leary McCann
f2e0e9dc28 Move placeholder handling into model code 2021-07-03 18:38:48 +09:00
Paul O'Leary McCann
3f66e18592 Clean up pw_prod loss
This doesn't change the math but makes the transposes slightly easier to
understand (maybe?).
2021-07-03 18:33:17 +09:00
Paul O'Leary McCann
b02df61eb9 Add test for crossing spans
This should maybe go elsewhere?
2021-06-28 18:21:00 +09:00
Paul O'Leary McCann
4f377d8de8 Fix bug in crossing span detection 2021-06-28 18:20:33 +09:00
Paul O'Leary McCann
23344857b9 Remove unused function 2021-06-28 18:19:43 +09:00
Paul O'Leary McCann
5c98c4c3b9 Probably fix pw prod backprop
I think this change is correct, but intuition doesn't really help
here...
2021-06-17 21:23:00 +09:00
Paul O'Leary McCann
ccf561112a Remove old comments 2021-06-17 21:22:17 +09:00
Paul O'Leary McCann
a62121e3b4 Expose more hyperparameters 2021-06-17 21:21:46 +09:00
Paul O'Leary McCann
848fd102e7 Small fix 2021-06-17 21:19:38 +09:00
Paul O'Leary McCann
fce804a79f Minor optimization 2021-06-17 21:10:46 +09:00
Paul O'Leary McCann
cb2364cf83 Fix type of mask
The call here was creating a float64 array, which was turning many
downstream scores into float64s. Later on these values were assigned to
a float32 array in backprop, and numerical underflow caused things to go
to zero.

That's almost certainly not the only reason things go to zero, but it is
incorrect.
2021-06-17 17:56:00 +09:00
Paul O'Leary McCann
8452d117ef Fix typo, remove old comment 2021-06-13 19:42:55 +09:00
Paul O'Leary McCann
96be7e8858 Change topk to sort descending
Shouldn't change correctness but is a little clearer
2021-06-13 19:42:24 +09:00
Paul O'Leary McCann
d71198ed36 Replace squeeze with flatten
At a few points in the code it's normal to get a "2d" array where each
row is a single entry. Calling squeeze will make that a proper 1d
array... unless it's just one entry, in which case it turns into a 0d
scalar. That's not what we want; flatten() provides the desired
behavior.
2021-06-12 19:48:01 +09:00
Paul O'Leary McCann
e728b0e45d Silence warning 2021-06-12 19:31:35 +09:00
Paul O'Leary McCann
7efbc721a1 Don't use is_sentenced 2021-06-12 19:29:27 +09:00
Paul O'Leary McCann
67d9ebc922 Transpose before calculating loss 2021-06-04 17:56:08 +09:00
Paul O'Leary McCann
18444fccd9 Remove old comment 2021-06-04 17:56:08 +09:00
Paul O'Leary McCann
4a4ef72191 Clean up unused functions
`make_clean_doc` is not needed and was removed.

`logsumexp` may be needed if I misunderstood the loss calculation, so I
left it in for now with a note.
2021-06-02 21:42:23 +09:00
svlandeg
0aa1083ce8 avoid repetitive entities in the output 2021-05-28 16:52:51 +02:00
svlandeg
0d81bce9cc add failing test for too short a sentence 2021-05-28 15:10:35 +02:00
svlandeg
0f5c586e2f add basic tests for debugging 2021-05-28 14:19:55 +02:00
svlandeg
391b512afd fix types of fwd functions 2021-05-27 16:36:46 +02:00
svlandeg
04b55bf054 removing unused imports 2021-05-27 16:31:38 +02:00
svlandeg
910026582d set versions to v1 instead of v0 2021-05-27 16:17:20 +02:00
svlandeg
2e3c0e2256 delete outdated tests 2021-05-27 13:54:31 +02:00
svlandeg
ba2e491cc4 Merge remote-tracking branch 'upstream/master' into feature/coref 2021-05-27 13:50:32 +02:00
Sofie Van Landeghem
3c58c0323f
fix docs (#8200) 2021-05-27 10:48:59 +02:00
Sofie Van Landeghem
290bd6ed39
ensure tolerance is properly passed on (#8158) 2021-05-27 18:10:28 +10:00
Paul O'Leary McCann
0c553ecd4e Fix docs (fix #8189) 2021-05-24 19:47:30 +09:00
Paul O'Leary McCann
a484245f35 Remove references to coref_er 2021-05-24 19:08:45 +09:00
Paul O'Leary McCann
d6389b133d Don't use a generator for no reason 2021-05-24 19:06:15 +09:00
Paul O'Leary McCann
d6fd5fe1c0 Minor cleanup 2021-05-24 14:56:43 +09:00
Paul O'Leary McCann
0942a0b51b Remove coref_er.py
The intent of this was that it would be a component pipeline that used
entities as input, but that's now covered by the get_mentions function
as a pipeline arg.
2021-05-21 18:20:25 +09:00
Paul O'Leary McCann
f6652c9252 Add new coref scoring
This is closer to the traditional evaluation method. That uses an
average of three scores, this is just using the bcubed metric for now
(nothing special about bcubed, just picked one).

The scoring implementation comes from the coval project. It relies on
scipy, which is one issue, and is rather involved, which is another.

Besides being comparable with traditional evaluations, this scoring is
relatively fast.
2021-05-21 15:56:40 +09:00
Paul O'Leary McCann
e1b4a85bb9 Fix loss
The loss was being returned as a single element array, which caused
training to die when it attempted to turn it into JSON.
2021-05-21 15:46:50 +09:00
Paul O'Leary McCann
ff3fed06cf Catch a stray reference 2021-05-20 21:30:46 +09:00
Sofie Van Landeghem
202943bc8c
KB & NEL to/from bytes (#8113)
* unit test for pickling KB

* add pickling test for NEL

* KB to_bytes and from_bytes

* NEL to_bytes and from_bytes

* xfail pickle tests for now

* fix docs

* cleanup
2021-05-20 18:11:30 +10:00
Paul O'Leary McCann
8c5df622d8 Help out python gc in coref backprop 2021-05-20 16:40:55 +09:00
Paul O'Leary McCann
fa92daf052 Break pairwise operations into pseudolayers
This makes their scope tighter and more contained, and has the nice side
effect that fewer things need to be passed around for backprop.
2021-05-20 15:59:51 +09:00
Adriane Boyd
f6128c06b0
Disable GPU CI tests (#8143) 2021-05-19 12:00:07 +02:00
Paul O'Leary McCann
d22acee4f7 Fix backprop
Training seems to actually run now!
2021-05-18 20:09:27 +09:00
Paul O'Leary McCann
2486b8ad4d Fix pipeline intialize 2021-05-18 19:56:27 +09:00