Paul O'Leary McCann
865caedebd
Remove XXX comment
...
The comment wondered whether some subtraction was needed to avoid double
counting, but it probably doesn't matter because the diagonal is 0.
2021-07-03 18:40:38 +09:00
Paul O'Leary McCann
d74fa82c80
Fix axis handling in topk
...
In practice this is only ever used with axis=1, so it wasn't causing
issues, even though it was wrong.
2021-07-03 18:39:25 +09:00
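To illustrate the axis issue in d74fa82c80, here is a minimal sketch of a top-k helper that honors its `axis` argument. The function name, signature, and NumPy-based body are assumptions for illustration only; this is not the project's actual implementation.

```python
import numpy as np

def topk(scores, k, axis=-1):
    """Hypothetical helper: k largest values and their indices along `axis`."""
    # Negating the scores makes argsort return indices in descending order.
    order = np.argsort(-scores, axis=axis)
    # Keep the first k positions along the requested axis, not a fixed one.
    idx = np.take(order, np.arange(k), axis=axis)
    vals = np.take_along_axis(scores, idx, axis=axis)
    return vals, idx

scores = np.array([[1.0, 5.0, 3.0], [4.0, 2.0, 6.0]], dtype="float32")
row_vals, row_idx = topk(scores, 2, axis=1)  # top 2 within each row
col_vals, col_idx = topk(scores, 1, axis=0)  # top 1 within each column
```

A helper that ignores `axis` can still give correct results when every caller passes axis=1, which would explain why the bug went unnoticed.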
Paul O'Leary McCann
f2e0e9dc28
Move placeholder handling into model code
2021-07-03 18:38:48 +09:00
Paul O'Leary McCann
3f66e18592
Clean up pw_prod loss
...
This doesn't change the math but makes the transposes slightly easier to
understand (maybe?).
2021-07-03 18:33:17 +09:00
Paul O'Leary McCann
b02df61eb9
Add test for crossing spans
...
This should maybe go elsewhere?
2021-06-28 18:21:00 +09:00
Paul O'Leary McCann
4f377d8de8
Fix bug in crossing span detection
2021-06-28 18:20:33 +09:00
Paul O'Leary McCann
23344857b9
Remove unused function
2021-06-28 18:19:43 +09:00
Paul O'Leary McCann
5c98c4c3b9
Probably fix pw prod backprop
...
I think this change is correct, but intuition doesn't really help
here...
2021-06-17 21:23:00 +09:00
Paul O'Leary McCann
ccf561112a
Remove old comments
2021-06-17 21:22:17 +09:00
Paul O'Leary McCann
a62121e3b4
Expose more hyperparameters
2021-06-17 21:21:46 +09:00
Paul O'Leary McCann
848fd102e7
Small fix
2021-06-17 21:19:38 +09:00
Paul O'Leary McCann
fce804a79f
Minor optimization
2021-06-17 21:10:46 +09:00
Paul O'Leary McCann
cb2364cf83
Fix type of mask
...
The call here was creating a float64 array, which was turning many
downstream scores into float64s. Later on these values were assigned to
a float32 array in backprop, and numerical underflow caused things to go
to zero.
That's almost certainly not the only reason things go to zero, but it is
incorrect.
2021-06-17 17:56:00 +09:00
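The dtype problem described in cb2364cf83 can be reproduced in plain NumPy. This is a sketch with made-up array names and values, not the code from the commit.

```python
import numpy as np

scores = np.ones((2, 2), dtype="float32")
mask64 = np.ones((2, 2))                   # NumPy defaults to float64
mask32 = np.ones((2, 2), dtype="float32")  # explicit dtype keeps float32

promoted = scores * mask64  # a float64 operand promotes the result to float64
kept = scores * mask32      # the result stays float32

# A value that fits in float64 but is far below the float32 range.
tiny = np.array([1e-60])
grad = np.zeros(1, dtype="float32")
grad[:] = tiny              # cast on assignment: the value underflows to zero
```

Passing an explicit dtype when creating the mask keeps everything downstream in float32.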
Paul O'Leary McCann
8452d117ef
Fix typo, remove old comment
2021-06-13 19:42:55 +09:00
Paul O'Leary McCann
96be7e8858
Change topk to sort descending
...
Shouldn't change correctness but is a little clearer
2021-06-13 19:42:24 +09:00
Paul O'Leary McCann
d71198ed36
Replace squeeze with flatten
...
At a few points in the code it's normal to get a "2d" array where each
row is a single entry. Calling squeeze will make that a proper 1d
array... unless it's just one entry, in which case it turns into a 0d
scalar. That's not what we want; flatten() provides the desired
behavior.
2021-06-12 19:48:01 +09:00
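The squeeze/flatten difference from d71198ed36 is easy to check in NumPy. The arrays below are illustrative only.

```python
import numpy as np

many = np.array([[1], [2], [3]])  # "2d" array, one entry per row
one = np.array([[7]])             # same layout, but only a single row

squeezed_many = many.squeeze()    # shape (3,): a proper 1d array
squeezed_one = one.squeeze()      # shape (): collapses to a 0d scalar
flat_one = one.flatten()          # shape (1,): still a 1d array
```

`flatten()` always returns a 1d array, so code that iterates over or indexes the result behaves the same whether there is one row or many.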
Paul O'Leary McCann
e728b0e45d
Silence warning
2021-06-12 19:31:35 +09:00
Paul O'Leary McCann
7efbc721a1
Don't use is_sentenced
2021-06-12 19:29:27 +09:00
Paul O'Leary McCann
67d9ebc922
Transpose before calculating loss
2021-06-04 17:56:08 +09:00
Paul O'Leary McCann
18444fccd9
Remove old comment
2021-06-04 17:56:08 +09:00
Paul O'Leary McCann
4a4ef72191
Clean up unused functions
...
`make_clean_doc` is not needed and was removed.
`logsumexp` may be needed if I misunderstood the loss calculation, so I
left it in for now with a note.
2021-06-02 21:42:23 +09:00
svlandeg
0aa1083ce8
avoid repetitive entities in the output
2021-05-28 16:52:51 +02:00
svlandeg
0d81bce9cc
add failing test for too short a sentence
2021-05-28 15:10:35 +02:00
svlandeg
0f5c586e2f
add basic tests for debugging
2021-05-28 14:19:55 +02:00
svlandeg
391b512afd
fix types of fwd functions
2021-05-27 16:36:46 +02:00
svlandeg
04b55bf054
removing unused imports
2021-05-27 16:31:38 +02:00
svlandeg
910026582d
set versions to v1 instead of v0
2021-05-27 16:17:20 +02:00
svlandeg
2e3c0e2256
delete outdated tests
2021-05-27 13:54:31 +02:00
svlandeg
ba2e491cc4
Merge remote-tracking branch 'upstream/master' into feature/coref
2021-05-27 13:50:32 +02:00
Sofie Van Landeghem
3c58c0323f
fix docs (#8200)
2021-05-27 10:48:59 +02:00
Sofie Van Landeghem
290bd6ed39
ensure tolerance is properly passed on (#8158)
2021-05-27 18:10:28 +10:00
Paul O'Leary McCann
0c553ecd4e
Fix docs (fix #8189)
2021-05-24 19:47:30 +09:00
Paul O'Leary McCann
a484245f35
Remove references to coref_er
2021-05-24 19:08:45 +09:00
Paul O'Leary McCann
d6389b133d
Don't use a generator for no reason
2021-05-24 19:06:15 +09:00
Paul O'Leary McCann
d6fd5fe1c0
Minor cleanup
2021-05-24 14:56:43 +09:00
Paul O'Leary McCann
0942a0b51b
Remove coref_er.py
...
The intent of this was that it would be a component pipeline that used
entities as input, but that's now covered by the get_mentions function
as a pipeline arg.
2021-05-21 18:20:25 +09:00
Paul O'Leary McCann
f6652c9252
Add new coref scoring
...
This is closer to the traditional evaluation method. That method uses an
average of three scores; this just uses the bcubed metric for now
(nothing special about bcubed, just picked one).
The scoring implementation comes from the coval project. It relies on
scipy, which is one issue, and is rather involved, which is another.
Besides being comparable with traditional evaluations, this scoring is
relatively fast.
2021-05-21 15:56:40 +09:00
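For readers unfamiliar with the bcubed metric mentioned in f6652c9252, here is a simplified textbook-style sketch. It assumes gold and predicted clusterings cover exactly the same mentions, and it is not the coval implementation, which handles more cases. The function name and the cluster representation (lists of sets of mention ids) are assumptions.

```python
def b_cubed(gold_clusters, pred_clusters):
    """Simplified B-cubed precision, recall and F1 over a shared mention set."""
    # Map every mention to the cluster that contains it.
    gold_of = {m: frozenset(c) for c in gold_clusters for m in c}
    pred_of = {m: frozenset(c) for c in pred_clusters for m in c}
    if set(gold_of) != set(pred_of):
        raise ValueError("this sketch assumes identical mention sets")

    precision = 0.0
    recall = 0.0
    for mention in gold_of:
        overlap = len(gold_of[mention] & pred_of[mention])
        precision += overlap / len(pred_of[mention])
        recall += overlap / len(gold_of[mention])

    # Average the per-mention scores.
    n = len(gold_of)
    precision /= n
    recall /= n
    f1 = 2 * precision * recall / (precision + recall)
    return precision, recall, f1

gold = [{1, 2, 3}, {4, 5}]
pred = [{1, 2}, {3, 4, 5}]
p, r, f = b_cubed(gold, pred)
```

The score is computed per mention and then averaged, so larger clusters contribute in proportion to the number of mentions they contain.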
Paul O'Leary McCann
e1b4a85bb9
Fix loss
...
The loss was being returned as a single element array, which caused
training to die when it attempted to turn it into JSON.
2021-05-21 15:46:50 +09:00
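The failure mode in e1b4a85bb9 can be shown with the standard-library `json` module as a stand-in for whatever serializer the training loop uses (an assumption). The values are made up.

```python
import json
import numpy as np

loss = np.array([0.25], dtype="float32")  # loss as a single-element array

try:
    json.dumps({"loss": loss})
    serializable = True
except TypeError:
    # ndarray objects are not JSON serializable by default.
    serializable = False

# Converting to a plain Python float makes the value serializable.
fixed = json.dumps({"loss": float(loss[0])})
```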
Paul O'Leary McCann
ff3fed06cf
Catch a stray reference
2021-05-20 21:30:46 +09:00
Sofie Van Landeghem
202943bc8c
KB & NEL to/from bytes (#8113)
...
* unit test for pickling KB
* add pickling test for NEL
* KB to_bytes and from_bytes
* NEL to_bytes and from_bytes
* xfail pickle tests for now
* fix docs
* cleanup
2021-05-20 18:11:30 +10:00
Paul O'Leary McCann
8c5df622d8
Help out python gc in coref backprop
2021-05-20 16:40:55 +09:00
Paul O'Leary McCann
fa92daf052
Break pairwise operations into pseudolayers
...
This makes their scope tighter and more contained, and has the nice side
effect that fewer things need to be passed around for backprop.
2021-05-20 15:59:51 +09:00
Adriane Boyd
f6128c06b0
Disable GPU CI tests (#8143)
2021-05-19 12:00:07 +02:00
Paul O'Leary McCann
d22acee4f7
Fix backprop
...
Training seems to actually run now!
2021-05-18 20:09:27 +09:00
Paul O'Leary McCann
2486b8ad4d
Fix pipeline initialize
2021-05-18 19:56:27 +09:00
Paul O'Leary McCann
0620820857
Deal with generators in tuplify
2021-05-18 19:55:52 +09:00
Paul O'Leary McCann
a7d9c8156d
Make get_sentence_map work with init
...
When sentences are not available, just treat the whole doc as one
sentence. This is a reasonable general fallback, and it matters for the
init call, where upstream components aren't run.
2021-05-18 19:54:54 +09:00
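The fallback described in a7d9c8156d can be sketched in plain Python. The function below is hypothetical: it takes a token count and an optional list of sentence-start indices rather than a real doc object, and it is not the project's `get_sentence_map`.

```python
from typing import List, Optional

def sentence_map(n_tokens: int, sent_starts: Optional[List[int]] = None) -> List[int]:
    """Hypothetical helper: map each token index to a sentence index."""
    if not sent_starts:
        # No sentence information: treat the whole doc as one sentence.
        return [0] * n_tokens
    starts = set(sent_starts)
    mapping = []
    sent_id = -1
    for i in range(n_tokens):
        # The first token always opens a sentence.
        if i == 0 or i in starts:
            sent_id += 1
        mapping.append(sent_id)
    return mapping

with_sents = sentence_map(5, [0, 3])
without_sents = sentence_map(5)
```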
Paul O'Leary McCann
883c137b26
Add basic tuplify init
2021-05-18 19:53:59 +09:00
Paul O'Leary McCann
051715506e
Fiddle with get_mentions definition
...
Ended up not making a difference, but oh well.
2021-05-18 19:53:33 +09:00
Adriane Boyd
06324e5a5e
Update pydantic requirements (#8127)
...
Update pydantic requirements following
https://github.com/explosion/thinc/pull/499
2021-05-18 11:35:50 +02:00