Commit Graph

123 Commits

Author | SHA1 | Message | Date
svlandeg | cd6c263fe4 | format offsets | 2019-07-23 11:31:29 +02:00
svlandeg | 9f8c1e71a2 | fix for Issue #4000 | 2019-07-22 13:34:12 +02:00
svlandeg | dae8a21282 | rename entity frequency | 2019-07-19 17:40:28 +02:00
svlandeg | 21176517a7 | have gold.links correspond exactly to doc.ents | 2019-07-19 12:36:15 +02:00
svlandeg | e1213eaf6a | use original gold object in get_loss function | 2019-07-18 13:35:10 +02:00
svlandeg | ec55d2fccd | filter training data beforehand (+black formatting) | 2019-07-18 10:22:24 +02:00
svlandeg | b7a0c9bf60 | fixing the context/prior weight settings | 2019-07-03 17:48:09 +02:00
svlandeg | 8840d4b1b3 | fix for context encoder optimizer | 2019-07-03 13:35:36 +02:00
svlandeg | 3420cbe496 | small fixes | 2019-07-03 10:25:51 +02:00
svlandeg | 2d2dea9924 | experiment with adding NER types to the feature vector | 2019-06-29 14:52:36 +02:00
svlandeg | c664f58246 | adding prior probability as feature in the model | 2019-06-28 16:22:58 +02:00
svlandeg | 1c80b85241 | fix tests | 2019-06-28 08:59:23 +02:00
svlandeg | 68a0662019 | context encoder with Tok2Vec + linking model instead of cosine | 2019-06-28 08:29:31 +02:00
svlandeg | dbc53b9870 | rename to KBEntryC | 2019-06-26 15:55:26 +02:00
svlandeg | 1de61f68d6 | improve speed of prediction loop | 2019-06-26 13:53:10 +02:00
svlandeg | bee23cd8af | try Tok2Vec instead of SpacyVectors | 2019-06-25 16:09:22 +02:00
svlandeg | b58bace84b | small fixes | 2019-06-24 10:55:04 +02:00
svlandeg | a31648d28b | further code cleanup | 2019-06-19 09:15:43 +02:00
svlandeg | 478305cd3f | small tweaks and documentation | 2019-06-18 18:38:09 +02:00
svlandeg | 0d177c1146 | clean up code, remove old code, move to bin | 2019-06-18 13:20:40 +02:00
svlandeg | ffae7d3555 | sentence encoder only (removing article/mention encoder) | 2019-06-18 00:05:47 +02:00
svlandeg | 6332af40de | baseline performances: oracle KB, random and prior prob | 2019-06-17 14:39:40 +02:00
svlandeg | 24db1392b9 | reprocessing all of wikipedia for training data | 2019-06-16 21:14:45 +02:00
svlandeg | 81731907ba | performance per entity type | 2019-06-14 19:55:46 +02:00
svlandeg | b312f2d0e7 | redo training data to be independent of KB and entity-level instead of doc-level | 2019-06-14 15:55:26 +02:00
svlandeg | 0b04d142de | regenerating KB | 2019-06-13 22:32:56 +02:00
svlandeg | 78dd3e11da | write entity linking pipe to file and keep vocab consistent between kb and nlp | 2019-06-13 16:25:39 +02:00
svlandeg | b12001f368 | small fixes | 2019-06-12 22:05:53 +02:00
svlandeg | 6521cfa132 | speeding up training | 2019-06-12 13:37:05 +02:00
svlandeg | 66813a1fdc | speed up predictions | 2019-06-11 14:18:20 +02:00
svlandeg | fe1ed432ef | eval on dev set, varying combo's of prior and context scores | 2019-06-11 11:40:58 +02:00
svlandeg | 83dc7b46fd | first tests with EL pipe | 2019-06-10 21:25:26 +02:00
svlandeg | 7de1ee69b8 | training loop in proper pipe format | 2019-06-07 15:55:10 +02:00
svlandeg | 0486ccabfd | introduce goldparse.links | 2019-06-07 13:54:45 +02:00
svlandeg | a5c061f506 | storing NEL training data in GoldParse objects | 2019-06-07 12:58:42 +02:00
svlandeg | 61f0e2af65 | code cleanup | 2019-06-06 20:22:14 +02:00
svlandeg | d8b435ceff | pretraining description vectors and storing them in the KB | 2019-06-06 19:51:27 +02:00
svlandeg | 5c723c32c3 | entity vectors in the KB + serialization of them | 2019-06-05 18:29:18 +02:00
svlandeg | 9abbd0899f | separate entity encoder to get 64D descriptions | 2019-06-05 00:09:46 +02:00
svlandeg | fb37cdb2d3 | implementing el pipe in pipes.pyx (not tested yet) | 2019-06-03 21:32:54 +02:00
svlandeg | 9e88763dab | 60% acc run | 2019-06-03 08:04:49 +02:00
svlandeg | 268a52ead7 | experimenting with cosine sim for negative examples (not OK yet) | 2019-05-29 16:07:53 +02:00
svlandeg | a761929fa5 | context encoder combining sentence and article | 2019-05-28 18:14:49 +02:00
svlandeg | 992fa92b66 | refactor again to clusters of entities and cosine similarity | 2019-05-28 00:05:22 +02:00
svlandeg | 8c4aa076bc | small fixes | 2019-05-27 14:29:38 +02:00
svlandeg | cfc27d7ff9 | using Tok2Vec instead | 2019-05-26 23:39:46 +02:00
svlandeg | abf9af81c9 | learn rate en epochs | 2019-05-24 22:04:25 +02:00
svlandeg | 86ed771e0b | adding local sentence encoder | 2019-05-23 16:59:11 +02:00
svlandeg | 4392c01b7b | obtain sentence for each mention | 2019-05-23 15:37:05 +02:00
svlandeg | 97241a3ed7 | upsampling and batch processing | 2019-05-22 23:40:10 +02:00