| svlandeg | 9abbd0899f | separate entity encoder to get 64D descriptions | 2019-06-05 00:09:46 +02:00 |
| svlandeg | fb37cdb2d3 | implementing el pipe in pipes.pyx (not tested yet) | 2019-06-03 21:32:54 +02:00 |
| svlandeg | d83a1e3052 | Merge branch 'master' into feature/nel-wiki | 2019-06-03 09:35:10 +02:00 |
| svlandeg | 9e88763dab | 60% acc run | 2019-06-03 08:04:49 +02:00 |
| svlandeg | 268a52ead7 | experimenting with cosine sim for negative examples (not OK yet) | 2019-05-29 16:07:53 +02:00 |
| svlandeg | a761929fa5 | context encoder combining sentence and article | 2019-05-28 18:14:49 +02:00 |
| svlandeg | 992fa92b66 | refactor again to clusters of entities and cosine similarity | 2019-05-28 00:05:22 +02:00 |
| svlandeg | 8c4aa076bc | small fixes | 2019-05-27 14:29:38 +02:00 |
| svlandeg | cfc27d7ff9 | using Tok2Vec instead | 2019-05-26 23:39:46 +02:00 |
| svlandeg | abf9af81c9 | learn rate en epochs | 2019-05-24 22:04:25 +02:00 |
| svlandeg | 86ed771e0b | adding local sentence encoder | 2019-05-23 16:59:11 +02:00 |
| svlandeg | 4392c01b7b | obtain sentence for each mention | 2019-05-23 15:37:05 +02:00 |
| svlandeg | 97241a3ed7 | upsampling and batch processing | 2019-05-22 23:40:10 +02:00 |
| svlandeg | 1a16490d20 | update per entity | 2019-05-22 12:46:40 +02:00 |
| svlandeg | eb08bdb11f | hidden with for encoders | 2019-05-21 23:42:46 +02:00 |
| svlandeg | 7b13e3d56f | undersampling negatives | 2019-05-21 18:35:10 +02:00 |
| svlandeg | 2fa3fac851 | fix concat bp and more efficient batch calls | 2019-05-21 13:43:59 +02:00 |
| svlandeg | 0a15ee4541 | fix in bp call | 2019-05-20 23:54:55 +02:00 |
| svlandeg | 89e322a637 | small fixes | 2019-05-20 17:20:39 +02:00 |
| svlandeg | 7edb2e1711 | fix convolution layer | 2019-05-20 11:58:48 +02:00 |
| svlandeg | dd691d0053 | debugging | 2019-05-17 17:44:11 +02:00 |
| svlandeg | 400b19353d | simplify architecture and larger-scale test runs | 2019-05-17 01:51:18 +02:00 |
| svlandeg | d51bffe63b | clean up code | 2019-05-16 18:36:15 +02:00 |
| svlandeg | b5470f3d75 | various tests, architectures and experiments | 2019-05-16 18:25:34 +02:00 |
| svlandeg | 9ffe5437ae | calculate gradient for entity encoding | 2019-05-15 02:23:08 +02:00 |
| svlandeg | 2713abc651 | implement loss function using dot product and prob estimate per candidate cluster | 2019-05-14 22:55:56 +02:00 |
| svlandeg | 09ed446b20 | different architecture / settings | 2019-05-14 08:37:52 +02:00 |
| svlandeg | 4142e8dd1b | train and predict per article (saving time for doc encoding) | 2019-05-13 17:02:34 +02:00 |
| svlandeg | 3b81b00954 | evaluating on dev set during training | 2019-05-13 14:26:04 +02:00 |
| svlandeg | b6d788064a | some first experiments with different architectures and metrics | 2019-05-10 12:53:14 +02:00 |
| svlandeg | 9d089c0410 | grouping clusters of instances per doc+mention | 2019-05-09 18:11:49 +02:00 |
| svlandeg | c6ca8649d7 | first stab at model - not functional yet | 2019-05-09 17:23:19 +02:00 |
| svlandeg | 9f33732b96 | using entity descriptions and article texts as input embedding vectors for training | 2019-05-07 16:03:42 +02:00 |
| svlandeg | 7e348d7f7f | baseline evaluation using highest-freq candidate | 2019-05-06 15:13:50 +02:00 |
| Ines Montani | dd153b2b33 | Simplify helper (see #3681) [ci skip] | 2019-05-06 15:13:10 +02:00 |
| Ines Montani | f8fce6c03c | Fix typo (see #3681) | 2019-05-06 15:02:11 +02:00 |
| Ines Montani | f2a56c1b56 | Rewrite example to use Retokenizer (resolves #3681) Also add helper to filter spans | 2019-05-06 14:51:18 +02:00 |
| svlandeg | 6961215578 | refactor code to separate functionality into different files | 2019-05-06 10:56:56 +02:00 |
| svlandeg | f5190267e7 | run only 100M of WP data as training dataset (9%) | 2019-05-03 18:09:09 +02:00 |
| svlandeg | 4e929600e5 | fix WP id parsing, speed up processing and remove ambiguous strings in one doc (for now) | 2019-05-03 17:37:47 +02:00 |
| svlandeg | 34600c92bd | try catch per article to ensure the pipeline goes on | 2019-05-03 15:10:09 +02:00 |
| svlandeg | bbcb9da466 | creating training data with clean WP texts and QID entities true/false | 2019-05-03 10:44:29 +02:00 |
| svlandeg | cba9680d13 | run NER on clean WP text and link to gold-standard entity IDs | 2019-05-02 17:24:52 +02:00 |
| svlandeg | 581dc9742d | parsing clean text from WP articles to use as input data for NER and NEL | 2019-05-02 17:09:56 +02:00 |
| svlandeg | 8353552191 | cleanup | 2019-05-01 23:26:16 +02:00 |
| svlandeg | 1ae41daaa9 | allow small rounding errors | 2019-05-01 23:05:40 +02:00 |
| svlandeg | 3629a52ede | reading all persons in wikidata | 2019-05-01 01:00:59 +02:00 |
| svlandeg | 60b54ae8ce | bulk entity writing and experiment with regex wikidata reader to speed up processing | 2019-05-01 00:00:38 +02:00 |
| svlandeg | 653b7d9c87 | calculate entity raw counts offline to speed up KB construction | 2019-04-30 11:39:42 +02:00 |
| svlandeg | 19e8f339cb | deduce entity freq from WP corpus and serialize vocab in WP test | 2019-04-29 17:37:29 +02:00 |