svlandeg 
							
						 
					 
					
						
						
						
						
							
						
						
							c6ca8649d7 
							
						 
					 
					
						
						
							
							first stab at model - not functional yet  
						
						
						
					 
					
						2019-05-09 17:23:19 +02:00 
						 
				 
			
				
					
						
							
							
								svlandeg 
							
						 
					 
					
						
						
						
						
							
						
						
							9f33732b96 
							
						 
					 
					
						
						
							
							using entity descriptions and article texts as input embedding vectors for training  
						
						
						
					 
					
						2019-05-07 16:03:42 +02:00 
						 
				 
			
				
					
						
							
							
								svlandeg 
							
						 
					 
					
						
						
						
						
							
						
						
							7e348d7f7f 
							
						 
					 
					
						
						
							
							baseline evaluation using highest-freq candidate  
						
						
						
					 
					
						2019-05-06 15:13:50 +02:00 
						 
				 
			
				
					
						
							
							
								svlandeg 
							
						 
					 
					
						
						
						
						
							
						
						
							6961215578 
							
						 
					 
					
						
						
							
							refactor code to separate functionality into different files  
						
						
						
					 
					
						2019-05-06 10:56:56 +02:00 
						 
				 
			
				
					
						
							
							
								svlandeg 
							
						 
					 
					
						
						
						
						
							
						
						
							f5190267e7 
							
						 
					 
					
						
						
							
							run only 100M of WP data as training dataset (9%)  
						
						
						
					 
					
						2019-05-03 18:09:09 +02:00 
						 
				 
			
				
					
						
							
							
								svlandeg 
							
						 
					 
					
						
						
						
						
							
						
						
							4e929600e5 
							
						 
					 
					
						
						
							
							fix WP id parsing, speed up processing and remove ambiguous strings in one doc (for now)  
						
						
						
					 
					
						2019-05-03 17:37:47 +02:00 
						 
				 
			
				
					
						
							
							
								svlandeg 
							
						 
					 
					
						
						
						
						
							
						
						
							34600c92bd 
							
						 
					 
					
						
						
							
							try catch per article to ensure the pipeline goes on  
						
						
						
					 
					
						2019-05-03 15:10:09 +02:00 
						 
				 
			
				
					
						
							
							
								svlandeg 
							
						 
					 
					
						
						
						
						
							
						
						
							bbcb9da466 
							
						 
					 
					
						
						
							
							creating training data with clean WP texts and QID entities true/false  
						
						
						
					 
					
						2019-05-03 10:44:29 +02:00 
						 
				 
			
				
					
						
							
							
								svlandeg 
							
						 
					 
					
						
						
						
						
							
						
						
							cba9680d13 
							
						 
					 
					
						
						
							
							run NER on clean WP text and link to gold-standard entity IDs  
						
						
						
					 
					
						2019-05-02 17:24:52 +02:00 
						 
				 
			
				
					
						
							
							
								svlandeg 
							
						 
					 
					
						
						
						
						
							
						
						
							581dc9742d 
							
						 
					 
					
						
						
							
							parsing clean text from WP articles to use as input data for NER and NEL  
						
						
						
					 
					
						2019-05-02 17:09:56 +02:00 
						 
				 
			
				
					
						
							
							
								svlandeg 
							
						 
					 
					
						
						
						
						
							
						
						
							8353552191 
							
						 
					 
					
						
						
							
							cleanup  
						
						
						
					 
					
						2019-05-01 23:26:16 +02:00 
						 
				 
			
				
					
						
							
							
								svlandeg 
							
						 
					 
					
						
						
						
						
							
						
						
							1ae41daaa9 
							
						 
					 
					
						
						
							
							allow small rounding errors  
						
						
						
					 
					
						2019-05-01 23:05:40 +02:00 
						 
				 
			
				
					
						
							
							
								svlandeg 
							
						 
					 
					
						
						
						
						
							
						
						
							3629a52ede 
							
						 
					 
					
						
						
							
							reading all persons in wikidata  
						
						
						
					 
					
						2019-05-01 01:00:59 +02:00 
						 
				 
			
				
					
						
							
							
								svlandeg 
							
						 
					 
					
						
						
						
						
							
						
						
							60b54ae8ce 
							
						 
					 
					
						
						
							
							bulk entity writing and experiment with regex wikidata reader to speed up processing  
						
						
						
					 
					
						2019-05-01 00:00:38 +02:00 
						 
				 
			
				
					
						
							
							
								svlandeg 
							
						 
					 
					
						
						
						
						
							
						
						
							653b7d9c87 
							
						 
					 
					
						
						
							
							calculate entity raw counts offline to speed up KB construction  
						
						
						
					 
					
						2019-04-30 11:39:42 +02:00 
						 
				 
			
				
					
						
							
							
								svlandeg 
							
						 
					 
					
						
						
						
						
							
						
						
							19e8f339cb 
							
						 
					 
					
						
						
							
							deduce entity freq from WP corpus and serialize vocab in WP test  
						
						
						
					 
					
						2019-04-29 17:37:29 +02:00 
						 
				 
			
				
					
						
							
							
								svlandeg 
							
						 
					 
					
						
						
						
						
							
						
						
							387263d618 
							
						 
					 
					
						
						
							
							simplify chains  
						
						
						
					 
					
						2019-04-29 13:58:07 +02:00 
						 
				 
			
				
					
						
							
							
								svlandeg 
							
						 
					 
					
						
						
						
						
							
						
						
							54d0cea062 
							
						 
					 
					
						
						
							
							unit test for KB serialization  
						
						
						
					 
					
						2019-04-24 23:52:34 +02:00 
						 
				 
			
				
					
						
							
							
								svlandeg 
							
						 
					 
					
						
						
						
						
							
						
						
							3e0cb69065 
							
						 
					 
					
						
						
							
							KB aliases to and from file  
						
						
						
					 
					
						2019-04-24 20:24:24 +02:00 
						 
				 
			
				
					
						
							
							
								svlandeg 
							
						 
					 
					
						
						
						
						
							
						
						
							ad6c5e581c 
							
						 
					 
					
						
						
							
							writing and reading number of entries to/from header  
						
						
						
					 
					
						2019-04-24 15:31:44 +02:00 
						 
				 
			
				
					
						
							
							
								svlandeg 
							
						 
					 
					
						
						
						
						
							
						
						
							6e3223f234 
							
						 
					 
					
						
						
							
							bulk loading in proper order of entity indices  
						
						
						
					 
					
						2019-04-24 11:26:38 +02:00 
						 
				 
			
				
					
						
							
							
								svlandeg 
							
						 
					 
					
						
						
						
						
							
						
						
							694fea597a 
							
						 
					 
					
						
						
							
							dumping all entryC entries + (inefficient) reading back in  
						
						
						
					 
					
						2019-04-23 18:36:50 +02:00 
						 
				 
			
				
					
						
							
							
								svlandeg 
							
						 
					 
					
						
						
						
						
							
						
						
							8e70a564f1 
							
						 
					 
					
						
						
							
							custom reader and writer for _EntryC fields (first stab at it - not complete)  
						
						
						
					 
					
						2019-04-23 16:33:40 +02:00 
						 
				 
			
				
					
						
							
							
								svlandeg 
							
						 
					 
					
						
						
						
						
							
						
						
							004e5e7d1c 
							
						 
					 
					
						
						
							
							little fixes  
						
						
						
					 
					
						2019-04-19 14:24:02 +02:00 
						 
				 
			
				
					
						
							
							
								svlandeg 
							
						 
					 
					
						
						
						
						
							
						
						
							9a8197185b 
							
						 
					 
					
						
						
							
							fix alias capitalization  
						
						
						
					 
					
						2019-04-18 22:37:50 +02:00 
						 
				 
			
				
					
						
							
							
								svlandeg 
							
						 
					 
					
						
						
						
						
							
						
						
							9f308eb5dc 
							
						 
					 
					
						
						
							
							fixes for prior prob and linking wikidata IDs with wikipedia titles  
						
						
						
					 
					
						2019-04-18 16:14:25 +02:00 
						 
				 
			
				
					
						
							
							
								svlandeg 
							
						 
					 
					
						
						
						
						
							
						
						
							10ee8dfea2 
							
						 
					 
					
						
						
							
							poc with few entities and collecting aliases from the WP links  
						
						
						
					 
					
						2019-04-18 14:12:17 +02:00 
						 
				 
			
				
					
						
							
							
								svlandeg 
							
						 
					 
					
						
						
						
						
							
						
						
							6763e025e1 
							
						 
					 
					
						
						
							
							parse wp dump for links to determine prior probabilities  
						
						
						
					 
					
						2019-04-15 11:41:57 +02:00 
						 
				 
			
				
					
						
							
							
								svlandeg 
							
						 
					 
					
						
						
						
						
							
						
						
							3163331b1e 
							
						 
					 
					
						
						
							
							wikipedia dump parser and mediawiki format regex cleanup  
						
						
						
					 
					
						2019-04-14 21:52:01 +02:00 
						 
				 
			
				
					
						
							
							
								svlandeg 
							
						 
					 
					
						
						
						
						
							
						
						
							b31a390a9a 
							
						 
					 
					
						
						
							
							reading types, claims and sitelinks  
						
						
						
					 
					
						2019-04-11 21:42:44 +02:00 
						 
				 
			
				
					
						
							
							
								svlandeg 
							
						 
					 
					
						
						
						
						
							
						
						
							6e997be4b4 
							
						 
					 
					
						
						
							
							reading wikidata descriptions and aliases  
						
						
						
					 
					
						2019-04-11 21:08:22 +02:00 
						 
				 
			
				
					
						
							
							
								svlandeg 
							
						 
					 
					
						
						
						
						
							
						
						
							9a7d534b1b 
							
						 
					 
					
						
						
							
							enable nogil for cython functions in kb.pxd  
						
						
						
					 
					
						2019-04-10 17:25:10 +02:00 
						 
				 
			
				
					
						
							
							
								svlandeg 
							
						 
					 
					
						
						
						
						
							
						
						
							61a33f55d2 
							
						 
					 
					
						
						
							
							little fixes  
						
						
						
					 
					
						2019-04-10 16:06:09 +02:00 
						 
				 
			
				
					
						
							
							
								Ines Montani 
							
						 
					 
					
						
						
						
						
							
						
						
							6ae3b5699e 
							
						 
					 
					
						
						
							
							Make sure path is string (resolves #3546)
						
						
						
					 
					
						2019-04-08 12:53:41 +02:00 
						 
				 
			
				
					
						
							
							
								Ines Montani 
							
						 
					 
					
						
						
						
						
							
						
						
							d0f5e015cb 
							
						 
					 
					
						
						
							
							Auto-format  
						
						
						
					 
					
						2019-04-08 12:53:16 +02:00 
						 
				 
			
				
					
						
							
							
								pierremonico 
							
						 
					 
					
						
						
						
						
							
						
						
							0d26bfe677 
							
						 
					 
					
						
						
							
							Removes duplicate in table (#3550)
						
						
						
						
						* Removes duplicate in table
Just fixing typos.
* Remove newline
Co-authored-by: Ines Montani <ines@ines.io> 
						
					 
					
						2019-04-08 10:30:42 +02:00 
						 
				 
			
				
					
						
							
							
								Piero Molino 
							
						 
					 
					
						
						
						
						
							
						
						
							5198aa4ae6 
							
						 
					 
					
						
						
							
							Added Ludwig among the projects (#3548) [ci skip]
						
						
						
						
						* Added Ludwig among the projects
* Create w4nderlust.md
* Add Uber to logo wall 
						
					 
					
						2019-04-07 13:01:26 +02:00 
						 
				 
			
				
					
						
							
							
								Dobita21 
							
						 
					 
					
						
						
						
						
							
						
						
							8bf6967eb7 
							
						 
					 
					
						
						
							
							Update Thai stop words (#3545)
						
						
						
						
						* test sPacy commit to git fri 04052019 10:54
* change Data format from my format to master format
* ทัทั้งนี้ ---> ทั้งนี้
* delete stop_word translate from Eng
* Adjust formatting and readability 
						
					 
					
						2019-04-05 12:06:38 +02:00 
						 
				 
			
				
					
						
							
							
								jeannefukumaru 
							
						 
					 
					
						
						
						
						
							
						
						
							f67d881b30 
							
						 
					 
					
						
						
							
							fix typos in tag_map flagged by python -m debug-data (#3542)
						
						
						
						
Co-authored-by: Ines Montani <ines@ines.io> 
						
					 
					
						2019-04-05 12:06:09 +02:00 
						 
				 
			
				
					
						
							
							
								Ines Montani 
							
						 
					 
					
						
						
							
							
						
						
						
							
						
						
							cd21778bef 
							
						 
					 
					
						
						
							
							Merge pull request #3539 from jeannefukumaru/master
						
						
						
						
						Added tags previously missing from Indonesian `tag_map.py` 
						
					 
					
						2019-04-04 11:57:03 +02:00 
						 
				 
			
				
					
						
							
							
								Jeanne Choo 
							
						 
					 
					
						
						
						
						
							
						
						
							b6c9807431 
							
						 
					 
					
						
						
							
							Merge remote-tracking branch 'upstream/master'  
						
						
						
					 
					
						2019-04-04 14:21:50 +08:00 
						 
				 
			
				
					
						
							
							
								Jeanne Choo 
							
						 
					 
					
						
						
						
						
							
						
						
							80e15af76c 
							
						 
					 
					
						
						
							
							fixed tag_map.py merge conflict  
						
						
						
					 
					
						2019-04-04 14:18:27 +08:00 
						 
				 
			
				
					
						
							
							
								jeannefukumaru 
							
						 
					 
					
						
						
							
							
						
						
						
							
						
						
							eba4f77526 
							
						 
					 
					
						
						
							
							Merge pull request #2 from jeannefukumaru/update_indonesian_tag_map
						
						
						
						
						updated tag map with missing tags 
						
					 
					
						2019-04-04 06:49:04 +08:00 
						 
				 
			
				
					
						
							
							
								jeannefukumaru 
							
						 
					 
					
						
						
						
						
							
						
						
							876ce01567 
							
						 
					 
					
						
						
							
							updated tag map with missing tags  
						
						
						
					 
					
						2019-04-03 23:09:11 +08:00 
						 
				 
			
				
					
						
							
							
								jeannefukumaru 
							
						 
					 
					
						
						
							
							
						
						
						
							
						
						
							99e04c4ce2 
							
						 
					 
					
						
						
							
							Merge pull request #1 from jeannefukumaru/added-indonesian-tag-map
						
						
						
						
						Added indonesian tag map 
						
					 
					
						2019-04-03 23:05:05 +08:00 
						 
				 
			
				
					
						
							
							
								Ines Montani 
							
						 
					 
					
						
						
							
							
						
						
						
							
						
						
							4faf62d515 
							
						 
					 
					
						
						
							
							Merge pull request #3530 from svlandeg/fix/issue_3521
						
						
						
						
						Allow English stopwords with any type of apostrophe 
						
					 
					
						2019-04-03 14:14:03 +02:00 
						 
				 
			
				
					
						
							
							
								Yves Peirsman 
							
						 
					 
					
						
						
						
						
							
						
						
							951825532c 
							
						 
					 
					
						
						
							
							Improved Dutch language resources and Dutch lemmatization (#3409)
						
						
						
						
						* Improved Dutch language resources and Dutch lemmatization
* Fix conftest
* Update punctuation.py
* Auto-format
* Format and fix tests
* Remove unused test file
* Re-add deleted test
* removed redundant infix regex pattern for ','; note: brackets + simple hyphen remains
* Cleaner lemmatization files 
						
					 
					
						2019-04-03 14:13:26 +02:00 
						 
				 
			
				
					
						
							
							
								svlandeg 
							
						 
					 
					
						
						
						
						
							
						
						
							4ff786e113 
							
						 
					 
					
						
						
							
							addressed all comments by Ines  
						
						
						
					 
					
						2019-04-03 13:50:33 +02:00 
						 
				 
			
				
					
						
							
							
								Ines Montani 
							
						 
					 
					
						
						
						
						
							
						
						
							6a4575a56c 
							
						 
					 
					
						
						
							
							Don't make "settings" or "title" required in displaCy data (closes #3531)
						
						
						
					 
					
						2019-04-03 10:13:16 +02:00 
						 
				 
			
				
					
						
							
							
								Ines Montani 
							
						 
					 
					
						
						
						
						
							
						
						
							2f0f439c54 
							
						 
					 
					
						
						
							
							Remove non-existent example (closes #3533)
						
						
						
					 
					
						2019-04-03 09:59:17 +02:00