svlandeg
2713abc651
implement loss function using dot product and prob estimate per candidate cluster
2019-05-14 22:55:56 +02:00
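
The commit message above names the ingredients of the loss: a dot product between encodings and a probability estimate for each candidate in a mention's candidate cluster. Below is a minimal sketch of one such loss. It assumes a softmax over dot-product scores and a cross-entropy loss on the gold candidate; the commit does not say how the probabilities are normalized. The functions `candidate_probs` and `cluster_loss` and the plain-list vectors are hypothetical and are not the code from 2713abc651.

```python
import math
from typing import List, Sequence


def dot(u: Sequence[float], v: Sequence[float]) -> float:
    """Dot product of two equal-length vectors."""
    return sum(a * b for a, b in zip(u, v))


def candidate_probs(mention_vec: Sequence[float],
                    candidate_vecs: Sequence[Sequence[float]]) -> List[float]:
    """Hypothetical: softmax over dot-product scores, one probability per candidate."""
    scores = [dot(mention_vec, c) for c in candidate_vecs]
    top = max(scores)  # subtract the max score for numerical stability
    exps = [math.exp(s - top) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def cluster_loss(mention_vec: Sequence[float],
                 candidate_vecs: Sequence[Sequence[float]],
                 gold_index: int) -> float:
    """Hypothetical cross-entropy loss: negative log-probability of the gold candidate."""
    probs = candidate_probs(mention_vec, candidate_vecs)
    return -math.log(probs[gold_index])


if __name__ == "__main__":
    # Usage: one mention encoding scored against a cluster of two candidates.
    mention = [1.0, 0.0]
    candidates = [[2.0, 0.0], [0.0, 2.0]]
    print(candidate_probs(mention, candidates))
    print(cluster_loss(mention, candidates, gold_index=0))
```

Because the probabilities are normalized within each cluster, the loss only compares candidates for the same mention against each other.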
						 
				 
			
				
					
						
							
							
BreakBB
ed18a6efbd
Add check for callable to 'Language.replace_pipe' to fix #3737 (#3741)
2019-05-14 16:59:31 +02:00
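
The commit above adds a check that the component passed to `Language.replace_pipe` is callable. The sketch below shows the general shape of such a guard on a toy pipeline; `ToyPipeline` and its error messages are hypothetical and do not reproduce spaCy's implementation or error text.

```python
from typing import Callable, List, Tuple


class ToyPipeline:
    """Hypothetical stand-in for a pipeline of (name, component) pairs."""

    def __init__(self) -> None:
        self.pipeline: List[Tuple[str, Callable]] = []

    def add_pipe(self, component: Callable, name: str) -> None:
        self.pipeline.append((name, component))

    def replace_pipe(self, name: str, component: Callable) -> None:
        names = [n for n, _ in self.pipeline]
        if name not in names:
            raise ValueError(f"No component named {name!r}")
        # Guard: fail early with a clear message instead of storing a
        # non-callable object that would only break when a text is processed.
        if not callable(component):
            raise ValueError(
                f"Replacement for {name!r} must be callable, "
                f"got {type(component).__name__}"
            )
        self.pipeline[names.index(name)] = (name, component)
```

Checking at replacement time moves the error to the line that caused it, rather than to the first call that runs the pipeline.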
						 
				 
			
				
					
						
							
							
svlandeg
09ed446b20
different architecture / settings
2019-05-14 08:37:52 +02:00

svlandeg
4142e8dd1b
train and predict per article (saving time for doc encoding)
2019-05-13 17:02:34 +02:00

svlandeg
3b81b00954
evaluating on dev set during training
2019-05-13 14:26:04 +02:00

Ines Montani
8baff1c7c0
💫 Improve introspection of custom extension attributes (#3729)
* Add custom __dir__ to Underscore (see #3707)
* Make sure custom extension methods keep their docstrings (see #3707)
* Improve tests
* Prepend note on partial to docstring (see #3707)
* Remove print statement
* Handle cases where docstring is None
2019-05-12 00:53:11 +02:00

Ines Montani
f96af8526a
Merge branch 'spacy.io' [ci skip]
2019-05-11 23:03:56 +02:00

Matthew Honnibal
3aceeeaaeb
Set version to v2.1.4
2019-05-11 22:57:53 +02:00

Ines Montani
aea1c93a05
Replace cytoolz.partition_all with util.minibatch
2019-05-11 21:12:09 +02:00
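
The commit above swaps `cytoolz.partition_all` for spaCy's own `util.minibatch` when splitting data into batches. The generator below is a simplified, standard-library-only sketch of that kind of helper: it yields lists of at most `size` items, with a shorter final batch. The name `minibatch_sketch` is hypothetical; spaCy's real `util.minibatch` also accepts an iterator of batch sizes, which this sketch does not handle.

```python
import itertools
from typing import Iterable, Iterator, List, TypeVar

T = TypeVar("T")


def minibatch_sketch(items: Iterable[T], size: int = 8) -> Iterator[List[T]]:
    """Yield successive lists of at most `size` items from any iterable."""
    if size < 1:
        raise ValueError("size must be a positive integer")
    iterator = iter(items)
    while True:
        # islice pulls up to `size` items without materializing the rest.
        batch = list(itertools.islice(iterator, size))
        if not batch:  # the source iterable is exhausted
            return
        yield batch
```

Because it consumes the input lazily, the helper works on generators as well as lists.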
						 
				 
			
				
					
						
							
							
Ines Montani
0bf6441863
Fix .iob converter (closes #3620)
2019-05-11 19:15:26 +02:00

Matthew Honnibal
f6e9394aa5
Fix push-tag script
2019-05-11 19:04:35 +02:00

Matthew Honnibal
a5159ddcf5
Set version to v2.1.4.dev1
2019-05-11 19:03:51 +02:00

Ines Montani
7534f7cb44
Fix return value of Language.update (closes #3692)
2019-05-11 18:40:19 +02:00

Ines Montani
503b8c85f1
Add TWiML podcast to universe [ci skip]
2019-05-11 17:48:22 +02:00

Ines Montani
0daf2422a3
Auto-format
2019-05-11 17:48:07 +02:00

Ines Montani
6b3a79ac96
Call rmtree and copytree with strings (closes #3713)
2019-05-11 15:48:35 +02:00
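
The commit above converts paths to strings before calling `shutil.rmtree` and `shutil.copytree`. `shutil` accepts `pathlib.Path` objects only from Python 3.6 on (PEP 519), so passing `str(path)` keeps older interpreters working; that this was the cause of #3713 is an assumption. The helper `replace_dir` below is hypothetical and only illustrates the pattern.

```python
import shutil
import tempfile
from pathlib import Path


def replace_dir(src, dst) -> None:
    """Copy the directory tree at `src` to `dst`, deleting any existing `dst` first.

    Both paths are converted to str before they reach shutil, so callers may
    pass either strings or pathlib.Path objects.
    """
    src, dst = Path(src), Path(dst)
    if dst.exists():
        shutil.rmtree(str(dst))
    # copytree requires that the destination does not exist yet.
    shutil.copytree(str(src), str(dst))


if __name__ == "__main__":
    # Usage: copy a tiny directory tree inside a throwaway temp directory.
    with tempfile.TemporaryDirectory() as tmp:
        base = Path(tmp)
        (base / "model").mkdir()
        (base / "model" / "meta.json").write_text("{}")
        replace_dir(base / "model", base / "model-copy")
        print(sorted(p.name for p in (base / "model-copy").iterdir()))
```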
						 
				 
			
				
					
						
							
							
devforfu
21af12eb53
Make "text" key in JSONL format optional when "tokens" key is provided (#3721)
* Fix issue with forcing text key when it is not required
* Extending the docs to reflect the new behavior
2019-05-11 15:41:29 +02:00
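
The commit above makes the "text" key optional in JSONL input whenever a "tokens" key is present. A minimal sketch of a reader with that rule follows. The key names come from the commit message; the function name `iter_inputs`, the choice to prefer "tokens" when both keys exist, and the error handling are assumptions, and this is not spaCy's reader.

```python
import json
from typing import Iterable, Iterator, List, Union


def iter_inputs(lines: Iterable[str]) -> Iterator[Union[str, List[str]]]:
    """Yield pre-tokenized words if a record has "tokens", otherwise its "text".

    Hypothetical reader: "text" is only required when "tokens" is absent,
    and a record with neither key is rejected.
    """
    for line_no, line in enumerate(lines, start=1):
        if not line.strip():
            continue  # skip blank lines
        record = json.loads(line)
        if "tokens" in record:
            yield record["tokens"]
        elif "text" in record:
            yield record["text"]
        else:
            raise ValueError(f'Line {line_no}: expected a "text" or "tokens" key')
```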
						 
				 
			
				
					
						
							
							
Ines Montani
6cfa1e1f47
Fix DependencyParser.predict docs (resolves #3561)
2019-05-11 15:37:54 +02:00

Ines Montani
25f5592d57
Improve Token.prob and Lexeme.prob docs (resolves #3701)
2019-05-11 15:23:41 +02:00

Aaron Kub
719a15f23d
fixing regex matcher examples (#3708) (#3719)
2019-05-10 14:23:52 +02:00

Luca Dorigo
82d034f976
Update glossary.py to match information found in documentation (#3704) (closes ##3679)
* Update glossary.py to match information found in documentation
I used regexes to add any dependency tag that was in the documentation but not in the glossary. Solves #3679 👍
* Adds forgotten colon
2019-05-10 14:23:20 +02:00

Wannaphong Phatthiyaphaibun
5a14a13f64
fix thai bug (#3693)
fix tokenize for pythainlp
2019-05-10 14:21:34 +02:00

Luca Dorigo
2663f4133c
Submit contributor agreement (#3705)
2019-05-10 14:19:18 +02:00

Ines Montani
65b55f1aaa
Add version tag to --base-model argument (closes #3720)
2019-05-10 14:06:47 +02:00

svlandeg
b6d788064a
some first experiments with different architectures and metrics
2019-05-10 12:53:14 +02:00

svlandeg
9d089c0410
grouping clusters of instances per doc+mention
2019-05-09 18:11:49 +02:00
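
The commit above groups training instances into clusters keyed by document and mention. A minimal sketch of that grouping with `collections.defaultdict` follows. The instance layout `(doc_id, mention, candidate_id, is_gold)`, the function name and the IDs in the usage are hypothetical placeholders, not the commit's data structures.

```python
from collections import defaultdict
from typing import Dict, Iterable, List, Tuple

# Hypothetical instance layout: (doc_id, mention, candidate_id, is_gold)
Instance = Tuple[str, str, str, bool]
Cluster = List[Tuple[str, bool]]


def group_by_doc_mention(instances: Iterable[Instance]) -> Dict[Tuple[str, str], Cluster]:
    """Collect all candidate instances that share the same (doc_id, mention) key."""
    clusters: Dict[Tuple[str, str], Cluster] = defaultdict(list)
    for doc_id, mention, candidate_id, is_gold in instances:
        clusters[(doc_id, mention)].append((candidate_id, is_gold))
    return dict(clusters)


if __name__ == "__main__":
    # Usage with made-up placeholder IDs: the same surface string in two
    # documents yields two separate clusters.
    data = [("d1", "Paris", "E1", True), ("d1", "Paris", "E2", False),
            ("d2", "Paris", "E1", False)]
    for key, cluster in group_by_doc_mention(data).items():
        print(key, cluster)
```

Keying on the document as well as the mention string keeps the same surface form in different articles from being merged into one cluster.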
						 
				 
			
				
					
						
							
							
svlandeg
c6ca8649d7
first stab at model - not functional yet
2019-05-09 17:23:19 +02:00

richardpaulhudson
a1e07f0d14
Request to include Holmes in spaCy Universe (#3685)
* Request to add Holmes to spaCy Universe
Dear spaCy team, I would be grateful if you would consider my Python library Holmes for inclusion in the spaCy Universe. Holmes transforms the syntactic structures delivered by spaCy into semantic structures that, together with various other techniques including ontological matching and word embeddings, serve as the basis for information extraction. Holmes supports several use cases including chatbot, structured search, topic matching and supervised document classification. I had the basic idea for Holmes around 15 years ago and now spaCy has made it possible to build an implementation that is stable and fast enough to actually be of use - thank you! At present Holmes supports English and German (I am based in Munich) but could easily be extended to support any other language with a spaCy model.
* Added
2019-05-08 02:42:03 +02:00

Ines Montani
505c9e0e19
Add util.filter_spans helper (#3686)
2019-05-08 02:33:40 +02:00
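
The commit above adds `util.filter_spans`, which removes duplicate and overlapping spans so that, for example, each token belongs to at most one entity before merging. The sketch below works on plain `(start, end)` token-offset tuples rather than spaCy `Span` objects. It assumes the policy that longer spans win and, among equal lengths, the earlier span wins. The name `filter_spans_sketch` is hypothetical and this is not the library's code.

```python
from typing import Iterable, List, Set, Tuple

Span = Tuple[int, int]  # (start, end) token offsets, end exclusive


def filter_spans_sketch(spans: Iterable[Span]) -> List[Span]:
    """Drop duplicate and overlapping spans, preferring longer (then earlier) ones."""
    # Longest spans first; among equal lengths, the one that starts first.
    ordered = sorted(spans, key=lambda s: (s[1] - s[0], -s[0]), reverse=True)
    kept: List[Span] = []
    seen_tokens: Set[int] = set()
    for start, end in ordered:
        tokens = set(range(start, end))
        if tokens & seen_tokens:
            continue  # overlaps (or duplicates) a span that was already kept
        kept.append((start, end))
        seen_tokens |= tokens
    return sorted(kept)  # return the survivors in document order
```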
						 
				 
			
				
					
						
							
							
svlandeg
9f33732b96
using entity descriptions and article texts as input embedding vectors for training
2019-05-07 16:03:42 +02:00

F0rge1cE
dd1e6b0bc6
Fix offset bug in loading pre-trained word2vec. (#3689)
* Fix offset bug in loading pre-trained word2vec.
* add contributor agreement
2019-05-06 23:00:38 +02:00

Bram Vanroy
8e6f8deaf6
Re-added Universe readme (#3688) (closes #3680)
2019-05-06 21:08:01 +02:00

Ines Montani
78cb807a9a
Auto-format [ci skip]
2019-05-06 16:58:29 +02:00

svlandeg
7e348d7f7f
baseline evaluation using highest-freq candidate
2019-05-06 15:13:50 +02:00
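
The commit above evaluates a baseline that always picks the most frequent candidate for a mention. A minimal sketch of such a baseline and its accuracy follows. It assumes each example is a mapping from candidate ID to a prior count plus the gold ID, and it counts mentions without candidates as wrong. The function names and the placeholder IDs in the test are hypothetical.

```python
from typing import Dict, List, Optional, Tuple


def most_frequent_candidate(candidates: Dict[str, int]) -> Optional[str]:
    """Return the candidate ID with the highest count, or None if there are none.

    On ties, max() keeps the first candidate in the dict's insertion order.
    """
    if not candidates:
        return None
    return max(candidates, key=candidates.get)


def baseline_accuracy(examples: List[Tuple[Dict[str, int], str]]) -> float:
    """Fraction of mentions whose most frequent candidate equals the gold ID."""
    if not examples:
        return 0.0
    correct = sum(most_frequent_candidate(cands) == gold for cands, gold in examples)
    return correct / len(examples)
```

A frequency baseline like this ignores context entirely, which makes it a useful floor for judging whether a trained model adds anything.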
						 
				 
			
				
					
						
							
							
Ines Montani
dd153b2b33
Simplify helper (see #3681) [ci skip]
2019-05-06 15:13:10 +02:00

Ines Montani
f8fce6c03c
Fix typo (see #3681)
2019-05-06 15:02:11 +02:00

Ines Montani
f2a56c1b56
Rewrite example to use Retokenizer (resolves #3681)
Also add helper to filter spans
2019-05-06 14:51:18 +02:00

svlandeg
6961215578
refactor code to separate functionality into different files
2019-05-06 10:56:56 +02:00

Brad Jascob
955b95cb8b
Fix inconsistant lemmatizer issue #3484 (#3646)
* Fix inconsistant lemmatizer issue #3484
* Remove test case
2019-05-04 18:16:03 +02:00

svlandeg
f5190267e7
run only 100M of WP data as training dataset (9%)
2019-05-03 18:09:09 +02:00

svlandeg
4e929600e5
fix WP id parsing, speed up processing and remove ambiguous strings in one doc (for now)
2019-05-03 17:37:47 +02:00

svlandeg
34600c92bd
try catch per article to ensure the pipeline goes on
2019-05-03 15:10:09 +02:00
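
The commit above wraps the processing of each article in its own try/except so that one failing article does not stop the whole pipeline. A minimal sketch of that pattern follows. The function `process_all`, its return shape and the use of the `logging` module are assumptions, not the commit's code.

```python
import logging
from typing import Callable, Iterable, List, Tuple, TypeVar

T = TypeVar("T")
R = TypeVar("R")

logger = logging.getLogger(__name__)


def process_all(articles: Iterable[T], process: Callable[[T], R]) -> Tuple[List[R], List[T]]:
    """Apply `process` to every article; log and skip failures so the run continues.

    Returns the successful results and the articles that raised an exception.
    """
    results: List[R] = []
    failed: List[T] = []
    for article in articles:
        try:
            results.append(process(article))
        except Exception:  # broad on purpose: one bad article must not stop the run
            logger.exception("Skipping article %r", article)
            failed.append(article)
    return results, failed
```

Returning the failed articles, rather than only logging them, makes it easy to count or re-run the skipped inputs afterwards.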
						 
				 
			
				
					
						
							
							
Ines Montani
b4d142e3c4
Adjust wording and formatting [ci skip]
2019-05-03 12:00:31 +02:00

Ines Montani
04658ebbb2
Relax jsonschema pin (closes #3628)
2019-05-03 11:58:58 +02:00

d5555
ba4bcbf285
Update universe.json (#3653) [ci skip]
* Update universe.json
* Update universe.json
2019-05-03 11:50:12 +02:00

svlandeg
bbcb9da466
creating training data with clean WP texts and QID entities true/false
2019-05-03 10:44:29 +02:00

svlandeg
cba9680d13
run NER on clean WP text and link to gold-standard entity IDs
2019-05-02 17:24:52 +02:00

svlandeg
581dc9742d
parsing clean text from WP articles to use as input data for NER and NEL
2019-05-02 17:09:56 +02:00

svlandeg
8353552191
cleanup
2019-05-01 23:26:16 +02:00

svlandeg
1ae41daaa9
allow small rounding errors
2019-05-01 23:05:40 +02:00
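
The commit above only says that small rounding errors are now allowed; it does not say which values are compared. One plausible reading, given the surrounding entity-linking work, is a check that the prior probabilities for an alias do not sum to more than 1. That reading is an assumption. Floating-point addition can push such a sum slightly above 1.0, so an exact comparison would reject valid input. The helper `prior_probs_valid` and its default tolerance are hypothetical.

```python
from typing import Sequence


def prior_probs_valid(probs: Sequence[float], tolerance: float = 1e-6) -> bool:
    """Hypothetical check: probabilities are non-negative and sum to at most 1.

    The sum may exceed 1.0 by up to `tolerance` to absorb floating-point
    rounding; an exact `sum(probs) <= 1.0` test would be too strict.
    """
    if any(p < 0.0 for p in probs):
        return False
    return sum(probs) <= 1.0 + tolerance
```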