Ines Montani 
							
						 
					 
					
						
						
						
						
							
						
						
							655b434553 
							
						 
					 
					
						
						
							
							Merge branch 'master' into develop  
						
						
						
					 
					
						2019-09-12 11:39:18 +02:00 
						 
				 
			
				
					
						
							
							
								tamuhey 
							
						 
					 
					
						
						
						
						
							
						
						
							71909cdf22 
							
						 
					 
					
						
						
							
							Fix iss4278 (#4279)
						
						
						
						
						* fix: len(tuple) == 2
* (#4278) add fail test
* add contributor's agreement
						
					 
					
						2019-09-12 10:44:49 +02:00 
						 
				 
			
				
					
						
							
							
								Ines Montani 
							
						 
					 
					
						
						
						
						
							
						
						
							e82a8d0d7a 
							
						 
					 
					
						
						
							
							Merge branch 'master' into develop  
						
						
						
					 
					
						2019-09-11 11:52:38 +02:00 
						 
				 
			
				
					
						
							
							
								Ines Montani 
							
						 
					 
					
						
						
						
						
							
						
						
							8f9f48b04c 
							
						 
					 
					
						
						
							
							Add GreekLemmatizer.lookup (resolves #4272)
						
						
						
					 
					
						2019-09-11 11:44:40 +02:00 
						 
				 
			
				
					
						
							
							
								Ines Montani 
							
						 
					 
					
						
						
						
						
							
						
						
							6279d74c65 
							
						 
					 
					
						
						
							
							Tidy up and auto-format  
						
						
						
					 
					
						2019-09-11 11:38:22 +02:00 
						 
				 
			
				
					
						
							
							
								Matthew Honnibal 
							
						 
					 
					
						
						
						
						
							
						
						
							7b858ba606 
							
						 
					 
					
						
						
							
							Update from master  
						
						
						
					 
					
						2019-09-10 20:14:08 +02:00 
						 
				 
			
				
					
						
							
							
								adrianeboyd 
							
						 
					 
					
						
						
						
						
							
						
						
							3780e2ff50 
							
						 
					 
					
						
						
							
							Flush tokenizer cache when necessary (#4258)
						
						
						
						
						Flush tokenizer cache when affixes, token_match, or special cases are
modified.
Fixes #4238, same issue as in #1250.
						
					 
					
						2019-09-08 20:52:46 +02:00 
						 
				 
			
				
					
						
							
							
								Matthew Honnibal 
							
						 
					 
					
						
						
						
						
							
						
						
							1a65c5b7af 
							
						 
					 
					
						
						
							
							Update develop from master  
						
						
						
					 
					
						2019-09-08 18:21:41 +02:00 
						 
				 
			
				
					
						
							
							
								Adriane Boyd 
							
						 
					 
					
						
						
						
						
							
						
						
							0f28418446 
							
						 
					 
					
						
						
							
							Add regression test for #1061 back to test suite
						
						
						
					 
					
						2019-09-04 20:42:24 +02:00 
						 
				 
			
				
					
						
							
							
								Ines Montani 
							
						 
					 
					
						
						
						
						
							
						
						
							419ae59c79 
							
						 
					 
					
						
						
							
							Make flaky test test_issue_1971_4 more explicit  
						
						
						
					 
					
						2019-08-31 14:08:05 +02:00 
						 
				 
			
				
					
						
							
							
								svlandeg 
							
						 
					 
					
						
						
						
						
							
						
						
							7bec0ebbcb 
							
						 
					 
					
						
						
							
							failing unit test for Issue 4190  
						
						
						
					 
					
						2019-08-28 14:16:34 +02:00 
						 
				 
			
				
					
						
							
							
								Matthew Honnibal 
							
						 
					 
					
						
						
						
						
							
						
						
							22250cf6b7 
							
						 
					 
					
						
						
							
							Make regression test less sensitive to tag-map stuff  
						
						
						
					 
					
						2019-08-25 21:54:26 +02:00 
						 
				 
			
				
					
						
							
							
								Matthew Honnibal 
							
						 
					 
					
						
						
						
						
							
						
						
							bb911e5f4e 
							
						 
					 
					
						
						
							
							Fix #3830: 'subtok' label being added even if learn_tokens=False (#4188)
						
						
						
						
						* Prevent subtok label if not learning tokens
The parser introduces the subtok label to mark tokens that should be
merged during post-processing. Previously this happened even if we did
not have the --learn-tokens flag set. This patch passes the config
through to the parser, to prevent the problem.
* Make merge_subtokens a parser post-process if learn_subtokens
* Fix train script
* Add test for 3830: subtok problem
* Fix handling of non-subtok in parser training
						
					 
					
						2019-08-23 17:54:00 +02:00 
						 
				 
			
				
					
						
							
							
								Sofie Van Landeghem 
							
						 
					 
					
						
						
						
						
							
						
						
							c417c380e3 
							
						 
					 
					
						
						
							
							Matcher ID fixes (#4179)
						
						
						
						
						* allow phrasematcher to link one match to multiple original patterns
* small fix for defining ent_id in the matcher (anti-ghost prevention)
* cleanup
* formatting 
						
					 
					
						2019-08-22 17:17:07 +02:00 
						 
				 
			
				
					
						
							
							
								Sofie Van Landeghem 
							
						 
					 
					
						
						
						
						
							
						
						
							de272f8b82 
							
						 
					 
					
						
						
							
							adding double match for optional operator at the end (#4166)
						
						
						
					 
					
						2019-08-21 22:46:56 +02:00 
						 
				 
			
				
					
						
							
							
								Sofie Van Landeghem 
							
						 
					 
					
						
						
						
						
							
						
						
							01c5980187 
							
						 
					 
					
						
						
							
							Serialize POS attribute when doc.is_tagged (#4092)
						
						
						
						
						* fix and unit test for issue 3959
* additional unit test for manifestation of the same (resolved) bug 
						
					 
					
						2019-08-21 21:59:30 +02:00 
						 
				 
			
				
					
						
							
							
								Sofie Van Landeghem 
							
						 
					 
					
						
						
						
						
							
						
						
							7539a4f3a8 
							
						 
					 
					
						
						
							
							use states[q] in while retry loop (#4162)
						
						
						
					 
					
						2019-08-21 21:58:04 +02:00 
						 
				 
			
				
					
						
							
							
								Ines Montani 
							
						 
					 
					
						
						
						
						
							
						
						
							f580302673 
							
						 
					 
					
						
						
							
							Tidy up and auto-format  
						
						
						
					 
					
						2019-08-20 17:36:34 +02:00 
						 
				 
			
				
					
						
							
							
								Ines Montani 
							
						 
					 
					
						
						
						
						
							
						
						
							364aaf5bc2 
							
						 
					 
					
						
						
							
							Simplify test  
						
						
						
					 
					
						2019-08-20 16:41:58 +02:00 
						 
				 
			
				
					
						
							
							
								Sofie Van Landeghem 
							
						 
					 
					
						
						
						
						
							
						
						
							68ee0384fd 
							
						 
					 
					
						
						
							
							Unit test for Issue 3879 (#4153)
						
						
						
						
						* failing unit test for Issue #3879 
* mark test as failing 
						
					 
					
						2019-08-20 16:40:25 +02:00 
						 
				 
			
				
					
						
							
							
								Ines Montani 
							
						 
					 
					
						
						
						
						
							
						
						
							86cd7f0efd 
							
						 
					 
					
						
						
							
							Add regression test for  #4120  
						
						
						
					 
					
						2019-08-20 16:33:09 +02:00 
						 
				 
			
				
					
						
							
							
								Ines Montani 
							
						 
					 
					
						
						
						
						
							
						
						
							009280fbc5 
							
						 
					 
					
						
						
							
							Tidy up and auto-format  
						
						
						
					 
					
						2019-08-18 15:09:16 +02:00 
						 
				 
			
				
					
						
							
							
								AJ Rader 
							
						 
					 
					
						
						
						
						
							
						
						
							2f3648700c 
							
						 
					 
					
						
						
							
							Correction of default lemmatizer lookup in English (Issue #4104) (#4110)
						
						
						
						
						* pytest file for issue4104 established
* edited default lookup english lemmatizer for spun; fixes issue 4102
* eliminated parameterization and sorted dictionary dependency in issue 4104 test
* added contributor agreement 
						
					 
					
						2019-08-15 11:39:10 +02:00 
						 
				 
			
				
					
						
							
							
								Sofie Van Landeghem 
							
						 
					 
					
						
						
						
						
							
						
						
							963ea5e8d0 
							
						 
					 
					
						
						
							
							Update lemma and vector information after splitting a token (#4097)
						
						
						
						
						* fixing vector and lemma attributes after retokenizer.split
* fixing unit test with mockup tensor
* xp instead of numpy 
						
					 
					
						2019-08-08 15:09:44 +02:00 
						 
				 
			
				
					
						
							
							
								Sofie Van Landeghem 
							
						 
					 
					
						
						
						
						
							
						
						
							ad09b0d6f3 
							
						 
					 
					
						
						
							
							fetch norm from lex if necessary for matching (#4080)
						
						
						
					 
					
						2019-08-05 23:51:04 +02:00 
						 
				 
			
				
					
						
							
							
								adrianeboyd 
							
						 
					 
					
						
						
						
						
							
						
						
							925a852bb6 
							
						 
					 
					
						
						
							
							Improve NER per type scoring (#4052)
						
						
						
						
						* Improve NER per type scoring
* include all gold labels in per type scoring, not only when recall > 0
* improve efficiency of per type scoring
* Create Scorer tests, initially with NER tests
* move regression test #3968  (per type NER scoring) to Scorer tests
* add new test for per type NER scoring with imperfect P/R/F and per
type P/R/F including a case where R == 0.0 
						
					 
					
						2019-08-01 17:15:36 +02:00 
						 
				 
			
				
					
						
							
							
								Sofie Van Landeghem 
							
						 
					 
					
						
						
						
						
							
						
						
							f7d950de6d 
							
						 
					 
					
						
						
							
							ensure the lang of vocab and nlp stay consistent (#4057)
						
						
						
						
						* ensure the language of vocab and nlp stay consistent across serialization
* equality with = 
						
					 
					
						2019-08-01 17:13:01 +02:00 
						 
				 
			
				
					
						
							
							
								Sofie Van Landeghem 
							
						 
					 
					
						
						
						
						
							
						
						
							7de3b129ab 
							
						 
					 
					
						
						
							
							Resolve edge case when calling textcat.predict with empty doc (#4035)
						
						
						
						
						* resolve edge case where no doc has tokens when calling textcat.predict
* more explicit value test 
						
					 
					
						2019-07-30 14:58:01 +02:00 
						 
				 
			
				
					
						
							
							
								Sofie Van Landeghem 
							
						 
					 
					
						
						
						
						
							
						
						
							ba02957c80 
							
						 
					 
					
						
						
							
							Fix dependency copy for as_doc (#3969)
						
						
						
						
						* failing unit test for issue 3962
* attempt to fix Issue #3962 
* create artificial unit test example
* using length instead of self.length
* sp
* reformat with black
* find better ancestor within span and use generic 'dep'
* attach to span.root if there is no appropriate ancestor
* comment span text
* clean up ancestor code
* reconstruct dep tree to keep same number of sentences 
						
					 
					
						2019-07-23 18:28:54 +02:00 
						 
				 
			
				
					
						
							
							
								Ines Montani 
							
						 
					 
					
						
						
						
						
							
						
						
							a32b033b8c 
							
						 
					 
					
						
						
							
							Add regression test for  #4002  
						
						
						
						
						Test that the PhraseMatcher can match on overwritten NORM attributes. 
						
					 
					
						2019-07-22 14:18:24 +02:00 
						 
				 
			
				
					
						
							
							
								Falak Asad 
							
						 
					 
					
						
						
						
						
							
						
						
							ff1e73e35c 
							
						 
					 
					
						
						
							
							Bugfix/issue 3968 (#3982)
						
						
						
						
						* Fix for issue-3968
* Added contributor agreement
* Made suggested changes 
						
					 
					
						2019-07-18 00:20:32 +02:00 
						 
				 
			
				
					
						
							
							
								Ines Montani 
							
						 
					 
					
						
						
						
						
							
						
						
							073013f129 
							
						 
					 
					
						
						
							
							Auto-format [ci skip]  
						
						
						
					 
					
						2019-07-17 12:34:13 +02:00 
						 
				 
			
				
					
						
							
							
								Ines Montani 
							
						 
					 
					
						
						
						
						
							
						
						
							62ff128888 
							
						 
					 
					
						
						
							
							Add regression test for  #3951  
						
						
						
					 
					
						2019-07-16 14:00:00 +02:00 
						 
				 
			
				
					
						
							
							
								Ines Montani 
							
						 
					 
					
						
						
						
						
							
						
						
							7f551050b1 
							
						 
					 
					
						
						
							
							Add regression test for  #3972  
						
						
						
					 
					
						2019-07-16 13:07:35 +02:00 
						 
				 
			
				
					
						
							
							
								Sofie Van Landeghem 
							
						 
					 
					
						
						
						
						
							
						
						
							ed774cb953 
							
						 
					 
					
						
						
							
							Fixing ngram bug (#3953)
						
						
						
						
						* minimal failing example for Issue #3661 
* referenced Issue #3661  instead of Issue #3611 
* cleanup 
						
					 
					
						2019-07-12 10:01:35 +02:00 
						 
				 
			
				
					
						
							
							
								Ines Montani 
							
						 
					 
					
						
						
							
							
						
						
						
							
						
						
							673c864a06 
							
						 
					 
					
						
						
							
							Fix doc.count_by functionality (#3950)
						
						
						
						
						
					 
					
						2019-07-11 13:44:00 +02:00 
						 
				 
			
				
					
						
							
							
								Ines Montani 
							
						 
					 
					
						
						
							
							
						
						
						
							
						
						
							2426f4d44c 
							
						 
					 
					
						
						
							
							Fix default punctuation rules for splitting Hindi text (#3948)
						
						
						
						
						Fix default punctuation rules for splitting Hindi text
Co-authored-by: yash <patadiayash@gmail.com>
Co-authored-by: Ines Montani <ines@ines.io> 
						
					 
					
						2019-07-11 13:36:28 +02:00 
						 
				 
			
				
					
						
							
							
								svlandeg 
							
						 
					 
					
						
						
						
						
							
						
						
							349107daa3 
							
						 
					 
					
						
						
							
							cleanup  
						
						
						
					 
					
						2019-07-11 13:09:22 +02:00 
						 
				 
			
				
					
						
							
							
								Matthew Honnibal 
							
						 
					 
					
						
						
							
							
						
						
						
							
						
						
							b40b4c2c31 
							
						 
					 
					
						
						
							
							💫 Fix issue #3839: Incorrect entity IDs from Matcher with operators (#3949)
						
						
						
						
						* Add regression test for issue #3541 
* Add comment on bugfix
* Remove incorrect test
* Un-xfail test 
						
					 
					
						2019-07-11 12:55:11 +02:00 
						 
				 
			
				
					
						
							
							
								Ines Montani 
							
						 
					 
					
						
						
						
						
							
						
						
							197cfd7ebc 
							
						 
					 
					
						
						
							
							Merge branch 'master' into pr/3948  
						
						
						
					 
					
						2019-07-11 12:18:31 +02:00 
						 
				 
			
				
					
						
							
							
								Ines Montani 
							
						 
					 
					
						
						
						
						
							
						
						
							0b8406a05c 
							
						 
					 
					
						
						
							
							Tidy up and auto-format  
						
						
						
					 
					
						2019-07-11 12:02:25 +02:00 
						 
				 
			
				
					
						
							
							
								yash 
							
						 
					 
					
						
						
						
						
							
						
						
							ae2d52e323 
							
						 
					 
					
						
						
							
							Add default encoding utf-8 for test file  
						
						
						
					 
					
						2019-07-11 15:26:27 +05:30 
						 
				 
			
				
					
						
							
							
								yash 
							
						 
					 
					
						
						
						
						
							
						
						
							d5311b3c42 
							
						 
					 
					
						
						
							
							Add test file for issue (#3625) and spaCy contributor agreement
						
						
						
					 
					
						2019-07-11 14:53:14 +05:30 
						 
				 
			
				
					
						
							
							
								svlandeg 
							
						 
					 
					
						
						
						
						
							
						
						
							e080412385 
							
						 
					 
					
						
						
							
							tracked the bug down to PreshCounter.inc - still unclear what goes wrong  
						
						
						
					 
					
						2019-07-11 01:53:06 +02:00 
						 
				 
			
				
					
						
							
							
								svlandeg 
							
						 
					 
					
						
						
						
						
							
						
						
							a89fecce97 
							
						 
					 
					
						
						
							
							failing unit test for issue  #3869  
						
						
						
					 
					
						2019-07-11 00:43:55 +02:00 
						 
				 
			
				
					
						
							
							
								Matthew Honnibal 
							
						 
					 
					
						
						
						
						
							
						
						
							465456edb9 
							
						 
					 
					
						
						
							
							Un-xfail test  #3880  
						
						
						
					 
					
						2019-07-10 14:01:17 +02:00 
						 
				 
			
				
					
						
							
							
								Matthew Honnibal 
							
						 
					 
					
						
						
						
						
							
						
						
							87f7ec34d5 
							
						 
					 
					
						
						
							
							Add test for  #3880  
						
						
						
					 
					
						2019-07-10 13:53:55 +02:00 
						 
				 
			
				
					
						
							
							
								Ines Montani 
							
						 
					 
					
						
						
						
						
							
						
						
							82045aac8a 
							
						 
					 
					
						
						
							
							Merge regression tests  
						
						
						
					 
					
						2019-07-10 12:49:18 +02:00 
						 
				 
			
				
					
						
							
							
								Ines Montani 
							
						 
					 
					
						
						
						
						
							
						
						
							570ab1f481 
							
						 
					 
					
						
						
							
							Fix handling of old entity ruler files  
						
						
						
						
						Old entity ruler files are expected as an `entity_ruler.jsonl` file in the top-level model directory, i.e. at the path passed to from_disk by default (model path plus component name), but with the suffix ".jsonl".
						
					 
					
						2019-07-10 12:14:12 +02:00 
						 
				 
			
				
					
						
							
							
								Ines Montani 
							
						 
					 
					
						
						
						
						
							
						
						
							874d914a44 
							
						 
					 
					
						
						
							
							Tidy up test  
						
						
						
					 
					
						2019-07-10 12:13:23 +02:00