Matthew Honnibal
fe442cac53
Fix #717: Set correct lemma for contracted verbs
2017-03-18 16:16:10 +01:00

Matthew Honnibal
8843b84bd1
Merge remote-tracking branch 'origin/develop-downloads'
2017-03-16 12:00:42 -05:00

ines
71956c94db
Handle deprecated language-specific model downloading
2017-03-15 17:37:55 +01:00

ines
1101fd3855
Fix formatting and remove unused imports
2017-03-15 17:33:39 +01:00

ines
842782c128
Move fix_deprecated_glove_vectors_loading to deprecated.py
2017-03-15 17:33:29 +01:00

Matthew Honnibal
8dbff4f5f4
Wire up English lemma and morph rules.
2017-03-15 09:23:22 -05:00

Matthew Honnibal
f70be44746
Use lemmatizer in code, not from downloaded model.
2017-03-15 04:52:50 -05:00

ines
eec3f21c50
Add WordNet license
2017-03-12 13:58:24 +01:00

ines
f9e603903b
Rename stop_words.py to word_sets.py and include more sets
NUM_WORDS and ORDINAL_WORDS are currently not used, but the hard-coded list should be removed from orth.pyx and replaced with language-specific functions. This will later allow other languages to use their own functions to set those flags. (In English, this is easier because it only needs to be checked against a set; in German, for example, this requires a more complex function, as most number words are written as a single word.)
2017-03-12 13:58:22 +01:00

ines
0957737ee8
Add Python-formatted lemmatizer data and rules
2017-03-12 13:58:22 +01:00

ines
ce9568af84
Move English time exceptions ("1a.m." etc.) and refactor
2017-03-12 13:58:22 +01:00

ines
6b30541774
Fix formatting
2017-03-12 13:58:22 +01:00

ines
66c1f194f9
Use consistent unicode declarations
2017-03-12 13:07:28 +01:00

Matthew Honnibal
d108534dc2
Fix 2/3 problems for training
2017-03-08 01:37:52 +01:00

Roman Inflianskas
66e1109b53
Add support for Universal Dependencies v2.0
2017-03-03 13:17:34 +01:00

ines
30ce2a6793
Exclude "shed" and "Shed" from tokenizer exceptions (see #847)
2017-02-18 14:10:44 +01:00

Ines Montani
209c37bbcf
Exclude "shell" and "Shell" from English tokenizer exceptions (resolves #775)
2017-01-25 13:15:02 +01:00

Ines Montani
50878ef598
Exclude "were" and "Were" from tokenizer exceptions and add regression test (resolves #744)
2017-01-16 13:10:38 +01:00

Matthew Honnibal
4e48862fa8
Remove print statement
2017-01-12 11:25:39 +01:00

Matthew Honnibal
fba67fa342
Fix Issue #736: Times were being tokenized with incorrect string values.
2017-01-12 11:21:01 +01:00

Ines Montani
0dec90e9f7
Use global abbreviation data languages and remove duplicates
2017-01-08 20:36:00 +01:00

Ines Montani
cab39c59c5
Add missing contractions to English tokenizer exceptions
Inspired by https://github.com/kootenpv/contractions/blob/master/contractions/__init__.py
2017-01-05 19:59:06 +01:00

Ines Montani
a23504fe07
Move abbreviations below other exceptions
2017-01-05 19:58:07 +01:00

Ines Montani
7d2cf934b9
Generate he/she/it correctly with 's instead of 've
2017-01-05 19:57:00 +01:00

Ines Montani
bc911322b3
Move ") to emoticons (see Tweebo challenge test)
2017-01-05 18:05:38 +01:00

Ines Montani
1d237664af
Add lowercase lemma to tokenizer exceptions
2017-01-03 23:02:21 +01:00

Ines Montani
84a87951eb
Fix typos
2017-01-03 18:27:43 +01:00

Ines Montani
35b39f53c3
Reorganise English tokenizer exceptions (as discussed in #718)
Add logic to generate exceptions that follow a consistent pattern (like verbs and pronouns) and allow certain tokens to be excluded explicitly.
2017-01-03 18:26:09 +01:00

Ines Montani
461cbb99d8
Revert "Reorganise English tokenizer exceptions (as discussed in #718)"
This reverts commit b19cfcc144.
2017-01-03 18:21:29 +01:00

Ines Montani
b19cfcc144
Reorganise English tokenizer exceptions (as discussed in #718)
Add logic to generate exceptions that follow a consistent pattern (like verbs and pronouns) and allow certain tokens to be excluded explicitly.
2017-01-03 18:17:57 +01:00

Ines Montani
78e63dc7d0
Update tokenizer exceptions for English
2016-12-21 18:06:34 +01:00

JM
70ff0639b5
Fixed missing vec_path declaration that was failing if 'add_vectors' was set
Added vec_path variable declaration to avoid accessing it before assignment in case 'add_vectors' is in overrides.
2016-12-20 18:21:05 +01:00

Matthew Honnibal
13a0b31279
Another tweak to GloVe path hackery.
2016-12-18 23:12:49 +01:00

Matthew Honnibal
2c6228565e
Fix vector loading re glove hack
2016-12-18 23:06:44 +01:00

Matthew Honnibal
618b50a064
Fix issue #684: GloVe vectors not loaded in spacy.en.English.
2016-12-18 22:46:31 +01:00

Matthew Honnibal
2ef9d53117
Untested fix for issue #684: GloVe vectors hack should be inserted in English, not in spacy.load.
2016-12-18 22:29:31 +01:00

Matthew Honnibal
7a98ee5e5a
Merge language data change
2016-12-18 17:03:52 +01:00

Ines Montani
b99d683a93
Fix formatting
2016-12-18 16:58:28 +01:00

Ines Montani
b11d8cd3db
Merge remote-tracking branch 'origin/organize-language-data' into organize-language-data
2016-12-18 16:57:12 +01:00

Ines Montani
2b2ea8ca11
Reorganise language data
2016-12-18 16:54:19 +01:00

Matthew Honnibal
44f4f008bd
Wire up lemmatizer rules for English
2016-12-18 15:50:09 +01:00

Ines Montani
1bff59a8db
Update English language data
2016-12-18 15:36:53 +01:00

Ines Montani
2eb163c5dd
Add lemma rules
2016-12-18 15:36:53 +01:00

Ines Montani
29ad8143d8
Add morph rules
2016-12-18 15:36:53 +01:00

Ines Montani
704c7442e0
Break language data components into their own files
2016-12-18 15:36:53 +01:00

Ines Montani
28326649f3
Fix typo
2016-12-18 13:30:03 +01:00

Matthew Honnibal
28d63ec58e
Restore missing '' character in tokenizer exceptions.
2016-12-18 05:34:51 +01:00

Ines Montani
a9421652c9
Remove duplicates in tag map
2016-12-17 22:44:31 +01:00

Ines Montani
577adad945
Fix formatting
2016-12-17 14:00:52 +01:00

Ines Montani
bb94e784dc
Fix typo
2016-12-17 13:59:30 +01:00