Matthew Honnibal
4454c1b23f
Block lemmatization of base-form adjectives
Fix the check that an adjective is a base form (as opposed to a comparative or superlative), so that base forms are not lemmatized, e.g. "inner" no longer becomes "inn". Closes #912.
2017-03-25 21:29:57 +01:00
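
The base-form check described in this commit can be sketched as follows. This is a minimal, hypothetical illustration rather than the project's code: it assumes morphological features arrive as a plain dict of Universal Dependencies features (`Degree`, `Number`, `VerbForm`), and the names `BASE_FORM_FEATURES`, `is_base_form` and `lemmatize_adj` plus the single "-er" rule are invented for the example.

```python
from typing import Dict, Optional

# Hypothetical table: the UD feature values that mark an uninflected
# (base) form for each coarse part of speech. Assumed for illustration.
BASE_FORM_FEATURES: Dict[str, Dict[str, str]] = {
    "adj": {"Degree": "Pos"},     # positive degree, not comparative/superlative
    "noun": {"Number": "Sing"},   # singular
    "verb": {"VerbForm": "Inf"},  # infinitive
}


def is_base_form(univ_pos: str, morphology: Optional[Dict[str, str]] = None) -> bool:
    """Return True if the morphological features mark a base form."""
    morphology = morphology or {}
    required = BASE_FORM_FEATURES.get(univ_pos)
    if not required:
        return False
    return all(morphology.get(key) == value for key, value in required.items())


def lemmatize_adj(string: str, morphology: Optional[Dict[str, str]] = None) -> str:
    """Toy adjective lemmatizer with a single comparative suffix rule."""
    # Base forms bypass the suffix rules, so "inner" is returned unchanged.
    if is_base_form("adj", morphology):
        return string
    # Hypothetical rule: strip a comparative "-er" ("taller" -> "tall").
    if string.endswith("er") and len(string) > 2:
        return string[:-2]
    return string


print(lemmatize_adj("inner", {"Degree": "Pos"}))  # inner
print(lemmatize_adj("taller", {"Degree": "Cmp"}))  # tall
print(lemmatize_adj("inner"))  # inn: without features the rule fires
```

The last call reproduces the reported bug: when no features are supplied, the suffix rule applies blindly, which is why the lemmatizer needs the morphology to tell base and inflected forms apart.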
								 
Matthew Honnibal
413138de79
Fix #719: Lemmatizer can no longer output empty string
2017-03-18 16:02:06 +01:00
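
A guard against empty output, in the style of a WordNet suffix-rule lemmatizer, might look like this minimal sketch. The rule list, lemma index and exception table are toy, hypothetical data, and `lemmatize_noun` is an invented name; the point is only that a rule which would strip the whole word is discarded and the input form is the last-resort fallback.

```python
from typing import Dict, List, Set, Tuple

# Toy, hypothetical data: suffix rules, known lemmas, and irregular forms.
NOUN_RULES: List[Tuple[str, str]] = [("s", ""), ("ses", "s"), ("ies", "y")]
NOUN_INDEX: Set[str] = {"dog", "bus", "city"}
NOUN_EXC: Dict[str, str] = {"children": "child"}


def lemmatize_noun(string: str) -> Set[str]:
    """Return candidate lemmas; a non-empty input never yields an empty lemma."""
    string = string.lower()
    if string in NOUN_EXC:
        return {NOUN_EXC[string]}
    known: Set[str] = set()
    unknown: Set[str] = set()
    for old, new in NOUN_RULES:
        if not string.endswith(old):
            continue
        form = string[: len(string) - len(old)] + new
        if not form:
            # A rule such as ("s", "") would turn "s" into "": discard it.
            continue
        if form in NOUN_INDEX:
            known.add(form)
        else:
            unknown.add(form)
    # Prefer candidates found in the index, then any non-empty candidate,
    # and finally fall back to the input form itself.
    return known or unknown or {string}
```

With this data, `lemmatize_noun("s")` returns `{"s"}` instead of `{""}`, while `lemmatize_noun("buses")` prefers the indexed `"bus"` over the rule output `"buse"`.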
								 
Matthew Honnibal
c4351e1165
Update base-form check in lemmatizer, for UD 2.0 morphology
2017-03-16 17:59:31 -05:00
								 
Matthew Honnibal
fea9fe08af
Merge pull request #866 from juanmirocks/master
Fix lemmatization of OOV words
2017-03-16 23:37:36 +01:00
								 
ines
1da29a7146
Use new Lemmatizer data and remove file import
Since there's currently only an English lemmatizer, the global Lemmatizer imports from spacy.en. This is not ideal and still needs to be fixed.
2017-03-12 13:58:22 +01:00
								 
Juan Miguel Cejuela
25c29f072d
apply patch
2017-03-01 21:44:17 +01:00
								 
Matthew Honnibal
44f4f008bd
Wire up lemmatizer rules for English
2016-12-18 15:50:09 +01:00
								 
Matthew Honnibal
a4eb5c2bff
Check POS key in lemmatizer, to update it for new data format
2016-12-18 13:28:20 +01:00
								 
Ines Montani
8350d65695
Change morphology and lemmatizer API
Take morphology features as object instead of keyword arguments
2016-12-07 21:12:49 +01:00
								 
Matthew Honnibal
e30348b331
Prefer to import from symbols instead of parts_of_speech
2016-11-04 00:27:55 +01:00
								 
Matthew Honnibal
f5fe4f595b
Fix json loading, for Python 3.
2016-10-20 21:23:26 +02:00
								 
Matthew Honnibal
2e92c6fb3a
Fix JSON encoding issue on load
2016-10-20 21:06:48 +02:00
								 
Matthew Honnibal
f189a3cb00
Fix encoding when opening files in Python 2.7, re Issue #539
2016-10-20 14:42:56 +02:00
								 
Matthew Honnibal
a2f3510d6d
Fix lemmatizer
2016-09-27 17:47:05 +02:00
								 
Matthew Honnibal
35cd953f9e
Fix pos name conflict with morphology
2016-09-27 14:16:22 +02:00
								 
Matthew Honnibal
40509e8bca
Tweak the new is_base_form logic, because we can expect the 'pos' key in the morphology we're passed.
2016-09-27 14:01:16 +02:00
								 
Matthew Honnibal
3cb4d455d2
Pass lemmatizer morphological features, so that rules are sensitive to base/inflected distinction, which is how the WordNet data is designed. See Issue #435
2016-09-27 13:52:11 +02:00
								 
Matthew Honnibal
fd65cf6cbb
Finish refactoring data loading
2016-09-24 20:26:17 +02:00
								 
Matthew Honnibal
83e364188c
Mostly finished loading refactoring. Design is in place, but doesn't work yet.
2016-09-24 15:42:01 +02:00
								 
Henning Peters
846fa49b2a
distinct load() and from_package() methods
2016-01-16 10:00:57 +01:00
								 
Henning Peters
788f734513
refactored data_dir->via, add zip_safe, add spacy.load()
2016-01-15 18:01:02 +01:00
								 
Henning Peters
bc229790ac
integrate with sputnik
2016-01-13 19:46:17 +01:00
								 
Matthew Honnibal
eaf2ad59f1
* Fix use of mock Package object
2015-12-31 04:13:15 +01:00
								 
Matthew Honnibal
55bcdf8bdd
* Fix errors
2015-12-29 22:32:03 +01:00
								 
Matthew Honnibal
aec130af56
Use util.Package class for io
Previous Sputnik integration caused an API change: Vocab, Tagger, etc. were loaded via a from_package classmethod that required a sputnik.Package instance. This forced users to first create a sputnik.Sputnik() instance in order to acquire a Package via sp.pool().
Instead, I've created a small file-system shim, util.Package, which allows classes to have a .load() classmethod that accepts either util.Package objects or strings. We can later gut the internals of this and make it a proxy for Sputnik if we need more functionality that should live in the Sputnik library.
Sputnik is now only used to download and install the data, in spacy.en.download.
2015-12-29 18:00:48 +01:00
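
The shim pattern described in this commit, a `.load()` classmethod that accepts either a package object or a plain string, can be sketched roughly as below. This is a hypothetical reconstruction, not the actual `util.Package`: the methods `file_path` and `has_file`, the `vocab/lemma_rules.json` layout and the rule format are all assumptions made for the example.

```python
import json
import os
from typing import Dict, List, Union


class Package:
    """Hypothetical file-system shim that wraps a data directory."""

    def __init__(self, path: str) -> None:
        self.path = path

    def file_path(self, *parts: str) -> str:
        # Resolve a path relative to the package's root directory.
        return os.path.join(self.path, *parts)

    def has_file(self, *parts: str) -> bool:
        return os.path.exists(self.file_path(*parts))


class Lemmatizer:
    def __init__(self, rules: Dict[str, List[List[str]]]) -> None:
        self.rules = rules

    @classmethod
    def load(cls, pkg_or_str: Union[Package, str]) -> "Lemmatizer":
        # Accept either a Package or a plain path string; normalise to Package.
        pkg = pkg_or_str if isinstance(pkg_or_str, Package) else Package(pkg_or_str)
        rules: Dict[str, List[List[str]]] = {}
        # Hypothetical file layout; a missing file yields an empty rule set.
        if pkg.has_file("vocab", "lemma_rules.json"):
            with open(pkg.file_path("vocab", "lemma_rules.json"), encoding="utf8") as file_:
                rules = json.load(file_)
        return cls(rules)
```

Normalising the argument in one place keeps callers free of any download tooling: `Lemmatizer.load("/path/to/data")` and `Lemmatizer.load(Package("/path/to/data"))` behave the same, and the internals of `Package` could later be swapped for a proxy without changing the classmethod.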
								 
Matthew Honnibal
c5902f2b4b
* Update Lemmatizer to use MockPackage. Replace from_package with load() classmethod
2015-12-29 16:56:02 +01:00
								 
Henning Peters
8359bd4d93
strip data/ from package, friendlier Language invocation, make data_dir backward/forward-compatible
2015-12-18 09:52:55 +01:00
								 
Henning Peters
9027cef3bc
access model via sputnik
2015-12-07 06:01:28 +01:00
								 
maxirmx
f07e4accd7
Fixing encoding issue #4
2015-10-21 20:45:56 +03:00
								 
maxirmx
fcbfff043f
Fixing encoding issue #3
2015-10-21 15:52:34 +03:00
								 
maxirmx
fe9d2e2c4e
Fixing encoding issue #2
2015-10-21 15:36:21 +03:00
								 
maxirmx
e4a1726f77
Fixing encoding issue
UTF-8
2015-10-21 14:16:37 +03:00
								 
Matthew Honnibal
5332c0b697
* Add support for punctuation lemmatization, to handle unicode characters. This should help in addressing Issue #130
2015-10-09 18:54:40 +11:00
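
Rule-based punctuation lemmatization of the kind this commit mentions could look like the minimal sketch below. It assumes the rules are (old, new) string pairs that rewrite typographic Unicode punctuation to a canonical ASCII form; the pairs and the name `lemmatize_punct` are illustrative assumptions, not the project's rule data or API.

```python
from typing import List, Tuple

# Illustrative (Unicode form, canonical form) pairs; assumed, not the
# project's actual rule data.
PUNCT_RULES: List[Tuple[str, str]] = [
    ("\u201c", '"'),    # left double quotation mark
    ("\u201d", '"'),    # right double quotation mark
    ("\u2018", "'"),    # left single quotation mark
    ("\u2019", "'"),    # right single quotation mark
    ("\u2026", "..."),  # horizontal ellipsis
]


def lemmatize_punct(string: str) -> str:
    """Rewrite typographic Unicode punctuation to a canonical ASCII lemma."""
    for old, new in PUNCT_RULES:
        string = string.replace(old, new)
    return string
```

Mapping variants to one lemma lets downstream rules and counts treat a curly closing quote and a straight `"` as the same symbol, while characters without a rule pass through unchanged.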
								 
Matthew Honnibal
24ed3fc25c
* Check file existence before opening in lemmatizer
2015-09-13 10:45:21 +10:00
								 
Matthew Honnibal
631c843ed1
* Don't look for index.adv in lemmatizer
2015-09-12 06:03:44 +02:00
								 
Matthew Honnibal
7c660c5efc
* Use dict.get in lemmatizer
2015-09-10 14:51:39 +02:00
								 
Matthew Honnibal
64d71f8893
* Fix lemmatizer
2015-09-08 15:38:03 +02:00
								 
Matthew Honnibal
f0a7c99554
* Relax rule-requirement in lemmatizer
2015-08-27 10:26:19 +02:00
								 
Matthew Honnibal
0af139e183
* Tagger training now working. Still need to test load/save of model. Morphology still broken.
2015-08-27 09:16:11 +02:00
								 
Matthew Honnibal
c5a27d1821
* Move lemmatizer to spacy
2015-08-25 15:47:08 +02:00
								 
Matthew Honnibal
e1c1a4b868
* Tmp
2014-12-21 05:36:29 +11:00
								 
Matthew Honnibal
99bbbb6feb
* Work on morphological processing
2014-12-08 21:12:15 +11:00
								 
Matthew Honnibal
7b68f911cf
* Add WordNet lemmatizer
2014-12-08 01:39:13 +11:00