ines 
							
						 
					 
					
						
						
						
						
							
						
						
							91899d337b 
							
						 
					 
					
						
						
							
							Tidy up language, lemmatizer and scorer  
						
						 
						
						
						
					 
					
						2017-10-27 14:40:14 +02:00  
					
					
						 
						
							
							
							 
						
					 
				 
			
				
					
						
							
							
								 
								ines 
							
						 
					 
					
						
						
						
						
							
						
						
							8492d5be6d 
							
						 
					 
					
						
						
							
							Always make lemmatizer return a list of lemmas, not a set  
						
						 
						
						
						
					 
					
						2017-10-24 16:00:56 +02:00  
					
					
						 
						
							
							
							 
						
					 
				 
			
				
					
						
							
							
								 
								ines 
							
						 
					 
					
						
						
						
						
							
						
						
							95f866f99f 
							
						 
					 
					
						
						
							
							Add lookup argument to Lemmatizer.load  
						
						 
						
						
						
					 
					
						2017-10-24 16:00:56 +02:00  
					
					
						 
						
							
							
							 
						
					 
				 
			
				
					
						
							
							
								 
								ines 
							
						 
					 
					
						
						
						
						
							
						
						
							3516aa0cea 
							
						 
					 
					
						
						
							
							Port over changes from #1389
						
						 
						
						
						
					 
					
						2017-10-14 13:32:55 +02:00  
					
					
						 
						
							
							
							 
						
					 
				 
			
				
					
						
							
							
								 
								Matthew Honnibal 
							
						 
					 
					
						
						
						
						
							
						
						
							9b90d235d1 
							
						 
					 
					
						
						
							
							Fix tag check in lemmatizer  
						
						 
						
						
						
					 
					
						2017-10-12 22:50:43 +02:00  
					
					
						 
						
							
							
							 
						
					 
				 
			
				
					
						
							
							
								 
								ines 
							
						 
					 
					
						
						
						
						
							
						
						
							9fd471372a 
							
						 
					 
					
						
						
							
							Add lookup lemmatizer to lemmatizer as lookup() method  
						
						 
						
						
						
					 
					
						2017-10-11 13:25:51 +02:00  
					
					
						 
						
							
							
							 
						
					 
				 
			
				
					
						
							
							
								 
								Matthew Honnibal 
							
						 
					 
					
						
						
						
						
							
						
						
							a6ac4699eb 
							
						 
					 
					
						
						
							
							Allow Morphology class to setup tokens  
						
						 
						
						
						
						
						Add Morphology.assign_untagged() C-method, and call it from
Doc.push_back() when a token is created. This gives a place
to allow the Morphology class to initialize token data. 
						
					 
					
						2017-10-11 03:24:14 +02:00  
					
					
						 
						
							
							
							 
						
					 
				 
			
				
					
						
							
							
								 
								Matthew Honnibal 
							
						 
					 
					
						
						
						
						
							
						
						
							c15d8278cb 
							
						 
					 
					
						
						
							
							Avoid lemmatizing inappropriate tags in English lemmatizer  
						
						 
						
						
						
					 
					
						2017-10-11 03:23:23 +02:00  
					
					
						 
						
							
							
							 
						
					 
				 
			
				
					
						
							
							
								 
								ines 
							
						 
					 
					
						
						
						
						
							
						
						
							820bf85075 
							
						 
					 
					
						
						
							
							Move LookupLemmatizer to spacy.lemmatizer  
						
						 
						
						
						
					 
					
						2017-10-11 02:25:13 +02:00  
					
					
						 
						
							
							
							 
						
					 
				 
			
				
					
						
							
							
								 
								Matthew Honnibal 
							
						 
					 
					
						
						
						
						
							
						
						
							9cb2aef587 
							
						 
					 
					
						
						
							
							Remove print statement  
						
						 
						
						
						
					 
					
						2017-09-14 13:38:28 +02:00  
					
					
						 
						
							
							
							 
						
					 
				 
			
				
					
						
							
							
								 
								Matthew Honnibal 
							
						 
					 
					
						
						
						
						
							
						
						
							5c3ff06924 
							
						 
					 
					
						
						
							
							Fix lemmatizer rules  
						
						 
						
						
						
					 
					
						2017-09-06 19:13:24 +02:00  
					
					
						 
						
							
							
							 
						
					 
				 
			
				
					
						
							
							
								 
								Matthew Honnibal 
							
						 
					 
					
						
						
						
						
							
						
						
							bfddf50081 
							
						 
					 
					
						
						
							
							Fix #1296: Incorrect lemmatization of base form verbs
						
						 
						
						
						
					 
					
						2017-09-04 15:18:41 +02:00  
					
					
						 
						
							
							
							 
						
					 
				 
			
				
					
						
							
							
								 
								ines 
							
						 
					 
					
						
						
						
						
							
						
						
							d24589aa72 
							
						 
					 
					
						
						
							
							Clean up imports, unused code, whitespace, docstrings  
						
						 
						
						
						
					 
					
						2017-04-15 12:05:47 +02:00  
					
					
						 
						
							
							
							 
						
					 
				 
			
				
					
						
							
							
								 
								ines 
							
						 
					 
					
						
						
						
						
							
						
						
							561f2a3eb4 
							
						 
					 
					
						
						
							
							Use consistent formatting for docstrings  
						
						 
						
						
						
					 
					
						2017-04-15 11:59:21 +02:00  
					
					
						 
						
							
							
							 
						
					 
				 
			
				
					
						
							
							
								 
								Matthew Honnibal 
							
						 
					 
					
						
						
						
						
							
						
						
							ed2b106f4d 
							
						 
					 
					
						
						
							
							Fix circular import in lemmatizer  
						
						 
						
						
						
					 
					
						2017-03-26 07:17:07 -05:00  
					
					
						 
						
							
							
							 
						
					 
				 
			
				
					
						
							
							
								 
								Matthew Honnibal 
							
						 
					 
					
						
						
						
						
							
						
						
							c748907a66 
							
						 
					 
					
						
						
							
							Fix errors in previous commit  
						
						 
						
						
						
					 
					
						2017-03-25 22:25:01 +01:00  
					
					
						 
						
							
							
							 
						
					 
				 
			
				
					
						
							
							
								 
								Matthew Honnibal 
							
						 
					 
					
						
						
						
						
							
						
						
							4f400fa486 
							
						 
					 
					
						
						
							
							Prevent lemmatization of base nouns  
						
						 
						
						
						
						
						Update lemmatizer's base-form check, for change in morphology class.
Closes #903.
						
					 
					
						2017-03-25 21:51:12 +01:00  
					
					
						 
						
							
							
							 
						
					 
				 
			
				
					
						
							
							
								 
								Matthew Honnibal 
							
						 
					 
					
						
						
						
						
							
						
						
							4454c1b23f 
							
						 
					 
					
						
						
							
							Block lemmatization of base-form adjectives  
						
						 
						
						
						
						
						Fixes check that an adjective is a base form (as opposed to a
comparative or superlative), so that it's not lemmatized.
e.g. inner -!> inn. Closes #912.
						
					 
					
						2017-03-25 21:29:57 +01:00  
					
					
						 
						
							
							
							 
						
					 
				 
			
				
					
						
							
							
								 
								Matthew Honnibal 
							
						 
					 
					
						
						
						
						
							
						
						
							413138de79 
							
						 
					 
					
						
						
							
							Fix #719: Lemmatizer can no longer output empty string
						
						 
						
						
						
					 
					
						2017-03-18 16:02:06 +01:00  
					
					
						 
						
							
							
							 
						
					 
				 
			
				
					
						
							
							
								 
								Matthew Honnibal 
							
						 
					 
					
						
						
						
						
							
						
						
							c4351e1165 
							
						 
					 
					
						
						
							
							Update base-form check in lemmatizer, for UD 2.0 morphology  
						
						 
						
						
						
					 
					
						2017-03-16 17:59:31 -05:00  
					
					
						 
						
							
							
							 
						
					 
				 
			
				
					
						
							
							
								 
								Matthew Honnibal 
							
						 
					 
					
						
						
						
						
							
						
						
							fea9fe08af 
							
						 
					 
					
						
						
							
							Merge pull request #866 from juanmirocks/master
						
						 
						
						
						
						
						Fix lemmatization of OOV words 
						
					 
					
						2017-03-16 23:37:36 +01:00  
					
					
						 
						
							
							
							 
						
					 
				 
			
				
					
						
							
							
								 
								ines 
							
						 
					 
					
						
						
						
						
							
						
						
							1da29a7146 
							
						 
					 
					
						
						
							
							Use new Lemmatizer data and remove file import  
						
						 
						
						
						
						
						Since there's currently only an English lemmatizer, the global
Lemmatizer imports from spacy.en. This is unideal and still needs to be
fixed. 
						
					 
					
						2017-03-12 13:58:22 +01:00  
					
					
						 
						
							
							
							 
						
					 
				 
			
				
					
						
							
							
								 
								Juan Miguel Cejuela 
							
						 
					 
					
						
						
						
						
							
						
						
							25c29f072d 
							
						 
					 
					
						
						
							
							apply patch  
						
						 
						
						
						
					 
					
						2017-03-01 21:44:17 +01:00  
					
					
						 
						
							
							
							 
						
					 
				 
			
				
					
						
							
							
								 
								Matthew Honnibal 
							
						 
					 
					
						
						
						
						
							
						
						
							44f4f008bd 
							
						 
					 
					
						
						
							
							Wire up lemmatizer rules for English  
						
						 
						
						
						
					 
					
						2016-12-18 15:50:09 +01:00  
					
					
						 
						
							
							
							 
						
					 
				 
			
				
					
						
							
							
								 
								Matthew Honnibal 
							
						 
					 
					
						
						
						
						
							
						
						
							a4eb5c2bff 
							
						 
					 
					
						
						
							
							Check POS key in lemmatizer, to update it for new data format  
						
						 
						
						
						
					 
					
						2016-12-18 13:28:20 +01:00  
					
					
						 
						
							
							
							 
						
					 
				 
			
				
					
						
							
							
								 
								Ines Montani 
							
						 
					 
					
						
						
						
						
							
						
						
							8350d65695 
							
						 
					 
					
						
						
							
							Change morphology and lemmatizer API  
						
						 
						
						
						
						
						Take morphology features as object instead of keyword arguments 
						
					 
					
						2016-12-07 21:12:49 +01:00  
					
					
						 
						
							
							
							 
						
					 
				 
			
				
					
						
							
							
								 
								Matthew Honnibal 
							
						 
					 
					
						
						
						
						
							
						
						
							e30348b331 
							
						 
					 
					
						
						
							
							Prefer to import from symbols instead of parts_of_speech  
						
						 
						
						
						
					 
					
						2016-11-04 00:27:55 +01:00  
					
					
						 
						
							
							
							 
						
					 
				 
			
				
					
						
							
							
								 
								Matthew Honnibal 
							
						 
					 
					
						
						
						
						
							
						
						
							f5fe4f595b 
							
						 
					 
					
						
						
							
							Fix json loading, for Python 3.  
						
						 
						
						
						
					 
					
						2016-10-20 21:23:26 +02:00  
					
					
						 
						
							
							
							 
						
					 
				 
			
				
					
						
							
							
								 
								Matthew Honnibal 
							
						 
					 
					
						
						
						
						
							
						
						
							2e92c6fb3a 
							
						 
					 
					
						
						
							
							Fix JSON encoding issue on load  
						
						 
						
						
						
					 
					
						2016-10-20 21:06:48 +02:00  
					
					
						 
						
							
							
							 
						
					 
				 
			
				
					
						
							
							
								 
								Matthew Honnibal 
							
						 
					 
					
						
						
						
						
							
						
						
							f189a3cb00 
							
						 
					 
					
						
						
							
							Fix encoding when opening files in Python 2.7, re Issue #539
						
						 
						
						
						
					 
					
						2016-10-20 14:42:56 +02:00  
					
					
						 
						
							
							
							 
						
					 
				 
			
				
					
						
							
							
								 
								Matthew Honnibal 
							
						 
					 
					
						
						
						
						
							
						
						
							a2f3510d6d 
							
						 
					 
					
						
						
							
							Fix lemmatizer  
						
						 
						
						
						
					 
					
						2016-09-27 17:47:05 +02:00  
					
					
						 
						
							
							
							 
						
					 
				 
			
				
					
						
							
							
								 
								Matthew Honnibal 
							
						 
					 
					
						
						
						
						
							
						
						
							35cd953f9e 
							
						 
					 
					
						
						
							
							Fix pos name conflict with morphology  
						
						 
						
						
						
					 
					
						2016-09-27 14:16:22 +02:00  
					
					
						 
						
							
							
							 
						
					 
				 
			
				
					
						
							
							
								 
								Matthew Honnibal 
							
						 
					 
					
						
						
						
						
							
						
						
							40509e8bca 
							
						 
					 
					
						
						
							
							Tweak the new is_base_form logic, because we can expect the 'pos' key in the morphology we're passed.  
						
						 
						
						
						
					 
					
						2016-09-27 14:01:16 +02:00  
					
					
						 
						
							
							
							 
						
					 
				 
			
				
					
						
							
							
								 
								Matthew Honnibal 
							
						 
					 
					
						
						
						
						
							
						
						
							3cb4d455d2 
							
						 
					 
					
						
						
							
							Pass lemmatizer morphological features, so that rules are sensitive to base/inflected distinction, which is how the WordNet data is designed. See Issue #435
						
						 
						
						
						
					 
					
						2016-09-27 13:52:11 +02:00  
					
					
						 
						
							
							
							 
						
					 
				 
			
				
					
						
							
							
								 
								Matthew Honnibal 
							
						 
					 
					
						
						
						
						
							
						
						
							fd65cf6cbb 
							
						 
					 
					
						
						
							
							Finish refactoring data loading  
						
						 
						
						
						
					 
					
						2016-09-24 20:26:17 +02:00  
					
					
						 
						
							
							
							 
						
					 
				 
			
				
					
						
							
							
								 
								Matthew Honnibal 
							
						 
					 
					
						
						
						
						
							
						
						
							83e364188c 
							
						 
					 
					
						
						
							
							Mostly finished loading refactoring. Design is in place, but doesn't work yet.  
						
						 
						
						
						
					 
					
						2016-09-24 15:42:01 +02:00  
					
					
						 
						
							
							
							 
						
					 
				 
			
				
					
						
							
							
								 
								Henning Peters 
							
						 
					 
					
						
						
						
						
							
						
						
							846fa49b2a 
							
						 
					 
					
						
						
							
							distinct load() and from_package() methods  
						
						 
						
						
						
					 
					
						2016-01-16 10:00:57 +01:00  
					
					
						 
						
							
							
							 
						
					 
				 
			
				
					
						
							
							
								 
								Henning Peters 
							
						 
					 
					
						
						
						
						
							
						
						
							788f734513 
							
						 
					 
					
						
						
							
							refactored data_dir->via, add zip_safe, add spacy.load()  
						
						 
						
						
						
					 
					
						2016-01-15 18:01:02 +01:00  
					
					
						 
						
							
							
							 
						
					 
				 
			
				
					
						
							
							
								 
								Henning Peters 
							
						 
					 
					
						
						
						
						
							
						
						
							bc229790ac 
							
						 
					 
					
						
						
							
							integrate with sputnik  
						
						 
						
						
						
					 
					
						2016-01-13 19:46:17 +01:00  
					
					
						 
						
							
							
							 
						
					 
				 
			
				
					
						
							
							
								 
								Matthew Honnibal 
							
						 
					 
					
						
						
						
						
							
						
						
							eaf2ad59f1 
							
						 
					 
					
						
						
							
							* Fix use of mock Package object  
						
						 
						
						
						
					 
					
						2015-12-31 04:13:15 +01:00  
					
					
						 
						
							
							
							 
						
					 
				 
			
				
					
						
							
							
								 
								Matthew Honnibal 
							
						 
					 
					
						
						
						
						
							
						
						
							55bcdf8bdd 
							
						 
					 
					
						
						
							
							* Fix errors  
						
						 
						
						
						
					 
					
						2015-12-29 22:32:03 +01:00  
					
					
						 
						
							
							
							 
						
					 
				 
			
				
					
						
							
							
								 
								Matthew Honnibal 
							
						 
					 
					
						
						
						
						
							
						
						
							aec130af56 
							
						 
					 
					
						
						
							
							Use util.Package class for io  
						
						 
						
						
						
						
						Previous Sputnik integration caused API change: Vocab, Tagger, etc
were loaded via a from_package classmethod, that required a
sputnik.Package instance. This forced users to first create a
sputnik.Sputnik() instance, in order to acquire a Package via
sp.pool().
Instead I've created a small file-system shim, util.Package, which
allows classes to have a .load() classmethod, that accepts either
util.Package objects, or strings. We can later gut the internals
of this and make it a proxy for Sputnik if we need more functionality
that should live in the Sputnik library.
Sputnik is now only used to download and install the data, in
spacy.en.download 
						
					 
					
						2015-12-29 18:00:48 +01:00  
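
The commit message above describes a small file-system shim that lets a class expose a `.load()` classmethod accepting either a package object or a plain string. A minimal sketch of that pattern follows. It is an illustration only, not the code from the commit: the `Package` methods, the `index.txt` file name, and the `index` attribute are all hypothetical.

```python
import os
import tempfile


class Package:
    """Hypothetical file-system shim that wraps a data directory."""

    def __init__(self, data_dir):
        self.data_dir = str(data_dir)

    def file_path(self, *parts):
        # Build a path inside the wrapped directory.
        return os.path.join(self.data_dir, *parts)

    def has_file(self, *parts):
        return os.path.isfile(self.file_path(*parts))


class Lemmatizer:
    """Toy stand-in; only the loading pattern matters here."""

    def __init__(self, index):
        self.index = index

    @classmethod
    def load(cls, via):
        # Accept either a Package object or a plain path string.
        package = via if isinstance(via, Package) else Package(via)
        index = set()
        # "index.txt" is an assumed file name, one word per line.
        if package.has_file("index.txt"):
            with open(package.file_path("index.txt"), encoding="utf8") as f:
                index = {line.strip() for line in f if line.strip()}
        return cls(index)


# Usage: both call styles resolve to the same data.
with tempfile.TemporaryDirectory() as tmp:
    with open(os.path.join(tmp, "index.txt"), "w", encoding="utf8") as f:
        f.write("run\nwalk\n")
    from_string = Lemmatizer.load(tmp)
    from_package = Lemmatizer.load(Package(tmp))
```

Normalizing the argument at the top of `load()` keeps the rest of the method independent of how the caller located the data, which is presumably why the shim could later be swapped for a proxy without changing callers.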
					
					
						 
						
							
							
							 
						
					 
				 
			
				
					
						
							
							
								 
								Matthew Honnibal 
							
						 
					 
					
						
						
						
						
							
						
						
							c5902f2b4b 
							
						 
					 
					
						
						
							
							* Upd Lemmatizer to use MockPackage. Replace from_package with load() classmethod  
						
						 
						
						
						
					 
					
						2015-12-29 16:56:02 +01:00  
					
					
						 
						
							
							
							 
						
					 
				 
			
				
					
						
							
							
								 
								Henning Peters 
							
						 
					 
					
						
						
						
						
							
						
						
							8359bd4d93 
							
						 
					 
					
						
						
							
							strip data/ from package, friendlier Language invocation, make data_dir backward/forward-compatible  
						
						 
						
						
						
					 
					
						2015-12-18 09:52:55 +01:00  
					
					
						 
						
							
							
							 
						
					 
				 
			
				
					
						
							
							
								 
								Henning Peters 
							
						 
					 
					
						
						
						
						
							
						
						
							9027cef3bc 
							
						 
					 
					
						
						
							
							access model via sputnik  
						
						 
						
						
						
					 
					
						2015-12-07 06:01:28 +01:00  
					
					
						 
						
							
							
							 
						
					 
				 
			
				
					
						
							
							
								 
								maxirmx 
							
						 
					 
					
						
						
						
						
							
						
						
							f07e4accd7 
							
						 
					 
					
						
						
							
							Fixing encoding issue #4
						
						 
						
						
						
					 
					
						2015-10-21 20:45:56 +03:00  
					
					
						 
						
							
							
							 
						
					 
				 
			
				
					
						
							
							
								 
								maxirmx 
							
						 
					 
					
						
						
						
						
							
						
						
							fcbfff043f 
							
						 
					 
					
						
						
							
							Fixing encoding issue #3
						
						 
						
						
						
					 
					
						2015-10-21 15:52:34 +03:00  
					
					
						 
						
							
							
							 
						
					 
				 
			
				
					
						
							
							
								 
								maxirmx 
							
						 
					 
					
						
						
						
						
							
						
						
							fe9d2e2c4e 
							
						 
					 
					
						
						
							
							Fixing encode issue #2
						
						 
						
						
						
					 
					
						2015-10-21 15:36:21 +03:00  
					
					
						 
						
							
							
							 
						
					 
				 
			
				
					
						
							
							
								 
								maxirmx 
							
						 
					 
					
						
						
						
						
							
						
						
							e4a1726f77 
							
						 
					 
					
						
						
							
							Fixing encoding issue  
						
						 
						
						
						
						
						UTF-8 
						
					 
					
						2015-10-21 14:16:37 +03:00  
					
					
						 
						
							
							
							 
						
					 
				 
			
				
					
						
							
							
								 
								Matthew Honnibal 
							
						 
					 
					
						
						
						
						
							
						
						
							5332c0b697 
							
						 
					 
					
						
						
							
							* Add support for punctuation lemmatization, to handle unicode characters. This should help in addressing Issue #130
						
						 
						
						
						
					 
					
						2015-10-09 18:54:40 +11:00