Matthew Honnibal 
							
						 
					 
					
						
						
						
						
							
						
						
							95aaea0d3f 
							
						 
					 
					
						
						
							
							Refactor so that the tokenizer data is read from Python data, rather than from disk  
						
						
						
					 
					
						2016-09-25 14:49:53 +02:00 
						 
				 
			
				
					
						
							
							
								Matthew Honnibal 
							
						 
					 
					
						
						
						
						
							
						
						
							82b8cc5efb 
							
						 
					 
					
						
						
							
							Whitespace  
						
						
						
					 
					
						2016-09-24 22:17:01 +02:00 
						 
				 
			
				
					
						
							
							
								Matthew Honnibal 
							
						 
					 
					
						
						
						
						
							
						
						
							f19af6cb2c 
							
						 
					 
					
						
						
							
							Python 3 compatible basestring  
						
						
						
					 
					
						2016-09-24 22:08:43 +02:00 
						 
				 
			
				
					
						
							
							
								Matthew Honnibal 
							
						 
					 
					
						
						
						
						
							
						
						
							fd65cf6cbb 
							
						 
					 
					
						
						
							
							Finish refactoring data loading  
						
						
						
					 
					
						2016-09-24 20:26:17 +02:00 
						 
				 
			
				
					
						
							
							
								Matthew Honnibal 
							
						 
					 
					
						
						
						
						
							
						
						
							83e364188c 
							
						 
					 
					
						
						
							
							Mostly finished loading refactoring. Design is in place, but doesn't work yet.  
						
						
						
					 
					
						2016-09-24 15:42:01 +02:00 
						 
				 
			
				
					
						
							
							
								Daylen Yang 
							
						 
					 
					
						
						
						
						
							
						
						
							5405e7dd73 
							
						 
					 
					
						
						
							
							Fix get_lang_class parsing (take 2)  
						
						
						
					 
					
						2016-05-16 16:40:31 -07:00 
						 
				 
			
				
					
						
							
							
								Matthew Honnibal 
							
						 
					 
					
						
						
						
						
							
						
						
							b240104f40 
							
						 
					 
					
						
						
							
							Revert "Fix get_lang_class parsing"  
						
						
						
					 
					
						2016-05-17 08:04:26 +10:00 
						 
				 
			
				
					
						
							
							
								Daylen Yang 
							
						 
					 
					
						
						
						
						
							
						
						
							1692c2df3c 
							
						 
					 
					
						
						
							
							Fix get_lang_class parsing  
						
						
						
						
						We want the get_lang_class to return "en" for both "en" and "en_glove_cc_300_1m_vectors". Changed the split rule to "_" so that this happens. 
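A minimal sketch of the split rule described in this commit message, assuming the language code is simply the text before the first underscore in a model name. The helper name `lang_code_from_name` is hypothetical and is not taken from the spaCy source; the actual `get_lang_class` may do more than this (for example, look the code up in a registry of language classes).

```python
def lang_code_from_name(name):
    """Return the language-code prefix of a model name.

    Hypothetical helper for illustration only: it shows the "split on
    underscore" rule from the commit message, not spaCy's real code.
    """
    # "en" has no underscore, so split() returns ["en"] and we keep "en".
    # "en_glove_cc_300_1m_vectors" splits into ["en", "glove", ...] and
    # the first element is again "en".
    return name.split("_")[0]


if __name__ == "__main__":
    for model_name in ("en", "en_glove_cc_300_1m_vectors"):
        print(model_name, "->", lang_code_from_name(model_name))
```

Under this assumption both names resolve to the same language code, which is the behaviour the message asks for.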
						
					 
					
						2016-05-16 14:38:20 -07:00 
						 
				 
			
				
					
						
							
							
								Henning Peters 
							
						 
					 
					
						
						
						
						
							
						
						
							ff690f76ba 
							
						 
					 
					
						
						
							
							fix loading non-german models  
						
						
						
					 
					
						2016-04-12 16:00:56 +02:00 
						 
				 
			
				
					
						
							
							
								Henning Peters 
							
						 
					 
					
						
						
						
						
							
						
						
							c90d4a6f17 
							
						 
					 
					
						
						
							
							relative imports in __init__.py  
						
						
						
					 
					
						2016-03-26 11:44:53 +01:00 
						 
				 
			
				
					
						
							
							
								Henning Peters 
							
						 
					 
					
						
						
						
						
							
						
						
							b8f63071eb 
							
						 
					 
					
						
						
							
							add lang registration facility  
						
						
						
					 
					
						2016-03-25 18:54:45 +01:00 
						 
				 
			
				
					
						
							
							
								Henning Peters 
							
						 
					 
					
						
						
						
						
							
						
						
							a7d7ea3afa 
							
						 
					 
					
						
						
							
							first idea for supporting multiple langs in download script  
						
						
						
					 
					
						2016-03-24 11:19:43 +01:00 
						 
				 
			
				
					
						
							
							
								Henning Peters 
							
						 
					 
					
						
						
						
						
							
						
						
							eb7ae61b1c 
							
						 
					 
					
						
						
							
							cleanup api  
						
						
						
					 
					
						2016-03-08 12:59:18 +01:00 
						 
				 
			
				
					
						
							
							
								Henning Peters 
							
						 
					 
					
						
						
						
						
							
						
						
							9cc4f8d5b3 
							
						 
					 
					
						
						
							
							avoid shadowing __name__  
						
						
						
					 
					
						2016-02-15 01:33:39 +01:00 
						 
				 
			
				
					
						
							
							
								Henning Peters 
							
						 
					 
					
						
						
						
						
							
						
						
							235f094534 
							
						 
					 
					
						
						
							
							untangle data_path/via  
						
						
						
					 
					
						2016-01-16 12:23:45 +01:00 
						 
				 
			
				
					
						
							
							
								Henning Peters 
							
						 
					 
					
						
						
						
						
							
						
						
							6d1a3af343 
							
						 
					 
					
						
						
							
							cleanup unused  
						
						
						
					 
					
						2016-01-16 10:05:04 +01:00 
						 
				 
			
				
					
						
							
							
								Henning Peters 
							
						 
					 
					
						
						
						
						
							
						
						
							846fa49b2a 
							
						 
					 
					
						
						
							
							distinct load() and from_package() methods  
						
						
						
					 
					
						2016-01-16 10:00:57 +01:00 
						 
				 
			
				
					
						
							
							
								Henning Peters 
							
						 
					 
					
						
						
						
						
							
						
						
							211913d689 
							
						 
					 
					
						
						
							
							add about.py, adapt setup.py  
						
						
						
					 
					
						2016-01-15 18:57:01 +01:00 
						 
				 
			
				
					
						
							
							
								Henning Peters 
							
						 
					 
					
						
						
						
						
							
						
						
							788f734513 
							
						 
					 
					
						
						
							
							refactored data_dir->via, add zip_safe, add spacy.load()  
						
						
						
					 
					
						2016-01-15 18:01:02 +01:00 
						 
				 
			
				
					
						
							
							
								Henning Peters 
							
						 
					 
					
						
						
						
						
							
						
						
							d9471f684f 
							
						 
					 
					
						
						
							
							fix typo  
						
						
						
					 
					
						2016-01-14 12:14:12 +01:00 
						 
				 
			
				
					
						
							
							
								Henning Peters 
							
						 
					 
					
						
						
						
						
							
						
						
							9b75d872b0 
							
						 
					 
					
						
						
							
							fix model download  
						
						
						
					 
					
						2016-01-14 12:02:56 +01:00 
						 
				 
			
				
					
						
							
							
								Henning Peters 
							
						 
					 
					
						
						
						
						
							
						
						
							bc229790ac 
							
						 
					 
					
						
						
							
							integrate with sputnik  
						
						
						
					 
					
						2016-01-13 19:46:17 +01:00 
						 
				 
			
				
					
						
							
							
								Matthew Honnibal 
							
						 
					 
					
						
						
						
						
							
						
						
							eaf2ad59f1 
							
						 
					 
					
						
						
							
							* Fix use of mock Package object  
						
						
						
					 
					
						2015-12-31 04:13:15 +01:00 
						 
				 
			
				
					
						
							
							
								Matthew Honnibal 
							
						 
					 
					
						
						
						
						
							
						
						
							a2dfdec85d 
							
						 
					 
					
						
						
							
							* Clean up spacy.util  
						
						
						
					 
					
						2015-12-29 18:06:09 +01:00 
						 
				 
			
				
					
						
							
							
								Matthew Honnibal 
							
						 
					 
					
						
						
						
						
							
						
						
							aec130af56 
							
						 
					 
					
						
						
							
							Use util.Package class for io  
						
						
						
						
						Previous Sputnik integration caused API change: Vocab, Tagger, etc
were loaded via a from_package classmethod, that required a
sputnik.Package instance. This forced users to first create a
sputnik.Sputnik() instance, in order to acquire a Package via
sp.pool().
Instead I've created a small file-system shim, util.Package, which
allows classes to have a .load() classmethod, that accepts either
util.Package objects, or strings. We can later gut the internals
of this and make it a proxy for Sputnik if we need more functionality
that should live in the Sputnik library.
Sputnik is now only used to download and install the data, in
spacy.en.download 
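The shim idea in this commit message could be sketched roughly as follows. Everything here is an assumption for illustration: the `Package` methods, the `Vocab` stand-in, and the `pkg_or_path` parameter name are hypothetical and are not copied from `spacy.util`. The sketch only shows the pattern of a `.load()` classmethod that accepts either a package object or a plain string path.

```python
import os


class Package(object):
    """Hypothetical file-system shim that wraps a data directory."""

    def __init__(self, data_dir):
        self.data_dir = str(data_dir)

    def file_path(self, *parts):
        # Build a path to a file inside the wrapped directory.
        return os.path.join(self.data_dir, *parts)

    def has_file(self, *parts):
        # True if the file exists on disk under the wrapped directory.
        return os.path.exists(self.file_path(*parts))


class Vocab(object):
    """Stand-in component class; not spaCy's real Vocab."""

    def __init__(self, package):
        self.package = package

    @classmethod
    def load(cls, pkg_or_path):
        # Accept either a Package instance or a directory given as a string,
        # so callers do not have to construct a package object themselves.
        if isinstance(pkg_or_path, Package):
            package = pkg_or_path
        else:
            package = Package(pkg_or_path)
        return cls(package)


if __name__ == "__main__":
    vocab = Vocab.load("/tmp/spacy-data")  # plain string path
    print(vocab.package.file_path("vocab", "strings.json"))
```

With this shape, a caller can pass a path directly, and the internals of the hypothetical `Package` class could later be swapped for a proxy to another backend without changing the `.load()` call sites.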
						
					 
					
						2015-12-29 18:00:48 +01:00 
						 
				 
			
				
					
						
							
							
								Matthew Honnibal 
							
						 
					 
					
						
						
						
						
							
						
						
							4131e45543 
							
						 
					 
					
						
						
							
							* Add MockPackage class, to see whether we can proxy for Sputnik in a lightweight way  
						
						
						
					 
					
						2015-12-29 16:55:03 +01:00 
						 
				 
			
				
					
						
							
							
								Henning Peters 
							
						 
					 
					
						
						
						
						
							
						
						
							d8d348bb55 
							
						 
					 
					
						
						
							
							allow to specify version constraint within model name  
						
						
						
					 
					
						2015-12-18 19:12:08 +01:00 
						 
				 
			
				
					
						
							
							
								Henning Peters 
							
						 
					 
					
						
						
						
						
							
						
						
							cfa187aaf0 
							
						 
					 
					
						
						
							
							fix tests  
						
						
						
					 
					
						2015-12-18 10:58:02 +01:00 
						 
				 
			
				
					
						
							
							
								Henning Peters 
							
						 
					 
					
						
						
						
						
							
						
						
							8359bd4d93 
							
						 
					 
					
						
						
							
							strip data/ from package, friendlier Language invocation, make data_dir backward/forward-compatible  
						
						
						
					 
					
						2015-12-18 09:52:55 +01:00 
						 
				 
			
				
					
						
							
							
								Henning Peters 
							
						 
					 
					
						
						
						
						
							
						
						
							9027cef3bc 
							
						 
					 
					
						
						
							
							access model via sputnik  
						
						
						
					 
					
						2015-12-07 06:01:28 +01:00 
						 
				 
			
				
					
						
							
							
								Matthew Honnibal 
							
						 
					 
					
						
						
						
						
							
						
						
							dc393a5f1d 
							
						 
					 
					
						
						
							
							Merge pull request #126 from tomtung/master
						
						
						
						
						Improve slicing support for both Doc and Span 
						
					 
					
						2015-10-10 14:14:57 +11:00 
						 
				 
			
				
					
						
							
							
								Matthew Honnibal 
							
						 
					 
					
						
						
						
						
							
						
						
							83dccf0fd7 
							
						 
					 
					
						
						
							
							* Use io module instead of deprecated codecs module  
						
						
						
					 
					
						2015-10-10 14:13:01 +11:00 
						 
				 
			
				
					
						
							
							
								Yubing (Tom) Dong 
							
						 
					 
					
						
						
						
						
							
						
						
							3fd3bc79aa 
							
						 
					 
					
						
						
							
							Refactor to remove duplicate slicing logic  
						
						
						
					 
					
						2015-10-07 01:25:35 -07:00 
						 
				 
			
				
					
						
							
							
								alvations 
							
						 
					 
					
						
						
						
						
							
						
						
							8199012d26 
							
						 
					 
					
						
						
							
							changing deprecated codecs.open to io.open =)  
						
						
						
					 
					
						2015-09-30 20:10:15 +02:00 
						 
				 
			
				
					
						
							
							
								Matthew Honnibal 
							
						 
					 
					
						
						
						
						
							
						
						
							6ab1696b15 
							
						 
					 
					
						
						
							
							* Remove read_encoding_freqs from util.py  
						
						
						
					 
					
						2015-07-23 01:17:32 +02:00 
						 
				 
			
				
					
						
							
							
								Matthew Honnibal 
							
						 
					 
					
						
						
						
						
							
						
						
							317cbbc015 
							
						 
					 
					
						
						
							
							* Serialization round trip now working with decent API, but with rough spots in the organisation and requiring vocabulary to be fixed ahead of time.  
						
						
						
					 
					
						2015-07-19 15:18:17 +02:00 
						 
				 
			
				
					
						
							
							
								Jordan Suchow 
							
						 
					 
					
						
						
						
						
							
						
						
							3a8d9b37a6 
							
						 
					 
					
						
						
							
							Remove trailing whitespace  
						
						
						
					 
					
						2015-04-19 13:01:38 -07:00 
						 
				 
			
				
					
						
							
							
								Jordan Suchow 
							
						 
					 
					
						
						
						
						
							
						
						
							5f0f940a1f 
							
						 
					 
					
						
						
							
							Remove unused imports  
						
						
						
					 
					
						2015-04-19 01:05:22 -07:00 
						 
				 
			
				
					
						
							
							
								Matthew Honnibal 
							
						 
					 
					
						
						
						
						
							
						
						
							3f1944d688 
							
						 
					 
					
						
						
							
							* Make PyPy work  
						
						
						
					 
					
						2015-01-05 17:54:38 +11:00 
						 
				 
			
				
					
						
							
							
								Matthew Honnibal 
							
						 
					 
					
						
						
						
						
							
						
						
							f5d41028b5 
							
						 
					 
					
						
						
							
							* Move around data files for test release  
						
						
						
					 
					
						2015-01-03 01:59:22 +11:00 
						 
				 
			
				
					
						
							
							
								Matthew Honnibal 
							
						 
					 
					
						
						
						
						
							
						
						
							e1c1a4b868 
							
						 
					 
					
						
						
							
							* Tmp  
						
						
						
					 
					
						2014-12-21 05:36:29 +11:00 
						 
				 
			
				
					
						
							
							
								Matthew Honnibal 
							
						 
					 
					
						
						
						
						
							
						
						
							b962fe73d7 
							
						 
					 
					
						
						
							
							* Make suffixes file use full-power regex, so that we can handle periods properly  
						
						
						
					 
					
						2014-12-09 19:04:27 +11:00 
						 
				 
			
				
					
						
							
							
								Matthew Honnibal 
							
						 
					 
					
						
						
						
						
							
						
						
							302e09018b 
							
						 
					 
					
						
						
							
							* Work on fixing special-cases, reading them in as JSON objects so that they can specify lemmas  
						
						
						
					 
					
						2014-12-09 14:48:01 +11:00 
						 
				 
			
				
					
						
							
							
								Matthew Honnibal 
							
						 
					 
					
						
						
						
						
							
						
						
							ea8f1e7053 
							
						 
					 
					
						
						
							
							* Tighten interfaces  
						
						
						
					 
					
						2014-10-30 18:14:42 +11:00 
						 
				 
			
				
					
						
							
							
								Matthew Honnibal 
							
						 
					 
					
						
						
						
						
							
						
						
							67c8c8019f 
							
						 
					 
					
						
						
							
							* Update lexeme serialization, using a binary file format  
						
						
						
					 
					
						2014-10-30 01:01:00 +11:00 
						 
				 
			
				
					
						
							
							
								Matthew Honnibal 
							
						 
					 
					
						
						
						
						
							
						
						
							43d5964e13 
							
						 
					 
					
						
						
							
							* Add function to read detokenization rules  
						
						
						
					 
					
						2014-10-22 12:54:59 +11:00 
						 
				 
			
				
					
						
							
							
								Matthew Honnibal 
							
						 
					 
					
						
						
						
						
							
						
						
							12742f4f83 
							
						 
					 
					
						
						
							
							* Add detokenize method and test  
						
						
						
					 
					
						2014-10-18 18:07:29 +11:00 
						 
				 
			
				
					
						
							
							
								Matthew Honnibal 
							
						 
					 
					
						
						
						
						
							
						
						
							6fb42c4919 
							
						 
					 
					
						
						
							
							* Add offsets to Tokens class. Some changes to interfaces, and reorganization of spacy.Lang  
						
						
						
					 
					
						2014-10-14 16:17:45 +11:00 
						 
				 
			
				
					
						
							
							
								Matthew Honnibal 
							
						 
					 
					
						
						
						
						
							
						
						
							e40caae51f 
							
						 
					 
					
						
						
							
							* Update Lexicon class to expect a list of lexeme dict descriptions  
						
						
						
					 
					
						2014-10-09 14:51:35 +11:00 
						 
				 
			
				
					
						
							
							
								Matthew Honnibal 
							
						 
					 
					
						
						
						
						
							
						
						
							2e44fa7179 
							
						 
					 
					
						
						
							
							* Add util.py  
						
						
						
					 
					
						2014-09-25 18:26:22 +02:00 
						 
				 
			
				
					
						
							
							
								Matthew Honnibal 
							
						 
					 
					
						
						
						
						
							
						
						
							e9a62b6eba 
							
						 
					 
					
						
						
							
							* Refactoring with Lexeme as a class now compiles. Basic design seems to work  
						
						
						
					 
					
						2014-08-27 17:15:39 +02:00 
						 
				 
			
				
					
						
							
							
								Matthew Honnibal 
							
						 
					 
					
						
						
						
						
							
						
						
							d10993f41a 
							
						 
					 
					
						
						
							
							* More docs work  
						
						
						
					 
					
						2014-08-21 16:37:13 +02:00 
						 
				 
			
				
					
						
							
							
								Matthew Honnibal 
							
						 
					 
					
						
						
						
						
							
						
						
							3379d7a571 
							
						 
					 
					
						
						
							
							* Reforming data model for lexemes  
						
						
						
					 
					
						2014-08-19 02:40:37 +02:00 
						 
				 
			
				
					
						
							
							
								Matthew Honnibal 
							
						 
					 
					
						
						
						
						
							
						
						
							01469b0888 
							
						 
					 
					
						
						
							
							* Refactor spacy so that chunks return arrays of lexemes, so that there is properly one lexeme per word.  
						
						
						
					 
					
						2014-08-18 19:14:00 +02:00 
						 
				 
			
				
					
						
							
							
								Matthew Honnibal 
							
						 
					 
					
						
						
						
						
							
						
						
							ff1869ff07 
							
						 
					 
					
						
						
							
							* Fixed major efficiency problem, from not quite grokking pass by reference in cython c++  
						
						
						
					 
					
						2014-07-07 07:36:43 +02:00 
						 
				 
			
				
					
						
							
							
								Matthew Honnibal 
							
						 
					 
					
						
						
						
						
							
						
						
							25849fc926 
							
						 
					 
					
						
						
							
							* Generalize tokenization rules to capitals  
						
						
						
					 
					
						2014-07-07 05:07:21 +02:00 
						 
				 
			
				
					
						
							
							
								Matthew Honnibal 
							
						 
					 
					
						
						
						
						
							
						
						
							4e79446dc2 
							
						 
					 
					
						
						
							
							* Reading in tokenization rules correctly. Passing tests.  
						
						
						
					 
					
						2014-07-07 00:02:55 +02:00 
						 
				 
			
				
					
						
							
							
								Matthew Honnibal 
							
						 
					 
					
						
						
						
						
							
						
						
							556f6a18ca 
							
						 
					 
					
						
						
							
							* Initial commit. Tests passing for punctuation handling. Need contractions, file transport, tokenize function, etc.  
						
						
						
					 
					
						2014-07-05 20:51:42 +02:00