Matthew Honnibal 
							
						 
					 
					
						
						
						
						
							
						
						
							9c4d0aae62 
							
						 
					 
					
						
						
							
							* Switch to better Python2/3 compatible unicode handling  
						
						 
						
						
						
					 
					
						2015-07-28 14:45:37 +02:00  
					
					
						 
						
							
							
							 
						
					 
				 
			
				
					
						
							
							
								 
								Matthew Honnibal 
							
						 
					 
					
						
						
						
						
							
						
						
							df01a88763 
							
						 
					 
					
						
						
							
							Merge branch 'refactor' (and serialization)
						
						 
						
						
						
						
						Add Huffman-code serialization, and do a lot of
refactoring. Highlights include:
* Much more efficient StringStore
* Vocab maintains a by-orth mapping of Lexemes
* Avoid manually slicing Py_UNICODE buffers,
  simplifying tokenizer and vocab C APIs
* Remove various bits of dead code
* Work on removing GIL around parser
* Work on bridge to Theano
Conflicts:
	spacy/strings.pxd
	spacy/strings.pyx
	spacy/structs.pxd 
						
					 
					
						2015-07-23 02:18:35 +02:00  
					
					
						 
						
							
							
							 
						
					 
				 
			
				
					
						
							
							
								 
								Matthew Honnibal 
							
						 
					 
					
						
						
						
						
							
						
						
							bf77bcd6b9 
							
						 
					 
					
						
						
							
							* Add comment explaining hash_string  
						
						 
						
						
						
					 
					
						2015-07-22 13:39:42 +02:00  
					
					
						 
						
							
							
							 
						
					 
				 
			
				
					
						
							
							
								 
								Matthew Honnibal 
							
						 
					 
					
						
						
						
						
							
						
						
							dd60594f41 
							
						 
					 
					
						
						
							
							* Fix double encoding error in strings.pyx  
						
						 
						
						
						
					 
					
						2015-07-20 13:52:56 +02:00  
					
					
						 
						
							
							
							 
						
					 
				 
			
				
					
						
							
							
								 
								Matthew Honnibal 
							
						 
					 
					
						
						
						
						
							
						
						
							52d538ea42 
							
						 
					 
					
						
						
							
							* Fix short string optimization in strings.pyx. StringStore tests now all pass.  
						
						 
						
						
						
					 
					
						2015-07-20 12:05:23 +02:00  
					
					
						 
						
							
							
							 
						
					 
				 
			
				
					
						
							
							
								 
								Matthew Honnibal 
							
						 
					 
					
						
						
						
						
							
						
						
							09a3055630 
							
						 
					 
					
						
						
							
							* Work on short string optimization in Utf8Str  
						
						 
						
						
						
					 
					
						2015-07-20 11:26:46 +02:00  
					
					
						 
						
							
							
							 
						
					 
				 
			
				
					
						
							
							
								 
								Matthew Honnibal 
							
						 
					 
					
						
						
						
						
							
						
						
							4dddc8a69b 
							
						 
					 
					
						
						
							
							* Fix type declarations for attr_t. Remove unused id_t.  
						
						 
						
						
						
					 
					
						2015-07-18 22:39:57 +02:00  
					
					
						 
						
							
							
							 
						
					 
				 
			
				
					
						
							
							
								 
								Matthew Honnibal 
							
						 
					 
					
						
						
						
						
							
						
						
							15ff739996 
							
						 
					 
					
						
						
							
							* Fix passing of ID attribute in string store  
						
						 
						
						
						
					 
					
						2015-07-17 14:49:42 +02:00  
					
					
						 
						
							
							
							 
						
					 
				 
			
				
					
						
							
							
								 
								Matthew Honnibal 
							
						 
					 
					
						
						
						
						
							
						
						
							95e57c2780 
							
						 
					 
					
						
						
							
							* Remove unnecessary key and id properties from Utf8String.  
						
						 
						
						
						
					 
					
						2015-07-17 01:40:18 +02:00  
					
					
						 
						
							
							
							 
						
					 
				 
			
				
					
						
							
							
								 
								Matthew Honnibal 
							
						 
					 
					
						
						
						
						
							
						
						
							d1cb30dbc4 
							
						 
					 
					
						
						
							
							* Remove unnecessary key and id properties from Utf8String.  
						
						 
						
						
						
					 
					
						2015-07-16 19:29:02 +02:00  
					
					
						 
						
							
							
							 
						
					 
				 
			
				
					
						
							
							
								 
								Matthew Honnibal 
							
						 
					 
					
						
						
						
						
							
						
						
							8bf0f65f1c 
							
						 
					 
					
						
						
							
							* Remove dead code in strings.pyx  
						
						 
						
						
						
					 
					
						2015-07-16 17:35:53 +02:00  
					
					
						 
						
							
							
							 
						
					 
				 
			
				
					
						
							
							
								 
								Matthew Honnibal 
							
						 
					 
					
						
						
						
						
							
						
						
							a9c3863665 
							
						 
					 
					
						
						
							
							* Fix inefficiency in StringStore.dump function  
						
						 
						
						
						
					 
					
						2015-07-16 17:34:32 +02:00  
					
					
						 
						
							
							
							 
						
					 
				 
			
				
					
						
							
							
								 
								Matthew Honnibal 
							
						 
					 
					
						
						
						
						
							
						
						
							cc579ed429 
							
						 
					 
					
						
						
							
							* Add __len__ function to StringStore  
						
						 
						
						
						
					 
					
						2015-06-23 00:02:50 +02:00  
					
					
						 
						
							
							
							 
						
					 
				 
			
				
					
						
							
							
								 
								Matthew Honnibal 
							
						 
					 
					
						
						
						
						
							
						
						
							71b95202eb 
							
						 
					 
					
						
						
							
							* Add docstring to StringStore  
						
						 
						
						
						
					 
					
						2015-01-24 20:49:15 +11:00  
					
					
						 
						
							
							
							 
						
					 
				 
			
				
					
						
							
							
								 
								Matthew Honnibal 
							
						 
					 
					
						
						
						
						
							
						
						
							7d3c40de7d 
							
						 
					 
					
						
						
							
							* Tests passing after refactor. API has obvious warts, particularly in Token and Lexeme  
						
						 
						
						
						
					 
					
						2015-01-15 00:33:16 +11:00  
					
					
						 
						
							
							
							 
						
					 
				 
			
				
					
						
							
							
								 
								Matthew Honnibal 
							
						 
					 
					
						
						
						
						
							
						
						
							0930892fc1 
							
						 
					 
					
						
						
							
							* Tmp. Working on refactor. Compiles, must hook up lexical feats.  
						
						 
						
						
						
					 
					
						2015-01-14 00:03:48 +11:00  
					
					
						 
						
							
							
							 
						
					 
				 
			
				
					
						
							
							
								 
								Matthew Honnibal 
							
						 
					 
					
						
						
						
						
							
						
						
							ce2edd6312 
							
						 
					 
					
						
						
							
							* Tmp commit. Refactoring to create a Python Lexeme class.  
						
						 
						
						
						
					 
					
						2015-01-12 10:26:22 +11:00  
					
					
						 
						
							
							
							 
						
					 
				 
			
				
					
						
							
							
								 
								Matthew Honnibal 
							
						 
					 
					
						
						
						
						
							
						
						
							73f200436f 
							
						 
					 
					
						
						
							
							* Tests passing except for morphology/lemmatization stuff  
						
						 
						
						
						
					 
					
						2014-12-23 11:40:32 +11:00  
					
					
						 
						
							
							
							 
						
					 
				 
			
				
					
						
							
							
								 
								Matthew Honnibal 
							
						 
					 
					
						
						
						
						
							
						
						
							cf8d26c3d2 
							
						 
					 
					
						
						
							
							* POS tagger training working after reorg  
						
						 
						
						
						
					 
					
						2014-12-22 08:54:47 +11:00  
					
					
						 
						
							
							
							 
						
					 
				 
			
				
					
						
							
							
								 
								Matthew Honnibal 
							
						 
					 
					
						
						
						
						
							
						
						
							4c4aa2c5c9 
							
						 
					 
					
						
						
							
							* Work on train  
						
						 
						
						
						
					 
					
						2014-12-22 07:25:43 +11:00  
					
					
						 
						
							
							
							 
						
					 
				 
			
				
					
						
							
							
								 
								Matthew Honnibal 
							
						 
					 
					
						
						
						
						
							
						
						
							89a1cc1a48 
							
						 
					 
					
						
						
							
							* Move murmurhash to .pxd in strings file  
						
						 
						
						
						
					 
					
						2014-12-20 07:41:08 +11:00  
					
					
						 
						
							
							
							 
						
					 
				 
			
				
					
						
							
							
								 
								Matthew Honnibal 
							
						 
					 
					
						
						
						
						
							
						
						
							7d48bba6c4 
							
						 
					 
					
						
						
							
							* Move StringStore class to its own file  
						
						 
						
						
						
					 
					
						2014-12-20 06:42:01 +11:00