| Author | Commit | Message | Date |
| --- | --- | --- | --- |
| Matthew Honnibal | 6c807aa45f | * Restore id attribute to lexeme, and rename pos field to postype, to store clustered tag dictionaries | 2014-10-31 17:43:00 +11:00 |
| Matthew Honnibal | c6fcd03692 | * Small efficiency tweak to lexeme init | 2014-10-30 17:56:11 +11:00 |
| Matthew Honnibal | 87c2418a89 | * Fiddle with data types on Lexeme, to compress them to a much smaller size. | 2014-10-30 15:42:15 +11:00 |
| Matthew Honnibal | e6b87766fe | * Remove lexemes vector from Lexicon, and the id and hash attributes from Lexeme | 2014-10-30 15:21:38 +11:00 |
| Matthew Honnibal | 67c8c8019f | * Update lexeme serialization, using a binary file format | 2014-10-30 01:01:00 +11:00 |
| Matthew Honnibal | 13909a2e24 | * Rewriting Lexeme serialization. | 2014-10-29 23:19:38 +11:00 |
| Matthew Honnibal | 08ce602243 | * Large refactor, particularly to Python API | 2014-10-24 00:59:17 +11:00 |
| Matthew Honnibal | e5e951ae67 | * Remove the feature array stuff from Tokens class, and replace vector with array-based implementation, with padding. | 2014-10-23 01:57:59 +11:00 |
| Matthew Honnibal | 0a0e41f6c8 | * Add prefix and suffix features | 2014-10-22 12:56:09 +11:00 |
| Matthew Honnibal | 65d3ead4fd | * Rename LexStr_casefix to LexStr_norm and LexInt_i to LexInt_id | 2014-10-14 15:19:07 +11:00 |
| Matthew Honnibal | 71ee921055 | * Slight cleaning of tokenizer code | 2014-10-10 19:17:22 +11:00 |
| Matthew Honnibal | 59b41a9fd3 | * Switch to new data model, tests passing | 2014-10-10 08:11:31 +11:00 |
| Matthew Honnibal | 1b0e01d3d8 | * Revising data model of lexeme. Compiles. | 2014-10-09 19:53:30 +11:00 |
| Matthew Honnibal | 51d75b244b | * Add serialize/deserialize functions for lexeme, transport to/from python dict. | 2014-10-09 14:10:46 +11:00 |
| Matthew Honnibal | d73d89a2de | * Add i attribute to lexeme, giving lexemes sequential IDs. | 2014-10-09 13:50:05 +11:00 |
| Matthew Honnibal | ac522e2553 | * Switch from own memory class to cymem, in pip | 2014-09-17 23:09:24 +02:00 |
| Matthew Honnibal | 6266cac593 | * Switch to using a Python ref counted gateway to malloc/free, to prevent memory leaks | 2014-09-17 20:02:26 +02:00 |
| Matthew Honnibal | c396581a0b | * Fiddle with the way strings are interned in lexeme | 2014-09-15 06:34:45 +02:00 |
| Matthew Honnibal | f77b7098c0 | * Upd Tokens to use vector, with bounds checking. | 2014-09-15 03:22:40 +02:00 |
| Matthew Honnibal | df24e3708c | * Move EnglishTokens stuff to Tokens | 2014-09-15 01:31:44 +02:00 |
| Matthew Honnibal | b488224c09 | * Restoring Lexeme-as-struct | 2014-09-10 20:41:37 +02:00 |
| Matthew Honnibal | 88095666dc | * Remove Lexeme struct, preparing to rename Word to Lexeme. | 2014-08-24 19:24:42 +02:00 |
| Matthew Honnibal | e289896603 | * Fix ptb3 module | 2014-08-22 16:36:17 +02:00 |
| Matthew Honnibal | d10993f41a | * More docs work | 2014-08-21 16:37:13 +02:00 |
| Matthew Honnibal | a78ad4152d | * Broken version being refactored for docs | 2014-08-20 13:39:39 +02:00 |
| Matthew Honnibal | 5fddb8d165 | * Working refactor, with updated data model for Lexemes | 2014-08-19 04:21:20 +02:00 |
| Matthew Honnibal | 3379d7a571 | * Reforming data model for lexemes | 2014-08-19 02:40:37 +02:00 |
| Matthew Honnibal | 01469b0888 | * Refactor spacy so that chunks return arrays of lexemes, so that there is properly one lexeme per word. | 2014-08-18 19:14:00 +02:00 |
| Matthew Honnibal | 6319ff0f22 | * Add length property | 2014-08-02 21:26:44 +01:00 |
| Matthew Honnibal | 571808a274 | Group-by seems to be working | 2014-07-07 20:27:02 +02:00 |
| Matthew Honnibal | 80b36f9f27 | * 710k words per second for counts | 2014-07-07 19:12:19 +02:00 |
| Matthew Honnibal | 057c21969b | * Refactor for string view features. Working on setting up flags and enums. | 2014-07-07 16:58:48 +02:00 |
| Matthew Honnibal | f1bcbd4c4e | * Reorganized code to accomodate Tokens class. Need string views before group_by and count_by can be done well. | 2014-07-07 12:47:21 +02:00 |
| Matthew Honnibal | ff1869ff07 | * Fixed major efficiency problem, from not quite grokking pass by reference in cython c++ | 2014-07-07 07:36:43 +02:00 |
| Matthew Honnibal | d5bef02c72 | * Reorganized, moving language-independent stuff to spacy. The functions in spacy ask for the dictionaries and split function on input, but the language-specific modules are curried versions that use the globals | 2014-07-07 04:21:06 +02:00 |
| Matthew Honnibal | 556f6a18ca | * Initial commit. Tests passing for punctuation handling. Need contractions, file transport, tokenize function, etc. | 2014-07-05 20:51:42 +02:00 |