| Author | Commit | Message | Date |
| --- | --- | --- | --- |
| Matthew Honnibal | 99f5e59286 | * Have tokenizer emit tokens for whitespace other than single spaces | 2014-10-14 20:25:57 +11:00 |
| Matthew Honnibal | 6fb42c4919 | * Add offsets to Tokens class. Some changes to interfaces, and reorganization of spacy.Lang | 2014-10-14 16:17:45 +11:00 |
| Matthew Honnibal | 59b41a9fd3 | * Switch to new data model, tests passing | 2014-10-10 08:11:31 +11:00 |
| Matthew Honnibal | bc460de171 | * Add extra tests | 2014-09-25 18:29:42 +02:00 |
| Matthew Honnibal | 2d4e5ceafd | * Remove old docs stuff | 2014-09-25 18:24:05 +02:00 |
| Matthew Honnibal | 0152831c89 | * Refactor tokenization, enable cache, and ensure we look up specials correctly even when there's confusing punctuation surrounding the token. | 2014-09-16 18:01:46 +02:00 |
| Matthew Honnibal | db191361ee | * Add new tests for fancier tokenization cases | 2014-09-15 06:31:58 +02:00 |
| Matthew Honnibal | 5dcc1a426a | * Update tokenization tests for new tokenizer rules | 2014-09-15 01:32:51 +02:00 |
| Matthew Honnibal | 0447279c57 | * PointerHash working, efficiency is good. 6-7 mins | 2014-09-13 16:43:59 +02:00 |
| Matthew Honnibal | 985bc68327 | * Fix bug with trailing punct on contractions. Reduced efficiency, and slightly hacky implementation. | 2014-09-12 18:26:26 +02:00 |
| Matthew Honnibal | b5b31c6b6e | * Avoid testing for object identity | 2014-09-10 20:58:30 +02:00 |
| Matthew Honnibal | 7c09c73a14 | * Refactor to use tokens class. | 2014-09-10 18:27:44 +02:00 |
| Matthew Honnibal | 5ee4d8c641 | * Work on tests for flag features | 2014-09-01 23:41:43 +02:00 |
| Matthew Honnibal | bf47429368 | * Add tests for non_sparse string transform | 2014-09-01 23:27:31 +02:00 |
| Matthew Honnibal | c50433163f | * Add tests for flag features | 2014-09-01 23:27:09 +02:00 |
| Matthew Honnibal | 786a4a86fe | * Add tests for canon_case | 2014-09-01 23:26:49 +02:00 |
| Matthew Honnibal | 4c7b997df7 | * Add tests for word shape features | 2014-09-01 23:26:17 +02:00 |
| Matthew Honnibal | c5abb81f4c | * Add incomplete tests of asciify function | 2014-09-01 23:25:51 +02:00 |
| Matthew Honnibal | 8bbfadfced | * Pass tests. Need to implement more feature functions. | 2014-08-30 20:36:06 +02:00 |
| Matthew Honnibal | dcab14ede2 | * Begin testing more functionality | 2014-08-30 19:01:15 +02:00 |
| Matthew Honnibal | 6209d94f83 | * Add tests for word shape | 2014-08-30 19:00:10 +02:00 |
| Matthew Honnibal | c282e6d5fb | * Redesign proceeding | 2014-08-28 19:45:09 +02:00 |
| Matthew Honnibal | fd4e61e58b | * Fixed contraction tests. Need to correct problem with the way case stats and tag stats are supposed to work. | 2014-08-27 20:22:33 +02:00 |
| Matthew Honnibal | fdaf24604a | * Basic punct tests updated and passing | 2014-08-27 19:38:57 +02:00 |
| Matthew Honnibal | 9815c7649e | * Refactor around Word objects, adapting tests. Tests passing, except for string views. | 2014-08-23 19:55:06 +02:00 |
| Matthew Honnibal | 6f83dca218 | * Fix import for ptb tokenization test | 2014-08-22 17:05:44 +02:00 |
| Matthew Honnibal | 4bcdd6d31c | * Further improvements to spacy docs, tweaks to code. | 2014-08-22 04:20:24 +02:00 |
| Matthew Honnibal | 01469b0888 | * Refactor spacy so that chunks return arrays of lexemes, so that there is properly one lexeme per word. | 2014-08-18 19:14:00 +02:00 |
| Matthew Honnibal | b555e2dc5d | * Add hash tests | 2014-08-02 21:58:31 +01:00 |
| Matthew Honnibal | 6319ff0f22 | * Add length property | 2014-08-02 21:26:44 +01:00 |
| Matthew Honnibal | e494494d80 | * Add tests for group_by | 2014-07-23 17:36:12 +01:00 |
| Matthew Honnibal | bc6c1f6156 | * Add test for open apostrophe bug | 2014-07-07 23:24:20 +02:00 |
| Matthew Honnibal | e60b958b7d | * Add test to check how well we match ptb tokenizer. Needs more text. | 2014-07-07 05:11:31 +02:00 |
| Matthew Honnibal | 2c431f9fdc | * Upd tokenization test | 2014-07-07 05:11:04 +02:00 |
| Matthew Honnibal | 25849fc926 | * Generalize tokenization rules to capitals | 2014-07-07 05:07:21 +02:00 |
| Matthew Honnibal | e4263a241a | * Tests passing for reorganized version | 2014-07-07 04:23:46 +02:00 |
| Matthew Honnibal | 12f8a0e3c2 | * Tests passing for reorganized version | 2014-07-07 04:23:20 +02:00 |
| Matthew Honnibal | a62c38e1ef | * Working tokenization. en doesn't match PTB perfectly. Need to reorganize before adding more schemes. | 2014-07-07 01:15:59 +02:00 |
| Matthew Honnibal | 4e79446dc2 | * Reading in tokenization rules correctly. Passing tests. | 2014-07-07 00:02:55 +02:00 |
| Matthew Honnibal | 9bef797afe | * Rejigged tests. Working possessives, but no other contractions | 2014-07-06 20:02:00 +02:00 |
| Matthew Honnibal | 556f6a18ca | * Initial commit. Tests passing for punctuation handling. Need contractions, file transport, tokenize function, etc. | 2014-07-05 20:51:42 +02:00 |