Matthew Honnibal | 14097311ae | * Make StringStore.__getitem__ accept unicode-typed keys. | 2014-12-03 01:33:20 +11:00
Matthew Honnibal | 60c1e78596 | * Commit outstanding tests | 2014-11-12 23:24:32 +11:00
Matthew Honnibal | b01604b303 | * Upd NER tests | 2014-11-11 21:10:04 +11:00
Matthew Honnibal | 10e9e14c4f | * Add tests for NER oracle | 2014-11-10 22:13:46 +11:00
Matthew Honnibal | d7b2843643 | * Add some tests for ner | 2014-11-10 16:29:19 +11:00
Matthew Honnibal | a42321bd4e | * Upd shape test | 2014-11-07 04:42:54 +11:00
Matthew Honnibal | 81da61f3cf | * Remove out-dated POS data test | 2014-11-05 02:04:12 +11:00
Matthew Honnibal | 0de700b566 | * Comment out tests of hyphenation, while we decide what hyphenation policy should be. | 2014-11-05 02:03:22 +11:00
Matthew Honnibal | 11915e5238 | * Update tests | 2014-11-03 00:23:04 +11:00
Matthew Honnibal | 493d5ffb50 | * Add test for '' in punct | 2014-11-02 21:24:09 +11:00
Matthew Honnibal | 99b5cefa88 | * Add tests for emoticon tokenization | 2014-11-02 13:22:14 +11:00
Matthew Honnibal | 23131f21bb | * Add tests for like_url | 2014-11-02 13:21:57 +11:00
Matthew Honnibal | dc6c3c0f56 | * Add tests for like_number | 2014-11-02 13:21:39 +11:00
Matthew Honnibal | c414d0eebe | * Add tests for is_number | 2014-11-01 19:13:40 +11:00
Matthew Honnibal | 63114820cf | * Upd tests for tighter interface | 2014-10-30 18:15:30 +11:00
Matthew Honnibal | 13909a2e24 | * Rewriting Lexeme serialization. | 2014-10-29 23:19:38 +11:00
Matthew Honnibal | 08ce602243 | * Large refactor, particularly to Python API | 2014-10-24 00:59:17 +11:00
Matthew Honnibal | 168b2b8cb2 | * Add tests for string intern | 2014-10-23 20:47:06 +11:00
Matthew Honnibal | 077885637d | * Add test for reading in POS tags | 2014-10-22 10:18:43 +11:00
Matthew Honnibal | 12742f4f83 | * Add detokenize method and test | 2014-10-18 18:07:29 +11:00
Matthew Honnibal | 31aad7c08a | * Test hyphenation etc | 2014-10-14 20:26:16 +11:00
Matthew Honnibal | 99f5e59286 | * Have tokenizer emit tokens for whitespace other than single spaces | 2014-10-14 20:25:57 +11:00
Matthew Honnibal | 6fb42c4919 | * Add offsets to Tokens class. Some changes to interfaces, and reorganization of spacy.Lang | 2014-10-14 16:17:45 +11:00
Matthew Honnibal | 59b41a9fd3 | * Switch to new data model, tests passing | 2014-10-10 08:11:31 +11:00
Matthew Honnibal | bc460de171 | * Add extra tests | 2014-09-25 18:29:42 +02:00
Matthew Honnibal | 2d4e5ceafd | * Remove old docs stuff | 2014-09-25 18:24:05 +02:00
Matthew Honnibal | 0152831c89 | * Refactor tokenization, enable cache, and ensure we look up specials correctly even when there's confusing punctuation surrounding the token. | 2014-09-16 18:01:46 +02:00
Matthew Honnibal | db191361ee | * Add new tests for fancier tokenization cases | 2014-09-15 06:31:58 +02:00
Matthew Honnibal | 5dcc1a426a | * Update tokenization tests for new tokenizer rules | 2014-09-15 01:32:51 +02:00
Matthew Honnibal | 0447279c57 | * PointerHash working, efficiency is good. 6-7 mins | 2014-09-13 16:43:59 +02:00
Matthew Honnibal | 985bc68327 | * Fix bug with trailing punct on contractions. Reduced efficiency, and slightly hacky implementation. | 2014-09-12 18:26:26 +02:00
Matthew Honnibal | b5b31c6b6e | * Avoid testing for object identity | 2014-09-10 20:58:30 +02:00
Matthew Honnibal | 7c09c73a14 | * Refactor to use tokens class. | 2014-09-10 18:27:44 +02:00
Matthew Honnibal | 5ee4d8c641 | * Work on tests for flag features | 2014-09-01 23:41:43 +02:00
Matthew Honnibal | bf47429368 | * Add tests for non_sparse string transform | 2014-09-01 23:27:31 +02:00
Matthew Honnibal | c50433163f | * Add tests for flag features | 2014-09-01 23:27:09 +02:00
Matthew Honnibal | 786a4a86fe | * Add tests for canon_case | 2014-09-01 23:26:49 +02:00
Matthew Honnibal | 4c7b997df7 | * Add tests for word shape features | 2014-09-01 23:26:17 +02:00
Matthew Honnibal | c5abb81f4c | * Add incomplete tests of asciify function | 2014-09-01 23:25:51 +02:00
Matthew Honnibal | 8bbfadfced | * Pass tests. Need to implement more feature functions. | 2014-08-30 20:36:06 +02:00
Matthew Honnibal | dcab14ede2 | * Begin testing more functionality | 2014-08-30 19:01:15 +02:00
Matthew Honnibal | 6209d94f83 | * Add tests for word shape | 2014-08-30 19:00:10 +02:00
Matthew Honnibal | c282e6d5fb | * Redesign proceeding | 2014-08-28 19:45:09 +02:00
Matthew Honnibal | fd4e61e58b | * Fixed contraction tests. Need to correct problem with the way case stats and tag stats are supposed to work. | 2014-08-27 20:22:33 +02:00
Matthew Honnibal | fdaf24604a | * Basic punct tests updated and passing | 2014-08-27 19:38:57 +02:00
Matthew Honnibal | 9815c7649e | * Refactor around Word objects, adapting tests. Tests passing, except for string views. | 2014-08-23 19:55:06 +02:00
Matthew Honnibal | 6f83dca218 | * Fix import for ptb tokenization test | 2014-08-22 17:05:44 +02:00
Matthew Honnibal | 4bcdd6d31c | * Further improvements to spacy docs, tweaks to code. | 2014-08-22 04:20:24 +02:00
Matthew Honnibal | 01469b0888 | * Refactor spacy so that chunks return arrays of lexemes, so that there is properly one lexeme per word. | 2014-08-18 19:14:00 +02:00
Matthew Honnibal | b555e2dc5d | * Add hash tests | 2014-08-02 21:58:31 +01:00
Matthew Honnibal | 6319ff0f22 | * Add length property | 2014-08-02 21:26:44 +01:00
Matthew Honnibal | e494494d80 | * Add tests for group_by | 2014-07-23 17:36:12 +01:00
Matthew Honnibal | bc6c1f6156 | * Add test for open apostrophe bug | 2014-07-07 23:24:20 +02:00
Matthew Honnibal | e60b958b7d | * Add test to check how well we match ptb tokenizer. Needs more text. | 2014-07-07 05:11:31 +02:00
Matthew Honnibal | 2c431f9fdc | * Upd tokenization test | 2014-07-07 05:11:04 +02:00
Matthew Honnibal | 25849fc926 | * Generalize tokenization rules to capitals | 2014-07-07 05:07:21 +02:00
Matthew Honnibal | e4263a241a | * Tests passing for reorganized version | 2014-07-07 04:23:46 +02:00
Matthew Honnibal | 12f8a0e3c2 | * Tests passing for reorganized version | 2014-07-07 04:23:20 +02:00
Matthew Honnibal | a62c38e1ef | * Working tokenization. en doesn't match PTB perfectly. Need to reorganize before adding more schemes. | 2014-07-07 01:15:59 +02:00
Matthew Honnibal | 4e79446dc2 | * Reading in tokenization rules correctly. Passing tests. | 2014-07-07 00:02:55 +02:00
Matthew Honnibal | 9bef797afe | * Rejigged tests. Working possessives, but no other contractions | 2014-07-06 20:02:00 +02:00
Matthew Honnibal | 556f6a18ca | * Initial commit. Tests passing for punctuation handling. Need contractions, file transport, tokenize function, etc. | 2014-07-05 20:51:42 +02:00