| Author | Commit | Message | Date |
| --- | --- | --- | --- |
| Matthew Honnibal | 06e7456c65 | * Upd tests | 2015-01-17 17:33:23 +11:00 |
| Matthew Honnibal | 802867e96a | * Revise interface to Token. Strings now have attribute names like norm1_ | 2015-01-15 03:51:47 +11:00 |
| Matthew Honnibal | 7d3c40de7d | * Tests passing after refactor. API has obvious warts, particularly in Token and Lexeme | 2015-01-15 00:33:16 +11:00 |
| Matthew Honnibal | dc681920bc | * Upd asciify test, fixing type error | 2015-01-06 01:09:44 +11:00 |
| Matthew Honnibal | 64f33a8705 | * Upd asciify test, fixing type error | 2015-01-06 01:03:29 +11:00 |
| Matthew Honnibal | 0aa9860c2d | * Fix string-typing in test_contractions. API is inconsistent, must fix... | 2015-01-05 20:10:03 +11:00 |
| Matthew Honnibal | ee3a71862e | * Fix unicode bugs in tests | 2015-01-05 17:54:54 +11:00 |
| Matthew Honnibal | 166c09832f | * Upd test for Python3 | 2015-01-05 13:15:46 +11:00 |
| Matthew Honnibal | 775a66e2b6 | * Fix encoding in lemmatizer tests | 2015-01-05 11:53:30 +11:00 |
| Matthew Honnibal | 33b7b3182a | * Relax lemma test for now | 2015-01-04 01:16:18 +11:00 |
| Matthew Honnibal | 5d7e6e37ea | * Refine lemma test to probe failure | 2015-01-03 23:41:16 +11:00 |
| Matthew Honnibal | 4a5cb20899 | * Upd test given new data file layout | 2015-01-03 01:59:56 +11:00 |
| Matthew Honnibal | 81d878beb2 | * Upd tests | 2014-12-30 21:34:09 +11:00 |
| Matthew Honnibal | 91a5064b7f | * Upd tests | 2014-12-26 14:26:27 +11:00 |
| Matthew Honnibal | b00bc01d8c | * All tests now passing for reorg | 2014-12-23 13:18:59 +11:00 |
| Matthew Honnibal | 73f200436f | * Tests passing except for morphology/lemmatization stuff | 2014-12-23 11:40:32 +11:00 |
| Matthew Honnibal | cf8d26c3d2 | * POS tagger training working after reorg | 2014-12-22 08:54:47 +11:00 |
| Matthew Honnibal | 4d4d2c0db4 | * Upd test | 2014-12-21 21:05:28 +11:00 |
| Matthew Honnibal | d047dc0d0f | Upd lemmatizer test | 2014-12-21 21:02:44 +11:00 |
| Matthew Honnibal | b864f0e539 | * Upd iteration test | 2014-12-21 21:01:46 +11:00 |
| Matthew Honnibal | c1ab134159 | * Upd lemmas test | 2014-12-21 20:58:21 +11:00 |
| Matthew Honnibal | 82bd57c76f | * Upd intern test | 2014-12-21 20:44:21 +11:00 |
| Matthew Honnibal | 734d1da55c | * Upd emoticons test | 2014-12-21 20:43:27 +11:00 |
| Matthew Honnibal | 199025609f | * Upd contractions test | 2014-12-21 20:41:13 +11:00 |
| Matthew Honnibal | 0d9972f4b0 | * Upd tokenizer test | 2014-12-21 20:38:27 +11:00 |
| Matthew Honnibal | ed2fff6128 | * Add tests | 2014-12-20 03:51:25 +11:00 |
| Matthew Honnibal | 516f0f1e14 | * Remove test for loading ad hoc rules format | 2014-12-09 16:08:45 +11:00 |
| Matthew Honnibal | 6369835306 | * Add false positive test for emoticons | 2014-12-09 16:08:17 +11:00 |
| Matthew Honnibal | 2a6bd2818f | * Load the lexicon before we check flag values | 2014-12-09 15:18:43 +11:00 |
| Matthew Honnibal | 302e09018b | * Work on fixing special-cases, reading them in as JSON objects so that they can specify lemmas | 2014-12-09 14:48:01 +11:00 |
| Matthew Honnibal | cda9ea9a4a | * Add test to make sure iterating over the lexicon isnt broken | 2014-12-08 21:12:51 +11:00 |
| Matthew Honnibal | 7b68f911cf | * Add WordNet lemmatizer | 2014-12-08 01:39:13 +11:00 |
| Matthew Honnibal | 8f2f319c57 | * Add a couple more contractions tests | 2014-12-07 22:08:04 +11:00 |
| Matthew Honnibal | 14097311ae | * Make StringStore.__getitem__ accept unicode-typed keys. | 2014-12-03 01:33:20 +11:00 |
| Matthew Honnibal | 60c1e78596 | * Commit outstanding tests | 2014-11-12 23:24:32 +11:00 |
| Matthew Honnibal | b01604b303 | * Upd NER tests | 2014-11-11 21:10:04 +11:00 |
| Matthew Honnibal | 10e9e14c4f | * Add tests for NER oracle | 2014-11-10 22:13:46 +11:00 |
| Matthew Honnibal | d7b2843643 | * Add some tests for ner | 2014-11-10 16:29:19 +11:00 |
| Matthew Honnibal | a42321bd4e | * Upd shape test | 2014-11-07 04:42:54 +11:00 |
| Matthew Honnibal | 81da61f3cf | * Remove out-dated POS data test | 2014-11-05 02:04:12 +11:00 |
| Matthew Honnibal | 0de700b566 | * Comment out tests of hyphenation, while we decide what hyphenation policy should be. | 2014-11-05 02:03:22 +11:00 |
| Matthew Honnibal | 11915e5238 | * Update tests | 2014-11-03 00:23:04 +11:00 |
| Matthew Honnibal | 493d5ffb50 | * Add test for '' in punct | 2014-11-02 21:24:09 +11:00 |
| Matthew Honnibal | 99b5cefa88 | * Add tests for emoticon tokenization | 2014-11-02 13:22:14 +11:00 |
| Matthew Honnibal | 23131f21bb | * Add tests for like_url | 2014-11-02 13:21:57 +11:00 |
| Matthew Honnibal | dc6c3c0f56 | * Add tests for like_number | 2014-11-02 13:21:39 +11:00 |
| Matthew Honnibal | c414d0eebe | * Add tests for is_number | 2014-11-01 19:13:40 +11:00 |
| Matthew Honnibal | 63114820cf | * Upd tests for tighter interface | 2014-10-30 18:15:30 +11:00 |
| Matthew Honnibal | 13909a2e24 | * Rewriting Lexeme serialization. | 2014-10-29 23:19:38 +11:00 |
| Matthew Honnibal | 08ce602243 | * Large refactor, particularly to Python API | 2014-10-24 00:59:17 +11:00 |
| Matthew Honnibal | 168b2b8cb2 | * Add tests for string intern | 2014-10-23 20:47:06 +11:00 |
| Matthew Honnibal | 077885637d | * Add test for reading in POS tags | 2014-10-22 10:18:43 +11:00 |
| Matthew Honnibal | 12742f4f83 | * Add detokenize method and test | 2014-10-18 18:07:29 +11:00 |
| Matthew Honnibal | 31aad7c08a | * Test hyphenation etc | 2014-10-14 20:26:16 +11:00 |
| Matthew Honnibal | 99f5e59286 | * Have tokenizer emit tokens for whitespace other than single spaces | 2014-10-14 20:25:57 +11:00 |
| Matthew Honnibal | 6fb42c4919 | * Add offsets to Tokens class. Some changes to interfaces, and reorganization of spacy.Lang | 2014-10-14 16:17:45 +11:00 |
| Matthew Honnibal | 59b41a9fd3 | * Switch to new data model, tests passing | 2014-10-10 08:11:31 +11:00 |
| Matthew Honnibal | bc460de171 | * Add extra tests | 2014-09-25 18:29:42 +02:00 |
| Matthew Honnibal | 2d4e5ceafd | * Remove old docs stuff | 2014-09-25 18:24:05 +02:00 |
| Matthew Honnibal | 0152831c89 | * Refactor tokenization, enable cache, and ensure we look up specials correctly even when there's confusing punctuation surrounding the token. | 2014-09-16 18:01:46 +02:00 |
| Matthew Honnibal | db191361ee | * Add new tests for fancier tokenization cases | 2014-09-15 06:31:58 +02:00 |
| Matthew Honnibal | 5dcc1a426a | * Update tokenization tests for new tokenizer rules | 2014-09-15 01:32:51 +02:00 |
| Matthew Honnibal | 0447279c57 | * PointerHash working, efficiency is good. 6-7 mins | 2014-09-13 16:43:59 +02:00 |
| Matthew Honnibal | 985bc68327 | * Fix bug with trailing punct on contractions. Reduced efficiency, and slightly hacky implementation. | 2014-09-12 18:26:26 +02:00 |
| Matthew Honnibal | b5b31c6b6e | * Avoid testing for object identity | 2014-09-10 20:58:30 +02:00 |
| Matthew Honnibal | 7c09c73a14 | * Refactor to use tokens class. | 2014-09-10 18:27:44 +02:00 |
| Matthew Honnibal | 5ee4d8c641 | * Work on tests for flag features | 2014-09-01 23:41:43 +02:00 |
| Matthew Honnibal | bf47429368 | * Add tests for non_sparse string transform | 2014-09-01 23:27:31 +02:00 |
| Matthew Honnibal | c50433163f | * Add tests for flag features | 2014-09-01 23:27:09 +02:00 |
| Matthew Honnibal | 786a4a86fe | * Add tests for canon_case | 2014-09-01 23:26:49 +02:00 |
| Matthew Honnibal | 4c7b997df7 | * Add tests for word shape features | 2014-09-01 23:26:17 +02:00 |
| Matthew Honnibal | c5abb81f4c | * Add incomplete tests of asciify function | 2014-09-01 23:25:51 +02:00 |
| Matthew Honnibal | 8bbfadfced | * Pass tests. Need to implement more feature functions. | 2014-08-30 20:36:06 +02:00 |
| Matthew Honnibal | dcab14ede2 | * Begin testing more functionality | 2014-08-30 19:01:15 +02:00 |
| Matthew Honnibal | 6209d94f83 | * Add tests for word shape | 2014-08-30 19:00:10 +02:00 |
| Matthew Honnibal | c282e6d5fb | * Redesign proceeding | 2014-08-28 19:45:09 +02:00 |
| Matthew Honnibal | fd4e61e58b | * Fixed contraction tests. Need to correct problem with the way case stats and tag stats are supposed to work. | 2014-08-27 20:22:33 +02:00 |
| Matthew Honnibal | fdaf24604a | * Basic punct tests updated and passing | 2014-08-27 19:38:57 +02:00 |
| Matthew Honnibal | 9815c7649e | * Refactor around Word objects, adapting tests. Tests passing, except for string views. | 2014-08-23 19:55:06 +02:00 |
| Matthew Honnibal | 6f83dca218 | * Fix import for ptb tokenization test | 2014-08-22 17:05:44 +02:00 |
| Matthew Honnibal | 4bcdd6d31c | * Further improvements to spacy docs, tweaks to code. | 2014-08-22 04:20:24 +02:00 |
| Matthew Honnibal | 01469b0888 | * Refactor spacy so that chunks return arrays of lexemes, so that there is properly one lexeme per word. | 2014-08-18 19:14:00 +02:00 |
| Matthew Honnibal | b555e2dc5d | * Add hash tests | 2014-08-02 21:58:31 +01:00 |
| Matthew Honnibal | 6319ff0f22 | * Add length property | 2014-08-02 21:26:44 +01:00 |
| Matthew Honnibal | e494494d80 | * Add tests for group_by | 2014-07-23 17:36:12 +01:00 |
| Matthew Honnibal | bc6c1f6156 | * Add test for open apostrophe bug | 2014-07-07 23:24:20 +02:00 |
| Matthew Honnibal | e60b958b7d | * Add test to check how well we match ptb tokenizer. Needs more text. | 2014-07-07 05:11:31 +02:00 |
| Matthew Honnibal | 2c431f9fdc | * Upd tokenization test | 2014-07-07 05:11:04 +02:00 |
| Matthew Honnibal | 25849fc926 | * Generalize tokenization rules to capitals | 2014-07-07 05:07:21 +02:00 |
| Matthew Honnibal | e4263a241a | * Tests passing for reorganized version | 2014-07-07 04:23:46 +02:00 |
| Matthew Honnibal | 12f8a0e3c2 | * Tests passing for reorganized version | 2014-07-07 04:23:20 +02:00 |
| Matthew Honnibal | a62c38e1ef | * Working tokenization. en doesn't match PTB perfectly. Need to reorganize before adding more schemes. | 2014-07-07 01:15:59 +02:00 |
| Matthew Honnibal | 4e79446dc2 | * Reading in tokenization rules correctly. Passing tests. | 2014-07-07 00:02:55 +02:00 |
| Matthew Honnibal | 9bef797afe | * Rejigged tests. Working possessives, but no other contractions | 2014-07-06 20:02:00 +02:00 |
| Matthew Honnibal | 556f6a18ca | * Initial commit. Tests passing for punctuation handling. Need contractions, file transport, tokenize function, etc. | 2014-07-05 20:51:42 +02:00 |