| Matthew Honnibal | e8bd92f1e7 | * Fix lemma of let's, re Issue #177 | 2015-11-13 06:42:23 +11:00 |
| Matthew Honnibal | 726bb648da | * Fix non-breaking space in specials.json | 2015-10-19 12:46:11 +11:00 |
| Matthew Honnibal | e39095da82 | * Fix designation of non-breaking space in specials.json. | 2015-10-19 12:39:03 +11:00 |
| Matthew Honnibal | 454c1996d0 | * Add tokenizer rule to fix numeric range tokenization | 2015-10-17 15:49:51 +11:00 |
| Matthew Honnibal | 7488821677 | * Map NIL to empty string in tag map | 2015-10-10 22:09:50 +11:00 |
| Matthew Honnibal | 4bbd1388bd | * Whitespace | 2015-10-10 16:03:48 +11:00 |
| Matthew Honnibal | bdcb8d695c | * Add non-breaking space to specials.json | 2015-10-10 15:54:06 +11:00 |
| Matthew Honnibal | c12d36d5f4 | * Fix quote marks in lemma_rules | 2015-10-10 15:03:36 +11:00 |
| Matthew Honnibal | 57b3cd4661 | * Add smart-quotes to lemma rules | 2015-10-10 14:06:46 +11:00 |
| Matthew Honnibal | 7e7f28e1fd | * Add smart-quote possessive marker in generate_specials | 2015-10-10 14:06:09 +11:00 |
| Matthew Honnibal | a510858f5a | * Pretty-print specials.json, and add the em dash | 2015-10-09 11:07:45 +02:00 |
| Matthew Honnibal | 49600a44a8 | * Fix trailing comma in lemma_rules.json | 2015-10-09 11:06:57 +02:00 |
| Matthew Honnibal | 0e92e8574a | * Fix pos tag in em-dash in specials | 2015-10-09 11:06:37 +02:00 |
| Matthew Honnibal | d341443282 | * Remove em-dash from lemma rules. Handle instead in specials. | 2015-10-09 10:27:13 +02:00 |
| Matthew Honnibal | b6047afe4c | * Fix punctuation lemma rules, to resolve Issue #130 | 2015-10-09 10:25:37 +02:00 |
| Matthew Honnibal | 393a13d1af | * Add unicode em dash to specials.json, so that we can control what POS tag it gets. This way we can prevent sentence boundary detection errors, to address Issue #130. | 2015-10-09 19:24:33 +11:00 |
| Matthew Honnibal | 1490feda29 | * Make generate_specials pretty-print the specials.json file | 2015-10-09 19:23:47 +11:00 |
| Matthew Honnibal | 1842a53e73 | * Lemmatize smart quotes as plain quotes | 2015-10-09 19:09:36 +11:00 |
| Matthew Honnibal | 5332c0b697 | * Add support for punctuation lemmatization, to handle unicode characters. This should help in addressing Issue #130 | 2015-10-09 18:54:40 +11:00 |
| Matthew Honnibal | 095831e5bf | * Start adding auxiliaries to morphs.json | 2015-09-27 16:56:34 +10:00 |
| Matthew Honnibal | c579b6b96c | * Update English morphs.json | 2015-09-24 22:38:41 +10:00 |
| Matthew Honnibal | 3b3547251c | * Fix Issue #102: DT tag was mapped to DET. | 2015-09-24 18:38:47 +10:00 |
| Matthew Honnibal | be4848fbcb | * Update morphs.json with universal dependencies/interset morphological features | 2015-09-24 00:59:42 +10:00 |
| Henning Peters | 911de2ae49 | add overseen (?) char | 2015-09-22 12:29:47 +02:00 |
| Matthew Honnibal | b9e31dc245 | * Bug fix to gazetteer.json | 2015-09-10 14:50:44 +02:00 |
| Matthew Honnibal | 623329b19a | Merge branch 'master' of ssh://github.com/honnibal/spaCy into develop | 2015-09-08 14:27:01 +02:00 |
| Matthew Honnibal | 86c888667f | * Merge in changes from de branch | 2015-09-06 19:49:28 +02:00 |
| Matthew Honnibal | b3703836f9 | * Add en lemma rules | 2015-09-06 17:56:11 +02:00 |
| Matthew Honnibal | c9f2082e3c | * Fix compilation error in en/tag_map.json | 2015-09-06 17:54:51 +02:00 |
| Matthew Honnibal | 0af139e183 | * Tagger training now working. Still need to test load/save of model. Morphology still broken. | 2015-08-27 09:16:11 +02:00 |
| Matthew Honnibal | 56c4e07a59 | Update gazetteer.json | 2015-08-27 08:53:48 +10:00 |
| Matthew Honnibal | 494da25872 | * Refactor for more universal spacy | 2015-08-26 19:13:50 +02:00 |
| jxs8172 | 85f01c5e16 | Add contributor agreement. Add exception to 'it' so that 'its' and 'Its' isn't generated (its =/= it's) | 2015-08-24 18:20:06 -04:00 |
| jxs8172 | 5876248109 | Add missing we've and hardcoded 's and 'S | 2015-08-21 22:57:47 -04:00 |
| jxs8172 | a5e0a0073b | Add a script to generate the specials.json file, to take care of handling uppercase and missing apostrophe contractions | 2015-08-21 22:39:33 -04:00 |
| Matthew Honnibal | b27bd18d6e | * Add spaCy to gazetteer | 2015-08-08 23:30:49 +02:00 |
| Matthew Honnibal | 855af087fc | * Fix gazetteer.json | 2015-08-06 17:27:51 +02:00 |
| Matthew Honnibal | 0e098815cc | * Expand gazetteer with some of the errors from the reddit parse | 2015-08-06 17:13:27 +02:00 |
| Matthew Honnibal | 6fcc3df989 | * Expand gazetteer with some of the errors from the reddit parse | 2015-08-06 17:11:00 +02:00 |
| Matthew Honnibal | 832896ea6c | * Add html to gazetteer | 2015-08-06 16:36:54 +02:00 |
| Matthew Honnibal | 5c3c962038 | * Add html to gazetteer | 2015-08-06 16:34:51 +02:00 |
| Matthew Honnibal | 91a94e152b | * Make initial gazetteer | 2015-08-06 16:10:04 +02:00 |
| Matthew Honnibal | af84669306 | * Add smart-quote possessive marker to tokenizer | 2015-07-30 05:12:48 +02:00 |
| Matthew Honnibal | 3cfe3d8c1c | * Revert bad infix change | 2015-07-26 17:32:37 +02:00 |
| Matthew Honnibal | bd608559bc | * Fix infix-period tokenization | 2015-07-26 17:14:52 +02:00 |
| Matthew Honnibal | 94f314c271 | * Fix tokenization of email addresses. | 2015-07-26 16:38:08 +02:00 |
| Matthew Honnibal | 6c01e01f12 | * Fix some casing problems in specials.json | 2015-07-26 01:38:29 +02:00 |
| Matthew Honnibal | 9cae1b4cad | * Restore accidentally clobbered updates to specials.json | 2015-07-20 12:19:46 +02:00 |
| Matthew Honnibal | 14e9e6ec6c | * Fix ... tokenization, and correct orth inconsistencies in specials.json | 2015-07-20 12:10:56 +02:00 |
| Matthew Honnibal | d8bc279e0c | * Fix 'you' contraction capitals in specials.json | 2015-07-16 01:28:32 +02:00 |