26 Commits

Author  SHA1  Message  Date
Matthew Honnibal  494da25872  * Refactor for more universal spacy  2015-08-26 19:13:50 +02:00
Matthew Honnibal  b27bd18d6e  * Add spaCy to gazetteer  2015-08-08 23:30:49 +02:00
Matthew Honnibal  855af087fc  * Fix gazetteer.json  2015-08-06 17:27:51 +02:00
Matthew Honnibal  0e098815cc  * Expand gazetteer with some of the errors from the reddit parse  2015-08-06 17:13:27 +02:00
Matthew Honnibal  6fcc3df989  * Expand gazetteer with some of the errors from the reddit parse  2015-08-06 17:11:00 +02:00
Matthew Honnibal  832896ea6c  * Add html to gazetteer  2015-08-06 16:36:54 +02:00
Matthew Honnibal  5c3c962038  * Add html to gazetteer  2015-08-06 16:34:51 +02:00
Matthew Honnibal  91a94e152b  * Make initial gazetteer  2015-08-06 16:10:04 +02:00
Matthew Honnibal  af84669306  * Add smart-quote possessive marker to tokenizer  2015-07-30 05:12:48 +02:00
Matthew Honnibal  3cfe3d8c1c  * Revert bad infix change  2015-07-26 17:32:37 +02:00
Matthew Honnibal  bd608559bc  * Fix infix-period tokenization  2015-07-26 17:14:52 +02:00
Matthew Honnibal  94f314c271  * Fix tokenization of email addresses.  2015-07-26 16:38:08 +02:00
Matthew Honnibal  6c01e01f12  * Fix some casing problems in specials.json  2015-07-26 01:38:29 +02:00
Matthew Honnibal  9cae1b4cad  * Restore accidentally clobbered updates to specials.json  2015-07-20 12:19:46 +02:00
Matthew Honnibal  14e9e6ec6c  * Fix ... tokenization, and correct orth inconsistencies in specials.json  2015-07-20 12:10:56 +02:00
Matthew Honnibal  d8bc279e0c  * Fix 'you' contraction capitals in specials.json  2015-07-16 01:28:32 +02:00
Matthew Honnibal  3c1e3e9ee8  * Fix capitalization problems in specials.json  2015-07-14 23:46:31 +02:00
Matthew Honnibal  b5223c4824  * Add whitespace to specials.json  2015-07-09 13:31:12 +02:00
Matthew Honnibal  aba0257894  * Add lemma rule for better and best in morphs.json  2015-06-28 09:26:25 +02:00
Matthew Honnibal  b5b869366b  * Adjust hyphenation rule in tokenizer  2015-06-28 06:18:58 +02:00
Matthew Honnibal  45ec92243a  * Add hyphenation rule to infix.txt for tokenizer  2015-06-06 05:56:00 +02:00
Jordan Suchow  3a8d9b37a6  Remove trailing whitespace  2015-04-19 13:01:38 -07:00
Matthew Honnibal  0c25001325  * Fix specials.json  2015-04-12 04:45:41 +02:00
Matthew Honnibal  056c672caf  * Bug fixes to tokenization, and support for times  2015-03-26 16:44:48 +01:00
Matthew Honnibal  13520e6cf0  * Add i.e. to specials.json  2015-03-26 16:44:45 +01:00
Matthew Honnibal  5e27bd0c4c  * Add en language data, for tokenizer etc  2015-02-25 17:10:32 -05:00