Commit Graph

57 Commits

Author SHA1 Message Date
Matthew Honnibal
a510858f5a * Pretty-print specials.json, and add the em dash 2015-10-09 11:07:45 +02:00
Matthew Honnibal
49600a44a8 * Fix trailing comma in lemma_rules.json 2015-10-09 11:06:57 +02:00
Matthew Honnibal
0e92e8574a * Fix pos tag in em-dash in specials 2015-10-09 11:06:37 +02:00
Matthew Honnibal
d341443282 * Remove em-dash from lemma rules. Handle instead in specials. 2015-10-09 10:27:13 +02:00
Matthew Honnibal
b6047afe4c * Fix punctuation lemma rules, to resolve Issue #130 2015-10-09 10:25:37 +02:00
Matthew Honnibal
393a13d1af * Add unicode em dash to specials.json, so that we can control what POS tag it gets. This way we can prevent sentence boundary detection errors, to address Issue #130. 2015-10-09 19:24:33 +11:00
Matthew Honnibal
1490feda29 * Make generate_specials pretty-print the specials.json file 2015-10-09 19:23:47 +11:00
Matthew Honnibal
1842a53e73 * Lemmatize smart quotes as plain quotes 2015-10-09 19:09:36 +11:00
Matthew Honnibal
5332c0b697 * Add support for punctuation lemmatization, to handle unicode characters. This should help in addressing Issue #130 2015-10-09 18:54:40 +11:00
Matthew Honnibal
e3e8994368 * Patch italian tag map 2015-10-08 14:00:13 +11:00
Matthew Honnibal
2d68f75b6a * Fix identity tag map 2015-10-08 13:59:56 +11:00
Matthew Honnibal
095831e5bf * Start adding auxiliaries to morphs.json 2015-09-27 16:56:34 +10:00
Matthew Honnibal
c579b6b96c * Update English morphs.json 2015-09-24 22:38:41 +10:00
Matthew Honnibal
3b3547251c * Fix Issue #102: DT tag was mapped to DET. 2015-09-24 18:38:47 +10:00
Matthew Honnibal
be4848fbcb * Update morphs.json with universal dependencies/interset morphological features 2015-09-24 00:59:42 +10:00
Henning Peters
911de2ae49 add overseen (?) char 2015-09-22 12:29:47 +02:00
Henning Peters
9ecb98f30e basic german rules 2015-09-22 11:56:29 +02:00
Matthew Honnibal
b9e31dc245 * Bug fix to gazetteer.json 2015-09-10 14:50:44 +02:00
Matthew Honnibal
623329b19a Merge branch 'master' of ssh://github.com/honnibal/spaCy into develop 2015-09-08 14:27:01 +02:00
Matthew Honnibal
86c888667f * Merge in changes from de branch 2015-09-06 19:49:28 +02:00
Matthew Honnibal
dbf8dce109 Merge branch 'gaz' of ssh://github.com/honnibal/spaCy into gaz 2015-09-06 18:44:14 +02:00
Matthew Honnibal
577418986a * Add draft Italian stuff 2015-09-06 18:44:10 +02:00
Matthew Honnibal
80a66c0159 * Add draft finnish stuff 2015-09-06 18:43:44 +02:00
Matthew Honnibal
b3703836f9 * Add en lemma rules 2015-09-06 17:56:11 +02:00
Matthew Honnibal
238b2f533b * Add lemma rules 2015-09-06 17:55:53 +02:00
Matthew Honnibal
c9f2082e3c * Fix compilation error in en/tag_map.json 2015-09-06 17:54:51 +02:00
Matthew Honnibal
0af139e183 * Tagger training now working. Still need to test load/save of model. Morphology still broken. 2015-08-27 09:16:11 +02:00
Matthew Honnibal
56c4e07a59 Update gazetteer.json 2015-08-27 08:53:48 +10:00
Matthew Honnibal
494da25872 * Refactor for more universal spacy 2015-08-26 19:13:50 +02:00
jxs8172
85f01c5e16 Add contributor agreement. Add exception to 'it' so that 'its' and 'Its' isn't generated (its =/= it's) 2015-08-24 18:20:06 -04:00
jxs8172
5876248109 Add missing we've and hardcoded 's and 'S 2015-08-21 22:57:47 -04:00
jxs8172
a5e0a0073b Add a script to generate the specials.json file, to take care of handling uppercase and missing apostrophe contractions 2015-08-21 22:39:33 -04:00
Matthew Honnibal
b27bd18d6e * Add spaCy to gazetteer 2015-08-08 23:30:49 +02:00
Matthew Honnibal
855af087fc * Fix gazetteer.json 2015-08-06 17:27:51 +02:00
Matthew Honnibal
0e098815cc * Expand gazetteer with some of the errors from the reddit parse 2015-08-06 17:13:27 +02:00
Matthew Honnibal
6fcc3df989 * Expand gazetteer with some of the errors from the reddit parse 2015-08-06 17:11:00 +02:00
Matthew Honnibal
832896ea6c * Add html to gazetteer 2015-08-06 16:36:54 +02:00
Matthew Honnibal
5c3c962038 * Add html to gazetteer 2015-08-06 16:34:51 +02:00
Matthew Honnibal
91a94e152b * Make initial gazetteer 2015-08-06 16:10:04 +02:00
Matthew Honnibal
af84669306 * Add smart-quote possessive marker to tokenizer 2015-07-30 05:12:48 +02:00
Matthew Honnibal
3cfe3d8c1c * Revert bad infix change 2015-07-26 17:32:37 +02:00
Matthew Honnibal
bd608559bc * Fix infix-period tokenization 2015-07-26 17:14:52 +02:00
Matthew Honnibal
94f314c271 * Fix tokenization of email addresses. 2015-07-26 16:38:08 +02:00
Matthew Honnibal
6c01e01f12 * Fix some casing problems in specials.json 2015-07-26 01:38:29 +02:00
Matthew Honnibal
9cae1b4cad * Restore accidentally clobbered updates to specials.json 2015-07-20 12:19:46 +02:00
Matthew Honnibal
14e9e6ec6c * Fix ... tokenization, and correct orth inconsistencies in specials.json 2015-07-20 12:10:56 +02:00
Matthew Honnibal
d8bc279e0c * Fix 'you' contraction capitals in specials.json 2015-07-16 01:28:32 +02:00
Matthew Honnibal
3c1e3e9ee8 * Fix capitalization problems in specials.json 2015-07-14 23:46:31 +02:00
Matthew Honnibal
b5223c4824 * Add whitespace to specials.json 2015-07-09 13:31:12 +02:00
Matthew Honnibal
aba0257894 * Add lemma rule for better and best in morphs.json 2015-06-28 09:26:25 +02:00
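
Several entries above (a5e0a0073b, 5876248109, 85f01c5e16, 1490feda29) describe a script that generates specials.json, covering uppercase and missing-apostrophe contractions, skipping "its"/"Its", and pretty-printing the output. The script itself is not shown in this log, so the following is only a minimal sketch of that idea. The names BASE, NO_APOSTROPHE_DROP, variants, and the "form" key are all hypothetical, and the sample entries are illustrative rather than taken from the repository.

```python
import json

# Hypothetical base contractions: surface form -> token pieces.
# Illustrative only; not copied from the repository's data files.
BASE = {
    "it's": ["it", "'s"],
    "we've": ["we", "'ve"],
    "don't": ["do", "n't"],
}

# Forms whose apostrophe-less spelling is a different word ("its" is not "it's"),
# so no missing-apostrophe variant is generated for them.
NO_APOSTROPHE_DROP = {"it's"}


def variants(form, pieces):
    """Yield (surface form, pieces) pairs for apostrophe and case variants."""
    spellings = [(form, pieces)]
    if form not in NO_APOSTROPHE_DROP:
        stripped = [p.replace("'", "") for p in pieces]
        spellings.append((form.replace("'", ""), stripped))
    for text, parts in spellings:
        yield text, parts
        # Capitalised variant: only the first piece changes.
        yield text.capitalize(), [parts[0].capitalize()] + parts[1:]


def generate_specials(base):
    """Build a mapping from each surface variant to its list of token entries."""
    specials = {}
    for form, pieces in base.items():
        for text, parts in variants(form, pieces):
            specials[text] = [{"form": part} for part in parts]
    return specials


if __name__ == "__main__":
    # indent and sort_keys give a stable, pretty-printed file that diffs cleanly.
    print(json.dumps(generate_specials(BASE), indent=2, sort_keys=True,
                     ensure_ascii=False))
```

Keeping the exceptions in a small set, rather than special-casing them inside the loop, makes it easy to add further forms whose apostrophe-less spelling collides with another word.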