14 Commits

Author SHA1 Message Date
Matthew Honnibal
85485f5c2b Fix inconsistencies in generate_specials.py
Re Issue #321, fix inconsistencies in the script that generates specials.json. The result still isn't so satisfying --- we need to revise this as we move to parse more morphologically rich languages.
2016-04-07 11:21:52 +10:00
Matthew Honnibal
4b4eec8b47 * Fix Issue #201: Tokenization of there'll 2015-12-29 18:09:09 +01:00
Matthew Honnibal
e8bd92f1e7 * Fix lemma of let's, re Issue #177 2015-11-13 06:42:23 +11:00
Matthew Honnibal
726bb648da * Fix non-breaking space in specials.json 2015-10-19 12:46:11 +11:00
Matthew Honnibal
e39095da82 * Fix designation of non-breaking space in specials.json. 2015-10-19 12:39:03 +11:00
Matthew Honnibal
bdcb8d695c * Add non-breaking space to specials.json 2015-10-10 15:54:06 +11:00
Matthew Honnibal
7e7f28e1fd * Add smart-quote possessive marker in generate_specials 2015-10-10 14:06:09 +11:00
Matthew Honnibal
0e92e8574a * Fix pos tag in em-dash in specials 2015-10-09 11:06:37 +02:00
Matthew Honnibal
393a13d1af * Add unicode em dash to specials.json, so that we can control what POS tag it gets. This way we can prevent sentence boundary detection errors, to address Issue #130. 2015-10-09 19:24:33 +11:00
Matthew Honnibal
1490feda29 * Make generate_specials pretty-print the specials.json file 2015-10-09 19:23:47 +11:00
Henning Peters
911de2ae49 add overseen (?) char 2015-09-22 12:29:47 +02:00
jxs8172
85f01c5e16 Add contributor agreement. Add exception to 'it' so that 'its' and 'Its' isn't generated (its =/= it's) 2015-08-24 18:20:06 -04:00
jxs8172
5876248109 Add missing we've and hardcoded 's and 'S 2015-08-21 22:57:47 -04:00
jxs8172
a5e0a0073b Add a script to generate the specials.json file, to take care of handling uppercase and missing apostrophe contractions 2015-08-21 22:39:33 -04:00