
15 Commits

Author  SHA1  Message  Date
Ines Montani  877f09218b  Add more custom rules for abbreviations  2016-12-08 12:47:01 +01:00
Ines Montani  ec44bee321  Fix capitalization on morphological features  2016-12-08 12:00:54 +01:00
Ines Montani  ce979553df  Resolve conflict  2016-12-07 21:16:52 +01:00
Ines Montani  0d07d7fc80  Apply emoticon exceptions to tokenizer  2016-12-07 21:11:59 +01:00
Ines Montani  71f0f34cb3  Fix formatting  2016-12-07 21:11:29 +01:00
Ines Montani  1285c4ba93  Update English language data  2016-12-07 20:33:28 +01:00
Ines Montani  a662a95294  Add line breaks  2016-12-07 20:33:28 +01:00
Ines Montani  e0712d1b32  Reformat language data  2016-12-07 20:33:28 +01:00
Ines Montani  4dcfafde02  Add line breaks  2016-11-24 14:57:37 +01:00
Ines Montani  de747e39e7  Reformat language data  2016-11-24 13:51:32 +01:00
Ines Montani  dad2c6cae9  Strip trailing whitespace  2016-11-20 16:45:51 +01:00
Matthew Honnibal  f0917b6808  Fix Issue #376: and/or was tagged as a noun.  2016-11-04 15:21:28 +01:00
Matthew Honnibal  737816e86e  Fix #368: Tokenizer handled pattern 'unicode close quote, period' incorrectly.  2016-11-04 15:16:20 +01:00
Matthew Honnibal  41a90a7fbb  Add tokenizer exception for 'Ph.D.', to fix 592.  2016-11-03 00:03:34 +01:00
Matthew Honnibal  d7e9acdcdf  Add English language data, so that the tokenizer doesn't require the data download  2016-09-25 14:49:00 +02:00