Commit Graph

8 Commits

Author | SHA1 | Message | Date
Gyorgy Orosz | 3a9be4d485 | Updated token exception handling mechanism to allow the usage of arbitrary functions as token exception matchers. | 2016-12-23 23:49:34 +01:00
Gyorgy Orosz | 1748549aeb | Added exception pattern mechanism to the tokenizer. | 2016-12-21 23:16:19 +01:00
Ines Montani | 2b2ea8ca11 | Reorganise language data | 2016-12-18 16:54:19 +01:00
Ines Montani | bc40dad7d9 | Add entity rules | 2016-12-18 15:36:53 +01:00
Ines Montani | f324311249 | Add global language data utils | 2016-12-17 12:27:41 +01:00
Ines Montani | e47ee94761 | Split punctuation into its own file | 2016-12-08 19:46:43 +01:00
Ines Montani | 0d07d7fc80 | Apply emoticon exceptions to tokenizer | 2016-12-07 21:11:59 +01:00
Ines Montani | 79dce0aabe | Add emoticons | 2016-12-07 20:33:28 +01:00