Matthew Honnibal
|
fe442cac53
|
Fix #717: Set correct lemma for contracted verbs
|
2017-03-18 16:16:10 +01:00 |
|
Matthew Honnibal
|
8dbff4f5f4
|
Wire up English lemma and morph rules.
|
2017-03-15 09:23:22 -05:00 |
|
ines
|
ce9568af84
|
Move English time exceptions ("1a.m." etc.) and refactor
|
2017-03-12 13:58:22 +01:00 |
|
ines
|
6b30541774
|
Fix formatting
|
2017-03-12 13:58:22 +01:00 |
|
ines
|
66c1f194f9
|
Use consistent unicode declarations
|
2017-03-12 13:07:28 +01:00 |
|
ines
|
30ce2a6793
|
Exclude "shed" and "Shed" from tokenizer exceptions (see #847)
|
2017-02-18 14:10:44 +01:00 |
|
Ines Montani
|
209c37bbcf
|
Exclude "shell" and "Shell" from English tokenizer exceptions (resolves #775)
|
2017-01-25 13:15:02 +01:00 |
|
Ines Montani
|
50878ef598
|
Exclude "were" and "Were" from tokenizer exceptions and add regression test (resolves #744)
|
2017-01-16 13:10:38 +01:00 |
|
Matthew Honnibal
|
fba67fa342
|
Fix Issue #736: Times were being tokenized with incorrect string values.
|
2017-01-12 11:21:01 +01:00 |
|
Ines Montani
|
0dec90e9f7
|
Use global abbreviation data languages and remove duplicates
|
2017-01-08 20:36:00 +01:00 |
|
Ines Montani
|
cab39c59c5
|
Add missing contractions to English tokenizer exceptions
Inspired by
https://github.com/kootenpv/contractions/blob/master/contractions/__init__.py
|
2017-01-05 19:59:06 +01:00 |
|
Ines Montani
|
a23504fe07
|
Move abbreviations below other exceptions
|
2017-01-05 19:58:07 +01:00 |
|
Ines Montani
|
7d2cf934b9
|
Generate he/she/it correctly with 's instead of 've
|
2017-01-05 19:57:00 +01:00 |
|
Ines Montani
|
bc911322b3
|
Move ") to emoticons (see Tweebo challenge test)
|
2017-01-05 18:05:38 +01:00 |
|
Ines Montani
|
1d237664af
|
Add lowercase lemma to tokenizer exceptions
|
2017-01-03 23:02:21 +01:00 |
|
Ines Montani
|
84a87951eb
|
Fix typos
|
2017-01-03 18:27:43 +01:00 |
|
Ines Montani
|
35b39f53c3
|
Reorganise English tokenizer exceptions (as discussed in #718)
Add logic to generate exceptions that follow a consistent pattern (like
verbs and pronouns) and allow certain tokens to be excluded explicitly.
|
2017-01-03 18:26:09 +01:00 |
|
Ines Montani
|
461cbb99d8
|
Revert "Reorganise English tokenizer exceptions (as discussed in #718)"
This reverts commit b19cfcc144.
|
2017-01-03 18:21:29 +01:00 |
|
Ines Montani
|
b19cfcc144
|
Reorganise English tokenizer exceptions (as discussed in #718)
Add logic to generate exceptions that follow a consistent pattern (like
verbs and pronouns) and allow certain tokens to be excluded explicitly.
|
2017-01-03 18:17:57 +01:00 |
|
Ines Montani
|
78e63dc7d0
|
Update tokenizer exceptions for English
|
2016-12-21 18:06:34 +01:00 |
|
Ines Montani
|
704c7442e0
|
Break language data components into their own files
|
2016-12-18 15:36:53 +01:00 |
|