Commit Graph

7 Commits

Author SHA1 Message Date
Ines Montani
e7ef51b382 Update tokenizer_exceptions.py 2017-06-02 19:00:01 +02:00
Francisco Aranda
70a2180199 fix(spanish sentence segmentation): remove tokenizer exceptions the break sentence segmentation. Aligned with training corpus 2017-06-02 08:19:57 +02:00
ines
66c1f194f9 Use consistent unicode declarations 2017-03-12 13:07:28 +01:00
Ines Montani
0dec90e9f7 Use global abbreviation data languages and remove duplicates 2017-01-08 20:36:00 +01:00
Ines Montani
1d64527727 Update Spanish tokenizer
Remove reflexive pronouns as they're part of an open class, fix
mistakes and add exceptions
2016-12-23 21:36:01 +01:00
Ines Montani
d60380418e Update tokenizer exceptions for Spanish 2016-12-21 18:06:17 +01:00
Ines Montani
2b2ea8ca11 Reorganise language data 2016-12-18 16:54:19 +01:00