Ines Montani
|
e7ef51b382
|
Update tokenizer_exceptions.py
|
2017-06-02 19:00:01 +02:00 |
|
Francisco Aranda
|
70a2180199
|
fix(spanish sentence segmentation): remove tokenizer exceptions that break sentence segmentation. Aligned with training corpus
|
2017-06-02 08:19:57 +02:00 |
|
ines
|
66c1f194f9
|
Use consistent unicode declarations
|
2017-03-12 13:07:28 +01:00 |
|
Ines Montani
|
0dec90e9f7
|
Use global abbreviation data languages and remove duplicates
|
2017-01-08 20:36:00 +01:00 |
|
Ines Montani
|
1d64527727
|
Update Spanish tokenizer
Remove reflexive pronouns as they're part of an open class, fix
mistakes and add exceptions
|
2016-12-23 21:36:01 +01:00 |
|
Ines Montani
|
d60380418e
|
Update tokenizer exceptions for Spanish
|
2016-12-21 18:06:17 +01:00 |
|
Ines Montani
|
2b2ea8ca11
|
Reorganise language data
|
2016-12-18 16:54:19 +01:00 |
|