Tahar Zanouda
00417794d3
Add Arabic language ( #2314 )
...
* added support for Arabic lang
* added Arabic language support
* updated conftest
2018-05-15 00:27:19 +02:00
Ali Zarezade
42349471bc
add ٪ as punctuation
2018-01-23 18:11:33 +03:30
Ali Zarezade
2bda582135
Add Persian character and symbols
...
Add Persian characters and the following:
- ٪ used instead of %
- ؟ used instead of ?
- ﷼ used instead of $
- ، used instead of ,
- ؛ used instead of ;
2018-01-23 13:20:36 +03:30
Vadim Mazaev
81314f8659
Fixed tokenizer: added char classes; added first lemmatizer and
...
tokenizer tests
2017-11-21 22:23:59 +03:00
ines
e85e1d571b
Update base punctuation
2017-10-14 14:59:23 +02:00
ines
09aed58140
Port over changes from #1333 and add comments
2017-10-14 12:52:59 +02:00
ines
5ee10379db
Port over changes from #1340
2017-09-26 16:38:08 +02:00
ines
10d291f129
Port over change from #1351
2017-09-26 16:11:41 +02:00
Matthew Honnibal
cfc055734e
Split % in units, for compatibility with corpus
2017-08-25 20:03:37 -05:00
ines
a8e58e04ef
Add symbols class to punctuation rules to handle emoji (see #1088 )
...
Currently doesn't work for Hungarian, because of conflicts with the
custom punctuation rules. Also doesn't take multi-character emoji like
👩🏽💻 into account.
2017-05-27 17:57:10 +02:00
ines
604f299cf6
Add char classes to global language data
2017-05-08 23:59:33 +02:00