ines
|
6d2c85f428
|
Drop six and related hacks as a dependency
|
2018-03-28 10:45:25 +02:00 |
|
4altinok
|
471d3c9e23
|
added lex test for is_currency
|
2018-02-11 18:50:50 +01:00 |
|
Ines Montani
|
a3dd167d7f
|
Merge branch 'master' into da_ud_tokenization
|
2017-12-20 21:05:34 +00:00 |
|
Søren Lind Kristiansen
|
15d13efafd
|
Tune Danish tokenizer to more closely match tokenization in Universal Dependencies.
|
2017-12-20 17:36:52 +01:00 |
|
Canbey Bilgili
|
abe098b255
|
Adds Turkish Lemmatization
|
2017-12-01 17:04:32 +03:00 |
|
Matthew Honnibal
|
f9ed9ea529
|
Merge pull request #1624 from GreenRiverRUS/russian
Add support for Russian
|
2017-11-29 23:10:01 +01:00 |
|
Søren Lind Kristiansen
|
0ffd27b0f6
|
Add several Danish alternative spellings
|
2017-11-27 13:35:41 +01:00 |
|
Vadim Mazaev
|
cacd859dcd
|
Added tag map, fixed tests fails, added more exceptions
|
2017-11-26 20:54:48 +03:00 |
|
Søren Lind Kristiansen
|
6aa241bcec
|
Add day of month tokenizer exceptions for Danish.
|
2017-11-24 15:03:24 +01:00 |
|
Søren Lind Kristiansen
|
0c276ed020
|
Add weekday abbreviations and remove ambiguous month abbreviations for Danish.
|
2017-11-24 14:43:29 +01:00 |
|
Søren Lind Kristiansen
|
056547e989
|
Add multiple tokenizer exceptions for Danish.
|
2017-11-24 11:51:26 +01:00 |
|
Søren Lind Kristiansen
|
8dc265ac0c
|
Add test for tokenization of 'i.' for Danish.
|
2017-11-24 11:29:37 +01:00 |
|
Vadim Mazaev
|
81314f8659
|
Fixed tokenizer: added char classes; added first lemmatizer and
tokenizer tests
|
2017-11-21 22:23:59 +03:00 |
|
ines
|
17849dee4b
|
Fix French test (see #1617)
|
2017-11-20 13:59:59 +01:00 |
|
Matthew Honnibal
|
63c6ae4191
|
Fix lemmatizer test
|
2017-11-06 11:57:06 +01:00 |
|
Matthew Honnibal
|
144a93c2a5
|
Back-off to tensor for similarity if no vectors
|
2017-11-03 20:56:33 +01:00 |
|
Matthew Honnibal
|
d6e831bf89
|
Fix lemmatizer tests
|
2017-11-03 19:46:34 +01:00 |
|
Jim O'Regan
|
08b0bfd153
|
merge
|
2017-10-31 22:55:59 +00:00 |
|
Jim O'Regan
|
00ecfa5417
|
Ó, not O
|
2017-10-31 22:54:42 +00:00 |
|
Ines Montani
|
25b1d6cd91
|
Fix syntax error
|
2017-10-31 22:36:03 +01:00 |
|
Jim O'Regan
|
fe4b10346a
|
replace example sentence until I get around to adding a punctuation.py
|
2017-10-31 20:24:53 +00:00 |
|
Jim O'Regan
|
d4a8160c36
|
change quotes
|
2017-10-31 15:15:44 +00:00 |
|
Jim O'Regan
|
41dd29e48e
|
merge
|
2017-10-31 14:07:45 +00:00 |
|
Ines Montani
|
facf77e541
|
Merge branch 'develop' into support-danish
|
2017-10-24 11:53:19 +02:00 |
|
ines
|
cd6a29dce7
|
Port over changes from #1294
|
2017-10-14 13:28:46 +02:00 |
|
ines
|
38c756fd85
|
Port over changes from #1287
|
2017-10-14 13:16:21 +02:00 |
|
ines
|
612224c10d
|
Port over changes from #1157
|
2017-10-14 13:11:39 +02:00 |
|
Matthew Honnibal
|
cf6da9301a
|
Update lemmatizer test
|
2017-10-12 22:50:52 +02:00 |
|
ines
|
453c47ca24
|
Add German lemmatizer tests
|
2017-10-11 13:27:26 +02:00 |
|
Matthew Honnibal
|
c6cd81f192
|
Wrap try/except around model saving
|
2017-10-05 08:14:24 -05:00 |
|
Matthew Honnibal
|
fd4baff475
|
Update tests
|
2017-10-05 08:12:27 -05:00 |
|
Wannaphong Phatthiyaphaibun
|
5cba67146c
|
add thai in spacy2
|
2017-09-26 21:36:27 +07:00 |
|
ines
|
ece30c28a8
|
Don't split hyphenated words in German
This way, the tokenizer matches the tokenization in German treebanks
|
2017-09-16 20:40:15 +02:00 |
|
Jim O'Regan
|
187be6d372
|
copy/paste error
|
2017-09-11 09:33:17 +01:00 |
|
Jim O'Regan
|
c283e9edfe
|
first stab at test
|
2017-09-11 08:57:48 +01:00 |
|
Matthew Honnibal
|
d5fbf27335
|
Fix test
|
2017-09-04 16:45:11 +02:00 |
|
Matthew Honnibal
|
644d6c9e1a
|
Improve lemmatization tests, re #1296
|
2017-09-04 15:17:44 +02:00 |
|
Jim Geovedi
|
fbc62a09c7
|
added {pre,suf,in}fix tests
|
2017-08-20 13:43:00 +07:00 |
|
Jim Geovedi
|
cc4772cac2
|
reworks
|
2017-08-03 13:08:38 +07:00 |
|
Jim Geovedi
|
783f7d8b86
|
added test set for Indonesian language
|
2017-07-29 18:21:07 +07:00 |
|
mollerhoj
|
e840077601
|
Add some basic tests for Danish
|
2017-07-03 15:49:51 +02:00 |
|
ines
|
cc9c5dc7a3
|
Fix noun chunks test
|
2017-06-05 16:39:04 +02:00 |
|
ines
|
a0f4592f0a
|
Update tests
|
2017-06-05 02:26:13 +02:00 |
|
ines
|
3e105bcd36
|
Update tests
|
2017-06-05 02:09:27 +02:00 |
|
Matthew Honnibal
|
58be0e1f6f
|
Update tests
|
2017-06-04 16:35:06 -05:00 |
|
Ines Montani
|
112c5787eb
|
Merge pull request #1101 from oroszgy/hu_tokenizer_fix
More robust Hungarian tokenizer.
|
2017-06-04 22:37:51 +02:00 |
|
ines
|
e47eef5e03
|
Update German tokenizer exceptions and tests
|
2017-06-03 21:07:44 +02:00 |
|
ines
|
d77c2cc8bb
|
Add tests for English norm exceptions
|
2017-06-03 20:59:50 +02:00 |
|
Gyorgy Orosz
|
f0c3b09242
|
More robust Hungarian tokenizer.
|
2017-05-31 22:28:40 +02:00 |
|
ines
|
20a7003c0d
|
Update model fixtures and reorganise tests
|
2017-05-29 22:14:31 +02:00 |
|
ines
|
d0c6d4f76d
|
Fix formatting
|
2017-05-23 11:32:00 +02:00 |
|
ines
|
2c3bdd09b1
|
Add English test for like_num
|
2017-05-09 11:06:34 +02:00 |
|
ines
|
22375eafb0
|
Fix and merge attrs and lex_attrs tests
|
2017-05-09 11:06:25 +02:00 |
|
ines
|
c714841cc8
|
Move language-specific tests to tests/lang
|
2017-05-09 00:02:37 +02:00 |
|
ines
|
3c0f85de8e
|
Remove imports in /lang/__init__.py
|
2017-05-08 23:58:07 +02:00 |
|