Eric Zhao
aafdf6ffb8
Add option to use label karg to determine ent_type in doc.merge
2017-03-28 23:35:03 -07:00
Em
9c809efc25
Removed mapStr
2017-03-11 16:23:26 -08:00
Em
426d17167f
Added string manipulation for spans
2017-03-10 16:50:02 -08:00
Ines Montani
a16aff17aa
Merge pull request #876 from PySUST/master
...
[Bangla] Update "tokenizer_exceptions.py"
2017-03-10 14:46:00 +01:00
ines
10e29189ac
Adjust URL testcases and xfail problems (instead of comment)
2017-03-10 14:22:50 +01:00
ines
b04893a059
Make regex locale-independent for Python 2
2017-03-10 14:21:57 +01:00
Ines Montani
1c40890321
Add missing comma
...
Should fix Travis build error
2017-03-10 09:34:54 +01:00
Shuvanon Razik
c251703428
Update abbreviations
2017-03-10 10:45:01 +06:00
Dan Rapp
123d3f2d38
Fix error in test case parameterization
2017-03-09 12:18:21 -07:00
Dan Rapp
b9307dfcd7
Merge branch 'master' into rappdw/tokenizer_exceptions_url_fix
2017-03-09 11:42:14 -07:00
Dan Rapp
3b1df3808d
Issue #840 - URL pattenr too broad
2017-03-09 11:39:39 -07:00
shuvanon
85438aee1b
update tokenizertokenizer
2017-03-08 17:29:39 +06:00
shuvanon
45bc78461c
update tokenizertokenizer
2017-03-08 17:27:12 +06:00
Aniruddha Adhikary
696215a3fb
add tests for Bengali
2017-03-05 11:25:12 +06:00
Aniruddha Adhikary
8f3bfe9bfc
[Bengali] basic tag map, morph, lemma rules and exceptions
2017-03-04 12:36:59 +06:00
ines
8dff040032
Revert "Add regression test for #859 "
...
This reverts commit c4f16c66d1
.
2017-03-01 21:56:20 +01:00
ines
c4f16c66d1
Add regression test for #859
2017-03-01 16:07:27 +01:00
Aniruddha Adhikary
d91be7aed4
add punctuations for Bengali
2017-02-28 21:07:14 +06:00
Aniruddha Adhikary
5a4fc09576
add basic Bengali support
2017-02-28 07:48:37 +06:00
Matthew Honnibal
cc9b2b74e3
Merge branch 'french-tokenizer-exceptions'
2017-02-27 11:44:39 +01:00
Matthew Honnibal
bd4375a2e6
Remove comment
2017-02-27 11:44:26 +01:00
Matthew Honnibal
e7e22d8be6
Move import within get_exceptions() function, to speed import
2017-02-27 11:34:48 +01:00
Matthew Honnibal
34bcc8706d
Merge branch 'french-tokenizer-exceptions'
2017-02-27 11:21:21 +01:00
Matthew Honnibal
0aaa546435
Fix test after updating the French tokenizer stuff
2017-02-27 11:20:47 +01:00
Matthew Honnibal
26446aa728
Avoid loading all French exceptions on import
...
Move exceptions loading behind a get_tokenizer_exceptions() function
for French, instead of loading into the top-level namespace. This
cuts import times from 0.6s to 0.2s, at the expense of making the
French data a little different from the others (there's no top-level
TOKENIZER_EXCEPTIONS variable.) The current solution feels somewhat
unsatisfying.
2017-02-25 11:55:00 +01:00
ines
376c5813a7
Remove print statements from test
2017-02-24 18:26:32 +01:00
ines
7c1260e98c
Add regression test
2017-02-24 18:22:49 +01:00
ines
0e2e331b58
Convert exceptions to Python list
2017-02-24 18:22:40 +01:00
ines
51eb190ef4
Remove print statements from test
2017-02-24 17:41:12 +01:00
Matthew Honnibal
db5ada3995
Merge branch 'master' of https://github.com/explosion/spaCy
2017-02-24 14:28:12 +01:00
Matthew Honnibal
8f94897d07
Add 1 operator to matcher, and make sure open patterns are closed at end of document. Closes Issue #766
2017-02-24 14:27:02 +01:00
ines
67991b6e5f
Add more test cases to #775 regression test to cover #847
2017-02-18 14:10:44 +01:00
ines
30ce2a6793
Exclude "shed" and "Shed" from tokenizer exceptions (see #847 )
2017-02-18 14:10:44 +01:00
Ines Montani
de997c1a33
Merge pull request #842 from magnusburton/master
...
Added regular verb rules for Swedish
2017-02-17 11:18:20 +01:00
Magnus Burton
41fcfd06b8
Added regular verb rules for Swedish
2017-02-17 10:04:04 +01:00
ines
aa92d4e9b5
Fix unicode regex for Python 2 (see #834 )
2017-02-16 23:49:54 +01:00
ines
44de3c7642
Reformat test and use text_file fixture
2017-02-16 23:49:19 +01:00
ines
3dd22e9c88
Mark vectors test as xfail (temporary)
2017-02-16 23:28:51 +01:00
ines
85d249d451
Revert "Revert "Merge pull request #836 from raphael0202/load_vectors ( closes #834 )""
...
This reverts commit ea05f78660
.
2017-02-16 23:26:25 +01:00
ines
ea05f78660
Revert "Merge pull request #836 from raphael0202/load_vectors ( closes #834 )"
...
This reverts commit 7d8c9eee7f
, reversing
changes made to f6b69babcc
.
2017-02-16 15:27:12 +01:00
Raphaël Bournhonesque
06a71d22df
Fix test failure by using unicode literals
2017-02-16 14:48:00 +01:00
Raphaël Bournhonesque
3ba109622c
Add regression test with non ' ' space character as token
2017-02-16 12:23:27 +01:00
Raphaël Bournhonesque
e17dc2db75
Remove useless import
2017-02-16 12:10:24 +01:00
Raphaël Bournhonesque
3fd2742649
load_vectors should accept arbitrary space characters as word tokens
...
Fix bug #834
2017-02-16 12:08:30 +01:00
ines
f08e180a47
Make groups non-capturing
...
Prevents hitting the 100 named groups limit in Python
2017-02-10 13:35:02 +01:00
ines
fa3b8512da
Use consistent imports and exports
...
Bundle everything in language_data to keep it consistent with other
languages and make TOKENIZER_EXCEPTIONS importable from there.
2017-02-10 13:34:09 +01:00
ines
21f09d10d7
Revert "Revert "Merge pull request #818 from raphael0202/tokenizer_exceptions""
...
This reverts commit f02a2f9322
.
2017-02-10 13:17:05 +01:00
ines
f02a2f9322
Revert "Merge pull request #818 from raphael0202/tokenizer_exceptions"
...
This reverts commit b95afdf39c
, reversing
changes made to b0ccf32378
.
2017-02-09 17:07:21 +01:00
Raphaël Bournhonesque
309da78bf0
Merge branch 'master' into tokenizer_exceptions
2017-02-09 16:32:12 +01:00
Raphaël Bournhonesque
4ce0bbc6b6
Update unit tests
2017-02-09 16:30:43 +01:00