ines
|
1da29a7146
|
Use new Lemmatizer data and remove file import
Since there's currently only an English lemmatizer, the global
Lemmatizer imports from spacy.en. This is not ideal and still needs to be
fixed.
|
2017-03-12 13:58:22 +01:00 |
|
ines
|
0957737ee8
|
Add Python-formatted lemmatizer data and rules
|
2017-03-12 13:58:22 +01:00 |
|
ines
|
c89e30d1a3
|
Add test for English time exceptions ("1a.m." etc.)
|
2017-03-12 13:58:22 +01:00 |
|
ines
|
ce9568af84
|
Move English time exceptions ("1a.m." etc.) and refactor
|
2017-03-12 13:58:22 +01:00 |
|
ines
|
6b30541774
|
Fix formatting
|
2017-03-12 13:58:22 +01:00 |
|
Ines Montani
|
e97a30b99a
|
Merge pull request #885 from PySUST/master
[Bengali] Spell-check and add new stop words
|
2017-03-12 13:20:59 +01:00 |
|
ines
|
66c1f194f9
|
Use consistent unicode declarations
|
2017-03-12 13:07:28 +01:00 |
|
shuvanon
|
91cb4cdb2b
|
Sort stop_words
|
2017-03-12 17:55:51 +06:00 |
|
shuvanon
|
784f6cfa49
|
Update stop_words
|
2017-03-12 17:41:01 +06:00 |
|
shuvanon
|
73cc17078e
|
Merge branch 'master' of https://github.com/PySUST/spaCy
|
2017-03-12 14:52:17 +06:00 |
|
shuvanon
|
35ec7135bb
|
Spell-check and add new stop words
|
2017-03-12 14:51:34 +06:00 |
|
Matthew Honnibal
|
fa23278ee3
|
Add classes for beam parser and beam NER
|
2017-03-11 12:45:37 -06:00 |
|
Matthew Honnibal
|
6c4108c073
|
Add header for beam parser
|
2017-03-11 12:45:12 -06:00 |
|
Matthew Honnibal
|
4382f175b3
|
Squelch compiler warnings
|
2017-03-11 12:44:43 -06:00 |
|
Matthew Honnibal
|
ea2592879f
|
Merge branch 'master' of https://github.com/explosion/spaCy
|
2017-03-11 11:13:37 -06:00 |
|
Matthew Honnibal
|
1224c4d3c6
|
Improve output on trainer
|
2017-03-11 11:12:48 -06:00 |
|
Matthew Honnibal
|
b438dfd3f3
|
Add itn argument to tagger.update
|
2017-03-11 11:12:21 -06:00 |
|
Matthew Honnibal
|
931feb3360
|
Allow beam parsing for NER
|
2017-03-11 11:12:01 -06:00 |
|
Matthew Honnibal
|
f77a5bb60a
|
Switch back to greedy parser
|
2017-03-11 11:11:30 -06:00 |
|
Matthew Honnibal
|
ca9c8c57c0
|
Add iteration argument to parser.update
|
2017-03-11 07:00:47 -06:00 |
|
Matthew Honnibal
|
dcce9ca3f3
|
Use beam parser
|
2017-03-11 07:00:20 -06:00 |
|
Matthew Honnibal
|
e30ffdd003
|
Use ftrl optimizer in tagger
|
2017-03-11 06:59:13 -06:00 |
|
Matthew Honnibal
|
d59c6926c1
|
I think this fixes the segfault
|
2017-03-11 06:58:34 -06:00 |
|
Matthew Honnibal
|
318b9e32ff
|
WIP on beam parser. Currently segfaults.
|
2017-03-11 06:19:52 -06:00 |
|
Matthew Honnibal
|
b0d80dc9ae
|
Update name of 'train' function in BeamParser
|
2017-03-10 14:35:43 -06:00 |
|
Matthew Honnibal
|
d11f1a4ddf
|
Record negative costs in non-monotonic arc eager oracle
|
2017-03-10 11:22:04 -06:00 |
|
Matthew Honnibal
|
ecf91a2dbb
|
Support beam parser
|
2017-03-10 11:21:21 -06:00 |
|
Ines Montani
|
a16aff17aa
|
Merge pull request #876 from PySUST/master
[Bangla] Update "tokenizer_exceptions.py"
|
2017-03-10 14:46:00 +01:00 |
|
ines
|
10e29189ac
|
Adjust URL testcases and xfail problems (instead of comment)
|
2017-03-10 14:22:50 +01:00 |
|
ines
|
b04893a059
|
Make regex locale-independent for Python 2
|
2017-03-10 14:21:57 +01:00 |
|
Matthew Honnibal
|
ea53647362
|
Merge branch 'develop'
|
2017-03-10 02:49:39 -06:00 |
|
Ines Montani
|
1c40890321
|
Add missing comma
Should fix Travis build error
|
2017-03-10 09:34:54 +01:00 |
|
Shuvanon Razik
|
c251703428
|
Update abbreviations
|
2017-03-10 10:45:01 +06:00 |
|
Matthew Honnibal
|
b5247c49eb
|
Merge branch 'develop' of https://github.com/explosion/spaCy into develop
|
2017-03-09 18:45:43 -06:00 |
|
Matthew Honnibal
|
798450136d
|
Set L1 penalty to 0 in tagger.
|
2017-03-09 18:43:47 -06:00 |
|
Matthew Honnibal
|
c62da02344
|
Use ftrl training, to learn compressed model.
|
2017-03-09 18:43:21 -06:00 |
|
Matthew Honnibal
|
f71eeef9bb
|
Pass path argument to end_training
|
2017-03-09 18:42:40 -06:00 |
|
Dan Rapp
|
123d3f2d38
|
Fix error in test case parameterization
|
2017-03-09 12:18:21 -07:00 |
|
Dan Rapp
|
b9307dfcd7
|
Merge branch 'master' into rappdw/tokenizer_exceptions_url_fix
|
2017-03-09 11:42:14 -07:00 |
|
Dan Rapp
|
3b1df3808d
|
Issue #840 - URL pattern too broad
|
2017-03-09 11:39:39 -07:00 |
|
Matthew Honnibal
|
5b0b968d13
|
Merge branch 'develop' of https://github.com/explosion/spaCy into develop
|
2017-03-08 15:03:10 +01:00 |
|
Matthew Honnibal
|
0ac3d27689
|
Fix handling of trailing whitespace
Fix off-by-one error that meant trailing spaces were being dropped.
Closes #792
|
2017-03-08 15:01:40 +01:00 |
|
ines
|
c2e3e651b8
|
Re-add regression test for #859
|
2017-03-08 14:36:09 +01:00 |
|
Matthew Honnibal
|
0a6d7ca200
|
Fix spacing after token_match
The boolean flag indicating a space after the token was
being set incorrectly after the token_match regex was applied.
Fixes #859.
|
2017-03-08 14:33:32 +01:00 |
|
shuvanon
|
85438aee1b
|
Update tokenizer
|
2017-03-08 17:29:39 +06:00 |
|
shuvanon
|
45bc78461c
|
Update tokenizer
|
2017-03-08 17:27:12 +06:00 |
|
Matthew Honnibal
|
cd33b39a04
|
Fix 2/3 problem for json save/load
|
2017-03-08 01:39:13 +01:00 |
|
Matthew Honnibal
|
40703988bc
|
Use FTRL training in parser
|
2017-03-08 01:38:51 +01:00 |
|
Matthew Honnibal
|
d108534dc2
|
Fix 2/3 problems for training
|
2017-03-08 01:37:52 +01:00 |
|
Matthew Honnibal
|
d03d6a13f1
|
Merge branch 'rominf-ud20' into develop
|
2017-03-07 21:48:56 +01:00 |
|