Dan Rapp
b9307dfcd7
Merge branch 'master' into rappdw/tokenizer_exceptions_url_fix
2017-03-09 11:42:14 -07:00
Dan Rapp
3b1df3808d
Issue #840 - URL pattern too broad
2017-03-09 11:39:39 -07:00
Matthew Honnibal
5b0b968d13
Merge branch 'develop' of https://github.com/explosion/spaCy into develop
2017-03-08 15:03:10 +01:00
Matthew Honnibal
0ac3d27689
Fix handling of trailing whitespace
Fix off-by-one error that meant trailing spaces were being dropped.
Closes #792
2017-03-08 15:01:40 +01:00
ines
c2e3e651b8
Re-add regression test for #859
2017-03-08 14:36:09 +01:00
Matthew Honnibal
77f0594761
Merge branch 'develop' of https://github.com/explosion/spaCy into develop
2017-03-08 14:34:48 +01:00
Matthew Honnibal
0a6d7ca200
Fix spacing after token_match
The boolean flag indicating a space after the token was
being set incorrectly after the token_match regex was applied.
Fixes #859.
2017-03-08 14:33:32 +01:00
ines
ffe0f0c6c4
Add dill to requirements
2017-03-08 14:11:54 +01:00
shuvanon
85438aee1b
update tokenizertokenizer
2017-03-08 17:29:39 +06:00
shuvanon
45bc78461c
update tokenizertokenizer
2017-03-08 17:27:12 +06:00
ines
dc32e3ecb3
Fix link
2017-03-08 11:37:04 +01:00
ines
758335452d
Update installation instructions and fix formatting
2017-03-08 11:36:00 +01:00
Ines Montani
34801a0725
Update README.rst
2017-03-08 11:08:09 +01:00
Matthew Honnibal
cd33b39a04
Fix 2/3 problem for json save/load
2017-03-08 01:39:13 +01:00
Matthew Honnibal
40703988bc
Use FTRL training in parser
2017-03-08 01:38:51 +01:00
Matthew Honnibal
d108534dc2
Fix 2/3 problems for training
2017-03-08 01:37:52 +01:00
Matthew Honnibal
04a51dab62
Print active parser features during training
2017-03-08 01:37:19 +01:00
Matthew Honnibal
d03d6a13f1
Merge branch 'rominf-ud20' into develop
2017-03-07 21:48:56 +01:00
Matthew Honnibal
f7374d0b86
Merge branch 'ud20' of https://github.com/rominf/spaCy into rominf-ud20
2017-03-07 21:48:37 +01:00
Matthew Honnibal
16670d3251
Xfail the vocab pickling for now
2017-03-07 21:43:28 +01:00
Matthew Honnibal
a89c3500f6
Fixes to hacky vocab pickling
2017-03-07 20:58:55 +01:00
Matthew Honnibal
d814892805
Hackish pickle support for Vocab.
2017-03-07 20:25:12 +01:00
Matthew Honnibal
26614e028f
Add hacky support for StringCFile, to make pickling easier.
2017-03-07 20:24:37 +01:00
ines
004c4c9566
Update installation docs
Include conda and virtualenv info for pip, add instructions for
downloading models manually and add details and fab commands to
"Compile from source" section.
2017-03-07 18:52:22 +01:00
Ines Montani
57d70ea3e1
Update README.rst
2017-03-07 17:59:30 +01:00
Matthew Honnibal
3edb8ae207
Whitespace
2017-03-07 17:16:26 +01:00
Matthew Honnibal
5de7e712b7
Add support for pickling StringStore.
2017-03-07 17:15:18 +01:00
Matthew Honnibal
4e75e74247
Update regression test for variable-length pattern problem in the matcher.
2017-03-07 16:08:32 +01:00
Matthew Honnibal
6d67213b80
Add test for 850: Matcher fails on zero-or-more.
2017-03-07 15:55:28 +01:00
Matthew Honnibal
3a5f726208
Merge pull request #874 from badbye/patch-1
**Documentation**: Edit example code
2017-03-07 15:31:28 +01:00
yalei
27c0e6226b
Edit example code
The original code forgot to import the `random` module and the `EntityRecognizer` class.
2017-03-07 18:07:40 +08:00
Ines Montani
f710fc3f2d
Merge pull request #873 from banglakit/bn-tests
Add tests for Bengali
2017-03-05 12:13:49 +01:00
Aniruddha Adhikary
696215a3fb
add tests for Bengali
2017-03-05 11:25:12 +06:00
Ines Montani
3c1411226d
Update CONTRIBUTORS.md
2017-03-04 12:31:51 +01:00
Ines Montani
bb959692f5
Merge pull request #872 from banglakit/bn-improvements
[Bengali] basic tag map, morph, lemma rules and exceptions
2017-03-04 11:36:24 +01:00
Aniruddha Adhikary
8f3bfe9bfc
[Bengali] basic tag map, morph, lemma rules and exceptions
2017-03-04 12:36:59 +06:00
Ines Montani
33efe77392
Update badges and add info about conda (see #778)
2017-03-03 19:15:56 +01:00
Roman Inflianskas
66e1109b53
Add support for Universal Dependencies v2.0
2017-03-03 13:17:34 +01:00
ines
8dff040032
Revert "Add regression test for #859"
This reverts commit c4f16c66d1.
2017-03-01 21:56:20 +01:00
ines
c4f16c66d1
Add regression test for #859
2017-03-01 16:07:27 +01:00
ines
d25f17f139
Add Bengali to list of languages (see #865)
2017-03-01 15:59:21 +01:00
Matthew Honnibal
0f74002a26
Merge pull request #865 from banglakit/bn
Add basic Bengali language support
2017-03-01 15:25:58 +01:00
Aniruddha Adhikary
d91be7aed4
add punctuation for Bengali
2017-02-28 21:07:14 +06:00
Aniruddha Adhikary
5a4fc09576
add basic Bengali support
2017-02-28 07:48:37 +06:00
Matthew Honnibal
cc9b2b74e3
Merge branch 'french-tokenizer-exceptions'
2017-02-27 11:44:39 +01:00
Matthew Honnibal
bd4375a2e6
Remove comment
2017-02-27 11:44:26 +01:00
Matthew Honnibal
e7e22d8be6
Move import within get_exceptions() function, to speed import
2017-02-27 11:34:48 +01:00
Matthew Honnibal
34bcc8706d
Merge branch 'french-tokenizer-exceptions'
2017-02-27 11:21:21 +01:00
Matthew Honnibal
0aaa546435
Fix test after updating the French tokenizer stuff
2017-02-27 11:20:47 +01:00
Matthew Honnibal
26446aa728
Avoid loading all French exceptions on import
Move exceptions loading behind a get_tokenizer_exceptions() function
for French, instead of loading into the top-level namespace. This
cuts import times from 0.6s to 0.2s, at the expense of making the
French data a little different from the others (there's no top-level
TOKENIZER_EXCEPTIONS variable.) The current solution feels somewhat
unsatisfying.
2017-02-25 11:55:00 +01:00