Commit Graph

5103 Commits

Author SHA1 Message Date
Matthew Honnibal
0ac3d27689 Fix handling of trailing whitespace
Fix off-by-one error that meant trailing spaces were being dropped.
Closes #792
2017-03-08 15:01:40 +01:00
ines
c2e3e651b8 Re-add regression test for #859 2017-03-08 14:36:09 +01:00
Matthew Honnibal
77f0594761 Merge branch 'develop' of https://github.com/explosion/spaCy into develop 2017-03-08 14:34:48 +01:00
Matthew Honnibal
0a6d7ca200 Fix spacing after token_match
The boolean flag indicating a space after the token was
being set incorrectly after the token_match regex was applied.
Fixes #859.
2017-03-08 14:33:32 +01:00
ines
ffe0f0c6c4 Add dill to requirements 2017-03-08 14:11:54 +01:00
shuvanon
85438aee1b update tokenizer 2017-03-08 17:29:39 +06:00
shuvanon
45bc78461c update tokenizer 2017-03-08 17:27:12 +06:00
ines
dc32e3ecb3 Fix link 2017-03-08 11:37:04 +01:00
ines
758335452d Update installation instructions and fix formatting 2017-03-08 11:36:00 +01:00
Ines Montani
34801a0725 Update README.rst 2017-03-08 11:08:09 +01:00
Matthew Honnibal
cd33b39a04 Fix 2/3 problem for json save/load 2017-03-08 01:39:13 +01:00
Matthew Honnibal
40703988bc Use FTRL training in parser 2017-03-08 01:38:51 +01:00
Matthew Honnibal
d108534dc2 Fix 2/3 problems for training 2017-03-08 01:37:52 +01:00
Matthew Honnibal
04a51dab62 Print active parser features during training 2017-03-08 01:37:19 +01:00
Matthew Honnibal
d03d6a13f1 Merge branch 'rominf-ud20' into develop 2017-03-07 21:48:56 +01:00
Matthew Honnibal
f7374d0b86 Merge branch 'ud20' of https://github.com/rominf/spaCy into rominf-ud20 2017-03-07 21:48:37 +01:00
Matthew Honnibal
16670d3251 Xfail the vocab pickling for now 2017-03-07 21:43:28 +01:00
Matthew Honnibal
a89c3500f6 Fixes to hacky vocab pickling 2017-03-07 20:58:55 +01:00
Matthew Honnibal
d814892805 Hackish pickle support for Vocab. 2017-03-07 20:25:12 +01:00
Matthew Honnibal
26614e028f Add hacky support for StringCFile, to make pickling easier. 2017-03-07 20:24:37 +01:00
ines
004c4c9566 Update installation docs
Include conda and virtualenv info for pip, add instructions for
downloading models manually and add details and fab commands to
"Compile from source" section.
2017-03-07 18:52:22 +01:00
Ines Montani
57d70ea3e1 Update README.rst 2017-03-07 17:59:30 +01:00
Matthew Honnibal
3edb8ae207 Whitespace 2017-03-07 17:16:26 +01:00
Matthew Honnibal
5de7e712b7 Add support for pickling StringStore. 2017-03-07 17:15:18 +01:00
Matthew Honnibal
4e75e74247 Update regression test for variable-length pattern problem in the matcher. 2017-03-07 16:08:32 +01:00
Matthew Honnibal
6d67213b80 Add test for 850: Matcher fails on zero-or-more. 2017-03-07 15:55:28 +01:00
Matthew Honnibal
3a5f726208 Merge pull request #874 from badbye/patch-1
**Documentation**: Edit example code
2017-03-07 15:31:28 +01:00
yalei
27c0e6226b Edit example code
The original code forgot to import the `random` module and the `EntityRecognizer` module.
2017-03-07 18:07:40 +08:00
Ines Montani
f710fc3f2d Merge pull request #873 from banglakit/bn-tests
Add tests for Bengali
2017-03-05 12:13:49 +01:00
Aniruddha Adhikary
696215a3fb add tests for Bengali 2017-03-05 11:25:12 +06:00
Ines Montani
3c1411226d Update CONTRIBUTORS.md 2017-03-04 12:31:51 +01:00
Ines Montani
bb959692f5 Merge pull request #872 from banglakit/bn-improvements
[Bengali] basic tag map, morph, lemma rules and exceptions
2017-03-04 11:36:24 +01:00
Aniruddha Adhikary
8f3bfe9bfc [Bengali] basic tag map, morph, lemma rules and exceptions 2017-03-04 12:36:59 +06:00
Ines Montani
33efe77392 Update badges and add info about conda (see #778) 2017-03-03 19:15:56 +01:00
Roman Inflianskas
66e1109b53 Add support for Universal Dependencies v2.0 2017-03-03 13:17:34 +01:00
ines
8dff040032 Revert "Add regression test for #859"
This reverts commit c4f16c66d1.
2017-03-01 21:56:20 +01:00
Juan Miguel Cejuela
25c29f072d apply patch 2017-03-01 21:44:17 +01:00
Juan Miguel Cejuela
a8cfde46d3 #781 Fix test: colocalizes is lemmatized to colocaliz and colocalize 2017-03-01 21:43:08 +01:00
Juan Miguel Cejuela
a471114eb2 #781 add regression test, failing previous bug fix 2017-03-01 21:30:51 +01:00
ines
c4f16c66d1 Add regression test for #859 2017-03-01 16:07:27 +01:00
ines
d25f17f139 Add Bengali to list of languages (see #865) 2017-03-01 15:59:21 +01:00
Matthew Honnibal
0f74002a26 Merge pull request #865 from banglakit/bn
Add basic Bengali language support
2017-03-01 15:25:58 +01:00
Aniruddha Adhikary
d91be7aed4 add punctuations for Bengali 2017-02-28 21:07:14 +06:00
Aniruddha Adhikary
5a4fc09576 add basic Bengali support 2017-02-28 07:48:37 +06:00
Matthew Honnibal
cc9b2b74e3 Merge branch 'french-tokenizer-exceptions' 2017-02-27 11:44:39 +01:00
Matthew Honnibal
bd4375a2e6 Remove comment 2017-02-27 11:44:26 +01:00
Matthew Honnibal
e7e22d8be6 Move import within get_exceptions() function, to speed import 2017-02-27 11:34:48 +01:00
Matthew Honnibal
34bcc8706d Merge branch 'french-tokenizer-exceptions' 2017-02-27 11:21:21 +01:00
Matthew Honnibal
0aaa546435 Fix test after updating the French tokenizer stuff 2017-02-27 11:20:47 +01:00
Matthew Honnibal
26446aa728 Avoid loading all French exceptions on import
Move exceptions loading behind a get_tokenizer_exceptions() function
for French, instead of loading into the top-level namespace. This
cuts import times from 0.6s to 0.2s, at the expense of making the
French data a little different from the others (there's no top-level
TOKENIZER_EXCEPTIONS variable). The current solution feels somewhat
unsatisfying.
2017-02-25 11:55:00 +01:00