Matthew Honnibal
23afa6429f
Add input length error, to address #1826
2018-03-29 21:45:26 +02:00
Matthew Honnibal
cca7e7ad11
Merge branch 'master' of https://github.com/explosion/spaCy
2018-03-29 20:27:06 +02:00
Matthew Honnibal
68ad366935
Improve train_new_entity_type example
2018-03-29 20:26:41 +02:00
Ines Montani
a609a1ca29
Merge pull request #2152 from explosion/feature/tidy-up-dependencies
...
💫 Tidy up dependencies
2018-03-29 14:35:09 +02:00
Viet Trung Tran
ea2af94cd9
Add support for Vietnamese in spaCy by leveraging Pyvi, an external Vietnamese tokenizer ( #2155 )
...
* support for Vietnamese
* Contributor Agreement for adding Vietnamese support on spaCy
2018-03-29 12:19:51 +02:00
Matthew Honnibal
6efb76bb3f
Require next thinc
2018-03-28 23:30:32 +00:00
ines
e6979bdbbd
Merge branch 'feature/tidy-up-dependencies' of https://github.com/explosion/spaCy into feature/tidy-up-dependencies
2018-03-29 00:19:37 +02:00
ines
83146458a2
Fix urllib for Python 3
2018-03-29 00:19:33 +02:00
Matthew Honnibal
8308bbc617
Get msgpack and msgpack_numpy via Thinc, to avoid potential version conflicts
2018-03-29 00:14:55 +02:00
Matthew Honnibal
b5098079d8
Fix error on urllib
2018-03-29 00:08:16 +02:00
Ines Montani
0de599b16b
Merge pull request #2159 from explosion/feature/fix-merged-entity-iob ( resolves #1554 , resolves #1752 )
...
💫 Fix token.ent_iob after doc.merge(), and ensure consistency in doc.ents
2018-03-28 23:10:00 +02:00
Ines Montani
98e9cda677
Merge pull request #2158 from explosion/feature/fix-multiple-vectors ( resolves #1660 )
...
💫 Fix loading of multiple vector models
2018-03-28 23:08:24 +02:00
Matthew Honnibal
a7c5ae2beb
Avoid forcing a name on empty vectors, and remove print statement
2018-03-28 21:08:58 +02:00
ines
3eb67bbe4b
Allow entity types with dashes ( resolves #1967 )
2018-03-28 20:51:26 +02:00
Matthew Honnibal
cf5fcf0546
Update serialization test
2018-03-28 20:12:53 +02:00
Matthew Honnibal
4555e3e251
Dont assume pretrained_vectors cfg set in build_tagger
2018-03-28 20:12:45 +02:00
ines
9615ed5ed7
Update emoji/hashtag matcher example ( resolves #2156 ) [ci skip]
2018-03-28 18:41:28 +02:00
Matthew Honnibal
0b375d50c8
Fix ent_iob tags in doc.merge to avoid inconsistent sequences
2018-03-28 18:39:03 +02:00
Matthew Honnibal
95fa89c4b8
Update doc.ents test
2018-03-28 18:39:03 +02:00
Matthew Honnibal
e807f88410
Resolve merge when cherry-picking ent iob patches from develop
2018-03-28 18:38:13 +02:00
Matthew Honnibal
99fbc7db33
Improve error message when entity sequence is inconsistent
2018-03-28 18:36:53 +02:00
Matthew Honnibal
cbd2794be0
Add test for ent_iob during span merge
2018-03-28 18:36:53 +02:00
Matthew Honnibal
f8dd905a24
Warn and fallback if vectors have no name
2018-03-28 18:24:53 +02:00
Matthew Honnibal
fd9e259414
Add test for #1660
2018-03-28 18:22:51 +02:00
Matthew Honnibal
bc4afa9881
Remove print statement
2018-03-28 17:48:37 +02:00
Matthew Honnibal
79dc241caa
Set pretrained_vectors in parser cfg
2018-03-28 17:35:07 +02:00
Matthew Honnibal
17c3e7efa2
Add message noting vectors
2018-03-28 16:33:43 +02:00
Matthew Honnibal
9bf6e93b3e
Set pretrained_vectors in begin_training
2018-03-28 16:32:41 +02:00
Matthew Honnibal
95a9615221
Fix loading of multiple pre-trained vectors
...
This patch addresses #1660 , which was caused by keying all pre-trained
vectors with the same ID when telling Thinc how to refer to them. This
meant that if multiple models were loaded that had pre-trained vectors,
errors or incorrect behaviour resulted.
The vectors class now includes a .name attribute, which defaults to:
{nlp.meta['lang']_nlp.meta['name']}.vectors
The vectors name is set in the cfg of the pipeline components under the
key pretrained_vectors. This replaces the previous cfg key
pretrained_dims.
In order to make existing models compatible with this change, we check
for the pretrained_dims key when loading models in from_disk and
from_bytes, and add the cfg key pretrained_vectors if we find it.
2018-03-28 16:02:59 +02:00
ines
07b8c255a5
Updatee example with note to install requests
2018-03-28 12:46:27 +02:00
ines
366c98a94b
Remove requests dependency
2018-03-28 12:46:18 +02:00
ines
7fbc9e5874
Replace requests with urllib
2018-03-28 12:46:07 +02:00
ines
da1f200362
Add compat helpers for urllib
2018-03-28 12:45:53 +02:00
ines
ac88c72c9a
Fix ftfy workaround and remove old import
2018-03-28 12:14:28 +02:00
ines
ce6071ca89
Remove ftfy dependency and update docs
2018-03-28 12:09:42 +02:00
Matthew Honnibal
070b6c6495
Remove dependency on ftfy
2018-03-28 12:07:02 +02:00
ines
6d2c85f428
Drop six and related hacks as a dependency
2018-03-28 10:45:25 +02:00
ines
9e83513004
Add position of invalid token to error message
2018-03-27 23:56:59 +02:00
ines
11c4735ccf
Fix issue in Italian lemmatizer data ( resolves #2050 )
2018-03-27 23:55:22 +02:00
Matthew Honnibal
6a961928b2
Merge branch 'develop' of https://github.com/explosion/spaCy into develop
2018-03-27 21:01:48 +00:00
Matthew Honnibal
b7136cb094
Support zipped vector files in init-model
2018-03-27 21:01:18 +00:00
ines
693971dd8f
Improve error message if token text is empty string (see #2101 )
2018-03-27 22:25:40 +02:00
ines
0c829e6605
Fix whitespace
2018-03-27 22:20:59 +02:00
Matthew Honnibal
de9fd091ac
Fix #2014 : token.pos_ not writeable
2018-03-27 21:21:11 +02:00
Matthew Honnibal
18da89e04c
Handle non-callable gold_tuples in parser begin_training
2018-03-27 21:08:41 +02:00
Matthew Honnibal
1f7229f40f
Revert "Merge branch 'develop' of https://github.com/explosion/spaCy into develop"
...
This reverts commit c9ba3d3c2d
, reversing
changes made to 92c26a35d4
.
2018-03-27 19:23:02 +02:00
Matthew Honnibal
8b7a74570f
Revert "Revert "Merge branch 'develop' of https://github.com/explosion/spaCy into develop""
...
This reverts commit f41e626844
.
2018-03-27 19:22:52 +02:00
Matthew Honnibal
f41e626844
Revert "Merge branch 'develop' of https://github.com/explosion/spaCy into develop"
...
This reverts commit c9ba3d3c2d
, reversing
changes made to f57bfbccdc
.
2018-03-27 19:22:25 +02:00
Matthew Honnibal
c9ba3d3c2d
Merge branch 'develop' of https://github.com/explosion/spaCy into develop
2018-03-27 18:59:08 +02:00
Matthew Honnibal
92c26a35d4
Update get_cuda_stream
2018-03-27 16:42:00 +00:00