10940 Commits

Author SHA1 Message Date
Matthew Honnibal
9bf6e93b3e Set pretrained_vectors in begin_training 2018-03-28 16:32:41 +02:00
Matthew Honnibal
95a9615221 Fix loading of multiple pre-trained vectors
This patch addresses #1660, which was caused by keying all pre-trained
vectors with the same ID when telling Thinc how to refer to them. This
meant that if multiple models were loaded that had pre-trained vectors,
errors or incorrect behaviour resulted.

The vectors class now includes a .name attribute, which defaults to:
{nlp.meta['lang']}_{nlp.meta['name']}.vectors
The vectors name is set in the cfg of the pipeline components under the
key pretrained_vectors. This replaces the previous cfg key
pretrained_dims.

In order to make existing models compatible with this change, we check
for the pretrained_dims key when loading models in from_disk and
from_bytes, and add the cfg key pretrained_vectors if we find it.
2018-03-28 16:02:59 +02:00
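The backward-compatibility step described in the commit above can be illustrated with a minimal sketch. This is not the actual spaCy code: the helper names `default_vectors_name` and `upgrade_cfg` are hypothetical, and the sketch assumes that `cfg` and `meta` are plain dicts and that a legacy model should receive the default vectors name.

```python
def default_vectors_name(meta):
    """Build the default name '{lang}_{name}.vectors' from a model's meta dict.

    Hypothetical helper for illustration; not part of spaCy's API.
    """
    return "{}_{}.vectors".format(meta["lang"], meta["name"])


def upgrade_cfg(cfg, meta):
    """Return a copy of cfg that carries the new 'pretrained_vectors' key.

    If only the legacy 'pretrained_dims' key is present, add
    'pretrained_vectors' with the default vectors name. A cfg that
    already has 'pretrained_vectors', or has neither key, is copied
    unchanged. Hypothetical helper for illustration.
    """
    upgraded = dict(cfg)  # copy, so the caller's dict is not mutated
    if "pretrained_dims" in upgraded and "pretrained_vectors" not in upgraded:
        upgraded["pretrained_vectors"] = default_vectors_name(meta)
    return upgraded


if __name__ == "__main__":
    meta = {"lang": "en", "name": "core_web_md"}
    legacy_cfg = {"pretrained_dims": 300}
    print(upgrade_cfg(legacy_cfg, meta))
```

In this sketch the legacy key is left in place rather than deleted, so older code that still reads `pretrained_dims` keeps working; whether spaCy itself retains the key is not stated in the commit message.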
ines
07b8c255a5 Update example with note to install requests 2018-03-28 12:46:27 +02:00
ines
366c98a94b Remove requests dependency 2018-03-28 12:46:18 +02:00
ines
7fbc9e5874 Replace requests with urllib 2018-03-28 12:46:07 +02:00
ines
da1f200362 Add compat helpers for urllib 2018-03-28 12:45:53 +02:00
ines
ac88c72c9a Fix ftfy workaround and remove old import 2018-03-28 12:14:28 +02:00
ines
ce6071ca89 Remove ftfy dependency and update docs 2018-03-28 12:09:42 +02:00
Matthew Honnibal
070b6c6495 Remove dependency on ftfy 2018-03-28 12:07:02 +02:00
ines
6d2c85f428 Drop six and related hacks as a dependency 2018-03-28 10:45:25 +02:00
ines
9e83513004 Add position of invalid token to error message 2018-03-27 23:56:59 +02:00
ines
11c4735ccf Fix issue in Italian lemmatizer data (resolves #2050) 2018-03-27 23:55:22 +02:00
Matthew Honnibal
6a961928b2 Merge branch 'develop' of https://github.com/explosion/spaCy into develop 2018-03-27 21:01:48 +00:00
Matthew Honnibal
b7136cb094 Support zipped vector files in init-model 2018-03-27 21:01:18 +00:00
ines
693971dd8f Improve error message if token text is empty string (see #2101) 2018-03-27 22:25:40 +02:00
ines
0c829e6605 Fix whitespace 2018-03-27 22:20:59 +02:00
Matthew Honnibal
de9fd091ac Fix #2014: token.pos_ not writeable 2018-03-27 21:21:11 +02:00
Matthew Honnibal
18da89e04c Handle non-callable gold_tuples in parser begin_training 2018-03-27 21:08:41 +02:00
Matthew Honnibal
1f7229f40f Revert "Merge branch 'develop' of https://github.com/explosion/spaCy into develop"
This reverts commit c9ba3d3c2d, reversing
changes made to 92c26a35d4.
2018-03-27 19:23:02 +02:00
Matthew Honnibal
8b7a74570f Revert "Revert "Merge branch 'develop' of https://github.com/explosion/spaCy into develop""
This reverts commit f41e626844.
2018-03-27 19:22:52 +02:00
Matthew Honnibal
f41e626844 Revert "Merge branch 'develop' of https://github.com/explosion/spaCy into develop"
This reverts commit c9ba3d3c2d, reversing
changes made to f57bfbccdc.
2018-03-27 19:22:25 +02:00
Matthew Honnibal
c9ba3d3c2d Merge branch 'develop' of https://github.com/explosion/spaCy into develop 2018-03-27 18:59:08 +02:00
Matthew Honnibal
92c26a35d4 Update get_cuda_stream 2018-03-27 16:42:00 +00:00
Ines Montani
e0ae390607 Update CONTRIBUTING.md 2018-03-27 13:47:00 +02:00
Matthew Honnibal
f57bfbccdc Fix non-projective label filtering 2018-03-27 13:41:33 +02:00
Matthew Honnibal
d2118792e7 Merge changes from master 2018-03-27 13:38:41 +02:00
Matthew Honnibal
d4680e4d83 Merge branch 'master' of https://github.com/explosion/spaCy 2018-03-27 13:36:37 +02:00
Matthew Honnibal
63a267b34d Fix #2073: Token.set_extension not working 2018-03-27 13:36:20 +02:00
Ines Montani
284bbb1dd1 Merge pull request #2146 from justindujardin/tensorboard-standalone-example
Add example using TensorBoard standalone projector
2018-03-27 13:23:32 +02:00
Matthew Honnibal
25280b7013 Try to make sum_state_features faster 2018-03-27 10:08:38 +00:00
Matthew Honnibal
987e1533a4 Use 8 features in parser 2018-03-27 10:08:12 +00:00
Matthew Honnibal
8bbd26579c Support GPU in UD training script 2018-03-27 09:53:35 +00:00
Matthew Honnibal
dd54511c4f Pass data as a function in begin_training methods 2018-03-27 09:39:59 +00:00
Matthew Honnibal
d9ebd78e11 Change default sizes in parser 2018-03-26 17:22:18 +02:00
Matthew Honnibal
a3d0cb15d3 Fix ent_iob tags in doc.merge to avoid inconsistent sequences 2018-03-26 07:16:06 +02:00
Matthew Honnibal
7d4687162f Update doc.ents test 2018-03-26 07:14:35 +02:00
Matthew Honnibal
514d89a3ae Set missing label for non-specified entities when setting doc.ents 2018-03-26 07:14:16 +02:00
Matthew Honnibal
54d7a1c916 Improve error message when entity sequence is inconsistent 2018-03-26 07:13:34 +02:00
Justin DuJardin
4eeb178856 Add example using TensorBoard standalone projector
- The TensorBoard standalone projector expects a different set of files than the TensorFlow plugin.
2018-03-25 21:50:13 -07:00
Matthew Honnibal
938436455a Add test for ent_iob during span merge 2018-03-25 22:16:19 +02:00
Matthew Honnibal
8e08c378fe Fix entity IOB and tag in span merging 2018-03-25 22:16:01 +02:00
Matthew Honnibal
5430c43298 Set about to spacy-nightly 2018-03-25 19:30:14 +02:00
Matthew Honnibal
c059fcb0ba Update thinc requirement 2018-03-25 19:29:36 +02:00
Ines Montani
68226109f4 Merge pull request #2142 from jimregan/polish-more-tokens
more exceptions
2018-03-24 19:06:44 +01:00
Matthew Honnibal
d566e673bf Set version to v2.0.10 2018-03-24 18:09:03 +01:00
Matthew Honnibal
0d3bf0d4eb Merge branch 'master' of https://github.com/explosion/spaCy 2018-03-24 17:31:49 +01:00
dejanmarich
ccd1c04c63 Update stop_words.py
Added more words
2018-03-24 17:31:24 +01:00
ines
f1446b0257 Port over Turkish changes 2018-03-24 17:31:07 +01:00
DuyguA
cd604878a4 quick typo fix 2018-03-24 17:26:35 +01:00
Matthew Honnibal
406548b976 Support .gz and .tar.gz files in spacy init-model 2018-03-24 17:18:32 +01:00