Matthew Honnibal
95a9615221
Fix loading of multiple pre-trained vectors
This patch addresses #1660, which was caused by keying all pre-trained
vectors with the same ID when telling Thinc how to refer to them. This
meant that if multiple models were loaded that had pre-trained vectors,
errors or incorrect behaviour resulted.
The vectors class now includes a .name attribute, which defaults to:
{nlp.meta['lang']}_{nlp.meta['name']}.vectors
The vectors name is set in the cfg of the pipeline components under the
key pretrained_vectors. This replaces the previous cfg key
pretrained_dims.
In order to make existing models compatible with this change, we check
for the pretrained_dims key when loading models in from_disk and
from_bytes, and add the cfg key pretrained_vectors if we find it.
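A minimal sketch of the naming and compatibility logic described above, not the actual spaCy implementation. The helper names default_vectors_name and upgrade_cfg are hypothetical and used only for illustration; the cfg keys and the name pattern are taken from the message. The sketch assumes the legacy pretrained_dims key is left in place, since the message only says the new key is added.

```python
def default_vectors_name(meta):
    """Build the default vectors name: <lang>_<name>.vectors.

    `meta` stands in for nlp.meta (hypothetical helper, for illustration).
    """
    return "{}_{}.vectors".format(meta["lang"], meta["name"])


def upgrade_cfg(cfg, vectors_name):
    """Return a copy of a component cfg that is usable after the change.

    If the legacy 'pretrained_dims' key is present and 'pretrained_vectors'
    is not, add 'pretrained_vectors' pointing at the given vectors name.
    Assumption: the legacy key is kept rather than deleted.
    """
    upgraded = dict(cfg)
    if "pretrained_dims" in upgraded and "pretrained_vectors" not in upgraded:
        upgraded["pretrained_vectors"] = vectors_name
    return upgraded


if __name__ == "__main__":
    meta = {"lang": "en", "name": "example_model"}  # made-up values
    name = default_vectors_name(meta)
    print(name)
    print(upgrade_cfg({"pretrained_dims": 300}, name))
```

With a distinct name per model, two loaded pipelines no longer register their vectors under the same ID, which is the collision the patch describes.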
2018-03-28 16:02:59 +02:00
Matthew Honnibal
43f381ce36
Make Vocab.__contains__ work with ints. Fixes #1868
2018-01-23 23:26:47 +01:00
Matthew Honnibal
0153220304
Make set_vector add word to vocab. Fixes #1807
2018-01-14 13:57:57 +01:00
Matthew Honnibal
07acb43a85
Merge branch 'master' of https://github.com/explosion/spaCy
2017-12-04 14:42:52 +01:00
Matthew Honnibal
79f11d4f85
Pickle vectors with vocab
2017-11-23 17:19:50 +01:00
Roman Domrachev
b3311100c7
Merge branch 'master' of github.com:explosion/spaCy
2017-11-15 18:30:04 +03:00
ines
8e65247886
Fix lex.id if vectors is None
2017-11-15 14:23:58 +01:00
Matthew Honnibal
2f169fdb0a
Set lex ID correctly for new tokens in Vocab
2017-11-15 13:58:03 +01:00
Roman Domrachev
3e21680814
Use safer method to get string without hit
2017-11-14 22:58:46 +03:00
Roman Domrachev
91e2fa6561
Clean all caches
2017-11-14 21:15:04 +03:00
Matthew Honnibal
c16310d156
Update vectors with find method
2017-11-01 00:34:55 +01:00
Matthew Honnibal
c5799ecc7b
Remove print statement
2017-10-31 21:12:33 +01:00
Matthew Honnibal
77d8f5de9a
Revise and simplify Vectors class
2017-10-31 18:25:08 +01:00
Matthew Honnibal
cb5217012f
Fix vector remapping
2017-10-31 11:40:46 +01:00
Matthew Honnibal
9c11ee4a1c
WIP on vectors fixes
2017-10-31 11:22:56 +01:00
Matthew Honnibal
368fdb389a
WIP on refactoring and fixing vectors
2017-10-31 02:00:26 +01:00
Matthew Honnibal
4e3006cec7
Merge branch 'develop' of https://github.com/explosion/spaCy into develop
2017-10-30 19:44:58 +01:00
Matthew Honnibal
4112a991ec
Fix vector pruning
2017-10-30 19:44:40 +01:00
ines
ec657c1ddc
Update vocab docs and document Vocab.prune_vectors
2017-10-30 19:35:41 +01:00
Matthew Honnibal
e026b29ea9
Add prune_vectors method to Vocab
2017-10-30 17:59:43 +01:00
Explosion Bot
aa64031751
Fix clear_vectors() method on Vocab
2017-10-30 16:09:04 +01:00
Explosion Bot
7b56b2f04b
Add Vocab.cfg attr, to hold stuff like oov probs
2017-10-30 16:08:50 +01:00
ines
d96e72f656
Tidy up rest
2017-10-27 21:07:59 +02:00
ines
5167a0cce2
Tidy up Vectors and docs
2017-10-27 19:45:19 +02:00
Matthew Honnibal
c52671420c
Remove old cfile import
2017-10-26 13:28:19 +02:00
Matthew Honnibal
ef3e5a361b
Merge pull request #1442 from explosion/feature/fix-sp
💫 Fix SP tag, tweak Vectors.__init__, fix Morphology
2017-10-24 10:24:07 +02:00
Matthew Honnibal
8f8bccecb9
Patch deserialisation for invalid loads, to avoid model failure
2017-10-21 00:51:42 +02:00
Matthew Honnibal
9010a1a060
Create vectors correctly
2017-10-20 14:19:46 +02:00
Matthew Honnibal
33229b1c9e
Remove print statement
2017-10-20 14:19:29 +02:00
Matthew Honnibal
92ac9316b5
Fix initialization of vectors, to address serialization problem
2017-10-20 13:59:24 +02:00
Matthew Honnibal
0d57b9748a
Serialize lex_attr_getters with dill, for better pickle support
2017-10-17 18:17:45 +02:00
ines
b776f48e58
Fix typo
2017-10-01 21:58:45 +02:00
Matthew Honnibal
2cf0f4622f
Fix loading of models with pre-trained vectors
2017-10-01 14:05:32 -05:00
Matthew Honnibal
5aaef3e7b8
Don't link vectors in vocab deserialize
2017-09-26 06:45:47 -05:00
Matthew Honnibal
d9124f1aa3
Add link_vectors_to_models function
2017-09-22 09:38:22 -05:00
Matthew Honnibal
039d609362
Remove hard-coded default vectors width
2017-09-17 12:29:39 -05:00
Matthew Honnibal
83f8e98450
Fix retrieval of OOV vectors
2017-08-22 19:46:35 +02:00
Matthew Honnibal
5b329acbf2
Fix vectors_length property in vocab
2017-08-22 19:00:27 +02:00
Matthew Honnibal
6a94648373
Fix serialization
2017-08-19 21:27:35 +02:00
Matthew Honnibal
1157294434
Improve vector handling
2017-08-19 20:35:33 +02:00
Matthew Honnibal
93fb8b64e9
Fix vector loading
2017-08-19 19:52:25 +02:00
Matthew Honnibal
49a615e7d9
Create Vectors object in Vocab
2017-08-19 18:50:16 +02:00
Matthew Honnibal
2993b54fff
Load vectors in vocab
2017-08-18 20:46:56 +02:00
Matthew Honnibal
add9a33782
Return False for vocab.has_vector
2017-06-04 14:26:14 -05:00
ines
05fe6758a7
Set lexeme attributes for tokenizer special cases
2017-06-03 19:44:39 +02:00
ines
41a6adf1f6
Initialise Vocab length correctly
2017-06-02 10:57:25 +02:00
ines
53b82f972a
Add strings to Vocab in init, instead of StringStore
2017-06-02 10:57:06 +02:00
ines
023f38bdd4
Fix return value of Vocab.from_bytes
2017-06-02 10:56:40 +02:00
Matthew Honnibal
307d615c5f
Fix serialization for tagger when tag_map has changed
2017-06-01 12:18:36 -05:00
Matthew Honnibal
9805e0e369
Fix vocab pickling
2017-05-31 08:25:01 -05:00