Suraj Rajan
1cdbb7c97c
[2032] - Changed python set to cpp stl set (#2170)
Changed python set to cpp stl set #2032
## Description
Changed the Python set to a C++ STL set (`std::set`). A `std::set` keeps its elements sorted, so the smallest element is available in constant time through `begin()`, whereas finding the minimum of a Python set with `min()` takes linear time. Other operations such as `find`, `count`, `insert` and `erase` run in logarithmic time. Python set lookups and insertions are O(1) on average, so the gain comes from the constant-time minimum lookup, which makes `std::set` the better option for managing vectors.
Reference: http://www.cplusplus.com/reference/set/set/
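The complexity argument above can be illustrated in pure Python. Python's standard library has no ordered set, so this sketch swaps in a `heapq` min-heap with lazy deletion as a stand-in for `std::set<int>`: the smallest free element is read from the top of the heap, and rows removed out of order are skipped once they reach the top. The class `FreeRows` and its methods are hypothetical, and so is the assumption that the set holds the unused row indices of the vectors table; this is not spaCy's implementation.

```python
import heapq


class FreeRows:
    """Hypothetical tracker of unused row indices (stand-in for std::set<int>).

    A min-heap gives cheap access to the smallest candidate row. A companion
    Python set records which rows are actually still free, so a row claimed
    out of order leaves a stale heap entry that is discarded lazily.
    """

    def __init__(self, n_rows):
        self._free = set(range(n_rows))
        # A sorted list is already a valid min-heap.
        self._heap = list(range(n_rows))

    def __len__(self):
        return len(self._free)

    def _prune(self):
        # Pop stale entries for rows that were claimed since they were pushed.
        while self._heap and self._heap[0] not in self._free:
            heapq.heappop(self._heap)

    def smallest(self):
        # O(1) when the top entry is still free; each stale entry costs O(log n).
        self._prune()
        if not self._heap:
            raise ValueError("no free rows left")
        return self._heap[0]

    def claim(self, row):
        # Mark any row as used in O(1) on average; its heap entry becomes stale.
        self._free.discard(row)

    def release(self, row):
        # Return a row to the pool with an O(log n) heap push.
        if row not in self._free:
            self._free.add(row)
            heapq.heappush(self._heap, row)


rows = FreeRows(5)
first = rows.smallest()   # row 0 is handed out first
rows.claim(first)
rows.claim(3)             # a row assigned explicitly, out of order
print(rows.smallest())    # prints 1
```

A sorted list searched with `bisect` would also give constant-time access to the minimum, but inserting into or removing from the middle of a list is O(n), which is why a balanced tree such as `std::set` (or a heap, as here) is the usual choice.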
### Types of change
Enhancement to `Vectors` for faster initialisation of word vectors (fastText)
2018-03-31 13:28:25 +02:00
Ines Montani
a609a1ca29
Merge pull request #2152 from explosion/feature/tidy-up-dependencies
💫 Tidy up dependencies
2018-03-29 14:35:09 +02:00
Matthew Honnibal
8308bbc617
Get msgpack and msgpack_numpy via Thinc, to avoid potential version conflicts
2018-03-29 00:14:55 +02:00
Matthew Honnibal
95a9615221
Fix loading of multiple pre-trained vectors
This patch addresses #1660, which was caused by keying all pre-trained
vectors with the same ID when telling Thinc how to refer to them. As a
result, loading multiple models that had pre-trained vectors led to
errors or incorrect behaviour.
The Vectors class now includes a .name attribute, which defaults to:
{nlp.meta['lang']}_{nlp.meta['name']}.vectors
The vectors name is set in the cfg of the pipeline components under the
key pretrained_vectors. This replaces the previous cfg key
pretrained_dims.
In order to make existing models compatible with this change, we check
for the pretrained_dims key when loading models in from_disk and
from_bytes, and add the cfg key pretrained_vectors if we find it.
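The naming scheme and the backward-compatibility check described above can be sketched in plain Python. The helper names `default_vectors_name` and `upgrade_component_cfg` are hypothetical, and so is the choice to leave the old `pretrained_dims` key in place; the commit only says that `pretrained_vectors` is added when `pretrained_dims` is found. The sketch shows why per-model names address #1660: models with different `lang`/`name` meta values normally end up with different vectors names instead of sharing one ID.

```python
def default_vectors_name(meta):
    """Build the default vectors name, '{lang}_{name}.vectors', from model meta."""
    return "{}_{}.vectors".format(meta["lang"], meta["name"])


def upgrade_component_cfg(cfg, vectors_name):
    """Return a copy of a component cfg that carries 'pretrained_vectors'.

    Models saved before the change only stored 'pretrained_dims'. If that key
    is present and 'pretrained_vectors' is not, the vectors name is added.
    The original dict is left untouched.
    """
    new_cfg = dict(cfg)
    if "pretrained_dims" in new_cfg and "pretrained_vectors" not in new_cfg:
        new_cfg["pretrained_vectors"] = vectors_name
    return new_cfg


# Illustrative meta values for an English model.
meta = {"lang": "en", "name": "core_web_md"}
name = default_vectors_name(meta)
print(name)  # prints en_core_web_md.vectors
print(upgrade_component_cfg({"pretrained_dims": 300}, name))
# prints {'pretrained_dims': 300, 'pretrained_vectors': 'en_core_web_md.vectors'}
```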
2018-03-28 16:02:59 +02:00
Matthew Honnibal
8cefc58abc
Fix Vectors pickling
2018-03-14 16:59:37 +01:00
Claudiu-Vlad Ursache
e28de12cbd
Ensure files opened in from_disk are closed
Fixes [issue 1706](https://github.com/explosion/spaCy/issues/1706).
2018-02-13 20:49:43 +01:00
Matthew Honnibal
29897ed1b3
Allow vector loading to work on 1d data files. Fixes #1831
2018-01-22 19:18:26 +01:00
Matthew Honnibal
1a1cca6052
Fix vectors.resize() on Py3. Closes #1539
2018-01-14 14:48:51 +01:00
Matthew Honnibal
36b47e3fa6
Fix (and test) vector pickling
2017-12-07 09:53:30 +01:00
Matthew Honnibal
b712de774e
Fix vectors pickling
2017-12-05 12:45:24 +01:00
Matthew Honnibal
a5ea0fdf5a
Fix #1518: vocab.vectors.resize() didn't work
2017-11-08 22:18:37 +01:00
Matthew Honnibal
225cc249c9
Pass string path to numpy, to fix #1479
2017-11-05 14:42:46 +01:00
Matthew Honnibal
fdb4b8e456
Merge branch 'develop' of https://github.com/explosion/spaCy into develop
2017-11-01 02:07:17 +01:00
Matthew Honnibal
c48dd0e1d3
Fix vector pruning
2017-11-01 02:06:58 +01:00
ines
5683fd65ed
Update docstrings
2017-11-01 00:42:39 +01:00
Matthew Honnibal
c16310d156
Update vectors with find method
2017-11-01 00:34:55 +01:00
ines
2ad2f09d12
Update docstrings and simplify most_similar
2017-11-01 00:18:08 +01:00
ines
ba2e6c8c6f
Update docstrings and formatting
2017-10-31 23:23:34 +01:00
Matthew Honnibal
d90a22afe6
Fix loading previous vectors models
2017-10-31 19:58:35 +01:00
Matthew Honnibal
997a61557a
Add vectors.n_keys property
2017-10-31 19:30:52 +01:00
Matthew Honnibal
77d8f5de9a
Revise and simplify Vectors class
2017-10-31 18:25:08 +01:00
Matthew Honnibal
9c11ee4a1c
WIP on vectors fixes
2017-10-31 11:22:56 +01:00
Matthew Honnibal
368fdb389a
WIP on refactoring and fixing vectors
2017-10-31 02:00:26 +01:00
Matthew Honnibal
4112a991ec
Fix vector pruning
2017-10-30 19:44:40 +01:00
Explosion Bot
d0cf12c8c7
Fix off-by-one error in vectors
2017-10-30 16:22:03 +01:00
Explosion Bot
ab5d5ed880
Fix vectors.add()
2017-10-30 16:08:09 +01:00
Explosion Bot
72aea8f105
Update vectors.add() to allow setting keys to rows
2017-10-30 10:03:08 +01:00
ines
5167a0cce2
Tidy up Vectors and docs
2017-10-27 19:45:19 +02:00
Matthew Honnibal
cfae54c507
Make change to Vectors.__init__
2017-10-20 14:19:04 +02:00
Matthew Honnibal
92ac9316b5
Fix initialization of vectors, to address serialization problem
2017-10-20 13:59:24 +02:00
Matthew Honnibal
df488274b1
Fix deserialization of vectors
2017-10-16 20:55:00 +02:00
Matthew Honnibal
d90cc917fa
Merge vectors.pyx doc strings
2017-10-01 17:05:54 -05:00
Matthew Honnibal
b2a8b9be77
Fix inconsistency of Vectors class API
2017-10-01 17:00:34 -05:00
Matthew Honnibal
97c409b602
Add docstrings for spacy.vectors
2017-10-01 22:10:33 +02:00
Matthew Honnibal
4f38a67a89
Make width default to 0 in vectors.pyx
2017-09-17 12:29:14 -05:00
Matthew Honnibal
e0a2aa9289
Support having word vectors data on GPU
2017-09-16 12:45:09 -05:00
Matthew Honnibal
7742a6d559
Add GloVe vectors reader
2017-09-01 16:39:22 +02:00
Matthew Honnibal
b8e1603cc4
Fix load fail for missing vectors
2017-08-19 22:07:00 +02:00
Matthew Honnibal
6a94648373
Fix serialization
2017-08-19 21:27:35 +02:00
Matthew Honnibal
1157294434
Improve vector handling
2017-08-19 20:35:33 +02:00
Matthew Honnibal
93fb8b64e9
Fix vector loading
2017-08-19 19:52:25 +02:00
Matthew Honnibal
3d049af563
Improve vectors to/from disk
2017-08-19 18:42:11 +02:00
Matthew Honnibal
19c495f451
Fix vectors deserialization
2017-08-19 04:33:03 +02:00
Matthew Honnibal
ed4fb991dc
Work on vectors loading
2017-08-18 20:45:48 +02:00
Matthew Honnibal
5489b49203
Remove print statement
2017-06-05 13:20:41 +02:00
Matthew Honnibal
280d419529
Add pickle method for vectors
2017-06-05 12:36:04 +02:00
Matthew Honnibal
eb7cbb62c2
Flesh out Vectors class
2017-06-05 12:32:08 +02:00