Vector data is stored in the Vectors.data attribute, which should be an
instance of numpy.ndarray (for CPU vectors) or cupy.ndarray (for GPU
vectors). Multiple keys can be mapped to the same vector, and not all of the
rows in the table need to be assigned, so vectors.n_keys may be greater or
smaller than vectors.shape[0].
Vectors.__init__
Create a new vector store. You can set the vector values and keys directly on
initialization, or supply a shape keyword argument to create an empty table
you can add vectors to later.
shape
tuple
Size of the table as (n_entries, n_columns), the number of entries and number of columns. Not required if you're initializing the object with data and keys.
name
str
A name to identify the vectors table.
RETURNS
Vectors
The newly created object.
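As a minimal sketch, assuming spaCy and NumPy are installed, the two initialization styles described above might look like this. The table sizes, the integer keys (standing in for string hashes), and the name value are illustrative placeholders.

```python
import numpy
from spacy.vectors import Vectors

# Option 1: an empty table with room for 100 vectors of width 8.
empty_vectors = Vectors(shape=(100, 8))

# Option 2: a table initialized directly from data and keys.
# The integer keys are illustrative stand-ins for string hashes.
data = numpy.zeros((3, 8), dtype="f")
keys = [101, 102, 103]
vectors = Vectors(data=data, keys=keys, name="example_vectors")
```

In the second form the shape is taken from the data array, so no shape argument is needed.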
Vectors.__getitem__
Get a vector by key. If the key is not found in the table, a KeyError is
raised.
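A short sketch of a lookup, assuming spaCy and NumPy are installed. The integer keys are illustrative stand-ins for string hashes, and 999 is simply a key that was never added.

```python
import numpy
from spacy.vectors import Vectors

data = numpy.asarray([[1.0, 0.0], [0.0, 1.0]], dtype="f")
vectors = Vectors(data=data, keys=[101, 102])

# Look up the vector stored for key 101 (row 0 of the data).
row_vector = vectors[101]

# Looking up a key that is not in the table raises a KeyError.
try:
    vectors[999]
    found = True
except KeyError:
    found = False
```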
Vectors.add
Add a key to the table, optionally setting a vector value as well. Keys can be
mapped to an existing vector by setting row, or a new vector can be added.
When adding string keys, keep in mind that the Vectors class itself has no
StringStore, so you have to store the hash-to-string mapping separately. If you
need to manage the strings, you should use the Vectors via the Vocab class,
e.g. vocab.vectors.
row
int
An optional row number of a vector to map the key to.
RETURNS
int
The row the vector was added to.
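The following sketch, assuming spaCy and NumPy are installed, shows one way to add a key with a new vector, map a second key to the same row, and keep the hash-to-string mapping in a separate StringStore. The words and table size are illustrative.

```python
import numpy
from spacy.strings import StringStore
from spacy.vectors import Vectors

strings = StringStore()          # keeps the hash-to-string mapping
vectors = Vectors(shape=(4, 3))  # empty table: 4 rows, 3 dimensions

cat_id = strings.add("cat")
kitten_id = strings.add("kitten")

# Add a new key together with its vector; the return value is the row used.
cat_row = vectors.add(cat_id, vector=numpy.ones((3,), dtype="f"))

# Map a second key to the same row instead of storing another vector.
kitten_row = vectors.add(kitten_id, row=cat_row)

# The hash can be resolved back to text through the StringStore.
cat_text = strings[cat_id]
```

Passing vector and row as keyword arguments keeps the call explicit about which of the two is being set.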
Vectors.resize
Resize the underlying vectors array. If inplace=True, the memory is
reallocated. This may cause other references to the data to become invalid, so
only use inplace=True if you're sure that's what you want. If the number of
vectors is reduced, keys mapped to rows that have been deleted are removed.
These removed items are returned as a list of (key, row) tuples.
Example
removed = nlp.vocab.vectors.resize((10000, 300))
Name
Type
Description
shape
tuple
A (rows, dims) tuple describing the number of rows and dimensions.
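To make the returned (key, row) tuples concrete, here is a small self-contained sketch, assuming spaCy and NumPy are installed. The keys and sizes are illustrative; the table is shrunk without inplace=True, so a new array is allocated.

```python
import numpy
from spacy.vectors import Vectors

data = numpy.ones((4, 3), dtype="f")
# Key 101 maps to row 0, 102 to row 1, and so on.
vectors = Vectors(data=data, keys=[101, 102, 103, 104])

# Shrink the table from 4 rows to 2.
removed = vectors.resize((2, 3))

# Keys 103 and 104 pointed at rows 2 and 3, which no longer exist.
removed_pairs = sorted((int(key), int(row)) for key, row in removed)
```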
Vectors.values
Iterate over vectors that have been assigned to at least one key. Note that some
vectors may be unassigned, so the number of vectors returned may be less than
the length of the vectors table.
Vectors.n_keys
Get the number of keys in the table. Note that this is the number of all keys,
not just unique vectors. If several keys are mapped to the same
vector, they will be counted individually.
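The difference between the key count and the row count can be sketched as follows, assuming spaCy is installed. The integer keys are illustrative stand-ins for string hashes.

```python
from spacy.vectors import Vectors

vectors = Vectors(shape=(3, 4))  # three rows, none assigned yet

vectors.add(101, row=0)
vectors.add(102, row=0)  # second key for the same row
# Two keys, three rows: fewer keys than rows.
fewer_keys_than_rows = vectors.n_keys < vectors.shape[0]

vectors.add(103, row=1)
vectors.add(104, row=1)
# Four keys, three rows: more keys than rows, and row 2 is still unassigned.
more_keys_than_rows = vectors.n_keys > vectors.shape[0]
```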
Vectors.most_similar
For each of the given vectors, find the n most similar entries to it by cosine
similarity. Queries are by vector. Results are returned as a
(keys, best_rows, scores) tuple. If queries is large, the calculations are
performed in chunks to avoid consuming too much memory. You can set
batch_size to control the trade-off between speed and memory use during the
calculations.
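The chunking idea can be illustrated with plain NumPy. This is a sketch of the general technique, not spaCy's implementation: the function name most_similar_chunked is hypothetical, and it returns only rows and scores rather than keys.

```python
import numpy


def most_similar_chunked(table, queries, n=1, batch_size=2):
    """Return (best_rows, scores) of the n nearest table rows per query, by cosine."""
    # Normalize the table rows once so a dot product equals cosine similarity.
    table_norms = numpy.linalg.norm(table, axis=1, keepdims=True)
    table_norms[table_norms == 0] = 1.0
    unit_table = table / table_norms

    best_rows = numpy.zeros((queries.shape[0], n), dtype="i")
    scores = numpy.zeros((queries.shape[0], n), dtype="f")
    # Only a (batch_size, n_rows) block of similarities is held at a time.
    for start in range(0, queries.shape[0], batch_size):
        batch = queries[start:start + batch_size]
        batch_norms = numpy.linalg.norm(batch, axis=1, keepdims=True)
        batch_norms[batch_norms == 0] = 1.0
        sims = numpy.dot(batch / batch_norms, unit_table.T)
        # Sort by descending similarity and keep the top n columns.
        order = numpy.argsort(-sims, axis=1)[:, :n]
        best_rows[start:start + batch_size] = order
        scores[start:start + batch_size] = numpy.take_along_axis(sims, order, axis=1)
    return best_rows, scores


table = numpy.asarray([[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]], dtype="f")
queries = numpy.asarray([[2.0, 0.0], [0.0, 3.0], [5.0, 5.0]], dtype="f")
best_rows, scores = most_similar_chunked(table, queries, n=1, batch_size=2)
```

A larger batch_size means fewer passes through the loop but a larger intermediate similarity matrix, which is the trade-off the parameter controls.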