Vector data is kept in the Vectors.data attribute, which should be an
instance of numpy.ndarray (for CPU vectors) or cupy.ndarray (for GPU
vectors). Multiple keys can be mapped to the same vector, and not all of the
rows in the table need to be assigned – so vectors.n_keys may be greater or
smaller than vectors.shape[0].
Vectors.__init__
Create a new vector store. You can set the vector values and keys directly on
initialization, or supply a shape keyword argument to create an empty table
you can add vectors to later.
shape
Size of the table as (n_entries, n_columns), the number of entries and number of columns. Not required if you're initializing the object with data and keys. Tuple[int, int]
data
The vector data, as a two-dimensional table with one row per entry. numpy.ndarray[ndim=2, dtype=float32]
keys
A sequence of keys aligned with the data. Iterable[Union[str, int]]
name
A name to identify the vectors table. str
Vectors.__getitem__
Get a vector by key. If the key is not found in the table, a KeyError is
raised.
Vectors.add
Add a key to the table, optionally setting a vector value as well. Keys can be
mapped to an existing vector by setting row, or a new vector can be added.
When adding string keys, keep in mind that the Vectors class itself has no
StringStore, so you have to store the hash-to-string
mapping separately. If you need to manage the strings, you should use the
Vectors via the Vocab class, e.g. vocab.vectors.
vector
An optional vector to add for the key. numpy.ndarray[ndim=1, dtype=float32]
row
An optional row number of a vector to map the key to. int
RETURNS
The row the vector was added to. int
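The following is a minimal sketch of the add semantics described above, written as a toy model and not as spaCy's implementation. ToyVectors, hash_key and hash2string are hypothetical names introduced for illustration, and the SHA-256 based hash is only a stand-in for whatever 64-bit string hash the library uses. It shows a key being added with a new vector, a second key being mapped to the same row, and the hash-to-string mapping being kept outside the table.

```python
import hashlib


def hash_key(text):
    """Hypothetical stand-in for a 64-bit string hash (not spaCy's hash)."""
    digest = hashlib.sha256(text.encode("utf8")).digest()
    return int.from_bytes(digest[:8], "little")


class ToyVectors:
    """Toy key-to-row table; an assumption-laden model, not the real class."""

    def __init__(self, n_rows, width):
        self.data = [[0.0] * width for _ in range(n_rows)]
        self.key2row = {}

    def add(self, key, vector=None, row=None):
        # String keys are stored as integer hashes only.
        if isinstance(key, str):
            key = hash_key(key)
        if row is None:
            if key in self.key2row:
                # Reuse the row the key already points to.
                row = self.key2row[key]
            else:
                # Otherwise take the first row that no key is mapped to.
                used = set(self.key2row.values())
                free = [i for i in range(len(self.data)) if i not in used]
                if not free:
                    raise ValueError("no unassigned rows left")
                row = free[0]
        self.key2row[key] = row
        if vector is not None:
            self.data[row] = list(vector)
        return row


vectors = ToyVectors(n_rows=2, width=3)

# The table keeps hashes only, so the strings are tracked separately.
hash2string = {hash_key(word): word for word in ("cat", "kitty")}

cat_row = vectors.add("cat", vector=[1.0, 0.0, 0.0])  # new vector
kitty_row = vectors.add("kitty", row=cat_row)         # share cat's row
words = sorted(hash2string[key] for key in vectors.key2row)
```

In this sketch both keys end up on the same row, and the original strings can only be recovered through the separate hash2string dictionary, which is the role the Vocab's string store plays in practice.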
Vectors.resize
Resize the underlying vectors array. If inplace=True, the memory is
reallocated. This may cause other references to the data to become invalid, so
only use inplace=True if you're sure that's what you want. If the number of
vectors is reduced, keys mapped to rows that have been deleted are removed.
These removed items are returned as a list of (key, row) tuples.
Example
removed = nlp.vocab.vectors.resize((10000, 300))
Name
Description
shape
A (rows, dims) tuple describing the number of rows and dimensions. Tuple[int, int]
inplace
Reallocate the memory. bool
RETURNS
The removed items as a list of (key, row) tuples. List[Tuple[int, int]]
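To make the return value concrete, here is a small sketch of what happens to keys when the table shrinks. It is a plain-Python toy model under stated assumptions, not the library's code: shrink is a hypothetical helper, the table is a list of lists, and the in-place reallocation controlled by inplace is not modelled.

```python
def shrink(data, key2row, n_rows):
    """Keep the first n_rows rows and drop keys that pointed past them.

    Returns the truncated table and the removed (key, row) pairs.
    """
    removed = [(key, row) for key, row in key2row.items() if row >= n_rows]
    for key, _ in removed:
        del key2row[key]
    return data[:n_rows], removed


data = [[1.0], [2.0], [3.0]]
key2row = {10: 0, 11: 1, 12: 2, 13: 2}  # keys 12 and 13 share row 2

new_data, removed = shrink(data, key2row, 2)
# removed now lists both keys that were mapped to the deleted row 2
```

Note that every key mapped to a deleted row is reported, so the removed list can be longer than the number of rows that were dropped.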
Vectors.values
Iterate over vectors that have been assigned to at least one key. Note that some
vectors may be unassigned, so the number of vectors returned may be less than
the length of the vectors table.
Vectors.n_keys
Get the number of keys in the table. Note that this is the number of all keys,
not just unique vectors. If several keys are mapped to the same
vector, they will be counted individually.
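The difference between counting keys and iterating over assigned vectors can be sketched as follows. This is a toy model with hypothetical names (assigned_vectors, and a table held as a list of lists), intended only to illustrate the two descriptions above.

```python
data = [[1.0, 0.0], [0.0, 1.0], [0.5, 0.5], [0.0, 0.0]]
key2row = {101: 0, 102: 0, 103: 2}  # rows 1 and 3 have no key


def assigned_vectors(data, key2row):
    """Yield each row that at least one key maps to, in row order."""
    assigned = set(key2row.values())
    for row, vector in enumerate(data):
        if row in assigned:
            yield vector


n_keys = len(key2row)  # every key counts, even when rows are shared
vectors_seen = list(assigned_vectors(data, key2row))  # rows 0 and 2 only
```

Here three keys cover only two of the four rows, so the key count, the number of vectors yielded and the table length are all different.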
Vectors.most_similar
For each of the given vectors, find the n most similar entries to it by
cosine similarity. Queries are given as vectors, not as keys. Results are
returned as a (keys, best_rows, scores) tuple. If queries is large, the
calculations are performed in chunks to avoid consuming too much memory. You can
set the batch_size to control the speed/memory trade-off during the calculations.
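As a rough illustration of the lookup described above, the sketch below ranks table rows by cosine similarity for each query and processes the queries in batches. It is a pure-Python toy under stated assumptions: the function names, the row2key dictionary and the list-of-lists table are hypothetical, and it is not the library's implementation. In a vectorized version the batch would bound the size of the similarity matrix held in memory at once; here the batching loop only marks where that boundary would fall.

```python
import math


def cosine(a, b):
    """Cosine similarity of two equal-length vectors (0.0 for zero vectors)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def most_similar(table, row2key, queries, n=1, batch_size=2):
    """Return (keys, best_rows, scores), one list of n items per query."""
    keys, best_rows, scores = [], [], []
    for start in range(0, len(queries), batch_size):
        batch = queries[start:start + batch_size]
        for query in batch:
            sims = [cosine(query, vector) for vector in table]
            # Rows sorted from most to least similar, truncated to n.
            top = sorted(range(len(table)), key=lambda r: sims[r], reverse=True)[:n]
            best_rows.append(top)
            scores.append([sims[r] for r in top])
            keys.append([row2key.get(r) for r in top])
    return keys, best_rows, scores


table = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
row2key = {0: "a", 1: "b", 2: "c"}  # one key per row, for simplicity
queries = [[2.0, 0.0], [0.0, 3.0], [1.0, 1.0]]

keys, best_rows, scores = most_similar(table, row2key, queries, n=1)
```

Because cosine similarity ignores vector length, the query [2.0, 0.0] matches row 0 just as well as [1.0, 0.0] would.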
data
Stored vectors data. numpy is used for CPU vectors, cupy for GPU vectors. Union[numpy.ndarray[ndim=2, dtype=float32], cupy.ndarray[ndim=2, dtype=float32]]
key2row
Dictionary mapping word hashes to rows in the Vectors.data table. Dict[int, int]
keys
Array keeping the keys in order, such that keys[vectors.key2row[key]] == key. Because the keys are integer hashes, the array holds integers, not floats. Union[numpy.ndarray[ndim=1, dtype=uint64], cupy.ndarray[ndim=1, dtype=uint64]]