mirror of
https://github.com/explosion/spaCy.git
synced 2025-01-26 01:04:34 +03:00
0fe43f40f1
* Support registered vectors
* Format
* Auto-fill [nlp] on load from config and from bytes/disk
* Only auto-fill [nlp]
* Undo all changes to Language.from_disk
* Expand BaseVectors

  These methods are needed in various places for training and vector similarity.
* isort
* More linting
* Only fill [nlp.vectors]
* Update spacy/vocab.pyx

  Co-authored-by: Sofie Van Landeghem <svlandeg@users.noreply.github.com>
* Revert changes to test related to auto-filling [nlp]
* Add vectors registry
* Rephrase error about vocab methods for vectors
* Switch to dummy implementation for BaseVectors.to_ops
* Add initial draft of docs
* Remove example from BaseVectors docs
* Apply suggestions from code review

  Co-authored-by: Sofie Van Landeghem <svlandeg@users.noreply.github.com>
* Update website/docs/api/basevectors.mdx

  Co-authored-by: Sofie Van Landeghem <svlandeg@users.noreply.github.com>
* Fix type and lint bpemb example
* Update website/docs/api/basevectors.mdx

---------

Co-authored-by: Sofie Van Landeghem <svlandeg@users.noreply.github.com>
144 lines
6.3 KiB
Plaintext
---
title: BaseVectors
teaser: Abstract class for word vectors
tag: class
source: spacy/vectors.pyx
version: 3.7
---

`BaseVectors` is an abstract class to support the development of custom vector
implementations.

For use in training with [`StaticVectors`](/api/architectures#staticvectors),
`get_batch` must be implemented. For improved performance, use efficient
batching in `get_batch` and implement `to_ops` to copy the vector data to the
current device. See an example custom implementation for
[BPEmb subword embeddings](/usage/embeddings-transformers#custom-vectors).

## BaseVectors.\_\_init\_\_ {id="init",tag="method"}

Create a new vector store.

| Name           | Description                                                                                                           |
| -------------- | --------------------------------------------------------------------------------------------------------------------- |
| _keyword-only_ |                                                                                                                       |
| `strings`      | The string store. A new string store is created if one is not provided. Defaults to `None`. ~~Optional[StringStore]~~ |

## BaseVectors.\_\_getitem\_\_ {id="getitem",tag="method"}

Get a vector by key. If the key is not found in the table, a `KeyError` should
be raised.

| Name        | Description                                                      |
| ----------- | ---------------------------------------------------------------- |
| `key`       | The key to get the vector for. ~~Union[int, str]~~               |
| **RETURNS** | The vector for the key. ~~numpy.ndarray[ndim=1, dtype=float32]~~ |

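As a rough illustration of the lookup contract, the sketch below accepts either
an integer or a string key and lets a missing key surface as a `KeyError`. It is
a plain-Python stand-in, not a real subclass: `ToyVectors`, its `table` argument
and the list-based vectors are assumptions made so the snippet runs without
spaCy installed. An actual implementation would derive from
`spacy.vectors.BaseVectors` and return a `numpy` array.

```python
from typing import Dict, List, Union


class ToyVectors:  # assumption: real code would subclass spacy.vectors.BaseVectors
    """Hypothetical vector table keyed by ints or strings."""

    def __init__(self, *, table: Dict[Union[int, str], List[float]]):
        self.table = dict(table)

    def __getitem__(self, key: Union[int, str]) -> List[float]:
        # A plain dict lookup already raises KeyError for unknown keys,
        # which is the behaviour expected from this method.
        return self.table[key]


vectors = ToyVectors(table={"cat": [1.0, 0.0], 42: [0.0, 1.0]})
cat_vector = vectors["cat"]

try:
    vectors["unknown"]
    missing_raises = False
except KeyError:
    missing_raises = True
```
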
## BaseVectors.\_\_len\_\_ {id="len",tag="method"}

Return the number of vectors in the table.

| Name        | Description                                 |
| ----------- | ------------------------------------------- |
| **RETURNS** | The number of vectors in the table. ~~int~~ |

## BaseVectors.\_\_contains\_\_ {id="contains",tag="method"}

Check whether there is a vector entry for the given key.

| Name        | Description                                  |
| ----------- | -------------------------------------------- |
| `key`       | The key to check. ~~int~~                    |
| **RETURNS** | Whether the key has a vector entry. ~~bool~~ |

## BaseVectors.add {id="add",tag="method"}

Add a key to the table, if possible. If no keys can be added, return `-1`.

| Name        | Description                                                                         |
| ----------- | ----------------------------------------------------------------------------------- |
| `key`       | The key to add. ~~Union[str, int]~~                                                 |
| **RETURNS** | The row the vector was added to, or `-1` if the operation is not supported. ~~int~~ |

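The return convention for `add` can be sketched with a table that has a fixed
number of rows. `FixedTable`, `n_rows` and `key2row` are hypothetical names for
this example only, and returning the existing row for a key that is already
present is an assumption rather than documented behaviour. The class is a
plain-Python stand-in and does not import spaCy.

```python
from typing import Dict, Union


class FixedTable:  # hypothetical stand-in, not a spaCy class
    """Table with a fixed number of rows that hands out free rows to new keys."""

    def __init__(self, *, n_rows: int):
        self.n_rows = n_rows
        self.key2row: Dict[Union[str, int], int] = {}

    @property
    def is_full(self) -> bool:
        return len(self.key2row) >= self.n_rows

    def add(self, key: Union[str, int]) -> int:
        if key in self.key2row:
            return self.key2row[key]  # assumption: reuse the existing row
        if self.is_full:
            return -1  # no slot left, so the key cannot be added
        row = len(self.key2row)
        self.key2row[key] = row
        return row


table = FixedTable(n_rows=2)
rows = [table.add("cat"), table.add("dog"), table.add("bird")]
```

An implementation that cannot accept new keys at all, such as one backed by a
fixed subword model, could simply return `-1` for every call.
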
## BaseVectors.shape {id="shape",tag="property"}

Get a `(rows, dims)` tuple with the number of rows and the number of dimensions
in the vector table.

| Name        | Description                                |
| ----------- | ------------------------------------------ |
| **RETURNS** | A `(rows, dims)` pair. ~~Tuple[int, int]~~ |

## BaseVectors.size {id="size",tag="property"}

The vector size, i.e. `rows * dims`.

| Name        | Description              |
| ----------- | ------------------------ |
| **RETURNS** | The vector size. ~~int~~ |

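The relationship between `shape` and `size` can be shown with a small stand-in.
`GridVectors` and its `rows` argument are hypothetical, the class does not
import spaCy, and the sketch assumes every row has the same length.

```python
from typing import List, Tuple


class GridVectors:  # hypothetical stand-in, not a spaCy class
    """Vector table stored as a list of equally long rows."""

    def __init__(self, *, rows: List[List[float]]):
        self.rows = rows

    @property
    def shape(self) -> Tuple[int, int]:
        n_dims = len(self.rows[0]) if self.rows else 0
        return (len(self.rows), n_dims)

    @property
    def size(self) -> int:
        n_rows, n_dims = self.shape
        return n_rows * n_dims  # total number of stored values


vectors = GridVectors(rows=[[0.1, 0.2, 0.3], [0.4, 0.5, 0.6]])
```

Here the table has two rows of three dimensions each, so `size` is the product
of the two entries in `shape`.
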
## BaseVectors.is_full {id="is_full",tag="property"}

Whether the vectors table is full and no slots are available for new keys.

| Name        | Description                                 |
| ----------- | ------------------------------------------- |
| **RETURNS** | Whether the vectors table is full. ~~bool~~ |

## BaseVectors.get_batch {id="get_batch",tag="method",version="3.2"}

Get the vectors for the provided keys efficiently as a batch. Required to use
the vectors with [`StaticVectors`](/api/architectures#StaticVectors) for
training.

| Name   | Description                             |
| ------ | --------------------------------------- |
| `keys` | The keys. ~~Iterable[Union[int, str]]~~ |

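One way to picture a batched lookup is to resolve all keys to row indices first
and then gather the rows in a single pass. The sketch below is a plain-Python
stand-in that does not import spaCy. `BatchedVectors`, `key2row` and `data` are
hypothetical names, the rows are lists rather than arrays, and raising
`KeyError` for unknown keys is an assumption of this example.

```python
from typing import Dict, Iterable, List, Union


class BatchedVectors:  # hypothetical stand-in, not a spaCy class
    """Vector table with a key-to-row mapping and row-wise data."""

    def __init__(
        self, *, key2row: Dict[Union[int, str], int], data: List[List[float]]
    ):
        self.key2row = key2row
        self.data = data

    def get_batch(self, keys: Iterable[Union[int, str]]) -> List[List[float]]:
        # Resolve every key to its row index, then gather the rows in order.
        rows = [self.key2row[key] for key in keys]
        return [self.data[row] for row in rows]


vectors = BatchedVectors(
    key2row={"cat": 0, "dog": 1},
    data=[[1.0, 0.0], [0.0, 1.0]],
)
batch = vectors.get_batch(["dog", "cat", "dog"])
```

With array-backed data, the gather step would typically be a single indexing
operation on the array instead of a Python loop.
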
## BaseVectors.to_ops {id="to_ops",tag="method"}

Dummy method. Implement this to change the embedding matrix to use different
Thinc ops.

| Name  | Description                                              |
| ----- | -------------------------------------------------------- |
| `ops` | The Thinc ops to switch the embedding matrix to. ~~Ops~~ |

## BaseVectors.to_disk {id="to_disk",tag="method"}

Dummy method to allow serialization. Implement to save vector data with the
pipeline.

| Name   | Description                                                                                                                                |
| ------ | ------------------------------------------------------------------------------------------------------------------------------------------ |
| `path` | A path to a directory, which will be created if it doesn't exist. Paths may be either strings or `Path`-like objects. ~~Union[str, Path]~~ |

## BaseVectors.from_disk {id="from_disk",tag="method"}

Dummy method to allow serialization. Implement to load vector data from a saved
pipeline.

| Name        | Description                                                                                     |
| ----------- | ----------------------------------------------------------------------------------------------- |
| `path`      | A path to a directory. Paths may be either strings or `Path`-like objects. ~~Union[str, Path]~~ |
| **RETURNS** | The modified vectors object. ~~BaseVectors~~                                                    |

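A custom implementation that keeps its own data could fill in `to_disk` and
`from_disk` along the following lines. This is only a sketch under stated
assumptions: `DiskVectors`, the `table` attribute, the `vectors.json` file name
and the JSON format are all hypothetical, string keys are assumed, and the class
is a plain-Python stand-in that does not import spaCy.

```python
import json
import tempfile
from pathlib import Path
from typing import Dict, List, Optional, Union


class DiskVectors:  # hypothetical stand-in, not a spaCy class
    """Vector table that saves itself to a directory and loads from it."""

    def __init__(self, *, table: Optional[Dict[str, List[float]]] = None):
        self.table = dict(table or {})

    def to_disk(self, path: Union[str, Path]) -> None:
        path = Path(path)
        path.mkdir(parents=True, exist_ok=True)  # create the directory if needed
        with (path / "vectors.json").open("w", encoding="utf8") as file_:
            json.dump(self.table, file_)

    def from_disk(self, path: Union[str, Path]) -> "DiskVectors":
        with (Path(path) / "vectors.json").open(encoding="utf8") as file_:
            self.table = json.load(file_)
        return self  # return the modified object


with tempfile.TemporaryDirectory() as tmp_dir:
    target = Path(tmp_dir) / "vectors"
    DiskVectors(table={"cat": [1.0, 0.0]}).to_disk(target)
    restored = DiskVectors().from_disk(target)
```
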
## BaseVectors.to_bytes {id="to_bytes",tag="method"}

Dummy method to allow serialization. Implement to serialize vector data to a
binary string.

| Name        | Description                                          |
| ----------- | ---------------------------------------------------- |
| **RETURNS** | The serialized form of the vectors object. ~~bytes~~ |

## BaseVectors.from_bytes {id="from_bytes",tag="method"}

Dummy method to allow serialization. Implement to load vector data from a binary
string.

| Name        | Description                         |
| ----------- | ----------------------------------- |
| `data`      | The data to load from. ~~bytes~~    |
| **RETURNS** | The vectors object. ~~BaseVectors~~ |

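The `to_bytes` and `from_bytes` pair is expected to round-trip, which the sketch
below demonstrates. `BytesVectors`, the `table` attribute and the JSON encoding
are assumptions chosen for brevity, string keys are assumed, and the class is a
plain-Python stand-in that does not import spaCy.

```python
import json
from typing import Dict, List, Optional


class BytesVectors:  # hypothetical stand-in, not a spaCy class
    """Vector table that serializes itself to and from a binary string."""

    def __init__(self, *, table: Optional[Dict[str, List[float]]] = None):
        self.table = dict(table or {})

    def to_bytes(self) -> bytes:
        return json.dumps(self.table, sort_keys=True).encode("utf8")

    def from_bytes(self, data: bytes) -> "BytesVectors":
        self.table = json.loads(data.decode("utf8"))
        return self  # return the vectors object itself


original = BytesVectors(table={"cat": [1.0, 0.0], "dog": [0.0, 1.0]})
payload = original.to_bytes()
restored = BytesVectors().from_bytes(payload)
```
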