spaCy/website/docs/api/kb.mdx

---
title: KnowledgeBase
teaser:
  A storage class for entities and aliases of a specific knowledge base
  (ontology)
tag: class
source: spacy/kb/kb.pyx
version: 2.2
---

The `KnowledgeBase` object is an abstract class providing a method to generate
[`Candidate`](/api/kb#candidate) objects, which are plausible external
identifiers given a certain textual mention. Each such `Candidate` holds
information from the relevant KB entities, such as its frequency in text and
possible aliases. Each entity in the knowledge base also has a pretrained entity
vector of a fixed size.

Beyond that, `KnowledgeBase` classes have to implement a number of utility
functions called by the [`EntityLinker`](/api/entitylinker) component.

<Infobox variant="warning">

This class was not abstract up to spaCy version 3.5. The `KnowledgeBase`
implementation up to that point is available as
[`InMemoryLookupKB`](/api/inmemorylookupkb) from 3.5 onwards.

</Infobox>

## KnowledgeBase.\_\_init\_\_ {id="init",tag="method"}

`KnowledgeBase` is an abstract class and cannot be instantiated. Its child
classes should call `__init__()` to set up some necessary attributes.

> #### Example
>
> ```python
> from spacy.kb import KnowledgeBase
> from spacy.vocab import Vocab
>
> class FullyImplementedKB(KnowledgeBase):
>   def __init__(self, vocab: Vocab, entity_vector_length: int):
>       super().__init__(vocab, entity_vector_length)
>       ...
> vocab = nlp.vocab
> kb = FullyImplementedKB(vocab=vocab, entity_vector_length=64)
> ```

| Name                   | Description                                      |
| ---------------------- | ------------------------------------------------ |
| `vocab`                | The shared vocabulary. ~~Vocab~~                 |
| `entity_vector_length` | Length of the fixed-size entity vectors. ~~int~~ |

## KnowledgeBase.entity_vector_length {id="entity_vector_length",tag="property"}

The length of the fixed-size entity vectors in the knowledge base.

| Name        | Description                                      |
| ----------- | ------------------------------------------------ |
| **RETURNS** | Length of the fixed-size entity vectors. ~~int~~ |

## KnowledgeBase.get_candidates {id="get_candidates",tag="method"}

Given textual mentions for an arbitrary number of documents as input, retrieve a
list of candidate entities of type [`Candidate`](/api/kb#candidate) for each
mention. The [`EntityLinker`](/api/entitylinker) component passes a generator
that yields mentions as [`SpanGroup`](/api/spangroup))s per document.
The decision of how to batch
candidate retrieval lookups over multiple documents is left up to the
implementation of `KnowledgeBase.get_candidates()`.

> #### Example
>
> ```python
> from spacy.lang.en import English
> from spacy.tokens import SpanGroup
> nlp = English()
> doc = nlp("Douglas Adams wrote 'The Hitchhiker's Guide to the Galaxy'.")
> candidates = kb.get_candidates([SpanGroup(doc, spans=[doc[0:2], doc[3:]]])
> ```

| Name        | Description                                                                                                                                                                          |
| ----------- | ------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------ |
| `mentions`  | The textual mentions or aliases (one `SpanGroup` per `Doc` instance). ~~Iterator[SpanGroup]~~                                                                                        |
| **RETURNS** | An iterator (per document) over iterables (per mention) of iterables (per candidate for this mention) with relevant `Candidate` objects. ~~Iterator[Iterable[Iterable[Candidate]]]~~ |

## KnowledgeBase.get_vector {id="get_vector",tag="method"}

Given a certain entity ID, retrieve its pretrained entity vector.

> #### Example
>
> ```python
> vector = kb.get_vector("Q42")
> ```

| Name        | Description                            |
| ----------- | -------------------------------------- |
| `entity`    | The entity ID. ~~str~~                 |
| **RETURNS** | The entity vector. ~~Iterable[float]~~ |

## KnowledgeBase.get_vectors {id="get_vectors",tag="method"}

Same as [`get_vector()`](/api/kb#get_vector), but for an arbitrary number of
entity IDs.

The default implementation of `get_vectors()` executes `get_vector()` in a loop.
We recommend implementing a more efficient way to retrieve vectors for multiple
entities at once, if performance is of concern to you.

> #### Example
>
> ```python
> vectors = kb.get_vectors(("Q42", "Q3107329"))
> ```

| Name        | Description                                               |
| ----------- | --------------------------------------------------------- |
| `entities`  | The entity IDs. ~~Iterable[str]~~                         |
| **RETURNS** | The entity vectors. ~~Iterable[Iterable[numpy.ndarray]]~~ |

## KnowledgeBase.to_disk {id="to_disk",tag="method"}

Save the current state of the knowledge base to a directory.

> #### Example
>
> ```python
> kb.to_disk(path)
> ```

| Name      | Description                                                                                                                                |
| --------- | ------------------------------------------------------------------------------------------------------------------------------------------ |
| `path`    | A path to a directory, which will be created if it doesn't exist. Paths may be either strings or `Path`-like objects. ~~Union[str, Path]~~ |
| `exclude` | List of components to exclude. ~~Iterable[str]~~                                                                                           |

## KnowledgeBase.from_disk {id="from_disk",tag="method"}

Restore the state of the knowledge base from a given directory. Note that the
[`Vocab`](/api/vocab) should also be the same as the one used to create the KB.

> #### Example
>
> ```python
> from spacy.vocab import Vocab
> vocab = Vocab().from_disk("/path/to/vocab")
> kb = FullyImplementedKB(vocab=vocab, entity_vector_length=64)
> kb.from_disk("/path/to/kb")
> ```

| Name        | Description                                                                                     |
| ----------- | ----------------------------------------------------------------------------------------------- |
| `loc`       | A path to a directory. Paths may be either strings or `Path`-like objects. ~~Union[str, Path]~~ |
| `exclude`   | List of components to exclude. ~~Iterable[str]~~                                                |
| **RETURNS** | The modified `KnowledgeBase` object. ~~KnowledgeBase~~                                          |

## InMemoryCandidate {id="candidate",tag="class"}

An `InMemoryCandidate` object refers to a textual mention (alias) that may or
may not be resolved to a specific entity from a `KnowledgeBase`. This will be
used as input for the entity linking algorithm which will disambiguate the
various candidates to the correct one. Each candidate `(alias, entity)` pair is
assigned to a certain prior probability.

### InMemoryCandidate.\_\_init\_\_ {id="candidate-init",tag="method"}

Construct an `InMemoryCandidate` object. Usually this constructor is not called
directly, but instead these objects are returned by the `get_candidates` method
of the [`entity_linker`](/api/entitylinker) pipe.

> #### Example
>
> ```python
> from spacy.kb import InMemoryCandidate candidate = InMemoryCandidate(kb,
> entity_hash, entity_freq, entity_vector, alias_hash, prior_prob)
> ```

| Name          | Description                                                               |
| ------------- | ------------------------------------------------------------------------- |
| `kb`          | The knowledge base that defined this candidate. ~~KnowledgeBase~~         |
| `entity_hash` | The hash of the entity's KB ID. ~~int~~                                   |
| `entity_freq` | The entity frequency as recorded in the KB. ~~float~~                     |
| `alias_hash`  | The hash of the entity alias. ~~int~~                                     |
| `prior_prob`  | The prior probability of the `alias` referring to the `entity`. ~~float~~ |

## InMemoryCandidate attributes {id="candidate-attributes"}

| Name            | Description                                                              |
| --------------- | ------------------------------------------------------------------------ |
| `entity`        | The entity's unique KB identifier. ~~int~~                               |
| `entity_`       | The entity's unique KB identifier. ~~str~~                               |
| `alias`         | The alias or textual mention. ~~int~~                                    |
| `alias_`        | The alias or textual mention. ~~str~~                                    |
| `prior_prob`    | The prior probability of the `alias` referring to the `entity`. ~~long~~ |
| `entity_freq`   | The frequency of the entity in a typical corpus. ~~long~~                |
| `entity_vector` | The pretrained vector of the entity. ~~numpy.ndarray~~                   |