spaCy/website/docs/api/kb.md
Sofie Van Landeghem 358cbb21e3
Define candidate generator in EL config (#5876)
2020-08-18 16:10:36 +02:00


title: KnowledgeBase
teaser: A storage class for entities and aliases of a specific knowledge base (ontology)
tag: class
source: spacy/kb.pyx
new: 2.2

The KnowledgeBase object provides a method to generate Candidate objects, which are plausible external identifiers for a given textual mention. Each Candidate holds information from the relevant KB entity, such as its frequency in text and its possible aliases. Each entity in the knowledge base also has a pretrained entity vector of a fixed size.

KnowledgeBase.__init__

Create the knowledge base.

Example

from spacy.kb import KnowledgeBase
vocab = nlp.vocab
kb = KnowledgeBase(vocab=vocab, entity_vector_length=64)
Name Description
vocab The shared vocabulary. Vocab
entity_vector_length Length of the fixed-size entity vectors. int

KnowledgeBase.entity_vector_length

The length of the fixed-size entity vectors in the knowledge base.

Name Description
RETURNS Length of the fixed-size entity vectors. int

KnowledgeBase.add_entity

Add an entity to the knowledge base, specifying its corpus frequency and entity vector, which should be of length entity_vector_length.

Example

kb.add_entity(entity="Q42", freq=32, entity_vector=vector1)
kb.add_entity(entity="Q463035", freq=111, entity_vector=vector2)
Name Description
entity The unique entity identifier. str
freq The frequency of the entity in a typical corpus. float
entity_vector The pretrained vector of the entity. numpy.ndarray
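
The length requirement can be checked before an entity is added. The following is a minimal sketch in plain Python and does not call spaCy; `check_entity_vector` is a hypothetical helper, and the value 64 is assumed to match the `entity_vector_length` passed to the constructor in the example above.

```python
from typing import Sequence

ENTITY_VECTOR_LENGTH = 64  # assumption: same value as passed to KnowledgeBase


def check_entity_vector(
    vector: Sequence[float], expected_length: int = ENTITY_VECTOR_LENGTH
) -> None:
    """Raise ValueError if the vector does not have the expected length."""
    if len(vector) != expected_length:
        raise ValueError(
            f"entity vector has length {len(vector)}, expected {expected_length}"
        )


# Placeholder values only, not a real pretrained vector.
vector1 = [0.0] * ENTITY_VECTOR_LENGTH
check_entity_vector(vector1)  # passes silently
```

Running a check like this up front gives a clearer error message than discovering a mismatch while the knowledge base is being populated.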

KnowledgeBase.set_entities

Define the full list of entities in the knowledge base, specifying the corpus frequency and entity vector for each entity.

Example

kb.set_entities(entity_list=["Q42", "Q463035"], freq_list=[32, 111], vector_list=[vector1, vector2])
Name Description
entity_list List of unique entity identifiers. Iterable[Union[str, int]]
freq_list List of entity frequencies. Iterable[int]
vector_list List of entity vectors. Iterable[numpy.ndarray]

KnowledgeBase.add_alias

Add an alias or mention to the knowledge base, specifying its potential KB identifiers and their prior probabilities. The entity identifiers should refer to entities previously added with add_entity or set_entities. The sum of the prior probabilities should not exceed 1.

Example

kb.add_alias(alias="Douglas", entities=["Q42", "Q463035"], probabilities=[0.6, 0.3])
Name Description
alias The textual mention or alias. str
entities The potential entities that the alias may refer to. Iterable[Union[str, int]]
probabilities The prior probabilities of each entity. Iterable[float]
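
One common way to obtain prior probabilities is to count how often an alias links to each entity in an annotated corpus. The sketch below is plain Python, not part of spaCy; `priors_from_counts` and the counts are hypothetical. It also shows why the probabilities may sum to less than 1: some mentions of the alias may refer to something outside the knowledge base.

```python
from typing import Dict, List, Tuple


def priors_from_counts(
    counts: Dict[str, int], total_mentions: int
) -> Tuple[List[str], List[float]]:
    """Turn per-entity link counts for one alias into (entities, probabilities)."""
    if total_mentions < sum(counts.values()):
        raise ValueError("total_mentions is smaller than the number of linked mentions")
    # Most frequent entity first, so the two lists stay aligned.
    entities = sorted(counts, key=lambda entity: counts[entity], reverse=True)
    probabilities = [counts[entity] / total_mentions for entity in entities]
    return entities, probabilities


# Hypothetical counts: "Douglas" occurs 10 times; 6 mentions link to Q42,
# 3 to Q463035, and 1 to something that is not in the knowledge base.
entities, probabilities = priors_from_counts({"Q42": 6, "Q463035": 3}, total_mentions=10)
```

The two returned lists have the same length and order, which is the shape the `entities` and `probabilities` arguments of `add_alias` expect.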

KnowledgeBase.__len__

Get the total number of entities in the knowledge base.

Example

total_entities = len(kb)
Name Description
RETURNS The number of entities in the knowledge base. int

KnowledgeBase.get_entity_strings

Get a list of all entity IDs in the knowledge base.

Example

all_entities = kb.get_entity_strings()
Name Description
RETURNS The list of entities in the knowledge base. List[str]

KnowledgeBase.get_size_aliases

Get the total number of aliases in the knowledge base.

Example

total_aliases = kb.get_size_aliases()
Name Description
RETURNS The number of aliases in the knowledge base. int

KnowledgeBase.get_alias_strings

Get a list of all aliases in the knowledge base.

Example

all_aliases = kb.get_alias_strings()
Name Description
RETURNS The list of aliases in the knowledge base. List[str]

KnowledgeBase.get_candidates

Given a certain textual mention as input, retrieve a list of candidate entities of type Candidate.

Example

candidates = kb.get_candidates("Douglas")
Name Description
alias The textual mention or alias. str
RETURNS The relevant Candidate objects. iterable
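
To make the lookup concrete, here is a toy model of candidate generation in plain Python. It is not spaCy's implementation: `ToyCandidate`, `ALIAS_TABLE` and `toy_get_candidates` are hypothetical names, the data is copied from the `add_alias` example, and returning an empty list for an unknown alias is an assumption of this sketch.

```python
from typing import Dict, List, NamedTuple


class ToyCandidate(NamedTuple):
    """Simplified stand-in for a candidate: one (alias, entity) pair."""

    entity_: str
    alias_: str
    prior_prob: float


# alias -> {entity ID -> prior probability}
ALIAS_TABLE: Dict[str, Dict[str, float]] = {
    "Douglas": {"Q42": 0.6, "Q463035": 0.3},
}


def toy_get_candidates(alias: str) -> List[ToyCandidate]:
    """Return one candidate per entity registered for the alias."""
    return [
        ToyCandidate(entity_=entity, alias_=alias, prior_prob=prob)
        for entity, prob in ALIAS_TABLE.get(alias, {}).items()
    ]


candidates = toy_get_candidates("Douglas")  # two candidates, one per entity
unknown = toy_get_candidates("Arthur")      # no entry in the table
```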

KnowledgeBase.get_vector

Given a certain entity ID, retrieve its pretrained entity vector.

Example

vector = kb.get_vector("Q42")
Name Description
entity The entity ID. str
RETURNS The entity vector. numpy.ndarray

KnowledgeBase.get_prior_prob

Given an entity ID and a textual mention, retrieve the prior probability that the mention links to that entity.

Example

probability = kb.get_prior_prob("Q42", "Douglas")
Name Description
entity The entity ID. str
alias The textual mention or alias. str
RETURNS The prior probability of the alias referring to the entity. float

KnowledgeBase.to_disk

Save the current state of the knowledge base to a directory.

Example

kb.to_disk(loc)
Name Description
loc A path to a directory, which will be created if it doesn't exist. Paths may be either strings or Path-like objects. Union[str, Path]

KnowledgeBase.from_disk

Restore the state of the knowledge base from a given directory. Note that the Vocab should also be the same as the one used to create the KB.

Example

from spacy.kb import KnowledgeBase
from spacy.vocab import Vocab
vocab = Vocab().from_disk("/path/to/vocab")
kb = KnowledgeBase(vocab=vocab, entity_vector_length=64)
kb.from_disk("/path/to/kb")
Name Description
loc A path to a directory. Paths may be either strings or Path-like objects. Union[str, Path]
RETURNS The modified KnowledgeBase object. KnowledgeBase

Candidate

A Candidate object refers to a textual mention (alias) that may or may not be resolved to a specific entity from a KnowledgeBase. Candidates are the input to the entity linking algorithm, which disambiguates between them to select the correct one. Each candidate (alias, entity) pair is assigned a prior probability.

Candidate.__init__

Construct a Candidate object. Usually this constructor is not called directly, but instead these objects are returned by the get_candidates method of a KnowledgeBase.

Example

from spacy.kb import Candidate
candidate = Candidate(kb, entity_hash, entity_freq, entity_vector, alias_hash, prior_prob)
Name Description
kb The knowledge base that defined this candidate. KnowledgeBase
entity_hash The hash of the entity's KB ID. int
entity_freq The entity frequency as recorded in the KB. float
entity_vector The pretrained vector of the entity. numpy.ndarray
alias_hash The hash of the textual mention or alias. int
prior_prob The prior probability of the alias referring to the entity. float

Candidate attributes

Name Description
entity The hash of the entity's unique KB identifier. int
entity_ The entity's unique KB identifier. str
alias The hash of the alias or textual mention. int
alias_ The alias or textual mention. str
prior_prob The prior probability of the alias referring to the entity. float
entity_freq The frequency of the entity in a typical corpus. float
entity_vector The pretrained vector of the entity. numpy.ndarray
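
As an illustration of how these attributes can feed a disambiguation step, the sketch below picks the entity with the highest prior probability. This is a simple baseline in plain Python, not the entity linking algorithm itself; `PriorOnlyCandidate` and `most_likely_entity` are hypothetical names, and the candidate values are made up to match the earlier examples.

```python
from typing import NamedTuple, Optional, Sequence


class PriorOnlyCandidate(NamedTuple):
    """Stand-in exposing only the two attributes this baseline needs."""

    entity_: str
    prior_prob: float


def most_likely_entity(
    candidates: Sequence[PriorOnlyCandidate], threshold: float = 0.0
) -> Optional[str]:
    """Return the entity ID with the highest prior above threshold, else None."""
    if not candidates:
        return None
    best = max(candidates, key=lambda candidate: candidate.prior_prob)
    return best.entity_ if best.prior_prob > threshold else None


toy_candidates = [
    PriorOnlyCandidate(entity_="Q42", prior_prob=0.6),
    PriorOnlyCandidate(entity_="Q463035", prior_prob=0.3),
]
best_entity = most_likely_entity(toy_candidates)
```

A prior-only baseline ignores the sentence context entirely, which is why a trained entity linker would typically also take the surrounding text and the entity vectors into account.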