spaCy/website/docs/api/kb.md
2019-10-02 10:37:39 +02:00

11 KiB

title teaser tag source new
KnowledgeBase A storage class for entities and aliases of a specific knowledge base (ontology) class spacy/kb.pyx 2.2

The KnowledgeBase object provides a method to generate Candidate objects, which are plausible external identifiers given a certain textual mention. Each such Candidate holds information from the relevant KB entities, such as its frequency in text and possible aliases. Each entity in the knowledge base also has a pretrained entity vector of a fixed size.

KnowledgeBase.__init__

Create the knowledge base.

Example

from spacy.kb import KnowledgeBase
vocab = nlp.vocab
kb = KnowledgeBase(vocab=vocab, entity_vector_length=64)
Name Type Description
vocab Vocab A Vocab object.
entity_vector_length int Length of the fixed-size entity vectors.
RETURNS KnowledgeBase The newly constructed object.

KnowledgeBase.entity_vector_length

The length of the fixed-size entity vectors in the knowledge base.

Name Type Description
RETURNS int Length of the fixed-size entity vectors.

KnowledgeBase.add_entity

Add an entity to the knowledge base, specifying its corpus frequency and entity vector, which should be of length entity_vector_length.

Example

kb.add_entity(entity="Q42", freq=32, entity_vector=vector1)
kb.add_entity(entity="Q463035", freq=111, entity_vector=vector2)
Name Type Description
entity unicode The unique entity identifier
freq float The frequency of the entity in a typical corpus
entity_vector vector The pretrained vector of the entity

KnowledgeBase.set_entities

Define the full list of entities in the knowledge base, specifying the corpus frequency and entity vector for each entity.

Example

kb.set_entities(entity_list=["Q42", "Q463035"], freq_list=[32, 111], vector_list=[vector1, vector2])
Name Type Description
entity_list iterable List of unique entity identifiers
freq_list iterable List of entity frequencies
vector_list iterable List of entity vectors

KnowledgeBase.add_alias

Add an alias or mention to the knowledge base, specifying its potential KB identifiers and their prior probabilities. The entity identifiers should refer to entities previously added with add_entity or set_entities. The sum of the prior probabilities should not exceed 1.

Example

kb.add_alias(alias="Douglas", entities=["Q42", "Q463035"], probabilities=[0.6, 0.3])
Name Type Description
alias unicode The textual mention or alias
entities iterable The potential entities that the alias may refer to
probabilities iterable The prior probabilities of each entity

KnowledgeBase.__len__

Get the total number of entities in the knowledge base.

Example

total_entities = len(kb)
Name Type Description
RETURNS int The number of entities in the knowledge base.

KnowledgeBase.get_entity_strings

Get a list of all entity IDs in the knowledge base.

Example

all_entities = kb.get_entity_strings()
Name Type Description
RETURNS list The list of entities in the knowledge base.

KnowledgeBase.get_size_aliases

Get the total number of aliases in the knowledge base.

Example

total_aliases = kb.get_size_aliases()
Name Type Description
RETURNS int The number of aliases in the knowledge base.

KnowledgeBase.get_alias_strings

Get a list of all aliases in the knowledge base.

Example

all_aliases = kb.get_alias_strings()
Name Type Description
RETURNS list The list of aliases in the knowledge base.

KnowledgeBase.get_candidates

Given a certain textual mention as input, retrieve a list of candidate entities of type Candidate.

Example

candidates = kb.get_candidates("Douglas")
Name Type Description
alias unicode The textual mention or alias
RETURNS iterable The list of relevant Candidate objects

KnowledgeBase.get_vector

Given a certain entity ID, retrieve its pretrained entity vector.

Example

vector = kb.get_vector("Q42")
Name Type Description
entity unicode The entity ID
RETURNS vector The entity vector

KnowledgeBase.get_prior_prob

Given a certain entity ID and a certain textual mention, retrieve the prior probability of the fact that the mention links to the entity ID.

Example

probability = kb.get_prior_prob("Q42", "Douglas")
Name Type Description
entity unicode The entity ID
alias unicode The textual mention or alias
RETURNS float The prior probability of the alias referring to the entity

KnowledgeBase.dump

Save the current state of the knowledge base to a directory.

Example

kb.dump(loc)
Name Type Description
loc unicode / Path A path to a directory, which will be created if it doesn't exist. Paths may be either strings or Path-like objects.

KnowledgeBase.load_bulk

Restore the state of the knowledge base from a given directory. Note that the Vocab should also be the same as the one used to create the KB.

Example

from spacy.kb import KnowledgeBase
from spacy.vocab import Vocab
vocab = Vocab().from_disk("/path/to/vocab")
kb = KnowledgeBase(vocab=vocab, entity_vector_length=64)
kb.load_bulk("/path/to/kb")
Name Type Description
loc unicode / Path A path to a directory. Paths may be either strings or Path-like objects.
RETURNS KnowledgeBase The modified KnowledgeBase object.

Candidate.__init__

Construct a Candidate object. Usually this constructor is not called directly, but instead these objects are returned by the get_candidates method of a KnowledgeBase.

Example

from spacy.kb import Candidate
candidate = Candidate(kb, entity_hash, entity_freq, entity_vector, alias_hash, prior_prob)
Name Type Description
kb KnowledgeBase The knowledge base that defined this candidate.
entity_hash int The hash of the entity's KB ID.
entity_freq float The entity frequency as recorded in the KB.
alias_hash int The hash of the textual mention or alias.
prior_prob float The prior probability of the alias referring to the entity
RETURNS Candidate The newly constructed object.

Candidate attributes

Name Type Description
entity int The entity's unique KB identifier
entity_ unicode The entity's unique KB identifier
alias int The alias or textual mention
alias_ unicode The alias or textual mention
prior_prob long The prior probability of the alias referring to the entity
entity_freq long The frequency of the entity in a typical corpus
entity_vector vector The pretrained vector of the entity