| title | teaser | tag | source | new |
| --- | --- | --- | --- | --- |
| KnowledgeBase | A storage class for entities and aliases of a specific knowledge base (ontology) | class | spacy/kb.pyx | 2.2 |
The KnowledgeBase object provides a method to generate Candidate objects, which
are plausible external identifiers given a certain textual mention. Each such
Candidate holds information from the relevant KB entity, such as its frequency
in text and possible aliases. Each entity in the knowledge base also has a
pretrained entity vector of a fixed size.
KnowledgeBase.__init__
Create the knowledge base.
Example
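import spacy
from spacy.kb import KnowledgeBase

nlp = spacy.blank("en")  # any Language object works; a blank pipeline is used for illustration
vocab = nlp.vocab
kb = KnowledgeBase(vocab=vocab, entity_vector_length=64)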
| Name | Type | Description |
| --- | --- | --- |
| vocab | Vocab | A Vocab object. |
| entity_vector_length | int | Length of the fixed-size entity vectors. |
KnowledgeBase.entity_vector_length
The length of the fixed-size entity vectors in the knowledge base.
| Name | Type | Description |
| --- | --- | --- |
| RETURNS | int | Length of the fixed-size entity vectors. |
KnowledgeBase.add_entity
Add an entity to the knowledge base, specifying its corpus frequency and entity
vector, which should be of length entity_vector_length.
Example
kb.add_entity(entity="Q42", freq=32, entity_vector=vector1)
kb.add_entity(entity="Q463035", freq=111, entity_vector=vector2)
| Name | Type | Description |
| --- | --- | --- |
| entity | str | The unique entity identifier. |
| freq | float | The frequency of the entity in a typical corpus. |
| entity_vector | vector | The pretrained vector of the entity. |
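Because each entity vector should have length entity_vector_length, it can help to check a vector before adding it. The following is a minimal sketch in plain Python; the helper name check_vector_length is hypothetical and not part of spaCy, and the 3-dimensional vector is toy data.

```python
def check_vector_length(entity_vector, entity_vector_length):
    """Return the vector as a list of floats, or raise if its length is wrong.

    Hypothetical helper for illustration; not part of spaCy.
    """
    vector = [float(x) for x in entity_vector]
    if len(vector) != entity_vector_length:
        raise ValueError(
            "Expected a vector of length {}, got {}".format(
                entity_vector_length, len(vector)
            )
        )
    return vector


# A toy 3-dimensional vector, for illustration only
vector1 = check_vector_length([1, 2, 3], entity_vector_length=3)
```

The checked vector could then be passed as entity_vector to add_entity on a knowledge base created with the same entity_vector_length.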
KnowledgeBase.set_entities
Define the full list of entities in the knowledge base, specifying the corpus
frequency and entity vector for each entity.
Example
kb.set_entities(entity_list=["Q42", "Q463035"], freq_list=[32, 111], vector_list=[vector1, vector2])
| Name | Type | Description |
| --- | --- | --- |
| entity_list | iterable | List of unique entity identifiers. |
| freq_list | iterable | List of entity frequencies. |
| vector_list | iterable | List of entity vectors. |
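The three arguments are parallel lists, so the item at a given position in each list describes the same entity. If entity data is stored as one record per entity, it can be split into aligned lists first. This is a minimal sketch; to_parallel_lists is a hypothetical helper, not part of spaCy, and the records are toy data.

```python
def to_parallel_lists(records):
    """Split (entity_id, freq, vector) records into three aligned lists.

    Hypothetical helper for illustration; not part of spaCy.
    """
    entity_list, freq_list, vector_list = [], [], []
    for entity_id, freq, vector in records:
        entity_list.append(entity_id)
        freq_list.append(freq)
        vector_list.append(vector)
    return entity_list, freq_list, vector_list


# Toy records with 3-dimensional vectors, for illustration only
records = [
    ("Q42", 32, [1.0, 2.0, 3.0]),
    ("Q463035", 111, [-2.0, 4.0, 0.5]),
]
entity_list, freq_list, vector_list = to_parallel_lists(records)
```

The resulting lists match the entity_list, freq_list and vector_list parameters described above.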
KnowledgeBase.add_alias
Add an alias or mention to the knowledge base, specifying its potential KB
identifiers and their prior probabilities. The entity identifiers should refer
to entities previously added with add_entity or set_entities. The sum of the
prior probabilities should not exceed 1.
Example
kb.add_alias(alias="Douglas", entities=["Q42", "Q463035"], probabilities=[0.6, 0.3])
| Name | Type | Description |
| --- | --- | --- |
| alias | str | The textual mention or alias. |
| entities | iterable | The potential entities that the alias may refer to. |
| probabilities | iterable | The prior probabilities of each entity. |
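The constraints on the entities and probabilities arguments can be checked before calling add_alias. The sketch below is plain Python and does not reflect how spaCy validates its input; check_alias_priors is a hypothetical helper, and the small floating-point tolerance is an assumption.

```python
def check_alias_priors(entities, probabilities):
    """Validate inputs intended for add_alias and return the summed prior.

    Hypothetical helper for illustration; not part of spaCy.
    """
    entities = list(entities)
    probabilities = list(probabilities)
    if len(entities) != len(probabilities):
        raise ValueError("entities and probabilities must have the same length")
    total = sum(probabilities)
    # Small tolerance (an assumption) so rounding error does not reject a sum of 1
    if total > 1.0 + 1e-8:
        raise ValueError(
            "prior probabilities sum to {}, which exceeds 1".format(total)
        )
    return total


total = check_alias_priors(["Q42", "Q463035"], [0.6, 0.3])
```

The sum may be less than 1, as in this example, which leaves some probability mass for entities that are not in the knowledge base.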
KnowledgeBase.__len__
Get the total number of entities in the knowledge base.
Example
total_entities = len(kb)
| Name | Type | Description |
| --- | --- | --- |
| RETURNS | int | The number of entities in the knowledge base. |
KnowledgeBase.get_entity_strings
Get a list of all entity IDs in the knowledge base.
Example
all_entities = kb.get_entity_strings()
| Name | Type | Description |
| --- | --- | --- |
| RETURNS | list | The list of entities in the knowledge base. |
KnowledgeBase.get_size_aliases
Get the total number of aliases in the knowledge base.
Example
total_aliases = kb.get_size_aliases()
| Name | Type | Description |
| --- | --- | --- |
| RETURNS | int | The number of aliases in the knowledge base. |
KnowledgeBase.get_alias_strings
Get a list of all aliases in the knowledge base.
Example
all_aliases = kb.get_alias_strings()
| Name | Type | Description |
| --- | --- | --- |
| RETURNS | list | The list of aliases in the knowledge base. |
KnowledgeBase.get_candidates
Given a certain textual mention as input, retrieve a list of candidate entities
of type Candidate.
Example
candidates = kb.get_candidates("Douglas")
| Name | Type | Description |
| --- | --- | --- |
| alias | str | The textual mention or alias. |
| RETURNS | iterable | The list of relevant Candidate objects. |
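A common follow-up step is to pick the candidate with the highest prior probability. The sketch below uses a namedtuple as a stand-in for Candidate so that it runs without a knowledge base; FakeCandidate and best_candidate are hypothetical names, while entity_ and prior_prob are documented Candidate attributes.

```python
from collections import namedtuple

# Stand-in for spaCy's Candidate, exposing only two of its documented attributes.
# With a real knowledge base, the list would come from kb.get_candidates(...).
FakeCandidate = namedtuple("FakeCandidate", ["entity_", "prior_prob"])


def best_candidate(candidates):
    """Return the candidate with the highest prior probability, or None."""
    candidates = list(candidates)
    if not candidates:
        return None
    return max(candidates, key=lambda c: c.prior_prob)


candidates = [FakeCandidate("Q42", 0.6), FakeCandidate("Q463035", 0.3)]
best = best_candidate(candidates)
print(best.entity_)  # prints "Q42"
```

Selecting by prior probability alone ignores the context of the mention, so this is only a baseline.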
KnowledgeBase.get_vector
Given a certain entity ID, retrieve its pretrained entity vector.
Example
vector = kb.get_vector("Q42")
| Name | Type | Description |
| --- | --- | --- |
| entity | str | The entity ID. |
| RETURNS | vector | The entity vector. |
KnowledgeBase.get_prior_prob
Given a certain entity ID and a certain textual mention, retrieve the prior
probability that the mention links to the entity ID.
Example
probability = kb.get_prior_prob("Q42", "Douglas")
| Name | Type | Description |
| --- | --- | --- |
| entity | str | The entity ID. |
| alias | str | The textual mention or alias. |
| RETURNS | float | The prior probability of the alias referring to the entity. |
KnowledgeBase.dump
Save the current state of the knowledge base to a directory.
Example
kb.dump(loc)
| Name | Type | Description |
| --- | --- | --- |
| loc | str / Path | A path to a directory, which will be created if it doesn't exist. Paths may be either strings or Path-like objects. |
KnowledgeBase.load_bulk
Restore the state of the knowledge base from a given directory. Note that the
Vocab should also be the same as the one used to create the KB.
Example
from spacy.kb import KnowledgeBase
from spacy.vocab import Vocab
vocab = Vocab().from_disk("/path/to/vocab")
kb = KnowledgeBase(vocab=vocab, entity_vector_length=64)
kb.load_bulk("/path/to/kb")
| Name | Type | Description |
| --- | --- | --- |
| loc | str / Path | A path to a directory. Paths may be either strings or Path-like objects. |
| RETURNS | KnowledgeBase | The modified KnowledgeBase object. |
Candidate.__init__
Construct a Candidate object. Usually this constructor is not called directly;
instead, these objects are returned by the get_candidates method of a
KnowledgeBase.
Example
from spacy.kb import Candidate
candidate = Candidate(kb, entity_hash, entity_freq, entity_vector, alias_hash, prior_prob)
| Name | Type | Description |
| --- | --- | --- |
| kb | KnowledgeBase | The knowledge base that defined this candidate. |
| entity_hash | int | The hash of the entity's KB ID. |
| entity_freq | float | The entity frequency as recorded in the KB. |
| entity_vector | vector | The pretrained vector of the entity. |
| alias_hash | int | The hash of the textual mention or alias. |
| prior_prob | float | The prior probability of the alias referring to the entity. |
Candidate attributes
| Name | Type | Description |
| --- | --- | --- |
| entity | int | The hash of the entity's unique KB identifier. |
| entity_ | str | The entity's unique KB identifier. |
| alias | int | The hash of the alias or textual mention. |
| alias_ | str | The alias or textual mention. |
| prior_prob | float | The prior probability of the alias referring to the entity. |
| entity_freq | float | The frequency of the entity in a typical corpus. |
| entity_vector | vector | The pretrained vector of the entity. |