title: KnowledgeBase
teaser: A storage class for entities and aliases of a specific knowledge base (ontology)
tag: class
source: spacy/kb.pyx
new: 2.2
The KnowledgeBase object provides a method to generate Candidate objects, which
are plausible external identifiers given a certain textual mention. Each such
Candidate holds information from the relevant KB entity, such as its frequency
in text and possible aliases. Each entity in the knowledge base also has a
pretrained entity vector of a fixed size.
KnowledgeBase.__init__
Create the knowledge base.
Example
from spacy.kb import KnowledgeBase
vocab = nlp.vocab  # assumes nlp is an already loaded Language object
kb = KnowledgeBase(vocab=vocab, entity_vector_length=64)
| Name | Type | Description |
| --- | --- | --- |
| vocab | Vocab | A Vocab object. |
| entity_vector_length | int | Length of the fixed-size entity vectors. |
KnowledgeBase.entity_vector_length
The length of the fixed-size entity vectors in the knowledge base.
| Name | Type | Description |
| --- | --- | --- |
| RETURNS | int | Length of the fixed-size entity vectors. |
KnowledgeBase.add_entity
Add an entity to the knowledge base, specifying its corpus frequency and entity
vector, which should be of length entity_vector_length.
Example
kb.add_entity(entity="Q42", freq=32, entity_vector=vector1)
kb.add_entity(entity="Q463035", freq=111, entity_vector=vector2)
| Name | Type | Description |
| --- | --- | --- |
| entity | str | The unique entity identifier |
| freq | float | The frequency of the entity in a typical corpus |
| entity_vector | vector | The pretrained vector of the entity |
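Because the entity vector is expected to match entity_vector_length, a small
pre-check can catch mismatched vectors before they reach add_entity. The sketch
below is illustrative only: check_entity_vector is a hypothetical helper, not
part of spaCy, and it assumes vectors are plain Python sequences of numbers.

```python
from typing import Sequence


def check_entity_vector(vector: Sequence[float], expected_length: int) -> Sequence[float]:
    """Return the vector unchanged if it has the expected length, else raise.

    Hypothetical helper (not part of spaCy); expected_length would normally
    be kb.entity_vector_length.
    """
    if len(vector) != expected_length:
        raise ValueError(
            f"Entity vector has length {len(vector)}, expected {expected_length}"
        )
    return vector


# Usage with a stand-in length of 3 instead of a real KnowledgeBase:
vector1 = check_entity_vector([0.1, 0.2, 0.3], expected_length=3)
```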
KnowledgeBase.set_entities
Define the full list of entities in the knowledge base, specifying the corpus
frequency and entity vector for each entity.
Example
kb.set_entities(entity_list=["Q42", "Q463035"], freq_list=[32, 111], vector_list=[vector1, vector2])
| Name | Type | Description |
| --- | --- | --- |
| entity_list | iterable | List of unique entity identifiers |
| freq_list | iterable | List of entity frequencies |
| vector_list | iterable | List of entity vectors |
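set_entities takes three parallel lists, whereas entity data is often kept as
one record per entity. The following sketch splits such records into the three
lists. It assumes records are (identifier, frequency, vector) tuples, and
split_entity_records is a hypothetical helper name, not a spaCy function.

```python
from typing import List, Sequence, Tuple

EntityRecord = Tuple[str, float, Sequence[float]]


def split_entity_records(
    records: Sequence[EntityRecord],
) -> Tuple[List[str], List[float], List[Sequence[float]]]:
    """Split (entity, freq, vector) records into three parallel lists.

    Hypothetical helper (not part of spaCy). Raises ValueError on duplicate
    entity identifiers, since the identifiers are meant to be unique.
    """
    entity_list: List[str] = []
    freq_list: List[float] = []
    vector_list: List[Sequence[float]] = []
    seen = set()
    for entity, freq, vector in records:
        if entity in seen:
            raise ValueError(f"Duplicate entity identifier: {entity}")
        seen.add(entity)
        entity_list.append(entity)
        freq_list.append(freq)
        vector_list.append(vector)
    return entity_list, freq_list, vector_list


records = [("Q42", 32, [0.1, 0.2, 0.3]), ("Q463035", 111, [0.4, 0.5, 0.6])]
entity_list, freq_list, vector_list = split_entity_records(records)
# These lists can then be passed as entity_list, freq_list and vector_list.
```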
KnowledgeBase.add_alias
Add an alias or mention to the knowledge base, specifying its potential KB
identifiers and their prior probabilities. The entity identifiers should refer
to entities previously added with add_entity or set_entities. The sum of the
prior probabilities should not exceed 1.
Example
kb.add_alias(alias="Douglas", entities=["Q42", "Q463035"], probabilities=[0.6, 0.3])
| Name | Type | Description |
| --- | --- | --- |
| alias | str | The textual mention or alias |
| entities | iterable | The potential entities that the alias may refer to |
| probabilities | iterable | The prior probabilities of each entity |
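The constraints on add_alias (one prior per entity, priors summing to at most 1)
can be checked before the call. This is a minimal sketch with a hypothetical
helper, check_alias_priors, which is not part of spaCy. The small tolerance for
floating-point rounding is an assumption of this example, not something the
library documents.

```python
import math
from typing import Sequence


def check_alias_priors(
    entities: Sequence[str], probabilities: Sequence[float], tol: float = 1e-9
) -> None:
    """Raise ValueError if the priors for an alias are inconsistent.

    Hypothetical helper (not part of spaCy). tol is an assumed tolerance
    for floating-point rounding when summing the priors.
    """
    if len(entities) != len(probabilities):
        raise ValueError("entities and probabilities must have the same length")
    if any(p < 0 for p in probabilities):
        raise ValueError("prior probabilities must not be negative")
    total = math.fsum(probabilities)
    if total > 1 + tol:
        raise ValueError(f"prior probabilities sum to {total}, which exceeds 1")


# The values from the add_alias example pass the check (0.6 + 0.3 <= 1).
check_alias_priors(["Q42", "Q463035"], [0.6, 0.3])
```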
KnowledgeBase.__len__
Get the total number of entities in the knowledge base.
Example
total_entities = len(kb)
| Name | Type | Description |
| --- | --- | --- |
| RETURNS | int | The number of entities in the knowledge base. |
KnowledgeBase.get_entity_strings
Get a list of all entity IDs in the knowledge base.
Example
all_entities = kb.get_entity_strings()
| Name | Type | Description |
| --- | --- | --- |
| RETURNS | list | The list of entities in the knowledge base. |
KnowledgeBase.get_size_aliases
Get the total number of aliases in the knowledge base.
Example
total_aliases = kb.get_size_aliases()
| Name | Type | Description |
| --- | --- | --- |
| RETURNS | int | The number of aliases in the knowledge base. |
KnowledgeBase.get_alias_strings
Get a list of all aliases in the knowledge base.
Example
all_aliases = kb.get_alias_strings()
| Name | Type | Description |
| --- | --- | --- |
| RETURNS | list | The list of aliases in the knowledge base. |
KnowledgeBase.get_candidates
Given a certain textual mention as input, retrieve a list of candidate entities
of type Candidate.
Example
candidates = kb.get_candidates("Douglas")
| Name | Type | Description |
| --- | --- | --- |
| alias | str | The textual mention or alias |
| RETURNS | iterable | The list of relevant Candidate objects |
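A common follow-up is to rank the returned candidates by their prior
probability. The sketch below does this for any objects exposing entity_ and
prior_prob attributes (see the Candidate attributes). To stay runnable without
a knowledge base, it uses SimpleNamespace stand-ins carrying the values from
the add_alias example rather than real Candidate objects, and best_candidate is
a hypothetical helper, not a spaCy function.

```python
from types import SimpleNamespace
from typing import Iterable, Optional


def best_candidate(candidates: Iterable) -> Optional[object]:
    """Return the candidate with the highest prior_prob, or None if empty.

    Hypothetical helper (not part of spaCy); it only relies on the
    prior_prob attribute.
    """
    return max(candidates, key=lambda c: c.prior_prob, default=None)


# Stand-ins for the objects returned by kb.get_candidates("Douglas").
candidates = [
    SimpleNamespace(entity_="Q42", prior_prob=0.6),
    SimpleNamespace(entity_="Q463035", prior_prob=0.3),
]
top = best_candidate(candidates)
print(top.entity_)  # Q42
```

Returning None for an empty list keeps the caller responsible for deciding what
an unknown mention should map to.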
KnowledgeBase.get_vector
Given a certain entity ID, retrieve its pretrained entity vector.
Example
vector = kb.get_vector("Q42")
| Name | Type | Description |
| --- | --- | --- |
| entity | str | The entity ID |
| RETURNS | vector | The entity vector |
KnowledgeBase.get_prior_prob
Given a certain entity ID and a certain textual mention, retrieve the prior
probability that the mention links to the entity.
Example
probability = kb.get_prior_prob("Q42", "Douglas")
| Name | Type | Description |
| --- | --- | --- |
| entity | str | The entity ID |
| alias | str | The textual mention or alias |
| RETURNS | float | The prior probability of the alias referring to the entity |
KnowledgeBase.dump
Save the current state of the knowledge base to a directory.
Example
kb.dump(loc)
| Name | Type | Description |
| --- | --- | --- |
| loc | str / Path | A path to a directory, which will be created if it doesn't exist. Paths may be either strings or Path-like objects. |
KnowledgeBase.load_bulk
Restore the state of the knowledge base from a given directory. Note that the
Vocab should also be the same as the one used to create the KB.
Example
from spacy.kb import KnowledgeBase
from spacy.vocab import Vocab
vocab = Vocab().from_disk("/path/to/vocab")
kb = KnowledgeBase(vocab=vocab, entity_vector_length=64)
kb.load_bulk("/path/to/kb")
| Name | Type | Description |
| --- | --- | --- |
| loc | str / Path | A path to a directory. Paths may be either strings or Path-like objects. |
| RETURNS | KnowledgeBase | The modified KnowledgeBase object. |
Candidate.__init__
Construct a Candidate object. Usually this constructor is not called directly,
but instead these objects are returned by the get_candidates method of a
KnowledgeBase.
Example
from spacy.kb import Candidate
candidate = Candidate(kb, entity_hash, entity_freq, entity_vector, alias_hash, prior_prob)
| Name | Type | Description |
| --- | --- | --- |
| kb | KnowledgeBase | The knowledge base that defined this candidate. |
| entity_hash | int | The hash of the entity's KB ID. |
| entity_freq | float | The entity frequency as recorded in the KB. |
| entity_vector | vector | The pretrained vector of the entity. |
| alias_hash | int | The hash of the textual mention or alias. |
| prior_prob | float | The prior probability of the alias referring to the entity |
Candidate attributes
| Name | Type | Description |
| --- | --- | --- |
| entity | int | The hash of the entity's unique KB identifier |
| entity_ | str | The entity's unique KB identifier |
| alias | int | The hash of the alias or textual mention |
| alias_ | str | The alias or textual mention |
| prior_prob | float | The prior probability of the alias referring to the entity |
| entity_freq | float | The frequency of the entity in a typical corpus |
| entity_vector | vector | The pretrained vector of the entity |