title | teaser | tag | source | new |
---|---|---|---|---|
KnowledgeBase | A storage class for entities and aliases of a specific knowledge base (ontology) | class | spacy/kb.pyx | 2.2 |
The KnowledgeBase object provides a method to generate Candidate objects, which are plausible external identifiers given a certain textual mention. Each such Candidate holds information from the relevant KB entities, such as its frequency in text and possible aliases. Each entity in the knowledge base also has a pretrained entity vector of a fixed size.
KnowledgeBase.__init__
Create the knowledge base.
Example
from spacy.kb import KnowledgeBase
vocab = nlp.vocab
kb = KnowledgeBase(vocab=vocab, entity_vector_length=64)
Name | Description |
---|---|
vocab | The shared vocabulary. |
entity_vector_length | Length of the fixed-size entity vectors. |
KnowledgeBase.entity_vector_length
The length of the fixed-size entity vectors in the knowledge base.
Name | Description |
---|---|
RETURNS | Length of the fixed-size entity vectors. |
KnowledgeBase.add_entity
Add an entity to the knowledge base, specifying its corpus frequency and entity vector, which should be of length entity_vector_length.
Example
kb.add_entity(entity="Q42", freq=32, entity_vector=vector1)
kb.add_entity(entity="Q463035", freq=111, entity_vector=vector2)
Name | Description |
---|---|
entity | The unique entity identifier. |
freq | The frequency of the entity in a typical corpus. |
entity_vector | The pretrained vector of the entity. |
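The length requirement on entity_vector can be illustrated with a small stand-alone sketch. ToyKB below is a hypothetical, pure-Python stand-in and not spaCy's implementation; it assumes that a vector of the wrong length is rejected with an error rather than padded or truncated.

```python
class ToyKB:
    """Hypothetical stand-in for a knowledge base; not spaCy's implementation."""

    def __init__(self, entity_vector_length):
        self.entity_vector_length = entity_vector_length
        self._entities = {}  # entity ID -> (frequency, vector)

    def add_entity(self, entity, freq, entity_vector):
        # Assumption: a vector of the wrong length is rejected outright.
        if len(entity_vector) != self.entity_vector_length:
            raise ValueError(
                f"expected a vector of length {self.entity_vector_length}, "
                f"got {len(entity_vector)}"
            )
        self._entities[entity] = (freq, list(entity_vector))

    def __len__(self):
        # Number of entities stored so far.
        return len(self._entities)


toy_kb = ToyKB(entity_vector_length=3)
toy_kb.add_entity(entity="Q42", freq=32, entity_vector=[0.1, 0.2, 0.3])
```

Checking the length when an entity is added keeps every stored vector the same size, which is what the fixed entity_vector_length implies.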
KnowledgeBase.set_entities
Define the full list of entities in the knowledge base, specifying the corpus frequency and entity vector for each entity.
Example
kb.set_entities(entity_list=["Q42", "Q463035"], freq_list=[32, 111], vector_list=[vector1, vector2])
Name | Description |
---|---|
entity_list | List of unique entity identifiers. |
freq_list | List of entity frequencies. |
vector_list | List of entity vectors. |
KnowledgeBase.add_alias
Add an alias or mention to the knowledge base, specifying its potential KB identifiers and their prior probabilities. The entity identifiers should refer to entities previously added with add_entity or set_entities. The sum of the prior probabilities should not exceed 1.
Example
kb.add_alias(alias="Douglas", entities=["Q42", "Q463035"], probabilities=[0.6, 0.3])
Name | Description |
---|---|
alias | The textual mention or alias. |
entities | The potential entities that the alias may refer to. |
probabilities | The prior probabilities of each entity. |
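The constraints on add_alias can be spelled out as a small check. The helper check_alias below is hypothetical and pure Python; it is a sketch of the rules stated above (known entities, matching list lengths, priors summing to at most 1), not the validation spaCy itself performs.

```python
def check_alias(entities, probabilities, known_entities):
    """Hypothetical helper mirroring the constraints described for add_alias."""
    if len(entities) != len(probabilities):
        raise ValueError("entities and probabilities must have the same length")
    unknown = [e for e in entities if e not in known_entities]
    if unknown:
        raise ValueError(f"unknown entities: {unknown}")
    total = sum(probabilities)
    # Allow a small tolerance for floating-point rounding.
    if total > 1.0 + 1e-9:
        raise ValueError(f"prior probabilities sum to {total}, which exceeds 1")
    return total


known = {"Q42", "Q463035"}
total = check_alias(["Q42", "Q463035"], [0.6, 0.3], known)
```

The priors may sum to less than 1, as in the example, where 0.6 + 0.3 leaves 0.1 unassigned.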
KnowledgeBase.__len__
Get the total number of entities in the knowledge base.
Example
total_entities = len(kb)
Name | Description |
---|---|
RETURNS | The number of entities in the knowledge base. |
KnowledgeBase.get_entity_strings
Get a list of all entity IDs in the knowledge base.
Example
all_entities = kb.get_entity_strings()
Name | Description |
---|---|
RETURNS | The list of entities in the knowledge base. |
KnowledgeBase.get_size_aliases
Get the total number of aliases in the knowledge base.
Example
total_aliases = kb.get_size_aliases()
Name | Description |
---|---|
RETURNS | The number of aliases in the knowledge base. |
KnowledgeBase.get_alias_strings
Get a list of all aliases in the knowledge base.
Example
all_aliases = kb.get_alias_strings()
Name | Description |
---|---|
RETURNS | The list of aliases in the knowledge base. |
KnowledgeBase.get_candidates
Given a certain textual mention as input, retrieve a list of candidate entities of type Candidate.
Example
candidates = kb.get_candidates("Douglas")
Name | Description |
---|---|
alias | The textual mention or alias. |
RETURNS | An iterable of Candidate objects for the alias. |
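Conceptually, candidate generation is a lookup from an alias to the entities registered for it. The sketch below is a pure-Python illustration with hypothetical names (ToyCandidate, ALIASES); it assumes that an unknown alias yields an empty list, and it is not how spaCy stores its data internally.

```python
from collections import namedtuple

# Hypothetical record type; spaCy's Candidate carries more information.
ToyCandidate = namedtuple("ToyCandidate", ["entity_", "alias_", "prior_prob"])

# alias -> list of (entity ID, prior probability), as registered via add_alias
ALIASES = {"Douglas": [("Q42", 0.6), ("Q463035", 0.3)]}


def get_candidates(alias):
    """Return one candidate per entity registered for the alias."""
    # Assumption: an alias that was never registered produces no candidates.
    return [ToyCandidate(ent, alias, prob) for ent, prob in ALIASES.get(alias, [])]


candidates = get_candidates("Douglas")
# A trivial baseline: pick the candidate with the highest prior probability.
best = max(candidates, key=lambda c: c.prior_prob)
```

An entity linker would typically combine such priors with context rather than rely on the highest prior alone.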
KnowledgeBase.get_vector
Given a certain entity ID, retrieve its pretrained entity vector.
Example
vector = kb.get_vector("Q42")
Name | Description |
---|---|
entity | The entity ID. |
RETURNS | The entity vector. |
KnowledgeBase.get_prior_prob
Given an entity ID and a textual mention, retrieve the prior probability that the mention links to that entity.
Example
probability = kb.get_prior_prob("Q42", "Douglas")
Name | Description |
---|---|
entity | The entity ID. |
alias | The textual mention or alias. |
RETURNS | The prior probability of the alias referring to the entity. |
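As a rough mental model, the prior probability is a value stored per (entity, alias) pair. The sketch below uses a hypothetical PRIORS table filled with the values from the add_alias example, and assumes that a pair which was never registered has a prior of 0.0.

```python
# Hypothetical prior table: (entity ID, alias) -> prior probability,
# filled with the values from the add_alias example.
PRIORS = {("Q42", "Douglas"): 0.6, ("Q463035", "Douglas"): 0.3}


def get_prior_prob(entity, alias):
    # Assumption: pairs that were never registered get a prior of 0.0.
    return PRIORS.get((entity, alias), 0.0)


known_pair = get_prior_prob("Q42", "Douglas")
unknown_pair = get_prior_prob("Q42", "Adams")
```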
KnowledgeBase.to_disk
Save the current state of the knowledge base to a directory.
Example
kb.to_disk(loc)
Name | Description |
---|---|
loc | A path to a directory, which will be created if it doesn't exist. Paths may be either strings or Path-like objects. |
KnowledgeBase.from_disk
Restore the state of the knowledge base from a given directory. Note that the Vocab should also be the same as the one used to create the KB.
Example
from spacy.kb import KnowledgeBase
from spacy.vocab import Vocab
vocab = Vocab().from_disk("/path/to/vocab")
kb = KnowledgeBase(vocab=vocab, entity_vector_length=64)
kb.from_disk("/path/to/kb")
Name | Description |
---|---|
loc | A path to a directory. Paths may be either strings or Path-like objects. |
RETURNS | The modified KnowledgeBase object. |
Candidate
A Candidate object refers to a textual mention (alias) that may or may not be resolved to a specific entity from a KnowledgeBase. It is used as input for the entity linking algorithm, which disambiguates the various candidates to the correct one. Each candidate (alias, entity) pair is assigned a prior probability.
Candidate.__init__
Construct a Candidate object. Usually this constructor is not called directly, but instead these objects are returned by the get_candidates method of a KnowledgeBase.
Example
from spacy.kb import Candidate
candidate = Candidate(kb, entity_hash, entity_freq, entity_vector, alias_hash, prior_prob)
Name | Description |
---|---|
kb | The knowledge base that defined this candidate. |
entity_hash | The hash of the entity's KB ID. |
entity_freq | The entity frequency as recorded in the KB. |
entity_vector | The pretrained vector of the entity. |
alias_hash | The hash of the textual mention or alias. |
prior_prob | The prior probability of the alias referring to the entity. |
Candidate attributes
Name | Description |
---|---|
entity | The entity's unique KB identifier, as a hash. |
entity_ | The entity's unique KB identifier, as a string. |
alias | The alias or textual mention, as a hash. |
alias_ | The alias or textual mention, as a string. |
prior_prob | The prior probability of the alias referring to the entity. |
entity_freq | The frequency of the entity in a typical corpus. |
entity_vector | The pretrained vector of the entity. |
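The paired attributes (entity and entity_, alias and alias_) follow spaCy's usual convention of exposing a hash next to its string form. The sketch below illustrates that relationship with a hypothetical ToyStringStore; the hash function is a stand-in chosen for the example and differs from the one spaCy uses.

```python
import hashlib


class ToyStringStore:
    """Hypothetical string store mapping 64-bit hashes back to strings."""

    def __init__(self):
        self._strings = {}

    def add(self, text):
        # Stand-in hash function; spaCy uses its own hashing scheme.
        digest = hashlib.sha256(text.encode("utf8")).digest()
        key = int.from_bytes(digest[:8], "big")
        self._strings[key] = text
        return key

    def __getitem__(self, key):
        # Raises KeyError if the hash was never added to this store.
        return self._strings[key]


strings = ToyStringStore()
entity_hash = strings.add("Q42")   # plays the role of Candidate.entity
entity_id = strings[entity_hash]   # plays the role of Candidate.entity_
```

A hash can only be turned back into a string by a store that has seen that string, which is one way to read the note in from_disk about using the same Vocab.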