| title | teaser | tag | source | new |
|---|---|---|---|---|
| KnowledgeBase | A storage class for entities and aliases of a specific knowledge base (ontology) | class | spacy/kb.pyx | 2.2 | 
The KnowledgeBase object provides a method to generate
Candidate objects, which are plausible external
identifiers given a certain textual mention. Each such Candidate holds
information from the relevant KB entities, such as its frequency in text and
possible aliases. Each entity in the knowledge base also has a pretrained entity
vector of a fixed size.
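To make the data model concrete, the following is a toy, pure-Python stand-in for the relationships described above: entities carry a frequency and a fixed-size vector, and each alias maps to (entity, prior probability) pairs. `ToyKB` and `ToyCandidate` are hypothetical names used only for illustration; this is not how spaCy stores the data internally.

```python
from dataclasses import dataclass, field


@dataclass
class ToyCandidate:
    """Hypothetical stand-in for a Candidate: one (alias, entity) pair."""
    alias: str
    entity: str
    entity_freq: float
    entity_vector: list
    prior_prob: float


@dataclass
class ToyKB:
    """Hypothetical in-memory model of the KB's data; not spaCy's implementation."""
    entity_vector_length: int
    entities: dict = field(default_factory=dict)  # entity ID -> (freq, vector)
    aliases: dict = field(default_factory=dict)   # alias -> [(entity ID, prior), ...]

    def add_entity(self, entity, freq, entity_vector):
        # Every entity vector must have the same, fixed length.
        if len(entity_vector) != self.entity_vector_length:
            raise ValueError("entity vector has the wrong length")
        self.entities[entity] = (freq, list(entity_vector))

    def add_alias(self, alias, entities, probabilities):
        # Pair each candidate entity with its prior probability.
        self.aliases[alias] = list(zip(entities, probabilities))

    def get_candidates(self, alias):
        # Unknown aliases yield no candidates in this toy model.
        result = []
        for entity, prior in self.aliases.get(alias, []):
            freq, vector = self.entities[entity]
            result.append(ToyCandidate(alias, entity, freq, vector, prior))
        return result


toy_kb = ToyKB(entity_vector_length=3)
toy_kb.add_entity("Q42", freq=32, entity_vector=[1.0, 0.0, 0.0])
toy_kb.add_entity("Q463035", freq=111, entity_vector=[0.0, 1.0, 0.0])
toy_kb.add_alias("Douglas", entities=["Q42", "Q463035"], probabilities=[0.6, 0.3])
toy_candidates = toy_kb.get_candidates("Douglas")
```

Each element of `toy_candidates` bundles the alias, the entity ID, the entity's frequency and vector, and the prior probability, which mirrors the information a Candidate is described as holding.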
KnowledgeBase.__init__
Create the knowledge base.
Example
from spacy.kb import KnowledgeBase
vocab = nlp.vocab
kb = KnowledgeBase(vocab=vocab, entity_vector_length=64)
| Name | Description |
|---|---|
| vocab | The shared vocabulary. |
| entity_vector_length | Length of the fixed-size entity vectors. |
KnowledgeBase.entity_vector_length
The length of the fixed-size entity vectors in the knowledge base.
| Name | Description | 
|---|---|
| RETURNS | Length of the fixed-size entity vectors.  | 
KnowledgeBase.add_entity
Add an entity to the knowledge base, specifying its corpus frequency and entity
vector, which should be of length
entity_vector_length.
Example
kb.add_entity(entity="Q42", freq=32, entity_vector=vector1)
kb.add_entity(entity="Q463035", freq=111, entity_vector=vector2)
| Name | Description |
|---|---|
| entity | The unique entity identifier. |
| freq | The frequency of the entity in a typical corpus. |
| entity_vector | The pretrained vector of the entity. |
KnowledgeBase.set_entities
Define the full list of entities in the knowledge base, specifying the corpus frequency and entity vector for each entity.
Example
kb.set_entities(entity_list=["Q42", "Q463035"], freq_list=[32, 111], vector_list=[vector1, vector2])
| Name | Description |
|---|---|
| entity_list | List of unique entity identifiers. |
| freq_list | List of entity frequencies. |
| vector_list | List of entity vectors. |
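Because `set_entities` takes three parallel lists, entity data that is stored per record has to be split up first. The sketch below shows one way to do that while checking that identifiers are unique. The helper `to_parallel_lists` and the `records` layout are assumptions made for illustration and are not part of spaCy.

```python
def to_parallel_lists(records):
    """Split (entity_id, freq, vector) records into three aligned lists.

    `records` and this helper are hypothetical; they are not part of spaCy.
    """
    seen = set()
    entity_list, freq_list, vector_list = [], [], []
    for entity_id, freq, vector in records:
        # Entity identifiers are expected to be unique.
        if entity_id in seen:
            raise ValueError(f"duplicate entity identifier: {entity_id!r}")
        seen.add(entity_id)
        entity_list.append(entity_id)
        freq_list.append(freq)
        vector_list.append(vector)
    return entity_list, freq_list, vector_list


records = [
    ("Q42", 32, [0.1, 0.2, 0.3]),
    ("Q463035", 111, [0.4, 0.5, 0.6]),
]
entity_list, freq_list, vector_list = to_parallel_lists(records)
# The three lists could then be passed on, for example:
# kb.set_entities(entity_list=entity_list, freq_list=freq_list, vector_list=vector_list)
```

Index `i` in each list refers to the same entity, which is what keeps the frequencies and vectors aligned with their identifiers.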
KnowledgeBase.add_alias
Add an alias or mention to the knowledge base, specifying its potential KB
identifiers and their prior probabilities. The entity identifiers should refer
to entities previously added with add_entity or
set_entities. The sum of the prior probabilities
should not exceed 1. Note that an empty string cannot be used as an alias.
Example
kb.add_alias(alias="Douglas", entities=["Q42", "Q463035"], probabilities=[0.6, 0.3])
| Name | Description |
|---|---|
| alias | The textual mention or alias. Cannot be the empty string. |
| entities | The potential entities that the alias may refer to. |
| probabilities | The prior probabilities of each entity. |
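The constraints on `add_alias` inputs can be checked before calling it. The following is a minimal sketch of such a pre-check: the function name `check_alias` is hypothetical, and the `tolerance` for floating-point rounding is an assumed value, not something the library documents here. It also assumes that `entities` and `probabilities` are expected to have the same length, since each probability belongs to one entity.

```python
import math


def check_alias(alias, entities, probabilities, tolerance=1e-5):
    """Return a list of problems; an empty list means the inputs look consistent.

    Hypothetical pre-check, not part of spaCy. `tolerance` is an assumed
    allowance for floating-point rounding when summing the priors.
    """
    problems = []
    if alias == "":
        problems.append("alias is the empty string")
    if len(entities) != len(probabilities):
        problems.append("entities and probabilities differ in length")
    if math.fsum(probabilities) > 1.0 + tolerance:
        problems.append("prior probabilities sum to more than 1")
    return problems


ok = check_alias("Douglas", ["Q42", "Q463035"], [0.6, 0.3])
too_much = check_alias("Douglas", ["Q42", "Q463035"], [0.8, 0.3])
```

Here `ok` is an empty list, while `too_much` reports that the priors sum to more than 1. The priors may sum to less than 1, which leaves probability mass for entities that are not in the knowledge base.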
KnowledgeBase.__len__
Get the total number of entities in the knowledge base.
Example
total_entities = len(kb)
| Name | Description | 
|---|---|
| RETURNS | The number of entities in the knowledge base.  | 
KnowledgeBase.get_entity_strings
Get a list of all entity IDs in the knowledge base.
Example
all_entities = kb.get_entity_strings()
| Name | Description | 
|---|---|
| RETURNS | The list of entities in the knowledge base.  | 
KnowledgeBase.get_size_aliases
Get the total number of aliases in the knowledge base.
Example
total_aliases = kb.get_size_aliases()
| Name | Description | 
|---|---|
| RETURNS | The number of aliases in the knowledge base.  | 
KnowledgeBase.get_alias_strings
Get a list of all aliases in the knowledge base.
Example
all_aliases = kb.get_alias_strings()
| Name | Description | 
|---|---|
| RETURNS | The list of aliases in the knowledge base.  | 
KnowledgeBase.get_candidates
Given a certain textual mention as input, retrieve a list of candidate entities
of type Candidate.
Example
candidates = kb.get_candidates("Douglas")
| Name | Description |
|---|---|
| alias | The textual mention or alias. |
| RETURNS | An iterable of the relevant Candidate objects. |
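A simple use of the returned candidates is a prior-only baseline that picks the entity with the highest prior probability and ignores the sentence context. The sketch below assumes only that each candidate exposes the `entity_` and `prior_prob` attributes. `FakeCandidate` and `most_likely_entity` are hypothetical names, and a namedtuple stands in for real Candidate objects so the example runs without a knowledge base.

```python
from collections import namedtuple

# Stand-in for Candidate objects so the sketch runs without a knowledge base;
# only the `entity_` and `prior_prob` attributes are used.
FakeCandidate = namedtuple("FakeCandidate", ["entity_", "prior_prob"])


def most_likely_entity(candidates, default=None):
    """Return the entity ID with the highest prior probability, or `default`."""
    best = max(candidates, key=lambda c: c.prior_prob, default=None)
    return default if best is None else best.entity_


fake_candidates = [
    FakeCandidate(entity_="Q42", prior_prob=0.6),
    FakeCandidate(entity_="Q463035", prior_prob=0.3),
]
best_entity = most_likely_entity(fake_candidates)
```

With real data, the same function could be applied to the result of `kb.get_candidates("Douglas")`. An entity linking model would normally combine these priors with context, so this is a baseline rather than a replacement for disambiguation.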
KnowledgeBase.get_vector
Given a certain entity ID, retrieve its pretrained entity vector.
Example
vector = kb.get_vector("Q42")
| Name | Description |
|---|---|
| entity | The entity ID. |
| RETURNS | The entity vector. |
KnowledgeBase.get_prior_prob
Given an entity ID and a textual mention, retrieve the prior probability that the mention links to that entity.
Example
probability = kb.get_prior_prob("Q42", "Douglas")
| Name | Description |
|---|---|
| entity | The entity ID. |
| alias | The textual mention or alias. |
| RETURNS | The prior probability of the alias referring to the entity. |
KnowledgeBase.to_disk
Save the current state of the knowledge base to a directory.
Example
kb.to_disk(loc)
| Name | Description |
|---|---|
| loc | A path to a directory, which will be created if it doesn't exist. Paths may be either strings or Path-like objects. |
KnowledgeBase.from_disk
Restore the state of the knowledge base from a given directory. Note that the
Vocab should also be the same as the one used to create the KB.
Example
from spacy.kb import KnowledgeBase
from spacy.vocab import Vocab
vocab = Vocab().from_disk("/path/to/vocab")
kb = KnowledgeBase(vocab=vocab, entity_vector_length=64)
kb.from_disk("/path/to/kb")
| Name | Description |
|---|---|
| loc | A path to a directory. Paths may be either strings or Path-like objects. |
| RETURNS | The modified KnowledgeBase object. |
Candidate
A Candidate object refers to a textual mention (alias) that may or may not be
resolved to a specific entity from a KnowledgeBase. This will be used as input
for the entity linking algorithm which will disambiguate the various candidates
to the correct one. Each candidate (alias, entity) pair is assigned a
prior probability.
Candidate.__init__
Construct a Candidate object. Usually this constructor is not called directly,
but instead these objects are returned by the
get_candidates method of a KnowledgeBase.
Example
from spacy.kb import Candidate
candidate = Candidate(kb, entity_hash, entity_freq, entity_vector, alias_hash, prior_prob)
| Name | Description |
|---|---|
| kb | The knowledge base that defined this candidate. |
| entity_hash | The hash of the entity's KB ID. |
| entity_freq | The entity frequency as recorded in the KB. |
| entity_vector | The pretrained vector of the entity. |
| alias_hash | The hash of the textual mention or alias. |
| prior_prob | The prior probability of the alias referring to the entity. |
Candidate attributes
| Name | Description |
|---|---|
| entity | The entity's unique KB identifier, as a hash. |
| entity_ | The entity's unique KB identifier, as a string. |
| alias | The alias or textual mention, as a hash. |
| alias_ | The alias or textual mention, as a string. |
| prior_prob | The prior probability of the alias referring to the entity. |
| entity_freq | The frequency of the entity in a typical corpus. |
| entity_vector | The pretrained vector of the entity. |