Sofie Van Landeghem 0b4b4f1819 Documentation for Entity Linking (#4065)
* document token ent_kb_id

* document span kb_id

* update pipeline documentation

* prior and context weights as bool's instead

* entitylinker api documentation

* drop for both models

* finish entitylinker documentation

* small fixes

* documentation for KB

* candidate documentation

* links to api pages in code

* small fix

* frequency examples as counts for consistency

* consistent documentation about tensors returned by predict

* add entity linking to usage 101

* add entity linking infobox and KB section to 101

* entity-linking in linguistic features

* small typo corrections

* training example and docs for entity_linker

* predefined nlp and kb

* revert back to similarity encodings for simplicity (for now)

* set prior probabilities to 0 when excluded

* code clean up

* bugfix: deleting kb ID from tokens when entities were removed

* refactor train el example to use either model or vocab

* pretrain_kb example for example kb generation

* add to training docs for KB + EL example scripts

* small fixes

* error numbering

* ensure the language of vocab and nlp stay consistent across serialization

* equality with =

* avoid conflict in errors file

* add error 151

* final adjustements to the train scripts - consistency

* update of goldparse documentation

* small corrections

* push commit

* typo fix

* add candidate API to kb documentation

* update API sidebar with EntityLinker and KnowledgeBase

* remove EL from 101 docs

* remove entity linker from 101 pipelines / rephrase

* custom el model instead of existing model

* set version to 2.2 for EL functionality

* update documentation for 2 CLI scripts
2019-09-12 11:38:34 +02:00

| title | teaser | tag | source | new |
| --- | --- | --- | --- | --- |
| KnowledgeBase | A storage class for entities and aliases of a specific knowledge base (ontology) | class | spacy/kb.pyx | 2.2 |

The KnowledgeBase object provides a method to generate Candidate objects, which are plausible external identifiers given a certain textual mention. Each Candidate holds information about the relevant KB entity, such as its frequency in text and its possible aliases. Each entity in the knowledge base also has a pre-trained entity vector of a fixed size.
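As a rough mental model only, the knowledge base can be pictured as two lookup tables: one from entity ID to frequency and vector, and one from alias to possible entity IDs with their prior probabilities. The sketch below is plain Python and is not spaCy's implementation; `ENTITIES`, `ALIASES` and `toy_candidates` are hypothetical names, and the vectors are shortened to three values for readability.

```python
# Toy model of the two tables a knowledge base holds.
# Hypothetical names; this is not how spaCy stores the data internally.

# entity ID -> (corpus frequency, fixed-size entity vector)
ENTITIES = {
    "Q42": (32, [0.1, 0.2, 0.3]),
    "Q463035": (111, [0.4, 0.5, 0.6]),
}

# alias -> list of (entity ID, prior probability)
ALIASES = {
    "Douglas": [("Q42", 0.6), ("Q463035", 0.3)],
}


def toy_candidates(alias):
    """Join both tables into one record per plausible entity."""
    records = []
    for entity_id, prior in ALIASES.get(alias, []):
        freq, vector = ENTITIES[entity_id]
        records.append(
            {
                "entity": entity_id,
                "alias": alias,
                "prior_prob": prior,
                "entity_freq": freq,
                "entity_vector": vector,
            }
        )
    return records


for record in toy_candidates("Douglas"):
    print(record["entity"], record["prior_prob"], record["entity_freq"])
```

In this toy version, an alias that was never registered simply yields an empty list.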


KnowledgeBase.__init__

Create the knowledge base.

from spacy.kb import KnowledgeBase
# assumes nlp is an already loaded Language object
vocab = nlp.vocab
kb = KnowledgeBase(vocab=vocab, entity_vector_length=64)

| Name | Type | Description |
| --- | --- | --- |
| vocab | Vocab | A Vocab object. |
| entity_vector_length | int | Length of the fixed-size entity vectors. |
| RETURNS | KnowledgeBase | The newly constructed object. |


KnowledgeBase.entity_vector_length

The length of the fixed-size entity vectors in the knowledge base.

| Name | Type | Description |
| --- | --- | --- |
| RETURNS | int | Length of the fixed-size entity vectors. |


KnowledgeBase.add_entity

Add an entity to the knowledge base, specifying its corpus frequency and its entity vector. The vector should have length entity_vector_length.

kb.add_entity(entity="Q42", freq=32, entity_vector=vector1)
kb.add_entity(entity="Q463035", freq=111, entity_vector=vector2)

| Name | Type | Description |
| --- | --- | --- |
| entity | unicode | The unique entity identifier. |
| freq | float | The frequency of the entity in a typical corpus. |
| entity_vector | vector | The pre-trained vector of the entity. |
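The length requirement can be checked before an entity is added. The following is a minimal sketch in plain Python, assuming the knowledge base was created with `entity_vector_length=64` as in the constructor example; `check_entity_vector` is a hypothetical helper, not part of spaCy.

```python
ENTITY_VECTOR_LENGTH = 64  # assumed to match the value given to the constructor


def check_entity_vector(vector, expected_length=ENTITY_VECTOR_LENGTH):
    """Return the vector unchanged, or raise ValueError on a length mismatch.

    Hypothetical pre-check, not a spaCy function.
    """
    if len(vector) != expected_length:
        raise ValueError(
            f"entity vector has length {len(vector)}, expected {expected_length}"
        )
    return vector


# Placeholder vector of the right size; real vectors would be pre-trained.
vector1 = check_entity_vector([0.0] * 64)

try:
    check_entity_vector([0.0] * 32)
except ValueError as err:
    print(err)
```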


KnowledgeBase.set_entities

Define the full list of entities in the knowledge base, specifying the corpus frequency and entity vector for each entity. The three lists are parallel: the item at a given position in each list describes the same entity.

kb.set_entities(entity_list=["Q42", "Q463035"], freq_list=[32, 111], vector_list=[vector1, vector2])

| Name | Type | Description |
| --- | --- | --- |
| entity_list | iterable | List of unique entity identifiers. |
| freq_list | iterable | List of entity frequencies. |
| vector_list | iterable | List of entity vectors. |
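Because the three arguments are matched by position, it can help to confirm that they line up before passing them on. This is a small sketch in plain Python; `check_parallel` is a hypothetical helper and the vectors are placeholders.

```python
entity_list = ["Q42", "Q463035"]
freq_list = [32, 111]
vector_list = [[0.0] * 64, [1.0] * 64]  # placeholder vectors


def check_parallel(entity_list, freq_list, vector_list):
    """Hypothetical pre-check: the three lists must line up one-to-one."""
    lengths = {len(entity_list), len(freq_list), len(vector_list)}
    if len(lengths) != 1:
        raise ValueError("entity_list, freq_list and vector_list differ in length")
    # Pair each entity with its own frequency and vector.
    return list(zip(entity_list, freq_list, vector_list))


rows = check_parallel(entity_list, freq_list, vector_list)
for entity, freq, vector in rows:
    print(entity, freq, len(vector))
```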


KnowledgeBase.add_alias

Add an alias or mention to the knowledge base, specifying its potential KB identifiers and their prior probabilities. The entity identifiers should refer to entities previously added with add_entity or set_entities. The sum of the prior probabilities should not exceed 1.

kb.add_alias(alias="Douglas", entities=["Q42", "Q463035"], probabilities=[0.6, 0.3])

| Name | Type | Description |
| --- | --- | --- |
| alias | unicode | The textual mention or alias. |
| entities | iterable | The potential entities that the alias may refer to. |
| probabilities | iterable | The prior probabilities of each entity. |
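One plausible way to obtain such priors is to count, in an annotated corpus, how often an alias links to each entity and divide by the total number of times the alias occurs. The sketch below assumes that approach; `priors_from_counts` is a hypothetical helper and the counts are made up so that they reproduce the values in the example above. Dividing by all occurrences, including those that link to entities outside the knowledge base, keeps the sum at or below 1.

```python
def priors_from_counts(link_counts, total_mentions):
    """Convert per-entity link counts for one alias into prior probabilities.

    link_counts: {entity_id: times the alias linked to that entity}
    total_mentions: all occurrences of the alias, including those that
        linked to entities outside the knowledge base.
    """
    if total_mentions <= 0 or sum(link_counts.values()) > total_mentions:
        raise ValueError("total_mentions must cover all counted links")
    entities = list(link_counts)
    probabilities = [link_counts[e] / total_mentions for e in entities]
    return entities, probabilities


# Made-up counts for the alias "Douglas".
entities, probabilities = priors_from_counts(
    {"Q42": 60, "Q463035": 30}, total_mentions=100
)
print(entities)
print(probabilities)
```

The two returned lists are in matching order, so they can be passed as the `entities` and `probabilities` arguments.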


KnowledgeBase.__len__

Get the total number of entities in the knowledge base.

total_entities = len(kb)

| Name | Type | Description |
| --- | --- | --- |
| RETURNS | int | The number of entities in the knowledge base. |


KnowledgeBase.get_entity_strings

Get a list of all entity IDs in the knowledge base.

all_entities = kb.get_entity_strings()

| Name | Type | Description |
| --- | --- | --- |
| RETURNS | list | The list of entities in the knowledge base. |


KnowledgeBase.get_size_aliases

Get the total number of aliases in the knowledge base.

total_aliases = kb.get_size_aliases()

| Name | Type | Description |
| --- | --- | --- |
| RETURNS | int | The number of aliases in the knowledge base. |


KnowledgeBase.get_alias_strings

Get a list of all aliases in the knowledge base.

all_aliases = kb.get_alias_strings()

| Name | Type | Description |
| --- | --- | --- |
| RETURNS | list | The list of aliases in the knowledge base. |


KnowledgeBase.get_candidates

Given a textual mention as input, retrieve a list of Candidate objects, one for each entity the mention may refer to.

candidates = kb.get_candidates("Douglas")

| Name | Type | Description |
| --- | --- | --- |
| alias | unicode | The textual mention or alias. |
| RETURNS | iterable | The list of relevant Candidate objects. |
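One simple use of the returned candidates is a prior-only baseline that ignores any context and picks the entity with the highest prior probability. The sketch below illustrates that idea with a stand-in type; `ToyCandidate` and `most_likely` are hypothetical names, and only the `entity_` and `prior_prob` attributes of a Candidate are modelled.

```python
from collections import namedtuple

# Stand-in for the objects returned by get_candidates; only two of the
# documented attributes are modelled here.
ToyCandidate = namedtuple("ToyCandidate", ["entity_", "prior_prob"])


def most_likely(candidates):
    """Return the candidate with the highest prior, or None if there are none."""
    return max(candidates, key=lambda c: c.prior_prob, default=None)


candidates = [ToyCandidate("Q42", 0.6), ToyCandidate("Q463035", 0.3)]
best = most_likely(candidates)
print(best.entity_)

# An empty candidate list has no best entry.
print(most_likely([]))
```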


KnowledgeBase.get_vector

Given an entity ID, retrieve its pre-trained entity vector.

vector = kb.get_vector("Q42")

| Name | Type | Description |
| --- | --- | --- |
| entity | unicode | The entity ID. |
| RETURNS | vector | The entity vector. |


KnowledgeBase.get_prior_prob

Given an entity ID and a textual mention, retrieve the prior probability that the mention links to that entity.

probability = kb.get_prior_prob("Q42", "Douglas")

| Name | Type | Description |
| --- | --- | --- |
| entity | unicode | The entity ID. |
| alias | unicode | The textual mention or alias. |
| RETURNS | float | The prior probability of the alias referring to the entity. |


KnowledgeBase.dump

Save the current state of the knowledge base to a directory.

| Name | Type | Description |
| --- | --- | --- |
| loc | unicode / Path | A path to a directory, which will be created if it doesn't exist. Paths may be either strings or Path-like objects. |


KnowledgeBase.load_bulk

Restore the state of the knowledge base from a given directory. Note that the Vocab should also be the same as the one used to create the KB.

from spacy.kb import KnowledgeBase
from spacy.vocab import Vocab
vocab = Vocab().from_disk("/path/to/vocab")
kb = KnowledgeBase(vocab=vocab, entity_vector_length=64)
kb.load_bulk("/path/to/kb")

| Name | Type | Description |
| --- | --- | --- |
| loc | unicode / Path | A path to a directory. Paths may be either strings or Path-like objects. |
| RETURNS | KnowledgeBase | The modified KnowledgeBase object. |


Candidate.__init__

Construct a Candidate object. This constructor is usually not called directly. Instead, Candidate objects are returned by the get_candidates method of a KnowledgeBase.

from spacy.kb import Candidate
candidate = Candidate(kb, entity_hash, entity_freq, entity_vector, alias_hash, prior_prob)

| Name | Type | Description |
| --- | --- | --- |
| kb | KnowledgeBase | The knowledge base that defined this candidate. |
| entity_hash | int | The hash of the entity's KB ID. |
| entity_freq | float | The entity frequency as recorded in the KB. |
| entity_vector | vector | The pre-trained vector of the entity. |
| alias_hash | int | The hash of the textual mention or alias. |
| prior_prob | float | The prior probability of the alias referring to the entity. |
| RETURNS | Candidate | The newly constructed object. |

Candidate attributes

| Name | Type | Description |
| --- | --- | --- |
| entity | int | The hash of the entity's unique KB identifier. |
| entity_ | unicode | The entity's unique KB identifier. |
| alias | int | The hash of the alias or textual mention. |
| alias_ | unicode | The alias or textual mention. |
| prior_prob | float | The prior probability of the alias referring to the entity. |
| entity_freq | float | The frequency of the entity in a typical corpus. |
| entity_vector | vector | The pre-trained vector of the entity. |
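The paired attributes follow spaCy's naming convention: the plain name (entity, alias) holds an integer hash, and the name with a trailing underscore (entity_, alias_) holds the corresponding string. The sketch below illustrates the idea with a hypothetical `ToyStringStore` built on `hashlib`; spaCy uses its own string store and hash function, so the actual hash values will differ.

```python
import hashlib


class ToyStringStore:
    """Maps strings to 64-bit integer hashes and back (hypothetical stand-in)."""

    def __init__(self):
        self._by_hash = {}

    def add(self, text):
        # Derive a deterministic 64-bit integer from the string.
        digest = hashlib.blake2b(text.encode("utf8"), digest_size=8).digest()
        key = int.from_bytes(digest, "big")
        self._by_hash[key] = text
        return key

    def __getitem__(self, key):
        return self._by_hash[key]


strings = ToyStringStore()
entity_hash = strings.add("Q42")     # plays the role of Candidate.entity
alias_hash = strings.add("Douglas")  # plays the role of Candidate.alias

print(strings[entity_hash])  # plays the role of Candidate.entity_
print(strings[alias_hash])   # plays the role of Candidate.alias_

# A store that never saw the string cannot resolve the hash.
other = ToyStringStore()
try:
    other[entity_hash]
except KeyError:
    print("unknown hash in a different store")
```

In this toy model, a hash is only meaningful together with the store that produced it, which is one way to picture why hashes and the strings behind them have to be kept together.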