| title | teaser | tag | source | new |
|---|---|---|---|---|
| InMemoryLookupKB | The default implementation of the KnowledgeBase interface. Stores all information in-memory. | class | spacy/kb/kb_in_memory.pyx | 3.5 | 
The InMemoryLookupKB class inherits from KnowledgeBase and
implements all of its methods. It stores all KB data in-memory and generates
Candidate objects by exactly matching mentions with
entity names. It's highly optimized for both a low memory footprint and speed of
retrieval.
InMemoryLookupKB.__init__
Create the knowledge base.
Example
from spacy.kb import InMemoryLookupKB
vocab = nlp.vocab
kb = InMemoryLookupKB(vocab=vocab, entity_vector_length=64)
| Name | Description | 
|---|---|
| vocab | The shared vocabulary. | 
| entity_vector_length | Length of the fixed-size entity vectors. | 
InMemoryLookupKB.entity_vector_length
The length of the fixed-size entity vectors in the knowledge base.
| Name | Description | 
|---|---|
| RETURNS | Length of the fixed-size entity vectors. | 
InMemoryLookupKB.add_entity
Add an entity to the knowledge base, specifying its corpus frequency and entity
vector, which should be of length
entity_vector_length.
Example
kb.add_entity(entity="Q42", freq=32, entity_vector=vector1)
kb.add_entity(entity="Q463035", freq=111, entity_vector=vector2)
| Name | Description | 
|---|---|
| entity | The unique entity identifier. | 
| freq | The frequency of the entity in a typical corpus. | 
| entity_vector | The pretrained vector of the entity. | 
InMemoryLookupKB.set_entities
Define the full list of entities in the knowledge base, specifying the corpus frequency and entity vector for each entity.
Example
kb.set_entities(entity_list=["Q42", "Q463035"], freq_list=[32, 111], vector_list=[vector1, vector2])
| Name | Description | 
|---|---|
| entity_list | List of unique entity identifiers. | 
| freq_list | List of entity frequencies. | 
| vector_list | List of entity vectors. | 
InMemoryLookupKB.add_alias
Add an alias or mention to the knowledge base, specifying its potential KB
identifiers and their prior probabilities. The entity identifiers should refer
to entities previously added with add_entity
or set_entities. The sum of the prior
probabilities should not exceed 1. Note that an empty string cannot be used as
an alias.
Example
kb.add_alias(alias="Douglas", entities=["Q42", "Q463035"], probabilities=[0.6, 0.3])
| Name | Description | 
|---|---|
| alias | The textual mention or alias. Cannot be an empty string. | 
| entities | The potential entities that the alias may refer to. | 
| probabilities | The prior probabilities of each entity. | 
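The constraints described above can be sketched as a standalone check. This is a pure-Python illustration, not spaCy's implementation: `check_alias` and `known_entities` are hypothetical names, and both the length check and the rounding tolerance are assumptions of this sketch.

```python
from typing import List, Set


def check_alias(
    alias: str,
    entities: List[str],
    probabilities: List[float],
    known_entities: Set[str],
) -> None:
    """Hypothetical helper: raise ValueError if the inputs break a rule above."""
    if not alias:
        raise ValueError("An empty string cannot be used as an alias.")
    # Assumption of this sketch: exactly one prior probability per entity.
    if len(entities) != len(probabilities):
        raise ValueError("entities and probabilities must have the same length.")
    # Every entity must already have been added to the knowledge base.
    unknown = [e for e in entities if e not in known_entities]
    if unknown:
        raise ValueError(f"Entities not in the knowledge base: {unknown}")
    # Small tolerance for floating-point rounding (an assumption of this sketch).
    if sum(probabilities) > 1.0 + 1e-8:
        raise ValueError("The prior probabilities must not sum to more than 1.")


known = {"Q42", "Q463035"}
check_alias("Douglas", ["Q42", "Q463035"], [0.6, 0.3], known)  # passes silently
```

Running a check like this before calling add_alias() can surface malformed input early when aliases are loaded from an external source.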
InMemoryLookupKB.__len__
Get the total number of entities in the knowledge base.
Example
total_entities = len(kb)
| Name | Description | 
|---|---|
| RETURNS | The number of entities in the knowledge base. | 
InMemoryLookupKB.get_entity_strings
Get a list of all entity IDs in the knowledge base.
Example
all_entities = kb.get_entity_strings()
| Name | Description | 
|---|---|
| RETURNS | The list of entities in the knowledge base. | 
InMemoryLookupKB.get_size_aliases
Get the total number of aliases in the knowledge base.
Example
total_aliases = kb.get_size_aliases()
| Name | Description | 
|---|---|
| RETURNS | The number of aliases in the knowledge base. | 
InMemoryLookupKB.get_alias_strings
Get a list of all aliases in the knowledge base.
Example
all_aliases = kb.get_alias_strings()
| Name | Description | 
|---|---|
| RETURNS | The list of aliases in the knowledge base. | 
InMemoryLookupKB.get_candidates
Given a certain textual mention as input, retrieve a list of candidate entities
of type Candidate. Wraps
get_alias_candidates().
Example
from spacy.lang.en import English
nlp = English()
doc = nlp("Douglas Adams wrote 'The Hitchhiker's Guide to the Galaxy'.")
candidates = kb.get_candidates(doc[0:2])
| Name | Description | 
|---|---|
| mention | The textual mention or alias. | 
| RETURNS | An iterable of relevant Candidate objects. | 
InMemoryLookupKB.get_candidates_batch
Same as get_candidates(), but for an
arbitrary number of mentions. The EntityLinker component
calls get_candidates_batch() instead of get_candidates() if the config
parameter candidates_batch_size is greater than or equal to 1.
The default implementation of get_candidates_batch() executes
get_candidates() in a loop. If performance is a concern, we recommend
implementing a more efficient way to retrieve candidates for multiple mentions
at once.
Example
from spacy.lang.en import English
nlp = English()
doc = nlp("Douglas Adams wrote 'The Hitchhiker's Guide to the Galaxy'.")
candidates = kb.get_candidates_batch((doc[0:2], doc[3:]))
| Name | Description | 
|---|---|
| mentions | The textual mentions. | 
| RETURNS | An iterable of iterables with relevant Candidate objects. | 
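One possible way to improve on a plain loop, sketched here in pure Python rather than against spaCy's API, is to look up each distinct mention only once per batch. Everything in this sketch is hypothetical: mentions are plain strings instead of Span objects, `Candidate` is a simple tuple stand-in, and `lookup` represents whatever single-mention retrieval a custom knowledge base provides.

```python
from typing import Callable, Dict, Iterable, List, Tuple

# Hypothetical stand-in for a candidate: (entity ID, prior probability).
Candidate = Tuple[str, float]


def get_candidates_batch(
    mentions: Iterable[str],
    lookup: Callable[[str], Iterable[Candidate]],
) -> List[List[Candidate]]:
    """Return one candidate list per mention, calling `lookup` once per distinct mention."""
    cache: Dict[str, List[Candidate]] = {}
    results: List[List[Candidate]] = []
    for mention in mentions:
        if mention not in cache:
            cache[mention] = list(lookup(mention))
        # Repeated mentions share the same cached list object.
        results.append(cache[mention])
    return results


# Toy alias table and a lookup function that records how often it is called.
ALIASES: Dict[str, List[Candidate]] = {"Douglas Adams": [("Q42", 0.9)]}
calls: List[str] = []


def lookup(mention: str) -> List[Candidate]:
    calls.append(mention)
    return ALIASES.get(mention, [])


batch = get_candidates_batch(["Douglas Adams", "Galaxy", "Douglas Adams"], lookup)
```

The result keeps the order and length of the input, so the caller can still align each candidate list with its mention, while the expensive lookup runs only for the two distinct mention texts.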
InMemoryLookupKB.get_alias_candidates
Given a certain textual mention as input, retrieve a list of candidate entities
of type Candidate.
Example
candidates = kb.get_alias_candidates("Douglas")
| Name | Description | 
|---|---|
| alias | The textual mention or alias. | 
| RETURNS | The list of relevant Candidate objects. | 
InMemoryLookupKB.get_vector
Given a certain entity ID, retrieve its pretrained entity vector.
Example
vector = kb.get_vector("Q42")
| Name | Description | 
|---|---|
| entity | The entity ID. | 
| RETURNS | The entity vector. | 
InMemoryLookupKB.get_vectors
Same as get_vector(), but for an arbitrary
number of entity IDs.
The default implementation of get_vectors() executes get_vector() in a loop.
If performance is a concern, we recommend implementing a more efficient way to
retrieve vectors for multiple entities at once.
Example
vectors = kb.get_vectors(("Q42", "Q3107329"))
| Name | Description | 
|---|---|
| entities | The entity IDs. | 
| RETURNS | The entity vectors. | 
InMemoryLookupKB.get_prior_prob
Given a certain entity ID and a certain textual mention, retrieve the prior probability that the mention links to the entity ID.
Example
probability = kb.get_prior_prob("Q42", "Douglas")
| Name | Description | 
|---|---|
| entity | The entity ID. | 
| alias | The textual mention or alias. | 
| RETURNS | The prior probability of the alias referring to the entity. | 
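To make the relationship between aliases, entities and prior probabilities concrete, here is a toy model in pure Python. It is not spaCy's implementation: `ALIAS_TABLE` is a hypothetical nested dictionary, and returning 0.0 for an unknown alias or entity is a choice made for this sketch.

```python
from typing import Dict

# Hypothetical alias table: alias -> {entity ID -> prior probability}.
ALIAS_TABLE: Dict[str, Dict[str, float]] = {
    "Douglas": {"Q42": 0.6, "Q463035": 0.3},
}


def get_prior_prob(entity: str, alias: str) -> float:
    """Look up the prior probability that `alias` refers to `entity`."""
    # Assumption of this sketch: unknown aliases or entities yield 0.0.
    return ALIAS_TABLE.get(alias, {}).get(entity, 0.0)


known_pair = get_prior_prob("Q42", "Douglas")      # 0.6
unknown_pair = get_prior_prob("Q42", "Adams")      # 0.0, alias not in the table
```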
InMemoryLookupKB.to_disk
Save the current state of the knowledge base to a directory.
Example
kb.to_disk(path)
| Name | Description | 
|---|---|
| path | A path to a directory, which will be created if it doesn't exist. Paths may be either strings or Path-like objects. | 
| exclude | List of components to exclude. | 
InMemoryLookupKB.from_disk
Restore the state of the knowledge base from a given directory. Note that the
Vocab should also be the same as the one used to create the KB.
Example
from spacy.kb import InMemoryLookupKB
from spacy.vocab import Vocab
vocab = Vocab().from_disk("/path/to/vocab")
kb = InMemoryLookupKB(vocab=vocab, entity_vector_length=64)
kb.from_disk("/path/to/kb")
| Name | Description | 
|---|---|
| loc | A path to a directory. Paths may be either strings or Path-like objects. | 
| exclude | List of components to exclude. | 
| RETURNS | The modified KnowledgeBase object. |