EL architectures documentation

svlandeg 2020-08-06 17:41:26 +02:00
parent f396f091dc
commit e8fd0c1f1e
2 changed files with 60 additions and 6 deletions

@ -158,9 +158,21 @@ architectures into your training config.
## Entity linking architectures {#entitylinker source="spacy/ml/models/entity_linker.py"}
An Entity Linker component disambiguates textual mentions (tagged as named
entities) to unique identifiers, grounding the named entities into the "real
world". This requires 3 main components:
- A [`KnowledgeBase`](/api/kb) (KB) holding the unique identifiers, potential
synonyms and prior probabilities.
- A candidate generation step to produce a set of likely identifiers, given a
certain textual mention.
- A machine learning [`Model`](https://thinc.ai/docs/api-model) that picks the
most plausible ID from the set of candidates.
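The interplay of these three components can be sketched in plain Python. This is a toy illustration only, not spaCy's implementation: `ToyKB`, `get_candidates`, `link`, the entity IDs and the scoring rule (prior probability times a dot product with a context vector) are all hypothetical stand-ins chosen for the example.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional, Tuple


@dataclass
class ToyKB:
    """Hypothetical stand-in for a knowledge base (not spaCy's KnowledgeBase)."""

    # entity ID -> vector encoding the entity
    vectors: Dict[str, List[float]] = field(default_factory=dict)
    # alias text -> list of (entity ID, prior probability)
    aliases: Dict[str, List[Tuple[str, float]]] = field(default_factory=dict)

    def add_entity(self, entity_id: str, vector: List[float]) -> None:
        self.vectors[entity_id] = vector

    def add_alias(self, alias: str, entity_ids: List[str], priors: List[float]) -> None:
        self.aliases[alias] = list(zip(entity_ids, priors))


def get_candidates(kb: ToyKB, mention: str) -> List[Tuple[str, float]]:
    """Candidate generation: look up the mention text among the known aliases."""
    return kb.aliases.get(mention, [])


def dot(a: List[float], b: List[float]) -> float:
    return sum(x * y for x, y in zip(a, b))


def link(kb: ToyKB, mention: str, context_vector: List[float]) -> Optional[str]:
    """Pick the most plausible ID. The score below is an assumed toy rule,
    standing in for a trained model."""
    candidates = get_candidates(kb, mention)
    if not candidates:
        return None
    best_id, _ = max(
        candidates,
        key=lambda cand: cand[1] * dot(kb.vectors[cand[0]], context_vector),
    )
    return best_id


kb = ToyKB()
kb.add_entity("ENT_A", [1.0, 0.0])  # hypothetical entity IDs
kb.add_entity("ENT_B", [0.0, 1.0])
kb.add_alias("Douglas", ["ENT_A", "ENT_B"], [0.6, 0.4])

prediction = link(kb, "Douglas", [0.1, 0.9])
```

In this sketch the context vector outweighs the higher prior of `ENT_A`, so the mention resolves to `ENT_B`. A mention with no known alias yields `None`.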
### spacy.EntityLinker.v1 {#EntityLinker}
<!-- TODO: intro -->
The `EntityLinker` model architecture is a Thinc `Model` with a linear output
layer.
> #### Example Config
>
@ -170,10 +182,46 @@ architectures into your training config.
> nO = null
>
> [model.tok2vec]
> # ...
> @architectures = "spacy.HashEmbedCNN.v1"
> pretrained_vectors = null
> width = 96
> depth = 2
> embed_size = 300
> window_size = 1
> maxout_pieces = 3
> subword_features = true
> dropout = null
>
> [kb_loader]
> @assets = "spacy.EmptyKB.v1"
> entity_vector_length = 64
>
> [get_candidates]
> @assets = "spacy.CandidateGenerator.v1"
> ```
| Name | Type | Description |
| --------- | ------------------------------------------ | ----------- |
| `tok2vec` | [`Model`](https://thinc.ai/docs/api-model) | |
| `nO` | int | |
| Name | Type | Description |
| --------- | ------------------------------------------ | ---------------------------------------------------------------------------------------- |
| `tok2vec` | [`Model`](https://thinc.ai/docs/api-model) | The [`tok2vec`](#tok2vec) layer of the model. |
| `nO` | int | Output dimension, determined by the length of the vectors encoding each entity in the KB. |
If the `nO` dimension is not set, the Entity Linking component will set it when
`begin_training` is called.
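The deferred sizing described here can be pictured with a small plain-Python sketch. The classes below are hypothetical and only mimic the idea: an unset `nO` is filled in from the KB's entity vector length when `begin_training` runs. That an explicitly configured value is left untouched is an assumption of this sketch.

```python
from typing import Optional


class ToyKB:
    """Hypothetical KB that only records its entity vector length."""

    def __init__(self, entity_vector_length: int = 64):
        self.entity_vector_length = entity_vector_length


class ToyLinkerModel:
    """Hypothetical model whose output dimension may start out unset."""

    def __init__(self, nO: Optional[int] = None):
        self.nO = nO


class ToyEntityLinker:
    """Hypothetical component tying a model to a KB."""

    def __init__(self, model: ToyLinkerModel, kb: ToyKB):
        self.model = model
        self.kb = kb

    def begin_training(self) -> None:
        # Only fill in the output dimension if the config left it unset.
        if self.model.nO is None:
            self.model.nO = self.kb.entity_vector_length


linker = ToyEntityLinker(ToyLinkerModel(nO=None), ToyKB(entity_vector_length=64))
linker.begin_training()
inferred_nO = linker.model.nO
```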
### spacy.EmptyKB.v1 {#EmptyKB}
A function that creates a default, empty [`KnowledgeBase`](/api/kb) from a [`Vocab`](/api/vocab) instance.
| Name | Type | Description |
| ---------------------- | ---- | -------------------------------------------------------- |
| `entity_vector_length` | int | Length of the vectors encoding each entity in the KB. Defaults to `64`. |
### spacy.CandidateGenerator.v1 {#CandidateGenerator}
A function that takes as input a [`KnowledgeBase`](/api/kb) and a [`Span`](/api/span) object denoting a
named entity, and returns a list of plausible
[`Candidate` objects](/api/kb/#candidate_init).
The default `CandidateGenerator` simply uses the text of a mention to find its
potential aliases in the knowledge base. Note that this function is
case-sensitive: a mention only matches an alias with identical casing.
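To illustrate why letter case matters for this kind of lookup, here is a minimal plain-Python sketch. The alias table, the entity IDs and both functions are hypothetical and do not use spaCy's API. The second function shows one possible case-insensitive variant that a custom candidate generator could adopt.

```python
from typing import Dict, List

# Hypothetical alias table: alias text -> candidate entity IDs
ALIASES: Dict[str, List[str]] = {"Paris": ["ENT_CITY", "ENT_PERSON"]}


def get_candidates(aliases: Dict[str, List[str]], mention_text: str) -> List[str]:
    """Exact-text lookup, so 'paris' does not match the alias 'Paris'."""
    return aliases.get(mention_text, [])


def get_candidates_lowercase(aliases: Dict[str, List[str]], mention_text: str) -> List[str]:
    """Assumed alternative: compare lowercased forms of alias and mention."""
    lowered: Dict[str, List[str]] = {}
    for alias, entity_ids in aliases.items():
        lowered.setdefault(alias.lower(), []).extend(entity_ids)
    return lowered.get(mention_text.lower(), [])


exact_hit = get_candidates(ALIASES, "Paris")
exact_miss = get_candidates(ALIASES, "paris")
relaxed_hit = get_candidates_lowercase(ALIASES, "paris")
```

The relaxed variant trades precision for recall: it recovers lowercased mentions, but it also merges aliases that differ only in casing.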

@ -9,6 +9,12 @@ api_string_name: entity_linker
api_trainable: true
---
An Entity Linker component disambiguates textual mentions (tagged as named
entities) to unique identifiers, grounding the named entities into the "real
world". It requires a knowledge base, a function to generate plausible
candidates from that knowledge base given a certain textual mention, and a
machine learning model to pick the right candidate, given the local context of
the mention.
## Config and implementation {#config}
The default config is defined by the pipeline component factory and describes