Few more updates to the EL documentation

This commit is contained in:
svlandeg 2020-04-30 10:17:06 +02:00
parent 8602daba85
commit ebaed7dcfa
2 changed files with 24 additions and 23 deletions


@@ -64,7 +64,7 @@ def main(kb_path, vocab_path=None, output_dir=None, n_iter=50):
     """Create a blank model with the specified vocab, set up the pipeline and train the entity linker.
     The `vocab` should be the one used during creation of the KB."""
     vocab = Vocab().from_disk(vocab_path)
-    # create blank Language class with correct vocab
+    # create blank English model with correct vocab
    nlp = spacy.blank("en", vocab=vocab)
    nlp.vocab.vectors.name = "spacy_pretrained_vectors"
    print("Created blank 'en' model with vocab from '%s'" % vocab_path)


@@ -619,25 +619,24 @@ https://github.com/explosion/spaCy/tree/master/examples/training/create_kb.py

 #### Step by step guide {#step-by-step-kb}

-1. **Load the model** you want to start with, or create an **empty model** using
-   [`spacy.blank`](/api/top-level#spacy.blank) with the ID of your language and
-   a pre-defined [`vocab`](/api/vocab) object.
-2. **Pretrain the entity embeddings** by running the descriptions of the
-   entities through a simple encoder-decoder network. The current implementation
-   requires the `nlp` model to have access to pretrained word embeddings, but a
-   custom implementation of this encoding step can also be used.
-3. **Construct the KB** by defining all entities with their pretrained vectors,
-   and all aliases with their prior probabilities.
+1. **Load the model** you want to start with. It should contain pretrained word
+   vectors.
+2. **Obtain the entity embeddings** by running the descriptions of the entities
+   through the `nlp` model and taking the average of all words with
+   `nlp(desc).vector`. At this point, a custom encoding step can also be used.
+3. **Construct the KB** by defining all entities with their embeddings, and all
+   aliases with their prior probabilities.
 4. **Save** the KB using [`kb.dump`](/api/kb#dump).
-5. **Test** the KB to make sure the entities were added correctly.
+5. **Print** the contents of the KB to make sure the entities were added
+   correctly.
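The core of steps 2 and 3 in the updated guide is averaging word vectors into an entity embedding and pairing aliases with prior probabilities. A minimal framework-free sketch of that idea, with made-up toy vectors and entity IDs (the real workflow uses the spaCy `KnowledgeBase` API from `create_kb.py`):

```python
# Sketch of steps 2-3: entity embeddings as the average of per-word
# vectors, plus alias entries with prior probabilities. The vectors and
# entity IDs below are illustrative, not real data.

def average_embedding(desc, word_vectors):
    """Average the vectors of all known words in a description."""
    vectors = [word_vectors[w] for w in desc.split() if w in word_vectors]
    if not vectors:
        return None
    dim = len(vectors[0])
    return [sum(v[i] for v in vectors) / len(vectors) for i in range(dim)]

# Toy word vectors; a real setup would use the model's pretrained vectors.
word_vectors = {
    "golfer": [1.0, 0.0],
    "publisher": [0.0, 1.0],
}

# Toy KB contents: entity ID -> embedding, alias -> candidate priors.
entities = {
    "Q1": average_embedding("a famous golfer", word_vectors),
    "Q2": average_embedding("a comic book publisher", word_vectors),
}
# The prior probabilities for one alias should sum to at most 1.
aliases = {"Russ Cochran": {"Q1": 0.24, "Q2": 0.7}}
```

The averaging here mirrors what `nlp(desc).vector` produces for a description when the model has word vectors but no trained encoder.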

 ### Training an entity linking model {#entity-linker-model}

 This example shows how to create an entity linker pipe using a previously
-created knowledge base. The entity linker pipe is then trained with your own
-examples. To do so, you'll need to provide **example texts**, and the
-**character offsets** and **knowledge base identifiers** of each entity
-contained in the texts.
+created knowledge base. The entity linker is then trained with a set of custom
+examples. To do so, you need to provide **example texts**, and the **character
+offsets** and **knowledge base identifiers** of each entity contained in the
+texts.

 ```python
 https://github.com/explosion/spaCy/tree/master/examples/training/train_entity_linker.py
@@ -647,14 +646,16 @@ https://github.com/explosion/spaCy/tree/master/examples/training/train_entity_li

 1. **Load the KB** you want to start with, and specify the path to the `Vocab`
    object that was used to create this KB. Then, create an **empty model** using
-   [`spacy.blank`](/api/top-level#spacy.blank) with the ID of your language.
-   Don't forget to add the KB to the entity linker, and to add the entity linker
-   to the pipeline. In practical applications, you will want a more advanced
-   pipeline including also a component for
-   [named entity recognition](/usage/training#ner). If you're using a model with
-   additional components, make sure to disable all other pipeline components
-   during training using [`nlp.disable_pipes`](/api/language#disable_pipes).
-   This way, you'll only be training the entity linker.
+   [`spacy.blank`](/api/top-level#spacy.blank) with the ID of your language. Add
+   a component for recognizing sentences and one for identifying relevant
+   entities. In practical applications, you will want a more advanced pipeline
+   including also a component for
+   [named entity recognition](/usage/training#ner). Then, create a new entity
+   linker component, add the KB to it, and then add the entity linker to the
+   pipeline. If you're using a model with additional components, make sure to
+   disable all other pipeline components during training using
+   [`nlp.disable_pipes`](/api/language#disable_pipes). This way, you'll only be
+   training the entity linker.
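The "example texts with character offsets and knowledge base identifiers" mentioned above can be sketched in pure Python. The sentences and QIDs below are illustrative, and the dict layout (a `"links"` dict keyed by offset tuples) follows the format used in spaCy's `train_entity_linker.py` example:

```python
# Sketch of entity-linking training examples: raw texts plus the character
# offsets and KB identifiers of each gold entity. QIDs are illustrative.
TRAIN_DATA = [
    ("Russ Cochran his reprints include EC Comics.",
     {"links": {(0, 12): {"Q7381115": 1.0}}}),
    ("Russ Cochran was a member of University of Kentucky's golf team.",
     {"links": {(0, 12): {"Q2146908": 1.0}}}),
]

# Sanity-check that every offset pair actually covers the mention text.
for text, annotation in TRAIN_DATA:
    for (start, end) in annotation["links"]:
        assert text[start:end] == "Russ Cochran"
```

Checking offsets against the raw text like this catches the most common annotation bug (off-by-one spans) before any training starts.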
 2. **Shuffle and loop over** the examples. For each example, **update the
    model** by calling [`nlp.update`](/api/language#update), which steps through
    the annotated examples of the input. For each combination of a mention in