Few more updates to the EL documentation

This commit is contained in:
svlandeg 2020-04-30 10:17:06 +02:00
parent 8602daba85
commit ebaed7dcfa
2 changed files with 24 additions and 23 deletions

@ -64,7 +64,7 @@ def main(kb_path, vocab_path=None, output_dir=None, n_iter=50):
"""Create a blank model with the specified vocab, set up the pipeline and train the entity linker.
The `vocab` should be the one used during creation of the KB."""
vocab = Vocab().from_disk(vocab_path)
    # create blank English model with correct vocab
nlp = spacy.blank("en", vocab=vocab)
nlp.vocab.vectors.name = "spacy_pretrained_vectors"
print("Created blank 'en' model with vocab from '%s'" % vocab_path)

@ -619,25 +619,24 @@ https://github.com/explosion/spaCy/tree/master/examples/training/create_kb.py
#### Step by step guide {#step-by-step-kb}
1. **Load the model** you want to start with. It should contain pretrained word
   vectors.
2. **Obtain the entity embeddings** by running the descriptions of the entities
   through the `nlp` model and taking the average of all words with
   `nlp(desc).vector`. At this point, a custom encoding step can also be used.
3. **Construct the KB** by defining all entities with their embeddings, and all
   aliases with their prior probabilities.
4. **Save** the KB using [`kb.dump`](/api/kb#dump).
5. **Print** the contents of the KB to make sure the entities were added
   correctly.

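Steps 2 and 3 above can be sketched without spaCy at all. The snippet below is a minimal, library-free illustration of the two computations involved: averaging word vectors into an entity embedding (which `nlp(desc).vector` does for you) and turning mention counts into the prior probabilities stored per alias. The toy vectors, counts, and QIDs are made up for illustration.

```python
# Library-free sketch of steps 2-3: average word vectors into an entity
# embedding, and convert per-alias mention counts into prior probabilities.
# All numbers and QIDs below are illustrative, not real data.

def average_vector(token_vectors):
    """Average a list of equal-length word vectors into one entity embedding."""
    dim = len(token_vectors[0])
    n = len(token_vectors)
    return [sum(vec[i] for vec in token_vectors) / n for i in range(dim)]

def prior_probabilities(mention_counts):
    """Convert {entity_id: count} for one alias into {entity_id: prior}."""
    total = sum(mention_counts.values())
    return {entity: count / total for entity, count in mention_counts.items()}

# Two toy 3-dimensional "word vectors" for a two-word entity description
desc_vectors = [[1.0, 0.0, 2.0], [3.0, 2.0, 0.0]]
entity_embedding = average_vector(desc_vectors)  # [2.0, 1.0, 1.0]

# Suppose one alias was linked 7 times to one entity and 3 times to another
priors = prior_probabilities({"Q41852": 7, "Q463035": 3})  # 0.7 and 0.3
```

In the real pipeline, the embedding and priors computed this way are what get passed to the KB when adding entities and aliases.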
### Training an entity linking model {#entity-linker-model}
This example shows how to create an entity linker pipe using a previously
created knowledge base. The entity linker is then trained with a set of custom
examples. To do so, you need to provide **example texts**, and the **character
offsets** and **knowledge base identifiers** of each entity contained in the
texts.

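As a sketch of what such examples look like, the snippet below builds one training example in the `(text, {"links": ...})` shape used by the linked example script, where each character-offset span maps to a dictionary of KB identifiers and their gold-standard probabilities. The text and QID here follow the example script; treat the exact values as illustrative.

```python
# One entity-linking training example: a text plus, for each mention span
# (given as character offsets), the KB identifiers with gold probabilities.

text = "Russ Cochran his reprints include EC Comics."
annotation = {
    "links": {
        (0, 12): {"Q7381115": 1.0}  # mention span -> {KB id: probability}
    }
}
TRAIN_DATA = [(text, annotation)]

# Sanity-check that the offsets really span the intended mention
start, end = next(iter(annotation["links"]))
mention = text[start:end]  # "Russ Cochran"
```

Getting the character offsets exactly right matters: they must delimit the mention precisely, or the linker cannot match the annotation to a recognized entity.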
```python
https://github.com/explosion/spaCy/tree/master/examples/training/train_entity_linker.py
@ -647,14 +646,16 @@ https://github.com/explosion/spaCy/tree/master/examples/training/train_entity_li
1. **Load the KB** you want to start with, and specify the path to the `Vocab`
object that was used to create this KB. Then, create an **empty model** using
[`spacy.blank`](/api/top-level#spacy.blank) with the ID of your language. Add
a component for recognizing sentences and one for identifying relevant
entities. In practical applications, you will want a more advanced pipeline
that also includes a component for
[named entity recognition](/usage/training#ner). Then, create a new entity
linker component, add the KB to it, and add the entity linker to the
pipeline. If you're using a model with additional components, make sure to
disable all other pipeline components during training using
[`nlp.disable_pipes`](/api/language#disable_pipes). This way, you'll only be
training the entity linker.
2. **Shuffle and loop over** the examples. For each example, **update the
model** by calling [`nlp.update`](/api/language#update), which steps through
the annotated examples of the input. For each combination of a mention in
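The shuffle-and-loop pattern from step 2 can be sketched independently of spaCy. Below, `update_model` is a stand-in for `nlp.update`; the fake version used at the end merely counts the examples it sees, so the loop structure itself can be checked.

```python
# Library-free skeleton of the training loop from step 2: reshuffle the
# examples each iteration and call an update function on each example.
import random

def train(train_data, n_iter, update_model, seed=0):
    random.seed(seed)
    losses_per_iter = []
    for _ in range(n_iter):
        random.shuffle(train_data)  # reshuffle examples every iteration
        losses = {}
        for text, annotation in train_data:
            update_model([text], [annotation], losses)  # one update step
        losses_per_iter.append(dict(losses))
    return losses_per_iter

# Stand-in for nlp.update that accumulates a fake "loss" (an example count)
def fake_update(texts, annotations, losses):
    losses["entity_linker"] = losses.get("entity_linker", 0) + len(texts)

history = train([("text A", {}), ("text B", {})], n_iter=3,
                update_model=fake_update)
```

With a real pipeline, the update call would be `nlp.update` with a dropout rate and a shared `losses` dictionary, but the surrounding loop is the same.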