mirror of https://github.com/explosion/spaCy.git
synced 2024-12-25 01:16:28 +03:00

Few more updates to the EL documentation

This commit is contained in:
parent 8602daba85
commit ebaed7dcfa
@@ -64,7 +64,7 @@ def main(kb_path, vocab_path=None, output_dir=None, n_iter=50):
     """Create a blank model with the specified vocab, set up the pipeline and train the entity linker.
     The `vocab` should be the one used during creation of the KB."""
     vocab = Vocab().from_disk(vocab_path)
-    # create blank Language class with correct vocab
+    # create blank English model with correct vocab
     nlp = spacy.blank("en", vocab=vocab)
     nlp.vocab.vectors.name = "spacy_pretrained_vectors"
     print("Created blank 'en' model with vocab from '%s'" % vocab_path)
|
@@ -619,25 +619,24 @@ https://github.com/explosion/spaCy/tree/master/examples/training/create_kb.py

 #### Step by step guide {#step-by-step-kb}

-1. **Load the model** you want to start with, or create an **empty model** using
-   [`spacy.blank`](/api/top-level#spacy.blank) with the ID of your language and
-   a pre-defined [`vocab`](/api/vocab) object.
-2. **Pretrain the entity embeddings** by running the descriptions of the
-   entities through a simple encoder-decoder network. The current implementation
-   requires the `nlp` model to have access to pretrained word embeddings, but a
-   custom implementation of this encoding step can also be used.
-3. **Construct the KB** by defining all entities with their pretrained vectors,
-   and all aliases with their prior probabilities.
+1. **Load the model** you want to start with. It should contain pretrained word
+   vectors.
+2. **Obtain the entity embeddings** by running the descriptions of the entities
+   through the `nlp` model and taking the average of all words with
+   `nlp(desc).vector`. At this point, a custom encoding step can also be used.
+3. **Construct the KB** by defining all entities with their embeddings, and all
+   aliases with their prior probabilities.
 4. **Save** the KB using [`kb.dump`](/api/kb#dump).
-5. **Test** the KB to make sure the entities were added correctly.
+5. **Print** the contents of the KB to make sure the entities were added
+   correctly.

 ### Training an entity linking model {#entity-linker-model}

 This example shows how to create an entity linker pipe using a previously
-created knowledge base. The entity linker pipe is then trained with your own
-examples. To do so, you'll need to provide **example texts**, and the
-**character offsets** and **knowledge base identifiers** of each entity
-contained in the texts.
+created knowledge base. The entity linker is then trained with a set of custom
+examples. To do so, you need to provide **example texts**, and the **character
+offsets** and **knowledge base identifiers** of each entity contained in the
+texts.

 ```python
 https://github.com/explosion/spaCy/tree/master/examples/training/train_entity_linker.py
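The embedding step in the updated guide (step 2) is just an average of word vectors. Here is a minimal sketch of that averaging in plain NumPy, using a toy vector table as an illustrative stand-in for `nlp.vocab.vectors` (the words and values are made up, not spaCy's actual data):

```python
import numpy as np

# Toy word-vector table standing in for a model's pretrained vectors
# (illustrative values only)
vectors = {
    "douglas": np.array([1.0, 0.0, 2.0]),
    "adams": np.array([3.0, 2.0, 0.0]),
    "author": np.array([2.0, 4.0, 4.0]),
}

def desc_vector(desc):
    """Average the word vectors of a description, like nlp(desc).vector does."""
    toks = [vectors[t] for t in desc.lower().split() if t in vectors]
    return np.mean(toks, axis=0)

print(desc_vector("Douglas Adams author"))  # -> [2. 2. 2.]
```

The resulting vector would then be passed as the pretrained entity vector when adding the entity to the knowledge base.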
@@ -647,14 +646,16 @@ https://github.com/explosion/spaCy/tree/master/examples/training/train_entity_linker.py

 1. **Load the KB** you want to start with, and specify the path to the `Vocab`
    object that was used to create this KB. Then, create an **empty model** using
-   [`spacy.blank`](/api/top-level#spacy.blank) with the ID of your language.
-   Don't forget to add the KB to the entity linker, and to add the entity linker
-   to the pipeline. In practical applications, you will want a more advanced
-   pipeline including also a component for
-   [named entity recognition](/usage/training#ner). If you're using a model with
-   additional components, make sure to disable all other pipeline components
-   during training using [`nlp.disable_pipes`](/api/language#disable_pipes).
-   This way, you'll only be training the entity linker.
+   [`spacy.blank`](/api/top-level#spacy.blank) with the ID of your language. Add
+   a component for recognizing sentences and one for identifying relevant
+   entities. In practical applications, you will want a more advanced pipeline
+   that also includes a component for
+   [named entity recognition](/usage/training#ner). Then, create a new entity
+   linker component, add the KB to it, and then add the entity linker to the
+   pipeline. If you're using a model with additional components, make sure to
+   disable all other pipeline components during training using
+   [`nlp.disable_pipes`](/api/language#disable_pipes). This way, you'll only be
+   training the entity linker.
 2. **Shuffle and loop over** the examples. For each example, **update the
    model** by calling [`nlp.update`](/api/language#update), which steps through
    the annotated examples of the input. For each combination of a mention in
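The shuffle-and-loop pattern in step 2 can be sketched as a generic training loop. This is a hedged illustration with a stand-in `update` callback rather than a real spaCy pipeline; `TRAIN_DATA`, the KB identifier `Q42`, and the offsets are hypothetical examples of the text/offset/identifier format the guide describes:

```python
import random

# Hypothetical examples: text plus character offsets mapped to KB identifiers
TRAIN_DATA = [
    ("Douglas Adams wrote a book.", {"links": {(0, 13): {"Q42": 1.0}}}),
    ("Adams was a humorist.", {"links": {(0, 5): {"Q42": 1.0}}}),
]

def train_loop(update, data, n_iter=10, seed=0):
    """Shuffle and loop over the examples, calling `update` once per example.

    `update` stands in for nlp.update(texts, annotations, losses=losses)."""
    rng = random.Random(seed)
    data = list(data)
    n_updates = 0
    for _ in range(n_iter):
        rng.shuffle(data)
        losses = {}  # nlp.update would accumulate per-component losses here
        for text, annotations in data:
            update([text], [annotations], losses)
            n_updates += 1
    return n_updates

calls = train_loop(lambda texts, anns, losses: None, TRAIN_DATA, n_iter=3)
print(calls)  # -> 6
```

In the real example script, the callback is `nlp.update` with all pipes except the entity linker disabled, so only the linker's weights change.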
|