Few more updates to the EL documentation

This commit is contained in:
svlandeg 2020-04-30 10:17:06 +02:00
parent 8602daba85
commit ebaed7dcfa
2 changed files with 24 additions and 23 deletions


@@ -64,7 +64,7 @@ def main(kb_path, vocab_path=None, output_dir=None, n_iter=50):
     """Create a blank model with the specified vocab, set up the pipeline and train the entity linker.
     The `vocab` should be the one used during creation of the KB."""
     vocab = Vocab().from_disk(vocab_path)
-    # create blank Language class with correct vocab
+    # create blank English model with correct vocab
    nlp = spacy.blank("en", vocab=vocab)
    nlp.vocab.vectors.name = "spacy_pretrained_vectors"
    print("Created blank 'en' model with vocab from '%s'" % vocab_path)


@@ -619,25 +619,24 @@ https://github.com/explosion/spaCy/tree/master/examples/training/create_kb.py

 #### Step by step guide {#step-by-step-kb}

-1. **Load the model** you want to start with, or create an **empty model** using
-   [`spacy.blank`](/api/top-level#spacy.blank) with the ID of your language and
-   a pre-defined [`vocab`](/api/vocab) object.
-2. **Pretrain the entity embeddings** by running the descriptions of the
-   entities through a simple encoder-decoder network. The current implementation
-   requires the `nlp` model to have access to pretrained word embeddings, but a
-   custom implementation of this encoding step can also be used.
-3. **Construct the KB** by defining all entities with their pretrained vectors,
-   and all aliases with their prior probabilities.
+1. **Load the model** you want to start with. It should contain pretrained word
+   vectors.
+2. **Obtain the entity embeddings** by running the descriptions of the entities
+   through the `nlp` model and taking the average of all words with
+   `nlp(desc).vector`. At this point, a custom encoding step can also be used.
+3. **Construct the KB** by defining all entities with their embeddings, and all
+   aliases with their prior probabilities.
 4. **Save** the KB using [`kb.dump`](/api/kb#dump).
-5. **Test** the KB to make sure the entities were added correctly.
+5. **Print** the contents of the KB to make sure the entities were added
+   correctly.
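The core of steps 2 and 3 in the updated guide is averaging word vectors into an entity embedding and pairing aliases with prior probabilities. A minimal framework-free sketch of that idea, with made-up toy vectors and entity IDs (the real workflow uses the spaCy `KnowledgeBase` API from `create_kb.py`):

```python
# Sketch of steps 2-3: entity embeddings as the average of per-word
# vectors, plus alias entries with prior probabilities. The vectors and
# entity IDs below are illustrative, not real data.

def average_embedding(desc, word_vectors):
    """Average the vectors of all known words in a description."""
    vectors = [word_vectors[w] for w in desc.split() if w in word_vectors]
    if not vectors:
        return None
    dim = len(vectors[0])
    return [sum(v[i] for v in vectors) / len(vectors) for i in range(dim)]

# Toy word vectors; a real setup would use the model's pretrained vectors.
word_vectors = {
    "golfer": [1.0, 0.0],
    "publisher": [0.0, 1.0],
}

# Toy KB contents: entity ID -> embedding, alias -> candidate priors.
entities = {
    "Q1": average_embedding("a famous golfer", word_vectors),
    "Q2": average_embedding("a comic book publisher", word_vectors),
}
# The prior probabilities for one alias should sum to at most 1.
aliases = {"Russ Cochran": {"Q1": 0.24, "Q2": 0.7}}
```

The averaging here mirrors what `nlp(desc).vector` produces for a description when the model has word vectors but no trained encoder.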

 ### Training an entity linking model {#entity-linker-model}

 This example shows how to create an entity linker pipe using a previously
-created knowledge base. The entity linker pipe is then trained with your own
-examples. To do so, you'll need to provide **example texts**, and the
-**character offsets** and **knowledge base identifiers** of each entity
-contained in the texts.
+created knowledge base. The entity linker is then trained with a set of custom
+examples. To do so, you need to provide **example texts**, and the **character
+offsets** and **knowledge base identifiers** of each entity contained in the
+texts.

 ```python
 https://github.com/explosion/spaCy/tree/master/examples/training/train_entity_linker.py
@@ -647,14 +646,16 @@ https://github.com/explosion/spaCy/tree/master/examples/training/train_entity_li

 1. **Load the KB** you want to start with, and specify the path to the `Vocab`
    object that was used to create this KB. Then, create an **empty model** using
-   [`spacy.blank`](/api/top-level#spacy.blank) with the ID of your language.
-   Don't forget to add the KB to the entity linker, and to add the entity linker
-   to the pipeline. In practical applications, you will want a more advanced
-   pipeline including also a component for
-   [named entity recognition](/usage/training#ner). If you're using a model with
-   additional components, make sure to disable all other pipeline components
-   during training using [`nlp.disable_pipes`](/api/language#disable_pipes).
-   This way, you'll only be training the entity linker.
+   [`spacy.blank`](/api/top-level#spacy.blank) with the ID of your language. Add
+   a component for recognizing sentences and one for identifying relevant
+   entities. In practical applications, you will want a more advanced pipeline
+   including also a component for
+   [named entity recognition](/usage/training#ner). Then, create a new entity
+   linker component, add the KB to it, and then add the entity linker to the
+   pipeline. If you're using a model with additional components, make sure to
+   disable all other pipeline components during training using
+   [`nlp.disable_pipes`](/api/language#disable_pipes). This way, you'll only be
+   training the entity linker.
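The "example texts with character offsets and knowledge base identifiers" mentioned above can be sketched in pure Python. The sentences and QIDs below are illustrative, and the dict layout (a `"links"` dict keyed by offset tuples) follows the format used in spaCy's `train_entity_linker.py` example:

```python
# Sketch of entity-linking training examples: raw texts plus the character
# offsets and KB identifiers of each gold entity. QIDs are illustrative.
TRAIN_DATA = [
    ("Russ Cochran his reprints include EC Comics.",
     {"links": {(0, 12): {"Q7381115": 1.0}}}),
    ("Russ Cochran was a member of University of Kentucky's golf team.",
     {"links": {(0, 12): {"Q2146908": 1.0}}}),
]

# Sanity-check that every offset pair actually covers the mention text.
for text, annotation in TRAIN_DATA:
    for (start, end) in annotation["links"]:
        assert text[start:end] == "Russ Cochran"
```

Checking offsets against the raw text like this catches the most common annotation bug (off-by-one spans) before any training starts.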
 2. **Shuffle and loop over** the examples. For each example, **update the
    model** by calling [`nlp.update`](/api/language#update), which steps through
    the annotated examples of the input. For each combination of a mention in