diff --git a/examples/training/train_entity_linker.py b/examples/training/train_entity_linker.py index c7eba8a30..3a8deb7a0 100644 --- a/examples/training/train_entity_linker.py +++ b/examples/training/train_entity_linker.py @@ -64,7 +64,7 @@ def main(kb_path, vocab_path=None, output_dir=None, n_iter=50): """Create a blank model with the specified vocab, set up the pipeline and train the entity linker. The `vocab` should be the one used during creation of the KB.""" vocab = Vocab().from_disk(vocab_path) - # create blank Language class with correct vocab + # create blank English model with correct vocab nlp = spacy.blank("en", vocab=vocab) nlp.vocab.vectors.name = "spacy_pretrained_vectors" print("Created blank 'en' model with vocab from '%s'" % vocab_path) diff --git a/website/docs/usage/training.md b/website/docs/usage/training.md index ecdc6720b..0be14df69 100644 --- a/website/docs/usage/training.md +++ b/website/docs/usage/training.md @@ -619,25 +619,24 @@ https://github.com/explosion/spaCy/tree/master/examples/training/create_kb.py #### Step by step guide {#step-by-step-kb} -1. **Load the model** you want to start with, or create an **empty model** using - [`spacy.blank`](/api/top-level#spacy.blank) with the ID of your language and - a pre-defined [`vocab`](/api/vocab) object. -2. **Pretrain the entity embeddings** by running the descriptions of the - entities through a simple encoder-decoder network. The current implementation - requires the `nlp` model to have access to pretrained word embeddings, but a - custom implementation of this encoding step can also be used. -3. **Construct the KB** by defining all entities with their pretrained vectors, - and all aliases with their prior probabilities. +1. **Load the model** you want to start with. It should contain pretrained word + vectors. +2. **Obtain the entity embeddings** by running the descriptions of the entities + through the `nlp` model and taking the average of all words with + `nlp(desc).vector`. 
At this point, a custom encoding step can also be used. +3. **Construct the KB** by defining all entities with their embeddings, and all + aliases with their prior probabilities. 4. **Save** the KB using [`kb.dump`](/api/kb#dump). -5. **Test** the KB to make sure the entities were added correctly. +5. **Print** the contents of the KB to make sure the entities were added + correctly. ### Training an entity linking model {#entity-linker-model} This example shows how to create an entity linker pipe using a previously -created knowledge base. The entity linker pipe is then trained with your own -examples. To do so, you'll need to provide **example texts**, and the -**character offsets** and **knowledge base identifiers** of each entity -contained in the texts. +created knowledge base. The entity linker is then trained with a set of custom +examples. To do so, you need to provide **example texts**, and the **character +offsets** and **knowledge base identifiers** of each entity contained in the +texts. ```python https://github.com/explosion/spaCy/tree/master/examples/training/train_entity_linker.py @@ -647,14 +646,16 @@ https://github.com/explosion/spaCy/tree/master/examples/training/train_entity_li 1. **Load the KB** you want to start with, and specify the path to the `Vocab` object that was used to create this KB. Then, create an **empty model** using - [`spacy.blank`](/api/top-level#spacy.blank) with the ID of your language. - Don't forget to add the KB to the entity linker, and to add the entity linker - to the pipeline. In practical applications, you will want a more advanced - pipeline including also a component for - [named entity recognition](/usage/training#ner). If you're using a model with - additional components, make sure to disable all other pipeline components - during training using [`nlp.disable_pipes`](/api/language#disable_pipes). - This way, you'll only be training the entity linker. 
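The averaging in step 2 of the KB guide above can be sketched with plain NumPy; the toy two-dimensional vectors and the `description_vector` helper are hypothetical stand-ins for the model's pretrained vectors and `nlp(desc).vector`:

```python
import numpy as np

# Toy word vectors standing in for a model's pretrained vectors
# (names and values are hypothetical).
vectors = {
    "american": np.array([0.1, 0.3]),
    "golfer": np.array([0.5, -0.1]),
}

def description_vector(desc):
    """Average the word vectors of a description, as `nlp(desc).vector` does."""
    tokens = [vectors[t] for t in desc.lower().split() if t in vectors]
    return np.mean(tokens, axis=0)

print(description_vector("American golfer"))  # elementwise mean: ~[0.3, 0.1]
```

A real run replaces the toy lookup with the loaded model's vocabulary, so each entity embedding has the same dimensionality as the model's word vectors.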
+ [`spacy.blank`](/api/top-level#spacy.blank) with the ID of your language. Add + a component for recognizing sentences and one for identifying relevant + entities. In practical applications, you will want a more advanced pipeline + that also includes a component for + [named entity recognition](/usage/training#ner). Then, create a new entity + linker component, add the KB to it, and add the entity linker to the + pipeline. If you're using a model with additional components, make sure to + disable all other pipeline components during training using + [`nlp.disable_pipes`](/api/language#disable_pipes). This way, you'll only be + training the entity linker. 2. **Shuffle and loop over** the examples. For each example, **update the model** by calling [`nlp.update`](/api/language#update), which steps through the annotated examples of the input. For each combination of a mention in
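The shuffle-and-loop pattern described in step 2 can be sketched as follows; `minibatch` mirrors the fixed-size batching of `spacy.util.minibatch`, the training pairs are hypothetical, and a stand-in `update` function tallies examples where the real script would call `nlp.update` on each batch:

```python
import random

def minibatch(items, size):
    """Yield successive fixed-size batches, like spacy.util.minibatch."""
    for i in range(0, len(items), size):
        yield items[i:i + size]

# Hypothetical (text, annotation) pairs: character offsets of each mention
# mapped to knowledge base identifiers.
TRAIN_DATA = [
    ("Russ Cochran captured his first major title.",
     {"links": {(0, 12): {"Q2146908": 1.0}}}),
    ("Russ Cochran ran for governor.",
     {"links": {(0, 12): {"Q7381115": 1.0}}}),
]

for itn in range(5):
    random.shuffle(TRAIN_DATA)
    losses = {}
    for batch in minibatch(TRAIN_DATA, size=2):
        texts, annotations = zip(*batch)
        # The real script would call nlp.update here with the batch and the
        # losses dict; this stand-in just counts the examples seen per iteration.
        losses["entity_linker"] = losses.get("entity_linker", 0.0) + len(texts)
    print("Iteration", itn, losses)
```

Shuffling each iteration prevents the model from memorizing example order, and tracking per-component losses makes it easy to see whether the entity linker is still improving.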