mirror of https://github.com/explosion/spaCy.git
synced 2025-01-12 18:26:30 +03:00

Few more updates to the EL documentation

This commit is contained in:
parent 8602daba85
commit ebaed7dcfa
@@ -64,7 +64,7 @@ def main(kb_path, vocab_path=None, output_dir=None, n_iter=50):
     """Create a blank model with the specified vocab, set up the pipeline and train the entity linker.
     The `vocab` should be the one used during creation of the KB."""
     vocab = Vocab().from_disk(vocab_path)
-    # create blank Language class with correct vocab
+    # create blank English model with correct vocab
     nlp = spacy.blank("en", vocab=vocab)
     nlp.vocab.vectors.name = "spacy_pretrained_vectors"
     print("Created blank 'en' model with vocab from '%s'" % vocab_path)
@@ -619,25 +619,24 @@ https://github.com/explosion/spaCy/tree/master/examples/training/create_kb.py

 #### Step by step guide {#step-by-step-kb}

-1. **Load the model** you want to start with, or create an **empty model** using
-   [`spacy.blank`](/api/top-level#spacy.blank) with the ID of your language and
-   a pre-defined [`vocab`](/api/vocab) object.
-2. **Pretrain the entity embeddings** by running the descriptions of the
-   entities through a simple encoder-decoder network. The current implementation
-   requires the `nlp` model to have access to pretrained word embeddings, but a
-   custom implementation of this encoding step can also be used.
-3. **Construct the KB** by defining all entities with their pretrained vectors,
-   and all aliases with their prior probabilities.
+1. **Load the model** you want to start with. It should contain pretrained word
+   vectors.
+2. **Obtain the entity embeddings** by running the descriptions of the entities
+   through the `nlp` model and taking the average of all words with
+   `nlp(desc).vector`. At this point, a custom encoding step can also be used.
+3. **Construct the KB** by defining all entities with their embeddings, and all
+   aliases with their prior probabilities.
 4. **Save** the KB using [`kb.dump`](/api/kb#dump).
-5. **Test** the KB to make sure the entities were added correctly.
+5. **Print** the contents of the KB to make sure the entities were added
+   correctly.

 ### Training an entity linking model {#entity-linker-model}

 This example shows how to create an entity linker pipe using a previously
-created knowledge base. The entity linker pipe is then trained with your own
-examples. To do so, you'll need to provide **example texts**, and the
-**character offsets** and **knowledge base identifiers** of each entity
-contained in the texts.
+created knowledge base. The entity linker is then trained with a set of custom
+examples. To do so, you need to provide **example texts**, and the **character
+offsets** and **knowledge base identifiers** of each entity contained in the
+texts.

 ```python
 https://github.com/explosion/spaCy/tree/master/examples/training/train_entity_linker.py
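Step 2 of the updated guide builds each entity embedding by averaging the token vectors of the entity's description (`nlp(desc).vector`), and step 3 attaches prior probabilities to aliases. A minimal NumPy sketch of both computations, using a toy vector table in place of a real spaCy vocab — the `TOY_VECTORS` table, the `embed_description` helper, the entity IDs, and the counts are all invented for illustration:

```python
import numpy as np

# Toy word-vector table standing in for pretrained spaCy vectors
# (illustrative only; a real model would supply these).
TOY_VECTORS = {
    "douglas": np.array([1.0, 0.0, 0.0]),
    "adams":   np.array([0.0, 1.0, 0.0]),
    "author":  np.array([0.0, 0.0, 1.0]),
}

def embed_description(desc, vectors=TOY_VECTORS):
    """Average the vectors of all known tokens, mimicking `nlp(desc).vector`."""
    tokens = [vectors[t] for t in desc.lower().split() if t in vectors]
    if not tokens:
        return np.zeros(3)
    return np.mean(tokens, axis=0)

vec = embed_description("Douglas Adams author")
print(vec)  # average of the three toy vectors

# Step 3: prior probabilities for one alias, normalized from raw
# occurrence counts (entity IDs and counts are hypothetical).
counts = {"Q42": 12, "Q123": 4}
total = sum(counts.values())
priors = {ent: c / total for ent, c in counts.items()}
print(priors)  # {'Q42': 0.75, 'Q123': 0.25}
```

The averaging is intentionally naive; as the guide notes, any custom encoding step can replace it as long as every entity ends up with a fixed-length vector.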
@@ -647,14 +646,16 @@ https://github.com/explosion/spaCy/tree/master/examples/training/train_entity_linker.py

 1. **Load the KB** you want to start with, and specify the path to the `Vocab`
    object that was used to create this KB. Then, create an **empty model** using
-   [`spacy.blank`](/api/top-level#spacy.blank) with the ID of your language.
-   Don't forget to add the KB to the entity linker, and to add the entity linker
-   to the pipeline. In practical applications, you will want a more advanced
-   pipeline including also a component for
-   [named entity recognition](/usage/training#ner). If you're using a model with
-   additional components, make sure to disable all other pipeline components
-   during training using [`nlp.disable_pipes`](/api/language#disable_pipes).
-   This way, you'll only be training the entity linker.
+   [`spacy.blank`](/api/top-level#spacy.blank) with the ID of your language. Add
+   a component for recognizing sentences and one for identifying relevant
+   entities. In practical applications, you will want a more advanced pipeline
+   including also a component for
+   [named entity recognition](/usage/training#ner). Then, create a new entity
+   linker component, add the KB to it, and then add the entity linker to the
+   pipeline. If you're using a model with additional components, make sure to
+   disable all other pipeline components during training using
+   [`nlp.disable_pipes`](/api/language#disable_pipes). This way, you'll only be
+   training the entity linker.
 2. **Shuffle and loop over** the examples. For each example, **update the
    model** by calling [`nlp.update`](/api/language#update), which steps through
    the annotated examples of the input. For each combination of a mention in
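The training loop in this hunk shuffles the examples each epoch and updates the model on each pass via `nlp.update`. A schematic stand-in for that loop, showing the example shape used in the spaCy 2.x entity-linking example (a text plus a `links` dict mapping character offsets to KB identifiers); the `toy_examples`, the `Q…` identifiers, and `stub_update` are invented for illustration, and `stub_update` only mimics the role of `nlp.update`:

```python
import random

# Each example pairs a text with gold annotations: the character offsets
# of a mention, mapped to KB identifiers with their gold probability.
# (IDs and texts are hypothetical.)
toy_examples = [
    ("Douglas Adams wrote a book.", {"links": {(0, 13): {"Q42": 1.0}}}),
    ("Adams was a photographer.",   {"links": {(0, 5): {"Q123": 1.0}}}),
]

def stub_update(batch, losses):
    """Stand-in for the real nlp.update call; just accumulates a fake loss."""
    losses["entity_linker"] = losses.get("entity_linker", 0.0) + len(batch)

n_iter = 3
losses_history = []
for itn in range(n_iter):
    random.shuffle(toy_examples)       # reshuffle the examples each epoch
    losses = {}                        # fresh loss dict per epoch
    stub_update(toy_examples, losses)  # real code: nlp.update(..., losses=losses)
    losses_history.append(losses["entity_linker"])

print(losses_history)  # → [2.0, 2.0, 2.0]
```

In the real script, the loop would also batch the examples and pass a dropout rate, and the printed losses would decrease as the entity linker learns.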