mirror of
https://github.com/explosion/spaCy.git
synced 2025-01-26 17:24:41 +03:00
Merge pull request #5386 from svlandeg/fix/nel-docs
commit f333c2a011
@@ -64,7 +64,7 @@ def main(kb_path, vocab_path=None, output_dir=None, n_iter=50):
     """Create a blank model with the specified vocab, set up the pipeline and train the entity linker.
     The `vocab` should be the one used during creation of the KB."""
     vocab = Vocab().from_disk(vocab_path)
-    # create blank Language class with correct vocab
+    # create blank English model with correct vocab
     nlp = spacy.blank("en", vocab=vocab)
     nlp.vocab.vectors.name = "spacy_pretrained_vectors"
     print("Created blank 'en' model with vocab from '%s'" % vocab_path)
@@ -619,25 +619,24 @@ https://github.com/explosion/spaCy/tree/master/examples/training/create_kb.py
 
 #### Step by step guide {#step-by-step-kb}
 
-1. **Load the model** you want to start with, or create an **empty model** using
-   [`spacy.blank`](/api/top-level#spacy.blank) with the ID of your language and
-   a pre-defined [`vocab`](/api/vocab) object.
-2. **Pretrain the entity embeddings** by running the descriptions of the
-   entities through a simple encoder-decoder network. The current implementation
-   requires the `nlp` model to have access to pretrained word embeddings, but a
-   custom implementation of this encoding step can also be used.
-3. **Construct the KB** by defining all entities with their pretrained vectors,
-   and all aliases with their prior probabilities.
+1. **Load the model** you want to start with. It should contain pretrained word
+   vectors.
+2. **Obtain the entity embeddings** by running the descriptions of the entities
+   through the `nlp` model and taking the average of all words with
+   `nlp(desc).vector`. At this point, a custom encoding step can also be used.
+3. **Construct the KB** by defining all entities with their embeddings, and all
+   aliases with their prior probabilities.
 4. **Save** the KB using [`kb.dump`](/api/kb#dump).
-5. **Test** the KB to make sure the entities were added correctly.
+5. **Print** the contents of the KB to make sure the entities were added
+   correctly.
 
 ### Training an entity linking model {#entity-linker-model}
 
 This example shows how to create an entity linker pipe using a previously
-created knowledge base. The entity linker pipe is then trained with your own
-examples. To do so, you'll need to provide **example texts**, and the
-**character offsets** and **knowledge base identifiers** of each entity
-contained in the texts.
+created knowledge base. The entity linker is then trained with a set of custom
+examples. To do so, you need to provide **example texts**, and the **character
+offsets** and **knowledge base identifiers** of each entity contained in the
+texts.
 
 ```python
 https://github.com/explosion/spaCy/tree/master/examples/training/train_entity_linker.py
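Review note: steps 2 and 3 of the revised guide can be illustrated without spaCy. The sketch below is plain Python under stated assumptions: `WORD_VECTORS`, `embed_description`, `alias_priors`, the entity IDs, and the alias are all hypothetical, the tiny vector table stands in for a model's pretrained word vectors, and the result is an ordinary dict, not spaCy's knowledge base object. It averages word vectors per description, as the new text describes for `nlp(desc).vector`, and converts alias counts into prior probabilities.

```python
# Toy stand-in for pretrained word vectors (hypothetical values, 3 dimensions).
WORD_VECTORS = {
    "american": [1.0, 0.0, 0.0],
    "golfer": [0.0, 1.0, 0.0],
    "actor": [0.0, 0.0, 1.0],
}


def embed_description(desc, dim=3):
    """Average the vectors of all known words in a description."""
    vectors = [WORD_VECTORS[w] for w in desc.lower().split() if w in WORD_VECTORS]
    if not vectors:
        # No known words: fall back to a zero vector of the right size.
        return [0.0] * dim
    return [sum(column) / len(vectors) for column in zip(*vectors)]


def alias_priors(mention_counts):
    """Turn {entity_id: count} for one alias into prior probabilities."""
    total = sum(mention_counts.values())
    return {entity_id: count / total for entity_id, count in mention_counts.items()}


# Hypothetical entity IDs, descriptions, and alias, for illustration only.
descriptions = {"E1": "American golfer", "E2": "American actor"}
kb = {
    "entities": {eid: embed_description(d) for eid, d in descriptions.items()},
    "aliases": {"Alex Smith": alias_priors({"E1": 3, "E2": 1})},
}
```

Keeping the encoder behind a single function mirrors the guide's remark that a custom encoding step can be swapped in at this point.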
@@ -647,14 +646,16 @@ https://github.com/explosion/spaCy/tree/master/examples/training/train_entity_li
 
 1. **Load the KB** you want to start with, and specify the path to the `Vocab`
    object that was used to create this KB. Then, create an **empty model** using
-   [`spacy.blank`](/api/top-level#spacy.blank) with the ID of your language.
-   Don't forget to add the KB to the entity linker, and to add the entity linker
-   to the pipeline. In practical applications, you will want a more advanced
-   pipeline including also a component for
-   [named entity recognition](/usage/training#ner). If you're using a model with
-   additional components, make sure to disable all other pipeline components
-   during training using [`nlp.disable_pipes`](/api/language#disable_pipes).
-   This way, you'll only be training the entity linker.
+   [`spacy.blank`](/api/top-level#spacy.blank) with the ID of your language. Add
+   a component for recognizing sentences and one for identifying relevant
+   entities. In practical applications, you will want a more advanced pipeline
+   including also a component for
+   [named entity recognition](/usage/training#ner). Then, create a new entity
+   linker component, add the KB to it, and then add the entity linker to the
+   pipeline. If you're using a model with additional components, make sure to
+   disable all other pipeline components during training using
+   [`nlp.disable_pipes`](/api/language#disable_pipes). This way, you'll only be
+   training the entity linker.
 2. **Shuffle and loop over** the examples. For each example, **update the
    model** by calling [`nlp.update`](/api/language#update), which steps through
    the annotated examples of the input. For each combination of a mention in
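Review note: the training section asks for example texts together with character offsets and knowledge base identifiers. The sketch below is plain Python and does not call spaCy. The annotation layout (a `links` dict keyed by `(start, end)` offsets), the entity IDs, and the helper names `gold_mentions` and `training_passes` are assumptions made for illustration. It shows how offsets select a mention and how the shuffle-and-loop pattern from step 2 might look, with a comment marking where `nlp.update` would be called.

```python
import random

# Hypothetical examples: text plus {(start, end): {kb_id: probability}}.
TRAIN_DATA = [
    ("Ada wrote the first program.", {"links": {(0, 3): {"E1": 1.0, "E2": 0.0}}}),
    ("Ada is a city in Oklahoma.", {"links": {(0, 3): {"E1": 0.0, "E2": 1.0}}}),
]


def gold_mentions(text, annotations):
    """Return (mention_text, gold_kb_id) pairs for one example."""
    pairs = []
    for (start, end), candidates in annotations["links"].items():
        # The gold entity is the candidate with the highest probability.
        gold_id = max(candidates, key=candidates.get)
        pairs.append((text[start:end], gold_id))
    return pairs


def training_passes(data, n_iter, seed=0):
    """Yield (iteration, text, annotations), reshuffling on every pass."""
    rng = random.Random(seed)
    examples = list(data)  # copy so the caller's list keeps its order
    for i in range(n_iter):
        rng.shuffle(examples)
        for text, annotations in examples:
            # A real training loop would call nlp.update at this point.
            yield i, text, annotations


seen = [gold_mentions(t, a) for _, t, a in training_passes(TRAIN_DATA, n_iter=2)]
```

Checking that `text[start:end]` really is the intended mention before training is a cheap way to catch off-by-one offsets in hand-written annotations.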