final fixes

svlandeg 2020-11-20 22:18:53 +01:00
parent 331ec83493
commit 5ac0867427

@ -567,10 +567,10 @@ def create_relation_model(...) -> Model[List[Doc], Floats2d]:
return model return model
``` ```
We will adapt a **modular approach** to the definition of this relation model, We adapt a **modular approach** to the definition of this relation model, and
and define it as chaining two layers together: the first layer that generates an define it as chaining two layers together: the first layer that generates an
instance tensor from a given set of documents, and the second layer that instance tensor from a given set of documents, and the second layer that
transforms the instance tensor into a final tensor holding the predictions. transforms the instance tensor into a final tensor holding the predictions:
> #### config.cfg (excerpt) > #### config.cfg (excerpt)
> >
@ -586,7 +586,7 @@ transforms the instance tensor into a final tensor holding the predictions.
> ``` > ```
```python ```python
### The model architecture ### The model architecture {highlight="6"}
@spacy.registry.architectures.register("rel_model.v1") @spacy.registry.architectures.register("rel_model.v1")
def create_relation_model( def create_relation_model(
create_instance_tensor: Model[List[Doc], Floats2d], create_instance_tensor: Model[List[Doc], Floats2d],
@ -596,8 +596,9 @@ def create_relation_model(
return model return model
``` ```
The `classification_layer` could be something like a Linear layer followed by a The `classification_layer` could be something like a
logistic activation function: [Linear](https://thinc.ai/docs/api-layers#linear) layer followed by a
[logistic](https://thinc.ai/docs/api-layers#logistic) activation function:
> #### config.cfg (excerpt) > #### config.cfg (excerpt)
> >
@ -748,16 +749,6 @@ generation function.
#### Intermezzo: define how to store the relations data {#component-rel-attribute} #### Intermezzo: define how to store the relations data {#component-rel-attribute}
For our new relation extraction component, we will use a custom
[extension attribute](/usage/processing-pipelines#custom-components-attributes)
`doc._.rel` in which we store relation data. The attribute refers to a
dictionary, keyed by the **start offsets of each entity** involved in the
candidate relation. The values in the dictionary refer to another dictionary
where relation labels are mapped to values between 0 and 1. We assume anything
above 0.5 to be a `True` relation. The ~~Example~~ instances that we'll use as
training data, will include their gold-standard relation annotations in
`example.reference._.rel`.
> #### Example output > #### Example output
> >
> ```python > ```python
@ -771,6 +762,16 @@ training data, will include their gold-standard relation annotations in
> # (6, 0): {'CAPITAL_OF': 0.01, 'LOCATED_IN': 0.13, 'UNRELATED': 0.017} > # (6, 0): {'CAPITAL_OF': 0.01, 'LOCATED_IN': 0.13, 'UNRELATED': 0.017}
> ``` > ```
For our new relation extraction component, we will use a custom
[extension attribute](/usage/processing-pipelines#custom-components-attributes)
`doc._.rel` in which we store relation data. The attribute refers to a
dictionary, keyed by the **start offsets of each entity** involved in the
candidate relation. The values in the dictionary refer to another dictionary
where relation labels are mapped to values between 0 and 1. We assume anything
above 0.5 to be a `True` relation. The ~~Example~~ instances that we'll use as
training data, will include their gold-standard relation annotations in
`example.reference._.rel`.
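
As a rough illustration of the structure described above, the sketch below uses a plain dictionary as a stand-in for `doc._.rel` and applies the 0.5 cutoff. The helper name `predicted_relations` is hypothetical and not part of spaCy, and the scores for the `(0, 6)` pair are made up for the example; only the `(6, 0)` row mirrors the example output.

```python
# Plain-dict stand-in for doc._.rel (assumption: no spaCy objects involved).
# Keys are pairs of entity start offsets; values map labels to scores in [0, 1].
rel = {
    (0, 6): {"CAPITAL_OF": 0.89, "LOCATED_IN": 0.75, "UNRELATED": 0.002},  # made-up scores
    (6, 0): {"CAPITAL_OF": 0.01, "LOCATED_IN": 0.13, "UNRELATED": 0.017},
}


def predicted_relations(rel, threshold=0.5):
    """Hypothetical helper: list (head_start, tail_start, label) above the threshold."""
    return [
        (head, tail, label)
        for (head, tail), scores in rel.items()
        for label, score in scores.items()
        if score > threshold
    ]


print(predicted_relations(rel))
```

With real data, the same loop could presumably run over `doc._.rel` for predictions or over `example.reference._.rel` for the gold-standard annotations.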
```python ```python
### Registering the extension attribute ### Registering the extension attribute
from spacy.tokens import Doc from spacy.tokens import Doc
@ -817,11 +818,11 @@ class RelationExtractor(TrainablePipe):
... ...
``` ```
Typically, the constructor defines the vocab, the Machine Learning model, and Typically, the **constructor** defines the vocab, the Machine Learning model,
the name of this component. Additionally, this component, just like the and the name of this component. Additionally, this component, just like the
`textcat` and the `tagger`, stores an internal list of labels. The ML model will `textcat` and the `tagger`, stores an **internal list of labels**. The ML model
predict scores for each label. We add convenience method to easily retrieve and will predict scores for each label. We add convenience methods to easily
add to them. retrieve and add to them.
```python ```python
def __init__(self, vocab, model, name="rel"): def __init__(self, vocab, model, name="rel"):
@ -1003,7 +1004,6 @@ assigns it a name and lets you create the component with
> @architectures = "rel_model.v1" > @architectures = "rel_model.v1"
> # ... > # ...
> >
>
> [training.score_weights] > [training.score_weights]
> rel_micro_p = 0.0 > rel_micro_p = 0.0
> rel_micro_r = 0.0 > rel_micro_r = 0.0