mirror of https://github.com/explosion/spaCy.git (synced 2025-05-30 10:43:18 +03:00)

REL intro and get_candidates function

parent df06f7a792
commit 2c4b2ee5e9

@@ -486,6 +486,60 @@ with Model.define_operators({">>": chain}):

## Create new trainable components {#components}

In addition to [swapping out](#swap-architectures) default models in built-in
components, you can also implement an entirely new
[trainable pipeline component](/usage/processing-pipelines#trainable-components)
from scratch. This can be done by creating a new class inheriting from
[`Pipe`](/api/pipe) and linking it up to your custom model implementation.

### Example: Pipeline component for relation extraction {#component-rel}

This section walks through an example of implementing a new relation extraction
component from scratch. As a first step, we need a function that generates the
pairs of entities we want to classify as being related or not. These candidate
pairs are typically formed within one document, so the function takes a `Doc` as
input and outputs a `List` of `Span` tuples. In this example, we focus on binary
relation extraction, i.e. each tuple has length 2.

We register this function in the `misc` registry so we can easily refer to it
from the config and swap it out for any other candidate generation function. For
instance, a very straightforward implementation would be to just take any two
entities from the same document:

```python
from typing import Callable, List, Tuple

from spacy.tokens import Doc, Span
from spacy.util import registry


@registry.misc.register("rel_cand_generator.v1")
def create_candidate_indices() -> Callable[[Doc], List[Tuple[Span, Span]]]:
    def get_candidate_indices(doc: "Doc"):
        indices = []
        # Every ordered pair of entities, including an entity paired with itself
        for ent1 in doc.ents:
            for ent2 in doc.ents:
                indices.append((ent1, ent2))
        return indices
    return get_candidate_indices
```

We could also refine this further by excluding relations of an entity with
itself and imposing a maximum distance (in number of tokens) between two
entities:

```python
### {highlight="1,2,7,8"}
@registry.misc.register("rel_cand_generator.v2")
def create_candidate_indices(max_length: int) -> Callable[[Doc], List[Tuple[Span, Span]]]:
    def get_candidate_indices(doc: "Doc"):
        indices = []
        for ent1 in doc.ents:
            for ent2 in doc.ents:
                if ent1 != ent2:
                    if max_length and abs(ent2.start - ent1.start) <= max_length:
                        indices.append((ent1, ent2))
        return indices
    return get_candidate_indices
```
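
To see what these candidate generators produce, here is a minimal sketch that
runs the same loops over stand-in objects instead of real spaCy `Doc` and `Span`
instances. `FakeSpan`, `FakeDoc`, `all_pairs` and `filtered_pairs` are
hypothetical names introduced only for illustration, and the registry decorator
is left out so the snippet runs without spaCy installed.

```python
from dataclasses import dataclass
from typing import Callable, List, Tuple


@dataclass(frozen=True)
class FakeSpan:
    """Hypothetical stand-in for a spaCy Span: token offsets plus text."""
    start: int
    end: int
    text: str


@dataclass
class FakeDoc:
    """Hypothetical stand-in for a spaCy Doc that only exposes `ents`."""
    ents: Tuple[FakeSpan, ...]


Pairs = List[Tuple[FakeSpan, FakeSpan]]


def all_pairs() -> Callable[[FakeDoc], Pairs]:
    # Same loop as "rel_cand_generator.v1": every ordered pair, self-pairs included.
    def get_candidates(doc: FakeDoc) -> Pairs:
        return [(ent1, ent2) for ent1 in doc.ents for ent2 in doc.ents]
    return get_candidates


def filtered_pairs(max_length: int) -> Callable[[FakeDoc], Pairs]:
    # Same condition as "rel_cand_generator.v2": no self-pairs, and the start
    # tokens of the two entities are at most `max_length` tokens apart.
    def get_candidates(doc: FakeDoc) -> Pairs:
        return [
            (ent1, ent2)
            for ent1 in doc.ents
            for ent2 in doc.ents
            if ent1 != ent2
            and max_length
            and abs(ent2.start - ent1.start) <= max_length
        ]
    return get_candidates


# Three made-up entities; only the first two start within 10 tokens of each other.
doc = FakeDoc(ents=(
    FakeSpan(0, 2, "Marie Curie"),
    FakeSpan(5, 6, "Warsaw"),
    FakeSpan(40, 41, "Paris"),
))

v1_pairs = all_pairs()(doc)
v2_pairs = filtered_pairs(max_length=10)(doc)

print(len(v1_pairs))  # 9 ordered pairs: 3 entities x 3 entities
print([(a.text, b.text) for a, b in v2_pairs])
# [('Marie Curie', 'Warsaw'), ('Warsaw', 'Marie Curie')]
print(len(filtered_pairs(max_length=0)(doc)))  # 0: a falsy max_length rejects every pair
```

Both orderings of a pair are kept, so the candidates are directed. With the
condition as written, a `max_length` of `0` yields no candidates at all rather
than disabling the distance limit.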

<Infobox title="This section is still under construction" emoji="🚧" variant="warning">
</Infobox>

@@ -1035,7 +1035,7 @@ plug fully custom machine learning components into your pipeline. You'll need
the following:

1. **Model:** A Thinc [`Model`](https://thinc.ai/docs/api-model) instance. This
-  can be a model using implemented in
+  can be a model implemented in
   [Thinc](/usage/layers-architectures#thinc), or a
   [wrapped model](/usage/layers-architectures#frameworks) implemented in
   PyTorch, TensorFlow, MXNet or a fully custom solution. The model must take a