mirror of https://github.com/explosion/spaCy.git (synced 2025-10-31 16:07:41 +03:00)

	REL intro and get_candidates function
This commit is contained in:
parent df06f7a792
commit 2c4b2ee5e9
				|  | @ -486,6 +486,60 @@ with Model.define_operators({">>": chain}): | |||
| 
 | ||||
| ## Create new trainable components {#components} | ||||
| 
 | ||||
| In addition to [swapping out](#swap-architectures) default models in built-in | ||||
| components, you can also implement an entirely new, | ||||
| [trainable pipeline component](/usage/processing-pipelines#trainable-components) | ||||
| from scratch. This can be done by creating a new class inheriting from [`Pipe`](/api/pipe),  | ||||
| and linking it up to your custom model implementation. | ||||
| 
 | ||||
| ### Example: Pipeline component for relation extraction {#component-rel} | ||||
| 
 | ||||
| This section walks through an example of implementing a novel relation extraction | ||||
| component from scratch. As a first step, we need a function that generates pairs of | ||||
| entities that we want to classify as being related or not. These candidate pairs are | ||||
| typically formed within one document, so the function takes a | ||||
| `Doc` as input and outputs a `List` of `Span` tuples. In this example, we focus | ||||
| on binary relation extraction, i.e. each tuple has length 2. | ||||
| 
 | ||||
| We register this function in the `misc` registry so that we can easily refer to it from the config | ||||
| and swap it out for any other candidate generation function. For instance, a very | ||||
| straightforward implementation would be to just take any two entities from the same document: | ||||
| 
 | ||||
| ```python | ||||
| @registry.misc.register("rel_cand_generator.v1") | ||||
| def create_candidate_indices() -> Callable[[Doc], List[Tuple[Span, Span]]]: | ||||
|     def get_candidate_indices(doc: "Doc"): | ||||
|         indices = [] | ||||
|         for ent1 in doc.ents: | ||||
|             for ent2 in doc.ents: | ||||
|                 indices.append((ent1, ent2)) | ||||
|         return indices | ||||
|     return get_candidate_indices | ||||
| ``` | ||||
| 
 | ||||
| We could refine this further by excluding relations of an entity with itself | ||||
| and imposing a maximum distance (in number of tokens) between two entities: | ||||
| 
 | ||||
| ```python | ||||
| ### {highlight="1,2,7,8"} | ||||
| @registry.misc.register("rel_cand_generator.v2") | ||||
| def create_candidate_indices(max_length: int) -> Callable[[Doc], List[Tuple[Span, Span]]]: | ||||
|     def get_candidate_indices(doc: "Doc"): | ||||
|         indices = [] | ||||
|         for ent1 in doc.ents: | ||||
|             for ent2 in doc.ents: | ||||
|                 if ent1 != ent2: | ||||
|                     if max_length and abs(ent2.start - ent1.start) <= max_length: | ||||
|                         indices.append((ent1, ent2)) | ||||
|         return indices | ||||
|     return get_candidate_indices | ||||
| ``` | ||||
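To make the difference between the two generators concrete, here is a minimal sketch that reproduces their pairing logic without spaCy. `ToySpan`, `all_pairs` and `filtered_pairs` are hypothetical stand-ins introduced only for this illustration; they are not part of the spaCy API, and `ToySpan` models nothing beyond the token offsets the logic relies on.

```python
from dataclasses import dataclass
from typing import List, Sequence, Tuple


@dataclass(frozen=True)
class ToySpan:
    """Hypothetical stand-in for a spaCy Span: text plus token offsets."""
    text: str
    start: int
    end: int


Pair = Tuple[ToySpan, ToySpan]


def all_pairs(ents: Sequence[ToySpan]) -> List[Pair]:
    # Mirrors the "rel_cand_generator.v1" logic: every ordered pair,
    # including the pair of an entity with itself.
    return [(e1, e2) for e1 in ents for e2 in ents]


def filtered_pairs(ents: Sequence[ToySpan], max_length: int) -> List[Pair]:
    # Mirrors the "rel_cand_generator.v2" logic: skip self-pairs and keep
    # only pairs whose start offsets are at most max_length tokens apart.
    return [
        (e1, e2)
        for e1 in ents
        for e2 in ents
        if e1 != e2 and max_length and abs(e2.start - e1.start) <= max_length
    ]


# Three made-up entities; "Berlin" is far away from the other two.
ents = [ToySpan("Alice", 0, 1), ToySpan("Acme", 3, 4), ToySpan("Berlin", 30, 31)]

v1 = all_pairs(ents)
v2 = filtered_pairs(ents, max_length=10)

print(len(v1))                             # 9
print([(a.text, b.text) for a, b in v2])   # [('Alice', 'Acme'), ('Acme', 'Alice')]
```

With three entities, the first variant produces nine ordered pairs, three of which pair an entity with itself. The second keeps only the two orderings of the nearby pair. Note that, because of the `max_length and ...` check, a `max_length` of `0` yields no candidates at all in this sketch rather than disabling the distance filter.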
| 
 | ||||
| 
 | ||||
| 
 | ||||
| 
 | ||||
| 
 | ||||
| <Infobox title="This section is still under construction" emoji="🚧" variant="warning"> | ||||
| </Infobox> | ||||
| 
 | ||||
|  |  | |||
|  | @ -1035,7 +1035,7 @@ plug fully custom machine learning components into your pipeline. You'll need | |||
| the following: | ||||
| 
 | ||||
| 1. **Model:** A Thinc [`Model`](https://thinc.ai/docs/api-model) instance. This | ||||
|    can be a model using implemented in | ||||
|    can be a model implemented in | ||||
|    [Thinc](/usage/layers-architectures#thinc), or a | ||||
|    [wrapped model](/usage/layers-architectures#frameworks) implemented in | ||||
|    PyTorch, TensorFlow, MXNet or a fully custom solution. The model must take a | ||||
|  |  | |||