mirror of https://github.com/explosion/spaCy.git (synced 2024-12-26 18:06:29 +03:00)
adjust references to null_annotation_setter to trfdata_setter
parent ec069627fe
commit 559b65f2e0
@@ -25,24 +25,23 @@ work out-of-the-box.
 
 </Infobox>
 
-This pipeline component lets you use transformer models in your pipeline.
-Supports all models that are available via the
+This pipeline component lets you use transformer models in your pipeline. It
+supports all models that are available via the
 [HuggingFace `transformers`](https://huggingface.co/transformers) library.
 Usually you will connect subsequent components to the shared transformer using
 the [TransformerListener](/api/architectures#TransformerListener) layer. This
 works similarly to spaCy's [Tok2Vec](/api/tok2vec) component and
 [Tok2VecListener](/api/architectures/Tok2VecListener) sublayer.
 
-The component assigns the output of the transformer to the `Doc`'s extension
-attributes. We also calculate an alignment between the word-piece tokens and the
-spaCy tokenization, so that we can use the last hidden states to set the
-`Doc.tensor` attribute. When multiple word-piece tokens align to the same spaCy
-token, the spaCy token receives the sum of their values. To access the values,
-you can use the custom [`Doc._.trf_data`](#custom-attributes) attribute. The
-package also adds the function registries [`@span_getters`](#span_getters) and
-[`@annotation_setters`](#annotation_setters) with several built-in registered
-functions. For more details, see the
-[usage documentation](/usage/embeddings-transformers).
+We calculate an alignment between the word-piece tokens and the spaCy
+tokenization, so that we can use the last hidden states to store the information
+on the `Doc`. When multiple word-piece tokens align to the same spaCy token, the
+spaCy token receives the sum of their values. By default, the information is
+written to the [`Doc._.trf_data`](#custom-attributes) extension attribute, but
+you can implement a custom [`@annotation_setter`](#annotation_setters) to change
+this behaviour. The package also adds the function registry
+[`@span_getters`](#span_getters) with several built-in registered functions. For
+more details, see the [usage documentation](/usage/embeddings-transformers).
 
 ## Config and implementation {#config}
 
|
@ -61,11 +60,11 @@ architectures and their arguments and hyperparameters.
|
||||||
> nlp.add_pipe("transformer", config=DEFAULT_CONFIG)
|
> nlp.add_pipe("transformer", config=DEFAULT_CONFIG)
|
||||||
> ```
|
> ```
|
||||||
|
|
||||||
| Setting | Description |
|
| Setting | Description |
|
||||||
| ------------------- | ---------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------- |
|
| ------------------- | ------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------- |
|
||||||
| `max_batch_items` | Maximum size of a padded batch. Defaults to `4096`. ~~int~~ |
|
| `max_batch_items` | Maximum size of a padded batch. Defaults to `4096`. ~~int~~ |
|
||||||
| `annotation_setter` | Function that takes a batch of `Doc` objects and transformer outputs can set additional annotations on the `Doc`. The `Doc._.transformer_data` attribute is set prior to calling the callback. Defaults to `null_annotation_setter` (no additional annotations). ~~Callable[[List[Doc], FullTransformerBatch], None]~~ |
|
| `annotation_setter` | Function that takes a batch of `Doc` objects and transformer outputs to store the annotations on the `Doc`. Defaults to `trfdata_setter` which sets the `Doc._.transformer_data` attribute. ~~Callable[[List[Doc], FullTransformerBatch], None]~~ |
|
||||||
| `model` | The Thinc [`Model`](https://thinc.ai/docs/api-model) wrapping the transformer. Defaults to [TransformerModel](/api/architectures#TransformerModel). ~~Model[List[Doc], FullTransformerBatch]~~ |
|
| `model` | The Thinc [`Model`](https://thinc.ai/docs/api-model) wrapping the transformer. Defaults to [TransformerModel](/api/architectures#TransformerModel). ~~Model[List[Doc], FullTransformerBatch]~~ |
|
||||||
|
|
||||||
```python
|
```python
|
||||||
https://github.com/explosion/spacy-transformers/blob/master/spacy_transformers/pipeline_component.py
|
https://github.com/explosion/spacy-transformers/blob/master/spacy_transformers/pipeline_component.py
|
||||||
@@ -518,19 +517,23 @@ right context.
 
 ## Annotation setters {#annotation_setters tag="registered functions" source="github.com/explosion/spacy-transformers/blob/master/spacy_transformers/annotation_setters.py"}
 
-Annotation setters are functions that that take a batch of `Doc` objects and a
-[`FullTransformerBatch`](/api/transformer#fulltransformerbatch) and can set
-additional annotations on the `Doc`, e.g. to set custom or built-in attributes.
-You can register custom annotation setters using the
-`@registry.annotation_setters` decorator.
+Annotation setters are functions that take a batch of `Doc` objects and a
+[`FullTransformerBatch`](/api/transformer#fulltransformerbatch) and store the
+annotations on the `Doc`, e.g. to set custom or built-in attributes. You can
+register custom annotation setters using the `@registry.annotation_setters`
+decorator. The default annotation setter used by the `Transformer` pipeline
+component is `trfdata_setter`, which sets the custom `Doc._.transformer_data`
+attribute.
 
 > #### Example
 >
 > ```python
-> @registry.annotation_setters("spacy-transformers.null_annotation_setter.v1")
-> def configure_null_annotation_setter() -> Callable:
+> @registry.annotation_setters("spacy-transformers.trfdata_setter.v1")
+> def configure_trfdata_setter() -> Callable:
 >     def setter(docs: List[Doc], trf_data: FullTransformerBatch) -> None:
->         pass
+>         doc_data = list(trf_data.doc_data)
+>         for doc, data in zip(docs, doc_data):
+>             doc._.trf_data = data
 >
 >     return setter
 > ```
@@ -542,9 +545,9 @@ You can register custom annotation setters using the
 
 The following built-in functions are available:
 
 | Name | Description |
-| ---------------------------------------------- | ------------------------------------- |
+| -------------------------------------- | ------------------------------------------------------------- |
-| `spacy-transformers.null_annotation_setter.v1` | Don't set any additional annotations. |
+| `spacy-transformers.trfdata_setter.v1` | Set the annotations to the custom attribute `doc._.trf_data`. |
 
 ## Custom attributes {#custom-attributes}
 

@@ -299,7 +299,7 @@ component:
 >
 > ```python
 > from spacy_transformers import Transformer, TransformerModel
-> from spacy_transformers.annotation_setters import null_annotation_setter
+> from spacy_transformers.annotation_setters import configure_trfdata_setter
 > from spacy_transformers.span_getters import get_doc_spans
 >
 > trf = Transformer(
@@ -309,7 +309,7 @@ component:
 >         get_spans=get_doc_spans,
 >         tokenizer_config={"use_fast": True},
 >     ),
->     annotation_setter=null_annotation_setter,
+>     annotation_setter=configure_trfdata_setter(),
 >     max_batch_items=4096,
 > )
 > ```
@@ -329,7 +329,7 @@ tokenizer_config = {"use_fast": true}
 @span_getters = "doc_spans.v1"
 
 [components.transformer.annotation_setter]
 @annotation_setters = "spacy-transformers.trfdata_setter.v1"
 
 ```
 
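
Note on the setter pattern this commit documents: the usage example changes from `annotation_setter=null_annotation_setter` to `annotation_setter=configure_trfdata_setter()`, which suggests that the registered function is a factory and that its return value is the callable the component invokes. The sketch below illustrates only that factory-and-closure shape. It does not import spaCy or spacy-transformers; `StubDoc`, `StubBatch` and `configure_stub_trfdata_setter` are hypothetical stand-ins invented for illustration, and the `ext` dictionary merely plays the role of the `doc._` extension namespace.

```python
from dataclasses import dataclass, field
from typing import Any, Callable, Dict, List


@dataclass
class StubDoc:
    """Hypothetical stand-in for a Doc; `ext` plays the role of `doc._`."""
    text: str
    ext: Dict[str, Any] = field(default_factory=dict)


@dataclass
class StubBatch:
    """Hypothetical stand-in for FullTransformerBatch: one entry per doc."""
    doc_data: List[Any]


def configure_stub_trfdata_setter() -> Callable[[List[StubDoc], StubBatch], None]:
    """Factory that returns the setter, mirroring the configure_* shape in the diff."""

    def setter(docs: List[StubDoc], trf_data: StubBatch) -> None:
        # Pair each doc with its own slice of the batch output and store it.
        doc_data = list(trf_data.doc_data)
        for doc, data in zip(docs, doc_data):
            doc.ext["trf_data"] = data

    return setter


docs = [StubDoc("first"), StubDoc("second")]
batch = StubBatch(doc_data=[[0.1, 0.2], [0.3]])

# The factory is called once; the returned closure is what gets applied per batch.
setter = configure_stub_trfdata_setter()
setter(docs, batch)
print([doc.ext["trf_data"] for doc in docs])
```

With a factory, any configuration arguments can be bound once at construction time and the inner function keeps the fixed `(docs, trf_data)` signature given in the settings table.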