adjust references to null_annotation_setter to trfdata_setter

2025-11-22 18:55:43 +03:00 · 2020-08-27 09:43:32 +02:00 · 559b65f2e0
commit 559b65f2e0
parent ec069627fe
2 changed files with 34 additions and 31 deletions
--- a/website/docs/api/transformer.md
+++ b/website/docs/api/transformer.md
@@ -25,24 +25,23 @@ work out-of-the-box.

 </Infobox>

-This pipeline component lets you use transformer models in your pipeline.
-Supports all models that are available via the
+This pipeline component lets you use transformer models in your pipeline. It
+supports all models that are available via the
 [HuggingFace `transformers`](https://huggingface.co/transformers) library.
 Usually you will connect subsequent components to the shared transformer using
 the [TransformerListener](/api/architectures#TransformerListener) layer. This
 works similarly to spaCy's [Tok2Vec](/api/tok2vec) component and
 [Tok2VecListener](/api/architectures/Tok2VecListener) sublayer.

-The component assigns the output of the transformer to the `Doc`'s extension
-attributes. We also calculate an alignment between the word-piece tokens and the
-spaCy tokenization, so that we can use the last hidden states to set the
-`Doc.tensor` attribute. When multiple word-piece tokens align to the same spaCy
-token, the spaCy token receives the sum of their values. To access the values,
-you can use the custom [`Doc._.trf_data`](#custom-attributes) attribute. The
-package also adds the function registries [`@span_getters`](#span_getters) and
-[`@annotation_setters`](#annotation_setters) with several built-in registered
-functions. For more details, see the
-[usage documentation](/usage/embeddings-transformers).
+We calculate an alignment between the word-piece tokens and the spaCy
+tokenization, so that we can use the last hidden states to store the information
+on the `Doc`. When multiple word-piece tokens align to the same spaCy token, the
+spaCy token receives the sum of their values. By default, the information is
+written to the [`Doc._.trf_data`](#custom-attributes) extension attribute, but
+you can implement a custom [`@annotation_setter`](#annotation_setters) to change
+this behaviour. The package also adds the function registry
+[`@span_getters`](#span_getters) with several built-in registered functions. For
+more details, see the [usage documentation](/usage/embeddings-transformers).

 ## Config and implementation {#config}

@@ -62,9 +61,9 @@ architectures and their arguments and hyperparameters.
 > ```

 | Setting             | Description                                                                                                                                                                                                                                       |
-| ------------------- | ---------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------- |
+| ------------------- | ------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------- |
 | `max_batch_items`   | Maximum size of a padded batch. Defaults to `4096`. ~~int~~                                                                                                                                                                                       |
-| `annotation_setter` | Function that takes a batch of `Doc` objects and transformer outputs can set additional annotations on the `Doc`. The `Doc._.transformer_data` attribute is set prior to calling the callback. Defaults to `null_annotation_setter` (no additional annotations). ~~Callable[[List[Doc], FullTransformerBatch], None]~~ |
+| `annotation_setter` | Function that takes a batch of `Doc` objects and transformer outputs to store the annotations on the `Doc`. Defaults to `trfdata_setter`, which sets the `Doc._.trf_data` attribute. ~~Callable[[List[Doc], FullTransformerBatch], None]~~        |
 | `model`             | The Thinc [`Model`](https://thinc.ai/docs/api-model) wrapping the transformer. Defaults to [TransformerModel](/api/architectures#TransformerModel). ~~Model[List[Doc], FullTransformerBatch]~~                                                    |

 ```python
@@ -518,19 +517,23 @@ right context.

 ## Annotation setters {#annotation_setters tag="registered functions" source="github.com/explosion/spacy-transformers/blob/master/spacy_transformers/annotation_setters.py"}

-Annotation setters are functions that that take a batch of `Doc` objects and a
-[`FullTransformerBatch`](/api/transformer#fulltransformerbatch) and can set
-additional annotations on the `Doc`, e.g. to set custom or built-in attributes.
-You can register custom annotation setters using the
-`@registry.annotation_setters` decorator.
+Annotation setters are functions that take a batch of `Doc` objects and a
+[`FullTransformerBatch`](/api/transformer#fulltransformerbatch) and store the
+annotations on the `Doc`, e.g. to set custom or built-in attributes. You can
+register custom annotation setters using the `@registry.annotation_setters`
+decorator. The default annotation setter used by the `Transformer` pipeline
+component is `trfdata_setter`, which sets the custom `Doc._.trf_data`
+attribute.

 > #### Example
 >
 > ```python
-> @registry.annotation_setters("spacy-transformers.null_annotation_setter.v1")
-> def configure_null_annotation_setter() -> Callable:
+> @registry.annotation_setters("spacy-transformers.trfdata_setter.v1")
+> def configure_trfdata_setter() -> Callable:
 >     def setter(docs: List[Doc], trf_data: FullTransformerBatch) -> None:
->         pass
+>         doc_data = list(trf_data.doc_data)
+>         for doc, data in zip(docs, doc_data):
+>             doc._.trf_data = data
 >
 >     return setter
 > ```
@@ -543,8 +546,8 @@ You can register custom annotation setters using the
 The following built-in functions are available:

 | Name                                   | Description                                                   |
-| ---------------------------------------------- | ------------------------------------- |
-| `spacy-transformers.null_annotation_setter.v1` | Don't set any additional annotations. |
+| -------------------------------------- | ------------------------------------------------------------- |
+| `spacy-transformers.trfdata_setter.v1` | Set the annotations to the custom attribute `doc._.trf_data`. |

 ## Custom attributes {#custom-attributes}

--- a/website/docs/usage/embeddings-transformers.md
+++ b/website/docs/usage/embeddings-transformers.md
@@ -299,7 +299,7 @@ component:
 >
 > ```python
 > from spacy_transformers import Transformer, TransformerModel
-> from spacy_transformers.annotation_setters import null_annotation_setter
+> from spacy_transformers.annotation_setters import configure_trfdata_setter
 > from spacy_transformers.span_getters import get_doc_spans
 >
 > trf = Transformer(
@@ -309,7 +309,7 @@ component:
 >         get_spans=get_doc_spans,
 >         tokenizer_config={"use_fast": True},
 >     ),
->     annotation_setter=null_annotation_setter,
+>     annotation_setter=configure_trfdata_setter(),
 >     max_batch_items=4096,
 > )
 > ```
@@ -329,7 +329,7 @@ tokenizer_config = {"use_fast": true}
 @span_getters = "doc_spans.v1"

 [components.transformer.annotation_setter]
-@annotation_setters = "spacy-transformers.null_annotation_setter.v1"
+@annotation_setters = "spacy-transformers.trfdata_setter.v1"

 ```
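
Note on the `annotation_setter=configure_trfdata_setter()` change in the usage example: the registered function appears to be a factory that returns the actual setter, which would explain why it is called (with parentheses) when passed to `Transformer`, whereas `null_annotation_setter` was passed directly. The following is a minimal sketch of that factory pattern only. `FakeDoc` and `FakeBatch` are hypothetical stand-ins, not spaCy or spacy-transformers classes, so the snippet runs without either library installed; the setter body mirrors the example added in this commit.

```python
from types import SimpleNamespace
from typing import Callable, List


class FakeDoc:
    """Hypothetical stand-in for spaCy's Doc, with a `_` extension namespace."""

    def __init__(self, text: str) -> None:
        self.text = text
        self._ = SimpleNamespace(trf_data=None)


class FakeBatch:
    """Hypothetical stand-in for FullTransformerBatch; only `doc_data` is modelled."""

    def __init__(self, doc_data: list) -> None:
        self.doc_data = doc_data


def configure_trfdata_setter() -> Callable[[List[FakeDoc], FakeBatch], None]:
    # The factory takes no arguments and returns the callable that does the work.
    def setter(docs: List[FakeDoc], trf_data: FakeBatch) -> None:
        # One entry of per-document data is expected for each Doc in the batch.
        doc_data = list(trf_data.doc_data)
        for doc, data in zip(docs, doc_data):
            doc._.trf_data = data

    return setter


docs = [FakeDoc("first"), FakeDoc("second")]
batch = FakeBatch(doc_data=["data for first", "data for second"])

# Call the factory once to obtain the setter, then apply it to a batch.
setter = configure_trfdata_setter()
setter(docs, batch)
```

In this sketch, passing `configure_trfdata_setter` without calling it would hand over the factory rather than a setter with the `(docs, trf_data)` signature.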