adjust references to null_annotation_setter to trfdata_setter

2025-07-15 10:42:34 +03:00 · 2020-08-27 09:43:32 +02:00
commit 559b65f2e0
parent ec069627fe
2 changed files with 34 additions and 31 deletions
--- a/website/docs/api/transformer.md
+++ b/website/docs/api/transformer.md
@@ -25,24 +25,23 @@ work out-of-the-box.
 </Infobox>
-This pipeline component lets you use transformer models in your pipeline.
+This pipeline component lets you use transformer models in your pipeline. It
-Supports all models that are available via the
+supports all models that are available via the
 [HuggingFace `transformers`](https://huggingface.co/transformers) library.
 Usually you will connect subsequent components to the shared transformer using
 the [TransformerListener](/api/architectures#TransformerListener) layer. This
 works similarly to spaCy's [Tok2Vec](/api/tok2vec) component and
 [Tok2VecListener](/api/architectures/Tok2VecListener) sublayer.
-The component assigns the output of the transformer to the `Doc`'s extension
+We calculate an alignment between the word-piece tokens and the spaCy
-attributes. We also calculate an alignment between the word-piece tokens and the
+tokenization, so that we can use the last hidden states to store the information
-spaCy tokenization, so that we can use the last hidden states to set the
+on the `Doc`. When multiple word-piece tokens align to the same spaCy token, the
-`Doc.tensor` attribute. When multiple word-piece tokens align to the same spaCy
+spaCy token receives the sum of their values. By default, the information is
-token, the spaCy token receives the sum of their values. To access the values,
+written to the [`Doc._.trf_data`](#custom-attributes) extension attribute, but
-you can use the custom [`Doc._.trf_data`](#custom-attributes) attribute. The
+you can implement a custom [`@annotation_setter`](#annotation_setters) to change
-package also adds the function registries [`@span_getters`](#span_getters) and
+this behaviour. The package also adds the function registry
-[`@annotation_setters`](#annotation_setters) with several built-in registered
+[`@span_getters`](#span_getters) with several built-in registered functions. For
-functions. For more details, see the
+more details, see the [usage documentation](/usage/embeddings-transformers).
-[usage documentation](/usage/embeddings-transformers).
 ## Config and implementation {#config}
@@ -61,11 +60,11 @@ architectures and their arguments and hyperparameters.
 > nlp.add_pipe("transformer", config=DEFAULT_CONFIG)
 > ```
-| Setting             | Description                                                                                                                                                                                                                                                                                                            |
+| Setting             | Description                                                                                                                                                                                                                                       |
-| ------------------- | ---------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------- |
+| ------------------- | ------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------- |
-| `max_batch_items`   | Maximum size of a padded batch. Defaults to `4096`. ~~int~~                                                                                                                                                                                                                                                            |
+| `max_batch_items`   | Maximum size of a padded batch. Defaults to `4096`. ~~int~~                                                                                                                                                                                       |
-| `annotation_setter` | Function that takes a batch of `Doc` objects and transformer outputs can set additional annotations on the `Doc`. The `Doc._.transformer_data` attribute is set prior to calling the callback. Defaults to `null_annotation_setter` (no additional annotations). ~~Callable[[List[Doc], FullTransformerBatch], None]~~ |
+| `annotation_setter` | Function that takes a batch of `Doc` objects and transformer outputs to store the annotations on the `Doc`. Defaults to `trfdata_setter` which sets the `Doc._.transformer_data` attribute. ~~Callable[[List[Doc], FullTransformerBatch], None]~~ |
-| `model`             | The Thinc [`Model`](https://thinc.ai/docs/api-model) wrapping the transformer. Defaults to [TransformerModel](/api/architectures#TransformerModel). ~~Model[List[Doc], FullTransformerBatch]~~                                                                                                                         |
+| `model`             | The Thinc [`Model`](https://thinc.ai/docs/api-model) wrapping the transformer. Defaults to [TransformerModel](/api/architectures#TransformerModel). ~~Model[List[Doc], FullTransformerBatch]~~                                                    |
 ```python
 https://github.com/explosion/spacy-transformers/blob/master/spacy_transformers/pipeline_component.py
@@ -518,19 +517,23 @@ right context.
 ## Annotation setters {#annotation_setters tag="registered functions" source="github.com/explosion/spacy-transformers/blob/master/spacy_transformers/annotation_setters.py"}
-Annotation setters are functions that that take a batch of `Doc` objects and a
+Annotation setters are functions that take a batch of `Doc` objects and a
-[`FullTransformerBatch`](/api/transformer#fulltransformerbatch) and can set
+[`FullTransformerBatch`](/api/transformer#fulltransformerbatch) and store the
-additional annotations on the `Doc`, e.g. to set custom or built-in attributes.
+annotations on the `Doc`, e.g. to set custom or built-in attributes. You can
-You can register custom annotation setters using the
+register custom annotation setters using the `@registry.annotation_setters`
-`@registry.annotation_setters` decorator.
+decorator. The default annotation setter used by the `Transformer` pipeline
+component is `trfdata_setter`, which sets the custom `Doc._.transformer_data`
+attribute.
 > #### Example
 >
 > ```python
-> @registry.annotation_setters("spacy-transformers.null_annotation_setter.v1")
+> @registry.annotation_setters("spacy-transformers.trfdata_setter.v1")
-> def configure_null_annotation_setter() -> Callable:
+> def configure_trfdata_setter() -> Callable:
 >     def setter(docs: List[Doc], trf_data: FullTransformerBatch) -> None:
->         pass
+>         doc_data = list(trf_data.doc_data)
+>         for doc, data in zip(docs, doc_data):
+>             doc._.trf_data = data
 >
 >     return setter
 > ```
@@ -542,9 +545,9 @@ You can register custom annotation setters using the
 The following built-in functions are available:
-| Name                                           | Description                           |
+| Name                                   | Description                                                   |
-| ---------------------------------------------- | ------------------------------------- |
+| -------------------------------------- | ------------------------------------------------------------- |
-| `spacy-transformers.null_annotation_setter.v1` | Don't set any additional annotations. |
+| `spacy-transformers.trfdata_setter.v1` | Set the annotations to the custom attribute `doc._.trf_data`. |
 ## Custom attributes {#custom-attributes}
--- a/website/docs/usage/embeddings-transformers.md
+++ b/website/docs/usage/embeddings-transformers.md
@@ -299,7 +299,7 @@ component:
 >
 > ```python
 > from spacy_transformers import Transformer, TransformerModel
-> from spacy_transformers.annotation_setters import null_annotation_setter
+> from spacy_transformers.annotation_setters import configure_trfdata_setter
 > from spacy_transformers.span_getters import get_doc_spans
 >
 > trf = Transformer(
@@ -309,7 +309,7 @@ component:
 >         get_spans=get_doc_spans,
 >         tokenizer_config={"use_fast": True},
 >     ),
->     annotation_setter=null_annotation_setter,
+>     annotation_setter=configure_trfdata_setter(),
 >     max_batch_items=4096,
 > )
 > ```
@@ -329,7 +329,7 @@ tokenizer_config = {"use_fast": true}
 @span_getters = "doc_spans.v1"
 [components.transformer.annotation_setter]
-@annotation_setters = "spacy-transformers.null_annotation_setter.v1"
+@annotation_setters = "spacy-transformers.trfdata_setter.v1"
 ```
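
Note on the alignment behaviour described in the transformer.md hunk above: the docs state that when several word-piece tokens align to the same spaCy token, the token receives the sum of their values. The following is a minimal pure-Python sketch of that summing step only. It is not the spacy-transformers implementation; the function name `sum_aligned_wordpieces`, the list-of-lists vector representation, and the alignment format (one list of word-piece indices per token) are all assumptions made for illustration.

```python
from typing import List


def sum_aligned_wordpieces(
    wordpiece_vectors: List[List[float]],
    alignment: List[List[int]],
) -> List[List[float]]:
    """Return one vector per token: the element-wise sum of the
    word-piece vectors aligned to that token.

    Hypothetical helper for illustration, not part of spacy-transformers.
    """
    width = len(wordpiece_vectors[0]) if wordpiece_vectors else 0
    token_vectors = []
    for piece_indices in alignment:
        # Start from a zero vector; a token with no aligned pieces stays zero.
        total = [0.0] * width
        for i in piece_indices:
            for d, value in enumerate(wordpiece_vectors[i]):
                total[d] += value
        token_vectors.append(total)
    return token_vectors


# Made-up data: three word pieces with 2-dimensional hidden states.
wordpieces = [[1.0, 2.0], [0.5, 0.5], [3.0, -1.0]]
# Token 0 is split into word pieces 0 and 1; token 1 maps to word piece 2.
alignment = [[0, 1], [2]]

token_vectors = sum_aligned_wordpieces(wordpieces, alignment)
print(token_vectors)  # [[1.5, 2.5], [3.0, -1.0]]
```

In this sketch a token with no aligned word pieces keeps a zero vector; whether the library behaves the same way is not stated in the diff.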