diff --git a/website/docs/api/curated-transformer.mdx b/website/docs/api/curated-transformer.mdx
index c9e0dfb5c..6fcb36765 100644
--- a/website/docs/api/curated-transformer.mdx
+++ b/website/docs/api/curated-transformer.mdx
@@ -3,7 +3,7 @@
title: CuratedTransformer
teaser: Pipeline component for multi-task learning with transformer models
tag: class
-source: github.com/explosion/spacy-transformers/blob/master/spacy_curated_transformers/pipeline_component.py
+source: github.com/explosion/spacy-curated-transformers/blob/main/spacy_curated_transformers/pipeline_component.py
-version: 3
+version: 3.7
api_base_class: /api/pipe
api_string_name: transformer
---
@@ -17,25 +17,33 @@ api_string_name: transformer

This component is available via the extension package
-[`spacy-curated-transformers`](https://github.com/explosion/spacy-curated-transformers). It
-exposes the component via entry points, so if you have the package installed,
+[`spacy-curated-transformers`](https://github.com/explosion/spacy-curated-transformers).
+It exposes the component via entry points, so if you have the package installed,
using `factory = "curated_transformer"` in your
-[training config](/usage/training#config) or `nlp.add_pipe("curated_transformer")` will
-work out-of-the-box.
+[training config](/usage/training#config) or
+`nlp.add_pipe("curated_transformer")` will work out-of-the-box.

-This Python package provides a curated set of transformer models for spaCy. It is focused on deep integration into spaCy and will support deployment-focused features such as distillation and quantization in the future. spaCy curated transformers currently supports the following model types:
+This Python package provides a curated set of transformer models for spaCy. It
+is focused on deep integration into spaCy and will support deployment-focused
+features such as distillation and quantization in the future. spaCy curated
+transformers currently supports the following model types:

-* ALBERT
-* BERT
-* CamemBERT
-* RoBERTa
-* XLM-RoBERTa
+- ALBERT
+- BERT
+- CamemBERT
+- RoBERTa
+- XLM-RoBERTa

-You will usually connect downstream components to a shared curated transformer using one of the curated transformer listener layers. This works similarly to spaCy's [Tok2Vec](/api/tok2vec), and the [Tok2VecListener](/api/architectures/#Tok2VecListener) sublayer.
+You will usually connect downstream components to a shared curated transformer
+using one of the curated transformer listener layers. This works similarly to
+spaCy's [Tok2Vec](/api/tok2vec) and the
+[Tok2VecListener](/api/architectures/#Tok2VecListener) sublayer.

-Supporting a wide variety of transformer models is a non-goal. If you want to use another type of model, use [spacy-transformers](/api/spacy-transformers), which allows you to use Hugging Face transformers models with spaCy.
+Supporting a wide variety of transformer models is a non-goal. If you want to
+use another type of model, use [spacy-transformers](/api/spacy-transformers),
+which allows you to use Hugging Face transformers models with spaCy.

The component assigns the output of the transformer to the `Doc`'s extension
attributes. We also calculate an alignment between the word-piece tokens and the
@@ -51,8 +59,8 @@ For more details, see the [usage documentation](/usage/embeddings-transformers).
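+A minimal usage sketch (assuming the transformer's weights have already been
+initialized, e.g. through a model loader in the training config, before the
+pipeline is applied to text):
+
+> #### Example
+>
+> ```python
+> # Add the component with its default config and run the pipeline. The
+> # activations are then available via the extension attribute listed below.
+> nlp.add_pipe("curated_transformer")
+> doc = nlp("This is a sentence.")
+> print(doc._.trf_data)
+> ```
+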
The component sets the following
[custom extension attribute](/usage/processing-pipeline#custom-components-attributes):

-| Location         | Value                                                                     |
-| ---------------- | ------------------------------------------------------------------------ |
+| Location         | Value                                                                                |
+| ---------------- | ------------------------------------------------------------------------------------ |
| `Doc._.trf_data` | CuratedTransformer tokens and outputs for the `Doc` object. ~~DocTransformerOutput~~ |

## Config and implementation {id="config"}
@@ -74,8 +82,8 @@ on the transformer architectures and their arguments and hyperparameters.
> ```

| Setting | Description |
-|---------------------|----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------|
-| `model` | The Thinc [`Model`](https://thinc.ai/docs/api-model) wrapping the transformer. Defaults to [CuratedTransformerModel](/api/architectures#CuratedTransformerModel). ~~Model[List[Doc], FullCuratedTransformerBatch]~~ |
+| ------------------- | -------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------- |
+| `model`             | The Thinc [`Model`](https://thinc.ai/docs/api-model) wrapping the transformer. Defaults to [CuratedTransformerModel](/api/architectures#CuratedTransformerModel). ~~Model[List[Doc], FullCuratedTransformerBatch]~~ |
| `frozen` | If `True`, the model's weights are frozen and no backpropagation is performed. ~~bool~~ |
| `all_layer_outputs` | If `True`, the model returns the outputs of all the layers. Otherwise, only the output of the last layer is returned. This must be set to `True` if any of the pipe's downstream listeners require the outputs of all transformer layers. ~~bool~~ |
@@ -110,16 +118,16 @@ https://github.com/explosion/spacy-curated-transformers/blob/main/spacy_curated_
> trf = CuratedTransformer(nlp.vocab, model)
> ```

-Construct a `CuratedTransformer` component. One or more subsequent spaCy components can
-use the transformer outputs as features in its model, with gradients
-backpropagated to the single shared weights. The activations from the
+Construct a `CuratedTransformer` component. One or more subsequent spaCy
+components can use the transformer outputs as features in their models, with
+gradients backpropagated to the single shared weights. The activations from the
transformer are saved in the [`Doc._.trf_data`](#assigned-attributes) extension
attribute. You can also provide a callback to set additional annotations. In
your application, you would normally use a shortcut for this and instantiate the
component using its string name and [`nlp.add_pipe`](/api/language#create_pipe).

| Name | Description |
-|---------------------|----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------|
+| ------------------- | -------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------- |
| `vocab`             | The shared vocabulary. ~~Vocab~~ |
| `model`             | One of the supported pre-trained transformer models. ~~Model~~ |
| _keyword-only_      | |
@@ -194,7 +202,7 @@ by [`Language.initialize`](/api/language#initialize).
> ```

| Name | Description |
-|------------------|----------------------------------------------------------------------------------------------------------------------------------------------------------------|
+| ---------------- | -------------------------------------------------------------------------------------------------------------------------------------------------------------- |
| `get_examples` | Function that returns gold-standard annotations in the form of [`Example`](/api/example) objects. Must contain at least one `Example`. ~~Callable[[], Iterable[Example]]~~ |
| _keyword-only_ | |
| `nlp` | The current `nlp` object. Defaults to `None`. ~~Optional[Language]~~ |
@@ -221,9 +229,9 @@ modifying them.

## CuratedTransformer.set_annotations {id="set_annotations",tag="method"}

Assign the extracted features to the `Doc` objects. By default, the
-[`DocTransformerOutput`](/api/curated-transformer#doctransformeroutput) object is written to the
-[`Doc._.trf_data`](#assigned-attributes) attribute. Your `set_extra_annotations`
-callback is then called, if provided.
+[`DocTransformerOutput`](/api/curated-transformer#doctransformeroutput) object
+is written to the [`Doc._.trf_data`](#assigned-attributes) attribute. Your
+`set_extra_annotations` callback is then called, if provided.
@@ -233,29 +241,28 @@ callback is then called, if provided.

> #### Example
>
> ```
> trf.set_annotations(docs, scores)
> ```

-| Name     | Description                                           |
-| -------- | ----------------------------------------------------- |
-| `docs`   | The documents to modify. ~~Iterable[Doc]~~            |
+| Name     | Description                                                   |
+| -------- | ------------------------------------------------------------ |
+| `docs`   | The documents to modify. ~~Iterable[Doc]~~                    |
| `scores` | The scores to set, produced by `CuratedTransformer.predict`. |

## CuratedTransformer.update {id="update",tag="method"}

Prepare for an update to the transformer.

-Like the [`Tok2Vec`](api/tok2vec) component, the `CuratedTransformer` component is unusual
-in that it does not receive "gold standard" annotations to calculate
-a weight update. The optimal output of the transformer data is unknown;
-it's a hidden layer inside the network that is updated by backpropagating
-from output layers.
+Like the [`Tok2Vec`](/api/tok2vec) component, the `CuratedTransformer` component
+is unusual in that it does not receive "gold standard" annotations to calculate
+a weight update. The optimal output of the transformer data is unknown; it's a
+hidden layer inside the network that is updated by backpropagating from output
+layers.

The `CuratedTransformer` component therefore does not perform a weight update
-during its own `update` method. Instead, it runs its transformer model
-and communicates the output and the backpropagation callback to any
-downstream components that have been connected to it via the
-TransformerListener sublayer. If there are multiple listeners, the last
-layer will actually backprop to the transformer and call the optimizer,
-while the others simply increment the gradients.
-
+during its own `update` method. Instead, it runs its transformer model and
+communicates the output and the backpropagation callback to any downstream
+components that have been connected to it via the TransformerListener sublayer.
+If there are multiple listeners, the last layer will actually backprop to the
+transformer and call the optimizer, while the others simply increment the
+gradients.

> #### Example
>
@@ -339,7 +346,7 @@ Load the pipe from disk. Modifies the object in place and returns it.
| `path` | A path to a directory. Paths may be either strings or `Path`-like objects. ~~Union[str, Path]~~ |
| _keyword-only_ | |
| `exclude` | String names of [serialization fields](#serialization-fields) to exclude. ~~Iterable[str]~~ |
-| **RETURNS** | The modified `CuratedTransformer` object. ~~CuratedTransformer~~ |
+| **RETURNS**    | The modified `CuratedTransformer` object. ~~CuratedTransformer~~ |

## CuratedTransformer.to_bytes {id="to_bytes",tag="method"}
@@ -356,7 +363,7 @@ Serialize the pipe to a bytestring.
| -------------- | ------------------------------------------------------------------------------------------- |
| _keyword-only_ | |
| `exclude` | String names of [serialization fields](#serialization-fields) to exclude. ~~Iterable[str]~~ |
-| **RETURNS** | The serialized form of the `CuratedTransformer` object. ~~bytes~~ |
+| **RETURNS**    | The serialized form of the `CuratedTransformer` object. ~~bytes~~ |

## CuratedTransformer.from_bytes {id="from_bytes",tag="method"}
@@ -375,7 +382,7 @@ Load the pipe from a bytestring. Modifies the object in place and returns it.
| `bytes_data` | The data to load from. ~~bytes~~ |
| _keyword-only_ | |
| `exclude` | String names of [serialization fields](#serialization-fields) to exclude. ~~Iterable[str]~~ |
-| **RETURNS** | The `CuratedTransformer` object. ~~CuratedTransformer~~ |
+| **RETURNS**    | The `CuratedTransformer` object. ~~CuratedTransformer~~ |

## Serialization fields {id="serialization-fields"}
@@ -397,16 +404,16 @@ serialization by passing in the string names via the `exclude` argument.

## DocTransformerOutput {id="transformerdata",tag="dataclass"}

-CuratedTransformer tokens and outputs for one `Doc` object. The transformer models
-return tensors that refer to a whole padded batch of documents. These tensors
-are wrapped into the
+CuratedTransformer tokens and outputs for one `Doc` object. The transformer
+models return tensors that refer to a whole padded batch of documents. These
+tensors are wrapped into the
[FullCuratedTransformerBatch](/api/transformer#fulltransformerbatch) object. The
-`FullCuratedTransformerBatch` then splits out the per-document data, which is handled
-by this class. Instances of this class are typically assigned to the
+`FullCuratedTransformerBatch` then splits out the per-document data, which is
+handled by this class. Instances of this class are typically assigned to the
[`Doc._.trf_data`](/api/transformer#assigned-attributes) extension attribute.

| Name | Description |
-|----------------|----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------|
+| -------------- | ------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------ |
| `tokens` | A slice of the tokens data produced by the tokenizer. This may have several fields, including the token IDs, the texts and the attention mask. See the [`transformers.BatchEncoding`](https://huggingface.co/transformers/main_classes/tokenizer.html#transformers.BatchEncoding) object for details. ~~dict~~ |
| `model_output` | The model output from the transformer model, determined by the model and transformer config. New in `spacy-transformers` v1.1.0. ~~transformers.file_utils.ModelOutput~~ |
| `tensors` | The `model_output` in the earlier `transformers` tuple format converted using [`ModelOutput.to_tuple()`](https://huggingface.co/transformers/main_classes/output.html#transformers.file_utils.ModelOutput.to_tuple). Returns `Tuple` instead of `List` as of `spacy-transformers` v1.1.0. ~~Tuple[Union[FloatsXd, List[FloatsXd]]]~~ |
@@ -417,26 +424,26 @@

Create an empty `DocTransformerOutput` container.

-| Name        | Description                        |
-| ----------- | ---------------------------------- |
+| Name        | Description                             |
+| ----------- | --------------------------------------- |
| **RETURNS** | The container. ~~DocTransformerOutput~~ |

-
-## Span getters {id="span_getters",source="github.com/explosion/spacy-transformers/blob/master/spacy_curated_transformers/span_getters.py"}
+## Span getters {id="span_getters",source="github.com/explosion/spacy-curated-transformers/blob/main/spacy_curated_transformers/span_getters.py"}

Span getters are functions that take a batch of [`Doc`](/api/doc) objects and
-return a lists of [`Span`](/api/span) objects for each doc to be processed by
+return lists of [`Span`](/api/span) objects for each doc to be processed by
the transformer. This is used to manage long documents by cutting them into
smaller sequences before running the transformer. The spans are allowed to
-overlap, and you can also omit sections of the `Doc` if they are not relevant. Span getters can be referenced in the `[components.transformer.model.with_spans]`
-block of the config to customize the sequences processed by the transformer.
+overlap, and you can also omit sections of the `Doc` if they are not relevant.
+Span getters can be referenced in the
+`[components.transformer.model.with_spans]` block of the config to customize the
+sequences processed by the transformer.

| Name | Description |
| ----------- | ------------------------------------------------------------- |
| `docs` | A batch of `Doc` objects. ~~Iterable[Doc]~~ |
| **RETURNS** | The spans to process by the transformer. ~~List[List[Span]]~~ |

-
### WithStridedSpans.v1 {id="strided_spans",tag="registered function"}

> #### Example config
@@ -468,7 +475,7 @@ Placeholder text for tokenizers

Construct a callback that initializes a Byte-BPE piece encoder model.

| Name | Description |
-|---------------|---------------------------------------|
+| ------------- | ------------------------------------- |
| `vocab_path` | Path to the vocabulary file. ~~Path~~ |
| `merges_path` | Path to the merges file. ~~Path~~ |
@@ -477,30 +484,29 @@

Construct a callback that initializes a character piece encoder model.

| Name | Description |
-|-------------|-----------------------------------------------------------------------------|
+| ----------- | --------------------------------------------------------------------------- |
| `path` | Path to the serialized character model. ~~Path~~ |
| `bos_piece` | Piece used as a beginning-of-sentence token. Defaults to `"[BOS]"`. ~~str~~ |
-| `eos_piece` | Piece used as a end-of-sentence token. Defaults to `"[EOS]"`. ~~str~~ |
+| `eos_piece` | Piece used as an end-of-sentence token. Defaults to `"[EOS]"`. ~~str~~ |
| `unk_piece` | Piece used as a stand-in for unknown tokens. Defaults to `"[UNK]"`. ~~str~~ |
| `normalize` | Unicode normalization form to use. Defaults to `"NFKC"`. ~~str~~ |

-
### HFPieceEncoderLoader.v1 {id="hf_pieceencoder_loader",tag="registered_function"}

-Construct a callback that initializes a HuggingFace piece encoder model. Used in conjunction with the HuggingFace model loader.
+Construct a callback that initializes a HuggingFace piece encoder model. Used in
+conjunction with the HuggingFace model loader.

| Name | Description |
-|------------|--------------------------------------------|
+| ---------- | ------------------------------------------ |
| `name` | Name of the HuggingFace model. ~~str~~ |
| `revision` | Name of the model revision/branch. ~~str~~ |

-
### SentencepieceLoader.v1 {id="sentencepiece_loader",tag="registered_function"}

Construct a callback that initializes a SentencePiece piece encoder model.

| Name | Description |
-|--------|------------------------------------------------------|
+| ------ | ---------------------------------------------------- |
| `path` | Path to the serialized SentencePiece model. ~~Path~~ |

### WordpieceLoader.v1 {id="wordpiece_loader",tag="registered_function"}
@@ -508,39 +514,38 @@ Construct a callback that initializes a SentencePiece piece encoder model.

Construct a callback that initializes a WordPiece piece encoder model.

| Name | Description |
-|--------|--------------------------------------------------|
+| ------ | ------------------------------------------------ |
| `path` | Path to the serialized WordPiece model. ~~Path~~ |

## Model Loaders

### HFTransformerEncoderLoader.v1 {id="hf_trfencoder_loader",tag="registered_function"}

-Construct a callback that initializes a supported transformer model with weights from a corresponding HuggingFace model.
+Construct a callback that initializes a supported transformer model with weights
+from a corresponding HuggingFace model.

| Name | Description |
-|------------|--------------------------------------------|
+| ---------- | ------------------------------------------ |
| `name` | Name of the HuggingFace model. ~~str~~ |
| `revision` | Name of the model revision/branch. ~~str~~ |

### PyTorchCheckpointLoader.v1 {id="pytorch_checkpoint_loader",tag="registered_function"}

-Construct a callback that initializes a supported transformer model with weights from a PyTorch checkpoint.
+Construct a callback that initializes a supported transformer model with weights
+from a PyTorch checkpoint.

| Name | Description |
-|--------|------------------------------------------|
+| ------ | ---------------------------------------- |
| `path` | Path to the PyTorch checkpoint. ~~Path~~ |

## Callbacks

### gradual_transformer_unfreezing.v1 {id="gradual_transformer_unfreezing",tag="registered_function"}

-Construct a callback that can be used to gradually unfreeze the
-weights of one or more Transformer components during training. This
-can be used to prevent catastrophic forgetting during fine-tuning.
-
+Construct a callback that can be used to gradually unfreeze the weights of one
+or more Transformer components during training. This helps prevent catastrophic
+forgetting during fine-tuning.
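+
+A sketch of how the callback might be configured is shown below; the registry
+name follows the package's `spacy-curated-transformers.` prefix, and the step
+value is illustrative rather than a recommended default.
+
+> #### Example config
+>
+> ```ini
+> [training.callbacks]
+> @callbacks = "spacy-curated-transformers.gradual_transformer_unfreezing.v1"
+>
+> [training.callbacks.target_pipes]
+> transformer = 500
+> ```
+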
| Name | Description | -|----------------|------------------------------------------------------------------------------------------------------------------------------------------------------------------------------| +| -------------- | ---------------------------------------------------------------------------------------------------------------------------------------------------------------------------- | | `target_pipes` | A dictionary whose keys and values correspond to the names of Transformer components and the training step at which they should be unfrozen respectively. ~~Dict[str, int]~~ | - -
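+
+Conceptually, each entry in `target_pipes` keeps the named component frozen
+until training reaches the given step. A rough illustrative sketch of that
+behavior (not the package's actual implementation):
+
+```python
+# Hypothetical helper mirroring the step-based unfreezing described above.
+target_pipes = {"transformer": 500}
+
+def is_frozen(pipe_name: str, step: int) -> bool:
+    # A pipe stays frozen while the current step is below its unfreeze step.
+    return step < target_pipes.get(pipe_name, 0)
+
+assert is_frozen("transformer", 499)       # still frozen before step 500
+assert not is_frozen("transformer", 500)   # unfrozen from step 500 onwards
+```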