diff --git a/website/docs/api/transformer.md b/website/docs/api/transformer.md
index c32651e02..b09455b41 100644
--- a/website/docs/api/transformer.md
+++ b/website/docs/api/transformer.md
@@ -29,7 +29,7 @@ This pipeline component lets you use transformer models in your pipeline.
 Supports all models that are available via the
 [HuggingFace `transformers`](https://huggingface.co/transformers) library.
 Usually you will connect subsequent components to the shared transformer using
-the [TransformerListener](/api/architectures#TransformerListener) layer. This
+the [TransformerListener](/api/architectures#transformers-Tok2VecListener) layer. This
 works similarly to spaCy's [Tok2Vec](/api/tok2vec) component and
 [Tok2VecListener](/api/architectures/Tok2VecListener) sublayer.
 
@@ -233,7 +233,7 @@ The `Transformer` component therefore does **not** perform a weight update
 during its own `update` method. Instead, it runs its transformer model and
 communicates the output and the backpropagation callback to any **downstream
 components** that have been connected to it via the
-[TransformerListener](/api/architectures#TransformerListener) sublayer. If there
+[TransformerListener](/api/architectures#transformers-Tok2VecListener) sublayer. If there
 are multiple listeners, the last layer will actually backprop to the
 transformer and call the optimizer, while the others simply increment the
 gradients.
diff --git a/website/docs/usage/embeddings-transformers.md b/website/docs/usage/embeddings-transformers.md
index e2c1a6fd0..b5f58927a 100644
--- a/website/docs/usage/embeddings-transformers.md
+++ b/website/docs/usage/embeddings-transformers.md
@@ -101,7 +101,7 @@ it processes a batch of documents, it will pass forward its predictions to the
 listeners, allowing the listeners to **reuse the predictions** when they are
 eventually called. A similar mechanism is used to pass gradients from the
 listeners back to the model. The [`Transformer`](/api/transformer) component and
-[TransformerListener](/api/architectures#TransformerListener) layer do the same
+[TransformerListener](/api/architectures#transformers-Tok2VecListener) layer do the same
 thing for transformer models, but the `Transformer` component will also save
 the transformer outputs to the
 [`Doc._.trf_data`](/api/transformer#custom_attributes) extension attribute,
@@ -179,7 +179,7 @@ interoperates with [PyTorch](https://pytorch.org) and the
 giving you access to thousands of pretrained models for your pipelines. There
 are many [great guides](http://jalammar.github.io/illustrated-transformer/) to
 transformer models, but for practical purposes, you can simply think of them as
-a drop-in replacement that let you achieve **higher accuracy** in exchange for
+drop-in replacements that let you achieve **higher accuracy** in exchange for
 **higher training and runtime costs**.
 
 ### Setup and installation {#transformers-installation}
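
For context on the listener mechanism these link targets describe, a minimal runtime sketch follows. It assumes a transformer-based pipeline package such as `en_core_web_trf` is installed; the package name and example sentence are illustrative only.

```python
import spacy

# Minimal sketch, assuming the transformer-based pipeline package
# "en_core_web_trf" is installed; package name and text are illustrative.
nlp = spacy.load("en_core_web_trf")
doc = nlp("The shared transformer passes its predictions to listener layers.")

# The Transformer component stores its raw outputs on the Doc, so downstream
# components connected through a listener layer can reuse those predictions
# instead of re-running the transformer model.
print(doc._.trf_data)
```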