diff --git a/netlify.toml b/netlify.toml
index ddcd0ca6c..27a15fa25 100644
--- a/netlify.toml
+++ b/netlify.toml
@@ -55,6 +55,7 @@ redirects = [
{from = "/models/comparison", to = "/models", force = true},
{from = "/api/#section-cython", to = "/api/cython", force = true},
{from = "/api/#cython", to = "/api/cython", force = true},
+ {from = "/api/architectures#TextCatCNN", to = "/api/legacy#TextCatCNN_v2", force = true},
{from = "/api/sentencesegmenter", to="/api/sentencizer"},
{from = "/universe", to = "/universe/project/:id", query = {id = ":id"}, force = true},
{from = "/universe", to = "/universe/category/:category", query = {category = ":category"}, force = true},
diff --git a/website/docs/api/architectures.mdx b/website/docs/api/architectures.mdx
index 9447ca116..643d66140 100644
--- a/website/docs/api/architectures.mdx
+++ b/website/docs/api/architectures.mdx
@@ -1018,49 +1018,6 @@ but used an internal `tok2vec` instead of taking it as argument:
-### spacy.TextCatCNN.v2 {id="TextCatCNN"}
-
-> #### Example Config
->
-> ```ini
-> [model]
-> @architectures = "spacy.TextCatCNN.v2"
-> exclusive_classes = false
-> nO = null
->
-> [model.tok2vec]
-> @architectures = "spacy.HashEmbedCNN.v2"
-> pretrained_vectors = null
-> width = 96
-> depth = 4
-> embed_size = 2000
-> window_size = 1
-> maxout_pieces = 3
-> subword_features = true
-> ```
-
-A neural network model where token vectors are calculated using a CNN. The
-vectors are mean pooled and used as features in a feed-forward network. This
-architecture is usually less accurate than the ensemble, but runs faster.
-
-This model is identical to [TexCatReduce.v1](#TextCatReduce) with
-`use_reduce_mean=true`, `use_reduce_first=false` and `use_reduce_max=false`.
-
-| Name | Description |
-| ------------------- | ---------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------- |
-| `exclusive_classes` | Whether or not categories are mutually exclusive. ~~bool~~ |
-| `tok2vec` | The [`tok2vec`](#tok2vec) layer of the model. ~~Model~~ |
-| `nO` | Output dimension, determined by the number of different labels. If not set, the [`TextCategorizer`](/api/textcategorizer) component will set it when `initialize` is called. ~~Optional[int]~~ |
-| **CREATES** | The model using the architecture. ~~Model[List[Doc], Floats2d]~~ |
-
-
-
-[TextCatCNN.v1](/api/legacy#TextCatCNN_v1) had the exact same signature, but was
-not yet resizable. Since v2, new labels can be added to this component, even
-after training.
-
-
-
### spacy.TextCatBOW.v3 {id="TextCatBOW"}
> #### Example Config
diff --git a/website/docs/api/legacy.mdx b/website/docs/api/legacy.mdx
index 32111ce92..5fdc791c2 100644
--- a/website/docs/api/legacy.mdx
+++ b/website/docs/api/legacy.mdx
@@ -162,7 +162,10 @@ network has an internal CNN Tok2Vec layer and uses attention.
Since `spacy.TextCatCNN.v2`, this architecture has become resizable, which means
that you can add labels to a previously trained textcat. `TextCatCNN` v1 did not
-yet support that.
+yet support that. `TextCatCNN` has been replaced by the more general
+[`TextCatReduce`](/api/architectures#TextCatReduce) layer. `TextCatCNN` is
+identical to `TextCatReduce` with `use_reduce_mean=true`,
+`use_reduce_first=false` and `use_reduce_max=false`.
> #### Example Config
>
@@ -194,6 +197,51 @@ architecture is usually less accurate than the ensemble, but runs faster.
| `nO` | Output dimension, determined by the number of different labels. If not set, the [`TextCategorizer`](/api/textcategorizer) component will set it when `initialize` is called. ~~Optional[int]~~ |
| **CREATES** | The model using the architecture. ~~Model[List[Doc], Floats2d]~~ |
+### spacy.TextCatCNN.v2 {id="TextCatCNN_v2"}
+
+> #### Example Config
+>
+> ```ini
+> [model]
+> @architectures = "spacy.TextCatCNN.v2"
+> exclusive_classes = false
+> nO = null
+>
+> [model.tok2vec]
+> @architectures = "spacy.HashEmbedCNN.v2"
+> pretrained_vectors = null
+> width = 96
+> depth = 4
+> embed_size = 2000
+> window_size = 1
+> maxout_pieces = 3
+> subword_features = true
+> ```
+
+A neural network model where token vectors are calculated using a CNN. The
+vectors are mean pooled and used as features in a feed-forward network. This
+architecture is usually less accurate than the ensemble, but runs faster.
+
+`TextCatCNN` has been replaced by the more general
+[`TextCatReduce`](/api/architectures#TextCatReduce) layer. `TextCatCNN` is
+identical to `TextCatReduce` with `use_reduce_mean=true`,
+`use_reduce_first=false` and `use_reduce_max=false`.
+
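+As a sketch, the example config above could be expressed with `TextCatReduce`
+roughly as follows. This assumes that `spacy.TextCatReduce.v1` also takes a
+`use_reduce_last` setting, which is disabled here so that mean pooling is the
+only reduction, matching the behavior of `TextCatCNN`:
+
+```ini
+[model]
+@architectures = "spacy.TextCatReduce.v1"
+exclusive_classes = false
+use_reduce_first = false
+# Assumed parameter: disabled so that mean pooling is the only reduction
+use_reduce_last = false
+use_reduce_max = false
+use_reduce_mean = true
+nO = null
+
+[model.tok2vec]
+@architectures = "spacy.HashEmbedCNN.v2"
+pretrained_vectors = null
+width = 96
+depth = 4
+embed_size = 2000
+window_size = 1
+maxout_pieces = 3
+subword_features = true
+```
+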
+| Name | Description |
+| ------------------- | ---------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------- |
+| `exclusive_classes` | Whether or not categories are mutually exclusive. ~~bool~~ |
+| `tok2vec`           | The [`tok2vec`](/api/architectures#tok2vec) layer of the model. ~~Model~~ |
+| `nO` | Output dimension, determined by the number of different labels. If not set, the [`TextCategorizer`](/api/textcategorizer) component will set it when `initialize` is called. ~~Optional[int]~~ |
+| **CREATES** | The model using the architecture. ~~Model[List[Doc], Floats2d]~~ |
+
+
+
+[TextCatCNN.v1](/api/legacy#TextCatCNN_v1) had the exact same signature, but was
+not yet resizable. Since v2, new labels can be added to this component, even
+after training.
+
+
+
### spacy.TextCatBOW.v1 {id="TextCatBOW_v1"}
Since `spacy.TextCatBOW.v2`, this architecture has become resizable, which means