diff --git a/website/docs/api/dependencyparser.md b/website/docs/api/dependencyparser.md
index 9fd8f60d2..ed5e8bdb2 100644
--- a/website/docs/api/dependencyparser.md
+++ b/website/docs/api/dependencyparser.md
@@ -293,7 +293,11 @@ context, the original parameters are restored.
## DependencyParser.add_label {#add_label tag="method"}
-Add a new label to the pipe.
+Add a new label to the pipe. Note that you don't have to call this method if you
+provide a **representative data sample** to the
+[`begin_training`](#begin_training) method. In this case, all labels found in
+the sample will be automatically added to the model, and the output dimension
+will be [inferred](/usage/layers-architectures#shape-inference) automatically.
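+
+As a sketch of this pattern (assuming `examples` is a list of
+[`Example`](/api/example) objects whose reference parses contain the relevant
+dependency labels):
+
+```python
+### Inferring labels from a data sample
+optimizer = parser.begin_training(lambda: examples)
+print(parser.labels)  # labels from the sample, added automatically
+```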
> #### Example
>
@@ -307,17 +311,13 @@ Add a new label to the pipe.
| `label` | The label to add. ~~str~~ |
| **RETURNS** | `0` if the label is already present, otherwise `1`. ~~int~~ |
-Note that you don't have to call `pipe.add_label` if you provide a
-representative data sample to the [`begin_training`](#begin_training) method. In
-this case, all labels found in the sample will be automatically added to the
-model, and the output dimension will be
-[inferred](/usage/layers-architectures#shape-inference) automatically.
-
## DependencyParser.set_output {#set_output tag="method"}
Change the output dimension of the component's model by calling the model's
attribute `resize_output`. This is a function that takes the original model and
-the new output dimension `nO`, and changes the model in place.
+the new output dimension `nO`, and changes the model in place. When resizing an
+already trained model, care should be taken to avoid the "catastrophic
+forgetting" problem.
> #### Example
>
@@ -330,9 +330,6 @@ the new output dimension `nO`, and changes the model in place.
| ---- | --------------------------------- |
| `nO` | The new output dimension. ~~int~~ |
-When resizing an already trained model, care should be taken to avoid the
-"catastrophic forgetting" problem.
-
## DependencyParser.to_disk {#to_disk tag="method"}
Serialize the pipe to disk.
diff --git a/website/docs/api/entityrecognizer.md b/website/docs/api/entityrecognizer.md
index 51ad984ee..fc6904824 100644
--- a/website/docs/api/entityrecognizer.md
+++ b/website/docs/api/entityrecognizer.md
@@ -281,7 +281,11 @@ context, the original parameters are restored.
## EntityRecognizer.add_label {#add_label tag="method"}
-Add a new label to the pipe.
+Add a new label to the pipe. Note that you don't have to call this method if you
+provide a **representative data sample** to the
+[`begin_training`](#begin_training) method. In this case, all labels found in
+the sample will be automatically added to the model, and the output dimension
+will be [inferred](/usage/layers-architectures#shape-inference) automatically.
> #### Example
>
@@ -295,17 +299,13 @@ Add a new label to the pipe.
| `label` | The label to add. ~~str~~ |
| **RETURNS** | `0` if the label is already present, otherwise `1`. ~~int~~ |
-Note that you don't have to call `pipe.add_label` if you provide a
-representative data sample to the [`begin_training`](#begin_training) method. In
-this case, all labels found in the sample will be automatically added to the
-model, and the output dimension will be
-[inferred](/usage/layers-architectures#shape-inference) automatically.
-
## EntityRecognizer.set_output {#set_output tag="method"}
Change the output dimension of the component's model by calling the model's
attribute `resize_output`. This is a function that takes the original model and
-the new output dimension `nO`, and changes the model in place.
+the new output dimension `nO`, and changes the model in place. When resizing an
+already trained model, care should be taken to avoid the "catastrophic
+forgetting" problem.
> #### Example
>
@@ -318,9 +318,6 @@ the new output dimension `nO`, and changes the model in place.
| ---- | --------------------------------- |
| `nO` | The new output dimension. ~~int~~ |
-When resizing an already trained model, care should be taken to avoid the
-"catastrophic forgetting" problem.
-
## EntityRecognizer.to_disk {#to_disk tag="method"}
Serialize the pipe to disk.
diff --git a/website/docs/api/morphologizer.md b/website/docs/api/morphologizer.md
index 120b62b2f..c83d3d9fd 100644
--- a/website/docs/api/morphologizer.md
+++ b/website/docs/api/morphologizer.md
@@ -259,7 +259,11 @@ context, the original parameters are restored.
Add a new label to the pipe. If the `Morphologizer` should set annotations for
both `pos` and `morph`, the label should include the UPOS as the feature `POS`.
Raises an error if the output dimension is already set, or if the model has
-already been fully [initialized](#begin_training).
+already been fully [initialized](#begin_training). Note that you don't have to
+call this method if you provide a **representative data sample** to the
+[`begin_training`](#begin_training) method. In this case, all labels found in
+the sample will be automatically added to the model, and the output dimension
+will be [inferred](/usage/layers-architectures#shape-inference) automatically.
> #### Example
>
@@ -273,12 +277,6 @@ already been fully [initialized](#begin_training).
| `label` | The label to add. ~~str~~ |
| **RETURNS** | `0` if the label is already present, otherwise `1`. ~~int~~ |
-Note that you don't have to call `pipe.add_label` if you provide a
-representative data sample to the [`begin_training`](#begin_training) method. In
-this case, all labels found in the sample will be automatically added to the
-model, and the output dimension will be
-[inferred](/usage/layers-architectures#shape-inference) automatically.
-
## Morphologizer.to_disk {#to_disk tag="method"}
Serialize the pipe to disk.
diff --git a/website/docs/api/pipe.md b/website/docs/api/pipe.md
index 7b77141fa..be1279553 100644
--- a/website/docs/api/pipe.md
+++ b/website/docs/api/pipe.md
@@ -293,12 +293,6 @@ context, the original parameters are restored.
> pipe.add_label("MY_LABEL")
> ```
-<Infobox variant="danger">
-
-This method needs to be overwritten with your own custom `add_label` method.
-
-</Infobox>
-
Add a new label to the pipe, to be predicted by the model. The actual
implementation depends on the specific component, but in general `add_label`
shouldn't be called if the output dimension is already set, or if the model has
@@ -308,6 +302,12 @@ the component is [resizable](#is_resizable), in which case
[`set_output`](#set_output) should be called to ensure that the model is
properly resized.
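+
+For a resizable component, a minimal sketch of this pattern could look like the
+following (the label name is illustrative, and we assume the component exposes
+its current labels via `pipe.labels`):
+
+```python
+### Adding a label to a resizable component
+if pipe.is_resizable():
+    pipe.add_label("MY_LABEL")
+    pipe.set_output(len(pipe.labels))
+```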
+<Infobox variant="danger">
+
+This method needs to be overwritten with your own custom `add_label` method.
+
+</Infobox>
+
| Name | Description |
| ----------- | ------------------------------------------------------- |
| `label` | The label to add. ~~str~~ |
@@ -326,41 +326,37 @@ model, and the output dimension will be
> ```python
> can_resize = pipe.is_resizable()
> ```
+>
+> ```python
+> ### Custom resizing
+> def custom_resize(model, new_nO):
+> # adjust model
+>     # adjust the model in place to match the new output dimension
+>
+> custom_model.attrs["resize_output"] = custom_resize
+> ```
Check whether or not the output dimension of the component's model can be
resized. If this method returns `True`, [`set_output`](#set_output) can be
called to change the model's output dimension.
+For built-in components that are not resizable, you have to create and train a
+new model from scratch with the appropriate architecture and output dimension.
+For custom components, you can implement a `resize_output` function and add it
+as an attribute to the component's model.
+
| Name | Description |
| ----------- | ---------------------------------------------------------------------------------------------- |
| **RETURNS** | Whether or not the output dimension of the model can be changed after initialization. ~~bool~~ |
-> #### Example
->
-> ```python
-> def custom_resize(model, new_nO):
-> # adjust model
-> return model
-> custom_model.attrs["resize_output"] = custom_resize
-> ```
-
-For built-in components that are not resizable, you have to create and train a
-new model from scratch with the appropriate architecture and output dimension.
-
-For custom components, you can implement a `resize_output` function and add it
-as an attribute to the component's model.
-
## Pipe.set_output {#set_output tag="method"}
Change the output dimension of the component's model. If the component is not
-[resizable](#is_resizable), this method will throw a `NotImplementedError`.
-
-If a component is resizable, the model's attribute `resize_output` will be
-called. This is a function that takes the original model and the new output
-dimension `nO`, and changes the model in place.
-
-When resizing an already trained model, care should be taken to avoid the
-"catastrophic forgetting" problem.
+[resizable](#is_resizable), this method will raise a `NotImplementedError`. If a
+component is resizable, the model's attribute `resize_output` will be called.
+This is a function that takes the original model and the new output dimension
+`nO`, and changes the model in place. When resizing an already trained model,
+care should be taken to avoid the "catastrophic forgetting" problem.
> #### Example
>
diff --git a/website/docs/api/tagger.md b/website/docs/api/tagger.md
index 0e929a6ab..eceb28b19 100644
--- a/website/docs/api/tagger.md
+++ b/website/docs/api/tagger.md
@@ -289,7 +289,12 @@ context, the original parameters are restored.
## Tagger.add_label {#add_label tag="method"}
Add a new label to the pipe. Raises an error if the output dimension is already
-set, or if the model has already been fully [initialized](#begin_training).
+set, or if the model has already been fully [initialized](#begin_training). Note
+that you don't have to call this method if you provide a **representative data
+sample** to the [`begin_training`](#begin_training) method. In this case, all
+labels found in the sample will be automatically added to the model, and the
+output dimension will be [inferred](/usage/layers-architectures#shape-inference)
+automatically.
> #### Example
>
@@ -303,12 +308,6 @@ set, or if the model has already been fully [initialized](#begin_training).
| `label` | The label to add. ~~str~~ |
| **RETURNS** | `0` if the label is already present, otherwise `1`. ~~int~~ |
-Note that you don't have to call `pipe.add_label` if you provide a
-representative data sample to the [`begin_training`](#begin_training) method. In
-this case, all labels found in the sample will be automatically added to the
-model, and the output dimension will be
-[inferred](/usage/layers-architectures#shape-inference) automatically.
-
## Tagger.to_disk {#to_disk tag="method"}
Serialize the pipe to disk.
diff --git a/website/docs/api/textcategorizer.md b/website/docs/api/textcategorizer.md
index e0c7c2f79..0d71655c6 100644
--- a/website/docs/api/textcategorizer.md
+++ b/website/docs/api/textcategorizer.md
@@ -298,7 +298,12 @@ Modify the pipe's model, to use the given parameter values.
## TextCategorizer.add_label {#add_label tag="method"}
Add a new label to the pipe. Raises an error if the output dimension is already
-set, or if the model has already been fully [initialized](#begin_training).
+set, or if the model has already been fully [initialized](#begin_training). Note
+that you don't have to call this method if you provide a **representative data
+sample** to the [`begin_training`](#begin_training) method. In this case, all
+labels found in the sample will be automatically added to the model, and the
+output dimension will be [inferred](/usage/layers-architectures#shape-inference)
+automatically.
> #### Example
>
@@ -312,12 +317,6 @@ set, or if the model has already been fully [initialized](#begin_training).
| `label` | The label to add. ~~str~~ |
| **RETURNS** | `0` if the label is already present, otherwise `1`. ~~int~~ |
-Note that you don't have to call `pipe.add_label` if you provide a
-representative data sample to the [`begin_training`](#begin_training) method. In
-this case, all labels found in the sample will be automatically added to the
-model, and the output dimension will be
-[inferred](/usage/layers-architectures#shape-inference) automatically.
-
## TextCategorizer.to_disk {#to_disk tag="method"}
Serialize the pipe to disk.
diff --git a/website/docs/usage/layers-architectures.md b/website/docs/usage/layers-architectures.md
index 894cccc26..6783f2b7f 100644
--- a/website/docs/usage/layers-architectures.md
+++ b/website/docs/usage/layers-architectures.md
@@ -5,8 +5,7 @@ menu:
- ['Type Signatures', 'type-sigs']
- ['Swapping Architectures', 'swap-architectures']
- ['PyTorch & TensorFlow', 'frameworks']
- - ['Custom Models', 'custom-models']
- - ['Thinc implementation', 'thinc']
+ - ['Custom Thinc Models', 'thinc']
- ['Trainable Components', 'components']
next: /usage/projects
---
@@ -226,13 +225,24 @@ you'll be able to try it out in any of the spaCy components.
Thinc allows you to [wrap models](https://thinc.ai/docs/usage-frameworks)
written in other machine learning frameworks like PyTorch, TensorFlow and MXNet
-using a unified [`Model`](https://thinc.ai/docs/api-model) API.
-
-For example, let's use PyTorch to define a very simple Neural network consisting
-of two hidden `Linear` layers with `ReLU` activation and dropout, and a
-softmax-activated output layer.
+using a unified [`Model`](https://thinc.ai/docs/api-model) API. This makes it
+easy to use a model implemented in a different framework to power a component in
+your spaCy pipeline. For example, to wrap a PyTorch model as a Thinc `Model`,
+you can use Thinc's
+[`PyTorchWrapper`](https://thinc.ai/docs/api-layers#pytorchwrapper):
```python
+from thinc.api import PyTorchWrapper
+
+wrapped_pt_model = PyTorchWrapper(torch_model)
+```
+
+Let's use PyTorch to define a very simple neural network consisting of two
+hidden `Linear` layers with `ReLU` activation and dropout, and a
+softmax-activated output layer:
+
+```python
+### PyTorch model
from torch import nn
torch_model = nn.Sequential(
@@ -246,15 +256,6 @@ torch_model = nn.Sequential(
)
```
-This PyTorch model can be wrapped as a Thinc `Model` by using Thinc's
-`PyTorchWrapper`:
-
-```python
-from thinc.api import PyTorchWrapper
-
-wrapped_pt_model = PyTorchWrapper(torch_model)
-```
-
The resulting wrapped `Model` can be used as a **custom architecture** as such,
or can be a **subcomponent of a larger model**. For instance, we can use Thinc's
[`chain`](https://thinc.ai/docs/api-layers#chain) combinator, which works like
@@ -273,21 +274,26 @@ model = chain(char_embed, with_array(wrapped_pt_model))
In the above example, we have combined our custom PyTorch model with a character
embedding layer defined by spaCy.
[CharacterEmbed](/api/architectures#CharacterEmbed) returns a `Model` that takes
-a `List[Doc]` as input, and outputs a `List[Floats2d]`. To make sure that the
-wrapped PyTorch model receives valid inputs, we use Thinc's
+a ~~List[Doc]~~ as input, and outputs a ~~List[Floats2d]~~. To make sure that
+the wrapped PyTorch model receives valid inputs, we use Thinc's
[`with_array`](https://thinc.ai/docs/api-layers#with_array) helper.
-As another example, you could have a model where you use PyTorch just for the
-transformer layers, and use "native" Thinc layers to do fiddly input and output
-transformations and add on task-specific "heads", as efficiency is less of a
-consideration for those parts of the network.
+You could also implement a model that only uses PyTorch for the transformer
+layers, and "native" Thinc layers to do fiddly input and output transformations
+and add on task-specific "heads", as efficiency is less of a consideration for
+those parts of the network.
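+
+A rough sketch of this pattern, with purely illustrative dimensions, layer
+choices and task head:
+
+```python
+### PyTorch transformer with Thinc heads
+from thinc.api import PyTorchWrapper, chain, Relu, Softmax
+from torch import nn
+
+width = 64  # illustrative embedding width
+torch_encoder = nn.TransformerEncoder(
+    nn.TransformerEncoderLayer(d_model=width, nhead=8), num_layers=2
+)
+# PyTorch handles the transformer, Thinc the task-specific head
+model = chain(PyTorchWrapper(torch_encoder), Relu(width), Softmax())
+```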
-## Custom models for trainable components {#custom-models}
+### Using wrapped models {#frameworks-usage}
To use our custom model including the PyTorch subnetwork, all we need to do is
-register the architecture. The full example then becomes:
+register the architecture using the
+[`architectures` registry](/api/top-level#registry). This will assign the
+architecture a name so spaCy knows how to find it, and allows passing in
+arguments like hyperparameters via the [config](/usage/training#config). The
+full example then becomes:
```python
+### Registering the architecture {highlight="9"}
from typing import List
from thinc.types import Floats2d
from thinc.api import Model, PyTorchWrapper, chain, with_array
@@ -297,7 +303,7 @@ from spacy.ml import CharacterEmbed
from torch import nn
@spacy.registry.architectures("CustomTorchModel.v1")
-def TorchModel(
+def create_torch_model(
nO: int,
width: int,
hidden_width: int,
@@ -321,8 +327,10 @@ def TorchModel(
return model
```
-Now you can use this model definition in any existing trainable spaCy component,
-by specifying it in the config file:
+The model definition can now be used in any existing trainable spaCy component,
+by specifying it in the config file. In this configuration, all required
+parameters for the various subcomponents of the custom architecture are passed
+in as settings via the config.
```ini
### config.cfg (excerpt) {highlight="5-5"}
@@ -340,106 +348,124 @@ nC = 8
dropout = 0.2
```
-In this configuration, we pass all required parameters for the various
-subcomponents of the custom architecture as settings in the training config
-file. Remember that it is best not to rely on any (hidden) default values, to
-ensure that training configs are complete and experiments fully reproducible.
+<Infobox variant="warning">
+
-## Thinc implemention details {#thinc}
+Remember that it is best not to rely on any (hidden) default values, to ensure
+that training configs are complete and experiments fully reproducible.
-Ofcourse it's also possible to define the `Model` from the previous section
+
+</Infobox>
+
+## Custom models with Thinc {#thinc}
+
+Of course it's also possible to define the `Model` from the previous section
entirely in Thinc. The Thinc documentation provides details on the
[various layers](https://thinc.ai/docs/api-layers) and helper functions
-available.
-
-The combinators often used in Thinc can be used to
-[overload operators](https://thinc.ai/docs/usage-models#operators). A common
-usage is to bind `chain` to `>>`. The "native" Thinc version of our simple
-neural network would then become:
+available. Combinators can also be used to
+[overload operators](https://thinc.ai/docs/usage-models#operators), and a common
+usage pattern is to bind `chain` to `>>`. The "native" Thinc version of our
+simple neural network would then become:
```python
from thinc.api import chain, with_array, Model, Relu, Dropout, Softmax
from spacy.ml import CharacterEmbed
char_embed = CharacterEmbed(width, embed_size, nM, nC)
-
with Model.define_operators({">>": chain}):
layers = (
- Relu(hidden_width, width)
- >> Dropout(dropout)
- >> Relu(hidden_width, hidden_width)
- >> Dropout(dropout)
- >> Softmax(nO, hidden_width)
+ Relu(hidden_width, width)
+ >> Dropout(dropout)
+ >> Relu(hidden_width, hidden_width)
+ >> Dropout(dropout)
+ >> Softmax(nO, hidden_width)
)
model = char_embed >> with_array(layers)
```
-**⚠️ Note that Thinc layers define the output dimension (`nO`) as the first
-argument, followed (optionally) by the input dimension (`nI`). This is in
-contrast to how the PyTorch layers are defined, where `in_features` precedes
-`out_features`.**
+<Infobox variant="warning">
+
-### Shape inference in thinc {#shape-inference}
+Note that Thinc layers define the output dimension (`nO`) as the first argument,
+followed (optionally) by the input dimension (`nI`). This is in contrast to how
+the PyTorch layers are defined, where `in_features` precedes `out_features`.
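+
+For example, comparing a Thinc
+[`Linear`](https://thinc.ai/docs/api-layers#linear) layer to its PyTorch
+counterpart, with both mapping 64 inputs to 128 outputs:
+
+```python
+from thinc.api import Linear
+from torch import nn
+
+thinc_linear = Linear(nO=128, nI=64)  # output dimension first
+torch_linear = nn.Linear(64, 128)     # in_features first, out_features second
+```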
-It is not strictly necessary to define all the input and output dimensions for
-each layer, as Thinc can perform
+
+</Infobox>
+
+### Shape inference in Thinc {#thinc-shape-inference}
+
+It is **not** strictly necessary to define all the input and output dimensions
+for each layer, as Thinc can perform
[shape inference](https://thinc.ai/docs/usage-models#validation) between
sequential layers by matching up the output dimensionality of one layer to the
input dimensionality of the next. This means that we can simplify the `layers`
definition:
+> #### Diff
+>
+> ```diff
+> layers = (
+> Relu(hidden_width, width)
+> >> Dropout(dropout)
+> - >> Relu(hidden_width, hidden_width)
+> + >> Relu(hidden_width)
+> >> Dropout(dropout)
+> - >> Softmax(nO, hidden_width)
+> + >> Softmax(nO)
+> )
+> ```
+
```python
with Model.define_operators({">>": chain}):
layers = (
- Relu(hidden_width, width)
- >> Dropout(dropout)
- >> Relu(hidden_width)
- >> Dropout(dropout)
- >> Softmax(nO)
+ Relu(hidden_width, width)
+ >> Dropout(dropout)
+ >> Relu(hidden_width)
+ >> Dropout(dropout)
+ >> Softmax(nO)
)
```
-Thinc can go one step further and deduce the correct input dimension of the
-first layer, and output dimension of the last. To enable this functionality, you
-have to call [`model.initialize`](https://thinc.ai/docs/api-model#initialize)
-with an input sample `X` and an output sample `Y` with the correct dimensions.
+Thinc can even go one step further and **deduce the correct input dimension** of
+the first layer, and the output dimension of the last. To enable this
+functionality, you have to call
+[`Model.initialize`](https://thinc.ai/docs/api-model#initialize) with an **input
+sample** `X` and an **output sample** `Y` with the correct dimensions:
```python
+### Shape inference with initialization {highlight="3,7,10"}
with Model.define_operators({">>": chain}):
layers = (
- Relu(hidden_width)
- >> Dropout(dropout)
- >> Relu(hidden_width)
- >> Dropout(dropout)
- >> Softmax()
+ Relu(hidden_width)
+ >> Dropout(dropout)
+ >> Relu(hidden_width)
+ >> Dropout(dropout)
+ >> Softmax()
)
model = char_embed >> with_array(layers)
model.initialize(X=input_sample, Y=output_sample)
```
The built-in [pipeline components](/usage/processing-pipelines) in spaCy ensure
-that their internal models are always initialized with appropriate sample data.
-In this case, `X` is typically a `List` of `Doc` objects, while `Y` is a `List`
-of 1D or 2D arrays, depending on the specific task. This functionality is
-triggered when [`nlp.begin_training`](/api/language#begin_training) is called.
+that their internal models are **always initialized** with appropriate sample
+data. In this case, `X` is typically a ~~List[Doc]~~, while `Y` is typically a
+~~List[Array1d]~~ or ~~List[Array2d]~~, depending on the specific task. This
+functionality is triggered when
+[`nlp.begin_training`](/api/language#begin_training) is called.
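+
+For instance, a sample for a token-level prediction task could be constructed
+along these lines (the text, shapes and label count are illustrative):
+
+```python
+### Initializing with sample data
+import numpy
+from spacy.lang.en import English
+
+nlp = English()
+doc_sample = [nlp.make_doc("This is a sample text")]
+# one row per token, one column per label (3 labels here, for illustration)
+label_sample = [numpy.zeros((len(doc_sample[0]), 3), dtype="float32")]
+model.initialize(X=doc_sample, Y=label_sample)
+```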
-### Dropout and normalization {#drop-norm}
+### Dropout and normalization in Thinc {#thinc-dropout-norm}
-Many of the `Thinc` layers allow you to define a `dropout` argument that will
-result in "chaining" an additional
+Many of the available Thinc [layers](https://thinc.ai/docs/api-layers) allow you
+to define a `dropout` argument that will result in "chaining" an additional
[`Dropout`](https://thinc.ai/docs/api-layers#dropout) layer. Optionally, you can
often specify whether or not you want to add layer normalization, which would
result in an additional
-[`LayerNorm`](https://thinc.ai/docs/api-layers#layernorm) layer.
-
-That means that the following `layers` definition is equivalent to the previous:
+[`LayerNorm`](https://thinc.ai/docs/api-layers#layernorm) layer. That means that
+the following `layers` definition is equivalent to the previous:
```python
with Model.define_operators({">>": chain}):
layers = (
- Relu(hidden_width, dropout=dropout, normalize=False)
- >> Relu(hidden_width, dropout=dropout, normalize=False)
- >> Softmax()
+ Relu(hidden_width, dropout=dropout, normalize=False)
+ >> Relu(hidden_width, dropout=dropout, normalize=False)
+ >> Softmax()
)
model = char_embed >> with_array(layers)
model.initialize(X=input_sample, Y=output_sample)
diff --git a/website/meta/type-annotations.json b/website/meta/type-annotations.json
index b1d94403d..79d4d357d 100644
--- a/website/meta/type-annotations.json
+++ b/website/meta/type-annotations.json
@@ -34,6 +34,8 @@
"Floats2d": "https://thinc.ai/docs/api-types#types",
"Floats3d": "https://thinc.ai/docs/api-types#types",
"FloatsXd": "https://thinc.ai/docs/api-types#types",
+ "Array1d": "https://thinc.ai/docs/api-types#types",
+ "Array2d": "https://thinc.ai/docs/api-types#types",
"Ops": "https://thinc.ai/docs/api-backends#ops",
"cymem.Pool": "https://github.com/explosion/cymem",
"preshed.BloomFilter": "https://github.com/explosion/preshed",