diff --git a/website/docs/api/dependencyparser.md b/website/docs/api/dependencyparser.md
index 9fd8f60d2..ed5e8bdb2 100644
--- a/website/docs/api/dependencyparser.md
+++ b/website/docs/api/dependencyparser.md
@@ -293,7 +293,11 @@ context, the original parameters are restored.
## DependencyParser.add_label {#add_label tag="method"}
-Add a new label to the pipe.
+Add a new label to the pipe. Note that you don't have to call this method if you
+provide a **representative data sample** to the
+[`begin_training`](#begin_training) method. In this case, all labels found in
+the sample will be automatically added to the model, and the output dimension
+will be [inferred](/usage/layers-architectures#shape-inference) automatically.
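+
+As a sketch of this pattern (assuming `examples` is a list of
+[`Example`](/api/example) objects whose reference parses contain the relevant
+dependency labels):
+
+```python
+### Inferring labels from a data sample
+optimizer = parser.begin_training(lambda: examples)
+print(parser.labels)  # labels from the sample, added automatically
+```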
> #### Example
>
@@ -307,17 +311,13 @@ Add a new label to the pipe.
| `label` | The label to add. ~~str~~ |
| **RETURNS** | `0` if the label is already present, otherwise `1`. ~~int~~ |
-Note that you don't have to call `pipe.add_label` if you provide a
-representative data sample to the [`begin_training`](#begin_training) method. In
-this case, all labels found in the sample will be automatically added to the
-model, and the output dimension will be
-[inferred](/usage/layers-architectures#shape-inference) automatically.
-
## DependencyParser.set_output {#set_output tag="method"}
Change the output dimension of the component's model by calling the model's
attribute `resize_output`. This is a function that takes the original model and
-the new output dimension `nO`, and changes the model in place.
+the new output dimension `nO`, and changes the model in place. When resizing an
+already trained model, care should be taken to avoid the "catastrophic
+forgetting" problem.
> #### Example
>
@@ -330,9 +330,6 @@ the new output dimension `nO`, and changes the model in place.
| ---- | --------------------------------- |
| `nO` | The new output dimension. ~~int~~ |
-When resizing an already trained model, care should be taken to avoid the
-"catastrophic forgetting" problem.
-
## DependencyParser.to_disk {#to_disk tag="method"}
Serialize the pipe to disk.
diff --git a/website/docs/api/entityrecognizer.md b/website/docs/api/entityrecognizer.md
index 51ad984ee..fc6904824 100644
--- a/website/docs/api/entityrecognizer.md
+++ b/website/docs/api/entityrecognizer.md
@@ -281,7 +281,11 @@ context, the original parameters are restored.
## EntityRecognizer.add_label {#add_label tag="method"}
-Add a new label to the pipe.
+Add a new label to the pipe. Note that you don't have to call this method if you
+provide a **representative data sample** to the
+[`begin_training`](#begin_training) method. In this case, all labels found in
+the sample will be automatically added to the model, and the output dimension
+will be [inferred](/usage/layers-architectures#shape-inference) automatically.
> #### Example
>
@@ -295,17 +299,13 @@ Add a new label to the pipe.
| `label` | The label to add. ~~str~~ |
| **RETURNS** | `0` if the label is already present, otherwise `1`. ~~int~~ |
-Note that you don't have to call `pipe.add_label` if you provide a
-representative data sample to the [`begin_training`](#begin_training) method. In
-this case, all labels found in the sample will be automatically added to the
-model, and the output dimension will be
-[inferred](/usage/layers-architectures#shape-inference) automatically.
-
## EntityRecognizer.set_output {#set_output tag="method"}
Change the output dimension of the component's model by calling the model's
attribute `resize_output`. This is a function that takes the original model and
-the new output dimension `nO`, and changes the model in place.
+the new output dimension `nO`, and changes the model in place. When resizing an
+already trained model, care should be taken to avoid the "catastrophic
+forgetting" problem.
> #### Example
>
@@ -318,9 +318,6 @@ the new output dimension `nO`, and changes the model in place.
| ---- | --------------------------------- |
| `nO` | The new output dimension. ~~int~~ |
-When resizing an already trained model, care should be taken to avoid the
-"catastrophic forgetting" problem.
-
## EntityRecognizer.to_disk {#to_disk tag="method"}
Serialize the pipe to disk.
diff --git a/website/docs/api/morphologizer.md b/website/docs/api/morphologizer.md
index 120b62b2f..c83d3d9fd 100644
--- a/website/docs/api/morphologizer.md
+++ b/website/docs/api/morphologizer.md
@@ -259,7 +259,11 @@ context, the original parameters are restored.
Add a new label to the pipe. If the `Morphologizer` should set annotations for
both `pos` and `morph`, the label should include the UPOS as the feature `POS`.
Raises an error if the output dimension is already set, or if the model has
-already been fully [initialized](#begin_training).
+already been fully [initialized](#begin_training). Note that you don't have to
+call this method if you provide a **representative data sample** to the
+[`begin_training`](#begin_training) method. In this case, all labels found in
+the sample will be automatically added to the model, and the output dimension
+will be [inferred](/usage/layers-architectures#shape-inference) automatically.
> #### Example
>
@@ -273,12 +277,6 @@ already been fully [initialized](#begin_training).
| `label` | The label to add. ~~str~~ |
| **RETURNS** | `0` if the label is already present, otherwise `1`. ~~int~~ |
-Note that you don't have to call `pipe.add_label` if you provide a
-representative data sample to the [`begin_training`](#begin_training) method. In
-this case, all labels found in the sample will be automatically added to the
-model, and the output dimension will be
-[inferred](/usage/layers-architectures#shape-inference) automatically.
-
## Morphologizer.to_disk {#to_disk tag="method"}
Serialize the pipe to disk.
diff --git a/website/docs/api/pipe.md b/website/docs/api/pipe.md
index 7b77141fa..be1279553 100644
--- a/website/docs/api/pipe.md
+++ b/website/docs/api/pipe.md
@@ -293,12 +293,6 @@ context, the original parameters are restored.
> pipe.add_label("MY_LABEL")
> ```
-<Infobox variant="danger">
-
-This method needs to be overwritten with your own custom `add_label` method.
-
-</Infobox>
-
Add a new label to the pipe, to be predicted by the model. The actual
implementation depends on the specific component, but in general `add_label`
shouldn't be called if the output dimension is already set, or if the model has
@@ -308,6 +302,12 @@ the component is [resizable](#is_resizable), in which case
[`set_output`](#set_output) should be called to ensure that the model is
properly resized.
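+
+For a resizable component, a minimal sketch of this pattern could look like the
+following (the label name is illustrative, and we assume the component exposes
+its current labels via `pipe.labels`):
+
+```python
+### Adding a label to a resizable component
+if pipe.is_resizable():
+    pipe.add_label("MY_LABEL")
+    pipe.set_output(len(pipe.labels))
+```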
+<Infobox variant="danger">
+
+This method needs to be overwritten with your own custom `add_label` method.
+
+</Infobox>
+
| Name | Description |
| ----------- | ------------------------------------------------------- |
| `label` | The label to add. ~~str~~ |
@@ -326,41 +326,37 @@ model, and the output dimension will be
> ```python
> can_resize = pipe.is_resizable()
> ```
+>
+> ```python
+> ### Custom resizing
+> def custom_resize(model, new_nO):
+> # adjust model
+>     # adjust the model in place to match the new output dimension
+>
+> custom_model.attrs["resize_output"] = custom_resize
+> ```
Check whether or not the output dimension of the component's model can be
resized. If this method returns `True`, [`set_output`](#set_output) can be
called to change the model's output dimension.
+For built-in components that are not resizable, you have to create and train a
+new model from scratch with the appropriate architecture and output dimension.
+For custom components, you can implement a `resize_output` function and add it
+as an attribute to the component's model.
+
| Name | Description |
| ----------- | ---------------------------------------------------------------------------------------------- |
| **RETURNS** | Whether or not the output dimension of the model can be changed after initialization. ~~bool~~ |
-> #### Example
->
-> ```python
-> def custom_resize(model, new_nO):
-> # adjust model
-> return model
-> custom_model.attrs["resize_output"] = custom_resize
-> ```
-
-For built-in components that are not resizable, you have to create and train a
-new model from scratch with the appropriate architecture and output dimension.
-
-For custom components, you can implement a `resize_output` function and add it
-as an attribute to the component's model.
-
## Pipe.set_output {#set_output tag="method"}
Change the output dimension of the component's model. If the component is not
-[resizable](#is_resizable), this method will throw a `NotImplementedError`.
-
-If a component is resizable, the model's attribute `resize_output` will be
-called. This is a function that takes the original model and the new output
-dimension `nO`, and changes the model in place.
-
-When resizing an already trained model, care should be taken to avoid the
-"catastrophic forgetting" problem.
+[resizable](#is_resizable), this method will raise a `NotImplementedError`. If a
+component is resizable, the model's attribute `resize_output` will be called.
+This is a function that takes the original model and the new output dimension
+`nO`, and changes the model in place. When resizing an already trained model,
+care should be taken to avoid the "catastrophic forgetting" problem.
> #### Example
>
diff --git a/website/docs/api/tagger.md b/website/docs/api/tagger.md
index 0e929a6ab..eceb28b19 100644
--- a/website/docs/api/tagger.md
+++ b/website/docs/api/tagger.md
@@ -289,7 +289,12 @@ context, the original parameters are restored.
## Tagger.add_label {#add_label tag="method"}
Add a new label to the pipe. Raises an error if the output dimension is already
-set, or if the model has already been fully [initialized](#begin_training).
+set, or if the model has already been fully [initialized](#begin_training). Note
+that you don't have to call this method if you provide a **representative data
+sample** to the [`begin_training`](#begin_training) method. In this case, all
+labels found in the sample will be automatically added to the model, and the
+output dimension will be [inferred](/usage/layers-architectures#shape-inference)
+automatically.
> #### Example
>
@@ -303,12 +308,6 @@ set, or if the model has already been fully [initialized](#begin_training).
| `label` | The label to add. ~~str~~ |
| **RETURNS** | `0` if the label is already present, otherwise `1`. ~~int~~ |
-Note that you don't have to call `pipe.add_label` if you provide a
-representative data sample to the [`begin_training`](#begin_training) method. In
-this case, all labels found in the sample will be automatically added to the
-model, and the output dimension will be
-[inferred](/usage/layers-architectures#shape-inference) automatically.
-
## Tagger.to_disk {#to_disk tag="method"}
Serialize the pipe to disk.
diff --git a/website/docs/api/textcategorizer.md b/website/docs/api/textcategorizer.md
index e0c7c2f79..0d71655c6 100644
--- a/website/docs/api/textcategorizer.md
+++ b/website/docs/api/textcategorizer.md
@@ -298,7 +298,12 @@ Modify the pipe's model, to use the given parameter values.
## TextCategorizer.add_label {#add_label tag="method"}
Add a new label to the pipe. Raises an error if the output dimension is already
-set, or if the model has already been fully [initialized](#begin_training).
+set, or if the model has already been fully [initialized](#begin_training). Note
+that you don't have to call this method if you provide a **representative data
+sample** to the [`begin_training`](#begin_training) method. In this case, all
+labels found in the sample will be automatically added to the model, and the
+output dimension will be [inferred](/usage/layers-architectures#shape-inference)
+automatically.
> #### Example
>
@@ -312,12 +317,6 @@ set, or if the model has already been fully [initialized](#begin_training).
| `label` | The label to add. ~~str~~ |
| **RETURNS** | `0` if the label is already present, otherwise `1`. ~~int~~ |
-Note that you don't have to call `pipe.add_label` if you provide a
-representative data sample to the [`begin_training`](#begin_training) method. In
-this case, all labels found in the sample will be automatically added to the
-model, and the output dimension will be
-[inferred](/usage/layers-architectures#shape-inference) automatically.
-
## TextCategorizer.to_disk {#to_disk tag="method"}
Serialize the pipe to disk.
diff --git a/website/docs/usage/layers-architectures.md b/website/docs/usage/layers-architectures.md
index 894cccc26..6783f2b7f 100644
--- a/website/docs/usage/layers-architectures.md
+++ b/website/docs/usage/layers-architectures.md
@@ -5,8 +5,7 @@ menu:
- ['Type Signatures', 'type-sigs']
- ['Swapping Architectures', 'swap-architectures']
- ['PyTorch & TensorFlow', 'frameworks']
- - ['Custom Models', 'custom-models']
- - ['Thinc implementation', 'thinc']
+ - ['Custom Thinc Models', 'thinc']
- ['Trainable Components', 'components']
next: /usage/projects
---
@@ -226,13 +225,24 @@ you'll be able to try it out in any of the spaCy components.
Thinc allows you to [wrap models](https://thinc.ai/docs/usage-frameworks)
written in other machine learning frameworks like PyTorch, TensorFlow and MXNet
-using a unified [`Model`](https://thinc.ai/docs/api-model) API.
-
-For example, let's use PyTorch to define a very simple Neural network consisting
-of two hidden `Linear` layers with `ReLU` activation and dropout, and a
-softmax-activated output layer.
+using a unified [`Model`](https://thinc.ai/docs/api-model) API. This makes it
+easy to use a model implemented in a different framework to power a component in
+your spaCy pipeline. For example, to wrap a PyTorch model as a Thinc `Model`,
+you can use Thinc's
+[`PyTorchWrapper`](https://thinc.ai/docs/api-layers#pytorchwrapper):
```python
+from thinc.api import PyTorchWrapper
+
+wrapped_pt_model = PyTorchWrapper(torch_model)
+```
+
+Let's use PyTorch to define a very simple neural network consisting of two
+hidden `Linear` layers with `ReLU` activation and dropout, and a
+softmax-activated output layer:
+
+```python
+### PyTorch model
from torch import nn
torch_model = nn.Sequential(
@@ -246,15 +256,6 @@ torch_model = nn.Sequential(
)
```
-This PyTorch model can be wrapped as a Thinc `Model` by using Thinc's
-`PyTorchWrapper`:
-
-```python
-from thinc.api import PyTorchWrapper
-
-wrapped_pt_model = PyTorchWrapper(torch_model)
-```
-
The resulting wrapped `Model` can be used as a **custom architecture** as such,
or can be a **subcomponent of a larger model**. For instance, we can use Thinc's
[`chain`](https://thinc.ai/docs/api-layers#chain) combinator, which works like
@@ -273,21 +274,26 @@ model = chain(char_embed, with_array(wrapped_pt_model))
In the above example, we have combined our custom PyTorch model with a character
embedding layer defined by spaCy.
[CharacterEmbed](/api/architectures#CharacterEmbed) returns a `Model` that takes
-a `List[Doc]` as input, and outputs a `List[Floats2d]`. To make sure that the
-wrapped PyTorch model receives valid inputs, we use Thinc's
+a ~~List[Doc]~~ as input, and outputs a ~~List[Floats2d]~~. To make sure that
+the wrapped PyTorch model receives valid inputs, we use Thinc's
[`with_array`](https://thinc.ai/docs/api-layers#with_array) helper.
-As another example, you could have a model where you use PyTorch just for the
-transformer layers, and use "native" Thinc layers to do fiddly input and output
-transformations and add on task-specific "heads", as efficiency is less of a
-consideration for those parts of the network.
+You could also implement a model that only uses PyTorch for the transformer
+layers, and "native" Thinc layers to do fiddly input and output transformations
+and add on task-specific "heads", as efficiency is less of a consideration for
+those parts of the network.
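+
+A rough sketch of this pattern, with purely illustrative dimensions, layer
+choices and task head:
+
+```python
+### PyTorch transformer with Thinc heads
+from thinc.api import PyTorchWrapper, chain, Relu, Softmax
+from torch import nn
+
+width = 64  # illustrative embedding width
+torch_encoder = nn.TransformerEncoder(
+    nn.TransformerEncoderLayer(d_model=width, nhead=8), num_layers=2
+)
+# PyTorch handles the transformer, Thinc the task-specific head
+model = chain(PyTorchWrapper(torch_encoder), Relu(width), Softmax())
+```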
-## Custom models for trainable components {#custom-models}
+### Using wrapped models {#frameworks-usage}
To use our custom model including the PyTorch subnetwork, all we need to do is
-register the architecture. The full example then becomes:
+register the architecture using the
+[`architectures` registry](/api/top-level#registry). This will assign the
+architecture a name so spaCy knows how to find it, and allows passing in
+arguments like hyperparameters via the [config](/usage/training#config). The
+full example then becomes:
```python
+### Registering the architecture {highlight="9"}
from typing import List
from thinc.types import Floats2d
from thinc.api import Model, PyTorchWrapper, chain, with_array
@@ -297,7 +303,7 @@ from spacy.ml import CharacterEmbed
from torch import nn
@spacy.registry.architectures("CustomTorchModel.v1")
-def TorchModel(
+def create_torch_model(
nO: int,
width: int,
hidden_width: int,
@@ -321,8 +327,10 @@ def TorchModel(
return model
```
-Now you can use this model definition in any existing trainable spaCy component,
-by specifying it in the config file:
+The model definition can now be used in any existing trainable spaCy component,
+by specifying it in the config file. In this configuration, all required
+parameters for the various subcomponents of the custom architecture are passed
+in as settings via the config.
```ini
### config.cfg (excerpt) {highlight="5-5"}
@@ -340,106 +348,124 @@ nC = 8
dropout = 0.2
```
-In this configuration, we pass all required parameters for the various
-subcomponents of the custom architecture as settings in the training config
-file. Remember that it is best not to rely on any (hidden) default values, to
-ensure that training configs are complete and experiments fully reproducible.
+<Infobox variant="warning">
+
-## Thinc implemention details {#thinc}
+Remember that it is best not to rely on any (hidden) default values, to ensure
+that training configs are complete and experiments fully reproducible.
-Ofcourse it's also possible to define the `Model` from the previous section
+
+</Infobox>
+
+## Custom models with Thinc {#thinc}
+
+Of course it's also possible to define the `Model` from the previous section
entirely in Thinc. The Thinc documentation provides details on the
[various layers](https://thinc.ai/docs/api-layers) and helper functions
-available.
-
-The combinators often used in Thinc can be used to
-[overload operators](https://thinc.ai/docs/usage-models#operators). A common
-usage is to bind `chain` to `>>`. The "native" Thinc version of our simple
-neural network would then become:
+available. Combinators can also be used to
+[overload operators](https://thinc.ai/docs/usage-models#operators), and a common
+usage pattern is to bind `chain` to `>>`. The "native" Thinc version of our
+simple neural network would then become:
```python
from thinc.api import chain, with_array, Model, Relu, Dropout, Softmax
from spacy.ml import CharacterEmbed
char_embed = CharacterEmbed(width, embed_size, nM, nC)
-
with Model.define_operators({">>": chain}):
layers = (
- Relu(hidden_width, width)
- >> Dropout(dropout)
- >> Relu(hidden_width, hidden_width)
- >> Dropout(dropout)
- >> Softmax(nO, hidden_width)
+ Relu(hidden_width, width)
+ >> Dropout(dropout)
+ >> Relu(hidden_width, hidden_width)
+ >> Dropout(dropout)
+ >> Softmax(nO, hidden_width)
)
model = char_embed >> with_array(layers)
```
-**⚠️ Note that Thinc layers define the output dimension (`nO`) as the first
-argument, followed (optionally) by the input dimension (`nI`). This is in
-contrast to how the PyTorch layers are defined, where `in_features` precedes
-`out_features`.**
+<Infobox variant="warning">
+
-### Shape inference in thinc {#shape-inference}
+Note that Thinc layers define the output dimension (`nO`) as the first argument,
+followed (optionally) by the input dimension (`nI`). This is in contrast to how
+the PyTorch layers are defined, where `in_features` precedes `out_features`.
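+
+For example, comparing a Thinc
+[`Linear`](https://thinc.ai/docs/api-layers#linear) layer to its PyTorch
+counterpart, with both mapping 64 inputs to 128 outputs:
+
+```python
+from thinc.api import Linear
+from torch import nn
+
+thinc_linear = Linear(nO=128, nI=64)  # output dimension first
+torch_linear = nn.Linear(64, 128)     # in_features first, out_features second
+```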
-It is not strictly necessary to define all the input and output dimensions for
-each layer, as Thinc can perform
+
+</Infobox>
+
+### Shape inference in Thinc {#thinc-shape-inference}
+
+It is **not** strictly necessary to define all the input and output dimensions
+for each layer, as Thinc can perform
[shape inference](https://thinc.ai/docs/usage-models#validation) between
sequential layers by matching up the output dimensionality of one layer to the
input dimensionality of the next. This means that we can simplify the `layers`
definition:
+> #### Diff
+>
+> ```diff
+> layers = (
+> Relu(hidden_width, width)
+> >> Dropout(dropout)
+> - >> Relu(hidden_width, hidden_width)
+> + >> Relu(hidden_width)
+> >> Dropout(dropout)
+> - >> Softmax(nO, hidden_width)
+> + >> Softmax(nO)
+> )
+> ```
+
```python
with Model.define_operators({">>": chain}):
layers = (
- Relu(hidden_width, width)
- >> Dropout(dropout)
- >> Relu(hidden_width)
- >> Dropout(dropout)
- >> Softmax(nO)
+ Relu(hidden_width, width)
+ >> Dropout(dropout)
+ >> Relu(hidden_width)
+ >> Dropout(dropout)
+ >> Softmax(nO)
)
```
-Thinc can go one step further and deduce the correct input dimension of the
-first layer, and output dimension of the last. To enable this functionality, you
-have to call [`model.initialize`](https://thinc.ai/docs/api-model#initialize)
-with an input sample `X` and an output sample `Y` with the correct dimensions.
+Thinc can even go one step further and **deduce the correct input dimension** of
+the first layer, and the output dimension of the last. To enable this
+functionality, you have to call
+[`Model.initialize`](https://thinc.ai/docs/api-model#initialize) with an **input
+sample** `X` and an **output sample** `Y` with the correct dimensions:
```python
+### Shape inference with initialization {highlight="3,7,10"}
with Model.define_operators({">>": chain}):
layers = (
- Relu(hidden_width)
- >> Dropout(dropout)
- >> Relu(hidden_width)
- >> Dropout(dropout)
- >> Softmax()
+ Relu(hidden_width)
+ >> Dropout(dropout)
+ >> Relu(hidden_width)
+ >> Dropout(dropout)
+ >> Softmax()
)
model = char_embed >> with_array(layers)
model.initialize(X=input_sample, Y=output_sample)
```
The built-in [pipeline components](/usage/processing-pipelines) in spaCy ensure
-that their internal models are always initialized with appropriate sample data.
-In this case, `X` is typically a `List` of `Doc` objects, while `Y` is a `List`
-of 1D or 2D arrays, depending on the specific task. This functionality is
-triggered when [`nlp.begin_training`](/api/language#begin_training) is called.
+that their internal models are **always initialized** with appropriate sample
+data. In this case, `X` is typically a ~~List[Doc]~~, while `Y` is typically a
+~~List[Array1d]~~ or ~~List[Array2d]~~, depending on the specific task. This
+functionality is triggered when
+[`nlp.begin_training`](/api/language#begin_training) is called.
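+
+For instance, a sample for a token-level prediction task could be constructed
+along these lines (the text, shapes and label count are illustrative):
+
+```python
+### Initializing with sample data
+import numpy
+from spacy.lang.en import English
+
+nlp = English()
+doc_sample = [nlp.make_doc("This is a sample text")]
+# one row per token, one column per label (3 labels here, for illustration)
+label_sample = [numpy.zeros((len(doc_sample[0]), 3), dtype="float32")]
+model.initialize(X=doc_sample, Y=label_sample)
+```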
-### Dropout and normalization {#drop-norm}
+### Dropout and normalization in Thinc {#thinc-dropout-norm}
-Many of the `Thinc` layers allow you to define a `dropout` argument that will
-result in "chaining" an additional
+Many of the available Thinc [layers](https://thinc.ai/docs/api-layers) allow you
+to define a `dropout` argument that will result in "chaining" an additional
[`Dropout`](https://thinc.ai/docs/api-layers#dropout) layer. Optionally, you can
often specify whether or not you want to add layer normalization, which would
result in an additional
-[`LayerNorm`](https://thinc.ai/docs/api-layers#layernorm) layer.
-
-That means that the following `layers` definition is equivalent to the previous:
+[`LayerNorm`](https://thinc.ai/docs/api-layers#layernorm) layer. That means that
+the following `layers` definition is equivalent to the previous:
```python
with Model.define_operators({">>": chain}):
layers = (
- Relu(hidden_width, dropout=dropout, normalize=False)
- >> Relu(hidden_width, dropout=dropout, normalize=False)
- >> Softmax()
+ Relu(hidden_width, dropout=dropout, normalize=False)
+ >> Relu(hidden_width, dropout=dropout, normalize=False)
+ >> Softmax()
)
model = char_embed >> with_array(layers)
model.initialize(X=input_sample, Y=output_sample)
diff --git a/website/meta/type-annotations.json b/website/meta/type-annotations.json
index b1d94403d..79d4d357d 100644
--- a/website/meta/type-annotations.json
+++ b/website/meta/type-annotations.json
@@ -34,6 +34,8 @@
"Floats2d": "https://thinc.ai/docs/api-types#types",
"Floats3d": "https://thinc.ai/docs/api-types#types",
"FloatsXd": "https://thinc.ai/docs/api-types#types",
+ "Array1d": "https://thinc.ai/docs/api-types#types",
+ "Array2d": "https://thinc.ai/docs/api-types#types",
"Ops": "https://thinc.ai/docs/api-backends#ops",
"cymem.Pool": "https://github.com/explosion/cymem",
"preshed.BloomFilter": "https://github.com/explosion/preshed",