spaCy/website/docs/usage/layers-architectures.md

---
title: Layers and Model Architectures
teaser: Power spaCy components with custom neural networks
menu:
  - ['Type Signatures', 'type-sigs']
  - ['Swapping Architectures', 'swap-architectures']
  - ['PyTorch & TensorFlow', 'frameworks']
  - ['Custom Thinc Models', 'thinc']
  - ['Trainable Components', 'components']
next: /usage/projects
---

> #### Example
>
> ```python
> import spacy
> from thinc.api import Model, chain
>
> @spacy.registry.architectures.register("model.v1")
> def build_model(width: int, classes: int) -> Model:
>     tok2vec = build_tok2vec(width)
>     output_layer = build_output_layer(width, classes)
>     model = chain(tok2vec, output_layer)
>     return model
> ```

A **model architecture** is a function that wires up a
[Thinc `Model`](https://thinc.ai/docs/api-model) instance. It describes the
neural network that is run internally as part of a component in a spaCy
pipeline. To define the actual architecture, you can implement your logic in
Thinc directly, or you can use Thinc as a thin wrapper around frameworks such as
PyTorch, TensorFlow and MXNet. Each `Model` can also be used as a sublayer of a
larger network, allowing you to freely combine implementations from different
frameworks into a single model.

spaCy's built-in components require a `Model` instance to be passed to them via
the config system. To change the model architecture of an existing component,
you just need to [**update the config**](#swap-architectures) so that it refers
to a different registered function. Once the component has been created from
this config, you won't be able to change it anymore. The architecture is like a
recipe for the network, and you can't change the recipe once the dish has
already been prepared. You have to make a new one.

```ini
### config.cfg (excerpt)
[components.tagger]
factory = "tagger"

[components.tagger.model]
@architectures = "model.v1"
width = 512
classes = 16
```
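To make the link between a registered function and a config block more concrete, here is a minimal sketch that uses only Thinc. It assumes Thinc's generic `layers` registry as a stand-in for spaCy's `architectures` registry, and the name `demo_model.v1` and the function `build_demo_model` are hypothetical, chosen for illustration only:

```python
from thinc.api import Config, Model, Relu, Softmax, chain, registry


# Hypothetical architecture function, registered in Thinc's "layers" registry
# (spaCy's "architectures" registry is used the same way in a pipeline config).
@registry.layers("demo_model.v1")
def build_demo_model(width: int, classes: int) -> Model:
    return chain(Relu(nO=width), Softmax(nO=classes))


CONFIG = """
[model]
@layers = "demo_model.v1"
width = 32
classes = 4
"""

# Resolving the config looks up the registered function and calls it with the
# remaining settings of the block as keyword arguments.
resolved = registry.resolve(Config().from_str(CONFIG))
model = resolved["model"]
```

Changing `@layers` to another registered name is all it takes to build a different network from the same config block, which mirrors how architectures are swapped in a spaCy config.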

## Type signatures {#type-sigs}

> #### Example
>
> ```python
> from typing import List
> from thinc.api import Model, chain
> from thinc.types import Floats2d
> from spacy.tokens import Doc
>
> def chain_model(
>     tok2vec: Model[List[Doc], List[Floats2d]],
>     layer1: Model[List[Floats2d], Floats2d],
>     layer2: Model[Floats2d, Floats2d]
> ) -> Model[List[Doc], Floats2d]:
>     model = chain(tok2vec, layer1, layer2)
>     return model
> ```

The Thinc `Model` class is a **generic type** that can specify its input and
output types. Python uses a square-bracket notation for this, so the type
~~Model[List, Dict]~~ says that each batch of inputs to the model will be a
list, and the outputs will be a dictionary. You can be even more specific and
write for instance ~~Model[List[Doc], Dict[str, float]]~~ to specify that the
model expects a list of [`Doc`](/api/doc) objects as input, and returns a
dictionary mapping strings to floats. Some of the most common types you'll
see are:

| Type               | Description                                                                                          |
| ------------------ | ---------------------------------------------------------------------------------------------------- |
| ~~List[Doc]~~      | A batch of [`Doc`](/api/doc) objects. Most components expect their models to take this as input.     |
| ~~Floats2d~~       | A two-dimensional `numpy` or `cupy` array of floats. Usually 32-bit.                                 |
| ~~Ints2d~~         | A two-dimensional `numpy` or `cupy` array of integers. Common dtypes include uint64, int32 and int8. |
| ~~List[Floats2d]~~ | A list of two-dimensional arrays, generally with one array per `Doc` and one row per token.          |
| ~~Ragged~~         | A container to handle variable-length sequence data in an unpadded contiguous array.                 |
| ~~Padded~~         | A container to handle variable-length sequence data in a padded contiguous array.                    |

The model type signatures help you figure out which model architectures and
components can **fit together**. For instance, the
[`TextCategorizer`](/api/textcategorizer) class expects a model typed
~~Model[List[Doc], Floats2d]~~, because the model will predict one row of
category probabilities per [`Doc`](/api/doc). In contrast, the
[`Tagger`](/api/tagger) class expects a model typed ~~Model[List[Doc],
List[Floats2d]]~~, because it needs to predict one row of probabilities per
token.
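The difference between the two output types can be illustrated with plain arrays. The sketch below is not taken from spaCy itself: it assumes that the per-token vectors are already available as a ~~List[Floats2d]~~ and uses Thinc's `list2ragged` and `reduce_mean` layers to pool them into one row per document, which is the shape a text classifier would score:

```python
import numpy
from thinc.api import chain, list2ragged, reduce_mean

# Two "documents" with 3 and 5 tokens, each token a 4-dimensional vector.
# This is the List[Floats2d] shape a tagger-style model works with.
token_vectors = [
    numpy.ones((3, 4), dtype="f"),
    numpy.zeros((5, 4), dtype="f"),
]

# Pool the token rows of each document into a single row, turning
# List[Floats2d] into Floats2d with one row per document.
pool = chain(list2ragged(), reduce_mean())
doc_vectors = pool.predict(token_vectors)

print([array.shape for array in token_vectors])
print(doc_vectors.shape)
```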

There's no guarantee that two models with the same type signature can be used
interchangeably. There are many other ways they could be incompatible. However,
if the types don't match, they almost surely _won't_ be compatible. This little
bit of validation goes a long way, especially if you
[configure your editor](https://thinc.ai/docs/usage-type-checking) or other
tools to highlight these errors early. The config file is also validated at the
beginning of training, to verify that all the types match correctly.

<Accordion title="Tip: Static type checking in your editor">

If you're using a modern editor like Visual Studio Code, you can
[set up `mypy`](https://thinc.ai/docs/usage-type-checking#install) with the
custom Thinc plugin and get live feedback about mismatched types as you write
code.

[![](../images/thinc_mypy.jpg)](https://thinc.ai/docs/usage-type-checking#linting)

</Accordion>

## Swapping model architectures {#swap-architectures}

If no model is specified for the [`TextCategorizer`](/api/textcategorizer), the
[TextCatEnsemble](/api/architectures#TextCatEnsemble) architecture is used by
default. This architecture combines a simple bag-of-words model with a neural
network, usually resulting in the most accurate results, but at the cost of
speed. The config file for this model would look something like this:

```ini
### config.cfg (excerpt)
[components.textcat]
factory = "textcat"
labels = []

[components.textcat.model]
@architectures = "spacy.TextCatEnsemble.v1"
exclusive_classes = false
pretrained_vectors = null
width = 64
conv_depth = 2
embed_size = 2000
window_size = 1
ngram_size = 1
dropout = 0
nO = null
```

spaCy has two additional built-in `textcat` architectures, and you can easily
use those by swapping out the definition of the textcat's model. For instance,
to use the simple and fast bag-of-words model
[TextCatBOW](/api/architectures#TextCatBOW), you can change the config to:

```ini
### config.cfg (excerpt) {highlight="6-10"}
[components.textcat]
factory = "textcat"
labels = []

[components.textcat.model]
@architectures = "spacy.TextCatBOW.v1"
exclusive_classes = false
ngram_size = 1
no_output_layer = false
nO = null
```

For details on all pre-defined architectures shipped with spaCy and how to
configure them, check out the [model architectures](/api/architectures)
documentation.

### Defining sublayers {#sublayers}

Model architecture functions often accept **sublayers as arguments**, so that
you can try **substituting a different layer** into the network. Depending on
how the architecture function is structured, you might be able to define your
network structure entirely through the [config system](/usage/training#config),
using layers that have already been defined. 

In most neural network models for NLP, the most important parts of the network
are what we refer to as the
[embed and encode](https://explosion.ai/blog/deep-learning-formula-nlp) steps.
These steps together compute dense, context-sensitive representations of the
tokens, and their combination forms a typical
[`Tok2Vec`](/api/architectures#Tok2Vec) layer:

```ini
### config.cfg (excerpt)
[components.tok2vec]
factory = "tok2vec"

[components.tok2vec.model]
@architectures = "spacy.Tok2Vec.v1"

[components.tok2vec.model.embed]
@architectures = "spacy.MultiHashEmbed.v1"
# ...

[components.tok2vec.model.encode]
@architectures = "spacy.MaxoutWindowEncoder.v1"
# ...
```

By defining these sublayers specifically, it becomes straightforward to swap out
a sublayer for another one, for instance changing the first sublayer to a
character embedding with the [CharacterEmbed](/api/architectures#CharacterEmbed)
architecture:

```ini
### config.cfg (excerpt)
[components.tok2vec.model.embed]
@architectures = "spacy.CharacterEmbed.v1"
# ...

[components.tok2vec.model.encode]
@architectures = "spacy.MaxoutWindowEncoder.v1"
# ...
```

Most of spaCy's default architectures accept a `tok2vec` layer as a sublayer
within the larger task-specific neural network. This makes it easy to **switch
between** transformer, CNN, BiLSTM or other feature extraction approaches. The
[transformers documentation](/usage/embeddings-transformers#training-custom-model)
section shows an example of swapping out a model's standard `tok2vec` layer with
a transformer. And if you want to define your own solution, all you need to do
is register a ~~Model[List[Doc], List[Floats2d]]~~ architecture function, and
you'll be able to try it out in any of the spaCy components. 

## Wrapping PyTorch, TensorFlow and other frameworks {#frameworks}

Thinc allows you to [wrap models](https://thinc.ai/docs/usage-frameworks)
written in other machine learning frameworks like PyTorch, TensorFlow and MXNet
using a unified [`Model`](https://thinc.ai/docs/api-model) API. This makes it
easy to use a model implemented in a different framework to power a component in
your spaCy pipeline. For example, to wrap a PyTorch model as a Thinc `Model`,
you can use Thinc's
[`PyTorchWrapper`](https://thinc.ai/docs/api-layers#pytorchwrapper):

```python
from thinc.api import PyTorchWrapper

wrapped_pt_model = PyTorchWrapper(torch_model)
```

Let's use PyTorch to define a very simple neural network consisting of two
`Linear` layers, each followed by `ReLU` activation and dropout, and a final
softmax activation:

```python
### PyTorch model
from torch import nn

torch_model = nn.Sequential(
    nn.Linear(width, hidden_width),
    nn.ReLU(),
    nn.Dropout2d(dropout),
    nn.Linear(hidden_width, nO),
    nn.ReLU(),
    nn.Dropout2d(dropout),
    nn.Softmax(dim=1)
)
```

The resulting wrapped `Model` can be used as a **custom architecture** as such,
or can be a **subcomponent of a larger model**. For instance, we can use Thinc's
[`chain`](https://thinc.ai/docs/api-layers#chain) combinator, which works like
`Sequential` in PyTorch, to combine the wrapped model with other components in a
larger network. This effectively means that you can easily wrap different
components from different frameworks, and "glue" them together with Thinc:

```python
from thinc.api import chain, with_array, PyTorchWrapper
from spacy.ml import CharacterEmbed

wrapped_pt_model = PyTorchWrapper(torch_model)
char_embed = CharacterEmbed(width, embed_size, nM, nC)
model = chain(char_embed, with_array(wrapped_pt_model))
```

In the above example, we have combined our custom PyTorch model with a character
embedding layer defined by spaCy.
[CharacterEmbed](/api/architectures#CharacterEmbed) returns a `Model` that takes
a ~~List[Doc]~~ as input, and outputs a ~~List[Floats2d]~~. To make sure that
the wrapped PyTorch model receives valid inputs, we use Thinc's
[`with_array`](https://thinc.ai/docs/api-layers#with_array) helper.
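As a rough illustration of what `with_array` does, the following sketch wraps a native Thinc `Relu` layer instead of the PyTorch model, so that it runs without PyTorch installed. The array sizes are arbitrary assumptions:

```python
import numpy
from thinc.api import Relu, with_array

# A layer that operates on a single 2d array: Model[Floats2d, Floats2d].
array_layer = Relu(nO=4, nI=3)

# with_array lets the same layer accept a list of arrays. The arrays are
# concatenated, passed through the layer, and split up again afterwards.
list_layer = with_array(array_layer)

batch = [numpy.zeros((2, 3), dtype="f"), numpy.ones((5, 3), dtype="f")]
list_layer.initialize(X=batch)
outputs = list_layer.predict(batch)

print([array.shape for array in outputs])
```

The output is again a list with one array per input, so the number of rows per document is preserved while the width changes from 3 to 4.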

You could also implement a model that only uses PyTorch for the transformer
layers, and "native" Thinc layers to do fiddly input and output transformations
and add on task-specific "heads", as efficiency is less of a consideration for
those parts of the network.

### Using wrapped models {#frameworks-usage}

To use our custom model including the PyTorch subnetwork, all we need to do is
register the architecture using the
[`architectures` registry](/api/top-level#registry). This will assign the
architecture a name so spaCy knows how to find it, and allows passing in
arguments like hyperparameters via the [config](/usage/training#config). The
full example then becomes:

```python
### Registering the architecture {highlight="9"}
from typing import List
from thinc.types import Floats2d
from thinc.api import Model, PyTorchWrapper, chain, with_array
import spacy
from spacy.tokens.doc import Doc
from spacy.ml import CharacterEmbed
from torch import nn

@spacy.registry.architectures("CustomTorchModel.v1")
def create_torch_model(
    nO: int,
    width: int,
    hidden_width: int,
    embed_size: int,
    nM: int,
    nC: int,
    dropout: float,
) -> Model[List[Doc], List[Floats2d]]:
    char_embed = CharacterEmbed(width, embed_size, nM, nC)
    torch_model = nn.Sequential(
        nn.Linear(width, hidden_width),
        nn.ReLU(),
        nn.Dropout2d(dropout),
        nn.Linear(hidden_width, nO),
        nn.ReLU(),
        nn.Dropout2d(dropout),
        nn.Softmax(dim=1)
    )
    wrapped_pt_model = PyTorchWrapper(torch_model)
    model = chain(char_embed, with_array(wrapped_pt_model))
    return model
```

The model definition can now be used in any existing trainable spaCy component,
by specifying it in the config file. In this configuration, all required
parameters for the various subcomponents of the custom architecture are passed
in as settings via the config.

```ini
### config.cfg (excerpt) {highlight="5-5"}
[components.tagger]
factory = "tagger"

[components.tagger.model]
@architectures = "CustomTorchModel.v1"
nO = 50
width = 96
hidden_width = 48
embed_size = 2000
nM = 64
nC = 8
dropout = 0.2
```

<Infobox variant="warning">

Remember that it is best not to rely on any (hidden) default values, to ensure
that training configs are complete and experiments fully reproducible.

</Infobox>

Note that when using a PyTorch or TensorFlow model, it is recommended to set the
GPU memory allocator accordingly. When `gpu_allocator` is set to "pytorch" or
"tensorflow" in the training config, cupy will allocate memory via those
respective libraries, preventing out-of-memory errors when there's available
memory sitting in the other library's pool.

```ini
### config.cfg (excerpt)
[training]
gpu_allocator = "pytorch"
```

## Custom models with Thinc {#thinc}

Of course it's also possible to define the `Model` from the previous section
entirely in Thinc. The Thinc documentation provides details on the
[various layers](https://thinc.ai/docs/api-layers) and helper functions
available. Combinators can be used to
[overload operators](https://thinc.ai/docs/usage-models#operators) and a common
usage pattern is to bind `chain` to `>>`. The "native" Thinc version of our
simple neural network would then become:

```python
from thinc.api import chain, with_array, Model, Relu, Dropout, Softmax
from spacy.ml import CharacterEmbed

char_embed = CharacterEmbed(width, embed_size, nM, nC)
with Model.define_operators({">>": chain}):
    layers = (
        Relu(hidden_width, width)
        >> Dropout(dropout)
        >> Relu(hidden_width, hidden_width)
        >> Dropout(dropout)
        >> Softmax(nO, hidden_width)
    )
    model = char_embed >> with_array(layers)
```

<Infobox variant="warning" title="Important note on inputs and outputs">

Note that Thinc layers define the output dimension (`nO`) as the first argument,
followed (optionally) by the input dimension (`nI`). This is in contrast to how
the PyTorch layers are defined, where `in_features` precedes `out_features`.

</Infobox>
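A quick way to check the argument order is to inspect the weights of a small layer. This is a minimal sketch with arbitrary sizes, using Thinc's `Linear` layer:

```python
import numpy
from thinc.api import Linear

# Thinc: output width first, input width second.
# The PyTorch counterpart would be nn.Linear(5, 2), i.e. input width first.
linear = Linear(nO=2, nI=5)
linear.initialize()

weights = linear.get_param("W")
outputs = linear.predict(numpy.zeros((3, 5), dtype="f"))

print(weights.shape)  # (nO, nI)
print(outputs.shape)  # (batch size, nO)
```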

### Shape inference in Thinc {#thinc-shape-inference}

It is **not** strictly necessary to define all the input and output dimensions
for each layer, as Thinc can perform
[shape inference](https://thinc.ai/docs/usage-models#validation) between
sequential layers by matching up the output dimensionality of one layer to the
input dimensionality of the next. This means that we can simplify the `layers`
definition:

> #### Diff
>
> ```diff
> layers = (
>     Relu(hidden_width, width)
>     >> Dropout(dropout)
> -   >> Relu(hidden_width, hidden_width)
> +   >> Relu(hidden_width)
>     >> Dropout(dropout)
> -   >> Softmax(nO, hidden_width)
> +   >> Softmax(nO)
> )
> ```

```python
with Model.define_operators({">>": chain}):
    layers = (
        Relu(hidden_width, width)
        >> Dropout(dropout)
        >> Relu(hidden_width)
        >> Dropout(dropout)
        >> Softmax(nO)
    )
```

Thinc can even go one step further and **deduce the correct input dimension** of
the first layer, and output dimension of the last. To enable this functionality,
you have to call
[`Model.initialize`](https://thinc.ai/docs/api-model#initialize) with an **input
sample** `X` and an **output sample** `Y` with the correct dimensions:

```python
### Shape inference with initialization {highlight="3,7,10"}
with Model.define_operators({">>": chain}):
    layers = (
        Relu(hidden_width)
        >> Dropout(dropout)
        >> Relu(hidden_width)
        >> Dropout(dropout)
        >> Softmax()
    )
    model = char_embed >> with_array(layers)
    model.initialize(X=input_sample, Y=output_sample)
```

The built-in [pipeline components](/usage/processing-pipelines) in spaCy ensure
that their internal models are **always initialized** with appropriate sample
data. In this case, `X` is typically a ~~List[Doc]~~, while `Y` is typically a
~~List[Array1d]~~ or ~~List[Array2d]~~, depending on the specific task. This
functionality is triggered when [`nlp.initialize`](/api/language#initialize) is
called.
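Outside of a spaCy pipeline, the same mechanism can be tried out with plain arrays. The sketch below leaves out the embedding layer and assumes made-up sample sizes (5 input features, 3 output classes) to show which dimensions get filled in by `initialize`:

```python
import numpy
from thinc.api import Relu, Softmax, chain

# Only the hidden width is specified. The input width of the first layer
# and the output width of the last layer are left open.
layers = chain(Relu(nO=8), Relu(nO=8), Softmax())

# Sample data: 4 rows with 5 input features and 3 output classes.
X = numpy.zeros((4, 5), dtype="f")
Y = numpy.zeros((4, 3), dtype="f")
layers.initialize(X=X, Y=Y)

first, second, last = layers.layers
print(first.get_dim("nI"))   # inferred from X
print(second.get_dim("nI"))  # inferred from the output of the first layer
print(last.get_dim("nO"))    # inferred from Y
```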

### Dropout and normalization in Thinc {#thinc-dropout-norm}

Many of the available Thinc [layers](https://thinc.ai/docs/api-layers) allow you
to define a `dropout` argument that will result in "chaining" an additional
[`Dropout`](https://thinc.ai/docs/api-layers#dropout) layer. Optionally, you can
often specify whether or not you want to add layer normalization, which would
result in an additional
[`LayerNorm`](https://thinc.ai/docs/api-layers#layernorm) layer. That means that
the following `layers` definition is equivalent to the previous one:

```python
with Model.define_operators({">>": chain}):
    layers = (
        Relu(hidden_width, dropout=dropout, normalize=False)
        >> Relu(hidden_width, dropout=dropout, normalize=False)
        >> Softmax()
    )
    model = char_embed >> with_array(layers)
    model.initialize(X=input_sample, Y=output_sample)
```
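To see the extra layers that these arguments add, you can walk the resulting model and list the names of its nodes. This is a small sketch with arbitrary values for the width and dropout rate:

```python
from thinc.api import Relu

# Without extra arguments, Relu returns a single layer.
plain = Relu(nO=8)

# With dropout and normalization, the result is a small chain of layers.
combined = Relu(nO=8, dropout=0.2, normalize=True)

plain_names = [node.name for node in plain.walk()]
combined_names = [node.name for node in combined.walk()]

print(plain_names)
print(combined_names)
```

The second list contains the combined model itself as well as its sublayers, including the additional dropout layer.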

## Create new trainable components {#components}

In addition to [swapping out](#swap-architectures) default models in built-in
components, you can also implement an entirely new,
[trainable pipeline component](/usage/processing-pipelines#trainable-components)
from scratch. This can be done by creating a new class inheriting from
[`Pipe`](/api/pipe), and linking it up to your custom model implementation.

### Example: Pipeline component for relation extraction {#component-rel}

This section outlines an example use-case of implementing a novel relation
extraction component from scratch. We assume we want to implement a binary
relation extraction method that determines whether two entities in a document
are related or not, and if so, with what type of relation. We'll allow multiple
types of relations between two such entities, i.e. it is a multi-label setting.

There are two major steps required: first, we need to
[implement a machine learning model](#component-rel-model) specific to this
task, and then we'll use this model to
[implement a custom pipeline component](#component-rel-pipe).

#### Step 1: Implementing the Model {#component-rel-model}

We'll need to implement a [`Model`](https://thinc.ai/docs/api-model) that takes
a list of documents as input, and outputs a two-dimensional matrix of scores:

```python
@registry.architectures.register("rel_model.v1")
def create_relation_model(...) -> Model[List[Doc], Floats2d]:
    model = _create_my_model()
    return model
```

The first layer in this model will typically be an
[embedding layer](/usage/embeddings-transformers) such as a
[`Tok2Vec`](/api/tok2vec) component or [`Transformer`](/api/transformer). This
layer is assumed to be of type ~~Model[List[Doc], List[Floats2d]]~~ as it
transforms each document into a list of tokens, with each token being
represented by its embedding in the vector space.

Next, we need a method that will generate pairs of entities that we want to
classify as being related or not. These candidate pairs are typically formed
within one document, which means we'll have a function that takes a `Doc` as
input and outputs a `List` of `Span` tuples. For instance, a very
straightforward implementation would be to just take any two entities from the
same document:

```python
from typing import List, Tuple
from spacy.tokens import Doc, Span

def get_candidates(doc: Doc) -> List[Tuple[Span, Span]]:
    candidates = []
    for ent1 in doc.ents:
        for ent2 in doc.ents:
            candidates.append((ent1, ent2))
    return candidates
```

> ```ini
> [model]
> @architectures = "rel_model.v1"
>
> [model.tok2vec]
> ...
>
> [model.get_candidates]
> @misc = "rel_cand_generator.v2"
> max_length = 6
> ```

But we could also refine this further by excluding relations of an entity with
itself, and imposing a maximum distance (in number of tokens) between two
entities. We'll register this function in the
[`@misc` registry](/api/top-level#registry) so we can refer to it from the
config, and easily swap it out for any other candidate generation function.

```python
### {highlight="1,2,7,8"}
@registry.misc.register("rel_cand_generator.v2")
def create_candidate_indices(max_length: int) -> Callable[[Doc], List[Tuple[Span, Span]]]:
    def get_candidates(doc: "Doc") -> List[Tuple[Span, Span]]:
        candidates = []
        for ent1 in doc.ents:
            for ent2 in doc.ents:
                if ent1 != ent2:
                    if max_length and abs(ent2.start - ent1.start) <= max_length:
                        candidates.append((ent1, ent2))
        return candidates
    return get_candidates
```
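To see the effect of the distance filter without loading a pipeline, the following sketch runs the same loop on small stand-in objects. `FakeSpan` and `FakeDoc` are hypothetical helpers that only mimic the `start` and `ents` attributes used here, and the registry decorator is left out:

```python
from dataclasses import dataclass
from typing import Callable, List, Tuple


@dataclass(frozen=True)
class FakeSpan:
    """Hypothetical stand-in for a Span: only the token offsets."""
    start: int
    end: int


@dataclass
class FakeDoc:
    """Hypothetical stand-in for a Doc: only the list of entities."""
    ents: List[FakeSpan]


def create_candidate_generator(
    max_length: int,
) -> Callable[[FakeDoc], List[Tuple[FakeSpan, FakeSpan]]]:
    def get_candidates(doc: FakeDoc) -> List[Tuple[FakeSpan, FakeSpan]]:
        candidates = []
        for ent1 in doc.ents:
            for ent2 in doc.ents:
                # Skip pairs of an entity with itself and pairs whose start
                # tokens are more than max_length tokens apart.
                if ent1 != ent2:
                    if max_length and abs(ent2.start - ent1.start) <= max_length:
                        candidates.append((ent1, ent2))
        return candidates

    return get_candidates


# Three entities starting at tokens 0, 3 and 10.
doc = FakeDoc(ents=[FakeSpan(0, 2), FakeSpan(3, 4), FakeSpan(10, 12)])
get_candidates = create_candidate_generator(max_length=6)
pairs = [(ent1.start, ent2.start) for ent1, ent2 in get_candidates(doc)]
print(pairs)  # [(0, 3), (3, 0)]
```

Only the first two entities are close enough to form a candidate pair, and the pair is produced in both orders, so a model can score each direction separately.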

Finally, we'll require a method that transforms the candidate pairs of entities
into a 2D tensor using the specified `tok2vec` layer, and this `Floats2d`
object will then be processed by a final `output_layer` of the network. Taking
all this together, we can define our relation model like this in the config:

```ini
[model]
@architectures = "rel_model.v1"
...

[model.tok2vec]
...

[model.get_candidates]
@misc = "rel_cand_generator.v2"
max_length = 6

[model.create_candidate_tensor]
@misc = "rel_cand_tensor.v1"

[model.output_layer]
@architectures = "rel_output_layer.v1"
...
```

<!-- TODO: Link to project for implementation details -->

When creating this model, we'll store the custom functions as
[attributes](https://thinc.ai/docs/api-model#properties) and the sublayers as
references, so we can access them easily:

```python
tok2vec_layer = model.get_ref("tok2vec")
output_layer = model.get_ref("output_layer")
create_candidate_tensor = model.attrs["create_candidate_tensor"]
get_candidates = model.attrs["get_candidates"]
```
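How such references and attributes end up on the model is not shown above. One possible way, sketched here under the assumption that simple Thinc layers stand in for the real `tok2vec` and output layers and that `get_candidates` is a placeholder function, is to pass them to the `Model` constructor:

```python
from thinc.api import Model, Relu, Softmax


def get_candidates(doc):
    # Placeholder for a real candidate generation function.
    return []


def forward(model: Model, X, is_train: bool):
    # Illustrative forward pass that only chains the two referenced
    # sublayers. A real relation model would also build the candidate
    # tensor in between.
    hidden, backprop_hidden = model.get_ref("tok2vec")(X, is_train)
    scores, backprop_scores = model.get_ref("output_layer")(hidden, is_train)

    def backprop(d_scores):
        return backprop_hidden(backprop_scores(d_scores))

    return scores, backprop


# Stand-ins for the real sublayers, with arbitrary widths.
tok2vec = Relu(nO=8, nI=4)
output_layer = Softmax(nO=2, nI=8)

model = Model(
    "relations",
    forward,
    layers=[tok2vec, output_layer],
    refs={"tok2vec": tok2vec, "output_layer": output_layer},
    attrs={"get_candidates": get_candidates},
)

# The sublayers and functions can now be looked up by name.
assert model.get_ref("tok2vec") is tok2vec
assert model.attrs["get_candidates"] is get_candidates
```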

#### Step 2: Implementing the pipeline component {#component-rel-pipe}

To use our new relation extraction model as part of a custom component, we 
create a subclass of [`Pipe`](/api/pipe) that will hold the model:

```python
from spacy.pipeline import Pipe
from spacy.language import Language

class RelationExtractor(Pipe):
    def __init__(self, vocab, model, name="rel", labels=[]):
        ...

    def predict(self, docs):
        ...

    def set_annotations(self, docs, scores):
        ...

@Language.factory("relation_extractor")
def make_relation_extractor(nlp, name, model, labels):
    return RelationExtractor(nlp.vocab, model, name, labels=labels)
```

The [`predict`](/api/pipe#predict) function needs to be implemented for each
subclass. In our case, we can simply delegate to the internal model's
[`predict`](https://thinc.ai/docs/api-model#predict) function:

```python
def predict(self, docs: Iterable[Doc]) -> Floats2d:
    scores = self.model.predict(docs)
    return self.model.ops.asarray(scores)
```


<Infobox title="This section is still under construction" emoji="🚧" variant="warning">
</Infobox>

<!-- TODO: write trainable component section
- Interaction with `predict`, `get_loss` and `set_annotations`
- Initialization life-cycle with `initialize`, correlation with add_label
Example: relation extraction component (implemented as project template)
Avoid duplication with usage/processing-pipelines#trainable-components ?
-->

<!-- ![Diagram of a pipeline component with its model](../images/layers-architectures.svg)

```python
def update(self, examples):
    docs = [ex.predicted for ex in examples]
    refs = [ex.reference for ex in examples]
    predictions, backprop = self.model.begin_update(docs)
    gradient = self.get_loss(predictions, refs)
    backprop(gradient)

def __call__(self, doc):
    predictions = self.model([doc])
    self.set_annotations(predictions)
```
-->
-												Update docs [ci skip]

											
										
										
											2020-08-21 17:11:38 +03:00
+								---
 								title: Layers and Model Architectures
 								teaser: Power spaCy components with custom neural networks
 								menu:
 								  - ['Type Signatures', 'type-sigs']
-												Update layers/arch docs structure [ci skip]

											
										
										
											2020-09-02 14:04:35 +03:00
+								  - ['Swapping Architectures', 'swap-architectures']
-												Update docs [ci skip]

											
										
										
											2020-08-21 17:11:38 +03:00
+								  - ['PyTorch & TensorFlow', 'frameworks']
-												Update docs and formatting

											
										
										
											2020-09-09 22:26:10 +03:00
+								  - ['Custom Thinc Models', 'thinc']
-												Update docs [ci skip]

											
										
										
											2020-08-21 17:11:38 +03:00
+								  - ['Trainable Components', 'components']
-												Update docs [ci skip]

											
										
										
											2020-08-21 17:21:55 +03:00
+								next: /usage/projects
-												Update docs [ci skip]

											
										
										
											2020-08-21 17:11:38 +03:00
+								---
-												rewrite intro, simpel Model example

											
										
										
											2020-09-02 14:41:18 +03:00
+								> #### Example
 								>
-												swapping section

											
										
										
											2020-09-02 16:26:07 +03:00
+								> ```python
-												rewrite intro, simpel Model example

											
										
										
											2020-09-02 14:41:18 +03:00
+								> from thinc.api import Model, chain
-												swapping section

											
										
										
											2020-09-02 16:26:07 +03:00
+								>
-												update examples

											
										
										
											2020-09-02 15:15:50 +03:00
+								> @spacy.registry.architectures.register("model.v1")
-												rewrite intro, simpel Model example

											
										
										
											2020-09-02 14:41:18 +03:00
+								> def build_model(width: int, classes: int) -> Model:
 								>     tok2vec = build_tok2vec(width)
 								>     output_layer = build_output_layer(width, classes)
 								>     model = chain(tok2vec, output_layer)
 								>     return model
-												swapping section

											
										
										
											2020-09-02 16:26:07 +03:00
+								> ```
-												rewrite intro, simpel Model example

											
										
										
											2020-09-02 14:41:18 +03:00
 								A **model architecture** is a function that wires up a
 								[Thinc `Model`](https://thinc.ai/docs/api-model) instance. It describes the
-												swapping section

											
										
										
											2020-09-02 16:26:07 +03:00
+								neural network that is run internally as part of a component in a spaCy
 								pipeline. To define the actual architecture, you can implement your logic in
 								Thinc directly, or you can use Thinc as a thin wrapper around frameworks such as
-												Update docs [ci skip]

											
										
										
											2020-09-12 18:05:10 +03:00
+								PyTorch, TensorFlow and MXNet. Each `Model` can also be used as a sublayer of a
-												swapping section

											
										
										
											2020-09-02 16:26:07 +03:00
+								larger network, allowing you to freely combine implementations from different
-												Update docs [ci skip]

											
										
										
											2020-09-12 18:05:10 +03:00
+								frameworks into a single model.
-												rewrite intro, simpel Model example

											
										
										
											2020-09-02 14:41:18 +03:00
 								spaCy's built-in components require a `Model` instance to be passed to them via
 								the config system. To change the model architecture of an existing component,
-												swapping section

											
										
										
											2020-09-02 16:26:07 +03:00
+								you just need to [**update the config**](#swap-architectures) so that it refers
 								to a different registered function. Once the component has been created from
 								this config, you won't be able to change it anymore. The architecture is like a
 								recipe for the network, and you can't change the recipe once the dish has
 								already been prepared. You have to make a new one.
-												Update docs [ci skip]

											
										
										
											2020-08-21 17:11:38 +03:00
-												update examples

											
										
										
											2020-09-02 15:15:50 +03:00
+								```ini
 								### config.cfg (excerpt)
 								[components.tagger]
 								factory = "tagger"
 								[components.tagger.model]
 								@architectures = "model.v1"
 								width = 512
 								classes = 16
 								```
-												Update docs [ci skip]

											
										
										
											2020-08-21 17:11:38 +03:00
+								## Type signatures {#type-sigs}
-												Update docs [ci skip]

											
										
										
											2020-08-21 20:34:06 +03:00
+								> #### Example
 								>
 								> ```python
-												update examples

											
										
										
											2020-09-02 15:15:50 +03:00
+								> from typing import List
 								> from thinc.api import Model, chain
 								> from thinc.types import Floats2d
 								> def chain_model(
-												swapping section

											
										
										
											2020-09-02 16:26:07 +03:00
+								>     tok2vec: Model[List[Doc], List[Floats2d]],
 								>     layer1: Model[List[Floats2d], Floats2d],
-												update examples

											
										
										
											2020-09-02 15:15:50 +03:00
+								>     layer2: Model[Floats2d, Floats2d]
 								> ) -> Model[List[Doc], Floats2d]:
 								>     model = chain(tok2vec, layer1, layer2)
-												Update docs [ci skip]

											
										
										
											2020-08-21 20:34:06 +03:00
+								>     return model
 								> ```
-												small rewrites in types paragraph

											
										
										
											2020-09-02 15:25:18 +03:00
+								The Thinc `Model` class is a **generic type** that can specify its input and
-												Update docs [ci skip]

											
										
										
											2020-08-21 17:11:38 +03:00
+								output types. Python uses a square-bracket notation for this, so the type
 								~~Model[List, Dict]~~ says that each batch of inputs to the model will be a
-												swapping section

											
										
										
											2020-09-02 16:26:07 +03:00
+								list, and the outputs will be a dictionary. You can be even more specific and
 								write for instance~~Model[List[Doc], Dict[str, float]]~~ to specify that the
 								model expects a list of [`Doc`](/api/doc) objects as input, and returns a
 								dictionary mapping of strings to floats. Some of the most common types you'll
 								see are:
-												Update docs [ci skip]

											
										
										
| Type               | Description                                                                                          |
| ------------------ | ---------------------------------------------------------------------------------------------------- |
| ~~List[Doc]~~      | A batch of [`Doc`](/api/doc) objects. Most components expect their models to take this as input.     |
| ~~Floats2d~~       | A two-dimensional `numpy` or `cupy` array of floats. Usually 32-bit.                                 |
| ~~Ints2d~~         | A two-dimensional `numpy` or `cupy` array of integers. Common dtypes include uint64, int32 and int8. |
| ~~List[Floats2d]~~ | A list of two-dimensional arrays, generally with one array per `Doc` and one row per token.          |
| ~~Ragged~~         | A container to handle variable-length sequence data in an unpadded contiguous array.                 |
| ~~Padded~~         | A container to handle variable-length sequence data in a padded contiguous array.                    |

											
										
										

											
										
										
The model type signatures help you figure out which model architectures and
components can **fit together**. For instance, the
[`TextCategorizer`](/api/textcategorizer) class expects a model typed
~~Model[List[Doc], Floats2d]~~, because the model will predict one row of
category probabilities per [`Doc`](/api/doc). In contrast, the
[`Tagger`](/api/tagger) class expects a model typed ~~Model[List[Doc],
List[Floats2d]]~~, because it needs to predict one row of probabilities per
token.

There's no guarantee that two models with the same type signature can be used
interchangeably. There are many other ways they could be incompatible. However,
if the types don't match, they almost surely _won't_ be compatible. This little
bit of validation goes a long way, especially if you
[configure your editor](https://thinc.ai/docs/usage-type-checking) or other
tools to highlight these errors early. The config file is also validated at the
beginning of training, to verify that all the types match correctly.

											
										
										
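As a rough illustration of how generic type signatures let a type checker catch
mismatched layers, the sketch below uses only the standard library's
`typing.Generic`. The `ToyModel` class, the `toy_chain` function and the two
example layers are hypothetical stand-ins, not Thinc's `Model` or `chain`.

```python
from typing import Callable, Generic, List, TypeVar

InT = TypeVar("InT")
MidT = TypeVar("MidT")
OutT = TypeVar("OutT")


class ToyModel(Generic[InT, OutT]):
    """Toy stand-in for a typed model: wraps a function from InT to OutT."""

    def __init__(self, name: str, func: Callable[[InT], OutT]) -> None:
        self.name = name
        self.func = func

    def predict(self, X: InT) -> OutT:
        return self.func(X)


def toy_chain(
    first: ToyModel[InT, MidT], second: ToyModel[MidT, OutT]
) -> ToyModel[InT, OutT]:
    """Compose two models; the output type of `first` must match the input type of `second`."""
    return ToyModel(
        f"{first.name}>>{second.name}",
        lambda X: second.predict(first.predict(X)),
    )


# Hypothetical layers: a batch of strings -> token counts -> floats
count_tokens: ToyModel[List[str], List[int]] = ToyModel(
    "count_tokens", lambda texts: [len(text.split()) for text in texts]
)
to_floats: ToyModel[List[int], List[float]] = ToyModel(
    "to_floats", lambda counts: [float(count) for count in counts]
)

model = toy_chain(count_tokens, to_floats)
result = model.predict(["a short text", "another one"])
# A static checker such as mypy would reject toy_chain(to_floats, count_tokens),
# because List[float] does not match the List[str] input of count_tokens.
```

Nothing is enforced at runtime here. As in the text above, matching signatures
are a necessary check rather than a guarantee of compatibility.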

											
										
										
<Accordion title="Tip: Static type checking in your editor">

If you're using a modern editor like Visual Studio Code, you can
[set up `mypy`](https://thinc.ai/docs/usage-type-checking#install) with the
custom Thinc plugin and get live feedback about mismatched types as you write
code.

[![](../images/thinc_mypy.jpg)](https://thinc.ai/docs/usage-type-checking#linting)

</Accordion>

											
										
										

											
										
										
## Swapping model architectures {#swap-architectures}

											
										
										
If no model is specified for the [`TextCategorizer`](/api/textcategorizer), the
[TextCatEnsemble](/api/architectures#TextCatEnsemble) architecture is used by
default. This architecture combines a simple bag-of-words model with a neural
network, which usually gives the most accurate results, but at the cost of
speed. The config file for this model would look something like this:

```ini
### config.cfg (excerpt)
[components.textcat]
factory = "textcat"
labels = []

[components.textcat.model]
@architectures = "spacy.TextCatEnsemble.v1"
exclusive_classes = false
pretrained_vectors = null
width = 64
conv_depth = 2
embed_size = 2000
window_size = 1
ngram_size = 1
dropout = 0
nO = null
```

spaCy has two additional built-in `textcat` architectures, and you can easily
use those by swapping out the definition of the textcat's model. For instance,
to use the simple and fast bag-of-words model
[TextCatBOW](/api/architectures#TextCatBOW), you can change the config to:

											
										
										
```ini
### config.cfg (excerpt) {highlight="6-10"}
[components.textcat]
factory = "textcat"
labels = []

[components.textcat.model]
@architectures = "spacy.TextCatBOW.v1"
exclusive_classes = false
ngram_size = 1
no_output_layer = false
nO = null
```
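To see why changing one config block is enough to swap architectures, it can
help to picture the block as the name of a registered function plus its keyword
arguments. The sketch below mimics that idea with the standard library's
`configparser` and a plain dictionary. It is not spaCy's registry or config
loader, and the names `toy.Ensemble.v1`, `toy.BOW.v1` and `resolve_model` are
made up for illustration.

```python
import configparser
import json
from typing import Any, Callable, Dict

# Toy registry mapping names to architecture functions (hypothetical stand-in)
ARCHITECTURES: Dict[str, Callable[..., Dict[str, Any]]] = {}


def register(name: str) -> Callable[[Callable], Callable]:
    """Decorator that stores a function in the toy registry under `name`."""
    def decorator(func: Callable) -> Callable:
        ARCHITECTURES[name] = func
        return func
    return decorator


@register("toy.Ensemble.v1")
def build_ensemble(width: int, ngram_size: int) -> Dict[str, Any]:
    return {"kind": "ensemble", "width": width, "ngram_size": ngram_size}


@register("toy.BOW.v1")
def build_bow(ngram_size: int) -> Dict[str, Any]:
    return {"kind": "bow", "ngram_size": ngram_size}


def resolve_model(config_text: str, section: str) -> Dict[str, Any]:
    """Look up the function named by @architectures and call it with the other settings."""
    parser = configparser.ConfigParser()
    parser.optionxform = str  # keep option names case-sensitive
    parser.read_string(config_text)
    settings = dict(parser[section])
    func = ARCHITECTURES[json.loads(settings.pop("@architectures"))]
    # Values are JSON-like in this sketch, so 1 becomes an int, null becomes None
    kwargs = {key: json.loads(value) for key, value in settings.items()}
    return func(**kwargs)


CONFIG = """
[components.textcat.model]
@architectures = "toy.BOW.v1"
ngram_size = 1
"""

model = resolve_model(CONFIG, "components.textcat.model")
```

Pointing `@architectures` at `toy.Ensemble.v1` and adding a `width` setting
would build the other toy model without touching any Python code.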

											
										
										
For details on all pre-defined architectures shipped with spaCy and how to
configure them, check out the [model architectures](/api/architectures)
documentation.

											
										
										
### Defining sublayers {#sublayers}

											
										
										

											
										
										
Model architecture functions often accept **sublayers as arguments**, so that
you can try **substituting a different layer** into the network. Depending on
how the architecture function is structured, you might be able to define your
network structure entirely through the [config system](/usage/training#config),
using layers that have already been defined.

											
										
										
In most neural network models for NLP, the most important parts of the network
are what we refer to as the
[embed and encode](https://explosion.ai/blog/deep-learning-formula-nlp) steps.
These steps together compute dense, context-sensitive representations of the
tokens, and their combination forms a typical
[`Tok2Vec`](/api/architectures#Tok2Vec) layer:

```ini
### config.cfg (excerpt)
[components.tok2vec]
factory = "tok2vec"

[components.tok2vec.model]
@architectures = "spacy.Tok2Vec.v1"

[components.tok2vec.model.embed]
@architectures = "spacy.MultiHashEmbed.v1"
# ...

[components.tok2vec.model.encode]
@architectures = "spacy.MaxoutWindowEncoder.v1"
# ...
```

											
										
										

											
										
										
By defining these sublayers explicitly, it becomes straightforward to swap out
a sublayer for another one, for instance changing the first sublayer to a
character embedding with the [CharacterEmbed](/api/architectures#CharacterEmbed)
architecture:

```ini
### config.cfg (excerpt)
[components.tok2vec.model.embed]
@architectures = "spacy.CharacterEmbed.v1"
# ...

[components.tok2vec.model.encode]
@architectures = "spacy.MaxoutWindowEncoder.v1"
# ...
```

Most of spaCy's default architectures accept a `tok2vec` layer as a sublayer
within the larger task-specific neural network. This makes it easy to **switch
between** transformer, CNN, BiLSTM or other feature extraction approaches. The
[transformers documentation](/usage/embeddings-transformers#training-custom-model)
section shows an example of swapping out a model's standard `tok2vec` layer with
a transformer. And if you want to define your own solution, all you need to do
is register a ~~Model[List[Doc], List[Floats2d]]~~ architecture function, and
you'll be able to try it out in any of the spaCy components.

											
										
										

											
										
										
## Wrapping PyTorch, TensorFlow and other frameworks {#frameworks}

											
										
										

											
										
										
Thinc allows you to [wrap models](https://thinc.ai/docs/usage-frameworks)
written in other machine learning frameworks like PyTorch, TensorFlow and MXNet
using a unified [`Model`](https://thinc.ai/docs/api-model) API. This makes it
easy to use a model implemented in a different framework to power a component in
your spaCy pipeline. For example, to wrap a PyTorch model as a Thinc `Model`,
you can use Thinc's
[`PyTorchWrapper`](https://thinc.ai/docs/api-layers#pytorchwrapper):

											
										
										

											
										
										
```python
from thinc.api import PyTorchWrapper

wrapped_pt_model = PyTorchWrapper(torch_model)
```

Let's use PyTorch to define a very simple neural network consisting of two
hidden `Linear` layers with `ReLU` activation and dropout, and a
softmax-activated output layer:

											
										
										
```python
### PyTorch model
from torch import nn

torch_model = nn.Sequential(
    nn.Linear(width, hidden_width),
    nn.ReLU(),
    nn.Dropout2d(dropout),
    nn.Linear(hidden_width, nO),
    nn.ReLU(),
    nn.Dropout2d(dropout),
    nn.Softmax(dim=1)
)
```

											
										
										
The resulting wrapped `Model` can be used as a **custom architecture** on its
own, or as a **subcomponent of a larger model**. For instance, we can use Thinc's
[`chain`](https://thinc.ai/docs/api-layers#chain) combinator, which works like
`Sequential` in PyTorch, to combine the wrapped model with other components in a
larger network. This effectively means that you can easily wrap different
components from different frameworks, and "glue" them together with Thinc:

```python
from thinc.api import chain, with_array, PyTorchWrapper
from spacy.ml import CharacterEmbed

wrapped_pt_model = PyTorchWrapper(torch_model)
char_embed = CharacterEmbed(width, embed_size, nM, nC)
model = chain(char_embed, with_array(wrapped_pt_model))
```

											
										
										
In the above example, we have combined our custom PyTorch model with a character
embedding layer defined by spaCy.
[CharacterEmbed](/api/architectures#CharacterEmbed) returns a `Model` that takes
a ~~List[Doc]~~ as input, and outputs a ~~List[Floats2d]~~. To make sure that
the wrapped PyTorch model receives valid inputs, we use Thinc's
[`with_array`](https://thinc.ai/docs/api-layers#with_array) helper.

											
										
										
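To make the role of `with_array` more concrete, the following sketch shows the
general idea in plain Python: a list of per-document arrays is concatenated
into one array, passed through a layer that only understands a single array,
and split back into per-document pieces. This is a simplified illustration
using nested lists, not Thinc's implementation; `toy_with_array` and
`double_rows` are hypothetical names.

```python
from typing import Callable, List

Array2d = List[List[float]]  # toy stand-in for a Floats2d array


def toy_with_array(
    layer: Callable[[Array2d], Array2d]
) -> Callable[[List[Array2d]], List[Array2d]]:
    """Wrap a layer that works on one 2d array so it accepts a list of 2d arrays."""

    def forward(Xs: List[Array2d]) -> List[Array2d]:
        lengths = [len(X) for X in Xs]
        # Concatenate all rows into a single contiguous "array"
        flat = [row for X in Xs for row in X]
        flat_out = layer(flat)
        # Split the output back into one array per input document
        outputs: List[Array2d] = []
        start = 0
        for length in lengths:
            outputs.append(flat_out[start : start + length])
            start += length
        return outputs

    return forward


def double_rows(X: Array2d) -> Array2d:
    """Hypothetical layer: doubles every value, one output row per input row."""
    return [[2 * value for value in row] for row in X]


# Two "docs": the first has two token rows, the second has one
docs = [[[1.0, 2.0], [3.0, 4.0]], [[5.0, 6.0]]]
result = toy_with_array(double_rows)(docs)
```

The wrapped layer never sees document boundaries, which is why a plain
array-to-array network can sit behind a layer that outputs one array per `Doc`.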

											
										
										
You could also implement a model that only uses PyTorch for the transformer
layers, and "native" Thinc layers to do fiddly input and output transformations
and add on task-specific "heads", as efficiency is less of a consideration for
those parts of the network.

											
										
										

											
										
										
### Using wrapped models {#frameworks-usage}

											
										
										

											
										
										
To use our custom model including the PyTorch subnetwork, all we need to do is
register the architecture using the
[`architectures` registry](/api/top-level#registry). This will assign the
architecture a name so spaCy knows how to find it, and allows passing in
arguments like hyperparameters via the [config](/usage/training#config). The
full example then becomes:

											
										
										
```python
### Registering the architecture {highlight="9"}
from typing import List
from thinc.types import Floats2d
from thinc.api import Model, PyTorchWrapper, chain, with_array
import spacy
from spacy.tokens.doc import Doc
from spacy.ml import CharacterEmbed
from torch import nn

@spacy.registry.architectures("CustomTorchModel.v1")
def create_torch_model(
    nO: int,
    width: int,
    hidden_width: int,
    embed_size: int,
    nM: int,
    nC: int,
    dropout: float,
) -> Model[List[Doc], List[Floats2d]]:
    char_embed = CharacterEmbed(width, embed_size, nM, nC)
    torch_model = nn.Sequential(
        nn.Linear(width, hidden_width),
        nn.ReLU(),
        nn.Dropout2d(dropout),
        nn.Linear(hidden_width, nO),
        nn.ReLU(),
        nn.Dropout2d(dropout),
        nn.Softmax(dim=1)
    )
    wrapped_pt_model = PyTorchWrapper(torch_model)
    model = chain(char_embed, with_array(wrapped_pt_model))
    return model
```

											
										
										
The model definition can now be used in any existing trainable spaCy component,
by specifying it in the config file. In this configuration, all required
parameters for the various subcomponents of the custom architecture are passed
in as settings via the config.

											
										
										
```ini
### config.cfg (excerpt) {highlight="5-5"}
[components.tagger]
factory = "tagger"

[components.tagger.model]
@architectures = "CustomTorchModel.v1"
nO = 50
width = 96
hidden_width = 48
embed_size = 2000
nM = 64
nC = 8
dropout = 0.2
```

											
										
										
<Infobox variant="warning">

Remember that it is best not to rely on any (hidden) default values, to ensure
that training configs are complete and experiments fully reproducible.

</Infobox>

											
										
										

											
										
										
Note that when using a PyTorch or TensorFlow model, it is recommended to set the
GPU memory allocator accordingly. When `gpu_allocator` is set to "pytorch" or
"tensorflow" in the training config, `cupy` will allocate memory via those
respective libraries, preventing out-of-memory errors when there's available
memory sitting in the other library's pool.

```ini
### config.cfg (excerpt)
[training]
gpu_allocator = "pytorch"
```

											
										
										
## Custom models with Thinc {#thinc}

											
										
										

											
										
										
Of course it's also possible to define the `Model` from the previous section
entirely in Thinc. The Thinc documentation provides details on the
[various layers](https://thinc.ai/docs/api-layers) and helper functions
available. Combinators can be used to
[overload operators](https://thinc.ai/docs/usage-models#operators), and a common
usage pattern is to bind `chain` to `>>`. The "native" Thinc version of our
simple neural network would then become:

											
										
										
```python
from thinc.api import chain, with_array, Model, Relu, Dropout, Softmax
from spacy.ml import CharacterEmbed

char_embed = CharacterEmbed(width, embed_size, nM, nC)
with Model.define_operators({">>": chain}):
    layers = (
        Relu(hidden_width, width)
        >> Dropout(dropout)
        >> Relu(hidden_width, hidden_width)
        >> Dropout(dropout)
        >> Softmax(nO, hidden_width)
    )
    model = char_embed >> with_array(layers)
```

											
										
										
<Infobox variant="warning" title="Important note on inputs and outputs">

Note that Thinc layers define the output dimension (`nO`) as the first argument,
followed (optionally) by the input dimension (`nI`). This is in contrast to how
the PyTorch layers are defined, where `in_features` precedes `out_features`.

</Infobox>

											
										
										
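The idea of binding a combinator such as `chain` to the `>>` operator for the
duration of a `with` block can be mimicked in a few lines of plain Python. The
sketch below is only an illustration of the pattern under that assumption. The
`OpModel` class and `toy_chain` function are hypothetical and do not reflect how
Thinc implements `Model.define_operators` internally.

```python
from contextlib import contextmanager
from typing import Callable, Dict, Iterator


class OpModel:
    """Toy stand-in for a model whose operators can be bound temporarily."""

    _operators: Dict[str, Callable[["OpModel", "OpModel"], "OpModel"]] = {}

    def __init__(self, name: str, func: Callable[[float], float]) -> None:
        self.name = name
        self.func = func

    def __rshift__(self, other: "OpModel") -> "OpModel":
        # Delegate `a >> b` to whatever function is currently bound to ">>"
        if ">>" not in OpModel._operators:
            raise TypeError("No function bound to '>>'")
        return OpModel._operators[">>"](self, other)

    @classmethod
    @contextmanager
    def define_operators(cls, operators: Dict[str, Callable]) -> Iterator[None]:
        # Bind the operators inside the block and restore the old state afterwards
        previous = dict(cls._operators)
        cls._operators.update(operators)
        try:
            yield
        finally:
            cls._operators = previous


def toy_chain(first: OpModel, second: OpModel) -> OpModel:
    """Compose two toy models so that `first` runs before `second`."""
    return OpModel(
        f"{first.name}>>{second.name}",
        lambda x: second.func(first.func(x)),
    )


add_one = OpModel("add_one", lambda x: x + 1)
double = OpModel("double", lambda x: x * 2)

with OpModel.define_operators({">>": toy_chain}):
    model = add_one >> double

result = model.func(3)  # add_one runs first, then double
```

Outside the `with` block, `add_one >> double` raises a `TypeError` in this
sketch, which keeps the operator binding local to the code that defines the
network.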

											
										
										
### Shape inference in Thinc {#thinc-shape-inference}

It is **not** strictly necessary to define all the input and output dimensions
for each layer, as Thinc can perform
[shape inference](https://thinc.ai/docs/usage-models#validation) between
sequential layers by matching up the output dimensionality of one layer to the
input dimensionality of the next. This means that we can simplify the `layers`
definition:

											
										
										

											
										
										
> #### Diff
>
> ```diff
> layers = (
>     Relu(hidden_width, width)
>     >> Dropout(dropout)
> -   >> Relu(hidden_width, hidden_width)
> +   >> Relu(hidden_width)
>     >> Dropout(dropout)
> -   >> Softmax(nO, hidden_width)
> +   >> Softmax(nO)
> )
> ```

											
										
										
```python
with Model.define_operators({">>": chain}):
    layers = (
        Relu(hidden_width, width)
        >> Dropout(dropout)
        >> Relu(hidden_width)
        >> Dropout(dropout)
        >> Softmax(nO)
    )
```

											
										
										
Thinc can even go one step further and **deduce the correct input dimension** of
the first layer, and output dimension of the last. To enable this functionality,
you have to call
[`Model.initialize`](https://thinc.ai/docs/api-model#initialize) with an **input
sample** `X` and an **output sample** `Y` with the correct dimensions:

											
										
										
```python
### Shape inference with initialization {highlight="3,7,10"}
with Model.define_operators({">>": chain}):
    layers = (
        Relu(hidden_width)
        >> Dropout(dropout)
        >> Relu(hidden_width)
        >> Dropout(dropout)
        >> Softmax()
    )
    model = char_embed >> with_array(layers)
    model.initialize(X=input_sample, Y=output_sample)
```

											
										
										
The built-in [pipeline components](/usage/processing-pipelines) in spaCy ensure
that their internal models are **always initialized** with appropriate sample
data. In this case, `X` is typically a ~~List[Doc]~~, while `Y` is typically a
~~List[Array1d]~~ or ~~List[Array2d]~~, depending on the specific task. This
functionality is triggered when [`nlp.initialize`](/api/language#initialize) is
called.

											
										
										
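The inference rules described in this section can be summarized in a small,
self-contained sketch: missing input sizes are taken from the previous layer's
output size, the first layer's input size comes from the sample `X`, and the
last layer's output size comes from the sample `Y`. The `ToyLayer` class and
`initialize` function below are hypothetical and only model the bookkeeping,
not Thinc's actual initialization logic.

```python
from typing import List, Optional, Sequence


class ToyLayer:
    """Toy stand-in for a layer with optional output (nO) and input (nI) sizes."""

    def __init__(self, nO: Optional[int] = None, nI: Optional[int] = None) -> None:
        self.nO = nO
        self.nI = nI


def initialize(
    layers: List[ToyLayer],
    X: Sequence[Sequence[float]],
    Y: Sequence[Sequence[float]],
) -> None:
    """Fill in missing dimensions from sample data and from neighbouring layers."""
    if layers[0].nI is None:
        layers[0].nI = len(X[0])  # input width comes from the input sample
    if layers[-1].nO is None:
        layers[-1].nO = len(Y[0])  # output width comes from the output sample
    for previous, current in zip(layers, layers[1:]):
        if previous.nO is None:
            raise ValueError("Hidden layers need an explicit output size")
        if current.nI is None:
            current.nI = previous.nO  # match input to the previous output
        elif current.nI != previous.nO:
            raise ValueError("Mismatched dimensions between layers")


# Mirrors the structure above: two hidden layers of width 48, then an output layer
layers = [ToyLayer(nO=48), ToyLayer(nO=48), ToyLayer()]
input_sample = [[0.0] * 96 for _ in range(2)]   # 2 rows, width 96
output_sample = [[0.0] * 50 for _ in range(2)]  # 2 rows, 50 classes
initialize(layers, input_sample, output_sample)
shapes = [(layer.nO, layer.nI) for layer in layers]
```

After initialization, every layer has both sizes set, even though only the
hidden widths were given explicitly.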

											
										
										
### Dropout and normalization in Thinc {#thinc-dropout-norm}

											
										
										

											
										
										
Many of the available Thinc [layers](https://thinc.ai/docs/api-layers) allow you
to define a `dropout` argument that will result in "chaining" an additional
[`Dropout`](https://thinc.ai/docs/api-layers#dropout) layer. Optionally, you can
often specify whether or not you want to add layer normalization, which would
result in an additional
[`LayerNorm`](https://thinc.ai/docs/api-layers#layernorm) layer. That means that
the following `layers` definition is equivalent to the previous one:

											
										
										
											2020-09-09 14:57:05 +03:00
 								```python
 								with Model.define_operators({">>": chain}):
 								    layers = (
-												Update docs and formatting

											
										
										
											2020-09-09 22:26:10 +03:00
+								        Relu(hidden_width, dropout=dropout, normalize=False)
 								        >> Relu(hidden_width, dropout=dropout, normalize=False)
 								        >> Softmax()
-												details on Thinc shape inference

											
										
										
											2020-09-09 14:57:05 +03:00
+								    )
 								    model = char_embed >> with_array(layers)
 								    model.initialize(X=input_sample, Y=output_sample)
 								```
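
To make the pattern concrete without any framework, here is a minimal sketch of
a layer factory whose `dropout` and `normalize` arguments only append extra
functions to the base layer. All names (`compose`, `relu_layer`,
`make_dropout`, `layer_norm`) are hypothetical, and the arithmetic is a
simplified stand-in operating on plain lists, not Thinc's implementation.

```python
import random
from typing import Callable, List, Optional

Vector = List[float]
Layer = Callable[[Vector], Vector]


def compose(*layers: Layer) -> Layer:
    """Toy counterpart of `chain`: feed each layer's output into the next."""
    def forward(x: Vector) -> Vector:
        for layer in layers:
            x = layer(x)
        return x
    return forward


def relu(x: Vector) -> Vector:
    return [max(0.0, v) for v in x]


def layer_norm(x: Vector, eps: float = 1e-8) -> Vector:
    # Shift to zero mean and rescale to (approximately) unit variance.
    mean = sum(x) / len(x)
    var = sum((v - mean) ** 2 for v in x) / len(x)
    return [(v - mean) / (var + eps) ** 0.5 for v in x]


def make_dropout(rate: float, seed: int = 0) -> Layer:
    rng = random.Random(seed)

    def dropout(x: Vector) -> Vector:
        # Zero each value with probability `rate` and rescale the survivors.
        return [0.0 if rng.random() < rate else v / (1.0 - rate) for v in x]
    return dropout


def relu_layer(dropout: Optional[float] = None, normalize: bool = False) -> Layer:
    """Hypothetical factory: the keyword arguments only append extra layers."""
    layers: List[Layer] = [relu]
    if normalize:
        layers.append(layer_norm)
    if dropout is not None:
        layers.append(make_dropout(dropout))
    return compose(*layers)


x = [-1.0, 2.0, 3.0]
shorthand = relu_layer(dropout=0.5)          # dropout requested via argument
explicit = compose(relu, make_dropout(0.5))  # dropout composed by hand
```

Because both variants build the same sequence of functions with the same seed,
they produce identical outputs for the same input, which is the sense in which
the shorthand and the explicit definition are equivalent.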

## Create new trainable components {#components}

In addition to [swapping out](#swap-architectures) default models in built-in
components, you can also implement an entirely new,
[trainable pipeline component](/usage/processing-pipelines#trainable-components)
from scratch. This can be done by creating a new class inheriting from
[`Pipe`](/api/pipe), and linking it up to your custom model implementation.

### Example: Pipeline component for relation extraction {#component-rel}

This section outlines an example use-case of implementing a novel relation
extraction component from scratch. We assume we want to implement a binary
relation extraction method that determines whether two entities in a document
are related or not, and if so, with what type of relation. We'll allow multiple
types of relations between two such entities, which makes this a multi-label
setting.

There are two major steps required: first, we need to
[implement a machine learning model](#component-rel-model) specific to this
task, and then we'll use this model to
[implement a custom pipeline component](#component-rel-pipe).

#### Step 1: Implementing the Model {#component-rel-model}

We'll need to implement a [`Model`](https://thinc.ai/docs/api-model) that takes
a list of documents as input, and outputs a two-dimensional matrix of scores:

```python
@registry.architectures.register("rel_model.v1")
def create_relation_model(...) -> Model[List[Doc], Floats2d]:
    model = _create_my_model()
    return model
```

The first layer in this model will typically be an
[embedding layer](/usage/embeddings-transformers) such as a
[`Tok2Vec`](/api/tok2vec) component or [`Transformer`](/api/transformer). This
layer is assumed to be of type `Model[List["Doc"], List[Floats2d]]` as it
transforms each document into a list of tokens, with each token being
represented by its embedding in the vector space.

Next, we need a method that will generate pairs of entities that we want to
classify as being related or not. These candidate pairs are typically formed
within one document, which means we'll have a function that takes a `Doc` as
input and outputs a `List` of `Span` tuples. For instance, a very
straightforward implementation would be to just take any two entities from the
same document:

```python
def get_candidates(doc: "Doc") -> List[Tuple[Span, Span]]:
    candidates = []
    for ent1 in doc.ents:
        for ent2 in doc.ents:
            candidates.append((ent1, ent2))
    return candidates
```

> ```ini
> [model]
> @architectures = "rel_model.v1"
>
> [model.tok2vec]
> ...
>
> [model.get_candidates]
> @misc = "rel_cand_generator.v2"
> max_length = 6
> ```

But we could also refine this further by excluding relations of an entity with
itself, and imposing a maximum distance (in number of tokens) between two
entities. We'll register this function in the
[`@misc` registry](/api/top-level#registry) so we can refer to it from the
config, and easily swap it out for any other candidate generation function.

```python
### {highlight="1,2,7,8"}
@registry.misc.register("rel_cand_generator.v2")
def create_candidate_indices(max_length: int) -> Callable[[Doc], List[Tuple[Span, Span]]]:
    def get_candidates(doc: "Doc") -> List[Tuple[Span, Span]]:
        candidates = []
        for ent1 in doc.ents:
            for ent2 in doc.ents:
                if ent1 != ent2:
                    if max_length and abs(ent2.start - ent1.start) <= max_length:
                        candidates.append((ent1, ent2))
        return candidates
    return get_candidates
```
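
To see what the distance filter does in practice, the following sketch runs the
same filtering logic on a hypothetical stand-in for `Span`. `FakeSpan` and
`create_candidate_generator` are illustrative names, the function takes a list
of entities rather than a `Doc`, and the registry decorator is left out so the
snippet runs without spaCy installed.

```python
from typing import Callable, List, NamedTuple, Tuple


class FakeSpan(NamedTuple):
    """Hypothetical stand-in for a spaCy Span: only text and start are modelled."""
    text: str
    start: int


Pairs = List[Tuple[FakeSpan, FakeSpan]]


def create_candidate_generator(max_length: int) -> Callable[[List[FakeSpan]], Pairs]:
    def get_candidates(ents: List[FakeSpan]) -> Pairs:
        candidates = []
        for ent1 in ents:
            for ent2 in ents:
                # Same conditions as above: skip self-pairs, enforce the distance.
                if ent1 != ent2:
                    if max_length and abs(ent2.start - ent1.start) <= max_length:
                        candidates.append((ent1, ent2))
        return candidates
    return get_candidates


ents = [FakeSpan("Alice", 0), FakeSpan("Acme", 4), FakeSpan("Berlin", 12)]
pairs = create_candidate_generator(max_length=6)(ents)
pair_texts = [(a.text, b.text) for a, b in pairs]
# pair_texts == [("Alice", "Acme"), ("Acme", "Alice")]
```

Two things are worth noticing. Each unordered pair appears twice, once per
direction, which suits directed relations. And because the condition starts
with `max_length and ...`, a `max_length` of `0` yields no candidates at all
rather than disabling the distance check.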

Finally, we'll require a method that transforms the candidate pairs of entities
into a 2D tensor using the specified Tok2Vec function. This `Floats2d` object
will then be processed by a final `output_layer` of the network. Taking all
this together, we can define our relation model like this in the config:

```ini
[model]
@architectures = "rel_model.v1"
...

[model.tok2vec]
...

[model.get_candidates]
@misc = "rel_cand_generator.v2"
max_length = 6

[model.create_candidate_tensor]
@misc = "rel_cand_tensor.v1"

[model.output_layer]
@architectures = "rel_output_layer.v1"
...
```
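
The implementation behind `rel_cand_tensor.v1` is not shown here. As a rough
sketch of what such a function could do, the example below assumes that each
entity is represented by the mean of its token vectors and that the two entity
vectors of a pair are concatenated into one row. The pooling strategy, the
`(start, end)` offset tuples and the use of plain lists instead of `Floats2d`
are all assumptions made for illustration.

```python
from typing import List, Sequence, Tuple

Vector = List[float]
SpanOffsets = Tuple[int, int]  # (start, end) token indices, end exclusive


def mean_pool(token_vectors: Sequence[Vector], span: SpanOffsets) -> Vector:
    """Average the token vectors covered by a span."""
    start, end = span
    rows = token_vectors[start:end]
    return [sum(col) / len(rows) for col in zip(*rows)]


def create_candidate_tensor(
    token_vectors: Sequence[Vector],
    candidates: Sequence[Tuple[SpanOffsets, SpanOffsets]],
) -> List[Vector]:
    """Build one row per pair: pooled first entity followed by pooled second."""
    return [
        mean_pool(token_vectors, ent1) + mean_pool(token_vectors, ent2)
        for ent1, ent2 in candidates
    ]


# Four tokens with 2-dimensional embeddings for a single document.
token_vectors = [[1.0, 0.0], [3.0, 2.0], [0.0, 4.0], [2.0, 2.0]]
# One candidate pair: the entity over tokens 0-1 and the entity at token 2.
candidates = [((0, 2), (2, 3))]
tensor = create_candidate_tensor(token_vectors, candidates)
# tensor == [[2.0, 1.0, 0.0, 4.0]]
```

Under these assumptions the result has one row per candidate pair and twice the
embedding width in columns, which is the input shape the output layer would
need to accept.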

<!-- TODO: Link to project for implementation details -->

When creating this model, we'll store the custom functions as
[attributes](https://thinc.ai/docs/api-model#properties) and the sublayers as
references, so we can access them easily:

```python
tok2vec_layer = model.get_ref("tok2vec")
output_layer = model.get_ref("output_layer")
create_candidate_tensor = model.attrs["create_candidate_tensor"]
get_candidates = model.attrs["get_candidates"]
```

#### Step 2: Implementing the pipeline component {#component-rel-pipe}

To use our new relation extraction model as part of a custom component, we
create a subclass of [`Pipe`](/api/pipe) that will hold the model:

```python
from spacy.pipeline import Pipe
from spacy.language import Language

class RelationExtractor(Pipe):
    def __init__(self, vocab, model, name="rel", labels=[]):
        ...

    def predict(self, docs):
        ...

    def set_annotations(self, docs, scores):
        ...

@Language.factory("relation_extractor")
def make_relation_extractor(nlp, name, model, labels):
    return RelationExtractor(nlp.vocab, model, name, labels=labels)
```

The [`predict`](/api/pipe#predict) function needs to be implemented for each
subclass. In our case, we can simply delegate to the internal model's
[`predict`](https://thinc.ai/docs/api-model#predict) function:

```python
def predict(self, docs: Iterable[Doc]) -> Floats2d:
    scores = self.model.predict(docs)
    return self.model.ops.asarray(scores)
```
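
How the scores are turned into annotations is not covered yet. As a hedged
sketch of the multi-label decision that `set_annotations` might make, the
example below assumes one row of scores per candidate pair, one column per
label, and a fixed threshold of `0.5`. The helper `scores_to_relations`, the
label names and the use of plain strings for entities are hypothetical.

```python
from typing import Dict, List, Sequence, Tuple

Pair = Tuple[str, str]


def scores_to_relations(
    pairs: Sequence[Pair],
    scores: Sequence[Sequence[float]],
    labels: Sequence[str],
    threshold: float = 0.5,
) -> Dict[Pair, List[str]]:
    """Keep, for every candidate pair, each label whose score reaches the threshold."""
    relations: Dict[Pair, List[str]] = {}
    for pair, row in zip(pairs, scores):
        relations[pair] = [
            label for label, score in zip(labels, row) if score >= threshold
        ]
    return relations


labels = ["WORKS_FOR", "LOCATED_IN"]
pairs = [("Alice", "Acme"), ("Acme", "Berlin")]
scores = [[0.9, 0.1], [0.7, 0.8]]
relations = scores_to_relations(pairs, scores, labels)
```

Since every label is thresholded independently, a pair can end up with several
relation types or with none, which is what distinguishes the multi-label
setting from picking the single best-scoring label.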

<Infobox title="This section is still under construction" emoji="🚧" variant="warning">
</Infobox>

<!-- TODO: write trainable component section
- Interaction with `predict`, `get_loss` and `set_annotations`
- Initialization life-cycle with `initialize`, correlation with add_label
Example: relation extraction component (implemented as project template)
Avoid duplication with usage/processing-pipelines#trainable-components ?
-->

<!-- ![Diagram of a pipeline component with its model](../images/layers-architectures.svg)

```python
def update(self, examples):
    docs = [ex.predicted for ex in examples]
    refs = [ex.reference for ex in examples]
    predictions, backprop = self.model.begin_update(docs)
    gradient = self.get_loss(predictions, refs)
    backprop(gradient)

def __call__(self, doc):
    predictions = self.model([doc])
    self.set_annotations(predictions)
```
-->