---
title: Transformers
teaser: Using transformer models like BERT in spaCy
menu:
  - ['Installation', 'install']
  - ['Runtime Usage', 'runtime']
  - ['Training Usage', 'training']
next: /usage/training
---

## Installation {#install hidden="true"}

Transformers are a family of neural network architectures that compute **dense,
context-sensitive representations** for the tokens in your documents. Downstream
models in your pipeline can then use these representations as input features to
**improve their predictions**. You can connect multiple components to a single
transformer model, with any or all of those components giving feedback to the
transformer to fine-tune it to your tasks. spaCy's transformer support
interoperates with [PyTorch](https://pytorch.org) and the
[HuggingFace `transformers`](https://huggingface.co/transformers/) library,
giving you access to thousands of pretrained models for your pipelines. There
are many [great guides](http://jalammar.github.io/illustrated-transformer/) to
transformer models, but for practical purposes, you can simply think of them as
drop-in replacements that let you achieve **higher accuracy** in exchange for
**higher training and runtime costs**.

### System requirements

We recommend an NVIDIA GPU with at least 10GB of memory in order to work with
transformer models. The exact requirements will depend on the transformer model
you choose and whether you're training the pipeline or simply running it.
Training a transformer-based model without a GPU will be too slow for most
practical purposes. You'll also need to make sure your GPU drivers are
up-to-date and v9+ of the CUDA runtime is installed.

Once you have CUDA installed, you'll need to install two pip packages,
[`cupy`](https://docs.cupy.dev/en/stable/install.html) and
[`spacy-transformers`](https://github.com/explosion/spacy-transformers). `cupy`
is just like `numpy`, but for GPU. The best way to install it is to choose a
wheel that matches the version of CUDA you're using. You may also need to set
the `CUDA_PATH` environment variable if your CUDA runtime is installed in a
non-standard location. Putting it all together, if you had installed CUDA 10.2
in `/opt/nvidia/cuda`, you would run:

```bash
### Installation with CUDA
export CUDA_PATH="/opt/nvidia/cuda"
pip install cupy-cuda102
pip install spacy-transformers
```

Provisioning a new machine will require about 5GB of data to be downloaded in
total: 3GB for the CUDA runtime, 800MB for PyTorch, 400MB for CuPy, 500MB for
the transformer weights, and about 200MB for spaCy and its various requirements.

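To check that everything is wired up, a quick sanity check along these lines
can help (a minimal sketch, assuming the packages above installed cleanly):

```python
### Check GPU availability
import cupy
import spacy

print(cupy.cuda.runtime.getDeviceCount())  # at least 1 GPU should be visible
print(spacy.prefer_gpu())  # True if spaCy can allocate memory on the GPU
```
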
## Runtime usage {#runtime}

Transformer models can be used as **drop-in replacements** for other types of
neural networks, so your spaCy pipeline can include them in a way that's
completely invisible to the user. Users will download, load and use the model in
the standard way, like any other spaCy pipeline. Instead of using the
transformers as subnetworks directly, you can also use them via the
[`Transformer`](/api/transformer) pipeline component.

![The processing pipeline with the transformer component](../images/pipeline_transformer.svg)

The `Transformer` component sets the
[`Doc._.trf_data`](/api/transformer#custom_attributes) extension attribute,
which lets you access the transformer's outputs at runtime.

```bash
$ python -m spacy download en_core_trf_lg
```

```python
### Example
import spacy
from thinc.api import use_pytorch_for_gpu_memory, require_gpu

# Use the GPU, with memory allocations directed via PyTorch.
# This prevents out-of-memory errors that would otherwise occur from competing
# memory pools.
use_pytorch_for_gpu_memory()
require_gpu(0)

nlp = spacy.load("en_core_trf_lg")
for doc in nlp.pipe(["some text", "some other text"]):
    tokvecs = doc._.trf_data.tensors[-1]
```

You can also customize how the [`Transformer`](/api/transformer) component sets
annotations onto the [`Doc`](/api/doc) by customizing the `annotation_setter`.
This callback is invoked with a batch of [`Doc`](/api/doc) objects and a
[`FullTransformerBatch`](/api/transformer#fulltransformerbatch) containing the
transformer data for the batch, allowing you to implement whatever you need. For
instance, a minimal setter might attach each `Doc`'s share of the output to a
custom extension attribute:

```python
import spacy
from spacy.tokens import Doc

# Illustrative example: the extension attribute name "trf_tensors" is made up
Doc.set_extension("trf_tensors", default=None)

def custom_annotation_setter(docs, trf_data):
    # trf_data is a FullTransformerBatch; doc_data splits it up per Doc
    for doc, data in zip(docs, trf_data.doc_data):
        doc._.trf_tensors = data.tensors

nlp = spacy.load("en_core_trf_lg")
nlp.get_pipe("transformer").annotation_setter = custom_annotation_setter
doc = nlp("This is a text")
print(doc._.trf_tensors)
```

## Training usage {#training}

The recommended workflow for training is to use spaCy's
[config system](/usage/training#config), usually via the
[`spacy train`](/api/cli#train) command. The training config defines all
component settings and hyperparameters in one place and lets you describe a tree
of objects by referring to creation functions, including functions you register
yourself. For details on how to get started with training your own model, check
out the [training quickstart](/usage/training#quickstart).

<Project id="en_core_bert">

The easiest way to get started is to clone a transformers-based project
template. Swap in your data, edit the settings and hyperparameters and train,
evaluate, package and visualize your model.

</Project>

The `[components]` section in the [`config.cfg`](/api/data-formats#config)
describes the pipeline components and the settings used to construct them,
including their model implementation. Here's a config snippet for the
[`Transformer`](/api/transformer) component, along with matching Python code. In
this case, the `[components.transformer]` block describes the `transformer`
component:

> #### Python equivalent
>
> ```python
> from spacy_transformers import Transformer, TransformerModel
> from spacy_transformers.annotation_setters import null_annotation_setter
> from spacy_transformers.span_getters import get_doc_spans
>
> trf = Transformer(
>     nlp.vocab,
>     TransformerModel(
>         "bert-base-cased",
>         get_spans=get_doc_spans,
>         tokenizer_config={"use_fast": True},
>     ),
>     annotation_setter=null_annotation_setter,
>     max_batch_items=4096,
> )
> ```

```ini
### config.cfg (excerpt)
[components.transformer]
factory = "transformer"
max_batch_items = 4096

[components.transformer.model]
@architectures = "spacy-transformers.TransformerModel.v1"
name = "bert-base-cased"
tokenizer_config = {"use_fast": true}

[components.transformer.model.get_spans]
@span_getters = "doc_spans.v1"

[components.transformer.annotation_setter]
@annotation_setters = "spacy-transformers.null_annotation_setter.v1"
```

The `[components.transformer.model]` block describes the `model` argument passed
to the transformer component. It's a Thinc
[`Model`](https://thinc.ai/docs/api-model) object that will be passed into the
component. Here, it references the function
[`spacy-transformers.TransformerModel.v1`](/api/architectures#TransformerModel)
registered in the [`architectures` registry](/api/top-level#registry). If a key
in a block starts with `@`, it's **resolved to a function** and all other
settings are passed to the function as arguments. In this case, `name`,
`tokenizer_config` and `get_spans`.

`get_spans` is a function that takes a batch of `Doc` objects and returns lists
of potentially overlapping `Span` objects for the transformer to process.
Several [built-in functions](/api/transformer#span-getters) are available, for
example to process the whole document or individual sentences. When the config
is resolved, the function is created and passed into the model as an argument.

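As a minimal sketch of what that resolution looks like (assuming spaCy v3 and
`spacy-transformers` are installed; the config excerpt mirrors the one above):

```python
### Resolving a config block manually
import spacy_transformers  # noqa: F401  (populates the registries)
from thinc.api import Config
from spacy.util import registry

config_str = """
[model]
@architectures = "spacy-transformers.TransformerModel.v1"
name = "bert-base-cased"
tokenizer_config = {"use_fast": true}

[model.get_spans]
@span_getters = "doc_spans.v1"
"""
resolved = registry.resolve(Config().from_str(config_str))
model = resolved["model"]  # a constructed Thinc Model
```
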
<Infobox variant="warning">

Remember that the `config.cfg` used for training should contain **no missing
values** and requires all settings to be defined. You don't want any hidden
defaults creeping in and changing your results! spaCy will tell you if settings
are missing, and you can run
[`spacy init fill-config`](/api/cli#init-fill-config) to automatically fill in
all defaults.

</Infobox>

### Customizing the settings {#training-custom-settings}

To change any of the settings, you can edit the `config.cfg` and re-run the
training. To change any of the functions, like the span getter, you can replace
the name of the referenced function, e.g. `@span_getters = "sent_spans.v1"` to
process sentences instead. You can also register your own functions using the
`span_getters` registry:

> #### config.cfg
>
> ```ini
> [components.transformer.model.get_spans]
> @span_getters = "custom_sent_spans"
> ```

```python
### code.py
import spacy_transformers

@spacy_transformers.registry.span_getters("custom_sent_spans")
def configure_custom_sent_spans():
    # Create a span getter that returns one span per sentence
    def get_sent_spans(docs):
        return [list(doc.sents) for doc in docs]

    return get_sent_spans
```

To resolve the config during training, spaCy needs to know about your custom
function. You can make it available via the `--code` argument that can point to
a Python file. For more details on training with custom code, see the
[training documentation](/usage/training#custom-code).

```bash
$ python -m spacy train ./config.cfg --code ./code.py
```

### Customizing the model implementations {#training-custom-model}

The [`Transformer`](/api/transformer) component expects a Thinc
[`Model`](https://thinc.ai/docs/api-model) object to be passed in as its `model`
argument. You're not limited to the implementation provided by
`spacy-transformers`; the only requirement is that your registered function
must return an object of type ~~Model[List[Doc], FullTransformerBatch]~~: that
is, a Thinc model that takes a list of [`Doc`](/api/doc) objects, and returns a
[`FullTransformerBatch`](/api/transformer#fulltransformerbatch) object with the
transformer data.

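For example, a custom architecture that satisfies this contract could be as
simple as a registered function that delegates to the built-in
`TransformerModel` (a hedged sketch; the registry name
`"my_transformer_model.v1"` is made up for illustration):

```python
### Custom architecture sketch
import spacy
from spacy_transformers import TransformerModel
from spacy_transformers.span_getters import get_doc_spans

@spacy.registry.architectures("my_transformer_model.v1")
def create_my_transformer_model(name: str):
    # Returns a Model[List[Doc], FullTransformerBatch]
    return TransformerModel(name, get_spans=get_doc_spans, tokenizer_config={})
```
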
> #### Model type annotations
>
> In the documentation and code base, you may come across type annotations and
> descriptions of [Thinc](https://thinc.ai) model types, like ~~Model[List[Doc],
> List[Floats2d]]~~. This so-called generic type describes the layer and its
> input and output type. In this case, it takes a list of `Doc` objects as the
> input and a list of 2-dimensional arrays of floats as the output. You can read
> more about defining Thinc models [here](https://thinc.ai/docs/usage-models).
> Also see the docs on [type checking](https://thinc.ai/docs/usage-type-checking)
> for how to enable linting in your editor to see live feedback if your inputs
> and outputs don't match.

The same idea applies to task models that power the **downstream components**.
Most of spaCy's built-in model creation functions support a `tok2vec` argument,
which should be a Thinc layer of type ~~Model[List[Doc], List[Floats2d]]~~. This
is where we'll plug in our transformer model, using the
[Tok2VecListener](/api/architectures#Tok2VecListener) layer, which sneakily
delegates to the `Transformer` pipeline component.

```ini
### config.cfg (excerpt) {highlight="12"}
[components.ner]
factory = "ner"

[components.ner.model]
@architectures = "spacy.TransitionBasedParser.v1"
nr_feature_tokens = 3
hidden_width = 128
maxout_pieces = 3
use_upper = false

[components.ner.model.tok2vec]
@architectures = "spacy-transformers.Tok2VecListener.v1"
grad_factor = 1.0

[components.ner.model.tok2vec.pooling]
@layers = "reduce_mean.v1"
```

The [Tok2VecListener](/api/architectures#Tok2VecListener) layer expects a
[pooling layer](https://thinc.ai/docs/api-layers#reduction-ops) as the argument
`pooling`, which needs to be of type ~~Model[Ragged, Floats2d]~~. This layer
determines how the vector for each spaCy token will be computed from the zero or
more source rows the token is aligned against. Here we use the
[`reduce_mean`](https://thinc.ai/docs/api-layers#reduce_mean) layer, which
averages the wordpiece rows. We could instead use
[`reduce_max`](https://thinc.ai/docs/api-layers#reduce_max), or a custom
function you write yourself.

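Since the pooling argument is just an ordinary Thinc reduction layer, swapping
the strategy is a one-line change (a minimal sketch, assuming Thinc v8):

```python
### Pooling layers
from thinc.api import reduce_mean, reduce_max

# Both layers are of type Model[Ragged, Floats2d]: they reduce the variable
# number of wordpiece rows per token down to a single vector.
mean_pooling = reduce_mean()
max_pooling = reduce_max()
```
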
You can have multiple components all listening to the same transformer model,
and all passing gradients back to it. By default, all of the gradients will be
**equally weighted**. You can control this with the `grad_factor` setting, which
lets you reweight the gradients from the different listeners. For instance,
setting `grad_factor = 0` would disable gradients from one of the listeners,
while `grad_factor = 2.0` would multiply them by 2. This is similar to having a
custom learning rate for each component. Instead of a constant, you can also
provide a schedule, allowing you to freeze the shared parameters at the start of
training.