title | teaser | menu | next |
---|---|---|---|
Layers and Model Architectures | Power spaCy components with custom neural networks | | /usage/projects |
A model architecture is a function that wires up a Thinc Model instance, which
you can then use in a component or as a layer of a larger network. You can use
Thinc as a thin wrapper around frameworks such as PyTorch, TensorFlow or MXNet,
or you can implement your logic in Thinc directly. spaCy's built-in components
will never construct their Model instances themselves, so you won't have to
subclass the component to change its model architecture. You can just update
the config so that it refers to a different registered function. Once the
component has been created, its model instance has already been assigned, so
you cannot change its model architecture. The architecture is like a recipe for
the network, and you can't change the recipe once the dish has already been
prepared. You have to make a new one.
Type signatures
Example
@spacy.registry.architectures.register("spacy.Tagger.v1")
def build_tagger_model(
    tok2vec: Model[List[Doc], List[Floats2d]], nO: Optional[int] = None
) -> Model[List[Doc], List[Floats2d]]:
    t2v_width = tok2vec.get_dim("nO") if tok2vec.has_dim("nO") else None
    output_layer = Softmax(nO, t2v_width, init_W=zero_init)
    softmax = with_array(output_layer)
    model = chain(tok2vec, softmax)
    model.set_ref("tok2vec", tok2vec)
    model.set_ref("softmax", output_layer)
    model.set_ref("output_layer", output_layer)
    return model
The Thinc Model class is a generic type that can specify its input and output
types. Python uses a square-bracket notation for this, so the type
Model[List, Dict] says that each batch of inputs to the model will be a list,
and the outputs will be a dictionary. Both typing.List and typing.Dict are also
generics, allowing you to be more specific about the data. For instance, you
can write Model[List[Doc], Dict[str, float]] to specify that the model expects
a list of Doc objects as input, and returns a dictionary mapping strings to
floats. Some of the most common types you'll see are:
Type | Description |
---|---|
List[Doc] | A batch of Doc objects. Most components expect their models to take this as input. |
Floats2d | A two-dimensional numpy or cupy array of floats. Usually 32-bit. |
Ints2d | A two-dimensional numpy or cupy array of integers. Common dtypes include uint64, int32 and int8. |
List[Floats2d] | A list of two-dimensional arrays, generally with one array per Doc and one row per token. |
Ragged | A container to handle variable-length sequence data in an unpadded contiguous array. |
Padded | A container to handle variable-length sequence data in a padded contiguous array. |
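The three ways of holding variable-length sequence data listed above can be compared side by side. The following is a minimal sketch that assumes Thinc v8 is installed and runs on CPU; the array sizes are arbitrary and chosen only for illustration.

```python
import numpy
from thinc.api import NumpyOps
from thinc.types import Padded, Ragged

ops = NumpyOps()

# List[Floats2d]: one array per document, one row per token.
# The lengths (2 and 3 tokens) and the width (4) are made up for this sketch.
arrays = [
    numpy.zeros((2, 4), dtype="f"),
    numpy.ones((3, 4), dtype="f"),
]

# Ragged: all rows concatenated into one array, plus the length of each sequence.
ragged = Ragged(ops.flatten(arrays), numpy.asarray([2, 3], dtype="i"))

# Padded: sequences padded to the longest length, stored as (step, batch, width).
padded = ops.list2padded(arrays)

# Each container can be converted back into the original list of arrays.
from_ragged = ops.unflatten(ragged.data, ragged.lengths)
from_padded = ops.padded2list(padded)

print(ragged.data.shape, padded.data.shape)
```

The Ragged form avoids storing padding rows, while the Padded form gives every sequence the same length, which some layers require.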
The model type signatures help you figure out which model architectures and
components can fit together. For instance, the TextCategorizer class expects a
model typed Model[List[Doc], Floats2d], because the model will predict one row
of category probabilities per Doc. In contrast, the Tagger class expects a
model typed Model[List[Doc], List[Floats2d]], because it needs to predict one
row of probabilities per token.
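The difference between the two output types shows up in the shapes of the predictions. The sketch below assumes Thinc v8 and skips the tok2vec step: plain arrays stand in for the per-token vectors, and `WIDTH` and `N_LABELS` are hypothetical sizes. It is not how spaCy's built-in architectures are defined; it only illustrates "one row per token" versus "one row per Doc".

```python
from typing import List

import numpy
from thinc.api import Model, Softmax, chain, list2ragged, reduce_mean, with_array
from thinc.types import Floats2d

WIDTH = 4      # hypothetical width of the token vectors
N_LABELS = 3   # hypothetical number of tags or categories

# Stand-in for tok2vec output: one array per document, one row per token.
token_vectors: List[Floats2d] = [
    numpy.ones((2, WIDTH), dtype="f"),  # a "document" with 2 tokens
    numpy.ones((5, WIDTH), dtype="f"),  # a "document" with 5 tokens
]

# Tagger-style head: apply the softmax layer to every token row.
per_token: Model[List[Floats2d], List[Floats2d]] = with_array(
    Softmax(nO=N_LABELS, nI=WIDTH)
)
per_token.initialize()

# Text-classifier-style head: pool each document to one row, then classify.
per_doc: Model[List[Floats2d], Floats2d] = chain(
    list2ragged(), reduce_mean(), Softmax(nO=N_LABELS, nI=WIDTH)
)
per_doc.initialize()

tag_scores = per_token.predict(token_vectors)  # list with one array per document
cat_scores = per_doc.predict(token_vectors)    # a single array, one row per document

print([scores.shape for scores in tag_scores])
print(cat_scores.shape)
```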
There's no guarantee that two models with the same type signature can be used interchangeably. There are many other ways they could be incompatible. However, if the types don't match, they almost surely won't be compatible. This little bit of validation goes a long way, especially if you configure your editor or other tools to highlight these errors early. Thinc will also verify that your types match correctly when your config file is processed at the beginning of training.
If you're using a modern editor like Visual Studio Code, you can set up mypy
with the custom Thinc plugin and get live feedback about mismatched types as
you write code.
Defining sublayers
Model architecture functions often accept sublayers as arguments, so that you can try substituting a different layer into the network. Depending on how the architecture function is structured, you might be able to define your network structure entirely through the config system, using layers that have already been defined. The transformers documentation section shows a common example of swapping in a different sublayer.
In most neural network models for NLP, the most important parts of the network
are what we refer to as the embed and encode steps. These steps together
compute dense, context-sensitive representations of the tokens. Most of spaCy's
default architectures accept a tok2vec embedding layer as an argument, so you
can control this important part of the network separately. This makes it easy
to switch between transformer, CNN, BiLSTM or other feature extraction
approaches. And if you want to define your own solution, all you need to do is
register a Model[List[Doc], List[Floats2d]] architecture function, and you'll
be able to try it out in any of the spaCy components.
Registering new architectures
- Recap concept, link to config docs.
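As a rough illustration of registering a Model[List[Doc], List[Floats2d]] architecture function and referring to it from a config, here is a minimal sketch. It assumes spaCy v3 and Thinc v8. The name "toy.ZeroTok2Vec.v1", the `width` argument and the layer itself are made up for this example: the layer returns zero vectors and is only a placeholder for real embed and encode logic.

```python
from typing import List

import spacy
from spacy.tokens import Doc
from thinc.api import Config, Model
from thinc.types import Floats2d


def zero_forward(model: Model, docs: List[Doc], is_train: bool):
    # Placeholder logic: one zero vector per token.
    # A real layer would embed and encode the tokens here.
    width = model.get_dim("nO")
    outputs = [model.ops.alloc2f(len(doc), width) for doc in docs]

    def backprop(d_outputs: List[Floats2d]) -> List[Doc]:
        # Nothing to update, and no gradient to pass back to the docs.
        return []

    return outputs, backprop


# "toy.ZeroTok2Vec.v1" is a hypothetical name used only in this sketch.
@spacy.registry.architectures.register("toy.ZeroTok2Vec.v1")
def build_zero_tok2vec(width: int) -> Model[List[Doc], List[Floats2d]]:
    return Model("zero_tok2vec", zero_forward, dims={"nO": width})


# The config refers to the function by its registered name.
CONFIG = """
[model]
@architectures = "toy.ZeroTok2Vec.v1"
width = 8
"""

model = spacy.registry.resolve(Config().from_str(CONFIG))["model"]
nlp = spacy.blank("en")
vectors = model.predict([nlp("Hello world"), nlp("One more short text")])
print([v.shape for v in vectors])
```

Changing the registered name in the config block is all it takes to build a different network, as long as the new function's arguments match the config values.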
Wrapping PyTorch, TensorFlow and other frameworks
Thinc allows you to wrap models written in other machine learning frameworks
like PyTorch, TensorFlow and MXNet using a unified Model API. As well as
wrapping whole models, Thinc lets you call into an external framework for just
part of your model. For example, you can use PyTorch just for the transformer
layers, and use "native" Thinc layers to do fiddly input and output
transformations and add on task-specific "heads", as efficiency is less of a
consideration for those parts of the network.
Thinc uses a special class, Shim, to hold references to external objects. This
allows each wrapper space to define a custom type, with whatever attributes and
methods are helpful, to assist in managing the communication between Thinc and
the external library. The Model class holds shim instances in a separate list,
and communicates with the shims about updates, serialization, changes of
device, etc.
The wrapper will receive each batch of inputs, convert them into a suitable form for the underlying model instance, and pass them over to the shim, which will manage the actual communication with the model. The output is then passed back into the wrapper, and converted for use in the rest of the network. The equivalent procedure happens during backpropagation. Array conversion is handled via the DLPack standard wherever possible, so that data can be passed between the frameworks without copying the data back to the host device unnecessarily.
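The round trip described above can be sketched with the PyTorch wrapper. This example assumes that both Thinc v8 and PyTorch are installed and that everything runs on CPU; the layer sizes are arbitrary.

```python
import numpy
import torch.nn
from thinc.api import PyTorchShim, PyTorchWrapper

# Any torch.nn.Module can be wrapped; the sizes here are arbitrary.
torch_layer = torch.nn.Linear(4, 2)
model = PyTorchWrapper(torch_layer)

# The external module is held by a shim in the model's separate shims list.
shim = model.shims[0]

X = numpy.ones((5, 4), dtype="f")

# Forward pass: the wrapper converts the numpy batch to a torch tensor,
# the shim calls the module, and the output is converted back.
Y, backprop = model.begin_update(X)

# Backward pass: the gradient takes the same route in reverse.
dX = backprop(numpy.ones_like(Y))

print(type(shim).__name__, Y.shape, dX.shape)
```

The wrapped model can be combined with native Thinc layers like any other Model instance, which is what makes it possible to use an external framework for only part of a network.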
Framework | Wrapper layer | Shim | DLPack |
---|---|---|---|
PyTorch | PyTorchWrapper | PyTorchShim | ✅ |
TensorFlow | TensorFlowWrapper | TensorFlowShim | ❌ 1 |
MXNet | MXNetWrapper | MXNetShim | ✅ |
1. DLPack support in TensorFlow is now available but still experimental.
Models for trainable components
- Interaction with predict, get_loss and set_annotations.
- Initialization life-cycle with begin_training.
- Link to relation extraction notebook.
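def update(self, examples):
    # Run the model on the predicted docs and keep the backprop callback.
    docs = [ex.predicted for ex in examples]
    refs = [ex.reference for ex in examples]
    predictions, backprop = self.model.begin_update(docs)
    # Compare the predictions to the references to get the gradient.
    gradient = self.get_loss(predictions, refs)
    backprop(gradient)

def __call__(self, doc):
    # Model.__call__ requires an is_train flag, so use predict for inference.
    predictions = self.model.predict([doc])
    self.set_annotations([doc], predictions)
    return doc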
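To make the update cycle concrete, the following sketch runs the same steps end to end with a small Thinc model. It assumes Thinc v8. `ToyComponent` is a hypothetical stand-in and not spaCy's component base class: it works on plain arrays instead of Doc and Example objects, and its `get_loss` returns both a loss value and the gradient.

```python
import numpy
from thinc.api import SGD, Softmax


class ToyComponent:
    """Hypothetical stand-in for a trainable component (not a spaCy class)."""

    def __init__(self, model):
        self.model = model

    def get_loss(self, predictions, targets):
        # Difference between predicted and target probabilities, used here
        # as the gradient signal. The squared sum is reported as the loss.
        d_scores = predictions - targets
        loss = float((d_scores ** 2).sum())
        return loss, d_scores

    def update(self, inputs, targets, optimizer):
        # Forward pass that also returns a callback for the backward pass.
        predictions, backprop = self.model.begin_update(inputs)
        loss, d_scores = self.get_loss(predictions, targets)
        # The backward pass accumulates gradients on the model's parameters.
        backprop(d_scores)
        # Apply the accumulated gradients with the optimizer.
        self.model.finish_update(optimizer)
        return loss

    def predict(self, inputs):
        return self.model.predict(inputs)


# Two toy inputs, each belonging to a different class.
X = numpy.asarray([[1, 0], [0, 1]], dtype="f")
Y = numpy.asarray([[1, 0], [0, 1]], dtype="f")

component = ToyComponent(Softmax(nO=2, nI=2))
component.model.initialize()
optimizer = SGD(0.1)
losses = [component.update(X, Y, optimizer) for _ in range(10)]
predicted_classes = component.predict(X).argmax(axis=1)
```

Note that `begin_update` only computes gradients. The weights change when `finish_update` is called with an optimizer, a step the shorter sketch above leaves out.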