spaCy/website/docs/models/index.md

---
title: Models
teaser: Downloadable pretrained models for spaCy
menu:
  - ['Quickstart', 'quickstart']
  - ['Model Architecture', 'architecture']
  - ['Conventions', 'conventions']
---

The models directory includes two types of pretrained models:

1. **Core models:** General-purpose pretrained models to predict named entities,
   part-of-speech tags and syntactic dependencies. Can be used out-of-the-box
   and fine-tuned on more specific data.
2. **Starter models:** Transfer learning starter packs with pretrained weights
   you can initialize your models with to achieve better accuracy. They can
   include word vectors (which will be used as features during training) or
   other pretrained representations like BERT. These models don't include
   components for specific tasks like NER or text classification and are
   intended to be used as base models when training your own models.

### Quickstart {hidden="true"}

import QuickstartModels from 'widgets/quickstart-models.js'

<QuickstartModels title="Quickstart" id="quickstart" description="Install a default model, get the code to load it from within spaCy and test it." />

<Infobox title="📖 Installation and usage">

For more details on how to use models with spaCy, see the
[usage guide on models](/usage/models).

</Infobox>

## Model architecture {#architecture}

spaCy v2.0 features new neural models for **tagging**, **parsing** and **entity
recognition**. The models have been designed and implemented from scratch
specifically for spaCy, to give you an unmatched balance of speed, size and
accuracy. A novel bloom embedding strategy with subword features is used to
support huge vocabularies in tiny tables. Convolutional layers with residual
connections, layer normalization and maxout non-linearity are used, giving much
better efficiency than the standard BiLSTM solution.

The parser and NER use an imitation learning objective to deliver **accuracy
in-line with the latest research systems**, even when evaluated from raw text.
With these innovations, spaCy v2.0's models are **10× smaller**, **20% more
accurate**, and **even cheaper to run** than the previous generation. The
current architecture hasn't been published yet, but in the meantime we prepared
a video that explains how the models work, with particular focus on NER.

<YouTube id="sqDHBH9IjRU" />

The parsing model is a blend of recent results. The two recent inspirations have
been the work of Eli Klipperwasser and Yoav Goldberg at Bar Ilan[^1], and the
SyntaxNet team from Google. The foundation of the parser is still based on the
work of Joakim Nivre[^2], who introduced the transition-based framework[^3], the
arc-eager transition system, and the imitation learning objective. The model is
implemented using [Thinc](https://github.com/explosion/thinc), spaCy's machine
learning library. We first predict context-sensitive vectors for each word in
the input:

```python
(embed_lower | embed_prefix | embed_suffix | embed_shape)
    >> Maxout(token_width)
    >> convolution ** 4
```

This convolutional layer is shared between the tagger, parser and NER, and will
also be shared by the future neural lemmatizer. Because the parser shares these
layers with the tagger, the parser does not require tag features. I got this
trick from David Weiss's "Stack Combination" paper[^4].

To boost the representation, the tagger actually predicts a "super tag" with
POS, morphology and dependency label[^5]. The tagger predicts these supertags by
adding a softmax layer onto the convolutional layer – so, we're teaching the
convolutional layer to give us a representation that's one affine transform from
this informative lexical information. This is obviously good for the parser
(which backprops to the convolutions, too). The parser model makes a state
vector by concatenating the vector representations for its context tokens. The
current context tokens:

| Context tokens                                                                     | Description                                                                 |
| ---------------------------------------------------------------------------------- | --------------------------------------------------------------------------- |
| `S0`, `S1`, `S2`                                                                   | Top three words on the stack.                                               |
| `B0`, `B1`                                                                         | First two words of the buffer.                                              |
| `S0L1`, `S1L1`, `S2L1`, `B0L1`, `B1L1`<br />`S0L2`, `S1L2`, `S2L2`, `B0L2`, `B1L2` | Leftmost and second leftmost children of `S0`, `S1`, `S2`, `B0` and `B1`.   |
| `S0R1`, `S1R1`, `S2R1`, `B0R1`, `B1R1`<br />`S0R2`, `S1R2`, `S2R2`, `B0R2`, `B1R2` | Rightmost and second rightmost children of `S0`, `S1`, `S2`, `B0` and `B1`. |

This makes the state vector quite long: `13*T`, where `T` is the token vector
width (128 is working well). Fortunately, there's a way to structure the
computation to save some expense (and make it more GPU-friendly).

The parser typically visits `2*N` states for a sentence of length `N` (although
it may visit more, if it back-tracks with a non-monotonic transition[^4]). A
naive implementation would require `2*N (B, 13*T) @ (13*T, H)` matrix
multiplications for a batch of size `B`. We can instead perform one
`(B*N, T) @ (T, 13*H)` multiplication, to pre-compute the hidden weights for
each positional feature with respect to the words in the batch. (Note that our
token vectors come from the CNN — so we can't play this trick over the
vocabulary. That's how Stanford's NN parser[^3] works — and why its model is so
big.)

This pre-computation strategy allows a nice compromise between GPU-friendliness
and implementation simplicity. The CNN and the wide lower layer are computed on
the GPU, and then the precomputed hidden weights are moved to the CPU, before we
start the transition-based parsing process. This makes a lot of things much
easier. We don't have to worry about variable-length batch sizes, and we don't
have to implement the dynamic oracle in CUDA to train.

Currently the parser's loss function is multi-label log loss[^6], as the dynamic
oracle allows multiple states to be 0 cost. This is defined as follows, where
`gZ` is the sum of the scores assigned to gold classes:

```python
(exp(score) / Z) - (exp(score) / gZ)
```

<Infobox title="Bibliography">

1. [Simple and Accurate Dependency Parsing Using Bidirectional LSTM Feature Representations {#fn-1}](https://www.semanticscholar.org/paper/Simple-and-Accurate-Dependency-Parsing-Using-Bidir-Kiperwasser-Goldberg/3cf31ecb2724b5088783d7c96a5fc0d5604cbf41).
   Eliyahu Kiperwasser, Yoav Goldberg. (2016)
2. [A Dynamic Oracle for Arc-Eager Dependency Parsing {#fn-2}](https://www.semanticscholar.org/paper/A-Dynamic-Oracle-for-Arc-Eager-Dependency-Parsing-Goldberg-Nivre/22697256ec19ecc3e14fcfc63624a44cf9c22df4).
   Yoav Goldberg, Joakim Nivre (2012)
3. [Parsing English in 500 Lines of Python {#fn-3}](https://explosion.ai/blog/parsing-english-in-python).
   Matthew Honnibal (2013)
4. [Stack-propagation: Improved Representation Learning for Syntax {#fn-4}](https://www.semanticscholar.org/paper/Stack-propagation-Improved-Representation-Learning-Zhang-Weiss/0c133f79b23e8c680891d2e49a66f0e3d37f1466).
   Yuan Zhang, David Weiss (2016)
5. [Deep multi-task learning with low level tasks supervised at lower layers {#fn-5}](https://www.semanticscholar.org/paper/Deep-multi-task-learning-with-low-level-tasks-supe-S%C3%B8gaard-Goldberg/03ad06583c9721855ccd82c3d969a01360218d86).
   Anders Søgaard, Yoav Goldberg (2016)
6. [An Improved Non-monotonic Transition System for Dependency Parsing {#fn-6}](https://www.semanticscholar.org/paper/An-Improved-Non-monotonic-Transition-System-for-De-Honnibal-Johnson/4094cee47ade13b77b5ab4d2e6cb9dd2b8a2917c).
   Matthew Honnibal, Mark Johnson (2015)
7. [A Fast and Accurate Dependency Parser using Neural Networks {#fn-7}](http://cs.stanford.edu/people/danqi/papers/emnlp2014.pdf).
   Danqi Cheng, Christopher D. Manning (2014)
8. [Parsing the Wall Street Journal using a Lexical-Functional Grammar and Discriminative Estimation Techniques {#fn-8}](https://www.semanticscholar.org/paper/Parsing-the-Wall-Street-Journal-using-a-Lexical-Fu-Riezler-King/0ad07862a91cd59b7eb5de38267e47725a62b8b2).
   Stefan Riezler et al. (2002)

</Infobox>

## Model naming conventions {#conventions}

In general, spaCy expects all model packages to follow the naming convention of
`[lang`\_[name]]. For spaCy's models, we also chose to divide the name into
three components:

1. **Type:** Model capabilities (e.g. `core` for general-purpose model with
   vocabulary, syntax, entities and word vectors, or `depent` for only vocab,
   syntax and entities).
2. **Genre:** Type of text the model is trained on, e.g. `web` or `news`.
3. **Size:** Model size indicator, `sm`, `md` or `lg`.

For example, `en_core_web_sm` is a small English model trained on written web
text (blogs, news, comments), that includes vocabulary, vectors, syntax and
entities.

### Model versioning {#model-versioning}

Additionally, the model versioning reflects both the compatibility with spaCy,
as well as the major and minor model version. A model version `a.b.c` translates
to:

- `a`: **spaCy major version**. For example, `2` for spaCy v2.x.
- `b`: **Model major version**. Models with a different major version can't be
  loaded by the same code. For example, changing the width of the model, adding
  hidden layers or changing the activation changes the model major version.
- `c`: **Model minor version**. Same model structure, but different parameter
  values, e.g. from being trained on different data, for different numbers of
  iterations, etc.

For a detailed compatibility overview, see the
[`compatibility.json`](https://github.com/explosion/spacy-models/tree/master/compatibility.json)
in the models repository. This is also the source of spaCy's internal
compatibility check, performed when you run the [`download`](/api/cli#download)
command.
-												💫 Update website (#3285)

<!--- Provide a general summary of your changes in the title. -->

## Description

The new website is implemented using [Gatsby](https://www.gatsbyjs.org) with [Remark](https://github.com/remarkjs/remark) and [MDX](https://mdxjs.com/). This allows authoring content in **straightforward Markdown** without the usual limitations. Standard elements can be overwritten with powerful [React](http://reactjs.org/) components and wherever Markdown syntax isn't enough, JSX components can be used. Hopefully, this update will also make it much easier to contribute to the docs. Once this PR is merged, I'll implement auto-deployment via [Netlify](https://netlify.com) on a specific branch (to avoid building the website on every PR). There's a bunch of other cool stuff that the new setup will allow us to do – including writing front-end tests, service workers, offline support, implementing a search and so on.

This PR also includes various new docs pages and content.
Resolves #3270. Resolves #3222. Resolves #2947. Resolves #2837.


### Types of change
enhancement

## Checklist
<!--- Before you submit the PR, go over this checklist and make sure you can
tick off all the boxes. [] -> [x] -->
- [x] I have submitted the spaCy Contributor Agreement.
- [x] I ran the tests, and all new and existing tests passed.
- [x] My changes don't require a change to the documentation, or if they do, I've added all required information.

											
										
										
											2019-02-17 21:31:19 +03:00
+								---
 								title: Models
-												Divide models into core and starters [ci skip]

											
										
										
											2019-12-21 16:10:22 +03:00
+								teaser: Downloadable pretrained models for spaCy
-												💫 Update website (#3285)

<!--- Provide a general summary of your changes in the title. -->

## Description

The new website is implemented using [Gatsby](https://www.gatsbyjs.org) with [Remark](https://github.com/remarkjs/remark) and [MDX](https://mdxjs.com/). This allows authoring content in **straightforward Markdown** without the usual limitations. Standard elements can be overwritten with powerful [React](http://reactjs.org/) components and wherever Markdown syntax isn't enough, JSX components can be used. Hopefully, this update will also make it much easier to contribute to the docs. Once this PR is merged, I'll implement auto-deployment via [Netlify](https://netlify.com) on a specific branch (to avoid building the website on every PR). There's a bunch of other cool stuff that the new setup will allow us to do – including writing front-end tests, service workers, offline support, implementing a search and so on.

This PR also includes various new docs pages and content.
Resolves #3270. Resolves #3222. Resolves #2947. Resolves #2837.


### Types of change
enhancement

## Checklist
<!--- Before you submit the PR, go over this checklist and make sure you can
tick off all the boxes. [] -> [x] -->
- [x] I have submitted the spaCy Contributor Agreement.
- [x] I ran the tests, and all new and existing tests passed.
- [x] My changes don't require a change to the documentation, or if they do, I've added all required information.

											
										
										
											2019-02-17 21:31:19 +03:00
+								menu:
 								  - ['Quickstart', 'quickstart']
 								  - ['Model Architecture', 'architecture']
 								  - ['Conventions', 'conventions']
 								---
-												Divide models into core and starters [ci skip]

											
										
										
											2019-12-21 16:10:22 +03:00
+								The models directory includes two types of pretrained models:
-												💫 Update website (#3285)

<!--- Provide a general summary of your changes in the title. -->

## Description

The new website is implemented using [Gatsby](https://www.gatsbyjs.org) with [Remark](https://github.com/remarkjs/remark) and [MDX](https://mdxjs.com/). This allows authoring content in **straightforward Markdown** without the usual limitations. Standard elements can be overwritten with powerful [React](http://reactjs.org/) components and wherever Markdown syntax isn't enough, JSX components can be used. Hopefully, this update will also make it much easier to contribute to the docs. Once this PR is merged, I'll implement auto-deployment via [Netlify](https://netlify.com) on a specific branch (to avoid building the website on every PR). There's a bunch of other cool stuff that the new setup will allow us to do – including writing front-end tests, service workers, offline support, implementing a search and so on.

This PR also includes various new docs pages and content.
Resolves #3270. Resolves #3222. Resolves #2947. Resolves #2837.


### Types of change
enhancement

## Checklist
<!--- Before you submit the PR, go over this checklist and make sure you can
tick off all the boxes. [] -> [x] -->
- [x] I have submitted the spaCy Contributor Agreement.
- [x] I ran the tests, and all new and existing tests passed.
- [x] My changes don't require a change to the documentation, or if they do, I've added all required information.

											
										
										
											2019-02-17 21:31:19 +03:00
-												Divide models into core and starters [ci skip]

											
										
										
											2019-12-21 16:10:22 +03:00
+. **Core models:** General-purpose pretrained models to predict named entities,
 								   part-of-speech tags and syntactic dependencies. Can be used out-of-the-box
 								   and fine-tuned on more specific data.
 . **Starter models:** Transfer learning starter packs with pretrained weights
 								   you can initialize your models with to achieve better accuracy. They can
 								   include word vectors (which will be used as features during training) or
 								   other pretrained representations like BERT. These models don't include
 								   components for specific tasks like NER or text classification and are
 								   intended to be used as base models when training your own models.
-												💫 Update website (#3285)

<!--- Provide a general summary of your changes in the title. -->

## Description

The new website is implemented using [Gatsby](https://www.gatsbyjs.org) with [Remark](https://github.com/remarkjs/remark) and [MDX](https://mdxjs.com/). This allows authoring content in **straightforward Markdown** without the usual limitations. Standard elements can be overwritten with powerful [React](http://reactjs.org/) components and wherever Markdown syntax isn't enough, JSX components can be used. Hopefully, this update will also make it much easier to contribute to the docs. Once this PR is merged, I'll implement auto-deployment via [Netlify](https://netlify.com) on a specific branch (to avoid building the website on every PR). There's a bunch of other cool stuff that the new setup will allow us to do – including writing front-end tests, service workers, offline support, implementing a search and so on.

This PR also includes various new docs pages and content.
Resolves #3270. Resolves #3222. Resolves #2947. Resolves #2837.


### Types of change
enhancement

## Checklist
<!--- Before you submit the PR, go over this checklist and make sure you can
tick off all the boxes. [] -> [x] -->
- [x] I have submitted the spaCy Contributor Agreement.
- [x] I ran the tests, and all new and existing tests passed.
- [x] My changes don't require a change to the documentation, or if they do, I've added all required information.

											
										
										
											2019-02-17 21:31:19 +03:00
 								### Quickstart {hidden="true"}
 								import QuickstartModels from 'widgets/quickstart-models.js'
-												Divide models into core and starters [ci skip]

											
										
										
											2019-12-21 16:10:22 +03:00
+								<QuickstartModels title="Quickstart" id="quickstart" description="Install a default model, get the code to load it from within spaCy and test it." />
-												💫 Update website (#3285)

<!--- Provide a general summary of your changes in the title. -->

## Description

The new website is implemented using [Gatsby](https://www.gatsbyjs.org) with [Remark](https://github.com/remarkjs/remark) and [MDX](https://mdxjs.com/). This allows authoring content in **straightforward Markdown** without the usual limitations. Standard elements can be overwritten with powerful [React](http://reactjs.org/) components and wherever Markdown syntax isn't enough, JSX components can be used. Hopefully, this update will also make it much easier to contribute to the docs. Once this PR is merged, I'll implement auto-deployment via [Netlify](https://netlify.com) on a specific branch (to avoid building the website on every PR). There's a bunch of other cool stuff that the new setup will allow us to do – including writing front-end tests, service workers, offline support, implementing a search and so on.

This PR also includes various new docs pages and content.
Resolves #3270. Resolves #3222. Resolves #2947. Resolves #2837.


### Types of change
enhancement

## Checklist
<!--- Before you submit the PR, go over this checklist and make sure you can
tick off all the boxes. [] -> [x] -->
- [x] I have submitted the spaCy Contributor Agreement.
- [x] I ran the tests, and all new and existing tests passed.
- [x] My changes don't require a change to the documentation, or if they do, I've added all required information.

											
										
										
											2019-02-17 21:31:19 +03:00
 								<Infobox title="📖 Installation and usage">
 								For more details on how to use models with spaCy, see the
 								[usage guide on models](/usage/models).
 								</Infobox>
 								## Model architecture {#architecture}
-												Divide models into core and starters [ci skip]

											
										
										
											2019-12-21 16:10:22 +03:00
+								spaCy v2.0 features new neural models for **tagging**, **parsing** and **entity
 								recognition**. The models have been designed and implemented from scratch
 								specifically for spaCy, to give you an unmatched balance of speed, size and
 								accuracy. A novel bloom embedding strategy with subword features is used to
 								support huge vocabularies in tiny tables. Convolutional layers with residual
 								connections, layer normalization and maxout non-linearity are used, giving much
 								better efficiency than the standard BiLSTM solution.
 								The parser and NER use an imitation learning objective to deliver **accuracy
 								in-line with the latest research systems**, even when evaluated from raw text.
 								With these innovations, spaCy v2.0's models are **10× smaller**, **20% more
 								accurate**, and **even cheaper to run** than the previous generation. The
 								current architecture hasn't been published yet, but in the meantime we prepared
 								a video that explains how the models work, with particular focus on NER.
-												💫 Update website (#3285)

<!--- Provide a general summary of your changes in the title. -->

## Description

The new website is implemented using [Gatsby](https://www.gatsbyjs.org) with [Remark](https://github.com/remarkjs/remark) and [MDX](https://mdxjs.com/). This allows authoring content in **straightforward Markdown** without the usual limitations. Standard elements can be overwritten with powerful [React](http://reactjs.org/) components and wherever Markdown syntax isn't enough, JSX components can be used. Hopefully, this update will also make it much easier to contribute to the docs. Once this PR is merged, I'll implement auto-deployment via [Netlify](https://netlify.com) on a specific branch (to avoid building the website on every PR). There's a bunch of other cool stuff that the new setup will allow us to do – including writing front-end tests, service workers, offline support, implementing a search and so on.

This PR also includes various new docs pages and content.
Resolves #3270. Resolves #3222. Resolves #2947. Resolves #2837.


### Types of change
enhancement

## Checklist
<!--- Before you submit the PR, go over this checklist and make sure you can
tick off all the boxes. [] -> [x] -->
- [x] I have submitted the spaCy Contributor Agreement.
- [x] I ran the tests, and all new and existing tests passed.
- [x] My changes don't require a change to the documentation, or if they do, I've added all required information.

											
										
										
											2019-02-17 21:31:19 +03:00
 								<YouTube id="sqDHBH9IjRU" />
 								The parsing model is a blend of recent results. The two recent inspirations have
 								been the work of Eli Klipperwasser and Yoav Goldberg at Bar Ilan[^1], and the
 								SyntaxNet team from Google. The foundation of the parser is still based on the
 								work of Joakim Nivre[^2], who introduced the transition-based framework[^3], the
 								arc-eager transition system, and the imitation learning objective. The model is
 								implemented using [Thinc](https://github.com/explosion/thinc), spaCy's machine
 								learning library. We first predict context-sensitive vectors for each word in
 								the input:
 								```python
 								(embed_lower | embed_prefix | embed_suffix | embed_shape)
 								    >> Maxout(token_width)
 								    >> convolution ** 4
 								```
 								This convolutional layer is shared between the tagger, parser and NER, and will
 								also be shared by the future neural lemmatizer. Because the parser shares these
 								layers with the tagger, the parser does not require tag features. I got this
 								trick from David Weiss's "Stack Combination" paper[^4].
 								To boost the representation, the tagger actually predicts a "super tag" with
 								POS, morphology and dependency label[^5]. The tagger predicts these supertags by
 								adding a softmax layer onto the convolutional layer – so, we're teaching the
 								convolutional layer to give us a representation that's one affine transform from
 								this informative lexical information. This is obviously good for the parser
 								(which backprops to the convolutions, too). The parser model makes a state
 								vector by concatenating the vector representations for its context tokens. The
 								current context tokens:
 								| Context tokens                                                                     | Description                                                                 |
 								| ---------------------------------------------------------------------------------- | --------------------------------------------------------------------------- |
 								| `S0`, `S1`, `S2`                                                                   | Top three words on the stack.                                               |
 								| `B0`, `B1`                                                                         | First two words of the buffer.                                              |
 								| `S0L1`, `S1L1`, `S2L1`, `B0L1`, `B1L1`<br />`S0L2`, `S1L2`, `S2L2`, `B0L2`, `B1L2` | Leftmost and second leftmost children of `S0`, `S1`, `S2`, `B0` and `B1`.   |
 								| `S0R1`, `S1R1`, `S2R1`, `B0R1`, `B1R1`<br />`S0R2`, `S1R2`, `S2R2`, `B0R2`, `B1R2` | Rightmost and second rightmost children of `S0`, `S1`, `S2`, `B0` and `B1`. |
 								This makes the state vector quite long: `13*T`, where `T` is the token vector
 								width (128 is working well). Fortunately, there's a way to structure the
 								computation to save some expense (and make it more GPU-friendly).
 								The parser typically visits `2*N` states for a sentence of length `N` (although
 								it may visit more, if it back-tracks with a non-monotonic transition[^4]). A
 								naive implementation would require `2*N (B, 13*T) @ (13*T, H)` matrix
 								multiplications for a batch of size `B`. We can instead perform one
 								`(B*N, T) @ (T, 13*H)` multiplication, to pre-compute the hidden weights for
 								each positional feature with respect to the words in the batch. (Note that our
 								token vectors come from the CNN — so we can't play this trick over the
 								vocabulary. That's how Stanford's NN parser[^3] works — and why its model is so
 								big.)
 								This pre-computation strategy allows a nice compromise between GPU-friendliness
 								and implementation simplicity. The CNN and the wide lower layer are computed on
 								the GPU, and then the precomputed hidden weights are moved to the CPU, before we
 								start the transition-based parsing process. This makes a lot of things much
 								easier. We don't have to worry about variable-length batch sizes, and we don't
 								have to implement the dynamic oracle in CUDA to train.
 								Currently the parser's loss function is multi-label log loss[^6], as the dynamic
 								oracle allows multiple states to be 0 cost. This is defined as follows, where
 								`gZ` is the sum of the scores assigned to gold classes:
 								```python
 								(exp(score) / Z) - (exp(score) / gZ)
 								```
 								<Infobox title="Bibliography">
 . [Simple and Accurate Dependency Parsing Using Bidirectional LSTM Feature Representations {#fn-1}](https://www.semanticscholar.org/paper/Simple-and-Accurate-Dependency-Parsing-Using-Bidir-Kiperwasser-Goldberg/3cf31ecb2724b5088783d7c96a5fc0d5604cbf41).
 								   Eliyahu Kiperwasser, Yoav Goldberg. (2016)
 . [A Dynamic Oracle for Arc-Eager Dependency Parsing {#fn-2}](https://www.semanticscholar.org/paper/A-Dynamic-Oracle-for-Arc-Eager-Dependency-Parsing-Goldberg-Nivre/22697256ec19ecc3e14fcfc63624a44cf9c22df4).
 								   Yoav Goldberg, Joakim Nivre (2012)
 . [Parsing English in 500 Lines of Python {#fn-3}](https://explosion.ai/blog/parsing-english-in-python).
 								   Matthew Honnibal (2013)
 . [Stack-propagation: Improved Representation Learning for Syntax {#fn-4}](https://www.semanticscholar.org/paper/Stack-propagation-Improved-Representation-Learning-Zhang-Weiss/0c133f79b23e8c680891d2e49a66f0e3d37f1466).
 								   Yuan Zhang, David Weiss (2016)
 . [Deep multi-task learning with low level tasks supervised at lower layers {#fn-5}](https://www.semanticscholar.org/paper/Deep-multi-task-learning-with-low-level-tasks-supe-S%C3%B8gaard-Goldberg/03ad06583c9721855ccd82c3d969a01360218d86).
 								   Anders Søgaard, Yoav Goldberg (2016)
 . [An Improved Non-monotonic Transition System for Dependency Parsing {#fn-6}](https://www.semanticscholar.org/paper/An-Improved-Non-monotonic-Transition-System-for-De-Honnibal-Johnson/4094cee47ade13b77b5ab4d2e6cb9dd2b8a2917c).
 								   Matthew Honnibal, Mark Johnson (2015)
 . [A Fast and Accurate Dependency Parser using Neural Networks {#fn-7}](http://cs.stanford.edu/people/danqi/papers/emnlp2014.pdf).
 								   Danqi Cheng, Christopher D. Manning (2014)
 . [Parsing the Wall Street Journal using a Lexical-Functional Grammar and Discriminative Estimation Techniques {#fn-8}](https://www.semanticscholar.org/paper/Parsing-the-Wall-Street-Journal-using-a-Lexical-Fu-Riezler-King/0ad07862a91cd59b7eb5de38267e47725a62b8b2).
 								   Stefan Riezler et al. (2002)
 								</Infobox>
 								## Model naming conventions {#conventions}
 								In general, spaCy expects all model packages to follow the naming convention of
 								`[lang`\_[name]]. For spaCy's models, we also chose to divide the name into
 								three components:
 . **Type:** Model capabilities (e.g. `core` for general-purpose model with
 								   vocabulary, syntax, entities and word vectors, or `depent` for only vocab,
 								   syntax and entities).
 . **Genre:** Type of text the model is trained on, e.g. `web` or `news`.
 . **Size:** Model size indicator, `sm`, `md` or `lg`.
 								For example, `en_core_web_sm` is a small English model trained on written web
 								text (blogs, news, comments), that includes vocabulary, vectors, syntax and
 								entities.
 								### Model versioning {#model-versioning}
 								Additionally, the model versioning reflects both the compatibility with spaCy,
 								as well as the major and minor model version. A model version `a.b.c` translates
 								to:
 								- `a`: **spaCy major version**. For example, `2` for spaCy v2.x.
 								- `b`: **Model major version**. Models with a different major version can't be
 								  loaded by the same code. For example, changing the width of the model, adding
 								  hidden layers or changing the activation changes the model major version.
 								- `c`: **Model minor version**. Same model structure, but different parameter
 								  values, e.g. from being trained on different data, for different numbers of
 								  iterations, etc.
 								For a detailed compatibility overview, see the
 								[`compatibility.json`](https://github.com/explosion/spacy-models/tree/master/compatibility.json)
 								in the models repository. This is also the source of spaCy's internal
 								compatibility check, performed when you run the [`download`](/api/cli#download)
 								command.