This release features entirely new **deep learning-powered models** for spaCy's
tagger, parser and entity recognizer. The new models are **10× smaller**, **20%
more accurate** and **even cheaper to run** than the previous generation.
We've also made several usability improvements that are particularly helpful for
**production deployments**. spaCy v2.0 now fully supports the Pickle protocol,
making it easy to use spaCy with [Apache Spark](https://spark.apache.org/). The
string-to-integer mapping is **no longer stateful**, making it easy to reconcile
annotations made in different processes. Models are smaller and use less memory,
and the APIs for serialization are now much more consistent. Custom pipeline
components let you modify the `Doc` at any stage in the pipeline. You can now
also add your own custom attributes, properties and methods to the `Doc`,
`Token` and `Span`.
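
To see why a stateless string-to-integer mapping makes annotations easier to reconcile, here is a minimal, library-free sketch. It is not spaCy's implementation: `stable_id` and `CounterVocab` are hypothetical names, and `hashlib.blake2b` stands in for whatever hash function spaCy uses internally.

```python
import hashlib


def stable_id(text: str) -> int:
    """Map a string to a 64-bit integer that depends only on the string."""
    # Assumption: blake2b is a stand-in hash chosen for this illustration.
    digest = hashlib.blake2b(text.encode("utf8"), digest_size=8).digest()
    return int.from_bytes(digest, "little")


class CounterVocab:
    """Stateful scheme: IDs are handed out in order of first appearance."""

    def __init__(self):
        self.ids = {}

    def get_id(self, text: str) -> int:
        # len(self.ids) is evaluated before the new key is inserted.
        return self.ids.setdefault(text, len(self.ids))


# Two "processes" that encounter the same words in a different order.
process_a, process_b = CounterVocab(), CounterVocab()
for word in ["coffee", "tea"]:
    process_a.get_id(word)
for word in ["tea", "coffee"]:
    process_b.get_id(word)

# Counter-based IDs depend on each process's history, so they disagree.
counter_ids_match = process_a.get_id("coffee") == process_b.get_id("coffee")

# Hash-based IDs are computed independently in each process and still agree.
hash_ids_a = {word: stable_id(word) for word in ["coffee", "tea"]}
hash_ids_b = {word: stable_id(word) for word in ["tea", "coffee"]}
hash_ids_match = hash_ids_a["coffee"] == hash_ids_b["coffee"]
```

In this sketch `counter_ids_match` ends up `False` and `hash_ids_match` ends up `True`, which is the property that lets separately produced annotations be merged without a shared lookup table.
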

- [Summary](#summary)
- [New features](#features)
  - [Neural network models](#features-models)
  - [Improved processing pipelines](#features-pipelines)
  - [Text classification](#features-text-classification)
  - [Hash values as IDs](#features-hash-ids)
  - [Improved word vectors support](#features-vectors)
  - [Saving, loading and serialization](#features-serializer)
  - [displaCy visualizer](#features-displacy)
  - [Language data and lazy loading](#features-language)
  - [Revised matcher API and phrase matcher](#features-matcher)
- [Backwards incompatibilities](#incompat)
- [Migrating from spaCy v1.x](#migrating)

The main usability improvements you'll notice in spaCy v2.0 are around
**defining, training and loading your own models** and components. The new
neural network models make it much easier to train a model from scratch, or
update an existing model with a few examples. In v1.x, the statistical models
depended on the state of the `Vocab`. If you taught the model a new word, you
had to save and load a lot of data; otherwise, the model wouldn't correctly
recall the features of your new example. That's no longer the case. Thanks to
the use of hashing, the statistical models **never change size**, even as they
learn new vocabulary items. The whole pipeline is also now fully
differentiable, so even if you don't have explicitly annotated data, you can
update spaCy using **recent deep learning techniques** such as adversarial
training, noise contrastive estimation or reinforcement learning.
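
The "never change size" property can be illustrated with the generic hashing trick. The following is a rough sketch under stated assumptions, not spaCy's actual model code: `N_ROWS`, `WIDTH`, `row_for` and `update` are hypothetical names, and `hashlib.blake2b` is an arbitrary stand-in hash.

```python
import hashlib

N_ROWS = 1000  # fixed table size, chosen arbitrarily for this sketch
WIDTH = 4      # length of each weight vector


def row_for(feature: str) -> int:
    """Hash a feature string to one of the N_ROWS rows."""
    digest = hashlib.blake2b(feature.encode("utf8"), digest_size=8).digest()
    return int.from_bytes(digest, "little") % N_ROWS


# Weights are allocated once; no per-word entries are ever added.
table = [[0.0] * WIDTH for _ in range(N_ROWS)]


def update(feature: str, gradient: list, learn_rate: float = 0.1) -> None:
    """Apply one gradient step to whichever row the feature hashes to."""
    row = table[row_for(feature)]
    for i, g in enumerate(gradient):
        row[i] -= learn_rate * g


size_before = len(table)
# "Teach" the table a word it has never seen before.
update("brand-new-word", [1.0, -1.0, 0.5, 0.0])
size_after = len(table)
```

Because every string maps to some existing row, learning a new word changes weights but never adds rows, so `size_before` and `size_after` are equal. The trade-off in this simplified version is that unrelated words can collide on the same row.
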
## New features {#features}
This section contains an overview of the most important **new features and
improvements**. The [API docs](/api) include additional deprecation notes.
### Convolutional neural network models {#features-models}
> #### Example
>
> ```bash
> python -m spacy download en_core_web_sm
> python -m spacy download de_core_news_sm
> python -m spacy download xx_ent_wiki_sm
> ```

spaCy v2.0 features new neural models for tagging, parsing and entity
recognition. The models have been designed and implemented from scratch
specifically for spaCy, to give you an unmatched balance of speed, size and
accuracy. The new models are **10× smaller**, **20% more accurate**, and **even
cheaper to run** than the previous generation.

spaCy v2.0's new neural network models bring significant improvements in
accuracy, especially for English Named Entity Recognition. The new
[`en_core_web_lg`](/models/en#en_core_web_lg) model makes about **25% fewer
mistakes** than the corresponding v1.x model and is within **1% of the current
state-of-the-art**
([Strubell et al., 2017](https://arxiv.org/pdf/1702.02098.pdf)). The v2.0 models
are also cheaper to run at scale, as they require **under 1 GB of memory** per
process.