This release features entirely new **deep learning-powered models** for spaCy's
tagger, parser and entity recognizer. The new models are **10× smaller**, **20%
more accurate** and **even cheaper to run** than the previous generation.
We've also made several usability improvements that are particularly helpful for
**production deployments**. spaCy v2 now fully supports the Pickle protocol,
making it easy to use spaCy with [Apache Spark](https://spark.apache.org/). The
string-to-integer mapping is **no longer stateful**, making it easy to reconcile
annotations made in different processes. Models are smaller and use less memory,
and the APIs for serialization are now much more consistent. Custom pipeline
components let you modify the `Doc` at any stage in the pipeline. You can now
also add your own custom attributes, properties and methods to the `Doc`,
`Token` and `Span`.
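
To illustrate why a stateless string-to-integer mapping makes annotations easier to reconcile, here is a minimal sketch that uses only the standard library. The names `string_id` and `CounterStore` are hypothetical, and `hashlib.blake2b` is only a stand-in for whatever hash function spaCy uses internally. The point is that a hash-derived ID depends on nothing but the string itself, whereas a counter-based ID depends on the order in which a process happened to see the strings.

```python
import hashlib


def string_id(text: str) -> int:
    """Derive a 64-bit ID from the string's bytes alone (illustrative stand-in)."""
    digest = hashlib.blake2b(text.encode("utf8"), digest_size=8).digest()
    return int.from_bytes(digest, "little")


class CounterStore:
    """Hypothetical stateful mapping: IDs are handed out in insertion order."""

    def __init__(self):
        self._ids = {}

    def add(self, text: str) -> int:
        # The ID is the number of strings seen so far, so it depends on history.
        return self._ids.setdefault(text, len(self._ids))


# Two independent "processes" encounter the same words in a different order.
proc_a, proc_b = CounterStore(), CounterStore()
for word in ["coffee", "tea"]:
    proc_a.add(word)
for word in ["tea", "coffee"]:
    proc_b.add(word)

# Counter-based IDs disagree between the two processes.
stateful_agree = proc_a.add("coffee") == proc_b.add("coffee")

# Hash-based IDs need no shared state: any process computes the same value.
hashed_in_a = [string_id(w) for w in ["coffee", "tea"]]
hashed_in_b = [string_id(w) for w in ["tea", "coffee"]]
stateless_agree = hashed_in_a[0] == hashed_in_b[1]
```

With the counter-based store, the two processes would need to exchange their tables before their annotations could be merged. With hashed IDs, each process only needs its own reverse lookup table from ID to string.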

- [Summary](#summary)
- [New features](#features)
  - [Neural network models](#features-models)
  - [Improved processing pipelines](#features-pipelines)
  - [Text classification](#features-text-classification)
  - [Hash values as IDs](#features-hash-ids)
  - [Improved word vectors support](#features-vectors)
  - [Saving, loading and serialization](#features-serializer)
  - [displaCy visualizer](#features-displacy)
  - [Language data and lazy loading](#features-language)
  - [Revised matcher API and phrase matcher](#features-matcher)
- [Backwards incompatibilities](#incompat)
- [Migrating from spaCy v1.x](#migrating)

The main usability improvements you'll notice in spaCy v2.0 are around
**defining, training and loading your own models** and components. The new
neural network models make it much easier to train a model from scratch, or
update an existing model with a few examples. In v1.x, the statistical models
depended on the state of the `Vocab`. If you taught the model a new word, you
would have to save and load a lot of data; otherwise, the model wouldn't
correctly recall the features of your new example. That's no longer the case.
Due to some clever use of hashing, the statistical models **never change size**,
even as they learn new vocabulary items. The whole pipeline is also now fully
differentiable. Even if you don't have explicitly annotated data, you can update
spaCy using all the **latest deep learning tricks** like adversarial training,
noise contrastive estimation or reinforcement learning.
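
The idea that hashing lets a model keep a constant size while it learns new vocabulary can be sketched as follows. This is a simplified, standard-library illustration of the general "hashing trick", not spaCy's actual implementation. The class `HashedTable`, the helper `row_for`, the table dimensions and the use of `hashlib.blake2b` are all assumptions made for the example.

```python
import hashlib


def row_for(word: str, seed: int, n_rows: int) -> int:
    """Hash a word (plus a seed) to a row index in a fixed-size table."""
    key = f"{seed}:{word}".encode("utf8")
    digest = hashlib.blake2b(key, digest_size=8).digest()
    return int.from_bytes(digest, "little") % n_rows


class HashedTable:
    """Fixed-size parameter table addressed by hashed keys (illustrative)."""

    def __init__(self, n_rows: int = 1000, width: int = 4, n_seeds: int = 2):
        self.width = width
        self.n_seeds = n_seeds
        self.rows = [[0.0] * width for _ in range(n_rows)]

    def _indices(self, word: str):
        # Several seeds per word reduce the impact of any single collision.
        return [row_for(word, seed, len(self.rows)) for seed in range(self.n_seeds)]

    def lookup(self, word: str):
        # A word's vector is the sum of the rows its hashes select.
        vec = [0.0] * self.width
        for i in self._indices(word):
            for j, value in enumerate(self.rows[i]):
                vec[j] += value
        return vec

    def update(self, word: str, delta):
        # "Learning" a word adjusts existing rows; no new rows are allocated.
        for i in self._indices(word):
            for j, d in enumerate(delta):
                self.rows[i][j] += d


table = HashedTable()
size_before = len(table.rows)

# Teach the table words it has never seen before.
for new_word in ["zyzzyva", "hygge", "spaCy"]:
    table.update(new_word, [0.1] * table.width)

size_after = len(table.rows)
learned_vector = table.lookup("zyzzyva")
```

Because every word, seen or unseen, maps onto the same fixed set of rows, there is no per-word entry to save and reload. The trade-off is that unrelated words can collide on a row, which is why the sketch sums over more than one hash per word.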

## New features {#features}

This section contains an overview of the most important **new features and
improvements**. The [API docs](/api) include additional deprecation notes. New
methods and functions that were introduced in this version are marked with the
tag