spaCy/website/usage/_v2/_summary.jade

//- 💫 DOCS > USAGE > WHAT'S NEW IN V2.0 > SUMMARY
p
| We're very excited to finally introduce spaCy v2.0! On this page, you'll
| find a summary of the new features and information on the backwards
| incompatibilities, including a handy overview of what's been renamed or
| deprecated. To help you make the most of v2.0, we also
| #[strong rewrote almost all of the usage guides and API docs], and added
| more #[+a("/usage/examples") real-world examples]. If you're new to
| spaCy, or just want to brush up on some NLP basics and the details of
| the library, check out the
| #[+a("/usage/spacy-101") spaCy 101 guide] that explains the most
| important concepts with examples and illustrations.
+h(2, "summary") Summary
+grid.o-no-block
+grid-col("half")
p
| This release features entirely new
| #[strong deep learning-powered models] for spaCy's tagger,
| parser and entity recognizer. The new models are
| #[strong 10× smaller], #[strong 20% more accurate] and
| #[strong even cheaper to run] than the previous generation.
p
| We've also made several usability improvements that are
| particularly helpful for #[strong production deployments].
| spaCy v2 now fully supports the Pickle protocol, making it
| easy to use spaCy with
| #[+a("https://spark.apache.org/") Apache Spark]. The
| string-to-integer mapping is #[strong no longer stateful],
| making it easy to reconcile annotations made in different
| processes. Models are smaller and use less memory, and the
| APIs for serialization are now much more consistent. Custom
| pipeline components let you modify the #[code Doc] at any
| stage in the pipeline. You can now also add your own
| custom attributes, properties and methods to the #[code Doc],
| #[code Token] and #[code Span].
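p
| As a rough illustration of why a stateless string-to-integer mapping
| makes annotations easier to reconcile, the sketch below compares an
| order-dependent counter with a hash of the string itself. The names
| #[code CounterIds] and #[code stateless_id] are hypothetical, and
| #[code hashlib.blake2b] is only a stand-in: this is not the hash
| function spaCy uses internally.

```python
import hashlib


def stateless_id(text):
    """Map a string to a 64-bit integer using nothing but the string itself.

    Assumption: blake2b is a stand-in hash chosen for this sketch only.
    """
    digest = hashlib.blake2b(text.encode("utf8"), digest_size=8).digest()
    return int.from_bytes(digest, "little")


class CounterIds:
    """Stateful scheme: each new string gets the next free integer,
    so the ID depends on the order in which strings were seen."""

    def __init__(self):
        self._ids = {}

    def get(self, text):
        return self._ids.setdefault(text, len(self._ids))


# Two simulated processes that see the same words in a different order.
proc_a, proc_b = CounterIds(), CounterIds()
for word in ["coffee", "tea"]:
    proc_a.get(word)
for word in ["tea", "coffee"]:
    proc_b.get(word)

# The counters disagree about "coffee", so their annotations can't be
# merged without exchanging the lookup tables first.
counter_ids_agree = proc_a.get("coffee") == proc_b.get("coffee")

# The hash depends only on the string, so both processes agree.
hash_ids_agree = stateless_id("coffee") == stateless_id("coffee")
```

p
| With the counter, each process would need the other's table to
| interpret its IDs. With the hash, an ID means the same thing wherever
| it was computed.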
+table-of-contents
+item #[+a("#summary") Summary]
+item #[+a("#features") New features]
+item #[+a("#features-models") Neural network models]
+item #[+a("#features-pipelines") Improved processing pipelines]
+item #[+a("#features-text-classification") Text classification]
+item #[+a("#features-hash-ids") Hash values as IDs]
+item #[+a("#features-vectors") Improved word vectors support]
+item #[+a("#features-serializer") Saving, loading and serialization]
+item #[+a("#features-displacy") displaCy visualizer]
+item #[+a("#features-language") Language data and lazy loading]
+item #[+a("#features-matcher") Revised matcher API and phrase matcher]
+item #[+a("#incompat") Backwards incompatibilities]
+item #[+a("#migrating") Migrating from spaCy v1.x]
+item #[+a("#benchmarks") Benchmarks]
p
| The main usability improvements you'll notice in spaCy v2.0 are around
| #[strong defining, training and loading your own models] and components.
| The new neural network models make it much easier to train a model from
| scratch, or update an existing model with a few examples. In v1.x, the
| statistical models depended on the state of the #[code Vocab]. If you
| taught the model a new word, you would have to save and load a lot of
| data — otherwise the model wouldn't correctly recall the features of your
| new example. That's no longer the case.
p
| Due to some clever use of hashing, the statistical models
| #[strong never change size], even as they learn new vocabulary items.
| The whole pipeline is also now fully differentiable. Even if you don't
| have explicitly annotated data, you can update spaCy using all the
| #[strong latest deep learning tricks] like adversarial training,
| noise-contrastive estimation or reinforcement learning.
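p
| To give an intuition for how hashing can keep a model at a fixed size,
| here is a minimal sketch of a hashed lookup table. It assumes a small
| table, several hash seeds per word and a toy update step. All names
| and sizes (#[code N_ROWS], #[code WIDTH], #[code SEEDS],
| #[code row_for], #[code embed], #[code update]) are made up for
| illustration. This is not spaCy's actual implementation.

```python
import hashlib

N_ROWS = 1000          # fixed number of rows (arbitrary value for this sketch)
WIDTH = 4              # length of each vector (also arbitrary)
SEEDS = (0, 1, 2, 3)   # one hash function per seed

# The table is allocated once and never grows.
table = [[0.0] * WIDTH for _ in range(N_ROWS)]


def row_for(text, seed):
    """Pick a row index from the string and a seed, with no lookup dict."""
    key = f"{seed}:{text}".encode("utf8")
    digest = hashlib.blake2b(key, digest_size=8).digest()
    return int.from_bytes(digest, "little") % N_ROWS


def embed(text):
    """Sum the rows chosen by each seed. Two words that collide under
    one seed are unlikely to collide under all of them."""
    vector = [0.0] * WIDTH
    for seed in SEEDS:
        row = table[row_for(text, seed)]
        vector = [v + r for v, r in zip(vector, row)]
    return vector


def update(text, step=0.1):
    """Toy "learning" step: nudge every row that the word maps to."""
    for seed in SEEDS:
        row = table[row_for(text, seed)]
        for i in range(WIDTH):
            row[i] += step


# Previously unseen words are handled without adding any rows.
for word in ["spaCy", "a-brand-new-word", "another-unseen-word"]:
    update(word)

rows_after = len(table)
vector = embed("a-brand-new-word")
```

p
| Because every string maps to existing rows, new vocabulary only
| changes the values in the table, never its shape. The trade-off is
| that unrelated words can share rows, which is why the sketch combines
| several seeds.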