Update v2 docs and add benchmarks stub

ines 2017-06-04 15:34:28 +02:00
parent 23fd6b1782
commit 468ff1a7dd


@@ -3,58 +3,69 @@
include ../../_includes/_mixins
p
| We're very excited to finally introduce spaCy v2.0. This release features
| entirely new deep learning-powered models for spaCy's tagger, parser and
| entity recognizer. The new models are #[strong 20x smaller] than the linear
| models that have powered spaCy until now: from 300mb to only 14mb. Speed
| and accuracy are currently comparable to the 1.x models: speed on CPU is
| slightly lower, while accuracy is slightly higher. We expect performance to
| improve quickly between now and the release date, as we run more experiments
| and optimize the implementation.
p
| The main usability improvements you'll notice in spaCy 2 are around
| defining, training and loading your own models and components. The new neural
| network models make it much easier to train a model from scratch, or update
| an existing model with a few examples. In v1, the statistical models depended
| on the state of the vocab. If you taught the model a new word, you would have
| to save and load a lot of data -- otherwise the model wouldn't correctly
| recall the features of your new example. That's no longer the case. Due to some
| clever use of hashing, the statistical models never change size, even as they
| learn new vocabulary items. The whole pipeline is also now fully differentiable,
| so even if you don't have explicitly annotated data, you can update spaCy using
| all the latest deep learning tricks: adversarial training, noise contrastive
| estimation, reinforcement learning, etc.
p
| Finally, we've made several usability improvements that are particularly helpful
| for production deployments. spaCy 2 now fully supports the Pickle protocol,
| making it easy to use spaCy with Apache Spark. The string-to-integer mapping is
| no longer stateful, making it easy to reconcile annotations made in different
| processes. Models are smaller and use less memory, and the APIs for serialization
| are now much more consistent.
p
| Because we've made so many architectural changes to the library, we've tried to
| keep breaking changes to a minimum. A lot of projects follow the philosophy that
| if you're going to break anything, you may as well break everything. We think
| migration is easier if there's a logic to what's changed. We've therefore followed
| a policy of avoiding breaking changes to the #[code Doc], #[code Span] and #[code Token]
| objects. This way, you can focus on only migrating the code that does training, loading
| and serialisation --- in other words, code that works with the #[code nlp] object directly.
| Code that uses the annotations should continue to work.
p
| On this page, you'll find a summary of the #[+a("#features") new features],
| information on the #[+a("#incompat") backwards incompatibilities],
| including a handy overview of what's been renamed or deprecated.
| To help you make the most of v2.0, we also
| We're very excited to finally introduce spaCy v2.0! On this page, you'll
| find a summary of the new features, information on the backwards
| incompatibilities, including a handy overview of what's been renamed or
| deprecated. To help you make the most of v2.0, we also
| #[strong re-wrote almost all of the usage guides and API docs], and added
| more real-world examples. If you're new to spaCy, or just want to brush
| up on some NLP basics and the details of the library, check out
| the #[+a("/docs/usage/spacy-101") spaCy 101 guide] that explains the most
| important concepts with examples and illustrations.
+h(2, "summary") Summary
+grid.o-no-block
+grid-col("half")
p This release features
| entirely new #[strong deep learning-powered models] for spaCy's tagger,
| parser and entity recognizer. The new models are #[strong 20x smaller]
| than the linear models that have powered spaCy until now: from 300 MB to
| only 14 MB.
p
| We've also made several usability improvements that are
| particularly helpful for #[strong production deployments]. spaCy
| v2 now fully supports the Pickle protocol, making it easy to use
| spaCy with #[+a("https://spark.apache.org/") Apache Spark] (see the
| sketch below). The
| string-to-integer mapping is #[strong no longer stateful], making
| it easy to reconcile annotations made in different processes.
| Models are smaller and use less memory, and the APIs for serialization
| are now much more consistent.
+table-of-contents
+item #[+a("#summary") Summary]
+item #[+a("#features") New features]
+item #[+a("#features-pipelines") Improved processing pipelines]
+item #[+a("#features-hash-ids") Hash values instead of integer IDs]
+item #[+a("#features-serializer") Saving, loading and serialization]
+item #[+a("#features-displacy") displaCy visualizer]
+item #[+a("#features-language") Language data and lazy loading]
+item #[+a("#features-matcher") Revised matcher API]
+item #[+a("#features-models") Neural network models]
+item #[+a("#incompat") Backwards incompatibilities]
+item #[+a("#migrating") Migrating from spaCy v1.x]
+item #[+a("#benchmarks") Benchmarks]
p
| The main usability improvements you'll notice in spaCy v2.0 are around
| #[strong defining, training and loading your own models] and components.
| The new neural network models make it much easier to train a model from
| scratch, or update an existing model with a few examples. In v1.x, the
| statistical models depended on the state of the #[code Vocab]. If you
| taught the model a new word, you would have to save and load a lot of
| data — otherwise the model wouldn't correctly recall the features of your
| new example. That's no longer the case.
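p
| For example, training a small entity recognizer from scratch roughly
| follows the loop sketched below. The training examples, labels and paths
| are placeholders, and the training docs cover the full API; treat this as
| a rough outline rather than a recipe.
+code("Training with a few examples (sketch)").
    import random
    import spacy

    # a couple of made-up examples with character-offset entity annotations
    TRAIN_DATA = [
        (u'Uber blew through $1 million a week', {'entities': [(0, 4, 'ORG')]}),
        (u'Google rebrands its business apps', {'entities': [(0, 6, 'ORG')]}),
    ]

    nlp = spacy.blank('en')        # start off with a blank English pipeline
    ner = nlp.create_pipe('ner')   # add an entity recognizer...
    nlp.add_pipe(ner)
    ner.add_label('ORG')           # ...and the label(s) you want to train

    optimizer = nlp.begin_training()
    for i in range(20):
        random.shuffle(TRAIN_DATA)
        for text, annotations in TRAIN_DATA:
            # update() accepts raw texts and annotation dicts directly
            nlp.update([text], [annotations], drop=0.5, sgd=optimizer)

    nlp.to_disk('/path/to/model')  # placeholder output directory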
p
| Due to some clever use of hashing, the statistical models
| #[strong never change size], even as they learn new vocabulary items.
| The whole pipeline is also now fully differentiable. Even if you don't
| have explicitly annotated data, you can update spaCy using all the
| #[strong latest deep learning tricks] like adversarial training, noise
| contrastive estimation or reinforcement learning.
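p
| To illustrate the hashing mentioned above: strings are mapped to
| deterministic hash values, so the model never has to grow to accommodate
| new vocabulary items. The sketch below assumes an installed English
| model; the model name is a placeholder.
+code("Hash values instead of integer IDs (sketch)").
    import spacy

    nlp = spacy.load('en')  # placeholder: any installed model
    doc = nlp(u'I love coffee')

    # the string store maps strings to hash values and back
    coffee_hash = nlp.vocab.strings[u'coffee']
    coffee_text = nlp.vocab.strings[coffee_hash]

    assert doc[2].orth == coffee_hash    # token.orth is the hash value
    assert doc[2].orth_ == coffee_text   # token.orth_ is the original string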
+h(2, "features") New features
p
@@ -334,19 +345,23 @@ p
+h(2, "migrating") Migrating from spaCy 1.x
p
| Because we've made so many architectural changes to the library, we've
| tried to #[strong keep breaking changes to a minimum]. A lot of projects
| follow the philosophy that if you're going to break anything, you may as
| well break everything. We think migration is easier if there's a logic to
| what has changed.
+infobox("Some tips")
| Before migrating, we strongly recommend writing a few
| #[strong simple tests] specific to how you're using spaCy in your
| application. This makes it easier to check whether your code requires
| changes, and if so, which parts are affected.
| (By the way, feel free to contribute your tests to
| #[+src(gh("spaCy", "spacy/tests")) our test suite]; this will also ensure
| we never accidentally introduce a bug in a workflow that's
| important to you.) If you've trained your own models, keep in mind that
| your train and runtime inputs must match. This means you'll have to
| #[strong retrain your models] with spaCy v2.0 to make them compatible.
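p
| A minimal version of such a pre-migration test might look like the
| sketch below. It assumes #[code pytest] and an installed English model,
| and the asserted values are only placeholders for whatever behaviour your
| own application depends on.
+code("test_spacy_usage.py (sketch)").
    import pytest
    import spacy

    @pytest.fixture(scope='session')
    def nlp():
        return spacy.load('en')  # placeholder: the model your app uses

    def test_tokenization(nlp):
        # placeholder assertion: pin down the tokenization you rely on
        doc = nlp(u'Hello, world!')
        assert [t.text for t in doc] == [u'Hello', u',', u'world', u'!']

    def test_entities(nlp):
        # placeholder assertion: check the entity types you care about
        doc = nlp(u'Apple is looking at buying U.K. startup for $1 billion')
        assert any(ent.label_ == u'ORG' for ent in doc.ents)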
p
| We've therefore followed a policy of avoiding breaking changes to the
| #[code Doc], #[code Span] and #[code Token] objects. This way, you can
| focus on only migrating the code that does training, loading and
| serialization — in other words, code that works with the #[code nlp]
| object directly. Code that uses the annotations should continue to work.
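p
| In other words, code along the following lines should behave the same
| under v1.x and v2.0, because it only reads annotations from the
| #[code Doc] (the model name is a placeholder):
+code("Annotation access stays the same (sketch)").
    import spacy

    nlp = spacy.load('en')  # placeholder: any installed model
    doc = nlp(u'Berlin is the capital of Germany.')

    # reading annotations off Doc, Span and Token objects is unchanged
    for token in doc:
        print(token.text, token.pos_, token.dep_, token.head.text)

    for ent in doc.ents:
        print(ent.text, ent.label_)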
+infobox("Important note")
| If you've trained your own models, keep in mind that your train and
| runtime inputs must match. This means you'll have to
| #[strong retrain your models] with spaCy v2.0.
+h(3, "migrating-saving-loading") Saving, loading and serialization
@@ -448,3 +463,21 @@ p
| the doc, the index of the current match and all total matches. This lets
| you both accept or reject the match, and define the actions to be
| triggered.
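p
| A rough sketch of how such a callback fits together is shown below. It
| assumes an existing #[code nlp] object, and the match key and token
| pattern are placeholders.
+code("Matcher callback (sketch)").
    from spacy.matcher import Matcher

    matcher = Matcher(nlp.vocab)  # assumes an existing nlp object

    def on_match(matcher, doc, i, matches):
        # called once per match with the doc, the index of the current
        # match and the full list of matches; accept, reject or act on
        # the match here
        match_id, start, end = matches[i]
        print('Matched:', doc[start:end].text)

    # 'HelloWorld' and the token pattern are placeholders
    matcher.add('HelloWorld', on_match, [{'LOWER': 'hello'}, {'LOWER': 'world'}])
    matches = matcher(nlp(u'Hello world!'))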
+h(2, "benchmarks") Benchmarks
+table(["Model", "Version", "Type", "UAS", "LAS", "NER F", "POS", "w/s"])
+row
+cell #[code en_core_web_sm]
for cell in ["2.0.0", "neural", "", "", "", "", ""]
+cell=cell
+row
+cell #[code es_dep_web_sm]
for cell in ["2.0.0", "neural", "", "", "", "", ""]
+cell=cell
+row("divider")
+cell #[code en_core_web_sm]
for cell in ["1.1.0", "linear", "", "", "", "", ""]
+cell=cell