Update 101 and usage docs

2025-11-03 09:27:56 +03:00 · 2017-05-28 00:03:16 +02:00 · b03fb2d7b0
commit b03fb2d7b0
parent 49235017bf
4 changed files with 7 additions and 3 deletions
--- a/website/assets/img/docs/pipeline.svg
+++ b/website/assets/img/docs/pipeline.svg
@@ -2,7 +2,7 @@
    <style>
        .svg__pipeline__text { fill: #1a1e23; font: 20px "Source Sans Pro" }
        .svg__pipeline__text-small { fill: #1a1e23; font: bold 18px "Source Sans Pro" }
-        .svg__pipeline__text-code {  fill: #1a1e23; font: 600 16px "Source Code Pro" }
+        .svg__pipeline__text-code { fill: #1a1e23; font: 600 16px "Source Code Pro" }
    </style>
    <rect width="601" height="127" x="159" y="21" fill="none" stroke="#09a3d5" stroke-width="3" rx="19.1" stroke-dasharray="3 6" ry="19.1"/>
    <path fill="#e1d5e7" stroke="#9673a6" stroke-width="2" d="M801 55h120v60H801z"/>
--- a/website/docs/usage/_spacy-101/_vocab-stringstore.jade
+++ b/website/docs/usage/_spacy-101/_vocab-stringstore.jade
@@ -89,4 +89,6 @@ p

 p
    |  Even though both #[code Doc] objects contain the same words, the internal
-    |  integer IDs are very different.
+    |  integer IDs are very different. The same applies for all other strings,
+    |  like the annotation scheme. To avoid mismatched IDs, spaCy will always
+    |  export the vocab if you save a #[code Doc] or #[code nlp] object.
--- a/website/docs/usage/lightning-tour.jade
+++ b/website/docs/usage/lightning-tour.jade
@@ -139,6 +139,8 @@ p
    new_doc = Doc(Vocab()).from_disk('/moby_dick.bin')

 +infobox
+    |  #[strong API:] #[+api("language") #[code Language]],
+    |  #[+api("doc") #[code Doc]]
    |  #[strong Usage:] #[+a("/docs/usage/saving-loading") Saving and loading]

 +h(2, "rule-matcher") Match text with token rules
--- a/website/docs/usage/rule-based-matching.jade
+++ b/website/docs/usage/rule-based-matching.jade
@@ -345,7 +345,7 @@ p
    |  account and check the #[code subtree] for intensifiers like "very", to
    |  increase the sentiment score. At some point, you might also want to train
    |  a sentiment model. However, the approach described in this example is
-    |  very useful for #[strong bootstrapping rules to gather training data].
+    |  very useful for #[strong bootstrapping rules to collect training data].
    |  It's also an incredibly fast way to gather first insights into your data
    |  – with about 1 million tweets, you'd be looking at a processing time of
    |  #[strong under 1 minute].