Update 101 and usage docs

2025-07-15 18:52:29 +03:00 · 2017-05-28 00:03:16 +02:00 · 2017-05-28 00:03:16 +02:00 · b03fb2d7b0
commit b03fb2d7b0
parent 49235017bf
4 changed files with 7 additions and 3 deletions
--- a/website/docs/usage/_spacy-101/_vocab-stringstore.jade
+++ b/website/docs/usage/_spacy-101/_vocab-stringstore.jade
@@ -89,4 +89,6 @@ p

 p
    |  Even though both #[code Doc] objects contain the same words, the internal
-    |  integer IDs are very different.
+    |  integer IDs are very different. The same applies for all other strings,
+    |  like the annotation scheme. To avoid mismatched IDs, spaCy will always
+    |  export the vocab if you save a #[code Doc] or #[code nlp] object.
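To make the ID mismatch described in this hunk concrete, here is a minimal sketch in plain Python. It assumes each store hands out integer IDs in the order it first sees a string. `ToyStringStore`, `make_doc`, `save` and `load` are hypothetical illustrations, not spaCy's `StringStore` or serialization API.

```python
# Hypothetical toy string store, NOT spaCy's StringStore. IDs are assigned
# in insertion order, so two stores can give the same word different IDs.
class ToyStringStore:
    def __init__(self):
        self._ids = {}
        self._strings = []

    def add(self, string):
        # Return the existing ID, or assign the next free integer.
        if string not in self._ids:
            self._ids[string] = len(self._strings)
            self._strings.append(string)
        return self._ids[string]

    def __getitem__(self, key):
        # Look up a string by integer ID, or an ID by string.
        if isinstance(key, int):
            return self._strings[key]
        return self._ids[key]

    def to_list(self):
        return list(self._strings)


def make_doc(store, words):
    # A toy "doc" is just the list of integer IDs for its words.
    return [store.add(w) for w in words]


def save(store, ids):
    # Export the string table together with the IDs that refer to it.
    return {"strings": store.to_list(), "ids": list(ids)}


def load(data):
    # Rebuild a store from the exported strings, preserving the IDs.
    store = ToyStringStore()
    for s in data["strings"]:
        store.add(s)
    return store, data["ids"]


# store_a has already seen other strings; store_b starts empty.
store_a = ToyStringStore()
for w in ["the", "a", "whale", "coffee"]:
    store_a.add(w)
store_b = ToyStringStore()

words = ["I", "like", "coffee"]
ids_a = make_doc(store_a, words)
ids_b = make_doc(store_b, words)

# Decoding IDs with the wrong store silently produces the wrong words.
wrong = [store_a[i] for i in ids_b]

# Shipping the string table with the IDs restores the original text.
store_c, loaded_ids = load(save(store_a, ids_a))
restored = [store_c[i] for i in loaded_ids]
```

Here `ids_a` is `[4, 5, 3]` and `ids_b` is `[0, 1, 2]`, so decoding `ids_b` with `store_a` yields `['the', 'a', 'whale']`, while the saved-and-reloaded version yields `['I', 'like', 'coffee']`. This is the same reason for exporting the vocab alongside a saved `Doc` or `nlp` object.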
--- a/website/docs/usage/lightning-tour.jade
+++ b/website/docs/usage/lightning-tour.jade
@@ -139,6 +139,8 @@ p
    new_doc = Doc(Vocab()).from_disk('/moby_dick.bin')

 +infobox
+    |  #[strong API:] #[+api("language") #[code Language]],
+    |  #[+api("doc") #[code Doc]]
    |  #[strong Usage:] #[+a("/docs/usage/saving-loading") Saving and loading]

 +h(2, "rule-matcher") Match text with token rules
--- a/website/docs/usage/rule-based-matching.jade
+++ b/website/docs/usage/rule-based-matching.jade
@@ -345,7 +345,7 @@ p
    |  account and check the #[code subtree] for intensifiers like "very", to
    |  increase the sentiment score. At some point, you might also want to train
    |  a sentiment model. However, the approach described in this example is
-    |  very useful for #[strong bootstrapping rules to gather training data].
+    |  very useful for #[strong bootstrapping rules to collect training data].
    |  It's also an incredibly fast way to gather first insights into your data
    |  – with about 1 million tweets, you'd be looking at a processing time of
    |  #[strong under 1 minute].
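The bootstrapping idea described in this hunk (score texts with rules, boost the score when an intensifier like "very" is attached, and keep only confident cases as training data) can be sketched in plain Python. This is not spaCy's `Matcher` API: the word lists, the `score` and `bootstrap` helpers, and the `1.5` threshold are hypothetical, tokenization is a simple whitespace split, and checking the previous token for an intensifier is a flat simplification of inspecting a token's dependency `subtree`.

```python
# Hypothetical polarity lexicon and intensifiers, for illustration only.
POSITIVE = {"happy": 1.0, "great": 1.0, "love": 1.0}
NEGATIVE = {"sad": -1.0, "awful": -1.0, "hate": -1.0}
INTENSIFIERS = {"very", "really", "so"}


def score(tokens):
    # Sum the polarity of sentiment words; double the weight when the
    # preceding token is an intensifier (a flat stand-in for checking
    # the word's dependency subtree).
    total = 0.0
    for i, tok in enumerate(tokens):
        word = tok.lower()
        polarity = POSITIVE.get(word, NEGATIVE.get(word, 0.0))
        if polarity and i > 0 and tokens[i - 1].lower() in INTENSIFIERS:
            polarity *= 2
        total += polarity
    return total


def bootstrap(texts, threshold=1.5):
    # Keep only texts the rules score confidently; these become
    # candidate training examples for a statistical model.
    examples = []
    for text in texts:
        s = score(text.split())
        if s >= threshold:
            examples.append((text, "POSITIVE"))
        elif s <= -threshold:
            examples.append((text, "NEGATIVE"))
    return examples


texts = [
    "I am very happy today",
    "This is awful",
    "so sad and I hate it",
    "Just a normal day",
]
training_data = bootstrap(texts)
```

With these inputs, "This is awful" scores only `-1.0` and is skipped as not confident enough, so `training_data` keeps just the intensified positive example and the strongly negative one. Discarding borderline cases trades recall for cleaner labels, which is usually what you want when the output will be used to train a model.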