Update docs and change integer IDs to hash values

2025-08-02 03:10:22 +03:00 · 2017-05-28 19:25:34 +02:00 · 2017-05-28 19:25:34 +02:00 · c7b57ea314
commit c7b57ea314
parent 738b4f7187
7 changed files with 13 additions and 10 deletions
--- a/website/docs/api/span.jade
+++ b/website/docs/api/span.jade
@@ -355,7 +355,7 @@ p
    +row
        +cell #[code ent_id]
        +cell int
-        +cell The integer ID of the named entity the token is an instance of.
+        +cell The hash value of the named entity the token is an instance of.

    +row
        +cell #[code ent_id_]
--- a/website/docs/api/token.jade
+++ b/website/docs/api/token.jade
@@ -397,13 +397,15 @@ p The L2 norm of the token's vector representation.
    +row
        +cell #[code shape_]
        +cell unicode
+        +cell
            |  Transform of the tokens's string, to show orthographic features.
            |  For example, "Xxxx" or "dd".

    +row
        +cell #[code prefix]
        +cell int
-        +cell Integer ID of a length-N substring from the start of the
+        +cell
+            |  Hash value of a length-N substring from the start of the
            |  token. Defaults to #[code N=1].

    +row
@@ -417,7 +419,8 @@ p The L2 norm of the token's vector representation.
        +cell #[code suffix]
        +cell int
        +cell
-            |  Length-N substring from the end of the token. Defaults to #[code N=3].
+            |  Hash value of a length-N substring from the end of the token.
+            |  Defaults to #[code N=3].

    +row
        +cell #[code suffix_]
--- a/website/docs/api/vocab.jade
+++ b/website/docs/api/vocab.jade
@@ -36,7 +36,7 @@ p Create the vocabulary.
        +cell #[code strings]
        +cell #[code StringStore]
        +cell
-            |  A #[code StringStore] that maps strings to integers, and vice
+            |  A #[code StringStore] that maps strings to hash values, and vice
            |  versa.

    +footrow
@@ -74,7 +74,7 @@ p
    +row
        +cell #[code id_or_string]
        +cell int / unicode
-        +cell The integer ID of a word, or its unicode string.
+        +cell The hash value of a word, or its unicode string.

    +footrow
        +cell returns
--- a/website/docs/usage/_spacy-101/_pos-deps.jade
+++ b/website/docs/usage/_spacy-101/_pos-deps.jade
@@ -12,7 +12,7 @@ p
 p
    |  Linguistic annotations are available as
    |  #[+api("token#attributes") #[code Token] attributes]. Like many NLP
-    |  libraries, spaCy #[strong encodes all strings to integers] to reduce
+    |  libraries, spaCy #[strong encodes all strings to hash values] to reduce
    |  memory usage and improve efficiency. So to get the readable string
    |  representation of an attribute, we need to add an underscore #[code _]
    |  to its name:
--- a/website/docs/usage/_spacy-101/_serialization.jade
+++ b/website/docs/usage/_spacy-101/_serialization.jade
@@ -43,7 +43,7 @@ p
 +aside("Why saving the vocab?")
    |  Saving the vocabulary with the #[code Doc] is important, because the
    |  #[code Vocab] holds the context-independent information about the words,
-    |  tags and labels, and their #[strong integer IDs]. If the #[code Vocab]
+    |  tags and labels, and their #[strong hash values]. If the #[code Vocab]
    |  wasn't saved with the #[code Doc], spaCy wouldn't know how to resolve
    |  those IDs – for example, the word text or the dependency labels. You
    |  might be saving #[code 446] for "whale", but in a different vocabulary,
--- a/website/docs/usage/dependency-parse.jade
+++ b/website/docs/usage/dependency-parse.jade
@@ -48,7 +48,7 @@ p
    |  #[strong connected by a single arc] in the dependency tree. The term
    |  #[strong dep] is used for the arc label, which describes the type of
    |  syntactic relation that connects the child to the head. As with other
-    |  attributes, the value of #[code .dep] is an integer. You can get
+    |  attributes, the value of #[code .dep] is a hash value. You can get
    |  the string value with #[code .dep_].

 +code("Example").
--- a/website/docs/usage/entity-recognition.jade
+++ b/website/docs/usage/entity-recognition.jade
@@ -20,7 +20,7 @@ p
    |  The standard way to access entity annotations is the
    |  #[+api("doc#ents") #[code doc.ents]] property, which produces a sequence
    |  of #[+api("span") #[code Span]] objects. The entity type is accessible
-    |  either as an integer ID or as a string, using the attributes
+    |  either as a hash value or as a string, using the attributes
    |  #[code ent.label] and #[code ent.label_]. The #[code Span] object acts
    |  as a sequence of tokens, so you can iterate over the entity or index into
    |  it. You can also get the text form of the whole entity, as though it were
@@ -78,7 +78,7 @@ p
    doc = nlp(u'Netflix is hiring a new VP of global policy')
    # the model didn't recognise any entities :(

-    ORG = doc.vocab.strings[u'ORG'] # get integer ID of entity label
+    ORG = doc.vocab.strings[u'ORG'] # get hash value of entity label
    netflix_ent = Span(doc, 0, 1, label=ORG) # create a Span for the new entity
    doc.ents = [netflix_ent]