diff --git a/website/docs/api/span.jade b/website/docs/api/span.jade
index 25083c694..542336714 100644
--- a/website/docs/api/span.jade
+++ b/website/docs/api/span.jade
@@ -355,7 +355,7 @@ p
     +row
         +cell #[code ent_id]
         +cell int
-        +cell The integer ID of the named entity the token is an instance of.
+        +cell The hash value of the named entity the token is an instance of.
 
     +row
         +cell #[code ent_id_]
diff --git a/website/docs/api/token.jade b/website/docs/api/token.jade
index ee989047c..87387e09d 100644
--- a/website/docs/api/token.jade
+++ b/website/docs/api/token.jade
@@ -397,13 +397,15 @@ p The L2 norm of the token's vector representation.
     +row
         +cell #[code shape_]
         +cell unicode
+        +cell
             | Transform of the tokens's string, to show orthographic features.
             | For example, "Xxxx" or "dd".
 
     +row
         +cell #[code prefix]
         +cell int
-        +cell Integer ID of a length-N substring from the start of the
+        +cell
+            | Hash value of a length-N substring from the start of the
             | token. Defaults to #[code N=1].
 
     +row
@@ -417,7 +419,8 @@ p The L2 norm of the token's vector representation.
         +cell #[code suffix]
         +cell int
         +cell
-            | Length-N substring from the end of the token. Defaults to #[code N=3].
+            | Hash value of a length-N substring from the end of the token.
+            | Defaults to #[code N=3].
 
     +row
         +cell #[code suffix_]
diff --git a/website/docs/api/vocab.jade b/website/docs/api/vocab.jade
index 277fed5d3..ce62612d3 100644
--- a/website/docs/api/vocab.jade
+++ b/website/docs/api/vocab.jade
@@ -36,7 +36,7 @@ p Create the vocabulary.
         +cell #[code strings]
         +cell #[code StringStore]
         +cell
-            | A #[code StringStore] that maps strings to integers, and vice
+            | A #[code StringStore] that maps strings to hash values, and vice
             | versa.
 
     +footrow
@@ -74,7 +74,7 @@ p
     +row
         +cell #[code id_or_string]
         +cell int / unicode
-        +cell The integer ID of a word, or its unicode string.
+        +cell The hash value of a word, or its unicode string.
 
     +footrow
         +cell returns
diff --git a/website/docs/usage/_spacy-101/_pos-deps.jade b/website/docs/usage/_spacy-101/_pos-deps.jade
index b42847aee..52a7fdd3c 100644
--- a/website/docs/usage/_spacy-101/_pos-deps.jade
+++ b/website/docs/usage/_spacy-101/_pos-deps.jade
@@ -12,7 +12,7 @@ p
 p
     | Linguistic annotations are available as
     | #[+api("token#attributes") #[code Token] attributes]. Like many NLP
-    | libraries, spaCy #[strong encodes all strings to integers] to reduce
+    | libraries, spaCy #[strong encodes all strings to hash values] to reduce
     | memory usage and improve efficiency. So to get the readable string
     | representation of an attribute, we need to add an underscore #[code _]
     | to its name:
diff --git a/website/docs/usage/_spacy-101/_serialization.jade b/website/docs/usage/_spacy-101/_serialization.jade
index a763f422b..5620a6151 100644
--- a/website/docs/usage/_spacy-101/_serialization.jade
+++ b/website/docs/usage/_spacy-101/_serialization.jade
@@ -43,7 +43,7 @@ p
 +aside("Why saving the vocab?")
     | Saving the vocabulary with the #[code Doc] is important, because the
     | #[code Vocab] holds the context-independent information about the words,
-    | tags and labels, and their #[strong integer IDs]. If the #[code Vocab]
+    | tags and labels, and their #[strong hash values]. If the #[code Vocab]
     | wasn't saved with the #[code Doc], spaCy wouldn't know how to resolve
     | those IDs – for example, the word text or the dependency labels. You
     | might be saving #[code 446] for "whale", but in a different vocabulary,
diff --git a/website/docs/usage/dependency-parse.jade b/website/docs/usage/dependency-parse.jade
index 683991d95..beae36578 100644
--- a/website/docs/usage/dependency-parse.jade
+++ b/website/docs/usage/dependency-parse.jade
@@ -48,7 +48,7 @@ p
     | #[strong connected by a single arc] in the dependency tree. The term
     | #[strong dep] is used for the arc label, which describes the type of
     | syntactic relation that connects the child to the head. As with other
-    | attributes, the value of #[code .dep] is an integer. You can get
+    | attributes, the value of #[code .dep] is a hash value. You can get
     | the string value with #[code .dep_].
 
 +code("Example").
diff --git a/website/docs/usage/entity-recognition.jade b/website/docs/usage/entity-recognition.jade
index 0155cf2e4..f9bfd4df9 100644
--- a/website/docs/usage/entity-recognition.jade
+++ b/website/docs/usage/entity-recognition.jade
@@ -20,7 +20,7 @@ p
     | The standard way to access entity annotations is the
     | #[+api("doc#ents") #[code doc.ents]] property, which produces a sequence
     | of #[+api("span") #[code Span]] objects. The entity type is accessible
-    | either as an integer ID or as a string, using the attributes
+    | either as a hash value or as a string, using the attributes
     | #[code ent.label] and #[code ent.label_]. The #[code Span] object acts
     | as a sequence of tokens, so you can iterate over the entity or index into
     | it. You can also get the text form of the whole entity, as though it were
@@ -78,7 +78,7 @@ p
     doc = nlp(u'Netflix is hiring a new VP of global policy')
     # the model didn't recognise any entities :(
 
-    ORG = doc.vocab.strings[u'ORG'] # get integer ID of entity label
+    ORG = doc.vocab.strings[u'ORG'] # get hash value of entity label
     netflix_ent = Span(doc, 0, 1, label=ORG) # create a Span for the new entity
     doc.ents = [netflix_ent]