Mirror of https://github.com/explosion/spaCy.git (synced 2025-01-13 10:46:29 +03:00)
Update docs and change integer IDs to hash values
parent 738b4f7187
commit c7b57ea314
@@ -355,7 +355,7 @@ p
 +row
 +cell #[code ent_id]
 +cell int
-+cell The integer ID of the named entity the token is an instance of.
++cell The hash value of the named entity the token is an instance of.
 
 +row
 +cell #[code ent_id_]
@@ -397,13 +397,15 @@ p The L2 norm of the token's vector representation.
 +row
 +cell #[code shape_]
 +cell unicode
++cell
 | Transform of the tokens's string, to show orthographic features.
 | For example, "Xxxx" or "dd".
 
 +row
 +cell #[code prefix]
 +cell int
-+cell Integer ID of a length-N substring from the start of the
++cell
++| Hash value of a length-N substring from the start of the
 | token. Defaults to #[code N=1].
 
 +row
@@ -417,7 +419,8 @@ p The L2 norm of the token's vector representation.
 +cell #[code suffix]
 +cell int
 +cell
-| Length-N substring from the end of the token. Defaults to #[code N=3].
+| Hash value of a length-N substring from the end of the token.
+| Defaults to #[code N=3].
 
 +row
 +cell #[code suffix_]
@@ -36,7 +36,7 @@ p Create the vocabulary.
 +cell #[code strings]
 +cell #[code StringStore]
 +cell
-| A #[code StringStore] that maps strings to integers, and vice
+| A #[code StringStore] that maps strings to hash values, and vice
 | versa.
 
 +footrow
@@ -74,7 +74,7 @@ p
 +row
 +cell #[code id_or_string]
 +cell int / unicode
-+cell The integer ID of a word, or its unicode string.
++cell The hash value of a word, or its unicode string.
 
 +footrow
 +cell returns
@@ -12,7 +12,7 @@ p
 p
 | Linguistic annotations are available as
 | #[+api("token#attributes") #[code Token] attributes]. Like many NLP
-| libraries, spaCy #[strong encodes all strings to integers] to reduce
+| libraries, spaCy #[strong encodes all strings to hash values] to reduce
 | memory usage and improve efficiency. So to get the readable string
 | representation of an attribute, we need to add an underscore #[code _]
 | to its name:
@@ -43,7 +43,7 @@ p
 +aside("Why saving the vocab?")
 | Saving the vocabulary with the #[code Doc] is important, because the
 | #[code Vocab] holds the context-independent information about the words,
-| tags and labels, and their #[strong integer IDs]. If the #[code Vocab]
+| tags and labels, and their #[strong hash values]. If the #[code Vocab]
 | wasn't saved with the #[code Doc], spaCy wouldn't know how to resolve
 | those IDs – for example, the word text or the dependency labels. You
 | might be saving #[code 446] for "whale", but in a different vocabulary,
@@ -48,7 +48,7 @@ p
 | #[strong connected by a single arc] in the dependency tree. The term
 | #[strong dep] is used for the arc label, which describes the type of
 | syntactic relation that connects the child to the head. As with other
-| attributes, the value of #[code .dep] is an integer. You can get
+| attributes, the value of #[code .dep] is a hash value. You can get
 | the string value with #[code .dep_].
 
 +code("Example").
@@ -20,7 +20,7 @@ p
 | The standard way to access entity annotations is the
 | #[+api("doc#ents") #[code doc.ents]] property, which produces a sequence
 | of #[+api("span") #[code Span]] objects. The entity type is accessible
-| either as an integer ID or as a string, using the attributes
+| either as a hash value or as a string, using the attributes
 | #[code ent.label] and #[code ent.label_]. The #[code Span] object acts
 | as a sequence of tokens, so you can iterate over the entity or index into
 | it. You can also get the text form of the whole entity, as though it were
@@ -78,7 +78,7 @@ p
 doc = nlp(u'Netflix is hiring a new VP of global policy')
 # the model didn't recognise any entities :(
 
-ORG = doc.vocab.strings[u'ORG'] # get integer ID of entity label
+ORG = doc.vocab.strings[u'ORG'] # get hash value of entity label
 netflix_ent = Span(doc, 0, 1, label=ORG) # create a Span for the new entity
 doc.ents = [netflix_ent]
 
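Note on the wording change in this commit: the docs now say "hash value" rather than "integer ID". A rough, self-contained sketch of what that distinction means in practice follows. It is not spaCy's implementation: `hash_string` and `ToyStringStore` are hypothetical names, and `hashlib.blake2b` is only a stand-in for whatever hash function the library actually uses. The point it illustrates is that a hash is computed from the string itself, so two independent stores agree on the value for "ORG", while mapping a hash back to text still requires a store that has seen that string.

```python
import hashlib


def hash_string(text):
    """Return a 64-bit integer derived from the text (stand-in hash, an assumption)."""
    digest = hashlib.blake2b(text.encode("utf8"), digest_size=8).digest()
    return int.from_bytes(digest, "big")


class ToyStringStore:
    """Hypothetical two-way lookup between strings and their hash values."""

    def __init__(self):
        self._by_hash = {}

    def add(self, text):
        # The key depends only on the string, not on insertion order.
        key = hash_string(text)
        self._by_hash[key] = text
        return key

    def __getitem__(self, key_or_text):
        if isinstance(key_or_text, str):
            return hash_string(key_or_text)  # string -> hash, always computable
        return self._by_hash[key_or_text]    # hash -> string, KeyError if never added


# Two stores filled in a different order still agree on the value for "ORG".
store_a = ToyStringStore()
store_b = ToyStringStore()
org_a = store_a.add("ORG")
store_b.add("whale")
org_b = store_b.add("ORG")
```

With sequential integer IDs, `org_a` and `org_b` would depend on insertion order and could differ between vocabularies. With hashes they match, but `store_a` still cannot turn the hash of "whale" back into text, which is why the vocabulary is worth saving alongside the data.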