An individual token — i.e. a word, punctuation symbol, whitespace, etc.
class
spacy/tokens/token.pyx
Token.__init__
Construct a Token object.
Example
doc = nlp("Give it back! He pleaded.")
token = doc[0]
assert token.text == "Give"
Name
Description
vocab
A storage container for lexical types. Vocab
doc
The parent document. Doc
offset
The index of the token within the document. int
Token.__len__
The number of unicode characters in the token, i.e. len(token.text).
Example
doc = nlp("Give it back! He pleaded.")
token = doc[0]
assert len(token) == 4
Name
Description
RETURNS
The number of unicode characters in the token. int
Token.set_extension
Define a custom attribute on the Token which becomes available via Token._.
For details, see the documentation on
custom attributes.
Example
from spacy.tokens import Token
fruit_getter = lambda token: token.text in ("apple", "pear", "banana")
Token.set_extension("is_fruit", getter=fruit_getter)
doc = nlp("I have an apple")
assert doc[3]._.is_fruit
Name
Description
name
Name of the attribute to set by the extension. For example, "my_attr" will be available as token._.my_attr. str
default
Optional default value of the attribute if no getter or method is defined. Optional[Any]
method
Set a custom method on the object, for example token._.compare(other_token). Optional[Callable[[Token, ...], Any]]
getter
Getter function that takes the object and returns an attribute value. Is called when the user accesses the ._ attribute. Optional[Callable[[Token], Any]]
setter
Setter function that takes the Token and a value, and modifies the object. Is called when the user writes to the Token._ attribute. Optional[Callable[[Token, Any], None]]
force
Force overwriting existing attribute. bool
Token.get_extension
Look up a previously registered extension by name. Returns a 4-tuple
(default, method, getter, setter) if the extension is registered. Raises a
KeyError otherwise.
Name
Description
name
Name of the extension. str
RETURNS
A (default, method, getter, setter) tuple of the extension. Tuple[Optional[Any], Optional[Callable], Optional[Callable], Optional[Callable]]
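This section has no inline example, so here is a short sketch in the style of the other examples, registering an extension with set_extension and then looking it up by name (the extension name "is_fruit" is only illustrative):

```python
from spacy.tokens import Token

# Register an extension with a default value, then look it up by name.
# force=True overwrites the extension if it was already registered.
Token.set_extension("is_fruit", default=False, force=True)

# get_extension returns the (default, method, getter, setter) 4-tuple.
extension = Token.get_extension("is_fruit")
assert extension == (False, None, None, None)
```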
Token.check_flag
Check the value of a boolean flag.
Example
from spacy.attrs import IS_TITLE
doc = nlp("Give it back! He pleaded.")
token = doc[0]
assert token.check_flag(IS_TITLE) == True
Name
Description
flag_id
The attribute ID of the flag to check. int
RETURNS
Whether the flag is set. bool
Token.similarity
Compute a semantic similarity estimate. Defaults to cosine over vectors.
Example
apples, _, oranges = nlp("apples and oranges")
apples_oranges = apples.similarity(oranges)
oranges_apples = oranges.similarity(apples)
assert apples_oranges == oranges_apples
Name
Description
other
The object to compare with. By default, accepts Doc, Span, Token and Lexeme objects. Union[Doc, Span, Token, Lexeme]
RETURNS
A scalar similarity score. Higher is more similar. float
Token.nbor
Get a neighboring token.
Example
doc = nlp("Give it back! He pleaded.")
give_nbor = doc[0].nbor()
assert give_nbor.text == "it"
Name
Description
i
The relative position of the token to get. Defaults to 1. int
RETURNS
The token at position self.doc[self.i+i]. Token
Token.set_morph
Set the morphological analysis from a UD FEATS string, hash value of a UD FEATS
string, features dict or MorphAnalysis. The value None can be used to reset
the morph to an unset state.
Example
doc = nlp("Give it back! He pleaded.")
doc[0].set_morph("Mood=Imp|VerbForm=Fin")
assert "Mood=Imp" in doc[0].morph
assert doc[0].morph.get("Mood") == ["Imp"]
Name
Description
features
The morphological features to set. Union[int, dict, str, MorphAnalysis, None]
Token.has_morph
Check whether the token has annotated morph information. Returns False when the
morph annotation is unset or missing.
Name
Description
RETURNS
Whether the morph annotation is set. bool
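This section has no inline example; the following is a minimal sketch using a blank English pipeline (assumed here so that no component assigns morphology up front):

```python
import spacy

# A blank pipeline has no components that set morphological features.
nlp = spacy.blank("en")
doc = nlp("Give it back!")
assert not doc[0].has_morph()  # nothing has annotated the morph yet

# After setting a FEATS string, the annotation counts as set.
doc[0].set_morph("Mood=Imp|VerbForm=Fin")
assert doc[0].has_morph()
```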
Token.is_ancestor
Check whether this token is a parent, grandparent, etc. of another in the
dependency tree.
Example
doc = nlp("Give it back! He pleaded.")
give = doc[0]
it = doc[1]
assert give.is_ancestor(it)
Name
Description
descendant
Another token. Token
RETURNS
Whether this token is the ancestor of the descendant. bool
Token.ancestors
A sequence of the token's syntactic ancestors (parents, grandparents, etc.) in the dependency tree.
Example
doc = nlp("Give it back! He pleaded.")
it_ancestors = doc[1].ancestors
assert [t.text for t in it_ancestors] == ["Give"]
he_ancestors = doc[4].ancestors
assert [t.text for t in he_ancestors] == ["pleaded"]
Name
Description
YIELDS
A sequence of ancestor tokens such that ancestor.is_ancestor(self). Token
Token.conjuncts
A tuple of coordinated tokens, not including the token itself.
Example
doc = nlp("I like apples and oranges")
apples_conjuncts = doc[2].conjuncts
assert [t.text for t in apples_conjuncts] == ["oranges"]
Name
Description
RETURNS
The coordinated tokens. Tuple[Token, ...]
Token.children
A sequence of the token's immediate syntactic children.
Example
doc = nlp("Give it back! He pleaded.")
give_children = doc[0].children
assert [t.text for t in give_children] == ["it", "back", "!"]
Name
Description
YIELDS
A child token such that child.head == self. Token
Token.lefts
The leftward immediate children of the word in the syntactic dependency parse.
Example
doc = nlp("I like New York in Autumn.")
lefts = [t.text for t in doc[3].lefts]
assert lefts == ["New"]
Name
Description
YIELDS
A left-child of the token. Token
Token.rights
The rightward immediate children of the word in the syntactic dependency parse.
Example
doc = nlp("I like New York in Autumn.")
rights = [t.text for t in doc[3].rights]
assert rights == ["in"]
Name
Description
YIELDS
A right-child of the token. Token
Token.n_lefts
The number of leftward immediate children of the word in the syntactic
dependency parse.
Example
doc = nlp("I like New York in Autumn.")
assert doc[3].n_lefts == 1
Name
Description
RETURNS
The number of left-child tokens. int
Token.n_rights
The number of rightward immediate children of the word in the syntactic
dependency parse.
Example
doc = nlp("I like New York in Autumn.")
assert doc[3].n_rights == 1
Name
Description
RETURNS
The number of right-child tokens. int
Token.subtree
A sequence containing the token and all the token's syntactic descendants.
Example
doc = nlp("Give it back! He pleaded.")
give_subtree = doc[0].subtree
assert [t.text for t in give_subtree] == ["Give", "it", "back", "!"]
Name
Description
YIELDS
A descendant token such that self.is_ancestor(token) or token == self. Token
Token.is_sent_start
A boolean value indicating whether the token starts a sentence. None if
unknown. Defaults to True for the first token in the Doc.
Example
doc = nlp("Give it back! He pleaded.")
assert doc[4].is_sent_start
assert not doc[5].is_sent_start
Name
Description
RETURNS
Whether the token starts a sentence. Optional[bool]
Token.has_vector
A boolean value indicating whether a word vector is associated with the token.
Example
doc = nlp("I like apples")
apples = doc[2]
assert apples.has_vector
Name
Description
RETURNS
Whether the token has vector data attached. bool
Token.vector
A real-valued meaning representation.
Example
doc = nlp("I like apples")
apples = doc[2]
assert apples.vector.dtype == "float32"
assert apples.vector.shape == (300,)
Name
Description
RETURNS
A 1-dimensional array representing the token's vector. numpy.ndarray[ndim=1, dtype=float32]
Token.vector_norm
The L2 norm of the token's vector representation.
Example
doc = nlp("I like apples and pasta")
apples = doc[2]
pasta = doc[4]
apples.vector_norm  # 6.89589786529541
pasta.vector_norm  # 7.759851932525635
assert apples.vector_norm != pasta.vector_norm
Name
Description
RETURNS
The L2 norm of the vector representation. float
Attributes
Name
Description
doc
The parent document. Doc
lex (v3.0)
The underlying lexeme. Lexeme
sent (v2.0.12)
The sentence span that this token is a part of. Span
text
Verbatim text content. str
text_with_ws
Text content, with trailing space character if present. str
whitespace_
Trailing space character if present. str
orth
ID of the verbatim text content. int
orth_
Verbatim text content (identical to Token.text). Exists mostly for consistency with the other attributes. str
vocab
The vocab object of the parent Doc. Vocab
tensor (v2.1.7)
The token's slice of the parent Doc's tensor. numpy.ndarray
head
The syntactic parent, or "governor", of this token. Token
left_edge
The leftmost token of this token's syntactic descendants. Token
right_edge
The rightmost token of this token's syntactic descendants. Token
i
The index of the token within the parent document. int
ent_type
Named entity type. int
ent_type_
Named entity type. str
ent_iob
IOB code of named entity tag. 3 means the token begins an entity, 2 means it is outside an entity, 1 means it is inside an entity, and 0 means no entity tag is set. int
ent_iob_
IOB code of named entity tag. "B" means the token begins an entity, "I" means it is inside an entity, "O" means it is outside an entity, and "" means no entity tag is set. str
ent_kb_id (v2.2)
Knowledge base ID that refers to the named entity this token is a part of, if any. int
ent_kb_id_ (v2.2)
Knowledge base ID that refers to the named entity this token is a part of, if any. str
ent_id
ID of the entity the token is an instance of, if any. Currently not used, but potentially for coreference resolution. int
ent_id_
ID of the entity the token is an instance of, if any. Currently not used, but potentially for coreference resolution. str
lemma
Base form of the token, with no inflectional suffixes. int
lemma_
Base form of the token, with no inflectional suffixes. str
norm
The token's norm, i.e. a normalized form of the token text. Can be set in the language's tokenizer exceptions. int
norm_
The token's norm, i.e. a normalized form of the token text. Can be set in the language's tokenizer exceptions. str
lower
Lowercase form of the token. int
lower_
Lowercase form of the token text. Equivalent to Token.text.lower(). str
shape
Transform of the token's string to show orthographic features. Alphabetic characters are replaced by x or X, numeric characters are replaced by d, and sequences of the same character are truncated after length 4. For example, "Xxxx" or "dd". int
shape_
Transform of the token's string to show orthographic features. Alphabetic characters are replaced by x or X, numeric characters are replaced by d, and sequences of the same character are truncated after length 4. For example, "Xxxx" or "dd". str
prefix
Hash value of a length-N substring from the start of the token. Defaults to N=1. int
prefix_
A length-N substring from the start of the token. Defaults to N=1. str
suffix
Hash value of a length-N substring from the end of the token. Defaults to N=3. int
suffix_
Length-N substring from the end of the token. Defaults to N=3. str
is_alpha
Does the token consist of alphabetic characters? Equivalent to token.text.isalpha(). bool
is_ascii
Does the token consist of ASCII characters? Equivalent to all(ord(c) < 128 for c in token.text). bool
is_digit
Does the token consist of digits? Equivalent to token.text.isdigit(). bool
is_lower
Is the token in lowercase? Equivalent to token.text.islower(). bool
is_upper
Is the token in uppercase? Equivalent to token.text.isupper(). bool
is_title
Is the token in titlecase? Equivalent to token.text.istitle(). bool
is_punct
Is the token punctuation? bool
is_left_punct
Is the token a left punctuation mark, e.g. "("? bool
is_right_punct
Is the token a right punctuation mark, e.g. ")"? bool
is_space
Does the token consist of whitespace characters? Equivalent to token.text.isspace(). bool
is_bracket
Is the token a bracket? bool
is_quote
Is the token a quotation mark? bool
is_currency (v2.0.8)
Is the token a currency symbol? bool
like_url
Does the token resemble a URL? bool
like_num
Does the token represent a number? e.g. "10.9", "10", "ten", etc. bool
like_email
Does the token resemble an email address? bool
is_oov
Is the token out-of-vocabulary (i.e. does it not have a word vector)? bool
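Most of the lexical attributes above are computed from the token text alone and need no trained pipeline. A short sketch, assuming only a blank English pipeline (no statistical model):

```python
import spacy

# Lexical attributes like shape_, lower_ and is_alpha work without a model.
nlp = spacy.blank("en")
token = nlp("Apple is looking at buying a startup")[0]

assert token.text == "Apple"
assert token.shape_ == "Xxxxx"   # X/x for upper/lowercase letters
assert token.lower_ == "apple"   # equivalent to token.text.lower()
assert token.is_alpha and token.is_title
assert not token.like_num
```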