| title | teaser | tag | source |
| --- | --- | --- | --- |
| Token | An individual token, i.e. a word, punctuation symbol, whitespace, etc. | class | spacy/tokens/token.pyx |
Token.__init__
Construct a Token object.
Example
doc = nlp(u"Give it back! He pleaded.")
token = doc[0]
assert token.text == u"Give"
| Name | Type | Description |
| --- | --- | --- |
| vocab | Vocab | A storage container for lexical types. |
| doc | Doc | The parent document. |
| offset | int | The index of the token within the document. |
| RETURNS | Token | The newly constructed object. |
Token.__len__
The number of unicode characters in the token, i.e. len(token.text).
Example
doc = nlp(u"Give it back! He pleaded.")
token = doc[0]
assert len(token) == 4
| Name | Type | Description |
| --- | --- | --- |
| RETURNS | int | The number of unicode characters in the token. |
Token.set_extension
Define a custom attribute on the Token which becomes available via Token._.
For details, see the documentation on
custom attributes.
Example
from spacy.tokens import Token
fruit_getter = lambda token: token.text in (u"apple", u"pear", u"banana")
Token.set_extension("is_fruit", getter=fruit_getter)
doc = nlp(u"I have an apple")
assert doc[3]._.is_fruit
| Name | Type | Description |
| --- | --- | --- |
| name | unicode | Name of the attribute to set by the extension. For example, 'my_attr' will be available as token._.my_attr. |
| default | - | Optional default value of the attribute if no getter or method is defined. |
| method | callable | Set a custom method on the object, for example token._.compare(other_token). |
| getter | callable | Getter function that takes the object and returns an attribute value. Is called when the user accesses the ._ attribute. |
| setter | callable | Setter function that takes the Token and a value, and modifies the object. Is called when the user writes to the Token._ attribute. |
Token.get_extension
Look up a previously registered extension by name. Returns a 4-tuple
(default, method, getter, setter) if the extension is registered. Raises a
KeyError otherwise.
Example
from spacy.tokens import Token
Token.set_extension("is_fruit", default=False)
extension = Token.get_extension("is_fruit")
assert extension == (False, None, None, None)
| Name | Type | Description |
| --- | --- | --- |
| name | unicode | Name of the extension. |
| RETURNS | tuple | A (default, method, getter, setter) tuple of the extension. |
Token.has_extension
Check whether an extension has been registered on the Token class.
Example
from spacy.tokens import Token
Token.set_extension("is_fruit", default=False)
assert Token.has_extension("is_fruit")
| Name | Type | Description |
| --- | --- | --- |
| name | unicode | Name of the extension to check. |
| RETURNS | bool | Whether the extension has been registered. |
Token.remove_extension
Remove a previously registered extension. New in v2.0.11.
Example
from spacy.tokens import Token
Token.set_extension("is_fruit", default=False)
removed = Token.remove_extension("is_fruit")
assert not Token.has_extension("is_fruit")
| Name | Type | Description |
| --- | --- | --- |
| name | unicode | Name of the extension. |
| RETURNS | tuple | A (default, method, getter, setter) tuple of the removed extension. |
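The four extension methods above can be pictured with a small plain-Python stand-in. This is only a sketch: the `ExtensionRegistry` class and its dict storage are hypothetical and say nothing about how spaCy stores extensions internally. It merely mirrors the documented `(default, method, getter, setter)` tuple and the `KeyError` raised by `get_extension` for unknown names.

```python
# Hypothetical stand-in for the extension registry; not spaCy's implementation.
class ExtensionRegistry:
    def __init__(self):
        # Maps an extension name to its (default, method, getter, setter) tuple.
        self._extensions = {}

    def set_extension(self, name, default=None, method=None, getter=None, setter=None):
        self._extensions[name] = (default, method, getter, setter)

    def get_extension(self, name):
        # A plain dict lookup raises KeyError for unregistered names.
        return self._extensions[name]

    def has_extension(self, name):
        return name in self._extensions

    def remove_extension(self, name):
        # Removes the entry and returns the 4-tuple that was stored.
        return self._extensions.pop(name)


registry = ExtensionRegistry()
registry.set_extension("is_fruit", default=False)
assert registry.get_extension("is_fruit") == (False, None, None, None)

removed = registry.remove_extension("is_fruit")
assert removed == (False, None, None, None)
assert not registry.has_extension("is_fruit")
```

With the real API, the same calls are made on the `Token` class itself, as in the examples above.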
Token.check_flag
Check the value of a boolean flag.
Example
from spacy.attrs import IS_TITLE
doc = nlp(u"Give it back! He pleaded.")
token = doc[0]
assert token.check_flag(IS_TITLE) == True
| Name | Type | Description |
| --- | --- | --- |
| flag_id | int | The attribute ID of the flag to check. |
| RETURNS | bool | Whether the flag is set. |
Token.similarity
Compute a semantic similarity estimate. Defaults to cosine over vectors.
Example
apples, _, oranges = nlp(u"apples and oranges")
apples_oranges = apples.similarity(oranges)
oranges_apples = oranges.similarity(apples)
assert apples_oranges == oranges_apples
| Name | Type | Description |
| --- | --- | --- |
| other | - | The object to compare with. By default, accepts Doc, Span, Token and Lexeme objects. |
| RETURNS | float | A scalar similarity score. Higher is more similar. |
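Since the default is described as cosine over vectors, a plain-Python version of that measure may help show why the score is symmetric. The function name `cosine_similarity`, the toy vectors, and the zero-vector convention are assumptions made for this sketch; it does not call spaCy.

```python
import math


def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        # Convention chosen for this sketch when a vector is all zeros.
        return 0.0
    return dot / (norm_a * norm_b)


# Made-up toy vectors, not real word vectors.
apples = [1.0, 2.0, 0.5]
oranges = [0.9, 1.8, 0.7]

# Swapping the arguments does not change the score.
assert cosine_similarity(apples, oranges) == cosine_similarity(oranges, apples)
# A vector compared with itself scores (approximately) 1.
assert abs(cosine_similarity(apples, apples) - 1.0) < 1e-9
```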
Token.nbor
Get a neighboring token.
Example
doc = nlp(u"Give it back! He pleaded.")
give_nbor = doc[0].nbor()
assert give_nbor.text == u"it"
| Name | Type | Description |
| --- | --- | --- |
| i | int | The relative position of the token to get. Defaults to 1. |
| RETURNS | Token | The token at position self.doc[self.i+i]. |
Token.is_ancestor
Check whether this token is a parent, grandparent, etc. of another in the
dependency tree.
Example
doc = nlp(u"Give it back! He pleaded.")
give = doc[0]
it = doc[1]
assert give.is_ancestor(it)
| Name | Type | Description |
| --- | --- | --- |
| descendant | Token | Another token. |
| RETURNS | bool | Whether this token is the ancestor of the descendant. |
Token.ancestors
A sequence of this token's syntactic ancestors, i.e. its parent, grandparent, etc. in the dependency tree.
Example
doc = nlp(u"Give it back! He pleaded.")
it_ancestors = doc[1].ancestors
assert [t.text for t in it_ancestors] == [u"Give"]
he_ancestors = doc[4].ancestors
assert [t.text for t in he_ancestors] == [u"pleaded"]
| Name | Type | Description |
| --- | --- | --- |
| YIELDS | Token | A sequence of ancestor tokens such that ancestor.is_ancestor(self). |
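The relationship between ancestors and is_ancestor can be sketched by walking up a list of head indices. The `heads` list below is a hand-written parse assumed for illustration, with the root token acting as its own head; the helper names are hypothetical and nothing here calls spaCy.

```python
# Hand-written head indices for "Give it back! He pleaded." (an assumed parse).
words = ["Give", "it", "back", "!", "He", "pleaded", "."]
heads = [0, 0, 0, 0, 5, 5, 5]  # a root token is its own head


def ancestors(i):
    """Yield the indices of token i's ancestors, nearest first."""
    while heads[i] != i:
        i = heads[i]
        yield i


def is_ancestor(candidate, descendant):
    """True if candidate lies on the path from descendant up to the root."""
    return candidate in ancestors(descendant)


assert [words[j] for j in ancestors(1)] == ["Give"]
assert [words[j] for j in ancestors(4)] == ["pleaded"]
assert is_ancestor(0, 1)
assert not is_ancestor(1, 0)
```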
Token.conjuncts
A sequence of coordinated tokens, not including the token itself.
Example
doc = nlp(u"I like apples and oranges")
apples_conjuncts = doc[2].conjuncts
assert [t.text for t in apples_conjuncts] == [u"oranges"]
| Name | Type | Description |
| --- | --- | --- |
| YIELDS | Token | A coordinated token. |
Token.children
A sequence of the token's immediate syntactic children.
Example
doc = nlp(u"Give it back! He pleaded.")
give_children = doc[0].children
assert [t.text for t in give_children] == [u"it", u"back", u"!"]
| Name | Type | Description |
| --- | --- | --- |
| YIELDS | Token | A child token such that child.head == self. |
Token.lefts
The leftward immediate children of the word, in the syntactic dependency parse.
Example
doc = nlp(u"I like New York in Autumn.")
lefts = [t.text for t in doc[3].lefts]
assert lefts == [u'New']
| Name | Type | Description |
| --- | --- | --- |
| YIELDS | Token | A left-child of the token. |
Token.rights
The rightward immediate children of the word, in the syntactic dependency parse.
Example
doc = nlp(u"I like New York in Autumn.")
rights = [t.text for t in doc[3].rights]
assert rights == [u"in"]
| Name | Type | Description |
| --- | --- | --- |
| YIELDS | Token | A right-child of the token. |
Token.n_lefts
The number of leftward immediate children of the word, in the syntactic
dependency parse.
Example
doc = nlp(u"I like New York in Autumn.")
assert doc[3].n_lefts == 1
| Name | Type | Description |
| --- | --- | --- |
| RETURNS | int | The number of left-child tokens. |
Token.n_rights
The number of rightward immediate children of the word, in the syntactic
dependency parse.
Example
doc = nlp(u"I like New York in Autumn.")
assert doc[3].n_rights == 1
| Name | Type | Description |
| --- | --- | --- |
| RETURNS | int | The number of right-child tokens. |
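As a rough illustration of how children, lefts, rights and their counts relate, the following sketch derives them from a list of head indices. The parse in `heads` is hand-written and assumed for this example, and the helper functions are hypothetical; the code does not use spaCy.

```python
# Hand-written head indices for "I like New York in Autumn." (an assumed parse).
words = ["I", "like", "New", "York", "in", "Autumn", "."]
heads = [1, 1, 3, 1, 3, 4, 1]  # a root token is its own head


def children(i):
    """Indices of the immediate children of token i, in document order."""
    return [j for j, h in enumerate(heads) if h == i and j != i]


def lefts(i):
    """Immediate children that occur before token i."""
    return [j for j in children(i) if j < i]


def rights(i):
    """Immediate children that occur after token i."""
    return [j for j in children(i) if j > i]


york = 3
assert [words[j] for j in lefts(york)] == ["New"]
assert [words[j] for j in rights(york)] == ["in"]
# The counts correspond to n_lefts and n_rights.
assert len(lefts(york)) == 1
assert len(rights(york)) == 1
```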
Token.subtree
A sequence containing the token and all the token's syntactic descendants.
Example
doc = nlp(u"Give it back! He pleaded.")
give_subtree = doc[0].subtree
assert [t.text for t in give_subtree] == [u"Give", u"it", u"back", u"!"]
| Name | Type | Description |
| --- | --- | --- |
| YIELDS | Token | A descendant token such that self.is_ancestor(token) or token == self. |
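The subtree definition can be sketched as a closure over head indices: start with the token and keep adding any token whose head is already included. The `heads` list is a hand-written parse assumed for illustration, and `subtree` here is a hypothetical helper rather than the library's implementation.

```python
# Hand-written head indices for "Give it back! He pleaded." (an assumed parse).
words = ["Give", "it", "back", "!", "He", "pleaded", "."]
heads = [0, 0, 0, 0, 5, 5, 5]  # a root token is its own head


def subtree(i):
    """Return token i and all of its descendants, in document order."""
    members = {i}
    changed = True
    while changed:
        changed = False
        for j, h in enumerate(heads):
            # Add any token whose head is already part of the subtree.
            if h in members and j not in members:
                members.add(j)
                changed = True
    return sorted(members)


assert [words[j] for j in subtree(0)] == ["Give", "it", "back", "!"]
assert [words[j] for j in subtree(5)] == ["He", "pleaded", "."]
```

In this picture, the first and last entries of the returned list play the role of left_edge and right_edge.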
Token.is_sent_start
A boolean value indicating whether the token starts a sentence. None if
unknown. Defaults to True for the first token in the doc.
Example
doc = nlp(u"Give it back! He pleaded.")
assert doc[4].is_sent_start
assert not doc[5].is_sent_start
| Name | Type | Description |
| --- | --- | --- |
| RETURNS | bool | Whether the token starts a sentence. |
As of spaCy v2.0, the Token.sent_start property is deprecated and has been
replaced with Token.is_sent_start, which returns a boolean value instead of a
misleading 0 for False and 1 for True. It also now returns None if the
answer is unknown, and fixes a quirk in the old logic that would always set the
property to 0 for the first word of the document.
- assert doc[4].sent_start == 1
+ assert doc[4].is_sent_start == True
Token.has_vector
A boolean value indicating whether a word vector is associated with the token.
Example
doc = nlp(u"I like apples")
apples = doc[2]
assert apples.has_vector
| Name | Type | Description |
| --- | --- | --- |
| RETURNS | bool | Whether the token has vector data attached. |
Token.vector
A real-valued meaning representation.
Example
doc = nlp(u"I like apples")
apples = doc[2]
assert apples.vector.dtype == "float32"
assert apples.vector.shape == (300,)
| Name | Type | Description |
| --- | --- | --- |
| RETURNS | numpy.ndarray[ndim=1, dtype='float32'] | A 1D numpy array representing the token's semantics. |
Token.vector_norm
The L2 norm of the token's vector representation.
Example
doc = nlp(u"I like apples and pasta")
apples = doc[2]
pasta = doc[4]
apples.vector_norm  # 6.89589786529541
pasta.vector_norm  # 7.759851932525635
assert apples.vector_norm != pasta.vector_norm
| Name | Type | Description |
| --- | --- | --- |
| RETURNS | float | The L2 norm of the vector representation. |
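For reference, the L2 norm is the square root of the sum of squared components. The helper below is a plain-Python sketch with a hypothetical name and toy inputs; it is not how the library computes or caches the value.

```python
import math


def l2_norm(vector):
    """Square root of the sum of squared components."""
    return math.sqrt(sum(x * x for x in vector))


# Toy vectors chosen so the result is easy to check by hand.
assert l2_norm([3.0, 4.0]) == 5.0
assert l2_norm([0.0, 0.0]) == 0.0
```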
Attributes
| Name | Type | Description |
| --- | --- | --- |
| doc | Doc | The parent document. |
| sent (new in v2.0.12) | Span | The sentence span that this token is a part of. |
| text | unicode | Verbatim text content. |
| text_with_ws | unicode | Text content, with trailing space character if present. |
| whitespace_ | unicode | Trailing space character if present. |
| orth | int | ID of the verbatim text content. |
| orth_ | unicode | Verbatim text content (identical to Token.text). Exists mostly for consistency with the other attributes. |
| vocab | Vocab | The vocab object of the parent Doc. |
| head | Token | The syntactic parent, or "governor", of this token. |
| left_edge | Token | The leftmost token of this token's syntactic descendants. |
| right_edge | Token | The rightmost token of this token's syntactic descendants. |
| i | int | The index of the token within the parent document. |
| ent_type | int | Named entity type. |
| ent_type_ | unicode | Named entity type. |
| ent_iob | int | IOB code of named entity tag. 3 means the token begins an entity, 2 means it is outside an entity, 1 means it is inside an entity, and 0 means no entity tag is set. |
| ent_iob_ | unicode | IOB code of named entity tag. "B" means the token begins an entity, "I" means it is inside an entity, "O" means it is outside an entity, and "" means no entity tag is set. |
| ent_id | int | ID of the entity the token is an instance of, if any. Currently not used, but potentially for coreference resolution. |
| ent_id_ | unicode | ID of the entity the token is an instance of, if any. Currently not used, but potentially for coreference resolution. |
| lemma | int | Base form of the token, with no inflectional suffixes. |
| lemma_ | unicode | Base form of the token, with no inflectional suffixes. |
| norm | int | The token's norm, i.e. a normalized form of the token text. Usually set in the language's tokenizer exceptions or norm exceptions. |
| norm_ | unicode | The token's norm, i.e. a normalized form of the token text. Usually set in the language's tokenizer exceptions or norm exceptions. |
| lower | int | Lowercase form of the token. |
| lower_ | unicode | Lowercase form of the token text. Equivalent to Token.text.lower(). |
| shape | int | Transform of the token's string, to show orthographic features. For example, "Xxxx" or "dd". |
| shape_ | unicode | Transform of the token's string, to show orthographic features. For example, "Xxxx" or "dd". |
| prefix | int | Hash value of a length-N substring from the start of the token. Defaults to N=1. |
| prefix_ | unicode | A length-N substring from the start of the token. Defaults to N=1. |
| suffix | int | Hash value of a length-N substring from the end of the token. Defaults to N=3. |
| suffix_ | unicode | Length-N substring from the end of the token. Defaults to N=3. |
| is_alpha | bool | Does the token consist of alphabetic characters? Equivalent to token.text.isalpha(). |
| is_ascii | bool | Does the token consist of ASCII characters? Equivalent to all(ord(c) < 128 for c in token.text). |
| is_digit | bool | Does the token consist of digits? Equivalent to token.text.isdigit(). |
| is_lower | bool | Is the token in lowercase? Equivalent to token.text.islower(). |
| is_upper | bool | Is the token in uppercase? Equivalent to token.text.isupper(). |
| is_title | bool | Is the token in titlecase? Equivalent to token.text.istitle(). |
| is_punct | bool | Is the token punctuation? |
| is_left_punct | bool | Is the token a left punctuation mark, e.g. (? |
| is_right_punct | bool | Is the token a right punctuation mark, e.g. )? |
| is_space | bool | Does the token consist of whitespace characters? Equivalent to token.text.isspace(). |
| is_bracket | bool | Is the token a bracket? |
| is_quote | bool | Is the token a quotation mark? |
| is_currency (new in v2.0.8) | bool | Is the token a currency symbol? |
| like_url | bool | Does the token resemble a URL? |
| like_num | bool | Does the token represent a number? e.g. "10.9", "10", "ten", etc. |
| like_email | bool | Does the token resemble an email address? |
| is_oov | bool | Is the token out-of-vocabulary? |
| is_stop | bool | Is the token part of a "stop list"? |
| pos | int | Coarse-grained part-of-speech. |
| pos_ | unicode | Coarse-grained part-of-speech. |
| tag | int | Fine-grained part-of-speech. |
| tag_ | unicode | Fine-grained part-of-speech. |
| dep | int | Syntactic dependency relation. |
| dep_ | unicode | Syntactic dependency relation. |
| lang | int | Language of the parent document's vocabulary. |
| lang_ | unicode | Language of the parent document's vocabulary. |
| prob | float | Smoothed log probability estimate of token's type. |
| idx | int | The character offset of the token within the parent document. |
| sentiment | float | A scalar value indicating the positivity or negativity of the token. |
| lex_id | int | Sequential ID of the token's lexical type. |
| rank | int | Sequential ID of the token's lexical type, used to index into tables, e.g. for word vectors. |
| cluster | int | Brown cluster ID. |
| _ | Underscore | User space for adding custom attribute extensions. |
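Several of the string-valued attributes above are described as equivalent to simple string operations. The sketch below spells a few of them out on plain Python strings. The function names are hypothetical, and `simple_shape` is a simplified assumption about the shape transform (it does not apply any run-length capping the library may perform); none of this calls spaCy.

```python
def prefix(text, n=1):
    """Length-n substring from the start of the text (default n=1)."""
    return text[:n]


def suffix(text, n=3):
    """Length-n substring from the end of the text (default n=3)."""
    return text[-n:]


def is_ascii(text):
    """True if every character is in the ASCII range."""
    return all(ord(c) < 128 for c in text)


def simple_shape(text):
    """Map uppercase letters to X, other letters to x, digits to d."""
    out = []
    for c in text:
        if c.isalpha():
            out.append("X" if c.isupper() else "x")
        elif c.isdigit():
            out.append("d")
        else:
            out.append(c)  # punctuation and other characters are kept as-is
    return "".join(out)


assert prefix("Give") == "G"
assert suffix("pleaded") == "ded"
assert is_ascii("Give")
assert not is_ascii("caf\u00e9")
assert simple_shape("Give") == "Xxxx"
assert simple_shape("10") == "dd"
# The is_* flags mirror the built-in str predicates named in the table.
assert "Give".istitle() and not "give".istitle()
assert "10".isdigit() and not "ten".isdigit()
```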