Create a Span object from the slice doc[start : end].
Example
doc = nlp("Give it back! He pleaded.")
span = doc[1:4]
assert [t.text for t in span] == ["it", "back", "!"]
Name
Type
Description
doc
Doc
The parent document.
start
int
The index of the first token of the span.
end
int
The index of the first token after the span.
label
int / str
A label to attach to the span, e.g. for named entities. As of v2.1, the label can also be a string.
kb_id
int / str
A knowledge base ID to attach to the span, e.g. for named entities. The ID can be an integer or a string.
vector
numpy.ndarray[ndim=1, dtype="float32"]
A meaning representation of the span.
Span.__getitem__
Get a Token object.
Example
doc = nlp("Give it back! He pleaded.")
span = doc[1:4]
assert span[1].text == "back"
Name
Type
Description
i
int
The index of the token within the span.
RETURNS
Token
The token at span[i].
Get a Span object.
Example
doc = nlp("Give it back! He pleaded.")
span = doc[1:4]
assert span[1:3].text == "back!"
Name
Type
Description
start_end
tuple
The slice of the span to get.
RETURNS
Span
The span at span[start : end].
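Indices passed to Span.__getitem__ are relative to the span, not to the parent document. The sketch below illustrates that offset arithmetic with plain Python lists; it is not spaCy's implementation, and span_getitem and span_slice are hypothetical helper names introduced only for this illustration.

```python
# Plain-Python stand-ins (not spaCy objects): the document is a list of token
# texts and the span is a pair of token offsets into it.
doc_tokens = ["Give", "it", "back", "!", "He", "pleaded", "."]
span_start, span_end = 1, 4  # mirrors doc[1:4]


def span_getitem(i):
    """Return the token text at position i, counted from the span start."""
    length = span_end - span_start
    if i < 0:
        i += length  # negative indices count back from the span end
    if not 0 <= i < length:
        raise IndexError("span index out of range")
    return doc_tokens[span_start + i]


def span_slice(start, end):
    """Return the document offsets covered by span[start:end]."""
    start, end, _ = slice(start, end).indices(span_end - span_start)
    return span_start + start, span_start + max(start, end)


print(span_getitem(1))   # back
print(span_slice(1, 3))  # (2, 4), i.e. the tokens "back" and "!"
```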
Span.__iter__
Iterate over Token objects.
Example
doc = nlp("Give it back! He pleaded.")
span = doc[1:4]
assert [t.text for t in span] == ["it", "back", "!"]
Name
Type
Description
YIELDS
Token
A Token object.
Span.__len__
Get the number of tokens in the span.
Example
doc = nlp("Give it back! He pleaded.")
span = doc[1:4]
assert len(span) == 3
Name
Type
Description
RETURNS
int
The number of tokens in the span.
Span.set_extension
Define a custom attribute on the Span which becomes available via Span._.
For details, see the documentation on
custom attributes.
Example
from spacy.tokens import Span
city_getter = lambda span: any(city in span.text for city in ("New York", "Paris", "Berlin"))
Span.set_extension("has_city", getter=city_getter)
doc = nlp("I like New York in Autumn")
assert doc[1:4]._.has_city
Name
Type
Description
name
str
Name of the attribute to set by the extension. For example, "my_attr" will be available as span._.my_attr.
default
-
Optional default value of the attribute if no getter or method is defined.
method
callable
Set a custom method on the object, for example span._.compare(other_span).
getter
callable
Getter function that takes the object and returns an attribute value. Is called when the user accesses the ._ attribute.
setter
callable
Setter function that takes the Span and a value, and modifies the object. Is called when the user writes to the Span._ attribute.
force
bool
Force overwriting existing attribute.
Span.get_extension
Look up a previously registered extension by name. Returns a 4-tuple
(default, method, getter, setter) if the extension is registered. Raises a
KeyError otherwise.
A (default, method, getter, setter) tuple of the extension.
Span.char_span
Create a Span object from the slice span.text[start:end]. Returns None if
the character indices don't map to a valid span.
Example
doc = nlp("I like New York")
span = doc[1:4].char_span(5, 13, label="GPE")
assert span.text == "New York"
Name
Type
Description
start
int
The index of the first character of the span.
end
int
The index of the first character after the span.
label
uint64 / str
A label to attach to the span, e.g. for named entities.
kb_id
uint64 / str
An ID from a knowledge base to capture the meaning of a named entity.
vector
numpy.ndarray[ndim=1, dtype="float32"]
A meaning representation of the span.
RETURNS
Span
The newly constructed object or None.
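The character offsets given to char_span only yield a span when they fall exactly on token boundaries. A minimal sketch of that alignment check, in plain Python and under the assumption that each token is a (text, trailing whitespace) pair; char_to_token_span is a hypothetical helper, not part of spaCy.

```python
def char_to_token_span(tokens, start_char, end_char):
    """Map character offsets to (start, end) token offsets, or None if misaligned.

    `tokens` is a list of (text, trailing_whitespace) pairs.
    """
    starts = {}  # character offset where a token begins -> token index
    ends = {}    # character offset just past a token's text -> token index + 1
    offset = 0
    for i, (text, whitespace) in enumerate(tokens):
        starts[offset] = i
        offset += len(text)
        ends[offset] = i + 1
        offset += len(whitespace)
    if start_char not in starts or end_char not in ends:
        return None  # an offset falls inside a token or on whitespace
    start, end = starts[start_char], ends[end_char]
    return (start, end) if start < end else None


# The span text "like New York", split into tokens by hand.
tokens = [("like", " "), ("New", " "), ("York", "")]
print(char_to_token_span(tokens, 5, 13))  # (1, 3), covering "New York"
print(char_to_token_span(tokens, 6, 13))  # None, offset 6 is inside "New"
```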
Span.similarity
Make a semantic similarity estimate. The default estimate is cosine similarity
using an average of word vectors.
Example
doc = nlp("green apples and red oranges")
green_apples = doc[:2]
red_oranges = doc[3:]
apples_oranges = green_apples.similarity(red_oranges)
oranges_apples = red_oranges.similarity(green_apples)
assert apples_oranges == oranges_apples
Name
Type
Description
other
-
The object to compare with. By default, accepts Doc, Span, Token and Lexeme objects.
RETURNS
float
A scalar similarity score. Higher is more similar.
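To make the default estimate concrete, the following sketch computes cosine similarity between the averages of two sets of word vectors. It uses plain Python lists and made-up two-dimensional vectors; the helper names are hypothetical, and real spaCy vectors come from the loaded model.

```python
import math


def mean_vector(vectors):
    """Average a list of equal-length vectors component by component."""
    return [sum(column) / len(vectors) for column in zip(*vectors)]


def cosine(a, b):
    """Cosine similarity of two vectors; 0.0 if either has zero length."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)


# Toy vectors standing in for the tokens of two spans (assumed values).
green_apples = mean_vector([[1.0, 0.0], [0.0, 1.0]])
red_oranges = mean_vector([[1.0, 0.0], [0.0, 3.0]])

score = cosine(green_apples, red_oranges)
# The estimate is symmetric, as in the example above.
assert math.isclose(score, cosine(red_oranges, green_apples))
```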
Span.get_lca_matrix
Calculate the lowest common ancestor (LCA) matrix for the span. Returns an LCA
matrix containing the integer index of the ancestor, or -1 if no common
ancestor is found, e.g. if the span excludes a necessary ancestor.
Example
doc = nlp("I like New York in Autumn")
span = doc[1:4]
matrix = span.get_lca_matrix()
# array([[0, 0, 0], [0, 1, 2], [0, 2, 2]], dtype=int32)
Name
Type
Description
RETURNS
numpy.ndarray[ndim=2, dtype="int32"]
The lowest common ancestor matrix of the Span.
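The LCA matrix can be reproduced by hand from the dependency heads. The sketch below is a plain-Python illustration, not spaCy's implementation: it assumes a hand-written parse in which each token stores the index of its head and the root is its own head, and it reports indices relative to the span start, as in the example above.

```python
def ancestors(heads, i):
    """Return [i, head(i), head(head(i)), ...] up to the root (its own head)."""
    path = [i]
    while heads[path[-1]] != path[-1]:
        path.append(heads[path[-1]])
    return path


def lca_matrix(heads, start, end):
    """LCA indices relative to the span, or -1 if the LCA lies outside it."""
    size = end - start
    matrix = [[-1] * size for _ in range(size)]
    for a in range(start, end):
        path_a = ancestors(heads, a)
        for b in range(start, end):
            above_b = set(ancestors(heads, b))
            # The first node on a's upward path that is also above (or is) b.
            lca = next((node for node in path_a if node in above_b), None)
            if lca is not None and start <= lca < end:
                matrix[a - start][b - start] = lca - start
    return matrix


# "I like New York in Autumn ." with an assumed parse:
# I->like, like=root, New->York, York->like, in->York, Autumn->in, .->like
heads = [1, 1, 3, 1, 3, 4, 1]
print(lca_matrix(heads, 1, 4))  # [[0, 0, 0], [0, 1, 2], [0, 2, 2]]
print(lca_matrix(heads, 4, 7))  # [[0, 0, -1], [0, 1, -1], [-1, -1, 2]]
```

In the second call the span "in Autumn ." excludes "like", the common ancestor of "in" and ".", so those cells are -1.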
Span.to_array
Given a list of M attribute IDs, export the tokens to a numpy ndarray of
shape (N, M), where N is the length of the span. The values will be
32-bit integers.
Example
from spacy.attrs import LOWER, POS, ENT_TYPE, IS_ALPHA
doc = nlp("I like New York in Autumn.")
span = doc[2:3]
# All strings mapped to integers, for easy export to numpy
np_array = span.to_array([LOWER, POS, ENT_TYPE, IS_ALPHA])
Name
Type
Description
attr_ids
list
A list of attribute ID ints.
RETURNS
numpy.ndarray[long, ndim=2]
A feature matrix, with one row per word, and one column per attribute indicated in the input attr_ids.
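The shape of the result can be pictured with a small plain-Python sketch: one row per token, one column per requested attribute, and every string replaced by an integer. This is only an illustration under stated assumptions: the tokens are dictionaries, export_rows and string_ids are hypothetical names, and the sequential IDs used here stand in for the hash values spaCy stores.

```python
def export_rows(tokens, attr_names, string_ids):
    """Build one row per token and one column per attribute, all integers."""
    rows = []
    for token in tokens:
        row = []
        for name in attr_names:
            value = token[name]
            if isinstance(value, str):
                # Assumption: sequential IDs instead of spaCy's string hashes.
                value = string_ids.setdefault(value, len(string_ids))
            row.append(int(value))
        rows.append(row)
    return rows


tokens = [
    {"lower": "new", "is_alpha": True},
    {"lower": "york", "is_alpha": True},
]
string_ids = {}
matrix = export_rows(tokens, ["lower", "is_alpha"], string_ids)
print(matrix)  # [[0, 1], [1, 1]]: N=2 tokens, M=2 attributes
```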
Span.ents
The named entities in the span. Returns a tuple of named entity Span objects,
if the entity recognizer has been applied.
Example
doc = nlp("Mr. Best flew to New York on Saturday morning.")
span = doc[0:6]
ents = list(span.ents)
assert ents[0].label == 346
assert ents[0].label_ == "PERSON"
assert ents[0].text == "Mr. Best"
Name
Type
Description
RETURNS
tuple
Entities in the span, one Span per entity.
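One way to picture the result is as a filter over the document's entities that keeps those lying inside the span. The sketch below assumes full containment as the rule and uses hand-written entity offsets for the example sentence; both are assumptions for illustration, and ents_in_span is a hypothetical helper.

```python
def ents_in_span(doc_ents, span_start, span_end):
    """Keep the (start, end, label) entities that lie fully inside the span."""
    return tuple(
        ent for ent in doc_ents
        if ent[0] >= span_start and ent[1] <= span_end
    )


# Tokens: Mr. Best flew to New York on Saturday morning .
# Assumed entity offsets (token start, token end, label):
doc_ents = [(0, 2, "PERSON"), (4, 6, "GPE"), (7, 8, "DATE"), (8, 9, "TIME")]

print(ents_in_span(doc_ents, 0, 6))  # ((0, 2, 'PERSON'), (4, 6, 'GPE'))
```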
Span.as_doc
Create a new Doc object corresponding to the Span, with a copy of the data.
Example
doc = nlp("I like New York in Autumn.")
span = doc[2:4]
doc2 = span.as_doc()
assert doc2.text == "New York"
Name
Type
Description
copy_user_data
bool
Whether or not to copy the original doc's user data.
RETURNS
Doc
A Doc object of the Span's content.
Span.root
The token with the shortest path to the root of the sentence (or the root
itself). If multiple tokens are equally high in the tree, the first token is
taken.
Example
doc = nlp("I like New York in Autumn.")
i, like, new, york, in_, autumn, dot = range(len(doc))
assert doc[new].head.text == "York"
assert doc[york].head.text == "like"
new_york = doc[new:york+1]
assert new_york.root.text == "York"
Name
Type
Description
RETURNS
Token
The root token.
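The rule described above (shortest path to the sentence root, first token wins on a tie) can be sketched over a list of head indices. This is a plain-Python illustration under the assumption that the root token is its own head; it is not how spaCy computes the value internally, and the helper names are hypothetical.

```python
def depth(heads, i):
    """Number of head links between token i and the sentence root."""
    steps = 0
    while heads[i] != i:
        i = heads[i]
        steps += 1
    return steps


def span_root(heads, start, end):
    """Index of the span token closest to the root; min() keeps the first on ties."""
    return min(range(start, end), key=lambda i: depth(heads, i))


# "I like New York in Autumn ." with an assumed parse:
# I->like, like=root, New->York, York->like, in->York, Autumn->in, .->like
words = ["I", "like", "New", "York", "in", "Autumn", "."]
heads = [1, 1, 3, 1, 3, 4, 1]
print(words[span_root(heads, 2, 4)])  # York
```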
Span.conjuncts
A tuple of tokens coordinated to span.root.
Example
doc = nlp("I like apples and oranges")
apples_conjuncts = doc[2:3].conjuncts
assert [t.text for t in apples_conjuncts] == ["oranges"]
Name
Type
Description
RETURNS
tuple
The coordinated tokens.
Span.lefts
Tokens that are to the left of the span, whose heads are within the span.
Example
doc = nlp("I like New York in Autumn.")
lefts = [t.text for t in doc[3:7].lefts]
assert lefts == ["New"]
Name
Type
Description
YIELDS
Token
A left-child of a token of the span.
Span.rights
Tokens that are to the right of the span, whose heads are within the span.
Example
doc = nlp("I like New York in Autumn.")
rights = [t.text for t in doc[2:4].rights]
assert rights == ["in"]
Name
Type
Description
YIELDS
Token
A right-child of a token of the span.
Span.n_lefts
The number of tokens that are to the left of the span, whose heads are within
the span.
Example
doc = nlp("I like New York in Autumn.")
assert doc[3:7].n_lefts == 1
Name
Type
Description
RETURNS
int
The number of left-child tokens.
Span.n_rights
The number of tokens that are to the right of the span, whose heads are within
the span.
Example
doc = nlp("I like New York in Autumn.")
assert doc[2:4].n_rights == 1
Name
Type
Description
RETURNS
int
The number of right-child tokens.
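The definitions of lefts, rights, n_lefts and n_rights all reduce to one test: the token sits outside the span on the given side, and its head sits inside the span. A minimal plain-Python sketch over head indices, assuming a hand-written parse of the example sentence; the function names are hypothetical.

```python
def span_lefts(heads, start, end):
    """Indices of tokens before the span whose head is inside the span."""
    return [i for i in range(0, start) if start <= heads[i] < end]


def span_rights(heads, start, end):
    """Indices of tokens after the span whose head is inside the span."""
    return [i for i in range(end, len(heads)) if start <= heads[i] < end]


# "I like New York in Autumn ." with an assumed parse:
# I->like, like=root, New->York, York->like, in->York, Autumn->in, .->like
words = ["I", "like", "New", "York", "in", "Autumn", "."]
heads = [1, 1, 3, 1, 3, 4, 1]

print([words[i] for i in span_lefts(heads, 3, 7)])   # ['New']
print([words[i] for i in span_rights(heads, 2, 4)])  # ['in']
print(len(span_lefts(heads, 3, 7)), len(span_rights(heads, 2, 4)))  # 1 1
```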
Span.subtree
Tokens within the span and tokens which descend from them.
Example
doc = nlp("Give it back! He pleaded.")
subtree = [t.text for t in doc[:3].subtree]
assert subtree == ["Give", "it", "back", "!"]
Name
Type
Description
YIELDS
Token
A token within the span, or a descendant from it.
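Put differently, a token belongs to the subtree if it is in the span or if following its chain of heads eventually reaches a token in the span. The sketch below illustrates this with plain Python and an assumed parse of the example text (each root is its own head); span_subtree is a hypothetical helper, not spaCy's implementation.

```python
def span_subtree(heads, start, end):
    """Indices of tokens that are in the span or descend from a span token."""
    result = []
    for i in range(len(heads)):
        node = i
        while True:
            if start <= node < end:
                result.append(i)  # reached the span by walking up the tree
                break
            if heads[node] == node:
                break  # reached a sentence root without entering the span
            node = heads[node]
    return result


# "Give it back ! He pleaded ." with an assumed parse:
# it, back, ! -> Give (root); He, . -> pleaded (root)
words = ["Give", "it", "back", "!", "He", "pleaded", "."]
heads = [0, 0, 0, 0, 5, 5, 5]
print([words[i] for i in span_subtree(heads, 0, 3)])  # ['Give', 'it', 'back', '!']
```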
Span.has_vector
A boolean value indicating whether a word vector is associated with the object.
Example
doc = nlp("I like apples")
assert doc[1:].has_vector
Name
Type
Description
RETURNS
bool
Whether the span has vector data attached.
Span.vector
A real-valued meaning representation. Defaults to an average of the token
vectors.
Example
doc = nlp("I like apples")
assert doc[1:].vector.dtype == "float32"
assert doc[1:].vector.shape == (300,)
Name
Type
Description
RETURNS
numpy.ndarray[ndim=1, dtype="float32"]
A 1D numpy array representing the span's semantics.
Span.vector_norm
The L2 norm of the span's vector representation.
Example
doc = nlp("I like apples")
doc[1:].vector_norm  # 4.800883928527915
doc[2:].vector_norm  # 6.895897646384268
assert doc[1:].vector_norm != doc[2:].vector_norm
Name
Type
Description
RETURNS
float
The L2 norm of the vector representation.
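As a worked illustration of the default span vector and its L2 norm, the sketch below averages two made-up token vectors and takes the Euclidean length of the result. It uses plain Python lists rather than numpy arrays, and the helper names and vector values are assumptions chosen to give round numbers.

```python
import math


def mean_vector(vectors):
    """Average a list of equal-length vectors component by component."""
    return [sum(column) / len(vectors) for column in zip(*vectors)]


def l2_norm(vector):
    """Euclidean length: the square root of the sum of squared components."""
    return math.sqrt(sum(x * x for x in vector))


# Two toy token vectors (assumed values, not real model vectors).
token_vectors = [[6.0, 0.0], [0.0, 8.0]]
span_vector = mean_vector(token_vectors)
print(span_vector)           # [3.0, 4.0]
print(l2_norm(span_vector))  # 5.0
```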
Attributes
Name
Type
Description
doc
Doc
The parent document.
tensor (new in v2.1.7)
ndarray
The span's slice of the parent Doc's tensor.
sent
Span
The sentence span that this span is a part of.
start
int
The token offset for the start of the span.
end
int
The token offset for the end of the span.
start_char
int
The character offset for the start of the span.
end_char
int
The character offset for the end of the span.
text
str
A string representation of the span text.
text_with_ws
str
The text content of the span with a trailing whitespace character if the last token has one.
orth
int
ID of the verbatim text content.
orth_
str
Verbatim text content (identical to Span.text). Exists mostly for consistency with the other attributes.
label
int
The hash value of the span's label.
label_
str
The span's label.
lemma_
str
The span's lemma.
kb_id
int
The hash value of the knowledge base ID referred to by the span.
kb_id_
str
The knowledge base ID referred to by the span.
ent_id
int
The hash value of the named entity the span is an instance of.
ent_id_
str
The string ID of the named entity the span is an instance of.
sentiment
float
A scalar value indicating the positivity or negativity of the span.