mirror of
				https://github.com/explosion/spaCy.git
				synced 2025-10-25 13:11:03 +03:00 
			
		
		
		
	
		
			
				
	
	
		
			491 lines
		
	
	
		
			21 KiB
		
	
	
	
		
			Markdown
		
	
	
	
	
	
			
		
		
	
	
			491 lines
		
	
	
		
			21 KiB
		
	
	
	
		
			Markdown
		
	
	
	
	
	
| ---
 | |
| title: Span
 | |
| tag: class
 | |
| source: spacy/tokens/span.pyx
 | |
| ---
 | |
| 
 | |
| A slice from a [`Doc`](/api/doc) object.
 | |
| 
 | |
| ## Span.\_\_init\_\_ {#init tag="method"}
 | |
| 
 | |
| Create a Span object from the slice `doc[start : end]`.
 | |
| 
 | |
| > #### Example
 | |
| >
 | |
| > ```python
 | |
| > doc = nlp("Give it back! He pleaded.")
 | |
| > span = doc[1:4]
 | |
| > assert [t.text for t in span] ==  ["it", "back", "!"]
 | |
| > ```
 | |
| 
 | |
| | Name        | Type                                     | Description                                                                                                       |
 | |
| | ----------- | ---------------------------------------- | ----------------------------------------------------------------------------------------------------------------- |
 | |
| | `doc`       | `Doc`                                    | The parent document.                                                                                              |
 | |
| | `start`     | int                                      | The index of the first token of the span.                                                                         |
 | |
| | `end`       | int                                      | The index of the first token after the span.                                                                      |
 | |
| | `label`     | int / unicode                            | A label to attach to the span, e.g. for named entities. As of v2.1, the label can also be a unicode string.       |
 | |
| | `kb_id`     | int / unicode                            | A knowledge base ID to attach to the span, e.g. for named entities. The ID can be an integer or a unicode string. |
 | |
| | `vector`    | `numpy.ndarray[ndim=1, dtype='float32']` | A meaning representation of the span.                                                                             |
 | |
| | **RETURNS** | `Span`                                   | The newly constructed object.                                                                                     |
 | |
| 
 | |
| ## Span.\_\_getitem\_\_ {#getitem tag="method"}
 | |
| 
 | |
| Get a `Token` object.
 | |
| 
 | |
| > #### Example
 | |
| >
 | |
| > ```python
 | |
| > doc = nlp("Give it back! He pleaded.")
 | |
| > span = doc[1:4]
 | |
| > assert span[1].text == "back"
 | |
| > ```
 | |
| 
 | |
| | Name        | Type    | Description                             |
 | |
| | ----------- | ------- | --------------------------------------- |
 | |
| | `i`         | int     | The index of the token within the span. |
 | |
| | **RETURNS** | `Token` | The token at `span[i]`.                 |
 | |
| 
 | |
| Get a `Span` object.
 | |
| 
 | |
| > #### Example
 | |
| >
 | |
| > ```python
 | |
| > doc = nlp("Give it back! He pleaded.")
 | |
| > span = doc[1:4]
 | |
| > assert span[1:3].text == "back!"
 | |
| > ```
 | |
| 
 | |
| | Name        | Type   | Description                      |
 | |
| | ----------- | ------ | -------------------------------- |
 | |
| | `start_end` | tuple  | The slice of the span to get.    |
 | |
| | **RETURNS** | `Span` | The span at `span[start : end]`. |
 | |
| 
 | |
| ## Span.\_\_iter\_\_ {#iter tag="method"}
 | |
| 
 | |
| Iterate over `Token` objects.
 | |
| 
 | |
| > #### Example
 | |
| >
 | |
| > ```python
 | |
| > doc = nlp("Give it back! He pleaded.")
 | |
| > span = doc[1:4]
 | |
| > assert [t.text for t in span] == ["it", "back", "!"]
 | |
| > ```
 | |
| 
 | |
| | Name       | Type    | Description       |
 | |
| | ---------- | ------- | ----------------- |
 | |
| | **YIELDS** | `Token` | A `Token` object. |
 | |
| 
 | |
| ## Span.\_\_len\_\_ {#len tag="method"}
 | |
| 
 | |
| Get the number of tokens in the span.
 | |
| 
 | |
| > #### Example
 | |
| >
 | |
| > ```python
 | |
| > doc = nlp("Give it back! He pleaded.")
 | |
| > span = doc[1:4]
 | |
| > assert len(span) == 3
 | |
| > ```
 | |
| 
 | |
| | Name        | Type | Description                       |
 | |
| | ----------- | ---- | --------------------------------- |
 | |
| | **RETURNS** | int  | The number of tokens in the span. |
 | |
| 
 | |
| ## Span.set_extension {#set_extension tag="classmethod" new="2"}
 | |
| 
 | |
| Define a custom attribute on the `Span` which becomes available via `Span._`.
 | |
| For details, see the documentation on
 | |
| [custom attributes](/usage/processing-pipelines#custom-components-attributes).
 | |
| 
 | |
| > #### Example
 | |
| >
 | |
| > ```python
 | |
| > from spacy.tokens import Span
 | |
| > city_getter = lambda span: any(city in span.text for city in ("New York", "Paris", "Berlin"))
 | |
| > Span.set_extension("has_city", getter=city_getter)
 | |
| > doc = nlp("I like New York in Autumn")
 | |
| > assert doc[1:4]._.has_city
 | |
| > ```
 | |
| 
 | |
| | Name      | Type     | Description                                                                                                                           |
 | |
| | --------- | -------- | ------------------------------------------------------------------------------------------------------------------------------------- |
 | |
| | `name`    | unicode  | Name of the attribute to set by the extension. For example, `'my_attr'` will be available as `span._.my_attr`.                        |
 | |
| | `default` | -        | Optional default value of the attribute if no getter or method is defined.                                                            |
 | |
| | `method`  | callable | Set a custom method on the object, for example `span._.compare(other_span)`.                                                          |
 | |
| | `getter`  | callable | Getter function that takes the object and returns an attribute value. Is called when the user accesses the `._` attribute.            |
 | |
| | `setter`  | callable | Setter function that takes the `Span` and a value, and modifies the object. Is called when the user writes to the `Span._` attribute. |
 | |
| | `force`   | bool     | Force overwriting existing attribute.                                                                                                 |
 | |
| 
 | |
| ## Span.get_extension {#get_extension tag="classmethod" new="2"}
 | |
| 
 | |
| Look up a previously registered extension by name. Returns a 4-tuple
 | |
| `(default, method, getter, setter)` if the extension is registered. Raises a
 | |
| `KeyError` otherwise.
 | |
| 
 | |
| > #### Example
 | |
| >
 | |
| > ```python
 | |
| > from spacy.tokens import Span
 | |
| > Span.set_extension("is_city", default=False)
 | |
| > extension = Span.get_extension("is_city")
 | |
| > assert extension == (False, None, None, None)
 | |
| > ```
 | |
| 
 | |
| | Name        | Type    | Description                                                   |
 | |
| | ----------- | ------- | ------------------------------------------------------------- |
 | |
| | `name`      | unicode | Name of the extension.                                        |
 | |
| | **RETURNS** | tuple   | A `(default, method, getter, setter)` tuple of the extension. |
 | |
| 
 | |
| ## Span.has_extension {#has_extension tag="classmethod" new="2"}
 | |
| 
 | |
| Check whether an extension has been registered on the `Span` class.
 | |
| 
 | |
| > #### Example
 | |
| >
 | |
| > ```python
 | |
| > from spacy.tokens import Span
 | |
| > Span.set_extension("is_city", default=False)
 | |
| > assert Span.has_extension("is_city")
 | |
| > ```
 | |
| 
 | |
| | Name        | Type    | Description                                |
 | |
| | ----------- | ------- | ------------------------------------------ |
 | |
| | `name`      | unicode | Name of the extension to check.            |
 | |
| | **RETURNS** | bool    | Whether the extension has been registered. |
 | |
| 
 | |
| ## Span.remove_extension {#remove_extension tag="classmethod" new="2.0.12"}
 | |
| 
 | |
| Remove a previously registered extension.
 | |
| 
 | |
| > #### Example
 | |
| >
 | |
| > ```python
 | |
| > from spacy.tokens import Span
 | |
| > Span.set_extension("is_city", default=False)
 | |
| > removed = Span.remove_extension("is_city")
 | |
| > assert not Span.has_extension("is_city")
 | |
| > ```
 | |
| 
 | |
| | Name        | Type    | Description                                                           |
 | |
| | ----------- | ------- | --------------------------------------------------------------------- |
 | |
| | `name`      | unicode | Name of the extension.                                                |
 | |
| | **RETURNS** | tuple   | A `(default, method, getter, setter)` tuple of the removed extension. |
 | |
| 
 | |
| ## Span.similarity {#similarity tag="method" model="vectors"}
 | |
| 
 | |
| Make a semantic similarity estimate. The default estimate is cosine similarity
 | |
| using an average of word vectors.
 | |
| 
 | |
| > #### Example
 | |
| >
 | |
| > ```python
 | |
| > doc = nlp("green apples and red oranges")
 | |
| > green_apples = doc[:2]
 | |
| > red_oranges = doc[3:]
 | |
| > apples_oranges = green_apples.similarity(red_oranges)
 | |
| > oranges_apples = red_oranges.similarity(green_apples)
 | |
| > assert apples_oranges == oranges_apples
 | |
| > ```
 | |
| 
 | |
| | Name        | Type  | Description                                                                                  |
 | |
| | ----------- | ----- | -------------------------------------------------------------------------------------------- |
 | |
| | `other`     | -     | The object to compare with. By default, accepts `Doc`, `Span`, `Token` and `Lexeme` objects. |
 | |
| | **RETURNS** | float | A scalar similarity score. Higher is more similar.                                           |
 | |
| 
 | |
| ## Span.get_lca_matrix {#get_lca_matrix tag="method"}
 | |
| 
 | |
| Calculates the lowest common ancestor matrix for a given `Span`. Returns LCA
 | |
| matrix containing the integer index of the ancestor, or `-1` if no common
 | |
| ancestor is found, e.g. if span excludes a necessary ancestor.
 | |
| 
 | |
| > #### Example
 | |
| >
 | |
| > ```python
 | |
| > doc = nlp("I like New York in Autumn")
 | |
| > span = doc[1:4]
 | |
| > matrix = span.get_lca_matrix()
 | |
| > # array([[0, 0, 0], [0, 1, 2], [0, 2, 2]], dtype=int32)
 | |
| > ```
 | |
| 
 | |
| | Name        | Type                                   | Description                                      |
 | |
| | ----------- | -------------------------------------- | ------------------------------------------------ |
 | |
| | **RETURNS** | `numpy.ndarray[ndim=2, dtype='int32']` | The lowest common ancestor matrix of the `Span`. |
 | |
| 
 | |
| ## Span.to_array {#to_array tag="method" new="2"}
 | |
| 
 | |
| Given a list of `M` attribute IDs, export the tokens to a numpy `ndarray` of
 | |
| shape `(N, M)`, where `N` is the length of the document. The values will be
 | |
| 32-bit integers.
 | |
| 
 | |
| > #### Example
 | |
| >
 | |
| > ```python
 | |
| > from spacy.attrs import LOWER, POS, ENT_TYPE, IS_ALPHA
 | |
| > doc = nlp("I like New York in Autumn.")
 | |
| > span = doc[2:3]
 | |
| > # All strings mapped to integers, for easy export to numpy
 | |
| > np_array = span.to_array([LOWER, POS, ENT_TYPE, IS_ALPHA])
 | |
| > ```
 | |
| 
 | |
| | Name        | Type                          | Description                                                                                              |
 | |
| | ----------- | ----------------------------- | -------------------------------------------------------------------------------------------------------- |
 | |
| | `attr_ids`  | list                          | A list of attribute ID ints.                                                                             |
 | |
| | **RETURNS** | `numpy.ndarray[long, ndim=2]` | A feature matrix, with one row per word, and one column per attribute indicated in the input `attr_ids`. |
 | |
| 
 | |
| ## Span.merge {#merge tag="method"}
 | |
| 
 | |
| <Infobox title="Deprecation note" variant="danger">
 | |
| 
 | |
| As of v2.1.0, `Span.merge` still works but is considered deprecated. You should
 | |
| use the new and less error-prone [`Doc.retokenize`](/api/doc#retokenize)
 | |
| instead.
 | |
| 
 | |
| </Infobox>
 | |
| 
 | |
| Retokenize the document, such that the span is merged into a single token.
 | |
| 
 | |
| > #### Example
 | |
| >
 | |
| > ```python
 | |
| > doc = nlp("I like New York in Autumn.")
 | |
| > span = doc[2:4]
 | |
| > span.merge()
 | |
| > assert len(doc) == 6
 | |
| > assert doc[2].text == "New York"
 | |
| > ```
 | |
| 
 | |
| | Name           | Type    | Description                                                                                                               |
 | |
| | -------------- | ------- | ------------------------------------------------------------------------------------------------------------------------- |
 | |
| | `**attributes` | -       | Attributes to assign to the merged token. By default, attributes are inherited from the syntactic root token of the span. |
 | |
| | **RETURNS**    | `Token` | The newly merged token.                                                                                                   |
 | |
| 
 | |
| ## Span.ents {#ents tag="property" new="2.0.13" model="ner"}
 | |
| 
 | |
| The named entities in the span. Returns a tuple of named entity `Span` objects,
 | |
| if the entity recognizer has been applied.
 | |
| 
 | |
| > #### Example
 | |
| >
 | |
| > ```python
 | |
| > doc = nlp("Mr. Best flew to New York on Saturday morning.")
 | |
| > span = doc[0:6]
 | |
| > ents = list(span.ents)
 | |
| > assert ents[0].label == 346
 | |
| > assert ents[0].label_ == "PERSON"
 | |
| > assert ents[0].text == "Mr. Best"
 | |
| > ```
 | |
| 
 | |
| | Name        | Type  | Description                                  |
 | |
| | ----------- | ----- | -------------------------------------------- |
 | |
| | **RETURNS** | tuple | Entities in the span, one `Span` per entity. |
 | |
| 
 | |
| ## Span.as_doc {#as_doc tag="method"}
 | |
| 
 | |
| Create a new `Doc` object corresponding to the `Span`, with a copy of the data.
 | |
| 
 | |
| > #### Example
 | |
| >
 | |
| > ```python
 | |
| > doc = nlp("I like New York in Autumn.")
 | |
| > span = doc[2:4]
 | |
| > doc2 = span.as_doc()
 | |
| > assert doc2.text == "New York"
 | |
| > ```
 | |
| 
 | |
| | Name              | Type  | Description                                          |
 | |
| | ----------------- | ----- | ---------------------------------------------------- |
 | |
| | `copy_user_data`  | bool  | Whether or not to copy the original doc's user data. |
 | |
| | **RETURNS**       | `Doc` | A `Doc` object of the `Span`'s content.              |
 | |
| 
 | |
| ## Span.root {#root tag="property" model="parser"}
 | |
| 
 | |
| The token with the shortest path to the root of the sentence (or the root
 | |
| itself). If multiple tokens are equally high in the tree, the first token is
 | |
| taken.
 | |
| 
 | |
| > #### Example
 | |
| >
 | |
| > ```python
 | |
| > doc = nlp("I like New York in Autumn.")
 | |
| > i, like, new, york, in_, autumn, dot = range(len(doc))
 | |
| > assert doc[new].head.text == "York"
 | |
| > assert doc[york].head.text == "like"
 | |
| > new_york = doc[new:york+1]
 | |
| > assert new_york.root.text == "York"
 | |
| > ```
 | |
| 
 | |
| | Name        | Type    | Description     |
 | |
| | ----------- | ------- | --------------- |
 | |
| | **RETURNS** | `Token` | The root token. |
 | |
| 
 | |
| ## Span.conjuncts {#conjuncts tag="property" model="parser"}
 | |
| 
 | |
| A tuple of tokens coordinated to `span.root`.
 | |
| 
 | |
| > #### Example
 | |
| >
 | |
| > ```python
 | |
| > doc = nlp("I like apples and oranges")
 | |
| > apples_conjuncts = doc[2:3].conjuncts
 | |
| > assert [t.text for t in apples_conjuncts] == ["oranges"]
 | |
| > ```
 | |
| 
 | |
| | Name        | Type    | Description             |
 | |
| | ----------- | ------- | ----------------------- |
 | |
| | **RETURNS** | `tuple` | The coordinated tokens. |
 | |
| 
 | |
| ## Span.lefts {#lefts tag="property" model="parser"}
 | |
| 
 | |
| Tokens that are to the left of the span, whose heads are within the span.
 | |
| 
 | |
| > #### Example
 | |
| >
 | |
| > ```python
 | |
| > doc = nlp("I like New York in Autumn.")
 | |
| > lefts = [t.text for t in doc[3:7].lefts]
 | |
| > assert lefts == ["New"]
 | |
| > ```
 | |
| 
 | |
| | Name       | Type    | Description                          |
 | |
| | ---------- | ------- | ------------------------------------ |
 | |
| | **YIELDS** | `Token` | A left-child of a token of the span. |
 | |
| 
 | |
| ## Span.rights {#rights tag="property" model="parser"}
 | |
| 
 | |
| Tokens that are to the right of the span, whose heads are within the span.
 | |
| 
 | |
| > #### Example
 | |
| >
 | |
| > ```python
 | |
| > doc = nlp("I like New York in Autumn.")
 | |
| > rights = [t.text for t in doc[2:4].rights]
 | |
| > assert rights == ["in"]
 | |
| > ```
 | |
| 
 | |
| | Name       | Type    | Description                           |
 | |
| | ---------- | ------- | ------------------------------------- |
 | |
| | **YIELDS** | `Token` | A right-child of a token of the span. |
 | |
| 
 | |
| ## Span.n_lefts {#n_lefts tag="property" model="parser"}
 | |
| 
 | |
| The number of tokens that are to the left of the span, whose heads are within
 | |
| the span.
 | |
| 
 | |
| > #### Example
 | |
| >
 | |
| > ```python
 | |
| > doc = nlp("I like New York in Autumn.")
 | |
| > assert doc[3:7].n_lefts == 1
 | |
| > ```
 | |
| 
 | |
| | Name        | Type | Description                      |
 | |
| | ----------- | ---- | -------------------------------- |
 | |
| | **RETURNS** | int  | The number of left-child tokens. |
 | |
| 
 | |
| ## Span.n_rights {#n_rights tag="property" model="parser"}
 | |
| 
 | |
| The number of tokens that are to the right of the span, whose heads are within
 | |
| the span.
 | |
| 
 | |
| > #### Example
 | |
| >
 | |
| > ```python
 | |
| > doc = nlp("I like New York in Autumn.")
 | |
| > assert doc[2:4].n_rights == 1
 | |
| > ```
 | |
| 
 | |
| | Name        | Type | Description                       |
 | |
| | ----------- | ---- | --------------------------------- |
 | |
| | **RETURNS** | int  | The number of right-child tokens. |
 | |
| 
 | |
| ## Span.subtree {#subtree tag="property" model="parser"}
 | |
| 
 | |
| Tokens within the span and tokens which descend from them.
 | |
| 
 | |
| > #### Example
 | |
| >
 | |
| > ```python
 | |
| > doc = nlp("Give it back! He pleaded.")
 | |
| > subtree = [t.text for t in doc[:3].subtree]
 | |
| > assert subtree == ["Give", "it", "back", "!"]
 | |
| > ```
 | |
| 
 | |
| | Name       | Type    | Description                                       |
 | |
| | ---------- | ------- | ------------------------------------------------- |
 | |
| | **YIELDS** | `Token` | A token within the span, or a descendant from it. |
 | |
| 
 | |
| ## Span.has_vector {#has_vector tag="property" model="vectors"}
 | |
| 
 | |
| A boolean value indicating whether a word vector is associated with the object.
 | |
| 
 | |
| > #### Example
 | |
| >
 | |
| > ```python
 | |
| > doc = nlp("I like apples")
 | |
| > assert doc[1:].has_vector
 | |
| > ```
 | |
| 
 | |
| | Name        | Type | Description                                  |
 | |
| | ----------- | ---- | -------------------------------------------- |
 | |
| | **RETURNS** | bool | Whether the span has a vector data attached. |
 | |
| 
 | |
| ## Span.vector {#vector tag="property" model="vectors"}
 | |
| 
 | |
| A real-valued meaning representation. Defaults to an average of the token
 | |
| vectors.
 | |
| 
 | |
| > #### Example
 | |
| >
 | |
| > ```python
 | |
| > doc = nlp("I like apples")
 | |
| > assert doc[1:].vector.dtype == "float32"
 | |
| > assert doc[1:].vector.shape == (300,)
 | |
| > ```
 | |
| 
 | |
| | Name        | Type                                     | Description                                         |
 | |
| | ----------- | ---------------------------------------- | --------------------------------------------------- |
 | |
| | **RETURNS** | `numpy.ndarray[ndim=1, dtype='float32']` | A 1D numpy array representing the span's semantics. |
 | |
| 
 | |
| ## Span.vector_norm {#vector_norm tag="property" model="vectors"}
 | |
| 
 | |
| The L2 norm of the span's vector representation.
 | |
| 
 | |
| > #### Example
 | |
| >
 | |
| > ```python
 | |
| > doc = nlp("I like apples")
 | |
| > doc[1:].vector_norm # 4.800883928527915
 | |
| > doc[2:].vector_norm # 6.895897646384268
 | |
| > assert doc[1:].vector_norm != doc[2:].vector_norm
 | |
| > ```
 | |
| 
 | |
| | Name        | Type  | Description                               |
 | |
| | ----------- | ----- | ----------------------------------------- |
 | |
| | **RETURNS** | float | The L2 norm of the vector representation. |
 | |
| 
 | |
| ## Attributes {#attributes}
 | |
| 
 | |
| | Name                                    | Type         | Description                                                                                                    |
 | |
| | --------------------------------------- | ------------ | -------------------------------------------------------------------------------------------------------------- |
 | |
| | `doc`                                   | `Doc`        | The parent document.                                                                                           |
 | |
| | `tensor` <Tag variant="new">2.1.7</Tag> | `ndarray`    | The span's slice of the parent `Doc`'s tensor.                                                                 |
 | |
| | `sent`                                  | `Span`       | The sentence span that this span is a part of.                                                                 |
 | |
| | `start`                                 | int          | The token offset for the start of the span.                                                                    |
 | |
| | `end`                                   | int          | The token offset for the end of the span.                                                                      |
 | |
| | `start_char`                            | int          | The character offset for the start of the span.                                                                |
 | |
| | `end_char`                              | int          | The character offset for the end of the span.                                                                  |
 | |
| | `text`                                  | unicode      | A unicode representation of the span text.                                                                     |
 | |
| | `text_with_ws`                          | unicode      | The text content of the span with a trailing whitespace character if the last token has one.                   |
 | |
| | `orth`                                  | int          | ID of the verbatim text content.                                                                               |
 | |
| | `orth_`                                 | unicode      | Verbatim text content (identical to `Span.text`). Exists mostly for consistency with the other attributes.     |
 | |
| | `label`                                 | int          | The hash value of the span's label.                                                                            |
 | |
| | `label_`                                | unicode      | The span's label.                                                                                              |
 | |
| | `lemma_`                                | unicode      | The span's lemma.                                                                                              |
 | |
| | `kb_id`                                 | int          | The hash value of the knowledge base ID referred to by the span.                                               |
 | |
| | `kb_id_`                                | unicode      | The knowledge base ID referred to by the span.                                                                 |
 | |
| | `ent_id`                                | int          | The hash value of the named entity the token is an instance of.                                                |
 | |
| | `ent_id_`                               | unicode      | The string ID of the named entity the token is an instance of.                                                 |
 | |
| | `sentiment`                             | float        | A scalar value indicating the positivity or negativity of the span.                                            |
 | |
| | `_`                                     | `Underscore` | User space for adding custom [attribute extensions](/usage/processing-pipelines#custom-components-attributes). |
 |