mirror of
https://github.com/explosion/spaCy.git
synced 2024-12-27 10:26:35 +03:00
604a7c3c26
* `SpanGroup`: Add support for binding copies to a new reference document * `SpanGroups`: Replace superfluous serialize-deserialize roundtrip in `copy` Instead, directly copy the in-memory representations of the constituent `SpanGroup`s. * Update `SpanGroup.copy()` signature * Rename `new_doc` param to `doc` * Fix kwdarg * Update `.pyi` file and docstrings * `mypy` fix * Update spacy/tokens/span_group.pyx * Update docs Co-authored-by: Adriane Boyd <adrianeboyd@gmail.com>
300 lines
10 KiB
Markdown
300 lines
10 KiB
Markdown
---
|
|
title: SpanGroup
|
|
tag: class
|
|
source: spacy/tokens/span_group.pyx
|
|
new: 3
|
|
---
|
|
|
|
A group of arbitrary, potentially overlapping [`Span`](/api/span) objects that
|
|
all belong to the same [`Doc`](/api/doc) object. The group can be named, and you
|
|
can attach additional attributes to it. Span groups are generally accessed via
|
|
the [`Doc.spans`](/api/doc#spans) attribute, which will convert lists of spans
|
|
into a `SpanGroup` object for you automatically on assignment. `SpanGroup`
|
|
objects behave similar to `list`s, so you can append `Span` objects to them or
|
|
access a member at a given index.
|
|
|
|
## SpanGroup.\_\_init\_\_ {#init tag="method"}
|
|
|
|
Create a `SpanGroup`.
|
|
|
|
> #### Example
|
|
>
|
|
> ```python
|
|
> doc = nlp("Their goi ng home")
|
|
> spans = [doc[0:1], doc[1:3]]
|
|
>
|
|
> # Construction 1
|
|
> from spacy.tokens import SpanGroup
|
|
>
|
|
> group = SpanGroup(doc, name="errors", spans=spans, attrs={"annotator": "matt"})
|
|
> doc.spans["errors"] = group
|
|
>
|
|
> # Construction 2
|
|
> doc.spans["errors"] = spans
|
|
> assert isinstance(doc.spans["errors"], SpanGroup)
|
|
> ```
|
|
|
|
| Name | Description |
|
|
| -------------- | ---------------------------------------------------------------------------------------------------------------------------------------------------- |
|
|
| `doc` | The document the span group belongs to. ~~Doc~~ |
|
|
| _keyword-only_ | |
|
|
| `name` | The name of the span group. If the span group is created automatically on assignment to `doc.spans`, the key name is used. Defaults to `""`. ~~str~~ |
|
|
| `attrs` | Optional JSON-serializable attributes to attach to the span group. ~~Dict[str, Any]~~ |
|
|
| `spans` | The spans to add to the span group. ~~Iterable[Span]~~ |
|
|
|
|
## SpanGroup.doc {#doc tag="property"}
|
|
|
|
The [`Doc`](/api/doc) object the span group is referring to.
|
|
|
|
<Infobox title="SpanGroup and Doc lifecycle" variant="warning">
|
|
|
|
When a `Doc` object is garbage collected, any related `SpanGroup` object won't
|
|
be functional anymore, as these objects use a `weakref` to refer to the
|
|
document. An error will be raised as the internal `doc` object will be `None`.
|
|
To avoid this, make sure that the original `Doc` objects are still available in
|
|
the scope of your function.
|
|
|
|
</Infobox>
|
|
|
|
> #### Example
|
|
>
|
|
> ```python
|
|
> doc = nlp("Their goi ng home")
|
|
> doc.spans["errors"] = [doc[0:1], doc[1:3]]
|
|
> assert doc.spans["errors"].doc == doc
|
|
> ```
|
|
|
|
| Name | Description |
|
|
| ----------- | ------------------------------- |
|
|
| **RETURNS** | The reference document. ~~Doc~~ |
|
|
|
|
## SpanGroup.has_overlap {#has_overlap tag="property"}
|
|
|
|
Check whether the span group contains overlapping spans.
|
|
|
|
> #### Example
|
|
>
|
|
> ```python
|
|
> doc = nlp("Their goi ng home")
|
|
> doc.spans["errors"] = [doc[0:1], doc[1:3]]
|
|
> assert not doc.spans["errors"].has_overlap
|
|
> doc.spans["errors"].append(doc[2:4])
|
|
> assert doc.spans["errors"].has_overlap
|
|
> ```
|
|
|
|
| Name | Description |
|
|
| ----------- | -------------------------------------------------- |
|
|
| **RETURNS** | Whether the span group contains overlaps. ~~bool~~ |
|
|
|
|
## SpanGroup.\_\_len\_\_ {#len tag="method"}
|
|
|
|
Get the number of spans in the group.
|
|
|
|
> #### Example
|
|
>
|
|
> ```python
|
|
> doc = nlp("Their goi ng home")
|
|
> doc.spans["errors"] = [doc[0:1], doc[1:3]]
|
|
> assert len(doc.spans["errors"]) == 2
|
|
> ```
|
|
|
|
| Name | Description |
|
|
| ----------- | ----------------------------------------- |
|
|
| **RETURNS** | The number of spans in the group. ~~int~~ |
|
|
|
|
## SpanGroup.\_\_getitem\_\_ {#getitem tag="method"}
|
|
|
|
Get a span from the group. Note that a copy of the span is returned, so if any
|
|
changes are made to this span, they are not reflected in the corresponding
|
|
member of the span group. The item or group will need to be reassigned for
|
|
changes to be reflected in the span group.
|
|
|
|
> #### Example
|
|
>
|
|
> ```python
|
|
> doc = nlp("Their goi ng home")
|
|
> doc.spans["errors"] = [doc[0:1], doc[1:3]]
|
|
> span = doc.spans["errors"][1]
|
|
> assert span.text == "goi ng"
|
|
> span.label_ = 'LABEL'
|
|
> assert doc.spans["errors"][1].label_ != 'LABEL' # The span within the group was not updated
|
|
> ```
|
|
|
|
| Name | Description |
|
|
| ----------- | ------------------------------------- |
|
|
| `i` | The item index. ~~int~~ |
|
|
| **RETURNS** | The span at the given index. ~~Span~~ |
|
|
|
|
## SpanGroup.\_\_setitem\_\_ {#setitem tag="method", new="3.3"}
|
|
|
|
Set a span in the span group.
|
|
|
|
> #### Example
|
|
>
|
|
> ```python
|
|
> doc = nlp("Their goi ng home")
|
|
> doc.spans["errors"] = [doc[0:1], doc[1:3]]
|
|
> span = doc[0:2]
|
|
> doc.spans["errors"][0] = span
|
|
> assert doc.spans["errors"][0].text == "Their goi"
|
|
> ```
|
|
|
|
| Name | Description |
|
|
| ------ | ----------------------- |
|
|
| `i` | The item index. ~~int~~ |
|
|
| `span` | The new value. ~~Span~~ |
|
|
|
|
## SpanGroup.\_\_delitem\_\_ {#delitem tag="method", new="3.3"}
|
|
|
|
Delete a span from the span group.
|
|
|
|
> #### Example
|
|
>
|
|
> ```python
|
|
> doc = nlp("Their goi ng home")
|
|
> doc.spans["errors"] = [doc[0:1], doc[1:3]]
|
|
> del doc.spans[0]
|
|
> assert len(doc.spans["errors"]) == 1
|
|
> ```
|
|
|
|
| Name | Description |
|
|
| ---- | ----------------------- |
|
|
| `i` | The item index. ~~int~~ |
|
|
|
|
## SpanGroup.\_\_add\_\_ {#add tag="method", new="3.3"}
|
|
|
|
Concatenate the current span group with another span group and return the result
|
|
in a new span group. Any `attrs` from the first span group will have precedence
|
|
over `attrs` in the second.
|
|
|
|
> #### Example
|
|
>
|
|
> ```python
|
|
> doc = nlp("Their goi ng home")
|
|
> doc.spans["errors"] = [doc[0:1], doc[1:3]]
|
|
> doc.spans["other"] = [doc[0:2], doc[2:4]]
|
|
> span_group = doc.spans["errors"] + doc.spans["other"]
|
|
> assert len(span_group) == 4
|
|
> ```
|
|
|
|
| Name | Description |
|
|
| ----------- | ---------------------------------------------------------------------------- |
|
|
| `other` | The span group or spans to concatenate. ~~Union[SpanGroup, Iterable[Span]]~~ |
|
|
| **RETURNS** | The new span group. ~~SpanGroup~~ |
|
|
|
|
## SpanGroup.\_\_iadd\_\_ {#iadd tag="method", new="3.3"}
|
|
|
|
Append an iterable of spans or the content of a span group to the current span
|
|
group. Any `attrs` in the other span group will be added for keys that are not
|
|
already present in the current span group.
|
|
|
|
> #### Example
|
|
>
|
|
> ```python
|
|
> doc = nlp("Their goi ng home")
|
|
> doc.spans["errors"] = [doc[0:1], doc[1:3]]
|
|
> doc.spans["errors"] += [doc[3:4], doc[2:3]]
|
|
> assert len(doc.spans["errors"]) == 4
|
|
> ```
|
|
|
|
| Name | Description |
|
|
| ----------- | ----------------------------------------------------------------------- |
|
|
| `other` | The span group or spans to append. ~~Union[SpanGroup, Iterable[Span]]~~ |
|
|
| **RETURNS** | The span group. ~~SpanGroup~~ |
|
|
|
|
## SpanGroup.append {#append tag="method"}
|
|
|
|
Add a [`Span`](/api/span) object to the group. The span must refer to the same
|
|
[`Doc`](/api/doc) object as the span group.
|
|
|
|
> #### Example
|
|
>
|
|
> ```python
|
|
> doc = nlp("Their goi ng home")
|
|
> doc.spans["errors"] = [doc[0:1]]
|
|
> doc.spans["errors"].append(doc[1:3])
|
|
> assert len(doc.spans["errors"]) == 2
|
|
> ```
|
|
|
|
| Name | Description |
|
|
| ------ | ---------------------------- |
|
|
| `span` | The span to append. ~~Span~~ |
|
|
|
|
## SpanGroup.extend {#extend tag="method"}
|
|
|
|
Add multiple [`Span`](/api/span) objects or contents of another `SpanGroup` to
|
|
the group. All spans must refer to the same [`Doc`](/api/doc) object as the span
|
|
group.
|
|
|
|
> #### Example
|
|
>
|
|
> ```python
|
|
> doc = nlp("Their goi ng home")
|
|
> doc.spans["errors"] = []
|
|
> doc.spans["errors"].extend([doc[1:3], doc[0:1]])
|
|
> assert len(doc.spans["errors"]) == 2
|
|
> span_group = SpanGroup(doc, spans=[doc[1:4], doc[0:3]])
|
|
> doc.spans["errors"].extend(span_group)
|
|
> ```
|
|
|
|
| Name | Description |
|
|
| ------- | -------------------------------------------------------- |
|
|
| `spans` | The spans to add. ~~Union[SpanGroup, Iterable["Span"]]~~ |
|
|
|
|
## SpanGroup.copy {#copy tag="method", new="3.3"}
|
|
|
|
Return a copy of the span group.
|
|
|
|
> #### Example
|
|
>
|
|
> ```python
|
|
> from spacy.tokens import SpanGroup
|
|
>
|
|
> doc = nlp("Their goi ng home")
|
|
> doc.spans["errors"] = [doc[1:3], doc[0:3]]
|
|
> new_group = doc.spans["errors"].copy()
|
|
> ```
|
|
|
|
| Name | Description |
|
|
| ----------- | -------------------------------------------------------------------------------------------------- |
|
|
| `doc` | The document to which the copy is bound. Defaults to `None` for the current doc. ~~Optional[Doc]~~ |
|
|
| **RETURNS** | A copy of the `SpanGroup` object. ~~SpanGroup~~ |
|
|
|
|
## SpanGroup.to_bytes {#to_bytes tag="method"}
|
|
|
|
Serialize the span group to a bytestring.
|
|
|
|
> #### Example
|
|
>
|
|
> ```python
|
|
> doc = nlp("Their goi ng home")
|
|
> doc.spans["errors"] = [doc[0:1], doc[1:3]]
|
|
> group_bytes = doc.spans["errors"].to_bytes()
|
|
> ```
|
|
|
|
| Name | Description |
|
|
| ----------- | ------------------------------------- |
|
|
| **RETURNS** | The serialized `SpanGroup`. ~~bytes~~ |
|
|
|
|
## SpanGroup.from_bytes {#from_bytes tag="method"}
|
|
|
|
Load the span group from a bytestring. Modifies the object in place and returns
|
|
it.
|
|
|
|
> #### Example
|
|
>
|
|
> ```python
|
|
> from spacy.tokens import SpanGroup
|
|
>
|
|
> doc = nlp("Their goi ng home")
|
|
> doc.spans["errors"] = [doc[0:1], doc[1:3]]
|
|
> group_bytes = doc.spans["errors"].to_bytes()
|
|
> new_group = SpanGroup()
|
|
> new_group.from_bytes(group_bytes)
|
|
> ```
|
|
|
|
| Name | Description |
|
|
| ------------ | ------------------------------------- |
|
|
| `bytes_data` | The data to load from. ~~bytes~~ |
|
|
| **RETURNS** | The `SpanGroup` object. ~~SpanGroup~~ |
|