---
title: SpanGroup
tag: class
source: spacy/tokens/span_group.pyx
new: 3
---

A group of arbitrary, potentially overlapping [`Span`](/api/span) objects that
all belong to the same [`Doc`](/api/doc) object. The group can be named, and you
can attach additional attributes to it. Span groups are generally accessed via
the [`Doc.spans`](/api/doc#spans) attribute, which will convert lists of spans
into a `SpanGroup` object for you automatically on assignment. `SpanGroup`
objects behave similarly to `list`s, so you can append `Span` objects to them or
access a member at a given index.

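As a minimal sketch of the list-like behavior described above, the snippet below
assigns a plain list to `doc.spans`, appends a span and reads the members back
by index. It assumes a blank English pipeline created with `spacy.blank("en")`
so that it runs without a trained model; the variable names are illustrative.

```python
import spacy
from spacy.tokens import SpanGroup

# Assumption: a blank English pipeline is enough here, since only the
# tokenizer is needed to create spans.
nlp = spacy.blank("en")
doc = nlp("Their goi ng home")

# Assigning a plain list to doc.spans converts it into a SpanGroup.
doc.spans["errors"] = [doc[0:1]]
group = doc.spans["errors"]
assert isinstance(group, SpanGroup)

# Append a span and read the members back by index, as with a list.
group.append(doc[1:3])
texts = [group[i].text for i in range(len(group))]
print(texts)
```
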
## SpanGroup.\_\_init\_\_ {#init tag="method"}

Create a `SpanGroup`.

> #### Example
>
> ```python
> doc = nlp("Their goi ng home")
> spans = [doc[0:1], doc[1:3]]
>
> # Construction 1
> from spacy.tokens import SpanGroup
>
> group = SpanGroup(doc, name="errors", spans=spans, attrs={"annotator": "matt"})
> doc.spans["errors"] = group
>
> # Construction 2
> doc.spans["errors"] = spans
> assert isinstance(doc.spans["errors"], SpanGroup)
> ```

| Name           | Description                                                                                                                                          |
| -------------- | ---------------------------------------------------------------------------------------------------------------------------------------------------- |
| `doc`          | The document the span group belongs to. ~~Doc~~                                                                                                      |
| _keyword-only_ |                                                                                                                                                      |
| `name`         | The name of the span group. If the span group is created automatically on assignment to `doc.spans`, the key name is used. Defaults to `""`. ~~str~~ |
| `attrs`        | Optional JSON-serializable attributes to attach to the span group. ~~Dict[str, Any]~~                                                                |
| `spans`        | The spans to add to the span group. ~~Iterable[Span]~~                                                                                               |

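The sketch below illustrates how `name` and `attrs` can be read back after
either construction path. It assumes a blank English pipeline from
`spacy.blank("en")`; the `"typos"` key and the variable names are made up for
illustration.

```python
import spacy
from spacy.tokens import SpanGroup

# Assumption: a blank pipeline (tokenizer only) is sufficient for this sketch.
nlp = spacy.blank("en")
doc = nlp("Their goi ng home")

# Explicit construction: name and attrs are passed as keyword arguments.
explicit = SpanGroup(
    doc, name="errors", spans=[doc[0:1]], attrs={"annotator": "matt"}
)

# Implicit construction: the key used on doc.spans becomes the group name.
doc.spans["typos"] = [doc[1:3]]
implicit = doc.spans["typos"]

print(explicit.name, explicit.attrs)
print(implicit.name, len(implicit))
```
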
## SpanGroup.doc {#doc tag="property"}

The [`Doc`](/api/doc) object the span group is referring to.

> #### Example
>
> ```python
> doc = nlp("Their goi ng home")
> doc.spans["errors"] = [doc[0:1], doc[1:3]]
> assert doc.spans["errors"].doc == doc
> ```

| Name        | Description                     |
| ----------- | ------------------------------- |
| **RETURNS** | The reference document. ~~Doc~~ |

## SpanGroup.has_overlap {#has_overlap tag="property"}

Check whether the span group contains overlapping spans.

> #### Example
>
> ```python
> doc = nlp("Their goi ng home")
> doc.spans["errors"] = [doc[0:1], doc[1:3]]
> assert not doc.spans["errors"].has_overlap
> doc.spans["errors"].append(doc[1:2])
> assert doc.spans["errors"].has_overlap
> ```

| Name        | Description                                        |
| ----------- | -------------------------------------------------- |
| **RETURNS** | Whether the span group contains overlaps. ~~bool~~ |

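To make the distinction between adjacent and overlapping spans concrete, here is
a small sketch, again assuming a blank English pipeline from
`spacy.blank("en")`. Two spans that merely touch at a token boundary are not
reported as overlapping, while a span that shares a token with another one is.

```python
import spacy
from spacy.tokens import SpanGroup

# Assumption: a blank pipeline (tokenizer only) is sufficient for this sketch.
nlp = spacy.blank("en")
doc = nlp("Their goi ng home")  # tokens: Their | goi | ng | home

# doc[0:1] and doc[1:3] are adjacent: they share no token.
group = SpanGroup(doc, name="errors", spans=[doc[0:1], doc[1:3]])
before = group.has_overlap

# doc[2:4] shares the token "ng" with doc[1:3].
group.append(doc[2:4])
after = group.has_overlap

print(before, after)
```
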
## SpanGroup.\_\_len\_\_ {#len tag="method"}

Get the number of spans in the group.

> #### Example
>
> ```python
> doc = nlp("Their goi ng home")
> doc.spans["errors"] = [doc[0:1], doc[1:3]]
> assert len(doc.spans["errors"]) == 2
> ```

| Name        | Description                               |
| ----------- | ----------------------------------------- |
| **RETURNS** | The number of spans in the group. ~~int~~ |

## SpanGroup.\_\_getitem\_\_ {#getitem tag="method"}

Get a span from the group.

> #### Example
>
> ```python
> doc = nlp("Their goi ng home")
> doc.spans["errors"] = [doc[0:1], doc[1:3]]
> span = doc.spans["errors"][1]
> assert span.text == "goi ng"
> ```

| Name        | Description                           |
| ----------- | ------------------------------------- |
| `i`         | The item index. ~~int~~               |
| **RETURNS** | The span at the given index. ~~Span~~ |

## SpanGroup.append {#append tag="method"}

Add a [`Span`](/api/span) object to the group. The span must refer to the same
[`Doc`](/api/doc) object as the span group.

> #### Example
>
> ```python
> doc = nlp("Their goi ng home")
> doc.spans["errors"] = [doc[0:1]]
> doc.spans["errors"].append(doc[1:3])
> assert len(doc.spans["errors"]) == 2
> ```

| Name   | Description                  |
| ------ | ---------------------------- |
| `span` | The span to append. ~~Span~~ |

## SpanGroup.extend {#extend tag="method"}

Add multiple [`Span`](/api/span) objects to the group. All spans must refer to
the same [`Doc`](/api/doc) object as the span group.

> #### Example
>
> ```python
> doc = nlp("Their goi ng home")
> doc.spans["errors"] = []
> doc.spans["errors"].extend([doc[1:3], doc[0:1]])
> assert len(doc.spans["errors"]) == 2
> ```

| Name    | Description                          |
| ------- | ------------------------------------ |
| `spans` | The spans to add. ~~Iterable[Span]~~ |

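The following sketch shows `extend` on a group that already holds a span. It
assumes a blank English pipeline from `spacy.blank("en")`, and it relies on the
assumption that the group keeps spans in the order they were added rather than
sorting them by position in the document.

```python
import spacy
from spacy.tokens import SpanGroup

# Assumption: a blank pipeline (tokenizer only) is sufficient for this sketch.
nlp = spacy.blank("en")
doc = nlp("Their goi ng home")

group = SpanGroup(doc, name="errors", spans=[doc[3:4]])

# Pass a list of spans from the same Doc; they are added in the given order.
group.extend([doc[1:3], doc[0:1]])

ordered_texts = [group[i].text for i in range(len(group))]
print(ordered_texts)
```
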
## SpanGroup.to_bytes {#to_bytes tag="method"}

Serialize the span group to a bytestring.

> #### Example
>
> ```python
> doc = nlp("Their goi ng home")
> doc.spans["errors"] = [doc[0:1], doc[1:3]]
> group_bytes = doc.spans["errors"].to_bytes()
> ```

| Name        | Description                           |
| ----------- | ------------------------------------- |
| **RETURNS** | The serialized `SpanGroup`. ~~bytes~~ |

## SpanGroup.from_bytes {#from_bytes tag="method"}

Load the span group from a bytestring. Modifies the object in place and returns
it.

> #### Example
>
> ```python
> from spacy.tokens import SpanGroup
>
> doc = nlp("Their goi ng home")
> doc.spans["errors"] = [doc[0:1], doc[1:3]]
> group_bytes = doc.spans["errors"].to_bytes()
> # The constructor requires the Doc the group belongs to
> new_group = SpanGroup(doc)
> new_group.from_bytes(group_bytes)
> ```

| Name         | Description                           |
| ------------ | ------------------------------------- |
| `bytes_data` | The data to load from. ~~bytes~~      |
| **RETURNS**  | The `SpanGroup` object. ~~SpanGroup~~ |

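A round trip through `to_bytes` and `from_bytes` can be sketched as follows. The
snippet assumes a blank English pipeline from `spacy.blank("en")` and that the
group's `name` and `attrs` are part of the serialized data along with the span
boundaries; the new group is created against the same `Doc`, since the
constructor requires one.

```python
import spacy
from spacy.tokens import SpanGroup

# Assumption: a blank pipeline (tokenizer only) is sufficient for this sketch.
nlp = spacy.blank("en")
doc = nlp("Their goi ng home")

group = SpanGroup(
    doc, name="errors", spans=[doc[0:1], doc[1:3]], attrs={"annotator": "matt"}
)
group_bytes = group.to_bytes()

# from_bytes modifies the new group in place and returns it.
restored = SpanGroup(doc).from_bytes(group_bytes)
restored_texts = [restored[i].text for i in range(len(restored))]

print(restored.name, restored.attrs, restored_texts)
```
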