title | tag | source | new |
---|---|---|---|
SpanGroup | class | spacy/tokens/span_group.pyx | 3 |
A group of arbitrary, potentially overlapping Span objects that all belong to the same Doc object. The group can be named, and you can attach additional attributes to it. Span groups are generally accessed via the Doc.spans attribute, which automatically converts lists of spans into a SpanGroup object on assignment. SpanGroup objects behave similarly to lists, so you can append Span objects to them or access a member at a given index.
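
To make the data model described above concrete, here is a minimal pure-Python sketch of a named, list-like container of spans over one document. `ToySpanGroup`, its `tokens` list and the `(start, end)` offset tuples are hypothetical stand-ins chosen for illustration. This is not spaCy's implementation, which is written in Cython and holds real Span data.

```python
from dataclasses import dataclass, field
from typing import Any, Dict, List, Tuple

# Hypothetical stand-in for a Span: (start, end) token offsets, end exclusive.
Offsets = Tuple[int, int]


@dataclass
class ToySpanGroup:
    """Toy model of a named, list-like group of spans over one document."""

    tokens: List[str]  # stands in for the shared Doc
    name: str = ""
    attrs: Dict[str, Any] = field(default_factory=dict)
    spans: List[Offsets] = field(default_factory=list)

    def append(self, span: Offsets) -> None:
        start, end = span
        # Assumption: reject offsets that fall outside the shared document.
        if not 0 <= start < end <= len(self.tokens):
            raise ValueError(f"span {span} is outside the document")
        self.spans.append(span)

    def __len__(self) -> int:
        return len(self.spans)

    def __getitem__(self, i: int) -> Offsets:
        return self.spans[i]

    def text(self, i: int) -> str:
        # Join the tokens covered by the i-th span.
        start, end = self.spans[i]
        return " ".join(self.tokens[start:end])


group = ToySpanGroup(["Their", "goi", "ng", "home"], name="errors")
group.append((0, 1))
group.append((1, 3))  # spans may also overlap; nothing here forbids it
```

The sketch keeps only offsets and resolves them against the shared token list on demand, which is one simple way to guarantee that every member refers to the same document.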
SpanGroup.__init__
Create a SpanGroup.
Example

    doc = nlp("Their goi ng home")
    spans = [doc[0:1], doc[2:4]]

    # Construction 1
    from spacy.tokens import SpanGroup

    group = SpanGroup(doc, name="errors", spans=spans, attrs={"annotator": "matt"})
    doc.spans["errors"] = group

    # Construction 2
    doc.spans["errors"] = spans
    assert isinstance(doc.spans["errors"], SpanGroup)

Name | Description |
---|---|
doc | The document the span group belongs to. |
keyword-only | |
name | The name of the span group. If the span group is created automatically on assignment to doc.spans, the key name is used. Defaults to "". |
attrs | Optional JSON-serializable attributes to attach to the span group. |
spans | The spans to add to the span group. |
SpanGroup.doc
The Doc object the span group is referring to.
Example

    doc = nlp("Their goi ng home")
    doc.spans["errors"] = [doc[0:1], doc[2:4]]
    assert doc.spans["errors"].doc == doc

Name | Description |
---|---|
RETURNS | The reference document. |
SpanGroup.has_overlap
Check whether the span group contains overlapping spans.
Example

    doc = nlp("Their goi ng home")
    doc.spans["errors"] = [doc[0:1], doc[2:4]]
    assert not doc.spans["errors"].has_overlap
    # doc[1:3] shares the token "ng" with doc[2:4]
    doc.spans["errors"].append(doc[1:3])
    assert doc.spans["errors"].has_overlap

Name | Description |
---|---|
RETURNS | Whether the span group contains overlaps. |
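
As a rough illustration of what an overlap check involves, the sketch below sorts spans by start offset and flags any span that begins before the furthest end seen so far. It works on plain `(start, end)` token-offset tuples with an exclusive end. The function is a hypothetical helper and does not reproduce spaCy's internal code.

```python
from typing import Iterable, Optional, Tuple


def has_overlap(spans: Iterable[Tuple[int, int]]) -> bool:
    """Return True if any two (start, end) spans share a token (end exclusive)."""
    last_end: Optional[int] = None
    for start, end in sorted(spans):
        # A span that starts before the furthest end so far overlaps an earlier one.
        if last_end is not None and start < last_end:
            return True
        last_end = end if last_end is None else max(last_end, end)
    return False


no_overlap = has_overlap([(0, 1), (2, 4)])
adjacent = has_overlap([(0, 1), (1, 2), (2, 4)])
overlapping = has_overlap([(0, 1), (2, 4), (1, 3)])
```

Under this definition, adjacent spans such as `(0, 1)` and `(1, 2)` do not count as overlapping, because the end offset is exclusive and they share no token.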
SpanGroup.__len__
Get the number of spans in the group.
Example

    doc = nlp("Their goi ng home")
    doc.spans["errors"] = [doc[0:1], doc[2:4]]
    assert len(doc.spans["errors"]) == 2

Name | Description |
---|---|
RETURNS | The number of spans in the group. |
SpanGroup.__getitem__
Get a span from the group.
Example

    doc = nlp("Their goi ng home")
    doc.spans["errors"] = [doc[0:1], doc[1:3]]
    span = doc.spans["errors"][1]
    assert span.text == "goi ng"

Name | Description |
---|---|
i | The item index. |
RETURNS | The span at the given index. |
SpanGroup.append
Add a Span object to the group. The span must refer to the same Doc object as the span group.
Example

    doc = nlp("Their goi ng home")
    doc.spans["errors"] = [doc[0:1]]
    doc.spans["errors"].append(doc[2:4])
    assert len(doc.spans["errors"]) == 2

Name | Description |
---|---|
span | The span to append. |
SpanGroup.extend
Add multiple Span objects to the group. All spans must refer to the same Doc object as the span group.
Example

    doc = nlp("Their goi ng home")
    doc.spans["errors"] = []
    doc.spans["errors"].extend([doc[2:4], doc[0:1]])
    assert len(doc.spans["errors"]) == 2

Name | Description |
---|---|
spans | The spans to add. |
SpanGroup.to_bytes
Serialize the span group to a bytestring.
Example

    doc = nlp("Their goi ng home")
    doc.spans["errors"] = [doc[0:1], doc[2:4]]
    group_bytes = doc.spans["errors"].to_bytes()

Name | Description |
---|---|
RETURNS | The serialized SpanGroup. |
SpanGroup.from_bytes
Load the span group from a bytestring. Modifies the object in place and returns it.
Example

    from spacy.tokens import SpanGroup

    doc = nlp("Their goi ng home")
    doc.spans["errors"] = [doc[0:1], doc[2:4]]
    group_bytes = doc.spans["errors"].to_bytes()
    # SpanGroup requires a Doc on construction
    new_group = SpanGroup(doc)
    new_group.from_bytes(group_bytes)

Name | Description |
---|---|
bytes_data | The data to load from. |
RETURNS | The SpanGroup object. |
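
The "modifies the object in place and returns it" contract can be sketched with a small stand-in class. `ToyGroup` and its JSON payload are assumptions made for this illustration only. They do not reflect spaCy's actual SpanGroup class or its serialization format.

```python
import json
from typing import Sequence, Tuple


class ToyGroup:
    """Hypothetical stand-in; not spaCy's SpanGroup or its wire format."""

    def __init__(self, name: str = "", spans: Sequence[Tuple[int, int]] = ()):
        self.name = name
        self.spans = list(spans)

    def to_bytes(self) -> bytes:
        # Encode the group's state as a UTF-8 JSON bytestring.
        payload = {"name": self.name, "spans": self.spans}
        return json.dumps(payload).encode("utf8")

    def from_bytes(self, bytes_data: bytes) -> "ToyGroup":
        payload = json.loads(bytes_data.decode("utf8"))
        # Overwrite this object's state rather than building a new object.
        self.name = payload["name"]
        self.spans = [tuple(span) for span in payload["spans"]]
        return self  # returning self allows call chaining


original = ToyGroup("errors", [(0, 1), (1, 3)])
target = ToyGroup()
returned = target.from_bytes(original.to_bytes())
```

Because the method returns the same object it mutates, `ToyGroup().from_bytes(data)` can be used as a single expression, while any previous contents of the target are replaced.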