spaCy/website/docs/api/spangroup.md
Matthew Honnibal f277bfdf0f
Add SpanGroup and Graph container types to represent arbitrary annotations (#6696)

Co-authored-by: Ines Montani <ines@ines.io>
2021-01-14 17:30:41 +11:00

| title | tag | source | new |
| --- | --- | --- | --- |
| SpanGroup | class | spacy/tokens/span_group.pyx | 3 |

A group of arbitrary, potentially overlapping Span objects that all belong to the same Doc object. The group can be named, and you can attach additional attributes to it. Span groups are generally accessed via the Doc.spans attribute, which automatically converts a list of spans into a SpanGroup object on assignment. SpanGroup objects behave similarly to lists, so you can append Span objects to them or access a member at a given index.

SpanGroup.__init__

Create a SpanGroup.

Example

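# Assumes `nlp` is a loaded pipeline, e.g. nlp = spacy.blank("en")
doc = nlp("Their goi ng home")
# The two erroneous spans: "Their" and "goi ng"
spans = [doc[0:1], doc[1:3]]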

# Construction 1
from spacy.tokens import SpanGroup

group = SpanGroup(doc, name="errors", spans=spans, attrs={"annotator": "matt"})
doc.spans["errors"] = group

# Construction 2
doc.spans["errors"] = spans
assert isinstance(doc.spans["errors"], SpanGroup)

| Name | Description | Type |
| --- | --- | --- |
| `doc` | The document the span group belongs to. | `Doc` |
| _keyword-only_ | | |
| `name` | The name of the span group. If the span group is created automatically on assignment to `doc.spans`, the key name is used. Defaults to `""`. | `str` |
| `attrs` | Optional JSON-serializable attributes to attach to the span group. | `Dict[str, Any]` |
| `spans` | The spans to add to the span group. | `Iterable[Span]` |
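
The following is a minimal sketch of how the `name` and `attrs` arguments described above end up on the group. It assumes a tokenizer-only pipeline created with `spacy.blank("en")` and that the group exposes its name and attributes as `name` and `attrs`; the group name `"typos"` is chosen only for illustration.

```python
import spacy
from spacy.tokens import SpanGroup

# Assumption: a blank English pipeline (tokenizer only, no trained model).
nlp = spacy.blank("en")
doc = nlp("Their goi ng home")

# Assigning a plain list: the dictionary key is used as the group name.
doc.spans["errors"] = [doc[0:1], doc[1:3]]
auto_group = doc.spans["errors"]
assert isinstance(auto_group, SpanGroup)
assert auto_group.name == "errors"

# Explicit construction: the caller sets the name and attributes.
manual_group = SpanGroup(
    doc, name="typos", spans=[doc[1:3]], attrs={"annotator": "matt"}
)
assert manual_group.name == "typos"
assert manual_group.attrs["annotator"] == "matt"
```

Constructing the group explicitly is mainly useful when you want to attach `attrs`, since assignment of a plain list only sets the name.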

SpanGroup.doc

The Doc object the span group is referring to.

Example

doc = nlp("Their goi ng home")
doc.spans["errors"] = [doc[0:1], doc[1:3]]
assert doc.spans["errors"].doc == doc

| Name | Description | Type |
| --- | --- | --- |
| **RETURNS** | The reference document. | `Doc` |

SpanGroup.has_overlap

Check whether the span group contains overlapping spans.

Example

doc = nlp("Their goi ng home")
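# "Their" (token 0) and "goi ng" (tokens 1-2) do not share any tokens
doc.spans["errors"] = [doc[0:1], doc[1:3]]
assert not doc.spans["errors"].has_overlap
# "ng home" (tokens 2-3) shares token 2 with "goi ng"
doc.spans["errors"].append(doc[2:4])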
assert doc.spans["errors"].has_overlap

| Name | Description | Type |
| --- | --- | --- |
| **RETURNS** | Whether the span group contains overlaps. | `bool` |
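
To make the notion of "overlapping" concrete, here is a pure-Python sketch of one way such a check could be done on token offsets. It is an illustration only and not necessarily how spaCy implements `has_overlap`; the helper name `offsets_overlap` is hypothetical, and spans are assumed to be non-empty, half-open `(start, end)` token ranges as in `doc[start:end]`.

```python
from typing import List, Tuple


def offsets_overlap(offsets: List[Tuple[int, int]]) -> bool:
    """Return True if any two half-open (start, end) ranges share a token.

    Hypothetical helper for illustration; assumes every range is non-empty.
    """
    ordered = sorted(offsets)
    # After sorting by start, any overlap shows up between neighbours:
    # the next range starts before the previous one has ended.
    for (_, prev_end), (next_start, _) in zip(ordered, ordered[1:]):
        if next_start < prev_end:
            return True
    return False


# Adjacent ranges such as (0, 1) and (1, 3) touch but do not overlap.
assert not offsets_overlap([(0, 1), (1, 3)])
# (2, 4) shares token 2 with (1, 3), so the set now overlaps.
assert offsets_overlap([(0, 1), (1, 3), (2, 4)])
```

Note that spans which merely touch, where one ends at the token index at which the next begins, do not count as overlapping under this definition.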

SpanGroup.__len__

Get the number of spans in the group.

Example

doc = nlp("Their goi ng home")
doc.spans["errors"] = [doc[0:1], doc[1:3]]
assert len(doc.spans["errors"]) == 2

| Name | Description | Type |
| --- | --- | --- |
| **RETURNS** | The number of spans in the group. | `int` |

SpanGroup.__getitem__

Get a span from the group.

Example

doc = nlp("Their goi ng home")
doc.spans["errors"] = [doc[0:1], doc[1:3]]
span = doc.spans["errors"][1]
assert span.text == "goi ng"

| Name | Description | Type |
| --- | --- | --- |
| `i` | The item index. | `int` |
| **RETURNS** | The span at the given index. | `Span` |

SpanGroup.append

Add a Span object to the group. The span must refer to the same Doc object as the span group.

Example

doc = nlp("Their goi ng home")
doc.spans["errors"] = [doc[0:1]]
doc.spans["errors"].append(doc[1:3])
assert len(doc.spans["errors"]) == 2

| Name | Description | Type |
| --- | --- | --- |
| `span` | The span to append. | `Span` |

SpanGroup.extend

Add multiple Span objects to the group. All spans must refer to the same Doc object as the span group.

Example

doc = nlp("Their goi ng home")
doc.spans["errors"] = []
doc.spans["errors"].extend([doc[1:3], doc[0:1]])
assert len(doc.spans["errors"]) == 2

| Name | Description | Type |
| --- | --- | --- |
| `spans` | The spans to add. | `Iterable[Span]` |
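
As a rough sketch of how `append` might be used in practice, the example below builds a group incrementally while iterating over tokens. It assumes a blank English pipeline; the `known_words` set and the `"unknown"` group name are hypothetical and exist only for this illustration.

```python
import spacy

# Assumption: a blank English pipeline (tokenizer only, no trained model).
nlp = spacy.blank("en")
doc = nlp("Their goi ng home")

# Hypothetical toy vocabulary, used only for this illustration.
known_words = {"their", "going", "home"}

# An empty list is converted to an empty SpanGroup on assignment.
doc.spans["unknown"] = []
for token in doc:
    if token.lower_ not in known_words:
        # Groups hold Span objects, so wrap the single token in a Span.
        doc.spans["unknown"].append(doc[token.i : token.i + 1])

group = doc.spans["unknown"]
unknown_texts = [group[i].text for i in range(len(group))]
assert unknown_texts == ["goi", "ng"]
```

If the spans are already collected in a list, a single `extend` call does the same job as the loop of `append` calls.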

SpanGroup.to_bytes

Serialize the span group to a bytestring.

Example

doc = nlp("Their goi ng home")
doc.spans["errors"] = [doc[0:1], doc[1:3]]
group_bytes = doc.spans["errors"].to_bytes()

| Name | Description | Type |
| --- | --- | --- |
| **RETURNS** | The serialized `SpanGroup`. | `bytes` |

SpanGroup.from_bytes

Load the span group from a bytestring. Modifies the object in place and returns it.

Example

from spacy.tokens import SpanGroup

doc = nlp("Their goi ng home")
doc.spans["errors"] = [doc[0:1], doc[1:3]]
group_bytes = doc.spans["errors"].to_bytes()
# The constructor requires the Doc as its first argument
new_group = SpanGroup(doc)
new_group.from_bytes(group_bytes)

| Name | Description | Type |
| --- | --- | --- |
| `bytes_data` | The data to load from. | `bytes` |
| **RETURNS** | The `SpanGroup` object. | `SpanGroup` |
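
A quick way to check a serialization round trip is to compare the restored group with the original. The sketch below assumes a blank English pipeline and that the restored group is created against the same `Doc`, since the spans are stored as offsets into that document; it also assumes the name and attributes are exposed as `name` and `attrs`.

```python
import spacy
from spacy.tokens import SpanGroup

# Assumption: a blank English pipeline (tokenizer only, no trained model).
nlp = spacy.blank("en")
doc = nlp("Their goi ng home")
group = SpanGroup(
    doc, name="errors", spans=[doc[0:1], doc[1:3]], attrs={"annotator": "matt"}
)

# Serialize, then load into a fresh group that refers to the same Doc.
group_bytes = group.to_bytes()
restored = SpanGroup(doc).from_bytes(group_bytes)

# The name, attributes and spans should all survive the round trip.
restored_texts = [restored[i].text for i in range(len(restored))]
assert restored.name == "errors"
assert restored.attrs == {"annotator": "matt"}
assert restored_texts == ["Their", "goi ng"]
```

Because `from_bytes` modifies the object in place and returns it, the call can be chained directly onto the constructor as shown.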