spaCy/spangroup.md at 7a239f2ec7c71a494f2380686fdbcfdd421e7fa6

mirror of https://github.com/explosion/spaCy.git synced 2025-10-24 04:31:17 +03:00

Sofie Van Landeghem de025beb5f

Warn and document spangroup.doc weakref (#8980 )

* test for error after Doc has been garbage collected

* warn about using a SpanGroup when the Doc has been garbage collected

* add warning to the docs

* rephrase slightly

* raise error instead of warning

* update

* move warning to doc property

2021-08-20 11:06:19 +02:00


---
title: SpanGroup
tag: class
source: spacy/tokens/span_group.pyx
new: 3
---

A group of arbitrary, potentially overlapping `Span` objects that all belong to the same `Doc` object. The group can be named, and you can attach additional attributes to it. Span groups are generally accessed via the `Doc.spans` attribute, which automatically converts lists of spans into a `SpanGroup` object on assignment. `SpanGroup` objects behave similarly to lists, so you can append `Span` objects to them or access a member at a given index.
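The list-like behavior and the same-`Doc` requirement can be illustrated with a minimal pure-Python sketch. This is not spaCy's implementation (the real class is the Cython code in `spacy/tokens/span_group.pyx` and stores `Span` objects); `ToyDoc`, `ToySpan` and `ToySpanGroup` are hypothetical stand-ins, and each span is modeled as a `[start, end)` token slice.

```python
from dataclasses import dataclass, field
from typing import Any, Dict, Iterable, List


@dataclass
class ToyDoc:
    """Hypothetical stand-in for a Doc: just a list of token strings."""

    words: List[str]


@dataclass
class ToySpan:
    """Hypothetical stand-in for a Span: a [start, end) token slice of a doc."""

    doc: ToyDoc
    start: int
    end: int

    @property
    def text(self) -> str:
        return " ".join(self.doc.words[self.start : self.end])


@dataclass
class ToySpanGroup:
    """Hypothetical list-like group of spans that must share one doc object."""

    doc: ToyDoc
    name: str = ""
    attrs: Dict[str, Any] = field(default_factory=dict)
    spans: List[ToySpan] = field(default_factory=list)

    def __post_init__(self) -> None:
        # Route constructor spans through extend() so they are validated too.
        initial, self.spans = list(self.spans), []
        self.extend(initial)

    def __len__(self) -> int:
        return len(self.spans)

    def __getitem__(self, i: int) -> ToySpan:
        return self.spans[i]

    def append(self, span: ToySpan) -> None:
        # Identity check: an equal-looking but different doc object is rejected.
        if span.doc is not self.doc:
            raise ValueError("span belongs to a different doc")
        self.spans.append(span)

    def extend(self, spans: Iterable[ToySpan]) -> None:
        for span in spans:
            self.append(span)


doc = ToyDoc(["Their", "goi", "ng", "home"])
group = ToySpanGroup(doc, name="errors", spans=[ToySpan(doc, 0, 1)])
group.append(ToySpan(doc, 1, 3))
print(len(group), group[1].text)  # 2 goi ng
```

In this sketch, checking identity (`is`) rather than equality reflects the requirement that spans refer to the same `Doc` *object*, not merely to a document with the same tokens.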

SpanGroup.__init__

Create a SpanGroup.

Example

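```python
import spacy
from spacy.tokens import SpanGroup

nlp = spacy.blank("en")
doc = nlp("Their goi ng home")
# "Their" and "goi ng" are the two erroneous spans
spans = [doc[0:1], doc[1:3]]

# Construction 1: create the group explicitly
group = SpanGroup(doc, name="errors", spans=spans, attrs={"annotator": "matt"})
doc.spans["errors"] = group

# Construction 2: assign a list, which Doc.spans converts to a SpanGroup
doc.spans["errors"] = spans
assert isinstance(doc.spans["errors"], SpanGroup)
```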

| Name | Description |
| --- | --- |
| `doc` | The document the span group belongs to. ~~Doc~~ |
| _keyword-only_ | |
| `name` | The name of the span group. If the span group is created automatically on assignment to `doc.spans`, the key name is used. Defaults to `""`. ~~str~~ |
| `attrs` | Optional JSON-serializable attributes to attach to the span group. ~~Dict[str, Any]~~ |
| `spans` | The spans to add to the span group. ~~Iterable[Span]~~ |

SpanGroup.doc

The `Doc` object the span group refers to.

A `SpanGroup` holds only a weak reference (`weakref`) to its `Doc`. Once that `Doc` object has been garbage collected, the internal reference resolves to `None`, the span group is no longer functional, and accessing `SpanGroup.doc` raises an error. To avoid this, keep a reference to the original `Doc` alive (for example, in a variable that stays in scope) for as long as you use its span groups.

Example

```python
doc = nlp("Their goi ng home")
doc.spans["errors"] = [doc[0:1], doc[1:3]]
assert doc.spans["errors"].doc == doc
```

| Name | Description |
| --- | --- |
| RETURNS | The reference document. ~~Doc~~ |
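The failure mode described above is ordinary Python weak-reference behavior. The following sketch reproduces it with the standard-library `weakref` module; `ToyDoc`, `ToyGroup` and `make_group` are hypothetical, and the `RuntimeError` is this sketch's own choice, so the exact exception spaCy raises may differ.

```python
import gc
import weakref


class ToyDoc:
    """Hypothetical stand-in for a Doc."""

    def __init__(self, text: str) -> None:
        self.text = text


class ToyGroup:
    """Hypothetical group that keeps only a weak reference to its doc."""

    def __init__(self, doc: ToyDoc) -> None:
        self._doc_ref = weakref.ref(doc)

    @property
    def doc(self) -> ToyDoc:
        doc = self._doc_ref()  # None once the referent has been collected
        if doc is None:
            raise RuntimeError("the doc this group refers to has been garbage collected")
        return doc


def make_group() -> ToyGroup:
    doc = ToyDoc("Their goi ng home")
    # Only the group escapes; the last strong reference to doc dies with this frame.
    return ToyGroup(doc)


kept_doc = ToyDoc("Their goi ng home")
safe_group = ToyGroup(kept_doc)  # kept_doc stays in scope, so .doc keeps working
print(safe_group.doc.text)

orphan = make_group()
gc.collect()  # CPython already freed the doc on return; this helps other runtimes
try:
    orphan.doc
except RuntimeError as err:
    print("error:", err)
```

Keeping `kept_doc` in scope is the sketch's equivalent of the advice above: as long as a strong reference to the document exists, the weak reference stays valid.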

SpanGroup.has_overlap

Check whether the span group contains overlapping spans.

Example

```python
doc = nlp("Their goi ng home")
doc.spans["errors"] = [doc[0:1], doc[1:3]]
assert not doc.spans["errors"].has_overlap
# doc[2:4] ("ng home") shares the token "ng" with doc[1:3] ("goi ng")
doc.spans["errors"].append(doc[2:4])
assert doc.spans["errors"].has_overlap
```

| Name | Description |
| --- | --- |
| RETURNS | Whether the span group contains overlaps. ~~bool~~ |
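One common way to answer this question is a sort-and-sweep over token offsets, sketched below in pure Python. It is not taken from spaCy's source: it assumes spans are given as hypothetical `(start, end)` tuples with half-open `[start, end)` semantics, as in Python slicing, so spans that merely touch, such as `(0, 1)` and `(1, 3)`, do not count as overlapping.

```python
from typing import List, Optional, Tuple


def has_overlap(spans: List[Tuple[int, int]]) -> bool:
    """Return True if any two half-open [start, end) token ranges overlap."""
    max_end: Optional[int] = None
    for start, end in sorted(spans):
        # Spans are visited in order of start, so a span overlaps an earlier
        # one exactly when it starts before the furthest end seen so far.
        if max_end is not None and start < max_end:
            return True
        max_end = end if max_end is None else max(max_end, end)
    return False


print(has_overlap([(0, 1), (1, 3)]))          # False: the spans only touch
print(has_overlap([(0, 1), (1, 3), (2, 4)]))  # True: (1, 3) and (2, 4) share token 2
```

Sorting costs O(n log n) and the sweep is a single pass, which avoids comparing every pair of spans.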

SpanGroup.__len__

Get the number of spans in the group.

Example

```python
doc = nlp("Their goi ng home")
doc.spans["errors"] = [doc[0:1], doc[1:3]]
assert len(doc.spans["errors"]) == 2
```

| Name | Description |
| --- | --- |
| RETURNS | The number of spans in the group. ~~int~~ |

SpanGroup.__getitem__

Get a span from the group.

Example

```python
doc = nlp("Their goi ng home")
doc.spans["errors"] = [doc[0:1], doc[1:3]]
span = doc.spans["errors"][1]
assert span.text == "goi ng"
```

| Name | Description |
| --- | --- |
| `i` | The item index. ~~int~~ |
| RETURNS | The span at the given index. ~~Span~~ |

SpanGroup.append

Add a Span object to the group. The span must refer to the same Doc object as the span group.

Example

```python
doc = nlp("Their goi ng home")
doc.spans["errors"] = [doc[0:1]]
doc.spans["errors"].append(doc[1:3])
assert len(doc.spans["errors"]) == 2
```

| Name | Description |
| --- | --- |
| `span` | The span to append. ~~Span~~ |

SpanGroup.extend

Add multiple Span objects to the group. All spans must refer to the same Doc object as the span group.

Example

```python
doc = nlp("Their goi ng home")
doc.spans["errors"] = []
doc.spans["errors"].extend([doc[1:3], doc[0:1]])
assert len(doc.spans["errors"]) == 2
```

| Name | Description |
| --- | --- |
| `spans` | The spans to add. ~~Iterable[Span]~~ |

SpanGroup.to_bytes

Serialize the span group to a bytestring.

Example

```python
doc = nlp("Their goi ng home")
doc.spans["errors"] = [doc[0:1], doc[1:3]]
group_bytes = doc.spans["errors"].to_bytes()
```

| Name | Description |
| --- | --- |
| RETURNS | The serialized `SpanGroup`. ~~bytes~~ |

SpanGroup.from_bytes

Load the span group from a bytestring. Modifies the object in place and returns it.

Example

```python
from spacy.tokens import SpanGroup

doc = nlp("Their goi ng home")
doc.spans["errors"] = [doc[0:1], doc[1:3]]
group_bytes = doc.spans["errors"].to_bytes()
# A SpanGroup is always tied to a Doc, so pass the Doc the spans refer to
new_group = SpanGroup(doc)
new_group.from_bytes(group_bytes)
```

| Name | Description |
| --- | --- |
| `bytes_data` | The data to load from. ~~bytes~~ |
| RETURNS | The `SpanGroup` object. ~~SpanGroup~~ |
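To illustrate what a round trip through `to_bytes` and `from_bytes` has to preserve (the group's name, its attributes, and each span's position within the `Doc`), here is a pure-Python sketch using JSON. The helper names `group_to_bytes` and `group_from_bytes` and the JSON layout are hypothetical and not meant to mirror spaCy's own byte format, which you should treat as opaque.

```python
import json
from typing import Any, Dict, List, Tuple

SpanTuple = Tuple[int, int, str]  # hypothetical (start, end, label) token offsets


def group_to_bytes(name: str, attrs: Dict[str, Any], spans: List[SpanTuple]) -> bytes:
    """Serialize a group's name, JSON-serializable attrs, and span offsets."""
    payload = {"name": name, "attrs": attrs, "spans": [list(s) for s in spans]}
    return json.dumps(payload).encode("utf-8")


def group_from_bytes(data: bytes) -> Tuple[str, Dict[str, Any], List[SpanTuple]]:
    """Inverse of group_to_bytes; offsets only make sense against the original doc."""
    payload = json.loads(data.decode("utf-8"))
    spans = [(int(start), int(end), str(label)) for start, end, label in payload["spans"]]
    return payload["name"], payload["attrs"], spans


words = ["Their", "goi", "ng", "home"]
data = group_to_bytes("errors", {"annotator": "matt"}, [(0, 1, ""), (1, 3, "")])
name, attrs, spans = group_from_bytes(data)
print(name, attrs, [" ".join(words[s:e]) for s, e, _ in spans])
# errors {'annotator': 'matt'} ['Their', 'goi ng']
```

Because only token offsets are stored, the restored spans are meaningful only when read against the same document text; JSON also makes the "JSON-serializable attributes" requirement for `attrs` explicit.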
