| title | teaser | tag | source | 
| GoldParse | A collection for training annotations | class | spacy/gold.pyx | 
GoldParse.__init__
Create a GoldParse.
| Name | Type | Description | 
| doc | Doc | The document the annotations refer to. | 
| words | iterable | A sequence of unicode word strings. | 
| tags | iterable | A sequence of strings, representing tag annotations. | 
| heads | iterable | A sequence of integers, representing syntactic head offsets. | 
| deps | iterable | A sequence of strings, representing the syntactic relation types. | 
| entities | iterable | A sequence of named entity annotations, either as BILUO tag strings, or as (start_char, end_char, label) tuples, representing the entity positions. |
| RETURNS | GoldParse | The newly constructed object. | 
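The per-token arguments above are easiest to picture as parallel lists, one entry per word. The sketch below is a hypothetical stand-in, not spaCy's GoldParse. The class name GoldAnnotations, the length check, and the storage of heads as absolute token indices (with the root pointing at itself) are assumptions made for illustration.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class GoldAnnotations:
    """Hypothetical container: parallel per-token gold annotations."""

    words: List[str]
    tags: List[Optional[str]]
    heads: List[Optional[int]]  # assumed: absolute head indices, root points to itself
    deps: List[Optional[str]]
    entities: List[str]  # assumed: one BILUO tag string per word

    def __post_init__(self) -> None:
        # Every annotation sequence must line up one-to-one with the words.
        n = len(self.words)
        for name in ("tags", "heads", "deps", "entities"):
            values = getattr(self, name)
            if len(values) != n:
                raise ValueError(f"{name} has {len(values)} items, expected {n}")

    def __len__(self) -> int:
        # Same idea as GoldParse.__len__: the number of gold-standard tokens.
        return len(self.words)


gold = GoldAnnotations(
    words=["I", "like", "London", "."],
    tags=["PRP", "VBP", "NNP", "."],
    heads=[1, 1, 1, 1],
    deps=["nsubj", "ROOT", "dobj", "punct"],
    entities=["O", "O", "U-LOC", "O"],
)
print(len(gold))  # 4
```

Validating lengths up front makes a mismatched annotation fail loudly at construction time instead of silently misaligning labels during training.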
GoldParse.__len__
Get the number of gold-standard tokens.
| Name | Type | Description | 
| RETURNS | int | The number of gold-standard tokens. | 
GoldParse.is_projective
Whether the provided syntactic annotations form a projective dependency tree.
| Name | Type | Description | 
| RETURNS | bool | Whether the annotations form a projective tree. |
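The check behind is_projective can be sketched with the standard dominance definition: an arc from head h to child d is projective when every token strictly between h and d is a descendant of h. This plain-Python illustration is not spaCy's implementation. It assumes heads are absolute token indices, that the root's head is itself, and that the input is a well-formed tree with no cycles.

```python
from typing import List


def dominates(heads: List[int], ancestor: int, node: int) -> bool:
    """Return True if `ancestor` is `node` itself or one of its ancestors."""
    while node != ancestor:
        parent = heads[node]
        if parent == node:  # reached the root without meeting `ancestor`
            return False
        node = parent
    return True


def is_projective(heads: List[int]) -> bool:
    """Every token strictly between a head and its child must be dominated by that head."""
    for child, head in enumerate(heads):
        if head == child:  # the root has no incoming arc
            continue
        lo, hi = min(head, child), max(head, child)
        for k in range(lo + 1, hi):
            if not dominates(heads, head, k):
                return False
    return True


# "I like London ." with every token attached to "like" (index 1).
print(is_projective([1, 1, 1, 1]))  # True
# Token 0 attaches to token 2, but token 1 in between is the root.
print(is_projective([2, 1, 1]))  # False
```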
Attributes
| Name | Type | Description | 
| tags | list | The part-of-speech tag annotations. | 
| heads | list | The syntactic head annotations. | 
| labels | list | The syntactic relation-type annotations. | 
| ents | list | The named entity annotations. | 
| cand_to_gold | list | The alignment from candidate tokenization to gold tokenization. | 
| gold_to_cand | list | The alignment from gold tokenization to candidate tokenization. | 
| cats (new in v2) | list | Entries in the list should be either a label, or a (start, end, label) triple. The tuple form is used for categories applied to spans of the document. |
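To make cand_to_gold and gold_to_cand concrete, here is a deliberately simplified sketch that only aligns tokens whose character spans match exactly, and maps every other token to None. It is not spaCy's alignment algorithm. The helper names char_spans and align are hypothetical, and the sketch assumes both tokenizations cover the same characters with whitespace removed.

```python
from typing import Dict, List, Optional, Tuple

CharSpan = Tuple[int, int]


def char_spans(tokens: List[str]) -> List[CharSpan]:
    """Character span of each token in the whitespace-free concatenation."""
    spans, offset = [], 0
    for tok in tokens:
        spans.append((offset, offset + len(tok)))
        offset += len(tok)
    return spans


def align(cand: List[str], gold: List[str]) -> Tuple[List[Optional[int]], List[Optional[int]]]:
    """Map tokens whose character spans match exactly; everything else maps to None."""
    if "".join(cand) != "".join(gold):
        raise ValueError("tokenizations must cover the same characters")
    cand_spans, gold_spans = char_spans(cand), char_spans(gold)
    gold_index: Dict[CharSpan, int] = {s: i for i, s in enumerate(gold_spans)}
    cand_index: Dict[CharSpan, int] = {s: i for i, s in enumerate(cand_spans)}
    cand_to_gold = [gold_index.get(s) for s in cand_spans]
    gold_to_cand = [cand_index.get(s) for s in gold_spans]
    return cand_to_gold, gold_to_cand


cand_to_gold, gold_to_cand = align(["I", "like", "London", "."], ["I", "like", "Lon", "don", "."])
print(cand_to_gold)  # [0, 1, None, 4]
print(gold_to_cand)  # [0, 1, None, None, 3]
```

A real aligner also has to decide what to do with many-to-one cases such as "London" versus "Lon" + "don"; this sketch simply leaves them unaligned.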
Utilities
gold.biluo_tags_from_offsets
Encode labelled spans into per-token tags, using the
BILUO scheme (Begin/In/Last/Unit/Out).
Returns a list of unicode strings describing the tags. Each tag string takes one
of the forms "", "O" or "{action}-{label}", where action is one
of "B", "I", "L" or "U". The string "-" is used where the entity offsets
don't align with the tokenization in the Doc object; the training algorithm
treats these as missing values. O denotes a non-entity token, B denotes
the beginning of a multi-token entity, I the inside of an entity of three or
more tokens, and L the end of an entity of two or more tokens. U denotes a
single-token entity.
Example
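from spacy.lang.en import English
from spacy.gold import biluo_tags_from_offsets
nlp = English()  # blank English pipeline; only the tokenizer is needed here
doc = nlp(u"I like London.")
entities = [(7, 13, "LOC")]  # character offsets of "London"
tags = biluo_tags_from_offsets(doc, entities)
assert tags == ["O", "O", "U-LOC", "O"]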
| Name | Type | Description | 
| doc | Doc | The document that the entity offsets refer to. The output tags will refer to the token boundaries within the document. | 
| entities | iterable | A sequence of (start, end, label) triples. start and end should be character-offset integers denoting the slice into the original string. |
| RETURNS | list | Unicode strings, describing the BILUO tags. | 
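The BILUO encoding described above can be sketched in plain Python, assuming the tokenization is already available as (start_char, end_char) pairs. The helper name biluo_from_offsets is hypothetical and this is not spaCy's implementation. In particular, marking every token that overlaps a misaligned entity with "-" is a simplifying assumption of this sketch.

```python
from typing import List, Tuple


def biluo_from_offsets(
    token_spans: List[Tuple[int, int]], entities: List[Tuple[int, int, str]]
) -> List[str]:
    """Sketch of BILUO encoding from (start_char, end_char) token spans."""
    starts = {start: i for i, (start, _) in enumerate(token_spans)}
    ends = {end: i for i, (_, end) in enumerate(token_spans)}
    tags = ["O"] * len(token_spans)
    for start, end, label in entities:
        if start in starts and end in ends:
            first, last = starts[start], ends[end]
            if first == last:
                tags[first] = f"U-{label}"
            else:
                tags[first] = f"B-{label}"
                for i in range(first + 1, last):
                    tags[i] = f"I-{label}"
                tags[last] = f"L-{label}"
        else:
            # Offsets do not match token boundaries: mark overlapping tokens as missing.
            for i, (tok_start, tok_end) in enumerate(token_spans):
                if tok_start < end and tok_end > start:
                    tags[i] = "-"
    return tags


# Token spans for "I like New York." -> I, like, New, York, .
spans = [(0, 1), (2, 6), (7, 10), (11, 15), (15, 16)]
print(biluo_from_offsets(spans, [(7, 15, "GPE")]))  # ['O', 'O', 'B-GPE', 'L-GPE', 'O']
print(biluo_from_offsets(spans, [(7, 13, "GPE")]))  # ['O', 'O', '-', '-', 'O']
```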
gold.offsets_from_biluo_tags
Encode per-token tags following the BILUO scheme into
entity offsets.
Example
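from spacy.lang.en import English
from spacy.gold import offsets_from_biluo_tags
nlp = English()  # blank English pipeline; only the tokenizer is needed here
doc = nlp(u"I like London.")
tags = ["O", "O", "U-LOC", "O"]
entities = offsets_from_biluo_tags(doc, tags)
assert entities == [(7, 13, "LOC")]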
| Name | Type | Description | 
| doc | Doc | The document that the BILUO tags refer to. | 
| tags | iterable | A sequence of BILUO tags with each tag describing one token. Each tag string will be of the form of either "", "O" or "{action}-{label}", where action is one of "B", "I", "L", "U". |
| RETURNS | list | A sequence of (start, end, label) triples. start and end will be character-offset integers denoting the slice into the original string. |
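Going the other way, a decoder only has to remember where the current entity began and close it at the matching L tag (or emit it immediately for U). This is a minimal sketch, not spaCy's code. The helper name offsets_from_biluo is hypothetical, it assumes the token character spans are known, and it assumes a well-formed tag sequence (a stray L without a preceding B is ignored).

```python
from typing import List, Optional, Tuple


def offsets_from_biluo(
    token_spans: List[Tuple[int, int]], tags: List[str]
) -> List[Tuple[int, int, str]]:
    """Sketch: decode BILUO tags into (start_char, end_char, label) triples."""
    entities = []
    begin: Optional[int] = None  # index of the pending B- token, if any
    for i, tag in enumerate(tags):
        if tag in ("", "O", "-"):
            begin = None  # outside or missing: no open entity
            continue
        action, label = tag.split("-", 1)
        if action == "U":
            entities.append((token_spans[i][0], token_spans[i][1], label))
            begin = None
        elif action == "B":
            begin = i
        elif action == "L" and begin is not None:
            entities.append((token_spans[begin][0], token_spans[i][1], label))
            begin = None
        # "I" tags simply continue the open entity.
    return entities


spans = [(0, 1), (2, 6), (7, 13), (13, 14)]  # "I like London."
print(offsets_from_biluo(spans, ["O", "O", "U-LOC", "O"]))  # [(7, 13, 'LOC')]
```

Splitting on the first "-" only keeps labels that themselves contain hyphens intact.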
gold.spans_from_biluo_tags
Encode per-token tags following the BILUO scheme into
Span objects. This can be used to create entity spans from
token-based tags, e.g. to overwrite doc.ents.
Example
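from spacy.lang.en import English
from spacy.gold import spans_from_biluo_tags
nlp = English()  # blank English pipeline; only the tokenizer is needed here
doc = nlp(u"I like London.")
tags = ["O", "O", "U-LOC", "O"]
doc.ents = spans_from_biluo_tags(doc, tags)
assert [(ent.text, ent.label_) for ent in doc.ents] == [("London", "LOC")]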
| Name | Type | Description | 
| doc | Doc | The document that the BILUO tags refer to. | 
| tags | iterable | A sequence of BILUO tags with each tag describing one token. Each tag string will be of the form of either "", "O" or "{action}-{label}", where action is one of "B", "I", "L", "U". |
| RETURNS | list | A sequence of Span objects with added entity labels. |
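Because Span objects are delimited by token indices rather than character offsets, the decoding step can also be sketched at token level. In this hypothetical helper, each returned (start, end, label) triple uses an exclusive end, the same convention as doc[start:end]. Building real Span objects from the triples is left out so the sketch runs without spaCy, and it assumes a well-formed BILUO sequence.

```python
from typing import List, Optional, Tuple


def token_spans_from_biluo(tags: List[str]) -> List[Tuple[int, int, str]]:
    """Sketch: decode BILUO tags into (start_token, end_token, label), end exclusive."""
    spans = []
    begin: Optional[int] = None  # index of the pending B- token, if any
    for i, tag in enumerate(tags):
        if tag.startswith("U-"):
            spans.append((i, i + 1, tag[2:]))
            begin = None
        elif tag.startswith("B-"):
            begin = i
        elif tag.startswith("L-") and begin is not None:
            spans.append((begin, i + 1, tag[2:]))
            begin = None
        elif not tag.startswith("I-"):
            begin = None  # "O", "-" or "": no open entity
    return spans


tags = ["O", "O", "B-GPE", "L-GPE", "O"]  # "I like New York ."
print(token_spans_from_biluo(tags))  # [(2, 4, 'GPE')]
```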