--- title: Example teaser: A training instance tag: class source: spacy/gold/example.pyx new: 3.0 --- An `Example` holds the information for one training instance. It stores two `Doc` objects: one for holding the gold-standard reference data, and one for holding the predictions of the pipeline. An `Alignment` object stores the alignment between these two documents, as they can differ in tokenization. ## Example.\_\_init\_\_ {#init tag="method"} Construct an `Example` object from the `predicted` document and the `reference` document. If `alignment` is `None`, it will be initialized from the words in both documents. > #### Example > > ```python > from spacy.tokens import Doc > from spacy.gold import Example > > words = ["hello", "world", "!"] > spaces = [True, False, False] > predicted = Doc(nlp.vocab, words=words, spaces=spaces) > reference = parse_gold_doc(my_data) > example = Example(predicted, reference) > ``` | Name | Type | Description | | -------------- | ----------- | ------------------------------------------------------------------------------------------------ | | `predicted` | `Doc` | The document containing (partial) predictions. Can not be `None`. | | `reference` | `Doc` | The document containing gold-standard annotations. Can not be `None`. | | _keyword-only_ | | | | `alignment` | `Alignment` | An object holding the alignment between the tokens of the `predicted` and `reference` documents. | | **RETURNS** | `Example` | The newly constructed object. | ## Example.from_dict {#from_dict tag="classmethod"} Construct an `Example` object from the `predicted` document and the reference annotations provided as a dictionary. > #### Example > > ```python > from spacy.tokens import Doc > from spacy.gold import Example > > predicted = Doc(vocab, words=["Apply", "some", "sunscreen"]) > token_ref = ["Apply", "some", "sun", "screen"] > tags_ref = ["VERB", "DET", "NOUN", "NOUN"] > example = Example.from_dict(predicted, {"words": token_ref, "tags": tags_ref}) > ``` | Name | Type | Description | | -------------- | ---------------- | ----------------------------------------------------------------- | | `predicted` | `Doc` | The document containing (partial) predictions. Can not be `None`. | | `example_dict` | `Dict[str, obj]` | The gold-standard annotations as a dictionary. Can not be `None`. | | **RETURNS** | `Example` | The newly constructed object. | ## Example.text {#text tag="property"} The text of the `predicted` document in this `Example`. > #### Example > > ```python > raw_text = example.text > ``` | Name | Type | Description | | ----------- | ---- | ------------------------------------- | | **RETURNS** | str | The text of the `predicted` document. | ## Example.predicted {#predicted tag="property"} > #### Example > > ```python > docs = [eg.predicted for eg in examples] > predictions, _ = model.begin_update(docs) > set_annotations(docs, predictions) > ``` The `Doc` holding the predictions. Occassionally also refered to as `example.x`. | Name | Type | Description | | ----------- | ----- | ---------------------------------------------- | | **RETURNS** | `Doc` | The document containing (partial) predictions. | ## Example.reference {#reference tag="property"} > #### Example > > ```python > for i, eg in enumerate(examples): > for j, label in enumerate(all_labels): > gold_labels[i][j] = eg.reference.cats.get(label, 0.0) > ``` The `Doc` holding the gold-standard annotations. Occassionally also refered to as `example.y`. | Name | Type | Description | | ----------- | ----- | -------------------------------------------------- | | **RETURNS** | `Doc` | The document containing gold-standard annotations. | ## Example.alignment {#alignment tag="property"} > #### Example > > ```python > tokens_x = ["Apply", "some", "sunscreen"] > x = Doc(vocab, words=tokens_x) > tokens_y = ["Apply", "some", "sun", "screen"] > example = Example.from_dict(x, {"words": tokens_y}) > alignment = example.alignment > assert list(alignment.y2x.data) == [[0], [1], [2], [2]] > ``` The `Alignment` object mapping the tokens of the `predicted` document to those of the `reference` document. | Name | Type | Description | | ----------- | ----------- | -------------------------------------------------- | | **RETURNS** | `Alignment` | The document containing gold-standard annotations. | ## Example.get_aligned {#get_aligned tag="method"} > #### Example > > ```python > predicted = Doc(vocab, words=["Apply", "some", "sunscreen"]) > token_ref = ["Apply", "some", "sun", "screen"] > tags_ref = ["VERB", "DET", "NOUN", "NOUN"] > example = Example.from_dict(predicted, {"words": token_ref, "tags": tags_ref}) > assert example.get_aligned("TAG", as_string=True) == ["VERB", "DET", "NOUN"] > ``` Get the aligned view of a certain token attribute, denoted by its int ID or string name. | Name | Type | Description | Default | | ----------- | -------------------------- | ------------------------------------------------------------------ | ------- | | `field` | int or str | Attribute ID or string name | | | `as_string` | bool | Whether or not to return the list of values as strings. | `False` | | **RETURNS** | `List[int]` or `List[str]` | List of integer values, or string values if `as_string` is `True`. | | ## Example.get_aligned_parse {#get_aligned_parse tag="method"} > #### Example > > ```python > doc = nlp("He pretty quickly walks away") > example = Example.from_dict(doc, {"heads": [3, 2, 3, 0, 2]}) > proj_heads, proj_labels = example.get_aligned_parse(projectivize=True) > assert proj_heads == [3, 2, 3, 0, 3] > ``` Get the aligned view of the dependency parse. If `projectivize` is set to `True`, non-projective dependency trees are made projective through the Pseudo-Projective Dependency Parsing algorithm by Nivre and Nilsson (2005). | Name | Type | Description | Default | | -------------- | -------------------------- | ------------------------------------------------------------------ | ------- | | `projectivize` | bool | Whether or not to projectivize the dependency trees | `True` | | **RETURNS** | `List[int]` or `List[str]` | List of integer values, or string values if `as_string` is `True`. | | ## Example.get_aligned_ner {#get_aligned_ner tag="method"} > #### Example > > ```python > words = ["Mrs", "Smith", "flew", "to", "New York"] > doc = Doc(en_vocab, words=words) > entities = [(0, 9, "PERSON"), (18, 26, "LOC")] > gold_words = ["Mrs Smith", "flew", "to", "New", "York"] > example = Example.from_dict(doc, {"words": gold_words, "entities": entities}) > ner_tags = example.get_aligned_ner() > assert ner_tags == ["B-PERSON", "L-PERSON", "O", "O", "U-LOC"] > ``` Get the aligned view of the NER [BILUO](/usage/linguistic-features#accessing-ner) tags. | Name | Type | Description | | ----------- | ----------- | ----------------------------------------------------------------------------------- | | **RETURNS** | `List[str]` | List of BILUO values, denoting whether tokens are part of an NER annotation or not. | ## Example.get_aligned_spans_y2x {#get_aligned_spans_y2x tag="method"} > #### Example > > ```python > words = ["Mr and Mrs Smith", "flew", "to", "New York"] > doc = Doc(en_vocab, words=words) > entities = [(0, 16, "PERSON")] > tokens_ref = ["Mr", "and", "Mrs", "Smith", "flew", "to", "New", "York"] > example = Example.from_dict(doc, {"words": tokens_ref, "entities": entities}) > ents_ref = example.reference.ents > assert [(ent.start, ent.end) for ent in ents_ref] == [(0, 4)] > ents_y2x = example.get_aligned_spans_y2x(ents_ref) > assert [(ent.start, ent.end) for ent in ents_y2x] == [(0, 1)] > ``` Get the aligned view of any set of [`Span`](/api/span) objects defined over `example.reference`. The resulting span indices will align to the tokenization in `example.predicted`. | Name | Type | Description | | ----------- | ---------------- | --------------------------------------------------------------- | | `y_spans` | `Iterable[Span]` | `Span` objects aligned to the tokenization of `self.reference`. | | **RETURNS** | `Iterable[Span]` | `Span` objects aligned to the tokenization of `self.predicted`. | ## Example.get_aligned_spans_x2y {#get_aligned_spans_x2y tag="method"} > #### Example > > ```python > nlp.add_pipe("my_ner") > doc = nlp("Mr and Mrs Smith flew to New York") > tokens_ref = ["Mr and Mrs", "Smith", "flew", "to", "New York"] > example = Example.from_dict(doc, {"words": tokens_ref}) > ents_pred = example.predicted.ents > # Assume the NER model has found "Mr and Mrs Smith" as a named entity > assert [(ent.start, ent.end) for ent in ents_pred] == [(0, 4)] > ents_x2y = example.get_aligned_spans_x2y(ents_pred) > assert [(ent.start, ent.end) for ent in ents_x2y] == [(0, 2)] > ``` Get the aligned view of any set of [`Span`](/api/span) objects defined over `example.predicted`. The resulting span indices will align to the tokenization in `example.reference`. This method is particularly useful to assess the accuracy of predicted entities against the original gold-standard annotation. | Name | Type | Description | | ----------- | ---------------- | --------------------------------------------------------------- | | `x_spans` | `Iterable[Span]` | `Span` objects aligned to the tokenization of `self.predicted`. | | **RETURNS** | `Iterable[Span]` | `Span` objects aligned to the tokenization of `self.reference`. | ## Example.to_dict {#to_dict tag="method"} Return a dictionary representation of the reference annotation contained in this `Example`. > #### Example > > ```python > eg_dict = example.to_dict() > ``` | Name | Type | Description | | ----------- | ---------------- | ------------------------------------------------------ | | **RETURNS** | `Dict[str, obj]` | Dictionary representation of the reference annotation. | ## Example.split_sents {#split_sents tag="method"} > #### Example > > ```python > doc = nlp("I went yesterday had lots of fun") > tokens_ref = ["I", "went", "yesterday", "had", "lots", "of", "fun"] > sents_ref = [True, False, False, True, False, False, False] > example = Example.from_dict(doc, {"words": tokens_ref, "sent_starts": sents_ref}) > split_examples = example.split_sents() > assert split_examples[0].text == "I went yesterday " > assert split_examples[1].text == "had lots of fun" > ``` Split one `Example` into multiple `Example` objects, one for each sentence. | Name | Type | Description | | ----------- | --------------- | ---------------------------------------------------------- | | **RETURNS** | `List[Example]` | List of `Example` objects, one for each original sentence. |