fix component constructors, update, begin_training, reference to GoldParse

svlandeg 2020-07-07 19:17:19 +02:00
parent 14a796e3f9
commit 2b60e894cb
12 changed files with 265 additions and 238 deletions

View File

@ -33,16 +33,16 @@ shortcut for this and instantiate the component using its string name and
>
> # Construction from class
> from spacy.pipeline import DependencyParser
> parser = DependencyParser(nlp.vocab)
> parser = DependencyParser(nlp.vocab, parser_model)
> parser.from_disk("/path/to/model")
> ```
| Name | Type | Description |
| ----------- | ----------------------------- | ----------------------------------------------------------------------------------------------------------------------------------------------------- |
| `vocab` | `Vocab` | The shared vocabulary. |
| `model` | `thinc.neural.Model` / `True` | The model powering the pipeline component. If no model is supplied, the model is created when you call `begin_training`, `from_disk` or `from_bytes`. |
| `**cfg` | - | Configuration parameters. |
| **RETURNS** | `DependencyParser` | The newly constructed object. |
| Name | Type | Description |
| ----------- | ------------------ | ------------------------------------------------------------------------------- |
| `vocab` | `Vocab` | The shared vocabulary. |
| `model` | `Model` | The [`Model`](https://thinc.ai/docs/api-model) powering the pipeline component. |
| `**cfg` | - | Configuration parameters. |
| **RETURNS** | `DependencyParser` | The newly constructed object. |
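For reference, a minimal sketch of the string-name shortcut described above, assuming a blank English pipeline and the v2-style `add_pipe` call (the factory then supplies the component's default model):

```python
import spacy

nlp = spacy.blank("en")
# create the component via its string name instead of the class constructor
parser = nlp.create_pipe("parser")
nlp.add_pipe(parser)
```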
## DependencyParser.\_\_call\_\_ {#call tag="method"}
@ -126,26 +126,28 @@ Modify a batch of documents, using pre-computed scores.
## DependencyParser.update {#update tag="method"}
Learn from a batch of documents and gold-standard information, updating the
pipe's model. Delegates to [`predict`](/api/dependencyparser#predict) and
Learn from a batch of [`Example`](/api/example) objects, updating the pipe's
model. Delegates to [`predict`](/api/dependencyparser#predict) and
[`get_loss`](/api/dependencyparser#get_loss).
> #### Example
>
> ```python
> parser = DependencyParser(nlp.vocab)
> parser = DependencyParser(nlp.vocab, parser_model)
> losses = {}
> optimizer = nlp.begin_training()
> parser.update([doc1, doc2], [gold1, gold2], losses=losses, sgd=optimizer)
> parser.update(examples, losses=losses, sgd=optimizer)
> ```
| Name | Type | Description |
| -------- | -------- | -------------------------------------------------------------------------------------------- |
| `docs` | iterable | A batch of documents to learn from. |
| `golds` | iterable | The gold-standard data. Must have the same length as `docs`. |
| `drop` | float | The dropout rate. |
| `sgd` | callable | The optimizer. Should take two arguments `weights` and `gradient`, and an optional ID. |
| `losses` | dict | Optional record of the loss during training. The value keyed by the model's name is updated. |
| Name | Type | Description |
| ----------------- | ------------------- | ---------------------------------------------------------------------------------------------------------------------------------------------- |
| `examples` | `Iterable[Example]` | A batch of [`Example`](/api/example) objects to learn from. |
| _keyword-only_ | | |
| `drop` | float | The dropout rate. |
| `set_annotations` | bool | Whether or not to update the `Example` objects with the predictions, delegating to [`set_annotations`](/api/dependencyparser#set_annotations). |
| `sgd` | `Optimizer` | The [`Optimizer`](https://thinc.ai/docs/api-optimizers) object. |
| `losses` | `Dict[str, float]` | Optional record of the loss during training. The value keyed by the model's name is updated. |
| **RETURNS** | `Dict[str, float]` | The updated `losses` dictionary. |
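A sketch of a full `update` call, assuming `parser` is the pipeline's parser component, `train_data` holds `(text, annotation_dict)` pairs and `optimizer` was returned by `nlp.begin_training()`:

```python
from spacy.gold import Example

examples = []
for text, annots in train_data:
    # the predicted Doc comes from the raw text, the reference from the dict
    doc = nlp.make_doc(text)
    examples.append(Example.from_dict(doc, annots))

losses = {}
parser.update(examples, drop=0.2, losses=losses, sgd=optimizer)
```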
## DependencyParser.get_loss {#get_loss tag="method"}
@ -169,8 +171,8 @@ predicted scores.
## DependencyParser.begin_training {#begin_training tag="method"}
Initialize the pipe for training, using data examples if available. If no model
has been initialized yet, the model is added.
Initialize the pipe for training, using data examples if available. Return an
[`Optimizer`](https://thinc.ai/docs/api-optimizers) object.
> #### Example
>
@ -180,16 +182,17 @@ has been initialized yet, the model is added.
> optimizer = parser.begin_training(pipeline=nlp.pipeline)
> ```
| Name | Type | Description |
| ------------- | -------- | ------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------- |
| `gold_tuples` | iterable | Optional gold-standard annotations from which to construct [`GoldParse`](/api/goldparse) objects. |
| `pipeline` | list | Optional list of pipeline components that this component is part of. |
| `sgd` | callable | An optional optimizer. Should take two arguments `weights` and `gradient`, and an optional ID. Will be created via [`DependencyParser`](/api/dependencyparser#create_optimizer) if not set. |
| **RETURNS** | callable | An optimizer. |
| Name | Type | Description |
| -------------- | ----------------------- | -------------------------------------------------------------------------------------------------------------------------------------------------------------------- |
| `get_examples` | `Iterable[Example]` | Optional gold-standard annotations in the form of [`Example`](/api/example) objects. |
| `pipeline`     | `List[Tuple[str, Callable]]` | Optional list of `(name, component)` tuples of pipeline components that this component is part of. |
| `sgd` | `Optimizer` | An optional [`Optimizer`](https://thinc.ai/docs/api-optimizers) object. Will be created via [`create_optimizer`](/api/dependencyparser#create_optimizer) if not set. |
| **RETURNS** | `Optimizer` | An optimizer. |
## DependencyParser.create_optimizer {#create_optimizer tag="method"}
Create an optimizer for the pipeline component.
Create an [`Optimizer`](https://thinc.ai/docs/api-optimizers) for the pipeline
component.
> #### Example
>
@ -198,9 +201,9 @@ Create an optimizer for the pipeline component.
> optimizer = parser.create_optimizer()
> ```
| Name | Type | Description |
| ----------- | -------- | -------------- |
| **RETURNS** | callable | The optimizer. |
| Name | Type | Description |
| ----------- | ----------- | -------------- |
| **RETURNS** | `Optimizer` | The optimizer. |
## DependencyParser.use_params {#use_params tag="method, contextmanager"}

View File

@ -38,18 +38,17 @@ shortcut for this and instantiate the component using its string name and
>
> # Construction from class
> from spacy.pipeline import EntityLinker
> entity_linker = EntityLinker(nlp.vocab)
> entity_linker = EntityLinker(nlp.vocab, nel_model)
> entity_linker.from_disk("/path/to/model")
> ```
| Name | Type | Description |
| -------------- | ----------------------------- | ----------------------------------------------------------------------------------------------------------------------------------------------------- |
| `vocab` | `Vocab` | The shared vocabulary. |
| `model` | `thinc.neural.Model` / `True` | The model powering the pipeline component. If no model is supplied, the model is created when you call `begin_training`, `from_disk` or `from_bytes`. |
| `hidden_width` | int | Width of the hidden layer of the entity linking model, defaults to `128`. |
| `incl_prior` | bool | Whether or not to include prior probabilities in the model. Defaults to `True`. |
| `incl_context` | bool | Whether or not to include the local context in the model (if not: only prior probabilities are used). Defaults to `True`. |
| **RETURNS** | `EntityLinker` | The newly constructed object. |
| Name | Type | Description |
| ------- | ------- | ------------------------------------------------------------------------------- |
| `vocab` | `Vocab` | The shared vocabulary. |
| `model` | `Model` | The [`Model`](https://thinc.ai/docs/api-model) powering the pipeline component. |
| `**cfg` | - | Configuration parameters. |
| **RETURNS** | `EntityLinker` | The newly constructed object. |
## EntityLinker.\_\_call\_\_ {#call tag="method"}
@ -134,7 +133,7 @@ entities.
## EntityLinker.update {#update tag="method"}
Learn from a batch of documents and gold-standard information, updating both the
Learn from a batch of [`Example`](/api/example) objects, updating both the
pipe's entity linking model and context encoder. Delegates to
[`predict`](/api/entitylinker#predict) and
[`get_loss`](/api/entitylinker#get_loss).
@ -142,19 +141,21 @@ pipe's entity linking model and context encoder. Delegates to
> #### Example
>
> ```python
> entity_linker = EntityLinker(nlp.vocab)
> entity_linker = EntityLinker(nlp.vocab, nel_model)
> losses = {}
> optimizer = nlp.begin_training()
> entity_linker.update([doc1, doc2], [gold1, gold2], losses=losses, sgd=optimizer)
> entity_linker.update(examples, losses=losses, sgd=optimizer)
> ```
| Name | Type | Description |
| -------- | -------- | ------------------------------------------------------------------------------------------------------- |
| `docs` | iterable | A batch of documents to learn from. |
| `golds` | iterable | The gold-standard data. Must have the same length as `docs`. |
| `drop` | float | The dropout rate, used both for the EL model and the context encoder. |
| `sgd` | callable | The optimizer for the EL model. Should take two arguments `weights` and `gradient`, and an optional ID. |
| `losses` | dict | Optional record of the loss during training. The value keyed by the model's name is updated. |
| Name | Type | Description |
| ----------------- | ------------------- | ------------------------------------------------------------------------------------------------------------------------------------------ |
| `examples` | `Iterable[Example]` | A batch of [`Example`](/api/example) objects to learn from. |
| _keyword-only_ | | |
| `drop` | float | The dropout rate. |
| `set_annotations` | bool | Whether or not to update the `Example` objects with the predictions, delegating to [`set_annotations`](/api/entitylinker#set_annotations). |
| `sgd` | `Optimizer` | [`Optimizer`](https://thinc.ai/docs/api-optimizers) object. |
| `losses` | `Dict[str, float]` | Optional record of the loss during training. The value keyed by the model's name is updated. |
| **RETURNS** | float | The loss from this batch. |
## EntityLinker.get_loss {#get_loss tag="method"}
@ -195,9 +196,9 @@ identifiers.
## EntityLinker.begin_training {#begin_training tag="method"}
Initialize the pipe for training, using data examples if available. If no model
has been initialized yet, the model is added. Before calling this method, a
knowledge base should have been defined with
Initialize the pipe for training, using data examples if available. Return an
[`Optimizer`](https://thinc.ai/docs/api-optimizers) object. Before calling this
method, a knowledge base should have been defined with
[`set_kb`](/api/entitylinker#set_kb).
> #### Example
@ -209,12 +210,12 @@ knowledge base should have been defined with
> optimizer = entity_linker.begin_training(pipeline=nlp.pipeline)
> ```
| Name | Type | Description |
| ------------- | -------- | ----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------- |
| `gold_tuples` | iterable | Optional gold-standard annotations from which to construct [`GoldParse`](/api/goldparse) objects. |
| `pipeline` | list | Optional list of pipeline components that this component is part of. |
| `sgd` | callable | An optional optimizer. Should take two arguments `weights` and `gradient`, and an optional ID. Will be created via [`EntityLinker`](/api/entitylinker#create_optimizer) if not set. |
| **RETURNS** | callable | An optimizer. |
| Name | Type | Description |
| -------------- | ----------------------- | ---------------------------------------------------------------------------------------------------------------------------------------------------------------- |
| `get_examples` | `Iterable[Example]` | Optional gold-standard annotations in the form of [`Example`](/api/example) objects. |
| `pipeline`     | `List[Tuple[str, Callable]]` | Optional list of `(name, component)` tuples of pipeline components that this component is part of. |
| `sgd` | `Optimizer` | An optional [`Optimizer`](https://thinc.ai/docs/api-optimizers) object. Will be created via [`create_optimizer`](/api/entitylinker#create_optimizer) if not set. |
| **RETURNS**    | `Optimizer`             | An optimizer. |
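As a sketch of the required order of calls, assuming `kb` is a [`KnowledgeBase`](/api/kb) created and populated beforehand:

```python
entity_linker = nlp.create_pipe("entity_linker")
entity_linker.set_kb(kb)  # the knowledge base must be set before training
optimizer = entity_linker.begin_training(pipeline=nlp.pipeline)
```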
## EntityLinker.create_optimizer {#create_optimizer tag="method"}

View File

@ -33,16 +33,16 @@ shortcut for this and instantiate the component using its string name and
>
> # Construction from class
> from spacy.pipeline import EntityRecognizer
> ner = EntityRecognizer(nlp.vocab)
> ner = EntityRecognizer(nlp.vocab, ner_model)
> ner.from_disk("/path/to/model")
> ```
| Name | Type | Description |
| ----------- | ----------------------------- | ----------------------------------------------------------------------------------------------------------------------------------------------------- |
| `vocab` | `Vocab` | The shared vocabulary. |
| `model` | `thinc.neural.Model` / `True` | The model powering the pipeline component. If no model is supplied, the model is created when you call `begin_training`, `from_disk` or `from_bytes`. |
| `**cfg` | - | Configuration parameters. |
| **RETURNS** | `EntityRecognizer` | The newly constructed object. |
| Name | Type | Description |
| ----------- | ------------------ | ------------------------------------------------------------------------------- |
| `vocab` | `Vocab` | The shared vocabulary. |
| `model` | `Model` | The [`Model`](https://thinc.ai/docs/api-model) powering the pipeline component. |
| `**cfg` | - | Configuration parameters. |
| **RETURNS** | `EntityRecognizer` | The newly constructed object. |
## EntityRecognizer.\_\_call\_\_ {#call tag="method"}
@ -102,10 +102,10 @@ Apply the pipeline's model to a batch of docs, without modifying them.
> scores, tensors = ner.predict([doc1, doc2])
> ```
| Name | Type | Description |
| ----------- | -------- | ---------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------- |
| `docs` | iterable | The documents to predict. |
| **RETURNS** | list | List of `syntax.StateClass` objects. `syntax.StateClass` is a helper class for the parse state (internal). |
| Name | Type | Description |
| ----------- | -------- | ---------------------------------------------------------------------------------------------------------- |
| `docs` | iterable | The documents to predict. |
| **RETURNS** | list | List of `syntax.StateClass` objects. `syntax.StateClass` is a helper class for the parse state (internal). |
## EntityRecognizer.set_annotations {#set_annotations tag="method"}
@ -127,26 +127,28 @@ Modify a batch of documents, using pre-computed scores.
## EntityRecognizer.update {#update tag="method"}
Learn from a batch of documents and gold-standard information, updating the
pipe's model. Delegates to [`predict`](/api/entityrecognizer#predict) and
Learn from a batch of [`Example`](/api/example) objects, updating the pipe's
model. Delegates to [`predict`](/api/entityrecognizer#predict) and
[`get_loss`](/api/entityrecognizer#get_loss).
> #### Example
>
> ```python
> ner = EntityRecognizer(nlp.vocab)
> ner = EntityRecognizer(nlp.vocab, ner_model)
> losses = {}
> optimizer = nlp.begin_training()
> ner.update([doc1, doc2], [gold1, gold2], losses=losses, sgd=optimizer)
> ner.update(examples, losses=losses, sgd=optimizer)
> ```
| Name | Type | Description |
| -------- | -------- | -------------------------------------------------------------------------------------------- |
| `docs` | iterable | A batch of documents to learn from. |
| `golds` | iterable | The gold-standard data. Must have the same length as `docs`. |
| `drop` | float | The dropout rate. |
| `sgd` | callable | The optimizer. Should take two arguments `weights` and `gradient`, and an optional ID. |
| `losses` | dict | Optional record of the loss during training. The value keyed by the model's name is updated. |
| Name | Type | Description |
| ----------------- | ------------------- | ---------------------------------------------------------------------------------------------------------------------------------------------- |
| `examples` | `Iterable[Example]` | A batch of [`Example`](/api/example) objects to learn from. |
| _keyword-only_ | | |
| `drop` | float | The dropout rate. |
| `set_annotations` | bool | Whether or not to update the `Example` objects with the predictions, delegating to [`set_annotations`](/api/entityrecognizer#set_annotations). |
| `sgd` | `Optimizer` | The [`Optimizer`](https://thinc.ai/docs/api-optimizers) object. |
| `losses` | `Dict[str, float]` | Optional record of the loss during training. The value keyed by the model's name is updated. |
| **RETURNS** | `Dict[str, float]` | The updated `losses` dictionary. |
## EntityRecognizer.get_loss {#get_loss tag="method"}
@ -170,8 +172,8 @@ predicted scores.
## EntityRecognizer.begin_training {#begin_training tag="method"}
Initialize the pipe for training, using data examples if available. If no model
has been initialized yet, the model is added.
Initialize the pipe for training, using data examples if available. Return an
[`Optimizer`](https://thinc.ai/docs/api-optimizers) object.
> #### Example
>
@ -181,12 +183,14 @@ has been initialized yet, the model is added.
> optimizer = ner.begin_training(pipeline=nlp.pipeline)
> ```
| Name | Type | Description |
| ------------- | -------- | ------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------- |
| `gold_tuples` | iterable | Optional gold-standard annotations from which to construct [`GoldParse`](/api/goldparse) objects. |
| `pipeline` | list | Optional list of pipeline components that this component is part of. |
| `sgd` | callable | An optional optimizer. Should take two arguments `weights` and `gradient`, and an optional ID. Will be created via [`EntityRecognizer`](/api/entityrecognizer#create_optimizer) if not set. |
| **RETURNS** | callable | An optimizer. |
| Name | Type | Description |
| -------------- | ----------------------- | -------------------------------------------------------------------------------------------------------------------------------------------------------------------- |
| `get_examples` | `Iterable[Example]` | Optional gold-standard annotations in the form of [`Example`](/api/example) objects. |
| `pipeline`     | `List[Tuple[str, Callable]]` | Optional list of `(name, component)` tuples of pipeline components that this component is part of. |
| `sgd` | `Optimizer` | An optional [`Optimizer`](https://thinc.ai/docs/api-optimizers) object. Will be created via [`create_optimizer`](/api/entityrecognizer#create_optimizer) if not set. |
| **RETURNS** | `Optimizer` | An optimizer. |
## EntityRecognizer.create_optimizer {#create_optimizer tag="method"}

View File

@ -141,11 +141,12 @@ of the `reference` document.
> assert example.get_aligned("TAG", as_string=True) == ["VERB", "DET", "NOUN"]
> ```
Get the aligned view of a certain token attribute, denoted by its int ID or string name.
Get the aligned view of a certain token attribute, denoted by its int ID or
string name.
| Name | Type | Description | Default |
| ----------- | -------------------------- | ------------------------------------------------------------------ | ------- |
| `field` | int or str | Attribute ID or string name | |
| `field` | int or str | Attribute ID or string name | |
| `as_string` | bool | Whether or not to return the list of values as strings. | `False` |
| **RETURNS** | `List[int]` or `List[str]` | List of integer values, or string values if `as_string` is `True`. | |
@ -176,7 +177,7 @@ Pseudo-Projective Dependency Parsing algorithm by Nivre and Nilsson (2005).
> ```python
> words = ["Mrs", "Smith", "flew", "to", "New York"]
> doc = Doc(en_vocab, words=words)
> entities = [(0, len("Mrs Smith"), "PERSON"), (18, 18 + len("New York"), "LOC")]
> entities = [(0, 9, "PERSON"), (18, 26, "LOC")]
> gold_words = ["Mrs Smith", "flew", "to", "New", "York"]
> example = Example.from_dict(doc, {"words": gold_words, "entities": entities})
> ner_tags = example.get_aligned_ner()
@ -197,7 +198,7 @@ Get the aligned view of the NER
> ```python
> words = ["Mr and Mrs Smith", "flew", "to", "New York"]
> doc = Doc(en_vocab, words=words)
> entities = [(0, len("Mr and Mrs Smith"), "PERSON")]
> entities = [(0, 16, "PERSON")]
> tokens_ref = ["Mr", "and", "Mrs", "Smith", "flew", "to", "New", "York"]
> example = Example.from_dict(doc, {"words": tokens_ref, "entities": entities})
> ents_ref = example.reference.ents
@ -220,15 +221,12 @@ in `example.predicted`.
> #### Example
>
> ```python
> ruler = EntityRuler(nlp)
> patterns = [{"label": "PERSON", "pattern": "Mr and Mrs Smith"}]
> ruler.add_patterns(patterns)
> nlp.add_pipe(ruler)
> nlp.add_pipe(my_ner)
> doc = nlp("Mr and Mrs Smith flew to New York")
> entities = [(0, len("Mr and Mrs Smith"), "PERSON")]
> tokens_ref = ["Mr and Mrs", "Smith", "flew", "to", "New York"]
> example = Example.from_dict(doc, {"words": tokens_ref, "entities": entities})
> example = Example.from_dict(doc, {"words": tokens_ref})
> ents_pred = example.predicted.ents
> # Assume the NER model has found "Mr and Mrs Smith" as a named entity
> assert [(ent.start, ent.end) for ent in ents_pred] == [(0, 4)]
> ents_x2y = example.get_aligned_spans_x2y(ents_pred)
> assert [(ent.start, ent.end) for ent in ents_x2y] == [(0, 2)]

View File

@ -87,18 +87,18 @@ Update the models in the pipeline.
> ```python
> for raw_text, entity_offsets in train_data:
> doc = nlp.make_doc(raw_text)
> gold = GoldParse(doc, entities=entity_offsets)
> nlp.update([doc], [gold], drop=0.5, sgd=optimizer)
> example = Example.from_dict(doc, {"entities": entity_offsets})
> nlp.update([example], sgd=optimizer)
> ```
| Name | Type | Description |
| -------------------------------------------- | -------- | ------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------- |
| `docs` | iterable | A batch of `Doc` objects or strings. If strings, a `Doc` object will be created from the text. |
| `golds` | iterable | A batch of `GoldParse` objects or dictionaries. Dictionaries will be used to create [`GoldParse`](/api/goldparse) objects. For the available keys and their usage, see [`GoldParse.__init__`](/api/goldparse#init). |
| `drop` | float | The dropout rate. |
| `sgd` | callable | An optimizer. |
| `losses` | dict | Dictionary to update with the loss, keyed by pipeline component. |
| `component_cfg` <Tag variant="new">2.1</Tag> | dict | Config parameters for specific pipeline components, keyed by component name. |
| Name | Type | Description |
| -------------------------------------------- | ------------------- | ---------------------------------------------------------------------------- |
| `examples` | `Iterable[Example]` | A batch of `Example` objects to learn from. |
| _keyword-only_ | | |
| `drop` | float | The dropout rate. |
| `sgd` | `Optimizer` | An [`Optimizer`](https://thinc.ai/docs/api-optimizers) object. |
| `losses` | `Dict[str, float]` | Dictionary to update with the loss, keyed by pipeline component. |
| `component_cfg` <Tag variant="new">2.1</Tag> | `Dict[str, Dict]` | Config parameters for specific pipeline components, keyed by component name. |
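For illustration, a minibatched training loop over this signature; a sketch assuming `train_examples` is a list of [`Example`](/api/example) objects built as in the example above:

```python
import random
from spacy.util import minibatch

optimizer = nlp.begin_training()
for epoch in range(10):
    random.shuffle(train_examples)
    for batch in minibatch(train_examples, size=8):
        losses = {}
        nlp.update(batch, drop=0.35, losses=losses, sgd=optimizer)
```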
## Language.evaluate {#evaluate tag="method"}
@ -107,35 +107,37 @@ Evaluate a model's pipeline components.
> #### Example
>
> ```python
> scorer = nlp.evaluate(docs_golds, verbose=True)
> scorer = nlp.evaluate(examples, verbose=True)
> print(scorer.scores)
> ```
| Name | Type | Description |
| -------------------------------------------- | -------- | ------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------ |
| `docs_golds` | iterable | Tuples of `Doc` and `GoldParse` objects, such that the `Doc` objects contain the predictions and the `GoldParse` objects the correct annotations. Alternatively, `(text, annotations)` tuples of raw text and a dict (see [simple training style](/usage/training#training-simple-style)). |
| `verbose` | bool | Print debugging information. |
| `batch_size` | int | The batch size to use. |
| `scorer` | `Scorer` | Optional [`Scorer`](/api/scorer) to use. If not passed in, a new one will be created. |
| `component_cfg` <Tag variant="new">2.1</Tag> | dict | Config parameters for specific pipeline components, keyed by component name. |
| **RETURNS** | Scorer | The scorer containing the evaluation scores. |
| Name | Type | Description |
| -------------------------------------------- | ------------------- | ------------------------------------------------------------------------------------- |
| `examples` | `Iterable[Example]` | A batch of [`Example`](/api/example) objects to learn from. |
| `verbose` | bool | Print debugging information. |
| `batch_size` | int | The batch size to use. |
| `scorer` | `Scorer` | Optional [`Scorer`](/api/scorer) to use. If not passed in, a new one will be created. |
| `component_cfg` <Tag variant="new">2.1</Tag> | `Dict[str, Dict]` | Config parameters for specific pipeline components, keyed by component name. |
| **RETURNS** | Scorer | The scorer containing the evaluation scores. |
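A sketch of building evaluation data for this signature, assuming `dev_data` holds `(text, annotation_dict)` pairs:

```python
from spacy.gold import Example

dev_examples = [Example.from_dict(nlp.make_doc(text), annots)
                for text, annots in dev_data]
scorer = nlp.evaluate(dev_examples, verbose=True)
print(scorer.scores)
```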
## Language.begin_training {#begin_training tag="method"}
Allocate models, pre-process training data and acquire an optimizer.
Allocate models, pre-process training data and acquire an
[`Optimizer`](https://thinc.ai/docs/api-optimizers).
> #### Example
>
> ```python
> optimizer = nlp.begin_training(gold_tuples)
> optimizer = nlp.begin_training(get_examples)
> ```
| Name | Type | Description |
| -------------------------------------------- | -------- | ---------------------------------------------------------------------------- |
| `gold_tuples` | iterable | Gold-standard training data. |
| `component_cfg` <Tag variant="new">2.1</Tag> | dict | Config parameters for specific pipeline components, keyed by component name. |
| `**cfg` | - | Config parameters (sent to all components). |
| **RETURNS** | callable | An optimizer. |
| Name | Type | Description |
| -------------------------------------------- | ------------------- | ------------------------------------------------------------------------------------------------------------------ |
| `get_examples` | `Iterable[Example]` | Optional gold-standard annotations in the form of [`Example`](/api/example) objects. |
| `sgd` | `Optimizer` | An optional [`Optimizer`](https://thinc.ai/docs/api-optimizers) object. If not set, a default one will be created. |
| `component_cfg` <Tag variant="new">2.1</Tag> | `Dict[str, Dict]` | Config parameters for specific pipeline components, keyed by component name. |
| `**cfg` | - | Config parameters (sent to all components). |
| **RETURNS** | `Optimizer` | An optimizer. |
## Language.use_params {#use_params tag="contextmanager, method"}
@ -155,16 +157,6 @@ their original weights after the block.
| `params` | dict | A dictionary of parameters keyed by model ID. |
| `**cfg` | - | Config parameters. |
## Language.preprocess_gold {#preprocess_gold tag="method"}
Can be called before training to pre-process gold data. By default, it handles
nonprojectivity and adds missing tags to the tag map.
| Name | Type | Description |
| ------------ | -------- | ---------------------------------------- |
| `docs_golds` | iterable | Tuples of `Doc` and `GoldParse` objects. |
| **YIELDS** | tuple | Tuples of `Doc` and `GoldParse` objects. |
## Language.create_pipe {#create_pipe tag="method" new="2"}
Create a pipeline component from a factory.

View File

@ -27,22 +27,20 @@ Create a new `Scorer`.
## Scorer.score {#score tag="method"}
Update the evaluation scores from a single [`Doc`](/api/doc) /
[`GoldParse`](/api/goldparse) pair.
Update the evaluation scores from a single [`Example`](/api/example) object.
> #### Example
>
> ```python
> scorer = Scorer()
> scorer.score(doc, gold)
> scorer.score(example)
> ```
| Name | Type | Description |
| -------------- | ----------- | -------------------------------------------------------------------------------------------------------------------- |
| `doc` | `Doc` | The predicted annotations. |
| `gold` | `GoldParse` | The correct annotations. |
| `verbose` | bool | Print debugging information. |
| `punct_labels` | tuple | Dependency labels for punctuation. Used to evaluate dependency attachments to punctuation if `eval_punct` is `True`. |
| Name | Type | Description |
| -------------- | --------- | -------------------------------------------------------------------------------------------------------------------- |
| `example` | `Example` | The `Example` object holding both the predictions and the correct gold-standard annotations. |
| `verbose` | bool | Print debugging information. |
| `punct_labels` | tuple | Dependency labels for punctuation. Used to evaluate dependency attachments to punctuation if `eval_punct` is `True`. |
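A sketch of scoring processed documents against gold annotations, again assuming `dev_data` holds `(text, annotation_dict)` pairs:

```python
from spacy.scorer import Scorer
from spacy.gold import Example

scorer = Scorer()
for text, annots in dev_data:
    doc = nlp(text)  # the processed Doc holds the predictions
    scorer.score(Example.from_dict(doc, annots))
print(scorer.scores)
```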
## Properties

View File

@ -33,16 +33,16 @@ shortcut for this and instantiate the component using its string name and
>
> # Construction from class
> from spacy.pipeline import Tagger
> tagger = Tagger(nlp.vocab)
> tagger = Tagger(nlp.vocab, tagger_model)
> tagger.from_disk("/path/to/model")
> ```
| Name | Type | Description |
| ----------- | ----------------------------- | ----------------------------------------------------------------------------------------------------------------------------------------------------- |
| `vocab` | `Vocab` | The shared vocabulary. |
| `model` | `thinc.neural.Model` / `True` | The model powering the pipeline component. If no model is supplied, the model is created when you call `begin_training`, `from_disk` or `from_bytes`. |
| `**cfg` | - | Configuration parameters. |
| **RETURNS** | `Tagger` | The newly constructed object. |
| Name | Type | Description |
| ----------- | -------- | ------------------------------------------------------------------------------- |
| `vocab` | `Vocab` | The shared vocabulary. |
| `model` | `Model` | The [`Model`](https://thinc.ai/docs/api-model) powering the pipeline component. |
| `**cfg` | - | Configuration parameters. |
| **RETURNS** | `Tagger` | The newly constructed object. |
## Tagger.\_\_call\_\_ {#call tag="method"}
@ -132,19 +132,20 @@ pipe's model. Delegates to [`predict`](/api/tagger#predict) and
> #### Example
>
> ```python
> tagger = Tagger(nlp.vocab)
> tagger = Tagger(nlp.vocab, tagger_model)
> losses = {}
> optimizer = nlp.begin_training()
> tagger.update([doc1, doc2], [gold1, gold2], losses=losses, sgd=optimizer)
> tagger.update(examples, losses=losses, sgd=optimizer)
> ```
| Name | Type | Description |
| -------- | -------- | -------------------------------------------------------------------------------------------- |
| `docs` | iterable | A batch of documents to learn from. |
| `golds` | iterable | The gold-standard data. Must have the same length as `docs`. |
| `drop` | float | The dropout rate. |
| `sgd` | callable | The optimizer. Should take two arguments `weights` and `gradient`, and an optional ID. |
| `losses` | dict | Optional record of the loss during training. The value keyed by the model's name is updated. |
| Name | Type | Description |
| ----------------- | ------------------- | ------------------------------------------------------------------------------------------------------------------------------------ |
| `examples` | `Iterable[Example]` | A batch of [`Example`](/api/example) objects to learn from. |
| _keyword-only_ | | |
| `drop` | float | The dropout rate. |
| `set_annotations` | bool | Whether or not to update the `Example` objects with the predictions, delegating to [`set_annotations`](/api/tagger#set_annotations). |
| `sgd` | `Optimizer` | The [`Optimizer`](https://thinc.ai/docs/api-optimizers) object. |
| `losses` | `Dict[str, float]` | Optional record of the loss during training. The value keyed by the model's name is updated. |
## Tagger.get_loss {#get_loss tag="method"}
@ -168,8 +169,8 @@ predicted scores.
## Tagger.begin_training {#begin_training tag="method"}
Initialize the pipe for training, using data examples if available. If no model
has been initialized yet, the model is added.
Initialize the pipe for training, using data examples if available. Return an
[`Optimizer`](https://thinc.ai/docs/api-optimizers) object.
> #### Example
>
@ -179,12 +180,12 @@ has been initialized yet, the model is added.
> optimizer = tagger.begin_training(pipeline=nlp.pipeline)
> ```
| Name | Type | Description |
| ------------- | -------- | ----------------------------------------------------------------------------------------------------------------------------------------------------------------------- |
| `gold_tuples` | iterable | Optional gold-standard annotations from which to construct [`GoldParse`](/api/goldparse) objects. |
| `pipeline` | list | Optional list of pipeline components that this component is part of. |
| `sgd` | callable | An optional optimizer. Should take two arguments `weights` and `gradient`, and an optional ID. Will be created via [`Tagger`](/api/tagger#create_optimizer) if not set. |
| **RETURNS** | callable | An optimizer. |
| Name | Type | Description |
| -------------- | ----------------------- | ---------------------------------------------------------------------------------------------------------------------------------------------------------- |
| `get_examples` | `Iterable[Example]` | Optional gold-standard annotations in the form of [`Example`](/api/example) objects. |
| `pipeline`     | `List[Tuple[str, Callable]]` | Optional list of `(name, component)` tuples of pipeline components that this component is part of. |
| `sgd` | `Optimizer` | An optional [`Optimizer`](https://thinc.ai/docs/api-optimizers) object. Will be created via [`create_optimizer`](/api/tagger#create_optimizer) if not set. |
| **RETURNS** | `Optimizer` | An optimizer. |
## Tagger.create_optimizer {#create_optimizer tag="method"}

View File

@ -35,17 +35,16 @@ shortcut for this and instantiate the component using its string name and
>
> # Construction from class
> from spacy.pipeline import TextCategorizer
> textcat = TextCategorizer(nlp.vocab)
> textcat = TextCategorizer(nlp.vocab, textcat_model)
> textcat.from_disk("/path/to/model")
> ```
| Name | Type | Description |
| ------------------- | ----------------------------- | ----------------------------------------------------------------------------------------------------------------------------------------------------- |
| `vocab` | `Vocab` | The shared vocabulary. |
| `model` | `thinc.neural.Model` / `True` | The model powering the pipeline component. If no model is supplied, the model is created when you call `begin_training`, `from_disk` or `from_bytes`. |
| `exclusive_classes` | bool | Make categories mutually exclusive. Defaults to `False`. |
| `architecture` | str | Model architecture to use, see [architectures](#architectures) for details. Defaults to `"ensemble"`. |
| **RETURNS** | `TextCategorizer` | The newly constructed object. |
| Name | Type | Description |
| ----------- | ----------------- | ------------------------------------------------------------------------------- |
| `vocab` | `Vocab` | The shared vocabulary. |
| `model` | `Model` | The [`Model`](https://thinc.ai/docs/api-model) powering the pipeline component. |
| `**cfg` | - | Configuration parameters. |
| **RETURNS** | `TextCategorizer` | The newly constructed object. |
### Architectures {#architectures new="2.1"}
@ -151,19 +150,20 @@ pipe's model. Delegates to [`predict`](/api/textcategorizer#predict) and
> #### Example
>
> ```python
> textcat = TextCategorizer(nlp.vocab)
> textcat = TextCategorizer(nlp.vocab, textcat_model)
> losses = {}
> optimizer = nlp.begin_training()
> textcat.update([doc1, doc2], [gold1, gold2], losses=losses, sgd=optimizer)
> textcat.update(examples, losses=losses, sgd=optimizer)
> ```
| Name | Type | Description |
| -------- | -------- | -------------------------------------------------------------------------------------------- |
| `docs` | iterable | A batch of documents to learn from. |
| `golds` | iterable | The gold-standard data. Must have the same length as `docs`. |
| `drop` | float | The dropout rate. |
| `sgd` | callable | The optimizer. Should take two arguments `weights` and `gradient`, and an optional ID. |
| `losses` | dict | Optional record of the loss during training. The value keyed by the model's name is updated. |
| Name | Type | Description |
| ----------------- | ------------------- | --------------------------------------------------------------------------------------------------------------------------------------------- |
| `examples` | `Iterable[Example]` | A batch of [`Example`](/api/example) objects to learn from. |
| _keyword-only_ | | |
| `drop` | float | The dropout rate. |
| `set_annotations` | bool | Whether or not to update the `Example` objects with the predictions, delegating to [`set_annotations`](/api/textcategorizer#set_annotations). |
| `sgd` | `Optimizer` | The [`Optimizer`](https://thinc.ai/docs/api-optimizers) object. |
| `losses` | `Dict[str, float]` | Optional record of the loss during training. The value keyed by the model's name is updated. |
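As a sketch for text classification specifically, gold categories are passed via the `"cats"` key of the annotation dict (the label names here are assumptions):

```python
from spacy.gold import Example

doc = nlp.make_doc("This is great")
example = Example.from_dict(doc, {"cats": {"POSITIVE": 1.0, "NEGATIVE": 0.0}})
losses = {}
textcat.update([example], losses=losses, sgd=optimizer)
```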
## TextCategorizer.get_loss {#get_loss tag="method"}
@ -187,8 +187,8 @@ predicted scores.
## TextCategorizer.begin_training {#begin_training tag="method"}
Initialize the pipe for training, using data examples if available. If no model
has been initialized yet, the model is added.
Initialize the pipe for training, using data examples if available. Return an
[`Optimizer`](https://thinc.ai/docs/api-optimizers) object.
> #### Example
>
@ -198,12 +198,12 @@ has been initialized yet, the model is added.
> optimizer = textcat.begin_training(pipeline=nlp.pipeline)
> ```
| Name | Type | Description |
| ------------- | -------- | ----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------- |
| `gold_tuples` | iterable | Optional gold-standard annotations from which to construct [`GoldParse`](/api/goldparse) objects. |
| `pipeline` | list | Optional list of pipeline components that this component is part of. |
| `sgd` | callable | An optional optimizer. Should take two arguments `weights` and `gradient`, and an optional ID. Will be created via [`TextCategorizer`](/api/textcategorizer#create_optimizer) if not set. |
| **RETURNS** | callable | An optimizer. |
| Name | Type | Description |
| -------------- | ----------------------- | ------------------------------------------------------------------------------------------------------------------------------------------------------------------- |
| `get_examples` | `Iterable[Example]` | Optional gold-standard annotations in the form of [`Example`](/api/example) objects. |
| `pipeline`     | `List[Tuple[str, Callable]]` | Optional list of `(name, component)` tuples of pipeline components that this component is part of. |
| `sgd` | `Optimizer` | An optional [`Optimizer`](https://thinc.ai/docs/api-optimizers) object. Will be created via [`create_optimizer`](/api/textcategorizer#create_optimizer) if not set. |
| **RETURNS** | `Optimizer` | An optimizer. |
## TextCategorizer.create_optimizer {#create_optimizer tag="method"}

View File

@ -719,8 +719,7 @@ vary on each step.
> ```python
> batches = minibatch(train_data)
> for batch in batches:
> texts, annotations = zip(*batch)
> nlp.update(texts, annotations)
> nlp.update(batch)
> ```
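One way to let the batch size vary on each step, as described above, is a compounding schedule from `spacy.util`; a sketch, assuming `train_examples` and `optimizer` exist:

```python
from spacy.util import minibatch, compounding

# grow the batch size from 4 to 32 by a factor of 1.001 per batch
batches = minibatch(train_examples, size=compounding(4.0, 32.0, 1.001))
for batch in batches:
    nlp.update(batch, sgd=optimizer)
```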
| Name | Type | Description |

View File

@ -45,10 +45,11 @@ an **annotated document**. It also orchestrates training and serialization.
### Other classes {#architecture-other}
| Name | Description |
| --------------------------------- | ------------------------------------------------------------------------------------------------------------- |
| [`Vocab`](/api/vocab) | A lookup table for the vocabulary that allows you to access `Lexeme` objects. |
| [`StringStore`](/api/stringstore) | Map strings to and from hash values. |
| [`Vectors`](/api/vectors) | Container class for vector data keyed by string. |
| [`GoldParse`](/api/goldparse) | Collection for training annotations. |
| [`GoldCorpus`](/api/goldcorpus) | An annotated corpus, using the JSON file format. Manages annotations for tagging, dependency parsing and NER. |
| Name | Description |
| --------------------------------- | ----------------------------------------------------------------------------- |
| [`Vocab`](/api/vocab) | A lookup table for the vocabulary that allows you to access `Lexeme` objects. |
| [`StringStore`](/api/stringstore) | Map strings to and from hash values. |
| [`Vectors`](/api/vectors) | Container class for vector data keyed by string. |
| [`Example`](/api/example) | Collection for training annotations. |

View File

@ -633,8 +633,9 @@ for ent in doc.ents:
### Train and update neural network models {#lightning-tour-training}
```python
import spacy
import random
import spacy
from spacy.gold import Example
nlp = spacy.load("en_core_web_sm")
train_data = [("Uber blew through $1 million", {"entities": [(0, 4, "ORG")]})]
@ -644,7 +645,9 @@ with nlp.select_pipes(enable="ner"):
for i in range(10):
random.shuffle(train_data)
for text, annotations in train_data:
nlp.update([text], [annotations], sgd=optimizer)
doc = nlp.make_doc(text)
example = Example.from_dict(doc, annotations)
nlp.update([example], sgd=optimizer)
nlp.to_disk("/model")
```

View File

@ -375,45 +375,71 @@ mattis pretium.
## Internal training API {#api}
<!-- TODO: rewrite for new nlp.update / example logic -->
The [`Example`](/api/example) object contains annotated training data, also
called the **gold standard**. It's initialized with a [`Doc`](/api/doc) object
that will hold the predictions, and another `Doc` object that holds the
gold-standard annotations. Here's an example of a simple `Example` for
part-of-speech tags:
The [`GoldParse`](/api/goldparse) object collects the annotated training
examples, also called the **gold standard**. It's initialized with the
[`Doc`](/api/doc) object it refers to, and keyword arguments specifying the
annotations, like `tags` or `entities`. Its job is to encode the annotations,
keep them aligned and create the C-level data structures required for efficient
access. Here's an example of a simple `GoldParse` for part-of-speech tags:
```python
import numpy
from spacy.gold import Example
from spacy.tokens import Doc
from spacy.vocab import Vocab

vocab = Vocab()
words = ["I", "like", "stuff"]
predicted = Doc(vocab, words=words)
# create the reference Doc with gold-standard TAG annotations
tags = ["NOUN", "VERB", "NOUN"]
tag_ids = [vocab.strings.add(tag) for tag in tags]
reference = Doc(vocab, words=words).from_array("TAG", numpy.array(tag_ids, dtype="uint64"))
example = Example(predicted, reference)
```
Alternatively, the `reference` `Doc` with the gold-standard annotations can be
created from a dictionary with keyword arguments specifying the annotations,
like `tags` or `entities`:
```python
words = ["I", "like", "stuff"]
tags = ["NOUN", "VERB", "NOUN"]
predicted = Doc(vocab, words=words)
example = Example.from_dict(predicted, {"tags": tags})
```
Using the `Example` object and its gold-standard annotations, the model can be
updated to learn a sentence of three words with their assigned part-of-speech
tags.
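A short sketch of such an update, assuming an `nlp` object whose pipeline includes a tagger and that shares the vocabulary used above:

```python
optimizer = nlp.begin_training()
losses = {}
nlp.update([example], sgd=optimizer, losses=losses)
```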
<!-- TODO: is this the best place for the tag_map explanation ? -->
The [tag map](/usage/adding-languages#tag-map) is part of the vocabulary and
defines the annotation scheme. If you're training a new language model, this
will let you map the tags present in the treebank you train on to spaCy's tag
scheme:
```python
vocab = Vocab(tag_map={"N": {"pos": "NOUN"}, "V": {"pos": "VERB"}})
doc = Doc(vocab, words=["I", "like", "stuff"])
gold = GoldParse(doc, tags=["N", "V", "N"])
```
Using the `Doc` and its gold-standard annotations, the model can be updated to
learn a sentence of three words with their assigned part-of-speech tags. The
[tag map](/usage/adding-languages#tag-map) is part of the vocabulary and defines
the annotation scheme. If you're training a new language model, this will let
you map the tags present in the treebank you train on to spaCy's tag scheme.
Another example shows how to define gold-standard named entities:
```python
doc = Doc(Vocab(), words=["Facebook", "released", "React", "in", "2014"])
gold = GoldParse(doc, entities=["U-ORG", "O", "U-TECHNOLOGY", "O", "U-DATE"])
doc = Doc(vocab, words=["Facebook", "released", "React", "in", "2014"])
example = Example.from_dict(doc, {"entities": ["U-ORG", "O", "U-TECHNOLOGY", "O", "U-DATE"]})
```
The same goes for named entities. The letters added before the labels refer to
the tags of the [BILUO scheme](/usage/linguistic-features#updating-biluo): `O`
is a token outside an entity, `U` a single entity unit, `B` the beginning of an
entity, `I` a token inside an entity and `L` the last token of an entity.
The letters added before the labels refer to the tags of the
[BILUO scheme](/usage/linguistic-features#updating-biluo): `O` is a token
outside an entity, `U` a single entity unit, `B` the beginning of an entity,
`I` a token inside an entity and `L` the last token of an entity.
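Rather than writing BILUO tags by hand, they can be derived from character offsets; a sketch using `biluo_tags_from_offsets` from `spacy.gold`:

```python
from spacy.gold import biluo_tags_from_offsets

doc = nlp.make_doc("Facebook released React in 2014")
entities = [(0, 8, "ORG"), (18, 23, "TECHNOLOGY"), (27, 31, "DATE")]
tags = biluo_tags_from_offsets(doc, entities)
# tags == ["U-ORG", "O", "U-TECHNOLOGY", "O", "U-DATE"]
```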
> - **Training data**: The training examples.
> - **Text and label**: The current example.
> - **Doc**: A `Doc` object created from the example text.
> - **GoldParse**: A `GoldParse` object of the `Doc` and label.
> - **Example**: An `Example` object holding both predictions and gold-standard
> annotations.
> - **nlp**: The `nlp` object with the model.
> - **Optimizer**: A function that holds state between updates.
> - **Update**: Update the model's weights.
<!-- TODO: update graphic & related text -->
![The training loop](../images/training-loop.svg)
Of course, it's not enough to only show a model a single example once.
@ -427,32 +453,33 @@ dropout means that each feature or internal representation has a 1/4 likelihood
of being dropped.
> - [`begin_training`](/api/language#begin_training): Start the training and
> return an optimizer function to update the model's weights. Can take an
> optional function converting the training data to spaCy's training format.
> - [`update`](/api/language#update): Update the model with the training example
> and gold data.
> return an [`Optimizer`](https://thinc.ai/docs/api-optimizers) object to
> update the model's weights.
> - [`update`](/api/language#update): Update the model with the training
> examples.
> - [`to_disk`](/api/language#to_disk): Save the updated model to a directory.
```python
### Example training loop
optimizer = nlp.begin_training(get_data)
optimizer = nlp.begin_training()
for itn in range(100):
random.shuffle(train_data)
for raw_text, entity_offsets in train_data:
doc = nlp.make_doc(raw_text)
gold = GoldParse(doc, entities=entity_offsets)
nlp.update([doc], [gold], drop=0.5, sgd=optimizer)
example = Example.from_dict(doc, {"entities": entity_offsets})
nlp.update([example], sgd=optimizer)
nlp.to_disk("/model")
```
The [`nlp.update`](/api/language#update) method takes the following arguments:
| Name | Description |
| ------- | ------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------- |
| `docs` | [`Doc`](/api/doc) objects. The `update` method takes a sequence of them, so you can batch up your training examples. Alternatively, you can also pass in a sequence of raw texts. |
| `golds` | [`GoldParse`](/api/goldparse) objects. The `update` method takes a sequence of them, so you can batch up your training examples. Alternatively, you can also pass in a dictionary containing the annotations. |
| `drop` | Dropout rate. Makes it harder for the model to just memorize the data. |
| `sgd` | An optimizer, i.e. a callable to update the model's weights. If not set, spaCy will create a new one and save it for further use. |
| Name | Description |
| ---------- | ---------------------------------------------------------------------------------------------------------------------------------------------------------------------- |
| `examples` | [`Example`](/api/example) objects. The `update` method takes a sequence of them, so you can batch up your training examples. |
| `drop` | Dropout rate. Makes it harder for the model to just memorize the data. |
| `sgd`      | An [`Optimizer`](https://thinc.ai/docs/api-optimizers) object, which updates the model's weights. If not set, spaCy will create a new one and save it for further use. |
<!-- TODO: DocBin format ? -->
Instead of writing your own training loop, you can also use the built-in
[`train`](/api/cli#train) command, which expects data in spaCy's