Clarify how Example is used in distillation

2025-08-02 19:30:19 +03:00 · 2023-01-18 15:39:43 +01:00 · b61ee4c332
commit b61ee4c332
parent c2f83f2510
10 changed files with 81 additions and 79 deletions
--- a/spacy/language.py
+++ b/spacy/language.py
@@ -1033,7 +1033,9 @@ class Language:
    ):
        """Update the models in the pipeline.
        teacher (Language): Teacher to distill from.
-        examples (Iterable[Example]): A batch of examples
+        examples (Iterable[Example]): Distillation examples. The reference
+            (teacher) and predicted (student) docs must have the same number of
+            tokens and the same orthography.
        drop (float): The dropout rate.
        sgd (Optional[Optimizer]): An optimizer.
        losses (Optional(Dict[str, float])): Dictionary to update with the loss,
--- a/spacy/pipeline/trainable_pipe.pyx
+++ b/spacy/pipeline/trainable_pipe.pyx
@@ -71,8 +71,8 @@ cdef class TrainablePipe(Pipe):
        teacher_pipe (Optional[TrainablePipe]): The teacher pipe to learn
            from.
        examples (Iterable[Example]): Distillation examples. The reference
-            and predicted docs must have the same number of tokens and the
+            (teacher) and predicted (student) docs must have the same number of
-            same orthography.
+            tokens and the same orthography.
        drop (float): dropout rate.
        sgd (Optional[Optimizer]): An optimizer. Will be created via
            create_optimizer if not set.
--- a/website/docs/api/dependencyparser.mdx
+++ b/website/docs/api/dependencyparser.mdx
@@ -154,15 +154,15 @@ This feature is experimental.
 > losses = student.distill(teacher_pipe, examples, sgd=optimizer)
 > ```
-| Name           | Description                                                                                                                                 |
+| Name           | Description                                                                                                                                                                                 |
-| -------------- | ------------------------------------------------------------------------------------------------------------------------------------------- |
+| -------------- | ------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------- |
-| `teacher_pipe` | The teacher pipe to learn from. ~~Optional[TrainablePipe]~~                                                                                 |
+| `teacher_pipe` | The teacher pipe to learn from. ~~Optional[TrainablePipe]~~                                                                                                                                 |
-| `examples`     | Distillation examples. The reference and predicted docs must have the same number of tokens and the same orthography. ~~Iterable[Example]~~ |
+| `examples`     | A batch of [`Example`](/api/example) distillation examples. The reference (teacher) and predicted (student) docs must have the same number of tokens and orthography. ~~Iterable[Example]~~ |
-| _keyword-only_ |                                                                                                                                             |
+| _keyword-only_ |                                                                                                                                                                                             |
-| `drop`         | Dropout rate. ~~float~~                                                                                                                     |
+| `drop`         | Dropout rate. ~~float~~                                                                                                                                                                     |
-| `sgd`          | An optimizer. Will be created via [`create_optimizer`](#create_optimizer) if not set. ~~Optional[Optimizer]~~                               |
+| `sgd`          | An optimizer. Will be created via [`create_optimizer`](#create_optimizer) if not set. ~~Optional[Optimizer]~~                                                                               |
-| `losses`       | Optional record of the loss during distillation. Updated using the component name as the key. ~~Optional[Dict[str, float]]~~                |
+| `losses`       | Optional record of the loss during distillation. Updated using the component name as the key. ~~Optional[Dict[str, float]]~~                                                                |
-| **RETURNS**    | The updated `losses` dictionary. ~~Dict[str, float]~~                                                                                       |
+| **RETURNS**    | The updated `losses` dictionary. ~~Dict[str, float]~~                                                                                                                                       |
 ## DependencyParser.pipe {id="pipe",tag="method"}
--- a/website/docs/api/edittreelemmatizer.mdx
+++ b/website/docs/api/edittreelemmatizer.mdx
@@ -138,15 +138,15 @@ This feature is experimental.
 > losses = student.distill(teacher_pipe, examples, sgd=optimizer)
 > ```
-| Name           | Description                                                                                                                                 |
+| Name           | Description                                                                                                                                                                                 |
-| -------------- | ------------------------------------------------------------------------------------------------------------------------------------------- |
+| -------------- | ------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------- |
-| `teacher_pipe` | The teacher pipe to learn from. ~~Optional[TrainablePipe]~~                                                                                 |
+| `teacher_pipe` | The teacher pipe to learn from. ~~Optional[TrainablePipe]~~                                                                                                                                 |
-| `examples`     | Distillation examples. The reference and predicted docs must have the same number of tokens and the same orthography. ~~Iterable[Example]~~ |
+| `examples`     | A batch of [`Example`](/api/example) distillation examples. The reference (teacher) and predicted (student) docs must have the same number of tokens and orthography. ~~Iterable[Example]~~ |
-| _keyword-only_ |                                                                                                                                             |
+| _keyword-only_ |                                                                                                                                                                                             |
-| `drop`         | Dropout rate. ~~float~~                                                                                                                     |
+| `drop`         | Dropout rate. ~~float~~                                                                                                                                                                     |
-| `sgd`          | An optimizer. Will be created via [`create_optimizer`](#create_optimizer) if not set. ~~Optional[Optimizer]~~                               |
+| `sgd`          | An optimizer. Will be created via [`create_optimizer`](#create_optimizer) if not set. ~~Optional[Optimizer]~~                                                                               |
-| `losses`       | Optional record of the loss during distillation. Updated using the component name as the key. ~~Optional[Dict[str, float]]~~                |
+| `losses`       | Optional record of the loss during distillation. Updated using the component name as the key. ~~Optional[Dict[str, float]]~~                                                                |
-| **RETURNS**    | The updated `losses` dictionary. ~~Dict[str, float]~~                                                                                       |
+| **RETURNS**    | The updated `losses` dictionary. ~~Dict[str, float]~~                                                                                                                                       |
 ## EditTreeLemmatizer.pipe {id="pipe",tag="method"}
--- a/website/docs/api/entityrecognizer.mdx
+++ b/website/docs/api/entityrecognizer.mdx
@@ -150,15 +150,15 @@ This feature is experimental.
 > losses = student.distill(teacher_pipe, examples, sgd=optimizer)
 > ```
-| Name           | Description                                                                                                                                 |
+| Name           | Description                                                                                                                                                                                 |
-| -------------- | ------------------------------------------------------------------------------------------------------------------------------------------- |
+| -------------- | ------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------- |
-| `teacher_pipe` | The teacher pipe to learn from. ~~Optional[TrainablePipe]~~                                                                                 |
+| `teacher_pipe` | The teacher pipe to learn from. ~~Optional[TrainablePipe]~~                                                                                                                                 |
-| `examples`     | Distillation examples. The reference and predicted docs must have the same number of tokens and the same orthography. ~~Iterable[Example]~~ |
+| `examples`     | A batch of [`Example`](/api/example) distillation examples. The reference (teacher) and predicted (student) docs must have the same number of tokens and orthography. ~~Iterable[Example]~~ |
-| _keyword-only_ |                                                                                                                                             |
+| _keyword-only_ |                                                                                                                                                                                             |
-| `drop`         | Dropout rate. ~~float~~                                                                                                                     |
+| `drop`         | Dropout rate. ~~float~~                                                                                                                                                                     |
-| `sgd`          | An optimizer. Will be created via [`create_optimizer`](#create_optimizer) if not set. ~~Optional[Optimizer]~~                               |
+| `sgd`          | An optimizer. Will be created via [`create_optimizer`](#create_optimizer) if not set. ~~Optional[Optimizer]~~                                                                               |
-| `losses`       | Optional record of the loss during distillation. Updated using the component name as the key. ~~Optional[Dict[str, float]]~~                |
+| `losses`       | Optional record of the loss during distillation. Updated using the component name as the key. ~~Optional[Dict[str, float]]~~                                                                |
-| **RETURNS**    | The updated `losses` dictionary. ~~Dict[str, float]~~                                                                                       |
+| **RETURNS**    | The updated `losses` dictionary. ~~Dict[str, float]~~                                                                                                                                       |
 ## EntityRecognizer.pipe {id="pipe",tag="method"}
--- a/website/docs/api/language.mdx
+++ b/website/docs/api/language.mdx
@@ -347,19 +347,19 @@ Distill the models in a student pipeline from a teacher pipeline.
 > student.distill(teacher, examples, sgd=optimizer)
 > ```
-| Name            | Description                                                                                                                                    |
+| Name            | Description                                                                                                                                                                                 |
-| --------------- | ---------------------------------------------------------------------------------------------------------------------------------------------- |
+| --------------- | ------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------- |
-| `teacher`       | The teacher pipeline to distill from. ~~Language~~                                                                                             |
+| `teacher`       | The teacher pipeline to distill from. ~~Language~~                                                                                                                                          |
-| `examples`      | A batch of [`Example`](/api/example) objects to learn from. ~~Iterable[Example]~~                                                              |
+| `examples`      | A batch of [`Example`](/api/example) distillation examples. The reference (teacher) and predicted (student) docs must have the same number of tokens and orthography. ~~Iterable[Example]~~ |
-| _keyword-only_  |                                                                                                                                                |
+| _keyword-only_  |                                                                                                                                                                                             |
-| `drop`          | The dropout rate. ~~float~~                                                                                                                    |
+| `drop`          | The dropout rate. ~~float~~                                                                                                                                                                 |
-| `sgd`           | An optimizer. Will be created via [`create_optimizer`](#create_optimizer) if not set. ~~Optional[Optimizer]~~                                  |
+| `sgd`           | An optimizer. Will be created via [`create_optimizer`](#create_optimizer) if not set. ~~Optional[Optimizer]~~                                                                               |
-| `losses`        | Dictionary to update with the loss, keyed by pipeline component. ~~Optional[Dict[str, float]]~~                                                |
+| `losses`        | Dictionary to update with the loss, keyed by pipeline component. ~~Optional[Dict[str, float]]~~                                                                                             |
-| `component_cfg` | Optional dictionary of keyword arguments for components, keyed by component names. Defaults to `None`. ~~Optional[Dict[str, Dict[str, Any]]]~~ |
+| `component_cfg` | Optional dictionary of keyword arguments for components, keyed by component names. Defaults to `None`. ~~Optional[Dict[str, Dict[str, Any]]]~~                                              |
-| `exclude`       | Names of components that shouldn't be updated. Defaults to `[]`. ~~Iterable[str]~~                                                             |
+| `exclude`       | Names of components that shouldn't be updated. Defaults to `[]`. ~~Iterable[str]~~                                                                                                          |
-| `annotates`     | Names of components that should set annotations on the prediced examples after updating. Defaults to `[]`. ~~Iterable[str]~~                   |
+| `annotates`     | Names of components that should set annotations on the prediced examples after updating. Defaults to `[]`. ~~Iterable[str]~~                                                                |
-| `component_map` | Map student component names to teacher component names, only necessary when the names differ. Defaults to `None`. ~~Iterable[str]~~            |
+| `component_map` | Map student component names to teacher component names, only necessary when the names differ. Defaults to `None`. ~~Iterable[str]~~                                                         |
-| **RETURNS**     | The updated `losses` dictionary. ~~Dict[str, float]~~                                                                                          |
+| **RETURNS**     | The updated `losses` dictionary. ~~Dict[str, float]~~                                                                                                                                       |
 ## Language.rehearse {id="rehearse",tag="method,experimental",version="3"}
--- a/website/docs/api/morphologizer.mdx
+++ b/website/docs/api/morphologizer.mdx
@@ -144,15 +144,15 @@ This feature is experimental.
 > losses = student.distill(teacher_pipe, examples, sgd=optimizer)
 > ```
-| Name           | Description                                                                                                                                 |
+| Name           | Description                                                                                                                                                                                 |
-| -------------- | ------------------------------------------------------------------------------------------------------------------------------------------- |
+| -------------- | ------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------- |
-| `teacher_pipe` | The teacher pipe to learn from. ~~Optional[TrainablePipe]~~                                                                                 |
+| `teacher_pipe` | The teacher pipe to learn from. ~~Optional[TrainablePipe]~~                                                                                                                                 |
-| `examples`     | Distillation examples. The reference and predicted docs must have the same number of tokens and the same orthography. ~~Iterable[Example]~~ |
+| `examples`     | A batch of [`Example`](/api/example) distillation examples. The reference (teacher) and predicted (student) docs must have the same number of tokens and orthography. ~~Iterable[Example]~~ |
-| _keyword-only_ |                                                                                                                                             |
+| _keyword-only_ |                                                                                                                                                                                             |
-| `drop`         | Dropout rate. ~~float~~                                                                                                                     |
+| `drop`         | Dropout rate. ~~float~~                                                                                                                                                                     |
-| `sgd`          | An optimizer. Will be created via [`create_optimizer`](#create_optimizer) if not set. ~~Optional[Optimizer]~~                               |
+| `sgd`          | An optimizer. Will be created via [`create_optimizer`](#create_optimizer) if not set. ~~Optional[Optimizer]~~                                                                               |
-| `losses`       | Optional record of the loss during distillation. Updated using the component name as the key. ~~Optional[Dict[str, float]]~~                |
+| `losses`       | Optional record of the loss during distillation. Updated using the component name as the key. ~~Optional[Dict[str, float]]~~                                                                |
-| **RETURNS**    | The updated `losses` dictionary. ~~Dict[str, float]~~                                                                                       |
+| **RETURNS**    | The updated `losses` dictionary. ~~Dict[str, float]~~                                                                                                                                       |
 ## Morphologizer.pipe {id="pipe",tag="method"}
--- a/website/docs/api/pipe.mdx
+++ b/website/docs/api/pipe.mdx
@@ -257,15 +257,15 @@ This feature is experimental.
 > losses = student.distill(teacher_pipe, examples, sgd=optimizer)
 > ```
-| Name           | Description                                                                                                                                 |
+| Name           | Description                                                                                                                                                                                 |
-| -------------- | ------------------------------------------------------------------------------------------------------------------------------------------- |
+| -------------- | ------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------- |
-| `teacher_pipe` | The teacher pipe to learn from. ~~Optional[TrainablePipe]~~                                                                                 |
+| `teacher_pipe` | The teacher pipe to learn from. ~~Optional[TrainablePipe]~~                                                                                                                                 |
-| `examples`     | Distillation examples. The reference and predicted docs must have the same number of tokens and the same orthography. ~~Iterable[Example]~~ |
+| `examples`     | A batch of [`Example`](/api/example) distillation examples. The reference (teacher) and predicted (student) docs must have the same number of tokens and orthography. ~~Iterable[Example]~~ |
-| _keyword-only_ |                                                                                                                                             |
+| _keyword-only_ |                                                                                                                                                                                             |
-| `drop`         | Dropout rate. ~~float~~                                                                                                                     |
+| `drop`         | Dropout rate. ~~float~~                                                                                                                                                                     |
-| `sgd`          | An optimizer. Will be created via [`create_optimizer`](#create_optimizer) if not set. ~~Optional[Optimizer]~~                               |
+| `sgd`          | An optimizer. Will be created via [`create_optimizer`](#create_optimizer) if not set. ~~Optional[Optimizer]~~                                                                               |
-| `losses`       | Optional record of the loss during distillation. Updated using the component name as the key. ~~Optional[Dict[str, float]]~~                |
+| `losses`       | Optional record of the loss during distillation. Updated using the component name as the key. ~~Optional[Dict[str, float]]~~                                                                |
-| **RETURNS**    | The updated `losses` dictionary. ~~Dict[str, float]~~                                                                                       |
+| **RETURNS**    | The updated `losses` dictionary. ~~Dict[str, float]~~                                                                                                                                       |
 ## TrainablePipe.rehearse {id="rehearse",tag="method,experimental",version="3"}
--- a/website/docs/api/sentencerecognizer.mdx
+++ b/website/docs/api/sentencerecognizer.mdx
@@ -129,15 +129,15 @@ This feature is experimental.
 > losses = student.distill(teacher_pipe, examples, sgd=optimizer)
 > ```
-| Name           | Description                                                                                                                                 |
+| Name           | Description                                                                                                                                                                                 |
-| -------------- | ------------------------------------------------------------------------------------------------------------------------------------------- |
+| -------------- | ------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------- |
-| `teacher_pipe` | The teacher pipe to learn from. ~~Optional[TrainablePipe]~~                                                                                 |
+| `teacher_pipe` | The teacher pipe to learn from. ~~Optional[TrainablePipe]~~                                                                                                                                 |
-| `examples`     | Distillation examples. The reference and predicted docs must have the same number of tokens and the same orthography. ~~Iterable[Example]~~ |
+| `examples`     | A batch of [`Example`](/api/example) distillation examples. The reference (teacher) and predicted (student) docs must have the same number of tokens and orthography. ~~Iterable[Example]~~ |
-| _keyword-only_ |                                                                                                                                             |
+| _keyword-only_ |                                                                                                                                                                                             |
-| `drop`         | Dropout rate. ~~float~~                                                                                                                     |
+| `drop`         | Dropout rate. ~~float~~                                                                                                                                                                     |
-| `sgd`          | An optimizer. Will be created via [`create_optimizer`](#create_optimizer) if not set. ~~Optional[Optimizer]~~                               |
+| `sgd`          | An optimizer. Will be created via [`create_optimizer`](#create_optimizer) if not set. ~~Optional[Optimizer]~~                                                                               |
-| `losses`       | Optional record of the loss during distillation. Updated using the component name as the key. ~~Optional[Dict[str, float]]~~                |
+| `losses`       | Optional record of the loss during distillation. Updated using the component name as the key. ~~Optional[Dict[str, float]]~~                                                                |
-| **RETURNS**    | The updated `losses` dictionary. ~~Dict[str, float]~~                                                                                       |
+| **RETURNS**    | The updated `losses` dictionary. ~~Dict[str, float]~~                                                                                                                                       |
 ## SentenceRecognizer.pipe {id="pipe",tag="method"}
--- a/website/docs/api/tagger.mdx
+++ b/website/docs/api/tagger.mdx
@@ -128,15 +128,15 @@ This feature is experimental.
 > losses = student.distill(teacher_pipe, examples, sgd=optimizer)
 > ```
-| Name           | Description                                                                                                                                 |
+| Name           | Description                                                                                                                                                                                 |
-| -------------- | ------------------------------------------------------------------------------------------------------------------------------------------- |
+| -------------- | ------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------- |
-| `teacher_pipe` | The teacher pipe to learn from. ~~Optional[TrainablePipe]~~                                                                                 |
+| `teacher_pipe` | The teacher pipe to learn from. ~~Optional[TrainablePipe]~~                                                                                                                                 |
-| `examples`     | Distillation examples. The reference and predicted docs must have the same number of tokens and the same orthography. ~~Iterable[Example]~~ |
+| `examples`     | A batch of [`Example`](/api/example) distillation examples. The reference (teacher) and predicted (student) docs must have the same number of tokens and orthography. ~~Iterable[Example]~~ |
-| _keyword-only_ |                                                                                                                                             |
+| _keyword-only_ |                                                                                                                                                                                             |
-| `drop`         | Dropout rate. ~~float~~                                                                                                                     |
+| `drop`         | Dropout rate. ~~float~~                                                                                                                                                                     |
-| `sgd`          | An optimizer. Will be created via [`create_optimizer`](#create_optimizer) if not set. ~~Optional[Optimizer]~~                               |
+| `sgd`          | An optimizer. Will be created via [`create_optimizer`](#create_optimizer) if not set. ~~Optional[Optimizer]~~                                                                               |
-| `losses`       | Optional record of the loss during distillation. Updated using the component name as the key. ~~Optional[Dict[str, float]]~~                |
+| `losses`       | Optional record of the loss during distillation. Updated using the component name as the key. ~~Optional[Dict[str, float]]~~                                                                |
-| **RETURNS**    | The updated `losses` dictionary. ~~Dict[str, float]~~                                                                                       |
+| **RETURNS**    | The updated `losses` dictionary. ~~Dict[str, float]~~                                                                                                                                       |
 ## Tagger.pipe {id="pipe",tag="method"}
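
The constraint this commit documents (the reference/teacher doc and the predicted/student doc must have the same number of tokens and the same orthography) can be illustrated with a small check. This is only a sketch, not part of the commit and not how spaCy validates examples internally: the helper name `same_tokens_and_orth` is hypothetical, and it works on plain lists of token texts so that it runs without spaCy installed.

```python
from typing import Sequence


def same_tokens_and_orth(reference: Sequence[str], predicted: Sequence[str]) -> bool:
    """Return True if both token sequences have the same length and the
    same token text at every position (hypothetical helper)."""
    if len(reference) != len(predicted):
        return False
    return all(ref == pred for ref, pred in zip(reference, predicted))


# Assumed usage with spaCy objects (not executed here): collect the token
# texts from both sides of an Example and compare them.
#   ref_orths = [token.orth_ for token in example.reference]
#   pred_orths = [token.orth_ for token in example.predicted]
#   same_tokens_and_orth(ref_orths, pred_orths)

teacher_tokens = ["I", "like", "New", "York"]
student_tokens = ["I", "like", "New", "York"]
merged_tokens = ["I", "like", "New York"]  # different tokenization

print(same_tokens_and_orth(teacher_tokens, student_tokens))  # True
print(same_tokens_and_orth(teacher_tokens, merged_tokens))   # False
```

A check like this could be run over a batch before calling `distill`, to catch examples whose teacher and student tokenization differ.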