Add distill documentation for all pipes that support distillation
This commit is contained in: parent 44498e651a · commit 2774667f77
@@ -131,6 +131,39 @@ and all pipeline components are applied to the `Doc` in order. Both

| `doc`       | The document to process. ~~Doc~~ |
| **RETURNS** | The processed document. ~~Doc~~  |

## DependencyParser.distill {id="distill", tag="method,experimental", version="4"}

Train a pipe (the student) on the predictions of another pipe (the teacher). The
student is typically trained on the probability distribution of the teacher, but
details may differ per pipe. The goal of distillation is to transfer knowledge
from the teacher to the student.

The distillation is performed on ~~Example~~ objects. The `Example.reference`
and `Example.predicted` ~~Doc~~s must have the same number of tokens and the
same orthography. Even though the reference does not need to have gold
annotations, the teacher can add its own annotations when necessary.

This feature is experimental.

> #### Example
>
> ```python
> teacher_pipe = teacher.add_pipe("parser")
> student_pipe = student.add_pipe("parser")
> optimizer = student.resume_training()
> losses = student_pipe.distill(teacher_pipe, examples, sgd=optimizer)
> ```

| Name           | Description                                                                                                                                  |
| -------------- | -------------------------------------------------------------------------------------------------------------------------------------------- |
| `teacher_pipe` | The teacher pipe to learn from. ~~Optional[TrainablePipe]~~                                                                                  |
| `examples`     | Distillation examples. The reference and predicted docs must have the same number of tokens and the same orthography. ~~Iterable[Example]~~ |
| _keyword-only_ |                                                                                                                                              |
| `drop`         | Dropout rate. ~~float~~                                                                                                                      |
| `sgd`          | An optimizer. Will be created via [`create_optimizer`](#create_optimizer) if not set. ~~Optional[Optimizer]~~                                |
| `losses`       | Optional record of the loss during distillation. Updated using the component name as the key. ~~Optional[Dict[str, float]]~~                |
| **RETURNS**    | The updated `losses` dictionary. ~~Dict[str, float]~~                                                                                        |
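Since the reference docs need no gold annotations, distillation examples can be built from raw text alone, letting the teacher provide the references. A minimal sketch of this, assuming a pretrained `teacher` pipeline and a blank `student`; the model name and texts are illustrative:

```python
import spacy
from spacy.tokens import Doc
from spacy.training import Example

teacher = spacy.load("en_core_web_sm")  # assumed teacher pipeline
student = spacy.blank("en")
student_pipe = student.add_pipe("parser")

texts = ["Distillation transfers knowledge from teacher to student."]

# Reference and predicted docs must have the same tokens and orthography,
# so the predicted side reuses the teacher's tokenization.
examples = []
for doc in teacher.pipe(texts):
    predicted = Doc(student.vocab, words=[t.text for t in doc])
    examples.append(Example(predicted, doc))
```

The resulting `examples` can then be passed to `distill` as in the example above.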
## DependencyParser.pipe {id="pipe",tag="method"}

Apply the pipe to a stream of documents. This usually happens under the hood
@@ -115,6 +115,39 @@ and all pipeline components are applied to the `Doc` in order. Both

| `doc`       | The document to process. ~~Doc~~ |
| **RETURNS** | The processed document. ~~Doc~~  |

## EditTreeLemmatizer.distill {id="distill", tag="method,experimental", version="4"}

Train a pipe (the student) on the predictions of another pipe (the teacher). The
student is typically trained on the probability distribution of the teacher, but
details may differ per pipe. The goal of distillation is to transfer knowledge
from the teacher to the student.

The distillation is performed on ~~Example~~ objects. The `Example.reference`
and `Example.predicted` ~~Doc~~s must have the same number of tokens and the
same orthography. Even though the reference does not need to have gold
annotations, the teacher can add its own annotations when necessary.

This feature is experimental.

> #### Example
>
> ```python
> teacher_pipe = teacher.add_pipe("trainable_lemmatizer")
> student_pipe = student.add_pipe("trainable_lemmatizer")
> optimizer = student.resume_training()
> losses = student_pipe.distill(teacher_pipe, examples, sgd=optimizer)
> ```

| Name           | Description                                                                                                                                  |
| -------------- | -------------------------------------------------------------------------------------------------------------------------------------------- |
| `teacher_pipe` | The teacher pipe to learn from. ~~Optional[TrainablePipe]~~                                                                                  |
| `examples`     | Distillation examples. The reference and predicted docs must have the same number of tokens and the same orthography. ~~Iterable[Example]~~ |
| _keyword-only_ |                                                                                                                                              |
| `drop`         | Dropout rate. ~~float~~                                                                                                                      |
| `sgd`          | An optimizer. Will be created via [`create_optimizer`](#create_optimizer) if not set. ~~Optional[Optimizer]~~                                |
| `losses`       | Optional record of the loss during distillation. Updated using the component name as the key. ~~Optional[Dict[str, float]]~~                |
| **RETURNS**    | The updated `losses` dictionary. ~~Dict[str, float]~~                                                                                        |
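In a full run, `distill` is called once per batch, and the `losses` dict accumulates the loss under the component name. A sketch of such a loop, reusing the `student`, `student_pipe`, `teacher_pipe` and `examples` names from the earlier sketch; all names are illustrative:

```python
import random
from spacy.util import minibatch

optimizer = student.resume_training()
for epoch in range(10):
    losses = {}
    random.shuffle(examples)
    for batch in minibatch(examples, size=32):
        # `losses` is updated in place; the key is the component name.
        student_pipe.distill(teacher_pipe, batch, drop=0.1, sgd=optimizer, losses=losses)
    print(epoch, losses.get("trainable_lemmatizer"))
```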
## EditTreeLemmatizer.pipe {id="pipe",tag="method"}

Apply the pipe to a stream of documents. This usually happens under the hood
@@ -127,6 +127,39 @@ and all pipeline components are applied to the `Doc` in order. Both

| `doc`       | The document to process. ~~Doc~~ |
| **RETURNS** | The processed document. ~~Doc~~  |

## EntityRecognizer.distill {id="distill", tag="method,experimental", version="4"}

Train a pipe (the student) on the predictions of another pipe (the teacher). The
student is typically trained on the probability distribution of the teacher, but
details may differ per pipe. The goal of distillation is to transfer knowledge
from the teacher to the student.

The distillation is performed on ~~Example~~ objects. The `Example.reference`
and `Example.predicted` ~~Doc~~s must have the same number of tokens and the
same orthography. Even though the reference does not need to have gold
annotations, the teacher can add its own annotations when necessary.

This feature is experimental.

> #### Example
>
> ```python
> teacher_pipe = teacher.add_pipe("ner")
> student_pipe = student.add_pipe("ner")
> optimizer = student.resume_training()
> losses = student_pipe.distill(teacher_pipe, examples, sgd=optimizer)
> ```

| Name           | Description                                                                                                                                  |
| -------------- | -------------------------------------------------------------------------------------------------------------------------------------------- |
| `teacher_pipe` | The teacher pipe to learn from. ~~Optional[TrainablePipe]~~                                                                                  |
| `examples`     | Distillation examples. The reference and predicted docs must have the same number of tokens and the same orthography. ~~Iterable[Example]~~ |
| _keyword-only_ |                                                                                                                                              |
| `drop`         | Dropout rate. ~~float~~                                                                                                                      |
| `sgd`          | An optimizer. Will be created via [`create_optimizer`](#create_optimizer) if not set. ~~Optional[Optimizer]~~                                |
| `losses`       | Optional record of the loss during distillation. Updated using the component name as the key. ~~Optional[Dict[str, float]]~~                |
| **RETURNS**    | The updated `losses` dictionary. ~~Dict[str, float]~~                                                                                        |
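Because `distill` requires token-for-token alignment between each example's reference and predicted docs, a quick sanity check before distilling can save confusing errors. This helper is purely illustrative, not part of the spaCy API:

```python
def check_alignment(examples):
    """Verify the documented same-tokens, same-orthography constraint."""
    for eg in examples:
        ref_words = [t.text for t in eg.reference]
        pred_words = [t.text for t in eg.predicted]
        assert len(ref_words) == len(pred_words), "token counts differ"
        assert ref_words == pred_words, "orthography differs"
```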
## EntityRecognizer.pipe {id="pipe",tag="method"}

Apply the pipe to a stream of documents. This usually happens under the hood
@@ -264,6 +297,27 @@ predicted scores.

| `scores`    | Scores representing the model's predictions. ~~StateClass~~                 |
| **RETURNS** | The loss and the gradient, i.e. `(loss, gradient)`. ~~Tuple[float, float]~~ |

## EntityRecognizer.get_teacher_student_loss {id="get_teacher_student_loss", tag="method", version="4"}

Calculate the loss and its gradient for the batch of student scores relative to
the teacher scores.

> #### Example
>
> ```python
> teacher_ner = teacher.get_pipe("ner")
> student_ner = student.add_pipe("ner")
> student_scores = student_ner.predict([eg.predicted for eg in examples])
> teacher_scores = teacher_ner.predict([eg.predicted for eg in examples])
> loss, d_loss = student_ner.get_teacher_student_loss(teacher_scores, student_scores)
> ```

| Name             | Description                                                                 |
| ---------------- | ---------------------------------------------------------------------------- |
| `teacher_scores` | Scores representing the teacher model's predictions.                        |
| `student_scores` | Scores representing the student model's predictions.                        |
| **RETURNS**      | The loss and the gradient, i.e. `(loss, gradient)`. ~~Tuple[float, float]~~ |
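Since the method only consumes score batches, it can also be used read-only, for instance to track how closely the student matches the teacher on held-out data without updating any weights. A sketch, assuming `teacher_ner`, `student_ner` and `dev_examples` set up along the lines of the example above; all names are illustrative:

```python
# Measure teacher-student divergence on held-out examples; no update happens.
dev_docs = [eg.predicted for eg in dev_examples]
teacher_scores = teacher_ner.predict(dev_docs)
student_scores = student_ner.predict(dev_docs)
loss, _ = student_ner.get_teacher_student_loss(teacher_scores, student_scores)
print(f"dev distillation loss: {loss:.4f}")
```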
## EntityRecognizer.create_optimizer {id="create_optimizer",tag="method"}

Create an optimizer for the pipeline component.
@@ -121,6 +121,39 @@ delegate to the [`predict`](/api/morphologizer#predict) and

| `doc`       | The document to process. ~~Doc~~ |
| **RETURNS** | The processed document. ~~Doc~~  |

## Morphologizer.distill {id="distill", tag="method,experimental", version="4"}

Train a pipe (the student) on the predictions of another pipe (the teacher). The
student is typically trained on the probability distribution of the teacher, but
details may differ per pipe. The goal of distillation is to transfer knowledge
from the teacher to the student.

The distillation is performed on ~~Example~~ objects. The `Example.reference`
and `Example.predicted` ~~Doc~~s must have the same number of tokens and the
same orthography. Even though the reference does not need to have gold
annotations, the teacher can add its own annotations when necessary.

This feature is experimental.

> #### Example
>
> ```python
> teacher_pipe = teacher.add_pipe("morphologizer")
> student_pipe = student.add_pipe("morphologizer")
> optimizer = student.resume_training()
> losses = student_pipe.distill(teacher_pipe, examples, sgd=optimizer)
> ```

| Name           | Description                                                                                                                                  |
| -------------- | -------------------------------------------------------------------------------------------------------------------------------------------- |
| `teacher_pipe` | The teacher pipe to learn from. ~~Optional[TrainablePipe]~~                                                                                  |
| `examples`     | Distillation examples. The reference and predicted docs must have the same number of tokens and the same orthography. ~~Iterable[Example]~~ |
| _keyword-only_ |                                                                                                                                              |
| `drop`         | Dropout rate. ~~float~~                                                                                                                      |
| `sgd`          | An optimizer. Will be created via [`create_optimizer`](#create_optimizer) if not set. ~~Optional[Optimizer]~~                                |
| `losses`       | Optional record of the loss during distillation. Updated using the component name as the key. ~~Optional[Dict[str, float]]~~                |
| **RETURNS**    | The updated `losses` dictionary. ~~Dict[str, float]~~                                                                                        |
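The `losses` dict is keyed by component name, so a single dict can track several student pipes distilled in the same loop. A sketch with hypothetical morphologizer and tagger student/teacher pipes; the setup mirrors the earlier sketches:

```python
from spacy.util import minibatch

# Assumes student/teacher pipes, an optimizer and distillation examples set
# up as in the earlier sketches; all names are illustrative.
losses = {}
for batch in minibatch(examples, size=32):
    student_morph.distill(teacher_morph, batch, sgd=optimizer, losses=losses)
    student_tagger.distill(teacher_tagger, batch, sgd=optimizer, losses=losses)
print(losses)  # e.g. {"morphologizer": 1.23, "tagger": 0.87}
```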
## Morphologizer.pipe {id="pipe",tag="method"}

Apply the pipe to a stream of documents. This usually happens under the hood
@@ -106,6 +106,39 @@ and all pipeline components are applied to the `Doc` in order. Both

| `doc`       | The document to process. ~~Doc~~ |
| **RETURNS** | The processed document. ~~Doc~~  |

## SentenceRecognizer.distill {id="distill", tag="method,experimental", version="4"}

Train a pipe (the student) on the predictions of another pipe (the teacher). The
student is typically trained on the probability distribution of the teacher, but
details may differ per pipe. The goal of distillation is to transfer knowledge
from the teacher to the student.

The distillation is performed on ~~Example~~ objects. The `Example.reference`
and `Example.predicted` ~~Doc~~s must have the same number of tokens and the
same orthography. Even though the reference does not need to have gold
annotations, the teacher can add its own annotations when necessary.

This feature is experimental.

> #### Example
>
> ```python
> teacher_pipe = teacher.add_pipe("senter")
> student_pipe = student.add_pipe("senter")
> optimizer = student.resume_training()
> losses = student_pipe.distill(teacher_pipe, examples, sgd=optimizer)
> ```

| Name           | Description                                                                                                                                  |
| -------------- | -------------------------------------------------------------------------------------------------------------------------------------------- |
| `teacher_pipe` | The teacher pipe to learn from. ~~Optional[TrainablePipe]~~                                                                                  |
| `examples`     | Distillation examples. The reference and predicted docs must have the same number of tokens and the same orthography. ~~Iterable[Example]~~ |
| _keyword-only_ |                                                                                                                                              |
| `drop`         | Dropout rate. ~~float~~                                                                                                                      |
| `sgd`          | An optimizer. Will be created via [`create_optimizer`](#create_optimizer) if not set. ~~Optional[Optimizer]~~                                |
| `losses`       | Optional record of the loss during distillation. Updated using the component name as the key. ~~Optional[Dict[str, float]]~~                |
| **RETURNS**    | The updated `losses` dictionary. ~~Dict[str, float]~~                                                                                        |
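After distillation, the student can be scored against the teacher's annotations with `Language.evaluate`, since it accepts the same `Example` objects. A sketch, with `dev_examples` built the same way as the training examples; the names are illustrative:

```python
# The references come from the teacher, so the scores measure how closely
# the student agrees with it, not accuracy against human gold labels.
scores = student.evaluate(dev_examples)
print(scores.get("sents_f"))  # sentence segmentation F-score from the senter
```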
## SentenceRecognizer.pipe {id="pipe",tag="method"}

Apply the pipe to a stream of documents. This usually happens under the hood
@@ -105,6 +105,39 @@ and all pipeline components are applied to the `Doc` in order. Both

| `doc`       | The document to process. ~~Doc~~ |
| **RETURNS** | The processed document. ~~Doc~~  |

## Tagger.distill {id="distill", tag="method,experimental", version="4"}

Train a pipe (the student) on the predictions of another pipe (the teacher). The
student is typically trained on the probability distribution of the teacher, but
details may differ per pipe. The goal of distillation is to transfer knowledge
from the teacher to the student.

The distillation is performed on ~~Example~~ objects. The `Example.reference`
and `Example.predicted` ~~Doc~~s must have the same number of tokens and the
same orthography. Even though the reference does not need to have gold
annotations, the teacher can add its own annotations when necessary.

This feature is experimental.

> #### Example
>
> ```python
> teacher_pipe = teacher.add_pipe("tagger")
> student_pipe = student.add_pipe("tagger")
> optimizer = student.resume_training()
> losses = student_pipe.distill(teacher_pipe, examples, sgd=optimizer)
> ```

| Name           | Description                                                                                                                                  |
| -------------- | -------------------------------------------------------------------------------------------------------------------------------------------- |
| `teacher_pipe` | The teacher pipe to learn from. ~~Optional[TrainablePipe]~~                                                                                  |
| `examples`     | Distillation examples. The reference and predicted docs must have the same number of tokens and the same orthography. ~~Iterable[Example]~~ |
| _keyword-only_ |                                                                                                                                              |
| `drop`         | Dropout rate. ~~float~~                                                                                                                      |
| `sgd`          | An optimizer. Will be created via [`create_optimizer`](#create_optimizer) if not set. ~~Optional[Optimizer]~~                                |
| `losses`       | Optional record of the loss during distillation. Updated using the component name as the key. ~~Optional[Dict[str, float]]~~                |
| **RETURNS**    | The updated `losses` dictionary. ~~Dict[str, float]~~                                                                                        |
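Before the first `distill` call the student pipe has to be initialized, and its label set can be copied from the teacher so that both models have matching output dimensions. A sketch, assuming teacher/student pipelines and `examples` as in the earlier sketches:

```python
# Copy the teacher's label set into the freshly added student tagger.
student_tagger = student.add_pipe("tagger")
student_tagger.initialize(
    lambda: examples,
    nlp=student,
    labels=teacher_tagger.label_data,
)
```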
## Tagger.pipe {id="pipe",tag="method"}

Apply the pipe to a stream of documents. This usually happens under the hood