From 2774667f7729dfd53a48790c7605f5eda0da9329 Mon Sep 17 00:00:00 2001
From: =?UTF-8?q?Danie=CC=88l=20de=20Kok?=
Date: Fri, 13 Jan 2023 14:36:37 +0100
Subject: [PATCH] Add distill documentation for all pipes that support
 distillation

---
 website/docs/api/dependencyparser.mdx   | 33 +++++++++++++++
 website/docs/api/edittreelemmatizer.mdx | 33 +++++++++++++++
 website/docs/api/entityrecognizer.mdx   | 54 +++++++++++++++++++++++++
 website/docs/api/morphologizer.mdx      | 33 +++++++++++++++
 website/docs/api/sentencerecognizer.mdx | 33 +++++++++++++++
 website/docs/api/tagger.mdx             | 33 +++++++++++++++
 6 files changed, 219 insertions(+)

diff --git a/website/docs/api/dependencyparser.mdx b/website/docs/api/dependencyparser.mdx
index 523308cc2..5179ce48b 100644
--- a/website/docs/api/dependencyparser.mdx
+++ b/website/docs/api/dependencyparser.mdx
@@ -131,6 +131,39 @@ and all pipeline components are applied to the `Doc` in order. Both
 | `doc`       | The document to process. ~~Doc~~ |
 | **RETURNS** | The processed document. ~~Doc~~  |
 
+## DependencyParser.distill {id="distill", tag="method,experimental", version="4"}
+
+Train a pipe (the student) on the predictions of another pipe (the teacher).
+The student is typically trained on the probability distribution of the
+teacher, but details may differ per pipe. The goal of distillation is to
+transfer knowledge from the teacher to the student.
+
+The distillation is performed on ~~Example~~ objects. The `Example.reference`
+and `Example.predicted` ~~Doc~~s must have the same number of tokens and the
+same orthography. Even though the reference does not need to have gold
+annotations, the teacher can add its own annotations when necessary.
+
+This feature is experimental.
+
+> #### Example
+>
+> ```python
+> teacher_pipe = teacher.add_pipe("parser")
+> student_pipe = student.add_pipe("parser")
+> optimizer = nlp.resume_training()
+> losses = student_pipe.distill(teacher_pipe, examples, sgd=optimizer)
+> ```
+
+| Name           | Description                                                                                                                                  |
+| -------------- | -------------------------------------------------------------------------------------------------------------------------------------------- |
+| `teacher_pipe` | The teacher pipe to learn from. ~~Optional[TrainablePipe]~~                                                                                  |
+| `examples`     | Distillation examples. The reference and predicted docs must have the same number of tokens and the same orthography. ~~Iterable[Example]~~ |
+| _keyword-only_ |                                                                                                                                              |
+| `drop`         | Dropout rate. ~~float~~                                                                                                                      |
+| `sgd`          | An optimizer. Will be created via [`create_optimizer`](#create_optimizer) if not set. ~~Optional[Optimizer]~~                                |
+| `losses`       | Optional record of the loss during distillation. Updated using the component name as the key. ~~Optional[Dict[str, float]]~~                |
+| **RETURNS**    | The updated `losses` dictionary. ~~Dict[str, float]~~                                                                                        |
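+
+The sketch below shows one way `distill` could be used in a simple
+distillation loop. It is only illustrative: it assumes that `teacher` and
+`student` are already loaded pipelines that both contain a parser component,
+that `examples` is a list of distillation examples, and the batch size of 32
+is arbitrary.
+
+```python
+from spacy.util import minibatch
+
+teacher_pipe = teacher.get_pipe("parser")
+student_pipe = student.get_pipe("parser")
+optimizer = student.resume_training()
+losses = {}
+for batch in minibatch(examples, size=32):
+    # Update the student on the teacher's predictions for this batch.
+    student_pipe.distill(teacher_pipe, batch, sgd=optimizer, losses=losses)
+print("parser loss:", losses["parser"])
+```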
+
 ## DependencyParser.pipe {id="pipe",tag="method"}
 
 Apply the pipe to a stream of documents. This usually happens under the hood
diff --git a/website/docs/api/edittreelemmatizer.mdx b/website/docs/api/edittreelemmatizer.mdx
index b13c3e52e..2e0993657 100644
--- a/website/docs/api/edittreelemmatizer.mdx
+++ b/website/docs/api/edittreelemmatizer.mdx
@@ -115,6 +115,39 @@ and all pipeline components are applied to the `Doc` in order. Both
 | `doc`       | The document to process. ~~Doc~~ |
 | **RETURNS** | The processed document. ~~Doc~~  |
 
+## EditTreeLemmatizer.distill {id="distill", tag="method,experimental", version="4"}
+
+Train a pipe (the student) on the predictions of another pipe (the teacher).
+The student is typically trained on the probability distribution of the
+teacher, but details may differ per pipe. The goal of distillation is to
+transfer knowledge from the teacher to the student.
+
+The distillation is performed on ~~Example~~ objects. The `Example.reference`
+and `Example.predicted` ~~Doc~~s must have the same number of tokens and the
+same orthography. Even though the reference does not need to have gold
+annotations, the teacher can add its own annotations when necessary.
+
+This feature is experimental.
+
+> #### Example
+>
+> ```python
+> teacher_pipe = teacher.add_pipe("trainable_lemmatizer")
+> student_pipe = student.add_pipe("trainable_lemmatizer")
+> optimizer = nlp.resume_training()
+> losses = student_pipe.distill(teacher_pipe, examples, sgd=optimizer)
+> ```
+
+| Name           | Description                                                                                                                                  |
+| -------------- | -------------------------------------------------------------------------------------------------------------------------------------------- |
+| `teacher_pipe` | The teacher pipe to learn from. ~~Optional[TrainablePipe]~~                                                                                  |
+| `examples`     | Distillation examples. The reference and predicted docs must have the same number of tokens and the same orthography. ~~Iterable[Example]~~ |
+| _keyword-only_ |                                                                                                                                              |
+| `drop`         | Dropout rate. ~~float~~                                                                                                                      |
+| `sgd`          | An optimizer. Will be created via [`create_optimizer`](#create_optimizer) if not set. ~~Optional[Optimizer]~~                                |
+| `losses`       | Optional record of the loss during distillation. Updated using the component name as the key. ~~Optional[Dict[str, float]]~~                |
+| **RETURNS**    | The updated `losses` dictionary. ~~Dict[str, float]~~                                                                                        |
+
 ## EditTreeLemmatizer.pipe {id="pipe",tag="method"}
 
 Apply the pipe to a stream of documents. This usually happens under the hood
diff --git a/website/docs/api/entityrecognizer.mdx b/website/docs/api/entityrecognizer.mdx
index 1f386bbb6..005d5d11d 100644
--- a/website/docs/api/entityrecognizer.mdx
+++ b/website/docs/api/entityrecognizer.mdx
@@ -127,6 +127,39 @@ and all pipeline components are applied to the `Doc` in order. Both
 | `doc`       | The document to process. ~~Doc~~ |
 | **RETURNS** | The processed document. ~~Doc~~  |
 
+## EntityRecognizer.distill {id="distill", tag="method,experimental", version="4"}
+
+Train a pipe (the student) on the predictions of another pipe (the teacher).
+The student is typically trained on the probability distribution of the
+teacher, but details may differ per pipe. The goal of distillation is to
+transfer knowledge from the teacher to the student.
+
+The distillation is performed on ~~Example~~ objects. The `Example.reference`
+and `Example.predicted` ~~Doc~~s must have the same number of tokens and the
+same orthography. Even though the reference does not need to have gold
+annotations, the teacher can add its own annotations when necessary.
+
+This feature is experimental.
+
+> #### Example
+>
+> ```python
+> teacher_pipe = teacher.add_pipe("ner")
+> student_pipe = student.add_pipe("ner")
+> optimizer = nlp.resume_training()
+> losses = student_pipe.distill(teacher_pipe, examples, sgd=optimizer)
+> ```
+
+| Name           | Description                                                                                                                                  |
+| -------------- | -------------------------------------------------------------------------------------------------------------------------------------------- |
+| `teacher_pipe` | The teacher pipe to learn from. ~~Optional[TrainablePipe]~~                                                                                  |
+| `examples`     | Distillation examples. The reference and predicted docs must have the same number of tokens and the same orthography. ~~Iterable[Example]~~ |
+| _keyword-only_ |                                                                                                                                              |
+| `drop`         | Dropout rate. ~~float~~                                                                                                                      |
+| `sgd`          | An optimizer. Will be created via [`create_optimizer`](#create_optimizer) if not set. ~~Optional[Optimizer]~~                                |
+| `losses`       | Optional record of the loss during distillation. Updated using the component name as the key. ~~Optional[Dict[str, float]]~~                |
+| **RETURNS**    | The updated `losses` dictionary. ~~Dict[str, float]~~                                                                                        |
+
 ## EntityRecognizer.pipe {id="pipe",tag="method"}
 
 Apply the pipe to a stream of documents. This usually happens under the hood
@@ -264,6 +297,27 @@ predicted scores.
 | `scores`    | Scores representing the model's predictions. ~~StateClass~~                 |
 | **RETURNS** | The loss and the gradient, i.e. `(loss, gradient)`. ~~Tuple[float, float]~~ |
 
+## EntityRecognizer.get_teacher_student_loss {id="get_teacher_student_loss", tag="method", version="4"}
+
+Calculate the loss and its gradient for the batch of student scores relative to
+the teacher scores.
+
+> #### Example
+>
+> ```python
+> teacher_ner = teacher.get_pipe("ner")
+> student_ner = student.add_pipe("ner")
+> student_scores = student_ner.predict([eg.predicted for eg in examples])
+> teacher_scores = teacher_ner.predict([eg.predicted for eg in examples])
+> loss, d_loss = student_ner.get_teacher_student_loss(teacher_scores, student_scores)
+> ```
+
+| Name             | Description                                                                  |
+| ---------------- | ---------------------------------------------------------------------------- |
+| `teacher_scores` | Scores representing the teacher model's predictions.                         |
+| `student_scores` | Scores representing the student model's predictions.                         |
+| **RETURNS**      | The loss and the gradient, i.e. `(loss, gradient)`. ~~Tuple[float, float]~~  |
+
 ## EntityRecognizer.create_optimizer {id="create_optimizer",tag="method"}
 
 Create an optimizer for the pipeline component.
diff --git a/website/docs/api/morphologizer.mdx b/website/docs/api/morphologizer.mdx
index 2fbc67591..4f79458d3 100644
--- a/website/docs/api/morphologizer.mdx
+++ b/website/docs/api/morphologizer.mdx
@@ -121,6 +121,39 @@ delegate to the [`predict`](/api/morphologizer#predict) and
 | `doc`       | The document to process. ~~Doc~~ |
 | **RETURNS** | The processed document. ~~Doc~~  |
 
+## Morphologizer.distill {id="distill", tag="method,experimental", version="4"}
+
+Train a pipe (the student) on the predictions of another pipe (the teacher).
+The student is typically trained on the probability distribution of the
+teacher, but details may differ per pipe. The goal of distillation is to
+transfer knowledge from the teacher to the student.
+
+The distillation is performed on ~~Example~~ objects. The `Example.reference`
+and `Example.predicted` ~~Doc~~s must have the same number of tokens and the
+same orthography. Even though the reference does not need to have gold
+annotations, the teacher can add its own annotations when necessary.
+
+This feature is experimental.
+
+> #### Example
+>
+> ```python
+> teacher_pipe = teacher.add_pipe("morphologizer")
+> student_pipe = student.add_pipe("morphologizer")
+> optimizer = nlp.resume_training()
+> losses = student_pipe.distill(teacher_pipe, examples, sgd=optimizer)
+> ```
+
+| Name           | Description                                                                                                                                  |
+| -------------- | -------------------------------------------------------------------------------------------------------------------------------------------- |
+| `teacher_pipe` | The teacher pipe to learn from. ~~Optional[TrainablePipe]~~                                                                                  |
+| `examples`     | Distillation examples. The reference and predicted docs must have the same number of tokens and the same orthography. ~~Iterable[Example]~~ |
+| _keyword-only_ |                                                                                                                                              |
+| `drop`         | Dropout rate. ~~float~~                                                                                                                      |
+| `sgd`          | An optimizer. Will be created via [`create_optimizer`](#create_optimizer) if not set. ~~Optional[Optimizer]~~                                |
+| `losses`       | Optional record of the loss during distillation. Updated using the component name as the key. ~~Optional[Dict[str, float]]~~                |
+| **RETURNS**    | The updated `losses` dictionary. ~~Dict[str, float]~~                                                                                        |
+
 ## Morphologizer.pipe {id="pipe",tag="method"}
 
 Apply the pipe to a stream of documents. This usually happens under the hood
diff --git a/website/docs/api/sentencerecognizer.mdx b/website/docs/api/sentencerecognizer.mdx
index 151192eac..02fd57102 100644
--- a/website/docs/api/sentencerecognizer.mdx
+++ b/website/docs/api/sentencerecognizer.mdx
@@ -106,6 +106,39 @@ and all pipeline components are applied to the `Doc` in order. Both
 | `doc`       | The document to process. ~~Doc~~ |
 | **RETURNS** | The processed document. ~~Doc~~  |
 
+## SentenceRecognizer.distill {id="distill", tag="method,experimental", version="4"}
+
+Train a pipe (the student) on the predictions of another pipe (the teacher).
+The student is typically trained on the probability distribution of the
+teacher, but details may differ per pipe. The goal of distillation is to
+transfer knowledge from the teacher to the student.
+
+The distillation is performed on ~~Example~~ objects. The `Example.reference`
+and `Example.predicted` ~~Doc~~s must have the same number of tokens and the
+same orthography. Even though the reference does not need to have gold
+annotations, the teacher can add its own annotations when necessary.
+
+This feature is experimental.
+
+> #### Example
+>
+> ```python
+> teacher_pipe = teacher.add_pipe("senter")
+> student_pipe = student.add_pipe("senter")
+> optimizer = nlp.resume_training()
+> losses = student_pipe.distill(teacher_pipe, examples, sgd=optimizer)
+> ```
+
+| Name           | Description                                                                                                                                  |
+| -------------- | -------------------------------------------------------------------------------------------------------------------------------------------- |
+| `teacher_pipe` | The teacher pipe to learn from. ~~Optional[TrainablePipe]~~                                                                                  |
+| `examples`     | Distillation examples. The reference and predicted docs must have the same number of tokens and the same orthography. ~~Iterable[Example]~~ |
+| _keyword-only_ |                                                                                                                                              |
+| `drop`         | Dropout rate. ~~float~~                                                                                                                      |
+| `sgd`          | An optimizer. Will be created via [`create_optimizer`](#create_optimizer) if not set. ~~Optional[Optimizer]~~                                |
+| `losses`       | Optional record of the loss during distillation. Updated using the component name as the key. ~~Optional[Dict[str, float]]~~                |
+| **RETURNS**    | The updated `losses` dictionary. ~~Dict[str, float]~~                                                                                        |
+
 ## SentenceRecognizer.pipe {id="pipe",tag="method"}
 
 Apply the pipe to a stream of documents. This usually happens under the hood
diff --git a/website/docs/api/tagger.mdx b/website/docs/api/tagger.mdx
index b8d71bc0d..664fd7940 100644
--- a/website/docs/api/tagger.mdx
+++ b/website/docs/api/tagger.mdx
@@ -105,6 +105,39 @@ and all pipeline components are applied to the `Doc` in order. Both
 | `doc`       | The document to process. ~~Doc~~ |
 | **RETURNS** | The processed document. ~~Doc~~  |
 
+## Tagger.distill {id="distill", tag="method,experimental", version="4"}
+
+Train a pipe (the student) on the predictions of another pipe (the teacher).
+The student is typically trained on the probability distribution of the
+teacher, but details may differ per pipe. The goal of distillation is to
+transfer knowledge from the teacher to the student.
+
+The distillation is performed on ~~Example~~ objects. The `Example.reference`
+and `Example.predicted` ~~Doc~~s must have the same number of tokens and the
+same orthography. Even though the reference does not need to have gold
+annotations, the teacher can add its own annotations when necessary.
+
+This feature is experimental.
+
+> #### Example
+>
+> ```python
+> teacher_pipe = teacher.add_pipe("tagger")
+> student_pipe = student.add_pipe("tagger")
+> optimizer = nlp.resume_training()
+> losses = student_pipe.distill(teacher_pipe, examples, sgd=optimizer)
+> ```
+
+| Name           | Description                                                                                                                                  |
+| -------------- | -------------------------------------------------------------------------------------------------------------------------------------------- |
+| `teacher_pipe` | The teacher pipe to learn from. ~~Optional[TrainablePipe]~~                                                                                  |
+| `examples`     | Distillation examples. The reference and predicted docs must have the same number of tokens and the same orthography. ~~Iterable[Example]~~ |
+| _keyword-only_ |                                                                                                                                              |
+| `drop`         | Dropout rate. ~~float~~                                                                                                                      |
+| `sgd`          | An optimizer. Will be created via [`create_optimizer`](#create_optimizer) if not set. ~~Optional[Optimizer]~~                                |
+| `losses`       | Optional record of the loss during distillation. Updated using the component name as the key. ~~Optional[Dict[str, float]]~~                |
+| **RETURNS**    | The updated `losses` dictionary. ~~Dict[str, float]~~                                                                                        |
+
 ## Tagger.pipe {id="pipe",tag="method"}
 
 Apply the pipe to a stream of documents. This usually happens under the hood