Type and doc fixes
parent 72c0f7c798
commit 842bbeae29
@@ -55,7 +55,7 @@ def distill_cli(
 def distill(
-    teacher_model: str,
+    teacher_model: Union[str, Path],
     student_config_path: Union[str, Path],
     output_path: Optional[Union[str, Path]] = None,
     *,
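This hunk only widens `teacher_model` from `str` to `Union[str, Path]`, matching the other path-like parameters in the signature. A minimal sketch of the two call styles the new annotation permits; the import path and all argument values are illustrative assumptions, not taken from the commit:

```python
from pathlib import Path

# Assumed import location for the diffed function; adjust to wherever
# distill() actually lives in this branch.
from spacy.cli.distill import distill

# A packaged pipeline name satisfies the str half of Union[str, Path]...
distill("en_core_web_lg", "student_config.cfg", output_path=Path("distilled"))

# ...and a pipeline directory on disk satisfies the Path half.
distill(Path("models/teacher"), Path("student_config.cfg"), output_path=Path("distilled"))
```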
@@ -1707,17 +1707,18 @@ $ python -m spacy project dvc [project_dir] [workflow] [--force] [--verbose] [--
 Distill a _student_ pipeline from a _teacher_ pipeline. Distillation trains the
 models in the student pipeline on the activations of the teacher's models. A
 typical use case for distillation is to extract a smaller, more performant model
-from a larger high-accuracy model. Since distillation uses the activations of the
-teacher, distillation can be performed on a corpus of raw text without (gold standard)
-annotations.
+from a larger high-accuracy model. Since distillation uses the activations of
+the teacher, distillation can be performed on a corpus of raw text without (gold
+standard) annotations. A development set of gold annotations _is_ needed to
+evaluate the distilled model on during distillation.
 
-`distill` will save out the best performing pipeline across all epochs, as well as the final
-pipeline. The `--code` argument can be used to provide a Python file that's
-imported before the training process starts. This lets you register
+`distill` will save out the best performing pipeline across all epochs, as well
+as the final pipeline. The `--code` argument can be used to provide a Python
+file that's imported before the training process starts. This lets you register
 [custom functions](/usage/training#custom-functions) and architectures and refer
 to them in your config, all while still using spaCy's built-in `train` workflow.
-If you need to manage complex multi-step training workflows, check out the new
-[spaCy projects](/usage/projects).
+If you need to manage complex multi-step training workflows, check out the
+[Weasel](https://github.com/explosion/weasel).
 
 > #### Example
 >
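The docs text above says `--code` points at a Python file that is imported before training starts so that custom functions can be registered. A minimal sketch of such a file, assuming a registered callback is what you want; the registry entry name `my_debug_callback` is made up for illustration:

```python
# functions.py - imported via --code before distillation begins.
import spacy
from spacy.language import Language

@spacy.registry.callbacks("my_debug_callback")
def create_debug_callback():
    """Register a callback the student config can reference via @callbacks."""
    def on_init(nlp: Language) -> Language:
        # Log which components the student pipeline starts with.
        print(f"Pipeline components: {nlp.pipe_names}")
        return nlp
    return on_init
```

The file would then be passed as `--code functions.py` on the `distill` command line.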
@@ -1731,14 +1732,14 @@ $ python -m spacy distill [teacher_model] [student_config_path] [--output] [--co
 
 | Name | Description |
 | --------------------- | ------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------ |
-| `teacher_model` | The teacher pipeline to distill the student from. ~~Path (positional)~~ |
+| `teacher_model` | The teacher pipeline (name or path) to distill the student from. ~~Union[str, Path] (positional)~~ |
 | `student_config_path` | The configuration of the student pipeline. ~~Path (positional)~~ |
 | `--output`, `-o` | Directory to store the distilled pipeline in. Will be created if it doesn't exist. ~~Optional[Path] \(option)~~ |
 | `--code`, `-c` | Comma-separated paths to Python files with additional code to be imported. Allows [registering custom functions](/usage/training#custom-functions) for new architectures. ~~Optional[Path] \(option)~~ |
 | `--verbose`, `-V` | Show more detailed messages during distillation. ~~bool (flag)~~ |
 | `--gpu-id`, `-g` | GPU ID or `-1` for CPU. Defaults to `-1`. ~~int (option)~~ |
 | `--help`, `-h` | Show help message and available arguments. ~~bool (flag)~~ |
-| **CREATES** | A `dvc.yaml` file in the project directory, based on the steps defined in the given workflow. |
+| **CREATES** | The final trained pipeline and the best trained pipeline. |
 
 ## huggingface-hub {id="huggingface-hub",version="3.1"}
 
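Reading the updated table back as a command, a representative invocation might look like the following; the teacher name, config path, and output directory are placeholders:

```bash
$ python -m spacy distill en_core_web_lg ./student_config.cfg --output ./distilled --code functions.py --gpu-id 0 --verbose
```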