documentation

2025-09-22 03:52:39 +03:00 · 2022-12-07 19:43:18 +00:00 · 2022-12-07 19:43:18 +00:00 · ae451d1047
commit ae451d1047
parent 17abfa6790
1 changed files with 31 additions and 0 deletions
--- a/website/docs/api/cli.md
+++ b/website/docs/api/cli.md
@@ -12,6 +12,7 @@ menu:
  - ['train', 'train']
  - ['pretrain', 'pretrain']
  - ['evaluate', 'evaluate']
+  - ['apply', 'apply']
  - ['find-threshold', 'find-threshold']
  - ['assemble', 'assemble']
  - ['package', 'package']
@@ -1162,6 +1163,36 @@ $ python -m spacy evaluate [model] [data_path] [--output] [--code] [--gold-prepr
 | `--help`, `-h`                            | Show help message and available arguments. ~~bool (flag)~~                                                                                                                           |
 | **CREATES**                               | Training results and optional metrics and visualizations.                                                                                                                            |

+## apply {#apply new="3.5" tag="command"}
+
+Applies a trained pipeline to data and stores the resulting
+annotated documents in a `DocBin`. The input can be a single file
+or a directory. The recognized input formats are:
+
+1. `.spacy`
+2. `.jsonl` containing a user-specified `text_key`
+3. Files with any other extension are assumed to be plain text files containing a single document.
+
+When a directory is provided it is traversed recursively to collect all files.
+
+```cli
+$ python -m spacy apply [model] [data-path] [output-file] [--code] [--text-key] [--force-overwrite] [--gpu-id] [--batch-size] [--n-process]
+```
+
+| Name                                      | Description                                                                                                                                                                          |
+| ----------------------------------------- | ------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------ |
+| `model`                                   | Pipeline to apply to the data. Can be a package or a path to a data directory. ~~str (positional)~~                                                                                           |
+| `data-path`                               | Location of the data to process: a single file or a directory containing `.spacy`, `.jsonl` or plain text files. ~~Path (positional)~~                                               |
+| `output-file`, `-o`                       | Output `DocBin` path. ~~str (positional)~~                                                                                                                                           |
+| `--code`, `-c` <Tag variant="new">3</Tag> | Path to Python file with additional code to be imported. Allows [registering custom functions](/usage/training#custom-functions) for new architectures. ~~Optional[Path] \(option)~~ |
+| `--text-key`, `-tk`                       | Key under which each record of a `.jsonl` file stores its text. ~~Optional[str] \(option)~~                                                                                          |
+| `--force-overwrite`, `-F`                 | If the provided `output-file` already exists, overwrite it. Without this flag (the default), `apply` quits with a warning instead. ~~bool (flag)~~                                   |
+| `--gpu-id`, `-g`                          | GPU to use, if any. Defaults to `-1` for CPU. ~~int (option)~~                                                                                                                       |
+| `--batch-size`, `-b`                      | Batch size to use for prediction. Defaults to `1`. ~~int (option)~~                                                                                                                  |
+| `--n-process`, `-n`                       | Number of processes to use for prediction. Defaults to `1`. ~~int (option)~~                                                                                                         |
+| `--help`, `-h`                            | Show help message and available arguments. ~~bool (flag)~~                                                                                                                           |
+| **CREATES**                               | A `DocBin` with the annotations from the `model` for all the files found in `data-path`.                                                                                                                            |
+
 ## find-threshold {#find-threshold new="3.5" tag="command"}

 Runs prediction trials for a trained model with varying thresholds to maximize