kadarakos 2022-12-08 11:29:15 +00:00
parent ae451d1047
commit a9991abf8f

@ -475,7 +475,7 @@ report span characteristics such as the average span length and the span (or
span boundary) distinctiveness. The distinctiveness measure shows how different
the tokens are with respect to the rest of the corpus using the KL-divergence of
the token distributions. To learn more, you can check out Papay et al.'s work on
[*Dissecting Span Identification Tasks with Performance Prediction* (EMNLP 2020)](https://aclanthology.org/2020.emnlp-main.396/).
[_Dissecting Span Identification Tasks with Performance Prediction_ (EMNLP 2020)](https://aclanthology.org/2020.emnlp-main.396/).
</Infobox>
@ -1165,13 +1165,14 @@ $ python -m spacy evaluate [model] [data_path] [--output] [--code] [--gold-prepr
## apply {#apply new="3.5" tag="command"}
Applies a trained pipeline to data and stores the resulting
annotated documents in a `DocBin`. The input can be a single file
or a directory. The recognized input formats are:
Applies a trained pipeline to data and stores the resulting annotated documents
in a `DocBin`. The input can be a single file or a directory. The recognized
input formats are:
1. `.spacy`
2. `.jsonl` containing a user-specified `text_key`
3. Files with any other extension are assumed to be plain text files containing a single document.
3. Files with any other extension are assumed to be plain text files containing
a single document.
When a directory is provided it is traversed recursively to collect all files.
@ -1181,17 +1182,17 @@ $ python -m spacy apply [model] [data-path] [output-file] [--code] [--text-key]
| Name | Description |
| ----------------------------------------- | ------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------ |
| `model` | Pipeline to apply to the data. Can be a package or a path to a data directory. ~~str (positional)~~ |
| `model` | Pipeline to apply to the data. Can be a package or a path to a data directory. ~~str (positional)~~ |
| `data-path` | Location of the data to process: a `.spacy` file, a `.jsonl` file, a plain text file, or a directory containing such files. ~~Path (positional)~~ |
| `output-file`, `-o` | Output `DocBin` path. ~~str (positional)~~ |
| `output-file`, `-o` | Output `DocBin` path. ~~str (positional)~~ |
| `--code`, `-c` <Tag variant="new">3</Tag> | Path to Python file with additional code to be imported. Allows [registering custom functions](/usage/training#custom-functions) for new architectures. ~~Optional[Path] \(option)~~ |
| `--text-key`, `-tk` | The key for `.jsonl` files to use to grab the texts from. ~~Optional[str] \(option)~~ |
| `--force-overwrite`, `-F` | If the provided `output-file` already exists, then force `apply` to overwrite it. If this is `False` (default) then quits with a warning instead. ~~bool (flag)~~ |
| `--text-key`, `-tk` | The key for `.jsonl` files to use to grab the texts from. ~~Optional[str] \(option)~~ |
| `--force-overwrite`, `-F` | If the provided `output-file` already exists, then force `apply` to overwrite it. If this is `False` (default) then quits with a warning instead. ~~bool (flag)~~ |
| `--gpu-id`, `-g` | GPU to use, if any. Defaults to `-1` for CPU. ~~int (option)~~ |
| `--batch-size`, `-b` | Batch size to use for prediction. Defaults to `1`. ~~int (option)~~ |
| `--n-process`, `-n` | Number of processes to use for prediction. Defaults to `1`. ~~int (option)~~ |
| `--batch-size`, `-b` | Batch size to use for prediction. Defaults to `1`. ~~int (option)~~ |
| `--n-process`, `-n` | Number of processes to use for prediction. Defaults to `1`. ~~int (option)~~ |
| `--help`, `-h` | Show help message and available arguments. ~~bool (flag)~~ |
| **CREATES** | A `DocBin` with the annotations from the `model` for all the files found in `data-path`. |
| **CREATES** | A `DocBin` with the annotations from the `model` for all the files found in `data-path`. |
## find-threshold {#find-threshold new="3.5" tag="command"}
@ -1218,7 +1219,6 @@ be provided.
> $ python -m spacy find-threshold my_nlp data.spacy spancat threshold spans_sc_f
> ```
| Name | Description |
| ----------------------- | ------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------ |
| `model` | Pipeline to evaluate. Can be a package or a path to a data directory. ~~str (positional)~~ |