This commit is contained in:
kadarakos 2022-12-08 11:29:15 +00:00
parent ae451d1047
commit a9991abf8f

@ -475,7 +475,7 @@ report span characteristics such as the average span length and the span (or
span boundary) distinctiveness. The distinctiveness measure shows how different
the tokens are from the rest of the corpus, computed as the KL divergence between
the token distribution of the spans and that of the corpus. To learn more, you
can check out Papay et al.'s work on
[_Dissecting Span Identification Tasks with Performance Prediction_ (EMNLP 2020)](https://aclanthology.org/2020.emnlp-main.396/).
</Infobox>
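As a rough illustration of the distinctiveness measure, the sketch below computes the KL divergence of a span's token distribution from a reference token distribution. It is a minimal, stdlib-only sketch and not spaCy's implementation: the helper names `token_distribution` and `kl_divergence` are hypothetical, it assumes the reference is the whole corpus (so every span token has nonzero reference probability), and it uses natural logarithms.

```python
import math
from collections import Counter
from typing import Dict, Iterable


def token_distribution(tokens: Iterable[str]) -> Dict[str, float]:
    """Relative frequency of each token (hypothetical helper)."""
    counts = Counter(tokens)
    total = sum(counts.values())
    return {tok: n / total for tok, n in counts.items()}


def kl_divergence(p: Dict[str, float], q: Dict[str, float]) -> float:
    """KL(p || q) in nats.

    Assumes q[t] > 0 for every token t with p[t] > 0, which holds when
    q is the distribution of the whole corpus the spans were drawn from.
    """
    return sum(p_t * math.log(p_t / q[t]) for t, p_t in p.items() if p_t > 0)


corpus = "the cat sat on the mat".split()
span_tokens = ["cat"]
# 'cat' makes up 1/6 of the corpus tokens, so the divergence is ln(6) ≈ 1.79
print(kl_divergence(token_distribution(span_tokens), token_distribution(corpus)))
```

A higher value means the span tokens look less like the corpus as a whole, which is what makes the spans easier to tell apart.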
@ -1165,13 +1165,14 @@ $ python -m spacy evaluate [model] [data_path] [--output] [--code] [--gold-prepr
## apply {#apply new="3.5" tag="command"}
Applies a trained pipeline to data and stores the resulting annotated documents
in a `DocBin`. The input can be a single file or a directory. The recognized
input formats are:
1. `.spacy`
2. `.jsonl` files containing one JSON object per line, with the text stored
   under a user-specified `text_key`
3. Files with any other extension are assumed to be plain text files containing
a single document.

When a directory is provided, it is traversed recursively to collect all files.
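The input handling described above can be sketched as follows. This is a hypothetical, stdlib-only approximation rather than the code behind `spacy apply`: the helper names are made up, the default `text_key` of `"text"` is an assumption, and `.spacy` files are only classified here, since deserializing them requires spaCy's `DocBin`.

```python
import json
from pathlib import Path
from typing import Iterator, List


def collect_files(data_path: Path) -> List[Path]:
    """Return the single input file, or every file under a directory (recursively)."""
    if data_path.is_dir():
        return sorted(p for p in data_path.rglob("*") if p.is_file())
    return [data_path]


def input_kind(path: Path) -> str:
    """Classify a file by extension, following the three recognized formats."""
    if path.suffix == ".spacy":
        return "docbin"  # serialized DocBin, not raw text
    if path.suffix == ".jsonl":
        return "jsonl"  # one JSON object per line, text under text_key
    return "text"  # any other extension: one plain-text document


def iter_texts(path: Path, text_key: str = "text") -> Iterator[str]:
    """Yield raw texts from a .jsonl or plain-text file (hypothetical helper)."""
    kind = input_kind(path)
    if kind == "jsonl":
        with path.open(encoding="utf8") as f:
            for line in f:
                if line.strip():  # skip blank lines
                    yield json.loads(line)[text_key]
    elif kind == "text":
        yield path.read_text(encoding="utf8")
    else:
        raise ValueError(f"{path} holds serialized Docs, not text")
```

For example, a directory containing `a.txt` and `sub/b.jsonl` yields both files from `collect_files`, and each line of `b.jsonl` is treated as a separate document.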
@ -1218,7 +1219,6 @@ be provided.
> $ python -m spacy find-threshold my_nlp data.spacy spancat threshold spans_sc_f
> ```
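Judging from the example, `find-threshold` tunes the `threshold` setting of the `spancat` component to maximize the `spans_sc_f` score. The idea behind such a sweep can be sketched as a grid search over evenly spaced candidate thresholds. This is a simplified stand-in, not spaCy's implementation: it scores plain boolean predictions with an F-score instead of running the pipeline's evaluator, and the names `find_best_threshold`, `n_trials` and `beta` are illustrative assumptions.

```python
from typing import Sequence, Tuple


def f_score(pred: Sequence[bool], gold: Sequence[bool], beta: float = 1.0) -> float:
    """F-beta score of boolean predictions against boolean gold labels."""
    tp = sum(p and g for p, g in zip(pred, gold))
    fp = sum(p and not g for p, g in zip(pred, gold))
    fn = sum(g and not p for p, g in zip(pred, gold))
    if tp == 0:
        return 0.0
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    b2 = beta * beta
    return (1 + b2) * precision * recall / (b2 * precision + recall)


def find_best_threshold(
    scores: Sequence[float], gold: Sequence[bool], n_trials: int = 11, beta: float = 1.0
) -> Tuple[float, float]:
    """Try n_trials thresholds evenly spaced in [0, 1]; keep the first best one."""
    if n_trials < 2:
        raise ValueError("n_trials must be at least 2")
    best_t, best_f = 0.0, -1.0
    for i in range(n_trials):
        t = i / (n_trials - 1)
        pred = [s >= t for s in scores]  # a candidate is accepted at or above t
        f = f_score(pred, gold, beta)
        if f > best_f:
            best_t, best_f = t, f
    return best_t, best_f


scores = [0.9, 0.8, 0.3, 0.2]
gold = [True, True, False, False]
print(find_best_threshold(scores, gold))  # (0.4, 1.0)
```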
| Name | Description |
| ----------------------- | ------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------ |
| `model` | Pipeline to evaluate. Can be a package or a path to a data directory. ~~str (positional)~~ |