Merge pull request #8938 from explosion/docs/prodigy-v1-11-project [ci skip]

Update Prodigy project template for v1.11
2025-11-17 08:16:04 +03:00 · 2021-08-12 21:16:49 +10:00 · 2021-08-12 21:16:49 +10:00 · 6260f044cc
commit 6260f044cc
parent e227d24d43 4f769ff913
2 changed files with 70 additions and 39 deletions
--- a/website/docs/images/prodigy_train_curve.jpg
+++ b/website/docs/images/prodigy_train_curve.jpg
--- a/website/docs/usage/projects.md
+++ b/website/docs/usage/projects.md
@@ -758,16 +758,6 @@ workflows, but only one can be tracked by DVC.
 ### Prodigy {#prodigy} <IntegrationLogo name="prodigy" width={100} height="auto" align="right" />
 <Infobox title="This section is still under construction" emoji="🚧" variant="warning">
 The Prodigy integration will require a nightly version of Prodigy that supports
 spaCy v3+. You can already use annotations created with Prodigy in spaCy v3 by
 exporting your data with
 [`data-to-spacy`](https://prodi.gy/docs/recipes#data-to-spacy) and running
 [`spacy convert`](/api/cli#convert) to convert it to the binary format.
 </Infobox>
 [Prodigy](https://prodi.gy) is a modern annotation tool for creating training
 data for machine learning models, developed by us. It integrates with spaCy
 out-of-the-box and provides many different
@@ -776,17 +766,23 @@ with and without a model in the loop. If Prodigy is installed in your project,
 you can start the annotation server from your `project.yml` for a tight feedback
 loop between data development and training.
-The following example command starts the Prodigy app using the
-[`ner.correct`](https://prodi.gy/docs/recipes#ner-correct) recipe and streams in
-suggestions for the given entity labels produced by a pretrained model. You can
-then correct the suggestions manually in the UI. After you save and exit the
-server, the full dataset is exported in spaCy's format and split into a training
-and evaluation set.
+<Infobox variant="warning">
+
+This integration requires [Prodigy v1.11](https://prodi.gy/docs/changelog#v1.11)
+or higher. If you're using an older version of Prodigy, you can still use your
+annotations in spaCy v3 by exporting your data with
+[`data-to-spacy`](https://prodi.gy/docs/recipes#data-to-spacy) and running
 [`spacy convert`](/api/cli#convert) to convert it to the binary format.
 </Infobox>
 The following example shows a workflow for merging and exporting NER annotations
 collected with Prodigy and training a spaCy pipeline:
 > #### Example usage
 >
 > ```cli
-> $ python -m spacy project run annotate
+> $ python -m spacy project run all
 > ```
 <!-- prettier-ignore -->
@@ -794,36 +790,71 @@ and evaluation set.
 ### project.yml
 vars:
   prodigy:
-    dataset: 'ner_articles'
-    labels: 'PERSON,ORG,PRODUCT'
-    model: 'en_core_web_md'
+    train_dataset: "fashion_brands_training"
+    eval_dataset: "fashion_brands_eval"
+
 workflows:
   all:
     - data-to-spacy
     - train_spacy
 commands:
-  - name: annotate
-  - script:
-      - 'python -m prodigy ner.correct ${vars.prodigy.dataset} ${vars.prodigy.model} ./assets/raw_data.jsonl --labels ${vars.prodigy.labels}'
-      - 'python -m prodigy data-to-spacy ./corpus/train.json ./corpus/eval.json --ner ${vars.prodigy.dataset}'
-      - 'python -m spacy convert ./corpus/train.json ./corpus/train.spacy'
-      - 'python -m spacy convert ./corpus/eval.json ./corpus/eval.spacy'
-  - deps:
-      - 'assets/raw_data.jsonl'
-  - outputs:
-      - 'corpus/train.spacy'
-      - 'corpus/eval.spacy'
+  - name: "data-to-spacy"
+    help: "Merge your annotations and create data in spaCy's binary format"
+    script:
+      - "python -m prodigy data-to-spacy corpus/ --ner ${vars.prodigy.train_dataset},eval:${vars.prodigy.eval_dataset}"
+    outputs:
+      - "corpus/train.spacy"
+      - "corpus/dev.spacy"
+  - name: "train_spacy"
+    help: "Train a named entity recognition model with spaCy"
+    script:
+      - "python -m spacy train configs/config.cfg --output training/ --paths.train corpus/train.spacy --paths.dev corpus/dev.spacy"
     deps:
       - "corpus/train.spacy"
       - "corpus/dev.spacy"
     outputs:
       - "training/model-best"
 ```
-You can use the same approach for other types of projects and annotation
+> #### Example train curve output
 >
 > [![Screenshot of train curve terminal output](../images/prodigy_train_curve.jpg)](https://prodi.gy/docs/recipes#train-curve)
 The [`train-curve`](https://prodi.gy/docs/recipes#train-curve) recipe is another
 cool workflow you can include in your project. It will run the training with
 different portions of the data, e.g. 25%, 50%, 75% and 100%. As a rule of thumb,
 if accuracy increases in the last segment, this could indicate that collecting
 more annotations of the same type might improve the model further.
 <!-- prettier-ignore -->
 ```yaml
 ### project.yml (excerpt)
   - name: "train_curve"
     help: "Train the model with Prodigy by using different portions of training examples to evaluate if more annotations can potentially improve the performance"
     script:
       - "python -m prodigy train-curve --ner ${vars.prodigy.train_dataset},eval:${vars.prodigy.eval_dataset} --config configs/${vars.config} --show-plot"
 ```
 You can use the same approach for various types of projects and annotation
 workflows, including
-[text classification](https://prodi.gy/docs/recipes#textcat),
-[dependency parsing](https://prodi.gy/docs/recipes#dep),
+[named entity recognition](https://prodi.gy/docs/named-entity-recognition),
+[span categorization](https://prodi.gy/docs/span-categorization),
 [text classification](https://prodi.gy/docs/text-classification),
 [dependency parsing](https://prodi.gy/docs/dependencies-relations),
 [part-of-speech tagging](https://prodi.gy/docs/recipes#pos) or fully
-[custom recipes](https://prodi.gy/docs/custom-recipes) – for instance, an A/B
-evaluation workflow that lets you compare two different models and their
-results.
-<!-- TODO: <Project id="integrations/prodigy">
-</Project> -->
+[custom recipes](https://prodi.gy/docs/custom-recipes). You can also use spaCy
+project templates to quickly start the annotation server to collect more
+annotations and add them to your Prodigy dataset.
+<Project id="integrations/prodigy">
+Get started with spaCy and Prodigy using our project template. It includes
 commands to create a merged training corpus from your Prodigy annotations,
 training and packaging a spaCy pipeline and analyzing if more annotations may
 improve performance.
 </Project>
 ---
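
Reviewer note on the `train-curve` paragraph added in this diff: the rule of thumb it describes (a score gain in the last data segment suggests more annotations may help) can be sketched in a few lines of Python. This is only an illustration of the heuristic, not how Prodigy implements it. The function names, the `min_gain` threshold and the scores below are all hypothetical and do not come from Prodigy or spaCy.

```python
from typing import Sequence


def last_segment_gain(scores: Sequence[float]) -> float:
    """Return the score change between the last two training portions."""
    if len(scores) < 2:
        raise ValueError("need scores for at least two portions")
    return scores[-1] - scores[-2]


def more_data_may_help(scores: Sequence[float], min_gain: float = 0.0) -> bool:
    """Rule of thumb: a gain in the last segment hints that more annotations could help."""
    return last_segment_gain(scores) > min_gain


# Hypothetical accuracies after training on 25%, 50%, 75% and 100% of the data.
still_improving = [0.61, 0.70, 0.74, 0.77]
plateaued = [0.61, 0.70, 0.74, 0.74]

print(more_data_may_help(still_improving))
print(more_data_may_help(plateaued))
```

The `min_gain` parameter is an assumed knob for ignoring small fluctuations between runs. The docs text only says an increase "could indicate" a benefit, so the result is a hint rather than a decision.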