From ebe58e7fa18af919eb69b9e468d0ec30c9338dcc Mon Sep 17 00:00:00 2001
From: Ines Montani
Date: Wed, 10 Jul 2019 10:27:33 +0200
Subject: [PATCH] Document gold.docs_to_json [ci skip]

---
 website/docs/api/annotation.md |  4 +++-
 website/docs/api/goldparse.md  | 21 +++++++++++++++++++++
 website/docs/usage/training.md |  3 +++
 3 files changed, 27 insertions(+), 1 deletion(-)

diff --git a/website/docs/api/annotation.md b/website/docs/api/annotation.md
index a5bb30b6f..ed0e0b3e0 100644
--- a/website/docs/api/annotation.md
+++ b/website/docs/api/annotation.md
@@ -520,7 +520,9 @@ spaCy takes training data in JSON format. The built-in
 [`convert`](/api/cli#convert) command helps you convert the `.conllu` format
 used by the
 [Universal Dependencies corpora](https://github.com/UniversalDependencies) to
-spaCy's training format.
+spaCy's training format. To convert one or more existing `Doc` objects to
+spaCy's JSON format, you can use the
+[`gold.docs_to_json`](/api/goldparse#docs_to_json) helper.
 
 > #### Annotating entities
 >
diff --git a/website/docs/api/goldparse.md b/website/docs/api/goldparse.md
index ca5b6a811..13f68a85d 100644
--- a/website/docs/api/goldparse.md
+++ b/website/docs/api/goldparse.md
@@ -55,6 +55,27 @@ Whether the provided syntactic annotations form a projective dependency tree.
 
 ## Utilities {#util}
 
+### gold.docs_to_json {#docs_to_json tag="function"}
+
+Convert a list of Doc objects into the
+[JSON-serializable format](/api/annotation#json-input) used by the
+[`spacy train`](/api/cli#train) command.
+
+> #### Example
+>
+> ```python
+> from spacy.gold import docs_to_json
+>
+> doc = nlp(u"I like London")
+> json_data = docs_to_json([doc])
+> ```
+
+| Name        | Type             | Description                                |
+| ----------- | ---------------- | ------------------------------------------ |
+| `docs`      | iterable / `Doc` | The `Doc` object(s) to convert.            |
+| `id`        | int              | ID to assign to the JSON. Defaults to `0`. |
+| **RETURNS** | list             | The data in spaCy's JSON format.           |
+
 ### gold.biluo_tags_from_offsets {#biluo_tags_from_offsets tag="function"}
 
 Encode labelled spans into per-token tags, using the
diff --git a/website/docs/usage/training.md b/website/docs/usage/training.md
index 773b70f05..b84bf4e12 100644
--- a/website/docs/usage/training.md
+++ b/website/docs/usage/training.md
@@ -39,6 +39,9 @@ mkdir models
 python -m spacy train es models ancora-json/es_ancora-ud-train.json ancora-json/es_ancora-ud-dev.json
 ```
 
+You can also use the [`gold.docs_to_json`](/api/goldparse#docs_to_json) helper
+to convert a list of `Doc` objects to spaCy's JSON training format.
+
 #### Understanding the training output
 
 When you train a model using the [`spacy train`](/api/cli#train) command, you'll