Document gold.docs_to_json [ci skip]

Ines Montani 2019-07-10 10:27:33 +02:00
parent 881f5bc401
commit ebe58e7fa1
3 changed files with 27 additions and 1 deletions

@@ -520,7 +520,9 @@ spaCy takes training data in JSON format. The built-in
[`convert`](/api/cli#convert) command helps you convert the `.conllu` format
used by the
[Universal Dependencies corpora](https://github.com/UniversalDependencies) to
spaCy's training format.
spaCy's training format. To convert one or more existing `Doc` objects to
spaCy's JSON format, you can use the
[`gold.docs_to_json`](/api/goldparse#docs_to_json) helper.
> #### Annotating entities
>

@@ -55,6 +55,27 @@ Whether the provided syntactic annotations form a projective dependency tree.
## Utilities {#util}
### gold.docs_to_json {#docs_to_json tag="function"}
Convert a list of `Doc` objects into the
[JSON-serializable format](/api/annotation#json-input) used by the
[`spacy train`](/api/cli#train) command.
> #### Example
>
> ```python
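> import spacy
> from spacy.gold import docs_to_json
>
> # Assumes the en_core_web_sm model is installed
> nlp = spacy.load("en_core_web_sm")
> doc = nlp(u"I like London")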
> json_data = docs_to_json([doc])
> ```
| Name | Type | Description |
| ----------- | ---------------- | ------------------------------------------ |
| `docs` | iterable / `Doc` | The `Doc` object(s) to convert. |
| `id` | int | ID to assign to the JSON. Defaults to `0`. |
| **RETURNS** | list | The data in spaCy's JSON format. |
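Because the returned data is JSON-serializable, it can be written to disk with the standard library and then passed to `spacy train`. The following is a minimal sketch, not part of spaCy: `write_training_json` is a hypothetical helper, and `placeholder_data` is a stand-in for the value returned by `docs_to_json`, so the example runs without a model installed.

```python
import json
import tempfile
from pathlib import Path


def write_training_json(json_data, path):
    """Write JSON-serializable training data to a UTF-8 encoded file.

    Hypothetical helper for illustration; it is not provided by spaCy.
    """
    path = Path(path)
    with path.open("w", encoding="utf8") as f:
        # ensure_ascii=False keeps non-ASCII text readable in the file
        json.dump(json_data, f, ensure_ascii=False, indent=2)
    return path


# Stand-in for the value returned by docs_to_json([doc]). The real data
# follows the JSON input format linked above; this placeholder is an
# assumption used only to keep the example self-contained.
placeholder_data = [{"id": 0, "paragraphs": []}]

out_dir = Path(tempfile.mkdtemp())
out_path = write_training_json(placeholder_data, out_dir / "train.json")
```

In practice, you would pass the result of `docs_to_json` in place of `placeholder_data` and point `spacy train` at the written file.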
### gold.biluo_tags_from_offsets {#biluo_tags_from_offsets tag="function"}
Encode labelled spans into per-token tags, using the

@@ -39,6 +39,9 @@ mkdir models
python -m spacy train es models ancora-json/es_ancora-ud-train.json ancora-json/es_ancora-ud-dev.json
```
You can also use the [`gold.docs_to_json`](/api/goldparse#docs_to_json) helper
to convert a list of `Doc` objects to spaCy's JSON training format.
#### Understanding the training output
When you train a model using the [`spacy train`](/api/cli#train) command, you'll