Update v3 docs WIP [ci skip]

Ines Montani 2020-07-06 15:57:44 +02:00
parent fa261d09e8
commit a35236e5f0
4 changed files with 42 additions and 15 deletions


@@ -106,7 +106,7 @@ systems, or to pre-process text for **deep learning**.
- **spaCy is not a company**. It's an open-source library. Our company - **spaCy is not a company**. It's an open-source library. Our company
publishing spaCy and other software is called publishing spaCy and other software is called
[Explosion AI](https://explosion.ai). [Explosion](https://explosion.ai).
## Features {#features} ## Features {#features}


@@ -36,22 +36,19 @@ ready-to-use spaCy models.
The recommended way to train your spaCy models is via the The recommended way to train your spaCy models is via the
[`spacy train`](/api/cli#train) command on the command line. [`spacy train`](/api/cli#train) command on the command line.
1. The **training data** in spaCy's 1. The **training and evaluation data** in spaCy's
[binary format](/api/data-formats#binary-training) created using [binary `.spacy` format](/api/data-formats#binary-training) created using
[`spacy convert`](/api/cli#convert). [`spacy convert`](/api/cli#convert).
2. A `config.cfg` **configuration file** with all settings and hyperparameters. 2. A [`config.cfg`](#config) **configuration file** with all settings and
hyperparameters.
3. An optional **Python file** to register 3. An optional **Python file** to register
[custom models and architectures](#custom-models). [custom models and architectures](#custom-models).
<!-- TODO: decide how we want to present the "getting started" workflow here, get a default config etc. --> <!-- TODO: decide how we want to present the "getting started" workflow here, get a default config etc. -->
<Project id="some_example_project"> ```bash
$ python -m spacy train train.spacy dev.spacy config.cfg --output ./output
Lorem ipsum dolor sit amet, consectetur adipiscing elit. Phasellus interdum ```
sodales lectus, ut sodales orci ullamcorper id. Sed condimentum neque ut erat
mattis pretium.
</Project>
> #### Tip: Debug your data > #### Tip: Debug your data
> >
@@ -60,9 +57,17 @@ mattis pretium.
> invalid entity annotations, cyclic dependencies, low data labels and more. > invalid entity annotations, cyclic dependencies, low data labels and more.
> >
> ```bash > ```bash
> $ python -m spacy debug-data en train.json dev.json --verbose > $ python -m spacy debug-data en train.spacy dev.spacy --verbose
> ``` > ```
<Project id="some_example_project">
The easiest way to get started with an end-to-end training process is to clone a
[project](/usage/projects) template. Projects let you manage multi-step
workflows, from data preprocessing to training and packaging your model.
</Project>
<Accordion title="Understanding the training output"> <Accordion title="Understanding the training output">
When you train a model using the [`spacy train`](/api/cli#train) command, you'll When you train a model using the [`spacy train`](/api/cli#train) command, you'll
@@ -94,7 +99,28 @@ still look good.
--- ---
### Training config files {#cli} ### Training config files {#config}
> #### Migration from spaCy v2.x
>
> TODO: ...
Training config files include all **settings and hyperparameters** for training
your model. Instead of providing lots of arguments on the command line, you only
need to pass your `config.cfg` file to [`spacy train`](/api/cli#train).
To read more about how the config system works under the hood, check out the
[Thinc documentation](https://thinc.ai/docs/usage-config).
- **Structured sections.**
- **References to registered functions.** Sections can refer to registered
functions like [model architectures](/api/architectures),
[optimizers](https://thinc.ai/docs/api-optimizers) or
[schedules](https://thinc.ai/docs/api-schedules) and define arguments that are
passed into them. You can also register your own functions to define
[custom architectures](#custom-models) and reference them in your config.
- **Interpolation.** If you have hyperparameters used by multiple components,
define them once and reference them as variables.
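
The two ideas in the list above, references to registered functions and variable interpolation, can be sketched with Python's standard library alone. This is only a rough illustration and not how spaCy or Thinc implement their config system: the section names, the `my_sgd` function, the `OPTIMIZERS` dictionary and the `resolve_section` helper are all hypothetical, and the `${section:key}` syntax shown is that of `configparser.ExtendedInterpolation`.

```python
from configparser import ConfigParser, ExtendedInterpolation

# Hypothetical config text: section and key names are made up for illustration.
CONFIG_TEXT = """
[hyperparams]
width = 128
learn_rate = 0.001

[tagger_model]
width = ${hyperparams:width}

[parser_model]
width = ${hyperparams:width}

[optimizer]
@optimizers = my_sgd
learn_rate = ${hyperparams:learn_rate}
"""

# Toy registry mapping names to functions (an assumption, not Thinc's registry).
OPTIMIZERS = {}


def register_optimizer(name):
    """Decorator that stores a function in the toy registry under `name`."""
    def decorator(func):
        OPTIMIZERS[name] = func
        return func
    return decorator


@register_optimizer("my_sgd")
def make_sgd(learn_rate):
    # Stand-in for a real optimizer: just returns its settings.
    return {"type": "sgd", "learn_rate": learn_rate}


def resolve_section(parser, section):
    """Look up the function named by '@optimizers' and call it with the
    remaining keys of the section as float keyword arguments."""
    values = dict(parser[section])
    name = values.pop("@optimizers")
    kwargs = {key: float(value) for key, value in values.items()}
    return OPTIMIZERS[name](**kwargs)


parser = ConfigParser(interpolation=ExtendedInterpolation())
parser.read_string(CONFIG_TEXT)

# Both components read the same value, which is defined once in [hyperparams].
tagger_width = parser["tagger_model"].getint("width")
parser_width = parser["parser_model"].getint("width")

# The [optimizer] section is turned into a call to the registered function.
optimizer = resolve_section(parser, "optimizer")
```

Changing `width` in `[hyperparams]` updates both model sections at once, and swapping `my_sgd` for another registered name changes which function is called without touching any Python code.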
<!-- TODO: we need to come up with a good way to present the sections and their expected values visually? --> <!-- TODO: we need to come up with a good way to present the sections and their expected values visually? -->
@@ -174,6 +200,7 @@ mattis pretium.
### Training with custom code ### Training with custom code
<!-- TODO: document usage of spacy train with --code --> <!-- TODO: document usage of spacy train with --code -->
<!-- TODO: link to type annotations and maybe show example: https://thinc.ai/docs/usage-config#advanced-types -->
## Transfer learning {#transfer-learning} ## Transfer learning {#transfer-learning}


@@ -6,7 +6,7 @@
"siteUrlNightly": "https://nightly.spacy.io", "siteUrlNightly": "https://nightly.spacy.io",
"nightlyBranches": ["nightly.spacy.io"], "nightlyBranches": ["nightly.spacy.io"],
"email": "contact@explosion.ai", "email": "contact@explosion.ai",
"company": "Explosion AI", "company": "Explosion",
"companyUrl": "https://explosion.ai", "companyUrl": "https://explosion.ai",
"repo": "explosion/spaCy", "repo": "explosion/spaCy",
"modelsRepo": "explosion/spacy-models", "modelsRepo": "explosion/spacy-models",


@@ -3,7 +3,7 @@
"private": true, "private": true,
"description": "spaCy website", "description": "spaCy website",
"version": "3.0.0", "version": "3.0.0",
"author": "Explosion AI <contact@explosion.ai>", "author": "Explosion <contact@explosion.ai>",
"license": "MIT", "license": "MIT",
"dependencies": { "dependencies": {
"@jupyterlab/outputarea": "^0.19.1", "@jupyterlab/outputarea": "^0.19.1",