From a35236e5f068849ca0b0c70944cd85e14e739dff Mon Sep 17 00:00:00 2001
From: Ines Montani
Date: Mon, 6 Jul 2020 15:57:44 +0200
Subject: [PATCH] Update v3 docs WIP [ci skip]

---
 website/docs/usage/spacy-101.md |  2 +-
 website/docs/usage/training.md  | 51 +++++++++++++++++++++++++--------
 website/meta/site.json          |  2 +-
 website/package.json            |  2 +-
 4 files changed, 42 insertions(+), 15 deletions(-)

diff --git a/website/docs/usage/spacy-101.md b/website/docs/usage/spacy-101.md
index 3c4e85a7d..0cfe404f2 100644
--- a/website/docs/usage/spacy-101.md
+++ b/website/docs/usage/spacy-101.md
@@ -106,7 +106,7 @@ systems, or to pre-process text for **deep learning**.

 - **spaCy is not a company**. It's an open-source library. Our company
   publishing spaCy and other software is called
-  [Explosion AI](https://explosion.ai).
+  [Explosion](https://explosion.ai).

 ## Features {#features}

diff --git a/website/docs/usage/training.md b/website/docs/usage/training.md
index 53b713f98..2bbf5dddd 100644
--- a/website/docs/usage/training.md
+++ b/website/docs/usage/training.md
@@ -36,22 +36,19 @@ ready-to-use spaCy models.
 The recommended way to train your spaCy models is via the
 [`spacy train`](/api/cli#train) command on the command line.

-1. The **training data** in spaCy's
-   [binary format](/api/data-formats#binary-training) created using
+1. The **training and evaluation data** in spaCy's
+   [binary `.spacy` format](/api/data-formats#binary-training) created using
    [`spacy convert`](/api/cli#convert).
-2. A `config.cfg` **configuration file** with all settings and hyperparameters.
+2. A [`config.cfg`](#config) **configuration file** with all settings and
+   hyperparameters.
 3. An optional **Python file** to register
    [custom models and architectures](#custom-models).

-
-
-Lorem ipsum dolor sit amet, consectetur adipiscing elit. Phasellus interdum
-sodales lectus, ut sodales orci ullamcorper id. Sed condimentum neque ut erat
-mattis pretium.
-
-
+```bash
+$ python -m spacy train train.spacy dev.spacy config.cfg --output ./output
+```

 > #### Tip: Debug your data
 >
@@ -60,9 +57,17 @@ mattis pretium.
 > invalid entity annotations, cyclic dependencies, low data labels and more.
 >
 > ```bash
-> $ python -m spacy debug-data en train.json dev.json --verbose
+> $ python -m spacy debug-data en train.spacy dev.spacy --verbose
 > ```

+
+
+The easiest way to get started with an end-to-end training process is to clone a
+[project](/usage/projects) template. Projects let you manage multi-step
+workflows, from data preprocessing to training and packaging your model.
+
+
+


 When you train a model using the [`spacy train`](/api/cli#train) command, you'll
@@ -94,7 +99,28 @@ still look good.

 ---

-### Training config files {#cli}
+### Training config files {#config}
+
+> #### Migration from spaCy v2.x
+>
+> TODO: ...
+
+Training config files include all **settings and hyperparameters** for training
+your model. Instead of providing lots of arguments on the command line, you only
+need to pass your `config.cfg` file to [`spacy train`](/api/cli#train).
+
+To read more about how the config system works under the hood, check out the
+[Thinc documentation](https://thinc.ai/docs/usage-config).
+
+- **Structured sections.**
+- **References to registered functions.** Sections can refer to registered
+  functions like [model architectures](/api/architectures),
+  [optimizers](https://thinc.ai/docs/api-optimizers) or
+  [schedules](https://thinc.ai/docs/api-schedules) and define arguments that are
+  passed into them. You can also register your own functions to define
+  [custom architectures](#custom-models) and reference them in your config.
+- **Interpolation.** If you have hyperparameters used by multiple components,
+  define them once and reference them as variables.

@@ -174,6 +200,7 @@ mattis pretium.

 ### Training with custom code
+

 ## Transfer learning {#transfer-learning}

diff --git a/website/meta/site.json b/website/meta/site.json
index 724665060..5fb1a4533 100644
--- a/website/meta/site.json
+++ b/website/meta/site.json
@@ -6,7 +6,7 @@
     "siteUrlNightly": "https://nightly.spacy.io",
     "nightlyBranches": ["nightly.spacy.io"],
     "email": "contact@explosion.ai",
-    "company": "Explosion AI",
+    "company": "Explosion",
     "companyUrl": "https://explosion.ai",
     "repo": "explosion/spaCy",
     "modelsRepo": "explosion/spacy-models",
diff --git a/website/package.json b/website/package.json
index d5c770ddf..3c76014b3 100644
--- a/website/package.json
+++ b/website/package.json
@@ -3,7 +3,7 @@
     "private": true,
     "description": "spaCy website",
     "version": "3.0.0",
-    "author": "Explosion AI ",
+    "author": "Explosion ",
     "license": "MIT",
     "dependencies": {
         "@jupyterlab/outputarea": "^0.19.1",