Update docs [ci skip]

Ines Montani 2020-09-20 17:44:58 +02:00
parent b2302c0a1c
commit 012b3a7096
14 changed files with 77 additions and 59 deletions

View File

@@ -895,8 +895,6 @@ what you need. By default, spaCy's
can provide any other repo (public or private) that you have access to using the
`--repo` option.
-<!-- TODO: update example once we've decided on repo structure -->
```cli
$ python -m spacy project clone [name] [dest] [--repo] [--branch] [--sparse]
```
@@ -904,7 +902,7 @@ $ python -m spacy project clone [name] [dest] [--repo] [--branch] [--sparse]
> #### Example
>
> ```cli
-> $ python -m spacy project clone some_example
+> $ python -m spacy project clone pipelines/ner_wikiner
> ```
>
> Clone from custom repo:

View File

@@ -289,8 +289,7 @@ of objects by referring to creation functions, including functions you register
yourself. For details on how to get started with training your own model, check
out the [training quickstart](/usage/training#quickstart).
-<!-- TODO:
-<Project id="en_core_trf_lg">
+<!-- TODO: <Project id="en_core_trf_lg">
The easiest way to get started is to clone a transformers-based project
template. Swap in your data, edit the settings and hyperparameters and train,
@@ -623,7 +622,7 @@ that are familiar from the training block: the `[pretraining.batcher]`,
`[pretraining.optimizer]` and `[pretraining.corpus]` all work the same way and
expect the same types of objects, although for pretraining your corpus does not
need to have any annotations, so you will often use a different reader, such as
-the [`JsonlReader`](/api/toplevel#jsonlreader).
+the [`JsonlReader`](/api/top-level#jsonlreader).
> #### Raw text format
>
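To make the pretraining corpus setup in this hunk concrete, here is a minimal config sketch. The registered reader name `spacy.JsonlReader.v1` and its parameters are assumptions inferred from the `JsonlReader` API linked above, not taken from this diff:

```ini
# Hypothetical excerpt: a pretraining corpus reading raw text from JSONL.
# Reader name and parameters are assumptions; check the JsonlReader docs.
[pretraining.corpus]
@readers = "spacy.JsonlReader.v1"
path = "corpus/raw_text.jsonl"
min_length = 5
max_length = 500
```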

View File

@@ -45,7 +45,7 @@ spaCy v3.0 introduces transformer-based pipelines that bring spaCy's accuracy
right up to **current state-of-the-art**. You can also use a CPU-optimized
pipeline, which is less accurate but much cheaper to run.
-<!-- TODO: -->
+<!-- TODO: update benchmarks and intro -->
> #### Evaluation details
>
@@ -68,6 +68,6 @@ our project template.
</Project>
-<!-- ## Citing spaCy {#citation}
-<!-- TODO: update -->
+<!-- TODO: ## Citing spaCy {#citation}
+-->

View File

@@ -356,11 +356,11 @@ that training configs are complete and experiments fully reproducible.
</Infobox>
-Note that when using a PyTorch or Tensorflow model, it is recommended to set the GPU
-memory allocator accordingly. When `gpu_allocator` is set to "pytorch" or
-"tensorflow" in the training config, cupy will allocate memory via those respective libraries,
-preventing OOM errors when there's available memory sitting in the other
-library's pool.
+Note that when using a PyTorch or Tensorflow model, it is recommended to set the
+GPU memory allocator accordingly. When `gpu_allocator` is set to "pytorch" or
+"tensorflow" in the training config, cupy will allocate memory via those
+respective libraries, preventing OOM errors when there's available memory
+sitting in the other library's pool.
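As a concrete sketch of the allocator setting described above (the `[system]` block placement is an assumption based on spaCy v3 config conventions, not shown in this diff):

```ini
# Hypothetical excerpt: route cupy's GPU allocations through PyTorch's
# allocator so both libraries draw from the same memory pool.
[system]
gpu_allocator = "pytorch"
```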
```ini
### config.cfg (excerpt)
@@ -489,7 +489,7 @@ with Model.define_operators({">>": chain}):
<Infobox title="This section is still under construction" emoji="🚧" variant="warning">
</Infobox>
-<!-- TODO:
+<!-- TODO: write trainable component section
- Interaction with `predict`, `get_loss` and `set_annotations`
- Initialization life-cycle with `begin_training`, correlation with add_label
Example: relation extraction component (implemented as project template)

View File

@@ -381,8 +381,6 @@ and loading pipeline packages, the underlying functionality is entirely based on
native Python packaging. This allows your application to handle a spaCy pipeline
like any other package dependency.
-<!-- TODO: reference relevant spaCy project -->
### Downloading and requiring package dependencies {#models-download}
spaCy's built-in [`download`](/api/cli#download) command is mostly intended as a

View File

@@ -29,15 +29,13 @@ and share your results with your team. spaCy projects can be used via the new
![Illustration of project workflow and commands](../images/projects.svg)
-<!-- TODO:
-<Project id="some_example_project">
-Lorem ipsum dolor sit amet, consectetur adipiscing elit. Phasellus interdum
-sodales lectus, ut sodales orci ullamcorper id. Sed condimentum neque ut erat
-mattis pretium.
+<Project id="pipelines/tagger_parser_ud">
+The easiest way to get started is to clone a project template and run it – for
+example, this end-to-end template that lets you train a **part-of-speech
+tagger** and **dependency parser** on a Universal Dependencies treebank.
</Project>
--->
spaCy projects make it easy to integrate with many other **awesome tools** in
the data science and machine learning ecosystem to track and manage your data
@@ -65,10 +63,8 @@ project template and copies the files to a local directory. You can then run the
project, e.g. to train a pipeline and edit the commands and scripts to build
fully custom workflows.
-<!-- TODO: update with real example project -->
```cli
-python -m spacy project clone some_example_project
+python -m spacy project clone pipelines/tagger_parser_ud
```
By default, the project will be cloned into the current working directory. You
@@ -216,10 +212,8 @@ format, train a pipeline, evaluate it and export metrics, package it and spin up
a quick web demo. It looks pretty similar to a config file used to define CI
pipelines.
-<!-- TODO: update with better (final) example -->
```yaml
-https://github.com/explosion/projects/tree/v3/tutorials/ner_fashion_brands/project.yml
+https://github.com/explosion/projects/tree/v3/pipelines/tagger_parser_ud/project.yml
```
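As a rough sketch of the shape such a `project.yml` can take — the keys follow the spaCy projects schema as I understand it, and the values here are hypothetical, not copied from the linked file:

```yaml
# Hypothetical minimal project.yml; see the linked template for the real file.
title: "Part-of-speech tagger & dependency parser (sketch)"
workflows:
  all:
    - train
    - evaluate
commands:
  - name: train
    help: "Train the pipeline"
    script:
      - "python -m spacy train configs/config.cfg --output training/"
  - name: evaluate
    help: "Evaluate on the dev set"
    script:
      - "python -m spacy evaluate training/model-best corpus/dev.spacy"
```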
| Section | Description |

View File

@@ -574,7 +574,7 @@ The directory will be created if it doesn't exist, and the whole pipeline data,
meta and configuration will be written out. To make the pipeline more convenient
to deploy, we recommend wrapping it as a [Python package](/api/cli#package).
-<Accordion title="What's the difference between the config.cfg and meta.json?" spaced id="models-meta-vs-config">
+<Accordion title="What's the difference between the config.cfg and meta.json?" id="models-meta-vs-config" spaced>
When you save a pipeline in spaCy v3.0+, two files will be exported: a
[`config.cfg`](/api/data-formats#config) based on
@@ -596,6 +596,15 @@ based on [`nlp.meta`](/api/language#meta).
</Accordion>
+<Project id="pipelines/tagger_parser_ud">
+The easiest way to get started with an end-to-end workflow is to clone a
+[project template](/usage/projects) and run it – for example, this template that
+lets you train a **part-of-speech tagger** and **dependency parser** on a
+Universal Dependencies treebank and generates an installable Python package.
+</Project>
### Generating a pipeline package {#models-generating}
<Infobox title="Important note" variant="warning">
@@ -699,5 +708,3 @@ class and call [`from_disk`](/api/language#from_disk) instead.
```python
nlp = spacy.blank("en").from_disk("/path/to/data")
```
-<!-- TODO: point to spaCy projects? -->

View File

@@ -92,7 +92,7 @@ spaCy's binary `.spacy` format. You can either include the data paths in the
$ python -m spacy train config.cfg --output ./output --paths.train ./train.spacy --paths.dev ./dev.spacy
```
-<Accordion title="How are the config recommendations generated?" id="quickstart-source">
+<Accordion title="How are the config recommendations generated?" id="quickstart-source" spaced>
The recommended config settings generated by the quickstart widget and the
[`init config`](/api/cli#init-config) command are based on some general **best
@@ -112,6 +112,15 @@ as we run more experiments.
</Accordion>
+<Project id="pipelines/tagger_parser_ud">
+The easiest way to get started is to clone a [project template](/usage/projects)
+and run it – for example, this end-to-end template that lets you train a
+**part-of-speech tagger** and **dependency parser** on a Universal Dependencies
+treebank.
+</Project>
## Training config {#config}
Training config files include all **settings and hyperparameters** for training

View File

@@ -176,18 +176,16 @@ freely combine implementations from different frameworks into a single model.
### Manage end-to-end workflows with projects {#features-projects}
-<!-- TODO: update example -->
> #### Example
>
> ```cli
> # Clone a project template
-> $ python -m spacy project clone example
-> $ cd example
+> $ python -m spacy project clone pipelines/tagger_parser_ud
+> $ cd tagger_parser_ud
> # Download data assets
> $ python -m spacy project assets
> # Run a workflow
-> $ python -m spacy project run train
+> $ python -m spacy project run all
> ```
spaCy projects let you manage and share **end-to-end spaCy workflows** for
@@ -207,14 +205,6 @@ data, [Streamlit](/usage/projects#streamlit) for building interactive apps,
[Ray](/usage/projects#ray) for parallel training,
[Weights & Biases](/usage/projects#wandb) for experiment tracking, and more!
-<!-- <Project id="some_example_project">
-The easiest way to get started with an end-to-end training process is to clone a
-[project](/usage/projects) template. Projects let you manage multi-step
-workflows, from data preprocessing to training and packaging your pipeline.
-</Project>-->
<Infobox title="Details & Documentation" emoji="📖" list>
- **Usage:** [spaCy projects](/usage/projects),
@@ -224,6 +214,15 @@ workflows, from data preprocessing to training and packaging your pipeline.
</Infobox>
+<Project id="pipelines/tagger_parser_ud">
+The easiest way to get started is to clone a [project template](/usage/projects)
+and run it – for example, this end-to-end template that lets you train a
+**part-of-speech tagger** and **dependency parser** on a Universal Dependencies
+treebank.
+</Project>
### Parallel and distributed training with Ray {#features-parallel-training}
> #### Example
@@ -875,7 +874,14 @@ values. You can then use the auto-generated `config.cfg` for training:
+ python -m spacy train ./config.cfg --output ./output
```
-<!-- TODO: project template -->
+<Project id="pipelines/tagger_parser_ud">
+The easiest way to get started is to clone a [project template](/usage/projects)
+and run it – for example, this end-to-end template that lets you train a
+**part-of-speech tagger** and **dependency parser** on a Universal Dependencies
+treebank.
+</Project>
#### Training via the Python API {#migrating-training-python}

View File

@@ -12,6 +12,7 @@
    "companyUrl": "https://explosion.ai",
    "repo": "explosion/spaCy",
    "modelsRepo": "explosion/spacy-models",
+   "projectsRepo": "explosion/projects/tree/v3",
    "social": {
        "twitter": "spacy_io",
        "github": "explosion"

View File

@@ -13,7 +13,7 @@ export default function Tag({ spaced = false, variant, tooltip, children }) {
    const isValid = isString(children) && !isNaN(children)
    const version = isValid ? Number(children).toFixed(1) : children
    const tooltipText = `This feature is new and was introduced in spaCy v${version}`
-   // TODO: we probably want to handle this more elegantly, but the idea is
+   // We probably want to handle this more elegantly, but the idea is
    // that we can hide tags referring to old versions
    const major = isString(version) ? Number(version.split('.')[0]) : version
    return major < MIN_VERSION ? null : (
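The version handling in this hunk can be exercised outside React. A small sketch of the same normalization and hiding rules in plain JavaScript (the helper names are made up for illustration):

```javascript
// Mirrors the Tag component's normalization: numeric strings like "3"
// are formatted as major.minor ("3.0"), non-numeric values pass through.
function formatVersion(children) {
  const isValid = typeof children === 'string' && !isNaN(children)
  return isValid ? Number(children).toFixed(1) : children
}

// Mirrors the hiding rule: tags whose major version is below the
// minimum supported version are not rendered.
function isHidden(version, minVersion) {
  const major = typeof version === 'string' ? Number(version.split('.')[0]) : version
  return major < minVersion
}

console.log(formatVersion('3'))              // "3.0"
console.log(isHidden(formatVersion('2'), 3)) // true
console.log(isHidden(formatVersion('3'), 3)) // false
```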

View File

@@ -10,6 +10,7 @@ const htmlToReactParser = new HtmlToReactParser()
const DEFAULT_BRANCH = 'develop'
export const repo = siteMetadata.repo
export const modelsRepo = siteMetadata.modelsRepo
+export const projectsRepo = siteMetadata.projectsRepo
/**
 * This is used to provide selectors for headings so they can be crawled by

View File

@@ -222,10 +222,11 @@ const Landing = ({ data }) => {
    <br />
    <br />
    <br />
-   {/** TODO: update with actual example */}
-   <Project id="some_example">
-       Lorem ipsum dolor sit amet, consectetur adipiscing elit. Phasellus interdum
-       sodales lectus.
+   <Project id="pipelines/tagger_parser_ud" title="Get started">
+       The easiest way to get started is to clone a project template and run it
+       – for example, this template for training a{' '}
+       <strong>part-of-speech tagger</strong> and{' '}
+       <strong>dependency parser</strong> on a Universal Dependencies treebank.
    </Project>
</LandingCol>
<LandingCol>

View File

@@ -4,25 +4,29 @@ import CopyInput from '../components/copy'
import Infobox from '../components/infobox'
import Link from '../components/link'
import { InlineCode } from '../components/code'
+import { projectsRepo } from '../components/util'
-// TODO: move to meta?
-const DEFAULT_REPO = 'https://github.com/explosion/projects/tree/v3'
const COMMAND = 'python -m spacy project clone'
-export default function Project({ id, repo, children }) {
+export default function Project({
+   title = 'Get started with a project template',
+   id,
+   repo,
+   children,
+}) {
    const repoArg = repo ? ` --repo ${repo}` : ''
    const text = `${COMMAND} ${id}${repoArg}`
-   const url = `${repo || DEFAULT_REPO}/${id}`
-   const title = (
+   const url = `${repo || projectsRepo}/${id}`
+   const header = (
        <>
-           Get started with a project template:{' '}
+           {title}:{' '}
            <Link to={url}>
                <InlineCode>{id}</InlineCode>
            </Link>
        </>
    )
    return (
-       <Infobox title={title} emoji="🪐">
+       <Infobox title={header} emoji="🪐">
            {children}
            <CopyInput text={text} prefix="$" />
        </Infobox>