diff --git a/website/docs/api/cli.md b/website/docs/api/cli.md
index 7374e1e3f..53cd954be 100644
--- a/website/docs/api/cli.md
+++ b/website/docs/api/cli.md
@@ -895,8 +895,6 @@ what you need. By default, spaCy's
 can provide any other repo (public or private) that you have access to using the
 `--repo` option.
-
-
 ```cli
 $ python -m spacy project clone [name] [dest] [--repo] [--branch] [--sparse]
 ```
@@ -904,7 +902,7 @@ $ python -m spacy project clone [name] [dest] [--repo] [--branch] [--sparse]
 > #### Example
 >
 > ```cli
-> $ python -m spacy project clone some_example
+> $ python -m spacy project clone pipelines/ner_wikiner
 > ```
 >
 > Clone from custom repo:
diff --git a/website/docs/usage/embeddings-transformers.md b/website/docs/usage/embeddings-transformers.md
index c6c703842..a855d703c 100644
--- a/website/docs/usage/embeddings-transformers.md
+++ b/website/docs/usage/embeddings-transformers.md
@@ -289,8 +289,7 @@ of objects by referring to creation functions, including functions you register
 yourself. For details on how to get started with training your own model, check
 out the [training quickstart](/usage/training#quickstart).
-
+
 > #### Evaluation details
 >
@@ -68,6 +68,6 @@ our project template.
-
+-->
diff --git a/website/docs/usage/layers-architectures.md b/website/docs/usage/layers-architectures.md
index f9787d815..a58ba2ba9 100644
--- a/website/docs/usage/layers-architectures.md
+++ b/website/docs/usage/layers-architectures.md
@@ -356,11 +356,11 @@ that training configs are complete and experiments fully reproducible.
-Note that when using a PyTorch or Tensorflow model, it is recommended to set the GPU
-memory allocator accordingly. When `gpu_allocator` is set to "pytorch" or
-"tensorflow" in the training config, cupy will allocate memory via those respective libraries,
-preventing OOM errors when there's available memory sitting in the other
-library's pool.
+Note that when using a PyTorch or Tensorflow model, it is recommended to set the
+GPU memory allocator accordingly. When `gpu_allocator` is set to "pytorch" or
+"tensorflow" in the training config, cupy will allocate memory via those
+respective libraries, preventing OOM errors when there's available memory
+sitting in the other library's pool.
 ```ini
 ### config.cfg (excerpt)
@@ -489,7 +489,7 @@ with Model.define_operators({">>": chain}):
-
-
 ### Downloading and requiring package dependencies {#models-download}
 spaCy's built-in [`download`](/api/cli#download) command is mostly intended as a
diff --git a/website/docs/usage/projects.md b/website/docs/usage/projects.md
index 08bfb9da2..f8d5a3761 100644
--- a/website/docs/usage/projects.md
+++ b/website/docs/usage/projects.md
@@ -29,15 +29,13 @@ and share your results with your team. spaCy projects can be used via the new
 ![Illustration of project workflow and commands](../images/projects.svg)
-
 spaCy projects make it easy to integrate with many other **awesome tools** in
 the data science and machine learning ecosystem to track and manage your data
@@ -65,10 +63,8 @@ project template and copies the files to a local directory. You can then run the
 project, e.g. to train a pipeline and edit the commands and scripts to build
 fully custom workflows.
-
-
 ```cli
-python -m spacy project clone some_example_project
+python -m spacy project clone pipelines/tagger_parser_ud
 ```
 By default, the project will be cloned into the current working directory. You
@@ -216,10 +212,8 @@ format, train a pipeline, evaluate it and export metrics, package it and spin up
 a quick web demo. It looks pretty similar to a config file used to define CI
 pipelines.
-
-
 ```yaml
-https://github.com/explosion/projects/tree/v3/tutorials/ner_fashion_brands/project.yml
+https://github.com/explosion/projects/tree/v3/pipelines/tagger_parser_ud/project.yml
 ```
 | Section | Description |
diff --git a/website/docs/usage/saving-loading.md b/website/docs/usage/saving-loading.md
index c0fe1323c..3a95bf6aa 100644
--- a/website/docs/usage/saving-loading.md
+++ b/website/docs/usage/saving-loading.md
@@ -574,7 +574,7 @@ The directory will be created if it doesn't exist, and the whole pipeline data,
 meta and configuration will be written out. To make the pipeline more convenient
 to deploy, we recommend wrapping it as a [Python package](/api/cli#package).
-
+
 When you save a pipeline in spaCy v3.0+, two files will be exported: a
 [`config.cfg`](/api/data-formats#config) based on
@@ -596,6 +596,15 @@ based on [`nlp.meta`](/api/language#meta).
+
+
+The easiest way to get started with an end-to-end workflow is to clone a
+[project template](/usage/projects) and run it – for example, this template that
+lets you train a **part-of-speech tagger** and **dependency parser** on a
+Universal Dependencies treebank and generates an installable Python package.
+
+
+
 ### Generating a pipeline package {#models-generating}
@@ -699,5 +708,3 @@ class and call [`from_disk`](/api/language#from_disk) instead.
 ```python
 nlp = spacy.blank("en").from_disk("/path/to/data")
 ```
-
-
diff --git a/website/docs/usage/training.md b/website/docs/usage/training.md
index c0f4caad7..6e9de62c5 100644
--- a/website/docs/usage/training.md
+++ b/website/docs/usage/training.md
@@ -92,7 +92,7 @@ spaCy's binary `.spacy` format. You can either include the data paths in the
 $ python -m spacy train config.cfg --output ./output --paths.train ./train.spacy --paths.dev ./dev.spacy
 ```
-
+
 The recommended config settings generated by the quickstart widget and the
 [`init config`](/api/cli#init-config) command are based on some general **best
@@ -112,6 +112,15 @@ as we run more experiments.
+
+
+The easiest way to get started is to clone a [project template](/usage/projects)
+and run it – for example, this end-to-end template that lets you train a
+**part-of-speech tagger** and **dependency parser** on a Universal Dependencies
+treebank.
+
+
+
 ## Training config {#config}
 Training config files include all **settings and hyperparameters** for training
diff --git a/website/docs/usage/v3.md b/website/docs/usage/v3.md
index 24babc9bd..5abeb5707 100644
--- a/website/docs/usage/v3.md
+++ b/website/docs/usage/v3.md
@@ -176,18 +176,16 @@ freely combine implementations from different frameworks into a single model.
 ### Manage end-to-end workflows with projects {#features-projects}
-
-
 > #### Example
 >
 > ```cli
 > # Clone a project template
-> $ python -m spacy project clone example
-> $ cd example
+> $ python -m spacy project clone pipelines/tagger_parser_ud
+> $ cd tagger_parser_ud
 > # Download data assets
 > $ python -m spacy project assets
 > # Run a workflow
-> $ python -m spacy project run train
+> $ python -m spacy project run all
 > ```
 spaCy projects let you manage and share **end-to-end spaCy workflows** for
@@ -207,14 +205,6 @@ data, [Streamlit](/usage/projects#streamlit) for building interactive apps,
 [Ray](/usage/projects#ray) for parallel training,
 [Weights & Biases](/usage/projects#wandb) for experiment tracking, and more!
-
-
-
 **Usage:** [spaCy projects](/usage/projects),
@@ -224,6 +214,15 @@ workflows, from data preprocessing to training and packaging your pipeline.
+
+
+The easiest way to get started is to clone a [project template](/usage/projects)
+and run it – for example, this end-to-end template that lets you train a
+**part-of-speech tagger** and **dependency parser** on a Universal Dependencies
+treebank.
+
+
+
 ### Parallel and distributed training with Ray {#features-parallel-training}
 > #### Example
@@ -875,7 +874,14 @@ values. You can then use the auto-generated `config.cfg` for training:
 + python -m spacy train ./config.cfg --output ./output
 ```
-
+
+The easiest way to get started is to clone a [project template](/usage/projects)
+and run it – for example, this end-to-end template that lets you train a
+**part-of-speech tagger** and **dependency parser** on a Universal Dependencies
+treebank.
+
+
 #### Training via the Python API {#migrating-training-python}
diff --git a/website/meta/site.json b/website/meta/site.json
index 1955932b9..1a96ca660 100644
--- a/website/meta/site.json
+++ b/website/meta/site.json
@@ -12,6 +12,7 @@
     "companyUrl": "https://explosion.ai",
     "repo": "explosion/spaCy",
     "modelsRepo": "explosion/spacy-models",
+    "projectsRepo": "explosion/projects/tree/v3",
     "social": {
         "twitter": "spacy_io",
         "github": "explosion"
diff --git a/website/src/components/tag.js b/website/src/components/tag.js
index 3f2b4e994..b406e771e 100644
--- a/website/src/components/tag.js
+++ b/website/src/components/tag.js
@@ -13,7 +13,7 @@ export default function Tag({ spaced = false, variant, tooltip, children }) {
     const isValid = isString(children) && !isNaN(children)
     const version = isValid ? Number(children).toFixed(1) : children
     const tooltipText = `This feature is new and was introduced in spaCy v${version}`
-    // TODO: we probably want to handle this more elegantly, but the idea is
+    // We probably want to handle this more elegantly, but the idea is
     // that we can hide tags referring to old versions
     const major = isString(version) ? Number(version.split('.')[0]) : version
     return major < MIN_VERSION ? null : (
diff --git a/website/src/components/util.js b/website/src/components/util.js
index 3d86cf37e..be55f0bb3 100644
--- a/website/src/components/util.js
+++ b/website/src/components/util.js
@@ -10,6 +10,7 @@ const htmlToReactParser = new HtmlToReactParser()
 const DEFAULT_BRANCH = 'develop'
 export const repo = siteMetadata.repo
 export const modelsRepo = siteMetadata.modelsRepo
+export const projectsRepo = siteMetadata.projectsRepo
 /**
  * This is used to provide selectors for headings so they can be crawled by
diff --git a/website/src/widgets/landing.js b/website/src/widgets/landing.js
index 41b009010..2e75c893a 100644
--- a/website/src/widgets/landing.js
+++ b/website/src/widgets/landing.js
@@ -222,10 +222,11 @@ const Landing = ({ data }) => {


- {/** TODO: update with actual example */}
-
- Lorem ipsum dolor sit amet, consectetur adipiscing elit. Phasellus interdum
- sodales lectus.
+
+ The easiest way to get started is to clone a project template and run it
+ – for example, this template for training a{' '}
+ part-of-speech tagger and{' '}
+ dependency parser on a Universal Dependencies treebank.
diff --git a/website/src/widgets/project.js b/website/src/widgets/project.js
index 0bd74bc90..8d309394d 100644
--- a/website/src/widgets/project.js
+++ b/website/src/widgets/project.js
@@ -4,25 +4,29 @@ import CopyInput from '../components/copy'
 import Infobox from '../components/infobox'
 import Link from '../components/link'
 import { InlineCode } from '../components/code'
+import { projectsRepo } from '../components/util'
-// TODO: move to meta?
-const DEFAULT_REPO = 'https://github.com/explosion/projects/tree/v3'
 const COMMAND = 'python -m spacy project clone'
-export default function Project({ id, repo, children }) {
+export default function Project({
+    title = 'Get started with a project template',
+    id,
+    repo,
+    children,
+}) {
     const repoArg = repo ? ` --repo ${repo}` : ''
     const text = `${COMMAND} ${id}${repoArg}`
-    const url = `${repo || DEFAULT_REPO}/${id}`
-    const title = (
+    const url = `${repo || projectsRepo}/${id}`
+    const header = (
         <>
-            Get started with a project template:{' '}
+            {title}:{' '}
             {id}
     )
     return (
-
+
             {children}