Update docs [ci skip]

2025-12-22 09:34:23 +03:00 · 2020-09-20 17:44:58 +02:00 · 012b3a7096
commit 012b3a7096
parent b2302c0a1c
14 changed files with 77 additions and 59 deletions
--- a/website/docs/api/cli.md
+++ b/website/docs/api/cli.md
@ -895,8 +895,6 @@ what you need. By default, spaCy's
 can provide any other repo (public or private) that you have access to using the
 `--repo` option.

-<!-- TODO: update example once we've decided on repo structure -->
-
 ```cli
 $ python -m spacy project clone [name] [dest] [--repo] [--branch] [--sparse]
 ```
@ -904,7 +902,7 @@ $ python -m spacy project clone [name] [dest] [--repo] [--branch] [--sparse]
 > #### Example
 >
 > ```cli
-> $ python -m spacy project clone some_example
+> $ python -m spacy project clone pipelines/ner_wikiner
 > ```
 >
 > Clone from custom repo:
--- a/website/docs/usage/embeddings-transformers.md
+++ b/website/docs/usage/embeddings-transformers.md
@ -289,8 +289,7 @@ of objects by referring to creation functions, including functions you register
 yourself. For details on how to get started with training your own model, check
 out the [training quickstart](/usage/training#quickstart).

-<!-- TODO:
-<Project id="en_core_trf_lg">
+<!-- TODO: <Project id="en_core_trf_lg">

 The easiest way to get started is to clone a transformers-based project
 template. Swap in your data, edit the settings and hyperparameters and train,
@ -623,7 +622,7 @@ that are familiar from the training block: the `[pretraining.batcher]`,
 `[pretraining.optimizer]` and `[pretraining.corpus]` all work the same way and
 expect the same types of objects, although for pretraining your corpus does not
 need to have any annotations, so you will often use a different reader, such as
-the [`JsonlReader`](/api/toplevel#jsonlreader).
+the [`JsonlReader`](/api/top-level#jsonlreader).

 > #### Raw text format
 >
--- a/website/docs/usage/facts-figures.md
+++ b/website/docs/usage/facts-figures.md
@ -45,7 +45,7 @@ spaCy v3.0 introduces transformer-based pipelines that bring spaCy's accuracy
 right up to **current state-of-the-art**. You can also use a CPU-optimized
 pipeline, which is less accurate but much cheaper to run.

-<!-- TODO: -->
+<!-- TODO: update benchmarks and intro -->

 > #### Evaluation details
 >
@ -68,6 +68,6 @@ our project template.

 </Project>

-<!-- ## Citing spaCy {#citation}
+<!-- TODO: ## Citing spaCy {#citation}

-<!-- TODO: update -->
+-->
--- a/website/docs/usage/layers-architectures.md
+++ b/website/docs/usage/layers-architectures.md
@ -356,11 +356,11 @@ that training configs are complete and experiments fully reproducible.

 </Infobox>

-Note that when using a PyTorch or Tensorflow model, it is recommended to set the GPU
-memory allocator accordingly. When `gpu_allocator` is set to "pytorch" or
-"tensorflow" in the training config, cupy will allocate memory via those respective libraries,
-preventing OOM errors when there's available memory sitting in the other
-library's pool.
+Note that when using a PyTorch or TensorFlow model, it is recommended to set the
+GPU memory allocator accordingly. When `gpu_allocator` is set to "pytorch" or
+"tensorflow" in the training config, CuPy will allocate memory via those
+respective libraries, preventing OOM errors when there's available memory
+sitting in the other library's pool.

 ```ini
 ### config.cfg (excerpt)
@ -489,7 +489,7 @@ with Model.define_operators({">>": chain}):
 <Infobox title="This section is still under construction" emoji="🚧" variant="warning">
 </Infobox>

-<!-- TODO:
+<!-- TODO: write trainable component section
 - Interaction with `predict`, `get_loss` and `set_annotations`
 - Initialization life-cycle with `begin_training`, correlation with add_label
 Example: relation extraction component (implemented as project template)
--- a/website/docs/usage/models.md
+++ b/website/docs/usage/models.md
@ -381,8 +381,6 @@ and loading pipeline packages, the underlying functionality is entirely based on
 native Python packaging. This allows your application to handle a spaCy pipeline
 like any other package dependency.

-<!-- TODO: reference relevant spaCy project -->
-
 ### Downloading and requiring package dependencies {#models-download}

 spaCy's built-in [`download`](/api/cli#download) command is mostly intended as a
--- a/website/docs/usage/projects.md
+++ b/website/docs/usage/projects.md
@ -29,15 +29,13 @@ and share your results with your team. spaCy projects can be used via the new

 ![Illustration of project workflow and commands](../images/projects.svg)

-<!-- TODO:
-<Project id="some_example_project">
+<Project id="pipelines/tagger_parser_ud">

-Lorem ipsum dolor sit amet, consectetur adipiscing elit. Phasellus interdum
-sodales lectus, ut sodales orci ullamcorper id. Sed condimentum neque ut erat
-mattis pretium.
+The easiest way to get started is to clone a project template and run it – for
+example, this end-to-end template that lets you train a **part-of-speech
+tagger** and **dependency parser** on a Universal Dependencies treebank.

 </Project>
-->

 spaCy projects make it easy to integrate with many other **awesome tools** in
 the data science and machine learning ecosystem to track and manage your data
@ -65,10 +63,8 @@ project template and copies the files to a local directory. You can then run the
 project, e.g. to train a pipeline and edit the commands and scripts to build
 fully custom workflows.

-<!-- TODO: update with real example project -->
-
 ```cli
-python -m spacy project clone some_example_project
+python -m spacy project clone pipelines/tagger_parser_ud
 ```

 By default, the project will be cloned into the current working directory. You
@ -216,10 +212,8 @@ format, train a pipeline, evaluate it and export metrics, package it and spin up
 a quick web demo. It looks pretty similar to a config file used to define CI
 pipelines.

-<!-- TODO: update with better (final) example -->
-
 ```yaml
-https://github.com/explosion/projects/tree/v3/tutorials/ner_fashion_brands/project.yml
+https://github.com/explosion/projects/tree/v3/pipelines/tagger_parser_ud/project.yml
 ```

 | Section       | Description                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                  |
--- a/website/docs/usage/saving-loading.md
+++ b/website/docs/usage/saving-loading.md
@ -574,7 +574,7 @@ The directory will be created if it doesn't exist, and the whole pipeline data,
 meta and configuration will be written out. To make the pipeline more convenient
 to deploy, we recommend wrapping it as a [Python package](/api/cli#package).

-<Accordion title="What’s the difference between the config.cfg and meta.json?" spaced id="models-meta-vs-config">
+<Accordion title="What’s the difference between the config.cfg and meta.json?" id="models-meta-vs-config" spaced>

 When you save a pipeline in spaCy v3.0+, two files will be exported: a
 [`config.cfg`](/api/data-formats#config) based on
@ -596,6 +596,15 @@ based on [`nlp.meta`](/api/language#meta).

 </Accordion>

+<Project id="pipelines/tagger_parser_ud">
+
+The easiest way to get started with an end-to-end workflow is to clone a
+[project template](/usage/projects) and run it – for example, this template that
+lets you train a **part-of-speech tagger** and **dependency parser** on a
+Universal Dependencies treebank and generates an installable Python package.
+
+</Project>
+
 ### Generating a pipeline package {#models-generating}

 <Infobox title="Important note" variant="warning">
@ -699,5 +708,3 @@ class and call [`from_disk`](/api/language#from_disk) instead.
 ```python
 nlp = spacy.blank("en").from_disk("/path/to/data")
 ```
-
-<!-- TODO: point to spaCy projects? -->
--- a/website/docs/usage/training.md
+++ b/website/docs/usage/training.md
@ -92,7 +92,7 @@ spaCy's binary `.spacy` format. You can either include the data paths in the
 $ python -m spacy train config.cfg --output ./output --paths.train ./train.spacy --paths.dev ./dev.spacy
 ```

-<Accordion title="How are the config recommendations generated?" id="quickstart-source">
+<Accordion title="How are the config recommendations generated?" id="quickstart-source" spaced>

 The recommended config settings generated by the quickstart widget and the
 [`init config`](/api/cli#init-config) command are based on some general **best
@ -112,6 +112,15 @@ as we run more experiments.

 </Accordion>

+<Project id="pipelines/tagger_parser_ud">
+
+The easiest way to get started is to clone a [project template](/usage/projects)
+and run it – for example, this end-to-end template that lets you train a
+**part-of-speech tagger** and **dependency parser** on a Universal Dependencies
+treebank.
+
+</Project>
+
 ## Training config {#config}

 Training config files include all **settings and hyperparameters** for training
--- a/website/docs/usage/v3.md
+++ b/website/docs/usage/v3.md
@ -176,18 +176,16 @@ freely combine implementations from different frameworks into a single model.

 ### Manage end-to-end workflows with projects {#features-projects}

-<!-- TODO: update example -->
-
 > #### Example
 >
 > ```cli
 > # Clone a project template
-> $ python -m spacy project clone example
-> $ cd example
+> $ python -m spacy project clone pipelines/tagger_parser_ud
+> $ cd tagger_parser_ud
 > # Download data assets
 > $ python -m spacy project assets
 > # Run a workflow
-> $ python -m spacy project run train
+> $ python -m spacy project run all
 > ```

 spaCy projects let you manage and share **end-to-end spaCy workflows** for
@ -207,14 +205,6 @@ data, [Streamlit](/usage/projects#streamlit) for building interactive apps,
 [Ray](/usage/projects#ray) for parallel training,
 [Weights & Biases](/usage/projects#wandb) for experiment tracking, and more!

-<!-- <Project id="some_example_project">
-
-The easiest way to get started with an end-to-end training process is to clone a
-[project](/usage/projects) template. Projects let you manage multi-step
-workflows, from data preprocessing to training and packaging your pipeline.
-
-</Project>-->
-
 <Infobox title="Details & Documentation" emoji="📖" list>

 - **Usage:** [spaCy projects](/usage/projects),
@ -224,6 +214,15 @@ workflows, from data preprocessing to training and packaging your pipeline.

 </Infobox>

+<Project id="pipelines/tagger_parser_ud">
+
+The easiest way to get started is to clone a [project template](/usage/projects)
+and run it – for example, this end-to-end template that lets you train a
+**part-of-speech tagger** and **dependency parser** on a Universal Dependencies
+treebank.
+
+</Project>
+
 ### Parallel and distributed training with Ray {#features-parallel-training}

 > #### Example
@ -875,7 +874,14 @@ values. You can then use the auto-generated `config.cfg` for training:
 + python -m spacy train ./config.cfg --output ./output
 ```

-<!-- TODO: project template -->
+<Project id="pipelines/tagger_parser_ud">
+
+The easiest way to get started is to clone a [project template](/usage/projects)
+and run it – for example, this end-to-end template that lets you train a
+**part-of-speech tagger** and **dependency parser** on a Universal Dependencies
+treebank.
+
+</Project>

 #### Training via the Python API {#migrating-training-python}

--- a/website/meta/site.json
+++ b/website/meta/site.json
@ -12,6 +12,7 @@
    "companyUrl": "https://explosion.ai",
    "repo": "explosion/spaCy",
    "modelsRepo": "explosion/spacy-models",
+    "projectsRepo": "explosion/projects/tree/v3",
    "social": {
        "twitter": "spacy_io",
        "github": "explosion"
--- a/website/src/components/tag.js
+++ b/website/src/components/tag.js
@ -13,7 +13,7 @@ export default function Tag({ spaced = false, variant, tooltip, children }) {
        const isValid = isString(children) && !isNaN(children)
        const version = isValid ? Number(children).toFixed(1) : children
        const tooltipText = `This feature is new and was introduced in spaCy v${version}`
-        // TODO: we probably want to handle this more elegantly, but the idea is
+        // We probably want to handle this more elegantly, but the idea is
        // that we can hide tags referring to old versions
        const major = isString(version) ? Number(version.split('.')[0]) : version
        return major < MIN_VERSION ? null : (
--- a/website/src/components/util.js
+++ b/website/src/components/util.js
@ -10,6 +10,7 @@ const htmlToReactParser = new HtmlToReactParser()
 const DEFAULT_BRANCH = 'develop'
 export const repo = siteMetadata.repo
 export const modelsRepo = siteMetadata.modelsRepo
+export const projectsRepo = siteMetadata.projectsRepo

 /**
 * This is used to provide selectors for headings so they can be crawled by
--- a/website/src/widgets/landing.js
+++ b/website/src/widgets/landing.js
@ -222,10 +222,11 @@ const Landing = ({ data }) => {
                    <br />
                    <br />
                    <br />
-                    {/** TODO: update with actual example */}
-                    <Project id="some_example">
-                        Lorem ipsum dolor sit amet, consectetur adipiscing elit. Phasellus interdum
-                        sodales lectus.
+                    <Project id="pipelines/tagger_parser_ud" title="Get started">
+                        The easiest way to get started is to clone a project template and run it
+                        – for example, this template for training a{' '}
+                        <strong>part-of-speech tagger</strong> and{' '}
+                        <strong>dependency parser</strong> on a Universal Dependencies treebank.
                    </Project>
                </LandingCol>
                <LandingCol>
--- a/website/src/widgets/project.js
+++ b/website/src/widgets/project.js
@ -4,25 +4,29 @@ import CopyInput from '../components/copy'
 import Infobox from '../components/infobox'
 import Link from '../components/link'
 import { InlineCode } from '../components/code'
+import { projectsRepo } from '../components/util'

-// TODO: move to meta?
-const DEFAULT_REPO = 'https://github.com/explosion/projects/tree/v3'
 const COMMAND = 'python -m spacy project clone'

-export default function Project({ id, repo, children }) {
+export default function Project({
+    title = 'Get started with a project template',
+    id,
+    repo,
+    children,
+}) {
    const repoArg = repo ? ` --repo ${repo}` : ''
    const text = `${COMMAND} ${id}${repoArg}`
-    const url = `${repo || DEFAULT_REPO}/${id}`
-    const title = (
+    const url = `${repo || `https://github.com/${projectsRepo}`}/${id}`
+    const header = (
        <>
-            Get started with a project template:{' '}
+            {title}:{' '}
            <Link to={url}>
                <InlineCode>{id}</InlineCode>
            </Link>
        </>
    )
    return (
-        <Infobox title={title} emoji="🪐">
+        <Infobox title={header} emoji="🪐">
            {children}
            <CopyInput text={text} prefix="$" />
        </Infobox>
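
Note on the `Project` widget change: the removed `DEFAULT_REPO` constant was a full URL, while the new `projectsRepo` value in `site.json` is only the path `explosion/projects/tree/v3`. The sketch below is a minimal, stand-alone version of the widget's string logic without React, to show what the clone command and link should look like. `buildProjectInfo` is a hypothetical helper name, and prepending `https://github.com/` is an assumption based on the removed constant, not something this commit states.

```javascript
// Hypothetical stand-alone version of the string logic in the Project widget.
// Assumption: projectsRepo holds "explosion/projects/tree/v3" (no scheme or
// host), as added to site.json above, so the GitHub prefix is added here.
const COMMAND = 'python -m spacy project clone'
const projectsRepo = 'explosion/projects/tree/v3'

function buildProjectInfo({ id, repo }) {
    // Only pass --repo to the CLI when a custom repo is given
    const repoArg = repo ? ` --repo ${repo}` : ''
    const text = `${COMMAND} ${id}${repoArg}`
    // Fall back to the default projects repo as an absolute GitHub URL
    const defaultRepo = `https://github.com/${projectsRepo}`
    const url = `${repo || defaultRepo}/${id}`
    return { text, url }
}

const info = buildProjectInfo({ id: 'pipelines/tagger_parser_ud' })
console.log(info.text)
console.log(info.url)
```

With the default repo, the link resolves to the same location used for the `project.yml` example in `projects.md`. When `repo` is passed, it is used as-is for the link and appended to the command via `--repo`.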