Document new features

2025-11-04 09:57:26 +03:00 · 2020-07-09 21:10:36 +02:00 · 2020-07-09 21:10:36 +02:00 · 7bcf9f7cfb
commit 7bcf9f7cfb
parent 797ca6f3dd
1 changed files with 45 additions and 21 deletions
--- a/website/docs/usage/projects.md
+++ b/website/docs/usage/projects.md
@ -186,12 +186,13 @@ pipelines.
 https://github.com/explosion/spacy-boilerplates/blob/master/ner_fashion/project.yml
 ```
-| Section     | Description                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                  |
+| Section       | Description                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                  |
-| ----------- | ------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------ |
+| ------------- | ------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------ |
-| `variables` | A dictionary of variables that can be referenced in paths, URLs and scripts. For example, `{NAME}` will use the value of the variable `NAME`.                                                                                                                                                                                                                                                                                                                                                                |
+| `variables`   | A dictionary of variables that can be referenced in paths, URLs and scripts. For example, `{NAME}` will use the value of the variable `NAME`.                                                                                                                                                                                                                                                                                                                                                                |
-| `assets`    | A list of assets that can be fetched with the [`project assets`](/api/cli#project-assets) command. `url` defines a URL or local path, `dest` is the destination file relative to the project directory, and an optional `checksum` ensures that an error is raised if the file's checksum doesn't match.                                                                                                                                                                                                     |
+| `directories` | An optional list of [directories](#project-files) that should be created in the project for assets, training outputs, metrics etc. spaCy will make sure that these directories always exist.                                                                                                                                                                                                                                                                                                                 |
-| `workflows` | A dictionary of workflow names, mapped to a list of command names, to execute in order. Workflows can be run with the [`project run`](/api/cli#project-run) command.                                                                                                                                                                                                                                                                                                                                         |
+| `assets`      | A list of assets that can be fetched with the [`project assets`](/api/cli#project-assets) command. `url` defines a URL or local path, `dest` is the destination file relative to the project directory, and an optional `checksum` ensures that an error is raised if the file's checksum doesn't match.                                                                                                                                                                                                     |
-| `commands`  | A list of named commands. A command can define an optional help message (shown in the CLI when the user adds `--help`) and the `script`, a list of commands to run. The `deps` and `outputs` let you define the created file the command depends on and produces, respectively. This lets spaCy determine whether a command needs to be re-run because its dependencies or outputs changed. Commands can be run as part of a workflow, or separately with the [`project run`](/api/cli#project-run) command. |
+| `workflows`   | A dictionary of workflow names, mapped to a list of command names, to execute in order. Workflows can be run with the [`project run`](/api/cli#project-run) command.                                                                                                                                                                                                                                                                                                                                         |
 | `commands`    | A list of named commands. A command can define an optional help message (shown in the CLI when the user adds `--help`) and the `script`, a list of commands to run. The `deps` and `outputs` let you define the created file the command depends on and produces, respectively. This lets spaCy determine whether a command needs to be re-run because its dependencies or outputs changed. Commands can be run as part of a workflow, or separately with the [`project run`](/api/cli#project-run) command. |
 ### Dependencies and outputs {#deps-outputs}
@ -228,7 +229,9 @@ commands:
 If you're running a command and it depends on files that are missing, spaCy will
 show you an error. If a command defines dependencies and outputs that haven't
 changed since the last run, the command will be skipped. This means that you're
-only re-running commands if they need to be re-run. To force re-running a
+only re-running commands if they need to be re-run. Commands can also set
 `no_skip: true` if they should never be skipped – for example commands that run
 tests. Commands without outputs are also never skipped. To force re-running a
 command or workflow, even if nothing changed, you can set the `--force` flag.
 Note that [`spacy project`](/api/cli#project) doesn't compile any dependency
@ -243,28 +246,42 @@ won't be cached or tracked.
 ### Files and directory structure {#project-files}
-A project directory created by [`spacy project clone`](/api/cli#project-clone)
+The `project.yml` can define a list of `directories` that should be created
-includes the following files and directories. They can optionally be
+within a project – for instance, `assets`, `training`, `corpus` and so on. spaCy
-pre-populated by a project template (most commonly used for metas, configs or
+will make sure that these directories are always available, so your commands can
-scripts).
+write to and read from them. Project directories will also include all files and
 directories copied from the project template with
 [`spacy project clone`](/api/cli#project-clone). Here's an example of a project
 directory:
 > #### project.yml
 >
 > <!-- prettier-ignore -->
 > ```yaml
 > directories: ['assets', 'configs', 'corpus', 'metas', 'metrics', 'notebooks', 'packages', 'scripts', 'training']
 > ```
 ```yaml
-### Project directory
+### Example project directory
 ├── project.yml          # the project settings
 ├── project.lock         # lockfile that tracks inputs/outputs
 ├── assets/              # downloaded data assets
-├── metrics/             # output directory for evaluation metrics
+├── configs/             # model config.cfg files used for training
 ├── training/            # output directory for trained models
 ├── corpus/              # output directory for training corpus
-├── packages/            # output directory for model Python packages
+├── metas/               # model meta.json templates used for packaging
 ├── metrics/             # output directory for evaluation metrics
 ├── notebooks/           # directory for Jupyter notebooks
 ├── packages/            # output directory for model Python packages
 ├── scripts/             # directory for scripts, e.g. referenced in commands
-├── metas/               # model meta.json templates used for packaging
+├── training/            # output directory for trained models
 ├── configs/             # model config.cfg files used for training
 └── ...                  # any other files, like a requirements.txt etc.
 ```
 If you don't want a project to create a directory, you can delete it and remove
 its entry from the `project.yml` – just make sure it's not required by any of
 the commands. [Custom templates](#custom) can use any directories they need –
 the only file that's required for a project is the `project.yml`.
 ---
 ## Custom scripts and projects {#custom}
@ -275,7 +292,9 @@ a list of commands that are called in a subprocess, in order. This lets you
 execute other Python scripts or command-line tools. Let's say you've written a
 few integration tests that load the best model produced by the training command
 and check that it works correctly. You can now define a `test` command that
-calls into [`pytest`](https://docs.pytest.org/en/latest/) and runs your tests:
+calls into [`pytest`](https://docs.pytest.org/en/latest/), runs your tests and
 uses [`pytest-html`](https://github.com/pytest-dev/pytest-html) to export a test
 report:
 > #### Calling into Python
 >
@ -290,15 +309,20 @@ commands:
  - name: test
    help: 'Test the trained model'
    script:
-      - 'python -m pytest ./scripts/tests'
+      - 'pip install pytest pytest-html'
      - 'python -m pytest ./scripts/tests --html=metrics/test-report.html'
    deps:
      - 'training/model-best'
    outputs:
      - 'metrics/test-report.html'
    no_skip: true
 ```
 Adding `training/model-best` to the command's `deps` lets you ensure that the
 file is available. If not, spaCy will show an error and the command won't run.
-
+Setting `no_skip: true` means that the command will always run, even if the
-<!-- TODO: add another example -->
+dependencies (the trained model) hasn't changed. This makes sense here, because
 you typically don't want to skip your tests.
 ### Cloning from your own repo {#custom-repo}