mirror of
https://github.com/explosion/spaCy.git
synced 2025-06-06 14:13:11 +03:00
Document new features
parent
797ca6f3dd
commit
7bcf9f7cfb
|
@ -186,12 +186,13 @@ pipelines.
|
||||||
https://github.com/explosion/spacy-boilerplates/blob/master/ner_fashion/project.yml
|
https://github.com/explosion/spacy-boilerplates/blob/master/ner_fashion/project.yml
|
||||||
```
|
```
|
||||||
|
|
||||||
| Section | Description |
|
| Section | Description |
|
||||||
| ----------- | ------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------ |
|
| ------------- | ------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------ |
|
||||||
| `variables` | A dictionary of variables that can be referenced in paths, URLs and scripts. For example, `{NAME}` will use the value of the variable `NAME`. |
|
| `variables` | A dictionary of variables that can be referenced in paths, URLs and scripts. For example, `{NAME}` will use the value of the variable `NAME`. |
|
||||||
| `assets` | A list of assets that can be fetched with the [`project assets`](/api/cli#project-assets) command. `url` defines a URL or local path, `dest` is the destination file relative to the project directory, and an optional `checksum` ensures that an error is raised if the file's checksum doesn't match. |
|
| `directories` | An optional list of [directories](#project-files) that should be created in the project for assets, training outputs, metrics etc. spaCy will make sure that these directories always exist. |
|
||||||
| `workflows` | A dictionary of workflow names, mapped to a list of command names, to execute in order. Workflows can be run with the [`project run`](/api/cli#project-run) command. |
|
| `assets` | A list of assets that can be fetched with the [`project assets`](/api/cli#project-assets) command. `url` defines a URL or local path, `dest` is the destination file relative to the project directory, and an optional `checksum` ensures that an error is raised if the file's checksum doesn't match. |
|
||||||
| `commands` | A list of named commands. A command can define an optional help message (shown in the CLI when the user adds `--help`) and the `script`, a list of commands to run. The `deps` and `outputs` let you define the created file the command depends on and produces, respectively. This lets spaCy determine whether a command needs to be re-run because its dependencies or outputs changed. Commands can be run as part of a workflow, or separately with the [`project run`](/api/cli#project-run) command. |
|
| `workflows` | A dictionary of workflow names, mapped to a list of command names, to execute in order. Workflows can be run with the [`project run`](/api/cli#project-run) command. |
|
||||||
|
| `commands` | A list of named commands. A command can define an optional help message (shown in the CLI when the user adds `--help`) and the `script`, a list of commands to run. The `deps` and `outputs` let you define the created file the command depends on and produces, respectively. This lets spaCy determine whether a command needs to be re-run because its dependencies or outputs changed. Commands can be run as part of a workflow, or separately with the [`project run`](/api/cli#project-run) command. |
|
||||||
|
|
||||||
### Dependencies and outputs {#deps-outputs}
|
### Dependencies and outputs {#deps-outputs}
|
||||||
|
|
||||||
|
@ -228,7 +229,9 @@ commands:
|
||||||
If you're running a command and it depends on files that are missing, spaCy will
|
If you're running a command and it depends on files that are missing, spaCy will
|
||||||
show you an error. If a command defines dependencies and outputs that haven't
|
show you an error. If a command defines dependencies and outputs that haven't
|
||||||
changed since the last run, the command will be skipped. This means that you're
|
changed since the last run, the command will be skipped. This means that you're
|
||||||
only re-running commands if they need to be re-run. To force re-running a
|
only re-running commands if they need to be re-run. Commands can also set
|
||||||
|
`no_skip: true` if they should never be skipped – for example commands that run
|
||||||
|
tests. Commands without outputs are also never skipped. To force re-running a
|
||||||
command or workflow, even if nothing changed, you can set the `--force` flag.
|
command or workflow, even if nothing changed, you can set the `--force` flag.
|
||||||
|
|
||||||
Note that [`spacy project`](/api/cli#project) doesn't compile any dependency
|
Note that [`spacy project`](/api/cli#project) doesn't compile any dependency
|
||||||
|
@ -243,28 +246,42 @@ won't be cached or tracked.
|
||||||
|
|
||||||
### Files and directory structure {#project-files}
|
### Files and directory structure {#project-files}
|
||||||
|
|
||||||
A project directory created by [`spacy project clone`](/api/cli#project-clone)
|
The `project.yml` can define a list of `directories` that should be created
|
||||||
includes the following files and directories. They can optionally be
|
within a project – for instance, `assets`, `training`, `corpus` and so on. spaCy
|
||||||
pre-populated by a project template (most commonly used for metas, configs or
|
will make sure that these directories are always available, so your commands can
|
||||||
scripts).
|
write to and read from them. Project directories will also include all files and
|
||||||
|
directories copied from the project template with
|
||||||
|
[`spacy project clone`](/api/cli#project-clone). Here's an example of a project
|
||||||
|
directory:
|
||||||
|
|
||||||
|
> #### project.yml
|
||||||
|
>
|
||||||
|
> <!-- prettier-ignore -->
|
||||||
|
> ```yaml
|
||||||
|
> directories: ['assets', 'configs', 'corpus', 'metas', 'metrics', 'notebooks', 'packages', 'scripts', 'training']
|
||||||
|
> ```
|
||||||
|
|
||||||
```yaml
|
```yaml
|
||||||
### Project directory
|
### Example project directory
|
||||||
├── project.yml # the project settings
|
├── project.yml # the project settings
|
||||||
├── project.lock # lockfile that tracks inputs/outputs
|
├── project.lock # lockfile that tracks inputs/outputs
|
||||||
├── assets/ # downloaded data assets
|
├── assets/ # downloaded data assets
|
||||||
├── metrics/ # output directory for evaluation metrics
|
├── configs/ # model config.cfg files used for training
|
||||||
├── training/ # output directory for trained models
|
|
||||||
├── corpus/ # output directory for training corpus
|
├── corpus/ # output directory for training corpus
|
||||||
├── packages/ # output directory for model Python packages
|
├── metas/ # model meta.json templates used for packaging
|
||||||
├── metrics/ # output directory for evaluation metrics
|
├── metrics/ # output directory for evaluation metrics
|
||||||
├── notebooks/ # directory for Jupyter notebooks
|
├── notebooks/ # directory for Jupyter notebooks
|
||||||
|
├── packages/ # output directory for model Python packages
|
||||||
├── scripts/ # directory for scripts, e.g. referenced in commands
|
├── scripts/ # directory for scripts, e.g. referenced in commands
|
||||||
├── metas/ # model meta.json templates used for packaging
|
├── training/ # output directory for trained models
|
||||||
├── configs/ # model config.cfg files used for training
|
|
||||||
└── ... # any other files, like a requirements.txt etc.
|
└── ... # any other files, like a requirements.txt etc.
|
||||||
```
|
```
|
||||||
|
|
||||||
|
If you don't want a project to create a directory, you can delete it and remove
|
||||||
|
its entry from the `project.yml` – just make sure it's not required by any of
|
||||||
|
the commands. [Custom templates](#custom) can use any directories they need –
|
||||||
|
the only file that's required for a project is the `project.yml`.
|
||||||
|
|
||||||
---
|
---
|
||||||
|
|
||||||
## Custom scripts and projects {#custom}
|
## Custom scripts and projects {#custom}
|
||||||
|
@ -275,7 +292,9 @@ a list of commands that are called in a subprocess, in order. This lets you
|
||||||
execute other Python scripts or command-line tools. Let's say you've written a
|
execute other Python scripts or command-line tools. Let's say you've written a
|
||||||
few integration tests that load the best model produced by the training command
|
few integration tests that load the best model produced by the training command
|
||||||
and check that it works correctly. You can now define a `test` command that
|
and check that it works correctly. You can now define a `test` command that
|
||||||
calls into [`pytest`](https://docs.pytest.org/en/latest/) and runs your tests:
|
calls into [`pytest`](https://docs.pytest.org/en/latest/), runs your tests and
|
||||||
|
uses [`pytest-html`](https://github.com/pytest-dev/pytest-html) to export a test
|
||||||
|
report:
|
||||||
|
|
||||||
> #### Calling into Python
|
> #### Calling into Python
|
||||||
>
|
>
|
||||||
|
@ -290,15 +309,20 @@ commands:
|
||||||
- name: test
|
- name: test
|
||||||
help: 'Test the trained model'
|
help: 'Test the trained model'
|
||||||
script:
|
script:
|
||||||
- 'python -m pytest ./scripts/tests'
|
- 'pip install pytest pytest-html'
|
||||||
|
- 'python -m pytest ./scripts/tests --html=metrics/test-report.html'
|
||||||
deps:
|
deps:
|
||||||
- 'training/model-best'
|
- 'training/model-best'
|
||||||
|
outputs:
|
||||||
|
- 'metrics/test-report.html'
|
||||||
|
no_skip: true
|
||||||
```
|
```
|
||||||
|
|
||||||
Adding `training/model-best` to the command's `deps` lets you ensure that the
|
Adding `training/model-best` to the command's `deps` lets you ensure that the
|
||||||
file is available. If not, spaCy will show an error and the command won't run.
|
file is available. If not, spaCy will show an error and the command won't run.
|
||||||
|
Setting `no_skip: true` means that the command will always run, even if the
|
||||||
<!-- TODO: add another example -->
|
dependency (the trained model) hasn't changed. This makes sense here, because
|
||||||
|
you typically don't want to skip your tests.
|
||||||
|
|
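The skipping rules documented in this commit (unchanged `deps` and `outputs` lead to a skip, while `no_skip: true`, a command without outputs, or `--force` always lead to a run) can be summarised as a small decision function. The following is a minimal Python sketch of those rules as described in the text, not spaCy's actual implementation: the `Command` class, the `should_run` function, the hash dictionaries and the `corpus/train.spacy` path are all hypothetical names introduced for illustration.

```python
from dataclasses import dataclass, field
from typing import Dict, List


@dataclass
class Command:
    # Hypothetical stand-in for one entry under `commands` in project.yml.
    name: str
    deps: List[str] = field(default_factory=list)
    outputs: List[str] = field(default_factory=list)
    no_skip: bool = False


def should_run(
    cmd: Command,
    current_hashes: Dict[str, str],
    locked_hashes: Dict[str, str],
    force: bool = False,
) -> bool:
    """Decide whether a command runs, following the documented rules.

    `current_hashes` maps paths to checksums computed now; `locked_hashes`
    maps paths to checksums recorded after the last run (assumed to play
    the role of the lockfile). Both structures are assumptions of this sketch.
    """
    if force:  # corresponds to passing --force on the command line
        return True
    if cmd.no_skip:  # e.g. a test command that should always run
        return True
    if not cmd.outputs:  # commands without outputs are never skipped
        return True
    for path in cmd.deps + cmd.outputs:
        # A path that was never recorded, or whose checksum differs from
        # the recorded one, counts as changed.
        if path not in locked_hashes:
            return True
        if current_hashes.get(path) != locked_hashes[path]:
            return True
    return False  # nothing changed, so the command can be skipped


# The `test` command from the diff always runs because of `no_skip`.
test_cmd = Command(
    "test",
    deps=["training/model-best"],
    outputs=["metrics/test-report.html"],
    no_skip=True,
)
# A hypothetical training command that may be skipped.
train_cmd = Command(
    "train", deps=["corpus/train.spacy"], outputs=["training/model-best"]
)
```

With identical current and recorded checksums, `should_run(train_cmd, ...)` returns `False`, whereas `should_run(test_cmd, ...)` returns `True` regardless of the checksums.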
||||||
### Cloning from your own repo {#custom-repo}
|
### Cloning from your own repo {#custom-repo}
|
||||||
|
|
||||||
|
|