Update projects.md

Ines Montani 2020-07-09 22:35:54 +02:00
parent 7bcf9f7cfb
commit 28cdae898a


@ -57,7 +57,7 @@ production.
### 1. Clone a project template {#clone}
> #### Cloning under the hood
>
> To clone a project, spaCy calls into `git` and uses the "sparse checkout"
> feature to only clone the relevant directory or directories.
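
As a rough illustration of the sparse-checkout approach described in the aside, the following hypothetical Python helper builds the sequence of `git` commands that fetches a single directory of a repository. It is not spaCy's implementation. The function names, the `branch` parameter and the exact command order are assumptions, and the sketch relies on `git sparse-checkout`, which requires Git 2.25 or newer.

```python
import subprocess
from typing import List


def sparse_clone_commands(
    repo_url: str, subdir: str, dest: str, branch: str = "main"
) -> List[List[str]]:
    """Hypothetical helper: git commands that fetch only `subdir` of a repo."""
    return [
        # Shallow clone that skips the checkout and defers downloading file contents
        ["git", "clone", "--depth", "1", "--filter=blob:none", "--no-checkout",
         "--branch", branch, repo_url, dest],
        # Enable sparse checkout in "cone" mode (directory-based patterns)
        ["git", "-C", dest, "sparse-checkout", "init", "--cone"],
        # Check out the branch: in cone mode, only top-level files appear
        ["git", "-C", dest, "checkout", branch],
        # Add the directory we actually want to the working tree
        ["git", "-C", dest, "sparse-checkout", "set", subdir],
    ]


def sparse_clone(repo_url: str, subdir: str, dest: str, branch: str = "main") -> None:
    """Run the commands above in order, stopping at the first failure."""
    for cmd in sparse_clone_commands(repo_url, subdir, dest, branch):
        subprocess.run(cmd, check=True)
```

Building each command as a list rather than a shell string keeps paths with spaces intact and lets you inspect the steps before anything touches the network.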
@ -296,13 +296,6 @@ calls into [`pytest`](https://docs.pytest.org/en/latest/), runs your tests and
uses [`pytest-html`](https://github.com/pytest-dev/pytest-html) to export a test
report:
```yaml
### project.yml
commands:
```
@ -324,6 +317,62 @@ Setting `no_skip: true` means that the command will always run, even if the
dependencies (the trained model) haven't changed. This makes sense here, because
you typically don't want to skip your tests.
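
The skip behavior can be pictured with a small sketch. It assumes, hypothetically, that the runner records a checksum for each dependency after a command runs and compares against it on the next run; `path_checksum` and `should_run` are illustrative names, not spaCy functions.

```python
import hashlib
from pathlib import Path
from typing import Dict, List, Optional


def path_checksum(path: str) -> str:
    """MD5 over a file's bytes, or over every file (name + bytes) in a directory."""
    p = Path(path)
    h = hashlib.md5()
    if p.is_dir():
        # Sort so the checksum doesn't depend on filesystem listing order
        for f in sorted(f for f in p.rglob("*") if f.is_file()):
            h.update(f.relative_to(p).as_posix().encode())
            h.update(f.read_bytes())
    else:
        h.update(p.read_bytes())
    return h.hexdigest()


def should_run(
    deps: List[str], previous: Optional[Dict[str, str]], no_skip: bool = False
) -> bool:
    """Hypothetical helper: decide whether a command needs to (re)run.

    `previous` maps each dependency to the checksum recorded after the last
    run, or is None if the command has never run.
    """
    if no_skip:
        # no_skip: true -> always run, whatever the state of the dependencies
        return True
    if previous is None:
        return True
    current = {dep: path_checksum(dep) for dep in deps}
    return current != previous
```

With `no_skip: true`, the comparison is bypassed entirely, which is why a test command configured this way runs on every invocation.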
### Writing custom scripts {#custom-scripts}
Your project commands can include any custom scripts: essentially, anything you
can run from the command line. Here's an example of a custom script that uses
[`typer`](https://typer.tiangolo.com/) for quick and easy command-line arguments
that you can define via your `project.yml`:
> #### About Typer
>
> [`typer`](https://typer.tiangolo.com/) is a modern library for building Python
> CLIs using type hints. It's a dependency of spaCy, so it will already be
> installed in your environment. Function arguments without default values
> automatically become positional CLI arguments, and using Python type hints,
> you can define the value types. For instance, `batch_size: int` means that the
> value provided via the command line is converted to an integer.
```python
### scripts/custom_evaluation.py
import typer

# No default values here: Typer turns parameters that have defaults into
# --options, while parameters without defaults become positional arguments
def custom_evaluation(batch_size: int, model_path: str, data_path: str):
    # The arguments are now available as positional CLI arguments
    print(batch_size, model_path, data_path)

if __name__ == "__main__":
    typer.run(custom_evaluation)
```
In your `project.yml`, you can then run the script by calling
`python scripts/custom_evaluation.py` with the function arguments. You can also
use the `variables` section to define reusable variables that will be
substituted in commands, paths and URLs. In this example, `BATCH_SIZE` is
defined as a variable and will be substituted for `{BATCH_SIZE}` in the script.
> #### Calling into Python
>
> If any of your command scripts call into `python`, spaCy will take care of
> replacing that with your `sys.executable`, to make sure you're executing
> everything with the same Python (not some other Python installed on your
> system). It also normalizes references to `python3`, `pip3` and `pip`.
<!-- prettier-ignore -->
```yaml
### project.yml
variables:
  BATCH_SIZE: 128

commands:
  - name: evaluate
    script:
      - 'python scripts/custom_evaluation.py {BATCH_SIZE} ./training/model-best ./corpus/eval.json'
    deps:
      - 'training/model-best'
      - 'corpus/eval.json'
```
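
To make the two substitutions concrete, here is a hypothetical sketch of how a single `script` line could be turned into an argument list: `{NAME}` placeholders are filled in from `variables`, and a leading `python`/`python3` or `pip`/`pip3` is swapped for the current interpreter (`sys.executable`). The function name and the plain string replacement (no escaping of braces) are assumptions for illustration, not spaCy's code.

```python
import shlex
import sys
from typing import Dict, List


def expand_command(command: str, variables: Dict[str, object]) -> List[str]:
    """Hypothetical helper: resolve one project.yml script line into argv."""
    # Replace each {NAME} placeholder with the string form of its value
    for name, value in variables.items():
        command = command.replace("{" + name + "}", str(value))
    # Split like a POSIX shell would, respecting quotes
    argv = shlex.split(command)
    if argv and argv[0] in ("python", "python3"):
        # Use the same interpreter that is running this code
        argv[0] = sys.executable
    elif argv and argv[0] in ("pip", "pip3"):
        # Run pip as a module so it installs into that interpreter's environment
        argv = [sys.executable, "-m", "pip", *argv[1:]]
    return argv
```

With the `project.yml` above, the `evaluate` script line and `{"BATCH_SIZE": 128}` resolve to `sys.executable` followed by `scripts/custom_evaluation.py`, `"128"` and the two paths. The batch size arrives as a string, and the `batch_size: int` type hint in the Typer script converts it back to an integer.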
### Cloning from your own repo {#custom-repo}
The [`spacy project clone`](/api/cli#project-clone) command lets you customize
@ -345,8 +394,9 @@ notebooks with usage examples.
It's typically not a good idea to check large data assets, trained models or
other artifacts into a Git repo, and you should exclude them from your project
template by adding a `.gitignore`. If you want to version your data and models,
check out [Data Version Control](#dvc) (DVC), which integrates with spaCy
projects.
</Infobox>