Update docs [ci skip]

Ines Montani 2020-08-19 00:28:37 +02:00
parent c0f6e77a41
commit 13291e97ba
18 changed files with 295 additions and 183 deletions

View File

@ -39,8 +39,8 @@ the model name to be specified with its version (e.g. `en_core_web_sm-2.2.0`).
> to a local PyPi installation and fetching it straight from there. This will > to a local PyPi installation and fetching it straight from there. This will
> also allow you to add it as a versioned package dependency to your project. > also allow you to add it as a versioned package dependency to your project.
```bash ```cli
$ python -m spacy download [model] [--direct] [pip args] $ python -m spacy download [model] [--direct] [pip_args]
``` ```
| Name | Description | | Name | Description |
@ -57,11 +57,11 @@ Print information about your spaCy installation, models and local setup, and
generate [Markdown](https://en.wikipedia.org/wiki/Markdown)-formatted markup to generate [Markdown](https://en.wikipedia.org/wiki/Markdown)-formatted markup to
copy-paste into [GitHub issues](https://github.com/explosion/spaCy/issues). copy-paste into [GitHub issues](https://github.com/explosion/spaCy/issues).
```bash ```cli
$ python -m spacy info [--markdown] [--silent] $ python -m spacy info [--markdown] [--silent]
``` ```
```bash ```cli
$ python -m spacy info [model] [--markdown] [--silent] $ python -m spacy info [model] [--markdown] [--silent]
``` ```
@ -88,7 +88,7 @@ and command for updating are shown.
> suite, to ensure all models are up to date before proceeding. If incompatible > suite, to ensure all models are up to date before proceeding. If incompatible
> models are found, it will return `1`. > models are found, it will return `1`.
```bash ```cli
$ python -m spacy validate $ python -m spacy validate
``` ```
@ -111,14 +111,14 @@ config. The settings you specify will impact the suggested model architectures
and pipeline setup, as well as the hyperparameters. You can also adjust and and pipeline setup, as well as the hyperparameters. You can also adjust and
customize those settings in your config file later. customize those settings in your config file later.
> ```bash > #### Example
> ### Example {wrap="true"} >
> ```cli
> $ python -m spacy init config config.cfg --lang en --pipeline ner,textcat --optimize accuracy > $ python -m spacy init config config.cfg --lang en --pipeline ner,textcat --optimize accuracy
> ``` > ```
```bash ```cli
$ python -m spacy init config [output_file] [--lang] [--pipeline] $ python -m spacy init config [output_file] [--lang] [--pipeline] [--optimize] [--cpu]
[--optimize] [--cpu]
``` ```
| Name | Description | | Name | Description |
@ -143,12 +143,13 @@ be created, and their signatures are used to find the defaults. If your config
contains a problem that can't be resolved automatically, spaCy will show you a contains a problem that can't be resolved automatically, spaCy will show you a
validation error with more details. validation error with more details.
> ```bash > #### Example
> ### Example {wrap="true"} >
> ```cli
> $ python -m spacy init fill-config base.cfg config.cfg > $ python -m spacy init fill-config base.cfg config.cfg
> ``` > ```
```bash ```cli
$ python -m spacy init fill-config [base_path] [output_file] [--diff] $ python -m spacy init fill-config [base_path] [output_file] [--diff]
``` ```
@ -175,9 +176,8 @@ The `init-model` command is now available as a subcommand of `spacy init`.
</Infobox> </Infobox>
```bash ```cli
$ python -m spacy init model [lang] [output_dir] [--jsonl-loc] [--vectors-loc] $ python -m spacy init model [lang] [output_dir] [--jsonl-loc] [--vectors-loc] [--prune-vectors]
[--prune-vectors]
``` ```
| Name | Description | | Name | Description |
@ -200,10 +200,8 @@ Convert files into spaCy's
management functions. The converter can be specified on the command line, or management functions. The converter can be specified on the command line, or
chosen based on the file extension of the input file. chosen based on the file extension of the input file.
```bash ```cli
$ python -m spacy convert [input_file] [output_dir] [--converter] $ python -m spacy convert [input_file] [output_dir] [--converter] [--file-type] [--n-sents] [--seg-sents] [--model] [--morphology] [--merge-subtokens] [--ner-map] [--lang]
[--file-type] [--n-sents] [--seg-sents] [--model] [--morphology]
[--merge-subtokens] [--ner-map] [--lang]
``` ```
| Name | Description | | Name | Description |
@ -246,13 +244,13 @@ errors at once and some issues are only shown once previous errors have been
fixed. To auto-fill a partial config and save the result, you can use the fixed. To auto-fill a partial config and save the result, you can use the
[`init fill-config`](/api/cli#init-fill-config) command. [`init fill-config`](/api/cli#init-fill-config) command.
```bash ```cli
$ python -m spacy debug config [config_path] [--code_path] [overrides] $ python -m spacy debug config [config_path] [--code_path] [overrides]
``` ```
> #### Example > #### Example
> >
> ```bash > ```cli
> $ python -m spacy debug config ./config.cfg > $ python -m spacy debug config ./config.cfg
> ``` > ```
@ -298,14 +296,13 @@ takes the same arguments as `train` and reads settings off the
</Infobox> </Infobox>
```bash ```cli
$ python -m spacy debug data [config_path] [--code] [--ignore-warnings] $ python -m spacy debug data [config_path] [--code] [--ignore-warnings] [--verbose] [--no-format] [overrides]
[--verbose] [--no-format] [overrides]
``` ```
> #### Example > #### Example
> >
> ```bash > ```cli
> $ python -m spacy debug data ./config.cfg > $ python -m spacy debug data ./config.cfg
> ``` > ```
@ -473,7 +470,7 @@ The `profile` command is now available as a subcommand of `spacy debug`.
</Infobox> </Infobox>
```bash ```cli
$ python -m spacy debug profile [model] [inputs] [--n-texts] $ python -m spacy debug profile [model] [inputs] [--n-texts]
``` ```
@ -490,9 +487,8 @@ $ python -m spacy debug profile [model] [inputs] [--n-texts]
Debug a Thinc [`Model`](https://thinc.ai/docs/api-model) by running it on a Debug a Thinc [`Model`](https://thinc.ai/docs/api-model) by running it on a
sample text and checking how it updates its internal weights and parameters. sample text and checking how it updates its internal weights and parameters.
```bash ```cli
$ python -m spacy debug model [config_path] [component] [--layers] [-DIM] $ python -m spacy debug model [config_path] [component] [--layers] [-DIM] [-PAR] [-GRAD] [-ATTR] [-P0] [-P1] [-P2] [-P3] [--gpu-id]
[-PAR] [-GRAD] [-ATTR] [-P0] [-P1] [-P2] [-P3] [--gpu-id]
``` ```
<Accordion title="Example outputs" spaced> <Accordion title="Example outputs" spaced>
@ -502,7 +498,7 @@ model ("Step 0"), which helps us to understand the internal structure of the
Neural Network, and to focus on specific layers that we want to inspect further Neural Network, and to focus on specific layers that we want to inspect further
(see next example). (see next example).
```bash ```cli
$ python -m spacy debug model ./config.cfg tagger -P0 $ python -m spacy debug model ./config.cfg tagger -P0
``` ```
@ -548,7 +544,7 @@ an all-zero matrix determined by the `nO` and `nI` dimensions. After a first
training step (Step 2), this matrix has clearly updated its values through the training step (Step 2), this matrix has clearly updated its values through the
training feedback loop. training feedback loop.
```bash ```cli
$ python -m spacy debug model ./config.cfg tagger -l "5,15" -DIM -PAR -P0 -P1 -P2 $ python -m spacy debug model ./config.cfg tagger -l "5,15" -DIM -PAR -P0 -P1 -P2
``` ```
@ -632,7 +628,7 @@ in the section `[paths]`.
</Infobox> </Infobox>
```bash ```cli
$ python -m spacy train [config_path] [--output] [--code] [--verbose] [overrides] $ python -m spacy train [config_path] [--output] [--code] [--verbose] [overrides]
``` ```
@ -669,9 +665,8 @@ the [data format](/api/data-formats#config) for details.
</Infobox> </Infobox>
```bash ```cli
$ python -m spacy pretrain [texts_loc] [output_dir] [config_path] $ python -m spacy pretrain [texts_loc] [output_dir] [config_path] [--code] [--resume-path] [--epoch-resume] [overrides]
[--code] [--resume-path] [--epoch-resume] [overrides]
``` ```
| Name | Description | | Name | Description |
@ -698,9 +693,8 @@ skew. To render a sample of dependency parses in a HTML file using the
[displaCy visualizations](/usage/visualizers), set an output directory as the [displaCy visualizations](/usage/visualizers), set an output directory as the
`--displacy-path` argument. `--displacy-path` argument.
```bash ```cli
$ python -m spacy evaluate [model] [data_path] [--output] [--gold-preproc] $ python -m spacy evaluate [model] [data_path] [--output] [--gold-preproc] [--gpu-id] [--displacy-path] [--displacy-limit]
[--gpu-id] [--displacy-path] [--displacy-limit]
``` ```
| Name | Description | | Name | Description |
@ -733,17 +727,16 @@ this, you can set the `--no-sdist` flag.
</Infobox> </Infobox>
```bash ```cli
$ python -m spacy package [input_dir] [output_dir] [--meta-path] [--create-meta] $ python -m spacy package [input_dir] [output_dir] [--meta-path] [--create-meta] [--no-sdist] [--version] [--force]
[--no-sdist] [--version] [--force]
``` ```
> #### Example > #### Example
> >
> ```bash > ```cli
> python -m spacy package /input /output > $ python -m spacy package /input /output
> cd /output/en_model-0.0.0 > $ cd /output/en_model-0.0.0
> pip install dist/en_model-0.0.0.tar.gz > $ pip install dist/en_model-0.0.0.tar.gz
> ``` > ```
| Name | Description | | Name | Description |
@ -775,19 +768,19 @@ can provide any other repo (public or private) that you have access to using the
<!-- TODO: update example once we've decided on repo structure --> <!-- TODO: update example once we've decided on repo structure -->
```bash ```cli
$ python -m spacy project clone [name] [dest] [--repo] $ python -m spacy project clone [name] [dest] [--repo]
``` ```
> #### Example > #### Example
> >
> ```bash > ```cli
> $ python -m spacy project clone some_example > $ python -m spacy project clone some_example
> ``` > ```
> >
> Clone from custom repo: > Clone from custom repo:
> >
> ```bash > ```cli
> $ python -m spacy project clone template --repo https://github.com/your_org/your_repo > $ python -m spacy project clone template --repo https://github.com/your_org/your_repo
> ``` > ```
@ -810,13 +803,13 @@ considered "private" and you have to take care of putting them into the
destination directory yourself. If a local path is provided, the asset is copied destination directory yourself. If a local path is provided, the asset is copied
into the current project. into the current project.
```bash ```cli
$ python -m spacy project assets [project_dir] $ python -m spacy project assets [project_dir]
``` ```
> #### Example > #### Example
> >
> ```bash > ```cli
> $ python -m spacy project assets > $ python -m spacy project assets
> ``` > ```
@ -835,13 +828,13 @@ all commands in the workflow are run, in order. If commands define
re-run if state has changed. For example, if the input dataset changes, a re-run if state has changed. For example, if the input dataset changes, a
preprocessing command that depends on those files will be re-run. preprocessing command that depends on those files will be re-run.
```bash ```cli
$ python -m spacy project run [subcommand] [project_dir] [--force] [--dry] $ python -m spacy project run [subcommand] [project_dir] [--force] [--dry]
``` ```
> #### Example > #### Example
> >
> ```bash > ```cli
> $ python -m spacy project run train > $ python -m spacy project run train
> ``` > ```
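The re-run behaviour described in this hunk (a command runs again when the files it depends on have changed) can be illustrated with a small checksum comparison. This is only a sketch under stated assumptions: the helper names `file_checksum`, `record_state` and `needs_rerun` are hypothetical, SHA-256 is an arbitrary choice, and nothing here is taken from spaCy's own implementation.

```python
import hashlib
import tempfile
from pathlib import Path
from typing import Dict, Sequence


def file_checksum(path: Path) -> str:
    """Return the SHA-256 hex digest of a file's contents."""
    return hashlib.sha256(path.read_bytes()).hexdigest()


def record_state(deps: Sequence[Path]) -> Dict[str, str]:
    """Snapshot the current checksum of every dependency (hypothetical helper)."""
    return {str(dep): file_checksum(dep) for dep in deps}


def needs_rerun(deps: Sequence[Path], recorded: Dict[str, str]) -> bool:
    """Return True if any dependency is new or differs from its recorded checksum."""
    return any(recorded.get(str(dep)) != file_checksum(dep) for dep in deps)


# Small demonstration with a temporary input file.
with tempfile.TemporaryDirectory() as tmp:
    data = Path(tmp) / "train.json"
    data.write_text("version 1")
    state = record_state([data])
    unchanged = needs_rerun([data], state)  # input untouched since the snapshot
    data.write_text("version 2")
    changed = needs_rerun([data], state)  # input modified after the snapshot
```

In this sketch, `unchanged` is `False` and `changed` is `True`, which mirrors the idea that a preprocessing command is skipped until its input dataset changes.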
@ -874,16 +867,16 @@ You'll also need to add the assets you want to track with
</Infobox> </Infobox>
```bash ```cli
$ python -m spacy project dvc [project_dir] [workflow] [--force] [--verbose] $ python -m spacy project dvc [project_dir] [workflow] [--force] [--verbose]
``` ```
> #### Example > #### Example
> >
> ```bash > ```cli
> git init > $ git init
> dvc init > $ dvc init
> python -m spacy project dvc all > $ python -m spacy project dvc all
> ``` > ```
| Name | Description | | Name | Description |

View File

@ -118,8 +118,8 @@ need paths, you can define them here. All config values can also be
[`spacy train`](/api/cli#train), which is especially relevant for data paths [`spacy train`](/api/cli#train), which is especially relevant for data paths
that you don't want to hard-code in your config file. that you don't want to hard-code in your config file.
```bash ```cli
$ python -m spacy train ./config.cfg --paths.train ./corpus/train.spacy $ python -m spacy train config.cfg --paths.train ./corpus/train.spacy
``` ```
### training {#config-training tag="section"} ### training {#config-training tag="section"}
@ -209,8 +209,8 @@ objects to JSON, you can now serialize them directly using the
[`spacy convert`](/api/cli) lets you convert your JSON data to the new `.spacy` [`spacy convert`](/api/cli) lets you convert your JSON data to the new `.spacy`
format: format:
```bash ```cli
$ python -m spacy convert ./data.json ./output $ python -m spacy convert ./data.json ./output.spacy
``` ```
</Infobox> </Infobox>

View File

@ -110,9 +110,9 @@ in `/opt/nvidia/cuda`, you would run:
```bash ```bash
### Installation with CUDA ### Installation with CUDA
export CUDA_PATH="/opt/nvidia/cuda" $ export CUDA_PATH="/opt/nvidia/cuda"
pip install cupy-cuda102 $ pip install cupy-cuda102
pip install spacy-transformers $ pip install spacy-transformers
``` ```
### Runtime usage {#transformers-runtime} ### Runtime usage {#transformers-runtime}
@ -130,7 +130,7 @@ The `Transformer` component sets the
[`Doc._.trf_data`](/api/transformer#custom_attributes) extension attribute, [`Doc._.trf_data`](/api/transformer#custom_attributes) extension attribute,
which lets you access the transformers outputs at runtime. which lets you access the transformers outputs at runtime.
```bash ```cli
$ python -m spacy download en_core_trf_lg $ python -m spacy download en_core_trf_lg
``` ```
@ -292,8 +292,8 @@ function. You can make it available via the `--code` argument that can point to
a Python file. For more details on training with custom code, see the a Python file. For more details on training with custom code, see the
[training documentation](/usage/training#custom-code). [training documentation](/usage/training#custom-code).
```bash ```cli
$ python -m spacy train ./config.cfg --code ./code.py python -m spacy train ./config.cfg --code ./code.py
``` ```
### Customizing the model implementations {#training-custom-model} ### Customizing the model implementations {#training-custom-model}

View File

@ -40,7 +40,7 @@ $ pip install -U spacy
> After installation you need to download a language model. For more info and > After installation you need to download a language model. For more info and
> available models, see the [docs on models](/models). > available models, see the [docs on models](/models).
> >
> ```bash > ```cli
> $ python -m spacy download en_core_web_sm > $ python -m spacy download en_core_web_sm
> >
> >>> import spacy > >>> import spacy
@ -62,9 +62,9 @@ When using pip it is generally recommended to install packages in a virtual
environment to avoid modifying system state: environment to avoid modifying system state:
```bash ```bash
python -m venv .env $ python -m venv .env
source .env/bin/activate $ source .env/bin/activate
pip install spacy $ pip install spacy
``` ```
### conda {#conda} ### conda {#conda}
@ -106,9 +106,9 @@ links created in different virtual environments. It's recommended to run the
command with `python -m` to make sure you're executing the correct version of command with `python -m` to make sure you're executing the correct version of
spaCy. spaCy.
```bash ```cli
pip install -U spacy $ pip install -U spacy
python -m spacy validate $ python -m spacy validate
``` ```
### Run spaCy with GPU {#gpu new="2.0.14"} ### Run spaCy with GPU {#gpu new="2.0.14"}
@ -156,15 +156,15 @@ system. See notes on [Ubuntu](#source-ubuntu), [macOS / OS X](#source-osx) and
[Windows](#source-windows) for details. [Windows](#source-windows) for details.
```bash ```bash
python -m pip install -U pip # update pip $ python -m pip install -U pip # update pip
git clone https://github.com/explosion/spaCy # clone spaCy $ git clone https://github.com/explosion/spaCy # clone spaCy
cd spaCy # navigate into directory $ cd spaCy # navigate into dir
python -m venv .env # create environment in .env $ python -m venv .env # create environment in .env
source .env/bin/activate # activate virtual environment $ source .env/bin/activate # activate virtual env
\export PYTHONPATH=`pwd` # set Python path to spaCy directory $ export PYTHONPATH=`pwd` # set Python path to spaCy dir
pip install -r requirements.txt # install all requirements $ pip install -r requirements.txt # install all requirements
python setup.py build_ext --inplace # compile spaCy $ python setup.py build_ext --inplace # compile spaCy
``` ```
Compared to regular install via pip, the Compared to regular install via pip, the
@ -209,20 +209,18 @@ that directory. Don't forget to also install the test utilities via spaCy's
[`requirements.txt`](https://github.com/explosion/spaCy/tree/master/requirements.txt): [`requirements.txt`](https://github.com/explosion/spaCy/tree/master/requirements.txt):
```bash ```bash
python -c "import os; import spacy; print(os.path.dirname(spacy.__file__))" $ python -c "import os; import spacy; print(os.path.dirname(spacy.__file__))"
pip install -r path/to/requirements.txt $ pip install -r path/to/requirements.txt
python -m pytest [spacy directory] $ python -m pytest [spacy directory]
``` ```
Calling `pytest` on the spaCy directory will run only the basic tests. The flag Calling `pytest` on the spaCy directory will run only the basic tests. The flag
`--slow` is optional and enables additional tests that take longer. `--slow` is optional and enables additional tests that take longer.
```bash ```bash
# make sure you are using recent pytest version $ python -m pip install -U pytest # update pytest
python -m pip install -U pytest $ python -m pytest [spacy directory] # basic tests
$ python -m pytest [spacy directory] --slow # basic and slow tests
python -m pytest [spacy directory] # basic tests
python -m pytest [spacy directory] --slow # basic and slow tests
``` ```
## Troubleshooting guide {#troubleshooting} ## Troubleshooting guide {#troubleshooting}
@ -283,7 +281,7 @@ only 65535 in a narrow unicode build. You can check this by running the
following command: following command:
```bash ```bash
python -c "import sys; print(sys.maxunicode)" $ python -c "import sys; print(sys.maxunicode)"
``` ```
If you're running a narrow unicode build, reinstall Python and use a wide If you're running a narrow unicode build, reinstall Python and use a wide
@ -305,8 +303,8 @@ run `source ~/.bash_profile` or `source ~/.zshrc`. Make sure to add **both
lines** for `LC_ALL` and `LANG`. lines** for `LC_ALL` and `LANG`.
```bash ```bash
\export LC_ALL=en_US.UTF-8 $ export LC_ALL=en_US.UTF-8
\export LANG=en_US.UTF-8 $ export LANG=en_US.UTF-8
``` ```
</Accordion> </Accordion>

View File

@ -1588,9 +1588,9 @@ some nice Latin vectors. You can then pass the directory path to
> doc1.similarity(doc2) > doc1.similarity(doc2)
> ``` > ```
```bash ```cli
wget https://s3-us-west-1.amazonaws.com/fasttext-vectors/word-vectors-v2/cc.la.300.vec.gz $ wget https://s3-us-west-1.amazonaws.com/fasttext-vectors/word-vectors-v2/cc.la.300.vec.gz
python -m spacy init model en /tmp/la_vectors_wiki_lg --vectors-loc cc.la.300.vec.gz $ python -m spacy init model en /tmp/la_vectors_wiki_lg --vectors-loc cc.la.300.vec.gz
``` ```
<Accordion title="How to optimize vector coverage" id="custom-vectors-coverage" spaced> <Accordion title="How to optimize vector coverage" id="custom-vectors-coverage" spaced>
@ -1649,8 +1649,8 @@ the vector of "leaving", which is identical. If you're using the
option to easily reduce the size of the vectors as you add them to a spaCy option to easily reduce the size of the vectors as you add them to a spaCy
model: model:
```bash ```cli
$ python -m spacy init model /tmp/la_vectors_web_md --vectors-loc la.300d.vec.tgz --prune-vectors 10000 $ python -m spacy init model en /tmp/la_vectors_web_md --vectors-loc la.300d.vec.tgz --prune-vectors 10000
``` ```
This will create a spaCy model with vectors for the first 10,000 words in the This will create a spaCy model with vectors for the first 10,000 words in the
@ -1741,9 +1741,8 @@ language name, and even train models with it and refer to it in your
> needs to be available during training. You can load a Python file containing > needs to be available during training. You can load a Python file containing
> the code using the `--code` argument: > the code using the `--code` argument:
> >
> ```bash > ```cli
> ### {wrap="true"} > python -m spacy train config.cfg --code code.py
> $ python -m spacy train config.cfg --code code.py
> ``` > ```
```python ```python

View File

@ -116,15 +116,10 @@ The Chinese language class supports three word segmentation options:
<Infobox variant="warning"> <Infobox variant="warning">
In spaCy v3, the default Chinese word segmenter has switched from Jieba to In spaCy v3.0, the default Chinese word segmenter has switched from Jieba to
character segmentation. character segmentation. Also note that
[`pkuseg`](https://github.com/lancopku/pkuseg-python) doesn't yet ship with
</Infobox> pre-compiled wheels for Python 3.8. If you're running Python 3.8, you can
<Infobox variant="warning">
Note that [`pkuseg`](https://github.com/lancopku/pkuseg-python) doesn't yet ship
with pre-compiled wheels for Python 3.8. If you're running Python 3.8, you can
install it from our fork and compile it locally: install it from our fork and compile it locally:
```bash ```bash
@ -174,7 +169,7 @@ nlp.tokenizer.pkuseg_update_user_dict([], reset=True)
</Accordion> </Accordion>
<Accordion title="Details on pretrained and custom Chinese models"> <Accordion title="Details on pretrained and custom Chinese models" spaced>
The [Chinese models](/models/zh) provided by spaCy include a custom `pkuseg` The [Chinese models](/models/zh) provided by spaCy include a custom `pkuseg`
model trained only on model trained only on
@ -247,20 +242,20 @@ best-matching model compatible with your spaCy installation.
> + nlp = spacy.load("en_core_web_sm") > + nlp = spacy.load("en_core_web_sm")
> ``` > ```
```bash ```cli
# Download best-matching version of specific model for your spaCy installation # Download best-matching version of a model for your spaCy installation
python -m spacy download en_core_web_sm $ python -m spacy download en_core_web_sm
# Download exact model version # Download exact model version
python -m spacy download en_core_web_sm-2.2.0 --direct $ python -m spacy download en_core_web_sm-3.0.0 --direct
``` ```
The download command will [install the model](/usage/models#download-pip) via The download command will [install the model](/usage/models#download-pip) via
pip and place the package in your `site-packages` directory. pip and place the package in your `site-packages` directory.
```bash ```cli
pip install spacy $ pip install -U spacy
python -m spacy download en_core_web_sm $ python -m spacy download en_core_web_sm
``` ```
```python ```python
@ -279,10 +274,10 @@ click on the archive link and copy it to your clipboard.
```bash ```bash
# With external URL # With external URL
pip install https://github.com/explosion/spacy-models/releases/download/en_core_web_sm-3.0.0/en_core_web_sm-3.0.0.tar.gz $ pip install https://github.com/explosion/spacy-models/releases/download/en_core_web_sm-3.0.0/en_core_web_sm-3.0.0.tar.gz
# With local file # With local file
pip install /Users/you/en_core_web_sm-3.0.0.tar.gz $ pip install /Users/you/en_core_web_sm-3.0.0.tar.gz
``` ```
By default, this will install the model into your `site-packages` directory. You By default, this will install the model into your `site-packages` directory. You
@ -305,7 +300,7 @@ archive consists of a model directory that contains another directory with the
model data. model data.
```yaml ```yaml
### Directory structure {highlight="7"} ### Directory structure {highlight="6"}
└── en_core_web_md-3.0.0.tar.gz # downloaded archive └── en_core_web_md-3.0.0.tar.gz # downloaded archive
├── setup.py # setup file for pip installation ├── setup.py # setup file for pip installation
├── meta.json # copy of model meta ├── meta.json # copy of model meta

View File

@ -67,8 +67,8 @@ project template and copies the files to a local directory. You can then run the
project, e.g. to train a model and edit the commands and scripts to build fully project, e.g. to train a model and edit the commands and scripts to build fully
custom workflows. custom workflows.
```bash ```cli
$ python -m spacy clone some_example_project python -m spacy project clone some_example_project
``` ```
By default, the project will be cloned into the current working directory. You By default, the project will be cloned into the current working directory. You
@ -95,9 +95,9 @@ to download and where to put them. The
[`spacy project assets`](/api/cli#project-assets) will fetch the project assets [`spacy project assets`](/api/cli#project-assets) will fetch the project assets
for you: for you:
```bash ```
cd some_example_project $ cd some_example_project
python -m spacy project assets $ python -m spacy project assets
``` ```
### 3. Run a command {#run} ### 3. Run a command {#run}
@ -123,7 +123,7 @@ Commands consist of one or more steps and can be run with
[`spacy project run`](/api/cli#project-run). The following will run the command [`spacy project run`](/api/cli#project-run). The following will run the command
`preprocess` defined in the `project.yml`: `preprocess` defined in the `project.yml`:
```bash ```cli
$ python -m spacy project run preprocess $ python -m spacy project run preprocess
``` ```
@ -156,7 +156,7 @@ to turn the best model artifact into an installable Python package. The
following command runs the workflow named `all` defined in the `project.yml`, and following command runs the workflow named `all` defined in the `project.yml`, and
executes the commands it specifies, in order: executes the commands it specifies, in order:
```bash ```cli
$ python -m spacy project run all $ python -m spacy project run all
``` ```
@ -379,8 +379,8 @@ The [`spacy project clone`](/api/cli#project-clone) command lets you customize
the repo to clone from using the `--repo` option. It calls into `git`, so you'll the repo to clone from using the `--repo` option. It calls into `git`, so you'll
be able to clone from any repo that you have access to, including private repos. be able to clone from any repo that you have access to, including private repos.
```bash ```cli
$ python -m spacy project your_project --repo https://github.com/you/repo python -m spacy project clone your_project --repo https://github.com/you/repo
``` ```
At a minimum, a valid project template needs to contain a At a minimum, a valid project template needs to contain a
@ -445,9 +445,9 @@ to include support for remote storage like Google Cloud Storage, S3, Azure, SSH
and more. and more.
```bash ```bash
pip install dvc # Install DVC $ pip install dvc # Install DVC
git init # Initialize a Git repo $ git init # Initialize a Git repo
dvc init # Initialize a DVC project $ dvc init # Initialize a DVC project
``` ```
<Infobox title="Important note on privacy" variant="warning"> <Infobox title="Important note on privacy" variant="warning">
@ -466,8 +466,8 @@ can then manage your spaCy project like any other DVC project, run
and [`dvc repro`](https://dvc.org/doc/command-reference/repro) to reproduce the and [`dvc repro`](https://dvc.org/doc/command-reference/repro) to reproduce the
workflow or individual commands. workflow or individual commands.
```bash ```cli
$ python -m spacy project dvc [workflow name] $ python -m spacy project dvc [workflow_name]
``` ```
<Infobox title="Important note for multiple workflows" variant="warning"> <Infobox title="Important note for multiple workflows" variant="warning">
@ -508,7 +508,7 @@ and evaluation set.
> #### Example usage > #### Example usage
> >
> ```bash > ```cli
> $ python -m spacy project run annotate > $ python -m spacy project run annotate
> ``` > ```
@ -595,7 +595,7 @@ spacy_streamlit.visualize(MODELS, DEFAULT_TEXT, visualizers=["ner"])
> #### Example usage > #### Example usage
> >
> ```bash > ```cli
> $ python -m spacy project run visualize > $ python -m spacy project run visualize
> ``` > ```
@ -636,8 +636,8 @@ API.
> #### Example usage > #### Example usage
> >
> ```bash > ```cli
> $ python -m spacy project run visualize > $ python -m spacy project run serve
> ``` > ```
<!-- prettier-ignore --> <!-- prettier-ignore -->

View File

@ -562,11 +562,11 @@ import DisplaCyEntSnekHtml from 'images/displacy-ent-snek.html'
## Saving, loading and distributing models {#models} ## Saving, loading and distributing models {#models}
After training your model, you'll usually want to save its state, and load it After training your model, you'll usually want to save its state, and load it
back later. You can do this with the back later. You can do this with the [`Language.to_disk`](/api/language#to_disk)
[`Language.to_disk()`](/api/language#to_disk) method: method:
```python ```python
nlp.to_disk('/home/me/data/en_example_model') nlp.to_disk("./en_example_model")
``` ```
The directory will be created if it doesn't exist, and the whole pipeline data, The directory will be created if it doesn't exist, and the whole pipeline data,
@ -629,8 +629,8 @@ docs.
> } > }
> ``` > ```
```bash ```cli
$ python -m spacy package /home/me/data/en_example_model /home/me/my_models $ python -m spacy package ./en_example_model ./my_models
``` ```
This command will create a model package directory and will run This command will create a model package directory and will run

View File

@ -160,7 +160,7 @@ the website or company in a specific context.
> #### Loading models > #### Loading models
> >
> ```bash > ```cli
> $ python -m spacy download en_core_web_sm > $ python -m spacy download en_core_web_sm
> >
> >>> import spacy > >>> import spacy

View File

@ -66,7 +66,7 @@ the [`init fill-config`](/api/cli#init-fill-config) command to fill in the
remaining defaults. Training configs should always be **complete and without remaining defaults. Training configs should always be **complete and without
hidden defaults**, to keep your experiments reproducible. hidden defaults**, to keep your experiments reproducible.
```bash ```cli
$ python -m spacy init fill-config base_config.cfg config.cfg $ python -m spacy init fill-config base_config.cfg config.cfg
``` ```
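To make the idea of a "complete" config more concrete, the following sketch fills a partial, nested settings dictionary from a set of defaults, with values that are already set taking priority. It is a conceptual illustration only: `fill_defaults` is a hypothetical helper, the setting names and values are made up, and this is not how `init fill-config` is implemented.

```python
from copy import deepcopy


def fill_defaults(partial: dict, defaults: dict) -> dict:
    """Return a copy of `partial` with every missing key taken from `defaults`.

    Values already present in `partial` always win. Nested sections are
    merged recursively, and neither input is modified.
    """
    filled = deepcopy(partial)
    for key, default in defaults.items():
        if key not in filled:
            filled[key] = deepcopy(default)
        elif isinstance(filled[key], dict) and isinstance(default, dict):
            filled[key] = fill_defaults(filled[key], default)
    return filled


# Hypothetical values for illustration only, not spaCy's real defaults.
base = {"training": {"dropout": 0.2}}
defaults = {"training": {"dropout": 0.1, "max_epochs": 0}, "paths": {"train": None}}
complete = fill_defaults(base, defaults)
```

The result keeps `dropout` at `0.2` from the base config and gains `max_epochs` and the `paths` section, so no setting is left to a hidden default.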
@ -76,8 +76,8 @@ $ python -m spacy init fill-config base_config.cfg config.cfg
> your training and development data, get useful stats, and find problems like > your training and development data, get useful stats, and find problems like
> invalid entity annotations, cyclic dependencies, low data labels and more. > invalid entity annotations, cyclic dependencies, low data labels and more.
> >
> ```bash > ```cli
> $ python -m spacy debug data config.cfg --verbose > $ python -m spacy debug data config.cfg
> ``` > ```
Instead of exporting your starter config from the quickstart widget and Instead of exporting your starter config from the quickstart widget and
@ -88,7 +88,7 @@ add your data and run [`train`](/api/cli#train) with your config. See the
spaCy's binary `.spacy` format. You can either include the data paths in the spaCy's binary `.spacy` format. You can either include the data paths in the
`[paths]` section of your config, or pass them in via the command line. `[paths]` section of your config, or pass them in via the command line.
```bash ```cli
$ python -m spacy train config.cfg --output ./output --paths.train ./train.spacy --paths.dev ./dev.spacy $ python -m spacy train config.cfg --output ./output --paths.train ./train.spacy --paths.dev ./dev.spacy
``` ```
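The dotted options in the command above name a config section and a key within it. The sketch below shows one way such arguments could be mapped onto a nested dictionary. It assumes a hypothetical `parse_overrides` helper, is not spaCy's actual argument parser, and leaves all values as plain strings.

```python
def parse_overrides(args):
    """Turn ["--paths.train", "./train.spacy", ...] into {"paths": {"train": ...}}."""
    if len(args) % 2 != 0:
        raise ValueError("Expected alternating --section.key and value arguments")
    overrides = {}
    for name, value in zip(args[::2], args[1::2]):
        if not name.startswith("--") or "." not in name:
            raise ValueError(f"Not a dotted override: {name}")
        # Everything before the last dot is a (possibly nested) section path.
        *sections, key = name[2:].split(".")
        target = overrides
        for section in sections:
            target = target.setdefault(section, {})
        target[key] = value
    return overrides


args = ["--paths.train", "./train.spacy", "--paths.dev", "./dev.spacy"]
overrides = parse_overrides(args)
```

Here `overrides` ends up with a single `paths` section containing the `train` and `dev` entries, matching the layout of the `[paths]` block in the config file.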
@ -186,9 +186,8 @@ For cases like this, you can set additional command-line options starting with
`--paths.train ./corpus/train.spacy` sets the `train` value in the `[paths]` `--paths.train ./corpus/train.spacy` sets the `train` value in the `[paths]`
block. block.
```bash ```cli
$ python -m spacy train config.cfg --paths.train ./corpus/train.spacy $ python -m spacy train config.cfg --paths.train ./corpus/train.spacy --paths.dev ./corpus/dev.spacy --training.batch_size 128
--paths.dev ./corpus/dev.spacy --training.batch_size 128
``` ```
Only existing sections and values in the config can be overwritten. At the end Only existing sections and values in the config can be overwritten. At the end
@ -486,8 +485,9 @@ still look good.
### Training with custom code {#custom-code} ### Training with custom code {#custom-code}
> ```bash > #### Example
> ### Example {wrap="true"} >
> ```cli
> $ python -m spacy train config.cfg --code functions.py > $ python -m spacy train config.cfg --code functions.py
> ``` > ```
@ -605,9 +605,8 @@ you can now run [`spacy train`](/api/cli#train) and point the argument `--code`
to your Python file. Before loading the config, spaCy will import the to your Python file. Before loading the config, spaCy will import the
`functions.py` module and your custom functions will be registered. `functions.py` module and your custom functions will be registered.
```bash ```cli
### Training with custom code {wrap="true"} $ python -m spacy train config.cfg --output ./output --code ./functions.py
python -m spacy train config.cfg --output ./output --code ./functions.py
``` ```
#### Example: Custom batch size schedule {#custom-code-schedule} #### Example: Custom batch size schedule {#custom-code-schedule}


@ -212,14 +212,15 @@ Note that spaCy v3.0 now requires **Python 3.6+**.
### Removed or renamed API {#incompat-removed} ### Removed or renamed API {#incompat-removed}
| Removed | Replacement | | Removed | Replacement |
| -------------------------------------------------------- | ----------------------------------------------------- | | ------------------------------------------------------ | ----------------------------------------------------------------------------------------- |
| `Language.disable_pipes` | [`Language.select_pipes`](/api/language#select_pipes) | | `Language.disable_pipes` | [`Language.select_pipes`](/api/language#select_pipes) |
| `GoldParse` | [`Example`](/api/example) | | `GoldParse` | [`Example`](/api/example) |
| `GoldCorpus` | [`Corpus`](/api/corpus) | | `GoldCorpus` | [`Corpus`](/api/corpus) |
| `spacy debug-data` | [`spacy debug data`](/api/cli#debug-data) | | `KnowledgeBase.load_bulk` `KnowledgeBase.dump` | [`KnowledgeBase.from_disk`](/api/kb#from_disk) [`KnowledgeBase.to_disk`](/api/kb#to_disk) |
| `spacy profile` | [`spacy debug profile`](/api/cli#debug-profile) | | `spacy debug-data` | [`spacy debug data`](/api/cli#debug-data) |
| `spacy link`, `util.set_data_path`, `util.get_data_path` | not needed, model symlinks are deprecated | | `spacy profile` | [`spacy debug profile`](/api/cli#debug-profile) |
| `spacy link` `util.set_data_path` `util.get_data_path` | not needed, model symlinks are deprecated |
The following deprecated methods, attributes and arguments were removed in v3.0. The following deprecated methods, attributes and arguments were removed in v3.0.
Most of them have been **deprecated for a while** and many would previously Most of them have been **deprecated for a while** and many would previously
@ -412,12 +413,11 @@ spaCy v3.0 uses a new
serializing a [`DocBin`](/api/docbin), which represents a collection of `Doc` serializing a [`DocBin`](/api/docbin), which represents a collection of `Doc`
objects. This means that you can train spaCy models using the same format it objects. This means that you can train spaCy models using the same format it
outputs: annotated `Doc` objects. The binary format is extremely **efficient in outputs: annotated `Doc` objects. The binary format is extremely **efficient in
storage**, especially when packing multiple documents together. storage**, especially when packing multiple documents together. You can convert
your existing JSON-formatted data using the [`spacy convert`](/api/cli#convert)
command, which outputs `.spacy` files:
You can convert your existing JSON-formatted data using the ```cli
[`spacy convert`](/api/cli#convert) command, which outputs `.spacy` files:
```bash
$ python -m spacy convert ./training.json ./output $ python -m spacy convert ./training.json ./output
``` ```
@ -429,7 +429,7 @@ The easiest way to get started with a training config is to use the
requirements, and it will auto-generate a starter config with the best-matching requirements, and it will auto-generate a starter config with the best-matching
default settings. default settings.
```bash ```cli
$ python -m spacy init config ./config.cfg --lang en --pipeline tagger,parser $ python -m spacy init config ./config.cfg --lang en --pipeline tagger,parser
``` ```


@ -8,7 +8,7 @@ import { window } from 'browser-monads'
import CUSTOM_TYPES from '../../meta/type-annotations.json' import CUSTOM_TYPES from '../../meta/type-annotations.json'
import { isString, htmlToReact } from './util' import { isString, htmlToReact } from './util'
import Link from './link' import Link, { OptionalLink } from './link'
import GitHubCode from './github' import GitHubCode from './github'
import classes from '../styles/code.module.sass' import classes from '../styles/code.module.sass'
@ -89,6 +89,91 @@ export const TypeAnnotation = ({ lang = 'python', link = true, children }) => {
) )
} }
function replacePrompt(line, prompt, isFirst = false) {
let result = line
const hasPrompt = result.startsWith(`${prompt} `)
const showPrompt = hasPrompt || isFirst
if (hasPrompt) result = result.slice(2)
return result && showPrompt ? `<span data-prompt="${prompt}">${result}</span>` : result
}
function parseArgs(raw) {
const commandGroups = ['init', 'debug', 'project']
let args = raw.split(' ').filter(arg => arg)
const result = {}
while (args.length) {
let opt = args.shift()
if (opt.length > 1 && opt.startsWith('-')) {
const isFlag = !args.length || (args[0].length > 1 && args[0].startsWith('-'))
result[opt] = isFlag ? true : args.shift()
} else {
const key = commandGroups.includes(opt) ? `${opt} ${args.shift()}` : opt
result[key] = null
}
}
return result
}
function formatCode(html, lang, prompt) {
if (lang === 'cli') {
const cliRegex = /^(\$ )?python -m spacy/
const lines = html
.trim()
.split('\n')
.map((line, i) => {
if (cliRegex.test(line)) {
const text = line.replace(cliRegex, '')
const args = parseArgs(text)
const cmd = Object.keys(args).map((key, i) => {
const value = args[key]
return value === null || value === true || i === 0 ? key : `${key} ${value}`
})
return (
<Fragment key={i}>
<span data-prompt="$" className={classes.cliArgSubtle}>
python -m
</span>{' '}
<span>spacy</span>{' '}
{cmd.map((item, j) => {
const isCmd = j === 0
const url = isCmd ? `/api/cli#${item.replace(' ', '-')}` : null
const isAbstract = isString(item) && /^\[(.+)\]$/.test(item)
const itemClassNames = classNames(classes.cliArg, {
[classes.cliArgHighlight]: isCmd,
[classes.cliArgEmphasis]: isAbstract,
})
const text = isAbstract ? item.slice(1, -1) : item
return (
<Fragment key={j}>
{j !== 0 && ' '}
<span className={itemClassNames}>
<OptionalLink hidden hideIcon to={url}>
{text}
</OptionalLink>
</span>
</Fragment>
)
})}
</Fragment>
)
}
const htmlLine = replacePrompt(highlightCode('bash', line), '$')
return htmlToReact(htmlLine)
})
return lines.map((line, i) => (
<Fragment key={i}>
{i !== 0 && <br />}
{line}
</Fragment>
))
}
const result = html
.split('\n')
.map((line, i) => (prompt ? replacePrompt(line, prompt, i === 0) : line))
.join('\n')
return htmlToReact(result)
}
export class Code extends React.Component { export class Code extends React.Component {
state = { Juniper: null } state = { Juniper: null }
@ -136,7 +221,8 @@ export class Code extends React.Component {
children, children,
} = this.props } = this.props
const codeClassNames = classNames(classes.code, className, `language-${lang}`, { const codeClassNames = classNames(classes.code, className, `language-${lang}`, {
[classes.wrap]: !!highlight || !!wrap, [classes.wrap]: !!highlight || !!wrap || lang === 'cli',
[classes.cli]: lang === 'cli',
}) })
const ghClassNames = classNames(codeClassNames, classes.maxHeight) const ghClassNames = classNames(codeClassNames, classes.maxHeight)
const { Juniper } = this.state const { Juniper } = this.state
@ -154,14 +240,14 @@ export class Code extends React.Component {
const codeText = Array.isArray(children) ? children.join('') : children || '' const codeText = Array.isArray(children) ? children.join('') : children || ''
const highlightRange = highlight ? rangeParser.parse(highlight).filter(n => n > 0) : [] const highlightRange = highlight ? rangeParser.parse(highlight).filter(n => n > 0) : []
const html = lang === 'none' ? codeText : highlightCode(lang, codeText, highlightRange) const rawHtml = ['none', 'cli'].includes(lang)
? codeText
: highlightCode(lang, codeText, highlightRange)
const html = formatCode(rawHtml, lang, prompt)
return ( return (
<> <>
{title && <h4 className={classes.title}>{title}</h4>} {title && <h4 className={classes.title}>{title}</h4>}
<code className={codeClassNames} data-prompt={prompt}> <code className={codeClassNames}>{html}</code>
{htmlToReact(html)}
</code>
</> </>
) )
} }
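
Reviewer note: the `parseArgs` helper added in this file is easiest to follow with a concrete input. The sketch below copies the function from the diff (with `let` changed to `const`) and runs it on two commands from the docs touched by this commit. Running it standalone in Node, outside the website build, is an assumption made for illustration, and `commandUrl` is a hypothetical helper name that only mirrors the inline `/api/cli#...` expression in `formatCode`.

```javascript
// Copy of parseArgs from the diff: turns the text after "python -m spacy"
// into an ordered map of command, positional arguments and options.
function parseArgs(raw) {
  const commandGroups = ['init', 'debug', 'project']
  const args = raw.split(' ').filter(arg => arg)
  const result = {}
  while (args.length) {
    const opt = args.shift()
    if (opt.length > 1 && opt.startsWith('-')) {
      // An option is a boolean flag if nothing follows it or the next token is another option
      const isFlag = !args.length || (args[0].length > 1 && args[0].startsWith('-'))
      result[opt] = isFlag ? true : args.shift()
    } else {
      // Command groups such as "debug" consume the next token as a subcommand
      const key = commandGroups.includes(opt) ? `${opt} ${args.shift()}` : opt
      result[key] = null
    }
  }
  return result
}

// Hypothetical helper mirroring the URL expression used for the first item in formatCode
function commandUrl(args) {
  const command = Object.keys(args)[0]
  return `/api/cli#${command.replace(' ', '-')}`
}

const train = parseArgs(' train config.cfg --output ./output --code ./functions.py')
// { train: null, 'config.cfg': null, '--output': './output', '--code': './functions.py' }

const debug = parseArgs(' debug data config.cfg --verbose')
// { 'debug data': null, 'config.cfg': null, '--verbose': true }

console.log(train, commandUrl(train)) // URL: /api/cli#train
console.log(debug, commandUrl(debug)) // URL: /api/cli#debug-data
```

Because results are keyed by token, a command that repeats the same option or positional value would keep only one entry for it, which seems acceptable for the short examples in the docs.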


@ -117,7 +117,7 @@ const Quickstart = ({
{help && ( {help && (
<span data-tooltip={help} className={classes.help}> <span data-tooltip={help} className={classes.help}>
{' '} {' '}
<Icon name="help" width={16} spaced /> <Icon name="help" width={16} />
</span> </span>
)} )}
</div> </div>
@ -201,7 +201,7 @@ const Quickstart = ({
className={classes.help} className={classes.help}
> >
{' '} {' '}
<Icon name="help" width={16} spaced /> <Icon name="help" width={16} />
</span> </span>
)} )}
</label> </label>

Binary file not shown.

Binary file not shown.


@ -28,7 +28,7 @@ $border-radius: 6px
margin-top: 0 !important margin-top: 0 !important
code code
padding: 0 padding: 0 !important
margin: 0 margin: 0
h4 h4


@ -27,7 +27,7 @@
padding: 1.75em 1.5em padding: 1.75em 1.5em
.code .code
&[data-prompt]:before, &[data-prompt]:before, span[data-prompt]:before
content: attr(data-prompt) content: attr(data-prompt)
margin-right: 0.65em margin-right: 0.65em
display: inline-block display: inline-block
@ -163,3 +163,31 @@
font-weight: normal font-weight: normal
padding-top: 0.1rem padding-top: 0.1rem
color: var(--color-subtle-dark) color: var(--color-subtle-dark)
.cli
padding-top: calc(var(--spacing-sm) - 6px)
padding-bottom: calc(var(--spacing-sm) - 12px)
[data-prompt]:before
color: var(--color-subtle)
.cli-arg
border: 1px solid var(--color-dark)
padding: 1px 6px
margin-bottom: 5px
border-radius: 0.5em
display: inline-block
a
color: inherit !important
.cli-arg-highlight
background: var(--color-theme)
border-color: var(--color-theme)
color: var(--color-back) !important
.cli-arg-subtle
color: var(--syntax-comment)
.cli-arg-emphasis
font-style: italic


@ -157,6 +157,14 @@
font-display: fallback font-display: fallback
src: url("../fonts/jetbrainsmono-regular.woff") format("woff"), url("../fonts/jetbrainsmono-regular.woff2") format("woff2") src: url("../fonts/jetbrainsmono-regular.woff") format("woff"), url("../fonts/jetbrainsmono-regular.woff2") format("woff2")
@font-face
font-family: "JetBrains Mono"
font-style: italic
font-weight: 500
font-display: fallback
src: url("../fonts/jetbrainsmono-italic.woff") format("woff"), url("../fonts/jetbrainsmono-italic.woff2") format("woff2")
/* Reset */ /* Reset */
*, *:before, *:after *, *:before, *:after
@ -366,6 +374,12 @@ body [id]:target
&.operator &.operator
color: var(--syntax-comment) color: var(--syntax-comment)
[class*="language-bash"] .token
&.function
color: var(--color-subtle)
&.operator, &.variable
color: var(--syntax-comment)
// Settings for ini syntax (config files) // Settings for ini syntax (config files)
[class*="language-ini"] [class*="language-ini"]