Mirror of https://github.com/explosion/spaCy.git (synced 2025-01-26 09:14:32 +03:00)

Update docs [ci skip]

parent c0f6e77a41
commit 13291e97ba

@@ -39,8 +39,8 @@ the model name to be specified with its version (e.g. `en_core_web_sm-2.2.0`).
 > to a local PyPi installation and fetching it straight from there. This will
 > also allow you to add it as a versioned package dependency to your project.

-```bash
-$ python -m spacy download [model] [--direct] [pip args]
+```cli
+$ python -m spacy download [model] [--direct] [pip_args]
 ```

 | Name | Description |
@@ -57,11 +57,11 @@ Print information about your spaCy installation, models and local setup, and
 generate [Markdown](https://en.wikipedia.org/wiki/Markdown)-formatted markup to
 copy-paste into [GitHub issues](https://github.com/explosion/spaCy/issues).

-```bash
+```cli
 $ python -m spacy info [--markdown] [--silent]
 ```

-```bash
+```cli
 $ python -m spacy info [model] [--markdown] [--silent]
 ```

@@ -88,7 +88,7 @@ and command for updating are shown.
 > suite, to ensure all models are up to date before proceeding. If incompatible
 > models are found, it will return `1`.

-```bash
+```cli
 $ python -m spacy validate
 ```

@@ -111,14 +111,14 @@ config. The settings you specify will impact the suggested model architectures
 and pipeline setup, as well as the hyperparameters. You can also adjust and
 customize those settings in your config file later.

-> ```bash
-> ### Example {wrap="true"}
+> #### Example
+>
+> ```cli
 > $ python -m spacy init config config.cfg --lang en --pipeline ner,textcat --optimize accuracy
 > ```

-```bash
-$ python -m spacy init config [output_file] [--lang] [--pipeline]
-[--optimize] [--cpu]
+```cli
+$ python -m spacy init config [output_file] [--lang] [--pipeline] [--optimize] [--cpu]
 ```

 | Name | Description |
@@ -143,12 +143,13 @@ be created, and their signatures are used to find the defaults. If your config
 contains a problem that can't be resolved automatically, spaCy will show you a
 validation error with more details.

-> ```bash
-> ### Example {wrap="true"}
+> #### Example
+>
+> ```cli
 > $ python -m spacy init fill-config base.cfg config.cfg
 > ```

-```bash
+```cli
 $ python -m spacy init fill-config [base_path] [output_file] [--diff]
 ```

@@ -175,9 +176,8 @@ The `init-model` command is now available as a subcommand of `spacy init`.

 </Infobox>

-```bash
-$ python -m spacy init model [lang] [output_dir] [--jsonl-loc] [--vectors-loc]
-[--prune-vectors]
+```cli
+$ python -m spacy init model [lang] [output_dir] [--jsonl-loc] [--vectors-loc] [--prune-vectors]
 ```

 | Name | Description |
@@ -200,10 +200,8 @@ Convert files into spaCy's
 management functions. The converter can be specified on the command line, or
 chosen based on the file extension of the input file.

-```bash
-$ python -m spacy convert [input_file] [output_dir] [--converter]
-[--file-type] [--n-sents] [--seg-sents] [--model] [--morphology]
-[--merge-subtokens] [--ner-map] [--lang]
+```cli
+$ python -m spacy convert [input_file] [output_dir] [--converter] [--file-type] [--n-sents] [--seg-sents] [--model] [--morphology] [--merge-subtokens] [--ner-map] [--lang]
 ```

 | Name | Description |
@@ -246,13 +244,13 @@ errors at once and some issues are only shown once previous errors have been
 fixed. To auto-fill a partial config and save the result, you can use the
 [`init fillconfig`](/api/cli#init-fill-config) command.

-```bash
+```cli
 $ python -m spacy debug config [config_path] [--code_path] [overrides]
 ```

 > #### Example
 >
-> ```bash
+> ```cli
 > $ python -m spacy debug config ./config.cfg
 > ```

@@ -298,14 +296,13 @@ takes the same arguments as `train` and reads settings off the

 </Infobox>

-```bash
-$ python -m spacy debug data [config_path] [--code] [--ignore-warnings]
-[--verbose] [--no-format] [overrides]
+```cli
+$ python -m spacy debug data [config_path] [--code] [--ignore-warnings] [--verbose] [--no-format] [overrides]
 ```

 > #### Example
 >
-> ```bash
+> ```cli
 > $ python -m spacy debug data ./config.cfg
 > ```

@@ -473,7 +470,7 @@ The `profile` command is now available as a subcommand of `spacy debug`.

 </Infobox>

-```bash
+```cli
 $ python -m spacy debug profile [model] [inputs] [--n-texts]
 ```

@@ -490,9 +487,8 @@ $ python -m spacy debug profile [model] [inputs] [--n-texts]
 Debug a Thinc [`Model`](https://thinc.ai/docs/api-model) by running it on a
 sample text and checking how it updates its internal weights and parameters.

-```bash
-$ python -m spacy debug model [config_path] [component] [--layers] [-DIM]
-[-PAR] [-GRAD] [-ATTR] [-P0] [-P1] [-P2] [P3] [--gpu-id]
+```cli
+$ python -m spacy debug model [config_path] [component] [--layers] [-DIM] [-PAR] [-GRAD] [-ATTR] [-P0] [-P1] [-P2] [P3] [--gpu-id]
 ```

 <Accordion title="Example outputs" spaced>
@@ -502,7 +498,7 @@ model ("Step 0"), which helps us to understand the internal structure of the
 Neural Network, and to focus on specific layers that we want to inspect further
 (see next example).

-```bash
+```cli
 $ python -m spacy debug model ./config.cfg tagger -P0
 ```

@@ -548,7 +544,7 @@ an all-zero matrix determined by the `nO` and `nI` dimensions. After a first
 training step (Step 2), this matrix has clearly updated its values through the
 training feedback loop.

-```bash
+```cli
 $ python -m spacy debug model ./config.cfg tagger -l "5,15" -DIM -PAR -P0 -P1 -P2
 ```

@@ -632,7 +628,7 @@ in the section `[paths]`.

 </Infobox>

-```bash
+```cli
 $ python -m spacy train [config_path] [--output] [--code] [--verbose] [overrides]
 ```

@@ -669,9 +665,8 @@ the [data format](/api/data-formats#config) for details.

 </Infobox>

-```bash
-$ python -m spacy pretrain [texts_loc] [output_dir] [config_path]
-[--code] [--resume-path] [--epoch-resume] [overrides]
+```cli
+$ python -m spacy pretrain [texts_loc] [output_dir] [config_path] [--code] [--resume-path] [--epoch-resume] [overrides]
 ```

 | Name | Description |
@@ -698,9 +693,8 @@ skew. To render a sample of dependency parses in a HTML file using the
 [displaCy visualizations](/usage/visualizers), set as output directory as the
 `--displacy-path` argument.

-```bash
-$ python -m spacy evaluate [model] [data_path] [--output] [--gold-preproc]
-[--gpu-id] [--displacy-path] [--displacy-limit]
+```cli
+$ python -m spacy evaluate [model] [data_path] [--output] [--gold-preproc] [--gpu-id] [--displacy-path] [--displacy-limit]
 ```

 | Name | Description |
@@ -733,17 +727,16 @@ this, you can set the `--no-sdist` flag.

 </Infobox>

-```bash
-$ python -m spacy package [input_dir] [output_dir] [--meta-path] [--create-meta]
-[--no-sdist] [--version] [--force]
+```cli
+$ python -m spacy package [input_dir] [output_dir] [--meta-path] [--create-meta] [--no-sdist] [--version] [--force]
 ```

 > #### Example
 >
-> ```bash
-> python -m spacy package /input /output
-> cd /output/en_model-0.0.0
-> pip install dist/en_model-0.0.0.tar.gz
+> ```cli
+> $ python -m spacy package /input /output
+> $ cd /output/en_model-0.0.0
+> $ pip install dist/en_model-0.0.0.tar.gz
 > ```

 | Name | Description |
@@ -775,19 +768,19 @@ can provide any other repo (public or private) that you have access to using the

 <!-- TODO: update example once we've decided on repo structure -->

-```bash
+```cli
 $ python -m spacy project clone [name] [dest] [--repo]
 ```

 > #### Example
 >
-> ```bash
+> ```cli
 > $ python -m spacy project clone some_example
 > ```
 >
 > Clone from custom repo:
 >
-> ```bash
+> ```cli
 > $ python -m spacy project clone template --repo https://github.com/your_org/your_repo
 > ```

@@ -810,13 +803,13 @@ considered "private" and you have to take care of putting them into the
 destination directory yourself. If a local path is provided, the asset is copied
 into the current project.

-```bash
+```cli
 $ python -m spacy project assets [project_dir]
 ```

 > #### Example
 >
-> ```bash
+> ```cli
 > $ python -m spacy project assets
 > ```

@@ -835,13 +828,13 @@ all commands in the workflow are run, in order. If commands define
 re-run if state has changed. For example, if the input dataset changes, a
 preprocessing command that depends on those files will be re-run.

-```bash
+```cli
 $ python -m spacy project run [subcommand] [project_dir] [--force] [--dry]
 ```

 > #### Example
 >
-> ```bash
+> ```cli
 > $ python -m spacy project run train
 > ```

@@ -874,16 +867,16 @@ You'll also need to add the assets you want to track with

 </Infobox>

-```bash
+```cli
 $ python -m spacy project dvc [project_dir] [workflow] [--force] [--verbose]
 ```

 > #### Example
 >
-> ```bash
-> git init
-> dvc init
-> python -m spacy project dvc all
+> ```cli
+> $ git init
+> $ dvc init
+> $ python -m spacy project dvc all
 > ```

 | Name | Description |

@@ -118,8 +118,8 @@ need paths, you can define them here. All config values can also be
 [`spacy train`](/api/cli#train), which is especially relevant for data paths
 that you don't want to hard-code in your config file.

-```bash
-$ python -m spacy train ./config.cfg --paths.train ./corpus/train.spacy
+```cli
+$ python -m spacy train config.cfg --paths.train ./corpus/train.spacy
 ```

 ### training {#config-training tag="section"}
@@ -209,8 +209,8 @@ objects to JSON, you can now serialize them directly using the
 [`spacy convert`](/api/cli) lets you convert your JSON data to the new `.spacy`
 format:

-```bash
-$ python -m spacy convert ./data.json ./output
+```cli
+$ python -m spacy convert ./data.json ./output.spacy
 ```

 </Infobox>

@@ -110,9 +110,9 @@ in `/opt/nvidia/cuda`, you would run:

 ```bash
 ### Installation with CUDA
-export CUDA_PATH="/opt/nvidia/cuda"
-pip install cupy-cuda102
-pip install spacy-transformers
+$ export CUDA_PATH="/opt/nvidia/cuda"
+$ pip install cupy-cuda102
+$ pip install spacy-transformers
 ```

 ### Runtime usage {#transformers-runtime}
@@ -130,7 +130,7 @@ The `Transformer` component sets the
 [`Doc._.trf_data`](/api/transformer#custom_attributes) extension attribute,
 which lets you access the transformers outputs at runtime.

-```bash
+```cli
 $ python -m spacy download en_core_trf_lg
 ```

@@ -292,8 +292,8 @@ function. You can make it available via the `--code` argument that can point to
 a Python file. For more details on training with custom code, see the
 [training documentation](/usage/training#custom-code).

-```bash
-$ python -m spacy train ./config.cfg --code ./code.py
+```cli
+python -m spacy train ./config.cfg --code ./code.py
 ```

 ### Customizing the model implementations {#training-custom-model}

@@ -40,7 +40,7 @@ $ pip install -U spacy
 > After installation you need to download a language model. For more info and
 > available models, see the [docs on models](/models).
 >
-> ```bash
+> ```cli
 > $ python -m spacy download en_core_web_sm
 >
 > >>> import spacy
@@ -62,9 +62,9 @@ When using pip it is generally recommended to install packages in a virtual
 environment to avoid modifying system state:

 ```bash
-python -m venv .env
-source .env/bin/activate
-pip install spacy
+$ python -m venv .env
+$ source .env/bin/activate
+$ pip install spacy
 ```

 ### conda {#conda}
@@ -106,9 +106,9 @@ links created in different virtual environments. It's recommended to run the
 command with `python -m` to make sure you're executing the correct version of
 spaCy.

-```bash
-pip install -U spacy
-python -m spacy validate
+```cli
+$ pip install -U spacy
+$ python -m spacy validate
 ```

 ### Run spaCy with GPU {#gpu new="2.0.14"}
@@ -156,15 +156,15 @@ system. See notes on [Ubuntu](#source-ubuntu), [macOS / OS X](#source-osx) and
 [Windows](#source-windows) for details.

 ```bash
-python -m pip install -U pip # update pip
-git clone https://github.com/explosion/spaCy # clone spaCy
-cd spaCy # navigate into directory
+$ python -m pip install -U pip # update pip
+$ git clone https://github.com/explosion/spaCy # clone spaCy
+$ cd spaCy # navigate into dir

-python -m venv .env # create environment in .env
-source .env/bin/activate # activate virtual environment
-\export PYTHONPATH=`pwd` # set Python path to spaCy directory
-pip install -r requirements.txt # install all requirements
-python setup.py build_ext --inplace # compile spaCy
+$ python -m venv .env # create environment in .env
+$ source .env/bin/activate # activate virtual env
+$ export PYTHONPATH=`pwd` # set Python path to spaCy dir
+$ pip install -r requirements.txt # install all requirements
+$ python setup.py build_ext --inplace # compile spaCy
 ```

 Compared to regular install via pip, the
@@ -209,20 +209,18 @@ that directory. Don't forget to also install the test utilities via spaCy's
 [`requirements.txt`](https://github.com/explosion/spaCy/tree/master/requirements.txt):

 ```bash
-python -c "import os; import spacy; print(os.path.dirname(spacy.__file__))"
-pip install -r path/to/requirements.txt
-python -m pytest [spacy directory]
+$ python -c "import os; import spacy; print(os.path.dirname(spacy.__file__))"
+$ pip install -r path/to/requirements.txt
+$ python -m pytest [spacy directory]
 ```

 Calling `pytest` on the spaCy directory will run only the basic tests. The flag
 `--slow` is optional and enables additional tests that take longer.

 ```bash
-# make sure you are using recent pytest version
-python -m pip install -U pytest
-
-python -m pytest [spacy directory] # basic tests
-python -m pytest [spacy directory] --slow # basic and slow tests
+$ python -m pip install -U pytest # update pytest
+$ python -m pytest [spacy directory] # basic tests
+$ python -m pytest [spacy directory] --slow # basic and slow tests
 ```

 ## Troubleshooting guide {#troubleshooting}
@@ -283,7 +281,7 @@ only 65535 in a narrow unicode build. You can check this by running the
 following command:

 ```bash
-python -c "import sys; print(sys.maxunicode)"
+$ python -c "import sys; print(sys.maxunicode)"
 ```

 If you're running a narrow unicode build, reinstall Python and use a wide
@@ -305,8 +303,8 @@ run `source ~/.bash_profile` or `source ~/.zshrc`. Make sure to add **both
 lines** for `LC_ALL` and `LANG`.

 ```bash
-\export LC_ALL=en_US.UTF-8
-\export LANG=en_US.UTF-8
+$ export LC_ALL=en_US.UTF-8
+$ export LANG=en_US.UTF-8
 ```

 </Accordion>

@@ -1588,9 +1588,9 @@ some nice Latin vectors. You can then pass the directory path to
 > doc1.similarity(doc2)
 > ```

-```bash
-wget https://s3-us-west-1.amazonaws.com/fasttext-vectors/word-vectors-v2/cc.la.300.vec.gz
-python -m spacy init model en /tmp/la_vectors_wiki_lg --vectors-loc cc.la.300.vec.gz
+```cli
+$ wget https://s3-us-west-1.amazonaws.com/fasttext-vectors/word-vectors-v2/cc.la.300.vec.gz
+$ python -m spacy init model en /tmp/la_vectors_wiki_lg --vectors-loc cc.la.300.vec.gz
 ```

 <Accordion title="How to optimize vector coverage" id="custom-vectors-coverage" spaced>
@@ -1649,8 +1649,8 @@ the vector of "leaving", which is identical. If you're using the
 option to easily reduce the size of the vectors as you add them to a spaCy
 model:

-```bash
-$ python -m spacy init model /tmp/la_vectors_web_md --vectors-loc la.300d.vec.tgz --prune-vectors 10000
+```cli
+$ python -m spacy init model en /tmp/la_vectors_web_md --vectors-loc la.300d.vec.tgz --prune-vectors 10000
 ```

 This will create a spaCy model with vectors for the first 10,000 words in the
@@ -1741,9 +1741,8 @@ language name, and even train models with it and refer to it in your
 > needs to be available during training. You can load a Python file containing
 > the code using the `--code` argument:
 >
-> ```bash
-> ### {wrap="true"}
-> $ python -m spacy train config.cfg --code code.py
+> ```cli
+> python -m spacy train config.cfg --code code.py
 > ```

 ```python

@@ -116,15 +116,10 @@ The Chinese language class supports three word segmentation options:

 <Infobox variant="warning">

-In spaCy v3, the default Chinese word segmenter has switched from Jieba to
-character segmentation.
-
-</Infobox>
-
-<Infobox variant="warning">
-
-Note that [`pkuseg`](https://github.com/lancopku/pkuseg-python) doesn't yet ship
-with pre-compiled wheels for Python 3.8. If you're running Python 3.8, you can
+In spaCy v3.0, the default Chinese word segmenter has switched from Jieba to
+character segmentation. Also note that
+[`pkuseg`](https://github.com/lancopku/pkuseg-python) doesn't yet ship with
+pre-compiled wheels for Python 3.8. If you're running Python 3.8, you can
 install it from our fork and compile it locally:

 ```bash
@@ -174,7 +169,7 @@ nlp.tokenizer.pkuseg_update_user_dict([], reset=True)

 </Accordion>

-<Accordion title="Details on pretrained and custom Chinese models">
+<Accordion title="Details on pretrained and custom Chinese models" spaced>

 The [Chinese models](/models/zh) provided by spaCy include a custom `pkuseg`
 model trained only on
@@ -247,20 +242,20 @@ best-matching model compatible with your spaCy installation.
 > + nlp = spacy.load("en_core_web_sm")
 > ```

-```bash
-# Download best-matching version of specific model for your spaCy installation
-python -m spacy download en_core_web_sm
+```cli
+# Download best-matching version of a model for your spaCy installation
+$ python -m spacy download en_core_web_sm

 # Download exact model version
-python -m spacy download en_core_web_sm-2.2.0 --direct
+$ python -m spacy download en_core_web_sm-3.0.0 --direct
 ```

 The download command will [install the model](/usage/models#download-pip) via
 pip and place the package in your `site-packages` directory.

-```bash
-pip install spacy
-python -m spacy download en_core_web_sm
+```cli
+$ pip install -U spacy
+$ python -m spacy download en_core_web_sm
 ```

 ```python
@@ -279,10 +274,10 @@ click on the archive link and copy it to your clipboard.

 ```bash
 # With external URL
-pip install https://github.com/explosion/spacy-models/releases/download/en_core_web_sm-3.0.0/en_core_web_sm-3.0.0.tar.gz
+$ pip install https://github.com/explosion/spacy-models/releases/download/en_core_web_sm-3.0.0/en_core_web_sm-3.0.0.tar.gz

 # With local file
-pip install /Users/you/en_core_web_sm-3.0.0.tar.gz
+$ pip install /Users/you/en_core_web_sm-3.0.0.tar.gz
 ```

 By default, this will install the model into your `site-packages` directory. You
@@ -305,7 +300,7 @@ archive consists of a model directory that contains another directory with the
 model data.

 ```yaml
-### Directory structure {highlight="7"}
+### Directory structure {highlight="6"}
 └── en_core_web_md-3.0.0.tar.gz # downloaded archive
 ├── setup.py # setup file for pip installation
 ├── meta.json # copy of model meta

@@ -67,8 +67,8 @@ project template and copies the files to a local directory. You can then run the
 project, e.g. to train a model and edit the commands and scripts to build fully
 custom workflows.

-```bash
-$ python -m spacy clone some_example_project
+```cli
+python -m spacy project clone some_example_project
 ```

 By default, the project will be cloned into the current working directory. You
@@ -95,9 +95,9 @@ to download and where to put them. The
 [`spacy project assets`](/api/cli#project-assets) will fetch the project assets
 for you:

 ```bash
-cd some_example_project
-python -m spacy project assets
-```
+$ cd some_example_project
+$ python -m spacy project assets
+```

 ### 3. Run a command {#run}
@@ -123,7 +123,7 @@ Commands consist of one or more steps and can be run with
 [`spacy project run`](/api/cli#project-run). The following will run the command
 `preprocess` defined in the `project.yml`:

-```bash
+```cli
 $ python -m spacy project run preprocess
 ```

@@ -156,7 +156,7 @@ to turn the best model artifact into an installable Python package. The
 following command run the workflow named `all` defined in the `project.yml`, and
 execute the commands it specifies, in order:

-```bash
+```cli
 $ python -m spacy project run all
 ```

@@ -379,8 +379,8 @@ The [`spacy project clone`](/api/cli#project-clone) command lets you customize
 the repo to clone from using the `--repo` option. It calls into `git`, so you'll
 be able to clone from any repo that you have access to, including private repos.

-```bash
-$ python -m spacy project your_project --repo https://github.com/you/repo
+```cli
+python -m spacy project clone your_project --repo https://github.com/you/repo
 ```

 At a minimum, a valid project template needs to contain a
@@ -445,9 +445,9 @@ to include support for remote storage like Google Cloud Storage, S3, Azure, SSH
 and more.

 ```bash
-pip install dvc # Install DVC
-git init # Initialize a Git repo
-dvc init # Initialize a DVC project
+$ pip install dvc # Install DVC
+$ git init # Initialize a Git repo
+$ dvc init # Initialize a DVC project
 ```

 <Infobox title="Important note on privacy" variant="warning">
@@ -466,8 +466,8 @@ can then manage your spaCy project like any other DVC project, run
 and [`dvc repro`](https://dvc.org/doc/command-reference/repro) to reproduce the
 workflow or individual commands.

-```bash
-$ python -m spacy project dvc [workflow name]
+```cli
+$ python -m spacy project dvc [workflow_name]
 ```

 <Infobox title="Important note for multiple workflows" variant="warning">
@@ -508,7 +508,7 @@ and evaluation set.

 > #### Example usage
 >
-> ```bash
+> ```cli
 > $ python -m spacy project run annotate
 > ```

@@ -595,7 +595,7 @@ spacy_streamlit.visualize(MODELS, DEFAULT_TEXT, visualizers=["ner"])

 > #### Example usage
 >
-> ```bash
+> ```cli
 > $ python -m spacy project run visualize
 > ```

@@ -636,8 +636,8 @@ API.

 > #### Example usage
 >
-> ```bash
-> $ python -m spacy project run visualize
+> ```cli
+> $ python -m spacy project run serve
 > ```

 <!-- prettier-ignore -->

@@ -562,11 +562,11 @@ import DisplaCyEntSnekHtml from 'images/displacy-ent-snek.html'
 ## Saving, loading and distributing models {#models}

 After training your model, you'll usually want to save its state, and load it
-back later. You can do this with the
-[`Language.to_disk()`](/api/language#to_disk) method:
+back later. You can do this with the [`Language.to_disk`](/api/language#to_disk)
+method:

 ```python
-nlp.to_disk('/home/me/data/en_example_model')
+nlp.to_disk("./en_example_model")
 ```

 The directory will be created if it doesn't exist, and the whole pipeline data,
@@ -629,8 +629,8 @@ docs.
 > }
 > ```

-```bash
-$ python -m spacy package /home/me/data/en_example_model /home/me/my_models
+```cli
+$ python -m spacy package ./en_example_model ./my_models
 ```

 This command will create a model package directory and will run

@@ -160,7 +160,7 @@ the website or company in a specific context.

 > #### Loading models
 >
-> ```bash
+> ```cli
 > $ python -m spacy download en_core_web_sm
 >
 > >>> import spacy

@@ -66,7 +66,7 @@ the [`init fill-config`](/api/cli#init-fill-config) command to fill in the
 remaining defaults. Training configs should always be **complete and without
 hidden defaults**, to keep your experiments reproducible.

-```bash
+```cli
 $ python -m spacy init fill-config base_config.cfg config.cfg
 ```

@@ -76,8 +76,8 @@ $ python -m spacy init fill-config base_config.cfg config.cfg
 > your training and development data, get useful stats, and find problems like
 > invalid entity annotations, cyclic dependencies, low data labels and more.
 >
-> ```bash
-> $ python -m spacy debug data config.cfg --verbose
+> ```cli
+> $ python -m spacy debug data config.cfg
 > ```

 Instead of exporting your starter config from the quickstart widget and
@@ -88,7 +88,7 @@ add your data and run [`train`](/api/cli#train) with your config. See the
 spaCy's binary `.spacy` format. You can either include the data paths in the
 `[paths]` section of your config, or pass them in via the command line.

-```bash
+```cli
 $ python -m spacy train config.cfg --output ./output --paths.train ./train.spacy --paths.dev ./dev.spacy
 ```

@@ -186,9 +186,8 @@ For cases like this, you can set additional command-line options starting with
 `--paths.train ./corpus/train.spacy` sets the `train` value in the `[paths]`
 block.

-```bash
-$ python -m spacy train config.cfg --paths.train ./corpus/train.spacy
---paths.dev ./corpus/dev.spacy --training.batch_size 128
+```cli
+$ python -m spacy train config.cfg --paths.train ./corpus/train.spacy --paths.dev ./corpus/dev.spacy --training.batch_size 128
 ```

 Only existing sections and values in the config can be overwritten. At the end
@@ -486,8 +485,9 @@ still look good.

 ### Training with custom code {#custom-code}

-> ```bash
-> ### Example {wrap="true"}
+> #### Example
+>
+> ```cli
 > $ python -m spacy train config.cfg --code functions.py
 > ```

@@ -605,9 +605,8 @@ you can now run [`spacy train`](/api/cli#train) and point the argument `--code`
 to your Python file. Before loading the config, spaCy will import the
 `functions.py` module and your custom functions will be registered.

-```bash
-### Training with custom code {wrap="true"}
-python -m spacy train config.cfg --output ./output --code ./functions.py
+```cli
+$ python -m spacy train config.cfg --output ./output --code ./functions.py
 ```

 #### Example: Custom batch size schedule {#custom-code-schedule}

@@ -212,14 +212,15 @@ Note that spaCy v3.0 now requires **Python 3.6+**.

 ### Removed or renamed API {#incompat-removed}

-| Removed | Replacement |
-| -------------------------------------------------------- | ----------------------------------------------------- |
-| `Language.disable_pipes` | [`Language.select_pipes`](/api/language#select_pipes) |
-| `GoldParse` | [`Example`](/api/example) |
-| `GoldCorpus` | [`Corpus`](/api/corpus) |
-| `spacy debug-data` | [`spacy debug data`](/api/cli#debug-data) |
-| `spacy profile` | [`spacy debug profile`](/api/cli#debug-profile) |
-| `spacy link`, `util.set_data_path`, `util.get_data_path` | not needed, model symlinks are deprecated |
+| Removed | Replacement |
+| ------------------------------------------------------ | ----------------------------------------------------------------------------------------- |
+| `Language.disable_pipes` | [`Language.select_pipes`](/api/language#select_pipes) |
+| `GoldParse` | [`Example`](/api/example) |
+| `GoldCorpus` | [`Corpus`](/api/corpus) |
+| `KnowledgeBase.load_bulk` `KnowledgeBase.dump` | [`KnowledgeBase.from_disk`](/api/kb#from_disk) [`KnowledgeBase.to_disk`](/api/kb#to_disk) |
+| `spacy debug-data` | [`spacy debug data`](/api/cli#debug-data) |
+| `spacy profile` | [`spacy debug profile`](/api/cli#debug-profile) |
+| `spacy link` `util.set_data_path` `util.get_data_path` | not needed, model symlinks are deprecated |

 The following deprecated methods, attributes and arguments were removed in v3.0.
 Most of them have been **deprecated for a while** and many would previously
@@ -412,12 +413,11 @@ spaCy v3.0 uses a new
 serializing a [`DocBin`](/api/docbin), which represents a collection of `Doc`
 objects. This means that you can train spaCy models using the same format it
 outputs: annotated `Doc` objects. The binary format is extremely **efficient in
-storage**, especially when packing multiple documents together.
+storage**, especially when packing multiple documents together. You can convert
+your existing JSON-formatted data using the [`spacy convert`](/api/cli#convert)
+command, which outputs `.spacy` files:

-You can convert your existing JSON-formatted data using the
-[`spacy convert`](/api/cli#convert) command, which outputs `.spacy` files:
-
-```bash
+```cli
 $ python -m spacy convert ./training.json ./output
 ```

@ -429,7 +429,7 @@ The easiest way to get started with a training config is to use the
requirements, and it will auto-generate a starter config with the best-matching
default settings.

```bash
```cli
$ python -m spacy init config ./config.cfg --lang en --pipeline tagger,parser
```

@ -8,7 +8,7 @@ import { window } from 'browser-monads'

import CUSTOM_TYPES from '../../meta/type-annotations.json'
import { isString, htmlToReact } from './util'
import Link from './link'
import Link, { OptionalLink } from './link'
import GitHubCode from './github'
import classes from '../styles/code.module.sass'

@ -89,6 +89,91 @@ export const TypeAnnotation = ({ lang = 'python', link = true, children }) => {
)
}

function replacePrompt(line, prompt, isFirst = false) {
let result = line
const hasPrompt = result.startsWith(`${prompt} `)
const showPrompt = hasPrompt || isFirst
if (hasPrompt) result = result.slice(2)
return result && showPrompt ? `<span data-prompt="${prompt}">${result}</span>` : result
}

function parseArgs(raw) {
const commandGroups = ['init', 'debug', 'project']
let args = raw.split(' ').filter(arg => arg)
const result = {}
while (args.length) {
let opt = args.shift()
if (opt.length > 1 && opt.startsWith('-')) {
const isFlag = !args.length || (args[0].length > 1 && args[0].startsWith('-'))
result[opt] = isFlag ? true : args.shift()
} else {
const key = commandGroups.includes(opt) ? `${opt} ${args.shift()}` : opt
result[key] = null
}
}
return result
}

function formatCode(html, lang, prompt) {
if (lang === 'cli') {
const cliRegex = /^(\$ )?python -m spacy/
const lines = html
.trim()
.split('\n')
.map((line, i) => {
if (cliRegex.test(line)) {
const text = line.replace(cliRegex, '')
const args = parseArgs(text)
const cmd = Object.keys(args).map((key, i) => {
const value = args[key]
return value === null || value === true || i === 0 ? key : `${key} ${value}`
})
return (
<Fragment key={i}>
<span data-prompt="$" className={classes.cliArgSubtle}>
python -m
</span>{' '}
<span>spacy</span>{' '}
{cmd.map((item, j) => {
const isCmd = j === 0
const url = isCmd ? `/api/cli#${item.replace(' ', '-')}` : null
const isAbstract = isString(item) && /^\[(.+)\]$/.test(item)
const itemClassNames = classNames(classes.cliArg, {
[classes.cliArgHighlight]: isCmd,
[classes.cliArgEmphasis]: isAbstract,
})
const text = isAbstract ? item.slice(1, -1) : item
return (
<Fragment key={j}>
{j !== 0 && ' '}
<span className={itemClassNames}>
<OptionalLink hidden hideIcon to={url}>
{text}
</OptionalLink>
</span>
</Fragment>
)
})}
</Fragment>
)
}
const htmlLine = replacePrompt(highlightCode('bash', line), '$')
return htmlToReact(htmlLine)
})
return lines.map((line, i) => (
<Fragment key={i}>
{i !== 0 && <br />}
{line}
</Fragment>
))
}
const result = html
.split('\n')
.map((line, i) => (prompt ? replacePrompt(line, prompt, i === 0) : line))
.join('\n')
return htmlToReact(result)
}

export class Code extends React.Component {
state = { Juniper: null }

@ -136,7 +221,8 @@ export class Code extends React.Component {
children,
} = this.props
const codeClassNames = classNames(classes.code, className, `language-${lang}`, {
[classes.wrap]: !!highlight || !!wrap,
[classes.wrap]: !!highlight || !!wrap || lang === 'cli',
[classes.cli]: lang === 'cli',
})
const ghClassNames = classNames(codeClassNames, classes.maxHeight)
const { Juniper } = this.state
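The `parseArgs` function added in the `@ -89,6 +89,91 @@` hunk above turns the part of a CLI line after `python -m spacy` into an ordered object: option names map to their value (or `true` for a bare flag), commands and positional arguments map to `null`, and a command-group word such as `init` or `debug` is merged with the word that follows it. The sketch below re-implements that logic together with the token-joining step from `formatCode` so it can be run on its own; it is an illustration, not the website's code, and the helper name `toTokens` is hypothetical.

```javascript
// Standalone sketch of the parseArgs logic from the diff (illustrative only)
const COMMAND_GROUPS = ['init', 'debug', 'project']

function parseArgs(raw) {
  // Split on single spaces and drop the empty strings left by repeated spaces
  const args = raw.split(' ').filter(arg => arg)
  const result = {}
  while (args.length) {
    const opt = args.shift()
    if (opt.length > 1 && opt.startsWith('-')) {
      // An option is a flag if nothing follows it or the next token is another option
      const isFlag = !args.length || (args[0].length > 1 && args[0].startsWith('-'))
      result[opt] = isFlag ? true : args.shift()
    } else {
      // Grouped commands such as "init config" become a single key
      const key = COMMAND_GROUPS.includes(opt) ? `${opt} ${args.shift()}` : opt
      result[key] = null
    }
  }
  return result
}

// Hypothetical helper: join keys and values back into display tokens,
// using the same rule as the cmd mapping inside formatCode
function toTokens(parsed) {
  return Object.keys(parsed).map((key, i) => {
    const value = parsed[key]
    return value === null || value === true || i === 0 ? key : `${key} ${value}`
  })
}

const parsed = parseArgs(' init config ./config.cfg --lang en --pipeline tagger,parser')
console.log(parsed)
console.log(toTokens(parsed))
```

For the `init config` command from the docs hunk above, this yields the tokens `init config`, `./config.cfg`, `--lang en` and `--pipeline tagger,parser`; `formatCode` then links the first token to `/api/cli#init-config` by replacing its space with a hyphen. Because any token of two or more characters that starts with `-` counts as an option, a negative value such as `-0.2` following an option is read as a separate flag rather than as that option's value.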
@ -154,14 +240,14 @@ export class Code extends React.Component {

const codeText = Array.isArray(children) ? children.join('') : children || ''
const highlightRange = highlight ? rangeParser.parse(highlight).filter(n => n > 0) : []
const html = lang === 'none' ? codeText : highlightCode(lang, codeText, highlightRange)

const rawHtml = ['none', 'cli'].includes(lang)
? codeText
: highlightCode(lang, codeText, highlightRange)
const html = formatCode(rawHtml, lang, prompt)
return (
<>
{title && <h4 className={classes.title}>{title}</h4>}
<code className={codeClassNames} data-prompt={prompt}>
{htmlToReact(html)}
</code>
<code className={codeClassNames}>{html}</code>
</>
)
}
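For languages other than `cli`, `formatCode` passes each line through `replacePrompt`, which strips a leading prompt and wraps the rest of the line in a `<span data-prompt=…>`. The `span[data-prompt]:before` rule in the stylesheet hunk then draws the prompt with `content: attr(data-prompt)`, presumably so that it is shown but not copied along with the command. As written in the diff, the prompt is removed with `slice(2)`, which assumes a one-character prompt such as `$` followed by a space. The sketch below is a hypothetical variant, not the website's code, that slices `prompt.length + 1` characters so a longer prompt (for example `>>>`) is handled the same way.

```javascript
// Hypothetical variant of replacePrompt from the diff: strips the prompt plus
// one space (prompt.length + 1 characters) instead of a fixed 2 characters
function replacePrompt(line, prompt, isFirst = false) {
  const marker = `${prompt} `
  const hasPrompt = line.startsWith(marker)
  // The first line of a block always gets a prompt, even without a marker
  const showPrompt = hasPrompt || isFirst
  const text = hasPrompt ? line.slice(marker.length) : line
  // Empty lines are returned unchanged, so no lone prompt is rendered
  return text && showPrompt ? `<span data-prompt="${prompt}">${text}</span>` : text
}

// Mirror the non-CLI branch of formatCode: process a block line by line
const block = ['$ pip install spacy', '$ python -m spacy validate', '# no prompt here'].join('\n')
const html = block
  .split('\n')
  .map((line, i) => replacePrompt(line, '$', i === 0))
  .join('\n')
console.log(html)
```

Each `$` line comes back wrapped in `<span data-prompt="$">`, while the comment line passes through unchanged because it neither starts with `$ ` nor is the first line of the block.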
@ -117,7 +117,7 @@ const Quickstart = ({
{help && (
<span data-tooltip={help} className={classes.help}>
{' '}
<Icon name="help" width={16} spaced />
<Icon name="help" width={16} />
</span>
)}
</div>
@ -201,7 +201,7 @@ const Quickstart = ({
className={classes.help}
>
{' '}
<Icon name="help" width={16} spaced />
<Icon name="help" width={16} />
</span>
)}
</label>
BIN
website/src/fonts/jetBrainsmono-italic.woff
Normal file
Binary file not shown.
BIN
website/src/fonts/jetbrainsmono-italic.woff2
Normal file
Binary file not shown.
@ -28,7 +28,7 @@ $border-radius: 6px
margin-top: 0 !important

code
padding: 0
padding: 0 !important
margin: 0

h4
@ -27,7 +27,7 @@
padding: 1.75em 1.5em

.code
&[data-prompt]:before,
&[data-prompt]:before, span[data-prompt]:before
content: attr(data-prompt)
margin-right: 0.65em
display: inline-block
@ -163,3 +163,31 @@
font-weight: normal
padding-top: 0.1rem
color: var(--color-subtle-dark)

.cli
padding-top: calc(var(--spacing-sm) - 6px)
padding-bottom: calc(var(--spacing-sm) - 12px)

[data-prompt]:before
color: var(--color-subtle)

.cli-arg
border: 1px solid var(--color-dark)
padding: 1px 6px
margin-bottom: 5px
border-radius: 0.5em
display: inline-block

a
color: inherit !important

.cli-arg-highlight
background: var(--color-theme)
border-color: var(--color-theme)
color: var(--color-back) !important

.cli-arg-subtle
color: var(--syntax-comment)

.cli-arg-emphasis
font-style: italic
@ -157,6 +157,14 @@
font-display: fallback
src: url("../fonts/jetbrainsmono-regular.woff") format("woff"), url("../fonts/jetbrainsmono-regular.woff2") format("woff2")

@font-face
font-family: "JetBrains Mono"
font-style: italic
font-weight: 500
font-display: fallback
src: url("../fonts/jetbrainsmono-italic.woff") format("woff"), url("../fonts/jetbrainsmono-italic.woff2") format("woff2")


/* Reset */

*, *:before, *:after
@ -366,6 +374,12 @@ body [id]:target
&.operator
color: var(--syntax-comment)

[class*="language-bash"] .token
&.function
color: var(--color-subtle)

&.operator, &.variable
color: var(--syntax-comment)

// Settings for ini syntax (config files)
[class*="language-ini"]