diff --git a/website/docs/api/cli.md b/website/docs/api/cli.md index c7a1c3f06..a86c920ad 100644 --- a/website/docs/api/cli.md +++ b/website/docs/api/cli.md @@ -39,8 +39,8 @@ the model name to be specified with its version (e.g. `en_core_web_sm-2.2.0`). > to a local PyPi installation and fetching it straight from there. This will > also allow you to add it as a versioned package dependency to your project. -```bash -$ python -m spacy download [model] [--direct] [pip args] +```cli +$ python -m spacy download [model] [--direct] [pip_args] ``` | Name | Description | @@ -57,11 +57,11 @@ Print information about your spaCy installation, models and local setup, and generate [Markdown](https://en.wikipedia.org/wiki/Markdown)-formatted markup to copy-paste into [GitHub issues](https://github.com/explosion/spaCy/issues). -```bash +```cli $ python -m spacy info [--markdown] [--silent] ``` -```bash +```cli $ python -m spacy info [model] [--markdown] [--silent] ``` @@ -88,7 +88,7 @@ and command for updating are shown. > suite, to ensure all models are up to date before proceeding. If incompatible > models are found, it will return `1`. -```bash +```cli $ python -m spacy validate ``` @@ -111,14 +111,14 @@ config. The settings you specify will impact the suggested model architectures and pipeline setup, as well as the hyperparameters. You can also adjust and customize those settings in your config file later. -> ```bash -> ### Example {wrap="true"} +> #### Example +> +> ```cli > $ python -m spacy init config config.cfg --lang en --pipeline ner,textcat --optimize accuracy > ``` -```bash -$ python -m spacy init config [output_file] [--lang] [--pipeline] -[--optimize] [--cpu] +```cli +$ python -m spacy init config [output_file] [--lang] [--pipeline] [--optimize] [--cpu] ``` | Name | Description | @@ -143,12 +143,13 @@ be created, and their signatures are used to find the defaults. 
If your config contains a problem that can't be resolved automatically, spaCy will show you a validation error with more details. -> ```bash -> ### Example {wrap="true"} +> #### Example +> +> ```cli > $ python -m spacy init fill-config base.cfg config.cfg > ``` -```bash +```cli $ python -m spacy init fill-config [base_path] [output_file] [--diff] ``` @@ -175,9 +176,8 @@ The `init-model` command is now available as a subcommand of `spacy init`. -```bash -$ python -m spacy init model [lang] [output_dir] [--jsonl-loc] [--vectors-loc] -[--prune-vectors] +```cli +$ python -m spacy init model [lang] [output_dir] [--jsonl-loc] [--vectors-loc] [--prune-vectors] ``` | Name | Description | @@ -200,10 +200,8 @@ Convert files into spaCy's management functions. The converter can be specified on the command line, or chosen based on the file extension of the input file. -```bash -$ python -m spacy convert [input_file] [output_dir] [--converter] -[--file-type] [--n-sents] [--seg-sents] [--model] [--morphology] -[--merge-subtokens] [--ner-map] [--lang] +```cli +$ python -m spacy convert [input_file] [output_dir] [--converter] [--file-type] [--n-sents] [--seg-sents] [--model] [--morphology] [--merge-subtokens] [--ner-map] [--lang] ``` | Name | Description | @@ -246,13 +244,13 @@ errors at once and some issues are only shown once previous errors have been fixed. To auto-fill a partial config and save the result, you can use the [`init fillconfig`](/api/cli#init-fill-config) command. 
-```bash +```cli $ python -m spacy debug config [config_path] [--code_path] [overrides] ``` > #### Example > -> ```bash +> ```cli > $ python -m spacy debug config ./config.cfg > ``` @@ -298,14 +296,13 @@ takes the same arguments as `train` and reads settings off the -```bash -$ python -m spacy debug data [config_path] [--code] [--ignore-warnings] -[--verbose] [--no-format] [overrides] +```cli +$ python -m spacy debug data [config_path] [--code] [--ignore-warnings] [--verbose] [--no-format] [overrides] ``` > #### Example > -> ```bash +> ```cli > $ python -m spacy debug data ./config.cfg > ``` @@ -473,7 +470,7 @@ The `profile` command is now available as a subcommand of `spacy debug`. -```bash +```cli $ python -m spacy debug profile [model] [inputs] [--n-texts] ``` @@ -490,9 +487,8 @@ $ python -m spacy debug profile [model] [inputs] [--n-texts] Debug a Thinc [`Model`](https://thinc.ai/docs/api-model) by running it on a sample text and checking how it updates its internal weights and parameters. -```bash -$ python -m spacy debug model [config_path] [component] [--layers] [-DIM] -[-PAR] [-GRAD] [-ATTR] [-P0] [-P1] [-P2] [P3] [--gpu-id] +```cli +$ python -m spacy debug model [config_path] [component] [--layers] [-DIM] [-PAR] [-GRAD] [-ATTR] [-P0] [-P1] [-P2] [P3] [--gpu-id] ``` @@ -502,7 +498,7 @@ model ("Step 0"), which helps us to understand the internal structure of the Neural Network, and to focus on specific layers that we want to inspect further (see next example). -```bash +```cli $ python -m spacy debug model ./config.cfg tagger -P0 ``` @@ -548,7 +544,7 @@ an all-zero matrix determined by the `nO` and `nI` dimensions. After a first training step (Step 2), this matrix has clearly updated its values through the training feedback loop. -```bash +```cli $ python -m spacy debug model ./config.cfg tagger -l "5,15" -DIM -PAR -P0 -P1 -P2 ``` @@ -632,7 +628,7 @@ in the section `[paths]`. 
-```bash +```cli $ python -m spacy train [config_path] [--output] [--code] [--verbose] [overrides] ``` @@ -669,9 +665,8 @@ the [data format](/api/data-formats#config) for details. -```bash -$ python -m spacy pretrain [texts_loc] [output_dir] [config_path] -[--code] [--resume-path] [--epoch-resume] [overrides] +```cli +$ python -m spacy pretrain [texts_loc] [output_dir] [config_path] [--code] [--resume-path] [--epoch-resume] [overrides] ``` | Name | Description | @@ -698,9 +693,8 @@ skew. To render a sample of dependency parses in a HTML file using the [displaCy visualizations](/usage/visualizers), set as output directory as the `--displacy-path` argument. -```bash -$ python -m spacy evaluate [model] [data_path] [--output] [--gold-preproc] -[--gpu-id] [--displacy-path] [--displacy-limit] +```cli +$ python -m spacy evaluate [model] [data_path] [--output] [--gold-preproc] [--gpu-id] [--displacy-path] [--displacy-limit] ``` | Name | Description | @@ -733,17 +727,16 @@ this, you can set the `--no-sdist` flag. 
-```bash -$ python -m spacy package [input_dir] [output_dir] [--meta-path] [--create-meta] -[--no-sdist] [--version] [--force] +```cli +$ python -m spacy package [input_dir] [output_dir] [--meta-path] [--create-meta] [--no-sdist] [--version] [--force] ``` > #### Example > -> ```bash -> python -m spacy package /input /output -> cd /output/en_model-0.0.0 -> pip install dist/en_model-0.0.0.tar.gz +> ```cli +> $ python -m spacy package /input /output +> $ cd /output/en_model-0.0.0 +> $ pip install dist/en_model-0.0.0.tar.gz > ``` | Name | Description | @@ -775,19 +768,19 @@ can provide any other repo (public or private) that you have access to using the -```bash +```cli $ python -m spacy project clone [name] [dest] [--repo] ``` > #### Example > -> ```bash +> ```cli > $ python -m spacy project clone some_example > ``` > > Clone from custom repo: > -> ```bash +> ```cli > $ python -m spacy project clone template --repo https://github.com/your_org/your_repo > ``` @@ -810,13 +803,13 @@ considered "private" and you have to take care of putting them into the destination directory yourself. If a local path is provided, the asset is copied into the current project. -```bash +```cli $ python -m spacy project assets [project_dir] ``` > #### Example > -> ```bash +> ```cli > $ python -m spacy project assets > ``` @@ -835,13 +828,13 @@ all commands in the workflow are run, in order. If commands define re-run if state has changed. For example, if the input dataset changes, a preprocessing command that depends on those files will be re-run. 
-```bash +```cli $ python -m spacy project run [subcommand] [project_dir] [--force] [--dry] ``` > #### Example > -> ```bash +> ```cli > $ python -m spacy project run train > ``` @@ -874,16 +867,16 @@ You'll also need to add the assets you want to track with -```bash +```cli $ python -m spacy project dvc [project_dir] [workflow] [--force] [--verbose] ``` > #### Example > -> ```bash -> git init -> dvc init -> python -m spacy project dvc all +> ```cli +> $ git init +> $ dvc init +> $ python -m spacy project dvc all > ``` | Name | Description | diff --git a/website/docs/api/data-formats.md b/website/docs/api/data-formats.md index 56528de43..701e16b1e 100644 --- a/website/docs/api/data-formats.md +++ b/website/docs/api/data-formats.md @@ -118,8 +118,8 @@ need paths, you can define them here. All config values can also be [`spacy train`](/api/cli#train), which is especially relevant for data paths that you don't want to hard-code in your config file. -```bash -$ python -m spacy train ./config.cfg --paths.train ./corpus/train.spacy +```cli +$ python -m spacy train config.cfg --paths.train ./corpus/train.spacy ``` ### training {#config-training tag="section"} @@ -209,8 +209,8 @@ objects to JSON, you can now serialize them directly using the [`spacy convert`](/api/cli) lets you convert your JSON data to the new `.spacy` format: -```bash -$ python -m spacy convert ./data.json ./output +```cli +$ python -m spacy convert ./data.json ./output.spacy ``` diff --git a/website/docs/usage/embeddings-transformers.md b/website/docs/usage/embeddings-transformers.md index df9e68282..5a3189ecb 100644 --- a/website/docs/usage/embeddings-transformers.md +++ b/website/docs/usage/embeddings-transformers.md @@ -110,9 +110,9 @@ in `/opt/nvidia/cuda`, you would run: ```bash ### Installation with CUDA -export CUDA_PATH="/opt/nvidia/cuda" -pip install cupy-cuda102 -pip install spacy-transformers +$ export CUDA_PATH="/opt/nvidia/cuda" +$ pip install cupy-cuda102 +$ pip install spacy-transformers 
``` ### Runtime usage {#transformers-runtime} @@ -130,7 +130,7 @@ The `Transformer` component sets the [`Doc._.trf_data`](/api/transformer#custom_attributes) extension attribute, which lets you access the transformers outputs at runtime. -```bash +```cli $ python -m spacy download en_core_trf_lg ``` @@ -292,8 +292,8 @@ function. You can make it available via the `--code` argument that can point to a Python file. For more details on training with custom code, see the [training documentation](/usage/training#custom-code). -```bash -$ python -m spacy train ./config.cfg --code ./code.py +```cli +$ python -m spacy train ./config.cfg --code ./code.py ``` ### Customizing the model implementations {#training-custom-model} diff --git a/website/docs/usage/index.md b/website/docs/usage/index.md index bda9f76d6..90e02aef7 100644 --- a/website/docs/usage/index.md +++ b/website/docs/usage/index.md @@ -40,7 +40,7 @@ $ pip install -U spacy > After installation you need to download a language model. For more info and > available models, see the [docs on models](/models). > -> ```bash +> ```cli > $ python -m spacy download en_core_web_sm > > >>> import spacy @@ -62,9 +62,9 @@ When using pip it is generally recommended to install packages in a virtual environment to avoid modifying system state: ```bash -python -m venv .env -source .env/bin/activate -pip install spacy +$ python -m venv .env +$ source .env/bin/activate +$ pip install spacy ``` ### conda {#conda} @@ -106,9 +106,9 @@ links created in different virtual environments. It's recommended to run the command with `python -m` to make sure you're executing the correct version of spaCy. -```bash -pip install -U spacy -python -m spacy validate +```cli +$ pip install -U spacy +$ python -m spacy validate ``` ### Run spaCy with GPU {#gpu new="2.0.14"} @@ -156,15 +156,15 @@ system. See notes on [Ubuntu](#source-ubuntu), [macOS / OS X](#source-osx) and [Windows](#source-windows) for details.
```bash -python -m pip install -U pip # update pip -git clone https://github.com/explosion/spaCy # clone spaCy -cd spaCy # navigate into directory +$ python -m pip install -U pip # update pip +$ git clone https://github.com/explosion/spaCy # clone spaCy +$ cd spaCy # navigate into dir -python -m venv .env # create environment in .env -source .env/bin/activate # activate virtual environment -\export PYTHONPATH=`pwd` # set Python path to spaCy directory -pip install -r requirements.txt # install all requirements -python setup.py build_ext --inplace # compile spaCy +$ python -m venv .env # create environment in .env +$ source .env/bin/activate # activate virtual env +$ export PYTHONPATH=`pwd` # set Python path to spaCy dir +$ pip install -r requirements.txt # install all requirements +$ python setup.py build_ext --inplace # compile spaCy ``` Compared to regular install via pip, the @@ -209,20 +209,18 @@ that directory. Don't forget to also install the test utilities via spaCy's [`requirements.txt`](https://github.com/explosion/spaCy/tree/master/requirements.txt): ```bash -python -c "import os; import spacy; print(os.path.dirname(spacy.__file__))" -pip install -r path/to/requirements.txt -python -m pytest [spacy directory] +$ python -c "import os; import spacy; print(os.path.dirname(spacy.__file__))" +$ pip install -r path/to/requirements.txt +$ python -m pytest [spacy directory] ``` Calling `pytest` on the spaCy directory will run only the basic tests. The flag `--slow` is optional and enables additional tests that take longer. 
```bash -# make sure you are using recent pytest version -python -m pip install -U pytest - -python -m pytest [spacy directory] # basic tests -python -m pytest [spacy directory] --slow # basic and slow tests +$ python -m pip install -U pytest # update pytest +$ python -m pytest [spacy directory] # basic tests +$ python -m pytest [spacy directory] --slow # basic and slow tests ``` ## Troubleshooting guide {#troubleshooting} @@ -283,7 +281,7 @@ only 65535 in a narrow unicode build. You can check this by running the following command: ```bash -python -c "import sys; print(sys.maxunicode)" +$ python -c "import sys; print(sys.maxunicode)" ``` If you're running a narrow unicode build, reinstall Python and use a wide @@ -305,8 +303,8 @@ run `source ~/.bash_profile` or `source ~/.zshrc`. Make sure to add **both lines** for `LC_ALL` and `LANG`. ```bash -\export LC_ALL=en_US.UTF-8 -\export LANG=en_US.UTF-8 +$ export LC_ALL=en_US.UTF-8 +$ export LANG=en_US.UTF-8 ``` diff --git a/website/docs/usage/linguistic-features.md b/website/docs/usage/linguistic-features.md index 325063e58..10efcf875 100644 --- a/website/docs/usage/linguistic-features.md +++ b/website/docs/usage/linguistic-features.md @@ -1588,9 +1588,9 @@ some nice Latin vectors. You can then pass the directory path to > doc1.similarity(doc2) > ``` -```bash -wget https://s3-us-west-1.amazonaws.com/fasttext-vectors/word-vectors-v2/cc.la.300.vec.gz -python -m spacy init model en /tmp/la_vectors_wiki_lg --vectors-loc cc.la.300.vec.gz +```cli +$ wget https://s3-us-west-1.amazonaws.com/fasttext-vectors/word-vectors-v2/cc.la.300.vec.gz +$ python -m spacy init model en /tmp/la_vectors_wiki_lg --vectors-loc cc.la.300.vec.gz ``` @@ -1649,8 +1649,8 @@ the vector of "leaving", which is identical. 
If you're using the option to easily reduce the size of the vectors as you add them to a spaCy model: -```bash -$ python -m spacy init model /tmp/la_vectors_web_md --vectors-loc la.300d.vec.tgz --prune-vectors 10000 +```cli +$ python -m spacy init model en /tmp/la_vectors_web_md --vectors-loc la.300d.vec.tgz --prune-vectors 10000 ``` This will create a spaCy model with vectors for the first 10,000 words in the @@ -1741,9 +1741,8 @@ language name, and even train models with it and refer to it in your > needs to be available during training. You can load a Python file containing > the code using the `--code` argument: > -> ```bash -> ### {wrap="true"} -> $ python -m spacy train config.cfg --code code.py +> ```cli +> $ python -m spacy train config.cfg --code code.py > ``` ```python diff --git a/website/docs/usage/models.md b/website/docs/usage/models.md index be98cd36c..ec0e02297 100644 --- a/website/docs/usage/models.md +++ b/website/docs/usage/models.md @@ -116,15 +116,10 @@ The Chinese language class supports three word segmentation options: -In spaCy v3, the default Chinese word segmenter has switched from Jieba to -character segmentation. - - - - - -Note that [`pkuseg`](https://github.com/lancopku/pkuseg-python) doesn't yet ship -with pre-compiled wheels for Python 3.8. If you're running Python 3.8, you can +In spaCy v3.0, the default Chinese word segmenter has switched from Jieba to +character segmentation. Also note that +[`pkuseg`](https://github.com/lancopku/pkuseg-python) doesn't yet ship with +pre-compiled wheels for Python 3.8. If you're running Python 3.8, you can install it from our fork and compile it locally: ```bash @@ -174,7 +169,7 @@ nlp.tokenizer.pkuseg_update_user_dict([], reset=True) - + The [Chinese models](/models/zh) provided by spaCy include a custom `pkuseg` model trained only on @@ -247,20 +242,20 @@ best-matching model compatible with your spaCy installation.
> + nlp = spacy.load("en_core_web_sm") > ``` -```bash -# Download best-matching version of specific model for your spaCy installation -python -m spacy download en_core_web_sm +```cli +# Download best-matching version of a model for your spaCy installation +$ python -m spacy download en_core_web_sm # Download exact model version -python -m spacy download en_core_web_sm-2.2.0 --direct +$ python -m spacy download en_core_web_sm-3.0.0 --direct ``` The download command will [install the model](/usage/models#download-pip) via pip and place the package in your `site-packages` directory. -```bash -pip install spacy -python -m spacy download en_core_web_sm +```cli +$ pip install -U spacy +$ python -m spacy download en_core_web_sm ``` ```python @@ -279,10 +274,10 @@ click on the archive link and copy it to your clipboard. ```bash # With external URL -pip install https://github.com/explosion/spacy-models/releases/download/en_core_web_sm-3.0.0/en_core_web_sm-3.0.0.tar.gz +$ pip install https://github.com/explosion/spacy-models/releases/download/en_core_web_sm-3.0.0/en_core_web_sm-3.0.0.tar.gz # With local file -pip install /Users/you/en_core_web_sm-3.0.0.tar.gz +$ pip install /Users/you/en_core_web_sm-3.0.0.tar.gz ``` By default, this will install the model into your `site-packages` directory. You @@ -305,7 +300,7 @@ archive consists of a model directory that contains another directory with the model data. ```yaml -### Directory structure {highlight="7"} +### Directory structure {highlight="6"} └── en_core_web_md-3.0.0.tar.gz # downloaded archive ├── setup.py # setup file for pip installation ├── meta.json # copy of model meta diff --git a/website/docs/usage/projects.md b/website/docs/usage/projects.md index ccf8ec49f..ab8101477 100644 --- a/website/docs/usage/projects.md +++ b/website/docs/usage/projects.md @@ -67,8 +67,8 @@ project template and copies the files to a local directory. You can then run the project, e.g. 
to train a model and edit the commands and scripts to build fully custom workflows. -```bash -$ python -m spacy clone some_example_project +```cli +$ python -m spacy project clone some_example_project ``` By default, the project will be cloned into the current working directory. You @@ -95,9 +95,9 @@ to download and where to put them. The [`spacy project assets`](/api/cli#project-assets) will fetch the project assets for you: -```bash -cd some_example_project -python -m spacy project assets +```cli +$ cd some_example_project +$ python -m spacy project assets ``` ### 3. Run a command {#run} @@ -123,7 +123,7 @@ Commands consist of one or more steps and can be run with [`spacy project run`](/api/cli#project-run). The following will run the command `preprocess` defined in the `project.yml`: -```bash +```cli $ python -m spacy project run preprocess ``` @@ -156,7 +156,7 @@ to turn the best model artifact into an installable Python package. The following command run the workflow named `all` defined in the `project.yml`, and execute the commands it specifies, in order: -```bash +```cli $ python -m spacy project run all ``` @@ -379,8 +379,8 @@ The [`spacy project clone`](/api/cli#project-clone) command lets you customize the repo to clone from using the `--repo` option. It calls into `git`, so you'll be able to clone from any repo that you have access to, including private repos. -```bash -$ python -m spacy project your_project --repo https://github.com/you/repo +```cli +$ python -m spacy project clone your_project --repo https://github.com/you/repo ``` At a minimum, a valid project template needs to contain a @@ -445,9 +445,9 @@ to include support for remote storage like Google Cloud Storage, S3, Azure, SSH and more.
```bash -pip install dvc # Install DVC -git init # Initialize a Git repo -dvc init # Initialize a DVC project +$ pip install dvc # Install DVC +$ git init # Initialize a Git repo +$ dvc init # Initialize a DVC project ``` @@ -466,8 +466,8 @@ can then manage your spaCy project like any other DVC project, run and [`dvc repro`](https://dvc.org/doc/command-reference/repro) to reproduce the workflow or individual commands. -```bash -$ python -m spacy project dvc [workflow name] +```cli +$ python -m spacy project dvc [workflow_name] ``` @@ -508,7 +508,7 @@ and evaluation set. > #### Example usage > -> ```bash +> ```cli > $ python -m spacy project run annotate > ``` @@ -595,7 +595,7 @@ spacy_streamlit.visualize(MODELS, DEFAULT_TEXT, visualizers=["ner"]) > #### Example usage > -> ```bash +> ```cli > $ python -m spacy project run visualize > ``` @@ -636,8 +636,8 @@ API. > #### Example usage > -> ```bash -> $ python -m spacy project run visualize +> ```cli +> $ python -m spacy project run serve > ``` diff --git a/website/docs/usage/saving-loading.md b/website/docs/usage/saving-loading.md index f8bb1bfa9..5fb8fc98b 100644 --- a/website/docs/usage/saving-loading.md +++ b/website/docs/usage/saving-loading.md @@ -562,11 +562,11 @@ import DisplaCyEntSnekHtml from 'images/displacy-ent-snek.html' ## Saving, loading and distributing models {#models} After training your model, you'll usually want to save its state, and load it -back later. You can do this with the -[`Language.to_disk()`](/api/language#to_disk) method: +back later. You can do this with the [`Language.to_disk`](/api/language#to_disk) +method: ```python -nlp.to_disk('/home/me/data/en_example_model') +nlp.to_disk("./en_example_model") ``` The directory will be created if it doesn't exist, and the whole pipeline data, @@ -629,8 +629,8 @@ docs. 
> } > ``` -```bash -$ python -m spacy package /home/me/data/en_example_model /home/me/my_models +```cli +$ python -m spacy package ./en_example_model ./my_models ``` This command will create a model package directory and will run diff --git a/website/docs/usage/spacy-101.md b/website/docs/usage/spacy-101.md index df08e0320..8ea6a6ca0 100644 --- a/website/docs/usage/spacy-101.md +++ b/website/docs/usage/spacy-101.md @@ -160,7 +160,7 @@ the website or company in a specific context. > #### Loading models > -> ```bash +> ```cli > $ python -m spacy download en_core_web_sm > > >>> import spacy diff --git a/website/docs/usage/training.md b/website/docs/usage/training.md index 31ba902b0..6829d38c0 100644 --- a/website/docs/usage/training.md +++ b/website/docs/usage/training.md @@ -66,7 +66,7 @@ the [`init fill-config`](/api/cli#init-fill-config) command to fill in the remaining defaults. Training configs should always be **complete and without hidden defaults**, to keep your experiments reproducible. -```bash +```cli $ python -m spacy init fill-config base_config.cfg config.cfg ``` @@ -76,8 +76,8 @@ $ python -m spacy init fill-config base_config.cfg config.cfg > your training and development data, get useful stats, and find problems like > invalid entity annotations, cyclic dependencies, low data labels and more. > -> ```bash -> $ python -m spacy debug data config.cfg --verbose +> ```cli +> $ python -m spacy debug data config.cfg > ``` Instead of exporting your starter config from the quickstart widget and @@ -88,7 +88,7 @@ add your data and run [`train`](/api/cli#train) with your config. See the spaCy's binary `.spacy` format. You can either include the data paths in the `[paths]` section of your config, or pass them in via the command line. 
-```bash +```cli $ python -m spacy train config.cfg --output ./output --paths.train ./train.spacy --paths.dev ./dev.spacy ``` @@ -186,9 +186,8 @@ For cases like this, you can set additional command-line options starting with `--paths.train ./corpus/train.spacy` sets the `train` value in the `[paths]` block. -```bash -$ python -m spacy train config.cfg --paths.train ./corpus/train.spacy ---paths.dev ./corpus/dev.spacy --training.batch_size 128 +```cli +$ python -m spacy train config.cfg --paths.train ./corpus/train.spacy --paths.dev ./corpus/dev.spacy --training.batch_size 128 ``` Only existing sections and values in the config can be overwritten. At the end @@ -486,8 +485,9 @@ still look good. ### Training with custom code {#custom-code} -> ```bash -> ### Example {wrap="true"} +> #### Example +> +> ```cli > $ python -m spacy train config.cfg --code functions.py > ``` @@ -605,9 +605,8 @@ you can now run [`spacy train`](/api/cli#train) and point the argument `--code` to your Python file. Before loading the config, spaCy will import the `functions.py` module and your custom functions will be registered. -```bash -### Training with custom code {wrap="true"} -python -m spacy train config.cfg --output ./output --code ./functions.py +```cli +$ python -m spacy train config.cfg --output ./output --code ./functions.py ``` #### Example: Custom batch size schedule {#custom-code-schedule} diff --git a/website/docs/usage/v3.md b/website/docs/usage/v3.md index ee90b6cc5..47110609e 100644 --- a/website/docs/usage/v3.md +++ b/website/docs/usage/v3.md @@ -212,14 +212,15 @@ Note that spaCy v3.0 now requires **Python 3.6+**. 
### Removed or renamed API {#incompat-removed} -| Removed | Replacement | -| -------------------------------------------------------- | ----------------------------------------------------- | -| `Language.disable_pipes` | [`Language.select_pipes`](/api/language#select_pipes) | -| `GoldParse` | [`Example`](/api/example) | -| `GoldCorpus` | [`Corpus`](/api/corpus) | -| `spacy debug-data` | [`spacy debug data`](/api/cli#debug-data) | -| `spacy profile` | [`spacy debug profile`](/api/cli#debug-profile) | -| `spacy link`, `util.set_data_path`, `util.get_data_path` | not needed, model symlinks are deprecated | +| Removed | Replacement | +| ------------------------------------------------------ | ----------------------------------------------------------------------------------------- | +| `Language.disable_pipes` | [`Language.select_pipes`](/api/language#select_pipes) | +| `GoldParse` | [`Example`](/api/example) | +| `GoldCorpus` | [`Corpus`](/api/corpus) | +| `KnowledgeBase.load_bulk` `KnowledgeBase.dump` | [`KnowledgeBase.from_disk`](/api/kb#from_disk) [`KnowledgeBase.to_disk`](/api/kb#to_disk) | +| `spacy debug-data` | [`spacy debug data`](/api/cli#debug-data) | +| `spacy profile` | [`spacy debug profile`](/api/cli#debug-profile) | +| `spacy link` `util.set_data_path` `util.get_data_path` | not needed, model symlinks are deprecated | The following deprecated methods, attributes and arguments were removed in v3.0. Most of them have been **deprecated for a while** and many would previously @@ -412,12 +413,11 @@ spaCy v3.0 uses a new serializing a [`DocBin`](/api/docbin), which represents a collection of `Doc` objects. This means that you can train spaCy models using the same format it outputs: annotated `Doc` objects. The binary format is extremely **efficient in -storage**, especially when packing multiple documents together. +storage**, especially when packing multiple documents together. 
You can convert +your existing JSON-formatted data using the [`spacy convert`](/api/cli#convert) +command, which outputs `.spacy` files: -You can convert your existing JSON-formatted data using the -[`spacy convert`](/api/cli#convert) command, which outputs `.spacy` files: - -```bash +```cli $ python -m spacy convert ./training.json ./output ``` @@ -429,7 +429,7 @@ The easiest way to get started with a training config is to use the requirements, and it will auto-generate a starter config with the best-matching default settings. -```bash +```cli $ python -m spacy init config ./config.cfg --lang en --pipeline tagger,parser ``` diff --git a/website/src/components/code.js b/website/src/components/code.js index 0d1d214ae..740544f43 100644 --- a/website/src/components/code.js +++ b/website/src/components/code.js @@ -8,7 +8,7 @@ import { window } from 'browser-monads' import CUSTOM_TYPES from '../../meta/type-annotations.json' import { isString, htmlToReact } from './util' -import Link from './link' +import Link, { OptionalLink } from './link' import GitHubCode from './github' import classes from '../styles/code.module.sass' @@ -89,6 +89,91 @@ export const TypeAnnotation = ({ lang = 'python', link = true, children }) => { ) } +function replacePrompt(line, prompt, isFirst = false) { + let result = line + const hasPrompt = result.startsWith(`${prompt} `) + const showPrompt = hasPrompt || isFirst + if (hasPrompt) result = result.slice(prompt.length + 1) + return result && showPrompt ? `<span data-prompt="${prompt}">${result}</span>` : result +} + +function parseArgs(raw) { + const commandGroups = ['init', 'debug', 'project'] + const args = raw.split(' ').filter(arg => arg) + const result = {} + while (args.length) { + const opt = args.shift() + if (opt.length > 1 && opt.startsWith('-')) { + const isFlag = !args.length || (args[0].length > 1 && args[0].startsWith('-')) + result[opt] = isFlag ? true : args.shift() + } else { + const key = commandGroups.includes(opt) ?
`${opt} ${args.shift()}` : opt + result[key] = null + } + } + return result +} + +function formatCode(html, lang, prompt) { + if (lang === 'cli') { + const cliRegex = /^(\$ )?python -m spacy/ + const lines = html + .trim() + .split('\n') + .map((line, i) => { + if (cliRegex.test(line)) { + const text = line.replace(cliRegex, '') + const args = parseArgs(text) + const cmd = Object.keys(args).map((key, i) => { + const value = args[key] + return value === null || value === true || i === 0 ? key : `${key} ${value}` + }) + return ( + + + python -m + {' '} + spacy{' '} + {cmd.map((item, j) => { + const isCmd = j === 0 + const url = isCmd ? `/api/cli#${item.replace(' ', '-')}` : null + const isAbstract = isString(item) && /^\[(.+)\]$/.test(item) + const itemClassNames = classNames(classes.cliArg, { + [classes.cliArgHighlight]: isCmd, + [classes.cliArgEmphasis]: isAbstract, + }) + const text = isAbstract ? item.slice(1, -1) : item + return ( + + {j !== 0 && ' '} + + + + + ) + })} + + ) + } + const htmlLine = replacePrompt(highlightCode('bash', line), '$') + return htmlToReact(htmlLine) + }) + return lines.map((line, i) => ( + + {i !== 0 &&
} + {line} +
+ )) + } + const result = html + .split('\n') + .map((line, i) => (prompt ? replacePrompt(line, prompt, i === 0) : line)) + .join('\n') + return htmlToReact(result) +} + export class Code extends React.Component { state = { Juniper: null } @@ -136,7 +221,8 @@ export class Code extends React.Component { children, } = this.props const codeClassNames = classNames(classes.code, className, `language-${lang}`, { - [classes.wrap]: !!highlight || !!wrap, + [classes.wrap]: !!highlight || !!wrap || lang === 'cli', + [classes.cli]: lang === 'cli', }) const ghClassNames = classNames(codeClassNames, classes.maxHeight) const { Juniper } = this.state @@ -154,14 +240,14 @@ export class Code extends React.Component { const codeText = Array.isArray(children) ? children.join('') : children || '' const highlightRange = highlight ? rangeParser.parse(highlight).filter(n => n > 0) : [] - const html = lang === 'none' ? codeText : highlightCode(lang, codeText, highlightRange) - + const rawHtml = ['none', 'cli'].includes(lang) + ? codeText + : highlightCode(lang, codeText, highlightRange) + const html = formatCode(rawHtml, lang, prompt) return ( <> {title &&

{title}

} - - {htmlToReact(html)} - + {html} ) } diff --git a/website/src/components/quickstart.js b/website/src/components/quickstart.js index f7ab11fa4..6a335d4a0 100644 --- a/website/src/components/quickstart.js +++ b/website/src/components/quickstart.js @@ -117,7 +117,7 @@ const Quickstart = ({ {help && ( {' '} - + )} @@ -201,7 +201,7 @@ const Quickstart = ({ className={classes.help} > {' '} - + )} diff --git a/website/src/fonts/jetbrainsmono-italic.woff b/website/src/fonts/jetbrainsmono-italic.woff new file mode 100644 index 000000000..f3ddf4db5 Binary files /dev/null and b/website/src/fonts/jetbrainsmono-italic.woff differ diff --git a/website/src/fonts/jetbrainsmono-italic.woff2 b/website/src/fonts/jetbrainsmono-italic.woff2 new file mode 100644 index 000000000..828c42961 Binary files /dev/null and b/website/src/fonts/jetbrainsmono-italic.woff2 differ diff --git a/website/src/styles/aside.module.sass b/website/src/styles/aside.module.sass index 0e73cc61a..1ea3f970a 100644 --- a/website/src/styles/aside.module.sass +++ b/website/src/styles/aside.module.sass @@ -28,7 +28,7 @@ $border-radius: 6px margin-top: 0 !important code - padding: 0 + padding: 0 !important margin: 0 h4 diff --git a/website/src/styles/code.module.sass b/website/src/styles/code.module.sass index 3ff1fae6b..2d213d001 100644 --- a/website/src/styles/code.module.sass +++ b/website/src/styles/code.module.sass @@ -27,7 +27,7 @@ padding: 1.75em 1.5em .code - &[data-prompt]:before, + &[data-prompt]:before, span[data-prompt]:before content: attr(data-prompt) margin-right: 0.65em display: inline-block @@ -163,3 +163,31 @@ font-weight: normal padding-top: 0.1rem color: var(--color-subtle-dark) + +.cli + padding-top: calc(var(--spacing-sm) - 6px) + padding-bottom: calc(var(--spacing-sm) - 12px) + + [data-prompt]:before + color: var(--color-subtle) + +.cli-arg + border: 1px solid var(--color-dark) + padding: 1px 6px + margin-bottom: 5px + border-radius: 0.5em + display: inline-block + + a + color: inherit
!important + +.cli-arg-highlight + background: var(--color-theme) + border-color: var(--color-theme) + color: var(--color-back) !important + +.cli-arg-subtle + color: var(--syntax-comment) + +.cli-arg-emphasis + font-style: italic diff --git a/website/src/styles/layout.sass b/website/src/styles/layout.sass index 82612c103..03011bf4e 100644 --- a/website/src/styles/layout.sass +++ b/website/src/styles/layout.sass @@ -157,6 +157,14 @@ font-display: fallback src: url("../fonts/jetbrainsmono-regular.woff") format("woff"), url("../fonts/jetbrainsmono-regular.woff2") format("woff2") +@font-face + font-family: "JetBrains Mono" + font-style: italic + font-weight: 500 + font-display: fallback + src: url("../fonts/jetbrainsmono-italic.woff") format("woff"), url("../fonts/jetbrainsmono-italic.woff2") format("woff2") + + /* Reset */ *, *:before, *:after @@ -366,6 +374,12 @@ body [id]:target &.operator color: var(--syntax-comment) +[class*="language-bash"] .token + &.function + color: var(--color-subtle) + + &.operator, &.variable + color: var(--syntax-comment) // Settings for ini syntax (config files) [class*="language-ini"]