//- 💫 DOCS > API > COMMAND LINE INTERFACE

include ../_includes/_mixins

p
    | As of v1.7.0, spaCy comes with new command line helpers to download and
    | link models and show useful debugging information. For a list of available
    | commands, type #[code spacy --help].

+h(3, "download") Download

p
    | Download #[+a("/usage/models") models] for spaCy. The downloader finds the
    | best-matching compatible version, uses pip to download the model as a
    | package and automatically creates a
    | #[+a("/usage/models#usage") shortcut link] to load the model by name.
    | Direct downloads don't perform any compatibility checks and require the
    | model name to be specified with its version (e.g.
    | #[code en_core_web_sm-2.0.0]).

+aside("Downloading best practices")
    | The #[code download] command is mostly intended as a convenient,
    | interactive wrapper – it performs compatibility checks and prints
    | detailed messages in case things go wrong. It's #[strong not recommended]
    | to use this command as part of an automated process. If you know which
    | model your project needs, you should consider a
    | #[+a("/usage/models#download-pip") direct download via pip], or
    | uploading the model to a local PyPI installation and fetching it straight
    | from there. This will also allow you to add it as a versioned package
    | dependency to your project.

+code(false, "bash", "$").
    python -m spacy download [model] [--direct]

+table(["Argument", "Type", "Description"])
    +row
        +cell #[code model]
        +cell positional
        +cell
            | Model name or shortcut (#[code en], #[code de],
            | #[code en_core_web_sm]).

    +row
        +cell #[code --direct], #[code -d]
        +cell flag
        +cell Force direct download of exact model version.

    +row
        +cell other
            +tag-new(2.1)
        +cell -
        +cell
            | Additional installation options to be passed to
            | #[code pip install] when installing the model package. For
            | example, #[code --user] to install to the user home directory.

    +row
        +cell #[code --help], #[code -h]
        +cell flag
        +cell Show help message and available arguments.

    +row("foot")
        +cell creates
        +cell directory, symlink
        +cell
            | The installed model package in your #[code site-packages]
            | directory and a shortcut link as a symlink in #[code spacy/data].

+h(3, "link") Link

p
    | Create a #[+a("/usage/models#usage") shortcut link] for a model,
    | either a Python package or a local directory. This will let you load
    | models from any location using a custom name via
    | #[+api("spacy#load") #[code spacy.load()]].

+infobox("Important note")
    | In spaCy v1.x, you had to use the model data directory to set up a shortcut
    | link for a local path. As of v2.0, spaCy expects all shortcut links to
    | be #[strong loadable model packages]. If you want to load a data directory,
    | call #[+api("spacy#load") #[code spacy.load()]] or
    | #[+api("language#from_disk") #[code Language.from_disk()]] with the path,
    | or use the #[+api("cli#package") #[code package]] command to create a
    | model package.

+code(false, "bash", "$").
    python -m spacy link [origin] [link_name] [--force]

+table(["Argument", "Type", "Description"])
    +row
        +cell #[code origin]
        +cell positional
        +cell Model name if package, or path to local directory.

    +row
        +cell #[code link_name]
        +cell positional
        +cell Name of the shortcut link to create.

    +row
        +cell #[code --force], #[code -f]
        +cell flag
        +cell Force overwriting of existing link.

    +row
        +cell #[code --help], #[code -h]
        +cell flag
        +cell Show help message and available arguments.

    +row("foot")
        +cell creates
        +cell symlink
        +cell
            | A shortcut link of the given name as a symlink in
            | #[code spacy/data].

+h(3, "info") Info

p
    | Print information about your spaCy installation, models and local setup,
    | and generate #[+a("https://en.wikipedia.org/wiki/Markdown") Markdown]-formatted
    | markup to copy-paste into #[+a(gh("spacy") + "/issues") GitHub issues].

+code(false, "bash").
    python -m spacy info [--markdown]
    python -m spacy info [model] [--markdown]

+table(["Argument", "Type", "Description"])
    +row
        +cell #[code model]
        +cell positional
        +cell
            | A model, i.e.
            | shortcut link, package name or path (optional).

    +row
        +cell #[code --markdown], #[code -md]
        +cell flag
        +cell Print information as Markdown.

    +row
        +cell #[code --silent], #[code -s]
            +tag-new("2.0.12")
        +cell flag
        +cell Don't print anything, just return the values.

    +row
        +cell #[code --help], #[code -h]
        +cell flag
        +cell Show help message and available arguments.

    +row("foot")
        +cell prints
        +cell #[code stdout]
        +cell Information about your spaCy installation.

+h(3, "validate") Validate
    +tag-new(2)

p
    | Find all models installed in the current environment (both packages and
    | shortcut links) and check whether they are compatible with the currently
    | installed version of spaCy. Should be run after upgrading spaCy via
    | #[code pip install -U spacy] to ensure that all installed models
    | can be used with the new version. The command is also useful to detect
    | out-of-sync model links resulting from links created in different virtual
    | environments. It will show a list of models, the installed versions, the
    | latest compatible version (if out of date) and the commands for updating.

+aside("Automated validation")
    | You can also use the #[code validate] command as part of your build
    | process or test suite, to ensure all models are up to date before
    | proceeding. If incompatible models or shortcut links are found, it will
    | return exit code #[code 1].

+code(false, "bash", "$").
    python -m spacy validate

+table(["Argument", "Type", "Description"])
    +row("foot")
        +cell prints
        +cell #[code stdout]
        +cell Details about the compatibility of your installed models.

+h(3, "convert") Convert

p
    | Convert files into spaCy's #[+a("/api/annotation#json-input") JSON format]
    | for use with the #[code train] command and other experiment management
    | functions. The converter can be specified on the command line, or
    | chosen based on the file extension of the input file.

+code(false, "bash", "$", false, false, true).
    python -m spacy convert [input_file] [output_dir] [--converter] [--n-sents] [--morphology]

+table(["Argument", "Type", "Description"])
    +row
        +cell #[code input_file]
        +cell positional
        +cell Input file.

    +row
        +cell #[code output_dir]
        +cell positional
        +cell Output directory for converted JSON file.

    +row
        +cell #[code --converter], #[code -c]
        +cell option
        +cell #[+tag-new(2)] Name of converter to use (see below).

    +row
        +cell #[code --n-sents], #[code -n]
        +cell option
        +cell Number of sentences per document.

    +row
        +cell #[code --morphology], #[code -m]
        +cell flag
        +cell Enable appending morphology to tags.

    +row
        +cell #[code --help], #[code -h]
        +cell flag
        +cell Show help message and available arguments.

    +row("foot")
        +cell creates
        +cell JSON
        +cell Data in spaCy's #[+a("/api/annotation#json-input") JSON format].

p The following file format converters are available:

+table(["ID", "Description"])
    +row
        +cell #[code auto]
        +cell Automatically pick converter based on file extension (default).

    +row
        +cell #[code conllu], #[code conll]
        +cell Universal Dependencies #[code .conllu] or #[code .conll] format.

    +row
        +cell #[code ner]
        +cell Tab-based named entity recognition format.

    +row
        +cell #[code iob]
        +cell IOB named entity recognition format.

+h(3, "train") Train

p
    | Train a model. Expects data in spaCy's
    | #[+a("/api/annotation#json-input") JSON format]. On each epoch, a model
    | will be saved out to the output directory. Accuracy scores and model
    | details will be added to a
    | #[+a("/usage/training#models-generating") #[code meta.json]]
    | to allow packaging the model using the
    | #[+api("cli#package") #[code package]] command.

+infobox("Changed in v2.1", "⚠️")
    | As of spaCy 2.1, the #[code --no-tagger], #[code --no-parser] and
    | #[code --no-entities] flags have been replaced by a #[code --pipeline]
    | option, which lets you define comma-separated names of pipeline
    | components to train. For example, #[code --pipeline tagger,parser] will
    | only train the tagger and parser.

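p
    | As a minimal sketch of the #[code --pipeline] option, the Python snippet
    | below assembles a #[code train] call as an argument list, for example to
    | pass to #[code subprocess.run]. The helper name
    | #[code build_train_command] and the file paths are hypothetical and not
    | part of spaCy. Only the positional arguments and the #[code --pipeline]
    | and #[code --n-iter] options are taken from this page.

```python
import sys


def build_train_command(lang, output_path, train_path, dev_path,
                        pipeline=("tagger", "parser", "ner"), n_iter=30):
    """Build an argv list for `python -m spacy train` (hypothetical helper)."""
    return [
        sys.executable, "-m", "spacy", "train",
        # The four positional arguments, in the documented order.
        lang, str(output_path), str(train_path), str(dev_path),
        # Component names are passed as one comma-separated value.
        "--pipeline", ",".join(pipeline),
        "--n-iter", str(n_iter),
    ]


# Placeholder paths (assumptions); train only the tagger and parser.
cmd = build_train_command("en", "/output", "/data/train.json", "/data/dev.json",
                          pipeline=("tagger", "parser"))
print(" ".join(cmd[1:]))
```

p
    | Keeping the command as a list avoids shell quoting issues if it is later
    | run with #[code subprocess.run(cmd, check=True)], which requires spaCy
    | to be installed in the active environment.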
+code(false, "bash", "$", false, false, true).
    python -m spacy train [lang] [output_path] [train_path] [dev_path] [--base-model] [--pipeline] [--vectors] [--n-iter] [--n-examples] [--use-gpu] [--version] [--meta-path] [--init-tok2vec] [--parser-multitasks] [--entity-multitasks] [--gold-preproc] [--noise-level] [--learn-tokens] [--verbose]

+table(["Argument", "Type", "Description"])
    +row
        +cell #[code lang]
        +cell positional
        +cell Model language.

    +row
        +cell #[code output_path]
        +cell positional
        +cell Directory to store model in. Will be created if it doesn't exist.

    +row
        +cell #[code train_path]
        +cell positional
        +cell Location of JSON-formatted training data.

    +row
        +cell #[code dev_path]
        +cell positional
        +cell Location of JSON-formatted development data for evaluation.

    +row
        +cell #[code --base-model], #[code -b]
        +cell option
        +cell
            | Optional name of base model to update. Can be any loadable
            | spaCy model.

    +row
        +cell #[code --pipeline], #[code -p]
            +tag-new("2.1.0")
        +cell option
        +cell
            | Comma-separated names of pipeline components to train. Defaults
            | to #[code 'tagger,parser,ner'].

    +row
        +cell #[code --vectors], #[code -v]
        +cell option
        +cell Model to load vectors from.

    +row
        +cell #[code --n-iter], #[code -n]
        +cell option
        +cell Number of iterations (default: #[code 30]).

    +row
        +cell #[code --n-examples], #[code -ns]
        +cell option
        +cell Number of examples to use (defaults to #[code 0] for all examples).

    +row
        +cell #[code --use-gpu], #[code -g]
        +cell option
        +cell
            | Whether to use GPU. Can be either #[code 0], #[code 1] or
            | #[code -1].

    +row
        +cell #[code --version], #[code -V]
        +cell option
        +cell
            | Model version. Will be written out to the model's
            | #[code meta.json] after training.

    +row
        +cell #[code --meta-path], #[code -m]
            +tag-new(2)
        +cell option
        +cell
            | Optional path to model
            | #[+a("/usage/training#models-generating") #[code meta.json]].
            | All relevant properties like #[code lang], #[code pipeline] and
            | #[code spacy_version] will be overwritten.

    +row
        +cell #[code --init-tok2vec], #[code -t2v]
            +tag-new("2.1.0")
        +cell option
        +cell
            | Path to pretrained weights for the token-to-vector parts of the
            | models. See #[code spacy pretrain]. Experimental.

    +row
        +cell #[code --parser-multitasks], #[code -pt]
        +cell option
        +cell
            | Side objectives for parser CNN, e.g. #[code 'dep'] or
            | #[code 'dep,tag']

    +row
        +cell #[code --entity-multitasks], #[code -et]
        +cell option
        +cell
            | Side objectives for NER CNN, e.g. #[code 'dep'] or
            | #[code 'dep,tag']

    +row
        +cell #[code --noise-level], #[code -nl]
        +cell option
        +cell Float indicating the amount of corruption for data augmentation.

    +row
        +cell #[code --gold-preproc], #[code -G]
        +cell flag
        +cell Use gold preprocessing.

    +row
        +cell #[code --learn-tokens], #[code -T]
        +cell flag
        +cell
            | Make parser learn gold-standard tokenization by merging
            | subtokens. Typically used for languages like Chinese.

    +row
        +cell #[code --verbose], #[code -VV]
            +tag-new("2.0.13")
        +cell flag
        +cell Show more detailed messages during training.

    +row
        +cell #[code --help], #[code -h]
        +cell flag
        +cell Show help message and available arguments.

    +row("foot")
        +cell creates
        +cell model, pickle
        +cell A spaCy model on each epoch.

+h(4, "train-hyperparams") Environment variables for hyperparameters
    +tag-new(2)

p
    | spaCy lets you set hyperparameters for training via environment variables.
    | This is useful because it keeps the command simple and allows you to
    | #[+a("https://askubuntu.com/questions/17536/how-do-i-create-a-permanent-bash-alias/17537#17537") create an alias]
    | for your custom #[code train] command while still being able to easily
    | tweak the hyperparameters. For example:

+code(false, "bash", "$").
    parser_hidden_depth=2 parser_maxout_pieces=1 spacy train [...]

+code("Usage with alias", "bash", "$").
    alias train-parser="spacy train en /output /data/train /data/dev -n 1000"
    parser_maxout_pieces=1 train-parser

+table(["Name", "Description", "Default"])
    +row
        +cell #[code dropout_from]
        +cell Initial dropout rate.
        +cell #[code 0.2]

    +row
        +cell #[code dropout_to]
        +cell Final dropout rate.
        +cell #[code 0.2]

    +row
        +cell #[code dropout_decay]
        +cell Rate of dropout change.
        +cell #[code 0.0]

    +row
        +cell #[code batch_from]
        +cell Initial batch size.
        +cell #[code 1]

    +row
        +cell #[code batch_to]
        +cell Final batch size.
        +cell #[code 64]

    +row
        +cell #[code batch_compound]
        +cell Rate of batch size acceleration.
        +cell #[code 1.001]

    +row
        +cell #[code token_vector_width]
        +cell Width of embedding tables and convolutional layers.
        +cell #[code 128]

    +row
        +cell #[code embed_size]
        +cell Number of rows in embedding tables.
        +cell #[code 7500]

    //- +row
    //-     +cell #[code parser_maxout_pieces]
    //-     +cell Number of pieces in the parser's and NER's first maxout layer.
    //-     +cell #[code 2]

    //- +row
    //-     +cell #[code parser_hidden_depth]
    //-     +cell Number of hidden layers in the parser and NER.
    //-     +cell #[code 1]

    +row
        +cell #[code hidden_width]
        +cell Size of the parser's and NER's hidden layers.
        +cell #[code 128]

    //- +row
    //-     +cell #[code history_feats]
    //-     +cell Number of previous action ID features for parser and NER.
    //-     +cell #[code 128]

    //- +row
    //-     +cell #[code history_width]
    //-     +cell Number of embedding dimensions for each action ID.
    //-     +cell #[code 128]

    +row
        +cell #[code learn_rate]
        +cell Learning rate.
        +cell #[code 0.001]

    +row
        +cell #[code optimizer_B1]
        +cell Momentum for the Adam solver.
        +cell #[code 0.9]

    +row
        +cell #[code optimizer_B2]
        +cell Adagrad-momentum for the Adam solver.
        +cell #[code 0.999]

    +row
        +cell #[code optimizer_eps]
        +cell Epsilon value for the Adam solver.
        +cell #[code 1e-08]

    +row
        +cell #[code L2_penalty]
        +cell L2 regularisation penalty.
        +cell #[code 1e-06]

    +row
        +cell #[code grad_norm_clip]
        +cell Gradient L2 norm constraint.
        +cell #[code 1.0]

+h(3, "vocab") Vocab
    +tag-new(2)

p
    | Compile a vocabulary from a
    | #[+a("/api/annotation#vocab-jsonl") lexicon JSONL] file and optional
    | word vectors.
    | Will save out a valid spaCy model that you can load via
    | #[+api("spacy#load") #[code spacy.load]] or package using the
    | #[+api("cli#package") #[code package]] command.

+code(false, "bash", "$").
    python -m spacy vocab [lang] [output_dir] [lexemes_loc] [vectors_loc]

+table(["Argument", "Type", "Description"])
    +row
        +cell #[code lang]
        +cell positional
        +cell
            | Model language
            | #[+a("https://en.wikipedia.org/wiki/List_of_ISO_639-1_codes") ISO code],
            | e.g. #[code en].

    +row
        +cell #[code output_dir]
        +cell positional
        +cell Model output directory. Will be created if it doesn't exist.

    +row
        +cell #[code lexemes_loc]
        +cell positional
        +cell
            | Location of lexical data in spaCy's
            | #[+a("/api/annotation#vocab-jsonl") JSONL format].

    +row
        +cell #[code vectors_loc]
        +cell positional
        +cell Optional location of vectors data as numpy #[code .npz] file.

    +row("foot")
        +cell creates
        +cell model
        +cell A spaCy model containing the vocab and vectors.

+h(3, "init-model") Init Model
    +tag-new(2)

p
    | Create a new model directory from raw data, like word frequencies, Brown
    | clusters and word vectors. This command is similar to the
    | #[code spacy model] command in v1.x.

+code(false, "bash", "$", false, false, true).
    python -m spacy init-model [lang] [output_dir] [freqs_loc] [--clusters-loc] [--vectors-loc] [--prune-vectors]

+table(["Argument", "Type", "Description"])
    +row
        +cell #[code lang]
        +cell positional
        +cell
            | Model language
            | #[+a("https://en.wikipedia.org/wiki/List_of_ISO_639-1_codes") ISO code],
            | e.g. #[code en].

    +row
        +cell #[code output_dir]
        +cell positional
        +cell Model output directory. Will be created if it doesn't exist.

    +row
        +cell #[code freqs_loc]
        +cell positional
        +cell
            | Location of word frequencies file. Should be a tab-separated
            | file with three columns: frequency, document frequency and
            | frequency count.

    +row
        +cell #[code --clusters-loc], #[code -c]
        +cell option
        +cell
            | Optional location of clusters file.
            | Should be a tab-separated
            | file with three columns: cluster, word and frequency.

    +row
        +cell #[code --vectors-loc], #[code -v]
        +cell option
        +cell
            | Optional location of vectors file. Should be a tab-separated
            | file in Word2Vec format where the first column contains the word
            | and the remaining columns the values. File can be provided in
            | #[code .txt] format or as a zipped text file in #[code .zip] or
            | #[code .tar.gz] format.

    +row
        +cell #[code --prune-vectors], #[code -V]
        +cell option
        +cell
            | Number of vectors to prune the vocabulary to. Defaults to
            | #[code -1] for no pruning.

    +row("foot")
        +cell creates
        +cell model
        +cell A spaCy model containing the vocab and vectors.

+h(3, "evaluate") Evaluate
    +tag-new(2)

p
    | Evaluate a model's accuracy and speed on JSON-formatted annotated data.
    | Will print the results and optionally export
    | #[+a("/usage/visualizers") displaCy visualizations] of a sample set of
    | parses to #[code .html] files. Visualizations for the dependency parse
    | and NER will be exported as separate files if the respective component
    | is present in the model's pipeline.

+code(false, "bash", "$", false, false, true).
    python -m spacy evaluate [model] [data_path] [--displacy-path] [--displacy-limit] [--gpu-id] [--gold-preproc]

+table(["Argument", "Type", "Description"])
    +row
        +cell #[code model]
        +cell positional
        +cell
            | Model to evaluate. Can be a package or shortcut link name, or a
            | path to a model data directory.

    +row
        +cell #[code data_path]
        +cell positional
        +cell Location of JSON-formatted evaluation data.

    +row
        +cell #[code --displacy-path], #[code -dp]
        +cell option
        +cell
            | Directory to output rendered parses as HTML. If not set, no
            | visualizations will be generated.

    +row
        +cell #[code --displacy-limit], #[code -dl]
        +cell option
        +cell
            | Number of parses to generate per file. Defaults to #[code 25].
            | Keep in mind that a significantly higher number might cause the
            | #[code .html] files to render slowly.

    +row
        +cell #[code --gpu-id], #[code -g]
        +cell option
        +cell GPU to use, if any. Defaults to #[code -1] for CPU.

    +row
        +cell #[code --gold-preproc], #[code -G]
        +cell flag
        +cell Use gold preprocessing.

    +row("foot")
        +cell prints / creates
        +cell #[code stdout], HTML
        +cell Evaluation results and optional displaCy visualizations.

+h(3, "package") Package

p
    | Generate a #[+a("/usage/training#models-generating") model Python package]
    | from an existing model data directory. All data files are copied over.
    | If the path to a #[code meta.json] is supplied, or a #[code meta.json] is
    | found in the input directory, this file is used. Otherwise, the data can
    | be entered directly from the command line. After packaging, you can run
    | #[code python setup.py sdist] from the newly created directory to turn
    | your model into an installable archive file.

+code(false, "bash", "$", false, false, true).
    python -m spacy package [input_dir] [output_dir] [--meta-path] [--create-meta] [--force]

+aside-code("Example", "bash").
    python -m spacy package /input /output
    cd /output/en_model-0.0.0
    python setup.py sdist
    pip install dist/en_model-0.0.0.tar.gz

+table(["Argument", "Type", "Description"])
    +row
        +cell #[code input_dir]
        +cell positional
        +cell Path to directory containing model data.

    +row
        +cell #[code output_dir]
        +cell positional
        +cell Directory to create package folder in.

    +row
        +cell #[code --meta-path], #[code -m]
        +cell option
        +cell #[+tag-new(2)] Path to #[code meta.json] file (optional).

    +row
        +cell #[code --create-meta], #[code -c]
        +cell flag
        +cell
            | #[+tag-new(2)] Create a #[code meta.json] file on the command
            | line, even if one already exists in the directory. If an
            | existing file is found, its entries will be shown as the defaults
            | in the command line prompt.

    +row
        +cell #[code --force], #[code -f]
        +cell flag
        +cell Force overwriting of existing folder in output directory.

    +row
        +cell #[code --help], #[code -h]
        +cell flag
        +cell Show help message and available arguments.

    +row("foot")
        +cell creates
        +cell directory
        +cell A Python package containing the spaCy model.
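
p
    | The packaging workflow from the example in this section can also be
    | scripted. The sketch below only builds the three commands and the
    | directory each one should run in. The helper name
    | #[code packaging_steps] is hypothetical, and the package folder name
    | #[code en_model-0.0.0] is taken from the example rather than derived
    | from a real #[code meta.json].

```python
import os
import sys


def packaging_steps(input_dir, output_dir, package_name):
    """Return (argv, cwd) pairs for package, sdist and install (hypothetical helper)."""
    # Assumption: the package folder is created inside output_dir under
    # package_name, as in the documented example (/output/en_model-0.0.0).
    package_dir = os.path.join(output_dir, package_name)
    archive = os.path.join("dist", package_name + ".tar.gz")
    return [
        # 1. Generate the package folder from the model data directory.
        ([sys.executable, "-m", "spacy", "package", input_dir, output_dir], None),
        # 2. Build a source archive from inside the package folder.
        ([sys.executable, "setup.py", "sdist"], package_dir),
        # 3. Install the archive with pip, still from the package folder.
        ([sys.executable, "-m", "pip", "install", archive], package_dir),
    ]


steps = packaging_steps("/input", "/output", "en_model-0.0.0")
for argv, cwd in steps:
    print(cwd, " ".join(argv[1:]))
```

p
    | Each pair can be executed with
    | #[code subprocess.run(argv, cwd=cwd, check=True)], so that a failing
    | step stops the process before the next one runs.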