spaCy/website/docs/api/cli.jade
2017-06-04 13:55:00 +02:00

//- 💫 DOCS > USAGE > COMMAND LINE INTERFACE
include ../../_includes/_mixins
p
| As of v1.7.0, spaCy comes with new command line helpers to download and
| link models and show useful debugging information. For a list of available
| commands, type #[code python -m spacy]. To make the command even more
| convenient, we recommend
| #[+a("https://askubuntu.com/questions/17536/how-do-i-create-a-permanent-bash-alias/17537#17537") creating an alias]
| mapping #[code python -m spacy] to #[code spacy].
+aside("Why python -m?")
| The problem with a global entry point is that it's resolved by looking up
| entries in your #[code PATH] environment variable. This can give you
| unexpected results, like executing the wrong spaCy installation.
| #[code python -m] prevents fallbacks to system modules.
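p
| The same reasoning applies when the CLI is called from a script. The
| sketch below builds the command around #[code sys.executable], so the
| subprocess uses the interpreter that runs the script instead of whatever
| #[code PATH] resolves to. The helper names #[code spacy_command] and
| #[code run_spacy] are hypothetical and not part of spaCy.

```python
import subprocess
import sys


def spacy_command(*args):
    """Build an argument list that runs spaCy's CLI with the current interpreter.

    Hypothetical helper for illustration; not part of spaCy itself.
    """
    return [sys.executable, "-m", "spacy"] + list(args)


def run_spacy(*args):
    """Run the command and raise CalledProcessError on a non-zero exit code."""
    return subprocess.check_call(spacy_command(*args))


if __name__ == "__main__":
    # Only prints the command; call run_spacy("info") to actually execute it.
    print(spacy_command("info"))
```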
+infobox("⚠️ Deprecation note")
| As of spaCy 2.0, the #[code model] command to initialise a model data
| directory is deprecated. The command was only necessary because previous
| versions of spaCy expected a model directory to already be set up. This
| has since been changed, so you can use the #[+api("cli#train") #[code train]]
| command straight away.
+h(2, "download") Download
p
| Download #[+a("/docs/usage/models") models] for spaCy. The downloader finds the
| best-matching compatible version, uses pip to download the model as a
| package and automatically creates a
| #[+a("/docs/usage/models#usage") shortcut link] to load the model by name.
| Direct downloads don't perform any compatibility checks and require the
| model name to be specified with its version (e.g., #[code en_core_web_sm-1.2.0]).
+code(false, "bash").
python -m spacy download [model] [--direct]
+table(["Argument", "Type", "Description"])
+row
+cell #[code model]
+cell positional
+cell Model name or shortcut (#[code en], #[code de], #[code vectors]).
+row
+cell #[code --direct], #[code -d]
+cell flag
+cell Force direct download of exact model version.
+row
+cell #[code --help], #[code -h]
+cell flag
+cell Show help message and available arguments.
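p
| Because #[code --direct] needs a versioned name, it can help to check a
| model identifier before passing it on. The following is a minimal sketch
| that assumes the version suffix always has the form
| #[code -MAJOR.MINOR.PATCH], as in #[code en_core_web_sm-1.2.0]. The
| function name #[code split_pinned_name] is hypothetical.

```python
import re

# Assumption: a versioned name ends in "-<major>.<minor>.<patch>".
# Other version schemes are not handled by this sketch.
PINNED = re.compile(r"^(?P<name>.+)-(?P<version>\d+\.\d+\.\d+)$")


def split_pinned_name(model):
    """Return (name, version) for a versioned model name, else None."""
    match = PINNED.match(model)
    if match is None:
        return None
    return match.group("name"), match.group("version")
```

p
| A shortcut like #[code en] yields #[code None], which signals that the
| name is not suitable for a direct download.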
+aside("Downloading best practices")
| The #[code download] command is mostly intended as a convenient,
| interactive wrapper: it performs compatibility checks and prints
| detailed messages in case things go wrong. It's #[strong not recommended]
| to use this command as part of an automated process. If you know which
| model your project needs, you should consider a
| #[+a("/docs/usage/models#download-pip") direct download via pip], or
| uploading the model to a local PyPI installation and fetching it straight
| from there. This will also allow you to add it as a versioned package
| dependency to your project.
+h(2, "link") Link
p
| Create a #[+a("/docs/usage/models#usage") shortcut link] for a model,
| either a Python package or a local directory. This will let you load
| models from any location using a custom name via
| #[+api("spacy#load") #[code spacy.load()]].
+infobox("Important note")
| In spaCy v1.x, you had to use the model data directory to set up a shortcut
| link for a local path. As of v2.0, spaCy expects all shortcut links to
| be #[strong loadable model packages]. If you want to load a data directory,
| call #[+api("spacy#load") #[code spacy.load()]] or
| #[+api("language#from_disk") #[code Language.from_disk()]] with the path,
| or use the #[+api("cli#package") #[code package]] command to create a
| model package.
+code(false, "bash").
python -m spacy link [origin] [link_name] [--force]
+table(["Argument", "Type", "Description"])
+row
+cell #[code origin]
+cell positional
+cell Model name if package, or path to local directory.
+row
+cell #[code link_name]
+cell positional
+cell Name of the shortcut link to create.
+row
+cell #[code --force], #[code -f]
+cell flag
+cell Force overwriting of existing link.
+row
+cell #[code --help], #[code -h]
+cell flag
+cell Show help message and available arguments.
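p
| To illustrate what #[code --force] changes, here is a conceptual sketch
| that treats a shortcut link as a plain symlink. This is an assumption made
| for illustration, and #[code create_link] is a hypothetical function, not
| spaCy's implementation.

```python
import os


def create_link(origin, link_path, force=False):
    """Create a symlink at link_path that points to origin.

    Conceptual sketch only: assumes a shortcut link behaves like a symlink
    and that force=True mirrors the --force flag.
    """
    if os.path.lexists(link_path):
        if not force:
            raise FileExistsError("link already exists: %s" % link_path)
        os.remove(link_path)  # drop the old link before replacing it
    os.symlink(origin, link_path)
    return link_path
```

p
| Without #[code force=True], a second call with the same link path raises
| an error instead of silently replacing the existing link.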
+h(2, "info") Info
p
| Print information about your spaCy installation, models and local setup,
| and generate #[+a("https://en.wikipedia.org/wiki/Markdown") Markdown]-formatted
| markup to copy-paste into #[+a(gh("spacy") + "/issues") GitHub issues].
+code(false, "bash").
python -m spacy info [--markdown]
python -m spacy info [model] [--markdown]
+table(["Argument", "Type", "Description"])
+row
+cell #[code model]
+cell positional
+cell A model, i.e. shortcut link, package name or path (optional).
+row
+cell #[code --markdown], #[code -md]
+cell flag
+cell Print information as Markdown.
+row
+cell #[code --help], #[code -h]
+cell flag
+cell Show help message and available arguments.
+h(2, "convert") Convert
p
| Convert files into spaCy's #[+a("/docs/api/annotation#json-input") JSON format]
| for use with the #[code train] command and other experiment management
| functions. The right converter is chosen based on the file extension of
| the input file. Currently, only #[code .conllu] files are supported.
+code(false, "bash").
python -m spacy convert [input_file] [output_dir] [--n-sents] [--morphology]
+table(["Argument", "Type", "Description"])
+row
+cell #[code input_file]
+cell positional
+cell Input file.
+row
+cell #[code output_dir]
+cell positional
+cell Output directory for converted JSON file.
+row
+cell #[code --n-sents], #[code -n]
+cell option
+cell Number of sentences per document.
+row
+cell #[code --morphology], #[code -m]
+cell flag
+cell Enable appending morphology to tags.
+row
+cell #[code --help], #[code -h]
+cell flag
+cell Show help message and available arguments.
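p
| Choosing a converter by file extension can be pictured as a simple lookup.
| The sketch below is an illustration under that assumption. The mapping
| #[code CONVERTERS] and the function names are hypothetical and do not
| reflect spaCy's internals.

```python
import os


def conllu_to_json(input_path, output_dir):
    """Placeholder converter; a real one would parse the file and write JSON."""
    raise NotImplementedError


# Assumption: converters are looked up by file extension.
CONVERTERS = {".conllu": conllu_to_json}


def choose_converter(input_path):
    """Return the converter registered for the input file's extension."""
    ext = os.path.splitext(input_path)[1]
    if ext not in CONVERTERS:
        raise ValueError("no converter for file type %r" % ext)
    return CONVERTERS[ext]
```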
+h(2, "train") Train
p
| Train a model. Expects data in spaCy's
| #[+a("/docs/api/annotation#json-input") JSON format].
+code(false, "bash").
python -m spacy train [lang] [output_dir] [train_data] [dev_data] [--n-iter] [--n-sents] [--use-gpu] [--no-tagger] [--no-parser] [--no-entities]
+table(["Argument", "Type", "Description"])
+row
+cell #[code lang]
+cell positional
+cell Model language.
+row
+cell #[code output_dir]
+cell positional
+cell Directory to store model in.
+row
+cell #[code train_data]
+cell positional
+cell Location of JSON-formatted training data.
+row
+cell #[code dev_data]
+cell positional
+cell Location of JSON-formatted dev data (optional).
+row
+cell #[code --n-iter], #[code -n]
+cell option
+cell Number of iterations (default: #[code 20]).
+row
+cell #[code --n-sents], #[code -ns]
+cell option
+cell Number of sentences (default: #[code 0]).
+row
+cell #[code --use-gpu], #[code -G]
+cell flag
+cell Use GPU.
+row
+cell #[code --no-tagger], #[code -T]
+cell flag
+cell Don't train tagger.
+row
+cell #[code --no-parser], #[code -P]
+cell flag
+cell Don't train parser.
+row
+cell #[code --no-entities], #[code -N]
+cell flag
+cell Don't train NER.
+row
+cell #[code --help], #[code -h]
+cell flag
+cell Show help message and available arguments.
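p
| When training is started from a script, the arguments above can be
| assembled programmatically. The sketch below only builds the argument
| list from the positionals and flags listed in the table. The helper
| #[code train_command] is hypothetical.

```python
import sys

# Maps keyword arguments to the flags from the table above.
FLAGS = [
    ("use_gpu", "--use-gpu"),
    ("no_tagger", "--no-tagger"),
    ("no_parser", "--no-parser"),
    ("no_entities", "--no-entities"),
]


def train_command(lang, output_dir, train_data, dev_data=None, n_iter=20, **flags):
    """Build the argument list for `python -m spacy train`.

    Hypothetical helper for illustration; it assembles the command
    but does not run it.
    """
    unknown = set(flags) - {name for name, _ in FLAGS}
    if unknown:
        raise TypeError("unknown flags: %s" % ", ".join(sorted(unknown)))
    cmd = [sys.executable, "-m", "spacy", "train", lang, output_dir, train_data]
    if dev_data is not None:
        cmd.append(dev_data)
    cmd += ["--n-iter", str(n_iter)]
    cmd += [flag for name, flag in FLAGS if flags.get(name)]
    return cmd
```

p
| The resulting list can be passed to #[code subprocess.check_call()] to
| start the training run.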
+h(3, "train-hyperparams") Environment variables for hyperparameters
p
| spaCy lets you set hyperparameters for training via environment variables.
| This is useful because it keeps the command simple and allows you to
| #[+a("https://askubuntu.com/questions/17536/how-do-i-create-a-permanent-bash-alias/17537#17537") create an alias]
| for your custom #[code train] command while still being able to easily
| tweak the hyperparameters. For example:
+code(false, "bash").
parser_hidden_depth=2 parser_maxout_pieces=1 train-parser
+under-construction
+table(["Name", "Description", "Default"])
+row
+cell #[code dropout_from]
+cell
+cell #[code 0.2]
+row
+cell #[code dropout_to]
+cell
+cell #[code 0.2]
+row
+cell #[code dropout_decay]
+cell
+cell #[code 0.0]
+row
+cell #[code batch_from]
+cell
+cell #[code 1]
+row
+cell #[code batch_to]
+cell
+cell #[code 64]
+row
+cell #[code batch_compound]
+cell
+cell #[code 1.001]
+row
+cell #[code token_vector_width]
+cell
+cell #[code 128]
+row
+cell #[code embed_size]
+cell
+cell #[code 7500]
+row
+cell #[code parser_maxout_pieces]
+cell
+cell #[code 2]
+row
+cell #[code parser_hidden_depth]
+cell
+cell #[code 1]
+row
+cell #[code hidden_width]
+cell
+cell #[code 128]
+row
+cell #[code learn_rate]
+cell
+cell #[code 0.001]
+row
+cell #[code optimizer_B1]
+cell
+cell #[code 0.9]
+row
+cell #[code optimizer_B2]
+cell
+cell #[code 0.999]
+row
+cell #[code optimizer_eps]
+cell
+cell #[code 1e-08]
+row
+cell #[code L2_penalty]
+cell
+cell #[code 1e-06]
+row
+cell #[code grad_norm_clip]
+cell
+cell #[code 1.0]
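p
| As a rough illustration of how environment variables can override
| defaults, the sketch below reads a value from the environment and falls
| back to the default from the table above. The helper
| #[code read_hyperparam] is hypothetical, and only a few of the
| hyperparameters are included.

```python
import os

# Defaults copied from the table above; only a subset is shown.
DEFAULTS = {
    "dropout_from": 0.2,
    "batch_to": 64,
    "parser_maxout_pieces": 2,
    "parser_hidden_depth": 1,
    "learn_rate": 0.001,
}


def read_hyperparam(name, environ=None):
    """Read one hyperparameter from the environment, else use its default.

    Hypothetical helper: the raw string is cast to the type of the
    default, so "2" becomes 2 and "0.5" becomes 0.5.
    """
    environ = os.environ if environ is None else environ
    default = DEFAULTS[name]
    if name not in environ:
        return default
    return type(default)(environ[name])
```

p
| Passing a dictionary as #[code environ] makes the lookup easy to try out
| without changing the real environment.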
+h(2, "package") Package
p
| Generate a #[+a("/docs/usage/saving-loading#generating") model Python package]
| from an existing model data directory. All data files are copied over.
| If the path to a #[code meta.json] is supplied, or a #[code meta.json] is found in the
| input directory, this file is used. Otherwise, the data can be entered
| directly from the command line. The required file templates are downloaded
| from #[+src(gh("spacy-dev-resources", "templates/model")) GitHub] to make
| sure you're always using the latest versions. This means you need to be
| connected to the internet to use this command.
+code(false, "bash").
python -m spacy package [input_dir] [output_dir] [--meta] [--force]
+table(["Argument", "Type", "Description"])
+row
+cell #[code input_dir]
+cell positional
+cell Path to directory containing model data.
+row
+cell #[code output_dir]
+cell positional
+cell Directory to create package folder in.
+row
+cell #[code --meta]
+cell option
+cell Path to meta.json file (optional).
+row
+cell #[code --force], #[code -f]
+cell flag
+cell Force overwriting of existing folder in output directory.
+row
+cell #[code --help], #[code -h]
+cell flag
+cell Show help message and available arguments.
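p
| The lookup order for the #[code meta.json] described above can be
| sketched as follows: an explicitly supplied path wins, then a file in the
| input directory, and otherwise the values have to be entered by hand. The
| function #[code find_meta] is hypothetical and only illustrates that
| order.

```python
import os


def find_meta(input_dir, meta_path=None):
    """Return the meta.json path to use, or None if none is available.

    Hypothetical helper: None means the caller would need to ask for
    the values on the command line instead.
    """
    if meta_path is not None:
        return meta_path
    candidate = os.path.join(input_dir, "meta.json")
    if os.path.isfile(candidate):
        return candidate
    return None
```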