spaCy/website/docs/usage/index.jade

//- 💫 DOCS > USAGE
include ../../_includes/_mixins
p
| spaCy is compatible with #[strong 64-bit CPython 2.6+/3.3+] and
| runs on #[strong Unix/Linux], #[strong macOS/OS X] and
| #[strong Windows]. The latest spaCy releases are
| available via #[+a("https://pypi.python.org/pypi/spacy") pip] (source
| packages only) and #[+a("https://anaconda.org/conda-forge/spacy") conda].
| Installation requires a working build environment. See notes on
| #[a(href="#source-ubuntu") Ubuntu], #[a(href="#source-osx") macOS/OS X]
| and #[a(href="#source-windows") Windows] for details.
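p
| If you're not sure whether your interpreter meets these requirements,
| you can ask Python itself. The following is a minimal sketch using only
| the standard library. The helper name #[code describe_interpreter] is
| made up for this example and is not part of spaCy.

```python
import platform
import struct
import sys


def describe_interpreter():
    """Collect the interpreter properties relevant to the requirements above."""
    return {
        # "CPython" for the reference implementation, "PyPy" etc. otherwise
        "implementation": platform.python_implementation(),
        # (major, minor, micro) of the running interpreter
        "version": tuple(sys.version_info[:3]),
        # Size of a C pointer in bits: 64 on a 64-bit build, 32 otherwise
        "bits": struct.calcsize("P") * 8,
        # "Linux", "Darwin" (macOS / OS X) or "Windows"
        "system": platform.system(),
    }


if __name__ == "__main__":
    info = describe_interpreter()
    for key in ("implementation", "version", "bits", "system"):
        print("{}: {}".format(key, info[key]))
```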
+quickstart(QUICKSTART, "Quickstart")
+qs({config: 'venv', python: 2}) python -m pip install -U virtualenv
+qs({config: 'venv', python: 2}) virtualenv .env
+qs({config: 'venv', python: 3}) python -m venv .env
+qs({config: 'venv', os: 'mac'}) source .env/bin/activate
+qs({config: 'venv', os: 'linux'}) source .env/bin/activate
+qs({config: 'venv', os: 'windows'}) .env\Scripts\activate
+qs({config: 'gpu', os: 'mac'}) export CUDA_HOME=/usr/local/cuda-8.0
+qs({config: 'gpu', os: 'mac'}) export PATH=$PATH:$CUDA_HOME/bin
+qs({config: 'gpu', os: 'linux'}) export CUDA_HOME=/usr/local/cuda-8.0
+qs({config: 'gpu', os: 'linux'}) export PATH=$PATH:$CUDA_HOME/bin
+qs({config: 'gpu', package: 'pip'}) pip install -U chainer
+qs({config: 'gpu', package: 'source'}) pip install -U chainer
+qs({config: 'gpu', package: 'conda'}) conda install -c anaconda chainer
+qs({package: 'pip'}) pip install -U spacy
+qs({package: 'conda'}) conda install -c conda-forge spacy
+qs({package: 'source'}) git clone https://github.com/explosion/spaCy
+qs({package: 'source'}) cd spaCy
+qs({package: 'source'}) pip install -r requirements.txt
+qs({package: 'source'}) pip install -e .
+qs({model: 'en'}) python -m spacy download en
+qs({model: 'de'}) python -m spacy download de
+qs({model: 'fr'}) python -m spacy download fr
+qs({model: 'es'}) python -m spacy download es
+h(2, "installation") Installation instructions
+h(3, "pip") pip
+badge("pipy")
p Using pip, spaCy releases are currently only available as source packages.
+code(false, "bash").
pip install -U spacy
+aside("Download models")
| After installation you need to download a language model. For more info
| and available models, see the #[+a("/docs/usage/models") docs on models].
+code.o-no-block.
python -m spacy download en
>>> import spacy
>>> nlp = spacy.load('en')
p
| When using pip it is generally recommended to install packages in a
| #[code virtualenv] to avoid modifying system state:
+code(false, "bash").
virtualenv .env
source .env/bin/activate
pip install spacy
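p
| To confirm that the environment is actually active before you install
| anything, you can check from within Python. This is a rough sketch, and
| #[code in_virtual_environment] is a hypothetical helper name. It relies
| on #[code sys.real_prefix], which the classic #[code virtualenv] tool
| sets, and on #[code sys.base_prefix], which is available on Python 3.3+.

```python
import sys


def in_virtual_environment():
    """Return True if the running interpreter belongs to a virtual environment."""
    # The classic virtualenv tool sets sys.real_prefix on its interpreters.
    if hasattr(sys, "real_prefix"):
        return True
    # On Python 3.3+, sys.base_prefix points at the base installation and
    # differs from sys.prefix inside an environment.
    return getattr(sys, "base_prefix", sys.prefix) != sys.prefix


if __name__ == "__main__":
    print("inside a virtual environment:", in_virtual_environment())
```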
+h(3, "conda") conda
+badge("conda")
p
| Thanks to our great community, we've finally re-added conda support. You
| can now install spaCy via #[code conda-forge]:
+code(false, "bash").
conda config --add channels conda-forge
conda install spacy
p
| For the feedstock including the build recipe and configuration, check out
| #[+a("https://github.com/conda-forge/spacy-feedstock") this repository].
| Improvements and pull requests to the recipe and setup are always appreciated.
+h(2, "gpu") Run spaCy with GPU
p
| As of v2.0, spaCy comes with neural network models that are implemented
| in our machine learning library, #[+a(gh("thinc")) Thinc]. For GPU
| support, we've been grateful to use the work of
| #[+a("http://chainer.org") Chainer]'s CuPy module, which provides
| a NumPy-compatible interface for GPU arrays.
+aside("Why is this so complicated?")
| Installing Chainer when no GPU is available currently causes an
| error. We therefore do not specify Chainer as a dependency. However,
| CuPy will be split out into
| #[+a("https://www.slideshare.net/beam2d/chainer-v2-alpha/7") its own package]
| in Chainer v2.0. We'll have a smoother installation process for this
| in an upcoming version.
p
| First, install CUDA by following the normal CUDA installation procedure.
| Next, set your environment variables so that the installation will be
| able to find CUDA. Then install Chainer, and check that CuPy can be
| imported correctly. Finally, install spaCy.
+code(false, "bash").
export CUDA_HOME=/usr/local/cuda-8.0 # Or wherever your CUDA is
export PATH=$PATH:$CUDA_HOME/bin
pip install chainer
python -c "import cupy; assert cupy" # Check it installed
pip install spacy
python -c "import thinc.neural.gpu_ops" # Check the GPU ops were built
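p
| If one of the checks above fails, it can help to see what the current
| process knows about CUDA. The sketch below only inspects
| #[code CUDA_HOME] and looks for the #[code nvcc] compiler on
| #[code PATH]. It does not touch Chainer, CuPy or spaCy. The function name
| #[code cuda_environment] is an assumption for this example, and
| #[code shutil.which] requires Python 3.3+.

```python
import os
import shutil


def cuda_environment(environ=None):
    """Report CUDA_HOME and the location of nvcc as seen by this process."""
    environ = os.environ if environ is None else environ
    cuda_home = environ.get("CUDA_HOME")
    # shutil.which searches the directories of the given PATH string and
    # returns None if no executable called nvcc is found.
    nvcc = shutil.which("nvcc", path=environ.get("PATH", ""))
    return {"CUDA_HOME": cuda_home, "nvcc": nvcc}


if __name__ == "__main__":
    for key, value in sorted(cuda_environment().items()):
        print("{}: {}".format(key, value if value else "not set / not found"))
```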
+h(2, "source") Compile from source
p
| The other way to install spaCy is to clone its
| #[+a(gh("spaCy")) GitHub repository] and build it from source. That is
| the common way if you want to make changes to the code base. You'll need to
| make sure that you have a development environment consisting of a Python
| distribution including header files, a compiler,
| #[+a("https://pip.pypa.io/en/latest/installing/") pip],
| #[+a("https://virtualenv.pypa.io/") virtualenv] and
| #[+a("https://git-scm.com") git] installed. The compiler part is the
| trickiest. How to do that depends on your system. See notes on
| #[a(href="#source-ubuntu") Ubuntu], #[a(href="#source-osx") OS X] and
| #[a(href="#source-windows") Windows] for details.
+code(false, "bash").
# make sure you are using recent pip/virtualenv versions
python -m pip install -U pip virtualenv
git clone #{gh("spaCy")}
cd spaCy
virtualenv .env
source .env/bin/activate
pip install -r requirements.txt
pip install -e .
p
| Compared to a regular install via pip, #[+a(gh("spaCy", "requirements.txt")) requirements.txt]
| additionally installs developer dependencies such as Cython.
p
| Instead of the above verbose commands, you can also use the following
| #[+a("http://www.fabfile.org/") Fabric] commands:
+table(["Command", "Description"])
+row
+cell #[code fab env]
+cell Create #[code virtualenv] and delete previous one, if it exists.
+row
+cell #[code fab make]
+cell Compile the source.
+row
+cell #[code fab clean]
+cell Remove compiled objects, including the generated C++.
+row
+cell #[code fab test]
+cell Run basic tests, aborting after first failure.
p
| All commands assume that your #[code virtualenv] is located in a
| directory #[code .env]. If you're using a different directory, you can
| change it via the environment variable #[code VENV_DIR], for example:
+code(false, "bash").
VENV_DIR=".custom-env" fab clean make
+h(3, "source-ubuntu") Ubuntu
p Install system-level dependencies via #[code apt-get]:
+code(false, "bash").
sudo apt-get install build-essential python-dev git
+h(3, "source-osx") macOS / OS X
p
| Install a recent version of #[+a("https://developer.apple.com/xcode/") Xcode],
| including the so-called "Command Line Tools". macOS and OS X ship with
| Python and git preinstalled. To compile spaCy with multi-threading support
| on macOS / OS X, #[+a("https://github.com/explosion/spaCy/issues/267") see here].
+h(3, "source-windows") Windows
p
| Install a version of
| #[+a("https://www.visualstudio.com/vs/visual-studio-express/") Visual Studio Express]
| that matches the version that was used to compile your Python
| interpreter. For official distributions these are:
+table([ "Distribution", "Version"])
+row
+cell Python 2.7
+cell Visual Studio 2008
+row
+cell Python 3.4
+cell Visual Studio 2010
+row
+cell Python 3.5+
+cell Visual Studio 2015
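p
| To find out which compiler your own interpreter was built with, you can
| inspect #[code platform.python_compiler()]. The sketch below maps the
| reported #[code MSC v.NNNN] number to the three Visual Studio releases
| from the table. The helper #[code visual_studio_for] is hypothetical, and
| any compiler version that is not in the table simply yields
| #[code None].

```python
import platform
import re

# Microsoft compiler version numbers for the releases listed in the table.
MSC_TO_VISUAL_STUDIO = {
    1500: "Visual Studio 2008",
    1600: "Visual Studio 2010",
    1900: "Visual Studio 2015",
}


def visual_studio_for(compiler):
    """Map a platform.python_compiler() string to a release from the table."""
    match = re.search(r"MSC v\.(\d+)", compiler)
    if match is None:
        return None  # not built with the Microsoft compiler
    # Versions missing from the table also return None.
    return MSC_TO_VISUAL_STUDIO.get(int(match.group(1)))


if __name__ == "__main__":
    compiler = platform.python_compiler()
    print("built with:", compiler)
    print("matching release:", visual_studio_for(compiler))
```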
+h(2, "troubleshooting") Troubleshooting guide
p
| This section collects some of the most common errors you may come
| across when installing, loading and using spaCy, as well as their solutions.
+aside("Help us improve this guide")
| Did you come across a problem like the ones listed here and want to
| share the solution? You can find the "Suggest edits" button at the
| bottom of this page that points you to the source. We always
| appreciate #[+a(gh("spaCy") + "/pulls") pull requests]!
+h(3, "compatible-model") No compatible model found
+code(false, "text").
No compatible model found for [lang] (spaCy v#{SPACY_VERSION}).
p
| This usually means that the model you're trying to download does not
| exist, or isn't available for your version of spaCy. Check the
| #[+a(gh("spacy-models", "compatibility.json")) compatibility table]
| to see which models are available for your spaCy version. If you're using
| an old version, consider upgrading to the latest release. Note that while
| spaCy supports tokenization for
| #[+a("/docs/api/language-models/#alpha-support") a variety of languages],
| not all of them come with statistical models. To only use the tokenizer,
| import the language's #[code Language] class instead, for example
| #[code from spacy.lang.fr import French].
+h(3, "symlink-privilege") Symbolic link privilege not held
+code(false, "text").
OSError: symbolic link privilege not held
p
| To create #[+a("/docs/usage/models/#usage") shortcut links] that let you
| load models by name, spaCy creates a symbolic link in the
| #[code spacy/data] directory. This means your user needs permission to do
| this. The above error mostly occurs when doing a system-wide installation,
| which will create the symlinks in a system directory. Run the
| #[code download] or #[code link] command as administrator, or use a
| #[code virtualenv] to install spaCy in a user directory, instead
| of doing a system-wide installation.
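p
| To check whether your user is allowed to create symbolic links at all,
| you can try it in a throwaway directory. This is only a sketch: the
| helper #[code can_create_symlinks] is made up for this example, it
| requires Python 3.2+, and by default it tests the system's temporary
| directory rather than #[code spacy/data]. Pass a directory you care
| about to test that location instead.

```python
import os
import tempfile


def can_create_symlinks(directory=None):
    """Try to create a throwaway symlink and report whether it worked."""
    # The temporary directory and everything in it is removed on exit.
    with tempfile.TemporaryDirectory(dir=directory) as tmp:
        target = os.path.join(tmp, "target")
        link = os.path.join(tmp, "link")
        os.mkdir(target)
        try:
            os.symlink(target, link, target_is_directory=True)
        except (OSError, NotImplementedError, AttributeError):
            # Missing privilege or no symlink support on this platform.
            return False
        return os.path.islink(link)


if __name__ == "__main__":
    print("can create symlinks:", can_create_symlinks())
```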
+h(3, "no-cache-dir") No such option: --no-cache-dir
+code(false, "text").
no such option: --no-cache-dir
p
| The #[code download] command uses pip to install the models and sets the
| #[code --no-cache-dir] flag to prevent it from requiring too much memory.
| #[+a("https://pip.pypa.io/en/stable/reference/pip_install/#caching") This setting]
| requires pip v6.0 or newer. Run #[code pip install -U pip] to upgrade to
| the latest version of pip. To see which version you have installed,
| run #[code pip --version].
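p
| The version check can also be scripted. The sketch below parses the
| text printed by #[code pip --version] and compares it against the v6.0
| requirement mentioned above. Both function names are assumptions for this
| example, and the sample strings are illustrative rather than output from
| a real system.

```python
import re


def pip_version(version_output):
    """Extract (major, minor) from the output of `pip --version`."""
    match = re.match(r"pip (\d+)\.(\d+)", version_output)
    if match is None:
        raise ValueError("unexpected output: {!r}".format(version_output))
    return int(match.group(1)), int(match.group(2))


def supports_no_cache_dir(version_output):
    """Return True if this pip is v6.0 or newer."""
    return pip_version(version_output) >= (6, 0)


if __name__ == "__main__":
    new = "pip 9.0.1 from /usr/lib/python3/site-packages (python 3.5)"
    old = "pip 1.5.6 from /usr/lib/python2.7/dist-packages (python 2.7)"
    print(supports_no_cache_dir(new))  # True
    print(supports_no_cache_dir(old))  # False
```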
+h(3, "import-error") Import error
+code(false, "text").
ImportError: No module named spacy
p
| This error means that the spaCy module can't be located on your system, or in
| your environment. Make sure you have spaCy installed. If you're using a
| #[code virtualenv], make sure it's activated and check that spaCy is
| installed in that environment. Otherwise, you're trying to load a system
| installation. You can also run #[code which python] to find out where
| your Python executable is located.
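p
| The same information is available from inside Python, which also works
| on Windows where #[code which] is not available. The following sketch
| reports the running interpreter, its environment and whether a module
| can be imported. #[code diagnose_import] is a hypothetical helper, not
| part of spaCy.

```python
import importlib
import sys


def diagnose_import(name="spacy"):
    """Report which interpreter is running and whether `name` can be imported."""
    report = [
        "interpreter: {}".format(sys.executable),
        "environment: {}".format(sys.prefix),
    ]
    try:
        module = importlib.import_module(name)
    except ImportError as error:
        report.append("import failed: {}".format(error))
    else:
        # Built-in modules have no __file__ attribute.
        location = getattr(module, "__file__", None) or "<built-in>"
        report.append("imported from: {}".format(location))
    return report


if __name__ == "__main__":
    print("\n".join(diagnose_import("spacy")))
```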
+h(3, "import-error-models") Import error: models
+code(false, "text").
ImportError: No module named 'en_core_web_sm'
p
| As of spaCy v1.7, all models can be installed as Python packages. This means
| that they'll become importable modules of your application. When creating
| #[+a("/docs/usage/models/#usage") shortcut links], spaCy will also try
| to import the model to load its metadata. If this fails, it's usually a
| sign that the package is not installed in the current environment.
| Run #[code pip list] or #[code pip freeze] to check which model packages
| you have installed, and install the
| #[+a("/docs/usage/models#available") correct models] if necessary. If you're
| importing a model manually at the top of a file, make sure to use the name
| of the package, not the shortcut link you've created.
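p
| As a programmatic alternative to #[code pip list], the sketch below
| lists installed packages whose names start with a given prefix. It
| assumes Python 3.8+ for #[code importlib.metadata], and it assumes that
| model packages follow a language-prefixed naming pattern like
| #[code en_core_web_sm] above. The helper names and the prefixes are
| illustrative only.

```python
from importlib import metadata  # standard library on Python 3.8+


def normalize(name):
    """Lowercase a distribution name and use underscores, like module names."""
    return name.lower().replace("-", "_")


def installed_packages_with_prefix(prefixes):
    """Return {name: version} for installed packages matching any prefix."""
    prefixes = tuple(prefixes)
    found = {}
    for dist in metadata.distributions():
        name = dist.metadata.get("Name")
        if not name:
            continue  # skip installations with incomplete metadata
        if normalize(name).startswith(prefixes):
            found[normalize(name)] = dist.version
    return found


if __name__ == "__main__":
    # Prefixes for the four languages in the quickstart (an assumption).
    for name, version in sorted(
        installed_packages_with_prefix(("en_", "de_", "fr_", "es_")).items()
    ):
        print(name, version)
```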
+h(3, "vocab-strings") File not found: vocab/strings.json
+code(false, "text").
FileNotFoundError: No such file or directory: [...]/vocab/strings.json
p
| This error may occur when using #[code spacy.load()] to load
| a language model either because you haven't set up a
| #[+a("/docs/usage/models/#usage") shortcut link] for it, or because it
| doesn't actually exist. Set up a
| #[+a("/docs/usage/models/#usage") shortcut link] for the model
| you want to load. This can either be an installed model package, or a
| local directory containing the model data. If you want to use one of the
| #[+a("/docs/api/language-models/#alpha-support") alpha tokenizers] for
| languages that don't yet have a statistical model, you should import its
| #[code Language] class instead, for example
| #[code from spacy.lang.bn import Bengali].
+h(3, "command-not-found") Command not found
+code(false, "text").
command not found: spacy
p
| This error may occur when running the #[code spacy] command from the
| command line. spaCy does not currently add an entry to your #[code PATH]
| environment variable, as this can lead to unexpected results, especially
| when using #[code virtualenv]. Run the command with #[code python -m],
| for example #[code python -m spacy download en]. For more info on this,
| see #[+api("cli#download") download].
+h(3, "module-load") 'module' object has no attribute 'load'
+code(false, "text").
AttributeError: 'module' object has no attribute 'load'
p
| While this could technically have many causes, including spaCy being
| broken, the most likely one is that your script's file or directory name
| is "shadowing" the module, e.g. your file is called #[code spacy.py],
| or a directory you're importing from is called #[code spacy]. So, when
| using spaCy, never call anything else #[code spacy].
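p
| A quick way to look for such a name clash is to scan your project
| directory for a file or directory with that name. The sketch below does
| only that and does not import anything. #[code find_shadowing] is a
| made-up helper, and its results are candidates worth a look rather than
| a definitive diagnosis. Run it on your own project directory, not on the
| directory where spaCy itself is installed.

```python
import os


def find_shadowing(directory, module_name="spacy"):
    """Return paths in `directory` that could shadow `module_name` on import."""
    hits = []
    script = os.path.join(directory, module_name + ".py")
    package = os.path.join(directory, module_name)
    if os.path.isfile(script):
        hits.append(script)  # e.g. a script called spacy.py
    if os.path.isdir(package):
        hits.append(package)  # e.g. a directory called spacy
    return hits


if __name__ == "__main__":
    for path in find_shadowing(os.getcwd()):
        print("possible name clash:", path)
```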
+h(2, "tests") Run tests
p
| spaCy comes with an #[+a(gh("spacy", "spacy/tests")) extensive test suite].
| First, find out where spaCy is installed:
+code(false, "bash").
python -c "import os; import spacy; print(os.path.dirname(spacy.__file__))"
p
| Then run #[code pytest] on that directory. The flags #[code --slow] and
| #[code --models] are optional and enable additional tests.
+code(false, "bash").
# make sure you are using a recent pytest version
python -m pip install -U pytest
python -m pytest <spacy-directory> # basic tests
python -m pytest <spacy-directory> --slow # basic and slow tests
python -m pytest <spacy-directory> --models --all # basic and all model tests
python -m pytest <spacy-directory> --models --en # basic and English model tests
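p
| The two steps above can be combined in a small script. The following is
| a sketch under a few assumptions: it requires Python 3.4+ for
| #[code importlib.util.find_spec], and both helper names are
| hypothetical. It locates the package without importing it and builds the
| #[code pytest] command for the same interpreter, which you can then pass
| to #[code subprocess.call].

```python
import importlib.util
import os
import sys


def package_directory(name):
    """Return the directory of an installed package without importing it."""
    spec = importlib.util.find_spec(name)
    if spec is None or spec.origin is None:
        return None  # not installed, or no file location available
    return os.path.dirname(spec.origin)


def pytest_command(directory, *flags):
    """Build the pytest invocation for the interpreter running this script."""
    return [sys.executable, "-m", "pytest", directory] + list(flags)


if __name__ == "__main__":
    directory = package_directory("spacy")
    if directory is None:
        print("spaCy is not installed in this environment")
    else:
        print(" ".join(pytest_command(directory, "--slow")))
```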