Update installation instructions and fix formatting

ines 2017-03-08 11:36:00 +01:00
parent 34801a0725
commit 758335452d
2 changed files with 133 additions and 104 deletions


@ -1,10 +1,10 @@
spaCy: Industrial-strength NLP spaCy: Industrial-strength NLP
****************************** ******************************
spaCy is a library for advanced natural language processing in Python and spaCy is a library for advanced natural language processing in Python and
Cython. spaCy is built on the very latest research, but it isn't researchware. Cython. spaCy is built on the very latest research, but it isn't researchware.
It was designed from day one to be used in real products. spaCy currently supports It was designed from day one to be used in real products. spaCy currently supports
English and German, as well as tokenization for Chinese, Spanish, Italian, French, English and German, as well as tokenization for Chinese, Spanish, Italian, French,
Portuguese, Dutch, Swedish, Finnish, Hungarian and Bengali. It's commercial open-source Portuguese, Dutch, Swedish, Finnish, Hungarian and Bengali. It's commercial open-source
software, released under the MIT license. software, released under the MIT license.
@ -13,23 +13,23 @@ software, released under the MIT license.
.. image:: https://img.shields.io/travis/explosion/spaCy/master.svg?style=flat-square .. image:: https://img.shields.io/travis/explosion/spaCy/master.svg?style=flat-square
:target: https://travis-ci.org/explosion/spaCy :target: https://travis-ci.org/explosion/spaCy
:alt: Build Status :alt: Build Status
.. image:: https://img.shields.io/github/release/explosion/spacy.svg?style=flat-square .. image:: https://img.shields.io/github/release/explosion/spacy.svg?style=flat-square
:target: https://github.com/explosion/spaCy/releases :target: https://github.com/explosion/spaCy/releases
:alt: Current Release Version :alt: Current Release Version
.. image:: https://img.shields.io/pypi/v/spacy.svg?style=flat-square .. image:: https://img.shields.io/pypi/v/spacy.svg?style=flat-square
:target: https://pypi.python.org/pypi/spacy :target: https://pypi.python.org/pypi/spacy
:alt: pypi Version :alt: pypi Version
.. image:: https://anaconda.org/conda-forge/spacy/badges/version.svg .. image:: https://anaconda.org/conda-forge/spacy/badges/version.svg
:target: https://anaconda.org/conda-forge/spacy :target: https://anaconda.org/conda-forge/spacy
:alt: conda Version :alt: conda Version
.. image:: https://img.shields.io/badge/gitter-join%20chat%20%E2%86%92-09a3d5.svg?style=flat-square .. image:: https://img.shields.io/badge/gitter-join%20chat%20%E2%86%92-09a3d5.svg?style=flat-square
:target: https://gitter.im/explosion/spaCy :target: https://gitter.im/explosion/spaCy
:alt: spaCy on Gitter :alt: spaCy on Gitter
.. image:: https://img.shields.io/twitter/follow/spacy_io.svg?style=social&label=Follow .. image:: https://img.shields.io/twitter/follow/spacy_io.svg?style=social&label=Follow
:target: https://twitter.com/spacy_io :target: https://twitter.com/spacy_io
:alt: spaCy on Twitter :alt: spaCy on Twitter
@ -55,7 +55,7 @@ software, released under the MIT license.
+---------------------------+------------------------------------------------------------------------------------------------------------+ +---------------------------+------------------------------------------------------------------------------------------------------------+
| **Bug reports**     | `GitHub Issue tracker <https://github.com/explosion/spaCy/issues>`_                                     | | **Bug reports**     | `GitHub Issue tracker <https://github.com/explosion/spaCy/issues>`_                                     |
+---------------------------+------------------------------------------------------------------------------------------------------------+ +---------------------------+------------------------------------------------------------------------------------------------------------+
| **Usage questions**   | `StackOverflow <http://stackoverflow.com/questions/tagged/spacy>`_, `Reddit usergroup                     | | **Usage questions**   | `StackOverflow <http://stackoverflow.com/questions/tagged/spacy>`_, `Reddit usergroup                     |
| | <https://www.reddit.com/r/spacynlp>`_, `Gitter chat <https://gitter.im/explosion/spaCy>`_ | | | <https://www.reddit.com/r/spacynlp>`_, `Gitter chat <https://gitter.im/explosion/spaCy>`_ |
+---------------------------+------------------------------------------------------------------------------------------------------------+ +---------------------------+------------------------------------------------------------------------------------------------------------+
| **General discussion** | `Reddit usergroup <https://www.reddit.com/r/spacynlp>`_, | | **General discussion** | `Reddit usergroup <https://www.reddit.com/r/spacynlp>`_, |
@ -104,100 +104,143 @@ Supports
Install spaCy Install spaCy
============= =============
spaCy is compatible with 64-bit CPython 2.6+/3.3+ and runs on Unix/Linux, OS X spaCy is compatible with **64-bit CPython 2.6+/3.3+** and runs on **Unix/Linux**,
and Windows. Source packages are available via **macOS/OS X** and **Windows**. The latest spaCy releases are available over
`pip <https://pypi.python.org/pypi/spacy>`_. Please make sure that `pip <https://pypi.python.org/pypi/spacy>`_ (source packages only) and
you have a working build enviroment set up. See notes on Ubuntu, macOS/OS X and Windows `conda <https://anaconda.org/conda-forge/spacy>`_. Installation requires a working
for details. build environment. See notes on Ubuntu, macOS/OS X and Windows for details.
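The compatibility statement above can be checked before installing. The following is a minimal sketch using only the standard library; the function names ``is_supported_interpreter`` and ``current_interpreter_supported`` are hypothetical and not part of spaCy, and the version bounds are taken directly from the sentence above.

```python
import platform
import struct
import sys


def is_supported_interpreter(implementation, version, pointer_bits):
    """Return True for 64-bit CPython 2.6+ or 3.3+ (hypothetical helper)."""
    if implementation != "CPython" or pointer_bits != 64:
        return False
    major, minor = version[0], version[1]
    if major == 2:
        return minor >= 6
    if major == 3:
        return minor >= 3
    return False


def current_interpreter_supported():
    """Apply the check to the interpreter running this script."""
    # struct.calcsize("P") is the size of a C pointer in bytes.
    pointer_bits = struct.calcsize("P") * 8
    return is_supported_interpreter(
        platform.python_implementation(), sys.version_info, pointer_bits
    )


if __name__ == "__main__":
    print("supported interpreter:", current_interpreter_supported())
```

Passing the values in as arguments keeps the rule itself separate from the interpreter it runs on, so it can be tried against other version tuples.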
pip pip
--- ---
When using pip it is generally recommended to install packages in a virtualenv to Using pip, spaCy releases are currently only available as source packages.
avoid modifying system state:
.. code:: bash .. code:: bash
pip install spacy pip install -U spacy
Python packaging is awkward at the best of times, and it's particularly tricky with When using pip it is generally recommended to install packages in a ``virtualenv``
C extensions, built via Cython, requiring large data files. So, please report issues to avoid modifying system state:
as you encounter them.
.. code:: bash
virtualenv .env
source .env/bin/activate
pip install spacy
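To confirm that the ``virtualenv`` is actually active before running ``pip install``, a script can inspect the interpreter prefixes. This is a rough sketch, not something spaCy provides; the helper names are hypothetical. It relies on two conventions: older ``virtualenv`` releases set ``sys.real_prefix``, while the ``venv`` module and newer ``virtualenv`` releases make ``sys.base_prefix`` differ from ``sys.prefix``.

```python
import sys


def in_virtual_environment(prefix, base_prefix=None, real_prefix=None):
    """Decide from the interpreter prefixes whether an environment is active."""
    # Older virtualenv releases expose the original prefix as sys.real_prefix.
    if real_prefix is not None:
        return True
    # venv and newer virtualenv keep the original prefix in sys.base_prefix.
    return base_prefix is not None and base_prefix != prefix


def running_in_virtual_environment():
    """Apply the check to the interpreter running this script."""
    return in_virtual_environment(
        sys.prefix,
        getattr(sys, "base_prefix", None),
        getattr(sys, "real_prefix", None),
    )


if __name__ == "__main__":
    print("virtual environment active:", running_in_virtual_environment())
```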
conda conda
----- -----
If you're using conda, you can install spaCy via ``conda-forge``: Thanks to our great community, we've finally re-added conda support. You can now
install spaCy via ``conda-forge``:
.. code:: bash .. code:: bash
  conda config --add channels conda-forge   conda config --add channels conda-forge
  conda install spacy   conda install spacy
For the feedstock including the build recipe and configuration, For the feedstock including the build recipe and configuration,
check out `this repository <https://github.com/conda-forge/spacy-feedstock>`_. check out `this repository <https://github.com/conda-forge/spacy-feedstock>`_.
Thanks to our great community, we've finally re-added conda support — improvements Improvements and pull requests to the recipe and setup are always appreciated.
and pull requests to the recipe and setup are always appreciated.
Install model Download models
============= ===============
After installation you need to download a language model. Currently only models for After installation you need to download a language model. Models for English
English and German, named ``en`` and ``de``, are available. (``en``) and German (``de``) are available.
.. code:: bash .. code:: bash
python -m spacy.en.download all python -m spacy.en.download all
python -m spacy.de.download all python -m spacy.de.download all
The download command fetches about 1 GB of data which it installs The download command fetches about 1 GB of data which it installs
within the ``spacy`` package directory. within the ``spacy`` package directory.
Upgrading spaCy Sometimes new releases require a new language model. Then you will have to
=============== upgrade to a new model, too. You can also force re-downloading and installing a
new language model:
To upgrade spaCy to the latest release:
pip
---
.. code:: bash
pip install -U spacy
Sometimes new releases require a new language model. Then you will have to upgrade to
a new model, too. You can also force re-downloading and installing a new language model:
.. code:: bash .. code:: bash
python -m spacy.en.download --force python -m spacy.en.download --force
Download model to custom location
---------------------------------
You can specify where ``spacy.en.download`` and ``spacy.de.download`` download
the language model to using the ``--data-path`` or ``-d`` argument:
.. code:: bash
python -m spacy.en.download all --data-path /some/dir
If you choose to download to a custom location, you will need to tell spaCy where to load the model
from in order to use it. You can do this either by calling ``spacy.util.set_data_path()`` before
calling ``spacy.load()``, or by passing a ``path`` argument to the ``spacy.en.English`` or
``spacy.de.German`` constructors.
Download models manually
------------------------
As of v1.6, the models and word vectors are also available as direct downloads
from GitHub, attached to the `releases <https://github.com/explosion/spacy/releases>`_
as ``.tar.gz`` archives.
To install the models manually, first find the default data path. You can use
``spacy.util.get_data_path()`` to find the directory where spaCy will look for
its models, or change the default data path with ``spacy.util.set_data_path()``.
Then simply unpack the archive and place the contained folder in that directory.
You can now load the models via ``spacy.load()``.
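The unpacking step of the manual install can be scripted with the standard library. The sketch below assumes you have already downloaded a ``.tar.gz`` archive and know the target directory (for example, the value reported by ``spacy.util.get_data_path()``). The ``unpack_model`` helper is hypothetical and does not call spaCy itself.

```python
import os
import tarfile


def unpack_model(archive_path, data_dir):
    """Unpack a .tar.gz archive into data_dir; return the new top-level entries."""
    data_dir = os.path.abspath(data_dir)
    if not os.path.isdir(data_dir):
        os.makedirs(data_dir)
    before = set(os.listdir(data_dir))
    with tarfile.open(archive_path, "r:gz") as archive:
        for member in archive.getmembers():
            target = os.path.abspath(os.path.join(data_dir, member.name))
            # Refuse members that would be written outside data_dir.
            if target != data_dir and not target.startswith(data_dir + os.sep):
                raise ValueError("unsafe path in archive: " + member.name)
        archive.extractall(data_dir)
    return sorted(set(os.listdir(data_dir)) - before)
```

The return value lists the folders that were added, which makes it easy to confirm that the archive's folder ended up in the data directory.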
Compile from source Compile from source
=================== ===================
The other way to install spaCy is to clone its GitHub repository and build it from The other way to install spaCy is to clone its
`GitHub repository <https://github.com/explosion/spaCy>`_ and build it from
source. That is the common way if you want to make changes to the code base. source. That is the common way if you want to make changes to the code base.
You'll need to make sure that you have a development environment consisting of a
You'll need to make sure that you have a development environment consisting of a Python distribution including header files, a compiler,
Python distribution including header files, a compiler, pip, virtualenv and git `pip <https://pip.pypa.io/en/latest/installing/>`__, `virtualenv <https://virtualenv.pypa.io/>`_
installed. The compiler part is the trickiest. How to do that depends on your and `git <https://git-scm.com>`_ installed. The compiler part is the trickiest.
system. See notes on Ubuntu, OS X and Windows for details. How to do that depends on your system. See notes on Ubuntu, OS X and Windows for
details.
.. code:: bash .. code:: bash
# make sure you are using recent pip/virtualenv versions # make sure you are using recent pip/virtualenv versions
python -m pip install -U pip virtualenv python -m pip install -U pip virtualenv
git clone #{gh("spaCy")}
# find git install instructions at https://git-scm.com/downloads
git clone https://github.com/explosion/spaCy.git
cd spaCy cd spaCy
virtualenv .env && source .env/bin/activate
virtualenv .env
source .env/bin/activate
pip install -r requirements.txt pip install -r requirements.txt
pip install -e . pip install -e .
Compared to regular install via pip `requirements.txt <requirements.txt>`_ Compared to regular install via pip `requirements.txt <requirements.txt>`_
additionally installs developer dependencies such as cython. additionally installs developer dependencies such as Cython.
Instead of the above verbose commands, you can also use the following
`Fabric <http://www.fabfile.org/>`_ commands:
+---------------+--------------------------------------------------------------+
| ``fab env`` | Create ``virtualenv`` and delete previous one, if it exists. |
+---------------+--------------------------------------------------------------+
| ``fab make`` | Compile the source. |
+---------------+--------------------------------------------------------------+
| ``fab clean`` | Remove compiled objects, including the generated C++. |
+---------------+--------------------------------------------------------------+
| ``fab test`` | Run basic tests, aborting after first failure. |
+---------------+--------------------------------------------------------------+
All commands assume that your ``virtualenv`` is located in a directory ``.env``.
If you're using a different directory, you can change it via the environment
variable ``VENV_DIR``, for example:
.. code:: bash
VENV_DIR=".custom-env" fab clean make
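A build script would typically resolve such an override along these lines. This is a hypothetical sketch of the pattern, not the project's actual fabfile. ``resolve_venv_dir`` is an illustrative name, and treating an empty ``VENV_DIR`` as unset is an assumption made here.

```python
import os


def resolve_venv_dir(environ=None, default=".env"):
    """Return the virtualenv directory, honouring VENV_DIR when it is set."""
    if environ is None:
        environ = os.environ
    # An unset or empty VENV_DIR falls back to the default directory.
    return environ.get("VENV_DIR") or default


if __name__ == "__main__":
    print("virtualenv directory:", resolve_venv_dir())
```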
Ubuntu Ubuntu
------ ------
@ -211,54 +254,38 @@ Install system-level dependencies via ``apt-get``:
macOS / OS X macOS / OS X
------------ ------------
Install a recent version of `XCode <https://developer.apple.com/xcode/>`_, Install a recent version of `XCode <https://developer.apple.com/xcode/>`_,
including the so-called "Command Line Tools". macOS and OS X ship with Python including the so-called "Command Line Tools". macOS and OS X ship with Python
and git preinstalled. and git preinstalled.
Windows Windows
------- -------
Install a version of `Visual Studio Express <https://www.visualstudio.com/vs/visual-studio-express/>`_ Install a version of `Visual Studio Express <https://www.visualstudio.com/vs/visual-studio-express/>`_
or higher that matches the version that was used to compile your Python or higher that matches the version that was used to compile your Python
interpreter. For official distributions these are VS 2008 (Python 2.7), interpreter. For official distributions these are VS 2008 (Python 2.7),
VS 2010 (Python 3.4) and VS 2015 (Python 3.5). VS 2010 (Python 3.4) and VS 2015 (Python 3.5).
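The pairing above can be written as a small lookup. It covers only the three versions named in this section; the table and function names are hypothetical, and any other Python version returns ``None`` rather than a guess.

```python
import sys

# Compiler versions for official Python distributions, as listed above.
VISUAL_STUDIO_FOR_PYTHON = {
    (2, 7): "VS 2008",
    (3, 4): "VS 2010",
    (3, 5): "VS 2015",
}


def visual_studio_for(version):
    """Return the matching Visual Studio release, or None if not listed."""
    return VISUAL_STUDIO_FOR_PYTHON.get((version[0], version[1]))


if __name__ == "__main__":
    print(visual_studio_for(sys.version_info))
```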
Run tests Run tests
========= =========
spaCy comes with an extensive test suite. First, find out where spaCy is spaCy comes with an `extensive test suite <spacy/tests>`_. First, find out where
installed: spaCy is installed:
.. code:: bash .. code:: bash
python -c "import os; import spacy; print(os.path.dirname(spacy.__file__))" python -c "import os; import spacy; print(os.path.dirname(spacy.__file__))"
Then run ``pytest`` on that directory. The flags ``--vectors``, ``--slow`` Then run ``pytest`` on that directory. The flags ``--vectors``, ``--slow``
and ``--model`` are optional and enable additional tests: and ``--model`` are optional and enable additional tests:
.. code:: bash .. code:: bash
# make sure you are using recent pytest version # make sure you are using recent pytest version
python -m pip install -U pytest python -m pip install -U pytest
python -m pytest <spacy-directory> --vectors --model --slow python -m pytest <spacy-directory> --vectors --model --slow
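As an alternative to the one-liner above, a package directory can be located without importing the package, using ``importlib`` (Python 3 only). The helper name ``package_directory`` is hypothetical. The demonstration uses the standard-library ``json`` package so that it runs without spaCy installed; pass ``"spacy"`` to find the directory to hand to ``pytest``.

```python
import importlib.util
import os


def package_directory(name):
    """Return the directory containing a top-level package, or None if not found."""
    spec = importlib.util.find_spec(name)
    if spec is None or spec.origin is None:
        return None
    # For a regular package, origin points at its __init__.py file.
    return os.path.dirname(spec.origin)


if __name__ == "__main__":
    print(package_directory("json"))
```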
Download model to custom location
=================================
You can specify where ``spacy.en.download`` and ``spacy.de.download`` download the language model
to using the ``--data-path`` or ``-d`` argument:
.. code:: bash
python -m spacy.en.download all --data-path /some/dir
If you choose to download to a custom location, you will need to tell spaCy where to load the model
from in order to use it. You can do this either by calling ``spacy.util.set_data_path()`` before
calling ``spacy.load()``, or by passing a ``path`` argument to the ``spacy.en.English`` or
``spacy.de.German`` constructors.
Changelog Changelog
========= =========
@ -473,10 +500,10 @@ Thanks to `@daylen <https://github.com/daylen>`_, `@RahulKulhari <https://github
2016-05-05 `v0.100.7 <https://github.com/explosion/spaCy/releases/tag/0.100.7>`_: *German!* 2016-05-05 `v0.100.7 <https://github.com/explosion/spaCy/releases/tag/0.100.7>`_: *German!*
------------------------------------------------------------------------------------------- -------------------------------------------------------------------------------------------
spaCy finally supports another language, in addition to English. We're lucky spaCy finally supports another language, in addition to English. We're lucky
to have Wolfgang Seeker on the team, and the new German model is just the to have Wolfgang Seeker on the team, and the new German model is just the
beginning. Now that there are multiple languages, you should consider loading beginning. Now that there are multiple languages, you should consider loading
spaCy via the ``load()`` function. This function also makes it easier to load extra spaCy via the ``load()`` function. This function also makes it easier to load extra
word vector data for English: word vector data for English:
.. code:: python .. code:: python
@ -484,25 +511,25 @@ word vector data for English:
import spacy import spacy
en_nlp = spacy.load('en', vectors='en_glove_cc_300_1m_vectors') en_nlp = spacy.load('en', vectors='en_glove_cc_300_1m_vectors')
de_nlp = spacy.load('de') de_nlp = spacy.load('de')
To support use of the load function, there are also two new helper functions: To support use of the load function, there are also two new helper functions:
``spacy.get_lang_class`` and ``spacy.set_lang_class``. Once the German model is ``spacy.get_lang_class`` and ``spacy.set_lang_class``. Once the German model is
loaded, you can use it just like the English model: loaded, you can use it just like the English model:
.. code:: python .. code:: python
doc = nlp(u'''Wikipedia ist ein Projekt zum Aufbau einer Enzyklopädie aus freien Inhalten, zu dem du mit deinem Wissen beitragen kannst. Seit Mai 2001 sind 1.936.257 Artikel in deutscher Sprache entstanden.''') doc = nlp(u'''Wikipedia ist ein Projekt zum Aufbau einer Enzyklopädie aus freien Inhalten, zu dem du mit deinem Wissen beitragen kannst. Seit Mai 2001 sind 1.936.257 Artikel in deutscher Sprache entstanden.''')
for sent in doc.sents: for sent in doc.sents:
print(sent.root.text, sent.root.n_lefts, sent.root.n_rights) print(sent.root.text, sent.root.n_lefts, sent.root.n_rights)
# (u'ist', 1, 2) # (u'ist', 1, 2)
# (u'sind', 1, 3) # (u'sind', 1, 3)
The German model provides tokenization, POS tagging, sentence boundary detection, The German model provides tokenization, POS tagging, sentence boundary detection,
syntactic dependency parsing, recognition of organisation, location and person syntactic dependency parsing, recognition of organisation, location and person
entities, and word vector representations trained on a mix of open subtitles and entities, and word vector representations trained on a mix of open subtitles and
Wikipedia data. It doesn't yet provide lemmatisation or morphological analysis, Wikipedia data. It doesn't yet provide lemmatisation or morphological analysis,
and it doesn't yet recognise numeric entities such as numbers and dates. and it doesn't yet recognise numeric entities such as numbers and dates.
**Bugfixes** **Bugfixes**
@ -518,7 +545,7 @@ and it doesn't yet recognise numeric entities such as numbers and dates.
2016-03-08 `v0.100.6 <https://github.com/explosion/spaCy/releases/tag/0.100.6>`_: *Add support for GloVe vectors* 2016-03-08 `v0.100.6 <https://github.com/explosion/spaCy/releases/tag/0.100.6>`_: *Add support for GloVe vectors*
----------------------------------------------------------------------------------------------------------------- -----------------------------------------------------------------------------------------------------------------
This release offers improved support for replacing the word vectors used by spaCy. This release offers improved support for replacing the word vectors used by spaCy.
To install Stanford's GloVe vectors, trained on the Common Crawl, just run: To install Stanford's GloVe vectors, trained on the Common Crawl, just run:
.. code:: bash .. code:: bash
@ -527,8 +554,8 @@ To install Stanford's GloVe vectors, trained on the Common Crawl, just run:
To reduce memory usage and loading time, we've trimmed the vocabulary down to 1m entries. To reduce memory usage and loading time, we've trimmed the vocabulary down to 1m entries.
This release also integrates all the code necessary for German parsing. A German model This release also integrates all the code necessary for German parsing. A German model
will be released shortly. To assist in multi-lingual processing, we've added a ``load()`` will be released shortly. To assist in multi-lingual processing, we've added a ``load()``
function. To load the English model with the GloVe vectors: function. To load the English model with the GloVe vectors:
.. code:: python .. code:: python


@ -66,7 +66,9 @@ p
# Check whether the model was successfully installed # Check whether the model was successfully installed
python -c "import spacy; spacy.load('en'); print('OK')" python -c "import spacy; spacy.load('en'); print('OK')"
p The download command fetches about 1 GB of data which it installs within the #[code spacy] package directory. p
| The download command fetches about 1 GB of data which it
| installs within the #[code spacy] package directory.
+h(3, "custom-location") Download model to custom location +h(3, "custom-location") Download model to custom location
@ -144,7 +146,7 @@ p
+row +row
+cell #[code fab clean] +cell #[code fab clean]
+cell Remove compiled objects including the generated C++. +cell Remove compiled objects, including the generated C++.
+row +row
+cell #[code fab test] +cell #[code fab test]
@ -153,7 +155,7 @@ p
p p
| All commands assume that your #[code virtualenv] is located in a | All commands assume that your #[code virtualenv] is located in a
| directory #[code .env]. If you're using a different directory, you can | directory #[code .env]. If you're using a different directory, you can
| change it via environment the variable #[code VENV_DIR], for example: | change it via the environment variable #[code VENV_DIR], for example:
+code(false, "bash"). +code(false, "bash").
VENV_DIR=".custom-env" fab clean make VENV_DIR=".custom-env" fab clean make