diff --git a/README.rst b/README.rst
index 017a456f8..b6bee922b 100644
--- a/README.rst
+++ b/README.rst
@@ -1,35 +1,35 @@
spaCy: Industrial-strength NLP
******************************
-spaCy is a library for advanced natural language processing in Python and
-Cython. spaCy is built on the very latest research, but it isn't researchware.
-It was designed from day one to be used in real products. spaCy currently supports
-English and German, as well as tokenization for Chinese, Spanish, Italian, French,
+spaCy is a library for advanced natural language processing in Python and
+Cython. spaCy is built on the very latest research, but it isn't researchware.
+It was designed from day one to be used in real products. spaCy currently supports
+English and German, as well as tokenization for Chinese, Spanish, Italian, French,
Portuguese, Dutch, Swedish, Finnish, Hungarian and Bengali. It's commercial open-source
software, released under the MIT license.
💫 **Version 1.6 out now!** `Read the release notes here. `_
-.. image:: https://img.shields.io/travis/explosion/spaCy.svg?style=flat-square
+.. image:: https://img.shields.io/travis/explosion/spaCy/master.svg?style=flat-square
:target: https://travis-ci.org/explosion/spaCy
:alt: Build Status
-
+
.. image:: https://img.shields.io/github/release/explosion/spacy.svg?style=flat-square
- :target: https://github.com/explosion/spaCy/releases
+ :target: https://github.com/explosion/spaCy/releases
:alt: Current Release Version
-
-.. image:: https://anaconda.org/conda-forge/spacy/badges/version.svg
- :target: https://anaconda.org/conda-forge/spacy
- :alt: conda Version
-
+
.. image:: https://img.shields.io/pypi/v/spacy.svg?style=flat-square
:target: https://pypi.python.org/pypi/spacy
:alt: pypi Version
+.. image:: https://anaconda.org/conda-forge/spacy/badges/version.svg
+ :target: https://anaconda.org/conda-forge/spacy
+ :alt: conda Version
+
.. image:: https://img.shields.io/badge/gitter-join%20chat%20%E2%86%92-09a3d5.svg?style=flat-square
:target: https://gitter.im/explosion/spaCy
:alt: spaCy on Gitter
-
+
.. image:: https://img.shields.io/twitter/follow/spacy_io.svg?style=social&label=Follow
:target: https://twitter.com/spacy_io
:alt: spaCy on Twitter
@@ -55,7 +55,7 @@ software, released under the MIT license.
+---------------------------+------------------------------------------------------------------------------------------------------------+
| **Bug reports**           | `GitHub Issue tracker `_ |
+---------------------------+------------------------------------------------------------------------------------------------------------+
-| **Usage questions**  | `StackOverflow `_, `Reddit usergroup           |
+| **Usage questions**  | `StackOverflow `_, `Reddit usergroup           |
| | `_, `Gitter chat `_ |
+---------------------------+------------------------------------------------------------------------------------------------------------+
| **General discussion**    | `Reddit usergroup `_, |
@@ -104,100 +104,143 @@ Supports
Install spaCy
=============
-spaCy is compatible with 64-bit CPython 2.6+/3.3+ and runs on Unix/Linux, OS X
-and Windows. Source packages are available via
-`pip `_. Please make sure that
-you have a working build enviroment set up. See notes on Ubuntu, macOS/OS X and Windows
-for details.
+spaCy is compatible with **64-bit CPython 2.6+/3.3+** and runs on **Unix/Linux**,
+**macOS/OS X** and **Windows**. The latest spaCy releases are available over
+`pip `_ (source packages only) and
+`conda `_. Installation requires a working
+build environment. See notes on Ubuntu, macOS/OS X and Windows for details.
pip
---
-When using pip it is generally recommended to install packages in a virtualenv to
-avoid modifying system state:
+Using pip, spaCy releases are currently only available as source packages.
.. code:: bash
- pip install spacy
+ pip install -U spacy
-Python packaging is awkward at the best of times, and it's particularly tricky with
-C extensions, built via Cython, requiring large data files. So, please report issues
-as you encounter them.
+When using pip it is generally recommended to install packages in a ``virtualenv``
+to avoid modifying system state:
+
+.. code:: bash
+
+ virtualenv .env
+ source .env/bin/activate
+ pip install spacy
conda
-----
-If you're using conda, you can install spaCy via ``conda-forge``:
+Thanks to our great community, we've finally re-added conda support. You can now
+install spaCy via ``conda-forge``:
.. code:: bash
 conda config --add channels conda-forge
 conda install spacy
-
+
For the feedstock including the build recipe and configuration,
check out `this repository `_.
-Thanks to our great community, we've finally re-added conda support — improvements
-and pull requests to the recipe and setup are always appreciated.
+Improvements and pull requests to the recipe and setup are always appreciated.
-Install model
-=============
+Download models
+===============
-After installation you need to download a language model. Currently only models for
-English and German, named ``en`` and ``de``, are available.
+After installation you need to download a language model. Models for English
+(``en``) and German (``de``) are available.
.. code:: bash
python -m spacy.en.download all
python -m spacy.de.download all
-The download command fetches about 1 GB of data which it installs
+The download command fetches about 1 GB of data, which it installs
within the ``spacy`` package directory.
-Upgrading spaCy
-===============
-
-To upgrade spaCy to the latest release:
-
-pip
----
-
-.. code:: bash
-
- pip install -U spacy
-
-Sometimes new releases require a new language model. Then you will have to upgrade to
-a new model, too. You can also force re-downloading and installing a new language model:
+Sometimes a new release requires a new language model, in which case you will
+need to upgrade the model, too. To force re-downloading and reinstalling the
+language model, use the ``--force`` flag:
.. code:: bash
python -m spacy.en.download --force
+Download model to custom location
+---------------------------------
+
+To download the language model to a custom directory, pass the ``--data-path``
+(or ``-d``) argument to ``spacy.en.download`` or ``spacy.de.download``:
+
+.. code:: bash
+
+ python -m spacy.en.download all --data-path /some/dir
+
+If you download the model to a custom location, you need to tell spaCy where to
+load it from. Either call ``spacy.util.set_data_path()`` before calling
+``spacy.load()``, or pass a ``path`` argument to the ``spacy.en.English`` or
+``spacy.de.German`` constructor.
+
+Download models manually
+------------------------
+
+As of v1.6, the models and word vectors are also available as direct downloads
+from GitHub, attached to the `releases `_
+as ``.tar.gz`` archives.
+
+To install the models manually, first find the default data path. You can use
+``spacy.util.get_data_path()`` to find the directory where spaCy will look for
+its models, or change the default data path with ``spacy.util.set_data_path()``.
+Then simply unpack the archive and place the contained folder in that directory.
+You can now load the models via ``spacy.load()``.
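
The manual installation steps above can be sketched in Python as follows. This is a minimal illustration under stated assumptions, not part of spaCy: ``install_model_archive`` is a hypothetical helper, and it assumes the downloaded ``.tar.gz`` archive unpacks to the model folder that spaCy expects to find in its data path.

```python
import os
import tarfile


def install_model_archive(archive_path, data_path):
    """Unpack a downloaded model archive (.tar.gz) into ``data_path``.

    Hypothetical helper for illustration only; it is not part of spaCy.
    Returns the sorted names of the archive's top-level entries, which
    should be the model folder that spaCy will look for.
    """
    if not os.path.isdir(data_path):
        os.makedirs(data_path)
    with tarfile.open(archive_path, "r:gz") as tar:
        top_level = set()
        for member in tar.getmembers():
            # Ignore empty and "." components so "./model/x" maps to "model".
            parts = [p for p in member.name.split("/") if p not in ("", ".")]
            if parts:
                top_level.add(parts[0])
        if hasattr(tarfile, "data_filter"):
            # Newer Python versions can reject unsafe members, such as
            # absolute paths or ".." components, while extracting.
            tar.extractall(path=data_path, filter="data")
        else:
            tar.extractall(path=data_path)
    return sorted(top_level)
```

In practice you would pass the directory returned by ``spacy.util.get_data_path()`` as ``data_path``, for example ``install_model_archive('model.tar.gz', spacy.util.get_data_path())`` with a placeholder file name. The helper only places the folder; loading still goes through ``spacy.load()`` as described above.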
+
Compile from source
===================
-The other way to install spaCy is to clone its GitHub repository and build it from
+The other way to install spaCy is to clone its
+`GitHub repository `_ and build it from
source. That is the common way if you want to make changes to the code base.
-
-You'll need to make sure that you have a development enviroment consisting of a
-Python distribution including header files, a compiler, pip, virtualenv and git
-installed. The compiler part is the trickiest. How to do that depends on your
-system. See notes on Ubuntu, OS X and Windows for details.
+You'll need to make sure that you have a development environment consisting of a
+Python distribution including header files, a compiler,
+`pip `__, `virtualenv `_
+and `git `_ installed. The compiler part is the trickiest.
+How to do that depends on your system. See notes on Ubuntu, OS X and Windows for
+details.
.. code:: bash
# make sure you are using recent pip/virtualenv versions
python -m pip install -U pip virtualenv
-
- # find git install instructions at https://git-scm.com/downloads
- git clone https://github.com/explosion/spaCy.git
-
+ git clone https://github.com/explosion/spaCy
cd spaCy
- virtualenv .env && source .env/bin/activate
+
+ virtualenv .env
+ source .env/bin/activate
pip install -r requirements.txt
pip install -e .
-
-Compared to regular install via pip `requirements.txt `_
-additionally installs developer dependencies such as cython.
+
+Compared to a regular install via pip, `requirements.txt `_
+additionally installs developer dependencies such as Cython.
+
+Instead of the above verbose commands, you can also use the following
+`Fabric `_ commands:
+
++---------------+--------------------------------------------------------------+
+| ``fab env``   | Create ``virtualenv`` and delete previous one, if it exists. |
++---------------+--------------------------------------------------------------+
+| ``fab make``  | Compile the source.                                          |
++---------------+--------------------------------------------------------------+
+| ``fab clean`` | Remove compiled objects, including the generated C++.        |
++---------------+--------------------------------------------------------------+
+| ``fab test``  | Run basic tests, aborting after first failure.               |
++---------------+--------------------------------------------------------------+
+
+All commands assume that your ``virtualenv`` is located in a directory ``.env``.
+If you're using a different directory, you can change it via the environment
+variable ``VENV_DIR``, for example:
+
+.. code:: bash
+
+ VENV_DIR=".custom-env" fab clean make
Ubuntu
------
@@ -211,54 +254,38 @@ Install system-level dependencies via ``apt-get``:
macOS / OS X
------------
-Install a recent version of `XCode `_,
-including the so-called "Command Line Tools". macOS and OS X ship with Python
+Install a recent version of `XCode `_,
+including the so-called "Command Line Tools". macOS and OS X ship with Python
and git preinstalled.
Windows
-------
Install a version of `Visual Studio Express `_
-or higher that matches the version that was used to compile your Python
-interpreter. For official distributions these are VS 2008 (Python 2.7),
+or higher that matches the version that was used to compile your Python
+interpreter. For official distributions these are VS 2008 (Python 2.7),
VS 2010 (Python 3.4) and VS 2015 (Python 3.5).
Run tests
=========
-spaCy comes with an extensive test suite. First, find out where spaCy is
-installed:
+spaCy comes with an `extensive test suite `_. First, find out where
+spaCy is installed:
.. code:: bash
-
+
python -c "import os; import spacy; print(os.path.dirname(spacy.__file__))"
-Then run ``pytest`` on that directory. The flags ``--vectors``, ``--slow``
+Then run ``pytest`` on that directory. The flags ``--vectors``, ``--slow``
and ``--model`` are optional and enable additional tests:
.. code:: bash
-
+
# make sure you are using recent pytest version
python -m pip install -U pytest
python -m pytest --vectors --model --slow
-Download model to custom location
-=================================
-
-You can specify where ``spacy.en.download`` and ``spacy.de.download`` download the language model
-to using the ``--data-path`` or ``-d`` argument:
-
-.. code:: bash
-
- python -m spacy.en.download all --data-path /some/dir
-
-
-If you choose to download to a custom location, you will need to tell spaCy where to load the model
-from in order to use it. You can do this either by calling ``spacy.util.set_data_path()`` before
-calling ``spacy.load()``, or by passing a ``path`` argument to the ``spacy.en.English`` or
-``spacy.de.German`` constructors.
-
Changelog
=========
@@ -473,10 +500,10 @@ Thanks to `@daylen `_, `@RahulKulhari `_: *German!*
-------------------------------------------------------------------------------------------
-spaCy finally supports another language, in addition to English. We're lucky
-to have Wolfgang Seeker on the team, and the new German model is just the
-beginning. Now that there are multiple languages, you should consider loading
-spaCy via the ``load()`` function. This function also makes it easier to load extra
+spaCy finally supports another language, in addition to English. We're lucky
+to have Wolfgang Seeker on the team, and the new German model is just the
+beginning. Now that there are multiple languages, you should consider loading
+spaCy via the ``load()`` function. This function also makes it easier to load extra
word vector data for English:
.. code:: python
@@ -484,25 +511,25 @@ word vector data for English:
import spacy
en_nlp = spacy.load('en', vectors='en_glove_cc_300_1m_vectors')
de_nlp = spacy.load('de')
-
-To support use of the load function, there are also two new helper functions:
-``spacy.get_lang_class`` and ``spacy.set_lang_class``. Once the German model is
+
+To support use of the load function, there are also two new helper functions:
+``spacy.get_lang_class`` and ``spacy.set_lang_class``. Once the German model is
loaded, you can use it just like the English model:
.. code:: python
doc = nlp(u'''Wikipedia ist ein Projekt zum Aufbau einer Enzyklopädie aus freien Inhalten, zu dem du mit deinem Wissen beitragen kannst. Seit Mai 2001 sind 1.936.257 Artikel in deutscher Sprache entstanden.''')
-
+
for sent in doc.sents:
print(sent.root.text, sent.root.n_lefts, sent.root.n_rights)
-
+
# (u'ist', 1, 2)
# (u'sind', 1, 3)
-
-The German model provides tokenization, POS tagging, sentence boundary detection,
-syntactic dependency parsing, recognition of organisation, location and person
-entities, and word vector representations trained on a mix of open subtitles and
-Wikipedia data. It doesn't yet provide lemmatisation or morphological analysis,
+
+The German model provides tokenization, POS tagging, sentence boundary detection,
+syntactic dependency parsing, recognition of organisation, location and person
+entities, and word vector representations trained on a mix of open subtitles and
+Wikipedia data. It doesn't yet provide lemmatisation or morphological analysis,
and it doesn't yet recognise numeric entities such as numbers and dates.
**Bugfixes**
@@ -518,7 +545,7 @@ and it doesn't yet recognise numeric entities such as numbers and dates.
2016-03-08 `v0.100.6 `_: *Add support for GloVe vectors*
-----------------------------------------------------------------------------------------------------------------
-This release offers improved support for replacing the word vectors used by spaCy.
+This release offers improved support for replacing the word vectors used by spaCy.
To install Stanford's GloVe vectors, trained on the Common Crawl, just run:
.. code:: bash
@@ -527,8 +554,8 @@ To install Stanford's GloVe vectors, trained on the Common Crawl, just run:
To reduce memory usage and loading time, we've trimmed the vocabulary down to 1m entries.
-This release also integrates all the code necessary for German parsing. A German model
-will be released shortly. To assist in multi-lingual processing, we've added a ``load()``
+This release also integrates all the code necessary for German parsing. A German model
+will be released shortly. To assist in multi-lingual processing, we've added a ``load()``
function. To load the English model with the GloVe vectors:
.. code:: python
diff --git a/spacy/tests/bn/__init__.py b/spacy/tests/bn/__init__.py
new file mode 100644
index 000000000..57d631c3f
--- /dev/null
+++ b/spacy/tests/bn/__init__.py
@@ -0,0 +1 @@
+# coding: utf-8
diff --git a/spacy/tests/bn/test_tokenizer.py b/spacy/tests/bn/test_tokenizer.py
new file mode 100644
index 000000000..08b9a00df
--- /dev/null
+++ b/spacy/tests/bn/test_tokenizer.py
@@ -0,0 +1,40 @@
+# encoding: utf8
+from __future__ import unicode_literals
+
+import pytest
+
+TESTCASES = []
+
+PUNCTUATION_TESTS = [
+ (u'আমি বাংলায় গান গাই!', [u'আমি', u'বাংলায়', u'গান', u'গাই', u'!']),
+ (u'আমি বাংলায় কথা কই।', [u'আমি', u'বাংলায়', u'কথা', u'কই', u'।']),
+ (u'বসুন্ধরা জনসম্মুখে দোষ স্বীকার করলো না?', [u'বসুন্ধরা', u'জনসম্মুখে', u'দোষ', u'স্বীকার', u'করলো', u'না', u'?']),
+ (u'টাকা থাকলে কি না হয়!', [u'টাকা', u'থাকলে', u'কি', u'না', u'হয়', u'!']),
+]
+
+ABBREVIATIONS = [
+ (u'ডঃ খালেদ বললেন ঢাকায় ৩৫ ডিগ্রি সে.।', [u'ডঃ', u'খালেদ', u'বললেন', u'ঢাকায়', u'৩৫', u'ডিগ্রি', u'সে.', u'।'])
+]
+
+TESTCASES.extend(PUNCTUATION_TESTS)
+TESTCASES.extend(ABBREVIATIONS)
+
+
+@pytest.mark.parametrize('text,expected_tokens', TESTCASES)
+def test_tokenizer_handles_testcases(bn_tokenizer, text, expected_tokens):
+ tokens = bn_tokenizer(text)
+ token_list = [token.text for token in tokens if not token.is_space]
+ assert expected_tokens == token_list
+
+
+def test_tokenizer_handles_long_text(bn_tokenizer):
+ text = u"""নর্থ সাউথ বিশ্ববিদ্যালয়ে সারাবছর কোন না কোন বিষয়ে গবেষণা চলতেই থাকে। \
+অভিজ্ঞ ফ্যাকাল্টি মেম্বারগণ প্রায়ই শিক্ষার্থীদের নিয়ে বিভিন্ন গবেষণা প্রকল্পে কাজ করেন, \
+যার মধ্যে রয়েছে রোবট থেকে মেশিন লার্নিং সিস্টেম ও আর্টিফিশিয়াল ইন্টেলিজেন্স। \
+এসকল প্রকল্পে কাজ করার মাধ্যমে সংশ্লিষ্ট ক্ষেত্রে যথেষ্ঠ পরিমাণ স্পেশালাইজড হওয়া সম্ভব। \
+আর গবেষণার কাজ তোমার ক্যারিয়ারকে ঠেলে নিয়ে যাবে অনেকখানি! \
+কন্টেস্ট প্রোগ্রামার হও, গবেষক কিংবা ডেভেলপার - নর্থ সাউথ ইউনিভার্সিটিতে তোমার প্রতিভা বিকাশের সুযোগ রয়েছেই। \
+নর্থ সাউথের অসাধারণ কমিউনিটিতে তোমাকে সাদর আমন্ত্রণ।"""
+
+ tokens = bn_tokenizer(text)
+ assert len(tokens) == 84
diff --git a/spacy/tests/conftest.py b/spacy/tests/conftest.py
index b6dcb905a..7c6dcda1b 100644
--- a/spacy/tests/conftest.py
+++ b/spacy/tests/conftest.py
@@ -11,6 +11,7 @@ from ..nl import Dutch
from ..sv import Swedish
from ..hu import Hungarian
from ..fi import Finnish
+from ..bn import Bengali
from ..tokens import Doc
from ..strings import StringStore
from ..lemmatizer import Lemmatizer
@@ -24,7 +25,7 @@ import pytest
LANGUAGES = [English, German, Spanish, Italian, French, Portuguese, Dutch,
- Swedish, Hungarian, Finnish]
+ Swedish, Hungarian, Finnish, Bengali]
@pytest.fixture(params=LANGUAGES)
@@ -73,6 +74,11 @@ def sv_tokenizer():
return Swedish.Defaults.create_tokenizer()
+@pytest.fixture
+def bn_tokenizer():
+ return Bengali.Defaults.create_tokenizer()
+
+
@pytest.fixture
def stringstore():
return StringStore()
diff --git a/spacy/tests/tokenizer/test_tokenizer.py b/spacy/tests/tokenizer/test_tokenizer.py
index 822834f58..349458bce 100644
--- a/spacy/tests/tokenizer/test_tokenizer.py
+++ b/spacy/tests/tokenizer/test_tokenizer.py
@@ -31,7 +31,7 @@ def test_tokenizer_handles_punct(tokenizer):
def test_tokenizer_handles_digits(tokenizer):
- exceptions = ["hu"]
+ exceptions = ["hu", "bn"]
text = "Lorem ipsum: 1984."
tokens = tokenizer(text)
diff --git a/website/docs/usage/entity-recognition.jade b/website/docs/usage/entity-recognition.jade
index 9fc7dddd9..210b04337 100644
--- a/website/docs/usage/entity-recognition.jade
+++ b/website/docs/usage/entity-recognition.jade
@@ -138,7 +138,9 @@ p
+code.
import spacy
+ import random
from spacy.gold import GoldParse
+ from spacy.language import EntityRecognizer
train_data = [
('Who is Chaka Khan?', [(7, 17, 'PERSON')]),
diff --git a/website/docs/usage/index.jade b/website/docs/usage/index.jade
index 1c142592d..479635e4b 100644
--- a/website/docs/usage/index.jade
+++ b/website/docs/usage/index.jade
@@ -5,16 +5,46 @@ include ../../_includes/_mixins
p
| spaCy is compatible with #[strong 64-bit CPython 2.6+∕3.3+] and
| runs on #[strong Unix/Linux], #[strong macOS/OS X] and
- | #[strong Windows]. The latest spaCy releases are currently only
- | available as source packages over
- | #[+a("https://pypi.python.org/pypi/spacy") pip]. Installation requires a
- | working build environment. See notes on
+ | #[strong Windows]. The latest spaCy releases are
+ | available over #[+a("https://pypi.python.org/pypi/spacy") pip] (source
+ | packages only) and #[+a("https://anaconda.org/conda-forge/spacy") conda].
+ | Installation requires a working build environment. See notes on
| #[a(href="#source-ubuntu") Ubuntu], #[a(href="#source-osx") macOS/OS X]
| and #[a(href="#source-windows") Windows] for details.
++h(2, "pip") pip
+
+p Using pip, spaCy releases are currently only available as source packages.
+
+code(false, "bash").
pip install -U spacy
+p
+ | When using pip it is generally recommended to install packages in a
+ | #[code virtualenv] to avoid modifying system state:
+
++code(false, "bash").
+ virtualenv .env
+ source .env/bin/activate
+ pip install spacy
+
++h(2, "conda") conda
+
+p
+ | Thanks to our great community, we've finally re-added conda support. You
+ | can now install spaCy via #[code conda-forge]:
+
++code(false, "bash").
+ conda config --add channels conda-forge
+ conda install spacy
+
+p
+ | For the feedstock including the build recipe and configuration, check out
+ | #[+a("https://github.com/conda-forge/spacy-feedstock") this repository].
+ | Improvements and pull requests to the recipe and setup are always appreciated.
+
++h(2, "models") Download models
+
p
| After installation you need to download a language model. Models for
| English (#[code en]) and German (#[code de]) are available.
@@ -36,18 +66,49 @@ p
# Check whether the model was successfully installed
python -c "import spacy; spacy.load('en'); print('OK')"
-p The download command fetches about 1 GB of data which it installs within the #[code spacy] package directory.
+p
+ | The download command fetches about 1 GB of data, which it
+ | installs within the #[code spacy] package directory.
+
++h(3, "custom-location") Download model to custom location
+
+p
+ | To download the language model to a custom directory, pass the
+ | #[code --data-path] (or #[code -d]) argument to
+ | #[code spacy.en.download] or #[code spacy.de.download]:
+
++code(false, "bash").
+ python -m spacy.en.download all --data-path /some/dir
+
+p
+ | If you choose to download to a custom location, you will need to tell
+ | spaCy where to load the model from in order to use it. You can do this
+ | either by calling #[code spacy.util.set_data_path()] before calling
+ | #[code spacy.load()], or by passing a #[code path] argument to the
+ | #[code spacy.en.English] or #[code spacy.de.German] constructors.
+
++h(3, "models-manual") Download models manually
+
+p
+ | As of v1.6, the models and word vectors are also available as direct
+ | downloads from GitHub, attached to the #[+a(gh("spaCy") + "/releases") releases] as #[code .tar.gz] archives.
+
+p
+ | To install the models manually, first find the default data path. You can
+ | use #[code spacy.util.get_data_path()] to find the directory where spaCy
+ | will look for its models, or change the default data path with
+ | #[code spacy.util.set_data_path()]. Then simply unpack the archive and
+ | place the contained folder in that directory. You can now load the models
+ | via #[code spacy.load()].
+h(2, "source") Compile from source
p
| The other way to install spaCy is to clone its
| #[+a(gh("spaCy")) GitHub repository] and build it from source. That is
- | the common way if you want to make changes to the code base.
-
-p
- | You'll need to make sure that you have a development enviroment
- | consisting of a Python distribution including header files, a compiler,
+ | the common way if you want to make changes to the code base. You'll need to
+ | make sure that you have a development environment consisting of a Python
+ | distribution including header files, a compiler,
| #[+a("https://pip.pypa.io/en/latest/installing/") pip],
| #[+a("https://virtualenv.pypa.io/") virtualenv] and
| #[+a("https://git-scm.com") git] installed. The compiler part is the
@@ -55,6 +116,50 @@ p
| #[a(href="#source-ubuntu") Ubuntu], #[a(href="#source-osx") OS X] and
| #[a(href="#source-windows") Windows] for details.
++code(false, "bash").
+ # make sure you are using recent pip/virtualenv versions
+ python -m pip install -U pip virtualenv
+ git clone #{gh("spaCy")}
+ cd spaCy
+
+ virtualenv .env
+ source .env/bin/activate
+ pip install -r requirements.txt
+ pip install -e .
+
+p
+ | Compared to a regular install via pip, #[+a(gh("spaCy", "requirements.txt")) requirements.txt]
+ | additionally installs developer dependencies such as Cython.
+
+p
+ | Instead of the above verbose commands, you can also use the following
+ | #[+a("http://www.fabfile.org/") Fabric] commands:
+
++table(["Command", "Description"])
+ +row
+ +cell #[code fab env]
+ +cell Create #[code virtualenv] and delete previous one, if it exists.
+
+ +row
+ +cell #[code fab make]
+ +cell Compile the source.
+
+ +row
+ +cell #[code fab clean]
+ +cell Remove compiled objects, including the generated C++.
+
+ +row
+ +cell #[code fab test]
+ +cell Run basic tests, aborting after first failure.
+
+p
+ | All commands assume that your #[code virtualenv] is located in a
+ | directory #[code .env]. If you're using a different directory, you can
+ | change it via the environment variable #[code VENV_DIR], for example:
+
++code(false, "bash").
+ VENV_DIR=".custom-env" fab clean make
+
+h(3, "source-ubuntu") Ubuntu
p Install system-level dependencies via #[code apt-get]:
@@ -67,12 +172,8 @@ p Install system-level dependencies via #[code apt-get]:
p
| Install a recent version of #[+a("https://developer.apple.com/xcode/") XCode],
| including the so-called "Command Line Tools". macOS and OS X ship with
- | Python and git preinstalled.
-
-p
- | To compile spaCy with multi-threading support on macOS / OS X,
- | #[+a("https://github.com/explosion/spaCy/issues/267") see here].
-
+ | Python and git preinstalled. To compile spaCy with multi-threading support
+ | on macOS / OS X, #[+a("https://github.com/explosion/spaCy/issues/267") see here].
+h(3, "source-windows") Windows
@@ -98,8 +199,8 @@ p
+h(2, "tests") Run tests
p
- | spaCy comes with an extensive test suite. First, find out where spaCy is
- | installed:
+ | spaCy comes with an #[+a(gh("spacy", "spacy/tests")) extensive test suite].
+ | First, find out where spaCy is installed:
+code(false, "bash").
python -c "import os; import spacy; print(os.path.dirname(spacy.__file__))"
@@ -114,20 +215,3 @@ p
python -m pip install -U pytest
python -m pytest <spacy-directory> --vectors --model --slow
-
-+h(2, "custom-location") Download model to custom location
-
-p
- | You can specify where #[code spacy.en.download] and
- | #[code spacy.de.download] download the language model to using the
- | #[code --data-path] or #[code -d] argument:
-
-+code(false, "bash").
- python -m spacy.en.download all --data-path /some/dir
-
-p
- | If you choose to download to a custom location, you will need to tell
- | spaCy where to load the model from in order to use it. You can do this
- | either by calling #[code spacy.util.set_data_path()] before calling
- | #[code spacy.load()], or by passing a #[code path] argument to the
- | #[code spacy.en.English] or #[code spacy.de.German] constructors.