mirror of
https://github.com/explosion/spaCy.git
synced 2025-01-28 02:04:07 +03:00
Merge branch 'master' into rappdw/tokenizer_exceptions_url_fix
commit
b9307dfcd7
159
README.rst
|
@ -10,7 +10,7 @@ software, released under the MIT license.
|
||||||
|
|
||||||
💫 **Version 1.6 out now!** `Read the release notes here. <https://github.com/explosion/spaCy/releases/>`_
|
💫 **Version 1.6 out now!** `Read the release notes here. <https://github.com/explosion/spaCy/releases/>`_
|
||||||
|
|
||||||
.. image:: https://img.shields.io/travis/explosion/spaCy.svg?style=flat-square
|
.. image:: https://img.shields.io/travis/explosion/spaCy/master.svg?style=flat-square
|
||||||
:target: https://travis-ci.org/explosion/spaCy
|
:target: https://travis-ci.org/explosion/spaCy
|
||||||
:alt: Build Status
|
:alt: Build Status
|
||||||
|
|
||||||
|
@ -18,14 +18,14 @@ software, released under the MIT license.
|
||||||
:target: https://github.com/explosion/spaCy/releases
|
:target: https://github.com/explosion/spaCy/releases
|
||||||
:alt: Current Release Version
|
:alt: Current Release Version
|
||||||
|
|
||||||
.. image:: https://anaconda.org/conda-forge/spacy/badges/version.svg
|
|
||||||
:target: https://anaconda.org/conda-forge/spacy
|
|
||||||
:alt: conda Version
|
|
||||||
|
|
||||||
.. image:: https://img.shields.io/pypi/v/spacy.svg?style=flat-square
|
.. image:: https://img.shields.io/pypi/v/spacy.svg?style=flat-square
|
||||||
:target: https://pypi.python.org/pypi/spacy
|
:target: https://pypi.python.org/pypi/spacy
|
||||||
:alt: pypi Version
|
:alt: pypi Version
|
||||||
|
|
||||||
|
.. image:: https://anaconda.org/conda-forge/spacy/badges/version.svg
|
||||||
|
:target: https://anaconda.org/conda-forge/spacy
|
||||||
|
:alt: conda Version
|
||||||
|
|
||||||
.. image:: https://img.shields.io/badge/gitter-join%20chat%20%E2%86%92-09a3d5.svg?style=flat-square
|
.. image:: https://img.shields.io/badge/gitter-join%20chat%20%E2%86%92-09a3d5.svg?style=flat-square
|
||||||
:target: https://gitter.im/explosion/spaCy
|
:target: https://gitter.im/explosion/spaCy
|
||||||
:alt: spaCy on Gitter
|
:alt: spaCy on Gitter
|
||||||
|
@ -104,30 +104,35 @@ Supports
|
||||||
Install spaCy
|
Install spaCy
|
||||||
=============
|
=============
|
||||||
|
|
||||||
spaCy is compatible with 64-bit CPython 2.6+/3.3+ and runs on Unix/Linux, OS X
|
spaCy is compatible with **64-bit CPython 2.6+/3.3+** and runs on **Unix/Linux**,
|
||||||
and Windows. Source packages are available via
|
**macOS/OS X** and **Windows**. The latest spaCy releases are available over
|
||||||
`pip <https://pypi.python.org/pypi/spacy>`_. Please make sure that
|
`pip <https://pypi.python.org/pypi/spacy>`_ (source packages only) and
|
||||||
you have a working build enviroment set up. See notes on Ubuntu, macOS/OS X and Windows
|
`conda <https://anaconda.org/conda-forge/spacy>`_. Installation requires a working
|
||||||
for details.
|
build environment. See notes on Ubuntu, macOS/OS X and Windows for details.
|
||||||
|
|
||||||
pip
|
pip
|
||||||
---
|
---
|
||||||
|
|
||||||
When using pip it is generally recommended to install packages in a virtualenv to
|
Using pip, spaCy releases are currently only available as source packages.
|
||||||
avoid modifying system state:
|
|
||||||
|
|
||||||
.. code:: bash
|
.. code:: bash
|
||||||
|
|
||||||
pip install spacy
|
pip install -U spacy
|
||||||
|
|
||||||
Python packaging is awkward at the best of times, and it's particularly tricky with
|
When using pip it is generally recommended to install packages in a ``virtualenv``
|
||||||
C extensions, built via Cython, requiring large data files. So, please report issues
|
to avoid modifying system state:
|
||||||
as you encounter them.
|
|
||||||
|
.. code:: bash
|
||||||
|
|
||||||
|
virtualenv .env
|
||||||
|
source .env/bin/activate
|
||||||
|
pip install spacy
|
||||||
|
|
||||||
conda
|
conda
|
||||||
-----
|
-----
|
||||||
|
|
||||||
If you're using conda, you can install spaCy via ``conda-forge``:
|
Thanks to our great community, we've finally re-added conda support. You can now
|
||||||
|
install spaCy via ``conda-forge``:
|
||||||
|
|
||||||
.. code:: bash
|
.. code:: bash
|
||||||
|
|
||||||
|
@ -136,14 +141,13 @@ If you're using conda, you can install spaCy via ``conda-forge``:
|
||||||
|
|
||||||
For the feedstock including the build recipe and configuration,
|
For the feedstock including the build recipe and configuration,
|
||||||
check out `this repository <https://github.com/conda-forge/spacy-feedstock>`_.
|
check out `this repository <https://github.com/conda-forge/spacy-feedstock>`_.
|
||||||
Thanks to our great community, we've finally re-added conda support — improvements
|
Improvements and pull requests to the recipe and setup are always appreciated.
|
||||||
and pull requests to the recipe and setup are always appreciated.
|
|
||||||
|
|
||||||
Install model
|
Download models
|
||||||
=============
|
===============
|
||||||
|
|
||||||
After installation you need to download a language model. Currently only models for
|
After installation you need to download a language model. Models for English
|
||||||
English and German, named ``en`` and ``de``, are available.
|
(``en``) and German (``de``) are available.
|
||||||
|
|
||||||
.. code:: bash
|
.. code:: bash
|
||||||
|
|
||||||
|
@ -153,51 +157,90 @@ English and German, named ``en`` and ``de``, are available.
|
||||||
The download command fetches about 1 GB of data which it installs
|
The download command fetches about 1 GB of data which it installs
|
||||||
within the ``spacy`` package directory.
|
within the ``spacy`` package directory.
|
||||||
|
|
||||||
Upgrading spaCy
|
Sometimes new releases require a new language model. Then you will have to
|
||||||
===============
|
upgrade to a new model, too. You can also force re-downloading and installing a
|
||||||
|
new language model:
|
||||||
To upgrade spaCy to the latest release:
|
|
||||||
|
|
||||||
pip
|
|
||||||
---
|
|
||||||
|
|
||||||
.. code:: bash
|
|
||||||
|
|
||||||
pip install -U spacy
|
|
||||||
|
|
||||||
Sometimes new releases require a new language model. Then you will have to upgrade to
|
|
||||||
a new model, too. You can also force re-downloading and installing a new language model:
|
|
||||||
|
|
||||||
.. code:: bash
|
.. code:: bash
|
||||||
|
|
||||||
python -m spacy.en.download --force
|
python -m spacy.en.download --force
|
||||||
|
|
||||||
|
Download model to custom location
|
||||||
|
---------------------------------
|
||||||
|
|
||||||
|
You can specify where ``spacy.en.download`` and ``spacy.de.download`` download
|
||||||
|
the language model to using the ``--data-path`` or ``-d`` argument:
|
||||||
|
|
||||||
|
.. code:: bash
|
||||||
|
|
||||||
|
python -m spacy.en.download all --data-path /some/dir
|
||||||
|
|
||||||
|
If you choose to download to a custom location, you will need to tell spaCy where to load the model
|
||||||
|
from in order to use it. You can do this either by calling ``spacy.util.set_data_path()`` before
|
||||||
|
calling ``spacy.load()``, or by passing a ``path`` argument to the ``spacy.en.English`` or
|
||||||
|
``spacy.de.German`` constructors.
|
||||||
|
|
||||||
|
Download models manually
|
||||||
|
------------------------
|
||||||
|
|
||||||
|
As of v1.6, the models and word vectors are also available as direct downloads
|
||||||
|
from GitHub, attached to the `releases <https://github.com/explosion/spacy/releases>`_
|
||||||
|
as ``.tar.gz`` archives.
|
||||||
|
|
||||||
|
To install the models manually, first find the default data path. You can use
|
||||||
|
``spacy.util.get_data_path()`` to find the directory where spaCy will look for
|
||||||
|
its models, or change the default data path with ``spacy.util.set_data_path()``.
|
||||||
|
Then simply unpack the archive and place the contained folder in that directory.
|
||||||
|
You can now load the models via ``spacy.load()``.
|
||||||
|
|
||||||
Compile from source
|
Compile from source
|
||||||
===================
|
===================
|
||||||
|
|
||||||
The other way to install spaCy is to clone its GitHub repository and build it from
|
The other way to install spaCy is to clone its
|
||||||
|
`GitHub repository <https://github.com/explosion/spaCy>`_ and build it from
|
||||||
source. That is the common way if you want to make changes to the code base.
|
source. That is the common way if you want to make changes to the code base.
|
||||||
|
|
||||||
You'll need to make sure that you have a development enviroment consisting of a
|
You'll need to make sure that you have a development enviroment consisting of a
|
||||||
Python distribution including header files, a compiler, pip, virtualenv and git
|
Python distribution including header files, a compiler,
|
||||||
installed. The compiler part is the trickiest. How to do that depends on your
|
`pip <https://pip.pypa.io/en/latest/installing/>`__, `virtualenv <https://virtualenv.pypa.io/>`_
|
||||||
system. See notes on Ubuntu, OS X and Windows for details.
|
and `git <https://git-scm.com>`_ installed. The compiler part is the trickiest.
|
||||||
|
How to do that depends on your system. See notes on Ubuntu, OS X and Windows for
|
||||||
|
details.
|
||||||
|
|
||||||
.. code:: bash
|
.. code:: bash
|
||||||
|
|
||||||
# make sure you are using recent pip/virtualenv versions
|
# make sure you are using recent pip/virtualenv versions
|
||||||
python -m pip install -U pip virtualenv
|
python -m pip install -U pip virtualenv
|
||||||
|
git clone https://github.com/explosion/spaCy
|
||||||
# find git install instructions at https://git-scm.com/downloads
|
|
||||||
git clone https://github.com/explosion/spaCy.git
|
|
||||||
|
|
||||||
cd spaCy
|
cd spaCy
|
||||||
virtualenv .env && source .env/bin/activate
|
|
||||||
|
virtualenv .env
|
||||||
|
source .env/bin/activate
|
||||||
pip install -r requirements.txt
|
pip install -r requirements.txt
|
||||||
pip install -e .
|
pip install -e .
|
||||||
|
|
||||||
Compared to regular install via pip `requirements.txt <requirements.txt>`_
|
Compared to regular install via pip `requirements.txt <requirements.txt>`_
|
||||||
additionally installs developer dependencies such as cython.
|
additionally installs developer dependencies such as Cython.
|
||||||
|
|
||||||
|
Instead of the above verbose commands, you can also use the following
|
||||||
|
`Fabric <http://www.fabfile.org/>`_ commands:
|
||||||
|
|
||||||
|
+---------------+--------------------------------------------------------------+
|
||||||
|
| ``fab env`` | Create ``virtualenv`` and delete previous one, if it exists. |
|
||||||
|
+---------------+--------------------------------------------------------------+
|
||||||
|
| ``fab make`` | Compile the source. |
|
||||||
|
+---------------+--------------------------------------------------------------+
|
||||||
|
| ``fab clean`` | Remove compiled objects, including the generated C++. |
|
||||||
|
+---------------+--------------------------------------------------------------+
|
||||||
|
| ``fab test`` | Run basic tests, aborting after first failure. |
|
||||||
|
+---------------+--------------------------------------------------------------+
|
||||||
|
|
||||||
|
All commands assume that your ``virtualenv`` is located in a directory ``.env``.
|
||||||
|
If you're using a different directory, you can change it via the environment
|
||||||
|
variable ``VENV_DIR``, for example:
|
||||||
|
|
||||||
|
.. code:: bash
|
||||||
|
|
||||||
|
VENV_DIR=".custom-env" fab clean make
|
||||||
|
|
||||||
Ubuntu
|
Ubuntu
|
||||||
------
|
------
|
||||||
|
@ -226,8 +269,8 @@ VS 2010 (Python 3.4) and VS 2015 (Python 3.5).
|
||||||
Run tests
|
Run tests
|
||||||
=========
|
=========
|
||||||
|
|
||||||
spaCy comes with an extensive test suite. First, find out where spaCy is
|
spaCy comes with an `extensive test suite <spacy/tests>`_. First, find out where
|
||||||
installed:
|
spaCy is installed:
|
||||||
|
|
||||||
.. code:: bash
|
.. code:: bash
|
||||||
|
|
||||||
|
@ -243,22 +286,6 @@ and ``--model`` are optional and enable additional tests:
|
||||||
|
|
||||||
python -m pytest <spacy-directory> --vectors --model --slow
|
python -m pytest <spacy-directory> --vectors --model --slow
|
||||||
|
|
||||||
Download model to custom location
|
|
||||||
=================================
|
|
||||||
|
|
||||||
You can specify where ``spacy.en.download`` and ``spacy.de.download`` download the language model
|
|
||||||
to using the ``--data-path`` or ``-d`` argument:
|
|
||||||
|
|
||||||
.. code:: bash
|
|
||||||
|
|
||||||
python -m spacy.en.download all --data-path /some/dir
|
|
||||||
|
|
||||||
|
|
||||||
If you choose to download to a custom location, you will need to tell spaCy where to load the model
|
|
||||||
from in order to use it. You can do this either by calling ``spacy.util.set_data_path()`` before
|
|
||||||
calling ``spacy.load()``, or by passing a ``path`` argument to the ``spacy.en.English`` or
|
|
||||||
``spacy.de.German`` constructors.
|
|
||||||
|
|
||||||
Changelog
|
Changelog
|
||||||
=========
|
=========
|
||||||
|
|
||||||
|
|
1
spacy/tests/bn/__init__.py
Normal file
|
@ -0,0 +1 @@
|
||||||
|
# coding: utf-8
|
40
spacy/tests/bn/test_tokenizer.py
Normal file
|
@ -0,0 +1,40 @@
|
||||||
|
# encoding: utf8
|
||||||
|
from __future__ import unicode_literals
|
||||||
|
|
||||||
|
import pytest
|
||||||
|
|
||||||
|
TESTCASES = []
|
||||||
|
|
||||||
|
PUNCTUATION_TESTS = [
|
||||||
|
(u'আমি বাংলায় গান গাই!', [u'আমি', u'বাংলায়', u'গান', u'গাই', u'!']),
|
||||||
|
(u'আমি বাংলায় কথা কই।', [u'আমি', u'বাংলায়', u'কথা', u'কই', u'।']),
|
||||||
|
(u'বসুন্ধরা জনসম্মুখে দোষ স্বীকার করলো না?', [u'বসুন্ধরা', u'জনসম্মুখে', u'দোষ', u'স্বীকার', u'করলো', u'না', u'?']),
|
||||||
|
(u'টাকা থাকলে কি না হয়!', [u'টাকা', u'থাকলে', u'কি', u'না', u'হয়', u'!']),
|
||||||
|
]
|
||||||
|
|
||||||
|
ABBREVIATIONS = [
|
||||||
|
(u'ডঃ খালেদ বললেন ঢাকায় ৩৫ ডিগ্রি সে.।', [u'ডঃ', u'খালেদ', u'বললেন', u'ঢাকায়', u'৩৫', u'ডিগ্রি', u'সে.', u'।'])
|
||||||
|
]
|
||||||
|
|
||||||
|
TESTCASES.extend(PUNCTUATION_TESTS)
|
||||||
|
TESTCASES.extend(ABBREVIATIONS)
|
||||||
|
|
||||||
|
|
||||||
|
@pytest.mark.parametrize('text,expected_tokens', TESTCASES)
|
||||||
|
def test_tokenizer_handles_testcases(bn_tokenizer, text, expected_tokens):
|
||||||
|
tokens = bn_tokenizer(text)
|
||||||
|
token_list = [token.text for token in tokens if not token.is_space]
|
||||||
|
assert expected_tokens == token_list
|
||||||
|
|
||||||
|
|
||||||
|
def test_tokenizer_handles_long_text(bn_tokenizer):
|
||||||
|
text = u"""নর্থ সাউথ বিশ্ববিদ্যালয়ে সারাবছর কোন না কোন বিষয়ে গবেষণা চলতেই থাকে। \
|
||||||
|
অভিজ্ঞ ফ্যাকাল্টি মেম্বারগণ প্রায়ই শিক্ষার্থীদের নিয়ে বিভিন্ন গবেষণা প্রকল্পে কাজ করেন, \
|
||||||
|
যার মধ্যে রয়েছে রোবট থেকে মেশিন লার্নিং সিস্টেম ও আর্টিফিশিয়াল ইন্টেলিজেন্স। \
|
||||||
|
এসকল প্রকল্পে কাজ করার মাধ্যমে সংশ্লিষ্ট ক্ষেত্রে যথেষ্ঠ পরিমাণ স্পেশালাইজড হওয়া সম্ভব। \
|
||||||
|
আর গবেষণার কাজ তোমার ক্যারিয়ারকে ঠেলে নিয়ে যাবে অনেকখানি! \
|
||||||
|
কন্টেস্ট প্রোগ্রামার হও, গবেষক কিংবা ডেভেলপার - নর্থ সাউথ ইউনিভার্সিটিতে তোমার প্রতিভা বিকাশের সুযোগ রয়েছেই। \
|
||||||
|
নর্থ সাউথের অসাধারণ কমিউনিটিতে তোমাকে সাদর আমন্ত্রণ।"""
|
||||||
|
|
||||||
|
tokens = bn_tokenizer(text)
|
||||||
|
assert len(tokens) == 84
|
|
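Note on `spacy/tests/bn/test_tokenizer.py`: the parametrized test feeds each `(text, expected_tokens)` pair to the tokenizer and compares the resulting token texts against the expectation. The following is a minimal, self-contained sketch of that pattern. `naive_tokenize` and `check_cases` are hypothetical stand-ins invented for illustration. They are not spaCy's Bengali tokenizer rules, and the sketch does not use pytest or the `bn_tokenizer` fixture.

```python
# -*- coding: utf-8 -*-
from __future__ import unicode_literals

# Hypothetical stand-in: NOT spaCy's Bengali tokenizer rules.
# It only peels '!', '?' and the Bengali danda (U+0964) off the end of a word.
TRAILING_PUNCT = ('!', '?', '\u0964')


def naive_tokenize(text):
    """Split on whitespace, then separate trailing punctuation marks."""
    tokens = []
    for chunk in text.split():
        trailing = []
        # Keep at least one character so a lone punctuation mark stays a token.
        while len(chunk) > 1 and chunk.endswith(TRAILING_PUNCT):
            trailing.insert(0, chunk[-1])
            chunk = chunk[:-1]
        tokens.append(chunk)
        tokens.extend(trailing)
    return tokens


def check_cases(tokenize, cases):
    """Return the cases whose tokenization differs from the expectation."""
    failures = []
    for text, expected in cases:
        actual = tokenize(text)
        if actual != expected:
            failures.append((text, expected, actual))
    return failures


# Illustrative cases in the same (text, expected_tokens) shape as TESTCASES.
CASES = [
    ('Lorem ipsum!', ['Lorem', 'ipsum', '!']),
    ('Lorem ipsum?', ['Lorem', 'ipsum', '?']),
    ('গান গাই।', ['গান', 'গাই', '।']),
]

# An empty list means every case matched its expectation.
print(check_cases(naive_tokenize, CASES))
```

In the actual test, `pytest.mark.parametrize` plays the role of the loop in `check_cases`, so each case is reported as a separate test. The abbreviation case (`সে.` followed by `।`) is the kind of input that needs real tokenizer exceptions rather than a simple rule like the one sketched here.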
@ -11,6 +11,7 @@ from ..nl import Dutch
|
||||||
from ..sv import Swedish
|
from ..sv import Swedish
|
||||||
from ..hu import Hungarian
|
from ..hu import Hungarian
|
||||||
from ..fi import Finnish
|
from ..fi import Finnish
|
||||||
|
from ..bn import Bengali
|
||||||
from ..tokens import Doc
|
from ..tokens import Doc
|
||||||
from ..strings import StringStore
|
from ..strings import StringStore
|
||||||
from ..lemmatizer import Lemmatizer
|
from ..lemmatizer import Lemmatizer
|
||||||
|
@ -24,7 +25,7 @@ import pytest
|
||||||
|
|
||||||
|
|
||||||
LANGUAGES = [English, German, Spanish, Italian, French, Portuguese, Dutch,
|
LANGUAGES = [English, German, Spanish, Italian, French, Portuguese, Dutch,
|
||||||
Swedish, Hungarian, Finnish]
|
Swedish, Hungarian, Finnish, Bengali]
|
||||||
|
|
||||||
|
|
||||||
@pytest.fixture(params=LANGUAGES)
|
@pytest.fixture(params=LANGUAGES)
|
||||||
|
@ -73,6 +74,11 @@ def sv_tokenizer():
|
||||||
return Swedish.Defaults.create_tokenizer()
|
return Swedish.Defaults.create_tokenizer()
|
||||||
|
|
||||||
|
|
||||||
|
@pytest.fixture
|
||||||
|
def bn_tokenizer():
|
||||||
|
return Bengali.Defaults.create_tokenizer()
|
||||||
|
|
||||||
|
|
||||||
@pytest.fixture
|
@pytest.fixture
|
||||||
def stringstore():
|
def stringstore():
|
||||||
return StringStore()
|
return StringStore()
|
||||||
|
|
|
@ -31,7 +31,7 @@ def test_tokenizer_handles_punct(tokenizer):
|
||||||
|
|
||||||
|
|
||||||
def test_tokenizer_handles_digits(tokenizer):
|
def test_tokenizer_handles_digits(tokenizer):
|
||||||
exceptions = ["hu"]
|
exceptions = ["hu", "bn"]
|
||||||
text = "Lorem ipsum: 1984."
|
text = "Lorem ipsum: 1984."
|
||||||
tokens = tokenizer(text)
|
tokens = tokenizer(text)
|
||||||
|
|
||||||
|
|
|
@ -138,7 +138,9 @@ p
|
||||||
|
|
||||||
+code.
|
+code.
|
||||||
import spacy
|
import spacy
|
||||||
|
import random
|
||||||
from spacy.gold import GoldParse
|
from spacy.gold import GoldParse
|
||||||
|
from spacy.language import EntityRecognizer
|
||||||
|
|
||||||
train_data = [
|
train_data = [
|
||||||
('Who is Chaka Khan?', [(7, 17, 'PERSON')]),
|
('Who is Chaka Khan?', [(7, 17, 'PERSON')]),
|
||||||
|
|
|
@ -5,16 +5,46 @@ include ../../_includes/_mixins
|
||||||
p
|
p
|
||||||
| spaCy is compatible with #[strong 64-bit CPython 2.6+∕3.3+] and
|
| spaCy is compatible with #[strong 64-bit CPython 2.6+∕3.3+] and
|
||||||
| runs on #[strong Unix/Linux], #[strong macOS/OS X] and
|
| runs on #[strong Unix/Linux], #[strong macOS/OS X] and
|
||||||
| #[strong Windows]. The latest spaCy releases are currently only
|
| #[strong Windows]. The latest spaCy releases are
|
||||||
| available as source packages over
|
| available over #[+a("https://pypi.python.org/pypi/spacy") pip] (source
|
||||||
| #[+a("https://pypi.python.org/pypi/spacy") pip]. Installation requires a
|
| packages only) and #[+a("https://anaconda.org/conda-forge/spacy") conda].
|
||||||
| working build environment. See notes on
|
| Installation requires a working build environment. See notes on
|
||||||
| #[a(href="#source-ubuntu") Ubuntu], #[a(href="#source-osx") macOS/OS X]
|
| #[a(href="#source-ubuntu") Ubuntu], #[a(href="#source-osx") macOS/OS X]
|
||||||
| and #[a(href="#source-windows") Windows] for details.
|
| and #[a(href="#source-windows") Windows] for details.
|
||||||
|
|
||||||
|
+h(2, "pip") pip
|
||||||
|
|
||||||
|
p Using pip, spaCy releases are currently only available as source packages.
|
||||||
|
|
||||||
+code(false, "bash").
|
+code(false, "bash").
|
||||||
pip install -U spacy
|
pip install -U spacy
|
||||||
|
|
||||||
|
p
|
||||||
|
| When using pip it is generally recommended to install packages in a
|
||||||
|
| #[code virtualenv] to avoid modifying system state:
|
||||||
|
|
||||||
|
+code(false, "bash").
|
||||||
|
virtualenv .env
|
||||||
|
source .env/bin/activate
|
||||||
|
pip install spacy
|
||||||
|
|
||||||
|
+h(2, "conda") conda
|
||||||
|
|
||||||
|
p
|
||||||
|
| Thanks to our great community, we've finally re-added conda support. You
|
||||||
|
| can now install spaCy via #[code conda-forge]:
|
||||||
|
|
||||||
|
+code(false, "bash").
|
||||||
|
conda config --add channels conda-forge
|
||||||
|
conda install spacy
|
||||||
|
|
||||||
|
p
|
||||||
|
| For the feedstock including the build recipe and configuration, check out
|
||||||
|
| #[+a("https://github.com/conda-forge/spacy-feedstock") this repository].
|
||||||
|
| Improvements and pull requests to the recipe and setup are always appreciated.
|
||||||
|
|
||||||
|
+h(2, "models") Download models
|
||||||
|
|
||||||
p
|
p
|
||||||
| After installation you need to download a language model. Models for
|
| After installation you need to download a language model. Models for
|
||||||
| English (#[code en]) and German (#[code de]) are available.
|
| English (#[code en]) and German (#[code de]) are available.
|
||||||
|
@ -36,18 +66,49 @@ p
|
||||||
# Check whether the model was successfully installed
|
# Check whether the model was successfully installed
|
||||||
python -c "import spacy; spacy.load('en'); print('OK')"
|
python -c "import spacy; spacy.load('en'); print('OK')"
|
||||||
|
|
||||||
p The download command fetches about 1 GB of data which it installs within the #[code spacy] package directory.
|
p
|
||||||
|
| The download command fetches about 1 GB of data which it
|
||||||
|
| installs within the #[code spacy] package directory.
|
||||||
|
|
||||||
|
+h(3, "custom-location") Download model to custom location
|
||||||
|
|
||||||
|
p
|
||||||
|
| You can specify where #[code spacy.en.download] and
|
||||||
|
| #[code spacy.de.download] download the language model to using the
|
||||||
|
| #[code --data-path] or #[code -d] argument:
|
||||||
|
|
||||||
|
+code(false, "bash").
|
||||||
|
python -m spacy.en.download all --data-path /some/dir
|
||||||
|
|
||||||
|
p
|
||||||
|
| If you choose to download to a custom location, you will need to tell
|
||||||
|
| spaCy where to load the model from in order to use it. You can do this
|
||||||
|
| either by calling #[code spacy.util.set_data_path()] before calling
|
||||||
|
| #[code spacy.load()], or by passing a #[code path] argument to the
|
||||||
|
| #[code spacy.en.English] or #[code spacy.de.German] constructors.
|
||||||
|
|
||||||
|
+h(3, "models-manual") Download models manually
|
||||||
|
|
||||||
|
p
|
||||||
|
| As of v1.6, the models and word vectors are also available as direct
|
||||||
|
| downloads from GitHub, attached to the #[+a(gh("spaCy") + "/releases") releases] as #[code .tar.gz] archives.
|
||||||
|
|
||||||
|
p
|
||||||
|
| To install the models manually, first find the default data path. You can
|
||||||
|
| use #[code spacy.util.get_data_path()] to find the directory where spaCy
|
||||||
|
| will look for its models, or change the default data path with
|
||||||
|
| #[code spacy.util.set_data_path()]. Then simply unpack the archive and
|
||||||
|
| place the contained folder in that directory. You can now load the models
|
||||||
|
| via #[code spacy.load()].
|
||||||
|
|
||||||
+h(2, "source") Compile from source
|
+h(2, "source") Compile from source
|
||||||
|
|
||||||
p
|
p
|
||||||
| The other way to install spaCy is to clone its
|
| The other way to install spaCy is to clone its
|
||||||
| #[+a(gh("spaCy")) GitHub repository] and build it from source. That is
|
| #[+a(gh("spaCy")) GitHub repository] and build it from source. That is
|
||||||
| the common way if you want to make changes to the code base.
|
| the common way if you want to make changes to the code base. You'll need to
|
||||||
|
| make sure that you have a development enviroment consisting of a Python
|
||||||
p
|
| distribution including header files, a compiler,
|
||||||
| You'll need to make sure that you have a development enviroment
|
|
||||||
| consisting of a Python distribution including header files, a compiler,
|
|
||||||
| #[+a("https://pip.pypa.io/en/latest/installing/") pip],
|
| #[+a("https://pip.pypa.io/en/latest/installing/") pip],
|
||||||
| #[+a("https://virtualenv.pypa.io/") virtualenv] and
|
| #[+a("https://virtualenv.pypa.io/") virtualenv] and
|
||||||
| #[+a("https://git-scm.com") git] installed. The compiler part is the
|
| #[+a("https://git-scm.com") git] installed. The compiler part is the
|
||||||
|
@ -55,6 +116,50 @@ p
|
||||||
| #[a(href="#source-ubuntu") Ubuntu], #[a(href="#source-osx") OS X] and
|
| #[a(href="#source-ubuntu") Ubuntu], #[a(href="#source-osx") OS X] and
|
||||||
| #[a(href="#source-windows") Windows] for details.
|
| #[a(href="#source-windows") Windows] for details.
|
||||||
|
|
||||||
|
+code(false, "bash").
|
||||||
|
# make sure you are using recent pip/virtualenv versions
|
||||||
|
python -m pip install -U pip virtualenv
|
||||||
|
git clone #{gh("spaCy")}
|
||||||
|
cd spaCy
|
||||||
|
|
||||||
|
virtualenv .env
|
||||||
|
source .env/bin/activate
|
||||||
|
pip install -r requirements.txt
|
||||||
|
pip install -e .
|
||||||
|
|
||||||
|
p
|
||||||
|
| Compared to regular install via pip, #[+a(gh("spaCy", "requirements.txt")) requirements.txt]
|
||||||
|
| additionally installs developer dependencies such as Cython.
|
||||||
|
|
||||||
|
p
|
||||||
|
| Instead of the above verbose commands, you can also use the following
|
||||||
|
| #[+a("http://www.fabfile.org/") Fabric] commands:
|
||||||
|
|
||||||
|
+table(["Command", "Description"])
|
||||||
|
+row
|
||||||
|
+cell #[code fab env]
|
||||||
|
+cell Create #[code virtualenv] and delete previous one, if it exists.
|
||||||
|
|
||||||
|
+row
|
||||||
|
+cell #[code fab make]
|
||||||
|
+cell Compile the source.
|
||||||
|
|
||||||
|
+row
|
||||||
|
+cell #[code fab clean]
|
||||||
|
+cell Remove compiled objects, including the generated C++.
|
||||||
|
|
||||||
|
+row
|
||||||
|
+cell #[code fab test]
|
||||||
|
+cell Run basic tests, aborting after first failure.
|
||||||
|
|
||||||
|
p
|
||||||
|
| All commands assume that your #[code virtualenv] is located in a
|
||||||
|
| directory #[code .env]. If you're using a different directory, you can
|
||||||
|
| change it via the environment variable #[code VENV_DIR], for example:
|
||||||
|
|
||||||
|
+code(false, "bash").
|
||||||
|
VENV_DIR=".custom-env" fab clean make
|
||||||
|
|
||||||
+h(3, "source-ubuntu") Ubuntu
|
+h(3, "source-ubuntu") Ubuntu
|
||||||
|
|
||||||
p Install system-level dependencies via #[code apt-get]:
|
p Install system-level dependencies via #[code apt-get]:
|
||||||
|
@ -67,12 +172,8 @@ p Install system-level dependencies via #[code apt-get]:
|
||||||
p
|
p
|
||||||
| Install a recent version of #[+a("https://developer.apple.com/xcode/") XCode],
|
| Install a recent version of #[+a("https://developer.apple.com/xcode/") XCode],
|
||||||
| including the so-called "Command Line Tools". macOS and OS X ship with
|
| including the so-called "Command Line Tools". macOS and OS X ship with
|
||||||
| Python and git preinstalled.
|
| Python and git preinstalled. To compile spaCy with multi-threading support
|
||||||
|
| on macOS / OS X, #[+a("https://github.com/explosion/spaCy/issues/267") see here].
|
||||||
p
|
|
||||||
| To compile spaCy with multi-threading support on macOS / OS X,
|
|
||||||
| #[+a("https://github.com/explosion/spaCy/issues/267") see here].
|
|
||||||
|
|
||||||
|
|
||||||
+h(3, "source-windows") Windows
|
+h(3, "source-windows") Windows
|
||||||
|
|
||||||
|
@ -98,8 +199,8 @@ p
|
||||||
+h(2, "tests") Run tests
|
+h(2, "tests") Run tests
|
||||||
|
|
||||||
p
|
p
|
||||||
| spaCy comes with an extensive test suite. First, find out where spaCy is
|
| spaCy comes with an #[+a(gh("spacy", "spacy/tests")) extensive test suite].
|
||||||
| installed:
|
| First, find out where spaCy is installed:
|
||||||
|
|
||||||
+code(false, "bash").
|
+code(false, "bash").
|
||||||
python -c "import os; import spacy; print(os.path.dirname(spacy.__file__))"
|
python -c "import os; import spacy; print(os.path.dirname(spacy.__file__))"
|
||||||
|
@ -114,20 +215,3 @@ p
|
||||||
python -m pip install -U pytest
|
python -m pip install -U pytest
|
||||||
|
|
||||||
python -m pytest <spacy-directory> --vectors --model --slow
|
python -m pytest <spacy-directory> --vectors --model --slow
|
||||||
|
|
||||||
+h(2, "custom-location") Download model to custom location
|
|
||||||
|
|
||||||
p
|
|
||||||
| You can specify where #[code spacy.en.download] and
|
|
||||||
| #[code spacy.de.download] download the language model to using the
|
|
||||||
| #[code --data-path] or #[code -d] argument:
|
|
||||||
|
|
||||||
+code(false, "bash").
|
|
||||||
python -m spacy.en.download all --data-path /some/dir
|
|
||||||
|
|
||||||
p
|
|
||||||
| If you choose to download to a custom location, you will need to tell
|
|
||||||
| spaCy where to load the model from in order to use it. You can do this
|
|
||||||
| either by calling #[code spacy.util.set_data_path()] before calling
|
|
||||||
| #[code spacy.load()], or by passing a #[code path] argument to the
|
|
||||||
| #[code spacy.en.English] or #[code spacy.de.German] constructors.
|
|
||||||
|
|