mirror of https://github.com/explosion/spaCy.git (synced 2025-10-31 16:07:41 +03:00)
	Merge branch 'master' into rappdw/tokenizer_exceptions_url_fix
commit b9307dfcd7

README.rst (159 lines changed)
@@ -10,7 +10,7 @@ software, released under the MIT license.
 
 💫 **Version 1.6 out now!** `Read the release notes here. <https://github.com/explosion/spaCy/releases/>`_
 
-.. image:: https://img.shields.io/travis/explosion/spaCy.svg?style=flat-square
+.. image:: https://img.shields.io/travis/explosion/spaCy/master.svg?style=flat-square
     :target: https://travis-ci.org/explosion/spaCy
     :alt: Build Status
 
@@ -18,14 +18,14 @@ software, released under the MIT license.
     :target: https://github.com/explosion/spaCy/releases
     :alt: Current Release Version
 
-.. image:: https://anaconda.org/conda-forge/spacy/badges/version.svg
-    :target: https://anaconda.org/conda-forge/spacy
-    :alt: conda Version
-
 .. image:: https://img.shields.io/pypi/v/spacy.svg?style=flat-square
     :target: https://pypi.python.org/pypi/spacy
     :alt: pypi Version
 
+.. image:: https://anaconda.org/conda-forge/spacy/badges/version.svg
+    :target: https://anaconda.org/conda-forge/spacy
+    :alt: conda Version
+
 .. image:: https://img.shields.io/badge/gitter-join%20chat%20%E2%86%92-09a3d5.svg?style=flat-square
     :target: https://gitter.im/explosion/spaCy
     :alt: spaCy on Gitter
@@ -104,30 +104,35 @@ Supports
 Install spaCy
 =============
 
-spaCy is compatible with 64-bit CPython 2.6+/3.3+ and runs on Unix/Linux, OS X
-and Windows. Source packages are available via
-`pip <https://pypi.python.org/pypi/spacy>`_. Please make sure that
-you have a working build enviroment set up. See notes on Ubuntu, macOS/OS X and Windows
-for details.
+spaCy is compatible with **64-bit CPython 2.6+/3.3+** and runs on **Unix/Linux**,
+**macOS/OS X** and **Windows**. The latest spaCy releases are available over
+`pip <https://pypi.python.org/pypi/spacy>`_ (source packages only) and
+`conda <https://anaconda.org/conda-forge/spacy>`_. Installation requires a working
+build environment. See notes on Ubuntu, macOS/OS X and Windows for details.
 
 pip
 ---
 
-When using pip it is generally recommended to install packages in a virtualenv to
-avoid modifying system state:
+Using pip, spaCy releases are currently only available as source packages.
 
 .. code:: bash
 
-    pip install spacy
+    pip install -U spacy
 
-Python packaging is awkward at the best of times, and it's particularly tricky with
-C extensions, built via Cython, requiring large data files. So, please report issues
-as you encounter them.
+When using pip it is generally recommended to install packages in a ``virtualenv``
+to avoid modifying system state:
+
+.. code:: bash
+
+    virtualenv .env
+    source .env/bin/activate
+    pip install spacy
 
 conda
 -----
 
-If you're using conda, you can install spaCy via ``conda-forge``:
+Thanks to our great community, we've finally re-added conda support. You can now
+install spaCy via ``conda-forge``:
 
 .. code:: bash
 
@@ -136,14 +141,13 @@ If you're using conda, you can install spaCy via ``conda-forge``:
 
 For the feedstock including the build recipe and configuration,
 check out `this repository <https://github.com/conda-forge/spacy-feedstock>`_.
-Thanks to our great community, we've finally re-added conda support — improvements
-and pull requests to the recipe and setup are always appreciated.
+Improvements and pull requests to the recipe and setup are always appreciated.
 
-Install model
-=============
+Download models
+===============
 
-After installation you need to download a language model. Currently only models for
-English and German, named ``en`` and ``de``, are available.
+After installation you need to download a language model. Models for English
+(``en``) and German (``de``) are available.
 
 .. code:: bash
 
@@ -153,51 +157,90 @@ English and German, named ``en`` and ``de``, are available.
 The download command fetches about 1 GB of data which it installs
 within the ``spacy`` package directory.
 
-Upgrading spaCy
-===============
-
-To upgrade spaCy to the latest release:
-
-pip
----
-
-.. code:: bash
-
-    pip install -U spacy
-
-Sometimes new releases require a new language model. Then you will have to upgrade to
-a new model, too. You can also force re-downloading and installing a new language model:
+Sometimes new releases require a new language model. Then you will have to
+upgrade to a new model, too. You can also force re-downloading and installing a
+new language model:
 
 .. code:: bash
 
     python -m spacy.en.download --force
 
+Download model to custom location
+---------------------------------
+
+You can specify where ``spacy.en.download`` and ``spacy.de.download`` download
+the language model to using the ``--data-path`` or ``-d`` argument:
+
+.. code:: bash
+
+    python -m spacy.en.download all --data-path /some/dir
+
+If you choose to download to a custom location, you will need to tell spaCy where to load the model
+from in order to use it. You can do this either by calling ``spacy.util.set_data_path()`` before
+calling ``spacy.load()``, or by passing a ``path`` argument to the ``spacy.en.English`` or
+``spacy.de.German`` constructors.
+
+Download models manually
+------------------------
+
+As of v1.6, the models and word vectors are also available as direct downloads
+from GitHub, attached to the `releases <https://github.com/explosion/spacy/releases>`_
+as ``.tar.gz`` archives.
+
+To install the models manually, first find the default data path. You can use
+``spacy.util.get_data_path()`` to find the directory where spaCy will look for
+its models, or change the default data path with ``spacy.util.set_data_path()``.
+Then simply unpack the archive and place the contained folder in that directory.
+You can now load the models via ``spacy.load()``.
+
 Compile from source
 ===================
 
-The other way to install spaCy is to clone its GitHub repository and build it from
+The other way to install spaCy is to clone its
+`GitHub repository <https://github.com/explosion/spaCy>`_ and build it from
 source. That is the common way if you want to make changes to the code base.
-
 You'll need to make sure that you have a development enviroment consisting of a
-Python distribution including header files, a compiler, pip, virtualenv and git
-installed. The compiler part is the trickiest. How to do that depends on your
-system. See notes on Ubuntu, OS X and Windows for details.
+Python distribution including header files, a compiler,
+`pip <https://pip.pypa.io/en/latest/installing/>`__, `virtualenv <https://virtualenv.pypa.io/>`_
+and `git <https://git-scm.com>`_ installed. The compiler part is the trickiest.
+How to do that depends on your system. See notes on Ubuntu, OS X and Windows for
+details.
 
 .. code:: bash
 
     # make sure you are using recent pip/virtualenv versions
     python -m pip install -U pip virtualenv
-
-    #  find git install instructions at https://git-scm.com/downloads
-    git clone https://github.com/explosion/spaCy.git
-
+    git clone https://github.com/explosion/spaCy
     cd spaCy
-    virtualenv .env && source .env/bin/activate
+
+    virtualenv .env
+    source .env/bin/activate
     pip install -r requirements.txt
     pip install -e .
 
 Compared to regular install via pip `requirements.txt <requirements.txt>`_
-additionally installs developer dependencies such as cython.
+additionally installs developer dependencies such as Cython.
+
+Instead of the above verbose commands, you can also use the following
+`Fabric <http://www.fabfile.org/>`_ commands:
+
++---------------+--------------------------------------------------------------+
+| ``fab env``   | Create ``virtualenv`` and delete previous one, if it exists. |
++---------------+--------------------------------------------------------------+
+| ``fab make``  | Compile the source.                                          |
++---------------+--------------------------------------------------------------+
+| ``fab clean`` | Remove compiled objects, including the generated C++.        |
++---------------+--------------------------------------------------------------+
+| ``fab test``  | Run basic tests, aborting after first failure.               |
++---------------+--------------------------------------------------------------+
+
+All commands assume that your ``virtualenv`` is located in a directory ``.env``.
+If you're using a different directory, you can change it via the environment
+variable ``VENV_DIR``, for example:
+
+.. code:: bash
+
+    VENV_DIR=".custom-env" fab clean make
 
 Ubuntu
 ------
@@ -226,8 +269,8 @@ VS 2010 (Python 3.4) and VS 2015 (Python 3.5).
 Run tests
 =========
 
-spaCy comes with an extensive test suite. First, find out where spaCy is
-installed:
+spaCy comes with an `extensive test suite <spacy/tests>`_. First, find out where
+spaCy is installed:
 
 .. code:: bash
 
@@ -243,22 +286,6 @@ and ``--model`` are optional and enable additional tests:
 
     python -m pytest <spacy-directory> --vectors --model --slow
 
-Download model to custom location
-=================================
-
-You can specify where ``spacy.en.download`` and ``spacy.de.download`` download the language model
-to using the ``--data-path`` or ``-d`` argument:
-
-.. code:: bash
-
-    python -m spacy.en.download all --data-path /some/dir
-
-
-If you choose to download to a custom location, you will need to tell spaCy where to load the model
-from in order to use it. You can do this either by calling ``spacy.util.set_data_path()`` before
-calling ``spacy.load()``, or by passing a ``path`` argument to the ``spacy.en.English`` or
-``spacy.de.German`` constructors.
-
 Changelog
 =========
 

spacy/tests/bn/__init__.py (1 line changed, Normal file)
@@ -0,0 +1 @@
+# coding: utf-8

spacy/tests/bn/test_tokenizer.py (40 lines changed, Normal file)
@@ -0,0 +1,40 @@
+# encoding: utf8
+from __future__ import unicode_literals
+
+import pytest
+
+TESTCASES = []
+
+PUNCTUATION_TESTS = [
+    (u'আমি বাংলায় গান গাই!', [u'আমি', u'বাংলায়', u'গান', u'গাই', u'!']),
+    (u'আমি বাংলায় কথা কই।', [u'আমি', u'বাংলায়', u'কথা', u'কই', u'।']),
+    (u'বসুন্ধরা জনসম্মুখে দোষ স্বীকার করলো না?', [u'বসুন্ধরা', u'জনসম্মুখে', u'দোষ', u'স্বীকার', u'করলো', u'না', u'?']),
+    (u'টাকা থাকলে কি না হয়!', [u'টাকা', u'থাকলে', u'কি', u'না', u'হয়', u'!']),
+]
+
+ABBREVIATIONS = [
+    (u'ডঃ খালেদ বললেন ঢাকায় ৩৫ ডিগ্রি সে.।', [u'ডঃ', u'খালেদ', u'বললেন', u'ঢাকায়', u'৩৫', u'ডিগ্রি', u'সে.', u'।'])
+]
+
+TESTCASES.extend(PUNCTUATION_TESTS)
+TESTCASES.extend(ABBREVIATIONS)
+
+
+@pytest.mark.parametrize('text,expected_tokens', TESTCASES)
+def test_tokenizer_handles_testcases(bn_tokenizer, text, expected_tokens):
+    tokens = bn_tokenizer(text)
+    token_list = [token.text for token in tokens if not token.is_space]
+    assert expected_tokens == token_list
+
+
+def test_tokenizer_handles_long_text(bn_tokenizer):
+    text = u"""নর্থ সাউথ বিশ্ববিদ্যালয়ে সারাবছর কোন না কোন বিষয়ে গবেষণা চলতেই থাকে। \
+অভিজ্ঞ ফ্যাকাল্টি মেম্বারগণ প্রায়ই শিক্ষার্থীদের নিয়ে বিভিন্ন গবেষণা প্রকল্পে কাজ করেন, \
+যার মধ্যে রয়েছে রোবট থেকে মেশিন লার্নিং সিস্টেম ও আর্টিফিশিয়াল ইন্টেলিজেন্স। \
+এসকল প্রকল্পে কাজ করার মাধ্যমে সংশ্লিষ্ট ক্ষেত্রে যথেষ্ঠ পরিমাণ স্পেশালাইজড হওয়া সম্ভব। \
+আর গবেষণার কাজ তোমার ক্যারিয়ারকে ঠেলে নিয়ে যাবে অনেকখানি! \
+কন্টেস্ট প্রোগ্রামার হও, গবেষক কিংবা ডেভেলপার - নর্থ সাউথ ইউনিভার্সিটিতে তোমার প্রতিভা বিকাশের সুযোগ রয়েছেই। \
+নর্থ সাউথের অসাধারণ কমিউনিটিতে তোমাকে সাদর আমন্ত্রণ।"""
+
+    tokens = bn_tokenizer(text)
+    assert len(tokens) == 84
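The new Bengali test cases expect trailing punctuation, including the danda ``।``, to be split off as separate tokens, while abbreviations such as ``ডঃ`` and ``সে.`` stay intact. As a toy illustration only (this is not spaCy's tokenizer, which combines prefix, suffix and infix rules with special-case exceptions), the sketch below reproduces that behaviour with a whitespace split plus a hypothetical suffix set and a hypothetical exception set; ``toy_tokenize``, ``SUFFIXES`` and ``EXCEPTIONS`` are names chosen for this example.

```python
# Toy sketch: NOT spaCy's tokenizer. SUFFIXES and EXCEPTIONS are hypothetical
# stand-ins chosen to match the test cases above.
SUFFIXES = {"!", "?", "।", "."}
EXCEPTIONS = {"ডঃ", "সে."}


def toy_tokenize(text):
    tokens = []
    for word in text.split():
        suffixes = []
        # Peel punctuation off the end until the remainder is a known exception
        # or no longer ends in a suffix character.
        while word and word not in EXCEPTIONS and word[-1] in SUFFIXES:
            suffixes.insert(0, word[-1])
            word = word[:-1]
        if word:
            tokens.append(word)
        tokens.extend(suffixes)
    return tokens
```

Applied to ``'আমি বাংলায় কথা কই।'`` it yields the same five tokens that ``PUNCTUATION_TESTS`` expects; the exception check is what keeps ``সে.`` whole while the danda after it is still split off.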

@@ -11,6 +11,7 @@ from ..nl import Dutch
 from ..sv import Swedish
 from ..hu import Hungarian
 from ..fi import Finnish
+from ..bn import Bengali
 from ..tokens import Doc
 from ..strings import StringStore
 from ..lemmatizer import Lemmatizer
@@ -24,7 +25,7 @@ import pytest
 
 
 LANGUAGES = [English, German, Spanish, Italian, French, Portuguese, Dutch,
-             Swedish, Hungarian, Finnish]
+             Swedish, Hungarian, Finnish, Bengali]
 
 
 @pytest.fixture(params=LANGUAGES)
@@ -73,6 +74,11 @@ def sv_tokenizer():
     return Swedish.Defaults.create_tokenizer()
 
 
+@pytest.fixture
+def bn_tokenizer():
+    return Bengali.Defaults.create_tokenizer()
+
+
 @pytest.fixture
 def stringstore():
     return StringStore()

@@ -31,7 +31,7 @@ def test_tokenizer_handles_punct(tokenizer):
 
 
 def test_tokenizer_handles_digits(tokenizer):
-    exceptions = ["hu"]
+    exceptions = ["hu", "bn"]
     text = "Lorem ipsum: 1984."
     tokens = tokenizer(text)
 

@@ -138,7 +138,9 @@ p
 
 +code.
     import spacy
+    import random
     from spacy.gold import GoldParse
+    from spacy.language import EntityRecognizer
 
     train_data = [
         ('Who is Chaka Khan?', [(7, 17, 'PERSON')]),
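The training example annotates entities as ``(start, end, label)`` character offsets with ``end`` exclusive, and the updated snippet imports ``random``, presumably to shuffle the examples between iterations. The plain-Python sketch below does not call spaCy: it checks that each offset pair slices out the intended text and shows a seeded per-epoch shuffle. ``check_entity_offsets`` and ``shuffled_epochs`` are hypothetical helper names.

```python
import random

train_data = [
    ('Who is Chaka Khan?', [(7, 17, 'PERSON')]),
]


def check_entity_offsets(examples):
    """Return (entity_text, label) pairs; raise if an offset pair is out of range."""
    spans = []
    for text, entities in examples:
        for start, end, label in entities:
            if not 0 <= start < end <= len(text):
                raise ValueError("bad offsets %r in %r" % ((start, end), text))
            spans.append((text[start:end], label))
    return spans


def shuffled_epochs(examples, n_iter, seed=0):
    """Yield a freshly shuffled copy of the examples for each training iteration."""
    rng = random.Random(seed)
    for _ in range(n_iter):
        batch = list(examples)
        rng.shuffle(batch)
        yield batch


print(check_entity_offsets(train_data))  # [('Chaka Khan', 'PERSON')]
```

Checking offsets up front catches off-by-one spans before they reach ``GoldParse``; seeding the shuffle keeps runs reproducible.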

@@ -5,16 +5,46 @@ include ../../_includes/_mixins
 p
     |  spaCy is compatible with #[strong 64-bit CPython 2.6+∕3.3+] and
     |  runs on #[strong Unix/Linux], #[strong macOS/OS X] and
-    |  #[strong Windows]. The latest spaCy releases are currently only
-    |  available as source packages over
-    |  #[+a("https://pypi.python.org/pypi/spacy") pip]. Installation requires a
-    |  working build environment. See notes on
+    |  #[strong Windows]. The latest spaCy releases are
+    |  available over #[+a("https://pypi.python.org/pypi/spacy") pip] (source
+    |  packages only) and #[+a("https://anaconda.org/conda-forge/spacy") conda].
+    |  Installation requires a working build environment. See notes on
     |  #[a(href="#source-ubuntu") Ubuntu], #[a(href="#source-osx") macOS/OS X]
     |  and #[a(href="#source-windows") Windows] for details.
 
++h(2, "pip") pip
+
+p Using pip, spaCy releases are currently only available as source packages.
+
 +code(false, "bash").
     pip install -U spacy
 
+p
+    |  When using pip it is generally recommended to install packages in a
+    |  #[code virtualenv] to avoid modifying system state:
+
++code(false, "bash").
+    virtualenv .env
+    source .env/bin/activate
+    pip install spacy
+
++h(2, "conda") conda
+
+p
+    |  Thanks to our great community, we've finally re-added conda support. You
+    |  can now install spaCy via #[code conda-forge]:
+
++code(false, "bash").
+    conda config --add channels conda-forge
+    conda install spacy
+
+p
+    |  For the feedstock including the build recipe and configuration, check out
+    |  #[+a("https://github.com/conda-forge/spacy-feedstock") this repository].
+    |  Improvements and pull requests to the recipe and setup are always appreciated.
+
++h(2, "models") Download models
+
 p
     |  After installation you need to download a language model. Models for
     |  English (#[code en]) and German (#[code de]) are available.
@@ -36,18 +66,49 @@ p
     # Check whether the model was successfully installed
     python -c "import spacy; spacy.load('en'); print('OK')"
 
-p The download command fetches about 1 GB of data which it installs within the #[code spacy] package directory.
+p
+    |  The download command fetches about 1 GB of data which it
+    |  installs within the #[code spacy] package directory.
+
++h(3, "custom-location") Download model to custom location
+
+p
+    |   You can specify where #[code spacy.en.download] and
+    |  #[code spacy.de.download] download the language model to using the
+    |  #[code --data-path] or #[code -d] argument:
+
++code(false, "bash").
+    python -m spacy.en.download all --data-path /some/dir
+
+p
+    |  If you choose to download to a custom location, you will need to tell
+    |  spaCy where to load the model from in order to use it. You can do this
+    |  either by calling #[code spacy.util.set_data_path()] before calling
+    |  #[code spacy.load()], or by passing a #[code path] argument to the
+    |  #[code spacy.en.English] or #[code spacy.de.German] constructors.
+
++h(3, "models-manual") Download models manually
+
+p
+    |  As of v1.6, the models and word vectors are also available as direct
+    |  downloads from GitHub, attached to the #[+a(gh("spaCy") + "/releases") releases] as #[code .tar.gz] archives.
+
+p
+    |  To install the models manually, first find the default data path. You can
+    |  use #[code spacy.util.get_data_path()] to find the directory where spaCy
+    |  will look for its models, or change the default data path with
+    |  #[code spacy.util.set_data_path()]. Then simply unpack the archive and
+    |  place the contained folder in that directory. You can now load the models
+    |  via #[code spacy.load()].
 
 +h(2, "source") Compile from source
 
 p
     |  The other way to install spaCy is to clone its
     |  #[+a(gh("spaCy")) GitHub repository] and build it from source. That is
-    |  the common way if you want to make changes to the code base.
-
-p
-    |  You'll need to make sure that you have a development enviroment
-    |  consisting of a Python distribution including header files, a compiler,
+    |  the common way if you want to make changes to the code base. You'll need to
+    |  make sure that you have a development enviroment consisting of a Python
+    |  distribution including header files, a compiler,
     |  #[+a("https://pip.pypa.io/en/latest/installing/") pip],
     |  #[+a("https://virtualenv.pypa.io/") virtualenv] and
     |  #[+a("https://git-scm.com") git] installed. The compiler part is the
@@ -55,6 +116,50 @@ p
     |  #[a(href="#source-ubuntu") Ubuntu], #[a(href="#source-osx") OS X] and
     |  #[a(href="#source-windows") Windows] for details.
 
++code(false, "bash").
+    # make sure you are using recent pip/virtualenv versions
+    python -m pip install -U pip virtualenv
+    git clone #{gh("spaCy")}
+    cd spaCy
+
+    virtualenv .env
+    source .env/bin/activate
+    pip install -r requirements.txt
+    pip install -e .
+
+p
+    |  Compared to regular install via pip, #[+a(gh("spaCy", "requirements.txt")) requirements.txt]
+    |  additionally installs developer dependencies such as Cython.
+
+p
+    |  Instead of the above verbose commands, you can also use the following
+    |  #[+a("http://www.fabfile.org/") Fabric] commands:
+
++table(["Command", "Description"])
+    +row
+        +cell #[code fab env]
+        +cell Create #[code virtualenv] and delete previous one, if it exists.
+
+    +row
+        +cell #[code fab make]
+        +cell Compile the source.
+
+    +row
+        +cell #[code fab clean]
+        +cell Remove compiled objects, including the generated C++.
+
+    +row
+        +cell #[code fab test]
+        +cell Run basic tests, aborting after first failure.
+
+p
+    |  All commands assume that your #[code virtualenv] is located in a
+    |  directory #[code .env]. If you're using a different directory, you can
+    |  change it via the environment variable #[code VENV_DIR], for example:
+
++code(false, "bash").
+    VENV_DIR=".custom-env" fab clean make
+
 +h(3, "source-ubuntu") Ubuntu
 
 p Install system-level dependencies via #[code apt-get]:
@@ -67,12 +172,8 @@ p Install system-level dependencies via #[code apt-get]:
 p
     |  Install a recent version of #[+a("https://developer.apple.com/xcode/") XCode],
     |  including the so-called "Command Line Tools". macOS and OS X ship with
-    |  Python and git preinstalled.
-
-p
-    |  To compile spaCy with multi-threading support on macOS / OS X,
-    |  #[+a("https://github.com/explosion/spaCy/issues/267") see here].
-
+    |  Python and git preinstalled. To compile spaCy with multi-threading support
+    |  on macOS / OS X, #[+a("https://github.com/explosion/spaCy/issues/267") see here].
 
 +h(3, "source-windows") Windows
 
@@ -98,8 +199,8 @@ p
 +h(2, "tests") Run tests
 
 p
-    |  spaCy comes with an extensive test suite. First, find out where spaCy is
-    |  installed:
+    |  spaCy comes with an #[+a(gh("spacy", "spacy/tests")) extensive test suite].
+    |  First, find out where spaCy is installed:
 
 +code(false, "bash").
     python -c "import os; import spacy; print(os.path.dirname(spacy.__file__))"
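The one-liner in this hunk imports spaCy to print its package directory. As an alternative sketch using only the standard library, ``importlib.util.find_spec`` locates a top-level package without executing its code. The helper name ``package_dir`` is hypothetical, and the sketch needs Python 3.4+, whereas these docs also target Python 2.6/2.7.

```python
import importlib.util
import os


def package_dir(name):
    """Return the directory of an installed top-level package without importing it."""
    spec = importlib.util.find_spec(name)
    if spec is None or spec.origin is None:
        raise ImportError("package %r not found" % name)
    # For a regular package, spec.origin points at its __init__.py.
    return os.path.dirname(spec.origin)
```

For example, ``package_dir("spacy")`` gives the ``<spacy-directory>`` to pass to ``python -m pytest``.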
@@ -114,20 +215,3 @@ p
     python -m pip install -U pytest
 
     python -m pytest <spacy-directory> --vectors --model --slow
-
-+h(2, "custom-location") Download model to custom location
-
-p
-    |   You can specify where #[code spacy.en.download] and
-    |  #[code spacy.de.download] download the language model to using the
-    |  #[code --data-path] or #[code -d] argument:
-
-+code(false, "bash").
-    python -m spacy.en.download all --data-path /some/dir
-
-p
-    |  If you choose to download to a custom location, you will need to tell
-    |  spaCy where to load the model from in order to use it. You can do this
-    |  either by calling #[code spacy.util.set_data_path()] before calling
-    |  #[code spacy.load()], or by passing a #[code path] argument to the
-    |  #[code spacy.en.English] or #[code spacy.de.German] constructors.