spaCy: Industrial-strength NLP
******************************

spaCy is a library for advanced Natural Language Processing in Python and Cython.
It's built on the very latest research, and was designed from day one to be
used in real products. spaCy comes with
`pre-trained statistical models <https://spacy.io/models>`_ and word
vectors, and currently supports tokenization for **30+ languages**. It features
the **fastest syntactic parser** in the world, convolutional **neural network models**
for tagging, parsing and **named entity recognition** and easy **deep learning**
integration. It's commercial open-source software, released under the MIT license.

💫 **Version 2.0 out now!** `Check out the release notes here. <https://github.com/explosion/spaCy/releases>`_

.. image:: https://img.shields.io/travis/explosion/spaCy/master.svg?style=flat-square&logo=travis
    :target: https://travis-ci.org/explosion/spaCy
    :alt: Build Status

.. image:: https://img.shields.io/appveyor/ci/explosion/spaCy/master.svg?style=flat-square&logo=appveyor
    :target: https://ci.appveyor.com/project/explosion/spaCy
    :alt: Appveyor Build Status

.. image:: https://img.shields.io/github/release/explosion/spacy.svg?style=flat-square
    :target: https://github.com/explosion/spaCy/releases
    :alt: Current Release Version

.. image:: https://img.shields.io/pypi/v/spacy.svg?style=flat-square
    :target: https://pypi.python.org/pypi/spacy
    :alt: pypi Version

.. image:: https://img.shields.io/conda/vn/conda-forge/spacy.svg?style=flat-square
    :target: https://anaconda.org/conda-forge/spacy
    :alt: conda Version

.. image:: https://img.shields.io/badge/wheels-%E2%9C%93-4c1.svg?longCache=true&style=flat-square&logo=python&logoColor=white
    :target: https://github.com/explosion/wheelwright/releases
    :alt: Python wheels

.. image:: https://img.shields.io/twitter/follow/spacy_io.svg?style=social&label=Follow
    :target: https://twitter.com/spacy_io
    :alt: spaCy on Twitter

📖 Documentation
================

=================== ===
`spaCy 101`_        New to spaCy? Here's everything you need to know!
`Usage Guides`_     How to use spaCy and its features.
`New in v2.0`_      New features, backwards incompatibilities and migration guide.
`API Reference`_    The detailed reference for spaCy's API.
`Models`_           Download statistical language models for spaCy.
`Universe`_         Libraries, extensions, demos, books and courses.
`Changelog`_        Changes and version history.
`Contribute`_       How to contribute to the spaCy project and code base.
=================== ===

.. _spaCy 101: https://spacy.io/usage/spacy-101
.. _New in v2.0: https://spacy.io/usage/v2#migrating
.. _Usage Guides: https://spacy.io/usage/
.. _API Reference: https://spacy.io/api/
.. _Models: https://spacy.io/models
.. _Universe: https://spacy.io/universe
.. _Changelog: https://spacy.io/usage/#changelog
.. _Contribute: https://github.com/explosion/spaCy/blob/master/CONTRIBUTING.md

💬 Where to ask questions
==========================

The spaCy project is maintained by `@honnibal <https://github.com/honnibal>`_
and `@ines <https://github.com/ines>`_. Please understand that we won't be able
to provide individual support via email. We also believe that help is much more
valuable if it's shared publicly, so that more people can benefit from it.

====================== ===
**Bug Reports**        `GitHub Issue Tracker`_
**Usage Questions**    `Stack Overflow`_, `Gitter Chat`_, `Reddit User Group`_
**General Discussion** `Gitter Chat`_, `Reddit User Group`_
====================== ===

.. _GitHub Issue Tracker: https://github.com/explosion/spaCy/issues
.. _Stack Overflow: http://stackoverflow.com/questions/tagged/spacy
.. _Gitter Chat: https://gitter.im/explosion/spaCy
.. _Reddit User Group: https://www.reddit.com/r/spacynlp

Features
========

* **Fastest syntactic parser** in the world
* **Named entity** recognition
* Non-destructive **tokenization**
* Support for **30+ languages**
* Pre-trained `statistical models <https://spacy.io/models>`_ and word vectors
* Easy **deep learning** integration
* Part-of-speech tagging
* Labelled dependency parsing
* Syntax-driven sentence segmentation
* Built-in **visualizers** for syntax and NER
* Convenient string-to-hash mapping
* Export to numpy data arrays
* Efficient binary serialization
* Easy **model packaging** and deployment
* State-of-the-art speed
* Robust, rigorously evaluated accuracy

📖 **For more details, see the** `facts, figures and benchmarks <https://spacy.io/usage/facts-figures>`_.

Install spaCy
=============

For detailed installation instructions, see
the `documentation <https://spacy.io/usage>`_.

==================== ===
**Operating system** macOS / OS X, Linux, Windows (Cygwin, MinGW, Visual Studio)
**Python version**   CPython 2.7, 3.4+. 64-bit only.
**Package managers** `pip`_, `conda`_ (via ``conda-forge``)
==================== ===

.. _pip: https://pypi.python.org/pypi/spacy
.. _conda: https://anaconda.org/conda-forge/spacy

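
The interpreter requirements listed in the table can be checked from Python
before installing. The following is a minimal sketch using only the standard
library. The helper name ``meets_requirements`` is hypothetical and not part
of spaCy; it simply encodes "CPython 2.7 or 3.4+, 64-bit only".

.. code:: python

    import platform
    import sys


    def meets_requirements(implementation, version, is_64bit):
        """Return True if an interpreter matches the requirements table.

        Hypothetical helper, not part of spaCy.
        """
        if implementation != "CPython" or not is_64bit:
            return False
        major, minor = version[0], version[1]
        return (major, minor) == (2, 7) or (major, minor) >= (3, 4)


    if __name__ == "__main__":
        ok = meets_requirements(
            platform.python_implementation(),
            sys.version_info,
            sys.maxsize > 2 ** 32,  # True on 64-bit builds
        )
        print("Supported interpreter" if ok else "Unsupported interpreter")
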
pip
---

Using pip, spaCy releases are available as source packages and binary wheels
(as of ``v2.0.13``).

.. code:: bash

    pip install spacy

When using pip it is generally recommended to install packages in a virtual
environment to avoid modifying system state:

.. code:: bash

    python -m venv .env
    source .env/bin/activate
    pip install spacy

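
To confirm that a virtual environment is actually active before running
``pip install``, you can inspect the interpreter prefixes. This is a small
sketch, assuming Python 3. The function name ``in_virtualenv`` is
hypothetical and not part of spaCy or pip.

.. code:: python

    import sys


    def in_virtualenv(prefix, base_prefix, real_prefix=None):
        """Return True if the given prefixes indicate a virtual environment."""
        # "python -m venv" makes sys.prefix differ from sys.base_prefix;
        # older releases of the "virtualenv" tool set sys.real_prefix instead.
        return real_prefix is not None or prefix != base_prefix


    if __name__ == "__main__":
        active = in_virtualenv(
            sys.prefix,
            getattr(sys, "base_prefix", sys.prefix),
            getattr(sys, "real_prefix", None),
        )
        print("virtual environment active:", active)
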
conda
-----

Thanks to our great community, we've finally re-added conda support. You can now
install spaCy via ``conda-forge``:

.. code:: bash

    conda config --add channels conda-forge
    conda install spacy

For the feedstock including the build recipe and configuration,
check out `this repository <https://github.com/conda-forge/spacy-feedstock>`_.
Improvements and pull requests to the recipe and setup are always appreciated.

Updating spaCy
--------------

Some updates to spaCy may require downloading new statistical models. If you're
running spaCy v2.0 or higher, you can use the ``validate`` command to check
whether your installed models are compatible and, if not, print details on how
to update them:

.. code:: bash

    pip install -U spacy
    python -m spacy validate

If you've trained your own models, keep in mind that your training and runtime
inputs must match. After updating spaCy, we recommend **retraining your models**
with the new version.

📖 **For details on upgrading from spaCy 1.x to spaCy 2.x, see the**
`migration guide <https://spacy.io/usage/v2#migrating>`_.

Download models
===============

As of v1.7.0, models for spaCy can be installed as **Python packages**.
This means that they're a component of your application, just like any
other module. Models can be installed using spaCy's ``download`` command,
or manually by pointing pip to a path or URL.

======================= ===
`Available Models`_     Detailed model descriptions, accuracy figures and benchmarks.
`Models Documentation`_ Detailed usage instructions.
======================= ===

.. _Available Models: https://spacy.io/models
.. _Models Documentation: https://spacy.io/docs/usage/models

.. code:: bash

    # out-of-the-box: download best-matching default model
    python -m spacy download en

    # download best-matching version of specific model for your spaCy installation
    python -m spacy download en_core_web_lg

    # pip install .tar.gz archive from path or URL
    pip install /Users/you/en_core_web_sm-2.0.0.tar.gz

Loading and using models
------------------------

To load a model, use ``spacy.load()`` with the model's shortcut link:

.. code:: python

    import spacy
    nlp = spacy.load('en')
    doc = nlp(u'This is a sentence.')

If you've installed a model via pip, you can also ``import`` it directly and
then call its ``load()`` method:

.. code:: python

    import spacy
    import en_core_web_sm

    nlp = en_core_web_sm.load()
    doc = nlp(u'This is a sentence.')

📖 **For more info and examples, check out the**
`models documentation <https://spacy.io/docs/usage/models>`_.

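
If a model may not be installed on every machine, it can help to fail
gracefully instead of crashing at load time. The sketch below wraps
``spacy.load()`` in a hypothetical helper, ``load_model``, which is not part
of spaCy. It assumes that ``spacy.load()`` raises an ``IOError`` / ``OSError``
when it can't find the requested model, which should be checked against the
spaCy version you're running.

.. code:: python

    def load_model(name):
        """Return a loaded pipeline, or None if spaCy or the model is missing.

        Hypothetical helper for illustration; not part of spaCy.
        """
        try:
            import spacy
        except ImportError:
            print("spaCy is not installed; try: pip install spacy")
            return None
        try:
            return spacy.load(name)
        except (IOError, OSError):
            # Assumption: a missing model surfaces as IOError/OSError.
            print("Model %r not found; try: python -m spacy download %s"
                  % (name, name))
            return None


    if __name__ == "__main__":
        nlp = load_model("en")
        if nlp is not None:
            doc = nlp(u"This is a sentence.")
            print([token.text for token in doc])

Returning ``None`` keeps the calling code in control of what to do next, for
example prompting the user to run the ``download`` command.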
Support for older versions
--------------------------

If you're using an older version (``v1.6.0`` or below), you can still download
and install the old models from within spaCy using ``python -m spacy.en.download all``
or ``python -m spacy.de.download all``. The ``.tar.gz`` archives are also
`attached to the v1.6.0 release <https://github.com/explosion/spaCy/tree/v1.6.0>`_.
To download and install the models manually, unpack the archive, drop the
contained directory into ``spacy/data`` and load the model via ``spacy.load('en')``
or ``spacy.load('de')``.

Compile from source
===================

The other way to install spaCy is to clone its
`GitHub repository <https://github.com/explosion/spaCy>`_ and build it from
source. This is the usual approach if you want to make changes to the code base.
You'll need a development environment consisting of a
Python distribution including header files, a compiler,
`pip <https://pip.pypa.io/en/latest/installing/>`__, `virtualenv <https://virtualenv.pypa.io/>`_
and `git <https://git-scm.com>`_. Setting up the compiler is the trickiest part,
and how to do it depends on your system. See the notes on Ubuntu, macOS / OS X
and Windows for details.

.. code:: bash

    # make sure you are using the latest pip
    python -m pip install -U pip
    git clone https://github.com/explosion/spaCy
    cd spaCy

    python -m venv .env
    source .env/bin/activate
    export PYTHONPATH=`pwd`
    pip install -r requirements.txt
    python setup.py build_ext --inplace

Compared to a regular install via pip, `requirements.txt <requirements.txt>`_
additionally installs developer dependencies such as Cython. For more details
and instructions, see the documentation on
`compiling spaCy from source <https://spacy.io/usage/#source>`_ and the
`quickstart widget <https://spacy.io/usage/#section-quickstart>`_ to get
the right commands for your platform and Python version.

Instead of the above verbose commands, you can also use the following
`Fabric <http://www.fabfile.org/>`_ commands. All commands assume that your
virtual environment is located in a directory ``.env``. If you're using a
different directory, you can change it via the environment variable ``VENV_DIR``,
for example ``VENV_DIR=".custom-env" fab clean make``.

============= ===
``fab env``   Create virtual environment and delete previous one, if it exists.
``fab make``  Compile the source.
``fab clean`` Remove compiled objects, including the generated C++.
``fab test``  Run basic tests, aborting after first failure.
============= ===

Ubuntu
------

Install system-level dependencies via ``apt-get``:

.. code:: bash

    sudo apt-get install build-essential python-dev git

macOS / OS X
------------

Install a recent version of `Xcode <https://developer.apple.com/xcode/>`_,
including the so-called "Command Line Tools". macOS and OS X ship with Python
and git preinstalled.

Windows
-------

Install a version of `Visual Studio Express <https://www.visualstudio.com/vs/visual-studio-express/>`_
or higher that matches the version that was used to compile your Python
interpreter. For official distributions these are VS 2008 (Python 2.7),
VS 2010 (Python 3.4) and VS 2015 (Python 3.5).

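
To see which compiler your interpreter was built with, and which of the
Visual Studio releases listed here it corresponds to, a short lookup can help.
This is only a sketch: the table and the function ``required_visual_studio``
are hypothetical names that restate the pairings from the paragraph, and
versions not mentioned there return ``None``.

.. code:: python

    import platform
    import sys

    # Pairings listed in this section for official Python distributions.
    VISUAL_STUDIO_BY_PYTHON = {
        (2, 7): "VS 2008",
        (3, 4): "VS 2010",
        (3, 5): "VS 2015",
    }


    def required_visual_studio(major, minor):
        """Return the listed Visual Studio release, or None if not listed."""
        return VISUAL_STUDIO_BY_PYTHON.get((major, minor))


    if __name__ == "__main__":
        # platform.python_compiler() reports the compiler used for the build.
        print("Interpreter built with:", platform.python_compiler())
        print("Listed match:", required_visual_studio(*sys.version_info[:2]))
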
Run tests
=========

spaCy comes with an `extensive test suite <spacy/tests>`_. In order to run the
tests, you'll usually want to clone the repository and build spaCy from source.
This will also install the required development dependencies and test utilities
defined in the ``requirements.txt``.

Alternatively, you can find out where spaCy is installed and run ``pytest`` on
that directory. Don't forget to also install the test utilities via spaCy's
``requirements.txt``:

.. code:: bash

    python -c "import os; import spacy; print(os.path.dirname(spacy.__file__))"
    pip install -r path/to/requirements.txt
    python -m pytest <spacy-directory>

See `the documentation <https://spacy.io/usage/#tests>`_ for more details and
examples.

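
The install location can also be looked up without importing the package,
which is useful if the import itself is what you're trying to debug. The
following sketch assumes Python 3.4+ and uses only the standard library. The
helper name ``package_dir`` is hypothetical and not part of spaCy.

.. code:: python

    import importlib.util
    import os


    def package_dir(name):
        """Return the directory of an importable top-level package, or None."""
        spec = importlib.util.find_spec(name)
        if spec is None or not spec.has_location:
            return None
        # For a package, spec.origin points at its __init__.py file.
        return os.path.dirname(spec.origin)


    if __name__ == "__main__":
        # Prints None if spaCy is not installed in this environment.
        print(package_dir("spacy"))

The printed directory is the one to pass to ``python -m pytest``.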