diff --git a/README.rst b/README.rst index 244308473..27fca3fc2 100644 --- a/README.rst +++ b/README.rst @@ -1,15 +1,16 @@ spaCy: Industrial-strength NLP ****************************** -spaCy is a library for advanced natural language processing in Python and +spaCy is a library for advanced Natural Language Processing in Python and Cython. spaCy is built on the very latest research, but it isn't researchware. -It was designed from day one to be used in real products. spaCy currently supports -English, German, French and Spanish, as well as tokenization for Italian, -Portuguese, Dutch, Swedish, Finnish, Norwegian, Danish, Hungarian, Polish, -Bengali, Hebrew, Chinese and Japanese. It's commercial open-source software, -released under the MIT license. +It was designed from day one to be used in real products. spaCy comes with +`pre-trained statistical models `_ and word +vectors, and currently supports tokenization for **20+ languages**. It features +the **fastest syntactic parser** in the world, convolutional **neural network models** +for tagging, parsing and **named entity recognition** and easy **deep learning** +integration. It's commercial open-source software, released under the MIT license. -💫 **Version 1.8 out now!** `Read the release notes here. `_ +💫 **Version 2.0 out now!** `Check out the new features here. `_ .. image:: https://img.shields.io/travis/explosion/spaCy/master.svg?style=flat-square :target: https://travis-ci.org/explosion/spaCy @@ -38,68 +39,72 @@ released under the MIT license. 📖 Documentation ================ -=================== === -`Usage Workflows`_ How to use spaCy and its features. -`API Reference`_ The detailed reference for spaCy's API. -`Troubleshooting`_ Common problems and solutions for beginners. -`Tutorials`_ End-to-end examples, with code you can modify and run. -`Showcase & Demos`_ Demos, libraries and products from the spaCy community. -`Contribute`_ How to contribute to the spaCy project and code base. 
-=================== === +=================== === +`spaCy 101`_ New to spaCy? Here's everything you need to know! +`Usage Guides`_ How to use spaCy and its features. +`New in v2.0`_ New features, backwards incompatibilities and migration guide. +`API Reference`_ The detailed reference for spaCy's API. +`Models`_ Download statistical language models for spaCy. +`Resources`_ Libraries, extensions, demos, books and courses. +`Changelog`_ Changes and version history. +`Contribute`_ How to contribute to the spaCy project and code base. +=================== === -.. _Usage Workflows: https://spacy.io/docs/usage/ -.. _API Reference: https://spacy.io/docs/api/ -.. _Troubleshooting: https://spacy.io/docs/usage/troubleshooting -.. _Tutorials: https://spacy.io/docs/usage/tutorials -.. _Showcase & Demos: https://spacy.io/docs/usage/showcase +.. _spaCy 101: https://alpha.spacy.io/usage/spacy-101 +.. _New in v2.0: https://alpha.spacy.io/usage/v2#migrating +.. _Usage Guides: https://alpha.spacy.io/usage/ +.. _API Reference: https://alpha.spacy.io/api/ +.. _Models: https://alpha.spacy.io/models +.. _Resources: https://alpha.spacy.io/usage/resources +.. _Changelog: https://alpha.spacy.io/usage/#changelog .. _Contribute: https://github.com/explosion/spaCy/blob/master/CONTRIBUTING.md 💬 Where to ask questions ========================== +The spaCy project is maintained by `@honnibal `_ +and `@ines `_. Please understand that we won't be able +to provide individual support via email. We also believe that help is much more +valuable if it's shared publicly, so that more people can benefit from it. 
+ ====================== === -**Bug reports** `GitHub issue tracker`_ -**Usage questions** `StackOverflow`_, `Gitter chat`_, `Reddit user group`_ -**General discussion** `Gitter chat`_, `Reddit user group`_ -**Commercial support** contact@explosion.ai +**Bug Reports** `GitHub Issue Tracker`_ +**Usage Questions** `StackOverflow`_, `Gitter Chat`_, `Reddit User Group`_ +**General Discussion** `Gitter Chat`_, `Reddit User Group`_ ====================== === -.. _GitHub issue tracker: https://github.com/explosion/spaCy/issues +.. _GitHub Issue Tracker: https://github.com/explosion/spaCy/issues .. _StackOverflow: http://stackoverflow.com/questions/tagged/spacy -.. _Gitter chat: https://gitter.im/explosion/spaCy -.. _Reddit user group: https://www.reddit.com/r/spacynlp +.. _Gitter Chat: https://gitter.im/explosion/spaCy +.. _Reddit User Group: https://www.reddit.com/r/spacynlp Features ======== -* Non-destructive **tokenization** -* Syntax-driven sentence segmentation -* Pre-trained **word vectors** -* Part-of-speech tagging +* **Fastest syntactic parser** in the world * **Named entity** recognition -* Labelled dependency parsing -* Convenient string-to-int mapping -* Export to numpy data arrays -* GIL-free **multi-threading** -* Efficient binary serialization +* Non-destructive **tokenization** +* Support for **20+ languages** +* Pre-trained `statistical models `_ and word vectors * Easy **deep learning** integration -* Statistical models for **English**, **German**, **French** and **Spanish** +* Part-of-speech tagging +* Labelled dependency parsing +* Syntax-driven sentence segmentation +* Built-in **visualizers** for syntax and NER +* Convenient string-to-hash mapping +* Export to numpy data arrays +* Efficient binary serialization +* Easy **model packaging** and deployment * State-of-the-art speed * Robust, rigorously evaluated accuracy -See `facts, figures and benchmarks `_. +📖 **For more details, see the** `facts, figures and benchmarks `_. 
-Top Performance ---------------- +Install spaCy +============= -* Fastest in the world: <50ms per document. No faster system has ever been - announced. -* Accuracy within 1% of the current state of the art on all tasks performed - (parsing, named entity recognition, part-of-speech tagging). The only more - accurate systems are an order of magnitude slower or more. - -Supports --------- +For detailed installation instructions, see +the `documentation `_. ==================== === **Operating system** macOS / OS X, Linux, Windows (Cygwin, MinGW, Visual Studio) @@ -110,12 +115,6 @@ Supports .. _pip: https://pypi.python.org/pypi/spacy .. _conda: https://anaconda.org/conda-forge/spacy -Install spaCy -============= - -Installation requires a working build environment. See notes on Ubuntu, -macOS/OS X and Windows for details. - pip --- @@ -123,7 +122,7 @@ Using pip, spaCy releases are currently only available as source packages. .. code:: bash - pip install -U spacy + pip install spacy When using pip it is generally recommended to install packages in a ``virtualenv`` to avoid modifying system state: @@ -149,25 +148,41 @@ For the feedstock including the build recipe and configuration, check out `this repository `_. Improvements and pull requests to the recipe and setup are always appreciated. +Updating spaCy +-------------- + +Some updates to spaCy may require downloading new statistical models. If you're +running spaCy v2.0 or higher, you can use the ``validate`` command to check if +your installed models are compatible and if not, print details on how to update +them: + +.. code:: bash + + pip install -U spacy + spacy validate + +If you've trained your own models, keep in mind that your training and runtime +inputs must match. After updating spaCy, we recommend **retraining your models** +with the new version. + +📖 **For details on upgrading from spaCy 1.x to spaCy 2.x, see the** +`migration guide `_. 
+ Download models =============== As of v1.7.0, models for spaCy can be installed as **Python packages**. This means that they're a component of your application, just like any -other module. They're versioned and can be defined as a dependency in your -``requirements.txt``. Models can be installed from a download URL or -a local directory, manually or via pip. Their data can be located anywhere on -your file system. To make a model available to spaCy, all you need to do is -create a "shortcut link", an internal alias that tells spaCy where to find the -data files for a specific model name. +other module. Models can be installed using spaCy's ``download`` command, +or manually by pointing pip to a path or URL. ======================= === -`spaCy Models`_ Available models, latest releases and direct download. +`Available Models`_ Detailed model descriptions, accuracy figures and benchmarks. `Models Documentation`_ Detailed usage instructions. ======================= === -.. _spaCy Models: https://github.com/explosion/spacy-models/releases/ -.. _Models Documentation: https://spacy.io/docs/usage/models +.. _Available Models: https://alpha.spacy.io/models +.. _Models Documentation: https://alpha.spacy.io/docs/usage/models .. code:: bash @@ -175,17 +190,10 @@ data files for a specific model name. 
python -m spacy download en # download best-matching version of specific model for your spaCy installation - python -m spacy download en_core_web_md + python -m spacy download en_core_web_lg # pip install .tar.gz archive from path or URL - pip install /Users/you/en_core_web_md-1.2.0.tar.gz - pip install https://github.com/explosion/spacy-models/releases/download/en_core_web_md-1.2.0/en_core_web_md-1.2.0.tar.gz - - # set up shortcut link to load installed package as "en_default" - python -m spacy link en_core_web_md en_default - - # set up shortcut link to load local model as "my_amazing_model" - python -m spacy link /Users/you/data my_amazing_model + pip install /Users/you/en_core_web_sm-2.0.0.tar.gz Loading and using models ------------------------ @@ -199,24 +207,24 @@ To load a model, use ``spacy.load()`` with the model's shortcut link: doc = nlp(u'This is a sentence.') If you've installed a model via pip, you can also ``import`` it directly and -then call its ``load()`` method with no arguments. This should also work for -older models in previous versions of spaCy. +then call its ``load()`` method: .. code:: python import spacy - import en_core_web_md + import en_core_web_sm - nlp = en_core_web_md.load() + nlp = en_core_web_sm.load() doc = nlp(u'This is a sentence.') -📖 **For more info and examples, check out the** `models documentation `_. +📖 **For more info and examples, check out the** +`models documentation `_. Support for older versions -------------------------- -If you're using an older version (v1.6.0 or below), you can still download and -install the old models from within spaCy using ``python -m spacy.en.download all`` +If you're using an older version (``v1.6.0`` or below), you can still download +and install the old models from within spaCy using ``python -m spacy.en.download all`` or ``python -m spacy.de.download all``. The ``.tar.gz`` archives are also `attached to the v1.6.0 release `_. 
To download and install the models manually, unpack the archive, drop the @@ -248,11 +256,13 @@ details. pip install -r requirements.txt pip install -e . -Compared to regular install via pip `requirements.txt `_ +Compared to regular install via pip, `requirements.txt `_ additionally installs developer dependencies such as Cython. - Instead of the above verbose commands, you can also use the following -`Fabric `_ commands: +`Fabric `_ commands. All commands assume that your +``virtualenv`` is located in a directory ``.env``. If you're using a different +directory, you can change it via the environment variable ``VENV_DIR``, for +example ``VENV_DIR=".custom-env" fab clean make``. ============= === ``fab env`` Create ``virtualenv`` and delete previous one, if it exists. @@ -261,14 +271,6 @@ Instead of the above verbose commands, you can also use the following ``fab test`` Run basic tests, aborting after first failure. ============= === -All commands assume that your ``virtualenv`` is located in a directory ``.env``. -If you're using a different directory, you can change it via the environment -variable ``VENV_DIR``, for example: - -.. 
code:: bash - - VENV_DIR=".custom-env" fab clean make - Ubuntu ------ @@ -310,76 +312,4 @@ and ``--model`` are optional and enable additional tests: # make sure you are using recent pytest version python -m pip install -U pytest - python -m pytest - -🛠 Changelog -============ - -=========== ============== =========== -Version Date Description -=========== ============== =========== -`v1.8.2`_ ``2017-04-26`` French model and small improvements -`v1.8.1`_ ``2017-04-23`` Saving, loading and training bug fixes -`v1.8.0`_ ``2017-04-16`` Better NER training, saving and loading -`v1.7.5`_ ``2017-04-07`` Bug fixes and new CLI commands -`v1.7.3`_ ``2017-03-26`` Alpha support for Hebrew, new CLI commands and bug fixes -`v1.7.2`_ ``2017-03-20`` Small fixes to beam parser and model linking -`v1.7.1`_ ``2017-03-19`` Fix data download for system installation -`v1.7.0`_ ``2017-03-18`` New 50 MB model, CLI, better downloads and lots of bug fixes -`v1.6.0`_ ``2017-01-16`` Improvements to tokenizer and tests -`v1.5.0`_ ``2016-12-27`` Alpha support for Swedish and Hungarian -`v1.4.0`_ ``2016-12-18`` Improved language data and alpha Dutch support -`v1.3.0`_ ``2016-12-03`` Improve API consistency -`v1.2.0`_ ``2016-11-04`` Alpha tokenizers for Chinese, French, Spanish, Italian and Portuguese -`v1.1.0`_ ``2016-10-23`` Bug fixes and adjustments -`v1.0.0`_ ``2016-10-18`` Support for deep learning workflows and entity-aware rule matcher -`v0.101.0`_ ``2016-05-10`` Fixed German model -`v0.100.7`_ ``2016-05-05`` German support -`v0.100.6`_ ``2016-03-08`` Add support for GloVe vectors -`v0.100.5`_ ``2016-02-07`` Fix incorrect use of header file -`v0.100.4`_ ``2016-02-07`` Fix OSX problem introduced in 0.100.3 -`v0.100.3`_ ``2016-02-06`` Multi-threading, faster loading and bugfixes -`v0.100.2`_ ``2016-01-21`` Fix data version lock -`v0.100.1`_ ``2016-01-21`` Fix install for OSX -`v0.100`_ ``2016-01-19`` Revise setup.py, better model downloads, bug fixes -`v0.99`_ ``2015-11-08`` Improve span 
merging, internal refactoring -`v0.98`_ ``2015-11-03`` Smaller package, bug fixes -`v0.97`_ ``2015-10-23`` Load the StringStore from a json list, instead of a text file -`v0.96`_ ``2015-10-19`` Hotfix to .merge method -`v0.95`_ ``2015-10-18`` Bug fixes -`v0.94`_ ``2015-10-09`` Fix memory and parse errors -`v0.93`_ ``2015-09-22`` Bug fixes to word vectors -=========== ============== =========== - -.. _v1.8.2: https://github.com/explosion/spaCy/releases/tag/v1.8.2 -.. _v1.8.1: https://github.com/explosion/spaCy/releases/tag/v1.8.1 -.. _v1.8.0: https://github.com/explosion/spaCy/releases/tag/v1.8.0 -.. _v1.7.5: https://github.com/explosion/spaCy/releases/tag/v1.7.5 -.. _v1.7.3: https://github.com/explosion/spaCy/releases/tag/v1.7.3 -.. _v1.7.2: https://github.com/explosion/spaCy/releases/tag/v1.7.2 -.. _v1.7.1: https://github.com/explosion/spaCy/releases/tag/v1.7.1 -.. _v1.7.0: https://github.com/explosion/spaCy/releases/tag/v1.7.0 -.. _v1.6.0: https://github.com/explosion/spaCy/releases/tag/v1.6.0 -.. _v1.5.0: https://github.com/explosion/spaCy/releases/tag/v1.5.0 -.. _v1.4.0: https://github.com/explosion/spaCy/releases/tag/v1.4.0 -.. _v1.3.0: https://github.com/explosion/spaCy/releases/tag/v1.3.0 -.. _v1.2.0: https://github.com/explosion/spaCy/releases/tag/v1.2.0 -.. _v1.1.0: https://github.com/explosion/spaCy/releases/tag/v1.1.0 -.. _v1.0.0: https://github.com/explosion/spaCy/releases/tag/v1.0.0 -.. _v0.101.0: https://github.com/explosion/spaCy/releases/tag/0.101.0 -.. _v0.100.7: https://github.com/explosion/spaCy/releases/tag/0.100.7 -.. _v0.100.6: https://github.com/explosion/spaCy/releases/tag/0.100.6 -.. _v0.100.5: https://github.com/explosion/spaCy/releases/tag/0.100.5 -.. _v0.100.4: https://github.com/explosion/spaCy/releases/tag/0.100.4 -.. _v0.100.3: https://github.com/explosion/spaCy/releases/tag/0.100.3 -.. _v0.100.2: https://github.com/explosion/spaCy/releases/tag/0.100.2 -.. _v0.100.1: https://github.com/explosion/spaCy/releases/tag/0.100.1 -.. 
_v0.100: https://github.com/explosion/spaCy/releases/tag/0.100 -.. _v0.99: https://github.com/explosion/spaCy/releases/tag/0.99 -.. _v0.98: https://github.com/explosion/spaCy/releases/tag/0.98 -.. _v0.97: https://github.com/explosion/spaCy/releases/tag/0.97 -.. _v0.96: https://github.com/explosion/spaCy/releases/tag/0.96 -.. _v0.95: https://github.com/explosion/spaCy/releases/tag/0.95 -.. _v0.94: https://github.com/explosion/spaCy/releases/tag/0.94 -.. _v0.93: https://github.com/explosion/spaCy/releases/tag/0.93