spaCy/docs/source/index.rst

.. spaCy documentation master file, created by
   sphinx-quickstart on Tue Aug 19 16:27:38 2014.
   You can adapt this file completely to your liking, but it should at least
   contain the root `toctree` directive.

spaCy NLP Tokenizer and Lexicon
================================

spaCy is a library for industrial strength NLP in Python and Cython.  Its core
values are efficiency, accuracy and minimalism.

* Efficiency: spaCy is TODOx faster than the Stanford tools, and TODOx faster
  than NLTK.  You won't find faster NLP tools. Using spaCy will save you
  thousands in server costs, and will force you to make fewer compromises.

* Accuracy:  All spaCy tools are within 0.5% of the current published
  state-of-the-art, on both news and web text. NLP moves fast, so always check
  the numbers --- and don't settle for tools that aren't backed by
  rigorous recent evaluation. An algorithm that was "close enough to state-of-the-art"
  5 years ago is probably crap by today's standards.

* Minimalism:  This isn't a library that covers 43 known algorithms to do X. You
  get 1 --- the best one --- with a simple, low-level interface. This keeps the
  code-base small and concrete.  Our Python APIs use lists and
  dictionaries, and our C/Cython APIs use arrays and simple structs.


Comparison
----------
+-------------+-------------+---+-----------+--------------+
| POS taggers | Speed (w/s) | % Acc. (news) | % Acc. (web) |
+-------------+-------------+---------------+--------------+
| spaCy       |             |               |              |
+-------------+-------------+---------------+--------------+
| Stanford    | 16,000      |               |              |
+-------------+-------------+---------------+--------------+
| NLTK        |             |               |              |
+-------------+-------------+---------------+--------------+


.. toctree::
    :hidden:
    :maxdepth: 3

    what/index.rst
    why/index.rst
    how/index.rst