.. spaCy documentation master file, created by
   sphinx-quickstart on Tue Aug 19 16:27:38 2014.
   You can adapt this file completely to your liking, but it should at least
   contain the root `toctree` directive.

spaCy NLP Tokenizer and Lexicon
================================

spaCy is a library for industrial strength NLP in Python.  Its core
values are:

* **Efficiency**: You won't find faster NLP tools. For shallow analysis, it's 10x
  faster than Stanford Core NLP, and over 200x faster than NLTK.  Its parser is
  over 100x faster than Stanford's.

* **Accuracy**:  All spaCy tools are within 0.5% of the current published
  state-of-the-art, on both news and web text. NLP moves fast, so always check
  the numbers --- and don't settle for tools that aren't backed by
  rigorous recent evaluation.

* **Minimalism**:  This isn't a library that covers 43 known algorithms to do X. You
  get 1 --- the best one --- with a simple, low-level interface. This keeps the
  code-base small and concrete.  Our Python APIs use lists and
  dictionaries, and our C/Cython APIs use arrays and simple structs.
  

Comparison
----------

+----------------+-------------+--------+---------------+--------------+
| Tokenize & Tag | Speed (w/s) | Memory | % Acc. (news) | % Acc. (web) |
+----------------+-------------+--------+---------------+--------------+
| spaCy          | 107,000     |  1.3gb | 96.7          |              |
+----------------+-------------+--------+---------------+--------------+
| Stanford       | 8,000       |  1.5gb | 96.7          |              |
+----------------+-------------+--------+---------------+--------------+
| NLTK           | 543         |  61mb  | 94.0          |              |
+----------------+-------------+--------+---------------+--------------+


.. toctree::
    :hidden:
    :maxdepth: 3
    
    what/index.rst
    why/index.rst
    how/index.rst