
.. spaCy documentation master file, created by
   sphinx-quickstart on Tue Aug 19 16:27:38 2014.
   You can adapt this file completely to your liking, but it should at least
   contain the root `toctree` directive.

spaCy NLP Tokenizer and Lexicon
================================

spaCy is an industrial-strength multi-language tokenizer, bristling with features
you never knew you wanted. You do want these features, though: your current
tokenizer has been doing it wrong.
Where other tokenizers give you a list of strings, spaCy gives you references
to rich lexical types, for easy, excellent and efficient feature extraction.
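
The difference between "a list of strings" and "references to rich lexical
types" can be sketched in a few lines of plain Python. This is a hypothetical
illustration, not spaCy's implementation or API: the ``Lexeme`` and ``Lexicon``
classes, the ``word_shape`` function, the toy cluster table and the
whitespace-only ``tokenize`` function are all assumptions made for the example.
Each distinct string is looked up once, its features are computed once, and
every later occurrence receives a reference to the same object.

.. code-block:: python

   from dataclasses import dataclass
   from typing import Dict, List, Optional


   def word_shape(text: str) -> str:
       """Orthographic feature: X for uppercase, x for lowercase, d for digits."""
       shape = []
       for ch in text:
           if ch.isupper():
               shape.append("X")
           elif ch.islower():
               shape.append("x")
           elif ch.isdigit():
               shape.append("d")
           else:
               shape.append(ch)  # punctuation and other characters are kept as-is
       return "".join(shape)


   @dataclass(frozen=True)
   class Lexeme:
       """Hypothetical lexical type: one instance per distinct string."""
       orth: str
       shape: str               # orthographic feature
       cluster: Optional[str]   # distributional feature (toy cluster bit string)
       is_title: bool


   class Lexicon:
       """Hypothetical lexicon that computes features once per word type."""

       def __init__(self, clusters: Dict[str, str]) -> None:
           self._clusters = clusters
           self._lexemes: Dict[str, Lexeme] = {}

       def get(self, text: str) -> Lexeme:
           lex = self._lexemes.get(text)
           if lex is None:
               # First time we see this string: compute and cache its features.
               lex = Lexeme(
                   orth=text,
                   shape=word_shape(text),
                   cluster=self._clusters.get(text.lower()),
                   is_title=text.istitle(),
               )
               self._lexemes[text] = lex
           return lex

       def __len__(self) -> int:
           return len(self._lexemes)


   def tokenize(lexicon: Lexicon, text: str) -> List[Lexeme]:
       # Whitespace splitting only; a real tokenizer also splits off punctuation.
       return [lexicon.get(chunk) for chunk in text.split()]


   # Toy usage with a made-up cluster table.
   lexicon = Lexicon(clusters={"the": "0010", "dog": "1101"})
   tokens = tokenize(lexicon, "The dog saw the dog")
   for tok in tokens:
       print(tok.orth, tok.shape, tok.cluster, tok.is_title)
   print(tokens[1] is tokens[4])  # both "dog" tokens share one Lexeme: True

Because features are cached per word type rather than per token, their cost is
paid once per vocabulary entry; after that, tokenizing is mostly dictionary
lookups, which is one plausible way a lexicon-backed tokenizer can stay fast.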

* **Easy**: The tokenizer returns a sequence of rich lexical types, with features
  pre-computed:

    >>> from spacy.en import EN
    >>> string = u"Hello, world!"
    >>> for w in EN.tokenize(string):
    ...     print w.sic, w.shape, w.cluster, w.oft_title, w.can_verb

  Check out the tutorial and API docs.

* **Excellent**: Distributional and orthographic features are crucial to robust
  NLP. Without them, models can only learn from tiny annotated training
  corpora. Read more.

* **Efficient**: spaCy serves you rich lexical objects faster than most
  tokenizers can give you a list of strings.

+--------+-------+--------------+--------------+
| System | Time  | Words/second | Speed Factor |
+========+=======+==============+==============+
| NLTK   | 6m4s  | 89,000       | 1.00         |
+--------+-------+--------------+--------------+
| spaCy  | 9.5s  | 3,093,000    | 38.30        |
+--------+-------+--------------+--------------+
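
Judging from the figures in the table, the Speed Factor column appears to be
the ratio of NLTK's running time to spaCy's running time. Worked through with
the table's own numbers:

.. math::

   \text{Speed factor}
   = \frac{t_\text{NLTK}}{t_\text{spaCy}}
   = \frac{6\,\text{min}\;4\,\text{s}}{9.5\,\text{s}}
   = \frac{364\,\text{s}}{9.5\,\text{s}}
   \approx 38.3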


.. toctree::
    :hidden:
    :maxdepth: 3
    
    what/index.rst
    why/index.rst
    how/index.rst