* Work on intro copy

commit f1c3e17c80
Date:   2014-11-03 00:13:19 +11:00
parent fa91506073
1 changed file with 19 additions and 27 deletions
--- a/docs/source/index.rst
+++ b/docs/source/index.rst
@@ -6,36 +6,28 @@
 spaCy NLP Tokenizer and Lexicon
 ================================
-spaCy is an industrial-strength multi-language tokenizer, bristling with features
-you never knew you wanted. You do want these features though --- your current
+spaCy is a library for industrial-strength NLP in Python and Cython.  Its core
+values are efficiency, accuracy and minimalism.
 tokenizer has been doing it wrong.
 Where other tokenizers give you a list of strings, spaCy gives you references
 to rich lexical types, for easy, excellent and efficient feature extraction.
-* **Easy**: Tokenizer returns a sequence of rich lexical types, with features
+* **Efficiency**: spaCy is
  pre-computed:
-    >>> from spacy.en import EN
-    >>> for w in EN.tokenize(string):
-    ...   print w.sic, w.shape, w.cluster, w.oft_title, w.can_verb
-
-Check out the tutorial and API docs.
+It does not attempt to be comprehensive,
+or to provide lavish syntactic sugar.  This isn't a library that covers 43 known
+algorithms to do X. You get one, the best one, with a simple, low-level interface.
+For commercial users, the code is free but the data isn't.  For researchers, both
+are free and always will be.
 * **Excellent**: Distributional and orthographic features are crucial to robust
  NLP. Without them, models can only learn from tiny annotated training
  corpora.  Read more.
 * **Efficient**: spaCy serves you rich lexical objects faster than most
  tokenizers can give you a list of strings.  
 +--------+-------+--------------+--------------+
 | System | Time  | Words/second | Speed Factor |
 +========+=======+==============+==============+
 | NLTK   | 6m4s  | 89,000       | 1.00         |
 +--------+-------+--------------+--------------+
 | spaCy  | 9.5s  | 3,093,000    | 38.30        |
 +--------+-------+--------------+--------------+
 Comparison
 ----------
 +-------------+-------------+---------------+--------------+
 | POS taggers | Speed (w/s) | % Acc. (news) | % Acc. (web) |
 +=============+=============+===============+==============+
 | spaCy       |             |               |              |
 +-------------+-------------+---------------+--------------+
 | Stanford    | 16,000      |               |              |
 +-------------+-------------+---------------+--------------+
 | NLTK        |             |               |              |
 +-------------+-------------+---------------+--------------+
 .. toctree::
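The intro contrasts tokenizers that hand back a list of strings with one that hands back references to rich lexical types whose features are pre-computed. What follows is a minimal sketch of that idea in plain Python. It assumes a simple whitespace split, and `Vocab`, `Lexeme`, `word_shape` and `tokenize` are hypothetical names chosen for illustration. They are not spaCy's API or its implementation.

```python
from dataclasses import dataclass


def word_shape(text: str) -> str:
    """Hypothetical, simplified orthographic shape: 'Hello' -> 'Xxxxx'.

    Uppercase -> 'X', lowercase -> 'x', digit -> 'd', other characters kept.
    """
    out = []
    for ch in text:
        if ch.isupper():
            out.append("X")
        elif ch.islower():
            out.append("x")
        elif ch.isdigit():
            out.append("d")
        else:
            out.append(ch)
    return "".join(out)


@dataclass(frozen=True)
class Lexeme:
    """One entry per word *type*; its features are computed once, on creation."""
    orth: str
    lower: str
    shape: str
    is_title: bool


class Vocab:
    """Hypothetical string -> Lexeme table that interns word types."""

    def __init__(self) -> None:
        self._lexemes: dict[str, Lexeme] = {}

    def get(self, text: str) -> Lexeme:
        lex = self._lexemes.get(text)
        if lex is None:
            # First occurrence of this string: compute its features once.
            lex = Lexeme(
                orth=text,
                lower=text.lower(),
                shape=word_shape(text),
                is_title=text.istitle(),
            )
            self._lexemes[text] = lex
        return lex

    def __len__(self) -> int:
        return len(self._lexemes)


def tokenize(vocab: Vocab, string: str) -> list[Lexeme]:
    """Whitespace tokenizer that returns references to shared Lexeme objects."""
    return [vocab.get(piece) for piece in string.split()]


if __name__ == "__main__":
    vocab = Vocab()
    for tok in tokenize(vocab, "The cat saw the dog in 2014"):
        print(tok.orth, tok.shape, tok.is_title)
```

Every occurrence of a word type points at the same `Lexeme` object, so features such as `shape` are computed once per type rather than once per token. This is one plausible way to get rich features cheaply, in the spirit of the efficiency claim above.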
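The Speed Factor column in the tokenizer benchmark table appears to be consistent with NLTK's time divided by spaCy's time. 6m4s is 364 s, and 364 / 9.5 ≈ 38.3. Here is a minimal sketch of that arithmetic. It assumes both systems processed the same input, and the helper `to_seconds` is an illustrative name only.

```python
def to_seconds(minutes: int, seconds: float) -> float:
    """Convert a timing written as 'XmYs' into seconds."""
    return minutes * 60 + seconds


# Timings from the benchmark table (assumed to be the same workload).
nltk_time = to_seconds(6, 4)      # 6m4s -> 364 s
spacy_time = to_seconds(0, 9.5)   # 9.5s

# Speed factor relative to the NLTK baseline: baseline time / system time.
speed_factor = nltk_time / spacy_time
print(f"{speed_factor:.1f}")  # prints 38.3
```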