.. spaCy documentation master file, created by
   sphinx-quickstart on Tue Aug 19 16:27:38 2014.
   You can adapt this file completely to your liking, but it should at least
   contain the root `toctree` directive.

spaCy NLP Tokenizer and Lexicon
================================

spaCy is a library for industrial-strength NLP in Python. Its core
values are:

* **Efficiency**: You won't find faster NLP tools. For shallow analysis
  (tokenization and tagging), it's 10x faster than Stanford CoreNLP, and over
  200x faster than NLTK. Its parser is over 100x faster than Stanford's.

* **Accuracy**: All spaCy tools are within 0.5% of the current published
  state of the art, on both news and web text. NLP moves fast, so always check
  the numbers, and don't settle for tools that aren't backed by rigorous,
  recent evaluation.

* **Minimalism**: This isn't a library that covers 43 known algorithms to do X.
  You get one, the best one, with a simple, low-level interface. This keeps the
  codebase small and concrete. Our Python APIs use lists and dictionaries, and
  our C/Cython APIs use arrays and simple structs.

Comparison
----------

+----------------+-----------------+--------+-------------------+------------------+
| Tokenize & Tag | Speed (words/s) | Memory | News accuracy (%) | Web accuracy (%) |
+================+=================+========+===================+==================+
| spaCy          | 107,000         | 1.3 GB | 96.7              |                  |
+----------------+-----------------+--------+-------------------+------------------+
| Stanford       | 8,000           | 1.5 GB | 96.7              |                  |
+----------------+-----------------+--------+-------------------+------------------+
| NLTK           | 543             | 61 MB  | 94.0              |                  |
+----------------+-----------------+--------+-------------------+------------------+
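
The speed column is easier to interpret as elapsed time. The following Python
sketch converts the throughput figures from the comparison table into an
estimated processing time for a corpus of one million words. It is illustrative
only: the corpus size is hypothetical, ``THROUGHPUT_WPS`` and
``seconds_to_process`` are names invented for this example (not part of
spaCy), and it assumes throughput stays constant and ignores model loading
and I/O.

.. code-block:: python

   # Throughput figures (words per second) copied from the comparison table.
   # Hypothetical names for illustration; this is not part of spaCy's API.
   THROUGHPUT_WPS = {
       "spaCy": 107_000,
       "Stanford": 8_000,
       "NLTK": 543,
   }


   def seconds_to_process(n_words: int, words_per_second: float) -> float:
       """Estimate wall-clock time, assuming throughput stays constant."""
       return n_words / words_per_second


   corpus_size = 1_000_000  # hypothetical corpus size, chosen for illustration
   for name, wps in THROUGHPUT_WPS.items():
       secs = seconds_to_process(corpus_size, wps)
       print(f"{name:<10} {secs:>8.1f} s")

Under these assumptions, the loop prints about 9.3 s for spaCy, 125.0 s for
Stanford and 1841.6 s (roughly half an hour) for NLTK.
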

.. toctree::
   :hidden:
   :maxdepth: 3

   what/index.rst
   why/index.rst
   how/index.rst