mirror of
https://github.com/explosion/spaCy.git
synced 2025-02-03 21:24:11 +03:00
* Upd sales copy
parent 954c970415
commit 2420d944cb
@@ -6,20 +6,19 @@
 spaCy NLP Tokenizer and Lexicon
 ================================
 
-spaCy is a library for industrial strength NLP in Python and Cython. Its core
-values are efficiency, accuracy and minimalism.
+spaCy is a library for industrial strength NLP in Python. Its core
+values are:
 
-* Efficiency: spaCy is TODOx faster than the Stanford tools, and TODOx faster
-than NLTK. You won't find faster NLP tools. Using spaCy will save you
-thousands in server costs, and will force you to make fewer compromises.
+* **Efficiency**: You won't find faster NLP tools. For shallow analysis, it's 10x
+faster than Stanford Core NLP, and over 200x faster than NLTK. Its parser is
+over 100x faster than Stanford's.
 
-* Accuracy: All spaCy tools are within 0.5% of the current published
+* **Accuracy**: All spaCy tools are within 0.5% of the current published
 state-of-the-art, on both news and web text. NLP moves fast, so always check
 the numbers --- and don't settle for tools that aren't backed by
-rigorous recent evaluation. An algorithm that was "close enough to state-of-the-art"
-5 years ago is probably crap by today's standards.
+rigorous recent evaluation.
 
-* Minimalism: This isn't a library that covers 43 known algorithms to do X. You
+* **Minimalism**: This isn't a library that covers 43 known algorithms to do X. You
 get 1 --- the best one --- with a simple, low-level interface. This keeps the
 code-base small and concrete. Our Python APIs use lists and
 dictionaries, and our C/Cython APIs use arrays and simple structs.
@@ -27,15 +26,16 @@ values are efficiency, accuracy and minimalism.
 
 Comparison
 ----------
-+-------------+-------------+---+-----------+--------------+
-| POS taggers | Speed (w/s) | % Acc. (news) | % Acc. (web) |
-+-------------+-------------+---------------+--------------+
-| spaCy       |             |               |              |
-+-------------+-------------+---------------+--------------+
-| Stanford    | 16,000      |               |              |
-+-------------+-------------+---------------+--------------+
-| NLTK        |             |               |              |
-+-------------+-------------+---------------+--------------+
+
++----------------+-------------+--------+---------------+--------------+
+| Tokenize & Tag | Speed (w/s) | Memory | % Acc. (news) | % Acc. (web) |
++----------------+-------------+--------+---------------+--------------+
+| spaCy          | 107,000     | 1.3gb  | 96.7          |              |
++----------------+-------------+--------+---------------+--------------+
+| Stanford       | 8,000       | 1.5gb  | 96.7          |              |
++----------------+-------------+--------+---------------+--------------+
+| NLTK           | 543         | 61mb   | 94.0          |              |
++----------------+-------------+--------+---------------+--------------+
 
 
 .. toctree::
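The speed multipliers quoted in the new Efficiency bullet can be checked against the words-per-second column of the new "Tokenize & Tag" table. The sketch below is only that arithmetic: the `speeds` dictionary and the `speedup` helper are illustrative names, not part of spaCy, and the figures are copied from the table as committed rather than re-measured.

```python
# Words-per-second figures copied from the new "Tokenize & Tag" table.
# These are the commit's published numbers, not fresh measurements.
speeds = {"spaCy": 107_000, "Stanford": 8_000, "NLTK": 543}


def speedup(fast: str, slow: str) -> float:
    """Return how many times faster `fast` is than `slow`, by words per second."""
    return speeds[fast] / speeds[slow]


for other in ("Stanford", "NLTK"):
    print(f"spaCy vs {other}: {speedup('spaCy', other):.1f}x")
```

On these figures the tokenize-and-tag ratio works out to roughly 13x over Stanford and roughly 197x over NLTK, so the "10x" claim is conservative and "over 200x" rounds slightly upward. The parser claim ("over 100x faster than Stanford's") has no corresponding row in the table and cannot be checked this way.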