//- spaCy/docs/redesign/comparisons.jade

- var urls = {}
- urls.choi_paper = "http://aclweb.org/anthology/P/P15/P15-1038.pdf"
- urls.emnlp_paper = "honnibal_johnson_emnlp2015.pdf"
+comparison("NLTK")
  p spaCy is:
  ul
    li.pro 100x faster;
    li.pro 50% more accurate;
    li.pro Serializes TODO% smaller;
  p spaCy features:
  ul
    li.pro Integrated word vectors;
    li.pro Efficient binary serialization;
  p NLTK features:
  ul
    li.con Multiple languages;
    li.neutral Educational resources;
//+comparison("Pattern")
+comparison("CoreNLP")
  p spaCy is:
  ul
    li.pro TODO% faster;
    li.pro TODO% more accurate;
    li.pro Not Java;
    li.pro Well documented;
    li.pro Cheaper to license commercially;
    li.neutral
      | Opinionated/Minimalist. spaCy avoids providing redundant or overlapping
      | options.
  p CoreNLP features:
  ul
    li.con Multiple languages;
    li.con Sentiment analysis;
    li.con Coreference resolution;
+comparison("ClearNLP")
  p spaCy is:
  ul
    li.pro Not Java;
    li.pro TODO% faster;
    li.pro Well documented;
    li.neutral Slightly more accurate;
  p ClearNLP features:
  ul
    li.con Semantic role labelling;
    li.con Multiple languages;
    li.con Model for biology and the life sciences;
//+comparison("Accuracy Summary")
//+comparison("Speed Summary")
// table
// thead
// tr
// th.
// th(colspan=3) Absolute (ms per doc)
// th(colspan=3) Relative (to spaCy)
//
// tbody
// tr
// td: strong System
// td: strong Split
// td: strong Tag
// td: strong Parse
// td: strong Split
// td: strong Tag
// td: strong Parse
//
// +row("spaCy", "0.2ms", "1ms", "19ms", "1x", "1x", "1x")
// +row("CoreNLP", "2ms", "10ms", "49ms", "10x", "10x", "2.6x")
// +row("ZPar", "1ms", "8ms", "850ms", "5x", "8x", "44.7x")
// +row("NLTK", "4ms", "443ms", "n/a", "20x", "443x", "n/a")
//
// p
// | <strong>Set up</strong>: 100,000 plain-text documents were streamed
// | from an SQLite3 database, and processed with an NLP library, to one
// | of three levels of detail &ndash; tokenization, tagging, or parsing.
// | The tasks are additive: to parse the text you have to tokenize and
// | tag it. The pre-processing was not subtracted from the times &ndash;
// | I report the time required for the pipeline to complete. I report
// | mean times per document, in milliseconds.
//
// p
// | <strong>Hardware</strong>: Intel i7-3770 (2012)
+comparison("Peer-reviewed Evaluations")
  p.
    spaCy is committed to rigorous evaluation under standard methodology. Two
    papers in 2015 confirm that:
  ol
    li spaCy is the fastest syntactic parser in the world;
    li Its accuracy is within 1% of the best available;
    li The few systems that are more accurate are 20&times; slower or more.
  p
    | spaCy v0.84 was evaluated by researchers at Yahoo! Labs and Emory University,
    | as part of a survey paper benchmarking the current state-of-the-art dependency
    | parsers
    a(href=urls.choi_paper) (Choi et al., 2015)
    | .
  table
    thead
      +columns("System", "Language", "Accuracy", "Speed")
    tbody
      +row("spaCy v0.84", "Cython", "90.6", "13,963")
      +row("spaCy v0.89", "Cython", "91.8", "13,000 (est.)")
      +row("ClearNLP", "Java", "91.7", "10,271")
      +row("CoreNLP", "Java", "89.6", "8,602")
      +row("MATE", "Java", "92.5", "550")
      +row("Turbo", "C++", "92.4", "349")
      +row("Yara", "Java", "92.3", "340")
  p
    | Discussion with the authors led to accuracy improvements in spaCy. These
    | improvements are described in a paper accepted for publication at EMNLP,
    | in joint work with Macquarie University
    a(href=urls.emnlp_paper) (Honnibal and Johnson, 2015)
    | .