+comparison("NLTK") //+comparison("Pattern") +comparison("CoreNLP") +comparison("ClearNLP") //+comparison("OpenNLP") //+comparison("GATE") +comparison("Accuracy Summary") +comparison("Speed Summary") table thead tr th. th(colspan=3) Absolute (ms per doc) th(colspan=3) Relative (to spaCy) tbody tr td: strong System td: strong Split td: strong Tag td: strong Parse td: strong Split td: strong Tag td: strong Parse +row("spaCy", "0.2ms", "1ms", "19ms", "1x", "1x", "1x") +row("spaCy", "0.2ms", "1ms", "19ms", "1x", "1x", "1x") +row("CoreNLP", "2ms", "10ms", "49ms", "10x", "10x", "2.6x") +row("ZPar", "1ms", "8ms", "850ms", "5x", "8x", "44.7x") +row("NLTK", "4ms", "443ms", "n/a", "20x", "443x", "n/a") p | Set up: 100,000 plain-text documents were streamed | from an SQLite3 database, and processed with an NLP library, to one | of three levels of detail – tokenization, tagging, or parsing. | The tasks are additive: to parse the text you have to tokenize and | tag it. The pre-processing was not subtracted from the times – | I report the time required for the pipeline to complete. I report | mean times per document, in milliseconds. p | Hardware: Intel i7-3770 (2012) +comparison("Independent Evaluation") p | Independent evaluation by Yahoo! Labs and Emory | University, to appear at ACL 2015. Higher is better. table thead +columns("System", "Language", "Accuracy", "Speed") tbody +row("spaCy v0.86", "Cython", "91.9", "13,963") +row("spaCy v0.84", "Cython", "90.6", "13,963") +row("ClearNLP", "Java", "91.7", "10,271") +row("CoreNLP", "Java", "89.6", "8,602") +row("MATE", "Java", "92.5", "550") +row("Turbo", "C++", "92.4", "349") +row("Yara", "Java", "92.3", "340") p | Accuracy is % unlabelled arcs correct, speed is tokens per second. p | Joel Tetreault and Amanda Stent (Yahoo! Labs) and Jin-ho Choi (Emory) | performed a detailed comparison of the best parsers available. | All numbers above are taken from the pre-print they kindly made | available to me, except for spaCy v0.86. p | I'm particularly grateful to the authors for discussion of their | results, which led to the improvement in accuracy between v0.84 and | v0.86. A tip from Jin-ho developer of ClearNLP) was particularly | useful.