mirror of
https://github.com/explosion/spaCy.git
synced 2025-01-12 18:26:30 +03:00
213 lines
7.5 KiB
Plaintext
213 lines
7.5 KiB
Plaintext
//- 💫 DOCS > USAGE > FACTS & FIGURES > BENCHMARKS
|
|
|
|
p
|
|
| Two peer-reviewed papers in 2015 confirm that spaCy offers the
|
|
| #[strong fastest syntactic parser in the world] and that
|
|
| #[strong its accuracy is within 1% of the best] available. The few
|
|
| systems that are more accurate are 20× slower or more.
|
|
|
|
+aside("About the evaluation")
|
|
| The first of the evaluations was published by #[strong Yahoo! Labs] and
|
|
| #[strong Emory University], as part of a survey of current parsing
|
|
| technologies #[+a("https://aclweb.org/anthology/P/P15/P15-1038.pdf") (Choi et al., 2015)].
|
|
| Their results and subsequent discussions helped us develop a novel
|
|
| psychologically-motivated technique to improve spaCy's accuracy, which
|
|
| we published in joint work with Macquarie University
|
|
| #[+a("https://www.aclweb.org/anthology/D/D15/D15-1162.pdf") (Honnibal and Johnson, 2015)].
|
|
|
|
include _benchmarks-choi-2015
|
|
|
|
+h(3, "algorithm") Algorithm comparison
|
|
|
|
p
|
|
| In this section, we compare spaCy's algorithms to recently published
|
|
| systems, using some of the most popular benchmarks. These benchmarks are
|
|
| designed to help isolate the contributions of specific algorithmic
|
|
| decisions, so they promote slightly "idealised" conditions. Specifically,
|
|
| the text comes pre-processed with "gold standard" token and sentence
|
|
| boundaries. The data sets also tend to be fairly small, to help
|
|
| researchers iterate quickly. These conditions mean the models trained on
|
|
| these data sets are not always useful for practical purposes.
|
|
|
|
+h(4, "parse-accuracy-penn") Parse accuracy (Penn Treebank / Wall Street Journal)
|
|
|
|
p
|
|
| This is the "classic" evaluation, so it's the number parsing researchers
|
|
| are most easily able to put in context. However, it's quite far removed
|
|
| from actual usage: it uses sentences with gold-standard segmentation and
|
|
| tokenization, from a pretty specific type of text (articles from a single
|
|
| newspaper, 1984-1989).
|
|
|
|
+aside("Methodology")
|
|
| #[+a("http://arxiv.org/abs/1603.06042") Andor et al. (2016)] chose
|
|
| slightly different experimental conditions from
|
|
| #[+a("https://aclweb.org/anthology/P/P15/P15-1038.pdf") Choi et al. (2015)],
|
|
| so the two accuracy tables here do not present directly comparable
|
|
| figures.
|
|
|
|
+table(["System", "Year", "Type", "Accuracy"])
|
|
+row
|
|
+cell spaCy v2.0.0
|
|
+cell 2017
|
|
+cell neural
|
|
+cell("num") 94.48
|
|
|
|
+row
|
|
+cell spaCy v1.1.0
|
|
+cell 2016
|
|
+cell linear
|
|
+cell("num") 92.80
|
|
|
|
+row("divider")
|
|
+cell
|
|
+a("https://arxiv.org/pdf/1611.01734.pdf") Dozat and Manning
|
|
+cell 2017
|
|
+cell neural
|
|
+cell("num") #[strong 95.75]
|
|
|
|
+row
|
|
+cell
|
|
+a("http://arxiv.org/abs/1603.06042") Andor et al.
|
|
+cell 2016
|
|
+cell neural
|
|
+cell("num") 94.44
|
|
|
|
+row
|
|
+cell
|
|
+a("https://github.com/tensorflow/models/tree/master/research/syntaxnet") SyntaxNet Parsey McParseface
|
|
+cell 2016
|
|
+cell neural
|
|
+cell("num") 94.15
|
|
|
|
+row
|
|
+cell
|
|
+a("http://static.googleusercontent.com/media/research.google.com/en//pubs/archive/43800.pdf") Weiss et al.
|
|
+cell 2015
|
|
+cell neural
|
|
+cell("num") 93.91
|
|
|
|
+row
|
|
+cell
|
|
+a("http://research.google.com/pubs/archive/38148.pdf") Zhang and McDonald
|
|
+cell 2014
|
|
+cell linear
|
|
+cell("num") 93.32
|
|
|
|
+row
|
|
+cell
|
|
+a("http://www.cs.cmu.edu/~ark/TurboParser/") Martins et al.
|
|
+cell 2013
|
|
+cell linear
|
|
+cell("num") 93.10
|
|
|
|
+h(4, "ner-accuracy-ontonotes5") NER accuracy (OntoNotes 5, no pre-process)
|
|
|
|
p
|
|
| This is the evaluation we use to tune spaCy's parameters to decide which
|
|
| algorithms are better than the others. It's reasonably close to actual usage,
|
|
| because it requires the parses to be produced from raw text, without any
|
|
| pre-processing.
|
|
|
|
+table(["System", "Year", "Type", "Accuracy"])
|
|
+row
|
|
+cell spaCy #[+a("/models/en#en_core_web_lg") #[code en_core_web_lg]] v2.0.0a3
|
|
+cell 2017
|
|
+cell neural
|
|
+cell("num") 85.85
|
|
|
|
+row("divider")
|
|
+cell
|
|
+a("https://arxiv.org/pdf/1702.02098.pdf") Strubell et al.
|
|
+cell 2017
|
|
+cell neural
|
|
+cell("num") #[strong 86.81]
|
|
|
|
+row
|
|
+cell
|
|
+a("https://www.semanticscholar.org/paper/Named-Entity-Recognition-with-Bidirectional-LSTM-C-Chiu-Nichols/10a4db59e81d26b2e0e896d3186ef81b4458b93f") Chiu and Nichols
|
|
+cell 2016
|
|
+cell neural
|
|
+cell("num") 86.19
|
|
|
|
+row
|
|
+cell
|
|
+a("https://www.semanticscholar.org/paper/A-Joint-Model-for-Entity-Analysis-Coreference-Typi-Durrett-Klein/28eb033eee5f51c5e5389cbb6b777779203a6778") Durrett and Klein
|
|
+cell 2014
|
|
+cell neural
|
|
+cell("num") 84.04
|
|
|
|
+row
|
|
+cell
|
|
+a("http://www.aclweb.org/anthology/W09-1119") Ratinov and Roth
|
|
+cell 2009
|
|
+cell linear
|
|
+cell("num") 83.45
|
|
|
|
+h(3, "spacy-models") Model comparison
|
|
|
|
include _benchmarks-models
|
|
|
|
+h(3, "speed-comparison") Detailed speed comparison
|
|
|
|
p
|
|
| Here we compare the per-document processing time of various spaCy
|
|
| functionalities against other NLP libraries. We show both absolute
|
|
| timings (in ms) and relative performance (normalized to spaCy). Lower is
|
|
| better.
|
|
|
|
+infobox("Important note", "⚠️")
|
|
| This evaluation was conducted in 2015. We're working on benchmarks on
|
|
| current CPU and GPU hardware. In the meantime, we're grateful to the
|
|
| Stanford folks for drawing our attention to what seems
|
|
| to be #[+a("https://nlp.stanford.edu/software/tokenizer.html#Speed") a long-standing error]
|
|
| in our CoreNLP benchmarks, especially for their
|
|
| tokenizer. Until we run corrected experiments, we have updated the table
|
|
| using their figures.
|
|
|
|
|
|
+aside("Methodology")
|
|
| #[strong Set up:] 100,000 plain-text documents were streamed from an
|
|
| SQLite3 database, and processed with an NLP library, to one of three
|
|
| levels of detail — tokenization, tagging, or parsing. The tasks are
|
|
| additive: to parse the text you have to tokenize and tag it. The
|
|
| pre-processing was not subtracted from the times — we report the time
|
|
| required for the pipeline to complete. We report mean times per document,
|
|
| in milliseconds.#[br]#[br]
|
|
| #[strong Hardware]: Intel i7-3770 (2012)#[br]
|
|
| #[strong Implementation]: #[+src(gh("spacy-benchmarks")) #[code spacy-benchmarks]]
|
|
|
|
+table
|
|
+row.u-text-label.u-text-center
|
|
+head-cell
|
|
+head-cell(colspan="3") Absolute (ms per doc)
|
|
+head-cell(colspan="3") Relative (to spaCy)
|
|
|
|
+row
|
|
each column in ["System", "Tokenize", "Tag", "Parse", "Tokenize", "Tag", "Parse"]
|
|
+head-cell=column
|
|
|
|
+row
|
|
+cell #[strong spaCy]
|
|
each data in [ "0.2ms", "1ms", "19ms"]
|
|
+cell("num")=data
|
|
|
|
each data in ["1x", "1x", "1x"]
|
|
+cell("num")=data
|
|
|
|
+row
|
|
+cell CoreNLP
|
|
each data in ["0.18ms", "10ms", "49ms", "0.9x", "10x", "2.6x"]
|
|
+cell("num")=data
|
|
+row
|
|
+cell ZPar
|
|
each data in ["1ms", "8ms", "850ms", "5x", "8x", "44.7x"]
|
|
+cell("num")=data
|
|
+row
|
|
+cell NLTK
|
|
each data in ["4ms", "443ms"]
|
|
+cell("num")=data
|
|
+cell("num") #[em n/a]
|
|
each data in ["20x", "443x"]
|
|
+cell("num")=data
|
|
+cell("num") #[em n/a]
|