---
title: Facts & Figures
teaser: The hard numbers for spaCy and how it compares to other tools
next: /usage/spacy-101
menu:
  - ['Feature Comparison', 'comparison']
  - ['Benchmarks', 'benchmarks']
---

## Feature comparison {#comparison}

Here's a quick comparison of the functionalities offered by spaCy,
[NLTK](http://www.nltk.org/py-modindex.html) and
[CoreNLP](http://stanfordnlp.github.io/CoreNLP/).

| | spaCy | NLTK | CoreNLP |
| ----------------------- | :----: | :----: | :-----------: |
| Programming language | Python | Python | Java / Python |
| Neural network models | ✅ | ❌ | ✅ |
| Integrated word vectors | ✅ | ❌ | ❌ |
| Multi-language support | ✅ | ✅ | ✅ |
| Tokenization | ✅ | ✅ | ✅ |
| Part-of-speech tagging | ✅ | ✅ | ✅ |
| Sentence segmentation | ✅ | ✅ | ✅ |
| Dependency parsing | ✅ | ❌ | ✅ |
| Entity recognition | ✅ | ✅ | ✅ |
| Entity linking | ✅ | ❌ | ❌ |
| Coreference resolution | ❌ | ❌ | ✅ |

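
To give a concrete sense of what several of these capabilities look like in
spaCy, here's a minimal sketch covering tokenization, part-of-speech tagging,
dependency parsing, sentence segmentation and named entity recognition. It
assumes the small English package `en_core_web_sm` has been installed, e.g. via
`python -m spacy download en_core_web_sm`; the example sentence is only an
illustration.

```python
import spacy

# Load a pretrained English pipeline (assumes en_core_web_sm is installed)
nlp = spacy.load("en_core_web_sm")
doc = nlp("Apple is looking at buying a U.K. startup for $1 billion.")

# Tokenization, part-of-speech tagging and dependency parsing
for token in doc:
    print(token.text, token.pos_, token.dep_, token.head.text)

# Sentence segmentation (derived from the dependency parse)
for sent in doc.sents:
    print(sent.text)

# Named entity recognition
for ent in doc.ents:
    print(ent.text, ent.label_)
```
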
### When should I use what? {#comparison-usage}

Natural Language Understanding is an active area of research and development,
so there are many different tools and technologies catering to different use
cases. The table below summarizes a few libraries (spaCy,
[NLTK](http://www.nltk.org/py-modindex.html), [AllenNLP](https://allennlp.org/),
[StanfordNLP](https://stanfordnlp.github.io/stanfordnlp/) and
[TensorFlow](https://www.tensorflow.org/)) to help you get a feel for how
things fit together.

| | spaCy | NLTK | Allen-<br />NLP | Stanford-<br />NLP | Tensor-<br />Flow |
| ----------------------------------------------------------------- | :---: | :--: | :-------------: | :----------------: | :---------------: |
| I'm a beginner and just getting started with NLP. | ✅ | ✅ | ❌ | ✅ | ❌ |
| I want to build an end-to-end production application. | ✅ | ❌ | ❌ | ❌ | ✅ |
| I want to try out different neural network architectures for NLP. | ❌ | ❌ | ✅ | ❌ | ✅ |
| I want to try the latest models with state-of-the-art accuracy. | ❌ | ❌ | ✅ | ✅ | ✅ |
| I want to train models from my own data. | ✅ | ✅ | ✅ | ✅ | ✅ |
| I want my application to be efficient on CPU. | ✅ | ✅ | ❌ | ❌ | ❌ |

## Benchmarks {#benchmarks}

Two peer-reviewed papers in 2015 confirmed that spaCy offers the **fastest
syntactic parser in the world** and that **its accuracy is within 1% of the
best** available. The few systems that are more accurate are 20× slower or more.

> #### About the evaluation
>
> The first of the evaluations was published by **Yahoo! Labs** and **Emory
> University**, as part of a survey of current parsing technologies
> ([Choi et al., 2015](https://aclweb.org/anthology/P/P15/P15-1038.pdf)). Their
> results and subsequent discussions helped us develop a novel
> psychologically-motivated technique to improve spaCy's accuracy, which we
> published in joint work with Macquarie University
> ([Honnibal and Johnson, 2015](https://www.aclweb.org/anthology/D/D15/D15-1162.pdf)).

import BenchmarksChoi from 'usage/\_benchmarks-choi.md'

<BenchmarksChoi />

### Algorithm comparison {#algorithm}

In this section, we compare spaCy's algorithms to recently published systems,
using some of the most popular benchmarks. These benchmarks are designed to help
isolate the contributions of specific algorithmic decisions, so they promote
slightly "idealized" conditions. Specifically, the text comes pre-processed with
"gold standard" token and sentence boundaries. The data sets also tend to be
fairly small, to help researchers iterate quickly. These conditions mean the
models trained on these data sets are not always useful for practical purposes.

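
To make the "idealized" conditions concrete, the sketch below contrasts the two
settings in spaCy: predicting token boundaries from raw text versus
constructing a `Doc` from pre-defined ("gold standard") tokens and only running
the statistical components on top. It assumes `en_core_web_sm` is installed;
the sentence and the gold token list are purely illustrative.

```python
import spacy
from spacy.tokens import Doc

nlp = spacy.load("en_core_web_sm")

# Realistic conditions: token boundaries are predicted from raw text
doc_raw = nlp("The N.Y.-based firm's I.P.O. raised $2.1bn.")
print([t.text for t in doc_raw])

# "Idealized" conditions: token boundaries are given, so only tagging and
# parsing are actually being evaluated
words = ["The", "N.Y.-based", "firm", "'s", "I.P.O.", "raised", "$", "2.1bn", "."]
doc_gold = Doc(nlp.vocab, words=words)
for name, component in nlp.pipeline:  # apply tagger, parser, NER to the gold tokens
    doc_gold = component(doc_gold)
print([(t.text, t.dep_) for t in doc_gold])
```
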
#### Parse accuracy (Penn Treebank / Wall Street Journal) {#parse-accuracy-penn}

This is the "classic" evaluation, so it's the number parsing researchers are
most easily able to put in context. However, it's quite far removed from actual
usage: it uses sentences with gold-standard segmentation and tokenization, from
a pretty specific type of text (articles from a single newspaper, 1984-1989).

> #### Methodology
>
> [Andor et al. (2016)](http://arxiv.org/abs/1603.06042) chose slightly
> different experimental conditions from
> [Choi et al. (2015)](https://aclweb.org/anthology/P/P15/P15-1038.pdf), so the
> two accuracy tables here do not present directly comparable figures.

| System | Year | Type | Accuracy |
| ------------------------------------------------------------ | ---- | ------ | --------: |
| spaCy v2.0.0 | 2017 | neural | 94.48 |
| spaCy v1.1.0 | 2016 | linear | 92.80 |
| [Dozat and Manning][dozat and manning] | 2017 | neural | **95.75** |
| [Andor et al.][andor et al.] | 2016 | neural | 94.44 |
| [SyntaxNet Parsey McParseface][syntaxnet parsey mcparseface] | 2016 | neural | 94.15 |
| [Weiss et al.][weiss et al.] | 2015 | neural | 93.91 |
| [Zhang and McDonald][zhang and mcdonald] | 2014 | linear | 93.32 |
| [Martins et al.][martins et al.] | 2013 | linear | 93.10 |

[dozat and manning]: https://arxiv.org/pdf/1611.01734.pdf
[andor et al.]: http://arxiv.org/abs/1603.06042
[syntaxnet parsey mcparseface]: https://github.com/tensorflow/models/tree/master/research/syntaxnet
[weiss et al.]: http://static.googleusercontent.com/media/research.google.com/en//pubs/archive/43800.pdf
[zhang and mcdonald]: http://research.google.com/pubs/archive/38148.pdf
[martins et al.]: http://www.cs.cmu.edu/~ark/TurboParser/

#### NER accuracy (OntoNotes 5, no pre-process) {#ner-accuracy-ontonotes5}

This is the evaluation we use to tune spaCy's parameters and to decide which
algorithms are better than the others. It's reasonably close to actual usage,
because it requires the parses to be produced from raw text, without any
pre-processing.

| System | Year | Type | Accuracy |
| -------------------------------------------------- | ---- | ------ | --------: |
| spaCy [`en_core_web_lg`][en_core_web_lg] v2.0.0a3 | 2017 | neural | 85.85 |
| [Strubell et al.][strubell et al.] | 2017 | neural | **86.81** |
| [Chiu and Nichols][chiu and nichols] | 2016 | neural | 86.19 |
| [Durrett and Klein][durrett and klein] | 2014 | neural | 84.04 |
| [Ratinov and Roth][ratinov and roth] | 2009 | linear | 83.45 |

[en_core_web_lg]: /models/en#en_core_web_lg
[strubell et al.]: https://arxiv.org/pdf/1702.02098.pdf
[chiu and nichols]: https://www.semanticscholar.org/paper/Named-Entity-Recognition-with-Bidirectional-LSTM-C-Chiu-Nichols/10a4db59e81d26b2e0e896d3186ef81b4458b93f
[durrett and klein]: https://www.semanticscholar.org/paper/A-Joint-Model-for-Entity-Analysis-Coreference-Typi-Durrett-Klein/28eb033eee5f51c5e5389cbb6b777779203a6778
[ratinov and roth]: http://www.aclweb.org/anthology/W09-1119

### Model comparison {#spacy-models}

In this section, we provide benchmark accuracies for the pretrained model
pipelines we distribute with spaCy. Evaluations are conducted end-to-end from
raw text, with no "gold standard" pre-processing, over text from a mix of genres
where possible. In the tables below, **UAS** is the parser's unlabeled
attachment score, **NER F** the entity recognizer's F-score, **POS** the
part-of-speech tagging accuracy, **WPS** the processing speed in words per
second and **Size** the size of the model package.

> #### Methodology
>
> The evaluation was conducted on raw text with no gold standard information.
> The parser, tagger and entity recognizer were trained on the
> [OntoNotes 5](https://www.gabormelli.com/RKB/OntoNotes_Corpus) corpus, the
> word vectors on [Common Crawl](http://commoncrawl.org).

#### English {#benchmarks-models-english}

| Model | spaCy | Type | UAS | NER F | POS | WPS | Size |
| --------------------------------------------------- | ----- | ------ | -------: | -------: | -------: | --------: | -------: |
| [`en_core_web_sm`](/models/en#en_core_web_sm) 2.0.0 | 2.x | neural | 91.7 | 85.3 | 97.0 | 10.1k | **35MB** |
| [`en_core_web_md`](/models/en#en_core_web_md) 2.0.0 | 2.x | neural | 91.7 | **85.9** | 97.1 | 10.0k | 115MB |
| [`en_core_web_lg`](/models/en#en_core_web_lg) 2.0.0 | 2.x | neural | **91.9** | **85.9** | **97.2** | 10.0k | 812MB |
| `en_core_web_sm` 1.2.0 | 1.x | linear | 86.6 | 78.5 | 96.6 | **25.7k** | 50MB |
| `en_core_web_md` 1.2.1 | 1.x | linear | 90.6 | 81.4 | 96.7 | 18.8k | 1GB |

#### Spanish {#benchmarks-models-spanish}

> #### Evaluation note
>
> The NER accuracy refers to the "silver standard" annotations in the WikiNER
> corpus. Scores measured against these annotations tend to be higher than
> scores measured against careful human annotations.

| Model | spaCy | Type | UAS | NER F | POS | WPS | Size |
| ----------------------------------------------------- | ----- | ------ | -------: | -------: | -------: | ----: | -------: |
| [`es_core_news_sm`](/models/es#es_core_news_sm) 2.0.0 | 2.x | neural | 89.8 | 88.7 | 96.9 | _n/a_ | **35MB** |
| [`es_core_news_md`](/models/es#es_core_news_md) 2.0.0 | 2.x | neural | **90.2** | 89.0 | **97.8** | _n/a_ | 93MB |
| `es_core_web_md` 1.1.0 | 1.x | linear | 87.5 | **94.2** | 96.7 | _n/a_ | 377MB |

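
To relate these tables to a pipeline installed on your own machine, the model
packages expose their metadata at runtime. Below is a minimal sketch that
assumes `en_core_web_md` is available; the exact metadata keys can vary between
packages and versions, so treat the lookups as illustrative.

```python
import spacy

# Load an installed pretrained pipeline (assumes en_core_web_md is available)
nlp = spacy.load("en_core_web_md")

# Package metadata: language, name, version, pipeline components, etc.
meta = nlp.meta
print(meta["lang"], meta["name"], meta["version"])
print(meta.get("pipeline"))          # e.g. ['tagger', 'parser', 'ner']

# Many packages also ship their evaluation figures in the metadata
print(meta.get("accuracy", {}))

# The md/lg packages bundle word vectors; the shape is (entries, dimensions)
print(nlp.vocab.vectors.shape)
```
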
### Detailed speed comparison {#speed-comparison}

Here we compare the per-document processing time of various spaCy
functionalities against other NLP libraries. We show both absolute timings (in
ms) and relative performance (normalized to spaCy). Lower is better.

<Infobox title="Important note" variant="warning">

This evaluation was conducted in 2015. We're working on benchmarks on current
CPU and GPU hardware. In the meantime, we're grateful to the Stanford folks for
drawing our attention to what seems to be
[a long-standing error](https://nlp.stanford.edu/software/tokenizer.html#Speed)
in our CoreNLP benchmarks, especially for their tokenizer. Until we run
corrected experiments, we have updated the table using their figures.

</Infobox>

> #### Methodology
>
> - **Set up:** 100,000 plain-text documents were streamed from an SQLite3
>   database and processed with an NLP library to one of three levels of
>   detail: tokenization, tagging or parsing. The tasks are additive: to parse
>   the text you have to tokenize and tag it. The pre-processing was not
>   subtracted from the times; we report the time required for the pipeline to
>   complete. We report mean times per document, in milliseconds.
> - **Hardware**: Intel i7-3770 (2012)
> - **Implementation**:
>   [`spacy-benchmarks`](https://github.com/explosion/spacy-benchmarks)

<Table>
<thead>
<Tr>
<Th></Th>
<Th colSpan="3">Absolute (ms per doc)</Th>
<Th colSpan="3">Relative (to spaCy)</Th>
</Tr>
<Tr>
<Th>System</Th>
<Th>Tokenize</Th>
<Th>Tag</Th>
<Th>Parse</Th>
<Th>Tokenize</Th>
<Th>Tag</Th>
<Th>Parse</Th>
</Tr>
</thead>
<tbody style="text-align: right">
<Tr>
<Td style="text-align: left"><strong>spaCy</strong></Td>
<Td>0.2ms</Td>
<Td>1ms</Td>
<Td>19ms</Td>
<Td>1x</Td>
<Td>1x</Td>
<Td>1x</Td>
</Tr>
<Tr>
<Td style="text-align: left">CoreNLP</Td>
<Td>0.18ms</Td>
<Td>10ms</Td>
<Td>49ms</Td>
<Td>0.9x</Td>
<Td>10x</Td>
<Td>2.6x</Td>
</Tr>
<Tr>
<Td style="text-align: left">ZPar</Td>
<Td>1ms</Td>
<Td>8ms</Td>
<Td>850ms</Td>
<Td>5x</Td>
<Td>8x</Td>
<Td>44.7x</Td>
</Tr>
<Tr>
<Td style="text-align: left">NLTK</Td>
<Td>4ms</Td>
<Td>443ms</Td>
<Td><em>n/a</em></Td>
<Td>20x</Td>
<Td>443x</Td>
<Td><em>n/a</em></Td>
</Tr>
</tbody>
</Table>
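
If you want a rough sense of how these levels of detail translate to your own
hardware and data, you can approximate them by disabling pipeline components.
The sketch below is not the
[`spacy-benchmarks`](https://github.com/explosion/spacy-benchmarks) harness: it
assumes `en_core_web_sm` is installed and uses a small in-memory list of texts
instead of an SQLite3 database, so the absolute numbers won't be comparable to
the table above.

```python
import time
import spacy

texts = ["This is a sample document."] * 10_000  # stand-in for the SQLite3 stream

def mean_ms_per_doc(nlp, texts):
    """Process all texts and return the mean time per document in milliseconds."""
    start = time.perf_counter()
    for _ in nlp.pipe(texts):
        pass
    return (time.perf_counter() - start) * 1000 / len(texts)

# Tokenization only: statistical components disabled
nlp_tokenize = spacy.load("en_core_web_sm", disable=["tagger", "parser", "ner"])
# Tokenization + tagging
nlp_tag = spacy.load("en_core_web_sm", disable=["parser", "ner"])
# Tokenization + tagging + parsing (NER disabled to match the three levels)
nlp_parse = spacy.load("en_core_web_sm", disable=["ner"])

for label, nlp in [("tokenize", nlp_tokenize), ("tag", nlp_tag), ("parse", nlp_parse)]:
    print(f"{label}: {mean_ms_per_doc(nlp, texts):.3f} ms/doc")
```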