---
title: Facts & Figures
teaser: The hard numbers for spaCy and how it compares to other tools
next: /usage/spacy-101
menu:
  - ['Feature Comparison', 'comparison']
  - ['Benchmarks', 'benchmarks']
  # TODO: - ['Citing spaCy', 'citation']
---

## Comparison {#comparison hidden="true"}

### When should I use spaCy? {#comparison-usage}

- ✅ **I'm a beginner and just getting started with NLP.** – spaCy makes it easy
  to get started and comes with extensive documentation, including a
  beginner-friendly [101 guide](/usage/spacy-101), a free interactive
  [online course](https://course.spacy.io) and a range of
  [video tutorials](https://www.youtube.com/c/ExplosionAI).
- ✅ **I want to build an end-to-end production application.** – spaCy is
  specifically designed for production use and lets you build and train powerful
  NLP pipelines and package them for easy deployment.
- ✅ **I want my application to be efficient on GPU _and_ CPU.** – While spaCy
  lets you train modern NLP models that are best run on GPU, it also offers
  CPU-optimized pipelines, which are less accurate but much cheaper to run.
- ✅ **I want to try out different neural network architectures for NLP.** –
  spaCy lets you customize and swap out the model architectures powering its
  components, and implement your own using a framework like PyTorch or
  TensorFlow. The declarative configuration system makes it easy to mix and
  match functions and keep track of your hyperparameters to make sure your
  experiments are reproducible.
- ❌ **I want to build a language generation application.** – spaCy's focus is
  natural language _processing_ and extracting information from large volumes of
  text. While you can use it to help you rewrite existing text, it doesn't
  include any specific functionality for language generation tasks.
- ❌ **I want to research machine learning algorithms.** – spaCy is built on the
  latest research, but it's not a research library. If your goal is to write
  papers and run benchmarks, spaCy is probably not a good choice. However, you
  can use it to make the results of your research easily available for others to
  use, e.g. via a custom spaCy component.

## Benchmarks {#benchmarks}

spaCy v3.0 introduces transformer-based pipelines that bring spaCy's accuracy
right up to the **current state of the art**. You can also use a CPU-optimized
pipeline, which is less accurate but much cheaper to run.

<!-- TODO: update benchmarks and intro -->

> #### Evaluation details
>
> - **OntoNotes 5.0:** spaCy's English models are trained on this corpus, as
>   it's several times larger than other English treebanks. However, most
>   systems do not report accuracies on it.
> - **Penn Treebank:** The "classic" parsing evaluation for research. However,
>   it's quite far removed from actual usage: it uses sentences with
>   gold-standard segmentation and tokenization, from a pretty specific type of
>   text (articles from a single newspaper, 1984-1989).

import Benchmarks from 'usage/\_benchmarks-models.md'

<Benchmarks />

<figure>

| Dependency Parsing System                                                      |  UAS |  LAS |
| ------------------------------------------------------------------------------ | ---: | ---: |
| spaCy RoBERTa (2020)<sup>1</sup>                                               | 96.8 | 95.0 |
| spaCy CNN (2020)<sup>1</sup>                                                   | 93.7 | 91.8 |
| [Mrini et al.](https://khalilmrini.github.io/Label_Attention_Layer.pdf) (2019) | 97.4 | 96.3 |
| [Zhou and Zhao](https://www.aclweb.org/anthology/P19-1230/) (2019)             | 97.2 | 95.7 |

<figcaption class="caption">

**Dependency parsing accuracy** on the Penn Treebank. See
[NLP-progress](http://nlpprogress.com/english/dependency_parsing.html) for more
results. **1.** Project template:
[`benchmarks/parsing_penn_treebank`](%%GITHUB_PROJECTS/benchmarks/parsing_penn_treebank).

</figcaption>

</figure>
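The table reports two standard dependency parsing metrics. The unlabeled attachment score (UAS) is the percentage of tokens assigned the correct syntactic head. The labeled attachment score (LAS) is the percentage assigned both the correct head and the correct dependency label. The minimal Python sketch below computes both for a single sentence. The `attachment_scores` function and the `(head_index, label)` encoding are hypothetical illustrations, not spaCy's built-in scorer. The sketch assumes gold and predicted parses share one tokenization, matching the gold-segmentation Penn Treebank setup, and it scores every token, whereas some Penn Treebank evaluations exclude punctuation.

```python
from typing import List, Tuple

# Hypothetical encoding: one (head_index, dep_label) pair per token, in
# sentence order. The root token points to its own index.
Arc = Tuple[int, str]


def attachment_scores(gold: List[Arc], pred: List[Arc]) -> Tuple[float, float]:
    """Return (UAS, LAS) as percentages for one aligned token sequence.

    Assumes gold and predicted parses use the same tokenization, and
    scores every token (no punctuation filtering).
    """
    if len(gold) != len(pred):
        raise ValueError("gold and predicted parses must have the same length")
    if not gold:
        raise ValueError("cannot score an empty sequence")
    # UAS: the predicted head matches the gold head.
    head_ok = [g[0] == p[0] for g, p in zip(gold, pred)]
    # LAS: the head matches AND the dependency label matches.
    label_ok = [h and g[1] == p[1] for h, g, p in zip(head_ok, gold, pred)]
    n = len(gold)
    return 100.0 * sum(head_ok) / n, 100.0 * sum(label_ok) / n


# "She ate the fish": the prediction attaches "the" to the wrong head and
# mislabels "fish", so 3 of 4 heads are right but only 2 of 4 arcs fully match.
gold = [(1, "nsubj"), (1, "ROOT"), (3, "det"), (1, "dobj")]
pred = [(1, "nsubj"), (1, "ROOT"), (1, "det"), (1, "iobj")]
print(attachment_scores(gold, pred))  # (75.0, 50.0)
```

Because a token only counts toward LAS when its head is also correct, LAS can never exceed UAS, which is why every row in the table has a lower LAS than UAS.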

<!-- TODO: ## Citing spaCy {#citation}

-->