spaCy/website/usage/_facts-figures/_benchmarks-models.jade

//- 💫 DOCS > USAGE > FACTS & FIGURES > BENCHMARKS > MODEL COMPARISON

p
    |  In this section, we provide benchmark accuracies for the pre-trained
    |  model pipelines we distribute with spaCy. Evaluations are conducted
    |  end-to-end from raw text, with no "gold standard" pre-processing, over
    |  text from a mix of genres where possible. For a more detailed
    |  comparison of the available models, see the new
    |  #[+a("/models/comparison") model comparison tool].

+aside("Methodology")
    |  The evaluation was conducted on raw text with no gold standard
    |  information. The parser, tagger and entity recognizer were trained on the
    |  #[+a("https://www.gabormelli.com/RKB/OntoNotes_Corpus") OntoNotes 5]
    |  corpus, and the word vectors on #[+a("http://commoncrawl.org") Common Crawl].

+h(4, "benchmarks-models-english") English

+table(["Model", "spaCy", "Type", "UAS", "NER F", "POS", "WPS", "Size"])
    +row
        +cell #[+a("/models/en#en_core_web_sm") #[code en_core_web_sm]] 2.0.0
        each data in ["2.x", "neural"]
            +cell("num")=data
        +cell("num") 91.7
        +cell("num") 85.3
        +cell("num") 97.0
        +cell("num") 10.1k
        +cell("num") #[strong 35MB]

    +row
        +cell #[+a("/models/en#en_core_web_lg") #[code en_core_web_lg]] 2.0.0
        each data in ["2.x", "neural"]
            +cell("num")=data
        +cell("num") #[strong 91.9]
        +cell("num") #[strong 85.9]
        +cell("num") #[strong 97.2]
        +cell("num") 10.0k
        +cell("num") 812MB

    +row("divider")
        +cell #[code en_core_web_sm] 1.2.0
        each data in ["1.x", "linear", 86.6, 78.5, 96.6]
            +cell("num")=data
        +cell("num") #[strong 25.7k]
        +cell("num") 50MB

    +row
        +cell #[code en_core_web_md] 1.2.1
        each data in ["1.x", "linear", 90.6, 81.4, 96.7, "18.8k", "1GB"]
            +cell("num")=data

+h(4, "benchmarks-models-spanish") Spanish

+aside("Evaluation note")
    |  The NER accuracy is measured against the "silver standard" annotations
    |  in the WikiNER corpus. Accuracy on these annotations tends to be higher
    |  than accuracy on correct human annotations.

+table(["Model", "spaCy", "Type", "UAS", "NER F", "POS", "WPS", "Size"])
    +row
        +cell #[+a("/models/es#es_core_news_sm") #[code es_core_news_sm]] 2.0.0
        +cell("num") 2.x
        +cell("num") neural
        +cell("num") 89.8
        +cell("num") 88.7
        +cell("num") #[strong 96.9]
        +cell("num") #[em n/a]
        +cell("num") #[strong 35MB]

    +row
        +cell #[+a("/models/es#es_core_news_md") #[code es_core_news_md]] 2.0.0
        +cell("num") 2.x
        +cell("num") neural
        +cell("num") #[strong 90.2]
        +cell("num") 89.0
        +cell("num") 97.8
        +cell("num") #[em n/a]
        +cell("num") 93MB

    +row("divider")
        +cell #[code es_core_web_md] 1.1.0
        each data in ["1.x", "linear", 87.5]
            +cell("num")=data
        +cell("num") #[strong 94.2]
        +cell("num") 96.7
        +cell("num") #[em n/a]
        +cell("num") 377MB