//- 💫 DOCS > USAGE > FACTS & FIGURES > BENCHMARKS > MODEL COMPARISON
p
| In this section, we provide benchmark accuracies for the pre-trained
| model pipelines we distribute with spaCy. Evaluations are conducted
| end-to-end from raw text, with no "gold standard" pre-processing, over
| text from a mix of genres where possible.
+aside("Methodology")
| The evaluation was conducted on raw text with no gold standard
| information. The parser, tagger and entity recognizer were trained on the
| #[+a("https://www.gabormelli.com/RKB/OntoNotes_Corpus") OntoNotes 5]
| corpus, the word vectors on #[+a("http://commoncrawl.org") Common Crawl].
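p
|  As a rough sketch of how such an end-to-end evaluation can be
|  reproduced, the snippet below loads a model and scores it against a
|  development set in spaCy's JSON training format. The corpus paths are
|  placeholders for your own converted data, not files shipped with spaCy.

+code.
    import spacy
    from spacy.gold import GoldCorpus

    nlp = spacy.load('en_core_web_sm')
    # Placeholder paths: point these at your own converted corpus
    corpus = GoldCorpus('/path/to/train.json', '/path/to/dev.json')
    # Process the raw dev texts end-to-end and compare to the gold parses
    scorer = nlp.evaluate(corpus.dev_docs(nlp))
    print(scorer.uas, scorer.ents_f, scorer.tags_acc)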
+h(4, "benchmarks-models-english") English
+table(["Model", "spaCy", "Type", "UAS", "NER F", "POS", "WPS", "Size"])
+row
+cell #[+a("/models/en#en_core_web_sm") #[code en_core_web_sm]] 2.0.0
+cell("num") 2.x
+cell neural
+cell("num") 91.7
+cell("num") 85.3
+cell("num") 97.0
+cell("num") 10.1k
+cell("num") #[strong 35MB]
+row
+cell #[+a("/models/en#en_core_web_md") #[code en_core_web_md]] 2.0.0
+cell("num") 2.x
+cell neural
+cell("num") 91.7
+cell("num") #[strong 85.9]
+cell("num") 97.1
+cell("num") 10.0k
+cell("num") 115MB
+row
+cell #[+a("/models/en#en_core_web_lg") #[code en_core_web_lg]] 2.0.0
+cell("num") 2.x
+cell neural
+cell("num") #[strong 91.9]
+cell("num") #[strong 85.9]
+cell("num") #[strong 97.2]
+cell("num") 10.0k
+cell("num") 812MB
+row("divider")
+cell #[code en_core_web_sm] 1.2.0
+cell("num") 1.x
+cell linear
+cell("num") 86.6
+cell("num") 78.5
+cell("num") 96.6
+cell("num") #[strong 25.7k]
+cell("num") 50MB
+row
+cell #[code en_core_web_md] 1.2.1
+cell("num") 1.x
+cell linear
+cell("num") 90.6
+cell("num") 81.4
+cell("num") 96.7
+cell("num") 18.8k
+cell("num") 1GB
+h(4, "benchmarks-models-spanish") Spanish
+aside("Evaluation note")
| The NER accuracy refers to the "silver standard" annotations in the
| WikiNER corpus. Accuracy on these annotations tends to be higher than
| correct human annotations.
+table(["Model", "spaCy", "Type", "UAS", "NER F", "POS", "WPS", "Size"])
+row
+cell #[+a("/models/es#es_core_news_sm") #[code es_core_news_sm]] 2.0.0
+cell("num") 2.x
+cell("num") neural
+cell("num") 89.8
+cell("num") 88.7
+cell("num") #[strong 96.9]
+cell("num") #[em n/a]
+cell("num") #[strong 35MB]
+row
+cell #[+a("/models/es#es_core_news_md") #[code es_core_news_md]] 2.0.0
+cell("num") 2.x
+cell("num") neural
+cell("num") #[strong 90.2]
+cell("num") 89.0
+cell("num") 97.8
+cell("num") #[em n/a]
+cell("num") 93MB
+row("divider")
+cell #[code es_core_web_md] 1.1.0
each data in ["1.x", "linear", 87.5]
+cell("num")=data
+cell("num") #[strong 94.2]
+cell("num") 96.7
+cell("num") #[em n/a]
+cell("num") 377MB