//- 💫 DOCS > USAGE > FACTS & FIGURES > BENCHMARKS > MODEL COMPARISON

p
    | In this section, we provide benchmark accuracies for the pre-trained
    | model pipelines we distribute with spaCy. Evaluations are conducted
    | end-to-end from raw text, with no "gold standard" pre-processing, over
    | text from a mix of genres where possible.

+aside("Methodology")
    | The evaluation was conducted on raw text with no gold standard
    | information. The parser, tagger and entity recognizer were trained on the
    | #[+a("https://www.gabormelli.com/RKB/OntoNotes_Corpus") OntoNotes 5]
    | corpus, and the word vectors on #[+a("http://commoncrawl.org") Common Crawl].

+h(4, "benchmarks-models-english") English

+table(["Model", "spaCy", "Type", "UAS", "NER F", "POS", "WPS", "Size"])
    +row
        +cell #[+a("/models/en#en_core_web_sm") #[code en_core_web_sm]] 2.0.0
        +cell("num") 2.x
        +cell neural
        +cell("num") 91.7
        +cell("num") 85.3
        +cell("num") 97.0
        +cell("num") 10.1k
        +cell("num") #[strong 35MB]

    +row
        +cell #[+a("/models/en#en_core_web_md") #[code en_core_web_md]] 2.0.0
        +cell("num") 2.x
        +cell neural
        +cell("num") 91.7
        +cell("num") #[strong 85.9]
        +cell("num") 97.1
        +cell("num") 10.0k
        +cell("num") 115MB

    +row
        +cell #[+a("/models/en#en_core_web_lg") #[code en_core_web_lg]] 2.0.0
        +cell("num") 2.x
        +cell neural
        +cell("num") #[strong 91.9]
        +cell("num") #[strong 85.9]
        +cell("num") #[strong 97.2]
        +cell("num") 10.0k
        +cell("num") 812MB

    +row("divider")
        +cell #[code en_core_web_sm] 1.2.0
        +cell("num") 1.x
        +cell linear
        +cell("num") 86.6
        +cell("num") 78.5
        +cell("num") 96.6
        +cell("num") #[strong 25.7k]
        +cell("num") 50MB

    +row
        +cell #[code en_core_web_md] 1.2.1
        +cell("num") 1.x
        +cell linear
        +cell("num") 90.6
        +cell("num") 81.4
        +cell("num") 96.7
        +cell("num") 18.8k
        +cell("num") 1GB

+h(4, "benchmarks-models-spanish") Spanish

+aside("Evaluation note")
    | The NER accuracy refers to the "silver standard" annotations in the
    | WikiNER corpus. Accuracy on these annotations tends to be higher than
    | accuracy on correct human annotations.

+table(["Model", "spaCy", "Type", "UAS", "NER F", "POS", "WPS", "Size"])
    +row
        +cell #[+a("/models/es#es_core_news_sm") #[code es_core_news_sm]] 2.0.0
        +cell("num") 2.x
        +cell("num") neural
        +cell("num") 89.8
        +cell("num") 88.7
        +cell("num") #[strong 96.9]
        +cell("num") #[em n/a]
        +cell("num") #[strong 35MB]

    +row
        +cell #[+a("/models/es#es_core_news_md") #[code es_core_news_md]] 2.0.0
        +cell("num") 2.x
        +cell("num") neural
        +cell("num") #[strong 90.2]
        +cell("num") 89.0
        +cell("num") 97.8
        +cell("num") #[em n/a]
        +cell("num") 93MB

    +row("divider")
        +cell #[code es_core_web_md] 1.1.0
        each data in ["1.x", "linear", 87.5]
            +cell("num")=data
        +cell("num") #[strong 94.2]
        +cell("num") 96.7
        +cell("num") #[em n/a]
        +cell("num") 377MB