Update docs [ci skip]

Ines Montani 2020-10-15 12:35:30 +02:00
parent 4fa869e6f7
commit 7f05ccc170
2 changed files with 21 additions and 6 deletions


@@ -6,7 +6,7 @@ import { Help } from 'components/typography'; import Link from 'components/link'
| Pipeline | Parser | Tagger | NER | WPS<br />CPU <Help>words per second on CPU, higher is better</Help> | WPS<br/>GPU <Help>words per second on GPU, higher is better</Help> |
| ---------------------------------------------------------- | -----: | -----: | ---: | ------------------------------------------------------------------: | -----------------------------------------------------------------: |
- | [`en_core_web_trf`](/models/en#en_core_web_trf) (spaCy v3) | 95.5 | 98.3 | 89.7 | 1k | 8k |
+ | [`en_core_web_trf`](/models/en#en_core_web_trf) (spaCy v3) | 95.5 | 98.3 | 89.4 | 1k | 8k |
| [`en_core_web_lg`](/models/en#en_core_web_lg) (spaCy v3) | 92.2 | 97.4 | 85.4 | 7k | |
| `en_core_web_lg` (spaCy v2) | 91.9 | 97.2 | 85.7 | 10k | |
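
To put the WPS columns in perspective, the table's throughput figures translate directly into processing time. A minimal sketch (the `WPS_CPU`/`WPS_GPU` constants and `seconds_to_process` helper are illustrative names, not spaCy API; figures are the published `en_core_web_trf` numbers):

```python
# Back-of-envelope throughput comparison from the benchmark table above.
# WPS = words per second, higher is better.
WPS_CPU = 1_000  # en_core_web_trf on CPU
WPS_GPU = 8_000  # en_core_web_trf on GPU

def seconds_to_process(n_words: int, wps: int) -> float:
    """Estimated wall-clock seconds to process n_words at a given WPS rate."""
    return n_words / wps
```

At these rates, a one-million-word corpus takes roughly 1000 seconds on CPU versus about 125 seconds on GPU for the transformer pipeline.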


@@ -77,6 +77,26 @@ import Benchmarks from 'usage/\_benchmarks-models.md'
<Benchmarks />
#### New trained transformer-based pipelines {#features-transformers-pipelines}
> #### Notes on model capabilities
>
> The models are each trained with a **single transformer** shared across the
> pipeline, which requires it to be trained on a single corpus. For
> [English](/models/en) and [Chinese](/models/zh), we used the OntoNotes 5
> corpus, which has annotations across several tasks. For [French](/models/fr),
> [Spanish](/models/es) and [German](/models/de), we didn't have a suitable
> corpus that had both syntactic and entity annotations, so the transformer
> models for those languages do not include NER.
| Package | Language | Transformer | Tagger | Parser |  NER |
| ------------------------------------------------ | -------- | --------------------------------------------------------------------------------------------- | -----: | -----: | ---: |
| [`en_core_web_trf`](/models/en#en_core_web_trf) | English | [`roberta-base`](https://huggingface.co/roberta-base) | 97.8 | 95.0 | 89.4 |
| [`de_dep_news_trf`](/models/de#de_dep_news_trf) | German | [`bert-base-german-cased`](https://huggingface.co/bert-base-german-cased) | 99.0 | 95.8 | - |
| [`es_dep_news_trf`](/models/es#es_dep_news_trf) | Spanish | [`bert-base-spanish-wwm-cased`](https://huggingface.co/dccuchile/bert-base-spanish-wwm-cased) | 98.2 | 94.6 | - |
| [`fr_dep_news_trf`](/models/fr#fr_dep_news_trf) | French | [`camembert-base`](https://huggingface.co/camembert-base) | 95.7 | 94.9 | - |
| [`zh_core_web_trf`](/models/zh#zh_core_web_trf) | Chinese | [`bert-base-chinese`](https://huggingface.co/bert-base-chinese) | 92.5 | 77.2 | 75.6 |
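
The note above explains why only the English and Chinese pipelines include NER: the `*_core_*` packages were trained on corpora with entity annotations, while the `*_dep_news_trf` packages were not. A small sketch of the package-to-transformer mapping from the table (`PIPELINE_TRANSFORMERS` and `has_ner` are hypothetical helpers for illustration, not part of spaCy):

```python
# Transformer model underlying each trained pipeline, from the table above.
# Hugging Face hub IDs; the Spanish model lives under the dccuchile org.
PIPELINE_TRANSFORMERS = {
    "en_core_web_trf": "roberta-base",
    "de_dep_news_trf": "bert-base-german-cased",
    "es_dep_news_trf": "dccuchile/bert-base-spanish-wwm-cased",
    "fr_dep_news_trf": "camembert-base",
    "zh_core_web_trf": "bert-base-chinese",
}

def has_ner(pipeline: str) -> bool:
    """Only the `*_core_*` pipelines (English, Chinese) include an NER
    component; the `*_dep_news_trf` pipelines were trained on corpora
    without entity annotations."""
    return "_core_" in pipeline
```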
<Infobox title="Details & Documentation" emoji="📖" list>
- **Usage:** [Embeddings & Transformers](/usage/embeddings-transformers),
@@ -88,11 +108,6 @@ import Benchmarks from 'usage/\_benchmarks-models.md'
- **Architectures:** [TransformerModel](/api/architectures#TransformerModel),
[TransformerListener](/api/architectures#TransformerListener),
[Tok2VecTransformer](/api/architectures#Tok2VecTransformer)
- **Trained Pipelines:** [`en_core_web_trf`](/models/en#en_core_web_trf),
[`de_dep_news_trf`](/models/de#de_dep_news_trf),
[`es_dep_news_trf`](/models/es#es_dep_news_trf),
[`fr_dep_news_trf`](/models/fr#fr_dep_news_trf),
[`zh_core_web_trf`](/models/zh#zh_core_web_trf)
- **Implementation:**
[`spacy-transformers`](https://github.com/explosion/spacy-transformers)