Update docs and resolve todos [ci skip]

Ines Montani 2020-09-24 13:41:25 +02:00
parent d7ab6a2ffe
commit 6836b66433
6 changed files with 20 additions and 13 deletions

@@ -1,10 +1,10 @@
 import { Help } from 'components/typography'; import Link from 'components/link'
 <!-- TODO: update, add project template -->
 <!-- TODO: update numbers -->
 <figure>
-| System | Parser | Tagger | NER | WPS<br />CPU <Help>words per second on CPU, higher is better</Help> | WPS<br/>GPU <Help>words per second on GPU, higher is better</Help> |
+| Pipeline | Parser | Tagger | NER | WPS<br />CPU <Help>words per second on CPU, higher is better</Help> | WPS<br/>GPU <Help>words per second on GPU, higher is better</Help> |
 | ---------------------------------------------------------- | -----: | -----: | ---: | ------------------------------------------------------------------: | -----------------------------------------------------------------: |
 | [`en_core_web_trf`](/models/en#en_core_web_trf) (spaCy v3) | | | | | 6k |
 | [`en_core_web_lg`](/models/en#en_core_web_lg) (spaCy v3) | | | | | |
@@ -21,10 +21,10 @@ import { Help } from 'components/typography'; import Link from 'components/link'
 <figure>
-| Named Entity Recognition Model | OntoNotes | CoNLL '03 |
+| Named Entity Recognition System | OntoNotes | CoNLL '03 |
 | ------------------------------------------------------------------------------ | --------: | --------: |
 | spaCy RoBERTa (2020) | | 92.2 |
-| spaCy CNN (2020) | | 88.4 |
+| spaCy CNN (2020) | 85.3 | 88.4 |
 | spaCy CNN (2017) | 86.4 | |
 | [Stanza](https://stanfordnlp.github.io/stanza/) (StanfordNLP)<sup>1</sup> | 88.8 | 92.1 |
 | <Link to="https://github.com/flairNLP/flair" hideIcon>Flair</Link><sup>2</sup> | 89.7 | 93.1 |

@@ -235,8 +235,6 @@ The `Transformer` component sets the
 [`Doc._.trf_data`](/api/transformer#custom_attributes) extension attribute,
 which lets you access the transformer's outputs at runtime.
-<!-- TODO: update/confirm once we have final models trained -->
 ```cli
 $ python -m spacy download en_core_trf_lg
 ```

@@ -63,7 +63,7 @@ import Benchmarks from 'usage/\_benchmarks-models.md'
 <figure>
-| System | UAS | LAS |
+| Dependency Parsing System | UAS | LAS |
 | ------------------------------------------------------------------------------ | ---: | ---: |
 | spaCy RoBERTa (2020)<sup>1</sup> | 96.8 | 95.0 |
 | spaCy CNN (2020)<sup>1</sup> | 93.7 | 91.8 |

@@ -1654,9 +1654,12 @@ The [`SentenceRecognizer`](/api/sentencerecognizer) is a simple statistical
 component that only provides sentence boundaries. Along with being faster and
 smaller than the parser, its primary advantage is that it's easier to train
 because it only requires annotated sentence boundaries rather than full
-dependency parses.
-
-<!-- TODO: update/confirm usage once we have final models trained -->
+dependency parses. spaCy's [trained pipelines](/models) include both a parser
+and a trained sentence segmenter, which is
+[disabled](/usage/processing-pipelines#disabling) by default. If you only need
+sentence boundaries and no parser, you can use the `enable` and `disable`
+arguments on [`spacy.load`](/api/top-level#spacy.load) to enable the senter and
+disable the parser.
 
 > #### senter vs. parser
 >

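The enable/disable mechanics described in the added text can be sketched without downloading a trained pipeline. This illustrative example uses a blank English pipeline with the rule-based `sentencizer` standing in for a trained `senter` component; it is a sketch of the mechanism, not the shipped pipelines:

```python
import spacy

# Blank pipeline with a rule-based sentence segmenter (no model download
# needed); it plays the role of a trained pipeline's "senter" here.
nlp = spacy.blank("en")
nlp.add_pipe("sentencizer")

# Disable and re-enable the component, the same way you would toggle a
# trained pipeline's disabled-by-default senter.
nlp.disable_pipe("sentencizer")
assert nlp.disabled == ["sentencizer"]
nlp.enable_pipe("sentencizer")

doc = nlp("This is a sentence. This is another one.")
sentences = [sent.text for sent in doc.sents]
```

With the component re-enabled, `doc.sents` yields the two sentences.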
@@ -253,8 +253,6 @@ different mechanisms you can use:
 Disabled and excluded component names can be provided to
 [`spacy.load`](/api/top-level#spacy.load) as a list.
-
-<!-- TODO: update with info on our models shipped with optional components -->
 
 > #### 💡 Optional pipeline components
 >
 > The `disable` mechanism makes it easy to distribute pipeline packages with
@@ -262,6 +260,11 @@ Disabled and excluded component names can be provided to
 > your pipeline may include a statistical _and_ a rule-based component for
 > sentence segmentation, and you can choose which one to run depending on your
 > use case.
+>
+> For example, spaCy's [trained pipelines](/models) like
+> [`en_core_web_sm`](/models/en#en_core_web_sm) contain both a `parser` and
+> `senter` that perform sentence segmentation, but the `senter` is disabled by
+> default.
 
 ```python
 # Load the pipeline without the entity recognizer

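As a minimal sketch of the `disable` mechanism discussed above (illustrative only: it uses a blank pipeline and the `sentencizer` component rather than one of the trained pipelines):

```python
import spacy

nlp = spacy.blank("en")
nlp.add_pipe("sentencizer")

# A disabled component stays loaded and listed among the pipeline's
# components, but is skipped at runtime and dropped from pipe_names.
nlp.disable_pipe("sentencizer")
assert "sentencizer" in nlp.component_names
assert "sentencizer" not in nlp.pipe_names

# Re-enabling restores it without reloading the pipeline.
nlp.enable_pipe("sentencizer")
```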
@@ -733,7 +733,10 @@ workflows, but only one can be tracked by DVC.
 <Infobox title="This section is still under construction" emoji="🚧" variant="warning">
 The Prodigy integration will require a nightly version of Prodigy that supports
-spaCy v3+.
+spaCy v3+. You can already use annotations created with Prodigy in spaCy v3 by
+exporting your data with
+[`data-to-spacy`](https://prodi.gy/docs/recipes#data-to-spacy) and running
+[`spacy convert`](/api/cli#convert) to convert it to the binary format.
 </Infobox>