Update docs and resolve todos [ci skip]

2025-08-08 06:04:57 +03:00 · 2020-09-24 13:41:25 +02:00 · 6836b66433
commit 6836b66433
parent d7ab6a2ffe
6 changed files with 20 additions and 13 deletions
--- a/website/docs/usage/_benchmarks-models.md
+++ b/website/docs/usage/_benchmarks-models.md
@@ -1,10 +1,10 @@
 import { Help } from 'components/typography'; import Link from 'components/link'

-<!-- TODO: update, add project template -->
+<!-- TODO: update numbers -->

 <figure>

-| System                                                     | Parser | Tagger |  NER | WPS<br />CPU <Help>words per second on CPU, higher is better</Help> | WPS<br/>GPU <Help>words per second on GPU, higher is better</Help> |
+| Pipeline                                                   | Parser | Tagger |  NER | WPS<br />CPU <Help>words per second on CPU, higher is better</Help> | WPS<br/>GPU <Help>words per second on GPU, higher is better</Help> |
 | ---------------------------------------------------------- | -----: | -----: | ---: | ------------------------------------------------------------------: | -----------------------------------------------------------------: |
 | [`en_core_web_trf`](/models/en#en_core_web_trf) (spaCy v3) |        |        |      |                                                                     |                                                                 6k |
 | [`en_core_web_lg`](/models/en#en_core_web_lg) (spaCy v3)   |        |        |      |                                                                     |                                                                    |
@@ -21,10 +21,10 @@ import { Help } from 'components/typography'; import Link from 'components/link'

 <figure>

-| Named Entity Recognition Model                                                 | OntoNotes | CoNLL '03 |
+| Named Entity Recognition System                                                | OntoNotes | CoNLL '03 |
 | ------------------------------------------------------------------------------ | --------: | --------: |
 | spaCy RoBERTa (2020)                                                           |           |      92.2 |
-| spaCy CNN (2020)                                                               |           |      88.4 |
+| spaCy CNN (2020)                                                               |      85.3 |      88.4 |
 | spaCy CNN (2017)                                                               |      86.4 |           |
 | [Stanza](https://stanfordnlp.github.io/stanza/) (StanfordNLP)<sup>1</sup>      |      88.8 |      92.1 |
 | <Link to="https://github.com/flairNLP/flair" hideIcon>Flair</Link><sup>2</sup> |      89.7 |      93.1 |
--- a/website/docs/usage/embeddings-transformers.md
+++ b/website/docs/usage/embeddings-transformers.md
@@ -235,8 +235,6 @@ The `Transformer` component sets the
 [`Doc._.trf_data`](/api/transformer#custom_attributes) extension attribute,
 which lets you access the transformers outputs at runtime.

-<!-- TODO: update/confirm once we have final models trained -->
-
 ```cli
 $ python -m spacy download en_core_trf_lg
 ```
--- a/website/docs/usage/facts-figures.md
+++ b/website/docs/usage/facts-figures.md
@@ -63,7 +63,7 @@ import Benchmarks from 'usage/\_benchmarks-models.md'

 <figure>

-| System                                                                         |  UAS |  LAS |
+| Dependency Parsing System                                                      |  UAS |  LAS |
 | ------------------------------------------------------------------------------ | ---: | ---: |
 | spaCy RoBERTa (2020)<sup>1</sup>                                               | 96.8 | 95.0 |
 | spaCy CNN (2020)<sup>1</sup>                                                   | 93.7 | 91.8 |
--- a/website/docs/usage/linguistic-features.md
+++ b/website/docs/usage/linguistic-features.md
@@ -1654,9 +1654,12 @@ The [`SentenceRecognizer`](/api/sentencerecognizer) is a simple statistical
 component that only provides sentence boundaries. Along with being faster and
 smaller than the parser, its primary advantage is that it's easier to train
 because it only requires annotated sentence boundaries rather than full
-dependency parses.
-
-<!-- TODO: update/confirm usage once we have final models trained -->
+dependency parses. spaCy's [trained pipelines](/models) include both a parser
+and a trained sentence segmenter, which is
+[disabled](/usage/processing-pipelines#disabling) by default. If you only need
+sentence boundaries and no parser, you can use the `enable` and `disable`
+arguments on [`spacy.load`](/api/top-level#spacy.load) to enable the senter and
+disable the parser.

 > #### senter vs. parser
 >
--- a/website/docs/usage/processing-pipelines.md
+++ b/website/docs/usage/processing-pipelines.md
@@ -253,8 +253,6 @@ different mechanisms you can use:
 Disabled and excluded component names can be provided to
 [`spacy.load`](/api/top-level#spacy.load) as a list.

-<!-- TODO: update with info on our models shipped with optional components -->
-
 > #### 💡 Optional pipeline components
 >
 > The `disable` mechanism makes it easy to distribute pipeline packages with
@@ -262,6 +260,11 @@ Disabled and excluded component names can be provided to
 > your pipeline may include a statistical _and_ a rule-based component for
 > sentence segmentation, and you can choose which one to run depending on your
 > use case.
+>
+> For example, spaCy's [trained pipelines](/models) like
+> [`en_core_web_sm`](/models/en#en_core_web_sm) contain both a `parser` and
+> `senter` that perform sentence segmentation, but the `senter` is disabled by
+> default.

 ```python
 # Load the pipeline without the entity recognizer
--- a/website/docs/usage/projects.md
+++ b/website/docs/usage/projects.md
@@ -733,7 +733,10 @@ workflows, but only one can be tracked by DVC.
 <Infobox title="This section is still under construction" emoji="🚧" variant="warning">

 The Prodigy integration will require a nightly version of Prodigy that supports
-spaCy v3+.
+spaCy v3+. You can already use annotations created with Prodigy in spaCy v3 by
+exporting your data with
+[`data-to-spacy`](https://prodi.gy/docs/recipes#data-to-spacy) and running
+[`spacy convert`](/api/cli#convert) to convert it to the binary format.

 </Infobox>
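
Note on the `linguistic-features.md` and `processing-pipelines.md` hunks: the new text says the `senter` ships disabled and can be used instead of the `parser` when only sentence boundaries are needed. A minimal sketch of that setup follows. It assumes spaCy v3 with `en_core_web_sm` installed, and it uses `exclude` on `spacy.load` plus `nlp.enable_pipe` rather than the `enable` argument the docs mention, since the exact behavior of that argument may differ between versions. The helper name `load_sentence_segmenter` is hypothetical and not part of this commit.

```python
def load_sentence_segmenter(name="en_core_web_sm"):
    """Load a trained pipeline that sets sentence boundaries with the senter.

    Hypothetical helper for illustration; assumes the named pipeline
    package is installed and contains both a "parser" and a "senter".
    """
    # Imported lazily so this module can be imported without spaCy present.
    import spacy

    # Leave the parser out entirely; it is not needed for sentence boundaries.
    nlp = spacy.load(name, exclude=["parser"])
    # The senter is included in the package but disabled by default.
    nlp.enable_pipe("senter")
    return nlp
```

With spaCy and the pipeline installed, `load_sentence_segmenter()` should return an `nlp` object whose `doc.sents` is populated by the `senter` instead of the parser.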