mirror of
https://github.com/explosion/spaCy.git
synced 2024-12-26 01:46:28 +03:00
Update docs and resolve todos [ci skip]
parent d7ab6a2ffe
commit 6836b66433
@@ -1,10 +1,10 @@
 import { Help } from 'components/typography'; import Link from 'components/link'

-<!-- TODO: update, add project template -->
+<!-- TODO: update numbers -->

 <figure>

-| System | Parser | Tagger | NER | WPS<br />CPU <Help>words per second on CPU, higher is better</Help> | WPS<br/>GPU <Help>words per second on GPU, higher is better</Help> |
+| Pipeline | Parser | Tagger | NER | WPS<br />CPU <Help>words per second on CPU, higher is better</Help> | WPS<br/>GPU <Help>words per second on GPU, higher is better</Help> |
 | ---------------------------------------------------------- | -----: | -----: | ---: | ------------------------------------------------------------------: | -----------------------------------------------------------------: |
 | [`en_core_web_trf`](/models/en#en_core_web_trf) (spaCy v3) | | | | | 6k |
 | [`en_core_web_lg`](/models/en#en_core_web_lg) (spaCy v3) | | | | | |
@@ -21,10 +21,10 @@ import { Help } from 'components/typography'; import Link from 'components/link'

 <figure>

-| Named Entity Recognition Model | OntoNotes | CoNLL '03 |
+| Named Entity Recognition System | OntoNotes | CoNLL '03 |
 | ------------------------------------------------------------------------------ | --------: | --------: |
 | spaCy RoBERTa (2020) | | 92.2 |
-| spaCy CNN (2020) | | 88.4 |
+| spaCy CNN (2020) | 85.3 | 88.4 |
 | spaCy CNN (2017) | 86.4 | |
 | [Stanza](https://stanfordnlp.github.io/stanza/) (StanfordNLP)<sup>1</sup> | 88.8 | 92.1 |
 | <Link to="https://github.com/flairNLP/flair" hideIcon>Flair</Link><sup>2</sup> | 89.7 | 93.1 |
@@ -235,8 +235,6 @@ The `Transformer` component sets the
 [`Doc._.trf_data`](/api/transformer#custom_attributes) extension attribute,
 which lets you access the transformers outputs at runtime.

-<!-- TODO: update/confirm once we have final models trained -->
-
 ```cli
 $ python -m spacy download en_core_trf_lg
 ```
@@ -63,7 +63,7 @@ import Benchmarks from 'usage/\_benchmarks-models.md'

 <figure>

-| System | UAS | LAS |
+| Dependency Parsing System | UAS | LAS |
 | ------------------------------------------------------------------------------ | ---: | ---: |
 | spaCy RoBERTa (2020)<sup>1</sup> | 96.8 | 95.0 |
 | spaCy CNN (2020)<sup>1</sup> | 93.7 | 91.8 |
@@ -1654,9 +1654,12 @@ The [`SentenceRecognizer`](/api/sentencerecognizer) is a simple statistical
 component that only provides sentence boundaries. Along with being faster and
 smaller than the parser, its primary advantage is that it's easier to train
 because it only requires annotated sentence boundaries rather than full
-dependency parses.
-
-<!-- TODO: update/confirm usage once we have final models trained -->
+dependency parses. spaCy's [trained pipelines](/models) include both a parser
+and a trained sentence segmenter, which is
+[disabled](/usage/processing-pipelines#disabling) by default. If you only need
+sentence boundaries and no parser, you can use the `enable` and `disable`
+arguments on [`spacy.load`](/api/top-level#spacy.load) to enable the senter and
+disable the parser.
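
The switch from parser to senter described in the added lines can be sketched as follows. This is a minimal illustration, not part of the commit: it assumes spaCy v3 is installed and builds a stand-in pipeline with `spacy.blank` so that it runs without downloading a package. The `parser` and `senter` here are untrained and only show the toggling, and the sketch uses `Language.disable_pipe` and `Language.enable_pipe` instead of arguments on `spacy.load`. With an installed trained pipeline you would start from `spacy.load` rather than `spacy.blank`.

```python
import spacy

# Stand-in for a trained pipeline (assumption for illustration only):
# a blank English pipeline with an untrained parser and senter.
nlp = spacy.blank("en")
nlp.add_pipe("parser")
nlp.add_pipe("senter")

# Mimic how the trained pipelines are described above:
# the senter is present but disabled by default.
nlp.disable_pipe("senter")
shipped_active = list(nlp.pipe_names)

# Switch to senter-only sentence segmentation.
nlp.disable_pipe("parser")
nlp.enable_pipe("senter")
senter_only_active = list(nlp.pipe_names)
senter_only_disabled = list(nlp.disabled)

print(shipped_active, senter_only_active, senter_only_disabled)
```

Disabled components stay loaded, so the parser can be switched back on later with `nlp.enable_pipe("parser")` without reloading the pipeline.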

 > #### senter vs. parser
 >
@@ -253,8 +253,6 @@ different mechanisms you can use:
 Disabled and excluded component names can be provided to
 [`spacy.load`](/api/top-level#spacy.load) as a list.

-<!-- TODO: update with info on our models shipped with optional components -->
-
 > #### 💡 Optional pipeline components
 >
 > The `disable` mechanism makes it easy to distribute pipeline packages with
@@ -262,6 +260,11 @@ Disabled and excluded component names can be provided to
 > your pipeline may include a statistical _and_ a rule-based component for
 > sentence segmentation, and you can choose which one to run depending on your
 > use case.
+>
+> For example, spaCy's [trained pipelines](/models) like
+> [`en_core_web_sm`](/models/en#en_core_web_sm) contain both a `parser` and
+> `senter` that perform sentence segmentation, but the `senter` is disabled by
+> default.
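
To make the idea of an optional component concrete, here is a small sketch that is not taken from the commit. It assumes spaCy v3 is installed and uses the rule-based `sentencizer` on a blank English pipeline, so no trained package is needed. The sample text and variable names are made up for illustration.

```python
import spacy

nlp = spacy.blank("en")
nlp.add_pipe("sentencizer")  # rule-based sentence segmentation

text = "This is a sentence. This is another one."

# While disabled, the component stays in the pipeline but is not run,
# so the resulting Doc carries no sentence boundaries.
nlp.disable_pipe("sentencizer")
doc_without = nlp(text)
has_sents_without = doc_without.has_annotation("SENT_START")

# Re-enable it to get sentence boundaries again.
nlp.enable_pipe("sentencizer")
doc_with = nlp(text)
sentences = [sent.text for sent in doc_with.sents]

print(has_sents_without, sentences)
```

The same pattern applies to a packaged pipeline that ships an optional component in a disabled state: the component is listed in `nlp.component_names` either way, and only appears in `nlp.pipe_names` once it is enabled.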

 ```python
 # Load the pipeline without the entity recognizer
@@ -733,7 +733,10 @@ workflows, but only one can be tracked by DVC.
 <Infobox title="This section is still under construction" emoji="🚧" variant="warning">

 The Prodigy integration will require a nightly version of Prodigy that supports
-spaCy v3+.
+spaCy v3+. You can already use annotations created with Prodigy in spaCy v3 by
+exporting your data with
+[`data-to-spacy`](https://prodi.gy/docs/recipes#data-to-spacy) and running
+[`spacy convert`](/api/cli#convert) to convert it to the binary format.

 </Infobox>
