mirror of
https://github.com/explosion/spaCy.git
synced 2025-01-13 18:56:36 +03:00
Update docs and resolve todos [ci skip]
parent
d7ab6a2ffe
commit
6836b66433
|
@ -1,10 +1,10 @@
|
||||||
import { Help } from 'components/typography'; import Link from 'components/link'
|
import { Help } from 'components/typography'; import Link from 'components/link'
|
||||||
|
|
||||||
<!-- TODO: update, add project template -->
|
<!-- TODO: update numbers -->
|
||||||
|
|
||||||
<figure>
|
<figure>
|
||||||
|
|
||||||
| System | Parser | Tagger | NER | WPS<br />CPU <Help>words per second on CPU, higher is better</Help> | WPS<br/>GPU <Help>words per second on GPU, higher is better</Help> |
|
| Pipeline | Parser | Tagger | NER | WPS<br />CPU <Help>words per second on CPU, higher is better</Help> | WPS<br/>GPU <Help>words per second on GPU, higher is better</Help> |
|
||||||
| ---------------------------------------------------------- | -----: | -----: | ---: | ------------------------------------------------------------------: | -----------------------------------------------------------------: |
|
| ---------------------------------------------------------- | -----: | -----: | ---: | ------------------------------------------------------------------: | -----------------------------------------------------------------: |
|
||||||
| [`en_core_web_trf`](/models/en#en_core_web_trf) (spaCy v3) | | | | | 6k |
|
| [`en_core_web_trf`](/models/en#en_core_web_trf) (spaCy v3) | | | | | 6k |
|
||||||
| [`en_core_web_lg`](/models/en#en_core_web_lg) (spaCy v3) | | | | | |
|
| [`en_core_web_lg`](/models/en#en_core_web_lg) (spaCy v3) | | | | | |
|
||||||
|
@ -21,10 +21,10 @@ import { Help } from 'components/typography'; import Link from 'components/link'
|
||||||
|
|
||||||
<figure>
|
<figure>
|
||||||
|
|
||||||
| Named Entity Recognition Model | OntoNotes | CoNLL '03 |
|
| Named Entity Recognition System | OntoNotes | CoNLL '03 |
|
||||||
| ------------------------------------------------------------------------------ | --------: | --------: |
|
| ------------------------------------------------------------------------------ | --------: | --------: |
|
||||||
| spaCy RoBERTa (2020) | | 92.2 |
|
| spaCy RoBERTa (2020) | | 92.2 |
|
||||||
| spaCy CNN (2020) | | 88.4 |
|
| spaCy CNN (2020) | 85.3 | 88.4 |
|
||||||
| spaCy CNN (2017) | 86.4 | |
|
| spaCy CNN (2017) | 86.4 | |
|
||||||
| [Stanza](https://stanfordnlp.github.io/stanza/) (StanfordNLP)<sup>1</sup> | 88.8 | 92.1 |
|
| [Stanza](https://stanfordnlp.github.io/stanza/) (StanfordNLP)<sup>1</sup> | 88.8 | 92.1 |
|
||||||
| <Link to="https://github.com/flairNLP/flair" hideIcon>Flair</Link><sup>2</sup> | 89.7 | 93.1 |
|
| <Link to="https://github.com/flairNLP/flair" hideIcon>Flair</Link><sup>2</sup> | 89.7 | 93.1 |
|
||||||
|
|
|
@ -235,8 +235,6 @@ The `Transformer` component sets the
|
||||||
[`Doc._.trf_data`](/api/transformer#custom_attributes) extension attribute,
|
[`Doc._.trf_data`](/api/transformer#custom_attributes) extension attribute,
|
||||||
which lets you access the transformers outputs at runtime.
|
which lets you access the transformers outputs at runtime.
|
||||||
|
|
||||||
<!-- TODO: update/confirm once we have final models trained -->
|
|
||||||
|
|
||||||
```cli
|
```cli
|
||||||
$ python -m spacy download en_core_trf_lg
|
$ python -m spacy download en_core_trf_lg
|
||||||
```
|
```
|
||||||
|
|
|
@ -63,7 +63,7 @@ import Benchmarks from 'usage/\_benchmarks-models.md'
|
||||||
|
|
||||||
<figure>
|
<figure>
|
||||||
|
|
||||||
| System | UAS | LAS |
|
| Dependency Parsing System | UAS | LAS |
|
||||||
| ------------------------------------------------------------------------------ | ---: | ---: |
|
| ------------------------------------------------------------------------------ | ---: | ---: |
|
||||||
| spaCy RoBERTa (2020)<sup>1</sup> | 96.8 | 95.0 |
|
| spaCy RoBERTa (2020)<sup>1</sup> | 96.8 | 95.0 |
|
||||||
| spaCy CNN (2020)<sup>1</sup> | 93.7 | 91.8 |
|
| spaCy CNN (2020)<sup>1</sup> | 93.7 | 91.8 |
|
||||||
|
|
|
@ -1654,9 +1654,12 @@ The [`SentenceRecognizer`](/api/sentencerecognizer) is a simple statistical
|
||||||
component that only provides sentence boundaries. Along with being faster and
|
component that only provides sentence boundaries. Along with being faster and
|
||||||
smaller than the parser, its primary advantage is that it's easier to train
|
smaller than the parser, its primary advantage is that it's easier to train
|
||||||
because it only requires annotated sentence boundaries rather than full
|
because it only requires annotated sentence boundaries rather than full
|
||||||
dependency parses.
|
dependency parses. spaCy's [trained pipelines](/models) include both a parser
|
||||||
|
and a trained sentence segmenter, which is
|
||||||
<!-- TODO: update/confirm usage once we have final models trained -->
|
[disabled](/usage/processing-pipelines#disabling) by default. If you only need
|
||||||
|
sentence boundaries and no parser, you can use the `enable` and `disable`
|
||||||
|
arguments on [`spacy.load`](/api/top-level#spacy.load) to enable the senter and
|
||||||
|
disable the parser.
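
As a rough illustration of the paragraph above, the sketch below loads a trained pipeline with the `senter` switched on and the parser left out. It assumes `en_core_web_sm` is installed. It also uses the `exclude` argument together with `nlp.enable_pipe` rather than the load-time `enable` argument mentioned in the text, and the helper name `load_senter_only` is made up for this example.

```python
import spacy


def load_senter_only(name="en_core_web_sm"):
    """Load a trained pipeline that segments sentences with the senter only.

    Assumption: the named package ships a "senter" component that is
    disabled by default, as described for spaCy's trained pipelines.
    """
    # Leave the parser out entirely so it is never loaded or run.
    nlp = spacy.load(name, exclude=["parser"])
    # The senter is present but disabled by default, so switch it on.
    nlp.enable_pipe("senter")
    return nlp
```

Calling `load_senter_only()` and iterating over `doc.sents` on a processed text should then yield sentence spans produced by the `senter` instead of the parser.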
|
||||||
|
|
||||||
> #### senter vs. parser
|
> #### senter vs. parser
|
||||||
>
|
>
|
||||||
|
|
|
@ -253,8 +253,6 @@ different mechanisms you can use:
|
||||||
Disabled and excluded component names can be provided to
|
Disabled and excluded component names can be provided to
|
||||||
[`spacy.load`](/api/top-level#spacy.load) as a list.
|
[`spacy.load`](/api/top-level#spacy.load) as a list.
|
||||||
|
|
||||||
<!-- TODO: update with info on our models shipped with optional components -->
|
|
||||||
|
|
||||||
> #### 💡 Optional pipeline components
|
> #### 💡 Optional pipeline components
|
||||||
>
|
>
|
||||||
> The `disable` mechanism makes it easy to distribute pipeline packages with
|
> The `disable` mechanism makes it easy to distribute pipeline packages with
|
||||||
|
@ -262,6 +260,11 @@ Disabled and excluded component names can be provided to
|
||||||
> your pipeline may include a statistical _and_ a rule-based component for
|
> your pipeline may include a statistical _and_ a rule-based component for
|
||||||
> sentence segmentation, and you can choose which one to run depending on your
|
> sentence segmentation, and you can choose which one to run depending on your
|
||||||
> use case.
|
> use case.
|
||||||
|
>
|
||||||
|
> For example, spaCy's [trained pipelines](/models) like
|
||||||
|
> [`en_core_web_sm`](/models/en#en_core_web_sm) contain both a `parser` and
|
||||||
|
> `senter` that perform sentence segmentation, but the `senter` is disabled by
|
||||||
|
> default.
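
The disable mechanism described in this aside can be tried without downloading a trained pipeline. The following is a minimal sketch that assumes a blank English pipeline with only the rule-based `sentencizer` added. It is a stand-in for a real package with optional components, not a reproduction of how the trained pipelines are configured.

```python
import spacy

# Blank English pipeline: tokenizer only, no trained package required.
nlp = spacy.blank("en")
nlp.add_pipe("sentencizer")  # rule-based sentence segmenter

# A disabled component stays in the pipeline but is skipped at runtime.
nlp.disable_pipe("sentencizer")
active_while_disabled = list(nlp.pipe_names)      # components that will run
all_while_disabled = list(nlp.component_names)    # includes disabled ones

# Re-enable the component once sentence boundaries are needed.
nlp.enable_pipe("sentencizer")
doc = nlp("This is a sentence. This is another one.")
sentences = [sent.text for sent in doc.sents]
print(active_while_disabled, all_while_disabled, sentences)
```

The same pattern applies to a pipeline package that ships both a statistical and a rule-based segmenter: both stay listed in `nlp.component_names`, and only the enabled one appears in `nlp.pipe_names`.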
|
||||||
|
|
||||||
```python
|
```python
|
||||||
# Load the pipeline without the entity recognizer
|
# Load the pipeline without the entity recognizer
|
||||||
|
|
|
@ -733,7 +733,10 @@ workflows, but only one can be tracked by DVC.
|
||||||
<Infobox title="This section is still under construction" emoji="🚧" variant="warning">
|
<Infobox title="This section is still under construction" emoji="🚧" variant="warning">
|
||||||
|
|
||||||
The Prodigy integration will require a nightly version of Prodigy that supports
|
The Prodigy integration will require a nightly version of Prodigy that supports
|
||||||
spaCy v3+.
|
spaCy v3+. You can already use annotations created with Prodigy in spaCy v3 by
|
||||||
|
exporting your data with
|
||||||
|
[`data-to-spacy`](https://prodi.gy/docs/recipes#data-to-spacy) and running
|
||||||
|
[`spacy convert`](/api/cli#convert) to convert it to the binary format.
|
||||||
|
|
||||||
</Infobox>
|
</Infobox>
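
To give a sense of the binary format that the conversion step targets, here is a small sketch that stores one annotated `Doc` in a `DocBin` and reads it back. The example text and the `GPE` label are invented for illustration, and this is not what `data-to-spacy` or `spacy convert` do internally. It only shows the kind of serialized object spaCy v3 works with.

```python
import spacy
from spacy.tokens import DocBin

nlp = spacy.blank("en")

# Hypothetical annotation: one entity span over an invented sentence.
text = "Apple is opening an office in Berlin."
doc = nlp(text)
start = text.index("Berlin")
span = doc.char_span(start, start + len("Berlin"), label="GPE")
doc.ents = [span]

# Serialize the annotated Doc into the binary DocBin representation.
doc_bin = DocBin(docs=[doc])
data = doc_bin.to_bytes()

# Read it back to confirm that the annotation survived the round trip.
restored = list(DocBin().from_bytes(data).get_docs(nlp.vocab))
restored_ents = [(ent.text, ent.label_) for ent in restored[0].ents]
print(restored_ents)
```

Writing the same object with `doc_bin.to_disk("./train.spacy")` would produce a file on disk instead of an in-memory bytes object; the path here is only a placeholder.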
|
||||||
|
|
||||||
|
|