mirror of https://github.com/explosion/spaCy.git (synced 2024-12-25 17:36:30 +03:00)
Update pipeline design docs [ci skip]
parent 1d1cfadbca
commit 5bbdd7dc4c
49
website/docs/images/pipeline-design.svg
Normal file
File diff suppressed because one or more lines are too long
After Width: | Height: | Size: 27 KiB |
@@ -55,15 +55,15 @@ For a detailed compatibility overview, see the
 This is also the source of spaCy's internal compatibility check, performed when
 you run the [`download`](/api/cli#download) command.
 
-## Pretrained pipeline design {#design}
+## Trained pipeline design {#design}
 
-The spaCy v3 pretrained pipelines are designed to be efficient and configurable.
+The spaCy v3 trained pipelines are designed to be efficient and configurable.
 For example, multiple components can share a common "token-to-vector" model and
 it's easy to swap out or disable the lemmatizer. The pipelines are designed to
 be efficient in terms of speed and size and work well when the pipeline is run
 in full.
 
-When modifying a pretrained pipeline, it's important to understand how the
+When modifying a trained pipeline, it's important to understand how the
 components **depend on** each other. Unlike spaCy v2, where the `tagger`,
 `parser` and `ner` components were all independent, some v3 components depend on
 earlier components in the pipeline. As a result, disabling or reordering
@@ -84,6 +84,8 @@ Main changes from spaCy v2 models:
 
 ### CNN/CPU pipeline design
 
+![Components and their dependencies in the CNN pipelines](../images/pipeline-design.svg)
+
 In the `sm`/`md`/`lg` models:
 
 - The `tagger`, `morphologizer` and `parser` components listen to the `tok2vec`
@@ -99,11 +101,9 @@ In the `sm`/`md`/`lg` models:
 `tagger`+`attribute_ruler` or `morphologizer`.
 - The `ner` component is independent with its own internal tok2vec layer.
 
-<!-- TODO: pretty diagram -->
-
 ### Transformer pipeline design
 
-In the tranformer (`trf`) models, the `tagger`, `parser` and `ner` (if present)
+In the transformer (`trf`) models, the `tagger`, `parser` and `ner` (if present)
 all listen to the `transformer` component. The `attribute_ruler` and
 `lemmatizer` have the same configuration as in the CNN models.
 
@@ -112,7 +112,7 @@ all listen to the `transformer` component. The `attribute_ruler` and
 ### Modifying the default pipeline
 
 For faster processing, you may only want to run a subset of the components in a
-pretrained pipeline. The `disable` and `exclude` arguments to
+trained pipeline. The `disable` and `exclude` arguments to
 [`spacy.load`](/api/top-level#spacy.load) let you control which components are
 loaded and run. Disabled components are loaded in the background so it's
 possible to reenable them in the same pipeline in the future with
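
Note on the documented dependencies: the listener relationships described in the changed text can be sketched as a small lookup, which may help when deciding what a `disable` list would affect. This is a standalone illustration and not spaCy code. The `LISTENERS` table and the `orphaned_listeners` helper are hypothetical names, and the table encodes only the relationships stated in the diff (it assumes `ner` is present in the `trf` pipeline and leaves out the `attribute_ruler` and `lemmatizer` details that fall outside the shown hunks).

```python
# Hypothetical sketch: which listening components would lose their shared
# upstream model if a given set of components is disabled?
# The mapping only encodes what the docs text in this diff states.

LISTENERS = {
    # CNN/CPU pipelines (sm/md/lg): these components listen to `tok2vec`.
    # `ner` is independent (own internal tok2vec layer), so it is not listed.
    "cnn": {"tok2vec": ["tagger", "morphologizer", "parser"]},
    # Transformer pipelines (trf): these components listen to `transformer`.
    "trf": {"transformer": ["tagger", "parser", "ner"]},
}


def orphaned_listeners(design, disabled):
    """Return listeners that stay enabled while their upstream is disabled."""
    disabled = set(disabled)
    orphaned = []
    for upstream, listeners in LISTENERS[design].items():
        if upstream in disabled:
            # Keep only the listeners that would still run.
            orphaned.extend(name for name in listeners if name not in disabled)
    return orphaned


if __name__ == "__main__":
    print(orphaned_listeners("cnn", ["tok2vec"]))
    print(orphaned_listeners("cnn", ["ner"]))
```

An empty result suggests the `disable` list leaves no listener without the component it listens to. A non-empty result points at components worth checking before passing the same list to `spacy.load`.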