Update intro section of the pipeline component docs

---
title: CuratedTransformer
teaser:
  Pipeline component for multi-task learning with curated transformer models
tag: class
source: github.com/explosion/spacy-transformers/blob/master/spacy_curated_transformers/pipeline_component.py
version: 3.7
api_base_class: /api/pipe
api_string_name: curated_transformer
---

> #### Installation
>
> ```bash
> $ pip install -U spacy-curated-transformers
> ```

<Infobox title="Important note" variant="warning">

This component is available via the extension package
[`spacy-curated-transformers`](https://github.com/explosion/spacy-curated-transformers).
It exposes the component via entry points, so if you have the package installed,
using `factory = "curated_transformer"` in your
[training config](/usage/training#config) will work out-of-the-box.

</Infobox>

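For instance, a minimal training config excerpt registering the component might look like the sketch below. The component name `transformer` is a common convention rather than a requirement, and the model block is omitted here for brevity.

```ini
# Sketch of a config.cfg excerpt; the component name "transformer" is a
# conventional choice, not a requirement. The [components.transformer.model]
# block (omitted here) configures the chosen transformer architecture.
[components.transformer]
factory = "curated_transformer"
```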
This pipeline component lets you use a curated set of transformer models in your
pipeline. spaCy Curated Transformers currently supports the following model
types:

- ALBERT
- BERT
- CamemBERT
- RoBERTa
- XLM-RoBERTa

If you want to use another type of model, use
[spacy-transformers](/api/spacy-transformers), which allows you to use all
Hugging Face transformer models with spaCy.

You will usually connect downstream components to a shared curated transformer
using one of the curated transformer listener layers. This works similarly to
spaCy's [Tok2Vec](/api/tok2vec) component and the
[Tok2VecListener](/api/architectures/#Tok2VecListener) sublayer. The component
assigns the output of the transformer to the `Doc`'s extension attributes. To
access the values, you can use the custom
[`Doc._.trf_data`](#assigned-attributes) attribute.
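As an illustration, wiring a downstream component to the shared transformer might look like the following config sketch. The listener architecture name and its parameters are assumptions based on spacy-curated-transformers conventions; check the [model architectures](/api/architectures#transformers) documentation for the exact names and arguments.

```ini
# Sketch only: the listener architecture name and its parameters below are
# assumptions; consult the model architectures documentation for the exact
# signature. "upstream" names the shared curated transformer component.
[components.tagger]
factory = "tagger"

[components.tagger.model.tok2vec]
@architectures = "spacy-curated-transformers.LastTransformerLayerListener.v1"
width = 768
upstream = "transformer"
```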

For more details, see the [usage documentation](/usage/embeddings-transformers).

The component sets the following
[custom extension attribute](/usage/processing-pipeline#custom-components-attributes):

| Location         | Value                                                                      |
| ---------------- | -------------------------------------------------------------------------- |
| `Doc._.trf_data` | Curated transformer outputs for the `Doc` object. ~~DocTransformerOutput~~ |

## Config and implementation {id="config"}

The default config is defined by the pipeline component factory and describes
how the component should be configured. You can override its settings via the
`config` argument on [`nlp.add_pipe`](/api/language#add_pipe) or in your
[`config.cfg` for training](/usage/training#config). See the
[model architectures](/api/architectures#transformers) documentation for details
on the transformer architectures and their arguments and hyperparameters.

Note that the default config does not include the mandatory `vocab_size`
hyperparameter, as this value can differ between models. You will need to
specify it explicitly before adding the pipe, as shown in the example below.

> #### Example
>
> ```python
> from spacy_curated_transformers.pipeline.transformer import DEFAULT_CONFIG
>
> config = DEFAULT_CONFIG.copy()
> config["transformer"]["model"]["vocab_size"] = 250002
> nlp.add_pipe("curated_transformer", config=config["transformer"])
> ```

| Setting | Description |