Update intro section of the pipeline component docs

---
title: CuratedTransformer
teaser:
  Pipeline component for multi-task learning with curated transformer models
tag: class
source: github.com/explosion/spacy-transformers/blob/master/spacy_curated_transformers/pipeline_component.py
version: 3.7
api_base_class: /api/pipe
api_string_name: curated_transformer
---

> #### Installation
>
> ```bash
> $ pip install -U spacy-curated-transformers
> ```

<Infobox title="Important note" variant="warning">

This component is available via the extension package
[`spacy-curated-transformers`](https://github.com/explosion/spacy-curated-transformers).
It exposes the component via entry points, so if you have the package installed,
using `factory = "curated_transformer"` in your
[training config](/usage/training#config) will work out-of-the-box.

</Infobox>

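For instance, a minimal training config excerpt registering the component might look like the sketch below. The component name `transformer` is a common convention rather than a requirement, and the model block is omitted here for brevity.

```ini
# Sketch of a config.cfg excerpt; the component name "transformer" is a
# conventional choice, not a requirement. The [components.transformer.model]
# block (omitted here) configures the chosen transformer architecture.
[components.transformer]
factory = "curated_transformer"
```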
This pipeline component lets you use a curated set of transformer models in your
pipeline. spaCy Curated Transformers currently supports the following model
types:

- ALBERT
- BERT
- CamemBERT
- RoBERTa
- XLM-RoBERTa

If you want to use another type of model, use
[spacy-transformers](/api/spacy-transformers), which allows you to use all
Hugging Face transformer models with spaCy.

You will usually connect downstream components to a shared curated transformer
using one of the curated transformer listener layers. This works similarly to
spaCy's [Tok2Vec](/api/tok2vec) component and the
[Tok2VecListener](/api/architectures/#Tok2VecListener) sublayer. The component
assigns the output of the transformer to the `Doc`'s extension attributes. To
access the values, you can use the custom
[`Doc._.trf_data`](#assigned-attributes) attribute.
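As an illustration, wiring a downstream component to the shared transformer might look like the following config sketch. The listener architecture name and its parameters are assumptions based on spacy-curated-transformers conventions; check the [model architectures](/api/architectures#transformers) documentation for the exact names and arguments.

```ini
# Sketch only: the listener architecture name and its parameters below are
# assumptions; consult the model architectures documentation for the exact
# signature. "upstream" names the shared curated transformer component.
[components.tagger]
factory = "tagger"

[components.tagger.model.tok2vec]
@architectures = "spacy-curated-transformers.LastTransformerLayerListener.v1"
width = 768
upstream = "transformer"
```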

For more details, see the [usage documentation](/usage/embeddings-transformers).

The component sets the following
[custom extension attribute](/usage/processing-pipeline#custom-components-attributes):

| Location         | Value                                                                      |
| ---------------- | -------------------------------------------------------------------------- |
| `Doc._.trf_data` | Curated transformer outputs for the `Doc` object. ~~DocTransformerOutput~~ |

## Config and implementation {id="config"}

The default config is defined by the pipeline component factory and describes
how the component should be configured. You can override its settings via the
`config` argument on [`nlp.add_pipe`](/api/language#add_pipe) or in your
[`config.cfg` for training](/usage/training#config). See the
[model architectures](/api/architectures#transformers) documentation for details
on the transformer architectures and their arguments and hyperparameters.

Note that the default config does not include the mandatory `vocab_size`
hyperparameter, as this value can differ between models. You will need to
specify it explicitly before adding the pipe, as shown in the example below.

> #### Example
>
> ```python
> from spacy_curated_transformers.pipeline.transformer import DEFAULT_CONFIG
>
> config = DEFAULT_CONFIG.copy()
> config["transformer"]["model"]["vocab_size"] = 250002
> nlp.add_pipe("curated_transformer", config=config["transformer"])
> ```

| Setting | Description |