Fix piece_encoder entries

shadeMe 2023-07-20 13:15:37 +02:00
parent a282aec814
commit d8722877cb
No known key found for this signature in database
GPG Key ID: 6FCA9FC635B2A402


@@ -484,9 +484,9 @@ The other arguments are shared between all versions.
## Curated transformer architectures {id="curated-trf",source="https://github.com/explosion/spacy-curated-transformers/blob/main/spacy_curated_transformers/models/architectures.py"}
The following architectures are provided by the package
[`spacy-curated-transformers`](https://github.com/explosion/spacy-curated-transformers).
See the [usage documentation](/usage/embeddings-transformers#transformers) for
how to integrate the architectures into your training config.
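As a quick orientation, a curated transformer architecture is plugged in by
referencing it from a pipeline component's `model` block in the training
config. The sketch below assumes the package's `curated_transformer` component
factory; the concrete values are placeholders, so see the usage documentation
for complete examples.

```ini
# Minimal sketch: wiring a curated transformer architecture into a pipeline
# component. The factory name and nested keys follow the usage documentation;
# the architecture shown is one of those documented below.
[components.transformer]
factory = "curated_transformer"

[components.transformer.model]
@architectures = "spacy-curated-transformers.XlmrTransformer.v1"
```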
<Infobox variant="warning">
@@ -503,11 +503,10 @@ for details and system requirements.
Construct an ALBERT transformer model.
| Name | Description |
| ------------------------------ | --------------------------------------------------------------------------- |
| `vocab_size` | Vocabulary size. ~~int~~ |
| `with_spans` | Callback that constructs a span generator model. ~~Callable~~ |
| `piece_encoder` | The piece encoder to segment input tokens. ~~Model~~ |
| `attention_probs_dropout_prob` | Dropout probability of the self-attention layers. ~~float~~ |
| `embedding_width` | Width of the embedding representations. ~~int~~ |
| `hidden_act` | Activation used by the point-wise feed-forward layers. ~~str~~ |
@@ -533,11 +532,10 @@ Construct an ALBERT transformer model.
Construct a BERT transformer model.
| Name | Description |
| ------------------------------ | --------------------------------------------------------------------------- |
| `vocab_size` | Vocabulary size. ~~int~~ |
| `with_spans` | Callback that constructs a span generator model. ~~Callable~~ |
| `piece_encoder` | The piece encoder to segment input tokens. ~~Model~~ |
| `attention_probs_dropout_prob` | Dropout probability of the self-attention layers. ~~float~~ |
| `hidden_act` | Activation used by the point-wise feed-forward layers. ~~str~~ |
| `hidden_dropout_prob` | Dropout probability of the point-wise feed-forward and embedding layers. ~~float~~ |
@@ -561,11 +559,10 @@ Construct a BERT transformer model.
Construct a CamemBERT transformer model.
| Name | Description |
| ------------------------------ | --------------------------------------------------------------------------- |
| `vocab_size` | Vocabulary size. ~~int~~ |
| `with_spans` | Callback that constructs a span generator model. ~~Callable~~ |
| `piece_encoder` | The piece encoder to segment input tokens. ~~Model~~ |
| `attention_probs_dropout_prob` | Dropout probability of the self-attention layers. ~~float~~ |
| `hidden_act` | Activation used by the point-wise feed-forward layers. ~~str~~ |
| `hidden_dropout_prob` | Dropout probability of the point-wise feed-forward and embedding layers. ~~float~~ |
@@ -589,11 +586,10 @@ Construct a CamemBERT transformer model.
Construct a RoBERTa transformer model.
| Name | Description |
| ------------------------------ | --------------------------------------------------------------------------- |
| `vocab_size` | Vocabulary size. ~~int~~ |
| `with_spans` | Callback that constructs a span generator model. ~~Callable~~ |
| `piece_encoder` | The piece encoder to segment input tokens. ~~Model~~ |
| `attention_probs_dropout_prob` | Dropout probability of the self-attention layers. ~~float~~ |
| `hidden_act` | Activation used by the point-wise feed-forward layers. ~~str~~ |
| `hidden_dropout_prob` | Dropout probability of the point-wise feed-forward and embedding layers. ~~float~~ |
@@ -612,17 +608,15 @@ Construct a RoBERTa transformer model.
| `grad_scaler_config` | Configuration passed to the PyTorch gradient scaler. ~~dict~~ |
| **CREATES** | The model using the architecture ~~Model[TransformerInT, TransformerOutT]~~ |
### spacy-curated-transformers.XlmrTransformer.v1
Construct an XLM-RoBERTa transformer model.
| Name | Description |
| ------------------------------ | --------------------------------------------------------------------------- |
| `vocab_size` | Vocabulary size. ~~int~~ |
| `with_spans` | Callback that constructs a span generator model. ~~Callable~~ |
| `piece_encoder` | The piece encoder to segment input tokens. ~~Model~~ |
| `attention_probs_dropout_prob` | Dropout probability of the self-attention layers. ~~float~~ |
| `hidden_act` | Activation used by the point-wise feed-forward layers. ~~str~~ |
| `hidden_dropout_prob` | Dropout probability of the point-wise feed-forward and embedding layers. ~~float~~ |
@@ -641,13 +635,13 @@ Construct a XLM-RoBERTa transformer model.
| `grad_scaler_config` | Configuration passed to the PyTorch gradient scaler. ~~dict~~ |
| **CREATES** | The model using the architecture ~~Model[TransformerInT, TransformerOutT]~~ |
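To illustrate how the arguments above fit together, here is a sketch of an
`XlmrTransformer.v1` config: `with_spans` receives a span generator callback,
while `piece_encoder` receives a piece encoder model. The `WithStridedSpans.v1`
callback and all hyperparameter values are illustrative assumptions, not
recommended settings.

```ini
# Illustrative sketch only; values are placeholders.
[components.transformer.model]
@architectures = "spacy-curated-transformers.XlmrTransformer.v1"
vocab_size = 250002
hidden_act = "gelu"
attention_probs_dropout_prob = 0.1
hidden_dropout_prob = 0.1

[components.transformer.model.with_spans]
# Span generator callback (assumed to be provided by the package).
@architectures = "spacy-curated-transformers.WithStridedSpans.v1"

[components.transformer.model.piece_encoder]
# Piece encoder model that segments input tokens (see the encoders below).
@architectures = "spacy-curated-transformers.XlmrSentencepieceEncoder.v1"
```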
### spacy-curated-transformers.ScalarWeight.v1
Construct a model that accepts a list of transformer layer outputs and returns a
weighted representation of the same.
| Name | Description |
| -------------------- | ----------------------------------------------------------------------------- |
| `num_layers` | Number of transformer hidden layers. ~~int~~ |
| `dropout_prob` | Dropout probability. ~~float~~ |
| `mixed_precision` | Use mixed-precision training. ~~bool~~ |
@@ -656,137 +650,130 @@ Construct a model that accepts a list of transformer layer outputs and returns a
### spacy-curated-transformers.TransformerLayersListener.v1
Construct a listener layer that communicates with one or more upstream
Transformer components. This layer extracts the output of the last transformer
layer and performs pooling over the individual pieces of each Doc token,
returning their corresponding representations. The upstream name should either
be the wildcard string '\*', or the name of the Transformer component.
In almost all cases, the wildcard string will suffice as there'll only be one
upstream Transformer component. But in certain situations, e.g. you have
disjoint datasets for certain tasks, or you'd like to use a pre-trained pipeline
but a downstream task requires its own token representations, you could end up
with more than one Transformer component in the pipeline.
| Name | Description |
| --------------- | ---------------------------------------------------------------------------------------------------------------------- |
| `layers` | The number of layers produced by the upstream transformer component, excluding the embedding layer. ~~int~~ |
| `width` | The width of the vectors produced by the upstream transformer component. ~~int~~ |
| `pooling` | Model that is used to perform pooling over the piece representations. ~~Model~~ |
| `upstream_name` | A string to identify the 'upstream' Transformer component to communicate with. ~~str~~ |
| `grad_factor` | Factor to multiply gradients with. ~~float~~ |
| **CREATES** | A model that returns the relevant vectors from an upstream transformer component. ~~Model[List[Doc], List[Floats2d]]~~ |
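As a sketch, a downstream component can consume the transformer's output by
using this listener as its `tok2vec` model. The component path, the widths and
the `reduce_mean.v1` pooling layer are illustrative assumptions.

```ini
# Sketch: a tagger listening to all layers of the upstream transformer.
[components.tagger.model.tok2vec]
@architectures = "spacy-curated-transformers.TransformerLayersListener.v1"
layers = 12
width = 768
upstream_name = "*"
grad_factor = 1.0

[components.tagger.model.tok2vec.pooling]
# Pools the piece representations of each token into a single vector.
@layers = "reduce_mean.v1"
```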
### spacy-curated-transformers.LastTransformerLayerListener.v1
Construct a listener layer that communicates with one or more upstream
Transformer components. This layer extracts the output of the last transformer
layer and performs pooling over the individual pieces of each Doc token,
returning their corresponding representations. The upstream name should either
be the wildcard string '\*', or the name of the Transformer component.
In almost all cases, the wildcard string will suffice as there'll only be one
upstream Transformer component. But in certain situations, e.g. you have
disjoint datasets for certain tasks, or you'd like to use a pre-trained pipeline
but a downstream task requires its own token representations, you could end up
with more than one Transformer component in the pipeline.
| Name | Description |
| --------------- | ---------------------------------------------------------------------------------------------------------------------- |
| `width` | The width of the vectors produced by the upstream transformer component. ~~int~~ |
| `pooling` | Model that is used to perform pooling over the piece representations. ~~Model~~ |
| `upstream_name` | A string to identify the 'upstream' Transformer component to communicate with. ~~str~~ |
| `grad_factor` | Factor to multiply gradients with. ~~float~~ |
| **CREATES** | A model that returns the relevant vectors from an upstream transformer component. ~~Model[List[Doc], List[Floats2d]]~~ |
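The configuration mirrors the previous listener, except that no `layers`
argument is taken because only the last layer's output is used; the values
below are again placeholders.

```ini
# Sketch: listener over the last transformer layer only.
[components.tagger.model.tok2vec]
@architectures = "spacy-curated-transformers.LastTransformerLayerListener.v1"
width = 768
upstream_name = "*"
grad_factor = 1.0

[components.tagger.model.tok2vec.pooling]
@layers = "reduce_mean.v1"
```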
### spacy-curated-transformers.ScalarWeightingListener.v1
Construct a listener layer that communicates with one or more upstream
Transformer components. This layer calculates a weighted representation of all
transformer layer outputs and performs pooling over the individual pieces of
each Doc token, returning their corresponding representations.
Requires its upstream Transformer components to return all layer outputs from
their models. The upstream name should either be the wildcard string '\*', or
the name of the Transformer component.
In almost all cases, the wildcard string will suffice as there'll only be one
upstream Transformer component. But in certain situations, e.g. you have
disjoint datasets for certain tasks, or you'd like to use a pre-trained pipeline
but a downstream task requires its own token representations, you could end up
with more than one Transformer component in the pipeline.
| Name | Description |
| --------------- | ---------------------------------------------------------------------------------------------------------------------- |
| `width` | The width of the vectors produced by the upstream transformer component. ~~int~~ |
| `weighting` | Model that is used to perform the weighting of the different layer outputs. ~~Model~~ |
| `pooling` | Model that is used to perform pooling over the piece representations. ~~Model~~ |
| `upstream_name` | A string to identify the 'upstream' Transformer component to communicate with. ~~str~~ |
| `grad_factor` | Factor to multiply gradients with. ~~float~~ |
| **CREATES** | A model that returns the relevant vectors from an upstream transformer component. ~~Model[List[Doc], List[Floats2d]]~~ |
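A sketch combining this listener with the `ScalarWeight.v1` model documented
above as its `weighting` sub-model; the paths, widths and the pooling layer are
illustrative assumptions.

```ini
# Sketch: weighted combination of all transformer layer outputs.
[components.tagger.model.tok2vec]
@architectures = "spacy-curated-transformers.ScalarWeightingListener.v1"
width = 768
upstream_name = "*"
grad_factor = 1.0

[components.tagger.model.tok2vec.weighting]
@architectures = "spacy-curated-transformers.ScalarWeight.v1"
num_layers = 12
dropout_prob = 0.1
mixed_precision = false

[components.tagger.model.tok2vec.pooling]
@layers = "reduce_mean.v1"
```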
### spacy-curated-transformers.BertWordpieceEncoder.v1
Construct a WordPiece piece encoder model that accepts a list of token sequences
or documents and returns a corresponding list of piece identifiers. This encoder
also splits each token on punctuation characters, as expected by most BERT
models.
This model must be separately initialized using an appropriate loader.
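For example, the encoder's piece vocabulary can be filled at initialization
time through a loader callback in the `[initialize]` block. The registry and
key names below (`piecer_loader`, `HFPieceEncoderLoader.v1`) are assumptions
based on the package's loaders, and the model name is a placeholder.

```ini
# Sketch: initializing the piece encoder from a Hugging Face tokenizer.
# Exact loader and key names are assumptions; see the loader documentation.
[initialize.components.transformer.piecer_loader]
@model_loaders = "spacy-curated-transformers.HFPieceEncoderLoader.v1"
name = "bert-base-uncased"
```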
### spacy-curated-transformers.ByteBpeEncoder.v1
Construct a Byte-BPE piece encoder model that accepts a list of token sequences
or documents and returns a corresponding list of piece identifiers.
This model must be separately initialized using an appropriate loader.
### spacy-curated-transformers.CamembertSentencepieceEncoder.v1
Construct a SentencePiece piece encoder model that accepts a list of token
sequences or documents and returns a corresponding list of piece identifiers
with CamemBERT post-processing applied.
This model must be separately initialized using an appropriate loader.
### spacy-curated-transformers.CharEncoder.v1
Construct a character piece encoder model that accepts a list of token sequences
or documents and returns a corresponding list of piece identifiers.
This model must be separately initialized using an appropriate loader.
### spacy-curated-transformers.SentencepieceEncoder.v1
Construct a SentencePiece piece encoder model that accepts a list of token
sequences or documents and returns a corresponding list of piece identifiers
with CamemBERT post-processing applied.
This model must be separately initialized using an appropriate loader.
### spacy-curated-transformers.WordpieceEncoder.v1
Construct a WordPiece piece encoder model that accepts a list of token sequences
or documents and returns a corresponding list of piece identifiers. This encoder
also splits each token on punctuation characters, as expected by most BERT
models.
This model must be separately initialized using an appropriate loader.
### spacy-curated-transformers.XlmrSentencepieceEncoder.v1
Construct a SentencePiece piece encoder model that accepts a list of token
sequences or documents and returns a corresponding list of piece identifiers
with XLM-RoBERTa post-processing applied.
This model must be separately initialized using an appropriate loader.
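As an alternative to loading from the Hugging Face Hub, a SentencePiece-based
encoder such as this one could be initialized from a local model file; the
`SentencepieceLoader.v1` name and its `path` argument are assumptions for
illustration.

```ini
# Sketch: initializing a SentencePiece piece encoder from a local .model file.
# Loader name and argument are assumptions, not confirmed API.
[initialize.components.transformer.piecer_loader]
@model_loaders = "spacy-curated-transformers.SentencepieceLoader.v1"
path = "corpus/sentencepiece.model"
```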
## Pretraining architectures {id="pretrain",source="spacy/ml/models/multi_task.py"}
@@ -826,7 +813,7 @@ objective for a Tok2Vec layer. To use this objective, make sure that the
vectors.
| Name | Description |
| --------------- | --------------------------------------------------------------------------------------------------------------------------------------------------------- |
| `maxout_pieces` | The number of maxout pieces to use. Recommended values are `2` or `3`. ~~int~~ |
| `hidden_size` | Size of the hidden layer of the model. ~~int~~ |
| `loss` | The loss function can be either "cosine" or "L2". We typically recommend using "cosine". ~~str~~ |