Fix piece_encoder entries

shadeMe 2023-07-20 13:15:37 +02:00
parent a282aec814
commit d8722877cb


@ -484,9 +484,9 @@ The other arguments are shared between all versions.
## Curated transformer architectures {id="curated-trf",source="https://github.com/explosion/spacy-curated-transformers/blob/main/spacy_curated_transformers/models/architectures.py"}

The following architectures are provided by the package
[`spacy-curated-transformers`](https://github.com/explosion/spacy-curated-transformers).
See the [usage documentation](/usage/embeddings-transformers#transformers) for
how to integrate the architectures into your training config.
<Infobox variant="warning">
@ -503,11 +503,10 @@ for details and system requirements.
Construct an ALBERT transformer model.

| Name                           | Description                                                    |
| ------------------------------ | -------------------------------------------------------------- |
| `vocab_size`                   | Vocabulary size. ~~int~~                                        |
| `with_spans`                   | Callback that constructs a span generator model. ~~Callable~~  |
| `piece_encoder`                | The piece encoder to segment input tokens. ~~Model~~           |
| `attention_probs_dropout_prob` | Dropout probability of the self-attention layers. ~~float~~    |
| `embedding_width`              | Width of the embedding representations. ~~int~~                |
| `hidden_act`                   | Activation used by the point-wise feed-forward layers. ~~str~~ |
@ -533,11 +532,10 @@ Construct an ALBERT transformer model.
Construct a BERT transformer model.

| Name                           | Description                                                                         |
| ------------------------------ | ----------------------------------------------------------------------------------- |
| `vocab_size`                   | Vocabulary size. ~~int~~                                                             |
| `with_spans`                   | Callback that constructs a span generator model. ~~Callable~~                       |
| `piece_encoder`                | The piece encoder to segment input tokens. ~~Model~~                                |
| `attention_probs_dropout_prob` | Dropout probability of the self-attention layers. ~~float~~                         |
| `hidden_act`                   | Activation used by the point-wise feed-forward layers. ~~str~~                      |
| `hidden_dropout_prob`          | Dropout probability of the point-wise feed-forward and embedding layers. ~~float~~ |
@ -561,11 +559,10 @@ Construct a BERT transformer model.
Construct a CamemBERT transformer model.

| Name                           | Description                                                                         |
| ------------------------------ | ----------------------------------------------------------------------------------- |
| `vocab_size`                   | Vocabulary size. ~~int~~                                                             |
| `with_spans`                   | Callback that constructs a span generator model. ~~Callable~~                       |
| `piece_encoder`                | The piece encoder to segment input tokens. ~~Model~~                                |
| `attention_probs_dropout_prob` | Dropout probability of the self-attention layers. ~~float~~                         |
| `hidden_act`                   | Activation used by the point-wise feed-forward layers. ~~str~~                      |
| `hidden_dropout_prob`          | Dropout probability of the point-wise feed-forward and embedding layers. ~~float~~ |
@ -589,11 +586,10 @@ Construct a CamemBERT transformer model.
Construct a RoBERTa transformer model.

| Name                           | Description                                                                         |
| ------------------------------ | ----------------------------------------------------------------------------------- |
| `vocab_size`                   | Vocabulary size. ~~int~~                                                             |
| `with_spans`                   | Callback that constructs a span generator model. ~~Callable~~                       |
| `piece_encoder`                | The piece encoder to segment input tokens. ~~Model~~                                |
| `attention_probs_dropout_prob` | Dropout probability of the self-attention layers. ~~float~~                         |
| `hidden_act`                   | Activation used by the point-wise feed-forward layers. ~~str~~                      |
| `hidden_dropout_prob`          | Dropout probability of the point-wise feed-forward and embedding layers. ~~float~~ |
@ -612,17 +608,15 @@ Construct a RoBERTa transformer model.
| `grad_scaler_config`           | Configuration passed to the PyTorch gradient scaler. ~~dict~~               |
| **CREATES**                    | The model using the architecture ~~Model[TransformerInT, TransformerOutT]~~ |
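For orientation, a RoBERTa model block in the training config could look
roughly like the sketch below. The `WithStridedSpans.v1` span generator, the
byte-BPE encoder pairing and the vocabulary size are illustrative assumptions
rather than values taken from this table:

```ini
# Hypothetical config sketch; adjust names and sizes to your pipeline.
[components.transformer.model]
@architectures = "spacy-curated-transformers.RobertaTransformer.v1"
vocab_size = 50265
hidden_dropout_prob = 0.1
attention_probs_dropout_prob = 0.1

# Assumed span generator callback.
[components.transformer.model.with_spans]
@architectures = "spacy-curated-transformers.WithStridedSpans.v1"

# RoBERTa checkpoints are typically paired with a byte-BPE piece encoder.
[components.transformer.model.piece_encoder]
@architectures = "spacy-curated-transformers.ByteBpeEncoder.v1"
```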
### spacy-curated-transformers.XlmrTransformer.v1

Construct an XLM-RoBERTa transformer model.
| Name                           | Description                                                                         |
| ------------------------------ | ----------------------------------------------------------------------------------- |
| `vocab_size`                   | Vocabulary size. ~~int~~                                                             |
| `with_spans`                   | Callback that constructs a span generator model. ~~Callable~~                       |
| `piece_encoder`                | The piece encoder to segment input tokens. ~~Model~~                                |
| `attention_probs_dropout_prob` | Dropout probability of the self-attention layers. ~~float~~                         |
| `hidden_act`                   | Activation used by the point-wise feed-forward layers. ~~str~~                      |
| `hidden_dropout_prob`          | Dropout probability of the point-wise feed-forward and embedding layers. ~~float~~ |
@ -641,13 +635,13 @@ Construct a XLM-RoBERTa transformer model.
| `grad_scaler_config`           | Configuration passed to the PyTorch gradient scaler. ~~dict~~               |
| **CREATES**                    | The model using the architecture ~~Model[TransformerInT, TransformerOutT]~~ |
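To make the `piece_encoder` wiring concrete, here is a hedged sketch of a full
transformer component built on this architecture. The `curated_transformer`
factory name and the `WithStridedSpans.v1` callback are assumptions about the
`spacy-curated-transformers` package, and the vocabulary size is only an
example:

```ini
# Sketch of a curated transformer component; the factory and span generator
# names are assumptions and may differ in your installed version.
[components.transformer]
factory = "curated_transformer"

[components.transformer.model]
@architectures = "spacy-curated-transformers.XlmrTransformer.v1"
vocab_size = 250002

[components.transformer.model.with_spans]
@architectures = "spacy-curated-transformers.WithStridedSpans.v1"

# SentencePiece encoder with XLM-RoBERTa post-processing (documented below).
[components.transformer.model.piece_encoder]
@architectures = "spacy-curated-transformers.XlmrSentencepieceEncoder.v1"
```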
### spacy-curated-transformers.ScalarWeight.v1

Construct a model that accepts a list of transformer layer outputs and returns
a weighted representation of those outputs.

| Name              | Description                                  |
| ----------------- | -------------------------------------------- |
| `num_layers`      | Number of transformer hidden layers. ~~int~~ |
| `dropout_prob`    | Dropout probability. ~~float~~               |
| `mixed_precision` | Use mixed-precision training. ~~bool~~       |
@ -656,20 +650,20 @@ Construct a model that accepts a list of transformer layer outputs and returns a
### spacy-curated-transformers.TransformerLayersListener.v1

Construct a listener layer that communicates with one or more upstream
Transformer components. This layer extracts the output of the last transformer
layer and performs pooling over the individual pieces of each Doc token,
returning their corresponding representations. The upstream name should either
be the wildcard string '\*', or the name of the Transformer component.

In almost all cases, the wildcard string will suffice as there'll only be one
upstream Transformer component. But in certain situations, e.g. when you have
disjoint datasets for certain tasks, or you'd like to use a pre-trained pipeline
but a downstream task requires its own token representations, you could end up
with more than one Transformer component in the pipeline.

| Name      | Description                                                                                                  |
| --------- | ------------------------------------------------------------------------------------------------------------ |
| `layers`  | The number of layers produced by the upstream transformer component, excluding the embedding layer. ~~int~~  |
| `width`   | The width of the vectors produced by the upstream transformer component. ~~int~~                             |
| `pooling` | Model that is used to perform pooling over the piece representations. ~~Model~~                              |
@ -677,47 +671,47 @@ more than one Transformer component in the pipeline.
| `grad_factor` | Factor to multiply gradients with. ~~float~~                                                                            |
| **CREATES**   | A model that returns the relevant vectors from an upstream transformer component. ~~Model[List[Doc], List[Floats2d]]~~ |
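As a rough sketch, a downstream component could embed its tokens through this
listener as follows. The `tagger` component, the layer count and width
(matching a hypothetical base-sized transformer) and the `reduce_mean.v1`
pooling layer are assumptions for illustration:

```ini
# Hypothetical listener wiring for a downstream tagger component.
[components.tagger.model.tok2vec]
@architectures = "spacy-curated-transformers.TransformerLayersListener.v1"
layers = 12
width = 768
upstream_name = "*"
grad_factor = 1.0

# Mean-pool the piece vectors of each token.
[components.tagger.model.tok2vec.pooling]
@layers = "reduce_mean.v1"
```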
### spacy-curated-transformers.LastTransformerLayerListener.v1

Construct a listener layer that communicates with one or more upstream
Transformer components. This layer extracts the output of the last transformer
layer and performs pooling over the individual pieces of each Doc token,
returning their corresponding representations. The upstream name should either
be the wildcard string '\*', or the name of the Transformer component.

In almost all cases, the wildcard string will suffice as there'll only be one
upstream Transformer component. But in certain situations, e.g. when you have
disjoint datasets for certain tasks, or you'd like to use a pre-trained pipeline
but a downstream task requires its own token representations, you could end up
with more than one Transformer component in the pipeline.

| Name            | Description                                                                                                             |
| --------------- | ------------------------------------------------------------------------------------------------------------------------ |
| `width`         | The width of the vectors produced by the upstream transformer component. ~~int~~                                        |
| `pooling`       | Model that is used to perform pooling over the piece representations. ~~Model~~                                         |
| `upstream_name` | A string to identify the 'upstream' Transformer component to communicate with. ~~str~~                                  |
| `grad_factor`   | Factor to multiply gradients with. ~~float~~                                                                             |
| **CREATES**     | A model that returns the relevant vectors from an upstream transformer component. ~~Model[List[Doc], List[Floats2d]]~~  |
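The same listener pattern with an explicit upstream name could look like the
sketch below; the `ner` component name and the width are illustrative
assumptions:

```ini
# Hypothetical listener that targets a specific transformer component by name.
[components.ner.model.tok2vec]
@architectures = "spacy-curated-transformers.LastTransformerLayerListener.v1"
width = 768
upstream_name = "transformer"
grad_factor = 1.0

[components.ner.model.tok2vec.pooling]
@layers = "reduce_mean.v1"
```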
### spacy-curated-transformers.ScalarWeightingListener.v1

Construct a listener layer that communicates with one or more upstream
Transformer components. This layer calculates a weighted representation of all
transformer layer outputs and performs pooling over the individual pieces of
each Doc token, returning their corresponding representations.

Requires its upstream Transformer components to return all layer outputs from
their models. The upstream name should either be the wildcard string '\*', or
the name of the Transformer component.

In almost all cases, the wildcard string will suffice as there'll only be one
upstream Transformer component. But in certain situations, e.g. when you have
disjoint datasets for certain tasks, or you'd like to use a pre-trained pipeline
but a downstream task requires its own token representations, you could end up
with more than one Transformer component in the pipeline.

| Name        | Description                                                                            |
| ----------- | ---------------------------------------------------------------------------------------- |
| `width`     | The width of the vectors produced by the upstream transformer component. ~~int~~         |
| `weighting` | Model that is used to perform the weighting of the different layer outputs. ~~Model~~    |
| `pooling`   | Model that is used to perform pooling over the piece representations. ~~Model~~          |
@ -727,66 +721,59 @@ more than one Transformer component in the pipeline.
### spacy-curated-transformers.BertWordpieceEncoder.v1

Construct a WordPiece piece encoder model that accepts a list of token
sequences or documents and returns a corresponding list of piece identifiers.
This encoder also splits each token on punctuation characters, as expected by
most BERT models.

This model must be separately initialized using an appropriate loader.
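As a hedged sketch of what "separately initialized using an appropriate loader"
can look like in practice, the block below pairs the encoder with a Hugging
Face piece-encoder loader in the `[initialize]` section. The `piecer_loader`
key, the `@model_loaders` registry and the `HFPieceEncoderLoader.v1` name are
assumptions about the `spacy-curated-transformers` loader API; consult the
package's loader documentation for the exact entries:

```ini
# The encoder itself, as the transformer model's piece_encoder sub-block.
[components.transformer.model.piece_encoder]
@architectures = "spacy-curated-transformers.BertWordpieceEncoder.v1"

# Assumed loader entry that fills the encoder's vocabulary at initialization.
[initialize.components.transformer.piecer_loader]
@model_loaders = "spacy-curated-transformers.HFPieceEncoderLoader.v1"
name = "bert-base-uncased"
```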
### spacy-curated-transformers.ByteBpeEncoder.v1

Construct a Byte-BPE piece encoder model that accepts a list of token sequences
or documents and returns a corresponding list of piece identifiers.

This model must be separately initialized using an appropriate loader.
### spacy-curated-transformers.CamembertSentencepieceEncoder.v1

Construct a SentencePiece piece encoder model that accepts a list of token
sequences or documents and returns a corresponding list of piece identifiers
with CamemBERT post-processing applied.

This model must be separately initialized using an appropriate loader.
### spacy-curated-transformers.CharEncoder.v1

Construct a character piece encoder model that accepts a list of token
sequences or documents and returns a corresponding list of piece identifiers.

This model must be separately initialized using an appropriate loader.
### spacy-curated-transformers.SentencepieceEncoder.v1

Construct a SentencePiece piece encoder model that accepts a list of token
sequences or documents and returns a corresponding list of piece identifiers.

This model must be separately initialized using an appropriate loader.
### spacy-curated-transformers.WordpieceEncoder.v1

Construct a WordPiece piece encoder model that accepts a list of token
sequences or documents and returns a corresponding list of piece identifiers.
This encoder also splits each token on punctuation characters, as expected by
most BERT models.

This model must be separately initialized using an appropriate loader.
### spacy-curated-transformers.XlmrSentencepieceEncoder.v1

Construct a SentencePiece piece encoder model that accepts a list of token
sequences or documents and returns a corresponding list of piece identifiers
with XLM-RoBERTa post-processing applied.

This model must be separately initialized using an appropriate loader.
## Pretraining architectures {id="pretrain",source="spacy/ml/models/multi_task.py"}
@ -826,7 +813,7 @@ objective for a Tok2Vec layer. To use this objective, make sure that the
vectors.
| Name            | Description                                                                                       |
| --------------- | --------------------------------------------------------------------------------------------------- |
| `maxout_pieces` | The number of maxout pieces to use. Recommended values are `2` or `3`. ~~int~~                      |
| `hidden_size`   | Size of the hidden layer of the model. ~~int~~                                                      |
| `loss`          | The loss function can be either "cosine" or "L2". We typically recommend using "cosine". ~~str~~    |