From cca478152ecaf8a1d180aa3e5227b02c8b405769 Mon Sep 17 00:00:00 2001
From: shadeMe
Date: Thu, 20 Jul 2023 16:05:42 +0200
Subject: [PATCH] Fix duplicate entries in tables

---
 website/docs/api/architectures.mdx | 214 ++++++++++++++---------------
 1 file changed, 102 insertions(+), 112 deletions(-)

diff --git a/website/docs/api/architectures.mdx b/website/docs/api/architectures.mdx
index 59b422020..e0f256332 100644
--- a/website/docs/api/architectures.mdx
+++ b/website/docs/api/architectures.mdx
@@ -492,138 +492,128 @@ how to integrate the architectures into your training config.
 
 Construct an ALBERT transformer model.
 
-| Name | Description |
-| ------------------------------ | --------------------------------------------------------------------------- |
-| `vocab_size` | Vocabulary size. ~~int~~ |
-| `with_spans` | Callback that constructs a span generator model. ~~Callable~~ |
-| `piece_encoder` | The piece encoder to segment input tokens. ~~Model~~ |
-| `attention_probs_dropout_prob` | Dropout probabilty of the self-attention layers. ~~float~~ |
-| `embedding_width` | Width of the embedding representations. ~~int~~ |
-| `hidden_act` | Activation used by the point-wise feed-forward layers. ~~str~~ |
-| `hidden_dropout_prob` | Dropout probabilty of the point-wise feed-forward and ~~float~~ |
-| `hidden_dropout_prob` | embedding layers. ~~float~~ |
-| `hidden_width` | Width of the final representations. ~~int~~ |
-| `intermediate_width` | Width of the intermediate projection layer in the ~~int~~ |
-| `intermediate_width` | point-wise feed-forward layer. ~~int~~ |
-| `layer_norm_eps` | Epsilon for layer normalization. ~~float~~ |
-| `max_position_embeddings` | Maximum length of position embeddings. ~~int~~ |
-| `model_max_length` | Maximum length of model inputs. ~~int~~ |
-| `num_attention_heads` | Number of self-attention heads. ~~int~~ |
-| `num_hidden_groups` | Number of layer groups whose constituents share parameters. ~~int~~ |
-| `num_hidden_layers` | Number of hidden layers. ~~int~~ |
-| `padding_idx` | Index of the padding meta-token. ~~int~~ |
-| `type_vocab_size` | Type vocabulary size. ~~int~~ |
-| `mixed_precision` | Use mixed-precision training. ~~bool~~ |
-| `grad_scaler_config` | Configuration passed to the PyTorch gradient scaler. ~~dict~~ |
-| **CREATES** | The model using the architecture ~~Model[TransformerInT, TransformerOutT]~~ |
+| Name | Description |
+| ------------------------------ | ---------------------------------------------------------------------------------------- |
+| `vocab_size` | Vocabulary size. ~~int~~ |
+| `with_spans` | Callback that constructs a span generator model. ~~Callable~~ |
+| `piece_encoder` | The piece encoder to segment input tokens. ~~Model~~ |
+| `attention_probs_dropout_prob` | Dropout probabilty of the self-attention layers. ~~float~~ |
+| `embedding_width` | Width of the embedding representations. ~~int~~ |
+| `hidden_act` | Activation used by the point-wise feed-forward layers. ~~str~~ |
+| `hidden_dropout_prob` | Dropout probabilty of the point-wise feed-forward and embedding layers. ~~float~~ |
+| `hidden_width` | Width of the final representations. ~~int~~ |
+| `intermediate_width` | Width of the intermediate projection layer in the point-wise feed-forward layer. ~~int~~ |
+| `layer_norm_eps` | Epsilon for layer normalization. ~~float~~ |
+| `max_position_embeddings` | Maximum length of position embeddings. ~~int~~ |
+| `model_max_length` | Maximum length of model inputs. ~~int~~ |
+| `num_attention_heads` | Number of self-attention heads. ~~int~~ |
+| `num_hidden_groups` | Number of layer groups whose constituents share parameters. ~~int~~ |
+| `num_hidden_layers` | Number of hidden layers. ~~int~~ |
+| `padding_idx` | Index of the padding meta-token. ~~int~~ |
+| `type_vocab_size` | Type vocabulary size. ~~int~~ |
+| `mixed_precision` | Use mixed-precision training. ~~bool~~ |
+| `grad_scaler_config` | Configuration passed to the PyTorch gradient scaler. ~~dict~~ |
+| **CREATES** | The model using the architecture ~~Model[TransformerInT, TransformerOutT]~~ |
 
 ### spacy-curated-transformers.BertTransformer.v1
 
 Construct a BERT transformer model.
 
-| Name | Description |
-| ------------------------------ | --------------------------------------------------------------------------- |
-| `vocab_size` | Vocabulary size. ~~int~~ |
-| `with_spans` | Callback that constructs a span generator model. ~~Callable~~ |
-| `piece_encoder` | The piece encoder to segment input tokens. ~~Model~~ |
-| `attention_probs_dropout_prob` | Dropout probabilty of the self-attention layers. ~~float~~ |
-| `hidden_act` | Activation used by the point-wise feed-forward layers. ~~str~~ |
-| `hidden_dropout_prob` | Dropout probabilty of the point-wise feed-forward and ~~float~~ |
-| `hidden_dropout_prob` | embedding layers. ~~float~~ |
-| `hidden_width` | Width of the final representations. ~~int~~ |
-| `intermediate_width` | Width of the intermediate projection layer in the ~~int~~ |
-| `intermediate_width` | point-wise feed-forward layer. ~~int~~ |
-| `layer_norm_eps` | Epsilon for layer normalization. ~~float~~ |
-| `max_position_embeddings` | Maximum length of position embeddings. ~~int~~ |
-| `model_max_length` | Maximum length of model inputs. ~~int~~ |
-| `num_attention_heads` | Number of self-attention heads. ~~int~~ |
-| `num_hidden_layers` | Number of hidden layers. ~~int~~ |
-| `padding_idx` | Index of the padding meta-token. ~~int~~ |
-| `type_vocab_size` | Type vocabulary size. ~~int~~ |
-| `mixed_precision` | Use mixed-precision training. ~~bool~~ |
-| `grad_scaler_config` | Configuration passed to the PyTorch gradient scaler. ~~dict~~ |
-| **CREATES** | The model using the architecture ~~Model[TransformerInT, TransformerOutT]~~ |
+| Name | Description |
+| ------------------------------ | ---------------------------------------------------------------------------------------- |
+| `vocab_size` | Vocabulary size. ~~int~~ |
+| `with_spans` | Callback that constructs a span generator model. ~~Callable~~ |
+| `piece_encoder` | The piece encoder to segment input tokens. ~~Model~~ |
+| `attention_probs_dropout_prob` | Dropout probabilty of the self-attention layers. ~~float~~ |
+| `hidden_act` | Activation used by the point-wise feed-forward layers. ~~str~~ |
+| `hidden_dropout_prob` | Dropout probabilty of the point-wise feed-forward and embedding layers. ~~float~~ |
+| `hidden_width` | Width of the final representations. ~~int~~ |
+| `intermediate_width` | Width of the intermediate projection layer in the point-wise feed-forward layer. ~~int~~ |
+| `layer_norm_eps` | Epsilon for layer normalization. ~~float~~ |
+| `max_position_embeddings` | Maximum length of position embeddings. ~~int~~ |
+| `model_max_length` | Maximum length of model inputs. ~~int~~ |
+| `num_attention_heads` | Number of self-attention heads. ~~int~~ |
+| `num_hidden_layers` | Number of hidden layers. ~~int~~ |
+| `padding_idx` | Index of the padding meta-token. ~~int~~ |
+| `type_vocab_size` | Type vocabulary size. ~~int~~ |
+| `mixed_precision` | Use mixed-precision training. ~~bool~~ |
+| `grad_scaler_config` | Configuration passed to the PyTorch gradient scaler. ~~dict~~ |
+| **CREATES** | The model using the architecture ~~Model[TransformerInT, TransformerOutT]~~ |
 
 ### spacy-curated-transformers.CamembertTransformer.v1
 
 Construct a CamemBERT transformer model.
 
-| Name | Description |
-| ------------------------------ | --------------------------------------------------------------------------- |
-| `vocab_size` | Vocabulary size. ~~int~~ |
-| `with_spans` | Callback that constructs a span generator model. ~~Callable~~ |
-| `piece_encoder` | The piece encoder to segment input tokens. ~~Model~~ |
-| `attention_probs_dropout_prob` | Dropout probabilty of the self-attention layers. ~~float~~ |
-| `hidden_act` | Activation used by the point-wise feed-forward layers. ~~str~~ |
-| `hidden_dropout_prob` | Dropout probabilty of the point-wise feed-forward and ~~float~~ |
-| `hidden_dropout_prob` | embedding layers. ~~float~~ |
-| `hidden_width` | Width of the final representations. ~~int~~ |
-| `intermediate_width` | Width of the intermediate projection layer in the ~~int~~ |
-| `intermediate_width` | point-wise feed-forward layer. ~~int~~ |
-| `layer_norm_eps` | Epsilon for layer normalization. ~~float~~ |
-| `max_position_embeddings` | Maximum length of position embeddings. ~~int~~ |
-| `model_max_length` | Maximum length of model inputs. ~~int~~ |
-| `num_attention_heads` | Number of self-attention heads. ~~int~~ |
-| `num_hidden_layers` | Number of hidden layers. ~~int~~ |
-| `padding_idx` | Index of the padding meta-token. ~~int~~ |
-| `type_vocab_size` | Type vocabulary size. ~~int~~ |
-| `mixed_precision` | Use mixed-precision training. ~~bool~~ |
-| `grad_scaler_config` | Configuration passed to the PyTorch gradient scaler. ~~dict~~ |
-| **CREATES** | The model using the architecture ~~Model[TransformerInT, TransformerOutT]~~ |
+| Name | Description |
+| ------------------------------ | ---------------------------------------------------------------------------------------- |
+| `vocab_size` | Vocabulary size. ~~int~~ |
+| `with_spans` | Callback that constructs a span generator model. ~~Callable~~ |
+| `piece_encoder` | The piece encoder to segment input tokens. ~~Model~~ |
+| `attention_probs_dropout_prob` | Dropout probabilty of the self-attention layers. ~~float~~ |
+| `hidden_act` | Activation used by the point-wise feed-forward layers. ~~str~~ |
+| `hidden_dropout_prob` | Dropout probabilty of the point-wise feed-forward and embedding layers. ~~float~~ |
+| `hidden_width` | Width of the final representations. ~~int~~ |
+| `intermediate_width` | Width of the intermediate projection layer in the point-wise feed-forward layer. ~~int~~ |
+| `layer_norm_eps` | Epsilon for layer normalization. ~~float~~ |
+| `max_position_embeddings` | Maximum length of position embeddings. ~~int~~ |
+| `model_max_length` | Maximum length of model inputs. ~~int~~ |
+| `num_attention_heads` | Number of self-attention heads. ~~int~~ |
+| `num_hidden_layers` | Number of hidden layers. ~~int~~ |
+| `padding_idx` | Index of the padding meta-token. ~~int~~ |
+| `type_vocab_size` | Type vocabulary size. ~~int~~ |
+| `mixed_precision` | Use mixed-precision training. ~~bool~~ |
+| `grad_scaler_config` | Configuration passed to the PyTorch gradient scaler. ~~dict~~ |
+| **CREATES** | The model using the architecture ~~Model[TransformerInT, TransformerOutT]~~ |
 
 ### spacy-curated-transformers.RobertaTransformer.v1
 
 Construct a RoBERTa transformer model.
 
-| Name | Description |
-| ------------------------------ | --------------------------------------------------------------------------- |
-| `vocab_size` | Vocabulary size. ~~int~~ |
-| `with_spans` | Callback that constructs a span generator model. ~~Callable~~ |
-| `piece_encoder` | The piece encoder to segment input tokens. ~~Model~~ |
-| `attention_probs_dropout_prob` | Dropout probabilty of the self-attention layers. ~~float~~ |
-| `hidden_act` | Activation used by the point-wise feed-forward layers. ~~str~~ |
-| `hidden_dropout_prob` | Dropout probabilty of the point-wise feed-forward and ~~float~~ |
-| `hidden_dropout_prob` | embedding layers. ~~float~~ |
-| `hidden_width` | Width of the final representations. ~~int~~ |
-| `intermediate_width` | Width of the intermediate projection layer in the ~~int~~ |
-| `intermediate_width` | point-wise feed-forward layer. ~~int~~ |
-| `layer_norm_eps` | Epsilon for layer normalization. ~~float~~ |
-| `max_position_embeddings` | Maximum length of position embeddings. ~~int~~ |
-| `model_max_length` | Maximum length of model inputs. ~~int~~ |
-| `num_attention_heads` | Number of self-attention heads. ~~int~~ |
-| `num_hidden_layers` | Number of hidden layers. ~~int~~ |
-| `padding_idx` | Index of the padding meta-token. ~~int~~ |
-| `type_vocab_size` | Type vocabulary size. ~~int~~ |
-| `mixed_precision` | Use mixed-precision training. ~~bool~~ |
-| `grad_scaler_config` | Configuration passed to the PyTorch gradient scaler. ~~dict~~ |
-| **CREATES** | The model using the architecture ~~Model[TransformerInT, TransformerOutT]~~ |
+| Name | Description |
+| ------------------------------ | ---------------------------------------------------------------------------------------- |
+| `vocab_size` | Vocabulary size. ~~int~~ |
+| `with_spans` | Callback that constructs a span generator model. ~~Callable~~ |
+| `piece_encoder` | The piece encoder to segment input tokens. ~~Model~~ |
+| `attention_probs_dropout_prob` | Dropout probabilty of the self-attention layers. ~~float~~ |
+| `hidden_act` | Activation used by the point-wise feed-forward layers. ~~str~~ |
+| `hidden_dropout_prob` | Dropout probabilty of the point-wise feed-forward and embedding layers. ~~float~~ |
+| `hidden_width` | Width of the final representations. ~~int~~ |
+| `intermediate_width` | Width of the intermediate projection layer in the point-wise feed-forward layer. ~~int~~ |
+| `layer_norm_eps` | Epsilon for layer normalization. ~~float~~ |
+| `max_position_embeddings` | Maximum length of position embeddings. ~~int~~ |
+| `model_max_length` | Maximum length of model inputs. ~~int~~ |
+| `num_attention_heads` | Number of self-attention heads. ~~int~~ |
+| `num_hidden_layers` | Number of hidden layers. ~~int~~ |
+| `padding_idx` | Index of the padding meta-token. ~~int~~ |
+| `type_vocab_size` | Type vocabulary size. ~~int~~ |
+| `mixed_precision` | Use mixed-precision training. ~~bool~~ |
+| `grad_scaler_config` | Configuration passed to the PyTorch gradient scaler. ~~dict~~ |
+| **CREATES** | The model using the architecture ~~Model[TransformerInT, TransformerOutT]~~ |
 
 ### spacy-curated-transformers.XlmrTransformer.v1
 
 Construct a XLM-RoBERTa transformer model.
 
-| Name | Description |
-| ------------------------------ | --------------------------------------------------------------------------- |
-| `vocab_size` | Vocabulary size. ~~int~~ |
-| `with_spans` | Callback that constructs a span generator model. ~~Callable~~ |
-| `piece_encoder` | The piece encoder to segment input tokens. ~~Model~~ |
-| `attention_probs_dropout_prob` | Dropout probabilty of the self-attention layers. ~~float~~ |
-| `hidden_act` | Activation used by the point-wise feed-forward layers. ~~str~~ |
-| `hidden_dropout_prob` | Dropout probabilty of the point-wise feed-forward and ~~float~~ |
-| `hidden_dropout_prob` | embedding layers. ~~float~~ |
-| `hidden_width` | Width of the final representations. ~~int~~ |
-| `intermediate_width` | Width of the intermediate projection layer in the ~~int~~ |
-| `intermediate_width` | point-wise feed-forward layer. ~~int~~ |
-| `layer_norm_eps` | Epsilon for layer normalization. ~~float~~ |
-| `max_position_embeddings` | Maximum length of position embeddings. ~~int~~ |
-| `model_max_length` | Maximum length of model inputs. ~~int~~ |
-| `num_attention_heads` | Number of self-attention heads. ~~int~~ |
-| `num_hidden_layers` | Number of hidden layers. ~~int~~ |
-| `padding_idx` | Index of the padding meta-token. ~~int~~ |
-| `type_vocab_size` | Type vocabulary size. ~~int~~ |
-| `mixed_precision` | Use mixed-precision training. ~~bool~~ |
-| `grad_scaler_config` | Configuration passed to the PyTorch gradient scaler. ~~dict~~ |
-| **CREATES** | The model using the architecture ~~Model[TransformerInT, TransformerOutT]~~ |
+| Name | Description |
+| ------------------------------ | ---------------------------------------------------------------------------------------- |
+| `vocab_size` | Vocabulary size. ~~int~~ |
+| `with_spans` | Callback that constructs a span generator model. ~~Callable~~ |
+| `piece_encoder` | The piece encoder to segment input tokens. ~~Model~~ |
+| `attention_probs_dropout_prob` | Dropout probabilty of the self-attention layers. ~~float~~ |
+| `hidden_act` | Activation used by the point-wise feed-forward layers. ~~str~~ |
+| `hidden_dropout_prob` | Dropout probabilty of the point-wise feed-forward and embedding layers. ~~float~~ |
+| `hidden_width` | Width of the final representations. ~~int~~ |
+| `intermediate_width` | Width of the intermediate projection layer in the point-wise feed-forward layer. ~~int~~ |
+| `layer_norm_eps` | Epsilon for layer normalization. ~~float~~ |
+| `max_position_embeddings` | Maximum length of position embeddings. ~~int~~ |
+| `model_max_length` | Maximum length of model inputs. ~~int~~ |
+| `num_attention_heads` | Number of self-attention heads. ~~int~~ |
+| `num_hidden_layers` | Number of hidden layers. ~~int~~ |
+| `padding_idx` | Index of the padding meta-token. ~~int~~ |
+| `type_vocab_size` | Type vocabulary size. ~~int~~ |
+| `mixed_precision` | Use mixed-precision training. ~~bool~~ |
+| `grad_scaler_config` | Configuration passed to the PyTorch gradient scaler. ~~dict~~ |
+| **CREATES** | The model using the architecture ~~Model[TransformerInT, TransformerOutT]~~ |
 
 ### spacy-curated-transformers.ScalarWeight.v1
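
For orientation, the architectures documented in the tables above are referenced from a spaCy training config via `@architectures`. Below is a minimal, illustrative sketch for `spacy-curated-transformers.XlmrTransformer.v1`, assuming a pipeline component named `transformer`; the hyperparameter values are placeholders in the spirit of an XLM-R base model, not required defaults, and the required `piece_encoder` and `with_spans` arguments (which take their own registered architectures) are omitted.

```ini
# Illustrative sketch only: values are assumptions, not defaults.
# The required `piece_encoder` and `with_spans` arguments would be filled by
# their own registered architectures in nested blocks such as
# [components.transformer.model.piece_encoder] (omitted here).
[components.transformer.model]
@architectures = "spacy-curated-transformers.XlmrTransformer.v1"
vocab_size = 250002
hidden_width = 768
intermediate_width = 3072
num_attention_heads = 12
num_hidden_layers = 12
attention_probs_dropout_prob = 0.1
hidden_dropout_prob = 0.1
hidden_act = "gelu"
layer_norm_eps = 1e-5
max_position_embeddings = 514
model_max_length = 512
padding_idx = 1
type_vocab_size = 1
mixed_precision = false
```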