Update docs and docstring [ci skip]

2025-07-15 18:52:29 +03:00 · 2020-10-05 21:55:27 +02:00 · 2020-10-05 21:55:27 +02:00 · 1a554bdcb1
commit 1a554bdcb1
parent 9614e53b02
2 changed files with 17 additions and 39 deletions
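
Note on the documented behavior: the docstring and table touched by this commit say that each attribute in `attrs` gets its own hash-embedding table, that `rows` must have the same length as `attrs`, and that the per-attribute outputs are concatenated. The sketch below illustrates only that contract in plain Python. It is not spaCy's or Thinc's implementation: `make_tables`, `multi_hash_embed`, the CRC32 hash and the constant table values are all hypothetical stand-ins, and the static-vectors path and final maxout reduction described in the docs are omitted.

```python
import zlib
from typing import Dict, List, Sequence


def make_tables(rows: Sequence[int], width: int) -> List[List[List[float]]]:
    """Build one table per attribute; table i has rows[i] vectors of size `width`.

    Values are deterministic placeholders, not learned weights.
    """
    return [
        [[0.01 * ((r + c) % 7) for c in range(width)] for r in range(n_rows)]
        for n_rows in rows
    ]


def multi_hash_embed(
    token_attrs: Dict[str, str],
    attrs: Sequence[str],
    rows: Sequence[int],
    tables: Sequence[List[List[float]]],
) -> List[float]:
    """Look up one row per attribute via hashing and concatenate the results."""
    if len(attrs) != len(rows):
        raise ValueError("rows must have the same length as attrs")
    output: List[float] = []
    for name, n_rows, table in zip(attrs, rows, tables):
        # Hashing trick: map an arbitrary attribute value to a bounded row index,
        # so the table can stay small even for a very large vocabulary.
        key = f"{name}={token_attrs[name]}".encode("utf8")
        row = zlib.crc32(key) % n_rows
        output.extend(table[row])
    return output


attrs = ["NORM", "PREFIX", "SUFFIX", "SHAPE"]
rows = [2000, 1000, 1000, 1000]  # one entry per attribute
width = 8
tables = make_tables(rows, width)
token = {"NORM": "apple", "PREFIX": "a", "SUFFIX": "ple", "SHAPE": "xxxx"}
vector = multi_hash_embed(token, attrs, rows, tables)
print(len(vector))  # 4 attributes * width 8 = 32
```

In the documented architecture the concatenated vector would then be reduced back to `width` by a maxout layer; here it is left at `len(attrs) * width` to keep the example small.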
--- a/spacy/ml/models/tok2vec.py
+++ b/spacy/ml/models/tok2vec.py
@@ -110,12 +110,12 @@ def MultiHashEmbed(

    The features used can be configured with the 'attrs' argument. The suggested
    attributes are NORM, PREFIX, SUFFIX and SHAPE. This lets the model take into
-    account some subword information, without contruction a fully character-based
+    account some subword information, without construction a fully character-based
    representation. If pretrained vectors are available, they can be included in
    the representation as well, with the vectors table will be kept static
    (i.e. it's not updated).

-    The `width` parameter specifices the output width of the layer and the widths
+    The `width` parameter specifies the output width of the layer and the widths
    of all embedding tables. If static vectors are included, a learned linear
    layer is used to map the vectors to the specified width before concatenating
    it with the other embedding outputs. A single Maxout layer is then used to
--- a/website/docs/api/architectures.md
+++ b/website/docs/api/architectures.md
@@ -142,43 +142,21 @@ argument that connects to the shared `tok2vec` component in the pipeline.
 > ```

 Construct an embedding layer that separately embeds a number of lexical
-attributes using hash embedding, concatenates the results, and passes it
-through a feed-forward subnetwork to build a mixed representations.
-
-The features used can be configured with the 'attrs' argument. The suggested
-attributes are NORM, PREFIX, SUFFIX and SHAPE. This lets the model take into
-account some subword information, without contruction a fully character-based
-representation. If pretrained vectors are available, they can be included in
-the representation as well, with the vectors table will be kept static
-(i.e. it's not updated).
-
-The `width` parameter specifices the output width of the layer and the widths
-of all embedding tables. If static vectors are included, a learned linear
-layer is used to map the vectors to the specified width before concatenating
-it with the other embedding outputs. A single Maxout layer is then used to
-reduce the concatenated vectors to the final width.
-    
-The `rows` parameter controls the number of rows used by the `HashEmbed`
-tables. The HashEmbed layer needs surprisingly few rows, due to its use of
-the hashing trick. Generally between 2000 and 10000 rows is sufficient,
-even for very large vocabularies. A number of rows must be specified for each
-table, so the `rows` list must be of the same length as the `attrs` parameter.
-
-    attrs (list of attr IDs): The token attributes to embed. A separate
-        embedding table will be constructed for each attribute.
-    rows (List[int]): The number of rows in the embedding tables. Must have the
-        same length as attrs.
-    include_static_vectors (bool): Whether to also use static word vectors.
-        Requires a vectors table to be loaded in the Doc objects' vocab.
-
+attributes using hash embedding, concatenates the results, and passes it through
+a feed-forward subnetwork to build a mixed representations. The features used
+can be configured with the `attrs` argument. The suggested attributes are
+`NORM`, `PREFIX`, `SUFFIX` and `SHAPE`. This lets the model take into account
+some subword information, without construction a fully character-based
+representation. If pretrained vectors are available, they can be included in the
+representation as well, with the vectors table will be kept static (i.e. it's
+not updated).

 | Name                     | Description                                                                                                                                                                                                                                                                                                                                                                                                                                        |
-| ------------------------- | ----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------- |
-| `width`                   | The output width. Also used as the width of the embedding tables. Recommended values are between `64` and `300`. ~~int~~ |
-| `attrs`                   | The token attributes to embed. A separate |
-embedding table will be constructed for each attribute. ~~List[Union[int, str]]~~ |
-| `rows`                    | The number of rows for each embedding tables. Can be low, due to the hashing trick. Recommended values are between `1000` and `10000`. ~~List[int]~~ |
-| `include_static_vectors`  | Whether to also use static word vectors. Requires a vectors table to be loaded in the [Doc](/api/doc) objects' vocab. ~~bool~~ |
+| ------------------------ | -------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------- |
+| `width`                  | The output width. Also used as the width of the embedding tables. Recommended values are between `64` and `300`. If static vectors are included, a learned linear layer is used to map the vectors to the specified width before concatenating it with the other embedding outputs. A single maxout layer is then used to reduce the concatenated vectors to the final width. ~~int~~                                                              |
+| `attrs`                  | The token attributes to embed. A separate embedding table will be constructed for each attribute. ~~List[Union[int, str]]~~                                                                                                                                                                                                                                                                                                                        |
+| `rows`                   | The number of rows for each embedding tables. Can be low, due to the hashing trick. Recommended values are between `1000` and `10000`. The layer needs surprisingly few rows, due to its use of the hashing trick. Generally between 2000 and 10000 rows is sufficient, even for very large vocabularies. A number of rows must be specified for each table, so the `rows` list must be of the same length as the `attrs` parameter. ~~List[int]~~ |
+| `include_static_vectors` | Whether to also use static word vectors. Requires a vectors table to be loaded in the [`Doc`](/api/doc) objects' vocab. ~~bool~~                                                                                                                                                                                                                                                                                                                   |
 | **CREATES**              | The model using the architecture. ~~Model[List[Doc], List[Floats2d]]~~                                                                                                                                                                                                                                                                                                                                                                             |

 ### spacy.CharacterEmbed.v1 {#CharacterEmbed}