Update docs and docstring [ci skip]

Ines Montani 2020-10-05 21:55:27 +02:00
parent 9614e53b02
commit 1a554bdcb1
2 changed files with 17 additions and 39 deletions

View File

@@ -110,12 +110,12 @@ def MultiHashEmbed(
     The features used can be configured with the 'attrs' argument. The suggested
     attributes are NORM, PREFIX, SUFFIX and SHAPE. This lets the model take into
-    account some subword information, without contruction a fully character-based
+    account some subword information, without construction a fully character-based
     representation. If pretrained vectors are available, they can be included in
     the representation as well, with the vectors table will be kept static
     (i.e. it's not updated).
 
-    The `width` parameter specifices the output width of the layer and the widths
+    The `width` parameter specifies the output width of the layer and the widths
     of all embedding tables. If static vectors are included, a learned linear
     layer is used to map the vectors to the specified width before concatenating
     it with the other embedding outputs. A single Maxout layer is then used to
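The width logic the docstring describes (map any static vectors to `width`, concatenate all the embedding outputs, then reduce back to `width` with a Maxout layer) can be sketched shape-wise in plain Python. This is a hand-rolled illustration, not spaCy's implementation: the real layer uses learned Thinc weights, which are stubbed here with deterministic random projections.

```python
import random

def linear(vec, out_dim):
    # Stand-in for a learned linear layer: a fixed pseudo-random projection.
    random.seed(len(vec) * out_dim)  # deterministic, for the sketch only
    weights = [[random.random() for _ in vec] for _ in range(out_dim)]
    return [sum(w * x for w, x in zip(row, vec)) for row in weights]

def maxout(vec, out_dim, pieces=3):
    # Stand-in Maxout: project to pieces * out_dim, take the max per group.
    projected = linear(vec, out_dim * pieces)
    return [max(projected[i * pieces:(i + 1) * pieces]) for i in range(out_dim)]

def multi_hash_embed_token(attr_vecs, static_vec, width):
    """Shape sketch for one token.

    attr_vecs: one `width`-dim embedding per attribute (NORM, PREFIX, ...).
    static_vec: a pretrained vector of any dimension, or None.
    """
    parts = list(attr_vecs)
    if static_vec is not None:
        parts.append(linear(static_vec, width))   # map static vectors to `width`
    concat = [x for part in parts for x in part]  # width * len(parts) values
    return maxout(concat, width)                  # reduce back to `width`
```

For example, four attribute embeddings of width 96 plus a 300-dim static vector concatenate to 480 values, and the maxout brings the result back to 96.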

View File

@@ -142,43 +142,21 @@ argument that connects to the shared `tok2vec` component in the pipeline.
 > ```
 
-Construct an embedding layer that separately embeds a number of lexical
-attributes using hash embedding, concatenates the results, and passes it
-through a feed-forward subnetwork to build a mixed representations.
-
-The features used can be configured with the 'attrs' argument. The suggested
-attributes are NORM, PREFIX, SUFFIX and SHAPE. This lets the model take into
-account some subword information, without contruction a fully character-based
-representation. If pretrained vectors are available, they can be included in
-the representation as well, with the vectors table will be kept static
-(i.e. it's not updated).
-
-The `width` parameter specifices the output width of the layer and the widths
-of all embedding tables. If static vectors are included, a learned linear
-layer is used to map the vectors to the specified width before concatenating
-it with the other embedding outputs. A single Maxout layer is then used to
-reduce the concatenated vectors to the final width.
-
-The `rows` parameter controls the number of rows used by the `HashEmbed`
-tables. The HashEmbed layer needs surprisingly few rows, due to its use of
-the hashing trick. Generally between 2000 and 10000 rows is sufficient,
-even for very large vocabularies. A number of rows must be specified for each
-table, so the `rows` list must be of the same length as the `attrs` parameter.
-
-attrs (list of attr IDs): The token attributes to embed. A separate
-embedding table will be constructed for each attribute.
-rows (List[int]): The number of rows in the embedding tables. Must have the
-same length as attrs.
-include_static_vectors (bool): Whether to also use static word vectors.
-Requires a vectors table to be loaded in the Doc objects' vocab.
+Construct an embedding layer that separately embeds a number of lexical
+attributes using hash embedding, concatenates the results, and passes it through
+a feed-forward subnetwork to build a mixed representations. The features used
+can be configured with the `attrs` argument. The suggested attributes are
+`NORM`, `PREFIX`, `SUFFIX` and `SHAPE`. This lets the model take into account
+some subword information, without construction a fully character-based
+representation. If pretrained vectors are available, they can be included in the
+representation as well, with the vectors table will be kept static (i.e. it's
+not updated).
 
 | Name | Description |
-| ------------------------- | ----------- |
-| `width` | The output width. Also used as the width of the embedding tables. Recommended values are between `64` and `300`. ~~int~~ |
+| ------------------------ | ----------- |
+| `width` | The output width. Also used as the width of the embedding tables. Recommended values are between `64` and `300`. If static vectors are included, a learned linear layer is used to map the vectors to the specified width before concatenating it with the other embedding outputs. A single maxout layer is then used to reduce the concatenated vectors to the final width. ~~int~~ |
 | `attrs` | The token attributes to embed. A separate embedding table will be constructed for each attribute. ~~List[Union[int, str]]~~ |
-| `rows` | The number of rows for each embedding tables. Can be low, due to the hashing trick. Recommended values are between `1000` and `10000`. ~~List[int]~~ |
-| `include_static_vectors` | Whether to also use static word vectors. Requires a vectors table to be loaded in the [Doc](/api/doc) objects' vocab. ~~bool~~ |
+| `rows` | The number of rows for each embedding tables. Can be low, due to the hashing trick. Recommended values are between `1000` and `10000`. The layer needs surprisingly few rows, due to its use of the hashing trick. Generally between 2000 and 10000 rows is sufficient, even for very large vocabularies. A number of rows must be specified for each table, so the `rows` list must be of the same length as the `attrs` parameter. ~~List[int]~~ |
+| `include_static_vectors` | Whether to also use static word vectors. Requires a vectors table to be loaded in the [`Doc`](/api/doc) objects' vocab. ~~bool~~ |
 | **CREATES** | The model using the architecture. ~~Model[List[Doc], List[Floats2d]]~~ |
 
 ### spacy.CharacterEmbed.v1 {#CharacterEmbed}
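The `rows` advice in the table above rests on the hashing trick: instead of one row per vocabulary item, each key is hashed into a small fixed number of rows, and `HashEmbed` uses several differently-seeded hashes per key and sums the resulting rows. A minimal pure-Python sketch of that idea (the MD5 bucketing and the seed count here are illustrative, not spaCy's actual implementation):

```python
import hashlib

def bucket(key: str, seed: int, n_rows: int) -> int:
    # Hash a string into one of n_rows buckets; the seed varies the hash.
    digest = hashlib.md5(f"{seed}:{key}".encode("utf8")).hexdigest()
    return int(digest, 16) % n_rows

def hash_embed(key: str, table: list, n_seeds: int = 4) -> list:
    # Sum one row per seed. Two keys that collide under one seed are very
    # unlikely to collide under every seed, so a few thousand rows suffice.
    idxs = [bucket(key, seed, len(table)) for seed in range(n_seeds)]
    dim = len(table[0])
    return [sum(table[i][d] for i in idxs) for d in range(dim)]
```

With 2000 rows and four seeds, the chance that two distinct keys land in the same bucket under all four hashes is on the order of (1/2000)^4, which is why a few thousand rows can serve a vocabulary of millions.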