mirror of https://github.com/explosion/spaCy.git · synced 2024-12-24 17:06:29 +03:00
Update docs and docstring [ci skip]
This commit is contained in:
parent 9614e53b02 · commit 1a554bdcb1
@@ -110,12 +110,12 @@ def MultiHashEmbed(
The features used can be configured with the 'attrs' argument. The suggested
attributes are NORM, PREFIX, SUFFIX and SHAPE. This lets the model take into
account some subword information, without constructing a fully character-based
representation. If pretrained vectors are available, they can be included in
the representation as well, although the vectors table will be kept static
(i.e. it's not updated).

The `width` parameter specifies the output width of the layer and the widths
of all embedding tables. If static vectors are included, a learned linear
layer is used to map the vectors to the specified width before concatenating
it with the other embedding outputs. A single Maxout layer is then used to
reduce the concatenated vectors to the final width.
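To make these parameters concrete, the layer can be constructed and applied directly. This is a minimal sketch, not part of the commit: the import path `spacy.ml.models`, the attribute list and the row counts are illustrative assumptions.

```python
# Hedged sketch: build the layer described above and embed one tokenized Doc.
import spacy
from spacy.ml.models import MultiHashEmbed  # assumed import path

embed = MultiHashEmbed(
    width=96,                                     # output width and width of every table
    attrs=["NORM", "PREFIX", "SUFFIX", "SHAPE"],  # one hash embedding table per attribute
    rows=[5000, 2500, 2500, 2500],                # must be the same length as attrs
    include_static_vectors=False,                 # no vectors table required in the vocab
)

nlp = spacy.blank("en")
docs = [nlp("A short example sentence")]
embed.initialize(docs)         # allocate the hash tables and the Maxout layer
outputs = embed.predict(docs)  # List[Floats2d]: one (n_tokens, width) array per doc
print(outputs[0].shape)        # e.g. (4, 96)
```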
@@ -142,44 +142,22 @@ argument that connects to the shared `tok2vec` component in the pipeline.
Construct an embedding layer that separately embeds a number of lexical
attributes using hash embedding, concatenates the results, and passes it through
a feed-forward subnetwork to build a mixed representation. The features used
can be configured with the `attrs` argument. The suggested attributes are
`NORM`, `PREFIX`, `SUFFIX` and `SHAPE`. This lets the model take into account
some subword information, without constructing a fully character-based
representation. If pretrained vectors are available, they can be included in the
representation as well, although the vectors table will be kept static (i.e. it's
not updated).
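In a pipeline, this layer typically serves as the `embed` step of a Tok2Vec model rather than being used on its own. The sketch below shows that pairing; the helper names `build_Tok2Vec_model` and `MaxoutWindowEncoder`, as well as the widths and encoder settings, are assumptions based on the surrounding tok2vec API and are not part of this diff.

```python
# Hedged sketch: MultiHashEmbed as the embedding step of a full tok2vec model.
from spacy.ml.models import (   # assumed import paths
    MaxoutWindowEncoder,
    MultiHashEmbed,
    build_Tok2Vec_model,
)

width = 96
tok2vec = build_Tok2Vec_model(
    embed=MultiHashEmbed(
        width=width,
        attrs=["NORM", "PREFIX", "SUFFIX", "SHAPE"],
        rows=[5000, 2500, 2500, 2500],   # one entry per attribute
        include_static_vectors=False,
    ),
    # A windowed Maxout encoder contextualizes the concatenated embeddings.
    encode=MaxoutWindowEncoder(
        width=width, window_size=1, maxout_pieces=3, depth=4
    ),
)
```
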
| Name                     | Description |
| ------------------------ | ----------- |
| `width`                  | The output width. Also used as the width of the embedding tables. Recommended values are between `64` and `300`. If static vectors are included, a learned linear layer is used to map the vectors to the specified width before concatenating it with the other embedding outputs. A single Maxout layer is then used to reduce the concatenated vectors to the final width. ~~int~~ |
| `attrs`                  | The token attributes to embed. A separate embedding table will be constructed for each attribute. ~~List[Union[int, str]]~~ |
| `rows`                   | The number of rows for each embedding table. Can be low, due to the hashing trick: generally between `2000` and `10000` rows is sufficient, even for very large vocabularies. A number of rows must be specified for each table, so the `rows` list must be of the same length as the `attrs` parameter. ~~List[int]~~ |
| `include_static_vectors` | Whether to also use static word vectors. Requires a vectors table to be loaded in the [`Doc`](/api/doc) objects' vocab. ~~bool~~ |
| **CREATES**              | The model using the architecture. ~~Model[List[Doc], List[Floats2d]]~~ |
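The same settings can also be requested through the config system, roughly as they would appear under a `[components.tok2vec.model.embed]` block of a training config. The sketch below assumes the architecture is registered as `spacy.MultiHashEmbed.v1` (the version suffix may differ between releases) and uses thinc's `registry.resolve` to build the layer from a plain dict.

```python
# Hedged sketch: resolve the architecture through the registry, mirroring a
# training-config block. The registered name and version are assumptions.
from spacy.util import registry

config = {
    "model": {
        "@architectures": "spacy.MultiHashEmbed.v1",
        "width": 96,
        "attrs": ["NORM", "PREFIX", "SUFFIX", "SHAPE"],
        "rows": [5000, 2500, 2500, 2500],
        "include_static_vectors": False,
    }
}
embed = registry.resolve(config)["model"]  # Model[List[Doc], List[Floats2d]]
```
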
### spacy.CharacterEmbed.v1 {#CharacterEmbed}