mirror of
https://github.com/explosion/spaCy.git
synced 2025-01-12 10:16:27 +03:00
Update docs and docstring [ci skip]
parent 9614e53b02
commit 1a554bdcb1
@@ -110,12 +110,12 @@ def MultiHashEmbed(
 
 The features used can be configured with the 'attrs' argument. The suggested
 attributes are NORM, PREFIX, SUFFIX and SHAPE. This lets the model take into
-account some subword information, without contruction a fully character-based
+account some subword information, without construction a fully character-based
 representation. If pretrained vectors are available, they can be included in
 the representation as well, with the vectors table will be kept static
 (i.e. it's not updated).
 
-The `width` parameter specifices the output width of the layer and the widths
+The `width` parameter specifies the output width of the layer and the widths
 of all embedding tables. If static vectors are included, a learned linear
 layer is used to map the vectors to the specified width before concatenating
 it with the other embedding outputs. A single Maxout layer is then used to
@ -142,43 +142,21 @@ argument that connects to the shared `tok2vec` component in the pipeline.
|
||||||
> ```
|
> ```
|
||||||
|
|
||||||
Construct an embedding layer that separately embeds a number of lexical
|
Construct an embedding layer that separately embeds a number of lexical
|
||||||
attributes using hash embedding, concatenates the results, and passes it
|
attributes using hash embedding, concatenates the results, and passes it through
|
||||||
through a feed-forward subnetwork to build a mixed representations.
|
a feed-forward subnetwork to build a mixed representations. The features used
|
||||||
|
can be configured with the `attrs` argument. The suggested attributes are
|
||||||
The features used can be configured with the 'attrs' argument. The suggested
|
`NORM`, `PREFIX`, `SUFFIX` and `SHAPE`. This lets the model take into account
|
||||||
attributes are NORM, PREFIX, SUFFIX and SHAPE. This lets the model take into
|
some subword information, without construction a fully character-based
|
||||||
account some subword information, without contruction a fully character-based
|
representation. If pretrained vectors are available, they can be included in the
|
||||||
representation. If pretrained vectors are available, they can be included in
|
representation as well, with the vectors table will be kept static (i.e. it's
|
||||||
the representation as well, with the vectors table will be kept static
|
not updated).
|
||||||
(i.e. it's not updated).
|
|
||||||
|
|
||||||
The `width` parameter specifices the output width of the layer and the widths
|
|
||||||
of all embedding tables. If static vectors are included, a learned linear
|
|
||||||
layer is used to map the vectors to the specified width before concatenating
|
|
||||||
it with the other embedding outputs. A single Maxout layer is then used to
|
|
||||||
reduce the concatenated vectors to the final width.
|
|
||||||
|
|
||||||
The `rows` parameter controls the number of rows used by the `HashEmbed`
|
|
||||||
tables. The HashEmbed layer needs surprisingly few rows, due to its use of
|
|
||||||
the hashing trick. Generally between 2000 and 10000 rows is sufficient,
|
|
||||||
even for very large vocabularies. A number of rows must be specified for each
|
|
||||||
table, so the `rows` list must be of the same length as the `attrs` parameter.
|
|
||||||
|
|
||||||
attrs (list of attr IDs): The token attributes to embed. A separate
|
|
||||||
embedding table will be constructed for each attribute.
|
|
||||||
rows (List[int]): The number of rows in the embedding tables. Must have the
|
|
||||||
same length as attrs.
|
|
||||||
include_static_vectors (bool): Whether to also use static word vectors.
|
|
||||||
Requires a vectors table to be loaded in the Doc objects' vocab.
|
|
||||||
|
|
||||||
|
|
||||||
| Name | Description |
|
| Name | Description |
|
||||||
| ------------------------- | ----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------- |
|
| ------------------------ | -------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------- |
|
||||||
| `width` | The output width. Also used as the width of the embedding tables. Recommended values are between `64` and `300`. ~~int~~ |
|
| `width` | The output width. Also used as the width of the embedding tables. Recommended values are between `64` and `300`. If static vectors are included, a learned linear layer is used to map the vectors to the specified width before concatenating it with the other embedding outputs. A single maxout layer is then used to reduce the concatenated vectors to the final width. ~~int~~ |
|
||||||
| `attrs` | The token attributes to embed. A separate |
|
| `attrs` | The token attributes to embed. A separate embedding table will be constructed for each attribute. ~~List[Union[int, str]]~~ |
|
||||||
embedding table will be constructed for each attribute. ~~List[Union[int, str]]~~ |
|
| `rows` | The number of rows for each embedding tables. Can be low, due to the hashing trick. Recommended values are between `1000` and `10000`. The layer needs surprisingly few rows, due to its use of the hashing trick. Generally between 2000 and 10000 rows is sufficient, even for very large vocabularies. A number of rows must be specified for each table, so the `rows` list must be of the same length as the `attrs` parameter. ~~List[int]~~ |
|
||||||
| `rows` | The number of rows for each embedding tables. Can be low, due to the hashing trick. Recommended values are between `1000` and `10000`. ~~List[int]~~ |
|
| `include_static_vectors` | Whether to also use static word vectors. Requires a vectors table to be loaded in the [`Doc`](/api/doc) objects' vocab. ~~bool~~ |
|
||||||
| `include_static_vectors` | Whether to also use static word vectors. Requires a vectors table to be loaded in the [Doc](/api/doc) objects' vocab. ~~bool~~ |
|
|
||||||
| **CREATES** | The model using the architecture. ~~Model[List[Doc], List[Floats2d]]~~ |
|
| **CREATES** | The model using the architecture. ~~Model[List[Doc], List[Floats2d]]~~ |
|
||||||
|
|
||||||
### spacy.CharacterEmbed.v1 {#CharacterEmbed}
|
### spacy.CharacterEmbed.v1 {#CharacterEmbed}
|
||||||
|
|
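The `width` and `rows` descriptions touched by this commit can be illustrated with a small standard-library sketch. It is not spaCy or Thinc code. The helper names `concat_width` and `hashed_row` are hypothetical, the hash is a stand-in chosen for the example rather than the one the library uses, and the parameter values are only illustrative picks from the recommended ranges.

```python
import hashlib
from typing import Sequence


def concat_width(
    width: int,
    attrs: Sequence[str],
    rows: Sequence[int],
    include_static_vectors: bool,
) -> int:
    """Width of the concatenated vector before the final maxout reduction.

    Hypothetical helper: each attribute table outputs `width` values, and
    static vectors (if included) are first mapped to `width` as well.
    """
    if len(rows) != len(attrs):
        raise ValueError("rows must have the same length as attrs")
    n_blocks = len(attrs) + (1 if include_static_vectors else 0)
    return width * n_blocks


def hashed_row(value: str, n_rows: int, seed: int = 0) -> int:
    """Map an attribute value to a row index via a stable hash.

    Stand-in for the hashing trick: any string lands in one of `n_rows`
    rows, so the table size does not grow with the vocabulary.
    """
    digest = hashlib.blake2b(
        value.encode("utf8"),
        digest_size=8,
        salt=seed.to_bytes(8, "little"),
    ).digest()
    return int.from_bytes(digest, "little") % n_rows


# Illustrative values only.
attrs = ["NORM", "PREFIX", "SUFFIX", "SHAPE"]
rows = [5000, 2500, 2500, 2500]

# Four attribute tables plus the mapped static vectors: 96 * 5 = 480.
print(concat_width(96, attrs, rows, include_static_vectors=True))

# The same string always maps to the same row, within the table bounds.
print(hashed_row("apple", rows[0]))
```

The length check mirrors the documented requirement that `rows` has one entry per attribute. The modulo in `hashed_row` shows why a few thousand rows can serve a very large vocabulary: unrelated strings may share a row, and the row count stays fixed regardless of vocabulary size.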