add docs for upstream argument of TransformerListener

svlandeg 2020-11-09 21:42:58 +01:00
parent 363ac73c72
commit 789fb3d124
3 changed files with 15 additions and 12 deletions

diff --git a/spacy/errors.py b/spacy/errors.py

@@ -125,8 +125,9 @@ class Warnings:
 class Errors:
     E001 = ("No component '{name}' found in pipeline. Available names: {opts}")
     E002 = ("Can't find factory for '{name}' for language {lang} ({lang_code}). "
-            "This usually happens when spaCy calls `nlp.{method}` with custom "
+            "This usually happens when spaCy calls `nlp.{method}` with a custom "
             "component name that's not registered on the current language class. "
+            "If you're using a Transformer, make sure to install 'spacy-transformers'. "
             "If you're using a custom component, make sure you've added the "
             "decorator `@Language.component` (for function components) or "
             "`@Language.factory` (for class components).\n\nAvailable "
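The new hint surfaces whenever a factory name can't be resolved. A minimal sketch of how this plays out (assumes spaCy v3 without `spacy-transformers` installed):

```python
import spacy

nlp = spacy.blank("en")

# With spacy-transformers missing, no "transformer" factory is registered,
# so add_pipe raises E002 -- now including the hint added above.
try:
    nlp.add_pipe("transformer")
except ValueError as err:
    print(err)
```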

diff --git a/spacy/ml/models/textcat.py b/spacy/ml/models/textcat.py

@@ -94,7 +94,7 @@ def build_text_classifier(
 # TODO: move to legacy
 @registry.architectures.register("spacy.TextCatEnsemble.v1")
-def build_text_classifier(
+def build_text_classifier_v1(
     width: int,
     embed_size: int,
     pretrained_vectors: Optional[bool],
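Only the Python symbol changes here; the registered architecture name stays the same, so configs referring to `spacy.TextCatEnsemble.v1` still resolve. A quick sketch for illustration:

```python
from spacy import registry

# Look the architecture up by its registered string name; the function
# behind it is now called build_text_classifier_v1.
build_v1 = registry.architectures.get("spacy.TextCatEnsemble.v1")
print(build_v1.__name__)  # build_text_classifier_v1
```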

diff --git a/website/docs/api/architectures.md b/website/docs/api/architectures.md

@@ -143,10 +143,10 @@ argument that connects to the shared `tok2vec` component in the pipeline.
 Construct an embedding layer that separately embeds a number of lexical
 attributes using hash embedding, concatenates the results, and passes it through
-a feed-forward subnetwork to build a mixed representation. The features used
-can be configured with the `attrs` argument. The suggested attributes are
-`NORM`, `PREFIX`, `SUFFIX` and `SHAPE`. This lets the model take into account
-some subword information, without constructing a fully character-based
+a feed-forward subnetwork to build a mixed representation. The features used can
+be configured with the `attrs` argument. The suggested attributes are `NORM`,
+`PREFIX`, `SUFFIX` and `SHAPE`. This lets the model take into account some
+subword information, without constructing a fully character-based
 representation. If pretrained vectors are available, they can be included in the
 representation as well, in which case the vectors table will be kept static (i.e. it's
 not updated).
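For illustration, a layer like this is typically assembled through the config system. In the hedged sketch below, only `attrs` and the suggested attribute values come from the text above; the architecture name `spacy.MultiHashEmbed.v1` and the `width`, `rows`, and `include_static_vectors` settings are assumptions that may differ in this version:

```python
from thinc.api import Config

# Hypothetical config for the hash-embedding layer described above.
embed_cfg = """
[model]
@architectures = "spacy.MultiHashEmbed.v1"
width = 64
attrs = ["NORM", "PREFIX", "SUFFIX", "SHAPE"]
rows = [2000, 1000, 1000, 1000]
include_static_vectors = false
"""
config = Config().from_str(embed_cfg)
```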
@@ -393,11 +393,12 @@ operate over wordpieces, which usually don't align one-to-one against spaCy
 tokens. The layer therefore requires a reduction operation in order to calculate
 a single token vector given zero or more wordpiece vectors.
 
 | Name          | Description |
 | ------------- | ----------- |
 | `pooling`     | A reduction layer used to calculate the token vectors based on zero or more wordpiece vectors. If in doubt, mean pooling (see [`reduce_mean`](https://thinc.ai/docs/api-layers#reduce_mean)) is usually a good choice. ~~Model[Ragged, Floats2d]~~ |
 | `grad_factor` | Reweight gradients from the component before passing them upstream. You can set this to `0` to "freeze" the transformer weights with respect to the component, or use it to make some components more significant than others. Leaving it at `1.0` is usually fine. ~~float~~ |
+| `upstream`    | A string to identify the "upstream" `Transformer` component to communicate with. By default, the upstream name is the wildcard string `"*"`, but you could also specify the name of the `Transformer` component. You'll almost never have multiple upstream `Transformer` components, so the wildcard string will almost always be fine. ~~str~~ |
 | **CREATES**   | The model using the architecture. ~~Model[List[Doc], List[Floats2d]]~~ |
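In a pipeline config, `upstream` can then point the listener at one specific component instead of relying on the wildcard. A minimal sketch (the component name `"transformer"` is an assumption about the surrounding pipeline):

```python
from thinc.api import Config

# Tie this listener to a Transformer component named "transformer";
# the default upstream = "*" would match any upstream Transformer.
listener_cfg = """
[model]
@architectures = "spacy-transformers.TransformerListener.v1"
grad_factor = 1.0
upstream = "transformer"

[model.pooling]
@layers = "reduce_mean.v1"
"""
config = Config().from_str(listener_cfg)
```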
### spacy-transformers.Tok2VecTransformer.v1 {#Tok2VecTransformer}
@@ -563,7 +564,8 @@ from the linear model, where it is stored in `model.attrs["multi_label"]`.
 <Accordion title="spacy.TextCatEnsemble.v1 definition" spaced>
 
-The v1 was functionally similar, but used an internal `tok2vec` instead of taking it as an argument.
+The v1 was functionally similar, but used an internal `tok2vec` instead of
+taking it as an argument.
 
 | Name | Description |
 | -------------------- | ----------- |