mirror of https://github.com/explosion/spaCy.git
synced 2025-04-21 01:21:58 +03:00
Fill in the DocTransformerOutput section
parent 921be30331
commit 0d9aa48865
|
@ -398,32 +398,52 @@ serialization by passing in the string names via the `exclude` argument.
|
|||
| `cfg` | The config file. You usually don't want to exclude this. |
|
||||
| `model` | The binary model data. You usually don't want to exclude this. |
|
||||
|
||||
## DocTransformerOutput {id="transformerdata",tag="dataclass"}
|
||||
## DocTransformerOutput {id="doctransformeroutput",tag="dataclass"}
|
||||
|
||||
CuratedTransformer tokens and outputs for one `Doc` object. The transformer
|
||||
models return tensors that refer to a whole padded batch of documents. These
|
||||
tensors are wrapped into the
|
||||
[FullCuratedTransformerBatch](/api/curatedtransformer#fulltransformerbatch)
|
||||
object. The `FullCuratedTransformerBatch` then splits out the per-document data,
|
||||
which is handled by this class. Instances of this class are typically assigned
|
||||
Curated Transformer outputs for one `Doc` object. Stores the dense
|
||||
representations generated by the transformer for each piece identifier. Piece
|
||||
identifiers are grouped by token. Instances of this class are typically assigned
|
||||
to the [`Doc._.trf_data`](/api/curatedtransformer#assigned-attributes) extension
|
||||
attribute.
|
||||
|
||||
| Name | Description |
|
||||
| -------------- | ------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------ |
|
||||
| `tokens` | A slice of the tokens data produced by the tokenizer. This may have several fields, including the token IDs, the texts and the attention mask. See the [`transformers.BatchEncoding`](https://huggingface.co/transformers/main_classes/tokenizer.html#transformers.BatchEncoding) object for details. ~~dict~~ |
|
||||
| `model_output` | The model output from the transformer model, determined by the model and transformer config. New in `spacy-transformers` v1.1.0. ~~transformers.file_utils.ModelOutput~~ |
|
||||
| `tensors` | The `model_output` in the earlier `transformers` tuple format converted using [`ModelOutput.to_tuple()`](https://huggingface.co/transformers/main_classes/output.html#transformers.file_utils.ModelOutput.to_tuple). Returns `Tuple` instead of `List` as of `spacy-transformers` v1.1.0. ~~Tuple[Union[FloatsXd, List[FloatsXd]]]~~ |
|
||||
| `align` | Alignment from the `Doc`'s tokenization to the wordpieces. This is a ragged array, where `align.lengths[i]` indicates the number of wordpiece tokens that token `i` aligns against. The actual indices are provided at `align[i].dataXd`. ~~Ragged~~ |
|
||||
| `width` | The width of the last hidden layer. ~~int~~ |
|
||||
| Name | Description |
|
||||
| ----------------- | ---------------------------------------------------------------------------------------------------------------------------------------------------------------------------------- |
|
||||
| `all_outputs` | List of `Ragged` tensors that correspends to outputs of the different transformer layers. Each tensor element corresponds to a piece identifier's representation. ~~List[Ragged]~~ |
|
||||
| `last_layer_only` | If only the last transformer layer's outputs are preserved. ~~bool~~ |
|
||||
|
||||
### DocTransformerOutput.empty {id="transformerdata-empty",tag="classmethod"}
|
||||
### DocTransformerOutput.embedding_layer {id="doctransformeroutput-embeddinglayer",tag="property"}
|
||||
|
||||
Create an empty `DocTransformerOutput` container.
|
||||
Return the output of the transformer's embedding layer or `None` if
|
||||
`last_layer_only` is `True`.
|
||||
|
||||
| Name | Description |
|
||||
| ----------- | --------------------------------------- |
|
||||
| **RETURNS** | The container. ~~DocTransformerOutput~~ |
|
||||
| Name | Description |
|
||||
| ----------- | -------------------------------------------- |
|
||||
| **RETURNS** | Embedding layer output. ~~Optional[Ragged]~~ |
|
||||
|
||||
### DocTransformerOutput.last_hidden_layer_state {id="doctransformeroutput-lasthiddenlayerstate",tag="property"}
|
||||
|
||||
Return the output of the transformer's last hidden layer.
|
||||
|
||||
| Name | Description |
|
||||
| ----------- | ------------------------------------ |
|
||||
| **RETURNS** | Last hidden layer output. ~~Ragged~~ |
|
||||
|
||||
### DocTransformerOutput.all_hidden_layer_states {id="doctransformeroutput-allhiddenlayerstates",tag="property"}
|
||||
|
||||
Return the outputs of all transformer layers (excluding the embedding layer).
|
||||
|
||||
| Name | Description |
|
||||
| ----------- | -------------------------------------- |
|
||||
| **RETURNS** | Hidden layer outputs. ~~List[Ragged]~~ |
|
||||
|
||||
### DocTransformerOutput.num_outputs {id="doctransformeroutput-numoutputs",tag="property"}
|
||||
|
||||
Return the number of layer outputs stored in the `DocTransformerOutput` instance
|
||||
(including the embedding layer).
|
||||
|
||||
| Name | Description |
|
||||
| ----------- | -------------------------- |
|
||||
| **RETURNS** | Numbef of outputs. ~~int~~ |
|
||||
|
||||
## Span getters {id="span_getters",source="github.com/explosion/spacy-transformers/blob/master/spacy_curated_transformers/span_getters.py"}
|
||||
|
||||
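The relationship between the attributes and properties documented in the added section can be illustrated with a small stand-alone sketch. This is not the library's implementation: `DocOutputSketch` and `LayerOutput` are hypothetical names, plain nested lists stand in for `Ragged`, and the sketch assumes that when every layer is kept, the first entry of `all_outputs` is the embedding layer output.

```python
from dataclasses import dataclass
from typing import List, Optional

# Hypothetical stand-in for thinc's Ragged: one vector per piece identifier.
LayerOutput = List[List[float]]


@dataclass
class DocOutputSketch:
    """Illustrative stand-in for DocTransformerOutput (not the real class)."""

    all_outputs: List[LayerOutput]
    last_layer_only: bool

    @property
    def embedding_layer(self) -> Optional[LayerOutput]:
        # Assumption: index 0 holds the embedding layer when all layers are kept.
        return None if self.last_layer_only else self.all_outputs[0]

    @property
    def last_hidden_layer_state(self) -> LayerOutput:
        # The last stored output is the last hidden layer in both modes.
        return self.all_outputs[-1]

    @property
    def all_hidden_layer_states(self) -> List[LayerOutput]:
        # Skip the embedding layer output when it is stored.
        return self.all_outputs if self.last_layer_only else self.all_outputs[1:]

    @property
    def num_outputs(self) -> int:
        # Counts every stored output, including the embedding layer if present.
        return len(self.all_outputs)


# Embedding output plus two hidden layers for a document with two pieces.
full = DocOutputSketch(
    all_outputs=[[[0.0], [0.1]], [[1.0], [1.1]], [[2.0], [2.1]]],
    last_layer_only=False,
)
# Only the last hidden layer is preserved.
last_only = DocOutputSketch(all_outputs=[[[2.0], [2.1]]], last_layer_only=True)
```

Under these assumptions, `full.num_outputs` counts the embedding layer while `full.all_hidden_layer_states` excludes it, and `last_only.embedding_layer` is `None`, matching the descriptions in the tables above.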