	various fixes
This commit is contained in:
parent 329e490560
commit 556e975a30
@@ -49,8 +49,8 @@ The default config is defined by the pipeline component factory and describes
 how the component should be configured. You can override its settings via the
 `config` argument on [`nlp.add_pipe`](/api/language#add_pipe) or in your
 [`config.cfg` for training](/usage/training#config). See the
-[model architectures](/api/architectures) documentation for details on the
-architectures and their arguments and hyperparameters.
+[model architectures](/api/architectures#transformers) documentation for details
+on the transformer architectures and their arguments and hyperparameters.
 
 > #### Example
 >
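The hunk above is about overriding the component's settings, so a minimal sketch of that override may help when reading the diff. It is illustrative only and not part of the commit; it assumes `spacy-transformers` is installed and that `roberta-base` is just a stand-in checkpoint name:

```python
import spacy

nlp = spacy.blank("en")
# Partial overrides are merged into the component's default config,
# mirroring the [components.transformer] block in config.cfg.
nlp.add_pipe(
    "transformer",
    config={
        "max_batch_items": 2048,               # example override of the default
        "model": {"name": "roberta-base"},     # assumed checkpoint name
    },
)
```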
@@ -61,9 +61,9 @@ architectures and their arguments and hyperparameters.
 > ```
 
 | Setting             | Description                                                                                                                                                                                                                               |
-| ------------------- | ------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------- |
+| ------------------- | ----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------- |
 | `max_batch_items`   | Maximum size of a padded batch. Defaults to `4096`. ~~int~~                                                                                                                                                                               |
-| `annotation_setter` | Function that takes a batch of `Doc` objects and transformer outputs to store the annotations on the `Doc`. Defaults to `trfdata_setter` which sets the `Doc._.transformer_data` attribute. ~~Callable[[List[Doc], FullTransformerBatch], None]~~ |
+| `annotation_setter` | Function that takes a batch of `Doc` objects and transformer outputs to store the annotations on the `Doc`. Defaults to `trfdata_setter` which sets the `Doc._.trf_data` attribute. ~~Callable[[List[Doc], FullTransformerBatch], None]~~ |
 | `model`             | The Thinc [`Model`](https://thinc.ai/docs/api-model) wrapping the transformer. Defaults to [TransformerModel](/api/architectures#TransformerModel). ~~Model[List[Doc], FullTransformerBatch]~~                                            |
 
 ```python
@@ -97,15 +97,16 @@ Construct a `Transformer` component. One or more subsequent spaCy components can
 use the transformer outputs as features in its model, with gradients
 backpropagated to the single shared weights. The activations from the
 transformer are saved in the [`Doc._.trf_data`](#custom-attributes) extension
-attribute. You can also provide a callback to set additional annotations. In
-your application, you would normally use a shortcut for this and instantiate the
-component using its string name and [`nlp.add_pipe`](/api/language#create_pipe).
+attribute by default, but you can provide a different `annotation_setter` to
+customize this behaviour. In your application, you would normally use a shortcut
+and instantiate the component using its string name and
+[`nlp.add_pipe`](/api/language#create_pipe).
 
 | Name                | Description                                                                                                                                                                                                                                        |
-| ------------------- | ---------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------- |
+| ------------------- | -------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------- |
 | `vocab`             | The shared vocabulary. ~~Vocab~~                                                                                                                                                                                                                   |
 | `model`             | The Thinc [`Model`](https://thinc.ai/docs/api-model) wrapping the transformer. Usually you will want to use the [TransformerModel](/api/architectures#TransformerModel) layer for this. ~~Model[List[Doc], FullTransformerBatch]~~                 |
-| `annotation_setter` | Function that takes a batch of `Doc` objects and transformer outputs can set additional annotations on the `Doc`. The `Doc._.transformer_data` attribute is set prior to calling the callback. By default, no annotations are set. ~~Callable[[List[Doc], FullTransformerBatch], None]~~ |
+| `annotation_setter` | Function that takes a batch of `Doc` objects and transformer outputs and stores the annotations on the `Doc`. By default, the function `trfdata_setter` sets the `Doc._.trf_data` attribute. ~~Callable[[List[Doc], FullTransformerBatch], None]~~ |
 | _keyword-only_      |                                                                                                                                                                                                                                                    |
 | `name`              | String name of the component instance. Used to add entries to the `losses` during training. ~~str~~                                                                                                                                                |
 | `max_batch_items`   | Maximum size of a padded batch. Defaults to `128*32`. ~~int~~                                                                                                                                                                                      |
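Since the reworded paragraph leans on the `annotation_setter` callback, here is a rough sketch of what such a callback can look like. It is not part of the commit: the extension name `custom_trf_data` is hypothetical, and the `FullTransformerBatch.doc_data` attribute is an assumption about the spacy-transformers data classes:

```python
from typing import List
from spacy.tokens import Doc
from spacy_transformers.data_classes import FullTransformerBatch

# Hypothetical custom attribute to hold the per-document transformer output.
Doc.set_extension("custom_trf_data", default=None)

def custom_annotation_setter(docs: List[Doc], trf_data: FullTransformerBatch) -> None:
    # The component calls this with the batch of Docs and the full transformer
    # output; doc_data (assumed) splits that output per document.
    for doc, data in zip(docs, trf_data.doc_data):
        doc._.custom_trf_data = data
```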
@@ -204,8 +205,9 @@ modifying them.
 
 Assign the extracted features to the Doc objects. By default, the
 [`TransformerData`](/api/transformer#transformerdata) object is written to the
-[`Doc._.trf_data`](#custom-attributes) attribute. Your annotation_setter
-callback is then called, if provided.
+[`Doc._.trf_data`](#custom-attributes) attribute. This behaviour can be
+customized by providing a different `annotation_setter` argument upon
+construction.
 
 > #### Example
 >
@@ -382,9 +384,8 @@ return tensors that refer to a whole padded batch of documents. These tensors
 are wrapped into the
 [FullTransformerBatch](/api/transformer#fulltransformerbatch) object. The
 `FullTransformerBatch` then splits out the per-document data, which is handled
-by this class. Instances of this class
-are`typically assigned to the [Doc._.trf_data`](/api/transformer#custom-attributes)
-extension attribute.
+by this class. Instances of this class are typically assigned to the
+[`Doc._.trf_data`](/api/transformer#custom-attributes) extension attribute.
 
 | Name      | Description                                                                                                                                                                                                                                                                                                                                             |
 | --------- | ------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------- |
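To make the `TransformerData` discussion concrete, a small illustrative snippet (not part of the commit) showing where the object ends up. It assumes `nlp` already contains an initialized transformer component, and that `tensors` and `align` are the relevant fields of `TransformerData`:

```python
doc = nlp("spaCy now plugs into transformers.")
trf_data = doc._.trf_data                    # a TransformerData object
print(type(trf_data).__name__)               # -> TransformerData
print([t.shape for t in trf_data.tensors])   # per-document transformer outputs (assumed field)
print(trf_data.align.lengths)                # wordpiece rows aligned to each token (assumed field)
```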
@@ -446,8 +447,9 @@ overlap, and you can also omit sections of the Doc if they are not relevant.
 
 Span getters can be referenced in the `[components.transformer.model.get_spans]`
 block of the config to customize the sequences processed by the transformer. You
-can also register custom span getters using the `@spacy.registry.span_getters`
-decorator.
+can also register
+[custom span getters](/usage/embeddings-transformers#transformers-training-custom-settings)
+using the `@spacy.registry.span_getters` decorator.
 
 > #### Example
 >
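As a reading aid for the span-getter paragraph, a sketch of such a registration; the registry name `"custom_sent_spans.v1"` is an arbitrary example, and sentence boundaries are assumed to be set by an earlier component:

```python
from typing import Callable, List
import spacy
from spacy.tokens import Doc, Span

@spacy.registry.span_getters("custom_sent_spans.v1")
def configure_custom_sent_spans() -> Callable[[List[Doc]], List[List[Span]]]:
    # The registered function returns the span getter; the getter maps each
    # Doc to the list of spans the transformer should process.
    def get_sent_spans(docs: List[Doc]) -> List[List[Span]]:
        return [list(doc.sents) for doc in docs]
    return get_sent_spans
```

The getter would then be referenced from the `[components.transformer.model.get_spans]` block via `@span_getters = "custom_sent_spans.v1"`.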
@@ -522,8 +524,7 @@ Annotation setters are functions that take a batch of `Doc` objects and a
 annotations on the `Doc`, e.g. to set custom or built-in attributes. You can
 register custom annotation setters using the `@registry.annotation_setters`
 decorator. The default annotation setter used by the `Transformer` pipeline
-component is `trfdata_setter`, which sets the custom `Doc._.transformer_data`
-attribute.
+component is `trfdata_setter`, which sets the custom `Doc._.trf_data` attribute.
 
 > #### Example
 >
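For completeness, a hedged sketch of registering such a setter; it assumes spacy-transformers is installed and imported (which provides the `annotation_setters` registry), and the name `"my_doc_setter.v1"` is made up:

```python
from typing import Callable, List
import spacy
from spacy.tokens import Doc
from spacy_transformers.data_classes import FullTransformerBatch

@spacy.registry.annotation_setters("my_doc_setter.v1")
def configure_my_doc_setter() -> Callable[[List[Doc], FullTransformerBatch], None]:
    def setter(docs: List[Doc], trf_data: FullTransformerBatch) -> None:
        # Write whatever custom or built-in annotations you need onto the Docs.
        ...
    return setter
```

In a training config, this could then be wired up under `[components.transformer.annotation_setter]` with `@annotation_setters = "my_doc_setter.v1"`.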
@@ -555,5 +556,5 @@ The component sets the following
 [custom extension attributes](/usage/processing-pipeline#custom-components-attributes):
 
 | Name             | Description                                                              |
-| -------------- | ------------------------------------------------------------------------ |
-| `Doc.trf_data` | Transformer tokens and outputs for the `Doc` object. ~~TransformerData~~ |
+| ---------------- | ------------------------------------------------------------------------ |
+| `Doc._.trf_data` | Transformer tokens and outputs for the `Doc` object. ~~TransformerData~~ |
@@ -429,8 +429,8 @@ The same idea applies to task models that power the **downstream components**.
 Most of spaCy's built-in model creation functions support a `tok2vec` argument,
 which should be a Thinc layer of type ~~Model[List[Doc], List[Floats2d]]~~. This
 is where we'll plug in our transformer model, using the
-[TransformerListener](/api/architectures#TransformerListener) layer, which sneakily
-delegates to the `Transformer` pipeline component.
+[TransformerListener](/api/architectures#TransformerListener) layer, which
+sneakily delegates to the `Transformer` pipeline component.
 
 ```ini
 ### config.cfg (excerpt) {highlight="12"}
@@ -452,11 +452,11 @@ grad_factor = 1.0
 @layers = "reduce_mean.v1"
 ```
 
-The [TransformerListener](/api/architectures#TransformerListener) layer expects a
-[pooling layer](https://thinc.ai/docs/api-layers#reduction-ops) as the argument
-`pooling`, which needs to be of type ~~Model[Ragged, Floats2d]~~. This layer
-determines how the vector for each spaCy token will be computed from the zero or
-more source rows the token is aligned against. Here we use the
+The [TransformerListener](/api/architectures#TransformerListener) layer expects
+a [pooling layer](https://thinc.ai/docs/api-layers#reduction-ops) as the
+argument `pooling`, which needs to be of type ~~Model[Ragged, Floats2d]~~. This
+layer determines how the vector for each spaCy token will be computed from the
+zero or more source rows the token is aligned against. Here we use the
 [`reduce_mean`](https://thinc.ai/docs/api-layers#reduce_mean) layer, which
 averages the wordpiece rows. We could instead use
 [`reduce_max`](https://thinc.ai/docs/api-layers#reduce_max), or a custom
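To ground the pooling description, a tiny self-contained Thinc sketch (not part of the commit, values made up) of what `reduce_mean` does to a ragged batch of wordpiece rows:

```python
import numpy
from thinc.api import reduce_mean
from thinc.types import Ragged

# Three tokens aligned to 2, 1 and 3 wordpiece rows respectively.
rows = numpy.asarray(
    [[1.0, 1.0], [3.0, 3.0],                 # token 1
     [2.0, 2.0],                             # token 2
     [0.0, 0.0], [3.0, 3.0], [6.0, 6.0]],    # token 3
    dtype="f",
)
lengths = numpy.asarray([2, 1, 3], dtype="i")

pooling = reduce_mean()                      # Model[Ragged, Floats2d]
vectors = pooling.predict(Ragged(rows, lengths))
print(vectors)  # one mean vector per token: [[2. 2.] [2. 2.] [3. 3.]]
```

Swapping in `reduce_max` would instead keep the element-wise maximum over each token's rows.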