Mirror of https://github.com/explosion/spaCy.git, synced 2025-01-13 18:56:36 +03:00
revert annotations refactor
parent 13ee742fb4
commit e47ea88aeb
@@ -307,8 +307,7 @@ factories.
 > ```
 
 | Registry name | Description |
-| -------------------- | -------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------- |
-| `annotation_setters` | Registry for functions that store Tok2Vec annotations on `Doc` objects. |
+| ----------------- | -------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------- |
 | `architectures` | Registry for functions that create [model architectures](/api/architectures). Can be used to register custom model architectures and reference them in the `config.cfg`. |
 | `assets` | Registry for data assets, knowledge bases etc. |
 | `batchers` | Registry for training and evaluation [data batchers](#batchers). |
@@ -338,17 +337,18 @@ See the [`Transformer`](/api/transformer) API reference and
 > ```python
 > import spacy_transformers
 >
-> @spacy_transformers.registry.span_getters("my_span_getter.v1")
-> def configure_custom_span_getter() -> Callable:
->     def span_getter(docs: List[Doc]) -> List[List[Span]]:
->         # Transform each Doc into a List of Span objects
+> @spacy_transformers.registry.annotation_setters("my_annotation_setter.v1")
+> def configure_custom_annotation_setter():
+>     def annotation_setter(docs, trf_data) -> None:
+>         # Set annotations on the docs
 >
->     return span_getter
+>     return annotation_setter
 > ```
 
 | Registry name | Description |
-| ----------------------------------------------- | -------------------------------------------------------------------------------------------------------------------------------------------- |
+| ----------------------------------------------------------- | ------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------- |
 | [`span_getters`](/api/transformer#span_getters) | Registry for functions that take a batch of `Doc` objects and return a list of `Span` objects to process by the transformer, e.g. sentences. |
+| [`annotation_setters`](/api/transformer#annotation_setters) | Registry for functions that create annotation setters. Annotation setters are functions that take a batch of `Doc` objects and a [`FullTransformerBatch`](/api/transformer#fulltransformerbatch) and can set additional annotations on the `Doc`. |
 
 ## Loggers {#loggers source="spacy/gold/loggers.py" new="3"}
 
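
The restored registry example registers a factory under a string name, and that factory returns the setter the component later calls with each batch. The two-step pattern can be sketched with a plain dictionary. `ANNOTATION_SETTERS`, `register_annotation_setter` and `resolve` are hypothetical names used only for illustration; they do not reproduce how `spacy_transformers.registry` is implemented.

```python
from typing import Any, Callable, Dict, List

# Hypothetical stand-in for a function registry (name -> factory).
# This is NOT spacy_transformers.registry.annotation_setters.
ANNOTATION_SETTERS: Dict[str, Callable[..., Any]] = {}


def register_annotation_setter(name: str) -> Callable[[Callable[..., Any]], Callable[..., Any]]:
    """Return a decorator that stores a factory under `name` and returns it unchanged."""

    def decorator(factory: Callable[..., Any]) -> Callable[..., Any]:
        if name in ANNOTATION_SETTERS:
            raise ValueError(f"{name!r} is already registered")
        ANNOTATION_SETTERS[name] = factory
        return factory

    return decorator


@register_annotation_setter("my_annotation_setter.v1")
def configure_custom_annotation_setter() -> Callable[[List[Any], Any], None]:
    # The factory runs when the name is resolved and returns the
    # function that is later called with each batch of docs.
    def annotation_setter(docs: List[Any], trf_data: Any) -> None:
        # Set annotations on the docs (intentionally empty in this sketch).
        pass

    return annotation_setter


def resolve(name: str) -> Callable[[List[Any], Any], None]:
    """Look up a factory by its registered name and call it to build the setter."""
    return ANNOTATION_SETTERS[name]()


setter = resolve("my_annotation_setter.v1")
setter([], None)  # no-op: this sketch sets nothing
```

Keeping the factory separate from the setter leaves room for configuration arguments on the factory, while the setter itself keeps the fixed `(docs, trf_data)` signature.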
@@ -33,15 +33,16 @@ the [TransformerListener](/api/architectures#TransformerListener) layer. This
 works similarly to spaCy's [Tok2Vec](/api/tok2vec) component and
 [Tok2VecListener](/api/architectures/Tok2VecListener) sublayer.
 
-We calculate an alignment between the word-piece tokens and the spaCy
-tokenization, so that we can use the last hidden states to store the information
-on the `Doc`. When multiple word-piece tokens align to the same spaCy token, the
-spaCy token receives the sum of their values. By default, the information is
-written to the [`Doc._.trf_data`](#custom-attributes) extension attribute, but
-you can implement a custom [`@annotation_setter`](#annotation_setters) to change
-this behaviour. The package also adds the function registry
-[`@span_getters`](#span_getters) with several built-in registered functions. For
-more details, see the [usage documentation](/usage/embeddings-transformers).
+The component assigns the output of the transformer to the `Doc`'s extension
+attributes. We also calculate an alignment between the word-piece tokens and the
+spaCy tokenization, so that we can use the last hidden states to set the
+`Doc.tensor` attribute. When multiple word-piece tokens align to the same spaCy
+token, the spaCy token receives the sum of their values. To access the values,
+you can use the custom [`Doc._.trf_data`](#custom-attributes) attribute. The
+package also adds the function registries [`@span_getters`](#span_getters) and
+[`@annotation_setters`](#annotation_setters) with several built-in registered
+functions. For more details, see the
+[usage documentation](/usage/embeddings-transformers).
 
 ## Config and implementation {#config}
 
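
Both sides of this hunk state that when several word-piece tokens align to the same spaCy token, that token receives the sum of their values. A pure-Python sketch of the reduction follows. Representing the alignment as a list of word-piece indices per spaCy token is an assumption made for illustration, as is the word-piece split in the example; spacy-transformers keeps its alignment in its own data structure.

```python
from typing import List, Sequence

Vector = List[float]


def sum_aligned_wordpieces(
    wordpiece_vectors: Sequence[Vector],
    alignment: Sequence[Sequence[int]],
) -> List[Vector]:
    """Give each spaCy token the element-wise sum of its aligned word-piece vectors.

    alignment[i] holds the indices of the word pieces aligned to spaCy token i
    (a hypothetical representation chosen for this sketch). A token with no
    aligned word pieces receives a zero vector.
    """
    width = len(wordpiece_vectors[0]) if wordpiece_vectors else 0
    token_vectors: List[Vector] = []
    for wp_indices in alignment:
        total = [0.0] * width
        for wp in wp_indices:
            for dim, value in enumerate(wordpiece_vectors[wp]):
                total[dim] += value
        token_vectors.append(total)
    return token_vectors


# Illustrative split: "unbelievable" -> "un", "##believ", "##able"; "!" -> "!"
hidden = [[1.0, 0.0], [0.5, 0.5], [0.25, 1.0], [2.0, 2.0]]
print(sum_aligned_wordpieces(hidden, [[0, 1, 2], [3]]))  # [[1.75, 1.5], [2.0, 2.0]]
```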
@ -61,9 +62,9 @@ on the transformer architectures and their arguments and hyperparameters.
|
||||||
> ```
|
> ```
|
||||||
|
|
||||||
| Setting | Description |
|
| Setting | Description |
|
||||||
| ------------------- | ----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------- |
|
| ------------------- | --------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------- |
|
||||||
| `max_batch_items` | Maximum size of a padded batch. Defaults to `4096`. ~~int~~ |
|
| `max_batch_items` | Maximum size of a padded batch. Defaults to `4096`. ~~int~~ |
|
||||||
| `annotation_setter` | Function that takes a batch of `Doc` objects and transformer outputs to store the annotations on the `Doc`. Defaults to `trfdata_setter` which sets the `Doc._.trf_data` attribute. ~~Callable[[List[Doc], FullTransformerBatch], None]~~ |
|
| `annotation_setter` | Function that takes a batch of `Doc` objects and transformer outputs to set additional annotations on the `Doc`. The `Doc._.transformer_data` attribute is set prior to calling the callback. Defaults to `null_annotation_setter` (no additional annotations). ~~Callable[[List[Doc], FullTransformerBatch], None]~~ |
|
||||||
| `model` | The Thinc [`Model`](https://thinc.ai/docs/api-model) wrapping the transformer. Defaults to [TransformerModel](/api/architectures#TransformerModel). ~~Model[List[Doc], FullTransformerBatch]~~ |
|
| `model` | The Thinc [`Model`](https://thinc.ai/docs/api-model) wrapping the transformer. Defaults to [TransformerModel](/api/architectures#TransformerModel). ~~Model[List[Doc], FullTransformerBatch]~~ |
|
||||||
|
|
||||||
```python
|
```python
|
||||||
|
@@ -97,10 +98,9 @@ Construct a `Transformer` component. One or more subsequent spaCy components can
 use the transformer outputs as features in its model, with gradients
 backpropagated to the single shared weights. The activations from the
 transformer are saved in the [`Doc._.trf_data`](#custom-attributes) extension
-attribute by default, but you can provide a different `annotation_setter` to
-customize this behaviour. In your application, you would normally use a shortcut
-and instantiate the component using its string name and
-[`nlp.add_pipe`](/api/language#create_pipe).
+attribute. You can also provide a callback to set additional annotations. In
+your application, you would normally use a shortcut for this and instantiate the
+component using its string name and [`nlp.add_pipe`](/api/language#create_pipe).
 
 | Name | Description |
 | ------------------- | -------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------- |
@@ -205,9 +205,8 @@ modifying them.
 
 Assign the extracted features to the Doc objects. By default, the
 [`TransformerData`](/api/transformer#transformerdata) object is written to the
-[`Doc._.trf_data`](#custom-attributes) attribute. This behaviour can be
-customized by providing a different `annotation_setter` argument upon
-construction.
+[`Doc._.trf_data`](#custom-attributes) attribute. Your annotation_setter
+callback is then called, if provided.
 
 > #### Example
 >
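
After the revert, writing `Doc._.trf_data` is no longer the job of the default annotation setter: the component stores each document's `TransformerData` itself and only afterwards calls the `annotation_setter` callback, which defaults to a no-op. The ordering can be sketched as follows. `FakeDoc`, the `SimpleNamespace` batch and this `set_annotations` function are hypothetical stand-ins rather than the `Transformer` component's code; only the `doc_data` attribute name is taken from the removed `trfdata_setter` example.

```python
from types import SimpleNamespace
from typing import Callable, List, Optional


class FakeDoc:
    """Hypothetical stand-in for a spaCy Doc with a `_` extension namespace."""

    def __init__(self, text: str) -> None:
        self.text = text
        self._ = SimpleNamespace(trf_data=None)


Setter = Callable[[List[FakeDoc], SimpleNamespace], None]


def set_annotations(
    docs: List[FakeDoc],
    trf_batch: SimpleNamespace,
    annotation_setter: Optional[Setter] = None,
) -> None:
    # 1. Always store each doc's slice of the output on doc._.trf_data.
    for doc, data in zip(docs, trf_batch.doc_data):
        doc._.trf_data = data
    # 2. Only then run the optional callback for additional annotations.
    if annotation_setter is not None:
        annotation_setter(docs, trf_batch)


docs = [FakeDoc("some text"), FakeDoc("some other text")]
batch = SimpleNamespace(doc_data=["data-0", "data-1"])  # placeholder payloads
seen = []
# The callback records what it observes, showing trf_data is already set.
set_annotations(docs, batch, lambda ds, b: seen.append([d._.trf_data for d in ds]))
print(seen)
```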
@@ -520,21 +519,18 @@ right context.
 ## Annotation setters {#annotation_setters tag="registered functions" source="github.com/explosion/spacy-transformers/blob/master/spacy_transformers/annotation_setters.py"}
 
 Annotation setters are functions that take a batch of `Doc` objects and a
-[`FullTransformerBatch`](/api/transformer#fulltransformerbatch) and store the
-annotations on the `Doc`, e.g. to set custom or built-in attributes. You can
-register custom annotation setters using the `@registry.annotation_setters`
-decorator. The default annotation setter used by the `Transformer` pipeline
-component is `trfdata_setter`, which sets the custom `Doc._.trf_data` attribute.
+[`FullTransformerBatch`](/api/transformer#fulltransformerbatch) and can set
+additional annotations on the `Doc`, e.g. to set custom or built-in attributes.
+You can register custom annotation setters using the
+`@registry.annotation_setters` decorator.
 
 > #### Example
 >
 > ```python
-> @registry.annotation_setters("spacy-transformers.trfdata_setter.v1")
-> def configure_trfdata_setter() -> Callable:
+> @registry.annotation_setters("spacy-transformers.null_annotation_setter.v1")
+> def configure_null_annotation_setter() -> Callable:
 >     def setter(docs: List[Doc], trf_data: FullTransformerBatch) -> None:
->         doc_data = list(trf_data.doc_data)
->         for doc, data in zip(docs, doc_data):
->             doc._.trf_data = data
+>         pass
 >
 >     return setter
 > ```
@@ -547,8 +543,8 @@ component is `trfdata_setter`, which sets the custom `Doc._.trf_data` attribute.
 The following built-in functions are available:
 
 | Name | Description |
-| -------------------------------------- | ------------------------------------------------------------- |
-| `spacy-transformers.trfdata_setter.v1` | Set the annotations to the custom attribute `doc._.trf_data`. |
+| ---------------------------------------------- | ------------------------------------- |
+| `spacy-transformers.null_annotation_setter.v1` | Don't set any additional annotations. |
 
 ## Custom attributes {#custom-attributes}
 
@@ -252,12 +252,13 @@ for doc in nlp.pipe(["some text", "some other text"]):
 ```
 
 You can also customize how the [`Transformer`](/api/transformer) component sets
-annotations onto the [`Doc`](/api/doc), by customizing the `annotation_setter`.
-This callback will be called with the raw input and output data for the whole
-batch, along with the batch of `Doc` objects, allowing you to implement whatever
-you need. The annotation setter is called with a batch of [`Doc`](/api/doc)
-objects and a [`FullTransformerBatch`](/api/transformer#fulltransformerbatch)
-containing the transformers data for the batch.
+annotations onto the [`Doc`](/api/doc), by specifying a custom
+`annotation_setter`. This callback will be called with the raw input and output
+data for the whole batch, along with the batch of `Doc` objects, allowing you to
+implement whatever you need. The annotation setter is called with a batch of
+[`Doc`](/api/doc) objects and a
+[`FullTransformerBatch`](/api/transformer#fulltransformerbatch) containing the
+transformers data for the batch.
 
 ```python
 def custom_annotation_setter(docs, trf_data):
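
The `custom_annotation_setter(docs, trf_data)` signature shown in this hunk receives the whole batch, so a single pass can pair each `Doc` with its slice of the output. The sketch below mean-pools per-token vectors into a hypothetical `doc._.doc_vector` attribute. It uses `SimpleNamespace` stand-ins and assumes each item of `trf_data.doc_data` is a plain list of token vectors; that layout is an illustration only, not the real `TransformerData` structure.

```python
from types import SimpleNamespace
from typing import Sequence


def custom_annotation_setter(docs: Sequence[SimpleNamespace], trf_data: SimpleNamespace) -> None:
    """Mean-pool per-token vectors into a hypothetical `doc._.doc_vector`.

    Assumption for this sketch only: each item of trf_data.doc_data is a list
    of equal-length token vectors for the matching doc.
    """
    for doc, token_vectors in zip(docs, trf_data.doc_data):
        if not token_vectors:
            doc._.doc_vector = []  # nothing to pool for an empty doc
            continue
        n = len(token_vectors)
        # zip(*rows) walks the vectors column by column (one column per dimension).
        doc._.doc_vector = [sum(column) / n for column in zip(*token_vectors)]


docs = [SimpleNamespace(_=SimpleNamespace()), SimpleNamespace(_=SimpleNamespace())]
batch = SimpleNamespace(doc_data=[[[1.0, 2.0], [3.0, 4.0]], []])
custom_annotation_setter(docs, batch)
print(docs[0]._.doc_vector)  # [2.0, 3.0]
```

With real `Doc` objects, a custom attribute like this must be registered before a setter can assign to it, e.g. `Doc.set_extension("doc_vector", default=None)`.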