add loggers registry & logger docs sections

This commit is contained in:
svlandeg 2020-08-28 21:44:04 +02:00
parent 72a87095d9
commit 5230529de2
3 changed files with 98 additions and 28 deletions


@@ -272,7 +272,7 @@ def train_while_improving(
step (int): How many steps have been completed.
score (float): The main score from the last evaluation.
other_scores: The other scores from the last evaluation.
losses: The accumulated losses throughout training.
checkpoints: A list of previous results, where each result is a
(score, step, epoch) tuple.
"""


@@ -296,24 +296,25 @@ factories.
> i += 1
> ```
| Registry name | Description |
| -------------------- | -------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------- |
| `annotation_setters` | Registry for functions that store Tok2Vec annotations on `Doc` objects. |
| `architectures` | Registry for functions that create [model architectures](/api/architectures). Can be used to register custom model architectures and reference them in the `config.cfg`. |
| `assets` | Registry for data assets, knowledge bases etc. |
| `batchers` | Registry for training and evaluation [data batchers](#batchers). |
| `callbacks` | Registry for custom callbacks to [modify the `nlp` object](/usage/training#custom-code-nlp-callbacks) before training. |
| `displacy_colors` | Registry for custom color scheme for the [`displacy` NER visualizer](/usage/visualizers). Automatically reads from [entry points](/usage/saving-loading#entry-points). |
| `factories` | Registry for functions that create [pipeline components](/usage/processing-pipelines#custom-components). Added automatically when you use the `@spacy.component` decorator and also reads from [entry points](/usage/saving-loading#entry-points). |
| `initializers` | Registry for functions that create [initializers](https://thinc.ai/docs/api-initializers). |
| `languages` | Registry for language-specific `Language` subclasses. Automatically reads from [entry points](/usage/saving-loading#entry-points). |
| `layers` | Registry for functions that create [layers](https://thinc.ai/docs/api-layers). |
| `loggers` | Registry for functions that log [training results](/usage/training). |
| `lookups` | Registry for large lookup tables available via `vocab.lookups`. |
| `losses` | Registry for functions that create [losses](https://thinc.ai/docs/api-loss). |
| `optimizers` | Registry for functions that create [optimizers](https://thinc.ai/docs/api-optimizers). |
| `readers` | Registry for training and evaluation data readers like [`Corpus`](/api/corpus). |
| `schedules` | Registry for functions that create [schedules](https://thinc.ai/docs/api-schedules). |
| `tokenizers` | Registry for tokenizer factories. Registered functions should return a callback that receives the `nlp` object and returns a [`Tokenizer`](/api/tokenizer) or a custom callable. |
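
To use a custom function in your config, you register it under one of the names
listed above and refer to it by the string name you chose. The following is a
minimal sketch using hypothetical names: a custom schedule `"my_schedule.v1"`
with made-up `start` and `factor` arguments, registered via
`@spacy.registry.schedules` and referenced from the `[training.batch_size]`
block of the config.

```python
import spacy

@spacy.registry.schedules("my_schedule.v1")
def my_schedule(start: int, factor: float):
    # Return a generator that yields an ever-increasing value, starting at
    # `start` and multiplied by `factor` on each step
    def generate():
        value = start
        while True:
            yield value
            value = value * factor
    return generate()
```

```ini
[training.batch_size]
@schedules = "my_schedule.v1"
start = 100
factor = 1.001
```
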
### spacy-transformers registry {#registry-transformers}
@@ -327,18 +328,69 @@ See the [`Transformer`](/api/transformer) API reference and
> ```python
> import spacy_transformers
> from typing import Callable, List
> from spacy.tokens import Doc, Span
>
> @spacy_transformers.registry.span_getters("my_span_getter.v1")
> def configure_custom_span_getter() -> Callable:
>     def span_getter(docs: List[Doc]) -> List[List[Span]]:
>         # Transform each Doc into a List of Span objects
>         ...
>
>     return span_getter
> ```
| Registry name | Description |
| ----------------------------------------------- | -------------------------------------------------------------------------------------------------------------------------------------------- |
| [`span_getters`](/api/transformer#span_getters) | Registry for functions that take a batch of `Doc` objects and return a list of `Span` objects to process by the transformer, e.g. sentences. |
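
Once registered, a custom span getter can be referenced from the transformer
component's config. As a sketch, assuming a pipeline that uses the default
`Transformer` component and the hypothetical `"my_span_getter.v1"` registered
in the example above:

```ini
[components.transformer.model.get_spans]
@span_getters = "my_span_getter.v1"
```
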
## Loggers {#loggers source="spacy/gold/loggers.py" new="3"}
A logger records the training results for each step. When a logger is created,
it returns a `log_step` function and a `finalize` function. The `log_step`
function is called by the [training script](/api/cli#train) and receives a
dictionary of information, including the current epoch and step, the main score
and other scores from the last evaluation, the accumulated losses and the
previous checkpoints.
> #### Example config
>
> ```ini
> [training.logger]
> @loggers = "spacy.ConsoleLogger.v1"
> ```
Instead of using one of the built-in loggers listed here, you can also
[implement your own](/usage/training#custom-logging).
#### spacy.ConsoleLogger.v1 {#ConsoleLogger tag="registered function"}
Writes the results of a training step to the console in a tabular format.
#### spacy.WandbLogger.v1 {#WandbLogger tag="registered function"}
> #### Installation
>
> ```bash
> $ pip install wandb
> $ wandb login
> ```
Built-in logger that sends the results of each training step to the dashboard
of the [Weights & Biases](https://www.wandb.com/) tool. To use this logger,
Weights & Biases should be installed, and you should be logged in. The logger
will send the full config file to W&B, as well as various system information
such as GPU usage.
| Name | Description |
| -------------- | ------------------------------------------------------------------------------------------------------------------------------------- |
| `project_name` | The name of the project in the Weights & Biases interface. The project will be created automatically if it doesn't exist yet. ~~str~~ |
> #### Example config
>
> ```ini
> [training.logger]
> @loggers = "spacy.WandbLogger.v1"
> project_name = "monitor_spacy_training"
> ```
## Batchers {#batchers source="spacy/gold/batchers.py" new="3"}


@@ -605,6 +605,24 @@ to your Python file. Before loading the config, spaCy will import the
$ python -m spacy train config.cfg --output ./output --code ./functions.py
```
#### Example: Custom logging function {#custom-logging}
During training, the results of each step are passed to a logger function in a
dictionary providing the following information:
| Key | Value |
| -------------- | ---------------------------------------------------------------------------------------------- |
| `epoch` | How many passes over the data have been completed. ~~int~~ |
| `step` | How many steps have been completed. ~~int~~ |
| `score`        | The main score from the last evaluation, measured on the dev set. ~~float~~                     |
| `other_scores` | The other scores from the last evaluation, measured on the dev set. ~~Dict[str, Any]~~ |
| `losses` | The accumulated training losses. ~~Dict[str, float]~~ |
| `checkpoints` | A list of previous results, where each result is a (score, step, epoch) tuple. ~~List[Tuple]~~ |
By default, these results are written to the console with the
[`ConsoleLogger`](/api/top-level#ConsoleLogger). If you want to log the results
in a different way, for instance to an external file or service, you can
register a custom function in the `loggers` [registry](/api/top-level#registry)
and provide it via the `[training.logger]` block of your config.
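
The following is a minimal sketch of such a custom logger that writes the
results to a file. The name `"my_custom_logger.v1"` and the `log_path` argument
are made up for the example, and the inner setup step that receives the `nlp`
object is an assumption here; compare the built-in `spacy.ConsoleLogger.v1` for
the exact signature expected by your version.

```python
### functions.py
from typing import Dict, Any, Tuple, Callable
from pathlib import Path
import spacy
from spacy import Language

@spacy.registry.loggers("my_custom_logger.v1")
def custom_logger(log_path: str):
    def setup_logger(nlp: Language) -> Tuple[Callable, Callable]:
        log_file = Path(log_path).open("w", encoding="utf8")
        log_file.write("step\tscore\tlosses\n")

        def log_step(info: Dict[str, Any]) -> None:
            # Write one line per step: step number, main dev score and losses
            losses = ",".join(f"{name}:{value:.2f}" for name, value in info["losses"].items())
            log_file.write(f"{info['step']}\t{info['score']}\t{losses}\n")

        def finalize() -> None:
            # Called once when training ends
            log_file.close()

        return log_step, finalize

    return setup_logger
```

To use it, register the function in a file you provide via `--code` (as shown
above) and point the `[training.logger]` block of your config to it:

```ini
[training.logger]
@loggers = "my_custom_logger.v1"
log_path = "training.log"
```
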
#### Example: Custom batch size schedule {#custom-code-schedule}
For example, let's say you've implemented your own batch size schedule to use
during training.