add loggers registry & logger docs sections

This commit is contained in:
svlandeg 2020-08-28 21:44:04 +02:00
parent 72a87095d9
commit 5230529de2
3 changed files with 98 additions and 28 deletions


@@ -272,7 +272,7 @@ def train_while_improving(
step (int): How many steps have been completed.
score (float): The main score from the last evaluation.
other_scores: The other scores from the last evaluation.
losses: The accumulated losses throughout training.
checkpoints: A list of previous results, where each result is a
(score, step, epoch) tuple.
"""


@@ -296,24 +296,25 @@ factories.
> i += 1
> ```
| Registry name | Description |
| -------------------- | -------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------- |
| `annotation_setters` | Registry for functions that store Tok2Vec annotations on `Doc` objects. |
| `architectures` | Registry for functions that create [model architectures](/api/architectures). Can be used to register custom model architectures and reference them in the `config.cfg`. |
| `assets` | Registry for data assets, knowledge bases etc. |
| `batchers` | Registry for training and evaluation [data batchers](#batchers). |
| `callbacks` | Registry for custom callbacks to [modify the `nlp` object](/usage/training#custom-code-nlp-callbacks) before training. |
| `displacy_colors` | Registry for custom color scheme for the [`displacy` NER visualizer](/usage/visualizers). Automatically reads from [entry points](/usage/saving-loading#entry-points). |
| `factories` | Registry for functions that create [pipeline components](/usage/processing-pipelines#custom-components). Added automatically when you use the `@spacy.component` decorator and also reads from [entry points](/usage/saving-loading#entry-points). |
| `initializers` | Registry for functions that create [initializers](https://thinc.ai/docs/api-initializers). |
| `languages` | Registry for language-specific `Language` subclasses. Automatically reads from [entry points](/usage/saving-loading#entry-points). |
| `layers` | Registry for functions that create [layers](https://thinc.ai/docs/api-layers). |
| `loggers` | Registry for functions that log [training results](/usage/training). |
| `lookups` | Registry for large lookup tables available via `vocab.lookups`. |
| `losses` | Registry for functions that create [losses](https://thinc.ai/docs/api-loss). |
| `optimizers` | Registry for functions that create [optimizers](https://thinc.ai/docs/api-optimizers). |
| `readers` | Registry for training and evaluation data readers like [`Corpus`](/api/corpus). |
| `schedules` | Registry for functions that create [schedules](https://thinc.ai/docs/api-schedules). |
| `tokenizers` | Registry for tokenizer factories. Registered functions should return a callback that receives the `nlp` object and returns a [`Tokenizer`](/api/tokenizer) or a custom callable. |
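
To use a custom function in your config, you register it under one of the names
listed above and refer to it by the string name you chose. The following is a
minimal sketch using hypothetical names: a custom schedule `"my_schedule.v1"`
with made-up `start` and `factor` arguments, registered via
`@spacy.registry.schedules` and referenced from the `[training.batch_size]`
block of the config.

```python
import spacy

@spacy.registry.schedules("my_schedule.v1")
def my_schedule(start: int, factor: float):
    # Return a generator that yields an ever-increasing value, starting at
    # `start` and multiplied by `factor` on each step
    def generate():
        value = start
        while True:
            yield value
            value = value * factor
    return generate()
```

```ini
[training.batch_size]
@schedules = "my_schedule.v1"
start = 100
factor = 1.001
```
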
### spacy-transformers registry {#registry-transformers}
@@ -327,18 +328,69 @@ See the [`Transformer`](/api/transformer) API reference and
> ```python
> import spacy_transformers
> from typing import Callable, List
> from spacy.tokens import Doc, Span
>
> @spacy_transformers.registry.span_getters("my_span_getter.v1")
> def configure_custom_span_getter() -> Callable:
>     def span_getter(docs: List[Doc]) -> List[List[Span]]:
>         # Transform each Doc into a List of Span objects
>         ...
>
>     return span_getter
> ```
| Registry name | Description |
| ----------------------------------------------- | -------------------------------------------------------------------------------------------------------------------------------------------- |
| [`span_getters`](/api/transformer#span_getters) | Registry for functions that take a batch of `Doc` objects and return a list of `Span` objects to process by the transformer, e.g. sentences. |
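
Once registered, a custom span getter can be referenced from the transformer
component's config. As a sketch, assuming a pipeline that uses the default
`Transformer` component and the hypothetical `"my_span_getter.v1"` registered
in the example above:

```ini
[components.transformer.model.get_spans]
@span_getters = "my_span_getter.v1"
```
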
## Loggers {#loggers source="spacy/gold/loggers.py" new="3"}
A logger records the training results for each step. When a logger is created,
it returns a `log_step` function and a `finalize` function. The `log_step`
function is called by the [training script](/api/cli#train) and receives a
dictionary of information, including the current epoch and step, the main score
and other scores from the last evaluation, the accumulated losses and the
previous checkpoints.
> #### Example config
>
> ```ini
> [training.logger]
> @loggers = "spacy.ConsoleLogger.v1"
> ```
Instead of using one of the built-in loggers listed here, you can also
[implement your own](/usage/training#custom-logging).
#### spacy.ConsoleLogger.v1 {#ConsoleLogger tag="registered function"}
Writes the results of a training step to the console in a tabular format.
#### spacy.WandbLogger.v1 {#WandbLogger tag="registered function"}
> #### Installation
>
> ```bash
> $ pip install wandb
> $ wandb login
> ```
Built-in logger that sends the results of each training step to the dashboard
of the [Weights & Biases](https://www.wandb.com/) tool. To use this logger,
Weights & Biases should be installed, and you should be logged in. The logger
will send the full config file to W&B, as well as various system information
such as GPU usage.
| Name | Description |
| -------------- | ------------------------------------------------------------------------------------------------------------------------------------- |
| `project_name` | The name of the project in the Weights & Biases interface. The project will be created automatically if it doesn't exist yet. ~~str~~ |
> #### Example config
>
> ```ini
> [training.logger]
> @loggers = "spacy.WandbLogger.v1"
> project_name = "monitor_spacy_training"
> ```
## Batchers {#batchers source="spacy/gold/batchers.py" new="3"}


@@ -605,6 +605,24 @@ to your Python file. Before loading the config, spaCy will import the
$ python -m spacy train config.cfg --output ./output --code ./functions.py
```
#### Example: Custom logging function {#custom-logging}
During training, the results of each step are passed to a logger function in a
dictionary providing the following information:
| Key | Value |
| -------------- | ---------------------------------------------------------------------------------------------- |
| `epoch` | How many passes over the data have been completed. ~~int~~ |
| `step` | How many steps have been completed. ~~int~~ |
| `score`        | The main score from the last evaluation, measured on the dev set. ~~float~~                     |
| `other_scores` | The other scores from the last evaluation, measured on the dev set. ~~Dict[str, Any]~~ |
| `losses` | The accumulated training losses. ~~Dict[str, float]~~ |
| `checkpoints` | A list of previous results, where each result is a (score, step, epoch) tuple. ~~List[Tuple]~~ |
By default, these results are written to the console with the
[`ConsoleLogger`](/api/top-level#ConsoleLogger). If you want to log the results
in a different way, for instance to an external file or service, you can
register a custom function in the `loggers` [registry](/api/top-level#registry)
and provide it via the `[training.logger]` block of your config.
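
The following is a minimal sketch of such a custom logger that writes the
results to a file. The name `"my_custom_logger.v1"` and the `log_path` argument
are made up for the example, and the inner setup step that receives the `nlp`
object is an assumption here; compare the built-in `spacy.ConsoleLogger.v1` for
the exact signature expected by your version.

```python
### functions.py
from typing import Dict, Any, Tuple, Callable
from pathlib import Path
import spacy
from spacy import Language

@spacy.registry.loggers("my_custom_logger.v1")
def custom_logger(log_path: str):
    def setup_logger(nlp: Language) -> Tuple[Callable, Callable]:
        log_file = Path(log_path).open("w", encoding="utf8")
        log_file.write("step\tscore\tlosses\n")

        def log_step(info: Dict[str, Any]) -> None:
            # Write one line per step: step number, main dev score and losses
            losses = ",".join(f"{name}:{value:.2f}" for name, value in info["losses"].items())
            log_file.write(f"{info['step']}\t{info['score']}\t{losses}\n")

        def finalize() -> None:
            # Called once when training ends
            log_file.close()

        return log_step, finalize

    return setup_logger
```

To use it, register the function in a file you provide via `--code` (as shown
above) and point the `[training.logger]` block of your config to it:

```ini
[training.logger]
@loggers = "my_custom_logger.v1"
log_path = "training.log"
```
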
#### Example: Custom batch size schedule {#custom-code-schedule}
For example, let's say you've implemented your own batch size schedule to use
during training.