diff --git a/website/docs/api/cli.mdx b/website/docs/api/cli.mdx
index d5b2d54d2..c7beba6db 100644
--- a/website/docs/api/cli.mdx
+++ b/website/docs/api/cli.mdx
@@ -1060,7 +1060,6 @@ $ python -m spacy train [config_path] [--output] [--code] [--verbose] [--gpu-id]
 | `--code`, `-c` | Path to Python file with additional code to be imported. Allows [registering custom functions](/usage/training#custom-functions) for new architectures. ~~Optional[Path] \(option)~~ |
 | `--verbose`, `-V` | Show more detailed messages during training. ~~bool (flag)~~ |
 | `--gpu-id`, `-g` | GPU ID or `-1` for CPU. Defaults to `-1`. ~~int (option)~~ |
-| `--use_rehearse`, `-r` | Use 'rehearsal' updates on a pre-trained model to address the catastrophic forgetting problem. Defaults to `False`. ~~bool (flag)~~ |
 | `--help`, `-h` | Show help message and available arguments. ~~bool (flag)~~ |
 | overrides | Config parameters to override. Should be options starting with `--` that correspond to the config section and value to override, e.g. `--paths.train ./train.spacy`. ~~Any (option/flag)~~ |
 | **CREATES** | The final trained pipeline and the best trained pipeline. |
diff --git a/website/docs/api/data-formats.mdx b/website/docs/api/data-formats.mdx
index c9d88f87c..41ef131d9 100644
--- a/website/docs/api/data-formats.mdx
+++ b/website/docs/api/data-formats.mdx
@@ -192,6 +192,7 @@ process that are used when you run [`spacy train`](/api/cli#train).
 | `eval_frequency` | How often to evaluate during training (steps). Defaults to `200`. ~~int~~ |
 | `frozen_components` | Pipeline component names that are "frozen" and shouldn't be initialized or updated during training. See [here](/usage/training#config-components) for details. Defaults to `[]`. ~~List[str]~~ |
 | `annotating_components` 3.1 | Pipeline component names that should set annotations on the predicted docs during training. See [here](/usage/training#annotating-components) for details. Defaults to `[]`. ~~List[str]~~ |
+| `rehearse_components` 3.5.1 | Pipeline component names that should be rehearsed during training. See [here](/usage/training#rehearse-components) for details. Defaults to `[]`. ~~List[str]~~ |
 | `gpu_allocator` | Library for cupy to route GPU memory allocation to. Can be `"pytorch"` or `"tensorflow"`. Defaults to variable `${system.gpu_allocator}`. ~~str~~ |
 | `logger` | Callable that takes the `nlp` and stdout and stderr `IO` objects, sets up the logger, and returns two new callables to log a training step and to finalize the logger. Defaults to [`ConsoleLogger`](/api/top-level#ConsoleLogger). ~~Callable[[Language, IO, IO], [Tuple[Callable[[Dict[str, Any]], None], Callable[[], None]]]]~~ |
 | `max_epochs` | Maximum number of epochs to train for. `0` means an unlimited number of epochs. `-1` means that the train corpus should be streamed rather than loaded into memory with no shuffling within the training loop. Defaults to `0`. ~~int~~ |
diff --git a/website/docs/usage/training.mdx b/website/docs/usage/training.mdx
index 6cda975cb..0a05037c5 100644
--- a/website/docs/usage/training.mdx
+++ b/website/docs/usage/training.mdx
@@ -575,6 +575,29 @@ now-updated model to the predicted docs.
 
+### Using rehearsal to address catastrophic forgetting {id="rehearse-components", tag="experimental", version="3.5.1"}
+
+Perform “rehearsal” updates on pre-trained components. Rehearsal updates teach the current component to make predictions similar to those of the initial model, in order to address the “catastrophic forgetting” problem. This feature is experimental.
+
+```ini {title="config.cfg (excerpt)"}
+[nlp]
+pipeline = ["sentencizer", "ner", "entity_linker"]
+
+[components.ner]
+source = "en_core_web_sm"
+
+[training]
+rehearse_components = ["ner"]
+```
+
+
+
+Be aware that the loss is calculated as the sum of both the `update` and `rehearse` functions.
+If both the loss and the accuracy of the component increase over time, this can be caused by the trained component making predictions that diverge more and more from the initial model,
+indicating “catastrophic forgetting”.
+
+
+
 ### Using registered functions {id="config-functions"}
 
 The training configuration defined in the config file doesn't have to only