diff --git a/website/docs/usage/index.md b/website/docs/usage/index.md index 1f4869606..dff5a16ba 100644 --- a/website/docs/usage/index.md +++ b/website/docs/usage/index.md @@ -75,7 +75,6 @@ spaCy's [`setup.cfg`](%%GITHUB_SPACY/setup.cfg) for details on what's included. | ---------------- | -------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------- | | `lookups` | Install [`spacy-lookups-data`](https://github.com/explosion/spacy-lookups-data) for data tables for lemmatization and lexeme normalization. The data is serialized with trained pipelines, so you only need this package if you want to train your own models. | | `transformers` | Install [`spacy-transformers`](https://github.com/explosion/spacy-transformers). The package will be installed automatically when you install a transformer-based pipeline. | -| `ray` | Install [`spacy-ray`](https://github.com/explosion/spacy-ray) to add CLI commands for [parallel training](/usage/training#parallel-training). | | `cuda`, ... | Install spaCy with GPU support provided by [CuPy](https://cupy.chainer.org) for your given CUDA version. See the GPU [installation instructions](#gpu) for details and options. | | `apple` | Install [`thinc-apple-ops`](https://github.com/explosion/thinc-apple-ops) to improve performance on an Apple M1. | | `ja`, `ko`, `th` | Install additional dependencies required for tokenization for the [languages](/usage/models#languages). | diff --git a/website/docs/usage/projects.md b/website/docs/usage/projects.md index 90b612358..34315e4e7 100644 --- a/website/docs/usage/projects.md +++ b/website/docs/usage/projects.md @@ -1014,54 +1014,6 @@ https://github.com/explosion/projects/blob/v3/integrations/fastapi/scripts/main. --- -### Ray {#ray} - -> #### Installation -> -> ```cli -> $ pip install -U %%SPACY_PKG_NAME[ray]%%SPACY_PKG_FLAGS -> # Check that the CLI is registered -> $ python -m spacy ray --help -> ``` - -[Ray](https://ray.io/) is a fast and simple framework for building and running -**distributed applications**. You can use Ray for parallel and distributed -training with spaCy via our lightweight -[`spacy-ray`](https://github.com/explosion/spacy-ray) extension package. If the -package is installed in the same environment as spaCy, it will automatically add -[`spacy ray`](/api/cli#ray) commands to your spaCy CLI. See the usage guide on -[parallel training](/usage/training#parallel-training) for more details on how -it works under the hood. - - - -Get started with parallel training using our project template. It trains a -simple model on a Universal Dependencies Treebank and lets you parallelize the -training with Ray. - - - -You can integrate [`spacy ray train`](/api/cli#ray-train) into your -`project.yml` just like the regular training command and pass it the config, and -optional output directory or remote storage URL and config overrides if needed. - - -```yaml -### project.yml -commands: - - name: "ray" - help: "Train a model via parallel training with Ray" - script: - - "python -m spacy ray train configs/config.cfg -o training/ --paths.train corpus/train.spacy --paths.dev corpus/dev.spacy" - deps: - - "corpus/train.spacy" - - "corpus/dev.spacy" - outputs: - - "training/model-best" -``` - ---- - ### Weights & Biases {#wandb} [Weights & Biases](https://www.wandb.com/) is a popular platform for experiment diff --git a/website/docs/usage/training.md b/website/docs/usage/training.md index 27a8bbca7..e40a395c4 100644 --- a/website/docs/usage/training.md +++ b/website/docs/usage/training.md @@ -1572,77 +1572,6 @@ token-based annotations like the dependency parse or entity labels, you'll need to take care to adjust the `Example` object so its annotations match and remain valid. -## Parallel & distributed training with Ray {#parallel-training} - -> #### Installation -> -> ```cli -> $ pip install -U %%SPACY_PKG_NAME[ray]%%SPACY_PKG_FLAGS -> # Check that the CLI is registered -> $ python -m spacy ray --help -> ``` - -[Ray](https://ray.io/) is a fast and simple framework for building and running -**distributed applications**. You can use Ray to train spaCy on one or more -remote machines, potentially speeding up your training process. Parallel -training won't always be faster though – it depends on your batch size, models, -and hardware. - - - -To use Ray with spaCy, you need the -[`spacy-ray`](https://github.com/explosion/spacy-ray) package installed. -Installing the package will automatically add the `ray` command to the spaCy -CLI. - - - -The [`spacy ray train`](/api/cli#ray-train) command follows the same API as -[`spacy train`](/api/cli#train), with a few extra options to configure the Ray -setup. You can optionally set the `--address` option to point to your Ray -cluster. If it's not set, Ray will run locally. - -```cli -python -m spacy ray train config.cfg --n-workers 2 -``` - - - -Get started with parallel training using our project template. It trains a -simple model on a Universal Dependencies Treebank and lets you parallelize the -training with Ray. - - - -### How parallel training works {#parallel-training-details} - -Each worker receives a shard of the **data** and builds a copy of the **model -and optimizer** from the [`config.cfg`](#config). It also has a communication -channel to **pass gradients and parameters** to the other workers. Additionally, -each worker is given ownership of a subset of the parameter arrays. Every -parameter array is owned by exactly one worker, and the workers are given a -mapping so they know which worker owns which parameter. - -![Illustration of setup](../images/spacy-ray.svg) - -As training proceeds, every worker will be computing gradients for **all** of -the model parameters. When they compute gradients for parameters they don't own, -they'll **send them to the worker** that does own that parameter, along with a -version identifier so that the owner can decide whether to discard the gradient. -Workers use the gradients they receive and the ones they compute locally to -update the parameters they own, and then broadcast the updated array and a new -version ID to the other workers. - -This training procedure is **asynchronous** and **non-blocking**. Workers always -push their gradient increments and parameter updates, they do not have to pull -them and block on the result, so the transfers can happen in the background, -overlapped with the actual training work. The workers also do not have to stop -and wait for each other ("synchronize") at the start of each batch. This is very -useful for spaCy, because spaCy is often trained on long documents, which means -**batches can vary in size** significantly. Uneven workloads make synchronous -gradient descent inefficient, because if one batch is slow, all of the other -workers are stuck waiting for it to complete before they can continue. - ## Internal training API {#api}