Remove more ray docs
This commit is contained in:
parent 42c02ae8e0
commit 82d34828dd
@@ -75,7 +75,6 @@ spaCy's [`setup.cfg`](%%GITHUB_SPACY/setup.cfg) for details on what's included.
| ---------------- | -------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------- |
| `lookups` | Install [`spacy-lookups-data`](https://github.com/explosion/spacy-lookups-data) for data tables for lemmatization and lexeme normalization. The data is serialized with trained pipelines, so you only need this package if you want to train your own models. |
| `transformers` | Install [`spacy-transformers`](https://github.com/explosion/spacy-transformers). The package will be installed automatically when you install a transformer-based pipeline. |
| `ray` | Install [`spacy-ray`](https://github.com/explosion/spacy-ray) to add CLI commands for [parallel training](/usage/training#parallel-training). |
| `cuda`, ... | Install spaCy with GPU support provided by [CuPy](https://cupy.chainer.org) for your given CUDA version. See the GPU [installation instructions](#gpu) for details and options. |
| `apple` | Install [`thinc-apple-ops`](https://github.com/explosion/thinc-apple-ops) to improve performance on an Apple M1. |
| `ja`, `ko`, `th` | Install additional dependencies required for tokenization for the [languages](/usage/models#languages). |
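
Several extras from this table can be combined in one install command. As a rough illustration (using the same placeholder package name as the install commands elsewhere on this page), pulling in both the lookups and transformers extras looks like this:

```cli
$ pip install -U %%SPACY_PKG_NAME[lookups,transformers]%%SPACY_PKG_FLAGS
```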
@@ -1014,54 +1014,6 @@ https://github.com/explosion/projects/blob/v3/integrations/fastapi/scripts/main.
---
### Ray {#ray} <IntegrationLogo name="ray" width={100} height="auto" align="right" />

> #### Installation
>
> ```cli
> $ pip install -U %%SPACY_PKG_NAME[ray]%%SPACY_PKG_FLAGS
> # Check that the CLI is registered
> $ python -m spacy ray --help
> ```

[Ray](https://ray.io/) is a fast and simple framework for building and running
**distributed applications**. You can use Ray for parallel and distributed
training with spaCy via our lightweight
[`spacy-ray`](https://github.com/explosion/spacy-ray) extension package. If the
package is installed in the same environment as spaCy, it will automatically add
[`spacy ray`](/api/cli#ray) commands to your spaCy CLI. See the usage guide on
[parallel training](/usage/training#parallel-training) for more details on how
it works under the hood.

<Project id="integrations/ray">

Get started with parallel training using our project template. It trains a
simple model on a Universal Dependencies Treebank and lets you parallelize the
training with Ray.

</Project>

You can integrate [`spacy ray train`](/api/cli#ray-train) into your
`project.yml` just like the regular training command and pass it the config, an
optional output directory or remote storage URL and config overrides if needed.

<!-- prettier-ignore -->
```yaml
### project.yml
commands:
  - name: "ray"
    help: "Train a model via parallel training with Ray"
    script:
      - "python -m spacy ray train configs/config.cfg -o training/ --paths.train corpus/train.spacy --paths.dev corpus/dev.spacy"
    deps:
      - "corpus/train.spacy"
      - "corpus/dev.spacy"
    outputs:
      - "training/model-best"
```

---

### Weights & Biases {#wandb} <IntegrationLogo name="wandb" width={175} height="auto" align="right" />

[Weights & Biases](https://www.wandb.com/) is a popular platform for experiment

@@ -1572,77 +1572,6 @@ token-based annotations like the dependency parse or entity labels, you'll need
to take care to adjust the `Example` object so its annotations match and remain
valid.

## Parallel & distributed training with Ray {#parallel-training}

> #### Installation
>
> ```cli
> $ pip install -U %%SPACY_PKG_NAME[ray]%%SPACY_PKG_FLAGS
> # Check that the CLI is registered
> $ python -m spacy ray --help
> ```

[Ray](https://ray.io/) is a fast and simple framework for building and running
**distributed applications**. You can use Ray to train spaCy on one or more
remote machines, potentially speeding up your training process. Parallel
training won't always be faster though – it depends on your batch size, models,
and hardware.

<Infobox variant="warning">

To use Ray with spaCy, you need the
[`spacy-ray`](https://github.com/explosion/spacy-ray) package installed.
Installing the package will automatically add the `ray` command to the spaCy
CLI.

</Infobox>

The [`spacy ray train`](/api/cli#ray-train) command follows the same API as
[`spacy train`](/api/cli#train), with a few extra options to configure the Ray
setup. You can optionally set the `--address` option to point to your Ray
cluster. If it's not set, Ray will run locally.

```cli
python -m spacy ray train config.cfg --n-workers 2
```
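
To target a remote Ray cluster instead of running locally, the same command can be given the `--address` option mentioned above. This is only a sketch; the address value is a placeholder for your own cluster:

```cli
python -m spacy ray train config.cfg --n-workers 2 --address <your-cluster-address>
```
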
<Project id="integrations/ray">

Get started with parallel training using our project template. It trains a
simple model on a Universal Dependencies Treebank and lets you parallelize the
training with Ray.

</Project>

### How parallel training works {#parallel-training-details}

Each worker receives a shard of the **data** and builds a copy of the **model
and optimizer** from the [`config.cfg`](#config). It also has a communication
channel to **pass gradients and parameters** to the other workers. Additionally,
each worker is given ownership of a subset of the parameter arrays. Every
parameter array is owned by exactly one worker, and the workers are given a
mapping so they know which worker owns which parameter.



As training proceeds, every worker will be computing gradients for **all** of
the model parameters. When they compute gradients for parameters they don't own,
they'll **send them to the worker** that does own that parameter, along with a
version identifier so that the owner can decide whether to discard the gradient.
Workers use the gradients they receive and the ones they compute locally to
update the parameters they own, and then broadcast the updated array and a new
version ID to the other workers.

This training procedure is **asynchronous** and **non-blocking**. Workers always
push their gradient increments and parameter updates; they do not have to pull
them and block on the result, so the transfers can happen in the background,
overlapped with the actual training work. The workers also do not have to stop
and wait for each other ("synchronize") at the start of each batch. This is very
useful for spaCy, because spaCy is often trained on long documents, which means
**batches can vary in size** significantly. Uneven workloads make synchronous
gradient descent inefficient, because if one batch is slow, all of the other
workers are stuck waiting for it to complete before they can continue.
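
To make the scheme above concrete, here is a rough, self-contained Python sketch of the ownership and routing logic. It is not the spacy-ray implementation (the `Worker` class, the queue-based transport and all names here are invented for illustration), but it follows the description above: each parameter array has exactly one owner, non-owners push gradients tagged with a version ID to the owner, stale gradients can be discarded, and the owner applies updates and broadcasts the new array and version without any synchronization barrier.

```python
import queue
from dataclasses import dataclass

import numpy as np

# Illustrative only: the names below (Worker, Update, drain_inbox, ...) are
# invented for this sketch and are not part of spacy-ray's API.


@dataclass
class Update:
    param_id: str
    version: int
    array: np.ndarray
    is_gradient: bool  # True: gradient sent to the owner, False: fresh parameters


class Worker:
    def __init__(self, rank: int, n_workers: int, params: dict):
        self.rank = rank
        # Full copy of all parameters and their version IDs.
        self.params = {name: array.copy() for name, array in params.items()}
        self.versions = {name: 0 for name in params}
        # Every parameter array is owned by exactly one worker (round-robin here).
        self.owner = {name: i % n_workers for i, name in enumerate(sorted(params))}
        self.inbox = queue.Queue()  # incoming gradients and parameter broadcasts
        self.peers = []  # filled in once all workers exist

    def handle_local_gradients(self, grads: dict, lr: float = 0.001) -> None:
        """Apply gradients for owned parameters, push the rest to their owners."""
        for name, grad in grads.items():
            if self.owner[name] == self.rank:
                self._update_and_broadcast(name, grad, lr)
            else:
                # Non-blocking push: training continues without waiting for a reply.
                self.peers[self.owner[name]].inbox.put(
                    Update(name, self.versions[name], grad, is_gradient=True)
                )

    def drain_inbox(self, lr: float = 0.001) -> None:
        """Process queued messages in the background, with no synchronization barrier."""
        while not self.inbox.empty():
            msg = self.inbox.get()
            if msg.is_gradient:
                # The owner may discard gradients computed against stale parameters.
                if msg.version == self.versions[msg.param_id]:
                    self._update_and_broadcast(msg.param_id, msg.array, lr)
            else:
                # A broadcast of updated parameters from their owner.
                self.params[msg.param_id] = msg.array
                self.versions[msg.param_id] = msg.version

    def _update_and_broadcast(self, name: str, grad: np.ndarray, lr: float) -> None:
        self.params[name] -= lr * grad
        self.versions[name] += 1
        for peer in self.peers:
            if peer.rank != self.rank:
                peer.inbox.put(
                    Update(name, self.versions[name], self.params[name], is_gradient=False)
                )
```

In the actual spacy-ray setup the transport runs over Ray rather than in-process queues; the sketch only illustrates the flow of gradients, version checks and parameter broadcasts described above.
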
## Internal training API {#api}
<Infobox variant="danger">