Mirror of https://github.com/explosion/spaCy.git

commit bb523d4d91 (parent 3478ff1eb0)

Remove spacy-ray from docs (#11781)

* Remove spacy ray from cli docs
* Remove more ray docs
* Remove ray from universe
@@ -15,7 +15,6 @@ menu:
   - ['assemble', 'assemble']
   - ['package', 'package']
   - ['project', 'project']
-  - ['ray', 'ray']
   - ['huggingface-hub', 'huggingface-hub']
 ---
 
@@ -1502,50 +1501,6 @@ $ python -m spacy project dvc [project_dir] [workflow] [--force] [--verbose] [--
 | `--help`, `-h`    | Show help message and available arguments. ~~bool (flag)~~                                    |
 | **CREATES**       | A `dvc.yaml` file in the project directory, based on the steps defined in the given workflow. |
 
-## ray {#ray new="3"}
-
-The `spacy ray` CLI includes commands for parallel and distributed computing via
-[Ray](https://ray.io).
-
-<Infobox variant="warning">
-
-To use this command, you need the
-[`spacy-ray`](https://github.com/explosion/spacy-ray) package installed.
-Installing the package will automatically add the `ray` command to the spaCy
-CLI.
-
-</Infobox>
-
-### ray train {#ray-train tag="command"}
-
-Train a spaCy pipeline using [Ray](https://ray.io) for parallel training. The
-command works just like [`spacy train`](/api/cli#train). For more details and
-examples, see the usage guide on
-[parallel training](/usage/training#parallel-training) and the spaCy project
-[integration](/usage/projects#ray).
-
-```cli
-$ python -m spacy ray train [config_path] [--code] [--output] [--n-workers] [--address] [--gpu-id] [--verbose] [overrides]
-```
-
-> #### Example
->
-> ```cli
-> $ python -m spacy ray train config.cfg --n-workers 2
-> ```
-
-| Name                | Description                                                                                                                                                                                |
-| ------------------- | ------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------ |
-| `config_path`       | Path to [training config](/api/data-formats#config) file containing all settings and hyperparameters. ~~Path (positional)~~                                                                |
-| `--code`, `-c`      | Path to Python file with additional code to be imported. Allows [registering custom functions](/usage/training#custom-functions) for new architectures. ~~Optional[Path] \(option)~~       |
-| `--output`, `-o`    | Directory or remote storage URL for saving trained pipeline. The directory will be created if it doesn't exist. ~~Optional[Path] \(option)~~                                               |
-| `--n-workers`, `-n` | The number of workers. Defaults to `1`. ~~int (option)~~                                                                                                                                   |
-| `--address`, `-a`   | Optional address of the Ray cluster. If not set (default), Ray will run locally. ~~Optional[str] \(option)~~                                                                               |
-| `--gpu-id`, `-g`    | GPU ID or `-1` for CPU. Defaults to `-1`. ~~int (option)~~                                                                                                                                 |
-| `--verbose`, `-V`   | Display more information for debugging purposes. ~~bool (flag)~~                                                                                                                           |
-| `--help`, `-h`      | Show help message and available arguments. ~~bool (flag)~~                                                                                                                                 |
-| overrides           | Config parameters to override. Should be options starting with `--` that correspond to the config section and value to override, e.g. `--paths.train ./train.spacy`. ~~Any (option/flag)~~ |
-
 ## huggingface-hub {#huggingface-hub new="3.1"}
 
 The `spacy huggingface-hub` CLI includes commands for uploading your trained
@@ -75,7 +75,6 @@ spaCy's [`setup.cfg`](%%GITHUB_SPACY/setup.cfg) for details on what's included.
 | ---------------- | -------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------- |
 | `lookups`        | Install [`spacy-lookups-data`](https://github.com/explosion/spacy-lookups-data) for data tables for lemmatization and lexeme normalization. The data is serialized with trained pipelines, so you only need this package if you want to train your own models.   |
 | `transformers`   | Install [`spacy-transformers`](https://github.com/explosion/spacy-transformers). The package will be installed automatically when you install a transformer-based pipeline.                                                                                     |
-| `ray`            | Install [`spacy-ray`](https://github.com/explosion/spacy-ray) to add CLI commands for [parallel training](/usage/training#parallel-training).                                                                                                                    |
 | `cuda`, ...      | Install spaCy with GPU support provided by [CuPy](https://cupy.chainer.org) for your given CUDA version. See the GPU [installation instructions](#gpu) for details and options.                                                                                  |
 | `apple`          | Install [`thinc-apple-ops`](https://github.com/explosion/thinc-apple-ops) to improve performance on an Apple M1.                                                                                                                                                |
 | `ja`, `ko`, `th` | Install additional dependencies required for tokenization for the [languages](/usage/models#languages).                                                                                                                                                         |
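As a usage note on the extras table in this hunk: pip extras combine in a single install, so several of the remaining extras can be requested together. A minimal sketch using standard pip extras syntax (the quoting is only to protect the brackets from the shell):

```cli
$ pip install -U 'spacy[lookups,transformers]'
```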
				
@@ -1014,54 +1014,6 @@ https://github.com/explosion/projects/blob/v3/integrations/fastapi/scripts/main.
 
 ---
 
-### Ray {#ray} <IntegrationLogo name="ray" width={100} height="auto" align="right" />
-
-> #### Installation
->
-> ```cli
-> $ pip install -U %%SPACY_PKG_NAME[ray]%%SPACY_PKG_FLAGS
-> # Check that the CLI is registered
-> $ python -m spacy ray --help
-> ```
-
-[Ray](https://ray.io/) is a fast and simple framework for building and running
-**distributed applications**. You can use Ray for parallel and distributed
-training with spaCy via our lightweight
-[`spacy-ray`](https://github.com/explosion/spacy-ray) extension package. If the
-package is installed in the same environment as spaCy, it will automatically add
-[`spacy ray`](/api/cli#ray) commands to your spaCy CLI. See the usage guide on
-[parallel training](/usage/training#parallel-training) for more details on how
-it works under the hood.
-
-<Project id="integrations/ray">
-
-Get started with parallel training using our project template. It trains a
-simple model on a Universal Dependencies Treebank and lets you parallelize the
-training with Ray.
-
-</Project>
-
-You can integrate [`spacy ray train`](/api/cli#ray-train) into your
-`project.yml` just like the regular training command and pass it the config, an
-optional output directory or remote storage URL and config overrides if needed.
-
-<!-- prettier-ignore -->
-```yaml
-### project.yml
-commands:
-  - name: "ray"
-    help: "Train a model via parallel training with Ray"
-    script:
-      - "python -m spacy ray train configs/config.cfg -o training/ --paths.train corpus/train.spacy --paths.dev corpus/dev.spacy"
-    deps:
-      - "corpus/train.spacy"
-      - "corpus/dev.spacy"
-    outputs:
-      - "training/model-best"
-```
-
----
-
 ### Weights & Biases {#wandb} <IntegrationLogo name="wandb" width={175} height="auto" align="right" />
 
 [Weights & Biases](https://www.wandb.com/) is a popular platform for experiment
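As a usage note: a command defined in `project.yml`, like the `ray` command in the removed block above, is executed through the standard projects CLI, which resolves the declared `deps` before running the script:

```cli
$ python -m spacy project run ray
```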
				
@@ -1572,77 +1572,6 @@ token-based annotations like the dependency parse or entity labels, you'll need
 to take care to adjust the `Example` object so its annotations match and remain
 valid.
 
-## Parallel & distributed training with Ray {#parallel-training}
-
-> #### Installation
->
-> ```cli
-> $ pip install -U %%SPACY_PKG_NAME[ray]%%SPACY_PKG_FLAGS
-> # Check that the CLI is registered
-> $ python -m spacy ray --help
-> ```
-
-[Ray](https://ray.io/) is a fast and simple framework for building and running
-**distributed applications**. You can use Ray to train spaCy on one or more
-remote machines, potentially speeding up your training process. Parallel
-training won't always be faster though – it depends on your batch size, models,
-and hardware.
-
-<Infobox variant="warning">
-
-To use Ray with spaCy, you need the
-[`spacy-ray`](https://github.com/explosion/spacy-ray) package installed.
-Installing the package will automatically add the `ray` command to the spaCy
-CLI.
-
-</Infobox>
-
-The [`spacy ray train`](/api/cli#ray-train) command follows the same API as
-[`spacy train`](/api/cli#train), with a few extra options to configure the Ray
-setup. You can optionally set the `--address` option to point to your Ray
-cluster. If it's not set, Ray will run locally.
-
-```cli
-python -m spacy ray train config.cfg --n-workers 2
-```
-
-<Project id="integrations/ray">
-
-Get started with parallel training using our project template. It trains a
-simple model on a Universal Dependencies Treebank and lets you parallelize the
-training with Ray.
-
-</Project>
-
-### How parallel training works {#parallel-training-details}
-
-Each worker receives a shard of the **data** and builds a copy of the **model
-and optimizer** from the [`config.cfg`](#config). It also has a communication
-channel to **pass gradients and parameters** to the other workers. Additionally,
-each worker is given ownership of a subset of the parameter arrays. Every
-parameter array is owned by exactly one worker, and the workers are given a
-mapping so they know which worker owns which parameter.
-
-
-
-As training proceeds, every worker will be computing gradients for **all** of
-the model parameters. When they compute gradients for parameters they don't own,
-they'll **send them to the worker** that does own that parameter, along with a
-version identifier so that the owner can decide whether to discard the gradient.
-Workers use the gradients they receive and the ones they compute locally to
-update the parameters they own, and then broadcast the updated array and a new
-version ID to the other workers.
-
-This training procedure is **asynchronous** and **non-blocking**. Workers always
-push their gradient increments and parameter updates; they do not have to pull
-them and block on the result, so the transfers can happen in the background,
-overlapped with the actual training work. The workers also do not have to stop
-and wait for each other ("synchronize") at the start of each batch. This is very
-useful for spaCy, because spaCy is often trained on long documents, which means
-**batches can vary in size** significantly. Uneven workloads make synchronous
-gradient descent inefficient, because if one batch is slow, all of the other
-workers are stuck waiting for it to complete before they can continue.
-
 ## Internal training API {#api}
 
 <Infobox variant="danger">
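The removed "How parallel training works" section describes an ownership-based, push-only update scheme: each parameter array has exactly one owner, gradients travel with a version ID, and stale gradients are discarded rather than blocking anyone. A minimal single-process sketch of that idea follows; all names here are hypothetical and this is not spacy-ray's actual implementation:

```python
import numpy as np

class Worker:
    """Sketch of the ownership scheme: each parameter array has one owner."""

    def __init__(self, worker_id, owned_params):
        self.worker_id = worker_id
        self.params = dict(owned_params)               # name -> array (owned subset)
        self.versions = {n: 0 for n in owned_params}   # version ID per owned array
        self.inbox = []                                # (name, grad) queued by peers

    def receive_gradient(self, name, grad, version):
        # Gradients arrive tagged with the parameter version they were
        # computed against; the owner discards stale ones instead of
        # making the sender wait.
        if version == self.versions[name]:
            self.inbox.append((name, grad))

    def apply_updates(self, learning_rate=0.001):
        # Fold queued gradients into the owned arrays and bump the version,
        # which would then be pushed (broadcast) to the other workers.
        for name, grad in self.inbox:
            self.params[name] -= learning_rate * grad
            self.versions[name] += 1
        self.inbox.clear()

# Example: worker 0 owns one array and accepts a gradient from a peer.
owner = Worker(0, {"tok2vec.W": np.zeros((2, 2))})
owner.receive_gradient("tok2vec.W", np.ones((2, 2)), version=0)
owner.apply_updates()
assert owner.versions["tok2vec.W"] == 1
```

Because gradients and updated arrays are only ever pushed, no worker blocks at batch boundaries, which is the non-blocking property the removed text emphasizes for batches of uneven size.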
				
@@ -557,17 +557,6 @@
             "tags": ["sentiment", "textblob"],
             "spacy_version": 3
         },
-        {
-            "id": "spacy-ray",
-            "title": "spacy-ray",
-            "slogan": "Parallel and distributed training with spaCy and Ray",
-            "description": "[Ray](https://ray.io/) is a fast and simple framework for building and running **distributed applications**. This very lightweight extension package lets you use Ray for parallel and distributed training with spaCy. If `spacy-ray` is installed in the same environment as spaCy, it will automatically add `spacy ray` commands to your spaCy CLI.",
-            "github": "explosion/spacy-ray",
-            "pip": "spacy-ray",
-            "category": ["training"],
-            "author": "Explosion / Anyscale",
-            "thumb": "https://i.imgur.com/7so6ZpS.png"
-        },
         {
             "id": "spacy-sentence-bert",
             "title": "spaCy - sentence-transformers",