Remove spacy-ray from docs (#11781)

* Remove spacy ray from cli docs

* Remove more ray docs

* Remove ray from universe
This commit is contained in:
Paul O'Leary McCann 2022-11-14 19:58:38 +09:00 committed by GitHub
parent 3478ff1eb0
commit bb523d4d91
No known key found for this signature in database
GPG Key ID: 4AEE18F83AFDEB23
5 changed files with 0 additions and 176 deletions

View File

@ -15,7 +15,6 @@ menu:
- ['assemble', 'assemble']
- ['package', 'package']
- ['project', 'project']
- ['ray', 'ray']
- ['huggingface-hub', 'huggingface-hub']
---
@ -1502,50 +1501,6 @@ $ python -m spacy project dvc [project_dir] [workflow] [--force] [--verbose] [--
| `--help`, `-h` | Show help message and available arguments. ~~bool (flag)~~ |
| **CREATES** | A `dvc.yaml` file in the project directory, based on the steps defined in the given workflow. |
## ray {#ray new="3"}
The `spacy ray` CLI includes commands for parallel and distributed computing via
[Ray](https://ray.io).
<Infobox variant="warning">
To use this command, you need the
[`spacy-ray`](https://github.com/explosion/spacy-ray) package installed.
Installing the package will automatically add the `ray` command to the spaCy
CLI.
</Infobox>
### ray train {#ray-train tag="command"}
Train a spaCy pipeline using [Ray](https://ray.io) for parallel training. The
command works just like [`spacy train`](/api/cli#train). For more details and
examples, see the usage guide on
[parallel training](/usage/training#parallel-training) and the spaCy project
[integration](/usage/projects#ray).
```cli
$ python -m spacy ray train [config_path] [--code] [--output] [--n-workers] [--address] [--gpu-id] [--verbose] [overrides]
```
> #### Example
>
> ```cli
> $ python -m spacy ray train config.cfg --n-workers 2
> ```
| Name | Description |
| ------------------- | ------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------ |
| `config_path` | Path to [training config](/api/data-formats#config) file containing all settings and hyperparameters. ~~Path (positional)~~ |
| `--code`, `-c` | Path to Python file with additional code to be imported. Allows [registering custom functions](/usage/training#custom-functions) for new architectures. ~~Optional[Path] \(option)~~ |
| `--output`, `-o` | Directory or remote storage URL for saving trained pipeline. The directory will be created if it doesn't exist. ~~Optional[Path] \(option)~~ |
| `--n-workers`, `-n` | The number of workers. Defaults to `1`. ~~int (option)~~ |
| `--address`, `-a` | Optional address of the Ray cluster. If not set (default), Ray will run locally. ~~Optional[str] \(option)~~ |
| `--gpu-id`, `-g` | GPU ID or `-1` for CPU. Defaults to `-1`. ~~int (option)~~ |
| `--verbose`, `-V` | Display more information for debugging purposes. ~~bool (flag)~~ |
| `--help`, `-h` | Show help message and available arguments. ~~bool (flag)~~ |
| overrides | Config parameters to override. Should be options starting with `--` that correspond to the config section and value to override, e.g. `--paths.train ./train.spacy`. ~~Any (option/flag)~~ |
## huggingface-hub {#huggingface-hub new="3.1"}
The `spacy huggingface-cli` CLI includes commands for uploading your trained

View File

@ -75,7 +75,6 @@ spaCy's [`setup.cfg`](%%GITHUB_SPACY/setup.cfg) for details on what's included.
| ---------------- | -------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------- |
| `lookups` | Install [`spacy-lookups-data`](https://github.com/explosion/spacy-lookups-data) for data tables for lemmatization and lexeme normalization. The data is serialized with trained pipelines, so you only need this package if you want to train your own models. |
| `transformers` | Install [`spacy-transformers`](https://github.com/explosion/spacy-transformers). The package will be installed automatically when you install a transformer-based pipeline. |
| `ray` | Install [`spacy-ray`](https://github.com/explosion/spacy-ray) to add CLI commands for [parallel training](/usage/training#parallel-training). |
| `cuda`, ... | Install spaCy with GPU support provided by [CuPy](https://cupy.chainer.org) for your given CUDA version. See the GPU [installation instructions](#gpu) for details and options. |
| `apple` | Install [`thinc-apple-ops`](https://github.com/explosion/thinc-apple-ops) to improve performance on an Apple M1. |
| `ja`, `ko`, `th` | Install additional dependencies required for tokenization for the [languages](/usage/models#languages). |

View File

@ -1014,54 +1014,6 @@ https://github.com/explosion/projects/blob/v3/integrations/fastapi/scripts/main.
---
### Ray {#ray} <IntegrationLogo name="ray" width={100} height="auto" align="right" />
> #### Installation
>
> ```cli
> $ pip install -U %%SPACY_PKG_NAME[ray]%%SPACY_PKG_FLAGS
> # Check that the CLI is registered
> $ python -m spacy ray --help
> ```
[Ray](https://ray.io/) is a fast and simple framework for building and running
**distributed applications**. You can use Ray for parallel and distributed
training with spaCy via our lightweight
[`spacy-ray`](https://github.com/explosion/spacy-ray) extension package. If the
package is installed in the same environment as spaCy, it will automatically add
[`spacy ray`](/api/cli#ray) commands to your spaCy CLI. See the usage guide on
[parallel training](/usage/training#parallel-training) for more details on how
it works under the hood.
<Project id="integrations/ray">
Get started with parallel training using our project template. It trains a
simple model on a Universal Dependencies Treebank and lets you parallelize the
training with Ray.
</Project>
You can integrate [`spacy ray train`](/api/cli#ray-train) into your
`project.yml` just like the regular training command and pass it the config, and
optional output directory or remote storage URL and config overrides if needed.
<!-- prettier-ignore -->
```yaml
### project.yml
commands:
- name: "ray"
help: "Train a model via parallel training with Ray"
script:
- "python -m spacy ray train configs/config.cfg -o training/ --paths.train corpus/train.spacy --paths.dev corpus/dev.spacy"
deps:
- "corpus/train.spacy"
- "corpus/dev.spacy"
outputs:
- "training/model-best"
```
---
### Weights & Biases {#wandb} <IntegrationLogo name="wandb" width={175} height="auto" align="right" />
[Weights & Biases](https://www.wandb.com/) is a popular platform for experiment

View File

@ -1572,77 +1572,6 @@ token-based annotations like the dependency parse or entity labels, you'll need
to take care to adjust the `Example` object so its annotations match and remain
valid.
## Parallel & distributed training with Ray {#parallel-training}
> #### Installation
>
> ```cli
> $ pip install -U %%SPACY_PKG_NAME[ray]%%SPACY_PKG_FLAGS
> # Check that the CLI is registered
> $ python -m spacy ray --help
> ```
[Ray](https://ray.io/) is a fast and simple framework for building and running
**distributed applications**. You can use Ray to train spaCy on one or more
remote machines, potentially speeding up your training process. Parallel
training won't always be faster though it depends on your batch size, models,
and hardware.
<Infobox variant="warning">
To use Ray with spaCy, you need the
[`spacy-ray`](https://github.com/explosion/spacy-ray) package installed.
Installing the package will automatically add the `ray` command to the spaCy
CLI.
</Infobox>
The [`spacy ray train`](/api/cli#ray-train) command follows the same API as
[`spacy train`](/api/cli#train), with a few extra options to configure the Ray
setup. You can optionally set the `--address` option to point to your Ray
cluster. If it's not set, Ray will run locally.
```cli
python -m spacy ray train config.cfg --n-workers 2
```
<Project id="integrations/ray">
Get started with parallel training using our project template. It trains a
simple model on a Universal Dependencies Treebank and lets you parallelize the
training with Ray.
</Project>
### How parallel training works {#parallel-training-details}
Each worker receives a shard of the **data** and builds a copy of the **model
and optimizer** from the [`config.cfg`](#config). It also has a communication
channel to **pass gradients and parameters** to the other workers. Additionally,
each worker is given ownership of a subset of the parameter arrays. Every
parameter array is owned by exactly one worker, and the workers are given a
mapping so they know which worker owns which parameter.
![Illustration of setup](../images/spacy-ray.svg)
As training proceeds, every worker will be computing gradients for **all** of
the model parameters. When they compute gradients for parameters they don't own,
they'll **send them to the worker** that does own that parameter, along with a
version identifier so that the owner can decide whether to discard the gradient.
Workers use the gradients they receive and the ones they compute locally to
update the parameters they own, and then broadcast the updated array and a new
version ID to the other workers.
This training procedure is **asynchronous** and **non-blocking**. Workers always
push their gradient increments and parameter updates, they do not have to pull
them and block on the result, so the transfers can happen in the background,
overlapped with the actual training work. The workers also do not have to stop
and wait for each other ("synchronize") at the start of each batch. This is very
useful for spaCy, because spaCy is often trained on long documents, which means
**batches can vary in size** significantly. Uneven workloads make synchronous
gradient descent inefficient, because if one batch is slow, all of the other
workers are stuck waiting for it to complete before they can continue.
## Internal training API {#api}
<Infobox variant="danger">

View File

@ -557,17 +557,6 @@
"tags": ["sentiment", "textblob"],
"spacy_version": 3
},
{
"id": "spacy-ray",
"title": "spacy-ray",
"slogan": "Parallel and distributed training with spaCy and Ray",
"description": "[Ray](https://ray.io/) is a fast and simple framework for building and running **distributed applications**. This very lightweight extension package lets you use Ray for parallel and distributed training with spaCy. If `spacy-ray` is installed in the same environment as spaCy, it will automatically add `spacy ray` commands to your spaCy CLI.",
"github": "explosion/spacy-ray",
"pip": "spacy-ray",
"category": ["training"],
"author": "Explosion / Anyscale",
"thumb": "https://i.imgur.com/7so6ZpS.png"
},
{
"id": "spacy-sentence-bert",
"title": "spaCy - sentence-transformers",