Mirror of https://github.com/explosion/spaCy.git

commit bb523d4d91 (parent 3478ff1eb0)

Remove spacy-ray from docs (#11781)

* Remove spacy ray from cli docs
* Remove more ray docs
* Remove ray from universe
@@ -15,7 +15,6 @@ menu:
   - ['assemble', 'assemble']
   - ['package', 'package']
   - ['project', 'project']
-  - ['ray', 'ray']
   - ['huggingface-hub', 'huggingface-hub']
 ---
 
@@ -1502,50 +1501,6 @@ $ python -m spacy project dvc [project_dir] [workflow] [--force] [--verbose] [--
 | `--help`, `-h`    | Show help message and available arguments. ~~bool (flag)~~                                    |
 | **CREATES**       | A `dvc.yaml` file in the project directory, based on the steps defined in the given workflow. |
 
-## ray {#ray new="3"}
-
-The `spacy ray` CLI includes commands for parallel and distributed computing via
-[Ray](https://ray.io).
-
-<Infobox variant="warning">
-
-To use this command, you need the
-[`spacy-ray`](https://github.com/explosion/spacy-ray) package installed.
-Installing the package will automatically add the `ray` command to the spaCy
-CLI.
-
-</Infobox>
-
-### ray train {#ray-train tag="command"}
-
-Train a spaCy pipeline using [Ray](https://ray.io) for parallel training. The
-command works just like [`spacy train`](/api/cli#train). For more details and
-examples, see the usage guide on
-[parallel training](/usage/training#parallel-training) and the spaCy project
-[integration](/usage/projects#ray).
-
-```cli
-$ python -m spacy ray train [config_path] [--code] [--output] [--n-workers] [--address] [--gpu-id] [--verbose] [overrides]
-```
-
-> #### Example
->
-> ```cli
-> $ python -m spacy ray train config.cfg --n-workers 2
-> ```
-
-| Name                | Description                                                                                                                                                                                |
-| ------------------- | ------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------ |
-| `config_path`       | Path to [training config](/api/data-formats#config) file containing all settings and hyperparameters. ~~Path (positional)~~                                                                |
-| `--code`, `-c`      | Path to Python file with additional code to be imported. Allows [registering custom functions](/usage/training#custom-functions) for new architectures. ~~Optional[Path] \(option)~~       |
-| `--output`, `-o`    | Directory or remote storage URL for saving trained pipeline. The directory will be created if it doesn't exist. ~~Optional[Path] \(option)~~                                               |
-| `--n-workers`, `-n` | The number of workers. Defaults to `1`. ~~int (option)~~                                                                                                                                   |
-| `--address`, `-a`   | Optional address of the Ray cluster. If not set (default), Ray will run locally. ~~Optional[str] \(option)~~                                                                               |
-| `--gpu-id`, `-g`    | GPU ID or `-1` for CPU. Defaults to `-1`. ~~int (option)~~                                                                                                                                 |
-| `--verbose`, `-V`   | Display more information for debugging purposes. ~~bool (flag)~~                                                                                                                           |
-| `--help`, `-h`      | Show help message and available arguments. ~~bool (flag)~~                                                                                                                                 |
-| overrides           | Config parameters to override. Should be options starting with `--` that correspond to the config section and value to override, e.g. `--paths.train ./train.spacy`. ~~Any (option/flag)~~ |
-
 ## huggingface-hub {#huggingface-hub new="3.1"}
 
 The `spacy huggingface-hub` CLI includes commands for uploading your trained
@@ -75,7 +75,6 @@ spaCy's [`setup.cfg`](%%GITHUB_SPACY/setup.cfg) for details on what's included.
 | ---------------- | -------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------- |
 | `lookups`        | Install [`spacy-lookups-data`](https://github.com/explosion/spacy-lookups-data) for data tables for lemmatization and lexeme normalization. The data is serialized with trained pipelines, so you only need this package if you want to train your own models.   |
 | `transformers`   | Install [`spacy-transformers`](https://github.com/explosion/spacy-transformers). The package will be installed automatically when you install a transformer-based pipeline.                                                                                     |
-| `ray`            | Install [`spacy-ray`](https://github.com/explosion/spacy-ray) to add CLI commands for [parallel training](/usage/training#parallel-training).                                                                                                                    |
 | `cuda`, ...      | Install spaCy with GPU support provided by [CuPy](https://cupy.chainer.org) for your given CUDA version. See the GPU [installation instructions](#gpu) for details and options.                                                                                  |
 | `apple`          | Install [`thinc-apple-ops`](https://github.com/explosion/thinc-apple-ops) to improve performance on an Apple M1.                                                                                                                                                |
 | `ja`, `ko`, `th` | Install additional dependencies required for tokenization for the [languages](/usage/models#languages).                                                                                                                                                         |
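As a usage note on the extras table in this hunk: pip extras combine in a single install, so several of the remaining extras can be requested together. A minimal sketch using standard pip extras syntax (the quoting is only to protect the brackets from the shell):

```cli
$ pip install -U 'spacy[lookups,transformers]'
```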
				
@@ -1014,54 +1014,6 @@ https://github.com/explosion/projects/blob/v3/integrations/fastapi/scripts/main.
 
 ---
 
-### Ray {#ray} <IntegrationLogo name="ray" width={100} height="auto" align="right" />
-
-> #### Installation
->
-> ```cli
-> $ pip install -U %%SPACY_PKG_NAME[ray]%%SPACY_PKG_FLAGS
-> # Check that the CLI is registered
-> $ python -m spacy ray --help
-> ```
-
-[Ray](https://ray.io/) is a fast and simple framework for building and running
-**distributed applications**. You can use Ray for parallel and distributed
-training with spaCy via our lightweight
-[`spacy-ray`](https://github.com/explosion/spacy-ray) extension package. If the
-package is installed in the same environment as spaCy, it will automatically add
-[`spacy ray`](/api/cli#ray) commands to your spaCy CLI. See the usage guide on
-[parallel training](/usage/training#parallel-training) for more details on how
-it works under the hood.
-
-<Project id="integrations/ray">
-
-Get started with parallel training using our project template. It trains a
-simple model on a Universal Dependencies Treebank and lets you parallelize the
-training with Ray.
-
-</Project>
-
-You can integrate [`spacy ray train`](/api/cli#ray-train) into your
-`project.yml` just like the regular training command and pass it the config, an
-optional output directory or remote storage URL and config overrides if needed.
-
-<!-- prettier-ignore -->
-```yaml
-### project.yml
-commands:
-  - name: "ray"
-    help: "Train a model via parallel training with Ray"
-    script:
-      - "python -m spacy ray train configs/config.cfg -o training/ --paths.train corpus/train.spacy --paths.dev corpus/dev.spacy"
-    deps:
-      - "corpus/train.spacy"
-      - "corpus/dev.spacy"
-    outputs:
-      - "training/model-best"
-```
-
----
-
 ### Weights & Biases {#wandb} <IntegrationLogo name="wandb" width={175} height="auto" align="right" />
 
 [Weights & Biases](https://www.wandb.com/) is a popular platform for experiment
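As a usage note: a command defined in `project.yml`, like the `ray` command in the removed block above, is executed through the standard projects CLI, which resolves the declared `deps` before running the script:

```cli
$ python -m spacy project run ray
```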
				
@@ -1572,77 +1572,6 @@ token-based annotations like the dependency parse or entity labels, you'll need
 to take care to adjust the `Example` object so its annotations match and remain
 valid.
 
-## Parallel & distributed training with Ray {#parallel-training}
-
-> #### Installation
->
-> ```cli
-> $ pip install -U %%SPACY_PKG_NAME[ray]%%SPACY_PKG_FLAGS
-> # Check that the CLI is registered
-> $ python -m spacy ray --help
-> ```
-
-[Ray](https://ray.io/) is a fast and simple framework for building and running
-**distributed applications**. You can use Ray to train spaCy on one or more
-remote machines, potentially speeding up your training process. Parallel
-training won't always be faster though – it depends on your batch size, models,
-and hardware.
-
-<Infobox variant="warning">
-
-To use Ray with spaCy, you need the
-[`spacy-ray`](https://github.com/explosion/spacy-ray) package installed.
-Installing the package will automatically add the `ray` command to the spaCy
-CLI.
-
-</Infobox>
-
-The [`spacy ray train`](/api/cli#ray-train) command follows the same API as
-[`spacy train`](/api/cli#train), with a few extra options to configure the Ray
-setup. You can optionally set the `--address` option to point to your Ray
-cluster. If it's not set, Ray will run locally.
-
-```cli
-python -m spacy ray train config.cfg --n-workers 2
-```
-
-<Project id="integrations/ray">
-
-Get started with parallel training using our project template. It trains a
-simple model on a Universal Dependencies Treebank and lets you parallelize the
-training with Ray.
-
-</Project>
-
-### How parallel training works {#parallel-training-details}
-
-Each worker receives a shard of the **data** and builds a copy of the **model
-and optimizer** from the [`config.cfg`](#config). It also has a communication
-channel to **pass gradients and parameters** to the other workers. Additionally,
-each worker is given ownership of a subset of the parameter arrays. Every
-parameter array is owned by exactly one worker, and the workers are given a
-mapping so they know which worker owns which parameter.
-
-
-
-As training proceeds, every worker will be computing gradients for **all** of
-the model parameters. When they compute gradients for parameters they don't own,
-they'll **send them to the worker** that does own that parameter, along with a
-version identifier so that the owner can decide whether to discard the gradient.
-Workers use the gradients they receive and the ones they compute locally to
-update the parameters they own, and then broadcast the updated array and a new
-version ID to the other workers.
-
-This training procedure is **asynchronous** and **non-blocking**. Workers always
-push their gradient increments and parameter updates; they do not have to pull
-them and block on the result, so the transfers can happen in the background,
-overlapped with the actual training work. The workers also do not have to stop
-and wait for each other ("synchronize") at the start of each batch. This is very
-useful for spaCy, because spaCy is often trained on long documents, which means
-**batches can vary in size** significantly. Uneven workloads make synchronous
-gradient descent inefficient, because if one batch is slow, all of the other
-workers are stuck waiting for it to complete before they can continue.
-
 ## Internal training API {#api}
 
 <Infobox variant="danger">
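The removed "How parallel training works" section describes an ownership-based, push-only update scheme: each parameter array has exactly one owner, gradients travel with a version ID, and stale gradients are discarded rather than blocking anyone. A minimal single-process sketch of that idea follows; all names here are hypothetical and this is not spacy-ray's actual implementation:

```python
import numpy as np

class Worker:
    """Sketch of the ownership scheme: each parameter array has one owner."""

    def __init__(self, worker_id, owned_params):
        self.worker_id = worker_id
        self.params = dict(owned_params)               # name -> array (owned subset)
        self.versions = {n: 0 for n in owned_params}   # version ID per owned array
        self.inbox = []                                # (name, grad) queued by peers

    def receive_gradient(self, name, grad, version):
        # Gradients arrive tagged with the parameter version they were
        # computed against; the owner discards stale ones instead of
        # making the sender wait.
        if version == self.versions[name]:
            self.inbox.append((name, grad))

    def apply_updates(self, learning_rate=0.001):
        # Fold queued gradients into the owned arrays and bump the version,
        # which would then be pushed (broadcast) to the other workers.
        for name, grad in self.inbox:
            self.params[name] -= learning_rate * grad
            self.versions[name] += 1
        self.inbox.clear()

# Example: worker 0 owns one array and accepts a gradient from a peer.
owner = Worker(0, {"tok2vec.W": np.zeros((2, 2))})
owner.receive_gradient("tok2vec.W", np.ones((2, 2)), version=0)
owner.apply_updates()
assert owner.versions["tok2vec.W"] == 1
```

Because gradients and updated arrays are only ever pushed, no worker blocks at batch boundaries, which is the non-blocking property the removed text emphasizes for batches of uneven size.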
				
@@ -557,17 +557,6 @@
             "tags": ["sentiment", "textblob"],
             "spacy_version": 3
         },
-        {
-            "id": "spacy-ray",
-            "title": "spacy-ray",
-            "slogan": "Parallel and distributed training with spaCy and Ray",
-            "description": "[Ray](https://ray.io/) is a fast and simple framework for building and running **distributed applications**. This very lightweight extension package lets you use Ray for parallel and distributed training with spaCy. If `spacy-ray` is installed in the same environment as spaCy, it will automatically add `spacy ray` commands to your spaCy CLI.",
-            "github": "explosion/spacy-ray",
-            "pip": "spacy-ray",
-            "category": ["training"],
-            "author": "Explosion / Anyscale",
-            "thumb": "https://i.imgur.com/7so6ZpS.png"
-        },
         {
             "id": "spacy-sentence-bert",
             "title": "spaCy - sentence-transformers",