mirror of https://github.com/explosion/spaCy.git (synced 2025-02-03 21:24:11 +03:00)
Update docs [ci skip]
parent b2302c0a1c
commit 012b3a7096
@@ -895,8 +895,6 @@ what you need. By default, spaCy's
 can provide any other repo (public or private) that you have access to using the
 `--repo` option.
 
-<!-- TODO: update example once we've decided on repo structure -->
-
 ```cli
 $ python -m spacy project clone [name] [dest] [--repo] [--branch] [--sparse]
 ```
@@ -904,7 +902,7 @@ $ python -m spacy project clone [name] [dest] [--repo] [--branch] [--sparse]
 > #### Example
 >
 > ```cli
-> $ python -m spacy project clone some_example
+> $ python -m spacy project clone pipelines/ner_wikiner
 > ```
 >
 > Clone from custom repo:
@@ -289,8 +289,7 @@ of objects by referring to creation functions, including functions you register
 yourself. For details on how to get started with training your own model, check
 out the [training quickstart](/usage/training#quickstart).
 
-<!-- TODO:
-<Project id="en_core_trf_lg">
+<!-- TODO: <Project id="en_core_trf_lg">
 
 The easiest way to get started is to clone a transformers-based project
 template. Swap in your data, edit the settings and hyperparameters and train,
@@ -623,7 +622,7 @@ that are familiar from the training block: the `[pretraining.batcher]`,
 `[pretraining.optimizer]` and `[pretraining.corpus]` all work the same way and
 expect the same types of objects, although for pretraining your corpus does not
 need to have any annotations, so you will often use a different reader, such as
-the [`JsonlReader`](/api/toplevel#jsonlreader).
+the [`JsonlReader`](/api/top-level#jsonlreader).
 
 > #### Raw text format
 >
@@ -45,7 +45,7 @@ spaCy v3.0 introduces transformer-based pipelines that bring spaCy's accuracy
 right up to **current state-of-the-art**. You can also use a CPU-optimized
 pipeline, which is less accurate but much cheaper to run.
 
-<!-- TODO: -->
+<!-- TODO: update benchmarks and intro -->
 
 > #### Evaluation details
 >
@@ -68,6 +68,6 @@ our project template.
 
 </Project>
 
-<!-- ## Citing spaCy {#citation}
+<!-- TODO: ## Citing spaCy {#citation}
 
-<!-- TODO: update -->
+-->
@@ -356,11 +356,11 @@ that training configs are complete and experiments fully reproducible.
 
 </Infobox>
 
-Note that when using a PyTorch or Tensorflow model, it is recommended to set the GPU
-memory allocator accordingly. When `gpu_allocator` is set to "pytorch" or
-"tensorflow" in the training config, cupy will allocate memory via those respective libraries,
-preventing OOM errors when there's available memory sitting in the other
-library's pool.
+Note that when using a PyTorch or Tensorflow model, it is recommended to set the
+GPU memory allocator accordingly. When `gpu_allocator` is set to "pytorch" or
+"tensorflow" in the training config, cupy will allocate memory via those
+respective libraries, preventing OOM errors when there's available memory
+sitting in the other library's pool.
 
 ```ini
 ### config.cfg (excerpt)
@@ -489,7 +489,7 @@ with Model.define_operators({">>": chain}):
 <Infobox title="This section is still under construction" emoji="🚧" variant="warning">
 </Infobox>
 
-<!-- TODO:
+<!-- TODO: write trainable component section
 - Interaction with `predict`, `get_loss` and `set_annotations`
 - Initialization life-cycle with `begin_training`, correlation with add_label
 Example: relation extraction component (implemented as project template)
@@ -381,8 +381,6 @@ and loading pipeline packages, the underlying functionality is entirely based on
 native Python packaging. This allows your application to handle a spaCy pipeline
 like any other package dependency.
 
-<!-- TODO: reference relevant spaCy project -->
-
 ### Downloading and requiring package dependencies {#models-download}
 
 spaCy's built-in [`download`](/api/cli#download) command is mostly intended as a
@@ -29,15 +29,13 @@ and share your results with your team. spaCy projects can be used via the new
 
 ![Illustration of project workflow and commands](../images/projects.svg)
 
-<!-- TODO:
-<Project id="some_example_project">
+<Project id="pipelines/tagger_parser_ud">
 
-Lorem ipsum dolor sit amet, consectetur adipiscing elit. Phasellus interdum
-sodales lectus, ut sodales orci ullamcorper id. Sed condimentum neque ut erat
-mattis pretium.
+The easiest way to get started is to clone a project template and run it – for
+example, this end-to-end template that lets you train a **part-of-speech
+tagger** and **dependency parser** on a Universal Dependencies treebank.
 
 </Project>
--->
 
 spaCy projects make it easy to integrate with many other **awesome tools** in
 the data science and machine learning ecosystem to track and manage your data
@@ -65,10 +63,8 @@ project template and copies the files to a local directory. You can then run the
 project, e.g. to train a pipeline and edit the commands and scripts to build
 fully custom workflows.
 
-<!-- TODO: update with real example project -->
-
 ```cli
-python -m spacy project clone some_example_project
+python -m spacy project clone pipelines/tagger_parser_ud
 ```
 
 By default, the project will be cloned into the current working directory. You
@@ -216,10 +212,8 @@ format, train a pipeline, evaluate it and export metrics, package it and spin up
 a quick web demo. It looks pretty similar to a config file used to define CI
 pipelines.
 
-<!-- TODO: update with better (final) example -->
-
 ```yaml
-https://github.com/explosion/projects/tree/v3/tutorials/ner_fashion_brands/project.yml
+https://github.com/explosion/projects/tree/v3/pipelines/tagger_parser_ud/project.yml
 ```
 
 | Section | Description |
@@ -574,7 +574,7 @@ The directory will be created if it doesn't exist, and the whole pipeline data,
 meta and configuration will be written out. To make the pipeline more convenient
 to deploy, we recommend wrapping it as a [Python package](/api/cli#package).
 
-<Accordion title="What’s the difference between the config.cfg and meta.json?" spaced id="models-meta-vs-config">
+<Accordion title="What’s the difference between the config.cfg and meta.json?" spaced id="models-meta-vs-config" spaced>
 
 When you save a pipeline in spaCy v3.0+, two files will be exported: a
 [`config.cfg`](/api/data-formats#config) based on
@@ -596,6 +596,15 @@ based on [`nlp.meta`](/api/language#meta).
 
 </Accordion>
 
+<Project id="pipelines/tagger_parser_ud">
+
+The easiest way to get started with an end-to-end workflow is to clone a
+[project template](/usage/projects) and run it – for example, this template that
+lets you train a **part-of-speech tagger** and **dependency parser** on a
+Universal Dependencies treebank and generates an installable Python package.
+
+</Project>
+
 ### Generating a pipeline package {#models-generating}
 
 <Infobox title="Important note" variant="warning">
@@ -699,5 +708,3 @@ class and call [`from_disk`](/api/language#from_disk) instead.
 ```python
 nlp = spacy.blank("en").from_disk("/path/to/data")
 ```
-
-<!-- TODO: point to spaCy projects? -->
@@ -92,7 +92,7 @@ spaCy's binary `.spacy` format. You can either include the data paths in the
 $ python -m spacy train config.cfg --output ./output --paths.train ./train.spacy --paths.dev ./dev.spacy
 ```
 
-<Accordion title="How are the config recommendations generated?" id="quickstart-source">
+<Accordion title="How are the config recommendations generated?" id="quickstart-source" spaced>
 
 The recommended config settings generated by the quickstart widget and the
 [`init config`](/api/cli#init-config) command are based on some general **best
@@ -112,6 +112,15 @@ as we run more experiments.
 
 </Accordion>
 
+<Project id="pipelines/tagger_parser_ud">
+
+The easiest way to get started is to clone a [project template](/usage/projects)
+and run it – for example, this end-to-end template that lets you train a
+**part-of-speech tagger** and **dependency parser** on a Universal Dependencies
+treebank.
+
+</Project>
+
 ## Training config {#config}
 
 Training config files include all **settings and hyperparameters** for training
@@ -176,18 +176,16 @@ freely combine implementations from different frameworks into a single model.
 
 ### Manage end-to-end workflows with projects {#features-projects}
 
-<!-- TODO: update example -->
-
 > #### Example
 >
 > ```cli
 > # Clone a project template
-> $ python -m spacy project clone example
-> $ cd example
+> $ python -m spacy project clone pipelines/tagger_parser_ud
+> $ cd tagger_parser_ud
 > # Download data assets
 > $ python -m spacy project assets
 > # Run a workflow
-> $ python -m spacy project run train
+> $ python -m spacy project run all
 > ```
 
 spaCy projects let you manage and share **end-to-end spaCy workflows** for
@@ -207,14 +205,6 @@ data, [Streamlit](/usage/projects#streamlit) for building interactive apps,
 [Ray](/usage/projects#ray) for parallel training,
 [Weights & Biases](/usage/projects#wandb) for experiment tracking, and more!
 
-<!-- <Project id="some_example_project">
-
-The easiest way to get started with an end-to-end training process is to clone a
-[project](/usage/projects) template. Projects let you manage multi-step
-workflows, from data preprocessing to training and packaging your pipeline.
-
-</Project>-->
-
 <Infobox title="Details & Documentation" emoji="📖" list>
 
 - **Usage:** [spaCy projects](/usage/projects),
@@ -224,6 +214,15 @@ workflows, from data preprocessing to training and packaging your pipeline.
 
 </Infobox>
 
+<Project id="pipelines/tagger_parser_ud">
+
+The easiest way to get started is to clone a [project template](/usage/projects)
+and run it – for example, this end-to-end template that lets you train a
+**part-of-speech tagger** and **dependency parser** on a Universal Dependencies
+treebank.
+
+</Project>
+
 ### Parallel and distributed training with Ray {#features-parallel-training}
 
 > #### Example
@@ -875,7 +874,14 @@ values. You can then use the auto-generated `config.cfg` for training:
 + python -m spacy train ./config.cfg --output ./output
 ```
 
-<!-- TODO: project template -->
+<Project id="pipelines/tagger_parser_ud">
 
+The easiest way to get started is to clone a [project template](/usage/projects)
+and run it – for example, this end-to-end template that lets you train a
+**part-of-speech tagger** and **dependency parser** on a Universal Dependencies
+treebank.
+
+</Project>
+
 #### Training via the Python API {#migrating-training-python}
 
@@ -12,6 +12,7 @@
 "companyUrl": "https://explosion.ai",
 "repo": "explosion/spaCy",
 "modelsRepo": "explosion/spacy-models",
+"projectsRepo": "explosion/projects/tree/v3",
 "social": {
     "twitter": "spacy_io",
     "github": "explosion"
@@ -13,7 +13,7 @@ export default function Tag({ spaced = false, variant, tooltip, children }) {
     const isValid = isString(children) && !isNaN(children)
     const version = isValid ? Number(children).toFixed(1) : children
     const tooltipText = `This feature is new and was introduced in spaCy v${version}`
-    // TODO: we probably want to handle this more elegantly, but the idea is
+    // We probably want to handle this more elegantly, but the idea is
     // that we can hide tags referring to old versions
     const major = isString(version) ? Number(version.split('.')[0]) : version
     return major < MIN_VERSION ? null : (
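The version check in the hunk above can be read on its own. The following is a minimal sketch of that logic outside React, under two assumptions that are not visible in this diff: `isString` is taken to be a plain `typeof` check, and `MIN_VERSION` is set to a hypothetical value of `3`. The helper name `isHiddenTag` is made up for illustration.

```javascript
// Stand-in for the site's isString helper, which is imported outside this hunk (assumption).
const isString = value => typeof value === 'string'

// Hypothetical threshold for illustration; the real MIN_VERSION is defined outside this hunk.
const MIN_VERSION = 3

// Returns true when the component would render nothing for this version.
function isHiddenTag(children) {
    // Numeric strings such as "2" or "2.3" are normalized to one decimal place
    const isValid = isString(children) && !isNaN(children)
    const version = isValid ? Number(children).toFixed(1) : children
    // Only the major version is compared against the threshold
    const major = isString(version) ? Number(version.split('.')[0]) : version
    return major < MIN_VERSION
}

console.log(isHiddenTag('2')) // true
console.log(isHiddenTag('2.3')) // true
console.log(isHiddenTag('3')) // false
```

Non-numeric strings yield `NaN` for `major`, and `NaN < MIN_VERSION` is `false`, so such tags stay visible in this sketch.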
@@ -10,6 +10,7 @@ const htmlToReactParser = new HtmlToReactParser()
 const DEFAULT_BRANCH = 'develop'
 export const repo = siteMetadata.repo
 export const modelsRepo = siteMetadata.modelsRepo
+export const projectsRepo = siteMetadata.projectsRepo
 
 /**
  * This is used to provide selectors for headings so they can be crawled by
@@ -222,10 +222,11 @@ const Landing = ({ data }) => {
     <br />
     <br />
     <br />
-    {/** TODO: update with actual example */}
-    <Project id="some_example">
-        Lorem ipsum dolor sit amet, consectetur adipiscing elit. Phasellus interdum
-        sodales lectus.
+    <Project id="pipelines/tagger_parser_ud" title="Get started">
+        The easiest way to get started is to clone a project template and run it
+        – for example, this template for training a{' '}
+        <strong>part-of-speech tagger</strong> and{' '}
+        <strong>dependency parser</strong> on a Universal Dependencies treebank.
     </Project>
 </LandingCol>
 <LandingCol>
@@ -4,25 +4,29 @@ import CopyInput from '../components/copy'
 import Infobox from '../components/infobox'
 import Link from '../components/link'
 import { InlineCode } from '../components/code'
+import { projectsRepo } from '../components/util'
 
-// TODO: move to meta?
-const DEFAULT_REPO = 'https://github.com/explosion/projects/tree/v3'
 const COMMAND = 'python -m spacy project clone'
 
-export default function Project({ id, repo, children }) {
+export default function Project({
+    title = 'Get started with a project template',
+    id,
+    repo,
+    children,
+}) {
     const repoArg = repo ? ` --repo ${repo}` : ''
     const text = `${COMMAND} ${id}${repoArg}`
-    const url = `${repo || DEFAULT_REPO}/${id}`
-    const title = (
+    const url = `${repo || projectsRepo}/${id}`
+    const header = (
         <>
-            Get started with a project template:{' '}
+            {title}:{' '}
             <Link to={url}>
                 <InlineCode>{id}</InlineCode>
             </Link>
         </>
     )
     return (
-        <Infobox title={title} emoji="🪐">
+        <Infobox title={header} emoji="🪐">
             {children}
             <CopyInput text={text} prefix="$" />
         </Infobox>
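To make the effect of the last hunk concrete, here is a minimal sketch of the two strings the `Project` widget builds, pulled out of the component. The helper name `buildProjectStrings` is hypothetical, `projectsRepo` is passed in as an argument rather than imported, and the custom repo URL in the second call is a placeholder.

```javascript
const COMMAND = 'python -m spacy project clone'

// Hypothetical helper mirroring the string-building lines of the widget.
// In the diff, projectsRepo comes from the site metadata via the util module.
function buildProjectStrings({ id, repo, projectsRepo }) {
    // Only add --repo when a custom repo is given
    const repoArg = repo ? ` --repo ${repo}` : ''
    const text = `${COMMAND} ${id}${repoArg}`
    // Fall back to the site-wide projects repo for the link target
    const url = `${repo || projectsRepo}/${id}`
    return { text, url }
}

const projectsRepo = 'explosion/projects/tree/v3' // value added to the metadata in this commit

const byDefault = buildProjectStrings({ id: 'pipelines/tagger_parser_ud', projectsRepo })
console.log(byDefault.text) // python -m spacy project clone pipelines/tagger_parser_ud
console.log(byDefault.url) // explosion/projects/tree/v3/pipelines/tagger_parser_ud

const custom = buildProjectStrings({
    id: 'pipelines/tagger_parser_ud',
    repo: 'https://github.com/example/repo', // placeholder URL for illustration
    projectsRepo,
})
console.log(custom.text) // python -m spacy project clone pipelines/tagger_parser_ud --repo https://github.com/example/repo
```

With the metadata value from this commit, the fallback `url` carries no scheme or host, unlike the removed `DEFAULT_REPO` constant. Whether the `Link` component resolves such a path is not shown in this diff.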