Mirror of https://github.com/explosion/spaCy.git, synced 2025-02-03 21:24:11 +03:00
Update docs [ci skip]
parent b2302c0a1c
commit 012b3a7096

@@ -895,8 +895,6 @@ what you need. By default, spaCy's
 can provide any other repo (public or private) that you have access to using the
 `--repo` option.

-<!-- TODO: update example once we've decided on repo structure -->
-
 ```cli
 $ python -m spacy project clone [name] [dest] [--repo] [--branch] [--sparse]
 ```
@@ -904,7 +902,7 @@ $ python -m spacy project clone [name] [dest] [--repo] [--branch] [--sparse]
 > #### Example
 >
 > ```cli
-> $ python -m spacy project clone some_example
+> $ python -m spacy project clone pipelines/ner_wikiner
 > ```
 >
 > Clone from custom repo:

@@ -289,8 +289,7 @@ of objects by referring to creation functions, including functions you register
 yourself. For details on how to get started with training your own model, check
 out the [training quickstart](/usage/training#quickstart).

-<!-- TODO:
-<Project id="en_core_trf_lg">
+<!-- TODO: <Project id="en_core_trf_lg">

 The easiest way to get started is to clone a transformers-based project
 template. Swap in your data, edit the settings and hyperparameters and train,
@@ -623,7 +622,7 @@ that are familiar from the training block: the `[pretraining.batcher]`,
 `[pretraining.optimizer]` and `[pretraining.corpus]` all work the same way and
 expect the same types of objects, although for pretraining your corpus does not
 need to have any annotations, so you will often use a different reader, such as
-the [`JsonlReader`](/api/toplevel#jsonlreader).
+the [`JsonlReader`](/api/top-level#jsonlreader).

 > #### Raw text format
 >

@@ -45,7 +45,7 @@ spaCy v3.0 introduces transformer-based pipelines that bring spaCy's accuracy
 right up to **current state-of-the-art**. You can also use a CPU-optimized
 pipeline, which is less accurate but much cheaper to run.

-<!-- TODO: -->
+<!-- TODO: update benchmarks and intro -->

 > #### Evaluation details
 >
@@ -68,6 +68,6 @@ our project template.

 </Project>

-<!-- ## Citing spaCy {#citation}
+<!-- TODO: ## Citing spaCy {#citation}

 <!-- TODO: update -->
 -->

@@ -356,11 +356,11 @@ that training configs are complete and experiments fully reproducible.

 </Infobox>

-Note that when using a PyTorch or Tensorflow model, it is recommended to set the GPU
-memory allocator accordingly. When `gpu_allocator` is set to "pytorch" or
-"tensorflow" in the training config, cupy will allocate memory via those respective libraries,
-preventing OOM errors when there's available memory sitting in the other
-library's pool.
+Note that when using a PyTorch or Tensorflow model, it is recommended to set the
+GPU memory allocator accordingly. When `gpu_allocator` is set to "pytorch" or
+"tensorflow" in the training config, cupy will allocate memory via those
+respective libraries, preventing OOM errors when there's available memory
+sitting in the other library's pool.

 ```ini
 ### config.cfg (excerpt)
@@ -489,7 +489,7 @@ with Model.define_operators({">>": chain}):
 <Infobox title="This section is still under construction" emoji="🚧" variant="warning">
 </Infobox>

-<!-- TODO:
+<!-- TODO: write trainable component section
 - Interaction with `predict`, `get_loss` and `set_annotations`
 - Initialization life-cycle with `begin_training`, correlation with add_label
 Example: relation extraction component (implemented as project template)

@@ -381,8 +381,6 @@ and loading pipeline packages, the underlying functionality is entirely based on
 native Python packaging. This allows your application to handle a spaCy pipeline
 like any other package dependency.

-<!-- TODO: reference relevant spaCy project -->
-
 ### Downloading and requiring package dependencies {#models-download}

 spaCy's built-in [`download`](/api/cli#download) command is mostly intended as a

@@ -29,15 +29,13 @@ and share your results with your team. spaCy projects can be used via the new

 ![Illustration of project workflow and commands](../images/projects.svg)

-<!-- TODO:
-<Project id="some_example_project">
+<Project id="pipelines/tagger_parser_ud">

-Lorem ipsum dolor sit amet, consectetur adipiscing elit. Phasellus interdum
-sodales lectus, ut sodales orci ullamcorper id. Sed condimentum neque ut erat
-mattis pretium.
+The easiest way to get started is to clone a project template and run it – for
+example, this end-to-end template that lets you train a **part-of-speech
+tagger** and **dependency parser** on a Universal Dependencies treebank.

 </Project>
--->

 spaCy projects make it easy to integrate with many other **awesome tools** in
 the data science and machine learning ecosystem to track and manage your data
@@ -65,10 +63,8 @@ project template and copies the files to a local directory. You can then run the
 project, e.g. to train a pipeline and edit the commands and scripts to build
 fully custom workflows.

-<!-- TODO: update with real example project -->
-
 ```cli
-python -m spacy project clone some_example_project
+python -m spacy project clone pipelines/tagger_parser_ud
 ```

 By default, the project will be cloned into the current working directory. You
@@ -216,10 +212,8 @@ format, train a pipeline, evaluate it and export metrics, package it and spin up
 a quick web demo. It looks pretty similar to a config file used to define CI
 pipelines.

-<!-- TODO: update with better (final) example -->
-
 ```yaml
-https://github.com/explosion/projects/tree/v3/tutorials/ner_fashion_brands/project.yml
+https://github.com/explosion/projects/tree/v3/pipelines/tagger_parser_ud/project.yml
 ```

 | Section | Description |

@@ -574,7 +574,7 @@ The directory will be created if it doesn't exist, and the whole pipeline data,
 meta and configuration will be written out. To make the pipeline more convenient
 to deploy, we recommend wrapping it as a [Python package](/api/cli#package).

-<Accordion title="What’s the difference between the config.cfg and meta.json?" spaced id="models-meta-vs-config">
+<Accordion title="What’s the difference between the config.cfg and meta.json?" spaced id="models-meta-vs-config" spaced>

 When you save a pipeline in spaCy v3.0+, two files will be exported: a
 [`config.cfg`](/api/data-formats#config) based on
@@ -596,6 +596,15 @@ based on [`nlp.meta`](/api/language#meta).

 </Accordion>

+<Project id="pipelines/tagger_parser_ud">
+
+The easiest way to get started with an end-to-end workflow is to clone a
+[project template](/usage/projects) and run it – for example, this template that
+lets you train a **part-of-speech tagger** and **dependency parser** on a
+Universal Dependencies treebank and generates an installable Python package.
+
+</Project>
+
 ### Generating a pipeline package {#models-generating}

 <Infobox title="Important note" variant="warning">
@@ -699,5 +708,3 @@ class and call [`from_disk`](/api/language#from_disk) instead.
 ```python
 nlp = spacy.blank("en").from_disk("/path/to/data")
 ```
-
-<!-- TODO: point to spaCy projects? -->

@@ -92,7 +92,7 @@ spaCy's binary `.spacy` format. You can either include the data paths in the
 $ python -m spacy train config.cfg --output ./output --paths.train ./train.spacy --paths.dev ./dev.spacy
 ```

-<Accordion title="How are the config recommendations generated?" id="quickstart-source">
+<Accordion title="How are the config recommendations generated?" id="quickstart-source" spaced>

 The recommended config settings generated by the quickstart widget and the
 [`init config`](/api/cli#init-config) command are based on some general **best
@@ -112,6 +112,15 @@ as we run more experiments.

 </Accordion>

+<Project id="pipelines/tagger_parser_ud">
+
+The easiest way to get started is to clone a [project template](/usage/projects)
+and run it – for example, this end-to-end template that lets you train a
+**part-of-speech tagger** and **dependency parser** on a Universal Dependencies
+treebank.
+
+</Project>
+
 ## Training config {#config}

 Training config files include all **settings and hyperparameters** for training

@@ -176,18 +176,16 @@ freely combine implementations from different frameworks into a single model.

 ### Manage end-to-end workflows with projects {#features-projects}

-<!-- TODO: update example -->
-
 > #### Example
 >
 > ```cli
 > # Clone a project template
-> $ python -m spacy project clone example
-> $ cd example
+> $ python -m spacy project clone pipelines/tagger_parser_ud
+> $ cd tagger_parser_ud
 > # Download data assets
 > $ python -m spacy project assets
 > # Run a workflow
-> $ python -m spacy project run train
+> $ python -m spacy project run all
 > ```

 spaCy projects let you manage and share **end-to-end spaCy workflows** for
@@ -207,14 +205,6 @@ data, [Streamlit](/usage/projects#streamlit) for building interactive apps,
 [Ray](/usage/projects#ray) for parallel training,
 [Weights & Biases](/usage/projects#wandb) for experiment tracking, and more!

-<!-- <Project id="some_example_project">
-
-The easiest way to get started with an end-to-end training process is to clone a
-[project](/usage/projects) template. Projects let you manage multi-step
-workflows, from data preprocessing to training and packaging your pipeline.
-
-</Project>-->
-
 <Infobox title="Details & Documentation" emoji="📖" list>

 - **Usage:** [spaCy projects](/usage/projects),
@@ -224,6 +214,15 @@ workflows, from data preprocessing to training and packaging your pipeline.

 </Infobox>

+<Project id="pipelines/tagger_parser_ud">
+
+The easiest way to get started is to clone a [project template](/usage/projects)
+and run it – for example, this end-to-end template that lets you train a
+**part-of-speech tagger** and **dependency parser** on a Universal Dependencies
+treebank.
+
+</Project>
+
 ### Parallel and distributed training with Ray {#features-parallel-training}

 > #### Example
@@ -875,7 +874,14 @@ values. You can then use the auto-generated `config.cfg` for training:
 + python -m spacy train ./config.cfg --output ./output
 ```

-<!-- TODO: project template -->
+<Project id="pipelines/tagger_parser_ud">
+
+The easiest way to get started is to clone a [project template](/usage/projects)
+and run it – for example, this end-to-end template that lets you train a
+**part-of-speech tagger** and **dependency parser** on a Universal Dependencies
+treebank.
+
+</Project>

 #### Training via the Python API {#migrating-training-python}


@@ -12,6 +12,7 @@
     "companyUrl": "https://explosion.ai",
     "repo": "explosion/spaCy",
     "modelsRepo": "explosion/spacy-models",
+    "projectsRepo": "explosion/projects/tree/v3",
     "social": {
         "twitter": "spacy_io",
         "github": "explosion"

@@ -13,7 +13,7 @@ export default function Tag({ spaced = false, variant, tooltip, children }) {
     const isValid = isString(children) && !isNaN(children)
     const version = isValid ? Number(children).toFixed(1) : children
     const tooltipText = `This feature is new and was introduced in spaCy v${version}`
-    // TODO: we probably want to handle this more elegantly, but the idea is
+    // We probably want to handle this more elegantly, but the idea is
     // that we can hide tags referring to old versions
     const major = isString(version) ? Number(version.split('.')[0]) : version
     return major < MIN_VERSION ? null : (
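
Review note: the version handling in this hunk is easier to follow outside the component. The following is a minimal sketch, not the site's code: `resolveTagVersion` and the local `isString` helper are hypothetical names, and `MIN_VERSION = 3` is an assumed value, since the real constant is not visible in the diff.

```javascript
// Assumed threshold for illustration only; the real MIN_VERSION is not shown in the diff.
const MIN_VERSION = 3

// Hypothetical stand-in for the isString helper the component imports.
const isString = value => typeof value === 'string'

// Mirrors the hunk's logic: normalize the version label, then decide whether
// a tag that refers to an older major version should be hidden.
function resolveTagVersion(children) {
    // Numeric strings such as '3' are normalized to one decimal place ('3.0')
    const isValid = isString(children) && !isNaN(children)
    const version = isValid ? Number(children).toFixed(1) : children
    // For strings, the major version is whatever precedes the first dot
    const major = isString(version) ? Number(version.split('.')[0]) : version
    return { version, hidden: major < MIN_VERSION }
}

console.log(resolveTagVersion('2'))
console.log(resolveTagVersion('3.1'))
```

Under the assumed threshold, a label like `'2.3.1'` is not a valid number, so it is kept as-is, but its leading `2` still marks the tag as hidden.
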

@@ -10,6 +10,7 @@ const htmlToReactParser = new HtmlToReactParser()
 const DEFAULT_BRANCH = 'develop'
 export const repo = siteMetadata.repo
 export const modelsRepo = siteMetadata.modelsRepo
+export const projectsRepo = siteMetadata.projectsRepo

 /**
  * This is used to provide selectors for headings so they can be crawled by

@@ -222,10 +222,11 @@ const Landing = ({ data }) => {
     <br />
     <br />
     <br />
-    {/** TODO: update with actual example */}
-    <Project id="some_example">
-        Lorem ipsum dolor sit amet, consectetur adipiscing elit. Phasellus interdum
-        sodales lectus.
+    <Project id="pipelines/tagger_parser_ud" title="Get started">
+        The easiest way to get started is to clone a project template and run it
+        – for example, this template for training a{' '}
+        <strong>part-of-speech tagger</strong> and{' '}
+        <strong>dependency parser</strong> on a Universal Dependencies treebank.
     </Project>
 </LandingCol>
 <LandingCol>

@@ -4,25 +4,29 @@ import CopyInput from '../components/copy'
 import Infobox from '../components/infobox'
 import Link from '../components/link'
 import { InlineCode } from '../components/code'
+import { projectsRepo } from '../components/util'

-// TODO: move to meta?
-const DEFAULT_REPO = 'https://github.com/explosion/projects/tree/v3'
 const COMMAND = 'python -m spacy project clone'

-export default function Project({ id, repo, children }) {
+export default function Project({
+    title = 'Get started with a project template',
+    id,
+    repo,
+    children,
+}) {
     const repoArg = repo ? ` --repo ${repo}` : ''
     const text = `${COMMAND} ${id}${repoArg}`
-    const url = `${repo || DEFAULT_REPO}/${id}`
-    const title = (
+    const url = `${repo || projectsRepo}/${id}`
+    const header = (
         <>
-            Get started with a project template:{' '}
+            {title}:{' '}
             <Link to={url}>
                 <InlineCode>{id}</InlineCode>
             </Link>
         </>
     )
     return (
-        <Infobox title={title} emoji="🪐">
+        <Infobox title={header} emoji="🪐">
             {children}
             <CopyInput text={text} prefix="$" />
         </Infobox>
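
Review note: the `Project` component derives two strings from its props, the clone command shown in the copy box and the link target. The sketch below isolates that logic as a plain function. `buildProjectInfo` and its `baseUrl` parameter are hypothetical names introduced for illustration. The base is passed in explicitly because the diff replaces the hard-coded `DEFAULT_REPO` URL with `projectsRepo`, whose metadata value in this commit (`explosion/projects/tree/v3`) has no scheme or host, and how the site resolves it is not shown here.

```javascript
const COMMAND = 'python -m spacy project clone'

// Hypothetical helper: same string building as the component, without JSX.
// baseUrl is an assumption standing in for whatever the site uses as the
// default projects location.
function buildProjectInfo(id, repo, baseUrl) {
    // Only add the --repo flag when a custom repo is given
    const repoArg = repo ? ` --repo ${repo}` : ''
    return {
        text: `${COMMAND} ${id}${repoArg}`,
        url: `${repo || baseUrl}/${id}`,
    }
}

const info = buildProjectInfo(
    'pipelines/tagger_parser_ud',
    undefined,
    'https://github.com/explosion/projects/tree/v3'
)
console.log(info.text)
console.log(info.url)
```

Passing a custom `repo` changes both outputs: the command gains a `--repo` argument and the link points at that repo instead of the default base.
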