Update docs [ci skip]

This commit is contained in:
Ines Montani 2020-09-30 15:16:00 +02:00
parent 23c63eefaf
commit 115481aca7
7 changed files with 307 additions and 139 deletions

View File

@ -32,14 +32,16 @@ streaming.
> gold_preproc = false
> max_length = 0
> limit = 0
> augmenter = null
> ```
| Name | Description |
| --------------- | -------------------------------------------------------------------------------------------------------------------------------------------------------- |
| `path` | The directory or filename to read from. Expects data in spaCy's binary [`.spacy` format](/api/data-formats#binary-training). ~~Path~~ |
|  `gold_preproc` | Whether to set up the Example object with gold-standard sentences and tokens for the predictions. See [`Corpus`](/api/corpus#init) for details. ~~bool~~ |
| `max_length` | Maximum document length. Longer documents will be split into sentences, if sentence boundaries are available. Defaults to `0` for no limit. ~~int~~ |
| `limit` | Limit corpus to a subset of examples, e.g. for debugging. Defaults to `0` for no limit. ~~int~~ |
| Name | Description |
| --------------- | ---------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------- |
| `path` | The directory or filename to read from. Expects data in spaCy's binary [`.spacy` format](/api/data-formats#binary-training). ~~Path~~ |
|  `gold_preproc` | Whether to set up the Example object with gold-standard sentences and tokens for the predictions. See [`Corpus`](/api/corpus#init) for details. ~~bool~~ |
| `max_length` | Maximum document length. Longer documents will be split into sentences, if sentence boundaries are available. Defaults to `0` for no limit. ~~int~~ |
| `limit` | Limit corpus to a subset of examples, e.g. for debugging. Defaults to `0` for no limit. ~~int~~ |
| `augmenter`     | Apply some simple data augmentation, where tokens are replaced with variations. This is especially useful for punctuation and case replacement, to help generalize beyond corpora that don't have smart quotes, or only have smart quotes, etc. Defaults to `None`. ~~Optional[Callable]~~ |
```python
%%GITHUB_SPACY/spacy/training/corpus.py
@ -74,7 +76,7 @@ train/test skew.
|  `gold_preproc` | Whether to set up the Example object with gold-standard sentences and tokens for the predictions. Defaults to `False`. ~~bool~~ |
| `max_length` | Maximum document length. Longer documents will be split into sentences, if sentence boundaries are available. Defaults to `0` for no limit. ~~int~~ |
| `limit` | Limit corpus to a subset of examples, e.g. for debugging. Defaults to `0` for no limit. ~~int~~ |
| `augmenter` | Optional data augmentation callback. ~~Callable[[Language, Example], Iterable[Example]]~~
| `augmenter` | Optional data augmentation callback. ~~Callable[[Language, Example], Iterable[Example]]~~ |
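As a rough sketch of what such a callback can look like, here's a hypothetical augmenter factory that lowercases a fraction of examples (loosely modeled on this kind of augmentation; the exact augmenter API may differ):

```python
### Hypothetical augmenter sketch
import random
from typing import Iterator
from spacy.language import Language
from spacy.training import Example

def create_lowercase_augmenter(level: float = 0.1):
    """Return an augmenter that lowercases a fraction of examples."""
    def augment(nlp: Language, example: Example) -> Iterator[Example]:
        if random.random() >= level:
            yield example  # keep most examples unchanged
        else:
            # Rebuild the example around a lowercased copy of the text
            example_dict = example.to_dict()
            doc = nlp.make_doc(example.text.lower())
            example_dict["token_annotation"]["ORTH"] = [t.lower_ for t in example.reference]
            yield example.from_dict(doc, example_dict)
    return augment
```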
## Corpus.\_\_call\_\_ {#call tag="method"}

View File

@ -191,16 +191,16 @@ browser. Will run a simple web server.
> displacy.serve([doc1, doc2], style="dep")
> ```
| Name | Description |
| --------- | ------------------------------------------------------------------------------------------------------------------------------------------------------------------ |
| `docs` | Document(s) or span(s) to visualize. ~~Union[Iterable[Union[Doc, Span]], Doc, Span]~~ |
| `style` | Visualization style, `"dep"` or `"ent"`. Defaults to `"dep"`. ~~str~~ |
| `page` | Render markup as full HTML page. Defaults to `True`. ~~bool~~ |
| `minify` | Minify HTML markup. Defaults to `False`. ~~bool~~ |
| `options` | [Visualizer-specific options](#displacy_options), e.g. colors. ~~Dict[str, Any]~~ |
| Name | Description |
| --------- | ----------------------------------------------------------------------------------------------------------------------------------------------------------------- |
| `docs` | Document(s) or span(s) to visualize. ~~Union[Iterable[Union[Doc, Span]], Doc, Span]~~ |
| `style` | Visualization style, `"dep"` or `"ent"`. Defaults to `"dep"`. ~~str~~ |
| `page` | Render markup as full HTML page. Defaults to `True`. ~~bool~~ |
| `minify` | Minify HTML markup. Defaults to `False`. ~~bool~~ |
| `options` | [Visualizer-specific options](#displacy_options), e.g. colors. ~~Dict[str, Any]~~ |
| `manual` | Don't parse `Doc` and instead expect a dict or list of dicts. [See here](/usage/visualizers#manual-usage) for formats and examples. Defaults to `False`. ~~bool~~ |
| `port` | Port to serve visualization. Defaults to `5000`. ~~int~~ |
| `host` | Host to serve visualization. Defaults to `"0.0.0.0"`. ~~str~~ |
| `port` | Port to serve visualization. Defaults to `5000`. ~~int~~ |
| `host` | Host to serve visualization. Defaults to `"0.0.0.0"`. ~~str~~ |
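For example, assuming a trained pipeline like `en_core_web_sm` is installed, the `options`, `port` and `host` arguments can be combined like this:

```python
import spacy
from spacy import displacy

nlp = spacy.load("en_core_web_sm")
doc = nlp("Google was founded in California in 1998.")
# Only show ORG entities, with a custom highlight color
options = {"ents": ["ORG"], "colors": {"ORG": "#aa9cfc"}}
displacy.serve(doc, style="ent", options=options, port=8080)
```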
### displacy.render {#displacy.render tag="method" new="2"}
@ -223,7 +223,7 @@ Render a dependency parse tree or named entity visualization.
| `page` | Render markup as full HTML page. Defaults to `True`. ~~bool~~ |
| `minify` | Minify HTML markup. Defaults to `False`. ~~bool~~ |
| `options` | [Visualizer-specific options](#displacy_options), e.g. colors. ~~Dict[str, Any]~~ |
| `manual` | Don't parse `Doc` and instead expect a dict or list of dicts. [See here](/usage/visualizers#manual-usage) for formats and examples. Defaults to `False`. ~~bool~~ |
| `manual` | Don't parse `Doc` and instead expect a dict or list of dicts. [See here](/usage/visualizers#manual-usage) for formats and examples. Defaults to `False`. ~~bool~~ |
| `jupyter` | Explicitly enable or disable "[Jupyter](http://jupyter.org/) mode" to return markup ready to be rendered in a notebook. Detected automatically if `None` (default). ~~Optional[bool]~~ |
| **RETURNS** | The rendered HTML markup. ~~str~~ |
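With `manual=True`, pre-extracted data can be passed instead of a `Doc`, for example:

```python
from spacy import displacy

ex = {
    "text": "But Google is starting from behind.",
    "ents": [{"start": 4, "end": 10, "label": "ORG"}],
    "title": None,
}
html = displacy.render(ex, style="ent", manual=True, jupyter=False)
```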
@ -244,7 +244,7 @@ If a setting is not present in the options, the default value will be used.
| Name | Description |
| ------------------------------------------ | -------------------------------------------------------------------------------------------------------------------------------------------- |
| `fine_grained` | Use fine-grained part-of-speech tags (`Token.tag_`) instead of coarse-grained tags (`Token.pos_`). Defaults to `False`. ~~bool~~ |
| `add_lemma` <Tag variant="new">2.2.4</Tag> | Print the lemmas in a separate row below the token texts. Defaults to `False`. ~~bool~~ |
| `add_lemma` <Tag variant="new">2.2.4</Tag> | Print the lemmas in a separate row below the token texts. Defaults to `False`. ~~bool~~ |
| `collapse_punct` | Attach punctuation to tokens. Can make the parse more readable, as it prevents long arcs to attach punctuation. Defaults to `True`. ~~bool~~ |
| `collapse_phrases` | Merge noun phrases into one token. Defaults to `False`. ~~bool~~ |
| `compact` | "Compact mode" with square arrows that takes up less space. Defaults to `False`. ~~bool~~ |
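For example, assuming a processed `doc`, a more compact parse with lemmas in a separate row could be rendered like this:

```python
options = {"compact": True, "add_lemma": True, "collapse_punct": False}
html = displacy.render(doc, style="dep", options=options)
```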
@ -498,12 +498,13 @@ the [`Corpus`](/api/corpus) class.
> limit = 0
> ```
| Name | Description |
| --------------- | -------------------------------------------------------------------------------------------------------------------------------------------------------- |
| `path` | The directory or filename to read from. Expects data in spaCy's binary [`.spacy` format](/api/data-formats#binary-training). ~~Union[str, Path]~~ |
|  `gold_preproc` | Whether to set up the Example object with gold-standard sentences and tokens for the predictions. See [`Corpus`](/api/corpus#init) for details. ~~bool~~ |
| `max_length` | Maximum document length. Longer documents will be split into sentences, if sentence boundaries are available. Defaults to `0` for no limit. ~~int~~ |
| `limit` | Limit corpus to a subset of examples, e.g. for debugging. Defaults to `0` for no limit. ~~int~~ |
| Name | Description |
| --------------- | ---------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------- |
| `path` | The directory or filename to read from. Expects data in spaCy's binary [`.spacy` format](/api/data-formats#binary-training). ~~Union[str, Path]~~ |
|  `gold_preproc` | Whether to set up the Example object with gold-standard sentences and tokens for the predictions. See [`Corpus`](/api/corpus#init) for details. ~~bool~~ |
| `max_length` | Maximum document length. Longer documents will be split into sentences, if sentence boundaries are available. Defaults to `0` for no limit. ~~int~~ |
| `limit` | Limit corpus to a subset of examples, e.g. for debugging. Defaults to `0` for no limit. ~~int~~ |
| `augmenter`     | Apply some simple data augmentation, where tokens are replaced with variations. This is especially useful for punctuation and case replacement, to help generalize beyond corpora that don't have smart quotes, or only have smart quotes, etc. Defaults to `None`. ~~Optional[Callable]~~ |
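As a sketch, an augmenter could be plugged into this block via a sub-section (the registered name `spacy.lower_case.v1` and its `level` argument are assumptions here):

```ini
[corpora.train]
@readers = "spacy.Corpus.v1"
path = ${paths.train}

[corpora.train.augmenter]
@augmenters = "spacy.lower_case.v1"
level = 0.1
```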
### JsonlReader {#jsonlreader}
@ -935,7 +936,7 @@ Compile a sequence of prefix rules into a regex object.
| Name | Description |
| ----------- | ------------------------------------------------------------------------------------------------------------------------------------------- |
| `entries` | The prefix rules, e.g. [`lang.punctuation.TOKENIZER_PREFIXES`](%%GITHUB_SPACY/spacy/lang/punctuation.py). ~~Iterable[Union[str, Pattern]]~~ |
| **RETURNS** | The regex object to be used for [`Tokenizer.prefix_search`](/api/tokenizer#attributes). ~~Pattern~~ |
| **RETURNS** | The regex object to be used for [`Tokenizer.prefix_search`](/api/tokenizer#attributes). ~~Pattern~~ |
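For example, to add a custom prefix on top of the language defaults and update the tokenizer (the `§` prefix is just an illustration):

```python
import spacy
from spacy.util import compile_prefix_regex

nlp = spacy.blank("en")
prefixes = list(nlp.Defaults.prefixes) + ["§"]
prefix_regex = compile_prefix_regex(prefixes)
nlp.tokenizer.prefix_search = prefix_regex.search
```

The same pattern applies to `util.compile_suffix_regex` and `util.compile_infix_regex` below.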
### util.compile_suffix_regex {#util.compile_suffix_regex tag="function"}
@ -952,7 +953,7 @@ Compile a sequence of suffix rules into a regex object.
| Name | Description |
| ----------- | ------------------------------------------------------------------------------------------------------------------------------------------- |
| `entries` | The suffix rules, e.g. [`lang.punctuation.TOKENIZER_SUFFIXES`](%%GITHUB_SPACY/spacy/lang/punctuation.py). ~~Iterable[Union[str, Pattern]]~~ |
| **RETURNS** | The regex object to be used for [`Tokenizer.suffix_search`](/api/tokenizer#attributes). ~~Pattern~~ |
| **RETURNS** | The regex object to be used for [`Tokenizer.suffix_search`](/api/tokenizer#attributes). ~~Pattern~~ |
### util.compile_infix_regex {#util.compile_infix_regex tag="function"}
@ -969,7 +970,7 @@ Compile a sequence of infix rules into a regex object.
| Name | Description |
| ----------- | ----------------------------------------------------------------------------------------------------------------------------------------- |
| `entries` | The infix rules, e.g. [`lang.punctuation.TOKENIZER_INFIXES`](%%GITHUB_SPACY/spacy/lang/punctuation.py). ~~Iterable[Union[str, Pattern]]~~ |
| **RETURNS** | The regex object to be used for [`Tokenizer.infix_finditer`](/api/tokenizer#attributes). ~~Pattern~~ |
| **RETURNS** | The regex object to be used for [`Tokenizer.infix_finditer`](/api/tokenizer#attributes). ~~Pattern~~ |
### util.minibatch {#util.minibatch tag="function" new="2"}

(Image file changed: diff suppressed; new image is 83 KiB.)

View File

@ -32,7 +32,7 @@ the [config](/usage/training#config):
```ini
[nlp]
pipeline = ["tagger", "parser", "ner"]
pipeline = ["tok2vec", "tagger", "parser", "ner"]
```
import Accordion from 'components/accordion.js'

View File

@ -167,8 +167,8 @@ the binary data:
```python
### spacy.load under the hood
lang = "en"
pipeline = ["tagger", "parser", "ner"]
data_path = "path/to/en_core_web_sm/en_core_web_sm-2.0.0"
pipeline = ["tok2vec", "tagger", "parser", "ner"]
data_path = "path/to/en_core_web_sm/en_core_web_sm-3.0.0"
cls = spacy.util.get_lang_class(lang) # 1. Get Language class, e.g. English
nlp = cls() # 2. Initialize it
@ -197,9 +197,9 @@ list of human-readable component names.
```python
print(nlp.pipeline)
# [('tagger', <spacy.pipeline.Tagger>), ('parser', <spacy.pipeline.DependencyParser>), ('ner', <spacy.pipeline.EntityRecognizer>)]
# [('tok2vec', <spacy.pipeline.Tok2Vec>), ('tagger', <spacy.pipeline.Tagger>), ('parser', <spacy.pipeline.DependencyParser>), ('ner', <spacy.pipeline.EntityRecognizer>)]
print(nlp.pipe_names)
# ['tagger', 'parser', 'ner']
# ['tok2vec', 'tagger', 'parser', 'ner']
```
### Built-in pipeline components {#built-in}
@ -1126,12 +1126,12 @@ For some use cases, it makes sense to also overwrite additional methods to
customize how the model is updated from examples, how it's initialized, how the
loss is calculated and to add evaluation scores to the training output.
| Name | Description |
| ------------------------------------ | ------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------ |
| [`update`](/api/pipe#update) | Learn from a batch of [`Example`](/api/example) objects containing the predictions and gold-standard annotations, and update the component's model. |
| [`initialize`](/api/pipe#initialize) | Initialize the model. Typically calls into [`Model.initialize`](https://thinc.ai/docs/api-model#initialize) and [`Pipe.create_optimizer`](/api/pipe#create_optimizer) if no optimizer is provided. |
| [`get_loss`](/api/pipe#get_loss) | Return a tuple of the loss and the gradient for a batch of [`Example`](/api/example) objects. |
| [`score`](/api/pipe#score)           | Score a batch of [`Example`](/api/example) objects and return a dictionary of scores. The [`@Language.factory`](/api/language#factory) decorator can define the `default_score_weights` of the component to decide which keys of the scores to display during training and how they count towards the final score. |
| Name | Description |
| ------------------------------------ | --------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------- |
| [`update`](/api/pipe#update) | Learn from a batch of [`Example`](/api/example) objects containing the predictions and gold-standard annotations, and update the component's model. |
| [`initialize`](/api/pipe#initialize) | Initialize the model. Typically calls into [`Model.initialize`](https://thinc.ai/docs/api-model#initialize) and can be passed custom arguments via the [`[initialize]`](/api/data-formats#config-initialize) config block that are only loaded during training or when you call [`nlp.initialize`](/api/language#initialize), not at runtime. |
| [`get_loss`](/api/pipe#get_loss) | Return a tuple of the loss and the gradient for a batch of [`Example`](/api/example) objects. |
| [`score`](/api/pipe#score)           | Score a batch of [`Example`](/api/example) objects and return a dictionary of scores. The [`@Language.factory`](/api/language#factory) decorator can define the `default_score_weights` of the component to decide which keys of the scores to display during training and how they count towards the final score. |
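As a rough, hypothetical sketch (not spaCy's actual implementation), overriding `get_loss` for a tagger-style component could look like this, assuming the component defines a `labels` property:

```python
### Hypothetical get_loss sketch
from thinc.api import SequenceCategoricalCrossentropy
from spacy.pipeline import Pipe

class MyTagger(Pipe):
    def get_loss(self, examples, scores):
        # Compare predicted tag scores against the gold-standard tags
        loss_func = SequenceCategoricalCrossentropy(names=self.labels, normalize=False)
        truths = [[token.tag_ for token in eg.reference] for eg in examples]
        d_scores, loss = loss_func(scores, truths)
        return float(loss), d_scores
```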
<Infobox title="Custom trainable components and models" emoji="📖">

View File

@ -6,8 +6,9 @@ menu:
- ['Introduction', 'basics']
- ['Quickstart', 'quickstart']
- ['Config System', 'config']
# - ['Data Utilities', 'data']
- ['Custom Training', 'config-custom']
- ['Custom Functions', 'custom-functions']
- ['Data Utilities', 'data']
- ['Parallel Training', 'parallel-training']
- ['Internal API', 'api']
---
@ -122,7 +123,7 @@ treebank.
</Project>
## Training config {#config}
## Training config system {#config}
Training config files include all **settings and hyperparameters** for training
your pipeline. Instead of providing lots of arguments on the command line, you
@ -177,6 +178,7 @@ sections of a config file are:
| `system` | Settings related to system and hardware. Re-used across the config as variables, e.g. `${system.seed}`, and can be [overwritten](#config-overrides) on the CLI. |
| `training` | Settings and controls for the training and evaluation process. |
| `pretraining` | Optional settings and controls for the [language model pretraining](/usage/embeddings-transformers#pretraining). |
| `initialize` | Data resources and arguments passed to components when [`nlp.initialize`](/api/language#initialize) is called before training (but not at runtime). |
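For example, the `[initialize]` block can reference data resources like pretrained vectors that are only loaded when [`nlp.initialize`](/api/language#initialize) is called (a sketch based on the default configs):

```ini
[initialize]
vectors = ${paths.vectors}
init_tok2vec = ${paths.init_tok2vec}
```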
<Infobox title="Config format and settings" emoji="📖">
@ -190,6 +192,20 @@ available for the different architectures are documented with the
</Infobox>
### Config lifecycle at runtime and training {#config-lifecycle}
A pipeline's `config.cfg` is considered the "single source of truth", both at
**training** and **runtime**. Under the hood,
[`Language.from_config`](/api/language#from_config) takes care of constructing
the `nlp` object using the settings defined in the config. An `nlp` object's
config is available as [`nlp.config`](/api/language#config) and it includes all
information about the pipeline, as well as the settings used to train and
initialize it.
![Illustration of pipeline lifecycle](../images/lifecycle.svg)
<!-- TODO: explain lifecycle and initialization -->
### Overwriting config settings on the command line {#config-overrides}
The config system means that you can define all settings **in one place** and in
@ -233,6 +249,61 @@ defined in the config file.
$ SPACY_CONFIG_OVERRIDES="--system.gpu_allocator pytorch --training.batch_size 128" ./your_script.sh
```
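The same overrides can also be passed to [`spacy train`](/api/cli#train) directly:

```cli
$ python -m spacy train config.cfg --system.gpu_allocator pytorch --training.batch_size 128
```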
### Using variable interpolation {#config-interpolation}
Another very useful feature of the config system is that it supports variable
interpolation for both **values and sections**. This means that you only need to
define a setting once and can reference it across your config using the
`${section.value}` syntax. In this example, the value of `seed` is reused within
the `[training]` block, and the whole block of `[training.optimizer]` is reused
in `[pretraining]` and will become `pretraining.optimizer`.
```ini
### config.cfg (excerpt) {highlight="5,18"}
[system]
seed = 0
[training]
seed = ${system.seed}
[training.optimizer]
@optimizers = "Adam.v1"
beta1 = 0.9
beta2 = 0.999
L2_is_weight_decay = true
L2 = 0.01
grad_clip = 1.0
use_averages = false
eps = 1e-8
[pretraining]
optimizer = ${training.optimizer}
```
You can also use variables inside strings. In that case, it works just like
f-strings in Python. If the value of a variable is not a string, it's converted
to a string.
```ini
[paths]
version = 5
root = "/Users/you/data"
train = "${paths.root}/train_${paths.version}.spacy"
# Result: /Users/you/data/train_5.spacy
```
<Infobox title="Tip: Override variables on the CLI" emoji="💡">
If you need to change certain values between training runs, you can define them
once, reference them as variables and then [override](#config-overrides) them on
the CLI. For example, `--paths.root /other/root` will change the value of `root`
in the block `[paths]` and the change will be reflected across all other values
that reference this variable.
</Infobox>
## Customizing the pipeline and training {#config-custom}
### Defining pipeline components {#config-components}
You typically train a [pipeline](/usage/processing-pipelines) of **one or more
@ -353,59 +424,6 @@ stop = 1000
compound = 1.001
```
### Using variable interpolation {#config-interpolation}
Another very useful feature of the config system is that it supports variable
interpolation for both **values and sections**. This means that you only need to
define a setting once and can reference it across your config using the
`${section.value}` syntax. In this example, the value of `seed` is reused within
the `[training]` block, and the whole block of `[training.optimizer]` is reused
in `[pretraining]` and will become `pretraining.optimizer`.
```ini
### config.cfg (excerpt) {highlight="5,18"}
[system]
seed = 0
[training]
seed = ${system.seed}
[training.optimizer]
@optimizers = "Adam.v1"
beta1 = 0.9
beta2 = 0.999
L2_is_weight_decay = true
L2 = 0.01
grad_clip = 1.0
use_averages = false
eps = 1e-8
[pretraining]
optimizer = ${training.optimizer}
```
You can also use variables inside strings. In that case, it works just like
f-strings in Python. If the value of a variable is not a string, it's converted
to a string.
```ini
[paths]
version = 5
root = "/Users/you/data"
train = "${paths.root}/train_${paths.version}.spacy"
# Result: /Users/you/data/train_5.spacy
```
<Infobox title="Tip: Override variables on the CLI" emoji="💡">
If you need to change certain values between training runs, you can define them
once, reference them as variables and then [override](#config-overrides) them on
the CLI. For example, `--paths.root /other/root` will change the value of `root`
in the block `[paths]` and the change will be reflected across all other values
that reference this variable.
</Infobox>
### Model architectures {#model-architectures}
> #### 💡 Model type annotations
@ -506,17 +524,7 @@ still look good.
</Accordion>
<!--
## Data Utilities {#data-utilities}
* spacy convert
* The [corpora] block
* Custom corpus class
* Minibatching
* Data augmentation
-->
## Custom Functions {#custom-functions}
## Custom functions {#custom-functions}
Registered functions in the training config files can refer to built-in
implementations, but you can also plug in fully **custom implementations**. All
@ -763,7 +771,96 @@ start = 2
factor = 1.005
```
#### Example: Custom data reading and batching {#custom-code-readers-batchers}
### Defining custom architectures {#custom-architectures}
Built-in pipeline components such as the tagger or named entity recognizer are
constructed with default neural network [models](/api/architectures). You can
change the model architecture entirely by implementing your own custom models
and providing those in the config when creating the pipeline component. See the
documentation on [layers and model architectures](/usage/layers-architectures)
for more details.
> ```ini
> ### config.cfg
> [components.tagger]
> factory = "tagger"
>
> [components.tagger.model]
> @architectures = "custom_neural_network.v1"
> output_width = 512
> ```
```python
### functions.py
from typing import List
from thinc.types import Floats2d
from thinc.api import Model
import spacy
from spacy.tokens import Doc

@spacy.registry.architectures("custom_neural_network.v1")
def MyModel(output_width: int) -> Model[List[Doc], List[Floats2d]]:
    # "create_model" is a placeholder for your own model-building code
    return create_model(output_width)
```
## Data utilities {#data}
spaCy includes various features and utilities to make it easy to train from your
own data. If you have training data in a standard format like `.conll` or
`.conllu`, the easiest way to convert it for use with spaCy is to run
[`spacy convert`](/api/cli#convert) and pass it a file and an output directory:
```cli
$ python -m spacy convert ./train.gold.conll ./corpus
```
<Infobox title="Tip: Manage multi-step workflows with projects" emoji="💡">
Training workflows often consist of multiple steps, from preprocessing the data
all the way to packaging and deploying the trained model.
[spaCy projects](/usage/projects) let you define all steps in one file, manage
data assets, track changes and share your end-to-end processes with your team.
</Infobox>
### Working with corpora {#data-corpora}
> #### Example
>
> ```ini
> [corpora]
>
> [corpora.train]
> @readers = "spacy.Corpus.v1"
> path = ${paths.train}
> gold_preproc = false
> max_length = 0
> limit = 0
> augmenter = null
>
> [training]
> train_corpus = "corpora.train"
> ```
The [`[corpora]`](/api/data-formats#config-corpora) block in your config lets
you define **data resources** to use for training, evaluation, pretraining or
any other custom workflows. `corpora.train` and `corpora.dev` are used by
convention in spaCy's default configs, but you can also define any other
custom blocks. Each section in the corpora config should resolve to a
[`Corpus`](/api/corpus), for example using spaCy's built-in
[corpus reader](/api/top-level#readers) that takes a path to a binary `.spacy`
file. The `train_corpus` and `dev_corpus` fields in the
[`[training]`](/api/data-formats#config-training) block specify where to find
the corpus in your config. This makes it easy to **swap out** different corpora
by only changing a single config setting.
Instead of making `[corpora]` a block with multiple subsections for each portion
of the data, you can also use a single function that returns a dictionary of
corpora, keyed by corpus name, e.g. `"train"` and `"dev"`. This can be
especially useful if you need to split a single file into corpora for training
and evaluation, without loading the same file twice.
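A minimal sketch of such a function (the registered name and paths are hypothetical; here it simply wraps two `Corpus` readers, but the same pattern works for splitting one file):

```python
### functions.py (hypothetical)
from typing import Callable, Dict, Iterable
import spacy
from spacy.language import Language
from spacy.training import Corpus, Example

@spacy.registry.readers("my_corpora.v1")
def create_corpora(train_path: str, dev_path: str) -> Dict[str, Callable[[Language], Iterable[Example]]]:
    # Return one corpus callable per name, resolved via e.g.
    # [training] train_corpus = "corpora.train"
    return {
        "train": Corpus(train_path),
        "dev": Corpus(dev_path),
    }
```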
### Custom data reading and batching {#custom-code-readers-batchers}
Some use cases require **streaming in data** or manipulating datasets on the
fly, rather than generating all data beforehand and storing it to file. Instead
@ -859,37 +956,11 @@ def filter_batch(size: int) -> Callable[[Iterable[Example]], Iterator[List[Examp
    return create_filtered_batches
```
### Defining custom architectures {#custom-architectures}
Built-in pipeline components such as the tagger or named entity recognizer are
constructed with default neural network [models](/api/architectures). You can
change the model architecture entirely by implementing your own custom models
and providing those in the config when creating the pipeline component. See the
documentation on [layers and model architectures](/usage/layers-architectures)
for more details.
> ```ini
> ### config.cfg
> [components.tagger]
> factory = "tagger"
>
> [components.tagger.model]
> @architectures = "custom_neural_network.v1"
> output_width = 512
> ```
```python
### functions.py
from typing import List
from thinc.types import Floats2d
from thinc.api import Model
import spacy
from spacy.tokens import Doc

@spacy.registry.architectures("custom_neural_network.v1")
def MyModel(output_width: int) -> Model[List[Doc], List[Floats2d]]:
    return create_model(output_width)
```
<!-- TODO:
* Custom corpus class
* Minibatching
* Data augmentation
-->
## Parallel & distributed training with Ray {#parallel-training}

View File

@ -123,13 +123,14 @@ training run, with no hidden defaults, making it easy to rerun your experiments
and track changes. You can use the
[quickstart widget](/usage/training#quickstart) or the `init config` command to
get started. Instead of providing lots of arguments on the command line, you
only need to pass your `config.cfg` file to `spacy train`.
only need to pass your `config.cfg` file to [`spacy train`](/api/cli#train).
Training config files include all **settings and hyperparameters** for training
your pipeline. Some settings can also be registered **functions** that you can
swap out and customize, making it easy to implement your own custom models and
architectures.
![Illustration of pipeline lifecycle](../images/lifecycle.svg)
<Infobox title="Details & Documentation" emoji="📖" list>
- **Usage:** [Training pipelines and models](/usage/training)
@ -723,7 +724,7 @@ nlp = spacy.blank("en")
Because pipeline components are now added using their string names, you won't
have to instantiate the [component classes](/api/#architecture-pipeline)
directly anynore. To configure the component, you can now use the `config`
directly anymore. To configure the component, you can now use the `config`
argument on [`nlp.add_pipe`](/api/language#add_pipe).
> #### config.cfg (excerpt)