diff --git a/website/README.md b/website/README.md
index a02d5a151..c1f6e5805 100644
--- a/website/README.md
+++ b/website/README.md
@@ -384,7 +384,7 @@ original file is shown at the top of the widget.
> ```
```python
-https://github.com/explosion/spaCy/tree/master/examples/pipeline/custom_component_countries_api.py
+https://github.com/explosion/spaCy/tree/master/spacy/language.py
```
### Infobox
diff --git a/website/docs/api/dependencyparser.md b/website/docs/api/dependencyparser.md
index 135caf0c2..52442fe49 100644
--- a/website/docs/api/dependencyparser.md
+++ b/website/docs/api/dependencyparser.md
@@ -1,23 +1,22 @@
---
title: DependencyParser
tag: class
-source: spacy/pipeline/pipes.pyx
+source: spacy/pipeline/dep_parser.pyx
---
This class is a subclass of `Pipe` and follows the same API. The pipeline
component is available in the [processing pipeline](/usage/processing-pipelines)
via the ID `"parser"`.
-## Default config {#config}
+## Implementation and defaults {#implementation}
-This is the default configuration used to initialize the model powering the
-pipeline component. See the [model architectures](/api/architectures)
-documentation for details on the architectures and their arguments and
-hyperparameters. To learn more about how to customize the config and train
-custom models, check out the [training config](/usage/training#config) docs.
+See the [model architectures](/api/architectures) documentation for details on
+the architectures and their arguments and hyperparameters. To learn more about
+how to customize the config and train custom models, check out the
+[training config](/usage/training#config) docs.
```python
-https://github.com/explosion/spaCy/blob/develop/spacy/pipeline/defaults/parser_defaults.cfg
+https://github.com/explosion/spaCy/blob/develop/spacy/pipeline/dep_parser.pyx
```
## DependencyParser.\_\_init\_\_ {#init tag="method"}
@@ -25,22 +24,17 @@ https://github.com/explosion/spaCy/blob/develop/spacy/pipeline/defaults/parser_d
> #### Example
>
> ```python
-> # Construction via create_pipe with default model
-> parser = nlp.create_pipe("parser")
+> # Construction via add_pipe with default model
+> parser = nlp.add_pipe("parser")
>
-> # Construction via create_pipe with custom model
+> # Construction via add_pipe with custom model
> config = {"model": {"@architectures": "my_parser"}}
-> parser = nlp.create_pipe("parser", config)
->
-> # Construction from class with custom model from file
-> from spacy.pipeline import DependencyParser
-> model = util.load_config("model.cfg", create_objects=True)["model"]
-> parser = DependencyParser(nlp.vocab, model)
+> parser = nlp.add_pipe("parser", config=config)
> ```
Create a new pipeline instance. In your application, you would normally use a
shortcut for this and instantiate the component using its string name and
-[`nlp.create_pipe`](/api/language#create_pipe).
+[`nlp.add_pipe`](/api/language#add_pipe).
| Name | Type | Description |
| ----------- | ------------------ | ------------------------------------------------------------------------------- |
diff --git a/website/docs/api/entitylinker.md b/website/docs/api/entitylinker.md
index b77fc059d..bbc6cc632 100644
--- a/website/docs/api/entitylinker.md
+++ b/website/docs/api/entitylinker.md
@@ -4,7 +4,7 @@ teaser:
Functionality to disambiguate a named entity in text to a unique knowledge
base identifier.
tag: class
-source: spacy/pipeline/pipes.pyx
+source: spacy/pipeline/entity_linker.py
new: 2.2
---
@@ -12,16 +12,15 @@ This class is a subclass of `Pipe` and follows the same API. The pipeline
component is available in the [processing pipeline](/usage/processing-pipelines)
via the ID `"entity_linker"`.
-## Default config {#config}
+## Implementation and defaults {#implementation}
-This is the default configuration used to initialize the model powering the
-pipeline component. See the [model architectures](/api/architectures)
-documentation for details on the architectures and their arguments and
-hyperparameters. To learn more about how to customize the config and train
-custom models, check out the [training config](/usage/training#config) docs.
+See the [model architectures](/api/architectures) documentation for details on
+the architectures and their arguments and hyperparameters. To learn more about
+how to customize the config and train custom models, check out the
+[training config](/usage/training#config) docs.
```python
-https://github.com/explosion/spaCy/blob/develop/spacy/pipeline/defaults/entity_linker_defaults.cfg
+https://github.com/explosion/spaCy/blob/develop/spacy/pipeline/entity_linker.py
```
## EntityLinker.\_\_init\_\_ {#init tag="method"}
@@ -29,22 +28,17 @@ https://github.com/explosion/spaCy/blob/develop/spacy/pipeline/defaults/entity_l
> #### Example
>
> ```python
-> # Construction via create_pipe with default model
-> entity_linker = nlp.create_pipe("entity_linker")
+> # Construction via add_pipe with default model
+> entity_linker = nlp.add_pipe("entity_linker")
>
-> # Construction via create_pipe with custom model
+> # Construction via add_pipe with custom model
> config = {"model": {"@architectures": "my_el"}}
-> entity_linker = nlp.create_pipe("entity_linker", config)
->
-> # Construction from class with custom model from file
-> from spacy.pipeline import EntityLinker
-> model = util.load_config("model.cfg", create_objects=True)["model"]
-> entity_linker = EntityLinker(nlp.vocab, model)
+> entity_linker = nlp.add_pipe("entity_linker", config=config)
> ```
Create a new pipeline instance. In your application, you would normally use a
shortcut for this and instantiate the component using its string name and
-[`nlp.create_pipe`](/api/language#create_pipe).
+[`nlp.add_pipe`](/api/language#add_pipe).
| Name | Type | Description |
| ------- | ------- | ------------------------------------------------------------------------------- |
@@ -185,9 +179,8 @@ method, a knowledge base should have been defined with
> #### Example
>
> ```python
-> entity_linker = EntityLinker(nlp.vocab)
+> entity_linker = nlp.add_pipe("entity_linker", last=True)
> entity_linker.set_kb(kb)
-> nlp.add_pipe(entity_linker, last=True)
> optimizer = entity_linker.begin_training(pipeline=nlp.pipeline)
> ```
diff --git a/website/docs/api/entityrecognizer.md b/website/docs/api/entityrecognizer.md
index 23cc71558..6b5b1f93c 100644
--- a/website/docs/api/entityrecognizer.md
+++ b/website/docs/api/entityrecognizer.md
@@ -1,23 +1,22 @@
---
title: EntityRecognizer
tag: class
-source: spacy/pipeline/pipes.pyx
+source: spacy/pipeline/ner.pyx
---
This class is a subclass of `Pipe` and follows the same API. The pipeline
component is available in the [processing pipeline](/usage/processing-pipelines)
via the ID `"ner"`.
-## Default config {#config}
+## Implementation and defaults {#implementation}
-This is the default configuration used to initialize the model powering the
-pipeline component. See the [model architectures](/api/architectures)
-documentation for details on the architectures and their arguments and
-hyperparameters. To learn more about how to customize the config and train
-custom models, check out the [training config](/usage/training#config) docs.
+See the [model architectures](/api/architectures) documentation for details on
+the architectures and their arguments and hyperparameters. To learn more about
+how to customize the config and train custom models, check out the
+[training config](/usage/training#config) docs.
```python
-https://github.com/explosion/spaCy/blob/develop/spacy/pipeline/defaults/ner_defaults.cfg
+https://github.com/explosion/spaCy/blob/develop/spacy/pipeline/ner.pyx
```
## EntityRecognizer.\_\_init\_\_ {#init tag="method"}
@@ -25,22 +24,17 @@ https://github.com/explosion/spaCy/blob/develop/spacy/pipeline/defaults/ner_defa
> #### Example
>
> ```python
-> # Construction via create_pipe
-> ner = nlp.create_pipe("ner")
+> # Construction via add_pipe with default model
+> ner = nlp.add_pipe("ner")
>
-> # Construction via create_pipe with custom model
+> # Construction via add_pipe with custom model
> config = {"model": {"@architectures": "my_ner"}}
-> parser = nlp.create_pipe("ner", config)
->
-> # Construction from class with custom model from file
-> from spacy.pipeline import EntityRecognizer
-> model = util.load_config("model.cfg", create_objects=True)["model"]
-> ner = EntityRecognizer(nlp.vocab, model)
+> parser = nlp.add_pipe("ner", config=config)
> ```
Create a new pipeline instance. In your application, you would normally use a
shortcut for this and instantiate the component using its string name and
-[`nlp.create_pipe`](/api/language#create_pipe).
+[`nlp.add_pipe`](/api/language#add_pipe).
| Name | Type | Description |
| ----------- | ------------------ | ------------------------------------------------------------------------------- |
diff --git a/website/docs/api/entityruler.md b/website/docs/api/entityruler.md
index 1279a3685..12a6ab748 100644
--- a/website/docs/api/entityruler.md
+++ b/website/docs/api/entityruler.md
@@ -8,10 +8,10 @@ new: 2.1
The EntityRuler lets you add spans to the [`Doc.ents`](/api/doc#ents) using
token-based rules or exact phrase matches. It can be combined with the
statistical [`EntityRecognizer`](/api/entityrecognizer) to boost accuracy, or
-used on its own to implement a purely rule-based entity recognition system.
-After initialization, the component is typically added to the processing
-pipeline using [`nlp.add_pipe`](/api/language#add_pipe). For usage examples, see
-the docs on
+used on its own to implement a purely rule-based entity recognition system. The
+pipeline component is available in the
+[processing pipeline](/usage/processing-pipelines) via the ID `"entity_ruler"`.
+For usage examples, see the docs on
[rule-based entity recognition](/usage/rule-based-matching#entityruler).
## EntityRuler.\_\_init\_\_ {#init tag="method"}
@@ -19,13 +19,13 @@ the docs on
Initialize the entity ruler. If patterns are supplied here, they need to be a
list of dictionaries with a `"label"` and `"pattern"` key. A pattern can either
be a token pattern (list) or a phrase pattern (string). For example:
-`{'label': 'ORG', 'pattern': 'Apple'}`.
+`{"label": "ORG", "pattern": "Apple"}`.
> #### Example
>
> ```python
-> # Construction via create_pipe
-> ruler = nlp.create_pipe("entity_ruler")
+> # Construction via add_pipe
+> ruler = nlp.add_pipe("entity_ruler")
>
> # Construction from class
> from spacy.pipeline import EntityRuler
@@ -90,9 +90,8 @@ is chosen.
> #### Example
>
> ```python
-> ruler = EntityRuler(nlp)
+> ruler = nlp.add_pipe("entity_ruler")
> ruler.add_patterns([{"label": "ORG", "pattern": "Apple"}])
-> nlp.add_pipe(ruler)
>
> doc = nlp("A text about Apple.")
> ents = [(ent.text, ent.label_) for ent in doc.ents]
diff --git a/website/docs/api/example.md b/website/docs/api/example.md
index 421828f95..0d06c79a1 100644
--- a/website/docs/api/example.md
+++ b/website/docs/api/example.md
@@ -223,7 +223,7 @@ in `example.predicted`.
> #### Example
>
> ```python
-> nlp.add_pipe(my_ner)
+> nlp.add_pipe("my_ner")
> doc = nlp("Mr and Mrs Smith flew to New York")
> tokens_ref = ["Mr and Mrs", "Smith", "flew", "to", "New York"]
> example = Example.from_dict(doc, {"words": tokens_ref})
diff --git a/website/docs/api/language.md b/website/docs/api/language.md
index 9ab25597d..c41b349e5 100644
--- a/website/docs/api/language.md
+++ b/website/docs/api/language.md
@@ -15,6 +15,88 @@ the tagger or parser that are called on a document in order. You can also add
your own processing pipeline components that take a `Doc` object, modify it and
return it.
+## Language.component {#component tag="classmethod" new="3"}
+
+Register a custom pipeline component under a given name. This allows
+initializing the component by name using
+[`Language.add_pipe`](/api/language#add_pipe) and referring to it in
+[config files](/usage/training#config). This classmethod and decorator is
+intended for **simple stateless functions** that take a `Doc` and return it. For
+more complex stateful components that allow settings and need access to the
+shared `nlp` object, use the [`Language.factory`](/api/language#factory)
+decorator. For more details and examples, see the
+[usage documentation](/usage/processing-pipelines#custom-components).
+
+> #### Example
+>
+> ```python
+> from spacy.language import Language
+>
+> # Usage as a decorator
+> @Language.component("my_component")
+> def my_component(doc):
+>     # Do something to the doc
+>     return doc
+>
+> # Usage as a function
+> Language.component("my_component2", func=my_component)
+> ```
+
+| Name | Type | Description |
+| -------------- | -------------------- | --------------------------------------------------------------------------------------------------------------------------------------------- |
+| `name` | str | The name of the component factory. |
+| _keyword-only_ | | |
+| `assigns` | `Iterable[str]` | `Doc` or `Token` attributes assigned by this component, e.g. `["token.ent_id"]`. Used for pipeline analysis. |
+| `requires` | `Iterable[str]` | `Doc` or `Token` attributes required by this component, e.g. `["token.ent_id"]`. Used for pipeline analysis. |
+| `retokenizes` | bool | Whether the component changes tokenization. Used for pipeline analysis. |
+| `func`        | `Optional[Callable]` | Optional function if not used as a decorator.                                                                                                   |
+
+## Language.factory {#factory tag="classmethod"}
+
+Register a custom pipeline component factory under a given name. This allows
+initializing the component by name using
+[`Language.add_pipe`](/api/language#add_pipe) and referring to it in
+[config files](/usage/training#config). The registered factory function needs to
+take at least two **named arguments** which spaCy fills in automatically: `nlp`
+for the current `nlp` object and `name` for the component instance name. This
+can be useful to distinguish multiple instances of the same component and allows
+trainable components to add custom losses using the component instance name. The
+`default_config` defines the default values of the remaining factory arguments.
+It's merged into the [`nlp.config`](/api/language#config). For more details and
+examples, see the
+[usage documentation](/usage/processing-pipelines#custom-components).
+
+> #### Example
+>
+> ```python
+> from spacy.language import Language
+>
+> # Usage as a decorator
+> @Language.factory(
+> "my_component",
+> default_config={"some_setting": True},
+> )
+> def create_my_component(nlp, name, some_setting):
+>     return MyComponent(some_setting)
+>
+> # Usage as function
+> Language.factory(
+> "my_component",
+> default_config={"some_setting": True},
+> func=create_my_component
+> )
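+>
+> # Usage sketch (assumes an existing nlp object): override the default config
+> nlp.add_pipe("my_component", config={"some_setting": False})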
+> ```
+
+| Name | Type | Description |
+| ---------------- | -------------------- | --------------------------------------------------------------------------------------------------------------------------------------------- |
+| `name` | str | The name of the component factory. |
+| _keyword-only_ | | |
+| `default_config` | `Dict[str, Any]`     | The default config, describing the default values of the factory arguments.                                                                    |
+| `assigns` | `Iterable[str]` | `Doc` or `Token` attributes assigned by this component, e.g. `["token.ent_id"]`. Used for pipeline analysis. |
+| `requires` | `Iterable[str]` | `Doc` or `Token` attributes required by this component, e.g. `["token.ent_id"]`. Used for pipeline analysis. |
+| `retokenizes` | bool | Whether the component changes tokenization. Used for pipeline analysis. |
+| `func`           | `Optional[Callable]` | Optional function if not used as a decorator.                                                                                                  |
+
## Language.\_\_init\_\_ {#init tag="method"}
Initialize a `Language` object.
@@ -30,12 +112,41 @@ Initialize a `Language` object.
> nlp = English()
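+>
+> # Keyword-only settings can be passed too, e.g. a larger max_length (sketch)
+> nlp = English(max_length=10 ** 7)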
> ```
-| Name | Type | Description |
-| ----------- | ---------- | ------------------------------------------------------------------------------------------ |
-| `vocab` | `Vocab` | A `Vocab` object. If `True`, a vocab is created via `Language.Defaults.create_vocab`. |
-| `make_doc` | callable | A function that takes text and returns a `Doc` object. Usually a `Tokenizer`. |
-| `meta` | dict | Custom meta data for the `Language` class. Is written to by models to add model meta data. |
-| **RETURNS** | `Language` | The newly constructed object. |
+| Name | Type | Description |
+| ------------------ | ----------- | ------------------------------------------------------------------------------------------ |
+| `vocab` | `Vocab` | A `Vocab` object. If `True`, a vocab is created using the default language data settings. |
+| _keyword-only_ | | |
+| `max_length` | int | Maximum number of characters allowed in a single text. Defaults to `10 ** 6`. |
+| `meta` | dict | Custom meta data for the `Language` class. Is written to by models to add model meta data. |
+| `create_tokenizer` | `Callable` | Optional function that receives the `nlp` object and returns a tokenizer. |
+| **RETURNS** | `Language` | The newly constructed object. |
+
+## Language.from_config {#from_config tag="classmethod"}
+
+Create a `Language` object from a loaded config. Will set up the tokenizer and
+language data, add pipeline components based on the pipeline and components
+define in the config and validate the results. If no config is provided, the
+default config of the given language is used. This is also how spaCy loads a
+model under the hood based on its [`config.cfg`](/api/data-formats#config).
+
+> #### Example
+>
+> ```python
+> from thinc.api import Config
+> from spacy.language import Language
+>
+> config = Config().from_disk("./config.cfg")
+> nlp = Language.from_config(config)
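+>
+> # Sketch: pipeline components can be disabled on load (names illustrative)
+> nlp = Language.from_config(config, disable=["tagger", "parser"])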
+> ```
+
+| Name | Type | Description |
+| -------------- | ---------------------------------------------------------------------- | --------------------------------------------------------------------------------------------------------------------------------------- |
+| `config` | `Dict[str, Any]` / [`Config`](https://thinc.ai/docs/api-config#config) | The loaded config. |
+| _keyword-only_ | | |
+| `disable` | `Iterable[str]` | List of pipeline component names to disable. |
+| `auto_fill` | bool | Whether to automatically fill in missing values in the config, based on defaults and function argument annotations. Defaults to `True`. |
+| `validate` | bool | Whether to validate the component config and arguments against the types expected by the factory. Defaults to `True`. |
+| **RETURNS** | `Language` | The initialized object. |
## Language.\_\_call\_\_ {#call tag="method"}
@@ -162,43 +273,99 @@ their original weights after the block.
Create a pipeline component from a factory.
+
+
+As of v3.0, the [`Language.add_pipe`](/api/language#add_pipe) method also takes
+the string name of the factory, creates the component, adds it to the pipeline
+and returns it. The `Language.create_pipe` method is now mostly used internally.
+To create a component and add it to the pipeline, you should always use
+`Language.add_pipe`.
+
+
+
> #### Example
>
> ```python
> parser = nlp.create_pipe("parser")
-> nlp.add_pipe(parser)
> ```
-| Name | Type | Description |
-| ----------- | -------- | ---------------------------------------------------------------------------------- |
-| `name` | str | Factory name to look up in [`Language.factories`](/api/language#class-attributes). |
-| `config` | dict | Configuration parameters to initialize component. |
-| **RETURNS** | callable | The pipeline component. |
+| Name | Type | Description |
+| ------------------------------------- | ---------------- | --------------------------------------------------------------------------------------------------------------------------------------------------------- |
+| `factory_name` | str | Name of the registered component factory. |
+| `name` | str | Optional unique name of pipeline component instance. If not set, the factory name is used. An error is raised if the name already exists in the pipeline. |
+| `config` 3 | `Dict[str, Any]` | Optional config parameters to use for this component. Will be merged with the `default_config` specified by the component factory. |
+| `validate` 3 | bool | Whether to validate the component config and arguments against the types expected by the factory. Defaults to `True`. |
+| **RETURNS** | callable | The pipeline component. |
## Language.add_pipe {#add_pipe tag="method" new="2"}
-Add a component to the processing pipeline. Valid components are callables that
-take a `Doc` object, modify it and return it. Only one of `before`, `after`,
-`first` or `last` can be set. Default behavior is `last=True`.
+Add a component to the processing pipeline. Expects a name that maps to a
+component factory registered using
+[`@Language.component`](/api/language#component) or
+[`@Language.factory`](/api/language#factory). Components should be callables
+that take a `Doc` object, modify it and return it. Only one of `before`,
+`after`, `first` or `last` can be set. Default behavior is `last=True`.
+
+
+
+As of v3.0, the [`Language.add_pipe`](/api/language#add_pipe) method doesn't
+take callables anymore and instead expects the name of a component factory
+registered using [`@Language.component`](/api/language#component) or
+[`@Language.factory`](/api/language#factory). It now takes care of creating the
+component, adding it to the pipeline and returning it.
+
+
> #### Example
>
> ```python
-> def component(doc):
+> @Language.component("component")
+> def component_func(doc):
>     # modify Doc and return it
>     return doc
>
-> nlp.add_pipe(component, before="ner")
-> nlp.add_pipe(component, name="custom_name", last=True)
+> nlp.add_pipe("component", before="ner")
+> component = nlp.add_pipe("component", name="custom_name", last=True)
> ```
-| Name | Type | Description |
-| ----------- | -------- | ------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------ |
-| `component` | callable | The pipeline component. |
-| `name` | str | Name of pipeline component. Overwrites existing `component.name` attribute if available. If no `name` is set and the component exposes no name attribute, `component.__name__` is used. An error is raised if the name already exists in the pipeline. |
-| `before` | str | Component name to insert component directly before. |
-| `after` | str | Component name to insert component directly after: |
-| `first` | bool | Insert component first / not first in the pipeline. |
-| `last` | bool | Insert component last / not last in the pipeline. |
+| Name | Type | Description |
+| -------------------------------------- | ---------------- | --------------------------------------------------------------------------------------------------------------------------------------------------------- |
+| `factory_name` | str | Name of the registered component factory. |
+| `name` | str | Optional unique name of pipeline component instance. If not set, the factory name is used. An error is raised if the name already exists in the pipeline. |
+| _keyword-only_ | | |
+| `before` | str / int | Component name or index to insert component directly before. |
+| `after`                                | str / int        | Component name or index to insert component directly after.                                                                                                |
+| `first` | bool | Insert component first / not first in the pipeline. |
+| `last` | bool | Insert component last / not last in the pipeline. |
+| `config` 3 | `Dict[str, Any]` | Optional config parameters to use for this component. Will be merged with the `default_config` specified by the component factory. |
+| `validate` 3 | bool | Whether to validate the component config and arguments against the types expected by the factory. Defaults to `True`. |
+| **RETURNS** 3 | callable | The pipeline component. |
+
+## Language.has_factory {#has_factory tag="classmethod" new="3"}
+
+Check whether a factory name is registered on the `Language` class or subclass.
+Will check for
+[language-specific factories](/usage/processing-pipelines#factories-language)
+registered on the subclass, as well as general-purpose factories registered on
+the `Language` base class, available to all subclasses.
+
+> #### Example
+>
+> ```python
+> from spacy.language import Language
+> from spacy.lang.en import English
+>
+> @English.component("component")
+> def component(doc):
+>     return doc
+>
+> assert English.has_factory("component")
+> assert not Language.has_factory("component")
+> ```
+
+| Name | Type | Description |
+| ----------- | ---- | ---------------------------------------------------------- |
+| `name` | str | Name of the pipeline factory to check. |
+| **RETURNS** | bool | Whether a factory of that name is registered on the class. |
## Language.has_pipe {#has_pipe tag="method" new="2"}
@@ -208,9 +375,13 @@ Check whether a component is present in the pipeline. Equivalent to
> #### Example
>
> ```python
-> nlp.add_pipe(lambda doc: doc, name="component")
-> assert "component" in nlp.pipe_names
-> assert nlp.has_pipe("component")
+> @Language.component("component")
+> def component(doc):
+>     return doc
+>
+> nlp.add_pipe("component", name="my_component")
+> assert "my_component" in nlp.pipe_names
+> assert nlp.has_pipe("my_component")
> ```
| Name | Type | Description |
@@ -324,6 +495,43 @@ As of spaCy v3.0, the `disable_pipes` method has been renamed to `select_pipes`:
-| `enable` | str / list | Names(s) of pipeline components that will not be disabled. |
+| `enable` | str / list | Name(s) of pipeline components that will not be disabled. |
| **RETURNS** | `DisabledPipes` | The disabled pipes that can be restored by calling the object's `.restore()` method. |
+## Language.meta {#meta tag="property"}
+
+Custom meta data for the Language class. If a model is loaded, contains meta
+data of the model. `Language.meta` is also what's serialized as `meta.json`
+when you save an `nlp` object to disk.
+
+> #### Example
+>
+> ```python
+> print(nlp.meta)
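+>
+> # The meta is a plain dict and includes the language code (illustrative)
+> assert nlp.meta["lang"] == nlp.lang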
+> ```
+
+| Name | Type | Description |
+| ----------- | ---- | -------------- |
+| **RETURNS** | dict | The meta data. |
+
+## Language.config {#config tag="property" new="3"}
+
+Export a trainable [`config.cfg`](/api/data-formats#config) for the current
+`nlp` object. Includes the current pipeline, all configs used to create the
+currently active pipeline components, as well as the default training config
+that can be used with [`spacy train`](/api/cli#train). `Language.config` returns
+a [Thinc `Config` object](https://thinc.ai/docs/api-config#config), which is a
+subclass of the built-in `dict`. It supports the additional methods `to_disk`
+(serialize the config to a file) and `to_str` (output the config as a string).
+
+> #### Example
+>
+> ```python
+> nlp.config.to_disk("./config.cfg")
+> print(nlp.config.to_str())
+> ```
+
+| Name | Type | Description |
+| ----------- | --------------------------------------------------- | ----------- |
+| **RETURNS** | [`Config`](https://thinc.ai/docs/api-config#config) | The config. |
+
## Language.to_disk {#to_disk tag="method" new="2"}
Save the current state to a directory. If a model is loaded, this will **include
@@ -405,23 +613,25 @@ available to the loaded object.
## Attributes {#attributes}
-| Name | Type | Description |
-| ------------------------------------------ | ----------- | ----------------------------------------------------------------------------------------------- |
-| `vocab` | `Vocab` | A container for the lexical types. |
-| `tokenizer` | `Tokenizer` | The tokenizer. |
-| `make_doc` | `callable` | Callable that takes a string and returns a `Doc`. |
-| `pipeline` | list | List of `(name, component)` tuples describing the current processing pipeline, in order. |
-| `pipe_names` 2 | list | List of pipeline component names, in order. |
-| `pipe_labels` 2.2 | dict | List of labels set by the pipeline components, if available, keyed by component name. |
-| `meta` | dict | Custom meta data for the Language class. If a model is loaded, contains meta data of the model. |
-| `path` 2 | `Path` | Path to the model data directory, if a model is loaded. Otherwise `None`. |
+| Name | Type | Description |
+| --------------------------------------------- | ---------------------- | ---------------------------------------------------------------------------------------- |
+| `vocab` | `Vocab` | A container for the lexical types. |
+| `tokenizer` | `Tokenizer` | The tokenizer. |
+| `make_doc` | `Callable` | Callable that takes a string and returns a `Doc`. |
+| `pipeline`                                    | `List[Tuple[str, Callable]]` | List of `(name, component)` tuples describing the current processing pipeline, in order.  |
+| `pipe_names` 2 | `List[str]` | List of pipeline component names, in order. |
+| `pipe_labels` 2.2 | `Dict[str, List[str]]` | List of labels set by the pipeline components, if available, keyed by component name. |
+| `pipe_factories` 2.2 | `Dict[str, str]` | Dictionary of pipeline component names, mapped to their factory names. |
+| `factory_names` 3 | `List[str]` | List of all available factory names. |
+| `path` 2 | `Path` | Path to the model data directory, if a model is loaded. Otherwise `None`. |
## Class attributes {#class-attributes}
-| Name | Type | Description |
-| ---------- | ----- | ----------------------------------------------------------------------------------------------- |
-| `Defaults` | class | Settings, data and factory methods for creating the `nlp` object and processing pipeline. |
-| `lang` | str | Two-letter language ID, i.e. [ISO code](https://en.wikipedia.org/wiki/List_of_ISO_639-1_codes). |
+| Name | Type | Description |
+| ---------------- | ----- | ------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------- |
+| `Defaults` | class | Settings, data and factory methods for creating the `nlp` object and processing pipeline. |
+| `lang` | str | Two-letter language ID, i.e. [ISO code](https://en.wikipedia.org/wiki/List_of_ISO_639-1_codes). |
+| `default_config` | dict | Base [config](/usage/training#config) to use for [Language.config](/api/language#config). Defaults to [`default_config.cfg`](https://github.com/explosion/spaCy/tree/develop/spacy/default_config.cfg). |
## Defaults {#defaults}
diff --git a/website/docs/api/morphologizer.md b/website/docs/api/morphologizer.md
index ab2b1df73..3585c76f9 100644
--- a/website/docs/api/morphologizer.md
+++ b/website/docs/api/morphologizer.md
@@ -10,20 +10,18 @@ coarse-grained POS tags following the Universal Dependencies
[UPOS](https://universaldependencies.org/u/pos/index.html) and
[FEATS](https://universaldependencies.org/format.html#morphological-annotation)
annotation guidelines. This class is a subclass of `Pipe` and follows the same
-API. The component is also available via the string name `"morphologizer"`.
-After initialization, it is typically added to the processing pipeline using
-[`nlp.add_pipe`](/api/language#add_pipe).
+API. The pipeline component is available in the
+[processing pipeline](/usage/processing-pipelines) via the ID `"morphologizer"`.
-## Default config {#config}
+## Implementation and defaults {#implementation}
-This is the default configuration used to initialize the model powering the
-pipeline component. See the [model architectures](/api/architectures)
-documentation for details on the architectures and their arguments and
-hyperparameters. To learn more about how to customize the config and train
-custom models, check out the [training config](/usage/training#config) docs.
+See the [model architectures](/api/architectures) documentation for details on
+the architectures and their arguments and hyperparameters. To learn more about
+how to customize the config and train custom models, check out the
+[training config](/usage/training#config) docs.
```python
-https://github.com/explosion/spaCy/blob/develop/spacy/pipeline/defaults/morphologizer_defaults.cfg
+https://github.com/explosion/spaCy/blob/develop/spacy/pipeline/morphologizer.pyx
```
## Morphologizer.\_\_init\_\_ {#init tag="method"}
@@ -33,24 +31,19 @@ Initialize the morphologizer.
> #### Example
>
> ```python
-> # Construction via create_pipe
-> morphologizer = nlp.create_pipe("morphologizer")
->
-> # Construction from class
-> from spacy.pipeline import Morphologizer
-> morphologizer = Morphologizer()
+> # Construction via add_pipe
+> morphologizer = nlp.add_pipe("morphologizer")
> ```
-
Create a new pipeline instance. In your application, you would normally use a
shortcut for this and instantiate the component using its string name and
-[`nlp.create_pipe`](/api/language#create_pipe).
+[`nlp.add_pipe`](/api/language#add_pipe).
-| Name | Type | Description |
-| ----------- | -------- | ------------------------------------------------------------------------------- |
-| `vocab` | `Vocab` | The shared vocabulary. |
-| `model` | `Model` | The [`Model`](https://thinc.ai/docs/api-model) powering the pipeline component. |
-| `**cfg` | - | Configuration parameters. |
+| Name | Type | Description |
+| ----------- | --------------- | ------------------------------------------------------------------------------- |
+| `vocab` | `Vocab` | The shared vocabulary. |
+| `model` | `Model` | The [`Model`](https://thinc.ai/docs/api-model) powering the pipeline component. |
+| `**cfg` | - | Configuration parameters. |
| **RETURNS** | `Morphologizer` | The newly constructed object. |
## Morphologizer.\_\_call\_\_ {#call tag="method"}
@@ -58,8 +51,8 @@ shortcut for this and instantiate the component using its string name and
Apply the pipe to one document. The document is modified in place, and returned.
This usually happens under the hood when the `nlp` object is called on a text
and all pipeline components are applied to the `Doc` in order. Both
-[`__call__`](/api/morphologizer#call) and [`pipe`](/api/morphologizer#pipe) delegate to the
-[`predict`](/api/morphologizer#predict) and
+[`__call__`](/api/morphologizer#call) and [`pipe`](/api/morphologizer#pipe)
+delegate to the [`predict`](/api/morphologizer#predict) and
[`set_annotations`](/api/morphologizer#set_annotations) methods.
> #### Example
@@ -81,7 +74,8 @@ and all pipeline components are applied to the `Doc` in order. Both
Apply the pipe to a stream of documents. This usually happens under the hood
when the `nlp` object is called on a text and all pipeline components are
applied to the `Doc` in order. Both [`__call__`](/api/morphologizer#call) and
-[`pipe`](/api/morphologizer#pipe) delegate to the [`predict`](/api/morphologizer#predict) and
+[`pipe`](/api/morphologizer#pipe) delegate to the
+[`predict`](/api/morphologizer#predict) and
[`set_annotations`](/api/morphologizer#set_annotations) methods.
> #### Example
@@ -126,9 +120,9 @@ Modify a batch of documents, using pre-computed scores.
> morphologizer.set_annotations([doc1, doc2], scores)
> ```
-| Name | Type | Description |
-| -------- | --------------- | ------------------------------------------------ |
-| `docs` | `Iterable[Doc]` | The documents to modify. |
+| Name | Type | Description |
+| -------- | --------------- | ------------------------------------------------------- |
+| `docs` | `Iterable[Doc]` | The documents to modify. |
| `scores` | - | The scores to set, produced by `Morphologizer.predict`. |
## Morphologizer.update {#update tag="method"}
@@ -145,15 +139,15 @@ pipe's model. Delegates to [`predict`](/api/morphologizer#predict) and
> losses = morphologizer.update(examples, sgd=optimizer)
> ```
-| Name | Type | Description |
-| ----------------- | ------------------- | ------------------------------------------------------------------------------------------------------------------------------------ |
-| `examples` | `Iterable[Example]` | A batch of [`Example`](/api/example) objects to learn from. |
-| _keyword-only_ | | |
-| `drop` | float | The dropout rate. |
+| Name | Type | Description |
+| ----------------- | ------------------- | ------------------------------------------------------------------------------------------------------------------------------------------- |
+| `examples` | `Iterable[Example]` | A batch of [`Example`](/api/example) objects to learn from. |
+| _keyword-only_ | | |
+| `drop` | float | The dropout rate. |
| `set_annotations` | bool | Whether or not to update the `Example` objects with the predictions, delegating to [`set_annotations`](/api/morphologizer#set_annotations). |
-| `sgd` | `Optimizer` | The [`Optimizer`](https://thinc.ai/docs/api-optimizers) object. |
-| `losses` | `Dict[str, float]` | Optional record of the loss during training. The value keyed by the model's name is updated. |
-| **RETURNS** | `Dict[str, float]` | The updated `losses` dictionary. |
+| `sgd` | `Optimizer` | The [`Optimizer`](https://thinc.ai/docs/api-optimizers) object. |
+| `losses` | `Dict[str, float]` | Optional record of the loss during training. The value keyed by the model's name is updated. |
+| **RETURNS** | `Dict[str, float]` | The updated `losses` dictionary. |
## Morphologizer.get_loss {#get_loss tag="method"}
@@ -187,12 +181,12 @@ Initialize the pipe for training, using data examples if available. Return an
> optimizer = morphologizer.begin_training(pipeline=nlp.pipeline)
> ```
-| Name | Type | Description |
-| -------------- | ----------------------- | ---------------------------------------------------------------------------------------------------------------------------------------------------------- |
-| `get_examples` | `Iterable[Example]` | Optional gold-standard annotations in the form of [`Example`](/api/example) objects. |
-| `pipeline` | `List[(str, callable)]` | Optional list of pipeline components that this component is part of. |
+| Name | Type | Description |
+| -------------- | ----------------------- | ----------------------------------------------------------------------------------------------------------------------------------------------------------------- |
+| `get_examples` | `Iterable[Example]` | Optional gold-standard annotations in the form of [`Example`](/api/example) objects. |
+| `pipeline`     | `List[Tuple[str, Callable]]` | Optional list of pipeline components that this component is part of.                                                                                         |
| `sgd` | `Optimizer` | An optional [`Optimizer`](https://thinc.ai/docs/api-optimizers) object. Will be created via [`create_optimizer`](/api/morphologizer#create_optimizer) if not set. |
-| **RETURNS** | `Optimizer` | An optimizer. |
+| **RETURNS** | `Optimizer` | An optimizer. |
## Morphologizer.create_optimizer {#create_optimizer tag="method"}
@@ -237,9 +231,9 @@ both `pos` and `morph`, the label should include the UPOS as the feature `POS`.
> morphologizer.add_label("Mood=Ind|POS=VERB|Tense=Past|VerbForm=Fin")
> ```
-| Name | Type | Description |
-| -------- | ---- | --------------------------------------------------------------- |
-| `label` | str | The label to add. |
+| Name | Type | Description |
+| ------- | ---- | ----------------- |
+| `label` | str | The label to add. |
## Morphologizer.to_disk {#to_disk tag="method"}
@@ -268,11 +262,11 @@ Load the pipe from disk. Modifies the object in place and returns it.
> morphologizer.from_disk("/path/to/morphologizer")
> ```
-| Name | Type | Description |
-| ----------- | ------------ | -------------------------------------------------------------------------- |
-| `path` | str / `Path` | A path to a directory. Paths may be either strings or `Path`-like objects. |
-| `exclude` | list | String names of [serialization fields](#serialization-fields) to exclude. |
-| **RETURNS** | `Morphologizer` | The modified `Morphologizer` object. |
+| Name | Type | Description |
+| ----------- | --------------- | -------------------------------------------------------------------------- |
+| `path` | str / `Path` | A path to a directory. Paths may be either strings or `Path`-like objects. |
+| `exclude` | list | String names of [serialization fields](#serialization-fields) to exclude. |
+| **RETURNS** | `Morphologizer` | The modified `Morphologizer` object. |
## Morphologizer.to_bytes {#to_bytes tag="method"}
@@ -288,7 +282,7 @@ Serialize the pipe to a bytestring.
| Name | Type | Description |
| ----------- | ----- | ------------------------------------------------------------------------- |
| `exclude` | list | String names of [serialization fields](#serialization-fields) to exclude. |
-| **RETURNS** | bytes | The serialized form of the `Morphologizer` object. |
+| **RETURNS** | bytes | The serialized form of the `Morphologizer` object. |
## Morphologizer.from_bytes {#from_bytes tag="method"}
@@ -302,16 +296,16 @@ Load the pipe from a bytestring. Modifies the object in place and returns it.
> morphologizer.from_bytes(morphologizer_bytes)
> ```
-| Name | Type | Description |
-| ------------ | -------- | ------------------------------------------------------------------------- |
-| `bytes_data` | bytes | The data to load from. |
-| `exclude` | list | String names of [serialization fields](#serialization-fields) to exclude. |
-| **RETURNS** | `Morphologizer` | The `Morphologizer` object. |
+| Name | Type | Description |
+| ------------ | --------------- | ------------------------------------------------------------------------- |
+| `bytes_data` | bytes | The data to load from. |
+| `exclude` | list | String names of [serialization fields](#serialization-fields) to exclude. |
+| **RETURNS** | `Morphologizer` | The `Morphologizer` object. |
## Morphologizer.labels {#labels tag="property"}
-The labels currently added to the component in Universal Dependencies [FEATS
-format](https://universaldependencies.org/format.html#morphological-annotation).
+The labels currently added to the component in Universal Dependencies
+[FEATS format](https://universaldependencies.org/format.html#morphological-annotation).
Note that even for a blank component, this will always include the internal
empty label `_`. If POS features are used, the labels will include the
coarse-grained POS as the feature `POS`.
@@ -339,8 +333,8 @@ serialization by passing in the string names via the `exclude` argument.
> data = morphologizer.to_disk("/path", exclude=["vocab"])
> ```
-| Name | Description |
-| --------- | ------------------------------------------------------------------------------------------ |
-| `vocab` | The shared [`Vocab`](/api/vocab). |
-| `cfg` | The config file. You usually don't want to exclude this. |
-| `model` | The binary model data. You usually don't want to exclude this. |
+| Name | Description |
+| ------- | -------------------------------------------------------------- |
+| `vocab` | The shared [`Vocab`](/api/vocab). |
+| `cfg` | The config file. You usually don't want to exclude this. |
+| `model` | The binary model data. You usually don't want to exclude this. |
diff --git a/website/docs/api/pipeline-functions.md b/website/docs/api/pipeline-functions.md
index fc417845c..5c2eb2b97 100644
--- a/website/docs/api/pipeline-functions.md
+++ b/website/docs/api/pipeline-functions.md
@@ -11,8 +11,7 @@ menu:
## merge_noun_chunks {#merge_noun_chunks tag="function"}
Merge noun chunks into a single token. Also available via the string name
-`"merge_noun_chunks"`. After initialization, the component is typically added to
-the processing pipeline using [`nlp.add_pipe`](/api/language#add_pipe).
+`"merge_noun_chunks"`.
> #### Example
>
@@ -20,9 +19,7 @@ the processing pipeline using [`nlp.add_pipe`](/api/language#add_pipe).
> texts = [t.text for t in nlp("I have a blue car")]
> assert texts == ["I", "have", "a", "blue", "car"]
>
-> merge_nps = nlp.create_pipe("merge_noun_chunks")
-> nlp.add_pipe(merge_nps)
->
+> nlp.add_pipe("merge_noun_chunks")
> texts = [t.text for t in nlp("I have a blue car")]
> assert texts == ["I", "have", "a blue car"]
> ```
@@ -44,8 +41,7 @@ all other components.
## merge_entities {#merge_entities tag="function"}
Merge named entities into a single token. Also available via the string name
-`"merge_entities"`. After initialization, the component is typically added to
-the processing pipeline using [`nlp.add_pipe`](/api/language#add_pipe).
+`"merge_entities"`.
> #### Example
>
@@ -53,8 +49,7 @@ the processing pipeline using [`nlp.add_pipe`](/api/language#add_pipe).
> texts = [t.text for t in nlp("I like David Bowie")]
> assert texts == ["I", "like", "David", "Bowie"]
>
-> merge_ents = nlp.create_pipe("merge_entities")
-> nlp.add_pipe(merge_ents)
+> nlp.add_pipe("merge_entities")
>
> texts = [t.text for t in nlp("I like David Bowie")]
> assert texts == ["I", "like", "David Bowie"]
@@ -76,12 +71,9 @@ components to the end of the pipeline and after all other components.
## merge_subtokens {#merge_subtokens tag="function" new="2.1"}
Merge subtokens into a single token. Also available via the string name
-`"merge_subtokens"`. After initialization, the component is typically added to
-the processing pipeline using [`nlp.add_pipe`](/api/language#add_pipe).
-
-As of v2.1, the parser is able to predict "subtokens" that should be merged into
-one single token later on. This is especially relevant for languages like
-Chinese, Japanese or Korean, where a "word" isn't defined as a
+`"merge_subtokens"`. As of v2.1, the parser is able to predict "subtokens" that
+should be merged into one single token later on. This is especially relevant for
+languages like Chinese, Japanese or Korean, where a "word" isn't defined as a
whitespace-delimited sequence of characters. Under the hood, this component uses
the [`Matcher`](/api/matcher) to find sequences of tokens with the dependency
label `"subtok"` and then merges them into a single token.
@@ -96,9 +88,7 @@ label `"subtok"` and then merges them into a single token.
> print([(token.text, token.dep_) for token in doc])
> # [('拜', 'subtok'), ('托', 'subtok')]
>
-> merge_subtok = nlp.create_pipe("merge_subtokens")
-> nlp.add_pipe(merge_subtok)
->
+> nlp.add_pipe("merge_subtokens")
> doc = nlp("拜托")
> print([token.text for token in doc])
> # ['拜托']
diff --git a/website/docs/api/sentencerecognizer.md b/website/docs/api/sentencerecognizer.md
index 458e42975..2cea97aaf 100644
--- a/website/docs/api/sentencerecognizer.md
+++ b/website/docs/api/sentencerecognizer.md
@@ -1,26 +1,24 @@
---
title: SentenceRecognizer
tag: class
-source: spacy/pipeline/pipes.pyx
+source: spacy/pipeline/senter.pyx
new: 3
---
A trainable pipeline component for sentence segmentation. For a simpler,
-ruse-based strategy, see the [`Sentencizer`](/api/sentencizer). This class is a
+rule-based strategy, see the [`Sentencizer`](/api/sentencizer). This class is a
subclass of `Pipe` and follows the same API. The component is also available via
-the string name `"senter"`. After initialization, it is typically added to the
-processing pipeline using [`nlp.add_pipe`](/api/language#add_pipe).
+the string name `"senter"`.
-## Default config {#config}
+## Implementation and defaults {#implementation}
-This is the default configuration used to initialize the model powering the
-pipeline component. See the [model architectures](/api/architectures)
-documentation for details on the architectures and their arguments and
-hyperparameters. To learn more about how to customize the config and train
-custom models, check out the [training config](/usage/training#config) docs.
+See the [model architectures](/api/architectures) documentation for details on
+the architectures and their arguments and hyperparameters. To learn more about
+how to customize the config and train custom models, check out the
+[training config](/usage/training#config) docs.
```python
-https://github.com/explosion/spaCy/blob/develop/spacy/pipeline/defaults/senter_defaults.cfg
+https://github.com/explosion/spaCy/blob/develop/spacy/pipeline/senter.pyx
```
## SentenceRecognizer.\_\_init\_\_ {#init tag="method"}
@@ -30,12 +28,8 @@ Initialize the sentence recognizer.
> #### Example
>
> ```python
-> # Construction via create_pipe
-> senter = nlp.create_pipe("senter")
->
-> # Construction from class
-> from spacy.pipeline import SentenceRecognizer
-> senter = SentenceRecognizer()
+> # Construction via add_pipe
+> senter = nlp.add_pipe("senter")
> ```
diff --git a/website/docs/api/sentencizer.md b/website/docs/api/sentencizer.md
index 9c6e2d58c..181babe06 100644
--- a/website/docs/api/sentencizer.md
+++ b/website/docs/api/sentencizer.md
@@ -9,8 +9,7 @@ that doesn't require the dependency parse. By default, sentence segmentation is
performed by the [`DependencyParser`](/api/dependencyparser), so the
`Sentencizer` lets you implement a simpler, rule-based strategy that doesn't
require a statistical model to be loaded. The component is also available via
-the string name `"sentencizer"`. After initialization, it is typically added to
-the processing pipeline using [`nlp.add_pipe`](/api/language#add_pipe).
+the string name `"sentencizer"`.
## Sentencizer.\_\_init\_\_ {#init tag="method"}
@@ -19,12 +18,8 @@ Initialize the sentencizer.
> #### Example
>
> ```python
-> # Construction via create_pipe
-> sentencizer = nlp.create_pipe("sentencizer")
->
-> # Construction from class
-> from spacy.pipeline import Sentencizer
-> sentencizer = Sentencizer()
+> # Construction via add_pipe
+> sentencizer = nlp.add_pipe("sentencizer")
> ```
| Name | Type | Description |
@@ -58,8 +53,7 @@ the component has been added to the pipeline using
> from spacy.lang.en import English
>
> nlp = English()
-> sentencizer = nlp.create_pipe("sentencizer")
-> nlp.add_pipe(sentencizer)
+> nlp.add_pipe("sentencizer")
> doc = nlp("This is a sentence. This is another sentence.")
> assert len(list(doc.sents)) == 2
> ```
diff --git a/website/docs/api/tagger.md b/website/docs/api/tagger.md
index 9ef0843cf..82658b8e0 100644
--- a/website/docs/api/tagger.md
+++ b/website/docs/api/tagger.md
@@ -1,7 +1,7 @@
---
title: Tagger
tag: class
-source: spacy/pipeline/pipes.pyx
+source: spacy/pipeline/tagger.pyx
---
This class is a subclass of `Pipe` and follows the same API. The pipeline
@@ -13,22 +13,17 @@ via the ID `"tagger"`.
> #### Example
>
> ```python
-> # Construction via create_pipe
-> tagger = nlp.create_pipe("tagger")
+> # Construction via add_pipe with default model
+> tagger = nlp.add_pipe("tagger")
>
-> # Construction via create_pipe with custom model
+> # Construction via add_pipe with custom model
> config = {"model": {"@architectures": "my_tagger"}}
-> parser = nlp.create_pipe("tagger", config)
->
-> # Construction from class with custom model from file
-> from spacy.pipeline import Tagger
-> model = util.load_config("model.cfg", create_objects=True)["model"]
-> tagger = Tagger(nlp.vocab, model)
+> parser = nlp.add_pipe("tagger", config)
> ```
Create a new pipeline instance. In your application, you would normally use a
shortcut for this and instantiate the component using its string name and
-[`nlp.create_pipe`](/api/language#create_pipe).
+[`nlp.add_pipe`](/api/language#add_pipe).
| Name | Type | Description |
| ----------- | -------- | ------------------------------------------------------------------------------- |
diff --git a/website/docs/api/textcategorizer.md b/website/docs/api/textcategorizer.md
index 431ee683b..52b7e399c 100644
--- a/website/docs/api/textcategorizer.md
+++ b/website/docs/api/textcategorizer.md
@@ -1,7 +1,7 @@
---
title: TextCategorizer
tag: class
-source: spacy/pipeline/pipes.pyx
+source: spacy/pipeline/textcat.py
new: 2
---
@@ -9,41 +9,33 @@ This class is a subclass of `Pipe` and follows the same API. The pipeline
component is available in the [processing pipeline](/usage/processing-pipelines)
via the ID `"textcat"`.
-## Default config {#config}
+## Implementation and defaults {#implementation}
-This is the default configuration used to initialize the model powering the
-pipeline component. See the [model architectures](/api/architectures)
-documentation for details on the architectures and their arguments and
-hyperparameters. To learn more about how to customize the config and train
-custom models, check out the [training config](/usage/training#config) docs.
+See the [model architectures](/api/architectures) documentation for details on
+the architectures and their arguments and hyperparameters. To learn more about
+how to customize the config and train custom models, check out the
+[training config](/usage/training#config) docs.
```python
-https://github.com/explosion/spaCy/blob/develop/spacy/pipeline/defaults/textcat_defaults.cfg
+https://github.com/explosion/spaCy/blob/develop/spacy/pipeline/textcat.py
```
-
-
## TextCategorizer.\_\_init\_\_ {#init tag="method"}
> #### Example
>
> ```python
-> # Construction via create_pipe
-> textcat = nlp.create_pipe("textcat")
+> # Construction via add_pipe with default model
+> textcat = nlp.add_pipe("textcat")
>
-> # Construction via create_pipe with custom model
+> # Construction via add_pipe with custom model
> config = {"model": {"@architectures": "my_textcat"}}
-> parser = nlp.create_pipe("textcat", config)
->
-> # Construction from class with custom model from file
-> from spacy.pipeline import TextCategorizer
-> model = util.load_config("model.cfg", create_objects=True)["model"]
-> textcat = TextCategorizer(nlp.vocab, model)
+> parser = nlp.add_pipe("textcat", config=config)
> ```
Create a new pipeline instance. In your application, you would normally use a
shortcut for this and instantiate the component using its string name and
-[`nlp.create_pipe`](/api/language#create_pipe).
+[`nlp.add_pipe`](/api/language#add_pipe).
| Name | Type | Description |
| ----------- | ----------------- | ------------------------------------------------------------------------------- |
diff --git a/website/docs/api/tok2vec.md b/website/docs/api/tok2vec.md
index 3667ed8ad..34e6e267a 100644
--- a/website/docs/api/tok2vec.md
+++ b/website/docs/api/tok2vec.md
@@ -4,16 +4,15 @@ source: spacy/pipeline/tok2vec.py
new: 3
---
-TODO: document
+
-## Default config {#config}
+## Implementation and defaults {#implementation}
-This is the default configuration used to initialize the model powering the
-pipeline component. See the [model architectures](/api/architectures)
-documentation for details on the architectures and their arguments and
-hyperparameters. To learn more about how to customize the config and train
-custom models, check out the [training config](/usage/training#config) docs.
+See the [model architectures](/api/architectures) documentation for details on
+the architectures and their arguments and hyperparameters. To learn more about
+how to customize the config and train custom models, check out the
+[training config](/usage/training#config) docs.
```python
-https://github.com/explosion/spaCy/blob/develop/spacy/pipeline/defaults/tok2vec_defaults.cfg
+https://github.com/explosion/spaCy/blob/develop/spacy/pipeline/tok2vec.py
```
diff --git a/website/docs/api/tokenizer.md b/website/docs/api/tokenizer.md
index 3281275e1..47e5aa9b3 100644
--- a/website/docs/api/tokenizer.md
+++ b/website/docs/api/tokenizer.md
@@ -31,7 +31,7 @@ the
> nlp = English()
> # Create a Tokenizer with the default settings for English
> # including punctuation rules and exceptions
-> tokenizer = nlp.Defaults.create_tokenizer(nlp)
+> tokenizer = nlp.tokenizer
> ```
| Name | Type | Description |
diff --git a/website/docs/api/top-level.md b/website/docs/api/top-level.md
index 8e9fff6aa..9c4380626 100644
--- a/website/docs/api/top-level.md
+++ b/website/docs/api/top-level.md
@@ -45,7 +45,8 @@ class, loads in the model data and returns it.
### Abstract example
cls = util.get_lang_class(lang) # get language for ID, e.g. 'en'
nlp = cls() # initialise the language
-for name in pipeline: component = nlp.create_pipe(name) # create each pipeline component nlp.add_pipe(component) # add component to pipeline
+for name in pipeline:
+    nlp.add_pipe(name)  # add component to pipeline
nlp.from_disk(model_data_path) # load in model data
```
@@ -479,7 +480,6 @@ you can use the [`set_lang_class`](/api/top-level#util.set_lang_class) helper.
> for lang_id in ["en", "de"]:
> lang_class = util.get_lang_class(lang_id)
> lang = lang_class()
-> tokenizer = lang.Defaults.create_tokenizer()
> ```
| Name | Type | Description |
diff --git a/website/docs/images/pipeline.svg b/website/docs/images/pipeline.svg
index 022219c5f..9ece70e6f 100644
--- a/website/docs/images/pipeline.svg
+++ b/website/docs/images/pipeline.svg
@@ -1,30 +1,33 @@
-