spaCy/website/docs/api/language.md

---
title: Language
teaser: A text-processing pipeline
tag: class
source: spacy/language.py
---

Usually you'll load this once per process as `nlp` and pass the instance around
your application. The `Language` class is created when you call
[`spacy.load`](/api/top-level#spacy.load) and contains the shared vocabulary and
[language data](/usage/linguistic-features#language-data), optional binary
weights, e.g. provided by a [trained pipeline](/models), and the
[processing pipeline](/usage/processing-pipelines) containing components like
the tagger or parser that are called on a document in order. You can also add
your own processing pipeline components that take a `Doc` object, modify it and
return it.

## Language.\_\_init\_\_ {#init tag="method"}

Initialize a `Language` object. Note that the `meta` is only used for meta
information in [`Language.meta`](/api/language#meta) and not to configure the
`nlp` object or to override the config. To initialize from a config, use
[`Language.from_config`](/api/language#from_config) instead.

> #### Example
>
> ```python
> # Construction from subclass
> from spacy.lang.en import English
> nlp = English()
>
> # Construction from scratch
> from spacy.vocab import Vocab
> from spacy.language import Language
> nlp = Language(Vocab())
> ```

| Name               | Description                                                                                                              |
| ------------------ | ------------------------------------------------------------------------------------------------------------------------ |
| `vocab`            | A `Vocab` object. If `True`, a vocab is created using the default language data settings. ~~Vocab~~                      |
| _keyword-only_     |                                                                                                                          |
| `max_length`       | Maximum number of characters allowed in a single text. Defaults to `10 ** 6`. ~~int~~                                    |
| `meta`             | [Meta data](/api/data-formats#meta) overrides. ~~Dict[str, Any]~~                                                        |
| `create_tokenizer` | Optional function that receives the `nlp` object and returns a tokenizer. ~~Callable[[Language], Callable[[str], Doc]]~~ |
| `batch_size`       | Default batch size for [`pipe`](#pipe) and [`evaluate`](#evaluate). Defaults to `1000`. ~~int~~                          |

## Language.from_config {#from_config tag="classmethod" new="3"}

Create a `Language` object from a loaded config. Will set up the tokenizer and
language data, add pipeline components based on the pipeline and add pipeline
components based on the definitions specified in the config. If no config is
provided, the default config of the given language is used. This is also how
spaCy loads a model under the hood based on its
[`config.cfg`](/api/data-formats#config).

> #### Example
>
> ```python
> from thinc.api import Config
> from spacy.language import Language
>
> config = Config().from_disk("./config.cfg")
> nlp = Language.from_config(config)
> ```

| Name           | Description                                                                                                                                                                                                                                      |
| -------------- | ------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------ |
| `config`       | The loaded config. ~~Union[Dict[str, Any], Config]~~                                                                                                                                                                                             |
| _keyword-only_ |                                                                                                                                                                                                                                                  |
| `vocab`        | A `Vocab` object. If `True`, a vocab is created using the default language data settings. ~~Vocab~~                                                                                                                                              |
| `disable`      | Names of pipeline components to [disable](/usage/processing-pipelines#disabling). Disabled pipes will be loaded but they won't be run unless you explicitly enable them by calling [`nlp.enable_pipe`](/api/language#enable_pipe). ~~List[str]~~ |
| `exclude`      | Names of pipeline components to [exclude](/usage/processing-pipelines#disabling). Excluded components won't be loaded. ~~List[str]~~                                                                                                             |
| `meta`         | [Meta data](/api/data-formats#meta) overrides. ~~Dict[str, Any]~~                                                                                                                                                                                |
| `auto_fill`    | Whether to automatically fill in missing values in the config, based on defaults and function argument annotations. Defaults to `True`. ~~bool~~                                                                                                 |
| `validate`     | Whether to validate the component config and arguments against the types expected by the factory. Defaults to `True`. ~~bool~~                                                                                                                   |
| **RETURNS**    | The initialized object. ~~Language~~                                                                                                                                                                                                             |

## Language.component {#component tag="classmethod" new="3"}

Register a custom pipeline component under a given name. This allows
initializing the component by name using
[`Language.add_pipe`](/api/language#add_pipe) and referring to it in
[config files](/usage/training#config). This classmethod and decorator is
intended for **simple stateless functions** that take a `Doc` and return it. For
more complex stateful components that allow settings and need access to the
shared `nlp` object, use the [`Language.factory`](/api/language#factory)
decorator. For more details and examples, see the
[usage documentation](/usage/processing-pipelines#custom-components).

> #### Example
>
> ```python
> from spacy.language import Language
>
> # Usage as a decorator
> @Language.component("my_component")
> def my_component(doc):
>    # Do something to the doc
>    return doc
>
> # Usage as a function
> Language.component("my_component2", func=my_component)
> ```

| Name           | Description                                                                                                                                                        |
| -------------- | ------------------------------------------------------------------------------------------------------------------------------------------------------------------ |
| `name`         | The name of the component factory. ~~str~~                                                                                                                         |
| _keyword-only_ |                                                                                                                                                                    |
| `assigns`      | `Doc` or `Token` attributes assigned by this component, e.g. `["token.ent_id"]`. Used for [pipe analysis](/usage/processing-pipelines#analysis). ~~Iterable[str]~~ |
| `requires`     | `Doc` or `Token` attributes required by this component, e.g. `["token.ent_id"]`. Used for [pipe analysis](/usage/processing-pipelines#analysis). ~~Iterable[str]~~ |
| `retokenizes`  | Whether the component changes tokenization. Used for [pipe analysis](/usage/processing-pipelines#analysis). ~~bool~~                                               |
| `func`         | Optional function if not used as a decorator. ~~Optional[Callable[[Doc], Doc]]~~                                                                                   |

## Language.factory {#factory tag="classmethod"}

Register a custom pipeline component factory under a given name. This allows
initializing the component by name using
[`Language.add_pipe`](/api/language#add_pipe) and referring to it in
[config files](/usage/training#config). The registered factory function needs to
take at least two **named arguments** which spaCy fills in automatically: `nlp`
for the current `nlp` object and `name` for the component instance name. This
can be useful to distinguish multiple instances of the same component and allows
trainable components to add custom losses using the component instance name. The
`default_config` defines the default values of the remaining factory arguments.
It's merged into the [`nlp.config`](/api/language#config). For more details and
examples, see the
[usage documentation](/usage/processing-pipelines#custom-components).

> #### Example
>
> ```python
> from spacy.language import Language
>
> # Usage as a decorator
> @Language.factory(
>    "my_component",
>    default_config={"some_setting": True},
> )
> def create_my_component(nlp, name, some_setting):
>      return MyComponent(some_setting)
>
> # Usage as function
> Language.factory(
>     "my_component",
>     default_config={"some_setting": True},
>     func=create_my_component
> )
> ```

| Name                    | Description                                                                                                                                                                                                                                                                                                                        |
| ----------------------- | ---------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------- |
| `name`                  | The name of the component factory. ~~str~~                                                                                                                                                                                                                                                                                         |
| _keyword-only_          |                                                                                                                                                                                                                                                                                                                                    |
| `default_config`        | The default config, describing the default values of the factory arguments. ~~Dict[str, Any]~~                                                                                                                                                                                                                                     |
| `assigns`               | `Doc` or `Token` attributes assigned by this component, e.g. `["token.ent_id"]`. Used for [pipe analysis](/usage/processing-pipelines#analysis). ~~Iterable[str]~~                                                                                                                                                                 |
| `requires`              | `Doc` or `Token` attributes required by this component, e.g. `["token.ent_id"]`. Used for [pipe analysis](/usage/processing-pipelines#analysis). ~~Iterable[str]~~                                                                                                                                                                 |
| `retokenizes`           | Whether the component changes tokenization. Used for [pipe analysis](/usage/processing-pipelines#analysis). ~~bool~~                                                                                                                                                                                                               |
| `default_score_weights` | The scores to report during training, and their default weight towards the final score used to select the best model. Weights should sum to `1.0` per component and will be combined and normalized for the whole pipeline. If a weight is set to `None`, the score will not be logged or weighted. ~~Dict[str, Optional[float]]~~ |
| `func`                  | Optional function if not used as a decorator. ~~Optional[Callable[[...], Callable[[Doc], Doc]]]~~                                                                                                                                                                                                                                  |

## Language.\_\_call\_\_ {#call tag="method"}

Apply the pipeline to some text. The text can span multiple sentences, and can
contain arbitrary whitespace. Alignment into the original string is preserved.

> #### Example
>
> ```python
> doc = nlp("An example sentence. Another sentence.")
> assert (doc[0].text, doc[0].head.tag_) == ("An", "NN")
> ```

| Name            | Description                                                                                                                                    |
| --------------- | ---------------------------------------------------------------------------------------------------------------------------------------------- |
| `text`          | The text to be processed. ~~str~~                                                                                                              |
| _keyword-only_  |                                                                                                                                                |
| `disable`       | Names of pipeline components to [disable](/usage/processing-pipelines#disabling). ~~List[str]~~                                                |
| `component_cfg` | Optional dictionary of keyword arguments for components, keyed by component names. Defaults to `None`. ~~Optional[Dict[str, Dict[str, Any]]]~~ |
| **RETURNS**     | A container for accessing the annotations. ~~Doc~~                                                                                             |

## Language.pipe {#pipe tag="method"}

Process texts as a stream, and yield `Doc` objects in order. This is usually
more efficient than processing texts one-by-one.

> #### Example
>
> ```python
> texts = ["One document.", "...", "Lots of documents"]
> for doc in nlp.pipe(texts, batch_size=50):
>     assert doc.has_annotation("DEP")
> ```

| Name                                       | Description                                                                                                                                                         |
| ------------------------------------------ | ------------------------------------------------------------------------------------------------------------------------------------------------------------------- |
| `texts`                                    | A sequence of strings. ~~Iterable[str]~~                                                                                                                            |
| _keyword-only_                             |                                                                                                                                                                     |
| `as_tuples`                                | If set to `True`, inputs should be a sequence of `(text, context)` tuples. Output will then be a sequence of `(doc, context)` tuples. Defaults to `False`. ~~bool~~ |
| `batch_size`                               | The number of texts to buffer. ~~Optional[int]~~                                                                                                                    |
| `disable`                                  | Names of pipeline components to [disable](/usage/processing-pipelines#disabling). ~~List[str]~~                                                                     |
| `cleanup`                                  | If `True`, unneeded strings are freed to control memory use. Experimental. ~~bool~~                                                                                 |
| `component_cfg`                            | Optional dictionary of keyword arguments for components, keyed by component names. Defaults to `None`. ~~Optional[Dict[str, Dict[str, Any]]]~~                      |
| `n_process` <Tag variant="new">2.2.2</Tag> | Number of processors to use. Defaults to `1`. ~~int~~                                                                                                               |
| **YIELDS**                                 | Documents in the order of the original text. ~~Doc~~                                                                                                                |

## Language.initialize {#initialize tag="method" new="3"}

Initialize the pipeline for training and return an
[`Optimizer`](https://thinc.ai/docs/api-optimizers). Under the hood, it uses the
settings defined in the [`[initialize]`](/api/data-formats#config-initialize)
config block to set up the vocabulary, load in vectors and tok2vec weights and
pass optional arguments to the `initialize` methods implemented by pipeline
components or the tokenizer. This method is typically called automatically when
you run [`spacy train`](/api/cli#train). See the usage guide on the
[config lifecycle](/usage/training#config-lifecycle) and
[initialization](/usage/training#initialization) for details.

`get_examples` should be a function that returns an iterable of
[`Example`](/api/example) objects. The data examples can either be the full
training data or a representative sample. They are used to **initialize the
models** of trainable pipeline components and are passed each component's
[`initialize`](/api/pipe#initialize) method, if available. Initialization
includes validating the network,
[inferring missing shapes](/usage/layers-architectures#thinc-shape-inference)
and setting up the label scheme based on the data.

If no `get_examples` function is provided when calling `nlp.initialize`, the
pipeline components will be initialized with generic data. In this case, it is
crucial that the output dimension of each component has already been defined
either in the [config](/usage/training#config), or by calling
[`pipe.add_label`](/api/pipe#add_label) for each possible output label (e.g. for
the tagger or textcat).

<Infobox variant="warning" title="Changed in v3.0" id="begin_training">

This method was previously called `begin_training`. It now also takes a
**function** that is called with no arguments and returns a sequence of
[`Example`](/api/example) objects instead of tuples of `Doc` and `GoldParse`
objects.

</Infobox>

> #### Example
>
> ```python
> get_examples = lambda: examples
> optimizer = nlp.initialize(get_examples)
> ```

| Name           | Description                                                                                                                                              |
| -------------- | -------------------------------------------------------------------------------------------------------------------------------------------------------- |
| `get_examples` | Optional function that returns gold-standard annotations in the form of [`Example`](/api/example) objects. ~~Optional[Callable[[], Iterable[Example]]]~~ |
| _keyword-only_ |                                                                                                                                                          |
| `sgd`          | An optimizer. Will be created via [`create_optimizer`](#create_optimizer) if not set. ~~Optional[Optimizer]~~                                            |
| **RETURNS**    | The optimizer. ~~Optimizer~~                                                                                                                             |

## Language.resume_training {#resume_training tag="method,experimental" new="3"}

Continue training a trained pipeline. Create and return an optimizer, and
initialize "rehearsal" for any pipeline component that has a `rehearse` method.
Rehearsal is used to prevent models from "forgetting" their initialized
"knowledge". To perform rehearsal, collect samples of text you want the models
to retain performance on, and call [`nlp.rehearse`](/api/language#rehearse) with
a batch of [Example](/api/example) objects.

> #### Example
>
> ```python
> optimizer = nlp.resume_training()
> nlp.rehearse(examples, sgd=optimizer)
> ```

| Name           | Description                                                                                                   |
| -------------- | ------------------------------------------------------------------------------------------------------------- |
| _keyword-only_ |                                                                                                               |
| `sgd`          | An optimizer. Will be created via [`create_optimizer`](#create_optimizer) if not set. ~~Optional[Optimizer]~~ |
| **RETURNS**    | The optimizer. ~~Optimizer~~                                                                                  |

## Language.update {#update tag="method"}

Update the models in the pipeline.

<Infobox variant="warning" title="Changed in v3.0">

The `Language.update` method now takes a batch of [`Example`](/api/example)
objects instead of the raw texts and annotations or `Doc` and `GoldParse`
objects. An [`Example`](/api/example) streamlines how data is passed around. It
stores two `Doc` objects: one for holding the gold-standard reference data, and
one for holding the predictions of the pipeline.

For most use cases, you shouldn't have to write your own training scripts
anymore. Instead, you can use [`spacy train`](/api/cli#train) with a config file
and custom registered functions if needed. See the
[training documentation](/usage/training) for details.

</Infobox>

> #### Example
>
> ```python
> for raw_text, entity_offsets in train_data:
>     doc = nlp.make_doc(raw_text)
>     example = Example.from_dict(doc, {"entities": entity_offsets})
>     nlp.update([example], sgd=optimizer)
> ```

| Name            | Description                                                                                                                                    |
| --------------- | ---------------------------------------------------------------------------------------------------------------------------------------------- |
| `examples`      | A batch of [`Example`](/api/example) objects to learn from. ~~Iterable[Example]~~                                                              |
| _keyword-only_  |                                                                                                                                                |
| `drop`          | The dropout rate. ~~float~~                                                                                                                    |
| `sgd`           | An optimizer. Will be created via [`create_optimizer`](#create_optimizer) if not set. ~~Optional[Optimizer]~~                                  |
| `losses`        | Dictionary to update with the loss, keyed by pipeline component. ~~Optional[Dict[str, float]]~~                                                |
| `component_cfg` | Optional dictionary of keyword arguments for components, keyed by component names. Defaults to `None`. ~~Optional[Dict[str, Dict[str, Any]]]~~ |
| **RETURNS**     | The updated `losses` dictionary. ~~Dict[str, float]~~                                                                                          |

## Language.rehearse {#rehearse tag="method,experimental" new="3"}

Perform a "rehearsal" update from a batch of data. Rehearsal updates teach the
current model to make predictions similar to an initial model, to try to address
the "catastrophic forgetting" problem. This feature is experimental.

> #### Example
>
> ```python
> optimizer = nlp.resume_training()
> losses = nlp.rehearse(examples, sgd=optimizer)
> ```

| Name           | Description                                                                                                   |
| -------------- | ------------------------------------------------------------------------------------------------------------- |
| `examples`     | A batch of [`Example`](/api/example) objects to learn from. ~~Iterable[Example]~~                             |
| _keyword-only_ |                                                                                                               |
| `drop`         | The dropout rate. ~~float~~                                                                                   |
| `sgd`          | An optimizer. Will be created via [`create_optimizer`](#create_optimizer) if not set. ~~Optional[Optimizer]~~ |
| `losses`       | Dictionary to update with the loss, keyed by pipeline component. ~~Optional[Dict[str, float]]~~               |
| **RETURNS**    | The updated `losses` dictionary. ~~Dict[str, float]~~                                                         |

## Language.evaluate {#evaluate tag="method"}

Evaluate a pipeline's components.

<Infobox variant="warning" title="Changed in v3.0">

The `Language.update` method now takes a batch of [`Example`](/api/example)
objects instead of tuples of `Doc` and `GoldParse` objects.

</Infobox>

> #### Example
>
> ```python
> scores = nlp.evaluate(examples)
> print(scores)
> ```

| Name            | Description                                                                                                                                    |
| --------------- | ---------------------------------------------------------------------------------------------------------------------------------------------- |
| `examples`      | A batch of [`Example`](/api/example) objects to learn from. ~~Iterable[Example]~~                                                              |
| _keyword-only_  |                                                                                                                                                |
| `batch_size`    | The batch size to use. ~~Optional[int]~~                                                                                                       |
| `scorer`        | Optional [`Scorer`](/api/scorer) to use. If not passed in, a new one will be created. ~~Optional[Scorer]~~                                     |
| `component_cfg` | Optional dictionary of keyword arguments for components, keyed by component names. Defaults to `None`. ~~Optional[Dict[str, Dict[str, Any]]]~~ |
| `scorer_cfg`    | Optional dictionary of keyword arguments for the `Scorer`. Defaults to `None`. ~~Optional[Dict[str, Any]]~~                                    |
| **RETURNS**     | A dictionary of evaluation scores. ~~Dict[str, Union[float, Dict[str, float]]]~~                                                               |

## Language.use_params {#use_params tag="contextmanager, method"}

Replace weights of models in the pipeline with those provided in the params
dictionary. Can be used as a context manager, in which case, models go back to
their original weights after the block.

> #### Example
>
> ```python
> with nlp.use_params(optimizer.averages):
>     nlp.to_disk("/tmp/checkpoint")
> ```

| Name     | Description                                            |
| -------- | ------------------------------------------------------ |
| `params` | A dictionary of parameters keyed by model ID. ~~dict~~ |

## Language.add_pipe {#add_pipe tag="method" new="2"}

Add a component to the processing pipeline. Expects a name that maps to a
component factory registered using
[`@Language.component`](/api/language#component) or
[`@Language.factory`](/api/language#factory). Components should be callables
that take a `Doc` object, modify it and return it. Only one of `before`,
`after`, `first` or `last` can be set. Default behavior is `last=True`.

<Infobox title="Changed in v3.0" variant="warning">

As of v3.0, the [`Language.add_pipe`](/api/language#add_pipe) method doesn't
take callables anymore and instead expects the **name of a component factory**
registered using [`@Language.component`](/api/language#component) or
[`@Language.factory`](/api/language#factory). It now takes care of creating the
component, adds it to the pipeline and returns it.

</Infobox>

> #### Example
>
> ```python
> @Language.component("component")
> def component_func(doc):
>     # modify Doc and return it return doc
>
> nlp.add_pipe("component", before="ner")
> component = nlp.add_pipe("component", name="custom_name", last=True)
>
> # Add component from source pipeline
> source_nlp = spacy.load("en_core_web_sm")
> nlp.add_pipe("ner", source=source_nlp)
> ```

| Name                                  | Description                                                                                                                                                                                                                                                                              |
| ------------------------------------- | ---------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------- |
| `factory_name`                        | Name of the registered component factory. ~~str~~                                                                                                                                                                                                                                        |
| `name`                                | Optional unique name of pipeline component instance. If not set, the factory name is used. An error is raised if the name already exists in the pipeline. ~~Optional[str]~~                                                                                                              |
| _keyword-only_                        |                                                                                                                                                                                                                                                                                          |
| `before`                              | Component name or index to insert component directly before. ~~Optional[Union[str, int]]~~                                                                                                                                                                                               |
| `after`                               | Component name or index to insert component directly after. ~~Optional[Union[str, int]]~~                                                                                                                                                                                                |
| `first`                               | Insert component first / not first in the pipeline. ~~Optional[bool]~~                                                                                                                                                                                                                   |
| `last`                                | Insert component last / not last in the pipeline. ~~Optional[bool]~~                                                                                                                                                                                                                     |
| `config` <Tag variant="new">3</Tag>   | Optional config parameters to use for this component. Will be merged with the `default_config` specified by the component factory. ~~Optional[Dict[str, Any]]~~                                                                                                                          |
| `source` <Tag variant="new">3</Tag>   | Optional source pipeline to copy component from. If a source is provided, the `factory_name` is interpreted as the name of the component in the source pipeline. Make sure that the vocab, vectors and settings of the source pipeline match the target pipeline. ~~Optional[Language]~~ |
| `validate` <Tag variant="new">3</Tag> | Whether to validate the component config and arguments against the types expected by the factory. Defaults to `True`. ~~bool~~                                                                                                                                                           |
| **RETURNS**                           | The pipeline component. ~~Callable[[Doc], Doc]~~                                                                                                                                                                                                                                         |

## Language.create_pipe {#create_pipe tag="method" new="2"}

Create a pipeline component from a factory.

<Infobox title="Changed in v3.0" variant="warning">

As of v3.0, the [`Language.add_pipe`](/api/language#add_pipe) method also takes
the string name of the factory, creates the component, adds it to the pipeline
and returns it. The `Language.create_pipe` method is now mostly used internally.
To create a component and add it to the pipeline, you should always use
`Language.add_pipe`.

</Infobox>

> #### Example
>
> ```python
> parser = nlp.create_pipe("parser")
> ```

| Name                                  | Description                                                                                                                                                                 |
| ------------------------------------- | --------------------------------------------------------------------------------------------------------------------------------------------------------------------------- |
| `factory_name`                        | Name of the registered component factory. ~~str~~                                                                                                                           |
| `name`                                | Optional unique name of pipeline component instance. If not set, the factory name is used. An error is raised if the name already exists in the pipeline. ~~Optional[str]~~ |
| _keyword-only_                        |                                                                                                                                                                             |
| `config` <Tag variant="new">3</Tag>   | Optional config parameters to use for this component. Will be merged with the `default_config` specified by the component factory. ~~Optional[Dict[str, Any]]~~             |
| `validate` <Tag variant="new">3</Tag> | Whether to validate the component config and arguments against the types expected by the factory. Defaults to `True`. ~~bool~~                                              |
| **RETURNS**                           | The pipeline component. ~~Callable[[Doc], Doc]~~                                                                                                                            |

## Language.has_factory {#has_factory tag="classmethod" new="3"}

Check whether a factory name is registered on the `Language` class or subclass.
Will check for
[language-specific factories](/usage/processing-pipelines#factories-language)
registered on the subclass, as well as general-purpose factories registered on
the `Language` base class, available to all subclasses.

> #### Example
>
> ```python
> from spacy.language import Language
> from spacy.lang.en import English
>
> @English.component("component")
> def component(doc):
>     return doc
>
> assert English.has_factory("component")
> assert not Language.has_factory("component")
> ```

| Name        | Description                                                         |
| ----------- | ------------------------------------------------------------------- |
| `name`      | Name of the pipeline factory to check. ~~str~~                      |
| **RETURNS** | Whether a factory of that name is registered on the class. ~~bool~~ |

## Language.has_pipe {#has_pipe tag="method" new="2"}

Check whether a component is present in the pipeline. Equivalent to
`name in nlp.pipe_names`.

> #### Example
>
> ```python
> @Language.component("component")
> def component(doc):
>     return doc
>
> nlp.add_pipe("component", name="my_component")
> assert "my_component" in nlp.pipe_names
> assert nlp.has_pipe("my_component")
> ```

| Name        | Description                                                       |
| ----------- | ----------------------------------------------------------------- |
| `name`      | Name of the pipeline component to check. ~~str~~                  |
| **RETURNS** | Whether a component of that name exists in the pipeline. ~~bool~~ |

## Language.get_pipe {#get_pipe tag="method" new="2"}

Get a pipeline component for a given component name.

> #### Example
>
> ```python
> parser = nlp.get_pipe("parser")
> custom_component = nlp.get_pipe("custom_component")
> ```

| Name        | Description                                      |
| ----------- | ------------------------------------------------ |
| `name`      | Name of the pipeline component to get. ~~str~~   |
| **RETURNS** | The pipeline component. ~~Callable[[Doc], Doc]~~ |

## Language.replace_pipe {#replace_pipe tag="method" new="2"}

Replace a component in the pipeline and return the new component.

<Infobox title="Changed in v3.0" variant="warning">

As of v3.0, the `Language.replace_pipe` method doesn't take callables anymore
and instead expects the **name of a component factory** registered using
[`@Language.component`](/api/language#component) or
[`@Language.factory`](/api/language#factory).

</Infobox>

> #### Example
>
> ```python
> new_parser = nlp.replace_pipe("parser", "my_custom_parser")
> ```

| Name                                  | Description                                                                                                                                                        |
| ------------------------------------- | ------------------------------------------------------------------------------------------------------------------------------------------------------------------ |
| `name`                                | Name of the component to replace. ~~str~~                                                                                                                          |
| `component`                           | The factory name of the component to insert. ~~str~~                                                                                                               |
| _keyword-only_                        |                                                                                                                                                                    |
| `config` <Tag variant="new">3</Tag>   | Optional config parameters to use for the new component. Will be merged with the `default_config` specified by the component factory. ~~Optional[Dict[str, Any]]~~ |
| `validate` <Tag variant="new">3</Tag> | Whether to validate the component config and arguments against the types expected by the factory. Defaults to `True`. ~~bool~~                                     |
| **RETURNS**                           | The new pipeline component. ~~Callable[[Doc], Doc]~~                                                                                                               |

## Language.rename_pipe {#rename_pipe tag="method" new="2"}

Rename a component in the pipeline. Useful to create custom names for
pre-defined and pre-loaded components. To change the default name of a component
added to the pipeline, you can also use the `name` argument on
[`add_pipe`](/api/language#add_pipe).

> #### Example
>
> ```python
> nlp.rename_pipe("parser", "spacy_parser")
> ```

| Name       | Description                              |
| ---------- | ---------------------------------------- |
| `old_name` | Name of the component to rename. ~~str~~ |
| `new_name` | New name of the component. ~~str~~       |

## Language.remove_pipe {#remove_pipe tag="method" new="2"}

Remove a component from the pipeline. Returns the removed component name and
component function.

> #### Example
>
> ```python
> name, component = nlp.remove_pipe("parser")
> assert name == "parser"
> ```

| Name        | Description                                                                                |
| ----------- | ------------------------------------------------------------------------------------------ |
| `name`      | Name of the component to remove. ~~str~~                                                   |
| **RETURNS** | A `(name, component)` tuple of the removed component. ~~Tuple[str, Callable[[Doc], Doc]]~~ |

## Language.disable_pipe {#disable_pipe tag="method" new="3"}

Temporarily disable a pipeline component so it's not run as part of the
pipeline. Disabled components are listed in
[`nlp.disabled`](/api/language#attributes) and included in
[`nlp.components`](/api/language#attributes), but not in
[`nlp.pipeline`](/api/language#pipeline), so they're not run when you process a
`Doc` with the `nlp` object. If the component is already disabled, this method
does nothing.

> #### Example
>
> ```python
> nlp.add_pipe("ner")
> nlp.add_pipe("textcat")
> assert nlp.pipe_names == ["ner", "textcat"]
> nlp.disable_pipe("ner")
> assert nlp.pipe_names == ["textcat"]
> assert nlp.component_names == ["ner", "textcat"]
> assert nlp.disabled == ["ner"]
> ```

| Name   | Description                               |
| ------ | ----------------------------------------- |
| `name` | Name of the component to disable. ~~str~~ |

## Language.enable_pipe {#enable_pipe tag="method" new="3"}

Enable a previously disabled component (e.g. via
[`Language.disable_pipes`](/api/language#disable_pipes)) so it's run as part of
the pipeline, [`nlp.pipeline`](/api/language#pipeline). If the component is
already enabled, this method does nothing.

> #### Example
>
> ```python
> nlp.disable_pipe("ner")
> assert "ner" in nlp.disabled
> assert not "ner" in nlp.pipe_names
> nlp.enable_pipe("ner")
> assert not "ner" in nlp.disabled
> assert "ner" in nlp.pipe_names
> ```

| Name   | Description                              |
| ------ | ---------------------------------------- |
| `name` | Name of the component to enable. ~~str~~ |

## Language.select_pipes {#select_pipes tag="contextmanager, method" new="3"}

Disable one or more pipeline components. If used as a context manager, the
pipeline will be restored to the initial state at the end of the block.
Otherwise, a `DisabledPipes` object is returned, that has a `.restore()` method
you can use to undo your changes. You can specify either `disable` (as a list or
string), or `enable`. In the latter case, all components not in the `enable`
list will be disabled. Under the hood, this method calls into
[`disable_pipe`](/api/language#disable_pipe) and
[`enable_pipe`](/api/language#enable_pipe).

> #### Example
>
> ```python
> with nlp.select_pipes(disable=["tagger", "parser"]):
>    nlp.initialize()
>
> with nlp.select_pipes(enable="ner"):
>     nlp.initialize()
>
> disabled = nlp.select_pipes(disable=["tagger", "parser"])
> nlp.initialize()
> disabled.restore()
> ```

<Infobox title="Changed in v3.0" variant="warning" id="disable_pipes">

As of spaCy v3.0, the `disable_pipes` method has been renamed to `select_pipes`:

```diff
- nlp.disable_pipes(["tagger", "parser"])
+ nlp.select_pipes(disable=["tagger", "parser"])
```

</Infobox>

| Name           | Description                                                                                            |
| -------------- | ------------------------------------------------------------------------------------------------------ |
| _keyword-only_ |                                                                                                        |
| `disable`      | Name(s) of pipeline components to disable. ~~Optional[Union[str, Iterable[str]]]~~                     |
| `enable`       | Name(s) of pipeline components that will not be disabled. ~~Optional[Union[str, Iterable[str]]]~~      |
| **RETURNS**    | The disabled pipes that can be restored by calling the object's `.restore()` method. ~~DisabledPipes~~ |

## Language.get_factory_meta {#get_factory_meta tag="classmethod" new="3"}

Get the factory meta information for a given pipeline component name. Expects
the name of the component **factory**. The factory meta is an instance of the
[`FactoryMeta`](/api/language#factorymeta) dataclass and contains the
information about the component and its default provided by the
[`@Language.component`](/api/language#component) or
[`@Language.factory`](/api/language#factory) decorator.

> #### Example
>
> ```python
> factory_meta = Language.get_factory_meta("ner")
> assert factory_meta.factory == "ner"
> print(factory_meta.default_config)
> ```

| Name        | Description                       |
| ----------- | --------------------------------- |
| `name`      | The factory name. ~~str~~         |
| **RETURNS** | The factory meta. ~~FactoryMeta~~ |

## Language.get_pipe_meta {#get_pipe_meta tag="method" new="3"}

Get the factory meta information for a given pipeline component name. Expects
the name of the component **instance** in the pipeline. The factory meta is an
instance of the [`FactoryMeta`](/api/language#factorymeta) dataclass and
contains the information about the component and its default provided by the
[`@Language.component`](/api/language#component) or
[`@Language.factory`](/api/language#factory) decorator.

> #### Example
>
> ```python
> nlp.add_pipe("ner", name="entity_recognizer")
> factory_meta = nlp.get_pipe_meta("entity_recognizer")
> assert factory_meta.factory == "ner"
> print(factory_meta.default_config)
> ```

| Name        | Description                          |
| ----------- | ------------------------------------ |
| `name`      | The pipeline component name. ~~str~~ |
| **RETURNS** | The factory meta. ~~FactoryMeta~~    |

## Language.analyze_pipes {#analyze_pipes tag="method" new="3"}

Analyze the current pipeline components and show a summary of the attributes
they assign and require, and the scores they set. The data is based on the
information provided in the [`@Language.component`](/api/language#component) and
[`@Language.factory`](/api/language#factory) decorator. If requirements aren't
met, e.g. if a component specifies a required property that is not set by a
previous component, a warning is shown.

<Infobox variant="warning" title="Important note">

The pipeline analysis is static and does **not actually run the components**.
This means that it relies on the information provided by the components
themselves. If a custom component declares that it assigns an attribute but it
doesn't, the pipeline analysis won't catch that.

</Infobox>

> #### Example
>
> ```python
> nlp = spacy.blank("en")
> nlp.add_pipe("tagger")
> nlp.add_pipe("entity_linker")
> analysis = nlp.analyze_pipes()
> ```

<Accordion title="Example output" spaced>

```json
### Structured
{
  "summary": {
    "tagger": {
      "assigns": ["token.tag"],
      "requires": [],
      "scores": ["tag_acc", "pos_acc", "lemma_acc"],
      "retokenizes": false
    },
    "entity_linker": {
      "assigns": ["token.ent_kb_id"],
      "requires": ["doc.ents", "doc.sents", "token.ent_iob", "token.ent_type"],
      "scores": [],
      "retokenizes": false
    }
  },
  "problems": {
    "tagger": [],
    "entity_linker": ["doc.ents", "doc.sents", "token.ent_iob", "token.ent_type"]
  },
  "attrs": {
    "token.ent_iob": { "assigns": [], "requires": ["entity_linker"] },
    "doc.ents": { "assigns": [], "requires": ["entity_linker"] },
    "token.ent_kb_id": { "assigns": ["entity_linker"], "requires": [] },
    "doc.sents": { "assigns": [], "requires": ["entity_linker"] },
    "token.tag": { "assigns": ["tagger"], "requires": [] },
    "token.ent_type": { "assigns": [], "requires": ["entity_linker"] }
  }
}
```

```
### Pretty
============================= Pipeline Overview =============================

#   Component       Assigns           Requires         Scores        Retokenizes
-   -------------   ---------------   --------------   -----------   -----------
0   tagger          token.tag                          tag_acc       False

1   entity_linker   token.ent_kb_id   doc.ents         nel_micro_f   False
                                      doc.sents        nel_micro_r
                                      token.ent_iob    nel_micro_p
                                      token.ent_type


================================ Problems (4) ================================
⚠ 'entity_linker' requirements not met: doc.ents, doc.sents,
token.ent_iob, token.ent_type
```

</Accordion>

| Name           | Description                                                                                                                                                                                                                                 |
| -------------- | ------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------- |
| _keyword-only_ |                                                                                                                                                                                                                                             |
| `keys`         | The values to display in the table. Corresponds to attributes of the [`FactoryMeta`](/api/language#factorymeta). Defaults to `["assigns", "requires", "scores", "retokenizes"]`. ~~List[str]~~                                              |
| `pretty`       | Pretty-print the results as a table. Defaults to `False`. ~~bool~~                                                                                                                                                                          |
| **RETURNS**    | Dictionary containing the pipe analysis, keyed by `"summary"` (component meta by pipe), `"problems"` (attribute names by pipe) and `"attrs"` (pipes that assign and require an attribute, keyed by attribute). ~~Optional[Dict[str, Any]]~~ |

## Language.meta {#meta tag="property"}

Meta data for the `Language` class, including name, version, data sources,
license, author information and more. If a trained pipeline is loaded, this
contains meta data of the pipeline. The `Language.meta` is also what's
serialized as the `meta.json` when you save an `nlp` object to disk. See the
[meta data format](/api/data-formats#meta) for more details.

<Infobox variant="warning" title="Changed in v3.0">

As of v3.0, the meta only contains **meta information** about the pipeline and
isn't used to construct the language class and pipeline components. This
information is expressed in the [`config.cfg`](/api/data-formats#config).

</Infobox>

> #### Example
>
> ```python
> print(nlp.meta)
> ```

| Name        | Description                       |
| ----------- | --------------------------------- |
| **RETURNS** | The meta data. ~~Dict[str, Any]~~ |

## Language.config {#config tag="property" new="3"}

Export a trainable [`config.cfg`](/api/data-formats#config) for the current
`nlp` object. Includes the current pipeline, all configs used to create the
currently active pipeline components, as well as the default training config
that can be used with [`spacy train`](/api/cli#train). `Language.config` returns
a [Thinc `Config` object](https://thinc.ai/docs/api-config#config), which is a
subclass of the built-in `dict`. It supports the additional methods `to_disk`
(serialize the config to a file) and `to_str` (output the config as a string).

> #### Example
>
> ```python
> nlp.config.to_disk("./config.cfg")
> print(nlp.config.to_str())
> ```

| Name        | Description            |
| ----------- | ---------------------- |
| **RETURNS** | The config. ~~Config~~ |

## Language.to_disk {#to_disk tag="method" new="2"}

Save the current state to a directory. Under the hood, this method delegates to
the `to_disk` methods of the individual pipeline components, if available. This
means that if a trained pipeline is loaded, all components and their weights
will be saved to disk.

> #### Example
>
> ```python
> nlp.to_disk("/path/to/pipeline")
> ```

| Name           | Description                                                                                                                                |
| -------------- | ------------------------------------------------------------------------------------------------------------------------------------------ |
| `path`         | A path to a directory, which will be created if it doesn't exist. Paths may be either strings or `Path`-like objects. ~~Union[str, Path]~~ |
| _keyword-only_ |                                                                                                                                            |
| `exclude`      | Names of pipeline components or [serialization fields](#serialization-fields) to exclude. ~~Iterable[str]~~                                |

## Language.from_disk {#from_disk tag="method" new="2"}

Loads state from a directory, including all data that was saved with the
`Language` object. Modifies the object in place and returns it.

<Infobox variant="warning" title="Important note">

Keep in mind that this method **only loads the serialized state** and doesn't
set up the `nlp` object. This means that it requires the correct language class
to be initialized and all pipeline components to be added to the pipeline. If
you want to load a serialized pipeline from a directory, you should use
[`spacy.load`](/api/top-level#spacy.load), which will set everything up for you.

</Infobox>

> #### Example
>
> ```python
> from spacy.language import Language
> nlp = Language().from_disk("/path/to/pipeline")
>
> # Using language-specific subclass
> from spacy.lang.en import English
> nlp = English().from_disk("/path/to/pipeline")
> ```

| Name           | Description                                                                                                 |
| -------------- | ----------------------------------------------------------------------------------------------------------- |
| `path`         | A path to a directory. Paths may be either strings or `Path`-like objects. ~~Union[str, Path]~~             |
| _keyword-only_ |                                                                                                             |
| `exclude`      | Names of pipeline components or [serialization fields](#serialization-fields) to exclude. ~~Iterable[str]~~ |
| **RETURNS**    | The modified `Language` object. ~~Language~~                                                                |

## Language.to_bytes {#to_bytes tag="method"}

Serialize the current state to a binary string.

> #### Example
>
> ```python
> nlp_bytes = nlp.to_bytes()
> ```

| Name           | Description                                                                                            |
| -------------- | ------------------------------------------------------------------------------------------------------ |
| _keyword-only_ |                                                                                                        |
| `exclude`      | Names of pipeline components or [serialization fields](#serialization-fields) to exclude. ~~iterable~~ |
| **RETURNS**    | The serialized form of the `Language` object. ~~bytes~~                                                |

## Language.from_bytes {#from_bytes tag="method"}

Load state from a binary string. Note that this method is commonly used via the
subclasses like `English` or `German` to make language-specific functionality
like the [lexical attribute getters](/usage/linguistic-features#language-data)
available to the loaded object.

> #### Example
>
> ```python
> from spacy.lang.en import English
> nlp_bytes = nlp.to_bytes()
> nlp2 = English()
> nlp2.from_bytes(nlp_bytes)
> ```

| Name           | Description                                                                                                 |
| -------------- | ----------------------------------------------------------------------------------------------------------- |
| `bytes_data`   | The data to load from. ~~bytes~~                                                                            |
| _keyword-only_ |                                                                                                             |
| `exclude`      | Names of pipeline components or [serialization fields](#serialization-fields) to exclude. ~~Iterable[str]~~ |
| **RETURNS**    | The `Language` object. ~~Language~~                                                                         |

## Attributes {#attributes}

| Name                                          | Description                                                                                                                                    |
| --------------------------------------------- | ---------------------------------------------------------------------------------------------------------------------------------------------- |
| `vocab`                                       | A container for the lexical types. ~~Vocab~~                                                                                                   |
| `tokenizer`                                   | The tokenizer. ~~Tokenizer~~                                                                                                                   |
| `make_doc`                                    | Callable that takes a string and returns a `Doc`. ~~Callable[[str], Doc]~~                                                                     |
| `pipeline`                                    | List of `(name, component)` tuples describing the current processing pipeline, in order. ~~List[Tuple[str, Callable[[Doc], Doc]]]~~            |
| `pipe_names` <Tag variant="new">2</Tag>       | List of pipeline component names, in order. ~~List[str]~~                                                                                      |
| `pipe_labels` <Tag variant="new">2.2</Tag>    | List of labels set by the pipeline components, if available, keyed by component name. ~~Dict[str, List[str]]~~                                 |
| `pipe_factories` <Tag variant="new">2.2</Tag> | Dictionary of pipeline component names, mapped to their factory names. ~~Dict[str, str]~~                                                      |
| `factories`                                   | All available factory functions, keyed by name. ~~Dict[str, Callable[[...], Callable[[Doc], Doc]]]~~                                           |
| `factory_names` <Tag variant="new">3</Tag>    | List of all available factory names. ~~List[str]~~                                                                                             |
| `components` <Tag variant="new">3</Tag>       | List of all available `(name, component)` tuples, including components that are currently disabled. ~~List[Tuple[str, Callable[[Doc], Doc]]]~~ |
| `component_names` <Tag variant="new">3</Tag>  | List of all available component names, including components that are currently disabled. ~~List[str]~~                                         |
| `disabled` <Tag variant="new">3</Tag>         | Names of components that are currently disabled and don't run as part of the pipeline. ~~List[str]~~                                           |
| `path` <Tag variant="new">2</Tag>             | Path to the pipeline data directory, if a pipeline is loaded from a path or package. Otherwise `None`. ~~Optional[Path]~~                      |

## Class attributes {#class-attributes}

| Name             | Description                                                                                                                                                                       |
| ---------------- | --------------------------------------------------------------------------------------------------------------------------------------------------------------------------------- |
| `Defaults`       | Settings, data and factory methods for creating the `nlp` object and processing pipeline. ~~Defaults~~                                                                            |
| `lang`           | Two-letter language ID, i.e. [ISO code](https://en.wikipedia.org/wiki/List_of_ISO_639-1_codes). ~~str~~                                                                           |
| `default_config` | Base [config](/usage/training#config) to use for [Language.config](/api/language#config). Defaults to [`default_config.cfg`](%%GITHUB_SPACY/spacy/default_config.cfg). ~~Config~~ |

## Defaults {#defaults}

The following attributes can be set on the `Language.Defaults` class to
customize the default language data:

> #### Example
>
> ```python
> from spacy.language import language
> from spacy.lang.tokenizer_exceptions import URL_MATCH
> from thinc.api import Config
>
> DEFAULT_CONFIFG = """
> [nlp.tokenizer]
> @tokenizers = "MyCustomTokenizer.v1"
> """
>
> class Defaults(Language.Defaults):
>    stop_words = set()
>    tokenizer_exceptions = {}
>    prefixes = tuple()
>    suffixes = tuple()
>    infixes = tuple()
>    token_match = None
>    url_match = URL_MATCH
>    lex_attr_getters = {}
>    syntax_iterators = {}
>    writing_system = {"direction": "ltr", "has_case": True, "has_letters": True}
>    config = Config().from_str(DEFAULT_CONFIG)
> ```

| Name                              | Description                                                                                                                                                                                                                                                                                                      |
| --------------------------------- | ---------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------- |
| `stop_words`                      | List of stop words, used for `Token.is_stop`.<br />**Example:** [`stop_words.py`](%%GITHUB_SPACY/spacy/lang/en/stop_words.py) ~~Set[str]~~                                                                                                                                                                       |
| `tokenizer_exceptions`            | Tokenizer exception rules, string mapped to list of token attributes.<br />**Example:** [`de/tokenizer_exceptions.py`](%%GITHUB_SPACY/spacy/lang/de/tokenizer_exceptions.py) ~~Dict[str, List[dict]]~~                                                                                                           |
| `prefixes`, `suffixes`, `infixes` | Prefix, suffix and infix rules for the default tokenizer.<br />**Example:** [`puncutation.py`](%%GITHUB_SPACY/spacy/lang/punctuation.py) ~~Optional[List[Union[str, Pattern]]]~~                                                                                                                                 |
| `token_match`                     | Optional regex for matching strings that should never be split, overriding the infix rules.<br />**Example:** [`fr/tokenizer_exceptions.py`](%%GITHUB_SPACY/spacy/lang/fr/tokenizer_exceptions.py) ~~Optional[Pattern]~~                                                                                         |
| `url_match`                       | Regular expression for matching URLs. Prefixes and suffixes are removed before applying the match.<br />**Example:** [`tokenizer_exceptions.py`](%%GITHUB_SPACY/spacy/lang/tokenizer_exceptions.py) ~~Optional[Pattern]~~                                                                                        |
| `lex_attr_getters`                | Custom functions for setting lexical attributes on tokens, e.g. `like_num`.<br />**Example:** [`lex_attrs.py`](%%GITHUB_SPACY/spacy/lang/en/lex_attrs.py) ~~Dict[int, Callable[[str], Any]]~~                                                                                                                    |
| `syntax_iterators`                | Functions that compute views of a `Doc` object based on its syntax. At the moment, only used for [noun chunks](/usage/linguistic-features#noun-chunks).<br />**Example:** [`syntax_iterators.py`](%%GITHUB_SPACY/spacy/lang/en/syntax_iterators.py). ~~Dict[str, Callable[[Union[Doc, Span]], Iterator[Span]]]~~ |
| `writing_system`                  | Information about the language's writing system, available via `Vocab.writing_system`. Defaults to: `{"direction": "ltr", "has_case": True, "has_letters": True}.`.<br />**Example:** [`zh/__init__.py`](%%GITHUB_SPACY/spacy/lang/zh/__init__.py) ~~Dict[str, Any]~~                                            |
| `config`                          | Default [config](/usage/training#config) added to `nlp.config`. This can include references to custom tokenizers or lemmatizers.<br />**Example:** [`zh/__init__.py`](%%GITHUB_SPACY/spacy/lang/zh/__init__.py) ~~Config~~                                                                                       |

## Serialization fields {#serialization-fields}

During serialization, spaCy will export several data fields used to restore
different aspects of the object. If needed, you can exclude them from
serialization by passing in the string names via the `exclude` argument.

> #### Example
>
> ```python
> data = nlp.to_bytes(exclude=["tokenizer", "vocab"])
> nlp.from_disk("/pipeline", exclude=["ner"])
> ```

| Name        | Description                                                        |
| ----------- | ------------------------------------------------------------------ |
| `vocab`     | The shared [`Vocab`](/api/vocab).                                  |
| `tokenizer` | Tokenization rules and exceptions.                                 |
| `meta`      | The meta data, available as [`Language.meta`](/api/language#meta). |
| ...         | String names of pipeline components, e.g. `"ner"`.                 |

## FactoryMeta {#factorymeta new="3" tag="dataclass"}

The `FactoryMeta` contains the information about the component and its default
provided by the [`@Language.component`](/api/language#component) or
[`@Language.factory`](/api/language#factory) decorator. It's created whenever a
component is defined and stored on the `Language` class for each component
instance and factory instance.

| Name                    | Description                                                                                                                                                                                                                                                                                                                        |
| ----------------------- | ---------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------- |
| `factory`               | The name of the registered component factory. ~~str~~                                                                                                                                                                                                                                                                              |
| `default_config`        | The default config, describing the default values of the factory arguments. ~~Dict[str, Any]~~                                                                                                                                                                                                                                     |
| `assigns`               | `Doc` or `Token` attributes assigned by this component, e.g. `["token.ent_id"]`. Used for [pipe analysis](/usage/processing-pipelines#analysis). ~~Iterable[str]~~                                                                                                                                                                 |
| `requires`              | `Doc` or `Token` attributes required by this component, e.g. `["token.ent_id"]`. Used for [pipe analysis](/usage/processing-pipelines#analysis). ~~Iterable[str]~~                                                                                                                                                                 |
| `retokenizes`           | Whether the component changes tokenization. Used for [pipe analysis](/usage/processing-pipelines#analysis). ~~bool~~                                                                                                                                                                                                               |
| `default_score_weights` | The scores to report during training, and their default weight towards the final score used to select the best model. Weights should sum to `1.0` per component and will be combined and normalized for the whole pipeline. If a weight is set to `None`, the score will not be logged or weighted. ~~Dict[str, Optional[float]]~~ |
| `scores`                | All scores set by the components if it's trainable, e.g. `["ents_f", "ents_r", "ents_p"]`. Based on the `default_score_weights` and used for [pipe analysis](/usage/processing-pipelines#analysis). ~~Iterable[str]~~                                                                                                              |
-												💫 Update website (#3285)

<!--- Provide a general summary of your changes in the title. -->

## Description

The new website is implemented using [Gatsby](https://www.gatsbyjs.org) with [Remark](https://github.com/remarkjs/remark) and [MDX](https://mdxjs.com/). This allows authoring content in **straightforward Markdown** without the usual limitations. Standard elements can be overwritten with powerful [React](http://reactjs.org/) components and wherever Markdown syntax isn't enough, JSX components can be used. Hopefully, this update will also make it much easier to contribute to the docs. Once this PR is merged, I'll implement auto-deployment via [Netlify](https://netlify.com) on a specific branch (to avoid building the website on every PR). There's a bunch of other cool stuff that the new setup will allow us to do – including writing front-end tests, service workers, offline support, implementing a search and so on.

This PR also includes various new docs pages and content.
Resolves #3270. Resolves #3222. Resolves #2947. Resolves #2837.


### Types of change
enhancement

## Checklist
<!--- Before you submit the PR, go over this checklist and make sure you can
tick off all the boxes. [] -> [x] -->
- [x] I have submitted the spaCy Contributor Agreement.
- [x] I ran the tests, and all new and existing tests passed.
- [x] My changes don't require a change to the documentation, or if they do, I've added all required information.

											
										
										
											2019-02-17 21:31:19 +03:00
+								---
 								title: Language
 								teaser: A text-processing pipeline
 								tag: class
 								source: spacy/language.py
 								---
 								Usually you'll load this once per process as `nlp` and pass the instance around
 								your application. The `Language` class is created when you call
-												"model" terminology consistency in docs

											
										
										
											2020-09-03 14:13:03 +03:00
+								[`spacy.load`](/api/top-level#spacy.load) and contains the shared vocabulary and
-												Update docs [ci skip]

											
										
										
											2020-10-02 14:24:33 +03:00
+								[language data](/usage/linguistic-features#language-data), optional binary
 								weights, e.g. provided by a [trained pipeline](/models), and the
-												💫 Update website (#3285)

<!--- Provide a general summary of your changes in the title. -->

## Description

The new website is implemented using [Gatsby](https://www.gatsbyjs.org) with [Remark](https://github.com/remarkjs/remark) and [MDX](https://mdxjs.com/). This allows authoring content in **straightforward Markdown** without the usual limitations. Standard elements can be overwritten with powerful [React](http://reactjs.org/) components and wherever Markdown syntax isn't enough, JSX components can be used. Hopefully, this update will also make it much easier to contribute to the docs. Once this PR is merged, I'll implement auto-deployment via [Netlify](https://netlify.com) on a specific branch (to avoid building the website on every PR). There's a bunch of other cool stuff that the new setup will allow us to do – including writing front-end tests, service workers, offline support, implementing a search and so on.

This PR also includes various new docs pages and content.
Resolves #3270. Resolves #3222. Resolves #2947. Resolves #2837.


### Types of change
enhancement

## Checklist
<!--- Before you submit the PR, go over this checklist and make sure you can
tick off all the boxes. [] -> [x] -->
- [x] I have submitted the spaCy Contributor Agreement.
- [x] I ran the tests, and all new and existing tests passed.
- [x] My changes don't require a change to the documentation, or if they do, I've added all required information.

											
										
										
											2019-02-17 21:31:19 +03:00
+								[processing pipeline](/usage/processing-pipelines) containing components like
 								the tagger or parser that are called on a document in order. You can also add
 								your own processing pipeline components that take a `Doc` object, modify it and
 								return it.
-												Update docstrings, docs and types

											
										
										
											2020-07-29 12:36:42 +03:00
+								## Language.\_\_init\_\_ {#init tag="method"}
-												Fix consistency and update docs

											
										
										
											2020-09-15 12:37:37 +03:00
+								Initialize a `Language` object. Note that the `meta` is only used for meta
 								information in [`Language.meta`](/api/language#meta) and not to configure the
 								`nlp` object or to override the config. To initialize from a config, use
 								[`Language.from_config`](/api/language#from_config) instead.
-												Update docstrings, docs and types

											
										
										
											2020-07-29 12:36:42 +03:00
 								> #### Example
 								>
 								> ```python
 								> # Construction from subclass
 								> from spacy.lang.en import English
 								> nlp = English()
 								>
 								> # Construction from scratch
 								> from spacy.vocab import Vocab
 								> from spacy.language import Language
 								> nlp = Language(Vocab())
 								> ```
-												Update docs, types and API consistency

											
										
										
											2020-08-17 17:45:24 +03:00
+								| Name               | Description                                                                                                              |
 								| ------------------ | ------------------------------------------------------------------------------------------------------------------------ |
 								| `vocab`            | A `Vocab` object. If `True`, a vocab is created using the default language data settings. ~~Vocab~~                      |
 								| _keyword-only_     |                                                                                                                          |
 								| `max_length`       | Maximum number of characters allowed in a single text. Defaults to `10 ** 6`. ~~int~~                                    |
-												Fix consistency and update docs

											
										
										
											2020-09-15 12:37:37 +03:00
+								| `meta`             | [Meta data](/api/data-formats#meta) overrides. ~~Dict[str, Any]~~                                                        |
-												Update docs, types and API consistency

											
										
										
											2020-08-17 17:45:24 +03:00
+								| `create_tokenizer` | Optional function that receives the `nlp` object and returns a tokenizer. ~~Callable[[Language], Callable[[str], Doc]]~~ |
-												Format

											
										
										
											2020-12-09 14:44:01 +03:00
+								| `batch_size`       | Default batch size for [`pipe`](#pipe) and [`evaluate`](#evaluate). Defaults to `1000`. ~~int~~                          |
-												Update docstrings, docs and types

											
										
										
											2020-07-29 12:36:42 +03:00
-												Update docs and util consistency

											
										
										
											2020-08-18 02:22:59 +03:00
+								## Language.from_config {#from_config tag="classmethod" new="3"}
-												Update docstrings, docs and types

											
										
										
											2020-07-29 12:36:42 +03:00
 								Create a `Language` object from a loaded config. Will set up the tokenizer and
-												Merge branch 'develop' into pr/6135

											
										
										
											2020-09-24 14:44:56 +03:00
+								language data, add pipeline components based on the pipeline and add pipeline
 								components based on the definitions specified in the config. If no config is
 								provided, the default config of the given language is used. This is also how
 								spaCy loads a model under the hood based on its
 								[`config.cfg`](/api/data-formats#config).
-												Update docstrings, docs and types

											
										
										
											2020-07-29 12:36:42 +03:00
 								> #### Example
 								>
 								> ```python
 								> from thinc.api import Config
 								> from spacy.language import Language
 								>
 								> config = Config().from_disk("./config.cfg")
 								> nlp = Language.from_config(config)
 								> ```
-												Fix consistency and update docs

											
										
										
											2020-09-15 12:37:37 +03:00
+								| Name           | Description                                                                                                                                                                                                                                      |
 								| -------------- | ------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------ |
 								| `config`       | The loaded config. ~~Union[Dict[str, Any], Config]~~                                                                                                                                                                                             |
 								| _keyword-only_ |                                                                                                                                                                                                                                                  |
-												Update docs

											
										
										
											2020-09-15 12:41:49 +03:00
+								| `vocab`        | A `Vocab` object. If `True`, a vocab is created using the default language data settings. ~~Vocab~~                                                                                                                                              |
-												Fix consistency and update docs

											
										
										
											2020-09-15 12:37:37 +03:00
+								| `disable`      | Names of pipeline components to [disable](/usage/processing-pipelines#disabling). Disabled pipes will be loaded but they won't be run unless you explicitly enable them by calling [`nlp.enable_pipe`](/api/language#enable_pipe). ~~List[str]~~ |
 								| `exclude`      | Names of pipeline components to [exclude](/usage/processing-pipelines#disabling). Excluded components won't be loaded. ~~List[str]~~                                                                                                             |
 								| `meta`         | [Meta data](/api/data-formats#meta) overrides. ~~Dict[str, Any]~~                                                                                                                                                                                |
 								| `auto_fill`    | Whether to automatically fill in missing values in the config, based on defaults and function argument annotations. Defaults to `True`. ~~bool~~                                                                                                 |
 								| `validate`     | Whether to validate the component config and arguments against the types expected by the factory. Defaults to `True`. ~~bool~~                                                                                                                   |
 								| **RETURNS**    | The initialized object. ~~Language~~                                                                                                                                                                                                             |
-												Update docstrings, docs and types

											
										
										
											2020-07-29 12:36:42 +03:00
-												Update docs [ci skip]

											
										
										
											2020-07-27 01:29:45 +03:00
+								## Language.component {#component tag="classmethod" new="3"}
 								Register a custom pipeline component under a given name. This allows
 								initializing the component by name using
 								[`Language.add_pipe`](/api/language#add_pipe) and referring to it in
 								[config files](/usage/training#config). This classmethod and decorator is
 								intended for **simple stateless functions** that take a `Doc` and return it. For
 								more complex stateful components that allow settings and need access to the
 								shared `nlp` object, use the [`Language.factory`](/api/language#factory)
 								decorator. For more details and examples, see the
 								[usage documentation](/usage/processing-pipelines#custom-components).
 								> #### Example
 								>
 								> ```python
 								> from spacy.language import Language
 								>
 								> # Usage as a decorator
 								> @Language.component("my_component")
 								> def my_component(doc):
 								>    # Do something to the doc
 								>    return doc
 								>
 								> # Usage as a function
 								> Language.component("my_component2", func=my_component)
 								> ```
-												Update docs, types and API consistency

											
										
										
											2020-08-17 17:45:24 +03:00
+								| Name           | Description                                                                                                                                                        |
 								| -------------- | ------------------------------------------------------------------------------------------------------------------------------------------------------------------ |
 								| `name`         | The name of the component factory. ~~str~~                                                                                                                         |
 								| _keyword-only_ |                                                                                                                                                                    |
 								| `assigns`      | `Doc` or `Token` attributes assigned by this component, e.g. `["token.ent_id"]`. Used for [pipe analysis](/usage/processing-pipelines#analysis). ~~Iterable[str]~~ |
 								| `requires`     | `Doc` or `Token` attributes required by this component, e.g. `["token.ent_id"]`. Used for [pipe analysis](/usage/processing-pipelines#analysis). ~~Iterable[str]~~ |
 								| `retokenizes`  | Whether the component changes tokenization. Used for [pipe analysis](/usage/processing-pipelines#analysis). ~~bool~~                                               |
-												Merge branch 'develop' into pr/6135

											
										
										
											2020-09-24 14:44:56 +03:00
+								| `func`         | Optional function if not used as a decorator. ~~Optional[Callable[[Doc], Doc]]~~                                                                                   |
-												Update docs [ci skip]

											
										
										
											2020-07-27 01:29:45 +03:00
 								## Language.factory {#factory tag="classmethod"}
 								Register a custom pipeline component factory under a given name. This allows
 								initializing the component by name using
 								[`Language.add_pipe`](/api/language#add_pipe) and referring to it in
 								[config files](/usage/training#config). The registered factory function needs to
 								take at least two **named arguments** which spaCy fills in automatically: `nlp`
 								for the current `nlp` object and `name` for the component instance name. This
 								can be useful to distinguish multiple instances of the same component and allows
 								trainable components to add custom losses using the component instance name. The
 								`default_config` defines the default values of the remaining factory arguments.
 								It's merged into the [`nlp.config`](/api/language#config). For more details and
 								examples, see the
 								[usage documentation](/usage/processing-pipelines#custom-components).
 								> #### Example
 								>
 								> ```python
 								> from spacy.language import Language
 								>
 								> # Usage as a decorator
 								> @Language.factory(
 								>    "my_component",
 								>    default_config={"some_setting": True},
 								> )
 								> def create_my_component(nlp, name, some_setting):
 								>      return MyComponent(some_setting)
 								>
 								> # Usage as function
 								> Language.factory(
 								>     "my_component",
 								>     default_config={"some_setting": True},
 								>     func=create_my_component
 								> )
 								> ```
-												Fix handling of score_weights

											
										
										
											2020-09-24 11:27:33 +03:00
+								| Name                    | Description                                                                                                                                                                                                                                                                                                                        |
 								| ----------------------- | ---------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------- |
 								| `name`                  | The name of the component factory. ~~str~~                                                                                                                                                                                                                                                                                         |
 								| _keyword-only_          |                                                                                                                                                                                                                                                                                                                                    |
 								| `default_config`        | The default config, describing the default values of the factory arguments. ~~Dict[str, Any]~~                                                                                                                                                                                                                                     |
 								| `assigns`               | `Doc` or `Token` attributes assigned by this component, e.g. `["token.ent_id"]`. Used for [pipe analysis](/usage/processing-pipelines#analysis). ~~Iterable[str]~~                                                                                                                                                                 |
 								| `requires`              | `Doc` or `Token` attributes required by this component, e.g. `["token.ent_id"]`. Used for [pipe analysis](/usage/processing-pipelines#analysis). ~~Iterable[str]~~                                                                                                                                                                 |
 								| `retokenizes`           | Whether the component changes tokenization. Used for [pipe analysis](/usage/processing-pipelines#analysis). ~~bool~~                                                                                                                                                                                                               |
 								| `default_score_weights` | The scores to report during training, and their default weight towards the final score used to select the best model. Weights should sum to `1.0` per component and will be combined and normalized for the whole pipeline. If a weight is set to `None`, the score will not be logged or weighted. ~~Dict[str, Optional[float]]~~ |
-												Merge branch 'develop' into pr/6135

											
										
										
											2020-09-24 14:44:56 +03:00
+								| `func`                  | Optional function if not used as a decorator. ~~Optional[Callable[[...], Callable[[Doc], Doc]]]~~                                                                                                                                                                                                                                  |
-												Update docs [ci skip]

											
										
										
											2020-07-27 01:29:45 +03:00
-												Update docstrings, docs and types

											
										
										
											2020-07-29 12:36:42 +03:00
+								## Language.\_\_call\_\_ {#call tag="method"}
-												💫 Update website (#3285)

<!--- Provide a general summary of your changes in the title. -->

## Description

The new website is implemented using [Gatsby](https://www.gatsbyjs.org) with [Remark](https://github.com/remarkjs/remark) and [MDX](https://mdxjs.com/). This allows authoring content in **straightforward Markdown** without the usual limitations. Standard elements can be overwritten with powerful [React](http://reactjs.org/) components and wherever Markdown syntax isn't enough, JSX components can be used. Hopefully, this update will also make it much easier to contribute to the docs. Once this PR is merged, I'll implement auto-deployment via [Netlify](https://netlify.com) on a specific branch (to avoid building the website on every PR). There's a bunch of other cool stuff that the new setup will allow us to do – including writing front-end tests, service workers, offline support, implementing a search and so on.

This PR also includes various new docs pages and content.
Resolves #3270. Resolves #3222. Resolves #2947. Resolves #2837.


### Types of change
enhancement

## Checklist
<!--- Before you submit the PR, go over this checklist and make sure you can
tick off all the boxes. [] -> [x] -->
- [x] I have submitted the spaCy Contributor Agreement.
- [x] I ran the tests, and all new and existing tests passed.
- [x] My changes don't require a change to the documentation, or if they do, I've added all required information.

											
										
										
											2019-02-17 21:31:19 +03:00
-												Update docstrings, docs and types

											
										
										
											2020-07-29 12:36:42 +03:00
+								Apply the pipeline to some text. The text can span multiple sentences, and can
 								contain arbitrary whitespace. Alignment into the original string is preserved.
-												💫 Update website (#3285)

<!--- Provide a general summary of your changes in the title. -->

## Description

The new website is implemented using [Gatsby](https://www.gatsbyjs.org) with [Remark](https://github.com/remarkjs/remark) and [MDX](https://mdxjs.com/). This allows authoring content in **straightforward Markdown** without the usual limitations. Standard elements can be overwritten with powerful [React](http://reactjs.org/) components and wherever Markdown syntax isn't enough, JSX components can be used. Hopefully, this update will also make it much easier to contribute to the docs. Once this PR is merged, I'll implement auto-deployment via [Netlify](https://netlify.com) on a specific branch (to avoid building the website on every PR). There's a bunch of other cool stuff that the new setup will allow us to do – including writing front-end tests, service workers, offline support, implementing a search and so on.

This PR also includes various new docs pages and content.
Resolves #3270. Resolves #3222. Resolves #2947. Resolves #2837.


### Types of change
enhancement

## Checklist
<!--- Before you submit the PR, go over this checklist and make sure you can
tick off all the boxes. [] -> [x] -->
- [x] I have submitted the spaCy Contributor Agreement.
- [x] I ran the tests, and all new and existing tests passed.
- [x] My changes don't require a change to the documentation, or if they do, I've added all required information.

											
										
										
											2019-02-17 21:31:19 +03:00
 								> #### Example
 								>
 								> ```python
-												Update docstrings, docs and types

											
										
										
											2020-07-29 12:36:42 +03:00
+								> doc = nlp("An example sentence. Another sentence.")
 								> assert (doc[0].text, doc[0].head.tag_) == ("An", "NN")
-												💫 Update website (#3285)

<!--- Provide a general summary of your changes in the title. -->

## Description

The new website is implemented using [Gatsby](https://www.gatsbyjs.org) with [Remark](https://github.com/remarkjs/remark) and [MDX](https://mdxjs.com/). This allows authoring content in **straightforward Markdown** without the usual limitations. Standard elements can be overwritten with powerful [React](http://reactjs.org/) components and wherever Markdown syntax isn't enough, JSX components can be used. Hopefully, this update will also make it much easier to contribute to the docs. Once this PR is merged, I'll implement auto-deployment via [Netlify](https://netlify.com) on a specific branch (to avoid building the website on every PR). There's a bunch of other cool stuff that the new setup will allow us to do – including writing front-end tests, service workers, offline support, implementing a search and so on.

This PR also includes various new docs pages and content.
Resolves #3270. Resolves #3222. Resolves #2947. Resolves #2837.


### Types of change
enhancement

## Checklist
<!--- Before you submit the PR, go over this checklist and make sure you can
tick off all the boxes. [] -> [x] -->
- [x] I have submitted the spaCy Contributor Agreement.
- [x] I ran the tests, and all new and existing tests passed.
- [x] My changes don't require a change to the documentation, or if they do, I've added all required information.

											
										
										
											2019-02-17 21:31:19 +03:00
+								> ```
-												Update docs, types and API consistency

											
										
										
											2020-08-17 17:45:24 +03:00
+								| Name            | Description                                                                                                                                    |
 								| --------------- | ---------------------------------------------------------------------------------------------------------------------------------------------- |
 								| `text`          | The text to be processed. ~~str~~                                                                                                              |
 								| _keyword-only_  |                                                                                                                                                |
 								| `disable`       | Names of pipeline components to [disable](/usage/processing-pipelines#disabling). ~~List[str]~~                                                |
 								| `component_cfg` | Optional dictionary of keyword arguments for components, keyed by component names. Defaults to `None`. ~~Optional[Dict[str, Dict[str, Any]]]~~ |
 								| **RETURNS**     | A container for accessing the annotations. ~~Doc~~                                                                                             |
-												Update docs [ci skip]

											
										
										
											2020-07-27 01:29:45 +03:00
-												Update docstrings, docs and types

											
										
										
											2020-07-29 12:36:42 +03:00
+								## Language.pipe {#pipe tag="method"}
-												Update docs [ci skip]

											
										
										
											2020-07-27 01:29:45 +03:00
-												Update docstrings, docs and types

											
										
										
											2020-07-29 12:36:42 +03:00
+								Process texts as a stream, and yield `Doc` objects in order. This is usually
 								more efficient than processing texts one-by-one.
-												Update docs [ci skip]

											
										
										
											2020-07-27 01:29:45 +03:00
 								> #### Example
 								>
 								> ```python
-												Update docstrings, docs and types

											
										
										
											2020-07-29 12:36:42 +03:00
+								> texts = ["One document.", "...", "Lots of documents"]
 								> for doc in nlp.pipe(texts, batch_size=50):
-												Update docs [ci skip]

											
										
										
											2020-09-22 10:45:41 +03:00
+								>     assert doc.has_annotation("DEP")
-												Update docs [ci skip]

											
										
										
											2020-07-27 01:29:45 +03:00
+								> ```
-												Update docs, types and API consistency

											
										
										
											2020-08-17 17:45:24 +03:00
+								| Name                                       | Description                                                                                                                                                         |
 								| ------------------------------------------ | ------------------------------------------------------------------------------------------------------------------------------------------------------------------- |
 								| `texts`                                    | A sequence of strings. ~~Iterable[str]~~                                                                                                                            |
 								| _keyword-only_                             |                                                                                                                                                                     |
 								| `as_tuples`                                | If set to `True`, inputs should be a sequence of `(text, context)` tuples. Output will then be a sequence of `(doc, context)` tuples. Defaults to `False`. ~~bool~~ |
-												Add nlp.batch_size setting

Add a default `batch_size` setting for `Language.pipe` and
`Language.evaluate` as `nlp.batch_size`.

											
										
										
											2020-12-09 11:13:26 +03:00
+								| `batch_size`                               | The number of texts to buffer. ~~Optional[int]~~                                                                                                                    |
-												Update docs, types and API consistency

											
										
										
											2020-08-17 17:45:24 +03:00
+								| `disable`                                  | Names of pipeline components to [disable](/usage/processing-pipelines#disabling). ~~List[str]~~                                                                     |
 								| `cleanup`                                  | If `True`, unneeded strings are freed to control memory use. Experimental. ~~bool~~                                                                                 |
 								| `component_cfg`                            | Optional dictionary of keyword arguments for components, keyed by component names. Defaults to `None`. ~~Optional[Dict[str, Dict[str, Any]]]~~                      |
 								| `n_process` <Tag variant="new">2.2.2</Tag> | Number of processors to use. Defaults to `1`. ~~int~~                                                                                                               |
 								| **YIELDS**                                 | Documents in the order of the original text. ~~Doc~~                                                                                                                |
-												💫 Update website (#3285)

<!--- Provide a general summary of your changes in the title. -->

## Description

The new website is implemented using [Gatsby](https://www.gatsbyjs.org) with [Remark](https://github.com/remarkjs/remark) and [MDX](https://mdxjs.com/). This allows authoring content in **straightforward Markdown** without the usual limitations. Standard elements can be overwritten with powerful [React](http://reactjs.org/) components and wherever Markdown syntax isn't enough, JSX components can be used. Hopefully, this update will also make it much easier to contribute to the docs. Once this PR is merged, I'll implement auto-deployment via [Netlify](https://netlify.com) on a specific branch (to avoid building the website on every PR). There's a bunch of other cool stuff that the new setup will allow us to do – including writing front-end tests, service workers, offline support, implementing a search and so on.

This PR also includes various new docs pages and content.
Resolves #3270. Resolves #3222. Resolves #2947. Resolves #2837.


### Types of change
enhancement

## Checklist
<!--- Before you submit the PR, go over this checklist and make sure you can
tick off all the boxes. [] -> [x] -->
- [x] I have submitted the spaCy Contributor Agreement.
- [x] I ran the tests, and all new and existing tests passed.
- [x] My changes don't require a change to the documentation, or if they do, I've added all required information.

											
										
										
											2019-02-17 21:31:19 +03:00
-												Update docs [ci skip]

											
										
										
											2020-10-01 18:38:17 +03:00
+								## Language.initialize {#initialize tag="method" new="3"}
-												💫 Update website (#3285)

<!--- Provide a general summary of your changes in the title. -->

## Description

The new website is implemented using [Gatsby](https://www.gatsbyjs.org) with [Remark](https://github.com/remarkjs/remark) and [MDX](https://mdxjs.com/). This allows authoring content in **straightforward Markdown** without the usual limitations. Standard elements can be overwritten with powerful [React](http://reactjs.org/) components and wherever Markdown syntax isn't enough, JSX components can be used. Hopefully, this update will also make it much easier to contribute to the docs. Once this PR is merged, I'll implement auto-deployment via [Netlify](https://netlify.com) on a specific branch (to avoid building the website on every PR). There's a bunch of other cool stuff that the new setup will allow us to do – including writing front-end tests, service workers, offline support, implementing a search and so on.

This PR also includes various new docs pages and content.
Resolves #3270. Resolves #3222. Resolves #2947. Resolves #2837.


### Types of change
enhancement

## Checklist
<!--- Before you submit the PR, go over this checklist and make sure you can
tick off all the boxes. [] -> [x] -->
- [x] I have submitted the spaCy Contributor Agreement.
- [x] I ran the tests, and all new and existing tests passed.
- [x] My changes don't require a change to the documentation, or if they do, I've added all required information.

											
										
										
											2019-02-17 21:31:19 +03:00
-												Update docs [ci skip]

											
										
										
											2020-08-11 21:57:23 +03:00
+								Initialize the pipeline for training and return an
-												Update docs [ci skip]

											
										
										
											2020-09-29 17:59:21 +03:00
+								[`Optimizer`](https://thinc.ai/docs/api-optimizers). Under the hood, it uses the
 								settings defined in the [`[initialize]`](/api/data-formats#config-initialize)
 								config block to set up the vocabulary, load in vectors and tok2vec weights and
 								pass optional arguments to the `initialize` methods implemented by pipeline
 								components or the tokenizer. This method is typically called automatically when
-												Update docs [ci skip]

											
										
										
											2020-10-02 14:24:33 +03:00
+								you run [`spacy train`](/api/cli#train). See the usage guide on the
 								[config lifecycle](/usage/training#config-lifecycle) and
 								[initialization](/usage/training#initialization) for details.
-												Update docs [ci skip]

											
										
										
											2020-09-29 17:59:21 +03:00
 								`get_examples` should be a function that returns an iterable of
 								[`Example`](/api/example) objects. The data examples can either be the full
 								training data or a representative sample. They are used to **initialize the
 								models** of trainable pipeline components and are passed each component's
 								[`initialize`](/api/pipe#initialize) method, if available. Initialization
 								includes validating the network,
-												fix link to shape inference section

											
										
										
											2020-09-10 11:22:59 +03:00
+								[inferring missing shapes](/usage/layers-architectures#thinc-shape-inference)
 								and setting up the label scheme based on the data.
-												Update docs [ci skip]

											
										
										
											2020-08-11 21:57:23 +03:00
-												begin_training -> initialize

											
										
										
											2020-09-28 22:35:09 +03:00
+								If no `get_examples` function is provided when calling `nlp.initialize`, the
-												document Pipe API details, crossreferences etc

											
										
										
											2020-09-09 16:56:27 +03:00
+								pipeline components will be initialized with generic data. In this case, it is
 								crucial that the output dimension of each component has already been defined
 								either in the [config](/usage/training#config), or by calling
 								[`pipe.add_label`](/api/pipe#add_label) for each possible output label (e.g. for
 								the tagger or textcat).
-												begin_training -> initialize

											
										
										
											2020-09-28 22:35:09 +03:00
+								<Infobox variant="warning" title="Changed in v3.0" id="begin_training">
-												Update docs [ci skip]

											
										
										
											2020-08-11 21:57:23 +03:00
-												begin_training -> initialize

											
										
										
											2020-09-28 22:35:09 +03:00
+								This method was previously called `begin_training`. It now also takes a
 								**function** that is called with no arguments and returns a sequence of
 								[`Example`](/api/example) objects instead of tuples of `Doc` and `GoldParse`
 								objects.
-												Update docs [ci skip]

											
										
										
											2020-08-11 21:57:23 +03:00
 								</Infobox>
-												💫 Update website (#3285)

<!--- Provide a general summary of your changes in the title. -->

## Description

The new website is implemented using [Gatsby](https://www.gatsbyjs.org) with [Remark](https://github.com/remarkjs/remark) and [MDX](https://mdxjs.com/). This allows authoring content in **straightforward Markdown** without the usual limitations. Standard elements can be overwritten with powerful [React](http://reactjs.org/) components and wherever Markdown syntax isn't enough, JSX components can be used. Hopefully, this update will also make it much easier to contribute to the docs. Once this PR is merged, I'll implement auto-deployment via [Netlify](https://netlify.com) on a specific branch (to avoid building the website on every PR). There's a bunch of other cool stuff that the new setup will allow us to do – including writing front-end tests, service workers, offline support, implementing a search and so on.

This PR also includes various new docs pages and content.
Resolves #3270. Resolves #3222. Resolves #2947. Resolves #2837.


### Types of change
enhancement

## Checklist
<!--- Before you submit the PR, go over this checklist and make sure you can
tick off all the boxes. [] -> [x] -->
- [x] I have submitted the spaCy Contributor Agreement.
- [x] I ran the tests, and all new and existing tests passed.
- [x] My changes don't require a change to the documentation, or if they do, I've added all required information.

											
										
										
											2019-02-17 21:31:19 +03:00
 								> #### Example
 								>
 								> ```python
-												Update docs [ci skip]

											
										
										
											2020-08-11 21:57:23 +03:00
+								> get_examples = lambda: examples
-												begin_training -> initialize

											
										
										
											2020-09-28 22:35:09 +03:00
+								> optimizer = nlp.initialize(get_examples)
-												💫 Update website (#3285)

<!--- Provide a general summary of your changes in the title. -->

## Description

The new website is implemented using [Gatsby](https://www.gatsbyjs.org) with [Remark](https://github.com/remarkjs/remark) and [MDX](https://mdxjs.com/). This allows authoring content in **straightforward Markdown** without the usual limitations. Standard elements can be overwritten with powerful [React](http://reactjs.org/) components and wherever Markdown syntax isn't enough, JSX components can be used. Hopefully, this update will also make it much easier to contribute to the docs. Once this PR is merged, I'll implement auto-deployment via [Netlify](https://netlify.com) on a specific branch (to avoid building the website on every PR). There's a bunch of other cool stuff that the new setup will allow us to do – including writing front-end tests, service workers, offline support, implementing a search and so on.

This PR also includes various new docs pages and content.
Resolves #3270. Resolves #3222. Resolves #2947. Resolves #2837.


### Types of change
enhancement

## Checklist
<!--- Before you submit the PR, go over this checklist and make sure you can
tick off all the boxes. [] -> [x] -->
- [x] I have submitted the spaCy Contributor Agreement.
- [x] I ran the tests, and all new and existing tests passed.
- [x] My changes don't require a change to the documentation, or if they do, I've added all required information.

											
										
										
											2019-02-17 21:31:19 +03:00
+								> ```
-												Update docs, types and API consistency

											
										
										
											2020-08-17 17:45:24 +03:00
+								| Name           | Description                                                                                                                                              |
 								| -------------- | -------------------------------------------------------------------------------------------------------------------------------------------------------- |
 								| `get_examples` | Optional function that returns gold-standard annotations in the form of [`Example`](/api/example) objects. ~~Optional[Callable[[], Iterable[Example]]]~~ |
 								| _keyword-only_ |                                                                                                                                                          |
 								| `sgd`          | An optimizer. Will be created via [`create_optimizer`](#create_optimizer) if not set. ~~Optional[Optimizer]~~                                            |
 								| **RETURNS**    | The optimizer. ~~Optimizer~~                                                                                                                             |
-												💫 Update website (#3285)

<!--- Provide a general summary of your changes in the title. -->

## Description

The new website is implemented using [Gatsby](https://www.gatsbyjs.org) with [Remark](https://github.com/remarkjs/remark) and [MDX](https://mdxjs.com/). This allows authoring content in **straightforward Markdown** without the usual limitations. Standard elements can be overwritten with powerful [React](http://reactjs.org/) components and wherever Markdown syntax isn't enough, JSX components can be used. Hopefully, this update will also make it much easier to contribute to the docs. Once this PR is merged, I'll implement auto-deployment via [Netlify](https://netlify.com) on a specific branch (to avoid building the website on every PR). There's a bunch of other cool stuff that the new setup will allow us to do – including writing front-end tests, service workers, offline support, implementing a search and so on.

This PR also includes various new docs pages and content.
Resolves #3270. Resolves #3222. Resolves #2947. Resolves #2837.


### Types of change
enhancement

## Checklist
<!--- Before you submit the PR, go over this checklist and make sure you can
tick off all the boxes. [] -> [x] -->
- [x] I have submitted the spaCy Contributor Agreement.
- [x] I ran the tests, and all new and existing tests passed.
- [x] My changes don't require a change to the documentation, or if they do, I've added all required information.

											
										
										
											2019-02-17 21:31:19 +03:00
-												Update docstrings, docs and types

											
										
										
											2020-07-29 12:36:42 +03:00
+								## Language.resume_training {#resume_training tag="method,experimental" new="3"}
-												💫 Update website (#3285)

<!--- Provide a general summary of your changes in the title. -->

## Description

The new website is implemented using [Gatsby](https://www.gatsbyjs.org) with [Remark](https://github.com/remarkjs/remark) and [MDX](https://mdxjs.com/). This allows authoring content in **straightforward Markdown** without the usual limitations. Standard elements can be overwritten with powerful [React](http://reactjs.org/) components and wherever Markdown syntax isn't enough, JSX components can be used. Hopefully, this update will also make it much easier to contribute to the docs. Once this PR is merged, I'll implement auto-deployment via [Netlify](https://netlify.com) on a specific branch (to avoid building the website on every PR). There's a bunch of other cool stuff that the new setup will allow us to do – including writing front-end tests, service workers, offline support, implementing a search and so on.

This PR also includes various new docs pages and content.
Resolves #3270. Resolves #3222. Resolves #2947. Resolves #2837.


### Types of change
enhancement

## Checklist
<!--- Before you submit the PR, go over this checklist and make sure you can
tick off all the boxes. [] -> [x] -->
- [x] I have submitted the spaCy Contributor Agreement.
- [x] I ran the tests, and all new and existing tests passed.
- [x] My changes don't require a change to the documentation, or if they do, I've added all required information.

											
										
										
											2019-02-17 21:31:19 +03:00
-												"model" terminology consistency in docs

											
										
										
											2020-09-03 14:13:03 +03:00
+								Continue training a trained pipeline. Create and return an optimizer, and
-												Update docstrings, docs and types

											
										
										
											2020-07-29 12:36:42 +03:00
+								initialize "rehearsal" for any pipeline component that has a `rehearse` method.
 								Rehearsal is used to prevent models from "forgetting" their initialized
 								"knowledge". To perform rehearsal, collect samples of text you want the models
 								to retain performance on, and call [`nlp.rehearse`](/api/language#rehearse) with
 								a batch of [Example](/api/example) objects.
-												💫 Update website (#3285)

<!--- Provide a general summary of your changes in the title. -->

## Description

The new website is implemented using [Gatsby](https://www.gatsbyjs.org) with [Remark](https://github.com/remarkjs/remark) and [MDX](https://mdxjs.com/). This allows authoring content in **straightforward Markdown** without the usual limitations. Standard elements can be overwritten with powerful [React](http://reactjs.org/) components and wherever Markdown syntax isn't enough, JSX components can be used. Hopefully, this update will also make it much easier to contribute to the docs. Once this PR is merged, I'll implement auto-deployment via [Netlify](https://netlify.com) on a specific branch (to avoid building the website on every PR). There's a bunch of other cool stuff that the new setup will allow us to do – including writing front-end tests, service workers, offline support, implementing a search and so on.

This PR also includes various new docs pages and content.
Resolves #3270. Resolves #3222. Resolves #2947. Resolves #2837.


### Types of change
enhancement

## Checklist
<!--- Before you submit the PR, go over this checklist and make sure you can
tick off all the boxes. [] -> [x] -->
- [x] I have submitted the spaCy Contributor Agreement.
- [x] I ran the tests, and all new and existing tests passed.
- [x] My changes don't require a change to the documentation, or if they do, I've added all required information.

											
										
										
											2019-02-17 21:31:19 +03:00
 								> #### Example
 								>
 								> ```python
-												Update docstrings, docs and types

											
										
										
											2020-07-29 12:36:42 +03:00
+								> optimizer = nlp.resume_training()
 								> nlp.rehearse(examples, sgd=optimizer)
-												💫 Update website (#3285)

<!--- Provide a general summary of your changes in the title. -->

## Description

The new website is implemented using [Gatsby](https://www.gatsbyjs.org) with [Remark](https://github.com/remarkjs/remark) and [MDX](https://mdxjs.com/). This allows authoring content in **straightforward Markdown** without the usual limitations. Standard elements can be overwritten with powerful [React](http://reactjs.org/) components and wherever Markdown syntax isn't enough, JSX components can be used. Hopefully, this update will also make it much easier to contribute to the docs. Once this PR is merged, I'll implement auto-deployment via [Netlify](https://netlify.com) on a specific branch (to avoid building the website on every PR). There's a bunch of other cool stuff that the new setup will allow us to do – including writing front-end tests, service workers, offline support, implementing a search and so on.

This PR also includes various new docs pages and content.
Resolves #3270. Resolves #3222. Resolves #2947. Resolves #2837.


### Types of change
enhancement

## Checklist
<!--- Before you submit the PR, go over this checklist and make sure you can
tick off all the boxes. [] -> [x] -->
- [x] I have submitted the spaCy Contributor Agreement.
- [x] I ran the tests, and all new and existing tests passed.
- [x] My changes don't require a change to the documentation, or if they do, I've added all required information.

											
										
										
											2019-02-17 21:31:19 +03:00
+								> ```
-												Update docs, types and API consistency

											
										
										
											2020-08-17 17:45:24 +03:00
+								| Name           | Description                                                                                                   |
 								| -------------- | ------------------------------------------------------------------------------------------------------------- |
 								| _keyword-only_ |                                                                                                               |
 								| `sgd`          | An optimizer. Will be created via [`create_optimizer`](#create_optimizer) if not set. ~~Optional[Optimizer]~~ |
 								| **RETURNS**    | The optimizer. ~~Optimizer~~                                                                                  |
-												💫 Update website (#3285)

<!--- Provide a general summary of your changes in the title. -->

## Description

The new website is implemented using [Gatsby](https://www.gatsbyjs.org) with [Remark](https://github.com/remarkjs/remark) and [MDX](https://mdxjs.com/). This allows authoring content in **straightforward Markdown** without the usual limitations. Standard elements can be overwritten with powerful [React](http://reactjs.org/) components and wherever Markdown syntax isn't enough, JSX components can be used. Hopefully, this update will also make it much easier to contribute to the docs. Once this PR is merged, I'll implement auto-deployment via [Netlify](https://netlify.com) on a specific branch (to avoid building the website on every PR). There's a bunch of other cool stuff that the new setup will allow us to do – including writing front-end tests, service workers, offline support, implementing a search and so on.

This PR also includes various new docs pages and content.
Resolves #3270. Resolves #3222. Resolves #2947. Resolves #2837.


### Types of change
enhancement

## Checklist
<!--- Before you submit the PR, go over this checklist and make sure you can
tick off all the boxes. [] -> [x] -->
- [x] I have submitted the spaCy Contributor Agreement.
- [x] I ran the tests, and all new and existing tests passed.
- [x] My changes don't require a change to the documentation, or if they do, I've added all required information.

											
										
										
											2019-02-17 21:31:19 +03:00
 								## Language.update {#update tag="method"}
 								Update the models in the pipeline.
-												Update docs [ci skip]

											
										
										
											2020-08-10 01:01:38 +03:00
+								<Infobox variant="warning" title="Changed in v3.0">
 								The `Language.update` method now takes a batch of [`Example`](/api/example)
 								objects instead of the raw texts and annotations or `Doc` and `GoldParse`
 								objects. An [`Example`](/api/example) streamlines how data is passed around. It
 								stores two `Doc` objects: one for holding the gold-standard reference data, and
 								one for holding the predictions of the pipeline.
 								For most use cases, you shouldn't have to write your own training scripts
 								anymore. Instead, you can use [`spacy train`](/api/cli#train) with a config file
 								and custom registered functions if needed. See the
 								[training documentation](/usage/training) for details.
 								</Infobox>
-												💫 Update website (#3285)

<!--- Provide a general summary of your changes in the title. -->

## Description

The new website is implemented using [Gatsby](https://www.gatsbyjs.org) with [Remark](https://github.com/remarkjs/remark) and [MDX](https://mdxjs.com/). This allows authoring content in **straightforward Markdown** without the usual limitations. Standard elements can be overwritten with powerful [React](http://reactjs.org/) components and wherever Markdown syntax isn't enough, JSX components can be used. Hopefully, this update will also make it much easier to contribute to the docs. Once this PR is merged, I'll implement auto-deployment via [Netlify](https://netlify.com) on a specific branch (to avoid building the website on every PR). There's a bunch of other cool stuff that the new setup will allow us to do – including writing front-end tests, service workers, offline support, implementing a search and so on.

This PR also includes various new docs pages and content.
Resolves #3270. Resolves #3222. Resolves #2947. Resolves #2837.


### Types of change
enhancement

## Checklist
<!--- Before you submit the PR, go over this checklist and make sure you can
tick off all the boxes. [] -> [x] -->
- [x] I have submitted the spaCy Contributor Agreement.
- [x] I ran the tests, and all new and existing tests passed.
- [x] My changes don't require a change to the documentation, or if they do, I've added all required information.

											
										
										
											2019-02-17 21:31:19 +03:00
+								> #### Example
 								>
 								> ```python
 								> for raw_text, entity_offsets in train_data:
 								>     doc = nlp.make_doc(raw_text)
-												fix component constructors, update, begin_training, reference to GoldParse

											
										
										
											2020-07-07 20:17:19 +03:00
+								>     example = Example.from_dict(doc, {"entities": entity_offsets})
 								>     nlp.update([example], sgd=optimizer)
-												💫 Update website (#3285)

<!--- Provide a general summary of your changes in the title. -->

## Description

The new website is implemented using [Gatsby](https://www.gatsbyjs.org) with [Remark](https://github.com/remarkjs/remark) and [MDX](https://mdxjs.com/). This allows authoring content in **straightforward Markdown** without the usual limitations. Standard elements can be overwritten with powerful [React](http://reactjs.org/) components and wherever Markdown syntax isn't enough, JSX components can be used. Hopefully, this update will also make it much easier to contribute to the docs. Once this PR is merged, I'll implement auto-deployment via [Netlify](https://netlify.com) on a specific branch (to avoid building the website on every PR). There's a bunch of other cool stuff that the new setup will allow us to do – including writing front-end tests, service workers, offline support, implementing a search and so on.

This PR also includes various new docs pages and content.
Resolves #3270. Resolves #3222. Resolves #2947. Resolves #2837.


### Types of change
enhancement

## Checklist
<!--- Before you submit the PR, go over this checklist and make sure you can
tick off all the boxes. [] -> [x] -->
- [x] I have submitted the spaCy Contributor Agreement.
- [x] I ran the tests, and all new and existing tests passed.
- [x] My changes don't require a change to the documentation, or if they do, I've added all required information.

											
										
										
											2019-02-17 21:31:19 +03:00
+								> ```
-												Update docs, types and API consistency

											
										
										
											2020-08-17 17:45:24 +03:00
+								| Name            | Description                                                                                                                                    |
 								| --------------- | ---------------------------------------------------------------------------------------------------------------------------------------------- |
 								| `examples`      | A batch of [`Example`](/api/example) objects to learn from. ~~Iterable[Example]~~                                                              |
 								| _keyword-only_  |                                                                                                                                                |
 								| `drop`          | The dropout rate. ~~float~~                                                                                                                    |
 								| `sgd`           | An optimizer. Will be created via [`create_optimizer`](#create_optimizer) if not set. ~~Optional[Optimizer]~~                                  |
 								| `losses`        | Dictionary to update with the loss, keyed by pipeline component. ~~Optional[Dict[str, float]]~~                                                |
 								| `component_cfg` | Optional dictionary of keyword arguments for components, keyed by component names. Defaults to `None`. ~~Optional[Dict[str, Dict[str, Any]]]~~ |
 								| **RETURNS**     | The updated `losses` dictionary. ~~Dict[str, float]~~                                                                                          |
-												💫 Update website (#3285)

<!--- Provide a general summary of your changes in the title. -->

## Description

The new website is implemented using [Gatsby](https://www.gatsbyjs.org) with [Remark](https://github.com/remarkjs/remark) and [MDX](https://mdxjs.com/). This allows authoring content in **straightforward Markdown** without the usual limitations. Standard elements can be overwritten with powerful [React](http://reactjs.org/) components and wherever Markdown syntax isn't enough, JSX components can be used. Hopefully, this update will also make it much easier to contribute to the docs. Once this PR is merged, I'll implement auto-deployment via [Netlify](https://netlify.com) on a specific branch (to avoid building the website on every PR). There's a bunch of other cool stuff that the new setup will allow us to do – including writing front-end tests, service workers, offline support, implementing a search and so on.

This PR also includes various new docs pages and content.
Resolves #3270. Resolves #3222. Resolves #2947. Resolves #2837.


### Types of change
enhancement

## Checklist
<!--- Before you submit the PR, go over this checklist and make sure you can
tick off all the boxes. [] -> [x] -->
- [x] I have submitted the spaCy Contributor Agreement.
- [x] I ran the tests, and all new and existing tests passed.
- [x] My changes don't require a change to the documentation, or if they do, I've added all required information.

											
										
										
											2019-02-17 21:31:19 +03:00
-												Update docs [ci skip]

											
										
										
											2020-08-11 21:57:23 +03:00
+								## Language.rehearse {#rehearse tag="method,experimental" new="3"}
-												Document Language.evaluate

											
										
										
											2019-05-24 15:06:36 +03:00
-												Update docstrings, docs and types

											
										
										
											2020-07-29 12:36:42 +03:00
+								Perform a "rehearsal" update from a batch of data. Rehearsal updates teach the
 								current model to make predictions similar to an initial model, to try to address
 								the "catastrophic forgetting" problem. This feature is experimental.
-												Document Language.evaluate

											
										
										
											2019-05-24 15:06:36 +03:00
 								> #### Example
 								>
 								> ```python
-												Update docstrings, docs and types

											
										
										
											2020-07-29 12:36:42 +03:00
+								> optimizer = nlp.resume_training()
 								> losses = nlp.rehearse(examples, sgd=optimizer)
-												Document Language.evaluate

											
										
										
											2019-05-24 15:06:36 +03:00
+								> ```
-												Update docs, types and API consistency

											
										
										
											2020-08-17 17:45:24 +03:00
+								| Name           | Description                                                                                                   |
 								| -------------- | ------------------------------------------------------------------------------------------------------------- |
 								| `examples`     | A batch of [`Example`](/api/example) objects to learn from. ~~Iterable[Example]~~                             |
 								| _keyword-only_ |                                                                                                               |
 								| `drop`         | The dropout rate. ~~float~~                                                                                   |
 								| `sgd`          | An optimizer. Will be created via [`create_optimizer`](#create_optimizer) if not set. ~~Optional[Optimizer]~~ |
 								| `losses`       | Dictionary to update with the loss, keyed by pipeline component. ~~Optional[Dict[str, float]]~~               |
 								| **RETURNS**    | The updated `losses` dictionary. ~~Dict[str, float]~~                                                         |
-												Document Language.evaluate

											
										
										
											2019-05-24 15:06:36 +03:00
-												Update docstrings, docs and types

											
										
										
											2020-07-29 12:36:42 +03:00
+								## Language.evaluate {#evaluate tag="method"}
-												💫 Update website (#3285)

<!--- Provide a general summary of your changes in the title. -->

## Description

The new website is implemented using [Gatsby](https://www.gatsbyjs.org) with [Remark](https://github.com/remarkjs/remark) and [MDX](https://mdxjs.com/). This allows authoring content in **straightforward Markdown** without the usual limitations. Standard elements can be overwritten with powerful [React](http://reactjs.org/) components and wherever Markdown syntax isn't enough, JSX components can be used. Hopefully, this update will also make it much easier to contribute to the docs. Once this PR is merged, I'll implement auto-deployment via [Netlify](https://netlify.com) on a specific branch (to avoid building the website on every PR). There's a bunch of other cool stuff that the new setup will allow us to do – including writing front-end tests, service workers, offline support, implementing a search and so on.

This PR also includes various new docs pages and content.
Resolves #3270. Resolves #3222. Resolves #2947. Resolves #2837.


### Types of change
enhancement

## Checklist
<!--- Before you submit the PR, go over this checklist and make sure you can
tick off all the boxes. [] -> [x] -->
- [x] I have submitted the spaCy Contributor Agreement.
- [x] I ran the tests, and all new and existing tests passed.
- [x] My changes don't require a change to the documentation, or if they do, I've added all required information.

											
										
										
											2019-02-17 21:31:19 +03:00
-												"model" terminology consistency in docs

											
										
										
											2020-09-03 14:13:03 +03:00
+								Evaluate a pipeline's components.
-												💫 Update website (#3285)

<!--- Provide a general summary of your changes in the title. -->

## Description

The new website is implemented using [Gatsby](https://www.gatsbyjs.org) with [Remark](https://github.com/remarkjs/remark) and [MDX](https://mdxjs.com/). This allows authoring content in **straightforward Markdown** without the usual limitations. Standard elements can be overwritten with powerful [React](http://reactjs.org/) components and wherever Markdown syntax isn't enough, JSX components can be used. Hopefully, this update will also make it much easier to contribute to the docs. Once this PR is merged, I'll implement auto-deployment via [Netlify](https://netlify.com) on a specific branch (to avoid building the website on every PR). There's a bunch of other cool stuff that the new setup will allow us to do – including writing front-end tests, service workers, offline support, implementing a search and so on.

This PR also includes various new docs pages and content.
Resolves #3270. Resolves #3222. Resolves #2947. Resolves #2837.


### Types of change
enhancement

## Checklist
<!--- Before you submit the PR, go over this checklist and make sure you can
tick off all the boxes. [] -> [x] -->
- [x] I have submitted the spaCy Contributor Agreement.
- [x] I ran the tests, and all new and existing tests passed.
- [x] My changes don't require a change to the documentation, or if they do, I've added all required information.

											
										
										
											2019-02-17 21:31:19 +03:00
-												Update docs [ci skip]

											
										
										
											2020-08-11 21:57:23 +03:00
+								<Infobox variant="warning" title="Changed in v3.0">
 								The `Language.update` method now takes a batch of [`Example`](/api/example)
 								objects instead of tuples of `Doc` and `GoldParse` objects.
 								</Infobox>
-												💫 Update website (#3285)

<!--- Provide a general summary of your changes in the title. -->

## Description

The new website is implemented using [Gatsby](https://www.gatsbyjs.org) with [Remark](https://github.com/remarkjs/remark) and [MDX](https://mdxjs.com/). This allows authoring content in **straightforward Markdown** without the usual limitations. Standard elements can be overwritten with powerful [React](http://reactjs.org/) components and wherever Markdown syntax isn't enough, JSX components can be used. Hopefully, this update will also make it much easier to contribute to the docs. Once this PR is merged, I'll implement auto-deployment via [Netlify](https://netlify.com) on a specific branch (to avoid building the website on every PR). There's a bunch of other cool stuff that the new setup will allow us to do – including writing front-end tests, service workers, offline support, implementing a search and so on.

This PR also includes various new docs pages and content.
Resolves #3270. Resolves #3222. Resolves #2947. Resolves #2837.


### Types of change
enhancement

## Checklist
<!--- Before you submit the PR, go over this checklist and make sure you can
tick off all the boxes. [] -> [x] -->
- [x] I have submitted the spaCy Contributor Agreement.
- [x] I ran the tests, and all new and existing tests passed.
- [x] My changes don't require a change to the documentation, or if they do, I've added all required information.

											
										
										
											2019-02-17 21:31:19 +03:00
+								> #### Example
 								>
 								> ```python
-												Update docs, types and API consistency

											
										
										
											2020-08-17 17:45:24 +03:00
+								> scores = nlp.evaluate(examples)
-												Update docstrings, docs and types

											
										
										
											2020-07-29 12:36:42 +03:00
+								> print(scores)
-												💫 Update website (#3285)

<!--- Provide a general summary of your changes in the title. -->

## Description

The new website is implemented using [Gatsby](https://www.gatsbyjs.org) with [Remark](https://github.com/remarkjs/remark) and [MDX](https://mdxjs.com/). This allows authoring content in **straightforward Markdown** without the usual limitations. Standard elements can be overwritten with powerful [React](http://reactjs.org/) components and wherever Markdown syntax isn't enough, JSX components can be used. Hopefully, this update will also make it much easier to contribute to the docs. Once this PR is merged, I'll implement auto-deployment via [Netlify](https://netlify.com) on a specific branch (to avoid building the website on every PR). There's a bunch of other cool stuff that the new setup will allow us to do – including writing front-end tests, service workers, offline support, implementing a search and so on.

This PR also includes various new docs pages and content.
Resolves #3270. Resolves #3222. Resolves #2947. Resolves #2837.


### Types of change
enhancement

## Checklist
<!--- Before you submit the PR, go over this checklist and make sure you can
tick off all the boxes. [] -> [x] -->
- [x] I have submitted the spaCy Contributor Agreement.
- [x] I ran the tests, and all new and existing tests passed.
- [x] My changes don't require a change to the documentation, or if they do, I've added all required information.

											
										
										
											2019-02-17 21:31:19 +03:00
+								> ```
-												Update docs, types and API consistency

											
										
										
											2020-08-17 17:45:24 +03:00
+								| Name            | Description                                                                                                                                    |
 								| --------------- | ---------------------------------------------------------------------------------------------------------------------------------------------- |
 								| `examples`      | A batch of [`Example`](/api/example) objects to learn from. ~~Iterable[Example]~~                                                              |
 								| _keyword-only_  |                                                                                                                                                |
-												Add nlp.batch_size setting

Add a default `batch_size` setting for `Language.pipe` and
`Language.evaluate` as `nlp.batch_size`.

											
										
										
											2020-12-09 11:13:26 +03:00
+								| `batch_size`    | The batch size to use. ~~Optional[int]~~                                                                                                       |
-												Update docs, types and API consistency

											
										
										
											2020-08-17 17:45:24 +03:00
+								| `scorer`        | Optional [`Scorer`](/api/scorer) to use. If not passed in, a new one will be created. ~~Optional[Scorer]~~                                     |
 								| `component_cfg` | Optional dictionary of keyword arguments for components, keyed by component names. Defaults to `None`. ~~Optional[Dict[str, Dict[str, Any]]]~~ |
 								| `scorer_cfg`    | Optional dictionary of keyword arguments for the `Scorer`. Defaults to `None`. ~~Optional[Dict[str, Any]]~~                                    |
 								| **RETURNS**     | A dictionary of evaluation scores. ~~Dict[str, Union[float, Dict[str, float]]]~~                                                               |
-												💫 Update website (#3285)

<!--- Provide a general summary of your changes in the title. -->

## Description

The new website is implemented using [Gatsby](https://www.gatsbyjs.org) with [Remark](https://github.com/remarkjs/remark) and [MDX](https://mdxjs.com/). This allows authoring content in **straightforward Markdown** without the usual limitations. Standard elements can be overwritten with powerful [React](http://reactjs.org/) components and wherever Markdown syntax isn't enough, JSX components can be used. Hopefully, this update will also make it much easier to contribute to the docs. Once this PR is merged, I'll implement auto-deployment via [Netlify](https://netlify.com) on a specific branch (to avoid building the website on every PR). There's a bunch of other cool stuff that the new setup will allow us to do – including writing front-end tests, service workers, offline support, implementing a search and so on.

This PR also includes various new docs pages and content.
Resolves #3270. Resolves #3222. Resolves #2947. Resolves #2837.


### Types of change
enhancement

## Checklist
<!--- Before you submit the PR, go over this checklist and make sure you can
tick off all the boxes. [] -> [x] -->
- [x] I have submitted the spaCy Contributor Agreement.
- [x] I ran the tests, and all new and existing tests passed.
- [x] My changes don't require a change to the documentation, or if they do, I've added all required information.

											
										
										
											2019-02-17 21:31:19 +03:00
 								## Language.use_params {#use_params tag="contextmanager, method"}
 								Replace weights of models in the pipeline with those provided in the params
 								dictionary. Can be used as a context manager, in which case, models go back to
 								their original weights after the block.
 								> #### Example
 								>
 								> ```python
 								> with nlp.use_params(optimizer.averages):
 								>     nlp.to_disk("/tmp/checkpoint")
 								> ```
-												Update docs, types and API consistency

											
										
										
											2020-08-17 17:45:24 +03:00
+								| Name     | Description                                            |
 								| -------- | ------------------------------------------------------ |
 								| `params` | A dictionary of parameters keyed by model ID. ~~dict~~ |
-												💫 Update website (#3285)

<!--- Provide a general summary of your changes in the title. -->

## Description

The new website is implemented using [Gatsby](https://www.gatsbyjs.org) with [Remark](https://github.com/remarkjs/remark) and [MDX](https://mdxjs.com/). This allows authoring content in **straightforward Markdown** without the usual limitations. Standard elements can be overwritten with powerful [React](http://reactjs.org/) components and wherever Markdown syntax isn't enough, JSX components can be used. Hopefully, this update will also make it much easier to contribute to the docs. Once this PR is merged, I'll implement auto-deployment via [Netlify](https://netlify.com) on a specific branch (to avoid building the website on every PR). There's a bunch of other cool stuff that the new setup will allow us to do – including writing front-end tests, service workers, offline support, implementing a search and so on.

This PR also includes various new docs pages and content.
Resolves #3270. Resolves #3222. Resolves #2947. Resolves #2837.


### Types of change
enhancement

## Checklist
<!--- Before you submit the PR, go over this checklist and make sure you can
tick off all the boxes. [] -> [x] -->
- [x] I have submitted the spaCy Contributor Agreement.
- [x] I ran the tests, and all new and existing tests passed.
- [x] My changes don't require a change to the documentation, or if they do, I've added all required information.

											
										
										
											2019-02-17 21:31:19 +03:00
 								## Language.add_pipe {#add_pipe tag="method" new="2"}
-												Update docs [ci skip]

											
										
										
											2020-07-27 01:29:45 +03:00
+								Add a component to the processing pipeline. Expects a name that maps to a
 								component factory registered using
 								[`@Language.component`](/api/language#component) or
 								[`@Language.factory`](/api/language#factory). Components should be callables
 								that take a `Doc` object, modify it and return it. Only one of `before`,
 								`after`, `first` or `last` can be set. Default behavior is `last=True`.
 								<Infobox title="Changed in v3.0" variant="warning">
 								As of v3.0, the [`Language.add_pipe`](/api/language#add_pipe) method doesn't
-												Update docs

											
										
										
											2020-08-05 16:00:54 +03:00
+								take callables anymore and instead expects the **name of a component factory**
-												Update docs [ci skip]

											
										
										
											2020-07-27 01:29:45 +03:00
+								registered using [`@Language.component`](/api/language#component) or
 								[`@Language.factory`](/api/language#factory). It now takes care of creating the
 								component, adds it to the pipeline and returns it.
 								</Infobox>
-												💫 Update website (#3285)

<!--- Provide a general summary of your changes in the title. -->

## Description

The new website is implemented using [Gatsby](https://www.gatsbyjs.org) with [Remark](https://github.com/remarkjs/remark) and [MDX](https://mdxjs.com/). This allows authoring content in **straightforward Markdown** without the usual limitations. Standard elements can be overwritten with powerful [React](http://reactjs.org/) components and wherever Markdown syntax isn't enough, JSX components can be used. Hopefully, this update will also make it much easier to contribute to the docs. Once this PR is merged, I'll implement auto-deployment via [Netlify](https://netlify.com) on a specific branch (to avoid building the website on every PR). There's a bunch of other cool stuff that the new setup will allow us to do – including writing front-end tests, service workers, offline support, implementing a search and so on.

This PR also includes various new docs pages and content.
Resolves #3270. Resolves #3222. Resolves #2947. Resolves #2837.


### Types of change
enhancement

## Checklist
<!--- Before you submit the PR, go over this checklist and make sure you can
tick off all the boxes. [] -> [x] -->
- [x] I have submitted the spaCy Contributor Agreement.
- [x] I ran the tests, and all new and existing tests passed.
- [x] My changes don't require a change to the documentation, or if they do, I've added all required information.

											
										
										
											2019-02-17 21:31:19 +03:00
 								> #### Example
 								>
 								> ```python
-												Update docs [ci skip]

											
										
										
											2020-07-27 01:29:45 +03:00
+								> @Language.component("component")
 								> def component_func(doc):
-												💫 Update website (#3285)

<!--- Provide a general summary of your changes in the title. -->

## Description

The new website is implemented using [Gatsby](https://www.gatsbyjs.org) with [Remark](https://github.com/remarkjs/remark) and [MDX](https://mdxjs.com/). This allows authoring content in **straightforward Markdown** without the usual limitations. Standard elements can be overwritten with powerful [React](http://reactjs.org/) components and wherever Markdown syntax isn't enough, JSX components can be used. Hopefully, this update will also make it much easier to contribute to the docs. Once this PR is merged, I'll implement auto-deployment via [Netlify](https://netlify.com) on a specific branch (to avoid building the website on every PR). There's a bunch of other cool stuff that the new setup will allow us to do – including writing front-end tests, service workers, offline support, implementing a search and so on.

This PR also includes various new docs pages and content.
Resolves #3270. Resolves #3222. Resolves #2947. Resolves #2837.


### Types of change
enhancement

## Checklist
<!--- Before you submit the PR, go over this checklist and make sure you can
tick off all the boxes. [] -> [x] -->
- [x] I have submitted the spaCy Contributor Agreement.
- [x] I ran the tests, and all new and existing tests passed.
- [x] My changes don't require a change to the documentation, or if they do, I've added all required information.

											
										
										
											2019-02-17 21:31:19 +03:00
+								>     # modify Doc and return it return doc
 								>
-												Update docs [ci skip]

											
										
										
											2020-07-27 01:29:45 +03:00
+								> nlp.add_pipe("component", before="ner")
 								> component = nlp.add_pipe("component", name="custom_name", last=True)
-												Update docs

											
										
										
											2020-08-05 16:00:54 +03:00
+								>
-												"model" terminology consistency in docs

											
										
										
											2020-09-03 14:13:03 +03:00
+								> # Add component from source pipeline
-												Update docs

											
										
										
											2020-08-05 16:00:54 +03:00
+								> source_nlp = spacy.load("en_core_web_sm")
 								> nlp.add_pipe("ner", source=source_nlp)
-												💫 Update website (#3285)

<!--- Provide a general summary of your changes in the title. -->

## Description

The new website is implemented using [Gatsby](https://www.gatsbyjs.org) with [Remark](https://github.com/remarkjs/remark) and [MDX](https://mdxjs.com/). This allows authoring content in **straightforward Markdown** without the usual limitations. Standard elements can be overwritten with powerful [React](http://reactjs.org/) components and wherever Markdown syntax isn't enough, JSX components can be used. Hopefully, this update will also make it much easier to contribute to the docs. Once this PR is merged, I'll implement auto-deployment via [Netlify](https://netlify.com) on a specific branch (to avoid building the website on every PR). There's a bunch of other cool stuff that the new setup will allow us to do – including writing front-end tests, service workers, offline support, implementing a search and so on.

This PR also includes various new docs pages and content.
Resolves #3270. Resolves #3222. Resolves #2947. Resolves #2837.


### Types of change
enhancement

## Checklist
<!--- Before you submit the PR, go over this checklist and make sure you can
tick off all the boxes. [] -> [x] -->
- [x] I have submitted the spaCy Contributor Agreement.
- [x] I ran the tests, and all new and existing tests passed.
- [x] My changes don't require a change to the documentation, or if they do, I've added all required information.

											
										
										
											2019-02-17 21:31:19 +03:00
+								> ```
-												"model" terminology consistency in docs

											
										
										
											2020-09-03 14:13:03 +03:00
+								| Name                                  | Description                                                                                                                                                                                                                                                                              |
 								| ------------------------------------- | ---------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------- |
 								| `factory_name`                        | Name of the registered component factory. ~~str~~                                                                                                                                                                                                                                        |
 								| `name`                                | Optional unique name of pipeline component instance. If not set, the factory name is used. An error is raised if the name already exists in the pipeline. ~~Optional[str]~~                                                                                                              |
 								| _keyword-only_                        |                                                                                                                                                                                                                                                                                          |
 								| `before`                              | Component name or index to insert component directly before. ~~Optional[Union[str, int]]~~                                                                                                                                                                                               |
 								| `after`                               | Component name or index to insert component directly after. ~~Optional[Union[str, int]]~~                                                                                                                                                                                                |
 								| `first`                               | Insert component first / not first in the pipeline. ~~Optional[bool]~~                                                                                                                                                                                                                   |
 								| `last`                                | Insert component last / not last in the pipeline. ~~Optional[bool]~~                                                                                                                                                                                                                     |
 								| `config` <Tag variant="new">3</Tag>   | Optional config parameters to use for this component. Will be merged with the `default_config` specified by the component factory. ~~Optional[Dict[str, Any]]~~                                                                                                                          |
 								| `source` <Tag variant="new">3</Tag>   | Optional source pipeline to copy component from. If a source is provided, the `factory_name` is interpreted as the name of the component in the source pipeline. Make sure that the vocab, vectors and settings of the source pipeline match the target pipeline. ~~Optional[Language]~~ |
 								| `validate` <Tag variant="new">3</Tag> | Whether to validate the component config and arguments against the types expected by the factory. Defaults to `True`. ~~bool~~                                                                                                                                                           |
 								| **RETURNS**                           | The pipeline component. ~~Callable[[Doc], Doc]~~                                                                                                                                                                                                                                         |
-												Update docs [ci skip]

											
										
										
											2020-07-27 01:29:45 +03:00
-												Update docs

											
										
										
											2020-08-29 13:36:05 +03:00
+								## Language.create_pipe {#create_pipe tag="method" new="2"}
 								Create a pipeline component from a factory.
 								<Infobox title="Changed in v3.0" variant="warning">
 								As of v3.0, the [`Language.add_pipe`](/api/language#add_pipe) method also takes
 								the string name of the factory, creates the component, adds it to the pipeline
 								and returns it. The `Language.create_pipe` method is now mostly used internally.
 								To create a component and add it to the pipeline, you should always use
 								`Language.add_pipe`.
 								</Infobox>
 								> #### Example
 								>
 								> ```python
 								> parser = nlp.create_pipe("parser")
 								> ```
 								| Name                                  | Description                                                                                                                                                                 |
 								| ------------------------------------- | --------------------------------------------------------------------------------------------------------------------------------------------------------------------------- |
 								| `factory_name`                        | Name of the registered component factory. ~~str~~                                                                                                                           |
 								| `name`                                | Optional unique name of pipeline component instance. If not set, the factory name is used. An error is raised if the name already exists in the pipeline. ~~Optional[str]~~ |
 								| _keyword-only_                        |                                                                                                                                                                             |
 								| `config` <Tag variant="new">3</Tag>   | Optional config parameters to use for this component. Will be merged with the `default_config` specified by the component factory. ~~Optional[Dict[str, Any]]~~             |
 								| `validate` <Tag variant="new">3</Tag> | Whether to validate the component config and arguments against the types expected by the factory. Defaults to `True`. ~~bool~~                                              |
 								| **RETURNS**                           | The pipeline component. ~~Callable[[Doc], Doc]~~                                                                                                                            |
-												Update docs [ci skip]

											
										
										
											2020-07-27 01:29:45 +03:00
+								## Language.has_factory {#has_factory tag="classmethod" new="3"}
 								Check whether a factory name is registered on the `Language` class or subclass.
 								Will check for
 								[language-specific factories](/usage/processing-pipelines#factories-language)
 								registered on the subclass, as well as general-purpose factories registered on
 								the `Language` base class, available to all subclasses.
 								> #### Example
 								>
 								> ```python
 								> from spacy.language import Language
 								> from spacy.lang.en import English
 								>
 								> @English.component("component")
 								> def component(doc):
 								>     return doc
 								>
 								> assert English.has_factory("component")
 								> assert not Language.has_factory("component")
 								> ```
-												Update docs, types and API consistency

											
										
										
											2020-08-17 17:45:24 +03:00
+								| Name        | Description                                                         |
 								| ----------- | ------------------------------------------------------------------- |
 								| `name`      | Name of the pipeline factory to check. ~~str~~                      |
 								| **RETURNS** | Whether a factory of that name is registered on the class. ~~bool~~ |
-												💫 Update website (#3285)

<!--- Provide a general summary of your changes in the title. -->

## Description

The new website is implemented using [Gatsby](https://www.gatsbyjs.org) with [Remark](https://github.com/remarkjs/remark) and [MDX](https://mdxjs.com/). This allows authoring content in **straightforward Markdown** without the usual limitations. Standard elements can be overwritten with powerful [React](http://reactjs.org/) components and wherever Markdown syntax isn't enough, JSX components can be used. Hopefully, this update will also make it much easier to contribute to the docs. Once this PR is merged, I'll implement auto-deployment via [Netlify](https://netlify.com) on a specific branch (to avoid building the website on every PR). There's a bunch of other cool stuff that the new setup will allow us to do – including writing front-end tests, service workers, offline support, implementing a search and so on.

This PR also includes various new docs pages and content.
Resolves #3270. Resolves #3222. Resolves #2947. Resolves #2837.


### Types of change
enhancement

## Checklist
<!--- Before you submit the PR, go over this checklist and make sure you can
tick off all the boxes. [] -> [x] -->
- [x] I have submitted the spaCy Contributor Agreement.
- [x] I ran the tests, and all new and existing tests passed.
- [x] My changes don't require a change to the documentation, or if they do, I've added all required information.

											
										
										
											2019-02-17 21:31:19 +03:00
 								## Language.has_pipe {#has_pipe tag="method" new="2"}
 								Check whether a component is present in the pipeline. Equivalent to
 								`name in nlp.pipe_names`.
 								> #### Example
 								>
 								> ```python
-												Update docs [ci skip]

											
										
										
											2020-07-27 01:29:45 +03:00
+								> @Language.component("component")
 								> def component(doc):
 								>     return doc
 								>
 								> nlp.add_pipe("component", name="my_component")
 								> assert "my_component" in nlp.pipe_names
 								> assert nlp.has_pipe("my_component")
-												💫 Update website (#3285)

<!--- Provide a general summary of your changes in the title. -->

## Description

The new website is implemented using [Gatsby](https://www.gatsbyjs.org) with [Remark](https://github.com/remarkjs/remark) and [MDX](https://mdxjs.com/). This allows authoring content in **straightforward Markdown** without the usual limitations. Standard elements can be overwritten with powerful [React](http://reactjs.org/) components and wherever Markdown syntax isn't enough, JSX components can be used. Hopefully, this update will also make it much easier to contribute to the docs. Once this PR is merged, I'll implement auto-deployment via [Netlify](https://netlify.com) on a specific branch (to avoid building the website on every PR). There's a bunch of other cool stuff that the new setup will allow us to do – including writing front-end tests, service workers, offline support, implementing a search and so on.

This PR also includes various new docs pages and content.
Resolves #3270. Resolves #3222. Resolves #2947. Resolves #2837.


### Types of change
enhancement

## Checklist
<!--- Before you submit the PR, go over this checklist and make sure you can
tick off all the boxes. [] -> [x] -->
- [x] I have submitted the spaCy Contributor Agreement.
- [x] I ran the tests, and all new and existing tests passed.
- [x] My changes don't require a change to the documentation, or if they do, I've added all required information.

											
										
										
											2019-02-17 21:31:19 +03:00
+								> ```
-												Update docs, types and API consistency

											
										
										
											2020-08-17 17:45:24 +03:00
+								| Name        | Description                                                       |
 								| ----------- | ----------------------------------------------------------------- |
 								| `name`      | Name of the pipeline component to check. ~~str~~                  |
 								| **RETURNS** | Whether a component of that name exists in the pipeline. ~~bool~~ |
-												💫 Update website (#3285)

<!--- Provide a general summary of your changes in the title. -->

## Description

The new website is implemented using [Gatsby](https://www.gatsbyjs.org) with [Remark](https://github.com/remarkjs/remark) and [MDX](https://mdxjs.com/). This allows authoring content in **straightforward Markdown** without the usual limitations. Standard elements can be overwritten with powerful [React](http://reactjs.org/) components and wherever Markdown syntax isn't enough, JSX components can be used. Hopefully, this update will also make it much easier to contribute to the docs. Once this PR is merged, I'll implement auto-deployment via [Netlify](https://netlify.com) on a specific branch (to avoid building the website on every PR). There's a bunch of other cool stuff that the new setup will allow us to do – including writing front-end tests, service workers, offline support, implementing a search and so on.

This PR also includes various new docs pages and content.
Resolves #3270. Resolves #3222. Resolves #2947. Resolves #2837.


### Types of change
enhancement

## Checklist
<!--- Before you submit the PR, go over this checklist and make sure you can
tick off all the boxes. [] -> [x] -->
- [x] I have submitted the spaCy Contributor Agreement.
- [x] I ran the tests, and all new and existing tests passed.
- [x] My changes don't require a change to the documentation, or if they do, I've added all required information.

											
										
										
											2019-02-17 21:31:19 +03:00
 								## Language.get_pipe {#get_pipe tag="method" new="2"}
 								Get a pipeline component for a given component name.
 								> #### Example
 								>
 								> ```python
 								> parser = nlp.get_pipe("parser")
 								> custom_component = nlp.get_pipe("custom_component")
 								> ```
-												Update docs, types and API consistency

											
										
										
											2020-08-17 17:45:24 +03:00
+								| Name        | Description                                      |
 								| ----------- | ------------------------------------------------ |
 								| `name`      | Name of the pipeline component to get. ~~str~~   |
 								| **RETURNS** | The pipeline component. ~~Callable[[Doc], Doc]~~ |
-												💫 Update website (#3285)

<!--- Provide a general summary of your changes in the title. -->

## Description

The new website is implemented using [Gatsby](https://www.gatsbyjs.org) with [Remark](https://github.com/remarkjs/remark) and [MDX](https://mdxjs.com/). This allows authoring content in **straightforward Markdown** without the usual limitations. Standard elements can be overwritten with powerful [React](http://reactjs.org/) components and wherever Markdown syntax isn't enough, JSX components can be used. Hopefully, this update will also make it much easier to contribute to the docs. Once this PR is merged, I'll implement auto-deployment via [Netlify](https://netlify.com) on a specific branch (to avoid building the website on every PR). There's a bunch of other cool stuff that the new setup will allow us to do – including writing front-end tests, service workers, offline support, implementing a search and so on.

This PR also includes various new docs pages and content.
Resolves #3270. Resolves #3222. Resolves #2947. Resolves #2837.


### Types of change
enhancement

## Checklist
<!--- Before you submit the PR, go over this checklist and make sure you can
tick off all the boxes. [] -> [x] -->
- [x] I have submitted the spaCy Contributor Agreement.
- [x] I ran the tests, and all new and existing tests passed.
- [x] My changes don't require a change to the documentation, or if they do, I've added all required information.

											
										
										
											2019-02-17 21:31:19 +03:00
 								## Language.replace_pipe {#replace_pipe tag="method" new="2"}
-												set_kb method for entity_linker

											
										
										
											2020-10-08 11:34:01 +03:00
+								Replace a component in the pipeline and return the new component.
-												💫 Update website (#3285)

<!--- Provide a general summary of your changes in the title. -->

## Description

The new website is implemented using [Gatsby](https://www.gatsbyjs.org) with [Remark](https://github.com/remarkjs/remark) and [MDX](https://mdxjs.com/). This allows authoring content in **straightforward Markdown** without the usual limitations. Standard elements can be overwritten with powerful [React](http://reactjs.org/) components and wherever Markdown syntax isn't enough, JSX components can be used. Hopefully, this update will also make it much easier to contribute to the docs. Once this PR is merged, I'll implement auto-deployment via [Netlify](https://netlify.com) on a specific branch (to avoid building the website on every PR). There's a bunch of other cool stuff that the new setup will allow us to do – including writing front-end tests, service workers, offline support, implementing a search and so on.

This PR also includes various new docs pages and content.
Resolves #3270. Resolves #3222. Resolves #2947. Resolves #2837.


### Types of change
enhancement

## Checklist
<!--- Before you submit the PR, go over this checklist and make sure you can
tick off all the boxes. [] -> [x] -->
- [x] I have submitted the spaCy Contributor Agreement.
- [x] I ran the tests, and all new and existing tests passed.
- [x] My changes don't require a change to the documentation, or if they do, I've added all required information.

											
										
										
											2019-02-17 21:31:19 +03:00
-												Update docs, types and API consistency

											
										
										
											2020-08-17 17:45:24 +03:00
+								<Infobox title="Changed in v3.0" variant="warning">
 								As of v3.0, the `Language.replace_pipe` method doesn't take callables anymore
 								and instead expects the **name of a component factory** registered using
 								[`@Language.component`](/api/language#component) or
 								[`@Language.factory`](/api/language#factory).
 								</Infobox>
-												💫 Update website (#3285)

<!--- Provide a general summary of your changes in the title. -->

## Description

The new website is implemented using [Gatsby](https://www.gatsbyjs.org) with [Remark](https://github.com/remarkjs/remark) and [MDX](https://mdxjs.com/). This allows authoring content in **straightforward Markdown** without the usual limitations. Standard elements can be overwritten with powerful [React](http://reactjs.org/) components and wherever Markdown syntax isn't enough, JSX components can be used. Hopefully, this update will also make it much easier to contribute to the docs. Once this PR is merged, I'll implement auto-deployment via [Netlify](https://netlify.com) on a specific branch (to avoid building the website on every PR). There's a bunch of other cool stuff that the new setup will allow us to do – including writing front-end tests, service workers, offline support, implementing a search and so on.

This PR also includes various new docs pages and content.
Resolves #3270. Resolves #3222. Resolves #2947. Resolves #2837.


### Types of change
enhancement

## Checklist
<!--- Before you submit the PR, go over this checklist and make sure you can
tick off all the boxes. [] -> [x] -->
- [x] I have submitted the spaCy Contributor Agreement.
- [x] I ran the tests, and all new and existing tests passed.
- [x] My changes don't require a change to the documentation, or if they do, I've added all required information.

											
										
										
											2019-02-17 21:31:19 +03:00
+								> #### Example
 								>
 								> ```python
-												set_kb method for entity_linker

											
										
										
											2020-10-08 11:34:01 +03:00
+								> new_parser = nlp.replace_pipe("parser", "my_custom_parser")
-												💫 Update website (#3285)

<!--- Provide a general summary of your changes in the title. -->

## Description

The new website is implemented using [Gatsby](https://www.gatsbyjs.org) with [Remark](https://github.com/remarkjs/remark) and [MDX](https://mdxjs.com/). This allows authoring content in **straightforward Markdown** without the usual limitations. Standard elements can be overwritten with powerful [React](http://reactjs.org/) components and wherever Markdown syntax isn't enough, JSX components can be used. Hopefully, this update will also make it much easier to contribute to the docs. Once this PR is merged, I'll implement auto-deployment via [Netlify](https://netlify.com) on a specific branch (to avoid building the website on every PR). There's a bunch of other cool stuff that the new setup will allow us to do – including writing front-end tests, service workers, offline support, implementing a search and so on.

This PR also includes various new docs pages and content.
Resolves #3270. Resolves #3222. Resolves #2947. Resolves #2837.


### Types of change
enhancement

## Checklist
<!--- Before you submit the PR, go over this checklist and make sure you can
tick off all the boxes. [] -> [x] -->
- [x] I have submitted the spaCy Contributor Agreement.
- [x] I ran the tests, and all new and existing tests passed.
- [x] My changes don't require a change to the documentation, or if they do, I've added all required information.

											
										
										
											2019-02-17 21:31:19 +03:00
+								> ```
-												Update docs, types and API consistency

											
										
										
											2020-08-17 17:45:24 +03:00
+								| Name                                  | Description                                                                                                                                                        |
 								| ------------------------------------- | ------------------------------------------------------------------------------------------------------------------------------------------------------------------ |
 								| `name`                                | Name of the component to replace. ~~str~~                                                                                                                          |
 								| `component`                           | The factory name of the component to insert. ~~str~~                                                                                                               |
 								| _keyword-only_                        |                                                                                                                                                                    |
 								| `config` <Tag variant="new">3</Tag>   | Optional config parameters to use for the new component. Will be merged with the `default_config` specified by the component factory. ~~Optional[Dict[str, Any]]~~ |
 								| `validate` <Tag variant="new">3</Tag> | Whether to validate the component config and arguments against the types expected by the factory. Defaults to `True`. ~~bool~~                                     |
-												set_kb method for entity_linker

											
										
										
											2020-10-08 11:34:01 +03:00
+								| **RETURNS**                           | The new pipeline component. ~~Callable[[Doc], Doc]~~                                                                                                               |
-												💫 Update website (#3285)

<!--- Provide a general summary of your changes in the title. -->

## Description

The new website is implemented using [Gatsby](https://www.gatsbyjs.org) with [Remark](https://github.com/remarkjs/remark) and [MDX](https://mdxjs.com/). This allows authoring content in **straightforward Markdown** without the usual limitations. Standard elements can be overwritten with powerful [React](http://reactjs.org/) components and wherever Markdown syntax isn't enough, JSX components can be used. Hopefully, this update will also make it much easier to contribute to the docs. Once this PR is merged, I'll implement auto-deployment via [Netlify](https://netlify.com) on a specific branch (to avoid building the website on every PR). There's a bunch of other cool stuff that the new setup will allow us to do – including writing front-end tests, service workers, offline support, implementing a search and so on.

This PR also includes various new docs pages and content.
Resolves #3270. Resolves #3222. Resolves #2947. Resolves #2837.


### Types of change
enhancement

## Checklist
<!--- Before you submit the PR, go over this checklist and make sure you can
tick off all the boxes. [] -> [x] -->
- [x] I have submitted the spaCy Contributor Agreement.
- [x] I ran the tests, and all new and existing tests passed.
- [x] My changes don't require a change to the documentation, or if they do, I've added all required information.

											
										
										
											2019-02-17 21:31:19 +03:00
 								## Language.rename_pipe {#rename_pipe tag="method" new="2"}
 								Rename a component in the pipeline. Useful to create custom names for
 								pre-defined and pre-loaded components. To change the default name of a component
 								added to the pipeline, you can also use the `name` argument on
 								[`add_pipe`](/api/language#add_pipe).
 								> #### Example
 								>
 								> ```python
 								> nlp.rename_pipe("parser", "spacy_parser")
 								> ```
-												Update docs, types and API consistency

											
										
										
											2020-08-17 17:45:24 +03:00
+								| Name       | Description                              |
 								| ---------- | ---------------------------------------- |
 								| `old_name` | Name of the component to rename. ~~str~~ |
 								| `new_name` | New name of the component. ~~str~~       |
-												💫 Update website (#3285)

<!--- Provide a general summary of your changes in the title. -->

## Description

The new website is implemented using [Gatsby](https://www.gatsbyjs.org) with [Remark](https://github.com/remarkjs/remark) and [MDX](https://mdxjs.com/). This allows authoring content in **straightforward Markdown** without the usual limitations. Standard elements can be overwritten with powerful [React](http://reactjs.org/) components and wherever Markdown syntax isn't enough, JSX components can be used. Hopefully, this update will also make it much easier to contribute to the docs. Once this PR is merged, I'll implement auto-deployment via [Netlify](https://netlify.com) on a specific branch (to avoid building the website on every PR). There's a bunch of other cool stuff that the new setup will allow us to do – including writing front-end tests, service workers, offline support, implementing a search and so on.

This PR also includes various new docs pages and content.
Resolves #3270. Resolves #3222. Resolves #2947. Resolves #2837.


### Types of change
enhancement

## Checklist
<!--- Before you submit the PR, go over this checklist and make sure you can
tick off all the boxes. [] -> [x] -->
- [x] I have submitted the spaCy Contributor Agreement.
- [x] I ran the tests, and all new and existing tests passed.
- [x] My changes don't require a change to the documentation, or if they do, I've added all required information.

											
										
										
											2019-02-17 21:31:19 +03:00
 								## Language.remove_pipe {#remove_pipe tag="method" new="2"}
 								Remove a component from the pipeline. Returns the removed component name and
 								component function.
 								> #### Example
 								>
 								> ```python
 								> name, component = nlp.remove_pipe("parser")
 								> assert name == "parser"
 								> ```
-												Update docs, types and API consistency

											
										
										
											2020-08-17 17:45:24 +03:00
+								| Name        | Description                                                                                |
 								| ----------- | ------------------------------------------------------------------------------------------ |
 								| `name`      | Name of the component to remove. ~~str~~                                                   |
 								| **RETURNS** | A `(name, component)` tuple of the removed component. ~~Tuple[str, Callable[[Doc], Doc]]~~ |
-												💫 Update website (#3285)

<!--- Provide a general summary of your changes in the title. -->

## Description

The new website is implemented using [Gatsby](https://www.gatsbyjs.org) with [Remark](https://github.com/remarkjs/remark) and [MDX](https://mdxjs.com/). This allows authoring content in **straightforward Markdown** without the usual limitations. Standard elements can be overwritten with powerful [React](http://reactjs.org/) components and wherever Markdown syntax isn't enough, JSX components can be used. Hopefully, this update will also make it much easier to contribute to the docs. Once this PR is merged, I'll implement auto-deployment via [Netlify](https://netlify.com) on a specific branch (to avoid building the website on every PR). There's a bunch of other cool stuff that the new setup will allow us to do – including writing front-end tests, service workers, offline support, implementing a search and so on.

This PR also includes various new docs pages and content.
Resolves #3270. Resolves #3222. Resolves #2947. Resolves #2837.


### Types of change
enhancement

## Checklist
<!--- Before you submit the PR, go over this checklist and make sure you can
tick off all the boxes. [] -> [x] -->
- [x] I have submitted the spaCy Contributor Agreement.
- [x] I ran the tests, and all new and existing tests passed.
- [x] My changes don't require a change to the documentation, or if they do, I've added all required information.

											
										
										
											2019-02-17 21:31:19 +03:00
-												Update docs

											
										
										
											2020-08-29 13:36:05 +03:00
+								## Language.disable_pipe {#disable_pipe tag="method" new="3"}
 								Temporarily disable a pipeline component so it's not run as part of the
 								pipeline. Disabled components are listed in
 								[`nlp.disabled`](/api/language#attributes) and included in
 								[`nlp.components`](/api/language#attributes), but not in
 								[`nlp.pipeline`](/api/language#pipeline), so they're not run when you process a
 								`Doc` with the `nlp` object. If the component is already disabled, this method
 								does nothing.
 								> #### Example
 								>
 								> ```python
 								> nlp.add_pipe("ner")
 								> nlp.add_pipe("textcat")
 								> assert nlp.pipe_names == ["ner", "textcat"]
 								> nlp.disable_pipe("ner")
 								> assert nlp.pipe_names == ["textcat"]
 								> assert nlp.component_names == ["ner", "textcat"]
 								> assert nlp.disabled == ["ner"]
 								> ```
 								| Name   | Description                               |
 								| ------ | ----------------------------------------- |
 								| `name` | Name of the component to disable. ~~str~~ |
 								## Language.enable_pipe {#enable_pipe tag="method" new="3"}
-												Proofreading

Proofread some API docs

											
										
										
											2020-09-24 14:15:28 +03:00
+								Enable a previously disabled component (e.g. via
-												Update docs

											
										
										
											2020-08-29 13:36:05 +03:00
+								[`Language.disable_pipes`](/api/language#disable_pipes)) so it's run as part of
 								the pipeline, [`nlp.pipeline`](/api/language#pipeline). If the component is
 								already enabled, this method does nothing.
 								> #### Example
 								>
 								> ```python
 								> nlp.disable_pipe("ner")
 								> assert "ner" in nlp.disabled
 								> assert not "ner" in nlp.pipe_names
 								> nlp.enable_pipe("ner")
 								> assert not "ner" in nlp.disabled
 								> assert "ner" in nlp.pipe_names
 								> ```
 								| Name   | Description                              |
 								| ------ | ---------------------------------------- |
 								| `name` | Name of the component to enable. ~~str~~ |
-												Feature toggle_pipes (#5378)

* make disable_pipes deprecated in favour of the new toggle_pipes

* rewrite disable_pipes statements

* update documentation

* remove bin/wiki_entity_linking folder

* one more fix

* remove deprecated link to documentation

* few more doc fixes

* add note about name change to the docs

* restore original disable_pipes

* small fixes

* fix typo

* fix error number to W096

* rename to select_pipes

* also make changes to the documentation

Co-authored-by: Matthew Honnibal <honnibal+gh@gmail.com>
											
										
										
											2020-05-18 23:27:10 +03:00
+								## Language.select_pipes {#select_pipes tag="contextmanager, method" new="3"}
-												💫 Update website (#3285)

<!--- Provide a general summary of your changes in the title. -->

## Description

The new website is implemented using [Gatsby](https://www.gatsbyjs.org) with [Remark](https://github.com/remarkjs/remark) and [MDX](https://mdxjs.com/). This allows authoring content in **straightforward Markdown** without the usual limitations. Standard elements can be overwritten with powerful [React](http://reactjs.org/) components and wherever Markdown syntax isn't enough, JSX components can be used. Hopefully, this update will also make it much easier to contribute to the docs. Once this PR is merged, I'll implement auto-deployment via [Netlify](https://netlify.com) on a specific branch (to avoid building the website on every PR). There's a bunch of other cool stuff that the new setup will allow us to do – including writing front-end tests, service workers, offline support, implementing a search and so on.

This PR also includes various new docs pages and content.
Resolves #3270. Resolves #3222. Resolves #2947. Resolves #2837.


### Types of change
enhancement

## Checklist
<!--- Before you submit the PR, go over this checklist and make sure you can
tick off all the boxes. [] -> [x] -->
- [x] I have submitted the spaCy Contributor Agreement.
- [x] I ran the tests, and all new and existing tests passed.
- [x] My changes don't require a change to the documentation, or if they do, I've added all required information.

											
										
										
											2019-02-17 21:31:19 +03:00
 								Disable one or more pipeline components. If used as a context manager, the
 								pipeline will be restored to the initial state at the end of the block.
 								Otherwise, a `DisabledPipes` object is returned, that has a `.restore()` method
-												Update docs

											
										
										
											2020-07-04 17:25:30 +03:00
+								you can use to undo your changes. You can specify either `disable` (as a list or
 								string), or `enable`. In the latter case, all components not in the `enable`
-												Proofreading

Proofread some API docs

											
										
										
											2020-09-24 14:15:28 +03:00
+								list will be disabled. Under the hood, this method calls into
-												Update docs

											
										
										
											2020-08-29 13:36:05 +03:00
+								[`disable_pipe`](/api/language#disable_pipe) and
 								[`enable_pipe`](/api/language#enable_pipe).
-												Feature toggle_pipes (#5378)

* make disable_pipes deprecated in favour of the new toggle_pipes

* rewrite disable_pipes statements

* update documentation

* remove bin/wiki_entity_linking folder

* one more fix

* remove deprecated link to documentation

* few more doc fixes

* add note about name change to the docs

* restore original disable_pipes

* small fixes

* fix typo

* fix error number to W096

* rename to select_pipes

* also make changes to the documentation

Co-authored-by: Matthew Honnibal <honnibal+gh@gmail.com>
											
										
										
											2020-05-18 23:27:10 +03:00
-												💫 Update website (#3285)

<!--- Provide a general summary of your changes in the title. -->

## Description

The new website is implemented using [Gatsby](https://www.gatsbyjs.org) with [Remark](https://github.com/remarkjs/remark) and [MDX](https://mdxjs.com/). This allows authoring content in **straightforward Markdown** without the usual limitations. Standard elements can be overwritten with powerful [React](http://reactjs.org/) components and wherever Markdown syntax isn't enough, JSX components can be used. Hopefully, this update will also make it much easier to contribute to the docs. Once this PR is merged, I'll implement auto-deployment via [Netlify](https://netlify.com) on a specific branch (to avoid building the website on every PR). There's a bunch of other cool stuff that the new setup will allow us to do – including writing front-end tests, service workers, offline support, implementing a search and so on.

This PR also includes various new docs pages and content.
Resolves #3270. Resolves #3222. Resolves #2947. Resolves #2837.


### Types of change
enhancement

## Checklist
<!--- Before you submit the PR, go over this checklist and make sure you can
tick off all the boxes. [] -> [x] -->
- [x] I have submitted the spaCy Contributor Agreement.
- [x] I ran the tests, and all new and existing tests passed.
- [x] My changes don't require a change to the documentation, or if they do, I've added all required information.

											
										
										
											2019-02-17 21:31:19 +03:00
+								> #### Example
 								>
 								> ```python
-												Feature toggle_pipes (#5378)

* make disable_pipes deprecated in favour of the new toggle_pipes

* rewrite disable_pipes statements

* update documentation

* remove bin/wiki_entity_linking folder

* one more fix

* remove deprecated link to documentation

* few more doc fixes

* add note about name change to the docs

* restore original disable_pipes

* small fixes

* fix typo

* fix error number to W096

* rename to select_pipes

* also make changes to the documentation

Co-authored-by: Matthew Honnibal <honnibal+gh@gmail.com>
											
										
										
											2020-05-18 23:27:10 +03:00
+								> with nlp.select_pipes(disable=["tagger", "parser"]):
-												begin_training -> initialize

											
										
										
											2020-09-28 22:35:09 +03:00
+								>    nlp.initialize()
-												Also support passing list to Language.disable_pipes (#4521)

* Also support passing list to Language.disable_pipes

* Adjust internals

											
										
										
											2019-10-25 17:19:08 +03:00
+								>
-												Feature toggle_pipes (#5378)

* make disable_pipes deprecated in favour of the new toggle_pipes

* rewrite disable_pipes statements

* update documentation

* remove bin/wiki_entity_linking folder

* one more fix

* remove deprecated link to documentation

* few more doc fixes

* add note about name change to the docs

* restore original disable_pipes

* small fixes

* fix typo

* fix error number to W096

* rename to select_pipes

* also make changes to the documentation

Co-authored-by: Matthew Honnibal <honnibal+gh@gmail.com>
											
										
										
											2020-05-18 23:27:10 +03:00
+								> with nlp.select_pipes(enable="ner"):
-												begin_training -> initialize

											
										
										
											2020-09-28 22:35:09 +03:00
+								>     nlp.initialize()
-												💫 Update website (#3285)

<!--- Provide a general summary of your changes in the title. -->

## Description

The new website is implemented using [Gatsby](https://www.gatsbyjs.org) with [Remark](https://github.com/remarkjs/remark) and [MDX](https://mdxjs.com/). This allows authoring content in **straightforward Markdown** without the usual limitations. Standard elements can be overwritten with powerful [React](http://reactjs.org/) components and wherever Markdown syntax isn't enough, JSX components can be used. Hopefully, this update will also make it much easier to contribute to the docs. Once this PR is merged, I'll implement auto-deployment via [Netlify](https://netlify.com) on a specific branch (to avoid building the website on every PR). There's a bunch of other cool stuff that the new setup will allow us to do – including writing front-end tests, service workers, offline support, implementing a search and so on.

This PR also includes various new docs pages and content.
Resolves #3270. Resolves #3222. Resolves #2947. Resolves #2837.


### Types of change
enhancement

## Checklist
<!--- Before you submit the PR, go over this checklist and make sure you can
tick off all the boxes. [] -> [x] -->
- [x] I have submitted the spaCy Contributor Agreement.
- [x] I ran the tests, and all new and existing tests passed.
- [x] My changes don't require a change to the documentation, or if they do, I've added all required information.

											
										
										
											2019-02-17 21:31:19 +03:00
+								>
-												Feature toggle_pipes (#5378)

* make disable_pipes deprecated in favour of the new toggle_pipes

* rewrite disable_pipes statements

* update documentation

* remove bin/wiki_entity_linking folder

* one more fix

* remove deprecated link to documentation

* few more doc fixes

* add note about name change to the docs

* restore original disable_pipes

* small fixes

* fix typo

* fix error number to W096

* rename to select_pipes

* also make changes to the documentation

Co-authored-by: Matthew Honnibal <honnibal+gh@gmail.com>
											
										
										
											2020-05-18 23:27:10 +03:00
+								> disabled = nlp.select_pipes(disable=["tagger", "parser"])
-												begin_training -> initialize

											
										
										
											2020-09-28 22:35:09 +03:00
+								> nlp.initialize()
-												💫 Update website (#3285)

<!--- Provide a general summary of your changes in the title. -->

## Description

The new website is implemented using [Gatsby](https://www.gatsbyjs.org) with [Remark](https://github.com/remarkjs/remark) and [MDX](https://mdxjs.com/). This allows authoring content in **straightforward Markdown** without the usual limitations. Standard elements can be overwritten with powerful [React](http://reactjs.org/) components and wherever Markdown syntax isn't enough, JSX components can be used. Hopefully, this update will also make it much easier to contribute to the docs. Once this PR is merged, I'll implement auto-deployment via [Netlify](https://netlify.com) on a specific branch (to avoid building the website on every PR). There's a bunch of other cool stuff that the new setup will allow us to do – including writing front-end tests, service workers, offline support, implementing a search and so on.

This PR also includes various new docs pages and content.
Resolves #3270. Resolves #3222. Resolves #2947. Resolves #2837.


### Types of change
enhancement

## Checklist
<!--- Before you submit the PR, go over this checklist and make sure you can
tick off all the boxes. [] -> [x] -->
- [x] I have submitted the spaCy Contributor Agreement.
- [x] I ran the tests, and all new and existing tests passed.
- [x] My changes don't require a change to the documentation, or if they do, I've added all required information.

											
										
										
											2019-02-17 21:31:19 +03:00
+								> disabled.restore()
 								> ```
-												Update docs

											
										
										
											2020-07-04 17:25:30 +03:00
+								<Infobox title="Changed in v3.0" variant="warning" id="disable_pipes">
-												Also support passing list to Language.disable_pipes (#4521)

* Also support passing list to Language.disable_pipes

* Adjust internals

											
										
										
											2019-10-25 17:19:08 +03:00
-												Feature toggle_pipes (#5378)

* make disable_pipes deprecated in favour of the new toggle_pipes

* rewrite disable_pipes statements

* update documentation

* remove bin/wiki_entity_linking folder

* one more fix

* remove deprecated link to documentation

* few more doc fixes

* add note about name change to the docs

* restore original disable_pipes

* small fixes

* fix typo

* fix error number to W096

* rename to select_pipes

* also make changes to the documentation

Co-authored-by: Matthew Honnibal <honnibal+gh@gmail.com>
											
										
										
											2020-05-18 23:27:10 +03:00
+								As of spaCy v3.0, the `disable_pipes` method has been renamed to `select_pipes`:
-												Also support passing list to Language.disable_pipes (#4521)

* Also support passing list to Language.disable_pipes

* Adjust internals

											
										
										
											2019-10-25 17:19:08 +03:00
 								```diff
-												Feature toggle_pipes (#5378)

* make disable_pipes deprecated in favour of the new toggle_pipes

* rewrite disable_pipes statements

* update documentation

* remove bin/wiki_entity_linking folder

* one more fix

* remove deprecated link to documentation

* few more doc fixes

* add note about name change to the docs

* restore original disable_pipes

* small fixes

* fix typo

* fix error number to W096

* rename to select_pipes

* also make changes to the documentation

Co-authored-by: Matthew Honnibal <honnibal+gh@gmail.com>
											
										
										
											2020-05-18 23:27:10 +03:00
+								- nlp.disable_pipes(["tagger", "parser"])
 								+ nlp.select_pipes(disable=["tagger", "parser"])
-												Also support passing list to Language.disable_pipes (#4521)

* Also support passing list to Language.disable_pipes

* Adjust internals

											
										
										
											2019-10-25 17:19:08 +03:00
+								```
 								</Infobox>
-												💫 Update website (#3285)

<!--- Provide a general summary of your changes in the title. -->

## Description

The new website is implemented using [Gatsby](https://www.gatsbyjs.org) with [Remark](https://github.com/remarkjs/remark) and [MDX](https://mdxjs.com/). This allows authoring content in **straightforward Markdown** without the usual limitations. Standard elements can be overwritten with powerful [React](http://reactjs.org/) components and wherever Markdown syntax isn't enough, JSX components can be used. Hopefully, this update will also make it much easier to contribute to the docs. Once this PR is merged, I'll implement auto-deployment via [Netlify](https://netlify.com) on a specific branch (to avoid building the website on every PR). There's a bunch of other cool stuff that the new setup will allow us to do – including writing front-end tests, service workers, offline support, implementing a search and so on.

This PR also includes various new docs pages and content.
Resolves #3270. Resolves #3222. Resolves #2947. Resolves #2837.


### Types of change
enhancement

## Checklist
<!--- Before you submit the PR, go over this checklist and make sure you can
tick off all the boxes. [] -> [x] -->
- [x] I have submitted the spaCy Contributor Agreement.
- [x] I ran the tests, and all new and existing tests passed.
- [x] My changes don't require a change to the documentation, or if they do, I've added all required information.

											
										
										
											2019-02-17 21:31:19 +03:00
-												Update docs, types and API consistency

											
										
										
											2020-08-17 17:45:24 +03:00
+								| Name           | Description                                                                                            |
 								| -------------- | ------------------------------------------------------------------------------------------------------ |
 								| _keyword-only_ |                                                                                                        |
 								| `disable`      | Name(s) of pipeline components to disable. ~~Optional[Union[str, Iterable[str]]]~~                     |
-												Merge branch 'develop' into pr/6135

											
										
										
											2020-09-24 14:44:56 +03:00
+								| `enable`       | Name(s) of pipeline components that will not be disabled. ~~Optional[Union[str, Iterable[str]]]~~      |
-												Update docs, types and API consistency

											
										
										
											2020-08-17 17:45:24 +03:00
+								| **RETURNS**    | The disabled pipes that can be restored by calling the object's `.restore()` method. ~~DisabledPipes~~ |
-												Update docs

											
										
										
											2020-07-04 17:25:30 +03:00
-												API docs, docstrings and argument consistency

											
										
										
											2020-07-27 19:11:45 +03:00
+								## Language.get_factory_meta {#get_factory_meta tag="classmethod" new="3"}
 								Get the factory meta information for a given pipeline component name. Expects
 								the name of the component **factory**. The factory meta is an instance of the
 								[`FactoryMeta`](/api/language#factorymeta) dataclass and contains the
 								information about the component and its default provided by the
 								[`@Language.component`](/api/language#component) or
 								[`@Language.factory`](/api/language#factory) decorator.
 								> #### Example
 								>
 								> ```python
 								> factory_meta = Language.get_factory_meta("ner")
 								> assert factory_meta.factory == "ner"
 								> print(factory_meta.default_config)
 								> ```
-												Update docs, types and API consistency

											
										
										
											2020-08-17 17:45:24 +03:00
+								| Name        | Description                       |
 								| ----------- | --------------------------------- |
 								| `name`      | The factory name. ~~str~~         |
 								| **RETURNS** | The factory meta. ~~FactoryMeta~~ |
-												API docs, docstrings and argument consistency

											
										
										
											2020-07-27 19:11:45 +03:00
 								## Language.get_pipe_meta {#get_pipe_meta tag="method" new="3"}
 								Get the factory meta information for a given pipeline component name. Expects
 								the name of the component **instance** in the pipeline. The factory meta is an
 								instance of the [`FactoryMeta`](/api/language#factorymeta) dataclass and
 								contains the information about the component and its default provided by the
 								[`@Language.component`](/api/language#component) or
 								[`@Language.factory`](/api/language#factory) decorator.
 								> #### Example
 								>
 								> ```python
 								> nlp.add_pipe("ner", name="entity_recognizer")
 								> factory_meta = nlp.get_pipe_meta("entity_recognizer")
 								> assert factory_meta.factory == "ner"
 								> print(factory_meta.default_config)
 								> ```
-												Update docs, types and API consistency

											
										
										
											2020-08-17 17:45:24 +03:00
+								| Name        | Description                          |
 								| ----------- | ------------------------------------ |
 								| `name`      | The pipeline component name. ~~str~~ |
 								| **RETURNS** | The factory meta. ~~FactoryMeta~~    |
-												API docs, docstrings and argument consistency

											
										
										
											2020-07-27 19:11:45 +03:00
-												Update docs [ci skip]

											
										
										
											2020-07-31 19:55:38 +03:00
+								## Language.analyze_pipes {#analyze_pipes tag="method" new="3"}
 								Analyze the current pipeline components and show a summary of the attributes
 								they assign and require, and the scores they set. The data is based on the
 								information provided in the [`@Language.component`](/api/language#component) and
 								[`@Language.factory`](/api/language#factory) decorator. If requirements aren't
 								met, e.g. if a component specifies a required property that is not set by a
 								previous component, a warning is shown.
 								<Infobox variant="warning" title="Important note">
 								The pipeline analysis is static and does **not actually run the components**.
 								This means that it relies on the information provided by the components
 								themselves. If a custom component declares that it assigns an attribute but it
 								doesn't, the pipeline analysis won't catch that.
 								</Infobox>
 								> #### Example
 								>
 								> ```python
 								> nlp = spacy.blank("en")
 								> nlp.add_pipe("tagger")
 								> nlp.add_pipe("entity_linker")
-												Simplify pipe analysis

- remove unused code
- don't print by default
- integrate attrs info into analysis output

											
										
										
											2020-08-01 14:40:06 +03:00
+								> analysis = nlp.analyze_pipes()
-												Update docs [ci skip]

											
										
										
											2020-07-31 19:55:38 +03:00
+								> ```
 								<Accordion title="Example output" spaced>
-												Simplify pipe analysis

- remove unused code
- don't print by default
- integrate attrs info into analysis output

											
										
										
											2020-08-01 14:40:06 +03:00
+								```json
 								### Structured
 								{
 								  "summary": {
 								    "tagger": {
 								      "assigns": ["token.tag"],
 								      "requires": [],
 								      "scores": ["tag_acc", "pos_acc", "lemma_acc"],
 								      "retokenizes": false
 								    },
 								    "entity_linker": {
 								      "assigns": ["token.ent_kb_id"],
 								      "requires": ["doc.ents", "doc.sents", "token.ent_iob", "token.ent_type"],
 								      "scores": [],
 								      "retokenizes": false
 								    }
 								  },
 								  "problems": {
 								    "tagger": [],
 								    "entity_linker": ["doc.ents", "doc.sents", "token.ent_iob", "token.ent_type"]
 								  },
 								  "attrs": {
 								    "token.ent_iob": { "assigns": [], "requires": ["entity_linker"] },
 								    "doc.ents": { "assigns": [], "requires": ["entity_linker"] },
 								    "token.ent_kb_id": { "assigns": ["entity_linker"], "requires": [] },
 								    "doc.sents": { "assigns": [], "requires": ["entity_linker"] },
 								    "token.tag": { "assigns": ["tagger"], "requires": [] },
 								    "token.ent_type": { "assigns": [], "requires": ["entity_linker"] }
 								  }
 								}
-												Update docs [ci skip]

											
										
										
											2020-07-31 19:55:38 +03:00
+								```
-												Simplify pipe analysis

- remove unused code
- don't print by default
- integrate attrs info into analysis output

											
										
										
											2020-08-01 14:40:06 +03:00
 								```
 								### Pretty
-												Update docs [ci skip]

											
										
										
											2020-07-31 19:55:38 +03:00
+								============================= Pipeline Overview =============================
-												Update docs [ci skip]

											
										
										
											2020-11-09 07:43:26 +03:00
+								#   Component       Assigns           Requires         Scores        Retokenizes
 								-   -------------   ---------------   --------------   -----------   -----------
 tagger          token.tag                          tag_acc       False
 entity_linker   token.ent_kb_id   doc.ents         nel_micro_f   False
 								                                      doc.sents        nel_micro_r
 								                                      token.ent_iob    nel_micro_p
-												Update docs [ci skip]

											
										
										
											2020-07-31 19:55:38 +03:00
+								                                      token.ent_type
 								================================ Problems (4) ================================
 								⚠ 'entity_linker' requirements not met: doc.ents, doc.sents,
 								token.ent_iob, token.ent_type
 								```
 								</Accordion>
-												Update docs, types and API consistency

											
										
										
											2020-08-17 17:45:24 +03:00
+								| Name           | Description                                                                                                                                                                                                                                 |
 								| -------------- | ------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------- |
 								| _keyword-only_ |                                                                                                                                                                                                                                             |
 								| `keys`         | The values to display in the table. Corresponds to attributes of the [`FactoryMeta`](/api/language#factorymeta). Defaults to `["assigns", "requires", "scores", "retokenizes"]`. ~~List[str]~~                                              |
 								| `pretty`       | Pretty-print the results as a table. Defaults to `False`. ~~bool~~                                                                                                                                                                          |
 								| **RETURNS**    | Dictionary containing the pipe analysis, keyed by `"summary"` (component meta by pipe), `"problems"` (attribute names by pipe) and `"attrs"` (pipes that assign and require an attribute, keyed by attribute). ~~Optional[Dict[str, Any]]~~ |
-												Update docs [ci skip]

											
										
										
											2020-07-31 19:55:38 +03:00
-												Update docs [ci skip]

											
										
										
											2020-07-27 01:29:45 +03:00
+								## Language.meta {#meta tag="property"}
-												Fix consistency and update docs

											
										
										
											2020-09-15 12:37:37 +03:00
+								Meta data for the `Language` class, including name, version, data sources,
 								license, author information and more. If a trained pipeline is loaded, this
-												"model" terminology consistency in docs

											
										
										
											2020-09-03 14:13:03 +03:00
+								contains meta data of the pipeline. The `Language.meta` is also what's
-												Fix consistency and update docs

											
										
										
											2020-09-15 12:37:37 +03:00
+								serialized as the `meta.json` when you save an `nlp` object to disk. See the
 								[meta data format](/api/data-formats#meta) for more details.
 								<Infobox variant="warning" title="Changed in v3.0">
 								As of v3.0, the meta only contains **meta information** about the pipeline and
 								isn't used to construct the language class and pipeline components. This
 								information is expressed in the [`config.cfg`](/api/data-formats#config).
 								</Infobox>
-												Update docs [ci skip]

											
										
										
											2020-07-27 01:29:45 +03:00
 								> #### Example
 								>
 								> ```python
 								> print(nlp.meta)
 								> ```
-												Update docs, types and API consistency

											
										
										
											2020-08-17 17:45:24 +03:00
+								| Name        | Description                       |
 								| ----------- | --------------------------------- |
 								| **RETURNS** | The meta data. ~~Dict[str, Any]~~ |
-												Update docs [ci skip]

											
										
										
											2020-07-27 01:29:45 +03:00
 								## Language.config {#config tag="property" new="3"}
 								Export a trainable [`config.cfg`](/api/data-formats#config) for the current
 								`nlp` object. Includes the current pipeline, all configs used to create the
 								currently active pipeline components, as well as the default training config
 								that can be used with [`spacy train`](/api/cli#train). `Language.config` returns
 								a [Thinc `Config` object](https://thinc.ai/docs/api-config#config), which is a
 								subclass of the built-in `dict`. It supports the additional methods `to_disk`
 								(serialize the config to a file) and `to_str` (output the config as a string).
 								> #### Example
 								>
 								> ```python
 								> nlp.config.to_disk("./config.cfg")
 								> print(nlp.config.to_str())
 								> ```
-												Update docs, types and API consistency

											
										
										
											2020-08-17 17:45:24 +03:00
+								| Name        | Description            |
 								| ----------- | ---------------------- |
 								| **RETURNS** | The config. ~~Config~~ |
-												Update docs [ci skip]

											
										
										
											2020-07-27 01:29:45 +03:00
-												💫 Update website (#3285)

<!--- Provide a general summary of your changes in the title. -->

## Description

The new website is implemented using [Gatsby](https://www.gatsbyjs.org) with [Remark](https://github.com/remarkjs/remark) and [MDX](https://mdxjs.com/). This allows authoring content in **straightforward Markdown** without the usual limitations. Standard elements can be overwritten with powerful [React](http://reactjs.org/) components and wherever Markdown syntax isn't enough, JSX components can be used. Hopefully, this update will also make it much easier to contribute to the docs. Once this PR is merged, I'll implement auto-deployment via [Netlify](https://netlify.com) on a specific branch (to avoid building the website on every PR). There's a bunch of other cool stuff that the new setup will allow us to do – including writing front-end tests, service workers, offline support, implementing a search and so on.

This PR also includes various new docs pages and content.
Resolves #3270. Resolves #3222. Resolves #2947. Resolves #2837.


### Types of change
enhancement

## Checklist
<!--- Before you submit the PR, go over this checklist and make sure you can
tick off all the boxes. [] -> [x] -->
- [x] I have submitted the spaCy Contributor Agreement.
- [x] I ran the tests, and all new and existing tests passed.
- [x] My changes don't require a change to the documentation, or if they do, I've added all required information.

											
										
										
											2019-02-17 21:31:19 +03:00
+								## Language.to_disk {#to_disk tag="method" new="2"}
-												Fix typos and wording [ci skip]

											
										
										
											2020-09-03 17:37:45 +03:00
+								Save the current state to a directory. Under the hood, this method delegates to
 								the `to_disk` methods of the individual pipeline components, if available. This
 								means that if a trained pipeline is loaded, all components and their weights
 								will be saved to disk.
-												💫 Update website (#3285)

<!--- Provide a general summary of your changes in the title. -->

## Description

The new website is implemented using [Gatsby](https://www.gatsbyjs.org) with [Remark](https://github.com/remarkjs/remark) and [MDX](https://mdxjs.com/). This allows authoring content in **straightforward Markdown** without the usual limitations. Standard elements can be overwritten with powerful [React](http://reactjs.org/) components and wherever Markdown syntax isn't enough, JSX components can be used. Hopefully, this update will also make it much easier to contribute to the docs. Once this PR is merged, I'll implement auto-deployment via [Netlify](https://netlify.com) on a specific branch (to avoid building the website on every PR). There's a bunch of other cool stuff that the new setup will allow us to do – including writing front-end tests, service workers, offline support, implementing a search and so on.

This PR also includes various new docs pages and content.
Resolves #3270. Resolves #3222. Resolves #2947. Resolves #2837.


### Types of change
enhancement

## Checklist
<!--- Before you submit the PR, go over this checklist and make sure you can
tick off all the boxes. [] -> [x] -->
- [x] I have submitted the spaCy Contributor Agreement.
- [x] I ran the tests, and all new and existing tests passed.
- [x] My changes don't require a change to the documentation, or if they do, I've added all required information.

											
										
										
											2019-02-17 21:31:19 +03:00
 								> #### Example
 								>
 								> ```python
-												"model" terminology consistency in docs

											
										
										
											2020-09-03 14:13:03 +03:00
+								> nlp.to_disk("/path/to/pipeline")
-												💫 Update website (#3285)

<!--- Provide a general summary of your changes in the title. -->

## Description

The new website is implemented using [Gatsby](https://www.gatsbyjs.org) with [Remark](https://github.com/remarkjs/remark) and [MDX](https://mdxjs.com/). This allows authoring content in **straightforward Markdown** without the usual limitations. Standard elements can be overwritten with powerful [React](http://reactjs.org/) components and wherever Markdown syntax isn't enough, JSX components can be used. Hopefully, this update will also make it much easier to contribute to the docs. Once this PR is merged, I'll implement auto-deployment via [Netlify](https://netlify.com) on a specific branch (to avoid building the website on every PR). There's a bunch of other cool stuff that the new setup will allow us to do – including writing front-end tests, service workers, offline support, implementing a search and so on.

This PR also includes various new docs pages and content.
Resolves #3270. Resolves #3222. Resolves #2947. Resolves #2837.


### Types of change
enhancement

## Checklist
<!--- Before you submit the PR, go over this checklist and make sure you can
tick off all the boxes. [] -> [x] -->
- [x] I have submitted the spaCy Contributor Agreement.
- [x] I ran the tests, and all new and existing tests passed.
- [x] My changes don't require a change to the documentation, or if they do, I've added all required information.

											
										
										
											2019-02-17 21:31:19 +03:00
+								> ```
-												Update docs, types and API consistency

											
										
										
											2020-08-17 17:45:24 +03:00
+								| Name           | Description                                                                                                                                |
 								| -------------- | ------------------------------------------------------------------------------------------------------------------------------------------ |
 								| `path`         | A path to a directory, which will be created if it doesn't exist. Paths may be either strings or `Path`-like objects. ~~Union[str, Path]~~ |
 								| _keyword-only_ |                                                                                                                                            |
 								| `exclude`      | Names of pipeline components or [serialization fields](#serialization-fields) to exclude. ~~Iterable[str]~~                                |
-												💫 Update website (#3285)

<!--- Provide a general summary of your changes in the title. -->

## Description

The new website is implemented using [Gatsby](https://www.gatsbyjs.org) with [Remark](https://github.com/remarkjs/remark) and [MDX](https://mdxjs.com/). This allows authoring content in **straightforward Markdown** without the usual limitations. Standard elements can be overwritten with powerful [React](http://reactjs.org/) components and wherever Markdown syntax isn't enough, JSX components can be used. Hopefully, this update will also make it much easier to contribute to the docs. Once this PR is merged, I'll implement auto-deployment via [Netlify](https://netlify.com) on a specific branch (to avoid building the website on every PR). There's a bunch of other cool stuff that the new setup will allow us to do – including writing front-end tests, service workers, offline support, implementing a search and so on.

This PR also includes various new docs pages and content.
Resolves #3270. Resolves #3222. Resolves #2947. Resolves #2837.


### Types of change
enhancement

## Checklist
<!--- Before you submit the PR, go over this checklist and make sure you can
tick off all the boxes. [] -> [x] -->
- [x] I have submitted the spaCy Contributor Agreement.
- [x] I ran the tests, and all new and existing tests passed.
- [x] My changes don't require a change to the documentation, or if they do, I've added all required information.

											
										
										
											2019-02-17 21:31:19 +03:00
 								## Language.from_disk {#from_disk tag="method" new="2"}
-												"model" terminology consistency in docs

											
										
										
											2020-09-03 14:13:03 +03:00
+								Loads state from a directory, including all data that was saved with the
 								`Language` object. Modifies the object in place and returns it.
 								<Infobox variant="warning" title="Important note">
-												Merge branch 'develop' into pr/6135

											
										
										
											2020-09-24 14:44:56 +03:00
+								Keep in mind that this method **only loads the serialized state** and doesn't
 								set up the `nlp` object. This means that it requires the correct language class
 								to be initialized and all pipeline components to be added to the pipeline. If
 								you want to load a serialized pipeline from a directory, you should use
-												"model" terminology consistency in docs

											
										
										
											2020-09-03 14:13:03 +03:00
+								[`spacy.load`](/api/top-level#spacy.load), which will set everything up for you.
 								</Infobox>
-												💫 Update website (#3285)

<!--- Provide a general summary of your changes in the title. -->

## Description

The new website is implemented using [Gatsby](https://www.gatsbyjs.org) with [Remark](https://github.com/remarkjs/remark) and [MDX](https://mdxjs.com/). This allows authoring content in **straightforward Markdown** without the usual limitations. Standard elements can be overwritten with powerful [React](http://reactjs.org/) components and wherever Markdown syntax isn't enough, JSX components can be used. Hopefully, this update will also make it much easier to contribute to the docs. Once this PR is merged, I'll implement auto-deployment via [Netlify](https://netlify.com) on a specific branch (to avoid building the website on every PR). There's a bunch of other cool stuff that the new setup will allow us to do – including writing front-end tests, service workers, offline support, implementing a search and so on.

This PR also includes various new docs pages and content.
Resolves #3270. Resolves #3222. Resolves #2947. Resolves #2837.


### Types of change
enhancement

## Checklist
<!--- Before you submit the PR, go over this checklist and make sure you can
tick off all the boxes. [] -> [x] -->
- [x] I have submitted the spaCy Contributor Agreement.
- [x] I ran the tests, and all new and existing tests passed.
- [x] My changes don't require a change to the documentation, or if they do, I've added all required information.

											
										
										
											2019-02-17 21:31:19 +03:00
 								> #### Example
 								>
 								> ```python
 								> from spacy.language import Language
-												"model" terminology consistency in docs

											
										
										
											2020-09-03 14:13:03 +03:00
+								> nlp = Language().from_disk("/path/to/pipeline")
-												💫 Update website (#3285)

<!--- Provide a general summary of your changes in the title. -->

## Description

The new website is implemented using [Gatsby](https://www.gatsbyjs.org) with [Remark](https://github.com/remarkjs/remark) and [MDX](https://mdxjs.com/). This allows authoring content in **straightforward Markdown** without the usual limitations. Standard elements can be overwritten with powerful [React](http://reactjs.org/) components and wherever Markdown syntax isn't enough, JSX components can be used. Hopefully, this update will also make it much easier to contribute to the docs. Once this PR is merged, I'll implement auto-deployment via [Netlify](https://netlify.com) on a specific branch (to avoid building the website on every PR). There's a bunch of other cool stuff that the new setup will allow us to do – including writing front-end tests, service workers, offline support, implementing a search and so on.

This PR also includes various new docs pages and content.
Resolves #3270. Resolves #3222. Resolves #2947. Resolves #2837.


### Types of change
enhancement

## Checklist
<!--- Before you submit the PR, go over this checklist and make sure you can
tick off all the boxes. [] -> [x] -->
- [x] I have submitted the spaCy Contributor Agreement.
- [x] I ran the tests, and all new and existing tests passed.
- [x] My changes don't require a change to the documentation, or if they do, I've added all required information.

											
										
										
											2019-02-17 21:31:19 +03:00
+								>
-												"model" terminology consistency in docs

											
										
										
											2020-09-03 14:13:03 +03:00
+								> # Using language-specific subclass
-												💫 Update website (#3285)

<!--- Provide a general summary of your changes in the title. -->

## Description

The new website is implemented using [Gatsby](https://www.gatsbyjs.org) with [Remark](https://github.com/remarkjs/remark) and [MDX](https://mdxjs.com/). This allows authoring content in **straightforward Markdown** without the usual limitations. Standard elements can be overwritten with powerful [React](http://reactjs.org/) components and wherever Markdown syntax isn't enough, JSX components can be used. Hopefully, this update will also make it much easier to contribute to the docs. Once this PR is merged, I'll implement auto-deployment via [Netlify](https://netlify.com) on a specific branch (to avoid building the website on every PR). There's a bunch of other cool stuff that the new setup will allow us to do – including writing front-end tests, service workers, offline support, implementing a search and so on.

This PR also includes various new docs pages and content.
Resolves #3270. Resolves #3222. Resolves #2947. Resolves #2837.


### Types of change
enhancement

## Checklist
<!--- Before you submit the PR, go over this checklist and make sure you can
tick off all the boxes. [] -> [x] -->
- [x] I have submitted the spaCy Contributor Agreement.
- [x] I ran the tests, and all new and existing tests passed.
- [x] My changes don't require a change to the documentation, or if they do, I've added all required information.

											
										
										
											2019-02-17 21:31:19 +03:00
+								> from spacy.lang.en import English
-												"model" terminology consistency in docs

											
										
										
											2020-09-03 14:13:03 +03:00
+								> nlp = English().from_disk("/path/to/pipeline")
-												💫 Update website (#3285)

<!--- Provide a general summary of your changes in the title. -->

## Description

The new website is implemented using [Gatsby](https://www.gatsbyjs.org) with [Remark](https://github.com/remarkjs/remark) and [MDX](https://mdxjs.com/). This allows authoring content in **straightforward Markdown** without the usual limitations. Standard elements can be overwritten with powerful [React](http://reactjs.org/) components and wherever Markdown syntax isn't enough, JSX components can be used. Hopefully, this update will also make it much easier to contribute to the docs. Once this PR is merged, I'll implement auto-deployment via [Netlify](https://netlify.com) on a specific branch (to avoid building the website on every PR). There's a bunch of other cool stuff that the new setup will allow us to do – including writing front-end tests, service workers, offline support, implementing a search and so on.

This PR also includes various new docs pages and content.
Resolves #3270. Resolves #3222. Resolves #2947. Resolves #2837.


### Types of change
enhancement

## Checklist
<!--- Before you submit the PR, go over this checklist and make sure you can
tick off all the boxes. [] -> [x] -->
- [x] I have submitted the spaCy Contributor Agreement.
- [x] I ran the tests, and all new and existing tests passed.
- [x] My changes don't require a change to the documentation, or if they do, I've added all required information.

											
										
										
											2019-02-17 21:31:19 +03:00
+								> ```
-												Update docs, types and API consistency

											
										
										
											2020-08-17 17:45:24 +03:00
+								| Name           | Description                                                                                                 |
 								| -------------- | ----------------------------------------------------------------------------------------------------------- |
 								| `path`         | A path to a directory. Paths may be either strings or `Path`-like objects. ~~Union[str, Path]~~             |
 								| _keyword-only_ |                                                                                                             |
 								| `exclude`      | Names of pipeline components or [serialization fields](#serialization-fields) to exclude. ~~Iterable[str]~~ |
 								| **RETURNS**    | The modified `Language` object. ~~Language~~                                                                |
-												💫 Update website (#3285)

<!--- Provide a general summary of your changes in the title. -->

## Description

The new website is implemented using [Gatsby](https://www.gatsbyjs.org) with [Remark](https://github.com/remarkjs/remark) and [MDX](https://mdxjs.com/). This allows authoring content in **straightforward Markdown** without the usual limitations. Standard elements can be overwritten with powerful [React](http://reactjs.org/) components and wherever Markdown syntax isn't enough, JSX components can be used. Hopefully, this update will also make it much easier to contribute to the docs. Once this PR is merged, I'll implement auto-deployment via [Netlify](https://netlify.com) on a specific branch (to avoid building the website on every PR). There's a bunch of other cool stuff that the new setup will allow us to do – including writing front-end tests, service workers, offline support, implementing a search and so on.

This PR also includes various new docs pages and content.
Resolves #3270. Resolves #3222. Resolves #2947. Resolves #2837.


### Types of change
enhancement

## Checklist
<!--- Before you submit the PR, go over this checklist and make sure you can
tick off all the boxes. [] -> [x] -->
- [x] I have submitted the spaCy Contributor Agreement.
- [x] I ran the tests, and all new and existing tests passed.
- [x] My changes don't require a change to the documentation, or if they do, I've added all required information.

											
										
										
											2019-02-17 21:31:19 +03:00
 								## Language.to_bytes {#to_bytes tag="method"}
 								Serialize the current state to a binary string.
 								> #### Example
 								>
 								> ```python
 								> nlp_bytes = nlp.to_bytes()
 								> ```
-												Update docs, types and API consistency

											
										
										
											2020-08-17 17:45:24 +03:00
+								| Name           | Description                                                                                            |
 								| -------------- | ------------------------------------------------------------------------------------------------------ |
 								| _keyword-only_ |                                                                                                        |
 								| `exclude`      | Names of pipeline components or [serialization fields](#serialization-fields) to exclude. ~~iterable~~ |
 								| **RETURNS**    | The serialized form of the `Language` object. ~~bytes~~                                                |
-												💫 Update website (#3285)

<!--- Provide a general summary of your changes in the title. -->

## Description

The new website is implemented using [Gatsby](https://www.gatsbyjs.org) with [Remark](https://github.com/remarkjs/remark) and [MDX](https://mdxjs.com/). This allows authoring content in **straightforward Markdown** without the usual limitations. Standard elements can be overwritten with powerful [React](http://reactjs.org/) components and wherever Markdown syntax isn't enough, JSX components can be used. Hopefully, this update will also make it much easier to contribute to the docs. Once this PR is merged, I'll implement auto-deployment via [Netlify](https://netlify.com) on a specific branch (to avoid building the website on every PR). There's a bunch of other cool stuff that the new setup will allow us to do – including writing front-end tests, service workers, offline support, implementing a search and so on.

This PR also includes various new docs pages and content.
Resolves #3270. Resolves #3222. Resolves #2947. Resolves #2837.


### Types of change
enhancement

## Checklist
<!--- Before you submit the PR, go over this checklist and make sure you can
tick off all the boxes. [] -> [x] -->
- [x] I have submitted the spaCy Contributor Agreement.
- [x] I ran the tests, and all new and existing tests passed.
- [x] My changes don't require a change to the documentation, or if they do, I've added all required information.

											
										
										
											2019-02-17 21:31:19 +03:00
 								## Language.from_bytes {#from_bytes tag="method"}
 								Load state from a binary string. Note that this method is commonly used via the
 								subclasses like `English` or `German` to make language-specific functionality
-												Update docs [ci skip]

											
										
										
											2020-10-02 14:24:33 +03:00
+								like the [lexical attribute getters](/usage/linguistic-features#language-data)
-												💫 Update website (#3285)

<!--- Provide a general summary of your changes in the title. -->

## Description

The new website is implemented using [Gatsby](https://www.gatsbyjs.org) with [Remark](https://github.com/remarkjs/remark) and [MDX](https://mdxjs.com/). This allows authoring content in **straightforward Markdown** without the usual limitations. Standard elements can be overwritten with powerful [React](http://reactjs.org/) components and wherever Markdown syntax isn't enough, JSX components can be used. Hopefully, this update will also make it much easier to contribute to the docs. Once this PR is merged, I'll implement auto-deployment via [Netlify](https://netlify.com) on a specific branch (to avoid building the website on every PR). There's a bunch of other cool stuff that the new setup will allow us to do – including writing front-end tests, service workers, offline support, implementing a search and so on.

This PR also includes various new docs pages and content.
Resolves #3270. Resolves #3222. Resolves #2947. Resolves #2837.


### Types of change
enhancement

## Checklist
<!--- Before you submit the PR, go over this checklist and make sure you can
tick off all the boxes. [] -> [x] -->
- [x] I have submitted the spaCy Contributor Agreement.
- [x] I ran the tests, and all new and existing tests passed.
- [x] My changes don't require a change to the documentation, or if they do, I've added all required information.

											
										
										
											2019-02-17 21:31:19 +03:00
+								available to the loaded object.
 								> #### Example
 								>
 								> ```python
 								> from spacy.lang.en import English
 								> nlp_bytes = nlp.to_bytes()
 								> nlp2 = English()
 								> nlp2.from_bytes(nlp_bytes)
 								> ```
-												Update docs, types and API consistency

											
										
										
											2020-08-17 17:45:24 +03:00
+								| Name           | Description                                                                                                 |
 								| -------------- | ----------------------------------------------------------------------------------------------------------- |
 								| `bytes_data`   | The data to load from. ~~bytes~~                                                                            |
 								| _keyword-only_ |                                                                                                             |
 								| `exclude`      | Names of pipeline components or [serialization fields](#serialization-fields) to exclude. ~~Iterable[str]~~ |
 								| **RETURNS**    | The `Language` object. ~~Language~~                                                                         |
-												💫 Update website (#3285)

<!--- Provide a general summary of your changes in the title. -->

## Description

The new website is implemented using [Gatsby](https://www.gatsbyjs.org) with [Remark](https://github.com/remarkjs/remark) and [MDX](https://mdxjs.com/). This allows authoring content in **straightforward Markdown** without the usual limitations. Standard elements can be overwritten with powerful [React](http://reactjs.org/) components and wherever Markdown syntax isn't enough, JSX components can be used. Hopefully, this update will also make it much easier to contribute to the docs. Once this PR is merged, I'll implement auto-deployment via [Netlify](https://netlify.com) on a specific branch (to avoid building the website on every PR). There's a bunch of other cool stuff that the new setup will allow us to do – including writing front-end tests, service workers, offline support, implementing a search and so on.

This PR also includes various new docs pages and content.
Resolves #3270. Resolves #3222. Resolves #2947. Resolves #2837.


### Types of change
enhancement

## Checklist
<!--- Before you submit the PR, go over this checklist and make sure you can
tick off all the boxes. [] -> [x] -->
- [x] I have submitted the spaCy Contributor Agreement.
- [x] I ran the tests, and all new and existing tests passed.
- [x] My changes don't require a change to the documentation, or if they do, I've added all required information.

											
										
										
											2019-02-17 21:31:19 +03:00
 								## Attributes {#attributes}
-												Update docs

											
										
										
											2020-08-29 13:36:05 +03:00
+								| Name                                          | Description                                                                                                                                    |
 								| --------------------------------------------- | ---------------------------------------------------------------------------------------------------------------------------------------------- |
 								| `vocab`                                       | A container for the lexical types. ~~Vocab~~                                                                                                   |
 								| `tokenizer`                                   | The tokenizer. ~~Tokenizer~~                                                                                                                   |
 								| `make_doc`                                    | Callable that takes a string and returns a `Doc`. ~~Callable[[str], Doc]~~                                                                     |
 								| `pipeline`                                    | List of `(name, component)` tuples describing the current processing pipeline, in order. ~~List[Tuple[str, Callable[[Doc], Doc]]]~~            |
 								| `pipe_names` <Tag variant="new">2</Tag>       | List of pipeline component names, in order. ~~List[str]~~                                                                                      |
 								| `pipe_labels` <Tag variant="new">2.2</Tag>    | List of labels set by the pipeline components, if available, keyed by component name. ~~Dict[str, List[str]]~~                                 |
 								| `pipe_factories` <Tag variant="new">2.2</Tag> | Dictionary of pipeline component names, mapped to their factory names. ~~Dict[str, str]~~                                                      |
 								| `factories`                                   | All available factory functions, keyed by name. ~~Dict[str, Callable[[...], Callable[[Doc], Doc]]]~~                                           |
 								| `factory_names` <Tag variant="new">3</Tag>    | List of all available factory names. ~~List[str]~~                                                                                             |
 								| `components` <Tag variant="new">3</Tag>       | List of all available `(name, component)` tuples, including components that are currently disabled. ~~List[Tuple[str, Callable[[Doc], Doc]]]~~ |
 								| `component_names` <Tag variant="new">3</Tag>  | List of all available component names, including components that are currently disabled. ~~List[str]~~                                         |
 								| `disabled` <Tag variant="new">3</Tag>         | Names of components that are currently disabled and don't run as part of the pipeline. ~~List[str]~~                                           |
-												"model" terminology consistency in docs

											
										
										
											2020-09-03 14:13:03 +03:00
+								| `path` <Tag variant="new">2</Tag>             | Path to the pipeline data directory, if a pipeline is loaded from a path or package. Otherwise `None`. ~~Optional[Path]~~                      |
-												💫 Update website (#3285)

<!--- Provide a general summary of your changes in the title. -->

## Description

The new website is implemented using [Gatsby](https://www.gatsbyjs.org) with [Remark](https://github.com/remarkjs/remark) and [MDX](https://mdxjs.com/). This allows authoring content in **straightforward Markdown** without the usual limitations. Standard elements can be overwritten with powerful [React](http://reactjs.org/) components and wherever Markdown syntax isn't enough, JSX components can be used. Hopefully, this update will also make it much easier to contribute to the docs. Once this PR is merged, I'll implement auto-deployment via [Netlify](https://netlify.com) on a specific branch (to avoid building the website on every PR). There's a bunch of other cool stuff that the new setup will allow us to do – including writing front-end tests, service workers, offline support, implementing a search and so on.

This PR also includes various new docs pages and content.
Resolves #3270. Resolves #3222. Resolves #2947. Resolves #2837.


### Types of change
enhancement

## Checklist
<!--- Before you submit the PR, go over this checklist and make sure you can
tick off all the boxes. [] -> [x] -->
- [x] I have submitted the spaCy Contributor Agreement.
- [x] I ran the tests, and all new and existing tests passed.
- [x] My changes don't require a change to the documentation, or if they do, I've added all required information.

											
										
										
											2019-02-17 21:31:19 +03:00
 								## Class attributes {#class-attributes}
-												Update docs [ci skip]

											
										
										
											2020-09-12 18:05:10 +03:00
+								| Name             | Description                                                                                                                                                                       |
 								| ---------------- | --------------------------------------------------------------------------------------------------------------------------------------------------------------------------------- |
 								| `Defaults`       | Settings, data and factory methods for creating the `nlp` object and processing pipeline. ~~Defaults~~                                                                            |
 								| `lang`           | Two-letter language ID, i.e. [ISO code](https://en.wikipedia.org/wiki/List_of_ISO_639-1_codes). ~~str~~                                                                           |
 								| `default_config` | Base [config](/usage/training#config) to use for [Language.config](/api/language#config). Defaults to [`default_config.cfg`](%%GITHUB_SPACY/spacy/default_config.cfg). ~~Config~~ |
-												Update docs [ci skip]

											
										
										
											2020-07-25 19:51:12 +03:00
 								## Defaults {#defaults}
 								The following attributes can be set on the `Language.Defaults` class to
 								customize the default language data:
 								> #### Example
 								>
 								> ```python
 								> from spacy.language import language
 								> from spacy.lang.tokenizer_exceptions import URL_MATCH
 								> from thinc.api import Config
 								>
 								> DEFAULT_CONFIFG = """
 								> [nlp.tokenizer]
 								> @tokenizers = "MyCustomTokenizer.v1"
 								> """
 								>
 								> class Defaults(Language.Defaults):
 								>    stop_words = set()
 								>    tokenizer_exceptions = {}
 								>    prefixes = tuple()
 								>    suffixes = tuple()
 								>    infixes = tuple()
 								>    token_match = None
 								>    url_match = URL_MATCH
 								>    lex_attr_getters = {}
 								>    syntax_iterators = {}
 								>    writing_system = {"direction": "ltr", "has_case": True, "has_letters": True}
 								>    config = Config().from_str(DEFAULT_CONFIG)
 								> ```
-												Update docs [ci skip]

											
										
										
											2020-09-12 18:05:10 +03:00
+								| Name                              | Description                                                                                                                                                                                                                                                                                                      |
 								| --------------------------------- | ---------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------- |
 								| `stop_words`                      | List of stop words, used for `Token.is_stop`.<br />**Example:** [`stop_words.py`](%%GITHUB_SPACY/spacy/lang/en/stop_words.py) ~~Set[str]~~                                                                                                                                                                       |
 								| `tokenizer_exceptions`            | Tokenizer exception rules, string mapped to list of token attributes.<br />**Example:** [`de/tokenizer_exceptions.py`](%%GITHUB_SPACY/spacy/lang/de/tokenizer_exceptions.py) ~~Dict[str, List[dict]]~~                                                                                                           |
 								| `prefixes`, `suffixes`, `infixes` | Prefix, suffix and infix rules for the default tokenizer.<br />**Example:** [`puncutation.py`](%%GITHUB_SPACY/spacy/lang/punctuation.py) ~~Optional[List[Union[str, Pattern]]]~~                                                                                                                                 |
 								| `token_match`                     | Optional regex for matching strings that should never be split, overriding the infix rules.<br />**Example:** [`fr/tokenizer_exceptions.py`](%%GITHUB_SPACY/spacy/lang/fr/tokenizer_exceptions.py) ~~Optional[Pattern]~~                                                                                         |
 								| `url_match`                       | Regular expression for matching URLs. Prefixes and suffixes are removed before applying the match.<br />**Example:** [`tokenizer_exceptions.py`](%%GITHUB_SPACY/spacy/lang/tokenizer_exceptions.py) ~~Optional[Pattern]~~                                                                                        |
 								| `lex_attr_getters`                | Custom functions for setting lexical attributes on tokens, e.g. `like_num`.<br />**Example:** [`lex_attrs.py`](%%GITHUB_SPACY/spacy/lang/en/lex_attrs.py) ~~Dict[int, Callable[[str], Any]]~~                                                                                                                    |
 								| `syntax_iterators`                | Functions that compute views of a `Doc` object based on its syntax. At the moment, only used for [noun chunks](/usage/linguistic-features#noun-chunks).<br />**Example:** [`syntax_iterators.py`](%%GITHUB_SPACY/spacy/lang/en/syntax_iterators.py). ~~Dict[str, Callable[[Union[Doc, Span]], Iterator[Span]]]~~ |
 								| `writing_system`                  | Information about the language's writing system, available via `Vocab.writing_system`. Defaults to: `{"direction": "ltr", "has_case": True, "has_letters": True}.`.<br />**Example:** [`zh/__init__.py`](%%GITHUB_SPACY/spacy/lang/zh/__init__.py) ~~Dict[str, Any]~~                                            |
 								| `config`                          | Default [config](/usage/training#config) added to `nlp.config`. This can include references to custom tokenizers or lemmatizers.<br />**Example:** [`zh/__init__.py`](%%GITHUB_SPACY/spacy/lang/zh/__init__.py) ~~Config~~                                                                                       |
-												💫 Make serialization methods consistent (#3385)

* Make serialization methods consistent

exclude keyword argument instead of random named keyword arguments and deprecation handling

* Update docs and add section on serialization fields

											
										
										
											2019-03-10 21:16:45 +03:00
 								## Serialization fields {#serialization-fields}
 								During serialization, spaCy will export several data fields used to restore
 								different aspects of the object. If needed, you can exclude them from
 								serialization by passing in the string names via the `exclude` argument.
 								> #### Example
 								>
 								> ```python
 								> data = nlp.to_bytes(exclude=["tokenizer", "vocab"])
-												"model" terminology consistency in docs

											
										
										
											2020-09-03 14:13:03 +03:00
+								> nlp.from_disk("/pipeline", exclude=["ner"])
-												💫 Make serialization methods consistent (#3385)

* Make serialization methods consistent

exclude keyword argument instead of random named keyword arguments and deprecation handling

* Update docs and add section on serialization fields

											
										
										
											2019-03-10 21:16:45 +03:00
+								> ```
-												Update docs and consistency [ci skip]

											
										
										
											2020-08-18 15:39:40 +03:00
+								| Name        | Description                                                        |
 								| ----------- | ------------------------------------------------------------------ |
 								| `vocab`     | The shared [`Vocab`](/api/vocab).                                  |
 								| `tokenizer` | Tokenization rules and exceptions.                                 |
 								| `meta`      | The meta data, available as [`Language.meta`](/api/language#meta). |
 								| ...         | String names of pipeline components, e.g. `"ner"`.                 |
-												API docs, docstrings and argument consistency

											
										
										
											2020-07-27 19:11:45 +03:00
 								## FactoryMeta {#factorymeta new="3" tag="dataclass"}
 								The `FactoryMeta` contains the information about the component and its default
 								provided by the [`@Language.component`](/api/language#component) or
 								[`@Language.factory`](/api/language#factory) decorator. It's created whenever a
-												Update docstrings, docs and types

											
										
										
											2020-07-29 12:36:42 +03:00
+								component is defined and stored on the `Language` class for each component
 								instance and factory instance.
-												API docs, docstrings and argument consistency

											
										
										
											2020-07-27 19:11:45 +03:00
-												Fix handling of score_weights

											
										
										
											2020-09-24 11:27:33 +03:00
+								| Name                    | Description                                                                                                                                                                                                                                                                                                                        |
 								| ----------------------- | ---------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------- |
 								| `factory`               | The name of the registered component factory. ~~str~~                                                                                                                                                                                                                                                                              |
 								| `default_config`        | The default config, describing the default values of the factory arguments. ~~Dict[str, Any]~~                                                                                                                                                                                                                                     |
 								| `assigns`               | `Doc` or `Token` attributes assigned by this component, e.g. `["token.ent_id"]`. Used for [pipe analysis](/usage/processing-pipelines#analysis). ~~Iterable[str]~~                                                                                                                                                                 |
 								| `requires`              | `Doc` or `Token` attributes required by this component, e.g. `["token.ent_id"]`. Used for [pipe analysis](/usage/processing-pipelines#analysis). ~~Iterable[str]~~                                                                                                                                                                 |
 								| `retokenizes`           | Whether the component changes tokenization. Used for [pipe analysis](/usage/processing-pipelines#analysis). ~~bool~~                                                                                                                                                                                                               |
 								| `default_score_weights` | The scores to report during training, and their default weight towards the final score used to select the best model. Weights should sum to `1.0` per component and will be combined and normalized for the whole pipeline. If a weight is set to `None`, the score will not be logged or weighted. ~~Dict[str, Optional[float]]~~ |
 								| `scores`                | All scores set by the components if it's trainable, e.g. `["ents_f", "ents_r", "ents_p"]`. Based on the `default_score_weights` and used for [pipe analysis](/usage/processing-pipelines#analysis). ~~Iterable[str]~~                                                                                                              |