---
title: Language
teaser: A text-processing pipeline
tag: class
source: spacy/language.py
---

Usually you'll load this once per process as `nlp` and pass the instance around
your application. The `Language` class is created when you call
[`spacy.load()`](/api/top-level#spacy.load) and contains the shared vocabulary
and [language data](/usage/adding-languages), optional model data loaded from a
[model package](/models) or a path, and a
[processing pipeline](/usage/processing-pipelines) containing components like
the tagger or parser that are called on a document in order. You can also add
your own processing pipeline components that take a `Doc` object, modify it and
return it.

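For instance, a typical entry point looks like this (a minimal sketch;
`en_core_web_sm` is just an assumed example of an installed model package):

```python
import spacy

# spacy.load() returns a Language instance with the model's pipeline attached
nlp = spacy.load("en_core_web_sm")
doc = nlp("This is a sentence.")
print([(token.text, token.pos_) for token in doc])
```
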
## Language.\_\_init\_\_ {#init tag="method"}

Initialize a `Language` object.

> #### Example
>
> ```python
> # Construction from subclass
> from spacy.lang.en import English
> nlp = English()
>
> # Construction from scratch
> from spacy.vocab import Vocab
> from spacy.language import Language
> nlp = Language(Vocab())
> ```

| Name | Type | Description |
| ---- | ---- | ----------- |
| `vocab` | `Vocab` | A `Vocab` object. If `True`, a vocab is created using the default language data settings. |
| _keyword-only_ | | |
| `max_length` | int | Maximum number of characters allowed in a single text. Defaults to `10 ** 6`. |
| `meta` | dict | Custom meta data for the `Language` class. Models write to this attribute to add their meta data. |
| `create_tokenizer` | `Callable` | Optional function that receives the `nlp` object and returns a tokenizer. |

## Language.from_config {#from_config tag="classmethod"}

Create a `Language` object from a loaded config. Will set up the tokenizer and
language data, add pipeline components based on the pipeline and components
defined in the config and validate the results. If no config is provided, the
default config of the given language is used. This is also how spaCy loads a
model under the hood based on its [`config.cfg`](/api/data-formats#config).

> #### Example
>
> ```python
> from thinc.api import Config
> from spacy.language import Language
>
> config = Config().from_disk("./config.cfg")
> nlp = Language.from_config(config)
> ```

| Name | Type | Description |
| ---- | ---- | ----------- |
| `config` | `Dict[str, Any]` / [`Config`](https://thinc.ai/docs/api-config#config) | The loaded config. |
| _keyword-only_ | | |
| `disable` | `Iterable[str]` | List of pipeline component names to disable. |
| `auto_fill` | bool | Whether to automatically fill in missing values in the config, based on defaults and function argument annotations. Defaults to `True`. |
| `validate` | bool | Whether to validate the component config and arguments against the types expected by the factory. Defaults to `True`. |
| **RETURNS** | `Language` | The initialized object. |

## Language.component {#component tag="classmethod" new="3"}

Register a custom pipeline component under a given name. This allows
initializing the component by name using
[`Language.add_pipe`](/api/language#add_pipe) and referring to it in
[config files](/usage/training#config). This classmethod and decorator is
intended for **simple stateless functions** that take a `Doc` and return it. For
more complex stateful components that allow settings and need access to the
shared `nlp` object, use the [`Language.factory`](/api/language#factory)
decorator. For more details and examples, see the
[usage documentation](/usage/processing-pipelines#custom-components).

> #### Example
>
> ```python
> from spacy.language import Language
>
> # Usage as a decorator
> @Language.component("my_component")
> def my_component(doc):
>     # Do something to the doc
>     return doc
>
> # Usage as a function
> Language.component("my_component2", func=my_component)
> ```

| Name | Type | Description |
| ---- | ---- | ----------- |
| `name` | str | The name of the component factory. |
| _keyword-only_ | | |
| `assigns` | `Iterable[str]` | `Doc` or `Token` attributes assigned by this component, e.g. `["token.ent_id"]`. Used for pipeline analysis. <!-- TODO: link to something --> |
| `requires` | `Iterable[str]` | `Doc` or `Token` attributes required by this component, e.g. `["token.ent_id"]`. Used for pipeline analysis. <!-- TODO: link to something --> |
| `retokenizes` | bool | Whether the component changes tokenization. Used for pipeline analysis. <!-- TODO: link to something --> |
| `scores` | `Iterable[str]` | All scores set by the component if it's trainable, e.g. `["ents_f", "ents_r", "ents_p"]`. |
| `default_score_weights` | `Dict[str, float]` | The scores to report during training, and their default weight towards the final score used to select the best model. Weights should sum to `1.0` per component and will be combined and normalized for the whole pipeline. |
| `func` | `Optional[Callable]` | Optional function if not used as a decorator. |

## Language.factory {#factory tag="classmethod"}

Register a custom pipeline component factory under a given name. This allows
initializing the component by name using
[`Language.add_pipe`](/api/language#add_pipe) and referring to it in
[config files](/usage/training#config). The registered factory function needs to
take at least two **named arguments** which spaCy fills in automatically: `nlp`
for the current `nlp` object and `name` for the component instance name. This
can be useful to distinguish multiple instances of the same component and allows
trainable components to add custom losses using the component instance name. The
`default_config` defines the default values of the remaining factory arguments.
It's merged into the [`nlp.config`](/api/language#config). For more details and
examples, see the
[usage documentation](/usage/processing-pipelines#custom-components).

> #### Example
>
> ```python
> from spacy.language import Language
>
> # Usage as a decorator
> @Language.factory(
>     "my_component",
>     default_config={"some_setting": True},
> )
> def create_my_component(nlp, name, some_setting):
>     return MyComponent(some_setting)
>
> # Usage as function
> Language.factory(
>     "my_component",
>     default_config={"some_setting": True},
>     func=create_my_component
> )
> ```

| Name | Type | Description |
| ---- | ---- | ----------- |
| `name` | str | The name of the component factory. |
| _keyword-only_ | | |
| `default_config` | `Dict[str, Any]` | The default config, describing the default values of the factory arguments. |
| `assigns` | `Iterable[str]` | `Doc` or `Token` attributes assigned by this component, e.g. `["token.ent_id"]`. Used for pipeline analysis. <!-- TODO: link to something --> |
| `requires` | `Iterable[str]` | `Doc` or `Token` attributes required by this component, e.g. `["token.ent_id"]`. Used for pipeline analysis. <!-- TODO: link to something --> |
| `retokenizes` | bool | Whether the component changes tokenization. Used for pipeline analysis. <!-- TODO: link to something --> |
| `scores` | `Iterable[str]` | All scores set by the component if it's trainable, e.g. `["ents_f", "ents_r", "ents_p"]`. |
| `default_score_weights` | `Dict[str, float]` | The scores to report during training, and their default weight towards the final score used to select the best model. Weights should sum to `1.0` per component and will be combined and normalized for the whole pipeline. |
| `func` | `Optional[Callable]` | Optional function if not used as a decorator. |

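The example above returns a `MyComponent` instance without defining it, so
here's a minimal sketch of what such a stateful component might look like (the
class and its behavior are illustrative assumptions, not spaCy API):

```python
class MyComponent:
    def __init__(self, some_setting: bool):
        # Store the setting passed in via the factory's default_config
        self.some_setting = some_setting

    def __call__(self, doc):
        # A pipeline component takes a Doc, modifies it and returns it
        if self.some_setting:
            pass  # do something to the doc here
        return doc
```
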
## Language.\_\_call\_\_ {#call tag="method"}

Apply the pipeline to some text. The text can span multiple sentences, and can
contain arbitrary whitespace. Alignment into the original string is preserved.

> #### Example
>
> ```python
> doc = nlp("An example sentence. Another sentence.")
> assert (doc[0].text, doc[0].head.tag_) == ("An", "NN")
> ```

| Name | Type | Description |
| ---- | ---- | ----------- |
| `text` | str | The text to be processed. |
| _keyword-only_ | | |
| `disable` | `List[str]` | Names of pipeline components to [disable](/usage/processing-pipelines#disabling). |
| `component_cfg` | `Dict[str, dict]` | Optional dictionary of keyword arguments for components, keyed by component names. Defaults to `None`. |
| **RETURNS** | [`Doc`](/api/doc) | A container for accessing the annotations. |

## Language.pipe {#pipe tag="method"}

Process texts as a stream, and yield `Doc` objects in order. This is usually
more efficient than processing texts one-by-one.

> #### Example
>
> ```python
> texts = ["One document.", "...", "Lots of documents"]
> for doc in nlp.pipe(texts, batch_size=50):
>     assert doc.is_parsed
> ```

| Name | Type | Description |
| ---- | ---- | ----------- |
| `texts` | `Iterable[str]` | A sequence of strings. |
| _keyword-only_ | | |
| `as_tuples` | bool | If set to `True`, inputs should be a sequence of `(text, context)` tuples. Output will then be a sequence of `(doc, context)` tuples. Defaults to `False`. |
| `batch_size` | int | The number of texts to buffer. |
| `disable` | `List[str]` | Names of pipeline components to [disable](/usage/processing-pipelines#disabling). |
| `cleanup` | bool | If `True`, unneeded strings are freed to control memory use. Experimental. |
| `component_cfg` | `Dict[str, dict]` | Optional dictionary of keyword arguments for components, keyed by component names. Defaults to `None`. |
| `n_process` <Tag variant="new">2.2.2</Tag> | int | Number of processors to use, only supported in Python 3. Defaults to `1`. |
| **YIELDS** | `Doc` | Documents in the order of the original text. |

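For instance, `as_tuples` lets you pass metadata through the pipeline alongside
each text (a small sketch; the data pairs are made up for illustration):

```python
data = [("A first document.", {"id": 1}), ("Another one.", {"id": 2})]
for doc, context in nlp.pipe(data, as_tuples=True):
    # The context object is passed through unchanged
    print(context["id"], doc.text)
```
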
## Language.begin_training {#begin_training tag="method"}

Initialize the pipe for training, using data examples if available. Returns an
[`Optimizer`](https://thinc.ai/docs/api-optimizers) object.

> #### Example
>
> ```python
> optimizer = nlp.begin_training(get_examples)
> ```

| Name | Type | Description |
| ---- | ---- | ----------- |
| `get_examples` | `Callable[[], Iterable[Example]]` | Optional function that returns gold-standard annotations in the form of [`Example`](/api/example) objects. |
| _keyword-only_ | | |
| `sgd` | [`Optimizer`](https://thinc.ai/docs/api-optimizers) | An optional optimizer. Will be created via [`create_optimizer`](/api/language#create_optimizer) if not set. |
| **RETURNS** | [`Optimizer`](https://thinc.ai/docs/api-optimizers) | The optimizer. |

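The `get_examples` callback is just a function returning
[`Example`](/api/example) objects. A minimal sketch, assuming spaCy v3's
`spacy.training.Example` and a `train_data` list of `(text, annotations)`
pairs:

```python
from spacy.training import Example

def get_examples():
    # Build gold-standard Examples from raw text plus annotation dicts
    return [
        Example.from_dict(nlp.make_doc(text), annotations)
        for text, annotations in train_data
    ]

optimizer = nlp.begin_training(get_examples)
```
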
## Language.resume_training {#resume_training tag="method,experimental" new="3"}

Continue training a pretrained model. Create and return an optimizer, and
initialize "rehearsal" for any pipeline component that has a `rehearse` method.
Rehearsal is used to prevent models from "forgetting" their initialized
"knowledge". To perform rehearsal, collect samples of text you want the models
to retain performance on, and call [`nlp.rehearse`](/api/language#rehearse) with
a batch of [`Example`](/api/example) objects.

> #### Example
>
> ```python
> optimizer = nlp.resume_training()
> nlp.rehearse(examples, sgd=optimizer)
> ```

| Name | Type | Description |
| ---- | ---- | ----------- |
| _keyword-only_ | | |
| `sgd` | [`Optimizer`](https://thinc.ai/docs/api-optimizers) | An optional optimizer. Will be created via [`create_optimizer`](/api/language#create_optimizer) if not set. |
| **RETURNS** | [`Optimizer`](https://thinc.ai/docs/api-optimizers) | The optimizer. |

## Language.update {#update tag="method"}

Update the models in the pipeline.

> #### Example
>
> ```python
> for raw_text, entity_offsets in train_data:
>     doc = nlp.make_doc(raw_text)
>     example = Example.from_dict(doc, {"entities": entity_offsets})
>     nlp.update([example], sgd=optimizer)
> ```

| Name | Type | Description |
| ---- | ---- | ----------- |
| `examples` | `Iterable[Example]` | A batch of `Example` objects to learn from. |
| _keyword-only_ | | |
| `drop` | float | The dropout rate. |
| `sgd` | [`Optimizer`](https://thinc.ai/docs/api-optimizers) | The optimizer. |
| `losses` | `Dict[str, float]` | Dictionary to update with the loss, keyed by pipeline component. |
| `component_cfg` | `Dict[str, dict]` | Optional dictionary of keyword arguments for components, keyed by component names. Defaults to `None`. |
| **RETURNS** | `Dict[str, float]` | The updated `losses` dictionary. |

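Putting it together, a bare-bones training loop over `nlp.update` might look
like this (a sketch under the same assumptions as above: `train_data` holds
`(text, annotations)` pairs and `Example` comes from `spacy.training`):

```python
import random
from spacy.util import minibatch
from spacy.training import Example

optimizer = nlp.begin_training()
for epoch in range(10):
    random.shuffle(train_data)
    losses = {}
    for batch in minibatch(train_data, size=8):
        # Convert each raw (text, annotations) pair into an Example
        examples = [
            Example.from_dict(nlp.make_doc(text), annots)
            for text, annots in batch
        ]
        nlp.update(examples, sgd=optimizer, drop=0.2, losses=losses)
    print(epoch, losses)
```
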
## Language.rehearse {#rehearse tag="method,experimental"}

Perform a "rehearsal" update from a batch of data. Rehearsal updates teach the
current model to make predictions similar to an initial model, to try to address
the "catastrophic forgetting" problem. This feature is experimental.

> #### Example
>
> ```python
> optimizer = nlp.resume_training()
> losses = nlp.rehearse(examples, sgd=optimizer)
> ```

| Name | Type | Description |
| ---- | ---- | ----------- |
| `examples` | `Iterable[Example]` | A batch of [`Example`](/api/example) objects to learn from. |
| _keyword-only_ | | |
| `drop` | float | The dropout rate. |
| `sgd` | [`Optimizer`](https://thinc.ai/docs/api-optimizers) | The optimizer. |
| `losses` | `Dict[str, float]` | Optional record of the loss during training. Updated using the component name as the key. |
| **RETURNS** | `Dict[str, float]` | The updated `losses` dictionary. |

## Language.evaluate {#evaluate tag="method"}

Evaluate a model's pipeline components.

> #### Example
>
> ```python
> scores = nlp.evaluate(examples, verbose=True)
> print(scores)
> ```

| Name | Type | Description |
| ---- | ---- | ----------- |
| `examples` | `Iterable[Example]` | A batch of [`Example`](/api/example) objects to evaluate on. |
| _keyword-only_ | | |
| `verbose` | bool | Print debugging information. |
| `batch_size` | int | The batch size to use. |
| `scorer` | `Scorer` | Optional [`Scorer`](/api/scorer) to use. If not passed in, a new one will be created. |
| `component_cfg` | `Dict[str, dict]` | Optional dictionary of keyword arguments for components, keyed by component names. Defaults to `None`. |
| **RETURNS** | `Dict[str, Union[float, dict]]` | A dictionary of evaluation scores. |

## Language.use_params {#use_params tag="contextmanager, method"}

Replace weights of models in the pipeline with those provided in the params
dictionary. Can be used as a context manager, in which case models go back to
their original weights after the block.

> #### Example
>
> ```python
> with nlp.use_params(optimizer.averages):
>     nlp.to_disk("/tmp/checkpoint")
> ```

| Name | Type | Description |
| ---- | ---- | ----------- |
| `params` | dict | A dictionary of parameters keyed by model ID. |

## Language.create_pipe {#create_pipe tag="method" new="2"}

Create a pipeline component from a factory.

<Infobox title="Changed in v3.0" variant="warning">

As of v3.0, the [`Language.add_pipe`](/api/language#add_pipe) method also takes
the string name of the factory, creates the component, adds it to the pipeline
and returns it. The `Language.create_pipe` method is now mostly used internally.
To create a component and add it to the pipeline, you should always use
`Language.add_pipe`.

</Infobox>

> #### Example
>
> ```python
> parser = nlp.create_pipe("parser")
> ```

| Name | Type | Description |
| ---- | ---- | ----------- |
| `factory_name` | str | Name of the registered component factory. |
| `name` | str | Optional unique name of pipeline component instance. If not set, the factory name is used. An error is raised if the name already exists in the pipeline. |
| _keyword-only_ | | |
| `config` <Tag variant="new">3</Tag> | `Dict[str, Any]` | Optional config parameters to use for this component. Will be merged with the `default_config` specified by the component factory. |
| `validate` <Tag variant="new">3</Tag> | bool | Whether to validate the component config and arguments against the types expected by the factory. Defaults to `True`. |
| **RETURNS** | callable | The pipeline component. |

## Language.add_pipe {#add_pipe tag="method" new="2"}

Add a component to the processing pipeline. Expects a name that maps to a
component factory registered using
[`@Language.component`](/api/language#component) or
[`@Language.factory`](/api/language#factory). Components should be callables
that take a `Doc` object, modify it and return it. Only one of `before`,
`after`, `first` or `last` can be set. Default behavior is `last=True`.

<Infobox title="Changed in v3.0" variant="warning">

As of v3.0, the [`Language.add_pipe`](/api/language#add_pipe) method doesn't
take callables anymore and instead expects the name of a component factory
registered using [`@Language.component`](/api/language#component) or
[`@Language.factory`](/api/language#factory). It now takes care of creating the
component, adds it to the pipeline and returns it.

</Infobox>

> #### Example
>
> ```python
> @Language.component("component")
> def component_func(doc):
>     # modify Doc and return it
>     return doc
>
> nlp.add_pipe("component", before="ner")
> component = nlp.add_pipe("component", name="custom_name", last=True)
> ```

| Name | Type | Description |
| ---- | ---- | ----------- |
| `factory_name` | str | Name of the registered component factory. |
| `name` | str | Optional unique name of pipeline component instance. If not set, the factory name is used. An error is raised if the name already exists in the pipeline. |
| _keyword-only_ | | |
| `before` | str / int | Component name or index to insert component directly before. |
| `after` | str / int | Component name or index to insert component directly after. |
| `first` | bool | Insert component first / not first in the pipeline. |
| `last` | bool | Insert component last / not last in the pipeline. |
| `config` <Tag variant="new">3</Tag> | `Dict[str, Any]` | Optional config parameters to use for this component. Will be merged with the `default_config` specified by the component factory. |
| `validate` <Tag variant="new">3</Tag> | bool | Whether to validate the component config and arguments against the types expected by the factory. Defaults to `True`. |
| **RETURNS** <Tag variant="new">3</Tag> | callable | The pipeline component. |

## Language.has_factory {#has_factory tag="classmethod" new="3"}

Check whether a factory name is registered on the `Language` class or subclass.
Will check for
[language-specific factories](/usage/processing-pipelines#factories-language)
registered on the subclass, as well as general-purpose factories registered on
the `Language` base class, available to all subclasses.

> #### Example
>
> ```python
> from spacy.language import Language
> from spacy.lang.en import English
>
> @English.component("component")
> def component(doc):
>     return doc
>
> assert English.has_factory("component")
> assert not Language.has_factory("component")
> ```

| Name | Type | Description |
| ---- | ---- | ----------- |
| `name` | str | Name of the pipeline factory to check. |
| **RETURNS** | bool | Whether a factory of that name is registered on the class. |

## Language.has_pipe {#has_pipe tag="method" new="2"}

Check whether a component is present in the pipeline. Equivalent to
`name in nlp.pipe_names`.

> #### Example
>
> ```python
> @Language.component("component")
> def component(doc):
>     return doc
>
> nlp.add_pipe("component", name="my_component")
> assert "my_component" in nlp.pipe_names
> assert nlp.has_pipe("my_component")
> ```

| Name | Type | Description |
| ---- | ---- | ----------- |
| `name` | str | Name of the pipeline component to check. |
| **RETURNS** | bool | Whether a component of that name exists in the pipeline. |

## Language.get_pipe {#get_pipe tag="method" new="2"}

Get a pipeline component for a given component name.

> #### Example
>
> ```python
> parser = nlp.get_pipe("parser")
> custom_component = nlp.get_pipe("custom_component")
> ```

| Name | Type | Description |
| ---- | ---- | ----------- |
| `name` | str | Name of the pipeline component to get. |
| **RETURNS** | callable | The pipeline component. |

## Language.replace_pipe {#replace_pipe tag="method" new="2"}

Replace a component in the pipeline.

> #### Example
>
> ```python
> nlp.replace_pipe("parser", my_custom_parser)
> ```

| Name | Type | Description |
| ---- | ---- | ----------- |
| `name` | str | Name of the component to replace. |
| `component` | callable | The pipeline component to insert. |
| _keyword-only_ | | |
| `config` <Tag variant="new">3</Tag> | `Dict[str, Any]` | Optional config parameters to use for the new component. Will be merged with the `default_config` specified by the component factory. |
| `validate` <Tag variant="new">3</Tag> | bool | Whether to validate the component config and arguments against the types expected by the factory. Defaults to `True`. |

## Language.rename_pipe {#rename_pipe tag="method" new="2"}

Rename a component in the pipeline. Useful to create custom names for
pre-defined and pre-loaded components. To change the default name of a component
added to the pipeline, you can also use the `name` argument on
[`add_pipe`](/api/language#add_pipe).

> #### Example
>
> ```python
> nlp.rename_pipe("parser", "spacy_parser")
> ```

| Name | Type | Description |
| ---- | ---- | ----------- |
| `old_name` | str | Name of the component to rename. |
| `new_name` | str | New name of the component. |

## Language.remove_pipe {#remove_pipe tag="method" new="2"}

Remove a component from the pipeline. Returns the removed component name and
component function.

> #### Example
>
> ```python
> name, component = nlp.remove_pipe("parser")
> assert name == "parser"
> ```

| Name | Type | Description |
| ---- | ---- | ----------- |
| `name` | str | Name of the component to remove. |
| **RETURNS** | tuple | A `(name, component)` tuple of the removed component. |

## Language.select_pipes {#select_pipes tag="contextmanager, method" new="3"}

Disable one or more pipeline components. If used as a context manager, the
pipeline will be restored to the initial state at the end of the block.
Otherwise, a `DisabledPipes` object is returned, that has a `.restore()` method
you can use to undo your changes. You can specify either `disable` (as a list or
string), or `enable`. In the latter case, all components not in the `enable`
list will be disabled.

> #### Example
>
> ```python
> with nlp.select_pipes(disable=["tagger", "parser"]):
>     nlp.begin_training()
>
> with nlp.select_pipes(enable="ner"):
>     nlp.begin_training()
>
> disabled = nlp.select_pipes(disable=["tagger", "parser"])
> nlp.begin_training()
> disabled.restore()
> ```

<Infobox title="Changed in v3.0" variant="warning" id="disable_pipes">

As of spaCy v3.0, the `disable_pipes` method has been renamed to `select_pipes`:

```diff
- nlp.disable_pipes(["tagger", "parser"])
+ nlp.select_pipes(disable=["tagger", "parser"])
```

</Infobox>

| Name | Type | Description |
| ---- | ---- | ----------- |
| _keyword-only_ | | |
| `disable` | str / list | Name(s) of pipeline components to disable. |
| `enable` | str / list | Name(s) of pipeline components that will not be disabled. |
| **RETURNS** | `DisabledPipes` | The disabled pipes that can be restored by calling the object's `.restore()` method. |

## Language.get_factory_meta {#get_factory_meta tag="classmethod" new="3"}

Get the factory meta information for a given pipeline component name. Expects
the name of the component **factory**. The factory meta is an instance of the
[`FactoryMeta`](/api/language#factorymeta) dataclass and contains the
information about the component and its defaults provided by the
[`@Language.component`](/api/language#component) or
[`@Language.factory`](/api/language#factory) decorator.

> #### Example
>
> ```python
> factory_meta = Language.get_factory_meta("ner")
> assert factory_meta.factory == "ner"
> print(factory_meta.default_config)
> ```

| Name | Type | Description |
| ---- | ---- | ----------- |
| `name` | str | The factory name. |
| **RETURNS** | [`FactoryMeta`](#factorymeta) | The factory meta. |

## Language.get_pipe_meta {#get_pipe_meta tag="method" new="3"}

Get the factory meta information for a given pipeline component name. Expects
the name of the component **instance** in the pipeline. The factory meta is an
instance of the [`FactoryMeta`](/api/language#factorymeta) dataclass and
contains the information about the component and its defaults provided by the
[`@Language.component`](/api/language#component) or
[`@Language.factory`](/api/language#factory) decorator.

> #### Example
>
> ```python
> nlp.add_pipe("ner", name="entity_recognizer")
> factory_meta = nlp.get_pipe_meta("entity_recognizer")
> assert factory_meta.factory == "ner"
> print(factory_meta.default_config)
> ```

| Name | Type | Description |
| ---- | ---- | ----------- |
| `name` | str | The pipeline component name. |
| **RETURNS** | [`FactoryMeta`](#factorymeta) | The factory meta. |

## Language.meta {#meta tag="property"}

Custom meta data for the Language class. If a model is loaded, contains meta
data of the model. The `Language.meta` is also what's serialized as the
`meta.json` when you save an `nlp` object to disk.

> #### Example
>
> ```python
> print(nlp.meta)
> ```

| Name | Type | Description |
| ---- | ---- | ----------- |
| **RETURNS** | dict | The meta data. |

## Language.config {#config tag="property" new="3"}

Export a trainable [`config.cfg`](/api/data-formats#config) for the current
`nlp` object. Includes the current pipeline, all configs used to create the
currently active pipeline components, as well as the default training config
that can be used with [`spacy train`](/api/cli#train). `Language.config` returns
a [Thinc `Config` object](https://thinc.ai/docs/api-config#config), which is a
subclass of the built-in `dict`. It supports the additional methods `to_disk`
(serialize the config to a file) and `to_str` (output the config as a string).

> #### Example
>
> ```python
> nlp.config.to_disk("./config.cfg")
> print(nlp.config.to_str())
> ```

| Name | Type | Description |
| ---- | ---- | ----------- |
| **RETURNS** | [`Config`](https://thinc.ai/docs/api-config#config) | The config. |

## Language.to_disk {#to_disk tag="method" new="2"}

Save the current state to a directory. If a model is loaded, this will **include
the model**.

> #### Example
>
> ```python
> nlp.to_disk("/path/to/models")
> ```

| Name | Type | Description |
| ---- | ---- | ----------- |
| `path` | str / `Path` | A path to a directory, which will be created if it doesn't exist. Paths may be either strings or `Path`-like objects. |
| `exclude` | list | Names of pipeline components or [serialization fields](#serialization-fields) to exclude. |

## Language.from_disk {#from_disk tag="method" new="2"}

Load state from a directory. Modifies the object in place and returns it. If
the saved `Language` object contains a model, the model will be loaded. Note
that this method is commonly used via the subclasses like `English` or `German`
to make language-specific functionality like the
[lexical attribute getters](/usage/adding-languages#lex-attrs) available to the
loaded object.

> #### Example
>
> ```python
> from spacy.language import Language
> nlp = Language().from_disk("/path/to/model")
>
> # using language-specific subclass
> from spacy.lang.en import English
> nlp = English().from_disk("/path/to/en_model")
> ```

| Name | Type | Description |
| ---- | ---- | ----------- |
| `path` | str / `Path` | A path to a directory. Paths may be either strings or `Path`-like objects. |
| `exclude` | list | Names of pipeline components or [serialization fields](#serialization-fields) to exclude. |
| **RETURNS** | `Language` | The modified `Language` object. |

## Language.to_bytes {#to_bytes tag="method"}

Serialize the current state to a binary string.

> #### Example
>
> ```python
> nlp_bytes = nlp.to_bytes()
> ```

| Name | Type | Description |
| ---- | ---- | ----------- |
| `exclude` | list | Names of pipeline components or [serialization fields](#serialization-fields) to exclude. |
| **RETURNS** | bytes | The serialized form of the `Language` object. |

## Language.from_bytes {#from_bytes tag="method"}

Load state from a binary string. Note that this method is commonly used via the
subclasses like `English` or `German` to make language-specific functionality
like the [lexical attribute getters](/usage/adding-languages#lex-attrs)
available to the loaded object.

> #### Example
>
> ```python
> from spacy.lang.en import English
> nlp_bytes = nlp.to_bytes()
> nlp2 = English()
> nlp2.from_bytes(nlp_bytes)
> ```

| Name | Type | Description |
| ---- | ---- | ----------- |
| `bytes_data` | bytes | The data to load from. |
| `exclude` | list | Names of pipeline components or [serialization fields](#serialization-fields) to exclude. |
| **RETURNS** | `Language` | The `Language` object. |

## Attributes {#attributes}

| Name | Type | Description |
| ---- | ---- | ----------- |
| `vocab` | `Vocab` | A container for the lexical types. |
| `tokenizer` | `Tokenizer` | The tokenizer. |
| `make_doc` | `Callable` | Callable that takes a string and returns a `Doc`. |
| `pipeline` | `List[Tuple[str, Callable]]` | List of `(name, component)` tuples describing the current processing pipeline, in order. |
| `pipe_names` <Tag variant="new">2</Tag> | `List[str]` | List of pipeline component names, in order. |
| `pipe_labels` <Tag variant="new">2.2</Tag> | `Dict[str, List[str]]` | Labels set by the pipeline components, if available, keyed by component name. |
| `pipe_factories` <Tag variant="new">2.2</Tag> | `Dict[str, str]` | Dictionary of pipeline component names, mapped to their factory names. |
| `factories` | `Dict[str, Callable]` | All available factory functions, keyed by name. |
| `factory_names` <Tag variant="new">3</Tag> | `List[str]` | List of all available factory names. |
| `path` <Tag variant="new">2</Tag> | `Path` | Path to the model data directory, if a model is loaded. Otherwise `None`. |

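A quick way to inspect these (a small sketch, assuming a loaded pipeline that
includes the usual tagger, parser and ner components):

```python
print(nlp.pipe_names)        # e.g. ['tagger', 'parser', 'ner']
for name, component in nlp.pipeline:
    # Each entry is a (name, component) tuple
    print(name, type(component))
print(nlp.pipe_factories)    # maps component names to factory names
```
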
## Class attributes {#class-attributes}

| Name | Type | Description |
| ---- | ---- | ----------- |
| `Defaults` | class | Settings, data and factory methods for creating the `nlp` object and processing pipeline. |
| `lang` | str | Two-letter language ID, i.e. [ISO code](https://en.wikipedia.org/wiki/List_of_ISO_639-1_codes). |
| `default_config` | dict | Base [config](/usage/training#config) to use for [Language.config](/api/language#config). Defaults to [`default_config.cfg`](https://github.com/explosion/spaCy/tree/develop/spacy/default_config.cfg). |

## Defaults {#defaults}

The following attributes can be set on the `Language.Defaults` class to
customize the default language data:

> #### Example
>
> ```python
> from spacy.language import Language
> from spacy.lang.tokenizer_exceptions import URL_MATCH
> from thinc.api import Config
>
> DEFAULT_CONFIG = """
> [nlp.tokenizer]
> @tokenizers = "MyCustomTokenizer.v1"
> """
>
> class Defaults(Language.Defaults):
>     stop_words = set()
>     tokenizer_exceptions = {}
>     prefixes = tuple()
>     suffixes = tuple()
>     infixes = tuple()
>     token_match = None
>     url_match = URL_MATCH
>     lex_attr_getters = {}
>     syntax_iterators = {}
>     writing_system = {"direction": "ltr", "has_case": True, "has_letters": True}
>     config = Config().from_str(DEFAULT_CONFIG)
> ```

| Name | Description |
| ---- | ----------- |
| `stop_words` | List of stop words, used for `Token.is_stop`.<br />**Example:** [`stop_words.py`][stop_words.py] |
| `tokenizer_exceptions` | Tokenizer exception rules, string mapped to list of token attributes.<br />**Example:** [`de/tokenizer_exceptions.py`][de/tokenizer_exceptions.py] |
| `prefixes`, `suffixes`, `infixes` | Prefix, suffix and infix rules for the default tokenizer.<br />**Example:** [`punctuation.py`][punctuation.py] |
| `token_match` | Optional regex for matching strings that should never be split, overriding the infix rules.<br />**Example:** [`fr/tokenizer_exceptions.py`][fr/tokenizer_exceptions.py] |
| `url_match` | Regular expression for matching URLs. Prefixes and suffixes are removed before applying the match.<br />**Example:** [`tokenizer_exceptions.py`][tokenizer_exceptions.py] |
| `lex_attr_getters` | Custom functions for setting lexical attributes on tokens, e.g. `like_num`.<br />**Example:** [`lex_attrs.py`][lex_attrs.py] |
| `syntax_iterators` | Functions that compute views of a `Doc` object based on its syntax. At the moment, only used for [noun chunks](/usage/linguistic-features#noun-chunks).<br />**Example:** [`syntax_iterators.py`][syntax_iterators.py] |
| `writing_system` | Information about the language's writing system, available via `Vocab.writing_system`. Defaults to `{"direction": "ltr", "has_case": True, "has_letters": True}`.<br />**Example:** [`zh/__init__.py`][zh/__init__.py] |
| `config` | Default [config](/usage/training#config) added to `nlp.config`. This can include references to custom tokenizers or lemmatizers.<br />**Example:** [`zh/__init__.py`][zh/__init__.py] |

[stop_words.py]:
  https://github.com/explosion/spaCy/tree/master/spacy/lang/en/stop_words.py
[tokenizer_exceptions.py]:
  https://github.com/explosion/spaCy/tree/master/spacy/lang/tokenizer_exceptions.py
[de/tokenizer_exceptions.py]:
  https://github.com/explosion/spaCy/tree/master/spacy/lang/de/tokenizer_exceptions.py
[fr/tokenizer_exceptions.py]:
  https://github.com/explosion/spaCy/tree/master/spacy/lang/fr/tokenizer_exceptions.py
[punctuation.py]:
  https://github.com/explosion/spaCy/tree/master/spacy/lang/punctuation.py
[lex_attrs.py]:
  https://github.com/explosion/spaCy/tree/master/spacy/lang/en/lex_attrs.py
[syntax_iterators.py]:
  https://github.com/explosion/spaCy/tree/master/spacy/lang/en/syntax_iterators.py
[zh/__init__.py]:
  https://github.com/explosion/spaCy/tree/master/spacy/lang/zh/__init__.py

## Serialization fields {#serialization-fields}

During serialization, spaCy will export several data fields used to restore
different aspects of the object. If needed, you can exclude them from
serialization by passing in the string names via the `exclude` argument.

> #### Example
>
> ```python
> data = nlp.to_bytes(exclude=["tokenizer", "vocab"])
> nlp.from_disk("./model-data", exclude=["ner"])
> ```

| Name | Description |
| ---- | ----------- |
| `vocab` | The shared [`Vocab`](/api/vocab). |
| `tokenizer` | Tokenization rules and exceptions. |
| `meta` | The meta data, available as `Language.meta`. |
| ... | String names of pipeline components, e.g. `"ner"`. |

## FactoryMeta {#factorymeta new="3" tag="dataclass"}

The `FactoryMeta` contains the information about the component and its defaults
provided by the [`@Language.component`](/api/language#component) or
[`@Language.factory`](/api/language#factory) decorator. It's created whenever a
component is defined and stored on the `Language` class for each component
instance and factory instance.

| Name | Type | Description |
| ---- | ---- | ----------- |
| `factory` | str | The name of the registered component factory. |
| `default_config` | `Dict[str, Any]` | The default config, describing the default values of the factory arguments. |
| `assigns` | `Iterable[str]` | `Doc` or `Token` attributes assigned by this component, e.g. `["token.ent_id"]`. Used for pipeline analysis. <!-- TODO: link to something --> |
| `requires` | `Iterable[str]` | `Doc` or `Token` attributes required by this component, e.g. `["token.ent_id"]`. Used for pipeline analysis. <!-- TODO: link to something --> |
| `retokenizes` | bool | Whether the component changes tokenization. Used for pipeline analysis. <!-- TODO: link to something --> |
| `scores` | `Iterable[str]` | All scores set by the component if it's trainable, e.g. `["ents_f", "ents_r", "ents_p"]`. |
| `default_score_weights` | `Dict[str, float]` | The scores to report during training, and their default weight towards the final score used to select the best model. Weights should sum to `1.0` per component and will be combined and normalized for the whole pipeline. |

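For example, you can read these fields off the dataclass returned by
[`Language.get_pipe_meta`](/api/language#get_pipe_meta) (a small sketch; `"ner"`
is assumed to be a component in the loaded pipeline):

```python
meta = nlp.get_pipe_meta("ner")
print(meta.factory)           # name of the registered factory
print(meta.assigns)           # attributes the component sets
print(meta.default_config)    # default values for the factory arguments
```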