mirror of https://github.com/explosion/spaCy.git, synced 2025-10-31 16:07:41 +03:00

Update docs

parent 5fb776556a
commit 35d695a031

				|  | @ -176,12 +176,12 @@ This method was previously called `begin_training`. | |||
| > path = "corpus/labels/parser.json" | ||||
| > ``` | ||||
| 
 | ||||
| | Name           | Description                                                                                                                                                                                                                                                                                                         | | ||||
| | -------------- | ------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------- | | ||||
| | `get_examples` | Function that returns gold-standard annotations in the form of [`Example`](/api/example) objects. ~~Callable[[], Iterable[Example]]~~                                                                                                                                                                               | | ||||
| | _keyword-only_ |                                                                                                                                                                                                                                                                                                                     | | ||||
| | `nlp`          | The current `nlp` object. Defaults to `None`. ~~Optional[Language]~~                                                                                                                                                                                                                                                | | ||||
| | `labels`       | The label information to add to the component. To generate a reusable JSON file from your data, you should run the [`init labels`](/api/cli#init-labels) command. If no labels are provided, the `get_examples` callback is used to extract the labels from the data, which may be a lot slower. ~~Optional[dict]~~ | | ||||
| | Name           | Description                                                                                                                                                                                                                                                                                                                                                                                                            | | ||||
| | -------------- | ---------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------- | | ||||
| | `get_examples` | Function that returns gold-standard annotations in the form of [`Example`](/api/example) objects. ~~Callable[[], Iterable[Example]]~~                                                                                                                                                                                                                                                                                  | | ||||
| | _keyword-only_ |                                                                                                                                                                                                                                                                                                                                                                                                                        | | ||||
| | `nlp`          | The current `nlp` object. Defaults to `None`. ~~Optional[Language]~~                                                                                                                                                                                                                                                                                                                                                   | | ||||
| | `labels`       | The label information to add to the component, as provided by the [`label_data`](#label_data) property after initialization. To generate a reusable JSON file from your data, you should run the [`init labels`](/api/cli#init-labels) command. If no labels are provided, the `get_examples` callback is used to extract the labels from the data, which may be a lot slower. ~~Optional[Dict[str, Dict[str, int]]]~~ | | ||||
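
The fallback described in the `labels` row can be sketched in plain Python. This is only an illustration of the idea, not spaCy's implementation: `resolve_labels` and the dictionary-based examples are hypothetical stand-ins, whereas real components receive [`Example`](/api/example) objects.

```python
from typing import Callable, Dict, Iterable, List, Optional


def resolve_labels(
    get_examples: Callable[[], Iterable[Dict[str, List[str]]]],
    labels: Optional[Iterable[str]] = None,
) -> List[str]:
    """Return the label set, preferring pre-computed labels (hypothetical helper)."""
    if labels is not None:
        # Fast path: labels were supplied, so the data is never scanned.
        return sorted(set(labels))
    # Slow path: walk every example to collect the labels it uses.
    found = set()
    for example in get_examples():
        found.update(example["tags"])
    return sorted(found)


# Hypothetical toy data standing in for gold-standard examples.
examples = [{"tags": ["DET", "NOUN"]}, {"tags": ["NOUN", "VERB"]}]

print(resolve_labels(lambda: examples))                   # scans the data
print(resolve_labels(lambda: examples, labels=["NOUN"]))  # skips the scan
```

When labels are passed in, the callback is never invoked, which is why pre-generated label data can save time on large corpora.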
| 
 | ||||
| ## DependencyParser.predict {#predict tag="method"} | ||||
| 
 | ||||
|  | @ -433,6 +433,24 @@ The labels currently added to the component. | |||
| | ----------- | ------------------------------------------------------ | | ||||
| | **RETURNS** | The labels added to the component. ~~Tuple[str, ...]~~ | | ||||
| 
 | ||||
| ## DependencyParser.label_data {#label_data tag="property" new="3"} | ||||
| 
 | ||||
| The labels currently added to the component and their internal meta information. | ||||
| This is the data generated by [`init labels`](/api/cli#init-labels) and used by | ||||
| [`DependencyParser.initialize`](/api/dependencyparser#initialize) to initialize | ||||
| the model with a pre-defined label set. | ||||
| 
 | ||||
| > #### Example | ||||
| > | ||||
| > ```python | ||||
| > labels = parser.label_data | ||||
| > parser.initialize(lambda: [], nlp=nlp, labels=labels) | ||||
| > ``` | ||||
| 
 | ||||
| | Name        | Description                                                                     | | ||||
| | ----------- | ------------------------------------------------------------------------------- | | ||||
| | **RETURNS** | The label data added to the component. ~~Dict[str, Dict[str, Dict[str, int]]]~~ | | ||||
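
The nested return type above can be written to a JSON file and read back unchanged, which is the kind of reuse the section describes. A minimal sketch using only the standard library: the outer keys, label names, counts and the file name `parser.json` are all invented for illustration and do not reflect the parser's actual internal data.

```python
import json
import tempfile
from pathlib import Path
from typing import Dict

# Hypothetical label data with the nested shape given in the table above;
# the keys and counts are made up and are NOT the parser's real contents.
LabelData = Dict[str, Dict[str, Dict[str, int]]]
label_data: LabelData = {
    "moves": {
        "left_arc": {"nsubj": 12, "det": 30},
        "right_arc": {"dobj": 9},
    }
}

with tempfile.TemporaryDirectory() as tmp_dir:
    path = Path(tmp_dir) / "parser.json"
    # Write the label data once...
    path.write_text(json.dumps(label_data, indent=2), encoding="utf8")
    # ...and read it back when the labels are needed again.
    restored: LabelData = json.loads(path.read_text(encoding="utf8"))

# String keys and integer values survive the JSON round trip unchanged.
print(restored == label_data)
```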
| 
 | ||||
| ## Serialization fields {#serialization-fields} | ||||
| 
 | ||||
| During serialization, spaCy will export several data fields used to restore | ||||
|  |  | |||
|  | @ -165,12 +165,12 @@ This method was previously called `begin_training`. | |||
| > path = "corpus/labels/ner.json" | ||||
| > ``` | ||||
| 
 | ||||
| | Name           | Description                                                                                                                                                                                                                                                                                                         | | ||||
| | -------------- | ------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------- | | ||||
| | `get_examples` | Function that returns gold-standard annotations in the form of [`Example`](/api/example) objects. ~~Callable[[], Iterable[Example]]~~                                                                                                                                                                               | | ||||
| | _keyword-only_ |                                                                                                                                                                                                                                                                                                                     | | ||||
| | `nlp`          | The current `nlp` object. Defaults to `None`. ~~Optional[Language]~~                                                                                                                                                                                                                                                | | ||||
| | `labels`       | The label information to add to the component. To generate a reusable JSON file from your data, you should run the [`init labels`](/api/cli#init-labels) command. If no labels are provided, the `get_examples` callback is used to extract the labels from the data, which may be a lot slower. ~~Optional[dict]~~ | | ||||
| | Name           | Description                                                                                                                                                                                                                                                                                                                                                                                                            | | ||||
| | -------------- | ---------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------- | | ||||
| | `get_examples` | Function that returns gold-standard annotations in the form of [`Example`](/api/example) objects. ~~Callable[[], Iterable[Example]]~~                                                                                                                                                                                                                                                                                  | | ||||
| | _keyword-only_ |                                                                                                                                                                                                                                                                                                                                                                                                                        | | ||||
| | `nlp`          | The current `nlp` object. Defaults to `None`. ~~Optional[Language]~~                                                                                                                                                                                                                                                                                                                                                   | | ||||
| | `labels`       | The label information to add to the component, as provided by the [`label_data`](#label_data) property after initialization. To generate a reusable JSON file from your data, you should run the [`init labels`](/api/cli#init-labels) command. If no labels are provided, the `get_examples` callback is used to extract the labels from the data, which may be a lot slower. ~~Optional[Dict[str, Dict[str, int]]]~~ | | ||||
| 
 | ||||
| ## EntityRecognizer.predict {#predict tag="method"} | ||||
| 
 | ||||
|  | @ -421,6 +421,24 @@ The labels currently added to the component. | |||
| | ----------- | ------------------------------------------------------ | | ||||
| | **RETURNS** | The labels added to the component. ~~Tuple[str, ...]~~ | | ||||
| 
 | ||||
| ## EntityRecognizer.label_data {#label_data tag="property" new="3"} | ||||
| 
 | ||||
| The labels currently added to the component and their internal meta information. | ||||
| This is the data generated by [`init labels`](/api/cli#init-labels) and used by | ||||
| [`EntityRecognizer.initialize`](/api/entityrecognizer#initialize) to initialize | ||||
| the model with a pre-defined label set. | ||||
| 
 | ||||
| > #### Example | ||||
| > | ||||
| > ```python | ||||
| > labels = ner.label_data | ||||
| > ner.initialize(lambda: [], nlp=nlp, labels=labels) | ||||
| > ``` | ||||
| 
 | ||||
| | Name        | Description                                                                     | | ||||
| | ----------- | ------------------------------------------------------------------------------- | | ||||
| | **RETURNS** | The label data added to the component. ~~Dict[str, Dict[str, Dict[str, int]]]~~ | | ||||
| 
 | ||||
| ## Serialization fields {#serialization-fields} | ||||
| 
 | ||||
| During serialization, spaCy will export several data fields used to restore | ||||
|  |  | |||
|  | @ -147,12 +147,12 @@ config. | |||
| > path = "corpus/labels/morphologizer.json" | ||||
| > ``` | ||||
| 
 | ||||
| | Name           | Description                                                                                                                                                                                                                                                                                                         | | ||||
| | -------------- | ------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------- | | ||||
| | `get_examples` | Function that returns gold-standard annotations in the form of [`Example`](/api/example) objects. ~~Callable[[], Iterable[Example]]~~                                                                                                                                                                               | | ||||
| | _keyword-only_ |                                                                                                                                                                                                                                                                                                                     | | ||||
| | `nlp`          | The current `nlp` object. Defaults to `None`. ~~Optional[Language]~~                                                                                                                                                                                                                                                | | ||||
| | `labels`       | The label information to add to the component. To generate a reusable JSON file from your data, you should run the [`init labels`](/api/cli#init-labels) command. If no labels are provided, the `get_examples` callback is used to extract the labels from the data, which may be a lot slower. ~~Optional[dict]~~ | | ||||
| | Name           | Description                                                                                                                                                                                                                                                                                                                                                                                       | | ||||
| | -------------- | ------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------- | | ||||
| | `get_examples` | Function that returns gold-standard annotations in the form of [`Example`](/api/example) objects. ~~Callable[[], Iterable[Example]]~~                                                                                                                                                                                                                                                             | | ||||
| | _keyword-only_ |                                                                                                                                                                                                                                                                                                                                                                                                   | | ||||
| | `nlp`          | The current `nlp` object. Defaults to `None`. ~~Optional[Language]~~                                                                                                                                                                                                                                                                                                                              | | ||||
| | `labels`       | The label information to add to the component, as provided by the [`label_data`](#label_data) property after initialization. To generate a reusable JSON file from your data, you should run the [`init labels`](/api/cli#init-labels) command. If no labels are provided, the `get_examples` callback is used to extract the labels from the data, which may be a lot slower. ~~Optional[dict]~~ | | ||||
| 
 | ||||
| ## Morphologizer.predict {#predict tag="method"} | ||||
| 
 | ||||
|  | @ -377,6 +377,24 @@ coarse-grained POS as the feature `POS`. | |||
| | ----------- | ------------------------------------------------------ | | ||||
| | **RETURNS** | The labels added to the component. ~~Tuple[str, ...]~~ | | ||||
| 
 | ||||
| ## Morphologizer.label_data {#label_data tag="property" new="3"} | ||||
| 
 | ||||
| The labels currently added to the component and their internal meta information. | ||||
| This is the data generated by [`init labels`](/api/cli#init-labels) and used by | ||||
| [`Morphologizer.initialize`](/api/morphologizer#initialize) to initialize the | ||||
| model with a pre-defined label set. | ||||
| 
 | ||||
| > #### Example | ||||
| > | ||||
| > ```python | ||||
| > labels = morphologizer.label_data | ||||
| > morphologizer.initialize(lambda: [], nlp=nlp, labels=labels) | ||||
| > ``` | ||||
| 
 | ||||
| | Name        | Description                                     | | ||||
| | ----------- | ----------------------------------------------- | | ||||
| | **RETURNS** | The label data added to the component. ~~dict~~ | | ||||
| 
 | ||||
| ## Serialization fields {#serialization-fields} | ||||
| 
 | ||||
| During serialization, spaCy will export several data fields used to restore | ||||
|  |  | |||
|  | @ -148,12 +148,12 @@ This method was previously called `begin_training`. | |||
| > path = "corpus/labels/tagger.json" | ||||
| > ``` | ||||
| 
 | ||||
| | Name           | Description                                                                                                                                                                                                                                                                                                         | | ||||
| | -------------- | ------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------- | | ||||
| | `get_examples` | Function that returns gold-standard annotations in the form of [`Example`](/api/example) objects. ~~Callable[[], Iterable[Example]]~~                                                                                                                                                                               | | ||||
| | _keyword-only_ |                                                                                                                                                                                                                                                                                                                     | | ||||
| | `nlp`          | The current `nlp` object. Defaults to `None`. ~~Optional[Language]~~                                                                                                                                                                                                                                                | | ||||
| | `labels`       | The label information to add to the component. To generate a reusable JSON file from your data, you should run the [`init labels`](/api/cli#init-labels) command. If no labels are provided, the `get_examples` callback is used to extract the labels from the data, which may be a lot slower. ~~Optional[list]~~ | | ||||
| | Name           | Description                                                                                                                                                                                                                                                                                                                                                                                                | | ||||
| | -------------- | ---------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------- | | ||||
| | `get_examples` | Function that returns gold-standard annotations in the form of [`Example`](/api/example) objects. ~~Callable[[], Iterable[Example]]~~                                                                                                                                                                                                                                                                      | | ||||
| | _keyword-only_ |                                                                                                                                                                                                                                                                                                                                                                                                            | | ||||
| | `nlp`          | The current `nlp` object. Defaults to `None`. ~~Optional[Language]~~                                                                                                                                                                                                                                                                                                                                       | | ||||
| | `labels`       | The label information to add to the component, as provided by the [`label_data`](#label_data) property after initialization. To generate a reusable JSON file from your data, you should run the [`init labels`](/api/cli#init-labels) command. If no labels are provided, the `get_examples` callback is used to extract the labels from the data, which may be a lot slower. ~~Optional[Iterable[str]]~~ | | ||||
| 
 | ||||
| ## Tagger.predict {#predict tag="method"} | ||||
| 
 | ||||
|  | @ -411,6 +411,24 @@ The labels currently added to the component. | |||
| | ----------- | ------------------------------------------------------ | | ||||
| | **RETURNS** | The labels added to the component. ~~Tuple[str, ...]~~ | | ||||
| 
 | ||||
| ## Tagger.label_data {#label_data tag="property" new="3"} | ||||
| 
 | ||||
| The labels currently added to the component and their internal meta information. | ||||
| This is the data generated by [`init labels`](/api/cli#init-labels) and used by | ||||
| [`Tagger.initialize`](/api/tagger#initialize) to initialize the model with a | ||||
| pre-defined label set. | ||||
| 
 | ||||
| > #### Example | ||||
| > | ||||
| > ```python | ||||
| > labels = tagger.label_data | ||||
| > tagger.initialize(lambda: [], nlp=nlp, labels=labels) | ||||
| > ``` | ||||
| 
 | ||||
| | Name        | Description                                                | | ||||
| | ----------- | ---------------------------------------------------------- | | ||||
| | **RETURNS** | The label data added to the component. ~~Tuple[str, ...]~~ | | ||||
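
For the tagger the label data is a flat tuple of strings. JSON has no tuple type, so a round trip through a JSON file yields a list instead, which still satisfies the `Optional[Iterable[str]]` type documented for the `labels` argument of `Tagger.initialize`. A small sketch, with a hypothetical tag set standing in for `tagger.label_data`:

```python
import json
from typing import Iterable, Tuple

# Hypothetical tag set standing in for the value of `tagger.label_data`.
label_data: Tuple[str, ...] = ("DET", "NOUN", "VERB")

# JSON has no tuple type, so the tuple is written as an array...
serialized = json.dumps(label_data)
# ...and comes back as a list, which is still an iterable of strings.
restored: Iterable[str] = json.loads(serialized)

print(serialized)
print(type(restored).__name__, tuple(restored) == label_data)
```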
| 
 | ||||
| ## Serialization fields {#serialization-fields} | ||||
| 
 | ||||
| During serialization, spaCy will export several data fields used to restore | ||||
|  |  | |||
|  | @ -29,7 +29,6 @@ architectures and their arguments and hyperparameters. | |||
| > ```python | ||||
| > from spacy.pipeline.textcat import DEFAULT_TEXTCAT_MODEL | ||||
| > config = { | ||||
| >    "labels": [], | ||||
| >    "threshold": 0.5, | ||||
| >    "model": DEFAULT_TEXTCAT_MODEL, | ||||
| > } | ||||
|  | @ -38,7 +37,6 @@ architectures and their arguments and hyperparameters. | |||
| 
 | ||||
| | Setting          | Description                                                                                                                                                      | | ||||
| | ---------------- | ---------------------------------------------------------------------------------------------------------------------------------------------------------------- | | ||||
| | `labels`         | A list of categories to learn. If empty, the model infers the categories from the data. Defaults to `[]`. ~~Iterable[str]~~                                      | | ||||
| | `threshold`      | Cutoff to consider a prediction "positive", relevant when printing accuracy results. ~~float~~                                                                   | | ||||
| | `positive_label` | The positive label for a binary task with exclusive classes, `None` otherwise. Defaults to `None`. ~~Optional[str]~~                                             | | ||||
| | `model`          | A model instance that predicts scores for each category. Defaults to [TextCatEnsemble](/api/architectures#TextCatEnsemble). ~~Model[List[Doc], List[Floats2d]]~~ | | ||||
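
The `threshold` setting can be read as a simple cutoff on per-category scores. The sketch below is an illustration only: `positive_labels` and the scores are hypothetical, and treating a score exactly equal to the threshold as positive is an assumption, not something the table states.

```python
from typing import Dict, List


def positive_labels(scores: Dict[str, float], threshold: float = 0.5) -> List[str]:
    """Return the categories whose score reaches the cutoff (hypothetical helper)."""
    # Assumption: a score equal to the threshold counts as positive.
    return [label for label, score in scores.items() if score >= threshold]


# Hypothetical per-category scores for one document.
scores = {"POS": 0.91, "NEG": 0.07, "NEUTRAL": 0.5}

print(positive_labels(scores))
print(positive_labels(scores, threshold=0.6))
```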
|  | @ -61,7 +59,7 @@ architectures and their arguments and hyperparameters. | |||
| > | ||||
| > # Construction from class | ||||
| > from spacy.pipeline import TextCategorizer | ||||
| > textcat = TextCategorizer(nlp.vocab, model, labels=[], threshold=0.5, positive_label="POS") | ||||
| > textcat = TextCategorizer(nlp.vocab, model, threshold=0.5, positive_label="POS") | ||||
| > ``` | ||||
| 
 | ||||
| Create a new pipeline instance. In your application, you would normally use a | ||||
|  | @ -74,7 +72,6 @@ shortcut for this and instantiate the component using its string name and | |||
| | `model`          | The Thinc [`Model`](https://thinc.ai/docs/api-model) powering the pipeline component. ~~Model[List[Doc], List[Floats2d]]~~ | | ||||
| | `name`           | String name of the component instance. Used to add entries to the `losses` during training. ~~str~~                        | | ||||
| | _keyword-only_   |                                                                                                                            | | ||||
| | `labels`         | The labels to use. ~~Iterable[str]~~                                                                                       | | ||||
| | `threshold`      | Cutoff to consider a prediction "positive", relevant when printing accuracy results. ~~float~~                             | | ||||
| | `positive_label` | The positive label for a binary task with exclusive classes, None otherwise. ~~Optional[str]~~                             | | ||||
| 
 | ||||
|  | @ -161,12 +158,12 @@ This method was previously called `begin_training`. | |||
| > path = "corpus/labels/textcat.json" | ||||
| > ``` | ||||
| 
 | ||||
| | Name           | Description                                                                                                                                                                                                                                                                                                         | | ||||
| | -------------- | ------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------- | | ||||
| | `get_examples` | Function that returns gold-standard annotations in the form of [`Example`](/api/example) objects. ~~Callable[[], Iterable[Example]]~~                                                                                                                                                                               | | ||||
| | _keyword-only_ |                                                                                                                                                                                                                                                                                                                     | | ||||
| | `nlp`          | The current `nlp` object. Defaults to `None`. ~~Optional[Language]~~                                                                                                                                                                                                                                                | | ||||
| | `labels`       | The label information to add to the component. To generate a reusable JSON file from your data, you should run the [`init labels`](/api/cli#init-labels) command. If no labels are provided, the `get_examples` callback is used to extract the labels from the data, which may be a lot slower. ~~Optional[dict]~~ | | ||||
| | Name           | Description                                                                                                                                                                                                                                                                                                                                                                                                | | ||||
| | -------------- | ---------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------- | | ||||
| | `get_examples` | Function that returns gold-standard annotations in the form of [`Example`](/api/example) objects. ~~Callable[[], Iterable[Example]]~~                                                                                                                                                                                                                                                                      | | ||||
| | _keyword-only_ |                                                                                                                                                                                                                                                                                                                                                                                                            | | ||||
| | `nlp`          | The current `nlp` object. Defaults to `None`. ~~Optional[Language]~~                                                                                                                                                                                                                                                                                                                                       | | ||||
| | `labels`       | The label information to add to the component, as provided by the [`label_data`](#label_data) property after initialization. To generate a reusable JSON file from your data, you should run the [`init labels`](/api/cli#init-labels) command. If no labels are provided, the `get_examples` callback is used to extract the labels from the data, which may be a lot slower. ~~Optional[Iterable[str]]~~ | | ||||
| 
 | ||||
| ## TextCategorizer.predict {#predict tag="method"} | ||||
| 
 | ||||
|  | @ -425,6 +422,24 @@ The labels currently added to the component. | |||
| | ----------- | ------------------------------------------------------ | | ||||
| | **RETURNS** | The labels added to the component. ~~Tuple[str, ...]~~ | | ||||
| 
 | ||||
| ## TextCategorizer.label_data {#label_data tag="property" new="3"} | ||||
| 
 | ||||
| The labels currently added to the component and their internal meta information. | ||||
| This is the data generated by [`init labels`](/api/cli#init-labels) and used by | ||||
| [`TextCategorizer.initialize`](/api/textcategorizer#initialize) to initialize | ||||
| the model with a pre-defined label set. | ||||
| 
 | ||||
| > #### Example | ||||
| > | ||||
| > ```python | ||||
| > labels = textcat.label_data | ||||
| > textcat.initialize(lambda: [], nlp=nlp, labels=labels) | ||||
| > ``` | ||||
| 
 | ||||
| | Name        | Description                                                | | ||||
| | ----------- | ---------------------------------------------------------- | | ||||
| | **RETURNS** | The label data added to the component. ~~Tuple[str, ...]~~ | | ||||
| 
 | ||||
| ## Serialization fields {#serialization-fields} | ||||
| 
 | ||||
| During serialization, spaCy will export several data fields used to restore | ||||
|  |  | |||
|  | @ -692,14 +692,14 @@ for writing the log files to [Weights & Biases](https://www.wandb.com/) with the | |||
| [`WandbLogger`](/api/top-level#WandbLogger). The logger function receives a | ||||
| **dictionary** with the following keys: | ||||
| 
 | ||||
| | Key            | Value                                                                                          | | ||||
| | -------------- | ---------------------------------------------------------------------------------------------- | | ||||
| | `epoch`        | How many passes over the data have been completed. ~~int~~                                     | | ||||
| | `step`         | How many steps have been completed. ~~int~~                                                    | | ||||
| | `score`        | The main score from the last evaluation, measured on the dev set. ~~float~~                    | | ||||
| | `other_scores` | The other scores from the last evaluation, measured on the dev set. ~~Dict[str, Any]~~         | | ||||
| | `losses`       | The accumulated training losses, keyed by component name. ~~Dict[str, float]~~                 | | ||||
| | `checkpoints`  | A list of previous results, where each result is a (score, step, epoch) tuple. ~~List[Tuple]~~ | | ||||
| | Key            | Value                                                                                                 | | ||||
| | -------------- | ----------------------------------------------------------------------------------------------------- | | ||||
| | `epoch`        | How many passes over the data have been completed. ~~int~~                                            | | ||||
| | `step`         | How many steps have been completed. ~~int~~                                                           | | ||||
| | `score`        | The main score from the last evaluation, measured on the dev set. ~~float~~                           | | ||||
| | `other_scores` | The other scores from the last evaluation, measured on the dev set. ~~Dict[str, Any]~~                | | ||||
| | `losses`       | The accumulated training losses, keyed by component name. ~~Dict[str, float]~~                        | | ||||
| | `checkpoints`  | A list of previous results, where each result is a `(score, step)` tuple. ~~List[Tuple[float, int]]~~ | | ||||
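
As an illustration of how a custom logger might consume this dictionary, here is a minimal sketch in plain Python. The helper name `summarize_step` and the sample values are hypothetical and not part of spaCy. The sketch assumes the keys and types listed in the table above and at least one entry in `checkpoints`.

```python
from typing import Any, Dict


def summarize_step(info: Dict[str, Any]) -> str:
    """Format one logger payload as a single line (hypothetical helper)."""
    # Losses are keyed by component name; sort them for stable output.
    losses = ", ".join(
        f"{name}={value:.2f}" for name, value in sorted(info["losses"].items())
    )
    # Each checkpoint is a (score, step) tuple; pick the highest score.
    best_score, best_step = max(info["checkpoints"], key=lambda item: item[0])
    return (
        f"epoch={info['epoch']} step={info['step']} "
        f"score={info['score']:.3f} losses[{losses}] "
        f"best={best_score:.3f}@{best_step}"
    )


# Made-up payload that follows the documented keys.
info = {
    "epoch": 1,
    "step": 200,
    "score": 0.85,
    "other_scores": {"cats_macro_f": 0.85},
    "losses": {"textcat": 1.2345},
    "checkpoints": [(0.80, 100), (0.85, 200)],
}
print(summarize_step(info))
```

A real logger would receive this dictionary from the training loop rather than build it by hand.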
| 
 | ||||
| You can easily implement and plug in your own logger that records the training | ||||
| results in a custom way, or sends them to an experiment management tracker of | ||||
|  | @ -819,7 +819,84 @@ def MyModel(output_width: int) -> Model[List[Doc], List[Floats2d]]: | |||
| 
 | ||||
| ### Customizing the initialization {#initialization} | ||||
| 
 | ||||
| <Infobox title="This section is still under construction" emoji="🚧" variant="warning"> | ||||
| When you start training a new model from scratch, | ||||
| [`spacy train`](/api/cli#train) will call | ||||
| [`nlp.initialize`](/api/language#initialize) to initialize the pipeline for | ||||
| training. This process typically includes the following: | ||||
| 
 | ||||
| > #### config.cfg (excerpt) | ||||
| > | ||||
| > ```ini | ||||
| > [initialize] | ||||
| > vectors = ${paths.vectors} | ||||
| > init_tok2vec = ${paths.init_tok2vec} | ||||
| > | ||||
| > [initialize.components] | ||||
| > # Settings for components | ||||
| > ``` | ||||
| 
 | ||||
| 1. Load in **data resources** defined in the `[initialize]` config, including | ||||
|    **word vectors** and | ||||
|    [pretrained](/usage/embeddings-transformers/#pretraining) **tok2vec | ||||
|    weights**. | ||||
| 2. Call the `initialize` methods of the tokenizer (if implemented, e.g. for | ||||
|    [Chinese](/usage/models#chinese)) and pipeline components with a callback to | ||||
|    access the training data, the current `nlp` object and any **custom | ||||
|    arguments** defined in the `[initialize]` config. | ||||
| 3. In **pipeline components**: if needed, use the data to | ||||
|    [infer missing shapes](/usage/layers-architectures#thinc-shape-inference) and | ||||
|    set up the label scheme if no labels are provided. Components may also load | ||||
|    other data like lookup tables or dictionaries. | ||||
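
The steps above can be sketched with a toy component. This is a plain-Python stand-in, not spaCy's implementation: the class `ToyCategorizer`, the dictionary-based examples and the `"label"` key are all hypothetical. It only illustrates the contract described in steps 2 and 3: `initialize` receives a callback to access the data, the `nlp` object and optional custom arguments, and falls back to the data when no labels are provided.

```python
from typing import Callable, Dict, Iterable, List, Optional


class ToyCategorizer:
    """Toy stand-in for a trainable component; not spaCy's implementation."""

    def __init__(self) -> None:
        self.labels: List[str] = []

    def initialize(
        self,
        get_examples: Callable[[], Iterable[Dict[str, str]]],
        *,
        nlp: Optional[object] = None,
        labels: Optional[Iterable[str]] = None,
    ) -> None:
        if labels is not None:
            # Fast path: labels were supplied, e.g. via the [initialize] config.
            self.labels = list(labels)
        else:
            # Slow path: scan the training data to collect the label set.
            seen = {example["label"] for example in get_examples()}
            self.labels = sorted(seen)


def get_examples() -> List[Dict[str, str]]:
    # Hypothetical training data; real components receive Example objects.
    return [
        {"text": "Great!", "label": "POSITIVE"},
        {"text": "Awful.", "label": "NEGATIVE"},
        {"text": "Loved it", "label": "POSITIVE"},
    ]


inferred = ToyCategorizer()
inferred.initialize(get_examples)

preset = ToyCategorizer()
preset.initialize(lambda: [], labels=["POSITIVE", "NEGATIVE"])

print(inferred.labels, preset.labels)
```

When labels are passed in, the sketch never calls `get_examples`, which is where the time saving comes from.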
| 
 | ||||
| The initialization step allows the config to define **all settings** required | ||||
| for the pipeline, while keeping a separation between settings and functions that | ||||
| should only be used **before training** to set up the initial pipeline, and | ||||
| logic and configuration that need to be available **at runtime**. Without that | ||||
| separation, TODO: | ||||
| 
 | ||||
|  | ||||
| 
 | ||||
| #### Initializing labels {#initialization-labels} | ||||
| 
 | ||||
| Built-in pipeline components like the | ||||
| [`EntityRecognizer`](/api/entityrecognizer) or | ||||
| [`DependencyParser`](/api/dependencyparser) need to know their available labels | ||||
| and associated internal meta information to initialize their model weights. | ||||
| Using the `get_examples` callback provided on initialization, they're able to | ||||
| **read the labels off the training data** automatically, which is very | ||||
| convenient – but it can also slow down the training process to compute this | ||||
| information on every run. | ||||
| 
 | ||||
| The [`init labels`](/api/cli#init-labels) command lets you auto-generate JSON | ||||
| files containing the label data for all supported components. You can then pass | ||||
| in the labels in the `[initialize]` settings for the respective components to | ||||
| allow them to initialize faster. | ||||
| 
 | ||||
| > #### config.cfg | ||||
| > | ||||
| > ```ini | ||||
| > [initialize.components.ner] | ||||
| > | ||||
| > [initialize.components.ner.labels] | ||||
| > @readers = "spacy.read_labels.v1" | ||||
| > path = "corpus/labels/ner.json" | ||||
| > ``` | ||||
| 
 | ||||
| ```cli | ||||
| $ python -m spacy init labels config.cfg ./corpus --paths.train ./corpus/train.spacy | ||||
| ``` | ||||
| 
 | ||||
| Under the hood, the command delegates to the `label_data` property of the | ||||
| pipeline components, for instance | ||||
| [`EntityRecognizer.label_data`](/api/entityrecognizer#label_data). | ||||
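
To make the round trip concrete, here is a rough standard-library sketch of saving label data to JSON and reading it back. It assumes the label data is a plain tuple of strings, as the `TextCategorizer.label_data` return type indicates. The tuple `label_data` is a hand-written stand-in and the file name `textcat.json` is hypothetical. It is for illustration only and does not reproduce the exact files that `init labels` writes.

```python
import json
import tempfile
from pathlib import Path

# Stand-in for the value of a component's label_data property.
label_data = ("POSITIVE", "NEGATIVE")

with tempfile.TemporaryDirectory() as tmp_dir:
    path = Path(tmp_dir) / "textcat.json"
    # JSON has no tuple type, so the labels are stored as a list.
    path.write_text(json.dumps(list(label_data)), encoding="utf8")
    # Reading the file back yields the labels in their original order.
    loaded = json.loads(path.read_text(encoding="utf8"))

print(loaded)
```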
| 
 | ||||
| <Infobox variant="warning" title="Important note"> | ||||
| 
 | ||||
| The JSON format differs for each component and some components need additional | ||||
| meta information about their labels. The format exported by | ||||
| [`init labels`](/api/cli#init-labels) matches what the components need, so you | ||||
| should always let spaCy **auto-generate the labels** for you. | ||||
| 
 | ||||
| </Infobox> | ||||
| 
 | ||||
| ## Data utilities {#data} | ||||
|  | @ -1298,8 +1375,8 @@ of being dropped. | |||
| 
 | ||||
| > - [`nlp`](/api/language): The `nlp` object with the pipeline components and | ||||
| >   their models. | ||||
| > - [`nlp.initialize`](/api/language#initialize): Start the training and return | ||||
| >   an optimizer to update the component model weights. | ||||
| > - [`nlp.initialize`](/api/language#initialize): Initialize the pipeline and | ||||
| >   return an optimizer to update the component model weights. | ||||
| > - [`Optimizer`](https://thinc.ai/docs/api-optimizers): Function that holds | ||||
| >   state between updates. | ||||
| > - [`nlp.update`](/api/language#update): Update component models with examples. | ||||
|  |  | |||