Mirror of https://github.com/explosion/spaCy.git (synced 2025-10-22 19:54:18 +03:00)

document Pipe API details, crossreferences etc

parent 9a7c6cc61a
commit a8aa9a8068
@@ -205,9 +205,16 @@ examples can either be the full training data or a representative sample. They
 are used to **initialize the models** of trainable pipeline components and are
 passed each component's [`begin_training`](/api/pipe#begin_training) method, if
 available. Initialization includes validating the network,
-[inferring missing shapes](https://thinc.ai/docs/usage-models#validation) and
+[inferring missing shapes](/usage/layers-architectures#shape-inference) and
 setting up the label scheme based on the data.
 
+If no `get_examples` function is provided when calling `nlp.begin_training`, the
+pipeline components will be initialized with generic data. In this case, it is
+crucial that the output dimension of each component has already been defined
+either in the [config](/usage/training#config), or by calling
+[`pipe.add_label`](/api/pipe#add_label) for each possible output label (e.g. for
+the tagger or textcat).
+
 <Infobox variant="warning" title="Changed in v3.0">
 
 The `Language.update` method now takes a **function** that is called with no
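The hunk above adds a requirement for calling `nlp.begin_training` without `get_examples`: each component's output dimension must already be known. The plain-Python sketch below illustrates that rule only roughly. It uses a hypothetical `ToyTextcat` class, and its `labels` list, `nO` attribute and the `ValueError` it raises are assumptions made for the example, not spaCy's API or error messages.

```python
from typing import Callable, Iterable, List, Optional, Tuple

Example = Tuple[str, str]  # hypothetical (text, label) pair, not spaCy's Example class


class ToyTextcat:
    """Hypothetical component; it only mimics the contract described in the docs."""

    def __init__(self) -> None:
        self.labels: List[str] = []
        self.nO: Optional[int] = None  # output dimension, unknown until initialized

    def add_label(self, label: str) -> int:
        # Return 0 if the label is already present, otherwise 1.
        if label in self.labels:
            return 0
        self.labels.append(label)
        return 1

    def begin_training(
        self, get_examples: Optional[Callable[[], Iterable[Example]]] = None
    ) -> None:
        if get_examples is not None:
            # Set up the label scheme from the data sample.
            for _text, label in get_examples():
                self.add_label(label)
        if not self.labels:
            # No sample and no declared labels: the output size cannot be known.
            raise ValueError("output dimension undefined: add labels or pass examples")
        self.nO = len(self.labels)


# With a representative sample, the labels (and nO) are taken from the data.
with_data = ToyTextcat()
with_data.begin_training(lambda: [("great film", "POS"), ("dull plot", "NEG")])

# Without a sample, every possible label has to be added up front.
no_data = ToyTextcat()
for label in ("POS", "NEG"):
    no_data.add_label(label)
no_data.begin_training()

print(with_data.nO, no_data.nO)  # 2 2
```

In this toy, `nO` is simply the number of labels. That is one straightforward way for a classifier's output width to follow from its label scheme.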
@@ -286,9 +286,6 @@ context, the original parameters are restored.
 
 ## Pipe.add_label {#add_label tag="method"}
 
-Add a new label to the pipe. It's possible to extend trained models with new
-labels, but care should be taken to avoid the "catastrophic forgetting" problem.
-
 > #### Example
 >
 > ```python
@@ -296,10 +293,85 @@ labels, but care should be taken to avoid the "catastrophic forgetting" problem.
 > pipe.add_label("MY_LABEL")
 > ```
 
-| Name        | Description                                                 |
-| ----------- | ----------------------------------------------------------- |
-| `label`     | The label to add. ~~str~~                                   |
-| **RETURNS** | `0` if the label is already present, otherwise `1`. ~~int~~ |
+<Infobox variant="danger">
+
+This method needs to be overwritten with your own custom `add_label` method.
+
+</Infobox>
+
+Add a new label to the pipe, to be predicted by the model. The actual
+implementation depends on the specific component, but in general `add_label`
+shouldn't be called if the output dimension is already set, or if the model has
+already been fully [initialized](#begin_training). If these conditions are
+violated, the function will raise an Error. The exception to this rule is when
+the component is [resizable](#is_resizable), in which case
+[`set_output`](#set_output) should be called to ensure that the model is
+properly resized.
+
+| Name        | Description                                             |
+| ----------- | ------------------------------------------------------- |
+| `label`     | The label to add. ~~str~~                               |
+| **RETURNS** | 0 if the label is already present, otherwise 1. ~~int~~ |
+
+Note that in general, you don't have to call `pipe.add_label` if you provide a
+representative data sample to the [`begin_training`](#begin_training) method. In
+this case, all labels found in the sample will be automatically added to the
+model, and the output dimension will be
+[inferred](/usage/layers-architectures#shape-inference) automatically.
+
+## Pipe.is_resizable {#is_resizable tag="method"}
+
+> #### Example
+>
+> ```python
+> can_resize = pipe.is_resizable()
+> ```
+
+Check whether or not the output dimension of the component's model can be
+resized. If this method returns `True`, [`set_output`](#set_output) can be
+called to change the model's output dimension.
+
+| Name        | Description                                                                                    |
+| ----------- | ---------------------------------------------------------------------------------------------- |
+| **RETURNS** | Whether or not the output dimension of the model can be changed after initialization. ~~bool~~ |
+
+> #### Example
+>
+> ```python
+> def custom_resize(model, new_nO):
+>     # adjust model
+>     return model
+> custom_model.attrs["resize_output"] = custom_resize
+> ```
+
+For built-in components that are not resizable, you have to create and train a
+new model from scratch with the appropriate architecture and output dimension.
+
+For custom components, you can implement a `resize_output` function and add it
+as an attribute to the component's model.
+
+## Pipe.set_output {#set_output tag="method"}
+
+Change the output dimension of the component's model. If the component is not
+[resizable](#is_resizable), this method will throw a `NotImplementedError`.
+
+If a component is resizable, the model's attribute `resize_output` will be
+called. This is a function that takes the original model and the new output
+dimension `nO`, and changes the model in place.
+
+When resizing an already trained model, care should be taken to avoid the
+"catastrophic forgetting" problem.
+
+> #### Example
+>
+> ```python
+> if pipe.is_resizable():
+>     pipe.set_output(512)
+> ```
+
+| Name | Description                       |
+| ---- | --------------------------------- |
+| `nO` | The new output dimension. ~~int~~ |
 
 ## Pipe.to_disk {#to_disk tag="method"}
 
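The new `Pipe.add_label` text describes three outcomes:

- The label is already present, so the method returns `0`.
- The label is new and the output dimension is still open, so it returns `1`.
- The output dimension is already fixed on a non-resizable component, so an error is raised.

Below is a minimal sketch of those rules. It assumes a hypothetical `ToyTagger` whose `nO` attribute stands in for the model's output dimension. The `ValueError` is a placeholder, since the docs only say that an error is raised.

```python
from typing import List, Optional


class ToyTagger:
    """Hypothetical, non-resizable component illustrating the add_label rules."""

    def __init__(self) -> None:
        self.labels: List[str] = []
        self.nO: Optional[int] = None  # None until the model is initialized

    def add_label(self, label: str) -> int:
        if label in self.labels:
            return 0  # already present: nothing to do
        if self.nO is not None:
            # Output dimension already set, and this toy cannot be resized.
            raise ValueError(f"cannot add {label!r}: output dimension is fixed")
        self.labels.append(label)
        return 1

    def begin_training(self) -> None:
        # Fixing the output dimension freezes the label set.
        self.nO = len(self.labels)


tagger = ToyTagger()
print(tagger.add_label("NOUN"), tagger.add_label("NOUN"))  # 1 0
tagger.begin_training()
try:
    tagger.add_label("VERB")
except ValueError as err:
    print(err)
```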
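The `is_resizable` and `set_output` entries describe a small protocol. A model is resizable when it carries a `resize_output` function in its `attrs`. In that case `set_output` calls the function with the model and the new `nO`; otherwise it raises `NotImplementedError`. The sketch below mirrors that protocol in plain Python. `ToyModel`, its `W` weight rows and `ToyPipe` are hypothetical. Copying the existing rows when growing is one assumed way to limit the "catastrophic forgetting" the docs warn about, not what spaCy does.

```python
from typing import Callable, Dict, List


class ToyModel:
    """Hypothetical model: one weight row per output class, plus an attrs dict."""

    def __init__(self, nO: int, nI: int) -> None:
        self.nI = nI
        # Placeholder "trained" weights, shape (nO, nI).
        self.W: List[List[float]] = [[float(i + j) for j in range(nI)] for i in range(nO)]
        self.attrs: Dict[str, Callable] = {}

    @property
    def nO(self) -> int:
        return len(self.W)


def custom_resize(model: ToyModel, new_nO: int) -> ToyModel:
    # Change the model in place: keep existing rows, append zero rows for new outputs.
    missing = new_nO - model.nO
    if missing < 0:
        raise ValueError("shrinking would drop existing output classes")
    model.W.extend([0.0] * model.nI for _ in range(missing))
    return model


class ToyPipe:
    """Hypothetical component exposing is_resizable() and set_output()."""

    def __init__(self, model: ToyModel) -> None:
        self.model = model

    def is_resizable(self) -> bool:
        # Resizable means: the model carries a resize_output function.
        return "resize_output" in self.model.attrs

    def set_output(self, nO: int) -> None:
        if not self.is_resizable():
            raise NotImplementedError("model has no resize_output attribute")
        self.model.attrs["resize_output"](self.model, nO)


model = ToyModel(nO=3, nI=4)
model.attrs["resize_output"] = custom_resize
pipe = ToyPipe(model)
old_rows = [row[:] for row in model.W]
if pipe.is_resizable():
    pipe.set_output(5)
print(model.nO, model.W[:3] == old_rows)  # 5 True

plain = ToyPipe(ToyModel(nO=2, nI=4))
try:
    plain.set_output(3)
except NotImplementedError as err:
    print("not resizable:", err)
```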
@@ -382,9 +382,11 @@ contrast to how the PyTorch layers are defined, where `in_features` precedes
 ### Shape inference in thinc {#shape-inference}
 
 It is not strictly necessary to define all the input and output dimensions for
-each layer, as Thinc can perform shape inference between sequential layers by
-matching up the output dimensionality of one layer to the input dimensionality
-of the next. This means that we can simplify the `layers` definition:
+each layer, as Thinc can perform
+[shape inference](https://thinc.ai/docs/usage-models#validation) between
+sequential layers by matching up the output dimensionality of one layer to the
+input dimensionality of the next. This means that we can simplify the `layers`
+definition:
 
 ```python
 with Model.define_operators({">>": chain}):
@@ -399,8 +401,8 @@ with Model.define_operators({">>": chain}):
 
 Thinc can go one step further and deduce the correct input dimension of the
 first layer, and output dimension of the last. To enable this functionality, you
-can call [`model.initialize`](https://thinc.ai/docs/api-model#initialize) with
-an input sample `X` and an output sample `Y` with the correct dimensions.
+have to call [`model.initialize`](https://thinc.ai/docs/api-model#initialize)
+with an input sample `X` and an output sample `Y` with the correct dimensions.
 
 ```python
 with Model.define_operators({">>": chain}):
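The last two hunks describe Thinc's shape inference in two steps. First, inner widths are matched between sequential layers. Second, `model.initialize` with samples `X` and `Y` supplies the first layer's input width and the last layer's output width. The sketch below imitates that idea in plain Python, using lists of rows instead of arrays. `ToyLinear`, `ToyChain` and `shapes()` are hypothetical. This is not Thinc's implementation; real Thinc layers are combined with `chain` or `>>` as in the docs. Following the Thinc convention the docs mention, `nO` comes before `nI`.

```python
from typing import List, Optional, Tuple

Matrix = List[List[float]]


class ToyLinear:
    """Hypothetical layer; like Thinc, the output width nO comes before nI."""

    def __init__(self, nO: Optional[int] = None, nI: Optional[int] = None) -> None:
        self.nO = nO
        self.nI = nI


class ToyChain:
    """Hypothetical sequential container that fills in missing widths."""

    def __init__(self, *layers: ToyLinear) -> None:
        self.layers = list(layers)

    def initialize(self, X: Optional[Matrix] = None, Y: Optional[Matrix] = None) -> None:
        first, last = self.layers[0], self.layers[-1]
        if X is not None and first.nI is None:
            first.nI = len(X[0])  # input width from the columns of the input sample
        if Y is not None and last.nO is None:
            last.nO = len(Y[0])  # output width from the columns of the output sample
        # Between layers: each input width matches the previous layer's output width.
        for prev, layer in zip(self.layers, self.layers[1:]):
            if layer.nI is None:
                layer.nI = prev.nO
        if any(layer.nI is None or layer.nO is None for layer in self.layers):
            raise ValueError("some dimensions could not be inferred")

    def shapes(self) -> List[Tuple[Optional[int], Optional[int]]]:
        return [(layer.nI, layer.nO) for layer in self.layers]


# Only the hidden widths are given; the first nI and the last nO are left open.
model = ToyChain(ToyLinear(nO=64), ToyLinear(nO=64), ToyLinear())
X = [[0.0] * 300 for _ in range(2)]  # two 300-dimensional input vectors
Y = [[0.0] * 10 for _ in range(2)]  # two 10-class target rows
model.initialize(X=X, Y=Y)
print(model.shapes())  # [(300, 64), (64, 64), (64, 10)]
```

Without `X` and `Y`, this toy cannot fill in the outer widths and raises. That mirrors the docs' point that `initialize` needs samples with the correct dimensions.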