mirror of https://github.com/explosion/spaCy.git (synced 2025-10-30 23:47:31 +03:00)

	Merge branch 'master' of https://github.com/explosion/spaCy
commit 2bdf68a632

@@ -149,7 +149,9 @@ p
 
 +aside
     |  #[+api("language#begin_training") #[code begin_training()]]: Start the
-    |  training and return an optimizer function to update the model's weights.#[br]
+    |  training and return an optimizer function to update the model's weights.
+    |  Can take an optional function converting the training data to spaCy's
+    |  training format.#[br]
     |  #[+api("language#update") #[code update()]]: Update the model with the
     |  training example and gold data.#[br]
     |  #[+api("language#to_disk") #[code to_disk()]]: Save the updated model to
@@ -165,38 +167,38 @@ p
             nlp.update([doc], [gold], drop=0.5, sgd=optimizer)
     nlp.to_disk('/model')
 
+p
+    |  The #[+api("language#update") #[code nlp.update]] method takes the
+    |  following arguments:
+
 +table(["Name", "Description"])
     +row
-        +cell #[code train_data]
-        +cell The training data.
-
-    +row
-        +cell #[code get_data]
-        +cell
-            |  An optional function converting the training data to spaCy's
-            |  JSON format.
-
-    +row
-        +cell #[code doc]
+        +cell #[code docs]
         +cell
             |  #[+api("doc") #[code Doc]] objects. The #[code update] method
             |  takes a sequence of them, so you can batch up your training
-            |  examples.
+            |  examples. Alternatively, you can also pass in a sequence of
+            |  raw texts.
 
     +row
-        +cell #[code gold]
+        +cell #[code golds]
         +cell
             |  #[+api("goldparse") #[code GoldParse]] objects. The #[code update]
             |  method takes a sequence of them, so you can batch up your
-            |  training examples.
+            |  training examples. Alternatively, you can also pass in a
+            |  dictionary containing the annotations.
 
     +row
         +cell #[code drop]
-        +cell Dropout rate. Makes it harder for the model to just memorise the data.
+        +cell
+            |  Dropout rate. Makes it harder for the model to just memorise
+            |  the data.
 
     +row
-        +cell #[code optimizer]
-        +cell Callable to update the model's weights.
+        +cell #[code sgd]
+        +cell
+            |  An optimizer, i.e. a callable to update the model's weights. If
+            |  not set, spaCy will create a new one and save it for further use.
 
 p
     |  Instead of writing your own training loop, you can also use the
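
Note on the hunk above: the revised table says `nlp.update` takes parallel sequences (`docs` and `golds`, or raw texts and annotation dictionaries) plus `drop` and `sgd`. A minimal sketch of how a training loop might prepare such batches follows. It uses only the standard library; `minibatch`, `split_batch` and `training_batches` are hypothetical helpers written for this note, not spaCy functions, and the annotation keys in `TRAIN_DATA` are illustrative placeholders. The actual `nlp.update` call is left as a comment because it assumes a loaded `nlp` object and an `optimizer` returned by `begin_training()`.

```python
import random

# Toy training data: (raw text, annotation dict) pairs. The dict keys are
# placeholders for illustration, not a statement of spaCy's schema.
TRAIN_DATA = [
    ("London is big", {"entities": [(0, 6, "LOC")]}),
    ("I like Berlin", {"entities": [(7, 13, "LOC")]}),
    ("Paris is nice", {"entities": [(0, 5, "LOC")]}),
]


def minibatch(items, size):
    """Yield consecutive lists of at most `size` items."""
    for start in range(0, len(items), size):
        yield items[start:start + size]


def split_batch(batch):
    """Turn [(text, annots), ...] into two parallel lists."""
    texts = [text for text, _ in batch]
    golds = [annots for _, annots in batch]
    return texts, golds


def training_batches(data, size, seed=0):
    """Shuffle a copy of the data and yield (docs, golds) batches."""
    rng = random.Random(seed)
    shuffled = list(data)
    rng.shuffle(shuffled)
    for batch in minibatch(shuffled, size):
        yield split_batch(batch)


for docs, golds in training_batches(TRAIN_DATA, size=2):
    # With a real pipeline, this is where the update would happen, e.g.:
    # nlp.update(docs, golds, drop=0.5, sgd=optimizer)
    print(len(docs), len(golds))
```

Keeping the two lists parallel matters because each entry in `golds` annotates the entry at the same index in `docs`.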
@@ -17,6 +17,25 @@ p
     |  runtime inputs must match. This means you'll have to
     |  #[strong retrain your models] with spaCy v2.0.
 
++h(3, "migrating-document-processing") Document processing
+
+p
+    |  The #[+api("language#pipe") #[code Language.pipe]] method allows spaCy
+    |  to batch documents, which brings a
+    |  #[strong significant performance advantage] in v2.0. The new neural
+    |  networks introduce some overhead per batch, so if you're processing a
+    |  number of documents in a row, you should use #[code nlp.pipe] and process
+    |  the texts as a stream.
+
++code-new docs = nlp.pipe(texts)
++code-old docs = (nlp(text) for text in texts)
+
+p
+    |  To make usage easier, there's now a boolean #[code as_tuples]
+    |  keyword argument, that lets you pass in an iterator of
+    |  #[code (text, context)] pairs, so you can get back an iterator of
+    |  #[code (doc, context)] tuples.
+
 +h(3, "migrating-saving-loading") Saving, loading and serialization
 
 p
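
Note on the hunk above: the `as_tuples` behaviour described in the added paragraph can be illustrated without spaCy installed. The sketch below is a stand-in, not spaCy's implementation: `pipe_sketch` is a hypothetical function written for this note, and `str.split` plays the role of the `nlp` callable. It only shows the shape of the data going in and coming out, and does no batching.

```python
def pipe_sketch(items, process, as_tuples=False):
    """Lazily apply `process` to a stream of inputs.

    With as_tuples=True, `items` must yield (text, context) pairs, and
    the context is passed through untouched next to the processed text.
    """
    if as_tuples:
        for text, context in items:
            yield process(text), context
    else:
        for text in items:
            yield process(text)


# (text, context) pairs: the context can be any metadata to carry along.
data = [("First text", {"id": 1}), ("Second text", {"id": 2})]

# str.split stands in for the pipeline; with spaCy loaded, the call
# described in the diff would be nlp.pipe(data, as_tuples=True).
results = list(pipe_sketch(data, str.split, as_tuples=True))

for doc, context in results:
    print(context["id"], doc)
```

Passing the context through the stream avoids having to re-align processed documents with their metadata afterwards.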