Matthew Honnibal 2017-11-08 03:01:16 +01:00
commit 2bdf68a632
2 changed files with 39 additions and 18 deletions

@@ -149,7 +149,9 @@ p
  +aside
  | #[+api("language#begin_training") #[code begin_training()]]: Start the
- | training and return an optimizer function to update the model's weights.#[br]
+ | training and return an optimizer function to update the model's weights.
+ | Can take an optional function converting the training data to spaCy's
+ | training format.#[br]
  | #[+api("language#update") #[code update()]]: Update the model with the
  | training example and gold data.#[br]
  | #[+api("language#to_disk") #[code to_disk()]]: Save the updated model to
@@ -165,38 +167,38 @@ p
  nlp.update([doc], [gold], drop=0.5, sgd=optimizer)
  nlp.to_disk('/model')
+ p
+ | The #[+api("language#update") #[code nlp.update]] method takes the
+ | following arguments:
  +table(["Name", "Description"])
  +row
- +cell #[code train_data]
- +cell The training data.
- +row
- +cell #[code get_data]
- +cell
- | An optional function converting the training data to spaCy's
- | JSON format.
- +row
- +cell #[code doc]
+ +cell #[code docs]
  +cell
  | #[+api("doc") #[code Doc]] objects. The #[code update] method
  | takes a sequence of them, so you can batch up your training
- | examples.
+ | examples. Alternatively, you can also pass in a sequence of
+ | raw texts.
  +row
- +cell #[code gold]
+ +cell #[code golds]
  +cell
  | #[+api("goldparse") #[code GoldParse]] objects. The #[code update]
  | method takes a sequence of them, so you can batch up your
- | training examples.
+ | training examples. Alternatively, you can also pass in a
+ | dictionary containing the annotations.
  +row
  +cell #[code drop]
- +cell Dropout rate. Makes it harder for the model to just memorise the data.
+ +cell
+ | Dropout rate. Makes it harder for the model to just memorise
+ | the data.
  +row
- +cell #[code optimizer]
- +cell Callable to update the model's weights.
+ +cell #[code sgd]
+ +cell
+ | An optimizer, i.e. a callable to update the model's weights. If
+ | not set, spaCy will create a new one and save it for further use.
  p
  | Instead of writing your own training loop, you can also use the
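
The updated table says `nlp.update` takes a sequence of `docs` (or raw texts) and a matching sequence of `golds` (or annotation dictionaries), so examples can be batched. A minimal sketch of that batching step follows. It uses plain Python only: the `minibatch` helper, the `TRAIN_DATA` contents and the `"entities"` key are illustrative assumptions, not taken from this commit, and the actual `nlp.update` call is shown only as a comment because no pipeline is loaded here.

```python
from itertools import islice


def minibatch(items, size):
    """Yield successive lists of at most `size` items from any iterable.

    Hypothetical helper written for this sketch.
    """
    iterator = iter(items)
    while True:
        batch = list(islice(iterator, size))
        if not batch:
            return
        yield batch


# Hypothetical training data: (raw text, annotation dict) pairs.
# The "entities" key and its offsets are assumptions for illustration.
TRAIN_DATA = [
    ("Who is Shaka Khan?", {"entities": [(7, 17, "PERSON")]}),
    ("I like London.", {"entities": [(7, 13, "LOC")]}),
    ("Berlin is nice.", {"entities": [(0, 6, "LOC")]}),
]

batches = []
for batch in minibatch(TRAIN_DATA, size=2):
    # Split the pairs into two parallel sequences, one for each argument.
    texts, annotations = zip(*batch)
    batches.append((list(texts), list(annotations)))
    # With a loaded pipeline and an optimizer returned by begin_training(),
    # each batch would then be passed on roughly as:
    # nlp.update(texts, annotations, drop=0.5, sgd=optimizer)
```

Each element of `batches` holds two lists of equal length, mirroring the `docs` and `golds` arguments described in the table.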

@@ -17,6 +17,25 @@ p
  | runtime inputs must match. This means you'll have to
  | #[strong retrain your models] with spaCy v2.0.
+ +h(3, "migrating-document-processing") Document processing
+ p
+ | The #[+api("language#pipe") #[code Language.pipe]] method allows spaCy
+ | to batch documents, which brings a
+ | #[strong significant performance advantage] in v2.0. The new neural
+ | networks introduce some overhead per batch, so if you're processing a
+ | number of documents in a row, you should use #[code nlp.pipe] and process
+ | the texts as a stream.
+ +code-new docs = nlp.pipe(texts)
+ +code-old docs = (nlp(text) for text in texts)
+ p
+ | To make usage easier, there's now a boolean #[code as_tuples]
+ | keyword argument, that lets you pass in an iterator of
+ | #[code (text, context)] pairs, so you can get back an iterator of
+ | #[code (doc, context)] tuples.
  +h(3, "migrating-saving-loading") Saving, loading and serialization
  p
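
The `as_tuples` behaviour described in the added paragraph can be pictured with a small stand-in. This is a sketch in plain Python, not spaCy's implementation: `pipe_as_tuples`, the sample `pairs` and the use of `str.split` in place of a real pipeline are all assumptions made so the example runs without a model. Unlike `nlp.pipe`, the stand-in does no batching; it only shows how the context travels alongside each text.

```python
def pipe_as_tuples(process, pairs):
    """Hypothetical stand-in for nlp.pipe(pairs, as_tuples=True).

    Applies `process` to the text of each (text, context) pair and yields
    (result, context) tuples lazily, in input order.
    """
    for text, context in pairs:
        yield process(text), context


# Hypothetical input: each text carries some metadata as its context.
pairs = [
    ("First text", {"id": 1}),
    ("Second text", {"id": 2}),
]

# str.split stands in for the pipeline; it returns a list of tokens.
results = list(pipe_as_tuples(str.split, pairs))

for tokens, context in results:
    print(context["id"], tokens)
```

With a loaded pipeline, the equivalent loop would presumably be `for doc, context in nlp.pipe(pairs, as_tuples=True):`, based on the keyword argument named in the text above.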