Mirror of https://github.com/explosion/spaCy.git, synced 2024-12-25 17:36:30 +03:00

Commit 2bdf68a632: Merge branch 'master' of https://github.com/explosion/spaCy
@@ -149,7 +149,9 @@ p
 +aside
     |  #[+api("language#begin_training") #[code begin_training()]]: Start the
-    |  training and return an optimizer function to update the model's weights.#[br]
+    |  training and return an optimizer function to update the model's weights.
+    |  Can take an optional function converting the training data to spaCy's
+    |  training format.#[br]
     |  #[+api("language#update") #[code update()]]: Update the model with the
     |  training example and gold data.#[br]
     |  #[+api("language#to_disk") #[code to_disk()]]: Save the updated model to
@@ -165,38 +167,38 @@ p
     nlp.update([doc], [gold], drop=0.5, sgd=optimizer)
     nlp.to_disk('/model')
 
 p
     |  The #[+api("language#update") #[code nlp.update]] method takes the
     |  following arguments:
 
 +table(["Name", "Description"])
     +row
-        +cell #[code train_data]
-        +cell The training data.
-
-    +row
-        +cell #[code get_data]
-        +cell
-            |  An optional function converting the training data to spaCy's
-            |  JSON format.
-
-    +row
-        +cell #[code doc]
+        +cell #[code docs]
         +cell
             |  #[+api("doc") #[code Doc]] objects. The #[code update] method
             |  takes a sequence of them, so you can batch up your training
-            |  examples.
+            |  examples. Alternatively, you can also pass in a sequence of
+            |  raw texts.
 
     +row
-        +cell #[code gold]
+        +cell #[code golds]
         +cell
             |  #[+api("goldparse") #[code GoldParse]] objects. The #[code update]
             |  method takes a sequence of them, so you can batch up your
-            |  training examples.
+            |  training examples. Alternatively, you can also pass in a
+            |  dictionary containing the annotations.
 
     +row
         +cell #[code drop]
-        +cell Dropout rate. Makes it harder for the model to just memorise the data.
+        +cell
+            |  Dropout rate. Makes it harder for the model to just memorise
+            |  the data.
 
     +row
-        +cell #[code optimizer]
-        +cell Callable to update the model's weights.
+        +cell #[code sgd]
+        +cell
+            |  An optimizer, i.e. a callable to update the model's weights. If
+            |  not set, spaCy will create a new one and save it for further use.
 
 p
     |  Instead of writing your own training loop, you can also use the
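The updated table says `nlp.update` takes sequences of docs and golds, so training examples can be batched. As a rough illustration of the batching side only, here is a minimal pure-Python sketch; the `minibatch` helper and the `(text, annotations)` examples are illustrative stand-ins (the real library ships its own `spacy.util.minibatch`), and the actual `nlp.update` call is left as a comment since it needs a loaded model:

```python
import random

def minibatch(items, size=8):
    """Yield successive fixed-size batches from a list of training examples.
    Simplified sketch of what spacy.util.minibatch does in the real library."""
    for i in range(0, len(items), size):
        yield items[i:i + size]

# Hypothetical (text, annotations) training examples.
train_data = [("text %d" % i, {"entities": []}) for i in range(20)]

random.seed(0)
random.shuffle(train_data)
for batch in minibatch(train_data, size=8):
    texts = [text for text, annots in batch]
    golds = [annots for text, annots in batch]
    # In a real loop: nlp.update(texts, golds, drop=0.5, sgd=optimizer)
```

Shuffling between epochs and feeding `update` one small batch at a time is the pattern the surrounding docs describe; only the batch-splitting logic is shown here.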
@@ -17,6 +17,25 @@ p
     |  runtime inputs must match. This means you'll have to
     |  #[strong retrain your models] with spaCy v2.0.
 
++h(3, "migrating-document-processing") Document processing
+
+p
+    |  The #[+api("language#pipe") #[code Language.pipe]] method allows spaCy
+    |  to batch documents, which brings a
+    |  #[strong significant performance advantage] in v2.0. The new neural
+    |  networks introduce some overhead per batch, so if you're processing a
+    |  number of documents in a row, you should use #[code nlp.pipe] and process
+    |  the texts as a stream.
+
++code-new docs = nlp.pipe(texts)
++code-old docs = (nlp(text) for text in texts)
+
+p
+    |  To make usage easier, there's now a boolean #[code as_tuples]
+    |  keyword argument, that lets you pass in an iterator of
+    |  #[code (text, context)] pairs, so you can get back an iterator of
+    |  #[code (doc, context)] tuples.
+
 +h(3, "migrating-saving-loading") Saving, loading and serialization
 
 p
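The `as_tuples` behaviour the added paragraph describes — pass `(text, context)` pairs in, get `(doc, context)` tuples back — can be mimicked in plain Python. This is a sketch of the pattern only, not spaCy's implementation: `process` is a dummy stand-in for the batched document processing a real `nlp` object would do.

```python
def pipe_as_tuples(process, data):
    """Mimic nlp.pipe(data, as_tuples=True): split off the contexts,
    process the texts as one stream, and re-pair each result with
    its original context."""
    texts, contexts = zip(*data)          # split the (text, context) pairs
    docs = (process(t) for t in texts)    # stand-in for batched processing
    for doc, context in zip(docs, contexts):
        yield doc, context

# Dummy usage: str.upper stands in for the nlp object.
data = [("hello world", {"id": 1}), ("spaCy rocks", {"id": 2})]
results = list(pipe_as_tuples(str.upper, data))
```

The point of the real keyword argument is that the context objects ride along untouched, so you can keep metadata (IDs, database keys) associated with each document without a separate bookkeeping structure.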