Mirror of https://github.com/explosion/spaCy.git, synced 2025-01-13 18:56:36 +03:00
Merge branch 'master' of https://github.com/explosion/spaCy
commit 2bdf68a632
@@ -149,7 +149,9 @@ p
 
 +aside
 | #[+api("language#begin_training") #[code begin_training()]]: Start the
-| training and return an optimizer function to update the model's weights.#[br]
+| training and return an optimizer function to update the model's weights.
+| Can take an optional function converting the training data to spaCy's
+| training format.#[br]
 | #[+api("language#update") #[code update()]]: Update the model with the
 | training example and gold data.#[br]
 | #[+api("language#to_disk") #[code to_disk()]]: Save the updated model to
@@ -165,38 +167,38 @@ p
 nlp.update([doc], [gold], drop=0.5, sgd=optimizer)
 nlp.to_disk('/model')
 
+p
+| The #[+api("language#update") #[code nlp.update]] method takes the
+| following arguments:
+
 +table(["Name", "Description"])
 +row
-+cell #[code train_data]
-+cell The training data.
-
-+row
-+cell #[code get_data]
-+cell
-| An optional function converting the training data to spaCy's
-| JSON format.
-
-+row
-+cell #[code doc]
++cell #[code docs]
 +cell
 | #[+api("doc") #[code Doc]] objects. The #[code update] method
 | takes a sequence of them, so you can batch up your training
-| examples.
+| examples. Alternatively, you can also pass in a sequence of
+| raw texts.
 
 +row
-+cell #[code gold]
++cell #[code golds]
 +cell
 | #[+api("goldparse") #[code GoldParse]] objects. The #[code update]
 | method takes a sequence of them, so you can batch up your
-| training examples.
+| training examples. Alternatively, you can also pass in a
+| dictionary containing the annotations.
 
 +row
 +cell #[code drop]
-+cell Dropout rate. Makes it harder for the model to just memorise the data.
++cell
+| Dropout rate. Makes it harder for the model to just memorise
+| the data.
 
 +row
-+cell #[code optimizer]
-+cell Callable to update the model's weights.
++cell #[code sgd]
++cell
+| An optimizer, i.e. a callable to update the model's weights. If
+| not set, spaCy will create a new one and save it for further use.
 
 p
 | Instead of writing your own training loop, you can also use the
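The argument table in this hunk says `nlp.update` accepts either `Doc` objects or raw texts, either `GoldParse` objects or annotation dictionaries, and an optional `sgd` optimizer that spaCy creates and keeps when it is omitted. The sketch below mimics only that input handling, in plain Python so it runs without spaCy installed. `ToyLanguage`, `Doc`, `Gold`, `make_optimizer` and the `"tags"` annotation key used in the usage example are hypothetical stand-ins, not spaCy's classes or its internal implementation.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List, Optional, Sequence, Tuple, Union


# Hypothetical stand-ins so the sketch runs without spaCy installed.
@dataclass
class Doc:
    text: str


@dataclass
class Gold:
    annotations: Dict[str, object] = field(default_factory=dict)


Optimizer = Callable[..., None]


def make_optimizer(learn_rate: float = 0.1) -> Optimizer:
    """Return a plain SGD step that updates `weights` in place."""
    def sgd(weights: List[float], gradient: List[float]) -> None:
        for i, grad in enumerate(gradient):
            weights[i] -= learn_rate * grad
    return sgd


class ToyLanguage:
    def __init__(self) -> None:
        # Created lazily by the first update() call that omits `sgd`.
        self._optimizer: Optional[Optimizer] = None

    def update(
        self,
        docs: Sequence[Union[str, Doc]],
        golds: Sequence[Union[dict, Gold]],
        drop: float = 0.0,
        sgd: Optional[Optimizer] = None,
    ) -> Tuple[List[Doc], List[Gold], Optimizer]:
        if len(docs) != len(golds):
            raise ValueError("docs and golds must have the same length")
        if not 0.0 <= drop < 1.0:
            raise ValueError("drop must be in [0, 1)")
        # Raw texts are wrapped as Doc objects; Doc objects pass through.
        docs = [d if isinstance(d, Doc) else Doc(d) for d in docs]
        # Annotation dicts are wrapped as Gold objects; Gold objects pass through.
        golds = [g if isinstance(g, Gold) else Gold(dict(g)) for g in golds]
        if sgd is None:
            if self._optimizer is None:
                self._optimizer = make_optimizer()
            sgd = self._optimizer
        # A real model would run a forward/backward pass here with dropout
        # rate `drop` and call sgd(...) on its weights; the sketch only
        # returns the normalised inputs and the optimizer it would use.
        return docs, golds, sgd
```

Calling `update()` twice without `sgd` reuses the same cached optimizer object, which corresponds to the "save it for further use" behaviour in the table. Passing an explicit `sgd`, as the `nlp.update([doc], [gold], drop=0.5, sgd=optimizer)` context line does, bypasses the cache.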
@@ -17,6 +17,25 @@ p
 | runtime inputs must match. This means you'll have to
 | #[strong retrain your models] with spaCy v2.0.
 
++h(3, "migrating-document-processing") Document processing
+
+p
+| The #[+api("language#pipe") #[code Language.pipe]] method allows spaCy
+| to batch documents, which brings a
+| #[strong significant performance advantage] in v2.0. The new neural
+| networks introduce some overhead per batch, so if you're processing a
+| number of documents in a row, you should use #[code nlp.pipe] and process
+| the texts as a stream.
+
++code-new docs = nlp.pipe(texts)
++code-old docs = (nlp(text) for text in texts)
+
+p
+| To make usage easier, there's now a boolean #[code as_tuples]
+| keyword argument, that lets you pass in an iterator of
+| #[code (text, context)] pairs, so you can get back an iterator of
+| #[code (doc, context)] tuples.
+
 +h(3, "migrating-saving-loading") Saving, loading and serialization
 p
 
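The document-processing text in this hunk makes two points: per-batch overhead is amortised by feeding texts through as a stream in batches, and `as_tuples` carries a context value alongside each text. The plain-Python sketch below shows one generic way to implement both patterns. `minibatch`, `pipe` and the `process_batch` callback are hypothetical names, the default batch size of 1000 is arbitrary, and this is not spaCy's own code.

```python
from itertools import islice, tee
from typing import Callable, Iterable, Iterator, List, TypeVar

T = TypeVar("T")


def minibatch(items: Iterable[T], size: int) -> Iterator[List[T]]:
    """Yield lists of up to `size` items, pulling from `items` lazily."""
    if size < 1:
        raise ValueError("size must be at least 1")
    iterator = iter(items)
    while True:
        batch = list(islice(iterator, size))
        if not batch:
            return
        yield batch


def pipe(
    texts: Iterable,
    process_batch: Callable[[List[str]], List[T]],
    batch_size: int = 1000,
    as_tuples: bool = False,
) -> Iterator:
    """Stream results of `process_batch`, calling it once per batch."""
    if as_tuples:
        # Duplicate the (text, context) stream: one copy feeds the texts
        # through the batched pipeline, the other supplies the contexts.
        for_texts, for_contexts = tee(texts)
        docs = pipe((text for text, _ in for_texts), process_batch, batch_size)
        for doc, (_, context) in zip(docs, for_contexts):
            yield doc, context
        return
    for batch in minibatch(texts, batch_size):
        # Any fixed per-call overhead is paid once per batch, not once per text.
        yield from process_batch(batch)
```

For spaCy v2 itself, the text above describes the equivalent call as `nlp.pipe(pairs, as_tuples=True)`, yielding `(doc, context)` tuples. In the sketch, `itertools.tee` keeps both streams lazy and only buffers the pairs the document stream has read ahead, roughly one batch.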