Add docs on serializing the pipeline (see #3289) [ci skip]

Ines Montani 2019-02-18 14:13:29 +01:00
parent 91f260f2c4
commit 57ae71ea95


@@ -22,6 +22,43 @@ the changes, see [this table](/usage/v2#incompat) and the notes on
</Infobox>
### Serializing the pipeline
When serializing the pipeline, keep in mind that this will only save out the
**binary data for the individual components** to allow spaCy to restore them,
not the entire objects. This is a good thing, because it makes serialization
safe. But it also means that you have to take care of storing the language name
and pipeline component names as well, and restoring them separately before you
can load in the data.
> #### Saving the model meta
>
> The `nlp.meta` attribute is a JSON-serializable dictionary and contains all
> model meta information, like the language and pipeline, but also author and
> license information.
```python
### Serialize
bytes_data = nlp.to_bytes()
lang = nlp.meta["lang"] # "en"
pipeline = nlp.meta["pipeline"] # ["tagger", "parser", "ner"]
```
```python
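### Deserialize
import spacy

nlp = spacy.blank(lang)  # empty pipeline for the stored language
for pipe_name in pipeline:
    pipe = nlp.create_pipe(pipe_name)  # create the component by name
    nlp.add_pipe(pipe)  # add it in the original pipeline order
nlp.from_bytes(bytes_data)  # load the binary data into the components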
```
This is also how spaCy does it under the hood when loading a model: it loads the
model's `meta.json` containing the language and pipeline information,
initializes the language class, creates and adds the pipeline components and
_then_ loads in the binary data. You can read more about this process
[here](/usage/processing-pipelines#pipelines).
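
Because the bytes are only useful together with the language name and the
pipeline component names, it can be convenient to store all three in one place.
The following is a minimal sketch of one way to do that using only the standard
library. The helper names `pack_pipeline` and `unpack_pipeline` and the
length-prefixed layout are assumptions made for illustration; they are not part
of spaCy.

```python
import json
import struct


def pack_pipeline(lang, pipeline, bytes_data):
    """Bundle language code, component names and binary data into one blob.

    Hypothetical helper, not a spaCy API.
    """
    header = json.dumps({"lang": lang, "pipeline": list(pipeline)}).encode("utf8")
    # Layout: 4-byte big-endian header length, JSON header, raw binary data
    return struct.pack(">I", len(header)) + header + bytes_data


def unpack_pipeline(blob):
    """Split a blob created by pack_pipeline into (lang, pipeline, bytes_data)."""
    (header_len,) = struct.unpack(">I", blob[:4])
    header = json.loads(blob[4 : 4 + header_len].decode("utf8"))
    return header["lang"], header["pipeline"], blob[4 + header_len :]


# Intended use with a loaded pipeline (assumes `nlp` exists):
# blob = pack_pipeline(nlp.meta["lang"], nlp.meta["pipeline"], nlp.to_bytes())
# lang, pipeline, bytes_data = unpack_pipeline(blob)
```

The unpacked `lang`, `pipeline` and `bytes_data` can then be passed through the
deserialization steps shown above. Keeping the header as JSON means the blob
carries no executable code, so the safety property of the byte serialization is
preserved.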
### Using Pickle {#pickle}
> #### Example