Update saving/loading example

Adriane Boyd 2021-03-18 16:56:10 +01:00
parent 0fb1881f36
commit 40e5d3a980


@@ -19,9 +19,8 @@ import Serialization101 from 'usage/101/_serialization.md'
 When serializing the pipeline, keep in mind that this will only save out the
 **binary data for the individual components** to allow spaCy to restore them –
 not the entire objects. This is a good thing, because it makes serialization
-safe. But it also means that you have to take care of storing the language name
-and pipeline component names as well, and restoring them separately before you
-can load in the data.
+safe. But it also means that you have to take care of storing the config, which
+contains the pipeline configuration and all the relevant settings.
 
 > #### Saving the meta and config
 >
@@ -33,24 +32,21 @@ can load in the data.
 ```python
 ### Serialize
+config = nlp.config
 bytes_data = nlp.to_bytes()
-lang = nlp.config["nlp"]["lang"]  # "en"
-pipeline = nlp.config["nlp"]["pipeline"]  # ["tagger", "parser", "ner"]
 ```
 
 ```python
 ### Deserialize
-nlp = spacy.blank(lang)
-for pipe_name in pipeline:
-    nlp.add_pipe(pipe_name)
+lang_cls = spacy.util.get_lang_class(config["nlp"]["lang"])
+nlp = lang_cls.from_config(config)
 nlp.from_bytes(bytes_data)
 ```
 
 This is also how spaCy does it under the hood when loading a pipeline: it loads
 the `config.cfg` containing the language and pipeline information, initializes
-the language class, creates and adds the pipeline components based on the
-defined [factories](/usage/processing-pipeline#custom-components-factories) and
-_then_ loads in the binary data. You can read more about this process
+the language class, creates and adds the pipeline components based on the config
+and _then_ loads in the binary data. You can read more about this process
 [here](/usage/processing-pipelines#pipelines).
 
 ## Serializing Doc objects efficiently {#docs new="2.2"}
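
The loading order the updated paragraph describes (config first, then components, then binary data) can be illustrated with a small toy model that does not depend on spaCy. This is only a sketch of the idea: `ToyComponent` and `ToyPipeline` are hypothetical stand-ins invented for illustration, and they are not spaCy classes or spaCy's actual implementation. Only the method names (`config`, `from_config`, `to_bytes`, `from_bytes`) are chosen to mirror the calls used in the example above.

```python
import json


class ToyComponent:
    """Hypothetical component: its only state is a list of weights."""

    def __init__(self, name):
        self.name = name
        self.weights = []

    def to_bytes(self):
        return json.dumps(self.weights).encode("utf8")

    def from_bytes(self, data):
        self.weights = json.loads(data.decode("utf8"))
        return self


class ToyPipeline:
    """Hypothetical pipeline: the structure lives in the config, the state in bytes."""

    def __init__(self, lang):
        self.lang = lang
        self.components = {}

    @property
    def config(self):
        # Describes which components exist, but not what they have learned.
        return {"nlp": {"lang": self.lang, "pipeline": list(self.components)}}

    @classmethod
    def from_config(cls, config):
        # Rebuild the empty structure: language first, then each component.
        pipeline = cls(config["nlp"]["lang"])
        for name in config["nlp"]["pipeline"]:
            pipeline.components[name] = ToyComponent(name)
        return pipeline

    def to_bytes(self):
        # Saves only per-component data, not the objects themselves.
        payload = {n: c.to_bytes().decode("utf8") for n, c in self.components.items()}
        return json.dumps(payload).encode("utf8")

    def from_bytes(self, data):
        # Data can only be loaded into components that already exist.
        payload = json.loads(data.decode("utf8"))
        for name, component in self.components.items():
            if name in payload:
                component.from_bytes(payload[name].encode("utf8"))
        return self


# Serialize: keep the config and the binary data side by side.
original = ToyPipeline.from_config({"nlp": {"lang": "en", "pipeline": ["tagger", "ner"]}})
original.components["tagger"].weights = [0.1, 0.2]
config, bytes_data = original.config, original.to_bytes()

# Deserialize: rebuild from the config, then load the data.
restored = ToyPipeline.from_config(config).from_bytes(bytes_data)

# Without the config there are no components to receive the data.
bare = ToyPipeline("en").from_bytes(bytes_data)
```

In this toy model, `restored` ends up with the same components and weights as `original`, while `bare` stays empty because nothing was created to receive the data. That is the reason the config has to be stored alongside the bytes.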