mirror of https://github.com/explosion/spaCy.git (synced 2024-12-26 18:06:29 +03:00)
Update saving/loading example
parent 0fb1881f36
commit 40e5d3a980
@@ -19,9 +19,8 @@ import Serialization101 from 'usage/101/_serialization.md'
 When serializing the pipeline, keep in mind that this will only save out the
 **binary data for the individual components** to allow spaCy to restore them –
 not the entire objects. This is a good thing, because it makes serialization
-safe. But it also means that you have to take care of storing the language name
-and pipeline component names as well, and restoring them separately before you
-can load in the data.
+safe. But it also means that you have to take care of storing the config, which
+contains the pipeline configuration and all the relevant settings.
 
 > #### Saving the meta and config
 >
@@ -33,24 +32,21 @@ can load in the data.
 
 ```python
 ### Serialize
+config = nlp.config
 bytes_data = nlp.to_bytes()
-lang = nlp.config["nlp"]["lang"] # "en"
-pipeline = nlp.config["nlp"]["pipeline"] # ["tagger", "parser", "ner"]
 ```
 
 ```python
 ### Deserialize
-nlp = spacy.blank(lang)
-for pipe_name in pipeline:
-    nlp.add_pipe(pipe_name)
+lang_cls = spacy.util.get_lang_class(config["nlp"]["lang"])
+nlp = lang_cls.from_config(config)
 nlp.from_bytes(bytes_data)
 ```
 
 This is also how spaCy does it under the hood when loading a pipeline: it loads
 the `config.cfg` containing the language and pipeline information, initializes
-the language class, creates and adds the pipeline components based on the
-defined [factories](/usage/processing-pipeline#custom-components-factories) and
-_then_ loads in the binary data. You can read more about this process
+the language class, creates and adds the pipeline components based on the config
+and _then_ loads in the binary data. You can read more about this process
 [here](/usage/processing-pipelines#pipelines).
 
 ## Serializing Doc objects efficiently {#docs new="2.2"}
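The updated Serialize/Deserialize snippets in this diff assume an existing `nlp` object. A minimal end-to-end sketch of the same round trip might look like the following. It assumes spaCy v3 is installed. The blank English pipeline, the `sentencizer` component, and the names `restored` and `sentences` are illustrative choices that are not part of the commit; they were picked only because they need no trained model download.

```python
import spacy
from spacy.util import get_lang_class

# Assumption: a blank English pipeline with a rule-based sentencizer stands in
# for a real trained pipeline. Neither appears in the commit itself.
nlp = spacy.blank("en")
nlp.add_pipe("sentencizer")

# Serialize: keep the config next to the binary component data.
config = nlp.config
bytes_data = nlp.to_bytes()

# Deserialize: rebuild the pipeline from the config, then load the binary data.
lang_cls = get_lang_class(config["nlp"]["lang"])
restored = lang_cls.from_config(config)
restored.from_bytes(bytes_data)

# The restored pipeline has the same components and can process text.
doc = restored("This is one sentence. This is another.")
sentences = [sent.text for sent in doc.sents]
```

Because the whole config is stored rather than just the language code and component names, any component settings it contains are carried along when the pipeline is rebuilt.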