Mirror of https://github.com/explosion/spaCy.git (synced 2025-01-26 17:24:41 +03:00)
Add docs on serializing the pipeline (see #3289) [ci skip]
Parent: 91f260f2c4
Commit: 57ae71ea95
@@ -22,6 +22,43 @@ the changes, see [this table](/usage/v2#incompat) and the notes on
</Infobox>

### Serializing the pipeline

When serializing the pipeline, keep in mind that this will only save out the
**binary data for the individual components** to allow spaCy to restore them –
not the entire objects. This is a good thing, because it makes serialization
safe. But it also means that you have to take care of storing the language name
and pipeline component names as well, and restoring them separately before you
can load in the data.

> #### Saving the model meta
>
> The `nlp.meta` attribute is a JSON-serializable dictionary and contains all
> model meta information, like the language and pipeline, but also author and
> license information.

```python
### Serialize
bytes_data = nlp.to_bytes()
lang = nlp.meta["lang"]  # "en"
pipeline = nlp.meta["pipeline"]  # ["tagger", "parser", "ner"]
```

```python
### Deserialize
import spacy

nlp = spacy.blank(lang)
# Recreate each component by name before loading the binary data
for pipe_name in pipeline:
    pipe = nlp.create_pipe(pipe_name)
    nlp.add_pipe(pipe)
nlp.from_bytes(bytes_data)
```

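Because `nlp.to_bytes()` does not include the language or the component names, one option is to keep all three values together in a single payload. The following is a minimal sketch of that idea using only the standard library. The helper names `pack_pipeline` and `unpack_pipeline` are hypothetical and not part of spaCy, and base64-in-JSON is just one possible encoding.

```python
import base64
import json


def pack_pipeline(lang, pipeline, bytes_data):
    """Bundle language, component names and binary data into one JSON string.

    Hypothetical helper for illustration; not part of spaCy.
    """
    return json.dumps({
        "lang": lang,
        "pipeline": list(pipeline),
        # JSON cannot hold raw bytes, so encode them as ASCII text
        "data": base64.b64encode(bytes_data).decode("ascii"),
    })


def unpack_pipeline(payload):
    """Inverse of pack_pipeline: return (lang, pipeline, bytes_data)."""
    obj = json.loads(payload)
    return obj["lang"], obj["pipeline"], base64.b64decode(obj["data"])
```

With the snippets above, this could be called as `pack_pipeline(nlp.meta["lang"], nlp.meta["pipeline"], nlp.to_bytes())`, and the three values returned by `unpack_pipeline` would feed the deserialization steps.
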
This is also how spaCy does it under the hood when loading a model: it loads the
model's `meta.json` containing the language and pipeline information,
initializes the language class, creates and adds the pipeline components and
_then_ loads in the binary data. You can read more about this process
[here](/usage/processing-pipelines#pipelines).

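The loading order described above can be sketched for a model directory on disk. This is an illustration only, not spaCy's actual loader: it assumes the spaCy v2.x API used elsewhere in this section (`spacy.blank`, `create_pipe`, `add_pipe`, `from_disk`), and the function names `read_pipeline_meta` and `load_from_dir` are hypothetical.

```python
import json
from pathlib import Path


def read_pipeline_meta(model_dir):
    """Return (lang, pipeline) from a model directory's meta.json."""
    meta_path = Path(model_dir) / "meta.json"
    meta = json.loads(meta_path.read_text(encoding="utf8"))
    return meta["lang"], meta.get("pipeline", [])


def load_from_dir(model_dir):
    """Rebuild a pipeline in the order described above (spaCy v2.x assumed)."""
    import spacy  # imported lazily so read_pipeline_meta works without spaCy

    lang, pipeline = read_pipeline_meta(model_dir)
    nlp = spacy.blank(lang)  # 1. initialize the language class
    for pipe_name in pipeline:  # 2. create and add the components
        nlp.add_pipe(nlp.create_pipe(pipe_name))
    return nlp.from_disk(model_dir)  # 3. then load the binary data
```
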
### Using Pickle {#pickle}

> #### Example