mirror of
https://github.com/explosion/spaCy.git
synced 2024-12-30 20:06:30 +03:00
30 lines
1.6 KiB
Markdown
If you've been modifying the pipeline, vocabulary, vectors and entities, or made
updates to the model, you'll eventually want to **save your progress** – for
example, everything that's in your `nlp` object. This means you'll have to
translate its contents and structure into a format that can be saved, like a
file or a byte string. This process is called serialization. spaCy comes with
**built-in serialization methods** and supports the
[Pickle protocol](http://www.diveintopython3.net/serializing.html#dump).

> #### What's pickle?
>
> Pickle is Python's built-in object persistence system. It lets you transfer
> arbitrary Python objects between processes. This is usually used to save an
> object to disk and load it back, but it's also used for distributed computing,
> e.g. with
> [PySpark](https://spark.apache.org/docs/0.9.0/python-programming-guide.html)
> or [Dask](http://dask.pydata.org/en/latest/). When you unpickle an object,
> you're agreeing to execute whatever code it contains. It's like calling
> `eval()` on a string – so don't unpickle objects from untrusted sources.

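To make the warning about untrusted pickles concrete, here is a minimal sketch using only Python's standard `pickle` module. The `Payload` class is hypothetical and deliberately harmless: its `__reduce__` method tells pickle to call `eval()` on a string when the data is loaded, so loading executes code instead of rebuilding an object.

```python
import pickle


class Payload:
    """Hypothetical class whose pickled form carries an instruction, not data."""

    def __reduce__(self):
        # pickle stores this (callable, args) pair and calls it on load.
        return (eval, ("6 * 7",))


data = pickle.dumps(Payload())

# Loading does not rebuild a Payload; it runs eval("6 * 7") instead.
result = pickle.loads(data)
print(result)  # 42
```

A real attacker would swap the arithmetic for something destructive, which is why pickled data should only be loaded if you trust where it came from.
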
All container classes, i.e. [`Language`](/api/language) (`nlp`),
[`Doc`](/api/doc), [`Vocab`](/api/vocab) and [`StringStore`](/api/stringstore)
have the following methods available:

| Method       | Returns | Example                  |
| ------------ | ------- | ------------------------ |
| `to_bytes`   | bytes   | `data = nlp.to_bytes()`  |
| `from_bytes` | object  | `nlp.from_bytes(data)`   |
| `to_disk`    | -       | `nlp.to_disk("/path")`   |
| `from_disk`  | object  | `nlp.from_disk("/path")` |
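
The four methods can be combined into simple round trips. The following is a minimal sketch, assuming spaCy is installed; it uses a blank English pipeline so no trained model has to be downloaded. The example sentence, the variable names and the temporary directory name `blank_en` are illustrative choices.

```python
import tempfile
from pathlib import Path

import spacy
from spacy.tokens import Doc

# A blank English pipeline: tokenizer only, no trained components.
nlp = spacy.blank("en")
doc = nlp("Serialization turns objects into bytes.")

# Doc round trip via bytes: restore into an empty Doc sharing the vocab.
doc_bytes = doc.to_bytes()
restored_doc = Doc(nlp.vocab).from_bytes(doc_bytes)

# Language round trip via bytes: load into a fresh pipeline of the same language.
nlp_bytes = nlp.to_bytes()
nlp_from_bytes = spacy.blank("en").from_bytes(nlp_bytes)

# Language round trip via disk, using a temporary directory as the target.
with tempfile.TemporaryDirectory() as tmp_dir:
    model_path = Path(tmp_dir) / "blank_en"
    nlp.to_disk(model_path)
    nlp_from_disk = spacy.blank("en").from_disk(model_path)

print(restored_doc.text)
```

Note that `from_bytes` and `from_disk` load into an existing object, so the sketch first creates an empty container of the same type and then fills it with the saved data.
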