Mirror of https://github.com/explosion/spaCy.git (synced 2025-10-26 13:41:21 +03:00)

If you've been modifying the pipeline, vocabulary, vectors and entities, or made
updates to the component models, you'll eventually want to **save your
progress** – for example, everything that's in your `nlp` object. This means
you'll have to translate its contents and structure into a format that can be
saved, like a file or a byte string. This process is called serialization. spaCy
comes with **built-in serialization methods** and supports the
[Pickle protocol](https://www.diveinto.org/python3/serializing.html#dump).

> #### What's pickle?
>
> Pickle is Python's built-in object persistence system. It lets you transfer
> arbitrary Python objects between processes. This is usually used to save an
> object to disk and load it back, but it's also used for distributed
> computing, e.g. with
> [PySpark](https://spark.apache.org/docs/0.9.0/python-programming-guide.html)
> or [Dask](https://dask.org). When you unpickle an object, you're agreeing to
> execute whatever code it contains. It's like calling `eval()` on a string – so
> don't unpickle objects from untrusted sources.
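
The warning about untrusted pickles can be made concrete with a minimal sketch that uses only the standard library. `Payload` is a hypothetical class invented for this illustration: its `__reduce__` method tells pickle to call `eval` on load, so unpickling runs the expression instead of rebuilding an object.

```python
import pickle


class Payload:
    """Hypothetical class, for illustration only."""

    def __reduce__(self):
        # Pickle stores "call eval('6 * 7')" instead of this object's state.
        return (eval, ("6 * 7",))


blob = pickle.dumps(Payload())

# Loading the bytes calls eval(); no Payload instance is reconstructed.
result = pickle.loads(blob)
print(type(result).__name__, result)
```

The expression here is harmless, but the same mechanism can run any callable with any arguments, which is why pickled data should only be loaded from sources you trust.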

All container classes, i.e. [`Language`](/api/language) (`nlp`),
[`Doc`](/api/doc), [`Vocab`](/api/vocab) and [`StringStore`](/api/stringstore),
have the following methods available:

| Method       | Returns | Example                  |
| ------------ | ------- | ------------------------ |
| `to_bytes`   | bytes   | `data = nlp.to_bytes()`  |
| `from_bytes` | object  | `nlp.from_bytes(data)`   |
| `to_disk`    | -       | `nlp.to_disk("/path")`   |
| `from_disk`  | object  | `nlp.from_disk("/path")` |
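
As a rough illustration of the four methods, the following sketch round-trips a pipeline through a byte string and through a directory on disk. It assumes spaCy is installed and uses `spacy.blank("en")` so that no trained pipeline has to be downloaded; the temporary directory, the sample text and the variable names are illustrative choices, not part of the original guide.

```python
import tempfile
from pathlib import Path

import spacy

# Blank English pipeline: tokenizer only, nothing to download.
nlp = spacy.blank("en")

# Bytes round trip: serialize, then load into a fresh pipeline object.
data = nlp.to_bytes()
nlp_from_bytes = spacy.blank("en").from_bytes(data)

# Disk round trip: to_disk writes a directory, from_disk reads it back.
with tempfile.TemporaryDirectory() as tmp:
    path = Path(tmp) / "pipeline"
    nlp.to_disk(path)
    nlp_from_disk = spacy.blank("en").from_disk(path)

# The restored pipelines should tokenize text like the original one.
text = "Serialization makes pipelines portable."
tokens_original = [token.text for token in nlp(text)]
tokens_bytes = [token.text for token in nlp_from_bytes(text)]
tokens_disk = [token.text for token in nlp_from_disk(text)]
```

Note that `from_bytes` and `from_disk` load state into an existing object and return it, which is why the sketch creates a fresh blank pipeline before loading.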