mirror of https://github.com/explosion/spaCy.git (synced 2025-11-04 09:57:26 +03:00)

Update serialization 101

parent 72380c952a
commit abed463bbb

@@ -1,12 +1,12 @@
 //- 💫 DOCS > USAGE > SPACY 101 > SERIALIZATION
 
 p
-    |  If you've been modifying the pipeline, vocabulary vectors and entities, or made
-    |  updates to the model, you'll eventually want
-    |  to #[strong save your progress] – for example, everything that's in your #[code nlp]
-    |  object. This means you'll have to translate its contents and structure
-    |  into a format that can be saved, like a file or a byte string. This
-    |  process is called serialization. spaCy comes with
+    |  If you've been modifying the pipeline, vocabulary, vectors and entities,
+    |  or made updates to the model, you'll eventually want to
+    |  #[strong save your progress] – for example, everything that's in your
+    |  #[code nlp] object. This means you'll have to translate its contents and
+    |  structure into a format that can be saved, like a file or a byte string.
+    |  This process is called serialization. spaCy comes with
     |  #[strong built-in serialization methods] and supports the
     |  #[+a("http://www.diveintopython3.net/serializing.html#dump") Pickle protocol].
 
@@ -45,11 +45,7 @@ p
     |  #[code Vocab] holds the context-independent information about the words,
     |  tags and labels, and their #[strong hash values]. If the #[code Vocab]
     |  wasn't saved with the #[code Doc], spaCy wouldn't know how to resolve
-    |  those IDs – for example, the word text or the dependency labels. You
-    |  might be saving #[code 446] for "whale", but in a different vocabulary,
-    |  this ID could map to "VERB". Similarly, if your document was processed by
-    |  a German model, its vocab will include the specific
-    |  #[+a("/docs/api/annotation#dependency-parsing-german") German dependency labels].
+    |  those IDs back to strings.
 
 +code.
     moby_dick = open('moby_dick.txt', 'r') # open a large document
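
The second hunk shortens the explanation of why the vocabulary has to be saved alongside a document. As a rough illustration of that point, here is a minimal standard-library sketch. It does not use spaCy: `ToyVocab`, `encode` and `decode` are hypothetical stand-ins, and the toy assigns sequential integer IDs where spaCy uses hash values. It shows that stored IDs only resolve back to the right strings when the matching vocabulary travels with them.

```python
import pickle


class ToyVocab:
    """Hypothetical stand-in for a vocabulary: maps strings to integer IDs."""

    def __init__(self):
        self.string_to_id = {}
        self.id_to_string = {}

    def add(self, string):
        # Assumption: sequential IDs for simplicity (spaCy itself uses hashes).
        if string not in self.string_to_id:
            new_id = len(self.string_to_id)
            self.string_to_id[string] = new_id
            self.id_to_string[new_id] = string
        return self.string_to_id[string]


def encode(words, vocab):
    """Turn words into IDs, registering each word in the vocab."""
    return [vocab.add(word) for word in words]


def decode(ids, vocab):
    """Resolve IDs back to strings using the given vocab."""
    return [vocab.id_to_string[i] for i in ids]


vocab_a = ToyVocab()
doc_ids = encode(["the", "white", "whale"], vocab_a)

# Serialize the IDs together with the vocab mapping into a byte string.
payload = pickle.dumps({"ids": doc_ids, "vocab": vocab_a.id_to_string})

# After loading, the IDs resolve correctly because the mapping came along.
restored = pickle.loads(payload)
restored_words = [restored["vocab"][i] for i in restored["ids"]]

# A different vocabulary gives the same IDs entirely different meanings.
vocab_b = ToyVocab()
encode(["VERB", "NOUN", "ADJ"], vocab_b)
wrong_words = decode(doc_ids, vocab_b)
```

Decoding `doc_ids` with `vocab_b` silently yields unrelated strings, which is the failure mode the removed lines described with the "whale" versus "VERB" example.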