Mirror of https://github.com/explosion/spaCy.git, synced 2024-12-26 01:46:28 +03:00
Update serialization 101

parent 72380c952a
commit abed463bbb
@@ -1,12 +1,12 @@
 //- 💫 DOCS > USAGE > SPACY 101 > SERIALIZATION
 
 p
-| If you've been modifying the pipeline, vocabulary vectors and entities, or made
-| updates to the model, you'll eventually want
-| to #[strong save your progress] – for example, everything that's in your #[code nlp]
-| object. This means you'll have to translate its contents and structure
-| into a format that can be saved, like a file or a byte string. This
-| process is called serialization. spaCy comes with
+| If you've been modifying the pipeline, vocabulary, vectors and entities,
+| or made updates to the model, you'll eventually want to
+| #[strong save your progress] – for example, everything that's in your
+| #[code nlp] object. This means you'll have to translate its contents and
+| structure into a format that can be saved, like a file or a byte string.
+| This process is called serialization. spaCy comes with
 | #[strong built-in serialization methods] and supports the
 | #[+a("http://www.diveintopython3.net/serializing.html#dump") Pickle protocol].
 
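The paragraph in this hunk describes serialization as translating an object's contents and structure into something that can be saved, such as a file or a byte string, and it mentions support for the Pickle protocol. The minimal sketch below shows both targets using Python's standard `pickle` module. It assumes a hypothetical `pipeline_state` dict as a stand-in for an `nlp` object. It does not use spaCy's own serialization methods, and it does not reflect spaCy's real internal layout.

```python
import os
import pickle
import tempfile

# Hypothetical stand-in for the state of an nlp object: plain Python
# containers, NOT spaCy's Language class or its actual internal layout.
pipeline_state = {
    "pipeline": ["tagger", "parser", "ner"],
    "vocab": {"whale": 1, "ship": 2},
    "vectors": {"whale": [0.1, 0.2, 0.3]},
}

# Serialize to a byte string, then restore an equal copy from it.
data = pickle.dumps(pipeline_state)
restored = pickle.loads(data)

# Serialize to a file on disk, then load it back.
path = os.path.join(tempfile.mkdtemp(), "pipeline.pkl")
with open(path, "wb") as f:
    pickle.dump(pipeline_state, f)
with open(path, "rb") as f:
    from_disk = pickle.load(f)

print(type(data).__name__, restored == pipeline_state, from_disk == pipeline_state)
```

The byte-string route suits sending state between processes. The file route persists it across sessions. Pickle only round-trips what it is given, so any lookup tables the data depends on must be saved along with it.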
@@ -45,11 +45,7 @@ p
 | #[code Vocab] holds the context-independent information about the words,
 | tags and labels, and their #[strong hash values]. If the #[code Vocab]
 | wasn't saved with the #[code Doc], spaCy wouldn't know how to resolve
-| those IDs – for example, the word text or the dependency labels. You
-| might be saving #[code 446] for "whale", but in a different vocabulary,
-| this ID could map to "VERB". Similarly, if your document was processed by
-| a German model, its vocab will include the specific
-| #[+a("/docs/api/annotation#dependency-parsing-german") German dependency labels].
+| those IDs back to strings.
 
 +code.
 moby_dick = open('moby_dick.txt', 'r') # open a large document
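The text in this hunk explains that a `Doc` stores hash values, and that without its `Vocab` those IDs cannot be resolved back to strings. The toy sketch below illustrates that dependency in plain Python. `toy_hash` and `ToyStringStore` are hypothetical names. BLAKE2b is used only as a stand-in for whatever hash function spaCy actually applies. None of this is spaCy's API.

```python
import hashlib


def toy_hash(text):
    """Hypothetical stand-in for spaCy's string hashing: a 64-bit
    integer derived from a BLAKE2b digest of the UTF-8 text."""
    digest = hashlib.blake2b(text.encode("utf8"), digest_size=8).digest()
    return int.from_bytes(digest, "little")


class ToyStringStore:
    """Hypothetical two-way table between strings and their hash IDs."""

    def __init__(self):
        self._strings = {}

    def add(self, text):
        key = toy_hash(text)
        self._strings[key] = text
        return key

    def __getitem__(self, key):
        # Resolving an ID only works if its string was added to this store.
        return self._strings[key]


vocab = ToyStringStore()
# A "serialized document" in this toy model keeps only the integer IDs.
doc_ids = [vocab.add(word) for word in ["Call", "me", "Ishmael"]]

# With the same string table, the IDs resolve back to text.
print([vocab[key] for key in doc_ids])

# With a fresh, empty table, the same IDs cannot be resolved.
other_vocab = ToyStringStore()
try:
    other_vocab[doc_ids[0]]
except KeyError:
    print("unknown ID without the original vocab")
```

Because the hash is deterministic, the same string always yields the same ID. The reverse direction, from ID to string, still needs the table that recorded the string. That is why the vocabulary has to be saved alongside the document.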