DocPallet->DocBin in docs

Matthew Honnibal 2019-09-18 15:16:57 +02:00
parent e53b86751f
commit 931e96b6c7

@ -118,20 +118,20 @@ classification.
</Infobox> </Infobox>
### New DocPallet class to efficiently Doc collections ### New DocBin class to efficiently serialize Doc collections
> #### Example > #### Example
> >
> ```python > ```python
> from spacy.tokens import DocPallet > from spacy.tokens import DocBin
> pallet = DocPallet(attrs=["LEMMA", "ENT_IOB", "ENT_TYPE"], store_user_data=False) > doc_bin = DocBin(attrs=["LEMMA", "ENT_IOB", "ENT_TYPE"], store_user_data=False)
> for doc in nlp.pipe(texts): > for doc in nlp.pipe(texts):
> pallet.add(doc) > doc_bin.add(doc)
> byte_data = pallet.to_bytes() > byte_data = doc_bin.to_bytes()
> # Deserialize later, e.g. in a new process > # Deserialize later, e.g. in a new process
> nlp = spacy.blank("en") > nlp = spacy.blank("en")
> pallet = DocPallet() > doc_bin = DocBin().from_bytes(byte_data)
> docs = list(pallet.get_docs(nlp.vocab)) > docs = list(doc_bin.get_docs(nlp.vocab))
> ``` > ```
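As a rough illustration of the round trip, the sketch below merges two `DocBin` containers, serializes the result, and restores the `Doc` objects in a fresh container. It is not part of this commit: the sample texts and variable names are made up for illustration, and it assumes a spaCy version (2.2 or later) in which `DocBin.merge` and `DocBin.from_bytes` are available.

```python
import spacy
from spacy.tokens import DocBin

# Hypothetical sample data; a blank pipeline is enough for tokenization.
nlp = spacy.blank("en")
texts = ["Hello world", "DocBin stores many docs"]

# Two containers with identical attrs, so that they can be merged.
first = DocBin(attrs=["LEMMA"], store_user_data=False)
second = DocBin(attrs=["LEMMA"], store_user_data=False)
first.add(nlp(texts[0]))
second.add(nlp(texts[1]))

# Fold the second container into the first, then serialize everything at once.
first.merge(second)
byte_data = first.to_bytes()

# Restore later: load the bytes into a new container, then rebuild the docs
# against a vocab.
restored = DocBin().from_bytes(byte_data)
docs = list(restored.get_docs(nlp.vocab))
restored_texts = [doc.text for doc in docs]
```

Note that `get_docs` only yields documents once the container has been populated, either through `add` or through `from_bytes`.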
If you're working with lots of data, you'll probably need to pass analyses If you're working with lots of data, you'll probably need to pass analyses
@ -140,7 +140,7 @@ save out work to disk. Often it's sufficient to use the doc.to_array()
functionality for this, and just serialize the numpy arrays --- but other times functionality for this, and just serialize the numpy arrays --- but other times
you want a more general way to save and restore `Doc` objects. you want a more general way to save and restore `Doc` objects.
The new `DocPallet` class makes it easy to serialize and deserialize The new `DocBin` class makes it easy to serialize and deserialize
a collection of `Doc` objects together, and is much more efficient than a collection of `Doc` objects together, and is much more efficient than
calling `doc.to_bytes()` on each individual `Doc` object. You can also control calling `doc.to_bytes()` on each individual `Doc` object. You can also control
what data gets saved, and you can merge pallets together for easy what data gets saved, and you can merge pallets together for easy