mirror of https://github.com/explosion/spaCy.git
synced 2025-01-11 17:56:30 +03:00
DocPallet->DocBin in docs
This commit is contained in:
parent
e53b86751f
commit
931e96b6c7
@@ -118,20 +118,20 @@ classification.
 </Infobox>
 
-### New DocPallet class to efficiently Doc collections
+### New DocBin class to efficiently serialize Doc collections
 
 > #### Example
 >
 > ```python
-> from spacy.tokens import DocPallet
-> pallet = DocPallet(attrs=["LEMMA", "ENT_IOB", "ENT_TYPE"], store_user_data=False)
+> from spacy.tokens import DocBin
+> doc_bin = DocBin(attrs=["LEMMA", "ENT_IOB", "ENT_TYPE"], store_user_data=False)
 > for doc in nlp.pipe(texts):
->     pallet.add(doc)
-> byte_data = pallet.to_bytes()
+>     doc_bin.add(doc)
+> byte_data = docbin.to_bytes()
 > # Deserialize later, e.g. in a new process
 > nlp = spacy.blank("en")
-> pallet = DocPallet()
-> docs = list(pallet.get_docs(nlp.vocab))
+> doc_bin = DocBin()
+> docs = list(doc_bin.get_docs(nlp.vocab))
 > ```
 
 If you're working with lots of data, you'll probably need to pass analyses
@@ -140,7 +140,7 @@ save out work to disk. Often it's sufficient to use the doc.to_array()
 functionality for this, and just serialize the numpy arrays --- but other times
 you want a more general way to save and restore `Doc` objects.
 
-The new `DocPallet` class makes it easy to serialize and deserialize
+The new `DocBin` class makes it easy to serialize and deserialize
 a collection of `Doc` objects together, and is much more efficient than
 calling `doc.to_bytes()` on each individual `Doc` object. You can also control
 what data gets saved, and you can merge pallets together for easy
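For context, the `DocBin` workflow the renamed docs describe round-trips like this. This is a minimal sketch, assuming spaCy >= 2.2 (where `DocBin` was introduced); the `texts` corpus and the `attrs` choice here are illustrative stand-ins, not taken from the commit itself.

```python
# Hypothetical DocBin round-trip sketch (assumes spaCy >= 2.2 is installed).
import spacy
from spacy.tokens import DocBin

nlp = spacy.blank("en")
texts = ["Hello world.", "DocBin packs many Doc objects into one blob."]

# ORTH is illustrative; the docs' example stores LEMMA/ENT_IOB/ENT_TYPE instead.
doc_bin = DocBin(attrs=["ORTH"], store_user_data=False)
for doc in nlp.pipe(texts):
    doc_bin.add(doc)

# One byte string for the whole collection, instead of one per Doc.
byte_data = doc_bin.to_bytes()

# Deserialize later, e.g. in a new process: load the bytes into a fresh
# DocBin and rebuild the Doc objects against a pipeline's vocab.
docs = list(DocBin().from_bytes(byte_data).get_docs(nlp.vocab))
print(len(docs), repr(docs[0].text))
```

Note that the restored `Doc` objects only carry the attributes listed in `attrs` (plus token text and spacing), which is what keeps the format compact.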