Update v2-2 docs

parent fa9a283128
commit f537cbeacc
@@ -98,9 +98,10 @@ on disk**.
 
 > #### Example
 >
-> ```python
-> scorer = nlp.evaluate(dev_data)
-> print(scorer.textcat_scores, scorer.textcats_per_cat)
-> ```
+> ```bash
+> spacy train en /path/to/output /path/to/train /path/to/dev \
+> --pipeline textcat \
+> --textcat-arch simple_cnn --textcat-multilabel
+> ```
 
 When training your models using the `spacy train` command, you can now also
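As context for the new example: a minimal sketch, not part of the diff, of how the textcat model produced by that command might be loaded and applied. The `model-final` subdirectory name is an assumption based on spaCy v2's usual `spacy train` output layout.

```python
import spacy

# Load the model that `spacy train` wrote to the output directory.
# "model-final" is an assumed subdirectory name, per spaCy v2 conventions.
nlp = spacy.load("/path/to/output/model-final")

doc = nlp("This is a text to classify.")
# With --textcat-multilabel, each label gets an independent score in doc.cats.
print(doc.cats)
```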
@@ -117,6 +118,34 @@ classification.
 
 </Infobox>
 
+### New DocPallet class to efficiently serialize Doc collections
+
+> #### Example
+>
+> ```python
+> from spacy.tokens import DocPallet
+> pallet = DocPallet(attrs=["LEMMA", "ENT_IOB", "ENT_TYPE"], store_user_data=False)
+> for doc in nlp.pipe(texts):
+>     pallet.add(doc)
+> byte_data = pallet.to_bytes()
+> # Deserialize later, e.g. in a new process
+> nlp = spacy.blank("en")
+> pallet = DocPallet().from_bytes(byte_data)
+> docs = list(pallet.get_docs(nlp.vocab))
+> ```
+
+If you're working with lots of data, you'll probably need to pass analyses
+between machines, either to use something like Dask or Spark, or even just to
+save out work to disk. Often it's sufficient to use the `doc.to_array()`
+functionality for this and just serialize the numpy arrays, but other times
+you want a more general way to save and restore `Doc` objects.
+
+The new `DocPallet` class makes it easy to serialize and deserialize a
+collection of `Doc` objects together, and is much more efficient than calling
+`doc.to_bytes()` on each individual `Doc` object. You can also control what
+data gets saved, and you can merge pallets together for easy map/reduce-style
+processing.
+
 ### CLI command to debug and validate training data {#debug-data}
 
 > #### Example
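The added text offers `doc.to_array()` as the lightweight alternative for passing analyses around. Here is a minimal sketch of that round trip, assuming the `en_core_web_sm` model is installed; it also shows why the approach is less general than `DocPallet`, since the token texts are not stored in the array:

```python
import numpy
import spacy
from spacy.tokens import Doc

nlp = spacy.load("en_core_web_sm")
doc = nlp("Apple is looking at buying a U.K. startup.")

# Serialize selected attributes to a plain numpy array.
attrs = ["LEMMA", "ENT_IOB", "ENT_TYPE"]
arr = doc.to_array(attrs)
numpy.save("doc_attrs.npy", arr)

# Restoring requires a Doc with the same tokens: the words themselves are
# not part of the array and have to be carried separately.
words = [t.text for t in doc]
new_doc = Doc(nlp.vocab, words=words).from_array(attrs, numpy.load("doc_attrs.npy"))
```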
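The diff mentions merging pallets for map/reduce-style processing but doesn't show the API for it, so the sketch below combines pallets using only the methods from the example above (`from_bytes`, `get_docs`, `add`); `reduce_pallets` is a hypothetical helper, not spaCy API:

```python
import spacy
from spacy.tokens import DocPallet

def reduce_pallets(byte_payloads, vocab):
    """Hypothetical reduce step: fold serialized pallets collected from
    several workers into one pallet, using only the methods shown above."""
    combined = DocPallet(attrs=["LEMMA", "ENT_IOB", "ENT_TYPE"])
    for payload in byte_payloads:
        pallet = DocPallet().from_bytes(payload)
        for doc in pallet.get_docs(vocab):
            combined.add(doc)
    return combined

nlp = spacy.blank("en")
# byte_payloads would be the pallet.to_bytes() results from each worker:
# combined = reduce_pallets(byte_payloads, nlp.vocab)
```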