From cef9f25ec08701377cc846c8b98b043e931323e8 Mon Sep 17 00:00:00 2001
From: langdonholmes <55119338+langdonholmes@users.noreply.github.com>
Date: Mon, 19 Apr 2021 02:58:12 -0700
Subject: [PATCH] Update processing-pipelines.md to mention method for doc
 metadata (#7480)

* Update processing-pipelines.md

Under "things to try," inform users that they can save metadata when using nlp.pipe(foobar, as_tuples=True).

Link to a new example on the attributes page detailing the following:

> ```
> data = [
>   ("Some text to process", {"meta": "foo"}),
>   ("And more text...", {"meta": "bar"})
> ]
>
> for doc, context in nlp.pipe(data, as_tuples=True):
>     # Let's assume you have a "meta" extension registered on the Doc
>     doc._.meta = context["meta"]
> ```

from https://stackoverflow.com/questions/57058798/make-spacy-nlp-pipe-process-tuples-of-text-and-additional-information-to-add-as

* Updating the attributes section

Update the attributes section with an example of how extensions can be used to store metadata.

* Update processing-pipelines.md

* Update processing-pipelines.md

Made the as_tuples example executable and relocated it to the end of the "Processing Text" section.

* Update processing-pipelines.md

* Update processing-pipelines.md

Removed extra line

* Reformat and rephrase

Co-authored-by: Adriane Boyd <adrianeboyd@gmail.com>
---
 website/docs/usage/processing-pipelines.md | 33 ++++++++++++++++++++++
 1 file changed, 33 insertions(+)
diff --git a/website/docs/usage/processing-pipelines.md b/website/docs/usage/processing-pipelines.md
index 52568658d..bde3ab84f 100644
--- a/website/docs/usage/processing-pipelines.md
+++ b/website/docs/usage/processing-pipelines.md
@@ -91,6 +91,37 @@ have to call `list()` on it first:
 
 </Infobox>
 
+You can use the `as_tuples` option to pass additional context along with each
+doc when using [`nlp.pipe`](/api/language#pipe). If `as_tuples` is `True`, the
+input should be an iterable of `(text, context)` tuples and the output will be
+a generator of `(doc, context)` tuples. For example, you can pass metadata in
+the context and save it in a [custom attribute](#custom-components-attributes):
+
+```python
+### {executable="true"}
+import spacy
+from spacy.tokens import Doc
+
+if not Doc.has_extension("text_id"):
+    Doc.set_extension("text_id", default=None)
+
+text_tuples = [
+    ("This is the first text.", {"text_id": "text1"}),
+    ("This is the second text.", {"text_id": "text2"})
+]
+
+nlp = spacy.load("en_core_web_sm")
+doc_tuples = nlp.pipe(text_tuples, as_tuples=True)
+
+docs = []
+for doc, context in doc_tuples:
+    doc._.text_id = context["text_id"]
+    docs.append(doc)
+
+for doc in docs:
+    print(f"{doc._.text_id}: {doc.text}")
+```
+
 ### Multiprocessing {#multiprocessing}
 
 spaCy includes built-in support for multiprocessing with
@@ -1373,6 +1404,8 @@ There are three main types of extensions, which can be defined using the
 [`Span.set_extension`](/api/span#set_extension) and
 [`Token.set_extension`](/api/token#set_extension) methods.
 
+## Description
+
 1. **Attribute extensions.** Set a default value for an attribute, which can be
    overwritten manually at any time. Attribute extensions work like "normal"
    variables and are the quickest way to store arbitrary information on a `Doc`,