mirror of https://github.com/explosion/spaCy.git (synced 2025-10-29 06:57:49 +03:00)
Update processing-pipelines.md to mention method for doc metadata (#7480)
* Update processing-pipelines.md
Under "things to try," inform users they can save metadata when using nlp.pipe(foobar, as_tuples=True)
Link to a new example on the attributes page detailing the following:
> ```
> data = [
>     ("Some text to process", {"meta": "foo"}),
>     ("And more text...", {"meta": "bar"})
> ]
>
> for doc, context in nlp.pipe(data, as_tuples=True):
>     # Let's assume you have a "meta" extension registered on the Doc
>     doc._.meta = context["meta"]
> ```
from https://stackoverflow.com/questions/57058798/make-spacy-nlp-pipe-process-tuples-of-text-and-additional-information-to-add-as
* Updating the attributes section
Update the attributes section with example of how extensions can be used to store metadata.
* Update processing-pipelines.md
* Update processing-pipelines.md
Made as_tuples example executable and relocated to the end of the "Processing Text" section.
* Update processing-pipelines.md
* Update processing-pipelines.md
Removed extra line
* Reformat and rephrase
Co-authored-by: Adriane Boyd <adrianeboyd@gmail.com>
parent fd6eebbfdc
commit cef9f25ec0
@@ -91,6 +91,37 @@ have to call `list()` on it first:
 
 </Infobox>
 
+You can use the `as_tuples` option to pass additional context along with each
+doc when using [`nlp.pipe`](/api/language#pipe). If `as_tuples` is `True`, then
+the input should be a sequence of `(text, context)` tuples and the output will
+be a sequence of `(doc, context)` tuples. For example, you can pass metadata in
+the context and save it in a [custom attribute](#custom-components-attributes):
+
+```python
+### {executable="true"}
+import spacy
+from spacy.tokens import Doc
+
+if not Doc.has_extension("text_id"):
+    Doc.set_extension("text_id", default=None)
+
+text_tuples = [
+    ("This is the first text.", {"text_id": "text1"}),
+    ("This is the second text.", {"text_id": "text2"})
+]
+
+nlp = spacy.load("en_core_web_sm")
+doc_tuples = nlp.pipe(text_tuples, as_tuples=True)
+
+docs = []
+for doc, context in doc_tuples:
+    doc._.text_id = context["text_id"]
+    docs.append(doc)
+
+for doc in docs:
+    print(f"{doc._.text_id}: {doc.text}")
+```
+
 ### Multiprocessing {#multiprocessing}
 
 spaCy includes built-in support for multiprocessing with
@@ -1373,6 +1404,8 @@ There are three main types of extensions, which can be defined using the
 [`Span.set_extension`](/api/span#set_extension) and
 [`Token.set_extension`](/api/token#set_extension) methods.
 
+## Description
+
 1. **Attribute extensions.** Set a default value for an attribute, which can be
    overwritten manually at any time. Attribute extensions work like "normal"
    variables and are the quickest way to store arbitrary information on a `Doc`,
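The `(text, context)` in, `(doc, context)` out pairing that the first hunk documents for `as_tuples=True` can be pictured without spaCy installed. The following is only a rough mental model and should not be read as spaCy's implementation: `pipe_as_tuples` and `fake_pipeline` are hypothetical names introduced for illustration, and the stand-in pipeline upper-cases strings instead of producing `Doc` objects. It assumes the processing step yields exactly one result per input text, in input order.

```python
from itertools import tee
from typing import Any, Callable, Iterable, Iterator, Tuple, TypeVar

T = TypeVar("T")


def pipe_as_tuples(
    pairs: Iterable[Tuple[str, Any]],
    process: Callable[[Iterable[str]], Iterable[T]],
) -> Iterator[Tuple[T, Any]]:
    """Hypothetical helper: stream (text, context) pairs through `process`."""
    # Duplicate the stream so texts and contexts can be consumed separately
    # without first materializing the whole input.
    for_texts, for_contexts = tee(pairs)
    texts = (text for text, _ in for_texts)
    contexts = (context for _, context in for_contexts)
    # Assumption: `process` yields one result per text, in input order,
    # so zipping re-attaches each context to the right result.
    yield from zip(process(texts), contexts)


def fake_pipeline(texts: Iterable[str]) -> Iterator[str]:
    """Stand-in for a real pipeline: lazily upper-cases each text."""
    for text in texts:
        yield text.upper()


text_tuples = [
    ("This is the first text.", {"text_id": "text1"}),
    ("This is the second text.", {"text_id": "text2"}),
]

results = list(pipe_as_tuples(text_tuples, fake_pipeline))
for processed, context in results:
    print(f"{context['text_id']}: {processed}")
```

The context objects pass through untouched, which is why the documented example can copy `context["text_id"]` onto each doc after processing.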