Merge branch 'docs/llm_develop' into docs/llm-translation-task

# Conflicts:
#	website/docs/api/large-language-models.mdx
Raphael Mitsch 2023-12-11 17:21:40 +01:00
commit d981c27c93

@@ -236,6 +236,13 @@ objects. This depends on the return type of the [model](#models).
| `responses` | The generated responses. ~~Iterable[Any]~~ |
| **RETURNS** | The annotated documents. ~~Iterable[Doc]~~ |
### Raw prompting {id="raw"}
Unlike all other tasks, `spacy.Raw.vX` doesn't send the model a specific prompt
wrapping the doc data. Instead it instructs the model to reply directly to the
doc content. This is handy for use cases like question answering (where each doc
contains one question) or if you want each doc to carry its own customized prompt.
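For instance, with a pipeline whose `llm` component uses this task, question
answering boils down to passing each question through as the doc text. A minimal
sketch (assuming a `config.cfg` like the one shown under
[spacy.Raw.v1](#raw-v1) below, plus a configured model and valid credentials):

```python
# Sketch of the question-answering use case. Assumes config.cfg sets up an
# llm component with the spacy.Raw.v1 task and leaves `field` at its
# default value "reply".
from spacy_llm.util import assemble

nlp = assemble("config.cfg")
# The doc text itself serves as the prompt.
doc = nlp("What's the capital of France?")
# The model's answer is stored in the extension attribute named by `field`.
print(doc._.reply)
```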
### Translation {id="translation"}
The translation task translates texts from a defined or inferred source to a
defined target language.
@@ -287,6 +294,60 @@ target_lang = "Spanish"
path = "translation_examples.yml"
```
#### spacy.Raw.v1 {id="raw-v1"}
Note that since this task may request arbitrary information, it doesn't do any
parsing per se: the model response is stored in a custom `Doc` attribute (i.e.
it can be accessed via `doc._.{field}`).
It supports both zero-shot and few-shot prompting.
> #### Example config
>
> ```ini
> [components.llm.task]
> @llm_tasks = "spacy.Raw.v1"
> examples = null
> ```
| Argument | Description |
| --------------------- | ------------------------------------------------------------------------------------------------------------------------------------------------------------------------- |
| `template` | Custom prompt template to send to the LLM. Defaults to [raw.v1.jinja](https://github.com/explosion/spacy-llm/blob/main/spacy_llm/tasks/templates/raw.v1.jinja). ~~str~~ |
| `examples` | Optional function that generates examples for few-shot learning. Defaults to `None`. ~~Optional[Callable[[], Iterable[Any]]]~~ |
| `parse_responses` | Callable for parsing LLM responses for this task. Defaults to the internal parsing method for this task. ~~Optional[TaskResponseParser[RawTask]]~~ |
| `prompt_example_type` | Type to use for fewshot examples. Defaults to `RawExample`. ~~Optional[Type[FewshotExample]]~~ |
| `field` | Name of the extension attribute to store the model's reply in (i.e. the reply will be available in `doc._.{field}`). Defaults to `reply`. ~~str~~ |
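The `template` argument can also be filled from a file in the config, e.g. via
the `spacy.FileReader.v1` helper. A sketch, where `raw_template.jinja` is a
hypothetical custom template in your working directory:

```ini
# Sketch: swap the built-in raw.v1.jinja for a custom Jinja template.
# raw_template.jinja is a placeholder file name.
[components.llm.task]
@llm_tasks = "spacy.Raw.v1"

[components.llm.task.template]
@misc = "spacy.FileReader.v1"
path = "raw_template.jinja"
```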
To perform [few-shot learning](/usage/large-language-models#few-shot-prompts),
you can write down a few examples in a separate file and provide these to be
injected into the LLM's prompt. The default reader `spacy.FewShotReader.v1`
supports `.yml`, `.yaml`, `.json` and `.jsonl`.
```yaml
# Each example can follow an arbitrary pattern. It might help prompt
# performance though if the examples resemble the actual docs' content.
- text: "3 + 5 = x. What's x?"
  reply: '8'
- text: 'Write me a limerick.'
  reply:
    "There was an Old Man with a beard, Who said, 'It is just as I feared! Two
    Owls and a Hen, Four Larks and a Wren, Have all built their nests in my
    beard!'"
- text: "Analyse the sentiment of the text 'This is great'."
  reply: "'This is great' expresses a very positive sentiment."
```
```ini
[components.llm.task]
@llm_tasks = "spacy.Raw.v1"
field = "llm_reply"

[components.llm.task.examples]
@misc = "spacy.FewShotReader.v1"
path = "raw_examples.yml"
```
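Putting it together, a minimal usage sketch, assuming the config above is saved
as `config.cfg` next to `raw_examples.yml` and completed with a
`[components.llm.model]` block and valid credentials:

```python
# Sketch: run the few-shot setup above over several docs at once.
from spacy_llm.util import assemble

nlp = assemble("config.cfg")
prompts = ["3 + 7 = x. What's x?", "Write me a haiku."]
for doc in nlp.pipe(prompts):
    # field = "llm_reply" in the config, so replies land in doc._.llm_reply.
    print(doc._.llm_reply)
```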
### Summarization {id="summarization"}
A summarization task takes a document as input and generates a summary that is
stored in an extension attribute.