diff --git a/website/docs/api/large-language-models.mdx b/website/docs/api/large-language-models.mdx
index d658e9dda..934ee0507 100644
--- a/website/docs/api/large-language-models.mdx
+++ b/website/docs/api/large-language-models.mdx
@@ -9,8 +9,8 @@ menu:
   - ['Various Functions', 'various-functions']
 ---
 
-[The spacy-llm package](https://github.com/explosion/spacy-llm) integrates Large
-Language Models (LLMs) into spaCy, featuring a modular system for **fast
+[The `spacy-llm` package](https://github.com/explosion/spacy-llm) integrates
+Large Language Models (LLMs) into spaCy, featuring a modular system for **fast
 prototyping** and **prompting**, and turning unstructured responses into
 **robust outputs** for various NLP tasks, **no training data** required.
 
@@ -202,13 +202,82 @@ not require labels.
 
 ## Tasks {id="tasks"}
 
-### Task implementation {id="task-implementation"}
+In `spacy-llm`, a _task_ defines an NLP problem or question and its solution
+using an LLM. It does so by implementing the following responsibilities:
 
-A _task_ defines an NLP problem or question, that will be sent to the LLM via a
-prompt. Further, the task defines how to parse the LLM's responses back into
-structured information. All tasks are registered in the `llm_tasks` registry.
+1. Loading a prompt template and injecting documents' data into the prompt.
+   Optionally, including few-shot examples in the prompt.
+2. Splitting the prompt into several pieces following a map-reduce paradigm,
+   _if_ the prompt is too long to fit into the model's context and the task
+   supports sharding prompts.
+3. Parsing the LLM's responses back into structured information and validating
+   the parsed output.
 
-#### task.generate_prompts {id="task-generate-prompts"}
+Two different task interfaces are supported: `ShardingLLMTask` and
+`NonShardingLLMTask`. Only the former supports the sharding of documents, i.e.
+splitting up prompts if they are too long.
+
+All tasks are registered in the `llm_tasks` registry.
+
+### On sharding {id="sharding"}
+
+"Sharding" describes, generally speaking, the process of distributing parts of
+a dataset across multiple storage units for easier processing and lookups. In
+`spacy-llm` we use this term (synonymously: "mapping") to describe the
+splitting up of prompts if they are too long for a model to handle, and
+"fusing" (synonymously: "reducing") to describe how the model responses for
+several shards are merged back together into a single document.
+
+Prompts are broken up in a manner that _always_ keeps the prompt template
+intact, meaning that the instructions to the LLM will always stay complete. The
+document content, however, will be split if the length of the fully rendered
+prompt exceeds the model's context length.
+
+A toy example: let's assume a model has a context window of 25 tokens and the
+prompt template for our fictional, sharding-supporting task looks like this:
+
+```
+Estimate the sentiment of this text:
+"{text}"
+Estimated sentiment:
+```
+
+Depending on how exactly tokens are counted (this is a config setting), we
+might come up with `n = 12` tokens for the prompt instructions. Furthermore,
+let's assume that our `text` is "This has been amazing - I can't remember the
+last time I left the cinema so impressed." - which has roughly 19 tokens.
+
+Considering we only have 13 tokens to add to our prompt before we hit the
+context limit, we'll have to split our prompt into two parts.
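+
+To make the arithmetic concrete, here is a minimal sketch of the budget check
+behind this example. It is a toy illustration only - the numbers are taken
+from the prose above, and real token counts depend on the configured,
+model-specific tokenizer:
+
+```python
+import math
+
+context_window = 25      # the model's context limit (in tokens)
+instruction_tokens = 12  # rendered template without the document text
+text_tokens = 19         # tokens in the document text to be injected
+
+# Tokens left for document content in each rendered prompt.
+budget = context_window - instruction_tokens  # -> 13
+# Number of shards needed to fit the whole text (ceiling division).
+n_shards = math.ceil(text_tokens / budget)    # -> 2
+print(f"budget per shard: {budget}, shards needed: {n_shards}")
+```
+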
+Thus `spacy-llm`, assuming the task used supports sharding, will split the
+prompt into two (the default splitting strategy splits by tokens, but
+alternative strategies that split e.g. by sentences can be configured):
+
+_(Prompt 1/2)_
+
+```
+Estimate the sentiment of this text:
+"This has been amazing - I can't remember "
+Estimated sentiment:
+```
+
+_(Prompt 2/2)_
+
+```
+Estimate the sentiment of this text:
+"the last time I left the cinema so impressed."
+Estimated sentiment:
+```
+
+The reduction step is task-specific - a sentiment estimation task might e.g.
+compute a weighted average of the sentiment scores. Note that prompt sharding
+introduces potential inaccuracies, as the LLM won't have access to the entire
+document at once. Depending on your use case this might or might not be
+problematic.
+
+### `NonShardingLLMTask` {id="task-nonsharding"}
+
+#### task.generate_prompts {id="task-nonsharding-generate-prompts"}
 
 Takes a collection of documents, and returns a collection of "prompts", which
 can be of type `Any`. Often, prompts are of type `str` - but this is not
@@ -219,7 +288,7 @@ enforced to allow for maximum flexibility in the framework.
 
 | `docs`      | The input documents. ~~Iterable[Doc]~~   |
 | **RETURNS** | The generated prompts. ~~Iterable[Any]~~ |
 
-#### task.parse_responses {id="task-parse-responses"}
+#### task.parse_responses {id="task-nonsharding-parse-responses"}
 
 Takes a collection of LLM responses and the original documents, parses the
 responses into structured information, and sets the annotations on the
@@ -230,19 +299,44 @@ defined fields.
 
 The `responses` are of type `Iterable[Any]`, though they will often be `str`
 objects. This depends on the return type of the [model](#models).
 
-| Argument    | Description                                |
-| ----------- | ------------------------------------------ |
-| `docs`      | The input documents. ~~Iterable[Doc]~~     |
-| `responses` | The generated prompts. ~~Iterable[Any]~~   |
-| **RETURNS** | The annotated documents. ~~Iterable[Doc]~~ |
+| Argument    | Description                                            |
+| ----------- | ------------------------------------------------------ |
+| `docs`      | The input documents. ~~Iterable[Doc]~~                 |
+| `responses` | The responses received from the LLM. ~~Iterable[Any]~~ |
+| **RETURNS** | The annotated documents. ~~Iterable[Doc]~~             |
 
-### Raw prompting {id="raw"}
-
-Different to all other tasks `spacy.Raw.vX` doesn't provide a specific prompt,
-wrapping doc data, to the model. Instead it instructs the model to reply to the
-doc content. This is handy for use cases like question answering (where each doc
-contains one question) or if you want to include customized prompts for each
-doc.
+### `ShardingLLMTask` {id="task-sharding"}
+
+#### task.generate_prompts {id="task-sharding-generate-prompts"}
+
+Takes a collection of documents, breaks them up into shards if necessary to
+fit all content into the model's context, and returns a collection of
+collections of "prompts" (i.e. each doc can have multiple shards, each of
+which has exactly one prompt), which can be of type `Any`. Often, prompts are
+of type `str` - but this is not enforced to allow for maximum flexibility in
+the framework.
+
+| Argument    | Description                                        |
+| ----------- | -------------------------------------------------- |
+| `docs`      | The input documents. ~~Iterable[Doc]~~             |
+| **RETURNS** | The generated prompts. ~~Iterable[Iterable[Any]]~~ |
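+
+The nested return type is easiest to see in a small, self-contained sketch.
+`echo_model` below is a hypothetical stand-in for an LLM, not part of
+`spacy-llm` - it only demonstrates the one-inner-collection-per-doc shape that
+sharding tasks and models exchange:
+
+```python
+from typing import Iterable, List
+
+
+def echo_model(prompts: Iterable[Iterable[str]]) -> Iterable[Iterable[str]]:
+    # One response per prompt shard, preserving the per-doc nesting.
+    return [
+        [f"Response to: {prompt}" for prompt in doc_prompts]
+        for doc_prompts in prompts
+    ]
+
+
+# Two docs: the first was split into two shards, the second fits into one.
+prompts: List[List[str]] = [
+    ["prompt for shard 1/2", "prompt for shard 2/2"],
+    ["prompt for unsharded doc"],
+]
+for doc_responses in echo_model(prompts):
+    print(list(doc_responses))
+```
+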
+#### task.parse_responses {id="task-sharding-parse-responses"}
+
+Receives a collection of collections of LLM responses (i.e. each doc can have
+multiple shards, each of which has exactly one prompt / prompt response) and
+the original shards, parses the responses into structured information, sets
+the annotations on the shards, and merges the doc shards back into single
+docs. The `parse_responses` function is free to set the annotations in any
+way, including `Doc` fields like `ents`, `spans` or `cats`, or using custom
+defined fields.
+
+The `responses` are of type `Iterable[Iterable[Any]]`, though they will often
+be `str` objects. This depends on the return type of the [model](#models).
+
+| Argument    | Description                                                      |
+| ----------- | ---------------------------------------------------------------- |
+| `shards`    | The input document shards. ~~Iterable[Iterable[Doc]]~~           |
+| `responses` | The responses received from the LLM. ~~Iterable[Iterable[Any]]~~ |
+| **RETURNS** | The annotated documents. ~~Iterable[Doc]~~                       |
 
 ### Translation {id="translation"}
 
@@ -295,6 +389,14 @@ target_lang = "Spanish"
 path = "translation_examples.yml"
 ```
 
+### Raw prompting {id="raw"}
+
+Unlike all other tasks, `spacy.Raw.vX` doesn't provide a specific prompt
+wrapping the doc data to the model. Instead it instructs the model to reply
+directly to the doc content. This is handy for use cases like question
+answering (where each doc contains one question) or if you want to include
+customized prompts for each doc.
+
 #### spacy.Raw.v1 {id="raw-v1"}
 
 Note that since this task may request arbitrary information, it doesn't do any
@@ -1239,9 +1341,15 @@ A _model_ defines which LLM model to query, and how to query it. It can be a
 simple function taking a collection of prompts (consistent with the output type
 of `task.generate_prompts()`) and returning a collection of responses
 (consistent with the expected input of `parse_responses`). Generally speaking,
-it's a function of type `Callable[[Iterable[Any]], Iterable[Any]]`, but specific
+it's a function of type
+`Callable[[Iterable[Iterable[Any]]], Iterable[Iterable[Any]]]`, but specific
 implementations can have other signatures, like
-`Callable[[Iterable[str]], Iterable[str]]`.
+`Callable[[Iterable[Iterable[str]]], Iterable[Iterable[str]]]`.
+
+Note: the model signature expects a nested iterable so it's able to deal with
+sharded docs. Unsharded docs (i.e. those produced by
+[non-sharding tasks](/api/large-language-models#task-nonsharding)) are
+reshaped to fit the expected data structure.
 
 ### Models via REST API {id="models-rest"}
 
diff --git a/website/docs/usage/large-language-models.mdx b/website/docs/usage/large-language-models.mdx
index 43b22ce07..9507e556c 100644
--- a/website/docs/usage/large-language-models.mdx
+++ b/website/docs/usage/large-language-models.mdx
@@ -340,15 +340,45 @@ A _task_ defines an NLP problem or question, that will be sent to the LLM via a
 prompt. Further, the task defines how to parse the LLM's responses back into
 structured information. All tasks are registered in the `llm_tasks` registry.
 
-Practically speaking, a task should adhere to the `Protocol` `LLMTask` defined
-in [`ty.py`](https://github.com/explosion/spacy-llm/blob/main/spacy_llm/ty.py).
-It needs to define a `generate_prompts` function and a `parse_responses`
-function.
+Practically speaking, a task should adhere to the `Protocol` named `LLMTask`
+defined in
+[`ty.py`](https://github.com/explosion/spacy-llm/blob/main/spacy_llm/ty.py). It
+needs to define a `generate_prompts` function and a `parse_responses` function.
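+
+The sketch below shows what a minimal task satisfying the non-sharding variant
+of this protocol could look like. It is a hypothetical illustration - the
+class name, prompt template and custom attribute are made up, and a real task
+would also be registered via the `llm_tasks` registry:
+
+```python
+from typing import Iterable
+
+from spacy.tokens import Doc
+
+# Hypothetical custom attribute for storing the raw LLM reply on each doc.
+if not Doc.has_extension("llm_sentiment"):
+    Doc.set_extension("llm_sentiment", default=None)
+
+
+class SimpleSentimentTask:
+    """Toy task asking the LLM for a sentiment estimate per doc."""
+
+    _TEMPLATE = 'Estimate the sentiment of this text:\n"{text}"\nEstimated sentiment:'
+
+    def generate_prompts(self, docs: Iterable[Doc]) -> Iterable[str]:
+        # One prompt per doc: inject the doc's text into the template.
+        for doc in docs:
+            yield self._TEMPLATE.format(text=doc.text)
+
+    def parse_responses(
+        self, docs: Iterable[Doc], responses: Iterable[str]
+    ) -> Iterable[Doc]:
+        # Store the (unparsed) LLM reply as a custom attribute on each doc.
+        for doc, response in zip(docs, responses):
+            doc._.llm_sentiment = response.strip()
+            yield doc
+```
+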
-| Task                                                                         | Description                                                                                                                                                   |
-| ---------------------------------------------------------------------------- | --------------------------------------------------------------------------------------------------------------------------------------------------------------- |
-| [`task.generate_prompts`](/api/large-language-models#task-generate-prompts) | Takes a collection of documents, and returns a collection of "prompts", which can be of type `Any`.                                                          |
-| [`task.parse_responses`](/api/large-language-models#task-parse-responses)   | Takes a collection of LLM responses and the original documents, parses the responses into structured information, and sets the annotations on the documents. |
+Tasks may support prompt sharding (for more info see the API docs on
+[sharding](/api/large-language-models#task-sharding) and
+[non-sharding](/api/large-language-models#task-nonsharding) tasks). The
+function signatures for `generate_prompts` and `parse_responses` depend on
+whether a task supports sharding.
+
+_For tasks *not supporting* sharding:_
+
+| Task | Description |
+| ---- | ----------- |
+| [`task.generate_prompts`](/api/large-language-models#task-nonsharding-generate-prompts) | Takes a collection of documents, and returns a collection of prompts, which can be of type `Any`. |
+| [`task.parse_responses`](/api/large-language-models#task-nonsharding-parse-responses) | Takes a collection of LLM responses and the original documents, parses the responses into structured information, and sets the annotations on the documents. |
+
+_For tasks *supporting* sharding:_
+
+| Task | Description |
+| ---- | ----------- |
+| [`task.generate_prompts`](/api/large-language-models#task-sharding-generate-prompts) | Takes a collection of documents, breaks them up into shards if necessary, and returns a collection of collections of prompt shards, which can be of type `Any`. |
+| [`task.parse_responses`](/api/large-language-models#task-sharding-parse-responses) | Takes a collection of collections of LLM responses (one per prompt shard) and the original documents, parses the responses into structured information, sets the annotations on the doc shards, and merges those doc shards back into a single doc instance. |
 
 Moreover, the task may define an optional [`scorer` method](/api/scorer#score).
 It should accept an iterable of `Example` objects as input and return a score
@@ -370,7 +400,9 @@ evaluate the component.
 | [`spacy.TextCat.v2`](/api/large-language-models#textcat-v2)         | Version 2 builds on v1 and includes an improved prompt template.                                                  |
 | [`spacy.TextCat.v1`](/api/large-language-models#textcat-v1)         | Version 1 of the built-in TextCat task supports both zero-shot and few-shot prompting.                            |
 | [`spacy.Lemma.v1`](/api/large-language-models#lemma-v1)             | Lemmatizes the provided text and updates the `lemma_` attribute of the tokens accordingly.                        |
+| [`spacy.Raw.v1`](/api/large-language-models#raw-v1)                 | Sends the raw doc content as prompt to the LLM.                                                                   |
 | [`spacy.Sentiment.v1`](/api/large-language-models#sentiment-v1)     | Performs sentiment analysis on provided texts.                                                                    |
+| [`spacy.Translation.v1`](/api/large-language-models#translation-v1) | Translates doc content into the specified target language.                                                        |
 | [`spacy.NoOp.v1`](/api/large-language-models#noop-v1)               | This task is only useful for testing - it tells the LLM to do nothing, and does not set any fields on the `docs`. |
 
 #### Providing examples for few-shot prompts {id="few-shot-prompts"}
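+
+For tasks that support few-shot prompting, examples can be read from an
+external file via the `spacy.FewShotReader.v1` helper, as in the translation
+config shown earlier:
+
+```
+[components.llm.task.examples]
+@misc = "spacy.FewShotReader.v1"
+path = "translation_examples.yml"
+```
+
+The expected structure of the examples file depends on the task at hand. For
+the translation task, a `translation_examples.yml` might plausibly look like
+this (a hedged sketch - consult the task's documentation for the exact
+schema):
+
+```yaml
+- text: "Top of the morning to you!"
+  translation: "¡Muy buenos días!"
+```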