simplify sections

svlandeg 2023-08-31 15:31:18 +02:00
parent 81d3bef507
commit c76e53999e


@@ -88,6 +88,11 @@ objects. This depends on the return type of the [model](#models).

| `responses` | The generated prompts. ~~Iterable[Any]~~ |
| **RETURNS** | The annotated documents. ~~Iterable[Doc]~~ |

+### Summarization {id="summarization"}
+
+A summarization task takes a document as input and generates a summary that is
+stored in an extension attribute.
+
#### spacy.Summarization.v1 {id="summarization-v1"}

The `spacy.Summarization.v1` task supports both zero-shot and few-shot
@@ -154,9 +159,9 @@ The NER task identifies non-overlapping entities in text.

#### spacy.NER.v2 {id="ner-v2"}

-The built-in NER task supports both zero-shot and few-shot prompting. This
-version also supports explicitly defining the provided labels with custom
-descriptions.
+This version supports explicitly defining the provided labels with custom
+descriptions, and further supports zero-shot and few-shot prompting just like
+v1.

> #### Example config
>
@@ -182,15 +187,6 @@ The parameters `alignment_mode`, `case_sensitive_matching` and `single_match`

are identical to the [v1](#ner-v1) implementation. The format of few-shot
examples are also the same.

-```ini
-[components.llm.task]
-@llm_tasks = "spacy.NER.v2"
-labels = PERSON,ORGANISATION,LOCATION
-
-[components.llm.task.examples]
-@misc = "spacy.FewShotReader.v1"
-path = "ner_examples.yml"
-```

> Label descriptions can also be used with explicit examples to give as much
> info to the LLM model as possible.
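
To make this concrete, here is a sketch (not part of this commit; the label definitions and file names are illustrative placeholders) of a config that combines label descriptions with explicit few-shot examples:

```ini
# Illustrative only: label definitions and paths below are placeholders.
[components.llm.task]
@llm_tasks = "spacy.NER.v2"
labels = PERSON,ORGANISATION,LOCATION

[components.llm.task.label_definitions]
PERSON = "A named human individual."
ORGANISATION = "A company, institution or other named group of people."
LOCATION = "A named geographical or political place."

[components.llm.task.examples]
@misc = "spacy.FewShotReader.v1"
path = "ner_examples.yml"
```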
@@ -273,18 +269,19 @@ supports `.yml`, `.yaml`, `.json` and `.jsonl`.

```

```ini
-[components.llm.task]
-@llm_tasks = "spacy.NER.v1"
-labels = PERSON,ORGANISATION,LOCATION

[components.llm.task.examples]
@misc = "spacy.FewShotReader.v1"
path = "ner_examples.yml"
```

+### SpanCat {id="spancat"}
+
+The SpanCat task identifies potentially overlapping entities in text.
+
#### spacy.SpanCat.v2 {id="spancat-v2"}

-The built-in SpanCat task is a simple adaptation of the NER task to support
-overlapping entities and store its annotations in `doc.spans`.
+The built-in SpanCat v2 task is a simple adaptation of the NER v2 task to
+support overlapping entities and store its annotations in `doc.spans`.

> #### Example config
>
@@ -307,8 +304,9 @@ overlapping entities and store its annotations in `doc.spans`.

| `case_sensitive_matching` | Whether to search without case sensitivity. Defaults to `False`. ~~bool~~ |
| `single_match` | Whether to match an entity in the LLM's response only once (the first hit) or multiple times. Defaults to `False`. ~~bool~~ |

-Except for the `spans_key` parameter, the SpanCat task reuses the configuration
-from the NER task. Refer to [its documentation](#ner-v2) for more insight.
+Except for the `spans_key` parameter, the SpanCat v2 task reuses the
+configuration from the NER v2 task. Refer to [its documentation](#ner-v2) for
+more insight.

#### spacy.SpanCat.v1 {id="spancat-v1"}
@@ -335,14 +333,19 @@ v1 NER task to support overlapping entities and store its annotations in

| `case_sensitive_matching` | Whether to search without case sensitivity. Defaults to `False`. ~~bool~~ |
| `single_match` | Whether to match an entity in the LLM's response only once (the first hit) or multiple times. Defaults to `False`. ~~bool~~ |

-Except for the `spans_key` parameter, the SpanCat task reuses the configuration
-from the NER task. Refer to [its documentation](#ner-v1) for more insight.
+Except for the `spans_key` parameter, the SpanCat v1 task reuses the
+configuration from the NER v1 task. Refer to [its documentation](#ner-v1) for
+more insight.

+### TextCat {id="textcat"}
+
+The TextCat task labels documents with relevant categories.
+
#### spacy.TextCat.v3 {id="textcat-v3"}

-Version 3 (the most recent) of the built-in TextCat task supports both zero-shot
-and few-shot prompting. It allows setting definitions of labels. Those
-definitions are included in the prompt.
+On top of the functionality from v2, version 3 of the built-in TextCat tasks
+allows setting definitions of labels. Those definitions are included in the
+prompt.

> #### Example config
>
@@ -357,52 +360,23 @@ definitions are included in the prompt.

> examples = null
> ```

| Argument | Description |
| ------------------------- | ----------- |
| `labels` | List of labels or str of comma-separated list of labels. ~~Union[List[str], str]~~ |
-| `label_definitions` | Dictionary of label definitions. Included in the prompt, if set. Defaults to `None`. ~~Optional[Dict[str, str]]~~ |
+| `label_definitions` (NEW) | Dictionary of label definitions. Included in the prompt, if set. Defaults to `None`. ~~Optional[Dict[str, str]]~~ |
| `template` | Custom prompt template to send to LLM model. Default templates for each task are located in the `spacy_llm/tasks/templates` directory. Defaults to [`textcat.v3.jinja`](https://github.com/explosion/spacy-llm/blob/main/spacy_llm/tasks/templates/textcat.v3.jinja). ~~str~~ |
| `examples` | Optional function that generates examples for few-shot learning. Defaults to `None`. ~~Optional[Callable[[], Iterable[Any]]]~~ |
| `normalizer` | Function that normalizes the labels as returned by the LLM. If `None`, falls back to `spacy.LowercaseNormalizer.v1`. Defaults to `None`. ~~Optional[Callable[[str], str]]~~ |
| `exclusive_classes` | If set to `True`, only one label per document should be valid. If set to `False`, one document can have multiple labels. Defaults to `False`. ~~bool~~ |
| `allow_none` | When set to `True`, allows the LLM to not return any of the given label. The resulting dict in `doc.cats` will have `0.0` scores for all labels. Defaults to `True`. ~~bool~~ |
| `verbose` | If set to `True`, warnings will be generated when the LLM returns invalid responses. Defaults to `False`. ~~bool~~ |

-To perform [few-shot learning](/usage/large-langauge-models#few-shot-prompts),
-you can write down a few examples in a separate file, and provide these to be
-injected into the prompt to the LLM. The default reader `spacy.FewShotReader.v1`
-supports `.yml`, `.yaml`, `.json` and `.jsonl`.
+The formatting of few-shot examples is the same as those for the
+[v1](#textcat-v1) implementation.

-```json
-[
-  {
-    "text": "You look great!",
-    "answer": "Compliment"
-  },
-  {
-    "text": "You are not very clever at all.",
-    "answer": "Insult"
-  }
-]
-```

-```ini
-[components.llm.task]
-@llm_tasks = "spacy.TextCat.v3"
-labels = ["COMPLIMENT", "INSULT"]
-label_definitions = {
-    "COMPLIMENT": "a polite expression of praise or admiration.",
-    "INSULT": "a disrespectful or scornfully abusive remark or act."
-}
-
-[components.llm.task.examples]
-@misc = "spacy.FewShotReader.v1"
-path = "textcat_examples.json"
-```

#### spacy.TextCat.v2 {id="textcat-v2"}

-Version 2 of the built-in TextCat task supports both zero-shot and few-shot
-prompting and includes an improved prompt template.
+V2 includes all v1 functionality, with an improved prompt template.

> #### Example config
>
@@ -423,32 +397,8 @@ prompting and includes an improved prompt template.

| `allow_none` | When set to `True`, allows the LLM to not return any of the given label. The resulting dict in `doc.cats` will have `0.0` scores for all labels. Defaults to `True`. ~~bool~~ |
| `verbose` | If set to `True`, warnings will be generated when the LLM returns invalid responses. Defaults to `False`. ~~bool~~ |

-To perform [few-shot learning](/usage/large-langauge-models#few-shot-prompts),
-you can write down a few examples in a separate file, and provide these to be
-injected into the prompt to the LLM. The default reader `spacy.FewShotReader.v1`
-supports `.yml`, `.yaml`, `.json` and `.jsonl`.
+The formatting of few-shot examples is the same as those for the
+[v1](#textcat-v1) implementation.

-```json
-[
-  {
-    "text": "You look great!",
-    "answer": "Compliment"
-  },
-  {
-    "text": "You are not very clever at all.",
-    "answer": "Insult"
-  }
-]
-```

-```ini
-[components.llm.task]
-@llm_tasks = "spacy.TextCat.v2"
-labels = ["COMPLIMENT", "INSULT"]
-
-[components.llm.task.examples]
-@misc = "spacy.FewShotReader.v1"
-path = "textcat_examples.json"
-```

#### spacy.TextCat.v1 {id="textcat-v1"}
@@ -492,14 +442,15 @@ supports `.yml`, `.yaml`, `.json` and `.jsonl`.

```

```ini
-[components.llm.task]
-@llm_tasks = "spacy.TextCat.v2"
-labels = COMPLIMENT,INSULT

[components.llm.task.examples]
@misc = "spacy.FewShotReader.v1"
path = "textcat_examples.json"
```

+### REL {id="rel"}
+
+The REL task extracts relations between named entities.
+
#### spacy.REL.v1 {id="rel-v1"}

The built-in REL task supports both zero-shot and few-shot prompting. It relies
@@ -545,10 +496,14 @@ Note: the REL task relies on pre-extracted entities to make its prediction.

Hence, you'll need to add a component that populates `doc.ents` with recognized
spans to your spaCy pipeline and put it _before_ the REL component.
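
To illustrate (this sketch is not part of the commit; the sourced pipeline and the relation labels are assumptions), a config that places an entity recognizer before the LLM REL component could look like:

```ini
[nlp]
lang = "en"
# "ner" must run before "llm" so that doc.ents is populated for the REL task.
pipeline = ["ner", "llm"]

[components.ner]
# Illustrative: reuse the NER component from a pretrained pipeline.
source = "en_core_web_sm"

[components.llm]
factory = "llm"

[components.llm.task]
@llm_tasks = "spacy.REL.v1"
labels = ["LivesIn", "Visits"]
```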
+### Lemma {id="lemma"}
+
+The Lemma task lemmatizes the provided text and updates the `lemma_` attribute
+in the doc's tokens accordingly.
+
#### spacy.Lemma.v1 {id="lemma-v1"}

-The `Lemma.v1` task lemmatizes the provided text and updates the `lemma_`
-attribute in the doc's tokens accordingly.
+This task supports both zero-shot and few-shot prompting.

> #### Example config
>
@@ -563,9 +518,9 @@ attribute in the doc's tokens accordingly.

| `template` | Custom prompt template to send to LLM model. Default templates for each task are located in the `spacy_llm/tasks/templates` directory. Defaults to [lemma.v1.jinja](https://github.com/explosion/spacy-llm/blob/main/spacy_llm/tasks/templates/lemma.v1.jinja). ~~str~~ |
| `examples` | Optional function that generates examples for few-shot learning. Defaults to `None`. ~~Optional[Callable[[], Iterable[Any]]]~~ |

-`Lemma.v1` prompts the LLM to lemmatize the passed text and return the
-lemmatized version as a list of tokens and their corresponding lemma. E. g. the
-text `I'm buying ice cream for my friends` should invoke the response
+The task prompts the LLM to lemmatize the passed text and return the lemmatized
+version as a list of tokens and their corresponding lemma. E. g. the text
+`I'm buying ice cream for my friends` should invoke the response

```
I: I
@@ -617,12 +572,16 @@ supports `.yml`, `.yaml`, `.json` and `.jsonl`.

path = "lemma_examples.yml"
```

-#### spacy.Sentiment.v1 {id="sentiment-v1"}
+### Sentiment {id="sentiment"}

Performs sentiment analysis on provided texts. Scores between 0 and 1 are stored
in `Doc._.sentiment` - the higher, the more positive. Note in cases of parsing
issues (e. g. in case of unexpected LLM responses) the value might be `None`.

+#### spacy.Sentiment.v1 {id="sentiment-v1"}
+
+This task supports both zero-shot and few-shot prompting.
+
> #### Example config
>
> ```ini
@@ -661,6 +620,11 @@ path = "sentiment_examples.yml"

path = "sentiment_examples.yml"
```

+### NoOp {id="noop"}
+
+This task is only useful for testing - it tells the LLM to do nothing, and does
+not set any fields on the `docs`.
+
#### spacy.NoOp.v1 {id="noop-v1"}

> #### Example config
@@ -670,10 +634,7 @@ path = "sentiment_examples.yml"

> @llm_tasks = "spacy.NoOp.v1"
> ```

-This task is only useful for testing - it tells the LLM to do nothing, and does
-not set any fields on the `docs`.
-
-### Models {id="models"}
+## Models {id="models"}

A _model_ defines which LLM model to query, and how to query it. It can be a
simple function taking a collection of prompts (consistent with the output type
@@ -683,7 +644,7 @@ it's a function of type `Callable[[Iterable[Any]], Iterable[Any]]`, but specific

implementations can have other signatures, like
`Callable[[Iterable[str]], Iterable[str]]`.

-#### API Keys {id="api-keys"}
+### API Keys {id="api-keys"}

Note that when using hosted services, you have to ensure that the proper API
keys are set as environment variables as described by the corresponding
@@ -709,10 +670,12 @@ and for Anthropic

export ANTHROPIC_API_KEY="..."
```

-#### spacy.GPT-4.v1 {id="gpt-4"}
+### GPT-4 {id="gpt-4"}

OpenAI's `gpt-4` model family.

+#### spacy.GPT-4.v1 {id="gpt-4-v1"}
+
> #### Example config:
>
> ```ini
@@ -730,10 +693,12 @@ OpenAI's `gpt-4` model family.

| `max_tries` | Max. number of tries for API request. Defaults to `3`. ~~int~~ |
| `timeout` | Timeout for API request in seconds. Defaults to `30`. ~~int~~ |

-#### spacy.GPT-3-5.v1 {id="gpt-3-5"}
+### GPT-3-5 {id="gpt-3-5"}

OpenAI's `gpt-3-5` model family.

+#### spacy.GPT-3-5.v1 {id="gpt-3-5-v1"}
+
> #### Example config
>
> ```ini
@@ -751,10 +716,12 @@ OpenAI's `gpt-3-5` model family.

| `max_tries` | Max. number of tries for API request. Defaults to `3`. ~~int~~ |
| `timeout` | Timeout for API request in seconds. Defaults to `30`. ~~int~~ |

-#### spacy.Text-Davinci.v1 {id="text-davinci"}
+### Text-Davinci {id="text-davinci"}

OpenAI's `text-davinci` model family.

+#### spacy.Text-Davinci.v1 {id="text-davinci-v1"}
+
> #### Example config
>
> ```ini
@@ -772,10 +739,12 @@ OpenAI's `text-davinci` model family.

| `max_tries` | Max. number of tries for API request. Defaults to `3`. ~~int~~ |
| `timeout` | Timeout for API request in seconds. Defaults to `30`. ~~int~~ |

-#### spacy.Code-Davinci.v1 {id="code-davinci"}
+### Code-Davinci {id="code-davinci"}

OpenAI's `code-davinci` model family.

+#### spacy.Code-Davinci.v1 {id="code-davinci-v1"}
+
> #### Example config
>
> ```ini
@@ -793,10 +762,12 @@ OpenAI's `code-davinci` model family.

| `max_tries` | Max. number of tries for API request. Defaults to `3`. ~~int~~ |
| `timeout` | Timeout for API request in seconds. Defaults to `30`. ~~int~~ |

-#### spacy.Text-Curie.v1 {id="text-curie"}
+### Text-Curie {id="text-curie"}

OpenAI's `text-curie` model family.

+#### spacy.Text-Curie.v1 {id="text-curie-v1"}
+
> #### Example config
>
> ```ini
@@ -814,10 +785,12 @@ OpenAI's `text-curie` model family.

| `max_tries` | Max. number of tries for API request. Defaults to `3`. ~~int~~ |
| `timeout` | Timeout for API request in seconds. Defaults to `30`. ~~int~~ |

-#### spacy.Text-Babbage.v1 {id="text-babbage"}
+### Text-Babbage {id="text-babbage"}

OpenAI's `text-babbage` model family.

+#### spacy.Text-Babbage.v1 {id="text-babbage-v1"}
+
> #### Example config
>
> ```ini
@@ -835,10 +808,12 @@ OpenAI's `text-babbage` model family.

| `max_tries` | Max. number of tries for API request. Defaults to `3`. ~~int~~ |
| `timeout` | Timeout for API request in seconds. Defaults to `30`. ~~int~~ |

-#### spacy.Text-Ada.v1 {id="text-ada"}
+### Text-Ada {id="text-ada"}

OpenAI's `text-ada` model family.

+#### spacy.Text-Ada.v1 {id="text-ada-v1"}
+
> #### Example config
>
> ```ini
@@ -856,10 +831,12 @@ OpenAI's `text-ada` model family.

| `max_tries` | Max. number of tries for API request. Defaults to `3`. ~~int~~ |
| `timeout` | Timeout for API request in seconds. Defaults to `30`. ~~int~~ |

-#### spacy.Davinci.v1 {id="davinci"}
+### Davinci {id="davinci"}

OpenAI's `davinci` model family.

+#### spacy.Davinci.v1 {id="davinci-v1"}
+
> #### Example config
>
> ```ini
@@ -877,10 +854,12 @@ OpenAI's `davinci` model family.

| `max_tries` | Max. number of tries for API request. Defaults to `3`. ~~int~~ |
| `timeout` | Timeout for API request in seconds. Defaults to `30`. ~~int~~ |

-#### spacy.Curie.v1 {id="curie"}
+### Curie {id="curie"}

OpenAI's `curie` model family.

+#### spacy.Curie.v1 {id="curie-v1"}
+
> #### Example config
>
> ```ini
@@ -898,10 +877,12 @@ OpenAI's `curie` model family.

| `max_tries` | Max. number of tries for API request. Defaults to `3`. ~~int~~ |
| `timeout` | Timeout for API request in seconds. Defaults to `30`. ~~int~~ |

-#### spacy.Babbage.v1 {id="babbage"}
+### Babbage {id="babbage"}

OpenAI's `babbage` model family.

+#### spacy.Babbage.v1 {id="babbage-v1"}
+
> #### Example config
>
> ```ini
@@ -919,10 +900,12 @@ OpenAI's `babbage` model family.

| `max_tries` | Max. number of tries for API request. Defaults to `3`. ~~int~~ |
| `timeout` | Timeout for API request in seconds. Defaults to `30`. ~~int~~ |

-#### spacy.Ada.v1 {id="ada"}
+### Ada {id="ada"}

OpenAI's `ada` model family.

+#### spacy.Ada.v1 {id="ada-v1"}
+
> #### Example config
>
> ```ini
@@ -940,10 +923,12 @@ OpenAI's `ada` model family.

| `max_tries` | Max. number of tries for API request. Defaults to `3`. ~~int~~ |
| `timeout` | Timeout for API request in seconds. Defaults to `30`. ~~int~~ |

-#### spacy.Command.v1 {id="command"}
+### Command {id="command"}

Cohere's `command` model family.

+#### spacy.Command.v1 {id="command-v1"}
+
> #### Example config
>
> ```ini
@@ -961,10 +946,12 @@ Cohere's `command` model family.

| `max_tries` | Max. number of tries for API request. Defaults to `3`. ~~int~~ |
| `timeout` | Timeout for API request in seconds. Defaults to `30`. ~~int~~ |

-#### spacy.Claude-2.v1 {id="claude-2"}
+### Claude-2 {id="claude-2"}

Anthropic's `claude-2` model family.

+#### spacy.Claude-2.v1 {id="claude-2-v1"}
+
> #### Example config
>
> ```ini
@@ -982,10 +969,12 @@ Anthropic's `claude-2` model family.

| `max_tries` | Max. number of tries for API request. Defaults to `3`. ~~int~~ |
| `timeout` | Timeout for API request in seconds. Defaults to `30`. ~~int~~ |

-#### spacy.Claude-1.v1 {id="claude-1"}
+### Claude-1 {id="claude-1"}

Anthropic's `claude-1` model family.

+#### spacy.Claude-1.v1 {id="claude-1-v1"}
+
> #### Example config
>
> ```ini
@@ -1003,10 +992,12 @@ Anthropic's `claude-1` model family.

| `max_tries` | Max. number of tries for API request. Defaults to `3`. ~~int~~ |
| `timeout` | Timeout for API request in seconds. Defaults to `30`. ~~int~~ |

-#### spacy.Claude-instant-1.v1 {id="claude-instant-1"}
+### Claude-instant-1 {id="claude-instant-1"}

Anthropic's `claude-instant-1` model family.

+#### spacy.Claude-instant-1.v1 {id="claude-instant-1-v1"}
+
> #### Example config
>
> ```ini
@@ -1024,10 +1015,12 @@ Anthropic's `claude-instant-1` model family.

| `max_tries` | Max. number of tries for API request. Defaults to `3`. ~~int~~ |
| `timeout` | Timeout for API request in seconds. Defaults to `30`. ~~int~~ |

-#### spacy.Claude-instant-1-1.v1 {id="claude-instant-1-1"}
+### Claude-instant-1-1 {id="claude-instant-1-1"}

Anthropic's `claude-instant-1.1` model family.

+#### spacy.Claude-instant-1-1.v1 {id="claude-instant-1-1-v1"}
+
> #### Example config
>
> ```ini
@@ -1045,10 +1038,12 @@ Anthropic's `claude-instant-1.1` model family.

| `max_tries` | Max. number of tries for API request. Defaults to `3`. ~~int~~ |
| `timeout` | Timeout for API request in seconds. Defaults to `30`. ~~int~~ |

-#### spacy.Claude-1-0.v1 {id="claude-1-0"}
+#### Claude-1-0 {id="claude-1-0"}

Anthropic's `claude-1.0` model family.

+#### spacy.Claude-1-0.v1 {id="claude-1-0-v1"}
+
> #### Example config
>
> ```ini
@@ -1066,10 +1061,12 @@ Anthropic's `claude-1.0` model family.

| `max_tries` | Max. number of tries for API request. Defaults to `3`. ~~int~~ |
| `timeout` | Timeout for API request in seconds. Defaults to `30`. ~~int~~ |

-#### spacy.Claude-1-2.v1 {id="claude-1-2"}
+#### Claude-1-2 {id="claude-1-2"}

Anthropic's `claude-1.2` model family.

+#### spacy.Claude-1-2.v1 {id="claude-1-2-v1"}
+
> #### Example config
>
> ```ini
@@ -1087,10 +1084,12 @@ Anthropic's `claude-1.2` model family.

| `max_tries` | Max. number of tries for API request. Defaults to `3`. ~~int~~ |
| `timeout` | Timeout for API request in seconds. Defaults to `30`. ~~int~~ |

-#### spacy.Claude-1-3.v1 {id="claude-1-3"}
+#### Claude-1-3 {id="claude-1-3"}

Anthropic's `claude-1.3` model family.

+#### spacy.Claude-1-3.v1 {id="claude-1-3-v1"}
+
> #### Example config
>
> ```ini
@@ -1108,7 +1107,11 @@ Anthropic's `claude-1.3` model family.

| `max_tries` | Max. number of tries for API request. Defaults to `3`. ~~int~~ |
| `timeout` | Timeout for API request in seconds. Defaults to `30`. ~~int~~ |

-#### spacy.Dolly.v1 {id="dolly"}
+### Dolly {id="dolly"}
+
+Databrick's open-source `Dolly` model family.
+
+#### spacy.Dolly.v1 {id="dolly-v1"}

To use this model, ideally you have a GPU enabled and have installed
`transformers`, `torch` and CUDA in your virtual environment. This allows you to
@@ -1157,7 +1160,11 @@ can

[define the cached directory](https://huggingface.co/docs/huggingface_hub/main/en/guides/manage-cache)
by setting the environmental variable `HF_HOME`.

-#### spacy.Llama2.v1 {id="llama2"}
+### Llama2 {id="llama2"}
+
+Meta AI's open-source `Llama2` model family.
+
+#### spacy.Llama2.v1 {id="llama2-v1"}

To use this model, ideally you have a GPU enabled and have installed
`transformers`, `torch` and CUDA in your virtual environment. This allows you to
@@ -1203,7 +1210,11 @@ can

[define the cache directory](https://huggingface.co/docs/huggingface_hub/main/en/guides/manage-cache)
by setting the environmental variable `HF_HOME`.

-#### spacy.Falcon.v1 {id="falcon"}
+### Falcon {id="falcon"}
+
+TII's open-source `Falcon` model family.
+
+#### spacy.Falcon.v1 {id="falcon-v1"}

To use this model, ideally you have a GPU enabled and have installed
`transformers`, `torch` and CUDA in your virtual environment. This allows you to
@@ -1244,7 +1255,11 @@ can

[define the cache directory](https://huggingface.co/docs/huggingface_hub/main/en/guides/manage-cache)
by setting the environmental variable `HF_HOME`.

-#### spacy.StableLM.v1 {id="stablelm"}
+### StableLM {id="stablelm"}
+
+Stability AI's open-source `StableLM` model family.
+
+#### spacy.StableLM.v1 {id="stablelm-v1"}

To use this model, ideally you have a GPU enabled and have installed
`transformers`, `torch` and CUDA in your virtual environment.
@@ -1287,7 +1302,11 @@ can

[define the cached directory](https://huggingface.co/docs/huggingface_hub/main/en/guides/manage-cache)
by setting the environmental variable `HF_HOME`.

-#### spacy.OpenLLaMA.v1 {id="openllama"}
+### OpenLLaMA {id="openllama"}
+
+OpenLM Research's open-source `OpenLLaMA` model family.
+
+#### spacy.OpenLLaMA.v1 {id="openllama-v1"}

To use this model, ideally you have a GPU enabled and have installed
@@ -1333,7 +1352,7 @@ can

[define the cached directory](https://huggingface.co/docs/huggingface_hub/main/en/guides/manage-cache)
by setting the environmental variable `HF_HOME`.

-#### LangChain models {id="langchain-models"}
+### LangChain models {id="langchain-models"}

To use [LangChain](https://github.com/hwchase17/langchain) for the API retrieval
part, make sure you have installed it first:
@@ -1374,7 +1393,7 @@ The name of the model to be used has to be passed in via the `name` attribute.

The default `query` (`spacy.CallLangChain.v1`) executes the prompts by running
`model(text)` for each given textual prompt.
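
As a sketch (not part of this commit; the registered handle and model name are assumptions), a LangChain-backed model could be plugged into the pipeline's model block like this:

```ini
[components.llm.model]
# Hypothetical registry handle: LangChain classes are assumed to be exposed
# under a "langchain" namespace; the model name is a placeholder.
@llm_models = "langchain.OpenAI.v1"
# The model to query is passed via the `name` attribute, as described above.
name = "text-davinci-003"
```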
-### Cache {id="cache"}
+## Cache {id="cache"}

Interacting with LLMs, either through an external API or a local instance, is
costly. Since developing an NLP pipeline generally means a lot of exploration
@@ -1406,9 +1425,9 @@ provide your own registered function returning your own cache implementation. If

you wish to do so, ensure that your cache object adheres to the `Protocol`
defined in `spacy_llm.ty.Cache`.
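
For example (a sketch, not from the commit; the registered function name and parameter values are assumptions), a file-based batch cache could be configured as:

```ini
[components.llm.cache]
# Hypothetical values: the cache directory, batch size and number of
# batches kept in memory are placeholders.
@llm_misc = "spacy.FileCache.v1"
path = "llm_cache"
batch_size = 64
max_batches_in_mem = 4
```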
-### Various functions {id="various-functions"}
+## Various functions {id="various-functions"}

-#### spacy.FewShotReader.v1 {id="fewshotreader-v1"}
+### spacy.FewShotReader.v1 {id="fewshotreader-v1"}

This function is registered in spaCy's `misc` registry, and reads in examples
from a `.yml`, `.yaml`, `.json` or `.jsonl` file. It uses
@@ -1427,7 +1446,7 @@ them depending on the file extension.

| -------- | ----------------------------------------------------------------------------------------------- |
| `path` | Path to an examples file with suffix `.yml`, `.yaml`, `.json` or `.jsonl`. ~~Union[str, Path]~~ |
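
In practice, as the config snippets elsewhere on this page show, the reader is plugged into a task's `examples` slot (the path is a placeholder):

```ini
# The file path below is a placeholder for your own examples file.
[components.llm.task.examples]
@misc = "spacy.FewShotReader.v1"
path = "ner_examples.yml"
```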
-#### spacy.FileReader.v1 {id="filereader-v1"}
+### spacy.FileReader.v1 {id="filereader-v1"}

This function is registered in spaCy's `misc` registry, and reads a file
provided to the `path` to return a `str` representation of its contents. This
@@ -1447,7 +1466,7 @@ template.

| -------- | ------------------------------------------------- |
| `path` | Path to the file to be read. ~~Union[str, Path]~~ |
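
A typical use, sketched here with an assumed task and file name, is to load a custom prompt template for a task's `template` argument:

```ini
[components.llm.task]
@llm_tasks = "spacy.NER.v2"
labels = PERSON,ORGANISATION,LOCATION

[components.llm.task.template]
# Hypothetical template file; the reader returns the file contents as a string.
@misc = "spacy.FileReader.v1"
path = "ner_template.jinja"
```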
-#### Normalizer functions {id="normalizer-functions"}
+### Normalizer functions {id="normalizer-functions"}

These functions provide simple normalizations for string comparisons, e.g.
between a list of specified labels and a label given in the raw text of the LLM