v3 overview of parameters

2025-11-11 05:19:52 +03:00 · 2023-09-01 12:07:48 +02:00 · 2023-09-01 12:07:48 +02:00 · c13f9ec933
commit c13f9ec933
parent e5f561565b
1 changed files with 100 additions and 64 deletions
--- a/website/docs/api/large-language-models.mdx
+++ b/website/docs/api/large-language-models.mdx
@ -108,8 +108,8 @@ prompting.
 > ```
 | Argument      | Description                                                                                                                                                                                   |
-| ------------- | --------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------- |
+| ------------- | --------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------- |
-| `template`    | Custom prompt template to send to LLM model. Default templates for each task are located in the `spacy_llm/tasks/templates` directory. Defaults to [summarization.v1.jinja](https://github.com/explosion/spacy-llm/blob/main/spacy_llm/tasks/templates/summarization.v1.jinja). ~~str~~ |
+| `template`    | Custom prompt template to send to LLM model. Defaults to [summarization.v1.jinja](https://github.com/explosion/spacy-llm/blob/main/spacy_llm/tasks/templates/summarization.v1.jinja). ~~str~~ |
 | `examples`    | Optional function that generates examples for few-shot learning. Defaults to `None`. ~~Optional[Callable[[], Iterable[Any]]]~~                                                                |
 | `max_n_words` | Maximum number of words to be used in summary. Note that this should not expected to work exactly. Defaults to `None`. ~~Optional[int]~~                                                      |
 | `field`       | Name of extension attribute to store summary in (i. e. the summary will be available in `doc._.{field}`). Defaults to `summary`. ~~str~~                                                      |
@ -157,6 +157,40 @@ path = "summarization_examples.yml"
 The NER task identifies non-overlapping entities in text.
 #### spacy.NER.v3 {id="ner-v3"}
 Version 3 is fundamentally different to v1 and v2, as it implements
 Chain-of-Thought prompting, based on
 [the PromptNER paper by Ashok and Lipton (2023)](https://arxiv.org/pdf/2305.15444.pdf).
 From preliminary experiments, we've found this implementation to obtain
 significant better accuracy.
 > #### Example config
 >
 > ```ini
 > [components.llm.task]
 > @llm_tasks = "spacy.NER.v3"
 > labels = ["PERSON", "ORGANISATION", "LOCATION"]
 > ```
 When no examples are [specified](/usage/large-language-models#few-shot-prompts),
 the v3 implementation will use a dummy example in the prompt. Technically this
 means that the task will always perform few-shot prompting under the hood.
 | Argument                  | Description                                                                                                                                                                                            |
 | ------------------------- | ------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------ |
 | `labels`                  | List of labels or str of comma-separated list of labels. ~~Union[List[str], str]~~                                                                                                                     |
 | `label_definitions`       | Optional dict mapping a label to a description of that label. These descriptions are added to the prompt to help instruct the LLM on what to extract. Defaults to `None`. ~~Optional[Dict[str, str]]~~ |
 | `template`                | Custom prompt template to send to LLM model. Defaults to [ner.v3.jinja](https://github.com/explosion/spacy-llm/blob/main/spacy_llm/tasks/templates/ner.v3.jinja). ~~str~~                              |
 | `description` (NEW)       | A description of what to recognize or not recognize as entities. ~~str~~                                                                                                                               |
 | `examples`                | Optional function that generates examples for few-shot learning. Defaults to `None`. ~~Optional[Callable[[], Iterable[Any]]]~~                                                                         |
 | `normalizer`              | Function that normalizes the labels as returned by the LLM. If `None`, defaults to `spacy.LowercaseNormalizer.v1`. Defaults to `None`. ~~Optional[Callable[[str], str]]~~                              |
 | `alignment_mode`          | Alignment mode in case the LLM returns entities that do not align with token boundaries. Options are `"strict"`, `"contract"` or `"expand"`. Defaults to `"contract"`. ~~str~~                         |
 | `case_sensitive_matching` | Whether to search without case sensitivity. Defaults to `False`. ~~bool~~                                                                                                                              |
 Note that the `single_match` parameter, used in v1 and v2, is not supported
 anymore, as the CoT parsing takes care of this automatically.
 #### spacy.NER.v2 {id="ner-v2"}
 This version supports explicitly defining the provided labels with custom
@ -173,10 +207,10 @@ v1.
 > ```
 | Argument                  | Description                                                                                                                                                                                            |
-| ------------------------- | ------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------- |
+| ------------------------- | ------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------ |
 | `labels`                  | List of labels or str of comma-separated list of labels. ~~Union[List[str], str]~~                                                                                                                     |
 | `template` (NEW)          | Custom prompt template to send to LLM model. Default templates for each task are located in the `spacy_llm/tasks/templates` directory. Defaults to [ner.v2.jinja](https://github.com/explosion/spacy-llm/blob/main/spacy_llm/tasks/templates/ner.v2.jinja). ~~str~~ |
 | `label_definitions` (NEW) | Optional dict mapping a label to a description of that label. These descriptions are added to the prompt to help instruct the LLM on what to extract. Defaults to `None`. ~~Optional[Dict[str, str]]~~ |
 | `template` (NEW)          | Custom prompt template to send to LLM model. Defaults to [ner.v2.jinja](https://github.com/explosion/spacy-llm/blob/main/spacy_llm/tasks/templates/ner.v2.jinja). ~~str~~                              |
 | `examples`                | Optional function that generates examples for few-shot learning. Defaults to `None`. ~~Optional[Callable[[], Iterable[Any]]]~~                                                                         |
 | `normalizer`              | Function that normalizes the labels as returned by the LLM. If `None`, defaults to `spacy.LowercaseNormalizer.v1`. Defaults to `None`. ~~Optional[Callable[[str], str]]~~                              |
 | `alignment_mode`          | Alignment mode in case the LLM returns entities that do not align with token boundaries. Options are `"strict"`, `"contract"` or `"expand"`. Defaults to `"contract"`. ~~str~~                         |
@ -293,9 +327,9 @@ support overlapping entities and store its annotations in `doc.spans`.
 > ```
 | Argument                  | Description                                                                                                                                                                                            |
-| ------------------------- | ----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------- |
+| ------------------------- | ------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------ |
 | `labels`                  | List of labels or str of comma-separated list of labels. ~~Union[List[str], str]~~                                                                                                                     |
-| `template` (NEW)          | Custom prompt template to send to LLM model. Default templates for each task are located in the `spacy_llm/tasks/templates` directory. Defaults to [`spancat.v2.jinja`](https://github.com/explosion/spacy-llm/blob/main/spacy_llm/tasks/templates/spancat.v2.jinja). ~~str~~ |
+| `template` (NEW)          | Custom prompt template to send to LLM model. Defaults to [`spancat.v2.jinja`](https://github.com/explosion/spacy-llm/blob/main/spacy_llm/tasks/templates/spancat.v2.jinja). ~~str~~                    |
 | `label_definitions` (NEW) | Optional dict mapping a label to a description of that label. These descriptions are added to the prompt to help instruct the LLM on what to extract. Defaults to `None`. ~~Optional[Dict[str, str]]~~ |
 | `spans_key`               | Key of the `Doc.spans` dict to save the spans under. Defaults to `"sc"`. ~~str~~                                                                                                                       |
 | `examples`                | Optional function that generates examples for few-shot learning. Defaults to `None`. ~~Optional[Callable[[], Iterable[Any]]]~~                                                                         |
@ -361,10 +395,10 @@ prompt.
 > ```
 | Argument                  | Description                                                                                                                                                                         |
-| ------------------------- | ----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------- |
+| ------------------------- | ----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------- |
 | `labels`                  | List of labels or str of comma-separated list of labels. ~~Union[List[str], str]~~                                                                                                  |
 | `label_definitions` (NEW) | Dictionary of label definitions. Included in the prompt, if set. Defaults to `None`. ~~Optional[Dict[str, str]]~~                                                                   |
-| `template`                | Custom prompt template to send to LLM model. Default templates for each task are located in the `spacy_llm/tasks/templates` directory. Defaults to [`textcat.v3.jinja`](https://github.com/explosion/spacy-llm/blob/main/spacy_llm/tasks/templates/textcat.v3.jinja). ~~str~~ |
+| `template`                | Custom prompt template to send to LLM model. Defaults to [`textcat.v3.jinja`](https://github.com/explosion/spacy-llm/blob/main/spacy_llm/tasks/templates/textcat.v3.jinja). ~~str~~ |
 | `examples`                | Optional function that generates examples for few-shot learning. Defaults to `None`. ~~Optional[Callable[[], Iterable[Any]]]~~                                                      |
 | `normalizer`              | Function that normalizes the labels as returned by the LLM. If `None`, falls back to `spacy.LowercaseNormalizer.v1`. Defaults to `None`. ~~Optional[Callable[[str], str]]~~         |
 | `exclusive_classes`       | If set to `True`, only one label per document should be valid. If set to `False`, one document can have multiple labels. Defaults to `False`. ~~bool~~                              |
@ -388,9 +422,9 @@ V2 includes all v1 functionality, with an improved prompt template.
 > ```
 | Argument            | Description                                                                                                                                                                         |
-| ------------------- | ----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------- |
+| ------------------- | ----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------- |
 | `labels`            | List of labels or str of comma-separated list of labels. ~~Union[List[str], str]~~                                                                                                  |
-| `template` (NEW)    | Custom prompt template to send to LLM model. Default templates for each task are located in the `spacy_llm/tasks/templates` directory. Defaults to [`textcat.v2.jinja`](https://github.com/explosion/spacy-llm/blob/main/spacy_llm/tasks/templates/textcat.v2.jinja). ~~str~~ |
+| `template` (NEW)    | Custom prompt template to send to LLM model. Defaults to [`textcat.v2.jinja`](https://github.com/explosion/spacy-llm/blob/main/spacy_llm/tasks/templates/textcat.v2.jinja). ~~str~~ |
 | `examples`          | Optional function that generates examples for few-shot learning. Defaults to `None`. ~~Optional[Callable[[], Iterable[Any]]]~~                                                      |
 | `normalizer`        | Function that normalizes the labels as returned by the LLM. If `None`, falls back to `spacy.LowercaseNormalizer.v1`. ~~Optional[Callable[[str], str]]~~                             |
 | `exclusive_classes` | If set to `True`, only one label per document should be valid. If set to `False`, one document can have multiple labels. Defaults to `False`. ~~bool~~                              |
@ -465,9 +499,9 @@ on an upstream NER component for entities extraction.
 > ```
 | Argument            | Description                                                                                                                                                                 |
-| ------------------- | --------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------- |
+| ------------------- | --------------------------------------------------------------------------------------------------------------------------------------------------------------------------- |
 | `labels`            | List of labels or str of comma-separated list of labels. ~~Union[List[str], str]~~                                                                                          |
-| `template`          | Custom prompt template to send to LLM model. Default templates for each task are located in the `spacy_llm/tasks/templates` directory. Defaults to [`rel.v1.jinja`](https://github.com/explosion/spacy-llm/blob/main/spacy_llm/tasks/templates/rel.v1.jinja). ~~str~~ |
+| `template`          | Custom prompt template to send to LLM model. Defaults to [`rel.v3.jinja`](https://github.com/explosion/spacy-llm/blob/main/spacy_llm/tasks/templates/rel.v1.jinja). ~~str~~ |
 | `label_description` | Dictionary providing a description for each relation label. Defaults to `None`. ~~Optional[Dict[str, str]]~~                                                                |
 | `examples`          | Optional function that generates examples for few-shot learning. Defaults to `None`. ~~Optional[Callable[[], Iterable[Any]]]~~                                              |
 | `normalizer`        | Function that normalizes the labels as returned by the LLM. If `None`, falls back to `spacy.LowercaseNormalizer.v1`. Defaults to `None`. ~~Optional[Callable[[str], str]]~~ |
@ -514,8 +548,8 @@ This task supports both zero-shot and few-shot prompting.
 > ```
 | Argument   | Description                                                                                                                                                                   |
-| ---------- | ----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------- |
+| ---------- | ----------------------------------------------------------------------------------------------------------------------------------------------------------------------------- |
-| `template` | Custom prompt template to send to LLM model. Default templates for each task are located in the `spacy_llm/tasks/templates` directory. Defaults to [lemma.v1.jinja](https://github.com/explosion/spacy-llm/blob/main/spacy_llm/tasks/templates/lemma.v1.jinja). ~~str~~ |
+| `template` | Custom prompt template to send to LLM model. Defaults to [lemma.v1.jinja](https://github.com/explosion/spacy-llm/blob/main/spacy_llm/tasks/templates/lemma.v1.jinja). ~~str~~ |
 | `examples` | Optional function that generates examples for few-shot learning. Defaults to `None`. ~~Optional[Callable[[], Iterable[Any]]]~~                                                |
 The task prompts the LLM to lemmatize the passed text and return the lemmatized
@ -591,8 +625,8 @@ This task supports both zero-shot and few-shot prompting.
 > ```
 | Argument   | Description                                                                                                                                |
-| ---------- | -------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------- |
+| ---------- | ------------------------------------------------------------------------------------------------------------------------------------------ |
-| `template` | Custom prompt template to send to LLM model. Default templates for each task are located in the `spacy_llm/tasks/templates` directory. Defaults to [sentiment.v1.jinja](./spacy_llm/tasks/templates/sentiment.v1.jinja). ~~str~~ |
+| `template` | Custom prompt template to send to LLM model. Defaults to [sentiment.v1.jinja](./spacy_llm/tasks/templates/sentiment.v1.jinja). ~~str~~     |
 | `examples` | Optional function that generates examples for few-shot learning. Defaults to `None`. ~~Optional[Callable[[], Iterable[Any]]]~~             |
 | `field`    | Name of extension attribute to store summary in (i. e. the summary will be available in `doc._.{field}`). Defaults to `sentiment`. ~~str~~ |
@ -648,7 +682,9 @@ implementations can have other signatures, like
 ### Models via REST API {id="models-rest"}
-These models all take the same parameters, but note that the `config` should contain provider-specific keys and values, as it will be passed onwards to the provider's API.
+These models all take the same parameters, but note that the `config` should
 contain provider-specific keys and values, as it will be passed onwards to the
 provider's API.
 | Argument           | Description                                                                                                                                       |
 | ------------------ | ------------------------------------------------------------------------------------------------------------------------------------------------- |