Update spacy-llm task argument docs w.r.t. task refactoring (#12995)

* Update task arguments w.r.t. task refactoring in 0.5.0.

* Add disclaimer w.r.t. gated models/Llama 2.

* Update website/docs/api/large-language-models.mdx

* Update website/docs/api/large-language-models.mdx
This commit is contained in:
Raphael Mitsch 2023-10-05 08:45:25 +02:00 committed by GitHub
parent 829613b959
commit 734826db79
No known key found for this signature in database
GPG Key ID: 4AEE18F83AFDEB23

View File

@ -255,9 +255,11 @@ prompting.
> ``` > ```
| Argument | Description | | Argument | Description |
| ------------- | --------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------- | | --------------------------- | --------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------- |
| `template` | Custom prompt template to send to LLM model. Defaults to [summarization.v1.jinja](https://github.com/explosion/spacy-llm/blob/main/spacy_llm/tasks/templates/summarization.v1.jinja). ~~str~~ | | `template` | Custom prompt template to send to LLM model. Defaults to [summarization.v1.jinja](https://github.com/explosion/spacy-llm/blob/main/spacy_llm/tasks/templates/summarization.v1.jinja). ~~str~~ |
| `examples` | Optional function that generates examples for few-shot learning. Defaults to `None`. ~~Optional[Callable[[], Iterable[Any]]]~~ | | `examples` | Optional function that generates examples for few-shot learning. Defaults to `None`. ~~Optional[Callable[[], Iterable[Any]]]~~ |
| `parse_responses` (NEW) | Callable for parsing LLM responses for this task. Defaults to the internal parsing method for this task. ~~Optional[TaskResponseParser[SummarizationTask]]~~ |
| `prompt_example_type` (NEW) | Type to use for fewshot examples. Defaults to `SummarizationExample`. ~~Optional[Type[FewshotExample]]~~ |
| `max_n_words` | Maximum number of words to be used in summary. Note that this should not expected to work exactly. Defaults to `None`. ~~Optional[int]~~ | | `max_n_words` | Maximum number of words to be used in summary. Note that this should not expected to work exactly. Defaults to `None`. ~~Optional[int]~~ |
| `field` | Name of extension attribute to store summary in (i. e. the summary will be available in `doc._.{field}`). Defaults to `summary`. ~~str~~ | | `field` | Name of extension attribute to store summary in (i. e. the summary will be available in `doc._.{field}`). Defaults to `summary`. ~~str~~ |
@ -326,12 +328,15 @@ the v3 implementation will use a dummy example in the prompt. Technically this
means that the task will always perform few-shot prompting under the hood. means that the task will always perform few-shot prompting under the hood.
| Argument | Description | | Argument | Description |
| ------------------------- | ------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------ | | --------------------------- | ------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------ |
| `template` | Custom prompt template to send to LLM model. Defaults to [ner.v3.jinja](https://github.com/explosion/spacy-llm/blob/main/spacy_llm/tasks/templates/ner.v3.jinja). ~~str~~ |
| `examples` | Optional function that generates examples for few-shot learning. Defaults to `None`. ~~Optional[Callable[[], Iterable[Any]]]~~ |
| `parse_responses` (NEW) | Callable for parsing LLM responses for this task. Defaults to the internal parsing method for this task. ~~Optional[TaskResponseParser[NERTask]]~~ |
| `prompt_example_type` (NEW) | Type to use for fewshot examples. Defaults to `NERExample`. ~~Optional[Type[FewshotExample]]~~ |
| `scorer` | Scorer function that evaluates the task performance on provided examples. Defaults to the metric used by spaCy. ~~Optional[Scorer]~~ |
| `labels` | List of labels or str of comma-separated list of labels. ~~Union[List[str], str]~~ | | `labels` | List of labels or str of comma-separated list of labels. ~~Union[List[str], str]~~ |
| `label_definitions` | Optional dict mapping a label to a description of that label. These descriptions are added to the prompt to help instruct the LLM on what to extract. Defaults to `None`. ~~Optional[Dict[str, str]]~~ | | `label_definitions` | Optional dict mapping a label to a description of that label. These descriptions are added to the prompt to help instruct the LLM on what to extract. Defaults to `None`. ~~Optional[Dict[str, str]]~~ |
| `template` | Custom prompt template to send to LLM model. Defaults to [ner.v3.jinja](https://github.com/explosion/spacy-llm/blob/main/spacy_llm/tasks/templates/ner.v3.jinja). ~~str~~ |
| `description` (NEW) | A description of what to recognize or not recognize as entities. ~~str~~ | | `description` (NEW) | A description of what to recognize or not recognize as entities. ~~str~~ |
| `examples` | Optional function that generates examples for few-shot learning. Defaults to `None`. ~~Optional[Callable[[], Iterable[Any]]]~~ |
| `normalizer` | Function that normalizes the labels as returned by the LLM. If `None`, defaults to `spacy.LowercaseNormalizer.v1`. Defaults to `None`. ~~Optional[Callable[[str], str]]~~ | | `normalizer` | Function that normalizes the labels as returned by the LLM. If `None`, defaults to `spacy.LowercaseNormalizer.v1`. Defaults to `None`. ~~Optional[Callable[[str], str]]~~ |
| `alignment_mode` | Alignment mode in case the LLM returns entities that do not align with token boundaries. Options are `"strict"`, `"contract"` or `"expand"`. Defaults to `"contract"`. ~~str~~ | | `alignment_mode` | Alignment mode in case the LLM returns entities that do not align with token boundaries. Options are `"strict"`, `"contract"` or `"expand"`. Defaults to `"contract"`. ~~str~~ |
| `case_sensitive_matching` | Whether to search without case sensitivity. Defaults to `False`. ~~bool~~ | | `case_sensitive_matching` | Whether to search without case sensitivity. Defaults to `False`. ~~bool~~ |
@ -416,11 +421,14 @@ v1.
> ``` > ```
| Argument | Description | | Argument | Description |
| ------------------------- | ------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------ | | --------------------------- | ------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------ |
| `labels` | List of labels or str of comma-separated list of labels. ~~Union[List[str], str]~~ |
| `label_definitions` (NEW) | Optional dict mapping a label to a description of that label. These descriptions are added to the prompt to help instruct the LLM on what to extract. Defaults to `None`. ~~Optional[Dict[str, str]]~~ |
| `template` (NEW) | Custom prompt template to send to LLM model. Defaults to [ner.v2.jinja](https://github.com/explosion/spacy-llm/blob/main/spacy_llm/tasks/templates/ner.v2.jinja). ~~str~~ | | `template` (NEW) | Custom prompt template to send to LLM model. Defaults to [ner.v2.jinja](https://github.com/explosion/spacy-llm/blob/main/spacy_llm/tasks/templates/ner.v2.jinja). ~~str~~ |
| `examples` | Optional function that generates examples for few-shot learning. Defaults to `None`. ~~Optional[Callable[[], Iterable[Any]]]~~ | | `examples` | Optional function that generates examples for few-shot learning. Defaults to `None`. ~~Optional[Callable[[], Iterable[Any]]]~~ |
| `parse_responses` (NEW) | Callable for parsing LLM responses for this task. Defaults to the internal parsing method for this task. ~~Optional[TaskResponseParser[NERTask]]~~ |
| `prompt_example_type` (NEW) | Type to use for fewshot examples. Defaults to `NERExample`. ~~Optional[Type[FewshotExample]]~~ |
| `scorer` (NEW) | Scorer function that evaluates the task performance on provided examples. Defaults to the metric used by spaCy. ~~Optional[Scorer]~~ |
| `labels` | List of labels or str of comma-separated list of labels. ~~Union[List[str], str]~~ |
| `label_definitions` (NEW) | Optional dict mapping a label to a description of that label. These descriptions are added to the prompt to help instruct the LLM on what to extract. Defaults to `None`. ~~Optional[Dict[str, str]]~~ |
| `normalizer` | Function that normalizes the labels as returned by the LLM. If `None`, defaults to `spacy.LowercaseNormalizer.v1`. Defaults to `None`. ~~Optional[Callable[[str], str]]~~ | | `normalizer` | Function that normalizes the labels as returned by the LLM. If `None`, defaults to `spacy.LowercaseNormalizer.v1`. Defaults to `None`. ~~Optional[Callable[[str], str]]~~ |
| `alignment_mode` | Alignment mode in case the LLM returns entities that do not align with token boundaries. Options are `"strict"`, `"contract"` or `"expand"`. Defaults to `"contract"`. ~~str~~ | | `alignment_mode` | Alignment mode in case the LLM returns entities that do not align with token boundaries. Options are `"strict"`, `"contract"` or `"expand"`. Defaults to `"contract"`. ~~str~~ |
| `case_sensitive_matching` | Whether to search without case sensitivity. Defaults to `False`. ~~bool~~ | | `case_sensitive_matching` | Whether to search without case sensitivity. Defaults to `False`. ~~bool~~ |
@ -468,9 +476,12 @@ few-shot prompting.
> ``` > ```
| Argument | Description | | Argument | Description |
| ------------------------- | ------------------------------------------------------------------------------------------------------------------------------------------------------------------------------ | | --------------------------- | ------------------------------------------------------------------------------------------------------------------------------------------------------------------------------ |
| `labels` | Comma-separated list of labels. ~~str~~ |
| `examples` | Optional function that generates examples for few-shot learning. Defaults to `None`. ~~Optional[Callable[[], Iterable[Any]]]~~ | | `examples` | Optional function that generates examples for few-shot learning. Defaults to `None`. ~~Optional[Callable[[], Iterable[Any]]]~~ |
| `parse_responses` (NEW) | Callable for parsing LLM responses for this task. Defaults to the internal parsing method for this task. ~~Optional[TaskResponseParser[NERTask]]~~ |
| `prompt_example_type` (NEW) | Type to use for fewshot examples. Defaults to `NERExample`. ~~Optional[Type[FewshotExample]]~~ |
| `scorer` (NEW) | Scorer function that evaluates the task performance on provided examples. Defaults to the metric used by spaCy. ~~Optional[Scorer]~~ |
| `labels` | Comma-separated list of labels. ~~str~~ |
| `normalizer` | Function that normalizes the labels as returned by the LLM. If `None`, defaults to `spacy.LowercaseNormalizer.v1`. ~~Optional[Callable[[str], str]]~~ | | `normalizer` | Function that normalizes the labels as returned by the LLM. If `None`, defaults to `spacy.LowercaseNormalizer.v1`. ~~Optional[Callable[[str], str]]~~ |
| `alignment_mode` | Alignment mode in case the LLM returns entities that do not align with token boundaries. Options are `"strict"`, `"contract"` or `"expand"`. Defaults to `"contract"`. ~~str~~ | | `alignment_mode` | Alignment mode in case the LLM returns entities that do not align with token boundaries. Options are `"strict"`, `"contract"` or `"expand"`. Defaults to `"contract"`. ~~str~~ |
| `case_sensitive_matching` | Whether to search without case sensitivity. Defaults to `False`. ~~bool~~ | | `case_sensitive_matching` | Whether to search without case sensitivity. Defaults to `False`. ~~bool~~ |
@ -540,13 +551,16 @@ support overlapping entities and store its annotations in `doc.spans`.
> ``` > ```
| Argument | Description | | Argument | Description |
| ------------------------- | ------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------ | | --------------------------- | ------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------ |
| `template` | Custom prompt template to send to LLM model. Defaults to [`spancat.v3.jinja`](https://github.com/explosion/spacy-llm/blob/main/spacy_llm/tasks/templates/spancat.v3.jinja). ~~str~~ |
| `examples` | Optional function that generates examples for few-shot learning. Defaults to `None`. ~~Optional[Callable[[], Iterable[Any]]]~~ |
| `parse_responses` (NEW) | Callable for parsing LLM responses for this task. Defaults to the internal parsing method for this task. ~~Optional[TaskResponseParser[SpanCatTask]]~~ |
| `prompt_example_type` (NEW) | Type to use for fewshot examples. Defaults to `SpanCatExample`. ~~Optional[Type[FewshotExample]]~~ |
| `scorer` (NEW) | Scorer function that evaluates the task performance on provided examples. Defaults to the metric used by spaCy. ~~Optional[Scorer]~~ |
| `labels` | List of labels or str of comma-separated list of labels. ~~Union[List[str], str]~~ | | `labels` | List of labels or str of comma-separated list of labels. ~~Union[List[str], str]~~ |
| `label_definitions` | Optional dict mapping a label to a description of that label. These descriptions are added to the prompt to help instruct the LLM on what to extract. Defaults to `None`. ~~Optional[Dict[str, str]]~~ | | `label_definitions` | Optional dict mapping a label to a description of that label. These descriptions are added to the prompt to help instruct the LLM on what to extract. Defaults to `None`. ~~Optional[Dict[str, str]]~~ |
| `template` | Custom prompt template to send to LLM model. Defaults to [`spancat.v3.jinja`](https://github.com/explosion/spacy-llm/blob/main/spacy_llm/tasks/templates/spancat.v3.jinja). ~~str~~ |
| `description` (NEW) | A description of what to recognize or not recognize as entities. ~~str~~ | | `description` (NEW) | A description of what to recognize or not recognize as entities. ~~str~~ |
| `spans_key` | Key of the `Doc.spans` dict to save the spans under. Defaults to `"sc"`. ~~str~~ | | `spans_key` | Key of the `Doc.spans` dict to save the spans under. Defaults to `"sc"`. ~~str~~ |
| `examples` | Optional function that generates examples for few-shot learning. Defaults to `None`. ~~Optional[Callable[[], Iterable[Any]]]~~ |
| `normalizer` | Function that normalizes the labels as returned by the LLM. If `None`, defaults to `spacy.LowercaseNormalizer.v1`. ~~Optional[Callable[[str], str]]~~ | | `normalizer` | Function that normalizes the labels as returned by the LLM. If `None`, defaults to `spacy.LowercaseNormalizer.v1`. ~~Optional[Callable[[str], str]]~~ |
| `alignment_mode` | Alignment mode in case the LLM returns entities that do not align with token boundaries. Options are `"strict"`, `"contract"` or `"expand"`. Defaults to `"contract"`. ~~str~~ | | `alignment_mode` | Alignment mode in case the LLM returns entities that do not align with token boundaries. Options are `"strict"`, `"contract"` or `"expand"`. Defaults to `"contract"`. ~~str~~ |
| `case_sensitive_matching` | Whether to search without case sensitivity. Defaults to `False`. ~~bool~~ | | `case_sensitive_matching` | Whether to search without case sensitivity. Defaults to `False`. ~~bool~~ |
@ -569,12 +583,15 @@ support overlapping entities and store its annotations in `doc.spans`.
> ``` > ```
| Argument | Description | | Argument | Description |
| ------------------------- | ------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------ | | --------------------------- | ------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------ |
| `template` (NEW) | Custom prompt template to send to LLM model. Defaults to [`spancat.v2.jinja`](https://github.com/explosion/spacy-llm/blob/main/spacy_llm/tasks/templates/spancat.v2.jinja). ~~str~~ |
| `examples` | Optional function that generates examples for few-shot learning. Defaults to `None`. ~~Optional[Callable[[], Iterable[Any]]]~~ |
| `parse_responses` (NEW) | Callable for parsing LLM responses for this task. Defaults to the internal parsing method for this task. ~~Optional[TaskResponseParser[SpanCatTask]]~~ |
| `prompt_example_type` (NEW) | Type to use for fewshot examples. Defaults to `SpanCatExample`. ~~Optional[Type[FewshotExample]]~~ |
| `scorer` (NEW) | Scorer function that evaluates the task performance on provided examples. Defaults to the metric used by spaCy. ~~Optional[Scorer]~~ |
| `labels` | List of labels or str of comma-separated list of labels. ~~Union[List[str], str]~~ | | `labels` | List of labels or str of comma-separated list of labels. ~~Union[List[str], str]~~ |
| `label_definitions` (NEW) | Optional dict mapping a label to a description of that label. These descriptions are added to the prompt to help instruct the LLM on what to extract. Defaults to `None`. ~~Optional[Dict[str, str]]~~ | | `label_definitions` (NEW) | Optional dict mapping a label to a description of that label. These descriptions are added to the prompt to help instruct the LLM on what to extract. Defaults to `None`. ~~Optional[Dict[str, str]]~~ |
| `template` (NEW) | Custom prompt template to send to LLM model. Defaults to [`spancat.v2.jinja`](https://github.com/explosion/spacy-llm/blob/main/spacy_llm/tasks/templates/spancat.v2.jinja). ~~str~~ |
| `spans_key` | Key of the `Doc.spans` dict to save the spans under. Defaults to `"sc"`. ~~str~~ | | `spans_key` | Key of the `Doc.spans` dict to save the spans under. Defaults to `"sc"`. ~~str~~ |
| `examples` | Optional function that generates examples for few-shot learning. Defaults to `None`. ~~Optional[Callable[[], Iterable[Any]]]~~ |
| `normalizer` | Function that normalizes the labels as returned by the LLM. If `None`, defaults to `spacy.LowercaseNormalizer.v1`. ~~Optional[Callable[[str], str]]~~ | | `normalizer` | Function that normalizes the labels as returned by the LLM. If `None`, defaults to `spacy.LowercaseNormalizer.v1`. ~~Optional[Callable[[str], str]]~~ |
| `alignment_mode` | Alignment mode in case the LLM returns entities that do not align with token boundaries. Options are `"strict"`, `"contract"` or `"expand"`. Defaults to `"contract"`. ~~str~~ | | `alignment_mode` | Alignment mode in case the LLM returns entities that do not align with token boundaries. Options are `"strict"`, `"contract"` or `"expand"`. Defaults to `"contract"`. ~~str~~ |
| `case_sensitive_matching` | Whether to search without case sensitivity. Defaults to `False`. ~~bool~~ | | `case_sensitive_matching` | Whether to search without case sensitivity. Defaults to `False`. ~~bool~~ |
@ -600,10 +617,13 @@ v1 NER task to support overlapping entities and store its annotations in
> ``` > ```
| Argument | Description | | Argument | Description |
| ------------------------- | ------------------------------------------------------------------------------------------------------------------------------------------------------------------------------ | | --------------------------- | ------------------------------------------------------------------------------------------------------------------------------------------------------------------------------ |
| `examples` | Optional function that generates examples for few-shot learning. Defaults to `None`. ~~Optional[Callable[[], Iterable[Any]]]~~ |
| `parse_responses` (NEW) | Callable for parsing LLM responses for this task. Defaults to the internal parsing method for this task. ~~Optional[TaskResponseParser[SpanCatTask]]~~ |
| `prompt_example_type` (NEW) | Type to use for fewshot examples. Defaults to `SpanCatExample`. ~~Optional[Type[FewshotExample]]~~ |
| `scorer` (NEW) | Scorer function that evaluates the task performance on provided examples. Defaults to the metric used by spaCy. ~~Optional[Scorer]~~ |
| `labels` | Comma-separated list of labels. ~~str~~ | | `labels` | Comma-separated list of labels. ~~str~~ |
| `spans_key` | Key of the `Doc.spans` dict to save the spans under. Defaults to `"sc"`. ~~str~~ | | `spans_key` | Key of the `Doc.spans` dict to save the spans under. Defaults to `"sc"`. ~~str~~ |
| `examples` | Optional function that generates examples for few-shot learning. Defaults to `None`. ~~Optional[Callable[[], Iterable[Any]]]~~ |
| `normalizer` | Function that normalizes the labels as returned by the LLM. If `None`, defaults to `spacy.LowercaseNormalizer.v1`. ~~Optional[Callable[[str], str]]~~ | | `normalizer` | Function that normalizes the labels as returned by the LLM. If `None`, defaults to `spacy.LowercaseNormalizer.v1`. ~~Optional[Callable[[str], str]]~~ |
| `alignment_mode` | Alignment mode in case the LLM returns entities that do not align with token boundaries. Options are `"strict"`, `"contract"` or `"expand"`. Defaults to `"contract"`. ~~str~~ | | `alignment_mode` | Alignment mode in case the LLM returns entities that do not align with token boundaries. Options are `"strict"`, `"contract"` or `"expand"`. Defaults to `"contract"`. ~~str~~ |
| `case_sensitive_matching` | Whether to search without case sensitivity. Defaults to `False`. ~~bool~~ | | `case_sensitive_matching` | Whether to search without case sensitivity. Defaults to `False`. ~~bool~~ |
@ -637,11 +657,14 @@ prompt.
> ``` > ```
| Argument | Description | | Argument | Description |
| ------------------------- | ----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------- | | --------------------------- | ----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------- |
| `labels` | List of labels or str of comma-separated list of labels. ~~Union[List[str], str]~~ |
| `label_definitions` (NEW) | Dictionary of label definitions. Included in the prompt, if set. Defaults to `None`. ~~Optional[Dict[str, str]]~~ |
| `template` | Custom prompt template to send to LLM model. Defaults to [`textcat.v3.jinja`](https://github.com/explosion/spacy-llm/blob/main/spacy_llm/tasks/templates/textcat.v3.jinja). ~~str~~ | | `template` | Custom prompt template to send to LLM model. Defaults to [`textcat.v3.jinja`](https://github.com/explosion/spacy-llm/blob/main/spacy_llm/tasks/templates/textcat.v3.jinja). ~~str~~ |
| `examples` | Optional function that generates examples for few-shot learning. Defaults to `None`. ~~Optional[Callable[[], Iterable[Any]]]~~ | | `examples` | Optional function that generates examples for few-shot learning. Defaults to `None`. ~~Optional[Callable[[], Iterable[Any]]]~~ |
| `parse_responses` (NEW) | Callable for parsing LLM responses for this task. Defaults to the internal parsing method for this task. ~~Optional[TaskResponseParser[SpanCatTask]]~~ |
| `prompt_example_type` (NEW) | Type to use for fewshot examples. Defaults to `TextCatExample`. ~~Optional[Type[FewshotExample]]~~ |
| `scorer` (NEW) | Scorer function that evaluates the task performance on provided examples. Defaults to the metric used by spaCy. ~~Optional[Scorer]~~ |
| `labels` | List of labels or str of comma-separated list of labels. ~~Union[List[str], str]~~ |
| `label_definitions` (NEW) | Dictionary of label definitions. Included in the prompt, if set. Defaults to `None`. ~~Optional[Dict[str, str]]~~ |
| `normalizer` | Function that normalizes the labels as returned by the LLM. If `None`, falls back to `spacy.LowercaseNormalizer.v1`. Defaults to `None`. ~~Optional[Callable[[str], str]]~~ | | `normalizer` | Function that normalizes the labels as returned by the LLM. If `None`, falls back to `spacy.LowercaseNormalizer.v1`. Defaults to `None`. ~~Optional[Callable[[str], str]]~~ |
| `exclusive_classes` | If set to `True`, only one label per document should be valid. If set to `False`, one document can have multiple labels. Defaults to `False`. ~~bool~~ | | `exclusive_classes` | If set to `True`, only one label per document should be valid. If set to `False`, one document can have multiple labels. Defaults to `False`. ~~bool~~ |
| `allow_none` | When set to `True`, allows the LLM to not return any of the given label. The resulting dict in `doc.cats` will have `0.0` scores for all labels. Defaults to `True`. ~~bool~~ | | `allow_none` | When set to `True`, allows the LLM to not return any of the given label. The resulting dict in `doc.cats` will have `0.0` scores for all labels. Defaults to `True`. ~~bool~~ |
@ -664,10 +687,13 @@ V2 includes all v1 functionality, with an improved prompt template.
> ``` > ```
| Argument | Description | | Argument | Description |
| ------------------- | ----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------- | | --------------------------- | ----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------- |
| `labels` | List of labels or str of comma-separated list of labels. ~~Union[List[str], str]~~ |
| `template` (NEW) | Custom prompt template to send to LLM model. Defaults to [`textcat.v2.jinja`](https://github.com/explosion/spacy-llm/blob/main/spacy_llm/tasks/templates/textcat.v2.jinja). ~~str~~ | | `template` (NEW) | Custom prompt template to send to LLM model. Defaults to [`textcat.v2.jinja`](https://github.com/explosion/spacy-llm/blob/main/spacy_llm/tasks/templates/textcat.v2.jinja). ~~str~~ |
| `examples` | Optional function that generates examples for few-shot learning. Defaults to `None`. ~~Optional[Callable[[], Iterable[Any]]]~~ | | `examples` | Optional function that generates examples for few-shot learning. Defaults to `None`. ~~Optional[Callable[[], Iterable[Any]]]~~ |
| `parse_responses` (NEW) | Callable for parsing LLM responses for this task. Defaults to the internal parsing method for this task. ~~Optional[TaskResponseParser[SpanCatTask]]~~ |
| `prompt_example_type` (NEW) | Type to use for fewshot examples. Defaults to `TextCatExample`. ~~Optional[Type[FewshotExample]]~~ |
| `scorer` (NEW) | Scorer function that evaluates the task performance on provided examples. Defaults to the metric used by spaCy. ~~Optional[Scorer]~~ |
| `labels` | List of labels or str of comma-separated list of labels. ~~Union[List[str], str]~~ |
| `normalizer` | Function that normalizes the labels as returned by the LLM. If `None`, falls back to `spacy.LowercaseNormalizer.v1`. ~~Optional[Callable[[str], str]]~~ | | `normalizer` | Function that normalizes the labels as returned by the LLM. If `None`, falls back to `spacy.LowercaseNormalizer.v1`. ~~Optional[Callable[[str], str]]~~ |
| `exclusive_classes` | If set to `True`, only one label per document should be valid. If set to `False`, one document can have multiple labels. Defaults to `False`. ~~bool~~ | | `exclusive_classes` | If set to `True`, only one label per document should be valid. If set to `False`, one document can have multiple labels. Defaults to `False`. ~~bool~~ |
| `allow_none` | When set to `True`, allows the LLM to not return any of the given label. The resulting dict in `doc.cats` will have `0.0` scores for all labels. Defaults to `True`. ~~bool~~ | | `allow_none` | When set to `True`, allows the LLM to not return any of the given label. The resulting dict in `doc.cats` will have `0.0` scores for all labels. Defaults to `True`. ~~bool~~ |
@ -691,13 +717,16 @@ prompting.
> ``` > ```
| Argument | Description | | Argument | Description |
| ------------------- | ----------------------------------------------------------------------------------------------------------------------------------------------------------------------------- | | --------------------------- | ----------------------------------------------------------------------------------------------------------------------------------------------------------------------------- |
| `labels` | Comma-separated list of labels. ~~str~~ |
| `examples` | Optional function that generates examples for few-shot learning. Deafults to `None`. ~~Optional[Callable[[], Iterable[Any]]]~~ | | `examples` | Optional function that generates examples for few-shot learning. Deafults to `None`. ~~Optional[Callable[[], Iterable[Any]]]~~ |
| `parse_responses` (NEW) | Callable for parsing LLM responses for this task. Defaults to the internal parsing method for this task. ~~Optional[TaskResponseParser[SpanCatTask]]~~ |
| `prompt_example_type` (NEW) | Type to use for fewshot examples. Defaults to `TextCatExample`. ~~Optional[Type[FewshotExample]]~~ |
| `scorer` (NEW) | Scorer function that evaluates the task performance on provided examples. Defaults to the metric used by spaCy. ~~Optional[Scorer]~~ |
| `labels` | Comma-separated list of labels. ~~str~~ |
| `normalizer` | Function that normalizes the labels as returned by the LLM. If `None`, falls back to `spacy.LowercaseNormalizer.v1`. ~~Optional[Callable[[str], str]]~~ | | `normalizer` | Function that normalizes the labels as returned by the LLM. If `None`, falls back to `spacy.LowercaseNormalizer.v1`. ~~Optional[Callable[[str], str]]~~ |
| `exclusive_classes` | If set to `True`, only one label per document should be valid. If set to `False`, one document can have multiple labels. Deafults to `False`. ~~bool~~ | | `exclusive_classes` | If set to `True`, only one label per document should be valid. If set to `False`, one document can have multiple labels. Defaults to `False`. ~~bool~~ |
| `allow_none` | When set to `True`, allows the LLM to not return any of the given label. The resulting dict in `doc.cats` will have `0.0` scores for all labels. Deafults to `True`. ~~bool~~ | | `allow_none` | When set to `True`, allows the LLM to not return any of the given label. The resulting dict in `doc.cats` will have `0.0` scores for all labels. Defaults to `True`. ~~bool~~ |
| `verbose` | If set to `True`, warnings will be generated when the LLM returns invalid responses. Deafults to `False`. ~~bool~~ | | `verbose` | If set to `True`, warnings will be generated when the LLM returns invalid responses. Defaults to `False`. ~~bool~~ |
To perform [few-shot learning](/usage/large-language-models#few-shot-prompts), To perform [few-shot learning](/usage/large-language-models#few-shot-prompts),
you can write down a few examples in a separate file, and provide these to be you can write down a few examples in a separate file, and provide these to be
@ -741,11 +770,14 @@ on an upstream NER component for entities extraction.
> ``` > ```
| Argument | Description | | Argument | Description |
| ------------------- | --------------------------------------------------------------------------------------------------------------------------------------------------------------------------- | | --------------------------- | --------------------------------------------------------------------------------------------------------------------------------------------------------------------------- |
| `labels` | List of labels or str of comma-separated list of labels. ~~Union[List[str], str]~~ |
| `template` | Custom prompt template to send to LLM model. Defaults to [`rel.v3.jinja`](https://github.com/explosion/spacy-llm/blob/main/spacy_llm/tasks/templates/rel.v1.jinja). ~~str~~ | | `template` | Custom prompt template to send to LLM model. Defaults to [`rel.v3.jinja`](https://github.com/explosion/spacy-llm/blob/main/spacy_llm/tasks/templates/rel.v1.jinja). ~~str~~ |
| `label_definitions` | Dictionary providing a description for each relation label. Defaults to `None`. ~~Optional[Dict[str, str]]~~ |
| `examples` | Optional function that generates examples for few-shot learning. Defaults to `None`. ~~Optional[Callable[[], Iterable[Any]]]~~ | | `examples` | Optional function that generates examples for few-shot learning. Defaults to `None`. ~~Optional[Callable[[], Iterable[Any]]]~~ |
| `parse_responses` (NEW) | Callable for parsing LLM responses for this task. Defaults to the internal parsing method for this task. ~~Optional[TaskResponseParser[RELTask]]~~ |
| `prompt_example_type` (NEW) | Type to use for fewshot examples. Defaults to `RELExample`. ~~Optional[Type[FewshotExample]]~~ |
| `scorer` (NEW) | Scorer function that evaluates the task performance on provided examples. Defaults to the metric used by spaCy. ~~Optional[Scorer]~~ |
| `labels` | List of labels or str of comma-separated list of labels. ~~Union[List[str], str]~~ |
| `label_definitions` | Dictionary providing a description for each relation label. Defaults to `None`. ~~Optional[Dict[str, str]]~~ |
| `normalizer` | Function that normalizes the labels as returned by the LLM. If `None`, falls back to `spacy.LowercaseNormalizer.v1`. Defaults to `None`. ~~Optional[Callable[[str], str]]~~ | | `normalizer` | Function that normalizes the labels as returned by the LLM. If `None`, falls back to `spacy.LowercaseNormalizer.v1`. Defaults to `None`. ~~Optional[Callable[[str], str]]~~ |
| `verbose` | If set to `True`, warnings will be generated when the LLM returns invalid responses. Defaults to `False`. ~~bool~~ | | `verbose` | If set to `True`, warnings will be generated when the LLM returns invalid responses. Defaults to `False`. ~~bool~~ |
@ -794,9 +826,12 @@ This task supports both zero-shot and few-shot prompting.
> ``` > ```
| Argument | Description | | Argument | Description |
| ---------- | ----------------------------------------------------------------------------------------------------------------------------------------------------------------------------- | | --------------------------- | ----------------------------------------------------------------------------------------------------------------------------------------------------------------------------- |
| `template` | Custom prompt template to send to LLM model. Defaults to [lemma.v1.jinja](https://github.com/explosion/spacy-llm/blob/main/spacy_llm/tasks/templates/lemma.v1.jinja). ~~str~~ | | `template` | Custom prompt template to send to LLM model. Defaults to [lemma.v1.jinja](https://github.com/explosion/spacy-llm/blob/main/spacy_llm/tasks/templates/lemma.v1.jinja). ~~str~~ |
| `examples` | Optional function that generates examples for few-shot learning. Defaults to `None`. ~~Optional[Callable[[], Iterable[Any]]]~~ | | `examples` | Optional function that generates examples for few-shot learning. Defaults to `None`. ~~Optional[Callable[[], Iterable[Any]]]~~ |
| `parse_responses` (NEW) | Callable for parsing LLM responses for this task. Defaults to the internal parsing method for this task. ~~Optional[TaskResponseParser[LemmaTask]]~~ |
| `prompt_example_type` (NEW) | Type to use for fewshot examples. Defaults to `LemmaExample`. ~~Optional[Type[FewshotExample]]~~ |
| `scorer` (NEW) | Scorer function that evaluates the task performance on provided examples. Defaults to the metric used by spaCy. ~~Optional[Scorer]~~ |
The task prompts the LLM to lemmatize the passed text and return the lemmatized The task prompts the LLM to lemmatize the passed text and return the lemmatized
version as a list of tokens and their corresponding lemma. E. g. the text version as a list of tokens and their corresponding lemma. E. g. the text
@ -871,9 +906,12 @@ This task supports both zero-shot and few-shot prompting.
> ``` > ```
| Argument | Description | | Argument | Description |
| ---------- | ------------------------------------------------------------------------------------------------------------------------------------------ | | --------------------------- | -------------------------------------------------------------------------------------------------------------------------------------------------------- |
| `template` | Custom prompt template to send to LLM model. Defaults to [sentiment.v1.jinja](./spacy_llm/tasks/templates/sentiment.v1.jinja). ~~str~~ | | `template` | Custom prompt template to send to LLM model. Defaults to [sentiment.v1.jinja](./spacy_llm/tasks/templates/sentiment.v1.jinja). ~~str~~ |
| `examples` | Optional function that generates examples for few-shot learning. Defaults to `None`. ~~Optional[Callable[[], Iterable[Any]]]~~ | | `examples` | Optional function that generates examples for few-shot learning. Defaults to `None`. ~~Optional[Callable[[], Iterable[Any]]]~~ |
| `parse_responses` (NEW) | Callable for parsing LLM responses for this task. Defaults to the internal parsing method for this task. ~~Optional[TaskResponseParser[SentimentTask]]~~ |
| `prompt_example_type` (NEW) | Type to use for fewshot examples. Defaults to `SentimentExample`. ~~Optional[Type[FewshotExample]]~~ |
| `scorer` (NEW) | Scorer function that evaluates the task performance on provided examples. Defaults to the metric used by spaCy. ~~Optional[Scorer]~~ |
| `field` | Name of extension attribute to store summary in (i. e. the summary will be available in `doc._.{field}`). Defaults to `sentiment`. ~~str~~ | | `field` | Name of extension attribute to store summary in (i. e. the summary will be available in `doc._.{field}`). Defaults to `sentiment`. ~~str~~ |
To perform [few-shot learning](/usage/large-language-models#few-shot-prompts), To perform [few-shot learning](/usage/large-language-models#few-shot-prompts),
@ -1042,6 +1080,21 @@ Currently, these models are provided as part of the core library:
| `spacy.StableLM.v1` | Stability AI | `["stablelm-base-alpha-3b", "stablelm-base-alpha-7b", "stablelm-tuned-alpha-3b", "stablelm-tuned-alpha-7b"]` | https://huggingface.co/stabilityai | | `spacy.StableLM.v1` | Stability AI | `["stablelm-base-alpha-3b", "stablelm-base-alpha-7b", "stablelm-tuned-alpha-3b", "stablelm-tuned-alpha-7b"]` | https://huggingface.co/stabilityai |
| `spacy.OpenLLaMA.v1` | OpenLM Research | `["open_llama_3b", "open_llama_7b", "open_llama_7b_v2", "open_llama_13b"]` | https://huggingface.co/openlm-research | | `spacy.OpenLLaMA.v1` | OpenLM Research | `["open_llama_3b", "open_llama_7b", "open_llama_7b_v2", "open_llama_13b"]` | https://huggingface.co/openlm-research |
<Infobox variant="warning" title="Gated models on Hugging Face" id="hf_licensing">
Some models available on Hugging Face (HF), such as Llama 2, are _gated models_.
That means that users have to fulfill certain requirements to be allowed access
to these models. In the case of Llama 2 you'll need to request agree to Meta's
Terms of Service while logged in with your HF account. After Meta grants you
permission to use Llama 2, you'll be able to download and use the model.
This requires that you are logged in with your HF account on your local
machine - check out the HF quick start documentation. In a nutshell, you'll need
to create an access token on HF and log in to HF using your access token, e. g.
with `huggingface-cli login`.
</Infobox>
Note that Hugging Face will download the model the first time you use it - you Note that Hugging Face will download the model the first time you use it - you
can can
[define the cached directory](https://huggingface.co/docs/huggingface_hub/main/en/guides/manage-cache) [define the cached directory](https://huggingface.co/docs/huggingface_hub/main/en/guides/manage-cache)