diff --git a/website/docs/api/large-language-models.mdx b/website/docs/api/large-language-models.mdx index 70d3548b2..b61ce0c36 100644 --- a/website/docs/api/large-language-models.mdx +++ b/website/docs/api/large-language-models.mdx @@ -189,7 +189,7 @@ means that the task will always perform few-shot prompting under the hood. | `case_sensitive_matching` | Whether to search without case sensitivity. Defaults to `False`. ~~bool~~ | Note that the `single_match` parameter, used in v1 and v2, is not supported -anymore, as the CoT parsing takes care of this automatically. +anymore, as the CoT parsing algorithm takes care of this automatically. #### spacy.NER.v2 {id="ner-v2"} @@ -312,6 +312,37 @@ path = "ner_examples.yml" The SpanCat task identifies potentially overlapping entities in text. +#### spacy.SpanCat.v3 {id="spancat-v3"} + +The built-in SpanCat v3 task is a simple adaptation of the NER v3 task to +support overlapping entities and store its annotations in `doc.spans`. + +> #### Example config +> +> ```ini +> [components.llm.task] +> @llm_tasks = "spacy.SpanCat.v3" +> labels = ["PERSON", "ORGANISATION", "LOCATION"] +> examples = null +> ``` + +| Argument | Description | +| ------------------------- | ------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------ | +| `labels` | List of labels or str of comma-separated list of labels. ~~Union[List[str], str]~~ | +| `label_definitions` | Optional dict mapping a label to a description of that label. These descriptions are added to the prompt to help instruct the LLM on what to extract. Defaults to `None`. ~~Optional[Dict[str, str]]~~ | +| `template` | Custom prompt template to send to LLM model. Defaults to [`spancat.v3.jinja`](https://github.com/explosion/spacy-llm/blob/main/spacy_llm/tasks/templates/spancat.v3.jinja). ~~str~~ | +| `description` (NEW) | A description of what to recognize or not recognize as entities. ~~str~~ | +| `spans_key` | Key of the `Doc.spans` dict to save the spans under. Defaults to `"sc"`. ~~str~~ | +| `examples` | Optional function that generates examples for few-shot learning. Defaults to `None`. ~~Optional[Callable[[], Iterable[Any]]]~~ | +| `normalizer` | Function that normalizes the labels as returned by the LLM. If `None`, defaults to `spacy.LowercaseNormalizer.v1`. ~~Optional[Callable[[str], str]]~~ | +| `alignment_mode` | Alignment mode in case the LLM returns entities that do not align with token boundaries. Options are `"strict"`, `"contract"` or `"expand"`. Defaults to `"contract"`. ~~str~~ | +| `case_sensitive_matching` | Whether to search without case sensitivity. Defaults to `False`. ~~bool~~ | + +Note that the `single_match` parameter, used in v1 and v2, is not supported +anymore, as the CoT parsing algorithm takes care of this automatically. + +# TODO: check_label_consistency ? + #### spacy.SpanCat.v2 {id="spancat-v2"} The built-in SpanCat v2 task is a simple adaptation of the NER v2 task to @@ -329,8 +360,8 @@ support overlapping entities and store its annotations in `doc.spans`. | Argument | Description | | ------------------------- | ------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------ | | `labels` | List of labels or str of comma-separated list of labels. ~~Union[List[str], str]~~ | -| `template` (NEW) | Custom prompt template to send to LLM model. Defaults to [`spancat.v2.jinja`](https://github.com/explosion/spacy-llm/blob/main/spacy_llm/tasks/templates/spancat.v2.jinja). ~~str~~ | | `label_definitions` (NEW) | Optional dict mapping a label to a description of that label. These descriptions are added to the prompt to help instruct the LLM on what to extract. Defaults to `None`. ~~Optional[Dict[str, str]]~~ | +| `template` (NEW) | Custom prompt template to send to LLM model. Defaults to [`spancat.v2.jinja`](https://github.com/explosion/spacy-llm/blob/main/spacy_llm/tasks/templates/spancat.v2.jinja). ~~str~~ | | `spans_key` | Key of the `Doc.spans` dict to save the spans under. Defaults to `"sc"`. ~~str~~ | | `examples` | Optional function that generates examples for few-shot learning. Defaults to `None`. ~~Optional[Callable[[], Iterable[Any]]]~~ | | `normalizer` | Function that normalizes the labels as returned by the LLM. If `None`, defaults to `spacy.LowercaseNormalizer.v1`. ~~Optional[Callable[[str], str]]~~ | diff --git a/website/docs/usage/large-language-models.mdx b/website/docs/usage/large-language-models.mdx index 525f349cc..e7a72bbe1 100644 --- a/website/docs/usage/large-language-models.mdx +++ b/website/docs/usage/large-language-models.mdx @@ -369,12 +369,13 @@ evaluate the component. | [`spacy.NER.v3`](/api/large-language-models#ner-v3 | Implements Chain-of-Thought reasoning for NER extraction - obtains higher accuracy than v1 or v2. | | [`spacy.NER.v2`](/api/large-language-models#ner-v2) | Builds on v1 and additionally supports defining the provided labels with explicit descriptions. | | [`spacy.NER.v1`](/api/large-language-models#ner-v1) | The original version of the built-in NER task supports both zero-shot and few-shot prompting. | +| [`spacy.SpanCat.v3`](/api/large-language-models#spancat-v3) | Adaptation of the v3 NER task to support overlapping entities and store its annotations in `doc.spans`. | | [`spacy.SpanCat.v2`](/api/large-language-models#spancat-v2) | Adaptation of the v2 NER task to support overlapping entities and store its annotations in `doc.spans`. | | [`spacy.SpanCat.v1`](/api/large-language-models#spancat-v1) | Adaptation of the v1 NER task to support overlapping entities and store its annotations in `doc.spans`. | +| [`spacy.REL.v1`](/api/large-language-models#rel-v1) | Relation Extraction task supporting both zero-shot and few-shot prompting. | | [`spacy.TextCat.v3`](/api/large-language-models#textcat-v3) | Version 3 builds on v2 and allows setting definitions of labels. | | [`spacy.TextCat.v2`](/api/large-language-models#textcat-v2) | Version 2 builds on v1 and includes an improved prompt template. | | [`spacy.TextCat.v1`](/api/large-language-models#textcat-v1) | Version 1 of the built-in TextCat task supports both zero-shot and few-shot prompting. | -| [`spacy.REL.v1`](/api/large-language-models#rel-v1) | Relation Extraction task supporting both zero-shot and few-shot prompting. | | [`spacy.Lemma.v1`](/api/large-language-models#lemma-v1) | Lemmatizes the provided text and updates the `lemma_` attribute of the tokens accordingly. | | [`spacy.Sentiment.v1`](/api/large-language-models#sentiment-v1) | Performs sentiment analysis on provided texts. | | [`spacy.NoOp.v1`](/api/large-language-models#noop-v1) | This task is only useful for testing - it tells the LLM to do nothing, and does not set any fields on the `docs`. |