Shorten NER section

2025-08-02 11:20:19 +03:00 · 2023-08-31 14:42:30 +02:00 · 2023-08-31 14:42:30 +02:00 · 3266898466
commit 3266898466
parent 3e4264899c
1 changed file with 18 additions and 47 deletions
--- a/website/docs/api/large-language-models.mdx
+++ b/website/docs/api/large-language-models.mdx
@@ -52,7 +52,9 @@ signatures of the `model` and `task` callables are consistent with each other
 and emits a warning if they don't. `validate_types` can be set to `False` if you
 want to disable this behavior.

-### Tasks {id="tasks"}
+## Tasks {id="tasks"}
+
+### Task implementation {id="task-implementation"}

 A _task_ defines an NLP problem or question, that will be sent to the LLM via a
 prompt. Further, the task defines how to parse the LLM's responses back into
@@ -146,6 +148,10 @@ max_n_words = 20
 path = "summarization_examples.yml"
 ```

+### NER {id="ner"}
+
+The NER task identifies non-overlapping entities in text.
+
 #### spacy.NER.v2 {id="ner-v2"}

 The built-in NER task supports both zero-shot and few-shot prompting. This
@@ -164,52 +170,17 @@ descriptions.
 | Argument                  | Description                                                                                                                                                                                                                                                         |
 | ------------------------- | ------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------- |
 | `labels`                  | List of labels or str of comma-separated list of labels. ~~Union[List[str], str]~~                                                                                                                                                                                  |
-| `template`                | Custom prompt template to send to LLM model. Default templates for each task are located in the `spacy_llm/tasks/templates` directory. Defaults to [ner.v2.jinja](https://github.com/explosion/spacy-llm/blob/main/spacy_llm/tasks/templates/ner.v2.jinja). ~~str~~ |
-| `label_definitions`       | Optional dict mapping a label to a description of that label. These descriptions are added to the prompt to help instruct the LLM on what to extract. Defaults to `None`. ~~Optional[Dict[str, str]]~~                                                              |
+| `template` (NEW)          | Custom prompt template to send to LLM model. Default templates for each task are located in the `spacy_llm/tasks/templates` directory. Defaults to [ner.v2.jinja](https://github.com/explosion/spacy-llm/blob/main/spacy_llm/tasks/templates/ner.v2.jinja). ~~str~~ |
+| `label_definitions` (NEW) | Optional dict mapping a label to a description of that label. These descriptions are added to the prompt to help instruct the LLM on what to extract. Defaults to `None`. ~~Optional[Dict[str, str]]~~                                                              |
 | `examples`                | Optional function that generates examples for few-shot learning. Defaults to `None`. ~~Optional[Callable[[], Iterable[Any]]]~~                                                                                                                                      |
 | `normalizer`              | Function that normalizes the labels as returned by the LLM. If `None`, defaults to `spacy.LowercaseNormalizer.v1`. Defaults to `None`. ~~Optional[Callable[[str], str]]~~                                                                                           |
 | `alignment_mode`          | Alignment mode in case the LLM returns entities that do not align with token boundaries. Options are `"strict"`, `"contract"` or `"expand"`. Defaults to `"contract"`. ~~str~~                                                                                      |
 | `case_sensitive_matching` | Whether to search without case sensitivity. Defaults to `False`. ~~bool~~                                                                                                                                                                                           |
 | `single_match`            | Whether to match an entity in the LLM's response only once (the first hit) or multiple times. Defaults to `False`. ~~bool~~                                                                                                                                         |

-The NER task implementation doesn't currently ask the LLM for specific offsets,
-but simply expects a list of strings that represent the enties in the document.
-This means that a form of string matching is required. This can be configured by
-the following parameters:
-
-- The `single_match` parameter is typically set to `False` to allow for multiple
-  matches. For instance, the response from the LLM might only mention the entity
-  "Paris" once, but you'd still want to mark it every time it occurs in the
-  document.
-- The case-sensitive matching is typically set to `False` to be robust against
-  case variances in the LLM's output.
-- The `alignment_mode` argument is used to match entities as returned by the LLM
-  to the tokens from the original `Doc` - specifically it's used as argument in
-  the call to [`doc.char_span()`](/api/doc#char_span). The `"strict"` mode will
-  only keep spans that strictly adhere to the given token boundaries.
-  `"contract"` will only keep those tokens that are fully within the given
-  range, e.g. reducing `"New Y"` to `"New"`. Finally, `"expand"` will expand the
-  span to the next token boundaries, e.g. expanding `"New Y"` out to
-  `"New York"`.
-
-To perform [few-shot learning](/usage/large-langauge-models#few-shot-prompts),
-you can write down a few examples in a separate file, and provide these to be
-injected into the prompt to the LLM. The default reader `spacy.FewShotReader.v1`
-supports `.yml`, `.yaml`, `.json` and `.jsonl`.
-
-```yaml
-- text: Jack and Jill went up the hill.
-  entities:
-    PERSON:
-      - Jack
-      - Jill
-    LOCATION:
-      - hill
-- text: Jack fell down and broke his crown.
-  entities:
-    PERSON:
-      - Jack
-```
+The parameters `alignment_mode`, `case_sensitive_matching` and `single_match`
+are identical to the [v1](#ner-v1) implementation. The format of few-shot
+examples is also the same.

 ```ini
 [components.llm.task]
@@ -223,12 +194,12 @@ path = "ner_examples.yml"
 > Label descriptions can also be used with explicit examples to give as much
 > info to the LLM model as possible.

-You can also write definitions for each label and provide them via the
-`label_definitions` argument. This lets you tell the LLM exactly what you're
-looking for rather than relying on the LLM to interpret its task given just the
-label name. Label descriptions are freeform so you can write whatever you want
-here, but through some experiments a brief description along with some examples
-and counter examples seems to work quite well.
+New in v2: you can write definitions for each label and provide
+them via the `label_definitions` argument. This lets you tell the LLM exactly
+what you're looking for rather than relying on the LLM to interpret its task
+given just the label name. Label descriptions are freeform so you can write
+whatever you want here, but a brief description along with some examples and
+counter examples seems to work quite well.

 ```ini
 [components.llm.task]