mirror of https://github.com/explosion/spaCy.git
synced 2025-07-10 08:12:24 +03:00
Shorten NER section
parent 3e4264899c
commit 3266898466
@@ -52,7 +52,9 @@ signatures of the `model` and `task` callables are consistent with each other
 and emits a warning if they don't. `validate_types` can be set to `False` if you
 want to disable this behavior.
 
-### Tasks {id="tasks"}
+## Tasks {id="tasks"}
+
+### Task implementation {id="task-implementation"}
 
 A _task_ defines an NLP problem or question, that will be sent to the LLM via a
 prompt. Further, the task defines how to parse the LLM's responses back into
@@ -146,6 +148,10 @@ max_n_words = 20
 path = "summarization_examples.yml"
 ```
 
+### NER {id="ner"}
+
+The NER task identifies non-overlapping entities in text.
+
 #### spacy.NER.v2 {id="ner-v2"}
 
 The built-in NER task supports both zero-shot and few-shot prompting. This
@@ -164,52 +170,17 @@ descriptions.
 | Argument | Description |
 | ------------------------- | ------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------- |
 | `labels` | List of labels or str of comma-separated list of labels. ~~Union[List[str], str]~~ |
-| `template` | Custom prompt template to send to LLM model. Default templates for each task are located in the `spacy_llm/tasks/templates` directory. Defaults to [ner.v2.jinja](https://github.com/explosion/spacy-llm/blob/main/spacy_llm/tasks/templates/ner.v2.jinja). ~~str~~ |
-| `label_definitions` | Optional dict mapping a label to a description of that label. These descriptions are added to the prompt to help instruct the LLM on what to extract. Defaults to `None`. ~~Optional[Dict[str, str]]~~ |
+| `template` (NEW) | Custom prompt template to send to LLM model. Default templates for each task are located in the `spacy_llm/tasks/templates` directory. Defaults to [ner.v2.jinja](https://github.com/explosion/spacy-llm/blob/main/spacy_llm/tasks/templates/ner.v2.jinja). ~~str~~ |
+| `label_definitions` (NEW) | Optional dict mapping a label to a description of that label. These descriptions are added to the prompt to help instruct the LLM on what to extract. Defaults to `None`. ~~Optional[Dict[str, str]]~~ |
 | `examples` | Optional function that generates examples for few-shot learning. Defaults to `None`. ~~Optional[Callable[[], Iterable[Any]]]~~ |
 | `normalizer` | Function that normalizes the labels as returned by the LLM. If `None`, defaults to `spacy.LowercaseNormalizer.v1`. Defaults to `None`. ~~Optional[Callable[[str], str]]~~ |
 | `alignment_mode` | Alignment mode in case the LLM returns entities that do not align with token boundaries. Options are `"strict"`, `"contract"` or `"expand"`. Defaults to `"contract"`. ~~str~~ |
 | `case_sensitive_matching` | Whether to search without case sensitivity. Defaults to `False`. ~~bool~~ |
 | `single_match` | Whether to match an entity in the LLM's response only once (the first hit) or multiple times. Defaults to `False`. ~~bool~~ |
 
-The NER task implementation doesn't currently ask the LLM for specific offsets,
-but simply expects a list of strings that represent the entities in the document.
-This means that a form of string matching is required. This can be configured by
-the following parameters:
-
-- The `single_match` parameter is typically set to `False` to allow for multiple
-  matches. For instance, the response from the LLM might only mention the entity
-  "Paris" once, but you'd still want to mark it every time it occurs in the
-  document.
-- The case-sensitive matching is typically set to `False` to be robust against
-  case variances in the LLM's output.
-- The `alignment_mode` argument is used to match entities as returned by the LLM
-  to the tokens from the original `Doc` - specifically it's used as argument in
-  the call to [`doc.char_span()`](/api/doc#char_span). The `"strict"` mode will
-  only keep spans that strictly adhere to the given token boundaries.
-  `"contract"` will only keep those tokens that are fully within the given
-  range, e.g. reducing `"New Y"` to `"New"`. Finally, `"expand"` will expand the
-  span to the next token boundaries, e.g. expanding `"New Y"` out to
-  `"New York"`.
-
-To perform [few-shot learning](/usage/large-langauge-models#few-shot-prompts),
-you can write down a few examples in a separate file, and provide these to be
-injected into the prompt to the LLM. The default reader `spacy.FewShotReader.v1`
-supports `.yml`, `.yaml`, `.json` and `.jsonl`.
-
-```yaml
-- text: Jack and Jill went up the hill.
-  entities:
-    PERSON:
-      - Jack
-      - Jill
-    LOCATION:
-      - hill
-- text: Jack fell down and broke his crown.
-  entities:
-    PERSON:
-      - Jack
-```
+The parameters `alignment_mode`, `case_sensitive_matching` and `single_match`
+are identical to the [v1](#ner-v1) implementation. The format of few-shot
+examples is also the same.
 
 ```ini
 [components.llm.task]
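The hunk above describes how `single_match` and `case_sensitive_matching` govern the string matching that maps the LLM's entity strings back onto the document. The following is a minimal sketch of that idea in plain Python. The helper name `find_entity_offsets` and its use of the `re` module are assumptions made for illustration; this is not the actual spacy-llm implementation.

```python
import re
from typing import List, Tuple


def find_entity_offsets(
    text: str,
    entity: str,
    case_sensitive_matching: bool = False,
    single_match: bool = False,
) -> List[Tuple[int, int]]:
    """Return (start, end) character offsets of `entity` in `text`.

    Hypothetical helper: it only mirrors the behaviour the docs describe.
    """
    # Ignore case unless case-sensitive matching was requested.
    flags = 0 if case_sensitive_matching else re.IGNORECASE
    # re.escape makes the LLM's string match literally, not as a regex.
    matches = re.finditer(re.escape(entity), text, flags)
    offsets = [(m.start(), m.end()) for m in matches]
    # With single_match, keep only the first hit.
    return offsets[:1] if single_match else offsets


text = "Paris is lovely. I visited paris twice."
for start, end in find_entity_offsets(text, "Paris"):
    print(start, end, text[start:end])
```

With the defaults, both the capitalised and the lowercase mention are found. Per the removed text, character offsets of this kind are what get passed to `doc.char_span()` together with `alignment_mode`.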
@@ -223,12 +194,12 @@ path = "ner_examples.yml"
 > Label descriptions can also be used with explicit examples to give as much
 > info to the LLM model as possible.
 
-You can also write definitions for each label and provide them via the
-`label_definitions` argument. This lets you tell the LLM exactly what you're
-looking for rather than relying on the LLM to interpret its task given just the
-label name. Label descriptions are freeform so you can write whatever you want
-here, but through some experiments a brief description along with some examples
-and counter examples seems to work quite well.
+New to v2, is the fact that you can write definitions for each label and provide
+them via the `label_definitions` argument. This lets you tell the LLM exactly
+what you're looking for rather than relying on the LLM to interpret its task
+given just the label name. Label descriptions are freeform so you can write
+whatever you want here, but a brief description along with some examples and
+counter examples seems to work quite well.
 
 ```ini
 [components.llm.task]