Mirror of https://github.com/explosion/spaCy.git, synced 2025-07-10 16:22:29 +03:00
Shorten NER section

parent 3e4264899c
commit 3266898466

@@ -52,7 +52,9 @@ signatures of the `model` and `task` callables are consistent with each other
 and emits a warning if they don't. `validate_types` can be set to `False` if you
 want to disable this behavior.
 
-### Tasks {id="tasks"}
+## Tasks {id="tasks"}
 
+### Task implementation {id="task-implementation"}
+
 A _task_ defines an NLP problem or question, that will be sent to the LLM via a
 prompt. Further, the task defines how to parse the LLM's responses back into
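
The hunk above moves the task description under a new "Task implementation" heading: a task builds the prompt and parses the LLM's reply back into annotations. As a rough, stdlib-only illustration of that two-step contract, the sketch below works on plain strings instead of spaCy `Doc` objects. The class `ToyNERTask`, its prompt wording and the `LABEL: entity` reply format are hypothetical; only the method names `generate_prompts` and `parse_responses` are borrowed from the spacy-llm task interface as we understand it, and a canned reply stands in for a real LLM call.

```python
from typing import Dict, Iterable, Iterator, List


class ToyNERTask:
    """Hypothetical task: build prompts for texts and parse LLM replies back."""

    def __init__(self, labels: List[str]) -> None:
        self.labels = labels

    def generate_prompts(self, texts: Iterable[str]) -> Iterator[str]:
        # One prompt per input text, listing the labels the LLM may use.
        for text in texts:
            yield (
                f"Extract entities with labels {', '.join(self.labels)}.\n"
                "Answer with one 'LABEL: entity' pair per line.\n"
                f"Text: {text}"
            )

    def parse_responses(
        self, texts: Iterable[str], responses: Iterable[str]
    ) -> Iterator[Dict[str, List[str]]]:
        # Map each raw reply back to {label: [entity strings]} for its text.
        for _text, response in zip(texts, responses):
            found: Dict[str, List[str]] = {label: [] for label in self.labels}
            for line in response.splitlines():
                label, sep, entity = line.partition(":")
                label, entity = label.strip(), entity.strip()
                # Skip malformed lines and labels the prompt never offered.
                if sep and entity and label in found:
                    found[label].append(entity)
            yield found


task = ToyNERTask(["PERSON", "LOCATION"])
texts = ["Jack and Jill went up the hill."]
prompts = list(task.generate_prompts(texts))
# A canned reply stands in for a real LLM call.
replies = ["PERSON: Jack\nPERSON: Jill\nLOCATION: hill\nANIMAL: none"]
parsed = list(task.parse_responses(texts, replies))
print(parsed)  # [{'PERSON': ['Jack', 'Jill'], 'LOCATION': ['hill']}]
```

Keeping prompt generation and response parsing on one object means the parser always knows which labels the prompt offered, so it can discard labels the LLM invents (`ANIMAL` here).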

@@ -146,6 +148,10 @@ max_n_words = 20
 path = "summarization_examples.yml"
 ```
 
+### NER {id="ner"}
+
+The NER task identifies non-overlapping entities in text.
+
 #### spacy.NER.v2 {id="ner-v2"}
 
 The built-in NER task supports both zero-shot and few-shot prompting. This
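
The new one-line description says the task yields non-overlapping entities. One common way to resolve overlapping candidates, sketched below in plain Python, is to keep the longest candidate and drop anything that collides with it. This is similar in spirit to spaCy's `spacy.util.filter_spans`, but whether spacy-llm resolves overlaps exactly this way is an assumption, and the character-offset tuples are a hypothetical stand-in for spaCy `Span` objects.

```python
from typing import List, Tuple

# Hypothetical stand-in for a spaCy Span: (start_char, end_char, label), end exclusive.
Span = Tuple[int, int, str]


def keep_non_overlapping(spans: List[Span]) -> List[Span]:
    """Keep the longest spans and drop any span overlapping one already kept."""
    # Longest first; among equal lengths, the span starting earlier wins.
    ordered = sorted(spans, key=lambda s: (-(s[1] - s[0]), s[0]))
    kept: List[Span] = []
    for start, end, label in ordered:
        # Two half-open ranges are disjoint if one ends before the other starts.
        if all(end <= k_start or start >= k_end for k_start, k_end, _ in kept):
            kept.append((start, end, label))
    return sorted(kept)  # back into document order


text = "She moved to New York City last year."
candidates = [
    (13, 21, "LOCATION"),  # "New York"
    (13, 26, "LOCATION"),  # "New York City"
    (17, 26, "ORG"),       # "York City"
    (27, 36, "DATE"),      # "last year"
]
print([text[s:e] for s, e, _ in keep_non_overlapping(candidates)])
# ['New York City', 'last year']
```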

@@ -164,52 +170,17 @@ descriptions.
 | Argument | Description |
 | ------------------------- | ------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------- |
 | `labels` | List of labels or str of comma-separated list of labels. ~~Union[List[str], str]~~ |
-| `template` | Custom prompt template to send to LLM model. Default templates for each task are located in the `spacy_llm/tasks/templates` directory. Defaults to [ner.v2.jinja](https://github.com/explosion/spacy-llm/blob/main/spacy_llm/tasks/templates/ner.v2.jinja). ~~str~~ |
-| `label_definitions` | Optional dict mapping a label to a description of that label. These descriptions are added to the prompt to help instruct the LLM on what to extract. Defaults to `None`. ~~Optional[Dict[str, str]]~~ |
+| `template` (NEW) | Custom prompt template to send to LLM model. Default templates for each task are located in the `spacy_llm/tasks/templates` directory. Defaults to [ner.v2.jinja](https://github.com/explosion/spacy-llm/blob/main/spacy_llm/tasks/templates/ner.v2.jinja). ~~str~~ |
+| `label_definitions` (NEW) | Optional dict mapping a label to a description of that label. These descriptions are added to the prompt to help instruct the LLM on what to extract. Defaults to `None`. ~~Optional[Dict[str, str]]~~ |
 | `examples` | Optional function that generates examples for few-shot learning. Defaults to `None`. ~~Optional[Callable[[], Iterable[Any]]]~~ |
 | `normalizer` | Function that normalizes the labels as returned by the LLM. If `None`, defaults to `spacy.LowercaseNormalizer.v1`. Defaults to `None`. ~~Optional[Callable[[str], str]]~~ |
 | `alignment_mode` | Alignment mode in case the LLM returns entities that do not align with token boundaries. Options are `"strict"`, `"contract"` or `"expand"`. Defaults to `"contract"`. ~~str~~ |
 | `case_sensitive_matching` | Whether to search without case sensitivity. Defaults to `False`. ~~bool~~ |
 | `single_match` | Whether to match an entity in the LLM's response only once (the first hit) or multiple times. Defaults to `False`. ~~bool~~ |
 
-The NER task implementation doesn't currently ask the LLM for specific offsets,
-but simply expects a list of strings that represent the enties in the document.
-This means that a form of string matching is required. This can be configured by
-the following parameters:
+The parameters `alignment_mode`, `case_sensitive_matching` and `single_match`
+are identical to the [v1](#ner-v1) implementation. The format of few-shot
+examples are also the same.
 
-- The `single_match` parameter is typically set to `False` to allow for multiple
-  matches. For instance, the response from the LLM might only mention the entity
-  "Paris" once, but you'd still want to mark it every time it occurs in the
-  document.
-- The case-sensitive matching is typically set to `False` to be robust against
-  case variances in the LLM's output.
-- The `alignment_mode` argument is used to match entities as returned by the LLM
-  to the tokens from the original `Doc` - specifically it's used as argument in
-  the call to [`doc.char_span()`](/api/doc#char_span). The `"strict"` mode will
-  only keep spans that strictly adhere to the given token boundaries.
-  `"contract"` will only keep those tokens that are fully within the given
-  range, e.g. reducing `"New Y"` to `"New"`. Finally, `"expand"` will expand the
-  span to the next token boundaries, e.g. expanding `"New Y"` out to
-  `"New York"`.
-
-To perform [few-shot learning](/usage/large-langauge-models#few-shot-prompts),
-you can write down a few examples in a separate file, and provide these to be
-injected into the prompt to the LLM. The default reader `spacy.FewShotReader.v1`
-supports `.yml`, `.yaml`, `.json` and `.jsonl`.
-
-```yaml
-- text: Jack and Jill went up the hill.
-  entities:
-    PERSON:
-      - Jack
-      - Jill
-    LOCATION:
-      - hill
-- text: Jack fell down and broke his crown.
-  entities:
-    PERSON:
-      - Jack
-```
-
 ```ini
 [components.llm.task]
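
The removed paragraph explains how the task turns the LLM's list of entity strings back into spans: find each string in the text, then snap the hit to token boundaries; per the new wording, v2 behaves the same as v1 here. The plain-Python sketch below illustrates that logic under stated assumptions. The regex tokenizer, the `(start, end)` tuples and every function name (`tokenize`, `align`, `find_entities`, `parse_labels`, `normalize_label`) are hypothetical; in spacy-llm the snapping is done by `doc.char_span()` with `alignment_mode`, and label normalization by `spacy.LowercaseNormalizer.v1`. The sketch reads `case_sensitive_matching=False` as case-insensitive search, which is how the removed bullet list describes it.

```python
import re
from typing import List, Optional, Tuple, Union

Token = Tuple[int, int]  # (start_char, end_char), end exclusive


def tokenize(text: str) -> List[Token]:
    # Hypothetical tokenizer (words and single punctuation marks), not spaCy's.
    return [m.span() for m in re.finditer(r"\w+|[^\w\s]", text)]


def align(tokens: List[Token], start: int, end: int, mode: str) -> Optional[Token]:
    """Snap a character range to token boundaries, in the spirit of doc.char_span()."""
    if mode == "strict":
        # Both ends must already sit on token boundaries.
        ok = any(s == start for s, _ in tokens) and any(e == end for _, e in tokens)
        return (start, end) if ok else None
    if mode == "contract":
        covered = [t for t in tokens if t[0] >= start and t[1] <= end]  # fully inside
    elif mode == "expand":
        covered = [t for t in tokens if t[0] < end and t[1] > start]  # partly inside
    else:
        raise ValueError(f"unknown alignment mode: {mode!r}")
    return (covered[0][0], covered[-1][1]) if covered else None


def find_entities(text: str, entity: str, case_sensitive: bool = False,
                  single_match: bool = False, mode: str = "contract") -> List[str]:
    """Find an LLM-returned entity string in the text and align each hit to tokens."""
    if not entity:
        return []
    tokens = tokenize(text)
    flags = 0 if case_sensitive else re.IGNORECASE
    found = []
    for match in re.finditer(re.escape(entity), text, flags):
        aligned = align(tokens, match.start(), match.end(), mode)
        if aligned is not None:
            found.append(text[aligned[0]:aligned[1]])
        if single_match:
            break  # only the first hit counts
    return found


def parse_labels(labels: Union[List[str], str]) -> List[str]:
    # `labels` may be a list or a comma-separated string.
    if isinstance(labels, str):
        labels = labels.split(",")
    return [label.strip() for label in labels if label.strip()]


def normalize_label(raw: str, labels: List[str]) -> Optional[str]:
    # Hypothetical lowercase lookup, mapping e.g. "person" back to "PERSON".
    lookup = {label.lower(): label for label in labels}
    return lookup.get(raw.strip().lower())


text = "Paris is nice. I flew from New York to Paris."
print(find_entities(text, "paris"))                       # ['Paris', 'Paris']
print(find_entities(text, "paris", single_match=True))    # ['Paris']
print(find_entities(text, "paris", case_sensitive=True))  # []
for mode in ("strict", "contract", "expand"):
    print(mode, find_entities(text, "New Y", mode=mode))
# strict []
# contract ['New']
# expand ['New York']
```

The first three calls mirror the "Paris" example from the removed bullets: with `single_match` off, one mention in the LLM's reply marks every occurrence in the text.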

@@ -223,12 +194,12 @@ path = "ner_examples.yml"
 > Label descriptions can also be used with explicit examples to give as much
 > info to the LLM model as possible.
 
-You can also write definitions for each label and provide them via the
-`label_definitions` argument. This lets you tell the LLM exactly what you're
-looking for rather than relying on the LLM to interpret its task given just the
-label name. Label descriptions are freeform so you can write whatever you want
-here, but through some experiments a brief description along with some examples
-and counter examples seems to work quite well.
+New to v2, is the fact that you can write definitions for each label and provide
+them via the `label_definitions` argument. This lets you tell the LLM exactly
+what you're looking for rather than relying on the LLM to interpret its task
+given just the label name. Label descriptions are freeform so you can write
+whatever you want here, but a brief description along with some examples and
+counter examples seems to work quite well.
 
 ```ini
 [components.llm.task]
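
The reworded paragraph in this hunk says label definitions are passed to the LLM, and recommends a brief description plus examples and counter-examples. The sketch below shows the idea with plain string formatting. `render_prompt`, its wording and the sample definitions are hypothetical; spacy-llm renders its actual prompt from the Jinja template `ner.v2.jinja` listed in the argument table.

```python
from typing import Dict, List, Optional


def render_prompt(text: str, labels: List[str],
                  label_definitions: Optional[Dict[str, str]] = None) -> str:
    """Hypothetical prompt builder; the real prompt comes from ner.v2.jinja."""
    lines = [f"Extract named entities of these types: {', '.join(labels)}."]
    if label_definitions:
        lines.append("Use these definitions:")
        for label in labels:
            # Only labels that actually have a definition get a line.
            if label in label_definitions:
                lines.append(f"- {label}: {label_definitions[label]}")
    lines.append(f"Text: {text}")
    return "\n".join(lines)


# Hypothetical definitions: a short description, an example, a counter-example.
definitions = {
    "PERSON": "Named individuals, e.g. 'Jack'. Not pronouns such as 'he'.",
    "LOCATION": "Places, e.g. 'the hill'. Not organisations.",
}
prompt = render_prompt("Jack and Jill went up the hill.", ["PERSON", "LOCATION"], definitions)
print(prompt)
```

Without `label_definitions`, the prompt falls back to the bare label names, which is exactly the case the paragraph warns leaves the LLM to guess what each label means.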