mirror of https://github.com/explosion/spaCy.git
synced 2025-07-27 08:29:51 +03:00
commit 6161591889

@@ -602,7 +602,7 @@ on an upstream NER component for entities extraction.
 | ------------------- | --------------------------------------------------------------------------------------------------------------------------------------------------------------------------- |
 | `labels` | List of labels or str of comma-separated list of labels. ~~Union[List[str], str]~~ |
 | `template` | Custom prompt template to send to LLM model. Defaults to [`rel.v3.jinja`](https://github.com/explosion/spacy-llm/blob/main/spacy_llm/tasks/templates/rel.v1.jinja). ~~str~~ |
-| `label_description` | Dictionary providing a description for each relation label. Defaults to `None`. ~~Optional[Dict[str, str]]~~ |
+| `label_definitions` | Dictionary providing a description for each relation label. Defaults to `None`. ~~Optional[Dict[str, str]]~~ |
 | `examples` | Optional function that generates examples for few-shot learning. Defaults to `None`. ~~Optional[Callable[[], Iterable[Any]]]~~ |
 | `normalizer` | Function that normalizes the labels as returned by the LLM. If `None`, falls back to `spacy.LowercaseNormalizer.v1`. Defaults to `None`. ~~Optional[Callable[[str], str]]~~ |
 | `verbose` | If set to `True`, warnings will be generated when the LLM returns invalid responses. Defaults to `False`. ~~bool~~ |
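The `labels` and `normalizer` rows in the hunk above describe two input conventions. First, labels can be given as a list or as one comma-separated string. Second, labels returned by the LLM are normalized before matching, falling back to `spacy.LowercaseNormalizer.v1`. The sketch below illustrates both using only the standard library. `parse_labels`, `lowercase_normalizer` and `match_label` are hypothetical helpers, not spacy-llm code, and the strip-and-lowercase behavior of the fallback is an assumption.

```python
from typing import Callable, Dict, List, Optional, Union

# Hypothetical helpers for illustration; not spacy-llm's implementation.


def parse_labels(labels: Union[List[str], str]) -> List[str]:
    """Accept either a list of labels or one comma-separated string of labels."""
    if isinstance(labels, str):
        labels = labels.split(",")
    # Drop surrounding whitespace and empty entries such as a trailing comma.
    return [label.strip() for label in labels if label.strip()]


def lowercase_normalizer(text: str) -> str:
    """Assumed fallback behavior: trim whitespace, then lowercase."""
    return text.strip().lower()


def match_label(
    raw: str,
    labels: List[str],
    normalizer: Optional[Callable[[str], str]] = None,
) -> Optional[str]:
    """Map a label returned by the LLM back to one of the configured labels."""
    normalize = normalizer or lowercase_normalizer  # used when normalizer is None
    lookup: Dict[str, str] = {normalize(label): label for label in labels}
    return lookup.get(normalize(raw))


labels = parse_labels("LivesIn, Visits")
print(labels)                            # ['LivesIn', 'Visits']
print(match_label(" livesin ", labels))  # LivesIn
print(match_label("WorksAt", labels))    # None
```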

@@ -621,6 +621,7 @@ supports `.yml`, `.yaml`, `.json` and `.jsonl`.
 [components.llm.task]
 @llm_tasks = "spacy.REL.v1"
 labels = ["LivesIn", "Visits"]
+
 [components.llm.task.examples]
 @misc = "spacy.FewShotReader.v1"
 path = "rel_examples.jsonl"

@@ -184,7 +184,7 @@ nlp.add_pipe(
 "labels": ["PERSON", "ORGANISATION", "LOCATION"]
 },
 "model": {
-"@llm_models": "spacy.gpt-3.5.v1",
+"@llm_models": "spacy.GPT-3-5.v1",
 },
 },
 )

@@ -180,7 +180,7 @@ Some of the main advantages and features of spaCy's training config are:
 
 Under the hood, the config is parsed into a dictionary. It's divided into
 sections and subsections, indicated by the square brackets and dot notation. For
-example, `[training]` is a section and `[training.batch_size]` a subsection.
+example, `[training]` is a section and `[training.batcher]` a subsection.
 Subsections can define values, just like a dictionary, or use the `@` syntax to
 refer to [registered functions](#config-functions). This allows the config to
 not just define static settings, but also construct objects like architectures,
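The context paragraph in this hunk says the config is parsed into a dictionary whose nesting follows the dotted section names. Below is a rough standard-library sketch of that mapping, assuming every value is a JSON literal. `parse_config` is a hypothetical helper. Thinc's real `Config` parser also handles variable interpolation and resolves `@` references; this sketch leaves those as plain strings.

```python
import configparser
import json
from typing import Any, Dict

CONFIG_TEXT = """
[training]
max_epochs = 3

[training.batcher]
@batchers = "spacy.batch_by_words.v1"
size = 3000
"""


def parse_config(text: str) -> Dict[str, Any]:
    """Hypothetical helper: turn [a.b] section names into nested dicts."""
    # interpolation=None leaves "${...}" values untouched; optionxform=str keeps key case.
    parser = configparser.ConfigParser(interpolation=None)
    parser.optionxform = str
    parser.read_string(text)
    result: Dict[str, Any] = {}
    for section in parser.sections():
        node = result
        # Walk or create one nested dict per dotted name part.
        for part in section.split("."):
            node = node.setdefault(part, {})
        for key, raw in parser.items(section):
            # Assumption: values are JSON literals (numbers, quoted strings, lists).
            node[key] = json.loads(raw)
    return result


print(parse_config(CONFIG_TEXT))
# {'training': {'max_epochs': 3, 'batcher': {'@batchers': 'spacy.batch_by_words.v1', 'size': 3000}}}
```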

@@ -254,7 +254,7 @@ For cases like this, you can set additional command-line options starting with
 block.
 
 ```bash
-$ python -m spacy train config.cfg --paths.train ./corpus/train.spacy --paths.dev ./corpus/dev.spacy --training.batch_size 128
+$ python -m spacy train config.cfg --paths.train ./corpus/train.spacy --paths.dev ./corpus/dev.spacy --training.max_epochs 3
 ```
 
 Only existing sections and values in the config can be overwritten. At the end
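The last context line of this hunk states that only existing sections and values can be overwritten. Here is a minimal sketch of that rule applied to a nested dict. `apply_overrides` and `parse_value` are hypothetical helpers. Typing values with `json.loads`, and keeping the raw string when that fails, is an assumption and does not describe spaCy's CLI.

```python
import json
from typing import Any, Dict, List


def parse_value(raw: str) -> Any:
    """Read '3' as int, '0.5' as float, 'true' as bool; otherwise keep the string."""
    try:
        return json.loads(raw)
    except json.JSONDecodeError:
        return raw


def apply_overrides(config: Dict[str, Any], args: List[str]) -> Dict[str, Any]:
    """Hypothetical helper: apply '--section.key value' pairs in place."""
    if len(args) % 2:
        raise ValueError("every --option needs a value")
    for flag, raw in zip(args[::2], args[1::2]):
        if not flag.startswith("--"):
            raise ValueError(f"expected an option, got {flag!r}")
        *sections, key = flag[2:].split(".")
        node = config
        for name in sections:
            # Every intermediate name must already be a section (a dict).
            if not isinstance(node.get(name), dict):
                raise KeyError(f"unknown section in {flag!r}")
            node = node[name]
        # Refuse to create new values; only existing ones may be overwritten.
        if key not in node:
            raise KeyError(f"unknown value in {flag!r}")
        node[key] = parse_value(raw)
    return config


config = {"paths": {"train": None, "dev": None}, "training": {"max_epochs": 0}}
apply_overrides(config, ["--paths.train", "./corpus/train.spacy", "--training.max_epochs", "3"])
print(config["paths"]["train"], config["training"]["max_epochs"])  # ./corpus/train.spacy 3
```

In this sketch, an override for a key the config lacks raises `KeyError` instead of silently adding a new setting. An example is `--training.batch_size` when `[training]` defines no `batch_size`.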

@@ -279,7 +279,7 @@ process. Environment variables **take precedence** over CLI overrides and values
 defined in the config file.
 
 ```bash
-$ SPACY_CONFIG_OVERRIDES="--system.gpu_allocator pytorch --training.batch_size 128" ./your_script.sh
+$ SPACY_CONFIG_OVERRIDES="--system.gpu_allocator pytorch --training.max_epochs 3" ./your_script.sh
 ```
 
 ### Reading from standard input {id="config-stdin"}
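The hunk above states that `SPACY_CONFIG_OVERRIDES` takes precedence over CLI overrides and over values in the config file. The sketch below models only the ordering between the two override sources. It assumes the variable holds the same `--section.key value` pairs as the CLI and can be split with `shlex`. `pairs` and `resolve_overrides` are hypothetical names.

```python
import os
import shlex
from typing import Dict, List, Mapping, Optional


def pairs(args: List[str]) -> Dict[str, str]:
    """Turn ['--a.b', '1', '--c.d', 'x'] into {'a.b': '1', 'c.d': 'x'}."""
    if len(args) % 2:
        raise ValueError("every --option needs a value")
    # flag[2:] strips the leading '--' (assumed present on every option).
    return {flag[2:]: value for flag, value in zip(args[::2], args[1::2])}


def resolve_overrides(
    cli_args: List[str], environ: Optional[Mapping[str, str]] = None
) -> Dict[str, str]:
    """Hypothetical helper: merge CLI overrides with SPACY_CONFIG_OVERRIDES."""
    environ = os.environ if environ is None else environ
    merged = pairs(cli_args)
    env_args = shlex.split(environ.get("SPACY_CONFIG_OVERRIDES", ""))
    # dict.update overwrites existing keys, so the environment values win.
    merged.update(pairs(env_args))
    return merged


env = {"SPACY_CONFIG_OVERRIDES": "--system.gpu_allocator pytorch --training.max_epochs 3"}
print(resolve_overrides(["--training.max_epochs", "10"], env))
# {'training.max_epochs': '3', 'system.gpu_allocator': 'pytorch'}
```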

@@ -578,16 +578,17 @@ now-updated model to the predicted docs.
 
 The training configuration defined in the config file doesn't have to only
 consist of static values. Some settings can also be **functions**. For instance,
-the `batch_size` can be a number that doesn't change, or a schedule, like a
+the batch size can be a number that doesn't change, or a schedule, like a
 sequence of compounding values, which has shown to be an effective trick (see
 [Smith et al., 2017](https://arxiv.org/abs/1711.00489)).
 
 ```ini {title="With static value"}
-[training]
-batch_size = 128
+[training.batcher]
+@batchers = "spacy.batch_by_words.v1"
+size = 3000
 ```
 
-To refer to a function instead, you can make `[training.batch_size]` its own
+To refer to a function instead, you can make `[training.batcher.size]` its own
 section and use the `@` syntax to specify the function and its arguments – in
 this case [`compounding.v1`](https://thinc.ai/docs/api-schedules#compounding)
 defined in the [function registry](/api/top-level#registry). All other values
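The text in this hunk describes turning `[training.batcher.size]` into its own block. Its `@` key names `compounding.v1` in the function registry, and its other keys become the function's arguments. The sketch below imitates that resolution with a plain dict. `SCHEDULES`, `register`, `resolve` and this `compounding` generator are hypothetical stand-ins, not spaCy's registry or Thinc's `compounding.v1`. The `start` and `stop` values follow the next hunk; `compound = 2.0` is an assumption, since that key isn't shown.

```python
from itertools import islice
from typing import Any, Callable, Dict, Iterator

# Hypothetical stand-in for a function registry: versioned name -> function.
SCHEDULES: Dict[str, Callable[..., Any]] = {}


def register(name: str) -> Callable[[Callable[..., Any]], Callable[..., Any]]:
    """Decorator that stores a function in SCHEDULES under a string name."""
    def decorator(func: Callable[..., Any]) -> Callable[..., Any]:
        SCHEDULES[name] = func
        return func
    return decorator


@register("compounding.v1")
def compounding(start: float, stop: float, compound: float) -> Iterator[float]:
    """Yield start, start*compound, start*compound**2, ... capped at stop.

    Simplified stand-in: assumes start <= stop and compound >= 1.
    """
    current = float(start)
    while True:
        yield min(current, float(stop))
        current *= compound


def resolve(block: Dict[str, Any]) -> Any:
    """Call the function named by '@schedules', passing the other keys as kwargs."""
    kwargs = {key: value for key, value in block.items() if key != "@schedules"}
    return SCHEDULES[block["@schedules"]](**kwargs)


# Mirrors a [training.batcher.size] block; compound = 2.0 is an assumed value.
size_block = {"@schedules": "compounding.v1", "start": 100, "stop": 1000, "compound": 2.0}
print(list(islice(resolve(size_block), 6)))
# [100.0, 200.0, 400.0, 800.0, 1000.0, 1000.0]
```

A generator suits a schedule because the consumer draws the next value only when it needs one, and the sequence never has to end.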

@@ -606,7 +607,7 @@ from your configs.
 > optimizer.
 
 ```ini {title="With registered function"}
-[training.batch_size]
+[training.batcher.size]
 @schedules = "compounding.v1"
 start = 100
 stop = 1000

@@ -1027,14 +1028,14 @@ def my_custom_schedule(start: int = 1, factor: float = 1.001):
 ```
 
 In your config, you can now reference the schedule in the
-`[training.batch_size]` block via `@schedules`. If a block contains a key
+`[training.batcher.size]` block via `@schedules`. If a block contains a key
 starting with an `@`, it's interpreted as a reference to a function. All other
 settings in the block will be passed to the function as keyword arguments. Keep
 in mind that the config shouldn't have any hidden defaults and all arguments on
 the functions need to be represented in the config.
 
 ```ini {title="config.cfg (excerpt)"}
-[training.batch_size]
+[training.batcher.size]
 @schedules = "my_custom_schedule.v1"
 start = 2
 factor = 1.005

@@ -2806,7 +2806,7 @@
 "",
 "# see github repo for examples on sentence-transformers and Huggingface",
 "nlp = spacy.load('en_core_web_md')",
-"nlp.add_pipe(\"text_categorizer\", ",
+"nlp.add_pipe(\"classy_classification\", ",
 " config={",
 " \"data\": data,",
 " \"model\": \"spacy\"",

@@ -3010,8 +3010,8 @@
 "# Load the spaCy language model:",
 "nlp = spacy.load(\"en_core_web_sm\")",
 "",
-"# Add the \"text_categorizer\" pipeline component to the spaCy model, and configure it with SetFit parameters:",
-"nlp.add_pipe(\"text_categorizer\", config={",
+"# Add the \"spacy_setfit\" pipeline component to the spaCy model, and configure it with SetFit parameters:",
+"nlp.add_pipe(\"spacy_setfit\", config={",
 " \"pretrained_model_name_or_path\": \"paraphrase-MiniLM-L3-v2\",",
 " \"setfit_trainer_args\": {",
 " \"train_dataset\": train_dataset",