Apply suggestions from code review

Co-authored-by: Sofie Van Landeghem <svlandeg@users.noreply.github.com>
Victoria 2023-07-19 15:16:01 +02:00 committed by GitHub
parent cb92ddfa55
commit 3977acba37
GPG Key ID: 4AEE18F83AFDEB23


@@ -12,7 +12,7 @@ menu:
 ---
 
 [The spacy-llm package](https://github.com/explosion/spacy-llm) integrates Large
-Language Models (LLMs) into [spaCy](https://spacy.io), featuring a modular
+Language Models (LLMs) into spaCy pipelines, featuring a modular
 system for **fast prototyping** and **prompting**, and turning unstructured
 responses into **robust outputs** for various NLP tasks, **no training data**
 required.
@@ -25,11 +25,10 @@ required.
 - Access to
   **[OpenAI API](https://platform.openai.com/docs/api-reference/introduction)**,
   including GPT-4 and various GPT-3 models
-- Built-in support for **open-source
-  [Dolly](https://huggingface.co/databricks)** models hosted on Hugging Face
-- Usage examples for **Named Entity Recognition** and **Text Classification**
+- Built-in support for various **open-source** models hosted on [Hugging Face](https://huggingface.co/)
+- Usage examples for standard NLP tasks such as **Named Entity Recognition** and **Text Classification**
 - Easy implementation of **your own functions** via
-  [spaCy's registry](https://spacy.io/api/top-level#registry) for custom
+  the [registry](https://spacy.io/api/top-level#registry) for custom
   prompting, parsing and model integrations
 
 ## Motivation {id="motivation"}
@@ -83,7 +82,7 @@ python -m pip install spacy-llm
 
 ## Usage {id="usage"}
 
 The task and the model have to be supplied to the `llm` pipeline component using
-[spaCy's config system](https://spacy.io/api/data-formats#config). This package
+the [config system](https://spacy.io/api/data-formats#config). This package
 provides various built-in functionality, as detailed in the [API](#-api)
 documentation.
@@ -174,7 +173,7 @@ to be `"databricks/dolly-v2-12b"` for better performance.
 
 ### Example 3: Create the component directly in Python {id="example-3"}
 
-The `llm` component behaves as any other spaCy component does, so adding it to
+The `llm` component behaves as any other component does, so adding it to
 an existing pipeline follows the same pattern:
 
 ```python
@@ -205,12 +204,12 @@ calling `nlp(doc)` with a single document.
 
 ### Example 4: Implement your own custom task {id="example-4"}
 
 To write a [`task`](#tasks), you need to implement two functions:
-`generate_prompts` that takes a list of spaCy [`Doc`](https://spacy.io/api/doc)
+`generate_prompts` that takes a list of [`Doc`](https://spacy.io/api/doc)
 objects and transforms them into a list of prompts, and `parse_responses` that
 transforms the LLM outputs into annotations on the
 [`Doc`](https://spacy.io/api/doc), e.g. entity spans, text categories and more.
-To register your custom task with spaCy, decorate a factory function using the
+To register your custom task, decorate a factory function using the
 `spacy_llm.registry.llm_tasks` decorator with a custom name that you can refer
 to in your config.
 
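To make the two-function contract in the hunk above concrete, here is a hypothetical, self-contained sketch of the shape such a task takes. Plain strings stand in for `Doc` objects so the example runs without spaCy installed, and the task itself (quote detection) is invented; a real task would be registered with the `spacy_llm.registry.llm_tasks` decorator and write its results onto `Doc` attributes:

```python
from typing import Iterable, List


class QuoteDetectionTask:
    """Hypothetical task: ask the LLM whether a text contains a quotation."""

    def generate_prompts(self, docs: Iterable[str]) -> List[str]:
        # Build one prompt per document.
        return [
            f"Reply YES or NO: does this text contain a quotation?\n{doc}"
            for doc in docs
        ]

    def parse_responses(
        self, docs: Iterable[str], responses: Iterable[str]
    ) -> List[str]:
        # Turn raw LLM replies into one structured annotation per document.
        return [
            "QUOTE" if response.strip().upper().startswith("YES") else "NO_QUOTE"
            for response in responses
        ]


task = QuoteDetectionTask()
prompts = task.generate_prompts(['She said "hello".'])
annotations = task.parse_responses(['She said "hello".'], ["YES"])
```

The component calls `generate_prompts` before querying the model and feeds the raw responses, in the same order, into `parse_responses`.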
@@ -326,7 +325,7 @@ An `llm` component is defined by two main settings:
 
 - A [**task**](#tasks), defining the prompt to send to the LLM as well as the
   functionality to parse the resulting response back into structured fields on
-  spaCy's [Doc](https://spacy.io/api/doc) objects.
+  the [Doc](https://spacy.io/api/doc) objects.
 - A [**model**](#models) defining the model to use and how to connect to it.
   Note that `spacy-llm` supports both access to external APIs (such as OpenAI)
   as well as access to self-hosted open-source LLMs (such as using Dolly through
@@ -339,7 +338,7 @@ through a REST API) more than once.
 Finally, you can choose to save a stringified version of LLM prompts/responses
 within the `Doc.user_data["llm_io"]` attribute by setting `save_io` to `True`.
 `Doc.user_data["llm_io"]` is a dictionary containing one entry for every LLM
-component within the spaCy pipeline. Each entry is itself a dictionary, with two
+component within the `nlp` pipeline. Each entry is itself a dictionary, with two
 keys: `prompt` and `response`.
 
 A note on `validate_types`: by default, `spacy-llm` checks whether the
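The `llm_io` structure described in the hunk above can be sketched in plain Python. The component name, prompt, and response below are invented values; only the two-key shape per component comes from the docs:

```python
# Hypothetical contents of Doc.user_data["llm_io"] after running one
# `llm` component with save_io = True: one entry per LLM component,
# each holding the stringified prompt and the raw response.
llm_io = {
    "llm": {
        "prompt": "What is the sentiment of: 'Great film!'",
        "response": "positive",
    }
}

prompt_sent = llm_io["llm"]["prompt"]
raw_response = llm_io["llm"]["response"]
```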
@@ -351,7 +350,7 @@ want to disable this behavior.
 
 A _task_ defines an NLP problem or question, that will be sent to the LLM via a
 prompt. Further, the task defines how to parse the LLM's responses back into
-structured information. All tasks are registered in spaCy's `llm_tasks`
+structured information. All tasks are registered in the `llm_tasks`
 registry.
 
 Practically speaking, a task should adhere to the `Protocol` `LLMTask` defined
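That protocol contract can be approximated with `typing.Protocol`. The sketch below is structural only: the real `LLMTask` in `spacy_llm` is typed over `Doc` objects and richer response types, which are simplified here to strings so the example is self-contained:

```python
from typing import Iterable, List, Protocol, runtime_checkable


@runtime_checkable
class LLMTaskSketch(Protocol):
    """Structural sketch of the task contract: prompts out, annotations in."""

    def generate_prompts(self, docs: Iterable[str]) -> Iterable[str]: ...

    def parse_responses(
        self, docs: Iterable[str], responses: Iterable[str]
    ) -> Iterable[str]: ...


class UppercaseTask:
    """Toy task that satisfies the protocol structurally."""

    def generate_prompts(self, docs: Iterable[str]) -> List[str]:
        return [f"Uppercase this: {doc}" for doc in docs]

    def parse_responses(
        self, docs: Iterable[str], responses: Iterable[str]
    ) -> List[str]:
        return [response.strip() for response in responses]


# runtime_checkable lets isinstance() verify the two methods exist.
assert isinstance(UppercaseTask(), LLMTaskSketch)
```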
@@ -389,7 +388,7 @@ method is defined, `spacy-llm` will call it to evaluate the component.
 
 All built-in tasks support few-shot prompts, i. e. including examples in a
 prompt. Examples can be supplied in two ways: (1) as a separate file containing
 only examples or (2) by initializing `llm` with a `get_examples()` callback
-(like any other spaCy pipeline component).
+(like any other pipeline component).
 
 ##### (1) Few-shot example file
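As an illustration of such a file, a few-shot example file for a built-in NER task could look roughly like this sketch. The texts, labels, and exact field names are placeholders; consult the `spacy-llm` task documentation for the format each task version expects:

```yaml
- text: "Jack and Jill went up the hill."
  entities:
    PERSON:
      - Jack
      - Jill
- text: "The company was founded in Berlin."
  LOCATION:
      - Berlin
```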