Raphael Mitsch 2024-02-01 11:17:49 +00:00 committed by GitHub
commit 795c24deeb
2 changed files with 52 additions and 96 deletions


@ -21,8 +21,8 @@ through a generic `llm`
[component factory](https://spacy.io/usage/processing-pipelines#custom-components-factories)
as well as through task-specific component factories: `llm_ner`, `llm_spancat`,
`llm_rel`, `llm_textcat`, `llm_sentiment`, `llm_summarization`,
`llm_entity_linker`, `llm_raw` and `llm_translation`. For these factories,
OpenAI's GPT-3.5 model is used by default, but this can be customized.
> #### Example
>
@ -31,7 +31,7 @@ GPT-3-5 model from OpenAI is used by default, but this can be customized.
> config = {"task": {"@llm_tasks": "spacy.NER.v3", "labels": ["PERSON", "ORGANISATION", "LOCATION"]}}
> llm = nlp.add_pipe("llm", config=config)
>
> # Construction via add_pipe with a task-specific factory and default GPT-3.5 model
> llm = nlp.add_pipe("llm_ner")
>
> # Construction via add_pipe with a task-specific factory and custom model
@ -1382,7 +1382,7 @@ provider's API.
| Argument    | Description                                                                                                                                          |
| ----------- | ---------------------------------------------------------------------------------------------------------------------------------------------------- |
| `name`      | Any supported model name for this particular model provider. ~~str~~                                                                                  |
| `config`    | Further configuration passed on to the model. Default depends on the specific model (cf. below). ~~Dict[Any, Any]~~                                   |
| `strict`    | If `True`, raises an error if the LLM API returns a malformed response. Otherwise, returns the error responses as-is. Defaults to `True`. ~~bool~~    |
| `max_tries` | Max. number of tries for API request. Defaults to `5`. ~~int~~                                                                                         |
@ -1394,50 +1394,21 @@ provider's API.
>
> ```ini
> [components.llm.model]
> @llm_models = "spacy.OpenAI.v1"
> name = "gpt-4"
> config = {"temperature": 0.0}
> ```
Currently, these model providers are supported as part of the core library (more
can be used by leveraging the LangChain integration):

| Model                | Provider          | Information on available models                                                                                                  | Default config      |
| -------------------- | ----------------- | -------------------------------------------------------------------------------------------------------------------------------- | ------------------- |
| `spacy.OpenAI.v1`    | OpenAI            | All completion/chat models listed [here](https://platform.openai.com/docs/models)                                                 | `{}`                |
| `spacy.Azure.v1`     | Microsoft, OpenAI | All completion/chat models listed [here](https://learn.microsoft.com/en-us/azure/ai-services/openai/concepts/models)              | `{temperature=0.0}` |
| `spacy.Cohere.v1`    | Cohere            | All models listed [here](https://docs.cohere.com/docs/models)                                                                      | `{}`                |
| `spacy.Anthropic.v1` | Anthropic         | All models listed [here](https://docs.anthropic.com/claude/reference/selecting-a-model)                                            | `{}`                |
| `spacy.Google.v1`    | Google            | All completion/chat models listed [here](https://cloud.google.com/vertex-ai/docs/generative-ai/language-model-overview#palm-api)   | `{temperature=0.0}` |
To use these models, make sure that you've [set the relevant API](#api-keys)
keys as environment variables.
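
For illustration, the same setup can also be assembled directly in Python. The
following is a minimal sketch, not a definitive recipe: the task labels, the
model name and the `max_tries` value are arbitrary choices, and `OPENAI_API_KEY`
must be exported for the request to succeed.

```python
# Minimal sketch: an NER pipeline backed by a REST model provider.
# Assumes spacy-llm is installed and OPENAI_API_KEY is set as an env variable.
import spacy

nlp = spacy.blank("en")
nlp.add_pipe(
    "llm_ner",
    config={
        "task": {"@llm_tasks": "spacy.NER.v3", "labels": ["PERSON", "LOCATION"]},
        "model": {
            "@llm_models": "spacy.OpenAI.v1",
            "name": "gpt-3.5-turbo",
            "config": {"temperature": 0.0},
            "max_tries": 3,  # cf. the argument table above
        },
    },
)
doc = nlp("Jack and Jill went up the hill in Kilkenny.")
print([(ent.text, ent.label_) for ent in doc.ents])
```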
@ -1506,20 +1477,21 @@ These models all take the same parameters:
>
> ```ini
> [components.llm.model]
> @llm_models = "spacy.HuggingFace.v1"
> name = "Llama-2-7b-hf"
> ```
Currently, these models are provided as part of the core library (more models
can be accessed through the `langchain` integration):

| Model family | Author          | Names of available models                                                                                     | HF directory                           |
| ------------ | --------------- | --------------------------------------------------------------------------------------------------------------- | -------------------------------------- |
| Dolly        | Databricks      | `["dolly-v2-3b", "dolly-v2-7b", "dolly-v2-12b"]`                                                                 | https://huggingface.co/databricks      |
| Falcon       | TII             | `["falcon-rw-1b", "falcon-7b", "falcon-7b-instruct", "falcon-40b-instruct"]`                                     | https://huggingface.co/tiiuae          |
| Llama 2      | Meta AI         | `["Llama-2-7b-hf", "Llama-2-13b-hf", "Llama-2-70b-hf"]`                                                          | https://huggingface.co/meta-llama      |
| Mistral      | Mistral AI      | `["Mistral-7B-v0.1", "Mistral-7B-Instruct-v0.1"]`                                                                | https://huggingface.co/mistralai       |
| Stable LM    | Stability AI    | `["stablelm-base-alpha-3b", "stablelm-base-alpha-7b", "stablelm-tuned-alpha-3b", "stablelm-tuned-alpha-7b"]`     | https://huggingface.co/stabilityai     |
| OpenLLaMa    | OpenLM Research | `["open_llama_3b", "open_llama_7b", "open_llama_7b_v2", "open_llama_13b"]`                                       | https://huggingface.co/openlm-research |
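
As a sketch of running one of these locally, the snippet below assumes
`spacy-llm` is installed with the `transformers` extra, that the chosen
checkpoint fits into local memory, and that device placement via `config_init`
works for this model family:

```python
# Minimal sketch: a pipeline backed by a locally hosted Hugging Face model.
# Assumes `pip install "spacy-llm[transformers]"`; the model weights are
# downloaded from the Hugging Face Hub on first use.
import spacy

nlp = spacy.blank("en")
nlp.add_pipe(
    "llm_ner",
    config={
        "task": {"@llm_tasks": "spacy.NER.v3", "labels": ["PERSON", "LOCATION"]},
        "model": {
            "@llm_models": "spacy.HuggingFace.v1",
            "name": "dolly-v2-3b",
            # config_init is passed on to the underlying HF model's initialization.
            "config_init": {"device": "cpu"},
        },
    },
)
```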
<Infobox variant="warning" title="Gated models on Hugging Face" id="hf_licensing">
@ -1569,7 +1541,7 @@ To use [LangChain](https://github.com/hwchase17/langchain) for the API retrieval
part, make sure you have installed it first:

```shell
python -m pip install "langchain>=0.1,<0.2"
# Or install with spacy-llm directly
python -m pip install "spacy-llm[extras]"
```
@ -1579,9 +1551,12 @@ Note that LangChain currently only supports Python 3.9 and beyond.
LangChain models in `spacy-llm` work slightly differently. `langchain`'s models
are parsed automatically; each LLM class in `langchain` has one entry in
`spacy-llm`'s registry. As `langchain`'s design has one class per API and not
per model, this results in registry entries like `langchain.OpenAIChat.v1` -
i. e. there is one registry entry per API and not per model (family), as for
the REST- and HuggingFace-based entries. LangChain provides access to many more
models than `spacy-llm` does natively, so if your model or provider of choice
isn't available directly, you can leverage the `langchain` integration by
specifying your model with `langchain.YourModel.v1`.
The name of the model to be used has to be passed in via the `name` attribute.
@ -1589,7 +1564,7 @@ The name of the model to be used has to be passed in via the `name` attribute.
>
> ```ini
> [components.llm.model]
> @llm_models = "langchain.OpenAIChat.v1"
> name = "gpt-3.5-turbo"
> query = {"@llm_queries": "spacy.CallLangChain.v1"}
> config = {"temperature": 0.0}


@ -107,7 +107,8 @@ factory = "llm"
labels = ["COMPLIMENT", "INSULT"]

[components.llm.model]
@llm_models = "spacy.OpenAI.v1"
name = "gpt-3.5-turbo"
config = {"temperature": 0.0}
```
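
One way to try this config end to end is to assemble the pipeline with
`spacy_llm.util.assemble` - a sketch, assuming the config is saved as
`config.cfg` and the relevant API key is exported:

```python
# Sketch: assemble the pipeline from the config above and classify a text.
# Assumes the config is saved as config.cfg and OPENAI_API_KEY is set.
from spacy_llm.util import assemble

nlp = assemble("config.cfg")
doc = nlp("You look great today! Love your hat.")
print(doc.cats)  # e.g. {"COMPLIMENT": 1.0, "INSULT": 0.0}
```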
@ -146,7 +147,7 @@ factory = "llm"
labels = ["PERSON", "ORGANISATION", "LOCATION"]

[components.llm.model]
@llm_models = "spacy.HuggingFace.v1"
# For better performance, use dolly-v2-12b instead
name = "dolly-v2-3b"
```
@ -457,12 +458,13 @@ models by specifying one of the models registered with the `langchain.` prefix.
_Why LangChain if there are also native REST and HuggingFace interfaces? When should I use what?_
Third-party libraries like `langchain` focus on prompt management, integration
of many different LLM APIs and models, and other related features such as
conversational memory or agents. `spacy-llm` on the other hand emphasizes
features we consider useful in the context of NLP pipelines that utilize LLMs
for (1) extractive NLP and (2) processing documents (mostly) independently of
each other. It makes sense that the feature sets of such third-party libraries
and `spacy-llm` aren't identical - and users might want to take advantage of
features not available in `spacy-llm`.
The advantage of implementing our own REST and HuggingFace integrations is that
we can ensure a larger degree of stability and robustness, as we can guarantee
@ -482,35 +484,14 @@ provider's documentation.
</Infobox>

| Model                                                            | Description                                         |
| ---------------------------------------------------------------- | --------------------------------------------------- |
| [`spacy.OpenAI.v1`](/api/large-language-models#models-rest)      | OpenAI's chat and completion models.                 |
| [`spacy.Azure.v1`](/api/large-language-models#models-rest)       | Azure's OpenAI models.                               |
| [`spacy.Cohere.v1`](/api/large-language-models#models-rest)      | Cohere's text models.                                |
| [`spacy.Anthropic.v1`](/api/large-language-models#models-rest)   | Anthropic's text models.                             |
| [`spacy.Google.v1`](/api/large-language-models#models-rest)      | Google's text models (e. g. PaLM).                   |
| [`spacy.HuggingFace.v1`](/api/large-language-models#models-hf)   | A selection of LLMs available through HuggingFace.   |
| [LangChain models](/api/large-language-models#langchain-models)  | All models available through `langchain`.            |
Note that the chat model variants of Llama 2 are currently not supported. This
is because they need a particular prompting setup and don't add any discernible