From 5c44000d625f00450d6114929e473374b0981689 Mon Sep 17 00:00:00 2001
From: Raphael Mitsch
Date: Thu, 20 Jul 2023 12:49:22 +0200
Subject: [PATCH] Add section on Llama 2. Format.

---
 website/docs/api/large-language-models.mdx | 82 +++++++++++++++++-----
 1 file changed, 63 insertions(+), 19 deletions(-)

diff --git a/website/docs/api/large-language-models.mdx b/website/docs/api/large-language-models.mdx
index 907d992d4..cc8328790 100644
--- a/website/docs/api/large-language-models.mdx
+++ b/website/docs/api/large-language-models.mdx
@@ -10,10 +10,9 @@ menu:
 ---
 
 [The spacy-llm package](https://github.com/explosion/spacy-llm) integrates Large
-Language Models (LLMs) into spaCy, featuring a modular
-system for **fast prototyping** and **prompting**, and turning unstructured
-responses into **robust outputs** for various NLP tasks, **no training data**
-required.
+Language Models (LLMs) into spaCy, featuring a modular system for **fast
+prototyping** and **prompting**, and turning unstructured responses into
+**robust outputs** for various NLP tasks, **no training data** required.
 
 ## Config {id="config"}
 
@@ -57,8 +56,7 @@ want to disable this behavior.
 
 A _task_ defines an NLP problem or question, that will be sent to the LLM via a
 prompt. Further, the task defines how to parse the LLM's responses back into
-structured information. All tasks are registered in the `llm_tasks`
-registry.
+structured information. All tasks are registered in the `llm_tasks` registry.
 
 #### task.generate_prompts {id="task-generate-prompts"}
 
@@ -187,11 +185,11 @@ the following parameters:
   case variances in the LLM's output.
 - The `alignment_mode` argument is used to match entities as returned by the LLM
   to the tokens from the original `Doc` - specifically it's used as argument in
-  the call to [`doc.char_span()`](/api/doc#char_span). The
-  `"strict"` mode will only keep spans that strictly adhere to the given token
-  boundaries. `"contract"` will only keep those tokens that are fully within the
-  given range, e.g. reducing `"New Y"` to `"New"`. Finally, `"expand"` will
-  expand the span to the next token boundaries, e.g. expanding `"New Y"` out to
+  the call to [`doc.char_span()`](/api/doc#char_span). The `"strict"` mode will
+  only keep spans that strictly adhere to the given token boundaries.
+  `"contract"` will only keep those tokens that are fully within the given
+  range, e.g. reducing `"New Y"` to `"New"`. Finally, `"expand"` will expand the
+  span to the next token boundaries, e.g. expanding `"New Y"` out to
   `"New York"`.
 
 To perform [few-shot learning](/usage/large-langauge-models#few-shot-prompts),
@@ -277,11 +275,11 @@ the following parameters:
   case variances in the LLM's output.
 - The `alignment_mode` argument is used to match entities as returned by the LLM
   to the tokens from the original `Doc` - specifically it's used as argument in
-  the call to [`doc.char_span()`](/api/doc#char_span). The
-  `"strict"` mode will only keep spans that strictly adhere to the given token
-  boundaries. `"contract"` will only keep those tokens that are fully within the
-  given range, e.g. reducing `"New Y"` to `"New"`. Finally, `"expand"` will
-  expand the span to the next token boundaries, e.g. expanding `"New Y"` out to
+  the call to [`doc.char_span()`](/api/doc#char_span). The `"strict"` mode will
+  only keep spans that strictly adhere to the given token boundaries.
+  `"contract"` will only keep those tokens that are fully within the given
+  range, e.g. reducing `"New Y"` to `"New"`. Finally, `"expand"` will expand the
+  span to the next token boundaries, e.g. expanding `"New Y"` out to
   `"New York"`.
 
 To perform [few-shot learning](/usage/large-langauge-models#few-shot-prompts),
@@ -611,9 +609,9 @@ friends: friend
 ```
 
 If for any given text/doc instance the number of lemmas returned by the LLM
-doesn't match the number of tokens from the pipeline's tokenizer, no lemmas are stored in
-the corresponding doc's tokens. Otherwise the tokens `.lemma_` property is
-updated with the lemma suggested by the LLM.
+doesn't match the number of tokens from the pipeline's tokenizer, no lemmas are
+stored in the corresponding doc's tokens. Otherwise the tokens `.lemma_`
+property is updated with the lemma suggested by the LLM.
 
 To perform [few-shot learning](/usage/large-langauge-models#few-shot-prompts),
 you can write down a few examples in a separate file, and provide these to be
@@ -1188,6 +1186,52 @@ can
 [define the cached directory](https://huggingface.co/docs/huggingface_hub/main/en/guides/manage-cache)
 by setting the environmental variable `HF_HOME`.
 
+#### spacy.Llama2.v1 {id="llama2"}
+
+To use this model, ideally you have a GPU enabled and have installed
+`transformers`, `torch` and CUDA in your virtual environment. This allows you to
+have the setting `device=cuda:0` in your config, which ensures that the model is
+loaded entirely on the GPU (and fails otherwise).
+
+You can do so with
+
+```shell
+python -m pip install "spacy-llm[transformers]" "transformers[sentencepiece]"
+```
+
+If you don't have access to a GPU, you can install `accelerate` and set
+`device_map=auto` instead, but be aware that this may result in some layers
+getting distributed to the CPU or even the hard drive, which may ultimately
+result in extremely slow queries.
+
+```shell
+python -m pip install "accelerate>=0.16.0,<1.0"
+```
+
+Note that the chat model variants of Llama 2 are currently not supported. This
+is because they need a particular prompting setup and don't add any discernible
+benefits in the use case of `spacy-llm` (i.e. no interactive chat) compared to
+the completion model variants.
+
+> #### Example config
+>
+> ```ini
+> [components.llm.model]
+> @llm_models = "spacy.Llama2.v1"
+> name = "Llama-2-7b-hf"
+> ```
+
+| Argument      | Description                                                                                                                                             |
+| ------------- | ------------------------------------------------------------------------------------------------------------------------------------------------------ |
+| `name`        | The name of a Llama 2 model variant that is supported. Defaults to `"Llama-2-7b-hf"`. ~~Literal["Llama-2-7b-hf", "Llama-2-13b-hf", "Llama-2-70b-hf"]~~ |
+| `config_init` | Further configuration passed on to the construction of the model with `transformers.pipeline()`. Defaults to `{}`. ~~Dict[str, Any]~~                  |
+| `config_run`  | Further configuration used during model inference. Defaults to `{}`. ~~Dict[str, Any]~~                                                                |
+
+Note that Hugging Face will download this model the first time you use it - you
+can
+[define the cache directory](https://huggingface.co/docs/huggingface_hub/main/en/guides/manage-cache)
+by setting the environmental variable `HF_HOME`.
+
 #### spacy.Falcon.v1 {id="falcon"}
 
 To use this model, ideally you have a GPU enabled and have installed