Add section on Llama 2. Format.

Raphael Mitsch 2023-07-20 12:49:22 +02:00
parent b8a6a25953
commit 5c44000d62


@@ -10,10 +10,9 @@ menu:
---

[The spacy-llm package](https://github.com/explosion/spacy-llm) integrates Large
Language Models (LLMs) into spaCy, featuring a modular system for **fast
prototyping** and **prompting**, and turning unstructured responses into
**robust outputs** for various NLP tasks, **no training data** required.

## Config {id="config"}

@@ -57,8 +56,7 @@ want to disable this behavior.

A _task_ defines an NLP problem or question that will be sent to the LLM via a
prompt. Further, the task defines how to parse the LLM's responses back into
structured information. All tasks are registered in the `llm_tasks` registry.
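
For illustration, here is a minimal sketch of registering a custom task in that
registry. The task class, factory name and stored attribute are hypothetical;
the two methods mirror `task.generate_prompts` and `task.parse_responses`
described below.

```python
from typing import Iterable

from spacy.tokens import Doc
from spacy_llm.registry import registry


class MyTask:
    """Hypothetical task: asks the LLM for a one-word summary per doc."""

    def generate_prompts(self, docs: Iterable[Doc]) -> Iterable[str]:
        for doc in docs:
            yield f"Summarize in one word: {doc.text}"

    def parse_responses(
        self, docs: Iterable[Doc], responses: Iterable[str]
    ) -> Iterable[Doc]:
        for doc, response in zip(docs, responses):
            # Store the raw LLM answer on the doc.
            doc.user_data["summary_word"] = response.strip()
            yield doc


@registry.llm_tasks("my_namespace.MyTask.v1")
def make_my_task() -> MyTask:
    return MyTask()
```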

#### task.generate_prompts {id="task-generate-prompts"}

@@ -187,11 +185,11 @@ the following parameters:

  case variances in the LLM's output.
- The `alignment_mode` argument is used to match entities as returned by the LLM
  to the tokens from the original `Doc` - specifically it's used as an argument
  in the call to [`doc.char_span()`](/api/doc#char_span). The `"strict"` mode
  will only keep spans that strictly adhere to the given token boundaries.
  `"contract"` will only keep those tokens that are fully within the given
  range, e.g. reducing `"New Y"` to `"New"`. Finally, `"expand"` will expand the
  span to the next token boundaries, e.g. expanding `"New Y"` out to
  `"New York"`, as illustrated in the sketch below.

To perform [few-shot learning](/usage/large-language-models#few-shot-prompts),

@@ -277,11 +275,11 @@ the following parameters:

  case variances in the LLM's output.
- The `alignment_mode` argument is used to match entities as returned by the LLM
  to the tokens from the original `Doc` - specifically it's used as an argument
  in the call to [`doc.char_span()`](/api/doc#char_span). The `"strict"` mode
  will only keep spans that strictly adhere to the given token boundaries.
  `"contract"` will only keep those tokens that are fully within the given
  range, e.g. reducing `"New Y"` to `"New"`. Finally, `"expand"` will expand the
  span to the next token boundaries, e.g. expanding `"New Y"` out to
  `"New York"`.

To perform [few-shot learning](/usage/large-language-models#few-shot-prompts),

@@ -611,9 +609,9 @@ friends: friend

```

If for any given text/doc instance the number of lemmas returned by the LLM
doesn't match the number of tokens from the pipeline's tokenizer, no lemmas are
stored in the corresponding doc's tokens. Otherwise each token's `.lemma_`
property is updated with the lemma suggested by the LLM.
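
A hypothetical sketch of that count check (not the actual `spacy-llm` parsing
code, just the behavior it describes):

```python
from typing import List

from spacy.tokens import Doc


def apply_lemmas(doc: Doc, lemmas: List[str]) -> Doc:
    if len(lemmas) != len(doc):
        # Count mismatch: leave all .lemma_ values untouched.
        return doc
    for token, lemma in zip(doc, lemmas):
        token.lemma_ = lemma
    return doc
```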

To perform [few-shot learning](/usage/large-language-models#few-shot-prompts),
you can write down a few examples in a separate file, and provide these to be

@@ -1188,6 +1186,52 @@ can

[define the cache directory](https://huggingface.co/docs/huggingface_hub/main/en/guides/manage-cache)
by setting the environment variable `HF_HOME`.

#### spacy.Llama2.v1 {id="llama2"}

To use this model, ideally you have a GPU enabled and have installed
`transformers`, `torch` and CUDA in your virtual environment. This allows you to
have the setting `device=cuda:0` in your config, which ensures that the model is
loaded entirely on the GPU (and fails otherwise).
You can do so with
```shell
python -m pip install "spacy-llm[transformers]" "transformers[sentencepiece]"
```
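
One way to end up with that setting - a sketch, assuming entries in
`config_init` are forwarded to `transformers.pipeline()` as per the argument
table below - is:

```ini
[components.llm.model]
@llm_models = "spacy.Llama2.v1"
name = "Llama-2-7b-hf"
config_init = {"device": "cuda:0"}
```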

If you don't have access to a GPU, you can install `accelerate` and set
`device_map=auto` instead, but be aware that this may result in some layers
getting distributed to the CPU or even the hard drive, which may ultimately
result in extremely slow queries.

```shell
python -m pip install "accelerate>=0.16.0,<1.0"
```
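
As a sketch under the same assumption (`config_init` forwarded to
`transformers.pipeline()`), the corresponding setting would be:

```ini
[components.llm.model]
@llm_models = "spacy.Llama2.v1"
name = "Llama-2-7b-hf"
config_init = {"device_map": "auto"}
```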

Note that the chat model variants of Llama 2 are currently not supported. This
is because they need a particular prompting setup and don't add any discernible
benefits in the use case of `spacy-llm` (i.e. no interactive chat) compared to
the completion model variants.

> #### Example config
>
> ```ini
> [components.llm.model]
> @llm_models = "spacy.Llama2.v1"
> name = "Llama-2-7b-hf"
> ```

| Argument | Description |
| ------------- | ------------------------------------------------------------------------------------------------------------------------------------------------------ |
| `name` | The name of a Llama 2 model variant that is supported. Defaults to `"Llama-2-7b-hf"`. ~~Literal["Llama-2-7b-hf", "Llama-2-13b-hf", "Llama-2-70b-hf"]~~ |
| `config_init` | Further configuration passed on to the construction of the model with `transformers.pipeline()`. Defaults to `{}`. ~~Dict[str, Any]~~ |
| `config_run` | Further configuration used during model inference. Defaults to `{}`. ~~Dict[str, Any]~~ |

Note that Hugging Face will download this model the first time you use it - you
can
[define the cache directory](https://huggingface.co/docs/huggingface_hub/main/en/guides/manage-cache)
by setting the environment variable `HF_HOME`.
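
As a minimal usage sketch - the config path, input text and task are
placeholders, assuming a config like the example above plus e.g. an NER task:

```python
from spacy_llm.util import assemble

# Assemble the pipeline from a config containing spacy.Llama2.v1.
nlp = assemble("config.cfg")
doc = nlp("Jack and Jill visited France.")
print([(ent.text, ent.label_) for ent in doc.ents])
```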

#### spacy.Falcon.v1 {id="falcon"}

To use this model, ideally you have a GPU enabled and have installed