diff --git a/website/docs/api/large-language-models.mdx b/website/docs/api/large-language-models.mdx
index d1dda0b30..9b4997701 100644
--- a/website/docs/api/large-language-models.mdx
+++ b/website/docs/api/large-language-models.mdx
@@ -305,15 +305,14 @@ path = "summarization_examples.yml"
 
 ### EL (Entity Linking) {id="nel"}
 
-The EL links recognized entities (see [NER](#ner)) against a provided knowledge
-base (KB). The EL task prompts the LLM to choose a shortlist of (the most
-likely) candidates from the KB. How this KB looks like and stores data can be
-arbitrary.
+The EL links recognized entities (see [NER](#ner)) to those in a knowledge
+base (KB). The EL task prompts the LLM to select the most likely
+candidates from the KB, whose structure can be arbitrary.
 
-Note that the documents run through the entity linking task are expected to have
-recognized entities in `.ents`. This can be achieved by - prior to running the
-EL task - running the [NER task](#ner), a trained spaCy NER model or by setting
-your docs' entities manually.
+Note that the documents processed by the entity linking task are expected to have
+recognized entities in their `.ents` attribute. This can be achieved by either running the
+[NER task](#ner), using a trained spaCy NER model or setting the entities manually prior
+to running the EL task.
 
 In order to be able to pull data from the KB, an object implementing the
 `CandidateSelector` protocol has to be provided. This requires two functions:
@@ -325,7 +324,7 @@ ideally provide more context for entities stored in the KB.
 `spacy-llm` provides a `CandidateSelector` implementation
 (`spacy.CandidateSelector.v1`) that leverages a spaCy pipeline with an
 `entity_linking` component to select candidates. Note that this pipeline doesn't
-have to provide a trained EL model, merely its default (or custom) candidate
+have to provide a trained EL model but merely its default (or custom) candidate
 selection capabilities.
 
 #### spacy.EntityLinker.v1 {id="el-v1"}
@@ -352,7 +351,7 @@ Supports zero- and few-shot prompting.
 | `template` | Custom prompt template to send to LLM model. Defaults to [ner.v3.jinja](https://github.com/explosion/spacy-llm/blob/main/spacy_llm/tasks/templates/ner.v3.jinja). ~~str~~ |
 | `parse_responses` | Callable for parsing LLM responses for this task. Defaults to the internal parsing method for this task. ~~Optional[TaskResponseParser[EntityLinkerTask]]~~ |
 | `prompt_example_type` | Type to use for fewshot examples. Defaults to `ELExample`. ~~Optional[Type[FewshotExample]]~~ |
-| `examples` | Optional callable that reads a file containing task examples for few-shot learning. If None is passed, then zero-shot learning will be used. Defaults to `None`. ~~ExamplesConfigType~~ |
+| `examples` | Optional callable that reads a file containing task examples for few-shot learning. If `None` is passed, zero-shot learning will be used. Defaults to `None`. ~~ExamplesConfigType~~ |
 | `scorer` | Scorer function. Defaults to the metric used by spaCy to evaluate entity linking performance. ~~Optional[Scorer]~~ |
 
 ##### spacy.CandidateSelector.v1 {id="candidate-selector-v1"}
@@ -360,7 +359,7 @@ Supports zero- and few-shot prompting.
 `spacy.CandidateSelector.v1` is an implementation of the `CandidateSelector`
 protocol required by [`spacy.EntityLinker.v1`](#el-v1). The built-in candidate
 selector method leverages a spaCy pipeline with an entity linking component. The
-pipeline's EL component's candidate selection capabilities are used to select
+EL component's candidate selection capabilities are used to select
 the most likely entity candidates for the specified mentions.
 
 > ##### Example config
@@ -379,10 +378,10 @@ the most likely entity candidates for the specified mentions.
 | Argument | Description |
 | ------------------- | -------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------- |
 | `nlp_path` | Path to stored spaCy pipeline. ~~Union[Path, str]~~ |
-| `desc_path` | Path to .csv file with descriptions for entities. Has to have two columns with the first one being the entity ID, the second one being the description. The entity ID has to match with the entity ID in the stored knowledge base. ~~Union[Path, str]~~ |
-| `el_component_name` | Name of EL component in spaCy pipeline loaded from `nlp_path`. Defaults to `entity_linker`. ~~str~~ |
-| `top_n` | Top n candidates to include in prompt. Defaults to 5. ~~int~~ |
-| `ent_desc_reader` | Entity description reader. Defaults to an internal method expecting a CSV file without header row, with ";" as delimiters, and with two columns - one for the entitys' IDs, one for their descriptions. ~~Optional[Scorer]~~ |
+| `desc_path` | Path to `.csv` file with descriptions for entities. Must have two columns: entity ID and description. The entity ID has to match with the entity ID in the stored knowledge base. ~~Union[Path, str]~~ |
+| `el_component_name` | Name of the EL component in the pipeline loaded from `nlp_path`. Defaults to `entity_linker`. ~~str~~ |
+| `top_n` | Top-n candidates to include in the prompt. Defaults to 5. ~~int~~ |
+| `ent_desc_reader` | Entity description reader. Defaults to an internal method that expects a CSV file in the following format: No header row, ";" as delimiters, two columns - one for the entitys' IDs and one for their descriptions. ~~Optional[Scorer]~~ |
 
 ### NER {id="ner"}
 
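
Review note: the `desc_path` and `ent_desc_reader` rows describe a headerless, `;`-delimited CSV with two columns (entity ID, then description). To make that format concrete, here is a minimal sketch of a reader for such a file, using only Python's standard `csv` module. It is not `spacy-llm`'s internal reader: the function name `read_entity_descriptions`, its return type, and the sample rows are assumptions made for illustration.

```python
import csv
import io
from typing import Dict, TextIO


def read_entity_descriptions(handle: TextIO) -> Dict[str, str]:
    """Hypothetical reader (not part of spacy-llm): maps entity ID -> description.

    Assumes the format described in the table: no header row, ";" as the
    delimiter, and exactly two columns per row.
    """
    descriptions: Dict[str, str] = {}
    for row_number, row in enumerate(csv.reader(handle, delimiter=";"), start=1):
        if not row:
            # csv.reader yields an empty list for blank lines; skip them.
            continue
        if len(row) != 2:
            raise ValueError(
                f"Row {row_number}: expected 2 columns, got {len(row)}"
            )
        entity_id, description = row
        descriptions[entity_id] = description
    return descriptions


# Illustrative sample data only; a real file would be opened from `desc_path`.
sample = io.StringIO(
    "Q42;Douglas Adams, English writer\n"
    "Q64;Berlin, capital of Germany\n"
)
descriptions = read_entity_descriptions(sample)
```

Because the delimiter is `;`, commas inside a description need no quoting. The entity IDs in the first column would still have to match the IDs in the stored knowledge base, as the `desc_path` row states.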