Apply suggestions from code review

Co-authored-by: Madeesh Kannan <shadeMe@users.noreply.github.com>
Sofie Van Landeghem 2023-11-09 11:35:10 +01:00 committed by GitHub
parent 30998a000f
commit 8569b27663
No known key found for this signature in database
GPG Key ID: 4AEE18F83AFDEB23


@@ -305,15 +305,14 @@ path = "summarization_examples.yml"
 ### EL (Entity Linking) {id="nel"}
-The EL links recognized entities (see [NER](#ner)) against a provided knowledge
-base (KB). The EL task prompts the LLM to choose a shortlist of (the most
-likely) candidates from the KB. How this KB looks like and stores data can be
-arbitrary.
+The EL links recognized entities (see [NER](#ner)) to those in a knowledge
+base (KB). The EL task prompts the LLM to select the most likely
+candidates from the KB, whose structure can be arbitrary.
-Note that the documents run through the entity linking task are expected to have
-recognized entities in `.ents`. This can be achieved by - prior to running the
-EL task - running the [NER task](#ner), a trained spaCy NER model or by setting
-your docs' entities manually.
+Note that the documents processed by the entity linking task are expected to have
+recognized entities in their `.ents` attribute. This can be achieved by either running the
+[NER task](#ner), using a trained spaCy NER model or setting the entities manually prior
+to running the EL task.
 In order to be able to pull data from the KB, an object implementing the
 `CandidateSelector` protocol has to be provided. This requires two functions:
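The protocol's two functions are not shown in this diff, but the surrounding text describes their roles: selecting the most likely KB candidates for a mention, and providing more context (descriptions) for entities stored in the KB. The following is a minimal, library-free sketch of those two roles under stated assumptions. `ToyKB`, its method names `candidates` and `description`, the ranking by a single prior score, and all IDs and data are hypothetical; this is not the `spacy-llm` `CandidateSelector` protocol, and `spacy.CandidateSelector.v1` instead relies on the candidate selection of a spaCy `entity_linking` component.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Tuple


@dataclass
class ToyKB:
    """Hypothetical in-memory KB; NOT spacy-llm's CandidateSelector protocol."""

    # Hypothetical alias table: mention text -> [(entity ID, prior score), ...]
    aliases: Dict[str, List[Tuple[str, float]]] = field(default_factory=dict)
    # Hypothetical description table: entity ID -> text shown to the LLM
    descriptions: Dict[str, str] = field(default_factory=dict)

    def candidates(self, mention: str, top_n: int = 5) -> List[str]:
        """Role 1: shortlist up to `top_n` entity IDs, highest prior first."""
        ranked = sorted(self.aliases.get(mention, []), key=lambda c: c[1], reverse=True)
        return [ent_id for ent_id, _ in ranked[:top_n]]

    def description(self, ent_id: str) -> str:
        """Role 2: extra context for an entity; empty string if unknown."""
        return self.descriptions.get(ent_id, "")


# Made-up example data (IDs and scores are illustrative only).
kb = ToyKB(
    aliases={"Paris": [("paris_myth", 0.1), ("paris_city", 0.9)]},
    descriptions={
        "paris_city": "Capital city of France",
        "paris_myth": "Trojan prince in Greek mythology",
    },
)

# List the shortlist roughly the way candidates might appear in a prompt.
for ent_id in kb.candidates("Paris", top_n=2):
    print(f"{ent_id}: {kb.description(ent_id)}")
```

Truncating to `top_n` plays the same role as the `top_n` argument of `spacy.CandidateSelector.v1` (how many candidates go into the prompt); exact-match alias lookup and a single prior score are simplifying assumptions of this sketch.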
@@ -325,7 +324,7 @@ ideally provide more context for entities stored in the KB.
 `spacy-llm` provides a `CandidateSelector` implementation
 (`spacy.CandidateSelector.v1`) that leverages a spaCy pipeline with an
 `entity_linking` component to select candidates. Note that this pipeline doesn't
-have to provide a trained EL model, merely its default (or custom) candidate
+have to provide a trained EL model but merely its default (or custom) candidate
 selection capabilities.
 #### spacy.EntityLinker.v1 {id="el-v1"}
@@ -352,7 +351,7 @@ Supports zero- and few-shot prompting.
 | `template` | Custom prompt template to send to LLM model. Defaults to [ner.v3.jinja](https://github.com/explosion/spacy-llm/blob/main/spacy_llm/tasks/templates/ner.v3.jinja). ~~str~~ |
 | `parse_responses` | Callable for parsing LLM responses for this task. Defaults to the internal parsing method for this task. ~~Optional[TaskResponseParser[EntityLinkerTask]]~~ |
 | `prompt_example_type` | Type to use for fewshot examples. Defaults to `ELExample`. ~~Optional[Type[FewshotExample]]~~ |
-| `examples` | Optional callable that reads a file containing task examples for few-shot learning. If None is passed, then zero-shot learning will be used. Defaults to `None`. ~~ExamplesConfigType~~ |
+| `examples` | Optional callable that reads a file containing task examples for few-shot learning. If `None` is passed, zero-shot learning will be used. Defaults to `None`. ~~ExamplesConfigType~~ |
 | `scorer` | Scorer function. Defaults to the metric used by spaCy to evaluate entity linking performance. ~~Optional[Scorer]~~ |
 ##### spacy.CandidateSelector.v1 {id="candidate-selector-v1"}
@@ -360,7 +359,7 @@ Supports zero- and few-shot prompting.
 `spacy.CandidateSelector.v1` is an implementation of the `CandidateSelector`
 protocol required by [`spacy.EntityLinker.v1`](#el-v1). The built-in candidate
 selector method leverages a spaCy pipeline with an entity linking component. The
-pipeline's EL component's candidate selection capabilities are used to select
+EL component's candidate selection capabilities are used to select
 the most likely entity candidates for the specified mentions.
 > ##### Example config
@@ -379,10 +378,10 @@ the most likely entity candidates for the specified mentions.
 | Argument | Description |
 | ------------------- | -------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------- |
 | `nlp_path` | Path to stored spaCy pipeline. ~~Union[Path, str]~~ |
-| `desc_path` | Path to .csv file with descriptions for entities. Has to have two columns with the first one being the entity ID, the second one being the description. The entity ID has to match with the entity ID in the stored knowledge base. ~~Union[Path, str]~~ |
-| `el_component_name` | Name of EL component in spaCy pipeline loaded from `nlp_path`. Defaults to `entity_linker`. ~~str~~ |
-| `top_n` | Top n candidates to include in prompt. Defaults to 5. ~~int~~ |
-| `ent_desc_reader` | Entity description reader. Defaults to an internal method expecting a CSV file without header row, with ";" as delimiters, and with two columns - one for the entitys' IDs, one for their descriptions. ~~Optional[Scorer]~~ |
+| `desc_path` | Path to `.csv` file with descriptions for entities. Must have two columns: entity ID and description. The entity ID has to match with the entity ID in the stored knowledge base. ~~Union[Path, str]~~ |
+| `el_component_name` | Name of the EL component in the pipeline loaded from `nlp_path`. Defaults to `entity_linker`. ~~str~~ |
+| `top_n` | Top-n candidates to include in the prompt. Defaults to 5. ~~int~~ |
+| `ent_desc_reader` | Entity description reader. Defaults to an internal method that expects a CSV file in the following format: No header row, ";" as delimiters, two columns - one for the entitys' IDs and one for their descriptions. ~~Optional[Scorer]~~ |
 ### NER {id="ner"}
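The `ent_desc_reader` row in the table above describes the default input format: a CSV file without a header row, with `;` as the delimiter and two columns (entity ID, description). A minimal sketch of a reader for that format follows. `read_entity_descriptions` is a hypothetical name, not the internal `spacy-llm` function, and its handling of blank lines, quoting and malformed rows is an assumption of this sketch.

```python
import csv
from pathlib import Path
from typing import Dict, Union


def read_entity_descriptions(path: Union[Path, str]) -> Dict[str, str]:
    """Hypothetical reader: no header row, ';' delimiter,
    column 1 = entity ID, column 2 = description."""
    descriptions: Dict[str, str] = {}
    with open(path, newline="", encoding="utf-8") as f:
        for row in csv.reader(f, delimiter=";"):
            if not row:
                continue  # csv.reader yields [] for blank lines; skip them
            if len(row) != 2:
                raise ValueError(f"Expected 2 columns, got {len(row)}: {row!r}")
            ent_id, desc = row
            descriptions[ent_id] = desc
    return descriptions
```

Because this sketch relies on Python's `csv` module, a description that itself contains `;` must be quoted (for example `Q2;"a; b"`); whether the built-in reader accepts quoted fields is not stated here.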