Add info on get_candidates(), get_candidates_batched().
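
The three callable shapes this commit documents can be sketched as follows. This is a minimal illustration, not spaCy's implementation: `Mention`, `Candidate`, `ToyKB` and the function names `get_candidates_v3` / `get_candidates_v4` are hypothetical stand-ins (plain strings and a dict) so that the snippet runs without spaCy installed, and the entity IDs are made up.

```python
from typing import Dict, Iterable, Iterator, List

# Hypothetical stand-ins so the sketch runs without spaCy installed.
Mention = str  # stands in for spacy.tokens.Span
Candidate = str  # stands in for a KB candidate object
ToyKB = Dict[Mention, List[Candidate]]  # stands in for KnowledgeBase


def get_candidates_v3(kb: ToyKB, mention: Mention) -> Iterable[Candidate]:
    """Pre-4.0 shape: one mention in, one flat iterable of candidates out."""
    return kb.get(mention, [])


def get_candidates_batched(
    kb: ToyKB, mentions: Iterable[Mention]
) -> Iterable[Iterable[Candidate]]:
    """3.5+ shape: many mentions in, one candidate list per mention out."""
    return [get_candidates_v3(kb, m) for m in mentions]


def get_candidates_v4(
    kb: ToyKB, mentions_per_doc: Iterator[List[Mention]]
) -> Iterator[Iterable[Iterable[Candidate]]]:
    """4.0 shape: mentions grouped per doc in, candidates grouped per doc out."""
    for doc_mentions in mentions_per_doc:
        yield get_candidates_batched(kb, doc_mentions)


# Made-up knowledge base and two "docs", each given as a list of mentions.
kb = {"Paris": ["paris_fr", "paris_tx"], "Texas": ["texas_us"]}
docs = [["Paris", "Texas"], ["Paris"]]

per_doc = list(get_candidates_v4(kb, iter(docs)))
print(per_doc)
```

The outer list has one entry per doc and each inner list has one entry per mention, which is the extra level of nesting that the 4.0 signature adds over `get_candidates_batched()`.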

2026-01-01 22:43:10 +03:00 · 2024-02-07 10:14:53 +01:00 · 2024-02-07 10:14:53 +01:00 · 5f87b6a915
commit 5f87b6a915
parent 8a2a7f1879
1 changed files with 29 additions and 15 deletions
--- a/website/docs/api/entitylinker.mdx
+++ b/website/docs/api/entitylinker.mdx
@@ -53,21 +53,35 @@ architectures and their arguments and hyperparameters.
 > nlp.add_pipe("entity_linker", config=config)
 > ```

-| Setting                                          | Description                                                                                                                                                                                                                                                                                 |
-| ------------------------------------------------ | ------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------- |
-| `labels_discard`                                 | NER labels that will automatically get an "NIL" prediction. Defaults to `[]`. ~~Iterable[str]~~                                                                                                                                                                                              |
-| `n_sents`                                        | The number of neighbouring sentences to take into account. Defaults to `0`. ~~int~~                                                                                                                                                                                                           |
-| `incl_prior`                                     | Whether prior probabilities from the KB are included in the model. Defaults to `True`. ~~bool~~                                                                                                                                                                                        |
-| `incl_context`                                   | Whether the local context is included in the model. Defaults to `True`. ~~bool~~                                                                                                                                                                                                      |
-| `model`                                          | The [`Model`](https://thinc.ai/docs/api-model) powering the pipeline component. Defaults to [`EntityLinker`](/api/architectures#EntityLinker). ~~Model~~                                                                                                                                      |
-| `entity_vector_length`                           | Size of encoding vectors in the KB. Defaults to `64`. ~~int~~                                                                                                                                                                                                                               |
-| `use_gold_ents`                                  | Whether entities are copied from the gold docs. Defaults to `True`. If `False`, entities must be set in the training data or by an annotating component in the pipeline. ~~int~~                                                                                                        |
-| `get_candidates`                                 | Function that retrieves plausible candidates per entity mention in a given `Iterator[SpanGroup]`. Defaults to [CandidateGenerator](/api/architectures#CandidateGenerator). ~~Callable[[KnowledgeBase, Iterator[SpanGroup]], Iterator[Iterable[Iterable[Candidate]]]]~~                      |
-| `generate_empty_kb` <Tag variant="new">3.6</Tag> | Function that generates an empty `KnowledgeBase` object. Defaults to [`spacy.EmptyKB.v2`](/api/architectures#EmptyKB), which generates an empty [`InMemoryLookupKB`](/api/inmemorylookupkb). ~~Callable[[Vocab, int], KnowledgeBase]~~                                                      |
-| `overwrite` <Tag variant="new">3.2</Tag>         | Whether existing annotation is overwritten. Defaults to `True`. ~~bool~~                                                                                                                                                                                                                    |
-| `scorer` <Tag variant="new">3.2</Tag>            | The scoring method. Defaults to [`Scorer.score_links`](/api/scorer#score_links). ~~Optional[Callable]~~                                                                                                                                                                                     |
-| `save_activations` <Tag variant="new">4.0</Tag>  | Save activations in `Doc` when annotating. Saved activations are `"ents"` and `"scores"`. ~~Union[bool, list[str]]~~                                                                                                                                                                        |
-| `threshold` <Tag variant="new">3.4</Tag>         | Confidence threshold for entity predictions. The default of `None` implies that all predictions are accepted, otherwise those with a score beneath the treshold are discarded. If there are no predictions with scores above the threshold, the linked entity is `NIL`. ~~Optional[float]~~ |
+| Setting                                          | Description                                                                                                                                                                                                                                                                                                                                    |
+| ------------------------------------------------ | ---------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------- |
+| `labels_discard`                                 | NER labels that will automatically get an "NIL" prediction. Defaults to `[]`. ~~Iterable[str]~~                                                                                                                                                                                                                                                |
+| `n_sents`                                        | The number of neighbouring sentences to take into account. Defaults to `0`. ~~int~~                                                                                                                                                                                                                                                            |
+| `incl_prior`                                     | Whether prior probabilities from the KB are included in the model. Defaults to `True`. ~~bool~~                                                                                                                                                                                                                                                |
+| `incl_context`                                   | Whether the local context is included in the model. Defaults to `True`. ~~bool~~                                                                                                                                                                                                                                                               |
+| `model`                                          | The [`Model`](https://thinc.ai/docs/api-model) powering the pipeline component. Defaults to [`EntityLinker`](/api/architectures#EntityLinker). ~~Model~~                                                                                                                                                                                       |
+| `entity_vector_length`                           | Size of encoding vectors in the KB. Defaults to `64`. ~~int~~                                                                                                                                                                                                                                                                                  |
+| `use_gold_ents`                                  | Whether entities are copied from the gold docs. Defaults to `True`. If `False`, entities must be set in the training data or by an annotating component in the pipeline. ~~int~~                                                                                                                                                               |
+| `get_candidates` <Tag variant="new">4.0</Tag>    | Function that retrieves plausible candidates per entity mention in a given `Iterator[SpanGroup]` (one `SpanGroup` includes all mentions found in a given `Doc` instance). Defaults to [CandidateGenerator](/api/architectures#CandidateGenerator). ~~Callable[[KnowledgeBase, Iterator[SpanGroup]], Iterator[Iterable[Iterable[Candidate]]]]~~ |
+| `generate_empty_kb` <Tag variant="new">3.6</Tag> | Function that generates an empty `KnowledgeBase` object. Defaults to [`spacy.EmptyKB.v2`](/api/architectures#EmptyKB), which generates an empty [`InMemoryLookupKB`](/api/inmemorylookupkb). ~~Callable[[Vocab, int], KnowledgeBase]~~                                                                                                         |
+| `overwrite` <Tag variant="new">3.2</Tag>         | Whether existing annotation is overwritten. Defaults to `True`. ~~bool~~                                                                                                                                                                                                                                                                       |
+| `scorer` <Tag variant="new">3.2</Tag>            | The scoring method. Defaults to [`Scorer.score_links`](/api/scorer#score_links). ~~Optional[Callable]~~                                                                                                                                                                                                                                        |
+| `save_activations` <Tag variant="new">4.0</Tag>  | Save activations in `Doc` when annotating. Saved activations are `"ents"` and `"scores"`. ~~Union[bool, list[str]]~~                                                                                                                                                                                                                           |
+| `threshold` <Tag variant="new">3.4</Tag>         | Confidence threshold for entity predictions. The default of `None` implies that all predictions are accepted; otherwise those with a score beneath the threshold are discarded. If there are no predictions with scores above the threshold, the linked entity is `NIL`. ~~Optional[float]~~                                                   |
+
+<Infobox variant="warning">
+
+Prior to spaCy v4.0, `get_candidates()` returned a single `Iterable` of
+candidates for one specific mention, i.e. the function was typed as
+`Callable[[KnowledgeBase, Span], Iterable[Candidate]]`. To retrieve candidates
+batch-wise, spaCy >= 3.5 exposes `get_candidates_batched()`, which identifies
+candidates for an arbitrary number of spans:
+`Callable[[KnowledgeBase, Iterable[Span]], Iterable[Iterable[Candidate]]]`. The
+main difference between `get_candidates_batched()` and `get_candidates()` in
+spaCy >= 4.0 is that the latter considers the grouping of the provided mention
+spans per `Doc` instance.
+
+</Infobox>

 ```python
 %%GITHUB_SPACY/spacy/pipeline/entity_linker.py