update docs

This commit is contained in:
kadarakos 2023-02-20 17:03:41 +00:00
parent 86d3e78c64
commit 6e5e77ea79

View File

@ -13,11 +13,11 @@ A span categorizer consists of two parts: a [suggester function](#suggesters)
that proposes candidate spans, which may or may not overlap, and a labeler model that proposes candidate spans, which may or may not overlap, and a labeler model
that predicts zero or more labels for each candidate. that predicts zero or more labels for each candidate.
This component comes in two forms: `spancat` and `spancat_exclusive`. When you This component comes in two forms: `spancat` and `spancat_singlelabel`. When you
need to perform multi-label classification on your spans, use `spancat`. The need to perform multi-label classification on your spans, use `spancat`. The
`spancat` component uses a `Logistic` layer where the output class probabilities `spancat` component uses a `Logistic` layer where the output class probabilities
are independent for each class. However, if you need to predict at most one true are independent for each class. However, if you need to predict at most one true
class for a span, then use `spancat_exclusive`. It uses a `Softmax` layer and class for a span, then use `spancat_singlelabel`. It uses a `Softmax` layer and
treats the entities as a multi-class problem. treats the entities as a multi-class problem.
Predicted spans will be saved in a [`SpanGroup`](/api/spangroup) on the doc. Predicted spans will be saved in a [`SpanGroup`](/api/spangroup) on the doc.
@ -59,17 +59,17 @@ architectures and their arguments and hyperparameters.
> nlp.add_pipe("spancat", config=config) > nlp.add_pipe("spancat", config=config)
> ``` > ```
> #### Example (spancat_exclusive) > #### Example (spancat_singlelabel)
> >
> ```python > ```python
> from spacy.pipeline.spancat_exclusive import DEFAULT_EXCL_SPANCAT_MODEL > from spacy.pipeline.spancat import DEFAULT_SPANCAT_SINGLELABEL_MODEL
> config = { > config = {
> "threshold": 0.5, > "threshold": 0.5,
> "spans_key": "labeled_spans", > "spans_key": "labeled_spans",
> "model": DEFAULT_EXCL_SPANCAT_MODEL, > "model": DEFAULT_EXCL_SPANCAT_MODEL,
> "suggester": {"@misc": "spacy.ngram_suggester.v1", "sizes": [1, 2, 3]}, > "suggester": {"@misc": "spacy.ngram_suggester.v1", "sizes": [1, 2, 3]},
> # Additional spancat_exclusive parameters > # Additional spancat_exclusive parameters
> "negative_weight": 1.0, > "negative_weight": 0.8,
> "allow_overlap": True, > "allow_overlap": True,
> } > }
> nlp.add_pipe("spancat_exclusive", config=config) > nlp.add_pipe("spancat_exclusive", config=config)
@ -80,11 +80,12 @@ architectures and their arguments and hyperparameters.
| `suggester` | A function that [suggests spans](#suggesters). Spans are returned as a ragged array with two integer columns, for the start and end positions. Defaults to [`ngram_suggester`](#ngram_suggester). ~~Callable[[Iterable[Doc], Optional[Ops]], Ragged]~~ | | `suggester` | A function that [suggests spans](#suggesters). Spans are returned as a ragged array with two integer columns, for the start and end positions. Defaults to [`ngram_suggester`](#ngram_suggester). ~~Callable[[Iterable[Doc], Optional[Ops]], Ragged]~~ |
| `model` | A model instance that is given a a list of documents and `(start, end)` indices representing candidate span offsets. The model predicts a probability for each category for each span. Defaults to [SpanCategorizer](/api/architectures#SpanCategorizer). ~~Model[Tuple[List[Doc], Ragged], Floats2d]~~ | | `model` | A model instance that is given a a list of documents and `(start, end)` indices representing candidate span offsets. The model predicts a probability for each category for each span. Defaults to [SpanCategorizer](/api/architectures#SpanCategorizer). ~~Model[Tuple[List[Doc], Ragged], Floats2d]~~ |
| `spans_key` | Key of the [`Doc.spans`](/api/doc#spans) dict to save the spans under. During initialization and training, the component will look for spans on the reference document under the same key. Defaults to `"sc"`. ~~str~~ | | `spans_key` | Key of the [`Doc.spans`](/api/doc#spans) dict to save the spans under. During initialization and training, the component will look for spans on the reference document under the same key. Defaults to `"sc"`. ~~str~~ |
| `threshold` | Minimum probability to consider a prediction positive. Spans with a positive prediction will be saved on the Doc. Defaults to `0.5`. ~~float~~ | | `threshold` | Minimum probability to consider a prediction positive. Spans with a positive prediction will be saved on the Doc. Meant to be used in combination with the multi-class `spancat` component with a `Logistic` scoring layer. Defaults to `0.5`. ~~float~~ |
| `max_positive` | Maximum number of labels to consider positive per span. Defaults to `None`, indicating no limit. It is only available for the `spancat` component. ~~Optional[int]~~ | | `max_positive` | Maximum number of labels to consider positive per span. Defaults to `None`, indicating no limit. Meant to be used together with the `spancat` component and defaults to 0 with `spancat_singlelabel`. ~~Optional[int]~~ |
| `scorer` | The scoring method. Defaults to [`Scorer.score_spans`](/api/scorer#score_spans) for `Doc.spans[spans_key]` with overlapping spans allowed. ~~Optional[Callable]~~ | | `scorer` | The scoring method. Defaults to [`Scorer.score_spans`](/api/scorer#score_spans) for `Doc.spans[spans_key]` with overlapping spans allowed. ~~Optional[Callable]~~ |
| `negative_weight` | Multiplier for the loss terms. It can be used to downweight the negative samples if there are too many. It is only available for the `spancat_exclusive` component. Defaults to `1.0`. ~~float~~ | | `add_negative_label` | Whether to learn to predict a special negative label for each unannotated `Span`. This should be `True` when using a `Softmax` classifier layer and so its `True` by default for `spancat_singlelabel`. Spans with negative labels and their scores are not stored as annotations. ~~bool~~ |
| `allow_overlap` | If `True`, the data is assumed to contain overlapping spans. It is only available for the `spancat_exclusive` component. Defaults to `True`. ~~bool~~ | | `negative_weight` | Multiplier for the loss terms. It can be used to downweight the negative samples if there are too many. It is only used when `add_negative_label` is `True`. Defaults to `1.0`. ~~float~~ |
| `allow_overlap` | If `True`, the data is assumed to contain overlapping spans. It is only available when `max_positive` is exactly 1. Defaults to `True`. ~~bool~~ |
```python ```python
%%GITHUB_SPACY/spacy/pipeline/spancat.py %%GITHUB_SPACY/spacy/pipeline/spancat.py
@ -122,6 +123,10 @@ shortcut for this and instantiate the component using its string name and
| `spans_key` | Key of the [`Doc.spans`](/api/doc#sans) dict to save the spans under. During initialization and training, the component will look for spans on the reference document under the same key. Defaults to `"sc"`. ~~str~~ | | `spans_key` | Key of the [`Doc.spans`](/api/doc#sans) dict to save the spans under. During initialization and training, the component will look for spans on the reference document under the same key. Defaults to `"sc"`. ~~str~~ |
| `threshold` | Minimum probability to consider a prediction positive. Spans with a positive prediction will be saved on the Doc. Defaults to `0.5`. ~~float~~ | | `threshold` | Minimum probability to consider a prediction positive. Spans with a positive prediction will be saved on the Doc. Defaults to `0.5`. ~~float~~ |
| `max_positive` | Maximum number of labels to consider positive per span. Defaults to `None`, indicating no limit. ~~Optional[int]~~ | | `max_positive` | Maximum number of labels to consider positive per span. Defaults to `None`, indicating no limit. ~~Optional[int]~~ |
| `allow_overlap` | If `True`, the data is assumed to contain overlapping spans. It is only available when `max_positive` is exactly 1. Defaults to `True`. ~~bool~~ |
| `add_negative_label` | Whether to learn to predict a special negative label for each unannotated `Span`. This should be `True` when using a `Softmax` classifier layer and so its `True` by default for `spancat_singlelabel`. Spans with negative labels and their scores are not stored as annotations. ~~bool~~ |
| `negative_weight` | Multiplier for the loss terms. It can be used to downweight the negative samples if there are too many. It is only used when `add_negative_label` is `True`. Defaults to `1.0`. ~~float~~ |
## SpanCategorizer.\_\_call\_\_ {id="call",tag="method"} ## SpanCategorizer.\_\_call\_\_ {id="call",tag="method"}