Add experimental coref docs (#11291)

* Add experimental coref docs
* Docs cleanup
* Apply suggestions from code review

  Co-authored-by: Sofie Van Landeghem <svlandeg@users.noreply.github.com>
* Apply changes from code review
* Fix prettier formatting

  It seems a period after a number made this think it was a list?
* Update docs on examples for initialize
* Add docs for coref scorers
* Remove 3.4 notes from coref

  There won't be a "new" tag until it's in core.
* Add docs for span cleaner
* Fix docs
* Fix docs to match spacy-experimental

  These weren't properly updated when the code was moved out of spacy core.
* More doc fixes
* Formatting
* Update architectures
* Fix links
* Fix another link

Co-authored-by: Sofie Van Landeghem <svlandeg@users.noreply.github.com>
Co-authored-by: svlandeg <svlandeg@github.com>
2025-10-29 23:17:59 +03:00 · 2022-09-27 18:11:23 +09:00 · 2022-09-27 18:11:23 +09:00 · a44b7d4622
commit a44b7d4622
parent 877671e09a
6 changed files with 889 additions and 6 deletions
--- a/website/docs/api/architectures.md
+++ b/website/docs/api/architectures.md
@ -11,6 +11,7 @@ menu:
  - ['Text Classification', 'textcat']
  - ['Span Classification', 'spancat']
  - ['Entity Linking', 'entitylinker']
  - ['Coreference', 'coref-architectures']
 ---
 A **model architecture** is a function that wires up a
@ -587,8 +588,8 @@ consists of either two or three subnetworks:
  run once for each batch.
 - **lower**: Construct a feature-specific vector for each `(token, feature)`
  pair. This is also run once for each batch. Constructing the state
-  representation is then a matter of summing the component features and
+  representation is then a matter of summing the component features and applying
-  applying the non-linearity.
+  the non-linearity.
 - **upper** (optional): A feed-forward network that predicts scores from the
  state representation. If not present, the output from the lower model is used
  as action scores directly.
@ -628,8 +629,8 @@ same signature, but the `use_upper` argument was `True` by default.
 > ```
 Build a tagger model, using a provided token-to-vector component. The tagger
-model adds a linear layer with softmax activation to predict scores given
+model adds a linear layer with softmax activation to predict scores given the
-the token vectors.
+token vectors.
 | Name        | Description                                                                                |
 | ----------- | ------------------------------------------------------------------------------------------ |
@ -920,5 +921,84 @@ A function that reads an existing `KnowledgeBase` from file.
 A function that takes as input a [`KnowledgeBase`](/api/kb) and a
 [`Span`](/api/span) object denoting a named entity, and returns a list of
 plausible [`Candidate`](/api/kb/#candidate) objects. The default
-`CandidateGenerator` uses the text of a mention to find its potential
+`CandidateGenerator` uses the text of a mention to find its potential aliases in
-aliases in the `KnowledgeBase`. Note that this function is case-dependent.
+the `KnowledgeBase`. Note that this function is case-dependent.
 ## Coreference {#coref-architectures tag="experimental"}
 A [`CoreferenceResolver`](/api/coref) component identifies tokens that refer to
 the same entity. A [`SpanResolver`](/api/span-resolver) component infers spans
 from single tokens. Together these components can be used to reproduce
 traditional coreference models. You can also omit the `SpanResolver` if working
 with only token-level clusters is acceptable.
 ### spacy-experimental.Coref.v1 {#Coref tag="experimental"}
 > #### Example Config
 >
 > ```ini
 >
 > [model]
 > @architectures = "spacy-experimental.Coref.v1"
 > distance_embedding_size = 20
 > dropout = 0.3
 > hidden_size = 1024
 > depth = 2
 > antecedent_limit = 50
 > antecedent_batch_size = 512
 >
 > [model.tok2vec]
 > @architectures = "spacy-transformers.TransformerListener.v1"
 > grad_factor = 1.0
 > upstream = "transformer"
 > pooling = {"@layers":"reduce_mean.v1"}
 > ```
 The `Coref` model architecture is a Thinc `Model`.
 | Name                      | Description                                                                                                                                                                              |
 | ------------------------- | ---------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------- |
 | `tok2vec`                 | The [`tok2vec`](#tok2vec) layer of the model. ~~Model~~                                                                                                                                  |
 | `distance_embedding_size` | The size of the embedding that represents the distance between candidates. ~~int~~                                                                                                       |
 | `dropout`                 | The dropout to use internally. Unlike some Thinc models, this has separate dropout for the internal PyTorch layers. ~~float~~                                                            |
 | `hidden_size`             | Size of the main internal layers. ~~int~~                                                                                                                                                |
 | `depth`                   | Depth of the internal network. ~~int~~                                                                                                                                                   |
 | `antecedent_limit`        | How many candidate antecedents to keep after rough scoring. This has a significant effect on memory usage. Typical values would be 50 to 200, or higher for very long documents. ~~int~~ |
 | `antecedent_batch_size`   | Internal batch size. ~~int~~                                                                                                                                                             |
 | **CREATES**               | The model using the architecture. ~~Model[List[Doc], Floats2d]~~                                                                                                                         |
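
To make the role of `antecedent_limit` more concrete, here is a minimal sketch of top-k pruning in plain Python. It is an illustration only, not the code used by `spacy-experimental`: the `prune_antecedents` helper and the toy score matrix are hypothetical, and the sketch assumes that the candidate antecedents of a token are the tokens that precede it.

```python
import heapq
from typing import List


def prune_antecedents(
    rough_scores: List[List[float]], antecedent_limit: int
) -> List[List[int]]:
    """For each token i, keep the indices of the best-scoring earlier tokens.

    rough_scores[i][j] is a hypothetical cheap score for token j being an
    antecedent of token i. Only candidates with j < i are considered.
    """
    kept = []
    for i, row in enumerate(rough_scores):
        candidates = range(i)  # tokens that come before token i
        best = heapq.nlargest(antecedent_limit, candidates, key=lambda j: row[j])
        kept.append(sorted(best))
    return kept


# Toy 4-token document; only the lower triangle of the matrix is used.
scores = [
    [0.0, 0.0, 0.0, 0.0],
    [0.2, 0.0, 0.0, 0.0],
    [0.9, 0.1, 0.0, 0.0],
    [0.3, 0.8, 0.5, 0.0],
]
pruned = prune_antecedents(scores, antecedent_limit=2)
print(pruned)  # [[], [0], [0, 1], [1, 2]]
```

Under this assumption, the number of candidate pairs that go on to finer scoring is bounded by the number of tokens times `antecedent_limit`, which is in line with the note above that the setting has a significant effect on memory usage.
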
 ### spacy-experimental.SpanResolver.v1 {#SpanResolver tag="experimental"}
 > #### Example Config
 >
 > ```ini
 >
 > [model]
 > @architectures = "spacy-experimental.SpanResolver.v1"
 > hidden_size = 1024
 > distance_embedding_size = 64
 > conv_channels = 4
 > window_size = 1
 > max_distance = 128
 > prefix = "coref_head_clusters"
 >
 > [model.tok2vec]
 > @architectures = "spacy-transformers.TransformerListener.v1"
 > grad_factor = 1.0
 > upstream = "transformer"
 > pooling = {"@layers":"reduce_mean.v1"}
 > ```
 The `SpanResolver` model architecture is a Thinc `Model`. Note that
 `MentionClusters` is `List[List[Tuple[int, int]]]`.
 | Name                      | Description                                                                                                          |
 | ------------------------- | -------------------------------------------------------------------------------------------------------------------- |
 | `tok2vec`                 | The [`tok2vec`](#tok2vec) layer of the model. ~~Model~~                                                              |
 | `hidden_size`             | Size of the main internal layers. ~~int~~                                                                            |
 | `distance_embedding_size` | The size of the embedding that represents the distance between two candidates. ~~int~~                               |
 | `conv_channels`           | The number of channels in the internal CNN. ~~int~~                                                                  |
 | `window_size`             | The number of neighboring tokens to consider in the internal CNN. `1` means consider one token on each side. ~~int~~ |
 | `max_distance`            | The longest possible length of a predicted span. ~~int~~                                                             |
 | `prefix`                  | The prefix that indicates spans to use for input data. ~~string~~                                                    |
 | **CREATES**               | The model using the architecture. ~~Model[List[Doc], List[MentionClusters]]~~                                        |
--- a/website/docs/api/coref.md
+++ b/website/docs/api/coref.md
@ -0,0 +1,353 @@
 ---
 title: CoreferenceResolver
 tag: class,experimental
 source: spacy-experimental/coref/coref_component.py
 teaser: 'Pipeline component for word-level coreference resolution'
 api_base_class: /api/pipe
 api_string_name: coref
 api_trainable: true
 ---
 > #### Installation
 >
 > ```bash
 > $ pip install -U spacy-experimental
 > ```
 <Infobox title="Important note" variant="warning">
 This component is not yet integrated into spaCy core, and is available via the
 extension package
 [`spacy-experimental`](https://github.com/explosion/spacy-experimental) starting
 in version 0.6.0. It exposes the component via
 [entry points](/usage/saving-loading/#entry-points), so if you have the package
 installed, using `factory = "experimental_coref"` in your
 [training config](/usage/training#config) or
 `nlp.add_pipe("experimental_coref")` will work out-of-the-box.
 </Infobox>
 A `CoreferenceResolver` component groups tokens into clusters that refer to the
 same thing. Clusters are represented as `SpanGroup`s whose keys start with a
 prefix (`coref_clusters` by default).
 A `CoreferenceResolver` component can be paired with a
 [`SpanResolver`](/api/span-resolver) to expand single tokens to spans.
 ## Assigned Attributes {#assigned-attributes}
 Predictions will be saved to `Doc.spans` as a [`SpanGroup`](/api/spangroup). The
 span key will be a prefix plus a serial number referring to the coreference
 cluster, starting from one.
 The span key prefix defaults to `"coref_clusters"`, but can be passed as a
 parameter.
 | Location                                   | Value                                                                                                   |
 | ------------------------------------------ | ------------------------------------------------------------------------------------------------------- |
 | `Doc.spans[prefix + "_" + cluster_number]` | One coreference cluster, represented as single-token spans. Cluster numbers start from 1. ~~SpanGroup~~ |
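
The key scheme in the table above can be illustrated without loading a model. The following is a minimal sketch in plain Python that uses a dict in place of `Doc.spans` and `(start, end)` tuples in place of `Span` objects. The `cluster_keys` helper is hypothetical and only mirrors the documented `prefix + "_" + cluster_number` pattern, with numbering from 1.

```python
from typing import Dict, List, Tuple

MentionClusters = List[List[Tuple[int, int]]]


def cluster_keys(
    clusters: MentionClusters, prefix: str = "coref_clusters"
) -> Dict[str, List[Tuple[int, int]]]:
    """Store each cluster under `prefix + "_" + cluster_number`."""
    spans = {}
    for number, cluster in enumerate(clusters, start=1):
        spans[f"{prefix}_{number}"] = list(cluster)
    return spans


# Two hand-written clusters of single-token mentions.
clusters = [[(0, 1), (5, 6)], [(2, 3), (8, 9), (11, 12)]]
spans = cluster_keys(clusters)
print(sorted(spans))  # ['coref_clusters_1', 'coref_clusters_2']
```
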
 ## Config and implementation {#config}
 The default config is defined by the pipeline component factory and describes
 how the component should be configured. You can override its settings via the
 `config` argument on [`nlp.add_pipe`](/api/language#add_pipe) or in your
 [`config.cfg` for training](/usage/training#config). See the
 [model architectures](/api/architectures#coref-architectures) documentation for
 details on the architectures and their arguments and hyperparameters.
 > #### Example
 >
 > ```python
 > from spacy_experimental.coref.coref_component import DEFAULT_COREF_MODEL
 > from spacy_experimental.coref.coref_util import DEFAULT_CLUSTER_PREFIX
 > config = {
 >     "model": DEFAULT_COREF_MODEL,
 >     "span_cluster_prefix": DEFAULT_CLUSTER_PREFIX,
 > }
 > nlp.add_pipe("experimental_coref", config=config)
 > ```
 | Setting               | Description                                                                                                                              |
 | --------------------- | ---------------------------------------------------------------------------------------------------------------------------------------- |
 | `model`               | The [`Model`](https://thinc.ai/docs/api-model) powering the pipeline component. Defaults to [Coref](/api/architectures#Coref). ~~Model~~ |
 | `span_cluster_prefix` | The prefix for the keys for clusters saved to `doc.spans`. Defaults to `coref_clusters`. ~~str~~                                         |
 ## CoreferenceResolver.\_\_init\_\_ {#init tag="method"}
 > #### Example
 >
 > ```python
 > # Construction via add_pipe with default model
 > coref = nlp.add_pipe("experimental_coref")
 >
 > # Construction via add_pipe with custom model
 > config = {"model": {"@architectures": "my_coref.v1"}}
 > coref = nlp.add_pipe("experimental_coref", config=config)
 >
 > # Construction from class
 > from spacy_experimental.coref.coref_component import CoreferenceResolver
 > coref = CoreferenceResolver(nlp.vocab, model)
 > ```
 Create a new pipeline instance. In your application, you would normally use a
 shortcut for this and instantiate the component using its string name and
 [`nlp.add_pipe`](/api/language#add_pipe).
 | Name                  | Description                                                                                         |
 | --------------------- | --------------------------------------------------------------------------------------------------- |
 | `vocab`               | The shared vocabulary. ~~Vocab~~                                                                    |
 | `model`               | The [`Model`](https://thinc.ai/docs/api-model) powering the pipeline component. ~~Model~~           |
 | `name`                | String name of the component instance. Used to add entries to the `losses` during training. ~~str~~ |
 | _keyword-only_        |                                                                                                     |
 | `span_cluster_prefix` | The prefix for the key for saving clusters of spans. ~~str~~                                        |
 ## CoreferenceResolver.\_\_call\_\_ {#call tag="method"}
 Apply the pipe to one document. The document is modified in place and returned.
 This usually happens under the hood when the `nlp` object is called on a text
 and all pipeline components are applied to the `Doc` in order. Both
 [`__call__`](/api/coref#call) and [`pipe`](/api/coref#pipe) delegate to the
 [`predict`](/api/coref#predict) and
 [`set_annotations`](/api/coref#set_annotations) methods.
 > #### Example
 >
 > ```python
 > doc = nlp("This is a sentence.")
 > coref = nlp.add_pipe("experimental_coref")
 > # This usually happens under the hood
 > processed = coref(doc)
 > ```
 | Name        | Description                      |
 | ----------- | -------------------------------- |
 | `doc`       | The document to process. ~~Doc~~ |
 | **RETURNS** | The processed document. ~~Doc~~  |
 ## CoreferenceResolver.pipe {#pipe tag="method"}
 Apply the pipe to a stream of documents. This usually happens under the hood
 when the `nlp` object is called on a text and all pipeline components are
 applied to the `Doc` in order. Both [`__call__`](/api/coref#call) and
 [`pipe`](/api/coref#pipe) delegate to the [`predict`](/api/coref#predict) and
 [`set_annotations`](/api/coref#set_annotations) methods.
 > #### Example
 >
 > ```python
 > coref = nlp.add_pipe("experimental_coref")
 > for doc in coref.pipe(docs, batch_size=50):
 >     pass
 > ```
 | Name           | Description                                                   |
 | -------------- | ------------------------------------------------------------- |
 | `stream`       | A stream of documents. ~~Iterable[Doc]~~                      |
 | _keyword-only_ |                                                               |
 | `batch_size`   | The number of documents to buffer. Defaults to `128`. ~~int~~ |
 | **YIELDS**     | The processed documents in order. ~~Doc~~                     |
 ## CoreferenceResolver.initialize {#initialize tag="method"}
 Initialize the component for training. `get_examples` should be a function that
 returns an iterable of [`Example`](/api/example) objects. **At least one example
 should be supplied.** The data examples are used to **initialize the model** of
 the component and can either be the full training data or a representative
 sample. Initialization includes validating the network,
 [inferring missing shapes](https://thinc.ai/docs/usage-models#validation) and
 setting up the label scheme based on the data. This method is typically called
 by [`Language.initialize`](/api/language#initialize).
 > #### Example
 >
 > ```python
 > coref = nlp.add_pipe("experimental_coref")
 > coref.initialize(lambda: examples, nlp=nlp)
 > ```
 | Name           | Description                                                                                                                                                                |
 | -------------- | -------------------------------------------------------------------------------------------------------------------------------------------------------------------------- |
 | `get_examples` | Function that returns gold-standard annotations in the form of [`Example`](/api/example) objects. Must contain at least one `Example`. ~~Callable[[], Iterable[Example]]~~ |
 | _keyword-only_ |                                                                                                                                                                            |
 | `nlp`          | The current `nlp` object. Defaults to `None`. ~~Optional[Language]~~                                                                                                       |
 ## CoreferenceResolver.predict {#predict tag="method"}
 Apply the component's model to a batch of [`Doc`](/api/doc) objects, without
 modifying them. Clusters are returned as a list of `MentionClusters`, one for
 each input `Doc`. A `MentionClusters` instance is just a list of lists of pairs
 of `int`s, where each item corresponds to a cluster, and the `int`s correspond
 to token indices.
 > #### Example
 >
 > ```python
 > coref = nlp.add_pipe("experimental_coref")
 > clusters = coref.predict([doc1, doc2])
 > ```
 | Name        | Description                                                                  |
 | ----------- | ---------------------------------------------------------------------------- |
 | `docs`      | The documents to predict. ~~Iterable[Doc]~~                                  |
 | **RETURNS** | The predicted coreference clusters for the `docs`. ~~List[MentionClusters]~~ |
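
As a rough illustration of the return type, the sketch below builds a `MentionClusters` value by hand and looks up the tokens it points to. It assumes each pair is a `(start, end)` token offset with an exclusive end, which is an assumption rather than something stated above. The token list and the `cluster_texts` helper are made up for the example.

```python
from typing import List, Tuple

MentionClusters = List[List[Tuple[int, int]]]

# A toy tokenized sentence standing in for a Doc.
tokens = ["Sarah", "said", "she", "lost", "her", "keys", "."]

# One cluster with three single-token mentions.
clusters: MentionClusters = [[(0, 1), (2, 3), (4, 5)]]


def cluster_texts(tokens: List[str], clusters: MentionClusters) -> List[List[str]]:
    """Return the text of every mention, grouped by cluster."""
    return [
        [" ".join(tokens[start:end]) for start, end in cluster]
        for cluster in clusters
    ]


texts = cluster_texts(tokens, clusters)
print(texts)  # [['Sarah', 'she', 'her']]
```
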
 ## CoreferenceResolver.set_annotations {#set_annotations tag="method"}
 Modify a batch of documents, saving coreference clusters in `Doc.spans`.
 > #### Example
 >
 > ```python
 > coref = nlp.add_pipe("experimental_coref")
 > clusters = coref.predict([doc1, doc2])
 > coref.set_annotations([doc1, doc2], clusters)
 > ```
 | Name       | Description                                                                  |
 | ---------- | ---------------------------------------------------------------------------- |
 | `docs`     | The documents to modify. ~~Iterable[Doc]~~                                   |
 | `clusters` | The predicted coreference clusters for the `docs`. ~~List[MentionClusters]~~ |
 ## CoreferenceResolver.update {#update tag="method"}
 Learn from a batch of [`Example`](/api/example) objects. Delegates to
 [`predict`](/api/coref#predict).
 > #### Example
 >
 > ```python
 > coref = nlp.add_pipe("experimental_coref")
 > optimizer = nlp.initialize()
 > losses = coref.update(examples, sgd=optimizer)
 > ```
 | Name           | Description                                                                                                              |
 | -------------- | ------------------------------------------------------------------------------------------------------------------------ |
 | `examples`     | A batch of [`Example`](/api/example) objects to learn from. ~~Iterable[Example]~~                                        |
 | _keyword-only_ |                                                                                                                          |
 | `drop`         | The dropout rate. ~~float~~                                                                                              |
 | `sgd`          | An optimizer. Will be created via [`create_optimizer`](#create_optimizer) if not set. ~~Optional[Optimizer]~~            |
 | `losses`       | Optional record of the loss during training. Updated using the component name as the key. ~~Optional[Dict[str, float]]~~ |
 | **RETURNS**    | The updated `losses` dictionary. ~~Dict[str, float]~~                                                                    |
 ## CoreferenceResolver.create_optimizer {#create_optimizer tag="method"}
 Create an optimizer for the pipeline component.
 > #### Example
 >
 > ```python
 > coref = nlp.add_pipe("experimental_coref")
 > optimizer = coref.create_optimizer()
 > ```
 | Name        | Description                  |
 | ----------- | ---------------------------- |
 | **RETURNS** | The optimizer. ~~Optimizer~~ |
 ## CoreferenceResolver.use_params {#use_params tag="method, contextmanager"}
 Modify the pipe's model, to use the given parameter values. At the end of the
 context, the original parameters are restored.
 > #### Example
 >
 > ```python
 > coref = nlp.add_pipe("experimental_coref")
 > with coref.use_params(optimizer.averages):
 >     coref.to_disk("/best_model")
 > ```
 | Name     | Description                                        |
 | -------- | -------------------------------------------------- |
 | `params` | The parameter values to use in the model. ~~dict~~ |
 ## CoreferenceResolver.to_disk {#to_disk tag="method"}
 Serialize the pipe to disk.
 > #### Example
 >
 > ```python
 > coref = nlp.add_pipe("experimental_coref")
 > coref.to_disk("/path/to/coref")
 > ```
 | Name           | Description                                                                                                                                |
 | -------------- | ------------------------------------------------------------------------------------------------------------------------------------------ |
 | `path`         | A path to a directory, which will be created if it doesn't exist. Paths may be either strings or `Path`-like objects. ~~Union[str, Path]~~ |
 | _keyword-only_ |                                                                                                                                            |
 | `exclude`      | String names of [serialization fields](#serialization-fields) to exclude. ~~Iterable[str]~~                                                |
 ## CoreferenceResolver.from_disk {#from_disk tag="method"}
 Load the pipe from disk. Modifies the object in place and returns it.
 > #### Example
 >
 > ```python
 > coref = nlp.add_pipe("experimental_coref")
 > coref.from_disk("/path/to/coref")
 > ```
 | Name           | Description                                                                                     |
 | -------------- | ----------------------------------------------------------------------------------------------- |
 | `path`         | A path to a directory. Paths may be either strings or `Path`-like objects. ~~Union[str, Path]~~ |
 | _keyword-only_ |                                                                                                 |
 | `exclude`      | String names of [serialization fields](#serialization-fields) to exclude. ~~Iterable[str]~~     |
 | **RETURNS**    | The modified `CoreferenceResolver` object. ~~CoreferenceResolver~~                              |
 ## CoreferenceResolver.to_bytes {#to_bytes tag="method"}
 > #### Example
 >
 > ```python
 > coref = nlp.add_pipe("experimental_coref")
 > coref_bytes = coref.to_bytes()
 > ```
 Serialize the pipe to a bytestring.
 | Name           | Description                                                                                 |
 | -------------- | ------------------------------------------------------------------------------------------- |
 | _keyword-only_ |                                                                                             |
 | `exclude`      | String names of [serialization fields](#serialization-fields) to exclude. ~~Iterable[str]~~ |
 | **RETURNS**    | The serialized form of the `CoreferenceResolver` object. ~~bytes~~                          |
 ## CoreferenceResolver.from_bytes {#from_bytes tag="method"}
 Load the pipe from a bytestring. Modifies the object in place and returns it.
 > #### Example
 >
 > ```python
 > coref_bytes = coref.to_bytes()
 > coref = nlp.add_pipe("experimental_coref")
 > coref.from_bytes(coref_bytes)
 > ```
 | Name           | Description                                                                                 |
 | -------------- | ------------------------------------------------------------------------------------------- |
 | `bytes_data`   | The data to load from. ~~bytes~~                                                            |
 | _keyword-only_ |                                                                                             |
 | `exclude`      | String names of [serialization fields](#serialization-fields) to exclude. ~~Iterable[str]~~ |
 | **RETURNS**    | The `CoreferenceResolver` object. ~~CoreferenceResolver~~                                   |
 ## Serialization fields {#serialization-fields}
 During serialization, spaCy will export several data fields used to restore
 different aspects of the object. If needed, you can exclude them from
 serialization by passing in the string names via the `exclude` argument.
 > #### Example
 >
 > ```python
 > coref.to_disk("/path", exclude=["vocab"])
 > ```
 | Name    | Description                                                    |
 | ------- | -------------------------------------------------------------- |
 | `vocab` | The shared [`Vocab`](/api/vocab).                              |
 | `cfg`   | The config file. You usually don't want to exclude this.       |
 | `model` | The binary model data. You usually don't want to exclude this. |
--- a/website/docs/api/pipeline-functions.md
+++ b/website/docs/api/pipeline-functions.md
@ -153,3 +153,36 @@ whole pipeline has run.
 | `attrs`     | A dict of the `Doc` attributes and the values to set them to. Defaults to `{"tensor": None, "_.trf_data": None}` to clean up after `tok2vec` and `transformer` components. ~~dict~~ |
 | `silent`    | If `False`, show warnings if attributes aren't found or can't be set. Defaults to `True`. ~~bool~~                                                                                  |
 | **RETURNS** | The modified `Doc` with the modified attributes. ~~Doc~~                                                                                                                            |
 ## span_cleaner {#span_cleaner tag="function,experimental"}
 Remove `SpanGroup`s from `doc.spans` based on a key prefix. This is used to
 clean up after the [`CoreferenceResolver`](/api/coref) when it's paired with a
 [`SpanResolver`](/api/span-resolver).
 <Infobox title="Important note" variant="warning">
 This pipeline function is not yet integrated into spaCy core, and is available
 via the extension package
 [`spacy-experimental`](https://github.com/explosion/spacy-experimental) starting
 in version 0.6.0. It exposes the component via
 [entry points](/usage/saving-loading/#entry-points), so if you have the package
 installed, using `factory = "span_cleaner"` in your
 [training config](/usage/training#config) or `nlp.add_pipe("span_cleaner")` will
 work out-of-the-box.
 </Infobox>
 > #### Example
 >
 > ```python
 > config = {"prefix": "coref_head_clusters"}
 > nlp.add_pipe("span_cleaner", config=config)
 > doc = nlp("text")
 > assert "coref_head_clusters_1" not in doc.spans
 > ```
 | Setting     | Description                                                                                                               |
 | ----------- | ------------------------------------------------------------------------------------------------------------------------- |
 | `prefix`    | A prefix to check `SpanGroup` keys for. Any matching groups will be removed. Defaults to `"coref_head_clusters"`. ~~str~~ |
 | **RETURNS** | The modified `Doc` with any matching spans removed. ~~Doc~~                                                               |
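
The cleanup step can be pictured with a plain dict standing in for `doc.spans`. This is a minimal sketch, not the `spacy-experimental` implementation: the `clean_spans` helper is hypothetical, and it assumes that "matching" means the key starts with the prefix.

```python
from typing import Dict, List


def clean_spans(
    spans: Dict[str, List], prefix: str = "coref_head_clusters"
) -> Dict[str, List]:
    """Delete every entry whose key starts with `prefix` (modifies in place)."""
    for key in list(spans):  # copy the keys so we can delete while looping
        if key.startswith(prefix):
            del spans[key]
    return spans


spans = {
    "coref_head_clusters_1": [(0, 1)],
    "coref_head_clusters_2": [(3, 4)],
    "coref_clusters_1": [(0, 2)],
    "sc": [],
}
clean_spans(spans)
print(sorted(spans))  # ['coref_clusters_1', 'sc']
```

In a pipeline that pairs the two components, this kind of step would remove the intermediate head clusters while leaving the resolved `coref_clusters` groups in place.
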
--- a/website/docs/api/scorer.md
+++ b/website/docs/api/scorer.md
@ -270,3 +270,62 @@ Compute micro-PRF and per-entity PRF scores.
 | Name       | Description                                                                                                         |
 | ---------- | ------------------------------------------------------------------------------------------------------------------- |
 | `examples` | The `Example` objects holding both the predictions and the correct gold-standard annotations. ~~Iterable[Example]~~ |
 ## score_coref_clusters {#score_coref_clusters tag="experimental"}
 Returns LEA ([Moosavi and Strube, 2016](https://aclanthology.org/P16-1060/)) PRF
 scores for coreference clusters.
 <Infobox title="Important note" variant="warning">
 Note this scoring function is not yet included in spaCy core - for details, see
 the [CoreferenceResolver](/api/coref) docs.
 </Infobox>
 > #### Example
 >
 > ```python
 > scores = score_coref_clusters(
 >     examples,
 >     span_cluster_prefix="coref_clusters",
 > )
 > print(scores["coref_f"])
 > ```
 | Name                  | Description                                                                                                         |
 | --------------------- | ------------------------------------------------------------------------------------------------------------------- |
 | `examples`            | The `Example` objects holding both the predictions and the correct gold-standard annotations. ~~Iterable[Example]~~ |
 | _keyword-only_        |                                                                                                                     |
 | `span_cluster_prefix` | The prefix used for spans representing coreference clusters. ~~str~~                                                |
 | **RETURNS**           | A dictionary containing the scores. ~~Dict[str, Optional[float]]~~                                                  |
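
For readers unfamiliar with LEA, the following is a simplified sketch of the link-based computation in plain Python. It is not the scorer shipped in `spacy-experimental`: the function names are hypothetical, clusters are plain sets of `(start, end)` tuples, and singleton clusters are skipped, which the original metric treats specially.

```python
from typing import List, Set, Tuple

Mention = Tuple[int, int]


def links(n: int) -> int:
    """Number of coreference links among n mentions."""
    return n * (n - 1) // 2


def lea_one_side(key: List[Set[Mention]], response: List[Set[Mention]]) -> float:
    """Size-weighted fraction of each key entity's links found in the response."""
    weighted = 0.0
    total = 0
    for entity in key:
        if len(entity) < 2:
            continue  # simplification: singletons are skipped in this sketch
        resolved = sum(links(len(entity & other)) for other in response)
        weighted += len(entity) * resolved / links(len(entity))
        total += len(entity)
    return weighted / total if total else 0.0


def lea_prf(gold: List[Set[Mention]], pred: List[Set[Mention]]):
    """Return (precision, recall, F1) under the simplified LEA above."""
    recall = lea_one_side(gold, pred)
    precision = lea_one_side(pred, gold)
    if precision + recall == 0:
        return precision, recall, 0.0
    return precision, recall, 2 * precision * recall / (precision + recall)


gold = [{(0, 1), (2, 3), (4, 5)}]
pred = [{(0, 1), (2, 3)}, {(4, 5), (6, 7)}]
p, r, f = lea_prf(gold, pred)
print(round(p, 3), round(r, 3), round(f, 3))  # 0.5 0.333 0.4
```

In the toy example, the gold entity has three links and the prediction recovers one of them, so recall is 1/3. One of the two predicted entities is fully supported by the gold data, so precision is 0.5.
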
 ## score_span_predictions {#score_span_predictions tag="experimental"}
 Return accuracy for reconstructions of spans from single tokens. Only exactly
 correct predictions are counted as correct; there is no partial credit for near
 answers. Used by the [SpanResolver](/api/span-resolver).
 <Infobox title="Important note" variant="warning">
 Note this scoring function is not yet included in spaCy core - for details, see
 the [SpanResolver](/api/span-resolver) docs.
 </Infobox>
 > #### Example
 >
 > ```python
 > scores = score_span_predictions(
 >     examples,
 >     output_prefix="coref_clusters",
 > )
 > print(scores["span_coref_clusters_accuracy"])
 > ```
 | Name            | Description                                                                                                         |
 | --------------- | ------------------------------------------------------------------------------------------------------------------- |
 | `examples`      | The `Example` objects holding both the predictions and the correct gold-standard annotations. ~~Iterable[Example]~~ |
 | _keyword-only_  |                                                                                                                     |
 | `output_prefix` | The prefix used for spans representing the final predicted spans. ~~str~~                                           |
 | **RETURNS**     | A dictionary containing the scores. ~~Dict[str, Optional[float]]~~                                                  |
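
The "no partial credit" rule can be shown with a few lines of plain Python. This is a minimal sketch under stated assumptions, not the library's scorer: spans are `(start, end)` tuples, gold and predicted spans are assumed to be already paired one-to-one, and `exact_match_accuracy` is a hypothetical helper.

```python
from typing import List, Tuple

Span = Tuple[int, int]


def exact_match_accuracy(gold: List[Span], pred: List[Span]) -> float:
    """Fraction of predicted spans whose start and end both match the gold span."""
    if len(gold) != len(pred):
        raise ValueError("gold and pred must be paired one-to-one")
    if not gold:
        return 0.0
    correct = sum(1 for g, p in zip(gold, pred) if g == p)
    return correct / len(gold)


gold = [(0, 2), (5, 6), (9, 12)]
pred = [(0, 2), (5, 7), (9, 12)]  # the second span is off by one token
accuracy = exact_match_accuracy(gold, pred)
print(round(accuracy, 3))  # 0.667
```

The second prediction overlaps the gold span but has the wrong end offset, so it counts as fully incorrect.
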
--- a/website/docs/api/span-resolver.md
+++ b/website/docs/api/span-resolver.md
@ -0,0 +1,356 @@
 ---
 title: SpanResolver
 tag: class,experimental
 source: spacy-experimental/coref/span_resolver_component.py
 teaser: 'Pipeline component for resolving tokens into spans'
 api_base_class: /api/pipe
 api_string_name: span_resolver
 api_trainable: true
 ---
 > #### Installation
 >
 > ```bash
 > $ pip install -U spacy-experimental
 > ```
 <Infobox title="Important note" variant="warning">
 This component is not yet integrated into spaCy core, and is available via the
 extension package
 [`spacy-experimental`](https://github.com/explosion/spacy-experimental) starting
 in version 0.6.0. It exposes the component via
 [entry points](/usage/saving-loading/#entry-points), so if you have the package
 installed, using `factory = "experimental_span_resolver"` in your
 [training config](/usage/training#config) or
 `nlp.add_pipe("experimental_span_resolver")` will work out-of-the-box.
 </Infobox>
 A `SpanResolver` component takes in tokens (represented as `Span` objects of
 length 1) and resolves them into `Span` objects of arbitrary length. The initial
 use case is as a post-processing step on word-level
 [coreference resolution](/api/coref). The input and output keys used to store
 `Span` objects are configurable.
 ## Assigned Attributes {#assigned-attributes}
 Predictions will be saved to `Doc.spans` as [`SpanGroup`s](/api/spangroup).
 Input token spans will be read in using an input prefix, by default
 `"coref_head_clusters"`, and output spans will be saved using an output prefix
 (default `"coref_clusters"`) plus a serial number starting from one. The
 prefixes are configurable.
 | Location                                          | Value                                                                     |
 | ------------------------------------------------- | ------------------------------------------------------------------------- |
 | `Doc.spans[output_prefix + "_" + cluster_number]` | One group of predicted spans. Cluster number starts from 1. ~~SpanGroup~~ |
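
The key scheme in the table can be sketched with plain Python. The helpers `cluster_key` and `collect_clusters` below are hypothetical names, not part of `spacy-experimental`, and a plain `dict` stands in for `Doc.spans` so the snippet runs without a trained pipeline. It assumes only that the container exposes `.items()` like a dictionary.

```python
from typing import Dict, List, Mapping, Sequence


def cluster_key(prefix: str, cluster_number: int) -> str:
    """Build a key of the form prefix + "_" + cluster_number (numbering starts at 1)."""
    if cluster_number < 1:
        raise ValueError("cluster numbers start from 1")
    return f"{prefix}_{cluster_number}"


def collect_clusters(spans: Mapping[str, Sequence], prefix: str) -> Dict[int, List]:
    """Return {cluster_number: spans} for every key matching prefix + "_" + number."""
    clusters: Dict[int, List] = {}
    for key, group in spans.items():
        head, sep, number = key.rpartition("_")
        if sep and head == prefix and number.isdecimal():
            clusters[int(number)] = list(group)
    return dict(sorted(clusters.items()))


# A plain dict stands in for Doc.spans; the strings stand in for Span objects.
fake_spans = {
    "coref_head_clusters_1": ["Sarah", "she"],
    "coref_clusters_1": ["Sarah Smith", "she"],
    "coref_clusters_2": ["the old bridge", "it"],
    "ruler": ["unrelated"],
}
for number, group in collect_clusters(fake_spans, "coref_clusters").items():
    print(cluster_key("coref_clusters", number), group)
```

Matching on the full prefix rather than with `startswith` keeps the input groups (`coref_head_clusters_*`) separate from the output groups (`coref_clusters_*`).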
 ## Config and implementation {#config}
 The default config is defined by the pipeline component factory and describes
 how the component should be configured. You can override its settings via the
 `config` argument on [`nlp.add_pipe`](/api/language#add_pipe) or in your
 [`config.cfg` for training](/usage/training#config). See the
 [model architectures](/api/architectures#coref-architectures) documentation for
 details on the architectures and their arguments and hyperparameters.
 > #### Example
 >
 > ```python
 > from spacy_experimental.coref.span_resolver_component import DEFAULT_SPAN_RESOLVER_MODEL
 > from spacy_experimental.coref.coref_util import DEFAULT_CLUSTER_PREFIX, DEFAULT_CLUSTER_HEAD_PREFIX
 > config = {
 >     "model": DEFAULT_SPAN_RESOLVER_MODEL,
 >     "input_prefix": DEFAULT_CLUSTER_HEAD_PREFIX,
 >     "output_prefix": DEFAULT_CLUSTER_PREFIX,
 > }
 > nlp.add_pipe("experimental_span_resolver", config=config)
 > ```
 | Setting         | Description                                                                                                                                            |
 | --------------- | ------------------------------------------------------------------------------------------------------------------------------------------------------ |
 | `model`         | The [`Model`](https://thinc.ai/docs/api-model) powering the pipeline component. Defaults to [SpanResolver](/api/architectures#SpanResolver). ~~Model~~ |
 | `input_prefix`  | The prefix to use for input `SpanGroup`s. Defaults to `coref_head_clusters`. ~~str~~                                                                   |
 | `output_prefix` | The prefix for predicted `SpanGroup`s. Defaults to `coref_clusters`. ~~str~~                                                                           |
 ## SpanResolver.\_\_init\_\_ {#init tag="method"}
 > #### Example
 >
 > ```python
 > # Construction via add_pipe with default model
 > span_resolver = nlp.add_pipe("experimental_span_resolver")
 >
 > # Construction via add_pipe with custom model
 > config = {"model": {"@architectures": "my_span_resolver.v1"}}
 > span_resolver = nlp.add_pipe("experimental_span_resolver", config=config)
 >
 > # Construction from class
 > from spacy_experimental.coref.span_resolver_component import SpanResolver
 > span_resolver = SpanResolver(nlp.vocab, model)
 > ```
 Create a new pipeline instance. In your application, you would normally use a
 shortcut for this and instantiate the component using its string name and
 [`nlp.add_pipe`](/api/language#add_pipe).
 | Name            | Description                                                                                         |
 | --------------- | --------------------------------------------------------------------------------------------------- |
 | `vocab`         | The shared vocabulary. ~~Vocab~~                                                                    |
 | `model`         | The [`Model`](https://thinc.ai/docs/api-model) powering the pipeline component. ~~Model~~           |
 | `name`          | String name of the component instance. Used to add entries to the `losses` during training. ~~str~~ |
 | _keyword-only_  |                                                                                                     |
 | `input_prefix`  | The prefix to use for input `SpanGroup`s. Defaults to `coref_head_clusters`. ~~str~~                |
 | `output_prefix` | The prefix for predicted `SpanGroup`s. Defaults to `coref_clusters`. ~~str~~                        |
 ## SpanResolver.\_\_call\_\_ {#call tag="method"}
 Apply the pipe to one document. The document is modified in place and returned.
 This usually happens under the hood when the `nlp` object is called on a text
 and all pipeline components are applied to the `Doc` in order. Both
 [`__call__`](#call) and [`pipe`](#pipe) delegate to the [`predict`](#predict)
 and [`set_annotations`](#set_annotations) methods.
 > #### Example
 >
 > ```python
 > doc = nlp("This is a sentence.")
 > span_resolver = nlp.add_pipe("experimental_span_resolver")
 > # This usually happens under the hood
 > processed = span_resolver(doc)
 > ```
 | Name        | Description                      |
 | ----------- | -------------------------------- |
 | `doc`       | The document to process. ~~Doc~~ |
 | **RETURNS** | The processed document. ~~Doc~~  |
 ## SpanResolver.pipe {#pipe tag="method"}
 Apply the pipe to a stream of documents. This usually happens under the hood
 when the `nlp` object is called on a text and all pipeline components are
 applied to the `Doc` in order. Both [`__call__`](/api/span-resolver#call) and
 [`pipe`](/api/span-resolver#pipe) delegate to the
 [`predict`](/api/span-resolver#predict) and
 [`set_annotations`](/api/span-resolver#set_annotations) methods.
 > #### Example
 >
 > ```python
 > span_resolver = nlp.add_pipe("experimental_span_resolver")
 > for doc in span_resolver.pipe(docs, batch_size=50):
 >     pass
 > ```
 | Name           | Description                                                   |
 | -------------- | ------------------------------------------------------------- |
 | `stream`       | A stream of documents. ~~Iterable[Doc]~~                      |
 | _keyword-only_ |                                                               |
 | `batch_size`   | The number of documents to buffer. Defaults to `128`. ~~int~~ |
 | **YIELDS**     | The processed documents in order. ~~Doc~~                     |
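
As a rough illustration of what buffering by `batch_size` means, the sketch below groups a lazy stream into fixed-size batches while preserving order. The function `buffered_batches` is a hypothetical stand-in written for this example; it is not the code that `pipe` runs internally.

```python
from itertools import islice
from typing import Iterable, Iterator, List, TypeVar

T = TypeVar("T")


def buffered_batches(stream: Iterable[T], batch_size: int = 128) -> Iterator[List[T]]:
    """Yield lists of at most batch_size items, preserving the input order."""
    if batch_size < 1:
        raise ValueError("batch_size must be at least 1")
    iterator = iter(stream)
    while True:
        batch = list(islice(iterator, batch_size))
        if not batch:
            return
        yield batch


texts = (f"doc {i}" for i in range(7))  # a lazy stream, consumed only once
batches = list(buffered_batches(texts, batch_size=3))
print([len(batch) for batch in batches])
```

Only one batch is held in memory at a time, so the stream can be a generator that is never fully materialized.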
 ## SpanResolver.initialize {#initialize tag="method"}
 Initialize the component for training. `get_examples` should be a function that
 returns an iterable of [`Example`](/api/example) objects. **At least one example
 should be supplied.** The data examples are used to **initialize the model** of
 the component and can either be the full training data or a representative
 sample. Initialization includes validating the network,
 [inferring missing shapes](https://thinc.ai/docs/usage-models#validation) and
 setting up the label scheme based on the data. This method is typically called
 by [`Language.initialize`](/api/language#initialize).
 > #### Example
 >
 > ```python
 > span_resolver = nlp.add_pipe("experimental_span_resolver")
 > span_resolver.initialize(lambda: examples, nlp=nlp)
 > ```
 | Name           | Description                                                                                                                                                                |
 | -------------- | -------------------------------------------------------------------------------------------------------------------------------------------------------------------------- |
 | `get_examples` | Function that returns gold-standard annotations in the form of [`Example`](/api/example) objects. Must contain at least one `Example`. ~~Callable[[], Iterable[Example]]~~ |
 | _keyword-only_ |                                                                                                                                                                            |
 | `nlp`          | The current `nlp` object. Defaults to `None`. ~~Optional[Language]~~                                                                                                       |
 ## SpanResolver.predict {#predict tag="method"}
 Apply the component's model to a batch of [`Doc`](/api/doc) objects, without
 modifying them. Predictions are returned as a list of `MentionClusters`, one for
 each input `Doc`. A `MentionClusters` instance is just a list of lists of pairs
 of `int`s, where each inner list corresponds to an input `SpanGroup`, and the
 `int`s in each pair are token indices.
 > #### Example
 >
 > ```python
 > span_resolver = nlp.add_pipe("experimental_span_resolver")
 > spans = span_resolver.predict([doc1, doc2])
 > ```
 | Name        | Description                                                   |
 | ----------- | ------------------------------------------------------------- |
 | `docs`      | The documents to predict. ~~Iterable[Doc]~~                   |
 | **RETURNS** | The predicted spans for the `Doc`s. ~~List[MentionClusters]~~ |
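
The nesting described above can be easier to read as a type alias. The alias below is written from the prose description and should be treated as an assumption rather than the package's own definition; the index values are made up, and `count_mentions` is a hypothetical helper.

```python
from typing import List, Tuple

# Assumed alias, following the description above: one list of (int, int)
# pairs per cluster, and one list of clusters per Doc.
MentionClusters = List[List[Tuple[int, int]]]


def count_mentions(clusters: MentionClusters) -> int:
    """Total number of index pairs across all clusters of one document."""
    return sum(len(cluster) for cluster in clusters)


# Made-up values for two documents; real values come from predict().
predictions: List[MentionClusters] = [
    [[(0, 2), (7, 8)], [(3, 5), (10, 11), (14, 15)]],  # doc1: two clusters
    [],  # doc2: no clusters
]
print([count_mentions(clusters) for clusters in predictions])
```

The outer list has one entry per `Doc`, which is why `predict` and `set_annotations` take the documents and predictions in the same order.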
 ## SpanResolver.set_annotations {#set_annotations tag="method"}
 Modify a batch of documents, saving predictions using the output prefix in
 `Doc.spans`.
 > #### Example
 >
 > ```python
 > span_resolver = nlp.add_pipe("experimental_span_resolver")
 > spans = span_resolver.predict([doc1, doc2])
 > span_resolver.set_annotations([doc1, doc2], spans)
 > ```
 | Name    | Description                                                   |
 | ------- | ------------------------------------------------------------- |
 | `docs`  | The documents to modify. ~~Iterable[Doc]~~                    |
 | `spans` | The predicted spans for the `docs`. ~~List[MentionClusters]~~ |
 ## SpanResolver.update {#update tag="method"}
 Learn from a batch of [`Example`](/api/example) objects. Delegates to
 [`predict`](/api/span-resolver#predict).
 > #### Example
 >
 > ```python
 > span_resolver = nlp.add_pipe("experimental_span_resolver")
 > optimizer = nlp.initialize()
 > losses = span_resolver.update(examples, sgd=optimizer)
 > ```
 | Name           | Description                                                                                                              |
 | -------------- | ------------------------------------------------------------------------------------------------------------------------ |
 | `examples`     | A batch of [`Example`](/api/example) objects to learn from. ~~Iterable[Example]~~                                        |
 | _keyword-only_ |                                                                                                                          |
 | `drop`         | The dropout rate. ~~float~~                                                                                              |
 | `sgd`          | An optimizer. Will be created via [`create_optimizer`](#create_optimizer) if not set. ~~Optional[Optimizer]~~            |
 | `losses`       | Optional record of the loss during training. Updated using the component name as the key. ~~Optional[Dict[str, float]]~~ |
 | **RETURNS**    | The updated `losses` dictionary. ~~Dict[str, float]~~                                                                    |
 ## SpanResolver.create_optimizer {#create_optimizer tag="method"}
 Create an optimizer for the pipeline component.
 > #### Example
 >
 > ```python
 > span_resolver = nlp.add_pipe("experimental_span_resolver")
 > optimizer = span_resolver.create_optimizer()
 > ```
 | Name        | Description                  |
 | ----------- | ---------------------------- |
 | **RETURNS** | The optimizer. ~~Optimizer~~ |
 ## SpanResolver.use_params {#use_params tag="method, contextmanager"}
 Modify the pipe's model, to use the given parameter values. At the end of the
 context, the original parameters are restored.
 > #### Example
 >
 > ```python
 > span_resolver = nlp.add_pipe("experimental_span_resolver")
 > with span_resolver.use_params(optimizer.averages):
 >     span_resolver.to_disk("/best_model")
 > ```
 | Name     | Description                                        |
 | -------- | -------------------------------------------------- |
 | `params` | The parameter values to use in the model. ~~dict~~ |
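
The swap-and-restore behavior follows the usual context manager pattern. The sketch below reproduces that pattern on a toy object. `ToyModel` and this standalone `use_params` function are hypothetical and say nothing about how the real model stores its weights.

```python
from contextlib import contextmanager
from typing import Dict, Iterator


class ToyModel:
    """Hypothetical stand-in for a model that stores parameters in a dict."""

    def __init__(self, params: Dict[str, float]) -> None:
        self.params = dict(params)


@contextmanager
def use_params(model: ToyModel, params: Dict[str, float]) -> Iterator[None]:
    """Temporarily overlay params on the model, then restore the originals."""
    backup = dict(model.params)
    model.params.update(params)
    try:
        yield
    finally:
        # Runs even if the body raises, so the originals always come back.
        model.params = backup


model = ToyModel({"W": 1.0, "b": 0.5})
with use_params(model, {"W": 0.9}):
    inside = dict(model.params)  # the temporary values are active here
after = dict(model.params)
print(inside, after)
```

This is why serializing inside the `with` block, as in the example above, saves the temporary (for instance averaged) parameters while training continues with the originals afterwards.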
 ## SpanResolver.to_disk {#to_disk tag="method"}
 Serialize the pipe to disk.
 > #### Example
 >
 > ```python
 > span_resolver = nlp.add_pipe("experimental_span_resolver")
 > span_resolver.to_disk("/path/to/span_resolver")
 > ```
 | Name           | Description                                                                                                                                |
 | -------------- | ------------------------------------------------------------------------------------------------------------------------------------------ |
 | `path`         | A path to a directory, which will be created if it doesn't exist. Paths may be either strings or `Path`-like objects. ~~Union[str, Path]~~ |
 | _keyword-only_ |                                                                                                                                            |
 | `exclude`      | String names of [serialization fields](#serialization-fields) to exclude. ~~Iterable[str]~~                                                |
 ## SpanResolver.from_disk {#from_disk tag="method"}
 Load the pipe from disk. Modifies the object in place and returns it.
 > #### Example
 >
 > ```python
 > span_resolver = nlp.add_pipe("experimental_span_resolver")
 > span_resolver.from_disk("/path/to/span_resolver")
 > ```
 | Name           | Description                                                                                     |
 | -------------- | ----------------------------------------------------------------------------------------------- |
 | `path`         | A path to a directory. Paths may be either strings or `Path`-like objects. ~~Union[str, Path]~~ |
 | _keyword-only_ |                                                                                                 |
 | `exclude`      | String names of [serialization fields](#serialization-fields) to exclude. ~~Iterable[str]~~     |
 | **RETURNS**    | The modified `SpanResolver` object. ~~SpanResolver~~                                            |
 ## SpanResolver.to_bytes {#to_bytes tag="method"}
 Serialize the pipe to a bytestring.
 > #### Example
 >
 > ```python
 > span_resolver = nlp.add_pipe("experimental_span_resolver")
 > span_resolver_bytes = span_resolver.to_bytes()
 > ```
 | Name           | Description                                                                                 |
 | -------------- | ------------------------------------------------------------------------------------------- |
 | _keyword-only_ |                                                                                             |
 | `exclude`      | String names of [serialization fields](#serialization-fields) to exclude. ~~Iterable[str]~~ |
 | **RETURNS**    | The serialized form of the `SpanResolver` object. ~~bytes~~                                 |
 ## SpanResolver.from_bytes {#from_bytes tag="method"}
 Load the pipe from a bytestring. Modifies the object in place and returns it.
 > #### Example
 >
 > ```python
 > span_resolver_bytes = span_resolver.to_bytes()
 > span_resolver = nlp.add_pipe("experimental_span_resolver")
 > span_resolver.from_bytes(span_resolver_bytes)
 > ```
 | Name           | Description                                                                                 |
 | -------------- | ------------------------------------------------------------------------------------------- |
 | `bytes_data`   | The data to load from. ~~bytes~~                                                            |
 | _keyword-only_ |                                                                                             |
 | `exclude`      | String names of [serialization fields](#serialization-fields) to exclude. ~~Iterable[str]~~ |
 | **RETURNS**    | The `SpanResolver` object. ~~SpanResolver~~                                                 |
 ## Serialization fields {#serialization-fields}
 During serialization, spaCy will export several data fields used to restore
 different aspects of the object. If needed, you can exclude them from
 serialization by passing in the string names via the `exclude` argument.
 > #### Example
 >
 > ```python
 > span_resolver.to_disk("/path", exclude=["vocab"])
 > ```
 | Name    | Description                                                    |
 | ------- | -------------------------------------------------------------- |
 | `vocab` | The shared [`Vocab`](/api/vocab).                              |
 | `cfg`   | The config file. You usually don't want to exclude this.       |
 | `model` | The binary model data. You usually don't want to exclude this. |
--- a/website/meta/sidebars.json
+++ b/website/meta/sidebars.json
@ -94,6 +94,7 @@
                "label": "Pipeline",
                "items": [
                    { "text": "AttributeRuler", "url": "/api/attributeruler" },
                    { "text": "CoreferenceResolver", "url": "/api/coref" },
                    { "text": "DependencyParser", "url": "/api/dependencyparser" },
                    { "text": "EditTreeLemmatizer", "url": "/api/edittreelemmatizer" },
                    { "text": "EntityLinker", "url": "/api/entitylinker" },
@ -104,6 +105,7 @@
                    { "text": "SentenceRecognizer", "url": "/api/sentencerecognizer" },
                    { "text": "Sentencizer", "url": "/api/sentencizer" },
                    { "text": "SpanCategorizer", "url": "/api/spancategorizer" },
                    { "text": "SpanResolver", "url": "/api/span-resolver" },
                    { "text": "SpanRuler", "url": "/api/spanruler" },
                    { "text": "Tagger", "url": "/api/tagger" },
                    { "text": "TextCategorizer", "url": "/api/textcategorizer" },