Update docs [ci skip]

2025-06-29 09:23:12 +03:00 · 2020-08-10 00:01:38 +02:00 · 2020-08-10 00:01:38 +02:00 · c044460823
commit c044460823
parent 3eaeb73342
5 changed files with 237 additions and 77 deletions
--- a/website/docs/api/architectures.md
+++ b/website/docs/api/architectures.md
@ -3,7 +3,7 @@ title: Model Architectures
 teaser: Pre-defined model architectures included with the core library
 source: spacy/ml/models
 menu:
-  - ['Tok2Vec', 'tok2vec']
+  - ['Tok2Vec', 'tok2vec-arch']
  - ['Transformers', 'transformers']
  - ['Parser & NER', 'parser']
  - ['Tagging', 'tagger']
@ -236,7 +236,7 @@ and residual connections.
 > depth = 4
 > ```
-Encode context using bidirectonal LSTM layers. Requires
+Encode context using bidirectional LSTM layers. Requires
 [PyTorch](https://pytorch.org).
 | Name          | Type | Description                                                                                                                                                                                            |
@ -278,8 +278,6 @@ architectures into your training config.
 ### spacy-transformers.Tok2VecListener.v1 {#Tok2VecListener}
 <!-- TODO: description -->
 > #### Example Config
 >
 > ```ini
@ -291,10 +289,41 @@ architectures into your training config.
 > @layers = "reduce_mean.v1"
 > ```
-| Name          | Type                      | Description                                                                                    |
-| ------------- | ------------------------- | ---------------------------------------------------------------------------------------------- |
-| `grad_factor` | float                     | Factor for weighting the gradient if multiple components listen to the same transformer model. |
-| `pooling`     | `Model[Ragged, Floats2d]` | Pooling layer to determine how the vector for each spaCy token will be computed.               |
+Create a `TransformerListener` layer, which will connect to a
+[`Transformer`](/api/transformer) component earlier in the pipeline. The layer
+takes a list of [`Doc`](/api/doc) objects as input, and produces a list of
+2-dimensional arrays as output, with each array having one row per token. Most
 spaCy models expect a sublayer with this signature, making it easy to connect
 them to a transformer model via this sublayer. Transformer models usually
 operate over wordpieces, which usually don't align one-to-one against spaCy
 tokens. The layer therefore requires a reduction operation in order to calculate
 a single token vector given zero or more wordpiece vectors.
 | Name          | Type                                       | Description                                                                                                                                                                                                                                                         |
 | ------------- | ------------------------------------------ | ------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------- |
 | `pooling`     | [`Model`](https://thinc.ai/docs/api-model) | **Input:** [`Ragged`](https://thinc.ai/docs/api-types#ragged). **Output:** [`Floats2d`](https://thinc.ai/docs/api-types#types). A reduction layer used to calculate the token vectors based on zero or more wordpiece vectors. If in doubt, mean pooling (see [`reduce_mean`](https://thinc.ai/docs/api-layers#reduce_mean)) is usually a good choice. |
 | `grad_factor` | float                                      | Reweight gradients from the component before passing them upstream. You can set this to `0` to "freeze" the transformer weights with respect to the component, or use it to make some components more significant than others. Leaving it at `1.0` is usually fine. |
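The reduction step described above can be illustrated with a minimal pure-Python sketch. It is not the Thinc `reduce_mean` implementation: the helper name `mean_pool`, the `width` argument, and the choice to emit a zero vector for a token with no aligned wordpieces are all assumptions made for illustration. The flat list plus `lengths` layout loosely mirrors how a ragged array groups rows.

```python
from typing import List

Vector = List[float]


def mean_pool(wordpieces: List[Vector], lengths: List[int], width: int) -> List[Vector]:
    """Average consecutive runs of wordpiece vectors into one vector per token.

    lengths[i] is the number of wordpiece vectors aligned to token i (may be 0).
    """
    if sum(lengths) != len(wordpieces):
        raise ValueError("lengths must account for every wordpiece vector")
    tokens: List[Vector] = []
    start = 0
    for n in lengths:
        chunk = wordpieces[start:start + n]
        if n == 0:
            # Assumption of this sketch: an unaligned token gets a zero vector.
            tokens.append([0.0] * width)
        else:
            # Column-wise mean over the wordpieces that belong to this token.
            tokens.append([sum(column) / n for column in zip(*chunk)])
        start += n
    return tokens


# Three wordpiece vectors: two for the first token, none for the second,
# one for the third.
wordpiece_vectors = [[1.0, 2.0], [3.0, 4.0], [5.0, 6.0]]
token_vectors = mean_pool(wordpiece_vectors, lengths=[2, 0, 1], width=2)
print(token_vectors)
```

Whatever reduction is chosen, the output has exactly one row per token, which is the signature downstream layers expect.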
 ### spacy-transformers.Tok2VecTransformer.v1 {#Tok2VecTransformer}
 > #### Example Config
 >
 > ```ini
 > # TODO:
 > ```
 Use a transformer as a [`Tok2Vec`](/api/tok2vec) layer directly. This does
 **not** allow multiple components to share the transformer weights, and does
 **not** allow the transformer to set annotations into the [`Doc`](/api/doc)
 object, but it's a **simpler solution** if you only need the transformer within
 one component.
 | Name               | Type                                       | Description                                                                                                                                                                                                                                                         |
 | ------------------ | ------------------------------------------ | ------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------- |
 | `get_spans`        | callable                                   | Function that takes a batch of [`Doc`](/api/doc) objects and returns lists of [`Span`](/api/span) objects to be processed by the transformer. [See here](/api/transformer#span_getters) for built-in options and examples.                                          |
 | `tokenizer_config` | `Dict[str, Any]`                           | Tokenizer settings passed to [`transformers.AutoTokenizer`](https://huggingface.co/transformers/model_doc/auto.html#transformers.AutoTokenizer).                                                                                                                    |
 | `pooling`          | [`Model`](https://thinc.ai/docs/api-model) | **Input:** [`Ragged`](https://thinc.ai/docs/api-types#ragged). **Output:** [`Floats2d`](https://thinc.ai/docs/api-types#types). A reduction layer used to calculate the token vectors based on zero or more wordpiece vectors. If in doubt, mean pooling (see [`reduce_mean`](https://thinc.ai/docs/api-layers#reduce_mean)) is usually a good choice. |
 | `grad_factor`      | float                                      | Reweight gradients from the component before passing them upstream. You can set this to `0` to "freeze" the transformer weights with respect to the component, or use it to make some components more significant than others. Leaving it at `1.0` is usually fine. |
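To make the effect of `grad_factor` concrete, here is a small sketch of how reweighted gradients from several components could combine before reaching the transformer. The function `combine_gradients` and the component names `"ner"` and `"parser"` are hypothetical, and the real library applies the factor inside its own layers rather than in a helper like this.

```python
from typing import Dict, List


def combine_gradients(
    gradients: Dict[str, List[float]], grad_factors: Dict[str, float]
) -> List[float]:
    """Sum per-component gradients after scaling each by its grad_factor."""
    width = len(next(iter(gradients.values())))
    total = [0.0] * width
    for name, gradient in gradients.items():
        # A missing factor falls back to 1.0, i.e. the gradient is unchanged.
        factor = grad_factors.get(name, 1.0)
        for i, value in enumerate(gradient):
            total[i] += factor * value
    return total


# Hypothetical gradients flowing back from two components.
gradients = {"ner": [1.0, 2.0], "parser": [4.0, -2.0]}

# The parser counts half as much as the NER component.
weighted = combine_gradients(gradients, {"ner": 1.0, "parser": 0.5})

# A factor of 0 "freezes" the transformer with respect to the parser.
frozen_parser = combine_gradients(gradients, {"ner": 1.0, "parser": 0.0})
print(weighted, frozen_parser)
```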
 ## Parser & NER architectures {#parser}
@ -595,8 +624,6 @@ A function that creates a default, empty `KnowledgeBase` from a
 A function that takes as input a [`KnowledgeBase`](/api/kb) and a
 [`Span`](/api/span) object denoting a named entity, and returns a list of
-plausible [`Candidate` objects](/api/kb/#candidate_init).
-
-The default `CandidateGenerator` simply uses the text of a mention to find its
-potential aliases in the Knowledgebase. Note that this function is
-case-dependent.
+plausible [`Candidate` objects](/api/kb/#candidate_init). The default
+`CandidateGenerator` simply uses the text of a mention to find its potential
+aliases in the `KnowledgeBase`. Note that this function is case-dependent.
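The case-dependence noted above can be sketched with a plain dictionary standing in for the alias table. Everything here is hypothetical: the `ALIASES` data, the entity IDs, and both helper functions are illustrative and do not reproduce the `KnowledgeBase` API.

```python
from typing import Dict, List

# Hypothetical alias table mapping surface forms to made-up entity IDs.
ALIASES: Dict[str, List[str]] = {"Douglas": ["E1", "E2"]}


def get_candidates(aliases: Dict[str, List[str]], mention_text: str) -> List[str]:
    """Exact-match lookup: the mention text must match the alias's casing."""
    return list(aliases.get(mention_text, []))


def get_candidates_ignore_case(
    aliases: Dict[str, List[str]], mention_text: str
) -> List[str]:
    """A custom variant that compares lowercased forms instead."""
    found: List[str] = []
    for alias, entity_ids in aliases.items():
        if alias.lower() == mention_text.lower():
            found.extend(entity_ids)
    return found


exact_hit = get_candidates(ALIASES, "Douglas")
exact_miss = get_candidates(ALIASES, "douglas")
relaxed_hit = get_candidates_ignore_case(ALIASES, "douglas")
print(exact_hit, exact_miss, relaxed_hit)
```

If mentions in your data vary in casing, a custom candidate generator along the lines of the second function may be worth registering instead of the default.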
--- a/website/docs/api/language.md
+++ b/website/docs/api/language.md
@ -242,6 +242,21 @@ a batch of [Example](/api/example) objects.
 Update the models in the pipeline.
 <Infobox variant="warning" title="Changed in v3.0">
 The `Language.update` method now takes a batch of [`Example`](/api/example)
 objects instead of the raw texts and annotations or `Doc` and `GoldParse`
 objects. An [`Example`](/api/example) streamlines how data is passed around. It
 stores two `Doc` objects: one for holding the gold-standard reference data, and
 one for holding the predictions of the pipeline.
 For most use cases, you shouldn't have to write your own training scripts
 anymore. Instead, you can use [`spacy train`](/api/cli#train) with a config file
 and custom registered functions if needed. See the
 [training documentation](/usage/training) for details.
 </Infobox>
 > #### Example
 >
 > ```python
@ -253,7 +268,7 @@ Update the models in the pipeline.
 | Name            | Type                                                | Description                                                                                            |
 | --------------- | --------------------------------------------------- | ------------------------------------------------------------------------------------------------------ |
-| `examples`      | `Iterable[Example]`                                 | A batch of `Example` objects to learn from.                                                            |
+| `examples`      | `Iterable[Example]`                                 | A batch of [`Example`](/api/example) objects to learn from.                                            |
 | _keyword-only_  |                                                     |                                                                                                        |
 | `drop`          | float                                               | The dropout rate.                                                                                      |
 | `sgd`           | [`Optimizer`](https://thinc.ai/docs/api-optimizers) | The optimizer.                                                                                         |
--- a/website/docs/api/lemmatizer.md
+++ b/website/docs/api/lemmatizer.md
@ -9,6 +9,28 @@ api_string_name: lemmatizer
 api_trainable: false
 ---
 Component for assigning base forms to tokens using rules based on part-of-speech
 tags, or lookup tables. Functionality to train the component is coming soon.
 Different [`Language`](/api/language) subclasses can implement their own
 lemmatizer components via
 [language-specific factories](/usage/processing-pipelines#factories-language).
 The default data used is provided by the
 [`spacy-lookups-data`](https://github.com/explosion/spacy-lookups-data)
 extension package.
 <Infobox variant="warning" title="New in v3.0">
 As of v3.0, the `Lemmatizer` is a **standalone pipeline component** that can be
 added to your pipeline, and not a hidden part of the vocab that runs behind the
 scenes. This makes it easier to customize how lemmas should be assigned in your
 pipeline.
 If the lemmatization mode is set to `"rule"` and requires part-of-speech tags to
 be assigned, make sure a [`Tagger`](/api/tagger) or another component assigning
 tags is available in the pipeline and runs _before_ the lemmatizer.
 </Infobox>
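The ordering requirement for `"rule"` mode can be checked with a few lines of plain Python. The helper `runs_before` is hypothetical; in a real pipeline the list of names would typically come from `nlp.pipe_names`.

```python
from typing import Sequence


def runs_before(pipe_names: Sequence[str], first: str, second: str) -> bool:
    """Return True if both components are present and `first` precedes `second`."""
    if first not in pipe_names or second not in pipe_names:
        return False
    return list(pipe_names).index(first) < list(pipe_names).index(second)


# Illustrative pipeline orders.
good_order = ["tok2vec", "tagger", "lemmatizer"]
bad_order = ["lemmatizer", "tagger"]
no_tagger = ["lemmatizer"]

print(runs_before(good_order, "tagger", "lemmatizer"))
print(runs_before(bad_order, "tagger", "lemmatizer"))
print(runs_before(no_tagger, "tagger", "lemmatizer"))
```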
 ## Config and implementation
 The default config is defined by the pipeline component factory and describes
@ -29,7 +51,7 @@ lemmatizers, see the
 | Setting     | Type                                       | Description                                                                                                                                                                            | Default    |
 | ----------- | ------------------------------------------ | -------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------- | ---------- |
-| `mode`      | str                                        | The lemmatizer mode, e.g. "lookup" or "rule".                                                                                                                                          | `"lookup"` |
+| `mode`      | str                                        | The lemmatizer mode, e.g. `"lookup"` or `"rule"`.                                                                                                                                      | `"lookup"` |
 | `lookups`   | [`Lookups`](/api/lookups)                  | The lookups object containing the tables such as `"lemma_rules"`, `"lemma_index"`, `"lemma_exc"` and `"lemma_lookup"`. If `None`, default tables are loaded from `spacy-lookups-data`. | `None`     |
 | `overwrite` | bool                                       | Whether to overwrite existing lemmas.                                                                                                                                                  | `False`    |
 | `model`     | [`Model`](https://thinc.ai/docs/api-model) | **Not yet implemented:** the model to use.                                                                                                                                             | `None`     |
@ -55,15 +77,15 @@ Create a new pipeline instance. In your application, you would normally use a
 shortcut for this and instantiate the component using its string name and
 [`nlp.add_pipe`](/api/language#add_pipe).
-| Name           | Type                                       | Description                                                                                                                      |
+| Name           | Type                                       | Description                                                                                                                              |
-| -------------- | ------------------------------------------ | -------------------------------------------------------------------------------------------------------------------------------- |
+| -------------- | ------------------------------------------ | ---------------------------------------------------------------------------------------------------------------------------------------- |
-| `vocab`        | [`Vocab`](/api/vocab)                      | The vocab.                                                                                                                       |
+| `vocab`        | [`Vocab`](/api/vocab)                      | The vocab.                                                                                                                               |
-| `model`        | [`Model`](https://thinc.ai/docs/api-model) | A model (not yet implemented).                                                                                                   |
+| `model`        | [`Model`](https://thinc.ai/docs/api-model) | A model (not yet implemented).                                                                                                           |
-| `name`         | str                                        | String name of the component instance. Used to add entries to the `losses` during training.                                      |
+| `name`         | str                                        | String name of the component instance. Used to add entries to the `losses` during training.                                              |
-| _keyword-only_ |                                            |                                                                                                                                  |
+| _keyword-only_ |                                            |                                                                                                                                          |
-| mode           | str                                        | The lemmatizer mode, e.g. "lookup" or "rule". Defaults to "lookup".                                                              |
+| mode           | str                                        | The lemmatizer mode, e.g. `"lookup"` or `"rule"`. Defaults to `"lookup"`.                                                                |
-| lookups        | [`Lookups`](/api/lookups)                  | A lookups object containing the tables such as "lemma_rules", "lemma_index", "lemma_exc" and "lemma_lookup". Defaults to `None`. |
+| lookups        | [`Lookups`](/api/lookups)                  | A lookups object containing the tables such as `"lemma_rules"`, `"lemma_index"`, `"lemma_exc"` and `"lemma_lookup"`. Defaults to `None`. |
-| overwrite      | bool                                       | Whether to overwrite existing lemmas.                                                                                            |
+| overwrite      | bool                                       | Whether to overwrite existing lemmas.                                                                                                    |
 ## Lemmatizer.\_\_call\_\_ {#call tag="method"}
--- a/website/docs/api/transformer.md
+++ b/website/docs/api/transformer.md
@ -25,8 +25,15 @@ work out-of-the-box.
 </Infobox>
-This pipeline component lets you use transformer models in your pipeline. The
-component assigns the output of the transformer to the Doc's extension
+This pipeline component lets you use transformer models in your pipeline, using
+the [HuggingFace `transformers`](https://huggingface.co/transformers) library
 under the hood. Usually you will connect subsequent components to the shared
 transformer using the
 [TransformerListener](/api/architectures#TransformerListener) layer. This works
 similarly to spaCy's [Tok2Vec](/api/tok2vec) component and
 [Tok2VecListener](/api/architectures#Tok2VecListener) sublayer.
 The component assigns the output of the transformer to the `Doc`'s extension
 attributes. We also calculate an alignment between the word-piece tokens and the
 spaCy tokenization, so that we can use the last hidden states to set the
 `Doc.tensor` attribute. When multiple word-piece tokens align to the same spaCy
@ -53,11 +60,11 @@ architectures and their arguments and hyperparameters.
 > nlp.add_pipe("transformer", config=DEFAULT_CONFIG)
 > ```
-| Setting             | Type                                       | Description                                                                                                                                                         | Default                                                 |
+| Setting             | Type                                       | Description                                                                                                                                                                                                                                                                                     | Default                                                 |
-| ------------------- | ------------------------------------------ | ------------------------------------------------------------------------------------------------------------------------------------------------------------------- | ------------------------------------------------------- |
+| ------------------- | ------------------------------------------ | ----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------- | ------------------------------------------------------- |
-| `max_batch_items`   | int                                        | Maximum size of a padded batch.                                                                                                                                     | `4096`                                                  |
+| `max_batch_items`   | int                                        | Maximum size of a padded batch.                                                                                                                                                                                                                                                                 | `4096`                                                  |
-| `annotation_setter` | Callable                                   | Function that takes a batch of `Doc` objects and a [`FullTransformerBatch`](/api/transformer#fulltransformerbatch) and can set additional annotations on the `Doc`. | `null_annotation_setter`                                |
+| `annotation_setter` | Callable                                   | Function that takes a batch of `Doc` objects and a [`FullTransformerBatch`](/api/transformer#fulltransformerbatch) and can set additional annotations on the `Doc`. The `Doc._.trf_data` attribute is set prior to calling the callback. By default, no additional annotations are set.         | `null_annotation_setter`                                |
-| `model`             | [`Model`](https://thinc.ai/docs/api-model) | The model to use.                                                                                                                                                   | [TransformerModel](/api/architectures#TransformerModel) |
+| `model`             | [`Model`](https://thinc.ai/docs/api-model) | **Input:** `List[Doc]`. **Output:** [`FullTransformerBatch`](/api/transformer#fulltransformerbatch). The Thinc [`Model`](https://thinc.ai/docs/api-model) wrapping the transformer.                                                                                                             | [TransformerModel](/api/architectures#TransformerModel) |
 ```python
 https://github.com/explosion/spacy-transformers/blob/master/spacy_transformers/pipeline_component.py
@ -86,18 +93,22 @@ https://github.com/explosion/spacy-transformers/blob/master/spacy_transformers/p
 > trf = Transformer(nlp.vocab, model)
 > ```
-Create a new pipeline instance. In your application, you would normally use a
-shortcut for this and instantiate the component using its string name and
-[`nlp.add_pipe`](/api/language#create_pipe).
+Construct a `Transformer` component. One or more subsequent spaCy components can
+use the transformer outputs as features in their models, with gradients
+backpropagated to the single shared weights. The activations from the
 transformer are saved in the [`Doc._.trf_data`](#custom-attributes) extension
 attribute. You can also provide a callback to set additional annotations. In
 your application, you would normally use a shortcut for this and instantiate the
 component using its string name and [`nlp.add_pipe`](/api/language#add_pipe).
-| Name                | Type                                       | Description                                                                                                                                                                                                                             |
+| Name                | Type                                       | Description                                                                                                                                                                                                                                                                                     |
-| ------------------- | ------------------------------------------ | --------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------- |
+| ------------------- | ------------------------------------------ | ----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------- |
-| `vocab`             | `Vocab`                                    | The shared vocabulary.                                                                                                                                                                                                                  |
+| `vocab`             | `Vocab`                                    | The shared vocabulary.                                                                                                                                                                                                                                                                          |
-| `model`             | [`Model`](https://thinc.ai/docs/api-model) | The Thinc [`Model`](https://thinc.ai/docs/api-model) powering the pipeline component.                                                                                                                                                   |
+| `model`             | [`Model`](https://thinc.ai/docs/api-model) | **Input:** `List[Doc]`. **Output:** [`FullTransformerBatch`](/api/transformer#fulltransformerbatch). The Thinc [`Model`](https://thinc.ai/docs/api-model) wrapping the transformer. Usually you will want to use the [TransformerModel](/api/architectures#TransformerModel) layer for this.    |
-| `annotation_setter` | `Callable`                                 | Function that takes a batch of `Doc` objects and a [`FullTransformerBatch`](/api/transformer#fulltransformerbatch) and can set additional annotations on the `Doc`. Defaults to `null_annotation_setter`, a function that does nothing. |
+| `annotation_setter` | `Callable`                                 | Function that takes a batch of `Doc` objects and a [`FullTransformerBatch`](/api/transformer#fulltransformerbatch) and can set additional annotations on the `Doc`. The `Doc._.trf_data` attribute is set prior to calling the callback. By default, no additional annotations are set.         |
-| _keyword-only_      |                                            |                                                                                                                                                                                                                                         |
+| _keyword-only_      |                                            |                                                                                                                                                                                                                                                                                                 |
-| `name`              | str                                        | String name of the component instance. Used to add entries to the `losses` during training.                                                                                                                                             |
+| `name`              | str                                        | String name of the component instance. Used to add entries to the `losses` during training.                                                                                                                                                                                                     |
-| `max_batch_items`   | int                                        | Maximum size of a padded batch. Defaults to `128*32`.                                                                                                                                                                                   |
+| `max_batch_items`   | int                                        | Maximum size of a padded batch. Defaults to `128*32`.                                                                                                                                                                                                                                           |
 ## Transformer.\_\_call\_\_ {#call tag="method"}
@ -184,7 +195,10 @@ Apply the pipeline's model to a batch of docs, without modifying them.
 ## Transformer.set_annotations {#set_annotations tag="method"}
-Modify a batch of documents, using pre-computed scores.
+Assign the extracted features to the `Doc` objects. By default, the
 [`TransformerData`](/api/transformer#transformerdata) object is written to the
 [`Doc._.trf_data`](#custom-attributes) attribute. Your `annotation_setter`
 callback is then called, if provided.
 > #### Example
 >
@ -201,8 +215,19 @@ Modify a batch of documents, using pre-computed scores.
 ## Transformer.update {#update tag="method"}
-Learn from a batch of documents and gold-standard information, updating the
-pipe's model. Delegates to [`predict`](/api/transformer#predict).
+Prepare for an update to the transformer. Like the [`Tok2Vec`](/api/tok2vec)
+component, the `Transformer` component is unusual in that it does not receive
 "gold standard" annotations to calculate a weight update. The optimal output of
 the transformer is unknown: it's a hidden layer inside the network that is
 updated by backpropagating from output layers.
 The `Transformer` component therefore does **not** perform a weight update
 during its own `update` method. Instead, it runs its transformer model and
 communicates the output and the backpropagation callback to any **downstream
 components** that have been connected to it via the
 [TransformerListener](/api/architectures#TransformerListener) sublayer. If there
 are multiple listeners, the last layer will actually backprop to the transformer
 and call the optimizer, while the others simply increment the gradients.
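The division of labor described above can be modeled with a toy class. This is a sketch under stated assumptions, not the spacy-transformers implementation: the class `SharedTransformer`, its method names, and the `apply_update` callback are invented for illustration, and gradients are plain lists of floats.

```python
from typing import Callable, List


class SharedTransformer:
    """Toy stand-in for a shared component that defers its weight update."""

    def __init__(self, n_listeners: int, apply_update: Callable[[List[float]], None]):
        self.n_listeners = n_listeners
        self.apply_update = apply_update
        self.accumulated: List[float] = []
        self.received = 0

    def begin_update(self, width: int) -> Callable[[List[float]], None]:
        """Stand-in for the forward pass: reset state, hand out the callback."""
        self.accumulated = [0.0] * width
        self.received = 0
        return self.backprop

    def backprop(self, d_output: List[float]) -> None:
        """Called once by each listener with its gradient."""
        self.accumulated = [a + g for a, g in zip(self.accumulated, d_output)]
        self.received += 1
        if self.received == self.n_listeners:
            # Only the last listener triggers the actual update; the earlier
            # ones merely incremented the accumulated gradient.
            self.apply_update(self.accumulated)


updates: List[List[float]] = []
trf = SharedTransformer(n_listeners=2, apply_update=updates.append)
backprop = trf.begin_update(width=2)

backprop([1.0, 0.5])  # first listener: gradient is accumulated only
updates_after_first = list(updates)

backprop([2.0, -0.5])  # last listener: the summed gradient is applied
print(updates_after_first, updates)
```

Deferring the update until every listener has reported means the shared weights see the summed gradient once per batch instead of being updated several times.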
 > #### Example
 >
@ -212,15 +237,15 @@ pipe's model. Delegates to [`predict`](/api/transformer#predict).
 > losses = trf.update(examples, sgd=optimizer)
 > ```
-| Name              | Type                                                | Description                                                                                                                               |
+| Name              | Type                                                | Description                                                                                                                                                |
-| ----------------- | --------------------------------------------------- | ----------------------------------------------------------------------------------------------------------------------------------------- |
+| ----------------- | --------------------------------------------------- | ---------------------------------------------------------------------------------------------------------------------------------------------------------- |
-| `examples`        | `Iterable[Example]`                                 | A batch of [`Example`](/api/example) objects to learn from.                                                                               |
+| `examples`        | `Iterable[Example]`                                 | A batch of [`Example`](/api/example) objects. Only the [`Example.predicted`](/api/example#predicted) `Doc` object is used; the reference `Doc` is ignored. |
-| _keyword-only_    |                                                     |                                                                                                                                           |
+| _keyword-only_    |                                                     |                                                                                                                                                            |
-| `drop`            | float                                               | The dropout rate.                                                                                                                         |
+| `drop`            | float                                               | The dropout rate.                                                                                                                                          |
-| `set_annotations` | bool                                                | Whether or not to update the `Example` objects with the predictions, delegating to [`set_annotations`](/api/transformer#set_annotations). |
+| `set_annotations` | bool                                                | Whether or not to update the `Example` objects with the predictions, delegating to [`set_annotations`](/api/transformer#set_annotations).                  |
-| `sgd`             | [`Optimizer`](https://thinc.ai/docs/api-optimizers) | The optimizer.                                                                                                                            |
+| `sgd`             | [`Optimizer`](https://thinc.ai/docs/api-optimizers) | The optimizer.                                                                                                                                             |
-| `losses`          | `Dict[str, float]`                                  | Optional record of the loss during training. Updated using the component name as the key.                                                 |
+| `losses`          | `Dict[str, float]`                                  | Optional record of the loss during training. Updated using the component name as the key.                                                                  |
-| **RETURNS**       | `Dict[str, float]`                                  | The updated `losses` dictionary.                                                                                                          |
+| **RETURNS**       | `Dict[str, float]`                                  | The updated `losses` dictionary.                                                                                                                           |
 ## Transformer.create_optimizer {#create_optimizer tag="method"}
@ -396,14 +421,16 @@ Split a `TransformerData` object that represents a batch into a list with one
 ## Span getters {#span_getters tag="registered functions" source="github.com/explosion/spacy-transformers/blob/master/spacy_transformers/span_getters.py"}
 <!-- TODO: details on what this is for -->
 Span getters are functions that take a batch of [`Doc`](/api/doc) objects and
 return a list of [`Span`](/api/span) objects for each doc, to be processed by
-the transformer. The returned spans can overlap. Span getters can be referenced
+the transformer. This is used to manage long documents, by cutting them into
-in the config's `[components.transformer.model.get_spans]` block to customize
+smaller sequences before running the transformer. The spans are allowed to
-the sequences processed by the transformer. You can also register custom span
+overlap, and you can also omit sections of the `Doc` if they are not relevant.
-getters using the `@registry.span_getters` decorator.
+
 Span getters can be referenced in the config's
 `[components.transformer.model.get_spans]` block to customize the sequences
 processed by the transformer. You can also register custom span getters using
 the `@registry.span_getters` decorator.
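
To make the idea of cutting a long document into overlapping sequences concrete, here is a minimal sketch in plain Python. It is not the `spacy-transformers` implementation: the function name `strided_windows` and its `window` and `stride` parameters are hypothetical, and it works on token offsets rather than real `Doc` and `Span` objects.

```python
from typing import List, Tuple


def strided_windows(n_tokens: int, window: int, stride: int) -> List[Tuple[int, int]]:
    """Return (start, end) token offsets that cover a sequence of n_tokens.

    Hypothetical helper for illustration only. Windows overlap when
    stride < window, and sections are skipped when stride > window.
    """
    if window <= 0 or stride <= 0:
        raise ValueError("window and stride must be positive")
    spans = []
    start = 0
    while start < n_tokens:
        end = min(start + window, n_tokens)
        spans.append((start, end))
        if end == n_tokens:
            # The last window reached the end of the sequence
            break
        start += stride
    return spans


tokens = "This is a long document that needs to be cut up".split()
spans = strided_windows(len(tokens), window=4, stride=3)
print(spans)
```

With `stride` smaller than `window`, neighbouring windows share tokens, which mirrors the statement above that spans are allowed to overlap.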
 > #### Example
 >
--- a/website/docs/usage/v3.md
+++ b/website/docs/usage/v3.md
@ -6,25 +6,97 @@ menu:
  - ['New Features', 'features']
  - ['Backwards Incompatibilities', 'incompat']
  - ['Migrating from v2.x', 'migrating']
  - ['Migrating plugins', 'plugins']
 ---
 ## Summary {#summary}
 ## New Features {#features}
 ### New training workflow and config system {#features-training}
 ### Transformer-based pipelines {#features-transformers}
 ### Custom models using any framework {#feautres-custom-models}
 ### Manage end-to-end workflows with projects {#features-projects}
 ### New built-in pipeline components {#features-pipeline-components}
 | Name                                            | Description                                                                                                                                                                                                  |
 | ----------------------------------------------- | ------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------ |
 | [`SentenceRecognizer`](/api/sentencerecognizer) | Trainable component for sentence segmentation.                                                                                                                                                               |
 | [`Morphologizer`](/api/morphologizer)           | Trainable component to predict morphological features.                                                                                                                                                       |
 | [`Lemmatizer`](/api/lemmatizer)                 | Standalone component for rule-based and lookup lemmatization.                                                                                                                                                |
 | [`AttributeRuler`](/api/attributeruler)         | Component for setting token attributes using match patterns.                                                                                                                                                 |
 | [`Transformer`](/api/transformer)               | Component for using [transformer models](/usage/transformers) in your pipeline, accessing outputs and aligning tokens. Provided via [`spacy-transformers`](https://github.com/explosion/spacy-transformers). |
 ### New and improved pipeline component APIs {#features-components}
 - `Language.factory`, `Language.component`
 - `Language.analyze_pipes`
 - Adding components from other models
 ### Type hints and type-based data validation {#features-types}
 spaCy v3.0 officially drops support for Python 2 and now requires **Python
 3.6+**. This also means that the code base can take full advantage of
 [type hints](https://docs.python.org/3/library/typing.html). spaCy's user-facing
 API that's implemented in pure Python (as opposed to Cython) now comes with type
 hints. The new version of spaCy's machine learning library
 [Thinc](https://thinc.ai) also features extensive
 [type support](https://thinc.ai/docs/usage-type-checking/), including custom
 types for models and arrays, and a custom `mypy` plugin that can be used to
 type-check model definitions.
 For data validation, spaCy v3.0 adopts
 [`pydantic`](https://github.com/samuelcolvin/pydantic). It also powers the data
 validation of Thinc's [config system](https://thinc.ai/docs/usage-config), which
 lets you register **custom functions with typed arguments**, reference them
 in your config and see validation errors if the argument values don't match.
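
The general idea of registering a function with typed arguments and rejecting mismatched values can be sketched with the standard library alone. This is an illustration of the concept only, not how Thinc or `pydantic` implement it: `REGISTRY`, `register`, `resolve` and `make_schedule` are hypothetical names, and the check only handles plain classes such as `int` or `float`.

```python
from typing import Any, Callable, Dict, get_type_hints

# Hypothetical in-memory registry, standing in for a real function registry
REGISTRY: Dict[str, Callable[..., Any]] = {}


def register(name: str) -> Callable:
    """Register a function under a string name."""
    def decorator(func: Callable) -> Callable:
        REGISTRY[name] = func
        return func
    return decorator


def resolve(name: str, **kwargs: Any) -> Any:
    """Look up a registered function and validate kwargs against its type hints.

    Only plain classes are checked; generic hints like List[int] are not
    supported by this simplified isinstance-based check.
    """
    func = REGISTRY[name]
    hints = get_type_hints(func)
    for key, value in kwargs.items():
        expected = hints.get(key)
        if expected is not None and not isinstance(value, expected):
            raise TypeError(
                f"{name}: argument '{key}' expected {expected.__name__}, "
                f"got {type(value).__name__}"
            )
    return func(**kwargs)


@register("make_schedule.v1")
def make_schedule(start: float, steps: int) -> list:
    return [start / (i + 1) for i in range(steps)]


schedule = resolve("make_schedule.v1", start=1.0, steps=3)

try:
    resolve("make_schedule.v1", start=1.0, steps="3")
except TypeError as err:
    error_message = str(err)
```

A real validation library covers far more cases (nested types, coercion, structured error reports); the sketch only shows why typed arguments make config errors detectable before the function runs.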
 ### CLI
 | Name                                    | Description                                                                                              |
 | --------------------------------------- | -------------------------------------------------------------------------------------------------------- |
 | [`init config`](/api/cli#init-config)   | Initialize a [training config](/usage/training) file for a blank language or auto-fill a partial config. |
 | [`debug config`](/api/cli#debug-config) | Debug a [training config](/usage/training) file and show validation errors.                              |
 | [`project`](/api/cli#project)           | Subcommand for cloning and running [spaCy projects](/usage/projects).                                    |
 ## Backwards Incompatibilities {#incompat}
-### Removed or renamed objects, methods, attributes and arguments {#incompat-removed}
+As always, we've tried to keep the breaking changes to a minimum and focus on
 changes that were necessary to support the new features, fix problems or improve
 usability. The following section lists the relevant changes to the user-facing
 API. For specific examples of how to rewrite your code, check out the
 [migration guide](#migrating).
-| Removed                                                  | Replacement                               |
+### Compatibility {#incompat-compat}
 | -------------------------------------------------------- | ----------------------------------------- |
 | `GoldParse`                                              | [`Example`](/api/example)                 |
 | `GoldCorpus`                                             | [`Corpus`](/api/corpus)                   |
 | `spacy debug-data`                                       | [`spacy debug data`](/api/cli#debug-data) |
 | `spacy link`, `util.set_data_path`, `util.get_data_path` | not needed, model symlinks are deprecated |
-### Removed deprecated methods, attributes and arguments {#incompat-removed-deprecated}
+- spaCy now requires **Python 3.6+**.
 ### API changes {#incompat-api}
 - [`Language.add_pipe`](/api/language#add_pipe) now takes the **string name** of
  the component factory instead of the component function.
 - **Custom pipeline components** now need to be decorated with the
  [`@Language.component`](/api/language#component) or
  [`@Language.factory`](/api/language#factory) decorator.
 - [`Language.update`](/api/language#update) now takes a batch of
  [`Example`](/api/example) objects instead of raw texts and annotations, or
  `Doc` and `GoldParse` objects.
 - The `Language.disable_pipes` context manager has been replaced by
  [`Language.select_pipes`](/api/language#select_pipes), which can explicitly
  disable or enable components.
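
The changes listed above can be sketched roughly as follows. This assumes a released spaCy v3.x installation, where `Example` is importable from `spacy.training`; the component name `debug_length` and the sample text are made up for illustration.

```python
import spacy
from spacy.language import Language
from spacy.training import Example


# Custom components are registered under a string name via the decorator
@Language.component("debug_length")
def debug_length(doc):
    # A stateless component receives a Doc and must return it
    return doc


nlp = spacy.blank("en")
# add_pipe takes the string name of the component, not the function itself
nlp.add_pipe("debug_length")

# Training data is passed around as Example objects, which replace GoldParse
doc = nlp.make_doc("Alice sleeps")
example = Example.from_dict(doc, {"entities": [(0, 5, "PERSON")]})

# select_pipes replaces disable_pipes and takes an explicit disable/enable list
with nlp.select_pipes(disable=["debug_length"]):
    names_while_disabled = list(nlp.pipe_names)
```

Inside the `with` block the disabled component is no longer listed in `nlp.pipe_names`; it is restored when the block exits.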
 ### Removed or renamed API {#incompat-removed}
 | Removed                                                  | Replacement                                           |
 | -------------------------------------------------------- | ----------------------------------------------------- |
 | `Language.disable_pipes`                                 | [`Language.select_pipes`](/api/language#select_pipes) |
 | `GoldParse`                                              | [`Example`](/api/example)                             |
 | `GoldCorpus`                                             | [`Corpus`](/api/corpus)                               |
 | `spacy debug-data`                                       | [`spacy debug data`](/api/cli#debug-data)             |
 | `spacy link`, `util.set_data_path`, `util.get_data_path` | not needed, model symlinks are deprecated             |
 The following deprecated methods, attributes and arguments were removed in v3.0.
 Most of them have been **deprecated for a while** and many would previously
@ -214,17 +286,14 @@ python -m spacy package ./model ./packages
 - python setup.py sdist
 ```
-## Migration notes for plugin maintainers {#plugins}
+#### Migration notes for plugin maintainers {#migrating-plugins}
 Thanks to everyone who's been contributing to the spaCy ecosystem by developing
 and maintaining one of the many awesome [plugins and extensions](/universe).
-We've tried to keep breaking changes to a minimum and make it as easy as
+We've tried to make it as easy as possible for you to upgrade your packages for
-possible for you to upgrade your packages for spaCy v3.
+spaCy v3. The most common use case for plugins is providing pipeline components
-
+and extension attributes. When migrating your plugin, double-check the
-### Custom pipeline components
+following:
 The most common use case for plugins is providing pipeline components and
 extension attributes.
 - Use the [`@Language.factory`](/api/language#factory) decorator to register
  your component and assign it a name. This allows users to refer to your