Mirror of https://github.com/explosion/spaCy.git (synced 2024-12-25 09:26:27 +03:00)

Merge branch 'develop' of https://github.com/explosion/spaCy into develop

This commit is contained in: adf0bab23a

README.md | 78 lines changed
@@ -4,17 +4,19 @@
 spaCy is a library for advanced Natural Language Processing in Python and
 Cython. It's built on the very latest research, and was designed from day one to
-be used in real products. spaCy comes with
-[pretrained statistical models](https://spacy.io/models) and word vectors, and
-currently supports tokenization for **60+ languages**. It features
+be used in real products.
+
+spaCy comes with
+[pretrained pipelines](https://spacy.io/models) and vectors, and
+currently supports tokenization for **59+ languages**. It features
 state-of-the-art speed, convolutional **neural network models** for tagging,
-parsing and **named entity recognition** and easy **deep learning** integration.
-It's commercial open-source software, released under the MIT license.
+parsing, **named entity recognition**, **text classification** and more, multi-task learning with pretrained **transformers** like BERT, as well as a production-ready training system and easy model packaging, deployment and workflow management.
+spaCy is commercial open-source software, released under the MIT license.

 💫 **Version 2.3 out now!**
 [Check out the release notes here.](https://github.com/explosion/spaCy/releases)

 [![Azure Pipelines](<https://img.shields.io/azure-devops/build/explosion-ai/public/8/master.svg?logo=azure-pipelines&style=flat-square&label=build+(3.x)>)](https://dev.azure.com/explosion-ai/public/_build?definitionId=8)
 [![Azure Pipelines](https://img.shields.io/azure-devops/build/explosion-ai/public/8/master.svg?logo=azure-pipelines&style=flat-square&label=build)](https://dev.azure.com/explosion-ai/public/_build?definitionId=8)
 [![Current Release Version](https://img.shields.io/github/release/explosion/spacy.svg?style=flat-square&logo=github)](https://github.com/explosion/spaCy/releases)
 [![pypi Version](https://img.shields.io/pypi/v/spacy.svg?style=flat-square&logo=pypi&logoColor=white)](https://pypi.org/project/spacy/)
 [![conda Version](https://img.shields.io/conda/vn/conda-forge/spacy.svg?style=flat-square&logo=conda-forge&logoColor=white)](https://anaconda.org/conda-forge/spacy)
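For orientation, a minimal usage sketch of the library described above — not taken from the diff itself, and assuming a trained pipeline such as `en_core_web_sm` has already been installed:

```python
import spacy

# Assumes `python -m spacy download en_core_web_sm` has been run beforehand.
nlp = spacy.load("en_core_web_sm")
doc = nlp("spaCy was developed by Explosion in Berlin.")
for ent in doc.ents:
    print(ent.text, ent.label_)
```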
@@ -31,7 +33,7 @@ It's commercial open-source software, released under the MIT license.
 | --------------- | -------------------------------------------------------------- |
 | [spaCy 101] | New to spaCy? Here's everything you need to know! |
 | [Usage Guides] | How to use spaCy and its features. |
-| [New in v2.3] | New features, backwards incompatibilities and migration guide. |
+| [New in v3.0] | New features, backwards incompatibilities and migration guide. |
 | [API Reference] | The detailed reference for spaCy's API. |
 | [Models] | Download statistical language models for spaCy. |
 | [Universe] | Libraries, extensions, demos, books and courses. |
@@ -39,7 +41,7 @@ It's commercial open-source software, released under the MIT license.
 | [Contribute] | How to contribute to the spaCy project and code base. |

 [spacy 101]: https://spacy.io/usage/spacy-101
-[new in v2.3]: https://spacy.io/usage/v2-3
+[new in v3.0]: https://spacy.io/usage/v3
 [usage guides]: https://spacy.io/usage/
 [api reference]: https://spacy.io/api/
 [models]: https://spacy.io/models
@@ -56,34 +58,29 @@ be able to provide individual support via email. We also believe that help is
 much more valuable if it's shared publicly, so that more people can benefit from
 it.

-| Type | Platforms |
-| ------------------------ | ------------------------------------------------------ |
-| 🚨 **Bug Reports** | [GitHub Issue Tracker] |
-| 🎁 **Feature Requests** | [GitHub Issue Tracker] |
-| 👩💻 **Usage Questions** | [Stack Overflow] · [Gitter Chat] · [Reddit User Group] |
-| 🗯 **General Discussion** | [Gitter Chat] · [Reddit User Group] |
+| Type | Platforms |
+| ----------------------- | ---------------------- |
+| 🚨 **Bug Reports** | [GitHub Issue Tracker] |
+| 🎁 **Feature Requests** | [GitHub Issue Tracker] |
+| 👩💻 **Usage Questions** | [Stack Overflow] |

 [github issue tracker]: https://github.com/explosion/spaCy/issues
 [stack overflow]: https://stackoverflow.com/questions/tagged/spacy
-[gitter chat]: https://gitter.im/explosion/spaCy
-[reddit user group]: https://www.reddit.com/r/spacynlp

 ## Features

-- Non-destructive **tokenization**
-- **Named entity** recognition
-- Support for **50+ languages**
-- pretrained [statistical models](https://spacy.io/models) and word vectors
+- Support for **59+ languages**
+- **Trained pipelines**
+- Multi-task learning with pretrained **transformers** like BERT
+- Pretrained **word vectors**
 - State-of-the-art speed
-- Easy **deep learning** integration
-- Part-of-speech tagging
-- Labelled dependency parsing
-- Syntax-driven sentence segmentation
+- Production-ready **training system**
+- Linguistically-motivated **tokenization**
+- Components for named **entity recognition**, part-of-speech-tagging, dependency parsing, sentence segmentation, **text classification**, lemmatization, morphological analysis, entity linking and more
+- Easily extensible with **custom components** and attributes
+- Support for custom models in **PyTorch**, **TensorFlow** and other frameworks
 - Built in **visualizers** for syntax and NER
-- Convenient string-to-hash mapping
-- Export to numpy data arrays
-- Efficient binary serialization
-- Easy **model packaging** and deployment
+- Easy **model packaging**, deployment and workflow management
 - Robust, rigorously evaluated accuracy

 📖 **For more details, see the
@@ -102,13 +99,6 @@ For detailed installation instructions, see the
 [pip]: https://pypi.org/project/spacy/
 [conda]: https://anaconda.org/conda-forge/spacy

-> ⚠️ **Important note for Python 3.8:** We can't yet ship pre-compiled binary
-> wheels for spaCy that work on Python 3.8, as we're still waiting for our CI
-> providers and other tooling to support it. This means that in order to run
-> spaCy on Python 3.8, you'll need [a compiler installed](#source) and compile
-> the library and its Cython dependencies locally. If this is causing problems
-> for you, the easiest solution is to **use Python 3.7** in the meantime.
-
 ### pip

 Using pip, spaCy releases are available as source packages and binary wheels (as
@@ -164,26 +154,26 @@ If you've trained your own models, keep in mind that your training and runtime
 inputs must match. After updating spaCy, we recommend **retraining your models**
 with the new version.

-📖 **For details on upgrading from spaCy 1.x to spaCy 2.x, see the
-[migration guide](https://spacy.io/usage/v2#migrating).**
+📖 **For details on upgrading from spaCy 2.x to spaCy 3.x, see the
+[migration guide](https://spacy.io/usage/v3#migrating).**

 ## Download models

-As of v1.7.0, models for spaCy can be installed as **Python packages**. This
+Trained pipelines for spaCy can be installed as **Python packages**. This
 means that they're a component of your application, just like any other module.
 Models can be installed using spaCy's `download` command, or manually by
 pointing pip to a path or URL.

-| Documentation | |
-| ---------------------- | ------------------------------------------------------------- |
-| [Available Models] | Detailed model descriptions, accuracy figures and benchmarks. |
-| [Models Documentation] | Detailed usage instructions. |
+| Documentation | |
+| ---------------------- | ---------------------------------------------------------------- |
+| [Available Pipelines] | Detailed pipeline descriptions, accuracy figures and benchmarks. |
+| [Models Documentation] | Detailed usage instructions. |

-[available models]: https://spacy.io/models
+[available pipelines]: https://spacy.io/models
 [models documentation]: https://spacy.io/docs/usage/models

 ```bash
-# download best-matching version of specific model for your spaCy installation
+# Download best-matching version of specific model for your spaCy installation
 python -m spacy download en_core_web_sm

 # pip install .tar.gz archive from path or URL
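Once a package has been downloaded, it can be loaded either by name or via its importable module. A short sketch (not part of the diff; assumes `en_core_web_sm` is installed):

```python
import spacy

# Load by package name after `python -m spacy download en_core_web_sm`.
nlp = spacy.load("en_core_web_sm")

# Or import the installed package directly and call its load() entry point.
import en_core_web_sm
nlp = en_core_web_sm.load()
```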
@@ -89,7 +89,6 @@ def train(
     nlp, config = util.load_model_from_config(config)
     if config["training"]["vectors"] is not None:
         util.load_vectors_into_model(nlp, config["training"]["vectors"])
-    verify_config(nlp)
     raw_text, tag_map, morph_rules, weights_data = load_from_paths(config)
     T_cfg = config["training"]
     optimizer = T_cfg["optimizer"]
@@ -108,6 +107,8 @@ def train(
             nlp.resume_training(sgd=optimizer)
     with nlp.select_pipes(disable=[*frozen_components, *resume_components]):
         nlp.begin_training(lambda: train_corpus(nlp), sgd=optimizer)
+    # Verify the config after calling 'begin_training' to ensure labels are properly initialized
+    verify_config(nlp)

     if tag_map:
         # Replace tag map with provided mapping
@@ -401,7 +402,7 @@ def verify_cli_args(config_path: Path, output_path: Optional[Path] = None) -> No


 def verify_config(nlp: Language) -> None:
-    """Perform additional checks based on the config and loaded nlp object."""
+    """Perform additional checks based on the config, loaded nlp object and training data."""
     # TODO: maybe we should validate based on the actual components, the list
     # in config["nlp"]["pipeline"] instead?
     for pipe_config in nlp.config["components"].values():
@@ -415,18 +416,13 @@ def verify_textcat_config(nlp: Language, pipe_config: Dict[str, Any]) -> None:
     # if 'positive_label' is provided: double check whether it's in the data and
     # the task is binary
     if pipe_config.get("positive_label"):
-        textcat_labels = nlp.get_pipe("textcat").cfg.get("labels", [])
+        textcat_labels = nlp.get_pipe("textcat").labels
         pos_label = pipe_config.get("positive_label")
         if pos_label not in textcat_labels:
-            msg.fail(
-                f"The textcat's 'positive_label' config setting '{pos_label}' "
-                f"does not match any label in the training data.",
-                exits=1,
+            raise ValueError(
+                Errors.E920.format(pos_label=pos_label, labels=textcat_labels)
             )
-        if len(textcat_labels) != 2:
-            msg.fail(
-                f"A textcat 'positive_label' '{pos_label}' was "
-                f"provided for training data that does not appear to be a "
-                f"binary classification problem with two labels.",
-                exits=1,
+        if len(list(textcat_labels)) != 2:
+            raise ValueError(
+                Errors.E919.format(pos_label=pos_label, labels=textcat_labels)
             )
@@ -480,6 +480,11 @@ class Errors:
     E201 = ("Span index out of range.")

     # TODO: fix numbering after merging develop into master
+    E919 = ("A textcat 'positive_label' '{pos_label}' was provided for training "
+            "data that does not appear to be a binary classification problem "
+            "with two labels. Labels found: {labels}")
+    E920 = ("The textcat's 'positive_label' config setting '{pos_label}' "
+            "does not match any label in the training data. Labels found: {labels}")
    E921 = ("The method 'set_output' can only be called on components that have "
            "a Model with a 'resize_output' attribute. Otherwise, the output "
            "layer can not be dynamically changed.")
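The two new error codes back the stricter validation in `verify_textcat_config` above: an unknown `positive_label` or a non-binary label set now raises `ValueError` rather than printing a CLI failure message. A rough sketch of what that looks like from the caller's side, assuming the spaCy v3 nightly API in this diff (label names are illustrative):

```python
import spacy
from spacy.cli.train import verify_textcat_config

nlp = spacy.blank("en")
pipe_config = {"positive_label": "POS", "labels": ["SOME", "THING"]}
nlp.add_pipe("textcat", config=pipe_config)
try:
    verify_textcat_config(nlp, pipe_config)   # "POS" is not among the labels
except ValueError as err:
    print(err)                                # message built from Errors.E920
```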
@@ -56,7 +56,12 @@ subword_features = true
 @Language.factory(
     "textcat",
     assigns=["doc.cats"],
-    default_config={"labels": [], "threshold": 0.5, "model": DEFAULT_TEXTCAT_MODEL},
+    default_config={
+        "labels": [],
+        "threshold": 0.5,
+        "positive_label": None,
+        "model": DEFAULT_TEXTCAT_MODEL,
+    },
     scores=[
         "cats_score",
         "cats_score_desc",
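With `positive_label` now part of the factory defaults, a binary textcat can be configured entirely through `nlp.add_pipe`. A hedged sketch (the `POS`/`NEG` label names are illustrative, not from the diff):

```python
import spacy

nlp = spacy.blank("en")
# The new "positive_label" default (None) can be overridden in the pipe config;
# it is stored in the component's cfg and picked up again at scoring time.
textcat = nlp.add_pipe(
    "textcat",
    config={"labels": ["POS", "NEG"], "threshold": 0.5, "positive_label": "POS"},
)
print(textcat.labels)  # ("POS", "NEG")
```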
@@ -74,8 +79,9 @@ def make_textcat(
     nlp: Language,
     name: str,
     model: Model[List[Doc], List[Floats2d]],
-    labels: Iterable[str],
+    labels: List[str],
     threshold: float,
+    positive_label: Optional[str],
 ) -> "TextCategorizer":
     """Create a TextCategorizer compoment. The text categorizer predicts categories
     over a whole document. It can learn one or more labels, and the labels can
@@ -88,8 +94,16 @@ def make_textcat(
     labels (list): A list of categories to learn. If empty, the model infers the
         categories from the data.
     threshold (float): Cutoff to consider a prediction "positive".
+    positive_label (Optional[str]): The positive label for a binary task with exclusive classes, None otherwise.
     """
-    return TextCategorizer(nlp.vocab, model, name, labels=labels, threshold=threshold)
+    return TextCategorizer(
+        nlp.vocab,
+        model,
+        name,
+        labels=labels,
+        threshold=threshold,
+        positive_label=positive_label,
+    )


 class TextCategorizer(Pipe):
@@ -104,8 +118,9 @@ class TextCategorizer(Pipe):
         model: Model,
         name: str = "textcat",
         *,
-        labels: Iterable[str],
+        labels: List[str],
         threshold: float,
+        positive_label: Optional[str],
     ) -> None:
         """Initialize a text categorizer.

@@ -113,8 +128,9 @@ class TextCategorizer(Pipe):
         model (thinc.api.Model): The Thinc Model powering the pipeline component.
         name (str): The component instance name, used to add entries to the
             losses during training.
-        labels (Iterable[str]): The labels to use.
+        labels (List[str]): The labels to use.
         threshold (float): Cutoff to consider a prediction "positive".
+        positive_label (Optional[str]): The positive label for a binary task with exclusive classes, None otherwise.

         DOCS: https://nightly.spacy.io/api/textcategorizer#init
         """
@@ -122,7 +138,11 @@ class TextCategorizer(Pipe):
         self.model = model
         self.name = name
         self._rehearsal_model = None
-        cfg = {"labels": labels, "threshold": threshold}
+        cfg = {
+            "labels": labels,
+            "threshold": threshold,
+            "positive_label": positive_label,
+        }
         self.cfg = dict(cfg)

     @property
@@ -131,10 +151,10 @@ class TextCategorizer(Pipe):

         DOCS: https://nightly.spacy.io/api/textcategorizer#labels
         """
-        return tuple(self.cfg.setdefault("labels", []))
+        return tuple(self.cfg["labels"])

     @labels.setter
-    def labels(self, value: Iterable[str]) -> None:
+    def labels(self, value: List[str]) -> None:
         self.cfg["labels"] = tuple(value)

     def pipe(self, stream: Iterable[Doc], *, batch_size: int = 128) -> Iterator[Doc]:
@@ -353,17 +373,10 @@ class TextCategorizer(Pipe):
             sgd = self.create_optimizer()
         return sgd

-    def score(
-        self,
-        examples: Iterable[Example],
-        *,
-        positive_label: Optional[str] = None,
-        **kwargs,
-    ) -> Dict[str, Any]:
+    def score(self, examples: Iterable[Example], **kwargs) -> Dict[str, Any]:
         """Score a batch of examples.

         examples (Iterable[Example]): The examples to score.
-        positive_label (str): Optional positive label.
         RETURNS (Dict[str, Any]): The scores, produced by Scorer.score_cats.

         DOCS: https://nightly.spacy.io/api/textcategorizer#score
@@ -374,7 +387,7 @@ class TextCategorizer(Pipe):
             "cats",
             labels=self.labels,
             multi_label=self.model.attrs["multi_label"],
-            positive_label=positive_label,
+            positive_label=self.cfg["positive_label"],
             threshold=self.cfg["threshold"],
             **kwargs,
         )
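Because `score` now reads `positive_label` straight from `self.cfg`, callers no longer pass it through `scorer_cfg`. A minimal sketch of the resulting call site, assuming the spaCy v3 nightly behaviour in this diff (labels and texts are illustrative):

```python
import spacy
from spacy.training import Example

nlp = spacy.blank("en")
nlp.add_pipe("textcat", config={"labels": ["POS", "NEG"], "positive_label": "POS"})
examples = [
    Example.from_dict(nlp.make_doc("great"), {"cats": {"POS": 1.0, "NEG": 0.0}}),
    Example.from_dict(nlp.make_doc("awful"), {"cats": {"POS": 0.0, "NEG": 1.0}}),
]
nlp.begin_training(lambda: examples)
scores = nlp.evaluate(examples)      # scorer_cfg={"positive_label": ...} no longer needed
print(scores["cats_score"])          # F-score for the configured positive label
```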
@@ -10,6 +10,7 @@ from spacy.tokens import Doc
 from spacy.pipeline.tok2vec import DEFAULT_TOK2VEC_MODEL

 from ..util import make_tempdir
+from ...cli.train import verify_textcat_config
 from ...training import Example


@@ -130,7 +131,10 @@ def test_overfitting_IO():
     fix_random_seed(0)
     nlp = English()
     # Set exclusive labels
-    textcat = nlp.add_pipe("textcat", config={"model": {"exclusive_classes": True}})
+    textcat = nlp.add_pipe(
+        "textcat",
+        config={"model": {"exclusive_classes": True}, "positive_label": "POSITIVE"},
+    )
     train_examples = []
     for text, annotations in TRAIN_DATA:
         train_examples.append(Example.from_dict(nlp.make_doc(text), annotations))
@@ -159,7 +163,7 @@ def test_overfitting_IO():
     assert cats2["POSITIVE"] + cats2["NEGATIVE"] == pytest.approx(1.0, 0.001)

     # Test scoring
-    scores = nlp.evaluate(train_examples, scorer_cfg={"positive_label": "POSITIVE"})
+    scores = nlp.evaluate(train_examples)
     assert scores["cats_micro_f"] == 1.0
     assert scores["cats_score"] == 1.0
     assert "cats_score_desc" in scores
@@ -194,3 +198,29 @@ def test_textcat_configs(textcat_config):
     for i in range(5):
         losses = {}
         nlp.update(train_examples, sgd=optimizer, losses=losses)
+
+
+def test_positive_class():
+    nlp = English()
+    pipe_config = {"positive_label": "POS", "labels": ["POS", "NEG"]}
+    textcat = nlp.add_pipe("textcat", config=pipe_config)
+    assert textcat.labels == ("POS", "NEG")
+    verify_textcat_config(nlp, pipe_config)
+
+
+def test_positive_class_not_present():
+    nlp = English()
+    pipe_config = {"positive_label": "POS", "labels": ["SOME", "THING"]}
+    textcat = nlp.add_pipe("textcat", config=pipe_config)
+    assert textcat.labels == ("SOME", "THING")
+    with pytest.raises(ValueError):
+        verify_textcat_config(nlp, pipe_config)
+
+
+def test_positive_class_not_binary():
+    nlp = English()
+    pipe_config = {"positive_label": "POS", "labels": ["SOME", "THING", "POS"]}
+    textcat = nlp.add_pipe("textcat", config=pipe_config)
+    assert textcat.labels == ("SOME", "THING", "POS")
+    with pytest.raises(ValueError):
+        verify_textcat_config(nlp, pipe_config)
@@ -136,7 +136,7 @@ def test_serialize_textcat_empty(en_vocab):
     # See issue #1105
     cfg = {"model": DEFAULT_TEXTCAT_MODEL}
     model = registry.make_from_config(cfg, validate=True)["model"]
-    textcat = TextCategorizer(en_vocab, model, labels=["ENTITY", "ACTION", "MODIFIER"], threshold=0.5)
+    textcat = TextCategorizer(en_vocab, model, labels=["ENTITY", "ACTION", "MODIFIER"], threshold=0.5, positive_label=None)
     textcat.to_bytes(exclude=["vocab"])

@@ -630,3 +630,49 @@ In addition to the native markdown elements, you can use the components
 ├── gatsby-node.js # Node-specific hooks for Gatsby
 └── package.json # package settings and dependencies
 ```
+
+## Editorial {#editorial}
+
+- "spaCy" should always be spelled with a lowercase "s" and a capital "C",
+  unless it specifically refers to the Python package or Python import `spacy`
+  (in which case it should be formatted as code).
+  - ✅ spaCy is a library for advanced NLP in Python.
+  - ❌ Spacy is a library for advanced NLP in Python.
+  - ✅ First, you need to install the `spacy` package from pip.
+- Mentions of code, like function names, classes, variable names etc. in inline
+  text should be formatted as `code`.
+  - ✅ "Calling the `nlp` object on a text returns a `Doc`."
+- Objects that have pages in the [API docs](/api) should be linked – for
+  example, [`Doc`](/api/doc) or [`Language.to_disk`](/api/language#to_disk). The
+  mentions should still be formatted as code within the link. Links pointing to
+  the API docs will automatically receive a little icon. However, if a paragraph
+  includes many references to the API, the links can easily get messy. In that
+  case, we typically only link the first mention of an object and not any
+  subsequent ones.
+  - ✅ The [`Span`](/api/span) and [`Token`](/api/token) objects are views of a
+    [`Doc`](/api/doc). [`Span.as_doc`](/api/span#as_doc) creates a `Doc` object
+    from a `Span`.
+  - ❌ The [`Span`](/api/span) and [`Token`](/api/token) objects are views of a
+    [`Doc`](/api/doc). [`Span.as_doc`](/api/span#as_doc) creates a
+    [`Doc`](/api/doc) object from a [`Span`](/api/span).
+
+* Other things we format as code are: references to trained pipeline packages
+  like `en_core_web_sm` or file names like `code.py` or `meta.json`.
+
+  - ✅ After training, the `config.cfg` is saved to disk.
+
+* [Type annotations](#type-annotations) are a special type of code formatting,
+  expressed by wrapping the text in `~~` instead of backticks. The result looks
+  like this: ~~List[Doc]~~. All references to known types will be linked
+  automatically.
+
+  - ✅ The model has the input type ~~List[Doc]~~ and it outputs a
+    ~~List[Array2d]~~.
+
+* We try to keep links meaningful but short.
+  - ✅ For details, see the usage guide on
+    [training with custom code](/usage/training#custom-code).
+  - ❌ For details, see
+    [the usage guide on training with custom code](/usage/training#custom-code).
+  - ❌ For details, see the usage guide on training with custom code
+    [here](/usage/training#custom-code).
@@ -183,7 +183,7 @@ will be overwritten.
 | -------------- | -------------------------------------------------------------------------------------------------------------------------------------------------------------------- |
 | `match_id` | An ID for the patterns. ~~str~~ |
 | `patterns` | A list of match patterns. A pattern consists of a list of dicts, where each dict describes a token in the tree. ~~List[List[Dict[str, Union[str, Dict]]]]~~ |
-| _keyword-only_ | | |
+| _keyword-only_ | |
 | `on_match` | Callback function to act on matches. Takes the arguments `matcher`, `doc`, `i` and `matches`. ~~Optional[Callable[[DependencyMatcher, Doc, int, List[Tuple], Any]]~~ |

 ## DependencyMatcher.get {#get tag="method"}
@@ -217,7 +217,7 @@ model. Delegates to [`predict`](/api/dependencyparser#predict) and
 | Name | Description |
 | ----------------- | ---------------------------------------------------------------------------------------------------------------------------------- |
 | `examples` | A batch of [`Example`](/api/example) objects to learn from. ~~Iterable[Example]~~ |
-| _keyword-only_ | | |
+| _keyword-only_ | |
 | `drop` | The dropout rate. ~~float~~ |
 | `set_annotations` | Whether or not to update the `Example` objects with the predictions, delegating to [`set_annotations`](#set_annotations). ~~bool~~ |
 | `sgd` | An optimizer. Will be created via [`create_optimizer`](#create_optimizer) if not set. ~~Optional[Optimizer]~~ |
@@ -85,7 +85,7 @@ providing custom registered functions.
 | `vocab` | The shared vocabulary. ~~Vocab~~ |
 | `model` | The [`Model`](https://thinc.ai/docs/api-model) powering the pipeline component. ~~Model~~ |
 | `name` | String name of the component instance. Used to add entries to the `losses` during training. ~~str~~ |
-| _keyword-only_ | | |
+| _keyword-only_ | |
 | `kb_loader` | Function that creates a [`KnowledgeBase`](/api/kb) from a `Vocab` instance. ~~Callable[[Vocab], KnowledgeBase]~~ |
 | `get_candidates` | Function that generates plausible candidates for a given `Span` object. ~~Callable[[KnowledgeBase, Span], Iterable[Candidate]]~~ |
 | `labels_discard` | NER labels that will automatically get a `"NIL"` prediction. ~~Iterable[str]~~ |

@@ -218,7 +218,7 @@ pipe's entity linking model and context encoder. Delegates to
 | Name | Description |
 | ----------------- | ---------------------------------------------------------------------------------------------------------------------------------- |
 | `examples` | A batch of [`Example`](/api/example) objects to learn from. ~~Iterable[Example]~~ |
-| _keyword-only_ | | |
+| _keyword-only_ | |
 | `drop` | The dropout rate. ~~float~~ |
 | `set_annotations` | Whether or not to update the `Example` objects with the predictions, delegating to [`set_annotations`](#set_annotations). ~~bool~~ |
 | `sgd` | An optimizer. Will be created via [`create_optimizer`](#create_optimizer) if not set. ~~Optional[Optimizer]~~ |
@@ -206,7 +206,7 @@ model. Delegates to [`predict`](/api/entityrecognizer#predict) and
 | Name | Description |
 | ----------------- | ---------------------------------------------------------------------------------------------------------------------------------- |
 | `examples` | A batch of [`Example`](/api/example) objects to learn from. ~~Iterable[Example]~~ |
-| _keyword-only_ | | |
+| _keyword-only_ | |
 | `drop` | The dropout rate. ~~float~~ |
 | `set_annotations` | Whether or not to update the `Example` objects with the predictions, delegating to [`set_annotations`](#set_annotations). ~~bool~~ |
 | `sgd` | An optimizer. Will be created via [`create_optimizer`](#create_optimizer) if not set. ~~Optional[Optimizer]~~ |
@@ -255,7 +255,7 @@ Get all patterns that were added to the entity ruler.

 | Name | Description |
 | ----------------- | --------------------------------------------------------------------------------------------------------------------- |
-| `matcher` | The underlying matcher used to process token patterns. ~~Matcher~~ | |
+| `matcher` | The underlying matcher used to process token patterns. ~~Matcher~~ |
 | `phrase_matcher` | The underlying phrase matcher, used to process phrase patterns. ~~PhraseMatcher~~ |
 | `token_patterns` | The token patterns present in the entity ruler, keyed by label. ~~Dict[str, List[Dict[str, Union[str, List[dict]]]]~~ |
 | `phrase_patterns` | The phrase patterns present in the entity ruler, keyed by label. ~~Dict[str, List[Doc]]~~ |
@@ -81,7 +81,7 @@ shortcut for this and instantiate the component using its string name and
 | `vocab` | The shared vocabulary. ~~Vocab~~ |
 | `model` | **Not yet implemented:** The model to use. ~~Model~~ |
 | `name` | String name of the component instance. Used to add entries to the `losses` during training. ~~str~~ |
-| _keyword-only_ | | |
+| _keyword-only_ | |
 | mode | The lemmatizer mode, e.g. `"lookup"` or `"rule"`. Defaults to `"lookup"`. ~~str~~ |
 | lookups | A lookups object containing the tables such as `"lemma_rules"`, `"lemma_index"`, `"lemma_exc"` and `"lemma_lookup"`. Defaults to `None`. ~~Optional[Lookups]~~ |
 | overwrite | Whether to overwrite existing lemmas. ~~bool~ |
@@ -139,7 +139,7 @@ setting up the label scheme based on the data.
 | Name | Description |
 | -------------- | ------------------------------------------------------------------------------------------------------------------------------------- |
 | `get_examples` | Function that returns gold-standard annotations in the form of [`Example`](/api/example) objects. ~~Callable[[], Iterable[Example]]~~ |
-| _keyword-only_ | | |
+| _keyword-only_ | |
 | `pipeline` | Optional list of pipeline components that this component is part of. ~~Optional[List[Tuple[str, Callable[[Doc], Doc]]]]~~ |
 | `sgd` | An optimizer. Will be created via [`create_optimizer`](#create_optimizer) if not set. ~~Optional[Optimizer]~~ |
 | **RETURNS** | The optimizer. ~~Optimizer~~ |

@@ -196,7 +196,7 @@ Delegates to [`predict`](/api/morphologizer#predict) and
 | Name | Description |
 | ----------------- | ---------------------------------------------------------------------------------------------------------------------------------- |
 | `examples` | A batch of [`Example`](/api/example) objects to learn from. ~~Iterable[Example]~~ |
-| _keyword-only_ | | |
+| _keyword-only_ | |
 | `drop` | The dropout rate. ~~float~~ |
 | `set_annotations` | Whether or not to update the `Example` objects with the predictions, delegating to [`set_annotations`](#set_annotations). ~~bool~~ |
 | `sgd` | An optimizer. Will be created via [`create_optimizer`](#create_optimizer) if not set. ~~Optional[Optimizer]~~ |
@@ -150,9 +150,9 @@ patterns = [nlp("health care reform"), nlp("healthcare reform")]

 | Name | Description |
 | -------------- | ---------------------------------------------------------------------------------------------------------------------------------------------------------- |
-| `match_id` | str | An ID for the thing you're matching. ~~str~~ |
+| `match_id` | An ID for the thing you're matching. ~~str~~ | |
 | `docs` | `Doc` objects of the phrases to match. ~~List[Doc]~~ |
-| _keyword-only_ | | |
+| _keyword-only_ | |
 | `on_match` | Callback function to act on matches. Takes the arguments `matcher`, `doc`, `i` and `matches`. ~~Optional[Callable[[Matcher, Doc, int, List[tuple], Any]]~~ |

 ## PhraseMatcher.remove {#remove tag="method" new="2.2"}
@@ -187,7 +187,7 @@ predictions and gold-standard annotations, and update the component's model.
 | Name | Description |
 | ----------------- | ---------------------------------------------------------------------------------------------------------------------------------- |
 | `examples` | A batch of [`Example`](/api/example) objects to learn from. ~~Iterable[Example]~~ |
-| _keyword-only_ | | |
+| _keyword-only_ | |
 | `drop` | The dropout rate. ~~float~~ |
 | `set_annotations` | Whether or not to update the `Example` objects with the predictions, delegating to [`set_annotations`](#set_annotations). ~~bool~~ |
 | `sgd` | An optimizer. Will be created via [`create_optimizer`](#create_optimizer) if not set. ~~Optional[Optimizer]~~ |

@@ -211,7 +211,7 @@ the "catastrophic forgetting" problem. This feature is experimental.
 | Name | Description |
 | -------------- | ------------------------------------------------------------------------------------------------------------------------ |
 | `examples` | A batch of [`Example`](/api/example) objects to learn from. ~~Iterable[Example]~~ |
-| _keyword-only_ | | |
+| _keyword-only_ | |
 | `drop` | The dropout rate. ~~float~~ |
 | `sgd` | An optimizer. Will be created via [`create_optimizer`](#create_optimizer) if not set. ~~Optional[Optimizer]~~ |
 | `losses` | Optional record of the loss during training. Updated using the component name as the key. ~~Optional[Dict[str, float]]~~ |
@@ -192,7 +192,7 @@ Delegates to [`predict`](/api/sentencerecognizer#predict) and
 | Name | Description |
 | ----------------- | ---------------------------------------------------------------------------------------------------------------------------------- |
 | `examples` | A batch of [`Example`](/api/example) objects to learn from. ~~Iterable[Example]~~ |
-| _keyword-only_ | | |
+| _keyword-only_ | |
 | `drop` | The dropout rate. ~~float~~ |
 | `set_annotations` | Whether or not to update the `Example` objects with the predictions, delegating to [`set_annotations`](#set_annotations). ~~bool~~ |
 | `sgd` | An optimizer. Will be created via [`create_optimizer`](#create_optimizer) if not set. ~~Optional[Optimizer]~~ |

@@ -216,7 +216,7 @@ the "catastrophic forgetting" problem. This feature is experimental.
 | Name | Description |
 | -------------- | ------------------------------------------------------------------------------------------------------------------------ |
 | `examples` | A batch of [`Example`](/api/example) objects to learn from. ~~Iterable[Example]~~ |
-| _keyword-only_ | | |
+| _keyword-only_ | |
 | `drop` | The dropout rate. ~~float~~ |
 | `sgd` | An optimizer. Will be created via [`create_optimizer`](#create_optimizer) if not set. ~~Optional[Optimizer]~~ |
 | `losses` | Optional record of the loss during training. Updated using the component name as the key. ~~Optional[Dict[str, float]]~~ |
@@ -53,7 +53,7 @@ Initialize the sentencizer.

 | Name | Description |
 | -------------- | ----------------------------------------------------------------------------------------------------------------------- |
-| _keyword-only_ | | |
+| _keyword-only_ | |
 | `punct_chars` | Optional custom list of punctuation characters that mark sentence ends. See below for defaults. ~~Optional[List[str]]~~ |

 ```python
@@ -190,7 +190,7 @@ Delegates to [`predict`](/api/tagger#predict) and
 | Name | Description |
 | ----------------- | ---------------------------------------------------------------------------------------------------------------------------------- |
 | `examples` | A batch of [`Example`](/api/example) objects to learn from. ~~Iterable[Example]~~ |
-| _keyword-only_ | | |
+| _keyword-only_ | |
 | `drop` | The dropout rate. ~~float~~ |
 | `set_annotations` | Whether or not to update the `Example` objects with the predictions, delegating to [`set_annotations`](#set_annotations). ~~bool~~ |
 | `sgd` | An optimizer. Will be created via [`create_optimizer`](#create_optimizer) if not set. ~~Optional[Optimizer]~~ |

@@ -214,7 +214,7 @@ the "catastrophic forgetting" problem. This feature is experimental.
 | Name | Description |
 | -------------- | ------------------------------------------------------------------------------------------------------------------------ |
 | `examples` | A batch of [`Example`](/api/example) objects to learn from. ~~Iterable[Example]~~ |
-| _keyword-only_ | | |
+| _keyword-only_ | |
 | `drop` | The dropout rate. ~~float~~ |
 | `sgd` | An optimizer. Will be created via [`create_optimizer`](#create_optimizer) if not set. ~~Optional[Optimizer]~~ |
 | `losses` | Optional record of the loss during training. Updated using the component name as the key. ~~Optional[Dict[str, float]]~~ |
@@ -36,11 +36,12 @@ architectures and their arguments and hyperparameters.
 > nlp.add_pipe("textcat", config=config)
 > ```

-| Setting | Description |
-| ----------- | ---------------------------------------------------------------------------------------------------------------------------------------------------------------- |
-| `labels` | A list of categories to learn. If empty, the model infers the categories from the data. Defaults to `[]`. ~~Iterable[str]~~ |
-| `threshold` | Cutoff to consider a prediction "positive", relevant when printing accuracy results. ~~float~~ |
-| `model` | A model instance that predicts scores for each category. Defaults to [TextCatEnsemble](/api/architectures#TextCatEnsemble). ~~Model[List[Doc], List[Floats2d]]~~ |
+| Setting | Description |
+| ---------------- | ---------------------------------------------------------------------------------------------------------------------------------------------------------------- |
+| `labels` | A list of categories to learn. If empty, the model infers the categories from the data. Defaults to `[]`. ~~Iterable[str]~~ |
+| `threshold` | Cutoff to consider a prediction "positive", relevant when printing accuracy results. ~~float~~ |
+| `positive_label` | The positive label for a binary task with exclusive classes, None otherwise and by default. ~~Optional[str]~~ |
+| `model` | A model instance that predicts scores for each category. Defaults to [TextCatEnsemble](/api/architectures#TextCatEnsemble). ~~Model[List[Doc], List[Floats2d]]~~ |

 ```python
 %%GITHUB_SPACY/spacy/pipeline/textcat.py
@@ -60,21 +61,22 @@ architectures and their arguments and hyperparameters.
 >
 > # Construction from class
 > from spacy.pipeline import TextCategorizer
-> textcat = TextCategorizer(nlp.vocab, model, labels=[], threshold=0.5)
+> textcat = TextCategorizer(nlp.vocab, model, labels=[], threshold=0.5, positive_label="POS")
 > ```

 Create a new pipeline instance. In your application, you would normally use a
 shortcut for this and instantiate the component using its string name and
 [`nlp.add_pipe`](/api/language#create_pipe).

-| Name | Description |
-| -------------- | -------------------------------------------------------------------------------------------------------------------------- |
-| `vocab` | The shared vocabulary. ~~Vocab~~ |
-| `model` | The Thinc [`Model`](https://thinc.ai/docs/api-model) powering the pipeline component. ~~Model[List[Doc], List[Floats2d]]~~ |
-| `name` | String name of the component instance. Used to add entries to the `losses` during training. ~~str~~ |
-| _keyword-only_ | |
-| `labels` | The labels to use. ~~Iterable[str]~~ |
-| `threshold` | Cutoff to consider a prediction "positive", relevant when printing accuracy results. ~~float~~ |
+| Name | Description |
+| ---------------- | -------------------------------------------------------------------------------------------------------------------------- |
+| `vocab` | The shared vocabulary. ~~Vocab~~ |
+| `model` | The Thinc [`Model`](https://thinc.ai/docs/api-model) powering the pipeline component. ~~Model[List[Doc], List[Floats2d]]~~ |
+| `name` | String name of the component instance. Used to add entries to the `losses` during training. ~~str~~ |
+| _keyword-only_ | |
+| `labels` | The labels to use. ~~Iterable[str]~~ |
+| `threshold` | Cutoff to consider a prediction "positive", relevant when printing accuracy results. ~~float~~ |
+| `positive_label` | The positive label for a binary task with exclusive classes, None otherwise. ~~Optional[str]~~ |

 ## TextCategorizer.\_\_call\_\_ {#call tag="method"}
@@ -201,7 +203,7 @@ Delegates to [`predict`](/api/textcategorizer#predict) and
 | Name | Description |
 | ----------------- | ---------------------------------------------------------------------------------------------------------------------------------- |
 | `examples` | A batch of [`Example`](/api/example) objects to learn from. ~~Iterable[Example]~~ |
-| _keyword-only_ | | |
+| _keyword-only_ | |
 | `drop` | The dropout rate. ~~float~~ |
 | `set_annotations` | Whether or not to update the `Example` objects with the predictions, delegating to [`set_annotations`](#set_annotations). ~~bool~~ |
 | `sgd` | An optimizer. Will be created via [`create_optimizer`](#create_optimizer) if not set. ~~Optional[Optimizer]~~ |

@@ -225,7 +227,7 @@ the "catastrophic forgetting" problem. This feature is experimental.
 | Name | Description |
 | -------------- | ------------------------------------------------------------------------------------------------------------------------ |
 | `examples` | A batch of [`Example`](/api/example) objects to learn from. ~~Iterable[Example]~~ |
-| _keyword-only_ | | |
+| _keyword-only_ | |
 | `drop` | The dropout rate. ~~float~~ |
 | `sgd` | An optimizer. Will be created via [`create_optimizer`](#create_optimizer) if not set. ~~Optional[Optimizer]~~ |
 | `losses` | Optional record of the loss during training. Updated using the component name as the key. ~~Optional[Dict[str, float]]~~ |

@@ -263,7 +265,7 @@ Score a batch of examples.
 | Name | Description |
 | ---------------- | -------------------------------------------------------------------------------------------------------------------- |
 | `examples` | The examples to score. ~~Iterable[Example]~~ |
-| _keyword-only_ | | |
+| _keyword-only_ | |
 | `positive_label` | Optional positive label. ~~Optional[str]~~ |
 | **RETURNS** | The scores, produced by [`Scorer.score_cats`](/api/scorer#score_cats). ~~Dict[str, Union[float, Dict[str, float]]]~~ |

@@ -144,7 +144,7 @@ setting up the label scheme based on the data.
 | Name | Description |
 | -------------- | ------------------------------------------------------------------------------------------------------------------------------------- |
 | `get_examples` | Function that returns gold-standard annotations in the form of [`Example`](/api/example) objects. ~~Callable[[], Iterable[Example]]~~ |
-| _keyword-only_ | | |
+| _keyword-only_ | |
 | `pipeline` | Optional list of pipeline components that this component is part of. ~~Optional[List[Tuple[str, Callable[[Doc], Doc]]]]~~ |
 | `sgd` | An optimizer. Will be created via [`create_optimizer`](#create_optimizer) if not set. ~~Optional[Optimizer]~~ |
 | **RETURNS** | The optimizer. ~~Optimizer~~ |

@@ -200,7 +200,7 @@ Delegates to [`predict`](/api/tok2vec#predict).
 | Name | Description |
 | ----------------- | ---------------------------------------------------------------------------------------------------------------------------------- |
 | `examples` | A batch of [`Example`](/api/example) objects to learn from. ~~Iterable[Example]~~ |
-| _keyword-only_ | | |
+| _keyword-only_ | |
 | `drop` | The dropout rate. ~~float~~ |
 | `set_annotations` | Whether or not to update the `Example` objects with the predictions, delegating to [`set_annotations`](#set_annotations). ~~bool~~ |
 | `sgd` | An optimizer. Will be created via [`create_optimizer`](#create_optimizer) if not set. ~~Optional[Optimizer]~~ |
@@ -11,6 +11,7 @@ menu:
   - ['Setup & Installation', 'setup']
   - ['Markdown Reference', 'markdown']
   - ['Project Structure', 'structure']
+  - ['Editorial', 'editorial']
 sidebar:
   - label: Styleguide
     items:
@@ -27,6 +27,7 @@ const Quickstart = ({
     hidePrompts,
     small,
     codeLang,
+    Container = Section,
     children,
 }) => {
     const contentRef = useRef()

@@ -83,7 +84,7 @@ const Quickstart = ({
     }, [data, initialized])

     return !data.length ? null : (
-        <Section id={id}>
+        <Container id={id}>
             <div className={classNames(classes.root, { [classes.hidePrompts]: !!hidePrompts })}>
                 {title && (
                     <H2 className={classes.title} name={id}>

@@ -249,7 +250,7 @@ const Quickstart = ({
                 </pre>
                 {showCopy && <textarea ref={copyAreaRef} className={classes.copyArea} rows={1} />}
             </div>
-        </Section>
+        </Container>
     )
 }

@@ -41,3 +41,7 @@

     &:before
         content: ""
+
+    .ul .ul &
+        text-indent: initial
+        margin-left: -20px
@@ -87,6 +87,8 @@ export default function QuickstartTraining({ id, title, download = 'base_config.
         .sort((a, b) => a.title.localeCompare(b.title))
     return (
         <Quickstart
+            id="quickstart-widget"
+            Container="div"
             download={download}
             rawContent={content}
             data={DATA}