Merge branch 'develop' of https://github.com/explosion/spaCy into develop

2025-11-06 02:47:29 +03:00 · 2020-09-14 21:04:49 +02:00 · 2020-09-14 21:04:49 +02:00 · adf0bab23a
commit adf0bab23a
parent ae15fa9688 3216a33149
25 changed files with 206 additions and 116 deletions
--- a/README.md
+++ b/README.md
@@ -4,17 +4,19 @@
 spaCy is a library for advanced Natural Language Processing in Python and
 Cython. It's built on the very latest research, and was designed from day one to
-be used in real products. spaCy comes with
-[pretrained statistical models](https://spacy.io/models) and word vectors, and
-currently supports tokenization for **60+ languages**. It features
+be used in real products.
+
+spaCy comes with
+[pretrained pipelines](https://spacy.io/models) and vectors, and
+currently supports tokenization for **59+ languages**. It features
 state-of-the-art speed, convolutional **neural network models** for tagging,
-parsing and **named entity recognition** and easy **deep learning** integration.
-It's commercial open-source software, released under the MIT license.
+parsing, **named entity recognition**, **text classification** and more, multi-task learning with pretrained **transformers** like BERT, as well as a production-ready training system and easy model packaging, deployment and workflow management.
+spaCy is commercial open-source software, released under the MIT license.
 💫 **Version 2.3 out now!**
 [Check out the release notes here.](https://github.com/explosion/spaCy/releases)
-[![Azure Pipelines](<https://img.shields.io/azure-devops/build/explosion-ai/public/8/master.svg?logo=azure-pipelines&style=flat-square&label=build+(3.x)>)](https://dev.azure.com/explosion-ai/public/_build?definitionId=8)
+[![Azure Pipelines](https://img.shields.io/azure-devops/build/explosion-ai/public/8/master.svg?logo=azure-pipelines&style=flat-square&label=build)](https://dev.azure.com/explosion-ai/public/_build?definitionId=8)
 [![Current Release Version](https://img.shields.io/github/release/explosion/spacy.svg?style=flat-square&logo=github)](https://github.com/explosion/spaCy/releases)
 [![pypi Version](https://img.shields.io/pypi/v/spacy.svg?style=flat-square&logo=pypi&logoColor=white)](https://pypi.org/project/spacy/)
 [![conda Version](https://img.shields.io/conda/vn/conda-forge/spacy.svg?style=flat-square&logo=conda-forge&logoColor=white)](https://anaconda.org/conda-forge/spacy)
@@ -31,7 +33,7 @@ It's commercial open-source software, released under the MIT license.
 | --------------- | -------------------------------------------------------------- |
 | [spaCy 101]     | New to spaCy? Here's everything you need to know!              |
 | [Usage Guides]  | How to use spaCy and its features.                             |
-| [New in v2.3]   | New features, backwards incompatibilities and migration guide. |
+| [New in v3.0]   | New features, backwards incompatibilities and migration guide. |
 | [API Reference] | The detailed reference for spaCy's API.                        |
 | [Models]        | Download statistical language models for spaCy.                |
 | [Universe]      | Libraries, extensions, demos, books and courses.               |
@@ -39,7 +41,7 @@ It's commercial open-source software, released under the MIT license.
 | [Contribute]    | How to contribute to the spaCy project and code base.          |
 [spacy 101]: https://spacy.io/usage/spacy-101
-[new in v2.3]: https://spacy.io/usage/v2-3
+[new in v3.0]: https://spacy.io/usage/v3
 [usage guides]: https://spacy.io/usage/
 [api reference]: https://spacy.io/api/
 [models]: https://spacy.io/models
@@ -57,33 +59,28 @@ much more valuable if it's shared publicly, so that more people can benefit from
 it.
 | Type                    | Platforms              |
-| ------------------------ | ------------------------------------------------------ |
+| ----------------------- | ---------------------- |
 | 🚨 **Bug Reports**      | [GitHub Issue Tracker] |
 | 🎁 **Feature Requests** | [GitHub Issue Tracker] |
-| 👩‍💻 **Usage Questions**   | [Stack Overflow] · [Gitter Chat] · [Reddit User Group] |
+| 👩‍💻 **Usage Questions**  | [Stack Overflow]       |
 | 🗯 **General Discussion** | [Gitter Chat] · [Reddit User Group]                    |
 [github issue tracker]: https://github.com/explosion/spaCy/issues
 [stack overflow]: https://stackoverflow.com/questions/tagged/spacy
 [gitter chat]: https://gitter.im/explosion/spaCy
 [reddit user group]: https://www.reddit.com/r/spacynlp
 ## Features
-- Non-destructive **tokenization**
-- **Named entity** recognition
-- Support for **50+ languages**
-- pretrained [statistical models](https://spacy.io/models) and word vectors
+- Support for **59+ languages**
+- **Trained pipelines**
+- Multi-task learning with pretrained **transformers** like BERT
+- Pretrained **word vectors**
 - State-of-the-art speed
-- Easy **deep learning** integration
-- Part-of-speech tagging
-- Labelled dependency parsing
-- Syntax-driven sentence segmentation
+- Production-ready **training system**
+- Linguistically-motivated **tokenization**
+- Components for named **entity recognition**, part-of-speech-tagging, dependency parsing, sentence segmentation, **text classification**, lemmatization, morphological analysis, entity linking and more
+- Easily extensible with **custom components** and attributes
+- Support for custom models in **PyTorch**, **TensorFlow** and other frameworks
 - Built in **visualizers** for syntax and NER
-- Convenient string-to-hash mapping
-- Export to numpy data arrays
-- Efficient binary serialization
-- Easy **model packaging** and deployment
+- Easy **model packaging**, deployment and workflow management
 - Robust, rigorously evaluated accuracy
 📖 **For more details, see the
@@ -102,13 +99,6 @@ For detailed installation instructions, see the
 [pip]: https://pypi.org/project/spacy/
 [conda]: https://anaconda.org/conda-forge/spacy
-> ⚠️ **Important note for Python 3.8:** We can't yet ship pre-compiled binary
-> wheels for spaCy that work on Python 3.8, as we're still waiting for our CI
-> providers and other tooling to support it. This means that in order to run
-> spaCy on Python 3.8, you'll need [a compiler installed](#source) and compile
-> the library and its Cython dependencies locally. If this is causing problems
-> for you, the easiest solution is to **use Python 3.7** in the meantime.
 ### pip
 Using pip, spaCy releases are available as source packages and binary wheels (as
@@ -164,26 +154,26 @@ If you've trained your own models, keep in mind that your training and runtime
 inputs must match. After updating spaCy, we recommend **retraining your models**
 with the new version.
-📖 **For details on upgrading from spaCy 1.x to spaCy 2.x, see the
-[migration guide](https://spacy.io/usage/v2#migrating).**
+📖 **For details on upgrading from spaCy 2.x to spaCy 3.x, see the
+[migration guide](https://spacy.io/usage/v3#migrating).**
 ## Download models
-As of v1.7.0, models for spaCy can be installed as **Python packages**. This
+Trained pipelines for spaCy can be installed as **Python packages**. This
 means that they're a component of your application, just like any other module.
 Models can be installed using spaCy's `download` command, or manually by
 pointing pip to a path or URL.
 | Documentation          |                                                                  |
-| ---------------------- | ------------------------------------------------------------- |
-| [Available Models]     | Detailed model descriptions, accuracy figures and benchmarks. |
+| ---------------------- | ---------------------------------------------------------------- |
+| [Available Pipelines]  | Detailed pipeline descriptions, accuracy figures and benchmarks. |
 | [Models Documentation] | Detailed usage instructions.                                     |
-[available models]: https://spacy.io/models
+[available pipelines]: https://spacy.io/models
 [models documentation]: https://spacy.io/docs/usage/models
 ```bash
-# download best-matching version of specific model for your spaCy installation
+# Download best-matching version of specific model for your spaCy installation
 python -m spacy download en_core_web_sm
 # pip install .tar.gz archive from path or URL
--- a/spacy/cli/train.py
+++ b/spacy/cli/train.py
@@ -89,7 +89,6 @@ def train(
        nlp, config = util.load_model_from_config(config)
    if config["training"]["vectors"] is not None:
        util.load_vectors_into_model(nlp, config["training"]["vectors"])
-    verify_config(nlp)
    raw_text, tag_map, morph_rules, weights_data = load_from_paths(config)
    T_cfg = config["training"]
    optimizer = T_cfg["optimizer"]
@@ -108,6 +107,8 @@ def train(
            nlp.resume_training(sgd=optimizer)
    with nlp.select_pipes(disable=[*frozen_components, *resume_components]):
        nlp.begin_training(lambda: train_corpus(nlp), sgd=optimizer)
+    # Verify the config after calling 'begin_training' to ensure labels are properly initialized
+    verify_config(nlp)
    if tag_map:
        # Replace tag map with provided mapping
@@ -401,7 +402,7 @@ def verify_cli_args(config_path: Path, output_path: Optional[Path] = None) -> No
 def verify_config(nlp: Language) -> None:
-    """Perform additional checks based on the config and loaded nlp object."""
+    """Perform additional checks based on the config, loaded nlp object and training data."""
    # TODO: maybe we should validate based on the actual components, the list
    # in config["nlp"]["pipeline"] instead?
    for pipe_config in nlp.config["components"].values():
@@ -415,18 +416,13 @@ def verify_textcat_config(nlp: Language, pipe_config: Dict[str, Any]) -> None:
    # if 'positive_label' is provided: double check whether it's in the data and
    # the task is binary
    if pipe_config.get("positive_label"):
-        textcat_labels = nlp.get_pipe("textcat").cfg.get("labels", [])
+        textcat_labels = nlp.get_pipe("textcat").labels
        pos_label = pipe_config.get("positive_label")
        if pos_label not in textcat_labels:
-            msg.fail(
-                f"The textcat's 'positive_label' config setting '{pos_label}' "
-                f"does not match any label in the training data.",
-                exits=1,
+            raise ValueError(
+                Errors.E920.format(pos_label=pos_label, labels=textcat_labels)
            )
-        if len(textcat_labels) != 2:
-            msg.fail(
-                f"A textcat 'positive_label' '{pos_label}' was "
-                f"provided for training data that does not appear to be a "
-                f"binary classification problem with two labels.",
-                exits=1,
+        if len(list(textcat_labels)) != 2:
+            raise ValueError(
+                Errors.E919.format(pos_label=pos_label, labels=textcat_labels)
            )
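The `train` change above moves `verify_config(nlp)` so that it runs after `nlp.begin_training(...)`. When `labels` is left empty, the text categorizer infers its categories from the training data. A check on `positive_label` that runs before initialization could therefore see no labels at all. The sketch below shows that ordering problem with a hypothetical pure-Python stand-in. `ToyTextCat`, `initialize` and `positive_label_ok` are illustrative names, not spaCy API, and the sketch assumes gold annotations are plain `{label: score}` dicts.

```python
from typing import Dict, Iterable, List, Optional, Tuple


class ToyTextCat:
    """Hypothetical stand-in for a text classifier that can infer its labels."""

    def __init__(
        self, labels: Optional[List[str]] = None, positive_label: Optional[str] = None
    ) -> None:
        self.cfg = {"labels": tuple(labels or ()), "positive_label": positive_label}

    @property
    def labels(self) -> Tuple[str, ...]:
        return tuple(self.cfg["labels"])

    def initialize(self, gold_cats: Iterable[Dict[str, float]]) -> None:
        # If no labels were configured, collect every category name seen in
        # the gold annotations, in first-seen order.
        if not self.cfg["labels"]:
            seen: List[str] = []
            for cats in gold_cats:
                for label in cats:
                    if label not in seen:
                        seen.append(label)
            self.cfg["labels"] = tuple(seen)


def positive_label_ok(component: ToyTextCat) -> bool:
    """True if no positive label is set, or it is one of exactly two labels."""
    pos = component.cfg["positive_label"]
    if not pos:
        return True
    return pos in component.labels and len(component.labels) == 2


train_cats = [{"POSITIVE": 1.0, "NEGATIVE": 0.0}, {"POSITIVE": 0.0, "NEGATIVE": 1.0}]
textcat = ToyTextCat(positive_label="POSITIVE")
print(positive_label_ok(textcat))  # False: labels are still empty
textcat.initialize(train_cats)
print(positive_label_ok(textcat))  # True: labels are now ('POSITIVE', 'NEGATIVE')
```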
--- a/spacy/errors.py
+++ b/spacy/errors.py
@@ -480,6 +480,11 @@ class Errors:
    E201 = ("Span index out of range.")
    # TODO: fix numbering after merging develop into master
    E919 = ("A textcat 'positive_label' '{pos_label}' was provided for training "
            "data that does not appear to be a binary classification problem "
            "with two labels. Labels found: {labels}")
    E920 = ("The textcat's 'positive_label' config setting '{pos_label}' "
            "does not match any label in the training data. Labels found: {labels}")
    E921 = ("The method 'set_output' can only be called on components that have "
            "a Model with a 'resize_output' attribute. Otherwise, the output "
            "layer can not be dynamically changed.")
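`verify_textcat_config` now raises a `ValueError` built from the `E919`/`E920` templates instead of calling `msg.fail(..., exits=1)`. That change is what lets the new tests use `pytest.raises(ValueError)`. Below is a minimal standalone sketch of the same two checks. The template strings are copied from this diff. `check_positive_label` is a hypothetical name, and the code is an illustration rather than spaCy's implementation.

```python
from typing import Optional, Sequence

# Message templates copied from the E919/E920 entries in this diff.
E919 = (
    "A textcat 'positive_label' '{pos_label}' was provided for training "
    "data that does not appear to be a binary classification problem "
    "with two labels. Labels found: {labels}"
)
E920 = (
    "The textcat's 'positive_label' config setting '{pos_label}' "
    "does not match any label in the training data. Labels found: {labels}"
)


def check_positive_label(labels: Sequence[str], positive_label: Optional[str]) -> None:
    """Hypothetical helper: raise ValueError if positive_label can't be used."""
    if not positive_label:
        return  # nothing configured, nothing to verify
    if positive_label not in labels:
        raise ValueError(E920.format(pos_label=positive_label, labels=labels))
    if len(list(labels)) != 2:
        raise ValueError(E919.format(pos_label=positive_label, labels=labels))


check_positive_label(("POS", "NEG"), "POS")  # passes silently
try:
    check_positive_label(("SOME", "THING", "POS"), "POS")
except ValueError as err:
    print(err)  # the E919 message, ending in "Labels found: ('SOME', 'THING', 'POS')"
```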
--- a/spacy/pipeline/textcat.py
+++ b/spacy/pipeline/textcat.py
@@ -56,7 +56,12 @@ subword_features = true
@Language.factory(
    "textcat",
    assigns=["doc.cats"],
-    default_config={"labels": [], "threshold": 0.5, "model": DEFAULT_TEXTCAT_MODEL},
+    default_config={
+        "labels": [],
+        "threshold": 0.5,
+        "positive_label": None,
+        "model": DEFAULT_TEXTCAT_MODEL,
+    },
    scores=[
        "cats_score",
        "cats_score_desc",
@@ -74,8 +79,9 @@ def make_textcat(
    nlp: Language,
    name: str,
    model: Model[List[Doc], List[Floats2d]],
-    labels: Iterable[str],
+    labels: List[str],
    threshold: float,
+    positive_label: Optional[str],
 ) -> "TextCategorizer":
    """Create a TextCategorizer component. The text categorizer predicts categories
    over a whole document. It can learn one or more labels, and the labels can
@@ -88,8 +94,16 @@ def make_textcat(
    labels (list): A list of categories to learn. If empty, the model infers the
        categories from the data.
    threshold (float): Cutoff to consider a prediction "positive".
+    positive_label (Optional[str]): The positive label for a binary task with exclusive classes, None otherwise.
    """
-    return TextCategorizer(nlp.vocab, model, name, labels=labels, threshold=threshold)
+    return TextCategorizer(
+        nlp.vocab,
+        model,
+        name,
+        labels=labels,
+        threshold=threshold,
+        positive_label=positive_label,
+    )
 class TextCategorizer(Pipe):
@@ -104,8 +118,9 @@ class TextCategorizer(Pipe):
        model: Model,
        name: str = "textcat",
        *,
-        labels: Iterable[str],
+        labels: List[str],
        threshold: float,
+        positive_label: Optional[str],
    ) -> None:
        """Initialize a text categorizer.
@@ -113,8 +128,9 @@ class TextCategorizer(Pipe):
        model (thinc.api.Model): The Thinc Model powering the pipeline component.
        name (str): The component instance name, used to add entries to the
            losses during training.
-        labels (Iterable[str]): The labels to use.
+        labels (List[str]): The labels to use.
        threshold (float): Cutoff to consider a prediction "positive".
+        positive_label (Optional[str]): The positive label for a binary task with exclusive classes, None otherwise.
        DOCS: https://nightly.spacy.io/api/textcategorizer#init
        """
@@ -122,7 +138,11 @@ class TextCategorizer(Pipe):
        self.model = model
        self.name = name
        self._rehearsal_model = None
-        cfg = {"labels": labels, "threshold": threshold}
+        cfg = {
+            "labels": labels,
+            "threshold": threshold,
+            "positive_label": positive_label,
+        }
        self.cfg = dict(cfg)
    @property
@@ -131,10 +151,10 @@ class TextCategorizer(Pipe):
        DOCS: https://nightly.spacy.io/api/textcategorizer#labels
        """
-        return tuple(self.cfg.setdefault("labels", []))
+        return tuple(self.cfg["labels"])
    @labels.setter
-    def labels(self, value: Iterable[str]) -> None:
+    def labels(self, value: List[str]) -> None:
        self.cfg["labels"] = tuple(value)
    def pipe(self, stream: Iterable[Doc], *, batch_size: int = 128) -> Iterator[Doc]:
@@ -353,17 +373,10 @@ class TextCategorizer(Pipe):
            sgd = self.create_optimizer()
        return sgd
-    def score(
-        self,
-        examples: Iterable[Example],
-        *,
-        positive_label: Optional[str] = None,
-        **kwargs,
-    ) -> Dict[str, Any]:
+    def score(self, examples: Iterable[Example], **kwargs) -> Dict[str, Any]:
        """Score a batch of examples.
        examples (Iterable[Example]): The examples to score.
-        positive_label (str): Optional positive label.
        RETURNS (Dict[str, Any]): The scores, produced by Scorer.score_cats.
        DOCS: https://nightly.spacy.io/api/textcategorizer#score
@@ -374,7 +387,7 @@ class TextCategorizer(Pipe):
            "cats",
            labels=self.labels,
            multi_label=self.model.attrs["multi_label"],
-            positive_label=positive_label,
+            positive_label=self.cfg["positive_label"],
            threshold=self.cfg["threshold"],
            **kwargs,
        )
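After this change, `TextCategorizer` keeps `positive_label` in `self.cfg` alongside `labels` and `threshold`. It reads `labels` with a plain `self.cfg["labels"]` lookup, since the key is always set in `__init__`, and always returns a tuple. That tuple conversion is why the new tests in `test_textcat.py` compare `textcat.labels` against tuples. The class below is a stripped-down, hypothetical sketch of the pattern. `LabelHolder` is illustrative only and has no model or vocab.

```python
from typing import Any, Dict, List, Optional, Tuple


class LabelHolder:
    """Hypothetical, stripped-down stand-in for the cfg/labels pattern."""

    def __init__(
        self, *, labels: List[str], threshold: float, positive_label: Optional[str]
    ) -> None:
        # Every setting, including positive_label, lives in one plain dict.
        self.cfg: Dict[str, Any] = {
            "labels": labels,
            "threshold": threshold,
            "positive_label": positive_label,
        }

    @property
    def labels(self) -> Tuple[str, ...]:
        # "labels" is always set in __init__, so a plain lookup is enough.
        return tuple(self.cfg["labels"])

    @labels.setter
    def labels(self, value: List[str]) -> None:
        self.cfg["labels"] = tuple(value)


holder = LabelHolder(labels=["POS", "NEG"], threshold=0.5, positive_label="POS")
print(holder.labels)                    # ('POS', 'NEG')
print(holder.labels == ["POS", "NEG"])  # False: a tuple never compares equal to a list
holder.labels = ["A", "B", "C"]
print(holder.cfg["labels"])             # ('A', 'B', 'C')
```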
--- a/spacy/tests/pipeline/test_textcat.py
+++ b/spacy/tests/pipeline/test_textcat.py
@@ -10,6 +10,7 @@ from spacy.tokens import Doc
 from spacy.pipeline.tok2vec import DEFAULT_TOK2VEC_MODEL
 from ..util import make_tempdir
+from ...cli.train import verify_textcat_config
 from ...training import Example
@@ -130,7 +131,10 @@ def test_overfitting_IO():
    fix_random_seed(0)
    nlp = English()
    # Set exclusive labels
-    textcat = nlp.add_pipe("textcat", config={"model": {"exclusive_classes": True}})
+    textcat = nlp.add_pipe(
+        "textcat",
+        config={"model": {"exclusive_classes": True}, "positive_label": "POSITIVE"},
+    )
    train_examples = []
    for text, annotations in TRAIN_DATA:
        train_examples.append(Example.from_dict(nlp.make_doc(text), annotations))
@@ -159,7 +163,7 @@ def test_overfitting_IO():
        assert cats2["POSITIVE"] + cats2["NEGATIVE"] == pytest.approx(1.0, 0.001)
    # Test scoring
-    scores = nlp.evaluate(train_examples, scorer_cfg={"positive_label": "POSITIVE"})
+    scores = nlp.evaluate(train_examples)
    assert scores["cats_micro_f"] == 1.0
    assert scores["cats_score"] == 1.0
    assert "cats_score_desc" in scores
@@ -194,3 +198,29 @@ def test_textcat_configs(textcat_config):
    for i in range(5):
        losses = {}
        nlp.update(train_examples, sgd=optimizer, losses=losses)
+def test_positive_class():
+    nlp = English()
+    pipe_config = {"positive_label": "POS", "labels": ["POS", "NEG"]}
+    textcat = nlp.add_pipe("textcat", config=pipe_config)
+    assert textcat.labels == ("POS", "NEG")
+    verify_textcat_config(nlp, pipe_config)
+def test_positive_class_not_present():
+    nlp = English()
+    pipe_config = {"positive_label": "POS", "labels": ["SOME", "THING"]}
+    textcat = nlp.add_pipe("textcat", config=pipe_config)
+    assert textcat.labels == ("SOME", "THING")
+    with pytest.raises(ValueError):
+        verify_textcat_config(nlp, pipe_config)
+def test_positive_class_not_binary():
+    nlp = English()
+    pipe_config = {"positive_label": "POS", "labels": ["SOME", "THING", "POS"]}
+    textcat = nlp.add_pipe("textcat", config=pipe_config)
+    assert textcat.labels == ("SOME", "THING", "POS")
+    with pytest.raises(ValueError):
+        verify_textcat_config(nlp, pipe_config)
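Because `positive_label` now lives in the component config, `test_overfitting_IO` no longer passes `scorer_cfg` to `nlp.evaluate`. `TextCategorizer.score` forwards `self.cfg["positive_label"]` and `self.cfg["threshold"]` to `Scorer.score_cats`. The helper below gives some intuition for what a score on a single positive label measures. It is a simplified, hypothetical sketch, not `Scorer.score_cats`, and it does not reproduce that function's output keys. It assumes a prediction counts as positive when its score is at or above the threshold, and that gold values are 0.0/1.0.

```python
from typing import Dict, Iterable, Tuple


def positive_label_prf(
    pairs: Iterable[Tuple[Dict[str, float], Dict[str, float]]],
    positive_label: str,
    threshold: float = 0.5,
) -> Dict[str, float]:
    """Precision, recall and F1 for one label; each pair is (gold_cats, pred_cats)."""
    tp = fp = fn = 0
    for gold, pred in pairs:
        gold_pos = gold.get(positive_label, 0.0) == 1.0
        pred_pos = pred.get(positive_label, 0.0) >= threshold
        if pred_pos and gold_pos:
            tp += 1  # correctly predicted positive
        elif pred_pos:
            fp += 1  # predicted positive, gold negative
        elif gold_pos:
            fn += 1  # missed a gold positive
    p = tp / (tp + fp) if tp + fp else 0.0
    r = tp / (tp + fn) if tp + fn else 0.0
    f = 2 * p * r / (p + r) if p + r else 0.0
    return {"p": p, "r": r, "f": f}


examples = [
    ({"POSITIVE": 1.0, "NEGATIVE": 0.0}, {"POSITIVE": 0.9, "NEGATIVE": 0.1}),  # TP
    ({"POSITIVE": 0.0, "NEGATIVE": 1.0}, {"POSITIVE": 0.7, "NEGATIVE": 0.3}),  # FP
    ({"POSITIVE": 1.0, "NEGATIVE": 0.0}, {"POSITIVE": 0.2, "NEGATIVE": 0.8}),  # FN
]
print(positive_label_prf(examples, "POSITIVE"))  # {'p': 0.5, 'r': 0.5, 'f': 0.5}
```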
--- a/spacy/tests/serialize/test_serialize_pipeline.py
+++ b/spacy/tests/serialize/test_serialize_pipeline.py
@@ -136,7 +136,7 @@ def test_serialize_textcat_empty(en_vocab):
    # See issue #1105
    cfg = {"model": DEFAULT_TEXTCAT_MODEL}
    model = registry.make_from_config(cfg, validate=True)["model"]
-    textcat = TextCategorizer(en_vocab, model, labels=["ENTITY", "ACTION", "MODIFIER"], threshold=0.5)
+    textcat = TextCategorizer(en_vocab, model, labels=["ENTITY", "ACTION", "MODIFIER"], threshold=0.5, positive_label=None)
    textcat.to_bytes(exclude=["vocab"])
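The serialization test now passes `positive_label=None`. In the `__init__` signature earlier in this diff, `labels`, `threshold` and `positive_label` come after a bare `*` and have no defaults, so each one must be supplied by keyword. The `_keyword-only_` rows in the API tables below mark the same `*` boundary. Below is a hypothetical signature with the same shape. `make_component` is illustrative, not spaCy API. It shows what happens when one of these arguments is omitted or passed positionally.

```python
from typing import List, Optional


def make_component(
    vocab: object,
    model: object,
    name: str = "textcat",
    *,
    labels: List[str],
    threshold: float,
    positive_label: Optional[str],
) -> dict:
    """Hypothetical constructor with the same parameter shape as TextCategorizer."""
    return {
        "name": name,
        "labels": tuple(labels),
        "threshold": threshold,
        "positive_label": positive_label,
    }


ok = make_component(None, None, labels=["A"], threshold=0.5, positive_label=None)

try:
    make_component(None, None, labels=["A"], threshold=0.5)  # positive_label omitted
except TypeError as err:
    print("missing keyword-only argument:", err)

try:
    make_component(None, None, "textcat", ["A"], 0.5, None)  # passed positionally
except TypeError as err:
    print("positional call rejected:", err)
```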
--- a/website/README.md
+++ b/website/README.md
@@ -630,3 +630,49 @@ In addition to the native markdown elements, you can use the components
 ├── gatsby-node.js       # Node-specific hooks for Gatsby
 └── package.json         # package settings and dependencies
 ```
+## Editorial {#editorial}
+- "spaCy" should always be spelled with a lowercase "s" and a capital "C",
+  unless it specifically refers to the Python package or Python import `spacy`
+  (in which case it should be formatted as code).
+  - ✅ spaCy is a library for advanced NLP in Python.
+  - ❌ Spacy is a library for advanced NLP in Python.
+  - ✅ First, you need to install the `spacy` package from pip.
+- Mentions of code, like function names, classes, variable names etc. in inline
+  text should be formatted as `code`.
+  - ✅ "Calling the `nlp` object on a text returns a `Doc`."
+- Objects that have pages in the [API docs](/api) should be linked – for
+  example, [`Doc`](/api/doc) or [`Language.to_disk`](/api/language#to_disk). The
+  mentions should still be formatted as code within the link. Links pointing to
+  the API docs will automatically receive a little icon. However, if a paragraph
+  includes many references to the API, the links can easily get messy. In that
+  case, we typically only link the first mention of an object and not any
+  subsequent ones.
+  - ✅ The [`Span`](/api/span) and [`Token`](/api/token) objects are views of a
+    [`Doc`](/api/doc). [`Span.as_doc`](/api/span#as_doc) creates a `Doc` object
+    from a `Span`.
+  - ❌ The [`Span`](/api/span) and [`Token`](/api/token) objects are views of a
+    [`Doc`](/api/doc). [`Span.as_doc`](/api/span#as_doc) creates a
+    [`Doc`](/api/doc) object from a [`Span`](/api/span).
+* Other things we format as code are: references to trained pipeline packages
+  like `en_core_web_sm` or file names like `code.py` or `meta.json`.
+  - ✅ After training, the `config.cfg` is saved to disk.
+* [Type annotations](#type-annotations) are a special type of code formatting,
+  expressed by wrapping the text in `~~` instead of backticks. The result looks
+  like this: ~~List[Doc]~~. All references to known types will be linked
+  automatically.
+  - ✅ The model has the input type ~~List[Doc]~~ and it outputs a
+    ~~List[Array2d]~~.
+* We try to keep links meaningful but short.
+  - ✅ For details, see the usage guide on
+    [training with custom code](/usage/training#custom-code).
+  - ❌ For details, see
+    [the usage guide on training with custom code](/usage/training#custom-code).
+  - ❌ For details, see the usage guide on training with custom code
+    [here](/usage/training#custom-code).
--- a/website/docs/api/dependencymatcher.md
+++ b/website/docs/api/dependencymatcher.md
@@ -183,7 +183,7 @@ will be overwritten.
 | -------------- | -------------------------------------------------------------------------------------------------------------------------------------------------------------------- |
 | `match_id`     | An ID for the patterns. ~~str~~                                                                                                                                      |
 | `patterns`     | A list of match patterns. A pattern consists of a list of dicts, where each dict describes a token in the tree. ~~List[List[Dict[str, Union[str, Dict]]]]~~          |
-| _keyword-only_ |                                                                                                                                                                      |  |
+| _keyword-only_ |                                                                                                                                                                      |
 | `on_match`     | Callback function to act on matches. Takes the arguments `matcher`, `doc`, `i` and `matches`. ~~Optional[Callable[[DependencyMatcher, Doc, int, List[Tuple], Any]]~~ |
 ## DependencyMatcher.get {#get tag="method"}
--- a/website/docs/api/dependencyparser.md
+++ b/website/docs/api/dependencyparser.md
@@ -217,7 +217,7 @@ model. Delegates to [`predict`](/api/dependencyparser#predict) and
 | Name              | Description                                                                                                                        |
 | ----------------- | ---------------------------------------------------------------------------------------------------------------------------------- |
 | `examples`        | A batch of [`Example`](/api/example) objects to learn from. ~~Iterable[Example]~~                                                  |
-| _keyword-only_    |                                                                                                                                    |  |
+| _keyword-only_    |                                                                                                                                    |
 | `drop`            | The dropout rate. ~~float~~                                                                                                        |
 | `set_annotations` | Whether or not to update the `Example` objects with the predictions, delegating to [`set_annotations`](#set_annotations). ~~bool~~ |
 | `sgd`             | An optimizer. Will be created via [`create_optimizer`](#create_optimizer) if not set. ~~Optional[Optimizer]~~                      |
--- a/website/docs/api/entitylinker.md
+++ b/website/docs/api/entitylinker.md
@@ -85,7 +85,7 @@ providing custom registered functions.
 | `vocab`          | The shared vocabulary. ~~Vocab~~                                                                                                 |
 | `model`          | The [`Model`](https://thinc.ai/docs/api-model) powering the pipeline component. ~~Model~~                                        |
 | `name`           | String name of the component instance. Used to add entries to the `losses` during training. ~~str~~                              |
-| _keyword-only_   |                                                                                                                                  |  |
+| _keyword-only_   |                                                                                                                                  |
 | `kb_loader`      | Function that creates a [`KnowledgeBase`](/api/kb) from a `Vocab` instance. ~~Callable[[Vocab], KnowledgeBase]~~                 |
 | `get_candidates` | Function that generates plausible candidates for a given `Span` object. ~~Callable[[KnowledgeBase, Span], Iterable[Candidate]]~~ |
 | `labels_discard` | NER labels that will automatically get a `"NIL"` prediction. ~~Iterable[str]~~                                                   |
@@ -218,7 +218,7 @@ pipe's entity linking model and context encoder. Delegates to
 | Name              | Description                                                                                                                        |
 | ----------------- | ---------------------------------------------------------------------------------------------------------------------------------- |
 | `examples`        | A batch of [`Example`](/api/example) objects to learn from. ~~Iterable[Example]~~                                                  |
-| _keyword-only_    |                                                                                                                                    |  |
+| _keyword-only_    |                                                                                                                                    |
 | `drop`            | The dropout rate. ~~float~~                                                                                                        |
 | `set_annotations` | Whether or not to update the `Example` objects with the predictions, delegating to [`set_annotations`](#set_annotations). ~~bool~~ |
 | `sgd`             | An optimizer. Will be created via [`create_optimizer`](#create_optimizer) if not set. ~~Optional[Optimizer]~~                      |
--- a/website/docs/api/entityrecognizer.md
+++ b/website/docs/api/entityrecognizer.md
@@ -206,7 +206,7 @@ model. Delegates to [`predict`](/api/entityrecognizer#predict) and
 | Name              | Description                                                                                                                        |
 | ----------------- | ---------------------------------------------------------------------------------------------------------------------------------- |
 | `examples`        | A batch of [`Example`](/api/example) objects to learn from. ~~Iterable[Example]~~                                                  |
-| _keyword-only_    |                                                                                                                                    |  |
+| _keyword-only_    |                                                                                                                                    |
 | `drop`            | The dropout rate. ~~float~~                                                                                                        |
 | `set_annotations` | Whether or not to update the `Example` objects with the predictions, delegating to [`set_annotations`](#set_annotations). ~~bool~~ |
 | `sgd`             | An optimizer. Will be created via [`create_optimizer`](#create_optimizer) if not set. ~~Optional[Optimizer]~~                      |
--- a/website/docs/api/entityruler.md
+++ b/website/docs/api/entityruler.md
@@ -255,7 +255,7 @@ Get all patterns that were added to the entity ruler.
 | Name              | Description                                                                                                           |
 | ----------------- | --------------------------------------------------------------------------------------------------------------------- |
-| `matcher`         | The underlying matcher used to process token patterns. ~~Matcher~~                                                    |  |
+| `matcher`         | The underlying matcher used to process token patterns. ~~Matcher~~                                                    |
 | `phrase_matcher`  | The underlying phrase matcher, used to process phrase patterns. ~~PhraseMatcher~~                                     |
 | `token_patterns`  | The token patterns present in the entity ruler, keyed by label. ~~Dict[str, List[Dict[str, Union[str, List[dict]]]]~~ |
 | `phrase_patterns` | The phrase patterns present in the entity ruler, keyed by label. ~~Dict[str, List[Doc]]~~                             |
--- a/website/docs/api/lemmatizer.md
+++ b/website/docs/api/lemmatizer.md
@@ -81,7 +81,7 @@ shortcut for this and instantiate the component using its string name and
 | `vocab`        | The shared vocabulary. ~~Vocab~~                                                                                                                               |
 | `model`        | **Not yet implemented:** The model to use. ~~Model~~                                                                                                           |
 | `name`         | String name of the component instance. Used to add entries to the `losses` during training. ~~str~~                                                            |
-| _keyword-only_ |                                                                                                                                                                |  |
+| _keyword-only_ |                                                                                                                                                                |
 | mode           | The lemmatizer mode, e.g. `"lookup"` or `"rule"`. Defaults to `"lookup"`. ~~str~~                                                                              |
 | lookups        | A lookups object containing the tables such as `"lemma_rules"`, `"lemma_index"`, `"lemma_exc"` and `"lemma_lookup"`. Defaults to `None`. ~~Optional[Lookups]~~ |
 | overwrite      | Whether to overwrite existing lemmas. ~~bool~~                                                                                                                |
--- a/website/docs/api/morphologizer.md
+++ b/website/docs/api/morphologizer.md
@@ -139,7 +139,7 @@ setting up the label scheme based on the data.
 | Name           | Description                                                                                                                           |
 | -------------- | ------------------------------------------------------------------------------------------------------------------------------------- |
 | `get_examples` | Function that returns gold-standard annotations in the form of [`Example`](/api/example) objects. ~~Callable[[], Iterable[Example]]~~ |
-| _keyword-only_ |                                                                                                                                       |  |
+| _keyword-only_ |                                                                                                                                       |
 | `pipeline`     | Optional list of pipeline components that this component is part of. ~~Optional[List[Tuple[str, Callable[[Doc], Doc]]]]~~             |
 | `sgd`          | An optimizer. Will be created via [`create_optimizer`](#create_optimizer) if not set. ~~Optional[Optimizer]~~                         |
 | **RETURNS**    | The optimizer. ~~Optimizer~~                                                                                                          |
@ -196,7 +196,7 @@ Delegates to [`predict`](/api/morphologizer#predict) and
 | Name              | Description                                                                                                                        |
 | ----------------- | ---------------------------------------------------------------------------------------------------------------------------------- |
 | `examples`        | A batch of [`Example`](/api/example) objects to learn from. ~~Iterable[Example]~~                                                  |
-| _keyword-only_    |                                                                                                                                    |  |
+| _keyword-only_    |                                                                                                                                    |
 | `drop`            | The dropout rate. ~~float~~                                                                                                        |
 | `set_annotations` | Whether or not to update the `Example` objects with the predictions, delegating to [`set_annotations`](#set_annotations). ~~bool~~ |
 | `sgd`             | An optimizer. Will be created via [`create_optimizer`](#create_optimizer) if not set. ~~Optional[Optimizer]~~                      |
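Each `-`/`+` pair for the `_keyword-only_` rows removes a stray empty third cell (`|  |`). That cell gave those rows more cells than the two-column header. Below is a rough Python sketch of a check that would flag rows like these. The helper names are hypothetical, and the check assumes no cell contains an escaped pipe.

```python
from typing import List


def cell_count(row: str) -> int:
    # Drop the outer pipes, then count the cells between the inner ones.
    # Assumes cells contain no escaped "\|" sequences.
    return len(row.strip().strip("|").split("|"))


def misaligned_rows(table: str) -> List[int]:
    # Return the indices of table rows whose cell count differs from the header.
    rows = [line for line in table.splitlines() if line.strip().startswith("|")]
    if not rows:
        return []
    expected = cell_count(rows[0])
    return [i for i, row in enumerate(rows) if cell_count(row) != expected]


example = """\
| Name           | Description                 |
| -------------- | --------------------------- |
| _keyword-only_ |                             |  |
| `drop`         | The dropout rate. ~~float~~ |
"""
print(misaligned_rows(example))  # [2]
```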
--- a/website/docs/api/phrasematcher.md
+++ b/website/docs/api/phrasematcher.md
@ -150,9 +150,9 @@ patterns = [nlp("health care reform"), nlp("healthcare reform")]
 | Name           | Description                                                                                                                                                |
 | -------------- | ---------------------------------------------------------------------------------------------------------------------------------------------------------- |
-| `match_id`     | str                                                                                                                                                        | An ID for the thing you're matching. ~~str~~ |
+| `match_id`     | An ID for the thing you're matching. ~~str~~                                                                                                               |
 | `docs`         | `Doc` objects of the phrases to match. ~~List[Doc]~~                                                                                                       |
-| _keyword-only_ |                                                                                                                                                            |  |
+| _keyword-only_ |                                                                                                                                                            |
 | `on_match`     | Callback function to act on matches. Takes the arguments `matcher`, `doc`, `i` and `matches`. ~~Optional[Callable[[Matcher, Doc, int, List[tuple]], Any]]~~ |
 ## PhraseMatcher.remove {#remove tag="method" new="2.2"}
--- a/website/docs/api/pipe.md
+++ b/website/docs/api/pipe.md
@ -187,7 +187,7 @@ predictions and gold-standard annotations, and update the component's model.
 | Name              | Description                                                                                                                        |
 | ----------------- | ---------------------------------------------------------------------------------------------------------------------------------- |
 | `examples`        | A batch of [`Example`](/api/example) objects to learn from. ~~Iterable[Example]~~                                                  |
-| _keyword-only_    |                                                                                                                                    |  |
+| _keyword-only_    |                                                                                                                                    |
 | `drop`            | The dropout rate. ~~float~~                                                                                                        |
 | `set_annotations` | Whether or not to update the `Example` objects with the predictions, delegating to [`set_annotations`](#set_annotations). ~~bool~~ |
 | `sgd`             | An optimizer. Will be created via [`create_optimizer`](#create_optimizer) if not set. ~~Optional[Optimizer]~~                      |
@ -211,7 +211,7 @@ the "catastrophic forgetting" problem. This feature is experimental.
 | Name           | Description                                                                                                              |
 | -------------- | ------------------------------------------------------------------------------------------------------------------------ |
 | `examples`     | A batch of [`Example`](/api/example) objects to learn from. ~~Iterable[Example]~~                                        |
-| _keyword-only_ |                                                                                                                          |  |
+| _keyword-only_ |                                                                                                                          |
 | `drop`         | The dropout rate. ~~float~~                                                                                              |
 | `sgd`          | An optimizer. Will be created via [`create_optimizer`](#create_optimizer) if not set. ~~Optional[Optimizer]~~            |
 | `losses`       | Optional record of the loss during training. Updated using the component name as the key. ~~Optional[Dict[str, float]]~~ |
--- a/website/docs/api/sentencerecognizer.md
+++ b/website/docs/api/sentencerecognizer.md
@ -192,7 +192,7 @@ Delegates to [`predict`](/api/sentencerecognizer#predict) and
 | Name              | Description                                                                                                                        |
 | ----------------- | ---------------------------------------------------------------------------------------------------------------------------------- |
 | `examples`        | A batch of [`Example`](/api/example) objects to learn from. ~~Iterable[Example]~~                                                  |
-| _keyword-only_    |                                                                                                                                    |  |
+| _keyword-only_    |                                                                                                                                    |
 | `drop`            | The dropout rate. ~~float~~                                                                                                        |
 | `set_annotations` | Whether or not to update the `Example` objects with the predictions, delegating to [`set_annotations`](#set_annotations). ~~bool~~ |
 | `sgd`             | An optimizer. Will be created via [`create_optimizer`](#create_optimizer) if not set. ~~Optional[Optimizer]~~                      |
@ -216,7 +216,7 @@ the "catastrophic forgetting" problem. This feature is experimental.
 | Name           | Description                                                                                                              |
 | -------------- | ------------------------------------------------------------------------------------------------------------------------ |
 | `examples`     | A batch of [`Example`](/api/example) objects to learn from. ~~Iterable[Example]~~                                        |
-| _keyword-only_ |                                                                                                                          |  |
+| _keyword-only_ |                                                                                                                          |
 | `drop`         | The dropout rate. ~~float~~                                                                                              |
 | `sgd`          | An optimizer. Will be created via [`create_optimizer`](#create_optimizer) if not set. ~~Optional[Optimizer]~~            |
 | `losses`       | Optional record of the loss during training. Updated using the component name as the key. ~~Optional[Dict[str, float]]~~ |
--- a/website/docs/api/sentencizer.md
+++ b/website/docs/api/sentencizer.md
@ -53,7 +53,7 @@ Initialize the sentencizer.
 | Name           | Description                                                                                                             |
 | -------------- | ----------------------------------------------------------------------------------------------------------------------- |
-| _keyword-only_ |                                                                                                                         |  |
+| _keyword-only_ |                                                                                                                         |
 | `punct_chars`  | Optional custom list of punctuation characters that mark sentence ends. See below for defaults. ~~Optional[List[str]]~~ |
 ```python
--- a/website/docs/api/tagger.md
+++ b/website/docs/api/tagger.md
@ -190,7 +190,7 @@ Delegates to [`predict`](/api/tagger#predict) and
 | Name              | Description                                                                                                                        |
 | ----------------- | ---------------------------------------------------------------------------------------------------------------------------------- |
 | `examples`        | A batch of [`Example`](/api/example) objects to learn from. ~~Iterable[Example]~~                                                  |
-| _keyword-only_    |                                                                                                                                    |  |
+| _keyword-only_    |                                                                                                                                    |
 | `drop`            | The dropout rate. ~~float~~                                                                                                        |
 | `set_annotations` | Whether or not to update the `Example` objects with the predictions, delegating to [`set_annotations`](#set_annotations). ~~bool~~ |
 | `sgd`             | An optimizer. Will be created via [`create_optimizer`](#create_optimizer) if not set. ~~Optional[Optimizer]~~                      |
@ -214,7 +214,7 @@ the "catastrophic forgetting" problem. This feature is experimental.
 | Name           | Description                                                                                                              |
 | -------------- | ------------------------------------------------------------------------------------------------------------------------ |
 | `examples`     | A batch of [`Example`](/api/example) objects to learn from. ~~Iterable[Example]~~                                        |
-| _keyword-only_ |                                                                                                                          |  |
+| _keyword-only_ |                                                                                                                          |
 | `drop`         | The dropout rate. ~~float~~                                                                                              |
 | `sgd`          | An optimizer. Will be created via [`create_optimizer`](#create_optimizer) if not set. ~~Optional[Optimizer]~~            |
 | `losses`       | Optional record of the loss during training. Updated using the component name as the key. ~~Optional[Dict[str, float]]~~ |
--- a/website/docs/api/textcategorizer.md
+++ b/website/docs/api/textcategorizer.md
@ -37,9 +37,10 @@ architectures and their arguments and hyperparameters.
 > ```
 | Setting          | Description                                                                                                                                                      |
-| ----------- | ---------------------------------------------------------------------------------------------------------------------------------------------------------------- |
+| ---------------- | ---------------------------------------------------------------------------------------------------------------------------------------------------------------- |
 | `labels`         | A list of categories to learn. If empty, the model infers the categories from the data. Defaults to `[]`. ~~Iterable[str]~~                                      |
 | `threshold`      | Cutoff to consider a prediction "positive", relevant when printing accuracy results. ~~float~~                                                                   |
 | `positive_label` | The positive label for a binary task with exclusive classes, None otherwise and by default. ~~Optional[str]~~                                                    |
 | `model`          | A model instance that predicts scores for each category. Defaults to [TextCatEnsemble](/api/architectures#TextCatEnsemble). ~~Model[List[Doc], List[Floats2d]]~~ |
 ```python
@ -60,7 +61,7 @@ architectures and their arguments and hyperparameters.
 >
 > # Construction from class
 > from spacy.pipeline import TextCategorizer
-> textcat = TextCategorizer(nlp.vocab, model, labels=[], threshold=0.5)
+> textcat = TextCategorizer(nlp.vocab, model, labels=[], threshold=0.5, positive_label="POS")
 > ```
 Create a new pipeline instance. In your application, you would normally use a
@ -68,13 +69,14 @@ shortcut for this and instantiate the component using its string name and
 [`nlp.add_pipe`](/api/language#add_pipe).
 | Name             | Description                                                                                                                |
-| -------------- | -------------------------------------------------------------------------------------------------------------------------- |
+| ---------------- | -------------------------------------------------------------------------------------------------------------------------- |
 | `vocab`          | The shared vocabulary. ~~Vocab~~                                                                                           |
 | `model`          | The Thinc [`Model`](https://thinc.ai/docs/api-model) powering the pipeline component. ~~Model[List[Doc], List[Floats2d]]~~ |
 | `name`           | String name of the component instance. Used to add entries to the `losses` during training. ~~str~~                        |
 | _keyword-only_   |                                                                                                                            |
 | `labels`         | The labels to use. ~~Iterable[str]~~                                                                                       |
 | `threshold`      | Cutoff to consider a prediction "positive", relevant when printing accuracy results. ~~float~~                             |
 | `positive_label` | The positive label for a binary task with exclusive classes, None otherwise. ~~Optional[str]~~                             |
 ## TextCategorizer.\_\_call\_\_ {#call tag="method"}
@ -201,7 +203,7 @@ Delegates to [`predict`](/api/textcategorizer#predict) and
 | Name              | Description                                                                                                                        |
 | ----------------- | ---------------------------------------------------------------------------------------------------------------------------------- |
 | `examples`        | A batch of [`Example`](/api/example) objects to learn from. ~~Iterable[Example]~~                                                  |
-| _keyword-only_    |                                                                                                                                    |  |
+| _keyword-only_    |                                                                                                                                    |
 | `drop`            | The dropout rate. ~~float~~                                                                                                        |
 | `set_annotations` | Whether or not to update the `Example` objects with the predictions, delegating to [`set_annotations`](#set_annotations). ~~bool~~ |
 | `sgd`             | An optimizer. Will be created via [`create_optimizer`](#create_optimizer) if not set. ~~Optional[Optimizer]~~                      |
@ -225,7 +227,7 @@ the "catastrophic forgetting" problem. This feature is experimental.
 | Name           | Description                                                                                                              |
 | -------------- | ------------------------------------------------------------------------------------------------------------------------ |
 | `examples`     | A batch of [`Example`](/api/example) objects to learn from. ~~Iterable[Example]~~                                        |
-| _keyword-only_ |                                                                                                                          |  |
+| _keyword-only_ |                                                                                                                          |
 | `drop`         | The dropout rate. ~~float~~                                                                                              |
 | `sgd`          | An optimizer. Will be created via [`create_optimizer`](#create_optimizer) if not set. ~~Optional[Optimizer]~~            |
 | `losses`       | Optional record of the loss during training. Updated using the component name as the key. ~~Optional[Dict[str, float]]~~ |
@ -263,7 +265,7 @@ Score a batch of examples.
 | Name             | Description                                                                                                          |
 | ---------------- | -------------------------------------------------------------------------------------------------------------------- |
 | `examples`       | The examples to score. ~~Iterable[Example]~~                                                                         |
-| _keyword-only_   |                                                                                                                      |  |
+| _keyword-only_   |                                                                                                                      |
 | `positive_label` | Optional positive label. ~~Optional[str]~~                                                                           |
 | **RETURNS**      | The scores, produced by [`Scorer.score_cats`](/api/scorer#score_cats). ~~Dict[str, Union[float, Dict[str, float]]]~~ |
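The `threshold` setting is described as the cutoff for treating a category score as a "positive" prediction. Below is a minimal sketch of that rule. It assumes a score exactly at the cutoff counts as positive. The `>=` comparison and the `predicted_labels` helper are assumptions for illustration, not spaCy's scorer code.

```python
from typing import Dict, List


def predicted_labels(scores: Dict[str, float], threshold: float = 0.5) -> List[str]:
    # A category is predicted when its score reaches the cutoff.
    # Using ">=" at the boundary is an assumption.
    return sorted(label for label, score in scores.items() if score >= threshold)


# Hypothetical scores for one document with exclusive classes "POS"/"NEG".
doc_scores = {"POS": 0.73, "NEG": 0.27}
print(predicted_labels(doc_scores))       # ['POS']
print(predicted_labels(doc_scores, 0.8))  # []
```

For a binary task with `positive_label="POS"`, the same rule reduces to checking whether the `"POS"` score reaches the cutoff.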
--- a/website/docs/api/tok2vec.md
+++ b/website/docs/api/tok2vec.md
@ -144,7 +144,7 @@ setting up the label scheme based on the data.
 | Name           | Description                                                                                                                           |
 | -------------- | ------------------------------------------------------------------------------------------------------------------------------------- |
 | `get_examples` | Function that returns gold-standard annotations in the form of [`Example`](/api/example) objects. ~~Callable[[], Iterable[Example]]~~ |
-| _keyword-only_ |                                                                                                                                       |  |
+| _keyword-only_ |                                                                                                                                       |
 | `pipeline`     | Optional list of pipeline components that this component is part of. ~~Optional[List[Tuple[str, Callable[[Doc], Doc]]]]~~             |
 | `sgd`          | An optimizer. Will be created via [`create_optimizer`](#create_optimizer) if not set. ~~Optional[Optimizer]~~                         |
 | **RETURNS**    | The optimizer. ~~Optimizer~~                                                                                                          |
@ -200,7 +200,7 @@ Delegates to [`predict`](/api/tok2vec#predict).
 | Name              | Description                                                                                                                        |
 | ----------------- | ---------------------------------------------------------------------------------------------------------------------------------- |
 | `examples`        | A batch of [`Example`](/api/example) objects to learn from. ~~Iterable[Example]~~                                                  |
-| _keyword-only_    |                                                                                                                                    |  |
+| _keyword-only_    |                                                                                                                                    |
 | `drop`            | The dropout rate. ~~float~~                                                                                                        |
 | `set_annotations` | Whether or not to update the `Example` objects with the predictions, delegating to [`set_annotations`](#set_annotations). ~~bool~~ |
 | `sgd`             | An optimizer. Will be created via [`create_optimizer`](#create_optimizer) if not set. ~~Optional[Optimizer]~~                      |
--- a/website/docs/styleguide.md
+++ b/website/docs/styleguide.md
@ -11,6 +11,7 @@ menu:
  - ['Setup & Installation', 'setup']
  - ['Markdown Reference', 'markdown']
  - ['Project Structure', 'structure']
  - ['Editorial', 'editorial']
 sidebar:
  - label: Styleguide
    items:
--- a/website/src/components/quickstart.js
+++ b/website/src/components/quickstart.js
@ -27,6 +27,7 @@ const Quickstart = ({
    hidePrompts,
    small,
    codeLang,
    Container = Section,
    children,
 }) => {
    const contentRef = useRef()
@ -83,7 +84,7 @@ const Quickstart = ({
    }, [data, initialized])
    return !data.length ? null : (
-        <Section id={id}>
+        <Container id={id}>
            <div className={classNames(classes.root, { [classes.hidePrompts]: !!hidePrompts })}>
                {title && (
                    <H2 className={classes.title} name={id}>
@ -249,7 +250,7 @@ const Quickstart = ({
                </pre>
                {showCopy && <textarea ref={copyAreaRef} className={classes.copyArea} rows={1} />}
            </div>
-        </Section>
+        </Container>
    )
 }
--- a/website/src/styles/list.module.sass
+++ b/website/src/styles/list.module.sass
@ -41,3 +41,7 @@
    &:before
        content: ""
    .ul .ul &
        text-indent: initial
        margin-left: -20px
--- a/website/src/widgets/quickstart-training.js
+++ b/website/src/widgets/quickstart-training.js
@ -87,6 +87,8 @@ export default function QuickstartTraining({ id, title, download = 'base_config.
                    .sort((a, b) => a.title.localeCompare(b.title))
                return (
                    <Quickstart
                        id="quickstart-widget"
                        Container="div"
                        download={download}
                        rawContent={content}
                        data={DATA}