Merge branch 'develop' of https://github.com/explosion/spaCy into develop

Ines Montani 2021-01-13 12:03:02 +11:00
commit 97d5a7ba99
82 changed files with 2710 additions and 4817 deletions

.github/contributors/bratao.md (vendored, new file)

@@ -0,0 +1,106 @@
# spaCy contributor agreement
This spaCy Contributor Agreement (**"SCA"**) is based on the
[Oracle Contributor Agreement](http://www.oracle.com/technetwork/oca-405177.pdf).
The SCA applies to any contribution that you make to any product or project
managed by us (the **"project"**), and sets out the intellectual property rights
you grant to us in the contributed materials. The term **"us"** shall mean
[ExplosionAI GmbH](https://explosion.ai/legal). The term
**"you"** shall mean the person or entity identified below.
If you agree to be bound by these terms, fill in the information requested
below and include the filled-in version with your first pull request, under the
folder [`.github/contributors/`](/.github/contributors/). The name of the file
should be your GitHub username, with the extension `.md`. For example, the user
example_user would create the file `.github/contributors/example_user.md`.
Read this agreement carefully before signing. These terms and conditions
constitute a binding legal agreement.
## Contributor Agreement
1. The term "contribution" or "contributed materials" means any source code,
object code, patch, tool, sample, graphic, specification, manual,
documentation, or any other material posted or submitted by you to the project.
2. With respect to any worldwide copyrights, or copyright applications and
registrations, in your contribution:
* you hereby assign to us joint ownership, and to the extent that such
assignment is or becomes invalid, ineffective or unenforceable, you hereby
grant to us a perpetual, irrevocable, non-exclusive, worldwide, no-charge,
royalty-free, unrestricted license to exercise all rights under those
copyrights. This includes, at our option, the right to sublicense these same
rights to third parties through multiple levels of sublicensees or other
licensing arrangements;
* you agree that each of us can do all things in relation to your
contribution as if each of us were the sole owners, and if one of us makes
a derivative work of your contribution, the one who makes the derivative
work (or has it made) will be the sole owner of that derivative work;
* you agree that you will not assert any moral rights in your contribution
against us, our licensees or transferees;
* you agree that we may register a copyright in your contribution and
exercise all ownership rights associated with it; and
* you agree that neither of us has any duty to consult with, obtain the
consent of, pay or render an accounting to the other for any use or
distribution of your contribution.
3. With respect to any patents you own, or that you can license without payment
to any third party, you hereby grant to us a perpetual, irrevocable,
non-exclusive, worldwide, no-charge, royalty-free license to:
* make, have made, use, sell, offer to sell, import, and otherwise transfer
your contribution in whole or in part, alone or in combination with or
included in any product, work or materials arising out of the project to
which your contribution was submitted, and
* at our option, to sublicense these same rights to third parties through
multiple levels of sublicensees or other licensing arrangements.
4. Except as set out above, you keep all right, title, and interest in your
contribution. The rights that you grant to us under these terms are effective
on the date you first submitted a contribution to us, even if your submission
took place before the date you sign these terms.
5. You covenant, represent, warrant and agree that:
* Each contribution that you submit is and shall be an original work of
authorship and you can legally grant the rights set out in this SCA;
* to the best of your knowledge, each contribution will not violate any
third party's copyrights, trademarks, patents, or other intellectual
property rights; and
* each contribution shall be in compliance with U.S. export control laws and
other applicable export and import laws. You agree to notify us if you
become aware of any circumstance which would make any of the foregoing
representations inaccurate in any respect. We may publicly disclose your
participation in the project, including the fact that you have signed the SCA.
6. This SCA is governed by the laws of the State of California and applicable
U.S. Federal law. Any choice of law rules will not apply.
7. Please place an “x” on one of the applicable statements below. Please do NOT
mark both statements:
* [X] I am signing on behalf of myself as an individual and no other person
or entity, including my employer, has or will have rights with respect to my
contributions.
* [ ] I am signing on behalf of my employer or a legal entity and I have the
actual authority to contractually bind that entity.
## Contributor Details
| Field | Entry |
|------------------------------- | -------------------- |
| Name | Bruno Souza Cabral |
| Company name (if applicable) | |
| Title or role (if applicable) | |
| Date | 24/12/2020 |
| GitHub username | bratao |
| Website (optional) | |

.gitignore (vendored)

@@ -51,6 +51,7 @@ env3.*/
 .pypyenv
 .pytest_cache/
 .mypy_cache/
+.hypothesis/

 # Distribution / packaging
 env/

spacy/cli/download.py

@@ -35,7 +35,10 @@ def download_cli(
 def download(model: str, direct: bool = False, *pip_args) -> None:
-    if not (is_package("spacy") or is_package("spacy-nightly")) and "--no-deps" not in pip_args:
+    if (
+        not (is_package("spacy") or is_package("spacy-nightly"))
+        and "--no-deps" not in pip_args
+    ):
         msg.warn(
             "Skipping pipeline package dependencies and setting `--no-deps`. "
             "You don't seem to have the spaCy package itself installed "

spacy/cli/evaluate.py

@@ -172,7 +172,9 @@ def render_parses(
         file_.write(html)

-def print_prf_per_type(msg: Printer, scores: Dict[str, Dict[str, float]], name: str, type: str) -> None:
+def print_prf_per_type(
+    msg: Printer, scores: Dict[str, Dict[str, float]], name: str, type: str
+) -> None:
     data = [
         (k, f"{v['p']*100:.2f}", f"{v['r']*100:.2f}", f"{v['f']*100:.2f}")
         for k, v in scores.items()

spacy/cli/info.py

@@ -1,10 +1,10 @@
-from typing import Optional, Dict, Any, Union
+from typing import Optional, Dict, Any, Union, List
 import platform
 from pathlib import Path
 from wasabi import Printer, MarkdownRenderer
 import srsly

-from ._util import app, Arg, Opt
+from ._util import app, Arg, Opt, string_to_list
 from .. import util
 from .. import about

@@ -15,20 +15,22 @@ def info_cli(
     model: Optional[str] = Arg(None, help="Optional loadable spaCy pipeline"),
     markdown: bool = Opt(False, "--markdown", "-md", help="Generate Markdown for GitHub issues"),
     silent: bool = Opt(False, "--silent", "-s", "-S", help="Don't print anything (just return)"),
+    exclude: Optional[str] = Opt("labels", "--exclude", "-e", help="Comma-separated keys to exclude from the print-out"),
     # fmt: on
 ):
     """
-    Print info about spaCy installation. If a pipeline is speficied as an argument,
+    Print info about spaCy installation. If a pipeline is specified as an argument,
     print its meta information. Flag --markdown prints details in Markdown for easy
     copy-pasting to GitHub issues.

     DOCS: https://nightly.spacy.io/api/cli#info
     """
-    info(model, markdown=markdown, silent=silent)
+    exclude = string_to_list(exclude)
+    info(model, markdown=markdown, silent=silent, exclude=exclude)

 def info(
-    model: Optional[str] = None, *, markdown: bool = False, silent: bool = True
+    model: Optional[str] = None, *, markdown: bool = False, silent: bool = True, exclude: List[str]
 ) -> Union[str, dict]:
     msg = Printer(no_print=silent, pretty=not silent)
     if model:

@@ -42,13 +44,13 @@ def info(
         data["Pipelines"] = ", ".join(
             f"{n} ({v})" for n, v in data["Pipelines"].items()
         )
-    markdown_data = get_markdown(data, title=title)
+    markdown_data = get_markdown(data, title=title, exclude=exclude)
     if markdown:
         if not silent:
             print(markdown_data)
         return markdown_data
     if not silent:
-        table_data = dict(data)
+        table_data = {k: v for k, v in data.items() if k not in exclude}
         msg.table(table_data, title=title)
     return raw_data

@@ -82,7 +84,7 @@ def info_model(model: str, *, silent: bool = True) -> Dict[str, Any]:
     if util.is_package(model):
         model_path = util.get_package_path(model)
     else:
-        model_path = model
+        model_path = Path(model)
     meta_path = model_path / "meta.json"
     if not meta_path.is_file():
         msg.fail("Can't find pipeline meta.json", meta_path, exits=1)

@@ -96,7 +98,7 @@ def info_model(model: str, *, silent: bool = True) -> Dict[str, Any]:
     }

-def get_markdown(data: Dict[str, Any], title: Optional[str] = None) -> str:
+def get_markdown(data: Dict[str, Any], title: Optional[str] = None, exclude: List[str] = None) -> str:
     """Get data in GitHub-flavoured Markdown format for issues etc.

     data (dict or list of tuples): Label/value pairs.

@@ -108,8 +110,16 @@ def get_markdown(data: Dict[str, Any], title: Optional[str] = None) -> str:
         md.add(md.title(2, title))
     items = []
     for key, value in data.items():
-        if isinstance(value, str) and Path(value).exists():
+        if exclude and key in exclude:
             continue
+        if isinstance(value, str):
+            try:
+                existing_path = Path(value).exists()
+            except:
+                # invalid Path, like a URL string
+                existing_path = False
+            if existing_path:
+                continue
         items.append(f"{md.bold(f'{key}:')} {value}")
     md.add(md.list(items))
     return f"\n{md.text}\n"
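
A quick sketch of how the new exclude parameter is meant to be used from Python (the pipeline name here is just an illustration; on the command line the equivalent is python -m spacy info --exclude labels). Note that exclude is now keyword-only with no default, so direct callers must pass it:

    from spacy.cli.info import info

    # "labels" can be noisy in bug reports, so it is excluded by default in the CLI;
    # when calling info() directly, the list must be supplied explicitly
    markdown = info("en_core_web_sm", markdown=True, silent=True, exclude=["labels"])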

spacy/cli/init_config.py

@@ -32,6 +32,7 @@ def init_config_cli(
     optimize: Optimizations = Opt(Optimizations.efficiency.value, "--optimize", "-o", help="Whether to optimize for efficiency (faster inference, smaller model, lower memory consumption) or higher accuracy (potentially larger and slower model). This will impact the choice of architecture, pretrained weights and related hyperparameters."),
     gpu: bool = Opt(False, "--gpu", "-G", help="Whether the model can run on GPU. This will impact the choice of architecture, pretrained weights and related hyperparameters."),
     pretraining: bool = Opt(False, "--pretraining", "-pt", help="Include config for pretraining (with 'spacy pretrain')"),
+    force_overwrite: bool = Opt(False, "--force", "-F", help="Force overwriting the output file"),
     # fmt: on
 ):
     """

@@ -46,6 +47,12 @@ def init_config_cli(
     optimize = optimize.value
     pipeline = string_to_list(pipeline)
     is_stdout = str(output_file) == "-"
+    if not is_stdout and output_file.exists() and not force_overwrite:
+        msg = Printer()
+        msg.fail(
+            "The provided output file already exists. To force overwriting the config file, set the --force or -F flag.",
+            exits=1,
+        )
     config = init_config(
         lang=lang,
         pipeline=pipeline,

@@ -162,7 +169,7 @@ def init_config(
         "Hardware": variables["hardware"].upper(),
         "Transformer": template_vars.transformer.get("name", False),
     }
-    msg.info("Generated template specific for your use case")
+    msg.info("Generated config template specific for your use case")
     for label, value in use_case.items():
         msg.text(f"- {label}: {value}")
     with show_validation_error(hint_fill=False):
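
On the command line this guard surfaces as python -m spacy init config config.cfg --force. A standalone sketch of the same pattern, with the flag value hard-coded for illustration:

    from pathlib import Path
    from wasabi import Printer

    # sketch of the new guard: refuse to overwrite an existing config file
    # unless the (here hard-coded) force_overwrite flag is set; "-" means stdout
    output_file = Path("config.cfg")
    force_overwrite = False
    if str(output_file) != "-" and output_file.exists() and not force_overwrite:
        Printer().fail("The provided output file already exists.", exits=1)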

spacy/cli/templates/quickstart_training.jinja

@@ -149,13 +149,44 @@ grad_factor = 1.0
 [components.textcat.model.linear_model]
 @architectures = "spacy.TextCatBOW.v1"
-exclusive_classes = false
+exclusive_classes = true
 ngram_size = 1
 no_output_layer = false

 {% else -%}
 [components.textcat.model]
 @architectures = "spacy.TextCatBOW.v1"
+exclusive_classes = true
+ngram_size = 1
+no_output_layer = false
+{%- endif %}
+{%- endif %}
+
+{% if "textcat_multilabel" in components %}
+[components.textcat_multilabel]
+factory = "textcat_multilabel"
+
+{% if optimize == "accuracy" %}
+[components.textcat_multilabel.model]
+@architectures = "spacy.TextCatEnsemble.v2"
+nO = null
+
+[components.textcat_multilabel.model.tok2vec]
+@architectures = "spacy-transformers.TransformerListener.v1"
+grad_factor = 1.0
+
+[components.textcat_multilabel.model.tok2vec.pooling]
+@layers = "reduce_mean.v1"
+
+[components.textcat_multilabel.model.linear_model]
+@architectures = "spacy.TextCatBOW.v1"
+exclusive_classes = false
+ngram_size = 1
+no_output_layer = false
+
+{% else -%}
+[components.textcat_multilabel.model]
+@architectures = "spacy.TextCatBOW.v1"
 exclusive_classes = false
 ngram_size = 1
 no_output_layer = false

@@ -174,7 +205,7 @@ no_output_layer = false
 factory = "tok2vec"

 [components.tok2vec.model]
-@architectures = "spacy.Tok2Vec.v1"
+@architectures = "spacy.Tok2Vec.v2"

 [components.tok2vec.model.embed]
 @architectures = "spacy.MultiHashEmbed.v1"

@@ -189,7 +220,7 @@ rows = [5000, 2500]
 include_static_vectors = {{ "true" if optimize == "accuracy" else "false" }}

 [components.tok2vec.model.encode]
-@architectures = "spacy.MaxoutWindowEncoder.v1"
+@architectures = "spacy.MaxoutWindowEncoder.v2"
 width = {{ 96 if optimize == "efficiency" else 256 }}
 depth = {{ 4 if optimize == "efficiency" else 8 }}
 window_size = 1

@@ -288,13 +319,41 @@ width = ${components.tok2vec.model.encode.width}
 [components.textcat.model.linear_model]
 @architectures = "spacy.TextCatBOW.v1"
-exclusive_classes = false
+exclusive_classes = true
 ngram_size = 1
 no_output_layer = false

 {% else -%}
 [components.textcat.model]
 @architectures = "spacy.TextCatBOW.v1"
+exclusive_classes = true
+ngram_size = 1
+no_output_layer = false
+{%- endif %}
+{%- endif %}
+
+{% if "textcat_multilabel" in components %}
+[components.textcat_multilabel]
+factory = "textcat_multilabel"
+
+{% if optimize == "accuracy" %}
+[components.textcat_multilabel.model]
+@architectures = "spacy.TextCatEnsemble.v2"
+nO = null
+
+[components.textcat_multilabel.model.tok2vec]
+@architectures = "spacy.Tok2VecListener.v1"
+width = ${components.tok2vec.model.encode.width}
+
+[components.textcat_multilabel.model.linear_model]
+@architectures = "spacy.TextCatBOW.v1"
+exclusive_classes = false
+ngram_size = 1
+no_output_layer = false
+
+{% else -%}
+[components.textcat_multilabel.model]
+@architectures = "spacy.TextCatBOW.v1"
 exclusive_classes = false
 ngram_size = 1
 no_output_layer = false

@@ -303,7 +362,7 @@ no_output_layer = false
 {% endif %}

 {% for pipe in components %}
-{% if pipe not in ["tagger", "morphologizer", "parser", "ner", "textcat", "entity_linker"] %}
+{% if pipe not in ["tagger", "morphologizer", "parser", "ner", "textcat", "textcat_multilabel", "entity_linker"] %}
 {# Other components defined by the user: we just assume they're factories #}
 [components.{{ pipe }}]
 factory = "{{ pipe }}"

spacy/errors.py

@@ -463,6 +463,10 @@ class Errors:
              "issue tracker: http://github.com/explosion/spaCy/issues")

     # TODO: fix numbering after merging develop into master
+    E895 = ("The 'textcat' component received gold-standard annotations with "
+            "multiple labels per document. In spaCy 3 you should use the "
+            "'textcat_multilabel' component for this instead. "
+            "Example of an offending annotation: {value}")
     E896 = ("There was an error using the static vectors. Ensure that the vectors "
             "of the vocab are properly initialized, or set 'include_static_vectors' "
             "to False.")

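E895 is raised by the new single-label validation in the textcat component (see _validate_categories further down in this commit). A minimal sketch of an annotation that would now be rejected; the label names are made up:

    import spacy
    from spacy.training import Example

    nlp = spacy.blank("en")
    doc = nlp.make_doc("A text about sports and politics.")
    # two gold labels set to 1.0 on one document: fine for textcat_multilabel,
    # but the single-label textcat component raises E895 on this annotation
    example = Example.from_dict(doc, {"cats": {"SPORTS": 1.0, "POLITICS": 1.0}})
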
spacy/lang/char_classes.py

@@ -214,8 +214,22 @@ _macedonian_lower = r"ѓѕјљњќѐѝ"
 _macedonian_upper = r"ЃЅЈЉЊЌЀЍ"
 _macedonian = r"ѓѕјљњќѐѝЃЅЈЉЊЌЀЍ"

-_upper = LATIN_UPPER + _russian_upper + _tatar_upper + _greek_upper + _ukrainian_upper + _macedonian_upper
-_lower = LATIN_LOWER + _russian_lower + _tatar_lower + _greek_lower + _ukrainian_lower + _macedonian_lower
+_upper = (
+    LATIN_UPPER
+    + _russian_upper
+    + _tatar_upper
+    + _greek_upper
+    + _ukrainian_upper
+    + _macedonian_upper
+)
+_lower = (
+    LATIN_LOWER
+    + _russian_lower
+    + _tatar_lower
+    + _greek_lower
+    + _ukrainian_lower
+    + _macedonian_lower
+)

 _uncased = (
     _bengali

@@ -230,7 +244,9 @@ _uncased = (
     + _cjk
 )

-ALPHA = group_chars(LATIN + _russian + _tatar + _greek + _ukrainian + _macedonian + _uncased)
+ALPHA = group_chars(
+    LATIN + _russian + _tatar + _greek + _ukrainian + _macedonian + _uncased
+)
 ALPHA_LOWER = group_chars(_lower + _uncased)
 ALPHA_UPPER = group_chars(_upper + _uncased)

spacy/lang/cs/__init__.py

@@ -1,18 +1,11 @@
 from .stop_words import STOP_WORDS
-from .tag_map import TAG_MAP
-from ...language import Language
-from ...attrs import LANG
 from .lex_attrs import LEX_ATTRS
 from ...language import Language

 class CzechDefaults(Language.Defaults):
-    lex_attr_getters = dict(Language.Defaults.lex_attr_getters)
-    lex_attr_getters.update(LEX_ATTRS)
-    lex_attr_getters[LANG] = lambda text: "cs"
-    tag_map = TAG_MAP
-    stop_words = STOP_WORDS
     lex_attr_getters = LEX_ATTRS
+    stop_words = STOP_WORDS

 class Czech(Language):

(File diff suppressed because it is too large)

spacy/lang/mk/lemmatizer.py

@@ -14,7 +14,7 @@ class MacedonianLemmatizer(Lemmatizer):
         if univ_pos in ("", "eol", "space"):
             return [string.lower()]

-        if string[-3:] == 'јќи':
+        if string[-3:] == "јќи":
             string = string[:-3]
             univ_pos = "verb"

@@ -23,7 +23,13 @@ class MacedonianLemmatizer(Lemmatizer):
         index_table = self.lookups.get_table("lemma_index", {})
         exc_table = self.lookups.get_table("lemma_exc", {})
         rules_table = self.lookups.get_table("lemma_rules", {})
-        if not any((index_table.get(univ_pos), exc_table.get(univ_pos), rules_table.get(univ_pos))):
+        if not any(
+            (
+                index_table.get(univ_pos),
+                exc_table.get(univ_pos),
+                rules_table.get(univ_pos),
+            )
+        ):
             if univ_pos == "propn":
                 return [string]
             else:

spacy/lang/mk/lex_attrs.py

@@ -1,21 +1,104 @@
 from ...attrs import LIKE_NUM

 _num_words = [
-    "нула", "еден", "една", "едно", "два", "две", "три", "четири", "пет", "шест", "седум", "осум", "девет", "десет",
-    "единаесет", "дванаесет", "тринаесет", "четиринаесет", "петнаесет", "шеснаесет", "седумнаесет", "осумнаесет",
-    "деветнаесет", "дваесет", "триесет", "четириесет", "педесет", "шеесет", "седумдесет", "осумдесет", "деведесет",
-    "сто", "двесте", "триста", "четиристотини", "петстотини", "шестотини", "седумстотини", "осумстотини",
-    "деветстотини", "илјада", "илјади", 'милион', 'милиони', 'милијарда', 'милијарди', 'билион', 'билиони',
-    "двајца", "тројца", "четворица", "петмина", "шестмина", "седуммина", "осуммина", "деветмина", "обата", "обајцата",
-    "прв", "втор", "трет", "четврт", "седм", "осм", "двестоти",
-    "два-три", "два-триесет", "два-триесетмина", "два-тринаесет", "два-тројца", "две-три", "две-тристотини",
-    "пет-шеесет", "пет-шеесетмина", "пет-шеснаесетмина", "пет-шест", "пет-шестмина", "пет-шестотини", "петина",
-    "осмина", "седум-осум", "седум-осумдесет", "седум-осуммина", "седум-осумнаесет", "седум-осумнаесетмина",
-    "три-четириесет", "три-четиринаесет", "шеесет", "шеесетина", "шеесетмина", "шеснаесет", "шеснаесетмина",
-    "шест-седум", "шест-седумдесет", "шест-седумнаесет", "шест-седумстотини", "шестоти", "шестотини"
+    "нула",
+    "еден",
+    "една",
+    "едно",
+    "два",
+    "две",
+    "три",
+    "четири",
+    "пет",
+    "шест",
+    "седум",
+    "осум",
+    "девет",
+    "десет",
+    "единаесет",
+    "дванаесет",
+    "тринаесет",
+    "четиринаесет",
+    "петнаесет",
+    "шеснаесет",
+    "седумнаесет",
+    "осумнаесет",
+    "деветнаесет",
+    "дваесет",
+    "триесет",
+    "четириесет",
+    "педесет",
+    "шеесет",
+    "седумдесет",
+    "осумдесет",
+    "деведесет",
+    "сто",
+    "двесте",
+    "триста",
+    "четиристотини",
+    "петстотини",
+    "шестотини",
+    "седумстотини",
+    "осумстотини",
+    "деветстотини",
+    "илјада",
+    "илјади",
+    "милион",
+    "милиони",
+    "милијарда",
+    "милијарди",
+    "билион",
+    "билиони",
+    "двајца",
+    "тројца",
+    "четворица",
+    "петмина",
+    "шестмина",
+    "седуммина",
+    "осуммина",
+    "деветмина",
+    "обата",
+    "обајцата",
+    "прв",
+    "втор",
+    "трет",
+    "четврт",
+    "седм",
+    "осм",
+    "двестоти",
+    "два-три",
+    "два-триесет",
+    "два-триесетмина",
+    "два-тринаесет",
+    "два-тројца",
+    "две-три",
+    "две-тристотини",
+    "пет-шеесет",
+    "пет-шеесетмина",
+    "пет-шеснаесетмина",
+    "пет-шест",
+    "пет-шестмина",
+    "пет-шестотини",
+    "петина",
+    "осмина",
+    "седум-осум",
+    "седум-осумдесет",
+    "седум-осуммина",
+    "седум-осумнаесет",
+    "седум-осумнаесетмина",
+    "три-четириесет",
+    "три-четиринаесет",
+    "шеесет",
+    "шеесетина",
+    "шеесетмина",
+    "шеснаесет",
+    "шеснаесетмина",
+    "шест-седум",
+    "шест-седумдесет",
+    "шест-седумнаесет",
+    "шест-седумстотини",
+    "шестоти",
+    "шестотини",
 ]

spacy/lang/mk/tokenizer_exceptions.py

@@ -21,8 +21,7 @@ _abbr_exc = [
     {ORTH: "хл", NORM: "хектолитар"},
     {ORTH: "дкл", NORM: "декалитар"},
     {ORTH: "л", NORM: "литар"},
-    {ORTH: "дл", NORM: "децилитар"}
+    {ORTH: "дл", NORM: "децилитар"},
 ]
 for abbr in _abbr_exc:
     _exc[abbr[ORTH]] = [abbr]

@@ -33,7 +32,6 @@ _abbr_line_exc = [
     {ORTH: "г-ѓа", NORM: "госпоѓа"},
     {ORTH: "г-ца", NORM: "госпоѓица"},
     {ORTH: "г-дин", NORM: "господин"},
-
 ]

 for abbr in _abbr_line_exc:

@@ -54,7 +52,6 @@ _abbr_dot_exc = [
     {ORTH: "т.", NORM: "точка"},
     {ORTH: "т.е.", NORM: "то ест"},
     {ORTH: "т.н.", NORM: "таканаречен"},
-
     {ORTH: "бр.", NORM: "број"},
     {ORTH: "гр.", NORM: "град"},
     {ORTH: "др.", NORM: "другар"},

@@ -68,7 +65,6 @@ _abbr_dot_exc = [
     {ORTH: "с.", NORM: "страница"},
     {ORTH: "стр.", NORM: "страница"},
     {ORTH: "чл.", NORM: "член"},
-
     {ORTH: "арх.", NORM: "архитект"},
     {ORTH: "бел.", NORM: "белешка"},
     {ORTH: "гимн.", NORM: "гимназија"},

@@ -89,8 +85,6 @@ _abbr_dot_exc = [
     {ORTH: "истор.", NORM: "историја"},
     {ORTH: "геогр.", NORM: "географија"},
     {ORTH: "литер.", NORM: "литература"},
-
 ]

 for abbr in _abbr_dot_exc:

spacy/lang/tr/tokenizer_exceptions.py

@@ -45,7 +45,7 @@ _abbr_period_exc = [
     {ORTH: "Doç.", NORM: "doçent"},
     {ORTH: "doğ."},
     {ORTH: "Dr.", NORM: "doktor"},
-    {ORTH: "dr.", NORM:"doktor"},
+    {ORTH: "dr.", NORM: "doktor"},
     {ORTH: "drl.", NORM: "derleyen"},
     {ORTH: "Dz.", NORM: "deniz"},
     {ORTH: "Dz.K.K.lığı"},

@@ -118,7 +118,7 @@ _abbr_period_exc = [
     {ORTH: "Uzm.", NORM: "uzman"},
     {ORTH: "Üçvş.", NORM: "üstçavuş"},
     {ORTH: "Üni.", NORM: "üniversitesi"},
-    {ORTH: "Ütğm.", NORM:"üsteğmen"},
+    {ORTH: "Ütğm.", NORM: "üsteğmen"},
     {ORTH: "vb."},
     {ORTH: "vs.", NORM: "vesaire"},
     {ORTH: "Yard.", NORM: "yardımcı"},

@@ -163,19 +163,29 @@ for abbr in _abbr_exc:
     _exc[abbr[ORTH]] = [abbr]

 _num = r"[+-]?\d+([,.]\d+)*"
 _ord_num = r"(\d+\.)"
 _date = r"(((\d{1,2}[./-]){2})?(\d{4})|(\d{1,2}[./]\d{1,2}(\.)?))"
 _dash_num = r"(([{al}\d]+/\d+)|(\d+/[{al}]))".format(al=ALPHA)
 _roman_num = "M{0,3}(?:C[MD]|D?C{0,3})(?:X[CL]|L?X{0,3})(?:I[XV]|V?I{0,3})"
 _roman_ord = r"({rn})\.".format(rn=_roman_num)
 _time_exp = r"\d+(:\d+)*"

 _inflections = r"'[{al}]+".format(al=ALPHA_LOWER)
 _abbrev_inflected = r"[{a}]+\.'[{al}]+".format(a=ALPHA, al=ALPHA_LOWER)

-_nums = r"(({d})|({dn})|({te})|({on})|({n})|({ro})|({rn}))({inf})?".format(d=_date, dn=_dash_num, te=_time_exp, on=_ord_num, n=_num, ro=_roman_ord, rn=_roman_num, inf=_inflections)
+_nums = r"(({d})|({dn})|({te})|({on})|({n})|({ro})|({rn}))({inf})?".format(
+    d=_date,
+    dn=_dash_num,
+    te=_time_exp,
+    on=_ord_num,
+    n=_num,
+    ro=_roman_ord,
+    rn=_roman_num,
+    inf=_inflections,
+)

 TOKENIZER_EXCEPTIONS = _exc

-TOKEN_MATCH = re.compile(r"^({abbr})|({n})$".format(n=_nums, abbr=_abbrev_inflected)).match
+TOKEN_MATCH = re.compile(
+    r"^({abbr})|({n})$".format(n=_nums, abbr=_abbrev_inflected)
+).match

spacy/ml/extract_ngrams.py

@@ -1,4 +1,3 @@
-import numpy
 from thinc.api import Model

 from ..attrs import LOWER

spacy/ml/models/parser.py

@@ -21,14 +21,14 @@ def transition_parser_v1(
@@ -42,14 +42,15 @@ def transition_parser_v2(
(whitespace-only: the argument lists of both build_tb_parser_model(...) calls
were re-indented; the arguments themselves are unchanged)
     nO: Optional[int] = None,
 ) -> Model:
     return build_tb_parser_model(
         tok2vec,
         state_type,
         extra_state_tokens,
         hidden_width,
         maxout_pieces,
         use_upper,
         nO,
     )

 def build_tb_parser_model(
     tok2vec: Model[List[Doc], List[Floats2d]],

@@ -162,8 +163,8 @@ def _resize_upper(model, new_nO):
     # just adding rows here.
     if smaller.has_dim("nO"):
         old_nO = smaller.get_dim("nO")
-        larger_W[: old_nO] = smaller_W
-        larger_b[: old_nO] = smaller_b
+        larger_W[:old_nO] = smaller_W
+        larger_b[:old_nO] = smaller_b
         for i in range(old_nO, new_nO):
             model.attrs["unseen_classes"].add(i)

spacy/ml/models/textcat.py

@@ -6,6 +6,7 @@ from thinc.api import chain, concatenate, clone, Dropout, ParametricAttention
 from thinc.api import SparseLinear, Softmax, softmax_activation, Maxout, reduce_sum
 from thinc.api import HashEmbed, with_array, with_cpu, uniqued
 from thinc.api import Relu, residual, expand_window
+from thinc.layers.chain import init as init_chain

 from ...attrs import ID, ORTH, PREFIX, SUFFIX, SHAPE, LOWER
 from ...util import registry

@@ -13,6 +14,7 @@ from ..extract_ngrams import extract_ngrams
 from ..staticvectors import StaticVectors
 from ..featureextractor import FeatureExtractor
 from ...tokens import Doc
+from .tok2vec import get_tok2vec_width

 @registry.architectures.register("spacy.TextCatCNN.v1")

@@ -69,13 +71,16 @@ def build_text_classifier_v2(
     exclusive_classes = not linear_model.attrs["multi_label"]
     with Model.define_operators({">>": chain, "|": concatenate}):
         width = tok2vec.maybe_get_dim("nO")
+        attention_layer = ParametricAttention(width)  # TODO: benchmark performance difference of this layer
+        maxout_layer = Maxout(nO=width, nI=width)
+        linear_layer = Linear(nO=nO, nI=width)
         cnn_model = (
             tok2vec
             >> list2ragged()
-            >> ParametricAttention(width)  # TODO: benchmark performance difference of this layer
+            >> attention_layer
             >> reduce_sum()
-            >> residual(Maxout(nO=width, nI=width))
-            >> Linear(nO=nO, nI=width)
+            >> residual(maxout_layer)
+            >> linear_layer
             >> Dropout(0.0)
         )

@@ -89,9 +94,25 @@ def build_text_classifier_v2(
     if model.has_dim("nO") is not False:
         model.set_dim("nO", nO)
     model.set_ref("output_layer", linear_model.get_ref("output_layer"))
+    model.set_ref("attention_layer", attention_layer)
+    model.set_ref("maxout_layer", maxout_layer)
+    model.set_ref("linear_layer", linear_layer)
     model.attrs["multi_label"] = not exclusive_classes
+    model.init = init_ensemble_textcat
     return model

+def init_ensemble_textcat(model, X, Y) -> Model:
+    tok2vec_width = get_tok2vec_width(model)
+    model.get_ref("attention_layer").set_dim("nO", tok2vec_width)
+    model.get_ref("maxout_layer").set_dim("nO", tok2vec_width)
+    model.get_ref("maxout_layer").set_dim("nI", tok2vec_width)
+    model.get_ref("linear_layer").set_dim("nI", tok2vec_width)
+    init_chain(model, X, Y)
+    return model

 # TODO: move to legacy
 @registry.architectures.register("spacy.TextCatEnsemble.v1")
 def build_text_classifier_v1(

spacy/ml/models/tok2vec.py

@@ -20,6 +20,17 @@ def tok2vec_listener_v1(width: int, upstream: str = "*"):
     return tok2vec

+def get_tok2vec_width(model: Model):
+    nO = None
+    if model.has_ref("tok2vec"):
+        tok2vec = model.get_ref("tok2vec")
+        if tok2vec.has_dim("nO"):
+            nO = tok2vec.get_dim("nO")
+        elif tok2vec.has_ref("listener"):
+            nO = tok2vec.get_ref("listener").get_dim("nO")
+    return nO

 @registry.architectures.register("spacy.HashEmbedCNN.v1")
 def build_hash_embed_cnn_tok2vec(
     *,

@@ -76,6 +87,7 @@ def build_hash_embed_cnn_tok2vec(
     )

+# TODO: archive
 @registry.architectures.register("spacy.Tok2Vec.v1")
 def build_Tok2Vec_model(
     embed: Model[List[Doc], List[Floats2d]],

@@ -97,6 +109,28 @@ def build_Tok2Vec_model(
     return tok2vec

+@registry.architectures.register("spacy.Tok2Vec.v2")
+def build_Tok2Vec_model(
+    embed: Model[List[Doc], List[Floats2d]],
+    encode: Model[List[Floats2d], List[Floats2d]],
+) -> Model[List[Doc], List[Floats2d]]:
+    """Construct a tok2vec model out of embedding and encoding subnetworks.
+    See https://explosion.ai/blog/deep-learning-formula-nlp
+
+    embed (Model[List[Doc], List[Floats2d]]): Embed tokens into context-independent
+        word vector representations.
+    encode (Model[List[Floats2d], List[Floats2d]]): Encode context into the
+        embeddings, using an architecture such as a CNN, BiLSTM or transformer.
+    """
+    tok2vec = chain(embed, encode)
+    tok2vec.set_dim("nO", encode.get_dim("nO"))
+    tok2vec.set_ref("embed", embed)
+    tok2vec.set_ref("encode", encode)
+    return tok2vec

 @registry.architectures.register("spacy.MultiHashEmbed.v1")
 def MultiHashEmbed(
     width: int,

@@ -244,6 +278,7 @@ def CharacterEmbed(
     return model

+# TODO: archive
 @registry.architectures.register("spacy.MaxoutWindowEncoder.v1")
 def MaxoutWindowEncoder(
     width: int, window_size: int, maxout_pieces: int, depth: int

@@ -275,7 +310,39 @@ def MaxoutWindowEncoder(
     model.attrs["receptive_field"] = window_size * depth
     return model

+@registry.architectures.register("spacy.MaxoutWindowEncoder.v2")
+def MaxoutWindowEncoder(
+    width: int, window_size: int, maxout_pieces: int, depth: int
+) -> Model[List[Floats2d], List[Floats2d]]:
+    """Encode context using convolutions with maxout activation, layer
+    normalization and residual connections.
+
+    width (int): The input and output width. These are required to be the same,
+        to allow residual connections. This value will be determined by the
+        width of the inputs. Recommended values are between 64 and 300.
+    window_size (int): The number of words to concatenate around each token
+        to construct the convolution. Recommended value is 1.
+    maxout_pieces (int): The number of maxout pieces to use. Recommended
+        values are 2 or 3.
+    depth (int): The number of convolutional layers. Recommended value is 4.
+    """
+    cnn = chain(
+        expand_window(window_size=window_size),
+        Maxout(
+            nO=width,
+            nI=width * ((window_size * 2) + 1),
+            nP=maxout_pieces,
+            dropout=0.0,
+            normalize=True,
+        ),
+    )
+    model = clone(residual(cnn), depth)
+    model.set_dim("nO", width)
+    receptive_field = window_size * depth
+    return with_array(model, pad=receptive_field)

+# TODO: archive
 @registry.architectures.register("spacy.MishWindowEncoder.v1")
 def MishWindowEncoder(
     width: int, window_size: int, depth: int

@@ -299,6 +366,29 @@ def MishWindowEncoder(
     return model

+@registry.architectures.register("spacy.MishWindowEncoder.v2")
+def MishWindowEncoder(
+    width: int, window_size: int, depth: int
+) -> Model[List[Floats2d], List[Floats2d]]:
+    """Encode context using convolutions with mish activation, layer
+    normalization and residual connections.
+
+    width (int): The input and output width. These are required to be the same,
+        to allow residual connections. This value will be determined by the
+        width of the inputs. Recommended values are between 64 and 300.
+    window_size (int): The number of words to concatenate around each token
+        to construct the convolution. Recommended value is 1.
+    depth (int): The number of convolutional layers. Recommended value is 4.
+    """
+    cnn = chain(
+        expand_window(window_size=window_size),
+        Mish(nO=width, nI=width * ((window_size * 2) + 1), dropout=0.0, normalize=True),
+    )
+    model = clone(residual(cnn), depth)
+    model.set_dim("nO", width)
+    return with_array(model)

 @registry.architectures.register("spacy.TorchBiLSTMEncoder.v1")
 def BiLSTMEncoder(
     width: int, depth: int, dropout: float

@@ -308,9 +398,9 @@ def BiLSTMEncoder(
     width (int): The input and output width. These are required to be the same,
         to allow residual connections. This value will be determined by the
         width of the inputs. Recommended values are between 64 and 300.
-    window_size (int): The number of words to concatenate around each token
-        to construct the convolution. Recommended value is 1.
-    depth (int): The number of convolutional layers. Recommended value is 4.
+    depth (int): The number of recurrent layers.
+    dropout (float): Creates a Dropout layer on the outputs of each LSTM layer
+        except the last layer. Set to 0 to disable this functionality.
     """
     if depth == 0:
         return noop()
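
The practical difference between spacy.Tok2Vec.v1 and v2 is that the composed model now exposes the encoder's output width as its own "nO" dimension, which is what the deferred initialization in the textcat ensemble relies on. A sketch resolving the new architectures from the registry; the widths, attrs and rows below are arbitrary illustration values, not defaults:

    from spacy import registry

    make_embed = registry.architectures.get("spacy.MultiHashEmbed.v1")
    make_encode = registry.architectures.get("spacy.MaxoutWindowEncoder.v2")
    make_tok2vec = registry.architectures.get("spacy.Tok2Vec.v2")

    embed = make_embed(
        width=96,
        attrs=["ORTH", "SHAPE"],
        rows=[5000, 2500],
        include_static_vectors=False,
    )
    encode = make_encode(width=96, window_size=1, maxout_pieces=3, depth=4)
    tok2vec = make_tok2vec(embed, encode)
    assert tok2vec.get_dim("nO") == 96  # v1 did not set this dimension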

spacy/ml/staticvectors.py

@@ -47,8 +47,7 @@ def forward(
     except ValueError:
         raise RuntimeError(Errors.E896)
     output = Ragged(
-        vectors_data,
-        model.ops.asarray([len(doc) for doc in docs], dtype="i")
+        vectors_data, model.ops.asarray([len(doc) for doc in docs], dtype="i")
     )
     mask = None
     if is_train:

spacy/ml/tb_framework.py

@@ -1,8 +1,10 @@
-from thinc.api import Model, noop, use_ops, Linear
+from thinc.api import Model, noop
 from .parser_model import ParserStepModel

-def TransitionModel(tok2vec, lower, upper, resize_output, dropout=0.2, unseen_classes=set()):
+def TransitionModel(
+    tok2vec, lower, upper, resize_output, dropout=0.2, unseen_classes=set()
+):
     """Set up a stepwise transition-based model"""
     if upper is None:
         has_upper = False

@@ -44,4 +46,3 @@ def init(model, X=None, Y=None):
     if model.attrs["has_upper"]:
         statevecs = model.ops.alloc2f(2, lower.get_dim("nO"))
         model.get_ref("upper").initialize(X=statevecs)
-

spacy/morphology.pyx

@@ -133,8 +133,9 @@ cdef class Morphology:
         """
        cdef MorphAnalysisC tag
        tag.length = len(field_feature_pairs)
-       tag.fields = <attr_t*>self.mem.alloc(tag.length, sizeof(attr_t))
-       tag.features = <attr_t*>self.mem.alloc(tag.length, sizeof(attr_t))
+       if tag.length > 0:
+           tag.fields = <attr_t*>self.mem.alloc(tag.length, sizeof(attr_t))
+           tag.features = <attr_t*>self.mem.alloc(tag.length, sizeof(attr_t))
        for i, (field, feature) in enumerate(field_feature_pairs):
            tag.fields[i] = field
            tag.features[i] = feature

spacy/pipeline/__init__.py

@@ -11,6 +11,7 @@ from .senter import SentenceRecognizer
 from .sentencizer import Sentencizer
 from .tagger import Tagger
 from .textcat import TextCategorizer
+from .textcat_multilabel import MultiLabel_TextCategorizer
 from .tok2vec import Tok2Vec
 from .functions import merge_entities, merge_noun_chunks, merge_subtokens

@@ -22,13 +23,14 @@ __all__ = [
     "EntityRuler",
     "Morphologizer",
     "Lemmatizer",
-    "TrainablePipe",
+    "MultiLabel_TextCategorizer",
     "Pipe",
     "SentenceRecognizer",
     "Sentencizer",
     "Tagger",
     "TextCategorizer",
     "Tok2Vec",
+    "TrainablePipe",
     "merge_entities",
     "merge_noun_chunks",
     "merge_subtokens",

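The new component class is re-exported at the package level (and __all__ is re-sorted alphabetically in the process), so user code can import it directly:

    from spacy.pipeline import MultiLabel_TextCategorizer
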
spacy/pipeline/_parser_internals/_beam_utils.pyx

@@ -255,7 +255,7 @@ def get_gradient(nr_class, beam_maps, histories, losses):
     for a beam state -- so we have "the gradient of loss for taking
     action i given history H."

-    Histories: Each hitory is a list of actions
+    Histories: Each history is a list of actions
     Each candidate has a history
     Each beam has multiple candidates
     Each batch has multiple beams

spacy/pipeline/_parser_internals/arc_eager.pxd

@@ -4,4 +4,4 @@ from .transition_system cimport Transition, TransitionSystem

 cdef class ArcEager(TransitionSystem):
-    pass
+    cdef get_arcs(self, StateC* state)

spacy/pipeline/_parser_internals/arc_eager.pyx

@@ -1,6 +1,7 @@
 # cython: profile=True, cdivision=True, infer_types=True
 from cymem.cymem cimport Pool, Address
 from libc.stdint cimport int32_t
+from libcpp.vector cimport vector

 from collections import defaultdict, Counter

@@ -10,9 +11,9 @@ from ...structs cimport TokenC
 from ...tokens.doc cimport Doc, set_children_from_heads
 from ...training.example cimport Example
 from .stateclass cimport StateClass
-from ._state cimport StateC
+from ._state cimport StateC, ArcC
 from ...errors import Errors
+from thinc.extra.search cimport Beam

 cdef weight_t MIN_SCORE = -90000
 cdef attr_t SUBTOK_LABEL = hash_string(u'subtok')

@@ -65,6 +66,7 @@ cdef GoldParseStateC create_gold_state(Pool mem, const StateC* state,
     cdef GoldParseStateC gs
     gs.length = len(heads)
     gs.stride = 1
+    assert gs.length > 0
     gs.labels = <attr_t*>mem.alloc(gs.length, sizeof(gs.labels[0]))
     gs.heads = <int32_t*>mem.alloc(gs.length, sizeof(gs.heads[0]))
     gs.n_kids = <int32_t*>mem.alloc(gs.length, sizeof(gs.n_kids[0]))

@@ -126,6 +128,7 @@ cdef GoldParseStateC create_gold_state(Pool mem, const StateC* state,
         1
     )
     # Make an array of pointers, pointing into the gs_kids_flat array.
+    assert gs.length > 0
     gs.kids = <int32_t**>mem.alloc(gs.length, sizeof(int32_t*))
     for i in range(gs.length):
         if gs.n_kids[i] != 0:

@@ -609,7 +612,7 @@ cdef class ArcEager(TransitionSystem):
         return gold

     def init_gold_batch(self, examples):
-        # TODO: Projectivitity?
+        # TODO: Projectivity?
         all_states = self.init_batch([eg.predicted for eg in examples])
         golds = []
         states = []

@@ -705,6 +708,28 @@ cdef class ArcEager(TransitionSystem):
                 doc.c[i].dep = self.root_label
         set_children_from_heads(doc.c, 0, doc.length)

+    def get_beam_parses(self, Beam beam):
+        parses = []
+        probs = beam.probs
+        for i in range(beam.size):
+            state = <StateC*>beam.at(i)
+            if state.is_final():
+                prob = probs[i]
+                parse = []
+                arcs = self.get_arcs(state)
+                if arcs:
+                    for arc in arcs:
+                        dep = arc["label"]
+                        label = self.strings[dep]
+                        parse.append((arc["head"], arc["child"], label))
+                parses.append((prob, parse))
+        return parses
+
+    cdef get_arcs(self, StateC* state):
+        cdef vector[ArcC] arcs
+        state.get_arcs(&arcs)
+        return list(arcs)
+
     def has_gold(self, Example eg, start=0, end=None):
         for word in eg.y[start:end]:
             if word.dep != 0:

spacy/pipeline/_parser_internals/ner.pyx

@@ -2,6 +2,7 @@ from libc.stdint cimport int32_t
 from cymem.cymem cimport Pool

 from collections import Counter
+from thinc.extra.search cimport Beam

 from ...tokens.doc cimport Doc
 from ...tokens.span import Span

@@ -63,6 +64,7 @@ cdef GoldNERStateC create_gold_state(
     Example example
 ) except *:
     cdef GoldNERStateC gs
+    assert example.x.length > 0
     gs.ner = <Transition*>mem.alloc(example.x.length, sizeof(Transition))
     ner_tags = example.get_aligned_ner()
     for i, ner_tag in enumerate(ner_tags):

@@ -245,6 +247,21 @@ cdef class BiluoPushDown(TransitionSystem):
             if doc.c[i].ent_iob == 0:
                 doc.c[i].ent_iob = 2

+    def get_beam_parses(self, Beam beam):
+        parses = []
+        probs = beam.probs
+        for i in range(beam.size):
+            state = <StateC*>beam.at(i)
+            if state.is_final():
+                prob = probs[i]
+                parse = []
+                for j in range(state._ents.size()):
+                    ent = state._ents.at(j)
+                    if ent.start != -1 and ent.end != -1:
+                        parse.append((ent.start, ent.end, self.strings[ent.label]))
+                parses.append((prob, parse))
+        return parses
+
     def init_gold(self, StateClass state, Example example):
         return BiluoGold(self, state, example)

spacy/pipeline/attributeruler.py

@@ -226,6 +226,7 @@ class AttributeRuler(Pipe):

         DOCS: https://nightly.spacy.io/api/tagger#score
         """
+
         def morph_key_getter(token, attr):
             return getattr(token, attr).key

@@ -240,8 +241,16 @@ class AttributeRuler(Pipe):
         elif attr == POS:
             results.update(Scorer.score_token_attr(examples, "pos", **kwargs))
         elif attr == MORPH:
-            results.update(Scorer.score_token_attr(examples, "morph", getter=morph_key_getter, **kwargs))
-            results.update(Scorer.score_token_attr_per_feat(examples, "morph", getter=morph_key_getter, **kwargs))
+            results.update(
+                Scorer.score_token_attr(
+                    examples, "morph", getter=morph_key_getter, **kwargs
+                )
+            )
+            results.update(
+                Scorer.score_token_attr_per_feat(
+                    examples, "morph", getter=morph_key_getter, **kwargs
+                )
+            )
         elif attr == LEMMA:
             results.update(Scorer.score_token_attr(examples, "lemma", **kwargs))
         return results

spacy/pipeline/dep_parser.pyx

@@ -1,4 +1,5 @@
 # cython: infer_types=True, profile=True, binding=True
+from collections import defaultdict
 from typing import Optional, Iterable
 from thinc.api import Model, Config

@@ -258,3 +259,20 @@ cdef class DependencyParser(Parser):
         results.update(Scorer.score_deps(examples, "dep", **kwargs))
         del results["sents_per_type"]
         return results
+
+    def scored_parses(self, beams):
+        """Return two dictionaries with scores for each beam/doc that was processed:
+        one containing (i, head) keys, and another containing (i, label) keys.
+        """
+        head_scores = []
+        label_scores = []
+        for beam in beams:
+            score_head_dict = defaultdict(float)
+            score_label_dict = defaultdict(float)
+            for score, parses in self.moves.get_beam_parses(beam):
+                for head, i, label in parses:
+                    score_head_dict[(i, head)] += score
+                    score_label_dict[(i, label)] += score
+            head_scores.append(score_head_dict)
+            label_scores.append(score_label_dict)
+        return head_scores, label_scores
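
How scored_parses is meant to be consumed, as a sketch only: the beams would come from a beam-decoding pass over some docs, which this diff does not show, so the parser and beams variables below are hypothetical:

    # hypothetical: `parser` is the pipeline's DependencyParser and `beams`
    # were produced by a beam-decoding pass over a batch of docs
    head_scores, label_scores = parser.scored_parses(beams)
    for i, heads in enumerate(head_scores):
        # heads maps (token_i, head_i) -> accumulated probability of that arc
        best_arc = max(heads, key=heads.get)
        print(f"doc {i}: most confident arc {best_arc} ({heads[best_arc]:.3f})")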

spacy/pipeline/morphologizer.pyx

@@ -24,7 +24,7 @@ default_model_config = """
 @architectures = "spacy.Tagger.v1"

 [model.tok2vec]
-@architectures = "spacy.Tok2Vec.v1"
+@architectures = "spacy.Tok2Vec.v2"

 [model.tok2vec.embed]
 @architectures = "spacy.CharacterEmbed.v1"

@@ -35,7 +35,7 @@ nC = 8
 include_static_vectors = false

 [model.tok2vec.encode]
-@architectures = "spacy.MaxoutWindowEncoder.v1"
+@architectures = "spacy.MaxoutWindowEncoder.v2"
 width = 128
 depth = 4
 window_size = 1

spacy/pipeline/ner.pyx

@@ -1,4 +1,5 @@
 # cython: infer_types=True, profile=True, binding=True
+from collections import defaultdict
 from typing import Optional, Iterable
 from thinc.api import Model, Config

@@ -197,3 +198,16 @@ cdef class EntityRecognizer(Parser):
         """
         validate_examples(examples, "EntityRecognizer.score")
         return get_ner_prf(examples)
+
+    def scored_ents(self, beams):
+        """Return a dictionary of (start, end, label) tuples with corresponding scores
+        for each beam/doc that was processed.
+        """
+        entity_scores = []
+        for beam in beams:
+            score_dict = defaultdict(float)
+            for score, ents in self.moves.get_beam_parses(beam):
+                for start, end, label in ents:
+                    score_dict[(start, end, label)] += score
+            entity_scores.append(score_dict)
+        return entity_scores

spacy/pipeline/tagger.pyx

@@ -256,8 +256,14 @@ class Tagger(TrainablePipe):
         DOCS: https://nightly.spacy.io/api/tagger#get_loss
         """
         validate_examples(examples, "Tagger.get_loss")
-        loss_func = SequenceCategoricalCrossentropy(names=self.labels, normalize=False, missing_value="")
-        truths = [eg.get_aligned("TAG", as_string=True) for eg in examples]
+        loss_func = SequenceCategoricalCrossentropy(names=self.labels, normalize=False)
+        # Convert empty tag "" to missing value None so that both misaligned
+        # tokens and tokens with missing annotation have the default missing
+        # value None.
+        truths = []
+        for eg in examples:
+            eg_truths = [tag if tag != "" else None for tag in eg.get_aligned("TAG", as_string=True)]
+            truths.append(eg_truths)
         d_scores, loss = loss_func(scores, truths)
         if self.model.ops.xp.isnan(loss):
             raise ValueError(Errors.E910.format(name=self.name))
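
The effect of the change, in isolation: previously the loss was told that "" is the missing value; now empty tags are mapped to None, the default missing value of SequenceCategoricalCrossentropy, so misaligned tokens and genuinely unannotated tokens are handled the same way:

    # sketch: empty aligned tags become None before the loss sees them
    aligned_tags = ["NN", "", "VBZ"]  # "" = no gold tag for this token
    truths = [tag if tag != "" else None for tag in aligned_tags]
    assert truths == ["NN", None, "VBZ"]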

View File

@ -14,12 +14,12 @@ from ..tokens import Doc
from ..vocab import Vocab from ..vocab import Vocab
default_model_config = """ single_label_default_config = """
[model] [model]
@architectures = "spacy.TextCatEnsemble.v2" @architectures = "spacy.TextCatEnsemble.v2"
[model.tok2vec] [model.tok2vec]
@architectures = "spacy.Tok2Vec.v1" @architectures = "spacy.Tok2Vec.v2"
[model.tok2vec.embed] [model.tok2vec.embed]
@architectures = "spacy.MultiHashEmbed.v1" @architectures = "spacy.MultiHashEmbed.v1"
@ -29,7 +29,7 @@ attrs = ["ORTH", "LOWER", "PREFIX", "SUFFIX", "SHAPE", "ID"]
include_static_vectors = false include_static_vectors = false
[model.tok2vec.encode] [model.tok2vec.encode]
@architectures = "spacy.MaxoutWindowEncoder.v1" @architectures = "spacy.MaxoutWindowEncoder.v2"
width = ${model.tok2vec.embed.width} width = ${model.tok2vec.embed.width}
window_size = 1 window_size = 1
maxout_pieces = 3 maxout_pieces = 3
@ -37,24 +37,24 @@ depth = 2
[model.linear_model] [model.linear_model]
@architectures = "spacy.TextCatBOW.v1" @architectures = "spacy.TextCatBOW.v1"
exclusive_classes = false exclusive_classes = true
ngram_size = 1 ngram_size = 1
no_output_layer = false no_output_layer = false
""" """
DEFAULT_TEXTCAT_MODEL = Config().from_str(default_model_config)["model"] DEFAULT_SINGLE_TEXTCAT_MODEL = Config().from_str(single_label_default_config)["model"]
bow_model_config = """ single_label_bow_config = """
[model] [model]
@architectures = "spacy.TextCatBOW.v1" @architectures = "spacy.TextCatBOW.v1"
exclusive_classes = false exclusive_classes = true
ngram_size = 1 ngram_size = 1
no_output_layer = false no_output_layer = false
""" """
cnn_model_config = """ single_label_cnn_config = """
[model] [model]
@architectures = "spacy.TextCatCNN.v1" @architectures = "spacy.TextCatCNN.v1"
exclusive_classes = false exclusive_classes = true
[model.tok2vec] [model.tok2vec]
@architectures = "spacy.HashEmbedCNN.v1" @architectures = "spacy.HashEmbedCNN.v1"
@ -71,7 +71,7 @@ subword_features = true
@Language.factory( @Language.factory(
"textcat", "textcat",
assigns=["doc.cats"], assigns=["doc.cats"],
default_config={"threshold": 0.5, "model": DEFAULT_TEXTCAT_MODEL}, default_config={"threshold": 0.5, "model": DEFAULT_SINGLE_TEXTCAT_MODEL},
default_score_weights={ default_score_weights={
"cats_score": 1.0, "cats_score": 1.0,
"cats_score_desc": None, "cats_score_desc": None,
@ -103,7 +103,7 @@ def make_textcat(
class TextCategorizer(TrainablePipe): class TextCategorizer(TrainablePipe):
"""Pipeline component for text classification. """Pipeline component for single-label text classification.
DOCS: https://nightly.spacy.io/api/textcategorizer DOCS: https://nightly.spacy.io/api/textcategorizer
""" """
@ -111,7 +111,7 @@ class TextCategorizer(TrainablePipe):
def __init__( def __init__(
self, vocab: Vocab, model: Model, name: str = "textcat", *, threshold: float self, vocab: Vocab, model: Model, name: str = "textcat", *, threshold: float
) -> None: ) -> None:
"""Initialize a text categorizer. """Initialize a text categorizer for single-label classification.
vocab (Vocab): The shared vocabulary. vocab (Vocab): The shared vocabulary.
model (thinc.api.Model): The Thinc Model powering the pipeline component. model (thinc.api.Model): The Thinc Model powering the pipeline component.
@ -214,6 +214,7 @@ class TextCategorizer(TrainablePipe):
losses = {} losses = {}
losses.setdefault(self.name, 0.0) losses.setdefault(self.name, 0.0)
validate_examples(examples, "TextCategorizer.update") validate_examples(examples, "TextCategorizer.update")
self._validate_categories(examples)
if not any(len(eg.predicted) if eg.predicted else 0 for eg in examples): if not any(len(eg.predicted) if eg.predicted else 0 for eg in examples):
# Handle cases where there are no tokens in any docs. # Handle cases where there are no tokens in any docs.
return losses return losses
@ -256,6 +257,7 @@ class TextCategorizer(TrainablePipe):
if self._rehearsal_model is None: if self._rehearsal_model is None:
return losses return losses
validate_examples(examples, "TextCategorizer.rehearse") validate_examples(examples, "TextCategorizer.rehearse")
self._validate_categories(examples)
docs = [eg.predicted for eg in examples] docs = [eg.predicted for eg in examples]
if not any(len(doc) for doc in docs): if not any(len(doc) for doc in docs):
# Handle cases where there are no tokens in any docs. # Handle cases where there are no tokens in any docs.
@ -296,6 +298,7 @@ class TextCategorizer(TrainablePipe):
DOCS: https://nightly.spacy.io/api/textcategorizer#get_loss DOCS: https://nightly.spacy.io/api/textcategorizer#get_loss
""" """
validate_examples(examples, "TextCategorizer.get_loss") validate_examples(examples, "TextCategorizer.get_loss")
self._validate_categories(examples)
truths, not_missing = self._examples_to_truth(examples) truths, not_missing = self._examples_to_truth(examples)
not_missing = self.model.ops.asarray(not_missing) not_missing = self.model.ops.asarray(not_missing)
d_scores = (scores - truths) / scores.shape[0] d_scores = (scores - truths) / scores.shape[0]
@ -341,6 +344,7 @@ class TextCategorizer(TrainablePipe):
DOCS: https://nightly.spacy.io/api/textcategorizer#initialize DOCS: https://nightly.spacy.io/api/textcategorizer#initialize
""" """
validate_get_examples(get_examples, "TextCategorizer.initialize") validate_get_examples(get_examples, "TextCategorizer.initialize")
self._validate_categories(get_examples())
if labels is None: if labels is None:
for example in get_examples(): for example in get_examples():
for cat in example.y.cats: for cat in example.y.cats:
@ -373,12 +377,20 @@ class TextCategorizer(TrainablePipe):
DOCS: https://nightly.spacy.io/api/textcategorizer#score DOCS: https://nightly.spacy.io/api/textcategorizer#score
""" """
validate_examples(examples, "TextCategorizer.score") validate_examples(examples, "TextCategorizer.score")
self._validate_categories(examples)
return Scorer.score_cats( return Scorer.score_cats(
examples, examples,
"cats", "cats",
labels=self.labels, labels=self.labels,
multi_label=self.model.attrs["multi_label"], multi_label=False,
positive_label=self.cfg["positive_label"], positive_label=self.cfg["positive_label"],
threshold=self.cfg["threshold"], threshold=self.cfg["threshold"],
**kwargs, **kwargs,
) )
def _validate_categories(self, examples: List[Example]):
"""Check whether the provided examples all have single-label cats annotations."""
for ex in examples:
if list(ex.reference.cats.values()).count(1.0) > 1:
raise ValueError(Errors.E895.format(value=ex.reference.cats))
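Editor's note: a minimal sketch (not part of this commit; pipe usage and label names are illustrative) of how the single-label constraint surfaces when preparing training data:

import spacy
from spacy.training import Example

nlp = spacy.blank("en")
textcat = nlp.add_pipe("textcat")  # the single-label component above
textcat.add_label("POSITIVE")
textcat.add_label("NEGATIVE")

# At most one category may be 1.0 per example; annotations where two or more
# categories are 1.0 are now rejected by _validate_categories() with E895.
doc = nlp.make_doc("I loved this film")
example = Example.from_dict(doc, {"cats": {"POSITIVE": 1.0, "NEGATIVE": 0.0}})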

View File

@@ -0,0 +1,191 @@
from itertools import islice
from typing import Iterable, Optional, Dict, List, Callable, Any
from thinc.api import Model, Config
from thinc.types import Floats2d
from ..language import Language
from ..training import Example, validate_examples, validate_get_examples
from ..errors import Errors
from ..scorer import Scorer
from ..tokens import Doc
from ..vocab import Vocab
from .textcat import TextCategorizer
multi_label_default_config = """
[model]
@architectures = "spacy.TextCatEnsemble.v2"
[model.tok2vec]
@architectures = "spacy.Tok2Vec.v1"
[model.tok2vec.embed]
@architectures = "spacy.MultiHashEmbed.v1"
width = 64
rows = [2000, 2000, 1000, 1000, 1000, 1000]
attrs = ["ORTH", "LOWER", "PREFIX", "SUFFIX", "SHAPE", "ID"]
include_static_vectors = false
[model.tok2vec.encode]
@architectures = "spacy.MaxoutWindowEncoder.v1"
width = ${model.tok2vec.embed.width}
window_size = 1
maxout_pieces = 3
depth = 2
[model.linear_model]
@architectures = "spacy.TextCatBOW.v1"
exclusive_classes = false
ngram_size = 1
no_output_layer = false
"""
DEFAULT_MULTI_TEXTCAT_MODEL = Config().from_str(multi_label_default_config)["model"]
multi_label_bow_config = """
[model]
@architectures = "spacy.TextCatBOW.v1"
exclusive_classes = false
ngram_size = 1
no_output_layer = false
"""
multi_label_cnn_config = """
[model]
@architectures = "spacy.TextCatCNN.v1"
exclusive_classes = false
[model.tok2vec]
@architectures = "spacy.HashEmbedCNN.v1"
pretrained_vectors = null
width = 96
depth = 4
embed_size = 2000
window_size = 1
maxout_pieces = 3
subword_features = true
"""


@Language.factory(
    "textcat_multilabel",
    assigns=["doc.cats"],
    default_config={"threshold": 0.5, "model": DEFAULT_MULTI_TEXTCAT_MODEL},
    default_score_weights={
        "cats_score": 1.0,
        "cats_score_desc": None,
        "cats_micro_p": None,
        "cats_micro_r": None,
        "cats_micro_f": None,
        "cats_macro_p": None,
        "cats_macro_r": None,
        "cats_macro_f": None,
        "cats_macro_auc": None,
        "cats_f_per_type": None,
        "cats_macro_auc_per_type": None,
    },
)
def make_multilabel_textcat(
    nlp: Language, name: str, model: Model[List[Doc], List[Floats2d]], threshold: float
) -> "TextCategorizer":
    """Create a TextCategorizer component. The text categorizer predicts categories
    over a whole document. It can learn one or more labels, and the labels can
    be mutually exclusive (i.e. one true label per doc) or non-mutually exclusive
    (i.e. zero or more labels may be true per doc). The multi-label setting is
    controlled by the model instance that's provided.

    model (Model[List[Doc], List[Floats2d]]): A model instance that predicts
        scores for each category.
    threshold (float): Cutoff to consider a prediction "positive".
    """
    return MultiLabel_TextCategorizer(nlp.vocab, model, name, threshold=threshold)


class MultiLabel_TextCategorizer(TextCategorizer):
    """Pipeline component for multi-label text classification.

    DOCS: https://nightly.spacy.io/api/multilabel_textcategorizer
    """

    def __init__(
        self,
        vocab: Vocab,
        model: Model,
        name: str = "textcat_multilabel",
        *,
        threshold: float,
    ) -> None:
        """Initialize a text categorizer for multi-label classification.

        vocab (Vocab): The shared vocabulary.
        model (thinc.api.Model): The Thinc Model powering the pipeline component.
        name (str): The component instance name, used to add entries to the
            losses during training.
        threshold (float): Cutoff to consider a prediction "positive".

        DOCS: https://nightly.spacy.io/api/multilabel_textcategorizer#init
        """
        self.vocab = vocab
        self.model = model
        self.name = name
        self._rehearsal_model = None
        cfg = {"labels": [], "threshold": threshold}
        self.cfg = dict(cfg)

    def initialize(
        self,
        get_examples: Callable[[], Iterable[Example]],
        *,
        nlp: Optional[Language] = None,
        labels: Optional[Dict] = None,
    ):
        """Initialize the pipe for training, using a representative set
        of data examples.

        get_examples (Callable[[], Iterable[Example]]): Function that
            returns a representative sample of gold-standard Example objects.
        nlp (Language): The current nlp object the component is part of.
        labels: The labels to add to the component, typically generated by the
            `init labels` command. If no labels are provided, the get_examples
            callback is used to extract the labels from the data.

        DOCS: https://nightly.spacy.io/api/multilabel_textcategorizer#initialize
        """
        validate_get_examples(get_examples, "MultiLabel_TextCategorizer.initialize")
        if labels is None:
            for example in get_examples():
                for cat in example.y.cats:
                    self.add_label(cat)
        else:
            for label in labels:
                self.add_label(label)
        subbatch = list(islice(get_examples(), 10))
        doc_sample = [eg.reference for eg in subbatch]
        label_sample, _ = self._examples_to_truth(subbatch)
        self._require_labels()
        assert len(doc_sample) > 0, Errors.E923.format(name=self.name)
        assert len(label_sample) > 0, Errors.E923.format(name=self.name)
        self.model.initialize(X=doc_sample, Y=label_sample)

    def score(self, examples: Iterable[Example], **kwargs) -> Dict[str, Any]:
        """Score a batch of examples.

        examples (Iterable[Example]): The examples to score.
        RETURNS (Dict[str, Any]): The scores, produced by Scorer.score_cats.

        DOCS: https://nightly.spacy.io/api/multilabel_textcategorizer#score
        """
        validate_examples(examples, "MultiLabel_TextCategorizer.score")
        return Scorer.score_cats(
            examples,
            "cats",
            labels=self.labels,
            multi_label=True,
            threshold=self.cfg["threshold"],
            **kwargs,
        )

    def _validate_categories(self, examples: List[Example]):
        """This component allows any type of single- or multi-label annotations.
        This method overrides the stricter one from 'textcat'."""
        pass
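Editor's note: a minimal usage sketch for the new factory (label names are illustrative, not from this commit):

import spacy

nlp = spacy.blank("en")
# Registers under the "textcat_multilabel" name defined by the factory above;
# labels are non-exclusive, so a doc may receive several categories at once.
textcat = nlp.add_pipe("textcat_multilabel", config={"threshold": 0.5})
textcat.add_label("SPORTS")
textcat.add_label("POLITICS")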

View File

@@ -3,7 +3,7 @@ import numpy as np
from collections import defaultdict
from .training import Example
-from .tokens import Token, Doc, Span, MorphAnalysis
+from .tokens import Token, Doc, Span
from .errors import Errors
from .util import get_lang_class, SimpleFrozenList
from .morphology import Morphology
@@ -176,7 +176,7 @@ class Scorer:
            "token_acc": None,
            "token_p": None,
            "token_r": None,
-            "token_f": None
+            "token_f": None,
        }

    @staticmethod
@@ -276,7 +276,10 @@ class Scorer:
                if gold_i not in missing_indices:
                    value = getter(token, attr)
                    morph = gold_doc.vocab.strings[value]
-                    if value not in missing_values and morph != Morphology.EMPTY_MORPH:
+                    if (
+                        value not in missing_values
+                        and morph != Morphology.EMPTY_MORPH
+                    ):
                        for feat in morph.split(Morphology.FEATURE_SEP):
                            field, values = feat.split(Morphology.FIELD_SEP)
                            if field not in per_feat:
@@ -367,7 +370,6 @@ class Scorer:
            f"{attr}_per_type": None,
        }

    @staticmethod
    def score_cats(
        examples: Iterable[Example],
@@ -458,7 +460,7 @@ class Scorer:
                    gold_label, gold_score = max(gold_cats, key=lambda it: it[1])
                    if gold_score is not None and gold_score > 0:
                        f_per_type[gold_label].fn += 1
-                else:
+                elif pred_cats:
                    pred_label, pred_score = max(pred_cats, key=lambda it: it[1])
                    if pred_score >= threshold:
                        f_per_type[pred_label].fp += 1
@@ -473,7 +475,10 @@ class Scorer:
        macro_f = sum(prf.fscore for prf in f_per_type.values()) / n_cats
        # Limit macro_auc to those labels with gold annotations,
        # but still divide by all cats to avoid artificial boosting of datasets with missing labels
-        macro_auc = sum(auc.score if auc.is_binary() else 0.0 for auc in auc_per_type.values()) / n_cats
+        macro_auc = (
+            sum(auc.score if auc.is_binary() else 0.0 for auc in auc_per_type.values())
+            / n_cats
+        )
        results = {
            f"{attr}_score": None,
            f"{attr}_score_desc": None,
@@ -485,7 +490,9 @@ class Scorer:
            f"{attr}_macro_f": macro_f,
            f"{attr}_macro_auc": macro_auc,
            f"{attr}_f_per_type": {k: v.to_dict() for k, v in f_per_type.items()},
-            f"{attr}_auc_per_type": {k: v.score if v.is_binary() else None for k, v in auc_per_type.items()},
+            f"{attr}_auc_per_type": {
+                k: v.score if v.is_binary() else None for k, v in auc_per_type.items()
+            },
        }
        if len(labels) == 2 and not multi_label and positive_label:
            positive_label_f = results[f"{attr}_f_per_type"][positive_label]["f"]
@@ -675,8 +682,7 @@ class Scorer:
def get_ner_prf(examples: Iterable[Example]) -> Dict[str, Any]:
-    """Compute micro-PRF and per-entity PRF scores for a sequence of examples.
-    """
+    """Compute micro-PRF and per-entity PRF scores for a sequence of examples."""
    score_per_type = defaultdict(PRFScore)
    for eg in examples:
        if not eg.y.has_annotation("ENT_IOB"):
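Editor's note: with the `elif pred_cats:` guard above, score_cats no longer calls max() on an empty predicted-cats sequence. A hedged sketch of calling it directly (the `examples` list and labels here are assumed, not from this commit):

from spacy.scorer import Scorer

# `examples` is an Iterable[Example], assumed to be built elsewhere.
scores = Scorer.score_cats(
    examples,
    "cats",
    labels=["POSITIVE", "NEGATIVE"],
    multi_label=False,
    positive_label="POSITIVE",
    threshold=0.5,
)
print(scores["cats_score"], scores["cats_f_per_type"])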

View File

@@ -154,10 +154,10 @@ def test_doc_api_serialize(en_tokenizer, text):
    logger = logging.getLogger("spacy")
    with mock.patch.object(logger, "warning") as mock_warning:
-        _ = tokens.to_bytes()
+        _ = tokens.to_bytes()  # noqa: F841
        mock_warning.assert_not_called()
        tokens.user_hooks["similarity"] = inner_func
-        _ = tokens.to_bytes()
+        _ = tokens.to_bytes()  # noqa: F841
        mock_warning.assert_called_once()

View File

@@ -21,11 +21,13 @@ def test_doc_retokenize_merge(en_tokenizer):
    assert doc[4].text == "the beach boys"
    assert doc[4].text_with_ws == "the beach boys "
    assert doc[4].tag_ == "NAMED"
+    assert doc[4].lemma_ == "LEMMA"
    assert str(doc[4].morph) == "Number=Plur"
    assert doc[5].text == "all night"
    assert doc[5].text_with_ws == "all night"
    assert doc[5].tag_ == "NAMED"
    assert str(doc[5].morph) == "Number=Plur"
+    assert doc[5].lemma_ == "LEMMA"


def test_doc_retokenize_merge_children(en_tokenizer):
@@ -103,25 +105,29 @@ def test_doc_retokenize_spans_merge_tokens(en_tokenizer):
def test_doc_retokenize_spans_merge_tokens_default_attrs(en_vocab):
    words = ["The", "players", "start", "."]
+    lemmas = [t.lower() for t in words]
    heads = [1, 2, 2, 2]
    tags = ["DT", "NN", "VBZ", "."]
    pos = ["DET", "NOUN", "VERB", "PUNCT"]
-    doc = Doc(en_vocab, words=words, tags=tags, pos=pos, heads=heads)
+    doc = Doc(en_vocab, words=words, tags=tags, pos=pos, heads=heads, lemmas=lemmas)
    assert len(doc) == 4
    assert doc[0].text == "The"
    assert doc[0].tag_ == "DT"
    assert doc[0].pos_ == "DET"
+    assert doc[0].lemma_ == "the"
    with doc.retokenize() as retokenizer:
        retokenizer.merge(doc[0:2])
    assert len(doc) == 3
    assert doc[0].text == "The players"
    assert doc[0].tag_ == "NN"
    assert doc[0].pos_ == "NOUN"
+    assert doc[0].lemma_ == "the players"
-    doc = Doc(en_vocab, words=words, tags=tags, pos=pos, heads=heads)
+    doc = Doc(en_vocab, words=words, tags=tags, pos=pos, heads=heads, lemmas=lemmas)
    assert len(doc) == 4
    assert doc[0].text == "The"
    assert doc[0].tag_ == "DT"
    assert doc[0].pos_ == "DET"
+    assert doc[0].lemma_ == "the"
    with doc.retokenize() as retokenizer:
        retokenizer.merge(doc[0:2])
        retokenizer.merge(doc[2:4])
@@ -129,9 +135,11 @@ def test_doc_retokenize_spans_merge_tokens_default_attrs(en_vocab):
    assert doc[0].text == "The players"
    assert doc[0].tag_ == "NN"
    assert doc[0].pos_ == "NOUN"
+    assert doc[0].lemma_ == "the players"
    assert doc[1].text == "start ."
    assert doc[1].tag_ == "VBZ"
    assert doc[1].pos_ == "VERB"
+    assert doc[1].lemma_ == "start ."


def test_doc_retokenize_spans_merge_heads(en_vocab):
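Editor's note: as the assertions above show, merging now also derives a default lemma by joining the existing lemmas of the merged tokens. A small sketch (words and lemmas are illustrative):

from spacy.tokens import Doc
from spacy.vocab import Vocab

doc = Doc(Vocab(), words=["New", "York"], lemmas=["new", "york"])
with doc.retokenize() as retokenizer:
    retokenizer.merge(doc[0:2])
assert doc[0].lemma_ == "new york"  # joined from the original lemmas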

View File

@@ -39,6 +39,36 @@ def test_doc_retokenize_split(en_vocab):
    assert len(str(doc)) == 19


+def test_doc_retokenize_split_lemmas(en_vocab):
+    # If lemmas are not set, leave unset
+    words = ["LosAngeles", "start", "."]
+    heads = [1, 2, 2]
+    doc = Doc(en_vocab, words=words, heads=heads)
+    with doc.retokenize() as retokenizer:
+        retokenizer.split(
+            doc[0],
+            ["Los", "Angeles"],
+            [(doc[0], 1), doc[1]],
+        )
+    assert doc[0].lemma_ == ""
+    assert doc[1].lemma_ == ""
+    # If lemmas are set, use split orth as default lemma
+    words = ["LosAngeles", "start", "."]
+    heads = [1, 2, 2]
+    doc = Doc(en_vocab, words=words, heads=heads)
+    for t in doc:
+        t.lemma_ = "a"
+    with doc.retokenize() as retokenizer:
+        retokenizer.split(
+            doc[0],
+            ["Los", "Angeles"],
+            [(doc[0], 1), doc[1]],
+        )
+    assert doc[0].lemma_ == "Los"
+    assert doc[1].lemma_ == "Angeles"
+
+
def test_doc_retokenize_split_dependencies(en_vocab):
    doc = Doc(en_vocab, words=["LosAngeles", "start", "."])
    dep1 = doc.vocab.strings.add("amod")

View File

@@ -113,9 +113,8 @@ def test_en_tokenizer_norm_exceptions(en_tokenizer, text, norms):
    assert [token.norm_ for token in tokens] == norms


-@pytest.mark.skip
@pytest.mark.parametrize(
-    "text,norm", [("radicalised", "radicalized"), ("cuz", "because")]
+    "text,norm", [("Jan.", "January"), ("'cuz", "because")]
)
def test_en_lex_attrs_norm_exceptions(en_tokenizer, text, norm):
    tokens = en_tokenizer(text)

View File

@@ -4,21 +4,21 @@ from spacy.lang.mk.lex_attrs import like_num
def test_tokenizer_handles_long_text(mk_tokenizer):
    text = """
Во организациските работи или на нашите собранија со членството, никој од нас не зборуваше за
организацијата и идеологијата. Работна беше нашата работа, а не идеолошка. Што се однесува до социјализмот на
Делчев, неговата дејност зборува сама за себе - спротивно. Во суштина, водачите си имаа свои основни погледи и
свои разбирања за положбата и работите, коишто стоеја пред нив и ги завршуваа со голема упорност, настојчивост и
насоченост. Значи, идеологија имаше, само што нивната идеологија имаше своја оригиналност. Македонија денеска,
чиста рожба на животот и положбата во Македонија, кои му служеа како база на неговите побуди, беше дејност која
имаше потреба од ум за да си најде своја смисла. Таквата идеологија и заемното дејство на умот и срцето му
помогнаа на Делчев да не се занесе по патот на својата идеологија... Во суштина, Организацијата и нејзините
водачи имаа свои разбирања за работите и положбата во идеен поглед, но тоа беше врската, животот и положбата во
Македонија и го внесуваа во својата идеологија гласот на своето срце, и на крај, прибегнуваа до умот,
за да најдат смисла или да ѝ дадат. Тоа содејство и заемен сооднос на умот и срцето му помогнаа на Делчев да ја
држи својата идеологија во сообразност со положбата на работите... Водачите навистина направија една жртва
бидејќи на населението не му зборуваа за своите мисли и идеи. Тие се одрекоа од секаква субјективност во своите
мисли. Целта беше да не се зголемуваат целите и задачите како и преданоста во работата. Населението не можеше да
ги разбере овие идеи...
    """
    tokens = mk_tokenizer(text)
    assert len(tokens) == 297
@@ -45,7 +45,7 @@ def test_tokenizer_handles_long_text(mk_tokenizer):
        (",", False),
        ("милијарда", True),
        ("билион", True),
-    ]
+    ],
)
def test_mk_lex_attrs_like_number(mk_tokenizer, word, match):
    tokens = mk_tokenizer(word)
@@ -53,14 +53,7 @@ def test_mk_lex_attrs_like_number(mk_tokenizer, word, match):
    assert tokens[0].like_num == match


-@pytest.mark.parametrize(
-    "word",
-    [
-        "двесте",
-        "два-три",
-        "пет-шест"
-    ]
-)
+@pytest.mark.parametrize("word", ["двесте", "два-три", "пет-шест"])
def test_mk_lex_attrs_capitals(word):
    assert like_num(word)
    assert like_num(word.upper())
@@ -77,8 +70,8 @@ def test_mk_lex_attrs_capitals(word):
        "петто",
        "стоти",
        "шеесетите",
-        "седумдесетите"
-    ]
+        "седумдесетите",
+    ],
)
def test_mk_lex_attrs_like_number_for_ordinal(word):
    assert like_num(word)

View File

@@ -5,24 +5,22 @@ from spacy.lang.tr.lex_attrs import like_num
def test_tr_tokenizer_handles_long_text(tr_tokenizer):
    text = """Pamuk nasıl ipliğe dönüştürülür?
Sıkıştırılmış balyalar halindeki pamuk, iplik fabrikasına getirildiğinde hem
lifleri birbirine dolaşmıştır, hem de tarladan toplanırken araya bitkinin
parçaları karışmıştır. Üstelik balyalardaki pamuğun cinsi aynı olsa bile kalitesi
değişeceğinden, önce bütün balyaların birbirine karıştırılarak harmanlanması gerekir.
Daha sonra pamuk yığınları, liflerin açılıp temizlenmesi için tek bir birim halinde
birleştirilmiş çeşitli makinelerden geçirilir.Bunlardan biri, dönen tokmaklarıyla
pamuğu dövüp kabartarak dağınık yumaklar haline getiren ve liflerin arasındaki yabancı
maddeleri temizleyen hallaç makinesidir. Daha sonra tarak makinesine giren pamuk demetleri,
herbirinin yüzeyinde yüzbinlerce incecik iğne bulunan döner silindirlerin arasından geçerek lif lif ayrılır
ve tül inceliğinde gevşek bir örtüye dönüşür. Ama bir sonraki makine bu lifleri dağınık
ve gevşek bir biçimde birbirine yaklaştırarak 2 cm eninde bir pamuk şeridi haline getirir."""
    tokens = tr_tokenizer(text)
    assert len(tokens) == 146


@pytest.mark.parametrize(
    "word",
    [

View File

@@ -2,145 +2,692 @@ import pytest
ABBREV_TESTS = [
    ("Dr. Murat Bey ile görüştüm.", ["Dr.", "Murat", "Bey", "ile", "görüştüm", "."]),
    ("Dr.la görüştüm.", ["Dr.la", "görüştüm", "."]),
    ("Dr.'la görüştüm.", ["Dr.'la", "görüştüm", "."]),
    ("TBMM'de çalışıyormuş.", ["TBMM'de", "çalışıyormuş", "."]),
    (
        "Hem İst. hem Ank. bu konuda gayet iyi durumda.",
        ["Hem", "İst.", "hem", "Ank.", "bu", "konuda", "gayet", "iyi", "durumda", "."],
    ),
    (
        "Hem İst. hem Ank.'da yağış var.",
        ["Hem", "İst.", "hem", "Ank.'da", "yağış", "var", "."],
    ),
    ("Dr.", ["Dr."]),
    ("Yrd.Doç.", ["Yrd.Doç."]),
    ("Prof.'un", ["Prof.'un"]),
    ("Böl.'nde", ["Böl.'nde"]),
]

URL_TESTS = [
    (
        "Bizler de www.duygu.com.tr adında bir websitesi kurduk.",
        [
            "Bizler",
            "de",
            "www.duygu.com.tr",
            "adında",
            "bir",
            "websitesi",
            "kurduk",
            ".",
        ],
    ),
    (
        "Bizler de https://www.duygu.com.tr adında bir websitesi kurduk.",
        [
            "Bizler",
            "de",
            "https://www.duygu.com.tr",
            "adında",
            "bir",
            "websitesi",
            "kurduk",
            ".",
        ],
    ),
    (
        "Bizler de www.duygu.com.tr'dan satın aldık.",
        ["Bizler", "de", "www.duygu.com.tr'dan", "satın", "aldık", "."],
    ),
    (
        "Bizler de https://www.duygu.com.tr'dan satın aldık.",
        ["Bizler", "de", "https://www.duygu.com.tr'dan", "satın", "aldık", "."],
    ),
]

NUMBER_TESTS = [
    ("Rakamla 6 yazılıydı.", ["Rakamla", "6", "yazılıydı", "."]),
    ("Hava -4 dereceydi.", ["Hava", "-4", "dereceydi", "."]),
    (
        "Hava sıcaklığı -4ten +6ya yükseldi.",
        ["Hava", "sıcaklığı", "-4ten", "+6ya", "yükseldi", "."],
    ),
    (
        "Hava sıcaklığı -4'ten +6'ya yükseldi.",
        ["Hava", "sıcaklığı", "-4'ten", "+6'ya", "yükseldi", "."],
    ),
    ("Yarışta 6. oldum.", ["Yarışta", "6.", "oldum", "."]),
    ("Yarışta 438547745. oldum.", ["Yarışta", "438547745.", "oldum", "."]),
    ("Kitap IV. Murat hakkında.", ["Kitap", "IV.", "Murat", "hakkında", "."]),
    # ("Bana söylediği sayı 6.", ["Bana", "söylediği", "sayı", "6", "."]),
    ("Saat 6'da buluşalım.", ["Saat", "6'da", "buluşalım", "."]),
    ("Saat 6dan sonra buluşalım.", ["Saat", "6dan", "sonra", "buluşalım", "."]),
    ("6.dan sonra saymadım.", ["6.dan", "sonra", "saymadım", "."]),
    ("6.'dan sonra saymadım.", ["6.'dan", "sonra", "saymadım", "."]),
    ("Saat 6'ydı.", ["Saat", "6'ydı", "."]),
    ("5'te", ["5'te"]),
    ("6'da", ["6'da"]),
    ("9dan", ["9dan"]),
    ("19'da", ["19'da"]),
    ("VI'da", ["VI'da"]),
    ("5.", ["5."]),
    ("72.", ["72."]),
    ("VI.", ["VI."]),
    ("6.'dan", ["6.'dan"]),
    ("19.'dan", ["19.'dan"]),
    ("6.dan", ["6.dan"]),
    ("16.dan", ["16.dan"]),
    ("VI.'dan", ["VI.'dan"]),
    ("VI.dan", ["VI.dan"]),
    ("Hepsi 1994 yılında oldu.", ["Hepsi", "1994", "yılında", "oldu", "."]),
    ("Hepsi 1994'te oldu.", ["Hepsi", "1994'te", "oldu", "."]),
    (
        "2/3 tarihli faturayı bulamadım.",
        ["2/3", "tarihli", "faturayı", "bulamadım", "."],
    ),
    (
        "2.3 tarihli faturayı bulamadım.",
        ["2.3", "tarihli", "faturayı", "bulamadım", "."],
    ),
    (
        "2.3. tarihli faturayı bulamadım.",
        ["2.3.", "tarihli", "faturayı", "bulamadım", "."],
    ),
    (
        "2/3/2020 tarihli faturayı bulamadm.",
        ["2/3/2020", "tarihli", "faturayı", "bulamadm", "."],
    ),
    (
        "2/3/1987 tarihinden beri burda yaşıyorum.",
        ["2/3/1987", "tarihinden", "beri", "burda", "yaşıyorum", "."],
    ),
    (
        "2-3-1987 tarihinden beri burdayım.",
        ["2-3-1987", "tarihinden", "beri", "burdayım", "."],
    ),
    (
        "2.3.1987 tarihinden beri burdayım.",
        ["2.3.1987", "tarihinden", "beri", "burdayım", "."],
    ),
    (
        "Bu olay 2005-2006 tarihleri arasında oldu.",
        ["Bu", "olay", "2005", "-", "2006", "tarihleri", "arasında", "oldu", "."],
    ),
    (
        "Bu olay 4/12/2005-21/3/2006 tarihleri arasında oldu.",
        [
            "Bu",
            "olay",
            "4/12/2005",
            "-",
            "21/3/2006",
            "tarihleri",
            "arasında",
            "oldu",
            ".",
        ],
    ),
    (
        "Ek fıkra: 5/11/2003-4999/3 maddesine göre uygundur.",
        [
            "Ek",
            "fıkra",
            ":",
            "5/11/2003",
            "-",
            "4999/3",
            "maddesine",
            "göre",
            "uygundur",
            ".",
        ],
    ),
    (
        "2/A alanları: 6831 sayılı Kanunun 2nci maddesinin birinci fıkrasının (A) bendine göre",
        [
            "2/A",
            "alanları",
            ":",
            "6831",
            "sayılı",
            "Kanunun",
            "2nci",
            "maddesinin",
            "birinci",
            "fıkrasının",
            "(",
            "A",
            ")",
            "bendine",
            "göre",
        ],
    ),
    (
        "ŞEHİTTEĞMENKALMAZ Cad. No: 2/311",
        ["ŞEHİTTEĞMENKALMAZ", "Cad.", "No", ":", "2/311"],
    ),
    (
        "2-3-2025",
        [
            "2-3-2025",
        ],
    ),
    ("2/3/2025", ["2/3/2025"]),
    ("Yıllardır 0.5 uç kullanıyorum.", ["Yıllardır", "0.5", "uç", "kullanıyorum", "."]),
    (
        "Kan değerlerim 0.5-0.7 arasıydı.",
        ["Kan", "değerlerim", "0.5", "-", "0.7", "arasıydı", "."],
    ),
    ("0.5", ["0.5"]),
    ("1/2", ["1/2"]),
    ("%1", ["%", "1"]),
    ("%1lik", ["%", "1lik"]),
    ("%1'lik", ["%", "1'lik"]),
    ("%1lik dilim", ["%", "1lik", "dilim"]),
    ("%1'lik dilim", ["%", "1'lik", "dilim"]),
    ("%1.5", ["%", "1.5"]),
    # ("%1-%2 arası büyüme bekleniyor.", ["%", "1", "-", "%", "2", "arası", "büyüme", "bekleniyor", "."]),
    (
        "%1-2 arası büyüme bekliyoruz.",
        ["%", "1", "-", "2", "arası", "büyüme", "bekliyoruz", "."],
    ),
    (
        "%11-12 arası büyüme bekliyoruz.",
        ["%", "11", "-", "12", "arası", "büyüme", "bekliyoruz", "."],
    ),
    ("%1.5luk büyüme bekliyoruz.", ["%", "1.5luk", "büyüme", "bekliyoruz", "."]),
    (
        "Saat 1-2 arası gelin lütfen.",
        ["Saat", "1", "-", "2", "arası", "gelin", "lütfen", "."],
    ),
    ("Saat 15:30 gibi buluşalım.", ["Saat", "15:30", "gibi", "buluşalım", "."]),
    ("Saat 15:30'da buluşalım.", ["Saat", "15:30'da", "buluşalım", "."]),
    ("Saat 15.30'da buluşalım.", ["Saat", "15.30'da", "buluşalım", "."]),
    ("Saat 15.30da buluşalım.", ["Saat", "15.30da", "buluşalım", "."]),
    ("Saat 15 civarı buluşalım.", ["Saat", "15", "civarı", "buluşalım", "."]),
    ("9daki otobüse binsek mi?", ["9daki", "otobüse", "binsek", "mi", "?"]),
    ("Okulumuz 3-B şubesi", ["Okulumuz", "3-B", "şubesi"]),
    ("Okulumuz 3/B şubesi", ["Okulumuz", "3/B", "şubesi"]),
    ("Okulumuz 3B şubesi", ["Okulumuz", "3B", "şubesi"]),
    ("Okulumuz 3b şubesi", ["Okulumuz", "3b", "şubesi"]),
    (
        "Antonio Gaudí 20. yüzyılda, 1904-1914 yılları arasında on yıl süren bir reform süreci getirmiştir.",
        [
            "Antonio",
            "Gaudí",
            "20.",
            "yüzyılda",
            ",",
            "1904",
            "-",
            "1914",
            "yılları",
            "arasında",
            "on",
            "yıl",
            "süren",
            "bir",
            "reform",
            "süreci",
            "getirmiştir",
            ".",
        ],
    ),
    (
        "Dizel yakıtın avro bölgesi ortalaması olan 1,165 avroya kıyasla litre başına 1,335 avroya mal olduğunu gösteriyor.",
        [
            "Dizel",
            "yakıtın",
            "avro",
            "bölgesi",
            "ortalaması",
            "olan",
            "1,165",
            "avroya",
            "kıyasla",
            "litre",
            "başına",
            "1,335",
            "avroya",
            "mal",
            "olduğunu",
            "gösteriyor",
            ".",
        ],
    ),
    (
        "Marcus Antonius M.Ö. 1 Ocak 49'da, Sezar'dan Vali'nin kendisini barış dostu ilan ettiği bir bildiri yayınlamıştır.",
        [
            "Marcus",
            "Antonius",
            "M.Ö.",
            "1",
            "Ocak",
            "49'da",
            ",",
            "Sezar'dan",
            "Vali'nin",
            "kendisini",
            "barış",
            "dostu",
            "ilan",
            "ettiği",
            "bir",
            "bildiri",
            "yayınlamıştır",
            ".",
        ],
    ),
]

PUNCT_TESTS = [
    ("Gitmedim dedim ya!", ["Gitmedim", "dedim", "ya", "!"]),
    ("Gitmedim dedim ya!!", ["Gitmedim", "dedim", "ya", "!", "!"]),
    ("Gitsek mi?", ["Gitsek", "mi", "?"]),
    ("Gitsek mi??", ["Gitsek", "mi", "?", "?"]),
    ("Gitsek mi?!?", ["Gitsek", "mi", "?", "!", "?"]),
    (
        "Ankara - Antalya arası otobüs işliyor.",
        ["Ankara", "-", "Antalya", "arası", "otobüs", "işliyor", "."],
    ),
    (
        "Ankara-Antalya arası otobüs işliyor.",
        ["Ankara", "-", "Antalya", "arası", "otobüs", "işliyor", "."],
    ),
    ("Sen--ben, ya da onlar.", ["Sen", "--", "ben", ",", "ya", "da", "onlar", "."]),
    (
        "Senden, benden, bizden şarkısını biliyor musun?",
        ["Senden", ",", "benden", ",", "bizden", "şarkısını", "biliyor", "musun", "?"],
    ),
    (
        "Akif'le geldik, sonra da o ayrıldı.",
        ["Akif'le", "geldik", ",", "sonra", "da", "o", "ayrıldı", "."],
    ),
    ("Bu adam ne dedi şimdi???", ["Bu", "adam", "ne", "dedi", "şimdi", "?", "?", "?"]),
    (
        "Yok hasta olmuş, yok annesi hastaymış, bahaneler işte...",
        [
            "Yok",
            "hasta",
            "olmuş",
            ",",
            "yok",
            "annesi",
            "hastaymış",
            ",",
            "bahaneler",
            "işte",
            "...",
        ],
    ),
    (
        "Ankara'dan İstanbul'a ... bir aşk hikayesi.",
        ["Ankara'dan", "İstanbul'a", "...", "bir", "aşk", "hikayesi", "."],
    ),
    ("Ahmet'te", ["Ahmet'te"]),
    ("İstanbul'da", ["İstanbul'da"]),
]

GENERAL_TESTS = [
    (
        "1914'teki Endurance seferinde, Sir Ernest Shackleton'ın kaptanlığını yaptığı İngiliz Endurance gemisi yirmi sekiz kişi ile Antarktika'yı geçmek üzere yelken açtı.",
        [
            "1914'teki",
            "Endurance",
            "seferinde",
            ",",
            "Sir",
            "Ernest",
            "Shackleton'ın",
            "kaptanlığını",
            "yaptığı",
            "İngiliz",
            "Endurance",
            "gemisi",
            "yirmi",
            "sekiz",
            "kişi",
            "ile",
            "Antarktika'yı",
            "geçmek",
            "üzere",
            "yelken",
            "açtı",
            ".",
        ],
    ),
    (
        'Danışılan "%100 Cospedal" olduğunu belirtti.',
        ["Danışılan", '"', "%", "100", "Cospedal", '"', "olduğunu", "belirtti", "."],
    ),
    (
        "1976'da parkur artık kullanılmıyordu; 1990'da ise bir yangın, daha sonraları ahırlarla birlikte yıkılacak olan tahta tribünlerden geri kalanları da yok etmişti.",
        [
            "1976'da",
            "parkur",
            "artık",
            "kullanılmıyordu",
            ";",
            "1990'da",
            "ise",
            "bir",
            "yangın",
            ",",
            "daha",
            "sonraları",
            "ahırlarla",
            "birlikte",
            "yıkılacak",
            "olan",
            "tahta",
            "tribünlerden",
            "geri",
            "kalanları",
            "da",
            "yok",
            "etmişti",
            ".",
        ],
    ),
    (
        "Dahiyane bir ameliyat ve zorlu bir rehabilitasyon sürecinden sonra, tamamen iyileştim.",
        [
            "Dahiyane",
            "bir",
            "ameliyat",
            "ve",
            "zorlu",
            "bir",
            "rehabilitasyon",
            "sürecinden",
            "sonra",
            ",",
            "tamamen",
            "iyileştim",
            ".",
        ],
    ),
    (
        "Yaklaşık iki hafta süren bireysel erken oy kullanma döneminin ardından 5,7 milyondan fazla Floridalı sandık başına gitti.",
        [
            "Yaklaşık",
            "iki",
            "hafta",
            "süren",
            "bireysel",
            "erken",
            "oy",
            "kullanma",
            "döneminin",
            "ardından",
            "5,7",
            "milyondan",
            "fazla",
            "Floridalı",
            "sandık",
            "başına",
            "gitti",
            ".",
        ],
    ),
    (
        "Ancak, bu ABD Çevre Koruma Ajansı'nın dünyayı bu konularda uyarmasının ardından ortaya çıktı.",
        [
            "Ancak",
            ",",
            "bu",
            "ABD",
            "Çevre",
            "Koruma",
            "Ajansı'nın",
            "dünyayı",
            "bu",
            "konularda",
            "uyarmasının",
            "ardından",
            "ortaya",
            "çıktı",
            ".",
        ],
    ),
    (
        "Ortalama şansa ve 10.000 Sterlin değerinde tahvillere sahip bir yatırımcı yılda 125 Sterlin ikramiye kazanabilir.",
        [
            "Ortalama",
            "şansa",
            "ve",
            "10.000",
            "Sterlin",
            "değerinde",
            "tahvillere",
            "sahip",
            "bir",
            "yatırımcı",
            "yılda",
            "125",
            "Sterlin",
            "ikramiye",
            "kazanabilir",
            ".",
        ],
    ),
    (
        "Granit adaları; Seyşeller ve Tioman ile Saint Helena gibi volkanik adaları kapsar.",
        [
            "Granit",
            "adaları",
            ";",
            "Seyşeller",
            "ve",
            "Tioman",
            "ile",
            "Saint",
            "Helena",
            "gibi",
            "volkanik",
            "adaları",
            "kapsar",
            ".",
        ],
    ),
    (
        "Barış antlaşmasıyla İspanya, Amerika'ya Porto Riko, Guam ve Filipinler kolonilerini devretti.",
        [
            "Barış",
            "antlaşmasıyla",
            "İspanya",
            ",",
            "Amerika'ya",
            "Porto",
            "Riko",
            ",",
            "Guam",
            "ve",
            "Filipinler",
            "kolonilerini",
            "devretti",
            ".",
        ],
    ),
    (
        "Makedonya'nın sınır bölgelerini güvence altına alan Philip, büyük bir Makedon ordusu kurdu ve uzun bir fetih seferi için Trakya'ya doğru yürüdü.",
        [
            "Makedonya'nın",
            "sınır",
            "bölgelerini",
            "güvence",
            "altına",
            "alan",
            "Philip",
            ",",
            "büyük",
            "bir",
            "Makedon",
            "ordusu",
            "kurdu",
            "ve",
            "uzun",
            "bir",
            "fetih",
            "seferi",
            "için",
            "Trakya'ya",
            "doğru",
            "yürüdü",
            ".",
        ],
    ),
    (
        "Fransız gazetesi Le Figaro'ya göre bu hükumet planı sayesinde 42 milyon Euro kazanç sağlanabilir ve elde edilen paranın 15.5 milyonu ulusal güvenlik için kullanılabilir.",
        [
            "Fransız",
            "gazetesi",
            "Le",
            "Figaro'ya",
            "göre",
            "bu",
            "hükumet",
            "planı",
            "sayesinde",
            "42",
            "milyon",
            "Euro",
            "kazanç",
            "sağlanabilir",
            "ve",
            "elde",
            "edilen",
            "paranın",
            "15.5",
            "milyonu",
            "ulusal",
            "güvenlik",
            "için",
            "kullanılabilir",
            ".",
        ],
    ),
    (
        "Ortalama şansa ve 10.000 Sterlin değerinde tahvillere sahip bir yatırımcı yılda 125 Sterlin ikramiye kazanabilir.",
        [
            "Ortalama",
            "şansa",
            "ve",
            "10.000",
            "Sterlin",
            "değerinde",
            "tahvillere",
            "sahip",
            "bir",
            "yatırımcı",
            "yılda",
            "125",
            "Sterlin",
            "ikramiye",
            "kazanabilir",
            ".",
        ],
    ),
    (
        "3 Kasım Salı günü, Ankara Belediye Başkanı 2014'te hükümetle birlikte oluşturulan kentsel gelişim anlaşmasını askıya alma kararı verdi.",
        [
            "3",
            "Kasım",
            "Salı",
            "günü",
            ",",
            "Ankara",
            "Belediye",
            "Başkanı",
            "2014'te",
            "hükümetle",
            "birlikte",
            "oluşturulan",
            "kentsel",
            "gelişim",
            "anlaşmasını",
            "askıya",
            "alma",
            "kararı",
            "verdi",
            ".",
        ],
    ),
    (
        "Stalin, Abakumov'u Beria'nın enerji bakanlıkları üzerindeki baskınlığına karşı MGB içinde kendi ağını kurmaya teşvik etmeye başlamıştı.",
        [
            "Stalin",
            ",",
            "Abakumov'u",
            "Beria'nın",
            "enerji",
            "bakanlıkları",
            "üzerindeki",
            "baskınlığına",
            "karşı",
            "MGB",
            "içinde",
            "kendi",
            "ağını",
            "kurmaya",
            "teşvik",
            "etmeye",
            "başlamıştı",
            ".",
        ],
    ),
    (
        "Güney Avrupa'daki kazı alanlarının çoğunluğu gibi, bu bulgu M.Ö. 5. yüzyılın başlar",
        [
            "Güney",
            "Avrupa'daki",
            "kazı",
            "alanlarının",
            "çoğunluğu",
            "gibi",
            ",",
            "bu",
            "bulgu",
            "M.Ö.",
            "5.",
            "yüzyılın",
            "başlar",
        ],
    ),
    (
        "Sağlığın bozulması Hitchcock hayatının son yirmi yılında üretimini azalttı.",
        [
            "Sağlığın",
            "bozulması",
            "Hitchcock",
            "hayatının",
            "son",
            "yirmi",
            "yılında",
            "üretimini",
            "azalttı",
            ".",
        ],
    ),
]

-TESTS = ABBREV_TESTS + URL_TESTS + NUMBER_TESTS + PUNCT_TESTS + GENERAL_TESTS
+TESTS = (ABBREV_TESTS + URL_TESTS + NUMBER_TESTS + PUNCT_TESTS + GENERAL_TESTS)


@pytest.mark.parametrize("text,expected_tokens", TESTS)
@@ -149,4 +696,3 @@ def test_tr_tokenizer_handles_allcases(tr_tokenizer, text, expected_tokens):
    token_list = [token.text for token in tokens if not token.is_space]
    print(token_list)
    assert expected_tokens == token_list

View File

@@ -89,7 +89,6 @@ def test_uk_tokenizer_splits_open_appostrophe(uk_tokenizer, text):
    assert tokens[0].text == "'"


-@pytest.mark.skip(reason="See Issue #3327 and PR #3329")
@pytest.mark.parametrize("text", ["Тест''"])
def test_uk_tokenizer_splits_double_end_quote(uk_tokenizer, text):
    tokens = uk_tokenizer(text)

View File

@@ -7,7 +7,6 @@ from spacy.tokens import Doc
from spacy.pipeline._parser_internals.nonproj import projectivize
from spacy.pipeline._parser_internals.arc_eager import ArcEager
from spacy.pipeline.dep_parser import DEFAULT_PARSER_MODEL
-from spacy.pipeline._parser_internals.stateclass import StateClass


def get_sequence_costs(M, words, heads, deps, transitions):
@@ -59,7 +58,7 @@ def test_oracle_four_words(arc_eager, vocab):
        ["S"],
        ["L-left"],
        ["S"],
-        ["D"]
+        ["D"],
    ]
    assert state.is_final()
    for i, state_costs in enumerate(cost_history):
@@ -185,9 +184,9 @@ def test_oracle_dev_sentence(vocab, arc_eager):
        "L-nn",  # Attach 'Cars' to 'Inc.'
        "L-nn",  # Attach 'Motor' to 'Inc.'
        "L-nn",  # Attach 'Rolls-Royce' to 'Inc.'
        "S",  # Shift "Inc."
        "L-nsubj",  # Attach 'Inc.' to 'said'
        "S",  # Shift 'said'
        "S",  # Shift 'it'
        "L-nsubj",  # Attach 'it.' to 'expects'
        "R-ccomp",  # Attach 'expects' to 'said'
@@ -251,7 +250,7 @@ def test_oracle_bad_tokenization(vocab, arc_eager):
    is root is
    bad comp is
    """
    gold_words = []
    gold_deps = []
    gold_heads = []
@@ -268,7 +267,9 @@ def test_oracle_bad_tokenization(vocab, arc_eager):
    arc_eager.add_action(2, dep)  # Left
    arc_eager.add_action(3, dep)  # Right
    reference = Doc(Vocab(), words=gold_words, deps=gold_deps, heads=gold_heads)
-    predicted = Doc(reference.vocab, words=["[", "catalase", "]", ":", "that", "is", "bad"])
+    predicted = Doc(
+        reference.vocab, words=["[", "catalase", "]", ":", "that", "is", "bad"]
+    )
    example = Example(predicted=predicted, reference=reference)
    ae_oracle_actions = arc_eager.get_oracle_sequence(example, _debug=False)
    ae_oracle_actions = [arc_eager.get_class_name(i) for i in ae_oracle_actions]

View File

@@ -301,11 +301,9 @@ def test_block_ner():
    assert [token.ent_type_ for token in doc] == expected_types


-@pytest.mark.parametrize(
-    "use_upper", [True, False]
-)
+@pytest.mark.parametrize("use_upper", [True, False])
def test_overfitting_IO(use_upper):
-    # Simple test to try and quickly overfit the NER component - ensuring the ML models work correctly
+    # Simple test to try and quickly overfit the NER component
    nlp = English()
    ner = nlp.add_pipe("ner", config={"model": {"use_upper": use_upper}})
    train_examples = []
@@ -361,6 +359,84 @@ def test_overfitting_IO(use_upper):
    assert_equal(batch_deps_1, no_batch_deps)
+def test_beam_ner_scores():
+    # Test that we can get confidence values out of the beam_ner pipe
+    beam_width = 16
+    beam_density = 0.0001
+    nlp = English()
+    config = {
+        "beam_width": beam_width,
+        "beam_density": beam_density,
+    }
+    ner = nlp.add_pipe("beam_ner", config=config)
+    train_examples = []
+    for text, annotations in TRAIN_DATA:
+        train_examples.append(Example.from_dict(nlp.make_doc(text), annotations))
+        for ent in annotations.get("entities"):
+            ner.add_label(ent[2])
+    optimizer = nlp.initialize()
+
+    # update once
+    losses = {}
+    nlp.update(train_examples, sgd=optimizer, losses=losses)
+
+    # test the scores from the beam
+    test_text = "I like London."
+    doc = nlp.make_doc(test_text)
+    docs = [doc]
+    beams = ner.predict(docs)
+    entity_scores = ner.scored_ents(beams)[0]
+
+    for j in range(len(doc)):
+        for label in ner.labels:
+            score = entity_scores[(j, j + 1, label)]
+            eps = 0.00001
+            assert 0 - eps <= score <= 1 + eps
+
+
+def test_beam_overfitting_IO():
+    # Simple test to try and quickly overfit the Beam NER component
+    nlp = English()
+    beam_width = 16
+    beam_density = 0.0001
+    config = {
+        "beam_width": beam_width,
+        "beam_density": beam_density,
+    }
+    ner = nlp.add_pipe("beam_ner", config=config)
+    train_examples = []
+    for text, annotations in TRAIN_DATA:
+        train_examples.append(Example.from_dict(nlp.make_doc(text), annotations))
+        for ent in annotations.get("entities"):
+            ner.add_label(ent[2])
+    optimizer = nlp.initialize()
+
+    # run overfitting
+    for i in range(50):
+        losses = {}
+        nlp.update(train_examples, sgd=optimizer, losses=losses)
+    assert losses["beam_ner"] < 0.0001
+
+    # test the scores from the beam
+    test_text = "I like London."
+    docs = [nlp.make_doc(test_text)]
+    beams = ner.predict(docs)
+    entity_scores = ner.scored_ents(beams)[0]
+    assert entity_scores[(2, 3, "LOC")] == 1.0
+    assert entity_scores[(2, 3, "PERSON")] == 0.0
+
+    # Also test the results are still the same after IO
+    with make_tempdir() as tmp_dir:
+        nlp.to_disk(tmp_dir)
+        nlp2 = util.load_model_from_path(tmp_dir)
+        docs2 = [nlp2.make_doc(test_text)]
+        ner2 = nlp2.get_pipe("beam_ner")
+        beams2 = ner2.predict(docs2)
+        entity_scores2 = ner2.scored_ents(beams2)[0]
+        assert entity_scores2[(2, 3, "LOC")] == 1.0
+        assert entity_scores2[(2, 3, "PERSON")] == 0.0
def test_ner_warns_no_lookups(caplog):
    nlp = English()
    assert nlp.lang in util.LEXEME_NORM_LANGS

View File

@@ -1,13 +1,9 @@
-# coding: utf8
-from __future__ import unicode_literals
import pytest
import hypothesis
import hypothesis.strategies
import numpy
from spacy.vocab import Vocab
from spacy.language import Language
-from spacy.pipeline import DependencyParser
from spacy.pipeline._parser_internals.arc_eager import ArcEager
from spacy.tokens import Doc
from spacy.pipeline._parser_internals._beam_utils import BeamBatch
@@ -44,7 +40,7 @@ def docs(vocab):
            words=["Rats", "bite", "things"],
            heads=[1, 1, 1],
            deps=["nsubj", "ROOT", "dobj"],
-            sent_starts=[True, False, False]
+            sent_starts=[True, False, False],
        )
    ]
@@ -77,10 +73,12 @@ def batch_size(docs):
def beam_width():
    return 4


@pytest.fixture(params=[0.0, 0.5, 1.0])
def beam_density(request):
    return request.param


@pytest.fixture
def vector_size():
    return 6
@@ -100,7 +98,9 @@ def scores(moves, batch_size, beam_width):
                numpy.random.uniform(-0.1, 0.1, (beam_width, moves.n_moves))
                for _ in range(batch_size)
            ]
-        ), dtype="float32")
+        ),
+        dtype="float32",
+    )


def test_create_beam(beam):
@@ -128,8 +128,6 @@ def test_beam_parse(examples, beam_width):
        parser(doc)


@hypothesis.given(hyp=hypothesis.strategies.data())
def test_beam_density(moves, examples, beam_width, hyp):
    beam_density = float(hyp.draw(hypothesis.strategies.floats(0.0, 1.0, width=32)))

View File

@@ -28,6 +28,26 @@ TRAIN_DATA = [
 ]
 
+CONFLICTING_DATA = [
+    (
+        "I like London and Berlin.",
+        {
+            "heads": [1, 1, 1, 2, 2, 1],
+            "deps": ["nsubj", "ROOT", "dobj", "cc", "conj", "punct"],
+        },
+    ),
+    (
+        "I like London and Berlin.",
+        {
+            "heads": [0, 0, 0, 0, 0, 0],
+            "deps": ["ROOT", "nsubj", "nsubj", "cc", "conj", "punct"],
+        },
+    ),
+]
+
+eps = 0.01
+
 
 def test_parser_root(en_vocab):
     words = ["i", "do", "n't", "have", "other", "assistance"]
     heads = [3, 3, 3, 3, 5, 3]
@@ -185,26 +205,31 @@ def test_parser_set_sent_starts(en_vocab):
             assert token.head in sent
 
 
-def test_overfitting_IO():
-    # Simple test to try and quickly overfit the dependency parser - ensuring the ML models work correctly
+@pytest.mark.parametrize("pipe_name", ["parser", "beam_parser"])
+def test_overfitting_IO(pipe_name):
+    # Simple test to try and quickly overfit the dependency parser (normal or beam)
     nlp = English()
-    parser = nlp.add_pipe("parser")
+    parser = nlp.add_pipe(pipe_name)
     train_examples = []
     for text, annotations in TRAIN_DATA:
         train_examples.append(Example.from_dict(nlp.make_doc(text), annotations))
        for dep in annotations.get("deps", []):
             parser.add_label(dep)
     optimizer = nlp.initialize()
-    for i in range(100):
+    # run overfitting
+    for i in range(150):
         losses = {}
         nlp.update(train_examples, sgd=optimizer, losses=losses)
-    assert losses["parser"] < 0.0001
+    assert losses[pipe_name] < 0.0001
     # test the trained model
     test_text = "I like securities."
     doc = nlp(test_text)
     assert doc[0].dep_ == "nsubj"
     assert doc[2].dep_ == "dobj"
     assert doc[3].dep_ == "punct"
+    assert doc[0].head.i == 1
+    assert doc[2].head.i == 1
+    assert doc[3].head.i == 1
     # Also test the results are still the same after IO
     with make_tempdir() as tmp_dir:
         nlp.to_disk(tmp_dir)
@@ -213,6 +238,9 @@ def test_overfitting_IO():
     assert doc2[0].dep_ == "nsubj"
     assert doc2[2].dep_ == "dobj"
     assert doc2[3].dep_ == "punct"
+    assert doc2[0].head.i == 1
+    assert doc2[2].head.i == 1
+    assert doc2[3].head.i == 1
 
     # Make sure that running pipe twice, or comparing to call, always amounts to the same predictions
     texts = [
@@ -226,3 +254,123 @@ def test_overfitting_IO():
     no_batch_deps = [doc.to_array([DEP]) for doc in [nlp(text) for text in texts]]
     assert_equal(batch_deps_1, batch_deps_2)
     assert_equal(batch_deps_1, no_batch_deps)
+
+
+def test_beam_parser_scores():
+    # Test that we can get confidence values out of the beam_parser pipe
+    beam_width = 16
+    beam_density = 0.0001
+    nlp = English()
+    config = {
+        "beam_width": beam_width,
+        "beam_density": beam_density,
+    }
+    parser = nlp.add_pipe("beam_parser", config=config)
+    train_examples = []
+    for text, annotations in CONFLICTING_DATA:
+        train_examples.append(Example.from_dict(nlp.make_doc(text), annotations))
+        for dep in annotations.get("deps", []):
+            parser.add_label(dep)
+    optimizer = nlp.initialize()
+    # update a bit with conflicting data
+    for i in range(10):
+        losses = {}
+        nlp.update(train_examples, sgd=optimizer, losses=losses)
+    # test the scores from the beam
+    test_text = "I like securities."
+    doc = nlp.make_doc(test_text)
+    docs = [doc]
+    beams = parser.predict(docs)
+    head_scores, label_scores = parser.scored_parses(beams)
+    for j in range(len(doc)):
+        for label in parser.labels:
+            label_score = label_scores[0][(j, label)]
+            assert 0 - eps <= label_score <= 1 + eps
+        for i in range(len(doc)):
+            head_score = head_scores[0][(j, i)]
+            assert 0 - eps <= head_score <= 1 + eps
+
+
+def test_beam_overfitting_IO():
+    # Simple test to try and quickly overfit the Beam dependency parser
+    nlp = English()
+    beam_width = 16
+    beam_density = 0.0001
+    config = {
+        "beam_width": beam_width,
+        "beam_density": beam_density,
+    }
+    parser = nlp.add_pipe("beam_parser", config=config)
+    train_examples = []
+    for text, annotations in TRAIN_DATA:
+        train_examples.append(Example.from_dict(nlp.make_doc(text), annotations))
+        for dep in annotations.get("deps", []):
+            parser.add_label(dep)
+    optimizer = nlp.initialize()
+    # run overfitting
+    for i in range(150):
+        losses = {}
+        nlp.update(train_examples, sgd=optimizer, losses=losses)
+    assert losses["beam_parser"] < 0.0001
+    # test the scores from the beam
+    test_text = "I like securities."
+    docs = [nlp.make_doc(test_text)]
+    beams = parser.predict(docs)
+    head_scores, label_scores = parser.scored_parses(beams)
+    # we only processed one document
+    head_scores = head_scores[0]
+    label_scores = label_scores[0]
+    # test label annotations: 0=nsubj, 2=dobj, 3=punct
+    assert label_scores[(0, "nsubj")] == pytest.approx(1.0, eps)
+    assert label_scores[(0, "dobj")] == pytest.approx(0.0, eps)
+    assert label_scores[(0, "punct")] == pytest.approx(0.0, eps)
+    assert label_scores[(2, "nsubj")] == pytest.approx(0.0, eps)
+    assert label_scores[(2, "dobj")] == pytest.approx(1.0, eps)
+    assert label_scores[(2, "punct")] == pytest.approx(0.0, eps)
+    assert label_scores[(3, "nsubj")] == pytest.approx(0.0, eps)
+    assert label_scores[(3, "dobj")] == pytest.approx(0.0, eps)
+    assert label_scores[(3, "punct")] == pytest.approx(1.0, eps)
+    # test head annotations: the root is token at index 1
+    assert head_scores[(0, 0)] == pytest.approx(0.0, eps)
+    assert head_scores[(0, 1)] == pytest.approx(1.0, eps)
+    assert head_scores[(0, 2)] == pytest.approx(0.0, eps)
+    assert head_scores[(2, 0)] == pytest.approx(0.0, eps)
+    assert head_scores[(2, 1)] == pytest.approx(1.0, eps)
+    assert head_scores[(2, 2)] == pytest.approx(0.0, eps)
+    assert head_scores[(3, 0)] == pytest.approx(0.0, eps)
+    assert head_scores[(3, 1)] == pytest.approx(1.0, eps)
+    assert head_scores[(3, 2)] == pytest.approx(0.0, eps)
+    # Also test the results are still the same after IO
+    with make_tempdir() as tmp_dir:
+        nlp.to_disk(tmp_dir)
+        nlp2 = util.load_model_from_path(tmp_dir)
+        docs2 = [nlp2.make_doc(test_text)]
+        parser2 = nlp2.get_pipe("beam_parser")
+        beams2 = parser2.predict(docs2)
+        head_scores2, label_scores2 = parser2.scored_parses(beams2)
+        # we only processed one document
+        head_scores2 = head_scores2[0]
+        label_scores2 = label_scores2[0]
+        # check the results again
+        assert label_scores2[(0, "nsubj")] == pytest.approx(1.0, eps)
+        assert label_scores2[(0, "dobj")] == pytest.approx(0.0, eps)
+        assert label_scores2[(0, "punct")] == pytest.approx(0.0, eps)
+        assert label_scores2[(2, "nsubj")] == pytest.approx(0.0, eps)
+        assert label_scores2[(2, "dobj")] == pytest.approx(1.0, eps)
+        assert label_scores2[(2, "punct")] == pytest.approx(0.0, eps)
+        assert label_scores2[(3, "nsubj")] == pytest.approx(0.0, eps)
+        assert label_scores2[(3, "dobj")] == pytest.approx(0.0, eps)
+        assert label_scores2[(3, "punct")] == pytest.approx(1.0, eps)
+        assert head_scores2[(0, 0)] == pytest.approx(0.0, eps)
+        assert head_scores2[(0, 1)] == pytest.approx(1.0, eps)
+        assert head_scores2[(0, 2)] == pytest.approx(0.0, eps)
+        assert head_scores2[(2, 0)] == pytest.approx(0.0, eps)
+        assert head_scores2[(2, 1)] == pytest.approx(1.0, eps)
+        assert head_scores2[(2, 2)] == pytest.approx(0.0, eps)
+        assert head_scores2[(3, 0)] == pytest.approx(0.0, eps)
+        assert head_scores2[(3, 1)] == pytest.approx(1.0, eps)
+        assert head_scores2[(3, 2)] == pytest.approx(0.0, eps)
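`scored_parses` is the parser-side counterpart of `scored_ents`: for each doc it returns a dict of head probabilities keyed by `(child_index, head_index)` and a dict of label probabilities keyed by `(token_index, dep_label)`, which is what the assertions above rely on. A small sketch of consuming those dicts, assuming a pipeline trained with `beam_parser` as in the test (the helper is illustrative only):

```python
def best_heads(parser, docs):
    # Pick the most probable head index for every token, per doc
    beams = parser.predict(docs)
    head_scores, _ = parser.scored_parses(beams)
    results = []
    for heads in head_scores:
        best = {}
        for (child, head), prob in heads.items():
            if child not in best or prob > best[child][1]:
                best[child] = (head, prob)
        results.append(best)
    return results
```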

View File

@@ -4,14 +4,17 @@ from spacy.tokens.doc import Doc
 from spacy.vocab import Vocab
 from spacy.pipeline._parser_internals.stateclass import StateClass
 
 
+
 @pytest.fixture
 def vocab():
     return Vocab()
 
+
 @pytest.fixture
 def doc(vocab):
     return Doc(vocab, words=["a", "b", "c", "d"])
 
+
 def test_init_state(doc):
     state = StateClass(doc)
     assert state.stack == []
@@ -19,6 +22,7 @@ def test_init_state(doc):
     assert not state.is_final()
     assert state.buffer_length() == 4
 
+
 def test_push_pop(doc):
     state = StateClass(doc)
     state.push()
@@ -33,6 +37,7 @@ def test_push_pop(doc):
     assert state.stack == [0]
     assert 1 not in state.queue
 
+
 def test_stack_depth(doc):
     state = StateClass(doc)
     assert state.stack_depth() == 0

View File

@@ -161,7 +161,7 @@ def test_attributeruler_score(nlp, pattern_dicts):
     # "cat" is the only correct lemma
     assert scores["lemma_acc"] == pytest.approx(0.2)
     # no morphs are set
-    assert scores["morph_acc"] == None
+    assert scores["morph_acc"] is None
 
 
 def test_attributeruler_rule_order(nlp):

View File

@@ -201,13 +201,9 @@ def test_entity_ruler_overlapping_spans(nlp):
 @pytest.mark.parametrize("n_process", [1, 2])
 def test_entity_ruler_multiprocessing(nlp, n_process):
-    texts = [
-        "I enjoy eating Pizza Hut pizza."
-    ]
+    texts = ["I enjoy eating Pizza Hut pizza."]
 
-    patterns = [
-        {"label": "FASTFOOD", "pattern": "Pizza Hut", "id": "1234"}
-    ]
+    patterns = [{"label": "FASTFOOD", "pattern": "Pizza Hut", "id": "1234"}]
 
     ruler = nlp.add_pipe("entity_ruler")
     ruler.add_patterns(patterns)

View File

@@ -159,8 +159,12 @@ def test_pipe_class_component_model():
         "model": {
             "@architectures": "spacy.TextCatEnsemble.v2",
             "tok2vec": DEFAULT_TOK2VEC_MODEL,
-            "linear_model": {"@architectures": "spacy.TextCatBOW.v1", "exclusive_classes": False, "ngram_size": 1,
-                             "no_output_layer": False},
+            "linear_model": {
+                "@architectures": "spacy.TextCatBOW.v1",
+                "exclusive_classes": False,
+                "ngram_size": 1,
+                "no_output_layer": False,
+            },
         },
         "value1": 10,
     }

View File

@@ -37,7 +37,16 @@ TRAIN_DATA = [
 ]
 
 PARTIAL_DATA = [
+    # partial annotation
     ("I like green eggs", {"tags": ["", "V", "J", ""]}),
+    # misaligned partial annotation
+    (
+        "He hates green eggs",
+        {
+            "words": ["He", "hate", "s", "green", "eggs"],
+            "tags": ["", "V", "S", "J", ""],
+        },
+    ),
 ]
@@ -126,6 +135,7 @@ def test_incomplete_data():
     assert doc[1].tag_ is "V"
     assert doc[2].tag_ is "J"
 
+
 def test_overfitting_IO():
     # Simple test to try and quickly overfit the tagger - ensuring the ML models work correctly
     nlp = English()

View File

@@ -15,15 +15,31 @@ from spacy.training import Example
 from ..util import make_tempdir
 
-TRAIN_DATA = [
+TRAIN_DATA_SINGLE_LABEL = [
     ("I'm so happy.", {"cats": {"POSITIVE": 1.0, "NEGATIVE": 0.0}}),
     ("I'm so angry", {"cats": {"POSITIVE": 0.0, "NEGATIVE": 1.0}}),
 ]
 
+TRAIN_DATA_MULTI_LABEL = [
+    ("I'm angry and confused", {"cats": {"ANGRY": 1.0, "CONFUSED": 1.0, "HAPPY": 0.0}}),
+    ("I'm confused but happy", {"cats": {"ANGRY": 0.0, "CONFUSED": 1.0, "HAPPY": 1.0}}),
+]
+
 
-def make_get_examples(nlp):
+def make_get_examples_single_label(nlp):
     train_examples = []
-    for t in TRAIN_DATA:
+    for t in TRAIN_DATA_SINGLE_LABEL:
+        train_examples.append(Example.from_dict(nlp.make_doc(t[0]), t[1]))
+
+    def get_examples():
+        return train_examples
+
+    return get_examples
+
+
+def make_get_examples_multi_label(nlp):
+    train_examples = []
+    for t in TRAIN_DATA_MULTI_LABEL:
         train_examples.append(Example.from_dict(nlp.make_doc(t[0]), t[1]))
 
     def get_examples():
@@ -85,49 +101,75 @@ def test_textcat_learns_multilabel():
     assert score > 0.5
 
 
-def test_label_types():
+@pytest.mark.parametrize("name", ["textcat", "textcat_multilabel"])
+def test_label_types(name):
     nlp = Language()
-    textcat = nlp.add_pipe("textcat")
+    textcat = nlp.add_pipe(name)
     textcat.add_label("answer")
     with pytest.raises(ValueError):
         textcat.add_label(9)
 
 
-def test_no_label():
+@pytest.mark.parametrize("name", ["textcat", "textcat_multilabel"])
+def test_no_label(name):
     nlp = Language()
-    nlp.add_pipe("textcat")
+    nlp.add_pipe(name)
     with pytest.raises(ValueError):
         nlp.initialize()
 
 
-def test_implicit_label():
+@pytest.mark.parametrize(
+    "name,get_examples",
+    [
+        ("textcat", make_get_examples_single_label),
+        ("textcat_multilabel", make_get_examples_multi_label),
+    ],
+)
+def test_implicit_label(name, get_examples):
     nlp = Language()
-    nlp.add_pipe("textcat")
-    nlp.initialize(get_examples=make_get_examples(nlp))
+    nlp.add_pipe(name)
+    nlp.initialize(get_examples=get_examples(nlp))
 
 
-def test_no_resize():
+@pytest.mark.parametrize("name", ["textcat", "textcat_multilabel"])
+def test_no_resize(name):
     nlp = Language()
-    textcat = nlp.add_pipe("textcat")
+    textcat = nlp.add_pipe(name)
     textcat.add_label("POSITIVE")
     textcat.add_label("NEGATIVE")
     nlp.initialize()
-    assert textcat.model.get_dim("nO") == 2
+    assert textcat.model.get_dim("nO") >= 2
     # this throws an error because the textcat can't be resized after initialization
     with pytest.raises(ValueError):
         textcat.add_label("NEUTRAL")
 
 
-def test_initialize_examples():
+def test_error_with_multi_labels():
     nlp = Language()
     textcat = nlp.add_pipe("textcat")
-    for text, annotations in TRAIN_DATA:
+    train_examples = []
+    for text, annotations in TRAIN_DATA_MULTI_LABEL:
+        train_examples.append(Example.from_dict(nlp.make_doc(text), annotations))
+    with pytest.raises(ValueError):
+        optimizer = nlp.initialize(get_examples=lambda: train_examples)
+
+
+@pytest.mark.parametrize(
+    "name,get_examples, train_data",
+    [
+        ("textcat", make_get_examples_single_label, TRAIN_DATA_SINGLE_LABEL),
+        ("textcat_multilabel", make_get_examples_multi_label, TRAIN_DATA_MULTI_LABEL),
+    ],
+)
+def test_initialize_examples(name, get_examples, train_data):
+    nlp = Language()
+    textcat = nlp.add_pipe(name)
+    for text, annotations in train_data:
         for label, value in annotations.get("cats").items():
             textcat.add_label(label)
     # you shouldn't really call this more than once, but for testing it should be fine
     nlp.initialize()
-    get_examples = make_get_examples(nlp)
-    nlp.initialize(get_examples=get_examples)
+    nlp.initialize(get_examples=get_examples(nlp))
     with pytest.raises(TypeError):
         nlp.initialize(get_examples=lambda: None)
     with pytest.raises(TypeError):
@@ -138,12 +180,10 @@ def test_overfitting_IO():
     # Simple test to try and quickly overfit the single-label textcat component - ensuring the ML models work correctly
     fix_random_seed(0)
     nlp = English()
-    nlp.config["initialize"]["components"]["textcat"] = {"positive_label": "POSITIVE"}
-    # Set exclusive labels
-    config = {"model": {"linear_model": {"exclusive_classes": True}}}
-    textcat = nlp.add_pipe("textcat", config=config)
+    textcat = nlp.add_pipe("textcat")
     train_examples = []
-    for text, annotations in TRAIN_DATA:
+    for text, annotations in TRAIN_DATA_SINGLE_LABEL:
         train_examples.append(Example.from_dict(nlp.make_doc(text), annotations))
     optimizer = nlp.initialize(get_examples=lambda: train_examples)
     assert textcat.model.get_dim("nO") == 2
@@ -172,6 +212,8 @@ def test_overfitting_IO():
     # Test scoring
     scores = nlp.evaluate(train_examples)
     assert scores["cats_micro_f"] == 1.0
+    assert scores["cats_macro_f"] == 1.0
+    assert scores["cats_macro_auc"] == 1.0
     assert scores["cats_score"] == 1.0
     assert "cats_score_desc" in scores
@@ -192,7 +234,7 @@ def test_overfitting_IO_multi():
     config = {"model": {"linear_model": {"exclusive_classes": False}}}
     textcat = nlp.add_pipe("textcat", config=config)
     train_examples = []
-    for text, annotations in TRAIN_DATA:
+    for text, annotations in TRAIN_DATA_MULTI_LABEL:
         train_examples.append(Example.from_dict(nlp.make_doc(text), annotations))
     optimizer = nlp.initialize(get_examples=lambda: train_examples)
     assert textcat.model.get_dim("nO") == 2
@@ -231,27 +273,75 @@ def test_overfitting_IO_multi():
     assert_equal(batch_cats_1, no_batch_cats)
 
 
+def test_overfitting_IO_multi():
+    # Simple test to try and quickly overfit the multi-label textcat component - ensuring the ML models work correctly
+    fix_random_seed(0)
+    nlp = English()
+    textcat = nlp.add_pipe("textcat_multilabel")
+    train_examples = []
+    for text, annotations in TRAIN_DATA_MULTI_LABEL:
+        train_examples.append(Example.from_dict(nlp.make_doc(text), annotations))
+    optimizer = nlp.initialize(get_examples=lambda: train_examples)
+    assert textcat.model.get_dim("nO") == 3
+    for i in range(100):
+        losses = {}
+        nlp.update(train_examples, sgd=optimizer, losses=losses)
+    assert losses["textcat_multilabel"] < 0.01
+    # test the trained model
+    test_text = "I am confused but happy."
+    doc = nlp(test_text)
+    cats = doc.cats
+    assert cats["HAPPY"] > 0.9
+    assert cats["CONFUSED"] > 0.9
+    # Also test the results are still the same after IO
+    with make_tempdir() as tmp_dir:
+        nlp.to_disk(tmp_dir)
+        nlp2 = util.load_model_from_path(tmp_dir)
+        doc2 = nlp2(test_text)
+        cats2 = doc2.cats
+        assert cats2["HAPPY"] > 0.9
+        assert cats2["CONFUSED"] > 0.9
+    # Test scoring
+    scores = nlp.evaluate(train_examples)
+    assert scores["cats_micro_f"] == 1.0
+    assert scores["cats_macro_f"] == 1.0
+    assert "cats_score_desc" in scores
+    # Make sure that running pipe twice, or comparing to call, always amounts to the same predictions
+    texts = ["Just a sentence.", "I like green eggs.", "I am happy.", "I eat ham."]
+    batch_deps_1 = [doc.cats for doc in nlp.pipe(texts)]
+    batch_deps_2 = [doc.cats for doc in nlp.pipe(texts)]
+    no_batch_deps = [doc.cats for doc in [nlp(text) for text in texts]]
+    assert_equal(batch_deps_1, batch_deps_2)
+    assert_equal(batch_deps_1, no_batch_deps)
+
+
 # fmt: off
 @pytest.mark.parametrize(
-    "textcat_config",
+    "name,train_data,textcat_config",
     [
-        {"@architectures": "spacy.TextCatBOW.v1", "exclusive_classes": False, "ngram_size": 1, "no_output_layer": False},
-        {"@architectures": "spacy.TextCatBOW.v1", "exclusive_classes": True, "ngram_size": 4, "no_output_layer": False},
-        {"@architectures": "spacy.TextCatBOW.v1", "exclusive_classes": False, "ngram_size": 3, "no_output_layer": True},
-        {"@architectures": "spacy.TextCatBOW.v1", "exclusive_classes": True, "ngram_size": 2, "no_output_layer": True},
-        {"@architectures": "spacy.TextCatEnsemble.v2", "tok2vec": DEFAULT_TOK2VEC_MODEL, "linear_model": {"@architectures": "spacy.TextCatBOW.v1", "exclusive_classes": False, "ngram_size": 1, "no_output_layer": False}},
-        {"@architectures": "spacy.TextCatEnsemble.v2", "tok2vec": DEFAULT_TOK2VEC_MODEL, "linear_model": {"@architectures": "spacy.TextCatBOW.v1", "exclusive_classes": True, "ngram_size": 5, "no_output_layer": False}},
-        {"@architectures": "spacy.TextCatCNN.v1", "tok2vec": DEFAULT_TOK2VEC_MODEL, "exclusive_classes": True},
-        {"@architectures": "spacy.TextCatCNN.v1", "tok2vec": DEFAULT_TOK2VEC_MODEL, "exclusive_classes": False},
+        ("textcat_multilabel", TRAIN_DATA_MULTI_LABEL, {"@architectures": "spacy.TextCatBOW.v1", "exclusive_classes": False, "ngram_size": 1, "no_output_layer": False}),
+        ("textcat", TRAIN_DATA_SINGLE_LABEL, {"@architectures": "spacy.TextCatBOW.v1", "exclusive_classes": True, "ngram_size": 4, "no_output_layer": False}),
+        ("textcat_multilabel", TRAIN_DATA_MULTI_LABEL, {"@architectures": "spacy.TextCatBOW.v1", "exclusive_classes": False, "ngram_size": 3, "no_output_layer": True}),
+        ("textcat", TRAIN_DATA_SINGLE_LABEL, {"@architectures": "spacy.TextCatBOW.v1", "exclusive_classes": True, "ngram_size": 2, "no_output_layer": True}),
+        ("textcat_multilabel", TRAIN_DATA_MULTI_LABEL, {"@architectures": "spacy.TextCatEnsemble.v2", "tok2vec": DEFAULT_TOK2VEC_MODEL, "linear_model": {"@architectures": "spacy.TextCatBOW.v1", "exclusive_classes": False, "ngram_size": 1, "no_output_layer": False}}),
+        ("textcat", TRAIN_DATA_SINGLE_LABEL, {"@architectures": "spacy.TextCatEnsemble.v2", "tok2vec": DEFAULT_TOK2VEC_MODEL, "linear_model": {"@architectures": "spacy.TextCatBOW.v1", "exclusive_classes": True, "ngram_size": 5, "no_output_layer": False}}),
+        ("textcat", TRAIN_DATA_SINGLE_LABEL, {"@architectures": "spacy.TextCatCNN.v1", "tok2vec": DEFAULT_TOK2VEC_MODEL, "exclusive_classes": True}),
+        ("textcat_multilabel", TRAIN_DATA_MULTI_LABEL, {"@architectures": "spacy.TextCatCNN.v1", "tok2vec": DEFAULT_TOK2VEC_MODEL, "exclusive_classes": False}),
     ],
 )
 # fmt: on
-def test_textcat_configs(textcat_config):
+def test_textcat_configs(name, train_data, textcat_config):
     pipe_config = {"model": textcat_config}
     nlp = English()
-    textcat = nlp.add_pipe("textcat", config=pipe_config)
+    textcat = nlp.add_pipe(name, config=pipe_config)
     train_examples = []
-    for text, annotations in TRAIN_DATA:
+    for text, annotations in train_data:
         train_examples.append(Example.from_dict(nlp.make_doc(text), annotations))
         for label, value in annotations.get("cats").items():
             textcat.add_label(label)
@@ -264,15 +354,24 @@ def test_textcat_configs(textcat_config):
 def test_positive_class():
     nlp = English()
     textcat = nlp.add_pipe("textcat")
-    get_examples = make_get_examples(nlp)
+    get_examples = make_get_examples_single_label(nlp)
     textcat.initialize(get_examples, labels=["POS", "NEG"], positive_label="POS")
     assert textcat.labels == ("POS", "NEG")
+    assert textcat.cfg["positive_label"] == "POS"
+
+    textcat_multilabel = nlp.add_pipe("textcat_multilabel")
+    get_examples = make_get_examples_multi_label(nlp)
+    with pytest.raises(TypeError):
+        textcat_multilabel.initialize(get_examples, labels=["POS", "NEG"], positive_label="POS")
+    textcat_multilabel.initialize(get_examples, labels=["FICTION", "DRAMA"])
+    assert textcat_multilabel.labels == ("FICTION", "DRAMA")
+    assert "positive_label" not in textcat_multilabel.cfg
 
 
 def test_positive_class_not_present():
     nlp = English()
     textcat = nlp.add_pipe("textcat")
-    get_examples = make_get_examples(nlp)
+    get_examples = make_get_examples_single_label(nlp)
     with pytest.raises(ValueError):
         textcat.initialize(get_examples, labels=["SOME", "THING"], positive_label="POS")
@@ -280,11 +379,9 @@ def test_positive_class_not_present():
 def test_positive_class_not_binary():
     nlp = English()
     textcat = nlp.add_pipe("textcat")
-    get_examples = make_get_examples(nlp)
+    get_examples = make_get_examples_multi_label(nlp)
     with pytest.raises(ValueError):
-        textcat.initialize(
-            get_examples, labels=["SOME", "THING", "POS"], positive_label="POS"
-        )
+        textcat.initialize(get_examples, labels=["SOME", "THING", "POS"], positive_label="POS")
 
 
 def test_textcat_evaluation():
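The theme running through these test changes is the split of text classification into two components: `textcat` now assumes mutually exclusive classes (and supports `positive_label` for binary setups), while the new `textcat_multilabel` scores every label independently. A hedged setup sketch showing the two side by side; the label names are illustrative only:

```python
from spacy.lang.en import English

nlp = English()
# Single-label: exactly one category should apply per doc
textcat = nlp.add_pipe("textcat")
for label in ("POSITIVE", "NEGATIVE"):
    textcat.add_label(label)
# Multi-label: any number of categories may apply at once
textcat_multi = nlp.add_pipe("textcat_multilabel")
for label in ("ANGRY", "CONFUSED", "HAPPY"):
    textcat_multi.add_label(label)
nlp.initialize()
```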

View File

@ -113,7 +113,7 @@ cfg_string = """
factory = "tok2vec" factory = "tok2vec"
[components.tok2vec.model] [components.tok2vec.model]
@architectures = "spacy.Tok2Vec.v1" @architectures = "spacy.Tok2Vec.v2"
[components.tok2vec.model.embed] [components.tok2vec.model.embed]
@architectures = "spacy.MultiHashEmbed.v1" @architectures = "spacy.MultiHashEmbed.v1"
@ -123,7 +123,7 @@ cfg_string = """
include_static_vectors = false include_static_vectors = false
[components.tok2vec.model.encode] [components.tok2vec.model.encode]
@architectures = "spacy.MaxoutWindowEncoder.v1" @architectures = "spacy.MaxoutWindowEncoder.v2"
width = 96 width = 96
depth = 4 depth = 4
window_size = 1 window_size = 1

View File

@@ -288,35 +288,33 @@ def test_multiple_predictions():
     dummy_pipe(doc)
 
 
-@pytest.mark.skip(reason="removed Beam stuff during the Example/GoldParse refactor")
 def test_issue4313():
     """ This should not crash or exit with some strange error code """
     beam_width = 16
     beam_density = 0.0001
     nlp = English()
-    config = {}
-    ner = nlp.create_pipe("ner", config=config)
+    config = {
+        "beam_width": beam_width,
+        "beam_density": beam_density,
+    }
+    ner = nlp.add_pipe("beam_ner", config=config)
     ner.add_label("SOME_LABEL")
-    ner.initialize(lambda: [])
+    nlp.initialize()
     # add a new label to the doc
     doc = nlp("What do you think about Apple ?")
     assert len(ner.labels) == 1
     assert "SOME_LABEL" in ner.labels
-    ner.add_label("MY_ORG")
+    # TODO: not sure if we want this to be necessary...
     apple_ent = Span(doc, 5, 6, label="MY_ORG")
     doc.ents = list(doc.ents) + [apple_ent]
     # ensure the beam_parse still works with the new label
     docs = [doc]
-    beams = nlp.entity.beam_parse(
-        docs, beam_width=beam_width, beam_density=beam_density
+    ner = nlp.get_pipe("beam_ner")
+    beams = ner.beam_parse(
+        docs, drop=0.0, beam_width=beam_width, beam_density=beam_density
     )
-    for doc, beam in zip(docs, beams):
-        entity_scores = defaultdict(float)
-        for score, ents in nlp.entity.moves.get_beam_parses(beam):
-            for start, end, label in ents:
-                entity_scores[(start, end, label)] += score
 
 
 def test_issue4348():
     """Test that training the tagger with empty data, doesn't throw errors"""

View File

@@ -2,8 +2,11 @@ import pytest
 from thinc.api import Config, fix_random_seed
 
 from spacy.lang.en import English
-from spacy.pipeline.textcat import default_model_config, bow_model_config
-from spacy.pipeline.textcat import cnn_model_config
+from spacy.pipeline.textcat import single_label_default_config, single_label_bow_config
+from spacy.pipeline.textcat import single_label_cnn_config
+from spacy.pipeline.textcat_multilabel import multi_label_default_config
+from spacy.pipeline.textcat_multilabel import multi_label_bow_config
+from spacy.pipeline.textcat_multilabel import multi_label_cnn_config
 from spacy.tokens import Span
 from spacy import displacy
 from spacy.pipeline import merge_entities
@@ -11,7 +14,15 @@ from spacy.training import Example
 
 @pytest.mark.parametrize(
-    "textcat_config", [default_model_config, bow_model_config, cnn_model_config]
+    "textcat_config",
+    [
+        single_label_default_config,
+        single_label_bow_config,
+        single_label_cnn_config,
+        multi_label_default_config,
+        multi_label_bow_config,
+        multi_label_cnn_config,
+    ],
 )
 def test_issue5551(textcat_config):
     """Test that after fixing the random seed, the results of the pipeline are truly identical"""

View File

@@ -1,4 +1,3 @@
-import pydantic
 import pytest
 from pydantic import ValidationError
 from spacy.schemas import TokenPattern, TokenPatternSchema

View File

@@ -208,7 +208,7 @@ def test_create_nlp_from_pretraining_config():
     config = Config().from_str(pretrain_config_string)
     pretrain_config = load_config(DEFAULT_CONFIG_PRETRAIN_PATH)
     filled = config.merge(pretrain_config)
-    resolved = registry.resolve(filled["pretraining"], schema=ConfigSchemaPretrain)
+    registry.resolve(filled["pretraining"], schema=ConfigSchemaPretrain)
 
 
 def test_create_nlp_from_config_multiple_instances():

View File

@@ -4,7 +4,7 @@ from spacy.pipeline import Tagger, DependencyParser, EntityRecognizer
 from spacy.pipeline import TextCategorizer, SentenceRecognizer, TrainablePipe
 from spacy.pipeline.dep_parser import DEFAULT_PARSER_MODEL
 from spacy.pipeline.tagger import DEFAULT_TAGGER_MODEL
-from spacy.pipeline.textcat import DEFAULT_TEXTCAT_MODEL
+from spacy.pipeline.textcat import DEFAULT_SINGLE_TEXTCAT_MODEL
 from spacy.pipeline.senter import DEFAULT_SENTER_MODEL
 from spacy.lang.en import English
 from thinc.api import Linear
@@ -24,7 +24,7 @@ def parser(en_vocab):
         "update_with_oracle_cut_size": 100,
         "beam_width": 1,
         "beam_update_prob": 1.0,
-        "beam_density": 0.0
+        "beam_density": 0.0,
     }
     cfg = {"model": DEFAULT_PARSER_MODEL}
     model = registry.resolve(cfg, validate=True)["model"]
@@ -41,7 +41,7 @@ def blank_parser(en_vocab):
         "update_with_oracle_cut_size": 100,
         "beam_width": 1,
         "beam_update_prob": 1.0,
-        "beam_density": 0.0
+        "beam_density": 0.0,
     }
     cfg = {"model": DEFAULT_PARSER_MODEL}
     model = registry.resolve(cfg, validate=True)["model"]
@@ -66,7 +66,7 @@ def test_serialize_parser_roundtrip_bytes(en_vocab, Parser):
         "update_with_oracle_cut_size": 100,
         "beam_width": 1,
         "beam_update_prob": 1.0,
-        "beam_density": 0.0
+        "beam_density": 0.0,
     }
     cfg = {"model": DEFAULT_PARSER_MODEL}
     model = registry.resolve(cfg, validate=True)["model"]
@@ -90,7 +90,7 @@ def test_serialize_parser_strings(Parser):
         "update_with_oracle_cut_size": 100,
         "beam_width": 1,
         "beam_update_prob": 1.0,
-        "beam_density": 0.0
+        "beam_density": 0.0,
     }
     cfg = {"model": DEFAULT_PARSER_MODEL}
     model = registry.resolve(cfg, validate=True)["model"]
@@ -112,7 +112,7 @@ def test_serialize_parser_roundtrip_disk(en_vocab, Parser):
         "update_with_oracle_cut_size": 100,
         "beam_width": 1,
         "beam_update_prob": 1.0,
-        "beam_density": 0.0
+        "beam_density": 0.0,
     }
     cfg = {"model": DEFAULT_PARSER_MODEL}
     model = registry.resolve(cfg, validate=True)["model"]
@@ -140,9 +140,6 @@ def test_to_from_bytes(parser, blank_parser):
     assert blank_parser.moves.n_moves == parser.moves.n_moves
 
 
-@pytest.mark.skip(
-    reason="This seems to be a dict ordering bug somewhere. Only failing on some platforms."
-)
 def test_serialize_tagger_roundtrip_bytes(en_vocab, taggers):
     tagger1 = taggers[0]
     tagger1_b = tagger1.to_bytes()
@@ -191,7 +188,7 @@ def test_serialize_tagger_strings(en_vocab, de_vocab, taggers):
 def test_serialize_textcat_empty(en_vocab):
     # See issue #1105
-    cfg = {"model": DEFAULT_TEXTCAT_MODEL}
+    cfg = {"model": DEFAULT_SINGLE_TEXTCAT_MODEL}
     model = registry.resolve(cfg, validate=True)["model"]
     textcat = TextCategorizer(en_vocab, model, threshold=0.5)
     textcat.to_bytes(exclude=["vocab"])

View File

@@ -26,7 +26,6 @@ def test_serialize_custom_tokenizer(en_vocab, en_tokenizer):
     assert tokenizer_reloaded.rules == {}
 
 
-@pytest.mark.skip(reason="Currently unreliable across platforms")
 @pytest.mark.parametrize("text", ["I💜you", "theyre", "“hello”"])
 def test_serialize_tokenizer_roundtrip_bytes(en_tokenizer, text):
     tokenizer = en_tokenizer
@@ -38,7 +37,6 @@ def test_serialize_tokenizer_roundtrip_bytes(en_tokenizer, text):
     assert [token.text for token in doc1] == [token.text for token in doc2]
 
 
-@pytest.mark.skip(reason="Currently unreliable across platforms")
 def test_serialize_tokenizer_roundtrip_disk(en_tokenizer):
     tokenizer = en_tokenizer
     with make_tempdir() as d:

View File

@@ -3,7 +3,9 @@ from click import NoSuchOption
 from spacy.training import docs_to_json, offsets_to_biluo_tags
 from spacy.training.converters import iob_to_docs, conll_ner_to_docs, conllu_to_docs
 from spacy.schemas import ProjectConfigSchema, RecommendationSchema, validate
+from spacy.lang.nl import Dutch
 from spacy.util import ENV_VARS
+from spacy.cli import info
 from spacy.cli.init_config import init_config, RECOMMENDATIONS
 from spacy.cli._util import validate_project_commands, parse_config_overrides
 from spacy.cli._util import load_project_config, substitute_project_variables
@@ -15,6 +17,16 @@ import os
 from .util import make_tempdir
 
 
+def test_cli_info():
+    nlp = Dutch()
+    nlp.add_pipe("textcat")
+    with make_tempdir() as tmp_dir:
+        nlp.to_disk(tmp_dir)
+        raw_data = info(tmp_dir, exclude=[""])
+        assert raw_data["lang"] == "nl"
+        assert raw_data["components"] == ["textcat"]
+
+
 def test_cli_converters_conllu_to_docs():
     # from NorNE: https://github.com/ltgoslo/norne/blob/3d23274965f513f23aa48455b28b1878dad23c05/ud/nob/no_bokmaal-ud-dev.conllu
     lines = [

View File

@@ -83,6 +83,7 @@ def test_PrecomputableAffine(nO=4, nI=5, nF=3, nP=2):
 def test_prefer_gpu():
     try:
         import cupy  # noqa: F401
+
         prefer_gpu()
         assert isinstance(get_current_ops(), CupyOps)
     except ImportError:
@@ -92,17 +93,20 @@ def test_prefer_gpu():
 def test_require_gpu():
     try:
         import cupy  # noqa: F401
+
         require_gpu()
         assert isinstance(get_current_ops(), CupyOps)
     except ImportError:
         with pytest.raises(ValueError):
             require_gpu()
 
 
 def test_require_cpu():
     require_cpu()
     assert isinstance(get_current_ops(), NumpyOps)
     try:
         import cupy  # noqa: F401
+
         require_gpu()
         assert isinstance(get_current_ops(), CupyOps)
     except ImportError:

View File

@@ -294,7 +294,7 @@ def test_partial_annotation(en_tokenizer):
         # cats doesn't have an unset state
         if key.startswith("cats"):
             continue
-        assert scores[key] == None
+        assert scores[key] is None
 
     # partially annotated reference, not overlapping with predicted annotation
     ref_doc = en_tokenizer("a b c d e")
@@ -306,13 +306,13 @@ def test_partial_annotation(en_tokenizer):
     example = Example(pred_doc, ref_doc)
     scorer = Scorer()
     scores = scorer.score([example])
-    assert scores["token_acc"] == None
+    assert scores["token_acc"] is None
     assert scores["tag_acc"] == 0.0
     assert scores["pos_acc"] == 0.0
     assert scores["morph_acc"] == 0.0
     assert scores["dep_uas"] == 1.0
     assert scores["dep_las"] == 0.0
-    assert scores["sents_f"] == None
+    assert scores["sents_f"] is None
 
     # partially annotated reference, overlapping with predicted annotation
     ref_doc = en_tokenizer("a b c d e")
@@ -324,13 +324,13 @@ def test_partial_annotation(en_tokenizer):
     example = Example(pred_doc, ref_doc)
     scorer = Scorer()
     scores = scorer.score([example])
-    assert scores["token_acc"] == None
+    assert scores["token_acc"] is None
     assert scores["tag_acc"] == 1.0
     assert scores["pos_acc"] == 1.0
     assert scores["morph_acc"] == 0.0
     assert scores["dep_uas"] == 1.0
     assert scores["dep_las"] == 0.0
-    assert scores["sents_f"] == None
+    assert scores["sents_f"] is None
 
 
 def test_roc_auc_score():
@@ -391,7 +391,7 @@ def test_roc_auc_score():
     score.score_set(0.25, 0)
     score.score_set(0.75, 0)
     with pytest.raises(ValueError):
-        s = score.score
+        _ = score.score  # noqa: F841
 
     y_true = [1, 1]
     y_score = [0.25, 0.75]
@@ -402,4 +402,4 @@ def test_roc_auc_score():
     score.score_set(0.25, 1)
     score.score_set(0.75, 1)
     with pytest.raises(ValueError):
-        s = score.score
+        _ = score.score  # noqa: F841

View File

@@ -180,3 +180,9 @@ def test_tokenizer_special_cases_idx(tokenizer):
     doc = tokenizer(text)
     assert doc[1].idx == 4
     assert doc[2].idx == 7
+
+
+def test_tokenizer_special_cases_spaces(tokenizer):
+    assert [t.text for t in tokenizer("a b c")] == ["a", "b", "c"]
+    tokenizer.add_special_case("a b c", [{"ORTH": "a b c"}])
+    assert [t.text for t in tokenizer("a b c")] == ["a b c"]

View File

@@ -51,7 +51,7 @@ def test_readers():
     for example in train_corpus(nlp):
         nlp.update([example], sgd=optimizer)
     scores = nlp.evaluate(list(dev_corpus(nlp)))
-    assert scores["cats_score"] == 0.0
+    assert scores["cats_macro_auc"] == 0.0
     # ensure the pipeline runs
     doc = nlp("Quick test")
     assert doc.cats
@@ -73,7 +73,7 @@ def test_cat_readers(reader, additional_config):
     nlp_config_string = """
     [training]
     seed = 0
 
     [training.score_weights]
     cats_macro_auc = 1.0

View File

@@ -71,7 +71,6 @@ def test_table_api_to_from_bytes():
     assert "def" not in new_table2
 
 
-@pytest.mark.skip(reason="This fails on Python 3.5")
 def test_lookups_to_from_bytes():
     lookups = Lookups()
     lookups.add_table("table1", {"foo": "bar", "hello": "world"})
@@ -91,7 +90,6 @@ def test_lookups_to_from_bytes():
     assert new_lookups.to_bytes() == lookups_bytes
 
 
-@pytest.mark.skip(reason="This fails on Python 3.5")
 def test_lookups_to_from_disk():
     lookups = Lookups()
     lookups.add_table("table1", {"foo": "bar", "hello": "world"})
@@ -111,7 +109,6 @@ def test_lookups_to_from_disk():
     assert table2["b"] == 2
 
 
-@pytest.mark.skip(reason="This fails on Python 3.5")
 def test_lookups_to_from_bytes_via_vocab():
     table_name = "test"
     vocab = Vocab()
@@ -128,7 +125,6 @@ def test_lookups_to_from_bytes_via_vocab():
     assert new_vocab.to_bytes() == vocab_bytes
 
 
-@pytest.mark.skip(reason="This fails on Python 3.5")
 def test_lookups_to_from_disk_via_vocab():
     table_name = "test"
     vocab = Vocab()

View File

@@ -258,6 +258,7 @@ cdef class Tokenizer:
             tokens = doc.c
         # Otherwise create a separate array to store modified tokens
         else:
+            assert max_length > 0
             tokens = <TokenC*>mem.alloc(max_length, sizeof(TokenC))
         # Modify tokenization according to filtered special cases
         offset = self._retokenize_special_spans(doc, tokens, span_data)
@@ -610,7 +611,7 @@ cdef class Tokenizer:
                 self.mem.free(stale_special)
         self._rules[string] = substrings
         self._flush_cache()
-        if self.find_prefix(string) or self.find_infix(string) or self.find_suffix(string):
+        if self.find_prefix(string) or self.find_infix(string) or self.find_suffix(string) or " " in string:
             self._special_matcher.add(string, None, self._tokenize_affixes(string, False))
 
     def _reload_special_cases(self):

View File

@@ -188,8 +188,15 @@ def _merge(Doc doc, merges):
                 and doc.c[start - 1].ent_type == token.ent_type:
             merged_iob = 1
         token.ent_iob = merged_iob
+        # Set lemma to concatenated lemmas
+        merged_lemma = ""
+        for span_token in span:
+            merged_lemma += span_token.lemma_
+            if doc.c[span_token.i].spacy:
+                merged_lemma += " "
+        merged_lemma = merged_lemma.strip()
+        token.lemma = doc.vocab.strings.add(merged_lemma)
         # Unset attributes that don't match new token
-        token.lemma = 0
         token.norm = 0
         tokens[merge_index] = token
     # Resize the doc.tensor, if it's set. Let the last row for each token stand
@@ -335,7 +342,9 @@ def _split(Doc doc, int token_index, orths, heads, attrs):
         token = &doc.c[token_index + i]
         lex = doc.vocab.get(doc.mem, orth)
         token.lex = lex
-        token.lemma = 0  # reset lemma
+        # If lemma is currently set, set default lemma to orth
+        if token.lemma != 0:
+            token.lemma = lex.orth
         token.norm = 0  # reset norm
         if to_process_tensor:
             # setting the tensors of the split tokens to array of zeros
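With these changes, merging no longer simply wipes the lemma: the merged token's lemma becomes the lemmas of the merged span joined on their original spacing, and splitting falls back to each new token's orth whenever a lemma had been set. A sketch of the merge behaviour, with lemmas assigned by hand purely for illustration:

```python
import spacy

nlp = spacy.blank("en")
doc = nlp("New York is big.")
doc[0].lemma_ = "new"
doc[1].lemma_ = "york"
with doc.retokenize() as retokenizer:
    retokenizer.merge(doc[0:2])
# The merged token keeps the concatenated lemmas
assert doc[0].lemma_ == "new york"
```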

View File

@@ -225,6 +225,7 @@ cdef class Doc:
         # Guarantee self.lex[i-x], for any i >= 0 and x < padding is in bounds
         # However, we need to remember the true starting places, so that we can
         # realloc.
+        assert size + (PADDING*2) > 0
         data_start = <TokenC*>self.mem.alloc(size + (PADDING*2), sizeof(TokenC))
         cdef int i
         for i in range(size + (PADDING*2)):
@@ -1097,7 +1098,7 @@ cdef class Doc:
             (vocab,) = vocab
         if attrs is None:
-            attrs = Doc._get_array_attrs()
+            attrs = list(Doc._get_array_attrs())
         else:
             if any(isinstance(attr, str) for attr in attrs):  # resolve attribute names
                 attrs = [intify_attr(attr) for attr in attrs]  # intify_attr returns None for invalid attrs
@@ -1177,6 +1178,7 @@ cdef class Doc:
         other.length = self.length
         other.max_length = self.max_length
         buff_size = other.max_length + (PADDING*2)
+        assert buff_size > 0
         tokens = <TokenC*>other.mem.alloc(buff_size, sizeof(TokenC))
         memcpy(tokens, self.c - PADDING, buff_size * sizeof(TokenC))
         other.c = &tokens[PADDING]

View File

@ -37,9 +37,17 @@ def init_nlp(config: Config, *, use_gpu: int = -1) -> "Language":
T = registry.resolve(config["training"], schema=ConfigSchemaTraining) T = registry.resolve(config["training"], schema=ConfigSchemaTraining)
dot_names = [T["train_corpus"], T["dev_corpus"]] dot_names = [T["train_corpus"], T["dev_corpus"]]
if not isinstance(T["train_corpus"], str): if not isinstance(T["train_corpus"], str):
raise ConfigValidationError(desc=Errors.E897.format(field="training.train_corpus", type=type(T["train_corpus"]))) raise ConfigValidationError(
desc=Errors.E897.format(
field="training.train_corpus", type=type(T["train_corpus"])
)
)
if not isinstance(T["dev_corpus"], str): if not isinstance(T["dev_corpus"], str):
raise ConfigValidationError(desc=Errors.E897.format(field="training.dev_corpus", type=type(T["dev_corpus"]))) raise ConfigValidationError(
desc=Errors.E897.format(
field="training.dev_corpus", type=type(T["dev_corpus"])
)
)
train_corpus, dev_corpus = resolve_dot_names(config, dot_names) train_corpus, dev_corpus = resolve_dot_names(config, dot_names)
optimizer = T["optimizer"] optimizer = T["optimizer"]
# Components that shouldn't be updated during training # Components that shouldn't be updated during training

View File

@@ -59,6 +59,19 @@ def train(
     batcher = T["batcher"]
     train_logger = T["logger"]
     before_to_disk = create_before_to_disk_callback(T["before_to_disk"])
+
+    # Helper function to save checkpoints. This is a closure for convenience,
+    # to avoid passing in all the args all the time.
+    def save_checkpoint(is_best):
+        with nlp.use_params(optimizer.averages):
+            before_to_disk(nlp).to_disk(output_path / DIR_MODEL_LAST)
+        if is_best:
+            # Avoid saving twice (saving will be more expensive than
+            # the dir copy)
+            if (output_path / DIR_MODEL_BEST).exists():
+                shutil.rmtree(output_path / DIR_MODEL_BEST)
+            shutil.copytree(output_path / DIR_MODEL_LAST, output_path / DIR_MODEL_BEST)
+
     # Components that shouldn't be updated during training
     frozen_components = T["frozen_components"]
     # Create iterator, which yields out info after each optimization step.
@@ -87,36 +100,31 @@ def train(
                 if is_best_checkpoint is not None and output_path is not None:
                     with nlp.select_pipes(disable=frozen_components):
                         update_meta(T, nlp, info)
-                    with nlp.use_params(optimizer.averages):
-                        nlp = before_to_disk(nlp)
-                        nlp.to_disk(output_path / DIR_MODEL_BEST)
+                    save_checkpoint(is_best_checkpoint)
     except Exception as e:
         if output_path is not None:
+            # We don't want to swallow the traceback if we don't have a
+            # specific error, but we do want to warn that we're trying
+            # to do something here.
             stdout.write(
                 msg.warn(
                     f"Aborting and saving the final best model. "
-                    f"Encountered exception: {str(e)}"
+                    f"Encountered exception: {repr(e)}"
                 )
                 + "\n"
             )
         raise e
     finally:
         finalize_logger()
-    if optimizer.averages:
-        nlp.use_params(optimizer.averages)
-    if output_path is not None:
-        final_model_path = output_path / DIR_MODEL_LAST
-        nlp.to_disk(final_model_path)
-        # This will only run if we don't hit an error
-        stdout.write(
-            msg.good("Saved pipeline to output directory", final_model_path) + "\n"
-        )
-        return (nlp, final_model_path)
-    else:
-        return (nlp, None)
+        save_checkpoint(False)
+    # This will only run if we didn't hit an error
+    if optimizer.averages:
+        nlp.use_params(optimizer.averages)
+    if output_path is not None:
+        stdout.write(
+            msg.good("Saved pipeline to output directory", output_path / DIR_MODEL_LAST)
+            + "\n"
+        )
+        return (nlp, output_path / DIR_MODEL_LAST)
+    else:
+        return (nlp, None)
 
 
 def train_while_improving(
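A note on the design choice above: `save_checkpoint` is a closure so it can capture `nlp`, `optimizer`, `before_to_disk`, and `output_path` without threading them through every call site. The last model is now always written to `DIR_MODEL_LAST`, including from the `finally` block, so an interrupted run still leaves a usable artifact, and the best model is produced by copying that directory rather than serializing the pipeline a second time, which the inline comment notes is cheaper than another `to_disk` call.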

View File

@@ -10,7 +10,7 @@ from wasabi import Printer
 
 from .example import Example
 from ..tokens import Doc
-from ..schemas import ConfigSchemaTraining, ConfigSchemaPretrain
+from ..schemas import ConfigSchemaPretrain
 from ..util import registry, load_model_from_config, dot_to_object
@@ -30,7 +30,6 @@ def pretrain(
         set_gpu_allocator(allocator)
     nlp = load_model_from_config(config)
     _config = nlp.config.interpolate()
-    T = registry.resolve(_config["training"], schema=ConfigSchemaTraining)
     P = registry.resolve(_config["pretraining"], schema=ConfigSchemaPretrain)
     corpus = dot_to_object(_config, P["corpus"])
     corpus = registry.resolve({"corpus": corpus})["corpus"]

View File

@ -69,7 +69,7 @@ CONFIG_SECTION_ORDER = ["paths", "variables", "system", "nlp", "components", "co
logger = logging.getLogger("spacy") logger = logging.getLogger("spacy")
logger_stream_handler = logging.StreamHandler() logger_stream_handler = logging.StreamHandler()
logger_stream_handler.setFormatter(logging.Formatter('%(message)s')) logger_stream_handler.setFormatter(logging.Formatter("%(message)s"))
logger.addHandler(logger_stream_handler) logger.addHandler(logger_stream_handler)

View File

@@ -164,7 +164,7 @@ cdef class Vocab:
         if len(string) < 3 or self.length < 10000:
             mem = self.mem
         cdef bint is_oov = mem is not self.mem
-        lex = <LexemeC*>mem.alloc(sizeof(LexemeC), 1)
+        lex = <LexemeC*>mem.alloc(1, sizeof(LexemeC))
         lex.orth = self.strings.add(string)
         lex.length = len(string)
         if self.vectors is not None:

View File

@@ -5,6 +5,7 @@ source: spacy/ml/models
menu:
  - ['Tok2Vec', 'tok2vec-arch']
  - ['Transformers', 'transformers']
  - ['Pretraining', 'pretrain']
  - ['Parser & NER', 'parser']
  - ['Tagging', 'tagger']
  - ['Text Classification', 'textcat']
@@ -25,20 +26,20 @@ usage documentation on
## Tok2Vec architectures {#tok2vec-arch source="spacy/ml/models/tok2vec.py"}

### spacy.Tok2Vec.v2 {#Tok2Vec}

> #### Example config
>
> ```ini
> [model]
> @architectures = "spacy.Tok2Vec.v2"
>
> [model.embed]
> @architectures = "spacy.CharacterEmbed.v1"
> # ...
>
> [model.encode]
> @architectures = "spacy.MaxoutWindowEncoder.v2"
> # ...
> ```
@@ -196,13 +197,13 @@ network to construct a single vector to represent the information.
| `nC`        | The number of UTF-8 bytes to embed per word. Recommended values are between `3` and `8`, although it may depend on the length of words in the language. ~~int~~ |
| **CREATES** | The model using the architecture. ~~Model[List[Doc], List[Floats2d]]~~ |

### spacy.MaxoutWindowEncoder.v2 {#MaxoutWindowEncoder}

> #### Example config
>
> ```ini
> [model]
> @architectures = "spacy.MaxoutWindowEncoder.v2"
> width = 128
> window_size = 1
> maxout_pieces = 3
@@ -220,13 +221,13 @@ and residual connections.
| `depth`     | The number of convolutional layers. Recommended value is `4`. ~~int~~ |
| **CREATES** | The model using the architecture. ~~Model[List[Floats2d], List[Floats2d]]~~ |

### spacy.MishWindowEncoder.v2 {#MishWindowEncoder}

> #### Example config
>
> ```ini
> [model]
> @architectures = "spacy.MishWindowEncoder.v2"
> width = 64
> window_size = 1
> depth = 4
@@ -251,19 +252,19 @@ and residual connections.
> [model]
> @architectures = "spacy.TorchBiLSTMEncoder.v1"
> width = 64
> depth = 2
> dropout = 0.0
> ```

Encode context using bidirectional LSTM layers. Requires
[PyTorch](https://pytorch.org).

| Name        | Description |
| ----------- | ----------- |
| `width`     | The input and output width. These are required to be the same, to allow residual connections. This value will be determined by the width of the inputs. Recommended values are between `64` and `300`. ~~int~~ |
| `depth`     | The number of recurrent layers, for instance `depth=2` results in stacking two LSTMs together. ~~int~~ |
| `dropout`   | Creates a Dropout layer on the outputs of each LSTM layer except the last layer. Set to 0.0 to disable this functionality. ~~float~~ |
| **CREATES** | The model using the architecture. ~~Model[List[Floats2d], List[Floats2d]]~~ |

### spacy.StaticVectors.v1 {#StaticVectors}

@@ -426,6 +427,71 @@ one component.
| `grad_factor` | Reweight gradients from the component before passing them upstream. You can set this to `0` to "freeze" the transformer weights with respect to the component, or use it to make some components more significant than others. Leaving it at `1.0` is usually fine. ~~float~~ |
| **CREATES**   | The model using the architecture. ~~Model[List[Doc], List[Floats2d]]~~ |
## Pretraining architectures {#pretrain source="spacy/ml/models/multi_task.py"}
The spaCy [`pretrain`](/api/cli#pretrain) command lets you initialize a `Tok2Vec` layer in your
pipeline with information from raw text. To this end, additional layers are
added to build a network for a temporary task that forces the `Tok2Vec` layer to
learn something about sentence structure and word cooccurrence statistics. Two
pretraining objectives are available, both of which are variants of the cloze
task [Devlin et al. (2018)](https://arxiv.org/abs/1810.04805) introduced for
BERT.
For more information, see the section on
[pretraining](/usage/embeddings-transformers#pretraining).
### spacy.PretrainVectors.v1 {#pretrain_vectors}
> #### Example config
>
> ```ini
> [pretraining]
> component = "tok2vec"
> ...
>
> [pretraining.objective]
> @architectures = "spacy.PretrainVectors.v1"
> maxout_pieces = 3
> hidden_size = 300
> loss = "cosine"
> ```
Predict the word's vector from a static embeddings table as pretraining
objective for a Tok2Vec layer.
| Name | Description |
| --------------- | --------------------------------------------------------------------------------------------------------------------------------------------------------- |
| `maxout_pieces` | The number of maxout pieces to use. Recommended values are `2` or `3`. ~~int~~ |
| `hidden_size` | Size of the hidden layer of the model. ~~int~~ |
| `loss`          | The loss function can be either "cosine" or "L2". We typically recommend using "cosine". ~~str~~ |
| **CREATES** | A callable function that can create the Model, given the `vocab` of the pipeline and the `tok2vec` layer to pretrain. ~~Callable[[Vocab, Model], Model]~~ |
### spacy.PretrainCharacters.v1 {#pretrain_chars}
> #### Example config
>
> ```ini
> [pretraining]
> component = "tok2vec"
> ...
>
> [pretraining.objective]
> @architectures = "spacy.PretrainCharacters.v1"
> maxout_pieces = 3
> hidden_size = 300
> n_characters = 4
> ```
Predict some number of leading and trailing UTF-8 bytes as pretraining objective
for a Tok2Vec layer.
| Name | Description |
| --------------- | --------------------------------------------------------------------------------------------------------------------------------------------------------- |
| `maxout_pieces` | The number of maxout pieces to use. Recommended values are `2` or `3`. ~~int~~ |
| `hidden_size` | Size of the hidden layer of the model. ~~int~~ |
| `n_characters` | The window of characters - e.g. if `n_characters = 2`, the model will try to predict the first two and last two characters of the word. ~~int~~ |
| **CREATES** | A callable function that can create the Model, given the `vocab` of the pipeline and the `tok2vec` layer to pretrain. ~~Callable[[Vocab, Model], Model]~~ |
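
Both objectives are used via the [`pretrain`](/api/cli#pretrain) command. As a
quick way to get a working setup, a config with a `[pretraining]` block can be
auto-generated and then run on raw text (a sketch; adjust the language,
pipeline and paths to your project):

```cli
$ python -m spacy init config config.cfg --lang en --pipeline tok2vec --pretraining
$ python -m spacy pretrain config.cfg ./output_pretrain --paths.raw_text ./data.jsonl
```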
## Parser & NER architectures {#parser}

### spacy.TransitionBasedParser.v2 {#TransitionBasedParser source="spacy/ml/models/parser.py"}

@@ -534,7 +600,7 @@ specific data and challenge.
> no_output_layer = false
>
> [model.tok2vec]
> @architectures = "spacy.Tok2Vec.v2"
>
> [model.tok2vec.embed]
> @architectures = "spacy.MultiHashEmbed.v1"
@@ -544,7 +610,7 @@ specific data and challenge.
> include_static_vectors = false
>
> [model.tok2vec.encode]
> @architectures = "spacy.MaxoutWindowEncoder.v2"
> width = ${model.tok2vec.embed.width}
> window_size = 1
> maxout_pieces = 3

View File

@@ -61,20 +61,27 @@ markup to copy-paste into
[GitHub issues](https://github.com/explosion/spaCy/issues).

```cli
$ python -m spacy info [--markdown] [--silent] [--exclude]
```

> #### Example
>
> ```cli
> $ python -m spacy info en_core_web_lg --markdown
> ```

```cli
$ python -m spacy info [model] [--markdown] [--silent] [--exclude]
```
| Name                                             | Description |
| ------------------------------------------------ | ----------- |
| `model`                                          | A trained pipeline, i.e. package name or path (optional). ~~Optional[str] \(positional)~~ |
| `--markdown`, `-md`                              | Print information as Markdown. ~~bool (flag)~~ |
| `--silent`, `-s` <Tag variant="new">2.0.12</Tag> | Don't print anything, just return the values. ~~bool (flag)~~ |
| `--exclude`, `-e`                                | Comma-separated keys to exclude from the print-out. Defaults to `"labels"`. ~~Optional[str]~~ |
| `--help`, `-h`                                   | Show help message and available arguments. ~~bool (flag)~~ |
| **PRINTS**                                       | Information about your spaCy installation. |
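
For instance, to trim the print-out further you can pass several keys at once
(a hypothetical invocation; which keys exist beyond the default `"labels"`
depends on your pipeline's meta):

```cli
$ python -m spacy info en_core_web_sm --markdown --exclude "labels,performance"
```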
## validate {#validate new="2" tag="command"}
@@ -121,7 +128,7 @@ customize those settings in your config file later.
> ```

```cli
$ python -m spacy init config [output_file] [--lang] [--pipeline] [--optimize] [--gpu] [--pretraining] [--force]
```
| Name | Description |
@@ -132,6 +139,7 @@ $ python -m spacy init config [output_file] [--lang] [--pipeline] [--optimize] [
| `--optimize`, `-o`     | `"efficiency"` or `"accuracy"`. Whether to optimize for efficiency (faster inference, smaller model, lower memory consumption) or higher accuracy (potentially larger and slower model). This will impact the choice of architecture, pretrained weights and related hyperparameters. Defaults to `"efficiency"`. ~~str (option)~~ |
| `--gpu`, `-G`          | Whether the model can run on GPU. This will impact the choice of architecture, pretrained weights and related hyperparameters. ~~bool (flag)~~ |
| `--pretraining`, `-pt` | Include config for pretraining (with [`spacy pretrain`](/api/cli#pretrain)). Defaults to `False`. ~~bool (flag)~~ |
| `--force`, `-f`        | Force overwriting the output file if it already exists. ~~bool (flag)~~ |
| `--help`, `-h`         | Show help message and available arguments. ~~bool (flag)~~ |
| **CREATES**            | The config file for training. |
@@ -783,6 +791,12 @@ in the section `[paths]`.

</Infobox>

> #### Example
>
> ```cli
> $ python -m spacy train config.cfg --output ./output --paths.train ./train --paths.dev ./dev
> ```

```cli
$ python -m spacy train [config_path] [--output] [--code] [--verbose] [--gpu-id] [overrides]
```
@@ -801,15 +815,16 @@ $ python -m spacy train [config_path] [--output] [--code] [--verbose] [--gpu-id]

## pretrain {#pretrain new="2.1" tag="command,experimental"}

Pretrain the "token to vector" ([`Tok2vec`](/api/tok2vec)) layer of pipeline
components on raw text, using an approximate language-modeling objective.
Specifically, we load pretrained vectors, and train a component like a CNN,
BiLSTM, etc. to predict vectors which match the pretrained ones. The weights are
saved to a directory after each epoch. You can then include a **path to one of
these pretrained weights files** in your
[training config](/usage/training#config) as the `init_tok2vec` setting when you
train your pipeline. This technique may be especially helpful if you have little
labelled data. See the usage docs on
[pretraining](/usage/embeddings-transformers#pretraining) for more info. To read
the raw text, a [`JsonlCorpus`](/api/top-level#jsonlcorpus) is typically used.
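
For example, a training config might point `init_tok2vec` at the saved weights
like this (a sketch; the actual filename depends on which epoch you pick):

```ini
[paths]
init_tok2vec = "output_pretrain/model99.bin"

[initialize]
init_tok2vec = ${paths.init_tok2vec}
```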
<Infobox title="Changed in v3.0" variant="warning">

@@ -823,6 +838,12 @@ auto-generated by setting `--pretraining` on

</Infobox>

> #### Example
>
> ```cli
> $ python -m spacy pretrain config.cfg ./output_pretrain --paths.raw_text ./data.jsonl
> ```

```cli
$ python -m spacy pretrain [config_path] [output_dir] [--code] [--resume-path] [--epoch-resume] [--gpu-id] [overrides]
```

View File

@@ -94,7 +94,7 @@ Defines the `nlp` object, its tokenizer and
>
> [components.textcat.model]
> @architectures = "spacy.TextCatBOW.v1"
> exclusive_classes = true
> ngram_size = 1
> no_output_layer = false
> ```
@@ -148,7 +148,7 @@ This section defines a **dictionary** mapping of string keys to functions. Each
function takes an `nlp` object and yields [`Example`](/api/example) objects. By
default, the two keys `train` and `dev` are specified and each refers to a
[`Corpus`](/api/top-level#Corpus). When pretraining, an additional `pretrain`
section is added that defaults to a [`JsonlCorpus`](/api/top-level#jsonlcorpus).
You can also register custom functions that return a callable.
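
A minimal `[corpora]` block following this scheme might look like the following
(a sketch, assuming the standard `Corpus` reader and paths defined in `[paths]`):

```ini
[corpora]

[corpora.train]
@readers = "spacy.Corpus.v1"
path = ${paths.train}

[corpora.dev]
@readers = "spacy.Corpus.v1"
path = ${paths.dev}
```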
| Name | Description |

View File

@ -0,0 +1,454 @@
---
title: Multi-label TextCategorizer
tag: class
source: spacy/pipeline/textcat_multilabel.py
new: 3
teaser: 'Pipeline component for multi-label text classification'
api_base_class: /api/pipe
api_string_name: textcat_multilabel
api_trainable: true
---
The text categorizer predicts **categories over a whole document**. It
learns non-mutually exclusive labels, which means that zero or more labels
may be true per document.
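
For instance, a blank pipeline can get a multi-label text categorizer added
like this (a minimal sketch; the label names are made up):

```python
import spacy

nlp = spacy.blank("en")
textcat = nlp.add_pipe("textcat_multilabel")
for label in ("POLITICS", "ECONOMY", "SPORTS"):
    textcat.add_label(label)
# After training, doc.cats holds an independent score per label,
# so any number of labels can apply to the same document.
```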
## Config and implementation {#config}
The default config is defined by the pipeline component factory and describes
how the component should be configured. You can override its settings via the
`config` argument on [`nlp.add_pipe`](/api/language#add_pipe) or in your
[`config.cfg` for training](/usage/training#config). See the
[model architectures](/api/architectures) documentation for details on the
architectures and their arguments and hyperparameters.
> #### Example
>
> ```python
> from spacy.pipeline.textcat_multilabel import DEFAULT_MULTI_TEXTCAT_MODEL
> config = {
> "threshold": 0.5,
> "model": DEFAULT_MULTI_TEXTCAT_MODEL,
> }
> nlp.add_pipe("textcat_multilabel", config=config)
> ```
| Setting | Description |
| ----------- | ---------------------------------------------------------------------------------------------------------------------------------------------------------------- |
| `threshold` | Cutoff to consider a prediction "positive", relevant when printing accuracy results. ~~float~~ |
| `model` | A model instance that predicts scores for each category. Defaults to [TextCatEnsemble](/api/architectures#TextCatEnsemble). ~~Model[List[Doc], List[Floats2d]]~~ |
```python
%%GITHUB_SPACY/spacy/pipeline/textcat_multilabel.py
```
## MultiLabel_TextCategorizer.\_\_init\_\_ {#init tag="method"}
> #### Example
>
> ```python
> # Construction via add_pipe with default model
> textcat = nlp.add_pipe("textcat_multilabel")
>
> # Construction via add_pipe with custom model
> config = {"model": {"@architectures": "my_textcat"}}
> parser = nlp.add_pipe("textcat_multilabel", config=config)
>
> # Construction from class
> from spacy.pipeline import MultiLabel_TextCategorizer
> textcat = MultiLabel_TextCategorizer(nlp.vocab, model, threshold=0.5)
> ```
Create a new pipeline instance. In your application, you would normally use a
shortcut for this and instantiate the component using its string name and
[`nlp.add_pipe`](/api/language#add_pipe).
| Name | Description |
| -------------- | -------------------------------------------------------------------------------------------------------------------------- |
| `vocab` | The shared vocabulary. ~~Vocab~~ |
| `model` | The Thinc [`Model`](https://thinc.ai/docs/api-model) powering the pipeline component. ~~Model[List[Doc], List[Floats2d]]~~ |
| `name` | String name of the component instance. Used to add entries to the `losses` during training. ~~str~~ |
| _keyword-only_ | |
| `threshold` | Cutoff to consider a prediction "positive", relevant when printing accuracy results. ~~float~~ |
## MultiLabel_TextCategorizer.\_\_call\_\_ {#call tag="method"}
Apply the pipe to one document. The document is modified in place, and returned.
This usually happens under the hood when the `nlp` object is called on a text
and all pipeline components are applied to the `Doc` in order. Both
[`__call__`](/api/multilabel_textcategorizer#call) and [`pipe`](/api/multilabel_textcategorizer#pipe)
delegate to the [`predict`](/api/multilabel_textcategorizer#predict) and
[`set_annotations`](/api/multilabel_textcategorizer#set_annotations) methods.
> #### Example
>
> ```python
> doc = nlp("This is a sentence.")
> textcat = nlp.add_pipe("textcat_multilabel")
> # This usually happens under the hood
> processed = textcat(doc)
> ```
| Name | Description |
| ----------- | -------------------------------- |
| `doc` | The document to process. ~~Doc~~ |
| **RETURNS** | The processed document. ~~Doc~~ |
## MultiLabel_TextCategorizer.pipe {#pipe tag="method"}
Apply the pipe to a stream of documents. This usually happens under the hood
when the `nlp` object is called on a text and all pipeline components are
applied to the `Doc` in order. Both [`__call__`](/api/multilabel_textcategorizer#call) and
[`pipe`](/api/multilabel_textcategorizer#pipe) delegate to the
[`predict`](/api/multilabel_textcategorizer#predict) and
[`set_annotations`](/api/multilabel_textcategorizer#set_annotations) methods.
> #### Example
>
> ```python
> textcat = nlp.add_pipe("textcat_multilabel")
> for doc in textcat.pipe(docs, batch_size=50):
> pass
> ```
| Name | Description |
| -------------- | ------------------------------------------------------------- |
| `stream` | A stream of documents. ~~Iterable[Doc]~~ |
| _keyword-only_ | |
| `batch_size` | The number of documents to buffer. Defaults to `128`. ~~int~~ |
| **YIELDS** | The processed documents in order. ~~Doc~~ |
## MultiLabel_TextCategorizer.initialize {#initialize tag="method" new="3"}
Initialize the component for training. `get_examples` should be a function that
returns an iterable of [`Example`](/api/example) objects. The data examples are
used to **initialize the model** of the component and can either be the full
training data or a representative sample. Initialization includes validating the
network,
[inferring missing shapes](https://thinc.ai/docs/usage-models#validation) and
setting up the label scheme based on the data. This method is typically called
by [`Language.initialize`](/api/language#initialize) and lets you customize
arguments it receives via the
[`[initialize.components]`](/api/data-formats#config-initialize) block in the
config.
<Infobox variant="warning" title="Changed in v3.0" id="begin_training">
This method was previously called `begin_training`.
</Infobox>
> #### Example
>
> ```python
> textcat = nlp.add_pipe("textcat_multilabel")
> textcat.initialize(lambda: [], nlp=nlp)
> ```
>
> ```ini
> ### config.cfg
> [initialize.components.textcat_multilabel]
>
> [initialize.components.textcat_multilabel.labels]
> @readers = "spacy.read_labels.v1"
> path = "corpus/labels/textcat.json"
> ```
| Name | Description |
| ---------------- | ---------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------- |
| `get_examples` | Function that returns gold-standard annotations in the form of [`Example`](/api/example) objects. ~~Callable[[], Iterable[Example]]~~ |
| _keyword-only_ | |
| `nlp` | The current `nlp` object. Defaults to `None`. ~~Optional[Language]~~ |
| `labels` | The label information to add to the component, as provided by the [`label_data`](#label_data) property after initialization. To generate a reusable JSON file from your data, you should run the [`init labels`](/api/cli#init-labels) command. If no labels are provided, the `get_examples` callback is used to extract the labels from the data, which may be a lot slower. ~~Optional[Iterable[str]]~~ |
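
To generate such a labels file ahead of time, you can run the
[`init labels`](/api/cli#init-labels) command on your config (sketched here
with an assumed output directory matching the example above):

```cli
$ python -m spacy init labels config.cfg corpus/labels
```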
## MultiLabel_TextCategorizer.predict {#predict tag="method"}
Apply the component's model to a batch of [`Doc`](/api/doc) objects without
modifying them.
> #### Example
>
> ```python
> textcat = nlp.add_pipe("textcat_multilabel")
> scores = textcat.predict([doc1, doc2])
> ```
| Name | Description |
| ----------- | ------------------------------------------- |
| `docs` | The documents to predict. ~~Iterable[Doc]~~ |
| **RETURNS** | The model's prediction for each document. |
## MultiLabel_TextCategorizer.set_annotations {#set_annotations tag="method"}
Modify a batch of [`Doc`](/api/doc) objects using pre-computed scores.
> #### Example
>
> ```python
> textcat = nlp.add_pipe("textcat_multilabel")
> scores = textcat.predict(docs)
> textcat.set_annotations(docs, scores)
> ```
| Name | Description |
| -------- | --------------------------------------------------------- |
| `docs` | The documents to modify. ~~Iterable[Doc]~~ |
| `scores` | The scores to set, produced by `MultiLabel_TextCategorizer.predict`. |
## MultiLabel_TextCategorizer.update {#update tag="method"}
Learn from a batch of [`Example`](/api/example) objects containing the
predictions and gold-standard annotations, and update the component's model.
Delegates to [`predict`](/api/multilabel_textcategorizer#predict) and
[`get_loss`](/api/multilabel_textcategorizer#get_loss).
> #### Example
>
> ```python
> textcat = nlp.add_pipe("textcat_multilabel")
> optimizer = nlp.initialize()
> losses = textcat.update(examples, sgd=optimizer)
> ```
| Name | Description |
| ----------------- | ---------------------------------------------------------------------------------------------------------------------------------- |
| `examples` | A batch of [`Example`](/api/example) objects to learn from. ~~Iterable[Example]~~ |
| _keyword-only_ | |
| `drop` | The dropout rate. ~~float~~ |
| `set_annotations` | Whether or not to update the `Example` objects with the predictions, delegating to [`set_annotations`](#set_annotations). ~~bool~~ |
| `sgd` | An optimizer. Will be created via [`create_optimizer`](#create_optimizer) if not set. ~~Optional[Optimizer]~~ |
| `losses` | Optional record of the loss during training. Updated using the component name as the key. ~~Optional[Dict[str, float]]~~ |
| **RETURNS** | The updated `losses` dictionary. ~~Dict[str, float]~~ |
## MultiLabel_TextCategorizer.rehearse {#rehearse tag="method,experimental" new="3"}
Perform a "rehearsal" update from a batch of data. Rehearsal updates teach the
current model to make predictions similar to an initial model to try to address
the "catastrophic forgetting" problem. This feature is experimental.
> #### Example
>
> ```python
> textcat = nlp.add_pipe("textcat_multilabel")
> optimizer = nlp.resume_training()
> losses = textcat.rehearse(examples, sgd=optimizer)
> ```
| Name | Description |
| -------------- | ------------------------------------------------------------------------------------------------------------------------ |
| `examples` | A batch of [`Example`](/api/example) objects to learn from. ~~Iterable[Example]~~ |
| _keyword-only_ | |
| `drop` | The dropout rate. ~~float~~ |
| `sgd` | An optimizer. Will be created via [`create_optimizer`](#create_optimizer) if not set. ~~Optional[Optimizer]~~ |
| `losses` | Optional record of the loss during training. Updated using the component name as the key. ~~Optional[Dict[str, float]]~~ |
| **RETURNS** | The updated `losses` dictionary. ~~Dict[str, float]~~ |
## MultiLabel_TextCategorizer.get_loss {#get_loss tag="method"}
Find the loss and gradient of loss for the batch of documents and their
predicted scores.
> #### Example
>
> ```python
> textcat = nlp.add_pipe("textcat_multilabel")
> scores = textcat.predict([eg.predicted for eg in examples])
> loss, d_loss = textcat.get_loss(examples, scores)
> ```
| Name | Description |
| ----------- | --------------------------------------------------------------------------- |
| `examples` | The batch of examples. ~~Iterable[Example]~~ |
| `scores` | Scores representing the model's predictions. |
| **RETURNS** | The loss and the gradient, i.e. `(loss, gradient)`. ~~Tuple[float, float]~~ |
## MultiLabel_TextCategorizer.score {#score tag="method" new="3"}
Score a batch of examples.
> #### Example
>
> ```python
> scores = textcat.score(examples)
> ```
| Name | Description |
| ---------------- | -------------------------------------------------------------------------------------------------------------------- |
| `examples` | The examples to score. ~~Iterable[Example]~~ |
| _keyword-only_ | |
| **RETURNS** | The scores, produced by [`Scorer.score_cats`](/api/scorer#score_cats). ~~Dict[str, Union[float, Dict[str, float]]]~~ |
## MultiLabel_TextCategorizer.create_optimizer {#create_optimizer tag="method"}
Create an optimizer for the pipeline component.
> #### Example
>
> ```python
> textcat = nlp.add_pipe("textcat")
> optimizer = textcat.create_optimizer()
> ```
| Name | Description |
| ----------- | ---------------------------- |
| **RETURNS** | The optimizer. ~~Optimizer~~ |
## MultiLabel_TextCategorizer.use_params {#use_params tag="method, contextmanager"}
Modify the pipe's model to use the given parameter values.
> #### Example
>
> ```python
> textcat = nlp.add_pipe("textcat")
> with textcat.use_params(optimizer.averages):
> textcat.to_disk("/best_model")
> ```
| Name | Description |
| -------- | -------------------------------------------------- |
| `params` | The parameter values to use in the model. ~~dict~~ |
## MultiLabel_TextCategorizer.add_label {#add_label tag="method"}
Add a new label to the pipe. Raises an error if the output dimension is already
set, or if the model has already been fully [initialized](#initialize). Note
that you don't have to call this method if you provide a **representative data
sample** to the [`initialize`](#initialize) method. In this case, all labels
found in the sample will be automatically added to the model, and the output
dimension will be [inferred](/usage/layers-architectures#thinc-shape-inference)
automatically.
> #### Example
>
> ```python
> textcat = nlp.add_pipe("textcat")
> textcat.add_label("MY_LABEL")
> ```
| Name | Description |
| ----------- | ----------------------------------------------------------- |
| `label` | The label to add. ~~str~~ |
| **RETURNS** | `0` if the label is already present, otherwise `1`. ~~int~~ |
## MultiLabel_TextCategorizer.to_disk {#to_disk tag="method"}
Serialize the pipe to disk.
> #### Example
>
> ```python
> textcat = nlp.add_pipe("textcat")
> textcat.to_disk("/path/to/textcat")
> ```
| Name | Description |
| -------------- | ------------------------------------------------------------------------------------------------------------------------------------------ |
| `path` | A path to a directory, which will be created if it doesn't exist. Paths may be either strings or `Path`-like objects. ~~Union[str, Path]~~ |
| _keyword-only_ | |
| `exclude` | String names of [serialization fields](#serialization-fields) to exclude. ~~Iterable[str]~~ |
## MultiLabel_TextCategorizer.from_disk {#from_disk tag="method"}
Load the pipe from disk. Modifies the object in place and returns it.
> #### Example
>
> ```python
> textcat = nlp.add_pipe("textcat")
> textcat.from_disk("/path/to/textcat")
> ```
| Name | Description |
| -------------- | ----------------------------------------------------------------------------------------------- |
| `path` | A path to a directory. Paths may be either strings or `Path`-like objects. ~~Union[str, Path]~~ |
| _keyword-only_ | |
| `exclude` | String names of [serialization fields](#serialization-fields) to exclude. ~~Iterable[str]~~ |
| **RETURNS** | The modified `MultiLabel_TextCategorizer` object. ~~MultiLabel_TextCategorizer~~ |
## MultiLabel_TextCategorizer.to_bytes {#to_bytes tag="method"}
> #### Example
>
> ```python
> textcat = nlp.add_pipe("textcat")
> textcat_bytes = textcat.to_bytes()
> ```
Serialize the pipe to a bytestring.
| Name | Description |
| -------------- | ------------------------------------------------------------------------------------------- |
| _keyword-only_ | |
| `exclude` | String names of [serialization fields](#serialization-fields) to exclude. ~~Iterable[str]~~ |
| **RETURNS** | The serialized form of the `MultiLabel_TextCategorizer` object. ~~bytes~~ |
## MultiLabel_TextCategorizer.from_bytes {#from_bytes tag="method"}
Load the pipe from a bytestring. Modifies the object in place and returns it.
> #### Example
>
> ```python
> textcat_bytes = textcat.to_bytes()
> textcat = nlp.add_pipe("textcat")
> textcat.from_bytes(textcat_bytes)
> ```
| Name | Description |
| -------------- | ------------------------------------------------------------------------------------------- |
| `bytes_data` | The data to load from. ~~bytes~~ |
| _keyword-only_ | |
| `exclude` | String names of [serialization fields](#serialization-fields) to exclude. ~~Iterable[str]~~ |
| **RETURNS** | The `MultiLabel_TextCategorizer` object. ~~MultiLabel_TextCategorizer~~ |
## MultiLabel_TextCategorizer.labels {#labels tag="property"}
The labels currently added to the component.
> #### Example
>
> ```python
> textcat.add_label("MY_LABEL")
> assert "MY_LABEL" in textcat.labels
> ```
| Name | Description |
| ----------- | ------------------------------------------------------ |
| **RETURNS** | The labels added to the component. ~~Tuple[str, ...]~~ |
## MultiLabel_TextCategorizer.label_data {#label_data tag="property" new="3"}
The labels currently added to the component and their internal meta information.
This is the data generated by [`init labels`](/api/cli#init-labels) and used by
[`MultiLabel_TextCategorizer.initialize`](/api/multilabel_textcategorizer#initialize) to initialize
the model with a pre-defined label set.
> #### Example
>
> ```python
> labels = textcat.label_data
> textcat.initialize(lambda: [], nlp=nlp, labels=labels)
> ```
| Name | Description |
| ----------- | ---------------------------------------------------------- |
| **RETURNS** | The label data added to the component. ~~Tuple[str, ...]~~ |
## Serialization fields {#serialization-fields}
During serialization, spaCy will export several data fields used to restore
different aspects of the object. If needed, you can exclude them from
serialization by passing in the string names via the `exclude` argument.
> #### Example
>
> ```python
> data = textcat.to_disk("/path", exclude=["vocab"])
> ```
| Name | Description |
| ------- | -------------------------------------------------------------- |
| `vocab` | The shared [`Vocab`](/api/vocab). |
| `cfg` | The config file. You usually don't want to exclude this. |
| `model` | The binary model data. You usually don't want to exclude this. |

View File

@@ -3,17 +3,15 @@ title: TextCategorizer
tag: class
source: spacy/pipeline/textcat.py
new: 2
teaser: 'Pipeline component for single-label text classification'
api_base_class: /api/pipe
api_string_name: textcat
api_trainable: true
---

The text categorizer predicts **categories over a whole document**. It can learn
one or more labels, and the labels are mutually exclusive - there is exactly one
true label per document.
## Config and implementation {#config}

@@ -27,10 +25,10 @@ architectures and their arguments and hyperparameters.

> #### Example
>
> ```python
> from spacy.pipeline.textcat import DEFAULT_SINGLE_TEXTCAT_MODEL
> config = {
>     "threshold": 0.5,
>     "model": DEFAULT_SINGLE_TEXTCAT_MODEL,
> }
> nlp.add_pipe("textcat", config=config)
> ```
@@ -280,7 +278,6 @@ Score a batch of examples.
| ---------------- | ----------- |
| `examples`       | The examples to score. ~~Iterable[Example]~~ |
| _keyword-only_   | |
| **RETURNS**      | The scores, produced by [`Scorer.score_cats`](/api/scorer#score_cats). ~~Dict[str, Union[float, Dict[str, float]]]~~ |

## TextCategorizer.create_optimizer {#create_optimizer tag="method"}

View File

@@ -129,13 +129,13 @@ the entity recognizer, use a
factory = "tok2vec"

[components.tok2vec.model]
@architectures = "spacy.Tok2Vec.v2"

[components.tok2vec.model.embed]
@architectures = "spacy.MultiHashEmbed.v1"

[components.tok2vec.model.encode]
@architectures = "spacy.MaxoutWindowEncoder.v2"

[components.ner]
factory = "ner"
@@ -161,13 +161,13 @@ factory = "ner"
@architectures = "spacy.TransitionBasedParser.v1"

[components.ner.model.tok2vec]
@architectures = "spacy.Tok2Vec.v2"

[components.ner.model.tok2vec.embed]
@architectures = "spacy.MultiHashEmbed.v1"

[components.ner.model.tok2vec.encode]
@architectures = "spacy.MaxoutWindowEncoder.v2"
```

<!-- TODO: Once rehearsal is tested, mention it here. -->
@@ -713,34 +713,39 @@ layer = "tok2vec"

#### Pretraining objectives {#pretraining-details}

> ```ini
> ### Characters objective
> [pretraining.objective]
> @architectures = "spacy.PretrainCharacters.v1"
> maxout_pieces = 3
> hidden_size = 300
> n_characters = 4
> ```
>
> ```ini
> ### Vectors objective
> [pretraining.objective]
> @architectures = "spacy.PretrainVectors.v1"
> maxout_pieces = 3
> hidden_size = 300
> loss = "cosine"
> ```

Two pretraining objectives are available, both of which are variants of the
cloze task [Devlin et al. (2018)](https://arxiv.org/abs/1810.04805) introduced
for BERT. The objective can be defined and configured via the
`[pretraining.objective]` config block.

- [`PretrainCharacters`](/api/architectures#pretrain_chars): The `"characters"`
  objective asks the model to predict some number of leading and trailing UTF-8
  bytes for the words. For instance, with `n_characters = 2`, the model will
  try to predict the first two and last two characters of the word.
- [`PretrainVectors`](/api/architectures#pretrain_vectors): The `"vectors"`
  objective asks the model to predict the word's vector, from a static
  embeddings table. This requires a word vectors model to be trained and loaded.
  The vectors objective can optimize either a cosine or an L2 loss. We've
  generally found cosine loss to perform better.

These pretraining objectives use a trick that we term **language modelling with
approximate outputs (LMAO)**. The motivation for the trick is that predicting an

View File

@@ -134,7 +134,7 @@ labels = []
nO = null

[components.textcat.model.tok2vec]
@architectures = "spacy.Tok2Vec.v2"

[components.textcat.model.tok2vec.embed]
@architectures = "spacy.MultiHashEmbed.v1"
@@ -144,7 +144,7 @@ attrs = ["ORTH", "LOWER", "PREFIX", "SUFFIX", "SHAPE", "ID"]
include_static_vectors = false

[components.textcat.model.tok2vec.encode]
@architectures = "spacy.MaxoutWindowEncoder.v2"
width = ${components.textcat.model.tok2vec.embed.width}
window_size = 1
maxout_pieces = 3
@@ -152,7 +152,7 @@ depth = 2

[components.textcat.model.linear_model]
@architectures = "spacy.TextCatBOW.v1"
exclusive_classes = true
ngram_size = 1
no_output_layer = false
```
@@ -170,7 +170,7 @@ labels = []

[components.textcat.model]
@architectures = "spacy.TextCatBOW.v1"
exclusive_classes = true
ngram_size = 1
no_output_layer = false
nO = null
@@ -201,14 +201,14 @@ tokens, and their combination forms a typical
factory = "tok2vec"

[components.tok2vec.model]
@architectures = "spacy.Tok2Vec.v2"

[components.tok2vec.model.embed]
@architectures = "spacy.MultiHashEmbed.v1"
# ...

[components.tok2vec.model.encode]
@architectures = "spacy.MaxoutWindowEncoder.v2"
# ...
```
@@ -224,7 +224,7 @@ architecture:
# ...

[components.tok2vec.model.encode]
@architectures = "spacy.MaxoutWindowEncoder.v2"
# ...
```
@@ -716,7 +716,7 @@ that we want to classify as being related or not. As these candidate pairs are
typically formed within one document, this function takes a [`Doc`](/api/doc) as
input and outputs a `List` of `Span` tuples. For instance, the following
implementation takes any two entities from the same document, as long as they
are within a **maximum distance** (in number of tokens) of each other:

> #### config.cfg (excerpt)
>
@@ -742,7 +742,7 @@ def create_instances(max_length: int) -> Callable[[Doc], List[Tuple[Span, Span]]
    return get_candidates
```
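
Only a fragment of the function appears in the hunk above; a full candidate
generator along these lines might look as follows (a sketch; the registry
string is a hypothetical name of your choosing):

```python
from typing import Callable, List, Tuple

import spacy
from spacy.tokens import Doc, Span

@spacy.registry.misc("instance_generator.v1")  # hypothetical registry name
def create_instances(max_length: int) -> Callable[[Doc], List[Tuple[Span, Span]]]:
    def get_candidates(doc: Doc) -> List[Tuple[Span, Span]]:
        candidates = []
        # Pair up any two distinct entities that are close enough together
        for ent1 in doc.ents:
            for ent2 in doc.ents:
                if ent1 != ent2 and abs(ent2.start - ent1.start) <= max_length:
                    candidates.append((ent1, ent2))
        return candidates

    return get_candidates
```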
This function is added to the [`@misc` registry](/api/top-level#registry) so we
can refer to it from the config, and easily swap it out for any other candidate
generation function.

View File

@@ -1060,7 +1060,7 @@ In this example we assume a custom function `read_custom_data` which loads or
generates texts with relevant text classification annotations. Then, small
lexical variations of the input text are created before generating the final
[`Example`](/api/example) objects. The `@spacy.registry.readers` decorator lets
you register the function creating the custom reader in the `readers`
[registry](/api/top-level#registry) and assign it a string name, so it can be
used in your config. All arguments on the registered function become available
as **config settings** - in this case, `source`.
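
Put together, a registered custom reader along these lines might look like the
following (a sketch; `read_custom_data`, its output format and the registry
string are assumptions following the description above):

```python
from typing import Callable, Iterable

import spacy
from spacy.language import Language
from spacy.training import Example

def read_custom_data(source: str):
    # Stand-in for the loading logic described above (assumption)
    yield "This is a text about config files.", {"POSITIVE": 1.0, "NEGATIVE": 0.0}

@spacy.registry.readers("corpus_variants.v1")  # assumed registry name
def stream_data(source: str) -> Callable[[Language], Iterable[Example]]:
    def generate_stream(nlp: Language) -> Iterable[Example]:
        for text, cats in read_custom_data(source):
            # Create a small lexical variation of the input text
            variant = text.replace("config", "configuration")
            doc = nlp.make_doc(variant)
            yield Example.from_dict(doc, {"cats": cats})

    return generate_stream
```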