Merge branch 'master' into spacy.io
This commit is contained in: commit 29ac7f776a
.github/azure-steps.yml (vendored, new file, 57 lines)
@@ -0,0 +1,57 @@
parameters:
  python_version: ''
  architecture: ''
  prefix: ''
  gpu: false
  num_build_jobs: 1

steps:
  - task: UsePythonVersion@0
    inputs:
      versionSpec: ${{ parameters.python_version }}
      architecture: ${{ parameters.architecture }}

  - script: |
      ${{ parameters.prefix }} python -m pip install -U pip setuptools
      ${{ parameters.prefix }} python -m pip install -U -r requirements.txt
    displayName: "Install dependencies"

  - script: |
      ${{ parameters.prefix }} python setup.py build_ext --inplace -j ${{ parameters.num_build_jobs }}
      ${{ parameters.prefix }} python setup.py sdist --formats=gztar
    displayName: "Compile and build sdist"

  - task: DeleteFiles@1
    inputs:
      contents: "spacy"
    displayName: "Delete source directory"

  - script: |
      ${{ parameters.prefix }} python -m pip freeze --exclude torch --exclude cupy-cuda110 > installed.txt
      ${{ parameters.prefix }} python -m pip uninstall -y -r installed.txt
    displayName: "Uninstall all packages"

  - bash: |
      ${{ parameters.prefix }} SDIST=$(python -c "import os;print(os.listdir('./dist')[-1])" 2>&1)
      ${{ parameters.prefix }} python -m pip install dist/$SDIST
    displayName: "Install from sdist"

  - script: |
      ${{ parameters.prefix }} python -m pip install -U -r requirements.txt
    displayName: "Install test requirements"

  - script: |
      ${{ parameters.prefix }} python -m pip install -U cupy-cuda110
      ${{ parameters.prefix }} python -m pip install "torch==1.7.1+cu110" -f https://download.pytorch.org/whl/torch_stable.html
    displayName: "Install GPU requirements"
    condition: eq(${{ parameters.gpu }}, true)

  - script: |
      ${{ parameters.prefix }} python -m pytest --pyargs spacy
    displayName: "Run CPU tests"
    condition: eq(${{ parameters.gpu }}, false)

  - script: |
      ${{ parameters.prefix }} python -m pytest --pyargs spacy -p spacy.tests.enable_gpu
    displayName: "Run GPU tests"
    condition: eq(${{ parameters.gpu }}, true)
.github/contributors/AyushExel.md (vendored, new file, 106 lines)
@@ -0,0 +1,106 @@
# spaCy contributor agreement

This spaCy Contributor Agreement (**"SCA"**) is based on the [Oracle Contributor Agreement](http://www.oracle.com/technetwork/oca-405177.pdf). The SCA applies to any contribution that you make to any product or project managed by us (the **"project"**), and sets out the intellectual property rights you grant to us in the contributed materials. The term **"us"** shall mean [ExplosionAI GmbH](https://explosion.ai/legal). The term **"you"** shall mean the person or entity identified below.

If you agree to be bound by these terms, fill in the information requested below and include the filled-in version with your first pull request, under the folder [`.github/contributors/`](/.github/contributors/). The name of the file should be your GitHub username, with the extension `.md`. For example, the user example_user would create the file `.github/contributors/example_user.md`.

Read this agreement carefully before signing. These terms and conditions constitute a binding legal agreement.

## Contributor Agreement

1. The term "contribution" or "contributed materials" means any source code, object code, patch, tool, sample, graphic, specification, manual, documentation, or any other material posted or submitted by you to the project.

2. With respect to any worldwide copyrights, or copyright applications and registrations, in your contribution:

   * you hereby assign to us joint ownership, and to the extent that such assignment is or becomes invalid, ineffective or unenforceable, you hereby grant to us a perpetual, irrevocable, non-exclusive, worldwide, no-charge, royalty-free, unrestricted license to exercise all rights under those copyrights. This includes, at our option, the right to sublicense these same rights to third parties through multiple levels of sublicensees or other licensing arrangements;

   * you agree that each of us can do all things in relation to your contribution as if each of us were the sole owners, and if one of us makes a derivative work of your contribution, the one who makes the derivative work (or has it made) will be the sole owner of that derivative work;

   * you agree that you will not assert any moral rights in your contribution against us, our licensees or transferees;

   * you agree that we may register a copyright in your contribution and exercise all ownership rights associated with it; and

   * you agree that neither of us has any duty to consult with, obtain the consent of, pay or render an accounting to the other for any use or distribution of your contribution.

3. With respect to any patents you own, or that you can license without payment to any third party, you hereby grant to us a perpetual, irrevocable, non-exclusive, worldwide, no-charge, royalty-free license to:

   * make, have made, use, sell, offer to sell, import, and otherwise transfer your contribution in whole or in part, alone or in combination with or included in any product, work or materials arising out of the project to which your contribution was submitted, and

   * at our option, to sublicense these same rights to third parties through multiple levels of sublicensees or other licensing arrangements.

4. Except as set out above, you keep all right, title, and interest in your contribution. The rights that you grant to us under these terms are effective on the date you first submitted a contribution to us, even if your submission took place before the date you sign these terms.

5. You covenant, represent, warrant and agree that:

   * each contribution that you submit is and shall be an original work of authorship and you can legally grant the rights set out in this SCA;

   * to the best of your knowledge, each contribution will not violate any third party's copyrights, trademarks, patents, or other intellectual property rights; and

   * each contribution shall be in compliance with U.S. export control laws and other applicable export and import laws. You agree to notify us if you become aware of any circumstance which would make any of the foregoing representations inaccurate in any respect. We may publicly disclose your participation in the project, including the fact that you have signed the SCA.

6. This SCA is governed by the laws of the State of California and applicable U.S. Federal law. Any choice of law rules will not apply.

7. Please place an “x” next to one of the applicable statements below. Please do NOT mark both statements:

   * [X] I am signing on behalf of myself as an individual and no other person or entity, including my employer, has or will have rights with respect to my contributions.

   * [ ] I am signing on behalf of my employer or a legal entity and I have the actual authority to contractually bind that entity.

## Contributor Details

| Field                          | Entry           |
|--------------------------------|-----------------|
| Name                           | Ayush Chaurasia |
| Company name (if applicable)   |                 |
| Title or role (if applicable)  |                 |
| Date                           | 2021-03-12      |
| GitHub username                | AyushExel       |
| Website (optional)             |                 |
.github/contributors/broaddeep.md (vendored, new file, 106 lines)
@@ -0,0 +1,106 @@
# spaCy contributor agreement

This spaCy Contributor Agreement (**"SCA"**) is based on the [Oracle Contributor Agreement](http://www.oracle.com/technetwork/oca-405177.pdf). The SCA applies to any contribution that you make to any product or project managed by us (the **"project"**), and sets out the intellectual property rights you grant to us in the contributed materials. The term **"us"** shall mean [ExplosionAI GmbH](https://explosion.ai/legal). The term **"you"** shall mean the person or entity identified below.

If you agree to be bound by these terms, fill in the information requested below and include the filled-in version with your first pull request, under the folder [`.github/contributors/`](/.github/contributors/). The name of the file should be your GitHub username, with the extension `.md`. For example, the user example_user would create the file `.github/contributors/example_user.md`.

Read this agreement carefully before signing. These terms and conditions constitute a binding legal agreement.

## Contributor Agreement

1. The term "contribution" or "contributed materials" means any source code, object code, patch, tool, sample, graphic, specification, manual, documentation, or any other material posted or submitted by you to the project.

2. With respect to any worldwide copyrights, or copyright applications and registrations, in your contribution:

   * you hereby assign to us joint ownership, and to the extent that such assignment is or becomes invalid, ineffective or unenforceable, you hereby grant to us a perpetual, irrevocable, non-exclusive, worldwide, no-charge, royalty-free, unrestricted license to exercise all rights under those copyrights. This includes, at our option, the right to sublicense these same rights to third parties through multiple levels of sublicensees or other licensing arrangements;

   * you agree that each of us can do all things in relation to your contribution as if each of us were the sole owners, and if one of us makes a derivative work of your contribution, the one who makes the derivative work (or has it made) will be the sole owner of that derivative work;

   * you agree that you will not assert any moral rights in your contribution against us, our licensees or transferees;

   * you agree that we may register a copyright in your contribution and exercise all ownership rights associated with it; and

   * you agree that neither of us has any duty to consult with, obtain the consent of, pay or render an accounting to the other for any use or distribution of your contribution.

3. With respect to any patents you own, or that you can license without payment to any third party, you hereby grant to us a perpetual, irrevocable, non-exclusive, worldwide, no-charge, royalty-free license to:

   * make, have made, use, sell, offer to sell, import, and otherwise transfer your contribution in whole or in part, alone or in combination with or included in any product, work or materials arising out of the project to which your contribution was submitted, and

   * at our option, to sublicense these same rights to third parties through multiple levels of sublicensees or other licensing arrangements.

4. Except as set out above, you keep all right, title, and interest in your contribution. The rights that you grant to us under these terms are effective on the date you first submitted a contribution to us, even if your submission took place before the date you sign these terms.

5. You covenant, represent, warrant and agree that:

   * each contribution that you submit is and shall be an original work of authorship and you can legally grant the rights set out in this SCA;

   * to the best of your knowledge, each contribution will not violate any third party's copyrights, trademarks, patents, or other intellectual property rights; and

   * each contribution shall be in compliance with U.S. export control laws and other applicable export and import laws. You agree to notify us if you become aware of any circumstance which would make any of the foregoing representations inaccurate in any respect. We may publicly disclose your participation in the project, including the fact that you have signed the SCA.

6. This SCA is governed by the laws of the State of California and applicable U.S. Federal law. Any choice of law rules will not apply.

7. Please place an “x” next to one of the applicable statements below. Please do NOT mark both statements:

   * [x] I am signing on behalf of myself as an individual and no other person or entity, including my employer, has or will have rights with respect to my contributions.

   * [ ] I am signing on behalf of my employer or a legal entity and I have the actual authority to contractually bind that entity.

## Contributor Details

| Field                          | Entry        |
|--------------------------------|--------------|
| Name                           | Dongjun Park |
| Company name (if applicable)   |              |
| Title or role (if applicable)  |              |
| Date                           | 2021-03-06   |
| GitHub username                | broaddeep    |
| Website (optional)             |              |
@@ -76,39 +76,24 @@ jobs:
      maxParallel: 4
    pool:
      vmImage: $(imageName)

    steps:
      - task: UsePythonVersion@0
        inputs:
          versionSpec: "$(python.version)"
          architecture: "x64"
      - template: .github/azure-steps.yml
        parameters:
          python_version: '$(python.version)'
          architecture: 'x64'

      - script: |
          python -m pip install -U setuptools
          pip install -r requirements.txt
        displayName: "Install dependencies"

      - script: |
          python setup.py build_ext --inplace
          python setup.py sdist --formats=gztar
        displayName: "Compile and build sdist"

      - task: DeleteFiles@1
        inputs:
          contents: "spacy"
        displayName: "Delete source directory"

      - script: |
          pip freeze > installed.txt
          pip uninstall -y -r installed.txt
        displayName: "Uninstall all packages"

      - bash: |
          SDIST=$(python -c "import os;print(os.listdir('./dist')[-1])" 2>&1)
          pip install dist/$SDIST
        displayName: "Install from sdist"

      - script: |
          pip install -r requirements.txt
          python -m pytest --pyargs spacy
        displayName: "Run tests"

  - job: "TestGPU"
    dependsOn: "Validate"
    strategy:
      matrix:
        Python38LinuxX64_GPU:
          python.version: '3.8'
    pool:
      name: "LinuxX64_GPU"
    steps:
      - template: .github/azure-steps.yml
        parameters:
          python_version: '$(python.version)'
          architecture: 'x64'
          gpu: true
          num_build_jobs: 24
@@ -5,7 +5,7 @@ requires = [
    "cymem>=2.0.2,<2.1.0",
    "preshed>=3.0.2,<3.1.0",
    "murmurhash>=0.28.0,<1.1.0",
    "thinc>=8.0.2,<8.1.0",
    "thinc>=8.0.3,<8.1.0",
    "blis>=0.4.0,<0.8.0",
    "pathy",
    "numpy>=1.15.0",
@@ -1,14 +1,14 @@
# Our libraries
spacy-legacy>=3.0.0,<3.1.0
spacy-legacy>=3.0.4,<3.1.0
cymem>=2.0.2,<2.1.0
preshed>=3.0.2,<3.1.0
thinc>=8.0.2,<8.1.0
thinc>=8.0.3,<8.1.0
blis>=0.4.0,<0.8.0
ml_datasets>=0.2.0,<0.3.0
murmurhash>=0.28.0,<1.1.0
wasabi>=0.8.1,<1.1.0
srsly>=2.4.0,<3.0.0
catalogue>=2.0.1,<2.1.0
srsly>=2.4.1,<3.0.0
catalogue>=2.0.3,<2.1.0
typer>=0.3.0,<0.4.0
pathy>=0.3.5
# Third party dependencies

@@ -20,7 +20,6 @@ jinja2
# Official Python utilities
setuptools
packaging>=20.0
importlib_metadata>=0.20; python_version < "3.8"
typing_extensions>=3.7.4.1,<4.0.0.0; python_version < "3.8"
# Development dependencies
cython>=0.25
setup.cfg (13 changed lines)
@@ -34,18 +34,18 @@ setup_requires =
    cymem>=2.0.2,<2.1.0
    preshed>=3.0.2,<3.1.0
    murmurhash>=0.28.0,<1.1.0
    thinc>=8.0.2,<8.1.0
    thinc>=8.0.3,<8.1.0
install_requires =
    # Our libraries
    spacy-legacy>=3.0.0,<3.1.0
    spacy-legacy>=3.0.4,<3.1.0
    murmurhash>=0.28.0,<1.1.0
    cymem>=2.0.2,<2.1.0
    preshed>=3.0.2,<3.1.0
    thinc>=8.0.2,<8.1.0
    thinc>=8.0.3,<8.1.0
    blis>=0.4.0,<0.8.0
    wasabi>=0.8.1,<1.1.0
    srsly>=2.4.0,<3.0.0
    catalogue>=2.0.1,<2.1.0
    srsly>=2.4.1,<3.0.0
    catalogue>=2.0.3,<2.1.0
    typer>=0.3.0,<0.4.0
    pathy>=0.3.5
    # Third-party dependencies

@@ -57,7 +57,6 @@ install_requires =
    # Official Python utilities
    setuptools
    packaging>=20.0
    importlib_metadata>=0.20; python_version < "3.8"
    typing_extensions>=3.7.4,<4.0.0.0; python_version < "3.8"

[options.entry_points]

@@ -91,6 +90,8 @@ cuda110 =
    cupy-cuda110>=5.0.0b4,<9.0.0
cuda111 =
    cupy-cuda111>=5.0.0b4,<9.0.0
cuda112 =
    cupy-cuda112>=5.0.0b4,<9.0.0
# Language tokenizers with external dependencies
ja =
    sudachipy>=0.4.9
@@ -1,6 +1,6 @@
# fmt: off
__title__ = "spacy"
__version__ = "3.0.5"
__version__ = "3.0.6"
__download_url__ = "https://github.com/explosion/spacy-models/releases/download"
__compatibility__ = "https://raw.githubusercontent.com/explosion/spacy-models/master/compatibility.json"
__projects__ = "https://github.com/explosion/projects"
|
@ -9,6 +9,7 @@ from .info import info # noqa: F401
|
|||
from .package import package # noqa: F401
|
||||
from .profile import profile # noqa: F401
|
||||
from .train import train_cli # noqa: F401
|
||||
from .assemble import assemble_cli # noqa: F401
|
||||
from .pretrain import pretrain # noqa: F401
|
||||
from .debug_data import debug_data # noqa: F401
|
||||
from .debug_config import debug_config # noqa: F401
|
||||
|
@ -29,9 +30,9 @@ from .project.document import project_document # noqa: F401
|
|||
|
||||
@app.command("link", no_args_is_help=True, deprecated=True, hidden=True)
|
||||
def link(*args, **kwargs):
|
||||
"""As of spaCy v3.0, symlinks like "en" are deprecated. You can load trained
|
||||
"""As of spaCy v3.0, symlinks like "en" are not supported anymore. You can load trained
|
||||
pipeline packages using their full names or from a directory path."""
|
||||
msg.warn(
|
||||
"As of spaCy v3.0, model symlinks are deprecated. You can load trained "
|
||||
"As of spaCy v3.0, model symlinks are not supported anymore. You can load trained "
|
||||
"pipeline packages using their full names or from a directory path."
|
||||
)
|
||||
|
|
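The reworded deprecation message above just points users at loading pipelines by their full package name or from a path. A minimal sketch of what that looks like in user code (the package name and directory are only examples):

```python
import spacy

# Load an installed pipeline package by its full name (example package name)...
nlp = spacy.load("en_core_web_sm")
# ...or load a pipeline that was previously saved to a local directory path.
nlp = spacy.load("./my_pipeline")
```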
spacy/cli/assemble.py (new file, 58 lines)
@@ -0,0 +1,58 @@
from typing import Optional
from pathlib import Path
from wasabi import msg
import typer
import logging

from ._util import app, Arg, Opt, parse_config_overrides, show_validation_error
from ._util import import_code
from ..training.initialize import init_nlp
from .. import util
from ..util import get_sourced_components, load_model_from_config


@app.command(
    "assemble",
    context_settings={"allow_extra_args": True, "ignore_unknown_options": True},
)
def assemble_cli(
    # fmt: off
    ctx: typer.Context,  # This is only used to read additional arguments
    config_path: Path = Arg(..., help="Path to config file", exists=True, allow_dash=True),
    output_path: Path = Arg(..., help="Output directory to store assembled pipeline in"),
    code_path: Optional[Path] = Opt(None, "--code", "-c", help="Path to Python file with additional code (registered functions) to be imported"),
    verbose: bool = Opt(False, "--verbose", "-V", "-VV", help="Display more information for debugging purposes"),
    # fmt: on
):
    """
    Assemble a spaCy pipeline from a config file. The config file includes
    all settings for initializing the pipeline. To override settings in the
    config, e.g. settings that point to local paths or that you want to
    experiment with, you can override them as command line options. The
    --code argument lets you pass in a Python file that can be used to
    register custom functions that are referenced in the config.

    DOCS: https://spacy.io/api/cli#assemble
    """
    util.logger.setLevel(logging.DEBUG if verbose else logging.INFO)
    # Make sure all files and paths exist if they are needed
    if not config_path or (str(config_path) != "-" and not config_path.exists()):
        msg.fail("Config file not found", config_path, exits=1)
    overrides = parse_config_overrides(ctx.args)
    import_code(code_path)
    with show_validation_error(config_path):
        config = util.load_config(config_path, overrides=overrides, interpolate=False)
    msg.divider("Initializing pipeline")
    nlp = load_model_from_config(config, auto_fill=True)
    config = config.interpolate()
    sourced = get_sourced_components(config)
    # Make sure that listeners are defined before initializing further
    nlp._link_components()
    with nlp.select_pipes(disable=[*sourced]):
        nlp.initialize()
    msg.good("Initialized pipeline")
    msg.divider("Serializing to disk")
    if output_path is not None and not output_path.exists():
        output_path.mkdir(parents=True)
        msg.good(f"Created output directory: {output_path}")
    nlp.to_disk(output_path)
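For context, a rough sketch of how the new command is driven and of the core steps it performs; the file paths are placeholders and the sketch omits the sourced-component handling shown above:

```python
# CLI form (paths are placeholders): python -m spacy assemble config.cfg ./output --code functions.py
# Rough programmatic equivalent of the main steps in assemble_cli:
from spacy import util

config = util.load_config("config.cfg", interpolate=False)  # placeholder path
nlp = util.load_model_from_config(config, auto_fill=True)   # build the pipeline from the config
nlp.initialize()                                            # initialize all components
nlp.to_disk("./output")                                     # serialize the assembled pipeline
```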
@@ -1,4 +1,4 @@
from typing import List, Sequence, Dict, Any, Tuple, Optional
from typing import List, Sequence, Dict, Any, Tuple, Optional, Set
from pathlib import Path
from collections import Counter
import sys

@@ -13,6 +13,8 @@ from ..training.initialize import get_sourced_components
from ..schemas import ConfigSchemaTraining
from ..pipeline._parser_internals import nonproj
from ..pipeline._parser_internals.nonproj import DELIMITER
from ..pipeline import Morphologizer
from ..morphology import Morphology
from ..language import Language
from ..util import registry, resolve_dot_names
from .. import util

@@ -194,32 +196,32 @@ def debug_data(
        )
        label_counts = gold_train_data["ner"]
        model_labels = _get_labels_from_model(nlp, "ner")
        new_labels = [l for l in labels if l not in model_labels]
        existing_labels = [l for l in labels if l in model_labels]
        has_low_data_warning = False
        has_no_neg_warning = False
        has_ws_ents_error = False
        has_punct_ents_warning = False

        msg.divider("Named Entity Recognition")
        msg.info(
            f"{len(new_labels)} new label(s), {len(existing_labels)} existing label(s)"
        )
        msg.info(f"{len(model_labels)} label(s)")
        missing_values = label_counts["-"]
        msg.text(f"{missing_values} missing value(s) (tokens with '-' label)")
        for label in new_labels:
        for label in labels:
            if len(label) == 0:
                msg.fail("Empty label found in new labels")
        if new_labels:
            labels_with_counts = [
                (label, count)
                for label, count in label_counts.most_common()
                if label != "-"
            ]
            labels_with_counts = _format_labels(labels_with_counts, counts=True)
            msg.text(f"New: {labels_with_counts}", show=verbose)
        if existing_labels:
            msg.text(f"Existing: {_format_labels(existing_labels)}", show=verbose)
                msg.fail("Empty label found in train data")
        labels_with_counts = [
            (label, count)
            for label, count in label_counts.most_common()
            if label != "-"
        ]
        labels_with_counts = _format_labels(labels_with_counts, counts=True)
        msg.text(f"Labels in train data: {_format_labels(labels)}", show=verbose)
        missing_labels = model_labels - labels
        if missing_labels:
            msg.warn(
                "Some model labels are not present in the train data. The "
                "model performance may be degraded for these labels after "
                f"training: {_format_labels(missing_labels)}."
            )
        if gold_train_data["ws_ents"]:
            msg.fail(f"{gold_train_data['ws_ents']} invalid whitespace entity spans")
            has_ws_ents_error = True

@@ -228,10 +230,10 @@ def debug_data(
            msg.warn(f"{gold_train_data['punct_ents']} entity span(s) with punctuation")
            has_punct_ents_warning = True

        for label in new_labels:
        for label in labels:
            if label_counts[label] <= NEW_LABEL_THRESHOLD:
                msg.warn(
                    f"Low number of examples for new label '{label}' ({label_counts[label]})"
                    f"Low number of examples for label '{label}' ({label_counts[label]})"
                )
                has_low_data_warning = True

@@ -276,22 +278,52 @@ def debug_data(
        )

    if "textcat" in factory_names:
        msg.divider("Text Classification")
        labels = [label for label in gold_train_data["cats"]]
        model_labels = _get_labels_from_model(nlp, "textcat")
        new_labels = [l for l in labels if l not in model_labels]
        existing_labels = [l for l in labels if l in model_labels]
        msg.info(
            f"Text Classification: {len(new_labels)} new label(s), "
            f"{len(existing_labels)} existing label(s)"
        msg.divider("Text Classification (Exclusive Classes)")
        labels = _get_labels_from_model(nlp, "textcat")
        msg.info(f"Text Classification: {len(labels)} label(s)")
        msg.text(f"Labels: {_format_labels(labels)}", show=verbose)
        labels_with_counts = _format_labels(
            gold_train_data["cats"].most_common(), counts=True
        )
        if new_labels:
            labels_with_counts = _format_labels(
                gold_train_data["cats"].most_common(), counts=True
        msg.text(f"Labels in train data: {labels_with_counts}", show=verbose)
        missing_labels = labels - set(gold_train_data["cats"].keys())
        if missing_labels:
            msg.warn(
                "Some model labels are not present in the train data. The "
                "model performance may be degraded for these labels after "
                f"training: {_format_labels(missing_labels)}."
            )
        if gold_train_data["n_cats_multilabel"] > 0:
            # Note: you should never get here because you run into E895 on
            # initialization first.
            msg.warn(
                "The train data contains instances without "
                "mutually-exclusive classes. Use the component "
                "'textcat_multilabel' instead of 'textcat'."
            )
        if gold_dev_data["n_cats_multilabel"] > 0:
            msg.fail(
                "Train/dev mismatch: the dev data contains instances "
                "without mutually-exclusive classes while the train data "
                "contains only instances with mutually-exclusive classes."
            )

    if "textcat_multilabel" in factory_names:
        msg.divider("Text Classification (Multilabel)")
        labels = _get_labels_from_model(nlp, "textcat_multilabel")
        msg.info(f"Text Classification: {len(labels)} label(s)")
        msg.text(f"Labels: {_format_labels(labels)}", show=verbose)
        labels_with_counts = _format_labels(
            gold_train_data["cats"].most_common(), counts=True
        )
        msg.text(f"Labels in train data: {labels_with_counts}", show=verbose)
        missing_labels = labels - set(gold_train_data["cats"].keys())
        if missing_labels:
            msg.warn(
                "Some model labels are not present in the train data. The "
                "model performance may be degraded for these labels after "
                f"training: {_format_labels(missing_labels)}."
            )
            msg.text(f"New: {labels_with_counts}", show=verbose)
        if existing_labels:
            msg.text(f"Existing: {_format_labels(existing_labels)}", show=verbose)
        if set(gold_train_data["cats"]) != set(gold_dev_data["cats"]):
            msg.fail(
                f"The train and dev labels are not the same. "

@@ -299,11 +331,6 @@ def debug_data(
                f"Dev labels: {_format_labels(gold_dev_data['cats'])}."
            )
        if gold_train_data["n_cats_multilabel"] > 0:
            msg.info(
                "The train data contains instances without "
                "mutually-exclusive classes. Use '--textcat-multilabel' "
                "when training."
            )
            if gold_dev_data["n_cats_multilabel"] == 0:
                msg.warn(
                    "Potential train/dev mismatch: the train data contains "

@@ -311,9 +338,10 @@ def debug_data(
                    "dev data does not."
                )
        else:
            msg.info(
            msg.warn(
                "The train data contains only instances with "
                "mutually-exclusive classes."
                "mutually-exclusive classes. You can potentially use the "
                "component 'textcat' instead of 'textcat_multilabel'."
            )
            if gold_dev_data["n_cats_multilabel"] > 0:
                msg.fail(

@@ -325,13 +353,37 @@ def debug_data(
    if "tagger" in factory_names:
        msg.divider("Part-of-speech Tagging")
        labels = [label for label in gold_train_data["tags"]]
        # TODO: does this need to be updated?
        msg.info(f"{len(labels)} label(s) in data")
        model_labels = _get_labels_from_model(nlp, "tagger")
        msg.info(f"{len(labels)} label(s) in train data")
        missing_labels = model_labels - set(labels)
        if missing_labels:
            msg.warn(
                "Some model labels are not present in the train data. The "
                "model performance may be degraded for these labels after "
                f"training: {_format_labels(missing_labels)}."
            )
        labels_with_counts = _format_labels(
            gold_train_data["tags"].most_common(), counts=True
        )
        msg.text(labels_with_counts, show=verbose)

    if "morphologizer" in factory_names:
        msg.divider("Morphologizer (POS+Morph)")
        labels = [label for label in gold_train_data["morphs"]]
        model_labels = _get_labels_from_model(nlp, "morphologizer")
        msg.info(f"{len(labels)} label(s) in train data")
        missing_labels = model_labels - set(labels)
        if missing_labels:
            msg.warn(
                "Some model labels are not present in the train data. The "
                "model performance may be degraded for these labels after "
                f"training: {_format_labels(missing_labels)}."
            )
        labels_with_counts = _format_labels(
            gold_train_data["morphs"].most_common(), counts=True
        )
        msg.text(labels_with_counts, show=verbose)

    if "parser" in factory_names:
        has_low_data_warning = False
        msg.divider("Dependency Parsing")

@@ -491,6 +543,7 @@ def _compile_gold(
        "ner": Counter(),
        "cats": Counter(),
        "tags": Counter(),
        "morphs": Counter(),
        "deps": Counter(),
        "words": Counter(),
        "roots": Counter(),

@@ -544,13 +597,36 @@ def _compile_gold(
                    data["ner"][combined_label] += 1
                elif label == "-":
                    data["ner"]["-"] += 1
        if "textcat" in factory_names:
        if "textcat" in factory_names or "textcat_multilabel" in factory_names:
            data["cats"].update(gold.cats)
            if list(gold.cats.values()).count(1.0) != 1:
                data["n_cats_multilabel"] += 1
        if "tagger" in factory_names:
            tags = eg.get_aligned("TAG", as_string=True)
            data["tags"].update([x for x in tags if x is not None])
        if "morphologizer" in factory_names:
            pos_tags = eg.get_aligned("POS", as_string=True)
            morphs = eg.get_aligned("MORPH", as_string=True)
            for pos, morph in zip(pos_tags, morphs):
                # POS may align (same value for multiple tokens) when morph
                # doesn't, so if either is misaligned (None), treat the
                # annotation as missing so that truths doesn't end up with an
                # unknown morph+POS combination
                if pos is None or morph is None:
                    pass
                # If both are unset, the annotation is missing (empty morph
                # converted from int is "_" rather than "")
                elif pos == "" and morph == "":
                    pass
                # Otherwise, generate the combined label
                else:
                    label_dict = Morphology.feats_to_dict(morph)
                    if pos:
                        label_dict[Morphologizer.POS_FEAT] = pos
                    label = eg.reference.vocab.strings[
                        eg.reference.vocab.morphology.add(label_dict)
                    ]
                    data["morphs"].update([label])
        if "parser" in factory_names:
            aligned_heads, aligned_deps = eg.get_aligned_parse(projectivize=make_proj)
            data["deps"].update([x for x in aligned_deps if x is not None])

@@ -584,8 +660,8 @@ def _get_examples_without_label(data: Sequence[Example], label: str) -> int:
    return count


def _get_labels_from_model(nlp: Language, pipe_name: str) -> Sequence[str]:
def _get_labels_from_model(nlp: Language, pipe_name: str) -> Set[str]:
    if pipe_name not in nlp.pipe_names:
        return set()
    pipe = nlp.get_pipe(pipe_name)
    return pipe.labels
    return set(pipe.labels)
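The recurring check in these hunks compares the labels a component was initialized with against the labels that actually occur in the training data, and warns about the difference. A compact sketch of that logic, using the NER case and the variable names from the diff:

```python
# Sketch of the label comparison debug_data now performs for ner/textcat/tagger/morphologizer.
model_labels = set(nlp.get_pipe("ner").labels)  # labels the initialized component knows about
train_labels = set(gold_train_data["ner"])      # labels observed in the training corpus
missing_labels = model_labels - train_labels    # in the model, but never seen in the data
if missing_labels:
    print(f"Performance may be degraded for: {sorted(missing_labels)}")
```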
@@ -206,7 +206,7 @@ factory = "tok2vec"
@architectures = "spacy.Tok2Vec.v2"

[components.tok2vec.model.embed]
@architectures = "spacy.MultiHashEmbed.v1"
@architectures = "spacy.MultiHashEmbed.v2"
width = ${components.tok2vec.model.encode.width}
{% if has_letters -%}
attrs = ["NORM", "PREFIX", "SUFFIX", "SHAPE"]
@@ -68,8 +68,11 @@ seed = ${system.seed}
gpu_allocator = ${system.gpu_allocator}
dropout = 0.1
accumulate_gradient = 1
# Controls early-stopping. 0 or -1 mean unlimited.
# Controls early-stopping. 0 disables early stopping.
patience = 1600
# Number of epochs. 0 means unlimited. If >= 0, train corpus is loaded once in
# memory and shuffled within the training loop. -1 means stream train corpus
# rather than loading in memory with no shuffling within the training loop.
max_epochs = 0
max_steps = 20000
eval_frequency = 200
@@ -157,6 +157,10 @@ class Warnings:
            "`spacy.load()` to ensure that the model is loaded on the correct "
            "device. More information: "
            "http://spacy.io/usage/v3#jupyter-notebook-gpu")
    W112 = ("The model specified to use for initial vectors ({name}) has no "
            "vectors. This is almost certainly a mistake.")
    W113 = ("Sourced component '{name}' may not work as expected: source "
            "vectors are not identical to current pipeline vectors.")


@add_codes

@@ -497,6 +501,12 @@ class Errors:
    E202 = ("Unsupported alignment mode '{mode}'. Supported modes: {modes}.")

    # New errors added in v3.x
    E872 = ("Unable to copy tokenizer from base model due to different "
            'tokenizer settings: current tokenizer config "{curr_config}" '
            'vs. base model "{base_config}"')
    E873 = ("Unable to merge a span from doc.spans with key '{key}' and text "
            "'{text}'. This is likely a bug in spaCy, so feel free to open an "
            "issue: https://github.com/explosion/spaCy/issues")
    E874 = ("Could not initialize the tok2vec model from component "
            "'{component}' and layer '{layer}'.")
    E875 = ("To use the PretrainVectors objective, make sure that static vectors are loaded. "

@@ -631,7 +641,7 @@ class Errors:
            "method, make sure it's overwritten on the subclass.")
    E940 = ("Found NaN values in scores.")
    E941 = ("Can't find model '{name}'. It looks like you're trying to load a "
            "model from a shortcut, which is deprecated as of spaCy v3.0. To "
            "model from a shortcut, which is obsolete as of spaCy v3.0. To "
            "load the model, use its full name instead:\n\n"
            "nlp = spacy.load(\"{full}\")\n\nFor more details on the available "
            "models, see the models directory: https://spacy.io/models. If you "

@@ -646,8 +656,8 @@ class Errors:
            "returned the initialized nlp object instead?")
    E944 = ("Can't copy pipeline component '{name}' from source '{model}': "
            "not found in pipeline. Available components: {opts}")
    E945 = ("Can't copy pipeline component '{name}' from source. Expected loaded "
            "nlp object, but got: {source}")
    E945 = ("Can't copy pipeline component '{name}' from source. Expected "
            "loaded nlp object, but got: {source}")
    E947 = ("`Matcher.add` received invalid `greedy` argument: expected "
            "a string value from {expected} but got: '{arg}'")
    E948 = ("`Matcher.add` received invalid 'patterns' argument: expected "
@@ -17,14 +17,19 @@ _exc = {
for orth in [
    "..",
    "....",
    "a.C.",
    "al.",
    "all-path",
    "art.",
    "Art.",
    "artt.",
    "att.",
    "avv.",
    "Avv.",
    "by-pass",
    "c.d.",
    "c/c",
    "C.so",
    "centro-sinistra",
    "check-up",
    "Civ.",

@@ -48,6 +53,8 @@ for orth in [
    "prof.",
    "sett.",
    "s.p.a.",
    "s.n.c",
    "s.r.l",
    "ss.",
    "St.",
    "tel.",
@@ -682,9 +682,14 @@ class Language:
        name (str): Optional alternative name to use in current pipeline.
        RETURNS (Tuple[Callable, str]): The component and its factory name.
        """
        # TODO: handle errors and mismatches (vectors etc.)
        if not isinstance(source, self.__class__):
        # Check source type
        if not isinstance(source, Language):
            raise ValueError(Errors.E945.format(name=source_name, source=type(source)))
        # Check vectors, with faster checks first
        if self.vocab.vectors.shape != source.vocab.vectors.shape or \
                self.vocab.vectors.key2row != source.vocab.vectors.key2row or \
                self.vocab.vectors.to_bytes() != source.vocab.vectors.to_bytes():
            util.logger.warning(Warnings.W113.format(name=source_name))
        if not source_name in source.component_names:
            raise KeyError(
                Errors.E944.format(

@@ -1673,7 +1678,16 @@ class Language:
                    # model with the same vocab as the current nlp object
                    source_nlps[model] = util.load_model(model, vocab=nlp.vocab)
                source_name = pipe_cfg.get("component", pipe_name)
                listeners_replaced = False
                if "replace_listeners" in pipe_cfg:
                    for name, proc in source_nlps[model].pipeline:
                        if source_name in getattr(proc, "listening_components", []):
                            source_nlps[model].replace_listeners(name, source_name, pipe_cfg["replace_listeners"])
                            listeners_replaced = True
                nlp.add_pipe(source_name, source=source_nlps[model], name=pipe_name)
                # Delete from cache if listeners were replaced
                if listeners_replaced:
                    del source_nlps[model]
        disabled_pipes = [*config["nlp"]["disabled"], *disable]
        nlp._disabled = set(p for p in disabled_pipes if p not in exclude)
        nlp.batch_size = config["nlp"]["batch_size"]
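The new branch above is triggered when a sourced component's config entry asks for its tok2vec listeners to be replaced before the component is copied into the new pipeline. A hedged sketch of the kind of per-component entry it reads; the source model name and listener path are placeholders, not values mandated by this diff:

```python
# Hypothetical pipe_cfg entry for a sourced component, as consumed by the loop above.
pipe_cfg = {
    "source": "en_core_web_sm",              # placeholder: pipeline to copy the component from
    "component": "ner",                      # name of the component inside the source pipeline
    "replace_listeners": ["model.tok2vec"],  # listener paths to replace with standalone copies
}
```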
@@ -299,7 +299,7 @@ cdef class DependencyMatcher:
        if isinstance(doclike, Doc):
            doc = doclike
        elif isinstance(doclike, Span):
            doc = doclike.as_doc()
            doc = doclike.as_doc(copy_user_data=True)
        else:
            raise ValueError(Errors.E195.format(good="Doc or Span", got=type(doclike).__name__))
@@ -46,6 +46,12 @@ cdef struct TokenPatternC:
    int32_t nr_py
    quantifier_t quantifier
    hash_t key
    int32_t token_idx


cdef struct MatchAlignmentC:
    int32_t token_idx
    int32_t length


cdef struct PatternStateC:
@@ -196,7 +196,7 @@ cdef class Matcher:
        else:
            yield doc

    def __call__(self, object doclike, *, as_spans=False, allow_missing=False):
    def __call__(self, object doclike, *, as_spans=False, allow_missing=False, with_alignments=False):
        """Find all token sequences matching the supplied pattern.

        doclike (Doc or Span): The document to match over.

@@ -204,10 +204,16 @@ cdef class Matcher:
            start, end) tuples.
        allow_missing (bool): Whether to skip checks for missing annotation for
            attributes included in patterns. Defaults to False.
        with_alignments (bool): Return match alignment information, which is
            a `List[int]` with the length of the matched span. Each entry is
            the index of the corresponding token pattern. If as_spans is set
            to True, this setting is ignored.
        RETURNS (list): A list of `(match_id, start, end)` tuples,
            describing the matches. A match tuple describes a span
            `doc[start:end]`. The `match_id` is an integer. If as_spans is set
            to True, a list of Span objects is returned.
            If with_alignments is set to True and as_spans is set to False,
            a list of `(match_id, start, end, alignments)` tuples is returned.
        """
        if isinstance(doclike, Doc):
            doc = doclike

@@ -217,6 +223,9 @@ cdef class Matcher:
            length = doclike.end - doclike.start
        else:
            raise ValueError(Errors.E195.format(good="Doc or Span", got=type(doclike).__name__))
        # Skip alignments calculations if as_spans is set
        if as_spans:
            with_alignments = False
        cdef Pool tmp_pool = Pool()
        if not allow_missing:
            for attr in (TAG, POS, MORPH, LEMMA, DEP):

@@ -232,18 +241,20 @@ cdef class Matcher:
                        error_msg = Errors.E155.format(pipe=pipe, attr=self.vocab.strings.as_string(attr))
                        raise ValueError(error_msg)
        matches = find_matches(&self.patterns[0], self.patterns.size(), doclike, length,
                               extensions=self._extensions, predicates=self._extra_predicates)
                               extensions=self._extensions, predicates=self._extra_predicates, with_alignments=with_alignments)
        final_matches = []
        pairs_by_id = {}
        # For each key, either add all matches, or only the filtered, non-overlapping ones
        for (key, start, end) in matches:
        # For each key, either add all matches, or only the filtered,
        # non-overlapping ones. Here `match` can be either (start, end) or
        # (start, end, alignments), depending on the `with_alignments=` option.
        for key, *match in matches:
            span_filter = self._filter.get(key)
            if span_filter is not None:
                pairs = pairs_by_id.get(key, [])
                pairs.append((start,end))
                pairs.append(match)
                pairs_by_id[key] = pairs
            else:
                final_matches.append((key, start, end))
                final_matches.append((key, *match))
        matched = <char*>tmp_pool.alloc(length, sizeof(char))
        empty = <char*>tmp_pool.alloc(length, sizeof(char))
        for key, pairs in pairs_by_id.items():

@@ -255,14 +266,18 @@ cdef class Matcher:
                sorted_pairs = sorted(pairs, key=lambda x: (x[1]-x[0], -x[0]), reverse=True)  # reverse sort by length
            else:
                raise ValueError(Errors.E947.format(expected=["FIRST", "LONGEST"], arg=span_filter))
            for (start, end) in sorted_pairs:
            for match in sorted_pairs:
                start, end = match[:2]
                assert 0 <= start < end  # Defend against segfaults
                span_len = end-start
                # If no tokens in the span have matched
                if memcmp(&matched[start], &empty[start], span_len * sizeof(matched[0])) == 0:
                    final_matches.append((key, start, end))
                    final_matches.append((key, *match))
                    # Mark tokens that have matched
                    memset(&matched[start], 1, span_len * sizeof(matched[0]))
        if with_alignments:
            final_matches_with_alignments = final_matches
            final_matches = [(key, start, end) for key, start, end, alignments in final_matches]
        # perform the callbacks on the filtered set of results
        for i, (key, start, end) in enumerate(final_matches):
            on_match = self._callbacks.get(key, None)

@@ -270,6 +285,22 @@ cdef class Matcher:
                on_match(self, doc, i, final_matches)
        if as_spans:
            return [Span(doc, start, end, label=key) for key, start, end in final_matches]
        elif with_alignments:
            # convert alignments List[Dict[str, int]] --> List[int]
            final_matches = []
            # When multiple alignments share the same length, keep the one
            # with the largest token_idx.
            for key, start, end, alignments in final_matches_with_alignments:
                sorted_alignments = sorted(alignments, key=lambda x: (x['length'], x['token_idx']), reverse=False)
                alignments = [0] * (end-start)
                for align in sorted_alignments:
                    if align['length'] >= end-start:
                        continue
                    # Since alignments are sorted in order of (length, token_idx),
                    # this overwrites smaller token_idx when they have the same length.
                    alignments[align['length']] = align['token_idx']
                final_matches.append((key, start, end, alignments))
            return final_matches
        else:
            return final_matches

@@ -288,9 +319,9 @@ def unpickle_matcher(vocab, patterns, callbacks):
    return matcher


cdef find_matches(TokenPatternC** patterns, int n, object doclike, int length, extensions=None, predicates=tuple()):
cdef find_matches(TokenPatternC** patterns, int n, object doclike, int length, extensions=None, predicates=tuple(), bint with_alignments=0):
    """Find matches in a doc, with a compiled array of patterns. Matches are
    returned as a list of (id, start, end) tuples.
    returned as a list of (id, start, end) tuples, or (id, start, end, alignments) tuples if with_alignments != 0.

    To augment the compiled patterns, we optionally also take two Python lists.

@@ -302,6 +333,8 @@ cdef find_matches(TokenPatternC** patterns, int n, object doclike, int length, e
    """
    cdef vector[PatternStateC] states
    cdef vector[MatchC] matches
    cdef vector[vector[MatchAlignmentC]] align_states
    cdef vector[vector[MatchAlignmentC]] align_matches
    cdef PatternStateC state
    cdef int i, j, nr_extra_attr
    cdef Pool mem = Pool()

@@ -328,12 +361,14 @@ cdef find_matches(TokenPatternC** patterns, int n, object doclike, int length, e
    for i in range(length):
        for j in range(n):
            states.push_back(PatternStateC(patterns[j], i, 0))
        transition_states(states, matches, predicate_cache,
                          doclike[i], extra_attr_values, predicates)
        if with_alignments != 0:
            align_states.resize(states.size())
        transition_states(states, matches, align_states, align_matches, predicate_cache,
                          doclike[i], extra_attr_values, predicates, with_alignments)
        extra_attr_values += nr_extra_attr
        predicate_cache += len(predicates)
    # Handle matches that end in 0-width patterns
    finish_states(matches, states)
    finish_states(matches, states, align_matches, align_states, with_alignments)
    seen = set()
    for i in range(matches.size()):
        match = (

@@ -346,16 +381,22 @@ cdef find_matches(TokenPatternC** patterns, int n, object doclike, int length, e
        # first .?, or the second .? -- it doesn't matter, it's just one match.
        # Skip 0-length matches. (TODO: fix algorithm)
        if match not in seen and matches[i].length > 0:
            output.append(match)
            if with_alignments != 0:
                # align_matches has the same length as matches, so the same index 'i' can be shared
                output.append(match + (align_matches[i],))
            else:
                output.append(match)
            seen.add(match)
    return output


cdef void transition_states(vector[PatternStateC]& states, vector[MatchC]& matches,
                            vector[vector[MatchAlignmentC]]& align_states, vector[vector[MatchAlignmentC]]& align_matches,
                            int8_t* cached_py_predicates,
                            Token token, const attr_t* extra_attrs, py_predicates) except *:
                            Token token, const attr_t* extra_attrs, py_predicates, bint with_alignments) except *:
    cdef int q = 0
    cdef vector[PatternStateC] new_states
    cdef vector[vector[MatchAlignmentC]] align_new_states
    cdef int nr_predicate = len(py_predicates)
    for i in range(states.size()):
        if states[i].pattern.nr_py >= 1:

@@ -370,23 +411,39 @@ cdef void transition_states(vector[PatternStateC]& states, vector[MatchC]& match
            # it in the states list, because q doesn't advance.
            state = states[i]
            states[q] = state
        # Kept separate from `states` so that users who only need the basic options
        # (without alignments) pay no performance cost.
        # `align_states` always corresponds to `states` 1:1.
        if with_alignments != 0:
            align_state = align_states[i]
            align_states[q] = align_state
        while action in (RETRY, RETRY_ADVANCE, RETRY_EXTEND):
            # Update alignment before the transition of the current state.
            # 'MatchAlignmentC' maps 'original token index of current pattern' to 'current matching length'.
            if with_alignments != 0:
                align_states[q].push_back(MatchAlignmentC(states[q].pattern.token_idx, states[q].length))
            if action == RETRY_EXTEND:
                # This handles the 'extend'
                new_states.push_back(
                    PatternStateC(pattern=states[q].pattern, start=state.start,
                                  length=state.length+1))
                if with_alignments != 0:
                    align_new_states.push_back(align_states[q])
            if action == RETRY_ADVANCE:
                # This handles the 'advance'
                new_states.push_back(
                    PatternStateC(pattern=states[q].pattern+1, start=state.start,
                                  length=state.length+1))
                if with_alignments != 0:
                    align_new_states.push_back(align_states[q])
            states[q].pattern += 1
            if states[q].pattern.nr_py != 0:
                update_predicate_cache(cached_py_predicates,
                                       states[q].pattern, token, py_predicates)
            action = get_action(states[q], token.c, extra_attrs,
                                cached_py_predicates)
        # Update alignment before the transition of the current state
        if with_alignments != 0:
            align_states[q].push_back(MatchAlignmentC(states[q].pattern.token_idx, states[q].length))
        if action == REJECT:
            pass
        elif action == ADVANCE:

@@ -399,29 +456,50 @@ cdef void transition_states(vector[PatternStateC]& states, vector[MatchC]& match
            matches.push_back(
                MatchC(pattern_id=ent_id, start=state.start,
                       length=state.length+1))
            # `align_matches` always corresponds to `matches` 1:1
            if with_alignments != 0:
                align_matches.push_back(align_states[q])
        elif action == MATCH_DOUBLE:
            # push match without last token if length > 0
            if state.length > 0:
                matches.push_back(
                    MatchC(pattern_id=ent_id, start=state.start,
                           length=state.length))
                # MATCH_DOUBLE emits matches twice,
                # add one more to align_matches in order to keep the 1:1 relationship
                if with_alignments != 0:
                    align_matches.push_back(align_states[q])
            # push match with last token
            matches.push_back(
                MatchC(pattern_id=ent_id, start=state.start,
                       length=state.length+1))
            # `align_matches` always corresponds to `matches` 1:1
            if with_alignments != 0:
                align_matches.push_back(align_states[q])
        elif action == MATCH_REJECT:
            matches.push_back(
                MatchC(pattern_id=ent_id, start=state.start,
                       length=state.length))
            # `align_matches` always corresponds to `matches` 1:1
            if with_alignments != 0:
                align_matches.push_back(align_states[q])
        elif action == MATCH_EXTEND:
            matches.push_back(
                MatchC(pattern_id=ent_id, start=state.start,
                       length=state.length))
            # `align_matches` always corresponds to `matches` 1:1
            if with_alignments != 0:
                align_matches.push_back(align_states[q])
            states[q].length += 1
            q += 1
    states.resize(q)
    for i in range(new_states.size()):
        states.push_back(new_states[i])
    # `align_states` always corresponds to `states` 1:1
    if with_alignments != 0:
        align_states.resize(q)
        for i in range(align_new_states.size()):
            align_states.push_back(align_new_states[i])


cdef int update_predicate_cache(int8_t* cache,

@@ -444,15 +522,27 @@ cdef int update_predicate_cache(int8_t* cache,
        raise ValueError(Errors.E125.format(value=result))


cdef void finish_states(vector[MatchC]& matches, vector[PatternStateC]& states) except *:
cdef void finish_states(vector[MatchC]& matches, vector[PatternStateC]& states,
                        vector[vector[MatchAlignmentC]]& align_matches,
                        vector[vector[MatchAlignmentC]]& align_states,
                        bint with_alignments) except *:
    """Handle states that end in zero-width patterns."""
    cdef PatternStateC state
    cdef vector[MatchAlignmentC] align_state
    for i in range(states.size()):
        state = states[i]
        if with_alignments != 0:
            align_state = align_states[i]
        while get_quantifier(state) in (ZERO_PLUS, ZERO_ONE):
            # Update alignment before the transition of the current state
            if with_alignments != 0:
                align_state.push_back(MatchAlignmentC(state.pattern.token_idx, state.length))
            is_final = get_is_final(state)
            if is_final:
                ent_id = get_ent_id(state.pattern)
                # `align_matches` always corresponds to `matches` 1:1
                if with_alignments != 0:
                    align_matches.push_back(align_state)
                matches.push_back(
                    MatchC(pattern_id=ent_id, start=state.start, length=state.length))
                break

@@ -607,7 +697,7 @@ cdef int8_t get_quantifier(PatternStateC state) nogil:
cdef TokenPatternC* init_pattern(Pool mem, attr_t entity_id, object token_specs) except NULL:
    pattern = <TokenPatternC*>mem.alloc(len(token_specs) + 1, sizeof(TokenPatternC))
    cdef int i, index
    for i, (quantifier, spec, extensions, predicates) in enumerate(token_specs):
    for i, (quantifier, spec, extensions, predicates, token_idx) in enumerate(token_specs):
        pattern[i].quantifier = quantifier
        # Ensure attrs refers to a null pointer if nr_attr == 0
        if len(spec) > 0:

@@ -628,6 +718,7 @@ cdef TokenPatternC* init_pattern(Pool mem, attr_t entity_id, object token_specs)
            pattern[i].py_predicates[j] = index
        pattern[i].nr_py = len(predicates)
        pattern[i].key = hash64(pattern[i].attrs, pattern[i].nr_attr * sizeof(AttrValueC), 0)
        pattern[i].token_idx = token_idx
    i = len(token_specs)
    # Use quantifier to identify final ID pattern node (rather than previous
    # uninitialized quantifier == 0/ZERO + nr_attr == 0 + non-zero-length attrs)

@@ -638,6 +729,7 @@ cdef TokenPatternC* init_pattern(Pool mem, attr_t entity_id, object token_specs)
    pattern[i].nr_attr = 1
    pattern[i].nr_extra_attr = 0
    pattern[i].nr_py = 0
    pattern[i].token_idx = -1
    return pattern


@@ -655,7 +747,7 @@ def _preprocess_pattern(token_specs, vocab, extensions_table, extra_predicates):
    """This function interprets the pattern, converting the various bits of
    syntactic sugar before we compile it into a struct with init_pattern.

    We need to split the pattern up into three parts:
    We need to split the pattern up into four parts:
    * Normal attribute/value pairs, which are stored on either the token or lexeme,
      can be handled directly.
    * Extension attributes are handled specially, as we need to prefetch the

@@ -664,13 +756,14 @@ def _preprocess_pattern(token_specs, vocab, extensions_table, extra_predicates):
      functions and store them. So we store these specially as well.
    * Extension attributes that have extra predicates are stored within the
      extra_predicates.
    * The token index that this pattern entry corresponds to.
    """
    tokens = []
    string_store = vocab.strings
    for spec in token_specs:
    for token_idx, spec in enumerate(token_specs):
        if not spec:
            # Signifier for 'any token'
            tokens.append((ONE, [(NULL_ATTR, 0)], [], []))
            tokens.append((ONE, [(NULL_ATTR, 0)], [], [], token_idx))
            continue
        if not isinstance(spec, dict):
            raise ValueError(Errors.E154.format())

@@ -679,7 +772,7 @@ def _preprocess_pattern(token_specs, vocab, extensions_table, extra_predicates):
        extensions = _get_extensions(spec, string_store, extensions_table)
        predicates = _get_extra_predicates(spec, extra_predicates, vocab)
        for op in ops:
            tokens.append((op, list(attr_values), list(extensions), list(predicates)))
            tokens.append((op, list(attr_values), list(extensions), list(predicates), token_idx))
    return tokens
|
@ -3,8 +3,10 @@ from thinc.api import Model
|
|||
from thinc.types import Floats2d
|
||||
|
||||
from ..tokens import Doc
|
||||
from ..util import registry
|
||||
|
||||
|
||||
@registry.layers("spacy.CharEmbed.v1")
|
||||
def CharacterEmbed(nM: int, nC: int) -> Model[List[Doc], List[Floats2d]]:
|
||||
# nM: Number of dimensions per character. nC: Number of characters.
|
||||
return Model(
|
||||
@ -31,7 +31,7 @@ def get_tok2vec_width(model: Model):
|
|||
return nO
|
||||
|
||||
|
||||
@registry.architectures("spacy.HashEmbedCNN.v1")
|
||||
@registry.architectures("spacy.HashEmbedCNN.v2")
|
||||
def build_hash_embed_cnn_tok2vec(
|
||||
*,
|
||||
width: int,
|
||||
|
@ -108,7 +108,7 @@ def build_Tok2Vec_model(
|
|||
return tok2vec
|
||||
|
||||
|
||||
@registry.architectures("spacy.MultiHashEmbed.v1")
|
||||
@registry.architectures("spacy.MultiHashEmbed.v2")
|
||||
def MultiHashEmbed(
|
||||
width: int,
|
||||
attrs: List[Union[str, int]],
|
||||
|
@ -182,7 +182,7 @@ def MultiHashEmbed(
|
|||
return model
|
||||
|
||||
|
||||
@registry.architectures("spacy.CharacterEmbed.v1")
|
||||
@registry.architectures("spacy.CharacterEmbed.v2")
|
||||
def CharacterEmbed(
|
||||
width: int,
|
||||
rows: int,
|
||||
|
|
|
@@ -8,7 +8,7 @@ from ..tokens import Doc
from ..errors import Errors


@registry.layers("spacy.StaticVectors.v1")
@registry.layers("spacy.StaticVectors.v2")
def StaticVectors(
    nO: Optional[int] = None,
    nM: Optional[int] = None,

@@ -38,7 +38,7 @@ def forward(
        return _handle_empty(model.ops, model.get_dim("nO"))
    key_attr = model.attrs["key_attr"]
    W = cast(Floats2d, model.ops.as_contig(model.get_param("W")))
    V = cast(Floats2d, docs[0].vocab.vectors.data)
    V = cast(Floats2d, model.ops.asarray(docs[0].vocab.vectors.data))
    rows = model.ops.flatten(
        [doc.vocab.vectors.find(keys=doc.to_array(key_attr)) for doc in docs]
    )

@@ -46,6 +46,8 @@ def forward(
        vectors_data = model.ops.gemm(model.ops.as_contig(V[rows]), W, trans2=True)
    except ValueError:
        raise RuntimeError(Errors.E896)
    # Convert negative indices to 0-vectors (TODO: more options for UNK tokens)
    vectors_data[rows < 0] = 0
    output = Ragged(
        vectors_data, model.ops.asarray([len(doc) for doc in docs], dtype="i")
    )
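Wrapping the vectors table in `model.ops.asarray(...)` keeps it on the same device as the layer's parameters before the `gemm` call. A rough sketch of the idea, with an illustrative table:

import numpy
from thinc.api import get_current_ops

ops = get_current_ops()
table = numpy.zeros((5, 3), dtype="f")
# With NumpyOps this is effectively a copy; with CupyOps the table is moved to
# the GPU, so the later matrix multiply against W runs on one device.
V = ops.asarray(table)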
@ -24,7 +24,7 @@ maxout_pieces = 2
|
|||
use_upper = true
|
||||
|
||||
[model.tok2vec]
|
||||
@architectures = "spacy.HashEmbedCNN.v1"
|
||||
@architectures = "spacy.HashEmbedCNN.v2"
|
||||
pretrained_vectors = null
|
||||
width = 96
|
||||
depth = 4
|
||||
|
|
|
@ -26,7 +26,7 @@ default_model_config = """
|
|||
@architectures = "spacy.EntityLinker.v1"
|
||||
|
||||
[model.tok2vec]
|
||||
@architectures = "spacy.HashEmbedCNN.v1"
|
||||
@architectures = "spacy.HashEmbedCNN.v2"
|
||||
pretrained_vectors = null
|
||||
width = 96
|
||||
depth = 2
|
||||
|
@ -300,77 +300,77 @@ class EntityLinker(TrainablePipe):
|
|||
for i, doc in enumerate(docs):
|
||||
sentences = [s for s in doc.sents]
|
||||
if len(doc) > 0:
|
||||
# Looping through each sentence and each entity
|
||||
# This may go wrong if there are entities across sentences - which shouldn't happen normally.
|
||||
for sent_index, sent in enumerate(sentences):
|
||||
if sent.ents:
|
||||
# get n_neighbour sentences, clipped to the length of the document
|
||||
start_sentence = max(0, sent_index - self.n_sents)
|
||||
end_sentence = min(
|
||||
len(sentences) - 1, sent_index + self.n_sents
|
||||
)
|
||||
start_token = sentences[start_sentence].start
|
||||
end_token = sentences[end_sentence].end
|
||||
sent_doc = doc[start_token:end_token].as_doc()
|
||||
# currently, the context is the same for each entity in a sentence (should be refined)
|
||||
xp = self.model.ops.xp
|
||||
if self.incl_context:
|
||||
sentence_encoding = self.model.predict([sent_doc])[0]
|
||||
sentence_encoding_t = sentence_encoding.T
|
||||
sentence_norm = xp.linalg.norm(sentence_encoding_t)
|
||||
for ent in sent.ents:
|
||||
entity_count += 1
|
||||
if ent.label_ in self.labels_discard:
|
||||
# ignoring this entity - setting to NIL
|
||||
final_kb_ids.append(self.NIL)
|
||||
else:
|
||||
candidates = self.get_candidates(self.kb, ent)
|
||||
if not candidates:
|
||||
# no prediction possible for this entity - setting to NIL
|
||||
final_kb_ids.append(self.NIL)
|
||||
elif len(candidates) == 1:
|
||||
# shortcut for efficiency reasons: take the 1 candidate
|
||||
# TODO: thresholding
|
||||
final_kb_ids.append(candidates[0].entity_)
|
||||
else:
|
||||
random.shuffle(candidates)
|
||||
# set all prior probabilities to 0 if incl_prior=False
|
||||
prior_probs = xp.asarray(
|
||||
[c.prior_prob for c in candidates]
|
||||
# Looping through each entity (TODO: rewrite)
|
||||
for ent in doc.ents:
|
||||
sent = ent.sent
|
||||
sent_index = sentences.index(sent)
|
||||
assert sent_index >= 0
|
||||
# get n_neighbour sentences, clipped to the length of the document
|
||||
start_sentence = max(0, sent_index - self.n_sents)
|
||||
end_sentence = min(
|
||||
len(sentences) - 1, sent_index + self.n_sents
|
||||
)
|
||||
start_token = sentences[start_sentence].start
|
||||
end_token = sentences[end_sentence].end
|
||||
sent_doc = doc[start_token:end_token].as_doc()
|
||||
# currently, the context is the same for each entity in a sentence (should be refined)
|
||||
xp = self.model.ops.xp
|
||||
if self.incl_context:
|
||||
sentence_encoding = self.model.predict([sent_doc])[0]
|
||||
sentence_encoding_t = sentence_encoding.T
|
||||
sentence_norm = xp.linalg.norm(sentence_encoding_t)
|
||||
entity_count += 1
|
||||
if ent.label_ in self.labels_discard:
|
||||
# ignoring this entity - setting to NIL
|
||||
final_kb_ids.append(self.NIL)
|
||||
else:
|
||||
candidates = self.get_candidates(self.kb, ent)
|
||||
if not candidates:
|
||||
# no prediction possible for this entity - setting to NIL
|
||||
final_kb_ids.append(self.NIL)
|
||||
elif len(candidates) == 1:
|
||||
# shortcut for efficiency reasons: take the 1 candidate
|
||||
# TODO: thresholding
|
||||
final_kb_ids.append(candidates[0].entity_)
|
||||
else:
|
||||
random.shuffle(candidates)
|
||||
# set all prior probabilities to 0 if incl_prior=False
|
||||
prior_probs = xp.asarray(
|
||||
[c.prior_prob for c in candidates]
|
||||
)
|
||||
if not self.incl_prior:
|
||||
prior_probs = xp.asarray(
|
||||
[0.0 for _ in candidates]
|
||||
)
|
||||
scores = prior_probs
|
||||
# add in similarity from the context
|
||||
if self.incl_context:
|
||||
entity_encodings = xp.asarray(
|
||||
[c.entity_vector for c in candidates]
|
||||
)
|
||||
entity_norm = xp.linalg.norm(
|
||||
entity_encodings, axis=1
|
||||
)
|
||||
if len(entity_encodings) != len(prior_probs):
|
||||
raise RuntimeError(
|
||||
Errors.E147.format(
|
||||
method="predict",
|
||||
msg="vectors not of equal length",
|
||||
)
|
||||
)
|
||||
if not self.incl_prior:
|
||||
prior_probs = xp.asarray(
|
||||
[0.0 for _ in candidates]
|
||||
)
|
||||
scores = prior_probs
|
||||
# add in similarity from the context
|
||||
if self.incl_context:
|
||||
entity_encodings = xp.asarray(
|
||||
[c.entity_vector for c in candidates]
|
||||
)
|
||||
entity_norm = xp.linalg.norm(
|
||||
entity_encodings, axis=1
|
||||
)
|
||||
if len(entity_encodings) != len(prior_probs):
|
||||
raise RuntimeError(
|
||||
Errors.E147.format(
|
||||
method="predict",
|
||||
msg="vectors not of equal length",
|
||||
)
|
||||
)
|
||||
# cosine similarity
|
||||
sims = xp.dot(
|
||||
entity_encodings, sentence_encoding_t
|
||||
) / (sentence_norm * entity_norm)
|
||||
if sims.shape != prior_probs.shape:
|
||||
raise ValueError(Errors.E161)
|
||||
scores = (
|
||||
prior_probs + sims - (prior_probs * sims)
|
||||
)
|
||||
# TODO: thresholding
|
||||
best_index = scores.argmax().item()
|
||||
best_candidate = candidates[best_index]
|
||||
final_kb_ids.append(best_candidate.entity_)
|
||||
# cosine similarity
|
||||
sims = xp.dot(
|
||||
entity_encodings, sentence_encoding_t
|
||||
) / (sentence_norm * entity_norm)
|
||||
if sims.shape != prior_probs.shape:
|
||||
raise ValueError(Errors.E161)
|
||||
scores = (
|
||||
prior_probs + sims - (prior_probs * sims)
|
||||
)
|
||||
# TODO: thresholding
|
||||
best_index = scores.argmax().item()
|
||||
best_candidate = candidates[best_index]
|
||||
final_kb_ids.append(best_candidate.entity_)
|
||||
if not (len(final_kb_ids) == entity_count):
|
||||
err = Errors.E147.format(
|
||||
method="predict", msg="result variables not of equal length"
|
||||
|
|
|
@@ -175,7 +175,7 @@ class Lemmatizer(Pipe):

        DOCS: https://spacy.io/api/lemmatizer#rule_lemmatize
        """
        cache_key = (token.orth, token.pos, token.morph)
        cache_key = (token.orth, token.pos, token.morph.key)
        if cache_key in self.cache:
            return self.cache[cache_key]
        string = token.text
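The cache key now uses `token.morph.key`, the integer hash of the morphological analysis, rather than the `MorphAnalysis` object itself, so the tuple is a plain hashable value. A short sketch with a blank pipeline (the text is illustrative):

import spacy

nlp = spacy.blank("en")
token = nlp("coping")[0]
# token.morph is a MorphAnalysis; .key is its integer hash
cache_key = (token.orth, token.pos, token.morph.key)
print(cache_key)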
@ -27,7 +27,7 @@ default_model_config = """
|
|||
@architectures = "spacy.Tok2Vec.v2"
|
||||
|
||||
[model.tok2vec.embed]
|
||||
@architectures = "spacy.CharacterEmbed.v1"
|
||||
@architectures = "spacy.CharacterEmbed.v2"
|
||||
width = 128
|
||||
rows = 7000
|
||||
nM = 64
|
||||
|
|
|
@ -22,7 +22,7 @@ maxout_pieces = 3
|
|||
token_vector_width = 96
|
||||
|
||||
[model.tok2vec]
|
||||
@architectures = "spacy.HashEmbedCNN.v1"
|
||||
@architectures = "spacy.HashEmbedCNN.v2"
|
||||
pretrained_vectors = null
|
||||
width = 96
|
||||
depth = 4
|
||||
|
|
|
@ -21,7 +21,7 @@ maxout_pieces = 2
|
|||
use_upper = true
|
||||
|
||||
[model.tok2vec]
|
||||
@architectures = "spacy.HashEmbedCNN.v1"
|
||||
@architectures = "spacy.HashEmbedCNN.v2"
|
||||
pretrained_vectors = null
|
||||
width = 96
|
||||
depth = 4
|
||||
|
|
|
@ -19,7 +19,7 @@ default_model_config = """
|
|||
@architectures = "spacy.Tagger.v1"
|
||||
|
||||
[model.tok2vec]
|
||||
@architectures = "spacy.HashEmbedCNN.v1"
|
||||
@architectures = "spacy.HashEmbedCNN.v2"
|
||||
pretrained_vectors = null
|
||||
width = 12
|
||||
depth = 1
|
||||
|
|
|
@ -26,7 +26,7 @@ default_model_config = """
|
|||
@architectures = "spacy.Tagger.v1"
|
||||
|
||||
[model.tok2vec]
|
||||
@architectures = "spacy.HashEmbedCNN.v1"
|
||||
@architectures = "spacy.HashEmbedCNN.v2"
|
||||
pretrained_vectors = null
|
||||
width = 96
|
||||
depth = 4
|
||||
@ -21,7 +21,7 @@ single_label_default_config = """
|
|||
@architectures = "spacy.Tok2Vec.v2"
|
||||
|
||||
[model.tok2vec.embed]
|
||||
@architectures = "spacy.MultiHashEmbed.v1"
|
||||
@architectures = "spacy.MultiHashEmbed.v2"
|
||||
width = 64
|
||||
rows = [2000, 2000, 1000, 1000, 1000, 1000]
|
||||
attrs = ["ORTH", "LOWER", "PREFIX", "SUFFIX", "SHAPE", "ID"]
|
||||
|
@ -56,7 +56,7 @@ single_label_cnn_config = """
|
|||
exclusive_classes = true
|
||||
|
||||
[model.tok2vec]
|
||||
@architectures = "spacy.HashEmbedCNN.v1"
|
||||
@architectures = "spacy.HashEmbedCNN.v2"
|
||||
pretrained_vectors = null
|
||||
width = 96
|
||||
depth = 4
|
||||
|
|
|
@ -21,7 +21,7 @@ multi_label_default_config = """
|
|||
@architectures = "spacy.Tok2Vec.v1"
|
||||
|
||||
[model.tok2vec.embed]
|
||||
@architectures = "spacy.MultiHashEmbed.v1"
|
||||
@architectures = "spacy.MultiHashEmbed.v2"
|
||||
width = 64
|
||||
rows = [2000, 2000, 1000, 1000, 1000, 1000]
|
||||
attrs = ["ORTH", "LOWER", "PREFIX", "SUFFIX", "SHAPE", "ID"]
|
||||
|
@ -56,7 +56,7 @@ multi_label_cnn_config = """
|
|||
exclusive_classes = false
|
||||
|
||||
[model.tok2vec]
|
||||
@architectures = "spacy.HashEmbedCNN.v1"
|
||||
@architectures = "spacy.HashEmbedCNN.v2"
|
||||
pretrained_vectors = null
|
||||
width = 96
|
||||
depth = 4
|
||||
|
|
|
@ -11,7 +11,7 @@ from ..errors import Errors
|
|||
|
||||
default_model_config = """
|
||||
[model]
|
||||
@architectures = "spacy.HashEmbedCNN.v1"
|
||||
@architectures = "spacy.HashEmbedCNN.v2"
|
||||
pretrained_vectors = null
|
||||
width = 96
|
||||
depth = 4
|
||||
|
|
|
@@ -20,10 +20,16 @@ MISSING_VALUES = frozenset([None, 0, ""])
class PRFScore:
    """A precision / recall / F score."""

    def __init__(self) -> None:
        self.tp = 0
        self.fp = 0
        self.fn = 0
    def __init__(
        self,
        *,
        tp: int = 0,
        fp: int = 0,
        fn: int = 0,
    ) -> None:
        self.tp = tp
        self.fp = fp
        self.fn = fn

    def __len__(self) -> int:
        return self.tp + self.fp + self.fn
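With the keyword-only constructor, counts can now be seeded directly instead of being incremented from zero; a quick sketch (the counts are illustrative):

from spacy.scorer import PRFScore

score = PRFScore(tp=8, fp=2, fn=4)
print(len(score))        # 14, the total number of counted instances
print(score.precision)   # 8 / 10 = 0.8
print(score.recall)      # 8 / 12, roughly 0.667
print(score.fscore)      # harmonic mean of precision and recall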
@ -305,6 +311,8 @@ class Scorer:
|
|||
*,
|
||||
getter: Callable[[Doc, str], Iterable[Span]] = getattr,
|
||||
has_annotation: Optional[Callable[[Doc], bool]] = None,
|
||||
labeled: bool = True,
|
||||
allow_overlap: bool = False,
|
||||
**cfg,
|
||||
) -> Dict[str, Any]:
|
||||
"""Returns PRF scores for labeled spans.
|
||||
|
@ -317,6 +325,11 @@ class Scorer:
|
|||
has_annotation (Optional[Callable[[Doc], bool]]) should return whether a `Doc`
|
||||
has annotation for this `attr`. Docs without annotation are skipped for
|
||||
scoring purposes.
|
||||
labeled (bool): Whether or not to include label information in
|
||||
the evaluation. If set to 'False', two spans will be considered
|
||||
equal if their start and end match, irrespective of their label.
|
||||
allow_overlap (bool): Whether or not to allow overlapping spans.
|
||||
If set to 'False', the alignment will automatically resolve conflicts.
|
||||
RETURNS (Dict[str, Any]): A dictionary containing the PRF scores under
|
||||
the keys attr_p/r/f and the per-type PRF scores under attr_per_type.
|
||||
|
||||
|
@ -345,33 +358,42 @@ class Scorer:
|
|||
gold_spans = set()
|
||||
pred_spans = set()
|
||||
for span in getter(gold_doc, attr):
|
||||
gold_span = (span.label_, span.start, span.end - 1)
|
||||
if labeled:
|
||||
gold_span = (span.label_, span.start, span.end - 1)
|
||||
else:
|
||||
gold_span = (span.start, span.end - 1)
|
||||
gold_spans.add(gold_span)
|
||||
gold_per_type[span.label_].add((span.label_, span.start, span.end - 1))
|
||||
gold_per_type[span.label_].add(gold_span)
|
||||
pred_per_type = {label: set() for label in labels}
|
||||
for span in example.get_aligned_spans_x2y(getter(pred_doc, attr)):
|
||||
pred_spans.add((span.label_, span.start, span.end - 1))
|
||||
pred_per_type[span.label_].add((span.label_, span.start, span.end - 1))
|
||||
for span in example.get_aligned_spans_x2y(getter(pred_doc, attr), allow_overlap):
|
||||
if labeled:
|
||||
pred_span = (span.label_, span.start, span.end - 1)
|
||||
else:
|
||||
pred_span = (span.start, span.end - 1)
|
||||
pred_spans.add(pred_span)
|
||||
pred_per_type[span.label_].add(pred_span)
|
||||
# Scores per label
|
||||
for k, v in score_per_type.items():
|
||||
if k in pred_per_type:
|
||||
v.score_set(pred_per_type[k], gold_per_type[k])
|
||||
if labeled:
|
||||
for k, v in score_per_type.items():
|
||||
if k in pred_per_type:
|
||||
v.score_set(pred_per_type[k], gold_per_type[k])
|
||||
# Score for all labels
|
||||
score.score_set(pred_spans, gold_spans)
|
||||
if len(score) > 0:
|
||||
return {
|
||||
f"{attr}_p": score.precision,
|
||||
f"{attr}_r": score.recall,
|
||||
f"{attr}_f": score.fscore,
|
||||
f"{attr}_per_type": {k: v.to_dict() for k, v in score_per_type.items()},
|
||||
}
|
||||
else:
|
||||
return {
|
||||
# Assemble final result
|
||||
final_scores = {
|
||||
f"{attr}_p": None,
|
||||
f"{attr}_r": None,
|
||||
f"{attr}_f": None,
|
||||
f"{attr}_per_type": None,
|
||||
}
|
||||
if labeled:
|
||||
final_scores[f"{attr}_per_type"] = None
|
||||
if len(score) > 0:
|
||||
final_scores[f"{attr}_p"] = score.precision
|
||||
final_scores[f"{attr}_r"] = score.recall
|
||||
final_scores[f"{attr}_f"] = score.fscore
|
||||
if labeled:
|
||||
final_scores[f"{attr}_per_type"] = {k: v.to_dict() for k, v in score_per_type.items()}
|
||||
return final_scores
|
||||
|
||||
@staticmethod
|
||||
def score_cats(
|
||||
|
|
|
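A minimal sketch of the new `labeled` argument to `Scorer.score_spans`; the span key and getter follow the test added at the end of this diff, and the label values are illustrative:

from spacy.lang.en import English
from spacy.scorer import Scorer
from spacy.training import Example

nlp = English()
text = "This is just a random sentence."
gold = nlp.make_doc(text)
pred = nlp.make_doc(text)
gold.spans["my_spans"] = [gold.char_span(0, 4, label="PERSON")]
pred.spans["my_spans"] = [pred.char_span(0, 4, label="ORG")]

def span_getter(doc, span_key):
    return doc.spans[span_key]

eg = Example(pred, gold)
# With labeled=False only the span boundaries are compared, so the label
# mismatch does not count against the score.
scores = Scorer.score_spans([eg], attr="my_spans", getter=span_getter, labeled=False)
print(scores["my_spans_f"])  # 1.0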
@@ -223,7 +223,7 @@ cdef class StringStore:
        it doesn't exist. Paths may be either strings or Path-like objects.
        """
        path = util.ensure_path(path)
        strings = list(self)
        strings = sorted(self)
        srsly.write_json(path, strings)

    def from_disk(self, path):

@@ -247,7 +247,7 @@ cdef class StringStore:

        RETURNS (bytes): The serialized form of the `StringStore` object.
        """
        return srsly.json_dumps(list(self))
        return srsly.json_dumps(sorted(self))

    def from_bytes(self, bytes_data, **kwargs):
        """Load state from a binary string.
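Serializing a sorted list makes `to_bytes` and `to_disk` deterministic with respect to insertion order; a quick sketch, assuming the change above is in place:

from spacy.strings import StringStore

s1 = StringStore(strings=["apple", "orange"])
s2 = StringStore(strings=["orange", "apple"])
# Same contents, different insertion order: the serialized form is now identical
assert s1.to_bytes() == s2.to_bytes()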
@ -6,12 +6,14 @@ import logging
|
|||
import mock
|
||||
|
||||
from spacy.lang.xx import MultiLanguage
|
||||
from spacy.tokens import Doc, Span
|
||||
from spacy.tokens import Doc, Span, Token
|
||||
from spacy.vocab import Vocab
|
||||
from spacy.lexeme import Lexeme
|
||||
from spacy.lang.en import English
|
||||
from spacy.attrs import ENT_TYPE, ENT_IOB, SENT_START, HEAD, DEP, MORPH
|
||||
|
||||
from .test_underscore import clean_underscore # noqa: F401
|
||||
|
||||
|
||||
def test_doc_api_init(en_vocab):
|
||||
words = ["a", "b", "c", "d"]
|
||||
|
@ -347,15 +349,19 @@ def test_doc_from_array_morph(en_vocab):
|
|||
assert [str(t.morph) for t in doc] == [str(t.morph) for t in new_doc]
|
||||
|
||||
|
||||
@pytest.mark.usefixtures("clean_underscore")
|
||||
def test_doc_api_from_docs(en_tokenizer, de_tokenizer):
|
||||
en_texts = ["Merging the docs is fun.", "", "They don't think alike."]
|
||||
en_texts_without_empty = [t for t in en_texts if len(t)]
|
||||
de_text = "Wie war die Frage?"
|
||||
en_docs = [en_tokenizer(text) for text in en_texts]
|
||||
docs_idx = en_texts[0].index("docs")
|
||||
en_docs[0].spans["group"] = [en_docs[0][1:4]]
|
||||
en_docs[2].spans["group"] = [en_docs[2][1:4]]
|
||||
span_group_texts = sorted([en_docs[0][1:4].text, en_docs[2][1:4].text])
|
||||
de_doc = de_tokenizer(de_text)
|
||||
expected = (True, None, None, None)
|
||||
en_docs[0].user_data[("._.", "is_ambiguous", docs_idx, None)] = expected
|
||||
Token.set_extension("is_ambiguous", default=False)
|
||||
en_docs[0][2]._.is_ambiguous = True # docs
|
||||
en_docs[2][3]._.is_ambiguous = True # think
|
||||
assert Doc.from_docs([]) is None
|
||||
assert de_doc is not Doc.from_docs([de_doc])
|
||||
assert str(de_doc) == str(Doc.from_docs([de_doc]))
|
||||
|
@ -372,11 +378,12 @@ def test_doc_api_from_docs(en_tokenizer, de_tokenizer):
|
|||
en_docs_tokens = [t for doc in en_docs for t in doc]
|
||||
assert len(m_doc) == len(en_docs_tokens)
|
||||
think_idx = len(en_texts[0]) + 1 + en_texts[2].index("think")
|
||||
assert m_doc[2]._.is_ambiguous == True
|
||||
assert m_doc[9].idx == think_idx
|
||||
with pytest.raises(AttributeError):
|
||||
# not callable, because it was not set via set_extension
|
||||
m_doc[2]._.is_ambiguous
|
||||
assert len(m_doc.user_data) == len(en_docs[0].user_data) # but it's there
|
||||
assert m_doc[9]._.is_ambiguous == True
|
||||
assert not any([t._.is_ambiguous for t in m_doc[3:8]])
|
||||
assert "group" in m_doc.spans
|
||||
assert span_group_texts == sorted([s.text for s in m_doc.spans["group"]])
|
||||
|
||||
m_doc = Doc.from_docs(en_docs, ensure_whitespace=False)
|
||||
assert len(en_texts_without_empty) == len(list(m_doc.sents))
|
||||
|
@ -388,6 +395,8 @@ def test_doc_api_from_docs(en_tokenizer, de_tokenizer):
|
|||
assert len(m_doc) == len(en_docs_tokens)
|
||||
think_idx = len(en_texts[0]) + 0 + en_texts[2].index("think")
|
||||
assert m_doc[9].idx == think_idx
|
||||
assert "group" in m_doc.spans
|
||||
assert span_group_texts == sorted([s.text for s in m_doc.spans["group"]])
|
||||
|
||||
m_doc = Doc.from_docs(en_docs, attrs=["lemma", "length", "pos"])
|
||||
assert len(str(m_doc)) > len(en_texts[0]) + len(en_texts[1])
|
||||
|
@ -399,6 +408,8 @@ def test_doc_api_from_docs(en_tokenizer, de_tokenizer):
|
|||
assert len(m_doc) == len(en_docs_tokens)
|
||||
think_idx = len(en_texts[0]) + 1 + en_texts[2].index("think")
|
||||
assert m_doc[9].idx == think_idx
|
||||
assert "group" in m_doc.spans
|
||||
assert span_group_texts == sorted([s.text for s in m_doc.spans["group"]])
|
||||
|
||||
|
||||
def test_doc_api_from_docs_ents(en_tokenizer):
|
||||
|
|
|
@ -452,3 +452,30 @@ def test_retokenize_disallow_zero_length(en_vocab):
|
|||
with pytest.raises(ValueError):
|
||||
with doc.retokenize() as retokenizer:
|
||||
retokenizer.merge(doc[1:1])
|
||||
|
||||
|
||||
def test_doc_retokenize_merge_without_parse_keeps_sents(en_tokenizer):
|
||||
text = "displaCy is a parse tool built with Javascript"
|
||||
sent_starts = [1, 0, 0, 0, 1, 0, 0, 0]
|
||||
tokens = en_tokenizer(text)
|
||||
|
||||
# merging within a sentence keeps all sentence boundaries
|
||||
doc = Doc(tokens.vocab, words=[t.text for t in tokens], sent_starts=sent_starts)
|
||||
assert len(list(doc.sents)) == 2
|
||||
with doc.retokenize() as retokenizer:
|
||||
retokenizer.merge(doc[1:3])
|
||||
assert len(list(doc.sents)) == 2
|
||||
|
||||
# merging over a sentence boundary unsets it by default
|
||||
doc = Doc(tokens.vocab, words=[t.text for t in tokens], sent_starts=sent_starts)
|
||||
assert len(list(doc.sents)) == 2
|
||||
with doc.retokenize() as retokenizer:
|
||||
retokenizer.merge(doc[3:6])
|
||||
assert doc[3].is_sent_start == None
|
||||
|
||||
# merging over a sentence boundary and setting sent_start
|
||||
doc = Doc(tokens.vocab, words=[t.text for t in tokens], sent_starts=sent_starts)
|
||||
assert len(list(doc.sents)) == 2
|
||||
with doc.retokenize() as retokenizer:
|
||||
retokenizer.merge(doc[3:6], attrs={"sent_start": True})
|
||||
assert len(list(doc.sents)) == 2
|
||||
|
|
|
@ -1,9 +1,11 @@
|
|||
import pytest
|
||||
from spacy.attrs import ORTH, LENGTH
|
||||
from spacy.tokens import Doc, Span
|
||||
from spacy.tokens import Doc, Span, Token
|
||||
from spacy.vocab import Vocab
|
||||
from spacy.util import filter_spans
|
||||
|
||||
from .test_underscore import clean_underscore # noqa: F401
|
||||
|
||||
|
||||
@pytest.fixture
|
||||
def doc(en_tokenizer):
|
||||
|
@ -219,11 +221,14 @@ def test_span_as_doc(doc):
|
|||
assert span_doc[0].idx == 0
|
||||
|
||||
|
||||
@pytest.mark.usefixtures("clean_underscore")
|
||||
def test_span_as_doc_user_data(doc):
|
||||
"""Test that the user_data can be preserved (but not by default). """
|
||||
my_key = "my_info"
|
||||
my_value = 342
|
||||
doc.user_data[my_key] = my_value
|
||||
Token.set_extension("is_x", default=False)
|
||||
doc[7]._.is_x = True
|
||||
|
||||
span = doc[4:10]
|
||||
span_doc_with = span.as_doc(copy_user_data=True)
|
||||
|
@ -232,6 +237,12 @@ def test_span_as_doc_user_data(doc):
|
|||
assert doc.user_data.get(my_key, None) is my_value
|
||||
assert span_doc_with.user_data.get(my_key, None) is my_value
|
||||
assert span_doc_without.user_data.get(my_key, None) is None
|
||||
for i in range(len(span_doc_with)):
|
||||
if i != 3:
|
||||
assert span_doc_with[i]._.is_x is False
|
||||
else:
|
||||
assert span_doc_with[i]._.is_x is True
|
||||
assert not any([t._.is_x for t in span_doc_without])
|
||||
|
||||
|
||||
def test_span_string_label_kb_id(doc):
|
||||
|
|
3
spacy/tests/enable_gpu.py
Normal file

@@ -0,0 +1,3 @@
from spacy import require_gpu

require_gpu()
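The new test helper is just a module whose import activates the GPU: `require_gpu()` switches the current ops to the GPU backend and raises a `ValueError` when no GPU (or cupy) is available, as exercised in `test_require_gpu` further down. The same call can be made directly:

import spacy

# Raises ValueError on a CPU-only machine, so GPU-only test runs fail fast.
spacy.require_gpu()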
@ -4,7 +4,9 @@ import re
|
|||
import copy
|
||||
from mock import Mock
|
||||
from spacy.matcher import DependencyMatcher
|
||||
from spacy.tokens import Doc
|
||||
from spacy.tokens import Doc, Token
|
||||
|
||||
from ..doc.test_underscore import clean_underscore # noqa: F401
|
||||
|
||||
|
||||
@pytest.fixture
|
||||
|
@ -344,3 +346,26 @@ def test_dependency_matcher_long_matches(en_vocab, doc):
|
|||
matcher = DependencyMatcher(en_vocab)
|
||||
with pytest.raises(ValueError):
|
||||
matcher.add("pattern", [pattern])
|
||||
|
||||
|
||||
@pytest.mark.usefixtures("clean_underscore")
|
||||
def test_dependency_matcher_span_user_data(en_tokenizer):
|
||||
doc = en_tokenizer("a b c d e")
|
||||
for token in doc:
|
||||
token.head = doc[0]
|
||||
token.dep_ = "a"
|
||||
get_is_c = lambda token: token.text in ("c",)
|
||||
Token.set_extension("is_c", default=False)
|
||||
doc[2]._.is_c = True
|
||||
pattern = [
|
||||
{"RIGHT_ID": "c", "RIGHT_ATTRS": {"_": {"is_c": True}}},
|
||||
]
|
||||
matcher = DependencyMatcher(en_tokenizer.vocab)
|
||||
matcher.add("C", [pattern])
|
||||
doc_matches = matcher(doc)
|
||||
offset = 1
|
||||
span_matches = matcher(doc[offset:])
|
||||
for doc_match, span_match in zip(sorted(doc_matches), sorted(span_matches)):
|
||||
assert doc_match[0] == span_match[0]
|
||||
for doc_t_i, span_t_i in zip(doc_match[1], span_match[1]):
|
||||
assert doc_t_i == span_t_i + offset
|
||||
|
|
|
@ -204,3 +204,90 @@ def test_matcher_remove():
|
|||
# removing again should throw an error
|
||||
with pytest.raises(ValueError):
|
||||
matcher.remove("Rule")
|
||||
|
||||
|
||||
def test_matcher_with_alignments_greedy_longest(en_vocab):
|
||||
cases = [
|
||||
("aaab", "a* b", [0, 0, 0, 1]),
|
||||
("baab", "b a* b", [0, 1, 1, 2]),
|
||||
("aaab", "a a a b", [0, 1, 2, 3]),
|
||||
("aaab", "a+ b", [0, 0, 0, 1]),
|
||||
("aaba", "a+ b a+", [0, 0, 1, 2]),
|
||||
("aabaa", "a+ b a+", [0, 0, 1, 2, 2]),
|
||||
("aaba", "a+ b a*", [0, 0, 1, 2]),
|
||||
("aaaa", "a*", [0, 0, 0, 0]),
|
||||
("baab", "b a* b b*", [0, 1, 1, 2]),
|
||||
("aabb", "a* b* a*", [0, 0, 1, 1]),
|
||||
("aaab", "a+ a+ a b", [0, 1, 2, 3]),
|
||||
("aaab", "a+ a+ a+ b", [0, 1, 2, 3]),
|
||||
("aaab", "a+ a a b", [0, 1, 2, 3]),
|
||||
("aaab", "a+ a a", [0, 1, 2]),
|
||||
("aaab", "a+ a a?", [0, 1, 2]),
|
||||
("aaaa", "a a a a a?", [0, 1, 2, 3]),
|
||||
("aaab", "a+ a b", [0, 0, 1, 2]),
|
||||
("aaab", "a+ a+ b", [0, 0, 1, 2]),
|
||||
]
|
||||
for string, pattern_str, result in cases:
|
||||
matcher = Matcher(en_vocab)
|
||||
doc = Doc(matcher.vocab, words=list(string))
|
||||
pattern = []
|
||||
for part in pattern_str.split():
|
||||
if part.endswith("+"):
|
||||
pattern.append({"ORTH": part[0], "OP": "+"})
|
||||
elif part.endswith("*"):
|
||||
pattern.append({"ORTH": part[0], "OP": "*"})
|
||||
elif part.endswith("?"):
|
||||
pattern.append({"ORTH": part[0], "OP": "?"})
|
||||
else:
|
||||
pattern.append({"ORTH": part})
|
||||
matcher.add("PATTERN", [pattern], greedy="LONGEST")
|
||||
matches = matcher(doc, with_alignments=True)
|
||||
n_matches = len(matches)
|
||||
|
||||
_, s, e, expected = matches[0]
|
||||
|
||||
assert expected == result, (string, pattern_str, s, e, n_matches)
|
||||
|
||||
|
||||
def test_matcher_with_alignments_nongreedy(en_vocab):
|
||||
cases = [
|
||||
(0, "aaab", "a* b", [[0, 1], [0, 0, 1], [0, 0, 0, 1], [1]]),
|
||||
(1, "baab", "b a* b", [[0, 1, 1, 2]]),
|
||||
(2, "aaab", "a a a b", [[0, 1, 2, 3]]),
|
||||
(3, "aaab", "a+ b", [[0, 1], [0, 0, 1], [0, 0, 0, 1]]),
|
||||
(4, "aaba", "a+ b a+", [[0, 1, 2], [0, 0, 1, 2]]),
|
||||
(5, "aabaa", "a+ b a+", [[0, 1, 2], [0, 0, 1, 2], [0, 0, 1, 2, 2], [0, 1, 2, 2] ]),
|
||||
(6, "aaba", "a+ b a*", [[0, 1], [0, 0, 1], [0, 0, 1, 2], [0, 1, 2]]),
|
||||
(7, "aaaa", "a*", [[0], [0, 0], [0, 0, 0], [0, 0, 0, 0]]),
|
||||
(8, "baab", "b a* b b*", [[0, 1, 1, 2]]),
|
||||
(9, "aabb", "a* b* a*", [[1], [2], [2, 2], [0, 1], [0, 0, 1], [0, 0, 1, 1], [0, 1, 1], [1, 1]]),
|
||||
(10, "aaab", "a+ a+ a b", [[0, 1, 2, 3]]),
|
||||
(11, "aaab", "a+ a+ a+ b", [[0, 1, 2, 3]]),
|
||||
(12, "aaab", "a+ a a b", [[0, 1, 2, 3]]),
|
||||
(13, "aaab", "a+ a a", [[0, 1, 2]]),
|
||||
(14, "aaab", "a+ a a?", [[0, 1], [0, 1, 2]]),
|
||||
(15, "aaaa", "a a a a a?", [[0, 1, 2, 3]]),
|
||||
(16, "aaab", "a+ a b", [[0, 1, 2], [0, 0, 1, 2]]),
|
||||
(17, "aaab", "a+ a+ b", [[0, 1, 2], [0, 0, 1, 2]]),
|
||||
]
|
||||
for case_id, string, pattern_str, results in cases:
|
||||
matcher = Matcher(en_vocab)
|
||||
doc = Doc(matcher.vocab, words=list(string))
|
||||
pattern = []
|
||||
for part in pattern_str.split():
|
||||
if part.endswith("+"):
|
||||
pattern.append({"ORTH": part[0], "OP": "+"})
|
||||
elif part.endswith("*"):
|
||||
pattern.append({"ORTH": part[0], "OP": "*"})
|
||||
elif part.endswith("?"):
|
||||
pattern.append({"ORTH": part[0], "OP": "?"})
|
||||
else:
|
||||
pattern.append({"ORTH": part})
|
||||
|
||||
matcher.add("PATTERN", [pattern])
|
||||
matches = matcher(doc, with_alignments=True)
|
||||
n_matches = len(matches)
|
||||
|
||||
for _, s, e, expected in matches:
|
||||
assert expected in results, (case_id, string, pattern_str, s, e, n_matches)
|
||||
assert len(expected) == e - s
|
||||
|
|
|
@ -5,6 +5,7 @@ from spacy.tokens import Span
|
|||
from spacy.language import Language
|
||||
from spacy.pipeline import EntityRuler
|
||||
from spacy.errors import MatchPatternError
|
||||
from thinc.api import NumpyOps, get_current_ops
|
||||
|
||||
|
||||
@pytest.fixture
|
||||
|
@ -201,13 +202,14 @@ def test_entity_ruler_overlapping_spans(nlp):
|
|||
|
||||
@pytest.mark.parametrize("n_process", [1, 2])
|
||||
def test_entity_ruler_multiprocessing(nlp, n_process):
|
||||
texts = ["I enjoy eating Pizza Hut pizza."]
|
||||
if isinstance(get_current_ops(), NumpyOps) or n_process < 2:
|
||||
texts = ["I enjoy eating Pizza Hut pizza."]
|
||||
|
||||
patterns = [{"label": "FASTFOOD", "pattern": "Pizza Hut", "id": "1234"}]
|
||||
patterns = [{"label": "FASTFOOD", "pattern": "Pizza Hut", "id": "1234"}]
|
||||
|
||||
ruler = nlp.add_pipe("entity_ruler")
|
||||
ruler.add_patterns(patterns)
|
||||
ruler = nlp.add_pipe("entity_ruler")
|
||||
ruler.add_patterns(patterns)
|
||||
|
||||
for doc in nlp.pipe(texts, n_process=2):
|
||||
for ent in doc.ents:
|
||||
assert ent.ent_id_ == "1234"
|
||||
for doc in nlp.pipe(texts, n_process=2):
|
||||
for ent in doc.ents:
|
||||
assert ent.ent_id_ == "1234"
|
||||
|
|
|
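Several tests in this commit gate their multiprocessing branches the same way: the body only runs when the current backend is `NumpyOps` (or when `n_process` is 1), because `nlp.pipe(..., n_process=2)` is not exercised on GPU ops. Note that the check has to call the function, `isinstance(get_current_ops(), NumpyOps)`; comparing the bare `get_current_ops` function object against `NumpyOps` is always false. The recurring idiom, roughly (the helper name is illustrative):

from thinc.api import NumpyOps, get_current_ops

def run_pipe_maybe_multiprocessing(nlp, texts, n_process):
    # Only fan out to multiple processes on the CPU backend
    if isinstance(get_current_ops(), NumpyOps) or n_process < 2:
        return list(nlp.pipe(texts, n_process=n_process))
    return []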
@ -1,6 +1,7 @@
|
|||
import pytest
|
||||
import logging
|
||||
import mock
|
||||
import pickle
|
||||
from spacy import util, registry
|
||||
from spacy.lang.en import English
|
||||
from spacy.lookups import Lookups
|
||||
|
@ -106,6 +107,9 @@ def test_lemmatizer_serialize(nlp):
|
|||
doc2 = nlp2.make_doc("coping")
|
||||
doc2[0].pos_ = "VERB"
|
||||
assert doc2[0].lemma_ == ""
|
||||
doc2 = lemmatizer(doc2)
|
||||
doc2 = lemmatizer2(doc2)
|
||||
assert doc2[0].text == "coping"
|
||||
assert doc2[0].lemma_ == "cope"
|
||||
|
||||
# Make sure that lemmatizer cache can be pickled
|
||||
b = pickle.dumps(lemmatizer2)
|
||||
|
|
|
@ -4,7 +4,7 @@ import numpy
|
|||
import pytest
|
||||
from numpy.testing import assert_almost_equal
|
||||
from spacy.vocab import Vocab
|
||||
from thinc.api import NumpyOps, Model, data_validation
|
||||
from thinc.api import Model, data_validation, get_current_ops
|
||||
from thinc.types import Array2d, Ragged
|
||||
|
||||
from spacy.lang.en import English
|
||||
|
@ -13,7 +13,7 @@ from spacy.ml._character_embed import CharacterEmbed
|
|||
from spacy.tokens import Doc
|
||||
|
||||
|
||||
OPS = NumpyOps()
|
||||
OPS = get_current_ops()
|
||||
|
||||
texts = ["These are 4 words", "Here just three"]
|
||||
l0 = [[1, 2], [3, 4], [5, 6], [7, 8]]
|
||||
|
@ -82,7 +82,7 @@ def util_batch_unbatch_docs_list(
|
|||
Y_batched = model.predict(in_data)
|
||||
Y_not_batched = [model.predict([u])[0] for u in in_data]
|
||||
for i in range(len(Y_batched)):
|
||||
assert_almost_equal(Y_batched[i], Y_not_batched[i], decimal=4)
|
||||
assert_almost_equal(OPS.to_numpy(Y_batched[i]), OPS.to_numpy(Y_not_batched[i]), decimal=4)
|
||||
|
||||
|
||||
def util_batch_unbatch_docs_array(
|
||||
|
@ -91,7 +91,7 @@ def util_batch_unbatch_docs_array(
|
|||
with data_validation(True):
|
||||
model.initialize(in_data, out_data)
|
||||
Y_batched = model.predict(in_data).tolist()
|
||||
Y_not_batched = [model.predict([u])[0] for u in in_data]
|
||||
Y_not_batched = [model.predict([u])[0].tolist() for u in in_data]
|
||||
assert_almost_equal(Y_batched, Y_not_batched, decimal=4)
|
||||
|
||||
|
||||
|
@ -100,8 +100,8 @@ def util_batch_unbatch_docs_ragged(
|
|||
):
|
||||
with data_validation(True):
|
||||
model.initialize(in_data, out_data)
|
||||
Y_batched = model.predict(in_data)
|
||||
Y_batched = model.predict(in_data).data.tolist()
|
||||
Y_not_batched = []
|
||||
for u in in_data:
|
||||
Y_not_batched.extend(model.predict([u]).data.tolist())
|
||||
assert_almost_equal(Y_batched.data, Y_not_batched, decimal=4)
|
||||
assert_almost_equal(Y_batched, Y_not_batched, decimal=4)
|
||||
|
|
|
@ -1,4 +1,6 @@
|
|||
import pytest
|
||||
import mock
|
||||
import logging
|
||||
from spacy.language import Language
|
||||
from spacy.lang.en import English
|
||||
from spacy.lang.de import German
|
||||
|
@ -402,6 +404,38 @@ def test_pipe_factories_from_source():
|
|||
nlp.add_pipe("custom", source=source_nlp)
|
||||
|
||||
|
||||
def test_pipe_factories_from_source_language_subclass():
|
||||
class CustomEnglishDefaults(English.Defaults):
|
||||
stop_words = set(["custom", "stop"])
|
||||
|
||||
@registry.languages("custom_en")
|
||||
class CustomEnglish(English):
|
||||
lang = "custom_en"
|
||||
Defaults = CustomEnglishDefaults
|
||||
|
||||
source_nlp = English()
|
||||
source_nlp.add_pipe("tagger")
|
||||
|
||||
# custom subclass
|
||||
nlp = CustomEnglish()
|
||||
nlp.add_pipe("tagger", source=source_nlp)
|
||||
assert "tagger" in nlp.pipe_names
|
||||
|
||||
# non-subclass
|
||||
nlp = German()
|
||||
nlp.add_pipe("tagger", source=source_nlp)
|
||||
assert "tagger" in nlp.pipe_names
|
||||
|
||||
# mismatched vectors
|
||||
nlp = English()
|
||||
nlp.vocab.vectors.resize((1, 4))
|
||||
nlp.vocab.vectors.add("cat", vector=[1, 2, 3, 4])
|
||||
logger = logging.getLogger("spacy")
|
||||
with mock.patch.object(logger, "warning") as mock_warning:
|
||||
nlp.add_pipe("tagger", source=source_nlp)
|
||||
mock_warning.assert_called()
|
||||
|
||||
|
||||
def test_pipe_factories_from_source_custom():
|
||||
"""Test adding components from a source model with custom components."""
|
||||
name = "test_pipe_factories_from_source_custom"
|
||||
|
|
|
@ -1,7 +1,7 @@
|
|||
import pytest
|
||||
import random
|
||||
import numpy.random
|
||||
from numpy.testing import assert_equal
|
||||
from numpy.testing import assert_almost_equal
|
||||
from thinc.api import fix_random_seed
|
||||
from spacy import util
|
||||
from spacy.lang.en import English
|
||||
|
@ -222,8 +222,12 @@ def test_overfitting_IO():
|
|||
batch_cats_1 = [doc.cats for doc in nlp.pipe(texts)]
|
||||
batch_cats_2 = [doc.cats for doc in nlp.pipe(texts)]
|
||||
no_batch_cats = [doc.cats for doc in [nlp(text) for text in texts]]
|
||||
assert_equal(batch_cats_1, batch_cats_2)
|
||||
assert_equal(batch_cats_1, no_batch_cats)
|
||||
for cats_1, cats_2 in zip(batch_cats_1, batch_cats_2):
|
||||
for cat in cats_1:
|
||||
assert_almost_equal(cats_1[cat], cats_2[cat], decimal=5)
|
||||
for cats_1, cats_2 in zip(batch_cats_1, no_batch_cats):
|
||||
for cat in cats_1:
|
||||
assert_almost_equal(cats_1[cat], cats_2[cat], decimal=5)
|
||||
|
||||
|
||||
def test_overfitting_IO_multi():
|
||||
|
@ -270,8 +274,12 @@ def test_overfitting_IO_multi():
|
|||
batch_deps_1 = [doc.cats for doc in nlp.pipe(texts)]
|
||||
batch_deps_2 = [doc.cats for doc in nlp.pipe(texts)]
|
||||
no_batch_deps = [doc.cats for doc in [nlp(text) for text in texts]]
|
||||
assert_equal(batch_deps_1, batch_deps_2)
|
||||
assert_equal(batch_deps_1, no_batch_deps)
|
||||
for cats_1, cats_2 in zip(batch_deps_1, batch_deps_2):
|
||||
for cat in cats_1:
|
||||
assert_almost_equal(cats_1[cat], cats_2[cat], decimal=5)
|
||||
for cats_1, cats_2 in zip(batch_deps_1, no_batch_deps):
|
||||
for cat in cats_1:
|
||||
assert_almost_equal(cats_1[cat], cats_2[cat], decimal=5)
|
||||
|
||||
|
||||
# fmt: off
|
||||
|
|
|
@ -8,8 +8,8 @@ from spacy.tokens import Doc
|
|||
from spacy.training import Example
|
||||
from spacy import util
|
||||
from spacy.lang.en import English
|
||||
from thinc.api import Config
|
||||
from numpy.testing import assert_equal
|
||||
from thinc.api import Config, get_current_ops
|
||||
from numpy.testing import assert_array_equal
|
||||
|
||||
from ..util import get_batch, make_tempdir
|
||||
|
||||
|
@ -160,7 +160,8 @@ def test_tok2vec_listener():
|
|||
|
||||
doc = nlp("Running the pipeline as a whole.")
|
||||
doc_tensor = tagger_tok2vec.predict([doc])[0]
|
||||
assert_equal(doc.tensor, doc_tensor)
|
||||
ops = get_current_ops()
|
||||
assert_array_equal(ops.to_numpy(doc.tensor), ops.to_numpy(doc_tensor))
|
||||
|
||||
# TODO: should this warn or error?
|
||||
nlp.select_pipes(disable="tok2vec")
|
||||
|
|
|
@ -9,6 +9,7 @@ from spacy.language import Language
|
|||
from spacy.util import ensure_path, load_model_from_path
|
||||
import numpy
|
||||
import pickle
|
||||
from thinc.api import NumpyOps, get_current_ops
|
||||
|
||||
from ..util import make_tempdir
|
||||
|
||||
|
@ -169,21 +170,22 @@ def test_issue4725_1():
|
|||
|
||||
|
||||
def test_issue4725_2():
|
||||
# ensures that this runs correctly and doesn't hang or crash because of the global vectors
|
||||
# if it does crash, it's usually because of calling 'spawn' for multiprocessing (e.g. on Windows),
|
||||
# or because of issues with pickling the NER (cf test_issue4725_1)
|
||||
vocab = Vocab(vectors_name="test_vocab_add_vector")
|
||||
data = numpy.ndarray((5, 3), dtype="f")
|
||||
data[0] = 1.0
|
||||
data[1] = 2.0
|
||||
vocab.set_vector("cat", data[0])
|
||||
vocab.set_vector("dog", data[1])
|
||||
nlp = English(vocab=vocab)
|
||||
nlp.add_pipe("ner")
|
||||
nlp.initialize()
|
||||
docs = ["Kurt is in London."] * 10
|
||||
for _ in nlp.pipe(docs, batch_size=2, n_process=2):
|
||||
pass
|
||||
if isinstance(get_current_ops(), NumpyOps):
|
||||
# ensures that this runs correctly and doesn't hang or crash because of the global vectors
|
||||
# if it does crash, it's usually because of calling 'spawn' for multiprocessing (e.g. on Windows),
|
||||
# or because of issues with pickling the NER (cf test_issue4725_1)
|
||||
vocab = Vocab(vectors_name="test_vocab_add_vector")
|
||||
data = numpy.ndarray((5, 3), dtype="f")
|
||||
data[0] = 1.0
|
||||
data[1] = 2.0
|
||||
vocab.set_vector("cat", data[0])
|
||||
vocab.set_vector("dog", data[1])
|
||||
nlp = English(vocab=vocab)
|
||||
nlp.add_pipe("ner")
|
||||
nlp.initialize()
|
||||
docs = ["Kurt is in London."] * 10
|
||||
for _ in nlp.pipe(docs, batch_size=2, n_process=2):
|
||||
pass
|
||||
|
||||
|
||||
def test_issue4849():
|
||||
|
@ -204,10 +206,11 @@ def test_issue4849():
|
|||
count_ents += len([ent for ent in doc.ents if ent.ent_id > 0])
|
||||
assert count_ents == 2
|
||||
# USING 2 PROCESSES
|
||||
count_ents = 0
|
||||
for doc in nlp.pipe([text], n_process=2):
|
||||
count_ents += len([ent for ent in doc.ents if ent.ent_id > 0])
|
||||
assert count_ents == 2
|
||||
if isinstance(get_current_ops(), NumpyOps):
|
||||
count_ents = 0
|
||||
for doc in nlp.pipe([text], n_process=2):
|
||||
count_ents += len([ent for ent in doc.ents if ent.ent_id > 0])
|
||||
assert count_ents == 2
|
||||
|
||||
|
||||
@Language.factory("my_pipe")
|
||||
|
@ -239,10 +242,11 @@ def test_issue4903():
|
|||
nlp.add_pipe("sentencizer")
|
||||
nlp.add_pipe("my_pipe", after="sentencizer")
|
||||
text = ["I like bananas.", "Do you like them?", "No, I prefer wasabi."]
|
||||
docs = list(nlp.pipe(text, n_process=2))
|
||||
assert docs[0].text == "I like bananas."
|
||||
assert docs[1].text == "Do you like them?"
|
||||
assert docs[2].text == "No, I prefer wasabi."
|
||||
if isinstance(get_current_ops(), NumpyOps):
|
||||
docs = list(nlp.pipe(text, n_process=2))
|
||||
assert docs[0].text == "I like bananas."
|
||||
assert docs[1].text == "Do you like them?"
|
||||
assert docs[2].text == "No, I prefer wasabi."
|
||||
|
||||
|
||||
def test_issue4924():
|
||||
|
|
|
@ -6,6 +6,7 @@ from spacy.language import Language
|
|||
from spacy.lang.en.syntax_iterators import noun_chunks
|
||||
from spacy.vocab import Vocab
|
||||
import spacy
|
||||
from thinc.api import get_current_ops
|
||||
import pytest
|
||||
|
||||
from ...util import make_tempdir
|
||||
|
@ -54,16 +55,17 @@ def test_issue5082():
|
|||
ruler.add_patterns(patterns)
|
||||
parsed_vectors_1 = [t.vector for t in nlp(text)]
|
||||
assert len(parsed_vectors_1) == 4
|
||||
numpy.testing.assert_array_equal(parsed_vectors_1[0], array1)
|
||||
numpy.testing.assert_array_equal(parsed_vectors_1[1], array2)
|
||||
numpy.testing.assert_array_equal(parsed_vectors_1[2], array3)
|
||||
numpy.testing.assert_array_equal(parsed_vectors_1[3], array4)
|
||||
ops = get_current_ops()
|
||||
numpy.testing.assert_array_equal(ops.to_numpy(parsed_vectors_1[0]), array1)
|
||||
numpy.testing.assert_array_equal(ops.to_numpy(parsed_vectors_1[1]), array2)
|
||||
numpy.testing.assert_array_equal(ops.to_numpy(parsed_vectors_1[2]), array3)
|
||||
numpy.testing.assert_array_equal(ops.to_numpy(parsed_vectors_1[3]), array4)
|
||||
nlp.add_pipe("merge_entities")
|
||||
parsed_vectors_2 = [t.vector for t in nlp(text)]
|
||||
assert len(parsed_vectors_2) == 3
|
||||
numpy.testing.assert_array_equal(parsed_vectors_2[0], array1)
|
||||
numpy.testing.assert_array_equal(parsed_vectors_2[1], array2)
|
||||
numpy.testing.assert_array_equal(parsed_vectors_2[2], array34)
|
||||
numpy.testing.assert_array_equal(ops.to_numpy(parsed_vectors_2[0]), array1)
|
||||
numpy.testing.assert_array_equal(ops.to_numpy(parsed_vectors_2[1]), array2)
|
||||
numpy.testing.assert_array_equal(ops.to_numpy(parsed_vectors_2[2]), array34)
|
||||
|
||||
|
||||
def test_issue5137():
|
||||
|
|
|
@ -1,5 +1,6 @@
|
|||
import pytest
|
||||
from thinc.api import Config, fix_random_seed
|
||||
from numpy.testing import assert_almost_equal
|
||||
from thinc.api import Config, fix_random_seed, get_current_ops
|
||||
|
||||
from spacy.lang.en import English
|
||||
from spacy.pipeline.textcat import single_label_default_config, single_label_bow_config
|
||||
|
@ -44,11 +45,12 @@ def test_issue5551(textcat_config):
|
|||
nlp.update([Example.from_dict(doc, annots)])
|
||||
# Store the result of each iteration
|
||||
result = pipe.model.predict([doc])
|
||||
results.append(list(result[0]))
|
||||
results.append(result[0])
|
||||
# All results should be the same because of the fixed seed
|
||||
assert len(results) == 3
|
||||
assert results[0] == results[1]
|
||||
assert results[0] == results[2]
|
||||
ops = get_current_ops()
|
||||
assert_almost_equal(ops.to_numpy(results[0]), ops.to_numpy(results[1]))
|
||||
assert_almost_equal(ops.to_numpy(results[0]), ops.to_numpy(results[2]))
|
||||
|
||||
|
||||
def test_issue5838():
|
||||
|
|
|
@ -1,4 +1,6 @@
|
|||
from spacy.kb import KnowledgeBase
|
||||
from spacy.lang.en import English
|
||||
from spacy.training import Example
|
||||
|
||||
|
||||
def test_issue7065():
|
||||
|
@ -16,3 +18,58 @@ def test_issue7065():
|
|||
ent = doc.ents[0]
|
||||
assert ent.start < sent0.end < ent.end
|
||||
assert sentences.index(ent.sent) == 0
|
||||
|
||||
|
||||
def test_issue7065_b():
|
||||
# Test that the NEL doesn't crash when an entity crosses a sentence boundary
|
||||
nlp = English()
|
||||
vector_length = 3
|
||||
nlp.add_pipe("sentencizer")
|
||||
|
||||
text = "Mahler 's Symphony No. 8 was beautiful."
|
||||
entities = [(0, 6, "PERSON"), (10, 24, "WORK")]
|
||||
links = {(0, 6): {"Q7304": 1.0, "Q270853": 0.0},
|
||||
(10, 24): {"Q7304": 0.0, "Q270853": 1.0}}
|
||||
sent_starts = [1, -1, 0, 0, 0, 0, 0, 0, 0]
|
||||
doc = nlp(text)
|
||||
example = Example.from_dict(doc, {"entities": entities, "links": links, "sent_starts": sent_starts})
|
||||
train_examples = [example]
|
||||
|
||||
def create_kb(vocab):
|
||||
# create artificial KB
|
||||
mykb = KnowledgeBase(vocab, entity_vector_length=vector_length)
|
||||
mykb.add_entity(entity="Q270853", freq=12, entity_vector=[9, 1, -7])
|
||||
mykb.add_alias(
|
||||
alias="No. 8",
|
||||
entities=["Q270853"],
|
||||
probabilities=[1.0],
|
||||
)
|
||||
mykb.add_entity(entity="Q7304", freq=12, entity_vector=[6, -4, 3])
|
||||
mykb.add_alias(
|
||||
alias="Mahler",
|
||||
entities=["Q7304"],
|
||||
probabilities=[1.0],
|
||||
)
|
||||
return mykb
|
||||
|
||||
# Create the Entity Linker component and add it to the pipeline
|
||||
entity_linker = nlp.add_pipe("entity_linker", last=True)
|
||||
entity_linker.set_kb(create_kb)
|
||||
|
||||
# train the NEL pipe
|
||||
optimizer = nlp.initialize(get_examples=lambda: train_examples)
|
||||
for i in range(2):
|
||||
losses = {}
|
||||
nlp.update(train_examples, sgd=optimizer, losses=losses)
|
||||
|
||||
# Add a custom rule-based component to mimic NER
|
||||
patterns = [
|
||||
{"label": "PERSON", "pattern": [{"LOWER": "mahler"}]},
|
||||
{"label": "WORK", "pattern": [{"LOWER": "symphony"}, {"LOWER": "no"}, {"LOWER": "."}, {"LOWER": "8"}]}
|
||||
]
|
||||
ruler = nlp.add_pipe("entity_ruler", before="entity_linker")
|
||||
ruler.add_patterns(patterns)
|
||||
|
||||
# test the trained model - this should not throw E148
|
||||
doc = nlp(text)
|
||||
assert doc
|
||||
|
|
|
@ -4,7 +4,7 @@ import spacy
|
|||
from spacy.lang.en import English
|
||||
from spacy.lang.de import German
|
||||
from spacy.language import Language, DEFAULT_CONFIG, DEFAULT_CONFIG_PRETRAIN_PATH
|
||||
from spacy.util import registry, load_model_from_config, load_config
|
||||
from spacy.util import registry, load_model_from_config, load_config, load_config_from_str
|
||||
from spacy.ml.models import build_Tok2Vec_model, build_tb_parser_model
|
||||
from spacy.ml.models import MultiHashEmbed, MaxoutWindowEncoder
|
||||
from spacy.schemas import ConfigSchema, ConfigSchemaPretrain
|
||||
|
@ -465,3 +465,32 @@ def test_config_only_resolve_relevant_blocks():
|
|||
nlp.initialize()
|
||||
nlp.config["initialize"]["lookups"] = None
|
||||
nlp.initialize()
|
||||
|
||||
|
||||
def test_hyphen_in_config():
|
||||
hyphen_config_str = """
|
||||
[nlp]
|
||||
lang = "en"
|
||||
pipeline = ["my_punctual_component"]
|
||||
|
||||
[components]
|
||||
|
||||
[components.my_punctual_component]
|
||||
factory = "my_punctual_component"
|
||||
punctuation = ["?","-"]
|
||||
"""
|
||||
|
||||
@spacy.Language.factory("my_punctual_component")
|
||||
class MyPunctualComponent(object):
|
||||
name = "my_punctual_component"
|
||||
|
||||
def __init__(
|
||||
self,
|
||||
nlp,
|
||||
name,
|
||||
punctuation,
|
||||
):
|
||||
self.punctuation = punctuation
|
||||
|
||||
nlp = English.from_config(load_config_from_str(hyphen_config_str))
|
||||
assert nlp.get_pipe("my_punctual_component").punctuation == ['?', '-']
|
||||
|
|
|
@ -26,10 +26,14 @@ def test_serialize_custom_tokenizer(en_vocab, en_tokenizer):
|
|||
assert tokenizer.rules != {}
|
||||
assert tokenizer.token_match is not None
|
||||
assert tokenizer.url_match is not None
|
||||
assert tokenizer.prefix_search is not None
|
||||
assert tokenizer.infix_finditer is not None
|
||||
tokenizer.from_bytes(tokenizer_bytes)
|
||||
assert tokenizer.rules == {}
|
||||
assert tokenizer.token_match is None
|
||||
assert tokenizer.url_match is None
|
||||
assert tokenizer.prefix_search is None
|
||||
assert tokenizer.infix_finditer is None
|
||||
|
||||
tokenizer = Tokenizer(en_vocab, rules={"ABC.": [{"ORTH": "ABC"}, {"ORTH": "."}]})
|
||||
tokenizer.rules = {}
|
||||
|
|
|
@ -49,9 +49,9 @@ def test_serialize_vocab_roundtrip_disk(strings1, strings2):
|
|||
vocab1_d = Vocab().from_disk(file_path1)
|
||||
vocab2_d = Vocab().from_disk(file_path2)
|
||||
# check strings rather than lexemes, which are only reloaded on demand
|
||||
assert strings1 == [s for s in vocab1_d.strings]
|
||||
assert strings2 == [s for s in vocab2_d.strings]
|
||||
if strings1 == strings2:
|
||||
assert set(strings1) == set([s for s in vocab1_d.strings])
|
||||
assert set(strings2) == set([s for s in vocab2_d.strings])
|
||||
if set(strings1) == set(strings2):
|
||||
assert [s for s in vocab1_d.strings] == [s for s in vocab2_d.strings]
|
||||
else:
|
||||
assert [s for s in vocab1_d.strings] != [s for s in vocab2_d.strings]
|
||||
|
@ -96,7 +96,7 @@ def test_serialize_stringstore_roundtrip_bytes(strings1, strings2):
|
|||
sstore2 = StringStore(strings=strings2)
|
||||
sstore1_b = sstore1.to_bytes()
|
||||
sstore2_b = sstore2.to_bytes()
|
||||
if strings1 == strings2:
|
||||
if set(strings1) == set(strings2):
|
||||
assert sstore1_b == sstore2_b
|
||||
else:
|
||||
assert sstore1_b != sstore2_b
|
||||
|
@ -104,7 +104,7 @@ def test_serialize_stringstore_roundtrip_bytes(strings1, strings2):
|
|||
assert sstore1.to_bytes() == sstore1_b
|
||||
new_sstore1 = StringStore().from_bytes(sstore1_b)
|
||||
assert new_sstore1.to_bytes() == sstore1_b
|
||||
assert list(new_sstore1) == strings1
|
||||
assert set(new_sstore1) == set(strings1)
|
||||
|
||||
|
||||
@pytest.mark.parametrize("strings1,strings2", test_strings)
|
||||
|
@ -118,12 +118,12 @@ def test_serialize_stringstore_roundtrip_disk(strings1, strings2):
|
|||
sstore2.to_disk(file_path2)
|
||||
sstore1_d = StringStore().from_disk(file_path1)
|
||||
sstore2_d = StringStore().from_disk(file_path2)
|
||||
assert list(sstore1_d) == list(sstore1)
|
||||
assert list(sstore2_d) == list(sstore2)
|
||||
if strings1 == strings2:
|
||||
assert list(sstore1_d) == list(sstore2_d)
|
||||
assert set(sstore1_d) == set(sstore1)
|
||||
assert set(sstore2_d) == set(sstore2)
|
||||
if set(strings1) == set(strings2):
|
||||
assert set(sstore1_d) == set(sstore2_d)
|
||||
else:
|
||||
assert list(sstore1_d) != list(sstore2_d)
|
||||
assert set(sstore1_d) != set(sstore2_d)
|
||||
|
||||
|
||||
@pytest.mark.parametrize("strings,lex_attr", test_strings_attrs)
|
||||
|
|
|
@ -307,8 +307,11 @@ def test_project_config_validation2(config, n_errors):
|
|||
assert len(errors) == n_errors
|
||||
|
||||
|
||||
def test_project_config_interpolation():
|
||||
variables = {"a": 10, "b": {"c": "foo", "d": True}}
|
||||
@pytest.mark.parametrize(
|
||||
"int_value", [10, pytest.param("10", marks=pytest.mark.xfail)],
|
||||
)
|
||||
def test_project_config_interpolation(int_value):
|
||||
variables = {"a": int_value, "b": {"c": "foo", "d": True}}
|
||||
commands = [
|
||||
{"name": "x", "script": ["hello ${vars.a} ${vars.b.c}"]},
|
||||
{"name": "y", "script": ["${vars.b.c} ${vars.b.d}"]},
|
||||
|
@ -317,6 +320,8 @@ def test_project_config_interpolation():
|
|||
with make_tempdir() as d:
|
||||
srsly.write_yaml(d / "project.yml", project)
|
||||
cfg = load_project_config(d)
|
||||
assert type(cfg) == dict
|
||||
assert type(cfg["commands"]) == list
|
||||
assert cfg["commands"][0]["script"][0] == "hello 10 foo"
|
||||
assert cfg["commands"][1]["script"][0] == "foo true"
|
||||
commands = [{"name": "x", "script": ["hello ${vars.a} ${vars.b.e}"]}]
|
||||
|
@ -325,6 +330,24 @@ def test_project_config_interpolation():
|
|||
substitute_project_variables(project)
|
||||
|
||||
|
||||
@pytest.mark.parametrize(
|
||||
"greeting", [342, "everyone", "tout le monde", pytest.param("42", marks=pytest.mark.xfail)],
|
||||
)
|
||||
def test_project_config_interpolation_override(greeting):
|
||||
variables = {"a": "world"}
|
||||
commands = [
|
||||
{"name": "x", "script": ["hello ${vars.a}"]},
|
||||
]
|
||||
overrides = {"vars.a": greeting}
|
||||
project = {"commands": commands, "vars": variables}
|
||||
with make_tempdir() as d:
|
||||
srsly.write_yaml(d / "project.yml", project)
|
||||
cfg = load_project_config(d, overrides=overrides)
|
||||
assert type(cfg) == dict
|
||||
assert type(cfg["commands"]) == list
|
||||
assert cfg["commands"][0]["script"][0] == f"hello {greeting}"
|
||||
|
||||
|
||||
def test_project_config_interpolation_env():
|
||||
variables = {"a": 10}
|
||||
env_var = "SPACY_TEST_FOO"
|
||||
|
|
|
@ -10,6 +10,7 @@ from spacy.lang.en import English
|
|||
from spacy.lang.de import German
|
||||
from spacy.util import registry, ignore_error, raise_error
|
||||
import spacy
|
||||
from thinc.api import NumpyOps, get_current_ops
|
||||
|
||||
from .util import add_vecs_to_vocab, assert_docs_equal
|
||||
|
||||
|
@ -142,25 +143,29 @@ def texts():
|
|||
|
||||
@pytest.mark.parametrize("n_process", [1, 2])
|
||||
def test_language_pipe(nlp2, n_process, texts):
|
||||
texts = texts * 10
|
||||
expecteds = [nlp2(text) for text in texts]
|
||||
docs = nlp2.pipe(texts, n_process=n_process, batch_size=2)
|
||||
ops = get_current_ops()
|
||||
if isinstance(ops, NumpyOps) or n_process < 2:
|
||||
texts = texts * 10
|
||||
expecteds = [nlp2(text) for text in texts]
|
||||
docs = nlp2.pipe(texts, n_process=n_process, batch_size=2)
|
||||
|
||||
for doc, expected_doc in zip(docs, expecteds):
|
||||
assert_docs_equal(doc, expected_doc)
|
||||
for doc, expected_doc in zip(docs, expecteds):
|
||||
assert_docs_equal(doc, expected_doc)
|
||||
|
||||
|
||||
@pytest.mark.parametrize("n_process", [1, 2])
|
||||
def test_language_pipe_stream(nlp2, n_process, texts):
|
||||
# check if nlp.pipe can handle infinite length iterator properly.
|
||||
stream_texts = itertools.cycle(texts)
|
||||
texts0, texts1 = itertools.tee(stream_texts)
|
||||
expecteds = (nlp2(text) for text in texts0)
|
||||
docs = nlp2.pipe(texts1, n_process=n_process, batch_size=2)
|
||||
ops = get_current_ops()
|
||||
if isinstance(ops, NumpyOps) or n_process < 2:
|
||||
# check if nlp.pipe can handle infinite length iterator properly.
|
||||
stream_texts = itertools.cycle(texts)
|
||||
texts0, texts1 = itertools.tee(stream_texts)
|
||||
expecteds = (nlp2(text) for text in texts0)
|
||||
docs = nlp2.pipe(texts1, n_process=n_process, batch_size=2)
|
||||
|
||||
n_fetch = 20
|
||||
for doc, expected_doc in itertools.islice(zip(docs, expecteds), n_fetch):
|
||||
assert_docs_equal(doc, expected_doc)
|
||||
n_fetch = 20
|
||||
for doc, expected_doc in itertools.islice(zip(docs, expecteds), n_fetch):
|
||||
assert_docs_equal(doc, expected_doc)
|
||||
|
||||
|
||||
def test_language_pipe_error_handler():
|
||||
|
|
|
@ -8,7 +8,8 @@ from spacy import prefer_gpu, require_gpu, require_cpu
|
|||
from spacy.ml._precomputable_affine import PrecomputableAffine
|
||||
from spacy.ml._precomputable_affine import _backprop_precomputable_affine_padding
|
||||
from spacy.util import dot_to_object, SimpleFrozenList, import_file
|
||||
from thinc.api import Config, Optimizer, ConfigValidationError
|
||||
from thinc.api import Config, Optimizer, ConfigValidationError, get_current_ops
|
||||
from thinc.api import set_current_ops
|
||||
from spacy.training.batchers import minibatch_by_words
|
||||
from spacy.lang.en import English
|
||||
from spacy.lang.nl import Dutch
|
||||
|
@ -81,6 +82,7 @@ def test_PrecomputableAffine(nO=4, nI=5, nF=3, nP=2):
|
|||
|
||||
|
||||
def test_prefer_gpu():
|
||||
current_ops = get_current_ops()
|
||||
try:
|
||||
import cupy # noqa: F401
|
||||
|
||||
|
@ -88,9 +90,11 @@ def test_prefer_gpu():
|
|||
assert isinstance(get_current_ops(), CupyOps)
|
||||
except ImportError:
|
||||
assert not prefer_gpu()
|
||||
set_current_ops(current_ops)
|
||||
|
||||
|
||||
def test_require_gpu():
|
||||
current_ops = get_current_ops()
|
||||
try:
|
||||
import cupy # noqa: F401
|
||||
|
||||
|
@ -99,9 +103,11 @@ def test_require_gpu():
|
|||
except ImportError:
|
||||
with pytest.raises(ValueError):
|
||||
require_gpu()
|
||||
set_current_ops(current_ops)
|
||||
|
||||
|
||||
def test_require_cpu():
|
||||
current_ops = get_current_ops()
|
||||
require_cpu()
|
||||
assert isinstance(get_current_ops(), NumpyOps)
|
||||
try:
|
||||
|
@ -113,6 +119,7 @@ def test_require_cpu():
|
|||
pass
|
||||
require_cpu()
|
||||
assert isinstance(get_current_ops(), NumpyOps)
|
||||
set_current_ops(current_ops)
|
||||
|
||||
|
||||
def test_ascii_filenames():
|
||||
|
|
|
@ -1,7 +1,7 @@
|
|||
from typing import List
|
||||
import pytest
|
||||
from thinc.api import fix_random_seed, Adam, set_dropout_rate
|
||||
from numpy.testing import assert_array_equal
|
||||
from numpy.testing import assert_array_equal, assert_array_almost_equal
|
||||
import numpy
|
||||
from spacy.ml.models import build_Tok2Vec_model, MultiHashEmbed, MaxoutWindowEncoder
|
||||
from spacy.ml.models import build_bow_text_classifier, build_simple_cnn_text_classifier
|
||||
|
@ -109,7 +109,7 @@ def test_models_initialize_consistently(seed, model_func, kwargs):
|
|||
model2.initialize()
|
||||
params1 = get_all_params(model1)
|
||||
params2 = get_all_params(model2)
|
||||
assert_array_equal(params1, params2)
|
||||
assert_array_equal(model1.ops.to_numpy(params1), model2.ops.to_numpy(params2))
|
||||
|
||||
|
||||
@pytest.mark.parametrize(
|
||||
|
@ -134,14 +134,25 @@ def test_models_predict_consistently(seed, model_func, kwargs, get_X):
|
|||
for i in range(len(tok2vec1)):
|
||||
for j in range(len(tok2vec1[i])):
|
||||
assert_array_equal(
|
||||
numpy.asarray(tok2vec1[i][j]), numpy.asarray(tok2vec2[i][j])
|
||||
numpy.asarray(model1.ops.to_numpy(tok2vec1[i][j])),
|
||||
numpy.asarray(model2.ops.to_numpy(tok2vec2[i][j])),
|
||||
)
|
||||
|
||||
try:
|
||||
Y1 = model1.ops.to_numpy(Y1)
|
||||
Y2 = model2.ops.to_numpy(Y2)
|
||||
except Exception:
|
||||
pass
|
||||
if isinstance(Y1, numpy.ndarray):
|
||||
assert_array_equal(Y1, Y2)
|
||||
elif isinstance(Y1, List):
|
||||
assert len(Y1) == len(Y2)
|
||||
for y1, y2 in zip(Y1, Y2):
|
||||
try:
|
||||
y1 = model1.ops.to_numpy(y1)
|
||||
y2 = model2.ops.to_numpy(y2)
|
||||
except Exception:
|
||||
pass
|
||||
assert_array_equal(y1, y2)
|
||||
else:
|
||||
raise ValueError(f"Could not compare type {type(Y1)}")
|
||||
|
@ -169,12 +180,17 @@ def test_models_update_consistently(seed, dropout, model_func, kwargs, get_X):
|
|||
model.finish_update(optimizer)
|
||||
updated_params = get_all_params(model)
|
||||
with pytest.raises(AssertionError):
|
||||
assert_array_equal(initial_params, updated_params)
|
||||
assert_array_equal(
|
||||
model.ops.to_numpy(initial_params), model.ops.to_numpy(updated_params)
|
||||
)
|
||||
return model
|
||||
|
||||
model1 = get_updated_model()
|
||||
model2 = get_updated_model()
|
||||
assert_array_equal(get_all_params(model1), get_all_params(model2))
|
||||
assert_array_almost_equal(
|
||||
model1.ops.to_numpy(get_all_params(model1)),
|
||||
model2.ops.to_numpy(get_all_params(model2)),
|
||||
)
|
||||
|
||||
|
||||
@pytest.mark.parametrize("model_func,kwargs", [(StaticVectors, {"nO": 128, "nM": 300})])
|
||||
|
|
|
@ -3,10 +3,10 @@ import pytest
|
|||
from pytest import approx
|
||||
from spacy.training import Example
|
||||
from spacy.training.iob_utils import offsets_to_biluo_tags
|
||||
from spacy.scorer import Scorer, ROCAUCScore
|
||||
from spacy.scorer import Scorer, ROCAUCScore, PRFScore
|
||||
from spacy.scorer import _roc_auc_score, _roc_curve
|
||||
from spacy.lang.en import English
|
||||
from spacy.tokens import Doc
|
||||
from spacy.tokens import Doc, Span
|
||||
|
||||
|
||||
test_las_apple = [
|
||||
|
@ -403,3 +403,68 @@ def test_roc_auc_score():
|
|||
score.score_set(0.75, 1)
|
||||
with pytest.raises(ValueError):
|
||||
_ = score.score # noqa: F841
|
||||
|
||||
|
||||
def test_score_spans():
|
||||
nlp = English()
|
||||
text = "This is just a random sentence."
|
||||
key = "my_spans"
|
||||
gold = nlp.make_doc(text)
|
||||
pred = nlp.make_doc(text)
|
||||
spans = []
|
||||
spans.append(gold.char_span(0, 4, label="PERSON"))
|
||||
spans.append(gold.char_span(0, 7, label="ORG"))
|
||||
spans.append(gold.char_span(8, 12, label="ORG"))
|
||||
gold.spans[key] = spans
|
||||
|
||||
def span_getter(doc, span_key):
|
||||
return doc.spans[span_key]
|
||||
|
||||
# Predict exactly the same, but overlapping spans will be discarded
|
||||
pred.spans[key] = spans
|
||||
eg = Example(pred, gold)
|
||||
scores = Scorer.score_spans([eg], attr=key, getter=span_getter)
|
||||
assert scores[f"{key}_p"] == 1.0
|
||||
assert scores[f"{key}_r"] < 1.0
|
||||
|
||||
# Allow overlapping, now both precision and recall should be 100%
|
||||
pred.spans[key] = spans
|
||||
eg = Example(pred, gold)
|
||||
scores = Scorer.score_spans([eg], attr=key, getter=span_getter, allow_overlap=True)
|
||||
assert scores[f"{key}_p"] == 1.0
|
||||
assert scores[f"{key}_r"] == 1.0
|
||||
|
||||
# Change the predicted labels
|
||||
new_spans = [Span(pred, span.start, span.end, label="WRONG") for span in spans]
|
||||
pred.spans[key] = new_spans
|
||||
eg = Example(pred, gold)
|
||||
scores = Scorer.score_spans([eg], attr=key, getter=span_getter, allow_overlap=True)
|
||||
assert scores[f"{key}_p"] == 0.0
|
||||
assert scores[f"{key}_r"] == 0.0
|
||||
assert f"{key}_per_type" in scores
|
||||
|
||||
# Discard labels from the evaluation
|
||||
scores = Scorer.score_spans([eg], attr=key, getter=span_getter, allow_overlap=True, labeled=False)
|
||||
assert scores[f"{key}_p"] == 1.0
|
||||
assert scores[f"{key}_r"] == 1.0
|
||||
assert f"{key}_per_type" not in scores
|
||||
|
||||
|
||||
def test_prf_score():
    cand = {"hi", "ho"}
    gold1 = {"yo", "hi"}
    gold2 = set()

    a = PRFScore()
    a.score_set(cand=cand, gold=gold1)
    assert (a.precision, a.recall, a.fscore) == approx((0.5, 0.5, 0.5))

    b = PRFScore()
    b.score_set(cand=cand, gold=gold2)
    assert (b.precision, b.recall, b.fscore) == approx((0.0, 0.0, 0.0))

    c = a + b
    assert (c.precision, c.recall, c.fscore) == approx((0.25, 0.5, 0.33333333))

    a += b
    assert (a.precision, a.recall, a.fscore) == approx((c.precision, c.recall, c.fscore))
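The combined values asserted above follow directly if `PRFScore.__add__` sums the underlying true-positive/false-positive/false-negative counts (an assumption about the internals, but it reproduces the asserted numbers exactly). A standalone sketch of that arithmetic:

```python
# Sketch only: recomputes the PRF values from summed tp/fp/fn counts.
def prf(tp, fp, fn):
    p = tp / (tp + fp) if tp + fp else 0.0
    r = tp / (tp + fn) if tp + fn else 0.0
    f = 2 * p * r / (p + r) if p + r else 0.0
    return p, r, f

a = (1, 1, 1)  # {"hi", "ho"} vs {"yo", "hi"}: tp=1 ("hi"), fp=1 ("ho"), fn=1 ("yo")
b = (0, 2, 0)  # {"hi", "ho"} vs set():        tp=0, fp=2, fn=0
c = tuple(x + y for x, y in zip(a, b))  # summed counts: (1, 3, 1)
print(prf(*a), prf(*b), prf(*c))  # (0.5, 0.5, 0.5) (0.0, 0.0, 0.0) (0.25, 0.5, 0.333...)
```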
@ -1,5 +1,7 @@
|
|||
import pytest
|
||||
import re
|
||||
from spacy.util import get_lang_class
|
||||
from spacy.tokenizer import Tokenizer
|
||||
|
||||
# Only include languages with no external dependencies
|
||||
# "is" seems to confuse importlib, so we're also excluding it for now
|
||||
|
@ -60,3 +62,18 @@ def test_tokenizer_explain(lang):
|
|||
tokens = [t.text for t in tokenizer(sentence) if not t.is_space]
|
||||
debug_tokens = [t[1] for t in tokenizer.explain(sentence)]
|
||||
assert tokens == debug_tokens
|
||||
|
||||
|
||||
def test_tokenizer_explain_special_matcher(en_vocab):
|
||||
suffix_re = re.compile(r"[\.]$")
|
||||
infix_re = re.compile(r"[/]")
|
||||
rules = {"a.": [{"ORTH": "a."}]}
|
||||
tokenizer = Tokenizer(
|
||||
en_vocab,
|
||||
rules=rules,
|
||||
suffix_search=suffix_re.search,
|
||||
infix_finditer=infix_re.finditer,
|
||||
)
|
||||
tokens = [t.text for t in tokenizer("a/a.")]
|
||||
explain_tokens = [t[1] for t in tokenizer.explain("a/a.")]
|
||||
assert tokens == explain_tokens
|
||||
|
|
|
@ -1,4 +1,5 @@
|
|||
import pytest
|
||||
import re
|
||||
from spacy.vocab import Vocab
|
||||
from spacy.tokenizer import Tokenizer
|
||||
from spacy.util import ensure_path
|
||||
|
@ -186,3 +187,31 @@ def test_tokenizer_special_cases_spaces(tokenizer):
|
|||
assert [t.text for t in tokenizer("a b c")] == ["a", "b", "c"]
|
||||
tokenizer.add_special_case("a b c", [{"ORTH": "a b c"}])
|
||||
assert [t.text for t in tokenizer("a b c")] == ["a b c"]
|
||||
|
||||
|
||||
def test_tokenizer_flush_cache(en_vocab):
|
||||
suffix_re = re.compile(r"[\.]$")
|
||||
tokenizer = Tokenizer(
|
||||
en_vocab,
|
||||
suffix_search=suffix_re.search,
|
||||
)
|
||||
assert [t.text for t in tokenizer("a.")] == ["a", "."]
|
||||
tokenizer.suffix_search = None
|
||||
assert [t.text for t in tokenizer("a.")] == ["a."]
|
||||
|
||||
|
||||
def test_tokenizer_flush_specials(en_vocab):
|
||||
suffix_re = re.compile(r"[\.]$")
|
||||
rules = {"a a": [{"ORTH": "a a"}]}
|
||||
tokenizer1 = Tokenizer(
|
||||
en_vocab,
|
||||
suffix_search=suffix_re.search,
|
||||
rules=rules,
|
||||
)
|
||||
tokenizer2 = Tokenizer(
|
||||
en_vocab,
|
||||
suffix_search=suffix_re.search,
|
||||
)
|
||||
assert [t.text for t in tokenizer1("a a.")] == ["a a", "."]
|
||||
tokenizer1.rules = {}
|
||||
assert [t.text for t in tokenizer1("a a.")] == ["a", "a", "."]
|
||||
|
|
|
@ -2,6 +2,7 @@ import pytest
|
|||
from spacy.training.example import Example
|
||||
from spacy.tokens import Doc
|
||||
from spacy.vocab import Vocab
|
||||
from spacy.util import to_ternary_int
|
||||
|
||||
|
||||
def test_Example_init_requires_doc_objects():
|
||||
|
@ -121,7 +122,7 @@ def test_Example_from_dict_with_morphology(annots):
|
|||
[
|
||||
{
|
||||
"words": ["This", "is", "one", "sentence", "this", "is", "another"],
|
||||
"sent_starts": [1, 0, 0, 0, 1, 0, 0],
|
||||
"sent_starts": [1, False, 0, None, True, -1, -5.7],
|
||||
}
|
||||
],
|
||||
)
|
||||
|
@ -131,7 +132,12 @@ def test_Example_from_dict_with_sent_start(annots):
|
|||
example = Example.from_dict(predicted, annots)
|
||||
assert len(list(example.reference.sents)) == 2
|
||||
for i, token in enumerate(example.reference):
|
||||
assert bool(token.is_sent_start) == bool(annots["sent_starts"][i])
|
||||
if to_ternary_int(annots["sent_starts"][i]) == 1:
|
||||
assert token.is_sent_start is True
|
||||
elif to_ternary_int(annots["sent_starts"][i]) == 0:
|
||||
assert token.is_sent_start is None
|
||||
else:
|
||||
assert token.is_sent_start is False
|
||||
|
||||
|
||||
@pytest.mark.parametrize(
|
||||
|
|
|
@ -426,6 +426,29 @@ def test_aligned_spans_x2y(en_vocab, en_tokenizer):
|
|||
assert [(ent.start, ent.end) for ent in ents_x2y] == [(0, 2), (4, 6)]
|
||||
|
||||
|
||||
def test_aligned_spans_y2x_overlap(en_vocab, en_tokenizer):
|
||||
text = "I flew to San Francisco Valley"
|
||||
nlp = English()
|
||||
doc = nlp(text)
|
||||
# the reference doc has overlapping spans
|
||||
gold_doc = nlp.make_doc(text)
|
||||
spans = []
|
||||
prefix = "I flew to "
|
||||
spans.append(gold_doc.char_span(len(prefix), len(prefix + "San Francisco"), label="CITY"))
|
||||
spans.append(gold_doc.char_span(len(prefix), len(prefix + "San Francisco Valley"), label="VALLEY"))
|
||||
spans_key = "overlap_ents"
|
||||
gold_doc.spans[spans_key] = spans
|
||||
example = Example(doc, gold_doc)
|
||||
spans_gold = example.reference.spans[spans_key]
|
||||
assert [(ent.start, ent.end) for ent in spans_gold] == [(3, 5), (3, 6)]
|
||||
|
||||
# Ensure that 'get_aligned_spans_y2x' has the aligned entities correct
|
||||
spans_y2x_no_overlap = example.get_aligned_spans_y2x(spans_gold, allow_overlap=False)
|
||||
assert [(ent.start, ent.end) for ent in spans_y2x_no_overlap] == [(3, 5)]
|
||||
spans_y2x_overlap = example.get_aligned_spans_y2x(spans_gold, allow_overlap=True)
|
||||
assert [(ent.start, ent.end) for ent in spans_y2x_overlap] == [(3, 5), (3, 6)]
|
||||
|
||||
|
||||
def test_gold_ner_missing_tags(en_tokenizer):
|
||||
doc = en_tokenizer("I flew to Silicon Valley via London.")
|
||||
biluo_tags = [None, "O", "O", "B-LOC", "L-LOC", "O", "U-GPE", "O"]
|
||||
|
|
|
@ -5,6 +5,7 @@ import srsly
|
|||
from spacy.tokens import Doc
|
||||
from spacy.vocab import Vocab
|
||||
from spacy.util import make_tempdir # noqa: F401
|
||||
from thinc.api import get_current_ops
|
||||
|
||||
|
||||
@contextlib.contextmanager
|
||||
|
@@ -58,7 +59,10 @@ def add_vecs_to_vocab(vocab, vectors):

 def get_cosine(vec1, vec2):
     """Get cosine for two given vectors"""
-    return numpy.dot(vec1, vec2) / (numpy.linalg.norm(vec1) * numpy.linalg.norm(vec2))
+    OPS = get_current_ops()
+    v1 = OPS.to_numpy(OPS.asarray(vec1))
+    v2 = OPS.to_numpy(OPS.asarray(vec2))
+    return numpy.dot(v1, v2) / (numpy.linalg.norm(v1) * numpy.linalg.norm(v2))


 def assert_docs_equal(doc1, doc2):
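A quick usage sketch of the updated helper: because the inputs are routed through `get_current_ops()`, the same call works whether the active Thinc backend is `NumpyOps` (CPU) or `CupyOps` (GPU). The import path assumes the bundled test helpers are importable; otherwise paste in the function above.

```python
from thinc.api import get_current_ops
from spacy.tests.util import get_cosine  # assumption: the test helpers are installed

ops = get_current_ops()
vec_a = ops.asarray([1.0, 2.0, 3.0])  # numpy array on CPU, cupy array on GPU
vec_b = ops.asarray([2.0, 4.0, 6.0])
print(get_cosine(vec_a, vec_b))  # -> 1.0 for parallel vectors
```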
|
||||
|
|
|
@ -1,6 +1,7 @@
|
|||
import pytest
|
||||
import numpy
|
||||
from numpy.testing import assert_allclose, assert_equal
|
||||
from thinc.api import get_current_ops
|
||||
from spacy.vocab import Vocab
|
||||
from spacy.vectors import Vectors
|
||||
from spacy.tokenizer import Tokenizer
|
||||
|
@ -9,6 +10,7 @@ from spacy.tokens import Doc
|
|||
|
||||
from ..util import add_vecs_to_vocab, get_cosine, make_tempdir
|
||||
|
||||
OPS = get_current_ops()
|
||||
|
||||
@pytest.fixture
|
||||
def strings():
|
||||
|
@ -18,21 +20,21 @@ def strings():
|
|||
@pytest.fixture
|
||||
def vectors():
|
||||
return [
|
||||
("apple", [1, 2, 3]),
|
||||
("orange", [-1, -2, -3]),
|
||||
("and", [-1, -1, -1]),
|
||||
("juice", [5, 5, 10]),
|
||||
("pie", [7, 6.3, 8.9]),
|
||||
("apple", OPS.asarray([1, 2, 3])),
|
||||
("orange", OPS.asarray([-1, -2, -3])),
|
||||
("and", OPS.asarray([-1, -1, -1])),
|
||||
("juice", OPS.asarray([5, 5, 10])),
|
||||
("pie", OPS.asarray([7, 6.3, 8.9])),
|
||||
]
|
||||
|
||||
|
||||
@pytest.fixture
|
||||
def ngrams_vectors():
|
||||
return [
|
||||
("apple", [1, 2, 3]),
|
||||
("app", [-0.1, -0.2, -0.3]),
|
||||
("ppl", [-0.2, -0.3, -0.4]),
|
||||
("pl", [0.7, 0.8, 0.9]),
|
||||
("apple", OPS.asarray([1, 2, 3])),
|
||||
("app", OPS.asarray([-0.1, -0.2, -0.3])),
|
||||
("ppl", OPS.asarray([-0.2, -0.3, -0.4])),
|
||||
("pl", OPS.asarray([0.7, 0.8, 0.9])),
|
||||
]
|
||||
|
||||
|
||||
|
@ -171,8 +173,10 @@ def test_vectors_most_similar_identical():
|
|||
@pytest.mark.parametrize("text", ["apple and orange"])
|
||||
def test_vectors_token_vector(tokenizer_v, vectors, text):
|
||||
doc = tokenizer_v(text)
|
||||
assert vectors[0] == (doc[0].text, list(doc[0].vector))
|
||||
assert vectors[1] == (doc[2].text, list(doc[2].vector))
|
||||
assert vectors[0][0] == doc[0].text
|
||||
assert all([a == b for a, b in zip(vectors[0][1], doc[0].vector)])
|
||||
assert vectors[1][0] == doc[2].text
|
||||
assert all([a == b for a, b in zip(vectors[1][1], doc[2].vector)])
|
||||
|
||||
|
||||
@pytest.mark.parametrize("text", ["apple"])
|
||||
|
@ -301,7 +305,7 @@ def test_vectors_doc_doc_similarity(vocab, text1, text2):
|
|||
|
||||
def test_vocab_add_vector():
|
||||
vocab = Vocab(vectors_name="test_vocab_add_vector")
|
||||
data = numpy.ndarray((5, 3), dtype="f")
|
||||
data = OPS.xp.ndarray((5, 3), dtype="f")
|
||||
data[0] = 1.0
|
||||
data[1] = 2.0
|
||||
vocab.set_vector("cat", data[0])
|
||||
|
@ -320,10 +324,10 @@ def test_vocab_prune_vectors():
|
|||
_ = vocab["cat"] # noqa: F841
|
||||
_ = vocab["dog"] # noqa: F841
|
||||
_ = vocab["kitten"] # noqa: F841
|
||||
data = numpy.ndarray((5, 3), dtype="f")
|
||||
data[0] = [1.0, 1.2, 1.1]
|
||||
data[1] = [0.3, 1.3, 1.0]
|
||||
data[2] = [0.9, 1.22, 1.05]
|
||||
data = OPS.xp.ndarray((5, 3), dtype="f")
|
||||
data[0] = OPS.asarray([1.0, 1.2, 1.1])
|
||||
data[1] = OPS.asarray([0.3, 1.3, 1.0])
|
||||
data[2] = OPS.asarray([0.9, 1.22, 1.05])
|
||||
vocab.set_vector("cat", data[0])
|
||||
vocab.set_vector("dog", data[1])
|
||||
vocab.set_vector("kitten", data[2])
|
||||
|
@ -332,40 +336,41 @@ def test_vocab_prune_vectors():
|
|||
assert list(remap.keys()) == ["kitten"]
|
||||
neighbour, similarity = list(remap.values())[0]
|
||||
assert neighbour == "cat", remap
|
||||
assert_allclose(similarity, get_cosine(data[0], data[2]), atol=1e-4, rtol=1e-3)
|
||||
cosine = get_cosine(data[0], data[2])
|
||||
assert_allclose(float(similarity), cosine, atol=1e-4, rtol=1e-3)
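A rough sketch of the API this test exercises, with toy vectors; which rows survive pruning depends on the vocab's lexeme ordering, so the sketch only shows the shape of the return value:

```python
# Hedged sketch of Vocab.prune_vectors: keep n rows of the vectors table and
# remap discarded entries to their nearest remaining neighbour.
import numpy
import spacy

nlp = spacy.blank("en")
for word in ("cat", "dog", "kitten"):
    _ = nlp.vocab[word]  # make sure the lexemes exist, as in the test above
for word, row in [("cat", [1.0, 1.2, 1.1]),
                  ("dog", [0.3, 1.3, 1.0]),
                  ("kitten", [0.9, 1.22, 1.05])]:
    nlp.vocab.set_vector(word, numpy.asarray(row, dtype="f"))

remap = nlp.vocab.prune_vectors(2)
# remap maps each pruned word to (closest remaining word, cosine similarity),
# e.g. {"kitten": ("cat", 0.99...)} for the toy rows above.
print(remap)
```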
|
||||
|
||||
|
||||
def test_vectors_serialize():
|
||||
data = numpy.asarray([[4, 2, 2, 2], [4, 2, 2, 2], [1, 1, 1, 1]], dtype="f")
|
||||
data = OPS.asarray([[4, 2, 2, 2], [4, 2, 2, 2], [1, 1, 1, 1]], dtype="f")
|
||||
v = Vectors(data=data, keys=["A", "B", "C"])
|
||||
b = v.to_bytes()
|
||||
v_r = Vectors()
|
||||
v_r.from_bytes(b)
|
||||
assert_equal(v.data, v_r.data)
|
||||
assert_equal(OPS.to_numpy(v.data), OPS.to_numpy(v_r.data))
|
||||
assert v.key2row == v_r.key2row
|
||||
v.resize((5, 4))
|
||||
v_r.resize((5, 4))
|
||||
row = v.add("D", vector=numpy.asarray([1, 2, 3, 4], dtype="f"))
|
||||
row_r = v_r.add("D", vector=numpy.asarray([1, 2, 3, 4], dtype="f"))
|
||||
row = v.add("D", vector=OPS.asarray([1, 2, 3, 4], dtype="f"))
|
||||
row_r = v_r.add("D", vector=OPS.asarray([1, 2, 3, 4], dtype="f"))
|
||||
assert row == row_r
|
||||
assert_equal(v.data, v_r.data)
|
||||
assert_equal(OPS.to_numpy(v.data), OPS.to_numpy(v_r.data))
|
||||
assert v.is_full == v_r.is_full
|
||||
with make_tempdir() as d:
|
||||
v.to_disk(d)
|
||||
v_r.from_disk(d)
|
||||
assert_equal(v.data, v_r.data)
|
||||
assert_equal(OPS.to_numpy(v.data), OPS.to_numpy(v_r.data))
|
||||
assert v.key2row == v_r.key2row
|
||||
v.resize((5, 4))
|
||||
v_r.resize((5, 4))
|
||||
row = v.add("D", vector=numpy.asarray([10, 20, 30, 40], dtype="f"))
|
||||
row_r = v_r.add("D", vector=numpy.asarray([10, 20, 30, 40], dtype="f"))
|
||||
row = v.add("D", vector=OPS.asarray([10, 20, 30, 40], dtype="f"))
|
||||
row_r = v_r.add("D", vector=OPS.asarray([10, 20, 30, 40], dtype="f"))
|
||||
assert row == row_r
|
||||
assert_equal(v.data, v_r.data)
|
||||
assert_equal(OPS.to_numpy(v.data), OPS.to_numpy(v_r.data))
|
||||
|
||||
|
||||
def test_vector_is_oov():
|
||||
vocab = Vocab(vectors_name="test_vocab_is_oov")
|
||||
data = numpy.ndarray((5, 3), dtype="f")
|
||||
data = OPS.xp.ndarray((5, 3), dtype="f")
|
||||
data[0] = 1.0
|
||||
data[1] = 2.0
|
||||
vocab.set_vector("cat", data[0])
|
||||
|
|
|
@ -23,8 +23,8 @@ cdef class Tokenizer:
|
|||
cdef object _infix_finditer
|
||||
cdef object _rules
|
||||
cdef PhraseMatcher _special_matcher
|
||||
cdef int _property_init_count
|
||||
cdef int _property_init_max
|
||||
cdef int _property_init_count # TODO: unused, remove in v3.1
|
||||
cdef int _property_init_max # TODO: unused, remove in v3.1
|
||||
|
||||
cdef Doc _tokenize_affixes(self, unicode string, bint with_special_cases)
|
||||
cdef int _apply_special_cases(self, Doc doc) except -1
|
||||
|
|
|
@ -20,11 +20,12 @@ from .attrs import intify_attrs
|
|||
from .symbols import ORTH, NORM
|
||||
from .errors import Errors, Warnings
|
||||
from . import util
|
||||
from .util import registry
|
||||
from .util import registry, get_words_and_spaces
|
||||
from .attrs import intify_attrs
|
||||
from .symbols import ORTH
|
||||
from .scorer import Scorer
|
||||
from .training import validate_examples
|
||||
from .tokens import Span
|
||||
|
||||
|
||||
cdef class Tokenizer:
|
||||
|
@ -68,8 +69,6 @@ cdef class Tokenizer:
|
|||
self._rules = {}
|
||||
self._special_matcher = PhraseMatcher(self.vocab)
|
||||
self._load_special_cases(rules)
|
||||
self._property_init_count = 0
|
||||
self._property_init_max = 4
|
||||
|
||||
property token_match:
|
||||
def __get__(self):
|
||||
|
@ -78,8 +77,6 @@ cdef class Tokenizer:
|
|||
def __set__(self, token_match):
|
||||
self._token_match = token_match
|
||||
self._reload_special_cases()
|
||||
if self._property_init_count <= self._property_init_max:
|
||||
self._property_init_count += 1
|
||||
|
||||
property url_match:
|
||||
def __get__(self):
|
||||
|
@ -87,7 +84,7 @@ cdef class Tokenizer:
|
|||
|
||||
def __set__(self, url_match):
|
||||
self._url_match = url_match
|
||||
self._flush_cache()
|
||||
self._reload_special_cases()
|
||||
|
||||
property prefix_search:
|
||||
def __get__(self):
|
||||
|
@ -96,8 +93,6 @@ cdef class Tokenizer:
|
|||
def __set__(self, prefix_search):
|
||||
self._prefix_search = prefix_search
|
||||
self._reload_special_cases()
|
||||
if self._property_init_count <= self._property_init_max:
|
||||
self._property_init_count += 1
|
||||
|
||||
property suffix_search:
|
||||
def __get__(self):
|
||||
|
@ -106,8 +101,6 @@ cdef class Tokenizer:
|
|||
def __set__(self, suffix_search):
|
||||
self._suffix_search = suffix_search
|
||||
self._reload_special_cases()
|
||||
if self._property_init_count <= self._property_init_max:
|
||||
self._property_init_count += 1
|
||||
|
||||
property infix_finditer:
|
||||
def __get__(self):
|
||||
|
@ -116,8 +109,6 @@ cdef class Tokenizer:
|
|||
def __set__(self, infix_finditer):
|
||||
self._infix_finditer = infix_finditer
|
||||
self._reload_special_cases()
|
||||
if self._property_init_count <= self._property_init_max:
|
||||
self._property_init_count += 1
|
||||
|
||||
property rules:
|
||||
def __get__(self):
|
||||
|
@ -125,7 +116,7 @@ cdef class Tokenizer:
|
|||
|
||||
def __set__(self, rules):
|
||||
self._rules = {}
|
||||
self._reset_cache([key for key in self._cache])
|
||||
self._flush_cache()
|
||||
self._flush_specials()
|
||||
self._cache = PreshMap()
|
||||
self._specials = PreshMap()
|
||||
|
@ -225,6 +216,7 @@ cdef class Tokenizer:
|
|||
self.mem.free(cached)
|
||||
|
||||
def _flush_specials(self):
|
||||
self._special_matcher = PhraseMatcher(self.vocab)
|
||||
for k in self._specials:
|
||||
cached = <_Cached*>self._specials.get(k)
|
||||
del self._specials[k]
|
||||
|
@ -567,7 +559,6 @@ cdef class Tokenizer:
|
|||
"""Add special-case tokenization rules."""
|
||||
if special_cases is not None:
|
||||
for chunk, substrings in sorted(special_cases.items()):
|
||||
self._validate_special_case(chunk, substrings)
|
||||
self.add_special_case(chunk, substrings)
|
||||
|
||||
def _validate_special_case(self, chunk, substrings):
|
||||
|
@ -615,16 +606,9 @@ cdef class Tokenizer:
|
|||
self._special_matcher.add(string, None, self._tokenize_affixes(string, False))
|
||||
|
||||
def _reload_special_cases(self):
|
||||
try:
|
||||
self._property_init_count
|
||||
except AttributeError:
|
||||
return
|
||||
# only reload if all 4 of prefix, suffix, infix, token_match have
|
||||
# have been initialized
|
||||
if self.vocab is not None and self._property_init_count >= self._property_init_max:
|
||||
self._flush_cache()
|
||||
self._flush_specials()
|
||||
self._load_special_cases(self._rules)
|
||||
self._flush_cache()
|
||||
self._flush_specials()
|
||||
self._load_special_cases(self._rules)
|
||||
|
||||
def explain(self, text):
|
||||
"""A debugging tokenizer that provides information about which
|
||||
|
@ -638,8 +622,14 @@ cdef class Tokenizer:
|
|||
DOCS: https://spacy.io/api/tokenizer#explain
|
||||
"""
|
||||
prefix_search = self.prefix_search
|
||||
if prefix_search is None:
|
||||
prefix_search = re.compile("a^").search
|
||||
suffix_search = self.suffix_search
|
||||
if suffix_search is None:
|
||||
suffix_search = re.compile("a^").search
|
||||
infix_finditer = self.infix_finditer
|
||||
if infix_finditer is None:
|
||||
infix_finditer = re.compile("a^").finditer
|
||||
token_match = self.token_match
|
||||
if token_match is None:
|
||||
token_match = re.compile("a^").match
|
||||
|
@ -687,7 +677,7 @@ cdef class Tokenizer:
|
|||
tokens.append(("URL_MATCH", substring))
|
||||
substring = ''
|
||||
elif substring in special_cases:
|
||||
tokens.extend(("SPECIAL-" + str(i + 1), self.vocab.strings[e[ORTH]]) for i, e in enumerate(special_cases[substring]))
|
||||
tokens.extend((f"SPECIAL-{i + 1}", self.vocab.strings[e[ORTH]]) for i, e in enumerate(special_cases[substring]))
|
||||
substring = ''
|
||||
elif list(infix_finditer(substring)):
|
||||
infixes = infix_finditer(substring)
|
||||
|
@ -705,7 +695,33 @@ cdef class Tokenizer:
|
|||
tokens.append(("TOKEN", substring))
|
||||
substring = ''
|
||||
tokens.extend(reversed(suffixes))
|
||||
return tokens
|
||||
# Find matches for special cases handled by special matcher
|
||||
words, spaces = get_words_and_spaces([t[1] for t in tokens], text)
|
||||
t_words = []
|
||||
t_spaces = []
|
||||
for word, space in zip(words, spaces):
|
||||
if not word.isspace():
|
||||
t_words.append(word)
|
||||
t_spaces.append(space)
|
||||
doc = Doc(self.vocab, words=t_words, spaces=t_spaces)
|
||||
matches = self._special_matcher(doc)
|
||||
spans = [Span(doc, s, e, label=m_id) for m_id, s, e in matches]
|
||||
spans = util.filter_spans(spans)
|
||||
# Replace matched tokens with their exceptions
|
||||
i = 0
|
||||
final_tokens = []
|
||||
spans_by_start = {s.start: s for s in spans}
|
||||
while i < len(tokens):
|
||||
if i in spans_by_start:
|
||||
span = spans_by_start[i]
|
||||
exc = [d[ORTH] for d in special_cases[span.label_]]
|
||||
for j, orth in enumerate(exc):
|
||||
final_tokens.append((f"SPECIAL-{j + 1}", self.vocab.strings[orth]))
|
||||
i += len(span)
|
||||
else:
|
||||
final_tokens.append(tokens[i])
|
||||
i += 1
|
||||
return final_tokens
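For reference, a hedged sketch of what this buys: `explain()` now reports special-case rules the same way `__call__` applies them, mirroring `test_tokenizer_explain_special_matcher` from the test changes earlier in this commit (the sketch builds its own tokenizer instead of using the test fixture):

```python
import re
from spacy.lang.en import English
from spacy.tokenizer import Tokenizer

vocab = English().vocab
tokenizer = Tokenizer(
    vocab,
    rules={"a.": [{"ORTH": "a."}]},
    suffix_search=re.compile(r"[\.]$").search,
    infix_finditer=re.compile(r"[/]").finditer,
)
texts = [t.text for t in tokenizer("a/a.")]
debug = [t[1] for t in tokenizer.explain("a/a.")]
assert texts == debug  # e.g. ["a", "/", "a."] from both paths
```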
|
||||
|
||||
def score(self, examples, **kwargs):
|
||||
validate_examples(examples, "Tokenizer.score")
|
||||
|
@ -778,6 +794,15 @@ cdef class Tokenizer:
|
|||
"url_match": lambda b: data.setdefault("url_match", b),
|
||||
"exceptions": lambda b: data.setdefault("rules", b)
|
||||
}
|
||||
# reset all properties and flush all caches (through rules),
|
||||
# reset rules first so that _reload_special_cases is trivial/fast as
|
||||
# the other properties are reset
|
||||
self.rules = {}
|
||||
self.prefix_search = None
|
||||
self.suffix_search = None
|
||||
self.infix_finditer = None
|
||||
self.token_match = None
|
||||
self.url_match = None
|
||||
msg = util.from_bytes(bytes_data, deserializers, exclude)
|
||||
if "prefix_search" in data and isinstance(data["prefix_search"], str):
|
||||
self.prefix_search = re.compile(data["prefix_search"]).search
|
||||
|
@ -785,22 +810,12 @@ cdef class Tokenizer:
|
|||
self.suffix_search = re.compile(data["suffix_search"]).search
|
||||
if "infix_finditer" in data and isinstance(data["infix_finditer"], str):
|
||||
self.infix_finditer = re.compile(data["infix_finditer"]).finditer
|
||||
# for token_match and url_match, set to None to override the language
|
||||
# defaults if no regex is provided
|
||||
if "token_match" in data and isinstance(data["token_match"], str):
|
||||
self.token_match = re.compile(data["token_match"]).match
|
||||
else:
|
||||
self.token_match = None
|
||||
if "url_match" in data and isinstance(data["url_match"], str):
|
||||
self.url_match = re.compile(data["url_match"]).match
|
||||
else:
|
||||
self.url_match = None
|
||||
if "rules" in data and isinstance(data["rules"], dict):
|
||||
# make sure to hard reset the cache to remove data from the default exceptions
|
||||
self._rules = {}
|
||||
self._flush_cache()
|
||||
self._flush_specials()
|
||||
self._load_special_cases(data["rules"])
|
||||
self.rules = data["rules"]
|
||||
return self
|
||||
|
||||
|
||||
|
|
|
@ -281,7 +281,8 @@ def _merge(Doc doc, merges):
|
|||
for i in range(doc.length):
|
||||
doc.c[i].head -= i
|
||||
# Set the left/right children, left/right edges
|
||||
set_children_from_heads(doc.c, 0, doc.length)
|
||||
if doc.has_annotation("DEP"):
|
||||
set_children_from_heads(doc.c, 0, doc.length)
|
||||
# Make sure ent_iob remains consistent
|
||||
make_iob_consistent(doc.c, doc.length)
|
||||
# Return the merged Python object
|
||||
|
@ -294,7 +295,19 @@ def _resize_tensor(tensor, ranges):
|
|||
for i in range(start, end-1):
|
||||
delete.append(i)
|
||||
xp = get_array_module(tensor)
|
||||
return xp.delete(tensor, delete, axis=0)
|
||||
if xp is numpy:
|
||||
return xp.delete(tensor, delete, axis=0)
|
||||
else:
|
||||
offset = 0
|
||||
copy_start = 0
|
||||
resized_shape = (tensor.shape[0] - len(delete), tensor.shape[1])
|
||||
for start, end in ranges:
|
||||
if copy_start > 0:
|
||||
tensor[copy_start - offset:start - offset] = tensor[copy_start: start]
|
||||
offset += end - start - 1
|
||||
copy_start = end - 1
|
||||
tensor[copy_start - offset:resized_shape[0]] = tensor[copy_start:]
|
||||
return xp.asarray(tensor[:resized_shape[0]])
|
||||
|
||||
|
||||
def _split(Doc doc, int token_index, orths, heads, attrs):
|
||||
|
@ -331,7 +344,13 @@ def _split(Doc doc, int token_index, orths, heads, attrs):
|
|||
to_process_tensor = (doc.tensor is not None and doc.tensor.size != 0)
|
||||
if to_process_tensor:
|
||||
xp = get_array_module(doc.tensor)
|
||||
doc.tensor = xp.append(doc.tensor, xp.zeros((nb_subtokens,doc.tensor.shape[1]), dtype="float32"), axis=0)
|
||||
if xp is numpy:
|
||||
doc.tensor = xp.append(doc.tensor, xp.zeros((nb_subtokens,doc.tensor.shape[1]), dtype="float32"), axis=0)
|
||||
else:
|
||||
shape = (doc.tensor.shape[0] + nb_subtokens, doc.tensor.shape[1])
|
||||
resized_array = xp.zeros(shape, dtype="float32")
|
||||
resized_array[:doc.tensor.shape[0]] = doc.tensor[:doc.tensor.shape[0]]
|
||||
doc.tensor = resized_array
|
||||
for token_to_move in range(orig_length - 1, token_index, -1):
|
||||
doc.c[token_to_move + nb_subtokens - 1] = doc.c[token_to_move]
|
||||
if to_process_tensor:
|
||||
|
@ -348,7 +367,7 @@ def _split(Doc doc, int token_index, orths, heads, attrs):
|
|||
token.norm = 0 # reset norm
|
||||
if to_process_tensor:
|
||||
# setting the tensors of the split tokens to array of zeros
|
||||
doc.tensor[token_index + i] = xp.zeros((1,doc.tensor.shape[1]), dtype="float32")
|
||||
doc.tensor[token_index + i:token_index + i + 1] = xp.zeros((1,doc.tensor.shape[1]), dtype="float32")
|
||||
# Update the character offset of the subtokens
|
||||
if i != 0:
|
||||
token.idx = orig_token.idx + idx_offset
|
||||
|
@ -392,7 +411,8 @@ def _split(Doc doc, int token_index, orths, heads, attrs):
|
|||
for i in range(doc.length):
|
||||
doc.c[i].head -= i
|
||||
# set children from head
|
||||
set_children_from_heads(doc.c, 0, doc.length)
|
||||
if doc.has_annotation("DEP"):
|
||||
set_children_from_heads(doc.c, 0, doc.length)
|
||||
|
||||
|
||||
def _validate_extensions(extensions):
|
||||
|
|
|
@ -6,7 +6,7 @@ from libc.math cimport sqrt
|
|||
from libc.stdint cimport int32_t, uint64_t
|
||||
|
||||
import copy
|
||||
from collections import Counter
|
||||
from collections import Counter, defaultdict
|
||||
from enum import Enum
|
||||
import itertools
|
||||
import numpy
|
||||
|
@ -1120,13 +1120,14 @@ cdef class Doc:
|
|||
concat_words = []
|
||||
concat_spaces = []
|
||||
concat_user_data = {}
|
||||
concat_spans = defaultdict(list)
|
||||
char_offset = 0
|
||||
for doc in docs:
|
||||
concat_words.extend(t.text for t in doc)
|
||||
concat_spaces.extend(bool(t.whitespace_) for t in doc)
|
||||
|
||||
for key, value in doc.user_data.items():
|
||||
if isinstance(key, tuple) and len(key) == 4:
|
||||
if isinstance(key, tuple) and len(key) == 4 and key[0] == "._.":
|
||||
data_type, name, start, end = key
|
||||
if start is not None or end is not None:
|
||||
start += char_offset
|
||||
|
@ -1137,8 +1138,17 @@ cdef class Doc:
|
|||
warnings.warn(Warnings.W101.format(name=name))
|
||||
else:
|
||||
warnings.warn(Warnings.W102.format(key=key, value=value))
|
||||
for key in doc.spans:
|
||||
for span in doc.spans[key]:
|
||||
concat_spans[key].append((
|
||||
span.start_char + char_offset,
|
||||
span.end_char + char_offset,
|
||||
span.label,
|
||||
span.kb_id,
|
||||
span.text, # included as a check
|
||||
))
|
||||
char_offset += len(doc.text)
|
||||
if ensure_whitespace and not (len(doc) > 0 and doc[-1].is_space):
|
||||
if len(doc) > 0 and ensure_whitespace and not doc[-1].is_space:
|
||||
char_offset += 1
|
||||
|
||||
arrays = [doc.to_array(attrs) for doc in docs]
|
||||
|
@ -1160,6 +1170,22 @@ cdef class Doc:
|
|||
|
||||
concat_doc.from_array(attrs, concat_array)
|
||||
|
||||
for key in concat_spans:
|
||||
if key not in concat_doc.spans:
|
||||
concat_doc.spans[key] = []
|
||||
for span_tuple in concat_spans[key]:
|
||||
span = concat_doc.char_span(
|
||||
span_tuple[0],
|
||||
span_tuple[1],
|
||||
label=span_tuple[2],
|
||||
kb_id=span_tuple[3],
|
||||
)
|
||||
text = span_tuple[4]
|
||||
if span is not None and span.text == text:
|
||||
concat_doc.spans[key].append(span)
|
||||
else:
|
||||
raise ValueError(Errors.E873.format(key=key, text=text))
|
||||
|
||||
return concat_doc
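A hedged sketch of the behaviour this adds: `doc.spans` groups now survive `Doc.from_docs`, with their character offsets shifted per document (the texts, span group key and label below are made up for the example):

```python
import spacy
from spacy.tokens import Doc

nlp = spacy.blank("en")
doc1 = nlp("I like New York")
doc1.spans["cities"] = [doc1.char_span(7, 15, label="CITY")]
doc2 = nlp("She lives in Berlin")
doc2.spans["cities"] = [doc2.char_span(13, 19, label="CITY")]

merged = Doc.from_docs([doc1, doc2])
print([(s.text, s.label_) for s in merged.spans["cities"]])
# -> [('New York', 'CITY'), ('Berlin', 'CITY')]
```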
|
||||
|
||||
def get_lca_matrix(self):
|
||||
|
|
|
@ -6,6 +6,7 @@ from libc.math cimport sqrt
|
|||
import numpy
|
||||
from thinc.api import get_array_module
|
||||
import warnings
|
||||
import copy
|
||||
|
||||
from .doc cimport token_by_start, token_by_end, get_token_attr, _get_lca_matrix
|
||||
from ..structs cimport TokenC, LexemeC
|
||||
|
@ -241,7 +242,19 @@ cdef class Span:
|
|||
if cat_start == self.start_char and cat_end == self.end_char:
|
||||
doc.cats[cat_label] = value
|
||||
if copy_user_data:
|
||||
doc.user_data = self.doc.user_data
|
||||
user_data = {}
|
||||
char_offset = self.start_char
|
||||
for key, value in self.doc.user_data.items():
|
||||
if isinstance(key, tuple) and len(key) == 4 and key[0] == "._.":
|
||||
data_type, name, start, end = key
|
||||
if start is not None or end is not None:
|
||||
start -= char_offset
|
||||
if end is not None:
|
||||
end -= char_offset
|
||||
user_data[(data_type, name, start, end)] = copy.copy(value)
|
||||
else:
|
||||
user_data[key] = copy.copy(value)
|
||||
doc.user_data = user_data
|
||||
return doc
|
||||
|
||||
def _fix_dep_copy(self, attrs, array):
|
||||
|
|
|
@ -8,3 +8,4 @@ from .iob_utils import biluo_tags_to_spans, tags_to_entities # noqa: F401
|
|||
from .gold_io import docs_to_json, read_json_file # noqa: F401
|
||||
from .batchers import minibatch_by_padded_size, minibatch_by_words # noqa: F401
|
||||
from .loggers import console_logger, wandb_logger # noqa: F401
|
||||
from .callbacks import create_copy_from_base_model # noqa: F401
|
||||
|
|
32
spacy/training/callbacks.py
Normal file
|
@@ -0,0 +1,32 @@
from typing import Optional
from ..errors import Errors
from ..language import Language
from ..util import load_model, registry, logger


@registry.callbacks("spacy.copy_from_base_model.v1")
def create_copy_from_base_model(
    tokenizer: Optional[str] = None,
    vocab: Optional[str] = None,
) -> Language:
    def copy_from_base_model(nlp):
        if tokenizer:
            logger.info(f"Copying tokenizer from: {tokenizer}")
            base_nlp = load_model(tokenizer)
            if nlp.config["nlp"]["tokenizer"] == base_nlp.config["nlp"]["tokenizer"]:
                nlp.tokenizer.from_bytes(base_nlp.tokenizer.to_bytes(exclude=["vocab"]))
            else:
                raise ValueError(
                    Errors.E872.format(
                        curr_config=nlp.config["nlp"]["tokenizer"],
                        base_config=base_nlp.config["nlp"]["tokenizer"],
                    )
                )
        if vocab:
            logger.info(f"Copying vocab from: {vocab}")
            # only reload if the vocab is from a different model
            if tokenizer != vocab:
                base_nlp = load_model(vocab)
                nlp.vocab.from_bytes(base_nlp.vocab.to_bytes())

    return copy_from_base_model
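In a training config this callback would typically be referenced from the `[initialize.before_init]` block; a hedged sketch of driving it directly from Python instead (the pipeline name is only a placeholder):

```python
import spacy
from spacy.util import registry

make_callback = registry.callbacks.get("spacy.copy_from_base_model.v1")
copy_from_base = make_callback(tokenizer="en_core_web_sm", vocab="en_core_web_sm")

nlp = spacy.blank("en")
copy_from_base(nlp)  # copies tokenizer settings and vocab from the base pipeline
```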
@ -124,6 +124,9 @@ def segment_sents_and_docs(doc, n_sents, doc_delimiter, model=None, msg=None):
|
|||
nlp = load_model(model)
|
||||
if "parser" in nlp.pipe_names:
|
||||
msg.info(f"Segmenting sentences with parser from model '{model}'.")
|
||||
for name, proc in nlp.pipeline:
|
||||
if "parser" in getattr(proc, "listening_components", []):
|
||||
nlp.replace_listeners(name, "parser", ["model.tok2vec"])
|
||||
sentencizer = nlp.get_pipe("parser")
|
||||
if not sentencizer:
|
||||
msg.info(
|
||||
|
|
|
@ -2,6 +2,7 @@ import warnings
|
|||
from typing import Union, List, Iterable, Iterator, TYPE_CHECKING, Callable
|
||||
from typing import Optional
|
||||
from pathlib import Path
|
||||
import random
|
||||
import srsly
|
||||
|
||||
from .. import util
|
||||
|
@ -96,6 +97,7 @@ class Corpus:
|
|||
Defaults to 0, which indicates no limit.
|
||||
augment (Callable[Example, Iterable[Example]]): Optional data augmentation
|
||||
function, to extrapolate additional examples from your annotations.
|
||||
shuffle (bool): Whether to shuffle the examples.
|
||||
|
||||
DOCS: https://spacy.io/api/corpus
|
||||
"""
|
||||
|
@ -108,12 +110,14 @@ class Corpus:
|
|||
gold_preproc: bool = False,
|
||||
max_length: int = 0,
|
||||
augmenter: Optional[Callable] = None,
|
||||
shuffle: bool = False,
|
||||
) -> None:
|
||||
self.path = util.ensure_path(path)
|
||||
self.gold_preproc = gold_preproc
|
||||
self.max_length = max_length
|
||||
self.limit = limit
|
||||
self.augmenter = augmenter if augmenter is not None else dont_augment
|
||||
self.shuffle = shuffle
|
||||
|
||||
def __call__(self, nlp: "Language") -> Iterator[Example]:
|
||||
"""Yield examples from the data.
|
||||
|
@ -124,6 +128,10 @@ class Corpus:
|
|||
DOCS: https://spacy.io/api/corpus#call
|
||||
"""
|
||||
ref_docs = self.read_docbin(nlp.vocab, walk_corpus(self.path, FILE_TYPE))
|
||||
if self.shuffle:
|
||||
ref_docs = list(ref_docs)
|
||||
random.shuffle(ref_docs)
|
||||
|
||||
if self.gold_preproc:
|
||||
examples = self.make_examples_gold_preproc(nlp, ref_docs)
|
||||
else:
|
||||
|
|
|
@ -13,7 +13,7 @@ from .iob_utils import biluo_tags_to_spans
|
|||
from ..errors import Errors, Warnings
|
||||
from ..pipeline._parser_internals import nonproj
|
||||
from ..tokens.token cimport MISSING_DEP
|
||||
from ..util import logger
|
||||
from ..util import logger, to_ternary_int
|
||||
|
||||
|
||||
cpdef Doc annotations_to_doc(vocab, tok_annot, doc_annot):
|
||||
|
@ -213,18 +213,19 @@ cdef class Example:
|
|||
else:
|
||||
return [None] * len(self.x)
|
||||
|
||||
def get_aligned_spans_x2y(self, x_spans):
|
||||
return self._get_aligned_spans(self.y, x_spans, self.alignment.x2y)
|
||||
def get_aligned_spans_x2y(self, x_spans, allow_overlap=False):
|
||||
return self._get_aligned_spans(self.y, x_spans, self.alignment.x2y, allow_overlap)
|
||||
|
||||
def get_aligned_spans_y2x(self, y_spans):
|
||||
return self._get_aligned_spans(self.x, y_spans, self.alignment.y2x)
|
||||
def get_aligned_spans_y2x(self, y_spans, allow_overlap=False):
|
||||
return self._get_aligned_spans(self.x, y_spans, self.alignment.y2x, allow_overlap)
|
||||
|
||||
def _get_aligned_spans(self, doc, spans, align):
|
||||
def _get_aligned_spans(self, doc, spans, align, allow_overlap):
|
||||
seen = set()
|
||||
output = []
|
||||
for span in spans:
|
||||
indices = align[span.start : span.end].data.ravel()
|
||||
indices = [idx for idx in indices if idx not in seen]
|
||||
if not allow_overlap:
|
||||
indices = [idx for idx in indices if idx not in seen]
|
||||
if len(indices) >= 1:
|
||||
aligned_span = Span(doc, indices[0], indices[-1] + 1, label=span.label)
|
||||
target_text = span.text.lower().strip().replace(" ", "")
|
||||
|
@ -237,7 +238,7 @@ cdef class Example:
|
|||
def get_aligned_ner(self):
|
||||
if not self.y.has_annotation("ENT_IOB"):
|
||||
return [None] * len(self.x) # should this be 'missing' instead of 'None' ?
|
||||
x_ents = self.get_aligned_spans_y2x(self.y.ents)
|
||||
x_ents = self.get_aligned_spans_y2x(self.y.ents, allow_overlap=False)
|
||||
# Default to 'None' for missing values
|
||||
x_tags = offsets_to_biluo_tags(
|
||||
self.x,
|
||||
|
@ -337,7 +338,7 @@ def _annot2array(vocab, tok_annot, doc_annot):
|
|||
values.append([vocab.strings.add(h) if h is not None else MISSING_DEP for h in value])
|
||||
elif key == "SENT_START":
|
||||
attrs.append(key)
|
||||
values.append(value)
|
||||
values.append([to_ternary_int(v) for v in value])
|
||||
elif key == "MORPH":
|
||||
attrs.append(key)
|
||||
values.append([vocab.morphology.add(v) for v in value])
|
||||
|
|
|
@ -121,7 +121,7 @@ def json_to_annotations(doc):
|
|||
if i == 0:
|
||||
sent_starts.append(1)
|
||||
else:
|
||||
sent_starts.append(0)
|
||||
sent_starts.append(-1)
|
||||
if "brackets" in sent:
|
||||
brackets.extend((b["first"] + sent_start_i,
|
||||
b["last"] + sent_start_i, b["label"])
|
||||
|
|
|
@ -8,6 +8,7 @@ import tarfile
|
|||
import gzip
|
||||
import zipfile
|
||||
import tqdm
|
||||
from itertools import islice
|
||||
|
||||
from .pretrain import get_tok2vec_ref
|
||||
from ..lookups import Lookups
|
||||
|
@ -68,7 +69,11 @@ def init_nlp(config: Config, *, use_gpu: int = -1) -> "Language":
|
|||
# Make sure that listeners are defined before initializing further
|
||||
nlp._link_components()
|
||||
with nlp.select_pipes(disable=[*frozen_components, *resume_components]):
|
||||
nlp.initialize(lambda: train_corpus(nlp), sgd=optimizer)
|
||||
if T["max_epochs"] == -1:
|
||||
logger.debug("Due to streamed train corpus, using only first 100 examples for initialization. If necessary, provide all labels in [initialize]. More info: https://spacy.io/api/cli#init_labels")
|
||||
nlp.initialize(lambda: islice(train_corpus(nlp), 100), sgd=optimizer)
|
||||
else:
|
||||
nlp.initialize(lambda: train_corpus(nlp), sgd=optimizer)
|
||||
logger.info(f"Initialized pipeline components: {nlp.pipe_names}")
|
||||
# Detect components with listeners that are not frozen consistently
|
||||
for name, proc in nlp.pipeline:
|
||||
|
@ -133,6 +138,10 @@ def load_vectors_into_model(
|
|||
)
|
||||
err = ConfigValidationError.from_error(e, title=title, desc=desc)
|
||||
raise err from None
|
||||
|
||||
if len(vectors_nlp.vocab.vectors.keys()) == 0:
|
||||
logger.warning(Warnings.W112.format(name=name))
|
||||
|
||||
nlp.vocab.vectors = vectors_nlp.vocab.vectors
|
||||
if add_strings:
|
||||
# I guess we should add the strings from the vectors_nlp model?
|
||||
|
|
|
@ -101,8 +101,13 @@ def console_logger(progress_bar: bool = False):
|
|||
return setup_printer
|
||||
|
||||
|
||||
@registry.loggers("spacy.WandbLogger.v1")
|
||||
def wandb_logger(project_name: str, remove_config_values: List[str] = []):
|
||||
@registry.loggers("spacy.WandbLogger.v2")
|
||||
def wandb_logger(
|
||||
project_name: str,
|
||||
remove_config_values: List[str] = [],
|
||||
model_log_interval: Optional[int] = None,
|
||||
log_dataset_dir: Optional[str] = None,
|
||||
):
|
||||
try:
|
||||
import wandb
|
||||
from wandb import init, log, join # test that these are available
|
||||
|
@ -119,9 +124,23 @@ def wandb_logger(project_name: str, remove_config_values: List[str] = []):
|
|||
for field in remove_config_values:
|
||||
del config_dot[field]
|
||||
config = util.dot_to_dict(config_dot)
|
||||
wandb.init(project=project_name, config=config, reinit=True)
|
||||
run = wandb.init(project=project_name, config=config, reinit=True)
|
||||
console_log_step, console_finalize = console(nlp, stdout, stderr)
|
||||
|
||||
def log_dir_artifact(
|
||||
path: str,
|
||||
name: str,
|
||||
type: str,
|
||||
metadata: Optional[Dict[str, Any]] = {},
|
||||
aliases: Optional[List[str]] = [],
|
||||
):
|
||||
dataset_artifact = wandb.Artifact(name, type=type, metadata=metadata)
|
||||
dataset_artifact.add_dir(path, name=name)
|
||||
wandb.log_artifact(dataset_artifact, aliases=aliases)
|
||||
|
||||
if log_dataset_dir:
|
||||
log_dir_artifact(path=log_dataset_dir, name="dataset", type="dataset")
|
||||
|
||||
def log_step(info: Optional[Dict[str, Any]]):
|
||||
console_log_step(info)
|
||||
if info is not None:
|
||||
|
@ -133,6 +152,21 @@ def wandb_logger(project_name: str, remove_config_values: List[str] = []):
|
|||
wandb.log({f"loss_{k}": v for k, v in losses.items()})
|
||||
if isinstance(other_scores, dict):
|
||||
wandb.log(other_scores)
|
||||
if model_log_interval and info.get("output_path"):
|
||||
if info["step"] % model_log_interval == 0 and info["step"] != 0:
|
||||
log_dir_artifact(
|
||||
path=info["output_path"],
|
||||
name="pipeline_" + run.id,
|
||||
type="checkpoint",
|
||||
metadata=info,
|
||||
aliases=[
|
||||
f"epoch {info['epoch']} step {info['step']}",
|
||||
"latest",
|
||||
"best"
|
||||
if info["score"] == max(info["checkpoints"])[0]
|
||||
else "",
|
||||
],
|
||||
)
|
||||
|
||||
def finalize() -> None:
|
||||
console_finalize()
|
||||
|
|
|
@ -78,7 +78,7 @@ def train(
|
|||
training_step_iterator = train_while_improving(
|
||||
nlp,
|
||||
optimizer,
|
||||
create_train_batches(train_corpus(nlp), batcher, T["max_epochs"]),
|
||||
create_train_batches(nlp, train_corpus, batcher, T["max_epochs"]),
|
||||
create_evaluation_callback(nlp, dev_corpus, score_weights),
|
||||
dropout=T["dropout"],
|
||||
accumulate_gradient=T["accumulate_gradient"],
|
||||
|
@ -96,12 +96,13 @@ def train(
|
|||
log_step, finalize_logger = train_logger(nlp, stdout, stderr)
|
||||
try:
|
||||
for batch, info, is_best_checkpoint in training_step_iterator:
|
||||
log_step(info if is_best_checkpoint is not None else None)
|
||||
if is_best_checkpoint is not None:
|
||||
with nlp.select_pipes(disable=frozen_components):
|
||||
update_meta(T, nlp, info)
|
||||
if output_path is not None:
|
||||
save_checkpoint(is_best_checkpoint)
|
||||
info["output_path"] = str(output_path / DIR_MODEL_LAST)
|
||||
log_step(info if is_best_checkpoint is not None else None)
|
||||
except Exception as e:
|
||||
if output_path is not None:
|
||||
stdout.write(
|
||||
|
@ -289,17 +290,22 @@ def create_evaluation_callback(
|
|||
|
||||
|
||||
def create_train_batches(
|
||||
iterator: Iterator[Example],
|
||||
nlp: "Language",
|
||||
corpus: Callable[["Language"], Iterable[Example]],
|
||||
batcher: Callable[[Iterable[Example]], Iterable[Example]],
|
||||
max_epochs: int,
|
||||
):
|
||||
epoch = 0
|
||||
examples = list(iterator)
|
||||
if not examples:
|
||||
# Raise error if no data
|
||||
raise ValueError(Errors.E986)
|
||||
if max_epochs >= 0:
|
||||
examples = list(corpus(nlp))
|
||||
if not examples:
|
||||
# Raise error if no data
|
||||
raise ValueError(Errors.E986)
|
||||
while max_epochs < 1 or epoch != max_epochs:
|
||||
random.shuffle(examples)
|
||||
if max_epochs >= 0:
|
||||
random.shuffle(examples)
|
||||
else:
|
||||
examples = corpus(nlp)
|
||||
for batch in batcher(examples):
|
||||
yield epoch, batch
|
||||
epoch += 1
|
||||
|
|
|
@ -36,7 +36,7 @@ except ImportError:
|
|||
try: # Python 3.8
|
||||
import importlib.metadata as importlib_metadata
|
||||
except ImportError:
|
||||
import importlib_metadata
|
||||
from catalogue import _importlib_metadata as importlib_metadata
|
||||
|
||||
# These are functions that were previously (v2.x) available from spacy.util
|
||||
# and have since moved to Thinc. We're importing them here so people's code
|
||||
|
@ -1526,3 +1526,18 @@ def check_lexeme_norms(vocab, component_name):
|
|||
if len(lexeme_norms) == 0 and vocab.lang in LEXEME_NORM_LANGS:
|
||||
langs = ", ".join(LEXEME_NORM_LANGS)
|
||||
logger.debug(Warnings.W033.format(model=component_name, langs=langs))
|
||||
|
||||
|
||||
def to_ternary_int(val) -> int:
    """Convert a value to the ternary 1/0/-1 int used for True/None/False in
    attributes such as SENT_START: True/1/1.0 is 1 (True), None/0/0.0 is 0
    (None), any other values are -1 (False).
    """
    if isinstance(val, float):
        val = int(val)
    if val is True or val is 1:
        return 1
    elif val is None or val is 0:
        return 0
    else:
        return -1
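A quick check of the mapping described in the docstring, using values from the `sent_starts` test parametrization earlier in this commit:

```python
from spacy.util import to_ternary_int

assert to_ternary_int(True) == 1 and to_ternary_int(1.0) == 1
assert to_ternary_int(None) == 0 and to_ternary_int(0) == 0
assert to_ternary_int(False) == -1 and to_ternary_int(-5.7) == -1
```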
|
|
|
@ -55,7 +55,7 @@ cdef class Vectors:
|
|||
"""Create a new vector store.
|
||||
|
||||
shape (tuple): Size of the table, as (# entries, # columns)
|
||||
data (numpy.ndarray): The vector data.
|
||||
data (numpy.ndarray or cupy.ndarray): The vector data.
|
||||
keys (iterable): A sequence of keys, aligned with the data.
|
||||
name (str): A name to identify the vectors table.
|
||||
|
||||
|
@ -65,7 +65,8 @@ cdef class Vectors:
|
|||
if data is None:
|
||||
if shape is None:
|
||||
shape = (0,0)
|
||||
data = numpy.zeros(shape, dtype="f")
|
||||
ops = get_current_ops()
|
||||
data = ops.xp.zeros(shape, dtype="f")
|
||||
self.data = data
|
||||
self.key2row = {}
|
||||
if self.data is not None:
|
||||
|
@ -300,6 +301,8 @@ cdef class Vectors:
|
|||
else:
|
||||
raise ValueError(Errors.E197.format(row=row, key=key))
|
||||
if vector is not None:
|
||||
xp = get_array_module(self.data)
|
||||
vector = xp.asarray(vector)
|
||||
self.data[row] = vector
|
||||
if self._unset.count(row):
|
||||
self._unset.erase(self._unset.find(row))
|
||||
|
@ -321,10 +324,11 @@ cdef class Vectors:
|
|||
RETURNS (tuple): The most similar entries as a `(keys, best_rows, scores)`
|
||||
tuple.
|
||||
"""
|
||||
xp = get_array_module(self.data)
|
||||
filled = sorted(list({row for row in self.key2row.values()}))
|
||||
if len(filled) < n:
|
||||
raise ValueError(Errors.E198.format(n=n, n_rows=len(filled)))
|
||||
xp = get_array_module(self.data)
|
||||
filled = xp.asarray(filled)
|
||||
|
||||
norms = xp.linalg.norm(self.data[filled], axis=1, keepdims=True)
|
||||
norms[norms == 0] = 1
|
||||
|
@ -357,8 +361,10 @@ cdef class Vectors:
|
|||
# Account for numerical error we want to return in range -1, 1
|
||||
scores = xp.clip(scores, a_min=-1, a_max=1, out=scores)
|
||||
row2key = {row: key for key, row in self.key2row.items()}
|
||||
|
||||
numpy_rows = get_current_ops().to_numpy(best_rows)
|
||||
keys = xp.asarray(
|
||||
[[row2key[row] for row in best_rows[i] if row in row2key]
|
||||
[[row2key[row] for row in numpy_rows[i] if row in row2key]
|
||||
for i in range(len(queries)) ], dtype="uint64")
|
||||
return (keys, best_rows, scores)
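A hedged usage sketch of `most_similar` with backend-agnostic arrays; the keyword `n` and the `(keys, best_rows, scores)` return shape follow the docstring at the top of this hunk:

```python
from thinc.api import get_current_ops
from spacy.vectors import Vectors

ops = get_current_ops()
data = ops.asarray([[4, 2, 2, 2], [4, 2, 2, 2], [1, 1, 1, 1]], dtype="f")
v = Vectors(data=data, keys=["A", "B", "C"])
queries = ops.asarray([[4, 2, 2, 2]], dtype="f")
keys, best_rows, scores = v.most_similar(queries, n=2)
# keys holds uint64 hashes of the closest entries; best_rows and scores are
# per-query arrays on the active backend.
```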
|
||||
|
||||
|
@ -459,7 +465,8 @@ cdef class Vectors:
|
|||
if hasattr(self.data, "from_bytes"):
|
||||
self.data.from_bytes()
|
||||
else:
|
||||
self.data = srsly.msgpack_loads(b)
|
||||
xp = get_array_module(self.data)
|
||||
self.data = xp.asarray(srsly.msgpack_loads(b))
|
||||
|
||||
deserializers = {
|
||||
"key2row": lambda b: self.key2row.update(srsly.msgpack_loads(b)),
|
||||
|
|
|
@ -2,7 +2,7 @@
|
|||
from libc.string cimport memcpy
|
||||
|
||||
import srsly
|
||||
from thinc.api import get_array_module
|
||||
from thinc.api import get_array_module, get_current_ops
|
||||
import functools
|
||||
|
||||
from .lexeme cimport EMPTY_LEXEME, OOV_RANK
|
||||
|
@ -293,7 +293,7 @@ cdef class Vocab:
|
|||
among those remaining.
|
||||
|
||||
For example, suppose the original table had vectors for the words:
|
||||
['sat', 'cat', 'feline', 'reclined']. If we prune the vector table to,
|
||||
['sat', 'cat', 'feline', 'reclined']. If we prune the vector table to
|
||||
two rows, we would discard the vectors for 'feline' and 'reclined'.
|
||||
These words would then be remapped to the closest remaining vector
|
||||
-- so "feline" would have the same vector as "cat", and "reclined"
|
||||
|
@ -314,6 +314,7 @@ cdef class Vocab:
|
|||
|
||||
DOCS: https://spacy.io/api/vocab#prune_vectors
|
||||
"""
|
||||
ops = get_current_ops()
|
||||
xp = get_array_module(self.vectors.data)
|
||||
# Make sure all vectors are in the vocab
|
||||
for orth in self.vectors:
|
||||
|
@ -329,8 +330,9 @@ cdef class Vocab:
|
|||
toss = xp.ascontiguousarray(self.vectors.data[indices[nr_row:]])
|
||||
self.vectors = Vectors(data=keep, keys=keys[:nr_row], name=self.vectors.name)
|
||||
syn_keys, syn_rows, scores = self.vectors.most_similar(toss, batch_size=batch_size)
|
||||
syn_keys = ops.to_numpy(syn_keys)
|
||||
remap = {}
|
||||
for i, key in enumerate(keys[nr_row:]):
|
||||
for i, key in enumerate(ops.to_numpy(keys[nr_row:])):
|
||||
self.vectors.add(key, row=syn_rows[i][0])
|
||||
word = self.strings[key]
|
||||
synonym = self.strings[syn_keys[i][0]]
|
||||
|
@ -351,7 +353,7 @@ cdef class Vocab:
|
|||
Defaults to the length of `orth`.
|
||||
maxn (int): Maximum n-gram length used for Fasttext's ngram computation.
|
||||
Defaults to the length of `orth`.
|
||||
RETURNS (numpy.ndarray): A word vector. Size
|
||||
RETURNS (numpy.ndarray or cupy.ndarray): A word vector. Size
|
||||
and shape determined by the `vocab.vectors` instance. Usually, a
|
||||
numpy ndarray of shape (300,) and dtype float32.
|
||||
|
||||
|
@ -400,7 +402,7 @@ cdef class Vocab:
|
|||
by string or int ID.
|
||||
|
||||
orth (int / unicode): The word.
|
||||
vector (numpy.ndarray[ndim=1, dtype='float32']): The vector to set.
|
||||
vector (numpy.ndarray or cupy.ndarray[ndim=1, dtype='float32']): The vector to set.
|
||||
|
||||
DOCS: https://spacy.io/api/vocab#set_vector
|
||||
"""
|
||||
|
|
|
@ -35,7 +35,7 @@ usage documentation on
|
|||
> @architectures = "spacy.Tok2Vec.v2"
|
||||
>
|
||||
> [model.embed]
|
||||
> @architectures = "spacy.CharacterEmbed.v1"
|
||||
> @architectures = "spacy.CharacterEmbed.v2"
|
||||
> # ...
|
||||
>
|
||||
> [model.encode]
|
||||
|
@ -54,13 +54,13 @@ blog post for background.
|
|||
| `encode` | Encode context into the embeddings, using an architecture such as a CNN, BiLSTM or transformer. For example, [MaxoutWindowEncoder](/api/architectures#MaxoutWindowEncoder). ~~Model[List[Floats2d], List[Floats2d]]~~ |
|
||||
| **CREATES** | The model using the architecture. ~~Model[List[Doc], List[Floats2d]]~~ |
|
||||
|
||||
### spacy.HashEmbedCNN.v1 {#HashEmbedCNN}
|
||||
### spacy.HashEmbedCNN.v2 {#HashEmbedCNN}
|
||||
|
||||
> #### Example Config
|
||||
>
|
||||
> ```ini
|
||||
> [model]
|
||||
> @architectures = "spacy.HashEmbedCNN.v1"
|
||||
> @architectures = "spacy.HashEmbedCNN.v2"
|
||||
> pretrained_vectors = null
|
||||
> width = 96
|
||||
> depth = 4
|
||||
|
@ -96,7 +96,7 @@ consisting of a CNN and a layer-normalized maxout activation function.
|
|||
> factory = "tok2vec"
|
||||
>
|
||||
> [components.tok2vec.model]
|
||||
> @architectures = "spacy.HashEmbedCNN.v1"
|
||||
> @architectures = "spacy.HashEmbedCNN.v2"
|
||||
> width = 342
|
||||
>
|
||||
> [components.tagger]
|
||||
|
@ -129,13 +129,13 @@ argument that connects to the shared `tok2vec` component in the pipeline.
|
|||
| `upstream` | A string to identify the "upstream" `Tok2Vec` component to communicate with. By default, the upstream name is the wildcard string `"*"`, but you could also specify the name of the `Tok2Vec` component. You'll almost never have multiple upstream `Tok2Vec` components, so the wildcard string will almost always be fine. ~~str~~ |
|
||||
| **CREATES** | The model using the architecture. ~~Model[List[Doc], List[Floats2d]]~~ |
|
||||
|
||||
### spacy.MultiHashEmbed.v1 {#MultiHashEmbed}
|
||||
### spacy.MultiHashEmbed.v2 {#MultiHashEmbed}
|
||||
|
||||
> #### Example config
|
||||
>
|
||||
> ```ini
|
||||
> [model]
|
||||
> @architectures = "spacy.MultiHashEmbed.v1"
|
||||
> @architectures = "spacy.MultiHashEmbed.v2"
|
||||
> width = 64
|
||||
> attrs = ["NORM", "PREFIX", "SUFFIX", "SHAPE"]
|
||||
> rows = [2000, 1000, 1000, 1000]
|
||||
|
@ -160,13 +160,13 @@ not updated).
|
|||
| `include_static_vectors` | Whether to also use static word vectors. Requires a vectors table to be loaded in the [`Doc`](/api/doc) objects' vocab. ~~bool~~ |
|
||||
| **CREATES** | The model using the architecture. ~~Model[List[Doc], List[Floats2d]]~~ |
|
||||
|
||||
### spacy.CharacterEmbed.v1 {#CharacterEmbed}
|
||||
### spacy.CharacterEmbed.v2 {#CharacterEmbed}
|
||||
|
||||
> #### Example config
|
||||
>
|
||||
> ```ini
|
||||
> [model]
|
||||
> @architectures = "spacy.CharacterEmbed.v1"
|
||||
> @architectures = "spacy.CharacterEmbed.v2"
|
||||
> width = 128
|
||||
> rows = 7000
|
||||
> nM = 64
|
||||
|
@ -266,13 +266,13 @@ Encode context using bidirectional LSTM layers. Requires
|
|||
| `dropout` | Creates a Dropout layer on the outputs of each LSTM layer except the last layer. Set to 0.0 to disable this functionality. ~~float~~ |
|
||||
| **CREATES** | The model using the architecture. ~~Model[List[Floats2d], List[Floats2d]]~~ |
|
||||
|
||||
### spacy.StaticVectors.v1 {#StaticVectors}
|
||||
### spacy.StaticVectors.v2 {#StaticVectors}
|
||||
|
||||
> #### Example config
|
||||
>
|
||||
> ```ini
|
||||
> [model]
|
||||
> @architectures = "spacy.StaticVectors.v1"
|
||||
> @architectures = "spacy.StaticVectors.v2"
|
||||
> nO = null
|
||||
> nM = null
|
||||
> dropout = 0.2
|
||||
|
@ -283,8 +283,9 @@ Encode context using bidirectional LSTM layers. Requires
|
|||
> ```
|
||||
|
||||
Embed [`Doc`](/api/doc) objects with their vocab's vectors table, applying a
|
||||
learned linear projection to control the dimensionality. See the documentation
|
||||
on [static vectors](/usage/embeddings-transformers#static-vectors) for details.
|
||||
learned linear projection to control the dimensionality. Unknown tokens are
|
||||
mapped to a zero vector. See the documentation on [static
|
||||
vectors](/usage/embeddings-transformers#static-vectors) for details.
|
||||
|
||||
| Name | Description |
|
||||
| ----------- | ----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------- |
|
||||
|
@ -513,7 +514,7 @@ for a Tok2Vec layer.
|
|||
> use_upper = true
|
||||
>
|
||||
> [model.tok2vec]
|
||||
> @architectures = "spacy.HashEmbedCNN.v1"
|
||||
> @architectures = "spacy.HashEmbedCNN.v2"
|
||||
> pretrained_vectors = null
|
||||
> width = 96
|
||||
> depth = 4
|
||||
|
@ -619,7 +620,7 @@ single-label use-cases where `exclusive_classes = true`, while the
|
|||
> @architectures = "spacy.Tok2Vec.v2"
|
||||
>
|
||||
> [model.tok2vec.embed]
|
||||
> @architectures = "spacy.MultiHashEmbed.v1"
|
||||
> @architectures = "spacy.MultiHashEmbed.v2"
|
||||
> width = 64
|
||||
> rows = [2000, 2000, 1000, 1000, 1000, 1000]
|
||||
> attrs = ["ORTH", "LOWER", "PREFIX", "SUFFIX", "SHAPE", "ID"]
|
||||
|
@ -676,7 +677,7 @@ taking it as argument:
|
|||
> nO = null
|
||||
>
|
||||
> [model.tok2vec]
|
||||
> @architectures = "spacy.HashEmbedCNN.v1"
|
||||
> @architectures = "spacy.HashEmbedCNN.v2"
|
||||
> pretrained_vectors = null
|
||||
> width = 96
|
||||
> depth = 4
|
||||
|
@ -744,7 +745,7 @@ into the "real world". This requires 3 main components:
|
|||
> nO = null
|
||||
>
|
||||
> [model.tok2vec]
|
||||
> @architectures = "spacy.HashEmbedCNN.v1"
|
||||
> @architectures = "spacy.HashEmbedCNN.v2"
|
||||
> pretrained_vectors = null
|
||||
> width = 96
|
||||
> depth = 2
|
||||
|
|
|
@ -12,6 +12,7 @@ menu:
|
|||
- ['train', 'train']
|
||||
- ['pretrain', 'pretrain']
|
||||
- ['evaluate', 'evaluate']
|
||||
- ['assemble', 'assemble']
|
||||
- ['package', 'package']
|
||||
- ['project', 'project']
|
||||
- ['ray', 'ray']
|
||||
|
@@ -892,6 +893,34 @@ $ python -m spacy evaluate [model] [data_path] [--output] [--code] [--gold-prepr

| `--help`, `-h` | Show help message and available arguments. ~~bool (flag)~~ |
| **CREATES** | Training results and optional metrics and visualizations. |

## assemble {#assemble tag="command"}

Assemble a pipeline from a config file without additional training. Expects a
[config file](/api/data-formats#config) with all settings and hyperparameters.
The `--code` argument can be used to import a Python file that lets you register
[custom functions](/usage/training#custom-functions) and refer to them in your
config.

> #### Example
>
> ```cli
> $ python -m spacy assemble config.cfg ./output
> ```

```cli
$ python -m spacy assemble [config_path] [output_dir] [--code] [--verbose] [overrides]
```

| Name | Description |
| ----------------- | ------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------- |
| `config_path` | Path to the [config](/api/data-formats#config) file containing all settings and hyperparameters. If `-`, the data will be [read from stdin](/usage/training#config-stdin). ~~Union[Path, str] \(positional)~~ |
| `output_dir` | Directory to store the final pipeline in. Will be created if it doesn't exist. ~~Optional[Path] \(option)~~ |
| `--code`, `-c` | Path to Python file with additional code to be imported. Allows [registering custom functions](/usage/training#custom-functions). ~~Optional[Path] \(option)~~ |
| `--verbose`, `-V` | Show more detailed messages during processing. ~~bool (flag)~~ |
| `--help`, `-h` | Show help message and available arguments. ~~bool (flag)~~ |
| overrides | Config parameters to override. Should be options starting with `--` that correspond to the config section and value to override, e.g. `--paths.data ./data`. ~~Any (option/flag)~~ |
| **CREATES** | The final assembled pipeline. |
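Once assembled, the output directory is a regular spaCy pipeline directory. As a minimal sketch (assuming the `./output` path from the example above), it can be loaded and run like any other pipeline:

```python
import spacy

# load the pipeline produced by `spacy assemble` above
nlp = spacy.load("./output")
doc = nlp("A quick smoke test for the assembled pipeline.")
print([token.text for token in doc])
```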

## package {#package tag="command"}

Generate an installable [Python package](/usage/training#models-generating) from

@@ -29,8 +29,8 @@ recommended settings for your use case, check out the

>
> The `@` syntax lets you refer to function names registered in the
> [function registry](/api/top-level#registry). For example,
> `@architectures = "spacy.HashEmbedCNN.v1"` refers to a registered function of
> the name [spacy.HashEmbedCNN.v1](/api/architectures#HashEmbedCNN) and all
> `@architectures = "spacy.HashEmbedCNN.v2"` refers to a registered function of
> the name [spacy.HashEmbedCNN.v2](/api/architectures#HashEmbedCNN) and all
> other values defined in its block will be passed into that function as
> arguments. Those arguments depend on the registered function. See the usage
> guide on [registered functions](/usage/training#config-functions) for details.
@@ -193,10 +193,10 @@ process that are used when you run [`spacy train`](/api/cli#train).

| `frozen_components` | Pipeline component names that are "frozen" and shouldn't be initialized or updated during training. See [here](/usage/training#config-components) for details. Defaults to `[]`. ~~List[str]~~ |
| `gpu_allocator` | Library for cupy to route GPU memory allocation to. Can be `"pytorch"` or `"tensorflow"`. Defaults to variable `${system.gpu_allocator}`. ~~str~~ |
| `logger` | Callable that takes the `nlp` and stdout and stderr `IO` objects, sets up the logger, and returns two new callables to log a training step and to finalize the logger. Defaults to [`ConsoleLogger`](/api/top-level#ConsoleLogger). ~~Callable[[Language, IO, IO], [Tuple[Callable[[Dict[str, Any]], None], Callable[[], None]]]]~~ |
| `max_epochs` | Maximum number of epochs to train for. Defaults to `0`. ~~int~~ |
| `max_steps` | Maximum number of update steps to train for. Defaults to `20000`. ~~int~~ |
| `max_epochs` | Maximum number of epochs to train for. `0` means an unlimited number of epochs. `-1` means that the train corpus should be streamed rather than loaded into memory, with no shuffling within the training loop. Defaults to `0`. ~~int~~ |
| `max_steps` | Maximum number of update steps to train for. `0` means an unlimited number of steps. Defaults to `20000`. ~~int~~ |
| `optimizer` | The optimizer. The learning rate schedule and other settings can be configured as part of the optimizer. Defaults to [`Adam`](https://thinc.ai/docs/api-optimizers#adam). ~~Optimizer~~ |
| `patience` | How many steps to continue without improvement in evaluation score. Defaults to `1600`. ~~int~~ |
| `patience` | How many steps to continue without improvement in evaluation score. `0` disables early stopping. Defaults to `1600`. ~~int~~ |
| `score_weights` | Score names shown in metrics mapped to their weight towards the final weighted score. See [here](/usage/training#metrics) for details. Defaults to `{}`. ~~Dict[str, float]~~ |
| `seed` | The random seed. Defaults to variable `${system.seed}`. ~~int~~ |
| `train_corpus` | Dot notation of the config location defining the train corpus. Defaults to `corpora.train`. ~~str~~ |
@@ -390,7 +390,7 @@ file to keep track of your settings and hyperparameters and your own

> "tags": List[str],
> "pos": List[str],
> "morphs": List[str],
> "sent_starts": List[bool],
> "sent_starts": List[Optional[bool]],
> "deps": List[string],
> "heads": List[int],
> "entities": List[str],
@@ -44,7 +44,7 @@ Construct a `Doc` object. The most common way to get a `Doc` object is via the

| `lemmas` <Tag variant="new">3</Tag> | A list of strings, of the same length as `words`, to assign as `token.lemma` for each word. Defaults to `None`. ~~Optional[List[str]]~~ |
| `heads` <Tag variant="new">3</Tag> | A list of values, of the same length as `words`, to assign as the head for each word. Head indices are the absolute position of the head in the `Doc`. Defaults to `None`. ~~Optional[List[int]]~~ |
| `deps` <Tag variant="new">3</Tag> | A list of strings, of the same length as `words`, to assign as `token.dep` for each word. Defaults to `None`. ~~Optional[List[str]]~~ |
| `sent_starts` <Tag variant="new">3</Tag> | A list of values, of the same length as `words`, to assign as `token.is_sent_start`. Will be overridden by heads if `heads` is provided. Defaults to `None`. ~~Optional[List[Union[bool, None]]~~ |
| `sent_starts` <Tag variant="new">3</Tag> | A list of values, of the same length as `words`, to assign as `token.is_sent_start`. Will be overridden by heads if `heads` is provided. Defaults to `None`. ~~Optional[List[Optional[bool]]]~~ |
| `ents` <Tag variant="new">3</Tag> | A list of strings, of the same length as `words`, to assign the token-based IOB tag. Defaults to `None`. ~~Optional[List[str]]~~ |
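To make the updated `sent_starts` type concrete: each entry may be `True` (sentence start), `False` (not a start) or `None` (unknown). A minimal sketch, assuming a blank English pipeline and made-up tokens:

```python
import spacy
from spacy.tokens import Doc

nlp = spacy.blank("en")
words = ["Hello", "world", ".", "Bye", "now", "."]
spaces = [True, False, True, True, False, False]
# True = sentence start, False = not a start, None = unknown
sent_starts = [True, None, None, True, None, None]
doc = Doc(nlp.vocab, words=words, spaces=spaces, sent_starts=sent_starts)
assert doc[3].is_sent_start is True
assert doc[1].is_sent_start is None
```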

## Doc.\_\_getitem\_\_ {#getitem tag="method"}

@@ -33,8 +33,8 @@ both documents.

| Name | Description |
| -------------- | ------------------------------------------------------------------------------------------------------------------------ |
| `predicted` | The document containing (partial) predictions. Cannot be `None`. ~~Doc~~ |
| `reference` | The document containing gold-standard annotations. Cannot be `None`. ~~Doc~~ |
| `predicted` | The document containing (partial) predictions. Cannot be `None`. ~~Doc~~ |
| `reference` | The document containing gold-standard annotations. Cannot be `None`. ~~Doc~~ |
| _keyword-only_ | |
| `alignment` | An object holding the alignment between the tokens of the `predicted` and `reference` documents. ~~Optional[Alignment]~~ |
@@ -56,11 +56,11 @@ see the [training format documentation](/api/data-formats#dict-input).

> example = Example.from_dict(predicted, {"words": token_ref, "tags": tags_ref})
> ```

| Name | Description |
| -------------- | ------------------------------------------------------------------------- |
| `predicted` | The document containing (partial) predictions. Cannot be `None`. ~~Doc~~ |
| `example_dict` | `Dict[str, obj]` | The gold-standard annotations as a dictionary. Cannot be `None`. ~~Dict[str, Any]~~ |
| **RETURNS** | The newly constructed object. ~~Example~~ |
| Name | Description |
| -------------- | ----------------------------------------------------------------------------------- |
| `predicted` | The document containing (partial) predictions. Cannot be `None`. ~~Doc~~ |
| `example_dict` | The gold-standard annotations as a dictionary. Cannot be `None`. ~~Dict[str, Any]~~ |
| **RETURNS** | The newly constructed object. ~~Example~~ |

## Example.text {#text tag="property"}
@@ -211,10 +211,11 @@ align to the tokenization in [`Example.predicted`](/api/example#predicted).

> assert [(ent.start, ent.end) for ent in ents_y2x] == [(0, 1)]
> ```

| Name | Description |
| ----------- | ----------------------------------------------------------------------------- |
| `y_spans` | `Span` objects aligned to the tokenization of `reference`. ~~Iterable[Span]~~ |
| **RETURNS** | `Span` objects aligned to the tokenization of `predicted`. ~~List[Span]~~ |
| Name | Description |
| --------------- | -------------------------------------------------------------------------------------------- |
| `y_spans` | `Span` objects aligned to the tokenization of `reference`. ~~Iterable[Span]~~ |
| `allow_overlap` | Whether the resulting `Span` objects may overlap or not. Set to `False` by default. ~~bool~~ |
| **RETURNS** | `Span` objects aligned to the tokenization of `predicted`. ~~List[Span]~~ |
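A rough sketch of how the new `allow_overlap` argument is passed (the tokens and entity offsets below are invented for illustration, assuming a blank English pipeline):

```python
import spacy
from spacy.tokens import Doc
from spacy.training import Example

nlp = spacy.blank("en")
# the predicted tokenization splits the name, the reference keeps it as one token
predicted = Doc(nlp.vocab, words=["Mr", "and", "Mrs", "Smith", "flew"])
gold = {"words": ["Mr and Mrs Smith", "flew"], "entities": [(0, 16, "PERSON")]}
example = Example.from_dict(predicted, gold)
ents_y2x = example.get_aligned_spans_y2x(example.reference.ents, allow_overlap=False)
print([(span.start, span.end) for span in ents_y2x])  # spans in predicted token indices
```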
## Example.get_aligned_spans_x2y {#get_aligned_spans_x2y tag="method"}

@@ -238,10 +239,11 @@ against the original gold-standard annotation.

> assert [(ent.start, ent.end) for ent in ents_x2y] == [(0, 2)]
> ```

| Name | Description |
| ----------- | ----------------------------------------------------------------------------- |
| `x_spans` | `Span` objects aligned to the tokenization of `predicted`. ~~Iterable[Span]~~ |
| **RETURNS** | `Span` objects aligned to the tokenization of `reference`. ~~List[Span]~~ |
| Name | Description |
| --------------- | -------------------------------------------------------------------------------------------- |
| `x_spans` | `Span` objects aligned to the tokenization of `predicted`. ~~Iterable[Span]~~ |
| `allow_overlap` | Whether the resulting `Span` objects may overlap or not. Set to `False` by default. ~~bool~~ |
| **RETURNS** | `Span` objects aligned to the tokenization of `reference`. ~~List[Span]~~ |

## Example.to_dict {#to_dict tag="method"}
@@ -4,12 +4,13 @@ teaser: Archived implementations available through spacy-legacy

source: spacy/legacy
---

The [`spacy-legacy`](https://github.com/explosion/spacy-legacy) package includes
outdated registered functions and architectures. It is installed automatically as
a dependency of spaCy, and provides backwards compatibility for archived functions
that may still be used in projects.
The [`spacy-legacy`](https://github.com/explosion/spacy-legacy) package includes
outdated registered functions and architectures. It is installed automatically
as a dependency of spaCy, and provides backwards compatibility for archived
functions that may still be used in projects.

You can find the detailed documentation of each such legacy function on this page.
You can find the detailed documentation of each such legacy function on this
page.

## Architectures {#architectures}
@@ -17,8 +18,8 @@ These functions are available from `@spacy.registry.architectures`.

### spacy.Tok2Vec.v1 {#Tok2Vec_v1}

The `spacy.Tok2Vec.v1` architecture was expecting an `encode` model of type
`Model[Floats2D, Floats2D]` such as `spacy.MaxoutWindowEncoder.v1` or
The `spacy.Tok2Vec.v1` architecture was expecting an `encode` model of type
`Model[Floats2D, Floats2D]` such as `spacy.MaxoutWindowEncoder.v1` or
`spacy.MishWindowEncoder.v1`.

> #### Example config

@@ -44,15 +45,14 @@ blog post for background.

| Name | Description |
| ----------- | -------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------- |
| `embed` | Embed tokens into context-independent word vector representations. For example, [CharacterEmbed](/api/architectures#CharacterEmbed) or [MultiHashEmbed](/api/architectures#MultiHashEmbed). ~~Model[List[Doc], List[Floats2d]]~~ |
| `encode` | Encode context into the embeddings, using an architecture such as a CNN, BiLSTM or transformer. For example, [MaxoutWindowEncoder.v1](/api/legacy#MaxoutWindowEncoder_v1). ~~Model[Floats2d, Floats2d]~~ |
| `encode` | Encode context into the embeddings, using an architecture such as a CNN, BiLSTM or transformer. For example, [MaxoutWindowEncoder.v1](/api/legacy#MaxoutWindowEncoder_v1). ~~Model[Floats2d, Floats2d]~~ |
| **CREATES** | The model using the architecture. ~~Model[List[Doc], List[Floats2d]]~~ |

### spacy.MaxoutWindowEncoder.v1 {#MaxoutWindowEncoder_v1}

The `spacy.MaxoutWindowEncoder.v1` architecture was producing a model of type
`Model[Floats2D, Floats2D]`. Since `spacy.MaxoutWindowEncoder.v2`, this has been changed to output
type `Model[List[Floats2d], List[Floats2d]]`.

The `spacy.MaxoutWindowEncoder.v1` architecture was producing a model of type
`Model[Floats2D, Floats2D]`. Since `spacy.MaxoutWindowEncoder.v2`, this has been
changed to output type `Model[List[Floats2d], List[Floats2d]]`.

> #### Example config
>
@@ -78,9 +78,9 @@ and residual connections.

### spacy.MishWindowEncoder.v1 {#MishWindowEncoder_v1}

The `spacy.MishWindowEncoder.v1` architecture was producing a model of type
`Model[Floats2D, Floats2D]`. Since `spacy.MishWindowEncoder.v2`, this has been changed to output
type `Model[List[Floats2d], List[Floats2d]]`.
The `spacy.MishWindowEncoder.v1` architecture was producing a model of type
`Model[Floats2D, Floats2D]`. Since `spacy.MishWindowEncoder.v2`, this has been
changed to output type `Model[List[Floats2d], List[Floats2d]]`.

> #### Example config
>

@@ -103,12 +103,11 @@ and residual connections.

| `depth` | The number of convolutional layers. Recommended value is `4`. ~~int~~ |
| **CREATES** | The model using the architecture. ~~Model[Floats2d, Floats2d]~~ |

### spacy.TextCatEnsemble.v1 {#TextCatEnsemble_v1}

The `spacy.TextCatEnsemble.v1` architecture built an internal `tok2vec` and `linear_model`.
Since `spacy.TextCatEnsemble.v2`, this has been refactored so that the `TextCatEnsemble` takes these
two sublayers as input.
The `spacy.TextCatEnsemble.v1` architecture built an internal `tok2vec` and
`linear_model`. Since `spacy.TextCatEnsemble.v2`, this has been refactored so
that the `TextCatEnsemble` takes these two sublayers as input.

> #### Example Config
>
@@ -140,4 +139,62 @@ network has an internal CNN Tok2Vec layer and uses attention.

| `ngram_size` | Determines the maximum length of the n-grams in the BOW model. For instance, `ngram_size=3` would give unigram, bigram and trigram features. ~~int~~ |
| `dropout` | The dropout rate. ~~float~~ |
| `nO` | Output dimension, determined by the number of different labels. If not set, the [`TextCategorizer`](/api/textcategorizer) component will set it when `initialize` is called. ~~Optional[int]~~ |
| **CREATES** | The model using the architecture. ~~Model[List[Doc], Floats2d]~~ |
| **CREATES** | The model using the architecture. ~~Model[List[Doc], Floats2d]~~ |

### spacy.HashEmbedCNN.v1 {#HashEmbedCNN_v1}

Identical to [`spacy.HashEmbedCNN.v2`](/api/architectures#HashEmbedCNN) except
using [`spacy.StaticVectors.v1`](#StaticVectors_v1) if vectors are included.

### spacy.MultiHashEmbed.v1 {#MultiHashEmbed_v1}

Identical to [`spacy.MultiHashEmbed.v2`](/api/architectures#MultiHashEmbed)
except with [`spacy.StaticVectors.v1`](#StaticVectors_v1) if vectors are
included.

### spacy.CharacterEmbed.v1 {#CharacterEmbed_v1}

Identical to [`spacy.CharacterEmbed.v2`](/api/architectures#CharacterEmbed)
except using [`spacy.StaticVectors.v1`](#StaticVectors_v1) if vectors are
included.

## Layers {#layers}

These functions are available from `@spacy.registry.layers`.

### spacy.StaticVectors.v1 {#StaticVectors_v1}

Identical to [`spacy.StaticVectors.v2`](/api/architectures#StaticVectors) except
for the handling of tokens without vectors.

<Infobox title="Bugs for tokens without vectors" variant="warning">

`spacy.StaticVectors.v1` maps tokens without vectors to the final row in the
vectors table, which causes the model predictions to change if new vectors are
added to an existing vectors table. See more details in
[issue #7662](https://github.com/explosion/spaCy/issues/7662#issuecomment-813925655).

</Infobox>

## Loggers {#loggers}

These functions are available from `@spacy.registry.loggers`.
### spacy.WandbLogger.v1 {#WandbLogger_v1}

The first version of the [`WandbLogger`](/api/top-level#WandbLogger) did not yet
support the `log_dataset_dir` and `model_log_interval` arguments.

> #### Example config
>
> ```ini
> [training.logger]
> @loggers = "spacy.WandbLogger.v1"
> project_name = "monitor_spacy_training"
> remove_config_values = ["paths.train", "paths.dev", "corpora.train.path", "corpora.dev.path"]
> ```
>
> | Name | Description |
> | ---------------------- | ------------------------------------------------------------------------------------------------------------------------------------- |
> | `project_name` | The name of the project in the Weights & Biases interface. The project will be created automatically if it doesn't exist yet. ~~str~~ |
> | `remove_config_values` | A list of values to exclude from the config before it is uploaded to W&B (default: empty). ~~List[str]~~ |
@@ -120,13 +120,14 @@ Find all token sequences matching the supplied patterns on the `Doc` or `Span`.

> matches = matcher(doc)
> ```

| Name | Description |
| ------------------------------------------ | -------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------- |
| `doclike` | The `Doc` or `Span` to match over. ~~Union[Doc, Span]~~ |
| _keyword-only_ | |
| `as_spans` <Tag variant="new">3</Tag> | Instead of tuples, return a list of [`Span`](/api/span) objects of the matches, with the `match_id` assigned as the span label. Defaults to `False`. ~~bool~~ |
| `allow_missing` <Tag variant="new">3</Tag> | Whether to skip checks for missing annotation for attributes included in patterns. Defaults to `False`. ~~bool~~ |
| **RETURNS** | A list of `(match_id, start, end)` tuples, describing the matches. A match tuple describes a span `doc[start:end]`. The `match_id` is the ID of the added match pattern. If `as_spans` is set to `True`, a list of `Span` objects is returned instead. ~~Union[List[Tuple[int, int, int]], List[Span]]~~ |
| Name | Description |
| ---------------------------------------------- | -------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------- |
| `doclike` | The `Doc` or `Span` to match over. ~~Union[Doc, Span]~~ |
| _keyword-only_ | |
| `as_spans` <Tag variant="new">3</Tag> | Instead of tuples, return a list of [`Span`](/api/span) objects of the matches, with the `match_id` assigned as the span label. Defaults to `False`. ~~bool~~ |
| `allow_missing` <Tag variant="new">3</Tag> | Whether to skip checks for missing annotation for attributes included in patterns. Defaults to `False`. ~~bool~~ |
| `with_alignments` <Tag variant="new">3.1</Tag> | Return match alignment information as part of the match tuple as `List[int]` with the same length as the matched span. Each entry denotes the corresponding index of the token pattern. If `as_spans` is set to `True`, this setting is ignored. Defaults to `False`. ~~bool~~ |
| **RETURNS** | A list of `(match_id, start, end)` tuples, describing the matches. A match tuple describes a span `doc[start:end]`. The `match_id` is the ID of the added match pattern. If `as_spans` is set to `True`, a list of `Span` objects is returned instead. ~~Union[List[Tuple[int, int, int]], List[Span]]~~ |
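A rough sketch of the new `with_alignments` flag described above (the pattern and text are invented for illustration): each match tuple gains a list that maps the matched tokens back to token-pattern indices.

```python
import spacy
from spacy.matcher import Matcher

nlp = spacy.blank("en")
matcher = Matcher(nlp.vocab)
matcher.add("FRESH_FRUIT", [[{"LOWER": "fresh"}, {"LOWER": {"IN": ["apples", "pears"]}}]])
doc = nlp("We bought fresh apples and fresh pears .")
for match_id, start, end, alignments in matcher(doc, with_alignments=True):
    # alignments[i] is the index of the token pattern that matched doc[start + i]
    print(doc[start:end].text, alignments)
```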
## Matcher.\_\_len\_\_ {#len tag="method" new="2"}
@@ -137,14 +137,16 @@ Returns PRF scores for labeled or unlabeled spans.

> print(scores["ents_f"])
> ```

| Name | Description |
| ---------------- | ------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------ |
| `examples` | The `Example` objects holding both the predictions and the correct gold-standard annotations. ~~Iterable[Example]~~ |
| `attr` | The attribute to score. ~~str~~ |
| _keyword-only_ | |
| `getter` | Defaults to `getattr`. If provided, `getter(doc, attr)` should return the `Span` objects for an individual `Doc`. ~~Callable[[Doc, str], Iterable[Span]]~~ |
| `has_annotation` | Defaults to `None`. If provided, `has_annotation(doc)` should return whether a `Doc` has annotation for this `attr`. Docs without annotation are skipped for scoring purposes. ~~Optional[Callable[[Doc], bool]]~~ |
| **RETURNS** | A dictionary containing the PRF scores under the keys `{attr}_p`, `{attr}_r`, `{attr}_f` and the per-type PRF scores under `{attr}_per_type`. ~~Dict[str, Union[float, Dict[str, float]]]~~ |
| Name | Description |
| ---------------- | ------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------- |
| `examples` | The `Example` objects holding both the predictions and the correct gold-standard annotations. ~~Iterable[Example]~~ |
| `attr` | The attribute to score. ~~str~~ |
| _keyword-only_ | |
| `getter` | Defaults to `getattr`. If provided, `getter(doc, attr)` should return the `Span` objects for an individual `Doc`. ~~Callable[[Doc, str], Iterable[Span]]~~ |
| `has_annotation` | Defaults to `None`. If provided, `has_annotation(doc)` should return whether a `Doc` has annotation for this `attr`. Docs without annotation are skipped for scoring purposes. ~~Optional[Callable[[Doc], bool]]~~ |
| `labeled` | Defaults to `True`. If set to `False`, two spans will be considered equal if their start and end match, irrespective of their label. ~~bool~~ |
| `allow_overlap` | Defaults to `False`. Whether or not to allow overlapping spans. If set to `False`, the alignment will automatically resolve conflicts. ~~bool~~ |
| **RETURNS** | A dictionary containing the PRF scores under the keys `{attr}_p`, `{attr}_r`, `{attr}_f` and the per-type PRF scores under `{attr}_per_type`. ~~Dict[str, Union[float, Dict[str, float]]]~~ |
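A minimal sketch of the new `labeled` flag (documents and labels invented for illustration): with `labeled=False`, a predicted span with the right boundaries but a different label still counts as correct.

```python
import spacy
from spacy.scorer import Scorer
from spacy.tokens import Span
from spacy.training import Example

nlp = spacy.blank("en")
predicted = nlp.make_doc("Berlin is nice in summer")
predicted.ents = [Span(predicted, 0, 1, label="LOC")]
reference = nlp.make_doc("Berlin is nice in summer")
reference.ents = [Span(reference, 0, 1, label="GPE")]
example = Example(predicted, reference)

scores = Scorer.score_spans([example], "ents", labeled=False)
print(scores["ents_f"])  # boundaries match, so the label mismatch is ignored
```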
## Scorer.score_deps {#score_deps tag="staticmethod" new="3"}
@@ -364,7 +364,7 @@ unknown. Defaults to `True` for the first token in the `Doc`.

| Name | Description |
| ----------- | --------------------------------------------- |
| **RETURNS** | Whether the token starts a sentence. ~~bool~~ |
| **RETURNS** | Whether the token starts a sentence. ~~Optional[bool]~~ |
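To make the `Optional[bool]` return type concrete, a small sketch with a blank pipeline (no sentence boundaries set yet):

```python
import spacy

nlp = spacy.blank("en")
doc = nlp.make_doc("This is a sentence . This is another one .")
assert doc[0].is_sent_start is True   # first token defaults to True
assert doc[1].is_sent_start is None   # unknown: nothing has set boundaries yet
doc[5].is_sent_start = True           # mark the second "This" as a sentence start
assert doc[5].is_sent_start is True
```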

## Token.has_vector {#has_vector tag="property" model="vectors"}
@@ -8,6 +8,7 @@ menu:

- ['Readers', 'readers']
- ['Batchers', 'batchers']
- ['Augmenters', 'augmenters']
- ['Callbacks', 'callbacks']
- ['Training & Alignment', 'gold']
- ['Utility Functions', 'util']
---

@@ -461,7 +462,7 @@ start decreasing across epochs.

</Accordion>

#### spacy.WandbLogger.v1 {#WandbLogger tag="registered function"}
#### spacy.WandbLogger.v2 {#WandbLogger tag="registered function"}

> #### Installation
>
@@ -493,15 +494,19 @@ remain in the config file stored on your local system.

>
> ```ini
> [training.logger]
> @loggers = "spacy.WandbLogger.v1"
> @loggers = "spacy.WandbLogger.v2"
> project_name = "monitor_spacy_training"
> remove_config_values = ["paths.train", "paths.dev", "corpora.train.path", "corpora.dev.path"]
> log_dataset_dir = "corpus"
> model_log_interval = 1000
> ```

| Name | Description |
| ---------------------- | ------------------------------------------------------------------------------------------------------------------------------------- |
| `project_name` | The name of the project in the Weights & Biases interface. The project will be created automatically if it doesn't exist yet. ~~str~~ |
| `remove_config_values` | A list of values to exclude from the config before it is uploaded to W&B (default: empty). ~~List[str]~~ |
| `model_log_interval` | Steps to wait between logging model checkpoints to W&B dashboard (default: None). ~~Optional[int]~~ |
| `log_dataset_dir` | Directory containing dataset to be logged and versioned as W&B artifact (default: None). ~~Optional[str]~~ |

<Project id="integrations/wandb">
@@ -781,6 +786,35 @@ useful for making the model less sensitive to capitalization.

| `level` | The percentage of texts that will be augmented. ~~float~~ |
| **CREATES** | A function that takes the current `nlp` object and an [`Example`](/api/example) and yields augmented `Example` objects. ~~Callable[[Language, Example], Iterator[Example]]~~ |

## Callbacks {#callbacks source="spacy/training/callbacks.py" new="3"}

The config supports [callbacks](/usage/training#custom-code-nlp-callbacks) at
several points in the lifecycle that can be used to modify the `nlp` object.

### spacy.copy_from_base_model.v1 {#copy_from_base_model tag="registered function"}
> #### Example config
>
> ```ini
> [initialize.before_init]
> @callbacks = "spacy.copy_from_base_model.v1"
> tokenizer = "en_core_sci_md"
> vocab = "en_core_sci_md"
> ```

Copy the tokenizer and/or vocab from the specified models. It's similar to the
v2 [base model](https://v2.spacy.io/api/cli#train) option and useful in
combination with
[sourced components](/usage/processing-pipelines#sourced-components) when
fine-tuning an existing pipeline. The vocab includes the lookups and the vectors
from the specified model. Intended for use in `[initialize.before_init]`.

| Name | Description |
| ----------- | ----------------------------------------------------------------------------------------------------------------------- |
| `tokenizer` | The pipeline to copy the tokenizer from. Defaults to `None`. ~~Optional[str]~~ |
| `vocab` | The pipeline to copy the vocab from. The vocab includes the lookups and vectors. Defaults to `None`. ~~Optional[str]~~ |
| **CREATES** | A function that takes the current `nlp` object and modifies its `tokenizer` and `vocab`. ~~Callable[[Language], None]~~ |

## Training data and alignment {#gold source="spacy/training"}

### training.offsets_to_biluo_tags {#offsets_to_biluo_tags tag="function"}
@@ -132,7 +132,7 @@ factory = "tok2vec"

@architectures = "spacy.Tok2Vec.v2"

[components.tok2vec.model.embed]
@architectures = "spacy.MultiHashEmbed.v1"
@architectures = "spacy.MultiHashEmbed.v2"

[components.tok2vec.model.encode]
@architectures = "spacy.MaxoutWindowEncoder.v2"

@@ -164,7 +164,7 @@ factory = "ner"

@architectures = "spacy.Tok2Vec.v2"

[components.ner.model.tok2vec.embed]
@architectures = "spacy.MultiHashEmbed.v1"
@architectures = "spacy.MultiHashEmbed.v2"

[components.ner.model.tok2vec.encode]
@architectures = "spacy.MaxoutWindowEncoder.v2"
@@ -541,7 +541,7 @@ word vector tables using the `include_static_vectors` flag.

```ini
[tagger.model.tok2vec.embed]
@architectures = "spacy.MultiHashEmbed.v1"
@architectures = "spacy.MultiHashEmbed.v2"
width = 128
attrs = ["LOWER","PREFIX","SUFFIX","SHAPE"]
rows = [5000,2500,2500,2500]

@@ -550,7 +550,7 @@ include_static_vectors = true

<Infobox title="How it works" emoji="💡">

The configuration system will look up the string `"spacy.MultiHashEmbed.v1"` in
The configuration system will look up the string `"spacy.MultiHashEmbed.v2"` in
the `architectures` [registry](/api/top-level#registry), and call the returned
object with the rest of the arguments from the block. This will result in a call
to the
@@ -130,9 +130,9 @@ which provides a numpy-compatible interface for GPU arrays.

spaCy can be installed on GPU by specifying `spacy[cuda]`, `spacy[cuda90]`,
`spacy[cuda91]`, `spacy[cuda92]`, `spacy[cuda100]`, `spacy[cuda101]`,
`spacy[cuda102]`, `spacy[cuda110]` or `spacy[cuda111]`. If you know your cuda
version, using the more explicit specifier allows cupy to be installed via
wheel, saving some compilation time. The specifiers should install
`spacy[cuda102]`, `spacy[cuda110]`, `spacy[cuda111]` or `spacy[cuda112]`. If you
know your cuda version, using the more explicit specifier allows cupy to be
installed via wheel, saving some compilation time. The specifiers should install
[`cupy`](https://cupy.chainer.org).

```bash
@@ -137,7 +137,7 @@ nO = null

@architectures = "spacy.Tok2Vec.v2"

[components.textcat.model.tok2vec.embed]
@architectures = "spacy.MultiHashEmbed.v1"
@architectures = "spacy.MultiHashEmbed.v2"
width = 64
rows = [2000, 2000, 1000, 1000, 1000, 1000]
attrs = ["ORTH", "LOWER", "PREFIX", "SUFFIX", "SHAPE", "ID"]

@@ -204,7 +204,7 @@ factory = "tok2vec"

@architectures = "spacy.Tok2Vec.v2"

[components.tok2vec.model.embed]
@architectures = "spacy.MultiHashEmbed.v1"
@architectures = "spacy.MultiHashEmbed.v2"
# ...

[components.tok2vec.model.encode]
@@ -220,7 +220,7 @@ architecture:

```ini
### config.cfg (excerpt)
[components.tok2vec.model.embed]
@architectures = "spacy.CharacterEmbed.v1"
@architectures = "spacy.CharacterEmbed.v2"
# ...

[components.tok2vec.model.encode]
@@ -638,7 +638,7 @@ that has the full implementation.

> @architectures = "rel_instance_tensor.v1"
>
> [model.create_instance_tensor.tok2vec]
> @architectures = "spacy.HashEmbedCNN.v1"
> @architectures = "spacy.HashEmbedCNN.v2"
> # ...
>
> [model.create_instance_tensor.pooling]
@@ -787,6 +787,7 @@ rather than performance:

```python
def tokenizer_pseudo_code(
    text,
    special_cases,
    prefix_search,
    suffix_search,
@@ -840,12 +841,14 @@ def tokenizer_pseudo_code(

            tokens.append(substring)
            substring = ""
        tokens.extend(reversed(suffixes))
    for match in matcher(special_cases, text):
        tokens.replace(match, special_cases[match])
    return tokens
```

The algorithm can be summarized as follows:

1. Iterate over whitespace-separated substrings.
1. Iterate over space-separated substrings.
2. Look for a token match. If there is a match, stop processing and keep this
   token.
3. Check whether we have an explicitly defined special case for this substring.

@@ -859,6 +862,8 @@ The algorithm can be summarized as follows:

8. Look for "infixes" – stuff like hyphens etc. and split the substring into
   tokens on all infixes.
9. Once we can't consume any more of the string, handle it as a single token.
10. Make a final pass over the text to check for special cases that include
    spaces or that were missed due to the incremental processing of affixes.

</Accordion>
@@ -995,7 +995,7 @@ your results.

>
> ```ini
> [training.logger]
> @loggers = "spacy.WandbLogger.v1"
> @loggers = "spacy.WandbLogger.v2"
> project_name = "monitor_spacy_training"
> remove_config_values = ["paths.train", "paths.dev", "corpora.train.path", "corpora.dev.path"]
> ```
@@ -1130,8 +1130,8 @@ any other custom workflows. `corpora.train` and `corpora.dev` are used as

conventions within spaCy's default configs, but you can also define any other
custom blocks. Each section in the corpora config should resolve to a
[`Corpus`](/api/corpus) – for example, using spaCy's built-in
[corpus reader](/api/top-level#readers) that takes a path to a binary `.spacy`
file. The `train_corpus` and `dev_corpus` fields in the
[corpus reader](/api/top-level#corpus-readers) that takes a path to a binary
`.spacy` file. The `train_corpus` and `dev_corpus` fields in the
[`[training]`](/api/data-formats#config-training) block specify where to find
the corpus in your config. This makes it easy to **swap out** different corpora
by only changing a single config setting.
@@ -1142,21 +1142,23 @@ corpora, keyed by corpus name, e.g. `"train"` and `"dev"`. This can be

especially useful if you need to split a single file into corpora for training
and evaluation, without loading the same file twice.

By default, the training data is loaded into memory and shuffled before each
epoch. If the corpus is **too large to fit into memory** during training, stream
the corpus using a custom reader as described in the next section.

### Custom data reading and batching {#custom-code-readers-batchers}

Some use-cases require **streaming in data** or manipulating datasets on the
fly, rather than generating all data beforehand and storing it to file. Instead
fly, rather than generating all data beforehand and storing it to disk. Instead
of using the built-in [`Corpus`](/api/corpus) reader, which uses static file
paths, you can create and register a custom function that generates
[`Example`](/api/example) objects. The resulting generator can be infinite. When
using this dataset for training, stopping criteria such as maximum number of
steps, or stopping when the loss does not decrease further, can be used.
[`Example`](/api/example) objects.

In this example we assume a custom function `read_custom_data` which loads or
generates texts with relevant text classification annotations. Then, small
lexical variations of the input text are created before generating the final
[`Example`](/api/example) objects. The `@spacy.registry.readers` decorator lets
you register the function creating the custom reader in the `readers`
In the following example we assume a custom function `read_custom_data` which
loads or generates texts with relevant text classification annotations. Then,
small lexical variations of the input text are created before generating the
final [`Example`](/api/example) objects. The `@spacy.registry.readers` decorator
lets you register the function creating the custom reader in the `readers`
[registry](/api/top-level#registry) and assign it a string name, so it can be
used in your config. All arguments on the registered function become available
as **config settings** – in this case, `source`.
@@ -1199,6 +1201,80 @@ Remember that a registered function should always be a function that spaCy

</Infobox>

If the corpus is **too large to load into memory** or the corpus reader is an
**infinite generator**, use the setting `max_epochs = -1` to indicate that the
train corpus should be streamed. With this setting the train corpus is merely
streamed and batched, not shuffled, so any shuffling needs to be implemented in
the corpus reader itself. In the example below, a corpus reader that generates
sentences containing even or odd numbers is used with an unlimited number of
examples for the train corpus and a limited number of examples for the dev
corpus. The dev corpus should always be finite and fit in memory during the
evaluation step. `max_steps` and/or `patience` are used to determine when the
training should stop.

> #### config.cfg
>
> ```ini
> [corpora.dev]
> @readers = "even_odd.v1"
> limit = 100
>
> [corpora.train]
> @readers = "even_odd.v1"
> limit = -1
>
> [training]
> max_epochs = -1
> patience = 500
> max_steps = 2000
> ```

```python
### functions.py
from typing import Callable, Iterable, Iterator
from spacy import util
import random
from spacy.training import Example
from spacy import Language


@util.registry.readers("even_odd.v1")
def create_even_odd_corpus(limit: int = -1) -> Callable[[Language], Iterable[Example]]:
    return EvenOddCorpus(limit)


class EvenOddCorpus:
    def __init__(self, limit):
        self.limit = limit

    def __call__(self, nlp: Language) -> Iterator[Example]:
        i = 0
        while i < self.limit or self.limit < 0:
            r = random.randint(0, 1000)
            cat = r % 2 == 0
            text = "This is sentence " + str(r)
            yield Example.from_dict(
                nlp.make_doc(text), {"cats": {"EVEN": cat, "ODD": not cat}}
            )
            i += 1
```
> #### config.cfg
>
> ```ini
> [initialize.components.textcat.labels]
> @readers = "spacy.read_labels.v1"
> path = "labels/textcat.json"
> require = true
> ```

If the train corpus is streamed, the initialize step peeks at the first 100
examples in the corpus to find the labels for each component. If this isn't
sufficient, you'll need to [provide the labels](#initialization-labels) for each
component in the `[initialize]` block. [`init labels`](/api/cli#init-labels) can
be used to generate JSON files in the correct format, which you can extend with
the full label set.

We can also customize the **batching strategy** by registering a new batcher
function in the `batchers` [registry](/api/top-level#registry). A batcher turns
a stream of items into a stream of batches. spaCy has several useful built-in
@@ -616,11 +616,11 @@ Note that spaCy v3.0 now requires **Python 3.6+**.

| `spacy profile` | [`spacy debug profile`](/api/cli#debug-profile) |
| `spacy link`, `util.set_data_path`, `util.get_data_path` | not needed, symlinks are deprecated |

The following deprecated methods, attributes and arguments were removed in v3.0.
Most of them have been **deprecated for a while** and many would previously
raise errors. Many of them were also mostly internals. If you've been working
with more recent versions of spaCy v2.x, it's **unlikely** that your code relied
on them.
The following methods, attributes and arguments were removed in v3.0. Most of
them have been **deprecated for a while** and many would previously raise
errors. Many of them were also mostly internals. If you've been working with
more recent versions of spaCy v2.x, it's **unlikely** that your code relied on
them.

| Removed | Replacement |
| ----------------------------------------------------------------------------------------------------------------------- | ---------------------------------------------------------------------------------------------------------------------------------------------------------- |
@@ -637,10 +637,10 @@ on them.

### Downloading and loading trained pipelines {#migrating-downloading-models}

Symlinks and shortcuts like `en` are now officially deprecated. There are
[many different trained pipelines](/models) with different capabilities and not
just one "English model". In order to download and load a package, you should
always use its full name – for instance,
Symlinks and shortcuts like `en` have been deprecated for a while, and are now
not supported anymore. There are [many different trained pipelines](/models)
with different capabilities and not just one "English model". In order to
download and load a package, you should always use its full name – for instance,
[`en_core_web_sm`](/models/en#en_core_web_sm).

```diff
@@ -1185,9 +1185,10 @@ package isn't imported.

In Jupyter notebooks, run [`prefer_gpu`](/api/top-level#spacy.prefer_gpu),
[`require_gpu`](/api/top-level#spacy.require_gpu) or
[`require_cpu`](/api/top-level#spacy.require_cpu) in the same cell as
[`spacy.load`](/api/top-level#spacy.load) to ensure that the model is loaded on the correct device.
[`spacy.load`](/api/top-level#spacy.load) to ensure that the model is loaded on
the correct device.
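For instance, a single notebook cell might look like the following sketch (assuming the `en_core_web_sm` package is installed):

```python
import spacy

# keep the GPU call and the load in the same cell
spacy.prefer_gpu()  # or spacy.require_gpu() to fail loudly without a GPU
nlp = spacy.load("en_core_web_sm")
```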

Due to a bug related to `contextvars` (see the [bug
report](https://github.com/ipython/ipython/issues/11565)), the GPU settings may
not be preserved correctly across cells, resulting in models being loaded on
Due to a bug related to `contextvars` (see the
[bug report](https://github.com/ipython/ipython/issues/11565)), the GPU settings
may not be preserved correctly across cells, resulting in models being loaded on
the wrong device or only partially on GPU.