mirror of https://github.com/explosion/spaCy.git
synced 2025-07-10 16:22:29 +03:00

commit 78c72d3ab7
Merge branch 'main' into feature/docwise-generator-batching
.github/FUNDING.yml | 1 (vendored, new file)
@@ -0,0 +1 @@
+custom: [https://explosion.ai/merch, https://explosion.ai/tailored-solutions]
.github/workflows/tests.yml | 4 (vendored)
@@ -58,7 +58,7 @@ jobs:
       fail-fast: true
       matrix:
         os: [ubuntu-latest, windows-latest, macos-latest]
-        python_version: ["3.11"]
+        python_version: ["3.12"]
         include:
           - os: macos-latest
             python_version: "3.8"
@@ -66,6 +66,8 @@ jobs:
             python_version: "3.9"
           - os: windows-latest
             python_version: "3.10"
+          - os: macos-latest
+            python_version: "3.11"
 
     runs-on: ${{ matrix.os }}
 
LICENSE | 2
@@ -1,6 +1,6 @@
 The MIT License (MIT)
 
-Copyright (C) 2016-2022 ExplosionAI GmbH, 2016 spaCy GmbH, 2015 Matthew Honnibal
+Copyright (C) 2016-2023 ExplosionAI GmbH, 2016 spaCy GmbH, 2015 Matthew Honnibal
 
 Permission is hereby granted, free of charge, to any person obtaining a copy
 of this software and associated documentation files (the "Software"), to deal
README.md | 79
@@ -6,23 +6,20 @@ spaCy is a library for **advanced Natural Language Processing** in Python and
 Cython. It's built on the very latest research, and was designed from day one to
 be used in real products.
 
-spaCy comes with
-[pretrained pipelines](https://spacy.io/models) and
-currently supports tokenization and training for **70+ languages**. It features
-state-of-the-art speed and **neural network models** for tagging,
-parsing, **named entity recognition**, **text classification** and more,
-multi-task learning with pretrained **transformers** like BERT, as well as a
+spaCy comes with [pretrained pipelines](https://spacy.io/models) and currently
+supports tokenization and training for **70+ languages**. It features
+state-of-the-art speed and **neural network models** for tagging, parsing,
+**named entity recognition**, **text classification** and more, multi-task
+learning with pretrained **transformers** like BERT, as well as a
 production-ready [**training system**](https://spacy.io/usage/training) and easy
 model packaging, deployment and workflow management. spaCy is commercial
-open-source software, released under the [MIT license](https://github.com/explosion/spaCy/blob/master/LICENSE).
+open-source software, released under the
+[MIT license](https://github.com/explosion/spaCy/blob/master/LICENSE).
 
-💥 **We'd love to hear more about your experience with spaCy!**
-[Fill out our survey here.](https://form.typeform.com/to/aMel9q9f)
-
-💫 **Version 3.5 out now!**
+💫 **Version 3.7 out now!**
 [Check out the release notes here.](https://github.com/explosion/spaCy/releases)
 
-[](https://dev.azure.com/explosion-ai/public/_build?definitionId=8)
+[](https://github.com/explosion/spaCy/actions/workflows/tests.yml)
 [](https://github.com/explosion/spaCy/releases)
 [](https://pypi.org/project/spacy/)
 [](https://anaconda.org/conda-forge/spacy)
@@ -35,35 +32,42 @@ open-source software, released under the [MIT license](https://github.com/explos
 
 ## 📖 Documentation
 
 | Documentation | |
-| ----------------------------- | ---------------------------------------------------------------------- |
+| --- | --- |
 | ⭐️ **[spaCy 101]** | New to spaCy? Here's everything you need to know! |
 | 📚 **[Usage Guides]** | How to use spaCy and its features. |
 | 🚀 **[New in v3.0]** | New features, backwards incompatibilities and migration guide. |
 | 🪐 **[Project Templates]** | End-to-end workflows you can clone, modify and run. |
 | 🎛 **[API Reference]** | The detailed reference for spaCy's API. |
+| ⏩ **[GPU Processing]** | Use spaCy with CUDA-compatible GPU processing. |
 | 📦 **[Models]** | Download trained pipelines for spaCy. |
+| 🦙 **[Large Language Models]** | Integrate LLMs into spaCy pipelines. |
 | 🌌 **[Universe]** | Plugins, extensions, demos and books from the spaCy ecosystem. |
 | ⚙️ **[spaCy VS Code Extension]** | Additional tooling and features for working with spaCy's config files. |
 | 👩🏫 **[Online Course]** | Learn spaCy in this free and interactive online course. |
+| 📰 **[Blog]** | Read about current spaCy and Prodigy development, releases, talks and more from Explosion. |
 | 📺 **[Videos]** | Our YouTube channel with video tutorials, talks and more. |
 | 🛠 **[Changelog]** | Changes and version history. |
 | 💝 **[Contribute]** | How to contribute to the spaCy project and code base. |
-| <a href="https://explosion.ai/spacy-tailored-pipelines"><img src="https://user-images.githubusercontent.com/13643239/152853098-1c761611-ccb0-4ec6-9066-b234552831fe.png" width="125" alt="spaCy Tailored Pipelines"/></a> | Get a custom spaCy pipeline, tailor-made for your NLP problem by spaCy's core developers. Streamlined, production-ready, predictable and maintainable. Start by completing our 5-minute questionnaire to tell us what you need and we'll be in touch! **[Learn more →](https://explosion.ai/spacy-tailored-pipelines)** |
-| <a href="https://explosion.ai/spacy-tailored-analysis"><img src="https://user-images.githubusercontent.com/1019791/206151300-b00cd189-e503-4797-aa1e-1bb6344062c5.png" width="125" alt="spaCy Tailored Pipelines"/></a> | Bespoke advice for problem solving, strategy and analysis for applied NLP projects. Services include data strategy, code reviews, pipeline design and annotation coaching. Curious? Fill in our 5-minute questionnaire to tell us what you need and we'll be in touch! **[Learn more →](https://explosion.ai/spacy-tailored-analysis)** |
+| 👕 **[Swag]** | Support us and our work with unique, custom-designed swag! |
+| <a href="https://explosion.ai/tailored-solutions"><img src="https://github.com/explosion/spaCy/assets/13643239/36d2a42e-98c0-4599-90e1-788ef75181be" width="150" alt="Tailored Solutions"/></a> | Custom NLP consulting, implementation and strategic advice by spaCy's core development team. Streamlined, production-ready, predictable and maintainable. Send us an email or take our 5-minute questionnaire, and we'll be in touch! **[Learn more →](https://explosion.ai/tailored-solutions)** |
 
 [spacy 101]: https://spacy.io/usage/spacy-101
 [new in v3.0]: https://spacy.io/usage/v3
 [usage guides]: https://spacy.io/usage/
 [api reference]: https://spacy.io/api/
+[gpu processing]: https://spacy.io/usage#gpu
 [models]: https://spacy.io/models
+[large language models]: https://spacy.io/usage/large-language-models
 [universe]: https://spacy.io/universe
-[spaCy VS Code Extension]: https://github.com/explosion/spacy-vscode
+[spacy vs code extension]: https://github.com/explosion/spacy-vscode
 [videos]: https://www.youtube.com/c/ExplosionAI
 [online course]: https://course.spacy.io
+[blog]: https://explosion.ai
 [project templates]: https://github.com/explosion/projects
 [changelog]: https://spacy.io/usage#changelog
 [contribute]: https://github.com/explosion/spaCy/blob/master/CONTRIBUTING.md
+[swag]: https://explosion.ai/merch
 
 ## 💬 Where to ask questions
 
@@ -92,7 +96,9 @@ more people can benefit from it.
 - State-of-the-art speed
 - Production-ready **training system**
 - Linguistically-motivated **tokenization**
-- Components for named **entity recognition**, part-of-speech-tagging, dependency parsing, sentence segmentation, **text classification**, lemmatization, morphological analysis, entity linking and more
+- Components for named **entity recognition**, part-of-speech-tagging,
+  dependency parsing, sentence segmentation, **text classification**,
+  lemmatization, morphological analysis, entity linking and more
 - Easily extensible with **custom components** and attributes
 - Support for custom models in **PyTorch**, **TensorFlow** and other frameworks
 - Built in **visualizers** for syntax and NER
@@ -118,8 +124,8 @@ For detailed installation instructions, see the
 ### pip
 
 Using pip, spaCy releases are available as source packages and binary wheels.
-Before you install spaCy and its dependencies, make sure that
-your `pip`, `setuptools` and `wheel` are up to date.
+Before you install spaCy and its dependencies, make sure that your `pip`,
+`setuptools` and `wheel` are up to date.
 
 ```bash
 pip install -U pip setuptools wheel
@@ -174,9 +180,9 @@ with the new version.
 
 ## 📦 Download model packages
 
-Trained pipelines for spaCy can be installed as **Python packages**. This
-means that they're a component of your application, just like any other module.
-Models can be installed using spaCy's [`download`](https://spacy.io/api/cli#download)
+Trained pipelines for spaCy can be installed as **Python packages**. This means
+that they're a component of your application, just like any other module. Models
+can be installed using spaCy's [`download`](https://spacy.io/api/cli#download)
 command, or manually by pointing pip to a path or URL.
 
 | Documentation | |
@@ -242,8 +248,7 @@ do that depends on your system.
 | **Mac** | Install a recent version of [XCode](https://developer.apple.com/xcode/), including the so-called "Command Line Tools". macOS and OS X ship with Python and git preinstalled. |
 | **Windows** | Install a version of the [Visual C++ Build Tools](https://visualstudio.microsoft.com/visual-cpp-build-tools/) or [Visual Studio Express](https://visualstudio.microsoft.com/vs/express/) that matches the version that was used to compile your Python interpreter. |
 
-For more details
-and instructions, see the documentation on
+For more details and instructions, see the documentation on
 [compiling spaCy from source](https://spacy.io/usage#source) and the
 [quickstart widget](https://spacy.io/usage#section-quickstart) to get the right
 commands for your platform and Python version.
@@ -1,7 +1,4 @@
-# build version constraints for use with wheelwright + multibuild
+# build version constraints for use with wheelwright
 numpy==1.17.3; python_version=='3.8' and platform_machine!='aarch64'
 numpy==1.19.2; python_version=='3.8' and platform_machine=='aarch64'
-numpy==1.19.3; python_version=='3.9'
-numpy==1.21.3; python_version=='3.10'
-numpy==1.23.2; python_version=='3.11'
-numpy; python_version>='3.12'
+numpy>=1.25.0; python_version>='3.9'
@@ -1,14 +1,17 @@
 # Listeners
 
-1. [Overview](#1-overview)
-2. [Initialization](#2-initialization)
-   - [A. Linking listeners to the embedding component](#2a-linking-listeners-to-the-embedding-component)
-   - [B. Shape inference](#2b-shape-inference)
-3. [Internal communication](#3-internal-communication)
-   - [A. During prediction](#3a-during-prediction)
-   - [B. During training](#3b-during-training)
-   - [C. Frozen components](#3c-frozen-components)
-4. [Replacing listener with standalone](#4-replacing-listener-with-standalone)
+- [1. Overview](#1-overview)
+- [2. Initialization](#2-initialization)
+  - [2A. Linking listeners to the embedding component](#2a-linking-listeners-to-the-embedding-component)
+  - [2B. Shape inference](#2b-shape-inference)
+- [3. Internal communication](#3-internal-communication)
+  - [3A. During prediction](#3a-during-prediction)
+  - [3B. During training](#3b-during-training)
+    - [Training with multiple listeners](#training-with-multiple-listeners)
+  - [3C. Frozen components](#3c-frozen-components)
+    - [The Tok2Vec or Transformer is frozen](#the-tok2vec-or-transformer-is-frozen)
+    - [The upstream component is frozen](#the-upstream-component-is-frozen)
+- [4. Replacing listener with standalone](#4-replacing-listener-with-standalone)
 
 ## 1. Overview
 
@@ -62,7 +65,7 @@ of this `find_listener()` method will specifically identify sublayers of a model
 
 If it's a Transformer-based pipeline, a
 [`transformer` component](https://github.com/explosion/spacy-transformers/blob/master/spacy_transformers/pipeline_component.py)
 has a similar implementation but its `find_listener()` function will specifically look for `TransformerListener`
 sublayers of downstream components.
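The `find_listener()` walk described above — scanning sublayers of downstream components for listener layers — can be sketched on a toy model tree. The `Layer` class and all names here are illustrative stand-ins for Thinc models, not spaCy's actual API:

```python
class Layer:
    """Toy stand-in for a Thinc model node with named sublayers."""

    def __init__(self, name, children=()):
        self.name = name
        self.children = list(children)


def find_listeners(model, listener_name):
    """Depth-first search for sublayers whose name matches the listener type,
    roughly what an upstream component's find_listener() does."""
    found = [model] if model.name == listener_name else []
    for child in model.children:
        found.extend(find_listeners(child, listener_name))
    return found


# A downstream tagger whose model embeds a TransformerListener sublayer.
tagger_model = Layer("tagger", [Layer("TransformerListener"), Layer("softmax")])
listeners = find_listeners(tagger_model, "TransformerListener")
```

In the real implementation the match is on layer type rather than a name string, but the traversal shape is the same.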
 ### 2B. Shape inference
 
@@ -154,7 +157,7 @@ as a tagger or a parser. This used to be impossible before 3.1, but has become s
 embedding component in the [`annotating_components`](https://spacy.io/usage/training#annotating-components)
 list of the config. This works like any other "annotating component" because it relies on the `Doc` attributes.
 
 However, if the `Tok2Vec` or `Transformer` is frozen, and not present in `annotating_components`, and a related
 listener isn't frozen, then a `W086` warning is shown and further training of the pipeline will likely end with `E954`.
 
 #### The upstream component is frozen
@@ -216,5 +219,17 @@ new_model = tok2vec_model.attrs["replace_listener"](new_model)
 ```
 
 The new config and model are then properly stored on the `nlp` object.
 Note that this functionality (running the replacement for a transformer listener) was broken prior to
 `spacy-transformers` 1.0.5.
+
+In spaCy 3.7, `Language.replace_listeners` was updated to pass the following additional arguments to the `replace_listener` callback:
+the listener to be replaced and the `tok2vec`/`transformer` pipe from which the new model was copied. To maintain backwards-compatibility,
+the method only passes these extra arguments for callbacks that support them:
+
+```
+def replace_listener_pre_37(copied_tok2vec_model):
+    ...
+
+def replace_listener_post_37(copied_tok2vec_model, replaced_listener, tok2vec_pipe):
+    ...
+```
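The backwards-compatibility behaviour described in that hunk — passing the extra arguments only to callbacks that accept them — can be sketched with the standard library's `inspect.signature`. The function and callback names below are illustrative, not spaCy's actual implementation:

```python
import inspect


def call_replace_listener(callback, copied_model, listener, pipe):
    """Invoke a replace_listener callback, passing the two extra arguments
    only when the callback's signature accepts them (illustrative shim)."""
    n_params = len(inspect.signature(callback).parameters)
    if n_params >= 3:
        return callback(copied_model, listener, pipe)
    return callback(copied_model)


def pre_37(model):  # old-style callback: only the copied model
    return ("old", model)


def post_37(model, listener, pipe):  # 3.7-style callback: all three arguments
    return ("new", model, listener, pipe)
```

A real implementation would also need to account for `*args` and keyword-only parameters, which `inspect.signature` reports as well.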
@@ -158,3 +158,45 @@ AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER
 LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM,
 OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE
 SOFTWARE.
+
+
+SciPy
+-----
+
+* Files: scorer.py
+
+The implementation of trapezoid() is adapted from SciPy, which is distributed
+under the following license:
+
+New BSD License
+
+Copyright (c) 2001-2002 Enthought, Inc. 2003-2023, SciPy Developers.
+All rights reserved.
+
+Redistribution and use in source and binary forms, with or without
+modification, are permitted provided that the following conditions
+are met:
+
+1. Redistributions of source code must retain the above copyright
+   notice, this list of conditions and the following disclaimer.
+
+2. Redistributions in binary form must reproduce the above
+   copyright notice, this list of conditions and the following
+   disclaimer in the documentation and/or other materials provided
+   with the distribution.
+
+3. Neither the name of the copyright holder nor the names of its
+   contributors may be used to endorse or promote products derived
+   from this software without specific prior written permission.
+
+THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS
+"AS IS" AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT
+LIMITED TO, THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR
+A PARTICULAR PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL THE COPYRIGHT
+OWNER OR CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL,
+SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT
+LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF USE,
+DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY
+THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT
+(INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE
+OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.
@@ -5,8 +5,9 @@ requires = [
     "cymem>=2.0.2,<2.1.0",
     "preshed>=3.0.2,<3.1.0",
     "murmurhash>=0.28.0,<1.1.0",
-    "thinc>=9.0.0.dev2,<9.1.0",
-    "numpy>=1.15.0",
+    "thinc>=9.0.0.dev4,<9.1.0",
+    "numpy>=1.15.0; python_version < '3.9'",
+    "numpy>=1.25.0; python_version >= '3.9'",
 ]
 build-backend = "setuptools.build_meta"
@@ -1,22 +1,23 @@
 # Our libraries
-spacy-legacy>=4.0.0.dev0,<4.1.0
+spacy-legacy>=4.0.0.dev1,<4.1.0
 spacy-loggers>=1.0.0,<2.0.0
 cymem>=2.0.2,<2.1.0
 preshed>=3.0.2,<3.1.0
-thinc>=9.0.0.dev2,<9.1.0
+thinc>=9.0.0.dev4,<9.1.0
 ml_datasets>=0.2.0,<0.3.0
 murmurhash>=0.28.0,<1.1.0
 wasabi>=0.9.1,<1.2.0
 srsly>=2.4.3,<3.0.0
 catalogue>=2.0.6,<2.1.0
 typer>=0.3.0,<0.10.0
-pathy>=0.10.0
 smart-open>=5.2.1,<7.0.0
+weasel>=0.1.0,<0.4.0
 # Third party dependencies
-numpy>=1.15.0
+numpy>=1.15.0; python_version < "3.9"
+numpy>=1.19.0; python_version >= "3.9"
 requests>=2.13.0,<3.0.0
 tqdm>=4.38.0,<5.0.0
-pydantic>=1.7.4,!=1.8,!=1.8.1,<1.11.0
+pydantic>=1.7.4,!=1.8,!=1.8.1,<3.0.0
 jinja2
 langcodes>=3.2.0,<4.0.0
 # Official Python utilities
@@ -30,11 +31,11 @@ pytest-timeout>=1.3.0,<2.0.0
 mock>=2.0.0,<3.0.0
 flake8>=3.8.0,<6.0.0
 hypothesis>=3.27.0,<7.0.0
-mypy>=0.990,<1.1.0; platform_machine != "aarch64"
+mypy>=1.5.0,<1.6.0; platform_machine != "aarch64" and python_version >= "3.8"
 types-mock>=0.1.1
 types-setuptools>=57.0.0
 types-requests
 types-setuptools>=57.0.0
 black==22.3.0
-cython-lint>=0.15.0; python_version >= "3.7"
+cython-lint>=0.15.0
 isort>=5.0,<6.0
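The numpy pins above branch on the interpreter using PEP 508 environment markers such as `python_version >= "3.9"`, so one requirements file can serve several Python versions. As a rough illustration of what such a marker decides, here is a stdlib-only sketch (the helper name is ours, not part of pip or spaCy):

```python
import sys


def marker_python_version_at_least(spec: str) -> bool:
    """Roughly what the PEP 508 marker `python_version >= "X.Y"` evaluates,
    using only the standard library (illustrative helper, not pip's API)."""
    wanted = tuple(int(part) for part in spec.split("."))
    # sys.version_info slices compare tuple-wise, e.g. (3, 11) >= (3, 9).
    return sys.version_info[: len(wanted)] >= wanted


# Choose the numpy pin the same way the requirement lines above branch.
pin = "numpy>=1.19.0" if marker_python_version_at_least("3.9") else "numpy>=1.15.0"
print(pin)
```

Real marker evaluation (handled by pip via the `packaging` library) also covers `platform_machine`, as used by the mypy and aarch64 numpy lines.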
setup.cfg | 25
@@ -30,33 +30,26 @@ project_urls =
 zip_safe = false
 include_package_data = true
 python_requires = >=3.8
-setup_requires =
-    cython>=0.25,<3.0
-    numpy>=1.15.0
-    # We also need our Cython packages here to compile against
-    cymem>=2.0.2,<2.1.0
-    preshed>=3.0.2,<3.1.0
-    murmurhash>=0.28.0,<1.1.0
-    thinc>=9.0.0.dev2,<9.1.0
 install_requires =
     # Our libraries
-    spacy-legacy>=4.0.0.dev0,<4.1.0
+    spacy-legacy>=4.0.0.dev1,<4.1.0
     spacy-loggers>=1.0.0,<2.0.0
     murmurhash>=0.28.0,<1.1.0
     cymem>=2.0.2,<2.1.0
     preshed>=3.0.2,<3.1.0
-    thinc>=9.0.0.dev2,<9.1.0
+    thinc>=9.0.0.dev4,<9.1.0
     wasabi>=0.9.1,<1.2.0
     srsly>=2.4.3,<3.0.0
     catalogue>=2.0.6,<2.1.0
+    weasel>=0.1.0,<0.4.0
     # Third-party dependencies
     typer>=0.3.0,<0.10.0
-    pathy>=0.10.0
     smart-open>=5.2.1,<7.0.0
     tqdm>=4.38.0,<5.0.0
-    numpy>=1.15.0
+    numpy>=1.15.0; python_version < "3.9"
+    numpy>=1.19.0; python_version >= "3.9"
     requests>=2.13.0,<3.0.0
-    pydantic>=1.7.4,!=1.8,!=1.8.1,<1.11.0
+    pydantic>=1.7.4,!=1.8,!=1.8.1,<3.0.0
     jinja2
     # Official Python utilities
     setuptools
@@ -71,9 +64,7 @@ console_scripts =
 lookups =
     spacy_lookups_data>=1.0.3,<1.1.0
 transformers =
-    spacy_transformers>=1.1.2,<1.3.0
-ray =
-    spacy_ray>=0.1.0,<1.0.0
+    spacy_transformers>=1.1.2,<1.4.0
 cuda =
     cupy>=5.0.0b4,<13.0.0
 cuda80 =
@@ -108,6 +99,8 @@ cuda117 =
     cupy-cuda117>=5.0.0b4,<13.0.0
 cuda11x =
     cupy-cuda11x>=11.0.0,<13.0.0
+cuda12x =
+    cupy-cuda12x>=11.5.0,<13.0.0
 cuda-autodetect =
     cupy-wheel>=11.0.0,<13.0.0
 apple =
38
setup.py
38
setup.py
|
@ -1,10 +1,9 @@
|
||||||
#!/usr/bin/env python
|
#!/usr/bin/env python
|
||||||
from setuptools import Extension, setup, find_packages
|
from setuptools import Extension, setup, find_packages
|
||||||
import sys
|
import sys
|
||||||
import platform
|
|
||||||
import numpy
|
import numpy
|
||||||
from distutils.command.build_ext import build_ext
|
from setuptools.command.build_ext import build_ext
|
||||||
from distutils.sysconfig import get_python_inc
|
from sysconfig import get_path
|
||||||
from pathlib import Path
|
from pathlib import Path
|
||||||
import shutil
|
import shutil
|
||||||
from Cython.Build import cythonize
|
from Cython.Build import cythonize
|
||||||
|
@ -33,10 +32,12 @@ MOD_NAMES = [
|
||||||
"spacy.kb.candidate",
|
"spacy.kb.candidate",
|
||||||
"spacy.kb.kb",
|
"spacy.kb.kb",
|
||||||
"spacy.kb.kb_in_memory",
|
"spacy.kb.kb_in_memory",
|
||||||
"spacy.ml.tb_framework",
|
"spacy.ml.parser_model",
|
||||||
"spacy.morphology",
|
"spacy.morphology",
|
||||||
|
"spacy.pipeline.dep_parser",
|
||||||
"spacy.pipeline._edit_tree_internals.edit_trees",
|
"spacy.pipeline._edit_tree_internals.edit_trees",
|
||||||
"spacy.pipeline.morphologizer",
|
"spacy.pipeline.morphologizer",
|
||||||
|
"spacy.pipeline.ner",
|
||||||
"spacy.pipeline.pipe",
|
"spacy.pipeline.pipe",
|
||||||
"spacy.pipeline.trainable_pipe",
|
"spacy.pipeline.trainable_pipe",
|
||||||
"spacy.pipeline.sentencizer",
|
"spacy.pipeline.sentencizer",
|
||||||
|
@ -44,7 +45,6 @@ MOD_NAMES = [
|
||||||
"spacy.pipeline.tagger",
|
"spacy.pipeline.tagger",
|
||||||
"spacy.pipeline.transition_parser",
|
"spacy.pipeline.transition_parser",
|
||||||
"spacy.pipeline._parser_internals.arc_eager",
|
"spacy.pipeline._parser_internals.arc_eager",
|
||||||
"spacy.pipeline._parser_internals.batch",
|
|
||||||
"spacy.pipeline._parser_internals.ner",
|
"spacy.pipeline._parser_internals.ner",
|
||||||
"spacy.pipeline._parser_internals.nonproj",
|
"spacy.pipeline._parser_internals.nonproj",
|
||||||
"spacy.pipeline._parser_internals.search",
|
"spacy.pipeline._parser_internals.search",
|
||||||
|
@ -52,7 +52,6 @@ MOD_NAMES = [
|
||||||
"spacy.pipeline._parser_internals.stateclass",
|
"spacy.pipeline._parser_internals.stateclass",
|
||||||
"spacy.pipeline._parser_internals.transition_system",
|
"spacy.pipeline._parser_internals.transition_system",
|
||||||
"spacy.pipeline._parser_internals._beam_utils",
|
"spacy.pipeline._parser_internals._beam_utils",
|
||||||
"spacy.pipeline._parser_internals._parser_utils",
|
|
||||||
"spacy.tokenizer",
|
"spacy.tokenizer",
|
||||||
"spacy.training.align",
|
"spacy.training.align",
|
||||||
"spacy.training.gold_io",
|
"spacy.training.gold_io",
|
||||||
|
@ -80,6 +79,7 @@ COMPILER_DIRECTIVES = {
|
||||||
"language_level": -3,
|
"language_level": -3,
|
||||||
"embedsignature": True,
|
"embedsignature": True,
|
||||||
"annotation_typing": False,
|
"annotation_typing": False,
|
||||||
|
"profile": sys.version_info < (3, 12),
|
||||||
}
|
}
|
||||||
# Files to copy into the package that are otherwise not included
|
# Files to copy into the package that are otherwise not included
|
||||||
COPY_FILES = {
|
COPY_FILES = {
|
||||||
|
@ -89,30 +89,6 @@ COPY_FILES = {
|
||||||
}
|
}
|
||||||
|
|
||||||
|
|
||||||
def is_new_osx():
|
|
||||||
"""Check whether we're on OSX >= 10.7"""
|
|
||||||
if sys.platform != "darwin":
|
|
||||||
return False
|
|
||||||
mac_ver = platform.mac_ver()[0]
|
|
||||||
if mac_ver.startswith("10"):
|
|
||||||
minor_version = int(mac_ver.split(".")[1])
|
|
||||||
if minor_version >= 7:
|
|
||||||
return True
|
|
||||||
else:
|
|
||||||
return False
|
|
||||||
return False
|
|
||||||
|
|
||||||
|
|
||||||
if is_new_osx():
|
|
||||||
# On Mac, use libc++ because Apple deprecated use of
|
|
||||||
# libstdc
|
|
||||||
COMPILE_OPTIONS["other"].append("-stdlib=libc++")
|
|
||||||
LINK_OPTIONS["other"].append("-lc++")
|
|
||||||
# g++ (used by unix compiler on mac) links to libstdc++ as a default lib.
|
|
||||||
# See: https://stackoverflow.com/questions/1653047/avoid-linking-to-libstdc
|
|
||||||
LINK_OPTIONS["other"].append("-nodefaultlibs")
|
|
||||||
|
|
||||||
|
|
||||||
# By subclassing build_extensions we have the actual compiler that will be used which is really known only after finalize_options
|
# By subclassing build_extensions we have the actual compiler that will be used which is really known only after finalize_options
|
||||||
# http://stackoverflow.com/questions/724664/python-distutils-how-to-get-a-compiler-that-is-going-to-be-used
|
# http://stackoverflow.com/questions/724664/python-distutils-how-to-get-a-compiler-that-is-going-to-be-used
|
||||||
class build_ext_options:
|
class build_ext_options:
|
||||||
|
@ -205,7 +181,7 @@ def setup_package():
|
||||||
|
|
||||||
include_dirs = [
|
include_dirs = [
|
||||||
numpy.get_include(),
|
numpy.get_include(),
|
||||||
get_python_inc(plat_specific=True),
|
get_path("include"),
|
||||||
]
|
]
|
||||||
ext_modules = []
|
ext_modules = []
|
||||||
ext_modules.append(
|
ext_modules.append(
|
||||||
|
|
|
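The last hunk above swaps the deprecated `distutils` helper `get_python_inc(plat_specific=True)` for `sysconfig.get_path("include")`, since `distutils` is removed in Python 3.12. A minimal stdlib-only sketch of the replacement:

```python
# sysconfig.get_path("include") returns the directory containing the
# CPython headers (Python.h), replacing distutils.sysconfig.get_python_inc().
import sysconfig

include_dir = sysconfig.get_path("include")
print(include_dir)  # e.g. a path ending in .../include/python3.X
```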
@@ -1,7 +1,9 @@
 # fmt: off
 __title__ = "spacy"
-__version__ = "4.0.0.dev1"
+__version__ = "4.0.0.dev2"
 __download_url__ = "https://github.com/explosion/spacy-models/releases/download"
 __compatibility__ = "https://raw.githubusercontent.com/explosion/spacy-models/master/compatibility.json"
 __projects__ = "https://github.com/explosion/projects"
 __projects_branch__ = "v3"
+__lookups_tag__ = "v1.0.3"
+__lookups_url__ = f"https://raw.githubusercontent.com/explosion/spacy-lookups-data/{__lookups_tag__}/spacy_lookups_data/data/"

@@ -1,3 +1,4 @@
+# cython: profile=False
 from .errors import Errors
 
 IOB_STRINGS = ("", "I", "O", "B")

@@ -14,6 +14,7 @@ from .debug_diff import debug_diff  # noqa: F401
 from .debug_model import debug_model  # noqa: F401
 from .download import download  # noqa: F401
 from .evaluate import evaluate  # noqa: F401
+from .find_function import find_function  # noqa: F401
 from .find_threshold import find_threshold  # noqa: F401
 from .info import info  # noqa: F401
 from .init_config import fill_config, init_config  # noqa: F401
@@ -21,15 +22,17 @@ from .init_pipeline import init_pipeline_cli  # noqa: F401
 from .package import package  # noqa: F401
 from .pretrain import pretrain  # noqa: F401
 from .profile import profile  # noqa: F401
-from .project.assets import project_assets  # noqa: F401
-from .project.clone import project_clone  # noqa: F401
-from .project.document import project_document  # noqa: F401
-from .project.dvc import project_update_dvc  # noqa: F401
-from .project.pull import project_pull  # noqa: F401
-from .project.push import project_push  # noqa: F401
-from .project.run import project_run  # noqa: F401
-from .train import train_cli  # noqa: F401
-from .validate import validate  # noqa: F401
+from .project.assets import project_assets  # type: ignore[attr-defined] # noqa: F401
+from .project.clone import project_clone  # type: ignore[attr-defined] # noqa: F401
+from .project.document import (  # type: ignore[attr-defined] # noqa: F401
+    project_document,
+)
+from .project.dvc import project_update_dvc  # type: ignore[attr-defined] # noqa: F401
+from .project.pull import project_pull  # type: ignore[attr-defined] # noqa: F401
+from .project.push import project_push  # type: ignore[attr-defined] # noqa: F401
+from .project.run import project_run  # type: ignore[attr-defined] # noqa: F401
+from .train import train_cli  # type: ignore[attr-defined] # noqa: F401
+from .validate import validate  # type: ignore[attr-defined] # noqa: F401
 
 
 @app.command("link", no_args_is_help=True, deprecated=True, hidden=True)

@@ -26,10 +26,11 @@ from thinc.api import Config, ConfigValidationError, require_gpu
 from thinc.util import gpu_is_available
 from typer.main import get_command
 from wasabi import Printer, msg
+from weasel import app as project_cli
 
 from .. import about
 from ..errors import RENAMED_LANGUAGE_CODES
-from ..schemas import ProjectConfigSchema, validate
+from ..schemas import validate
 from ..util import (
     ENV_VARS,
     SimpleFrozenDict,
@@ -41,15 +42,10 @@ from ..util import (
     run_command,
 )
-
-if TYPE_CHECKING:
-    from pathy import FluidPath  # noqa: F401
-
 
 SDIST_SUFFIX = ".tar.gz"
 WHEEL_SUFFIX = "-py3-none-any.whl"
 
 PROJECT_FILE = "project.yml"
-PROJECT_LOCK = "project.lock"
 COMMAND = "python -m spacy"
 NAME = "spacy"
 HELP = """spaCy Command-line Interface
@@ -75,11 +71,10 @@ Opt = typer.Option
 
 app = typer.Typer(name=NAME, help=HELP)
 benchmark_cli = typer.Typer(name="benchmark", help=BENCHMARK_HELP, no_args_is_help=True)
-project_cli = typer.Typer(name="project", help=PROJECT_HELP, no_args_is_help=True)
 debug_cli = typer.Typer(name="debug", help=DEBUG_HELP, no_args_is_help=True)
 init_cli = typer.Typer(name="init", help=INIT_HELP, no_args_is_help=True)
 
-app.add_typer(project_cli)
+app.add_typer(project_cli, name="project", help=PROJECT_HELP, no_args_is_help=True)
 app.add_typer(debug_cli)
 app.add_typer(benchmark_cli)
 app.add_typer(init_cli)
@@ -164,148 +159,6 @@ def _handle_renamed_language_codes(lang: Optional[str]) -> None:
     )
 
 
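The hunk above stops building a local `project` sub-app and instead mounts the external `weasel` CLI under the top-level app with an explicit name and help text. The same "mount a sub-CLI under a parent command" pattern, sketched with stdlib `argparse` subparsers as a hedged analogy (this is not spaCy's actual Typer code):

```python
# A parent parser with a named "project" subcommand, mirroring how the
# diff attaches the weasel Typer app under the name "project".
import argparse

app = argparse.ArgumentParser(prog="spacy")
sub = app.add_subparsers(dest="command")
project = sub.add_parser("project", help="Command-line interface for spaCy projects")
project.add_argument("action", choices=["run", "pull", "push"])

args = app.parse_args(["project", "run"])
assert args.command == "project" and args.action == "run"
```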
-def load_project_config(
-    path: Path, interpolate: bool = True, overrides: Dict[str, Any] = SimpleFrozenDict()
-) -> Dict[str, Any]:
-    """Load the project.yml file from a directory and validate it. Also make
-    sure that all directories defined in the config exist.
-
-    path (Path): The path to the project directory.
-    interpolate (bool): Whether to substitute project variables.
-    overrides (Dict[str, Any]): Optional config overrides.
-    RETURNS (Dict[str, Any]): The loaded project.yml.
-    """
-    config_path = path / PROJECT_FILE
-    if not config_path.exists():
-        msg.fail(f"Can't find {PROJECT_FILE}", config_path, exits=1)
-    invalid_err = f"Invalid {PROJECT_FILE}. Double-check that the YAML is correct."
-    try:
-        config = srsly.read_yaml(config_path)
-    except ValueError as e:
-        msg.fail(invalid_err, e, exits=1)
-    errors = validate(ProjectConfigSchema, config)
-    if errors:
-        msg.fail(invalid_err)
-        print("\n".join(errors))
-        sys.exit(1)
-    validate_project_version(config)
-    validate_project_commands(config)
-    if interpolate:
-        err = f"{PROJECT_FILE} validation error"
-        with show_validation_error(title=err, hint_fill=False):
-            config = substitute_project_variables(config, overrides)
-    # Make sure directories defined in config exist
-    for subdir in config.get("directories", []):
-        dir_path = path / subdir
-        if not dir_path.exists():
-            dir_path.mkdir(parents=True)
-    return config
-
-
-def substitute_project_variables(
-    config: Dict[str, Any],
-    overrides: Dict[str, Any] = SimpleFrozenDict(),
-    key: str = "vars",
-    env_key: str = "env",
-) -> Dict[str, Any]:
-    """Interpolate variables in the project file using the config system.
-
-    config (Dict[str, Any]): The project config.
-    overrides (Dict[str, Any]): Optional config overrides.
-    key (str): Key containing variables in project config.
-    env_key (str): Key containing environment variable mapping in project config.
-    RETURNS (Dict[str, Any]): The interpolated project config.
-    """
-    config.setdefault(key, {})
-    config.setdefault(env_key, {})
-    # Substitute references to env vars with their values
-    for config_var, env_var in config[env_key].items():
-        config[env_key][config_var] = _parse_override(os.environ.get(env_var, ""))
-    # Need to put variables in the top scope again so we can have a top-level
-    # section "project" (otherwise, a list of commands in the top scope wouldn't)
-    # be allowed by Thinc's config system
-    cfg = Config({"project": config, key: config[key], env_key: config[env_key]})
-    cfg = Config().from_str(cfg.to_str(), overrides=overrides)
-    interpolated = cfg.interpolate()
-    return dict(interpolated["project"])
-
-
-def validate_project_version(config: Dict[str, Any]) -> None:
-    """If the project defines a compatible spaCy version range, chec that it's
-    compatible with the current version of spaCy.
-
-    config (Dict[str, Any]): The loaded config.
-    """
-    spacy_version = config.get("spacy_version", None)
-    if spacy_version and not is_compatible_version(about.__version__, spacy_version):
-        err = (
-            f"The {PROJECT_FILE} specifies a spaCy version range ({spacy_version}) "
-            f"that's not compatible with the version of spaCy you're running "
-            f"({about.__version__}). You can edit version requirement in the "
-            f"{PROJECT_FILE} to load it, but the project may not run as expected."
-        )
-        msg.fail(err, exits=1)
-
-
-def validate_project_commands(config: Dict[str, Any]) -> None:
-    """Check that project commands and workflows are valid, don't contain
-    duplicates, don't clash and only refer to commands that exist.
-
-    config (Dict[str, Any]): The loaded config.
-    """
-    command_names = [cmd["name"] for cmd in config.get("commands", [])]
-    workflows = config.get("workflows", {})
-    duplicates = set([cmd for cmd in command_names if command_names.count(cmd) > 1])
-    if duplicates:
-        err = f"Duplicate commands defined in {PROJECT_FILE}: {', '.join(duplicates)}"
-        msg.fail(err, exits=1)
-    for workflow_name, workflow_steps in workflows.items():
-        if workflow_name in command_names:
-            err = f"Can't use workflow name '{workflow_name}': name already exists as a command"
-            msg.fail(err, exits=1)
-        for step in workflow_steps:
-            if step not in command_names:
-                msg.fail(
-                    f"Unknown command specified in workflow '{workflow_name}': {step}",
-                    f"Workflows can only refer to commands defined in the 'commands' "
-                    f"section of the {PROJECT_FILE}.",
-                    exits=1,
-                )
-
-
-def get_hash(data, exclude: Iterable[str] = tuple()) -> str:
-    """Get the hash for a JSON-serializable object.
-
-    data: The data to hash.
-    exclude (Iterable[str]): Top-level keys to exclude if data is a dict.
-    RETURNS (str): The hash.
-    """
-    if isinstance(data, dict):
-        data = {k: v for k, v in data.items() if k not in exclude}
-    data_str = srsly.json_dumps(data, sort_keys=True).encode("utf8")
-    return hashlib.md5(data_str).hexdigest()
-
-
-def get_checksum(path: Union[Path, str]) -> str:
-    """Get the checksum for a file or directory given its file path. If a
-    directory path is provided, this uses all files in that directory.
-
-    path (Union[Path, str]): The file or directory path.
-    RETURNS (str): The checksum.
-    """
-    path = Path(path)
-    if not (path.is_file() or path.is_dir()):
-        msg.fail(f"Can't get checksum for {path}: not a file or directory", exits=1)
-    if path.is_file():
-        return hashlib.md5(Path(path).read_bytes()).hexdigest()
-    else:
-        # TODO: this is currently pretty slow
-        dir_checksum = hashlib.md5()
-        for sub_file in sorted(fp for fp in path.rglob("*") if fp.is_file()):
-            dir_checksum.update(sub_file.read_bytes())
-        return dir_checksum.hexdigest()
-
-
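The removed `get_hash` helper above hashes a JSON-serializable object deterministically by sorting keys before digesting. A self-contained sketch of that idea, with stdlib `json` standing in for `srsly.json_dumps`:

```python
# Stable key order makes the md5 digest identical for equal dicts,
# regardless of insertion order; `exclude` drops top-level keys first.
import hashlib
import json


def get_hash(data, exclude=()):
    if isinstance(data, dict):
        data = {k: v for k, v in data.items() if k not in exclude}
    data_str = json.dumps(data, sort_keys=True).encode("utf8")
    return hashlib.md5(data_str).hexdigest()


assert get_hash({"a": 1, "b": 2}) == get_hash({"b": 2, "a": 1})
assert get_hash({"a": 1, "secret": 3}, exclude=("secret",)) == get_hash({"a": 1})
```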
 @contextmanager
 def show_validation_error(
     file_path: Optional[Union[str, Path]] = None,
@@ -350,6 +203,13 @@ def show_validation_error(
     msg.fail("Config validation error", e, exits=1)
 
 
+def import_code_paths(code_paths: str) -> None:
+    """Helper to import comma-separated list of code paths."""
+    code_paths = [Path(p.strip()) for p in string_to_list(code_paths)]
+    for code_path in code_paths:
+        import_code(code_path)
+
+
 def import_code(code_path: Optional[Union[Path, str]]) -> None:
     """Helper to import Python file provided in training commands / commands
     using the config. This makes custom registered functions available.
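The new `import_code_paths` helper above accepts a comma-separated string and imports each file in turn. A hedged sketch of just the split-and-strip step (`parse_code_paths` is a hypothetical stand-in; spaCy's `string_to_list` also handles quotes and brackets):

```python
# Turn "a.py, b.py" into a list of Path objects, ignoring empty entries.
from pathlib import Path


def parse_code_paths(code_paths: str) -> list:
    return [Path(p.strip()) for p in code_paths.split(",") if p.strip()]


paths = parse_code_paths("functions.py, extra/components.py")
assert paths == [Path("functions.py"), Path("extra/components.py")]
```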
@@ -363,166 +223,10 @@ def import_code(code_path: Optional[Union[Path, str]]) -> None:
     msg.fail(f"Couldn't load Python code: {code_path}", e, exits=1)
 
 
-def upload_file(src: Path, dest: Union[str, "FluidPath"]) -> None:
-    """Upload a file.
-
-    src (Path): The source path.
-    url (str): The destination URL to upload to.
-    """
-    import smart_open
-
-    # Create parent directories for local paths
-    if isinstance(dest, Path):
-        if not dest.parent.exists():
-            dest.parent.mkdir(parents=True)
-
-    dest = str(dest)
-    with smart_open.open(dest, mode="wb") as output_file:
-        with src.open(mode="rb") as input_file:
-            output_file.write(input_file.read())
-
-
-def download_file(
-    src: Union[str, "FluidPath"], dest: Path, *, force: bool = False
-) -> None:
-    """Download a file using smart_open.
-
-    url (str): The URL of the file.
-    dest (Path): The destination path.
-    force (bool): Whether to force download even if file exists.
-        If False, the download will be skipped.
-    """
-    import smart_open
-
-    if dest.exists() and not force:
-        return None
-    src = str(src)
-    with smart_open.open(src, mode="rb", compression="disable") as input_file:
-        with dest.open(mode="wb") as output_file:
-            shutil.copyfileobj(input_file, output_file)
-
-
-def ensure_pathy(path):
-    """Temporary helper to prevent importing Pathy globally (which can cause
-    slow and annoying Google Cloud warning)."""
-    from pathy import Pathy  # noqa: F811
-
-    return Pathy.fluid(path)
-
-
-def git_checkout(
-    repo: str, subpath: str, dest: Path, *, branch: str = "master", sparse: bool = False
-):
-    git_version = get_git_version()
-    if dest.exists():
-        msg.fail("Destination of checkout must not exist", exits=1)
-    if not dest.parent.exists():
-        msg.fail("Parent of destination of checkout must exist", exits=1)
-    if sparse and git_version >= (2, 22):
-        return git_sparse_checkout(repo, subpath, dest, branch)
-    elif sparse:
-        # Only show warnings if the user explicitly wants sparse checkout but
-        # the Git version doesn't support it
-        err_old = (
-            f"You're running an old version of Git (v{git_version[0]}.{git_version[1]}) "
-            f"that doesn't fully support sparse checkout yet."
-        )
-        err_unk = "You're running an unknown version of Git, so sparse checkout has been disabled."
-        msg.warn(
-            f"{err_unk if git_version == (0, 0) else err_old} "
-            f"This means that more files than necessary may be downloaded "
-            f"temporarily. To only download the files needed, make sure "
-            f"you're using Git v2.22 or above."
-        )
-    with make_tempdir() as tmp_dir:
-        cmd = f"git -C {tmp_dir} clone {repo} . -b {branch}"
-        run_command(cmd, capture=True)
-        # We need Path(name) to make sure we also support subdirectories
-        try:
-            source_path = tmp_dir / Path(subpath)
-            if not is_subpath_of(tmp_dir, source_path):
-                err = f"'{subpath}' is a path outside of the cloned repository."
-                msg.fail(err, repo, exits=1)
-            shutil.copytree(str(source_path), str(dest))
-        except FileNotFoundError:
-            err = f"Can't clone {subpath}. Make sure the directory exists in the repo (branch '{branch}')"
-            msg.fail(err, repo, exits=1)
-
-
-def git_sparse_checkout(repo, subpath, dest, branch):
-    # We're using Git, partial clone and sparse checkout to
-    # only clone the files we need
-    # This ends up being RIDICULOUS. omg.
-    # So, every tutorial and SO post talks about 'sparse checkout'...But they
-    # go and *clone* the whole repo. Worthless. And cloning part of a repo
-    # turns out to be completely broken. The only way to specify a "path" is..
-    # a path *on the server*? The contents of which, specifies the paths. Wat.
-    # Obviously this is hopelessly broken and insecure, because you can query
-    # arbitrary paths on the server! So nobody enables this.
-    # What we have to do is disable *all* files. We could then just checkout
-    # the path, and it'd "work", but be hopelessly slow...Because it goes and
-    # transfers every missing object one-by-one. So the final piece is that we
-    # need to use some weird git internals to fetch the missings in bulk, and
-    # *that* we can do by path.
-    # We're using Git and sparse checkout to only clone the files we need
-    with make_tempdir() as tmp_dir:
-        # This is the "clone, but don't download anything" part.
-        cmd = (
-            f"git clone {repo} {tmp_dir} --no-checkout --depth 1 "
-            f"-b {branch} --filter=blob:none"
-        )
-        run_command(cmd)
-        # Now we need to find the missing filenames for the subpath we want.
-        # Looking for this 'rev-list' command in the git --help? Hah.
-        cmd = f"git -C {tmp_dir} rev-list --objects --all --missing=print -- {subpath}"
-        ret = run_command(cmd, capture=True)
-        git_repo = _http_to_git(repo)
-        # Now pass those missings into another bit of git internals
-        missings = " ".join([x[1:] for x in ret.stdout.split() if x.startswith("?")])
-        if not missings:
-            err = (
-                f"Could not find any relevant files for '{subpath}'. "
-                f"Did you specify a correct and complete path within repo '{repo}' "
-                f"and branch {branch}?"
-            )
-            msg.fail(err, exits=1)
-        cmd = f"git -C {tmp_dir} fetch-pack {git_repo} {missings}"
-        run_command(cmd, capture=True)
-        # And finally, we can checkout our subpath
-        cmd = f"git -C {tmp_dir} checkout {branch} {subpath}"
-        run_command(cmd, capture=True)
-
-        # Get a subdirectory of the cloned path, if appropriate
-        source_path = tmp_dir / Path(subpath)
-        if not is_subpath_of(tmp_dir, source_path):
-            err = f"'{subpath}' is a path outside of the cloned repository."
-            msg.fail(err, repo, exits=1)
-
-        shutil.move(str(source_path), str(dest))
-
-
-def git_repo_branch_exists(repo: str, branch: str) -> bool:
-    """Uses 'git ls-remote' to check if a repository and branch exists
-
-    repo (str): URL to get repo.
-    branch (str): Branch on repo to check.
-    RETURNS (bool): True if repo:branch exists.
-    """
-    get_git_version()
-    cmd = f"git ls-remote {repo} {branch}"
-    # We might be tempted to use `--exit-code` with `git ls-remote`, but
-    # `run_command` handles the `returncode` for us, so we'll rely on
-    # the fact that stdout returns '' if the requested branch doesn't exist
-    ret = run_command(cmd, capture=True)
-    exists = ret.stdout != ""
-    return exists
-
-
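The removed `git_sparse_checkout` helper above drove `rev-list --missing=print` and `fetch-pack` by hand because older Git had no ergonomic way to fetch just one subpath. A hedged sketch of the modern equivalent using the `git sparse-checkout` command (Git >= 2.25) on top of a blobless partial clone, demonstrated against a throwaway local repository so it runs offline:

```shell
# Build a small source repo, then clone only its "configs" subdirectory.
set -e
workdir=$(mktemp -d)
cd "$workdir"

git init -q src
git -C src config user.email demo@example.com
git -C src config user.name demo
git -C src config uploadpack.allowfilter true   # allow --filter over file://
mkdir -p src/configs
echo "config" > src/configs/default.cfg
echo "readme" > src/README.md
git -C src add .
git -C src commit -qm "init"
branch=$(git -C src rev-parse --abbrev-ref HEAD)

# Blobless partial clone, then sparse-checkout of just the subpath.
git clone -q "file://$workdir/src" dest --no-checkout -b "$branch" --filter=blob:none
git -C dest sparse-checkout set configs
git -C dest checkout -q "$branch"
ls dest/configs
```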
 def get_git_version(
     error: str = "Could not run 'git'. Make sure it's installed and the executable is available.",
 ) -> Tuple[int, int]:
     """Get the version of git and raise an error if calling 'git --version' fails.
 
     error (str): The error message to show.
     RETURNS (Tuple[int, int]): The version as a (major, minor) tuple. Returns
         (0, 0) if the version couldn't be determined.
@@ -538,30 +242,6 @@ def get_git_version(
     return int(version[0]), int(version[1])
-
-
-def _http_to_git(repo: str) -> str:
-    if repo.startswith("http://"):
-        repo = repo.replace(r"http://", r"https://")
-    if repo.startswith(r"https://"):
-        repo = repo.replace("https://", "git@").replace("/", ":", 1)
-    if repo.endswith("/"):
-        repo = repo[:-1]
-    repo = f"{repo}.git"
-    return repo
-
-
-def is_subpath_of(parent, child):
-    """
-    Check whether `child` is a path contained within `parent`.
-    """
-    # Based on https://stackoverflow.com/a/37095733 .
-
-    # In Python 3.9, the `Path.is_relative_to()` method will supplant this, so
-    # we can stop using crusty old os.path functions.
-    parent_realpath = os.path.realpath(parent)
-    child_realpath = os.path.realpath(child)
-    return os.path.commonpath([parent_realpath, child_realpath]) == parent_realpath
 
 
 @overload
 def string_to_list(value: str, intify: Literal[False] = ...) -> List[str]:
     ...

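The removed `is_subpath_of` helper above containment-checks via `os.path.commonpath` of the resolved paths; its own comment anticipated the `Path.is_relative_to()` replacement available since Python 3.9. A self-contained sketch of both (purely lexical here, no symlink resolution caveats exercised):

```python
# commonpath of the two realpaths equals the parent exactly when the
# child lives inside it; is_relative_to() does the same check on pure paths.
import os
from pathlib import Path


def is_subpath_of(parent, child):
    parent_realpath = os.path.realpath(parent)
    child_realpath = os.path.realpath(child)
    return os.path.commonpath([parent_realpath, child_realpath]) == parent_realpath


assert is_subpath_of("/tmp", "/tmp/repo/file.txt") is True
assert is_subpath_of("/tmp/repo", "/tmp/other") is False
assert Path("/tmp/repo/file.txt").is_relative_to("/tmp") is True
```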
@@ -133,7 +133,9 @@ def apply(
     if len(text_files) > 0:
         streams.append(_stream_texts(text_files))
     datagen = cast(DocOrStrStream, chain(*streams))
-    for doc in tqdm.tqdm(nlp.pipe(datagen, batch_size=batch_size, n_process=n_process)):
+    for doc in tqdm.tqdm(
+        nlp.pipe(datagen, batch_size=batch_size, n_process=n_process), disable=None
+    ):
         docbin.add(doc)
     if output_file.suffix == "":
         output_file = output_file.with_suffix(".spacy")

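Per tqdm's documentation, `disable=None` (added in the hunk above) means "disable the bar when the output stream is not a TTY", so redirected logs and CI output aren't flooded with progress frames. A stdlib-only mimic of that decision rule (`should_disable` is a hypothetical helper, not tqdm's API):

```python
# disable=None -> suppress the bar exactly when the stream isn't a TTY;
# an explicit True/False overrides the TTY check entirely.
import io
import sys


def should_disable(bar_disable, stream):
    if bar_disable is None:
        return not (hasattr(stream, "isatty") and stream.isatty())
    return bool(bar_disable)


assert should_disable(None, io.StringIO()) is True    # captured stream: no bar
assert should_disable(False, io.StringIO()) is False  # explicit False: always draw
print(should_disable(None, sys.stderr))               # depends on how you run it
```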
@@ -11,7 +11,7 @@ from ._util import (
     Arg,
     Opt,
     app,
-    import_code,
+    import_code_paths,
     parse_config_overrides,
     show_validation_error,
 )
@@ -26,7 +26,7 @@ def assemble_cli(
     ctx: typer.Context,  # This is only used to read additional arguments
     config_path: Path = Arg(..., help="Path to config file", exists=True, allow_dash=True),
     output_path: Path = Arg(..., help="Output directory to store assembled pipeline in"),
-    code_path: Optional[Path] = Opt(None, "--code", "-c", help="Path to Python file with additional code (registered functions) to be imported"),
+    code_path: str = Opt("", "--code", "-c", help="Comma-separated paths to Python files with additional code (registered functions) to be imported"),
     verbose: bool = Opt(False, "--verbose", "-V", "-VV", help="Display more information for debugging purposes"),
     # fmt: on
 ):
@@ -40,12 +40,13 @@ def assemble_cli(
 
     DOCS: https://spacy.io/api/cli#assemble
     """
-    util.logger.setLevel(logging.DEBUG if verbose else logging.INFO)
+    if verbose:
+        util.logger.setLevel(logging.DEBUG)
     # Make sure all files and paths exists if they are needed
     if not config_path or (str(config_path) != "-" and not config_path.exists()):
         msg.fail("Config file not found", config_path, exits=1)
     overrides = parse_config_overrides(ctx.args)
-    import_code(code_path)
+    import_code_paths(code_path)
     with show_validation_error(config_path):
         config = util.load_config(config_path, overrides=overrides, interpolate=False)
     msg.divider("Initializing pipeline")

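The verbosity change above switches from always forcing a level (DEBUG or INFO) to only escalating when `--verbose` is passed, leaving whatever level was configured elsewhere untouched in the default case. A minimal stdlib illustration of that pattern:

```python
# Escalate the logger only on demand; otherwise preserve the configured level.
import logging

logger = logging.getLogger("assemble_demo")
logger.setLevel(logging.WARNING)  # level configured elsewhere

verbose = False
if verbose:
    logger.setLevel(logging.DEBUG)
assert logger.level == logging.WARNING  # untouched without --verbose

verbose = True
if verbose:
    logger.setLevel(logging.DEBUG)
assert logger.level == logging.DEBUG
```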
@@ -89,7 +89,7 @@ class Quartiles:
 def annotate(
     nlp: Language, docs: List[Doc], batch_size: Optional[int]
 ) -> numpy.ndarray:
-    docs = nlp.pipe(tqdm(docs, unit="doc"), batch_size=batch_size)
+    docs = nlp.pipe(tqdm(docs, unit="doc", disable=None), batch_size=batch_size)
     wps = []
     while True:
         with time_context() as elapsed:

@ -13,7 +13,7 @@ from ._util import (
|
||||||
Arg,
|
Arg,
|
||||||
Opt,
|
Opt,
|
||||||
debug_cli,
|
debug_cli,
|
||||||
import_code,
|
import_code_paths,
|
||||||
parse_config_overrides,
|
parse_config_overrides,
|
||||||
show_validation_error,
|
show_validation_error,
|
||||||
)
|
)
|
||||||
|
spacy/cli/debug_config.py

@@ -27,7 +27,7 @@ def debug_config_cli(
     # fmt: off
     ctx: typer.Context,  # This is only used to read additional arguments
     config_path: Path = Arg(..., help="Path to config file", exists=True, allow_dash=True),
-    code_path: Optional[Path] = Opt(None, "--code-path", "--code", "-c", help="Path to Python file with additional code (registered functions) to be imported"),
+    code_path: str = Opt("", "--code", "-c", help="Comma-separated paths to Python files with additional code (registered functions) to be imported"),
     show_funcs: bool = Opt(False, "--show-functions", "-F", help="Show an overview of all registered functions used in the config and where they come from (modules, files etc.)"),
     show_vars: bool = Opt(False, "--show-variables", "-V", help="Show an overview of all variables referenced in the config and their values. This will also reflect variables overwritten on the CLI.")
     # fmt: on
@@ -44,7 +44,7 @@ def debug_config_cli(
     DOCS: https://spacy.io/api/cli#debug-config
     """
     overrides = parse_config_overrides(ctx.args)
-    import_code(code_path)
+    import_code_paths(code_path)
     debug_config(
         config_path, overrides=overrides, show_funcs=show_funcs, show_vars=show_vars
     )
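The `import_code` → `import_code_paths` rename in this commit switches the `--code` option from a single path to a comma-separated list of paths. A hypothetical sketch of what such a helper could look like (the real helper lives in `spacy/cli/_util.py` and may differ in details such as error handling):

```python
import importlib.util
import sys
from pathlib import Path


def import_code_paths(code_paths: str) -> None:
    """Import one or more Python files from a comma-separated string of
    paths, so that any registered functions they define become available."""
    for code_path in [p.strip() for p in code_paths.split(",") if p.strip()]:
        path = Path(code_path)
        # Load the file as a module named after its stem
        spec = importlib.util.spec_from_file_location(path.stem, path)
        assert spec is not None and spec.loader is not None
        module = importlib.util.module_from_spec(spec)
        sys.modules[path.stem] = module
        spec.loader.exec_module(module)
```

An empty string (the new default) simply results in no imports, so the option remains optional.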
spacy/cli/debug_data.py

@@ -40,7 +40,7 @@ from ._util import (
     _format_number,
     app,
     debug_cli,
-    import_code,
+    import_code_paths,
     parse_config_overrides,
     show_validation_error,
 )
@@ -72,7 +72,7 @@ def debug_data_cli(
     # fmt: off
     ctx: typer.Context,  # This is only used to read additional arguments
     config_path: Path = Arg(..., help="Path to config file", exists=True, allow_dash=True),
-    code_path: Optional[Path] = Opt(None, "--code-path", "--code", "-c", help="Path to Python file with additional code (registered functions) to be imported"),
+    code_path: str = Opt("", "--code", "-c", help="Comma-separated paths to Python files with additional code (registered functions) to be imported"),
     ignore_warnings: bool = Opt(False, "--ignore-warnings", "-IW", help="Ignore warnings, only show stats and errors"),
     verbose: bool = Opt(False, "--verbose", "-V", help="Print additional information and explanations"),
     no_format: bool = Opt(False, "--no-format", "-NF", help="Don't pretty-print the results"),
@@ -92,7 +92,7 @@ def debug_data_cli(
         "--help for an overview of the other available debugging commands."
     )
     overrides = parse_config_overrides(ctx.args)
-    import_code(code_path)
+    import_code_paths(code_path)
     debug_data(
         config_path,
         config_overrides=overrides,
spacy/cli/download.py

@@ -10,6 +10,8 @@ from ..util import (
     get_installed_models,
     get_minor_version,
     get_package_version,
+    is_in_interactive,
+    is_in_jupyter,
     is_package,
     is_prerelease_version,
     run_command,
@@ -85,6 +87,27 @@ def download(
         "Download and installation successful",
         f"You can now load the package via spacy.load('{model_name}')",
     )
+    if is_in_jupyter():
+        reload_deps_msg = (
+            "If you are in a Jupyter or Colab notebook, you may need to "
+            "restart Python in order to load all the package's dependencies. "
+            "You can do this by selecting the 'Restart kernel' or 'Restart "
+            "runtime' option."
+        )
+        msg.warn(
+            "Restart to reload dependencies",
+            reload_deps_msg,
+        )
+    elif is_in_interactive():
+        reload_deps_msg = (
+            "If you are in an interactive Python session, you may need to "
+            "exit and restart Python to load all the package's dependencies. "
+            "You can exit with Ctrl-D (or Ctrl-Z and Enter on Windows)."
+        )
+        msg.warn(
+            "Restart to reload dependencies",
+            reload_deps_msg,
+        )


 def get_model_filename(model_name: str, version: str, sdist: bool = False) -> str:
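The new `is_in_jupyter()` / `is_in_interactive()` helpers are imported from `..util`. A minimal sketch of the kind of detection such helpers typically perform — the names and checks below are illustrative assumptions, not spaCy's exact implementation:

```python
import sys


def in_jupyter() -> bool:
    # Inside IPython/Jupyter, get_ipython() is injected into builtins;
    # the ZMQ shell class indicates a notebook kernel rather than a terminal.
    try:
        shell = get_ipython().__class__.__name__  # noqa: F821 (only defined in IPython)
        return shell == "ZMQInteractiveShell"
    except NameError:
        return False


def in_interactive() -> bool:
    # An interactive REPL defines sys.ps1; plain scripts do not.
    return hasattr(sys, "ps1") or bool(sys.flags.interactive)


print(in_jupyter(), in_interactive())
```

Run as a script, both checks report `False`, so the restart warning above would be skipped for non-interactive use.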
spacy/cli/evaluate.py

@@ -10,7 +10,7 @@ from .. import displacy, util
 from ..scorer import Scorer
 from ..tokens import Doc
 from ..training import Corpus
-from ._util import Arg, Opt, app, benchmark_cli, import_code, setup_gpu
+from ._util import Arg, Opt, app, benchmark_cli, import_code_paths, setup_gpu


 @benchmark_cli.command(
@@ -22,12 +22,13 @@ def evaluate_cli(
     model: str = Arg(..., help="Model name or path"),
     data_path: Path = Arg(..., help="Location of binary evaluation data in .spacy format", exists=True),
     output: Optional[Path] = Opt(None, "--output", "-o", help="Output JSON file for metrics", dir_okay=False),
-    code_path: Optional[Path] = Opt(None, "--code", "-c", help="Path to Python file with additional code (registered functions) to be imported"),
+    code_path: str = Opt("", "--code", "-c", help="Comma-separated paths to Python files with additional code (registered functions) to be imported"),
     use_gpu: int = Opt(-1, "--gpu-id", "-g", help="GPU ID or -1 for CPU"),
     gold_preproc: bool = Opt(False, "--gold-preproc", "-G", help="Use gold preprocessing"),
     displacy_path: Optional[Path] = Opt(None, "--displacy-path", "-dp", help="Directory to output rendered parses as HTML", exists=True, file_okay=False),
     displacy_limit: int = Opt(25, "--displacy-limit", "-dl", help="Limit of parses to render as HTML"),
     per_component: bool = Opt(False, "--per-component", "-P", help="Return scores per component, only applicable when an output JSON file is specified."),
+    spans_key: str = Opt("sc", "--spans-key", "-sk", help="Spans key to use when evaluating Doc.spans"),
     # fmt: on
 ):
     """
@@ -42,7 +43,7 @@ def evaluate_cli(

     DOCS: https://spacy.io/api/cli#benchmark-accuracy
     """
-    import_code(code_path)
+    import_code_paths(code_path)
     evaluate(
         model,
         data_path,
@@ -53,6 +54,7 @@ def evaluate_cli(
         displacy_limit=displacy_limit,
         per_component=per_component,
         silent=False,
+        spans_key=spans_key,
     )

spacy/cli/find_function.py (new file, 69 lines)

@@ -0,0 +1,69 @@
+from typing import Optional, Tuple
+
+from catalogue import RegistryError
+from wasabi import msg
+
+from ..util import registry
+from ._util import Arg, Opt, app
+
+
+@app.command("find-function")
+def find_function_cli(
+    # fmt: off
+    func_name: str = Arg(..., help="Name of the registered function."),
+    registry_name: Optional[str] = Opt(None, "--registry", "-r", help="Name of the catalogue registry."),
+    # fmt: on
+):
+    """
+    Find the module, path and line number to the file the registered
+    function is defined in, if available.
+
+    func_name (str): Name of the registered function.
+    registry_name (Optional[str]): Name of the catalogue registry.
+
+    DOCS: https://spacy.io/api/cli#find-function
+    """
+    if not registry_name:
+        registry_names = registry.get_registry_names()
+        for name in registry_names:
+            if registry.has(name, func_name):
+                registry_name = name
+                break
+
+    if not registry_name:
+        msg.fail(
+            f"Couldn't find registered function: '{func_name}'",
+            exits=1,
+        )
+
+    assert registry_name is not None
+    find_function(func_name, registry_name)
+
+
+def find_function(func_name: str, registry_name: str) -> Tuple[str, int]:
+    registry_desc = None
+    try:
+        registry_desc = registry.find(registry_name, func_name)
+    except RegistryError as e:
+        msg.fail(
+            f"Couldn't find registered function: '{func_name}' in registry '{registry_name}'",
+        )
+        msg.fail(f"{e}", exits=1)
+    assert registry_desc is not None
+
+    registry_path = None
+    line_no = None
+    if registry_desc["file"]:
+        registry_path = registry_desc["file"]
+        line_no = registry_desc["line_no"]
+
+    if not registry_path or not line_no:
+        msg.fail(
+            f"Couldn't find path to registered function: '{func_name}' in registry '{registry_name}'",
+            exits=1,
+        )
+    assert registry_path is not None
+    assert line_no is not None
+
+    msg.good(f"Found registered function '{func_name}' at {registry_path}:{line_no}")
+    return str(registry_path), int(line_no)
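The new `find-function` command relies on the registry being able to report where a function was registered. A stdlib-only sketch of that idea, using code-object metadata on a toy registry instead of spaCy's catalogue-based one (all names here are illustrative):

```python
from typing import Callable, Dict, Tuple

_REGISTRY: Dict[str, Callable] = {}


def register(name: str):
    """Decorator that records a function under a name."""
    def wrapper(func: Callable) -> Callable:
        _REGISTRY[name] = func
        return func
    return wrapper


def find_function(name: str) -> Tuple[str, int]:
    """Return the file and line number where a registered function is defined.

    Every Python function carries this metadata on its code object, which is
    what makes a lookup like spaCy's possible without importing inspect."""
    func = _REGISTRY[name]
    code = func.__code__
    return code.co_filename, code.co_firstlineno


@register("my_func.v1")
def my_func() -> int:
    return 42
```

Calling `find_function("my_func.v1")` then returns the defining file and the line of `def my_func`; an unknown name raises `KeyError`, where the real CLI instead prints a friendly failure and exits.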
spacy/cli/find_threshold.py

@@ -52,8 +52,8 @@ def find_threshold_cli(

     DOCS: https://spacy.io/api/cli#find-threshold
     """
-    util.logger.setLevel(logging.DEBUG if verbose else logging.INFO)
+    if verbose:
+        util.logger.setLevel(logging.DEBUG)
     import_code(code_path)
     find_threshold(
         model=model,
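The logging change above (repeated in `init_pipeline.py` below) stops the CLI from unconditionally forcing the level to INFO, which would clobber a level the user had already configured; presumably that is the motivation. A small demonstration of the difference on a throwaway logger:

```python
import logging

logger = logging.getLogger("sketch")
logger.setLevel(logging.WARNING)  # e.g. a level the user configured elsewhere

verbose = False

# Old behaviour: the level is always overwritten, even without --verbose:
#     logger.setLevel(logging.DEBUG if verbose else logging.INFO)
# New behaviour: the level is only touched when --verbose is passed:
if verbose:
    logger.setLevel(logging.DEBUG)

print(logging.getLevelName(logger.level))  # the user's WARNING setting survives
```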
spacy/cli/init_pipeline.py

@@ -90,7 +90,8 @@ def init_pipeline_cli(
     use_gpu: int = Opt(-1, "--gpu-id", "-g", help="GPU ID or -1 for CPU")
     # fmt: on
 ):
-    util.logger.setLevel(logging.DEBUG if verbose else logging.INFO)
+    if verbose:
+        util.logger.setLevel(logging.DEBUG)
     overrides = parse_config_overrides(ctx.args)
     import_code(code_path)
     setup_gpu(use_gpu)
@@ -119,7 +120,8 @@ def init_labels_cli(
     """Generate JSON files for the labels in the data. This helps speed up the
     training process, since spaCy won't have to preprocess the data to
     extract the labels."""
-    util.logger.setLevel(logging.DEBUG if verbose else logging.INFO)
+    if verbose:
+        util.logger.setLevel(logging.DEBUG)
     if not output_path.exists():
         output_path.mkdir(parents=True)
     overrides = parse_config_overrides(ctx.args)
spacy/cli/package.py

@@ -1,5 +1,8 @@
+import importlib.metadata
+import os
 import re
 import shutil
+import subprocess
 import sys
 from collections import defaultdict
 from pathlib import Path
@@ -20,7 +23,7 @@ def package_cli(
     # fmt: off
     input_dir: Path = Arg(..., help="Directory with pipeline data", exists=True, file_okay=False),
     output_dir: Path = Arg(..., help="Output parent directory", exists=True, file_okay=False),
-    code_paths: str = Opt("", "--code", "-c", help="Comma-separated paths to Python file with additional code (registered functions) to be included in the package"),
+    code_paths: str = Opt("", "--code", "-c", help="Comma-separated paths to Python files with additional code (registered functions) to be included in the package"),
     meta_path: Optional[Path] = Opt(None, "--meta-path", "--meta", "-m", help="Path to meta.json", exists=True, dir_okay=False),
     create_meta: bool = Opt(False, "--create-meta", "-C", help="Create meta.json, even if one exists"),
     name: Optional[str] = Opt(None, "--name", "-n", help="Package name to override meta"),
@@ -35,7 +38,7 @@ def package_cli(
     specified output directory, and the data will be copied over. If
     --create-meta is set and a meta.json already exists in the output directory,
     the existing values will be used as the defaults in the command-line prompt.
-    After packaging, "python setup.py sdist" is run in the package directory,
+    After packaging, "python -m build --sdist" is run in the package directory,
     which will create a .tar.gz archive that can be installed via "pip install".

     If additional code files are provided (e.g. Python files containing custom
@@ -78,9 +81,17 @@ def package(
     input_path = util.ensure_path(input_dir)
     output_path = util.ensure_path(output_dir)
     meta_path = util.ensure_path(meta_path)
-    if create_wheel and not has_wheel():
-        err = "Generating a binary .whl file requires wheel to be installed"
-        msg.fail(err, "pip install wheel", exits=1)
+    if create_wheel and not has_wheel() and not has_build():
+        err = (
+            "Generating wheels requires 'build' or 'wheel' (deprecated) to be installed"
+        )
+        msg.fail(err, "pip install build", exits=1)
+    if not has_build():
+        msg.warn(
+            "Generating packages without the 'build' package is deprecated and "
+            "will not be supported in the future. To install 'build': pip "
+            "install build"
+        )
     if not input_path or not input_path.exists():
         msg.fail("Can't locate pipeline data", input_path, exits=1)
     if not output_path or not output_path.exists():
@@ -184,12 +195,37 @@ def package(
     msg.good(f"Successfully created package directory '{model_name_v}'", main_path)
     if create_sdist:
         with util.working_dir(main_path):
-            util.run_command([sys.executable, "setup.py", "sdist"], capture=False)
+            # run directly, since util.run_command is not designed to continue
+            # after a command fails
+            ret = subprocess.run(
+                [sys.executable, "-m", "build", ".", "--sdist"],
+                env=os.environ.copy(),
+            )
+            if ret.returncode != 0:
+                msg.warn(
+                    "Creating sdist with 'python -m build' failed. Falling "
+                    "back to deprecated use of 'python setup.py sdist'"
+                )
+                util.run_command([sys.executable, "setup.py", "sdist"], capture=False)
         zip_file = main_path / "dist" / f"{model_name_v}{SDIST_SUFFIX}"
         msg.good(f"Successfully created zipped Python package", zip_file)
     if create_wheel:
         with util.working_dir(main_path):
-            util.run_command([sys.executable, "setup.py", "bdist_wheel"], capture=False)
+            # run directly, since util.run_command is not designed to continue
+            # after a command fails
+            ret = subprocess.run(
+                [sys.executable, "-m", "build", ".", "--wheel"],
+                env=os.environ.copy(),
+            )
+            if ret.returncode != 0:
+                msg.warn(
+                    "Creating wheel with 'python -m build' failed. Falling "
+                    "back to deprecated use of 'wheel' with "
+                    "'python setup.py bdist_wheel'"
+                )
+                util.run_command(
+                    [sys.executable, "setup.py", "bdist_wheel"], capture=False
+                )
         wheel_name_squashed = re.sub("_+", "_", model_name_v)
         wheel = main_path / "dist" / f"{wheel_name_squashed}{WHEEL_SUFFIX}"
         msg.good(f"Successfully created binary wheel", wheel)
@@ -209,6 +245,17 @@ def has_wheel() -> bool:
         return False


+def has_build() -> bool:
+    # it's very likely that there is a local directory named build/ (especially
+    # in an editable install), so an import check is not sufficient; instead
+    # check that there is a package version
+    try:
+        importlib.metadata.version("build")
+        return True
+    except importlib.metadata.PackageNotFoundError:  # type: ignore[attr-defined]
+        return False
+
+
 def get_third_party_dependencies(
     config: Config, exclude: List[str] = util.SimpleFrozenList()
 ) -> List[str]:
@@ -403,7 +450,7 @@ def _format_sources(data: Any) -> str:
         if author:
             result += " ({})".format(author)
         sources.append(result)
-    return "<br />".join(sources)
+    return "<br>".join(sources)


 def _format_accuracy(data: Dict[str, Any], exclude: List[str] = ["speed"]) -> str:
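The new `has_build()` helper above deliberately checks for an installed `build` distribution via `importlib.metadata` rather than importing it, since a local `build/` directory next to the source tree would satisfy a plain import check. The same probe works for any distribution name; a generic sketch:

```python
import importlib.metadata


def has_distribution(dist_name: str) -> bool:
    """True if a distribution of this name is installed, regardless of any
    same-named local directory that a plain import could accidentally pick up."""
    try:
        importlib.metadata.version(dist_name)
        return True
    except importlib.metadata.PackageNotFoundError:
        return False


print(has_distribution("pip"), has_distribution("definitely-not-installed"))
```

This is why the fallback to `setup.py sdist`/`bdist_wheel` only triggers when `python -m build` genuinely isn't available or fails, not merely because a `build/` directory exists.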
spacy/cli/pretrain.py

@@ -11,7 +11,7 @@ from ._util import (
     Arg,
     Opt,
     app,
-    import_code,
+    import_code_paths,
     parse_config_overrides,
     setup_gpu,
     show_validation_error,
@@ -27,7 +27,7 @@ def pretrain_cli(
     ctx: typer.Context,  # This is only used to read additional arguments
     config_path: Path = Arg(..., help="Path to config file", exists=True, dir_okay=False, allow_dash=True),
     output_dir: Path = Arg(..., help="Directory to write weights to on each epoch"),
-    code_path: Optional[Path] = Opt(None, "--code", "-c", help="Path to Python file with additional code (registered functions) to be imported"),
+    code_path: str = Opt("", "--code", "-c", help="Comma-separated paths to Python files with additional code (registered functions) to be imported"),
     resume_path: Optional[Path] = Opt(None, "--resume-path", "-r", help="Path to pretrained weights from which to resume pretraining"),
     epoch_resume: Optional[int] = Opt(None, "--epoch-resume", "-er", help="The epoch to resume counting from when using --resume-path. Prevents unintended overwriting of existing weight files."),
     use_gpu: int = Opt(-1, "--gpu-id", "-g", help="GPU ID or -1 for CPU"),
@@ -56,7 +56,7 @@ def pretrain_cli(
     DOCS: https://spacy.io/api/cli#pretrain
     """
     config_overrides = parse_config_overrides(ctx.args)
-    import_code(code_path)
+    import_code_paths(code_path)
     verify_cli_args(config_path, output_dir, resume_path, epoch_resume)
     setup_gpu(use_gpu)
     msg.info(f"Loading config from: {config_path}")

spacy/cli/profile.py

@@ -71,7 +71,7 @@ def profile(model: str, inputs: Optional[Path] = None, n_texts: int = 10000) ->


 def parse_texts(nlp: Language, texts: Sequence[str]) -> None:
-    for doc in nlp.pipe(tqdm.tqdm(texts), batch_size=16):
+    for doc in nlp.pipe(tqdm.tqdm(texts, disable=None), batch_size=16):
         pass

@ -1,217 +1 @@
|
||||||
import os
|
from weasel.cli.assets import *
|
||||||
import re
|
|
||||||
import shutil
|
|
||||||
from pathlib import Path
|
|
||||||
from typing import Any, Dict, Optional
|
|
||||||
|
|
||||||
import requests
|
|
||||||
import typer
|
|
||||||
from wasabi import msg
|
|
||||||
|
|
||||||
from ...util import ensure_path, working_dir
|
|
||||||
from .._util import (
|
|
||||||
PROJECT_FILE,
|
|
||||||
Arg,
|
|
||||||
Opt,
|
|
||||||
SimpleFrozenDict,
|
|
||||||
download_file,
|
|
||||||
get_checksum,
|
|
||||||
get_git_version,
|
|
||||||
git_checkout,
|
|
||||||
load_project_config,
|
|
||||||
parse_config_overrides,
|
|
||||||
project_cli,
|
|
||||||
)
|
|
||||||
|
|
||||||
# Whether assets are extra if `extra` is not set.
|
|
||||||
EXTRA_DEFAULT = False
|
|
||||||
|
|
||||||
|
|
||||||
@project_cli.command(
|
|
||||||
"assets",
|
|
||||||
context_settings={"allow_extra_args": True, "ignore_unknown_options": True},
|
|
||||||
)
|
|
||||||
def project_assets_cli(
|
|
||||||
# fmt: off
|
|
||||||
ctx: typer.Context, # This is only used to read additional arguments
|
|
||||||
project_dir: Path = Arg(Path.cwd(), help="Path to cloned project. Defaults to current working directory.", exists=True, file_okay=False),
|
|
||||||
sparse_checkout: bool = Opt(False, "--sparse", "-S", help="Use sparse checkout for assets provided via Git, to only check out and clone the files needed. Requires Git v22.2+."),
|
|
||||||
extra: bool = Opt(False, "--extra", "-e", help="Download all assets, including those marked as 'extra'.")
|
|
||||||
# fmt: on
|
|
||||||
):
|
|
||||||
"""Fetch project assets like datasets and pretrained weights. Assets are
|
|
||||||
defined in the "assets" section of the project.yml. If a checksum is
|
|
||||||
provided in the project.yml, the file is only downloaded if no local file
|
|
||||||
with the same checksum exists.
|
|
||||||
|
|
||||||
DOCS: https://spacy.io/api/cli#project-assets
|
|
||||||
"""
|
|
||||||
overrides = parse_config_overrides(ctx.args)
|
|
||||||
project_assets(
|
|
||||||
project_dir,
|
|
||||||
overrides=overrides,
|
|
||||||
sparse_checkout=sparse_checkout,
|
|
||||||
extra=extra,
|
|
||||||
)
|
|
||||||
|
|
||||||
|
|
||||||
def project_assets(
|
|
||||||
project_dir: Path,
|
|
||||||
*,
|
|
||||||
overrides: Dict[str, Any] = SimpleFrozenDict(),
|
|
||||||
sparse_checkout: bool = False,
|
|
||||||
extra: bool = False,
|
|
||||||
) -> None:
|
|
||||||
"""Fetch assets for a project using DVC if possible.
|
|
||||||
|
|
||||||
project_dir (Path): Path to project directory.
|
|
||||||
sparse_checkout (bool): Use sparse checkout for assets provided via Git, to only check out and clone the files
|
|
||||||
needed.
|
|
||||||
extra (bool): Whether to download all assets, including those marked as 'extra'.
|
|
||||||
"""
|
|
||||||
project_path = ensure_path(project_dir)
|
|
||||||
config = load_project_config(project_path, overrides=overrides)
|
|
||||||
assets = [
|
|
||||||
asset
|
|
||||||
for asset in config.get("assets", [])
|
|
||||||
if extra or not asset.get("extra", EXTRA_DEFAULT)
|
|
||||||
]
|
|
||||||
if not assets:
|
|
||||||
msg.warn(
|
|
||||||
f"No assets specified in {PROJECT_FILE} (if assets are marked as extra, download them with --extra)",
|
|
||||||
exits=0,
|
|
||||||
)
|
|
||||||
msg.info(f"Fetching {len(assets)} asset(s)")
|
|
||||||
|
|
||||||
for asset in assets:
|
|
||||||
dest = (project_dir / asset["dest"]).resolve()
|
|
||||||
checksum = asset.get("checksum")
|
|
||||||
if "git" in asset:
|
|
||||||
git_err = (
|
|
||||||
f"Cloning spaCy project templates requires Git and the 'git' command. "
|
|
||||||
f"Make sure it's installed and that the executable is available."
|
|
||||||
)
|
|
||||||
get_git_version(error=git_err)
|
|
||||||
if dest.exists():
|
|
||||||
# If there's already a file, check for checksum
|
|
||||||
if checksum and checksum == get_checksum(dest):
|
|
||||||
msg.good(
|
|
||||||
f"Skipping download with matching checksum: {asset['dest']}"
|
|
||||||
)
|
|
||||||
continue
|
|
||||||
else:
|
|
||||||
if dest.is_dir():
|
|
||||||
shutil.rmtree(dest)
|
|
||||||
else:
|
|
||||||
dest.unlink()
|
|
||||||
if "repo" not in asset["git"] or asset["git"]["repo"] is None:
|
|
||||||
msg.fail(
|
|
||||||
"A git asset must include 'repo', the repository address.", exits=1
|
|
||||||
)
|
|
||||||
if "path" not in asset["git"] or asset["git"]["path"] is None:
|
|
||||||
msg.fail(
|
|
||||||
"A git asset must include 'path' - use \"\" to get the entire repository.",
|
|
||||||
exits=1,
|
|
||||||
)
|
|
||||||
git_checkout(
|
|
||||||
asset["git"]["repo"],
|
|
||||||
asset["git"]["path"],
|
|
||||||
dest,
|
|
||||||
branch=asset["git"].get("branch"),
|
|
||||||
sparse=sparse_checkout,
|
|
||||||
)
|
|
||||||
msg.good(f"Downloaded asset {dest}")
|
|
||||||
else:
|
|
||||||
url = asset.get("url")
|
|
||||||
if not url:
|
|
||||||
# project.yml defines asset without URL that the user has to place
|
|
||||||
check_private_asset(dest, checksum)
|
|
||||||
continue
|
|
||||||
fetch_asset(project_path, url, dest, checksum)
|
|
||||||
|
|
||||||
|
|
||||||
def check_private_asset(dest: Path, checksum: Optional[str] = None) -> None:
|
|
||||||
"""Check and validate assets without a URL (private assets that the user
|
|
||||||
has to provide themselves) and give feedback about the checksum.
|
|
||||||
|
|
||||||
dest (Path): Destination path of the asset.
|
|
||||||
checksum (Optional[str]): Optional checksum of the expected file.
|
|
||||||
"""
|
|
||||||
if not Path(dest).exists():
|
|
||||||
err = f"No URL provided for asset. You need to add this file yourself: {dest}"
|
|
||||||
msg.warn(err)
|
|
||||||
else:
|
|
||||||
if not checksum:
|
|
||||||
msg.good(f"Asset already exists: {dest}")
|
|
||||||
elif checksum == get_checksum(dest):
|
|
||||||
msg.good(f"Asset exists with matching checksum: {dest}")
|
|
||||||
else:
|
|
||||||
msg.fail(f"Asset available but with incorrect checksum: {dest}")
|
|
||||||
|
|
||||||
|
|
||||||
def fetch_asset(
|
|
||||||
project_path: Path, url: str, dest: Path, checksum: Optional[str] = None
|
|
||||||
) -> None:
|
|
||||||
"""Fetch an asset from a given URL or path. If a checksum is provided and a
|
|
||||||
local file exists, it's only re-downloaded if the checksum doesn't match.
|
|
||||||
|
|
||||||
project_path (Path): Path to project directory.
|
|
||||||
url (str): URL or path to asset.
|
|
||||||
checksum (Optional[str]): Optional expected checksum of local file.
|
|
||||||
RETURNS (Optional[Path]): The path to the fetched asset or None if fetching
|
|
||||||
the asset failed.
|
|
||||||
"""
|
|
||||||
dest_path = (project_path / dest).resolve()
|
|
||||||
if dest_path.exists():
|
|
||||||
# If there's already a file, check for checksum
|
|
||||||
if checksum:
|
|
||||||
if checksum == get_checksum(dest_path):
|
|
||||||
msg.good(f"Skipping download with matching checksum: {dest}")
|
|
||||||
                return
        else:
            # If there's not a checksum, make sure the file is a possibly valid size
            if os.path.getsize(dest_path) == 0:
                msg.warn(f"Asset exists but with size of 0 bytes, deleting: {dest}")
                os.remove(dest_path)
    # We might as well support the user here and create parent directories in
    # case the asset dir isn't listed as a dir to create in the project.yml
    if not dest_path.parent.exists():
        dest_path.parent.mkdir(parents=True)
    with working_dir(project_path):
        url = convert_asset_url(url)
        try:
            download_file(url, dest_path)
            msg.good(f"Downloaded asset {dest}")
        except requests.exceptions.RequestException as e:
            if Path(url).exists() and Path(url).is_file():
                # If it's a local file, copy to destination
                shutil.copy(url, str(dest_path))
                msg.good(f"Copied local asset {dest}")
            else:
                msg.fail(f"Download failed: {dest}", e)
    if checksum and checksum != get_checksum(dest_path):
        msg.fail(f"Checksum doesn't match value defined in {PROJECT_FILE}: {dest}")


def convert_asset_url(url: str) -> str:
    """Check and convert the asset URL if needed.

    url (str): The asset URL.
    RETURNS (str): The converted URL.
    """
    # If the asset URL is a regular GitHub URL it's likely a mistake
    if (
        re.match(r"(http(s?)):\/\/github.com", url)
        and "releases/download" not in url
        and "/raw/" not in url
    ):
        converted = url.replace("github.com", "raw.githubusercontent.com")
        converted = re.sub(r"/(tree|blob)/", "/", converted)
        msg.warn(
            "Downloading from a regular GitHub URL. This will only download "
            "the source of the page, not the actual file. Converting the URL "
            "to a raw URL.",
            converted,
        )
        return converted
    return url
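The GitHub URL rewrite in `convert_asset_url` above can be sketched standalone — a minimal re-implementation of the same regex rewrite (the function name `to_raw_github_url` is illustrative, not part of the diff):

```python
import re


def to_raw_github_url(url: str) -> str:
    # Rewrite regular GitHub page URLs to raw.githubusercontent.com and drop
    # the /tree/ or /blob/ path segment, leaving release-download and /raw/
    # URLs untouched, mirroring convert_asset_url.
    if (
        re.match(r"(http(s?)):\/\/github.com", url)
        and "releases/download" not in url
        and "/raw/" not in url
    ):
        converted = url.replace("github.com", "raw.githubusercontent.com")
        return re.sub(r"/(tree|blob)/", "/", converted)
    return url


print(to_raw_github_url("https://github.com/explosion/projects/blob/v3/README.md"))
# → https://raw.githubusercontent.com/explosion/projects/v3/README.md
```

Release assets (`releases/download`) and already-raw URLs pass through unchanged, since those resolve to file contents rather than an HTML page.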
@@ -1,124 +1 @@
from weasel.cli.clone import *
import re
import subprocess
from pathlib import Path
from typing import Optional

from wasabi import msg

from ... import about
from ...util import ensure_path
from .._util import (
    COMMAND,
    PROJECT_FILE,
    Arg,
    Opt,
    get_git_version,
    git_checkout,
    git_repo_branch_exists,
    project_cli,
)

DEFAULT_REPO = about.__projects__
DEFAULT_PROJECTS_BRANCH = about.__projects_branch__
DEFAULT_BRANCHES = ["main", "master"]


@project_cli.command("clone")
def project_clone_cli(
    # fmt: off
    name: str = Arg(..., help="The name of the template to clone"),
    dest: Optional[Path] = Arg(None, help="Where to clone the project. Defaults to current working directory", exists=False),
    repo: str = Opt(DEFAULT_REPO, "--repo", "-r", help="The repository to clone from"),
    branch: Optional[str] = Opt(None, "--branch", "-b", help=f"The branch to clone from. If not provided, will attempt {', '.join(DEFAULT_BRANCHES)}"),
    sparse_checkout: bool = Opt(False, "--sparse", "-S", help="Use sparse Git checkout to only check out and clone the files needed. Requires Git v22.2+.")
    # fmt: on
):
    """Clone a project template from a repository. Calls into "git" and will
    only download the files from the given subdirectory. The GitHub repo
    defaults to the official spaCy template repo, but can be customized
    (including using a private repo).

    DOCS: https://spacy.io/api/cli#project-clone
    """
    if dest is None:
        dest = Path.cwd() / Path(name).parts[-1]
    if repo == DEFAULT_REPO and branch is None:
        branch = DEFAULT_PROJECTS_BRANCH

    if branch is None:
        for default_branch in DEFAULT_BRANCHES:
            if git_repo_branch_exists(repo, default_branch):
                branch = default_branch
                break
        if branch is None:
            default_branches_msg = ", ".join(f"'{b}'" for b in DEFAULT_BRANCHES)
            msg.fail(
                "No branch provided and attempted default "
                f"branches {default_branches_msg} do not exist.",
                exits=1,
            )
    else:
        if not git_repo_branch_exists(repo, branch):
            msg.fail(f"repo: {repo} (branch: {branch}) does not exist.", exits=1)
    assert isinstance(branch, str)
    project_clone(name, dest, repo=repo, branch=branch, sparse_checkout=sparse_checkout)


def project_clone(
    name: str,
    dest: Path,
    *,
    repo: str = about.__projects__,
    branch: str = about.__projects_branch__,
    sparse_checkout: bool = False,
) -> None:
    """Clone a project template from a repository.

    name (str): Name of subdirectory to clone.
    dest (Path): Destination path of cloned project.
    repo (str): URL of Git repo containing project templates.
    branch (str): The branch to clone from
    """
    dest = ensure_path(dest)
    check_clone(name, dest, repo)
    project_dir = dest.resolve()
    repo_name = re.sub(r"(http(s?)):\/\/github.com/", "", repo)
    try:
        git_checkout(repo, name, dest, branch=branch, sparse=sparse_checkout)
    except subprocess.CalledProcessError:
        err = f"Could not clone '{name}' from repo '{repo_name}' (branch '{branch}')"
        msg.fail(err, exits=1)
    msg.good(f"Cloned '{name}' from '{repo_name}' (branch '{branch}')", project_dir)
    if not (project_dir / PROJECT_FILE).exists():
        msg.warn(f"No {PROJECT_FILE} found in directory")
    else:
        msg.good(f"Your project is now ready!")
        print(f"To fetch the assets, run:\n{COMMAND} project assets {dest}")


def check_clone(name: str, dest: Path, repo: str) -> None:
    """Check and validate that the destination path can be used to clone. Will
    check that Git is available and that the destination path is suitable.

    name (str): Name of the directory to clone from the repo.
    dest (Path): Local destination of cloned directory.
    repo (str): URL of the repo to clone from.
    """
    git_err = (
        f"Cloning spaCy project templates requires Git and the 'git' command. "
        f"To clone a project without Git, copy the files from the '{name}' "
        f"directory in the {repo} to {dest} manually."
    )
    get_git_version(error=git_err)
    if not dest:
        msg.fail(f"Not a valid directory to clone project: {dest}", exits=1)
    if dest.exists():
        # Directory already exists (not allowed, clone needs to create it)
        msg.fail(f"Can't clone project, directory already exists: {dest}", exits=1)
    if not dest.parent.exists():
        # We're not creating parents, parent dir should exist
        msg.fail(
            f"Can't clone project, parent directory doesn't exist: {dest.parent}. "
            f"Create the necessary folder(s) first before continuing.",
            exits=1,
        )
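The default-branch fallback in `project_clone_cli` can be illustrated in isolation. Here `branch_exists` is a stand-in predicate for `git_repo_branch_exists` (a hypothetical callable, not the real Git call), so the selection logic is testable without a network:

```python
from typing import Callable, Iterable, Optional


def pick_branch(
    requested: Optional[str],
    candidates: Iterable[str],
    branch_exists: Callable[[str], bool],
) -> Optional[str]:
    # If the user named a branch, it must exist; otherwise try the default
    # candidates ("main", then "master") in order, as project_clone_cli does.
    if requested is not None:
        return requested if branch_exists(requested) else None
    for candidate in candidates:
        if branch_exists(candidate):
            return candidate
    return None


print(pick_branch(None, ["main", "master"], lambda b: b == "master"))
# → master
```

In the real command, a `None` result is fatal (`msg.fail(..., exits=1)`); the sketch just returns it so the caller decides.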
@@ -1,115 +1 @@
from weasel.cli.document import *
from pathlib import Path

from wasabi import MarkdownRenderer, msg

from ...util import working_dir
from .._util import PROJECT_FILE, Arg, Opt, load_project_config, project_cli

DOCS_URL = "https://spacy.io"
INTRO_PROJECT = f"""The [`{PROJECT_FILE}`]({PROJECT_FILE}) defines the data assets required by the
project, as well as the available commands and workflows. For details, see the
[spaCy projects documentation]({DOCS_URL}/usage/projects)."""
INTRO_COMMANDS = f"""The following commands are defined by the project. They
can be executed using [`spacy project run [name]`]({DOCS_URL}/api/cli#project-run).
Commands are only re-run if their inputs have changed."""
INTRO_WORKFLOWS = f"""The following workflows are defined by the project. They
can be executed using [`spacy project run [name]`]({DOCS_URL}/api/cli#project-run)
and will run the specified commands in order. Commands are only re-run if their
inputs have changed."""
INTRO_ASSETS = f"""The following assets are defined by the project. They can
be fetched by running [`spacy project assets`]({DOCS_URL}/api/cli#project-assets)
in the project directory."""
# These markers are added to the Markdown and can be used to update the file in
# place if it already exists. Only the auto-generated part will be replaced.
MARKER_START = "<!-- SPACY PROJECT: AUTO-GENERATED DOCS START (do not remove) -->"
MARKER_END = "<!-- SPACY PROJECT: AUTO-GENERATED DOCS END (do not remove) -->"
# If this marker is used in an existing README, it's ignored and not replaced
MARKER_IGNORE = "<!-- SPACY PROJECT: IGNORE -->"


@project_cli.command("document")
def project_document_cli(
    # fmt: off
    project_dir: Path = Arg(Path.cwd(), help="Path to cloned project. Defaults to current working directory.", exists=True, file_okay=False),
    output_file: Path = Opt("-", "--output", "-o", help="Path to output Markdown file for output. Defaults to - for standard output"),
    no_emoji: bool = Opt(False, "--no-emoji", "-NE", help="Don't use emoji")
    # fmt: on
):
    """
    Auto-generate a README.md for a project. If the content is saved to a file,
    hidden markers are added so you can add custom content before or after the
    auto-generated section and only the auto-generated docs will be replaced
    when you re-run the command.

    DOCS: https://spacy.io/api/cli#project-document
    """
    project_document(project_dir, output_file, no_emoji=no_emoji)


def project_document(
    project_dir: Path, output_file: Path, *, no_emoji: bool = False
) -> None:
    is_stdout = str(output_file) == "-"
    config = load_project_config(project_dir)
    md = MarkdownRenderer(no_emoji=no_emoji)
    md.add(MARKER_START)
    title = config.get("title")
    description = config.get("description")
    md.add(md.title(1, f"spaCy Project{f': {title}' if title else ''}", "🪐"))
    if description:
        md.add(description)
    md.add(md.title(2, PROJECT_FILE, "📋"))
    md.add(INTRO_PROJECT)
    # Commands
    cmds = config.get("commands", [])
    data = [(md.code(cmd["name"]), cmd.get("help", "")) for cmd in cmds]
    if data:
        md.add(md.title(3, "Commands", "⏯"))
        md.add(INTRO_COMMANDS)
        md.add(md.table(data, ["Command", "Description"]))
    # Workflows
    wfs = config.get("workflows", {}).items()
    data = [(md.code(n), " → ".join(md.code(w) for w in stp)) for n, stp in wfs]
    if data:
        md.add(md.title(3, "Workflows", "⏭"))
        md.add(INTRO_WORKFLOWS)
        md.add(md.table(data, ["Workflow", "Steps"]))
    # Assets
    assets = config.get("assets", [])
    data = []
    for a in assets:
        source = "Git" if a.get("git") else "URL" if a.get("url") else "Local"
        dest_path = a["dest"]
        dest = md.code(dest_path)
        if source == "Local":
            # Only link assets if they're in the repo
            with working_dir(project_dir) as p:
                if (p / dest_path).exists():
                    dest = md.link(dest, dest_path)
        data.append((dest, source, a.get("description", "")))
    if data:
        md.add(md.title(3, "Assets", "🗂"))
        md.add(INTRO_ASSETS)
        md.add(md.table(data, ["File", "Source", "Description"]))
    md.add(MARKER_END)
    # Output result
    if is_stdout:
        print(md.text)
    else:
        content = md.text
        if output_file.exists():
            with output_file.open("r", encoding="utf8") as f:
                existing = f.read()
            if MARKER_IGNORE in existing:
                msg.warn("Found ignore marker in existing file: skipping", output_file)
                return
            if MARKER_START in existing and MARKER_END in existing:
                msg.info("Found existing file: only replacing auto-generated docs")
                before = existing.split(MARKER_START)[0]
                after = existing.split(MARKER_END)[1]
                content = f"{before}{content}{after}"
            else:
                msg.warn("Replacing existing file")
        with output_file.open("w", encoding="utf8") as f:
            f.write(content)
        msg.good("Saved project documentation", output_file)
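The marker handling in `project_document` — replacing only the auto-generated region between `MARKER_START` and `MARKER_END` — reduces to a string splice. This sketch uses shortened marker strings for readability:

```python
START = "<!-- AUTO START -->"
END = "<!-- AUTO END -->"


def splice_docs(existing: str, generated: str) -> str:
    # Keep everything before the start marker and after the end marker,
    # swapping only the auto-generated middle (markers are part of `generated`),
    # like the split/join in project_document.
    if START in existing and END in existing:
        before = existing.split(START)[0]
        after = existing.split(END)[1]
        return f"{before}{generated}{after}"
    return generated  # no markers: replace the whole file


old = f"intro\n{START}\nold docs\n{END}\noutro"
new_block = f"{START}\nnew docs\n{END}"
print(splice_docs(old, new_block))
```

Custom content outside the markers ("intro"/"outro" here) survives a re-run; the separate `MARKER_IGNORE` check in the real command skips the file entirely before this splice ever happens.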
@@ -1,220 +1 @@
from weasel.cli.dvc import *
"""This module contains helpers and subcommands for integrating spaCy projects
with Data Version Control (DVC). https://dvc.org"""
import subprocess
from pathlib import Path
from typing import Any, Dict, Iterable, List, Optional

from wasabi import msg

from ...util import (
    SimpleFrozenList,
    join_command,
    run_command,
    split_command,
    working_dir,
)
from .._util import (
    COMMAND,
    NAME,
    PROJECT_FILE,
    Arg,
    Opt,
    get_hash,
    load_project_config,
    project_cli,
)

DVC_CONFIG = "dvc.yaml"
DVC_DIR = ".dvc"
UPDATE_COMMAND = "dvc"
DVC_CONFIG_COMMENT = f"""# This file is auto-generated by spaCy based on your {PROJECT_FILE}. If you've
# edited your {PROJECT_FILE}, you can regenerate this file by running:
# {COMMAND} project {UPDATE_COMMAND}"""


@project_cli.command(UPDATE_COMMAND)
def project_update_dvc_cli(
    # fmt: off
    project_dir: Path = Arg(Path.cwd(), help="Location of project directory. Defaults to current working directory.", exists=True, file_okay=False),
    workflow: Optional[str] = Arg(None, help=f"Name of workflow defined in {PROJECT_FILE}. Defaults to first workflow if not set."),
    verbose: bool = Opt(False, "--verbose", "-V", help="Print more info"),
    quiet: bool = Opt(False, "--quiet", "-q", help="Print less info"),
    force: bool = Opt(False, "--force", "-F", help="Force update DVC config"),
    # fmt: on
):
    """Auto-generate Data Version Control (DVC) config. A DVC
    project can only define one pipeline, so you need to specify one workflow
    defined in the project.yml. If no workflow is specified, the first defined
    workflow is used. The DVC config will only be updated if the project.yml
    changed.

    DOCS: https://spacy.io/api/cli#project-dvc
    """
    project_update_dvc(project_dir, workflow, verbose=verbose, quiet=quiet, force=force)


def project_update_dvc(
    project_dir: Path,
    workflow: Optional[str] = None,
    *,
    verbose: bool = False,
    quiet: bool = False,
    force: bool = False,
) -> None:
    """Update the auto-generated Data Version Control (DVC) config file. A DVC
    project can only define one pipeline, so you need to specify one workflow
    defined in the project.yml. Will only update the file if the checksum changed.

    project_dir (Path): The project directory.
    workflow (Optional[str]): Optional name of workflow defined in project.yml.
        If not set, the first workflow will be used.
    verbose (bool): Print more info.
    quiet (bool): Print less info.
    force (bool): Force update DVC config.
    """
    config = load_project_config(project_dir)
    updated = update_dvc_config(
        project_dir, config, workflow, verbose=verbose, quiet=quiet, force=force
    )
    help_msg = "To execute the workflow with DVC, run: dvc repro"
    if updated:
        msg.good(f"Updated DVC config from {PROJECT_FILE}", help_msg)
    else:
        msg.info(f"No changes found in {PROJECT_FILE}, no update needed", help_msg)


def update_dvc_config(
    path: Path,
    config: Dict[str, Any],
    workflow: Optional[str] = None,
    verbose: bool = False,
    quiet: bool = False,
    force: bool = False,
) -> bool:
    """Re-run the DVC commands in dry mode and update dvc.yaml file in the
    project directory. The file is auto-generated based on the config. The
    first line of the auto-generated file specifies the hash of the config
    dict, so if any of the config values change, the DVC config is regenerated.

    path (Path): The path to the project directory.
    config (Dict[str, Any]): The loaded project.yml.
    verbose (bool): Whether to print additional info (via DVC).
    quiet (bool): Don't output anything (via DVC).
    force (bool): Force update, even if hashes match.
    RETURNS (bool): Whether the DVC config file was updated.
    """
    ensure_dvc(path)
    workflows = config.get("workflows", {})
    workflow_names = list(workflows.keys())
    check_workflows(workflow_names, workflow)
    if not workflow:
        workflow = workflow_names[0]
    config_hash = get_hash(config)
    path = path.resolve()
    dvc_config_path = path / DVC_CONFIG
    if dvc_config_path.exists():
        # Check if the file was generated using the current config, if not, redo
        with dvc_config_path.open("r", encoding="utf8") as f:
            ref_hash = f.readline().strip().replace("# ", "")
        if ref_hash == config_hash and not force:
            return False  # Nothing has changed in project.yml, don't need to update
        dvc_config_path.unlink()
    dvc_commands = []
    config_commands = {cmd["name"]: cmd for cmd in config.get("commands", [])}

    # some flags that apply to every command
    flags = []
    if verbose:
        flags.append("--verbose")
    if quiet:
        flags.append("--quiet")

    for name in workflows[workflow]:
        command = config_commands[name]
        deps = command.get("deps", [])
        outputs = command.get("outputs", [])
        outputs_no_cache = command.get("outputs_no_cache", [])
        if not deps and not outputs and not outputs_no_cache:
            continue
        # Default to the working dir as the project path since dvc.yaml is auto-generated
        # and we don't want arbitrary paths in there
        project_cmd = ["python", "-m", NAME, "project", "run", name]
        deps_cmd = [c for cl in [["-d", p] for p in deps] for c in cl]
        outputs_cmd = [c for cl in [["-o", p] for p in outputs] for c in cl]
        outputs_nc_cmd = [c for cl in [["-O", p] for p in outputs_no_cache] for c in cl]

        dvc_cmd = ["run", *flags, "-n", name, "-w", str(path), "--no-exec"]
        if command.get("no_skip"):
            dvc_cmd.append("--always-changed")
        full_cmd = [*dvc_cmd, *deps_cmd, *outputs_cmd, *outputs_nc_cmd, *project_cmd]
        dvc_commands.append(join_command(full_cmd))

    if not dvc_commands:
        # If we don't check for this, then there will be an error when reading the
        # config, since DVC wouldn't create it.
        msg.fail(
            "No usable commands for DVC found. This can happen if none of your "
            "commands have dependencies or outputs.",
            exits=1,
        )

    with working_dir(path):
        for c in dvc_commands:
            dvc_command = "dvc " + c
            run_command(dvc_command)
    with dvc_config_path.open("r+", encoding="utf8") as f:
        content = f.read()
        f.seek(0, 0)
        f.write(f"# {config_hash}\n{DVC_CONFIG_COMMENT}\n{content}")
    return True


def check_workflows(workflows: List[str], workflow: Optional[str] = None) -> None:
    """Validate workflows provided in project.yml and check that a given
    workflow can be used to generate a DVC config.

    workflows (List[str]): Names of the available workflows.
    workflow (Optional[str]): The name of the workflow to convert.
    """
    if not workflows:
        msg.fail(
            f"No workflows defined in {PROJECT_FILE}. To generate a DVC config, "
            f"define at least one list of commands.",
            exits=1,
        )
    if workflow is not None and workflow not in workflows:
        msg.fail(
            f"Workflow '{workflow}' not defined in {PROJECT_FILE}. "
            f"Available workflows: {', '.join(workflows)}",
            exits=1,
        )
    if not workflow:
        msg.warn(
            f"No workflow specified for DVC pipeline. Using the first workflow "
            f"defined in {PROJECT_FILE}: '{workflows[0]}'"
        )


def ensure_dvc(project_dir: Path) -> None:
    """Ensure that the "dvc" command is available and that the current project
    directory is an initialized DVC project.
    """
    try:
        subprocess.run(["dvc", "--version"], stdout=subprocess.DEVNULL)
    except Exception:
        msg.fail(
            "To use spaCy projects with DVC (Data Version Control), DVC needs "
            "to be installed and the 'dvc' command needs to be available",
            "You can install the Python package from pip (pip install dvc) or "
            "conda (conda install -c conda-forge dvc). For more details, see the "
            "documentation: https://dvc.org/doc/install",
            exits=1,
        )
    if not (project_dir / ".dvc").exists():
        msg.fail(
            "Project not initialized as a DVC project",
            "To initialize a DVC project, you can run 'dvc init' in the project "
            "directory. For more details, see the documentation: "
            "https://dvc.org/doc/command-reference/init",
            exits=1,
        )
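The staleness check in `update_dvc_config` works by writing the config hash as a `# `-prefixed first line of `dvc.yaml` and comparing it on the next run. A minimal sketch of that round trip — the helper names and the md5-based stand-in for `get_hash` are illustrative, not the real implementation:

```python
import hashlib
import tempfile
from pathlib import Path


def config_hash(config: dict) -> str:
    # Stand-in for get_hash(): a stable digest of the config dict.
    return hashlib.md5(repr(sorted(config.items())).encode("utf8")).hexdigest()


def needs_regen(path: Path, config: dict) -> bool:
    # Read back the "# <hash>" first line and compare it against the current
    # config hash, mirroring the check in update_dvc_config.
    if not path.exists():
        return True
    ref_hash = path.read_text(encoding="utf8").splitlines()[0].replace("# ", "")
    return ref_hash != config_hash(config)


cfg = {"commands": ["train"]}
path = Path(tempfile.mkstemp()[1])
path.write_text(f"# {config_hash(cfg)}\nstages: ...\n", encoding="utf8")
print(needs_regen(path, cfg), needs_regen(path, {"commands": ["evaluate"]}))
# → False True
```

This is why editing `project.yml` by hand triggers a regeneration, while re-running the command on an unchanged config is a no-op (unless `--force` is passed).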
@@ -1,67 +1 @@
from weasel.cli.pull import *
from pathlib import Path

from wasabi import msg

from .._util import Arg, load_project_config, logger, project_cli
from .remote_storage import RemoteStorage, get_command_hash
from .run import update_lockfile


@project_cli.command("pull")
def project_pull_cli(
    # fmt: off
    remote: str = Arg("default", help="Name or path of remote storage"),
    project_dir: Path = Arg(Path.cwd(), help="Location of project directory. Defaults to current working directory.", exists=True, file_okay=False),
    # fmt: on
):
    """Retrieve available precomputed outputs from a remote storage.
    You can alias remotes in your project.yml by mapping them to storage paths.
    A storage can be anything that the smart-open library can upload to, e.g.
    AWS, Google Cloud Storage, SSH, local directories etc.

    DOCS: https://spacy.io/api/cli#project-pull
    """
    for url, output_path in project_pull(project_dir, remote):
        if url is not None:
            msg.good(f"Pulled {output_path} from {url}")


def project_pull(project_dir: Path, remote: str, *, verbose: bool = False):
    # TODO: We don't have tests for this :(. It would take a bit of mockery to
    # set up. I guess see if it breaks first?
    config = load_project_config(project_dir)
    if remote in config.get("remotes", {}):
        remote = config["remotes"][remote]
    storage = RemoteStorage(project_dir, remote)
    commands = list(config.get("commands", []))
    # We use a while loop here because we don't know how the commands
    # will be ordered. A command might need dependencies from one that's later
    # in the list.
    while commands:
        for i, cmd in enumerate(list(commands)):
            logger.debug("CMD: %s.", cmd["name"])
            deps = [project_dir / dep for dep in cmd.get("deps", [])]
            if all(dep.exists() for dep in deps):
                cmd_hash = get_command_hash("", "", deps, cmd["script"])
                for output_path in cmd.get("outputs", []):
                    url = storage.pull(output_path, command_hash=cmd_hash)
                    logger.debug(
                        "URL: %s for %s with command hash %s",
                        url,
                        output_path,
                        cmd_hash,
                    )
                    yield url, output_path

                out_locs = [project_dir / out for out in cmd.get("outputs", [])]
                if all(loc.exists() for loc in out_locs):
                    update_lockfile(project_dir, cmd)
                # We remove the command from the list here, and break, so that
                # we iterate over the loop again.
                commands.pop(i)
                break
            else:
                logger.debug("Dependency missing. Skipping %s outputs.", cmd["name"])
        else:
            # If we didn't break the for loop, break the while loop.
            break
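The while/for pattern in `project_pull` — keep sweeping the command list, handling whatever has its dependencies satisfied, and stop when a full pass makes no progress — can be shown with toy commands whose outputs become later dependencies. All names here are hypothetical; the sketch only demonstrates the ordering logic, not the storage calls:

```python
def resolve_order(commands):
    # Each command: {"name", "deps", "outputs"}. Repeatedly scan the list and
    # run any command whose deps are all available, removing it and restarting
    # the scan; a pass with no progress ends the loop (the for/else + break
    # pattern used in project_pull).
    available = set()
    order = []
    pending = list(commands)
    while pending:
        for i, cmd in enumerate(pending):
            if all(dep in available for dep in cmd["deps"]):
                available.update(cmd["outputs"])
                order.append(cmd["name"])
                pending.pop(i)
                break
        else:
            break  # no runnable command left: stop sweeping
    return order


cmds = [
    {"name": "train", "deps": ["corpus"], "outputs": ["model"]},
    {"name": "corpus", "deps": [], "outputs": ["corpus"]},
    {"name": "package", "deps": ["model"], "outputs": ["pkg"]},
]
print(resolve_order(cmds))
# → ['corpus', 'train', 'package']
```

Note that `corpus` runs before `train` even though it appears later in the list — exactly the situation the comment in `project_pull` describes.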
@@ -1,69 +1 @@
from weasel.cli.push import *
from pathlib import Path

from wasabi import msg

from .._util import Arg, load_project_config, logger, project_cli
from .remote_storage import RemoteStorage, get_command_hash, get_content_hash


@project_cli.command("push")
def project_push_cli(
    # fmt: off
    remote: str = Arg("default", help="Name or path of remote storage"),
    project_dir: Path = Arg(Path.cwd(), help="Location of project directory. Defaults to current working directory.", exists=True, file_okay=False),
    # fmt: on
):
    """Persist outputs to a remote storage. You can alias remotes in your
    project.yml by mapping them to storage paths. A storage can be anything that
    the smart-open library can upload to, e.g. AWS, Google Cloud Storage, SSH,
    local directories etc.

    DOCS: https://spacy.io/api/cli#project-push
    """
    for output_path, url in project_push(project_dir, remote):
        if url is None:
            msg.info(f"Skipping {output_path}")
        else:
            msg.good(f"Pushed {output_path} to {url}")


def project_push(project_dir: Path, remote: str):
    """Persist outputs to a remote storage. You can alias remotes in your project.yml
    by mapping them to storage paths. A storage can be anything that the smart-open
    library can upload to, e.g. gcs, aws, ssh, local directories etc
    """
    config = load_project_config(project_dir)
    if remote in config.get("remotes", {}):
        remote = config["remotes"][remote]
    storage = RemoteStorage(project_dir, remote)
    for cmd in config.get("commands", []):
        logger.debug("CMD: %s", cmd["name"])
        deps = [project_dir / dep for dep in cmd.get("deps", [])]
        if any(not dep.exists() for dep in deps):
            logger.debug("Dependency missing. Skipping %s outputs", cmd["name"])
            continue
        cmd_hash = get_command_hash(
            "", "", [project_dir / dep for dep in cmd.get("deps", [])], cmd["script"]
        )
        logger.debug("CMD_HASH: %s", cmd_hash)
        for output_path in cmd.get("outputs", []):
            output_loc = project_dir / output_path
            if output_loc.exists() and _is_not_empty_dir(output_loc):
                url = storage.push(
                    output_path,
                    command_hash=cmd_hash,
                    content_hash=get_content_hash(output_loc),
                )
                logger.debug(
                    "URL: %s for output %s with cmd_hash %s", url, output_path, cmd_hash
                )
                yield output_path, url


def _is_not_empty_dir(loc: Path):
    if not loc.is_dir():
        return True
    elif any(_is_not_empty_dir(child) for child in loc.iterdir()):
        return True
    else:
        return False
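`_is_not_empty_dir` treats any file as pushable and skips directories that contain nothing but empty directories. A quick check of that behavior against a throwaway tree (the paths are temporary, created only for the demo):

```python
import tempfile
from pathlib import Path


def is_not_empty_dir(loc: Path) -> bool:
    # Same recursion as _is_not_empty_dir in push.py: a file counts as
    # non-empty; a directory is non-empty only if some descendant is a file.
    if not loc.is_dir():
        return True
    return any(is_not_empty_dir(child) for child in loc.iterdir())


root = Path(tempfile.mkdtemp())
(root / "empty" / "nested").mkdir(parents=True)
(root / "data").mkdir()
(root / "data" / "file.txt").write_text("x")
print(is_not_empty_dir(root / "empty"), is_not_empty_dir(root / "data"))
# → False True
```

The guard matters because pushing an empty directory would upload a useless archive and pollute the remote cache with entries that can never satisfy a pull.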
@@ -1,212 +1 @@
from weasel.cli.remote_storage import *
import hashlib
import os
import site
import tarfile
import urllib.parse
from pathlib import Path
from typing import TYPE_CHECKING, Dict, List, Optional

from wasabi import msg

from ... import about
from ...errors import Errors
from ...git_info import GIT_VERSION
from ...util import ENV_VARS, check_bool_env_var, get_minor_version
from .._util import (
    download_file,
    ensure_pathy,
    get_checksum,
    get_hash,
    make_tempdir,
    upload_file,
)

if TYPE_CHECKING:
    from pathy import FluidPath  # noqa: F401


class RemoteStorage:
    """Push and pull outputs to and from a remote file storage.

    Remotes can be anything that `smart-open` can support: AWS, GCS, file system,
    ssh, etc.
    """

    def __init__(self, project_root: Path, url: str, *, compression="gz"):
        self.root = project_root
        self.url = ensure_pathy(url)
        self.compression = compression

    def push(self, path: Path, command_hash: str, content_hash: str) -> "FluidPath":
        """Compress a file or directory within a project and upload it to a remote
        storage. If an object exists at the full URL, nothing is done.

        Within the remote storage, files are addressed by their project path
        (url encoded) and two user-supplied hashes, representing their creation
        context and their file contents. If the URL already exists, the data is
        not uploaded. Paths are archived and compressed prior to upload.
        """
        loc = self.root / path
        if not loc.exists():
            raise IOError(f"Cannot push {loc}: does not exist.")
        url = self.make_url(path, command_hash, content_hash)
        if url.exists():
            return url
        tmp: Path
        with make_tempdir() as tmp:
            tar_loc = tmp / self.encode_name(str(path))
            mode_string = f"w:{self.compression}" if self.compression else "w"
            with tarfile.open(tar_loc, mode=mode_string) as tar_file:
                tar_file.add(str(loc), arcname=str(path))
            upload_file(tar_loc, url)
        return url

    def pull(
        self,
        path: Path,
        *,
        command_hash: Optional[str] = None,
        content_hash: Optional[str] = None,
    ) -> Optional["FluidPath"]:
        """Retrieve a file from the remote cache. If the file already exists,
        nothing is done.

        If the command_hash and/or content_hash are specified, only matching
        results are returned. If no results are available, an error is raised.
        """
        dest = self.root / path
        if dest.exists():
            return None
        url = self.find(path, command_hash=command_hash, content_hash=content_hash)
        if url is None:
            return url
        else:
            # Make sure the destination exists
            if not dest.parent.exists():
                dest.parent.mkdir(parents=True)
            tmp: Path
            with make_tempdir() as tmp:
                tar_loc = tmp / url.parts[-1]
                download_file(url, tar_loc)
                mode_string = f"r:{self.compression}" if self.compression else "r"
                with tarfile.open(tar_loc, mode=mode_string) as tar_file:
                    # This requires that the path is added correctly, relative
                    # to root. This is how we set things up in push()

                    # Disallow paths outside the current directory for the tar
                    # file (CVE-2007-4559, directory traversal vulnerability)
                    def is_within_directory(directory, target):
                        abs_directory = os.path.abspath(directory)
                        abs_target = os.path.abspath(target)
                        prefix = os.path.commonprefix([abs_directory, abs_target])
                        return prefix == abs_directory

                    def safe_extract(tar, path):
                        for member in tar.getmembers():
                            member_path = os.path.join(path, member.name)
                            if not is_within_directory(path, member_path):
                                raise ValueError(Errors.E852)
                        tar.extractall(path)

                    safe_extract(tar_file, self.root)
        return url

    def find(
        self,
        path: Path,
        *,
        command_hash: Optional[str] = None,
        content_hash: Optional[str] = None,
    ) -> Optional["FluidPath"]:
        """Find the best matching version of a file within the storage,
        or `None` if no match can be found. If both the creation and content hash
        are specified, only exact matches will be returned. Otherwise, the most
        recent matching file is preferred.
        """
        name = self.encode_name(str(path))
        urls = []
        if command_hash is not None and content_hash is not None:
            url = self.url / name / command_hash / content_hash
            urls = [url] if url.exists() else []
        elif command_hash is not None:
            if (self.url / name / command_hash).exists():
                urls = list((self.url / name / command_hash).iterdir())
        else:
            if (self.url / name).exists():
                for sub_dir in (self.url / name).iterdir():
                    urls.extend(sub_dir.iterdir())
            if content_hash is not None:
                urls = [url for url in urls if url.parts[-1] == content_hash]
        if len(urls) >= 2:
            try:
                urls.sort(key=lambda x: x.stat().last_modified)  # type: ignore
            except Exception:
                msg.warn(
                    "Unable to sort remote files by last modified. The file(s) "
                    "pulled from the cache may not be the most recent."
                )
        return urls[-1] if urls else None

    def make_url(self, path: Path, command_hash: str, content_hash: str) -> "FluidPath":
|
|
||||||
"""Construct a URL from a subpath, a creation hash and a content hash."""
|
|
||||||
return self.url / self.encode_name(str(path)) / command_hash / content_hash
|
|
||||||
|
|
||||||
def encode_name(self, name: str) -> str:
|
|
||||||
"""Encode a subpath into a URL-safe name."""
|
|
||||||
return urllib.parse.quote_plus(name)
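The `encode_name` method above leans on `urllib.parse.quote_plus`, which percent-encodes path separators so an entire subpath collapses into a single URL path component under the remote cache root. A minimal standalone sketch of that behaviour (outside the class, for illustration only):

```python
import urllib.parse


# Standalone sketch of encode_name() above: quote_plus() percent-encodes "/"
# (and turns spaces into "+"), so a nested subpath like "training/model-best"
# maps to one flat, URL-safe component in the remote storage layout.
def encode_name(name: str) -> str:
    return urllib.parse.quote_plus(name)


print(encode_name("training/model-best"))  # training%2Fmodel-best
print(encode_name("my model"))  # my+model
```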


def get_content_hash(loc: Path) -> str:
    return get_checksum(loc)


def get_command_hash(
    site_hash: str, env_hash: str, deps: List[Path], cmd: List[str]
) -> str:
    """Create a hash representing the execution of a command. This includes the
    currently installed packages, whatever environment variables have been marked
    as relevant, and the command.
    """
    if check_bool_env_var(ENV_VARS.PROJECT_USE_GIT_VERSION):
        spacy_v = GIT_VERSION
    else:
        spacy_v = str(get_minor_version(about.__version__) or "")
    dep_checksums = [get_checksum(dep) for dep in sorted(deps)]
    hashes = [spacy_v, site_hash, env_hash] + dep_checksums
    hashes.extend(cmd)
    creation_bytes = "".join(hashes).encode("utf8")
    return hashlib.md5(creation_bytes).hexdigest()


def get_site_hash():
    """Hash the current Python environment's site-packages contents, including
    the name and version of the libraries. The list we're hashing is what
    `pip freeze` would output.
    """
    site_dirs = site.getsitepackages()
    if site.ENABLE_USER_SITE:
        # getusersitepackages() returns a single string, so append, not extend
        site_dirs.append(site.getusersitepackages())
    packages = set()
    for site_dir in site_dirs:
        site_dir = Path(site_dir)
        for subpath in site_dir.iterdir():
            if subpath.parts[-1].endswith("dist-info"):
                packages.add(subpath.parts[-1].replace(".dist-info", ""))
    package_bytes = "".join(sorted(packages)).encode("utf8")
    return hashlib.md5(package_bytes).hexdigest()


def get_env_hash(env: Dict[str, str]) -> str:
    """Construct a hash of the environment variables that will be passed into
    the commands.

    Values in the env dict may be references to the current os.environ, using
    the syntax $ENV_VAR to mean os.environ[ENV_VAR]
    """
    env_vars = {}
    for key, value in env.items():
        if value.startswith("$"):
            env_vars[key] = os.environ.get(value[1:], "")
        else:
            env_vars[key] = value
    return get_hash(env_vars)
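The `$ENV_VAR` substitution described in `get_env_hash()` above can be sketched on its own: values starting with `$` are resolved from `os.environ` (falling back to an empty string if unset) before the dict is hashed. A minimal illustration, with hypothetical variable names:

```python
import os


# Minimal sketch of the "$ENV_VAR" resolution step in get_env_hash() above.
# Only the substitution is shown here; the subsequent hashing is omitted.
def resolve_env(env):
    out = {}
    for key, value in env.items():
        if value.startswith("$"):
            out[key] = os.environ.get(value[1:], "")
        else:
            out[key] = value
    return out


os.environ["DEMO_VAR"] = "hello"  # hypothetical variable for the demo
print(resolve_env({"a": "$DEMO_VAR", "b": "literal"}))  # {'a': 'hello', 'b': 'literal'}
```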
@@ -1,379 +1 @@
-import os.path
+from weasel.cli.run import *
-import sys
-from pathlib import Path
-from typing import Any, Dict, Iterable, List, Optional, Sequence, Tuple
-
-import srsly
-import typer
-from wasabi import msg
-from wasabi.util import locale_escape
-
-from ... import about
-from ...git_info import GIT_VERSION
-from ...util import (
-    ENV_VARS,
-    SimpleFrozenDict,
-    SimpleFrozenList,
-    check_bool_env_var,
-    is_cwd,
-    is_minor_version_match,
-    join_command,
-    run_command,
-    split_command,
-    working_dir,
-)
-from .._util import (
-    COMMAND,
-    PROJECT_FILE,
-    PROJECT_LOCK,
-    Arg,
-    Opt,
-    get_checksum,
-    get_hash,
-    load_project_config,
-    parse_config_overrides,
-    project_cli,
-)
-
-
-@project_cli.command(
-    "run", context_settings={"allow_extra_args": True, "ignore_unknown_options": True}
-)
-def project_run_cli(
-    # fmt: off
-    ctx: typer.Context,  # This is only used to read additional arguments
-    subcommand: str = Arg(None, help=f"Name of command defined in the {PROJECT_FILE}"),
-    project_dir: Path = Arg(Path.cwd(), help="Location of project directory. Defaults to current working directory.", exists=True, file_okay=False),
-    force: bool = Opt(False, "--force", "-F", help="Force re-running steps, even if nothing changed"),
-    dry: bool = Opt(False, "--dry", "-D", help="Perform a dry run and don't execute scripts"),
-    show_help: bool = Opt(False, "--help", help="Show help message and available subcommands")
-    # fmt: on
-):
-    """Run a named command or workflow defined in the project.yml. If a workflow
-    name is specified, all commands in the workflow are run, in order. If
-    commands define dependencies and/or outputs, they will only be re-run if
-    state has changed.
-
-    DOCS: https://spacy.io/api/cli#project-run
-    """
-    if show_help or not subcommand:
-        print_run_help(project_dir, subcommand)
-    else:
-        overrides = parse_config_overrides(ctx.args)
-        project_run(project_dir, subcommand, overrides=overrides, force=force, dry=dry)
-
-
-def project_run(
-    project_dir: Path,
-    subcommand: str,
-    *,
-    overrides: Dict[str, Any] = SimpleFrozenDict(),
-    force: bool = False,
-    dry: bool = False,
-    capture: bool = False,
-    skip_requirements_check: bool = False,
-) -> None:
-    """Run a named script defined in the project.yml. If the script is part
-    of the default pipeline (defined in the "run" section), DVC is used to
-    execute the command, so it can determine whether to rerun it. It then
-    calls into "exec" to execute it.
-
-    project_dir (Path): Path to project directory.
-    subcommand (str): Name of command to run.
-    overrides (Dict[str, Any]): Optional config overrides.
-    force (bool): Force re-running, even if nothing changed.
-    dry (bool): Perform a dry run and don't execute commands.
-    capture (bool): Whether to capture the output and errors of individual commands.
-        If False, the stdout and stderr will not be redirected, and if there's an error,
-        sys.exit will be called with the return code. You should use capture=False
-        when you want to turn over execution to the command, and capture=True
-        when you want to run the command more like a function.
-    skip_requirements_check (bool): Whether to skip the requirements check.
-    """
-    config = load_project_config(project_dir, overrides=overrides)
-    commands = {cmd["name"]: cmd for cmd in config.get("commands", [])}
-    workflows = config.get("workflows", {})
-    validate_subcommand(list(commands.keys()), list(workflows.keys()), subcommand)
-
-    req_path = project_dir / "requirements.txt"
-    if not skip_requirements_check:
-        if config.get("check_requirements", True) and os.path.exists(req_path):
-            with req_path.open() as requirements_file:
-                _check_requirements([req.strip() for req in requirements_file])
-
-    if subcommand in workflows:
-        msg.info(f"Running workflow '{subcommand}'")
-        for cmd in workflows[subcommand]:
-            project_run(
-                project_dir,
-                cmd,
-                overrides=overrides,
-                force=force,
-                dry=dry,
-                capture=capture,
-                skip_requirements_check=True,
-            )
-    else:
-        cmd = commands[subcommand]
-        for dep in cmd.get("deps", []):
-            if not (project_dir / dep).exists():
-                err = f"Missing dependency specified by command '{subcommand}': {dep}"
-                err_help = "Maybe you forgot to run the 'project assets' command or a previous step?"
-                err_exits = 1 if not dry else None
-                msg.fail(err, err_help, exits=err_exits)
-        check_spacy_commit = check_bool_env_var(ENV_VARS.PROJECT_USE_GIT_VERSION)
-        with working_dir(project_dir) as current_dir:
-            msg.divider(subcommand)
-            rerun = check_rerun(current_dir, cmd, check_spacy_commit=check_spacy_commit)
-            if not rerun and not force:
-                msg.info(f"Skipping '{cmd['name']}': nothing changed")
-            else:
-                run_commands(cmd["script"], dry=dry, capture=capture)
-                if not dry:
-                    update_lockfile(current_dir, cmd)
-
-
-def print_run_help(project_dir: Path, subcommand: Optional[str] = None) -> None:
-    """Simulate a CLI help prompt using the info available in the project.yml.
-
-    project_dir (Path): The project directory.
-    subcommand (Optional[str]): The subcommand or None. If a subcommand is
-        provided, the subcommand help is shown. Otherwise, the top-level help
-        and a list of available commands is printed.
-    """
-    config = load_project_config(project_dir)
-    config_commands = config.get("commands", [])
-    commands = {cmd["name"]: cmd for cmd in config_commands}
-    workflows = config.get("workflows", {})
-    project_loc = "" if is_cwd(project_dir) else project_dir
-    if subcommand:
-        validate_subcommand(list(commands.keys()), list(workflows.keys()), subcommand)
-        print(f"Usage: {COMMAND} project run {subcommand} {project_loc}")
-        if subcommand in commands:
-            help_text = commands[subcommand].get("help")
-            if help_text:
-                print(f"\n{help_text}\n")
-        elif subcommand in workflows:
-            steps = workflows[subcommand]
-            print(f"\nWorkflow consisting of {len(steps)} commands:")
-            steps_data = [
-                (f"{i + 1}. {step}", commands[step].get("help", ""))
-                for i, step in enumerate(steps)
-            ]
-            msg.table(steps_data)
-            help_cmd = f"{COMMAND} project run [COMMAND] {project_loc} --help"
-            print(f"For command details, run: {help_cmd}")
-    else:
-        print("")
-        title = config.get("title")
-        if title:
-            print(f"{locale_escape(title)}\n")
-        if config_commands:
-            print(f"Available commands in {PROJECT_FILE}")
-            print(f"Usage: {COMMAND} project run [COMMAND] {project_loc}")
-            msg.table([(cmd["name"], cmd.get("help", "")) for cmd in config_commands])
-        if workflows:
-            print(f"Available workflows in {PROJECT_FILE}")
-            print(f"Usage: {COMMAND} project run [WORKFLOW] {project_loc}")
-            msg.table([(name, " -> ".join(steps)) for name, steps in workflows.items()])
-
-
-def run_commands(
-    commands: Iterable[str] = SimpleFrozenList(),
-    silent: bool = False,
-    dry: bool = False,
-    capture: bool = False,
-) -> None:
-    """Run a sequence of commands in a subprocess, in order.
-
-    commands (List[str]): The string commands.
-    silent (bool): Don't print the commands.
-    dry (bool): Perform a dry run and don't execute anything.
-    capture (bool): Whether to capture the output and errors of individual commands.
-        If False, the stdout and stderr will not be redirected, and if there's an error,
-        sys.exit will be called with the return code. You should use capture=False
-        when you want to turn over execution to the command, and capture=True
-        when you want to run the command more like a function.
-    """
-    for c in commands:
-        command = split_command(c)
-        # Not sure if this is needed or a good idea. Motivation: users may often
-        # use commands in their config that reference "python" and we want to
-        # make sure that it's always executing the same Python that spaCy is
-        # executed with and the pip in the same env, not some other Python/pip.
-        # Also ensures cross-compatibility if user 1 writes "python3" (because
-        # that's how it's set up on their system), and user 2 without the
-        # shortcut tries to re-run the command.
-        if len(command) and command[0] in ("python", "python3"):
-            command[0] = sys.executable
-        elif len(command) and command[0] in ("pip", "pip3"):
-            command = [sys.executable, "-m", "pip", *command[1:]]
-        if not silent:
-            print(f"Running command: {join_command(command)}")
-        if not dry:
-            run_command(command, capture=capture)
-
-
-def validate_subcommand(
-    commands: Sequence[str], workflows: Sequence[str], subcommand: str
-) -> None:
-    """Check that a subcommand is valid and defined. Raises an error otherwise.
-
-    commands (Sequence[str]): The available commands.
-    subcommand (str): The subcommand.
-    """
-    if not commands and not workflows:
-        msg.fail(f"No commands or workflows defined in {PROJECT_FILE}", exits=1)
-    if subcommand not in commands and subcommand not in workflows:
-        help_msg = []
-        if subcommand in ["assets", "asset"]:
-            help_msg.append("Did you mean to run: python -m spacy project assets?")
-        if commands:
-            help_msg.append(f"Available commands: {', '.join(commands)}")
-        if workflows:
-            help_msg.append(f"Available workflows: {', '.join(workflows)}")
-        msg.fail(
-            f"Can't find command or workflow '{subcommand}' in {PROJECT_FILE}",
-            ". ".join(help_msg),
-            exits=1,
-        )
-
-
-def check_rerun(
-    project_dir: Path,
-    command: Dict[str, Any],
-    *,
-    check_spacy_version: bool = True,
-    check_spacy_commit: bool = False,
-) -> bool:
-    """Check if a command should be rerun because its settings or inputs/outputs
-    changed.
-
-    project_dir (Path): The current project directory.
-    command (Dict[str, Any]): The command, as defined in the project.yml.
-    strict_version (bool):
-    RETURNS (bool): Whether to re-run the command.
-    """
-    # Always rerun if no-skip is set
-    if command.get("no_skip", False):
-        return True
-    lock_path = project_dir / PROJECT_LOCK
-    if not lock_path.exists():  # We don't have a lockfile, run command
-        return True
-    data = srsly.read_yaml(lock_path)
-    if command["name"] not in data:  # We don't have info about this command
-        return True
-    entry = data[command["name"]]
-    # Always run commands with no outputs (otherwise they'd always be skipped)
-    if not entry.get("outs", []):
-        return True
-    # Always rerun if spaCy version or commit hash changed
-    spacy_v = entry.get("spacy_version")
-    commit = entry.get("spacy_git_version")
-    if check_spacy_version and not is_minor_version_match(spacy_v, about.__version__):
-        info = f"({spacy_v} in {PROJECT_LOCK}, {about.__version__} current)"
-        msg.info(f"Re-running '{command['name']}': spaCy minor version changed {info}")
-        return True
-    if check_spacy_commit and commit != GIT_VERSION:
-        info = f"({commit} in {PROJECT_LOCK}, {GIT_VERSION} current)"
-        msg.info(f"Re-running '{command['name']}': spaCy commit changed {info}")
-        return True
-    # If the entry in the lockfile matches the lockfile entry that would be
-    # generated from the current command, we don't rerun because it means that
-    # all inputs/outputs, hashes and scripts are the same and nothing changed
-    lock_entry = get_lock_entry(project_dir, command)
-    exclude = ["spacy_version", "spacy_git_version"]
-    return get_hash(lock_entry, exclude=exclude) != get_hash(entry, exclude=exclude)
-
-
-def update_lockfile(project_dir: Path, command: Dict[str, Any]) -> None:
-    """Update the lockfile after running a command. Will create a lockfile if
-    it doesn't yet exist and will add an entry for the current command, its
-    script and dependencies/outputs.
-
-    project_dir (Path): The current project directory.
-    command (Dict[str, Any]): The command, as defined in the project.yml.
-    """
-    lock_path = project_dir / PROJECT_LOCK
-    if not lock_path.exists():
-        srsly.write_yaml(lock_path, {})
-        data = {}
-    else:
-        data = srsly.read_yaml(lock_path)
-    data[command["name"]] = get_lock_entry(project_dir, command)
-    srsly.write_yaml(lock_path, data)
-
-
-def get_lock_entry(project_dir: Path, command: Dict[str, Any]) -> Dict[str, Any]:
-    """Get a lockfile entry for a given command. An entry includes the command,
-    the script (command steps) and a list of dependencies and outputs with
-    their paths and file hashes, if available. The format is based on the
-    dvc.lock files, to keep things consistent.
-
-    project_dir (Path): The current project directory.
-    command (Dict[str, Any]): The command, as defined in the project.yml.
-    RETURNS (Dict[str, Any]): The lockfile entry.
-    """
-    deps = get_fileinfo(project_dir, command.get("deps", []))
-    outs = get_fileinfo(project_dir, command.get("outputs", []))
-    outs_nc = get_fileinfo(project_dir, command.get("outputs_no_cache", []))
-    return {
-        "cmd": f"{COMMAND} run {command['name']}",
-        "script": command["script"],
-        "deps": deps,
-        "outs": [*outs, *outs_nc],
-        "spacy_version": about.__version__,
-        "spacy_git_version": GIT_VERSION,
-    }
-
-
-def get_fileinfo(project_dir: Path, paths: List[str]) -> List[Dict[str, Optional[str]]]:
-    """Generate the file information for a list of paths (dependencies, outputs).
-    Includes the file path and the file's checksum.
-
-    project_dir (Path): The current project directory.
-    paths (List[str]): The file paths.
-    RETURNS (List[Dict[str, str]]): The lockfile entry for a file.
-    """
-    data = []
-    for path in paths:
-        file_path = project_dir / path
-        md5 = get_checksum(file_path) if file_path.exists() else None
-        data.append({"path": path, "md5": md5})
-    return data
-
-
-def _check_requirements(requirements: List[str]) -> Tuple[bool, bool]:
-    """Checks whether requirements are installed and free of version conflicts.
-    requirements (List[str]): List of requirements.
-    RETURNS (Tuple[bool, bool]): Whether (1) any packages couldn't be imported,
-        (2) any packages with version conflicts exist.
-    """
-    import pkg_resources
-
-    failed_pkgs_msgs: List[str] = []
-    conflicting_pkgs_msgs: List[str] = []
-
-    for req in requirements:
-        try:
-            pkg_resources.require(req)
-        except pkg_resources.DistributionNotFound as dnf:
-            failed_pkgs_msgs.append(dnf.report())
-        except pkg_resources.VersionConflict as vc:
-            conflicting_pkgs_msgs.append(vc.report())
-        except Exception:
-            msg.warn(
-                f"Unable to check requirement: {req} "
-                "Checks are currently limited to requirement specifiers "
-                "(PEP 508)"
-            )
-
-    if len(failed_pkgs_msgs) or len(conflicting_pkgs_msgs):
-        msg.warn(
-            title="Missing requirements or requirement conflicts detected. Make sure your Python environment is set up "
-            "correctly and you installed all requirements specified in your project's requirements.txt: "
-        )
-        for pgk_msg in failed_pkgs_msgs + conflicting_pkgs_msgs:
-            msg.text(pgk_msg)
-
-    return len(failed_pkgs_msgs) > 0, len(conflicting_pkgs_msgs) > 0
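The removed `_check_requirements()` above uses `pkg_resources` to validate PEP 508 requirement specifiers. A rough stdlib analogue of its missing-package half (an illustrative sketch, not the project's code, and without the version-conflict handling) can be written with `importlib.metadata`:

```python
from importlib.metadata import PackageNotFoundError, version


# Sketch of the missing-package check: a distribution that cannot be resolved
# is collected, mirroring how pkg_resources.DistributionNotFound is handled
# in _check_requirements() above. Version-conflict detection is omitted.
def find_missing(names):
    missing = []
    for name in names:
        try:
            version(name)
        except PackageNotFoundError:
            missing.append(name)
    return missing


print(find_missing(["definitely-not-a-real-package-xyz"]))
```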
@@ -90,11 +90,12 @@ grad_factor = 1.0
factory = "parser"

[components.parser.model]
-@architectures = "spacy.TransitionBasedParser.v3"
+@architectures = "spacy.TransitionBasedParser.v2"
state_type = "parser"
extra_state_tokens = false
hidden_width = 128
maxout_pieces = 3
+use_upper = false
nO = null

[components.parser.model.tok2vec]

@@ -110,11 +111,12 @@ grad_factor = 1.0
factory = "ner"

[components.ner.model]
-@architectures = "spacy.TransitionBasedParser.v3"
+@architectures = "spacy.TransitionBasedParser.v2"
state_type = "ner"
extra_state_tokens = false
hidden_width = 64
maxout_pieces = 2
+use_upper = false
nO = null

[components.ner.model.tok2vec]

@@ -269,8 +271,9 @@ grad_factor = 1.0
@layers = "reduce_mean.v1"

[components.textcat.model.linear_model]
-@architectures = "spacy.TextCatBOW.v2"
+@architectures = "spacy.TextCatBOW.v3"
exclusive_classes = true
+length = 262144
ngram_size = 1
no_output_layer = false

@@ -306,8 +309,9 @@ grad_factor = 1.0
@layers = "reduce_mean.v1"

[components.textcat_multilabel.model.linear_model]
-@architectures = "spacy.TextCatBOW.v2"
+@architectures = "spacy.TextCatBOW.v3"
exclusive_classes = false
+length = 262144
ngram_size = 1
no_output_layer = false

@@ -383,11 +387,12 @@ width = ${components.tok2vec.model.encode.width}
factory = "parser"

[components.parser.model]
-@architectures = "spacy.TransitionBasedParser.v3"
+@architectures = "spacy.TransitionBasedParser.v2"
state_type = "parser"
extra_state_tokens = false
hidden_width = 128
maxout_pieces = 3
+use_upper = true
nO = null

[components.parser.model.tok2vec]

@@ -400,11 +405,12 @@ width = ${components.tok2vec.model.encode.width}
factory = "ner"

[components.ner.model]
-@architectures = "spacy.TransitionBasedParser.v3"
+@architectures = "spacy.TransitionBasedParser.v2"
state_type = "ner"
extra_state_tokens = false
hidden_width = 64
maxout_pieces = 2
+use_upper = true
nO = null

[components.ner.model.tok2vec]

@@ -538,14 +544,15 @@ nO = null
width = ${components.tok2vec.model.encode.width}

[components.textcat.model.linear_model]
-@architectures = "spacy.TextCatBOW.v2"
+@architectures = "spacy.TextCatBOW.v3"
exclusive_classes = true
+length = 262144
ngram_size = 1
no_output_layer = false

{% else -%}
[components.textcat.model]
-@architectures = "spacy.TextCatBOW.v2"
+@architectures = "spacy.TextCatBOW.v3"
exclusive_classes = true
ngram_size = 1
no_output_layer = false

@@ -566,15 +573,17 @@ nO = null
width = ${components.tok2vec.model.encode.width}

[components.textcat_multilabel.model.linear_model]
-@architectures = "spacy.TextCatBOW.v2"
+@architectures = "spacy.TextCatBOW.v3"
exclusive_classes = false
+length = 262144
ngram_size = 1
no_output_layer = false

{% else -%}
[components.textcat_multilabel.model]
-@architectures = "spacy.TextCatBOW.v2"
+@architectures = "spacy.TextCatBOW.v3"
exclusive_classes = false
+length = 262144
ngram_size = 1
no_output_layer = false
{%- endif %}
@@ -13,7 +13,7 @@ from ._util import (
    Arg,
    Opt,
    app,
-    import_code,
+    import_code_paths,
    parse_config_overrides,
    setup_gpu,
    show_validation_error,

@@ -28,7 +28,7 @@ def train_cli(
    ctx: typer.Context,  # This is only used to read additional arguments
    config_path: Path = Arg(..., help="Path to config file", exists=True, allow_dash=True),
    output_path: Optional[Path] = Opt(None, "--output", "--output-path", "-o", help="Output directory to store trained pipeline in"),
-    code_path: Optional[Path] = Opt(None, "--code", "-c", help="Path to Python file with additional code (registered functions) to be imported"),
+    code_path: str = Opt("", "--code", "-c", help="Comma-separated paths to Python files with additional code (registered functions) to be imported"),
    verbose: bool = Opt(False, "--verbose", "-V", "-VV", help="Display more information for debugging purposes"),
    use_gpu: int = Opt(-1, "--gpu-id", "-g", help="GPU ID or -1 for CPU")
    # fmt: on

@@ -47,9 +47,10 @@ def train_cli(

    DOCS: https://spacy.io/api/cli#train
    """
-    util.logger.setLevel(logging.DEBUG if verbose else logging.INFO)
+    if verbose:
+        util.logger.setLevel(logging.DEBUG)
    overrides = parse_config_overrides(ctx.args)
-    import_code(code_path)
+    import_code_paths(code_path)
    train(config_path, output_path, use_gpu=use_gpu, overrides=overrides)
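In the hunk above, `--code` changes from a single optional path to a comma-separated string handled by `import_code_paths`. A minimal sketch of how such a value might be split before each file is imported (the helper name `split_code_paths` is hypothetical, not spaCy's API):

```python
# Hypothetical helper mirroring the comma-separated --code handling above:
# split on commas and drop empty entries, so "a.py, b.py" yields two paths
# and the new default of "" yields no paths at all.
def split_code_paths(value: str):
    return [p.strip() for p in value.split(",") if p.strip()]


print(split_code_paths("functions.py, extra/components.py"))  # ['functions.py', 'extra/components.py']
print(split_code_paths(""))  # []
```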
@@ -26,6 +26,9 @@ batch_size = 1000
[nlp.tokenizer]
@tokenizers = "spacy.Tokenizer.v1"

+[nlp.vectors]
+@vectors = "spacy.Vectors.v1"
+
# The pipeline components and their models
[components]
@@ -142,7 +142,25 @@ class SpanRenderer:
         spans (list): Individual entity spans and their start, end, label, kb_id and kb_url.
         title (str / None): Document title set in Doc.user_data['title'].
         """
-        per_token_info = []
+        per_token_info = self._assemble_per_token_info(tokens, spans)
+        markup = self._render_markup(per_token_info)
+        markup = TPL_SPANS.format(content=markup, dir=self.direction)
+        if title:
+            markup = TPL_TITLE.format(title=title) + markup
+        return markup
+
+    @staticmethod
+    def _assemble_per_token_info(
+        tokens: List[str], spans: List[Dict[str, Any]]
+    ) -> List[Dict[str, List[Dict[str, Any]]]]:
+        """Assembles token info used to generate markup in render_spans().
+        tokens (List[str]): Tokens in text.
+        spans (List[Dict[str, Any]]): Spans in text.
+        RETURNS (List[Dict[str, List[Dict, str, Any]]]): Per token info needed to render HTML markup for given tokens
+            and spans.
+        """
+        per_token_info: List[Dict[str, List[Dict[str, Any]]]] = []
+
         # we must sort so that we can correctly describe when spans need to "stack"
         # which is determined by their start token, then span length (longer spans on top),
         # then break any remaining ties with the span label
@@ -154,21 +172,22 @@ class SpanRenderer:
                 s["label"],
             ),
         )
 
         for s in spans:
            # this is the vertical 'slot' that the span will be rendered in
            # vertical_position = span_label_offset + (offset_step * (slot - 1))
             s["render_slot"] = 0
 
         for idx, token in enumerate(tokens):
             # Identify if a token belongs to a Span (and which) and if it's a
             # start token of said Span. We'll use this for the final HTML render
             token_markup: Dict[str, Any] = {}
             token_markup["text"] = token
-            concurrent_spans = 0
+            intersecting_spans: List[Dict[str, Any]] = []
             entities = []
             for span in spans:
                 ent = {}
                 if span["start_token"] <= idx < span["end_token"]:
-                    concurrent_spans += 1
                     span_start = idx == span["start_token"]
                     ent["label"] = span["label"]
                     ent["is_start"] = span_start
@@ -176,7 +195,12 @@ class SpanRenderer:
                         # When the span starts, we need to know how many other
                         # spans are on the 'span stack' and will be rendered.
                         # This value becomes the vertical render slot for this entire span
-                        span["render_slot"] = concurrent_spans
+                        span["render_slot"] = (
+                            intersecting_spans[-1]["render_slot"]
+                            if len(intersecting_spans)
+                            else 0
+                        ) + 1
+                    intersecting_spans.append(span)
                     ent["render_slot"] = span["render_slot"]
                     kb_id = span.get("kb_id", "")
                     kb_url = span.get("kb_url", "#")
@@ -193,11 +217,8 @@ class SpanRenderer:
                         span["render_slot"] = 0
             token_markup["entities"] = entities
             per_token_info.append(token_markup)
-        markup = self._render_markup(per_token_info)
-        markup = TPL_SPANS.format(content=markup, dir=self.direction)
-        if title:
-            markup = TPL_TITLE.format(title=title) + markup
-        return markup
+
+        return per_token_info
 
     def _render_markup(self, per_token_info: List[Dict[str, Any]]) -> str:
         """Render the markup from per-token information"""
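The `SpanRenderer` changes above replace a simple counter with an `intersecting_spans` stack: when a span starts, its vertical "render slot" is one above the slot of the deepest span already covering that token. A minimal standalone sketch of that slot assignment (the function name and example spans here are illustrative, not part of the diff):

```python
from typing import Any, Dict, List


def assign_render_slots(spans: List[Dict[str, Any]], n_tokens: int) -> Dict[str, int]:
    """Assign stacking slots the way SpanRenderer sorts and walks spans."""
    # Sort by start token, then longer spans first, then label.
    ordered = sorted(
        spans,
        key=lambda s: (
            s["start_token"],
            s["start_token"] - s["end_token"],
            s["label"],
        ),
    )
    slots: Dict[str, int] = {}
    for idx in range(n_tokens):
        intersecting: List[Dict[str, Any]] = []
        for span in ordered:
            if span["start_token"] <= idx < span["end_token"]:
                if idx == span["start_token"]:
                    # One above the deepest span already covering this token.
                    span["render_slot"] = (
                        intersecting[-1]["render_slot"] if intersecting else 0
                    ) + 1
                    slots[span["label"]] = span["render_slot"]
                intersecting.append(span)
    return slots


spans = [
    {"start_token": 0, "end_token": 3, "label": "ORG"},
    {"start_token": 1, "end_token": 2, "label": "GPE"},
    {"start_token": 4, "end_token": 5, "label": "PERSON"},
]
print(assign_render_slots(spans, 5))  # → {'ORG': 1, 'GPE': 2, 'PERSON': 1}
```

The nested `GPE` span stacks on top of `ORG` (slot 2), while the non-overlapping `PERSON` span restarts at slot 1.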
@@ -313,6 +334,8 @@ class DependencyRenderer:
                 self.lang = settings.get("lang", DEFAULT_LANG)
             render_id = f"{id_prefix}-{i}"
             svg = self.render_svg(render_id, p["words"], p["arcs"])
+            if p.get("title"):
+                svg = TPL_TITLE.format(title=p.get("title")) + svg
             rendered.append(svg)
         if page:
             content = "".join([TPL_FIGURE.format(content=svg) for svg in rendered])
@@ -565,7 +588,7 @@ class EntityRenderer:
         for i, fragment in enumerate(fragments):
             markup += escape_html(fragment)
             if len(fragments) > 1 and i != len(fragments) - 1:
-                markup += "</br>"
+                markup += "<br>"
         if self.ents is None or label.upper() in self.ents:
             color = self.colors.get(label.upper(), self.default_color)
             ent_settings = {
@@ -583,7 +606,7 @@ class EntityRenderer:
         for i, fragment in enumerate(fragments):
             markup += escape_html(fragment)
             if len(fragments) > 1 and i != len(fragments) - 1:
-                markup += "</br>"
+                markup += "<br>"
         markup = TPL_ENTS.format(content=markup, dir=self.direction)
         if title:
             markup = TPL_TITLE.format(title=title) + markup
@@ -1,6 +1,8 @@
 import warnings
 from typing import Literal
 
+from . import about
+
 
 class ErrorsWithCodes(type):
     def __getattribute__(self, code):
@@ -103,13 +105,14 @@ class Warnings(metaclass=ErrorsWithCodes):
             "table. This may degrade the performance of the model to some "
             "degree. If this is intentional or the language you're using "
             "doesn't have a normalization table, please ignore this warning. "
-            "If this is surprising, make sure you have the spacy-lookups-data "
-            "package installed and load the table in your config. The "
-            "languages with lexeme normalization tables are currently: "
-            "{langs}\n\nLoad the table in your config with:\n\n"
+            "If this is surprising, make sure you are loading the table in "
+            "your config. The languages with lexeme normalization tables are "
+            "currently: {langs}\n\nAn example of how to load a table in "
+            "your config :\n\n"
             "[initialize.lookups]\n"
-            "@misc = \"spacy.LookupsDataLoader.v1\"\n"
+            "@misc = \"spacy.LookupsDataLoaderFromURL.v1\"\n"
             "lang = ${{nlp.lang}}\n"
+            f'url = "{about.__lookups_url__}"\n'
             "tables = [\"lexeme_norm\"]\n")
     W035 = ("Discarding subpattern '{pattern}' due to an unrecognized "
             "attribute or operator.")
@@ -211,9 +214,9 @@ class Warnings(metaclass=ErrorsWithCodes):
     W125 = ("The StaticVectors key_attr is no longer used. To set a custom "
             "key attribute for vectors, configure it through Vectors(attr=) or "
             "'spacy init vectors --attr'")
+    W126 = ("These keys are unsupported: {unsupported}")
 
     # v4 warning strings
-    W400 = ("`use_upper=False` is ignored, the upper layer is always enabled")
     W401 = ("`incl_prior is True`, but the selected knowledge base type {kb_type} doesn't support prior probability "
             "lookups so this setting will be ignored. If your KB does support prior probability lookups, make sure "
             "to return `True` in `.supports_prior_probs`.")
@@ -224,7 +227,6 @@ class Errors(metaclass=ErrorsWithCodes):
    E002 = ("Can't find factory for '{name}' for language {lang} ({lang_code}). "
            "This usually happens when spaCy calls `nlp.{method}` with a custom "
            "component name that's not registered on the current language class. "
-           "If you're using a Transformer, make sure to install 'spacy-transformers'. "
            "If you're using a custom component, make sure you've added the "
            "decorator `@Language.component` (for function components) or "
            "`@Language.factory` (for class components).\n\nAvailable "
@@ -549,12 +551,12 @@ class Errors(metaclass=ErrorsWithCodes):
             "during training, make sure to include it in 'annotating components'")
 
     # New errors added in v3.x
+    E849 = ("The vocab only supports {method} for vectors of type "
+            "spacy.vectors.Vectors, not {vectors_type}.")
     E850 = ("The PretrainVectors objective currently only supports default or "
             "floret vectors, not {mode} vectors.")
     E851 = ("The 'textcat' component labels should only have values of 0 or 1, "
             "but found value of '{val}'.")
-    E852 = ("The tar file pulled from the remote attempted an unsafe path "
-            "traversal.")
     E853 = ("Unsupported component factory name '{name}'. The character '.' is "
             "not permitted in factory names.")
     E854 = ("Unable to set doc.ents. Check that the 'ents_filter' does not "
@@ -967,6 +969,12 @@ class Errors(metaclass=ErrorsWithCodes):
             " 'min_length': {min_length}, 'max_length': {max_length}")
     E1054 = ("The text, including whitespace, must match between reference and "
             "predicted docs when training {component}.")
+    E1055 = ("The 'replace_listener' callback expects {num_params} parameters, "
+            "but only callbacks with one or three parameters are supported")
+    E1056 = ("The `TextCatBOW` architecture expects a length of at least 1, was {length}.")
+    E1057 = ("The `TextCatReduce` architecture must be used with at least one "
+            "reduction. Please enable one of `use_reduce_first`, "
+            "`use_reduce_last`, `use_reduce_max` or `use_reduce_mean`.")
 
     # v4 error strings
     E4000 = ("Expected a Doc as input, but got: '{type}'")
@@ -982,6 +990,18 @@ class Errors(metaclass=ErrorsWithCodes):
             "{existing_value}.")
     E4008 = ("Span {pos}_char {value} does not correspond to a token {pos}.")
     E4009 = ("The '{attr}' parameter should be 'None' or 'True', but found '{value}'.")
+    E4010 = ("Required lemmatizer table(s) {missing_tables} not found in "
+             "[initialize] or in registered lookups (spacy-lookups-data). An "
+             "example for how to load lemmatizer tables in [initialize]:\n\n"
+             "[initialize.components]\n\n"
+             "[initialize.components.{pipe_name}]\n\n"
+             "[initialize.components.{pipe_name}.lookups]\n"
+             '@misc = "spacy.LookupsDataLoaderFromURL.v1"\n'
+             "lang = ${{nlp.lang}}\n"
+             f'url = "{about.__lookups_url__}"\n'
+             "tables = {tables}\n"
+             "# or required tables only: tables = {required_tables}\n")
+    E4011 = ("Server error ({status_code}), couldn't fetch {url}")
 
 
 RENAMED_LANGUAGE_CODES = {"xx": "mul", "is": "isl"}
@@ -2,4 +2,9 @@ from .candidate import Candidate, InMemoryCandidate
 from .kb import KnowledgeBase
 from .kb_in_memory import InMemoryLookupKB
 
-__all__ = ["KnowledgeBase", "InMemoryLookupKB", "Candidate", "InMemoryCandidate"]
+__all__ = [
+    "Candidate",
+    "KnowledgeBase",
+    "InMemoryCandidate",
+    "InMemoryLookupKB",
+]
@@ -1,4 +1,4 @@
-# cython: infer_types=True, profile=True
+# cython: infer_types=True
 
 from .kb_in_memory cimport InMemoryLookupKB
@@ -1,4 +1,4 @@
-# cython: infer_types=True, profile=True
+# cython: infer_types=True
 
 from pathlib import Path
 from typing import Iterable, Iterator, Tuple, Union
@@ -1,4 +1,4 @@
-# cython: infer_types=True, profile=True
+# cython: infer_types=True
 from typing import Any, Callable, Dict, Iterable, Iterator
 
 import srsly
@@ -6,7 +6,8 @@ _num_words = [
     "nine", "ten", "eleven", "twelve", "thirteen", "fourteen", "fifteen",
     "sixteen", "seventeen", "eighteen", "nineteen", "twenty", "thirty", "forty",
     "fifty", "sixty", "seventy", "eighty", "ninety", "hundred", "thousand",
-    "million", "billion", "trillion", "quadrillion", "gajillion", "bazillion"
+    "million", "billion", "trillion", "quadrillion", "quintillion", "sextillion",
+    "septillion", "octillion", "nonillion", "decillion", "gajillion", "bazillion"
 ]
 _ordinal_words = [
     "first", "second", "third", "fourth", "fifth", "sixth", "seventh", "eighth",
@@ -14,7 +15,8 @@ _ordinal_words = [
     "fifteenth", "sixteenth", "seventeenth", "eighteenth", "nineteenth",
     "twentieth", "thirtieth", "fortieth", "fiftieth", "sixtieth", "seventieth",
     "eightieth", "ninetieth", "hundredth", "thousandth", "millionth", "billionth",
-    "trillionth", "quadrillionth", "gajillionth", "bazillionth"
+    "trillionth", "quadrillionth", "quintillionth", "sextillionth", "septillionth",
+    "octillionth", "nonillionth", "decillionth", "gajillionth", "bazillionth"
 ]
 # fmt: on
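The extended `_num_words` list above feeds the English `like_num` lexical attribute. A trimmed-down sketch of how such a word list is typically used (this simplified `like_num` is an illustration, not the exact spaCy implementation, which also handles fractions):

```python
_num_words = [
    "zero", "one", "two", "three", "ten", "hundred", "thousand",
    "million", "billion", "trillion", "quadrillion", "quintillion",
    "sextillion", "septillion", "octillion", "nonillion", "decillion",
]


def like_num(text: str) -> bool:
    # Strip common digit separators, then check for digits or a number word.
    text = text.replace(",", "").replace(".", "")
    if text.isdigit():
        return True
    return text.lower() in _num_words


print(like_num("Quintillion"), like_num("10,000"), like_num("fish"))
# → True True False
```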
@@ -163,7 +163,7 @@ class SpanishLemmatizer(Lemmatizer):
         for old, new in self.lookups.get_table("lemma_rules").get("det", []):
             if word == old:
                 return [new]
-        # If none of the specfic rules apply, search in the common rules for
+        # If none of the specific rules apply, search in the common rules for
         # determiners and pronouns that follow a unique pattern for
         # lemmatization. If the word is in the list, return the corresponding
         # lemma.
@@ -291,7 +291,7 @@ class SpanishLemmatizer(Lemmatizer):
         for old, new in self.lookups.get_table("lemma_rules").get("pron", []):
             if word == old:
                 return [new]
-        # If none of the specfic rules apply, search in the common rules for
+        # If none of the specific rules apply, search in the common rules for
         # determiners and pronouns that follow a unique pattern for
         # lemmatization. If the word is in the list, return the corresponding
         # lemma.
18 spacy/lang/fo/__init__.py Normal file
@@ -0,0 +1,18 @@
+from ...language import BaseDefaults, Language
+from ..punctuation import TOKENIZER_INFIXES, TOKENIZER_PREFIXES, TOKENIZER_SUFFIXES
+from .tokenizer_exceptions import TOKENIZER_EXCEPTIONS
+
+
+class FaroeseDefaults(BaseDefaults):
+    tokenizer_exceptions = TOKENIZER_EXCEPTIONS
+    infixes = TOKENIZER_INFIXES
+    suffixes = TOKENIZER_SUFFIXES
+    prefixes = TOKENIZER_PREFIXES
+
+
+class Faroese(Language):
+    lang = "fo"
+    Defaults = FaroeseDefaults
+
+
+__all__ = ["Faroese"]
90 spacy/lang/fo/tokenizer_exceptions.py Normal file
@@ -0,0 +1,90 @@
+from ...symbols import ORTH
+from ...util import update_exc
+from ..tokenizer_exceptions import BASE_EXCEPTIONS
+
+_exc = {}
+
+for orth in [
+    "apr.",
+    "aug.",
+    "avgr.",
+    "árg.",
+    "ávís.",
+    "beinl.",
+    "blkv.",
+    "blaðkv.",
+    "blm.",
+    "blaðm.",
+    "bls.",
+    "blstj.",
+    "blaðstj.",
+    "des.",
+    "eint.",
+    "febr.",
+    "fyrrv.",
+    "góðk.",
+    "h.m.",
+    "innt.",
+    "jan.",
+    "kl.",
+    "m.a.",
+    "mðr.",
+    "mió.",
+    "nr.",
+    "nto.",
+    "nov.",
+    "nút.",
+    "o.a.",
+    "o.a.m.",
+    "o.a.tíl.",
+    "o.fl.",
+    "ff.",
+    "o.m.a.",
+    "o.o.",
+    "o.s.fr.",
+    "o.tíl.",
+    "o.ø.",
+    "okt.",
+    "omf.",
+    "pst.",
+    "ritstj.",
+    "sbr.",
+    "sms.",
+    "smst.",
+    "smb.",
+    "sb.",
+    "sbrt.",
+    "sp.",
+    "sept.",
+    "spf.",
+    "spsk.",
+    "t.e.",
+    "t.s.",
+    "t.s.s.",
+    "tlf.",
+    "tel.",
+    "tsk.",
+    "t.o.v.",
+    "t.d.",
+    "uml.",
+    "ums.",
+    "uppl.",
+    "upprfr.",
+    "uppr.",
+    "útg.",
+    "útl.",
+    "útr.",
+    "vanl.",
+    "v.",
+    "v.h.",
+    "v.ø.o.",
+    "viðm.",
+    "viðv.",
+    "vm.",
+    "v.m.",
+]:
+    _exc[orth] = [{ORTH: orth}]
+    capitalized = orth.capitalize()
+    _exc[capitalized] = [{ORTH: capitalized}]
+
+TOKENIZER_EXCEPTIONS = update_exc(BASE_EXCEPTIONS, _exc)
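The Faroese exception file above registers a capitalized variant of every abbreviation before merging with the shared base exceptions. A self-contained sketch of that pattern (`ORTH` is a plain string here, standing in for `spacy.symbols.ORTH`, and `update_exc` is a simplified stand-in for `spacy.util.update_exc`):

```python
ORTH = "ORTH"  # stand-in for spacy.symbols.ORTH


def update_exc(base, *addl):
    # Simplified stand-in for spacy.util.update_exc: later dicts win.
    exc = dict(base)
    for extra in addl:
        exc.update(extra)
    return exc


BASE_EXCEPTIONS = {":)": [{ORTH: ":)"}]}

_exc = {}
for orth in ["t.d.", "o.s.fr.", "kl."]:
    _exc[orth] = [{ORTH: orth}]
    capitalized = orth.capitalize()
    _exc[capitalized] = [{ORTH: capitalized}]

TOKENIZER_EXCEPTIONS = update_exc(BASE_EXCEPTIONS, _exc)
print(sorted(TOKENIZER_EXCEPTIONS))
# → [':)', 'Kl.', 'O.s.fr.', 'T.d.', 'kl.', 'o.s.fr.', 't.d.']
```

Each entry maps the surface form to a list of token attribute dicts, so "T.d." at the start of a sentence is kept as one token just like "t.d." mid-sentence.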
@@ -15,6 +15,7 @@ _prefixes = (
     [
         "†",
         "⸏",
+        "〈",
     ]
     + LIST_PUNCT
     + LIST_ELLIPSES
@@ -31,6 +32,7 @@ _suffixes = (
     + [
         "†",
         "⸎",
+        "〉",
         r"(?<=[\u1F00-\u1FFF\u0370-\u03FF])[\-\.⸏]",
     ]
 )
20 spacy/lang/nn/__init__.py Normal file
@@ -0,0 +1,20 @@
+from ...language import BaseDefaults, Language
+from ..nb import SYNTAX_ITERATORS
+from .punctuation import TOKENIZER_INFIXES, TOKENIZER_PREFIXES, TOKENIZER_SUFFIXES
+from .tokenizer_exceptions import TOKENIZER_EXCEPTIONS
+
+
+class NorwegianNynorskDefaults(BaseDefaults):
+    tokenizer_exceptions = TOKENIZER_EXCEPTIONS
+    prefixes = TOKENIZER_PREFIXES
+    infixes = TOKENIZER_INFIXES
+    suffixes = TOKENIZER_SUFFIXES
+    syntax_iterators = SYNTAX_ITERATORS
+
+
+class NorwegianNynorsk(Language):
+    lang = "nn"
+    Defaults = NorwegianNynorskDefaults
+
+
+__all__ = ["NorwegianNynorsk"]
15 spacy/lang/nn/examples.py Normal file
@@ -0,0 +1,15 @@
+"""
+Example sentences to test spaCy and its language models.
+
+>>> from spacy.lang.nn.examples import sentences
+>>> docs = nlp.pipe(sentences)
+"""
+
+
+# sentences taken from Omsetjingsminne frå Nynorsk pressekontor 2022 (https://www.nb.no/sprakbanken/en/resource-catalogue/oai-nb-no-sbr-80/)
+sentences = [
+    "Konseptet går ut på at alle tre omgangar tel, alle hopparar må stille i kvalifiseringa og poengsummen skal telje.",
+    "Det er ein meir enn i same periode i fjor.",
+    "Det har lava ned enorme snømengder i store delar av Europa den siste tida.",
+    "Akhtar Chaudhry er ikkje innstilt på Oslo-lista til SV, men utfordrar Heikki Holmås om førsteplassen.",
+]
74 spacy/lang/nn/punctuation.py Normal file
@@ -0,0 +1,74 @@
+from ..char_classes import (
+    ALPHA,
+    ALPHA_LOWER,
+    ALPHA_UPPER,
+    CONCAT_QUOTES,
+    CURRENCY,
+    LIST_CURRENCY,
+    LIST_ELLIPSES,
+    LIST_ICONS,
+    LIST_PUNCT,
+    LIST_QUOTES,
+    PUNCT,
+    UNITS,
+)
+from ..punctuation import TOKENIZER_SUFFIXES
+
+_quotes = CONCAT_QUOTES.replace("'", "")
+_list_punct = [x for x in LIST_PUNCT if x != "#"]
+_list_icons = [x for x in LIST_ICONS if x != "°"]
+_list_icons = [x.replace("\\u00B0", "") for x in _list_icons]
+_list_quotes = [x for x in LIST_QUOTES if x != "\\'"]
+
+
+_prefixes = (
+    ["§", "%", "=", "—", "–", r"\+(?![0-9])"]
+    + _list_punct
+    + LIST_ELLIPSES
+    + LIST_QUOTES
+    + LIST_CURRENCY
+    + LIST_ICONS
+)
+
+
+_infixes = (
+    LIST_ELLIPSES
+    + _list_icons
+    + [
+        r"(?<=[{al}])\.(?=[{au}])".format(al=ALPHA_LOWER, au=ALPHA_UPPER),
+        r"(?<=[{a}])[,!?](?=[{a}])".format(a=ALPHA),
+        r"(?<=[{a}])[:<>=/](?=[{a}])".format(a=ALPHA),
+        r"(?<=[{a}]),(?=[{a}])".format(a=ALPHA),
+        r"(?<=[{a}])([{q}\)\]\(\[])(?=[{a}])".format(a=ALPHA, q=_quotes),
+        r"(?<=[{a}])--(?=[{a}])".format(a=ALPHA),
+    ]
+)
+
+_suffixes = (
+    LIST_PUNCT
+    + LIST_ELLIPSES
+    + _list_quotes
+    + _list_icons
+    + ["—", "–"]
+    + [
+        r"(?<=[0-9])\+",
+        r"(?<=°[FfCcKk])\.",
+        r"(?<=[0-9])(?:{c})".format(c=CURRENCY),
+        r"(?<=[0-9])(?:{u})".format(u=UNITS),
+        r"(?<=[{al}{e}{p}(?:{q})])\.".format(
+            al=ALPHA_LOWER, e=r"%²\-\+", q=_quotes, p=PUNCT
+        ),
+        r"(?<=[{au}][{au}])\.".format(au=ALPHA_UPPER),
+    ]
+    + [r"(?<=[^sSxXzZ])'"]
+)
+_suffixes += [
+    suffix
+    for suffix in TOKENIZER_SUFFIXES
+    if suffix not in ["'s", "'S", "’s", "’S", r"\'"]
+]
+
+
+TOKENIZER_PREFIXES = _prefixes
+TOKENIZER_INFIXES = _infixes
+TOKENIZER_SUFFIXES = _suffixes
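The Nynorsk suffix rules above rely on lookbehind assertions, such as the temperature-degree rule and the digits-plus rule. A quick check of two of those patterns with plain `re`, treating each suffix pattern in isolation and anchoring it at the end of a candidate token:

```python
import re

# Suffix patterns from the list above: split a trailing "." after °F/°C/°K,
# and a trailing "+" after digits.
degree_suffix = re.compile(r"(?<=°[FfCcKk])\.$")
plus_suffix = re.compile(r"(?<=[0-9])\+$")

print(bool(degree_suffix.search("20°C.")))  # → True
print(bool(degree_suffix.search("20°C")))   # → False
print(bool(plus_suffix.search("18+")))      # → True
```

In the real tokenizer these patterns are compiled together via spaCy's suffix-regex machinery rather than applied one by one; the isolated anchoring here is only for illustration.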
228 spacy/lang/nn/tokenizer_exceptions.py Normal file
@@ -0,0 +1,228 @@
+from ...symbols import NORM, ORTH
+from ...util import update_exc
+from ..tokenizer_exceptions import BASE_EXCEPTIONS
+
+_exc = {}
+
+
+for exc_data in [
+    {ORTH: "jan.", NORM: "januar"},
+    {ORTH: "feb.", NORM: "februar"},
+    {ORTH: "mar.", NORM: "mars"},
+    {ORTH: "apr.", NORM: "april"},
+    {ORTH: "jun.", NORM: "juni"},
+    # note: "jul." is in the simple list below without a NORM exception
+    {ORTH: "aug.", NORM: "august"},
+    {ORTH: "sep.", NORM: "september"},
+    {ORTH: "okt.", NORM: "oktober"},
+    {ORTH: "nov.", NORM: "november"},
+    {ORTH: "des.", NORM: "desember"},
+]:
+    _exc[exc_data[ORTH]] = [exc_data]
+
+
+for orth in [
+    "Ap.",
+    "Aq.",
+    "Ca.",
+    "Chr.",
+    "Co.",
+    "Dr.",
+    "F.eks.",
+    "Fr.p.",
+    "Frp.",
+    "Grl.",
+    "Kr.",
+    "Kr.F.",
+    "Kr.F.s",
+    "Mr.",
+    "Mrs.",
+    "Pb.",
+    "Pr.",
+    "Sp.",
+    "St.",
+    "a.m.",
+    "ad.",
+    "adm.dir.",
+    "adr.",
+    "b.c.",
+    "bl.a.",
+    "bla.",
+    "bm.",
+    "bnr.",
+    "bto.",
+    "c.c.",
+    "ca.",
+    "cand.mag.",
+    "co.",
+    "d.d.",
+    "d.m.",
+    "d.y.",
+    "dept.",
+    "dr.",
+    "dr.med.",
+    "dr.philos.",
+    "dr.psychol.",
+    "dss.",
+    "dvs.",
+    "e.Kr.",
+    "e.l.",
+    "eg.",
+    "eig.",
+    "ekskl.",
+    "el.",
+    "et.",
+    "etc.",
+    "etg.",
+    "ev.",
+    "evt.",
+    "f.",
+    "f.Kr.",
+    "f.eks.",
+    "f.o.m.",
+    "fhv.",
+    "fk.",
+    "foreg.",
+    "fork.",
+    "fv.",
+    "fvt.",
+    "g.",
+    "gl.",
+    "gno.",
+    "gnr.",
+    "grl.",
+    "gt.",
+    "h.r.adv.",
+    "hhv.",
+    "hoh.",
+    "hr.",
+    "ifb.",
+    "ifm.",
+    "iht.",
+    "inkl.",
+    "istf.",
+    "jf.",
+    "jr.",
+    "jul.",
+    "juris.",
+    "kfr.",
+    "kgl.",
+    "kgl.res.",
+    "kl.",
+    "komm.",
+    "kr.",
+    "kst.",
+    "lat.",
+    "lø.",
+    "m.a.",
+    "m.a.o.",
+    "m.fl.",
+    "m.m.",
+    "m.v.",
+    "ma.",
+    "mag.art.",
+    "md.",
+    "mfl.",
+    "mht.",
+    "mill.",
+    "min.",
+    "mnd.",
+    "moh.",
+    "mrd.",
+    "muh.",
+    "mv.",
+    "mva.",
+    "n.å.",
+    "ndf.",
+    "nr.",
+    "nto.",
+    "nyno.",
+    "o.a.",
+    "o.l.",
+    "obl.",
+    "off.",
+    "ofl.",
+    "on.",
+    "op.",
+    "org.",
+    "osv.",
+    "ovf.",
+    "p.",
+    "p.a.",
+    "p.g.a.",
+    "p.m.",
+    "p.t.",
+    "pga.",
+    "ph.d.",
+    "pkt.",
+    "pr.",
+    "pst.",
+    "pt.",
+    "red.anm.",
+    "ref.",
+    "res.",
+    "res.kap.",
+    "resp.",
+    "rv.",
+    "s.",
+    "s.d.",
+    "s.k.",
+    "s.u.",
+    "s.å.",
+    "sen.",
+    "sep.",
+    "siviling.",
+    "sms.",
+    "snr.",
+    "spm.",
+    "sr.",
+    "sst.",
+    "st.",
+    "st.meld.",
+    "st.prp.",
+    "stip.",
+    "stk.",
+    "stud.",
+    "sv.",
+    "såk.",
+    "sø.",
+    "t.d.",
+    "t.h.",
+    "t.o.m.",
+    "t.v.",
+    "temp.",
+    "ti.",
+    "tils.",
+    "tilsv.",
+    "tl;dr",
+    "tlf.",
+    "to.",
+    "ult.",
+    "utg.",
+    "v.",
+    "vedk.",
+    "vedr.",
+    "vg.",
+    "vgs.",
+    "vha.",
+    "vit.ass.",
+    "vn.",
+    "vol.",
+    "vs.",
+    "vsa.",
+    "§§",
+    "©NTB",
+    "årg.",
+    "årh.",
+]:
+    _exc[orth] = [{ORTH: orth}]
+
+# Dates
+for h in range(1, 31 + 1):
+    for period in ["."]:
+        _exc[f"{h}{period}"] = [{ORTH: f"{h}."}]
+
+_custom_base_exc = {"i.": [{ORTH: "i", NORM: "i"}, {ORTH: "."}]}
+_exc.update(_custom_base_exc)
+
+TOKENIZER_EXCEPTIONS = update_exc(BASE_EXCEPTIONS, _exc)
@@ -15,4 +15,7 @@ sentences = [
     "Türkiye'nin başkenti neresi?",
     "Bakanlar Kurulu 180 günlük eylem planını açıkladı.",
     "Merkez Bankası, beklentiler doğrultusunda faizlerde değişikliğe gitmedi.",
+    "Cemal Sureya kimdir?",
+    "Bunlari Biliyor muydunuz?",
+    "Altinoluk Turkiye haritasinin neresinde yer alir?",
 ]
@@ -31,7 +31,7 @@ segmenter = "char"
 [initialize]
 
 [initialize.tokenizer]
-pkuseg_model = null
+pkuseg_model = "spacy_ontonotes"
 pkuseg_user_dict = "default"
 """
@@ -1,4 +1,5 @@
import functools
import inspect
import itertools
import multiprocessing as mp
import random
@@ -64,6 +65,7 @@ from .util import (
    registry,
    warn_if_jupyter_cupy,
)
from .vectors import BaseVectors
from .vocab import Vocab, create_vocab

PipeCallable = Callable[[Doc], Doc]
@@ -128,13 +130,6 @@ def create_tokenizer() -> Callable[["Language"], Tokenizer]:
    return tokenizer_factory


@registry.misc("spacy.LookupsDataLoader.v1")
def load_lookups_data(lang, tables):
    util.logger.debug("Loading lookups from spacy-lookups-data: %s", tables)
    lookups = load_lookups(lang=lang, tables=tables)
    return lookups


class Language:
    """A text-processing pipeline. Usually you'll load this once per process,
    and pass the instance around your application.
@@ -160,6 +155,7 @@ class Language:
        max_length: int = 10**6,
        meta: Dict[str, Any] = {},
        create_tokenizer: Optional[Callable[["Language"], Callable[[str], Doc]]] = None,
        create_vectors: Optional[Callable[["Vocab"], BaseVectors]] = None,
        batch_size: int = 1000,
        **kwargs,
    ) -> None:
@@ -199,6 +195,10 @@ class Language:
            raise ValueError(Errors.E918.format(vocab=vocab, vocab_type=type(Vocab)))
        if vocab is True:
            vocab = create_vocab(self.lang, self.Defaults)
            if not create_vectors:
                vectors_cfg = {"vectors": self._config["nlp"]["vectors"]}
                create_vectors = registry.resolve(vectors_cfg)["vectors"]
            vocab.vectors = create_vectors(vocab)
        else:
            if (self.lang and vocab.lang) and (self.lang != vocab.lang):
                raise ValueError(Errors.E150.format(nlp=self.lang, vocab=vocab.lang))
@@ -1797,6 +1797,12 @@ class Language:
        for proc in procs:
            proc.start()

        # Close writing-end of channels. This is needed to avoid that reading
        # from the channel blocks indefinitely when the worker closes the
        # channel.
        for tx in bytedocs_send_ch:
            tx.close()

        # Cycle channels not to break the order of docs.
        # The received object is a batch of byte-encoded docs, so flatten them with chain.from_iterable.
        byte_tuples = chain.from_iterable(
@@ -1819,8 +1825,23 @@ class Language:
                # tell `sender` that one batch was consumed.
                sender.step()
        finally:
            # If we are stopping in an orderly fashion, the workers' queues
            # are empty. Put the sentinel in their queues to signal that work
            # is done, so that they can exit gracefully.
            for q in texts_q:
                q.put(_WORK_DONE_SENTINEL)

            # Otherwise, we are stopping because the error handler raised an
            # exception. The sentinel will be last to go out of the queue.
            # To avoid doing unnecessary work or hanging on platforms that
            # block on sending (Windows), we'll close our end of the channel.
            # This signals to the worker that it can exit the next time it
            # attempts to send data down the channel.
            for r in bytedocs_recv_ch:
                r.close()

            for proc in procs:
                proc.terminate()
                proc.join()

    def _link_components(self) -> None:
        """Register 'listeners' within pipeline components, to allow them to
@@ -1885,6 +1906,10 @@ class Language:
        ).merge(config)
        if "nlp" not in config:
            raise ValueError(Errors.E985.format(config=config))
        # fill in [nlp.vectors] if not present (as a narrower alternative to
        # auto-filling [nlp] from the default config)
        if "vectors" not in config["nlp"]:
            config["nlp"]["vectors"] = {"@vectors": "spacy.Vectors.v1"}
        config_lang = config["nlp"].get("lang")
        if config_lang is not None and config_lang != cls.lang:
            raise ValueError(
@@ -1920,6 +1945,7 @@ class Language:
            filled["nlp"], validate=validate, schema=ConfigSchemaNlp
        )
        create_tokenizer = resolved_nlp["tokenizer"]
        create_vectors = resolved_nlp["vectors"]
        before_creation = resolved_nlp["before_creation"]
        after_creation = resolved_nlp["after_creation"]
        after_pipeline_creation = resolved_nlp["after_pipeline_creation"]
@@ -1940,7 +1966,12 @@ class Language:
        # inside stuff like the spacy train function. If we loaded them here,
        # then we would load them twice at runtime: once when we make from config,
        # and then again when we load from disk.
        nlp = lang_cls(vocab=vocab, create_tokenizer=create_tokenizer, meta=meta)
        nlp = lang_cls(
            vocab=vocab,
            create_tokenizer=create_tokenizer,
            create_vectors=create_vectors,
            meta=meta,
        )
        if after_creation is not None:
            nlp = after_creation(nlp)
        if not isinstance(nlp, cls):
@@ -2157,8 +2188,20 @@ class Language:
        # Go over the listener layers and replace them
        for listener in pipe_listeners:
            new_model = tok2vec_model.copy()
            if "replace_listener" in tok2vec_model.attrs:
                new_model = tok2vec_model.attrs["replace_listener"](new_model)
            replace_listener_func = tok2vec_model.attrs.get("replace_listener")
            if replace_listener_func is not None:
                # Pass the extra args to the callback without breaking compatibility with
                # old library versions that only expect a single parameter.
                num_params = len(
                    inspect.signature(replace_listener_func).parameters
                )
                if num_params == 1:
                    new_model = replace_listener_func(new_model)
                elif num_params == 3:
                    new_model = replace_listener_func(new_model, listener, tok2vec)
                else:
                    raise ValueError(Errors.E1055.format(num_params=num_params))

            util.replace_model_node(pipe.model, listener, new_model)  # type: ignore[attr-defined]
            tok2vec.remove_listener(listener, pipe_name)
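The `replace_listener` change above keeps old single-parameter callbacks working by counting their parameters with `inspect.signature`. A minimal standalone sketch of that dispatch pattern (the helper and callbacks here are illustrative, not spaCy API):

```python
import inspect

def dispatch_by_arity(callback, model, listener=None, tok2vec=None):
    # Call `callback` with one or three arguments, depending on how many
    # parameters its signature declares.
    num_params = len(inspect.signature(callback).parameters)
    if num_params == 1:
        return callback(model)
    elif num_params == 3:
        return callback(model, listener, tok2vec)
    raise ValueError(f"unsupported callback arity: {num_params}")

# Old-style callbacks accept one argument, new-style accept three.
old_style = lambda model: ("v1", model)
new_style = lambda model, listener, tok2vec: ("v2", model, listener, tok2vec)
```

Both styles can then be invoked through the same call site, which is why the diff inspects the signature rather than requiring the new three-argument form.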
@@ -2418,6 +2461,11 @@ def _apply_pipes(
    while True:
        try:
            texts_with_ctx = receiver.get()

            # Stop working if we encounter the end-of-work sentinel.
            if isinstance(texts_with_ctx, _WorkDoneSentinel):
                return

            docs = (
                ensure_doc(doc_like, context) for doc_like, context in texts_with_ctx
            )
@@ -2426,11 +2474,21 @@ def _apply_pipes(
            # Connection does not accept unpickable objects, so send list.
            byte_docs = [(doc.to_bytes(), doc._context, None) for doc in docs]
            padding = [(None, None, None)] * (len(texts_with_ctx) - len(byte_docs))
            sender.send(byte_docs + padding)  # type: ignore[operator]
            data: Sequence[Tuple[Optional[bytes], Optional[Any], Optional[bytes]]] = (
                byte_docs + padding  # type: ignore[operator]
            )
        except Exception:
            error_msg = [(None, None, srsly.msgpack_dumps(traceback.format_exc()))]
            padding = [(None, None, None)] * (len(texts_with_ctx) - 1)
            sender.send(error_msg + padding)
            data = error_msg + padding

        try:
            sender.send(data)
        except BrokenPipeError:
            # Parent has closed the pipe prematurely. This happens when a
            # worker encounters an error and the error handler is set to
            # stop processing.
            return
|
class _Sender:
|
||||||
|
@ -2460,3 +2518,10 @@ class _Sender:
|
||||||
if self.count >= self.chunk_size:
|
if self.count >= self.chunk_size:
|
||||||
self.count = 0
|
self.count = 0
|
||||||
self.send()
|
self.send()
|
||||||
|
|
||||||
|
|
||||||
|
class _WorkDoneSentinel:
|
||||||
|
pass
|
||||||
|
|
||||||
|
|
||||||
|
_WORK_DONE_SENTINEL = _WorkDoneSentinel()
|
||||||
|
|
|
@@ -1,4 +1,5 @@
# cython: embedsignature=True
# cython: profile=False
# Compiler crashes on memory view coercion without this. Should report bug.
cimport numpy as np
from libc.string cimport memset
@@ -2,16 +2,40 @@ from collections import OrderedDict
from pathlib import Path
from typing import Any, Dict, List, Optional, Union

import requests
import srsly
from preshed.bloom import BloomFilter

from .errors import Errors
from .strings import get_string_id
from .util import SimpleFrozenDict, ensure_path, load_language_data, registry
from .util import SimpleFrozenDict, ensure_path, load_language_data, logger, registry

UNSET = object()


@registry.misc("spacy.LookupsDataLoader.v1")
def load_lookups_data(lang, tables):
    logger.debug(f"Loading lookups from spacy-lookups-data: {tables}")
    lookups = load_lookups(lang=lang, tables=tables)
    return lookups


@registry.misc("spacy.LookupsDataLoaderFromURL.v1")
def load_lookups_data_from_url(lang, tables, url):
    logger.debug(f"Loading lookups from {url}: {tables}")
    lookups = Lookups()
    for table in tables:
        table_url = url + lang + "_" + table + ".json"
        r = requests.get(table_url)
        if r.status_code != 200:
            raise ValueError(
                Errors.E4011.format(status_code=r.status_code, url=table_url)
            )
        table_data = r.json()
        lookups.add_table(table, table_data)
    return lookups


def load_lookups(lang: str, tables: List[str], strict: bool = True) -> "Lookups":
    """Load the data from the spacy-lookups-data package for a given language,
    if available. Returns an empty `Lookups` container if there's no data or if the package
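A quick way to see what `load_lookups_data_from_url` above will fetch is to reproduce its URL scheme, `url + lang + "_" + table + ".json"`, in isolation (the base URL and table names below are made-up placeholders, not real spacy-lookups-data endpoints):

```python
def lookup_table_urls(lang, tables, url):
    # Hypothetical helper: mirrors the "<url><lang>_<table>.json" scheme.
    return [url + lang + "_" + table + ".json" for table in tables]

# Illustrative inputs only.
urls = lookup_table_urls("en", ["lemma_lookup", "lemma_exc"], "https://example.com/data/")
```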
@@ -3,4 +3,4 @@ from .levenshtein import levenshtein
from .matcher import Matcher
from .phrasematcher import PhraseMatcher

__all__ = ["Matcher", "PhraseMatcher", "DependencyMatcher", "levenshtein"]
__all__ = ["DependencyMatcher", "Matcher", "PhraseMatcher", "levenshtein"]
@@ -1,4 +1,4 @@
# cython: infer_types=True, profile=True
# cython: infer_types=True
import warnings
from collections import defaultdict
from itertools import product
@@ -129,6 +129,7 @@ cdef class DependencyMatcher:
        else:
            required_keys = {"RIGHT_ID", "RIGHT_ATTRS", "REL_OP", "LEFT_ID"}
            relation_keys = set(relation.keys())
            # Identify required keys that have not been specified
            missing = required_keys - relation_keys
            if missing:
                missing_txt = ", ".join(list(missing))
@@ -136,6 +137,13 @@ cdef class DependencyMatcher:
                    required=required_keys,
                    missing=missing_txt
                ))
            # Identify additional, unsupported keys
            unsupported = relation_keys - required_keys
            if unsupported:
                unsupported_txt = ", ".join(list(unsupported))
                warnings.warn(Warnings.W126.format(
                    unsupported=unsupported_txt
                ))
            if (
                relation["RIGHT_ID"] in visited_nodes
                or relation["LEFT_ID"] not in visited_nodes
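The added validation relies on plain set differences between the supplied and required keys; the same checks can be run in isolation (`validate_relation` is a hypothetical helper, not part of the library):

```python
REQUIRED_KEYS = {"RIGHT_ID", "RIGHT_ATTRS", "REL_OP", "LEFT_ID"}

def validate_relation(relation):
    # Returns (missing, unsupported): required keys that are absent, and
    # supplied keys that the matcher does not recognise.
    relation_keys = set(relation)
    return REQUIRED_KEYS - relation_keys, relation_keys - REQUIRED_KEYS

missing, unsupported = validate_relation({"RIGHT_ID": "x", "REL_OP": ">", "FOO": 1})
```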
@@ -1,4 +1,4 @@
# cython: profile=True, binding=True, infer_types=True
# cython: binding=True, infer_types=True
from cpython.object cimport PyObject
from libc.stdint cimport int64_t

@@ -1,4 +1,4 @@
# cython: binding=True, infer_types=True, profile=True
# cython: binding=True, infer_types=True
from typing import Iterable, List

from cymem.cymem cimport Pool

@@ -1,4 +1,4 @@
# cython: infer_types=True, profile=True
# cython: infer_types=True
from collections import defaultdict
from typing import List
164
spacy/ml/_precomputable_affine.py
Normal file
@@ -0,0 +1,164 @@
from thinc.api import Model, normal_init

from ..util import registry


@registry.layers("spacy.PrecomputableAffine.v1")
def PrecomputableAffine(nO, nI, nF, nP, dropout=0.1):
    model = Model(
        "precomputable_affine",
        forward,
        init=init,
        dims={"nO": nO, "nI": nI, "nF": nF, "nP": nP},
        params={"W": None, "b": None, "pad": None},
        attrs={"dropout_rate": dropout},
    )
    return model


def forward(model, X, is_train):
    nF = model.get_dim("nF")
    nO = model.get_dim("nO")
    nP = model.get_dim("nP")
    nI = model.get_dim("nI")
    W = model.get_param("W")
    # Preallocate array for layer output, including padding.
    Yf = model.ops.alloc2f(X.shape[0] + 1, nF * nO * nP, zeros=False)
    model.ops.gemm(X, W.reshape((nF * nO * nP, nI)), trans2=True, out=Yf[1:])
    Yf = Yf.reshape((Yf.shape[0], nF, nO, nP))

    # Set padding. Padding has shape (1, nF, nO, nP). Unfortunately, we cannot
    # change its shape to (nF, nO, nP) without breaking existing models. So
    # we'll squeeze the first dimension here.
    Yf[0] = model.ops.xp.squeeze(model.get_param("pad"), 0)

    def backward(dY_ids):
        # This backprop is particularly tricky, because we get back a different
        # thing from what we put out. We put out an array of shape:
        # (nB, nF, nO, nP), and get back:
        # (nB, nO, nP) and ids (nB, nF)
        # The ids tell us the values of nF, so we would have:
        #
        # dYf = zeros((nB, nF, nO, nP))
        # for b in range(nB):
        #     for f in range(nF):
        #         dYf[b, ids[b, f]] += dY[b]
        #
        # However, we avoid building that array for efficiency -- and just pass
        # in the indices.
        dY, ids = dY_ids
        assert dY.ndim == 3
        assert dY.shape[1] == nO, dY.shape
        assert dY.shape[2] == nP, dY.shape
        # nB = dY.shape[0]
        model.inc_grad("pad", _backprop_precomputable_affine_padding(model, dY, ids))
        Xf = X[ids]
        Xf = Xf.reshape((Xf.shape[0], nF * nI))

        model.inc_grad("b", dY.sum(axis=0))
        dY = dY.reshape((dY.shape[0], nO * nP))

        Wopfi = W.transpose((1, 2, 0, 3))
        Wopfi = Wopfi.reshape((nO * nP, nF * nI))
        dXf = model.ops.gemm(dY.reshape((dY.shape[0], nO * nP)), Wopfi)

        dWopfi = model.ops.gemm(dY, Xf, trans1=True)
        dWopfi = dWopfi.reshape((nO, nP, nF, nI))
        # (o, p, f, i) --> (f, o, p, i)
        dWopfi = dWopfi.transpose((2, 0, 1, 3))
        model.inc_grad("W", dWopfi)
        return dXf.reshape((dXf.shape[0], nF, nI))

    return Yf, backward


def _backprop_precomputable_affine_padding(model, dY, ids):
    nB = dY.shape[0]
    nF = model.get_dim("nF")
    nP = model.get_dim("nP")
    nO = model.get_dim("nO")
    # Backprop the "padding", used as a filler for missing values.
    # Values that are missing are set to -1, and each state vector could
    # have multiple missing values. The padding has different values for
    # different missing features. The gradient of the padding vector is:
    #
    # for b in range(nB):
    #     for f in range(nF):
    #         if ids[b, f] < 0:
    #             d_pad[f] += dY[b]
    #
    # Which can be rewritten as:
    #
    # (ids < 0).T @ dY
    mask = model.ops.asarray(ids < 0, dtype="f")
    d_pad = model.ops.gemm(mask, dY.reshape(nB, nO * nP), trans1=True)
    return d_pad.reshape((1, nF, nO, nP))


def init(model, X=None, Y=None):
    """This is like the 'layer sequential unit variance', but instead
    of taking the actual inputs, we randomly generate whitened data.

    Why's this all so complicated? We have a huge number of inputs,
    and the maxout unit makes guessing the dynamics tricky. Instead
    we set the maxout weights to values that empirically result in
    whitened outputs given whitened inputs.
    """
    if model.has_param("W") and model.get_param("W").any():
        return

    nF = model.get_dim("nF")
    nO = model.get_dim("nO")
    nP = model.get_dim("nP")
    nI = model.get_dim("nI")
    W = model.ops.alloc4f(nF, nO, nP, nI)
    b = model.ops.alloc2f(nO, nP)
    pad = model.ops.alloc4f(1, nF, nO, nP)

    ops = model.ops
    W = normal_init(ops, W.shape, mean=float(ops.xp.sqrt(1.0 / nF * nI)))
    pad = normal_init(ops, pad.shape, mean=1.0)
    model.set_param("W", W)
    model.set_param("b", b)
    model.set_param("pad", pad)

    ids = ops.alloc((5000, nF), dtype="f")
    ids += ops.xp.random.uniform(0, 1000, ids.shape)
    ids = ops.asarray(ids, dtype="i")
    tokvecs = ops.alloc((5000, nI), dtype="f")
    tokvecs += ops.xp.random.normal(loc=0.0, scale=1.0, size=tokvecs.size).reshape(
        tokvecs.shape
    )

    def predict(ids, tokvecs):
        # nS ids. nW tokvecs. Exclude the padding array.
        hiddens = model.predict(tokvecs[:-1])  # (nW, f, o, p)
        vectors = model.ops.alloc((ids.shape[0], nO * nP), dtype="f")
        # need nS vectors
        hiddens = hiddens.reshape((hiddens.shape[0] * nF, nO * nP))
        model.ops.scatter_add(vectors, ids.flatten(), hiddens)
        vectors = vectors.reshape((vectors.shape[0], nO, nP))
        vectors += b
        vectors = model.ops.asarray(vectors)
        if nP >= 2:
            return model.ops.maxout(vectors)[0]
        else:
            return vectors * (vectors >= 0)

    tol_var = 0.01
    tol_mean = 0.01
    t_max = 10
    W = model.get_param("W").copy()
    b = model.get_param("b").copy()
    for t_i in range(t_max):
        acts1 = predict(ids, tokvecs)
        var = model.ops.xp.var(acts1)
        mean = model.ops.xp.mean(acts1)
        if abs(var - 1.0) >= tol_var:
            W /= model.ops.xp.sqrt(var)
            model.set_param("W", W)
        elif abs(mean) >= tol_mean:
            b -= mean
            model.set_param("b", b)
        else:
            break
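The comment in `_backprop_precomputable_affine_padding` above claims the per-example loop over missing features is equivalent to the single matrix product `(ids < 0).T @ dY`. That identity is easy to verify numerically with plain NumPy (the shapes and id values below are arbitrary, chosen only for the check):

```python
import numpy as np

nB, nF, nO, nP = 4, 3, 2, 2
dY = np.arange(nB * nO * nP, dtype="f").reshape(nB, nO * nP)
ids = np.array([[-1, 0, 2], [1, -1, -1], [3, 4, 0], [-1, 2, 1]])

# Loop form: wherever feature f is missing (id < 0), the padding row f
# accumulates that example's gradient.
d_pad_loop = np.zeros((nF, nO * nP), dtype="f")
for b in range(nB):
    for f in range(nF):
        if ids[b, f] < 0:
            d_pad_loop[f] += dY[b]

# Vectorised form used in the layer: (ids < 0).T @ dY
mask = (ids < 0).astype("f")
d_pad_gemm = mask.T @ dY
```

Both forms produce the same `(nF, nO * nP)` array, which the layer then reshapes to `(1, nF, nO, nP)`.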
@ -1,66 +1,23 @@
|
||||||
import warnings
|
from typing import List, Literal, Optional
|
||||||
from typing import Any, List, Literal, Optional, Tuple
|
|
||||||
|
|
||||||
from thinc.api import Model
|
from thinc.api import Linear, Model, chain, list2array, use_ops, zero_init
|
||||||
from thinc.types import Floats2d
|
from thinc.types import Floats2d
|
||||||
|
|
||||||
from ...errors import Errors, Warnings
|
from ...errors import Errors
|
||||||
from ...tokens.doc import Doc
|
from ...tokens import Doc
|
||||||
from ...util import registry
|
from ...util import registry
|
||||||
|
from .._precomputable_affine import PrecomputableAffine
|
||||||
from ..tb_framework import TransitionModel
|
from ..tb_framework import TransitionModel
|
||||||
|
|
||||||
TransitionSystem = Any # TODO
|
|
||||||
State = Any # TODO
|
|
||||||
|
|
||||||
|
|
||||||
@registry.architectures.register("spacy.TransitionBasedParser.v2")
|
|
||||||
def transition_parser_v2(
|
|
||||||
tok2vec: Model[List[Doc], List[Floats2d]],
|
|
||||||
state_type: Literal["parser", "ner"],
|
|
||||||
extra_state_tokens: bool,
|
|
||||||
hidden_width: int,
|
|
||||||
maxout_pieces: int,
|
|
||||||
use_upper: bool,
|
|
||||||
nO: Optional[int] = None,
|
|
||||||
) -> Model:
|
|
||||||
if not use_upper:
|
|
||||||
warnings.warn(Warnings.W400)
|
|
||||||
|
|
||||||
return build_tb_parser_model(
|
|
||||||
tok2vec,
|
|
||||||
state_type,
|
|
||||||
extra_state_tokens,
|
|
||||||
hidden_width,
|
|
||||||
maxout_pieces,
|
|
||||||
nO=nO,
|
|
||||||
)
|
|
||||||
|
|
||||||
|
|
||||||
@registry.architectures.register("spacy.TransitionBasedParser.v3")
|
|
||||||
def transition_parser_v3(
|
|
||||||
tok2vec: Model[List[Doc], List[Floats2d]],
|
|
||||||
state_type: Literal["parser", "ner"],
|
|
||||||
extra_state_tokens: bool,
|
|
||||||
hidden_width: int,
|
|
||||||
maxout_pieces: int,
|
|
||||||
nO: Optional[int] = None,
|
|
||||||
) -> Model:
|
|
||||||
return build_tb_parser_model(
|
|
||||||
tok2vec,
|
|
||||||
state_type,
|
|
||||||
extra_state_tokens,
|
|
||||||
hidden_width,
|
|
||||||
maxout_pieces,
|
|
||||||
nO=nO,
|
|
||||||
)
|
|
||||||
|
|
||||||
|
|
||||||
|
@registry.architectures("spacy.TransitionBasedParser.v2")
|
||||||
def build_tb_parser_model(
|
def build_tb_parser_model(
|
||||||
tok2vec: Model[List[Doc], List[Floats2d]],
|
tok2vec: Model[List[Doc], List[Floats2d]],
|
||||||
state_type: Literal["parser", "ner"],
|
state_type: Literal["parser", "ner"],
|
||||||
extra_state_tokens: bool,
|
extra_state_tokens: bool,
|
||||||
hidden_width: int,
|
hidden_width: int,
|
||||||
maxout_pieces: int,
|
maxout_pieces: int,
|
||||||
|
use_upper: bool,
|
||||||
nO: Optional[int] = None,
|
nO: Optional[int] = None,
|
||||||
) -> Model:
|
) -> Model:
|
||||||
"""
|
"""
|
||||||
|
@ -94,7 +51,14 @@ def build_tb_parser_model(
|
||||||
feature sets (for the NER) or 13 (for the parser).
|
feature sets (for the NER) or 13 (for the parser).
|
||||||
hidden_width (int): The width of the hidden layer.
|
hidden_width (int): The width of the hidden layer.
|
||||||
maxout_pieces (int): How many pieces to use in the state prediction layer.
|
maxout_pieces (int): How many pieces to use in the state prediction layer.
|
||||||
Recommended values are 1, 2 or 3.
|
Recommended values are 1, 2 or 3. If 1, the maxout non-linearity
|
||||||
|
is replaced with a ReLu non-linearity if use_upper=True, and no
|
||||||
|
non-linearity if use_upper=False.
|
||||||
|
use_upper (bool): Whether to use an additional hidden layer after the state
|
||||||
|
vector in order to predict the action scores. It is recommended to set
|
||||||
|
this to False for large pretrained models such as transformers, and True
|
||||||
|
for smaller networks. The upper layer is computed on CPU, which becomes
|
||||||
|
a bottleneck on larger GPU-based models, where it's also less necessary.
|
||||||
nO (int or None): The number of actions the model will predict between.
|
nO (int or None): The number of actions the model will predict between.
|
||||||
Usually inferred from data at the beginning of training, or loaded from
|
Usually inferred from data at the beginning of training, or loaded from
|
||||||
disk.
|
disk.
|
||||||
|
@ -105,11 +69,106 @@ def build_tb_parser_model(
|
||||||
nr_feature_tokens = 6 if extra_state_tokens else 3
|
nr_feature_tokens = 6 if extra_state_tokens else 3
|
||||||
else:
|
else:
|
||||||
raise ValueError(Errors.E917.format(value=state_type))
|
raise ValueError(Errors.E917.format(value=state_type))
|
||||||
return TransitionModel(
|
t2v_width = tok2vec.get_dim("nO") if tok2vec.has_dim("nO") else None
|
||||||
tok2vec=tok2vec,
|
tok2vec = chain(
|
||||||
state_tokens=nr_feature_tokens,
|
tok2vec,
|
||||||
hidden_width=hidden_width,
|
list2array(),
|
||||||
maxout_pieces=maxout_pieces,
|
Linear(hidden_width, t2v_width),
|
||||||
nO=nO,
|
|
||||||
unseen_classes=set(),
|
|
||||||
)
|
)
|
||||||
|
tok2vec.set_dim("nO", hidden_width)
|
||||||
|
lower = _define_lower(
|
||||||
|
nO=hidden_width if use_upper else nO,
|
||||||
|
nF=nr_feature_tokens,
|
||||||
|
nI=tok2vec.get_dim("nO"),
|
||||||
|
nP=maxout_pieces,
|
||||||
|
)
|
||||||
|
upper = None
|
||||||
|
if use_upper:
|
||||||
|
with use_ops("cpu"):
|
||||||
|
# Initialize weights at zero, as it's a classification layer.
|
||||||
|
upper = _define_upper(nO=nO, nI=None)
|
||||||
|
return TransitionModel(tok2vec, lower, upper, resize_output)
|
||||||
|
|
||||||
|
|
||||||
|
def _define_upper(nO, nI):
|
||||||
|
return Linear(nO=nO, nI=nI, init_W=zero_init)
|
||||||
|
|
||||||
|
|
||||||
|
def _define_lower(nO, nF, nI, nP):
|
||||||
|
return PrecomputableAffine(nO=nO, nF=nF, nI=nI, nP=nP)
|
||||||
|
|
||||||
|
|
||||||
|
def resize_output(model, new_nO):
|
||||||
|
if model.attrs["has_upper"]:
|
||||||
|
return _resize_upper(model, new_nO)
|
||||||
|
return _resize_lower(model, new_nO)
|
||||||
|
|
||||||
|
|
||||||
|
def _resize_upper(model, new_nO):
|
||||||
|
upper = model.get_ref("upper")
|
||||||
|
if upper.has_dim("nO") is None:
|
||||||
|
upper.set_dim("nO", new_nO)
|
||||||
|
return model
|
||||||
|
elif new_nO == upper.get_dim("nO"):
|
||||||
|
return model
|
||||||
|
|
||||||
|
smaller = upper
|
||||||
|
nI = smaller.maybe_get_dim("nI")
|
||||||
|
    with use_ops("cpu"):
        larger = _define_upper(nO=new_nO, nI=nI)
    # it could be that the model is not initialized yet, then skip this bit
    if smaller.has_param("W"):
        larger_W = larger.ops.alloc2f(new_nO, nI)
        larger_b = larger.ops.alloc1f(new_nO)
        smaller_W = smaller.get_param("W")
        smaller_b = smaller.get_param("b")
        # Weights are stored in (nr_out, nr_in) format, so we're basically
        # just adding rows here.
        if smaller.has_dim("nO"):
            old_nO = smaller.get_dim("nO")
            larger_W[:old_nO] = smaller_W
            larger_b[:old_nO] = smaller_b
            for i in range(old_nO, new_nO):
                model.attrs["unseen_classes"].add(i)

        larger.set_param("W", larger_W)
        larger.set_param("b", larger_b)
    model._layers[-1] = larger
    model.set_ref("upper", larger)
    return model


def _resize_lower(model, new_nO):
    lower = model.get_ref("lower")
    if lower.has_dim("nO") is None:
        lower.set_dim("nO", new_nO)
        return model

    smaller = lower
    nI = smaller.maybe_get_dim("nI")
    nF = smaller.maybe_get_dim("nF")
    nP = smaller.maybe_get_dim("nP")
    larger = _define_lower(nO=new_nO, nI=nI, nF=nF, nP=nP)
    # it could be that the model is not initialized yet, then skip this bit
    if smaller.has_param("W"):
        larger_W = larger.ops.alloc4f(nF, new_nO, nP, nI)
        larger_b = larger.ops.alloc2f(new_nO, nP)
        larger_pad = larger.ops.alloc4f(1, nF, new_nO, nP)
        smaller_W = smaller.get_param("W")
        smaller_b = smaller.get_param("b")
        smaller_pad = smaller.get_param("pad")
        # Copy the old weights and padding into the new layer
        if smaller.has_dim("nO"):
            old_nO = smaller.get_dim("nO")
            larger_W[:, 0:old_nO, :, :] = smaller_W
            larger_pad[:, :, 0:old_nO, :] = smaller_pad
            larger_b[0:old_nO, :] = smaller_b
            for i in range(old_nO, new_nO):
                model.attrs["unseen_classes"].add(i)

        larger.set_param("W", larger_W)
        larger.set_param("b", larger_b)
        larger.set_param("pad", larger_pad)
    model._layers[1] = larger
    model.set_ref("lower", larger)
    return model
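The row-copying logic above (weights stored as `(nr_out, nr_in)`, resizing appends zero rows) can be sketched in plain numpy. This is a standalone illustration, not spaCy's actual helper; `resize_weights` is a hypothetical name:

```python
import numpy as np


def resize_weights(W: np.ndarray, b: np.ndarray, new_nO: int):
    # Weights are (nr_out, nr_in): growing the output dimension just
    # appends zero-initialized rows, and the old rows are copied over.
    old_nO, nI = W.shape
    W_new = np.zeros((new_nO, nI), dtype=W.dtype)
    b_new = np.zeros(new_nO, dtype=b.dtype)
    W_new[:old_nO] = W
    b_new[:old_nO] = b
    return W_new, b_new
```

Classes in the new rows start with zero weights, which is why the code above also records them as "unseen_classes".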
@@ -1,21 +1,28 @@
 from functools import partial
-from typing import List, Optional, cast
+from typing import List, Optional, Tuple, cast

 from thinc.api import (
     Dropout,
+    Gelu,
     LayerNorm,
     Linear,
     Logistic,
     Maxout,
     Model,
     ParametricAttention,
+    ParametricAttention_v2,
     Relu,
     Softmax,
     SparseLinear,
+    SparseLinear_v2,
     chain,
     clone,
     concatenate,
     list2ragged,
+    noop,
+    reduce_first,
+    reduce_last,
+    reduce_max,
     reduce_mean,
     reduce_sum,
     residual,
@@ -25,9 +32,10 @@ from thinc.api import (
 )
 from thinc.layers.chain import init as init_chain
 from thinc.layers.resizable import resize_linear_weighted, resize_model
-from thinc.types import Floats2d
+from thinc.types import ArrayXd, Floats2d

 from ...attrs import ORTH
+from ...errors import Errors
 from ...tokens import Doc
 from ...util import registry
 from ..extract_ngrams import extract_ngrams
@@ -47,10 +55,255 @@ def build_simple_cnn_text_classifier(
     outputs sum to 1. If exclusive_classes=False, a logistic non-linearity
     is applied instead, so that outputs are in the range [0, 1].
     """
+    return build_reduce_text_classifier(
+        tok2vec=tok2vec,
+        exclusive_classes=exclusive_classes,
+        use_reduce_first=False,
+        use_reduce_last=False,
+        use_reduce_max=False,
+        use_reduce_mean=True,
+        nO=nO,
+    )
+
+
+def resize_and_set_ref(model, new_nO, resizable_layer):
+    resizable_layer = resize_model(resizable_layer, new_nO)
+    model.set_ref("output_layer", resizable_layer.layers[0])
+    model.set_dim("nO", new_nO, force=True)
+    return model
+
+
+@registry.architectures("spacy.TextCatBOW.v2")
+def build_bow_text_classifier(
+    exclusive_classes: bool,
+    ngram_size: int,
+    no_output_layer: bool,
+    nO: Optional[int] = None,
+) -> Model[List[Doc], Floats2d]:
+    return _build_bow_text_classifier(
+        exclusive_classes=exclusive_classes,
+        ngram_size=ngram_size,
+        no_output_layer=no_output_layer,
+        nO=nO,
+        sparse_linear=SparseLinear(nO=nO),
+    )
+
+
+@registry.architectures("spacy.TextCatBOW.v3")
+def build_bow_text_classifier_v3(
+    exclusive_classes: bool,
+    ngram_size: int,
+    no_output_layer: bool,
+    length: int = 262144,
+    nO: Optional[int] = None,
+) -> Model[List[Doc], Floats2d]:
+    if length < 1:
+        raise ValueError(Errors.E1056.format(length=length))
+
+    # Find k such that 2**(k-1) < length <= 2**k.
+    length = 2 ** (length - 1).bit_length()
+
+    return _build_bow_text_classifier(
+        exclusive_classes=exclusive_classes,
+        ngram_size=ngram_size,
+        no_output_layer=no_output_layer,
+        nO=nO,
+        sparse_linear=SparseLinear_v2(nO=nO, length=length),
+    )
+
+
+def _build_bow_text_classifier(
+    exclusive_classes: bool,
+    ngram_size: int,
+    no_output_layer: bool,
+    sparse_linear: Model[Tuple[ArrayXd, ArrayXd, ArrayXd], ArrayXd],
+    nO: Optional[int] = None,
+) -> Model[List[Doc], Floats2d]:
     fill_defaults = {"b": 0, "W": 0}
     with Model.define_operators({">>": chain}):
-        cnn = tok2vec >> list2ragged() >> reduce_mean()
-        nI = tok2vec.maybe_get_dim("nO")
+        output_layer = None
+        if not no_output_layer:
+            fill_defaults["b"] = NEG_VALUE
+            output_layer = softmax_activation() if exclusive_classes else Logistic()
+        resizable_layer: Model[Floats2d, Floats2d] = resizable(
+            sparse_linear,
+            resize_layer=partial(resize_linear_weighted, fill_defaults=fill_defaults),
+        )
+        model = extract_ngrams(ngram_size, attr=ORTH) >> resizable_layer
+        model = with_cpu(model, model.ops)
+        if output_layer:
+            model = model >> with_cpu(output_layer, output_layer.ops)
+    if nO is not None:
+        model.set_dim("nO", cast(int, nO))
+    model.set_ref("output_layer", sparse_linear)
+    model.attrs["multi_label"] = not exclusive_classes
+    model.attrs["resize_output"] = partial(
+        resize_and_set_ref, resizable_layer=resizable_layer
+    )
+    return model
+
+
+@registry.architectures("spacy.TextCatEnsemble.v2")
+def build_text_classifier_v2(
+    tok2vec: Model[List[Doc], List[Floats2d]],
+    linear_model: Model[List[Doc], Floats2d],
+    nO: Optional[int] = None,
+) -> Model[List[Doc], Floats2d]:
+    width = tok2vec.maybe_get_dim("nO")
+    exclusive_classes = not linear_model.attrs["multi_label"]
+    parametric_attention = _build_parametric_attention_with_residual_nonlinear(
+        tok2vec=tok2vec,
+        nonlinear_layer=Maxout(nI=width, nO=width),
+        key_transform=noop(),
+    )
+    with Model.define_operators({">>": chain, "|": concatenate}):
+        nO_double = nO * 2 if nO else None
+        if exclusive_classes:
+            output_layer = Softmax(nO=nO, nI=nO_double)
+        else:
+            output_layer = Linear(nO=nO, nI=nO_double) >> Logistic()
+        model = (linear_model | parametric_attention) >> output_layer
+        model.set_ref("tok2vec", tok2vec)
+    if model.has_dim("nO") is not False and nO is not None:
+        model.set_dim("nO", cast(int, nO))
+    model.set_ref("output_layer", linear_model.get_ref("output_layer"))
+    model.attrs["multi_label"] = not exclusive_classes
+
+    return model
+
+
+@registry.architectures("spacy.TextCatLowData.v1")
+def build_text_classifier_lowdata(
+    width: int, dropout: Optional[float], nO: Optional[int] = None
+) -> Model[List[Doc], Floats2d]:
+    # Don't document this yet, I'm not sure it's right.
+    # Note, before v.3, this was the default if setting "low_data" and "pretrained_dims"
+    with Model.define_operators({">>": chain, "**": clone}):
+        model = (
+            StaticVectors(width)
+            >> list2ragged()
+            >> ParametricAttention(width)
+            >> reduce_sum()
+            >> residual(Relu(width, width)) ** 2
+            >> Linear(nO, width)
+        )
+        if dropout:
+            model = model >> Dropout(dropout)
+        model = model >> Logistic()
+    return model
+
+
+@registry.architectures("spacy.TextCatParametricAttention.v1")
+def build_textcat_parametric_attention_v1(
+    tok2vec: Model[List[Doc], List[Floats2d]],
+    exclusive_classes: bool,
+    nO: Optional[int] = None,
+) -> Model[List[Doc], Floats2d]:
+    width = tok2vec.maybe_get_dim("nO")
+    parametric_attention = _build_parametric_attention_with_residual_nonlinear(
+        tok2vec=tok2vec,
+        nonlinear_layer=Maxout(nI=width, nO=width),
+        key_transform=Gelu(nI=width, nO=width),
+    )
+    with Model.define_operators({">>": chain}):
+        if exclusive_classes:
+            output_layer = Softmax(nO=nO)
+        else:
+            output_layer = Linear(nO=nO) >> Logistic()
+        model = parametric_attention >> output_layer
+    if model.has_dim("nO") is not False and nO is not None:
+        model.set_dim("nO", cast(int, nO))
+    model.set_ref("output_layer", output_layer)
+    model.attrs["multi_label"] = not exclusive_classes
+
+    return model
+
+
+def _build_parametric_attention_with_residual_nonlinear(
+    *,
+    tok2vec: Model[List[Doc], List[Floats2d]],
+    nonlinear_layer: Model[Floats2d, Floats2d],
+    key_transform: Optional[Model[Floats2d, Floats2d]] = None,
+) -> Model[List[Doc], Floats2d]:
+    with Model.define_operators({">>": chain, "|": concatenate}):
+        width = tok2vec.maybe_get_dim("nO")
+        attention_layer = ParametricAttention_v2(nO=width, key_transform=key_transform)
+        norm_layer = LayerNorm(nI=width)
+        parametric_attention = (
+            tok2vec
+            >> list2ragged()
+            >> attention_layer
+            >> reduce_sum()
+            >> residual(nonlinear_layer >> norm_layer >> Dropout(0.0))
+        )
+
+        parametric_attention.init = _init_parametric_attention_with_residual_nonlinear
+
+        parametric_attention.set_ref("tok2vec", tok2vec)
+        parametric_attention.set_ref("attention_layer", attention_layer)
+        parametric_attention.set_ref("nonlinear_layer", nonlinear_layer)
+        parametric_attention.set_ref("norm_layer", norm_layer)
+
+    return parametric_attention
+
+
+def _init_parametric_attention_with_residual_nonlinear(model, X, Y) -> Model:
+    tok2vec_width = get_tok2vec_width(model)
+    model.get_ref("attention_layer").set_dim("nO", tok2vec_width)
+    model.get_ref("nonlinear_layer").set_dim("nO", tok2vec_width)
+    model.get_ref("nonlinear_layer").set_dim("nI", tok2vec_width)
+    model.get_ref("norm_layer").set_dim("nI", tok2vec_width)
+    model.get_ref("norm_layer").set_dim("nO", tok2vec_width)
+    init_chain(model, X, Y)
+    return model
+
+
+@registry.architectures("spacy.TextCatReduce.v1")
+def build_reduce_text_classifier(
+    tok2vec: Model,
+    exclusive_classes: bool,
+    use_reduce_first: bool,
+    use_reduce_last: bool,
+    use_reduce_max: bool,
+    use_reduce_mean: bool,
+    nO: Optional[int] = None,
+) -> Model[List[Doc], Floats2d]:
+    """Build a model that classifies pooled `Doc` representations.
+
+    Pooling is performed using reductions. Reductions are concatenated when
+    multiple reductions are used.
+
+    tok2vec (Model): the tok2vec layer to pool over.
+    exclusive_classes (bool): Whether or not classes are mutually exclusive.
+    use_reduce_first (bool): Pool by using the hidden representation of the
+        first token of a `Doc`.
+    use_reduce_last (bool): Pool by using the hidden representation of the
+        last token of a `Doc`.
+    use_reduce_max (bool): Pool by taking the maximum values of the hidden
+        representations of a `Doc`.
+    use_reduce_mean (bool): Pool by taking the mean of all hidden
+        representations of a `Doc`.
+    nO (Optional[int]): Number of classes.
+    """
+
+    fill_defaults = {"b": 0, "W": 0}
+    reductions = []
+    if use_reduce_first:
+        reductions.append(reduce_first())
+    if use_reduce_last:
+        reductions.append(reduce_last())
+    if use_reduce_max:
+        reductions.append(reduce_max())
+    if use_reduce_mean:
+        reductions.append(reduce_mean())
+
+    if not len(reductions):
+        raise ValueError(Errors.E1057)
+
+    with Model.define_operators({">>": chain}):
+        cnn = tok2vec >> list2ragged() >> concatenate(*reductions)
+        nO_tok2vec = tok2vec.maybe_get_dim("nO")
+        nI = nO_tok2vec * len(reductions) if nO_tok2vec is not None else None
         if exclusive_classes:
             output_layer = Softmax(nO=nO, nI=nI)
             fill_defaults["b"] = NEG_VALUE
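The power-of-two rounding added in `TextCatBOW.v3` above ("Find k such that 2**(k-1) < length <= 2**k") can be sanity-checked in plain Python, with no spaCy installed (`round_up_to_pow2` is an illustrative name, not the library's):

```python
def round_up_to_pow2(length: int) -> int:
    # Round the hash-table length up to the next power of two, as
    # build_bow_text_classifier_v3 does: 2 ** (length - 1).bit_length().
    # A length that is already a power of two is returned unchanged.
    return 2 ** (length - 1).bit_length()


print(round_up_to_pow2(262144))  # 262144 (the default, already 2**18)
print(round_up_to_pow2(100000))  # 131072 (rounded up to 2**17)
```

Rounding the table size to a power of two lets the hashing layer use a cheap bit-mask instead of a modulo when mapping n-gram hashes to rows.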
@@ -80,113 +333,3 @@ def build_simple_cnn_text_classifier(
         model.set_dim("nO", cast(int, nO))
     model.attrs["multi_label"] = not exclusive_classes
     return model
-
-
-def resize_and_set_ref(model, new_nO, resizable_layer):
-    resizable_layer = resize_model(resizable_layer, new_nO)
-    model.set_ref("output_layer", resizable_layer.layers[0])
-    model.set_dim("nO", new_nO, force=True)
-    return model
-
-
-@registry.architectures("spacy.TextCatBOW.v2")
-def build_bow_text_classifier(
-    exclusive_classes: bool,
-    ngram_size: int,
-    no_output_layer: bool,
-    nO: Optional[int] = None,
-) -> Model[List[Doc], Floats2d]:
-    fill_defaults = {"b": 0, "W": 0}
-    with Model.define_operators({">>": chain}):
-        sparse_linear = SparseLinear(nO=nO)
-        output_layer = None
-        if not no_output_layer:
-            fill_defaults["b"] = NEG_VALUE
-            output_layer = softmax_activation() if exclusive_classes else Logistic()
-        resizable_layer: Model[Floats2d, Floats2d] = resizable(
-            sparse_linear,
-            resize_layer=partial(resize_linear_weighted, fill_defaults=fill_defaults),
-        )
-        model = extract_ngrams(ngram_size, attr=ORTH) >> resizable_layer
-        model = with_cpu(model, model.ops)
-        if output_layer:
-            model = model >> with_cpu(output_layer, output_layer.ops)
-    if nO is not None:
-        model.set_dim("nO", cast(int, nO))
-    model.set_ref("output_layer", sparse_linear)
-    model.attrs["multi_label"] = not exclusive_classes
-    model.attrs["resize_output"] = partial(
-        resize_and_set_ref, resizable_layer=resizable_layer
-    )
-    return model
-
-
-@registry.architectures("spacy.TextCatEnsemble.v2")
-def build_text_classifier_v2(
-    tok2vec: Model[List[Doc], List[Floats2d]],
-    linear_model: Model[List[Doc], Floats2d],
-    nO: Optional[int] = None,
-) -> Model[List[Doc], Floats2d]:
-    exclusive_classes = not linear_model.attrs["multi_label"]
-    with Model.define_operators({">>": chain, "|": concatenate}):
-        width = tok2vec.maybe_get_dim("nO")
-        attention_layer = ParametricAttention(width)
-        maxout_layer = Maxout(nO=width, nI=width)
-        norm_layer = LayerNorm(nI=width)
-        cnn_model = (
-            tok2vec
-            >> list2ragged()
-            >> attention_layer
-            >> reduce_sum()
-            >> residual(maxout_layer >> norm_layer >> Dropout(0.0))
-        )
-
-        nO_double = nO * 2 if nO else None
-        if exclusive_classes:
-            output_layer = Softmax(nO=nO, nI=nO_double)
-        else:
-            output_layer = Linear(nO=nO, nI=nO_double) >> Logistic()
-        model = (linear_model | cnn_model) >> output_layer
-        model.set_ref("tok2vec", tok2vec)
-    if model.has_dim("nO") is not False and nO is not None:
-        model.set_dim("nO", cast(int, nO))
-    model.set_ref("output_layer", linear_model.get_ref("output_layer"))
-    model.set_ref("attention_layer", attention_layer)
-    model.set_ref("maxout_layer", maxout_layer)
-    model.set_ref("norm_layer", norm_layer)
-    model.attrs["multi_label"] = not exclusive_classes
-
-    model.init = init_ensemble_textcat  # type: ignore[assignment]
-    return model
-
-
-def init_ensemble_textcat(model, X, Y) -> Model:
-    tok2vec_width = get_tok2vec_width(model)
-    model.get_ref("attention_layer").set_dim("nO", tok2vec_width)
-    model.get_ref("maxout_layer").set_dim("nO", tok2vec_width)
-    model.get_ref("maxout_layer").set_dim("nI", tok2vec_width)
-    model.get_ref("norm_layer").set_dim("nI", tok2vec_width)
-    model.get_ref("norm_layer").set_dim("nO", tok2vec_width)
-    init_chain(model, X, Y)
-    return model
-
-
-@registry.architectures("spacy.TextCatLowData.v1")
-def build_text_classifier_lowdata(
-    width: int, dropout: Optional[float], nO: Optional[int] = None
-) -> Model[List[Doc], Floats2d]:
-    # Don't document this yet, I'm not sure it's right.
-    # Note, before v.3, this was the default if setting "low_data" and "pretrained_dims"
-    with Model.define_operators({">>": chain, "**": clone}):
-        model = (
-            StaticVectors(width)
-            >> list2ragged()
-            >> ParametricAttention(width)
-            >> reduce_sum()
-            >> residual(Relu(width, width)) ** 2
-            >> Linear(nO, width)
-        )
-        if dropout:
-            model = model >> Dropout(dropout)
-        model = model >> Logistic()
-    return model
@@ -67,8 +67,8 @@ def build_hash_embed_cnn_tok2vec(
         are between 2 and 8.
     window_size (int): The number of tokens on either side to concatenate during
         the convolutions. The receptive field of the CNN will be
-        depth * (window_size * 2 + 1), so a 4-layer network with window_size of
-        2 will be sensitive to 20 words at a time. Recommended value is 1.
+        depth * window_size * 2 + 1, so a 4-layer network with window_size of
+        2 will be sensitive to 17 words at a time. Recommended value is 1.
     embed_size (int): The number of rows in the hash embedding tables. This can
         be surprisingly small, due to the use of the hash embeddings. Recommended
         values are between 2000 and 10000.
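The corrected receptive-field formula in the docstring above is easy to verify: each convolution layer widens the context by `window_size` tokens on each side, and the center token is counted once, not once per layer. A quick standalone sketch (`receptive_field` is an illustrative name, not a spaCy function):

```python
def receptive_field(depth: int, window_size: int) -> int:
    # Each of `depth` stacked conv layers adds `window_size` tokens of
    # context on both sides; the token itself contributes the "+ 1".
    return depth * window_size * 2 + 1


print(receptive_field(depth=4, window_size=2))  # 17, as the docstring says
```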
49  spacy/ml/parser_model.pxd  Normal file
@@ -0,0 +1,49 @@
from libc.string cimport memcpy, memset
from thinc.backends.cblas cimport CBlas

from ..pipeline._parser_internals._state cimport StateC
from ..typedefs cimport hash_t, weight_t


cdef struct SizesC:
    int states
    int classes
    int hiddens
    int pieces
    int feats
    int embed_width


cdef struct WeightsC:
    const float* feat_weights
    const float* feat_bias
    const float* hidden_bias
    const float* hidden_weights
    const float* seen_classes


cdef struct ActivationsC:
    int* token_ids
    float* unmaxed
    float* scores
    float* hiddens
    int* is_valid
    int _curr_size
    int _max_size


cdef WeightsC get_c_weights(model) except *

cdef SizesC get_c_sizes(model, int batch_size) except *

cdef ActivationsC alloc_activations(SizesC n) nogil

cdef void free_activations(const ActivationsC* A) nogil

cdef void predict_states(CBlas cblas, ActivationsC* A, StateC** states,
                         const WeightsC* W, SizesC n) nogil

cdef int arg_max_if_valid(const weight_t* scores, const int* is_valid, int n) nogil

cdef void cpu_log_loss(float* d_scores, const float* costs,
                       const int* is_valid, const float* scores, int O) nogil
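The `ActivationsC` buffers declared above are sized `states * hiddens * pieces` (`unmaxed`) and `states * hiddens` (`hiddens`): the parser computes several "pieces" per hidden unit and keeps the best one, i.e. a maxout. A numpy sketch of that reduction (illustrative only; the real implementation is the Cython `predict_states` in `parser_model.pyx`):

```python
import numpy as np

# Toy sizes standing in for SizesC.states / .hiddens / .pieces.
states, hiddens, pieces = 2, 3, 4

# `unmaxed` plays the role of ActivationsC.unmaxed: one score per
# (state, hidden unit, maxout piece).
unmaxed = np.arange(states * hiddens * pieces, dtype="float32").reshape(
    states, hiddens, pieces
)

# Maxout: keep the best piece for each hidden unit of each state.
hidden = unmaxed.max(axis=-1)
print(hidden.shape)  # (2, 3)
```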
500
spacy/ml/parser_model.pyx
Normal file
500
spacy/ml/parser_model.pyx
Normal file
|
@ -0,0 +1,500 @@
|
||||||
|
# cython: infer_types=True, cdivision=True, boundscheck=False
|
||||||
|
# cython: profile=False
|
||||||
|
cimport numpy as np
|
||||||
|
from libc.math cimport exp
|
||||||
|
from libc.stdlib cimport calloc, free, realloc
|
||||||
|
from libc.string cimport memcpy, memset
|
||||||
|
from thinc.backends.cblas cimport saxpy, sgemm
|
||||||
|
|
||||||
|
import numpy
|
||||||
|
import numpy.random
|
||||||
|
from thinc.api import CupyOps, Model, NumpyOps, get_ops
|
||||||
|
|
||||||
|
from .. import util
|
||||||
|
from ..errors import Errors
|
||||||
|
|
||||||
|
from ..pipeline._parser_internals.stateclass cimport StateClass
|
||||||
|
from ..typedefs cimport weight_t
|
||||||
|
|
||||||
|
|
||||||
|
cdef WeightsC get_c_weights(model) except *:
|
||||||
|
cdef WeightsC output
|
||||||
|
cdef precompute_hiddens state2vec = model.state2vec
|
||||||
|
output.feat_weights = state2vec.get_feat_weights()
|
||||||
|
output.feat_bias = <const float*>state2vec.bias.data
|
||||||
|
cdef np.ndarray vec2scores_W
|
||||||
|
cdef np.ndarray vec2scores_b
|
||||||
|
if model.vec2scores is None:
|
||||||
|
output.hidden_weights = NULL
|
||||||
|
output.hidden_bias = NULL
|
||||||
|
else:
|
||||||
|
vec2scores_W = model.vec2scores.get_param("W")
|
||||||
|
vec2scores_b = model.vec2scores.get_param("b")
|
||||||
|
output.hidden_weights = <const float*>vec2scores_W.data
|
||||||
|
output.hidden_bias = <const float*>vec2scores_b.data
|
||||||
|
cdef np.ndarray class_mask = model._class_mask
|
||||||
|
output.seen_classes = <const float*>class_mask.data
|
||||||
|
return output
|
||||||
|
|
||||||
|
|
||||||
|
cdef SizesC get_c_sizes(model, int batch_size) except *:
|
||||||
|
cdef SizesC output
|
||||||
|
output.states = batch_size
|
||||||
|
if model.vec2scores is None:
|
||||||
|
output.classes = model.state2vec.get_dim("nO")
|
||||||
|
else:
|
||||||
|
output.classes = model.vec2scores.get_dim("nO")
|
||||||
|
output.hiddens = model.state2vec.get_dim("nO")
|
||||||
|
output.pieces = model.state2vec.get_dim("nP")
|
||||||
|
output.feats = model.state2vec.get_dim("nF")
|
||||||
|
output.embed_width = model.tokvecs.shape[1]
|
||||||
|
return output
|
||||||
|
|
||||||
|
|
||||||
|
cdef ActivationsC alloc_activations(SizesC n) nogil:
|
||||||
|
cdef ActivationsC A
|
||||||
|
memset(&A, 0, sizeof(A))
|
||||||
|
resize_activations(&A, n)
|
||||||
|
return A
|
||||||
|
|
||||||
|
|
||||||
|
cdef void free_activations(const ActivationsC* A) nogil:
|
||||||
|
free(A.token_ids)
|
||||||
|
free(A.scores)
|
||||||
|
free(A.unmaxed)
|
||||||
|
free(A.hiddens)
|
||||||
|
free(A.is_valid)
|
||||||
|
|
||||||
|
|
||||||
|
cdef void resize_activations(ActivationsC* A, SizesC n) nogil:
|
||||||
|
if n.states <= A._max_size:
|
||||||
|
A._curr_size = n.states
|
||||||
|
return
|
||||||
|
if A._max_size == 0:
|
||||||
|
A.token_ids = <int*>calloc(n.states * n.feats, sizeof(A.token_ids[0]))
|
||||||
|
A.scores = <float*>calloc(n.states * n.classes, sizeof(A.scores[0]))
|
||||||
|
A.unmaxed = <float*>calloc(n.states * n.hiddens * n.pieces, sizeof(A.unmaxed[0]))
|
||||||
|
A.hiddens = <float*>calloc(n.states * n.hiddens, sizeof(A.hiddens[0]))
|
||||||
|
A.is_valid = <int*>calloc(n.states * n.classes, sizeof(A.is_valid[0]))
|
||||||
|
A._max_size = n.states
|
||||||
|
else:
|
||||||
|
A.token_ids = <int*>realloc(A.token_ids,
|
||||||
|
n.states * n.feats * sizeof(A.token_ids[0]))
|
||||||
|
A.scores = <float*>realloc(A.scores,
|
||||||
|
n.states * n.classes * sizeof(A.scores[0]))
|
||||||
|
A.unmaxed = <float*>realloc(A.unmaxed,
|
||||||
|
n.states * n.hiddens * n.pieces * sizeof(A.unmaxed[0]))
|
||||||
|
A.hiddens = <float*>realloc(A.hiddens,
|
||||||
|
n.states * n.hiddens * sizeof(A.hiddens[0]))
|
||||||
|
A.is_valid = <int*>realloc(A.is_valid,
|
||||||
|
n.states * n.classes * sizeof(A.is_valid[0]))
|
||||||
|
A._max_size = n.states
|
||||||
|
A._curr_size = n.states
|
||||||
|
|
||||||
|
|
||||||
|
cdef void predict_states(CBlas cblas, ActivationsC* A, StateC** states,
|
||||||
|
const WeightsC* W, SizesC n) nogil:
|
||||||
|
resize_activations(A, n)
|
||||||
|
for i in range(n.states):
|
||||||
|
states[i].set_context_tokens(&A.token_ids[i*n.feats], n.feats)
|
||||||
|
memset(A.unmaxed, 0, n.states * n.hiddens * n.pieces * sizeof(float))
|
||||||
|
memset(A.hiddens, 0, n.states * n.hiddens * sizeof(float))
|
||||||
|
sum_state_features(cblas, A.unmaxed, W.feat_weights, A.token_ids, n.states,
|
||||||
|
n.feats, n.hiddens * n.pieces)
|
||||||
|
for i in range(n.states):
|
||||||
|
saxpy(cblas)(n.hiddens * n.pieces, 1., W.feat_bias, 1,
|
||||||
|
&A.unmaxed[i*n.hiddens*n.pieces], 1)
|
||||||
|
for j in range(n.hiddens):
|
||||||
|
index = i * n.hiddens * n.pieces + j * n.pieces
|
||||||
|
which = _arg_max(&A.unmaxed[index], n.pieces)
|
||||||
|
A.hiddens[i*n.hiddens + j] = A.unmaxed[index + which]
|
||||||
|
memset(A.scores, 0, n.states * n.classes * sizeof(float))
|
||||||
|
if W.hidden_weights == NULL:
|
||||||
|
memcpy(A.scores, A.hiddens, n.states * n.classes * sizeof(float))
|
||||||
|
else:
|
||||||
|
# Compute hidden-to-output
|
||||||
|
sgemm(cblas)(False, True, n.states, n.classes, n.hiddens, 1.0,
|
||||||
|
<const float *>A.hiddens, n.hiddens,
|
||||||
|
<const float *>W.hidden_weights, n.hiddens, 0.0,
|
||||||
|
A.scores, n.classes)
|
||||||
|
# Add bias
|
||||||
|
for i in range(n.states):
|
||||||
|
saxpy(cblas)(n.classes, 1., W.hidden_bias, 1, &A.scores[i*n.classes], 1)
|
||||||
|
# Set unseen classes to minimum value
|
||||||
|
i = 0
|
||||||
|
min_ = A.scores[0]
|
||||||
|
for i in range(1, n.states * n.classes):
|
||||||
|
if A.scores[i] < min_:
|
||||||
|
min_ = A.scores[i]
|
||||||
|
for i in range(n.states):
|
||||||
|
for j in range(n.classes):
|
||||||
|
if not W.seen_classes[j]:
|
||||||
|
A.scores[i*n.classes+j] = min_
|
||||||
|
|
||||||
|
|
||||||
|
cdef void sum_state_features(CBlas cblas, float* output, const float* cached,
|
||||||
|
const int* token_ids, int B, int F, int O) nogil:
|
||||||
|
cdef int idx, b, f
|
||||||
|
cdef const float* feature
|
||||||
|
padding = cached
|
||||||
|
cached += F * O
|
||||||
|
cdef int id_stride = F*O
|
||||||
|
cdef float one = 1.
|
||||||
|
for b in range(B):
|
||||||
|
for f in range(F):
|
||||||
|
if token_ids[f] < 0:
|
||||||
|
feature = &padding[f*O]
|
||||||
|
else:
|
||||||
|
idx = token_ids[f] * id_stride + f*O
|
||||||
|
feature = &cached[idx]
|
||||||
|
saxpy(cblas)(O, one, <const float*>feature, 1, &output[b*O], 1)
|
||||||
|
token_ids += F
|
||||||
|
|
||||||
|
|
||||||
|
cdef void cpu_log_loss(float* d_scores, const float* costs, const int* is_valid,
|
||||||
|
const float* scores, int O) nogil:
|
||||||
|
"""Do multi-label log loss"""
|
||||||
|
cdef double max_, gmax, Z, gZ
|
||||||
|
best = arg_max_if_gold(scores, costs, is_valid, O)
|
||||||
|
guess = _arg_max(scores, O)
|
||||||
|
|
||||||
|
if best == -1 or guess == -1:
|
||||||
|
# These shouldn't happen, but if they do, we want to make sure we don't
|
||||||
|
# cause an OOB access.
|
||||||
|
return
|
||||||
|
Z = 1e-10
|
||||||
|
gZ = 1e-10
|
||||||
|
max_ = scores[guess]
|
||||||
|
gmax = scores[best]
|
||||||
|
for i in range(O):
|
||||||
|
Z += exp(scores[i] - max_)
|
||||||
|
if costs[i] <= costs[best]:
|
||||||
|
gZ += exp(scores[i] - gmax)
|
||||||
|
for i in range(O):
|
||||||
|
if costs[i] <= costs[best]:
|
||||||
|
d_scores[i] = (exp(scores[i]-max_) / Z) - (exp(scores[i]-gmax)/gZ)
|
||||||
|
else:
|
||||||
|
d_scores[i] = exp(scores[i]-max_) / Z
|
||||||
|
|
||||||
|
|
||||||
|
cdef int arg_max_if_gold(const weight_t* scores, const weight_t* costs,
|
||||||
|
const int* is_valid, int n) nogil:
|
||||||
|
# Find minimum cost
|
||||||
|
cdef float cost = 1
|
||||||
|
for i in range(n):
|
||||||
|
if is_valid[i] and costs[i] < cost:
|
||||||
|
cost = costs[i]
|
||||||
|
# Now find best-scoring with that cost
|
||||||
|
cdef int best = -1
|
||||||
|
for i in range(n):
|
||||||
|
if costs[i] <= cost and is_valid[i]:
|
||||||
|
if best == -1 or scores[i] > scores[best]:
|
||||||
|
best = i
|
||||||
|
return best
|
||||||
|
|
||||||
|
|
||||||
|
cdef int arg_max_if_valid(const weight_t* scores, const int* is_valid, int n) nogil:
|
||||||
|
cdef int best = -1
|
||||||
|
for i in range(n):
|
||||||
|
if is_valid[i] >= 1:
|
||||||
|
if best == -1 or scores[i] > scores[best]:
|
||||||
|
best = i
|
||||||
|
return best
|
||||||
|
|
||||||
|
|
||||||
|
class ParserStepModel(Model):
|
||||||
|
    def __init__(self, docs, layers, *, has_upper, unseen_classes=None, train=True,
                 dropout=0.1):
        Model.__init__(self, name="parser_step_model", forward=step_forward)
        self.attrs["has_upper"] = has_upper
        self.attrs["dropout_rate"] = dropout
        self.tokvecs, self.bp_tokvecs = layers[0](docs, is_train=train)
        if layers[1].get_dim("nP") >= 2:
            activation = "maxout"
        elif has_upper:
            activation = None
        else:
            activation = "relu"
        self.state2vec = precompute_hiddens(len(docs), self.tokvecs, layers[1],
                                            activation=activation, train=train)
        if has_upper:
            self.vec2scores = layers[-1]
        else:
            self.vec2scores = None
        self.cuda_stream = util.get_cuda_stream(non_blocking=True)
        self.backprops = []
        self._class_mask = numpy.zeros((self.nO,), dtype='f')
        self._class_mask.fill(1)
        if unseen_classes is not None:
            for class_ in unseen_classes:
                self._class_mask[class_] = 0.

    def clear_memory(self):
        del self.tokvecs
        del self.bp_tokvecs
        del self.state2vec
        del self.backprops
        del self._class_mask

    @property
    def nO(self):
        if self.attrs["has_upper"]:
            return self.vec2scores.get_dim("nO")
        else:
            return self.state2vec.get_dim("nO")

    def class_is_unseen(self, class_):
        return self._class_mask[class_]

    def mark_class_unseen(self, class_):
        self._class_mask[class_] = 0

    def mark_class_seen(self, class_):
        self._class_mask[class_] = 1

    def get_token_ids(self, states):
        cdef StateClass state
        states = [state for state in states if not state.is_final()]
        cdef np.ndarray ids = numpy.zeros((len(states), self.state2vec.nF),
                                          dtype='i', order='C')
        ids.fill(-1)
        c_ids = <int*>ids.data
        for state in states:
            state.c.set_context_tokens(c_ids, ids.shape[1])
            c_ids += ids.shape[1]
        return ids

    def backprop_step(self, token_ids, d_vector, get_d_tokvecs):
        if isinstance(self.state2vec.ops, CupyOps) \
                and not isinstance(token_ids, self.state2vec.ops.xp.ndarray):
            # Move token_ids and d_vector to GPU, asynchronously
            self.backprops.append((
                util.get_async(self.cuda_stream, token_ids),
                util.get_async(self.cuda_stream, d_vector),
                get_d_tokvecs
            ))
        else:
            self.backprops.append((token_ids, d_vector, get_d_tokvecs))

    def finish_steps(self, golds):
        # Add a padding vector to the d_tokvecs gradient, so that missing
        # values don't affect the real gradient.
        d_tokvecs = self.ops.alloc((self.tokvecs.shape[0]+1, self.tokvecs.shape[1]))
        # Tell CUDA to block, so our async copies complete.
        if self.cuda_stream is not None:
            self.cuda_stream.synchronize()
        for ids, d_vector, bp_vector in self.backprops:
            d_state_features = bp_vector((d_vector, ids))
            ids = ids.flatten()
            d_state_features = d_state_features.reshape(
                (ids.size, d_state_features.shape[2]))
            self.ops.scatter_add(d_tokvecs, ids, d_state_features)
        # Padded -- see update()
        self.bp_tokvecs(d_tokvecs[:-1])
        return d_tokvecs

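The `finish_steps` method above relies on a padding trick: the token-vector gradient gets one extra row, so context slots filled with -1 (missing tokens) scatter into the padding row instead of corrupting a real token's gradient, and that row is dropped before backprop. A minimal NumPy sketch of the idea (function name and shapes are illustrative, not spaCy API):

```python
import numpy as np


def accumulate_gradients(n_tokens, width, backprops):
    """Accumulate per-state feature gradients into per-token gradients.

    ids of -1 mark missing context tokens; with one extra padding row,
    negative indices wrap around to that row, which is dropped at the end.
    """
    d_tokvecs = np.zeros((n_tokens + 1, width), dtype="f")
    for ids, d_state_features in backprops:
        ids = ids.flatten()
        d_state_features = d_state_features.reshape((ids.size, width))
        # np.add.at is the NumPy analogue of ops.scatter_add
        np.add.at(d_tokvecs, ids, d_state_features)
    return d_tokvecs[:-1]  # drop the padding row
```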
NUMPY_OPS = NumpyOps()


def step_forward(model: ParserStepModel, states, is_train):
    token_ids = model.get_token_ids(states)
    vector, get_d_tokvecs = model.state2vec(token_ids, is_train)
    mask = None
    if model.attrs["has_upper"]:
        dropout_rate = model.attrs["dropout_rate"]
        if is_train and dropout_rate > 0:
            mask = NUMPY_OPS.get_dropout_mask(vector.shape, dropout_rate)
            vector *= mask
        scores, get_d_vector = model.vec2scores(vector, is_train)
    else:
        scores = NumpyOps().asarray(vector)
        def get_d_vector(d_scores): return d_scores
    # If a class is unseen, make sure its score is the minimum
    scores[:, model._class_mask == 0] = numpy.nanmin(scores)

    def backprop_parser_step(d_scores):
        # Zero gradients for unseen classes
        d_scores *= model._class_mask
        d_vector = get_d_vector(d_scores)
        if mask is not None:
            d_vector *= mask
        model.backprop_step(token_ids, d_vector, get_d_tokvecs)
        return None
    return scores, backprop_parser_step

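`step_forward` above pushes the scores of unseen classes down to the batch minimum on the forward pass and zeroes their gradient on the way back. A self-contained sketch of that masking (names are illustrative):

```python
import numpy as np


def mask_unseen(scores, class_mask):
    """Forward: clamp unseen classes (mask == 0) to the minimum score.
    Backward: zero the gradient for those classes."""
    scores = scores.copy()
    scores[:, class_mask == 0] = np.nanmin(scores)

    def backprop(d_scores):
        return d_scores * class_mask

    return scores, backprop
```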
cdef class precompute_hiddens:
    """Allow a model to be "primed" by pre-computing input features in bulk.

    This is used for the parser, where we want to take a batch of documents,
    and compute vectors for each (token, position) pair. These vectors can then
    be reused, especially for beam-search.

    Let's say we're using 12 features for each state, e.g. word at start of
    buffer, three words on stack, their children, etc. In the normal arc-eager
    system, a document of length N is processed in 2*N states. This means we'll
    create 2*N*12 feature vectors --- but if we pre-compute, we only need
    N*12 vector computations. The saving for beam-search is much better:
    if we have a beam of k, we'll normally make 2*N*12*K computations --
    so we can save the factor k. This also gives a nice CPU/GPU division:
    we can do all our hard maths up front, packed into large multiplications,
    and do the hard-to-program parsing on the CPU.
    """
    cdef readonly int nF, nO, nP
    cdef bint _is_synchronized
    cdef public object ops
    cdef public object numpy_ops
    cdef public object _cpu_ops
    cdef np.ndarray _features
    cdef np.ndarray _cached
    cdef np.ndarray bias
    cdef object _cuda_stream
    cdef object _bp_hiddens
    cdef object activation

    def __init__(self, batch_size, tokvecs, lower_model, cuda_stream=None,
                 activation="maxout", train=False):
        gpu_cached, bp_features = lower_model(tokvecs, train)
        cdef np.ndarray cached
        if not isinstance(gpu_cached, numpy.ndarray):
            # Note the passing of cuda_stream here: it lets
            # cupy make the copy asynchronously.
            # We then have to block before first use.
            cached = gpu_cached.get(stream=cuda_stream)
        else:
            cached = gpu_cached
        if not isinstance(lower_model.get_param("b"), numpy.ndarray):
            self.bias = lower_model.get_param("b").get(stream=cuda_stream)
        else:
            self.bias = lower_model.get_param("b")
        self.nF = cached.shape[1]
        if lower_model.has_dim("nP"):
            self.nP = lower_model.get_dim("nP")
        else:
            self.nP = 1
        self.nO = cached.shape[2]
        self.ops = lower_model.ops
        self.numpy_ops = NumpyOps()
        self._cpu_ops = get_ops("cpu") if isinstance(self.ops, CupyOps) else self.ops
        assert activation in (None, "relu", "maxout")
        self.activation = activation
        self._is_synchronized = False
        self._cuda_stream = cuda_stream
        self._cached = cached
        self._bp_hiddens = bp_features

    cdef const float* get_feat_weights(self) except NULL:
        if not self._is_synchronized and self._cuda_stream is not None:
            self._cuda_stream.synchronize()
            self._is_synchronized = True
        return <float*>self._cached.data

    def has_dim(self, name):
        if name == "nF":
            return self.nF if self.nF is not None else True
        elif name == "nP":
            return self.nP if self.nP is not None else True
        elif name == "nO":
            return self.nO if self.nO is not None else True
        else:
            return False

    def get_dim(self, name):
        if name == "nF":
            return self.nF
        elif name == "nP":
            return self.nP
        elif name == "nO":
            return self.nO
        else:
            raise ValueError(Errors.E1033.format(name=name))

    def set_dim(self, name, value):
        if name == "nF":
            self.nF = value
        elif name == "nP":
            self.nP = value
        elif name == "nO":
            self.nO = value
        else:
            raise ValueError(Errors.E1033.format(name=name))

    def __call__(self, X, bint is_train):
        if is_train:
            return self.begin_update(X)
        else:
            return self.predict(X), lambda X: X

    def predict(self, X):
        return self.begin_update(X)[0]

    def begin_update(self, token_ids):
        cdef np.ndarray state_vector = numpy.zeros(
            (token_ids.shape[0], self.nO, self.nP), dtype='f')
        # This is tricky, but (assuming GPU available):
        # - Input to forward on CPU
        # - Output from forward on CPU
        # - Input to backward on GPU!
        # - Output from backward on GPU
        bp_hiddens = self._bp_hiddens

        cdef CBlas cblas = self._cpu_ops.cblas()

        feat_weights = self.get_feat_weights()
        cdef int[:, ::1] ids = token_ids
        sum_state_features(cblas, <float*>state_vector.data,
                           feat_weights, &ids[0, 0], token_ids.shape[0],
                           self.nF, self.nO*self.nP)
        state_vector += self.bias
        state_vector, bp_nonlinearity = self._nonlinearity(state_vector)

        def backward(d_state_vector_ids):
            d_state_vector, token_ids = d_state_vector_ids
            d_state_vector = bp_nonlinearity(d_state_vector)
            d_tokens = bp_hiddens((d_state_vector, token_ids))
            return d_tokens
        return state_vector, backward

    def _nonlinearity(self, state_vector):
        if self.activation == "maxout":
            return self._maxout_nonlinearity(state_vector)
        else:
            return self._relu_nonlinearity(state_vector)

    def _maxout_nonlinearity(self, state_vector):
        state_vector, mask = self.numpy_ops.maxout(state_vector)
        # We're outputting to CPU, but we need this variable on GPU for the
        # backward pass.
        mask = self.ops.asarray(mask)

        def backprop_maxout(d_best):
            return self.ops.backprop_maxout(d_best, mask, self.nP)

        return state_vector, backprop_maxout

    def _relu_nonlinearity(self, state_vector):
        state_vector = state_vector.reshape((state_vector.shape[0], -1))
        mask = state_vector >= 0.
        state_vector *= mask
        # We're outputting to CPU, but we need this variable on GPU for the
        # backward pass.
        mask = self.ops.asarray(mask)

        def backprop_relu(d_best):
            d_best *= mask
            return d_best.reshape((d_best.shape + (1,)))

        return state_vector, backprop_relu

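The `precompute_hiddens` docstring above explains the saving: multiply every token vector by the hidden weights once, then build each state vector by gathering and summing nF cached rows. A NumPy sketch showing that the cached sum reproduces the direct computation (shapes and names are assumptions, not the spaCy layout):

```python
import numpy as np


def precompute(tokvecs, W):
    """Cache per-token contributions: cached[n, f] = tokvecs[n] @ W[f],
    one (nI, nH) weight slice per feature slot f."""
    return np.einsum("ni,fih->nfh", tokvecs, W)


def state_vector(cached, token_ids):
    """Sum the cached contributions for one state's context tokens."""
    nF = cached.shape[1]
    return sum(cached[token_ids[f], f] for f in range(nF))
```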
cdef inline int _arg_max(const float* scores, const int n_classes) nogil:
    if n_classes == 2:
        return 0 if scores[0] > scores[1] else 1
    cdef int i
    cdef int best = 0
    cdef float mode = scores[0]
    for i in range(1, n_classes):
        if scores[i] > mode:
            mode = scores[i]
            best = i
    return best

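For reference, a pure-Python equivalent of the general loop in `_arg_max` (note that the two-class fast path above breaks an exact tie toward class 1, while the general loop keeps the earlier index):

```python
def arg_max(scores):
    """Index of the largest score; ties resolve to the earlier class."""
    best, best_score = 0, scores[0]
    for i, score in enumerate(scores[1:], start=1):
        if score > best_score:
            best, best_score = i, score
    return best
```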
@@ -9,7 +9,7 @@ from thinc.util import partial
 from ..attrs import ORTH
 from ..errors import Errors, Warnings
 from ..tokens import Doc
-from ..vectors import Mode
+from ..vectors import Mode, Vectors
 from ..vocab import Vocab


@@ -48,11 +48,14 @@ def forward(
     key_attr: int = getattr(vocab.vectors, "attr", ORTH)
     keys = model.ops.flatten([cast(Ints1d, doc.to_array(key_attr)) for doc in docs])
     W = cast(Floats2d, model.ops.as_contig(model.get_param("W")))
-    if vocab.vectors.mode == Mode.default:
+    if isinstance(vocab.vectors, Vectors) and vocab.vectors.mode == Mode.default:
         V = model.ops.asarray(vocab.vectors.data)
         rows = vocab.vectors.find(keys=keys)
         V = model.ops.as_contig(V[rows])
-    elif vocab.vectors.mode == Mode.floret:
+    elif isinstance(vocab.vectors, Vectors) and vocab.vectors.mode == Mode.floret:
+        V = vocab.vectors.get_batch(keys)
+        V = model.ops.as_contig(V)
+    elif hasattr(vocab.vectors, "get_batch"):
         V = vocab.vectors.get_batch(keys)
         V = model.ops.as_contig(V)
     else:
@@ -61,7 +64,7 @@ def forward(
         vectors_data = model.ops.gemm(V, W, trans2=True)
     except ValueError:
         raise RuntimeError(Errors.E896)
-    if vocab.vectors.mode == Mode.default:
+    if isinstance(vocab.vectors, Vectors) and vocab.vectors.mode == Mode.default:
         # Convert negative indices to 0-vectors
         # TODO: more options for UNK tokens
         vectors_data[rows < 0] = 0

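The hunks above wrap the mode-specific branches in `isinstance(vocab.vectors, Vectors)` checks and add a duck-typed `get_batch` fallback, so user-provided vector stores that are not `Vectors` instances keep working. A self-contained sketch of that dispatch with stand-in classes (not the real spaCy API):

```python
class FakeVectors:
    """Stand-in for the spaCy Vectors table (illustrative, not the real API)."""
    mode = "default"

    def __init__(self, data):
        self.data = data

    def find(self, keys):
        return list(keys)


class CustomStore:
    """A user-provided store that only implements get_batch()."""

    def get_batch(self, keys):
        return [[float(k)] for k in keys]


def batch_vectors(vectors, keys):
    # Mirrors the dispatch added in the diff: known Vectors tables are
    # handled per mode; anything else is duck-typed via get_batch().
    if isinstance(vectors, FakeVectors) and vectors.mode == "default":
        rows = vectors.find(keys)
        return [vectors.data[r] for r in rows]
    elif hasattr(vectors, "get_batch"):
        return vectors.get_batch(keys)
    raise ValueError("unsupported vectors table")
```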
@@ -1,28 +0,0 @@
from libc.stdint cimport int8_t


cdef struct SizesC:
    int states
    int classes
    int hiddens
    int pieces
    int feats
    int embed_width
    int tokens


cdef struct WeightsC:
    const float* feat_weights
    const float* feat_bias
    const float* hidden_bias
    const float* hidden_weights
    const int8_t* seen_mask


cdef struct ActivationsC:
    int* token_ids
    float* unmaxed
    float* hiddens
    int* is_valid
    int _curr_size
    int _max_size
51
spacy/ml/tb_framework.py
Normal file
@@ -0,0 +1,51 @@
from thinc.api import Model, noop

from ..util import registry
from .parser_model import ParserStepModel


@registry.layers("spacy.TransitionModel.v1")
def TransitionModel(
    tok2vec, lower, upper, resize_output, dropout=0.2, unseen_classes=set()
):
    """Set up a stepwise transition-based model"""
    if upper is None:
        has_upper = False
        upper = noop()
    else:
        has_upper = True
    # don't define nO for this object, because we can't dynamically change it
    return Model(
        name="parser_model",
        forward=forward,
        dims={"nI": tok2vec.maybe_get_dim("nI")},
        layers=[tok2vec, lower, upper],
        refs={"tok2vec": tok2vec, "lower": lower, "upper": upper},
        init=init,
        attrs={
            "has_upper": has_upper,
            "unseen_classes": set(unseen_classes),
            "resize_output": resize_output,
        },
    )


def forward(model, X, is_train):
    step_model = ParserStepModel(
        X,
        model.layers,
        unseen_classes=model.attrs["unseen_classes"],
        train=is_train,
        has_upper=model.attrs["has_upper"],
    )

    return step_model, step_model.finish_steps


def init(model, X=None, Y=None):
    model.get_ref("tok2vec").initialize(X=X)
    lower = model.get_ref("lower")
    lower.initialize()
    if model.attrs["has_upper"]:
        statevecs = model.ops.alloc2f(2, lower.get_dim("nO"))
        model.get_ref("upper").initialize(X=statevecs)
@@ -1,641 +0,0 @@
# cython: infer_types=True, cdivision=True, boundscheck=False
from typing import Any, List, Optional, Tuple, cast

from libc.stdlib cimport calloc, free, realloc
from libc.string cimport memcpy, memset
from libcpp.vector cimport vector

import numpy

cimport numpy as np

from thinc.api import (
    Linear,
    Model,
    NumpyOps,
    chain,
    glorot_uniform_init,
    list2array,
    normal_init,
    uniform_init,
    zero_init,
)

from thinc.backends.cblas cimport CBlas, saxpy, sgemm

from thinc.types import Floats2d, Floats3d, Floats4d, Ints1d, Ints2d

from ..errors import Errors
from ..pipeline._parser_internals import _beam_utils
from ..pipeline._parser_internals.batch import GreedyBatch

from ..pipeline._parser_internals._parser_utils cimport arg_max
from ..pipeline._parser_internals.stateclass cimport StateC, StateClass
from ..pipeline._parser_internals.transition_system cimport (
    TransitionSystem,
    c_apply_actions,
    c_transition_batch,
)

from ..tokens.doc import Doc
from ..util import registry

State = Any  # TODO


@registry.layers("spacy.TransitionModel.v2")
def TransitionModel(
    *,
    tok2vec: Model[List[Doc], List[Floats2d]],
    beam_width: int = 1,
    beam_density: float = 0.0,
    state_tokens: int,
    hidden_width: int,
    maxout_pieces: int,
    nO: Optional[int] = None,
    unseen_classes=set(),
) -> Model[Tuple[List[Doc], TransitionSystem], List[Tuple[State, List[Floats2d]]]]:
    """Set up a transition-based parsing model, using a maxout hidden
    layer and a linear output layer.
    """
    t2v_width = tok2vec.get_dim("nO") if tok2vec.has_dim("nO") else None
    tok2vec_projected = chain(tok2vec, list2array(), Linear(hidden_width, t2v_width))  # type: ignore
    tok2vec_projected.set_dim("nO", hidden_width)

    # FIXME: we use `output` as a container for the output layer's
    # weights and biases. Thinc optimizers cannot handle resizing
    # of parameters. So, when the parser model is resized, we
    # construct a new `output` layer, which has a different key in
    # the optimizer. Once the optimizer supports parameter resizing,
    # we can replace the `output` layer by `output_W` and `output_b`
    # parameters in this model.
    output = Linear(nO=None, nI=hidden_width, init_W=zero_init)

    return Model(
        name="parser_model",
        forward=forward,
        init=init,
        layers=[tok2vec_projected, output],
        refs={
            "tok2vec": tok2vec_projected,
            "output": output,
        },
        params={
            "hidden_W": None,  # Floats2d W for the hidden layer
            "hidden_b": None,  # Floats1d bias for the hidden layer
            "hidden_pad": None,  # Floats1d padding for the hidden layer
        },
        dims={
            "nO": None,  # Output size
            "nP": maxout_pieces,
            "nH": hidden_width,
            "nI": tok2vec_projected.maybe_get_dim("nO"),
            "nF": state_tokens,
        },
        attrs={
            "beam_width": beam_width,
            "beam_density": beam_density,
            "unseen_classes": set(unseen_classes),
            "resize_output": resize_output,
        },
    )


def resize_output(model: Model, new_nO: int) -> Model:
    old_nO = model.maybe_get_dim("nO")
    output = model.get_ref("output")
    if old_nO is None:
        model.set_dim("nO", new_nO)
        output.set_dim("nO", new_nO)
        output.initialize()
        return model
    elif new_nO <= old_nO:
        return model
    elif output.has_param("W"):
        nH = model.get_dim("nH")
        new_output = Linear(nO=new_nO, nI=nH, init_W=zero_init)
        new_output.initialize()
        new_W = new_output.get_param("W")
        new_b = new_output.get_param("b")
        old_W = output.get_param("W")
        old_b = output.get_param("b")
        new_W[:old_nO] = old_W  # type: ignore
        new_b[:old_nO] = old_b  # type: ignore
        for i in range(old_nO, new_nO):
            model.attrs["unseen_classes"].add(i)
        model.layers[-1] = new_output
        model.set_ref("output", new_output)
        # TODO: Avoid this private intrusion
        model._dims["nO"] = new_nO
    return model


def init(
    model,
    X: Optional[Tuple[List[Doc], TransitionSystem]] = None,
    Y: Optional[Tuple[List[State], List[Floats2d]]] = None,
):
    if X is not None:
        docs, _ = X
        model.get_ref("tok2vec").initialize(X=docs)
    else:
        model.get_ref("tok2vec").initialize()
    inferred_nO = _infer_nO(Y)
    if inferred_nO is not None:
        current_nO = model.maybe_get_dim("nO")
        if current_nO is None or current_nO != inferred_nO:
            model.attrs["resize_output"](model, inferred_nO)
    nP = model.get_dim("nP")
    nH = model.get_dim("nH")
    nI = model.get_dim("nI")
    nF = model.get_dim("nF")
    ops = model.ops

    Wl = ops.alloc2f(nH * nP, nF * nI)
    bl = ops.alloc1f(nH * nP)
    padl = ops.alloc1f(nI)
    # Wl = zero_init(ops, Wl.shape)
    Wl = glorot_uniform_init(ops, Wl.shape)
    padl = uniform_init(ops, padl.shape)  # type: ignore
    # TODO: Experiment with whether better to initialize output_W
    model.set_param("hidden_W", Wl)
    model.set_param("hidden_b", bl)
    model.set_param("hidden_pad", padl)
    # model = _lsuv_init(model)
    return model


class TransitionModelInputs:
    """
    Input to transition model.
    """

    # dataclass annotation is not yet supported in Cython 0.29.x,
    # so, we'll do something close to it.

    actions: Optional[List[Ints1d]]
    docs: List[Doc]
    max_moves: int
    moves: TransitionSystem
    states: Optional[List[State]]

    __slots__ = [
        "actions",
        "docs",
        "max_moves",
        "moves",
        "states",
    ]

    def __init__(
        self,
        docs: List[Doc],
        moves: TransitionSystem,
        actions: Optional[List[Ints1d]] = None,
        max_moves: int = 0,
        states: Optional[List[State]] = None,
    ):
        """
        actions (Optional[List[Ints1d]]): actions to apply for each Doc.
        docs (List[Doc]): Docs to predict transition sequences for.
        max_moves (int): the maximum number of moves to apply, values less
            than 1 will apply moves to states until they are final states.
        moves (TransitionSystem): the transition system to use when predicting
            the transition sequences.
        states (Optional[List[States]]): the initial states to predict the
            transition sequences for. When absent, the initial states are
            initialized from the provided Docs.
        """
        self.actions = actions
        self.docs = docs
        self.moves = moves
        self.max_moves = max_moves
        self.states = states


def forward(model, inputs: TransitionModelInputs, is_train: bool):
    docs = inputs.docs
    moves = inputs.moves
    actions = inputs.actions

    beam_width = model.attrs["beam_width"]
    hidden_pad = model.get_param("hidden_pad")
    tok2vec = model.get_ref("tok2vec")

    states = moves.init_batch(docs) if inputs.states is None else inputs.states
    tokvecs, backprop_tok2vec = tok2vec(docs, is_train)
    tokvecs = model.ops.xp.vstack((tokvecs, hidden_pad))
    feats, backprop_feats = _forward_precomputable_affine(model, tokvecs, is_train)
    seen_mask = _get_seen_mask(model)

    if not is_train and beam_width == 1 and isinstance(model.ops, NumpyOps):
        # Note: max_moves is only used during training, so we don't need to
        # pass it to the greedy inference path.
        return _forward_greedy_cpu(model, moves, states, feats, seen_mask, actions=actions)
    else:
        return _forward_fallback(model, moves, states, tokvecs, backprop_tok2vec,
                                 feats, backprop_feats, seen_mask, is_train, actions=actions,
                                 max_moves=inputs.max_moves)


def _forward_greedy_cpu(model: Model, TransitionSystem moves, states: List[StateClass], np.ndarray feats,
                        np.ndarray[np.npy_bool, ndim = 1] seen_mask, actions: Optional[List[Ints1d]] = None):
    cdef vector[StateC*] c_states
    cdef StateClass state
    for state in states:
        if not state.is_final():
            c_states.push_back(state.c)
    weights = _get_c_weights(model, <float*>feats.data, seen_mask)
    # Precomputed features have rows for each token, plus one for padding.
    cdef int n_tokens = feats.shape[0] - 1
    sizes = _get_c_sizes(model, c_states.size(), n_tokens)
    cdef CBlas cblas = model.ops.cblas()
    scores = _parse_batch(cblas, moves, &c_states[0], weights, sizes, actions=actions)

    def backprop(dY):
        raise ValueError(Errors.E4004)

    return (states, scores), backprop


cdef list _parse_batch(CBlas cblas, TransitionSystem moves, StateC** states,
                       WeightsC weights, SizesC sizes, actions: Optional[List[Ints1d]]=None):
    cdef int i
    cdef vector[StateC *] unfinished
    cdef ActivationsC activations = _alloc_activations(sizes)
    cdef np.ndarray step_scores
    cdef np.ndarray step_actions

    scores = []
    while sizes.states >= 1 and (actions is None or len(actions) > 0):
        step_scores = numpy.empty((sizes.states, sizes.classes), dtype="f")
        step_actions = actions[0] if actions is not None else None
        assert step_actions is None or step_actions.size == sizes.states, \
            f"number of step actions ({step_actions.size}) must equal number of states ({sizes.states})"
        with nogil:
            _predict_states(cblas, &activations, <float*>step_scores.data, states, &weights, sizes)
        if actions is None:
            # Validate actions, argmax, take action.
            c_transition_batch(moves, states, <const float*>step_scores.data, sizes.classes,
                               sizes.states)
        else:
            c_apply_actions(moves, states, <const int*>step_actions.data, sizes.states)
        for i in range(sizes.states):
            if not states[i].is_final():
                unfinished.push_back(states[i])
        for i in range(unfinished.size()):
            states[i] = unfinished[i]
        sizes.states = unfinished.size()
        scores.append(step_scores)
        unfinished.clear()
        actions = actions[1:] if actions is not None else None
    _free_activations(&activations)

    return scores


def _forward_fallback(
    model: Model,
    moves: TransitionSystem,
    states: List[StateClass],
    tokvecs, backprop_tok2vec,
    feats,
    backprop_feats,
    seen_mask,
    is_train: bool,
    actions: Optional[List[Ints1d]] = None,
    max_moves: int = 0,
):
    nF = model.get_dim("nF")
    output = model.get_ref("output")
    hidden_b = model.get_param("hidden_b")
    nH = model.get_dim("nH")
    nP = model.get_dim("nP")

    beam_width = model.attrs["beam_width"]
    beam_density = model.attrs["beam_density"]

    ops = model.ops

    all_ids = []
    all_which = []
    all_statevecs = []
    all_scores = []
    if beam_width == 1:
        batch = GreedyBatch(moves, states, None)
    else:
        batch = _beam_utils.BeamBatch(
            moves, states, None, width=beam_width, density=beam_density
        )
    arange = ops.xp.arange(nF)
    n_moves = 0
    while not batch.is_done:
        ids = numpy.zeros((len(batch.get_unfinished_states()), nF), dtype="i")
        for i, state in enumerate(batch.get_unfinished_states()):
            state.set_context_tokens(ids, i, nF)
        # Sum the state features, add the bias and apply the activation (maxout)
        # to create the state vectors.
        preacts2f = feats[ids, arange].sum(axis=1)  # type: ignore
        preacts2f += hidden_b
        preacts = ops.reshape3f(preacts2f, preacts2f.shape[0], nH, nP)
        assert preacts.shape[0] == len(batch.get_unfinished_states()), preacts.shape
        statevecs, which = ops.maxout(preacts)
        # We don't use output's backprop, since we want to backprop for
        # all states at once, rather than a single state.
        scores = output.predict(statevecs)
        scores[:, seen_mask] = ops.xp.nanmin(scores)
        # Transition the states, filtering out any that are finished.
        cpu_scores = ops.to_numpy(scores)
        if actions is None:
            batch.advance(cpu_scores)
        else:
            batch.advance_with_actions(actions[0])
            actions = actions[1:]
        all_scores.append(scores)
        if is_train:
            # Remember intermediate results for the backprop.
            all_ids.append(ids)
            all_statevecs.append(statevecs)
            all_which.append(which)
        if n_moves >= max_moves >= 1:
            break
        n_moves += 1

    def backprop_parser(d_states_d_scores):
        ids = ops.xp.vstack(all_ids)
        which = ops.xp.vstack(all_which)
        statevecs = ops.xp.vstack(all_statevecs)
        _, d_scores = d_states_d_scores
        if model.attrs.get("unseen_classes"):
            # If we have a negative gradient (i.e. the probability should
            # increase) on any classes we filtered out as unseen, mark
            # them as seen.
            for clas in set(model.attrs["unseen_classes"]):
                if (d_scores[:, clas] < 0).any():
                    model.attrs["unseen_classes"].remove(clas)
        d_scores *= seen_mask == False  # no-cython-lint
        # Calculate the gradients for the parameters of the output layer.
        # The weight gemm is (nS, nO) @ (nS, nH).T
        output.inc_grad("b", d_scores.sum(axis=0))
        output.inc_grad("W", ops.gemm(d_scores, statevecs, trans1=True))
        # Now calculate d_statevecs, by backproping through the output linear layer.
        # This gemm is (nS, nO) @ (nO, nH)
        output_W = output.get_param("W")
        d_statevecs = ops.gemm(d_scores, output_W)
        # Backprop through the maxout activation
        d_preacts = ops.backprop_maxout(d_statevecs, which, nP)
        d_preacts2f = ops.reshape2f(d_preacts, d_preacts.shape[0], nH * nP)
        model.inc_grad("hidden_b", d_preacts2f.sum(axis=0))
        # We don't need to backprop the summation, because we pass back the IDs instead
        d_state_features = backprop_feats((d_preacts2f, ids))
        d_tokvecs = ops.alloc2f(tokvecs.shape[0], tokvecs.shape[1])
        ops.scatter_add(d_tokvecs, ids, d_state_features)
        model.inc_grad("hidden_pad", d_tokvecs[-1])
        return (backprop_tok2vec(d_tokvecs[:-1]), None)

    return (list(batch), all_scores), backprop_parser


def _get_seen_mask(model: Model) -> numpy.array[bool, 1]:
    mask = model.ops.xp.zeros(model.get_dim("nO"), dtype="bool")
    for class_ in model.attrs.get("unseen_classes", set()):
        mask[class_] = True
    return mask


def _forward_precomputable_affine(model, X: Floats2d, is_train: bool):
    W: Floats2d = model.get_param("hidden_W")
    nF = model.get_dim("nF")
    nH = model.get_dim("nH")
    nP = model.get_dim("nP")
    nI = model.get_dim("nI")
    # The weights start out (nH * nP, nF * nI). Transpose and reshape to (nF * nH * nP, nI)
    W3f = model.ops.reshape3f(W, nH * nP, nF, nI)
    W3f = W3f.transpose((1, 0, 2))
    W2f = model.ops.reshape2f(W3f, nF * nH * nP, nI)
    assert X.shape == (X.shape[0], nI), X.shape
    Yf_ = model.ops.gemm(X, W2f, trans2=True)
    Yf = model.ops.reshape3f(Yf_, Yf_.shape[0], nF, nH * nP)

    def backward(dY_ids: Tuple[Floats3d, Ints2d]):
        # This backprop is particularly tricky, because we get back a different
        # thing from what we put out. We put out an array of shape:
        # (nB, nF, nH, nP), and get back:
        # (nB, nH, nP) and ids (nB, nF)
        # The ids tell us the values of nF, so we would have:
        #
        # dYf = zeros((nB, nF, nH, nP))
        # for b in range(nB):
        #     for f in range(nF):
        #         dYf[b, ids[b, f]] += dY[b]
        #
        # However, we avoid building that array for efficiency -- and just pass
        # in the indices.
        dY, ids = dY_ids
        dXf = model.ops.gemm(dY, W)
        Xf = X[ids].reshape((ids.shape[0], -1))
        dW = model.ops.gemm(dY, Xf, trans1=True)
        model.inc_grad("hidden_W", dW)
        return model.ops.reshape3f(dXf, dXf.shape[0], nF, nI)

    return Yf, backward


def _infer_nO(Y: Optional[Tuple[List[State], List[Floats2d]]]) -> Optional[int]:
    if Y is None:
        return None
    _, scores = Y
    if len(scores) == 0:
        return None
    assert scores[0].shape[0] >= 1
    assert len(scores[0].shape) == 2
    return scores[0].shape[1]
|
||||||
def _lsuv_init(model: Model):
|
|
||||||
"""This is like the 'layer sequential unit variance', but instead
|
|
||||||
of taking the actual inputs, we randomly generate whitened data.
|
|
||||||
|
|
||||||
Why's this all so complicated? We have a huge number of inputs,
|
|
||||||
and the maxout unit makes guessing the dynamics tricky. Instead
|
|
||||||
we set the maxout weights to values that empirically result in
|
|
||||||
whitened outputs given whitened inputs.
|
|
||||||
"""
|
|
||||||
W = model.maybe_get_param("hidden_W")
|
|
||||||
if W is not None and W.any():
|
|
||||||
return
|
|
||||||
|
|
||||||
nF = model.get_dim("nF")
|
|
||||||
nH = model.get_dim("nH")
|
|
||||||
nP = model.get_dim("nP")
|
|
||||||
nI = model.get_dim("nI")
|
|
||||||
W = model.ops.alloc4f(nF, nH, nP, nI)
|
|
||||||
b = model.ops.alloc2f(nH, nP)
|
|
||||||
pad = model.ops.alloc4f(1, nF, nH, nP)
|
|
||||||
|
|
||||||
ops = model.ops
|
|
||||||
W = normal_init(ops, W.shape, mean=float(ops.xp.sqrt(1.0 / nF * nI)))
|
|
||||||
pad = normal_init(ops, pad.shape, mean=1.0)
|
|
||||||
model.set_param("W", W)
|
|
||||||
model.set_param("b", b)
|
|
||||||
model.set_param("pad", pad)
|
|
||||||
|
|
||||||
ids = ops.alloc_f((5000, nF), dtype="f")
|
|
||||||
ids += ops.xp.random.uniform(0, 1000, ids.shape)
|
|
||||||
ids = ops.asarray(ids, dtype="i")
|
|
||||||
tokvecs = ops.alloc_f((5000, nI), dtype="f")
|
|
||||||
tokvecs += ops.xp.random.normal(loc=0.0, scale=1.0, size=tokvecs.size).reshape(
|
|
||||||
tokvecs.shape
|
|
||||||
)
|
|
||||||
|
|
||||||
def predict(ids, tokvecs):
|
|
||||||
# nS ids. nW tokvecs. Exclude the padding array.
|
|
||||||
hiddens, _ = _forward_precomputable_affine(model, tokvecs[:-1], False)
|
|
||||||
vectors = model.ops.alloc2f(ids.shape[0], nH * nP)
|
|
||||||
# need nS vectors
|
|
||||||
hiddens = hiddens.reshape((hiddens.shape[0] * nF, nH * nP))
|
|
||||||
model.ops.scatter_add(vectors, ids.flatten(), hiddens)
|
|
||||||
vectors3f = model.ops.reshape3f(vectors, vectors.shape[0], nH, nP)
|
|
||||||
vectors3f += b
|
|
||||||
return model.ops.maxout(vectors3f)[0]
|
|
||||||
|
|
||||||
tol_var = 0.01
|
|
||||||
tol_mean = 0.01
|
|
||||||
t_max = 10
|
|
||||||
W = cast(Floats4d, model.get_param("hidden_W").copy())
|
|
||||||
b = cast(Floats2d, model.get_param("hidden_b").copy())
|
|
||||||
for t_i in range(t_max):
|
|
||||||
acts1 = predict(ids, tokvecs)
|
|
||||||
var = model.ops.xp.var(acts1)
|
|
||||||
mean = model.ops.xp.mean(acts1)
|
|
||||||
if abs(var - 1.0) >= tol_var:
|
|
||||||
W /= model.ops.xp.sqrt(var)
|
|
||||||
model.set_param("hidden_W", W)
|
|
||||||
elif abs(mean) >= tol_mean:
|
|
||||||
b -= mean
|
|
||||||
model.set_param("hidden_b", b)
|
|
||||||
else:
|
|
||||||
break
|
|
||||||
return model
|
|
||||||
|
|
||||||
|
|
||||||
cdef WeightsC _get_c_weights(model, const float* feats, np.ndarray[np.npy_bool, ndim=1] seen_mask) except *:
|
|
||||||
output = model.get_ref("output")
|
|
||||||
cdef np.ndarray hidden_b = model.get_param("hidden_b")
|
|
||||||
cdef np.ndarray output_W = output.get_param("W")
|
|
||||||
cdef np.ndarray output_b = output.get_param("b")
|
|
||||||
|
|
||||||
cdef WeightsC weights
|
|
||||||
weights.feat_weights = feats
|
|
||||||
weights.feat_bias = <const float*>hidden_b.data
|
|
||||||
weights.hidden_weights = <const float *> output_W.data
|
|
||||||
weights.hidden_bias = <const float *> output_b.data
|
|
||||||
weights.seen_mask = <const int8_t*> seen_mask.data
|
|
||||||
|
|
||||||
return weights
|
|
||||||
|
|
||||||
|
|
||||||
cdef SizesC _get_c_sizes(model, int batch_size, int tokens) except *:
|
|
||||||
cdef SizesC sizes
|
|
||||||
sizes.states = batch_size
|
|
||||||
sizes.classes = model.get_dim("nO")
|
|
||||||
sizes.hiddens = model.get_dim("nH")
|
|
||||||
sizes.pieces = model.get_dim("nP")
|
|
||||||
sizes.feats = model.get_dim("nF")
|
|
||||||
sizes.embed_width = model.get_dim("nI")
|
|
||||||
sizes.tokens = tokens
|
|
||||||
return sizes
|
|
||||||
|
|
||||||
|
|
||||||
cdef ActivationsC _alloc_activations(SizesC n) nogil:
|
|
||||||
cdef ActivationsC A
|
|
||||||
memset(&A, 0, sizeof(A))
|
|
||||||
_resize_activations(&A, n)
|
|
||||||
return A
|
|
||||||
|
|
||||||
|
|
||||||
cdef void _free_activations(const ActivationsC* A) nogil:
|
|
||||||
free(A.token_ids)
|
|
||||||
free(A.unmaxed)
|
|
||||||
free(A.hiddens)
|
|
||||||
free(A.is_valid)
|
|
||||||
|
|
||||||
|
|
||||||
cdef void _resize_activations(ActivationsC* A, SizesC n) nogil:
|
|
||||||
if n.states <= A._max_size:
|
|
||||||
A._curr_size = n.states
|
|
||||||
return
|
|
||||||
if A._max_size == 0:
|
|
||||||
A.token_ids = <int*>calloc(n.states * n.feats, sizeof(A.token_ids[0]))
|
|
||||||
A.unmaxed = <float*>calloc(n.states * n.hiddens * n.pieces, sizeof(A.unmaxed[0]))
|
|
||||||
A.hiddens = <float*>calloc(n.states * n.hiddens, sizeof(A.hiddens[0]))
|
|
||||||
A.is_valid = <int*>calloc(n.states * n.classes, sizeof(A.is_valid[0]))
|
|
||||||
A._max_size = n.states
|
|
||||||
else:
|
|
||||||
A.token_ids = <int*>realloc(A.token_ids,
|
|
||||||
n.states * n.feats * sizeof(A.token_ids[0]))
|
|
||||||
A.unmaxed = <float*>realloc(A.unmaxed,
|
|
||||||
n.states * n.hiddens * n.pieces * sizeof(A.unmaxed[0]))
|
|
||||||
A.hiddens = <float*>realloc(A.hiddens,
|
|
||||||
n.states * n.hiddens * sizeof(A.hiddens[0]))
|
|
||||||
A.is_valid = <int*>realloc(A.is_valid,
|
|
||||||
n.states * n.classes * sizeof(A.is_valid[0]))
|
|
||||||
A._max_size = n.states
|
|
||||||
A._curr_size = n.states
|
|
||||||
|
|
||||||
|
|
||||||
cdef void _predict_states(CBlas cblas, ActivationsC* A, float* scores, StateC** states, const WeightsC* W, SizesC n) nogil:
|
|
||||||
_resize_activations(A, n)
|
|
||||||
for i in range(n.states):
|
|
||||||
states[i].set_context_tokens(&A.token_ids[i*n.feats], n.feats)
|
|
||||||
memset(A.unmaxed, 0, n.states * n.hiddens * n.pieces * sizeof(float))
|
|
||||||
_sum_state_features(cblas, A.unmaxed, W.feat_weights, A.token_ids, n)
|
|
||||||
for i in range(n.states):
|
|
||||||
saxpy(cblas)(n.hiddens * n.pieces, 1., W.feat_bias, 1, &A.unmaxed[i*n.hiddens*n.pieces], 1)
|
|
||||||
for j in range(n.hiddens):
|
|
||||||
index = i * n.hiddens * n.pieces + j * n.pieces
|
|
||||||
which = arg_max(&A.unmaxed[index], n.pieces)
|
|
||||||
A.hiddens[i*n.hiddens + j] = A.unmaxed[index + which]
|
|
||||||
if W.hidden_weights == NULL:
|
|
||||||
memcpy(scores, A.hiddens, n.states * n.classes * sizeof(float))
|
|
||||||
else:
|
|
||||||
# Compute hidden-to-output
|
|
||||||
sgemm(cblas)(False, True, n.states, n.classes, n.hiddens,
|
|
||||||
1.0, <const float *>A.hiddens, n.hiddens,
|
|
||||||
<const float *>W.hidden_weights, n.hiddens,
|
|
||||||
0.0, scores, n.classes)
|
|
||||||
# Add bias
|
|
||||||
for i in range(n.states):
|
|
||||||
saxpy(cblas)(n.classes, 1., W.hidden_bias, 1, &scores[i*n.classes], 1)
|
|
||||||
# Set unseen classes to minimum value
|
|
||||||
i = 0
|
|
||||||
min_ = scores[0]
|
|
||||||
for i in range(1, n.states * n.classes):
|
|
||||||
if scores[i] < min_:
|
|
||||||
min_ = scores[i]
|
|
||||||
for i in range(n.states):
|
|
||||||
for j in range(n.classes):
|
|
||||||
if W.seen_mask[j]:
|
|
||||||
scores[i*n.classes+j] = min_
|
|
||||||
|
|
||||||
|
|
||||||
cdef void _sum_state_features(CBlas cblas, float* output, const float* cached,
|
|
||||||
const int* token_ids, SizesC n) nogil:
|
|
||||||
cdef int idx, b, f
|
|
||||||
cdef const float* feature
|
|
||||||
cdef int B = n.states
|
|
||||||
cdef int O = n.hiddens * n.pieces # no-cython-lint
|
|
||||||
cdef int F = n.feats
|
|
||||||
cdef int T = n.tokens
|
|
||||||
padding = cached + (T * F * O)
|
|
||||||
cdef int id_stride = F*O
|
|
||||||
cdef float one = 1.
|
|
||||||
for b in range(B):
|
|
||||||
for f in range(F):
|
|
||||||
if token_ids[f] < 0:
|
|
||||||
feature = &padding[f*O]
|
|
||||||
else:
|
|
||||||
idx = token_ids[f] * id_stride + f*O
|
|
||||||
feature = &cached[idx]
|
|
||||||
saxpy(cblas)(O, one, <const float*>feature, 1, &output[b*O], 1)
|
|
||||||
token_ids += F
|
|
|
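Editor's note: the `backward` comment in `_forward_precomputable_affine` above describes replacing an explicit per-feature gradient tensor with an ID-indexed scatter-add, as also done for `d_tokvecs` via `ops.scatter_add`. A minimal NumPy sketch of that equivalence (shapes and names here are illustrative, not spaCy's API):

```python
import numpy as np

nB, nF, nI = 4, 3, 5  # batch size, features per state, token vector width
n_tokens = 6
rng = np.random.default_rng(0)
ids = rng.integers(0, n_tokens, size=(nB, nF))   # token index per (state, feature)
d_feats = rng.standard_normal((nB, nF, nI))      # gradient w.r.t. gathered features

# Naive version: materialize the per-token gradient with a double loop.
d_tok_naive = np.zeros((n_tokens, nI))
for b in range(nB):
    for f in range(nF):
        d_tok_naive[ids[b, f]] += d_feats[b, f]

# Scatter-add version: one unbuffered indexed addition, same result.
d_tok = np.zeros((n_tokens, nI))
np.add.at(d_tok, ids.reshape(-1), d_feats.reshape(-1, nI))

assert np.allclose(d_tok, d_tok_naive)
```

`np.add.at` is used rather than `d_tok[ids] += ...` because repeated indices must each contribute; plain fancy-index assignment would keep only the last write per token.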
@@ -1,4 +1,5 @@
 # cython: infer_types
+# cython: profile=False
 import warnings
 from typing import Dict, List, Optional, Tuple, Union
@@ -1,4 +1,4 @@
+# cython: profile=False
 IDS = {
     "": NO_TAG,
     "ADJ": ADJ,
@@ -21,6 +21,7 @@ from .trainable_pipe import TrainablePipe
 __all__ = [
     "AttributeRuler",
     "DependencyParser",
+    "EditTreeLemmatizer",
     "EntityLinker",
     "EntityRecognizer",
     "Morphologizer",
@@ -1,4 +1,5 @@
 # cython: infer_types=True, binding=True
+# cython: profile=False
 from cython.operator cimport dereference as deref
 from libc.stdint cimport UINT32_MAX, uint32_t
 from libc.string cimport memset
@@ -1,8 +1,12 @@
 from collections import defaultdict
 from typing import Any, Dict, List, Union
 
-from pydantic import BaseModel, Field, ValidationError
-from pydantic.types import StrictBool, StrictInt, StrictStr
+try:
+    from pydantic.v1 import BaseModel, Field, ValidationError
+    from pydantic.v1.types import StrictBool, StrictInt, StrictStr
+except ImportError:
+    from pydantic import BaseModel, Field, ValidationError  # type: ignore
+    from pydantic.types import StrictBool, StrictInt, StrictStr  # type: ignore
 
 
 class MatchNodeSchema(BaseModel):
@@ -1,13 +1,10 @@
 # cython: infer_types=True
-# cython: profile=True
 import numpy
 
 from ...typedefs cimport class_t
 from .transition_system cimport Transition, TransitionSystem
 
 from ...errors import Errors
 
-from .batch cimport Batch
 from .search cimport Beam, MaxViolation
 
 from .search import MaxViolation
@@ -29,7 +26,7 @@ cdef int check_final_state(void* _state, void* extra_args) except -1:
     return state.is_final()
 
 
-cdef class BeamBatch(Batch):
+cdef class BeamBatch(object):
     cdef public TransitionSystem moves
     cdef public object states
     cdef public object docs
@@ -1,2 +0,0 @@
-cdef int arg_max(const float* scores, const int n_classes) nogil
-cdef int arg_max_if_valid(const float* scores, const int* is_valid, int n) nogil
@@ -1,22 +0,0 @@
-# cython: infer_types=True
-
-cdef inline int arg_max(const float* scores, const int n_classes) nogil:
-    if n_classes == 2:
-        return 0 if scores[0] > scores[1] else 1
-    cdef int i
-    cdef int best = 0
-    cdef float mode = scores[0]
-    for i in range(1, n_classes):
-        if scores[i] > mode:
-            mode = scores[i]
-            best = i
-    return best
-
-
-cdef inline int arg_max_if_valid(const float* scores, const int* is_valid, int n) nogil:
-    cdef int best = -1
-    for i in range(n):
-        if is_valid[i] >= 1:
-            if best == -1 or scores[i] > scores[best]:
-                best = i
-    return best
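Editor's note: the deleted helper above is a masked argmax over the action scores, skipping actions the transition system marked invalid. A pure-Python equivalent, for illustration only:

```python
def arg_max_if_valid(scores, is_valid):
    """Index of the highest score among valid entries, or -1 if none are valid."""
    best = -1
    for i, (score, ok) in enumerate(zip(scores, is_valid)):
        if ok and (best == -1 or score > scores[best]):
            best = i
    return best

print(arg_max_if_valid([0.1, 0.9, 0.5], [1, 0, 1]))  # index 2: 0.9 is masked out
print(arg_max_if_valid([0.1, 0.9], [0, 0]))          # -1: nothing is valid
```

The `-1` sentinel is what the callers in `c_transition_batch` check for before forcing a state to its final configuration.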
@@ -1,4 +1,5 @@
 cimport libcpp
+from cpython.exc cimport PyErr_CheckSignals, PyErr_SetFromErrno
 from cython.operator cimport dereference as deref
 from cython.operator cimport preincrement as incr
 from libc.stdint cimport uint32_t, uint64_t
@@ -26,7 +27,7 @@ cdef struct ArcC:
 
 
 cdef cppclass StateC:
-    vector[int] _heads
+    int* _heads
     const TokenC* _sent
     vector[int] _stack
     vector[int] _rebuffer
@@ -34,34 +35,31 @@ cdef cppclass StateC:
     unordered_map[int, vector[ArcC]] _left_arcs
     unordered_map[int, vector[ArcC]] _right_arcs
     vector[libcpp.bool] _unshiftable
-    vector[int] history
     set[int] _sent_starts
     TokenC _empty_token
     int length
    int offset
     int _b_i
 
-    __init__(const TokenC* sent, int length) nogil except +:
-        this._heads.resize(length, -1)
-        this._unshiftable.resize(length, False)
-
-        # Reserve memory ahead of time to minimize allocations during parsing.
-        # The initial capacity set here ideally reflects the expected average-case/majority usage.
-        cdef int init_capacity = 32
-        this._stack.reserve(init_capacity)
-        this._rebuffer.reserve(init_capacity)
-        this._ents.reserve(init_capacity)
-        this._left_arcs.reserve(init_capacity)
-        this._right_arcs.reserve(init_capacity)
-        this.history.reserve(init_capacity)
-
+    __init__(const TokenC* sent, int length) nogil:
         this._sent = sent
+        this._heads = <int*>calloc(length, sizeof(int))
+        if not (this._sent and this._heads):
+            with gil:
+                PyErr_SetFromErrno(MemoryError)
+                PyErr_CheckSignals()
         this.offset = 0
         this.length = length
         this._b_i = 0
+        for i in range(length):
+            this._heads[i] = -1
+            this._unshiftable.push_back(0)
         memset(&this._empty_token, 0, sizeof(TokenC))
         this._empty_token.lex = &EMPTY_LEXEME
 
+    __dealloc__():
+        free(this._heads)
+
     void set_context_tokens(int* ids, int n) nogil:
         cdef int i, j
         if n == 1:
@@ -134,20 +132,19 @@ cdef cppclass StateC:
             ids[i] = -1
 
     int S(int i) nogil const:
-        cdef int stack_size = this._stack.size()
-        if i >= stack_size or i < 0:
+        if i >= this._stack.size():
             return -1
-        else:
-            return this._stack[stack_size - (i+1)]
+        elif i < 0:
+            return -1
+        return this._stack.at(this._stack.size() - (i+1))
 
     int B(int i) nogil const:
-        cdef int buf_size = this._rebuffer.size()
         if i < 0:
             return -1
-        elif i < buf_size:
-            return this._rebuffer[buf_size - (i+1)]
+        elif i < this._rebuffer.size():
+            return this._rebuffer.at(this._rebuffer.size() - (i+1))
         else:
-            b_i = this._b_i + (i - buf_size)
+            b_i = this._b_i + (i - this._rebuffer.size())
             if b_i >= this.length:
                 return -1
             else:
@@ -246,7 +243,7 @@ cdef cppclass StateC:
             return 0
         elif this._sent[word].sent_start == 1:
             return 1
-        elif this._sent_starts.const_find(word) != this._sent_starts.const_end():
+        elif this._sent_starts.count(word) >= 1:
             return 1
         else:
             return 0
@@ -330,7 +327,7 @@ cdef cppclass StateC:
         if item >= this._unshiftable.size():
             return 0
         else:
-            return this._unshiftable[item]
+            return this._unshiftable.at(item)
 
     void set_reshiftable(int item) nogil:
         if item < this._unshiftable.size():
@@ -350,9 +347,6 @@ cdef cppclass StateC:
         this._heads[child] = head
 
     void map_del_arc(unordered_map[int, vector[ArcC]]* heads_arcs, int h_i, int c_i) nogil:
-        cdef vector[ArcC]* arcs
-        cdef ArcC* arc
-
         arcs_it = heads_arcs.find(h_i)
         if arcs_it == heads_arcs.end():
             return
@@ -361,12 +355,12 @@ cdef cppclass StateC:
         if arcs.size() == 0:
             return
 
-        arc = &arcs.back()
+        arc = arcs.back()
         if arc.head == h_i and arc.child == c_i:
             arcs.pop_back()
         else:
             for i in range(arcs.size()-1):
-                arc = &deref(arcs)[i]
+                arc = arcs.at(i)
                 if arc.head == h_i and arc.child == c_i:
                     arc.head = -1
                     arc.child = -1
@@ -406,11 +400,10 @@ cdef cppclass StateC:
         this._rebuffer = src._rebuffer
         this._sent_starts = src._sent_starts
         this._unshiftable = src._unshiftable
-        this._heads = src._heads
+        memcpy(this._heads, src._heads, this.length * sizeof(this._heads[0]))
         this._ents = src._ents
         this._left_arcs = src._left_arcs
         this._right_arcs = src._right_arcs
         this._b_i = src._b_i
         this.offset = src.offset
         this._empty_token = src._empty_token
-        this.history = src.history
@@ -0,0 +1 @@
+# cython: profile=False
@@ -1,4 +1,4 @@
-# cython: profile=True, cdivision=True, infer_types=True
+# cython: cdivision=True, infer_types=True
 from cymem.cymem cimport Address, Pool
 from libc.stdint cimport int32_t
 from libcpp.vector cimport vector
@@ -779,8 +779,6 @@ cdef class ArcEager(TransitionSystem):
         return list(arcs)
 
     def has_gold(self, Example eg, start=0, end=None):
-        if end is not None and end < 0:
-            end = None
         for word in eg.y[start:end]:
             if word.dep != 0:
                 return True
@@ -865,7 +863,6 @@ cdef class ArcEager(TransitionSystem):
                     state.print_state()
                 )))
             action.do(state.c, action.label)
-            state.c.history.push_back(i)
             break
         else:
             failed = False
@@ -1,2 +0,0 @@
-cdef class Batch:
-    pass
@@ -1,52 +0,0 @@
-from typing import Any
-
-TransitionSystem = Any  # TODO
-
-cdef class Batch:
-    def advance(self, scores):
-        raise NotImplementedError
-
-    def get_states(self):
-        raise NotImplementedError
-
-    @property
-    def is_done(self):
-        raise NotImplementedError
-
-    def get_unfinished_states(self):
-        raise NotImplementedError
-
-    def __getitem__(self, i):
-        raise NotImplementedError
-
-    def __len__(self):
-        raise NotImplementedError
-
-
-class GreedyBatch(Batch):
-    def __init__(self, moves: TransitionSystem, states, golds):
-        self._moves = moves
-        self._states = states
-        self._next_states = [s for s in states if not s.is_final()]
-
-    def advance(self, scores):
-        self._next_states = self._moves.transition_states(self._next_states, scores)
-
-    def advance_with_actions(self, actions):
-        self._next_states = self._moves.apply_actions(self._next_states, actions)
-
-    def get_states(self):
-        return self._states
-
-    @property
-    def is_done(self):
-        return all(s.is_final() for s in self._states)
-
-    def get_unfinished_states(self):
-        return [st for st in self._states if not st.is_final()]
-
-    def __getitem__(self, i):
-        return self._states[i]
-
-    def __len__(self):
-        return len(self._states)
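Editor's note: the removed `GreedyBatch` above wraps a simple driver loop, advance every unfinished state by one transition per round until all states are final. A toy skeleton of that cycle (names and state class are hypothetical, not spaCy's API):

```python
class ToyState:
    """Stand-in for a parser state that finishes after a fixed number of steps."""

    def __init__(self, n_steps):
        self.n_steps = n_steps

    def is_final(self):
        return self.n_steps == 0

    def step(self):
        self.n_steps -= 1


def greedy_parse(states):
    # Mirror the advance()/get_unfinished_states() cycle: each round applies
    # one action to every state that has not yet reached a final configuration.
    rounds = 0
    while any(not s.is_final() for s in states):
        for s in states:
            if not s.is_final():
                s.step()
        rounds += 1
    return rounds


print(greedy_parse([ToyState(2), ToyState(3)]))  # 3: the longest state needs 3 rounds
```

In the real batch, `step` is replaced by scoring the states with the model and applying the argmax valid transition, which is why the class also exposes `advance_with_actions` for externally supplied action sequences.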
@@ -1,3 +1,4 @@
+# cython: profile=False
 from cymem.cymem cimport Pool
 from libcpp.memory cimport shared_ptr
 from libcpp.vector cimport vector
@@ -306,8 +307,6 @@ cdef class BiluoPushDown(TransitionSystem):
         for span in eg.y.spans.get(neg_key, []):
             if span.start >= start and span.end <= end:
                 return True
-        if end is not None and end < 0:
-            end = None
         for word in eg.y[start:end]:
             if word.ent_iob != 0:
                 return True
@@ -1,4 +1,4 @@
-# cython: profile=True, infer_types=True
+# cython: infer_types=True
 """Implements the projectivize/deprojectivize mechanism in Nivre & Nilsson 2005
 for doing pseudo-projective parsing implementation uses the HEAD decoration
 scheme.
@@ -1,4 +1,4 @@
-# cython: profile=True, experimental_cpp_class_def=True, cdivision=True, infer_types=True
+# cython: experimental_cpp_class_def=True, cdivision=True, infer_types=True
 cimport cython
 from cymem.cymem cimport Pool
 from libc.math cimport exp
@@ -1,4 +1,5 @@
 # cython: infer_types=True
+# cython: profile=False
 from libcpp.vector cimport vector
 
 from ...tokens.doc cimport Doc
@@ -19,10 +20,6 @@ cdef class StateClass:
         if self._borrowed != 1:
            del self.c
 
-    @property
-    def history(self):
-        return list(self.c.history)
-
     @property
     def stack(self):
         return [self.S(i) for i in range(self.c.stack_depth())]
@@ -32,7 +29,7 @@ cdef class StateClass:
         return [self.B(i) for i in range(self.c.buffer_length())]
 
     @property
-    def token_vector_lenth(self):
+    def token_vector_length(self):
         return self.doc.tensor.shape[1]
 
     @property
@@ -179,6 +176,3 @@ cdef class StateClass:
 
     def clone(self, StateClass src):
         self.c.clone(src.c)
-
-    def set_context_tokens(self, int[:, :] output, int row, int n_feats):
-        self.c.set_context_tokens(&output[row, 0], n_feats)
@@ -57,10 +57,3 @@ cdef class TransitionSystem:
 
     cdef int set_costs(self, int* is_valid, weight_t* costs,
                        const StateC* state, gold) except -1
-
-
-cdef void c_apply_actions(TransitionSystem moves, StateC** states, const int* actions,
-                          int batch_size) nogil
-
-cdef void c_transition_batch(TransitionSystem moves, StateC** states, const float* scores,
-                             int nr_class, int batch_size) nogil
@@ -1,17 +1,14 @@
 # cython: infer_types=True
+# cython: profile=False
 from __future__ import print_function
 
 from cymem.cymem cimport Pool
-from libc.stdlib cimport calloc, free
-from libcpp.vector cimport vector
 
 from collections import Counter
 
 import srsly
 
 from ...structs cimport TokenC
-from ...typedefs cimport attr_t, weight_t
-from ._parser_utils cimport arg_max_if_valid
 from .stateclass cimport StateClass
 
 from ... import util
@@ -76,18 +73,7 @@ cdef class TransitionSystem:
             offset += len(doc)
         return states
 
-    def follow_history(self, doc, history):
-        cdef int clas
-        cdef StateClass state = StateClass(doc)
-        for clas in history:
-            action = self.c[clas]
-            action.do(state.c, action.label)
-            state.c.history.push_back(clas)
-        return state
-
     def get_oracle_sequence(self, Example example, _debug=False):
-        if not self.has_gold(example):
-            return []
         states, golds, _ = self.init_gold_batch([example])
         if not states:
             return []
@@ -99,8 +85,6 @@ cdef class TransitionSystem:
         return self.get_oracle_sequence_from_state(state, gold)
 
     def get_oracle_sequence_from_state(self, StateClass state, gold, _debug=None):
-        if state.is_final():
-            return []
         cdef Pool mem = Pool()
         # n_moves should not be zero at this point, but make sure to avoid zero-length mem alloc
         assert self.n_moves > 0
@@ -126,7 +110,6 @@ cdef class TransitionSystem:
                     "S0 head?", str(state.has_head(state.S(0))),
                 )))
                 action.do(state.c, action.label)
-                state.c.history.push_back(i)
                 break
         else:
             if _debug:
@@ -154,28 +137,6 @@ cdef class TransitionSystem:
             raise ValueError(Errors.E170.format(name=name))
         action = self.lookup_transition(name)
         action.do(state.c, action.label)
-        state.c.history.push_back(action.clas)
-
-    def apply_actions(self, states, const int[::1] actions):
-        assert len(states) == actions.shape[0]
-        cdef StateClass state
-        cdef vector[StateC*] c_states
-        c_states.resize(len(states))
-        cdef int i
-        for (i, state) in enumerate(states):
-            c_states[i] = state.c
-        c_apply_actions(self, &c_states[0], &actions[0], actions.shape[0])
-        return [state for state in states if not state.c.is_final()]
-
-    def transition_states(self, states, float[:, ::1] scores):
-        assert len(states) == scores.shape[0]
-        cdef StateClass state
-        cdef float* c_scores = &scores[0, 0]
-        cdef vector[StateC*] c_states
-        for state in states:
-            c_states.push_back(state.c)
-        c_transition_batch(self, &c_states[0], c_scores, scores.shape[1], scores.shape[0])
-        return [state for state in states if not state.c.is_final()]
 
     cdef Transition lookup_transition(self, object name) except *:
         raise NotImplementedError
@@ -288,34 +249,3 @@ cdef class TransitionSystem:
         self.cfg.update(msg['cfg'])
         self.initialize_actions(labels)
         return self
-
-
-cdef void c_apply_actions(TransitionSystem moves, StateC** states, const int* actions,
-                          int batch_size) nogil:
-    cdef int i
-    cdef Transition action
-    cdef StateC* state
-    for i in range(batch_size):
-        state = states[i]
-        action = moves.c[actions[i]]
-        action.do(state, action.label)
-        state.history.push_back(action.clas)
-
-
-cdef void c_transition_batch(TransitionSystem moves, StateC** states, const float* scores,
-                             int nr_class, int batch_size) nogil:
-    is_valid = <int*>calloc(moves.n_moves, sizeof(int))
-    cdef int i, guess
-    cdef Transition action
-    for i in range(batch_size):
-        moves.set_valid(is_valid, states[i])
-        guess = arg_max_if_valid(&scores[i*nr_class], is_valid, nr_class)
-        if guess == -1:
-            # This shouldn't happen, but it's hard to raise an error here,
-            # and we don't want to infinite loop. So, force to end state.
-            states[i].force_final()
-        else:
-            action = moves.c[guess]
-            action.do(states[i], action.label)
-            states[i].history.push_back(guess)
-    free(is_valid)
@@ -1,9 +1,14 @@
-# cython: infer_types=True, profile=True, binding=True
+# cython: infer_types=True, binding=True
 from collections import defaultdict
 from typing import Callable, Optional
 
 from thinc.api import Config, Model
 
+from ._parser_internals.transition_system import TransitionSystem
+
+from ._parser_internals.arc_eager cimport ArcEager
+from .transition_parser cimport Parser
+
 from ..language import Language
 from ..scorer import Scorer
 from ..training import remove_bilu_prefix

@@ -17,11 +22,12 @@ from .transition_parser import Parser
 
 default_model_config = """
 [model]
-@architectures = "spacy.TransitionBasedParser.v3"
+@architectures = "spacy.TransitionBasedParser.v2"
 state_type = "parser"
 extra_state_tokens = false
 hidden_width = 64
 maxout_pieces = 2
+use_upper = true
 
 [model.tok2vec]
 @architectures = "spacy.HashEmbedCNN.v2"

@@ -227,7 +233,6 @@ def parser_score(examples, **kwargs):
 
     DOCS: https://spacy.io/api/dependencyparser#score
     """
-
     def has_sents(doc):
         return doc.has_annotation("SENT_START")
 

@@ -235,11 +240,8 @@ def parser_score(examples, **kwargs):
         dep = getattr(token, attr)
         dep = token.vocab.strings.as_string(dep).lower()
         return dep
 
     results = {}
-    results.update(
-        Scorer.score_spans(examples, "sents", has_annotation=has_sents, **kwargs)
-    )
+    results.update(Scorer.score_spans(examples, "sents", has_annotation=has_sents, **kwargs))
     kwargs.setdefault("getter", dep_getter)
     kwargs.setdefault("ignore_labels", ("p", "punct"))
     results.update(Scorer.score_deps(examples, "dep", **kwargs))

@@ -252,12 +254,11 @@ def make_parser_scorer():
     return parser_score
 
 
-class DependencyParser(Parser):
+cdef class DependencyParser(Parser):
     """Pipeline component for dependency parsing.
 
     DOCS: https://spacy.io/api/dependencyparser
     """
 
     TransitionSystem = ArcEager
 
     def __init__(

@@ -277,7 +278,8 @@ class DependencyParser(Parser):
         incorrect_spans_key=None,
         scorer=parser_score,
     ):
-        """Create a DependencyParser."""
+        """Create a DependencyParser.
+        """
         super().__init__(
             vocab,
             model,
@@ -5,7 +5,6 @@ from typing import Any, Callable, Dict, Iterable, List, Optional, Tuple, Union,
 import numpy as np
 import srsly
 from thinc.api import Config, Model, NumpyOps, SequenceCategoricalCrossentropy
-from thinc.legacy import LegacySequenceCategoricalCrossentropy
 from thinc.types import ArrayXd, Floats2d, Ints1d
 
 from .. import util

@@ -131,9 +130,7 @@ class EditTreeLemmatizer(TrainablePipe):
         self, examples: Iterable[Example], scores: List[Floats2d]
     ) -> Tuple[float, List[Floats2d]]:
         validate_examples(examples, "EditTreeLemmatizer.get_loss")
-        loss_func = LegacySequenceCategoricalCrossentropy(
-            normalize=False, missing_value=-1
-        )
+        loss_func = SequenceCategoricalCrossentropy(normalize=False, missing_value=-1)
 
         truths = []
         for eg in examples:

@@ -169,7 +166,7 @@ class EditTreeLemmatizer(TrainablePipe):
 
         DOCS: https://spacy.io/api/edittreelemmatizer#get_teacher_student_loss
         """
-        loss_func = LegacySequenceCategoricalCrossentropy(normalize=False)
+        loss_func = SequenceCategoricalCrossentropy(normalize=False)
         d_scores, loss = loss_func(student_scores, teacher_scores)
         if self.model.ops.xp.isnan(loss):
             raise ValueError(Errors.E910.format(name=self.name))
@@ -2,6 +2,7 @@ import warnings
 from pathlib import Path
 from typing import Any, Callable, Dict, Iterable, List, Optional, Tuple, Union
 
+import srsly
 from thinc.api import Model
 
 from .. import util

@@ -155,8 +156,24 @@ class Lemmatizer(Pipe):
         """
         required_tables, optional_tables = self.get_lookups_config(self.mode)
         if lookups is None:
-            logger.debug("Lemmatizer: loading tables from spacy-lookups-data")
-            lookups = load_lookups(lang=self.vocab.lang, tables=required_tables)
+            logger.debug(
+                "Lemmatizer: no lemmatizer lookups tables provided, "
+                "trying to load tables from registered lookups (usually "
+                "spacy-lookups-data)"
+            )
+            lookups = load_lookups(
+                lang=self.vocab.lang, tables=required_tables, strict=False
+            )
+            missing_tables = set(required_tables) - set(lookups.tables)
+            if len(missing_tables) > 0:
+                raise ValueError(
+                    Errors.E4010.format(
+                        missing_tables=list(missing_tables),
+                        pipe_name=self.name,
+                        required_tables=srsly.json_dumps(required_tables),
+                        tables=srsly.json_dumps(required_tables + optional_tables),
+                    )
+                )
             optional_lookups = load_lookups(
                 lang=self.vocab.lang, tables=optional_tables, strict=False
             )
@@ -1,9 +1,8 @@
-# cython: infer_types=True, profile=True, binding=True
+# cython: infer_types=True, binding=True
 from itertools import islice
 from typing import Callable, Dict, Iterable, Optional, Union
 
-from thinc.api import Config, Model
-from thinc.legacy import LegacySequenceCategoricalCrossentropy
+from thinc.api import Config, Model, SequenceCategoricalCrossentropy
 
 from ..morphology cimport Morphology
 from ..tokens.doc cimport Doc

@@ -296,8 +295,8 @@ class Morphologizer(Tagger):
         DOCS: https://spacy.io/api/morphologizer#get_loss
         """
         validate_examples(examples, "Morphologizer.get_loss")
-        loss_func = LegacySequenceCategoricalCrossentropy(names=self.labels, normalize=False,
-                                                          label_smoothing=self.cfg["label_smoothing"])
+        loss_func = SequenceCategoricalCrossentropy(names=self.labels, normalize=False,
+                                                    label_smoothing=self.cfg["label_smoothing"])
         truths = []
         for eg in examples:
             eg_truths = []
@@ -1,4 +1,4 @@
-# cython: infer_types=True, profile=True, binding=True
+# cython: infer_types=True, binding=True
 from collections import defaultdict
 from typing import Callable, Optional
 

@@ -10,15 +10,23 @@ from ..training import remove_bilu_prefix
 from ..util import registry
 from ._parser_internals.ner import BiluoPushDown
 from ._parser_internals.transition_system import TransitionSystem
-from .transition_parser import Parser
+
+from ._parser_internals.ner cimport BiluoPushDown
+from .transition_parser cimport Parser
+
+from ..language import Language
+from ..scorer import get_ner_prf
+from ..training import remove_bilu_prefix
+from ..util import registry
 
 default_model_config = """
 [model]
-@architectures = "spacy.TransitionBasedParser.v3"
+@architectures = "spacy.TransitionBasedParser.v2"
 state_type = "ner"
 extra_state_tokens = false
 hidden_width = 64
 maxout_pieces = 2
+use_upper = true
 
 [model.tok2vec]
 @architectures = "spacy.HashEmbedCNN.v2"

@@ -43,12 +51,8 @@ DEFAULT_NER_MODEL = Config().from_str(default_model_config)["model"]
         "incorrect_spans_key": None,
         "scorer": {"@scorers": "spacy.ner_scorer.v1"},
     },
-    default_score_weights={
-        "ents_f": 1.0,
-        "ents_p": 0.0,
-        "ents_r": 0.0,
-        "ents_per_type": None,
-    },
+    default_score_weights={"ents_f": 1.0, "ents_p": 0.0, "ents_r": 0.0, "ents_per_type": None},
 )
 def make_ner(
     nlp: Language,

@@ -115,12 +119,7 @@ def make_ner(
         "incorrect_spans_key": None,
         "scorer": None,
     },
-    default_score_weights={
-        "ents_f": 1.0,
-        "ents_p": 0.0,
-        "ents_r": 0.0,
-        "ents_per_type": None,
-    },
+    default_score_weights={"ents_f": 1.0, "ents_p": 0.0, "ents_r": 0.0, "ents_per_type": None},
 )
 def make_beam_ner(
     nlp: Language,

@@ -194,12 +193,11 @@ def make_ner_scorer():
     return ner_score
 
 
-class EntityRecognizer(Parser):
+cdef class EntityRecognizer(Parser):
     """Pipeline component for named entity recognition.
 
     DOCS: https://spacy.io/api/entityrecognizer
     """
 
     TransitionSystem = BiluoPushDown
 
     def __init__(

@@ -217,14 +215,15 @@ class EntityRecognizer(Parser):
         incorrect_spans_key=None,
         scorer=ner_score,
     ):
-        """Create an EntityRecognizer."""
+        """Create an EntityRecognizer.
+        """
         super().__init__(
             vocab,
             model,
             name,
             moves,
             update_with_oracle_cut_size=update_with_oracle_cut_size,
             min_action_freq=1,  # not relevant for NER
             learn_tokens=False,  # not relevant for NER
             beam_width=beam_width,
             beam_density=beam_density,
@@ -1,4 +1,4 @@
-# cython: infer_types=True, profile=True, binding=True
+# cython: infer_types=True, binding=True
 from typing import Callable, Dict, Iterable, Iterator, Tuple, Union
 
 import srsly
@@ -1,4 +1,4 @@
-# cython: infer_types=True, profile=True, binding=True
+# cython: infer_types=True, binding=True
 from typing import Callable, List, Optional
 
 import srsly
Some files were not shown because too many files have changed in this diff.