Compare commits

...

4488 Commits

Author SHA1 Message Date
Jeff Adolphe
41e07772dc
Added Haitian Creole (ht) Language Support to spaCy (#13807)
This PR adds official support for Haitian Creole (ht) to spaCy's spacy/lang module.
It includes:

    Added all core language data files for spacy/lang/ht:
        tokenizer_exceptions.py
        punctuation.py
        lex_attrs.py
        syntax_iterators.py
        lemmatizer.py
        stop_words.py
        tag_map.py

    Unit tests for tokenizer and noun chunking (test_tokenizer.py, test_noun_chunking.py, etc.). All 58 pytest tests I created under spacy/tests/lang/ht pass.

    Basic tokenizer rules adapted for Haitian Creole orthography and informal contractions.

    Custom like_num attribute supporting Haitian number formats (e.g., "3yèm").

    Support for common informal apostrophe usage (e.g., "m'ap", "n'ap", "di'm").

    Ensured no breakages in other language modules.

    Followed spaCy coding style (PEP8, Black).

This provides a foundation for Haitian Creole NLP development using spaCy; a minimal usage sketch follows.
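A minimal usage sketch, assuming a spaCy build that includes this PR so that the "ht" code is registered; the example text and printed attributes are purely illustrative:

```python
import spacy

# Create a blank Haitian Creole pipeline and inspect the tokenizer output.
nlp = spacy.blank("ht")
doc = nlp("M'ap li 3yèm liv la jodi a.")
print([token.text for token in doc])      # tokenization, incl. informal contractions
print([token.like_num for token in doc])  # "3yèm" should be flagged as number-like
```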
2025-05-28 17:23:38 +02:00
Martin Schorfmann
e8f40e2169
Correct API docs for Span.lemma_, Vocab.to_bytes and Vectors.__init__ (#13436)
* Correct code example for Span.lemma_ in API Docs (#13405)

* Correct documented return type of Vocab.to_bytes in API docs

* Correct wording for Vectors.__init__ in API docs
2025-05-28 17:22:50 +02:00
BLKSerene
7b1d6e58ff
Remove dependency on langcodes (#13760)
This PR removes the dependency on langcodes introduced in #9342.

While the introduction of langcodes allows a significantly wider range of language codes, there are some unexpected side effects:

    zh-Hant (Traditional Chinese) should be mapped to zh instead of None, as spaCy's Chinese model is based on pkuseg, which supports tokenization of both Simplified and Traditional Chinese.
    Since it is possible that spaCy may have a model for Norwegian Nynorsk in the future, mapping no (macrolanguage Norwegian) to nb (Norwegian Bokmål) might be misleading. In that case, the user should be asked to specify nb or nn (Norwegian Nynorsk) explicitly, or to consult the docs.
    Same as above for regional variants of languages such as en_gb and en_us.

Overall, IMHO, introducing an extra dependency just for the conversion of language codes is overkill. Most users likely just need conversion between 2- and 3-letter ISO codes, and a simple dictionary lookup should suffice.

With this PR, ISO 639-1 and ISO 639-3 codes are supported. ISO 639-2/B codes (bibliographic codes, which are not favored and are not used in ISO 639-3) and deprecated ISO 639-1/2 codes are also supported to maximize backward compatibility.
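A minimal sketch of the dictionary-lookup approach argued for above; the table and helper name are illustrative rather than spaCy's actual implementation:

```python
# Illustrative mapping from ISO 639-3 / 639-2 codes to ISO 639-1 codes.
ISO_TO_639_1 = {
    "eng": "en",
    "fra": "fr",  # ISO 639-3 / 639-2/T
    "fre": "fr",  # ISO 639-2/B (bibliographic)
    "hat": "ht",
}

def to_iso639_1(code: str) -> str:
    """Return the 2-letter code if known, otherwise the input unchanged."""
    return ISO_TO_639_1.get(code.lower(), code.lower())

assert to_iso639_1("eng") == "en"
assert to_iso639_1("fre") == "fr"
```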
2025-05-28 17:21:46 +02:00
Matthew Honnibal
864c2f3b51 Format 2025-05-28 17:06:11 +02:00
Matthew Honnibal
75a9d9b9ad Test and fix issue13769 2025-05-28 17:04:23 +02:00
Ilie
bec546cec0
Add TeNs plugin (#13800)
Co-authored-by: Ilie Cristian Dorobat <idorobat@cisco.com>
2025-05-27 01:21:07 +02:00
d0ngw
46613e27cf
fix: match hyphenated words to lemmas in index_table (e.g. "co-authored" -> "co-author") (#13816) 2025-05-27 01:20:26 +02:00
omahs
b205ff65e6
fix typos (#13813) 2025-05-26 16:05:29 +02:00
BLKSerene
92f1b8cdb4
Switch to typer-slim (#13759) 2025-05-26 16:03:49 +02:00
Matthew Honnibal
4b65aa79ee Add release script 2025-05-22 14:00:48 +02:00
Matthew Honnibal
d08f4e3b10 Increment version 2025-05-22 13:58:00 +02:00
Matthew Honnibal
6036f344d3 Remove print statements 2025-05-22 13:56:31 +02:00
Matthew Honnibal
5bebbf7550
Python 3.13 support (#13823)
In order to support Python 3.13, we had to migrate to Cython 3.0. This caused some tricky interaction with our Pydantic usage, because Cython 3 uses the from __future__ import annotations semantics, which causes type annotations to be saved as strings.

The end result is that we can't have Language.factory decorated functions in Cython modules anymore, as the Language.factory decorator expects to inspect the signature of the functions and build a Pydantic model. If the function is implemented in Cython, an error is raised because the type is not resolved.

To address this I've moved the factory functions into a new module, spacy.pipeline.factories. I've added __getattr__ importlib hooks to the previous locations, in case anyone was importing these functions directly (see the sketch at the end of this message). The change should have no backwards compatibility implications.

Along the way I've also refactored the registration of functions for the config. Previously these ran as import-time side-effects, using the registry decorator. I've instead created a new module, spacy.registrations. When the registry is accessed it calls a function ensure_populated(), which causes the registrations to occur.

I've made a similar change to the Language.factory registrations in the new spacy.pipeline.factories module.

I want to remove these import-time side-effects so that we can speed up the loading time of the library, which can be especially painful on the CLI. I also find that I'm often working to track down the implementations of functions referenced by strings in the config. Having the registrations all happen in one place will make this easier.

With these changes I've fortunately avoided the need to migrate to Pydantic v2 properly --- we're still using the v1 compatibility shim. We might not be able to hold out forever though: Pydantic (reasonably) aren't actively supporting the v1 shims. I put a lot of work into v2 migration when investigating the 3.13 support, and it's definitely challenging. In any case, it's a relief that we don't have to do the v2 migration at the same time as the Cython 3.0/Python 3.13 support.
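A minimal sketch of the module-level __getattr__ shim mentioned above (PEP 562); the moved names are illustrative and this is not a verbatim copy of the spaCy code:

```python
# Hypothetical contents of a module that used to define factory functions.
import importlib

_NEW_HOME = "spacy.pipeline.factories"          # new location per this PR
_MOVED_NAMES = {"make_tagger", "make_parser"}   # illustrative names only

def __getattr__(name):
    # Forward old imports to the new module so existing code keeps working.
    if name in _MOVED_NAMES:
        return getattr(importlib.import_module(_NEW_HOME), name)
    raise AttributeError(f"module {__name__!r} has no attribute {name!r}")
```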
2025-05-22 13:47:21 +02:00
Matthew Honnibal
911539e9a4 Update version 2025-05-18 12:18:38 +02:00
Matthew Honnibal
22c1bc785b Replace lte with lt for clarity 2025-05-18 12:18:17 +02:00
Matthew Honnibal
cb5e760e91 Fix python version supported 2025-05-18 12:17:23 +02:00
Gunther Cox
87ec2b72a5
Update spaCy Universe entry for ChatterBot to use correct name casing (#13784) 2025-05-12 07:47:50 +02:00
翟持江
aa8de0ed37
Update embeddings-transformers.mdx, update trf_data examples info in <Runtime usage> (#13811) 2025-05-12 07:47:12 +02:00
Adrien Carpentier
98a19df91a
docs: fix README.md for compatible Python versions (#13749) 2025-04-11 20:56:52 +02:00
Matthew Honnibal
92bd042502 Allow Python 3.13 2025-04-03 23:15:12 +02:00
Matthew Honnibal
d0c705cbc9 Increment version 2025-04-01 09:40:59 +02:00
Matthew Honnibal
b3c46c315e Add support for linux-arm 2025-02-03 18:32:23 +01:00
Ines Montani
d194f06437 Add live stream to site [ci skip] 2025-02-03 09:42:52 +01:00
Ines Montani
055e07d9cc Update README.md [ci skip] 2025-02-03 09:38:32 +01:00
Ines Montani
8e1c14e977 Add live stream to README [ci skip] 2025-02-03 09:37:48 +01:00
Christine P. Chai
4278182dd0
Change Twitter to X (#13740) [ci skip]
Co-authored-by: Ines Montani <ines@ines.io>
2025-02-03 09:30:21 +01:00
Matthew Honnibal
85cc763006 Fix python version requirement 2025-01-13 18:17:36 +01:00
Matthew Honnibal
ba7468e32e
Update requirements, fixing windows crashes (#13727)
* Re-enable pretraining test

* Require thinc 8.3.4

* Reformat

* Re-enable test
2025-01-13 16:39:46 +01:00
Matthew Honnibal
311f7cc9fb Set version to v3.8.4 2024-12-11 14:14:08 +01:00
Matthew Honnibal
682140496a Align requirements better 2024-12-11 14:13:51 +01:00
Matthew Honnibal
343f4f21d7 Enable Python 3.13 2024-12-11 14:13:28 +01:00
Matthew Honnibal
be0fa812c2 Update cibuildwheel 2024-12-11 13:08:40 +01:00
Matthew Honnibal
a6317b3836
Fix allocation of non-transient strings in StringStore (#13713)
* Fix bug in memory-zone code when adding non-transient strings. The error could result in segmentation faults or other memory errors during memory zones if new labels were added to the model.
* Fix handling of new morphological labels within memory zones. Addresses the second issue reported in "Memory leak of MorphAnalysis object" (#13684).
2024-12-11 13:06:53 +01:00
Ines Montani
3e30b5bef6 Add spacy-layout [ci skip] 2024-11-19 10:43:40 +01:00
Matthew Honnibal
3ecec1324c
Usage page on memory management, explaining memory zones and doc_cleaner (#13643) [ci skip]
Co-authored-by: Ines Montani <ines@ines.io>
2024-10-23 12:42:54 +02:00
Ikko Eltociear Ashimine
15fbf5ef36
docs: update rule-based-matching.mdx (#13665) [ci skip] 2024-10-23 12:07:01 +02:00
Sergei Pashakhin
1ee9a19059
Fix typo (#13657) [ci skip] 2024-10-23 12:06:36 +02:00
thjbdvlt
0d7e57fc3e
universe-pipeline-solipCysme-french (#13627) [ci skip] 2024-10-11 11:26:15 +02:00
Ines Montani
ae5c3e078d Fix universe.json [ci skip] 2024-10-11 11:24:42 +02:00
Andrei (Andrey) Khropov
8d2902b0e7
Fix misspelling (#13631) [ci skip] 2024-10-11 11:23:12 +02:00
aravind-mc
44d1906453
Update universe.json to add my spaCy online course (#13632) [ci skip] 2024-10-11 11:21:57 +02:00
sam rxh
52a4cb0d14
Fix 'issue template' link in CONTRIBUTING.md (#13587) [ci skip] 2024-10-11 11:20:34 +02:00
Ines Montani
10a6f508ab Fix landing banner links [ci skip] 2024-10-11 11:19:10 +02:00
Matthew Honnibal
bda4bb0184
Try disabling pretraining tests to probe windows ci failure (#13646) 2024-10-02 01:01:40 +02:00
Matthew Honnibal
628c973db5 Note minimum python requirement in setup.cfg 2024-10-02 00:49:09 +02:00
Matthew Honnibal
e0782c5e4c Merge branch 'master' into v3.8.x 2024-10-01 23:57:48 +02:00
Matthew Honnibal
5230754986 Fix thinc dependency 2024-10-01 23:49:17 +02:00
Matthew Honnibal
411b70f5f3 Upd requirements 2024-10-01 23:46:54 +02:00
Matthew Honnibal
08705f5a8c Upd tests 2024-10-01 22:57:25 +02:00
Matthew Honnibal
77177d0216 Upd tests workflow 2024-10-01 22:54:12 +02:00
Matthew Honnibal
5196366af5 Upd tests workflow 2024-10-01 22:53:11 +02:00
Matthew Honnibal
29232ad3b5 Upd tests workflow 2024-10-01 22:51:09 +02:00
Matthew Honnibal
dd47fbb45f Remove 'apple' extra 2024-10-01 22:24:25 +02:00
Matthew Honnibal
63f1b53c1a Check test failure 2024-10-01 16:49:49 +02:00
Matthew Honnibal
0cdcfe56cb Set version to v3.8.2 2024-10-01 16:47:24 +02:00
Matthew Honnibal
924cbc9703 Fix environment variable for test 2024-10-01 16:08:06 +02:00
Matthew Honnibal
e1d050517d Fix requirements.txt 2024-10-01 15:56:18 +02:00
Matthew Honnibal
6c038aaae0 Don't disable tests on workflow changes 2024-10-01 15:32:01 +02:00
Matthew Honnibal
f0084b9143 Fix matrix in tests 2024-10-01 15:28:22 +02:00
Matthew Honnibal
ff81bfb8db Update tests 2024-10-01 13:21:10 +02:00
Matthew Honnibal
9c5b61bdff isort 2024-10-01 12:38:51 +02:00
Matthew Honnibal
725ccbac39 Format 2024-10-01 12:38:02 +02:00
Matthew Honnibal
a8837beab7 Set version to v3.8.1 2024-10-01 12:37:11 +02:00
Matthew Honnibal
3a0aadcf86 Update spacy[apple] thinc-apple-ops pin for numpy v2 compatibility 2024-10-01 10:16:35 +02:00
DomHudson
a61a1d43cf
[Documentation] Replace broken URL in _serialization.mdx (#13641) 2024-09-30 17:45:50 +02:00
Matthew Honnibal
114b4894fb Fix --require-parent default 2024-09-29 15:50:31 +02:00
Matthew Honnibal
dec13b4258 Fix inverted cli arg 2024-09-29 15:50:05 +02:00
Matthew Honnibal
c03f060527 Allow positive option --require-parent 2024-09-29 14:30:14 +02:00
Matthew Honnibal
6255cb985f Include version constraint in parent package requirement 2024-09-29 14:22:21 +02:00
Matthew Honnibal
3b165a8716 Simplify setting to require parent package 2024-09-29 14:19:10 +02:00
Matthew Honnibal
969832f5d6 Fix package 2024-09-29 14:00:11 +02:00
Matthew Honnibal
8ce53a6bbe Syntax 2024-09-29 13:51:44 +02:00
Matthew Honnibal
6fa0d709d5 Support option to not depend on parent package in spacy package 2024-09-29 13:51:04 +02:00
Matthew Honnibal
5010fcbd3a Fix numpy constant 2024-09-14 13:13:11 +02:00
Matthew Honnibal
de4f19f3a3 Fix version 2024-09-14 13:12:44 +02:00
Matthew Honnibal
3d03565498 Replace numpy floats in evaluate and update 2024-09-14 12:55:53 +02:00
Matthew Honnibal
0576a1ff56 Fix numpy floats in meta.json 2024-09-14 12:54:08 +02:00
Matthew Honnibal
2f1e7ed09a Lint 2024-09-14 11:36:27 +02:00
Matthew Honnibal
e2dc9b79e1 Format 2024-09-14 11:29:40 +02:00
Matthew Honnibal
3c3d75015b Set version to v3.7.7 2024-09-14 11:27:32 +02:00
Matthew Honnibal
50aa3b5cbe Merge branch 'master' of https://github.com/explosion/spaCy 2024-09-14 11:09:44 +02:00
Matthew Honnibal
8266031454 Merge numpy version update 2024-09-14 11:08:35 +02:00
Matthew Honnibal
8dcc4b8daf Skip running tests on PRs 2024-09-14 11:07:23 +02:00
Matthew Honnibal
3a635d2c94 Try skipping 686 2024-09-14 00:12:49 +02:00
Matthew Honnibal
a0ce61f55a Fix thinc pin 2024-09-13 14:21:03 +02:00
Matthew Honnibal
83b4015b36 Remove aarch 2024-09-13 12:35:50 +02:00
Matthew Honnibal
419bfaf6e7 Update cibuildwheel 2024-09-13 10:44:48 +02:00
Matthew Honnibal
69ecb85fad Set version to v3.8.1 2024-09-13 10:43:40 +02:00
Matthew Honnibal
b427597fc8 Set version to v3.8.0 2024-09-11 21:32:26 +02:00
Matthew Honnibal
1869a197c9 Try enabling macos-14 for arm builds 2024-09-11 16:06:57 +02:00
Matthew Honnibal
c068e1de1b Fix dependencies 2024-09-11 15:57:52 +02:00
Matthew Honnibal
184e508d9c Update numpy pin 2024-09-11 15:57:17 +02:00
William Mattingly
30f1f33e78
Added Date spaCy to universe (#13415) [ci skip]
Co-authored-by: Ines Montani <ines@ines.io>
2024-09-10 14:29:03 +02:00
William Mattingly
f1a5ff9dba
added spacy whisper to universe (#13418) [ci skip]
Co-authored-by: Ines Montani <ines@ines.io>
2024-09-10 14:28:00 +02:00
William Mattingly
c80dacd046
added spacy annoy to universe (#13416) [ci skip]
Co-authored-by: Ines Montani <ines@ines.io>
2024-09-10 14:26:21 +02:00
William Mattingly
7fbbb2002a
updated universe for number spacy (#13424) [ci skip]
Co-authored-by: Ines Montani <ines@ines.io>
2024-09-10 14:25:23 +02:00
William Mattingly
89c1774d43
added bagpipes-spacy to universe (#13425) [ci skip]
Co-authored-by: Ines Montani <ines@ines.io>
2024-09-10 14:24:06 +02:00
thjbdvlt
081e4e385d
universe-project-presque (#13515) [ci skip]
Co-authored-by: Ines Montani <ines@ines.io>
2024-09-10 14:21:41 +02:00
thjbdvlt
0190e669c5
universe-package-quelquhui (#13514) [ci skip]
Co-authored-by: Ines Montani <ines@ines.io>
2024-09-10 14:17:33 +02:00
Oren Halvani
54dc4ee8fb
Added: Constituent-Treelib to: universe.json (#13432) [ci skip]
Co-authored-by: Halvani <>
2024-09-10 14:13:36 +02:00
William Mattingly
5a7ad5572c
added gliner-spacy to universe (#13417) [ci skip]
Co-authored-by: Sofie Van Landeghem <svlandeg@users.noreply.github.com>
Co-authored-by: Ines Montani <ines@ines.io>
2024-09-10 14:12:52 +02:00
marinelay
b18cc94451
Delete unnecessary method (#13441)
Co-authored-by: marinelay <marinelay@gmail.com>
2024-09-09 20:57:13 +02:00
Matthew Honnibal
4cc3ebe74e Format 2024-09-09 20:56:01 +02:00
Matthew Honnibal
a019315534 Fix memory zones 2024-09-09 13:49:41 +02:00
Matthew Honnibal
59ac7e6bdb Format 2024-09-09 11:22:52 +02:00
Matthew Honnibal
b65491b641 Set version to v3.8.0.dev0 2024-09-09 11:20:23 +02:00
Matthew Honnibal
1b8d560d0e
Support 'memory zones' for user memory management (#13621)
Add a context manager nlp.memory_zone(), which will begin
memory_zone() blocks on the vocab, string store, and potentially
other components.

Example usage:

```
with nlp.memory_zone():
    for doc in nlp.pipe(texts):
        do_something(doc)
# do_something(doc) <-- Invalid
```

Once the memory_zone() block expires, spaCy will free any shared
resources that were allocated for the text-processing that occurred
within the memory_zone. If you create Doc objects within a memory
zone, it's invalid to access them once the memory zone is expired.

The purpose of this is that spaCy creates and stores Lexeme objects
in the Vocab that can be shared between multiple Doc objects. It also
interns strings. Normally, spaCy can't know when all Doc objects using
a Lexeme are out-of-scope, so new Lexemes accumulate in the vocab,
causing memory pressure.

Memory zones solve this problem by telling spaCy "okay none of the
documents allocated within this block will be accessed again". This
lets spaCy free all new Lexeme objects and other data that were
created during the block.

The mechanism is general, so memory_zone() context managers can be
added to other components that could benefit from them, e.g. pipeline
components.

I experimented with adding memory zone support to the tokenizer as well,
for its cache. However, this seems unnecessarily complicated. It makes
more sense to just stick a limit on the cache size. This lets spaCy
benefit from the efficiency advantage of the cache better, because
we can maintain a (bounded) cache even if only small batches of
documents are being processed.
2024-09-09 11:19:39 +02:00
ykyogoku
608f65ce40
add Tibetan (#13510) 2024-09-09 11:18:03 +02:00
Muzaffer Cikay
acbf2a428f
Add Kurdish Kurmanji language (#13561)
* Add Kurdish Kurmanji language

* Add lex_attrs
2024-09-09 11:15:40 +02:00
Mark Liberko
55db9c2e87
Added gd language folder (#13570)
Implemented a foundational Scottish Gaelic (gd) language option with tokenizer_exceptions and stop_words files.
2024-09-09 11:14:09 +02:00
Matthew Honnibal
319e02545c Set version to 3.7.6 2024-08-20 12:16:08 +02:00
Matthew Honnibal
a8accc3396
Use cibuildwheel to build wheels (#13603)
* Add workflow files for cibuildwheel

* Add config for cibuildwheel

* Set version for experimental prerelease

* Try updating cython

* Skip 32-bit windows builds

* Revert "Try updating cython"

This reverts commit c1b794ab5c.

* Try to import cibuildwheel settings from previous setup
2024-08-20 12:15:05 +02:00
Ines Montani
8cda27aefa Add case study [ci skip] 2024-06-26 09:41:23 +02:00
Matthew Honnibal
f78e5ce732 Disable extra CI 2024-06-21 14:32:00 +02:00
Sofie Van Landeghem
a6d0fc3602
Remove typing-extensions from requirements (#13516) 2024-05-31 19:20:46 +02:00
Sofie Van Landeghem
82fc2ecfa5
Bump version to 3.7.5 (#13493) 2024-05-15 12:11:33 +02:00
Sofie Van Landeghem
c195ca4f9c
fix docs for MorphAnalysis.__contains__ (#13433) 2024-05-02 16:46:41 +02:00
Sofie Van Landeghem
d3a232f773
Update LICENSE to include 2024 (#13472) 2024-04-30 09:17:59 +02:00
Sofie Van Landeghem
ecd85d2618
Update Typer pin and GH actions (#13471)
* update gh actions

* pin typer upperbound to 1.0.0
2024-04-29 13:28:46 +02:00
Alex Strick van Linschoten
045cd43c3f
Fix typos in docs (#13466)
* fix typos

* prettier formatting

---------

Co-authored-by: svlandeg <svlandeg@github.com>
2024-04-29 11:10:17 +02:00
Sofie Van Landeghem
74836524e3
Bump to v5 (#13470) 2024-04-29 10:36:31 +02:00
Sofie Van Landeghem
6d6c10ab9c
Fix CI (#13469)
* Remove hardcoded architecture setting

* update classifiers to include Python 3.12
2024-04-29 10:18:07 +02:00
Sofie Van Landeghem
2e2334632b
Fix use_gold_ents behaviour for EntityLinker (#13400)
* fix type annotation in docs

* only restore entities after loss calculation

* restore entities of sample in initialization

* rename overfitting function

* fix EL scorer

* Relax test

* fix formatting

* Update spacy/pipeline/entity_linker.py

Co-authored-by: Raphael Mitsch <r.mitsch@outlook.com>

* rename to _ensure_ents

* further rename

* allow for scorer to be None

---------

Co-authored-by: Raphael Mitsch <r.mitsch@outlook.com>
2024-04-16 12:00:22 +02:00
Joe Schiff
2e96797696
Convert properties to decorator syntax (#13390) 2024-04-16 11:51:14 +02:00
Sofie Van Landeghem
f5e85fa05a
allow weasel 0.4.x (#13409) 2024-04-04 12:55:08 +02:00
Yaseen
21aea59001
Update code.module.sass to make code title sticky (#13379) 2024-03-26 12:15:25 +01:00
Sofie Van Landeghem
4dc5fe5469
Renamed main branch back to v4 for now (#13395)
* Update gputests.yml

* Update slowtests.yml
2024-03-26 09:53:07 +01:00
Ines Montani
1252370f69 Move DocSearch key to env var [ci skip] 2024-03-25 10:17:57 +01:00
Sofie Van Landeghem
d410d95b52
remove smart_open requirement as it's taken care of via Weasel (#13391) 2024-03-22 18:21:20 +01:00
Matthew Honnibal
0518c36f04
Sanitize direct download (#13313)
The 'direct' option in 'spacy download' is supposed to only download from our model releases repository. However, users were able to pass in a relative path, allowing download from arbitrary repositories. This meant that a service that sourced strings from user input and which used the direct option would allow users to install arbitrary packages.
2024-02-20 13:17:51 +01:00
Daniël de Kok
bff8725f4b
Set version to 3.7.4 (#13327) 2024-02-14 14:46:28 +01:00
Daniël de Kok
fdfdbcd9f4
Make Language.pipe workers exit cleanly (#13321)
Also warn when any worker exited with a non-zero exit code and modify
test to ensure that workers exit cleanly by default.
2024-02-12 14:39:38 +01:00
Daniël de Kok
14bd9d89a3
Update example that shows model in requirements (#13302)
See #13293.
2024-02-11 19:46:43 +01:00
Daniël de Kok
e1249d3722
Test if closing explicitly solves recursive lock issues (#13304) 2024-02-05 10:07:03 +01:00
Daniël de Kok
40422ff904
Set version to 3.7.3 (#13301) 2024-02-02 13:51:26 +01:00
Daniël de Kok
2dbb332cea
TextCatParametricAttention.v1: set key transform dimensions (#13249)
* TextCatParametricAttention.v1: set key transform dimensions

This is necessary for tok2vec implementations that initialize
lazily (e.g. curated transformers).

* Add lazily-initialized tok2vec to simulate transformers

Add a lazily-initialized tok2vec to the tests and test the current
textcat models with it.

Fix some additional issues found using this test.

* isort

* Add `test.` prefix to `LazyInitTok2Vec.v1`
2024-02-02 13:01:59 +01:00
Daniël de Kok
d84068e460
Run slow tests: v4 -> main (#13290)
* Run slow tests: v4 -> main

* Also update the branch in GPU tests
2024-01-30 13:58:28 +01:00
Sofie Van Landeghem
89a43f39b7
update universe description (#13291) 2024-01-30 13:49:49 +01:00
Daniël de Kok
68d7841df5
Extension serialization attr tests: add teardown (#13284)
The doc/token extension serialization tests add extensions that are not
serializable with pickle. This didn't cause issues before due to the
implicit run order of tests. However, test ordering has changed with
pytest 8.0.0, leading to failed tests in test_language.

Update the fixtures in the extension serialization tests to do proper
teardown and remove the extensions.
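A minimal sketch of the fixture teardown pattern described above; the extension name is illustrative:

```python
import pytest
from spacy.tokens import Doc

@pytest.fixture
def doc_extension():
    # Register the extension for the test...
    Doc.set_extension("my_attr", default=None)
    yield "my_attr"
    # ...and remove it afterwards so it cannot leak into other tests.
    Doc.remove_extension("my_attr")
```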
2024-01-29 13:51:56 +01:00
Eliana Vornov
00e938a7c3
add custom code support to CLI speed benchmark (#13247)
* add custom code support to CLI speed benchmark

* sort imports

* better copying for warmup docs
2024-01-26 13:29:22 +01:00
Sofie Van Landeghem
68b85ea950
Clarify data_path loading for apply CLI command (#13272)
* attempt to clarify additional annotations on .spacy file

* suggestion by Daniël

* pipeline instead of pipe
2024-01-26 12:10:05 +01:00
Sofie Van Landeghem
7496e03a2c
Clarify vocab docs (#13273)
* add line to ensure that apple is in fact in the vocab

* add that the vocab may be empty
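A small sketch of the point being documented, assuming a blank English pipeline:

```python
import spacy

nlp = spacy.blank("en")
print("apple" in nlp.vocab)   # may be False: a fresh vocab can be (nearly) empty
nlp("I like apple juice.")
print("apple" in nlp.vocab)   # True once the lexeme has been created
```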
2024-01-26 10:58:48 +01:00
Sofie Van Landeghem
a493981163
fix typo (#13254) 2024-01-24 09:29:57 +01:00
Daniël de Kok
a8894a8946
Merge pull request #13240 from mauricesvp/patch-1
Fix typo in method name
2024-01-23 20:49:21 +01:00
Daniël de Kok
afac7fb650
test_find_available_port: use port 5001 (#13255)
macOS now uses port 5000 for the AirPlay receiver functionality, so this
test will always fail on a macOS desktop (unless AirPlay receiver
functionality is disabled like in CI).
2024-01-23 20:11:16 +01:00
Daniël de Kok
5a2ad4af4b Merge remote-tracking branch 'upstream/master' into patch-1 2024-01-23 19:53:20 +01:00
Daniël de Kok
128197a5fc
Properly clean up pipe multiprocessing workers (#13259)
Before this change, the workers of a pipe call with n_process != 1 were
stopped by calling `terminate` on the processes. However, terminating a
process can leave queues, pipes, and other concurrent data structures in
an invalid state.

With this change, we stop using terminate and take the following approach
instead:

* When all documents are processed, the parent process puts a
  sentinel in the queue of each worker.
* The parent process then calls `join` on each worker process to
  let them finish up gracefully.
* Worker processes break from the queue processing loop when the
  sentinel is encountered, so that they exit.

We need special handling when one of the workers encounters an error and
the error handler is set to raise an exception. In this case, we cannot
rely on the sentinel to finish all workers -- the queue is a FIFO queue
and there may be other work queued up before the sentinel. We use the
following approach to handle error scenarios:

* The parent puts the end-of-work sentinel in the queue of each worker.
* The parent closes the reading-end of the channel of each worker.
* Then:
  - If the worker was waiting for work, it will encounter the sentinel
    and break from the processing loop.
  - If the worker was processing a batch, it will attempt to write
    results to the channel. This will fail because the channel was
    closed by the parent and the worker will break from the processing
    loop.
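A minimal, self-contained sketch of the sentinel-and-join shutdown described above; the names are illustrative and this is not spaCy's actual worker code:

```python
import multiprocessing as mp

SENTINEL = None  # end-of-work marker put in each worker's queue

def worker(queue):
    while True:
        batch = queue.get()
        if batch is SENTINEL:   # break from the processing loop and exit
            break
        # ... process the batch and write results to a channel ...

if __name__ == "__main__":
    queue = mp.Queue()
    proc = mp.Process(target=worker, args=(queue,))
    proc.start()
    queue.put(["some", "work"])
    queue.put(SENTINEL)  # instead of proc.terminate()
    proc.join()          # let the worker finish up gracefully
```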
2024-01-23 18:33:04 +01:00
Raphael Mitsch
3b3b5cdc63
Merge pull request #13253 from explosion/chore/sync-master-with-llm_main
Sync `master` with `docs/llm_main`
2024-01-19 16:50:43 +01:00
Raphael Mitsch
575c405ae3 Fix LLM docs on task factories. 2024-01-19 16:48:54 +01:00
Raphael Mitsch
256468c414 Merge branch 'docs/llm_main' into chore/sync-master-with-llm_main
# Conflicts:
#	website/docs/api/large-language-models.mdx
2024-01-19 16:34:35 +01:00
Raphael Mitsch
91c24c0285
Merge pull request #13251 from explosion/docs/llm_develop
Sync `docs/llm_main` with `docs/llm_develop`
2024-01-19 12:56:38 +01:00
maurice
c608baeecc
Fix typo in method name 2024-01-16 21:54:54 +01:00
Raphael Mitsch
0062c22c35
Updated docs w.r.t. infinite doc length changes (#13214)
* Updated docs w.r.t. infinite doc length.

* Fix typo.

* fix typos

* Fix table formatting.

* Update formatting.

---------

Co-authored-by: Sofie Van Landeghem <svlandeg@users.noreply.github.com>
2024-01-05 14:20:58 +01:00
Daniël de Kok
e2a3952de5
Add spacy.TextCatParametricAttention.v1 (#13201)
* Add spacy.TextCatParametricAttention.v1

This layer is a simplification of the ensemble classifier that
only uses parametric attention. We have found empirically that with a
sufficient amount of training data, using the ensemble classifier with
BoW does not provide significant improvement in classifier accuracy.
However, plugging in a BoW classifier does reduce GPU training and
inference performance substantially, since it uses a GPU-only kernel.

* Fix merge fallout
2024-01-02 10:03:06 +01:00
Daniël de Kok
7ebba86402
Add TextCatReduce.v1 (#13181)
* Add TextCatReduce.v1

This is a textcat classifier that pools the vectors generated by a
tok2vec implementation and then applies a classifier to the pooled
representation. Three reductions are supported for pooling: first, max,
and mean. When multiple reductions are enabled, the reductions are
concatenated before providing them to the classification layer (see the sketch after this list).

This model is a generalization of the TextCatCNN model, which only
supports mean reductions and is a bit of a misnomer, because it can also
be used with transformers. This change also reimplements TextCatCNN.v2
using the new TextCatReduce.v1 layer.

* Doc fixes

Co-authored-by: Sofie Van Landeghem <svlandeg@users.noreply.github.com>

* Fully specify `TextCatCNN` <-> `TextCatReduce` equivalence

* Move TextCatCNN docs to legacy, in prep for moving to spacy-legacy

* Add back a test for TextCatCNN.v2

* Replace TextCatCNN in pipe configurations and templates

* Add an infobox to the `TextCatReduce` section with an `TextCatCNN` anchor

* Add last reduction (`use_reduce_last`)

* Remove non-working TextCatCNN Netlify redirect

* Revert layer changes for the quickstart

* Revert one more quickstart change

* Remove unused import

* Fix docstring

* Fix setting name in error message
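A minimal sketch of the pooling idea described in the first item above, built from Thinc's reduction layers; this is illustrative and not the actual TextCatReduce.v1 implementation:

```python
from thinc.api import (Softmax, chain, concatenate, list2ragged,
                       reduce_first, reduce_max, reduce_mean)

def build_reduce_textcat(tok2vec, n_classes: int):
    # Pool the token vectors with first/mean/max reductions, concatenate the
    # pooled vectors, and classify the resulting representation.
    pooling = concatenate(reduce_first(), reduce_mean(), reduce_max())
    return chain(tok2vec, list2ragged(), pooling, Softmax(n_classes))
```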

---------

Co-authored-by: Sofie Van Landeghem <svlandeg@users.noreply.github.com>
Co-authored-by: Adriane Boyd <adrianeboyd@gmail.com>
2023-12-21 11:00:06 +01:00
Steven Crowther
764be103bc
update README to include links to GPU processing, LLMs, and the spaCy blog. (#13197)
* Update README.md to include links for GPU processing, LLM, and spaCy's blog.

* Create ojo4f3.md

* corrected README to most current version with links to GPU processing, LLMs, and the spaCy blog.

* Delete .github/contributors/ojo4f3.md

* changed LLM icon

Co-authored-by: Adriane Boyd <adrianeboyd@gmail.com>

* Apply suggestions from code review

---------

Co-authored-by: Adriane Boyd <adrianeboyd@gmail.com>
2023-12-18 09:49:07 +01:00
Sofie Van Landeghem
56fc3bc0f3
Type documentation fixes for Doc (#13187)
* correct char_span output type - can be None

* unify type of exclude parameter

* black

* further fixes to from_dict and to_dict

* formatting
2023-12-18 09:00:47 +01:00
Ines Montani
7df328fbfe
Update README.md [ci skip] 2023-12-12 10:19:57 +01:00
Raphael Mitsch
d56ee65ddf
Document spacy-llm's TranslationTask (#13183)
* Describe translation task.

* Fix references to examples and template.

* Format.
2023-12-11 17:41:04 +01:00
Raphael Mitsch
e79a9c5acd
Document spacy-llm's RawTask (#13180)
* Add section on RawTask.

* Fix API docs.

* Update website/docs/api/large-language-models.mdx

Co-authored-by: Sofie Van Landeghem <svlandeg@users.noreply.github.com>

---------

Co-authored-by: Sofie Van Landeghem <svlandeg@users.noreply.github.com>
2023-12-11 17:14:12 +01:00
Ines Montani
8cfccdd2f8
Update links [ci skip] 2023-12-11 15:51:43 +01:00
Ines Montani
f78b91c03b
Update links [ci skip] 2023-12-11 15:51:01 +01:00
Raphael Mitsch
9fcd2bfa08
Add info on endpoint arg. (#13169) 2023-12-05 12:46:29 +01:00
Raphael Mitsch
a25a3b996b
Merge pull request #13173 from explosion/docs/llm_main
Sync `llm_develop` with `llm_main`
2023-12-04 16:46:21 +01:00
Raphael Mitsch
55ed2b4e82
Add documentation for EL task (#12988)
* Add documentation for EL task.

* Fix EL factory name.

* Add llm_entity_linker_mentio.

* Apply suggestions from code review

Co-authored-by: Madeesh Kannan <shadeMe@users.noreply.github.com>

* Update EL task docs.

* Update EL task docs.

* Update EL task docs.

* Update EL task docs.

* Update EL task docs.

* Update EL task docs.

* Update EL task docs.

* Update EL task docs.

* Update EL task docs.

* Apply suggestions from code review

Co-authored-by: Sofie Van Landeghem <svlandeg@users.noreply.github.com>

* Incorporate feedback.

* Format.

* Fix link to KB data.

---------

Co-authored-by: Sofie Van Landeghem <svlandeg@users.noreply.github.com>
Co-authored-by: Madeesh Kannan <shadeMe@users.noreply.github.com>
2023-12-04 15:23:28 +01:00
Adriane Boyd
e467573550
Docs: update trf_data examples and pipeline design info (#13164) 2023-12-04 15:15:54 +01:00
Raphael Mitsch
0e43fca036
Add Claude-2.1 mention. (#13167) 2023-12-01 16:48:35 +01:00
Daniël de Kok
da7ad97519
Update TextCatBOW to use the fixed SparseLinear layer (#13149)
* Update `TextCatBOW` to use the fixed `SparseLinear` layer

A while ago, we fixed the `SparseLinear` layer to use all available
parameters: https://github.com/explosion/thinc/pull/754

This change updates `TextCatBOW` to `v3` which uses the new
`SparseLinear_v2` layer. This results in a sizeable improvement on a
text categorization task that was tested.

While at it, this `spacy.TextCatBOW.v3` also adds the `length_exponent`
option to make it possible to change the hidden size. Ideally, we'd just
have an option called `length`. But the way that `TextCatBOW` uses
hashes results in a non-uniform distribution of parameters when the
length is not a power of two.

* Replace TextCatBOW `length_exponent` parameter by `length`

We now round up the length to the next power of two if it isn't
a power of two (see the sketch after this list).

* Remove some tests for TextCatBOW.v2

* Fix missing import
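A tiny sketch of the rounding rule mentioned above; the helper name is illustrative:

```python
def round_up_to_power_of_two(length: int) -> int:
    # e.g. 5 -> 8, 8 -> 8, 1 -> 1
    return 1 << max(0, length - 1).bit_length()

assert round_up_to_power_of_two(5) == 8
assert round_up_to_power_of_two(8) == 8
```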
2023-11-29 09:11:54 +01:00
Ines Montani
bf7c2ea99a
Add merch link [ci skip] 2023-11-22 12:55:00 +01:00
Ines Montani
8f69e56a5a Add swag [ci skip] 2023-11-20 14:42:01 +01:00
Lise
b6e022381d
Feature/nn and fo language extensions (#13116)
* add language extensions for norwegian nynorsk and faroese

* update docstring for nn/examples.py

* use relative imports

* add fo and nn tokenizers to pytest fixtures

* add unittests for fo and nn and fix bug in nn

* remove module docstring from fo/__init__.py

* add comments about example sentences' origin

* add license information to faroese data credit

* format unittests using black

* add __init__ files to test/lang/nn and tests/lang/fo

* fix import order and use relative imports in fo/__init__.py and nn/__init__.py

* Make the tests a bit more compact

* Add fo and nn to website languages

* Add note about jul.

* Add "jul." as exception

---------

Co-authored-by: Adriane Boyd <adrianeboyd@gmail.com>
2023-11-20 07:49:59 +01:00
ajbond
9f2ce6bb00
Add Redfield NLP Nodes to the Spacy Universe (#13133) 2023-11-17 09:48:02 +01:00
Madeesh Kannan
bd2c17e206
Warn about reloading dependencies after downloading models (#13081)
* Update the "Missing factory" error message

This accounts for model installations that took place during the current Python session.

* Add a note about Jupyter notebooks

* Move error to `spacy.cli.download`
Add extra message for Jupyter sessions

* Add additional note for interactive sessions

* Remove note about `spacy-transformers` from error message

* `isort`

* Improve checks for colab (also helps displacy)

* Update warning messages

* Improve flow for multiple checks

---------

Co-authored-by: Adriane Boyd <adrianeboyd@gmail.com>
2023-11-10 08:05:07 +01:00
Raphael Mitsch
b2e831d966
LLM docs: OpenAI model update (#13119)
* Update supported OpenAI models.

* Update with new GPT-3.5 and GPT-4 versions.

* Add links to OpenAI model docs.
2023-11-08 17:55:16 +01:00
Adriane Boyd
513bbd5fa3
Add preferred use of build for package CLI (#13109)
Build with `build` if available. Warn and fall back to previous
`setup.py`-based builds if `build` build fails.
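A rough sketch of the fallback behavior described above, assuming the package directory already contains a setup.py; this is not the actual CLI code:

```python
import subprocess
import sys

def build_package(pkg_dir: str) -> None:
    try:
        # Prefer a PEP 517 build via the `build` package.
        subprocess.run([sys.executable, "-m", "build", pkg_dir], check=True)
    except subprocess.CalledProcessError:
        print("`build` failed or is unavailable; falling back to setup.py", file=sys.stderr)
        subprocess.run(
            [sys.executable, "setup.py", "sdist", "bdist_wheel"], cwd=pkg_dir, check=True
        )
```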
2023-11-08 17:35:24 +01:00
Ridge Kimani
2b8da84717
feat: add extra lexical attributes (#13106)
Co-authored-by: Ridge Kimani <ridgekimani@gmail.com>
2023-11-08 17:29:11 +01:00
Adriane Boyd
0c25725359
Update Tokenizer.explain for special cases with whitespace (#13086)
* Update Tokenizer.explain for special cases with whitespace

Update `Tokenizer.explain` to skip special case matches if the exact
text has not been matched due to intervening whitespace.

Enable fuzzy `Tokenizer.explain` tests with additional whitespace
normalization.

* Add unit test for special cases with whitespace, xfail fuzzy tests again
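For reference, Tokenizer.explain pairs each resulting token with the rule that produced it, which is what the whitespace handling above affects; a small usage sketch:

```python
import spacy

nlp = spacy.blank("en")
# Each entry pairs the matched rule/pattern name with the resulting token text.
for rule, token_text in nlp.tokenizer.explain("Let's go to N.Y.!"):
    print(rule, repr(token_text))
```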
2023-11-06 17:29:59 +01:00
Adriane Boyd
ff9ddb6a07
Unskip python 3.12 remote tests (#13110) 2023-11-06 11:59:45 +01:00
Adriane Boyd
c096c5c0c9
Update for numpy 2.0 deprecations (#13103)
- Replace `np.trapz` with vendored `trapezoid` from scipy
- Replace `np.float_` with `np.float64`
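A small illustration of the two replacements; note that the commit vendors trapezoid from SciPy, while this sketch simply imports it for brevity:

```python
import numpy as np
from scipy.integrate import trapezoid  # replaces the removed np.trapz

x = np.linspace(0.0, 1.0, 5)
area = trapezoid(x ** 2, x)   # was: np.trapz(x ** 2, x)
score = np.float64(0.875)     # was: np.float_(0.875)
print(area, score)
```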
2023-11-06 08:47:53 +01:00
Adriane Boyd
92f1d0a195
CI: Switch to stable python 3.12 and limit 3.11 runs (#13104) 2023-11-03 15:46:03 +01:00
Raphael Mitsch
c4e2daf6ef
Fix displacy span stacking (#13068)
* Fix displacy span stacking.

* Format. Remove counter.

* Remove test files.

* Add unit test. Refactor to allow for unit test.

* Fix off-by-one error in tests.
2023-11-02 12:02:18 +01:00
Sofie Van Landeghem
a804b83a4b
Update llm docs to clarify task-specific factories (#13082)
* fix typo

* add examples to specify custom model for task-specific factory
2023-10-31 22:07:07 +01:00
Sofie Van Landeghem
48248c62b6
Clarify EL example in docs (#13071)
* add comment that pipeline is a custom one

* add link to NEL tutorial

* prettier

* revert prettier reformat

* revert prettier reformat (2)

* fix typo

Co-authored-by: Raphael Mitsch <r.mitsch@outlook.com>

---------

Co-authored-by: Raphael Mitsch <r.mitsch@outlook.com>
2023-10-31 21:58:29 +01:00
Raphael Mitsch
0c15876502
Fix spancat typo. (#13095) 2023-10-31 13:45:10 +01:00
Raphael Mitsch
9deaac9786
Add note in docs on score_weight config if using a non-default spans_key for SpanCat (#13093)
* Add note on score_weight if using a non-default span_key for SpanCat.

* Fix formatting.

* Fix formatting.

* Fix typo.

* Use warning infobox.

* Fix infobox formatting.
2023-10-30 17:02:08 +01:00
Sofie Van Landeghem
d717123819
Update LICENSE (#13078) 2023-10-23 11:59:18 +02:00
Adriane Boyd
a89eae9283
Set version to v3.7.2 (#13066) 2023-10-16 15:10:55 +02:00
Sofie Van Landeghem
699dd8b3b7
Update __all__ fields (#13063)
* update all for pipeline.init

* add all in training.init

* add all in kb.init

* alphabetically
2023-10-16 10:17:47 +02:00
Adriane Boyd
ea1befa8ff
Support Any comparisons for Token and Span (#13058)
* Support Any comparisons for Token and Span

* Preserve previous behavior for None
2023-10-12 11:53:33 +02:00
Raphael Mitsch
d72029d9c8
Add binary examples for Textcat task in spacy-llm (#13051)
* Add examples for binary classification.

* Fix example.

* Remove binary textcat example. Format.

* Rephrase.
2023-10-11 12:23:38 +02:00
Adriane Boyd
77c568e524
Restore spacy.cli.project API (#13053)
* Restore spacy.cli.project API

* Fix typing errors, add simple import test
2023-10-10 15:35:25 +02:00
Ines Montani
65e7bd54f5 Update usage sidebar and nav alert [ci skip] 2023-10-06 14:36:37 +02:00
Ines Montani
b83f1e3724
Inline displaCy visualizations in docs (#13050) [ci skip] 2023-10-06 14:22:43 +02:00
Raphael Mitsch
df07c4734b
Merge pull request #13046 from explosion/docs/llm_main
Sync `docs/llm_develop` with `docs/llm_main`
2023-10-05 16:31:20 +02:00
Raphael Mitsch
030d63ad73
Merge pull request #13045 from explosion/master
Sync `docs/llm_main` with `master`
2023-10-05 16:28:19 +02:00
Raphael Mitsch
be29216fe2
Merge pull request #13044 from explosion/docs/llm_main
Sync `master` with `docs/llm_main`
2023-10-05 16:10:19 +02:00
Raphael Mitsch
1162fcf099
Add Mistral mentions. (#13037) 2023-10-05 14:44:38 +02:00
Raphael Mitsch
862f8254e8
Add docs on Azure OpenAI support in spacy-llm (#13043)
* Add gpt-3.5-turbo-instruct to list of supported OpenAI models.

* Update `spacy-llm` task argument docs w.r.t. task refactoring (#12995)

* Update task arguments w.r.t. task refactoring in 0.5.0.

* Add disclaimer w.r.t. gated models/Llama 2.

* Update website/docs/api/large-language-models.mdx

* Update website/docs/api/large-language-models.mdx

* Update docs w.r.t. PaLM support. (#13018)

* Add info on spacy.Azure.v1.

* Attempt to fix netlify check fails.

* Attempt to fix netlify check fails.

* Attempt to fix netlify check fails.

* Attempt to fix netlify check fails.

* Attempt to fix netlify check fails.

* Attempt to fix netlify check fails.

* Attempt to fix netlify check fails.

* Attempt to fix netlify check fails.

* Attempt to fix netlify check fails.

* Format.
2023-10-05 13:18:27 +02:00
Raphael Mitsch
1dec138e61
Update docs w.r.t. PaLM support. (#13018) 2023-10-05 08:50:41 +02:00
Adriane Boyd
6e54360a3d
Remove pathy dependency, update docs for cloudpathlib in Weasel (#13035) 2023-10-05 08:50:22 +02:00
Raphael Mitsch
734826db79
Update spacy-llm task argument docs w.r.t. task refactoring (#12995)
* Update task arguments w.r.t. task refactoring in 0.5.0.

* Add disclaimer w.r.t. gated models/Llama 2.

* Update website/docs/api/large-language-models.mdx

* Update website/docs/api/large-language-models.mdx
2023-10-05 08:45:25 +02:00
Raphael Mitsch
829613b959
Merge pull request #12999 from rmitsch/docs/gpt-3.5-turbo-instruct
Add `gpt-3.5-turbo-instruct` to list of supported OpenAI models
2023-10-05 08:41:07 +02:00
Adriane Boyd
9d036607f1
Set version to v3.7.1 (#13042) 2023-10-04 18:13:12 +02:00
Adriane Boyd
aec59c0088
Merge pull request #13040 from adrianeboyd/revert/12962-spacy-info
Revert "Load the cli module lazily for spacy.info (#12962)"
2023-10-04 17:20:32 +02:00
Adriane Boyd
6d0185f7fb Revert "Load the cli module lazily for spacy.info (#12962)"
This reverts commit beda27a91e.
2023-10-04 12:33:33 +02:00
Adriane Boyd
92ce32aa3f
Update binder version to v3.7 (#13034) 2023-10-02 12:53:46 +02:00
Adriane Boyd
160e61772e
Docs for v3.7.0 (#13029)
* Docs for v3.7.0

* Minor fixes

* Extend Weasel notes

* Minor edits

* Update version in README
2023-10-01 21:40:07 +02:00
Adriane Boyd
0fed2d9e28
Merge pull request #12899 from adrianeboyd/chore/v3.7.0
Reenable model tests for v3.7.0
2023-10-01 21:38:17 +02:00
Adriane Boyd
1b043dde3f Revert "disable tests until 3.7 models are available"
This reverts commit 991bcc111e.
2023-10-01 18:48:31 +02:00
Adriane Boyd
4ec41e98f6
Merge pull request #12979 from adrianeboyd/feature/cython-profile-312
Redesigned cython profiling and other minor updates for python 3.12
2023-09-29 08:23:38 +02:00
Matthew Hoffman
483d4a5bc0
Allow spacy-transformers v1.3.x in transformers extra (#13025)
Co-authored-by: Adriane Boyd <adrianeboyd@gmail.com>
2023-09-29 08:22:56 +02:00
Adriane Boyd
6b4f774418
Set version to v3.7.0 (#13028) 2023-09-28 21:27:42 +02:00
Adriane Boyd
78504c25a5 CI: Add python 3.12.0rc2 2023-09-28 17:12:42 +02:00
Adriane Boyd
467c82439e Always use tqdm with disable=None
`tqdm` can cause deadlocks in the test suite if enabled.
2023-09-28 17:12:42 +02:00
Adriane Boyd
b4990395f9 Update mypy requirements 2023-09-28 17:12:42 +02:00
Adriane Boyd
76d94b31f2 Branch on python 3.12+ shutil.rmtree in make_tempdir 2023-09-28 17:09:41 +02:00
Adriane Boyd
1adf79414e Set cython profiling default to True for <3.12, False for >=3.12 2023-09-28 17:09:41 +02:00
Adriane Boyd
538304948e Remove profile=True from currently profiled cython 2023-09-28 17:09:41 +02:00
Adriane Boyd
55614d6799 Add profile=False to currently unprofiled cython 2023-09-28 17:09:41 +02:00
Adriane Boyd
36201cb6a1
Merge pull request #13027 from adrianeboyd/chore/update-develop-from-master-v3.7-1
Update develop from master for v3.7
2023-09-28 16:01:40 +02:00
Adriane Boyd
406794a081 Merge remote-tracking branch 'upstream/master' into chore/update-develop-from-master-v3.7-1 2023-09-28 15:09:06 +02:00
Daniël de Kok
beda27a91e
Load the cli module lazily for spacy.info (#12962)
* Load the cli module lazily for spacy.info

This avoids that the `spacy` module cannot be imported when the
users chooses not to install `typer`/`requests`.

* Add test

---------

Co-authored-by: Adriane Boyd <adrianeboyd@gmail.com>
2023-09-28 11:36:44 +02:00
Sergiu Nisioi
6255e38695
Adding rolegal model to the spaCy universe (#13017)
* adding rolegal model to the spaCy universe

* Fix formatting

* Use raw URL

* update image url and example

* fix pip and update url to raw

* okay, let's add thumb instead of image 🐙

* Update website/meta/universe.json

---------

Co-authored-by: Adriane Boyd <adrianeboyd@gmail.com>
2023-09-28 11:06:50 +02:00
Madeesh Kannan
b4501db6f8
Update emoji library in rule-based matcher example (#13014) 2023-09-25 18:20:30 +02:00
Adriane Boyd
ff4215f1c7
Drop support for python 3.6 (#13009)
* Drop support for python 3.6

* Update docs
2023-09-25 14:48:38 +02:00
Adriane Boyd
935a5455b6
Docs: add new tag for evaluate CLI --spans-keys (#13013) 2023-09-25 11:49:28 +02:00
Ikko Eltociear Ashimine
ed8c11e2aa
Fix typo in lemmatizer.py (#13003)
specfic -> specific
2023-09-25 11:44:35 +02:00
Eliana Vornov
4e3360ad12
add --spans-key option for CLI spancat evaluation (#12981)
* add span key option for CLI evaluation

* Rephrase CLI help to refer to Doc.spans instead of spancat

* Rephrase docs to refer to Doc.spans instead of spancat

---------

Co-authored-by: Adriane Boyd <adrianeboyd@gmail.com>
2023-09-25 11:25:41 +02:00
Raphael Mitsch
bef9f63e13 Add gpt-3.5-turbo-instruct to list of supported OpenAI models. 2023-09-21 11:28:58 +02:00
Raphael Mitsch
830eba5426
Merge pull request #12994 from explosion/docs/llm_main
Sync `llm_develop` with `llm_main`
2023-09-20 10:05:40 +02:00
Raphael Mitsch
163ec6fba8
Merge pull request #12993 from explosion/master
Sync `llm_main` with `master`
2023-09-20 10:04:35 +02:00
Sofie Van Landeghem
8f0d6b0a8c
Fix in BertTokenizer docs (#12955)
* fix BertWordPieceTokenizer constructor call

* fix

* Update website/docs/usage/linguistic-features.mdx

---------

Co-authored-by: Adriane Boyd <adrianeboyd@gmail.com>
2023-09-13 13:21:58 +02:00
Adriane Boyd
36d4767aca
Skip project remotes test for python 3.12 (#12980)
`weasel` (using `cloudpathlib`) does not currently support remote paths
for python 3.12.
2023-09-13 13:16:05 +02:00
Sofie Van Landeghem
013762be41
Few spacy-llm doc fixes (#12969)
* fix construction example

* shorten task-specific factory list

* small edits to HF models

* small edit to API models

* typo

* fix space

Co-authored-by: Raphael Mitsch <r.mitsch@outlook.com>

---------

Co-authored-by: Raphael Mitsch <r.mitsch@outlook.com>
2023-09-08 11:35:38 +02:00
Sofie Van Landeghem
def7013eec
Docs for spacy-llm 0.5.0 (#12968)
* Update incorrect example config. (#12893)

* spacy-llm docs cleanup (#12945)

* Shorten NER section

* fix template references

* simplify sections

* set temperature to 0.0 in examples

* condense model information

* fix parameters for REST models

* set temperature to 0.0

* spelling fix

* trigger preview

* fix quotes

* add small note on noop.v1

* move up example noop config

* set appropriate model example configs

* explain config

* fix

Co-authored-by: Raphael Mitsch <r.mitsch@outlook.com>

---------

Co-authored-by: Raphael Mitsch <r.mitsch@outlook.com>

* Docs for ner.v3 and spancat.v3 spacy-llm tasks (#12949)

* formatting

* update usage table with NER.v3

* fix typo in links

* v3 overview of parameters

* add spancat.v3

* add further v3 explanations

* remove TODO comment

* few more small fixes

* Add doc section on LLM + task factories (#12905)

* Add section on LLM + task factories.

* Apply suggestions from code review

---------

Co-authored-by: Sofie Van Landeghem <svlandeg@users.noreply.github.com>

* add default config to openai models (#12961)

* Docs for spacy-llm 0.5.0 (#12967)

* simplify Python example

* simplify Python example

* Refer only to latest OpenAI model versions from usage doc

* Typo fix

Co-authored-by: Raphael Mitsch <r.mitsch@outlook.com>

* clarify accuracy claim

---------

Co-authored-by: Raphael Mitsch <r.mitsch@outlook.com>

---------

Co-authored-by: Raphael Mitsch <r.mitsch@outlook.com>
2023-09-08 10:25:14 +02:00
Magdalena Aniol
cc78847688
fix training.batch_size example (#12963) 2023-09-06 16:38:13 +02:00
Sofie Van Landeghem
6d1f6d9a23
Fix LLM usage example (#12950)
* fix usage example

* revert back to v2 to allow hot fix on main
2023-09-04 09:05:50 +02:00
Sofie Van Landeghem
5c1f9264c2
fix typo in link (#12948)
* fix typo in link

* fix REL.v1 parameter
2023-09-01 13:47:20 +02:00
David Berenstein
065ead4eed
updated add_pipe docs (#12947) 2023-09-01 11:05:36 +02:00
vincent d warmerdam
3e4264899c
Update large-language-models.mdx (#12944) 2023-08-30 11:58:14 +02:00
Ines Montani
52758e1afa Add headers to netlify.toml [ci skip] 2023-08-30 11:55:23 +02:00
Vinit Ravishankar
c2303858e6
Documentation for spacy-curated-transformers (#12677)
* initial

* initial documentation run

* fix typo

* Remove mentions of Torchscript and quantization

Both are disabled in the initial release of `spacy-curated-transformers`.

* Fix `piece_encoder` entries

* Remove `spacy-transformers`-specific warning

* Fix duplicate entries in tables

* Doc fixes

Co-authored-by: Sofie Van Landeghem <svlandeg@users.noreply.github.com>

* Remove type aliases

* Fix copy-paste typo

* Change `debug pieces` version tag to `3.7`

* Set curated transformers API version to  `3.7`

* Fix transformer listener naming

* Add docs for `init fill-config-transformer`

* Update CLI command invocation syntax

* Update intro section of the pipeline component docs

* Fix source URL

* Add a note to the architectures section about the `init fill-config-transformer` CLI command

* Apply suggestions from code review

Co-authored-by: Sofie Van Landeghem <svlandeg@users.noreply.github.com>

* Update CLI command name, args

* Remove hyphen from the `curated-transformers.mdx` filename

* Fix links

* Remove placeholder text

* Add text to the model/tokenizer loader sections

* Fill in the `DocTransformerOutput` section

* Formatting fixes

* Add curated transformer page to API docs sidebar

* More formatting fixes

* Remove TODO comment

* Remove outdated info about default config

* Apply suggestions from code review

Co-authored-by: Sofie Van Landeghem <svlandeg@users.noreply.github.com>

* Add link to HF model hub

* `prettier`

---------

Co-authored-by: Madeesh Kannan <shadeMe@users.noreply.github.com>
Co-authored-by: Sofie Van Landeghem <svlandeg@users.noreply.github.com>
2023-08-29 17:52:16 +02:00
PD Hall
d8a32c1050
docs: fix ngram_range_suggester max_size description (#12939) 2023-08-29 11:10:58 +02:00
Sofie Van Landeghem
869cc4ab0b
warn when an unsupported/unknown key is given to the dependency matcher (#12928) 2023-08-22 09:03:35 +02:00
Connor Brinton
6dd56868de
📝 Fix formula for receptive field in docs (#12918)
SpaCy's HashEmbedCNN layer performs convolutions over tokens to produce
contextualized embeddings using a `MaxoutWindowEncoder` layer. These
convolutions are implemented using Thinc's `expand_window` layer, which
concatenates `window_size` neighboring sequence items on either side of
the sequence item being processed. This is repeated across `depth`
convolutional layers.

For example, consider the sequence "ABCDE" and a `MaxoutWindowEncoder`
layer with a context window of 1 and a depth of 2. We'll focus on the
token "C". We can visually represent the contextual embedding produced
for "C" as:
```mermaid
flowchart LR
A0(A<sub>0</sub>)
B0(B<sub>0</sub>)
C0(C<sub>0</sub>)
D0(D<sub>0</sub>)
E0(E<sub>0</sub>)
B1(B<sub>1</sub>)
C1(C<sub>1</sub>)
D1(D<sub>1</sub>)
C2(C<sub>2</sub>)
A0 --> B1
B0 --> B1
C0 --> B1
B0 --> C1
C0 --> C1
D0 --> C1
C0 --> D1
D0 --> D1
E0 --> D1
B1 --> C2
C1 --> C2
D1 --> C2
```

Described in words, this graph shows that before the first layer of the
convolution, the "receptive field" centered at each token consists only
of that same token. That is to say, that we have a receptive field of 1.
The first layer of the convolution adds one neighboring token on either
side to the receptive field. Since this is done on both sides, the
receptive field increases by 2, giving the first layer a receptive field
of 3. The second layer of the convolutions adds an _additional_
neighboring token on either side to the receptive field, giving a final
receptive field of 5.

However, this doesn't match the formula currently given in the docs,
which read:
> The receptive field of the CNN will be
> `depth * (window_size * 2 + 1)`, so a 4-layer network with a window
> size of `2` will be sensitive to 20 words at a time.

Substituting in our depth of 2 and window size of 1, this formula gives
us a receptive field of:
```
depth * (window_size * 2 + 1)
= 2 * (1 * 2 + 1)
= 2 * (2 + 1)
= 2 * 3
= 6
```

This not only doesn't match our computations from above, it's also an
even number! This is suspicious, since the receptive field is supposed
to be centered on a token, and not between tokens. Generally, this
formula results in an even number for any even value of `depth`.

The error in this formula is that the adjustment for the center token
is multiplied by the depth, when it should occur only once. The
corrected formula, `depth * window_size * 2 + 1`, gives the correct
value for our small example from above:
```
depth * window_size * 2 + 1
= 2 * 1 * 2 + 1
= 4 + 1
= 5
```

These changes update the docs to correct the receptive field formula and
the example receptive field size.
2023-08-21 10:52:32 +02:00
Adriane Boyd
198488ee86
Extend to weasel v0.3 (#12908)
* Extend to weasel v0.3

* Clean up unused imports in test_cli
2023-08-16 17:36:53 +02:00
Adriane Boyd
76a9f9c6c6
Docs: clarify abstract spacy.load examples (#12889) 2023-08-16 17:28:34 +02:00
William Mattingly
64b8ee2dbe
Update universe.json (#12904)
* Update universe.json

added hobbit-spacy to the universe json

* Update universe.json

removed displacy from hobbit-spacy and added a default text.
2023-08-14 16:44:14 +02:00
denizcodeyaa
d50b8d51e2
Update examples.py (#12895)
Add example sentences to improve the Turkish model. Let's get the tr_web_core_sm out into the world.
2023-08-11 15:38:06 +02:00
Adriane Boyd
6a4aa43164
Extend to thinc v8.2 (#12897) 2023-08-11 13:05:46 +02:00
Adriane Boyd
9622c11529
Extend to weasel v0.2 (#12902) 2023-08-11 10:59:51 +02:00
Adriane Boyd
6ef29c4115
Merge pull request #12901 from adrianeboyd/feature/spacy-transformers-v1.3-revert
Revert "Extend to spacy-transformers v1.3.x (#12877)"
2023-08-10 16:43:10 +02:00
Adriane Boyd
060241a8d5 Revert "Extend to spacy-transformers v1.3.x (#12877)"
This reverts commit e5773e0c69.
2023-08-10 11:42:09 +02:00
Adriane Boyd
458bc5f45c
Set version to v3.6.1 (#12892) 2023-08-08 15:04:13 +02:00
Adriane Boyd
c4e378df97
Update CuPy extras (#12890)
* Add `cuda12x` for `cupy-cuda12x`.
* Drop `cuda-autodetect` from quickstart, set default to `cuda11x`
instead.
2023-08-08 12:58:28 +02:00
Adriane Boyd
245e2ddc25
Allow pydantic v2 using transitional v1 support (#12888) 2023-08-08 11:27:28 +02:00
Adriane Boyd
45af8a5dcf
Update br tags (#12882)
* Fix displacy br tag

* Prefer <br>, also update package CLI
2023-08-04 10:52:41 +02:00
Sofie Van Landeghem
3b7faf4f5e
fix (#12881) 2023-08-03 08:37:43 +02:00
Arman Mohammadi
07407e07ab
fix the regular expression matching on the full text (#12883)
There was a mistake in the regex pattern which caused it not to match all the desired tokens. The problem was that when we use the r string literal prefix for a raw string, we should not use two backslashes to represent a backslash.
2023-08-02 16:52:26 +02:00
Adriane Boyd
e5773e0c69
Extend to spacy-transformers v1.3.x (#12877) 2023-08-02 09:35:16 +02:00
Sofie Van Landeghem
0737443096
feat: add example stubs (3) (#12801)
* feat: add example stubs

* fix: add required annotations

* fix: mypy issues

* fix: use Py36-compatible Protocol

* Minor reformatting

* adding further type specifications and removing internal methods

* black formatting

* widen type to iterable

* add private methods that are being used by the built-in convertors

* revert changes to corpus.py

* fixes

* fixes

* fix typing of PlainTextCorpus

---------

Co-authored-by: Basile Dura <basile@bdura.me>
Co-authored-by: Adriane Boyd <adrianeboyd@gmail.com>
2023-08-02 08:15:12 +02:00
Madeesh Kannan
222bd3c5b1
Display model's full base version string in incompatibility warning (#12857) 2023-08-02 08:06:41 +02:00
Adriane Boyd
0fe43f40f1
Support registered vectors (#12492)
* Support registered vectors

* Format

* Auto-fill [nlp] on load from config and from bytes/disk

* Only auto-fill [nlp]

* Undo all changes to Language.from_disk

* Expand BaseVectors

These methods are needed in various places for training and vector
similarity.

* isort

* More linting

* Only fill [nlp.vectors]

* Update spacy/vocab.pyx

Co-authored-by: Sofie Van Landeghem <svlandeg@users.noreply.github.com>

* Revert changes to test related to auto-filling [nlp]

* Add vectors registry

* Rephrase error about vocab methods for vectors

* Switch to dummy implementation for BaseVectors.to_ops

* Add initial draft of docs

* Remove example from BaseVectors docs

* Apply suggestions from code review

Co-authored-by: Sofie Van Landeghem <svlandeg@users.noreply.github.com>

* Update website/docs/api/basevectors.mdx

Co-authored-by: Sofie Van Landeghem <svlandeg@users.noreply.github.com>

* Fix type and lint bpemb example

* Update website/docs/api/basevectors.mdx

---------

Co-authored-by: Sofie Van Landeghem <svlandeg@users.noreply.github.com>
2023-08-01 15:46:08 +02:00
Peter Baumgartner
a0a195688f
Tests for CLI app - init config generates train-able config (#12173)
* remove migration support form

* initial test commit

* add fixture

* add combo test

* pull out parameter example data

* fix formatting on examples

* remove unused import

* remove unncessary fmt:off instructions

* only set logger level if verbose flag is explicitly set

---------

Co-authored-by: svlandeg <svlandeg@github.com>
2023-07-31 14:45:04 +02:00
Andy Friedman
186889ec9c
added entry for SaysWho (#12828)
* Update universe.json

added entry for Sayswho

* Update universe.json

updated sayswho entry

* Update universe.json

* Update website/meta/universe.json

* Update website/meta/universe.json

---------

Co-authored-by: Adriane Boyd <adrianeboyd@gmail.com>
2023-07-31 10:52:32 +02:00
Sofie Van Landeghem
c9e9dccf79
Add displaCy data structures to docs (2) (#12875)
* Add data structures to docs

* Adjusted descriptions for more consistency

* Add _optional_ flag to parameters

* Add tests and adjust optional title key in doc

* Add title to dep visualizations

* fix typo

---------

Co-authored-by: thomashacker <EdwardSchmuhl@web.de>
2023-07-31 10:47:57 +02:00
Victoria
49055ed7c8
Add cli for finding locations of registered func (#12757)
* Add cli for finding locations of registered func

* fixes: naming and typing

* isort

* update naming

* remove to find-function

* remove file:// bit

* use registry name if given and exit gracefully if a registry was not found

* clean up failure msg

* specify registry_name options

* mypy fixes

* return location for internal usage

* add documentation

* more mypy fixes

* clean up example

* add section to menu

* add tests

---------

Co-authored-by: svlandeg <svlandeg@github.com>
2023-07-31 09:39:00 +02:00
Adriane Boyd
9ffa5d8a15
Remove ray extra (#12870) 2023-07-28 15:48:36 +02:00
Márton Kardos
51b9655470
Added OdyCy to spaCy Universe (#12826)
* Added OdyCy to spaCy Universe

* Replaced template tags

Co-authored-by: Adriane Boyd <adrianeboyd@gmail.com>

---------

Co-authored-by: Adriane Boyd <adrianeboyd@gmail.com>
2023-07-26 16:05:53 +02:00
Madeesh Kannan
98799d849e
SpanCat: Remove invalid threshold config argument (#12860) 2023-07-26 13:56:31 +02:00
Adriane Boyd
f8f489bcd6
Switch from distutils to setuptools/sysconfig (#12853)
Additionally remove outdated `is_new_osx` check and settings.
2023-07-24 16:58:27 +02:00
Victoria
e2b89012a2
Add spacy-llm docs to website (#12782)
* initial commit

* update for v0.4.0

* Apply suggestions from code review

* Fix formatting

* Apply suggestions from code review

* Update website/docs/api/large-language-models.mdx

* Update website/docs/api/large-language-models.mdx

* update usage page

* Apply suggestions from review

* Apply suggestions from review

* fix links

* fix relative links

* Apply suggestions from code review

Co-authored-by: Sofie Van Landeghem <svlandeg@users.noreply.github.com>

* Apply suggestions from code review

Co-authored-by: Sofie Van Landeghem <svlandeg@users.noreply.github.com>

* Apply suggestions from review

* Add section on Llama 2. Format.

---------

Co-authored-by: Raphael Mitsch <r.mitsch@outlook.com>
Co-authored-by: Sofie Van Landeghem <svlandeg@users.noreply.github.com>
2023-07-24 14:44:47 +02:00
Adriane Boyd
1d216a7ea6
Update README for v3.6 (#12844)
* Update most recent release
* Switch from azure to GHA CI tests badge
* Remove link to survey
* Format
2023-07-24 10:41:04 +02:00
Adriane Boyd
5888afa884
Update numpy build constraints for numpy 1.25 (#12839)
* Update numpy build constraints for numpy 1.25

Starting in numpy 1.25 (see
https://github.com/numpy/numpy/releases/tag/v1.25.0), the numpy C API is
backwards-compatible by default.

For python 3.9+, we should be able to drop the specific numpy build
requirements and use `numpy>=1.25`, which is currently
backwards-compatible to `numpy>=1.19`.

In the future, the python <3.9 requirements could be dropped and the
lower numpy pin could correspond to the oldest supported version for the
current lower python pin.

* Turn off fail-fast

* Revert "Turn off fail-fast"

This reverts commit 4306f516bc.

* Update for python 3.6

* Fix typo
2023-07-24 10:32:56 +02:00
Jacobo Myerston
4f8daa4f00
Add Left and Right Pointing Angle Brackets as punctuation to ancient Greek (#12829)
* Update universe.json

* Update universe.json

add some missing commas in greCy's description.

* Update punctuation.py

Add mathematical left and right angle brackets as punctuation for ancient Greek for better tokenization.
2023-07-20 11:16:01 +02:00
Sofie Van Landeghem
ea54d1775a
Merge pull request #12840 from svlandeg/sync_develop
Sync develop
2023-07-19 13:12:51 +02:00
svlandeg
79ec68f01b Merge branch 'upstream_master' into sync_develop 2023-07-19 12:08:52 +02:00
Basile Dura
b0228d8ea6
ci: add cython linter (#12694)
* chore: add cython-linter dev dependency

* fix: lexeme.pyx

* fix: morphology.pxd

* fix: tokenizer.pxd

* fix: vocab.pxd

* fix: morphology.pxd (line length)

* ci: add cython-lint

* ci: fix cython-lint call

* Fix kb/candidate.pyx.

* Fix kb/kb.pyx.

* Fix kb/kb_in_memory.pyx.

* Fix kb.

* Fix training/ partially.

* Fix training/. Ignore trailing whitespaces and too long lines.

* Fix ml/.

* Fix matcher/.

* Fix pipeline/.

* Fix tokens/.

* Fix build errors. Fix vocab.pyx.

* Fix cython-lint install and run.

* Fix lexeme.pyx, parts_of_speech.pxd, vectors.pyx. Temporarily disable cython-lint execution.

* Fix attrs.pyx, lexeme.pyx, symbols.pxd, isort issues.

* Make cython-lint install conditional. Fix tokenizer.pyx.

* Fix remaining files. Reenable cython-lint check.

* Readded parentheses.

* Fix test_build_dependencies().

* Add explanatory comment to cython-lint execution.

---------

Co-authored-by: Raphael Mitsch <r.mitsch@outlook.com>
2023-07-19 12:03:31 +02:00
Adriane Boyd
1509c96694
Clean up unused code in Language (#12836)
Follow-up to #12701.
2023-07-18 14:10:30 +02:00
Adriane Boyd
6bf7c65329
Update matcher pattern validation tests (#12835)
- parametrize over individual token patterns (as originally intended, as
far as I can tell)
- add a test for lowercase `in` in patterns
2023-07-18 10:00:07 +02:00
Adriane Boyd
95075298f5
Update pex Makefile defaults (#12832)
* Update pex Makefile defaults

- switch to python 3.8
- only install spacy-lookups-data for extra packages

* Update website for pex defaults
2023-07-18 09:29:04 +02:00
Ian Thompson
ef20e114e0
Typo fix in Language.replace_listeners docs (#12823)
* modified:   spacy/language.py
	- corrected typo in docstring for :method:`Language.replace_listeners`
	- added noqa comment on unused local variable assignment in :method:`Language.from_config` as I wasn't sure if it should be unassigned

modified:   website/docs/api/language.mdx
	- corrected typo in `Language.replace_listeners` markdown

* modified:   spacy/language.py
	- removed noqa comment

---------

Co-authored-by: Ian Thompson <ian.thompson@hrblock.com>
2023-07-14 09:45:54 +02:00
Connor Brinton
0566c3a166
🐛 Escape annotated HTML tags in span renderer (#12817)
These changes add a missing call to `escape_html` in the displaCy span
renderer. Previously span-annotated tokens would be inserted into the
page markup without being escaped, resulting in potentially incorrect
rendering. When I encountered this issue, it resulted in some docs and
span underlines being superimposed on top of properly rendered docs and
span underlines near the beginning of the visualization (due to an
unescaped `<span>` tag).
2023-07-13 17:33:05 +02:00
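For illustration, a hedged sketch of the situation the fix covers: span-annotated text that itself contains markup must be escaped before it is embedded in the rendered HTML (the label, offsets and spans key below are illustrative):

```python
import spacy
from spacy import displacy

nlp = spacy.blank("en")
doc = nlp("A literal <span> tag appears in this text")
# Annotate the "<span>" characters; "expand" snaps to token boundaries.
markup = doc.char_span(10, 16, label="MARKUP", alignment_mode="expand")
doc.spans["sc"] = [markup]
html = displacy.render(doc, style="span")  # annotated tokens are HTML-escaped
```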
Sofie Van Landeghem
ddffd09602
Trainable lemmatizer docs link (#12795)
* add an anchor to the trainable lemmatizer section

* add requirement for morphologizer,tagger to rule-based lemmatizer

* morphologizer only
2023-07-07 15:18:16 +02:00
Adriane Boyd
1a55661cfb
Update website binder version to v3.6 (#12805) 2023-07-07 10:52:33 +02:00
Adriane Boyd
41dba5bd34
Update max_length default in span finder docs (#12803) 2023-07-07 10:17:41 +02:00
Sofie Van Landeghem
b1b20bf69d
Replace projects functionality with weasel (#12769)
* Setting up weasel branch (#12456)

* remove project-specific functionality

* remove project-specific tests

* remove project-specific schemas

* remove project-specific information in about

* remove project-specific functions in util.py

* remove project-specific error strings

* remove project-specific CLI commands

* black formatting

* restore some functions that are used beyond projects

* remove project imports

* remove imports

* remove remote_storage tests

* remove one more project unit test

* update for PR 12394

* remove get_hash and get_checksum

* remove upload_ and download_file methods

* remove ensure_pathy

* revert clumsy fingers

* reinstate E970

* feat: use weasel as spacy project command (#12473)

* feat: use weasel as spacy project command

* build: use constrained requirement for weasel

* feat: add weasel to the library requirements

* build: update weasel to new version

* build: use specific weasel tag

* build: use weasel-0.1.0rc1 from PyPI

* fix: remove weasel from requirements.txt

* fix: requirements.txt and setup.cfg need to reflect each other

* feat: remove legacy spacy project code

* bump version

* further merge fixes

* isort

---------

Co-authored-by: Basile Dura <bdura@users.noreply.github.com>
2023-07-07 09:10:27 +02:00
Sofie Van Landeghem
9e63006b12
Merge pull request #12800 from explosion/master_copy
Sync develop with master
2023-07-07 08:44:19 +02:00
svlandeg
991bcc111e disable tests until 3.7 models are available 2023-07-07 08:09:57 +02:00
Madeesh Kannan
d195923164
Set version to 3.7.0.dev0 (#12799) 2023-07-06 18:29:03 +02:00
svlandeg
d26e4e0849 Revert "feat: add example stubs (#12679)"
This reverts commit 30bb34533a.
2023-07-06 17:02:38 +02:00
Basile Dura
30bb34533a
feat: add example stubs (#12679)
* feat: add example stubs

* fix: add required annotations

* fix: mypy issues

* fix: use Py36-compatible Protocol

* Minor reformatting

---------

Co-authored-by: Adriane Boyd <adrianeboyd@gmail.com>
Co-authored-by: svlandeg <svlandeg@github.com>
2023-07-06 16:49:43 +02:00
Adriane Boyd
6fc153a266
Merge pull request #12794 from adrianeboyd/chore/v3.6.0-2
Reenable compat+models tests for v3.6.0
2023-07-06 13:22:21 +02:00
Adriane Boyd
4e19ec7eb8
Docs for v3.6.0 (#12792)
* Docs for v3.6.0

* Add sl performance

* Add da trf note
2023-07-06 12:58:25 +02:00
Adriane Boyd
76329e1dde Revert "Temporarily skip download CLI related tests in CI"
This reverts commit 46ce66021a.
2023-07-06 12:48:06 +02:00
Adriane Boyd
a1191146f5 Revert "Temporarily skip tests for compat table"
This reverts commit dd5e00c735.
2023-07-06 12:47:50 +02:00
Adriane Boyd
830dcca367
SpanFinder: set default max_length to 25 (#12791)
When the default `max_length` is not set and there are longer training
documents, it can be difficult to train and evaluate the span finder due
to memory limits and the time it takes to evaluate a huge number of
predicted spans.
2023-07-06 09:55:34 +02:00
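For context, a minimal sketch of overriding that default when adding the component, assuming the `span_finder` factory and its `max_length` setting as described above:

```python
import spacy

nlp = spacy.blank("en")
# Cap predicted spans at 25 tokens (the new default) or any other limit.
span_finder = nlp.add_pipe("span_finder", config={"max_length": 25})
```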
Madeesh Kannan
8113cfb257
Language.replace_listeners: Pass the replaced listener and the tok2vec pipe to the callback (#12785)
* `Language.replace_listeners`: Pass the replaced listener and the `tok2vec` pipe to the callback

* Update developer docs

* `isort` fixes

* Add error message to assertion

* Add clarification to dev docs

* Replace assertion with exception

* Doc fixes
2023-07-05 13:36:04 +02:00
Sofie Van Landeghem
6f3a71999e
Merge pull request #12784 from explosion/master
Merge `master` into `develop`
2023-07-04 15:05:15 +02:00
Tom Aarsen
eab929361d
Use 'exclude' instead of 'disable' (#12783)
as suggested by @svlandeg
2023-07-04 11:45:13 +02:00
Marcus Blättermann
bd239511a4
Fix problem with missing syntax highlighting languages causing runtime crash on the website (#12781)
* Fix problem with universe pages using `docker` language

* Fix problem with universe pages using `r` language

* Add fallback, in case code language is unknown
2023-07-03 10:24:25 +02:00
Daniël de Kok
57a230c6e4
Remove section about parallel training with Ray (#12770)
The Ray integration is currently broken, having these docs around
suggest that this functionality is currently available.
2023-06-28 17:09:57 +02:00
Adriane Boyd
fb0da3e097
Support custom token/lexeme attribute for vectors (#12625)
* Support custom token/lexeme attribute for vectors

* Fix imports

* Back off to ORTH without Vectors.attr

* Fallback if vectors.attr doesn't exist

* Update docs
2023-06-28 09:43:14 +02:00
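A hedged sketch of the new setting: vectors keyed by a custom lexeme attribute such as `LOWER` instead of the default `ORTH` (the data and keys below are placeholders):

```python
import numpy
import spacy
from spacy.vectors import Vectors

nlp = spacy.blank("en")
data = numpy.zeros((2, 4), dtype="f")
# Look up vector keys by the LOWER attribute rather than the default ORTH.
nlp.vocab.vectors = Vectors(data=data, keys=["hello", "world"], attr="LOWER")
```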
Adriane Boyd
337a360cc7
Use spans_ prefix for default span finder scores (#12753) 2023-06-27 19:32:17 +02:00
Adriane Boyd
65f6c9cd10
Support overriding registered functions in configs (#12623)
Support overriding registered functions in configs. Previously the registry name was parsed as a section name rather than as a registry name.
2023-06-27 17:36:33 +02:00
Adriane Boyd
c067b5264c
Address issues with source with component names and replacing listeners (#12701)
When sourcing a component, the object from the original pipeline is added to the new pipeline as the same object. This creates a situation where there are several attributes that cannot be in sync between the original pipeline and the new pipeline at the same time for this one object:

* component.name
* component.listener_map / component.listening_components for tok2vec and transformer

When running replace_listeners on a component, the config is not updated correctly if the state of the component is incorrect for the current pipeline (in particular changes that should be applied from model.attrs["replace_listener_cfg"] as used in spacy-transformers) due to the fact that:

* find_listeners relies on component.name to set the name in the listener_map
* replace_listeners relies on listener_map to determine how to modify the configs

In addition, there are several places where pipeline components are modified and the listener map and/or internal component names aren't currently updated.

In cases where there is a component shared by two pipelines that cannot be in sync, this PR chooses to prioritize the most recently modified or initialized pipeline. There is no actual solution with the current source behavior that will make both pipelines usable, so the current pipeline is updated whenever components are added/renamed/removed or the pipeline is initialized for training.
2023-06-27 10:47:07 +02:00
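For context, a short sketch of the sourcing scenario described above (it assumes a packaged pipeline such as `en_core_web_sm` is installed):

```python
import spacy

source_nlp = spacy.load("en_core_web_sm")
nlp = spacy.blank("en")
# Sourcing adds the original component objects themselves to the new pipeline.
nlp.add_pipe("tok2vec", source=source_nlp)
nlp.add_pipe("ner", source=source_nlp)
# Listener bookkeeping now follows the most recently modified pipeline,
# so replacing listeners afterwards updates the config consistently.
nlp.replace_listeners("tok2vec", "ner", ["model.tok2vec"])
```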
Adriane Boyd
e1664217f5
Add spancat_singlelabel to debug data CLI (#12749) 2023-06-26 10:25:20 +02:00
Adriane Boyd
cb4fdc83e4
Merge pull request #12742 from adrianeboyd/chore/v3.6.0
Set version to v3.6.0
2023-06-21 15:34:28 +02:00
Adriane Boyd
34971bcbd1 Set version to v3.6.0 2023-06-21 12:59:36 +02:00
Adriane Boyd
dd5e00c735 Temporarily skip tests for compat table 2023-06-21 12:59:36 +02:00
Sofie Van Landeghem
d3ac8e897c
default value for phrasematcher in pyi (#12714) 2023-06-21 10:10:13 +02:00
Tom Aarsen
93983f08fc
Add SpanMarker for NER to spaCy universe (#12730)
* Add SpanMarker for NER to spaCy universe

* Escape the newlines in the text in the code example

Or at least, attempt to

* Remove now unnecessary import

* Disable NER pipeline component in code example
2023-06-20 16:47:44 +02:00
David Berenstein
53c400bd7a
docs: added reference to spacy-setfit to the spaCy Universe (#12737)
* docs: added reference to spacy-setfit

* removed package import after adding factory entry points to packages
2023-06-19 15:52:07 +02:00
Ziad Amerr
3125b97ace
Fixed e941 link rendering by removing the dot (#12735) 2023-06-19 13:31:08 +02:00
Marcus Blättermann
7e4b38c841
Fix #12716 does not update the config generation section (#12718)
This is a really odd bug, where Firefox doesn't re-render the `code` element, even though `children` changed.

Two things fixed that:
- remove the `language-ini` `className`
- replace the `code` block with a `div`

Both are not ideal. Therefore this solution adds an inner `div` that now has the classes while still maintaining the semantic `code` element.

I couldn't find any explanation for why this is happening and why it only happens in Firefox. I assume it is a bug caused by one of our many dependencies (or their interplay).

To make matters worse: This bug *doesn't* occur when running the site in dev mode. You have to build and serve the site to recreate it.
2023-06-19 09:34:28 +02:00
Daniël de Kok
e73c1a89bf
CI: add isort --check to validate job (#12727) 2023-06-15 23:10:25 +01:00
Daniël de Kok
e2b70df012
Configure isort to use the Black profile, recursively isort the spacy module (#12721)
* Use isort with Black profile

* isort all the things

* Fix import cycles as a result of import sorting

* Add DOCBIN_ALL_ATTRS type definition

* Add isort to requirements

* Remove isort from build dependencies check

* Typo
2023-06-14 17:48:41 +02:00
Jacobo Myerston
daa6e0339f
Update universe.json (#12709)
* Update universe.json

* Update universe.json

add some missing commas in greCy's description.
2023-06-12 13:55:20 +02:00
Sofie Van Landeghem
d65e3c31a6
use system-independent commands (#12693) 2023-06-08 11:43:36 +02:00
Adriane Boyd
0f9d2b01fb
Set version v3.6.0.dev1 (#12703) 2023-06-07 16:23:14 +02:00
kadarakos
c003aac29a
SpanFinder into spaCy from experimental (#12507)
* span finder integrated into spacy from experimental

* black

* isort

* black

* default spankey constant

* black

* Update spacy/pipeline/spancat.py

Co-authored-by: Adriane Boyd <adrianeboyd@gmail.com>

* rename

* rename

* max_length and min_length as Optional[int] and strict checking

* black

* mypy fix for integer type infinity

* revert line order

* implement all comparison operators for inf int

* avoid two for loops over all docs by not precomputing

* interleave thresholding with span creation

* black

* revert to not interleaving (realized it's faster)

* black

* Update spacy/errors.py

Co-authored-by: Adriane Boyd <adrianeboyd@gmail.com>

* update docstring

* enforce that the gold and predicted documents have the same text

* new error for ensuring reference and predicted texts are the same

* remove todo

* adjust test

* black

* handle misaligned tokenization

* return correct variable

* failing overfit test

* only use a single spans_key like in spancat

* black

* remove debug lines

* typo

* remove comment

* remove near-duplicate redundant method

* use the 'spans_key' variable name everywhere

* Update spacy/pipeline/span_finder.py

Co-authored-by: Adriane Boyd <adrianeboyd@gmail.com>

* flaky test fix suggestion, hand set bias terms

* only test suggester and test result exhaustively

* make it clear that the span_finder_suggester is more general (not specific to span_finder)

* Update spacy/tests/pipeline/test_span_finder.py

Co-authored-by: Adriane Boyd <adrianeboyd@gmail.com>

* Apply suggestions from code review

* remove question comment

* move preset_spans_suggester test to spancat tests

* Add docs and unify default configs for spancat and span finder

* Add `allow_overlap=True` to span finder scorer

* Fix offset bug in set_annotations

* Ignore labels in span finder scorer

* Format

* Add span_finder to quickstart template

* Move settings to self.cfg, store min/max unset as None

* Remove debugging

* Update docstrings and docs

* Update spacy/pipeline/span_finder.py

Co-authored-by: Sofie Van Landeghem <svlandeg@users.noreply.github.com>

* Fix imports

---------

Co-authored-by: Adriane Boyd <adrianeboyd@gmail.com>
Co-authored-by: Sofie Van Landeghem <svlandeg@users.noreply.github.com>
2023-06-07 15:52:28 +02:00
Basile Dura
c3c064ace4
fix: InitializableComponent type hints (#12692)
* fix: InitializableComponent type hints

* fix: avoid circular dependency

* style: clean imports in language.py

* style: use relative imports

Co-authored-by: Adriane Boyd <adrianeboyd@gmail.com>

* fix: apply black

---------

Co-authored-by: Adriane Boyd <adrianeboyd@gmail.com>
2023-06-02 14:29:52 +02:00
Adriane Boyd
c4112a1da3
Require that all SpanGroup spans are from the current doc (#12569)
* Require that all SpanGroup spans are from the current doc

The restriction on only adding spans from the current doc was already
implemented for all operations except for `SpanGroup.__init__`.

Initialize copied spans for `SpanGroup.copy` with `Doc.char_span` in
order to validate the character offsets and to make it possible to copy
spans between documents with differing tokenization. Currently there is
no validation that the document texts are identical, but the span char
offsets must be valid spans in the target doc, which prevents you from
ending up with completely invalid spans.

* Undo change in test_beam_overfitting_IO
2023-06-01 19:19:17 +02:00
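A minimal sketch of the restriction: a `SpanGroup` only accepts spans created on its own `Doc`, including at construction time.

```python
import spacy
from spacy.tokens import SpanGroup

nlp = spacy.blank("en")
doc = nlp("Berlin is nice")
other = nlp("Berlin is nice")

group = SpanGroup(doc, name="cities", spans=[doc[0:1]])  # OK: span is from `doc`
# Passing a span from `other` here now raises an error instead of silently
# creating a group whose spans belong to a different document.
```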
Isabel Zimmerman
05df59fd4a
[DOCS] add vetiver to spacy universe (#12557)
* add vetiver to spacy universe

* remove image

* update logo to render correctly in thumbnail

* apply Basile's suggestion

Co-authored-by: Basile Dura <bdura@users.noreply.github.com>

* refer to the same model

---------

Co-authored-by: Sofie Van Landeghem <svlandeg@users.noreply.github.com>
Co-authored-by: Basile Dura <bdura@users.noreply.github.com>
2023-06-01 17:11:18 +02:00
Adriane Boyd
c936db2faf
Address numpy 1.25 deprecations in test suite (#12684)
* Address upcoming numpy v1.25 deprecations in test suite

* Temporarily test most recent numpy prerelease in CI

* Revert "Temporarily test most recent numpy prerelease in CI"

This reverts commit d75a66e55e.
2023-05-31 17:23:07 +02:00
Adriane Boyd
9b7a59c325
Revert "CI: Disable fail-fast (#12658)" (#12676)
This reverts commit 1f088cbf4a.
2023-05-26 10:57:02 +02:00
Vinit Ravishankar
f0e0206b77
update universe for spacypdfreader (#12661) 2023-05-23 13:28:48 +02:00
Adriane Boyd
1f088cbf4a
CI: Disable fail-fast (#12658)
While the typing_extensions/pydantic `Literal` bugs are being sorted
out, disable fail-fast so the rest of the CI is available for
development purposes.
2023-05-23 10:48:06 +02:00
Basile Dura
6ea4155487
feat: add comparison operators in span.pyi (#12652)
* feat: add comparison operators in span.pyi

remove Cython-specific `__richcmp__`

* fix: comparison operators should be defined for any other object
2023-05-23 08:50:37 +02:00
Victoria
6930a6bf45
Add spaCy VSCode extension materials (#12592) 2023-05-19 14:38:53 +02:00
Basile Dura
95fd46b1dd
feat: add type hinting on SpanGroup.__iter__ (#12642) 2023-05-17 14:20:00 +02:00
Adriane Boyd
df083f91a5
Add Malay to website languages (#12643) 2023-05-17 13:13:43 +02:00
Sani
873c16a4df
Malay language support (#12602)
* add malay lang

* fix token len

* black format

* reformat conftest malay

* remove exceptions not exist in dbp

* format code
2023-05-17 12:45:21 +02:00
Lj Miranda
58779c24ef
Remove shorthand for output-file in spacy apply (#12636)
The output-file argument is positional, so it can't use a shorthand like -o.
2023-05-17 12:36:29 +02:00
David Berenstein
83b6f488cb
universe: Update examples Adept Augmentation (#12620)
* Update universe.json

* chore: changed readme example as suggested by Vincent Warmerdam (koaning)
2023-05-15 14:09:33 +02:00
Adriane Boyd
3dc445df8d
Fix new tags in docs for v3.5.x (#12629)
* Fix new tags in docs for v3.5.x

* Fix new tag
2023-05-15 12:06:58 +02:00
Basile Dura
2dd8825f09
docs: add comment on offset_x argument (#12630) 2023-05-15 11:42:47 +02:00
Basile Dura
f96b9e03df
build: bump typer version to accept >=0.3<0.10 (#12631) 2023-05-15 08:06:58 +02:00
Adriane Boyd
3637148c4d
Add scorer option to return per-component scores (#12540)
* Add scorer option to return per-component scores

Add `per_component` option to `Language.evaluate` and `Scorer.score` to
return scores keyed by `tokenizer` (hard-coded) or by component name.

Add option to `evaluate` CLI to score by component. Per-component scores
can only be saved to JSON.

* Update help text and messages
2023-05-12 15:36:54 +02:00
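A hedged sketch of the new option (the component and data below are illustrative):

```python
import spacy
from spacy.training import Example

nlp = spacy.blank("en")
nlp.add_pipe("sentencizer")
examples = [Example.from_dict(nlp("One sentence. Another one."), {})]
# Scores are keyed by "tokenizer" and by component name rather than by metric.
scores = nlp.evaluate(examples, per_component=True)
```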
Kenneth Enevoldsen
88680a6eed
docs: remove invalid huggingface-hub push argument (#12624) 2023-05-12 09:40:28 +02:00
Adriane Boyd
b5af0fe836
Revert "Use Latin normalization for Serbian attrs (#12608)" (#12621)
This reverts commit 6f314f99c4.

We are reverting this until we can support this normalization more
consistently across vectors, training corpora, and lemmatizer data.
2023-05-11 11:54:16 +02:00
royashcenazi
3252f6b13f
Parsigs universe 3 (#12617)
* parsigs universe

* added model installation explanation in the description

* Update website/meta/universe.json

Co-authored-by: Basile Dura <bdura@users.noreply.github.com>

* added model installation instruction in the code example

* added biomedical category

---------

Co-authored-by: Basile Dura <bdura@users.noreply.github.com>
2023-05-10 13:49:51 +02:00
royashcenazi
a56ab98e3c
parsigs universe (#12616)
* parsigs universe

* added model installation explanation in the description

* Update website/meta/universe.json

Co-authored-by: Basile Dura <bdura@users.noreply.github.com>

* added model installation instruction in the code example

---------

Co-authored-by: Basile Dura <bdura@users.noreply.github.com>
2023-05-10 13:19:28 +02:00
David Berenstein
d11b549195
chore: added adept-augmentations to the spacy universe (#12609)
* chore: added adept-augmentations to the spacy universe

* Apply suggestions from code review

Co-authored-by: Basile Dura <bdura@users.noreply.github.com>

* Update universe.json

---------

Co-authored-by: Basile Dura <bdura@users.noreply.github.com>
2023-05-10 13:16:16 +02:00
Patrick J. Burns
15f16db6ca
Fix typo (#12615) 2023-05-09 15:52:34 +02:00
Patrick J. Burns
eb3960a15a
Add LatinCy models to universe.json (#12597)
* Add LatinCy models to universe.json

* Update website/meta/universe.json

Add install code for LatinCy models to 'code_example'

Co-authored-by: Adriane Boyd <adrianeboyd@gmail.com>

* Update LatinCy ‘code_example’ in website/meta/universe.json

Co-authored-by: Adriane Boyd <adrianeboyd@gmail.com>

---------

Co-authored-by: Adriane Boyd <adrianeboyd@gmail.com>
2023-05-09 12:02:45 +02:00
Adriane Boyd
1279b464bb
In initialize only calculate current vectors hash if needed (#12607) 2023-05-08 16:51:58 +02:00
Adriane Boyd
6f314f99c4
Use Latin normalization for Serbian attrs (#12608)
* Use Latin normalization for Serbian attrs

Use Latin normalization for Serbian `NORM`, `PREFIX`, and `SUFFIX`.

* Update NORMs in tokenizer exceptions and related tests

* Add tests for all custom lex attrs

* Remove unused imports
2023-05-08 12:33:56 +02:00
Adriane Boyd
cbc6bcf434
Merge pull request #12604 from adrianeboyd/chore/v3.6.0.dev0
Set version to v3.6.0.dev0
2023-05-08 10:05:15 +02:00
Adriane Boyd
46ce66021a Temporarily skip download CLI related tests in CI 2023-05-08 09:17:33 +02:00
Adriane Boyd
fbd12eb4a4 Set version to v3.6.0.dev0 2023-05-08 09:10:35 +02:00
Adriane Boyd
dbc71ecd44
Remove #egg from download URLs (#12567)
The current URLs will become invalid in pip 25.0. According to the pip
docs, the egg= URLs are currently only needed for editable VCS installs.
2023-05-04 17:13:12 +02:00
Kenneth Enevoldsen
73698326df
Update inmemorylookupkb.mdx (#12586)
Example does not refer to the in-memory lookup
2023-05-02 12:51:13 +02:00
Lj Miranda
298e6036b7
Add spans in spacy benchmark (#12575)
* Add spans in spacy benchmark

The current implementation of spaCy benchmark accuracy / spacy evaluate
doesn't include the "spans" type, so calling the command doesn't render
the displaCy HTML file that is needed.

This PR attempts to fix that by adding a new parameter for "spans"
and calling displaCy with the appropriate style.

* Reformat file with black

* Add tests for evaluate

* Fix spans -> span for displacy style

* Update test to check render instead

* Update source so mypy passes

* Add parser information to avoid warnings
2023-04-28 14:32:52 +02:00
Adriane Boyd
6817e3d372
CI: Only run test suite once with thinc-apple-ops for macos python 3.11 (#12436)
* CI: Only run test suite once with thinc-apple-ops for macos python 3.11

* Adjust syntax

* Try alternate syntax

* Try alternate syntax

* Try alternate syntax
2023-04-28 14:29:51 +02:00
kadarakos
34d1164b0e
Spancat speed improvement (#12577)
* avoid nesting then flattening

* mypy fix

* Apply suggestions from code review

* Add type for indices

* Run full matrix for mypy

* Add back modified type: ignore

* Revert "Run full matrix for mypy"

This reverts commit e218873d04.

---------

Co-authored-by: Adriane Boyd <adrianeboyd@gmail.com>
2023-04-27 15:27:13 +02:00
Victoria
a8dfc66135
Add spacy-wasm to universe (#12572)
* add spacy-wasm to universe

* add tag
2023-04-26 14:18:40 +02:00
moxley01
070fa16545
add spacysee project (#12568) 2023-04-25 12:30:19 +02:00
Adriane Boyd
68da580a4c
CI: Disable Azure (#12560) 2023-04-21 15:05:53 +02:00
Victoria
e115408514
remove survey link (#12559) 2023-04-21 10:22:26 +02:00
Patrick J. Burns
ab4ba04c32
Update LatinDefaults for lang 'la' (#12538)
* Add noun chunking to la syntax iterators

* Expand list of numeral, ordinal words

* Expand abbreviations in la tokenizer_exceptions

* Add example sents

* Update spacy/lang/la/syntax_iterators.py

Reorganize la syntax iterators

Co-authored-by: Sofie Van Landeghem <svlandeg@users.noreply.github.com>

* Minor updates based on review

* fix call

---------

Co-authored-by: Sofie Van Landeghem <svlandeg@users.noreply.github.com>
2023-04-20 16:55:40 +02:00
Adriane Boyd
b60b027927
Add default option to MorphAnalysis.get (#12545)
* Add default to MorphAnalysis.get

Similar to `dict`, allow a `default` option for `MorphAnalysis.get` for
the user to provide a default return value if the field is not found.
The default return value remains `[]`, which is not the same as
`dict.get`, but is already established as this method's default return
value with the return type `List[str]`. However the new `default` option
does not enforce that the user-provided default is actually `List[str]`.

* Restore test case
2023-04-20 14:06:32 +02:00
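A small sketch of the new argument (the morphological field and fallback value are illustrative):

```python
import spacy

nlp = spacy.blank("en")
token = nlp("cats")[0]
token.morph.get("Number")                    # [] when the field is not set
token.morph.get("Number", default=["Sing"])  # user-provided fallback instead
```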
Adriane Boyd
dc0a1a9808
Load exceptions last in Tokenizer.from_bytes (#12553)
In `Tokenizer.from_bytes`, the exceptions should be loaded last so that
they are only processed once as part of loading the model.

The exceptions are tokenized as phrase matcher patterns in the
background and the internal tokenization needs to be synced with all the
remaining tokenizer settings. If the exceptions are not loaded last,
there are speed regressions for `Tokenizer.from_bytes/disk` vs.
`Tokenizer.add_special_case` as the caches are reloaded more than
necessary during deserialization.
2023-04-20 11:30:34 +02:00
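For context, a minimal round trip through the APIs involved (the special case and text are illustrative):

```python
import spacy
from spacy.attrs import ORTH

nlp = spacy.blank("en")
nlp.tokenizer.add_special_case("don't", [{ORTH: "do"}, {ORTH: "n't"}])
data = nlp.tokenizer.to_bytes()

nlp2 = spacy.blank("en")
nlp2.tokenizer.from_bytes(data)  # special cases are now applied once, at the end
assert [t.text for t in nlp2("don't")] == ["do", "n't"]
```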
Sofie Van Landeghem
8e6a3d58d8
fix typo (#12543) 2023-04-19 10:59:33 +02:00
TAN Long
923d24e885
perf(REL_OP): Replace some token.children with token.rights or token.lefts (#12528)
Co-authored-by: Tan Long <tanloong@foxmail.com>
2023-04-17 13:16:34 +02:00
TAN Long
119f959218
docs(REL_OP): modify docs for REL_OPs to match Semgrex's update on CoreNLP v4.5.2 (#12531)
Co-authored-by: Tan Long <tanloong@foxmail.com>
2023-04-17 13:14:01 +02:00
andyjessen
02259fa195
Add category to spaCy project (#12506)
ScispaCy fits within the biomedical domain. Consider adding this category.
2023-04-07 15:31:04 +02:00
Madeesh Kannan
6db20b354f
Docs: Fix rule-based matching example that expands named entities (#12495) 2023-04-06 11:45:58 +02:00
Edward
c95d320d28
Add more information to custom code docs (#12491)
* Add info to sections

* Update website/docs/usage/training.mdx

---------

Co-authored-by: Adriane Boyd <adrianeboyd@gmail.com>
2023-04-06 11:45:19 +02:00
Will Frey
8d4129e177
Fix invalid ConsoleLogger.v3 example config (#12498)
Replace `progress_bar = "all_steps"` with `progress_bar = "eval"`, which is consistent with the default behavior for `spacy.ConsoleLogger.v1` and `spacy.ConsoleLogger.v2`.
2023-04-04 20:53:07 +02:00
Edward
de32011e4c
Add model-last saving mechanism to pretraining (#12459)
* Adjust pretrain command

* change naming and add finally block

* Add unit test

* Add unit test assertions

* Update spacy/training/pretrain.py

Co-authored-by: Adriane Boyd <adrianeboyd@gmail.com>

* change finally block

* Add to docs

* Update website/docs/usage/embeddings-transformers.mdx

* Add flag to skip saving model-last

---------

Co-authored-by: Adriane Boyd <adrianeboyd@gmail.com>
2023-04-03 15:24:03 +02:00
Adriane Boyd
4a1ec332de
Add Span.kb_id/Span.id strings to Doc/DocBin serialization if set (#12493)
* Add Span.kb_id/Span.id strings to Doc/DocBin serialization if set

* Format
2023-04-03 15:11:12 +02:00
Adriane Boyd
4538ceb507
Remove redundant strings.add for Doc.char_span (#12429) 2023-04-03 11:38:56 +02:00
Adriane Boyd
476a2e7a0a
Allow cupy 12.0 for extras (#12490) 2023-03-31 13:48:15 +02:00
Adriane Boyd
69e20ce03d
Fix pickle for ngram suggester (#12486) 2023-03-31 13:43:51 +02:00
Adriane Boyd
140d53649d
Convert values to numpy for label smoothing tests (#12472) 2023-03-31 13:41:41 +02:00
Ye Lei (叶磊)
ce258670b7
Allow passing a Span to displacy.parse_deps (#12477)
* Allow passing a Span to displacy.parse_deps

* Update docstring

Co-authored-by: Adriane Boyd <adrianeboyd@gmail.com>

* Update API docs

---------

Co-authored-by: Adriane Boyd <adrianeboyd@gmail.com>
2023-03-31 09:44:01 +02:00
Raphael Mitsch
d85df9d577
Fix Span.sents for edge case of Span being the only Span in the last sentence of a Doc. (#12484) 2023-03-29 18:54:47 +02:00
kadarakos
372a90885e
Fix spancat-singlelabel score (#12469)
* debug argmax sort and add span scores

* add missing tests for spanscores
2023-03-29 08:38:11 +02:00
Edward
dba4e7bece
Add info to stringstore and vocab (#12471) 2023-03-27 13:15:14 +02:00
Adriane Boyd
2fba21be63
Restrict github workflows to explosion (#12470) 2023-03-27 12:44:04 +02:00
sloev / Johannes Valbjørn
fd072533e7
add spacy_onnx_sentiment_english to universe (#12422)
* add spacy_onnx_sentiment_english to universe

* rename to sentimental-onix

* fix comma json error

* fix typo

* typo fix

Co-authored-by: Adriane Boyd <adrianeboyd@gmail.com>

* mention need to download model before example works

Co-authored-by: Adriane Boyd <adrianeboyd@gmail.com>

---------

Co-authored-by: Adriane Boyd <adrianeboyd@gmail.com>
2023-03-27 11:35:14 +02:00
Prajakta Darade
ae7779e830
corrected example code (#12466) 2023-03-27 11:32:49 +02:00
kadarakos
d1474fdd91
add explanation about overwriting behaviour (#12464)
* add explanation about overwriting behaviour

* Update website/docs/api/spancategorizer.mdx

Co-authored-by: Adriane Boyd <adrianeboyd@gmail.com>

* Update website/docs/api/spancategorizer.mdx

Co-authored-by: Adriane Boyd <adrianeboyd@gmail.com>

* Update website/docs/api/spancategorizer.mdx

Co-authored-by: Adriane Boyd <adrianeboyd@gmail.com>

* format

---------

Co-authored-by: Adriane Boyd <adrianeboyd@gmail.com>
2023-03-27 10:27:11 +02:00
Adriane Boyd
fac457a509
Support floret for PretrainVectors (#12435)
* Support floret for PretrainVectors

* Format
2023-03-24 16:28:51 +01:00
Adriane Boyd
d0bd3f5ee4
Update Serbian tokenization for UD Serbian SET (#12442) 2023-03-24 16:26:40 +01:00
Vinit Ravishankar
28de85737f
Tagger label smoothing (#12293)
* add label smoothing

* use True/False instead of floats

* add entropy to debug data

* formatting

* docs

* change test to check difference in distributions

* Update website/docs/api/tagger.mdx

Co-authored-by: Adriane Boyd <adrianeboyd@gmail.com>

* Update spacy/pipeline/tagger.pyx

Co-authored-by: Adriane Boyd <adrianeboyd@gmail.com>

* bool -> float

* update docs

* fix seed

* black

* update tests to use label_smoothing = 0.0

* set default to 0.0, update quickstart

* Update spacy/pipeline/tagger.pyx

Co-authored-by: Adriane Boyd <adrianeboyd@gmail.com>

* update morphologizer, tagger test

* fix morph docs

* add url to docs

---------

Co-authored-by: Adriane Boyd <adrianeboyd@gmail.com>
2023-03-22 12:17:56 +01:00
Ines Montani
b479f8bfa5
Add user survey alert to the top (#12452)
* Add user survey alert to the top

* Shorter

---------

Co-authored-by: Sofie Van Landeghem <svlandeg@users.noreply.github.com>
2023-03-22 11:09:37 +01:00
Adriane Boyd
54c614e116
CI: Separate spacy universe validation into a separate workflow (#12440)
* Separate spacy universe validation into a separate workflow

* Fix new workflow name
2023-03-17 10:59:53 +01:00
Adriane Boyd
5f72d6c836
CI: Switch PR back to paths-ignore (#12438)
Switch PR tests back to paths-ignore but include changes to `.github`
for all PRs rather than trying to figure out complicated
includes+excludes.  Changes to `.github` are relatively rare and should
not be a huge burden for the CI.
2023-03-17 10:01:49 +01:00
Adriane Boyd
4c5a3a2a7b
Remove autoblack workflow (#12437)
Now that all PRs have `black` formatting validation, we no longer need the
autoblack workflow.
2023-03-17 09:35:00 +01:00
Raphael Mitsch
96b61d0671
Fix EL failure with sentence-crossing entities (#12398)
* Add test reproducing EL failure in sentence-crossing entities.

* Format.

* Draft fix.

* Format.

* Fix case for len(ent.sents) == 1.

* Format.

* Format.

* Format.

* Fix mypy error.

* Merge EL sentence crossing tests.

* Remove unneeded sentencizer component.

* Fix or ignore mypy issues in test.

* Simplify ent.sents handling.

* Format. Update assert in ent.sents handling.

* Small rewrite

---------

Co-authored-by: Sofie Van Landeghem <svlandeg@users.noreply.github.com>
2023-03-14 22:02:49 +01:00
Adriane Boyd
2ce9a220db
Fix --verbose for spacy find-threshold (#12418) 2023-03-14 17:16:49 +01:00
Adriane Boyd
377f601bff
CI: Add all paths before excluding patterns (#12419) 2023-03-14 16:06:08 +01:00
Raphael Mitsch
e8cab4625c
Fix sentence indexing bug in Span.sents (#12405)
* Add test for partial sentences in ent.sents.

* Removed unneeded import.

* Format. Simplify code.
2023-03-14 10:21:53 +01:00
Adriane Boyd
ea6de64596
CI: Move CLI tests to ubuntu for speed (#12409) 2023-03-13 15:14:46 +01:00
Adriane Boyd
8f1280a514
Fix thinc-apple-ops test to run for python 3.11 (#12408) 2023-03-13 15:10:04 +01:00
Adriane Boyd
8ff9073161
CI: Move universe validation to validate job (#12406)
* CI: Move universe validation to validate job

* Fix indentation

* Update step name
2023-03-13 14:21:17 +01:00
Adriane Boyd
3c999f052e
Add GHA for CI tests (#12403)
* Add GHA for CI tests

* Reorder paths
2023-03-13 13:13:47 +01:00
Adriane Boyd
f27bce67fd
Skip project clone tests if git is not available (#12394) 2023-03-09 16:41:21 +01:00
Lj Miranda
913d74f509
Add spancat_singlelabel pipeline for multiclass and non-overlapping span labelling tasks (#11365)
* [wip] Update

* [wip] Update

* Add initial port

* [wip] Update

* Fix all imports

* Add spancat_exclusive to pipeline

* [WIP] Update

* [ci skip] Add breakpoint for debugging

* Use spacy.SpanCategorizer.v1 as default archi

* Update spacy/pipeline/spancat_exclusive.py

Co-authored-by: kadarakos <kadar.akos@gmail.com>

* [ci skip] Small updates

* Use Softmax v2 directly from thinc

* Cache the label map

* Fix mypy errors

However, I ignored line 370 because it opened up a bunch of type errors
that might be trickier to solve and might lead to a more complicated
codebase.

* avoid multiplication with 1.0

Co-authored-by: kadarakos <kadar.akos@gmail.com>

* Update spacy/pipeline/spancat_exclusive.py

Co-authored-by: Sofie Van Landeghem <svlandeg@users.noreply.github.com>

* Update component versions to v2

* Add scorer to docstring

* Add _n_labels property to SpanCategorizer

Instead of using len(self.labels) in initialize() I am using a private
property self._n_labels. This achieves implementation parity and allows
me to delete the whole initialize() method for spancat_exclusive (since
it's now the same with spancat).

* Inherit from SpanCat instead of TrainablePipe

This commit changes the inheritance structure of Exclusive_Spancat:
it now inherits from SpanCategorizer rather than TrainablePipe. This
allows me to remove duplicate methods that are already present in
the parent class.

* Revert documentation link to spancat

* Fix init call for exclusive spancat

* Update spacy/pipeline/spancat_exclusive.py

Co-authored-by: Adriane Boyd <adrianeboyd@gmail.com>

* Import Suggester from spancat

* Include zero_init.v1 for spancat

* Implement _allow_extra_label to use _n_labels

To ensure that spancat / spancat_exclusive cannot be resized after
initialization, I inherited the _allow_extra_label() method from
spacy/pipeline/trainable_pipe.pyx and used self._n_labels instead
of len(self.labels) for checking.

I think that changing it locally is a better solution rather than
forcing each class that inherits TrainablePipe to use the self._n_labels
attribute.

Also note that I turned off black formatting in this block of code
because it reads better without the overhang.

* Extend existing tests to spancat_exclusive

In this commit, I extended the existing tests for spancat to include
spancat_exclusive. I parametrized the test functions with 'name'
(similar var name with textcat and textcat_multilabel) for each
applicable test.

TODO: Add overfitting tests for spancat_exclusive

* Update documentation for spancat

* Turn on formatting for allow_extra_label

* Remove initializers in default config

* Use DEFAULT_EXCL_SPANCAT_MODEL

I also renamed spancat_exclusive_default_config into
spancat_excl_default_config because black does some not pretty
formatting changes.

* Update documentation

Update grammar and usage

Co-authored-by: Adriane Boyd <adrianeboyd@gmail.com>

* Clarify docstring for Exclusive_SpanCategorizer

* Remove mypy ignore and typecast labels to list

* Fix documentation API

* Use a single variable for tests

* Update defaults for number of rows

Co-authored-by: Adriane Boyd <adrianeboyd@gmail.com>

* Put back initializers in spancat config

Whenever I remove model.scorer.init_w and model.scorer.init_b,
I encounter an error in the test:

    SystemError: <method '__getitem__' of 'dict' objects> returned a result
    with an error set.

My Thinc version is 8.1.5, but I can't seem to check what's causing the
error.

* Update spancat_exclusive docstring

* Remove init_W and init_B parameters

This commit is expected to fail until the new Thinc release.

* Require thinc>=8.1.6 for serializable Softmax defaults

* Handle zero suggestions to make tests pass

I'm not sure if this is the most elegant solution. But what should
happen is that the _make_span_group function MUST return an empty
SpanGroup if there are no suggestions.

The error happens when the 'scores' variable is empty. We cannot
get the 'predicted' and other downstream vars.

* Better approach for handling zero suggestions

* Update website/docs/api/spancategorizer.md

Co-authored-by: Adriane Boyd <adrianeboyd@gmail.com>

* Update spancategorizer headers

* Apply suggestions from code review

Co-authored-by: Sofie Van Landeghem <svlandeg@users.noreply.github.com>

* Add default value in negative_weight in docs

* Add default value in allow_overlap in docs

* Update how spancat_exclusive is constructed

In this commit, I added the following:
- Put the default values of negative_weight and allow_overlap
    in the default_config dictionary.
- Rename make_spancat -> make_exclusive_spancat

* Run prettier on spancategorizer.mdx

* Change exactly one -> at most one

* Add suggester documentation in Exclusive_SpanCategorizer

* Add suggester to spancat docstrings

* merge multilabel and singlelabel spancat

* rename spancat_exclusive to singlelabel

* wire up different make_spangroups for single and multilabel

* black

* black

* add docstrings

* more docstring and fix negative_label

* don't rely on default arguments

* black

* remove spancat exclusive

* replace single_label with add_negative_label and adjust inference

* mypy

* logical bug in configuration check

* add spans.attrs[scores]

* single label make_spangroup test

* bugfix

* black

* tests for make_span_group with negative labels

* refactor make_span_group

* black

* Update spacy/tests/pipeline/test_spancat.py

Co-authored-by: Adriane Boyd <adrianeboyd@gmail.com>

* remove duplicate declaration

* Update spacy/pipeline/spancat.py

Co-authored-by: Adriane Boyd <adrianeboyd@gmail.com>

* raise error instead of just print

* make label mapper private

* update docs

* run prettier

* Update website/docs/api/spancategorizer.mdx

Co-authored-by: Adriane Boyd <adrianeboyd@gmail.com>

* Update website/docs/api/spancategorizer.mdx

Co-authored-by: Adriane Boyd <adrianeboyd@gmail.com>

* Update spacy/pipeline/spancat.py

Co-authored-by: Adriane Boyd <adrianeboyd@gmail.com>

* Update spacy/pipeline/spancat.py

Co-authored-by: Adriane Boyd <adrianeboyd@gmail.com>

* Update spacy/pipeline/spancat.py

Co-authored-by: Adriane Boyd <adrianeboyd@gmail.com>

* Update spacy/pipeline/spancat.py

Co-authored-by: Adriane Boyd <adrianeboyd@gmail.com>

* don't keep recomputing self._label_map for each span

* typo in docs

* Intervals to private and document 'name' param

* Update spacy/pipeline/spancat.py

Co-authored-by: Adriane Boyd <adrianeboyd@gmail.com>

* Update spacy/pipeline/spancat.py

Co-authored-by: Adriane Boyd <adrianeboyd@gmail.com>

* add Tag to new features

* replace tags

* revert

* revert

* revert

* revert

* Update website/docs/api/spancategorizer.mdx

Co-authored-by: Adriane Boyd <adrianeboyd@gmail.com>

* Update website/docs/api/spancategorizer.mdx

Co-authored-by: Adriane Boyd <adrianeboyd@gmail.com>

* prettier

* Fix merge

* Update website/docs/api/spancategorizer.mdx

* remove references to 'single_label'

* remove old paragraph

* Add spancat_singlelabel to config template

* Format

* Extend init config tests

---------

Co-authored-by: kadarakos <kadar.akos@gmail.com>
Co-authored-by: Sofie Van Landeghem <svlandeg@users.noreply.github.com>
Co-authored-by: Adriane Boyd <adrianeboyd@gmail.com>
2023-03-09 10:30:59 +01:00
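A hedged sketch of adding the merged single-label component (the config keys are assumed from the description above):

```python
import spacy

nlp = spacy.blank("en")
# At most one label per span; negative_weight tunes the implicit negative label.
spancat = nlp.add_pipe(
    "spancat_singlelabel",
    config={"spans_key": "sc", "negative_weight": 1.0, "allow_overlap": True},
)
```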
Victoria
4fdf356b29
Add links in website and readme for survey (#12385) 2023-03-09 10:01:18 +01:00
Marcus Blättermann
b309336712
Make sure to run Python setup before NPM dev mode (#12384) 2023-03-08 11:59:10 +01:00
Paul O'Leary McCann
e656189ec3
Change GPU efficient textcat to use CNN, not BOW in generated configs (#11900)
* Change GPU efficient textcat to use CNN, not BOW

If you generate a config with a textcat component using GPU
(transformers), the defaut option (efficiency) uses a BOW architecture,
which does not use tok2vec features. While that can make sense as part
of a larger pipeline, in the case of just a transformer and a textcat,
that means the transformer is doing a lot of work for no purpose.

This changes it so that the CNN architecture is used instead. It could
also be changed to be the same as the accuracy config, which uses the
ensemble architecture.

* Add the transformer when using a textcat with GPU

* Switch ubuntu-latest to ubuntu-20.04 in main tests (#11928)

* Switch ubuntu-latest to ubuntu-20.04 in main tests

* Only use 20.04 for 3.6

* Require thinc v8.1.7

* Require thinc v8.1.8

* Break up longer expression

---------

Co-authored-by: Adriane Boyd <adrianeboyd@gmail.com>
2023-03-07 17:47:45 +01:00
Sofie Van Landeghem
3bf4539e31
fix types (#12365) 2023-03-07 13:29:08 +01:00
Adriane Boyd
260cb9c6fe
Raise error for non-default vectors with PretrainVectors (#12366) 2023-03-06 18:06:31 +01:00
Adriane Boyd
5ecb3babed
Update to use absolute imports in tests (#12372) 2023-03-06 17:30:17 +01:00
Adriane Boyd
0bbc620dd8
Partially work around pending deprecation of pkg_resources (#12368)
* Handle deprecation of pkg_resources

* Replace `pkg_resources` with `importlib_metadata` for `spacy info
--url`
* Remove requirements check from `spacy project` given the lack of
alternatives

* Fix installed model URL method and CI test

* Fix types/handling, simplify catch-all return

* Move imports instead of disabling requirements check

* Format

* Reenable test with ignored deprecation warning

* Fix except

* Fix return
2023-03-06 14:48:57 +01:00
Raphael Mitsch
6aa6b86d49
Make generation of empty KnowledgeBase instances configurable in EntityLinker (#12320)
* Make empty_kb() configurable.

* Format.

* Update docs.

* Be more specific in KB serialization test.

* Update KB serialization tests. Update docs.

* Remove doc update for batched candidate generation.

* Fix serialization of subclassed KB in tests.

* Format.

* Update docstring.

* Update docstring.

* Switch from pickle to json for custom field serialization.
2023-03-01 16:02:55 +01:00
kadarakos
56aa0cc75f
Displacy doc fix (#12352)
* more details for color setting

* more details for color setting

* prettier
2023-03-01 15:38:23 +01:00
Sofie Van Landeghem
74cae47bf6
rely on is_empty property instead of __len__ (#12347) 2023-03-01 12:06:07 +01:00
Raphael Mitsch
efbc3d37b3
Update docs w.r.t. spacy.CandidateBatchGenerator.v1. (#12350) 2023-03-01 11:01:35 +01:00
Adriane Boyd
33864f1d07
Add new tags in docs for #12334 (#12348) 2023-03-01 10:46:13 +01:00
Adriane Boyd
8f058e39bd
Fix error message for displacy auto_select_port (#12343) 2023-02-28 16:36:03 +01:00
TAN Long
071667376a
Add new REL_OPs: >+, >-, <+, and <- (#12334)
* Add immediate left/right child/parent dependency relations

* Add tests for new REL_OPs: `>+`, `>-`, `<+`, and `<-`.

---------

Co-authored-by: Tan Long <tanloong@foxmail.com>
2023-02-28 14:36:33 +01:00
lise-brinck
e2de188cf1
Bugfix/swedish tokenizer (#12315)
* add unittest for explosion#12311

* create punctuation.py for swedish

* removed : from infixes in swedish punctuation.py

* allow : as infix if succeeding char is uppercase
2023-02-27 10:53:45 +01:00
Adriane Boyd
4539fbae17
Revert "Fix FUZZY operator definition (#12318)" (#12336)
This reverts commit daedc45d05.

The default length depends on the length of the pattern string and was
correct for this example.
2023-02-27 09:48:36 +01:00
Kevin Humphreys
acdd993071
Matcher performance fix for extension predicates: use shared key function (#12272)
* standardize predicate key format

* single key function

* Make optional args in key function keyword-only

---------

Co-authored-by: Adriane Boyd <adrianeboyd@gmail.com>
2023-02-27 08:35:08 +01:00
Paul O'Leary McCann
1e8bac99f3
Add tests for projects to master (#12303)
* Add tests for projects to master

* Fix git clone related issues on Windows

* Add stat import
2023-02-23 10:22:57 +01:00
andyjessen
daedc45d05
Fix FUZZY operator definition (#12318)
* Fix FUZZY operator definition

The default length of the FUZZY operator is 2 and not 3.

* adjust edit distance in matcher usage docs too

---------

Co-authored-by: svlandeg <svlandeg@github.com>
2023-02-23 09:37:40 +01:00
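For reference, a small sketch of the `FUZZY` attribute being discussed (the pattern and text are illustrative):

```python
import spacy
from spacy.matcher import Matcher

nlp = spacy.blank("en")
matcher = Matcher(nlp.vocab)
# The allowed edit distance is derived from the pattern length by default.
matcher.add("GREETING", [[{"LOWER": {"FUZZY": "hello"}}]])
matches = matcher(nlp("helo there"))  # still matches despite the missing "l"
```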
Adriane Boyd
80bc140533
Add grc to langs with lexeme norms in spacy-lookups-data (#12287) 2023-02-16 17:57:02 +01:00
Edward
61b8454137
Adjust return type of registry.find (#12227)
* Fix registry find return type

* add dot

* Add type ignore for mypy

* update black formatting version

* add mypy ignore to package cli

* mypy type fix (for real)

* Update find description in spacy/util.py

Co-authored-by: Raphael Mitsch <r.mitsch@outlook.com>

* adjust mypy directive

---------

Co-authored-by: Raphael Mitsch <r.mitsch@outlook.com>
2023-02-15 12:32:53 +01:00
Raphael Mitsch
2d4fb94ba0
Fix wrong file name in docs for rule-based matcher. (#12262) 2023-02-09 12:58:14 +01:00
Adriane Boyd
9d920bafcf
Extend mypy to v1.0.x (#12245) 2023-02-08 14:33:16 +01:00
Raphael Mitsch
d38a88f0f3
Remove negation. (#12252) 2023-02-08 14:18:33 +01:00
Adriane Boyd
9a454676f3
Use black version constraints from requirements.txt (#12220) 2023-02-03 11:44:10 +01:00
Sofie Van Landeghem
79ef6cf0f9
Have logging calls use string formatting types (#12215)
* change logging call for spacy.LookupsDataLoader.v1

* substitutions in language and _util

* various more substitutions

* add string formatting guidelines to contribution guidelines
2023-02-02 11:15:22 +01:00
Sofie Van Landeghem
4c60afb946
Backslash fixes in docs (#12213)
* backslash fixes

* revert unrelated change
2023-02-01 10:15:38 +01:00
Raphael Mitsch
02af17a5c8
Remove flaky assertions. (#12210) 2023-01-31 16:52:06 +01:00
Adriane Boyd
0e51c918ae
Normalize whitespace in evaluate CLI output test (#12157)
* Normalize whitespace in evaluate CLI output test

Depending on terminal settings, lines may be padded to the screen width
so the comparison is too strict with only the command string replacement.

* Move to test util method

* Change to normalization method
2023-01-30 17:51:27 +01:00
Paul O'Leary McCann
8932f4dc35
Add extra flag to assets docs (#12194)
* Add extra flag to assets docs

For some reason this wasn't included.

* Add new tag to docs
2023-01-30 10:05:23 +01:00
Adriane Boyd
606273f7e4
Normalize whitespace in evaluate CLI output test (#12157)
* Normalize whitespace in evaluate CLI output test

Depending on terminal settings, lines may be padded to the screen width
so the comparison is too strict with only the command string replacement.

* Move to test util method

* Change to normalization method
2023-01-27 16:13:34 +01:00
Sofie Van Landeghem
bd739e67d6
explain KB change and how to remedy (#12189) 2023-01-27 15:13:20 +01:00
Adriane Boyd
5f8a398bb9
Add span_id to Span.char_span, update Doc/Span.char_span docs (#12196)
* Add span_id to Span.char_span, update Doc/Span.char_span docs

`Span.char_span(id=)` should be removed in the future.

* Also use Union[int, str] in Doc docstring
2023-01-27 15:09:17 +01:00
Simon Gurcke
774c10fa39
Add alignment_mode argument to Span.char_span() (#12145)
* Add alignment_mode argument to Span.char_span()

* Update website

* Update spacy/tokens/span.pyx

Co-authored-by: Adriane Boyd <adrianeboyd@gmail.com>

* Add test

Co-authored-by: Adriane Boyd <adrianeboyd@gmail.com>
2023-01-27 11:43:40 +01:00
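A combined sketch of the `span_id` and `alignment_mode` arguments from the two entries above (for `Span.char_span`, character offsets are relative to the span's own text):

```python
import spacy

nlp = spacy.blank("en")
doc = nlp("The New York City subway")
city = doc.char_span(4, 17, label="GPE", span_id="nyc")   # "New York City"
sent = doc[1:5]                                           # "New York City subway"
# Characters 0-6 cut through "York"; "expand" snaps to token boundaries.
sub = sent.char_span(0, 6, label="GPE", alignment_mode="expand")
```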
Peter Baumgartner
c68e6b8a96
trainable_lemmatizer in debug data (#11419)
* WIP

* rm ipython embeds

* rm total

* WIP

* cleanup

* cleanup + reword

* rm component function

* remove migration support form

* fix reference dataset for dev data

* additional fixes

- set approach to identifying unique trees
- adjust line length on messages
- add logic for detecting docs without annotations

* use 0 instead of none for no annotation

* partial annotation support

* initial tests for _compile_gold lemma attributes

Using the example data from the edit tree lemmatizer tests for:
- lemmatizer_trees
- partial_lemma_annotations
- n_low_cardinality_lemmas
- no_lemma_annotations

* adds output test for cli app

* switch msg level

* rm unclear uniqueness check

* Revert "rm unclear uniqueness check"

This reverts commit 6ea2b3524b.

* remove good message on uniqueness

* formatting

* use en_vocab fixture

* clarify data set source in messages

* remove unnecessary import

Co-authored-by: svlandeg <svlandeg@github.com>
2023-01-26 17:36:50 +01:00
Daniël de Kok
8d69874afb
Add spacy.PlainTextCorpusReader.v1 (#12122)
* Add `spacy.PlainTextCorpusReader.v1`

This is a corpus reader that reads plain text corpora with the following
format:

- UTF-8 encoding
- One line per document.
- Blank lines are ignored.

It is useful for applications where we deal with very large corpora,
such as distillation, and don't want to deal with the space overhead of
serialized formats. Additionally, many large corpora already use such
a text format, keeping the necessary preprocessing to a minimum.

* Update spacy/training/corpus.py

Co-authored-by: Adriane Boyd <adrianeboyd@gmail.com>

* docs: add version to `PlainTextCorpus`

* Add docstring to registry function

* Add plain text corpus tests

* Only strip newline/carriage return

* Add return type _string_to_tmp_file helper

* Use a temporary directory in place of file name

Different OS auto delete/sharing semantics are just wonky.

* This will be new in 3.5.1 (rather than 4)

* Test improvements from code review

Co-authored-by: Adriane Boyd <adrianeboyd@gmail.com>
2023-01-26 11:33:22 +01:00
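A hedged sketch of using the reader directly (the file name is illustrative and the class location is assumed; the registered name is `spacy.PlainTextCorpusReader.v1`):

```python
from pathlib import Path

import spacy
from spacy.training.corpus import PlainTextCorpus

# One UTF-8 document per line; blank lines are ignored.
Path("corpus.txt").write_text("First document.\n\nSecond document.\n", encoding="utf8")

nlp = spacy.blank("en")
corpus = PlainTextCorpus("corpus.txt")
examples = list(corpus(nlp))  # one Example per non-blank line
```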
Marcus Blättermann
a37117abd0
Fix text colors in docs (#12186) 2023-01-26 10:30:24 +01:00
Marcus Blättermann
056b73468c
Load components dynamically (decrease initial file size for docs) (#12175)
* Extract `CodeBlock` component into own file

* Extract `InlineCode` component into own file

* Extract `TypeAnnotation` component into own file

* Convert named `export` to `default export`

* Remove unused `export`

* Simplify `TypeAnnotation` to remove dependency for Prism

* Load `Code` component dynamically

* Extract `MarkdownToReact` component into own file

* WIP Code Dynamic

* Load `MarkdownToReact` component dynamically

* Extract `htmlToReact` to own file

* Load `htmlToReact` component dynamically

* Dynamically load `Juniper`
2023-01-25 17:30:41 +01:00
Adriane Boyd
07dfa54669
CI: Extend website excludes (#12185) 2023-01-25 15:35:17 +01:00
Marcus Blättermann
11f10fff60
Fix frontpage image (#12184) 2023-01-25 13:17:35 +01:00
Marcus Blättermann
5a6000fb8b
Fix text color in docs (#12183)
* Fix text color on landing page

* Fix code color
2023-01-25 13:14:32 +01:00
Adriane Boyd
8ea15240ca
Update binder version to v3.5 (#12153) 2023-01-25 13:14:23 +01:00
Adriane Boyd
2dbb764183
CI: Add black formatting check to validation (#12182) 2023-01-25 12:51:37 +01:00
Marcus Blättermann
99a05734a8
Add aria-label to quickstart widget (#12179) 2023-01-25 11:46:55 +01:00
Marcus Blättermann
0298b1a863
WEB-28 Increase contrast of grey text (#12178)
* Use transparent colors to increase contrast on darker backgrounds

* Increase color contrast of grey text
2023-01-25 11:46:43 +01:00
Marcus Blättermann
3062fae2ca
Fix broken URL (#12176) 2023-01-25 11:42:19 +01:00
Marcus Blättermann
57ba37bc52
Fix regression with links in prompts (#12172) 2023-01-25 08:51:40 +01:00
Marcus Blättermann
05a3685849
Fix broken syntax for type annotations (#12171) 2023-01-25 08:51:25 +01:00
Marcus Blättermann
f3c586f74a
Fix navigation alert (#12169)
Fixes a regression introduced in #12163
2023-01-24 16:40:40 +01:00
Marcus Blättermann
49237f05a6
Fix aria-hidden element (#12163)
* Rename CSS class to make use more clear

* Rename component prop to improve code readability

* Fix `aria-hidden` directly on a link element

This link wouldn't have been clickable by screenreaders

* Refactor component

This removes an unnecessary `div` and a duplicate link

Co-authored-by: Ines Montani <ines@ines.io>
2023-01-24 14:44:47 +01:00
Marcus Blättermann
0a70696923
Fix wrong HTML element attribute (#12151)
Originally introduced in 62b9c9c6d7

Original error: Warning: Invalid DOM property `class`. Did you mean `className`?

React doesn't have `class`, it uses `className`.
2023-01-24 14:35:31 +01:00
Marcus Blättermann
9555e7aecf
Remove unnessary links (#12159)
There is no need to link to the image we are already viewing, and doing so is also considered an accessibility issue.
2023-01-24 14:01:00 +01:00
Marcus Blättermann
031f6c7b60
WEB-27 Add alt tags to images (#12166)
* Update spaCy badge `alt` text

* Add `next/image` component to Universe

* Add missing `alt` texts
2023-01-24 13:56:14 +01:00
Marcus Blättermann
c9beb47ab7
Increase contrast of text and theme color (#12165) 2023-01-24 13:55:20 +01:00
Marcus Blättermann
a7d6a62f7c
Remove zoom locking (#12164)
* Fix missing comma

* Activate user zoom for website

This is recommended by lighthouse:

> Disabling zooming is problematic for users with low vision who rely on screen magnification to properly see the contents of a web page. Learn more.

Also iOS already ignores this attribute anyway.
2023-01-24 13:54:49 +01:00
Marcus Blättermann
48159e1d60
Update explosion logo (#12162)
This fixes a misalignment of the explosion logo
2023-01-24 13:53:51 +01:00
Marcus Blättermann
7160f7835d
Fix GitHub badge (#12161)
* Extract component

* Remove rounded border from GitHub Stars badge

* Add `alt` text
2023-01-24 13:53:28 +01:00
Marcus Blättermann
3aa61e615f
Add missing label (#12160) 2023-01-24 13:52:55 +01:00
Marcus Blättermann
fcedcd54a8
WEB-30 spaCy pattern in .png (#12158)
* Fix gap in landing pattern at the top

* Replace `.jpg` patterns with `.png`

This drastically reduces file size (for the landing page from 221kb to 57kb) while doubling the resolution to look sharper on retina displays.
2023-01-24 13:51:39 +01:00
Sofie Van Landeghem
de1fe8dce3
Fix Azure ignoring website files (#12129)
* ignore all mdx files and all files in website

* have both .md and .mdx

* exclude everything but universe.json
2023-01-24 10:02:07 +01:00
Edward
e9048fd4a1
Add how to load probability tables to existing models to spaCy docs (#12051)
* add section about adding tables to models

* change to lexeme_norm

* Change syntax

* change to _prob

* Update website/docs/usage/saving-loading.mdx

Co-authored-by: Adriane Boyd <adrianeboyd@gmail.com>

Co-authored-by: Adriane Boyd <adrianeboyd@gmail.com>
2023-01-24 10:01:22 +01:00
Raphael Mitsch
950fceceb6
Make test_cli_find_threshold() more robust. (#12148) 2023-01-23 14:42:33 +01:00
Richard Hudson
f9e020dd67
Fix speed problem with top_k>1 on CPU in edit tree lemmatizer (#12017)
* Refactor _scores2guesses

* Handle arrays on GPU

* Convert argmax result to raw integer

Co-authored-by: Madeesh Kannan <shadeMe@users.noreply.github.com>

* Use NumpyOps() to copy data to CPU

Co-authored-by: Madeesh Kannan <shadeMe@users.noreply.github.com>

* Changes based on review comments

* Use different _scores2guesses depending on tree_k

* Add tests for corner cases

* Add empty line for consistency

* Improve naming

Co-authored-by: Daniël de Kok <me@github.danieldk.eu>

* Improve naming

Co-authored-by: Daniël de Kok <me@github.danieldk.eu>

Co-authored-by: Madeesh Kannan <shadeMe@users.noreply.github.com>
Co-authored-by: Daniël de Kok <me@github.danieldk.eu>
2023-01-20 19:34:11 +01:00
Marcus Blättermann
8a3ca77d9e
Fix broken social media image (#12137) 2023-01-20 16:57:43 +01:00
Adriane Boyd
dec81508d2
Update README for v3.5 (#12132) 2023-01-19 16:13:41 +01:00
Sofie Van Landeghem
0f5d8a27f2
3.5 usage page (#12057)
* skeleton

* Fill in non-CLI details from release notes draft

* Add TODO for fuzzy matching

* Website updates for v3-5 draft

* Fill in usage examples

* Add fuzzy matching to intro

* Fix fuzzy examples

* Shell example formatting

* Fix typo

* Format

* Remove trailing periods in internal list

* Update

* Fix spacing for nested lists

* Update InMemoryLookupKB link

Co-authored-by: Adriane Boyd <adrianeboyd@gmail.com>
Co-authored-by: Ines Montani <ines@ines.io>
2023-01-19 16:13:04 +01:00
Adriane Boyd
1e993d3b03
Merge pull request #12121 from adrianeboyd/chore/v3.5.0-2
Revert "Temporarily skip tests that require models/compat"
2023-01-19 15:59:30 +01:00
Adriane Boyd
3b8918e166
API docs: Rename kb_in_memory to inmemorylookupkb, add to sidebar (#12128)
* API docs: Rename kb_in_memory to inmemorylookupkb, add to sidebar

* adjust to mdx

* linkout to InMemoryLookupKB at first occurrence in kb.mdx

* fix links to docs

* revert Azure trigger setting (I'll make a separate PR)

Co-authored-by: svlandeg <svlandeg@github.com>
2023-01-19 13:29:17 +01:00
Adriane Boyd
a9910b6081
Update years in website landing page (#12107)
* Update years in website landing page

* Update website/pages/index.tsx

Co-authored-by: Sofie Van Landeghem <svlandeg@users.noreply.github.com>

Co-authored-by: Sofie Van Landeghem <svlandeg@users.noreply.github.com>
2023-01-19 11:08:02 +01:00
Sofie Van Landeghem
7d88c55eeb
update docs for apply (#12127)
* update docs for apply

* prettier
2023-01-19 10:37:09 +01:00
Adriane Boyd
28fd589b85
Move all website gitignore settings to website/.gitignore (#12120) 2023-01-18 21:46:19 +01:00
Daniël de Kok
668ec989ad
Update Dockerfile to work with Next.js (#12119)
* Update Dockerfile to work with Next.js

- Update to Node 18
- Do not run as root, this also works better with Node
  privilege-dropping.
- Update README with new run instructions and adding the
  `--rm` flag to avoid leaving a bunch of unused Docker
  containers.
- Also change README to recommend building the image locally.
  Image builds are pretty fast and the uploaded images get
  outdated pretty quickly.

* Add .dockerignore to avoid sending large build contexts

* Typo
2023-01-18 18:15:47 +01:00
Adriane Boyd
dc0f527039 Revert "Temporarily skip tests that require models/compat"
This reverts commit 378db0eb1e.
2023-01-18 12:54:56 +01:00
Adriane Boyd
794cea6907
Fix comments and examples for levenshtein_compare (#12113) 2023-01-18 08:02:33 +01:00
Paul O'Leary McCann
a3b15c9f53
Clarify how --code arg works (#12102)
* Clarify how `--code` arg works

This adds a few sentences to the docs to clarify how the `--code`
argument works, including an explanation of how to load custom
components in your own code.

* Add link to spacy.load docs
2023-01-17 19:30:02 +09:00
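For illustration, a hedged sketch of the pattern that clarification describes: custom components live in a module that is either passed to the CLI via `--code` or imported before `spacy.load()` in your own scripts. The module and component names below are made up:

```python
# components.py (hypothetical module passed via --code)
from spacy.language import Language


@Language.component("my_custom_component")  # hypothetical component name
def my_custom_component(doc):
    # no-op component, just to show the registration pattern
    return doc
```

A config that references `my_custom_component` could then be trained with something like `python -m spacy train config.cfg --code components.py`, while regular scripts just need to import the module before calling `spacy.load()` on the resulting pipeline.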
Albert Villanova del Moral
25373d8e8e
Fix required maximum version of typing-extensions (#12036)
* Fix required maximum version of typing-extensions

* Restrict to <4.5.0, sync minimum pin

Co-authored-by: Adriane Boyd <adrianeboyd@gmail.com>
2023-01-13 10:44:02 +01:00
github-actions[bot]
9ef7d26032
Auto-format code with black (#12100)
Co-authored-by: explosion-bot <explosion-bot@users.noreply.github.com>
2023-01-13 10:12:10 +01:00
Daniël de Kok
dda7331da3
Handle missing annotations in the edit tree lemmatizer (#12098)
The losses/gradients of missing annotations were not correctly masked
out. Fix this and check the masking in the partial data test.
2023-01-12 12:13:55 +01:00
Daniël de Kok
319eb508b5
Add a spacy benchmark speed subcommand (#11902)
* Add a `spacy evaluate speed` subcommand

This subcommand reports the mean batch performance of a model on a data set with
a 95% confidence interval. For reliability, it first performs some warmup
rounds. Then it will measure performance on batches with randomly shuffled
documents.

To avoid having too many spaCy commands, `speed` is a subcommand of `evaluate`
and accuracy evaluation is moved to its own `evaluate accuracy` subcommand.

* Fix import cycle

* Restore `spacy evaluate`, make `spacy benchmark speed` an alias

* Add documentation for `spacy benchmark`

* CREATES -> PRINTS

* WPS -> words/s

* Disable formatting of benchmark speed arguments

* Fail with an error message when trying to speed-bench an empty corpus

* Make it clearer that `benchmark accuracy` is a replacement for `evaluate`

* Fix docstring webpage reference

* tests: check `evaluate` output against `benchmark accuracy`
2023-01-12 11:55:21 +01:00
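As a rough, plain-Python approximation of what the speed benchmark above measures (warmup passes followed by timed batches, reported in words/s); this is a sketch, not the CLI implementation, and it assumes the `en_core_web_sm` pipeline is installed:

```python
import time

import spacy

nlp = spacy.load("en_core_web_sm")  # assumes this pipeline is installed
texts = ["The quick brown fox jumps over the lazy dog."] * 1000

# warmup rounds before timing, as the subcommand does
for _ in range(3):
    list(nlp.pipe(texts[:100], batch_size=64))

start = time.perf_counter()
docs = list(nlp.pipe(texts, batch_size=64))
elapsed = time.perf_counter() - start

total_words = sum(len(doc) for doc in docs)
print(f"{total_words / elapsed:.0f} words/s")
```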
Paul O'Leary McCann
8e558095a1
Clean up displacy port-related error messages, docs (#12089)
* Clean up displacy port-related error messages, docs

There were some issues in the error messages and docs in #11948.

1. the error messages didn't specify the port argument to displacy.serve correctly
2. the docs didn't mark the auto select argument as new

This addresses those issues.

* Update website/docs/api/top-level.md

Co-authored-by: Raphael Mitsch <r.mitsch@outlook.com>

* Apply prettier

Co-authored-by: Raphael Mitsch <r.mitsch@outlook.com>
2023-01-12 14:54:09 +09:00
Sofie Van Landeghem
554df9ef20
Website migration from Gatsby to Next (#12058)
* Rename all MDX file to `.mdx`

* Lock current node version (#11885)

* Apply Prettier (#11996)

* Minor website fixes (#11974) [ci skip]

* fix table

* Migrate to Next WEB-17 (#12005)

* Initial commit

* Run `npx create-next-app@13 next-blog`

* Install MDX packages

Following: 77b5f79a4d/packages/next-mdx/readme.md

* Add MDX to Next

* Allow Next to handle `.md` and `.mdx` files.

* Add VSCode extension recommendation

* Disabled TypeScript strict mode for now

* Add prettier

* Apply Prettier to all files

* Make sure to use correct Node version

* Add basic implementation for `MDXRemote`

* Add experimental Rust MDX parser

* Add `/public`

* Add SASS support

* Remove default pages and styling

* Convert to module

This allows to use `import/export` syntax

* Add import for custom components

* Add ability to load plugins

* Extract function

This will make the next commit easier to read

* Allow to handle directories for page creation

* Refactoring

* Allow to parse subfolders for pages

* Extract logic

* Redirect `index.mdx` to parent directory

* Disabled ESLint during builds

* Disabled typescript during build

* Remove Gatsby from `README.md`

* Rephrase Docker part of `README.md`

* Update project structure in `README.md`

* Move and rename plugins

* Update plugin for wrapping sections

* Add dependencies for  plugin

* Use  plugin

* Rename wrapper type

* Simplify unnecessary adding of id to sections

The slugified section ids are useless, because they cannot be referenced anywhere anyway. The navigation only works if the section has the same id as the heading.

* Add plugin for custom attributes on Markdown elements

* Add plugin to readd support for tables

* Add plugin to fix problem with wrapped images

For more details see this issue: https://github.com/mdx-js/mdx/issues/1798

* Add necessary meta data to pages

* Install necessary dependencies

* Remove outdated MDX handling

* Remove reliance on `InlineList`

* Use existing Remark components

* Remove unallowed heading

Before, `h1` components were not overwritten and would never have worked, and they aren't used anywhere either.

* Add missing components to MDX

* Add correct styling

* Fix broken list

* Fix broken CSS classes

* Implement layout

* Fix links

* Fix broken images

* Fix pattern image

* Fix heading attributes

* Rename heading attribute

`new` was causing some weird issue, so it was renamed to `version`

* Update comment syntax in MDX

* Merge imports

* Fix markdown rendering inside components

* Add model pages

* Simplify anchors

* Fix default value for theme

* Add Universe index page

* Add Universe categories

* Add Universe projects

* Fix Next problem with copy

Next complains when the server renders something different than the client, therefore we move the differing logic to `useEffect`

* Fix improper component nesting

Next doesn't allow block elements inside a `<p>`

* Replace landing page MDX with page component

* Remove inlined iframe content

* Remove ability to inline HTML content in iFrames

* Remove MDX imports

* Fix problem with image inside link in MDX

* Escape character for MDX

* Fix unescaped characters in MDX

* Fix headings with logo

* Allow to export static HTML pages

* Add prebuild script

This command is automatically run by Next

* Replace `svg-loader` with `react-inlinesvg`

`svg-loader` is no longer maintained

* Fix ESLint `react-hooks/exhaustive-deps`

* Fix dropdowns

* Change code language from `cli` to `bash`

* Remove unnecessary language `none`

* Fix invalid code language

`markdown_` with an underscore was used to basically turn off syntax highlighting, but using unknown languages now throws an error.

* Enable code blocks plugin

* Readd `InlineCode` component

MDX2 removed the `inlineCode` component

> The special component name `inlineCode` was removed, we recommend to use `pre` for the block version of code, and code for both the block and inline versions

Source: https://mdxjs.com/migrating/v2/#update-mdx-content

* Remove unused code

* Extract function to own file

* Fix code syntax highlighting

* Update syntax for code block meta data

* Remove unused prop

* Fix internal link recognition

There is a problem with regex between Node and browser, and since Next runs the component on both, this creates an error.

`Prop `rel` did not match. Server: "null" Client: "noopener nofollow noreferrer"`

This simplifies the implementation and fixes the above error.

* Replace `react-helmet` with `next/head`

* Fix `className` problem for JSX component

* Fix broken bold markdown

* Convert file to `.mjs` to be used by Node process

* Add plugin to replace strings

* Fix custom table row styling

* Fix problem with `span` inside inline `code`

React doesn't allow a `span` inside an inline `code` element and throws an error in dev mode.

* Add `_document` to be able to customize `<html>` and `<body>`

* Add `lang="en"`

* Store Netlify settings in file

This way we don't need to update via Netlify UI, which can be tricky if changing build settings.

* Add sitemap

* Add Smartypants

* Add PWA support

* Add `manifest.webmanifest`

* Fix bug with anchor links after reloading

There was no need for the previous implementation, since the browser handles this natively. Additionally, the manual scrolling into view was actually broken, because the heading would disappear behind the menu bar.

* Rename custom event

I was googling for ages to find out what kind of event `inview` is, only to figure out it was a custom event with a name that sounds pretty much like a native one. 🫠

* Fix missing comment syntax highlighting

* Refactor Quickstart component

The previous implementation was hiding the irrelevant lines via data-props and dynamically generated CSS. This created problems with Next and was also hard to follow. CSS was used to do what React is supposed to handle.

The new implementation simply filters the list of children (React elements) via their props.

* Fix syntax highlighting for Training Quickstart

* Unify code rendering

* Improve error logging in Juniper

* Fix Juniper component

* Automatically generate "Read Next" link

* Add Plausible

* Use recent DocSearch component and adjust styling

* Fix images

* Turn off image optimization

> Image Optimization using Next.js' default loader is not compatible with `next export`.

We currently deploy to Netlify via `next export`

* Don't build pages starting with `_`

* Remove unused files

* Add Next plugin to Netlify

* Fix button layout

MDX automatically adds `p` tags around text on a new line, and Prettier wants to put the text on a new line. Worked around this by hacking with a JSX string.

* Add 404 page

* Apply Prettier

* Update Prettier for `package.json`

Next sometimes wants to patch `package-lock.json`. The old Prettier setting indented with 4 spaces, but Next always indents with 2 spaces. Since `npm install` automatically uses the indentation from `package.json` for `package-lock.json` and to avoid the format switching back and forth, both files are now set to 2 spaces.

* Apply Next patch to `package-lock.json`

When starting the dev server Next would warn `warn  - Found lockfile missing swc dependencies, patching...` and update the `package-lock.json`. These are the patched changes.

* fix link

Co-authored-by: Sofie Van Landeghem <svlandeg@users.noreply.github.com>

* small backslash fixes

* adjust to new style

Co-authored-by: Marcus Blättermann <marcus@essenmitsosse.de>
2023-01-11 17:30:07 +01:00
Adriane Boyd
e0168ccce9
Allow spacy-transformers v1.2.x in transformers extra (#12092) 2023-01-11 13:54:58 +01:00
Adriane Boyd
9e0322de1a
Restore v2 token_acc score implementation (#12073)
In the v3 scorer refactoring, `token_acc` was implemented incorrectly.
It should use `precision` instead of `fscore` for the measure of
correctly aligned tokens / number of predicted tokens.

Fix the docs to reflect that the measure uses the number of predicted
tokens rather than the number of gold tokens.
2023-01-11 08:01:47 +01:00
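In other words, with made-up counts, the restored measure looks like this:

```python
# token_acc as restored above: precision over predicted tokens
correctly_aligned = 95   # predicted tokens that align with gold tokens (made-up)
predicted_tokens = 100   # total predicted tokens (made-up)
gold_tokens = 98         # no longer used as the denominator after this fix

token_acc = correctly_aligned / predicted_tokens
print(token_acc)  # 0.95, rather than 95 / 98
```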
Kevin Humphreys
19650ebb52
Enable fuzzy text matching in Matcher (#11359)
* enable fuzzy matching

* add fuzzy param to EntityMatcher

* include rapidfuzz_capi

not yet used

* fix type

* add FUZZY predicate

* add fuzzy attribute list

* fix type properly

* tidying

* remove unnecessary dependency

* handle fuzzy sets

* simplify fuzzy sets

* case fix

* switch to FUZZYn predicates

use Levenshtein distance.
remove fuzzy param.
remove rapidfuzz_capi.

* revert changes added for fuzzy param

* switch to polyleven

(Python package)

* fuzzy match only on oov tokens

* remove polyleven

* exclude whitespace tokens

* don't allow more edits than characters

* fix min distance

* reinstate FUZZY operator

with length-based distance function

* handle sets inside regex operator

* remove is_oov check

* attempt build fix

no mypy failure locally

* re-attempt build fix

* don't overwrite fuzzy param value

* move fuzzy_match

to its own Python module to allow patching

* move fuzzy_match back inside Matcher

simplify logic and add tests

* Format tests

* Parametrize fuzzyn tests

* Parametrize and merge fuzzy+set tests

* Format

* Move fuzzy_match to a standalone method

* Change regex kwarg type to bool

* Add types for fuzzy_match

- Refactor variable names
- Add test for symmetrical behavior

* Parametrize fuzzyn+set tests

* Minor refactoring for fuzz/fuzzy

* Make fuzzy_match a Matcher kwarg

* Update type for _default_fuzzy_match

* don't overwrite function param

* Rename to fuzzy_compare

* Update fuzzy_compare default argument declarations

* allow fuzzy_compare override from EntityRuler

* define new Matcher keyword arg

* fix type definition

* Implement fuzzy_compare config option for EntityRuler and SpanRuler

* Rename _default_fuzzy_compare to fuzzy_compare, remove from reexported objects

* Use simpler fuzzy_compare algorithm

* Update types

* Increase minimum to 2 in fuzzy_compare to allow one transposition

* Fix predicate keys and matching for SetPredicate with FUZZY and REGEX

* Add FUZZY6..9

* Add initial docs

* Increase default fuzzy to rounded 30% of pattern length

* Update docs for fuzzy_compare in components

* Update EntityRuler and SpanRuler API docs

* Rename EntityRuler and SpanRuler setting to matcher_fuzzy_compare

To keep naming similar to `phrase_matcher_attr`, rename the `fuzzy_compare` setting for `EntityRuler` and `SpanRuler` to `matcher_fuzzy_compare`. Organize next to `phrase_matcher_attr` in docs.

* Fix schema aliases

Co-authored-by: Sofie Van Landeghem <svlandeg@users.noreply.github.com>

* Fix typo

Co-authored-by: Sofie Van Landeghem <svlandeg@users.noreply.github.com>

* Add FUZZY6-9 operators and update tests

* Parameterize test over greedy

Co-authored-by: Sofie Van Landeghem <svlandeg@users.noreply.github.com>

* Fix type for fuzzy_compare to remove Optional

* Rename to spacy.levenshtein_compare.v1, move to spacy.matcher.levenshtein

* Update docs following levenshtein_compare renaming

Co-authored-by: Adriane Boyd <adrianeboyd@gmail.com>
Co-authored-by: Sofie Van Landeghem <svlandeg@users.noreply.github.com>
2023-01-10 10:36:17 +01:00
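A small sketch of the resulting `FUZZY` predicate usage (the pattern and text are made up): plain `FUZZY` uses the length-based default described above (roughly 30% of the pattern length, with a minimum of 2 edits), while `FUZZY1` through `FUZZY9` cap the Levenshtein distance explicitly:

```python
import spacy
from spacy.matcher import Matcher

nlp = spacy.blank("en")
matcher = Matcher(nlp.vocab)

patterns = [
    [{"LOWER": {"FUZZY": "definitely"}}],   # default, length-based edit budget
    [{"LOWER": {"FUZZY2": "spacy"}}],       # at most 2 edits
]
matcher.add("FUZZY_DEMO", patterns)

doc = nlp("I definitly love spaCy")
for match_id, start, end in matcher(doc):
    print(doc[start:end].text)
```

The same comparison can be swapped out via the `matcher_fuzzy_compare` setting that this PR adds to `EntityRuler` and `SpanRuler`.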
Zhangrp
eb8bb35c13
improve ux for displacy when the serve port is in use (#11948)
* check port in use and add itself

* check port in use and add itself

* Auto switch to nearest available port.

* Use bind to check port instead of connect_ex.

* Reformat.

* Add auto_select_port argument.

* update docs for displacy.serve

* Update spacy/errors.py

Co-authored-by: Paul O'Leary McCann <polm@dampfkraft.com>

* Update website/docs/api/top-level.md

Co-authored-by: Paul O'Leary McCann <polm@dampfkraft.com>

* Update spacy/errors.py

Co-authored-by: Paul O'Leary McCann <polm@dampfkraft.com>

* Add test using multiprocessing

* fix argument name

* Increase sleep times

Want to rule this out as a cause of test failure

* Don't terminate a process that isn't alive

* Refactor port finding logic

This moves all the port logic into its own util function, which can be
tested without having to background a server directly.

* Use with for the server

This ensures the server is closed correctly.

* Pass in the host when checking port availability

* Shorten argument name

* Update error codes following merge

* Add types for arguments, specify docstrings.

* Add typing for arguments with default value.

* Update docstring to match spaCy format.

* Update docstring to match spaCy format.

* Fix docs

Arg name changed from `auto_select_port` to just `auto_select`.

* Revert "Fix docs"

This reverts commit 356966fe84.

Co-authored-by: zhiiw <1302593554@qq.com>
Co-authored-by: Paul O'Leary McCann <polm@dampfkraft.com>
Co-authored-by: Raphael Mitsch <r.mitsch@outlook.com>
2023-01-10 15:52:57 +09:00
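A hedged sketch of the behaviour added above, assuming the final keyword ended up as `auto_select_port` (the "Fix docs" rename to `auto_select` was reverted within the same PR):

```python
import spacy
from spacy import displacy

nlp = spacy.blank("en")
nlp.add_pipe("sentencizer")
doc = nlp("This is a sentence about ports.")

# If port 5000 is already in use, auto_select_port=True makes displacy
# fall back to the nearest available port instead of failing.
displacy.serve(doc, style="dep", port=5000, auto_select_port=True)
```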
Sofie Van Landeghem
6d03b04901
Improve score_cats for use with multiple textcat components (#11820)
* add test for running evaluate on an nlp pipeline with two distinct textcat components

* cleanup

* merge dicts instead of overwrite

* don't add more labels to the given set

* Revert "merge dicts instead of overwrite"

This reverts commit 89bee0ed77.

* Switch tests to separate scorer keys rather than merged dicts

* Revert unrelated edits

* Switch textcat scorers to v2

* formatting

Co-authored-by: Adriane Boyd <adrianeboyd@gmail.com>
2023-01-09 11:43:48 +01:00
Madeesh Kannan
f1dcdefc8a
Add version tag to before_update config key (#12059) 2023-01-05 11:46:04 +01:00
Sofie Van Landeghem
7f6c638c3a
fix processing of "auto" in convert (#12050)
* fix processing of "auto" in walk_directory

* add check for None

* move AUTO check to convert and fix verification of args

* add specific CLI test with CliRunner

* cleanup

* more cleanup

* update docstring
2023-01-05 10:21:00 +01:00
Paul O'Leary McCann
dbd829f0ed
Fix inconsistency in displaCy docs about page option (#12047)
* Fix inconsistency in displaCy docs about page option

The `page` option, which wraps the output SVG in HTML, is true by
default for `serve` but not for `render`. The `render` docs were wrong
though, so this updates them.

* Update the same statement in more docs

A few renderers used the same language
2023-01-04 12:51:40 +09:00
Wannaphong Phatthiyaphaibun
31c1beba78
Add spacy-pythainlp (#12038)
* Add spacy-pythainlp

* Move submission to right section

* Minor cleanup

* Remove extra list call

* Update universe.json

Co-authored-by: Paul O'Leary McCann <polm@dampfkraft.com>
2023-01-03 17:03:59 +09:00
github-actions[bot]
abb0ab109d
Auto-format code with black (#12035)
Co-authored-by: explosion-bot <explosion-bot@users.noreply.github.com>
2023-01-02 11:59:57 +01:00
Adriane Boyd
ef9e504eac
Rename modified textcat scorer to v2 (#11971)
As a follow-up to #11696, rename the modified scorer to v2 and move the
v1 scorer to `spacy-legacy`.
2022-12-29 14:01:08 +01:00
kadarakos
933b54ac79
typo fix (#11995) 2022-12-26 13:26:35 +01:00
Madeesh Kannan
aa2b471a6e
New console logger with expanded progress tracking (#11972)
* Add `ConsoleLogger.v3`

This addition expands the progress bar feature to count up the training/distillation steps to either the next evaluation pass or the maximum number of steps.

* Rename progress bar types

* Add defaults to docs
Minor fixes

* Move comment

* Minor punctuation fixes

* Explicitly check for `None` when validating progress bar type

Co-authored-by: Paul O'Leary McCann <polm@dampfkraft.com>
2022-12-23 15:21:44 +01:00
github-actions[bot]
90896504a5
Auto-format code with black (#12019)
Co-authored-by: explosion-bot <explosion-bot@users.noreply.github.com>
2022-12-23 12:44:07 +01:00
Adriane Boyd
64d2d27c5d
Add classifier for python 3.11 (#12013) 2022-12-22 10:53:16 +01:00
Raphael Mitsch
eef3d950b4
Fix SpanGroup and Span typing (#12009)
* Correct Span.label, Span.kb_id types. Fix SpanGroup.__iter__().

* Extend test.

* Rename test. Fix typo.

* Add comment.

* Fix types for Span.label, Span.kb_id, Span.char_span().

* Update spacy/tests/doc/test_span_group.py

Co-authored-by: Adriane Boyd <adrianeboyd@gmail.com>

* Update docs.

* Fix typo.

* Update spacy/tokens/span_group.pyx

Co-authored-by: Adriane Boyd <adrianeboyd@gmail.com>

Co-authored-by: Adriane Boyd <adrianeboyd@gmail.com>
2022-12-21 18:54:27 +01:00
kadarakos
c223cd7a86
Add apply CLI (#11376)
* annotate cli first try

* add batch-size and n_process

* rename to apply

* typing fix

* handle file suffixes

* walk directories

* support jsonl

* typing fix

* remove debug

* make suffix optional for walk

* revert unrelated

* don't warn but raise

* better error message

* minor touch up

* Update spacy/tests/test_cli.py

Co-authored-by: Adriane Boyd <adrianeboyd@gmail.com>

* Update spacy/cli/apply.py

Co-authored-by: Sofie Van Landeghem <svlandeg@users.noreply.github.com>

* Update spacy/cli/apply.py

Co-authored-by: Sofie Van Landeghem <svlandeg@users.noreply.github.com>

* update tests and bugfix

* add force_overwrite

* typo

* fix adding .spacy suffix

* Update spacy/cli/apply.py

Co-authored-by: Sofie Van Landeghem <svlandeg@users.noreply.github.com>

* Update spacy/cli/apply.py

Co-authored-by: Sofie Van Landeghem <svlandeg@users.noreply.github.com>

* Update spacy/cli/apply.py

Co-authored-by: Sofie Van Landeghem <svlandeg@users.noreply.github.com>

* store user data and rename cmd arg

* include test for user attr

* rename cmd arg

* better help message

* documentation

* prettier

* black

* link fix

* Update spacy/cli/apply.py

Co-authored-by: Paul O'Leary McCann <polm@dampfkraft.com>

* Update website/docs/api/cli.md

Co-authored-by: Paul O'Leary McCann <polm@dampfkraft.com>

* Update website/docs/api/cli.md

Co-authored-by: Paul O'Leary McCann <polm@dampfkraft.com>

* Update website/docs/api/cli.md

Co-authored-by: Paul O'Leary McCann <polm@dampfkraft.com>

* addressing reviews

* dont quit but warn

* prettier

Co-authored-by: Adriane Boyd <adrianeboyd@gmail.com>
Co-authored-by: Sofie Van Landeghem <svlandeg@users.noreply.github.com>
Co-authored-by: Paul O'Leary McCann <polm@dampfkraft.com>
2022-12-20 17:11:33 +01:00
Jos Polfliet
18ffe5bbd6
Update stop_words.py (#11997)
fix typo in "aangaande"
2022-12-19 16:17:49 +01:00
cfuerbachersparks
3a2b655a29
Update lexeme.md (#11994)
Change suffix_ string to end
2022-12-19 10:33:38 +01:00
Adriane Boyd
c9d9d6847f
Update build constraints for python 3.11 (#11981) 2022-12-15 10:55:01 +01:00
Adriane Boyd
e5c7f3b077
CI: Install thinc-apple-ops through extra (#11963) 2022-12-12 10:13:10 +01:00
Adriane Boyd
0591e67265
Cast to uint64 for all array-based doc representations (#11933)
* Convert all individual values explicitly to uint64 for array-based doc representations

* Temporarily test with latest numpy v1.24.0rc

* Remove unnecessary conversion from attr_t

* Reduce number of individual casts

* Convert specifically from int32 to uint64

* Revert "Temporarily test with latest numpy v1.24.0rc"

This reverts commit eb0e3c5006.

* Also use int32 in tests
2022-12-12 08:45:35 +01:00
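A quick way to observe the behaviour this commit standardizes (the attribute choice is arbitrary, and the dtype comment reflects the change described above):

```python
import spacy
from spacy.attrs import LOWER, ORTH

nlp = spacy.blank("en")
doc = nlp("Values are cast to uint64 now")

arr = doc.to_array([ORTH, LOWER])
print(arr.dtype)  # expected: uint64 for array-based doc representations
```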
Adriane Boyd
8c291ace0c
Extend to wasabi v1.1 (#11945)
* Extend to wasabi v1.1

* Temporarily run mypy and tests with newest wasabi

* Temporarily skip check requirements test

* Revert "Temporarily skip check requirements test"

This reverts commit 44f4ce20a8.

* Revert "Temporarily run mypy and tests with newest wasabi"

This reverts commit e677a2257c.
2022-12-12 08:38:36 +01:00
github-actions[bot]
f22fc7a113
Auto-format code with black (#11955)
Co-authored-by: explosion-bot <explosion-bot@users.noreply.github.com>
2022-12-09 10:15:52 +01:00
vincent d warmerdam
6d2ca1ab3a
Update custom solutions links (#11903)
* Update custom solutions

Will now point to https://explosion.ai/custom-solutions

* added-sidebar

* added-analysis-to-readme

* update-landing-page
2022-12-07 16:02:09 +01:00
Paul O'Leary McCann
73919336fb
Remove spacy-sentence-segmenter from Universe (#11932) 2022-12-07 15:56:03 +01:00
Paul O'Leary McCann
5c3a60e8f4
Add in errors used in the beam code that were removed at some point (#11935)
I don't think there's any way to use the beam code at the moment, but as
long as it's around, the errors it refers to should also be present.
2022-12-07 15:52:35 +01:00
Paul O'Leary McCann
916191848a
Update scattertext example code (#11937)
* Update scattertext example code

* Remove PMI Filter Threshold
2022-12-07 18:09:04 +09:00
Daniël de Kok
27fac7df2e
EditTreeLemmatizer: correctly add strings when initializing from labels (#11934)
Strings in replacement nodes were not added to the `StringStore`
when `EditTreeLemmatizer` was initialized from a set of labels. The
corresponding test did not capture this because it added the strings
through the examples that were passed to the initialization.

This change fixes both this bug in the initialization and the 'shadowing'
of the bug in the test.
2022-12-07 13:53:41 +09:00
Zhangrp
23085ffef4
Fix interpolation in directory names, see #11235. (#11914) 2022-12-06 17:42:12 +09:00
Ryn Daniels
1aadcfcb37
update lock-threads to v4 (#11930) 2022-12-05 10:17:10 +01:00
Adriane Boyd
8afa8b5a7b
Refactor kwargs in CLI msg for future wasabi compatibility (#11918)
Necessary for mypy with wasabi v1+.
2022-12-05 10:00:00 +01:00
Darigov Research
6f342bdd72
docs: Adds link to license in readme (#11924)
Would resolve https://github.com/explosion/spaCy/issues/11923 if merged
2022-12-05 09:49:04 +01:00
Paul O'Leary McCann
5848656b5e
Switch ubuntu-latest to ubuntu-20.04 in main tests (#11928)
* Switch ubuntu-latest to ubuntu-20.04 in main tests

* Only use 20.04 for 3.6
2022-12-05 09:43:23 +01:00
Sofie Van Landeghem
4b2097a271
fix links (#11927) 2022-12-05 16:29:13 +09:00
github-actions[bot]
df0cb4b77b
Auto-format code with black (#11913)
Co-authored-by: explosion-bot <explosion-bot@users.noreply.github.com>
2022-12-02 14:49:12 +01:00
Paul O'Leary McCann
f9d17a644b
Config generation fails for GPU without transformers (#11899)
If you don't have spacy-transformers installed, but try to use `init
config` with the GPU flag, you'll get an error. The issue is that the
`use_transformers` flag in the config is conflated with the GPU flag,
and then there's an attempt to access transformers config info that may
not exist.

There may be a better way to do this, but this stops the error.
2022-12-02 10:17:11 +01:00
Adriane Boyd
445c670a2d
Fix spancat for zero suggestions (#11860)
* Add test for spancat predict with zero suggestions

* Fix spancat for zero suggestions

* Undo changes to extract_spans

* Use .sum() as in update
2022-12-02 09:33:52 +01:00
Zhangrp
9cf3fa9711
Add docs for biluo_to_iob and iob_to_biluo. (#11901)
* Add docs for biluo_to_iob and iob_to_biluo.

* Fix typos.

* Remove redundant links.
2022-12-01 13:30:27 +01:00
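For reference, a short sketch of the two helpers those docs cover; the tag sequences are made up:

```python
from spacy.training import biluo_to_iob, iob_to_biluo

biluo_tags = ["O", "B-ORG", "L-ORG", "U-PER", "O"]
print(biluo_to_iob(biluo_tags))
# ['O', 'B-ORG', 'I-ORG', 'B-PER', 'O']

iob_tags = ["O", "B-PER", "I-PER", "O"]
print(iob_to_biluo(iob_tags))
# ['O', 'B-PER', 'L-PER', 'O']
```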
Damian Romero
afd7a2476d
Fix typo in vocab.md table (#11908)
* Fix typo in vocab.md table

Fixes explosion/spaCy/#11907

* Reformat vocab.md with Prettier
2022-12-01 13:06:28 +01:00
Adriane Boyd
6f9d630f7e
Replace Pipe type with Callable in Language (#11803)
* Replace Pipe type with Callable in Language

* Use Callable[[Doc], Doc] in the docstrings
2022-11-29 13:20:08 +01:00
Paul O'Leary McCann
f1e0243450
Remove macro auc per type from textcat defaults (#11887)
This appears to have been added by mistake and never used. Removing it
does not break validation.
2022-11-29 11:50:23 +01:00
Adriane Boyd
e0d43557b7
Merge pull request #11871 from adrianeboyd/chore/v3.5.0
Prepare for v3.5.0
2022-11-29 11:41:32 +01:00
Adriane Boyd
1ebe7db07c
Support local filesystem remotes for projects (#11762)
* Support local filesystem remotes for projects

* Fix support for local filesystem remotes for projects
  * Use `FluidPath` instead of `Pathy` to support both filesystem and
    remote paths
  * Create missing parent directories if required for local filesystem
  * Add a more general `_file_exists` method to support both `Pathy`,
    `Path`, and `smart_open`-compatible URLs
* Add explicit `smart_open` dependency starting with support for
  `compression` flag
* Update `pathy` dependency to exclude older versions that aren't
  compatible with required `smart_open` version
* Update docs to refer to `Pathy` instead of `smart_open` for project
  remotes (technically you can still push to any `smart_open`-compatible
  path but you can't pull from them)
* Add tests for local filesystem remotes

* Update pathy for general BlobStat sorting

* Add import

* Remove _file_exists since only Pathy remotes are supported

* Format CLI docs

* Clean up merge
2022-11-29 11:40:58 +01:00
Sofie Van Landeghem
96c9cf3448
Merge pull request #11855 from essenmitsosse/move-styleguide-out-of-readme
Move Styleguide out of Readme
2022-11-28 21:22:56 +01:00
Paul O'Leary McCann
f54bfb56c9
Don't throw an error if using displacy on an unset span key (#11845)
* Don't throw an error if using displacy on an unset span key

* List available keys in W117
2022-11-28 10:01:09 +01:00
Zhangrp
9f986af120
Add example sentence for Chinese in website meta (#11879) 2022-11-28 14:50:30 +09:00
Marcus Blättermann
5c9faf6eea
Update menu for styleguide
This reflects the removed parts from ecbf052abd
2022-11-27 03:48:05 +01:00
Marcus Blättermann
90141202c0
Merge branch 'move-styleguide-out-of-readme' into migrate-to-next-web-17 2022-11-27 03:48:03 +01:00
Marcus Blättermann
7f2ea20fee
Update README.md 2022-11-27 03:47:11 +01:00
Marcus Blättermann
c23d54fd26
Remove MDX tags from README.md 2022-11-27 03:47:11 +01:00
Adriane Boyd
681ec20914
Add smart_open requirement, update deprecated options (#11864)
* Switch from deprecated `ignore_ext` to `compression`
* Add upload/download test for local files
2022-11-25 13:00:57 +01:00
Adriane Boyd
32396e0bda Set version to v3.5.0 2022-11-25 12:05:25 +01:00
Adriane Boyd
378db0eb1e Temporarily skip tests that require models/compat 2022-11-25 12:05:25 +01:00
Raphael Mitsch
c0fd8a2e71
find-threshold: CLI command for multi-label classifier threshold tuning (#11280)
* Add foundation for find-threshold CLI functionality.

* Finish first draft for find-threshold.

* Add tests.

* Revert adjusted import statements.

* Fix mypy errors.

* Fix imports.

* Harmonize arguments with spacy evaluate command.

* Generalize component and threshold handling. Harmonize arguments with 'spacy evaluate' CLI.

* Fix Spancat test.

* Add beta parameter to Scorer and PRFScore.

* Make beta a component scorer setting.

* Remove beta.

* Update nlp.config (workaround).

* Reload pipeline on threshold change. Adjust tests. Remove confection reference.

* Remove assumption of component being a Pipe object or having a .cfg attribute.

* Adjust test output and reference values.

* Remove beta references. Delete universe.json.

* Reverting unnecessary changes. Removing unused default values. Renaming variables in find-cli tests.

* Update spacy/cli/find_threshold.py

Co-authored-by: Adriane Boyd <adrianeboyd@gmail.com>

* Remove adding labels in tests.

* Remove unused error

* Undo changes to PRFScorer

* Change default value for n_trials. Log table iteratively.

* Add warnings for pointless applications of find_threshold().

* Fix imports.

* Adjust type check of TextCategorizer to exclude subclasses.

* Change check of if there's only one unique value in scores.

* Update spacy/cli/find_threshold.py

Co-authored-by: Sofie Van Landeghem <svlandeg@users.noreply.github.com>

* Incorporate feedback.

* Fix test issue. Update docstring.

* Update docs & docstring.

* Update spacy/tests/test_cli.py

Co-authored-by: Adriane Boyd <adrianeboyd@gmail.com>

* Add examples to docs. Rename _nlp to nlp in tests.

* Update spacy/cli/find_threshold.py

Co-authored-by: Sofie Van Landeghem <svlandeg@users.noreply.github.com>

* Update spacy/cli/find_threshold.py

Co-authored-by: Sofie Van Landeghem <svlandeg@users.noreply.github.com>

Co-authored-by: Adriane Boyd <adrianeboyd@gmail.com>
Co-authored-by: Sofie Van Landeghem <svlandeg@users.noreply.github.com>
2022-11-25 11:44:55 +01:00
kadarakos
dece775279
correct ndim in docs (#11869) 2022-11-25 11:31:28 +01:00
Adriane Boyd
30d31fd335
Update Russian and Ukrainian lemmatizers (#11811)
* pymorphy2 issues #11620, #11626, #11625:
- #11620: pymorphy2_lookup
- #11626: handle multiple forms pointing to the same normal form + handling empty POS tag
- #11625: matching DET that are labelled as PRON by pymorphy2

* Move lemmatizer algorithm changes back into RussianLemmatizer

* Fix uk pymorphy3_lookup mode init

* Move and update tests for ru/uk lookup lemmatizer modes

* Fix typo

* Remove traces of previous behavior for uninflected POS

* Refactor to private generic-looking pymorphy methods

* Remove xfailed uk lemmatizer cases

* Update spacy/lang/ru/lemmatizer.py

Co-authored-by: Richard Hudson <richard@explosion.ai>

Co-authored-by: Dmytro S Lituiev <d.lituiev@gmail.com>
Co-authored-by: Richard Hudson <richard@explosion.ai>
2022-11-25 11:12:46 +01:00
Adriane Boyd
8f062b849c
Fix Matcher cython profile=True header (#11867) 2022-11-24 16:03:42 +01:00
Madeesh Kannan
5ea14af32b
Add training.before_update callback (#11739)
* Add `training.before_update` callback

This callback can be used to implement training paradigms like gradual (un)freezing of components (e.g. the Transformer) after a certain number of training steps to mitigate catastrophic forgetting during fine-tuning.

* Fix type annotation, default config value

* Generalize arguments passed to the callback

* Update schema

* Pass `epoch` to callback, rename `current_step` to `step`

* Add test

* Simplify test

* Replace config string with `spacy.blank`

* Apply suggestions from code review

Co-authored-by: Adriane Boyd <adrianeboyd@gmail.com>

* Cleanup imports

Co-authored-by: Adriane Boyd <adrianeboyd@gmail.com>
2022-11-23 17:54:58 +01:00
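A hedged sketch of such a callback; the registered name is made up, and the keys in the dict passed to the callback (`step`, `epoch`) are taken from the PR description above rather than verified against the final API:

```python
from spacy.util import registry


@registry.callbacks("my_before_update.v1")  # hypothetical registered name
def make_before_update(freeze_until_step: int = 500):
    def before_update(nlp, info):
        # info is assumed to carry the current "step" and "epoch"
        if info.get("step", 0) == freeze_until_step:
            # e.g. start updating a previously frozen component here
            print(f"unfreezing at step {info['step']}, epoch {info.get('epoch')}")

    return before_update
```

In a config this would presumably be wired up under `[training.before_update]` with `@callbacks = "my_before_update.v1"` (section name taken from the PR title).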
Paul O'Leary McCann
8271cfb4cd
Remove Learning Path spaCy (#11846) 2022-11-23 11:03:18 +01:00
Paul O'Leary McCann
f1ddac187d
Remove unused error object (#11837) 2022-11-23 10:51:31 +01:00
Marcus Blättermann
ecbf052abd
Remove README.md content from styleguide 2022-11-23 02:04:54 +01:00
Marcus Blättermann
5659eeaadd
Remove styleguide content from README.md 2022-11-23 02:04:54 +01:00
Marcus Blättermann
8c0ceca637
Move README.md content to styleguide 2022-11-23 02:04:54 +01:00
Marcus Blättermann
0794e5c6cc
Add missing files to project structure in README.md 2022-11-23 02:04:54 +01:00
Marcus Blättermann
96218a1e8f
Delete styleguide.md
This is an intermediate commit, so the content of `/README.md` can be moved to the styleguide, but the history is kept
2022-11-23 02:04:54 +01:00
Marcus Blättermann
9d96e44a87
Apply Prettier to README.md 2022-11-23 02:04:49 +01:00
Marco Edward Gorelli
f0d8309a28
fix comparison of constants (#11834)
Co-authored-by: MarcoGorelli <>
2022-11-21 08:12:03 +01:00
github-actions[bot]
89bfd06fbd
Auto-format code with black (#11826)
Co-authored-by: explosion-bot <explosion-bot@users.noreply.github.com>
2022-11-18 18:24:13 +09:00
Paul O'Leary McCann
e3173bd86d
Remove spikex from Universe (#11825) 2022-11-18 08:24:22 +01:00
Adriane Boyd
a83463c5e0
Add transformer recommendation for ca (#11819)
Model recommendation from @cayorodriguez.
2022-11-18 08:15:27 +01:00
Paul O'Leary McCann
75bb7ad541
Check textcat values for validity (#11763)
* Check textcat values for validity

* Fix error numbers

* Clean up vals reference

* Check category value validity through training

`_validate_categories` is called in `update`, which for multilabel is
inherited from the single-label component.

* Formatting
2022-11-17 10:25:01 +01:00
Adriane Boyd
317b6ef99c
Update to mypy 0.990 (#11801) 2022-11-16 14:09:10 +01:00
Paul O'Leary McCann
c0c54e44bc
Add equality definition for vectors (#11806)
* Add equality definition for vectors

This re-uses the check from sourcing components.

* Use the equality check

* Format

Co-authored-by: Adriane Boyd <adrianeboyd@gmail.com>
2022-11-16 09:44:42 +01:00
Sofie Van Landeghem
caa9efad59
prevent rewriting an already raw URL (#11810) 2022-11-15 14:15:00 +01:00
Denis Bezykornov
7e684ad691
Update russian tokenizer exceptions (#11753)
* Fix typos, add a couple of new abbreviations, remove non-breaking spaces

* Remove space from abbreviation

Co-authored-by: Adriane Boyd <adrianeboyd@gmail.com>
2022-11-15 11:37:25 +01:00
Peter Baumgartner
9baa686f82
remove migration support form (#11802) 2022-11-14 16:53:14 +01:00
Paul O'Leary McCann
bb523d4d91
Remove spacy-ray from docs (#11781)
* Remove spacy ray from cli docs

* Remove more ray docs

* Remove ray from universe
2022-11-14 19:58:38 +09:00
Edward
3478ff1eb0
remove new v2 tags (#11780) 2022-11-14 17:41:01 +09:00
github-actions[bot]
188a7d00eb
Auto-format code with black (#11792)
Co-authored-by: explosion-bot <explosion-bot@users.noreply.github.com>
2022-11-11 09:58:31 +01:00
Jacobo Myerston
322b5dc1df
Add greCy to Universe (#11774)
* Update universe.json

* Update universe.json

fixes Github value
2022-11-10 13:21:20 +09:00
Adriane Boyd
03eebe9d1c
Update warning, add tests for project requirements check (#11777)
* Update warning, add tests for project requirements check

* Make warning more general for differences between PEP 508 and pip
* Add tests for _check_requirements

* Parameterize test
2022-11-09 10:59:28 +01:00
Raphael Mitsch
20bbbe3e44
Revert disable/disabled merging behavior (#11745)
* Merge disable with disabled. Adjust warnings, errors and tests.

* Replace any() with set operation.

* Update spacy/tests/pipeline/test_pipe_methods.py

Co-authored-by: Adriane Boyd <adrianeboyd@gmail.com>

* Update docs.

* Remve reference to config entry nlp.enabled from docs.

Co-authored-by: Adriane Boyd <adrianeboyd@gmail.com>
2022-11-08 14:58:10 +01:00
Adriane Boyd
2e3cfd758e
Use python 3.10 for GHA universe alert (#11768) 2022-11-08 12:46:19 +09:00
Adriane Boyd
e116395f89
Add fallback in requirements check, only check once (#11735)
* Add fallback in requirements check, only check once

* Rename to skip_requirements_check

* Update spacy/cli/project/run.py

Co-authored-by: Paul O'Leary McCann <polm@dampfkraft.com>

Co-authored-by: Paul O'Leary McCann <polm@dampfkraft.com>
2022-11-07 14:46:08 +01:00
Adriane Boyd
6105f20d8a
Switch CI to python 3.11 (#11765) 2022-11-07 13:25:40 +01:00
Adriane Boyd
e91b47a226
Check for unsafe paths in tarfile.extractall (CVE-2007-4559) (#11746)
* Adding tarfile member sanitization to extractall()

* Format

* Simplify and add error message

* Fix import

* Add comment about CVE

Co-authored-by: TrellixVulnTeam <charles.mcfarland@trellix.com>
2022-11-07 10:43:34 +01:00
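For context, a generic sketch of the kind of member sanitization CVE-2007-4559 calls for; this illustrates the idea rather than reproducing spaCy's actual helper:

```python
import os
import tarfile


def safe_extract(tar: tarfile.TarFile, path: str = ".") -> None:
    """Refuse to extract members whose resolved path escapes the target dir."""
    base = os.path.realpath(path)
    for member in tar.getmembers():
        target = os.path.realpath(os.path.join(path, member.name))
        if os.path.commonpath([base, target]) != base:
            raise ValueError(f"Unsafe path in archive: {member.name!r}")
    tar.extractall(path)
```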
Paul O'Leary McCann
b76222e56a
Raise Typer limit (#11720)
* Raise typer limit to <0.7.0

* Raise limit to <0.8.0
2022-11-07 08:11:55 +01:00
Adriane Boyd
ea326cf47d
Fix types for Span.id and Span.id_ (#11744) 2022-11-07 08:11:13 +01:00
github-actions[bot]
bbf64cfc43
Auto-format code with black (#11749)
Co-authored-by: explosion-bot <explosion-bot@users.noreply.github.com>
2022-11-04 11:17:43 +01:00
Adriane Boyd
40e1000db0
Restore Doc attr getter values in Doc.to_json (#11700) 2022-11-03 11:49:08 +01:00
Paul O'Leary McCann
db56600536
Fix default parameters for load functions (fix #11706) (#11713)
* Fix default parameters for load functions

Some load functions used SimpleFrozenList() directly instead of the
_DEFAULT_EMPTY_PIPES parameter. That mostly worked as intended, but
the changes in #11459 check for equality using identity, not value, so a
warning is incorrectly raised sometimes, as in #11706.

This change just has all the load functions use the singleton value
instead.

* Add test that there are no warnings on module-based load

This will succeed due to changes in this branch, but local tests with
the latest release failed as intended.

* Try reverting commit and see if CI changes

There is an error in CI that is probably unrelated.

Revert "Fix default parameters for load functions"

This reverts commit dc46b35687.

* Revert "Try reverting commit and see if CI changes"

This reverts commit 2514ed07ef.

Co-authored-by: Adriane Boyd <adrianeboyd@gmail.com>
2022-11-03 10:52:59 +01:00
Adriane Boyd
1211552f0e
Modernize and simplify CI steps (#11738)
* Use `build` instead of `python setup.py sdist`
* Remove in-place build with `setup.py`
* Remove `gpu` parameter and GPU tests
* Keep `architecture` and `num_build_jobs` in azure steps with CI
  defaults
* Fix use of `num_build_jobs` parameters
* Remove now-unused `prefix` parameter
* Test imports and CLI before installing test requirements
  * Remove `*.egg-info` directory in addition to source directory for a
    warning-free `import spacy`
* Switch `thinc-apple-ops` test to python 3.11 (as most recent python
  that is tested across platforms)
2022-11-03 09:29:46 +01:00
Ryn Daniels
2fb7e4dc74
More version updates for github action deprecation warnings (#11705)
* More version updates for github action deprecation warnings

* fix the deprecated set-output commands

* bump explosion-bot to run on ubuntu-latest
2022-11-02 15:36:30 +01:00
Adriane Boyd
420b1d854b
Update textcat scorer threshold behavior (#11696)
* Update textcat scorer threshold behavior

For `textcat` (with exclusive classes) the scorer should always use a
threshold of 0.0 because there should be one predicted label per doc and
the numeric score for that particular label should not matter.

* Rename to test_textcat_multilabel_threshold

* Remove all uses of threshold for multi_label=False

* Update Scorer.score_cats API docs

* Add tests for score_cats with thresholds

* Update textcat API docs

* Fix types

* Convert threshold back to float

* Fix threshold type in docstring

* Improve formatting in Scorer API docs
2022-11-02 15:35:04 +01:00
Adriane Boyd
f7edd84b44
Switch CI to Python 3.11.0 (#11737) 2022-11-02 13:42:20 +01:00
Aaron Zipp
d25f09468c
Spelling mistake in rule-based-matching.md (#11717)
Changed retokenize to retokenizer
2022-10-31 13:27:12 +09:00
Paul O'Leary McCann
d61e742960
Handle Docs with no entities in EntityLinker (#11640)
* Handle docs with no entities

If a whole batch contains no entities it won't make it to the model, but
it's possible for individual Docs to have no entities. Before this
commit, those Docs would cause an error when attempting to concatenate
arrays because the dimensions didn't match.

It turns out the process of preparing the Ragged at the end of the span
maker forward was a little different from list2ragged, which just uses
the flatten function directly. Letting list2ragged do the conversion
avoids the dimension issue.

This did not come up before because in NEL demo projects it's typical
for data with no entities to be discarded before it reaches the NEL
component.

This includes a simple direct test that shows the issue and checks it's
resolved. It doesn't check if there are any downstream changes, so a
more complete test could be added. A full run was tested by adding an
example with no entities to the Emerson sample project.

* Add a blank instance to default training data in tests

Rather than adding a specific test, since not failing on instances with
no entities is basic functionality, it makes sense to add it to the
default set.

* Fix without modifying architecture

If the architecture is modified this would have to be a new version, but
this change isn't big enough to merit that.
2022-10-28 10:25:34 +02:00
Paul O'Leary McCann
6b78135b9e
Add warning to install widget for M1 GPUs (#11666)
* Add warning to install widget for M1 GPUs

* Use Thinc tracking issue instead

* Update website/src/widgets/quickstart-install.js

Co-authored-by: Adriane Boyd <adrianeboyd@gmail.com>

* Underline URL in warning

* Update website/src/widgets/quickstart-install.js

Co-authored-by: Adriane Boyd <adrianeboyd@gmail.com>

* Don't install cupy on m1 gpus

Co-authored-by: Adriane Boyd <adrianeboyd@gmail.com>
2022-10-27 15:08:24 +02:00
Adriane Boyd
865691d169
Adjust default attrs for textcat configs (#11698) 2022-10-26 08:43:00 +02:00
Ryn Daniels
a9139907a9
update github actions to deal with deprecations (#11702) 2022-10-26 08:15:13 +02:00
Adriane Boyd
0a9859ba01
Reduce python 3.10 in CI to one OS (#11703) 2022-10-25 19:38:23 +02:00
Adriane Boyd
8740e4341f
Update languages and version in README and website (#11694) 2022-10-25 14:54:54 +02:00
Adriane Boyd
88d35450dc
Rename test helper method with non-test_ name (#11701) 2022-10-25 14:53:18 +02:00
github-actions[bot]
84d9cb6b38
Auto-format code with black (#11687)
Co-authored-by: explosion-bot <explosion-bot@users.noreply.github.com>
2022-10-21 11:54:17 +02:00
Adriane Boyd
fb280001cc
Merge pull request #11678 from adrianeboyd/chore/update-develop-from-master-v3.5
Update develop from master before v3.5
2022-10-20 15:45:19 +02:00
Adriane Boyd
6c380d4fc6 Merge remote-tracking branch 'upstream/master' into chore/update-develop-from-master-v3.5 2022-10-20 13:45:17 +02:00
Adriane Boyd
7e56701057 Merge remote-tracking branch 'upstream/master' into chore/update-develop-from-master-v3.5 2022-10-20 13:38:49 +02:00
Cellan Hall
b69d249a22
Adding spacy-cleaner to the spaCy universe (#11674)
* added spacy-cleaner to the spaCy universe

* Move data to righ section of universe.json

* Cleanup

- fix typo ("replacers")
- spaCy doesn't need to be marked as code
- lemma of "Hello" is lower case

Co-authored-by: Paul O'Leary McCann <polm@dampfkraft.com>
2022-10-20 20:38:29 +09:00
Paul O'Leary McCann
bf83f6872a
Add detailed example of env dict usage (#11677)
* Add detailed example of env dict usage

* Mark code blocks as yaml
2022-10-20 20:35:03 +09:00
Adriane Boyd
3d0e895363
Set version to v3.4.2 (#11672) 2022-10-19 17:33:55 +02:00
Edward
d66ccb8eb0
Fix multiple entries per custom extension in doc json (#11551)
* Fix multiple extensions and character offset

* Rename token_start/end to start/end

* Refactor Doc.from_json based on review

* Iterate over user_data items

* Only add non-empty underscore entries

Co-authored-by: Adriane Boyd <adrianeboyd@gmail.com>
2022-10-19 15:52:47 +02:00
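A small sketch of the round-trip this fixes, using a made-up extension name:

```python
import spacy
from spacy.tokens import Doc

Doc.set_extension("review_id", default=None)  # hypothetical custom extension

nlp = spacy.blank("en")
doc = nlp("Custom extensions survive the JSON round-trip.")
doc._.review_id = "abc-123"

data = doc.to_json(underscore=["review_id"])
restored = Doc(nlp.vocab).from_json(data)
print(restored._.review_id)  # "abc-123"
```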
Adriane Boyd
a1eacaa8db
Add python 3.11.0rc2 to CI (#11667) 2022-10-18 14:36:06 +02:00
Paul O'Leary McCann
858565a567
Fix issues with DVC commands (#11592)
* Fix flag handling in dvc

Prior to this commit, if a flag (--verbose or --quiet) was passed to
DVC, it would be added to the end of the generated dvc command line.
This would result in the command being interpreted as part of the actual
command to run, rather than an argument to dvc. This would result in
command lines like:

    spacy project run preprocess --verbose

That would fail with an error that there's no such directory as
`--verbose`.

This change puts the flags at the front of the dvc command so that they
are interpreted correctly. It removes the `run_dvc_commands` function,
which had been reduced to just a for loop and wasn't used elsewhere.

A separate problem is that there's no way to specify the quiet behaviour
to dvc from the command line, though it's unclear if that's a bug.

* Add dvc quiet flag to docs

* Handle case in DVC where no commands are appropriate

If you only have commands with no deps or outputs (admittedly unlikely), you
get a weird error about the dvc file not existing. This gives explicit
output instead.

* Add support for quiet flag

* Fix command execution

Commands are strings now because they're joined further up.
2022-10-18 15:11:39 +09:00
Sofie Van Landeghem
2ce6aadda2
update default configs to recent versions (#11618) 2022-10-17 12:10:03 +02:00
github-actions[bot]
ceb62352bf
Auto-format code with black (#11649)
Co-authored-by: explosion-bot <explosion-bot@users.noreply.github.com>
2022-10-14 18:04:55 +09:00
Adriane Boyd
6b5a3e7219
Extend to pydantic v1.10 (#11635)
* Update types in `spacy.schemas` for updated pydantic+mypy
2022-10-14 08:16:49 +02:00
Sofie Van Landeghem
4d869fcc11
Small fixes to docstrings (#11610)
* add missing scorer arg to docstring

* fix class names in textcat_multilabel

* add missing scorer to docstrings
2022-10-12 15:17:40 +02:00
Adriane Boyd
fe06e037bc
Fix init for pymorphy2_lookup lemmatizer mode (#11631) 2022-10-12 12:18:39 +02:00
Paul O'Leary McCann
2e52479eec
Fix example code for spacy-wordnet (#11593)
* Fix example code for spacy-wordnet

It looks like in the most recent version, 0.1.0, it's no longer possible
to pass the lang parameter to the component separately. Doing so will
raise an error.

* Apply suggestions from code review

Co-authored-by: Sofie Van Landeghem <svlandeg@users.noreply.github.com>

* Cleanup

* More cleanup

Co-authored-by: Sofie Van Landeghem <svlandeg@users.noreply.github.com>
2022-10-11 16:45:05 +02:00
Sofie Van Landeghem
29649589fc
remove dtype (#11615) 2022-10-11 15:25:05 +02:00
Sofie Van Landeghem
ef74f8f5e4
Fix mypy error in edittree lemmatizer (#11612)
* cleanup imports

* try limiting Thinc to previous release

* remove Model specification

* fix code and revert Thinc constraint
2022-10-11 14:15:22 +02:00
Adriane Boyd
8cd77dd54c
Sync flake8 version across requirements (#11580) 2022-10-04 11:23:04 +02:00
Sofie Van Landeghem
b187076a2d
fix docs (#11573) 2022-10-03 17:01:04 +02:00
Sofie Van Landeghem
3033babe98
Merge pull request #11571 from svlandeg/copy_develop
update develop with latest from master, incl CI fix
2022-10-03 14:05:51 +02:00
svlandeg
83425d4f6f Merge branch 'copy_master' into copy_develop 2022-10-03 13:06:31 +02:00
Sofie Van Landeghem
70e21dfcad
PR to test importlib-metadata (#11569)
* empty commit

* restrict importlib-metadata to lower than 5.0.0

* restrict importlib-metadata also for validate CI step

* set fixed version for CI

* try flake8 5.0.4 in CI validation step

* from importlib-metadata from requirements again
2022-10-03 13:04:03 +02:00
Paul O'Leary McCann
087cc74c6a
Remove mention of 1.7 from issue template (#11570)
It's rare to have anyone using v1 anymore, so this message is no longer
helpful.
2022-10-03 11:53:21 +02:00
Sofie Van Landeghem
bf6e43ab2f
Merge pull request #11563 from svlandeg/develop_copy
update develop with latest from master
2022-10-03 09:34:38 +02:00
svlandeg
9c8cdb403e Merge branch 'master_copy' into develop_copy 2022-09-30 15:40:26 +02:00
Gabriele Picco
ff9002b726
Add Zshot Spacy plugin (#11557)
* Add Zshot Spacy plugin

Add Zshot (Zero and Few shot named entity & relationships recognition) Spacy plugin

* Update website/meta/universe.json

Co-authored-by: Adriane Boyd <adrianeboyd@gmail.com>

* Update website/meta/universe.json

Co-authored-by: Adriane Boyd <adrianeboyd@gmail.com>

Co-authored-by: Adriane Boyd <adrianeboyd@gmail.com>
2022-09-29 17:34:44 +02:00
Sofie Van Landeghem
bcda8bc1e7
update mypy to latest version (#11546)
* update mypy and disable it for python 3.6

* ignoring mypy's type redefinition error
2022-09-29 14:24:40 +02:00
Paul O'Leary McCann
ba63f57f81
Update docs to reflect Doc input to Language (#11555) 2022-09-29 18:50:29 +09:00
Adriane Boyd
6d7630c5d3
Allow overriding spacy_version in spacy package meta (#11552) 2022-09-29 10:44:06 +02:00
Peter Baumgartner
e794d4ae39
debug data Spancat Table Improvements (#11504)
* update

* fix format function

* pull out _format_number

* format with black
2022-09-28 17:16:05 +02:00
Raphael Mitsch
aea16719be
Simplify and clarify enable/disable behavior of spacy.load() (#11459)
* Change enable/disable behavior so that arguments take precedence over config options. Extend error message on conflict. Add warning message in case of overwriting config option with arguments.

* Fix tests in test_serialize_pipeline.py to reflect changes to handling of enable/disable.

* Fix type issue.

* Move comment.

* Move comment.

* Issue UserWarning instead of printing wasabi message. Adjust test.

* Added pytest.warns(UserWarning) for expected warning to fix tests.

* Update warning message.

* Move type handling out of fetch_pipes_status().

* Add global variable for default value. Use id() to determine whether used values are default value.

* Fix default value for disable.

* Rename DEFAULT_PIPE_STATUS to _DEFAULT_EMPTY_PIPES.
2022-09-27 14:22:36 +02:00
Taniguchi Yasufumi
9557b0fb01
Add spacy-partial-tagger to spaCy Universe (#11538) 2022-09-27 14:11:50 +02:00
Jacobo Myerston
3e8bc1272f
add punctuation to grc (#11426)
* add punctuation to grc

Add support for special editorial punctuation that is common in ancient Greek texts.  Ancient Greek texts, as found in digital and print form, have been largely edited by scholars. Restorations and improvements are normally marked with special characters that need to be handled properly by the tokenizer.

* add unit tests

* simplify regex

* move generic quotes to char classes

* rename unit test

* fix regex

Co-authored-by: Adriane Boyd <adrianeboyd@gmail.com>

Co-authored-by: svlandeg <svlandeg@github.com>
Co-authored-by: Sofie Van Landeghem <svlandeg@users.noreply.github.com>
Co-authored-by: Adriane Boyd <adrianeboyd@gmail.com>
2022-09-27 11:38:56 +02:00
Paul O'Leary McCann
a44b7d4622
Add experimental coref docs (#11291)
* Add experimental coref docs

* Docs cleanup

* Apply suggestions from code review

Co-authored-by: Sofie Van Landeghem <svlandeg@users.noreply.github.com>

* Apply changes from code review

* Fix prettier formatting

It seems a period after a number made this think it was a list?

* Update docs on examples for initialize

* Add docs for coref scorers

* Remove 3.4 notes from coref

There won't be a "new" tag until it's in core.

* Add docs for span cleaner

* Fix docs

* Fix docs to match spacy-experimental

These weren't properly updated when the code was moved out of spacy
core.

* More doc fixes

* Formatting

* Update architectures

* Fix links

* Fix another link

Co-authored-by: Sofie Van Landeghem <svlandeg@users.noreply.github.com>
Co-authored-by: svlandeg <svlandeg@github.com>
2022-09-27 18:11:23 +09:00
Adriane Boyd
877671e09a
Preserve missing entity annotation in augmenters (#11540)
Preserve both `-` and `O` annotation in augmenters rather than relying
on `Example.to_dict`'s default support for one option outside of labeled
entity spans.

This is intended as a temporary workaround for augmenters for v3.4.x.
The behavior of `Example` and related IOB utils could be improved in the
general case for v3.5.
2022-09-27 10:16:51 +02:00
Paul O'Leary McCann
936a5f0506
Fix English pipeline names in 3.4 release notes (#11542) 2022-09-27 08:25:24 +02:00
Richard Hudson
6f692a06d5
Remove side effects from Doc.__init__() (#11506)
* Remove side effects from Doc.__init__()

* Changes based on review comment

* Readd test

* Change interface of Doc.__init__()

* Simplify test

Co-authored-by: Adriane Boyd <adrianeboyd@gmail.com>

* Update doc.md

Co-authored-by: Adriane Boyd <adrianeboyd@gmail.com>
2022-09-26 15:58:21 +02:00
Basile Dura
f40d2fac29
fix: remove duplicate v3.2 (#11530) 2022-09-23 13:18:51 +02:00
Raphael Mitsch
af9b01ef97
Add dependency check to project step runs (#11226)
* Add dependency check to project step running.

* Fix dependency mismatch warning.

* Remove newline.

* Add types-setuptools to setup.cfg.

* Move types-setuptools to test requirements. Move warnings into _validate_requirements(). Handle file reading in project_run().

* Remove newline formatting for output of package conflicts.

* Show full version conflict message instead of just package name.

* Update spacy/cli/project/run.py

Co-authored-by: Adriane Boyd <adrianeboyd@gmail.com>

* Fix typo.

* Re-add rephrasing of message for conflicting packages. Remove requirements path redundancy.

* Update spacy/cli/project/run.py

Co-authored-by: Adriane Boyd <adrianeboyd@gmail.com>

* Update spacy/cli/project/run.py

Co-authored-by: Adriane Boyd <adrianeboyd@gmail.com>

* Print unified message for requirement conflicts and missing requirements.

* Update spacy/cli/project/run.py

Co-authored-by: Adriane Boyd <adrianeboyd@gmail.com>

* Fix warning message.

* Print conflict/missing messages individually.

* Print conflict/missing messages individually.

* Add check_requirements setting in project.yml to disable requirements check.

* Update website/docs/usage/projects.md

Co-authored-by: Adriane Boyd <adrianeboyd@gmail.com>

* Update website/docs/usage/projects.md

Co-authored-by: Adriane Boyd <adrianeboyd@gmail.com>

* Update description of project.yml structure in projects.md.

* Update website/docs/usage/projects.md

Co-authored-by: Sofie Van Landeghem <svlandeg@users.noreply.github.com>

* Prettify projects docs.

Co-authored-by: Adriane Boyd <adrianeboyd@gmail.com>
Co-authored-by: Sofie Van Landeghem <svlandeg@users.noreply.github.com>
2022-09-16 16:54:31 +02:00
github-actions[bot]
279358be63
Auto-format code with black (#11513)
Co-authored-by: explosion-bot <explosion-bot@users.noreply.github.com>
2022-09-16 11:50:19 +02:00
Sofie Van Landeghem
df0b815c23
more explicit Example constructor example (#11489)
* make constructor example for Example more explicit

* shorten example and add spaces
2022-09-16 09:26:33 +02:00
Sofie Van Landeghem
d5c8498f2f
disable mypy run for Python 3.10 (#11508) (#11511) 2022-09-15 17:41:25 +02:00
Sofie Van Landeghem
0509f90874
add dot (#11500) 2022-09-15 17:29:42 +02:00
Sofie Van Landeghem
ca1ad67458
disable mypy run for Python 3.10 (#11508) 2022-09-15 15:51:19 +02:00
Adriane Boyd
7c98245c0c
Add levenshtein from polyleven (#11418)
Add a simple levenshtein distance function using the implementation from
the polyleven library as `spacy.matcher.levenshtein`.
2022-09-14 17:05:22 +02:00
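
A minimal sketch of the new helper in use, assuming it is importable as `spacy.matcher.levenshtein` exactly as described above:

```python
# Minimal sketch, assuming spacy.matcher.levenshtein is available as described.
from spacy.matcher import levenshtein

# Plain edit distance between two strings, backed by the polyleven implementation.
print(levenshtein("kitten", "sitting"))          # 3
print(levenshtein("levenshtein", "levenstein"))  # 1
```
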
Richard Hudson
3f0c3ad7d3
Correct alignment example and documentation (#11491)
* Correct example and documentation

* Added altered example.md

* Changes based on review + apply prettier

* Remove unnecessary 'the'

Co-authored-by: Madeesh Kannan <shadeMe@users.noreply.github.com>

Co-authored-by: Madeesh Kannan <shadeMe@users.noreply.github.com>
2022-09-14 09:36:55 +02:00
Adriane Boyd
6be6913ba5
Update cupy extras (#11279)
* Update cupy extras:

* Extend to v11
* Add `cupy-cuda11x` and `cupy-wheel`
* Update quickstart to use `cupy-wheel` for CUDA 10.2+

* Rename cuda-wheel to cuda-autodetect, remove repeated CUDA in menu
2022-09-13 09:04:53 +02:00
Sofie Van Landeghem
cc10a27c59
Prevent tok2vec from broadcasting to listeners when predicting (#11385)
* replicate bug with tok2vec in annotating components

* add overfitting test with a frozen tok2vec

* remove broadcast from predict and check doc.tensor instead

* remove broadcast

* proper error

* slight rephrase of documentation
2022-09-12 15:36:48 +02:00
Madeesh Kannan
0ec9a696e6
Fix config validation failures caused by NVTX pipeline wrappers (#11460)
* Enable Cython<->Python bindings for `Pipe` and `TrainablePipe` methods

* `pipes_with_nvtx_range`: Skip hooking methods whose signature cannot be ascertained

When loading pipelines from a config file, the arguments passed to individual pipeline components are validated by `pydantic` during init. For this, the validation model attempts to parse the function signature of the component's c'tor/entry point so that it can check if all mandatory parameters are present in the config file.

When using `models_and_pipes_with_nvtx_range` as an `after_pipeline_creation` callback, the methods of all pipeline components get replaced by an NVTX range wrapper **before** the above-mentioned validation takes place. This can be problematic for components that are implemented as Cython extension types - if the extension type is not compiled with Python bindings for its methods, they will have no signatures at runtime. This resulted in `pydantic` matching the *wrapper's* parameters with those in the config and raising errors.

To avoid this, we now skip applying the wrapper to any (Cython) methods that do not have signatures.
2022-09-12 14:55:41 +02:00
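
The skip condition boils down to "only wrap a method if `inspect.signature` can see it". A rough, self-contained illustration in plain Python (the `nvtx_range` wrapper here is a hypothetical stand-in, not the actual spaCy code):

```python
# Rough illustration of "skip methods whose signature cannot be ascertained".
import inspect
from functools import wraps


def nvtx_range(func):
    # Hypothetical stand-in for the real NVTX range wrapper.
    @wraps(func)
    def wrapper(*args, **kwargs):
        return func(*args, **kwargs)
    return wrapper


def maybe_wrap(method):
    """Wrap `method` only if its signature can be ascertained."""
    try:
        inspect.signature(method)
    except (TypeError, ValueError):
        # Cython extension methods compiled without Python bindings (and some
        # builtins) expose no signature; leave them unwrapped so config
        # validation never sees the wrapper's generic (*args, **kwargs).
        return method
    return nvtx_range(method)
```
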
kadarakos
6b83fee58d
Assets message (#11458)
* new error message when 'project run assets'

* new error message when 'project run assets'

* Update spacy/cli/project/run.py

Co-authored-by: Sofie Van Landeghem <svlandeg@users.noreply.github.com>

Co-authored-by: Sofie Van Landeghem <svlandeg@users.noreply.github.com>
2022-09-09 17:17:10 +02:00
Adriane Boyd
8a86a35eab
Remove has_letters in config template (#11465)
Due to problems with the javascript conversion in the website
quickstart, remove the `has_letters` setting to simplify generating
`attrs` for the default `tok2vec`.

Additionally reduce `PREFIX` as in the trained pipelines.
2022-09-09 15:10:04 +02:00
github-actions[bot]
0c72c6bb2c
Auto-format code with black (#11468)
Co-authored-by: explosion-bot <explosion-bot@users.noreply.github.com>
2022-09-09 11:21:17 +02:00
Madeesh Kannan
aac9a58c29
Add docs for the spacy.models_and_pipes_with_nvtx_range.v1 callback (#11463)
* Add docs for the `spacy.models_and_pipes_with_nvtx_range.v1` callback

* Add `new` tag
2022-09-09 10:46:01 +02:00
Paul O'Leary McCann
2602a30d32
Fix DVC command example (#11457)
The example command was missing the project dir, which is required.
2022-09-08 13:42:47 +02:00
Raphael Mitsch
1f23c615d7
Refactor KB for easier customization (#11268)
* Add implementation of batching + backwards compatibility fixes. Tests indicate issue with batch disambiguation for custom singular entity lookups.

* Fix tests. Add distinction w.r.t. batch size.

* Remove redundant and add new comments.

* Adjust comments. Fix variable naming in EL prediction.

* Fix mypy errors.

* Remove KB entity type config option. Change return types of candidate retrieval functions to Iterable from Iterator. Fix various other issues.

* Update spacy/pipeline/entity_linker.py

Co-authored-by: Paul O'Leary McCann <polm@dampfkraft.com>

* Update spacy/pipeline/entity_linker.py

Co-authored-by: Paul O'Leary McCann <polm@dampfkraft.com>

* Update spacy/kb_base.pyx

Co-authored-by: Paul O'Leary McCann <polm@dampfkraft.com>

* Update spacy/kb_base.pyx

Co-authored-by: Paul O'Leary McCann <polm@dampfkraft.com>

* Update spacy/pipeline/entity_linker.py

Co-authored-by: Paul O'Leary McCann <polm@dampfkraft.com>

* Add error messages to NotImplementedErrors. Remove redundant comment.

* Fix imports.

* Remove redundant comments.

* Rename KnowledgeBase to InMemoryLookupKB and BaseKnowledgeBase to KnowledgeBase.

* Fix tests.

* Update spacy/errors.py

Co-authored-by: Sofie Van Landeghem <svlandeg@users.noreply.github.com>

* Move KB into subdirectory.

* Adjust imports after KB move to dedicated subdirectory.

* Fix config imports.

* Move Candidate + retrieval functions to separate module. Fix other, small issues.

* Fix docstrings and error message w.r.t. class names. Fix typing for candidate retrieval functions.

* Update spacy/kb/kb_in_memory.pyx

Co-authored-by: Sofie Van Landeghem <svlandeg@users.noreply.github.com>

* Update spacy/ml/models/entity_linker.py

Co-authored-by: Sofie Van Landeghem <svlandeg@users.noreply.github.com>

* Fix typing.

* Change typing of mentions to be Span instead of Union[Span, str].

* Update docs.

* Update EntityLinker and _architecture docs.

* Update website/docs/api/entitylinker.md

Co-authored-by: Paul O'Leary McCann <polm@dampfkraft.com>

* Adjust message for E1046.

* Re-add section for Candidate in kb.md, add reference to dedicated page.

* Update docs and docstrings.

* Re-add section + reference for KnowledgeBase.get_alias_candidates() in docs.

* Update spacy/kb/candidate.pyx

* Update spacy/kb/kb_in_memory.pyx

* Update spacy/pipeline/legacy/entity_linker.py

* Remove canididate.md. Remove mistakenly added config snippet in entity_linker.py.

Co-authored-by: Paul O'Leary McCann <polm@dampfkraft.com>
Co-authored-by: Sofie Van Landeghem <svlandeg@users.noreply.github.com>
2022-09-08 10:38:07 +02:00
Paul O'Leary McCann
515d5c65d5
Add dev docs on satellite packages (#11435)
* Add dev docs on satellite packages

* Apply suggestions from code review

Co-authored-by: Sofie Van Landeghem <svlandeg@users.noreply.github.com>

* Add displacy link

Co-authored-by: Sofie Van Landeghem <svlandeg@users.noreply.github.com>
2022-09-07 15:24:22 +02:00
Adriane Boyd
f292569b1a
Merge pull request #11444 from shadeMe/merge-master-into-develop
Merge `master` into `develop`
2022-09-06 19:58:21 +02:00
shademe
21000ae935
Merge branch 'master' into merge-master-into-develop 2022-09-06 17:50:07 +02:00
Paul O'Leary McCann
ff0522f8da Fix asent pip package name 2022-09-06 19:19:05 +09:00
Sofie Van Landeghem
d801cccd38
Merge pull request #11430 from rmitsch/chore/synch-develop
Synch develop with master
2022-09-05 15:07:18 +02:00
Paul O'Leary McCann
977dc33312
Add a way to get the URL to download a pipeline to the CLI (#11175)
* Add a dry run flag to download

* Remove --dry-run, add --url option to `spacy info` instead

* Make mypy happy

* Print only the URL, so it's easier to use in scripts

* Don't add the egg hash unless downloading an sdist

* Update spacy/cli/info.py

Co-authored-by: Sofie Van Landeghem <svlandeg@users.noreply.github.com>

* Add two implementations of requirements

* Clean up requirements sample slightly

This should make mypy happy

* Update URL help string

* Remove requirements option

* Add url option to docs

* Add URL to spacy info model output, when available

* Add types-setuptools to testing reqs

* Add types-setuptools to requirements

* Add "compatible", expand docstring

* Update spacy/cli/info.py

Co-authored-by: Adriane Boyd <adrianeboyd@gmail.com>

* Run prettier on CLI docs

* Update docs

Add a sidebar about finding download URLs, with some examples of the new
command.

* Add download URLs to table on model page

* Apply suggestions from code review

Co-authored-by: Adriane Boyd <adrianeboyd@gmail.com>

* Updates from review

* download url -> download link

* Update docs

Co-authored-by: Sofie Van Landeghem <svlandeg@users.noreply.github.com>
Co-authored-by: Adriane Boyd <adrianeboyd@gmail.com>
2022-09-02 11:58:21 +02:00
github-actions[bot]
71884d0942
Auto-format code with black (#11427)
Co-authored-by: explosion-bot <explosion-bot@users.noreply.github.com>
2022-09-02 11:43:20 +02:00
Madeesh Kannan
d1760ebe02
Better handling of unexpected types in SetPredicate (#11312)
* `Matcher`: Better type checking of values in `SetPredicate`
`SetPredicate`: Emit warning and return `False` on unexpected value types

* Rename `value_type_mismatch` variable

* Inline warning

* Remove unexpected type warning from `_SetPredicate`

* Ensure that `str` values are not interpreted as sequences
Check elements of sequence values for convertibility to `str` or `int`

* Add more `INTERSECT` and `IN` test cases

* Test for inputs with multiple characters

* Return `False` early instead of using a boolean flag

* Remove superfluous `int` check, parentheses

* Apply suggestions from code review

Co-authored-by: Adriane Boyd <adrianeboyd@gmail.com>

* Apply suggestions from code review

* Clarify test comment

Co-authored-by: Adriane Boyd <adrianeboyd@gmail.com>
2022-09-02 09:09:48 +02:00
Adriane Boyd
78f5503a29
Check for any non-Doc returned value for components (#11424) 2022-09-01 19:37:23 +02:00
Madeesh Kannan
604a7c3c26
SpanGroup(s)-related optimizations (#11380)
* `SpanGroup`: Add support for binding copies to a new reference document

* `SpanGroups`: Replace superfluous serialize-deserialize roundtrip in `copy`

Instead, directly copy the in-memory representations of the constituent `SpanGroup`s.

* Update `SpanGroup.copy()` signature

* Rename `new_doc` param to `doc`

* Fix kwarg

* Update `.pyi` file and docstrings

* `mypy` fix

* Update spacy/tokens/span_group.pyx

* Update docs

Co-authored-by: Adriane Boyd <adrianeboyd@gmail.com>
2022-08-31 09:03:20 +02:00
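
A sketch of the new binding behaviour, assuming the keyword ends up named `doc` as the rename bullet above indicates:

```python
# Sketch: copy a span group and bind the copy to a different Doc
# (assumes SpanGroup.copy(doc=...) as introduced in this PR).
import spacy
from spacy.tokens import Span

nlp = spacy.blank("en")
doc_a = nlp("New York is big")
doc_b = nlp("New York is big")   # same tokens, different Doc object

doc_a.spans["cities"] = [Span(doc_a, 0, 2, label="GPE")]
copied = doc_a.spans["cities"].copy(doc=doc_b)   # bound to doc_b, not doc_a
doc_b.spans["cities"] = copied
print([span.text for span in doc_b.spans["cities"]])  # ['New York']
```
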
Sofie Van Landeghem
8fc0efc502
Allow string argument for disable/enable/exclude (#11406)
* adding unit test for spacy.load with disable/exclude string arg

* allow pure strings in from_config

* update docs

* upstream type adjustements

* docs update

* make docstring more consistent

* Update spacy/language.py

Co-authored-by: Adriane Boyd <adrianeboyd@gmail.com>

* two more cleanups

* fix type in internal method

Co-authored-by: Adriane Boyd <adrianeboyd@gmail.com>
2022-08-31 09:02:34 +02:00
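
Usage sketch of the relaxed argument type (assumes a pipeline package such as `en_core_web_sm` is installed):

```python
# Sketch: disable/enable/exclude now accept a single string as well as a list
# (assumes the en_core_web_sm package is installed).
import spacy

nlp_list = spacy.load("en_core_web_sm", disable=["ner"])  # previously required
nlp_str = spacy.load("en_core_web_sm", disable="ner")     # now also accepted
print(nlp_str.disabled)  # ['ner']
```
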
Daniël de Kok
3f4b4b7b4f
Fix test_{prefer,require}_gpu (#11390)
* Fix `test_{prefer,require}_gpu`

These tests assumed that GPUs are only supported with CuPy, but since Thinc 8.1
we also support Metal Performance Shaders.

* test_misc: arrange thinc imports to be together
2022-08-30 14:21:02 +02:00
Patrick J. Burns
5ae63b1fbd
Add Latin language support (#11349)
* Add lang folder for la (Latin)

* Add Latin lang classes

* Add minimal tokenizer exceptions

* Add minimal stopwords

* Add minimal lex_attrs

* Update stopwords, tokenizer exceptions

* Add la tests; register la_tokenizer in conftest.py

* Update spacy/lang/la/lex_attrs.py

Remove duplicate form in Latin lex_attrs

Co-authored-by: Sofie Van Landeghem <svlandeg@users.noreply.github.com>

* Update natto-py version spec (#11222)

* Update natto-py version spec

* Update setup.cfg

Co-authored-by: Adriane Boyd <adrianeboyd@gmail.com>

Co-authored-by: Adriane Boyd <adrianeboyd@gmail.com>

* Add scorer to textcat API docs config settings (#11263)

* Update docs for pipeline initialize() methods (#11221)

* Update documentation for dependency parser

* Update documentation for trainable_lemmatizer

* Update documentation for entity_linker

* Update documentation for ner

* Update documentation for morphologizer

* Update documentation for senter

* Update documentation for spancat

* Update documentation for tagger

* Update documentation for textcat

* Update documentation for tok2vec

* Run prettier on edited files

* Apply similar changes in transformer docs

* Remove need to say annotated example explicitly

I removed the need to say "Must contain at least one annotated Example"
because it's often a given that Examples will contain some gold-standard
annotation.

* Run prettier on transformer docs

* chore: add 'concepCy' to spacy universe (#11255)

* chore: add 'concepCy' to spacy universe

* docs: add 'slogan' to concepCy

* Support full prerelease versions in the compat table (#11228)

* Support full prerelease versions in the compat table

* Fix types

* adding spans to doc_annotation in Example.to_dict (#11261)

* adding spans to doc_annotation in Example.to_dict

* to_dict compatible with from_dict: tuples instead of spans

* use strings for label and kb_id

* Simplify test

* Update data formats docs

Co-authored-by: Stefanie Wolf <stefanie.wolf@vitecsoftware.com>
Co-authored-by: Adriane Boyd <adrianeboyd@gmail.com>

* Fix regex invalid escape sequences (#11276)

* Add W605 to the errors raised by flake8 in the CI (#11283)

* Clean up automated label-based issue handling (#11284)

* Clean up automated label-based issue handling

1. upgrade tiangolo/issue-manager to latest
2. move needs-more-info to tiangolo
3. change needs-more-info close time to 7 days
4. delete old needs-more-info config

* Use old, longer message

* Fix label name

* Fix Dutch noun chunks to skip overlapping spans (#11275)

* Add test for overlapping noun chunks

* Skip overlapping noun chunks

* Update spacy/tests/lang/nl/test_noun_chunks.py

Co-authored-by: Sofie Van Landeghem <svlandeg@users.noreply.github.com>

Co-authored-by: Sofie Van Landeghem <svlandeg@users.noreply.github.com>

* Docs: displaCy documentation - data types, `parse_{deps,ents,spans}`, spans example (#10950)

* add in spans example and parse references

* rm autoformatter

* rm extra ents copy

* TypedDict draft

* type fixes

* restore non-documentation files

* docs update

* fix spans example

* fix hyperlinks

* add parse example

* example fix + argument fix

* fix api arg in docs

* fix bad variable replacement

* fix spacing in style

Co-authored-by: Sofie Van Landeghem <svlandeg@users.noreply.github.com>

* fix spacing on table

* fix spacing on table

* rm temp files

Co-authored-by: Sofie Van Landeghem <svlandeg@users.noreply.github.com>

* include span_ruler for default warning filter (#11333)

* Add uk pipelines to website (#11332)

* Check for . in factory names (#11336)

* Make fixes for PR #11349

* Fix roman numeral coverage in #11349

Co-authored-by: Patrick J. Burns <patricks@diyclassics.org>
Co-authored-by: Sofie Van Landeghem <svlandeg@users.noreply.github.com>
Co-authored-by: Paul O'Leary McCann <polm@dampfkraft.com>
Co-authored-by: Adriane Boyd <adrianeboyd@gmail.com>
Co-authored-by: Lj Miranda <12949683+ljvmiranda921@users.noreply.github.com>
Co-authored-by: Jules Belveze <32683010+JulesBelveze@users.noreply.github.com>
Co-authored-by: stefawolf <wlf.ste@gmail.com>
Co-authored-by: Stefanie Wolf <stefanie.wolf@vitecsoftware.com>
Co-authored-by: Peter Baumgartner <5107405+pmbaumgartner@users.noreply.github.com>
2022-08-30 14:04:54 +02:00
Paul O'Leary McCann
aafee5e1b7
Fix lookup usage in French/Catalan (fix #11347) (#11382)
* Fix lookup usage (fix #11347)

Before using the lookups table in the French (and Catalan) lemmatizers,
there's a check to see if the current term is in the table. But it's
checking a string against hashes, so it's always false. Also the table
lookup function is designed so you don't have to do that anyway.

* Use the lookup table directly

* Use string, not token
2022-08-29 10:32:38 +02:00
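
The fix amounts to letting the lookup table resolve the string itself instead of testing membership against hashed keys; roughly (simplified, not the actual lemmatizer code):

```python
# Rough sketch of the lookup pattern described above.
from spacy.lookups import Lookups

lookups = Lookups()
lookups.add_table("lemma_lookup", {"suis": "être"})
table = lookups.get_table("lemma_lookup")

word = "suis"
# Before: `word in table` compared the raw string against hashed keys.
# After: ask the table directly and fall back to the surface form.
lemma = table.get(word, word)
print(lemma)  # être
```
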
Edward
6723d76f24
Add ConsoleLogger.v2 (#11214)
* Init

* Change logger to ConsoleLogger.v2

* adjust naming

* More naming adjustments

* Fix output_file reference error

* ignore type

* Add basic test for logger

* Hopefully fix mypy issue

* mypy ignore line

* Update mypy line

Co-authored-by: Adriane Boyd <adrianeboyd@gmail.com>

* Update test method name

Co-authored-by: Adriane Boyd <adrianeboyd@gmail.com>

* Change file saving logic

* Fix finalize method

* increase spacy-legacy version in requirements

* Update docs

* small adjustments

Co-authored-by: Adriane Boyd <adrianeboyd@gmail.com>
2022-08-29 10:23:05 +02:00
Adriane Boyd
ba33200979
Remove pathy from pyproject.toml (#11383) 2022-08-26 16:07:16 +02:00
Paul O'Leary McCann
7a2c58864c
Move deps outside explosion to "third-party" (#11381) 2022-08-26 10:23:10 +02:00
Adriane Boyd
6fd3b4d9d6
Merge pull request #11375 from adrianeboyd/chore/update-develop-from-master-v3.5-1
Update develop from master for v3.5
2022-08-24 20:41:25 +02:00
Adriane Boyd
81874265e9 Merge remote-tracking branch 'upstream/master' into chore/update-develop-from-master-v3.5-1 2022-08-24 12:47:42 +02:00
Tobius Saul
c09d2fa25b
luganda language extension (#10847)
* luganda language extension

* __init__.py changes

* New enhancements

* Lexical attribute changed

* punctuation and sentence additions

* Remove comment header

* Fix typos, reformat

* reformatted version

* Add tokenizer test

* Remove contractions from stop words

* Format

* Add Luganda to website

Co-authored-by: Adriane Boyd <adrianeboyd@gmail.com>
2022-08-23 13:09:36 +02:00
Edward
5afa98aabf
Support custom attributes for tokens and spans in json conversion (#11125)
* Add token and span custom attributes to to_json()

* Change logic for to_json

* Add functionality to from_json

* Small adjustments

* Move token/span attributes to new dict key

* Fix test

* Fix the same test but much better

* Add backwards compatibility tests and adjust logic

* Add test to check if attributes not set in underscore are not saved in the json

* Add tests for json compatibility

* Adjust test names

* Fix tests and clean up code

* Fix assert json tests

* small adjustment

* adjust naming and code readability

* Adjust naming, added more tests and changed logic

* Fix typo

* Adjust errors, naming, and small test optimization

* Fix byte tests

* Fix bytes tests

* Change naming and json structure

* update schema

* Update spacy/schemas.py

Co-authored-by: Adriane Boyd <adrianeboyd@gmail.com>

* Update spacy/tokens/doc.pyx

Co-authored-by: Adriane Boyd <adrianeboyd@gmail.com>

* Update spacy/tokens/doc.pyx

Co-authored-by: Adriane Boyd <adrianeboyd@gmail.com>

* Update spacy/schemas.py

Co-authored-by: Adriane Boyd <adrianeboyd@gmail.com>

* Update schema for underscore attributes

* Adjust underscore schema

* adjust schema tests

Co-authored-by: Adriane Boyd <adrianeboyd@gmail.com>
2022-08-23 10:05:02 +02:00
Tal Zussman
7e75327893
Fix menu order in linguistic-features.md (#11364)
Swap 'Vectors & Similarity' and 'Mappings & Exceptions' in menu to match order in body
2022-08-23 14:40:38 +09:00
Sofie Van Landeghem
6e20842370
dev docs: numeric comparators (#11334)
* add section on numeric comparators

* edit

* prettier

* Update extra/DEVELOPER_DOCS/Code Conventions.md

Co-authored-by: Adriane Boyd <adrianeboyd@gmail.com>

* note on typing imports

Co-authored-by: Adriane Boyd <adrianeboyd@gmail.com>
2022-08-22 15:52:53 +02:00
Adriane Boyd
f55bb7470d
Clean up warnings in the test suite (#11331) 2022-08-22 12:04:30 +02:00
Paul O'Leary McCann
0f07defe2c
Remove reference to voting on issue (#11335)
Not clear which issue this refers to, we don't suggest this for any
other issues, and we don't use votes in general.
2022-08-22 11:29:05 +02:00
Adriane Boyd
04c6e5cb95
Improve floret vectors display in pipeline docs (#11343) 2022-08-22 11:28:13 +02:00
Adriane Boyd
5fa8f4faca
Switch ru and uk lemmatizers to pymorphy3 (#11345)
* Switch ru and uk lemmatizers to pymorphy3

* Switch to pymorphy3 in tests
2022-08-22 11:27:14 +02:00
Adriane Boyd
3e4cf1bbe1
Check for . in factory names (#11336) 2022-08-19 09:52:12 +02:00
Adriane Boyd
09b3118b26
Add uk pipelines to website (#11332) 2022-08-18 14:04:57 +02:00
Sofie Van Landeghem
cab263791f
include span_ruler for default warning filter (#11333) 2022-08-17 19:55:54 +02:00
Peter Baumgartner
db7b9938a4
Docs: displaCy documentation - data types, parse_{deps,ents,spans}, spans example (#10950)
* add in spans example and parse references

* rm autoformatter

* rm extra ents copy

* TypedDict draft

* type fixes

* restore non-documentation files

* docs update

* fix spans example

* fix hyperlinks

* add parse example

* example fix + argument fix

* fix api arg in docs

* fix bad variable replacement

* fix spacing in style

Co-authored-by: Sofie Van Landeghem <svlandeg@users.noreply.github.com>

* fix spacing on table

* fix spacing on table

* rm temp files

Co-authored-by: Sofie Van Landeghem <svlandeg@users.noreply.github.com>
2022-08-16 11:23:34 -04:00
Adriane Boyd
ed4ad309e6
Fix Dutch noun chunks to skip overlapping spans (#11275)
* Add test for overlapping noun chunks

* Skip overlapping noun chunks

* Update spacy/tests/lang/nl/test_noun_chunks.py

Co-authored-by: Sofie Van Landeghem <svlandeg@users.noreply.github.com>

Co-authored-by: Sofie Van Landeghem <svlandeg@users.noreply.github.com>
2022-08-10 09:49:08 +02:00
Paul O'Leary McCann
231a17817d
Clean up automated label-based issue handling (#11284)
* Clean up automated label-based issue handling

1. upgrade tiangolo/issue-manager to latest
2. move needs-more-info to tiangolo
3. change needs-more-info close time to 7 days
4. delete old needs-more-info config

* Use old, longer message

* Fix label name
2022-08-09 14:50:50 +02:00
Adriane Boyd
e700358ba0
Add W605 to the errors raised by flake8 in the CI (#11283) 2022-08-09 12:15:13 +02:00
Adriane Boyd
fc4246558b
Fix regex invalid escape sequences (#11276) 2022-08-09 10:59:36 +02:00
stefawolf
23749cfc91
adding spans to doc_annotation in Example.to_dict (#11261)
* adding spans to doc_annotation in Example.to_dict

* to_dict compatible with from_dict: tuples instead of spans

* use strings for label and kb_id

* Simplify test

* Update data formats docs

Co-authored-by: Stefanie Wolf <stefanie.wolf@vitecsoftware.com>
Co-authored-by: Adriane Boyd <adrianeboyd@gmail.com>
2022-08-05 12:26:38 +02:00
Luka Dragar
b64243ed55
Updates to Slovenian language (#11162)
* Added examples for Slovene

* Update spacy/lang/sl/examples.py

Co-authored-by: Adriane Boyd <adrianeboyd@gmail.com>

* Corrected a typo in one of the sentences

* Updated support for Slovenian

* Some minor changes to corrections

* Added forint currency

* Corrected HYPHENS_PERMITTED regex and some formatting

* Minor changes

* Un-xfail tokenizer test

* Format

Co-authored-by: Luka Dragar <D20124481@mytudublin.ie>
Co-authored-by: Adriane Boyd <adrianeboyd@gmail.com>
2022-08-05 10:10:18 +02:00
Adriane Boyd
b5d9d0897e
Merge pull request #11270 from adrianeboyd/chore/update-develop-v3.5
Prepare develop for v3.5
2022-08-04 21:17:26 +02:00
Adriane Boyd
a3f6d6bce1 Merge remote-tracking branch 'upstream/master' into develop 2022-08-04 18:19:28 +02:00
Adriane Boyd
b07708d5d0
Support full prerelease versions in the compat table (#11228)
* Support full prerelease versions in the compat table

* Fix types
2022-08-04 15:14:19 +02:00
Jules Belveze
cd09614ab2
chore: add 'concepCy' to spacy universe (#11255)
* chore: add 'concepCy' to spacy universe

* docs: add 'slogan' to concepCy
2022-08-04 15:42:38 +09:00
Lj Miranda
d993df41e5
Update docs for pipeline initialize() methods (#11221)
* Update documentation for dependency parser

* Update documentation for trainable_lemmatizer

* Update documentation for entity_linker

* Update documentation for ner

* Update documentation for morphologizer

* Update documentation for senter

* Update documentation for spancat

* Update documentation for tagger

* Update documentation for textcat

* Update documentation for tok2vec

* Run prettier on edited files

* Apply similar changes in transformer docs

* Remove need to say annotated example explicitly

I removed the need to say "Must contain at least one annotated Example"
because it's often a given that Examples will contain some gold-standard
annotation.

* Run prettier on transformer docs
2022-08-03 16:53:02 +02:00
Adriane Boyd
d0578c2ede
Add scorer to textcat API docs config settings (#11263) 2022-08-03 16:41:20 +02:00
Paul O'Leary McCann
2d89dd9db8
Update natto-py version spec (#11222)
* Update natto-py version spec

* Update setup.cfg

Co-authored-by: Adriane Boyd <adrianeboyd@gmail.com>

Co-authored-by: Adriane Boyd <adrianeboyd@gmail.com>
2022-07-28 07:45:02 +02:00
ninjalu
95a1b8aca6
add additional REL_OP (#10371)
* add additional REL_OP

* change to condition and new rel_op symbols

* add operators to docs

* add the anchor while we're in here

* add tests

Co-authored-by: Peter Baumgartner <5107405+pmbaumgartner@users.noreply.github.com>
2022-07-27 13:16:44 +02:00
Madeesh Kannan
1829d7120a
ExplosionBot: Add note about case-sensitivity (#11211) 2022-07-27 14:24:22 +09:00
Edward
360a702ecd
Add parent argument (#11210) 2022-07-26 14:35:18 +02:00
Adriane Boyd
5c2a00cef0
Set version to v3.4.1 (#11209) 2022-07-26 12:52:38 +02:00
Adriane Boyd
c8f5b752bb
Add link to developer docs code conventions (#11171) 2022-07-26 10:56:53 +02:00
Daniël de Kok
4ee8a06149
Fix compatibility with CuPy 9.x (#11194)
After the precomputable affine table of shape [nB, nF, nO, nP] is
computed, padding with shape [1, nF, nO, nP] is assigned to the first
row of the precomputed affine table. However, when we are indexing the
precomputed table, we get a row of shape [nF, nO, nP]. CuPy versions
before 10.0 cannot paper over this shape difference.

This change fixes compatibility with CuPy < 10.0 by squeezing the first
dimension of the padding before assignment.
2022-07-26 10:52:01 +02:00
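
The shape mismatch is easiest to see with NumPy as a stand-in for CuPy; the fix simply drops the leading singleton dimension before the assignment:

```python
# NumPy stand-in for the shape issue described above (shapes are illustrative).
import numpy as np

nB, nF, nO, nP = 4, 3, 2, 2
table = np.zeros((nB, nF, nO, nP), dtype="f")
padding = np.ones((1, nF, nO, nP), dtype="f")

# table[0] has shape (nF, nO, nP) while padding has shape (1, nF, nO, nP).
# CuPy < 10.0 won't broadcast that assignment, so squeeze the leading dim first.
table[0] = padding.squeeze(0)
print(table[0].shape)  # (3, 2, 2)
```
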
Adriane Boyd
36ff2a5441
Merge pull request #11200 from adrianeboyd/chore/reenable-model-tests
Revert "Temporarily skip tests that require models/compat"
2022-07-25 20:13:44 +02:00
Adriane Boyd
e5990db713 Revert "Temporarily skip tests that require models/compat"
This reverts commit d9320db7db.
2022-07-25 18:12:18 +02:00
Paul O'Leary McCann
1c12812d1a
Replace link to old label (#11188) 2022-07-25 16:39:34 +09:00
Adriane Boyd
7a99fe3c65
Move sent-patterns to correct section of universe.json (#11192) 2022-07-25 09:14:50 +02:00
0xpeIpeI
93960dc4b5
[universe project] create English interpretation project (#11184)
* [add] my universe project setting

* [modify] A few adjustments

* [Modify] change package description
2022-07-24 19:01:04 +09:00
Dan Radenkovic
a5aa3a818f
fix docs (#11123) 2022-07-24 17:16:36 +09:00
Lucas Terriel
7ff52c02a1
Update meta for spacyfishing in spaCy Universe (#11185)
* add new logo for spacyfishing to update spacy universe

* change logo location
2022-07-24 17:10:29 +09:00
Maarten Grootendorst
1caa2d1d16
Added BERTopic to Spacy Universe (#11159)
* Added BERTopic to Spacy Universe

* Fix no render of visualization
2022-07-19 19:37:18 +09:00
Adriane Boyd
2235e3520c
Update binder version in docs (#11124) 2022-07-12 15:20:33 +02:00
Nicolai Bjerre Pedersen
2fa983aa2e
Fix span typings (#11119)
Add id, id_ to span.pyi.
2022-07-12 13:47:35 +02:00
Adriane Boyd
11f859c132
Docs for v3.4 (#11057)
* Add draft of v3.4 usage

* Add Croatian models

* Add Matcher min/max

* Update release notes

* Minor edits

* Add updates, tables

* Update pydantic/mypy versions

* Update version in README

* Fix sidebar
2022-07-11 15:36:31 +02:00
Adriane Boyd
d583626a82
Update build setup for aarch64 (#11112)
* Extend build constraints for aarch64

* Skip mypy for aarch64
2022-07-11 13:29:35 +02:00
Adriane Boyd
5cb6f1ae51
CI: Install with two parallel build jobs (#11111) 2022-07-11 12:20:00 +02:00
Adriane Boyd
3701039c1f
Tweak build jobs setting, update install docs (#11077)
* Restrict SPACY_NUM_BUILD_JOBS to only override if set

* Update install docs
2022-07-08 19:21:17 +02:00
Peter Baumgartner
36cb2029a9
displaCy Spans Vertical Alignment Fix 2 (#11092)
* add in span render slot fix

* fix spacing off by one

* rm demo

* adjust comments

* fix whitespace and overlap issue
2022-07-08 19:20:13 +02:00
Richard Hudson
dc38a0f079
Change demo URL (#11102) 2022-07-08 19:19:48 +02:00
Adriane Boyd
66d6461c8f
Use thinc v8.1 (#11101) 2022-07-08 17:52:41 +02:00
Adriane Boyd
397197ec0e
Extend to mypy<0.970 (#11100) 2022-07-08 14:58:01 +02:00
Madeesh Kannan
f38aff4ec9
Add examples for new explosion bot commands (#11082)
* Add examples for new explosion bot commands

* Update extra/DEVELOPER_DOCS/ExplosionBot.md

Co-authored-by: Sofie Van Landeghem <svlandeg@users.noreply.github.com>

Co-authored-by: Sofie Van Landeghem <svlandeg@users.noreply.github.com>
2022-07-08 13:36:12 +02:00
Adriane Boyd
be9e17c0e4
Add docs for compiling with build constraints (#11081) 2022-07-08 11:45:56 +02:00
github-actions[bot]
e7fd06bdbe
Auto-format code with black (#11099)
Co-authored-by: explosion-bot <explosion-bot@users.noreply.github.com>
2022-07-08 18:43:25 +09:00
Nipun Sadvilkar
86ee26e3c2
Use pull_request_target event for spaCy universe GA trigger (#11097) 2022-07-07 19:43:50 +05:30
Nipun Sadvilkar
bb3e11b9a1
Github Action for spaCy universe project alert (#11090) 2022-07-07 17:50:30 +05:30
Kenneth Enevoldsen
7b220afc29
Added asent to spacy universe (#11078)
* Added asent to spacy universe

* Update addition of asent following correction
2022-07-07 13:25:25 +09:00
Schero1994
c7c3fb1d0c
Merge pull request #11074 from Schero1994/feature/remove
Batch #2 | spaCy universe cleanup
2022-07-06 10:39:04 +02:00
Daniël de Kok
a06cbae70d
precompute_hiddens/Parser: do not look up CPU ops (3.4) (#11069)
* precompute_hiddens/Parser: do not look up CPU ops

`get_ops("cpu")` is quite expensive. To avoid this, we want to cache the
result as in #11068. However, for 3.x we do not want to change the ABI.
So we avoid the expensive lookup by using NumpyOps. This should have a
minimal impact, since `get_ops("cpu")` was only used when the model ops
were `CupyOps`. If the ops are `AppleOps`, we are still passing through
the correct BLAS implementation.

* _NUMPY_OPS -> NUMPY_OPS
2022-07-05 10:53:42 +02:00
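
The idea is to keep one module-level `NumpyOps` instance instead of calling `get_ops("cpu")` on every use; a minimal sketch with Thinc (not the actual parser code):

```python
# Minimal sketch of the caching idea, assuming Thinc 8.x.
from thinc.api import CupyOps, NumpyOps

NUMPY_OPS = NumpyOps()  # created once, reused instead of get_ops("cpu")


def cpu_ops_for(model_ops):
    # Only fall back to NumPy ops when the model runs on CuPy; otherwise keep
    # whatever ops the model already uses (e.g. AppleOps for the BLAS path).
    return NUMPY_OPS if isinstance(model_ops, CupyOps) else model_ops
```
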
Adriane Boyd
78a84f0d78
Support env var for num build jobs (#11073) 2022-07-04 20:50:16 +02:00
Madeesh Kannan
d36d66b7ca
Increase test deadline to 30 minutes to prevent spurious test failures (#11070)
* Increase test deadline to 30 minutes to prevent spurious test failures

* Reduce deadline to 2 minutes
2022-07-04 18:37:09 +02:00
kadarakos
5240baccfe
don't use get_array_module (#11056) 2022-07-04 17:15:33 +02:00
Raphael Mitsch
e9eb59699f
NEL confidence threshold (#11016)
* Add base for NEL abstention threshold mechanism.

* Add abstention threshold to entity linker. Add test.

* Fix entity linking tests.

* Changed abstention default threshold from 0 to None.

* Fix default values for abstention thresholds.

* Fix mypy errors.

* Replace assertion with raise of proper error code.

* Simplify threshold check. Remove thresholding from EntityLinker_v1.

* Rename test.

* Update spacy/pipeline/entity_linker.py

Co-authored-by: Sofie Van Landeghem <svlandeg@users.noreply.github.com>

* Update spacy/pipeline/entity_linker.py

Co-authored-by: Sofie Van Landeghem <svlandeg@users.noreply.github.com>

* Make E1043 configurable.

* Update docs.

* Rephrase description in docs. Adjusting error code message.

Co-authored-by: Sofie Van Landeghem <svlandeg@users.noreply.github.com>
2022-07-04 17:05:21 +02:00
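
In use this becomes a config setting on the entity linker; a sketch assuming the option is exposed as `threshold` (default `None`, i.e. never abstain):

```python
# Sketch: configuring the abstention threshold on the entity linker
# (assumes the setting is exposed as "threshold"; default None = never abstain).
import spacy

nlp = spacy.blank("en")
entity_linker = nlp.add_pipe("entity_linker", config={"threshold": 0.8})
# Candidates scoring below 0.8 are rejected and the mention is left unlinked (NIL).
```
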
schaeran
b3165db41b remove universe object: spacy-langdetect 2022-07-04 16:07:18 +02:00
schaeran
4e8a5994df remove universe object: NLPre 2022-07-04 16:06:58 +02:00
schaeran
0e4a835468 remove universe object: num_fh 2022-07-04 16:06:38 +02:00
schaeran
5000a08a20 remove universe object: adam_qas 2022-07-04 16:06:20 +02:00
schaeran
60a35a2bb2 remove universe object: spacy_kenlm 2022-07-04 16:06:02 +02:00
schaeran
224f30c563 remove universe object: spacy-raspberry 2022-07-04 16:05:34 +02:00
schaeran
a9062ebf17 remove universe object: spacy-lookup 2022-07-04 16:05:11 +02:00
schaeran
9b823fc9e9 remove universe object: NeuroNER 2022-07-04 16:04:50 +02:00
schaeran
b94bcaa62f remove universe object: spacy-vis 2022-07-04 16:04:29 +02:00
schaeran
880e7db44e remove universe object: spacy_grammar 2022-07-04 16:04:06 +02:00
schaeran
6c036d1e25 remove universe object: spacy_hunspell 2022-07-04 16:03:30 +02:00
Madeesh Kannan
59c763eec1
StringStore-related optimizations (#10938)
* `strings`: More robust type checking of keys/IDs, coerce `int`-like types to `hash_t`

* Preserve existing public API behaviour

* Fix return type

* Replace `bool` with `bint`, rename to `_try_coerce_to_hash`, replace `id` with `hash`

* Avoid unnecessary re-encoding and re-calculation of strings and hashes respectively

* Rename variables named `hash`
Add comment on early return
2022-07-04 15:04:03 +02:00
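
The optimization is internal; the public `StringStore` behaviour it preserves looks like this:

```python
# Public StringStore behaviour preserved by the optimization.
from spacy.strings import StringStore

strings = StringStore()
key = strings.add("coffee")       # returns the 64-bit hash for "coffee"
print(strings[key])               # 'coffee'  (hash -> string)
print(strings["coffee"] == key)   # True      (string -> same hash)
print("coffee" in strings)        # True
```
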
Paul O'Leary McCann
7c1bf2fa1f
Merge pull request #11062 from explosion/autoblack
Auto-format code with black
2022-07-03 14:35:53 +09:00
explosion-bot
7e55a51314 Auto-format code with black 2022-07-01 08:04:32 +00:00
Paul O'Leary McCann
e8fdbfc65e Minor fix in Lemmatizer docs 2022-07-01 14:28:03 +09:00
Madeesh Kannan
eaf66e7431
Add NVTX ranges to TrainablePipe components (#10965)
* `TrainablePipe`: Add NVTX range decorator

* Annotate `TrainablePipe` subclasses with NVTX ranges

* Export function signature to allow introspection of args in tests

* Revert "Annotate `TrainablePipe` subclasses with NVTX ranges"

This reverts commit d8684f7372.

* Revert "Export function signature to allow introspection of args in tests"

This reverts commit f4405ca3ad.

* Revert "`TrainablePipe`: Add NVTX range decorator"

This reverts commit 26536eb6b8.

* Add `spacy.pipes_with_nvtx_range` pipeline callback

* Show warnings for all missing user-defined pipe functions that need to be annotated
Fix imports, typos

* Rename `DEFAULT_ANNOTATABLE_PIPE_METHODS` to `DEFAULT_NVTX_ANNOTATABLE_PIPE_METHODS`
Reorder import

* Walk model nodes directly whilst applying NVTX ranges
Ignore pipe method wrapper when applying range
2022-06-30 11:28:12 +02:00
Adriane Boyd
3fe9f47de4
Revert "disable failing test because Stanford servers are down (#11015)" (#11054)
This reverts commit f8116078ce.
2022-06-30 11:24:54 +02:00
Adriane Boyd
3bc1fe0a78
Update cupy extras (#11055)
* Add cuda116 and cuda117 extras

* Revert "remove `cuda116` extra from install widget (#11012)"

This reverts commit e7b498fb1f.

* Add cuda117 to quickstart
2022-06-30 11:24:37 +02:00
Shen Qin
be00db6645
Addition of min_max quantifier in matcher {n,m} (#10981)
* Min_max_operators
1. Modified API and Usage for spaCy website to include min_max operator
2. Modified matcher.pyx to include min_max function {n,m} and its variants
3. Modified schemas.py to include min_max validation error
4. Added test cases to test_matcher_api.py, test_matcher_logic.py and test_pattern_validation.py

* attempt to fix mypy/pydantic compat issue

* formatting

* Update spacy/tests/matcher/test_pattern_validation.py

Co-authored-by: Source-Shen <82353723+Source-Shen@users.noreply.github.com>
Co-authored-by: svlandeg <svlandeg@github.com>
Co-authored-by: Adriane Boyd <adrianeboyd@gmail.com>
2022-06-30 11:01:58 +02:00
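
A short example of the new quantifier on the `OP` key, assuming the `{n,m}` syntax from the PR title (with variants such as `{n}` and `{n,}` per the docs):

```python
# Example of the new {n,m} quantifier on OP (available once this PR is released).
import spacy
from spacy.matcher import Matcher

nlp = spacy.blank("en")
matcher = Matcher(nlp.vocab)
# Match "very" repeated two to three times, followed by "good".
matcher.add("VERY_GOOD", [[{"LOWER": "very", "OP": "{2,3}"}, {"LOWER": "good"}]])

doc = nlp("this is very very good")
print([doc[start:end].text for _, start, end in matcher(doc)])
```
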
Adriane Boyd
4581a4f53f
Run mypy for python 3.10 (#11052) 2022-06-29 20:03:36 +02:00
Daniël de Kok
0ff14aabce
vectors: avoid expensive comparisons between numpy ints and Python ints (#10992)
* vectors: avoid expensive comparisons between numpy ints and Python ints

* vectors: avoid failure on lists of ints

* Convert another numpy int to Python
2022-06-29 12:58:31 +02:00
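
The cost comes from repeatedly comparing NumPy scalar ints with Python ints in hot loops; converting once with `int(...)` keeps later comparisons cheap:

```python
# Tiny illustration of the conversion applied in the hot code paths.
import numpy as np

row = np.int64(-1)   # NumPy scalar, e.g. a row index pulled out of an array
row_py = int(row)    # convert once...
print(row_py == -1)  # ...then compare plain Python ints afterwards
```
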
Peter Baumgartner
dd038b536c
fix to horizontal space (#10994) 2022-06-28 20:42:40 +02:00
Adriane Boyd
24f4908fce
Update vector handling in similarity methods (#11013)
Distinguish between vectors that are all zeros and vectors that are missing
when warning about missing vectors.

Update `Doc.has_vector` to match `Span.has_vector` and
`Token.has_vector` for cases where the vocab has vectors but none of the
tokens in the container have vectors.
2022-06-28 19:50:47 +02:00
Madeesh Kannan
1d5cad0b42
Example.get_aligned_parse: Handle unit and zero length vectors correctly (#11026)
* `Example.get_aligned_parse`: Do not squeeze gold token idx vector
Correctly handle zero-size vectors passed to `np.vectorize`

* Add tests

* Use `Doc` ctor to initialize attributes

* Remove unintended change

Co-authored-by: Adriane Boyd <adrianeboyd@gmail.com>

* Remove unused import

Co-authored-by: Adriane Boyd <adrianeboyd@gmail.com>
2022-06-28 19:42:58 +02:00
Richard Hudson
a9559e7435
Handle Cyrillic combining diacritics (#10837)
* Handle Russian, Ukrainian and Bulgarian

* Corrections

* Correction

* Correction to comment

* Changes based on review

* Correction

* Reverted irrelevant change in punctuation.py

* Remove unnecessary group

* Reverted accidental change
2022-06-28 15:35:32 +02:00
Zackere
8ffff18ac4
Try cloning repo from main & master (#10843)
* Try cloning repo from main & master

* fixup! Try cloning repo from main & master

* fixup! fixup! Try cloning repo from main & master

* refactor clone and check for repo:branch existence

* spacing fix

* make mypy happy

* type util function

* Update spacy/cli/project/clone.py

Co-authored-by: Sofie Van Landeghem <svlandeg@users.noreply.github.com>

Co-authored-by: Peter Baumgartner <5107405+pmbaumgartner@users.noreply.github.com>
Co-authored-by: Sofie Van Landeghem <svlandeg@users.noreply.github.com>
2022-06-28 09:11:15 -04:00
Eric Holscher
308a612ec9
Remove simply (#11017)
I was reading this page, and as a relative beginner, nothing about it was simple :)
2022-06-27 09:45:22 +02:00
github-actions[bot]
4155a59d47
Auto-format code with black (#11022)
Co-authored-by: explosion-bot <explosion-bot@users.noreply.github.com>
2022-06-27 09:35:35 +02:00
Adriane Boyd
738b38064f
Merge pull request #11021 from adrianeboyd/chore/v3.4.0
Set version to v3.4.0
2022-06-24 14:54:16 +02:00
Madeesh Kannan
8f1ba4de58
Backport parser/alignment optimizations from feature/refactor-parser (#10952) 2022-06-24 13:39:52 +02:00
Adriane Boyd
d9320db7db Temporarily skip tests that require models/compat 2022-06-24 11:20:53 +02:00
Adriane Boyd
bffe54d02b Set version to v3.4.0 2022-06-24 08:48:58 +02:00
Peter Baumgartner
9738b69c0e
Update Code Conventions.md (#11018) 2022-06-24 15:11:29 +09:00
Dmytro Sadovnychyi
4cd8b4cc22
Fix some of the broken links on universe pages (#11011)
Currently some of the "AUTHOR INFO" links (e.g. here[0]) are broken:

```
https://github.com/https://github.com/explosion
```

[0] https://spacy.io/universe/project/spacy-experimental


Also one remains broken with `https://szegedai.github.io/`.
2022-06-23 17:53:00 +02:00
Sofie Van Landeghem
f8116078ce
disable failing test because Stanford servers are down (#11015) 2022-06-23 10:57:46 +02:00
Adriane Boyd
d4e3f43639
Update thinc version to switch back to blis v0.7 (#11014) 2022-06-23 09:50:25 +02:00
Adriane Boyd
f1197d9175
Add API docs for token attribute symbols (#10836)
* Add API docs for token attribute symbols

* Remove NBSP's

* Fix typo

* Rephrase

Co-authored-by: svlandeg <svlandeg@github.com>
2022-06-23 08:16:38 +02:00
Peter Baumgartner
3335bb9d0c
remove cuda116 extra from install widget (#11012) 2022-06-23 08:15:28 +02:00
jademlc
bed23ff291
Update serialization methods code block (#11004)
* Update serialization methods code block

* Update website/docs/usage/saving-loading.md

Co-authored-by: Adriane Boyd <adrianeboyd@gmail.com>
2022-06-22 20:45:26 +02:00
Sofie Van Landeghem
0fa004c4cd the 'new' indicator wants a 'number' (#10997) 2022-06-21 22:01:16 +02:00
Philip Vollet
1ae13b2a70
Merge pull request #10991 from Lucaterre/master
updated spacy universe for spacyfishing
2022-06-21 10:33:26 +02:00
Daniël de Kok
0271306f16
Use thinc-apple-ops>=0.1.0.dev0 with apple extras (#10904)
* Use thinc-apple-ops>=0.1.0.dev0 with `apple` extras

Also test with thinc-apple-ops that is at least 0.1.0.dev0.

* Check thinc-apple-ops on macOS with Python 3.10

Co-authored-by: Adriane Boyd <adrianeboyd@gmail.com>

* Use `pip install --pre` for installing thinc-apple-ops in CI

Co-authored-by: Adriane Boyd <adrianeboyd@gmail.com>
2022-06-21 08:26:59 +02:00
Victoria
a08ca064e5
Update linguistic-features.md (#10993)
Change link for downloading fasttext word vectors
2022-06-21 15:03:41 +09:00
Lucaterre
2820d7dd8d correct typo in universe.json for 'code_example' key : pipe name 'entityfishing' 2022-06-20 15:26:23 +02:00
Lucaterre
cdad815c68 updated spacy universe for spacyfishing 2022-06-20 14:28:49 +02:00
Sofie Van Landeghem
f00254ae27
add counts to verbose list of NER labels (#10957) 2022-06-20 09:48:40 +02:00
Raphael Mitsch
4c058eb40a
enable argument for spacy.load() (#10784)
* Enable flag on spacy.load: foundation for include, enable arguments.

* Enable flag on spacy.load: fixed tests.

* Enable flag on spacy.load: switched from pretrained model to empty model with added pipes for tests.

* Enable flag on spacy.load: switched to more consistent error on misspecification of component activity. Test refactoring. Added 'enable' to default config.

* Enable flag on spacy.load: added support for fields not in pipeline.

* Enable flag on spacy.load: removed serialization fields from supported fields.

* Enable flag on spacy.load: removed 'enable' from config again.

* Enable flag on spacy.load: relaxed checks in _resolve_component_activation_status() to allow non-standard pipes.

* Enable flag on spacy.load: fixed relaxed checks for _resolve_component_activation_status() to allow non-standard pipes. Extended tests.

* Enable flag on spacy.load: comments w.r.t. resolution workarounds.

* Enable flag on spacy.load: remove include fields. Update website docs.

* Enable flag on spacy.load: updates w.r.t. changes in master.

* Implement Doc.from_json(): update docstrings.

Co-authored-by: Adriane Boyd <adrianeboyd@gmail.com>

* Implement Doc.from_json(): remove newline.

Co-authored-by: Adriane Boyd <adrianeboyd@gmail.com>

* Implement Doc.from_json(): change error message for E1038.

Co-authored-by: Adriane Boyd <adrianeboyd@gmail.com>

* Enable flag on spacy.load: wrapped docstring for _resolve_component_status() at 80 chars.

* Enable flag on spacy.load: changed examples for enable flag.

* Remove newline.

Co-authored-by: Sofie Van Landeghem <svlandeg@users.noreply.github.com>

* Fix docstring for Language._resolve_component_status().

* Rename E1038 to E1042.

Co-authored-by: Adriane Boyd <adrianeboyd@gmail.com>
Co-authored-by: Sofie Van Landeghem <svlandeg@users.noreply.github.com>
2022-06-17 20:24:13 +01:00
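
Usage sketch of the new `enable` argument (assumes `en_core_web_sm` is installed); components not listed are loaded but start out disabled:

```python
# Sketch of the new enable argument (assumes en_core_web_sm is installed).
import spacy

nlp = spacy.load("en_core_web_sm", enable=["ner"])  # only "ner" stays active
print(nlp.disabled)  # e.g. ['tok2vec', 'tagger', 'parser', ...]
```
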
Sofie Van Landeghem
eaeca5eb6a
account for NER labels with a hyphen in the name (#10960)
* account for NER labels with a hyphen in the name

* cleanup

* fix docstring

* add return type to helper method

* shorter method and few more occurrences

* user helper method across repo

* fix circular import

* partial revert to avoid circular import
2022-06-17 20:02:37 +01:00
github-actions[bot]
6313787fb6
Auto-format code with black (#10977)
Co-authored-by: explosion-bot <explosion-bot@users.noreply.github.com>
2022-06-17 19:41:55 +01:00
Raphael Mitsch
d50668dbf0
Made _initialize_X() methods private. (#10978) 2022-06-17 15:55:34 +02:00
Raphael Mitsch
a7f6bc5dfb
Workaround for Typer optional default values with Python calls (#10788)
* Workaround for Typer optional default values with Python calls: added test and workaround.

* @rmitsch Workaround for Typer optional default values with Python calls: reverting some black formatting changes.

Co-authored-by: Sofie Van Landeghem <svlandeg@users.noreply.github.com>

* @rmitsch Workaround for Typer optional default values with Python calls: removing return type hint.

Co-authored-by: Sofie Van Landeghem <svlandeg@users.noreply.github.com>

* Workaround for Typer optional default values with Python calls: fixed imports, added GitHub issue marker.

* Workaround for Typer optional default values with Python calls: removed forcing of default values for optional arguments in init_config_cli(). Added default values for init_config(). Synchronized default values for init_config_cli() and init_config().

* Workaround for Typer optional default values with Python calls: removed unused import.

* Workaround for Typer optional default values with Python calls: fixed usage of optimize in init_config_cli().

* Workaround for Typer optional default values with Python calls: remove output_file from InitDefaultValues.

* Workaround for Typer optional default values with Python calls: rename class for default init values.

* Workaround for Typer optional default values with Python calls: remove newline.

* remove introduced newlines

* Remove test_init_config_from_python_without_optional_args().

* remove leftover import

* reformat import

* remove duplicate

Co-authored-by: Sofie Van Landeghem <svlandeg@users.noreply.github.com>
2022-06-17 12:15:36 +02:00
Daniël de Kok
3d3fbeda9f
Update for CBlas changes in Thinc 8.1.0.dev2 (#10970) 2022-06-16 11:42:34 +02:00
Daniël de Kok
0d352c46ed
vectors: remove use of float as row number (#10955)
The float -1 was returned rather than the integer -1 as the row for
unknown keys. This doesn't introduce a real bug, since such floats
cast (without issues) to int in the conversion to NumPy arrays. Still,
it's nice to do the correct thing :).
2022-06-15 15:32:02 +02:00
Madeesh Kannan
126d1db123
Add failing test: test_matcher_extension_in_set_predicate (#10948) 2022-06-13 10:56:45 +02:00
Daniël de Kok
a83a501195
precomputable_biaffine: avoid concatenation (#10911)
The `forward` of `precomputable_biaffine` performs matrix multiplication
and then `vstack`s the result with padding. This creates a temporary
array used for the output of matrix concatenation.

This change avoids the temporary by pre-allocating an array that is
large enough for the output of matrix multiplication plus padding and
fills the array in-place.

This gave me a small speedup (a bit over 100 WPS) on de_core_news_lg on
M1 Max (after changing thinc-apple-ops to support in-place gemm as BLIS
does).
2022-06-10 18:12:28 +02:00
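
In NumPy terms the change looks roughly like this: instead of building the matmul result and then `vstack`-ing it with the padding, allocate the padded output once and fill it in place:

```python
# NumPy sketch of the preallocation trick (not the actual Thinc code).
import numpy as np

X = np.random.rand(8, 4).astype("f")   # inputs
W = np.random.rand(4, 6).astype("f")   # weights
pad = np.zeros(6, dtype="f")           # padding row

# Before: one temporary for X @ W, another for the vstack.
out_before = np.vstack([pad, X @ W])

# After: allocate once, write the padding and the matmul result in place.
out_after = np.empty((X.shape[0] + 1, W.shape[1]), dtype="f")
out_after[0] = pad
np.matmul(X, W, out=out_after[1:])

print(np.allclose(out_before, out_after))  # True
```
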
github-actions[bot]
97e8a5041b
Auto-format code with black (#10945)
Co-authored-by: explosion-bot <explosion-bot@users.noreply.github.com>
2022-06-10 13:21:33 +02:00
Gor Arakelyan
605f84938b
Add "Aim-spaCy" to spaCy Universe (#10943)
* Add Aim-spaCy to spaCy universe

* Update Aim thumbnail

* Fix author links

Co-authored-by: Paul O'Leary McCann <polm@dampfkraft.com>
2022-06-10 18:33:17 +09:00
kadarakos
1bb87f35bc
Detect cycle during projectivize (#10877)
* detect cycle during projectivize

* not complete test to detect cycle in projectivize

* boolean to int type to propagate error

* use unordered_set instead of set

* moved error message to errors

* removed cycle from test case

* use find instead of count

* cycle check: only perform one lookup

* Return bool again from _has_head_as_ancestor

Communicate presence of cycles through an output argument.

* Switch to returning std::pair to encode presence of a cycle

The has_cycle pointer is too easy to misuse. Ideally, we would have a
sum type like Rust's `Result` here, but C++ is not there yet.

* _is_non_proj_arc: clarify what we are returning

* _has_head_as_ancestor: remove count

We are now explicitly checking for cycles, so the algorithm must always
terminate. Either we encounter the head, we find a root, or a cycle.

* _is_nonproj_arc: simplify condition

* Another refactor using C++ exceptions

* Remove unused error code

* Print graph with cycle on exception

* Include .hh files in source package

* Add FIXME comment

* cycle detection test

* find cycle when starting from problematic vertex

Co-authored-by: Daniël de Kok <me@danieldk.eu>
2022-06-08 19:34:11 +02:00
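
The core check walks the head chain from a token and must terminate by finding the candidate head, hitting a root, or detecting a cycle; a small Python sketch of that walk (the real implementation is the C++ code referenced above):

```python
# Python sketch of the "walk the head chain, report cycles" check.
def has_head_as_ancestor(tokenid, head, heads):
    """Follow head pointers from `tokenid`; return (is_ancestor, has_cycle).

    `heads[i]` is the head index of token i; a root points to itself.
    """
    seen = set()
    current = heads[tokenid]
    while True:
        if current == head:
            return True, False
        if current in seen:                # revisited a node: cycle
            return False, True
        seen.add(current)
        if heads[current] == current:      # reached a root without finding head
            return False, False
        current = heads[current]


# Token 3's chain is 3 -> 1 -> 0 (root), so head 0 is an ancestor.
print(has_head_as_ancestor(3, 0, heads=[0, 0, 1, 1]))  # (True, False)
# Tokens 2 and 3 point at each other: a cycle.
print(has_head_as_ancestor(2, 0, heads=[0, 0, 3, 2]))  # (False, True)
```
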
Paul O'Leary McCann
d176afd32f
Add note about multiple patterns (#10826)
* Add note about multiple patterns

* Move note to the top of method docs

* Remove EntityRuler note
2022-06-08 16:24:14 +02:00
Sofie Van Landeghem
763dcbf885
Fix version in SpanRuler docs (#10925)
* SpanRuler is new since 3.3.1

* update SpanRuler version since 3.3.1
2022-06-08 14:45:04 +02:00
Daniël de Kok
a4003532a3
Update README: spaCy 3.3.1 is out now (#10927) 2022-06-08 15:16:22 +09:00
Ilya Nikitin
c323789721
token.md: Fix documentation of Token.ancestors (#10917) 2022-06-06 14:32:36 +09:00
vincent d warmerdam
e7d2b26966
Add spacy-report to universe (#10910)
* Add spacy-report to universe

* Remove extra comma

Co-authored-by: Paul O'Leary McCann <polm@dampfkraft.com>
2022-06-05 18:57:58 +09:00
github-actions[bot]
24aafdffad
Auto-format code with black (#10908)
Co-authored-by: explosion-bot <explosion-bot@users.noreply.github.com>
2022-06-03 11:01:55 +02:00
Adriane Boyd
727ce6d1f5
Remove English exceptions with mismatched features (#10873)
Remove English contraction exceptions with mismatched features that lead
to exceptions like "theses" and "thisre".
2022-06-03 09:44:04 +02:00
Madeesh Kannan
41389ffe1e
Avoid pickling Doc inputs passed to Language.pipe() (#10864)
* `Language.pipe()`: Serialize `Doc` objects to bytes when using multiprocessing to avoid pickling overhead

* `Doc.to_dict()`: Serialize `_context` attribute (keeping in line with `(un)pickle_doc()`)

* Correct type annotations

* Fix typo

* `Doc`: Do not serialize `_context`

* `Language.pipe`: Send context objects to child processes, Simplify `as_tuples` handling

* Fix type annotation

* `Language.pipe`: Simplify `as_tuple` multiprocessor handling

* Cleanup code, fix typos

* MyPy fixes

* Move doc preparation function into `_multiprocessing_pipe`
Whitespace changes

* Remove superfluous comma

* Rename `prepare_doc` to `prepare_input`

* Update spacy/errors.py

* Undo renaming for error

Co-authored-by: Adriane Boyd <adrianeboyd@gmail.com>
2022-06-02 20:06:49 +02:00
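
The docs sent to worker processes go over the wire as bytes rather than pickles; the round trip looks roughly like this:

```python
# Rough sketch of the Doc <-> bytes round trip used instead of pickling.
import spacy
from spacy.tokens import Doc

nlp = spacy.blank("en")
doc = nlp("send me across a process boundary")

payload = doc.to_bytes()                  # what gets handed to the worker
restored = Doc(nlp.vocab).from_bytes(payload)
print(restored.text == doc.text)          # True
```
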
Adriane Boyd
430592b3ce
Extend typing_extensions to <4.2.0 (#10899) 2022-06-02 17:22:34 +02:00
single-fingal
6c6b8da7cc
Fix: De/Serialize SpanGroups including the SpanGroup keys (#10707)
* fix: De/Serialize `SpanGroups` including the SpanGroup keys

This prevents the loss of `SpanGroup`s that have the same .name as other `SpanGroup`s within the same `SpanGroups` object (upon de/serialization of the `SpanGroups`).

Fixes #10685

* Maintain backwards compatibility for serialized `SpanGroups`

(serialized as: a list of `SpanGroup`s, or b'')

* Add tests for `SpanGroups` deserialization backwards-compatibility

* Move a `SpanGroups` de/serialization test (test_issue10685)
  to tests/serialize/test_serialize_spangroups.py

* Output a warning if deserializing a `SpanGroups` with duplicate .name-d `SpanGroup`s

* Minor refactor

* `SpanGroups.from_bytes` handles only `list` and `dict` types with
`dict` as the expected default
* For lists, keep first rather than last value encountered
* Update error message
* Rename and update tests

* Update to preserve list serialization of SpanGroups

To avoid breaking compatibility of serialized `Doc` and `DocBin` with
earlier versions of spacy v3, revert back to a list-only serialization,
but update the names just for serialization so that the SpanGroups keys
override the SpanGroup names.

* Preserve object identity and current key overwrite

* Preserve SpanGroup object identity
* Preserve last rather than first span group from SpanGroup list
  format without SpanGroups keys

* Update inline comments

* Fix types

* Add type info for SpanGroup.copy

* Deserialize `SpanGroup`s as copies

when a single SpanGroup is the value for more than 1 `SpanGroups` key.
This is because we serialize `SpanGroups` as dicts (to maintain backward-
and forward-compatibility) and we can't assume `SpanGroup`s with the same
bytes/serialization were the same (identical) object, pre-serialization.

* Update spacy/tokens/_dict_proxies.py

* Add more SpanGroups serialization tests

Test that serialized SpanGroups maintain their Span order

* small clarification on older spaCy version

* Update spacy/tests/serialize/test_serialize_span_groups.py

Co-authored-by: Adriane Boyd <adrianeboyd@gmail.com>
Co-authored-by: Sofie Van Landeghem <svlandeg@users.noreply.github.com>
2022-06-02 15:56:27 +02:00
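A minimal sketch of the behaviour this fix protects: multiple `doc.spans` keys survive a bytes round trip. The keys and spans below are illustrative.

```python
import spacy
from spacy.tokens import Doc, Span

nlp = spacy.blank("en")
doc = nlp("Berlin is in Germany")
doc.spans["cities"] = [Span(doc, 0, 1, label="GPE")]
doc.spans["countries"] = [Span(doc, 3, 4, label="GPE")]

# Both SpanGroups keys are kept when serializing and deserializing the Doc.
restored = Doc(nlp.vocab).from_bytes(doc.to_bytes())
print(sorted(restored.spans.keys()))  # ['cities', 'countries']
```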
Adriane Boyd
7e13652d36
Fix schemas import in Doc (#10898) 2022-06-02 15:53:03 +02:00
Raphael Mitsch
8387ce4c01
Add Doc.from_json() (#10688)
* Implement Doc.from_json: rough draft.

* Implement Doc.from_json: first draft with tests.

* Implement Doc.from_json: added documentation on website for Doc.to_json(), Doc.from_json().

* Implement Doc.from_json: formatting changes.

* Implement Doc.to_json(): reverting unrelated formatting changes.

* Implement Doc.to_json(): fixing entity and span conversion. Moving fixture and doc <-> json conversion tests into single file.

* Implement Doc.from_json(): replaced entity/span converters with doc.char_span() calls.

* Implement Doc.from_json(): handling sentence boundaries in spans.

* Implementing Doc.from_json(): added parser-free sentence boundaries transfer.

* Implementing Doc.from_json(): added parser-free sentence boundaries transfer.

* Implementing Doc.from_json(): incorporated various PR feedback.

* Renaming fixture for document without dependencies.

Co-authored-by: Adriane Boyd <adrianeboyd@gmail.com>

* Implementing Doc.from_json(): using two sent_starts instead of one.

Co-authored-by: Adriane Boyd <adrianeboyd@gmail.com>

* Implementing Doc.from_json(): doc_without_dependency_parser() -> doc_without_deps.

Co-authored-by: Adriane Boyd <adrianeboyd@gmail.com>

* Implementing Doc.from_json(): incorporating various PR feedback. Rebased on latest master.

* Implementing Doc.from_json(): refactored Doc.from_json() to work with annotation IDs instead of their string representations.

* Implement Doc.from_json(): reverting unwanted formatting/rebasing changes.

* Implement Doc.from_json(): added check for char_span() calculation for entities.

* Update spacy/tokens/doc.pyx

Co-authored-by: Adriane Boyd <adrianeboyd@gmail.com>

* Implement Doc.from_json(): minor refactoring, additional check for token attribute consistency with corresponding test.

* Implement Doc.from_json(): removed redundancy in annotation type key naming.

Co-authored-by: Adriane Boyd <adrianeboyd@gmail.com>

* Implement Doc.from_json(): Simplifying setting annotation values.

Co-authored-by: Adriane Boyd <adrianeboyd@gmail.com>

* Implement doc.from_json(): renaming annot_types to token_attrs.

Co-authored-by: Adriane Boyd <adrianeboyd@gmail.com>

* Implement Doc.from_json(): adjustments for renaming of annot_types to token_attrs.

* Implement Doc.from_json(): removing default categories.

Co-authored-by: Adriane Boyd <adrianeboyd@gmail.com>

* Implement Doc.from_json(): simplifying lexeme initialization.

Co-authored-by: Adriane Boyd <adrianeboyd@gmail.com>

* Implement Doc.from_json(): simplifying lexeme initialization.

Co-authored-by: Adriane Boyd <adrianeboyd@gmail.com>

* Implement Doc.from_json(): refactoring to only have keys for present annotations.

* Implement Doc.from_json(): fix check for tokens' HEAD attributes.

Co-authored-by: Adriane Boyd <adrianeboyd@gmail.com>

* Implement Doc.from_json(): refactoring Doc.from_json().

* Implement Doc.from_json(): fixing span_group retrieval.

Co-authored-by: Adriane Boyd <adrianeboyd@gmail.com>

* Implement Doc.from_json(): fixing span retrieval.

* Implement Doc.from_json(): added schema for Doc JSON format. Minor refactoring in Doc.from_json().

* Implement Doc.from_json(): added comment regarding Token and Span extension support.

* Implement Doc.from_json(): renaming inconsistent_props to partial_attrs.

Co-authored-by: Adriane Boyd <adrianeboyd@gmail.com>

* Implement Doc.from_json(): adjusting error message.

Co-authored-by: Adriane Boyd <adrianeboyd@gmail.com>

* Implement Doc.from_json(): extending E1038 message.

Co-authored-by: Adriane Boyd <adrianeboyd@gmail.com>

* Implement Doc.from_json(): added params to E1038 raises.

* Implement Doc.from_json(): combined attribute collection with partial attributes check.

* Implement Doc.from_json(): added optional schema validation.

* Implement Doc.from_json(): fixed optional fields in schema, tests.

* Implement Doc.from_json(): removed redundant None check for DEP.

* Implement Doc.from_json(): added passing of schema validation message to E1037.

* Implement Doc.from_json(): removing redundant error E1040.

Co-authored-by: Adriane Boyd <adrianeboyd@gmail.com>

* Implement Doc.from_json(): changing message for E1037.

Co-authored-by: Adriane Boyd <adrianeboyd@gmail.com>

* Implement Doc.from_json(): adjusted website docs and docstring of Doc.from_json().

* Update spacy/tests/doc/test_json_doc_conversion.py

* Implement Doc.from_json(): docstring update.

Co-authored-by: Adriane Boyd <adrianeboyd@gmail.com>

* Implement Doc.from_json(): docstring update.

Co-authored-by: Adriane Boyd <adrianeboyd@gmail.com>

* Implement Doc.from_json(): website docs update.

Co-authored-by: Adriane Boyd <adrianeboyd@gmail.com>

* Implement Doc.from_json(): docstring formatting.

Co-authored-by: Adriane Boyd <adrianeboyd@gmail.com>

* Implement Doc.from_json(): docstring formatting.

Co-authored-by: Adriane Boyd <adrianeboyd@gmail.com>

* Implement Doc.from_json(): fixing Doc reference in website docs.

Co-authored-by: Adriane Boyd <adrianeboyd@gmail.com>

* Implement Doc.from_json(): reformatted website/docs/api/doc.md.

* Implement Doc.from_json(): bumped IDs of new errors to avoid merge conflicts.

* Implement Doc.from_json(): fixing bug in tests.

Co-authored-by: Adriane Boyd <adrianeboyd@gmail.com>

* Implement Doc.from_json(): fix setting of sentence starts for docs without DEP.

* Implement Doc.from_json(): add check for valid char spans when manually setting sentence boundaries. Refactor sentence boundary setting slightly. Move error message for lack of support for partial token annotations to errors.py.

* Implement Doc.from_json(): simplify token sentence start manipulation.

Co-authored-by: Adriane Boyd <adrianeboyd@gmail.com>

* Combine related error messages

* Update spacy/tests/doc/test_json_doc_conversion.py

Co-authored-by: Adriane Boyd <adrianeboyd@gmail.com>
2022-06-02 14:03:47 +02:00
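A minimal sketch of the `Doc.to_json()` / `Doc.from_json()` round trip described above; the text is a made-up example, and `validate=True` turns on the optional schema validation mentioned in the commit.

```python
import spacy
from spacy.tokens import Doc

nlp = spacy.blank("en")
doc = nlp("She ate the pizza")
data = doc.to_json()  # plain, JSON-serializable dict

# Rebuild an equivalent Doc from the JSON representation.
restored = Doc(nlp.vocab).from_json(data, validate=True)
assert restored.text == doc.text
```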
Adriane Boyd
a322d6d5f2
Add SpanRuler component (#9880)
* Add SpanRuler component

Add a `SpanRuler` component similar to `EntityRuler` that saves a list
of matched spans to `Doc.spans[spans_key]`. The matches from the token
and phrase matchers are deduplicated and sorted before assignment but
are not otherwise filtered.

* Update spacy/pipeline/span_ruler.py

Co-authored-by: Sofie Van Landeghem <svlandeg@users.noreply.github.com>

* Fix cast

* Add self.key property

* Use number of patterns as length

* Remove patterns kwarg from init

* Update spacy/tests/pipeline/test_span_ruler.py

Co-authored-by: Sofie Van Landeghem <svlandeg@users.noreply.github.com>

* Add options for spans filter and setting to ents

* Add `spans_filter` option as a registered function
* Make `spans_key` optional and if `None`, set to `doc.ents` instead of
`doc.spans[spans_key]`.

* Update and generalize tests

* Add test for setting doc.ents, fix key property type

* Fix typing

* Allow independent doc.spans and doc.ents

* If `spans_key` is set, set `doc.spans` with `spans_filter`.
* If `annotate_ents` is set, set `doc.ents` with `ents_filter`.
  * Use `util.filter_spans` by default as `ents_filter`.
  * Use a custom warning if the filter does not work for `doc.ents`.

* Enable use of SpanC.id in Span

* Support id in SpanRuler as Span.id

* Update types

* `id` can only be provided as string (already by `PatternType`
definition)

* Update all uses of Span.id/ent_id in Doc

* Rename Span id kwarg to span_id

* Update types and docs

* Add ents filter to mimic EntityRuler overwrite_ents

* Refactor `ents_filter` to take `entities, spans` args for more
  filtering options
* Give registered filters more descriptive names
* Allow registered `filter_spans` filter
  (`spacy.first_longest_spans_filter.v1`) to take any number of
  `Iterable[Span]` objects as args so it can be used for spans filter
  or ents filter

* Implement future entity ruler as span ruler

Implement a compatible `entity_ruler` as `future_entity_ruler` using
`SpanRuler` as the underlying component:
* Add `sort_key` and `sort_reverse` to allow the sorting behavior to be
  customized. (Necessary for the same sorting/filtering as in
  `EntityRuler`.)
* Implement `overwrite_overlapping_ents_filter` and
  `preserve_existing_ents_filter` to support
  `EntityRuler.overwrite_ents` settings.
* Add `remove_by_id` to support `EntityRuler.remove` functionality.
* Refactor `entity_ruler` tests to parametrize all tests to test both
  `entity_ruler` and `future_entity_ruler`
* Implement `SpanRuler.token_patterns` and `SpanRuler.phrase_patterns`
  properties.

Additional changes:

* Move all config settings to top-level attributes to avoid duplicating
  settings in the config vs. `span_ruler/cfg`. (Also avoids a lot of
  casting.)

* Format

* Fix filter make method name

* Refactor to use same error for removing by label or ID

* Also provide existing spans to spans filter

* Support ids property

* Remove token_patterns and phrase_patterns

* Update docstrings

* Add span ruler docs

* Fix types

* Apply suggestions from code review

Co-authored-by: Sofie Van Landeghem <svlandeg@users.noreply.github.com>

* Move sorting into filters

* Check for all tokens in seen tokens in entity ruler filters

* Remove registered sort key

* Set Token.ent_id in a backwards-compatible way in Doc.set_ents

* Remove sort options from API docs

* Update docstrings

* Rename entity ruler filters

* Fix and parameterize scoring

* Add id to Span API docs

* Fix typo in API docs

* Include explicit labeled=True for scorer

Co-authored-by: Sofie Van Landeghem <svlandeg@users.noreply.github.com>
2022-06-02 13:12:53 +02:00
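A minimal sketch of the new `SpanRuler` component; the labels and patterns are illustrative, and `"ruler"` is the component's default `spans_key`.

```python
import spacy

nlp = spacy.blank("en")
ruler = nlp.add_pipe("span_ruler")
ruler.add_patterns([
    {"label": "ORG", "pattern": "Explosion"},          # phrase pattern
    {"label": "GPE", "pattern": [{"LOWER": "berlin"}]}, # token pattern
])
doc = nlp("Explosion is based in Berlin")
# Matches land in doc.spans["ruler"] rather than doc.ents by default.
print([(span.text, span.label_) for span in doc.spans["ruler"]])
```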
Sofie Van Landeghem
f7507c2327
fix typo + CI slow testing (#10835)
* fix typo

* one more typo
2022-06-02 00:10:16 +02:00
Madeesh Kannan
f8b769e7bf
Add test_slow_gpu explosion-bot command (#10858) 2022-06-01 09:37:30 +02:00
Paul O'Leary McCann
dca2e8c644
Minor NEL type fixes (#10860)
* Fix TODO about typing

Fix was simple: just request an array2f.

* Add type ignore

Maxout has a more restrictive type than the residual layer expects (only
Floats2d vs any Floats).

* Various cleanup

This moves a lot of lines around but doesn't change any functionality.
Details:

1. use `continue` to reduce indentation
2. move sentence doc building inside conditional since it's otherwise
   unused
3. reduces some temporary assignments
2022-06-01 00:41:28 +02:00
Philip Vollet
56d4055d96
Merge pull request #10880 from richardpaulhudson/website/holmes-update
Update Holmes entry in universe.json
2022-05-30 19:19:37 +02:00
richardpaulhudson
d4218366c5 Update Holmes entry in universe.json 2022-05-30 18:05:26 +02:00
Daniël de Kok
09a5f03dd7
Merge pull request #10849 from danieldk/simplify-gpu-check
Simplify GPU check
2022-05-30 16:35:10 +02:00
Max Tarlov
709d6d9114
Update documentation for displacy style kwargs (#10841)
* Update docs for displacy style kwargs

Added "span" to the accepted values for the style kwarg in the displacy.serve and displacy.render top-level functions. These styles are new as of SpaCy 3.3, so I added the "new" tag for that option only

* restored alpha ordering
2022-05-30 09:11:55 +02:00
Peter Baumgartner
bf95f0a1dd
add doc cleaner to menu (#10862) 2022-05-30 08:51:19 +02:00
Paul O'Leary McCann
87adb32576
Merge pull request #10867 from freddyheppell/patch-1
Fix misspelt keyword in StringStore docs example
2022-05-30 14:22:19 +09:00
Freddy Heppell
322c5a3ac4
Fix misspelt keyword in StringStore example 2022-05-29 10:49:19 +01:00
Daniël de Kok
85dd2b6c04
Parser: use C saxpy/sgemm provided by the Ops implementation (#10773)
* Parser: use C saxpy/sgemm provided by the Ops implementation

This is a backport of https://github.com/explosion/spaCy/pull/10747
from the parser refactor branch. It eliminates the explicit calls
to BLIS, instead using the saxpy/sgemm provided by the Ops
implementation.

This allows us to use Accelerate in the parser on M1 Macs (with
an updated thinc-apple-ops).

Performance of the de_core_news_lg pipe:

BLIS 0.7.0, no thinc-apple-ops:  6385 WPS
BLIS 0.7.0, thinc-apple-ops:    36455 WPS
BLIS 0.9.0, no thinc-apple-ops: 19188 WPS
BLIS 0.9.0, thinc-apple-ops:    36682 WPS
This PR, thinc-apple-ops:       38726 WPS

Performance of the de_core_news_lg pipe (only tok2vec -> parser):

BLIS 0.7.0, no thinc-apple-ops: 13907 WPS
BLIS 0.7.0, thinc-apple-ops:    73172 WPS
BLIS 0.9.0, no thinc-apple-ops: 41576 WPS
BLIS 0.9.0, thinc-apple-ops:    72569 WPS
This PR, thinc-apple-ops:       87061 WPS

* Require thinc >=8.1.0,<8.2.0

* Lower thinc lowerbound to 8.1.0.dev0

* Use best CPU ops for CBLAS when the parser model is on the GPU

* Fix another unguarded cblas() call

* Fix: use ops as a shorthand for self.model.ops

Co-authored-by: Madeesh Kannan <shadeMe@users.noreply.github.com>

Co-authored-by: Madeesh Kannan <shadeMe@users.noreply.github.com>
2022-05-27 11:20:52 +02:00
github-actions[bot]
6172af8158
Auto-format code with black (#10857)
Co-authored-by: explosion-bot <explosion-bot@users.noreply.github.com>
2022-05-27 10:54:54 +02:00
Daniël de Kok
7c6a97559d Simplify GPU check
This change removes `thinc.util.has_cupy` from the GPU presence check.
Currently `gpu_is_available` already implies `has_cupy`.  We also want
to show this warning in the future when a machine has a non-CuPy GPU.
2022-05-25 14:06:45 +02:00
kadarakos
f6a4b80c0b
Better errors for has_annotation and Matcher (#10830)
* Show input argument instead of None

* catch invalid attr early

* moved error message from code to errors.py

* Update spacy/errors.py

Co-authored-by: Adriane Boyd <adrianeboyd@gmail.com>

* Update spacy/errors.py

* update E153 and E154

Co-authored-by: Adriane Boyd <adrianeboyd@gmail.com>
2022-05-25 11:12:29 +02:00
Sofie Van Landeghem
83ed1f391b
Remove NBSP's across tables in the docs (#10842) 2022-05-25 09:48:39 +02:00
Richard Hudson
32954c3bcb
Fix issues for Mypy 0.950 and Pydantic 1.9.0 (#10786)
* Make changes to typing

* Correction

* Format with black

* Corrections based on review

* Bumped Thinc dependency version

* Bumped blis requirement

* Correction for older Python versions

* Update spacy/ml/models/textcat.py

Co-authored-by: Daniël de Kok <me@github.danieldk.eu>

* Corrections based on review feedback

* Readd deleted docstring line

Co-authored-by: Daniël de Kok <me@github.danieldk.eu>
2022-05-25 09:33:54 +02:00
Paul O'Leary McCann
6be09bbd07
Fix Entity Linker with tokenization mismatches (fix #9575) (#10457)
* Add failing test

* Partial fix for issue

This kind of works. The issue with token length mismatches is gone. The
problem is that when you get empty lists of encodings to compare, it
fails because the sizes are not the same, even though they're both zero:
(0, 3) vs (0,). Not sure why that happens...

* Short circuit on empties

* Remove spurious check

The check here isn't needed now that the short circuit is fixed.

* Update spacy/tests/pipeline/test_entity_linker.py

Co-authored-by: Sofie Van Landeghem <svlandeg@users.noreply.github.com>

* Use "eg", not "example"

Co-authored-by: Sofie Van Landeghem <svlandeg@users.noreply.github.com>
2022-05-23 20:42:26 +02:00
Lj Miranda
1d34aa2b3d
Add spacy-span-analyzer to debug data (#10668)
* Rename to spans_key for consistency

* Implement spans length in debug data

* Implement how span bounds and spans are obtained

In this commit, I implemented how span boundaries (the tokens) around a
given span and spans are obtained. I've put them in the compile_gold()
function so that it's accessible later on. I will do the actual
computation of the span and boundary distinctiveness in the main
function above.

* Compute for p_spans and p_bounds

* Add computation for SD and BD

* Fix mypy issues

* Add weighted average computation

* Fix compile_gold conditional logic

* Add test for frequency distribution computation

* Add tests for kl-divergence computation

* Fix weighted average computation

* Make tables more compact by rounding them

* Add more descriptive checks for spans

* Modularize span computation methods

In this commit, I added the _get_span_characteristics and
_print_span_characteristics functions so that they can be reusable
anywhere.

* Remove unnecessary arguments and make fxs more compact

* Update a few parameter arguments

* Add tests for print_span and get_span methods

* Update API to talk about span characteristics in brief

* Add better reporting of spans_length

* Add test for span length reporting

* Update formatting of span length report

Removed '' to indicate that it's not a string, then
sort the n-grams by their length, not by their frequency.

* Apply suggestions from code review

Co-authored-by: Adriane Boyd <adrianeboyd@gmail.com>

* Show all frequency distribution when -V

In this commit, I displayed the full frequency distribution of the
span lengths when --verbose is passed. To make things simpler, I
rewrote some of the formatter functions so that I can call them
whenever.

Another notable change is that instead of showing percentages as
Integers, I showed them as floats (max 2-decimal places). I did this
because it looks weird when it displays (0%).

* Update logic on how total is computed

The way the 90% thresholding is computed now is that we keep
adding the percentages until we reach >= 90%. I also updated the wording
and used the term "At least" to denote that >= 90% of your spans have
these distributions.

* Fix display when showing the threshold percentage

* Apply suggestions from code review

Co-authored-by: Adriane Boyd <adrianeboyd@gmail.com>

* Add better phrasing for span information

* Update spacy/cli/debug_data.py

Co-authored-by: Adriane Boyd <adrianeboyd@gmail.com>

* Add minor edits for whitespaces etc.

Co-authored-by: Adriane Boyd <adrianeboyd@gmail.com>

Co-authored-by: Adriane Boyd <adrianeboyd@gmail.com>
2022-05-23 19:06:38 +02:00
Peter Baumgartner
7ce3460b23
add floret to static vectors docs (#10833) 2022-05-23 09:16:31 +02:00
kadarakos
a3814ee739
oov confusion fix (#10828) 2022-05-23 09:15:51 +02:00
Madeesh Kannan
4fb1809c72
Disable weekly GPU/slow tests on forks (#10831) 2022-05-20 15:46:30 +02:00
Adriane Boyd
a82ec56aae
Remove cuda extras for non-linux arm in install widget (#10796)
* Remove cuda extras for non-linux arm platforms in install widget
* Extend cuda versions install widget
* Update GPU install docs to clarify cuda
2022-05-20 09:57:41 +02:00
Paul O'Leary McCann
46982cf694
Add glossary entry for root (#10821)
* Add glossary entry for root

There was already one but it was lower case, maybe that should be
removed?

* remove lowercase root

On reflection, that was probably just a mistake.

* Add lowercase root back

It's harmless to leave it there.
2022-05-20 09:56:32 +02:00
Raphael Mitsch
357be2614e
Fuzz tokenizer.explain: draft for fuzzy tests. (#10771)
* Fuzz tokenizer.explain: draft for fuzzy tests.

* Fuzz tokenizer.explain: xignoring tokenizer.explain() tests. Removed deadline modification. Removed LANGUAGES_WITHOUT_TOKENIZERS.

* Fuzz tokenizer.explain: changed tokenizer initialization to avoid failures in Azure runs.

* Fuzz tokenizer.explain: type hint for tokenizer in test.

Co-authored-by: Adriane Boyd <adrianeboyd@gmail.com>

Co-authored-by: Adriane Boyd <adrianeboyd@gmail.com>
2022-05-17 10:23:16 +02:00
github-actions[bot]
99aeaf9bd3
Auto-format code with black (#10795)
Co-authored-by: explosion-bot <explosion-bot@users.noreply.github.com>
2022-05-13 19:02:08 +02:00
kadarakos
fd36469900
bugfix parser labels (#10797) 2022-05-13 11:41:32 +02:00
Paul O'Leary McCann
7634a488fe
Merge pull request #10793 from Schero1994/feature/update
Update spaCy Universe: spacytextblob (code example)
2022-05-13 12:07:37 +09:00
schaeran
f5952c0851 update spaCy Universe: spacytextblob (code example) 2022-05-12 18:23:00 +02:00
Patrick Düggelin
cb06309ed8
Fix PhraseMatcher remove overlapping terms (#10734)
* Add regression test for issue 10643

* Improve overlapping terms testcase

* Fix removing overlapping terms in phrase matcher (#10643)
2022-05-12 12:23:52 +02:00
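A small sketch of the scenario the fix covers: overlapping phrase patterns registered under different keys, with one key removed afterwards (the patterns are illustrative).

```python
import spacy
from spacy.matcher import PhraseMatcher

nlp = spacy.blank("en")
matcher = PhraseMatcher(nlp.vocab)
matcher.add("CITY", [nlp("New York"), nlp("New York City")])
matcher.add("STATE", [nlp("New York")])

# Removing one key must not break matching for overlapping terms that
# still belong to another key.
matcher.remove("CITY")
doc = nlp("She moved to New York City")
print([(nlp.vocab.strings[match_id], doc[start:end].text)
       for match_id, start, end in matcher(doc)])
```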
Raphael Mitsch
6f9e2ca81f
Ignore overrides for pipe names in config argument (#10779)
* Pipe name override in config: added check with warning, added removal of name override from config, extended tests.

* Pipe name override in config: added pytest UserWarning.

Co-authored-by: Adriane Boyd <adrianeboyd@gmail.com>

Co-authored-by: Adriane Boyd <adrianeboyd@gmail.com>
2022-05-12 11:46:08 +02:00
Adriane Boyd
b65d652881
Override SpanGroups.setdefault to provide default SpanGroup (#10772)
* Fix mistake in SpanGroup API docs

* Restrict SpanGroups.setdefault to SpanGroup only

* Refactor to support default span iterable
2022-05-12 10:06:25 +02:00
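A small sketch of the `SpanGroups.setdefault` behaviour described above, assuming the default may be given as an iterable of spans; the key "sc" and the spans are illustrative.

```python
import spacy

nlp = spacy.blank("en")
doc = nlp("Berlin is nice")

# Creates the group from the default spans if the key is missing.
group = doc.spans.setdefault("sc", default=[doc[0:1]])
print(len(group))  # 1

# If the key already exists, the stored group is returned unchanged.
existing = doc.spans.setdefault("sc", default=[])
print(len(existing))  # still 1
```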
Richard Hudson
d524f6415f
Add documentation tip about overriding variables (#10780) 2022-05-11 10:15:32 +02:00
Raphael Mitsch
2904359685
Allow assets to be optional in spacy project (#10714)
* Allow assets to be optional in spacy project: draft for optional flag/download_all options.

* Allow assets to be optional in spacy project: added OPTIONAL_DEFAULT reflecting default asset optionality.

* Allow assets to be optional in spacy project: renamed --all to --extra.

* Allow assets to be optional in spacy project: included optional flag in project config test.

* Allow assets to be optional in spacy project: added documentation.

* Allow assets to be optional in spacy project: fixing deprecated --all reference.

Co-authored-by: Adriane Boyd <adrianeboyd@gmail.com>

* Allow assets to be optional in spacy project: fixed project_assets() docstring.

* Allow assets to be optional in spacy project: adjusted wording in justification of optional assets.

Co-authored-by: Sofie Van Landeghem <svlandeg@users.noreply.github.com>

* Allow assets to be optional in spacy project: switched to  as keyword in project.yml. Updated docs.

* Allow assets to be optional in spacy project: updated comment.

* Allow assets to be optional in spacy project: replacing 'optional' with 'extra' in output.

Co-authored-by: Sofie Van Landeghem <svlandeg@users.noreply.github.com>

* Allow assets to be optional in spacy project: replacing 'optional' with 'extra' in docstring..

Co-authored-by: Sofie Van Landeghem <svlandeg@users.noreply.github.com>

* Allow assets to be optional in spacy project: replacing 'optional' with 'extra' in test..

Co-authored-by: Sofie Van Landeghem <svlandeg@users.noreply.github.com>

* Allow assets to be optional in spacy project: replacing 'optional' with 'extra' in test.

Co-authored-by: Sofie Van Landeghem <svlandeg@users.noreply.github.com>

* Allow assets to be optional in spacy project: renamed OPTIONAL_DEFAULT to EXTRA_DEFAULT.

Co-authored-by: Adriane Boyd <adrianeboyd@gmail.com>
Co-authored-by: Sofie Van Landeghem <svlandeg@users.noreply.github.com>
2022-05-10 10:40:11 +02:00
Sofie Van Landeghem
6d17168c4d
Merge pull request #10777 from adrianeboyd/chore/update-develop-v3.4
Update develop for v3.4
2022-05-10 09:43:24 +02:00
Sofie Van Landeghem
1543558d08
Add test for old architectures (#10751)
* add v1 and v2 tests for tok2vec architectures

* textcat architectures are not "layers"

* test older textcat architectures

* test older parser architecture
2022-05-10 08:24:42 +02:00
Madeesh Kannan
733114bdd9
training.md: Fix typos (#10775) 2022-05-09 19:44:14 +02:00
Raphael Mitsch
e626df959f
Document different ways to create a pipeline (#10762)
* Document different ways to create a pipeline: moved up/slightly modified paragraph on pipeline creation.

* Document different ways to create a pipeline: changed Finnish to Ukrainian in example for language without trained pipeline.

* Document different ways to create a pipeline: added explanation of blank pipeline.

* Document different ways to create a pipeline: exchanged Ukrainian with Yoruba.
2022-05-06 15:40:59 +02:00
Richard Hudson
c32e1a0079
Updated Coreferee Universe entry (#10763) 2022-05-06 13:21:39 +02:00
Luca Dorigo
0a92d5644e
Fix StringStore.__getitem__ return type depending on parameter types (#10741)
* Fix StringStore.__getitem__ return type depending on parameter types

Small fix using  `@overload` so that `StringStore.__getitem__` returns an `int` when given a `str` or `bytes` and a `str` when given an `int`.

* Update spacy/strings.pyi

Co-authored-by: Adriane Boyd <adrianeboyd@gmail.com>
2022-05-03 17:57:07 +02:00
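For reference, a minimal sketch of the two lookup directions on `StringStore` that the overloaded typing describes.

```python
import spacy

nlp = spacy.blank("en")
strings = nlp.vocab.strings

key = strings.add("coffee")        # uint64 hash
assert strings["coffee"] == key    # str -> int
assert strings[key] == "coffee"    # int -> str
```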
Sofie Van Landeghem
e03b9f8095
Small doc typos (#10750)
* fix typos

* formatting
2022-05-03 13:55:27 +02:00
Raphael Mitsch
f5390e278a
Refactor error messages to remove hardcoded strings (#10729)
* Use custom error msg instead of hardcoded string: replaced remaining hardcoded error message strings.

* Use custom error msg instead of hardcoded string: fixing faulty Errors import.
2022-05-02 13:38:46 +02:00
Madeesh Kannan
0a503ce5e0
Remove vestigial debug print statement in walk_head_nodes (#10718)
* `graph`: Remove vestigial debug print statement in `walk_head_nodes`

* Revert whitespace changes

* Remove more debug print statements
2022-05-02 13:36:35 +02:00
vincent d warmerdam
f3de976513
Update universe.json to Include spaCy video #6 (#10723)
* Update universe.json

I noticed that episode 6 was missing, so I added it.

* Update universe.json

* Update universe.json
2022-05-02 13:35:14 +02:00
Adriane Boyd
497a708c71
Docs for v3.3 (#10628)
* Temporarily disable CI tests

* Start v3.3 website updates

* Add trainable lemmatizer to pipeline design

* Fix Vectors.most_similar

* Add floret vector info to pipeline design

* Add Lower and Upper Sorbian

* Add span to sidebar

* Work on release notes

* Copy from release notes

* Update pipeline design graphic

* Upgrading note about Doc.from_docs

* Add tables and details

* Update website/docs/models/index.md

Co-authored-by: Sofie Van Landeghem <svlandeg@users.noreply.github.com>

* Fix da lemma acc

* Add minimal intro, various updates

* Round lemma acc

* Add section on floret / word lists

* Add new pipelines table, minor edits

* Fix displacy spans example title

* Clarify adding non-trainable lemmatizer

* Update adding-languages URLs

* Revert "Temporarily disable CI tests"

This reverts commit 1dee505920.

* Spell out words/sec

Co-authored-by: Sofie Van Landeghem <svlandeg@users.noreply.github.com>
2022-04-28 14:09:35 +02:00
Adriane Boyd
10377fb945
Set version to v3.3.0 (#10614)
* Set version to v3.3.0

* Revert "Temporarily skip tests that require models/compat"

This reverts commit e422101e00.
2022-04-28 13:07:49 +02:00
Raphael Mitsch
3579507ba1
Bumped black to 22.3.0 due to a fix for https://github.com/psf/black/issues/2964. (#10715) 2022-04-27 14:49:24 +02:00
harmbuisman
c066fb8a4e
#10672: fixes displacy output for manual unsorted entities (#10673)
* #10672: fixes displacy output for manual unsorted entities

* #10672: removed unused import

* fix prettier formatting

Co-authored-by: Harm Buisman <h.buisman@iknl.nl>
Co-authored-by: Sofie Van Landeghem <svlandeg@users.noreply.github.com>
2022-04-27 09:51:58 +02:00
Sofie Van Landeghem
b3717ba53a
removing print statements from the test suite (#10712) 2022-04-27 09:14:25 +02:00
Adriane Boyd
455f089c9b
Support exclude in Doc.from_docs (#10689)
* Support exclude in Doc.from_docs

* Update API docs

* Add new tag to docs
2022-04-25 18:19:03 +02:00
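A minimal sketch of the new `exclude` argument on `Doc.from_docs`; the excluded fields shown here ("spans", "user_data") are examples of what can be skipped while merging.

```python
import spacy
from spacy.tokens import Doc

nlp = spacy.blank("en")
docs = [nlp("First doc."), nlp("Second doc.")]
docs[0].spans["sc"] = [docs[0][0:1]]

# Merge the docs but leave span groups and user data behind.
merged = Doc.from_docs(docs, exclude=["spans", "user_data"])
print(len(merged.spans))  # 0
```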
Mike
3b208197c3
Fixed example for spacy_syllables (#10705)
There was a typo in the example for the spacy_syllables project.
2022-04-25 16:40:54 +02:00
github-actions[bot]
e07500369c
Auto-format code with black (#10687)
Co-authored-by: explosion-bot <explosion-bot@users.noreply.github.com>
2022-04-22 11:24:53 +02:00
Sofie Van Landeghem
2c2dbb844c Syntax for a branch from a PR 2022-04-22 09:45:49 +02:00
Ryn Daniels
29afbdb91e
add readme for explosion-bot (#10677) 2022-04-20 09:52:34 +02:00
Richard Hudson
4b227f4861
Merge pull request #10669 from mgrojo/develop
Fix some issues in Spanish stop-word list and examples
2022-04-19 09:37:34 +02:00
mgr
3d50b1a989 Fix some issues in Spanish examples
- Spelling: nationalities in lowercase, accent.
- Incorrect verb composition
- Untranslated word
2022-04-18 22:12:57 +02:00
mgr
2a2654c756 Remove significant or not very frequent words from stop word list [es]
The list of stop words for Spanish contained many inadequate words, see:

https://github.com/explosion/spaCy/issues/3052#issuecomment-1100760100

Removed words:
- verb forms of 'trabajar' (work) and intentar (try)
- words related to 'empleo' (employment)
- incorrect words: ampleamos, arribaabajo, soyos, paìs
- miscellaneous words due to being too significant or too infrequent:
  actualmente, aproximadamente, antaño, cosas, ejemplo, horas, general,
  pais, principalmente, raras

Added other stop words for completion:
- Spanish one-letter words
- numbers up to twelve

Some reformatting to 79 columns.

When in doubt, the English and German lists have been consulted as good
examples.
2022-04-18 22:04:02 +02:00
Madeesh Kannan
aa6780eb27
Matcher: Remove superfluous GIL-acquiring check in get_is_final (#10659)
* `Matcher`: Remove superfluous GIL-acquiring check in `get_is_final`

This check incurred a significant performance penalty due to implicit interactions between the GIL and Cython ref-counting code.

* `Matcher`: Inline `PatternStateC` accessors
2022-04-18 12:59:34 +02:00
Duy Ngo
229ecaf0ea
Add numbers and definitions (#10665) 2022-04-18 12:58:32 +02:00
Schero1994
d622883a42
Adding and updating content in the spacy universe (#10493)
* signing contributor agreement

* adding new content to the spaCy universe

* updating outdated example codes

* resolving issues for the PR

* resolve review for klayers

* remove contributor-agreement file from the PR

* Update code example of spaCySentiWS

Co-authored-by: Adriane Boyd <adrianeboyd@gmail.com>

* Update spacy-sentiws code example

Co-authored-by: schaeran <schaeran1994@gmail.com>
Co-authored-by: schaeran <schaeran@explosion.ai>
Co-authored-by: Adriane Boyd <adrianeboyd@gmail.com>
2022-04-15 15:36:54 +02:00
Joachim Fainberg
4e1716223c
displaCy: Avoid increasing levels for identical arcs (#10639)
* Test for arc levels for identical arcs

Also moves the test in order with the other numbered tests.

* displaCy: filter identical arcs

Avoid increased levels due to identical arcs by first
filtering any identical arcs.

* Sort keys before filtering

Manual entry with keys out of order would previously become
different tuples and therefore not filtered correctly.

Co-authored-by: Joachim Fainberg <joachimfainberg@Joachims-MBP.lan>
2022-04-14 16:48:00 +02:00
Philip Vollet
e63a5d4888
Update newsletter id (#10655) 2022-04-14 13:34:01 +02:00
fonfonx
028cbad05e
Add feminine form of word "one" in French (#10653)
* Add French number

* Add fonfonx.md

* Add feminine ordinal words for French
2022-04-14 10:21:27 +02:00
Schero1994
caf8528af7
Batch #1 | spaCy universe cleanup (#10642)
* delete universe object: wmd-relax

* delete universe object: spaCy.jl

* delete universe object: saber

* delete universe object: languagecrunch

* delete universe object: gracyql

* delete universe object: ExcelCy

* delete universe object: EpiTator

Co-authored-by: schaeran <schaeran1994@gmail.com>
2022-04-14 10:08:19 +02:00
single-fingal
4228f3c757
Fix a few minor bugs in the SpanGroup API web docs (#10650)
* Fix a few minor bugs in the SpanGroup API web docs

* Update SpanGroup docs examples to have Spans reflect intended "errors"
2022-04-14 09:59:48 +02:00
Adriane Boyd
64602d997d
Require srsly v2.4.3+ due to buffer overflow vulnerability (#10651) 2022-04-13 11:41:40 +02:00
Richard Hudson
75fbbcdc18
Display warning when spacy.explain() finds no term (#10645)
* Display warning when spacy.explain() finds no term

* Updated warning message text
2022-04-12 10:48:28 +02:00
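A small sketch of the behaviour: a known term returns its description, while an unknown term now emits a warning and returns None. The example terms are illustrative.

```python
import spacy

print(spacy.explain("GPE"))          # e.g. "Countries, cities, states"
print(spacy.explain("not-a-term"))   # None, and a warning is emitted
```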
David Berenstein
d4196a62f1
added crosslingual coreference to spacy universe without additional commits (#10580)
* added crosslingual coreference to spacy universe

* Updated example to introduce batching example.

Co-authored-by: David Berenstein <david.berenstein@pandoraintelligence.com>
2022-04-08 08:23:58 +02:00
Madeesh Kannan
9ba3e1cb2f
Basic tests for the Tamil language (#10629)
* Add basic tests for Tamil (ta)

* Add comment
Remove superfluous condition

* Remove superfluous call to `pipe`
Instantiate new tokenizer for special case
2022-04-07 14:47:37 +02:00
Lj Miranda
02dafa3a84
Add debug diff command in spaCy CLI (#10502)
* Add initial design for diff command

For now, the diffing process looks like this:
- The default config is created based from some values in the user
config (e.g. which pipeline components were used, the lang, etc.)
- The user must supply manually if it was optimized for acc/efficiency
and if pretraining was involved.

* Make diff command structure similar to siblings

* Include gpu as a user option for CLI

* Make variables more explicit

* Fix type declaration for optimize enum

* Improve docstrings for diff CLI

* Add debug-diff to website API docs

* Switch position of configs so that user config is modded

* Add markdown flag for debug diff

This commit adds a --markdown (--md) flag that allows easier
copy-pasting to Github issues. Please note that this commit is dependent
on an unreleased version of wasabi (for the time being).

For posterity, the related PR is found here: https://github.com/ines/wasabi/pull/20

* Bump version of wasabi to 0.9.1

So that we can use the add_symbols parameter.

* Apply suggestions from code review

Co-authored-by: Ines Montani <ines@ines.io>

* Update docs based on code review suggestions

Co-authored-by: Sofie Van Landeghem <svlandeg@users.noreply.github.com>

* Change command name from diff -> diff-config

* Clarify when options are relevant or not

* Rerun prettier on cli.md

Co-authored-by: Ines Montani <ines@ines.io>
Co-authored-by: Sofie Van Landeghem <svlandeg@users.noreply.github.com>
2022-04-07 10:48:45 +02:00
Joachim Fainberg
b91255a454
displacy: avoid overlapping arcs in manual mode (#10534)
* Added test for overlapping arcs

* Provide distinct levels to overlapping arcs

* Update return type hint for get_levels

* Improved formatting spacy/displacy/render.py

Co-authored-by: Ines Montani <ines@ines.io>

Co-authored-by: Joachim Fainberg <joachimfainberg@Joachims-MacBook-Pro.local>
Co-authored-by: Ines Montani <ines@ines.io>
2022-04-05 09:08:02 +02:00
Adriane Boyd
0d0153db63
Update default spans_key to sc in API docs (#10616) 2022-04-04 18:09:15 +02:00
Bram Vanroy
f966bf6a15
Update to spacy_conll in universe (#10617)
* update to spacy_conll

* Update website/meta/universe.json

Co-authored-by: Sofie Van Landeghem <svlandeg@users.noreply.github.com>

* Update website/meta/universe.json

Co-authored-by: Sofie Van Landeghem <svlandeg@users.noreply.github.com>

Co-authored-by: Sofie Van Landeghem <svlandeg@users.noreply.github.com>
2022-04-04 17:57:52 +02:00
Madeesh Kannan
cfd9217bae
Update link to flake8 config (#10620)
* Update link to flake8 config

* Run prettier
2022-04-04 17:35:37 +02:00
Adriane Boyd
849bef2de6
Merge pull request #10596 from adrianeboyd/chore/v3.3.0.dev0
Set version to v3.3.0.dev0
2022-04-04 09:18:07 +02:00
Adriane Boyd
e422101e00 Temporarily skip tests that require models/compat 2022-04-01 11:09:28 +02:00
Adriane Boyd
ca54de27bb
Support more internal methods for SpanGroup (#10476)
* Added new convenience cython functions to SpanGroup to avoid unnecessary allocation/deallocation of objects

* Replaced sorting in has_overlap with C++ for efficiency. Also, added a test for has_overlap

* Added a method to efficiently merge SpanGroups

* Added __delitem__, __add__ and __iadd__. Also, allowed to pass span lists to merge function. Replaced extend() body with call to merge

* Renamed merge to concat and added missing things to documentation

* Added operator+ and operator += in the documentation

* Added a test for Doc deallocation

* Update spacy/tokens/span_group.pyx

* Updated SpanGroup tests to use new span list comparison function rather than assert_span_list_equal, eliminating the need to have a separate assert_not_equal function

* Fixed typos in SpanGroup documentation

Co-authored-by: Sofie Van Landeghem <svlandeg@users.noreply.github.com>

* Minor changes requested by Sofie: rearranged import statements. Added new=3.2.1 tag to SpanGroup.__setitem__ documentation

* SpanGroup: moved repetitive list index check/adjustment in a separate function

* Turn off formatting that hurts readability spacy/tests/doc/test_span_group.py

Co-authored-by: Sofie Van Landeghem <svlandeg@users.noreply.github.com>

* Remove formatting that hurts readability spacy/tests/doc/test_span_group.py

Co-authored-by: Sofie Van Landeghem <svlandeg@users.noreply.github.com>

* Turn off formatting that hurts readability in spacy/tests/doc/test_span_group.py

Co-authored-by: Sofie Van Landeghem <svlandeg@users.noreply.github.com>

* Support more internal methods for SpanGroup

Add support for:

* `__setitem__`
* `__delitem__`
* `__iadd__`: for `SpanGroup` or `Iterable[Span]`
* `__add__`: for `SpanGroup` only

Adapted from #9698 with the scope limited to the magic methods.

* Use v3.3 as new version in docs

* Add new tag to SpanGroup.copy in API docs

* Remove duplicate import

* Apply suggestions from code review

Co-authored-by: Sofie Van Landeghem <svlandeg@users.noreply.github.com>

* Remaining suggestions and formatting

Co-authored-by: nrodnova <nrodnova@hotmail.com>
Co-authored-by: Sofie Van Landeghem <svlandeg@users.noreply.github.com>
Co-authored-by: Natalia Rodnova <4512370+nrodnova@users.noreply.github.com>
2022-04-01 09:56:26 +02:00
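A minimal sketch of the newly supported `SpanGroup` operations (`__setitem__`, `__delitem__`, `__add__`, `__iadd__`); the group names and spans are illustrative.

```python
import spacy
from spacy.tokens import SpanGroup

nlp = spacy.blank("en")
doc = nlp("Berlin and Munich are in Germany")
cities = SpanGroup(doc, name="cities", spans=[doc[0:1], doc[2:3]])
places = SpanGroup(doc, name="places", spans=[doc[5:6]])

combined = cities + places   # __add__: SpanGroup + SpanGroup
cities += [doc[5:6]]         # __iadd__: also accepts an iterable of Spans
cities[0] = doc[5:6]         # __setitem__: replace a span
del cities[1]                # __delitem__
print(len(combined), len(cities))  # 3 2
```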
Adriane Boyd
d56b1400d2 Set version to v3.3.0.dev0 2022-04-01 09:54:52 +02:00
Daniël de Kok
c90dd6f265
Alignment: use a simplified ragged type for performance (#10319)
* Alignment: use a simplified ragged type for performance

This introduces the AlignmentArray type, which is a simplified version
of Ragged that performs better on the simple(r) indexing performed for
alignment.

* AlignmentArray: raise an error when using unsupported index

* AlignmentArray: move error messages to Errors

* AlignmentArray: remove simplified ... with simplifications

* AlignmentArray: fix typo that broke a[n:n] indexing
2022-04-01 09:02:06 +02:00
Adriane Boyd
03762b4b92
Add spancat, trainable_lemmatizer to quickstart (#10524)
* Add `SPACY` and `IS_SPACE` as default `tok2vec` features
2022-04-01 09:01:04 +02:00
Adriane Boyd
7d1edc0c25
Merge pull request #10593 from adrianeboyd/chore/undo-click-pin
Revert "Add click pin to avoid typer issues (#10573)"
2022-04-01 09:00:38 +02:00
Adriane Boyd
e3ccc1973b
Provide debug data info for floret vectors (#10592) 2022-03-31 15:11:32 +02:00
Adriane Boyd
88933ca878 Revert "Add click pin to avoid typer issues (#10573)"
This reverts commit 9966e08f32.
2022-03-31 14:16:21 +02:00
Yunus Atahan
36d3af3013
Fixed typo in Turkish lang. (#10582)
* added failing test case for the issue.

* Fixed typo.

* fixed typo in test.

* added the corrected typo word into test_tr_lex_attrs_capitals as a param. Test passes. Also tried and confirmed that the test fails after fixing the typo in the test case I wrote. Deleted the test case for the typo.

Co-authored-by: Yunus Atahan <yunus.atahan@trmotor.local>
2022-03-30 13:16:08 +02:00
Adriane Boyd
f98b41c390
Add vector deduplication (#10551)
* Add vector deduplication

* Add `Vocab.deduplicate_vectors()`
* Always run deduplication in `spacy init vectors`
* Clean up a few vector-related error messages and docs examples

* Always unique with numpy

* Fix types
2022-03-30 08:54:23 +02:00
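A minimal sketch of the new `Vocab.deduplicate_vectors()` helper; the tiny vector table is a made-up example.

```python
import numpy
import spacy

nlp = spacy.blank("en")
nlp.vocab.set_vector("apple", numpy.asarray([1.0, 0.0], dtype="f"))
nlp.vocab.set_vector("pear", numpy.asarray([1.0, 0.0], dtype="f"))  # duplicate row
nlp.vocab.set_vector("car", numpy.asarray([0.0, 1.0], dtype="f"))

print(nlp.vocab.vectors.shape)  # (3, 2)
nlp.vocab.deduplicate_vectors()
print(nlp.vocab.vectors.shape)  # duplicate rows collapsed, e.g. (2, 2)
```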
Adriane Boyd
9966e08f32
Add click pin to avoid typer issues (#10573) 2022-03-29 11:15:24 +02:00
Adriane Boyd
85778dfcf4
Add edit tree lemmatizer (#10231)
* Add edit tree lemmatizer

Co-authored-by: Daniël de Kok <me@danieldk.eu>

* Hide edit tree lemmatizer labels

* Use relative imports

* Switch to single quotes in error message

* Type annotation fixes

Co-authored-by: Sofie Van Landeghem <svlandeg@users.noreply.github.com>

* Reformat edit_tree_lemmatizer with black

* EditTreeLemmatizer.predict: take Iterable

Co-authored-by: Sofie Van Landeghem <svlandeg@users.noreply.github.com>

* Validate edit trees during deserialization

This change also changes the serialized representation. Rather than
mirroring the deep C structure, we use a simple flat union of the match
and substitution node types.

* Move edit_trees to _edit_tree_internals

* Fix invalid edit tree format error message

* edit_tree_lemmatizer: remove outdated TODO comment

* Rename factory name to trainable_lemmatizer

* Ignore type instead of casting truths to List[Union[Ints1d, Floats2d, List[int], List[str]]] for thinc v8.0.14

* Switch to Tagger.v2

* Add documentation for EditTreeLemmatizer

* docs: Fix 3.2 -> 3.3 somewhere

* trainable_lemmatizer documentation fixes

* docs: EditTreeLemmatizer is in edit_tree_lemmatizer.py

Co-authored-by: Daniël de Kok <me@danieldk.eu>
Co-authored-by: Daniël de Kok <me@github.danieldk.eu>
Co-authored-by: Sofie Van Landeghem <svlandeg@users.noreply.github.com>
2022-03-28 11:13:50 +02:00
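A minimal sketch of wiring up the new component by its factory name, `trainable_lemmatizer`; it still needs annotated training data before it can predict lemmas.

```python
import spacy

nlp = spacy.blank("de")
# The edit tree lemmatizer is registered under "trainable_lemmatizer" and
# learns its lemmatization rules (edit trees) from training data.
lemmatizer = nlp.add_pipe("trainable_lemmatizer")
print(nlp.pipe_names)  # ['trainable_lemmatizer']
```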
github-actions[bot]
98ed941c39
Auto-format code with black (#10550)
Co-authored-by: explosion-bot <explosion-bot@users.noreply.github.com>
2022-03-28 10:44:46 +02:00
Luka Dragar
53674bb745
Examples for Slovene (#10539)
* Added examples for Slovene

* Update spacy/lang/sl/examples.py

Co-authored-by: Adriane Boyd <adrianeboyd@gmail.com>

* Corrected a typo in one of the sentences

Co-authored-by: Luka Dragar <D20124481@mytudublin.ie>
Co-authored-by: Adriane Boyd <adrianeboyd@gmail.com>
2022-03-28 10:44:10 +02:00
Adriane Boyd
d5666fd12d
Add NORM to Matcher feature in docs (#10560) 2022-03-28 10:35:47 +02:00
Adriane Boyd
33eb63b157 Remove now-built-in jinja2>=3.1.0 extensions 2022-03-25 14:29:33 +01:00
David Berenstein
ed2ac34a8a
added Concise Concepts to spaCy universe (#10499)
* Update universe.json

added classy-classification to Spacy universe

* Update universe.json

added classy-classification to the spacy universe resources

* Update universe.json

corrected a small typo in json

* Update website/meta/universe.json

Co-authored-by: Sofie Van Landeghem <svlandeg@users.noreply.github.com>

* Update website/meta/universe.json

Co-authored-by: Sofie Van Landeghem <svlandeg@users.noreply.github.com>

* Update website/meta/universe.json

Co-authored-by: Sofie Van Landeghem <svlandeg@users.noreply.github.com>

* Update universe.json

processed merge feedback

* Update universe.json

* updated information for Classy Classification

Made a more comprehensible and accessible description of Classy Classification based on feedback from Philip Vollet to prepare for sharing.

* added note about examples

* corrected for wrong formatting changes

* Update website/meta/universe.json with small typo correction

Co-authored-by: Adriane Boyd <adrianeboyd@gmail.com>

* resolved another typo

* Update website/meta/universe.json

Co-authored-by: Sofie Van Landeghem <svlandeg@users.noreply.github.com>

* added Concise Concepts package to spaCy universe.

* updated example code Concise Concepts

* updated description for Concise Concepts

* updated PR with more visually appealing examples

SO to koaning for the suggestions.

* corrected small json typos in concise concepts

Co-authored-by: Sofie Van Landeghem <svlandeg@users.noreply.github.com>
Co-authored-by: Adriane Boyd <adrianeboyd@gmail.com>
2022-03-24 18:00:12 +01:00
Adriane Boyd
3711af74e5
Add tokenizer option to allow Matcher handling for all rules (#10452)
* Add tokenizer option to allow Matcher handling for all rules

Add tokenizer option `with_faster_rules_heuristics` that determines
whether the special cases applied by the internal `Matcher` are filtered
by whether they contain affixes or space. If `True` (default), the rules
are filtered to prioritize speed over rare edge cases. If `False`, all
rules are included in the final `Matcher`-based pass over the doc.

* Reset all caches when reloading special cases

* Revert "Reset all caches when reloading special cases"

This reverts commit 4ef6bd171d.

* Initialize max_length properly

* Add new tag to API docs

* Rename to faster heuristics
2022-03-24 13:21:32 +01:00
Adriane Boyd
31a5d99efa
Maintain support for empty DocBin span groups (#10538) 2022-03-24 11:51:07 +01:00
Daniël de Kok
2ff197603e
matcher: remove an undefined behavior (#10537)
Indexing into a zero-length std::vector is an undefined behavior.
2022-03-24 11:48:22 +01:00
Adriane Boyd
d85117f88c
Stream large assets on download (#10521)
Stream large assets on download rather than reading the whole file at
once and potentially running into `urllib3` limits on single read sizes.
2022-03-24 11:47:05 +01:00
Adriane Boyd
e908a67829
Handle unknown tags in KoreanTokenizer tag map (#10536) 2022-03-24 11:25:36 +01:00
Adriane Boyd
c17980e535
Save vectors as little endian, load with Ops.asarray (#10201)
* Save vectors as little endian, load with Ops.asarray

* Always save vector data as little endian
* Always run `Vectors.to_ops` when vector data is loaded so that
  `Ops.asarray` can be used to load the data correctly for the current
  ops.

* Update spacy/vectors.pyx

Co-authored-by: Sofie Van Landeghem <svlandeg@users.noreply.github.com>

* Update spacy/vectors.pyx

Co-authored-by: Sofie Van Landeghem <svlandeg@users.noreply.github.com>

Co-authored-by: Sofie Van Landeghem <svlandeg@users.noreply.github.com>
2022-03-21 14:24:46 +01:00
Basile Dura
107bab56b5
docs: add EDS-NLP to spaCy universe (#10489)
* docs: add EDS-NLP to spaCy universe

* fix: remove "standalone" tag for EDS-NLP

Co-authored-by: Basile Dura <basile.dura-ext@aphp.fr>
2022-03-21 11:03:39 +01:00
github-actions[bot]
bf1cf77a5b
Auto-format code with black (#10518)
Co-authored-by: explosion-bot <explosion-bot@users.noreply.github.com>
2022-03-21 09:21:24 +01:00
Adriane Boyd
04f3f414d1
Update pytest to forbid ==7.1.0, allow >=7.1.1 (#10519) 2022-03-18 13:43:54 +01:00
Lj Miranda
0b02dc4c57
Fix mixed-up parameters for spacy-conll (#10516) 2022-03-18 08:56:21 +01:00
Grey Murav
3ff5a6a5c0
Extend list of _num_words (#10468) 2022-03-16 18:25:42 +01:00
Lj Miranda
a79cd3542b
Add displacy support for overlapping Spans (#10332)
* Fix docstring for EntityRenderer

* Add warning in displacy if doc.spans are empty

* Implement parse_spans converter

One notable change here is that the default spans_key is sc, and
it's set by the user through the options.

* Implement SpanRenderer

Here, I implemented a SpanRenderer that looks similar to the
EntityRenderer except for some templates.  The spans_key, by default, is
set to sc, but can be configured in the options (see parse_spans). The
way I rendered these spans is per-token, i.e., I first check if each
token (1) belongs to a given span type and (2) is a starting token of a
given span type. Once I have this information, I render them into the
markup.

* Fix mypy issues on typing

* Add tests for displacy spans support

* Update colors from RGB to hex

Co-authored-by: Ines Montani <ines@ines.io>

* Remove unnecessary CSS properties

* Add documentation for website

* Remove unnecessary scripts

* Update wording on the documentation

Co-authored-by: Sofie Van Landeghem <svlandeg@users.noreply.github.com>

* Put typing dependency on top of file

* Put back z-index so that spans overlap properly

* Make warning more explicit for spans_key

Co-authored-by: Ines Montani <ines@ines.io>
Co-authored-by: Sofie Van Landeghem <svlandeg@users.noreply.github.com>
2022-03-16 18:14:34 +01:00
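A minimal sketch of the new "span" style; the overlapping spans and the `spans_key` option ("sc", the default mentioned above) are illustrative.

```python
import spacy
from spacy import displacy
from spacy.tokens import Span

nlp = spacy.blank("en")
doc = nlp("Welcome to the Bank of China")
doc.spans["sc"] = [
    Span(doc, 3, 6, label="ORG"),  # "Bank of China"
    Span(doc, 5, 6, label="GPE"),  # "China", overlaps with the ORG span
]
# Renders per-token span markup; use displacy.serve(...) for a live view.
html = displacy.render(doc, style="span", options={"spans_key": "sc"})
```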
David Berenstein
e021dc6279
Updated explanation for classy classification (#10484)
* Update universe.json

added classy-classification to Spacy universe

* Update universe.json

added classy-classification to the spacy universe resources

* Update universe.json

corrected a small typo in json

* Update website/meta/universe.json

Co-authored-by: Sofie Van Landeghem <svlandeg@users.noreply.github.com>

* Update website/meta/universe.json

Co-authored-by: Sofie Van Landeghem <svlandeg@users.noreply.github.com>

* Update website/meta/universe.json

Co-authored-by: Sofie Van Landeghem <svlandeg@users.noreply.github.com>

* Update universe.json

processed merge feedback

* Update universe.json

* updated information for Classy Classification

Made a more comprehensible and accessible description of Classy Classification based on feedback from Philip Vollet to prepare for sharing.

* added note about examples

* corrected for wrong formatting changes

* Update website/meta/universe.json with small typo correction

Co-authored-by: Adriane Boyd <adrianeboyd@gmail.com>

* resolved another typo

* Update website/meta/universe.json

Co-authored-by: Sofie Van Landeghem <svlandeg@users.noreply.github.com>

Co-authored-by: Sofie Van Landeghem <svlandeg@users.noreply.github.com>
Co-authored-by: Adriane Boyd <adrianeboyd@gmail.com>
2022-03-15 16:42:33 +01:00
Daniël de Kok
e5debc68e4
Tagger: use unnormalized probabilities for inference (#10197)
* Tagger: use unnormalized probabilities for inference

Using unnormalized softmax avoids use of the relatively expensive exp function,
which can significantly speed up non-transformer models (e.g. I got a speedup
of 27% on a German tagging + parsing pipeline).

* Add spacy.Tagger.v2 with configurable normalization

Normalization of probabilities is disabled by default to improve
performance.

* Update documentation, models, and tests to spacy.Tagger.v2

* Move Tagger.v1 to spacy-legacy

* docs/architectures: run prettier

* Unnormalized softmax is now a Softmax_v2 option

* Require thinc 8.0.14 and spacy-legacy 3.0.9
2022-03-15 14:15:31 +01:00
Adriane Boyd
e8357923ec
Various install docs updates (#10487)
* Simplify quickstart source install to use only editable pip install

* Update pytorch install instructions to more recent versions
2022-03-15 11:12:50 +01:00
vincent d warmerdam
610001e8c7
Update universe.json (#10490)
The project moved away from Rasa and into my personal GitHub account.
2022-03-15 11:12:04 +01:00
Adriane Boyd
0dc454ba95
Update docs for Vocab.get_vector (#10486)
* Update docs for Vocab.get_vector

* Clarify description of 0-vector dimensions
2022-03-15 09:10:47 +01:00
Edward
2eef47dd26
Save span candidates produced by spancat suggesters (#10413)
* Add save_candidates attribute

* Change spancat api

* Add unit test

* reimplement method to produce a list of doc

* Add method to docs

* Add new version tag

* Add intended use to docstring

* prettier formatting
2022-03-14 16:46:58 +01:00
Edward
b68bf43f5b
Add spans to doc.to_json (#10073)
* Add spans to to_json

* adjustments to_json

* Change docstring

* change doc key naming

* Update spacy/tokens/doc.pyx

Co-authored-by: Adriane Boyd <adrianeboyd@gmail.com>

Co-authored-by: Adriane Boyd <adrianeboyd@gmail.com>
2022-03-14 15:47:57 +01:00
Sofie Van Landeghem
23bc93d3d2
limit pytest to <7.1 (#10488)
* limit pytest to <7.1

* 7.1.0
2022-03-14 15:17:22 +01:00
Lj Miranda
6af6c2e86c
Add a note to the dev docs on mypy (#10485) 2022-03-14 09:41:31 +01:00
github-actions[bot]
1bbf232074
Auto-format code with black (#10479)
* Auto-format code with black

* Update spacy/lang/hsb/lex_attrs.py

Co-authored-by: explosion-bot <explosion-bot@users.noreply.github.com>
Co-authored-by: Adriane Boyd <adrianeboyd@gmail.com>
2022-03-11 12:20:23 +01:00
Adriane Boyd
297dd82c86
Fix initial special cases for Tokenizer.explain (#10460)
Add the missing initial check for special cases to `Tokenizer.explain`
to align with `Tokenizer._tokenize_affixes`.
2022-03-11 10:50:47 +01:00
Peter Baumgartner
01ec6349ea
Add path.mkdir to custom component examples of to_disk (#10348)
* add `path.mkdir` to examples

* add ensure_path + mkdir

* update highlights
2022-03-08 16:04:10 +01:00
Adriane Boyd
191e8b31fa
Remove English tokenizer exception May. (#10463) 2022-03-08 14:28:46 +01:00
Adriane Boyd
60520d8669
Fix types in API docs for moves in parser and ner (#10464) 2022-03-08 13:51:11 +01:00
Adriane Boyd
b2bbefd0b5
Add Finnish, Korean, and Swedish models and Korean support notes (#10355)
* Add Finnish, Korean, and Swedish models to website

* Add Korean language support notes
2022-03-07 17:03:45 +01:00
jnphilipp
5ca0dbae76
Add Lower Sorbian support. (#10431)
* Add support basic support for lower sorbian.

* Add some test for dsb.

* Update spacy/lang/dsb/examples.py

Co-authored-by: Sofie Van Landeghem <svlandeg@users.noreply.github.com>
2022-03-07 16:57:14 +01:00
Paul O'Leary McCann
61ba5450ff
Fix get_matching_ents (#10451)
* Fix get_matching_ents

Not sure what happened here - the code prior to this commit simply does
not work. It's already covered by entity linker tests, which were
succeeding in the NEL PR, but couldn't possibly succeed on master.

* Fix test

Test was indented inside another test and so doesn't seem to have been
running properly.
2022-03-07 16:56:57 +01:00
jnphilipp
7ed7908716
Add Upper Sorbian support. (#10432)
* Add support basic support for upper sorbian.

* Add tokenizer exceptions and tests.

* Update spacy/lang/hsb/examples.py

Co-authored-by: Sofie Van Landeghem <svlandeg@users.noreply.github.com>
2022-03-07 16:20:39 +01:00
David Berenstein
a6d5824e5f
added classy-classification package to spacy universe (#10393)
* Update universe.json

added classy-classification to Spacy universe

* Update universe.json

added classy-classification to the spacy universe resources

* Update universe.json

corrected a small typo in json

* Update website/meta/universe.json

Co-authored-by: Sofie Van Landeghem <svlandeg@users.noreply.github.com>

* Update website/meta/universe.json

Co-authored-by: Sofie Van Landeghem <svlandeg@users.noreply.github.com>

* Update website/meta/universe.json

Co-authored-by: Sofie Van Landeghem <svlandeg@users.noreply.github.com>

* Update universe.json

processed merge feedback

* Update universe.json

Co-authored-by: Sofie Van Landeghem <svlandeg@users.noreply.github.com>
2022-03-07 12:47:26 +01:00
Paul O'Leary McCann
6f4f57f317
Update Issue Templates (#10446)
* Remove mention of python 3.10 wheels

These were released a while ago, just forgot to remove this notice.

* Add note about Discussions
2022-03-07 10:41:03 +01:00
Sofie Van Landeghem
d89dac4066
hook up meta in load_model_from_config (#10400) 2022-03-04 11:07:45 +01:00
Paul O'Leary McCann
91acc3ea75
Fix entity linker batching (#9669)
* Partial fix of entity linker batching

* Add import

* Better name

* Add `use_gold_ents` option, docs

* Change to v2, create stub v1, update docs etc.

* Fix error type

Honestly no idea what the right type to use here is.
ConfigValidationError seems wrong. Maybe a NotImplementedError?

* Make mypy happy

* Add hacky fix for init issue

* Add legacy pipeline entity linker

* Fix references to class name

* Add __init__.py for legacy

* Attempted fix for loss issue

* Remove placeholder V1

* formatting

* slightly more interesting train data

* Handle batches with no usable examples

This adds a test for batches that have docs but not entities, and a
check in the component that detects such cases and skips the update step
as though the batch were empty.

* Remove todo about data verification

Check for empty data was moved further up so this should be OK now - the
case in question shouldn't be possible.

* Fix gradient calculation

The model doesn't know which entities are not in the kb, so it generates
embeddings for the context of all of them.

However, the loss does know which entities aren't in the kb, and it
ignores them, as there's no sensible gradient.

This has the issue that the gradient will not be calculated for some of
the input embeddings, which causes a dimension mismatch in backprop.
That should have caused a clear error, but with numpyops it was causing
nans to happen, which is another problem that should be addressed
separately.

This commit changes the loss to give a zero gradient for entities not in
the kb.

* add failing test for v1 EL legacy architecture

* Add nasty but simple working check for legacy arch

* Clarify why init hack works the way it does

* Clarify use_gold_ents use case

* Fix use gold ents related handling

* Add tests for no gold ents and fix other tests

* Use aligned ents function (not working)

This doesn't actually work because the "aligned" ents are gold-only. But
if I have a different function that returns the intersection, *then*
this will work as desired.

* Use proper matching ent check

This changes the process when gold ents are not used so that the
intersection of ents in the pred and gold is used.

* Move get_matching_ents to Example

* Use model attribute to check for legacy arch

* Rename flag

* bump spacy-legacy to lower 3.0.9

Co-authored-by: svlandeg <svlandeg@github.com>
2022-03-04 09:17:36 +01:00
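For context, a minimal sketch of toggling the new option when adding the component; the knowledge base, candidate generator and training loop are omitted, so treat this as a configuration sketch rather than a working linker:

```python
import spacy

nlp = spacy.blank("en")
# use_gold_ents controls whether training uses entities copied from the gold
# docs or entities predicted by earlier components (intersected with the gold
# ents, as described in the commit above).
entity_linker = nlp.add_pipe("entity_linker", config={"use_gold_ents": False})
```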
Adriane Boyd
8e93fa8507
Fix Vectors.n_keys for floret vectors (#10394)
Fix `Vectors.n_keys` for floret vectors to match docstring description
and avoid W007 warnings in similarity methods.
2022-03-01 09:21:25 +01:00
Sofie Van Landeghem
3f68bbcfec
Clean up loggers docs (#10351)
* update docs to point to spacy-loggers docs

* remove unused error code
2022-02-25 16:29:12 +01:00
github-actions[bot]
d637b34e2f
Auto-format code with black (#10377)
Co-authored-by: explosion-bot <explosion-bot@users.noreply.github.com>
2022-02-25 10:00:21 +01:00
Sam Edwardes
5f568f7e41
Updated spaCy universe for spacytextblob (#10335)
* Updated spacytextblob in universe.json

* Fixed json

* Update website/meta/universe.json

Co-authored-by: Sofie Van Landeghem <svlandeg@users.noreply.github.com>

* Added spacy_version tag to spacytextblob

Co-authored-by: Sofie Van Landeghem <svlandeg@users.noreply.github.com>
2022-02-24 14:18:10 +09:00
Adriane Boyd
b16da378bb
Re-remove universe tests from test suite (#10357) 2022-02-23 21:08:56 +01:00
kadarakos
249b97184d
Bugfixes and test for rehearse (#10347)
* fixing argument order for rehearse

* rehearse test for ner and tagger

* rehearse bugfix

* added test for parser

* test for multilabel textcat

* rehearse fix

* remove debug line

* Update spacy/tests/training/test_rehearse.py

Co-authored-by: Sofie Van Landeghem <svlandeg@users.noreply.github.com>

* Update spacy/tests/training/test_rehearse.py

Co-authored-by: Sofie Van Landeghem <svlandeg@users.noreply.github.com>

Co-authored-by: Kádár Ákos <akos@onyx.uvt.nl>
Co-authored-by: Sofie Van Landeghem <svlandeg@users.noreply.github.com>
2022-02-23 16:10:05 +01:00
Adriane Boyd
b7ba7f78a2
Merge pull request #10344 from adrianeboyd/chore/v3.2-backport-10324
Fix Tok2Vec for empty batches (#10324)
2022-02-21 16:40:53 +01:00
Daniël de Kok
78a8bec4d0
Make core projectivization functions cdef nogil (#10241)
* Make core projectivization methods cdef nogil

While profiling the parser, I noticed that relatively a lot of time is
spent in projectivization. This change rewrites the functions in the
core loops as cdef nogil for efficiency.

In C++-land, we use vector in place of Python lists and absent heads
are represented as -1 in place of None.

* _heads_to_c: add assertion

Validation should be performed by the caller, but this assertion ensures that
we are not reading/writing out of bounds with incorrect input.
2022-02-21 15:02:21 +01:00
Adriane Boyd
cf5b46b63e Fix Tok2Vec for empty batches (#10324)
* Add test for tok2vec with vectors and empty docs

* Add shortcut for empty batch in Tok2Vec.predict

* Avoid types
2022-02-21 14:29:36 +01:00
Adriane Boyd
30030176ee
Update Korean defaults for Tokenizer (#10322)
Update Korean defaults for `Tokenizer` for tokenization following UD
Korean Kaist.
2022-02-21 10:26:19 +01:00
Adriane Boyd
f32ee2e533
Fix NER check in CoNLL-U converter (#10302)
* Fix NER check in CoNLL-U converter

Leave ents unset if no NER annotation is found in the MISC column.

* Revert to global rather than per-sentence NER check

* Update spacy/training/converters/conllu_to_docs.py

Co-authored-by: Sofie Van Landeghem <svlandeg@users.noreply.github.com>

Co-authored-by: Sofie Van Landeghem <svlandeg@users.noreply.github.com>
2022-02-21 10:24:52 +01:00
Peter Baumgartner
3358fb9bdd
Miscellaneous Minor SpanGroups/DocBin Improvements (#10250)
* MultiHashEmbed vector docs correction

* doc copy span test

* ignore empty lists in DocBin.span_groups

* serialized empty list const + SpanGroups.is_empty

* add conditional deserial on from_bytes

* clean up + reorganize

* rm test

* add constant as class attribute

* rename to _EMPTY_BYTES

* Update spacy/tests/doc/test_span.py

Co-authored-by: Sofie Van Landeghem <svlandeg@users.noreply.github.com>

Co-authored-by: Sofie Van Landeghem <svlandeg@users.noreply.github.com>
2022-02-21 10:24:15 +01:00
Adriane Boyd
f4c74764b8
Fix Tok2Vec for empty batches (#10324)
* Add test for tok2vec with vectors and empty docs

* Add shortcut for empty batch in Tok2Vec.predict

* Avoid types
2022-02-21 10:22:36 +01:00
github-actions[bot]
6de84c8757
Auto-format code with black (#10333)
Co-authored-by: explosion-bot <explosion-bot@users.noreply.github.com>
2022-02-21 09:15:42 +01:00
Adriane Boyd
28ba31e793
Add whitespace and combined augmenters (#10170)
Add whitespace augmenter that inserts a single whitespace token into a
doc containing annotation used in core trained pipelines.

Add a combined augmenter that handles lowercasing, orth variants and
whitespace augmentation.
2022-02-17 15:54:09 +01:00
Grey Murav
aa93b471a1
Extend list of stopwords for ru language (#10313) 2022-02-17 15:51:15 +01:00
Grey Murav
23f06dc37f
Extend list of numbers for ru language (#10280)
* Extended list of numbers for ru language

Extended list of numbers with all forms and cases including short forms, slang variants and roman numerals.

* Update lex_attrs.py

* Update 'like_num' function with percentages

Added support for numbers with percentages like 12%, 1.2%, etc. to the 'like_num' function.

* black formatting

Co-authored-by: thomashacker <EdwardSchmuhl@web.de>
2022-02-17 15:50:08 +01:00
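For reference, the extended lexical attribute can be probed directly; a small sketch, assuming the function is exposed as `like_num` in `spacy/lang/ru/lex_attrs.py` (the usual layout for spaCy language modules):

```python
from spacy.lang.ru.lex_attrs import like_num

# Percentages such as "12%" and "1.2%" are handled per the change above,
# alongside Russian number words in their various forms.
for text in ["12%", "1.2%", "три", "сорок"]:
    print(text, like_num(text))
```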
Grey Murav
a9756963e6
Extend list of abbreviations for ru language (#10282)
* Extend list of abbreviations for ru language

Extended the list of abbreviations for the ru language that may influence tokenization.

* black formatting

Co-authored-by: thomashacker <EdwardSchmuhl@web.de>
2022-02-17 15:48:50 +01:00
Adriane Boyd
da7520a83c
Delay loading of mecab in Korean tokenizer (#10295)
* Delay loading of mecab in Korean tokenizer

Delay loading of mecab until the tokenizer is called the first time so
that it's possible to initialize a blank `ko` pipeline without having
mecab installed, e.g. for use with `spacy init vectors`.

* Move mecab import back to __init__

Move mecab import back to __init__ to warn users at the same point as
before for missing python dependencies.
2022-02-17 11:35:34 +01:00
Sofie Van Landeghem
3854ab901f
Merge pull request #10312 from svlandeg/fix/drop-develop
Remove daily/weekly tests for develop branch
2022-02-16 16:28:22 +01:00
Sofie Van Landeghem
26eac22d3b
remove develop also from GPU tests 2022-02-16 15:44:05 +01:00
Sofie Van Landeghem
fef768ef74
remove develop (not an active branch anymore) 2022-02-16 15:43:36 +01:00
Sofie Van Landeghem
228aaa16b7
Merge pull request #10309 from svlandeg/copy/develop
Update master with latest from develop
2022-02-16 15:40:58 +01:00
Ryn Daniels
d30ee14ab3
Pass the matrix branch to the checkout action (#10304) 2022-02-16 15:39:42 +01:00
Sofie Van Landeghem
a16b14e591
Merge branch 'master' into copy/develop 2022-02-16 14:04:59 +01:00
Adriane Boyd
22066f4e0f
Also exclude workflows from non-PR CI runs (#10305) 2022-02-16 13:45:30 +01:00
Ryn Daniels
f6250015ab
Fix the datemath for reals (#10294)
* add debugging branch and quotes to daily slowtest action

* Apparently the quotes fixed it
2022-02-15 14:18:36 +01:00
Paul O'Leary McCann
23bd103d89 Add tmtoolkit setup steps 2022-02-14 15:17:25 +09:00
Markus Konrad
8818a44a39
add tmtoolkit package to spaCy universe (#10245) 2022-02-14 15:16:43 +09:00
github-actions[bot]
5adedb8587
Auto-format code with black (#10260)
Co-authored-by: explosion-bot <explosion-bot@users.noreply.github.com>
2022-02-11 14:23:01 +01:00
Adriane Boyd
9a06a210ec
Exclude github workflow edits from CI (#10261) 2022-02-11 14:22:43 +01:00
Adriane Boyd
bbaf41fb3b
Set version to v3.2.2 (#10262) 2022-02-11 11:45:26 +01:00
Edward
7961a0a959
Fix typo in errors (#10256) 2022-02-10 13:45:46 +01:00
Ryn Daniels
2d6cabb23c
Fix the date command and the matrix failure mode (#10254) 2022-02-10 12:06:30 +01:00
Peter Baumgartner
ee662ec381
Raise error in spacy package when model name is not a valid python identifier (#10192)
* MultiHashEmbed vector docs correction

* raise error for invalid identifier as model name

* more succinct error message

* update success message

* permitted package name + double underscore

* clarify package name error

* clarify underscore run message

* tweak language + simplify underscore run

* cleanup underscore run warning

* spacing correction

* Update spacy/tests/test_cli.py

Co-authored-by: Adriane Boyd <adrianeboyd@gmail.com>
2022-02-10 08:15:23 +01:00
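The underlying requirement is that a packaged pipeline name must be importable as a Python module. A rough, hypothetical illustration of that idea (not spaCy's exact implementation, which also handles the language prefix and double underscores mentioned above):

```python
def check_model_name(name: str) -> None:
    # A hyphenated name such as "my-cool-model" cannot be imported as a module
    if not name.isidentifier():
        raise ValueError(
            f"Model name '{name}' is not a valid Python identifier, so the "
            "packaged pipeline could not be imported after installation"
        )

check_model_name("en_core_web_sm")  # OK
check_model_name("my-cool-model")   # raises ValueError
```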
Ryn Daniels
3877f78ff9
fix the syntax for the slow/gpu test crons (#10244) 2022-02-09 11:21:20 +01:00
John Boy
10c77af83d
add textnets to spaCy universe (#10216)
https://github.com/jboynyc/textnets/issues/38
2022-02-09 15:04:26 +09:00
Ines Montani
7b883da9fd
Merge pull request #10239 from explosion/docs/spacy-tailored-pipelines [ci skip] 2022-02-08 18:04:01 +01:00
Ramon Ziai
6477dafac2
fix(phrasematcher.pyi): change type annotation of docs in add() to List[Doc] (#10235)
https://github.com/explosion/spaCy/issues/10234
2022-02-08 13:37:27 +01:00
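For context, the corrected annotation matches the usual call pattern, where `add()` takes a list of `Doc` objects as patterns:

```python
import spacy
from spacy.matcher import PhraseMatcher

nlp = spacy.blank("en")
matcher = PhraseMatcher(nlp.vocab)
# The second argument is a list of Doc patterns, i.e. List[Doc]
matcher.add("GREETING", [nlp("hello world"), nlp("good morning")])

doc = nlp("She said hello world to everyone.")
for match_id, start, end in matcher(doc):
    print(nlp.vocab.strings[match_id], doc[start:end].text)
```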
Ines Montani
f2c2b97e56 Add spaCy Tailored Pipelines 2022-02-08 11:46:42 +01:00
Adriane Boyd
a9ee5bff98
Support mixed case model package names (#10223) 2022-02-08 10:52:46 +01:00
Ryn Daniels
f939da0bfa
Add github actions for slow and gpu tests (#10225)
* Add github actions for slow and gpu tests

* change weekly GPU tests to also run slow tests, and change the time

* only run the tests if there were commits in the past day
2022-02-08 10:05:35 +01:00
Antti Ajanki
e9c26f2ee9
Add a noun chunker for Finnish (#10214)
with test cases
2022-02-08 08:44:11 +01:00
Sofie Van Landeghem
deb143fa70
Token sent attributes more consistent (#10164)
* remove duplicate line

* add sent start/end token attributes to the docs

* let has_annotation work with IS_SENT_END

* elif instead of if

* add has_annotation test for sent attributes

* fix typo

* remove duplicate is_sent_start entry in docs
2022-02-08 08:35:37 +01:00
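A short sketch of the attributes touched by this change, using the sentencizer to set boundaries; the `IS_SENT_END` spelling for `Doc.has_annotation` follows the commit message above:

```python
import spacy

nlp = spacy.blank("en")
nlp.add_pipe("sentencizer")
doc = nlp("First sentence. Second sentence.")

print(doc.has_annotation("SENT_START"))   # boundaries have been set
print(doc.has_annotation("IS_SENT_END"))  # now also supported, per the change above
print([(t.text, t.is_sent_start, t.is_sent_end) for t in doc])
```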
Peter Baumgartner
836f689cc7
YAML multiline tip for project.yml files (#10187)
* MultiHashEmbed vector docs correction

* add in multi-line tip

* convert to sidebar tip
2022-02-08 08:35:09 +01:00
Kenneth Enevoldsen
e4625d2fc3
Added Augmenty to universe (#10229)
* Added Augmenty to universe

* Update website/meta/universe.json

Co-authored-by: Adriane Boyd <adrianeboyd@gmail.com>

* Update website/meta/universe.json

Co-authored-by: Sofie Van Landeghem <svlandeg@users.noreply.github.com>

Co-authored-by: Adriane Boyd <adrianeboyd@gmail.com>
Co-authored-by: Sofie Van Landeghem <svlandeg@users.noreply.github.com>
2022-02-08 08:32:11 +01:00
Lj Miranda
42072f4468
Add spancat pipeline in spacy debug data (#10070)
* Setup debug data for spancat

* Add check for missing labels

* Add low-level data warning error

* Improve logic when compiling the gold train data

* Implement check for negative examples

* Remove breakpoint

* Remove ws_ents and missing entity checks

* Fix mypy errors

* Make variable name spans_key consistent

* Rename pipeline -> component for consistency

* Account for missing labels per spans_key

* Cleanup variable names for consistency

* Improve brevity of conditional statements

* Remove unused variables

* Include spans_key as an argument for _get_examples

* Add a conditional check for spans_key

* Update spancat debug data based on new API

- Instead of using _get_labels_from_model(), I'm now using
_get_labels_from_spancat() (cf. https://github.com/explosion/spaCy/pull/10079)
- The way information is displayed was also changed (text -> table)

* Rename model_labels to ensure mypy works

* Update wording on warning messages

Use "span type" instead of "entity type" in wording the warning messages.
This is because Spans aren't necessarily entities.

* Update component type into a Literal

This is to make it clear that the component parameter should only accept
either 'spancat' or 'ner'.

* Update checks to include actual model span_keys

Instead of looking at everything in the data, we only check those
span_keys from the actual spancat component. Instead of doing the filter
inside the for-loop, I just made another dictionary,
data_labels_in_component to hold this value.

* Update spacy/cli/debug_data.py

* Show label counts only when verbose is True

Co-authored-by: Adriane Boyd <adrianeboyd@gmail.com>

Co-authored-by: Adriane Boyd <adrianeboyd@gmail.com>
2022-02-07 15:03:36 +01:00
Lj Miranda
72fece712f
Add shuffle parameter to Corpus API docs (#10220)
* Add shuffle parameter to Corpus API docs

* Update website/docs/api/corpus.md

Co-authored-by: Adriane Boyd <adrianeboyd@gmail.com>

Co-authored-by: Adriane Boyd <adrianeboyd@gmail.com>
2022-02-07 14:55:53 +01:00
Adriane Boyd
63e1e4e8f6
Fix debug data check for ents that cross sents (#10188)
* Fix debug data check for ents that cross sents

* Use aligned sent starts to have the same indices for the NER and sent
start annotation
* Add a temporary, insufficient hack for the case where a
sentence-initial reference token is split into multiple tokens in the
predicted doc, since `Example.get_aligned("SENT_START")` currently
aligns `True` to all the split tokens.

* Improve test example

* Use Example.get_aligned_sent_starts

* Add test for crossing entity
2022-02-07 08:53:30 +01:00
github-actions[bot]
91ccacea12
Auto-format code with black (#10209)
* Auto-format code with black

* add black requirement to dev dependencies and pin to 22.x

* ignore black dependency for comparison with setup.cfg

Co-authored-by: explosion-bot <explosion-bot@users.noreply.github.com>
Co-authored-by: svlandeg <svlandeg@github.com>
2022-02-06 16:30:30 +01:00
Sofie Van Landeghem
bc12ecb870
Merge pull request #10185 from martinjack/master
Update Ukrainian tokenizer_exceptions
2022-02-06 16:30:03 +01:00
Sofie Van Landeghem
14513f82da
Merge pull request #10215 from explosion/master
update develop
2022-02-06 13:45:41 +01:00
Adriane Boyd
0668a449ba
Add Pipe.hide_labels to omit labels from pipeline meta (#10175) 2022-02-05 17:59:24 +01:00
Adriane Boyd
6f551043e4
Use paths.vectors for vectors in init config (#10146)
So that overriding `paths.vectors` works consistently in generated
configs, set vectors model in `paths.vectors` and always refer to this
path in `initialize.vectors`.
2022-02-04 21:09:48 +01:00
Adriane Boyd
fef896ce49
Allow Example to align whitespace annotation (#10189)
Remove exception for whitespace tokens in `Example.get_aligned` so that
annotation on whitespace tokens is aligned in the same way as for
non-whitespace tokens.
2022-02-03 17:01:53 +01:00
Kenneth Enevoldsen
a2f27ff83a
Added spacy-wrap to universe (#10168)
* Added spacy-wrap to universe 

Added spacy-wrap to the universe: a small package for wrapping fine-tuned huggingface transformers into a spaCy pipeline, following the same API as spacy-transformers. (Currently limited to classification models.)

* Update website/meta/universe.json

* Update website/meta/universe.json

* Update website/meta/universe.json

* Update website/meta/universe.json

Co-authored-by: Adriane Boyd <adrianeboyd@gmail.com>
2022-02-03 12:30:09 +01:00
Evgen Kytonin
fc3d446c71 Update Ukrainian tokenizer_exceptions 2022-02-01 13:24:00 +02:00
Lj Miranda
345e7f6bc4
Clarify Span.ents documentation (#10154)
* Clarify Span.ents documentation

Ref: #10135

Retain current behaviour. Span.ents will only include entities within
said span. You can't get tokens outside of the original span.

* Reword docstrings

Co-authored-by: Adriane Boyd <adrianeboyd@gmail.com>

* Update API docs in the website

Co-authored-by: Adriane Boyd <adrianeboyd@gmail.com>
2022-01-31 08:41:42 +01:00
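A small example of the documented behaviour: `Span.ents` only yields entities that fall fully inside the span.

```python
import spacy
from spacy.tokens import Span

nlp = spacy.blank("en")
doc = nlp("Alice met Bob in Paris last year")
doc.ents = [
    Span(doc, 0, 1, label="PERSON"),
    Span(doc, 2, 3, label="PERSON"),
    Span(doc, 4, 5, label="GPE"),
]
span = doc[0:3]  # "Alice met Bob"
print([e.text for e in span.ents])  # ['Alice', 'Bob'] -- Paris is outside the span
print([e.text for e in doc.ents])   # ['Alice', 'Bob', 'Paris']
```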
Marek Šuppa
f09c799a96
fix: Add missing comma to _eleven_to_beyond (#10166)
* This comma has most probably been left out unintentionally, leading to string concatenation between the two consecutive lines. This issue was found automatically using a regular expression.
2022-01-30 16:45:06 +09:00
Marek Šuppa
67ecac633f
fix: Add missing comma to examples.py (#10167)
* This comma has most probably been left out unintentionally, leading to string concatenation between the two consecutive lines. This issue was found automatically using a regular expression.
2022-01-30 16:43:29 +09:00
Adriane Boyd
4f441dfa24
Fix infix as prefix in Tokenizer.explain (#10140)
* Fix infix as prefix in Tokenizer.explain

Update `Tokenizer.explain` to align with the `Tokenizer` algorithm:

* skip infix matches that are prefixes in the current substring

* Update tokenizer pseudocode in docs
2022-01-28 17:00:54 +01:00
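For reference, `Tokenizer.explain` reports which rule produced each substring, which is what the fix above keeps in sync with the actual tokenizer:

```python
import spacy

nlp = spacy.blank("en")
# Each item is (rule name, substring), e.g. PREFIX, SUFFIX, INFIX, SPECIAL-n, TOKEN
for rule, substring in nlp.tokenizer.explain("(can't-do)"):
    print(rule, repr(substring))
```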
Eduard Zorita
30cf9d6a05
Update typing hints (#10109)
* Improve typing hints for Matcher.__call__

* Add typing hints for DependencyMatcher

* Add typing hints to underscore extensions

* Update Doc.tensor type (requires numpy 1.21)

* Fix typing hints for Language.component decorator

* Use generic np.ndarray type in Doc to avoid numpy version update

* Fix mypy errors

* Fix cyclic import caused by Underscore typing hints

* Use Literal type from spacy.compat

* Update matcher.pyi import format

Co-authored-by: Sofie Van Landeghem <svlandeg@users.noreply.github.com>

Co-authored-by: Sofie Van Landeghem <svlandeg@users.noreply.github.com>
2022-01-28 16:59:54 +01:00
Adriane Boyd
09734c56fc
Use simple suggester for spancat initialization (#10143)
Instead of the running the actual suggester, which may require
annotation from annotating components that is not necessarily present in
the reference docs, use the built-in 1-gram suggester.
2022-01-28 09:34:23 +01:00
github-actions[bot]
6d4db5c3c7
Auto-format code with black (#10106)
Co-authored-by: explosion-bot <explosion-bot@users.noreply.github.com>
2022-01-21 10:01:10 +01:00
Ines Montani
34ed93ef68
Support version tags in universe and add note about reporting (#10093)
* Support version tags in universe and add note about reporting

* Apply suggestions from code review

Co-authored-by: Sofie Van Landeghem <svlandeg@users.noreply.github.com>

Co-authored-by: Sofie Van Landeghem <svlandeg@users.noreply.github.com>
2022-01-20 23:21:26 +01:00
Peter Baumgartner
a69005037a
Docker Image for Website Dev (#10098)
* add docker instructions

* Update website/README.md

Co-authored-by: Sofie Van Landeghem <svlandeg@users.noreply.github.com>

* Update website/README.md

Co-authored-by: Sofie Van Landeghem <svlandeg@users.noreply.github.com>

* clarifying language on docker image

* fix markdown formatting

Co-authored-by: Sofie Van Landeghem <svlandeg@users.noreply.github.com>
2022-01-20 23:02:13 +01:00
pepemedigu
2abd380f2d
Update lex_attrs.py for Spanish with ordinals (#10038)
* Update lex_attrs.py

Add ordinal words

* black formatting

Co-authored-by: Sofie Van Landeghem <svlandeg@users.noreply.github.com>
2022-01-20 15:44:13 +01:00
Sofie Van Landeghem
d2afdfefc2
Merge pull request #10100 from svlandeg/feature/master_copy
Update develop with latest from master (2)
2022-01-20 14:29:50 +01:00
Sofie Van Landeghem
4465fe0306
Merge branch 'develop' into feature/master_copy 2022-01-20 13:36:17 +01:00
Duygu Altinok
47a2916801
Intify IOB (#9738)
* added iob to int

* added tests

* added iob strings

* added error

* blacked attrs

* Update spacy/tests/lang/test_attrs.py

Co-authored-by: Adriane Boyd <adrianeboyd@gmail.com>

* Update spacy/attrs.pyx

Co-authored-by: Adriane Boyd <adrianeboyd@gmail.com>

* added iob strings as global

* minor refinement with iob

* removed iob strings from token

* changed to uppercase

* cleaned and went back to master version

* imported iob from attrs

* Update and format errors

* Support and test both str and int ENT_IOB key

Co-authored-by: Adriane Boyd <adrianeboyd@gmail.com>
2022-01-20 13:19:38 +01:00
Duygu Altinok
268ddf8a06
Add ENT_IOB key to Matcher (#9649)
* added new field

* added exception for IOb strings

* minor refinement to schema

* removed field

* fixed typo

* imported numerical val

* changed the code bit

* cosmetics

* added test for matcher

* set ents of moc docs

* added invalid pattern

* minor update to documentation

* blacked matcher

* added pattern validation

* add IOB vals to schema

* changed into test

* mypy compat

* cleaned left over

* added compat import

* changed type

* added compat import

* changed literal a bit

* went back to old

* made explicit type

* Update spacy/schemas.py

Co-authored-by: Adriane Boyd <adrianeboyd@gmail.com>

* Update spacy/schemas.py

Co-authored-by: Adriane Boyd <adrianeboyd@gmail.com>

* Update spacy/schemas.py

Co-authored-by: Adriane Boyd <adrianeboyd@gmail.com>

Co-authored-by: Adriane Boyd <adrianeboyd@gmail.com>
2022-01-20 13:18:39 +01:00
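A minimal sketch of the new pattern key; per the commit messages above (and the follow-up that allows both string and int values), `ENT_IOB` can be given as a string such as "B", "I" or "O":

```python
import spacy
from spacy.matcher import Matcher
from spacy.tokens import Span

nlp = spacy.blank("en")
doc = nlp("Alice met Bob")
doc.ents = [Span(doc, 0, 1, label="PERSON"), Span(doc, 2, 3, label="PERSON")]

matcher = Matcher(nlp.vocab)
matcher.add("ENT_START", [[{"ENT_IOB": "B"}]])  # match tokens that begin an entity
print([doc[start:end].text for _, start, end in matcher(doc)])  # ['Alice', 'Bob']
```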
Daniël de Kok
6984f55277
Merge pull request #10048 from danieldk/index-arcs-by-head
Use constant-time head lookups in StateC::{L,R}
2022-01-20 13:06:14 +01:00
Paul O'Leary McCann
32bd3856b3
Rename FACILITY to FAC in color list (#10067)
This matches the English models
2022-01-20 12:00:28 +01:00
Adriane Boyd
a55212fca0
Determine labels by factory name in debug data (#10079)
* Determine labels by factory name in debug data

For all components, return labels for all components with the
corresponding factory name rather than for only the default name.

For `spancat`, return labels as a dict keyed by `spans_key`.

* Refactor for typing

* Add test

* Use assert instead of cast, removed unneeded arg

* Mark test as slow
2022-01-20 11:42:52 +01:00
Richard Hudson
e9c6314539
Bugfix for similarity return types (#10051) 2022-01-20 11:40:46 +01:00
Adriane Boyd
7d528e607c
Update quickstart install steps (#10092)
* For conda:
  * Use conda environment rather than venv
  * Install `spacy-transformers` as a conda package
* For pip:
  * Add quotes if extras are included
2022-01-20 10:53:40 +01:00
Paul O'Leary McCann
2ff53834bb
Add link to pattern file info in EntityRuler.initialize docs (#10091)
* Add link to pattern file info in EntityRuler.initialize docs

* Update website/docs/api/entityruler.md

Co-authored-by: Sofie Van Landeghem <svlandeg@users.noreply.github.com>
2022-01-19 10:45:11 +01:00
Daniël de Kok
50d2a2c930
Use fewer Vector internals (#9879)
* Use Vectors.shape rather than Vectors.data.shape

* Use Vectors.size rather than Vectors.data.size

* Add Vectors.to_ops to move data between different ops

* Add documentation for Vector.to_ops
2022-01-18 17:14:35 +01:00
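For context, a short sketch of the public accessors this change standardizes on, including the new `Vectors.to_ops`:

```python
import spacy
from thinc.api import get_current_ops

nlp = spacy.blank("en")
vectors = nlp.vocab.vectors
print(vectors.shape)  # (rows, dims), instead of reading vectors.data.shape
print(vectors.size)   # rows * dims, instead of vectors.data.size
vectors.to_ops(get_current_ops())  # move the underlying data to a given ops/backend
```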
Adriane Boyd
4dfd559e55
Fix spaces in Doc.from_docs for empty docs (#10052)
Fix spaces in `Doc.from_docs(ensure_whitespace=True)` for cases where a
doc ending in whitespace is followed by an empty doc.
2022-01-18 17:12:42 +01:00
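The case being fixed, sketched with a blank pipeline: an empty doc in the middle of the list should not break whitespace handling.

```python
import spacy
from spacy.tokens import Doc

nlp = spacy.blank("en")
docs = [nlp("First part "), nlp(""), nlp("second part")]
merged = Doc.from_docs(docs, ensure_whitespace=True)
print([t.text_with_ws for t in merged])
```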
Paul O'Leary McCann
c28e33637b
Mark flaky spancat test so it doesn't fail the build (#10075)
* Mark flaky spancat test so it doesn't fail the build

* Skip, don't run and ignore
2022-01-18 09:36:28 +01:00
Adriane Boyd
39f1b13e77
Update sudachipy extras (#10072)
By @polm, redone from #9917 after incorrect (reverted) rebase.

`sudachipy>=0.5.2` is needed for newer dictionaries. `sudachipy<0.6.0`
is kept for users who might still prefer the older version, in
particular to be able to compile it without rust.
2022-01-17 11:48:39 +01:00
Natalia Rodnova
47ea6704f1
Span richcmp fix (#9956)
* Corrected Span's __richcmp__ implementation to take end, label and kb_id in consideration

* Updated test

* Updated test

* Removed formatting from a test for readability sake

* Use same tuples for all comparisons

Co-authored-by: Adriane Boyd <adrianeboyd@gmail.com>
2022-01-17 11:17:49 +01:00
Adriane Boyd
add52935ff
Revert "Bump sudachipy version (#9917)" (#10071)
This reverts commit 58bdd8607b.
2022-01-17 10:38:37 +01:00
Tuomo Hiippala
6a8619dd73
Update the entry for Applied Language Technology in spaCy Universe (#10068)
* add entry for Applied Language Technology under "Courses"

Added the following entry into `universe.json`:

```
        {
            "type": "education",
            "id": "applt-course",
            "title": "Applied Language Technology",
            "slogan": "NLP for newcomers using spaCy and Stanza",
            "description": "These learning materials provide an introduction to applied language technology for audiences who are unfamiliar with language technology and programming. The learning materials assume no previous knowledge of the Python programming language.",
            "url": "https://applied-language-technology.readthedocs.io/",
            "image": "https://www.mv.helsinki.fi/home/thiippal/images/applt-preview.jpg",
            "thumb": "https://applied-language-technology.readthedocs.io/en/latest/_static/logo.png",
            "author": "Tuomo Hiippala",
            "author_links": {
                "twitter": "tuomo_h",
                "github": "thiippal",
                "website": "https://www.mv.helsinki.fi/home/thiippal/"
            },
            "category": ["courses"]
        },
```

* Update the entry for "Applied Language Technology"
2022-01-17 08:28:51 +01:00
Paul O'Leary McCann
58bdd8607b
Bump sudachipy version (#9917)
* Edited Slovenian stop words list (#9707)

* Noun chunks for Italian (#9662)

* added it vocab

* copied portuguese

* added possessive determiner

* added conjed Nps

* added nmoded Nps

* test misc

* more examples

* fixed typo

* fixed parenth

* fixed comma

* comma fix

* added syntax iters

* fix some index problems

* fixed index

* corrected heads for test case

* fixed test case

* fixed determiner gender

* cleaned left over

* added example with apostrophe

* French NP review (#9667)

* adapted from pt

* added basic tests

* added fr vocab

* fixed noun chunks

* more examples

* typo fix

* changed naming

* changed the naming

* typo fix

* Add Japanese kana characters to default exceptions (fix #9693) (#9742)

This includes the main kana, or phonetic characters, used in Japanese.

There are some supplemental kana blocks in Unicode outside the BMP that
could also be included, but because their actual use is rare I omitted
them for now, but maybe they should be added. The omitted blocks are:

- Kana Supplement
- Kana Extended (A and B)
- Small Kana Extension

* Remove NER words from stop words in Norwegian (#9820)

Default stop words in Norwegian bokmål (nb) in spaCy contain important entities, e.g. France, Germany, Russia, Sweden and USA, police district, important units of time, e.g. months and days of the week, and organisations.

Nobody expects their presence among the default stop words. There is a danger of users complying with the general recommendation of filtering out stop words, while being unaware of filtering out important entities from their data.

See explanation in https://github.com/explosion/spaCy/issues/3052#issuecomment-986756711 and comment https://github.com/explosion/spaCy/issues/3052#issuecomment-986951831

* Bump sudachipy version

* Update sudachipy versions

* Bump versions

Bumping to the most recent dictionary just to keep things current.
Bumping sudachipy to 5.2 because older versions don't support recent
dictionaries.

Co-authored-by: Sofie Van Landeghem <svlandeg@users.noreply.github.com>
Co-authored-by: Richard Hudson <richard@explosion.ai>
Co-authored-by: Duygu Altinok <duygu@explosion.ai>
Co-authored-by: Haakon Meland Eriksen <haakon.eriksen@far.no>
2022-01-17 08:16:22 +01:00
ColleterVi
a784b12eff
fix: new restcountries url (#10043)
Url extension "eu" and path "rest" are no longer available. Replacing them for a working url.
2022-01-13 20:25:06 +09:00
Daniël de Kok
63fa55089d Use constant-time head lookups in StateC::{L,R}
This change changes the type of left/right-arc collections from
vector[ArcC] to unordered_map[int, vector[Arc]], so that the arcs are
keyed by the head. This allows us to find all the left/right arcs for a
particular head in constant time in StateC::{L,R}.

Benchmarks with long docs (N is the number of text repetitions):

Before (using #10019):

    N  Time (s)

  400   3.2
  800   5.0
 1600   9.5
 3200  23.2
 6400  66.8
12800  220.0

After (this commit):

   N   Time (s)

  400   3.1
  800   4.3
 1600   6.7
 3200  12.0
 6400  22.0
12800  42.0

Related to #9858 and #10019.
2022-01-13 12:08:46 +01:00
Daniël de Kok
e8a047a8d4
Merge pull request #10045 from danieldk/merge-master
Update develop with master
2022-01-13 10:32:52 +01:00
Daniël de Kok
677c1a3507 Speed up the StateC::L feature function (#10019)
* Speed up the StateC::L feature function

This function gets the n-th most-recent left-arc with a particular head.
Before this change, StateC::L would construct a vector of all left-arcs
with the given head and then pick the n-th most recent from that vector.
Since the number of left-arcs strongly correlates with the doc length
and the feature is constructed for every transition, this can make
transition-parsing quadratic.

With this change StateC::L:

- Searches left-arcs backwards.
- Stops early when the n-th matching transition is found.
- Does not construct a vector (reducing memory pressure).

This change doesn't avoid the linear search when the transition that is
queried does not occur in the left-arcs. Regardless, performance is
improved quite a bit with very long docs:

Before:

   N  Time

 400   3.3
 800   5.4
1600  11.6
3200  30.7

After:

   N  Time

 400   3.2
 800   5.0
1600   9.5
3200  23.2

We can probably do better with more tailored data structures, but I
first wanted to make a low-impact PR.

Found while investigating #9858.

* StateC::L: simplify loop
2022-01-13 09:29:58 +01:00
Daniël de Kok
28299644fc
Speed up the StateC::L feature function (#10019)
* Speed up the StateC::L feature function

This function gets the n-th most-recent left-arc with a particular head.
Before this change, StateC::L would construct a vector of all left-arcs
with the given head and then pick the n-th most recent from that vector.
Since the number of left-arcs strongly correlates with the doc length
and the feature is constructed for every transition, this can make
transition-parsing quadratic.

With this change StateC::L:

- Searches left-arcs backwards.
- Stops early when the n-th matching transition is found.
- Does not construct a vector (reducing memory pressure).

This change doesn't avoid the linear search when the transition that is
queried does not occur in the left-arcs. Regardless, performance is
improved quite a bit with very long docs:

Before:

   N  Time

 400   3.3
 800   5.4
1600  11.6
3200  30.7

After:

   N  Time

 400   3.2
 800   5.0
1600   9.5
3200  23.2

We can probably do better with more tailored data structures, but I
first wanted to make a low-impact PR.

Found while investigating #9858.

* StateC::L: simplify loop
2022-01-13 09:03:55 +01:00
jsnfly
176a90edee
Fix textcat loss scaling (#9904) (#10002)
* add failing test for issue 9904

* remove division by batch size and summation before applying the mean

Co-authored-by: jonas <jsnfly@gmx.de>
2022-01-13 09:03:23 +01:00
Sofie Van Landeghem
d8a3012539
Merge pull request #10037 from explosion/master
Update develop with master
2022-01-12 12:29:23 +01:00
Ryn Daniels
057b8c64c0
Check for assets with size of 0 bytes (#10026)
* Check for assets with size of 0 bytes

* Update spacy/cli/project/assets.py

Co-authored-by: Sofie Van Landeghem <svlandeg@users.noreply.github.com>

Co-authored-by: Sofie Van Landeghem <svlandeg@users.noreply.github.com>
2022-01-12 10:34:23 +01:00
Sofie Van Landeghem
5ba4171b19
Update LICENSE to include 2022 [ci skip] 2022-01-07 09:24:07 +01:00
Ines Montani
005e23a525
Merge pull request #9989 from explosion/docs/update-algolia-search-api [ci skip] 2022-01-05 14:14:42 +01:00
Ines Montani
a437ca6737 Update website to use new Algolia search API 2022-01-05 13:21:06 +01:00
Sofie Van Landeghem
067a44a417
Merge pull request #9987 from explosion/master
Update develop with commits from master
2022-01-05 11:49:50 +01:00
Lj Miranda
00e7bf5ffd
Add a few docs to the default_config.cfg (#9981)
* Clarify patience hyperparameter

The current value for patience doesn't seem to indicate that it's
pointing to the number of steps. It may be useful to specify that
explicitly.

Ref: https://github.com/explosion/spaCy/discussions/7450
Ref: https://github.com/explosion/spaCy/discussions/7465

* Update docs for max_steps
2022-01-05 09:16:40 +01:00
Duygu Altinok
55cf492218
Feat/debug data warn spread ents (#9960)
* added check for crossing boundaries

* formatted blacked

* Rephrasing slightly

Co-authored-by: Sofie Van Landeghem <svlandeg@users.noreply.github.com>
2022-01-04 18:22:10 +01:00
Sofie Van Landeghem
56dcb39fb7
Fix references to config file in the docs & UX (#9961)
* doc fixes around config file

* fix typo

* clarify default
2022-01-04 14:31:26 +01:00
Sofie Van Landeghem
029a48e340
fix type of lexeme.rank (#9979) 2022-01-04 13:15:25 +01:00
Sam Edwardes
6f65e2b544
Added spacypdfreader to universe.json (#9963) 2022-01-03 16:34:36 +09:00
Paul O'Leary McCann
f40e237c5a
Remove denomme from universe (#9952)
Package seems to have been deleted.
2021-12-29 11:41:29 +01:00
Florian Cäsar
86e71e7b19
Fix Scorer.score_cats for missing labels (#9443)
* Fix Scorer.score_cats for missing labels

* Add test case for Scorer.score_cats missing labels

* semantic nitpick

* black formatting

* adjust test to give different results depending on multi_label setting

* fix loss function according to whether or not missing values are supported

* add note to docs

* small fixes

* make mypy happy

* Update spacy/pipeline/textcat.py

Co-authored-by: Florian Cäsar <florian.caesar@pm.me>
Co-authored-by: Sofie Van Landeghem <svlandeg@users.noreply.github.com>
Co-authored-by: svlandeg <svlandeg@github.com>
2021-12-29 11:04:39 +01:00
Sofie Van Landeghem
b8106e0f95
Merge pull request #9951 from explosion/master
Update develop branch with master
2021-12-29 10:11:43 +01:00
Yoav Vollansky
9d63dfacfc
Update UNIVERSE.md (#9941)
typo
2021-12-27 13:46:04 +01:00
Peter Baumgartner
72abf9e102
MultiHashEmbed vector docs correction (#9918) 2021-12-27 11:18:08 +01:00
Duygu Altinok
7ec1452f5f
added elided forms (#9878)
* added elided forms

* rearranged a bit

* rearranged a bit

* added stopword tests

* blacked tests file
2021-12-23 13:41:01 +01:00
Andrew Janco
3cfeb518ee
Handle "_" value for token pos in conllu data (#9903)
* change '_' to '' to allow Token.pos when there is no value for token pos in the conllu data

* Minor code style

Co-authored-by: Adriane Boyd <adrianeboyd@gmail.com>
2021-12-21 15:46:33 +01:00
Adriane Boyd
837d241b68
Make floret murmurhash endian-neutral (#9735) 2021-12-20 17:11:31 +01:00
Adriane Boyd
1163073756
Remove outdated patterns MANIFEST.in (#9912) 2021-12-20 16:40:20 +01:00
Adriane Boyd
18e5638af0
Extend cupy to v10.x (#9911)
* Add extra for `cupy-cuda115`
2021-12-20 15:48:35 +01:00
Sofie Van Landeghem
7847839003
Merge pull request #9891 from explosion/master
Update develop with master
2021-12-17 14:01:27 +01:00
Daniël de Kok
93e9bf681f
Merge pull request #9873 from danieldk/temporarily-pin-mypy
Pin mypy to 0.910 until there is a compatible pydantic version
2021-12-16 10:28:31 +01:00
Daniël de Kok
b08f1ac17d Pin mypy to 0.910 until there is a compatible pydantic version 2021-12-16 09:31:45 +01:00
Adriane Boyd
94fbd88521
Use dict.copy().items() instead of list(.items()) (#9868) 2021-12-16 09:17:33 +01:00
Edward
018827e9fd Add healthsea to universe (#9838)
* Add healthsea to universe

* Update website/meta/universe.json

* Add thumbnail

* Update website/meta/universe.json

Co-authored-by: Sofie Van Landeghem <svlandeg@users.noreply.github.com>
2021-12-15 17:57:19 +01:00
antonpibm
ac45ae3779
Update Tokenizer documentation to reflect token_match and url_match signatures (#9859) 2021-12-15 09:34:33 +01:00
Ines Montani
ba0fa7a64e
Support Google Sheets embeds in docs (#9861) 2021-12-15 09:27:08 +01:00
Adriane Boyd
800737b416
Set version to v3.2.1 (#9823) 2021-12-07 10:51:45 +01:00
Haakon Meland Eriksen
251119455d
Remove NER words from stop words in Norwegian (#9820)
Default stop words in Norwegian bokmål (nb) in spaCy contain important entities, e.g. France, Germany, Russia, Sweden and USA, police district, important units of time, e.g. months and days of the week, and organisations.

Nobody expects their presence among the default stop words. There is a danger of users complying with the general recommendation of filtering out stop words, while being unaware of filtering out important entities from their data.

See explanation in https://github.com/explosion/spaCy/issues/3052#issuecomment-986756711 and comment https://github.com/explosion/spaCy/issues/3052#issuecomment-986951831
2021-12-07 09:45:10 +01:00
Adriane Boyd
51a3b60027
Document Tagger neg_prefix, fix typo (#9821) 2021-12-07 09:42:40 +01:00
Adriane Boyd
a0cdc2b007
Use Language.pipe in evaluate (#9800) 2021-12-06 20:39:15 +01:00
Adriane Boyd
9964243eb2
Make the Tagger neg_prefix configurable (#9802) 2021-12-06 18:04:44 +01:00
Duygu Altinok
b56b9e7f31
Entity ruler remove pattern (#9685)
* added ruler coe

* added error for none existing pattern

* changed error to warning

* changed error to warning

* added basic tests

* fixed place

* added test files

* went back to error

* went back to pattern error

* minor change to docs

* changed style

* changed doc

* changed error slightly

* added remove to phrasem api

* error key already existed

* phrase matcher match code to api

* blacked tests

* moved comments before expr

* corrected error no

* Update website/docs/api/entityruler.md

Co-authored-by: Sofie Van Landeghem <svlandeg@users.noreply.github.com>

* Update website/docs/api/entityruler.md

Co-authored-by: Sofie Van Landeghem <svlandeg@users.noreply.github.com>

Co-authored-by: Sofie Van Landeghem <svlandeg@users.noreply.github.com>
2021-12-06 15:32:49 +01:00
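A minimal sketch of the new removal API, assuming the method is `EntityRuler.remove` keyed on the pattern's `id`, as the commit messages above suggest; removing a non-existent id raises an error per the discussion above.

```python
import spacy

nlp = spacy.blank("en")
ruler = nlp.add_pipe("entity_ruler")
ruler.add_patterns([{"label": "ORG", "pattern": "Explosion", "id": "explosion-org"}])

ruler.remove("explosion-org")  # remove the pattern by its id
print(len(ruler.patterns))     # 0
```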
Natalia Rodnova
472740d613
Added sents property to Span for Spans spanning over several sentences (#9699)
* Added sents property to Span class that returns a generator of sentences the Span belongs to

* Added description to Span.sents property

* Update test_span to clarify the difference between span.sent and span.sents

Co-authored-by: Sofie Van Landeghem <svlandeg@users.noreply.github.com>

* Update spacy/tests/doc/test_span.py

Co-authored-by: Sofie Van Landeghem <svlandeg@users.noreply.github.com>

* Fix documentation typos in spacy/tokens/span.pyx

Co-authored-by: Sofie Van Landeghem <svlandeg@users.noreply.github.com>

* Update Span.sents doc string in spacy/tokens/span.pyx

Co-authored-by: Sofie Van Landeghem <svlandeg@users.noreply.github.com>

* Parametrized test_span_spans

* Corrected Span.sents to check for span-level hook first. Also, made Span.sent respect doc-level sents hook if no span-level hook is provided

* Corrected Span documentation copy/paste issue

* Put back accidentally deleted lines

* Fixed formatting in span.pyx

* Moved check for SENT_START annotation after user hooks in Span.sents

* add version where the property was introduced

Co-authored-by: Sofie Van Landeghem <svlandeg@users.noreply.github.com>
2021-12-06 09:58:01 +01:00
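A short example of the new property next to the existing `Span.sent`:

```python
import spacy

nlp = spacy.blank("en")
nlp.add_pipe("sentencizer")
doc = nlp("First sentence here. Second one here. Third one.")
span = doc[2:7]  # crosses the boundary between the first and second sentence
print(span.sent.text)                # sentence containing the span's start
print([s.text for s in span.sents])  # all sentences the span overlaps
```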
Lj Miranda
7d50804644
Migrate regression tests into the main test suite (#9655)
* Migrate regressions 1-1000

* Move serialize test to correct file

* Remove tests that won't work in v3

* Migrate regressions 1000-1500

Removed regression test 1250 because v3 doesn't support the old LEX
scheme anymore.

* Add missing imports in serializer tests

* Migrate tests 1500-2000

* Migrate regressions from 2000-2500

* Migrate regressions from 2501-3000

* Migrate regressions from 3000-3501

* Migrate regressions from 3501-4000

* Migrate regressions from 4001-4500

* Migrate regressions from 4501-5000

* Migrate regressions from 5001-5501

* Migrate regressions from 5501 to 7000

* Migrate regressions from 7001 to 8000

* Migrate remaining regression tests

* Fixing missing imports

* Update docs with new system [ci skip]

* Update CONTRIBUTING.md

- Fix formatting
- Update wording

* Remove lemmatizer tests in el lang

* Move a few tests into the general tokenizer

* Separate Doc and DocBin tests
2021-12-04 20:34:48 +01:00
Paul O'Leary McCann
b4d526c357
Add Japanese kana characters to default exceptions (fix #9693) (#9742)
This includes the main kana, or phonetic characters, used in Japanese.

There are some supplemental kana blocks in Unicode outside the BMP that
could also be included, but because their actual use is rare I omitted
them for now, but maybe they should be added. The omitted blocks are:

- Kana Supplement
- Kana Extended (A and B)
- Small Kana Extension
2021-11-30 23:36:39 +01:00
Sofie Van Landeghem
58e29776bd
Merge pull request #9777 from explosion/master
Update develop with master
2021-11-30 14:01:23 +01:00
Duygu Altinok
29f28d1f3e
French NP review (#9667)
* adapted from pt

* added basic tests

* added fr vocab

* fixed noun chunks

* more examples

* typo fix

* changed naming

* changed the naming

* typo fix
2021-11-30 12:19:07 +01:00
Daniël de Kok
72f7f4e68a
morphologizer: avoid recreating label tuple for each token (#9764)
* morphologizer: avoid recreating label tuple for each token

The `labels` property converts the dictionary key set to a tuple. This
property was used for every annotated token, recreating the tuple over
and over again.

Construct the tuple once in the set_annotations function and reuse it.

On a Finnish pipeline that I was experimenting with, this results in a
speedup of ~15% (~13000 -> ~15000 WPS).

* tagger: avoid recreating label tuple for each token
2021-11-30 11:58:59 +01:00
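The optimization is a general pattern: hoist an invariant conversion out of the per-token loop. A simplified, hypothetical illustration (not the actual morphologizer code):

```python
labels = {"Case=Nom|Number=Sing": 0, "Case=Acc|Number=Plur": 1}
predictions = [0, 1, 0, 0]

# Before: the label tuple was effectively recreated for every annotated token
# annotations = [tuple(labels)[i] for i in predictions]

# After: build the tuple once in set_annotations and reuse it
label_tuple = tuple(labels)
annotations = [label_tuple[i] for i in predictions]
print(annotations)
```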
Adriane Boyd
c19f0c1604
Switch to latest CI images (#9773) 2021-11-30 10:08:51 +01:00
Narayan Acharya
1be8a4dab3
Displacy serve entity linking support without requiring manual=True. (#9748)
* Add support for kb_id to be displayed via displacy.serve. The current support is only limited to the manual option in displacy.render

* Commit to check pre-commit hooks are run.

* Update spacy/displacy/__init__.py

Co-authored-by: Sofie Van Landeghem <svlandeg@users.noreply.github.com>

* Changes as per suggestions on the PR.

* Update website/docs/api/top-level.md

Co-authored-by: Sofie Van Landeghem <svlandeg@users.noreply.github.com>

* Update website/docs/api/top-level.md

Co-authored-by: Sofie Van Landeghem <svlandeg@users.noreply.github.com>

* tag option as new from 3.2.1 onwards

Co-authored-by: Sofie Van Landeghem <svlandeg@users.noreply.github.com>
Co-authored-by: svlandeg <sofie.vanlandeghem@gmail.com>
2021-11-29 17:13:26 +01:00
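A small sketch of rendering entities that carry a `kb_id`; the `kb_url_template` option name is an assumption based on the docs change referenced above, and `displacy.render` is used here instead of `displacy.serve` only to keep the sketch non-blocking (both go through the same entity renderer).

```python
import spacy
from spacy import displacy
from spacy.tokens import Span

nlp = spacy.blank("en")
doc = nlp("Ada Lovelace was born in London")
doc.ents = [Span(doc, 0, 2, label="PERSON", kb_id="Q7259")]

html = displacy.render(
    doc,
    style="ent",
    # Assumed option name: template used to turn kb_id values into links
    options={"kb_url_template": "https://www.wikidata.org/wiki/{}"},
)
```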
Adriane Boyd
6763cbfdc0
Update Catalan acknowledgements for v3.2 (#9763) 2021-11-29 14:14:21 +01:00
Paul O'Leary McCann
ac05de2c6c
Fix Language-specific factory handling in package command (#9674)
* Use internal names for factories

If a component factory is registered like `@French.factory(...)` instead
of `@Language.factory(...)`, the name in the factories registry will be
prefixed with the language code. However in the nlp.config object the
factory will be listed without the language code. The `add_pipe` code
has fallback logic to handle this, but packaging code and the registry
itself don't.

This change makes it so that the factory name in nlp.config is the
language-specific form. It's not clear if this will break anything else,
but it does seem to fix the inconsistency and resolve the specific user
issue that brought this to our attention.

* Change approach to use fallback in package lookup

This adds fallback logic to the package lookup, so it doesn't have to
touch the way the config is built. It seems to fix the tests too.

* Remove unecessary line

* Add test

This also adds an assert that seems to have been forgotten.
2021-11-29 08:31:02 +01:00
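For context, a sketch of the registration style involved; per the description above, the factory is stored internally under a language-prefixed name, while `nlp.config` keeps the short name, and the fallback makes `spacy package` resolve both forms.

```python
from spacy.lang.fr import French

@French.factory("my_component")  # registered internally under a fr-prefixed name
def create_my_component(nlp, name):
    def my_component(doc):
        return doc
    return my_component

nlp = French()
nlp.add_pipe("my_component")  # nlp.config still refers to the short name
print(nlp.pipe_names)
```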
Richard Hudson
7b134b8fbd
New tests for a number of alpha languages (#9703)
* Added Slovak

* Added Slovenian tests

* Added Estonian tests

* Added Croatian tests

* Added Latvian tests

* Added Icelandic tests

* Added Afrikaans tests

* Added language-independent tests

* Added Kannada tests

* Tidied up

* Added Albanian tests

* Formatted with black

* Added failing tests for anomalies

* Update spacy/tests/lang/af/test_text.py

Co-authored-by: Sofie Van Landeghem <svlandeg@users.noreply.github.com>

* Added context to failing Estonian tokenizer test

Co-authored-by: Sofie Van Landeghem <svlandeg@users.noreply.github.com>

* Added context to failing Croatian tokenizer test

Co-authored-by: Sofie Van Landeghem <svlandeg@users.noreply.github.com>

* Added context to failing Icelandic tokenizer test

Co-authored-by: Sofie Van Landeghem <svlandeg@users.noreply.github.com>

* Added context to failing Latvian tokenizer test

Co-authored-by: Sofie Van Landeghem <svlandeg@users.noreply.github.com>

* Added context to failing Slovak tokenizer test

Co-authored-by: Sofie Van Landeghem <svlandeg@users.noreply.github.com>

* Added context to failing Slovenian tokenizer test

Co-authored-by: Sofie Van Landeghem <svlandeg@users.noreply.github.com>

Co-authored-by: Sofie Van Landeghem <svlandeg@users.noreply.github.com>
2021-11-28 21:59:23 +01:00
Tuomo Hiippala
5c44533263
add entry for Applied Language Technology under "Courses" (#9755)
Added the following entry into `universe.json`:

```
        {
            "type": "education",
            "id": "applt-course",
            "title": "Applied Language Technology",
            "slogan": "NLP for newcomers using spaCy and Stanza",
            "description": "These learning materials provide an introduction to applied language technology for audiences who are unfamiliar with language technology and programming. The learning materials assume no previous knowledge of the Python programming language.",
            "url": "https://applied-language-technology.readthedocs.io/",
            "image": "https://www.mv.helsinki.fi/home/thiippal/images/applt-preview.jpg",
            "thumb": "https://applied-language-technology.readthedocs.io/en/latest/_static/logo.png",
            "author": "Tuomo Hiippala",
            "author_links": {
                "twitter": "tuomo_h",
                "github": "thiippal",
                "website": "https://www.mv.helsinki.fi/home/thiippal/"
            },
            "category": ["courses"]
        },
```
2021-11-28 19:33:16 +09:00
Natalia Rodnova
a4c43e5c57
Allow Matcher to match on ENT_ID and ENT_KB_ID (#9688)
* Added ENT_ID and ENT_KB_ID into the list of the attributes that Matcher matches on

* Added ENT_ID and ENT_KB_ID to TEST_PATTERNS in test_pattern_validation.py. Disabled tests that I added before

* Update website/docs/api/matcher.md

* Format

* Remove skipped tests

Co-authored-by: Adriane Boyd <adrianeboyd@gmail.com>
2021-11-24 10:37:10 +01:00
Richard Hudson
7fec5fd647
Merge pull request #9737 from Pantalaymon/patch-1
Create Pantalaymon.md
2021-11-24 09:56:43 +01:00
Valentin-Gabriel Soumah
0bbf86bba8
Create Pantalaymon.md
Submitting the agreement to spaCy in order to contribute to the Coreferee project.
2021-11-23 17:29:23 +01:00
Duygu Altinok
25bd9f9d48
Noun chunks for Italian (#9662)
* added it vocab

* copied portuguese

* added possessive determiner

* added conjed Nps

* added nmoded Nps

* test misc

* more examples

* fixed typo

* fixed parenth

* fixed comma

* comma fix

* added syntax iters

* fix some index problems

* fixed index

* corrected heads for test case

* fixed test case

* fixed determiner gender

* cleaned left over

* added example with apostrophe
2021-11-23 16:29:25 +01:00
Duygu Altinok
a7d7e80adb
EntityRuler improve disk load error message (#9658)
* added error string

* added serialization test

* added more to if statements

* wrote file to tempdir

* added tempdir

* changed parameter a bit

* Update spacy/tests/pipeline/test_entity_ruler.py

Co-authored-by: Sofie Van Landeghem <svlandeg@users.noreply.github.com>
2021-11-23 16:26:05 +01:00
Adriane Boyd
9ac6d4991e
Add doc_cleaner component (#9659)
* Add doc_cleaner component

* Fix types

* Fix loop

* Rephrase method description
2021-11-23 15:33:33 +01:00
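For reference, a minimal sketch of adding the new component, assuming it is registered under the factory name "doc_cleaner" as the commit title suggests; it is meant to sit at the end of a pipeline and drop intermediate data (such as `doc.tensor` or transformer outputs) that downstream consumers no longer need.

```python
import spacy

nlp = spacy.blank("en")
# Added last so intermediate data can be cleared once all other components have run
nlp.add_pipe("doc_cleaner")
doc = nlp("Some text to process")
```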
Adriane Boyd
a77f50baa4
Allow Scorer.score_spans to handle pred docs with missing annotation (#9701)
If the predicted docs are missing annotation according to
`has_annotation`, treat the docs as having no predictions rather than
raising errors when the annotation is missing.

The motivation for this is a combined tokenization+sents scorer for a
component where the sents annotation is optional. To provide a single
scorer in the component factory, it needs to be possible for the scorer
to continue despite missing sents annotation in the case where the
component is not annotating sents.
2021-11-23 15:17:19 +01:00
Adriane Boyd
36c7047946
Use reference parse to initialize parser moves (#9722) 2021-11-23 14:55:55 +01:00
Paul O'Leary McCann
52b8c2d2e0
Add note on batch contract for listeners (#9691)
* Add note on batch contract

Using listeners requires batches to be consistent. This is obvious if
you understand how the listener works, but it wasn't clearly stated in
the Docs, and was subtle enough that the EntityLinker missed it.

There is probably a clearer way to explain what the actual requirement
is, but I figure this is a good start.

* Rewrite to clarify role of caching
2021-11-22 11:06:07 +01:00
Richard Hudson
a1f25412da
Edited Slovenian stop words list (#9707) 2021-11-22 09:46:34 +01:00
Sofie Van Landeghem
13645dcbf5
add note that annotating components is new since 3.1 (#9678) 2021-11-22 14:43:11 +09:00
Adriane Boyd
0e93b315f3
Convert labels to strings for README in package CLI (#9694) 2021-11-19 08:51:46 +01:00
Adriane Boyd
ea450d652c
Exclude strings from v3.2+ source vector checks (#9697)
Exclude strings from `Vector.to_bytes()` comparions for v3.2+ `Vectors`
that now include the string store so that the source vector comparison
is only comparing the vectors and not the strings.
2021-11-19 08:51:19 +01:00
Paul O'Leary McCann
f3981bd0c8
Clarify how to fill in init_tok2vec after pretraining (#9639)
* Clarify how to fill in init_tok2vec after pretraining

* Ignore init_tok2vec arg in pretraining

* Update docs, config setting

* Remove obsolete note about not filling init_tok2vec early

This seems to have also caught some lines that needed cleanup.
2021-11-18 15:38:30 +01:00
Vishnu Nandakumar
86fa37e8ba
Update universe.json with new library eng_spacysentiment (#9679)
* Update universe.json

* Update universe.json

* Cleanup fields

Co-authored-by: Paul O'Leary McCann <polm@dampfkraft.com>
2021-11-16 14:06:19 +09:00
Adriane Boyd
c9baf9d196
Fix spancat for empty docs and zero suggestions (#9654)
* Fix spancat for empty docs and zero suggestions

* Use ops.xp.zeros in test
2021-11-15 12:40:55 +01:00
Sofie Van Landeghem
4694b43d87
Merge pull request #9673 from explosion/master
update develop branch for 3.3
2021-11-15 11:14:49 +01:00
github-actions[bot]
67d8c8a081
Auto-format code with black (#9664)
Co-authored-by: explosion-bot <explosion-bot@users.noreply.github.com>
2021-11-12 10:00:03 +01:00
Sofie Van Landeghem
24cdd4c88e
Merge pull request #9638 from polm/fix/optional-pretrain-path
Make Jsonl Corpus reader path optional again
2021-11-09 10:45:14 +01:00
Paul O'Leary McCann
8aa2d32ca9 Update jsonlcorpus constructor types 2021-11-09 16:20:19 +09:00
Paul O'Leary McCann
71fb00ed95
Update spacy/training/corpus.py
Co-authored-by: Sofie Van Landeghem <svlandeg@users.noreply.github.com>
2021-11-08 10:02:29 +00:00
Sofie Van Landeghem
c97f29c593
Merge pull request #9629 from ljvmiranda921/chore/migrate-regressions
Migrate regression and other tests to the new pytest marker
2021-11-08 09:07:38 +01:00
Paul O'Leary McCann
141f12b92e Make Jsonl Corpus reader optional again 2021-11-07 18:56:23 +09:00
Lj Miranda
909177589d Remove utility script 2021-11-06 06:35:58 +08:00
Ines Montani
86af0234ab
Update version [ci skip] 2021-11-05 19:02:35 +01:00
Adriane Boyd
216ed231a9 What's new in v3.2 (#9633)
* What's new in v3.2

* Fix formatting

* Fix typo

* Redo thanks

* Formatting

* Fix typo

* Fix project links

* Fix typo

* Minimal intro, floret python module

* Rephrase

* Rephrase, extend

* Rephrase

* Update links and formatting [ci skip]

* Minor correction

* Fix typo

Co-authored-by: Ines Montani <ines@ines.io>
2021-11-05 16:31:14 +01:00
Adriane Boyd
0fc3dee772
Merge pull request #9596 from adrianeboyd/tests/reenable-v3.2.0-tests
Reenable tests for v3.2.0
2021-11-05 10:54:30 +01:00
github-actions[bot]
5cdb7eb5c2
Auto-format code with black (#9631)
Co-authored-by: explosion-bot <explosion-bot@users.noreply.github.com>
Co-authored-by: Adriane Boyd <adrianeboyd@gmail.com>
2021-11-05 09:58:36 +01:00
Adriane Boyd
e6f91b6f27
Format (#9630) 2021-11-05 09:56:26 +01:00
Lj Miranda
8e7deaf210 Add missing imports in some regression tests
- test_issue7001-8000.py
- test_issue8190.py
2021-11-05 11:47:59 +08:00
Lj Miranda
addeb34bc4 Decorate regression tests
Even if the issue number is already in the file, I still
decorated them just to follow the convention found in test_issue8168.py
2021-11-05 11:47:44 +08:00
Lj Miranda
91dec2c76e Decorate non-regression tests 2021-11-05 11:47:33 +08:00
Lj Miranda
199943deb4 Add simple script to add pytest marks 2021-11-05 11:47:28 +08:00
Duygu Altinok
f0e8c9fe58
Spanish noun chunks review (#9537)
* updated syntax iters

* formatted the code

* added prepositional objects

* code clean up

* eliminated left attached adp

* added es vocab

* added basic tests

* fixed typo

* fixed typo

* list to set

* fixed doc name

* added code for conj

* more tests

* differentiated adjectives and flat

* fixed typo

* added compounds

* more compounds

* tests for compounds

* tests for nominal modifiers

* fixed typo

* fixed typo

* formatted file

* reformatted tests

* fixed typo

* fixed punct typo

* formatted after changes

* added indirect object

* added full sentence examples

* added longer full sentence examples

* fixed sentence length of test

* added passive subj

* added test case by Damian
2021-11-05 00:46:36 +01:00
Duygu Altinok
6e6650307d
Portuguese noun chunks review (#9559)
* added tests

* added pt vocab

* transferred spanish

* added syntax iters

* fixed parenthesis

* added nmod example

* added relative pron

* fixed rel pron

* added rel subclause

* corrected typo

* added more NP chains

* long sentence

* fixed typo

* fixed typo

* fixed typo

* corrected heads

* added passive subj

* added pass subj

* added passive obj

* refinement to rights

* went back to old

* fixed test

* fixed typo

* fixed typo

* formatted

* Format

* Format test cases

Co-authored-by: Adriane Boyd <adrianeboyd@gmail.com>
2021-11-04 23:55:49 +01:00
Adriane Boyd
2bf52c44b1
Merge pull request #9612 from adrianeboyd/chore/switch-to-master-v3.2.0
Switch v3.2.0 to master
2021-11-03 16:27:34 +01:00
Adriane Boyd
07dea324f6 Merge remote-tracking branch 'upstream/develop' into chore/switch-to-master-v3.2.0 2021-11-03 15:32:18 +01:00
Bram Vanroy
cab9209c3d
use metaclass to decorate errors (#9593) 2021-11-03 15:29:32 +01:00
Paul O'Leary McCann
c1cc94a33a
Fix typo about receptive field size (#9564) 2021-11-03 15:16:55 +01:00
Adriane Boyd
e06bbf72a4
Fix tok2vec-less textcat generation in website quickstart (#9610) 2021-11-03 15:11:07 +01:00
Adriane Boyd
db0d8c56d0
Add test for Language.pipe as_tuples with custom error handlers (#9608)
* make nlp.pipe() return None docs when no exceptions are (re-)raised during error handling

* Remove changes other than as_tuples test

* Only check warning count for one process

* Fix types

* Format

Co-authored-by: Xi Bai <xi.bai.ed@gmail.com>
2021-11-03 10:57:34 +01:00
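For context, the `as_tuples` usage being tested: each input is a `(text, context)` pair and the context is returned alongside the processed `Doc`.

```python
import spacy

nlp = spacy.blank("en")
data = [("A first text", {"id": 1}), ("A second text", {"id": 2})]
for doc, context in nlp.pipe(data, as_tuples=True):
    print(context["id"], doc.text)
```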
Adriane Boyd
79cea03983
Update website model display (#9589)
* Remove vectors from core trf model descriptions

* Update accuracy labels and exclude morph_acc for ja
2021-11-03 09:56:00 +01:00
Paul O'Leary McCann
e43639b27a
Add note about round-trip serializing pipeline to API docs (#9583) 2021-11-03 09:55:30 +01:00
Adriane Boyd
6eee024ff6
Pickle Doc._context (#9603) 2021-11-03 09:14:29 +01:00
Adriane Boyd
61daac54e4
Serialize _context separately in multiprocessing pipe (#9597)
* Serialize _context with Doc

* Revert "Serialize _context with Doc"

This reverts commit 161f1fac91.

* Serialize Doc._context separately for multiprocessing pipe
2021-11-03 07:51:53 +01:00
Adriane Boyd
5a979137a7
Set as_tuples on Doc during processing (#9592)
* Set as_tuples on Doc during processing

* Fix types

* Format
2021-11-02 15:08:22 +01:00
Adriane Boyd
c155f333bb Revert "Temporarily use v3.1.0 models in CI"
This reverts commit bd6433bbab.
2021-11-02 14:25:05 +01:00
Adriane Boyd
53a3523910 Revert "Temporarily ignore W095 in assemble CLI CI test (#9460)"
This reverts commit 8db574e0b5.
2021-11-02 14:24:54 +01:00
Adriane Boyd
4d5db737e9 Revert "Temporarily skip compat tests (#9594)"
This reverts commit 667572adca.
2021-11-02 14:24:06 +01:00
Adriane Boyd
667572adca
Temporarily skip compat tests (#9594) 2021-11-02 14:10:48 +01:00
Lj Miranda
f1bc655a38
Add initial Tagalog (tl) tests (#9582)
* Add tl_tokenizer to test fixtures

* Add tagalog tests
2021-11-02 08:35:49 +01:00
xxyzz
90ec820f05
Add WordDumb to spaCy Universe (#9572)
* Add WordDumb to spaCy Universe

* Add standalone category

Co-authored-by: Paul O'Leary McCann <polm@dampfkraft.com>
2021-11-01 18:38:41 +09:00
Bruce W. Lee (이웅성)
a4dcb68cf6
Adding LingFeat Software to spaCy Universe. (#9574)
* add lingfeat in universe

* add lingfeat in universe

* Fix JSON

* Minor cleanup

Co-authored-by: Paul O'Leary McCann <polm@dampfkraft.com>
2021-11-01 18:38:14 +09:00
Vasundhara
5279c7c4ba
Fix broken link to mappings-exceptions (#9573) 2021-10-31 13:44:29 +09:00
Adriane Boyd
bb26550e22
Fix StaticVectors after floret+mypy merge (#9566) 2021-10-29 16:25:43 +02:00
Adriane Boyd
322635e371
Set version to v3.2.0 (#9565) 2021-10-29 15:22:40 +02:00
Adriane Boyd
5e9db156c2
Merge pull request #9563 from adrianeboyd/chore/update-develop-from-master-v3.2-3
Update develop from master for v3.2
2021-10-29 14:08:14 +02:00
Adriane Boyd
2d430958e1 Merge remote-tracking branch 'upstream/master' into chore/update-develop-from-master-v3.2-3 2021-10-29 12:18:15 +02:00
Paul O'Leary McCann
006df1ae1f
Clarify error when words are of wrong type (#9541)
* Clarify error when words are of wrong type

See #9437

* Update docs

* Use try/except

* Apply suggestions from code review

Co-authored-by: Sofie Van Landeghem <svlandeg@users.noreply.github.com>
Co-authored-by: Adriane Boyd <adrianeboyd@gmail.com>

Co-authored-by: Sofie Van Landeghem <svlandeg@users.noreply.github.com>
Co-authored-by: Adriane Boyd <adrianeboyd@gmail.com>
2021-10-29 12:08:40 +02:00
Paul O'Leary McCann
2fd8d616e7
Add docs section for spacy.cli.train.train (#9545)
* Add section for spacy.cli.train.train

* Add link from training page to train function

* Ensure path in train helper

* Update docs

Co-authored-by: Ines Montani <ines@ines.io>
2021-10-29 10:36:34 +02:00
Adriane Boyd
5477453ea3
Docs for thinc-apple-ops (#9549)
* Docs for thinc-apple-ops

* Ignore thinc-apple-ops in reqs tests

* Fix install quickstart

* Add cupy cuda 113, 114 extras

* Remove draft section

Co-authored-by: Ines Montani <ines@ines.io>
2021-10-29 10:35:31 +02:00
Adriane Boyd
12974bf4d9
Add micro PRF for morph scoring (#9546)
* Add micro PRF for morph scoring

For pipelines where morph features are added by more than one component
and a reference training corpus may not contain all features, a micro
PRF score is more flexible than a simple accuracy score. An example is
the reading and inflection features added by the Japanese tokenizer.

* Use `morph_micro_f` as the default morph score for Japanese
morphologizers.

* Update docstring

* Fix typo in docstring

* Update Scorer API docs

* Fix results type

* Organize score list by attribute prefix
2021-10-29 10:29:29 +02:00
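As a rough illustration of the micro PRF idea described above (not spaCy's actual `Scorer` code), precision, recall and F are computed from feature-level true positives accumulated across all tokens; the function name and signature below are invented for the sketch:

```python
from typing import Iterable, Set, Tuple

def micro_prf(pairs: Iterable[Tuple[Set[str], Set[str]]]) -> Tuple[float, float, float]:
    """Micro-averaged P/R/F over (gold, predicted) morph feature sets."""
    tp = fp = fn = 0
    for gold, pred in pairs:
        tp += len(gold & pred)   # features present in both
        fp += len(pred - gold)   # predicted but not in the reference
        fn += len(gold - pred)   # in the reference but not predicted
    p = tp / (tp + fp) if tp + fp else 0.0
    r = tp / (tp + fn) if tp + fn else 0.0
    f = 2 * p * r / (p + r) if p + r else 0.0
    return p, r, f

# One token where the reference has an extra feature the model missed:
print(micro_prf([({"Reading=カゼ", "Inflection=NONE"}, {"Reading=カゼ"})]))
```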
Philip Vollet
76173b0866
fixed typo and URL (#9560) 2021-10-29 13:57:44 +09:00
Adriane Boyd
72dc63b3fb
Update for python 3.10 (#9519)
* Update for python 3.10

* Update mac image

* Update build constraints for python 3.10

* Add extras for cupy cuda 11.3-11.5

* Remove cupy-cuda115 extra

* Require thinc>=8.0.12

* Switch CI to windows-2019

* Skip mypy for python 3.10
2021-10-28 15:32:06 +02:00
Adriane Boyd
554fa414ec
Require spacy-transformers v1.1 in transformers extra (#9557)
So that the install/upgrade quickstart also upgrades
`spacy-transformers` with `pip install spacy[transformers]`, require
`spacy-transformers>=1.1.2` in the `transformers` extra.
2021-10-28 11:18:19 +02:00
Adriane Boyd
c053f158c5
Add support for floret vectors (#8909)
* Add support for fasttext-bloom hash-only vectors

Overview:

* Extend `Vectors` to have two modes: `default` and `ngram`
  * `default` is the default mode and equivalent to the current
    `Vectors`
  * `ngram` supports the hash-only ngram tables from `fasttext-bloom`
* Extend `spacy.StaticVectors.v2` to handle both modes with no changes
  for `default` vectors
* Extend `spacy init vectors` to support ngram tables

The `ngram` mode **only** supports vector tables produced by this
fork of fastText, which adds an option to represent all vectors using
only the ngram buckets table and which uses the exact same ngram
generation algorithm and hash function (`MurmurHash3_x64_128`).
`fasttext-bloom` produces an additional `.hashvec` table, which can be
loaded by `spacy init vectors --fasttext-bloom-vectors`.

https://github.com/adrianeboyd/fastText/tree/feature/bloom

Implementation details:

* `Vectors` now includes the `StringStore` as `Vectors.strings` so that
  the API can stay consistent for both `default` (which can look up from
  `str` or `int`) and `ngram` (which requires `str` to calculate the
  ngrams).

* In ngram mode `Vectors` uses a default `Vectors` object as a cache
  since the ngram vector lookups are relatively expensive.

  * The default cache size is the same size as the provided ngram vector
    table.

  * Once the cache is full, no more entries are added. The user is
    responsible for managing the cache in cases where the initial
    documents are not representative of the texts.

  * The cache can be resized by setting `Vectors.ngram_cache_size` or
    cleared with `vectors._ngram_cache.clear()`.

* The API ends up a bit split between methods for `default` and for
  `ngram`, so functions that only make sense for `default` or `ngram`
  include warnings with custom messages suggesting alternatives where
  possible.

* `Vocab.vectors` becomes a property so that the string stores can be
  synced when assigning vectors to a vocab.

* `Vectors` serializes its own config settings as `vectors.cfg`.

* The `Vectors` serialization methods have added support for `exclude`
  so that the `Vocab` can exclude the `Vectors` strings while serializing.

Removed:

* The `minn` and `maxn` options and related code from
  `Vocab.get_vector`, which does not work in a meaningful way for default
  vector tables.

* The unused `GlobalRegistry` in `Vectors`.

* Refactor to use reduce_mean

Refactor to use reduce_mean and remove the ngram vectors cache.

* Rename to floret

* Rename to floret in error messages

* Use --vectors-mode in CLI, vector init

* Fix vectors mode in init

* Remove unused var

* Minor API and docstrings adjustments

* Rename `--vectors-mode` to `--mode` in `init vectors` CLI
* Rename `Vectors.get_floret_vectors` to `Vectors.get_batch` and support
  both modes.
* Minor updates to Vectors docstrings.

* Update API docs for Vectors and init vectors CLI

* Update types for StaticVectors
2021-10-27 14:08:31 +02:00
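A usage-level sketch of what the ngram/floret mode buys in practice; the pipeline name `xx_vectors_md` below is hypothetical and stands in for any model whose vocab was initialized with `spacy init vectors ... --mode floret`:

```python
import spacy

# Hypothetical package name – substitute a pipeline whose vectors were
# initialized in floret mode.
nlp = spacy.load("xx_vectors_md")
doc = nlp("hydrometeorological")
token = doc[0]
# In floret (ngram) mode even out-of-vocabulary words get a non-zero
# vector built from subword hashes, so has_vector is True here as well.
print(token.has_vector, token.vector[:5])
```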
Adriane Boyd
0c97ed2746
Rename ja morph features to Inflection and Reading (#9520)
* Rename ja morph features to Inflection and Reading
2021-10-27 13:13:03 +02:00
Adriane Boyd
2ea9b58006
Ignore prefix in suffix matches (#9155)
* Ignore prefix in suffix matches

Ignore the currently matched prefix when looking for suffix matches in
the tokenizer. Otherwise a lookbehind in the suffix pattern may match
incorrectly due to the presence of the prefix in the token string.

* Move °[cfkCFK]. to a tokenizer exception

* Adjust exceptions for same tokenization as v3.1

* Also update test accordingly

* Continue to split . after °CFK if ° is not a prefix

* Exclude new ° exceptions for pl

* Switch back to default tokenization of "° C ."

* Revert "Exclude new ° exceptions for pl"

This reverts commit 952013a5b4.

* Add exceptions for °C for hu
2021-10-27 13:02:25 +02:00
Adriane Boyd
4170110ce7
Merge pull request #9540 from adrianeboyd/chore/update-develop-from-master-v3.2-1
Update develop from master for v3.2
2021-10-27 08:23:57 +02:00
Adriane Boyd
386dcada1c
Address random results in slow readers tests (#9544)
* Set random seed for dataset shuffling
* Use more dev examples for non-zero scores
2021-10-26 16:53:10 +02:00
Adriane Boyd
a803af9dfa Merge remote-tracking branch 'upstream/master' into chore/update-develop-from-master-v3.2-1 2021-10-26 11:53:50 +02:00
Elia Robyn Lake (Robyn Speer)
fa70837f28
clarify how to connect pretraining to training (#9450)
* clarify how to connect pretraining to training

Signed-off-by: Elia Robyn Speer <elia@explosion.ai>

* Update website/docs/usage/embeddings-transformers.md

* Update website/docs/usage/embeddings-transformers.md

* Update website/docs/usage/embeddings-transformers.md

* Update website/docs/usage/embeddings-transformers.md

Co-authored-by: Elia Robyn Speer <elia@explosion.ai>
Co-authored-by: Sofie Van Landeghem <svlandeg@users.noreply.github.com>
2021-10-22 13:15:47 +02:00
github-actions[bot]
b0b115ff39
Auto-format code with black (#9530)
Co-authored-by: explosion-bot <explosion-bot@users.noreply.github.com>
2021-10-22 13:03:10 +02:00
Sofie Van Landeghem
c9f28b6d08
Merge branch 'spacy.io' into master 2021-10-21 20:46:33 +02:00
Sofie Van Landeghem
c7ed631f3c
bump version to 3.1.4 (#9524) 2021-10-21 20:34:57 +02:00
Daniël de Kok
f31ac6fd4f
Print a warning when multiprocessing is used on a GPU (#9475)
* Raise an error when multiprocessing is used on a GPU

As reported in #5507, a confusing exception is thrown when
multiprocessing is used with a GPU model and the `fork` multiprocessing
start method:

cupy.cuda.runtime.CUDARuntimeError: cudaErrorInitializationError: initialization error

This change checks whether one of the models uses the GPU when
multiprocessing is used. If so, raise a friendly error message.

Even though multiprocessing can work on a GPU with the `spawn` method,
it quickly runs the GPU out of memory on real-world data. Also,
multiprocessing on a single GPU typically does not provide large
performance gains.

* Move GPU multiprocessing check to Language.pipe

* Warn rather than error when using multiprocessing with GPU models

* Improve GPU multiprocessing warning message.

Co-authored-by: Adriane Boyd <adrianeboyd@gmail.com>

* Reduce API assumptions

Co-authored-by: Adriane Boyd <adrianeboyd@gmail.com>

* Update spacy/language.py

* Update spacy/language.py

* Test that warning is thrown with GPU + multiprocessing

Co-authored-by: Adriane Boyd <adrianeboyd@gmail.com>
Co-authored-by: Sofie Van Landeghem <svlandeg@users.noreply.github.com>
2021-10-21 16:14:23 +02:00
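A hedged sketch of the situation the warning targets; it assumes a CUDA-capable machine and an installed `en_core_web_sm` package, and is only meant to show where the check now fires:

```python
import spacy

spacy.require_gpu()                      # assumes a working GPU setup
nlp = spacy.load("en_core_web_sm")
texts = ["One text.", "Another text."] * 100

# Language.pipe now warns up front that combining a GPU-backed pipeline
# with n_process > 1 is usually not worthwhile, instead of failing with a
# cryptic CUDA initialization error later.
docs = list(nlp.pipe(texts, n_process=2))
print(len(docs))
```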
Sofie Van Landeghem
5a38f79f18
Custom component types in spacy.ty (#9469)
* add custom protocols in spacy.ty

* add a test for the new types in spacy.ty

* import Example when type checking

* some type fixes

* put Protocol in compat

* revert update check back to hasattr

* runtime_checkable in compat as well
2021-10-21 15:31:06 +02:00
Daniël de Kok
d0631e3005
Replace use_ops("numpy") by use_ops("cpu") in the parser (#9501)
* Replace use_ops("numpy") by use_ops("cpu") in the parser

This ensures that the best available CPU implementation is chosen
(e.g. Thinc Apple Ops on macOS).

* Run spaCy tests with apple-thinc-ops on macOS
2021-10-21 11:22:45 +02:00
Paul O'Leary McCann
28ecf399da
Remove some old version refs in the docs (#9448)
* Remove some old version refs in the docs

* Remove warning

* Update spacy/matcher/matcher.pyx

* Remove all references to the punctuation warning

Co-authored-by: Sofie Van Landeghem <svlandeg@users.noreply.github.com>
2021-10-21 11:17:59 +02:00
Duygu Altinok
1ee4d6ef49 Corrected broken (#9505) 2021-10-20 18:07:28 +02:00
Philip Vollet
a31a4bb7bd Add projects to spaCy Universe (#9269)
* Added spaCy Universe projects

* Added user license agreement Philip Vollet

* Update website/meta/universe.json

Co-authored-by: Sofie Van Landeghem <svlandeg@users.noreply.github.com>

* Update website/meta/universe.json

Co-authored-by: Sofie Van Landeghem <svlandeg@users.noreply.github.com>

* Update website/meta/universe.json

Co-authored-by: Sofie Van Landeghem <svlandeg@users.noreply.github.com>

Co-authored-by: Sofie Van Landeghem <svlandeg@users.noreply.github.com>
2021-10-20 18:07:07 +02:00
Duygu Altinok
7b98aa4c16
Corrected broken (#9505) 2021-10-20 17:31:59 +02:00
Edward
014da12f1d
Dont add tok2vec when efficiency textcat (#9502) 2021-10-20 17:30:19 +02:00
Ryn Daniels
ddc1bf5b8b
Merge pull request #9518 from explosion/rfd-robot-slowtests
Enable the test_slow command for explosionbot
2021-10-20 12:44:20 +02:00
Daniël de Kok
1f05f56433
Add the spacy.models_with_nvtx_range.v1 callback (#9124)
* Add the spacy.models_with_nvtx_range.v1 callback

This callback recursively adds NVTX ranges to the Models in each pipe in
a pipeline.

* Fix create_models_with_nvtx_range type signature

* NVTX range: wrap models of all trainable pipes jointly

This prevents (sub-)models that are shared between pipes from being
wrapped twice.

* NVTX range callback: make color configurable

Add forward_color and backprop_color options to set the color for the
NVTX range.

* Move create_models_with_nvtx_range to spacy.ml

* Update create_models_with_nvtx_range for thinc changes

with_nvtx_range now updates an existing node, rather than returning a
wrapper node. So, we can simply walk over the nodes and update them.

* NVTX: use after_pipeline_creation in example
2021-10-20 11:59:48 +02:00
Ryn Daniels
66b474ce05
Merge branch 'master' into rfd-robot-slowtests 2021-10-20 11:56:01 +02:00
Ryn Daniels
393e187f2c Enable the test_slow command for explosionbot 2021-10-20 11:20:57 +02:00
Ines Montani
5facdb031c
Merge pull request #9506 from explosion/tests/conftest-options 2021-10-20 10:33:43 +02:00
Sofie Van Landeghem
b758270654
bump thinc to 8.0.11 (#9516) 2021-10-20 10:32:09 +02:00
Adriane Boyd
3f181b73d0
Add ja_core_news_trf to website (#9515) 2021-10-20 10:18:02 +02:00
Paul O'Leary McCann
ef4d4f793b Clarify how to change base Transformer model (#9498)
* Add note about how the model name is used

* Add link to TransformersModel docs, separate paragraph

* Local link

* Revise docs

* Update website/docs/usage/embeddings-transformers.md

* Update website/docs/usage/embeddings-transformers.md

Co-authored-by: Sofie Van Landeghem <svlandeg@users.noreply.github.com>
2021-10-19 23:32:14 +02:00
Paul O'Leary McCann
222cf9b6d2
Clarify how to change base Transformer model (#9498)
* Add note about how the model name is used

* Add link to TransformersModel docs, separate paragraph

* Local link

* Revise docs

* Update website/docs/usage/embeddings-transformers.md

* Update website/docs/usage/embeddings-transformers.md

Co-authored-by: Sofie Van Landeghem <svlandeg@users.noreply.github.com>
2021-10-19 23:28:20 +02:00
Ines Montani
ad9f57cbbf Allow conftest.py to run twice for build envs 2021-10-19 15:13:25 +02:00
Sofie Van Landeghem
da578c3d3b
Fix kb.set_entities (#9463)
* avoid creating _vectors_table when also using c_add_vector

* write to self._vectors_table directly in set_entities
2021-10-19 09:39:17 +02:00
Lj Miranda
068cae7755
Include .pyi files in MANIFEST.in (#9500) 2021-10-19 09:05:37 +02:00
Lj Miranda
2bcd383685
Replace previous lock threads with GH action (#9499) 2021-10-19 09:03:59 +02:00
Adriane Boyd
e66fddf934 Minor updates to spacy-transformers docs for v1.1.0 (#9496) 2021-10-18 14:55:22 +02:00
Adriane Boyd
a6424bcea9
Minor updates to spacy-transformers docs for v1.1.0 (#9496) 2021-10-18 14:55:02 +02:00
Adriane Boyd
404aff08e3 Update docs for spacy-transformers v1.1 data classes (#9361) 2021-10-18 14:17:48 +02:00
Sofie Van Landeghem
eaa6798c66 Docs for new spacy-trf architectures (#8954)
* use TransformerModel.v2 in quickstart

* update docs for new transformer architectures

* bump spacy_transformers to 1.1.0

* Add new arguments spacy-transformers.TransformerModel.v3

* Mention that mixed-precision support is experimental

* Describe delta transformers.Tok2VecTransformer versions

* add dot

* add dot, again

* Update some more TransformerModel references v2 -> v3

* Add mixed-precision options to the training quickstart

Disable mixed-precision training/prediction by default.

* Update setup.cfg

Co-authored-by: Adriane Boyd <adrianeboyd@gmail.com>

* Apply suggestions from code review

Co-authored-by: Adriane Boyd <adrianeboyd@gmail.com>

* Update website/docs/usage/embeddings-transformers.md

Co-authored-by: Adriane Boyd <adrianeboyd@gmail.com>

Co-authored-by: Daniël de Kok <me@danieldk.eu>
Co-authored-by: Adriane Boyd <adrianeboyd@gmail.com>
2021-10-18 14:17:43 +02:00
Adriane Boyd
9b86209a4a
Update docs for spacy-transformers v1.1 data classes (#9361) 2021-10-18 14:16:58 +02:00
Sofie Van Landeghem
3fd3531e12
Docs for new spacy-trf architectures (#8954)
* use TransformerModel.v2 in quickstart

* update docs for new transformer architectures

* bump spacy_transformers to 1.1.0

* Add new arguments spacy-transformers.TransformerModel.v3

* Mention that mixed-precision support is experimental

* Describe delta transformers.Tok2VecTransformer versions

* add dot

* add dot, again

* Update some more TransformerModel references v2 -> v3

* Add mixed-precision options to the training quickstart

Disable mixed-precision training/prediction by default.

* Update setup.cfg

Co-authored-by: Adriane Boyd <adrianeboyd@gmail.com>

* Apply suggestions from code review

Co-authored-by: Adriane Boyd <adrianeboyd@gmail.com>

* Update website/docs/usage/embeddings-transformers.md

Co-authored-by: Adriane Boyd <adrianeboyd@gmail.com>

Co-authored-by: Daniël de Kok <me@danieldk.eu>
Co-authored-by: Adriane Boyd <adrianeboyd@gmail.com>
2021-10-18 14:15:06 +02:00
Edward
a7cb8de0d7
Fix assertion error in staticvectors (#9481)
* Fix assertion error in staticvectors

* Update spacy/ml/staticvectors.py

* Update spacy/ml/staticvectors.py

Co-authored-by: Ines Montani <ines@ines.io>

Co-authored-by: Sofie Van Landeghem <svlandeg@users.noreply.github.com>
Co-authored-by: Ines Montani <ines@ines.io>
2021-10-18 09:10:45 +02:00
Adriane Boyd
74ec37f7a8
Switch back to exclude pr paths (#9486) 2021-10-16 23:48:55 +02:00
Adriane Boyd
fca242b34e
Avoid initial wildcards in PR CI filter (#9477)
* Avoid initial wildcards in PR CI filter

* Adjust wildcard patterns
2021-10-15 17:32:00 +02:00
Adriane Boyd
271e8e7856
Skip compat table tests for prerelease versions (#9476) 2021-10-15 14:28:02 +02:00
github-actions[bot]
29e83f0819
Auto-format code with black (#9474)
* Auto-format code with black

* Update spacy/pipeline/pipe.pyi

Co-authored-by: explosion-bot <explosion-bot@users.noreply.github.com>
Co-authored-by: Sofie Van Landeghem <svlandeg@users.noreply.github.com>
2021-10-15 11:36:49 +02:00
Aviora
9a824255d3
Add examples and num_words for Vietnamese (#9412)
* add examples and num_words

* add contributor agreement

* Update spacy/lang/vi/examples.py

Co-authored-by: Sofie Van Landeghem <svlandeg@users.noreply.github.com>

* consistent format

add empty line at the end of file

Co-authored-by: Sofie Van Landeghem <svlandeg@users.noreply.github.com>
2021-10-14 19:15:51 +02:00
Adriane Boyd
b5143b1b84
Minor fixes to convert CLI (#9465)
* Provide default value for `msg`
* Compare paths correctly for file conversion
2021-10-14 18:37:34 +02:00
Connor Brinton
657af5f91f
🏷 Add Mypy check to CI and ignore all existing Mypy errors (#9167)
* 🚨 Ignore all existing Mypy errors

* 🏗 Add Mypy check to CI

* Add types-mock and types-requests as dev requirements

* Add additional type ignore directives

* Add types packages to dev-only list in reqs test

* Add types-dataclasses for python 3.6

* Add ignore to pretrain

* 🏷 Improve type annotation on `run_command` helper

The `run_command` helper previously declared that it returned an
`Optional[subprocess.CompletedProcess]`, but it isn't actually possible
for the function to return `None`. These changes modify the type
annotation of the `run_command` helper and remove all now-unnecessary
`# type: ignore` directives.

* 🔧 Allow variable type redefinition in limited contexts

These changes modify how Mypy is configured to allow variables to have
their type automatically redefined under certain conditions. The Mypy
documentation contains the following example:

```python
def process(items: List[str]) -> None:
    # 'items' has type List[str]
    items = [item.split() for item in items]
    # 'items' now has type List[List[str]]
    ...
```

This configuration change is especially helpful in reducing the number
of `# type: ignore` directives needed to handle the common pattern of:
* Accepting a filepath as a string
* Overwriting the variable using `filepath = ensure_path(filepath)`

These changes enable redefinition and remove all `# type: ignore`
directives rendered redundant by this change.

* 🏷 Add type annotation to converters mapping

* 🚨 Fix Mypy error in convert CLI argument verification

* 🏷 Improve type annotation on `resolve_dot_names` helper

* 🏷 Add type annotations for `Vocab` attributes `strings` and `vectors`

* 🏷 Add type annotations for more `Vocab` attributes

* 🏷 Add loose type annotation for gold data compilation

* 🏷 Improve `_format_labels` type annotation

* 🏷 Fix `get_lang_class` type annotation

* 🏷 Loosen return type of `Language.evaluate`

* 🏷 Don't accept `Scorer` in `handle_scores_per_type`

* 🏷 Add `string_to_list` overloads

* 🏷 Fix non-Optional command-line options

* 🙈 Ignore redefinition of `wandb_logger` in `loggers.py`

*  Install `typing_extensions` in Python 3.8+

The `typing_extensions` package states that it should be used when
"writing code that must be compatible with multiple Python versions".
Since spaCy needs to support multiple Python versions, it should be used
when newer `typing` module members are required. One example of this is
`Literal`, which is available starting with Python 3.8.

Previously SpaCy tried to import `Literal` from `typing`, falling back
to `typing_extensions` if the import failed. However, Mypy doesn't seem
to be able to understand what `Literal` means when the initial import
means. Therefore, these changes modify how `compat` imports `Literal` by
always importing it from `typing_extensions`.

These changes also modify how `typing_extensions` is installed, so that
it is a requirement for all Python versions, including those greater
than or equal to 3.8.

* 🏷 Improve type annotation for `Language.pipe`

These changes add a missing overload variant to the type signature of
`Language.pipe`. Additionally, the type signature is enhanced to allow
type checkers to differentiate between the two overload variants based
on the `as_tuple` parameter.

Fixes #8772

*  Don't install `typing-extensions` in Python 3.8+

After more detailed analysis of how to implement Python version-specific
type annotations in spaCy, it has been determined that branching on a
comparison against `sys.version_info` can be statically analyzed by Mypy
well enough to let us conditionally use
`typing_extensions.Literal`. This means that we no longer need to
install `typing_extensions` for Python versions greater than or equal to
3.8! 🎉

These changes revert previous changes installing `typing-extensions`
regardless of Python version and modify how we import the `Literal` type
to ensure that Mypy treats it properly.

* resolve mypy errors for Strict pydantic types

* refactor code to avoid missing return statement

* fix types of convert CLI command

* avoid list-set confusion in debug_data

* fix typo and formatting

* small fixes to avoid type ignores

* fix types in profile CLI command and make it more efficient

* type fixes in projects CLI

* put one ignore back

* type fixes for render

* fix render types - the sequel

* fix BaseDefault in language definitions

* fix type of noun_chunks iterator - yields tuple instead of span

* fix types in language-specific modules

* 🏷 Expand accepted inputs of `get_string_id`

`get_string_id` accepts either a string (in which case it returns its 
ID) or an ID (in which case it immediately returns the ID). These 
changes extend the type annotation of `get_string_id` to indicate that 
it can accept either strings or IDs.

* 🏷 Handle override types in `combine_score_weights`

The `combine_score_weights` function allows users to pass an `overrides` 
mapping to override data extracted from the `weights` argument. Since it 
allows `Optional` dictionary values, the return value may also include 
`Optional` dictionary values.

These changes update the type annotations for `combine_score_weights` to 
reflect this fact.

* 🏷 Fix tokenizer serialization method signatures in `DummyTokenizer`

* 🏷 Fix redefinition of `wandb_logger`

These changes fix the redefinition of `wandb_logger` by giving a 
separate name to each `WandbLogger` version. For 
backwards-compatibility, `spacy.train` still exports `wandb_logger_v3` 
as `wandb_logger` for now.

* more fixes for typing in language

* type fixes in model definitions

* 🏷 Annotate `_RandomWords.probs` as `NDArray`

* 🏷 Annotate `tok2vec` layers to help Mypy

* 🐛 Fix `_RandomWords.probs` type annotations for Python 3.6

Also remove an import that I forgot to move to the top of the module 😅

* more fixes for matchers and other pipeline components

* quick fix for entity linker

* fixing types for spancat, textcat, etc

* bugfix for tok2vec

* type annotations for scorer

* add runtime_checkable for Protocol

* type and import fixes in tests

* mypy fixes for training utilities

* few fixes in util

* fix import

* 🐵 Remove unused `# type: ignore` directives

* 🏷 Annotate `Language._components`

* 🏷 Annotate `spacy.pipeline.Pipe`

* add doc as property to span.pyi

* small fixes and cleanup

* explicit type annotations instead of via comment

Co-authored-by: Adriane Boyd <adrianeboyd@gmail.com>
Co-authored-by: svlandeg <sofie.vanlandeghem@gmail.com>
Co-authored-by: svlandeg <svlandeg@github.com>
2021-10-14 15:21:40 +02:00
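The `sys.version_info` branching mentioned above follows a standard pattern; here is a minimal, self-contained sketch of the idea (spaCy's actual `compat` module may differ in detail):

```python
import sys

if sys.version_info >= (3, 8):
    from typing import Literal
else:
    from typing_extensions import Literal  # needs typing_extensions on 3.6/3.7

Mode = Literal["efficiency", "accuracy"]

def pick_pipeline(mode: Mode) -> str:
    # Mypy can narrow the allowed strings because the branch above is
    # statically analyzable.
    return "en_core_web_sm" if mode == "efficiency" else "en_core_web_trf"

print(pick_pipeline("efficiency"))
```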
Adriane Boyd
631c170fa5
Adjust PR CI includes (#9464) 2021-10-14 13:46:09 +02:00
Adriane Boyd
8db574e0b5
Temporarily ignore W095 in assemble CLI CI test (#9460)
* Temporarily ignore W095 in assemble CLI CI test

* Adjust PR CI includes
2021-10-14 13:27:39 +02:00
Adriane Boyd
fe6d63aedc
Merge pull request #9459 from adrianeboyd/chore/v3.2.0.dev0
Set version to v3.2.0.dev0
2021-10-14 11:07:19 +02:00
Adriane Boyd
bd6433bbab Temporarily use v3.1.0 models in CI 2021-10-14 10:31:11 +02:00
Adriane Boyd
8a018f5207 Set version to v3.2.0.dev0 2021-10-14 10:31:11 +02:00
Adriane Boyd
d8e8ffdc5c
Merge pull request #9458 from adrianeboyd/chore/update-develop-from-master-v3.1-3
Update develop from master for v3.2.0
2021-10-14 10:30:33 +02:00
Adriane Boyd
d98d525bc8 Merge remote-tracking branch 'upstream/master' into chore/update-develop-from-master-v3.1-3 2021-10-14 09:41:46 +02:00
Edward
9dfb12e29f Update universe example codes (#9422)
* Update universe plugins

* Adjust azure trigger

* Add init to tests/universe

* deliberately trying to break the universe to see if the CI catches it

* revert

Co-authored-by: svlandeg <svlandeg@github.com>
2021-10-14 09:37:05 +02:00
Paul O'Leary McCann
a3b7519aba
Fix JA Morph Values (#9449)
* Don't set empty / weird values in morph

* Update tests to handle empty morph values

* Fix everything

* Replace potentially problematic characters

* Fix test
2021-10-14 09:21:36 +02:00
Ines Montani
c48564688f
Merge pull request #9423 from explosion/tests/issue-marker 2021-10-13 16:53:40 +02:00
Edward
72711dc2c9
Update universe example codes (#9422)
* Update universe plugins

* Adjust azure trigger

* Add init to tests/universe

* deliberately trying to break the universe to see if the CI catches it

* revert

Co-authored-by: svlandeg <svlandeg@github.com>
2021-10-13 16:29:19 +02:00
Jette16
78365452d3
Moved test for universe into .github folder (#9447)
* Moved universe-test into .github folder

* Cleaned code

* CHanged a file name
2021-10-13 14:13:06 +02:00
Sofie Van Landeghem
d2645b2e03
Fix test for spancat (#9446)
* fix test for spancat

* increase tolerance for almost equal checks

* Update spacy/tests/test_models.py

* Update spacy/tests/test_models.py
2021-10-13 10:48:35 +02:00
Sofie Van Landeghem
2e3d6b8b5a
Fix test for spancat (#9446)
* fix test for spancat

* increase tolerance for almost equal checks

* Update spacy/tests/test_models.py

* Update spacy/tests/test_models.py
2021-10-13 10:47:56 +02:00
Sofie Van Landeghem
5e8e8525f0
fix W108 filter (#9438)
* remove text argument from W108 to enable 'once' filtering

* include the option of partial POS annotation

* fix typo

* Update spacy/errors.py

Co-authored-by: Adriane Boyd <adrianeboyd@gmail.com>
2021-10-12 19:56:44 +02:00
Lj Miranda
6425b9a1c4
Include JsonlCorpus from the imports (#9431) 2021-10-12 15:39:14 +02:00
Ryn Daniels
bb6623bb2d
Merge pull request #9426 from explosion/rfd-bot-config
Add allowed_teams to the explosion-bot config
2021-10-12 09:50:39 +02:00
Ryn Daniels
2fb420ec23 Add allowed_teams to the explosion-bot config 2021-10-11 18:20:48 +02:00
Ryn Daniels
f64e39fa49
Install explosionbot as a github action (#9420) 2021-10-11 15:43:27 +02:00
Paul O'Leary McCann
efe5beefe0
Add test for case where parser overwrite annotations (#9406)
* Add test for case where parser overwrite annotations

* Move test to its own file

Also add note about how other tokens modify results.

* Fix xfail decorator
2021-10-11 14:57:45 +02:00
Ines Montani
1fa7c4e73b Support issue marker via pytest 2021-10-11 13:56:24 +02:00
Paul O'Leary McCann
3b429619a8 Fix UD POS docs links (fix #9013) (#9407)
* Fix UD POS docs links (fix #9013)

The previous link seems to have been for UD v1.

* Fix link
2021-10-11 11:51:59 +02:00
Paul O'Leary McCann
b53e39455e
Fix UD POS docs links (fix #9013) (#9407)
* Fix UD POS docs links (fix #9013)

The previous link seems to have been for UD v1.

* Fix link
2021-10-11 11:51:19 +02:00
Paul O'Leary McCann
fd759a881b
Fix inconsistent lemmas (#9405)
* Add util function to unique lists and preserve order

* Use unique function instead of list(set())

list(set()) has the issue that it's not consistent between runs of the
Python interpreter, so order can vary.

list(set()) calls were left in a few places where they were behind calls
to sorted(). I think in this case the calls to list() can be removed,
but this commit doesn't do that.

* Use the existing pattern for this
2021-10-11 11:38:45 +02:00
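The order-preserving de-duplication described above boils down to a few lines; this is a generic sketch of the idea, not the exact helper added to `spacy.util`:

```python
from typing import Iterable, List, TypeVar

T = TypeVar("T")

def ordered_unique(items: Iterable[T]) -> List[T]:
    # Unlike list(set(items)), this keeps first-occurrence order, so the
    # result is stable across interpreter runs.
    seen = set()
    out = []
    for item in items:
        if item not in seen:
            seen.add(item)
            out.append(item)
    return out

print(ordered_unique(["run", "ran", "run", "running"]))  # ['run', 'ran', 'running']
```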
Adriane Boyd
fd91e6a33c Fix types descriptions of sm and sent models (#9401) 2021-10-11 11:18:10 +02:00
Adriane Boyd
fd7edbc645
Fix types descriptions of sm and sent models (#9401) 2021-10-11 11:17:18 +02:00
Adriane Boyd
bbe4d3300a Remove traces of lexemes from vocab serialization (#9400) 2021-10-11 11:15:51 +02:00
Sofie Van Landeghem
a6ac36bcb3 Doc fixes in convert API (#9350)
* add more info on the spacy debug command

* formatting
2021-10-11 11:15:20 +02:00
Adriane Boyd
a5231cb044
Remove traces of lexemes from vocab serialization (#9400) 2021-10-11 11:13:35 +02:00
Jette16
3b144a3a51 Add universe test (#9278)
* Added test for universe.json

* Added contributor agreement

* Ran black on test_universe_json.py
2021-10-11 11:08:46 +02:00
Ines Montani
5003a9c3c7
Move core training logic in CLI into standalone function (#9398) 2021-10-11 10:56:14 +02:00
Adriane Boyd
ae1b3e960b
Update overwrite and scorer in API docs (#9384)
* Update overwrite and scorer in API docs

* Rephrase morphologizer extend + example
2021-10-11 10:35:07 +02:00
Paul O'Leary McCann
2a7e327310
Fix Dependency Matcher Ordering Issue (#9337)
* Fix inconsistency

This makes the failing test pass, so that behavior is consistent whether
patterns are added in one call or two.

The issue is that the hash for patterns depended on the index of the
pattern in the list of current patterns, not the list of total patterns,
so a second call would get identical match ids.

* Add illustrative test case

* Add failing test for remove case

Patterns are not removed from the internal matcher on calls to remove,
which causes spurious weird matches (or misses).

* Fix removal issue

Remove patterns from the internal matcher.

* Check that the single add call also gets no matches
2021-10-11 10:26:13 +02:00
Paul O'Leary McCann
5dbe4e8392 Update new issue config with Python 3.10 info
Also adds note that Install issues go to Discussions.
2021-10-11 15:41:32 +09:00
Paul O'Leary McCann
48ba4e60f4
Add new style citation file (#9388) 2021-10-07 17:47:39 +02:00
Paul O'Leary McCann
113d53ab6c
Fix tests for changes to inflection structure (#9390) 2021-10-07 13:42:18 +02:00
Paul O'Leary McCann
c4e3b7a5db Change JA inflection separator to semicolon
Hyphen is unsuitable because of interactions with the JA data fields,
but pipe is also unsuitable because it has a different meaning in UD
data, so it's better to use something that has no significance in either
case. So this uses semicolon.
2021-10-07 17:28:15 +09:00
Paul O'Leary McCann
227f98081b Use a pipe for separating Japanese inflections
Inflection values look like this, pipe-separated:

    五段-ラ行|連用形-促音便

So using a hyphen erases the original fields.
2021-10-07 17:14:05 +09:00
Paul O'Leary McCann
f975690cc9 Use hyphen to join parts of inflection in JA tokenizer 2021-10-07 17:09:38 +09:00
Sofie Van Landeghem
f87ae3cb7d
Doc fixes in convert API (#9350)
* add more info on the spacy debug command

* formatting
2021-10-06 13:13:18 +09:00
Elia Robyn Lake (Robyn Speer)
53b5f245ed
Allow IETF language codes, aliases, and close matches (#9342)
* use language-matching to allow language code aliases

Signed-off-by: Elia Robyn Speer <elia@explosion.ai>

* link to "IETF language tags" in docs

Signed-off-by: Elia Robyn Speer <elia@explosion.ai>

* Make requirements consistent

Signed-off-by: Elia Robyn Speer <elia@explosion.ai>

* change "two-letter language ID" to "IETF language tag" in language docs

Signed-off-by: Elia Robyn Speer <elia@explosion.ai>

* use langcodes 3.2 and handle language-tag errors better

Signed-off-by: Elia Robyn Speer <elia@explosion.ai>

* all unknown language codes are ImportErrors

Signed-off-by: Elia Robyn Speer <elia@explosion.ai>

Co-authored-by: Elia Robyn Speer <elia@explosion.ai>
2021-10-05 09:52:22 +02:00
Adriane Boyd
4192e71599
Sync vocab in vectors and components sourced in configs (#9335)
Since a component may reference anything in the vocab, share the full
vocab when loading source components and vectors (which will include
`strings` as of #8909).

When loading a source component from a config, save and restore the
vocab state after loading source pipelines, in particular to preserve
the original state without vectors, since `[initialize.vectors]
= null` skips rather than resets the vectors.

The vocab references are not synced for components loaded with
`Language.add_pipe(source=)` because the pipelines are already loaded
and not necessarily with the same vocab. A warning could be added in
`Language.create_pipe_from_source` that it may be necessary to save and
reload before training, but it's a rare enough case that this kind of
warning may be too noisy overall.
2021-10-04 12:19:02 +02:00
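For the runtime case mentioned in the last paragraph, a small sketch (assumes `en_core_web_sm` is installed); note that, unlike config-based sourcing, `add_pipe(source=...)` does not sync the vocab:

```python
import spacy

source_nlp = spacy.load("en_core_web_sm")
nlp = spacy.blank("en")
# Copy the trained NER component into a fresh pipeline.
nlp.add_pipe("ner", source=source_nlp)
print(nlp.pipe_names)  # ['ner']
```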
Paul O'Leary McCann
1ee6541ab0
Moving Japanese tokenizer extra info to Token.morph (#8977)
* Use morph for extra Japanese tokenizer info

Previously Japanese tokenizer info that didn't correspond to Token
fields was put in user data. Since spaCy core should avoid touching user
data, this moves most information to the Token.morph attribute. It also
adds the normalized form, which wasn't exposed before.

The subtokens, which are a list of full tokens, are still added to user
data, except with the default tokenizer granularity. With the default
tokenizer settings the subtokens are all None, so in this case the user
data is simply not set.

* Update tests

Also adds a new test for norm data.

* Update docs

* Add Japanese morphologizer factory

Set the default to `extend=True` so that the morphologizer does not
clobber the values set by the tokenizer.

* Use the norm_ field for normalized forms

Before this commit, normalized forms were put in the "norm" field in the
morph attributes. I am not sure why I did that instead of using the
token morph; I think I just forgot about it.

* Skip test if sudachipy is not installed

* Fix import

Co-authored-by: Adriane Boyd <adrianeboyd@gmail.com>
2021-10-01 19:19:26 +02:00
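A quick look at where the extra tokenizer information now lives (assumes the Japanese extras, i.e. SudachiPy and its dictionary, are installed); the feature names shown match the later rename to `Reading`/`Inflection`:

```python
import spacy

nlp = spacy.blank("ja")  # requires sudachipy + sudachidict_core
doc = nlp("すもももももももものうち")
for token in doc:
    # Reading and Inflection are exposed via Token.morph; norm_ carries
    # the normalized form instead of a "norm" morph field.
    print(token.text, token.norm_, token.morph.get("Reading"), token.morph.get("Inflection"))
```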
Paul O'Leary McCann
8f2409e514
Don't serialize user data in DocBin if not saving it (fix #9190) (#9226)
* Don't store user data if told not to (fix #9190)

* Add unit tests for the store_user_data setting
2021-10-01 12:37:39 +02:00
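A short sketch of the setting in action; with `store_user_data=False` (the default) the user data is now skipped during serialization as well:

```python
import spacy
from spacy.tokens import DocBin

nlp = spacy.blank("en")
doc = nlp("An example document.")
doc.user_data["source"] = "unit-test"

doc_bin = DocBin(store_user_data=False)  # user data will not be serialized
doc_bin.add(doc)
data = doc_bin.to_bytes()

restored = list(DocBin().from_bytes(data).get_docs(nlp.vocab))[0]
print(restored.user_data)  # {} – the user data was not stored
```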
Paul O'Leary McCann
23badbd55c Updating Troubleshooting Docs (#9329)
* Add link to Discussions FAQ

* Remove old FAQ entries

I think these are no longer relevant.

- no-cache-dir: affected pip versions are *very* old now
- narrow unicode: not an issue from py3.3+
- utf-8 osx: upstream bug closed in 2019

Some of the other issues are also maybe not frequent.
2021-10-01 12:31:41 +02:00
Paul O'Leary McCann
6e833b617a
Updating Troubleshooting Docs (#9329)
* Add link to Discussions FAQ

* Remove old FAQ entries

I think these are no longer relevant.

- no-cache-dir: affected pip versions are *very* old now
- narrow unicode: not an issue from py3.3+
- utf-8 osx: upstream bug closed in 2019

Some of the other issues are also maybe not frequent.
2021-10-01 12:28:22 +02:00
github-actions[bot]
42a76c758f
Auto-format code with black (#9346)
Co-authored-by: explosion-bot <explosion-bot@users.noreply.github.com>
2021-10-01 11:17:11 +02:00
Adriane Boyd
b3192ddea3
Sync thinc install dep in setup, fix test packaging (#9336)
* Sync thinc install dep in setup

* Add __init__.py to include package tests in package

* Include *.toml in package
2021-09-30 19:02:10 +02:00
Adriane Boyd
03fefa37e2
Add overwrite settings for more components (#9050)
* Add overwrite settings for more components

For pipeline components where it's relevant and not already implemented,
add an explicit `overwrite` setting that controls whether
`set_annotations` overwrites existing annotation.

For the `morphologizer`, add an additional setting `extend`, which
controls whether the existing features are preserved.

* +overwrite, +extend: overwrite values of existing features, add any new
features
* +overwrite, -extend: overwrite completely, removing any existing
features
* -overwrite, +extend: keep values of existing features, add any new
features
* -overwrite, -extend: do not modify the existing value if set

In all cases an unset value will be set by `set_annotations`.

Preserve current overwrite defaults:

* True: morphologizer, entity linker
* False: tagger, sentencizer, senter

* Add backwards compat overwrite settings

* Put empty line back

Removed by accident in last commit

* Set backwards-compatible defaults in __init__

Because the `TrainablePipe` serialization methods update `cfg`, there's
no straightforward way to detect whether models serialized with a
previous version are missing the overwrite settings.

It would be possible in the sentencizer due to its separate
serialization methods; however, to keep the changes parallel, this also
sets the default in `__init__`.

* Remove traces

Co-authored-by: Paul O'Leary McCann <polm@dampfkraft.com>
2021-09-30 15:35:55 +02:00
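How the new settings look when adding a component in code (a sketch only; which settings a component accepts follows the matrix above, and the values here are just one combination):

```python
import spacy

nlp = spacy.blank("en")
# -overwrite, +extend: keep values of existing features, add any new ones.
morphologizer = nlp.add_pipe(
    "morphologizer", config={"overwrite": False, "extend": True}
)
print(morphologizer.cfg)
```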
Jim O’Regan
8fe525beb5
Add an Irish lemmatiser, based on BuNaMo (#9102)
* add tréis/théis

* remove previous contents, add demutate/unponc

* fmt off/on wrapping

* type hints

* IrishLemmatizer (sic)

* Use spacy-lookups-data>=1.0.3

* Minor bug fixes, refactoring for IrishLemmatizer

* Fix return type for ADP list lookups
* Fix and refactor lookup table lookups for missing/string/list
* Remove unused variables

* skip lookup of verbal substantives and adjectives; just demutate

* Fix morph checks API details

* Add types and format

* Move helper methods into lemmatizer

Co-authored-by: Adriane Boyd <adrianeboyd@gmail.com>
2021-09-30 14:18:47 +02:00
Paul O'Leary McCann
0508795d67 Fix invalid json 2021-09-30 15:24:47 +09:00
Paul O'Leary McCann
78a88f7de7 Fix invalid json 2021-09-30 15:23:55 +09:00
Martin Vallone
f15bb40941 Adding PhruzzMatcher to spaCy universe (#9321)
* Adding PhruzzMatcher to spaCy universe

* Fixes to make the package work properly
2021-09-30 14:26:40 +09:00
Martin Vallone
a14ab7e882
Adding PhruzzMatcher to spaCy universe (#9321)
* Adding PhruzzMatcher to spaCy universe

* Fixes to make the package work properly
2021-09-30 13:46:53 +09:00
Elia Robyn Lake (Robyn Speer)
5b0b0ca809
Move WandB loggers into spacy-loggers (#9223)
* factor out the WandB logger into spacy-loggers

Signed-off-by: Elia Robyn Speer <gh@arborelia.net>

* depend on spacy-loggers so they are available

Signed-off-by: Elia Robyn Speer <gh@arborelia.net>

* remove docs of spacy.WandbLogger.v2 (moved to spacy-loggers)

Signed-off-by: Elia Robyn Speer <elia@explosion.ai>

* Version number suggestions from code review

Co-authored-by: Adriane Boyd <adrianeboyd@gmail.com>

* update references to WandbLogger

Signed-off-by: Elia Robyn Speer <elia@explosion.ai>

* make order of deps more consistent

Signed-off-by: Elia Robyn Speer <elia@explosion.ai>

Co-authored-by: Elia Robyn Speer <elia@explosion.ai>
Co-authored-by: Adriane Boyd <adrianeboyd@gmail.com>
2021-09-29 11:12:50 +02:00
Adriane Boyd
e750c1760c
Restore tokenization timing in Language.evaluate (#9305)
Restore tokenization timing steps that were accidentally removed in #6765.
2021-09-27 20:44:14 +02:00
Sofie Van Landeghem
a361df00cd
Raise E983 early on in docbin init (#9247)
* raise E983 early on in docbin init

* catch situation before error is raised

* add more info on the spacy debug command
2021-09-27 20:43:03 +02:00
Adriane Boyd
effae12cbd
Update slow readers test to use textcat_multilabel (#9300) 2021-09-27 20:04:02 +02:00
Adriane Boyd
fe5f5d6ac6
Update Catalan tokenizer (#9297)
* Update Makefile

For more recent python version

* updated for bsc changes

New tokenization changes

* Update test_text.py

* updating tests and requirements

* changed failed test in test/lang/ca

changed failed test in test/lang/ca

* Update .gitignore

deleted stashed changes line

* back to python 3.6 and remove transformer requirements

As per request

* Update test_exception.py

Change the test

* Update test_exception.py

Remove test print

* Update Makefile

For more recent python version

* updated for bsc changes

New tokenization changes

* updating tests and requirements

* Update requirements.txt

Removed spacy-transfromers from requirements

* Update test_exception.py

Added final punctuation to ensure consistency

* Update Makefile

Co-authored-by: Sofie Van Landeghem <svlandeg@users.noreply.github.com>

* Format

* Update test to check all tokens

Co-authored-by: cayorodriguez <crodriguezp@gmail.com>
Co-authored-by: Sofie Van Landeghem <svlandeg@users.noreply.github.com>
2021-09-27 14:42:30 +02:00
Adriane Boyd
200121a035
Merge pull request #9296 from adrianeboyd/chore/update-develop-from-master-v3.1-2
Update develop from master
2021-09-27 11:19:00 +02:00
Adriane Boyd
12ab49342c Sync requirements in setup.cfg 2021-09-27 09:16:31 +02:00
Adriane Boyd
03f234b739 Merge remote-tracking branch 'upstream/master' into develop 2021-09-27 09:10:45 +02:00
github-actions[bot]
4da2af4e0e
Auto-format code with black (#9284)
Co-authored-by: explosion-bot <explosion-bot@users.noreply.github.com>
2021-09-24 10:46:43 +02:00
Jette16
5eced281d8
Add universe test (#9278)
* Added test for universe.json

* Added contributor agreement

* Ran black on test_universe_json.py
2021-09-23 14:31:42 +02:00
Ines Montani
6bb0324b81 Adjust kb_id visualizer templating and docs 2021-09-23 11:59:02 +02:00
Ines Montani
beb4a8c524
Merge pull request #9199 from shigapov/master (resolves #9129) 2021-09-23 19:41:53 +10:00
Philip Vollet
d2adfe1efa
Add projects to spaCy Universe (#9269)
* Added spaCy Universe projects

* Added user license agreement Philip Vollet

* Update website/meta/universe.json

Co-authored-by: Sofie Van Landeghem <svlandeg@users.noreply.github.com>

* Update website/meta/universe.json

Co-authored-by: Sofie Van Landeghem <svlandeg@users.noreply.github.com>

* Update website/meta/universe.json

Co-authored-by: Sofie Van Landeghem <svlandeg@users.noreply.github.com>

Co-authored-by: Sofie Van Landeghem <svlandeg@users.noreply.github.com>
2021-09-23 10:56:45 +02:00
Ines Montani
57b5fc1995
Apply suggestions from code review
Co-authored-by: Renat Shigapov <57352291+shigapov@users.noreply.github.com>
2021-09-23 17:58:32 +10:00
Sofie Van Landeghem
3fc3b7a13a
avoid crash when unicode in title (#9254) 2021-09-22 21:01:34 +02:00
Rumesh Madhusanka
68264b4cee
Updating the stop word list for Sinhala language (#9270) 2021-09-22 20:43:42 +02:00
Adriane Boyd
2f0bb77920
Accept Doc input in pipelines (#9069)
* Accept Doc input in pipelines

Allow `Doc` input to `Language.__call__` and `Language.pipe`, which
skips `Language.make_doc` and passes the doc directly to the pipeline.

* ensure_doc helper function

* avoid running multiple processes on GPU

* Update spacy/tests/test_language.py

Co-authored-by: svlandeg <svlandeg@github.com>
2021-09-22 09:41:05 +02:00
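A minimal sketch of the new input handling described above, using a blank pipeline with a sentencizer:

```python
import spacy

nlp = spacy.blank("en")
nlp.add_pipe("sentencizer")

# A pre-made Doc can now be passed straight to the pipeline; make_doc is
# skipped and the components run on the existing object.
doc = nlp.make_doc("This is one sentence. This is another.")
doc = nlp(doc)
print([sent.text for sent in doc.sents])

# The same works for Language.pipe with an iterable of Doc objects.
docs = list(nlp.pipe([nlp.make_doc("Another example.")]))
print(len(docs))
```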
Daniël de Kok
17802836be
Allow overriding vars in the project assets subcommand (#9248)
This change makes the `project assets` subcommand accept variables to
override as well, making the interface more similar to `project run`.
2021-09-21 10:49:45 +02:00
Adriane Boyd
00bdb31150
Fix vector for 0-length span (#9244) 2021-09-20 20:22:49 +02:00
svlandeg
ec621e6853 Merge remote-tracking branch 'upstream/master' into spacy.io 2021-09-20 15:54:00 +02:00
svlandeg
e0e3e9653b Revert "raise E983 early on in docbin init"
This reverts commit f3f7afa21f.
2021-09-20 15:52:02 +02:00
svlandeg
f3f7afa21f raise E983 early on in docbin init 2021-09-20 15:49:31 +02:00
github-actions[bot]
015d439eb6
Auto-format code with black (#9234)
Co-authored-by: explosion-bot <explosion-bot@users.noreply.github.com>
2021-09-20 08:49:19 +02:00
Edward
79c7c62970 Update Hammurabi example code to v3 (#9218)
* Update Hammurabi example code

* Fix typo
2021-09-16 13:35:00 +02:00
Edward
8bda39f088
Update Hammurabi example code to v3 (#9218)
* Update Hammurabi example code

* Fix typo
2021-09-16 13:32:44 +02:00
Paul O'Leary McCann
c4f0800fb8
Validate pos values when creating Doc (#9148)
* Validate pos values when creating Doc

* Add clear error when setting invalid pos

This also changes the error language slightly.

* Fix variable name

* Update spacy/tokens/doc.pyx

* Test that setting invalid pos raises an error

Co-authored-by: Sofie Van Landeghem <svlandeg@users.noreply.github.com>
2021-09-16 13:28:05 +02:00
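A small sketch of the validation described above; the exact error class caught here is an assumption for illustration:

```python
import spacy
from spacy.tokens import Doc

nlp = spacy.blank("en")
words = ["Colorless", "ideas", "sleep"]

# Valid UPOS values are accepted as before.
doc = Doc(nlp.vocab, words=words, pos=["ADJ", "NOUN", "VERB"])
print([t.pos_ for t in doc])

# An invalid value such as "ADJECTIVE" now raises a clear error instead
# of silently producing odd annotation.
try:
    Doc(nlp.vocab, words=words, pos=["ADJECTIVE", "NOUN", "VERB"])
except ValueError as e:  # error class assumed for this sketch
    print("rejected:", e)
```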
Jozef Harag
865cfbc903
feat: add spacy.WandbLogger.v3 with optional run_name and entity parameters (#9202)
* feat: add `spacy.WandbLogger.v3` with optional `run_name` and `entity` parameters

* update versioning in docs

Co-authored-by: svlandeg <sofie.vanlandeghem@gmail.com>
2021-09-16 12:26:41 +02:00
Sofie Van Landeghem
00836c2d7d
Update spacy/displacy/templates.py 2021-09-16 09:23:21 +02:00
Sofie Van Landeghem
4bf2606adf
Update spacy/displacy/render.py
Co-authored-by: Renat Shigapov <57352291+shigapov@users.noreply.github.com>
2021-09-16 09:22:38 +02:00
Paul O'Leary McCann
fd99438fb2 Make docs consistent (fix #9126) 2021-09-16 15:56:19 +09:00
Paul O'Leary McCann
1d57d78758 Make docs consistent (fix #9126) 2021-09-16 15:54:12 +09:00
Paul O'Leary McCann
9ceb8f413c
StringStore/Vocab dev docs (#9142)
* First take at StringStore/Vocab docs

Things to check:

1. The mysterious vocab members
2. How to make table of contents? Is it autogenerated?
3. Anything I missed / needs more detail?

* Update docs

* Apply suggestions from code review

Co-authored-by: Sofie Van Landeghem <svlandeg@users.noreply.github.com>

* Updates based on review feedback

* Minor fix

* Move example code down

Co-authored-by: Sofie Van Landeghem <svlandeg@users.noreply.github.com>
2021-09-16 12:50:22 +09:00
Ines Montani
20f63e7154
Only include runtime-relevant config in package CLI dependency detection (#9211) 2021-09-15 23:16:01 +02:00
Paul O'Leary McCann
cd75f96501
Remove two attributes marked for removal in 3.1 (#9150)
* Remove two attributes marked for removal in 3.1

* Add back unused ints with changed names

* Change data_dir to _unused_object

This is still kept in the type definition, but I removed it from the
serialization code.

* Put serialization code back for now

Not sure how this interacts with old serialized models yet.
2021-09-15 23:07:21 +02:00
Adriane Boyd
d74870d38c
Prepare for v3.1.3 (#9200)
* Update thinc and spacy-legacy requirements

* Set version to v3.1.3
2021-09-14 11:03:51 +02:00
Paul O'Leary McCann
0f01f46e02
Update Cython string types (#9143)
* Replace all basestring references with unicode

`basestring` was a compatibility type introduced by Cython to make
dealing with utf-8 strings in Python2 easier. In Python3 it is
equivalent to the unicode (or str) type.

I replaced all references to basestring with unicode, since that was
used elsewhere, but we could also just replace them with str, which
should also be equivalent.

All tests pass locally.

* Replace all references to unicode type with str

Since we only support python3 this is simpler.

* Remove all references to unicode type

This removes all references to the unicode type across the codebase and
replaces them with `str`, which makes it more drastic than the prior
commits. In order to make this work importing `unicode_literals` had to
be removed, and one explicit unicode literal also had to be removed (it
is unclear why this is necessary in Cython with language level 3, but
without doing it there were errors about implicit conversion).

When `unicode` is used as a type in comments it was also edited to be
`str`.

Additionally `coding: utf8` headers were removed from a few files.
2021-09-13 17:02:17 +02:00
j-frei
5d0cc0d2ab Correct parser.py use_upper param info (#9180) 2021-09-13 09:29:11 +02:00
Renat Shigapov
d5cc009faf
Merge branch 'explosion:master' into master 2021-09-13 08:43:48 +02:00
Renat Shigapov
e61d93f8c3
add NEL-visualisation to manual-usage 2021-09-13 08:38:58 +02:00
Renat Shigapov
f4b5c4209d
specify kb_id and kb_url for URL visualisation 2021-09-13 08:15:07 +02:00
Renat Shigapov
7562fb5354
add links to entities into the TPL_ENT-template 2021-09-13 08:06:54 +02:00
Paul O'Leary McCann
9c4e84d4a1 Minor typo fix in docs 2021-09-11 14:23:11 +09:00
Paul O'Leary McCann
f89e1c34c9
Minor typo fix in docs 2021-09-11 14:22:05 +09:00
Renat Shigapov
2e2d0e8701 added spaCyOpenTapioca (#9181)
* add spaCyOpenTapioca to universe

* add agreement

* fix misprint in tags
2021-09-11 13:25:25 +09:00
mylibrar
d621df6422 Update example code of forte (#9175)
Co-authored-by: Suqi Sun <suqi.sun@petuum.com>
2021-09-11 13:25:17 +09:00
Renat Shigapov
646f3a54db
added spaCyOpenTapioca (#9181)
* add spaCyOpenTapioca to universe

* add agreement

* fix misprint in tags
2021-09-11 13:16:51 +09:00
mylibrar
ee28aac68e
Update example code of forte (#9175)
Co-authored-by: Suqi Sun <suqi.sun@petuum.com>
2021-09-11 13:13:13 +09:00
j-frei
462b009648
Correct parser.py use_upper param info (#9180) 2021-09-10 16:19:58 +02:00
Renat Shigapov
c1927fe994
fix misprint in tags 2021-09-09 15:37:34 +02:00
Renat Shigapov
8940e0baca
add agreement 2021-09-09 15:33:29 +02:00
Renat Shigapov
ea58294076
add spaCyOpenTapioca to universe 2021-09-09 15:13:18 +02:00
Adriane Boyd
aba6ce3a43
Handle spacy-legacy in package CLI for dependencies (#9163)
* Handle spacy-legacy in package CLI for dependencies

* Implement legacy backoff in spacy registry.find

* Remove unused import

* Update and format test
2021-09-08 11:46:40 +02:00
Sofie Van Landeghem
632d8d4c35
bump thinc to 8.0.9 (#9133) 2021-09-03 13:34:42 +02:00
github-actions[bot]
584fae5807
Auto-format code with black (#9130)
Co-authored-by: explosion-bot <explosion-bot@users.noreply.github.com>
2021-09-03 10:47:03 +02:00
Kevin Humphreys
ca93504660
Pass alignments to Matcher callbacks (#9001)
* pass alignments to callbacks

* refactor for single callback loop

* Update spacy/matcher/matcher.pyx

Co-authored-by: Sofie Van Landeghem <svlandeg@users.noreply.github.com>
2021-09-02 12:58:05 +02:00
Sofie Van Landeghem
721f4554c8 matcher doc corrections (#9115)
* update error message to current UX

* clarify uppercase effect

* fix docstring
2021-09-02 09:29:44 +02:00
Sofie Van Landeghem
8895e3c9ad
matcher doc corrections (#9115)
* update error message to current UX

* clarify uppercase effect

* fix docstring
2021-09-02 09:26:33 +02:00
Robyn Speer
d60b748e3c
Fix surprises when asking for the root of a git repo (#9074)
* Fix surprises when asking for the root of a git repo

In the case of the first asset I wanted to get from git, the data I
wanted was the entire repository. I tried leaving "path" blank, which
gave a less-than-helpful error, and then I tried `path: "/"`, which
started copying my entire filesystem into the project. The path I should
have used was "".

I've made two changes to make this smoother for others:

- The 'path' within a git clone defaults to ""
- If the path points outside of the tmpdir that the git clone goes
into, we fail with an error

Signed-off-by: Elia Robyn Speer <elia@explosion.ai>

* use a descriptive error instead of a default

plus some minor fixes from PR review

Signed-off-by: Elia Robyn Speer <elia@explosion.ai>

* check for None values in assets

Signed-off-by: Elia Robyn Speer <elia@explosion.ai>

Co-authored-by: Elia Robyn Speer <elia@explosion.ai>
2021-09-01 22:52:08 +02:00
Paul O'Leary McCann
752696f134 Document Assigned Attributes of Pipeline Components (#9041)
* Add textcat docs

* Add NER docs

* Add Entity Linker docs

* Add assigned fields docs for the tagger

This also adds a preamble, since there wasn't one.

* Add morphologizer docs

* Add dependency parser docs

* Update entityrecognizer docs

This is a little weird because `Doc.ents` is the only thing assigned to,
but it's actually a bidirectional property.

* Add token fields for entityrecognizer

* Fix section name

* Add entity ruler docs

* Add lemmatizer docs

* Add sentencizer/recognizer docs

* Update website/docs/api/entityrecognizer.md

Co-authored-by: Adriane Boyd <adrianeboyd@gmail.com>

* Update website/docs/api/entityruler.md

Co-authored-by: Adriane Boyd <adrianeboyd@gmail.com>

* Update website/docs/api/tagger.md

Co-authored-by: Adriane Boyd <adrianeboyd@gmail.com>

* Update website/docs/api/entityruler.md

Co-authored-by: Adriane Boyd <adrianeboyd@gmail.com>

* Update type for Doc.ents

This was `Tuple[Span, ...]` everywhere but `Tuple[Span]` seems to be
correct.

* Run prettier

* Apply suggestions from code review

Co-authored-by: Sofie Van Landeghem <svlandeg@users.noreply.github.com>

* Run prettier

* Add transformers section

This basically just moves and renames the "custom attributes" section
from the bottom of the page to be consistent with "assigned attributes"
on other pages.

I looked at moving the paragraph just above the section into the
section, but it includes the unrelated registry additions, so it seemed
better to leave it unchanged.

* Make table header consistent

Co-authored-by: Adriane Boyd <adrianeboyd@gmail.com>
Co-authored-by: Sofie Van Landeghem <svlandeg@users.noreply.github.com>
2021-09-01 12:10:52 +02:00
Paul O'Leary McCann
ba6a37d358
Document Assigned Attributes of Pipeline Components (#9041)
* Add textcat docs

* Add NER docs

* Add Entity Linker docs

* Add assigned fields docs for the tagger

This also adds a preamble, since there wasn't one.

* Add morphologizer docs

* Add dependency parser docs

* Update entityrecognizer docs

This is a little weird because `Doc.ents` is the only thing assigned to,
but it's actually a bidirectional property.

* Add token fields for entityrecognizer

* Fix section name

* Add entity ruler docs

* Add lemmatizer docs

* Add sentencizer/recognizer docs

* Update website/docs/api/entityrecognizer.md

Co-authored-by: Adriane Boyd <adrianeboyd@gmail.com>

* Update website/docs/api/entityruler.md

Co-authored-by: Adriane Boyd <adrianeboyd@gmail.com>

* Update website/docs/api/tagger.md

Co-authored-by: Adriane Boyd <adrianeboyd@gmail.com>

* Update website/docs/api/entityruler.md

Co-authored-by: Adriane Boyd <adrianeboyd@gmail.com>

* Update type for Doc.ents

This was `Tuple[Span, ...]` everywhere but `Tuple[Span]` seems to be
correct.

* Run prettier

* Apply suggestions from code review

Co-authored-by: Sofie Van Landeghem <svlandeg@users.noreply.github.com>

* Run prettier

* Add transformers section

This basically just moves and renames the "custom attributes" section
from the bottom of the page to be consistent with "assigned attributes"
on other pages.

I looked at moving the paragraph just above the section into the
section, but it includes the unrelated registry additions, so it seemed
better to leave it unchanged.

* Make table header consistent

Co-authored-by: Adriane Boyd <adrianeboyd@gmail.com>
Co-authored-by: Sofie Van Landeghem <svlandeg@users.noreply.github.com>
2021-09-01 12:09:39 +02:00
Paul O'Leary McCann
f803a84571
Fix inference of epoch_resume (#9084)
* Fix inference of epoch_resume

When an epoch_resume value is not specified individually, it can often
be inferred from the filename. The value inference code was there but
the value wasn't passed back to the training loop.

This also adds a specific error in the case where no epoch_resume value
is provided and it can't be inferred from the filename.

* Add new error

* Always use the epoch resume value if specified

Before this, the value in the filename was used if found.
2021-09-01 14:17:42 +09:00
Sofie Van Landeghem
a17b06d18b
allow typer 0.4 (#9089) 2021-08-31 20:53:51 +10:00
svlandeg
3f16c45281 Merge branch 'spacy.io' of https://github.com/explosion/spaCy into spacy.io 2021-08-31 10:58:40 +02:00
Davide Fiocco
5c88998b9d Fix point typo on docbin docs (#9097) 2021-08-31 10:58:31 +02:00
Ines Montani
753149bc88 Update references to contributor agreement [ci skip] 2021-08-31 10:58:22 +02:00
Davide Fiocco
1dd69be1f1
Fix point typo on docbin docs (#9097) 2021-08-31 10:55:44 +02:00
Ines Montani
1a86d545af Update references to contributor agreement [ci skip] 2021-08-31 10:03:38 +10:00
Sofie Van Landeghem
5af88427a2
Dev docs: listeners (#9061)
* Start Listeners documentation

* intro table of different architectures

* initialization, linking, dim inference

* internal comm (WIP)

* expand internal comm section

* frozen components and replacing listeners

* various small fixes

* fix content table

* fix link
2021-08-30 14:56:35 +02:00
Adriane Boyd
1e9b4b55ee
Pass overrides to subcommands in workflows (#9059)
* Pass overrides to subcommands in workflows

* Add missing docstring
2021-08-30 09:23:54 +02:00
Meenal Jhajharia
db42ba5240 benepar usage example has deprecated imports 2021-08-29 14:44:18 +09:00
Paul O'Leary McCann
6ff8d90070
Merge pull request #9081 from mjhajharia/patch-1
benepar usage example has deprecated imports
2021-08-29 14:41:52 +09:00
Meenal Jhajharia
2613f0e98f
benepar usage example has deprecated imports 2021-08-28 16:35:58 +05:30
Sofie Van Landeghem
689535c264 config is not Optional (#9024) 2021-08-27 11:53:54 +02:00
Sofie Van Landeghem
1e974de837
config is not Optional (#9024) 2021-08-27 11:44:31 +02:00
github-actions[bot]
fb9c31fbda
Auto-format code with black (#9065)
Co-authored-by: explosion-bot <explosion-bot@users.noreply.github.com>
2021-08-27 11:42:27 +02:00
Sofie Van Landeghem
8c1d86ea92 Document use-case of freezing tok2vec (#8992)
* update error msg

* add sentence to docs

* expand note on frozen components
2021-08-26 09:53:29 +02:00
Sofie Van Landeghem
31c0a75e6d fix docs for Span constructor arguments (#9023) 2021-08-26 09:52:59 +02:00
Sofie Van Landeghem
4d39430b82
Document use-case of freezing tok2vec (#8992)
* update error msg

* add sentence to docs

* expand note on frozen components
2021-08-26 09:50:35 +02:00
Sofie Van Landeghem
94fb840443
fix docs for Span constructor arguments (#9023) 2021-08-25 16:06:22 +02:00
David Strouk
31e9b126a0
Fix verbs list in lang/fr/tokenizer_exceptions.py (#9033) 2021-08-25 15:55:09 +02:00
Ines Montani
4cd052e81d
Include component factories in third-party dependencies resolver (#9009)
* Include component factories in third-party dependencies resolver

* Increment catalogue and update test
2021-08-25 14:58:01 +02:00
svlandeg
fb8c2f794a Merge remote-tracking branch 'upstream/master' into spacy.io 2021-08-20 14:49:51 +02:00
Sofie Van Landeghem
e1f88de729
bump to 3.1.2 (#9008) 2021-08-20 12:41:09 +02:00
Sofie Van Landeghem
4d52d7051c
Fix spancat training on nested entities (#9007)
* overfitting test on non-overlapping entities

* add failing overfitting test for overlapping entities

* failing test for list comprehension

* remove test that was put in separate PR

* bugfix

* cleanup
2021-08-20 12:37:50 +02:00
Paul O'Leary McCann
9cc3dc2b67
Add glossary entry for _SP (#8983) 2021-08-20 12:04:02 +02:00
Sofie Van Landeghem
de025beb5f
Warn and document spangroup.doc weakref (#8980)
* test for error after Doc has been garbage collected

* warn about using a SpanGroup when the Doc has been garbage collected

* add warning to the docs

* rephrase slightly

* raise error instead of warning

* update

* move warning to doc property
2021-08-20 11:06:19 +02:00
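For illustration, a minimal sketch of the behavior documented above (assumes spaCy v3.x; the key name is arbitrary):

```python
# Minimal sketch: a SpanGroup holds only a weak reference to its Doc,
# so keep the Doc alive while working with the group.
import spacy

nlp = spacy.blank("en")
doc = nlp("Keep a reference to the Doc while using its span groups.")
doc.spans["key"] = [doc[0:2], doc[3:5]]
group = doc.spans["key"]
print(group.doc is doc)  # fine: the Doc still exists

del doc  # once the Doc is garbage collected, group.doc raises an error,
         # which is the behavior introduced by this change
```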
Paul O'Leary McCann
0e4da8ed70 Fix type annotation in docs 2021-08-20 15:35:41 +09:00
Paul O'Leary McCann
37fe847af4 Fix type annotation in docs 2021-08-20 15:34:22 +09:00
Ines Montani
8444aa75e2 Fix universe.json [ci skip] 2021-08-20 11:26:46 +10:00
Ines Montani
f2b61b77a5 Fix universe.json [ci skip] 2021-08-20 11:26:29 +10:00
Ines Montani
f2d19e6dc2 Merge pull request #9003 from bbieniek/add-spacy-api-v3 [ci skip] 2021-08-20 11:23:50 +10:00
Ines Montani
894e16f5ca
Merge pull request #9003 from bbieniek/add-spacy-api-v3 [ci skip] 2021-08-20 11:23:30 +10:00
Baltazar
4d85cb88a5 added contribution license 2021-08-19 21:45:18 +02:00
Baltazar
71e65fe943 added spacy api v3 docker 2021-08-19 21:29:25 +02:00
Adriane Boyd
c5de9b463a
Update custom tokenizer APIs and pickling (#8972)
* Fix incorrect pickling of Japanese and Korean pipelines, which led to
the entire pipeline being reset if pickled

* Enable pickling of Vietnamese tokenizer

* Update tokenizer APIs for Chinese, Japanese, Korean, Thai, and
Vietnamese so that only the `Vocab` is required for initialization
2021-08-19 14:37:47 +02:00
Adriane Boyd
6722dc3dc5
Fix allow_overlap default for spancat scoring (#8970)
* Remove irrelevant default options
2021-08-18 09:56:56 +02:00
Steele Farnsworth
b18cb1cd2a
Refactor dependencymatcher.pyx to use list comps and enumerate. (#8956)
* Refactor to use list comps and enumerate.

Replace loops that append to a list with list comprehensions where this does not change the behavior; replace range(len(...)) loops with enumerate. Correct one typo in a comment. Replace a call to set() with a set literal.

* Undo double assignment.

Expand `tokens_to_key[j] = k = self._get_matcher_key(key, i, j)` to two statements.

Co-authored-by: Sofie Van Landeghem <svlandeg@users.noreply.github.com>

* Sign contributors agreement

Co-authored-by: Sofie Van Landeghem <svlandeg@users.noreply.github.com>
2021-08-18 09:55:45 +02:00
Ines Montani
d94ddd5686
Auto-detect package dependencies in spacy package (#8948)
* Auto-detect package dependencies in spacy package

* Add simple get_third_party_dependencies test

* Import packages_distributions explicitly

* Inline packages_distributions

* Fix docstring [ci skip]

* Relax catalogue requirement

* Move importlib_metadata to spacy.compat with note

* Include license information [ci skip]
2021-08-17 14:05:13 +02:00
Sofie Van Landeghem
0a6b68848f
Fix making span_group (#8975)
* fix _make_span_group

* fix imports
2021-08-17 10:36:34 +02:00
Ines Montani
593a22cf2d
Add development docs for Language and code conventions (#8745)
* WIP: add dev docs for Language / config [ci skip]

* Add section on initialization [ci skip]

* Fix wording [ci skip]

* Add code conventions WIP [ci skip]

* Update code convention docs [ci skip]

* Update contributing guide and conventions [ci skip]

* Update Code Conventions.md [ci skip]

* Clarify sourced components + vectors

* Apply suggestions from code review [ci skip]

Co-authored-by: Sofie Van Landeghem <svlandeg@users.noreply.github.com>

* Update wording and add link [ci skip]

* restructure slightly + extended index

* remove paragraph that breaks flow and is repeated in more detail later

* fix anchors

Co-authored-by: Adriane Boyd <adrianeboyd@gmail.com>
Co-authored-by: Sofie Van Landeghem <svlandeg@users.noreply.github.com>
Co-authored-by: svlandeg <sofie.vanlandeghem@gmail.com>
2021-08-17 09:38:15 +02:00
Paul O'Leary McCann
4ed5d9ad5a Add notes on preparing training data to docs (#8964)
* Add training data section

Not entirely sure this is in the right location on the page - maybe it
should be after quickstart?

* Add pointer from binary format to training data section

* Minor cleanup

* Add to ToC, fix filename

* Update website/docs/usage/training.md

Co-authored-by: Ines Montani <ines@ines.io>

* Update website/docs/usage/training.md

Co-authored-by: Sofie Van Landeghem <svlandeg@users.noreply.github.com>

* Update website/docs/usage/training.md

Co-authored-by: Sofie Van Landeghem <svlandeg@users.noreply.github.com>

* Move the training data section further down the page

* Update website/docs/usage/training.md

Co-authored-by: Sofie Van Landeghem <svlandeg@users.noreply.github.com>

* Update website/docs/usage/training.md

Co-authored-by: Sofie Van Landeghem <svlandeg@users.noreply.github.com>

* Run prettier

Co-authored-by: Ines Montani <ines@ines.io>
Co-authored-by: Sofie Van Landeghem <svlandeg@users.noreply.github.com>
2021-08-16 17:39:19 +02:00
Paul O'Leary McCann
9391998c77
Add notes on preparing training data to docs (#8964)
* Add training data section

Not entirely sure this is in the right location on the page - maybe it
should be after quickstart?

* Add pointer from binary format to training data section

* Minor cleanup

* Add to ToC, fix filename

* Update website/docs/usage/training.md

Co-authored-by: Ines Montani <ines@ines.io>

* Update website/docs/usage/training.md

Co-authored-by: Sofie Van Landeghem <svlandeg@users.noreply.github.com>

* Update website/docs/usage/training.md

Co-authored-by: Sofie Van Landeghem <svlandeg@users.noreply.github.com>

* Move the training data section further down the page

* Update website/docs/usage/training.md

Co-authored-by: Sofie Van Landeghem <svlandeg@users.noreply.github.com>

* Update website/docs/usage/training.md

Co-authored-by: Sofie Van Landeghem <svlandeg@users.noreply.github.com>

* Run prettier

Co-authored-by: Ines Montani <ines@ines.io>
Co-authored-by: Sofie Van Landeghem <svlandeg@users.noreply.github.com>
2021-08-16 17:37:21 +02:00
Ines Montani
d65e03adae Merge pull request #8951 from HLasse/master 2021-08-16 11:41:53 +10:00
Ines Montani
a894fe0440
Merge pull request #8951 from HLasse/master 2021-08-16 11:41:32 +10:00
Lasse
839ea0f987 change tags formatting to match 2021-08-13 14:40:08 +02:00
Lasse
70ab596f61 Merge branch 'master' of https://github.com/HLasse/spaCy 2021-08-13 14:35:21 +02:00
Lasse
195e4e48c3 add textdescriptives to universe 2021-08-13 14:35:18 +02:00
github-actions[bot]
92071326d8
Auto-format code with black (#8950)
Co-authored-by: explosion-bot <explosion-bot@users.noreply.github.com>
2021-08-13 11:48:38 +02:00
Adriane Boyd
8448c7dbc5
Update da trf recommendation (#8921)
Update the da trf recommendation to the same model used in the
pretrained pipelines.
2021-08-12 13:54:02 +02:00
Ines Montani
647abe186c Merge pull request #8938 from explosion/docs/prodigy-v1-11-project [ci skip]
Update Prodigy project template for v1.11
2021-08-12 21:17:14 +10:00
Ines Montani
6260f044cc
Merge pull request #8938 from explosion/docs/prodigy-v1-11-project [ci skip]
Update Prodigy project template for v1.11
2021-08-12 21:16:49 +10:00
Adriane Boyd
b278f31ee6
Document scorers in registry and components from #8766 (#8929)
* Document scorers in registry and components from #8766

* Update spacy/pipeline/lemmatizer.py

Co-authored-by: Sofie Van Landeghem <svlandeg@users.noreply.github.com>

* Update website/docs/api/dependencyparser.md

Co-authored-by: Sofie Van Landeghem <svlandeg@users.noreply.github.com>

* Reformat

Co-authored-by: Sofie Van Landeghem <svlandeg@users.noreply.github.com>
2021-08-12 12:50:03 +02:00
Edward
944ad6b1d4
Add new parameter for saving every n epoch in pretraining (#8912)
* Add parameter for saving every n epoch

* Add new parameter in schemas

* Add new parameter in default_config

* Adjust schemas

* format code
2021-08-12 11:14:48 +02:00
Ines Montani
4f769ff913 Update Prodigy project template for v1.11 [ci skip] 2021-08-12 13:46:20 +10:00
Paul O'Leary McCann
e227d24d43
Allow passing in array vars for speedup (#8882)
* Allow passing in array vars for speedup

This fixes #8845. Not sure about the docstring changes here...

* Update docs

Types maybe need more detail? Maybe not?

* Run prettier on docs

* Update spacy/tokens/span.pyx

Co-authored-by: Sofie Van Landeghem <svlandeg@users.noreply.github.com>
2021-08-10 15:13:53 +02:00
Adriane Boyd
f99d6d5e39
Refactor scoring methods to use registered functions (#8766)
* Add scorer option to components

Add an optional `scorer` parameter to all pipeline components. If a
scoring function is provided, it overrides the default scoring method
for that component.

* Add registered scorers for all components

* Add `scorers` registry
* Move all scoring methods outside of components as independent
  functions and register
* Use the registered scoring methods as defaults in configs and inits

Additional:

* The scoring methods no longer have access to the full component, so
  use settings from `cfg` as default scorer options to handle settings
  such as `labels`, `threshold`, and `positive_label`
* The `attribute_ruler` scoring method no longer has access to the
  patterns, so all scoring methods are called
* Bug fix: `spancat` scoring method is updated to set `allow_overlap` to
  score overlapping spans correctly

* Update Russian lemmatizer to use direct score method

* Check type of cfg in Pipe.score

* Fix check

* Update spacy/pipeline/sentencizer.pyx

Co-authored-by: Sofie Van Landeghem <svlandeg@users.noreply.github.com>

* Remove validate_examples from scoring functions

* Use Pipe.labels instead of Pipe.cfg["labels"]

Co-authored-by: Sofie Van Landeghem <svlandeg@users.noreply.github.com>
2021-08-10 15:13:39 +02:00
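A rough sketch of the registry usage described above (assumes spaCy v3.1+, where the `scorers` registry and the per-component `scorer` setting exist; the scorer name is hypothetical):

```python
# Sketch only: register a custom scoring function and plug it into a
# component via its config. "my_tagger_scorer.v1" is a made-up name.
import spacy
from spacy.scorer import Scorer

@spacy.registry.scorers("my_tagger_scorer.v1")
def make_my_tagger_scorer():
    def score(examples, **kwargs):
        # Reuse the built-in token-attribute scorer for Token.tag.
        return Scorer.score_token_attr(examples, "tag", **kwargs)
    return score

nlp = spacy.blank("en")
tagger = nlp.add_pipe(
    "tagger", config={"scorer": {"@scorers": "my_tagger_scorer.v1"}}
)
```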
fgaim
ee011ca963
Update Tigrinya ትግርኛ language support (#8900)
* Add missing punctuation for Tigrinya and Amharic

* Fix numeral and ordinal numbers for Tigrinya

 - Amharic was used in many cases
 - Also fixed some typos

* Update Tigrinya stop-words

* Contributor agreement for fgaim

* Fix typo in "ti" lang test

* Remove multi-word entries from numbers and ordinals
2021-08-10 13:55:08 +02:00
Paul O'Leary McCann
6029cfc391
Add scores to output in spancat (#8855)
* Add scores to output in spancat

This exposes the scores as an attribute on the SpanGroup. Includes a
basic test.

* Add basic doc note

* Vectorize score calcs

* Add "annotation format" section

* Update website/docs/api/spancategorizer.md

Co-authored-by: Adriane Boyd <adrianeboyd@gmail.com>

* Clean up doc section

* Ran prettier on docs

* Get arrays off the gpu before iterating over them

* Remove int() calls

Co-authored-by: Adriane Boyd <adrianeboyd@gmail.com>
2021-08-10 13:47:49 +02:00
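A sketch of reading the exposed scores (assumes a trained pipeline containing a `spancat` component with the default spans key `sc`; the pipeline name is a placeholder):

```python
# Sketch only: spancat scores are exposed on the SpanGroup's attrs.
# "my_spancat_pipeline" stands in for a real trained pipeline.
import spacy

nlp = spacy.load("my_spancat_pipeline")
doc = nlp("Some text with candidate spans.")
spans = doc.spans["sc"]  # default spans key
for span, score in zip(spans, spans.attrs.get("scores", [])):
    print(span.label_, span.text, float(score))
```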
Dimitar Ganev
733ffe439d
Improve the stop words and the tokenizer exceptions in Bulgarian language. (#8862)
* Add more stop words and improve the readability

* Add and categorize the tokenizer exceptions for `bg` lang

* Create syrull.md

* Add references for the additional stop words and tokenizer exc abbrs
2021-08-10 13:44:23 +02:00
Adriane Boyd
415dee587c
Merge pull request #8911 from adrianeboyd/chore/update-develop-from-master-v3.1-1
Update develop from master
2021-08-09 15:41:36 +02:00
Ines Montani
c581848cbb Merge pull request #8910 from DuyguA/patch-1 [ci skip]
updated unv json for new book
2021-08-09 23:13:17 +10:00
Ines Montani
a1e9f19460
Merge pull request #8910 from DuyguA/patch-1 [ci skip]
updated unv json for new book
2021-08-09 23:12:50 +10:00
Paul O'Leary McCann
35255786a1 Fix #8902 (bad link in docs)
typo fix
2021-08-09 13:59:59 +02:00
Adriane Boyd
a79888ed67 Merge remote-tracking branch 'upstream/master' into chore/update-develop-from-master-v3.1-1 2021-08-09 13:13:13 +02:00
Duygu Altinok
380b2817cf
updated unv json for new book 2021-08-09 12:39:22 +02:00
Paul O'Leary McCann
cac298471f
Fix #8902 (bad link in docs)
typo fix
2021-08-08 22:04:00 +09:00
Eduard Zorita
439f30faad
Add stub files for main cython classes (#8427)
* Add stub files for main API classes

* Add contributor agreement for ezorita

* Update types for ndarray and hash()

* Fix __getitem__ and __iter__

* Add attributes of Doc and Token classes

* Overload type hints for Span.__getitem__

* Fix type hint overload for Span.__getitem__

Co-authored-by: Luca Dorigo <dorigoluca@gmail.com>
2021-08-07 12:30:03 +02:00
github-actions[bot]
56d4d87aeb
Auto-format code with black (#8895)
Co-authored-by: explosion-bot <explosion-bot@users.noreply.github.com>
2021-08-06 13:38:06 +02:00
Kabir Khan
1dfffe5fb4
No output info message in train (#8885)
* Add info message that no output directory was provided in train

* Update train.py

* Fix logging
2021-08-05 09:21:22 +02:00
Adriane Boyd
fa2e7a4bbf
Fix spancat tests on GPU (#8872)
* Fix spancat tests on GPU

* Fix more spancat tests
2021-08-04 14:29:43 +02:00
Paul O'Leary McCann
77d698dcae
Fix check for RIGHT_ATTRS in dep matcher (#8807)
* Fix check for RIGHT_ATTRs in dep matcher

If a non-anchor node does not have RIGHT_ATTRS, the dep matcher throws
an E100, which says that non-anchor nodes must have LEFT_ID, REL_OP, and
RIGHT_ID. It specifically does not say RIGHT_ATTRS is required.

A blank RIGHT_ATTRS is also valid, and patterns with one will be
accepted. While not normal, sometimes a REL_OP is enough to specify a
non-anchor node - maybe you just want the head of another node
unconditionally, for example.

This change just sets RIGHT_ATTRS to {} if not present. Alternatively
changing E100 to state RIGHT_ATTRS is required could also be reasonable.

* Fix test

This test was written on the assumption that if `RIGHT_ATTRS` isn't
present an error will be raised. Since the proposed changes make it so
an error won't be raised this is no longer necessary.

* Revert test, update error message

Error message now lists missing keys, and RIGHT_ATTRS is required.

* Use list of required keys in error message

Also removes unused key param arg.
2021-08-04 09:20:41 +02:00
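For illustration, a minimal sketch of a pattern with a blank RIGHT_ATTRS (assumes spaCy v3.x and a pipeline with a parser, e.g. en_core_web_sm):

```python
# Sketch only: the non-anchor node is constrained only by the relation,
# here "the head of the matched noun, whatever it is".
import spacy
from spacy.matcher import DependencyMatcher

nlp = spacy.load("en_core_web_sm")
matcher = DependencyMatcher(nlp.vocab)
pattern = [
    {"RIGHT_ID": "noun", "RIGHT_ATTRS": {"POS": "NOUN"}},
    {"LEFT_ID": "noun", "REL_OP": "<", "RIGHT_ID": "noun_head",
     "RIGHT_ATTRS": {}},  # blank: match any token in this relation
]
matcher.add("NOUN_AND_HEAD", [pattern])
doc = nlp("The cat chased the mouse.")
print(matcher(doc))
```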
Adriane Boyd
941a591f3c
Pass excludes when serializing vocab (#8824)
* Pass excludes when serializing vocab

Additional minor bug fix:

* Deserialize vocab in `EntityLinker.from_disk`

* Add test for excluding strings on load

* Fix formatting
2021-08-03 14:42:44 +02:00
Adriane Boyd
c1caa47aa7 Support list values and INTERSECTS in Matcher (#8784)
* Support list values and IS_INTERSECT in Matcher

* Support list values as token attributes for set operators, not just as
pattern values.

* Add `IS_INTERSECT` operator.

* Fix incorrect `ISSUBSET` and `ISSUPERSET` in schema and docs.

* Rename IS_INTERSECT to INTERSECTS
2021-08-02 19:41:10 +02:00
Adriane Boyd
175847f92c
Support list values and INTERSECTS in Matcher (#8784)
* Support list values and IS_INTERSECT in Matcher

* Support list values as token attributes for set operators, not just as
pattern values.

* Add `IS_INTERSECT` operator.

* Fix incorrect `ISSUBSET` and `ISSUPERSET` in schema and docs.

* Rename IS_INTERSECT to INTERSECTS
2021-08-02 19:39:26 +02:00
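A minimal sketch of the new operator (assumes spaCy v3.1+ and a pipeline that sets Token.morph, e.g. en_core_web_sm):

```python
# Sketch only: INTERSECTS matches when the token's list-valued attribute
# shares at least one value with the given list.
import spacy
from spacy.matcher import Matcher

nlp = spacy.load("en_core_web_sm")
matcher = Matcher(nlp.vocab)
pattern = [{"MORPH": {"INTERSECTS": ["Number=Sing", "Number=Plur"]}}]
matcher.add("HAS_NUMBER", [pattern])

doc = nlp("The cats sat on the mat.")
for match_id, start, end in matcher(doc):
    print(doc[start:end].text, doc[start].morph)
```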
Adriane Boyd
fbbbda1954
Fix start/end chars for empty and out-of-bounds spans (#8816) 2021-08-02 19:07:19 +02:00
Adriane Boyd
9ad3b8cf8d
Only add sourced vectors hashes to meta if necessary (#8830) 2021-08-02 18:22:35 +02:00
Nick Sorros
0485cdefcc
Add logger debug for project push and pull (#8860)
* Add logger debug for project push and pull

* Sign contributor agreement
2021-08-02 18:13:53 +02:00
themrmax
de076194c4
Make ConsoleLogger flush after each logging line (#8810)
This is necessary to avoid "logging blackouts" when running training on Kubernetes pods
2021-08-02 14:33:38 +02:00
Ines Montani
d79dbd0624 Merge pull request #8844 from thomashacker/bugfix/fix-doc-transformer-typo [ci skip]
Fix typo in Tok2VecTransformer example config
2021-07-30 09:11:24 +10:00
Ines Montani
4ddee5e84c Merge pull request #8841 from adrianeboyd/docs/ent-id-sep [ci skip]
Fix formatting of ent_id_sep in EntityRuler API docs
2021-07-30 09:11:15 +10:00
Ines Montani
cf9b671566 Merge pull request #8840 from polm/docs/evaluate-speed [ci skip] 2021-07-30 09:11:05 +10:00
Ines Montani
30f20496d5
Merge pull request #8840 from polm/docs/evaluate-speed [ci skip] 2021-07-30 09:10:15 +10:00
Ines Montani
65d163fab5
Adjust formatting [ci skip] 2021-07-30 09:10:04 +10:00
Ines Montani
3a701d3645
Merge pull request #8841 from adrianeboyd/docs/ent-id-sep [ci skip]
Fix formatting of ent_id_sep in EntityRuler API docs
2021-07-30 09:09:25 +10:00
Ines Montani
f08be084fb
Merge pull request #8844 from thomashacker/bugfix/fix-doc-transformer-typo [ci skip]
Fix typo in Tok2VecTransformer example config
2021-07-30 09:08:59 +10:00
thomashacker
02258916c8 Fix example config typo for transformer architecture 2021-07-29 11:19:40 +02:00
Adriane Boyd
15b12f3e35 Fix formatting of ent_id_sep in EntityRuler API docs 2021-07-29 10:10:12 +02:00
Paul O'Leary McCann
a60cb13910 Update speed entry in metrics table 2021-07-29 16:35:19 +09:00
Paul O'Leary McCann
e125313a50 Revert "Add note about SPEED in output"
This reverts commit c92d268176.
2021-07-29 16:34:08 +09:00
Ines Montani
03a742f332 Merge pull request #8814 from polm/docs/migrate-lexeme-tables [ci skip] 2021-07-29 17:19:44 +10:00
Ines Montani
0a1e299d30
Merge pull request #8814 from polm/docs/migrate-lexeme-tables [ci skip] 2021-07-29 17:18:02 +10:00
Paul O'Leary McCann
c92d268176 Add note about SPEED in output
In #8823 it was pointed out that the `SPEED` value wasn't documented
anywhere.
2021-07-29 15:03:07 +09:00
Paul O'Leary McCann
8867e60fbb
Update website/docs/usage/v3.md
Co-authored-by: Ines Montani <ines@ines.io>
2021-07-29 14:56:56 +09:00
Adriane Boyd
9e9611233f Remove labels from textcat component config example (#8815) 2021-07-27 13:15:33 +02:00
Paul O'Leary McCann
de5bc8a0e1 Update subset/superset docs (#8795)
* Update subset/superset docs

* Update website/docs/usage/rule-based-matching.md

Co-authored-by: Adriane Boyd <adrianeboyd@gmail.com>

Co-authored-by: Adriane Boyd <adrianeboyd@gmail.com>
2021-07-27 13:15:27 +02:00
Adriane Boyd
8547514aa4
Remove labels from textcat component config example (#8815) 2021-07-27 13:14:38 +02:00
Paul O'Leary McCann
76ac95923a Add note to migration guide about lexeme tables (fix #7290)
This just adds the resolution from #6388 to the docs.
2021-07-27 19:19:25 +09:00
Paul O'Leary McCann
67ecdcc3ac
Update subset/superset docs (#8795)
* Update subset/superset docs

* Update website/docs/usage/rule-based-matching.md

Co-authored-by: Adriane Boyd <adrianeboyd@gmail.com>

Co-authored-by: Adriane Boyd <adrianeboyd@gmail.com>
2021-07-27 12:08:46 +02:00
Adriane Boyd
81d3a1edb1
Use tokenizer URL_MATCH pattern in LIKE_URL (#8765) 2021-07-27 12:07:01 +02:00
Adriane Boyd
4f28190afe
Merge pull request #8813 from adrianeboyd/chore/develop-v3.2
Update develop for v3.2
2021-07-27 11:26:18 +02:00
Ines Montani
7f21c7dfa2
Merge pull request #8794 from explosion/autoblack
Auto-format code with black
2021-07-27 12:17:15 +10:00
Ines Montani
34c401f04f
Merge pull request #8801 from polm/fix/respect-no-skip (fixes #8796)
Respect the no_skip value
2021-07-27 12:16:47 +10:00
Ines Montani
cf3855ae05 Merge pull request #8806 from Ledenel/master [ci skip]
fix typo
2021-07-27 12:15:44 +10:00
Ines Montani
5c762e08d7 Merge pull request #8808 from kevinlu1248/master [ci skip]
Changed a CLI command in data-formats.md due to erroneous information
2021-07-27 12:15:35 +10:00
Ines Montani
134cb06af3
Merge pull request #8808 from kevinlu1248/master [ci skip]
Changed a CLI command in data-formats.md due to erroneous information
2021-07-27 12:15:16 +10:00
Ines Montani
9bf0d6f2fd
Merge pull request #8806 from Ledenel/master [ci skip]
fix typo
2021-07-27 12:14:22 +10:00
Kevin Lu
4a8e9e4e4e
Update data-formats.md 2021-07-25 22:58:53 -07:00
Ledenel
413f745c68 fix broken example in spaCy universe Chatterbot 2021-07-25 15:53:32 +00:00
Paul O'Leary McCann
284b530c63 Respect the no_skip value
Seems like the logic for this was just left out. See #8796.
2021-07-24 15:31:17 +09:00
explosion-bot
a58ab6ea22 Auto-format code with black 2021-07-23 08:04:09 +00:00
Adriane Boyd
6bbc2b1956
Reload train corpus in debug data after initialize (#8776) 2021-07-21 22:38:40 +02:00
svlandeg
f4f270940c Merge remote-tracking branch 'upstream/master' into spacy.io 2021-07-20 16:14:16 +02:00
Adriane Boyd
d48c01a6f7
Remove extraneous grc test file (#8768) 2021-07-20 15:51:15 +02:00
Sofie Van Landeghem
ffaead8fe0
bump to 3.1.1 2021-07-19 14:48:27 +02:00
Sofie Van Landeghem
83e27d262e
negative tag annotation (#8731)
* unit test to unlearn tag via negative annotation

* bump thinc to 8.0.8
2021-07-19 14:39:11 +02:00
Adriane Boyd
0e4b96c97e
Update lexeme ranks for loaded vectors (#8640)
Update the ranks for any lexemes that have been added to the vocab
before the vectors are added to the model.
2021-07-19 18:25:54 +10:00
Adriane Boyd
e532c69475
Update Language.replace_pipe for disabled components (#8729)
* Fix the index where the replacement in inserted to account for
disabled components
* Allow `Language.replace_pipe` to replace disabled components
2021-07-19 18:06:12 +10:00
Kenneth Enevoldsen
2880ae70b0 removed outdated spacy version for spacymoji
From the documentation of spacymoji (and the requirements.txt) it seems like it is not only for version 2.
2021-07-18 19:19:55 +09:00
Kenneth Enevoldsen
812746464b fixed GitHub link and thumbnail
Sorry, I seem to have misunderstood that the GitHub reference shouldn't be a link.
2021-07-18 19:19:37 +09:00
Paul O'Leary McCann
d717593eb7
Merge pull request #8754 from KennethEnevoldsen/patch-1
[minor] removed outdated spacy version for spacymoji
2021-07-18 19:17:33 +09:00
Paul O'Leary McCann
ac67639eaf
Merge pull request #8755 from KennethEnevoldsen/patch-2
fixed GitHub link and thumbnail
2021-07-18 19:14:57 +09:00
Kenneth Enevoldsen
5d6aed0773
fixed GitHub link and thumbnail
Sorry, I seem to have misunderstood that the GitHub reference shouldn't be a link.
2021-07-18 10:22:00 +02:00
Ines Montani
f90482d077 Tidy up and auto-format 2021-07-18 15:44:56 +10:00
Ines Montani
98cf872e11 Fix JSON [ci skip] 2021-07-18 13:21:43 +10:00
Ines Montani
313f55e560 Fix JSON [ci skip] 2021-07-18 13:21:33 +10:00
Ines Montani
a792e1119f Merge pull request #8702 from KennethEnevoldsen/master [ci skip] 2021-07-18 13:19:09 +10:00
Ines Montani
51e5903d6f
Merge pull request #8702 from KennethEnevoldsen/master [ci skip] 2021-07-18 13:18:42 +10:00
Kenneth Enevoldsen
8546948fba
removed outdated spacy version for spacymoji
From the documentation of spacymoji (and the requirements.txt) it seems like it is not only for version 2.
2021-07-17 15:19:43 +02:00
Kenneth Enevoldsen
a0e0ccdb46
Update website/meta/universe.json
Co-authored-by: Ines Montani <ines@ines.io>
2021-07-17 07:14:46 +02:00
Ines Montani
c0f436efbc
Merge pull request #8735 from explosion/autoblack 2021-07-17 13:46:17 +10:00
Ines Montani
483f3175cb Tidy up [ci skip] 2021-07-17 13:43:15 +10:00
Ines Montani
15e6578f7d
Adjust formatting 2021-07-17 10:49:13 +10:00
Mario Šaško
47c5a63a83 Add TakeLab/spacy-udpipe to Universe (#8698)
* Add TakeLab/spacy-udpipe to universe

* Add SCA

* Sign SCA
2021-07-16 11:18:09 +02:00
Mario Šaško
1ba2e8a646
Add TakeLab/spacy-udpipe to Universe (#8698)
* Add TakeLab/spacy-udpipe to universe

* Add SCA

* Sign SCA
2021-07-16 11:15:52 +02:00
explosion-bot
eff3d1088b Auto-format code with black 2021-07-16 08:03:36 +00:00
Adriane Boyd
e76e2addd1 Remove TrainablePipe as base class for Lemmatizer in API docs (#8725) 2021-07-15 16:42:14 +02:00
Adriane Boyd
f5acc48111
Remove TrainablePipe as base class for Lemmatizer in API docs (#8725) 2021-07-15 16:41:36 +02:00
Adriane Boyd
ac45c7c045
Add pre-commit to ignored requirements (#8728) 2021-07-15 16:41:15 +02:00
jmyerston
993b0fab0e
Added ancient Greek language support (#8606)
* Add ancient Greek language support

Initial commit

* Contributor Agreement

* grc tokenizer test added and files formatted with black, unnecessary import removed

Co-Authored-By: Sofie Van Landeghem <svlandeg@users.noreply.github.com>

* Commas in lists fixed. __init__py added to test

* Update lex_attrs.py

* Update stop_words.py

* Update stop_words.py

Co-authored-by: Sofie Van Landeghem <svlandeg@users.noreply.github.com>
2021-07-15 10:27:17 +02:00
Sofie Van Landeghem
77859beb99
spacy.ngram_range_suggester.v1 (#8699) 2021-07-15 10:01:22 +02:00
Julien Rossi
e117573822
Adding noun_chunks to the DUTCH language model (nl) (#8529)
* implement noun_chunks for Dutch language

* copy/paste FR and SV syntax iterators to accommodate UD tags
* added tests with dutch text
* signed contributor agreement

* 🐛 fix noun chunks generator

* built from scratch
* define noun chunk as a single Noun-Phrase
* includes some corner cases debugging (incorrect POS tagging)
* test with provided annotated sample (POS, DEP)

* fix failing test

* CI pipeline did not like the added sample file
* add the sample as a pytest fixture

* Update spacy/lang/nl/syntax_iterators.py

* Update spacy/lang/nl/syntax_iterators.py

Code readability

Co-authored-by: Sofie Van Landeghem <svlandeg@users.noreply.github.com>

* Update spacy/tests/lang/nl/test_noun_chunks.py

correct comment

Co-authored-by: Sofie Van Landeghem <svlandeg@users.noreply.github.com>

* finalize code

* change "if next_word" into "if next_word is not None"

Co-authored-by: Sofie Van Landeghem <svlandeg@users.noreply.github.com>
2021-07-14 14:01:02 +02:00
Ines Montani
8ca6c58625 Merge pull request #8703 from thomashacker/update/spacy-stanza [ci skip]
Update spacy-stanza universe.json
2021-07-13 19:03:56 +10:00
Ines Montani
2a8eeed5da
Merge pull request #8703 from thomashacker/update/spacy-stanza [ci skip]
Update spacy-stanza universe.json
2021-07-13 19:03:42 +10:00
thomashacker
aafb89df78 Update universe.json code_example 2021-07-13 10:22:49 +02:00
KennethEnevoldsen
e5127992a0 added agreement 2021-07-13 10:11:02 +02:00
Kenneth Enevoldsen
94ce904e10
added missing comma 2021-07-13 09:59:34 +02:00
Kenneth Enevoldsen
a81fcc81b0
added dacy to universe 2021-07-13 09:54:08 +02:00
Adriane Boyd
f9fd2889b7
Use 0-vector for OOV lexemes (#8639) 2021-07-13 14:48:12 +10:00
Edward
8233359225
Fix preservation of spacy package meta (#8663)
* update package meta with existing_meta and nlp_meta

* Add spaCy contributor agreement

* Added more info when creating readme
2021-07-12 11:18:52 +02:00
Paul O'Leary McCann
1c70c87daf
Fix autoblack
The conditional needs double equals.
2021-07-10 16:02:39 +09:00
Ines Montani
616f4de034
Merge pull request #8674 from polm/fix/autoblack-no-forks [ci skip]
Make the autoblack job not run on forks
2021-07-10 16:41:59 +10:00
Paul O'Leary McCann
b8cdbb4bb6 Make the autoblack job not run on forks
The autoblack job is an occasional cleanup job. If it runs on forks and
those PRs are accepted the git history will be weird and that doesn't
help anyone.

The way to make the job not run on forks is a little non-obvious but
based on this thread.

https://github.com/prisma/prisma/issues/3539
2021-07-10 15:38:20 +09:00
Ines Montani
d8ae5750a6 Merge pull request #8665 from rynoV/patch-1 [ci skip] 2021-07-10 10:52:39 +10:00
Ines Montani
d4fecdfb82
Merge pull request #8665 from rynoV/patch-1 [ci skip] 2021-07-10 10:52:15 +10:00
Ines Montani
50000d37e4
Avoid double parentheses [ci skip] 2021-07-10 10:52:01 +10:00
Calum Sieppert
e2d53aa1a6
Typo fixes 2021-07-09 10:25:56 -06:00
Adriane Boyd
d8805a1073
Fix ru/uk lemmatizer mp with spawn (#8657)
Use an instance variable instead of a class variable for the morphological
analyzer so that multiprocessing with spawn is possible.
2021-07-09 15:36:56 +02:00
Adriane Boyd
b8e720fdb9
Fix Azerbaijani init, extend lang init tests (#8656)
* Extend langs in initialize tests

* Fix az init
2021-07-09 15:36:35 +02:00
Ines Montani
1c0ed22d1e
Merge pull request #8573 from julien-talkair/code-quality-pre-commit 2021-07-09 23:09:24 +10:00
Ines Montani
bbca56687f
Merge pull request #8655 from explosion/autoblack
Auto-format code with black
2021-07-09 23:08:05 +10:00
explosion-bot
334f1f98d8 Auto-format code with black 2021-07-09 08:06:06 +00:00
Adriane Boyd
363230de19 Add Macedonian models to website (#8637) 2021-07-08 09:32:41 +02:00
Adriane Boyd
1ee5bee29d
Add Macedonian models to website (#8637) 2021-07-08 09:32:14 +02:00
Suqi Sun
20a2beafb5 Update pip 2021-07-08 15:09:52 +09:00
Suqi Sun
c61ecb6f7c Update pip and code example 2021-07-08 15:09:52 +09:00
Suqi Sun
f011126ebd Add forte to universe.json 2021-07-08 15:09:52 +09:00
Paul O'Leary McCann
1d9209d43a
Merge pull request #8547 from mylibrar/update-universe
Add forte to universe.json
2021-07-08 14:59:49 +09:00
Ines Montani
cdc0d669c1 Add code preview for textcat_multilabel [ci skip] 2021-07-08 13:33:33 +10:00
Ines Montani
39c8f7949e Add code preview for textcat_multilabel [ci skip] 2021-07-08 13:33:25 +10:00
Calum Sieppert
e7f8573e01 Typo fixes 2021-07-08 12:53:23 +10:00
Ines Montani
bcd2be40b5
Merge pull request #8634 from rynoV/patch-1 [ci skip] 2021-07-08 12:52:59 +10:00
Calum Sieppert
889c187bc2
Typo fixes 2021-07-07 16:53:04 -06:00
julien-talkair
833f7f2918 👷 configure flake8 pre-commit
* uses setup.cfg for flake8 configuration during pre-commit
2021-07-07 21:31:46 +02:00
Ines Montani
2cbe250381
Merge pull request #8631 from adrianeboyd/chore/update-spacy-io-v3-1 2021-07-08 00:33:17 +10:00
Adriane Boyd
c2fa62e690 Revert "Restrict website to 3.0 models until 3.1 release"
This reverts commit a13f29bb54.
2021-07-07 15:31:40 +02:00
Adriane Boyd
065e94a6eb Merge branch 'master' into spacy.io 2021-07-07 15:30:25 +02:00
Ines Montani
530b5d72f6
Merge pull request #8624 from adrianeboyd/docs/v3-1-usage-updates [ci skip]
Update v3.1 usage docs
2021-07-07 16:50:36 +10:00
Adriane Boyd
6db647dfe0 Update v3.1 usage docs 2021-07-07 08:43:33 +02:00
Sofie Van Landeghem
64fac754fe
add spacy prefix to ngram_suggester.v1 (#8623) 2021-07-07 08:09:30 +02:00
julien-talkair
82b01964fa 🚨 adjust flake8 sensitivity
* pass arguments to flake8
* reproduce arguments from CI config
2021-07-06 22:41:54 +02:00
Sofie Van Landeghem
733e8ceea9
fix spancat initialize with labels (#8620) 2021-07-06 19:08:25 +02:00
Sofie Van Landeghem
608fc1d623
avoid msg var implicitness (#8619)
* avoid msg var implicitness

* rename local msg

* Add CI tests for debug data and train

* Adjust debug data CLI test

Co-authored-by: Adriane Boyd <adrianeboyd@gmail.com>
2021-07-06 19:08:08 +02:00
Sofie Van Landeghem
e7d747e3ee
TransitionBasedParser.v1 to legacy (#8586)
* TransitionBasedParser.v1 to legacy

* register sublayers

* bump spacy-legacy to 3.0.7
2021-07-06 15:26:45 +02:00
Ines Montani
04a9ade40f
Merge pull request #8466 from explosion/docs/new-in-v3-1 [ci skip] 2021-07-06 22:20:24 +10:00
Luca Dorigo
e8ef4a46d5
Add the right return type for Language.pipe and an overload for the as_tuples case (#8441)
* Add the right return type for Language.pipe and an overload for the as_tuples version

* Reformat, tidy up

Co-authored-by: Adriane Boyd <adrianeboyd@gmail.com>
2021-07-06 14:18:40 +02:00
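For reference, a minimal sketch of the `as_tuples` case that the added overload describes (assumes spaCy v3.x):

```python
# Sketch only: with as_tuples=True, nlp.pipe yields (Doc, context) pairs.
import spacy

nlp = spacy.blank("en")
data = [("A first text.", {"id": 1}), ("A second text.", {"id": 2})]
for doc, context in nlp.pipe(data, as_tuples=True):
    print(context["id"], doc.text)
```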
Sofie Van Landeghem
b9f59118bf
Fix silent evaluation (#8581)
* fix silentness

* sneak in docs typo fix

* pass silent boolean instead
2021-07-06 14:16:19 +02:00
Sofie Van Landeghem
3daf57d70c
Small spancat fixes (#8614)
* two small fixes + additional tests

* rename
2021-07-06 14:15:41 +02:00
Ines Montani
327f83573a
Move scores per type handling into util function (#8590) 2021-07-06 13:02:37 +02:00
Adriane Boyd
5fd0b5207e
Fix vectors check for sourced components (#8559)
* Fix vectors check for sourced components

Since vectors are not loaded when components are sourced, store a hash
for the vectors of each sourced component and compare it to the loaded
vectors after the vectors are loaded from the `[initialize]` block.

* Pop temporary info

* Remove stored hash in remove_pipe

* Add default for pop

* Add additional convert/debug/assemble CLI tests
2021-07-06 12:43:17 +02:00
Adriane Boyd
29906884c5
Raise an error for textcat with <2 labels (#8584)
* Raise an error for textcat with <2 labels

Raise an error if initializing a `textcat` component without at least
two labels.

* Add similar note to docs

* Update positive_label description in API docs
2021-07-06 12:35:22 +02:00
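A minimal sketch of the new check (the exact error class isn't stated in the commit message, so a generic except is used here):

```python
# Sketch only: initializing textcat with fewer than two labels now raises
# an error instead of silently training a degenerate classifier.
import spacy

nlp = spacy.blank("en")
textcat = nlp.add_pipe("textcat")
textcat.add_label("POSITIVE")  # only one label

try:
    nlp.initialize()
except Exception as err:  # error class assumed, see note above
    print("Expected error:", err)
```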
Ines Montani
5bb7fe4b41 Update with HF hub integration [ci skip] 2021-07-06 19:30:59 +10:00
Cass
c84eb478eb Fix a command typo in models.md
"dowmload" -> "download"
2021-07-06 14:46:14 +09:00
Paul O'Leary McCann
3b1d5350d0
Merge pull request #8609 from mathcass/model-documentation-typo
Fix a command typo in models.md
2021-07-06 14:43:58 +09:00
Cass
7d13fc799b
Fix a command typo in models.md
"dowmload" -> "download"
2021-07-05 18:44:18 -07:00
Ines Montani
218ed81f84 Add docs notes on installing models from Python and in Jupyter [ci skip] (#8597) 2021-07-05 13:52:15 +02:00
Ines Montani
8423864b50
Add docs notes on installing models from Python and in Jupyter [ci skip] (#8597) 2021-07-05 13:49:20 +02:00
Ines Montani
f3a91b68a8 Merge pull request #8593 from yohasebe/patch-1 [ci skip] 2021-07-05 11:31:53 +10:00
Ines Montani
15108cd930
Merge pull request #8593 from yohasebe/patch-1 [ci skip] 2021-07-05 11:31:38 +10:00
Ines Montani
fdcd4003e5
Merge pull request #8592 from yohasebe/patch-2 [ci skip]
Adds contributor agreement yohasebe.md
2021-07-05 11:27:43 +10:00
Yoichiro Hasebe
596e04cbb4
Github repo info fixed for ruby-spacy 2021-07-04 18:55:17 +09:00
Yoichiro Hasebe
e541092088
Create yohasebe.md 2021-07-04 08:57:04 +09:00
Yoichiro Hasebe
2bdfa42107
Update universe.json 2021-07-04 08:44:39 +09:00
Ines Montani
3dcb747980
Merge pull request #8580 from explosion/autoblack
Auto-format code with black
2021-07-03 13:15:07 +10:00
explosion-bot
ee37288a1f Auto-format code with black 2021-07-02 07:48:26 +00:00
Ines Montani
c5c4e96597 Fix syntax [ci skip] 2021-07-02 17:46:56 +10:00
Ines Montani
6b905d67df Try workflow_dispatch and schedule [ci skip] 2021-07-02 17:45:27 +10:00
Ines Montani
70589e348e Commit as explosion-bot [ci skip] 2021-07-02 17:45:11 +10:00
Ines Montani
dd34a3a433 Try simpler approach [ci skip] 2021-07-02 17:40:49 +10:00
Ines Montani
2898331494 Improve logic [ci skip] 2021-07-02 17:37:35 +10:00
Ines Montani
519a9e29be Fix git login [ci skip] 2021-07-02 17:30:59 +10:00
Ines Montani
8961f36415 Commit manually in workflow [ci skip] 2021-07-02 17:27:48 +10:00
Ines Montani
2a5cbf1b0c Test different workflow trigger [ci skip] 2021-07-02 17:22:43 +10:00
Ines Montani
bbbaae0b5e Update triggers [ci skip] 2021-07-02 17:10:24 +10:00
Ines Montani
cdefb8cf1b Experimental: add autoblack.yml action [ci skip] 2021-07-02 17:07:05 +10:00
julien-talkair
6b1f9a5be0 add spacy contributor agreement 2021-07-01 17:41:12 +02:00
julien-talkair
a0004349f3 🎉 run code quality checks automatically
* installs black and flake8 as pre-commit hooks
* runs only on modified files
2021-07-01 17:37:43 +02:00
Adriane Boyd
2fc67e2aeb
Require thinc >=8.0.7 (#8572) 2021-07-01 16:55:09 +02:00
Suqi Sun
3901507df8 Update pip 2021-06-30 16:44:43 -04:00
Suqi Sun
61c868ed75 Update pip and code example 2021-06-30 14:49:51 -04:00
Ines Montani
af9d984407
Merge pull request #8405 from svlandeg/fix/whitespace_tokenizer [ci skip] 2021-06-30 20:52:59 +10:00
Adriane Boyd
2b8c679a3d
Fix duplicate spacy package CLI opts (#8551)
Use `-c` for `--code` and not additionally for `--create-meta`, in line
with the docs.
2021-06-30 11:23:26 +02:00
Suqi Sun
4331c40b78 Add forte to universe.json 2021-06-29 16:17:22 -04:00
Adriane Boyd
41292a1b84 Add note about updating with fill-config 2021-06-29 10:45:36 +02:00
Nick Sorros
bb781ae7f7
Remove extra parenthesis from the example for spacy-streamlit (#8527) 2021-06-28 14:03:31 +02:00
Ines Montani
7f65902702
Merge pull request #8522 from adrianeboyd/chore/update-flake8
Update flake8 version in reqs and CI
2021-06-28 21:46:06 +10:00
Ines Montani
8bc235dcc0
Merge pull request #8523 from adrianeboyd/chore/cleanup-v3.1.0 2021-06-28 21:45:38 +10:00
Adriane Boyd
86d01e9229 Tidy up with flake8: imports, comparisons, etc. 2021-06-28 12:08:15 +02:00
Adriane Boyd
4d1ef8f695 Tidy up docs 2021-06-28 12:08:15 +02:00
Adriane Boyd
5eeb25f043 Tidy up code 2021-06-28 12:08:15 +02:00
Adriane Boyd
4b0ed73ed4 Update flake8 version in reqs and CI
* Update some unneeded forward refs related to flake8 checks
2021-06-28 11:29:36 +02:00
Ines Montani
b8e8cd4482 Merge pull request #8505 from bryant1410/patch-2 [ci skip]
Fix double slash in model release web page
2021-06-28 12:51:25 +10:00
Ines Montani
93572dc12a
Merge pull request #8505 from bryant1410/patch-2 [ci skip]
Fix double slash in model release web page
2021-06-28 12:51:06 +10:00
Ines Montani
88ad41316c
Update issue template [ci skip] 2021-06-28 03:11:37 +02:00
Ines Montani
db6361ab6e
Update issue template [ci skip] 2021-06-28 03:10:52 +02:00
Ines Montani
2e453bda92
Update issue links [ci skip] 2021-06-28 03:09:48 +02:00
Ines Montani
bd510fcbf0
Merge pull request #8514 from polm/feature/github-discussions [ci skip] 2021-06-28 11:00:28 +10:00
Paul O'Leary McCann
0d3caa52a6 Update New Issue choices
This uses some new features related to Issue Templates to help direct
more people to Discussions.

1. Change the Discussions option to link to Discussions
2. Add a link to the FAQ
3. Disable blank issues
2021-06-27 14:41:33 +09:00
Kevin
edcaf9a1d7 Updated PyATE syntax to fit spaCy V3 2021-06-27 13:58:29 +09:00
Paul O'Leary McCann
75569f723a
Merge pull request #8512 from kevinlu1248/master
Updated PyATE syntax to fit spaCy V3 in spaCy universe
2021-06-27 13:56:17 +09:00
Paul O'Leary McCann
f144888793
Merge pull request #8504 from bryant1410/patch-1
Fix typo in comment
2021-06-27 13:51:19 +09:00
Paul O'Leary McCann
894caab475
Merge pull request #8507 from bryant1410/patch-3
Fix typo in `train_cli` docstring
2021-06-27 13:50:48 +09:00
Kevin
1a3e7cc5ef Updated PyATE syntax to fit spaCy V3 2021-06-26 17:52:41 -07:00
Santiago Castro
ee63b2b199
Fix typo in train_cli docstring 2021-06-25 22:45:03 -07:00
Santiago Castro
2e71944e1e
Fix double slash in model release web page 2021-06-25 19:19:10 -07:00
Santiago Castro
a2bc743e47
Fix typo in comment 2021-06-25 18:58:38 -07:00
Adrian Zuber
f5aee0bbdf
Raise custom error in EntityLinker when KB is not set (#8442)
* Raise custom error in EntityLinker when KB is not set

* add contributor agreement

* Update E1018 error message
2021-06-25 23:04:00 +02:00
Ines Montani
4544412442
Update wording [ci skip]
Co-authored-by: Sofie Van Landeghem <svlandeg@users.noreply.github.com>
2021-06-25 13:52:48 +10:00
Ines Montani
0d2e2b59bc Update intro [ci skip] 2021-06-24 22:53:20 +10:00
Matthew Honnibal
f9946154d9
Add SpanCategorizer component (#6747)
* Draft spancat model

* Add spancat model

* Add test for extract_spans

* Add extract_spans layer

* Upd extract_spans

* Add spancat model

* Add test for spancat model

* Upd spancat model

* Update spancat component

* Upd spancat

* Update spancat model

* Add quick spancat test

* Import SpanCategorizer

* Fix SpanCategorizer component

* Import SpanGroup

* Fix span extraction

* Fix import

* Fix import

* Upd model

* Update spancat models

* Add scoring, update defaults

* Update and add docs

* Fix type

* Update spacy/ml/extract_spans.py

* Auto-format and fix import

* Fix comment

* Fix type

* Fix type

* Update website/docs/api/spancategorizer.md

* Fix comment

Co-authored-by: Sofie Van Landeghem <svlandeg@users.noreply.github.com>

* Better defense

Co-authored-by: Sofie Van Landeghem <svlandeg@users.noreply.github.com>

* Fix labels list

Co-authored-by: Sofie Van Landeghem <svlandeg@users.noreply.github.com>

* Update spacy/ml/extract_spans.py

Co-authored-by: Sofie Van Landeghem <svlandeg@users.noreply.github.com>

* Update spacy/pipeline/spancat.py

Co-authored-by: Sofie Van Landeghem <svlandeg@users.noreply.github.com>

* Set annotations during update

* Set annotations in spancat

* fix imports in test

* Update spacy/pipeline/spancat.py

* replace MaxoutLogistic with LinearLogistic

* fix config

* various small fixes

* remove set_annotations parameter in update

* use our beloved tupley format with recent support for doc.spans

* bugfix to allow renaming the default span_key (scores weren't showing up)

* use different key in docs example

* change defaults to better-working parameters from project (WIP)

* register spacy.extract_spans.v1 for legacy purposes

* Upd dev version so can build wheel

* layers instead of architectures for smaller building blocks

* Update website/docs/api/spancategorizer.md

Co-authored-by: Adriane Boyd <adrianeboyd@gmail.com>

* Update website/docs/api/spancategorizer.md

Co-authored-by: Adriane Boyd <adrianeboyd@gmail.com>

* Include additional scores from overrides in combined score weights

* Parameterize spans key in scoring

Parameterize the `SpanCategorizer` `spans_key` for scoring purposes so
that it's possible to evaluate multiple `spancat` components in the same
pipeline.

* Use the (intentionally very short) default spans key `sc` in the
  `SpanCategorizer`
* Adjust the default score weights to include the default key
* Adjust the scorer to use `spans_{spans_key}` as the prefix for the
  returned score
* Revert addition of `attr_name` argument to `score_spans` and adjust
  the key in the `getter` instead.

Note that for `spancat` components with a custom `span_key`, the score
weights currently need to be modified manually in
`[training.score_weights]` for them to be available during training. To
suppress the default score weights `spans_sc_p/r/f` during training, set
them to `null` in `[training.score_weights]`.

* Update website/docs/api/scorer.md

* Fix scorer for spans key containing underscore

* Increment version

* Add Spans to Evaluate CLI (#8439)

* Add Spans to Evaluate CLI

* Change to spans_key

* Add spans per_type output

Co-authored-by: Adriane Boyd <adrianeboyd@gmail.com>

* Fix spancat GPU issues (#8455)

* Fix GPU issues

* Require thinc >=8.0.6

* Switch to glorot_uniform_init

* Fix and test ngram suggester

* Include final ngram in doc for all sizes
* Fix ngrams for docs of the same length as ngram size
* Handle batches of docs that result in no ngrams
* Add tests

Co-authored-by: Ines Montani <ines@ines.io>
Co-authored-by: Sofie Van Landeghem <svlandeg@users.noreply.github.com>
Co-authored-by: svlandeg <sofie.vanlandeghem@gmail.com>
Co-authored-by: Adriane Boyd <adrianeboyd@gmail.com>
Co-authored-by: Nirant <NirantK@users.noreply.github.com>
2021-06-24 12:35:27 +02:00
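A minimal sketch of adding the new component with a custom spans key (assumes spaCy v3.1+; label and key names are made up):

```python
# Sketch only: a spancat component with a custom spans key. As noted above,
# with a non-default key the spans_{key}_p/r/f score weights may need to be
# set manually under [training.score_weights] in the training config.
import spacy

nlp = spacy.blank("en")
spancat = nlp.add_pipe("spancat", config={"spans_key": "sections"})
spancat.add_label("HEADER")
spancat.add_label("BODY")
print(spancat.labels)
```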
Ines Montani
68721af628 Formatting and preliminary intro [ci skip] 2021-06-24 20:32:23 +10:00
Adriane Boyd
92dc6b409e Notes on source with vectors 2021-06-24 10:34:07 +02:00
Adriane Boyd
35425d7e26 Add details for Catalan and Danish 2021-06-24 10:10:33 +02:00
Ines Montani
5daf450f51 Update upgrading notes [ci skip] 2021-06-24 18:06:28 +10:00
Adriane Boyd
172dfec4f2
Test download in CI with ca_core_news_sm (#8493) 2021-06-24 09:26:30 +02:00
Ines Montani
fb9b389f52
Merge pull request #8486 from adrianeboyd/bugfix/template-paths-vectors
Preserve paths.vectors/initialize.vectors setting in quickstart template
2021-06-24 13:12:18 +10:00
Ines Montani
528746129d Merge branch 'master' into docs/new-in-v3-1 2021-06-24 13:11:37 +10:00
Ines Montani
a8e8d02ba7
Merge pull request #8465 from explosion/feature/spacy-package-readme 2021-06-24 13:11:08 +10:00
Ines Montani
3e3d87a068 Update maintainer info [ci skip] 2021-06-24 12:37:55 +10:00
Ines Montani
3e058dee62 Update features [ci skip] 2021-06-24 12:36:04 +10:00
Ines Montani
40f13c3f0c Add docs [ci skip] 2021-06-24 11:57:15 +10:00
Ines Montani
3982be14e8 Improve fallbacks 2021-06-24 11:55:50 +10:00
Ines Montani
a1e4aca267 Fix sentence [ci skip] 2021-06-24 11:40:36 +10:00
Adriane Boyd
393c3c70d7
Various fixes for spans in Docs.from_docs (#8487)
* Fix spans offsets if a doc ends in a single space and no space is
  inserted
* Also include spans key in merged doc for empty spans lists
2021-06-23 15:51:35 +02:00
Adriane Boyd
5aa099505f Preserve paths.vectors/initialize.vectors setting in quickstart template 2021-06-23 11:07:14 +02:00
Adriane Boyd
a13f29bb54 Restrict website to 3.0 models until 3.1 release 2021-06-23 10:53:29 +02:00
Ines Montani
ca0d904faa Update details [ci skip] 2021-06-23 13:05:56 +10:00
Ines Montani
ed1ba13439
Merge pull request #8477 from themrmax/patch-1 [ci skip]
Fix broken link
2021-06-23 10:41:22 +10:00
themrmax
d96c422cfc
Fix broken link
change /api/registry to /api/top-level#registry
2021-06-22 15:34:06 -07:00
Ines Montani
e9b68d4f4c Update details and add example [ci skip] 2021-06-22 17:51:03 +10:00
Nick Sorros
5467bd3655 Switch model and data path in prodigy project.yml recipe (#8467) 2021-06-22 09:43:48 +02:00
Nick Sorros
31504f5982
Switch model and data path in prodigy project.yml recipe (#8467) 2021-06-22 09:41:45 +02:00
Ines Montani
bc93c34f54 Add "New in v3.1" guide 2021-06-22 15:23:18 +10:00
Ines Montani
cdcbd1023a Auto-generate README in spacy packge 2021-06-22 12:06:25 +10:00
Adriane Boyd
caba63b74f
Set version to v3.1.0 (#8452)
* Update test for v3.1

* Set version to v3.1.0
2021-06-21 10:41:40 +02:00
Adriane Boyd
9fde258053
Use minor version for compatibility check (#8403)
* Use minor version for compatibility check

* Use minor version of compatibility table
* Soften warning message about incompatible models
* Add test for presence of current version in compatibility table

* Add test for download compatibility table

* Use minor version of lower pin in error message if possible

* Fall back to spacy_git_version if available

* Fix unknown version string
2021-06-21 09:39:22 +02:00
Adriane Boyd
ec71a6b572
Filter W036 for entity ruler, etc. (#8424) 2021-06-21 09:34:29 +02:00
Adriane Boyd
e39d1bd4ab
Various docs updates for v3.1 (#8406)
* Update for Catalan/Italian lemmatizer changes

* Add warning about relevance of section
2021-06-21 09:33:50 +02:00
Adriane Boyd
7abfa25035
Don't use the same vocab for source models (#8388)
* Don't use the same vocab for source models

The source models should not be loaded with the vocab from the current
pipeline because this loads the vectors from the source model into the
current vocab.

The strings are all copied in `Language.create_pipe_from_source`, so if
the vectors are configured correctly in the current pipeline, the
sourced component will work as expected. If there is a vector mismatch,
a warning is shown. (It's not possible to inspect whether the vectors
are actually used by the component, so a warning is the best option.)

* Update comment on source model loading
2021-06-21 09:33:33 +02:00
Ines Montani
5c46edbf0d Add link anchor [ci skip] 2021-06-20 11:29:31 +10:00
Ines Montani
02d2fdb123 Add link anchor [ci skip] 2021-06-20 11:29:19 +10:00
Adriane Boyd
83fd04dee5
Update package CLI handling of README and LICENSE (#8422)
* Copy rather than move files to top-level of package
* Add all files to `MANIFEST.in` (primarily for older versions of pip)
* Include the `README.md` contents as `long_description` in the setup
2021-06-18 15:48:53 +02:00
Adriane Boyd
30d4eb506a
Fix setting empty entities in Example.from_dict (#8426) 2021-06-18 10:41:50 +02:00
Adriane Boyd
59da26ddad
Update spacy-lookups-data in Makefile (#8408) 2021-06-17 09:56:36 +02:00
Matthew Honnibal
6f5e308d17
Support negative examples in partial NER annotations (#8106)
* Support a cfg field in transition system

* Make NER 'has gold' check use right alignment for span

* Pass 'negative_samples_key' property into NER transition system

* Add field for negative samples to NER transition system

* Check neg_key in NER has_gold

* Support negative examples in NER oracle

* Test for negative examples in NER

* Fix name of config variable in NER

* Remove vestiges of old-style partial annotation

* Remove obsolete tests

* Add comment noting lack of support for negative samples in parser

* Additions to "neg examples" PR (#8201)

* add custom error and test for deprecated format

* add test for unlearning an entity

* add break also for Begin's cost

* add negative_samples_key property on Parser

* rename

* extend docs & fix some older docs issues

* add subclass constructors, clean up tests, fix docs

* add flaky test with ValueError if gold parse was not found

* remove ValueError if n_gold == 0

* fix docstring

* Hack in environment variables to try out training

* Remove hack

* Remove NER hack, and support 'negative O' samples

* Fix O oracle

* Fix transition parser

* Remove 'not O' from oracle

* Fix NER oracle

* check for spans in both gold.ents and gold.spans and raise if so, to prevent memory access violation

* use set instead of list in consistency check

Co-authored-by: svlandeg <sofie.vanlandeghem@gmail.com>
Co-authored-by: Sofie Van Landeghem <svlandeg@users.noreply.github.com>
2021-06-17 17:33:00 +10:00
Adriane Boyd
02bac8f269
Fix non-deterministic deduplication in Greek lemmatizer (#8421) 2021-06-17 09:11:01 +02:00
svlandeg
bb9d2f1546 extend example to ensure the text is preserved 2021-06-16 23:56:35 +02:00
Adriane Boyd
994bed2fe2
Update dependencies (#8409)
* Require `thinc>=8.0.5`
* Use `spacy-lookups-data>=1.0.2`
2021-06-16 19:50:28 +02:00
Sofie Van Landeghem
e796aab4b3
Resizable textcat (#7862)
* implement textcat resizing for TextCatCNN

* resizing textcat in-place

* simplify code

* ensure predictions for old textcat labels remain the same after resizing (WIP)

* fix for softmax

* store softmax as attr

* fix ensemble weight copy and cleanup

* restructure slightly

* adjust documentation, update tests and quickstart templates to use latest versions

* extend unit test slightly

* revert unnecessary edits

* fix typo

* ensemble architecture won't be resizable for now

* use resizable layer (WIP)

* revert using resizable layer

* resizable container while avoiding shape inference trouble

* cleanup

* ensure model continues training after resizing

* use fill_b parameter

* use fill_defaults

* resize_layer callback

* format

* bump thinc to 8.0.4

* bump spacy-legacy to 3.0.6
2021-06-16 11:45:00 +02:00
Giovanni Toffoli
19521d525b
Added Italian POS-aware lemmatizer. (#8079)
* Added Italian POS-aware lemmatizer.

Also added the code used to build the lookup tables by POS.

* Create gtoffoli.md

* Add imports and format

* Remove helper script

* Use lemma_lookup instead of lemma_lookup_legacy

Co-authored-by: Adriane Boyd <adrianeboyd@gmail.com>
2021-06-16 11:14:45 +02:00
svlandeg
29d83dec0c adjust whitespace tokenizer to avoid sep in split() 2021-06-16 10:58:45 +02:00
Antti Ajanki
5a6125c227
[Finnish tokenizer] Handle conjunction contractions (#8105) 2021-06-16 10:56:47 +02:00
Adriane Boyd
b09be3e1cb
Merge pull request #8397 from adrianeboyd/chore/develop-into-master-v3.1
Merge develop into master for v3.1
2021-06-16 10:54:47 +02:00
Adriane Boyd
33240ed2c5 Temporarily skip model download test 2021-06-16 10:14:42 +02:00
Adriane Boyd
5646fcbe46 Merge remote-tracking branch 'upstream/develop' into chore/develop-into-master-v3.1 2021-06-15 15:05:17 +02:00
Adriane Boyd
480a3bf3be
Make JsonlReader path optional (#8396)
To avoid config errors during training when `[corpora.pretrain.path]` is
`None` with the default `spacy.JsonlCorpus.v1` reader, make the reader
path optional, similar to `spacy.Corpus.v1`.
2021-06-15 14:55:15 +02:00
Paul O'Leary McCann
94e1346f44
Change span lemmas to use original whitespace (fix #8368) (#8391)
* Change span lemmas to use original whitespace (fix #8368)

This is a redo of #8371 based off master.

The test for this required some changes to existing tests. I don't think
the changes were significant but I'd like someone to check them.

* Remove mystery docstring

This sentence was uncompleted for years, and now we will never know how
it ends.
2021-06-15 13:24:54 +02:00
Paul O'Leary McCann
2c105cdbce
Raise error if deps not provided with heads (#8335)
* Fill in deps if not provided with heads

Before this change, if heads were passed without deps they would be
silently ignored, which could be confusing. See #8334.

* Use "dep" instead of a blank string

This is the customary placeholder dep. It might be better to show an
error here instead though.

* Throw error on heads without deps

* Add a test

* Fix tests

* Formatting

* Fix all tests

* Fix a test I missed

* Revise error message

* Clean up whitespace

Co-authored-by: Adriane Boyd <adrianeboyd@gmail.com>
2021-06-15 13:23:32 +02:00
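For illustration, a minimal sketch of the now-required pairing of heads and deps (assumes spaCy v3.x):

```python
# Sketch only: when constructing a Doc with heads, deps must be provided
# too; heads without deps now raises an error instead of being ignored.
import spacy
from spacy.tokens import Doc

vocab = spacy.blank("en").vocab
words = ["She", "works"]
heads = [1, 1]            # index of each token's head
deps = ["nsubj", "ROOT"]  # required alongside heads after this change
doc = Doc(vocab, words=words, heads=heads, deps=deps)
print([(t.text, t.dep_, t.head.text) for t in doc])
```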
Sofie Van Landeghem
0fd0d949c4
fix 's typo's across code base (#8384) 2021-06-15 10:57:08 +02:00
Adriane Boyd
507422149f
Various docs updates for v3.0 (#8353)
* Update cats score names in Scorer API docs

* Refer to performance in meta

* Update package naming/versions, lemmatizer details

* Minor formatting fixes

* Provide more explanation for cats_score_desc

* Provide language-specific lemmatizer defaults in API docs

Co-authored-by: Paul O'Leary McCann <polm@dampfkraft.com>
2021-06-14 12:19:36 +02:00
Adriane Boyd
6b69b8934b
Set version to v3.1.0.dev0 (#8379) 2021-06-14 11:17:35 +02:00
Sofie Van Landeghem
8729307e67
register extract_ngrams layer (#8358)
* register extract_ngrams layer

* fix import

* bump spacy-legacy to 3.0.6

* revert bump (wrong PR)
2021-06-14 10:30:30 +02:00
Adriane Boyd
63d748f80e
Add Catalan and Danish trf to website models (#8378) 2021-06-14 09:50:13 +02:00
Ines Montani
b01ca625ea Update YouTube embed [ci skip] 2021-06-14 10:21:52 +10:00
Ines Montani
f247545123 Fix universe.json and auto-format [ci skip] 2021-06-14 10:21:52 +10:00
Ines Montani
3259faad42 Update YouTube embed [ci skip] 2021-06-14 10:21:01 +10:00
Ines Montani
7f0f674a1b Fix universe.json and auto-format [ci skip] 2021-06-14 10:18:06 +10:00
Adriane Boyd
b98d216205
Update Catalan language data (#8308)
* Update Catalan language data

Update Catalan language data based on contributions from the Text Mining
Unit at the Barcelona Supercomputing Center:

https://github.com/TeMU-BSC/spacy4release/tree/main/lang_data

* Update tokenizer settings for UD Catalan AnCora

Update for UD Catalan AnCora v2.7 with merged multi-word tokens.

* Update test

* Move prefix pattern to more generic infix pattern

* Clean up
2021-06-11 10:21:22 +02:00
Adriane Boyd
d9be9e6cf9
Move README.md and LICENSES_SOURCES in package (#8297)
In addition to `LICENSE`, move the files `README.md` and
`LICENSES_SOURCES` to the top directory in `spacy package` if present in
the model directory.
2021-06-11 10:20:24 +02:00
Adriane Boyd
dbbeab2506
Merge pull request #8285 from adrianeboyd/feature/refactor-logger-warnings
Refactor warnings
2021-06-11 10:20:02 +02:00
Adriane Boyd
f4008bdb13
Restrict pymorphy2 requirement to pymorphy2 mode (#8299)
For the Russian and Ukrainian lemmatizers, restrict the `pymorphy2`
requirement to the mode `pymorphy2` so that lookup or other lemmatizer
modes can be loaded without installing `pymorphy2`.
2021-06-11 10:19:22 +02:00
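A short sketch of the mode-specific requirement (not from the commit): only the `pymorphy2` mode should still need the `pymorphy2` package installed.

```python
import spacy

nlp = spacy.blank("ru")
# This mode requires the pymorphy2 package; other modes should load without it.
nlp.add_pipe("lemmatizer", config={"mode": "pymorphy2"})
print(nlp.pipe_names)  # ['lemmatizer']
```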
Francisco Aranda
704d605599 update spacy-wordnet code example (#8327)
* update spacy-wordnet code example

- include spaCy 2.x and 3.x init alternatives
- upgrade recognai logo

* fix escape chars
2021-06-10 21:55:00 +02:00
Francisco Aranda
0a1a4c665d
update spacy-wordnet code example (#8327)
* update spacy-wordnet code example

- include spaCy 2.x and 3.x init alternatives
- upgrade recognai logo

* fix escape chars
2021-06-10 21:53:11 +02:00
Adriane Boyd
6d2789452e
Restrict cython to <3.0 (#8337) 2021-06-10 11:03:30 +02:00
Adriane Boyd
d52ab13b5f
Update CI: update ubuntu image, add download test (#8298)
* Update CI: update ubuntu image, add download test

* Switch instances to `ubuntu-18.04`
* Add model download test, currently only for one job with python 3.8

* Fix variable name

* Set variables explicitly
2021-06-07 14:46:07 +02:00
graue70
f34dd0b98f
Fix typos in comments (#8279) 2021-06-07 10:43:54 +02:00
Adriane Boyd
9dfd3c9484 Use warnings.warn instead of logger.warning 2021-06-04 17:44:08 +02:00
Sofie Van Landeghem
f0277bdeab Show warning if entity_ruler runs without patterns (#7807)
* Show warning if entity_ruler runs without patterns

* Show warning if matcher runs without patterns

* fix wording

* unit test for warning once (WIP)

* warn W036 only once

* cleanup

* create filter_warning helper
2021-06-04 17:37:38 +02:00
Adriane Boyd
07082c9692
Exclude generated .cpp files from package (#8271) 2021-06-04 14:56:07 +02:00
Paul O'Leary McCann
d959603d51
Don't add duplicate patterns all the time in EntityRuler (fix #8216) (#8246)
* Don't add duplicate patterns (fix #8216)

* Refactor EntityRuler init

This simplifies the EntityRuler init code. This is helpful as prep for
allowing the EntityRuler to reset itself.

* Make EntityRuler.clear reset matchers

Includes a new test for this.

* Tidy PhraseMatcher instantiation

Since the attr can be None safely now, the guard if is no longer
required here.

Also renamed the `_validate` attr. Maybe it's not needed?

* Fix NER test

* Add test to make sure patterns aren't increasing

* Move test to regression tests
2021-06-03 09:05:26 +02:00
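A minimal round-trip sketch in the spirit of the added regression test (not the test itself): the number of stored patterns should stay constant across serialization.

```python
import spacy

nlp = spacy.blank("en")
ruler = nlp.add_pipe("entity_ruler")
ruler.add_patterns([{"label": "ORG", "pattern": "spaCy"}])

data = nlp.to_bytes()
nlp2 = spacy.blank("en")
nlp2.add_pipe("entity_ruler")
nlp2.from_bytes(data)
print(len(nlp2.get_pipe("entity_ruler").patterns))  # expected: 1
```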
Jean-Hugues Roy
ff5cf3606c
Improvements to French stopwords list (#7941)
* "y" etc.

Many changes described in pull request

* Update spacy/lang/fr/stop_words.py

* Update spacy/lang/fr/stop_words.py

Co-authored-by: Sofie Van Landeghem <svlandeg@users.noreply.github.com>
2021-06-02 11:50:49 +02:00
Vito De Tullio
3672464e25
applying suggestion to avoid mypy errors (#8265)
* applying suggestion to avoid mypy errors

* sign contributor agreement
2021-06-02 19:25:30 +10:00
Adriane Boyd
4aa1a7d5a3
Remove unsupported attrs from attrs.IDS (#8132)
The attributes `PROB`, `CLUSTER` and `SENT_END` are not supported by
`Lexeme.get_struct_attr` so should not be included through `attrs.IDS`
as supported attributes in `Doc.to_array` and other methods.
2021-06-02 19:16:57 +10:00
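For reference, a small example of the supported usage with attributes that `Lexeme.get_struct_attr` does handle:

```python
import spacy

nlp = spacy.blank("en")
doc = nlp("A short sentence")
array = doc.to_array(["ORTH", "LOWER", "IS_ALPHA"])
print(array.shape)  # (3, 3): one row per token, one column per attribute
```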
Paul O'Leary McCann
d54631f68b
Fix other open calls without context managers (#8245) 2021-05-31 19:04:29 +10:00
Paul O'Leary McCann
5aba213349 Fix skweak Github URL
GitHub entry should not contain a URL, just user/repo
2021-05-31 18:00:43 +09:00
Kristian Boda
0035db4103 Add hmrb to spaCy Universe (#8129)
* docs: add hmrb to spacy universe

* docs: add sentence on spacy versions

* docs: update description and images

* misc: add spaCy Contributor Agreement
2021-05-31 10:41:34 +02:00
Kristian Boda
dc8d8d15d2
Add hmrb to spaCy Universe (#8129)
* docs: add hmrb to spacy universe

* docs: add sentence on spacy versions

* docs: update description and images

* misc: add spaCy Contributor Agreement
2021-05-31 18:40:48 +10:00
Dhruv Naik
283f64a98d
Fix bug from Entityruler: ent_ids returns None for phrases (#8169)
* bugfix for explosion/spaCy#8168

* add test for explosion/spaCy#8168
2021-05-31 18:38:53 +10:00
Michael K
b0467d2972
Add project urls to package metadata (#7728)
This adds the links to PyPI. To see that in action check out
https://pypi.org/project/Django/ (source code:
b8c9e9fae1/setup.cfg (L27-L32))
2021-05-31 18:38:29 +10:00
Narayan Acharya
6b79714080
Address missing config overrides post load of models (#8208) 2021-05-31 18:36:52 +10:00
Sofie Van Landeghem
fff662e41f
Ensemble textcat with listener (#8012)
* add unit test for two listeners, with a textcat ensemble in the middle

* return zero gradients instead of None in accumulate_gradient
2021-05-31 18:21:06 +10:00
Sofie Van Landeghem
ff91e6dac7
Show warning if entity_ruler runs without patterns (#7807)
* Show warning if entity_ruler runs without patterns

* Show warning if matcher runs without patterns

* fix wording

* unit test for warning once (WIP)

* warn W036 only once

* cleanup

* create filter_warning helper
2021-05-31 18:20:27 +10:00
Paul O'Leary McCann
d1a221a374
Add all symbols in Unicode Currency Symbols block (#8212)
* Add all symbols in Unicode Currency Symbols block

In #8102 it came up that the rupee symbol was treated different from
dollar / euro / yen symbols. This adds many symbols not already
included.

* Fix test

* Fix training test
2021-05-31 18:03:40 +10:00
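A quick check of the intended behaviour (the exact tokenization depends on the language data, so treat the expected output as approximate):

```python
import spacy

nlp = spacy.blank("en")
print([t.text for t in nlp("It costs $50 and ₹4000")])
# expected along the lines of: ['It', 'costs', '$', '50', 'and', '₹', '4000']
```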
Paul O'Leary McCann
04239e94c7
Use a context manager when reading model (fix #7036) (#8244) 2021-05-31 17:36:17 +10:00
Sofie Van Landeghem
fc37715cfb
ensure 'spacy ray' works (#7799)
* ensure 'spacy ray' works

* better fix by changing entry point
2021-05-28 18:15:31 +02:00
Ines Montani
5957ab74f7
Merge pull request #8112 from svlandeg/bugfix/replace-trf 2021-05-28 11:35:17 +10:00
Sofie Van Landeghem
4b81f58eda fix docs (#8200) 2021-05-27 10:50:46 +02:00
Sofie Van Landeghem
3c58c0323f
fix docs (#8200) 2021-05-27 10:48:59 +02:00
Sofie Van Landeghem
290bd6ed39
ensure tolerance is properly passed on (#8158) 2021-05-27 18:10:28 +10:00
Paul O'Leary McCann
ee62344970 Fix skweak Github URL
Github entry should not contain url, just user/repo
2021-05-24 20:31:43 +09:00
Paul O'Leary McCann
68ccfc4c39 Fix docs (fix #8189) 2021-05-24 19:49:21 +09:00
Paul O'Leary McCann
0c553ecd4e Fix docs (fix #8189) 2021-05-24 19:47:30 +09:00
Adriane Boyd
cd6bd91c3a
Switch default train corpus max_length to 0 in quickstart (#8142)
The behavior of `spacy.Corpus.v1` is unexpected enough for `max_length
!= 0` that `0` is a better default for users creating a new config with
the quickstart.

If not, documents are skipped, sometimes the entire corpus is skipped,
and sometimes documents are (quite unexpectedly for your average user)
split into sentences.
2021-05-20 14:48:09 +02:00
Sofie Van Landeghem
202943bc8c
KB & NEL to/from bytes (#8113)
* unit test for pickling KB

* add pickling test for NEL

* KB to_bytes and from_bytes

* NEL to_bytes and from_bytes

* xfail pickle tests for now

* fix docs

* cleanup
2021-05-20 18:11:30 +10:00
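A rough sketch of the new KB round-trip, using the v3.0-era `KnowledgeBase` API; the entity ID, alias, frequencies and vectors are made up.

```python
import spacy
from spacy.kb import KnowledgeBase

nlp = spacy.blank("en")
kb = KnowledgeBase(nlp.vocab, entity_vector_length=3)
kb.add_entity(entity="Q2146908", freq=12, entity_vector=[1.0, 2.0, 3.0])
kb.add_alias(alias="Russ Cochran", entities=["Q2146908"], probabilities=[1.0])

kb_bytes = kb.to_bytes()
kb2 = KnowledgeBase(nlp.vocab, entity_vector_length=3)
kb2.from_bytes(kb_bytes)
print(kb2.get_size_entities(), kb2.get_size_aliases())  # 1 1
```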
Adriane Boyd
4e69fcaa50 Disable GPU CI tests (#8143) 2021-05-19 12:00:31 +02:00
Adriane Boyd
f6128c06b0
Disable GPU CI tests (#8143) 2021-05-19 12:00:07 +02:00
Adriane Boyd
06324e5a5e
Update pydantic requirements (#8127)
Update pydantic requirements following
https://github.com/explosion/thinc/pull/499
2021-05-18 11:35:50 +02:00
Adriane Boyd
e0b0892ef2 Minor updates to quickstart settings/instructions (#7965)
* Minor updates to quickstart settings/instructions

* set default value of textcat exclusive to `false` until the default
checkbox behavior is updated
* add the `morphologizer` to the list of components
* add a note that v3.0.6+ is required

* Switch to warning above quickstart

* Undo changes to textcat default in quickstart

Co-authored-by: Sofie Van Landeghem <svlandeg@users.noreply.github.com>

Co-authored-by: Sofie Van Landeghem <svlandeg@users.noreply.github.com>
2021-05-17 16:55:50 +02:00
Adriane Boyd
6baab565eb
Minor updates to quickstart settings/instructions (#7965)
* Minor updates to quickstart settings/instructions

* set default value of textcat exclusive to `false` until the default
checkbox behavior is updated
* add the `morphologizer` to the list of components
* add a note that v3.0.6+ is required

* Switch to warning above quickstart

* Undo changes to textcat default in quickstart

Co-authored-by: Sofie Van Landeghem <svlandeg@users.noreply.github.com>

Co-authored-by: Sofie Van Landeghem <svlandeg@users.noreply.github.com>
2021-05-17 16:55:22 +02:00
Adriane Boyd
2c545c4c5b
Fix offsets in Span.get_lca_matrix (#8116)
* Fix range in Span.get_lca_matrix

Fix the adjusted token index / lca matrix index ranges for
`_get_lca_matrix` for spans.

* The range for `k` should correspond to the adjusted indices in
`lca_matrix` with the `start` indexed at `0`

* Update test for v3.x
2021-05-17 16:54:23 +02:00
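A minimal sketch of the fixed behaviour (the heads and deps are made up just to give the doc a parse); indices in the returned matrix are relative to the span start.

```python
import spacy
from spacy.tokens import Doc

nlp = spacy.blank("en")
doc = Doc(
    nlp.vocab,
    words=["She", "likes", "green", "apples"],
    heads=[1, 1, 3, 1],
    deps=["nsubj", "ROOT", "amod", "dobj"],
)
span = doc[1:4]
print(span.get_lca_matrix())  # a 3x3 matrix with span-relative indices
```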
Sofie Van Landeghem
0dffc5d9e2
Custom warning if the doc_bin is too large (#8069)
* custom warning if the doc_bin is too large

* cleanup

* Update spacy/errors.py

Co-authored-by: Adriane Boyd <adrianeboyd@gmail.com>

* fix numbering

* fixing numbering once more

* fixing this seems to be pretty hard

Co-authored-by: Adriane Boyd <adrianeboyd@gmail.com>
2021-05-17 15:48:40 +02:00
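For context, a basic `DocBin` round-trip (this tiny example will not trigger the new size warning):

```python
import spacy
from spacy.tokens import DocBin

nlp = spacy.blank("en")
doc_bin = DocBin(docs=[nlp("one doc"), nlp("another doc")])
data = doc_bin.to_bytes()
docs = list(DocBin().from_bytes(data).get_docs(nlp.vocab))
print(len(docs))  # 2
```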
Adriane Boyd
b120fb3511
Handle errors while multiprocessing (#8004)
* Handle errors while multiprocessing

Handle errors while multiprocessing without hanging.

* Return the traceback for errors raised while processing a batch, which
  can be handled by the top-level error handler
* Allow for shortened batches due to custom error handlers that ignore
  errors and skip documents

* Define custom components at a higher level

* Also move up custom error handler

* Use simpler component for test

* Switch error type

* Adjust test

* Only call top-level error handler for exceptions

* Register custom test components within tests

Use global functions (so they can be pickled) but register the
components only within the individual tests.
2021-05-17 13:28:39 +02:00
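A sketch of the usage this change hardens: errors raised in worker processes during `nlp.pipe(..., n_process=...)` should now surface with a traceback instead of hanging the parent process.

```python
import spacy

if __name__ == "__main__":  # guard needed for multiprocessing on some platforms
    nlp = spacy.blank("en")
    texts = ["first text", "second text", "third text"]
    for doc in nlp.pipe(texts, n_process=2, batch_size=2):
        print(doc[0].text)
```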
Adriane Boyd
8a2602051c
Update debug data for textcat (#8066)
* Check for unsupported cats values
* Only show labels if train/dev mismatched
* Don't show label counts (only counting positive labels seems odd)
* Use warnings for mismatched train/dev labels
2021-05-17 13:27:04 +02:00
Adriane Boyd
1d59fdbd39
Update Vietnamese tokenizer (#8099)
* Adapt tokenization methods from `pyvi` to preserve text encoding and
whitespace
* Add serialization support similar to Chinese and Japanese

Note: as for Chinese and Japanese, some settings are duplicated in
`config.cfg` and `tokenizer/cfg`.
2021-05-17 18:16:20 +10:00
Adriane Boyd
fe3a4aa846
Add ENT_ID and NORM to DocBin strings (#8054)
Save strings for token attributes `ENT_ID` and `NORM` in `DocBin`
strings.
2021-05-17 18:06:11 +10:00
Adriane Boyd
82fa81d095
Make all Span attrs writable (#8062)
Also allow `Span` string properties `label_` and `kb_id_` to be writable
following #6696.
2021-05-17 18:05:45 +10:00
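A quick demonstration of the now-writable string attributes (the label and KB ID values are made up):

```python
import spacy

nlp = spacy.blank("en")
doc = nlp("San Francisco is foggy")
span = doc[0:2]
span.label_ = "GPE"
span.kb_id_ = "Q62"
print(span.label_, span.kb_id_)  # GPE Q62
```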
svlandeg
b403f924ee Merge remote-tracking branch 'upstream/master' into bugfix/replace-trf 2021-05-17 09:47:47 +02:00
Ines Montani
47bae9f8ec Merge pull request #8096 from juliensalinas/master [ci skip] 2021-05-17 13:59:03 +10:00
Ines Montani
595ef03e23
Merge pull request #8096 from juliensalinas/master [ci skip] 2021-05-17 13:58:37 +10:00
Paul O'Leary McCann
0b77843768 Minor formatting fixes 2021-05-14 20:12:58 +09:00
Frederic R. Hopp
44b66ce78f Update universe.json
fixed typo
2021-05-14 19:59:34 +09:00
Frederic R. Hopp
78bb4275d5 Update universe.json
Added more detailed description to eMFDscore project
2021-05-14 19:59:34 +09:00
Frederic R. Hopp
93d9860cba Update universe.json 2021-05-14 19:59:33 +09:00
Julien Salinas
c496f78245 Add NLP Cloud to Universe. 2021-05-14 11:13:44 +02:00
Julien Salinas
a176d2209a Sign contributors agreement. 2021-05-14 11:00:27 +02:00
Paul O'Leary McCann
2dc6db53fd
Merge pull request #8072 from medianeuroscience/master
Added eMFDscore to universe.json
2021-05-14 11:58:30 +09:00
Frederic R. Hopp
c5962b9fba
Update universe.json
fixed typo
2021-05-13 07:40:05 -07:00
Frederic R. Hopp
a9ca221e03
Update universe.json
Added more detailed description to eMFDscore project
2021-05-12 09:20:17 -07:00
svlandeg
235e9f5488 call replace_listener_cfg attr if it's available 2021-05-12 17:19:38 +02:00
svlandeg
44a3a58599 call replace_listener attr if it's available 2021-05-12 16:01:02 +02:00
svlandeg
ece8be4fec extend test to training with replaced tok2vec layer 2021-05-12 11:32:22 +02:00
Frederic R. Hopp
7bba9cdc14
Update universe.json 2021-05-11 19:18:19 -07:00
Adriane Boyd
d5bbd1f94f
Handle partial entities in Span.as_doc (#8055)
* Handle partial entities in Span.as_doc

In `Span.as_doc` replace partial entities at the beginning or end of the
span with missing entity annotation.

Fixes a bug where invalid entity annotation (no initial `B`) was
returned for an initial partial entity.

* Check for empty span in ents conversion

Note: `Span.as_doc()` will still fail on an empty span due to failures
in `Span.vector`.
2021-05-11 17:10:16 +02:00
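A minimal sketch of the case being fixed (the text and entity span are made up): slicing through the middle of an entity and calling `as_doc()` should yield missing entity annotation rather than an invalid sequence.

```python
import spacy
from spacy.tokens import Span

nlp = spacy.blank("en")
doc = nlp("I work at the New York Times")
doc.ents = [Span(doc, 4, 7, label="ORG")]  # "New York Times"
partial = doc[0:6]                         # cuts the entity in half
new_doc = partial.as_doc()
print([t.ent_iob_ for t in new_doc])
```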
Ines Montani
77ee7c872b Fix default transformer in quickstart generator (resolves #8018) [ci skip] 2021-05-11 11:27:30 +10:00
Ines Montani
3883d49446 Fix default transformer in quickstart generator (resolves #8018) [ci skip] 2021-05-11 11:27:08 +10:00
Paul O'Leary McCann
bdeaf3a18b
Fix/fix en ordinals (#8028)
* Fix #8019

"th" is not the only ordinal ending.

* Add some more ordinal tests
2021-05-07 10:26:42 +02:00
Adriane Boyd
40ca23bde0 Fix new version for match_alignments (#8021) 2021-05-07 09:56:22 +02:00
Adriane Boyd
71c2a3ab47
Fix new version for match_alignments (#8021) 2021-05-07 09:55:20 +02:00
Jeno Pizarro
7cc8df1a28 Update negspacy example code for spaCy 3.0 (#8022) 2021-05-07 09:35:07 +02:00
Jeno Pizarro
5cf76ab608
Update negspacy example code for spaCy 3.0 (#8022) 2021-05-07 09:33:21 +02:00
Adriane Boyd
6788d90f61
Preserve existing ENT_KB_ID annotation in NER (#7988)
* Preserve existing ENT_KB_ID annotation in NER

Preserve `ent_kb_id` annotation on existing entity spans, which is not
preserved by the transition system.

* Simplify kb_id assignment

* Simplify further
2021-05-06 18:49:55 +10:00
Sofie Van Landeghem
02a6a5fea0
Fix 'debug model' for transformers + generalize (#7973)
* add overrides to docs

* fix debug model with transformer

* assume training data is set in config
2021-05-06 18:43:32 +10:00
Adriane Boyd
cc5aeaed29
Add Chinese PTB tags to glossary (#7993) 2021-05-06 18:43:03 +10:00
Adriane Boyd
0a22fed634
Fix span offsets for Matcher(as_spans) on spans (#7992)
Fix returned span offsets for `Matcher(as_spans=True)(span)`.
2021-05-06 18:42:44 +10:00
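A small sketch of matching over a `Span` with `as_spans=True` (the pattern is made up); the returned offsets should refer to the original `Doc`.

```python
import spacy
from spacy.matcher import Matcher

nlp = spacy.blank("en")
matcher = Matcher(nlp.vocab)
matcher.add("HELLO_WORLD", [[{"LOWER": "hello"}, {"LOWER": "world"}]])

doc = nlp("well hello world again")
inner = doc[1:4]
for span in matcher(inner, as_spans=True):
    print(span.text, span.start, span.end)  # offsets relative to doc
```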
Adriane Boyd
7d5db41ac3
Skip vector ngram backoff if minn is not set (#7925) 2021-05-06 18:34:35 +10:00
Sofie Van Landeghem
e9037d8fc0
make EntityLinker robust for nO=None (#7930) 2021-05-06 18:14:47 +10:00
Paul O'Leary McCann
66bfabd839
Fix pretraining objectives fragment (#8005)
* Fix pretraining objectives fragment

The fragment here is reused from a heading higher up, so you couldn't
link to this section.

* Fix section link to new fragment
2021-05-06 08:27:36 +02:00
Adriane Boyd
a71194362f
Fix Docs.from_docs for all empty docs (#8009) 2021-05-05 18:44:14 +02:00
meghanabhange
46311cf03f Update details in universe denomme | Multilingual Name Detection (#7982)
* Add denomme

* spaCy contributor agreement

* Update install and thumb

Co-authored-by: Adriane Boyd <adrianeboyd@gmail.com>
2021-05-05 17:14:14 +02:00
meghanabhange
debaab7021
Update details in universe denomme | Multilingual Name Detection (#7982)
* Add denomme

* spaCy contributor agreement

* Update install and thumb

Co-authored-by: Adriane Boyd <adrianeboyd@gmail.com>
2021-05-05 17:12:13 +02:00
Adriane Boyd
31528f62ed
Add / to nb infixes (#7991) 2021-05-04 11:00:10 +02:00
Santiago Castro
e99ff6f255
Fix typo in Language docstrings (#7958) 2021-05-03 14:44:09 +02:00
Ines Montani
62a01956c3 Fix quickstart default checked of conditional fields [ci skip] 2021-05-03 14:04:45 +02:00
Ines Montani
12d3d0fedd Fix quickstart default checked of conditional fields [ci skip] 2021-05-03 11:48:12 +10:00
Adriane Boyd
ffaa0d6b9b Fix Transformer.initialize example (#7963) 2021-04-30 12:21:59 +02:00
Adriane Boyd
2320791f6d
Fix Transformer.initialize example (#7963) 2021-04-30 12:21:31 +02:00
Adriane Boyd
cf032ec31e
Update to catalogue>=2.0.4 (#7951) 2021-04-29 19:11:28 +02:00
Adriane Boyd
7cf5bd072f
Refactor util.to_ternary_int (#7944)
* Refactor to avoid literal comparison with `is`
* Extend tests
2021-04-29 16:58:54 +02:00
Sevdimali
49aed683cc
Azerbaijani language added (#7911) 2021-04-28 14:42:02 +02:00
Adriane Boyd
f4080983ea
Extend to cupy 9.0.0 (#7914) 2021-04-28 10:18:24 +02:00
Paul O'Leary McCann
8007d5c814
Check if the resume path points to a directory (#7919)
This came up in #7878, but if --resume-path is a directory then loading
the weights will fail. On Linux this will give a straightforward error
message, but on Windows it gives "Permission Denied", which is
confusing.
2021-04-28 09:17:15 +02:00
Paul O'Leary McCann
de6b5ed14d
Fix percent unk display in debug data (#7886)
* Fix percent unk display

This was showing (ratio %), so 10% would show as 0.10%. Fix by
multiplying the ratio by 100.

Might want to add a warning if this is over a threshold.

* Only show whole-integer percents
2021-04-27 09:16:35 +02:00
Janis Klaise
b33fb9ac1e Update load_lookups return type and docstring (#7907)
* Update load_lookups return type and docstring

* Add contributor agreement
2021-04-27 09:14:59 +02:00
Janis Klaise
1690595e4d
Update load_lookups return type and docstring (#7907)
* Update load_lookups return type and docstring

* Add contributor agreement
2021-04-27 09:13:39 +02:00
Adriane Boyd
946a4284be Set spacy-legacy to >=3.0.5 (#7897)
Set `spacy-legacy` to `>=3.0.5` due to `spacy.StaticVectors.v1` init bug.
2021-04-26 18:25:39 +02:00
Adriane Boyd
874cd02539
Set spacy-legacy to >=3.0.5 (#7897)
Set `spacy-legacy` to `>=3.0.5` due to `spacy.StaticVectors.v1` init bug.
2021-04-26 17:06:32 +02:00
Adriane Boyd
ae855a4625
Clean up Morphology imports and definitions (#7441)
* Clean up Morphology imports and definitions

* Whitespace formatting
2021-04-26 16:54:23 +02:00
Adriane Boyd
ceee1ecf17
Replace cpdef variables with cdef (#7834) 2021-04-26 16:54:02 +02:00
Adriane Boyd
95c0833656
Add training option to set annotations on update (#7767)
* Add training option to set annotations on update

Add a `[training]` option called `set_annotations_on_update` to specify
a list of components for which the predicted annotations should be set
on `example.predicted` immediately after that component has been
updated. The predicted annotations can be accessed by later components
in the pipeline during the processing of the batch in the same `update`
call.

* Rename to annotates / annotating_components

* Add test for `annotating_components` when training from config

* Add documentation
2021-04-26 16:53:53 +02:00
Jacopo Farina
c105ed10fd
Remove torino from stop words (#7634)
Torino is the proper name of a city and the token has no other meaning
2021-04-26 16:53:43 +02:00
Sofie Van Landeghem
e0b29f8ef7
Fix scoring normalization (#7629)
* fix scoring normalization

* score weights by total sum instead of per component

* cleanup

* more cleanup
2021-04-26 16:53:38 +02:00
Sofie Van Landeghem
95e3cf576b
Optionally append lang for packaged model name (#7417)
* Add empty lines at the end of Python files

* Only prepend the lang code if it's not there already

* Update spacy/cli/package.py

* fix whitespace stripping
2021-04-26 16:53:21 +02:00
Adriane Boyd
29ac7f776a Merge branch 'master' into spacy.io 2021-04-24 12:58:47 +02:00
Adriane Boyd
df3444421a
Update spacy-legacy to >=3.0.4 (#7865) 2021-04-23 12:16:12 +02:00
Adriane Boyd
8a95475b3d
Set version to v3.0.6 (#7854) 2021-04-22 16:33:26 +02:00
Adriane Boyd
36ecba224e
Set up GPU CI testing (#7293)
* Set up CI for tests with GPU agent

* Update tests for enabled GPU

* Fix steps filename

* Add parallel build jobs as a setting

* Fix test requirements

* Fix install test requirements condition

* Fix pipeline models test

* Reset current ops in prefer/require testing

* Fix more tests

* Remove separate test_models test

* Fix regression 5551

* fix StaticVectors for GPU use

* fix vocab tests

* Fix regression test 5082

* Move azure steps to .github and reenable default pool jobs

* Consolidate/rename azure steps

Co-authored-by: svlandeg <sofie.vanlandeghem@gmail.com>
2021-04-22 14:58:29 +02:00
Adriane Boyd
bdb485cc80
Add callback to copy vocab/tokenizer from model (#7750)
* Add callback to copy vocab/tokenizer from model

Add callback `spacy.copy_from_base_model.v1` to copy the tokenizer
settings and/or vocab (including vectors) from a base model.

* Move spacy.copy_from_base_model.v1 to spacy.training.callbacks

* Add documentation

* Modify to specify model as tokenizer and vocab params
2021-04-22 12:36:50 +02:00
Adriane Boyd
f68fc29130
Update sent_starts in Example.from_dict (#7847)
* Update sent_starts in Example.from_dict

Update `sent_starts` for `Example.from_dict` so that `Optional[bool]`
values have the same meaning as for `Token.is_sent_start`.

Use `Optional[bool]` as the type for sent start values in the docs.

* Use helper function for conversion to ternary ints
2021-04-22 11:32:45 +02:00
Adriane Boyd
f4339f9bff
Fix tokenizer cache flushing (#7836)
* Fix tokenizer cache flushing

Fix/simplify tokenizer init detection in order to fix cache flushing
when properties are modified.

* Remove init reloading logic

* Remove logic disabling `_reload_special_cases` on init
  * Setting `rules` last in `__init__` (as before) means that setting
    other properties doesn't reload any special cases
  * Reset `rules` first in `from_bytes` so that setting other properties
    during deserialization doesn't reload any special cases
    unnecessarily
* Reset all properties in `Tokenizer.from_bytes` to allow any settings
  to be `None`

* Also reset special matcher when special cache is flushed

* Remove duplicate special case validation

* Add test for special cases flushing

* Extend test for tokenizer deserialization of None values
2021-04-22 18:14:57 +10:00
Sofie Van Landeghem
047d912904 fix typo in entity_linker docs 2021-04-22 10:10:31 +02:00
Sofie Van Landeghem
cfad7e21d5
fix config parsing of ints/strings (#7755)
* add few failing tests for parsing integers and strings

* bump thinc to 8.0.3
2021-04-22 18:09:13 +10:00
Adriane Boyd
d2bdaa7823
Replace negative rows with 0 in StaticVectors (#7674)
* Replace negative rows with 0 in StaticVectors

Replace negative row indices with 0-vectors in `StaticVectors`.

* Increase versions related to StaticVectors

* Increase versions of all architectures and layers related to
`StaticVectors`
* Improve efficiency of 0-vector operations

Parallel `spacy-legacy` PR: https://github.com/explosion/spacy-legacy/pull/5

* Update config defaults to new versions

* Update docs
2021-04-22 18:04:15 +10:00
Sofie Van Landeghem
6f565cf39d
fix typo in entity_linker docs 2021-04-22 09:59:24 +02:00
Sofie Van Landeghem
47bbc46392 update EL training data format in docs (#7839)
* update EL training data format

* fix typo

* all -1 because reasons
2021-04-22 08:50:31 +02:00
Sofie Van Landeghem
2e746dbf32
update EL training data format in docs (#7839)
* update EL training data format

* fix typo

* all -1 because reasons
2021-04-22 08:50:09 +02:00
meghanabhange
7985e6bb39 Project Idea : denomme | Multilingual Name Detection (#7845)
* Add denomme

* spaCy contributor agreement

Co-authored-by: Adriane Boyd <adrianeboyd@gmail.com>
2021-04-22 08:48:41 +02:00
meghanabhange
49ff1126bf
Project Idea : denomme | Multilingual Name Detection (#7845)
* Add denomme

* spaCy contributor agreement

Co-authored-by: Adriane Boyd <adrianeboyd@gmail.com>
2021-04-22 08:48:17 +02:00
Sam Edwardes
05c609cdeb Added a logo to spaCyTextBlob (#7818)
* Added a logo to spaCyTextBlob

* Updated to better thumb
2021-04-22 08:42:14 +02:00
Sam Edwardes
b8c6c10c6f
Added a logo to spaCyTextBlob (#7818)
* Added a logo to spaCyTextBlob

* Updated to better thumb
2021-04-22 08:41:55 +02:00
Diego Palma
ac101cba00 Add TRUNAJOD to spaCy universe. (#7754)
* Add TRUNAJOD to spaCy universe.

* Add trunajod logo and thumb.

Co-authored-by: Diego <dpalma@evernote.com>
2021-04-22 08:41:03 +02:00
Diego Palma
bbade153ed
Add TRUNAJOD to spaCy universe. (#7754)
* Add TRUNAJOD to spaCy universe.

* Add trunajod logo and thumb.

Co-authored-by: Diego <dpalma@evernote.com>
2021-04-22 08:40:28 +02:00
Ines Montani
ee68dc260f Auto-format [ci skip] 2021-04-22 10:58:18 +10:00
Ines Montani
a9e5ae9b5c Auto-format [ci skip] 2021-04-22 10:58:05 +10:00
Ines Montani
3931fa146b Merge branch 'spacy.io' of https://github.com/explosion/spaCy into spacy.io 2021-04-22 10:57:25 +10:00
Ines Montani
c3f7d33f8e Merge pull request #7851 from plison/master [ci skip] 2021-04-22 10:57:08 +10:00
Pierre Lison
663a160867 adding skweak to the SpaCy universe 2021-04-22 10:57:08 +10:00
Pierre Lison
bb961a2c11 adding skweak to the SpaCy universe 2021-04-22 10:57:08 +10:00
Ines Montani
5cbe414ce6
Merge pull request #7851 from plison/master [ci skip] 2021-04-22 10:56:35 +10:00
Pierre Lison
2f0ef2c9cc adding skweak to the SpaCy universe 2021-04-22 01:16:34 +02:00
Pierre Lison
debfb46088 adding skweak to the SpaCy universe 2021-04-22 00:58:09 +02:00
Shantam Raj
5aac993604 Default code for Setting Entity annotations on the website errors (#7738)
* the default example for "Setting entity annotations" errors on Binder

* updating contributor info

* using a new variable to store original entities
2021-04-21 09:18:22 +02:00
Shantam Raj
6017fcf693
Default code for Setting Entity annotations on the website errors (#7738)
* the default example for "Setting entity annotations" errors on Binder

* updating contributor info

* using a new variable to store original entities
2021-04-21 09:16:32 +02:00
Ines Montani
1c1087e4ff Merge pull request #7826 from richardpaulhudson/master
Add entry for Coreferee project to universe.json
2021-04-21 16:23:09 +10:00
hudsonr
1eaf6e5ccb Added universe entry for Coreferee 2021-04-21 16:23:09 +10:00
Ines Montani
aad5ba13af
Merge pull request #7826 from richardpaulhudson/master
Add entry for Coreferee project to universe.json
2021-04-21 16:22:43 +10:00
hudsonr
2722424ec5 Added universe entry for Coreferee 2021-04-19 14:28:06 +02:00
langdonholmes
cef9f25ec0 Update processing-pipelines.md to mention method for doc metadata (#7480)
* Update processing-pipelines.md

Under "things to try," inform users they can save metadata when using nlp.pipe(foobar, as_tuples=True)

Link to a new example on the attributes page detailing the following:

> ```
> data = [
>   ("Some text to process", {"meta": "foo"}),
>   ("And more text...", {"meta": "bar"})
> ]
> 
> for doc, context in nlp.pipe(data, as_tuples=True):
>     # Let's assume you have a "meta" extension registered on the Doc
>     doc._.meta = context["meta"]
> ```

from https://stackoverflow.com/questions/57058798/make-spacy-nlp-pipe-process-tuples-of-text-and-additional-information-to-add-as

* Updating the attributes section

Update the attributes section with example of how extensions can be used to store metadata.

* Update processing-pipelines.md

* Update processing-pipelines.md

Made as_tuples example executable and relocated to the end of the "Processing Text" section.

* Update processing-pipelines.md

* Update processing-pipelines.md

Removed extra line

* Reformat and rephrase

Co-authored-by: Adriane Boyd <adrianeboyd@gmail.com>
2021-04-19 12:00:45 +02:00
langdonholmes
df541c6b5e
Update processing-pipelines.md to mention method for doc metadata (#7480)
* Update processing-pipelines.md

Under "things to try," inform users they can save metadata when using nlp.pipe(foobar, as_tuples=True)

Link to a new example on the attributes page detailing the following:

> ```
> data = [
>   ("Some text to process", {"meta": "foo"}),
>   ("And more text...", {"meta": "bar"})
> ]
> 
> for doc, context in nlp.pipe(data, as_tuples=True):
>     # Let's assume you have a "meta" extension registered on the Doc
>     doc._.meta = context["meta"]
> ```

from https://stackoverflow.com/questions/57058798/make-spacy-nlp-pipe-process-tuples-of-text-and-additional-information-to-add-as

* Updating the attributes section

Update the attributes section with example of how extensions can be used to store metadata.

* Update processing-pipelines.md

* Update processing-pipelines.md

Made as_tuples example executable and relocated to the end of the "Processing Text" section.

* Update processing-pipelines.md

* Update processing-pipelines.md

Removed extra line

* Reformat and rephrase

Co-authored-by: Adriane Boyd <adrianeboyd@gmail.com>
2021-04-19 11:58:12 +02:00
Adriane Boyd
0e7f94b247
Update Tokenizer.explain with special matches (#7749)
* Update Tokenizer.explain with special matches

Update `Tokenizer.explain` and the pseudo-code in the docs to include
the processing of special cases that contain affixes or whitespace.

* Handle optional settings in explain

* Add test for special matches in explain

Add test for `Tokenizer.explain` for special cases containing affixes.
2021-04-19 19:08:20 +10:00
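For a quick look at the output being extended here (the exact splits depend on the English tokenizer data):

```python
import spacy

nlp = spacy.blank("en")
# Tokenizer.explain returns (pattern name, token text) pairs for debugging.
for pattern_name, token_text in nlp.tokenizer.explain("Let's go!"):
    print(pattern_name, repr(token_text))
```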
Adriane Boyd
07b41c38ae
Register CharEmbed layer (#7805) 2021-04-19 18:39:34 +10:00
Sofie Van Landeghem
c786e98e56
assemble CLI command (#7783)
* assemble CLI command

* ensure assemble runs even without training section

* cleanup
2021-04-19 18:39:11 +10:00
Adriane Boyd
15bd230413
Set catalogue lower pin to v2.0.3 (#7762)
* Set catalogue lower pin to v2.0.2

* Update importlib-metadata pins to match

* Require catalogue v2.0.3

Switch to vendored `importlib-metadata` v3.2.0 provided by `catalogue`.
2021-04-19 18:37:17 +10:00
Adriane Boyd
1ad646cbcf
Improve checks for sourced components (#7490)
* Improve checks for sourced components

* Remove language class checks

* Convert python warning to logger warning

* Remove unused warning

* Fix formatting
2021-04-19 18:36:32 +10:00
Sofie Van Landeghem
05bdbe28bb
Fix vectors data on GPU (#7626)
* ensure vectors data is stored on right device

* ensure the added vector is on the right device

* move vector to numpy before iterating

* move best_rows to numpy before iterating
2021-04-19 18:30:03 +10:00
Bram Vanroy
ed561cf428
Terminology: deprecated vs obsolete (#7621)
* Terminology: deprecated vs obsolete

Typically, deprecated is used for functionality that is bound to become unavailable but that can still be used. Obsolete is used for features that have been removed. In E941, I think what is meant is "obsolete" since loading a model by a shortcut simply does not work anymore (and throws an error). This is different from downloading a model with a shortcut, which is deprecated but still works.

In light of this, perhaps all other error codes should be checked as well.

* clarify that the link command is removed and not just deprecated

Co-authored-by: svlandeg <sofie.vanlandeghem@gmail.com>
2021-04-12 14:37:00 +02:00
Sofie Van Landeghem
8d7af5b2b1
Ensure hyphen in config file works as string value (#7642)
* add test for serializing '-' in a config file

* bump srsly to 2.4.1
2021-04-12 14:35:57 +02:00
Sofie Van Landeghem
27dbbb9903
Bugfix/nel crossing sentence (#7630)
* ensure each entity gets a KB ID, even when it's not within a sentence

* cleanup
2021-04-12 18:08:01 +10:00
Sofie Van Landeghem
fd6eebbfdc expand quickstart widget with cuda 11.1 and 11.2 (#7615) 2021-04-09 20:36:55 +02:00
Adriane Boyd
673e2bc4c0
Add usage docs for streamed train corpora (#7693) 2021-04-09 16:15:38 +02:00
Adriane Boyd
73a8c0f992
Update debug data further for v3 (#7602)
* Update debug data further for v3

* Remove new/existing label distinction (new labels are not immediately
distinguishable because the pipeline is already initialized)
* Warn on missing labels in training data for all components except parser
* Separate textcat and textcat_multilabel sections
* Add section for morphologizer

* Reword missing label warnings
2021-04-09 11:53:42 +02:00
Stanislav Schmidt
2516896849
Make vocab update in get_docs deterministic (#7603)
* Make vocab update in get_docs deterministic

The attribute `DocBin.strings` is a set. In `DocBin.get_docs`
a given vocab is updated by iterating over this set.
Iteration over a python set produces an arbitrary ordering,
therefore vocab is updated non-deterministically.

When training (fine-tuning) a spacy model, the base model's
vocabulary will be updated with the new vocabulary in the
training data in exactly the way described above. After
serialization, the file `model/vocab/strings.json` will
be sorted in an arbitrary way. This prevents reproducible
model training.

* Revert "Make vocab update in get_docs deterministic"

This reverts commit d6b87a2f55.

* Sort strings in StringStore serialization

Co-authored-by: Adriane Boyd <adrianeboyd@gmail.com>
2021-04-09 11:53:13 +02:00
Adriane Boyd
8008e2f75b
Use morph hash in lemmatizer cache key (#7690)
Use the morph hash rather than the `MorphAnalysis` object in the cache
key so that the `Lemmatizer` can be pickled.
2021-04-08 13:22:38 +02:00
Sofie Van Landeghem
3e5bd5055e
expand quickstart widget with cuda 11.1 and 11.2 (#7615) 2021-04-08 12:25:42 +02:00
Adriane Boyd
e6b7600adf
Fix parser sourcing in NER converter (#7631) 2021-04-08 12:25:03 +02:00
Sofie Van Landeghem
204c2f116b
Extend score_spans for overlapping & non-labeled spans (#7209)
* extend span scorer with consider_label and allow_overlap

* unit test for spans y2x overlap

* add score_spans unit test

* docs for new fields in scorer.score_spans

* rename to include_label

* spell out if-else for clarity

* rename to 'labeled'

Co-authored-by: Adriane Boyd <adrianeboyd@gmail.com>
2021-04-08 12:19:17 +02:00
Paul O'Leary McCann
c362006cb9
Fix is_sent_start when converting from JSON (fix #7635) (#7655)
Data in the JSON format is split into sentences, and each sentence is
saved with is_sent_start flags. Currently the flags are 1 for the first
token and 0 for the others. When deserialized this results in a pattern
of True, None, None, None... which makes single-sentence documents look
as though they haven't had sentence boundaries set.

Since items saved in JSON format have been split into sentences already,
the is_sent_start values should all be True or False.
2021-04-08 18:24:52 +10:00
Adriane Boyd
82d3caf861
Implement replace_listeners for source in config (#7620)
Implement replace_listeners for sourced components loaded from a config.
2021-04-08 18:21:22 +10:00
broaddeep
ee159b8543
Support match alignments (#7321)
* Support match alignments

* change naming from match_alignments to with_alignments, add conditional flow if with_alignments is given, validate with_alignments, add related test case

* remove added errors, utilize bint type, cleanup whitespace

* fix no new line in end of file

* Minor formatting

* Skip alignments processing if as_spans is set

* Add with_alignments to Matcher API docs

* Update website/docs/api/matcher.md

Co-authored-by: Sofie Van Landeghem <svlandeg@users.noreply.github.com>

Co-authored-by: Adriane Boyd <adrianeboyd@gmail.com>
Co-authored-by: Sofie Van Landeghem <svlandeg@users.noreply.github.com>
2021-04-08 18:10:14 +10:00
Adriane Boyd
ff84075839
Support large/infinite training corpora (#7208)
* Support infinite generators for training corpora

Support a training corpus with an infinite generator in the `spacy
train` training loop:

* Revert `create_train_batches` to the state where an infinite generator
can be used in the first epoch of exactly one epoch without
resulting in a memory leak (`max_epochs != 1` will still result in a
memory leak)
* Move the shuffling for the first epoch into the corpus reader,
renaming it to `spacy.Corpus.v2`.

* Switch to training option for shuffling in memory

Training loop:

* Add option `training.shuffle_train_corpus_in_memory` that controls
whether the corpus is loaded in memory once and shuffled in the training
loop
  * Revert changes to `create_train_batches` and rename to
`create_train_batches_with_shuffling` for use with `spacy.Corpus.v1` and
a corpus that should be loaded in memory
  * Add `create_train_batches_without_shuffling` for a corpus that
should not be shuffled in the training loop: the corpus is merely
batched during training

Corpus readers:

* Restore `spacy.Corpus.v1`
* Add `spacy.ShuffledCorpus.v1` for a corpus shuffled in memory in the
reader instead of the training loop
  * In combination with `shuffle_train_corpus_in_memory = False`, each
epoch could result in a different augmentation

* Refactor create_train_batches, validation

* Rename config setting to `training.shuffle_train_corpus`
* Refactor to use a single `create_train_batches` method with a
`shuffle` option
* Only validate `get_examples` in initialize step if:
  * labels are required
  * labels are not provided

* Switch back to max_epochs=-1 for streaming train corpus

* Use first 100 examples for stream train corpus init

* Always check validate_get_examples in initialize
2021-04-08 18:08:04 +10:00
graue70
81fd595223
Fix __add__ method of PRFScore (#7557)
* Add failing test for PRFScore

* Fix erroneous implementation of __add__

* Simplify constructor

Co-authored-by: Sofie Van Landeghem <svlandeg@users.noreply.github.com>

Co-authored-by: Sofie Van Landeghem <svlandeg@users.noreply.github.com>
2021-04-08 17:34:14 +10:00
Ines Montani
13949e9f9b Add more link anchors [ci skip] 2021-04-06 14:15:29 +10:00
Ines Montani
de4f4c9b8a Add more link anchors [ci skip] 2021-04-06 14:15:21 +10:00
Ines Montani
1e0a478805 Update pipeline design docs [ci skip] 2021-04-06 14:13:43 +10:00
Ines Montani
e9496feca6 Fix formatting [ci skip] 2021-04-06 14:13:36 +10:00
Ines Montani
5bbdd7dc4c Update pipeline design docs [ci skip] 2021-04-06 14:13:22 +10:00
Ines Montani
1d1cfadbca Fix formatting [ci skip] 2021-04-06 14:13:13 +10:00
Jaidev Deshpande
089d345491 Add Numerizer to SpaCy universe (#7650)
Numerizer is a spaCy extension that converts numbers written in natural language
into numeric strings.
2021-04-05 19:03:15 +02:00
Jaidev Deshpande
93ee74a0a6
Add Numerizer to SpaCy universe (#7650)
Numerizer is a spaCy extension that converts numbers written in natural language
into numeric strings.
2021-04-05 19:02:27 +02:00
Paul O'Leary McCann
7944761ba7
Add warning if initial vectors are empty (#7641)
See #7637, where this came up.
2021-04-04 20:20:24 +02:00
Sam Edwardes
7ab17a2ea1 Updates to universe.json for spaCyTextBlob (#7647)
* Updates to universe.json for spaCyTextBlob

Updated the documentation for spaCy 3.0.

* SamEdwardes.md

* Update SamEdwardes.md
2021-04-04 20:19:00 +02:00
Sam Edwardes
f6ad4684bd
Updates to universe.json for spaCyTextBlob (#7647)
* Updates to universe.json for spaCyTextBlob

Updated the documentation for spaCy 3.0.

* SamEdwardes.md

* Update SamEdwardes.md
2021-04-04 20:17:57 +02:00
Ayush Chaurasia
3c2ce41dd8
W&B integration: Optional support for dataset and model checkpoint logging and versioning (#7429)
* Add optional artifacts logging

* Update docs

* Update spacy/training/loggers.py

Co-authored-by: Sofie Van Landeghem <svlandeg@users.noreply.github.com>

* Update spacy/training/loggers.py

Co-authored-by: Sofie Van Landeghem <svlandeg@users.noreply.github.com>

* Update spacy/training/loggers.py

Co-authored-by: Sofie Van Landeghem <svlandeg@users.noreply.github.com>

* Bump WandbLogger Version

* Add documentation of v1 to legacy docs

* bump spacy-legacy to 3.0.2 (to be released)

Co-authored-by: Sofie Van Landeghem <svlandeg@users.noreply.github.com>
Co-authored-by: svlandeg <sofie.vanlandeghem@gmail.com>
2021-04-01 19:36:23 +02:00
vincent d warmerdam
15e6b7284e Add Tokenwiser to Projects (#7541)
* Add tokenwiser

* Update universe.json
2021-04-01 14:42:43 +02:00
vincent d warmerdam
8b3eec6e62
Add Tokenwiser to Projects (#7541)
* Add tokenwiser

* Update universe.json
2021-04-01 14:39:36 +02:00
Sofie Van Landeghem
82eb4c37d9 Legacy docs (#7601)
* document legacy Tok2Vec architectures

* add TextCatEnsemble.v1 legacy documentation

* Separate legacy section in side bar
2021-03-30 12:43:50 +02:00
Sofie Van Landeghem
59c2069eb1
Legacy docs (#7601)
* document legacy Tok2Vec architectures

* add TextCatEnsemble.v1 legacy documentation

* Separate legacy section in side bar
2021-03-30 12:43:14 +02:00
Adriane Boyd
348d1829c7
Preserve user data for DependencyMatcher on spans (#7528)
* Preserve user data for DependencyMatcher on spans

* Clean underscore in test

* Modify test to use extensions stored in user data
2021-03-30 12:26:22 +02:00
m0canu1
921feee092
Added more exceptions to the Italian language from https://forum.wordr… (#7246)
* Added more exceptions to the Italian language from https://forum.wordreference.com/threads/le-abbreviazioni-nella-lingua-italiana-abbreviations-in-italian.2464189/

* Remove unnecessary exception

Co-authored-by: Alexandru Mocanu <alexandru.mocanu@augeos.it>
Co-authored-by: Adriane Boyd <adrianeboyd@gmail.com>
2021-03-30 10:23:32 +02:00
Adriane Boyd
27a48f2802
Fix/update extension copying in Span.as_doc and Doc.from_docs (#7574)
* Adjust custom extension data when copying user data in `Span.as_doc()`
* Restrict `Doc.from_docs()` to adjusting offsets for custom extension
data
  * Update test to use extension
  * (Duplicate bug fix for character offset from #7497)
2021-03-30 09:49:12 +02:00
Santiago Castro
af07fc3bc1
Add support for CUDA 11.2 (#7583)
* Add support for CUDA 11.2

* Update the docs

* Format

Co-authored-by: Adriane Boyd <adrianeboyd@gmail.com>
2021-03-30 09:47:33 +02:00
Álvaro Abella Bascarán
24dd491b09 fix fn name: tokenizer.infixes_finditer -> tokenizer.infix_finditer (#7606) 2021-03-30 09:46:15 +02:00
Álvaro Abella Bascarán
5b4dde38a3
fix fn name: tokenizer.infixes_finditer -> tokenizer.infix_finditer (#7606) 2021-03-30 09:45:49 +02:00
Adriane Boyd
3ae8661085
Fix tensor retokenization for non-numpy ops (#7527)
Implement manual `append` and `delete` for non-numpy ops.
2021-03-29 22:34:48 +11:00
Adriane Boyd
139f655f34
Merge doc.spans in Doc.from_docs() (#7497)
Merge data from `doc.spans` in `Doc.from_docs()`.

* Fix internal character offset set when merging empty docs (only
affects tokens and spans in `user_data` if an empty doc is in the list
of docs)
2021-03-29 22:34:01 +11:00
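A minimal sketch of the merged behaviour (the span group key "events" and the texts are arbitrary):

```python
import spacy
from spacy.tokens import Doc, Span

nlp = spacy.blank("en")
doc1 = nlp("An event happened")
doc1.spans["events"] = [Span(doc1, 0, 2, label="EVENT")]
doc2 = nlp("Nothing here")

merged = Doc.from_docs([doc1, doc2])
print(len(merged.spans["events"]))  # expected: 1
```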
Adriane Boyd
d59f968d08
Keep sent starts without parse in retokenization (#7424)
In the retokenizer, only reset sent starts (with
`set_children_from_head`) if the doc is parsed. If there is no parse,
merged tokens have the unset `token.is_sent_start == None` by default after
retokenization.
2021-03-29 22:32:00 +11:00
Paul O'Leary McCann
faed54d659
Merge pull request #7537 from polm/docs/patience-negative
Remove mention of -1 for early stopping (fix #7535)
2021-03-26 21:11:53 +09:00
Paul O'Leary McCann
cdab341a75 Remove mention of -1 for early stopping (fix #7535)
Maybe this used to work differently, but currently a negative patience
just causes immediate termination.
2021-03-23 11:50:35 +09:00
Ines Montani
4cb7125f7a Merge branch 'master' into spacy.io 2021-03-22 22:47:30 +11:00
Ines Montani
4bd3d01aaf
Merge pull request #7471 from polm/fix/listener-warnings 2021-03-22 12:45:02 +01:00
Ines Montani
d545ab4ca4
Merge pull request #7495 from adrianeboyd/bugfix/norm-ux
Update lexeme_norm checks
2021-03-22 12:44:52 +01:00
Ines Montani
be55f43163
Merge pull request #7473 from adrianeboyd/docs/v3-pipeline-deps-order 2021-03-22 12:43:07 +01:00
Ines Montani
3ee2fcfba0
Merge pull request #7483 from adrianeboyd/docs/various-v3-4 [ci skip] 2021-03-22 12:37:06 +01:00
Ines Montani
88e5a0dc16
Merge pull request #7504 from polm/fix/lexeme-docs [ci skip]
Fix mismatched backtick in Lexeme docs
2021-03-22 12:36:44 +01:00
Ines Montani
66ebd5c69e
Merge pull request #7491 from adrianeboyd/bugfix/corpus-depr-props
Update deprecated doc.is_sentenced in Corpus
2021-03-21 02:17:24 +01:00
Ines Montani
e3c3dbdb15
Merge pull request #7492 from adrianeboyd/bugfix/ux-matcher-attributes
Update matcher errors and docs
2021-03-21 02:17:13 +01:00
Adriane Boyd
0d2b723e8d Update entity setting section 2021-03-20 11:38:55 +01:00
Paul O'Leary McCann
e39c0dcf33 Fix mismatched backtick in Lexeme docs 2021-03-20 18:40:00 +09:00
Adriane Boyd
39153ef90f Update lexeme_norm checks
* Add util method for check
* Add new languages to list with lexeme norm tables
* Add check to all relevant components
* Add config details to warning message

Note that we're not actually inspecting the model config to see if
`NORM` is used as an attribute, so it may warn in cases where it's not
relevant.
2021-03-19 10:59:27 +01:00
Adriane Boyd
c771ec22f0 Update matcher errors and docs
* Mention `tagger+attribute_ruler` in `POS`/`MORPH` error messages for
`Matcher` and `PhraseMatcher`
* Document `Matcher.__call__(allow_missing=)`
2021-03-19 10:11:18 +01:00
Adriane Boyd
48b90c8e1c Update deprecated doc.is_sentenced in Corpus 2021-03-19 09:43:52 +01:00
Adriane Boyd
6a9a467766
Update website/docs/usage/processing-pipelines.md
Co-authored-by: Ines Montani <ines@ines.io>
2021-03-19 08:12:49 +01:00
Ines Montani
6db5414668 Merge branch 'master' into spacy.io 2021-03-19 12:09:03 +11:00
Ines Montani
34e13c1161
Merge pull request #7472 from erre-quadro/universe/spikex
Add SpikeX to spaCy universe
2021-03-19 02:08:36 +01:00
Ines Montani
4f9aaa2366
Merge pull request #7451 from adrianeboyd/chore/add-py.typed
Add py.typed
2021-03-19 02:08:16 +01:00
Ines Montani
66b900a76d
Merge pull request #7440 from adrianeboyd/bugfix/ru-pymorph2-lookup-lemmatize
Rename and update Russian pymorphy2 lookup lemmatize
2021-03-19 01:54:08 +01:00
Ines Montani
2c6fa8c890
Merge pull request #7489 from adrianeboyd/bugfix/callbacks-entry-points
Check for callbacks entry points
2021-03-19 01:53:53 +01:00
Ines Montani
b878bc74b9
Merge pull request #7488 from Findus23/no-is-not
replace "is not" with !=
2021-03-19 01:53:38 +01:00
Adriane Boyd
0ad9e16ec3 Check for callbacks entry points 2021-03-18 21:18:25 +01:00
Lukas Winkler
3c362ac520
replace "is not" with != 2021-03-18 21:09:11 +01:00
Adriane Boyd
6354b642c5
Fix typo 2021-03-18 19:01:10 +01:00
Adriane Boyd
40e5d3a980 Update saving/loading example 2021-03-18 16:56:10 +01:00
Adriane Boyd
0fb1881f36 Reformat processing pipelines 2021-03-18 13:31:42 +01:00
Adriane Boyd
acc58719da Update custom similarity hooks example 2021-03-18 13:31:42 +01:00
Adriane Boyd
c9e1a9ac17 Add multiprocessing section 2021-03-18 13:31:42 +01:00
Adriane Boyd
9a254d3995 Include all en_core_web_sm components in examples 2021-03-18 13:31:42 +01:00
Adriane Boyd
83c1b919a7 Fix positional/option in CLI types 2021-03-18 13:31:42 +01:00
Adriane Boyd
9fd41d6742 Remove Language.pipe cleanup arg 2021-03-18 13:31:42 +01:00
Paul O'Leary McCann
40bc01e668 Proactively remove unused listeners
With this, the changes in initialize.py might be unnecessary.

Requires testing.
2021-03-17 22:41:41 +09:00
Adriane Boyd
5da323fd86
Minor edits 2021-03-17 12:59:05 +01:00
Adriane Boyd
a5ffe8dfed Add details about pretrained pipeline design 2021-03-17 11:31:26 +01:00
Paul O'Leary McCann
ef77c88638 Don't warn about components not in the pipeline
See here:

https://github.com/explosion/spaCy/discussions/7463

Still need to check if there are any side effects of listeners being
present but not in the pipeline, but this commit will silence the
warnings.
2021-03-17 14:56:04 +09:00
Paolo Arduin
00e59be966 Add SpikeX to spaCy universe 2021-03-16 18:22:03 +01:00
Adriane Boyd
02b5c8a1a2 Add py.typed 2021-03-16 09:48:31 +01:00
Adriane Boyd
3bcf74aca7 Rename and update ru pymorphy2 lookup lemmatize
* To allow default lookup lemmatization with a blank Russian model,
rename pymorphy2 lookup mode to `pymorphy2_lookup`

* Bug fix: update pymorphy2 lookup lemmatize to return list rather than
string
2021-03-15 11:11:06 +01:00
bsweileh
42fcff6f8a Update _training.md - Fix broken link on backpropagation (#7431)
* Update _training.md

Fix broken link on backpropagation

* Add agreement

add spacy contributor agreement
2021-03-15 09:24:12 +01:00
bsweileh
61472e7cb3
Update _training.md - Fix broken link on backpropagation (#7431)
* Update _training.md

Fix broken link on backpropagation

* Add agreement

add spacy contributor agreement
2021-03-15 09:21:35 +01:00
Ines Montani
be44257cab
Merge pull request #7418 from adrianeboyd/docs/examples-readme
Add examples README
2021-03-13 04:28:07 +01:00
Ines Montani
c67d5a6eb0
Merge pull request #7394 from adrianeboyd/docs/ner-example-data-readme 2021-03-13 04:26:18 +01:00
Ines Montani
068b97a617
Merge pull request #7408 from adrianeboyd/bugfix/load-keyword-only 2021-03-13 04:25:50 +01:00
Ines Montani
3466a11e72
Merge pull request #7421 from adrianeboyd/bugfix/cli-code-arg 2021-03-13 04:25:17 +01:00
Adriane Boyd
3168103605 Fix type of spacy train --output in docs 2021-03-12 10:04:57 +01:00
Adriane Boyd
03e9e7b567 Add --code option to init fill-config 2021-03-12 10:03:57 +01:00
Adriane Boyd
ce6317231f Add --code to spacy debug CLI 2021-03-12 09:51:26 +01:00
Adriane Boyd
508cb3bef7
Also exclude user hooks in displacy conversion (#7419) 2021-03-12 09:41:59 +01:00
Adriane Boyd
81efde0ce4 Add examples README 2021-03-12 08:07:20 +01:00
Adriane Boyd
deffc3a532
Update package requirements tests (#7409)
* Add hypothesis to packages skipped in version check

* Add numpy back to tests following 2df1ab8a
2021-03-11 16:24:31 +01:00
Adriane Boyd
124304b146 Add vocab kwarg back to spacy.load
* Additional minor formatting and docs cleanup
2021-03-11 10:58:59 +01:00
Adriane Boyd
84470d9b9e Incorporate BILUO note from #7407 2021-03-11 10:11:21 +01:00
Adriane Boyd
4294bcf4ab Align keyword-only in docs for init/util 2021-03-11 09:52:40 +01:00
Adriane Boyd
fbf3a755d7 Make spacy.load kwargs keyword-only 2021-03-11 09:36:58 +01:00
Adriane Boyd
28726c25a1 Update docs for convert CLI and NER examples 2021-03-10 11:42:02 +01:00
Adriane Boyd
53a3b967ac
Update thinc pin and set version to v3.0.5 (#7389) 2021-03-10 11:10:53 +01:00
Ines Montani
c32cbac14f Merge branch 'master' into spacy.io 2021-03-10 12:22:21 +11:00
Adriane Boyd
3b911ee5ef
Set version to v3.0.4 (#7376) 2021-03-09 16:49:41 +01:00
Adriane Boyd
d746ea6278
Add warning about GPU selection in Jupyter notebooks (#7075)
* Initial warning

* Update check

* Redo edit

* Move jupyter warning to helper method

* Add link with details to warnings
2021-03-09 15:35:21 +01:00
Ines Montani
37fc495f5d
Merge pull request #7353 from jankrepl/fix_entity_rules_labels 2021-03-09 15:09:24 +01:00
Ines Montani
4f32e3dedb Update issue templates [ci skip] 2021-03-10 01:08:05 +11:00
Sofie Van Landeghem
932887b950
textcat scoring fix and multi_label docs (#6974)
* add multi-label textcat to menu

* add infobox on textcat API

* add info to v3 migration guide

* small edits

* further fixes in doc strings

* add infobox to textcat architectures

* add textcat_multilabel to overview of built-in components

* spelling

* fix unrelated warn msg

* Add textcat_multilabel to quickstart [ci skip]

* remove separate documentation page for multilabel_textcategorizer

* small edits

* positive label clarification

* avoid duplicating information in self.cfg and fix textcat.score

* fix multilabel textcat too

* revert threshold to storage in cfg

* revert threshold stuff for multi-textcat

Co-authored-by: Ines Montani <ines@ines.io>
2021-03-09 23:04:22 +11:00
Sofie Van Landeghem
39de3602e0
return custom error in nlp.initialize (#7104)
* return custom error in nlp.initialize

* Rename error

Co-authored-by: Ines Montani <ines@ines.io>
2021-03-09 23:01:31 +11:00
Jan Krepl
0e1d579f0c Add agreement 2021-03-09 10:57:32 +01:00
Jan Krepl
f26b61e001 Make sure sorted 2021-03-09 10:49:53 +01:00
Adriane Boyd
3f3e8110dc
Fix lowercase augmentation (#7336)
* Fix aborted/skipped augmentation for `spacy.orth_variants.v1` if
lowercasing was enabled for an example
* Simplify `spacy.orth_variants.v1` for `Example` vs. `GoldParse`
* Preserve reference tokenization in `spacy.lower_case.v1`
2021-03-09 14:02:32 +11:00
Sofie Van Landeghem
cd70c3cb79
Fixing pretrain (#7342)
* initialize NLP with train corpus

* add more pretraining tests

* more tests

* function to fetch tok2vec layer for pretraining

* clarify parameter name

* test different objectives

* formatting

* fix check for static vectors when using vectors objective

* clarify docs

* logger statement

* fix init_tok2vec and proc.initialize order

* test training after pretraining

* add init_config tests for pretraining

* pop pretraining block to avoid config validation errors

* custom errors
2021-03-09 14:01:13 +11:00
Adriane Boyd
97bcf2ae3a
Fix patience for identical scores (#7250)
* Fix patience for identical scores

Fix training patience so that the earliest best step is chosen for
identical max scores.

* Restore break, remove print

* Explicitly define best_step for clarity
2021-03-06 18:42:14 +11:00
Ines Montani
08d6036de3 Merge branch 'master' into spacy.io 2021-03-06 17:39:03 +11:00
Ines Montani
dfb23a419e Merge branch 'spacy.io' [ci skip] 2021-03-06 17:38:54 +11:00
Ines Montani
23eef78a4a
Merge pull request #7308 from graue70/patch-1 [ci skip]
Fix typo in docs
2021-03-06 17:38:03 +11:00
graue70
7d085d5b1c
Fix typo in docs 2021-03-05 18:30:09 +01:00
vincent d warmerdam
b97e23d380 Removed Languages that were listed twice on Docs (#7272)
* removed languages that were listed twice

* sorted

* d0h

* the d0h strikes back when you don't hit save
2021-03-05 14:35:09 +01:00
vincent d warmerdam
1b0d413e45
Removed Languages that were listed twice on Docs (#7272)
* removed languages that were listed twice

* sorted

* d0h

* the d0h strikes back when you don't hit save
2021-03-05 14:31:15 +01:00
Ines Montani
9280e844fb Merge branch 'master' into spacy.io 2021-03-03 23:15:25 +11:00
Ines Montani
ea555b03e0
Merge pull request #7255 from adrianeboyd/bugfix/extraneous-tok2vec
Omit unused tok2vec/transformer components
2021-03-03 23:15:06 +11:00
Ines Montani
ada4cdbd71
Merge pull request #7257 from svlandeg/fix/registry_consistency 2021-03-03 23:14:19 +11:00
svlandeg
682a6232e3 fix typo 2021-03-02 17:59:13 +01:00
svlandeg
d900c55061 consistently use registry as callable 2021-03-02 17:56:28 +01:00
Adriane Boyd
8a4200d4e9 Omit unused tok2vec/transformer components
Omit unused tok2vec/transformer components in quickstart template.
2021-03-02 15:53:30 +01:00
Sofie Van Landeghem
212f0e779e
Support doc.spans in Example.from_dict (#7197)
* add support for spans in Example.from_dict

* add unit tests

* update error to E879
2021-03-03 01:12:54 +11:00
Adriane Boyd
fb98862337
Add hint for --gpu-id to CLI device info (#7234)
* Add hint for --gpu-id to CLI device info

If the user has `cupy` and an available GPU, add a hint about using
`--gpu-id 0` to the CLI output.

* Undo change to original CPU message
2021-03-03 01:11:18 +11:00
Ines Montani
d723382caa
Merge pull request #7251 from graue70/patch-1 [ci skip]
Fix copy & paste error in API docs
2021-03-03 00:06:44 +11:00
Ines Montani
c4e4147d12
Merge pull request #7247 from svlandeg/fix/pins 2021-03-03 00:06:09 +11:00
Ines Montani
635ae55b74
Merge pull request #7237 from adrianeboyd/bugfix/is-cython-func-7224 2021-03-03 00:05:16 +11:00
graue70
0fddc0447c
Fix copy & paste error in API docs 2021-03-02 14:00:14 +01:00
svlandeg
d879d30aea raise hypothesis pin 2021-03-02 13:20:17 +01:00
svlandeg
b1945f4e73 sync pins with thinc 2021-03-02 12:06:59 +01:00
Adriane Boyd
0efb7413f9 Use make_tempdir instead 2021-03-01 17:54:14 +01:00
Adriane Boyd
e9f7f9a4bc Fix is_cython_func for additional imported code
* Fix `is_cython_func` for imported code loaded under `python_code`
module name
* Add `make_named_tempfile` context manager to test utils to test
loading of imported code
* Add test for validation of `initialize` params in custom module
2021-03-01 16:37:39 +01:00
Sofie Van Landeghem
dd99872bb0
Fix spans weak ref in doc copy (#7225)
* failing unit test

* ensure that doc.spans refers to the copied doc, not the old

* add type info
2021-02-28 12:32:48 +11:00
Ines Montani
9f204b354b
Merge pull request #7204 from adrianeboyd/bugfix/include-dirs-distutils
Set include_dirs in Extension
2021-02-27 11:51:33 +11:00
Ines Montani
8f7c7b2658
Merge pull request #7211 from svlandeg/docs/el_update [ci skip]
kb.get_candidates renamed to get_alias_candidates
2021-02-27 11:51:22 +11:00
Ines Montani
408b94887a
Merge pull request #7207 from adrianeboyd/docs/get-noun-chunks [ci skip]
Extend docs related to Vocab.get_noun_chunks
2021-02-27 11:51:08 +11:00
Ines Montani
dc46fa078f
Merge pull request #7220 from svlandeg/docs/has_annotation [ci skip]
has_annotation docs fix
2021-02-27 11:50:34 +11:00
Ines Montani
0dbc2a1b16
Merge pull request #7222 from adrianeboyd/bugfix/quickstart-recs-bg-bn
Fix formatting in bg/bn quickstart recs
2021-02-27 11:50:02 +11:00
Ines Montani
fb0f095d0a
Merge pull request #7223 from svlandeg/ux/wandb_import
import wandb failure - UX
2021-02-27 11:49:20 +11:00
svlandeg
2010219a7f import wandb failure - UX 2021-02-26 18:00:39 +01:00
Adriane Boyd
ee7bb0b393 Fix formatting in bg/bn quickstart recs 2021-02-26 17:08:37 +01:00
svlandeg
248339039e fix type in docs 2021-02-26 14:27:10 +01:00
Adriane Boyd
e43d43db32
Allow sourcing disabled components (#7215)
Check `component_names` instead of `pipe_names` to allow sourcing
disabled components.
2021-02-26 13:50:56 +01:00
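A rough illustration of what the change above enables (a sketch, assuming an installed `en_core_web_sm` package; not code from the PR):

```python
import spacy

# "ner" is loaded but currently disabled in the source pipeline
source_nlp = spacy.load("en_core_web_sm", disable=["ner"])

nlp = spacy.blank("en")
# Sourcing copies the trained component from another pipeline; with the fix,
# this also works for components that are disabled in the source, since
# component_names (not just pipe_names) is checked.
nlp.add_pipe("ner", source=source_nlp)
print(nlp.pipe_names)  # ['ner']
```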
Adriane Boyd
10c930cc96
Re-refactor Sentencizer with Pipe API (#7176)
Reapply the refactoring (#4721) so that `Sentencizer` uses the faster
`predict` and `set_annotations` for both `__call__` and `pipe`.
2021-02-26 09:48:14 +01:00
svlandeg
08fd901a1b kb.get_candidates renamed to get_alias_candidates 2021-02-25 20:09:36 +01:00
Adriane Boyd
6a37f343d5 Extend docs related to Vocab.get_noun_chunks 2021-02-25 16:38:21 +01:00
Adriane Boyd
3abfa994d1 Set include_dirs in Extension
Set the `include_dirs` in each `Extension` rather than in `setup()` to
handle the case where there is a custom `distutils.cfg` that modifies
the include paths, in particular for python from homebrew.
2021-02-25 11:26:11 +01:00
Ines Montani
592678fb7d
Merge pull request #7073 from adrianeboyd/feature/logger-level-in-formatter
Add time and level to default logging formatter
2021-02-24 22:40:46 +11:00
Ines Montani
d2c515354b Auto-format [ci skip] 2021-02-24 22:37:32 +11:00
Ines Montani
9e8a7e08c1
Merge pull request #7115 from SergeyShk/ruts [ci skip] 2021-02-24 22:37:00 +11:00
Ines Montani
24cecbb3f4
Merge pull request #7126 from adrianeboyd/docs/gpu-id-opt [ci skip]
Add tip about --gpu-id to training quickstart
2021-02-24 22:34:17 +11:00
Sofie Van Landeghem
0563cd73d6
Fix SpanGroup import (#7182)
* import SpanGroup from tokens module

* revert edits from different PR

* add to __all__
2021-02-24 21:06:16 +11:00
Ken
4927dcc8c2 Update sentencizer documentation example with sentencizer pipe name (#7185) 2021-02-24 08:08:25 +01:00
Ken
fa7ddc7f88
Update sentencizer documentation example with sentencizer pipe name (#7185) 2021-02-24 08:06:54 +01:00
Tocic
8109f0bcf2 fix typo in models.md (#7157) 2021-02-22 09:01:34 +01:00
Tocic
b1996a51a1
fix typo in models.md (#7157) 2021-02-22 09:00:38 +01:00
Sofie Van Landeghem
b92f81d5da
fix NEL config and IO, and n_sents functionality (#7100)
* fix NEL config and IO, and n_sents functionality

* add docs

* fix test
2021-02-22 14:49:52 +11:00
Sofie Van Landeghem
113e8d082b
only evaluate named entities for NEL if there is a corresponding gold span (#7074) 2021-02-22 11:06:50 +11:00
Adriane Boyd
264862c67a
Fix Ukrainian lemmatizer init (#7127)
Fix class variable and init for `UkrainianLemmatizer` so that it loads
the `uk` dictionaries rather than having the parent `RussianLemmatizer`
override with the `ru` settings.
2021-02-22 11:05:08 +11:00
Sofie Van Landeghem
ba5a50f62b
NEL docs & UX (#7129)
* EL set_kb docs fix

* custom warning for set_kb mistake
2021-02-22 11:04:22 +11:00
Shkarin Sergey
22706ec9fb Fixed universe.json 2021-02-20 08:02:38 +03:00
Boian Tzonev
cca8651fc8
Bulgarian tokenizer exceptions (#7114)
* [Bulgarian] Add tokenizer exceptions and like_num for Bulgarian

* [Bulgarian] Add tokenizer exceptions and like_num for Bulgarian
2021-02-19 19:19:19 +01:00
Adriane Boyd
7198be0f4b Add tip about --gpu-id to training quickstart 2021-02-19 14:07:51 +01:00
Sofie Van Landeghem
709c9e75af
span.ent only returns first sentence (#7084)
* return first sentence when span contains sentence boundary

* docs fix

* small fixes

* cleanup
2021-02-19 23:02:38 +11:00
Adriane Boyd
30e1a89aeb
Fix displacy output in evaluate CLI (#7122)
Now that `nlp.evaluate()` does not modify the examples, rerun the
pipeline on the (limited) texts in order to provide the predicted
annotation in the displacy output option.
2021-02-19 23:01:20 +11:00
Adriane Boyd
4188beda87
Fix conll converter option (#7071)
Map `conll` to the NER converter, not the `CoNLL-U` converter.
2021-02-18 10:22:41 +01:00
palandlom
d18ccdf822 var batch is useless (#7111)
It seems that nlp.update(examples) should be nlp.update(batch)
2021-02-18 09:45:58 +01:00
palandlom
9b82586699
var batch is useless (#7111)
It seems that nlp.update(examples) should be nlp.update(batch)
2021-02-18 09:44:22 +01:00
Ines Montani
e6526b0831 Merge branch 'master' into spacy.io 2021-02-17 23:42:41 +11:00
Ines Montani
fc4fb6eb3a Make v2.x docs more prominent [ci skip] 2021-02-17 23:42:27 +11:00
Rajat
de2874f0f8 updated code eg & description of contextualSpellCheck (#7096) 2021-02-17 13:27:56 +01:00
Rajat
4e80ef3abb
updated code eg & description of contextualSpellCheck (#7096) 2021-02-17 13:26:43 +01:00
Adriane Boyd
a3293efc48 Add time and level to default logging formatter 2021-02-15 14:19:20 +01:00
Shkarin Sergey
abac5dc203
Update universe.json 2021-02-15 15:01:46 +03:00
Ines Montani
1e3a326e53 Change Dutch transformer recommendation [ci skip]
https://github.com/explosion/spaCy/discussions/6529#discussioncomment-366620
2021-02-14 15:30:16 +11:00
Ines Montani
0c7937c74d Merge branch 'master' into spacy.io 2021-02-14 14:39:46 +11:00
Ines Montani
4b729660bd
Merge pull request #7051 from MartinoMensio/dbpedia-spotlight [ci skip]
added spacy-dbpedia-spotlight
2021-02-14 14:06:08 +11:00
Ines Montani
f4f46b617f
Preserve sourced components in fill-config (fixes #7055) (#7058) 2021-02-14 14:02:14 +11:00
Ines Montani
3246cf8b2b Merge branch 'master' into spacy.io 2021-02-14 13:38:33 +11:00
Matthew Honnibal
0fb8d437c0
Fix sentence fragments bug (#7056, #7035) (#7057)
* Add test for #7035

* Update test for issue 7056

* Fix test

* Fix transitions method used in testing

* Fix state eol detection when rebuffer

* Clean up redundant fix
2021-02-14 13:38:13 +11:00
Ines Montani
660642902a Increment version [ci skip] 2021-02-14 13:36:13 +11:00
Matthew Honnibal
b31471b5b8 Set version to v3.0.2 2021-02-13 23:50:00 +11:00
Ines Montani
9ba715ed16 Tidy up and auto-format 2021-02-13 12:55:56 +11:00
Ines Montani
06e66d4ced Update languages.json [ci skip] 2021-02-13 12:33:17 +11:00
Ines Montani
34ee0fbd70
Merge pull request #7011 from Shumie82/master 2021-02-13 12:30:42 +11:00
Ines Montani
e583050547
Merge pull request #7039 from svlandeg/debug 2021-02-13 11:53:41 +11:00
Ines Montani
6c450decfc Fix punctuation settings and add to initialize tests 2021-02-13 11:51:21 +11:00
Ines Montani
f4712a634e
Merge pull request #7046 from adrianeboyd/bugfix/vocab-pickle-noun-chunks-6891
Include noun chunks method when pickling Vocab
2021-02-13 11:43:03 +11:00
Ines Montani
6ae3f10cce
Merge pull request #7047 from adrianeboyd/feature/trf-recommendations
Update trf quickstart recommendations
2021-02-13 11:37:59 +11:00
Martino Mensio
6c0c3d5ddc
added spacy-dbpedia-spotlight 2021-02-12 19:11:35 +01:00
Adriane Boyd
0ee2ae86bf Update trf quickstart recommendations
Add/update trf recommendations for Bengali, Hindi, Sinhala, and Tamil
based on #7044.
2021-02-12 15:55:17 +01:00
svlandeg
03b4ec7d7f fix typo 2021-02-12 14:30:16 +01:00
Adriane Boyd
5e47a54d29 Include noun chunks method when pickling Vocab 2021-02-12 13:27:46 +01:00
svlandeg
aa3ad8825d loop instead of any 2021-02-12 13:14:30 +01:00
svlandeg
278e9eaa14 remove ner 2021-02-11 21:08:04 +01:00
svlandeg
967df5901d cleanup 2021-02-11 20:57:43 +01:00
svlandeg
ebeedfc70b regression test for 7029 2021-02-11 20:56:48 +01:00
svlandeg
a52d466bfc any instead of all 2021-02-11 20:50:55 +01:00
Shumi
4e514f1ea8
Update stop_words.py
I have deleted lines 1 to 5 and the statement print(STOP_WORDS)
2021-02-11 21:30:34 +02:00
Shumi
0d57e84b7b
Update lex_attrs.py
I have removed lines 1 to 4
2021-02-11 21:28:23 +02:00
Shumi
37ec67f868
Update examples.py
I have removed two lines:
# coding: utf8
from __future__ import unicode_literals

And updated: >>> from spacy.lang.tn.examples import sentences
2021-02-11 21:25:58 +02:00
Shumi
39eeba6760
Update __init__.py
Added infixes = TOKENIZER_INFIXES
2021-02-11 21:20:46 +02:00
Ines Montani
26bf642afd
Fix issue #7019: Handle None scores in evaluate printer (#7026) 2021-02-11 16:45:23 +11:00
Ines Montani
6b9026a219
Merge pull request #7000 from explosion/feature/project-yml-overrides
Support env vars and CLI overrides for project.yml
2021-02-11 12:31:45 +11:00
Ines Montani
e21639fa56
Merge pull request #7025 from explosion/fix/6950
Fix issue #6950: allow pickling Tok2Vec with listeners
2021-02-11 12:25:16 +11:00
Ines Montani
ad9ce3c8f6 Fix issue #6950: allow pickling Tok2Vec with listeners 2021-02-11 11:37:39 +11:00
Shumi
ed3397727e
Delete tag_map.py
Tag map file is deleted. I will add it later because it was failing validations
2021-02-10 20:41:18 +02:00
Shumi
7c8721b1bd
Update tag_map.py
Updated tag_map
2021-02-10 20:21:22 +02:00
Shumi
f6be28cfb2
Added files to Setswana Language
Add South African Setswana Language
2021-02-10 20:15:13 +02:00
Shumi
24046fef17
South African Setswana language
Please accept the addition of the Setswana language
2021-02-10 20:12:33 +02:00
Peter Baumann
61b04a70d5
Run PhraseMatcher on Spans (#6918)
* Add regression test

* Run PhraseMatcher on Spans

* Add test for PhraseMatcher on Spans and Docs

* Add SCA

* Add test with 3 matches in Doc, 1 match in Span

* Update docs

* Use doc.length for find_matches in tokenizer

Co-authored-by: Adriane Boyd <adrianeboyd@gmail.com>
2021-02-10 23:43:32 +11:00
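A minimal sketch of the behaviour described above (assumed usage, not code from the PR):

```python
import spacy
from spacy.matcher import PhraseMatcher

nlp = spacy.blank("en")
matcher = PhraseMatcher(nlp.vocab)
matcher.add("ML", [nlp.make_doc("machine learning")])

doc = nlp("She studies machine learning at a large university")
span = doc[1:6]
# The PhraseMatcher can now be called on a Span as well as a Doc
print(matcher(doc))   # matches over the whole document
print(matcher(span))  # matches restricted to the span
```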
Ines Montani
a96f850cb7
Merge pull request #6983 from KoichiYasuoka/patch-2 2021-02-10 15:28:52 +11:00
Ines Montani
21176c69b0 Update and add test 2021-02-10 14:12:00 +11:00
Ines Montani
c08b3f294c Support env vars and CLI overrides for project.yml 2021-02-10 13:45:27 +11:00
Ines Montani
4bf7f63c22
Merge pull request #6996 from svlandeg/fix/accuracy [ci skip]
final 3.0 benchmark numbers
2021-02-10 12:10:16 +11:00
svlandeg
9a7f33c916 final 3.0 benchmark numbers 2021-02-09 21:28:33 +01:00
Koichi Yasuoka
8ed788660b
Several callable objects do not have __qualname__ 2021-02-09 14:43:02 +09:00
Ines Montani
ca3f8386d7
Merge pull request #6975 from svlandeg/fix/link [ci skip]
fix link
2021-02-09 14:34:11 +11:00
Ines Montani
050f089a2e
Merge pull request #6976 from tarskiandhutch/patch-2 [ci skip]
Syntax error in Lemmatizer docs
2021-02-09 14:33:55 +11:00
tarskiandhutch
e897e7aaad
Line 70: syntax error
Original config definition treated dictionary key as a function argument.
2021-02-08 15:24:57 -05:00
svlandeg
bb7482bef8 fix link 2021-02-08 18:39:59 +01:00
Ines Montani
25a0c6c32f
Merge pull request #6965 from adrianeboyd/feature/reword-e923 [ci skip]
Rephrase error related to sample data initialization
2021-02-08 20:17:14 +11:00
Adriane Boyd
6108dabdc8 Rephrase error related to sample data initialization
Now that the initialize step is fully implemented, the source of E923 is
typically missing or improperly converted/formatted data rather than a
bug in spaCy, so rephrase the error and message and remove the prompt to
open an issue.
2021-02-08 09:21:36 +01:00
Sofie Van Landeghem
6ed423c16c
reduce memory load when reading all vectors from file (#6945)
* reduce memory load when reading all vectors from file

* one more small typo fix
2021-02-07 08:05:43 +08:00
Sofie Van Landeghem
a323ef90df
ensure the loss value is cast as float (#6928) 2021-02-07 07:51:56 +08:00
melonwater211
a7977b5143
The test spacy/tests/vocab_vectors/test_lexeme.py::test_vocab_lexeme_add_flag_auto_id seems to fail occasionally when the test suite is run in a random order. (#6956)
```python
    def test_vocab_lexeme_add_flag_auto_id(en_vocab):
        is_len4 = en_vocab.add_flag(lambda string: len(string) == 4)
        assert en_vocab["1999"].check_flag(is_len4) is True
        assert en_vocab["1999"].check_flag(IS_DIGIT) is True
        assert en_vocab["199"].check_flag(is_len4) is False
>       assert en_vocab["199"].check_flag(IS_DIGIT) is True
E       assert False is True
E        +  where False = <built-in method check_flag of spacy.lexeme.Lexeme object at 0x7fa155c36840>(3)
E        +    where <built-in method check_flag of spacy.lexeme.Lexeme object at 0x7fa155c36840> = <spacy.lexeme.Lexeme object at 0x7fa155c36840>.check_flag

spacy/tests/vocab_vectors/test_lexeme.py:49: AssertionError
```

>  `pytest==6.1.1`
>
>  `numpy==1.19.2`
>
> `Python version: 3.8.3`

To reproduce the error, run `pytest --random-order-bucket=global --random-order-seed=170158 -v spacy/tests`

If `test_vocab_lexeme_add_flag_auto_id` is run after `test_vocab_lexeme_add_flag_provided_id`, it fails.
It seems like `test_vocab_lexeme_add_flag_provided_id` uses the `IS_DIGIT` bit for testing purposes but does not reset the bit.

This solution seems to work, but if anyone has a better fix, please let me know and I will integrate it.
2021-02-07 07:51:34 +08:00
René Octavio Queiroz Dias
59271e887a
fix: TransformerListener with TextCatEnsemble (#6951)
* bug: Regression test
Issue #6946

* fix: Fix issue #6946

* chore: Remove regression test
2021-02-06 13:44:51 +01:00
Ines Montani
9fbee83f8a Merge branch 'master' into spacy.io 2021-02-05 13:35:39 +11:00
Ines Montani
433835d9b0
Merge pull request #6889 from adrianeboyd/docs/source-install-dup [ci skip] 2021-02-05 13:35:16 +11:00
René Octavio Queiroz Dias
999ff03b19
fix: Fix textcat labels to expect a Optional[Iterable[str]] instead of Optional[Dict] (#6911)
* docs: Add agreement

* bug: Regression test

Issue #6908

* fix: Changed from Dict to Iterable[str]

Fix #6908

* Update test to use make_tempdir

* fix: Fix WindowsPath error

Co-authored-by: Adriane Boyd <adrianeboyd@gmail.com>
2021-02-04 23:37:13 +01:00
Helio Machado
20a97cda38
Create 0x2b3bfa0.md (#6916) 2021-02-04 23:25:11 +01:00
Adriane Boyd
b903de3fcb
Pass on vocab arg in spacy.blank() (#6924) 2021-02-04 15:09:01 +01:00
Ines Montani
a0feb72623 Merge branch 'master' into spacy.io 2021-02-03 23:59:36 +11:00
Ines Montani
efdeb9b53f
Merge pull request #6909 from svlandeg/fix/docs [ci skip] 2021-02-03 23:59:15 +11:00
svlandeg
7cda5605a0 add type 2021-02-03 13:13:58 +01:00
svlandeg
94929c2b98 small doc fixes 2021-02-03 13:10:22 +01:00
Ines Montani
2cdfcd2d19 Update naming [ci skip] 2021-02-03 12:48:31 +11:00
Ines Montani
809f6282f2 Update README.md [ci skip] 2021-02-03 12:48:25 +11:00
Ines Montani
ae4fcabf20
Merge pull request #6896 from svlandeg/fix/capture
add capture arg
2021-02-03 12:35:06 +11:00
svlandeg
f852af2acf add capture arg 2021-02-02 19:47:12 +01:00
Adriane Boyd
37a68a06ab Update to recommend editable installs for source installs 2021-02-02 16:51:27 +01:00
Adriane Boyd
3a3e4daf60 Update install instructions
* Remove duplicate section about compiling from source
2021-02-02 14:44:15 +01:00
Matthew Honnibal
91a3cab1ca Require spacy-transformers 1.0.1 for v3.0.1 2021-02-02 20:46:56 +11:00
Matthew Honnibal
b6a198481b Set version to v3.0.0 2021-02-02 20:26:17 +11:00
Ines Montani
c0220dddcb Merge branch 'master' into spacy.io 2021-02-02 14:28:00 +11:00
Ines Montani
ff6a21cd18 Update GitHub link [ci skip] 2021-02-02 14:27:46 +11:00
Sofie Van Landeghem
f319d2765f
Add capture argument to project_run (#6878)
* add capture argument to project_run and run_commands

* git bump to 3.0.1

* Set version to 3.0.1.dev0

Co-authored-by: Matthew Honnibal <honnibal+gh@gmail.com>
2021-02-02 10:11:15 +08:00
Sofie Van Landeghem
f638306598
remove link_components flag again (#6883) 2021-02-02 10:08:40 +08:00
Ines Montani
e97d3f3c69
Merge pull request #6884 from pcyin/patch-1 [ci skip]
Fix a typo
2021-02-02 12:59:39 +11:00
Pengcheng YIN
6fdc33203a
Fix a typo 2021-02-01 17:26:28 -05:00
Ines Montani
a47e449431 Merge branch 'master' into spacy.io 2021-02-02 00:28:42 +11:00
Ines Montani
a59f3fcf5d Make wheel the default format and update docs [ci skip] 2021-02-01 23:18:43 +11:00
Ines Montani
b9573e9e22 Fix pip args 2021-02-01 23:15:00 +11:00
Ines Montani
b46073234a Fix default clone branch and error handling [ci skip] 2021-02-01 22:29:04 +11:00
Sofie Van Landeghem
acabb284dd
Fix linking resumed components (#6859)
* link components across enabled, resumed and frozen

* revert renaming

* revert renaming, the sequel
2021-02-01 22:19:58 +11:00
Ines Montani
8a245076c4 Update spacy-transformers pin [ci skip] 2021-02-01 22:04:07 +11:00
Ines Montani
e17ea88e54 Fix config quickstart and download [ci skip] 2021-02-01 21:44:55 +11:00
Ines Montani
3b9ecd25d8
Merge pull request #6870 from adrianeboyd/bugfix/quickstart-lang-tokenizers [ci skip]
Remove nlp.tokenizer from quickstart template
2021-02-01 21:24:52 +11:00
Adriane Boyd
35a863cd27 Remove nlp.tokenizer from quickstart template
Remove `nlp.tokenizer` from quickstart template so that the default
language-specific tokenizer settings are filled instead.
2021-02-01 11:20:12 +01:00
Ines Montani
bcaf5346b6
Merge pull request #6869 from explosion/chore/update-srsly
Update srsly pin
2021-02-01 19:10:17 +11:00
Ines Montani
91e24d2b55 Update srsly pin 2021-02-01 18:24:58 +11:00
Ines Montani
31b842d6ce Update table [ci skip] 2021-02-01 14:17:52 +11:00
Ines Montani
b80bce7a07 Update netlify.toml [ci skip] 2021-02-01 13:26:32 +11:00
Ines Montani
cce428298b Merge branch 'v2.x' into spacy.io 2021-02-01 11:48:56 +11:00
Ines Montani
4ca0f91506 Update labels [ci skip] 2021-01-31 20:10:56 +11:00
Ines Montani
7752f80f39 Update docs [ci skip] 2021-01-31 16:11:24 +11:00
Ines Montani
6a683970ea Update Binder meta [ci skip] 2021-01-31 15:43:08 +11:00
Ines Montani
82da6aee08 Update labels [ci skip] 2021-01-31 15:28:52 +11:00
Ines Montani
c9b52bf1d0 Update version pin [ci skip] 2021-01-31 14:06:01 +11:00
Ines Montani
638d2654e6 Update LICENSE [ci skip] 2021-01-31 13:32:39 +11:00
Ines Montani
f1d48fd25b
Merge pull request #6864 from svlandeg/feature/Amharic_stopwords 2021-01-31 13:18:28 +11:00
Ines Montani
6a7ffffeb3 Update CONTRIBUTING.md [ci skip] 2021-01-31 12:51:28 +11:00
Ines Montani
1f1fbdba14 Update README.md [ci skip] 2021-01-31 12:37:13 +11:00
Ines Montani
a8a1231ccd Update README and docs [ci skip] 2021-01-31 12:36:04 +11:00
svlandeg
91e72c031e reformatting 2021-01-30 17:29:33 +01:00
svlandeg
a8d84188f0 add stop words
Co-authored-by: tewodrosm <tedmaam2006@gmail.com>
2021-01-30 17:26:49 +01:00
Ines Montani
45c551037d Update CLI docs [ci skip] 2021-01-30 21:50:23 +11:00
Ines Montani
3a09299776
Merge pull request #6862 from explosion/develop 2021-01-30 21:47:55 +11:00
Ines Montani
1263e5a147 Merge branch 'develop' of https://github.com/explosion/spaCy into develop 2021-01-30 21:03:28 +11:00
Ines Montani
f058cbd751 Merge branch 'develop' of https://github.com/explosion/spaCy into develop 2021-01-30 21:03:25 +11:00
Ines Montani
f6e2de8e32 Update netlify.toml [ci skip] 2021-01-30 20:32:12 +11:00
Ines Montani
ae07416fda Merge branch 'website/v3-launch' into develop 2021-01-30 20:31:06 +11:00
Ines Montani
d07683873f Merge branch 'master' into develop 2021-01-30 20:28:14 +11:00
Ines Montani
c3c25ece7a
Merge pull request #6861 from explosion/feature/download-wheel [ci skip] 2021-01-30 20:25:40 +11:00
Ines Montani
44f4a19d8b Update excluded branches [ci skip] 2021-01-30 20:19:48 +11:00
Ines Montani
aed9fdaf57 Update excluded branches [ci skip] 2021-01-30 20:18:55 +11:00
Ines Montani
14f631f52c Update parent package and version [ci skip] 2021-01-30 20:12:42 +11:00
Ines Montani
3435b894df Remove nightly reference from auto docs [ci skip] 2021-01-30 20:12:08 +11:00
Ines Montani
d0c3775712 Replace links to nightly docs [ci skip] 2021-01-30 20:09:38 +11:00
Ines Montani
7d28fc121f Update netlify.toml [ci skip] 2021-01-30 19:59:47 +11:00
Ines Montani
5e24fbb905 Update netlify.toml [ci skip] 2021-01-30 19:59:18 +11:00
Ines Montani
8626b82e49 Update images [ci skip] 2021-01-30 18:50:25 +11:00
Ines Montani
d3350afe45 Update docs and add support for legacy style 2021-01-30 17:43:12 +11:00
Ines Montani
770bcecf9a Merge branch 'develop' into feature/download-wheel 2021-01-30 14:23:17 +11:00
Ines Montani
b26a3daa9a
Merge pull request #6860 from explosion/feature/package-wheel 2021-01-30 14:17:01 +11:00
Ines Montani
1841bcbb4f Support downloading .whl files 2021-01-30 14:16:12 +11:00
Ines Montani
2332c4280b Update and use unified --build option 2021-01-30 13:11:36 +11:00
Ines Montani
e6accb3a9e Tidy up and auto-format 2021-01-30 12:52:33 +11:00
Ines Montani
817b0db521 Fix escape sequence 2021-01-30 12:39:58 +11:00
Ines Montani
526b416118 Tidy up comments 2021-01-30 12:34:09 +11:00
Ines Montani
30765674d0 Merge branch 'master' into develop 2021-01-30 12:20:28 +11:00
Ines Montani
0e9ca7fca1
Merge pull request #6858 from svlandeg/fix/upgrade_pytest
update pytest pin to match Thinc 8.0
2021-01-30 11:59:04 +11:00
Ines Montani
2609ba4e89 Support building wheel in spacy package 2021-01-30 11:54:02 +11:00
svlandeg
78a439a7c2 update pytest to 5.2.0 2021-01-29 17:08:01 +01:00
Ines Montani
bbf080dfe5
Merge pull request #6645 from bittlingmayer/patch-3 2021-01-30 01:26:28 +11:00
Ines Montani
95e958a229
Merge pull request #6852 from explosion/feature/replace-listeners 2021-01-30 00:58:08 +11:00
Adriane Boyd
bced6309e5
Add full exceptions with spaces 2021-01-29 14:27:22 +01:00
Ines Montani
30c872a750
Merge pull request #6856 from adrianeboyd/docs/trf-sentencepiece-2 [ci skip]
Rephrase transformers PyTorch instructions
2021-01-30 00:08:54 +11:00
Ines Montani
756b49c184 Update spacy-lookups-data pin 2021-01-30 00:07:49 +11:00
Ines Montani
7ba29f2d03 Update spacy-transformers pin 2021-01-30 00:06:07 +11:00
Ines Montani
7886d59c56 Add check for remove_listener method 2021-01-29 23:47:30 +11:00
Ines Montani
7694f76dd1 Update warning and mention replace_listeners 2021-01-29 23:46:01 +11:00
Ines Montani
94232aea08 Improve E889 2021-01-29 23:39:23 +11:00
Adriane Boyd
8b76cb8095 Rephrase transformers PyTorch instructions 2021-01-29 13:36:56 +01:00
Ines Montani
095055ac48
Merge pull request #6855 from adrianeboyd/docs/trf-sentencepiece [ci skip]
Update transformers install docs
2021-01-29 23:34:01 +11:00
Adriane Boyd
e3e87e7275 Update transformers install docs
* Recommend installing PyTorch separately
* Add instructions for `sentencepiece`
2021-01-29 13:27:43 +01:00
Ines Montani
924396c20c Merge branch 'feature/replace-listeners' of https://github.com/explosion/spaCy into feature/replace-listeners 2021-01-29 21:43:10 +11:00
Ines Montani
2102082478 Make Tok2Vec.remove_listener return bool
Whether listener was removed
2021-01-29 21:41:38 +11:00
Ines Montani
e766e8c56d
Apply suggestions from code review
Co-authored-by: Sofie Van Landeghem <svlandeg@users.noreply.github.com>
2021-01-29 21:41:17 +11:00
Ines Montani
d600bc6d5d
Merge pull request #6853 from svlandeg/fix/docs_new_v3 [ci skip]
adding new="3" mentions in the doc
2021-01-29 21:39:43 +11:00
svlandeg
d7d838281c adding new="3" mentions in the doc 2021-01-29 11:26:37 +01:00
Ines Montani
04fcb4e228 Update README.md [ci skip] 2021-01-29 21:08:51 +11:00
Ines Montani
60cd62917a Update README.md [ci skip] 2021-01-29 21:06:46 +11:00
Ines Montani
fc0ed8fef4 Update README.md [ci skip] 2021-01-29 21:05:03 +11:00
Ines Montani
e9a42349d2 Update README.md [ci skip] 2021-01-29 20:58:06 +11:00
Ines Montani
5d22817c15 Update README.md [ci skip] 2021-01-29 20:55:53 +11:00
Ines Montani
bc089b693c Update tests 2021-01-29 19:38:09 +11:00
Ines Montani
325f47500d Move replacement logic to Language.from_config 2021-01-29 19:37:04 +11:00
Ines Montani
0f3e3eedc2 Add Tok2vec.remove_listener 2021-01-29 19:36:38 +11:00
Ines Montani
99af9e7125 Update documentation 2021-01-29 18:45:48 +11:00
Ines Montani
99842387cb Remove default value 2021-01-29 18:45:37 +11:00
Ines Montani
44b5542d14 Change method order 2021-01-29 18:42:41 +11:00
Ines Montani
8c15d1daec Update and validate config first and exit early if paths don't exist 2021-01-29 18:24:47 +11:00
Ines Montani
bbb94b37c6 Update error handling and docstring 2021-01-29 16:27:49 +11:00
Ines Montani
01ecfbcc45 Merge branch 'develop' into feature/replace-listeners 2021-01-29 15:57:32 +11:00
Ines Montani
911dfcccfc Add option to replace listeners for sourced components 2021-01-29 15:57:04 +11:00
Adriane Boyd
fcce3600ed
Forbid OP matching 2+ tokens in DependencyMatcher (#6824)
Instead of silently using only the first token in each matched span:

* Forbid `OP: ?/*/+` through `DependencyMatcher` validation
* As a fail-safe, add warning if a token match that's not exactly one
token long is found by a token pattern.
2021-01-29 08:52:01 +08:00
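For contrast, a valid `DependencyMatcher` pattern in which every entry matches exactly one token (a sketch; the pattern itself is illustrative, not from the PR):

```python
import spacy
from spacy.matcher import DependencyMatcher

nlp = spacy.blank("en")
matcher = DependencyMatcher(nlp.vocab)

pattern = [
    {"RIGHT_ID": "verb", "RIGHT_ATTRS": {"POS": "VERB"}},
    {"LEFT_ID": "verb", "REL_OP": ">", "RIGHT_ID": "subject",
     "RIGHT_ATTRS": {"DEP": "nsubj"}},
]
matcher.add("VERB_SUBJECT", [pattern])
# Adding "OP": "?", "*" or "+" to a RIGHT_ATTRS dict is now rejected during
# pattern validation instead of silently matching only the first token.
```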
Sofie Van Landeghem
24a697abb8
avoid empty aliases and improve UX and docs (#6840) 2021-01-29 08:51:40 +08:00
Sofie Van Landeghem
837a4f53c2
Error handling in nlp.pipe (#6817)
* add error handler for pipe methods

* add unit tests

* remove pipe method that are the same as their base class

* have Language keep track of a default error handler

* cleanup

* formatting

* small refactor

* add documentation
2021-01-29 08:51:21 +08:00
Ines Montani
cc18f3f23c
Improve Example error handling for NER data (#6835)
* Improve Example error handling for NER data

* Fix conditional
2021-01-28 13:11:20 +11:00
Ines Montani
78d6ff4dd4 Update quickstart recommendations 2021-01-28 11:14:49 +11:00
Ines Montani
ec5f55aa5b
Update config generation defaults and transformers (#6832) 2021-01-27 23:56:33 +11:00
Sofie Van Landeghem
6b68ad027b
Fix beam NER resizing (#6834)
* move label check to sub methods

* add tests
2021-01-27 23:39:14 +11:00
Ines Montani
5ed51c9dd2
Merge pull request #6828 from explosion/master-tmp 2021-01-27 23:05:46 +11:00
Adriane Boyd
d17afb4826
Add Spanish rule-based lemmatizer (#6833)
* Initial Spanish lemmatizer

* Handle merged verb+pron(s) multi-word tokens

* Use VERB for AUX rule lookup

* Add morph to lemma cache key

* Fix aux lookups, minor refactoring

* Improve verb+pron handling

* Move verb+pron handling into its own method
* Check for exceptions (primarily for se)
* Collect pronouns in the same (not reversed) order

* Only add modified possible lemmas
2021-01-27 19:21:35 +08:00
Ines Montani
615dba9d99 Fix tokenizer exceptions 2021-01-27 22:11:42 +11:00
Ines Montani
abb24fdc0f
Merge pull request #6827 from explosion/feature/add-labels-implicitly 2021-01-27 21:34:58 +11:00
Ines Montani
80ba9eaf7d Fix test 2021-01-27 21:29:02 +11:00
Ines Montani
4a6fecd6df Update spacy-legacy pin 2021-01-27 13:31:31 +11:00
Ines Montani
35d79c0a5d Adjust formatting [ci skip] 2021-01-27 13:31:25 +11:00
Ines Montani
e3f8be9a94 Update language data 2021-01-27 13:29:22 +11:00
Ines Montani
230e651ad6 Merge branch 'develop' into master-tmp 2021-01-27 13:26:29 +11:00
Ines Montani
634ae609b4 Adjust formatting [ci skip] 2021-01-27 13:08:00 +11:00
Matthew Honnibal
05050210f3 Dont add labels implicitly for parser 2021-01-27 13:04:47 +11:00
Ines Montani
5d79d1af50
Merge pull request #6796 from svlandeg/docs/benchmarks [ci skip] 2021-01-27 13:01:23 +11:00
Ines Montani
59f859440f
Merge pull request #6825 from adrianeboyd/docs/evaluate-cli-code [ci skip]
Update --code arg in evaluate CLI docs
2021-01-27 12:59:40 +11:00
Matthew Honnibal
1d20e21f3e Add labels implicitly for parser and ner 2021-01-27 12:54:47 +11:00
Matthew Honnibal
68b1c2984d Test labels are added implicitly 2021-01-27 12:52:29 +11:00
Ines Montani
fabd3a3394 Tidy up code comments [ci skip] 2021-01-27 12:40:03 +11:00
Ines Montani
1ed7029d47 Update website for v3 launch 2021-01-27 12:39:47 +11:00
Adriane Boyd
c447aa2b98 Update --code arg in evaluate CLI docs 2021-01-26 15:30:46 +01:00
Dhruv Naik
e7db07a0b9
Fix Span.char_span bug (#6816)
* Create dhruvrnaik.md

* add test for issue #6815

* bugfix for issue #6815

* update dhruvrnaik.md

* add span.vector test for #6815
2021-01-26 15:50:37 +08:00
Matthew Honnibal
e8674c5c42 Set version to v3.0.0rc5 2021-01-26 14:55:41 +11:00
Adriane Boyd
71a6350744
Implement overwrite param for all custom lemmatizers (#6794) 2021-01-26 14:53:43 +11:00
Adriane Boyd
2263bc7b28
Update develop from master for v3.0.0rc5 (#6811)
* Fix `spacy.util.minibatch` when the size iterator is finished (#6745)

* Skip 0-length matches (#6759)

Add hack to prevent matcher from returning 0-length matches.

* support IS_SENT_START in PhraseMatcher (#6771)

* support IS_SENT_START in PhraseMatcher

* add unit test and friendlier error

* use IDS.get instead

* ensure span.text works for an empty span (#6772)

* Remove unicode_literals

Co-authored-by: Santiago Castro <bryant@montevideo.com.uy>
Co-authored-by: Sofie Van Landeghem <svlandeg@users.noreply.github.com>
2021-01-26 14:52:45 +11:00
Ines Montani
c0926c9088
WIP: Various small training changes (#6818)
* Allow output_path to be None during training

* Fix cat scoring (?)

* Improve error message for weighted None score

* Improve messages

So we can call this in other places etc.

* Fix output path check

* Use latest wasabi

* Revert "Improve error message for weighted None score"

This reverts commit 7059926763.

* Exclude None scores from final score by default

It's otherwise very difficult to keep track of the score weights if we modify a config programmatically, source components etc.

* Update warnings and use logger.warning
2021-01-26 14:51:52 +11:00
Matthew Honnibal
f049df1715
Revert "Set annotations in update" (#6810)
* Revert "Set annotations in update (#6767)"

This reverts commit e680efc7cc.

* Fix version

* Update spacy/pipeline/entity_linker.py

* Update spacy/pipeline/entity_linker.py

* Update spacy/pipeline/tagger.pyx

* Update spacy/pipeline/tok2vec.py

* Update spacy/pipeline/tok2vec.py

* Update spacy/pipeline/transition_parser.pyx

* Update spacy/pipeline/transition_parser.pyx

* Update website/docs/api/multilabel_textcategorizer.md

* Update website/docs/api/tok2vec.md

* Update website/docs/usage/layers-architectures.md

* Update website/docs/usage/layers-architectures.md

* Update website/docs/api/transformer.md

* Update website/docs/api/textcategorizer.md

* Update website/docs/api/tagger.md

* Update spacy/pipeline/entity_linker.py

* Update website/docs/api/sentencerecognizer.md

* Update website/docs/api/pipe.md

* Update website/docs/api/morphologizer.md

* Update website/docs/api/entityrecognizer.md

* Update spacy/pipeline/entity_linker.py

* Update spacy/pipeline/multitask.pyx

* Update spacy/pipeline/tagger.pyx

* Update spacy/pipeline/tagger.pyx

* Update spacy/pipeline/textcat.py

* Update spacy/pipeline/textcat.py

* Update spacy/pipeline/textcat.py

* Update spacy/pipeline/tok2vec.py

* Update spacy/pipeline/trainable_pipe.pyx

* Update spacy/pipeline/trainable_pipe.pyx

* Update spacy/pipeline/transition_parser.pyx

* Update spacy/pipeline/transition_parser.pyx

* Update website/docs/api/entitylinker.md

* Update website/docs/api/dependencyparser.md

* Update spacy/pipeline/trainable_pipe.pyx
2021-01-25 22:18:45 +08:00
Matthew Honnibal
42b117e561
Fix Doc.copy bugs (#6809)
* Dont let the Doc own LexemeC, to fix Doc.copy

* Copy doc.spans

* Copy doc.spans
2021-01-25 21:40:18 +08:00
Adriane Boyd
0f2de39efb
Fix types for exclude args in info CLI (#6808) 2021-01-25 20:00:22 +08:00
Adriane Boyd
61c9f8bf24
Remove transformers model max length section (#6807) 2021-01-25 19:59:34 +08:00
Matthew Honnibal
ffc371350a
Avoid assuming encode.get_dim('nO') is set in tok2vec (#6800) 2021-01-24 14:37:33 +11:00
KeshavG-lb
0a86d833d7
Spacy Cli info method causing backward compatibility issues (#6793)
* Spacy Cli info method causing backward compatibility issues #6791

fix backward compatibility by setting default value to exclude in info
method.

* Setting an empty list as a default argument is dangerous,
so set the default to None and then set it to an empty list if it is None.

Reference: https://nikos7am.com/posts/mutable-default-arguments/
2021-01-23 11:21:43 +01:00
svlandeg
56064faed9 update caption 2021-01-23 00:57:00 +01:00
svlandeg
d7c0f40a96 update comment 2021-01-22 18:55:18 +01:00
svlandeg
a071279bc7 add speed comparison to docs 2021-01-22 18:46:35 +01:00
Luigi Coniglio
e83c818a78
DependencyMatcher improvements (fix #6678) (#6744)
* Adding contributor agreement for user werew

* [DependencyMatcher] Comment and clean code

* [DependencyMatcher] Use defaultdicts

* [DependencyMatcher] Simplify _retrieve_tree method

* [DependencyMatcher] Remove prepended underscores

* [DependencyMatcher] Address TODO and move grouping of token's positions out of the loop

* [DependencyMatcher] Remove _nodes attribute

* [DependencyMatcher] Use enumerate in _retrieve_tree method

* [DependencyMatcher] Clean unused vars and use camel_case naming

* [DependencyMatcher] Memoize node+operator map

* Add root property to Token

* [DependencyMatcher] Groups matches by root

* [DependencyMatcher] Remove unused _keys_to_token attribute

* [DependencyMatcher] Use a list to map tokens to matcher's keys

* [DependencyMatcher] Remove recursion

* [DependencyMatcher] Use a generator to retrieve matches

* [DependencyMatcher] Remove unused memory pool

* [DependencyMatcher] Hide private methods and attributes

* [DependencyMatcher] Improvements to the matches validation

* Apply suggestions from code review

Co-authored-by: Matthew Honnibal <honnibal+gh@gmail.com>

* [DependencyMatcher] Fix keys_to_position_maps

* Remove Token.root property

* [DependencyMatcher] Remove functools' lru_cache

Co-authored-by: Matthew Honnibal <honnibal+gh@gmail.com>
2021-01-22 11:20:08 +11:00
svlandeg
b132cb3036 update accuracies for new a1 models 2021-01-21 20:24:05 +01:00
Adriane Boyd
d0236136a2
Fix default config init in Transformer API docs (#6781) 2021-01-21 23:18:03 +08:00
Matthew Honnibal
c54c300680 Use thinc v8.0.0 2021-01-21 23:51:35 +11:00
Sofie Van Landeghem
d93cd3b7c0
remove artificially duplicated test [ci skip] 2021-01-21 10:53:16 +01:00
Sofie Van Landeghem
e680efc7cc
Set annotations in update (#6767)
* bump to 3.0.0rc4

* do set_annotations in component update calls

* update docs and remove set_annotations flag

* fix EL test
2021-01-20 11:49:25 +11:00
Sofie Van Landeghem
57640aa838
warn when frozen components break listener pattern (#6766)
* warn when frozen components break listener pattern

* few notes in the documentation

* update arg name

* formatting

* cleanup

* specify listeners return type
2021-01-20 11:12:35 +11:00
Matthew Honnibal
88acbfc050
Copy the Example objects (and their predicted Doc) in nlp.evaluate() and nlp.update() (#6765)
* Make copy of examples in nlp.update and nlp.evaluate

* Avoid circular import

* Fix evaluate
2021-01-19 16:47:44 +01:00
Sofie Van Landeghem
bfc212e68f
fix duplicate from merge [ci skip] 2021-01-19 12:14:35 +01:00
Ines Montani
4a1029a9b6 Add infobox [ci skip] 2021-01-19 19:18:39 +11:00
Ines Montani
dd04e9a64a
Merge pull request #6761 from adrianeboyd/docs/add-xx-sent [ci skip]
Add xx_sent_ud_sm model to website
2021-01-19 19:06:13 +11:00
Adriane Boyd
7cd5c9e098 Add xx_sent_ud_sm model to website 2021-01-19 09:02:35 +01:00
Ines Montani
76e25afcd7
Merge pull request #6757 from adrianeboyd/docs/mk-ru-langs [ci skip]
Update languages for website
2021-01-19 11:10:48 +11:00
Sofie Van Landeghem
c8761b0e6e
rewrite Maxout layer as separate layers to avoid shape inference trouble (#6760) 2021-01-19 07:37:17 +08:00
Adriane Boyd
26c34ab8b0
Fix parser resizing for cupy (#6758) 2021-01-18 20:43:15 +01:00
Matthew Honnibal
c2a18e4fa3 Update textcat ensemble model 2021-01-19 02:53:02 +11:00
Ines Montani
f50502dad7 Update docs [ci skip] 2021-01-19 00:22:47 +11:00
Adriane Boyd
e8f6400923 Update languages for website
* Add Macedonian
* Add Russian dependencies
* Switch Chinese dependency to spacy-pkuseg
2021-01-18 14:09:34 +01:00
Ines Montani
2ae8dfbb93 Fix website [ci skip] 2021-01-18 22:31:32 +11:00
Ines Montani
e697609fef Update docstrings and types [ci skip] 2021-01-18 22:31:26 +11:00
Ines Montani
b331653ade
Merge pull request #6731 from explosion/feature/spacy-legacy 2021-01-18 12:21:03 +11:00
Ines Montani
f4d547b73c Fix error code 2021-01-18 11:43:45 +11:00
Ines Montani
1090d3d675 Merge branch 'develop' into feature/spacy-legacy 2021-01-18 11:43:39 +11:00
Ines Montani
09cacbb7ee Fix website [ci skip] 2021-01-18 11:37:04 +11:00
Sofie Van Landeghem
fed8f48965
raise NotImplementedError when noun_chunks iterator is not implemented (#6711)
* raise NotImplementedError when noun_chunks iterator is not implemented

* bring back, fix and document span.noun_chunks

* formatting

Co-authored-by: Matthew Honnibal <honnibal+gh@gmail.com>
2021-01-17 19:56:05 +08:00
Adriane Boyd
bf0cdae8d4
Add token_splitter component (#6726)
* Add long_token_splitter component

Add a `long_token_splitter` component for use with transformer
pipelines. This component splits up long tokens like URLs into smaller
tokens. This is particularly relevant for pretrained pipelines with
`strided_spans`, since the user can't change the length of the span
`window` and may not wish to preprocess the input texts.

The `long_token_splitter` splits tokens that are at least
`long_token_length` tokens long into smaller tokens of `split_length`
size.

Notes:

* Since this is intended for use as the first component in a pipeline,
the token splitter does not try to preserve any token annotation.
* API docs to come when the API is stable.

* Adjust API, add test

* Fix name in factory
2021-01-17 19:54:41 +08:00
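A hypothetical usage sketch of the component described above; the factory name and config keys (`min_length`, `split_length`) follow the released `token_splitter` component and may differ from the working names in the PR text:

```python
import spacy

nlp = spacy.blank("en")
# Added first so it runs before any other component sees the tokens
nlp.add_pipe("token_splitter", config={"min_length": 20, "split_length": 5}, first=True)

doc = nlp("See https://example.com/some/very/long/path/that/tokenizes/as/one/token")
print([token.text for token in doc])
```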
Adriane Boyd
185fc62f4d
Remove unused is_base_form for mk lemmatizer (#6743)
Remove unimplemented/incorrect is_base_form for Macedonian lemmatizer.
2021-01-17 09:41:35 +01:00
Adriane Boyd
43a752a2a0
Fix assertion in default get oracle sequence usage (#6738)
Remove assertion for default debug value in 
`get_oracle_sequence_from_state`.
2021-01-16 16:07:39 +01:00
Ines Montani
a552db2819 Include available registry names in error 2021-01-16 14:35:03 +11:00
Matthew Honnibal
f0c696b4aa Fix failed merge of #6694 patch 2021-01-16 13:44:11 +11:00
Ines Montani
d12be459f6 Raise RegistryError 2021-01-16 12:57:13 +11:00
Adriane Boyd
c8b4370865
Add all strings from source models (#6736)
Add all strings from the source model when adding a pipe from a source
model.

Minor:

* Skip `disable=["vocab", "tokenizer"]` when loading a source model from
the config, since this doesn't do anything and is misleading.
2021-01-16 12:26:15 +11:00
Adriane Boyd
9328dd5625
Handle unset token.morph in Morphologizer (#6704)
* Handle unset token.morph in Morphologizer

Handle unset `token.morph` in `Morphologizer.initialize` and
`Morphologizer.get_loss`. If both `token.morph` and `token.pos` are
unset, treat the annotation as missing rather than empty.

* Add token.has_morph()
2021-01-15 17:20:10 +01:00
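A small sketch of the missing-vs-empty distinction introduced above, assuming `Token.set_morph` for setting a FEATS string (illustrative, not from the PR):

```python
import spacy

nlp = spacy.blank("en")
doc = nlp("They walk")
token = doc[1]

print(token.has_morph())          # False: no morphological analysis set yet
token.set_morph("Number=Plur")    # set a MorphAnalysis from a FEATS string
print(token.has_morph())          # True
print(token.morph.get("Number"))  # ['Plur']
```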
Matthew Honnibal
7b3f0c6f1b
Questionable fix for parser training bug with misaligned sentences (#6694)
* Questionable fix for parser training bug with misaligned sentences

* Fix

Co-authored-by: Adriane Boyd <adrianeboyd@gmail.com>
2021-01-15 14:18:24 +01:00
Ines Montani
d1338966ae Require spacy-legacy 2021-01-15 21:59:06 +11:00
Ines Montani
a203e3dbb8 Support spacy-legacy via the registry 2021-01-15 21:42:40 +11:00
Ines Montani
330f9818c0
Merge pull request #6729 from explosion/chore/tidy-up 2021-01-15 13:27:59 +11:00
Ines Montani
f9e4ac1283 Fix test 2021-01-15 12:51:02 +11:00
Ines Montani
b0b743597c Tidy up and auto-format 2021-01-15 11:57:36 +11:00
Ines Montani
e8a97a2bd6
Merge pull request #6720 from adrianeboyd/feature/improved-init-training-config-validation 2021-01-15 11:45:24 +11:00
Ines Montani
57369909c0
Merge pull request #6727 from adrianeboyd/chore/update-develop-from-master-rc3 2021-01-15 11:44:28 +11:00
Ines Montani
8ba5d88b4b
Merge pull request #6691 from svlandeg/feature/missing-dep 2021-01-15 11:43:36 +11:00
Adriane Boyd
681a6195f7 Validate seed and gpu_allocator manually 2021-01-14 16:57:57 +01:00
Adriane Boyd
0c936004d1 Merge remote-tracking branch 'upstream/master' into chore/update-develop-from-master-rc3 2021-01-14 11:49:58 +01:00
Matthew Honnibal
92310a5e26
Merge branch 'develop' into feature/missing-dep 2021-01-14 17:39:01 +11:00
Matthew Honnibal
f277bfdf0f
Add SpanGroup and Graph container types to represent arbitrary annotations (#6696)
* Draft out initial Spans data structure

* Initial span group commit

* Basic span group support on Doc

* Basic test for span group

* Compile span_group.pyx

* Draft addition of SpanGroup to DocBin

* Add deserialization for SpanGroup

* Add tests for serializing SpanGroup

* Fix serialization of SpanGroup

* Add EdgeC and GraphC structs

* Add draft Graph data structure

* Compile graph

* More work on Graph

* Update GraphC

* Upd graph

* Fix walk functions

* Let Graph take nodes and edges on construction

* Fix walking and getting

* Add graph tests

* Fix import

* Add module with the SpanGroups dict thingy

* Update test

* Rename 'span_groups' attribute

* Try to fix c++11 compilation

* Fix test

* Update DocBin

* Try to fix compilation

* Try to fix graph

* Improve SpanGroup docstrings

* Add doc.spans to documentation

* Fix serialization

* Tidy up and add docs

* Update docs [ci skip]

* Add SpanGroup.has_overlap

* WIP updated Graph API

* Start testing new Graph API

* Update Graph tests

* Update Graph

* Add docstring

Co-authored-by: Ines Montani <ines@ines.io>
2021-01-14 17:30:41 +11:00
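A minimal sketch of the `doc.spans` container added above (assumed usage, not code from the PR):

```python
import spacy
from spacy.tokens import Span

nlp = spacy.blank("en")
doc = nlp("The quick brown fox jumps over the lazy dog")

# doc.spans holds named groups of arbitrary, potentially overlapping spans
doc.spans["noun_phrases"] = [Span(doc, 0, 4, label="NP"), Span(doc, 6, 9, label="NP")]
doc.spans["animals"] = [Span(doc, 3, 4, label="ANIMAL"), Span(doc, 8, 9, label="ANIMAL")]

print(doc.spans["noun_phrases"].has_overlap)  # False for this group
print([(span.text, span.label_) for span in doc.spans["animals"]])
```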
Adriane Boyd
54e8e3c208
Update model-related dependencies (#6725)
* Update pymorphy2 error messages for Russian and Ukrainian
* Add pymorphy2 to pex
* Update spacy-pkuseg version for pex
2021-01-14 17:29:44 +11:00
Ines Montani
06c2eae08f Merge branch 'master' into spacy.io 2021-01-14 13:38:59 +11:00
Ines Montani
30cb6ced1e Merge branch 'master' into spacy.io 2021-01-14 11:38:24 +11:00
svlandeg
fec9b81aa2 Merge remote-tracking branch 'upstream/develop' into feature/missing-dep 2021-01-13 17:46:12 +01:00
svlandeg
ed53bb979d cleanup 2021-01-13 14:20:05 +01:00
svlandeg
86a4e316b8 fix sent_starts 2021-01-13 13:47:25 +01:00
Ines Montani
31a92b28ae
Merge pull request #6715 from adrianeboyd/feature/before-after-init-callbacks
Add initialize.before_init and after_init callbacks
2021-01-13 12:17:00 +11:00
Ines Montani
97d5a7ba99 Merge branch 'develop' of https://github.com/explosion/spaCy into develop 2021-01-13 12:03:02 +11:00
Ines Montani
8d6448ccf7 Add config resolver test 2021-01-13 12:02:59 +11:00
svlandeg
232e953b14 pytest.approx with absolute eps 2021-01-12 20:32:57 +01:00
svlandeg
5b598bd1d5 formatting 2021-01-12 17:28:41 +01:00
svlandeg
a581d82f33 introduce token.has_head and refer to MISSING_DEP_ (WIP) 2021-01-12 17:17:06 +01:00
Adriane Boyd
5fb8b7037a Expand initialize/training config validation
Validate both `[initialize]` and `[training]` in `debug data` and
`nlp.initialize()` with separate config validation error blocks that
indicate which block of the config is being validated.
2021-01-12 17:17:00 +01:00
Adriane Boyd
a45d89f09a Add initialize.before_init and after_init callbacks
Add `initialize.before_init` and `initialize.after_init` callbacks to
the config. The `initialize.before_init` callback is a place to
implement one-time tokenizer customizations that are then saved with the
model.
2021-01-12 13:07:44 +01:00
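A sketch of how such a callback might be registered and referenced from the config (the registered name `customize_tokenizer.v1` and the special case are purely illustrative):

```python
from spacy.language import Language
from spacy.util import registry

# Referenced from the config as:
#   [initialize.before_init]
#   @callbacks = "customize_tokenizer.v1"
@registry.callbacks("customize_tokenizer.v1")
def make_customize_tokenizer():
    def customize_tokenizer(nlp: Language) -> None:
        # one-time tokenizer customization, saved with the trained pipeline
        nlp.tokenizer.add_special_case("gonna", [{"ORTH": "gon"}, {"ORTH": "na"}])
    return customize_tokenizer
```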
Adriane Boyd
ad43cbb042
Sync missing and misaligned values in Tagger loss (#6689)
Use `None` for both missing and misaligned annotation in
`Tagger.get_loss`, reverting to the default missing value in the loss
function.
2021-01-10 11:30:37 +11:00
Matthew Honnibal
c04bab6bae
Fix train loop to avoid swallowing tracebacks (#6693)
* Avoid swallowing tracebacks in train loop

* Format

* Handle first
2021-01-09 08:25:47 +08:00
Sofie Van Landeghem
a612a5ba3f
fix small typos (#6698) 2021-01-08 09:39:47 +01:00
svlandeg
dd12c6c8fd allow missing information in deps and heads annotations 2021-01-07 19:10:32 +01:00
svlandeg
1abeca90a6 refer to _parser_internals.nonproj.DELIMITER 2021-01-07 18:58:13 +01:00
Yohei Tamura
411c842a71
convert tuple to list, because the type mismatches (#6625) 2021-01-07 16:42:12 +11:00
Sofie Van Landeghem
75d9019343
Fix types of Tok2Vec encoding architectures (#6442)
* fix TorchBiLSTMEncoder documentation

* ensure the types of the encoding Tok2vec layers are correct

* update references from v1 to v2 for the new architectures
2021-01-07 16:39:27 +11:00
Sofie Van Landeghem
8c1a23209f
Getting scores out of beam_parser (#6684)
* clean up of ner tests

* beam_parser tests

* implement get_beam_parses and scored_parses for the dep parser

* we don't have to add the parse if there are no arcs
2021-01-07 16:28:27 +11:00
Sofie Van Landeghem
3983bc6b1e
Fix Transformer width in TextCatEnsemble (#6431)
* add convenience method to determine tok2vec width in a model

* fix transformer tok2vec dimensions in TextCatEnsemble architecture

* init function should not be nested to avoid pickle issues
2021-01-06 12:44:04 +01:00
Sofie Van Landeghem
402dbc5bae
Getting scores out of beam_ner (#6575)
* small fixes and formatting

* bring test_issue4313 up-to-date, currently fails

* formatting

* add get_beam_parses method back

* add scored_ents function

* delete tag map
2021-01-06 12:02:32 +01:00
Sofie Van Landeghem
82ae95267a
Docs for pretrain architectures (#6605)
* document pretraining architectures

* formatting

* bit more info

* small fixes
2021-01-06 16:12:30 +11:00
Adriane Boyd
bf9096437e
Set default lemmas in retokenizer (#6667)
Instead of unsetting lemmas on retokenized tokens, set the default
lemmas to:

* merge: concatenate any existing lemmas with `SPACY` preserved
* split: use the new `ORTH` values if lemmas were previously set,
  otherwise leave unset
2021-01-06 12:29:44 +08:00
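A rough illustration of the merge behaviour described above (a sketch; the expected output follows the description rather than being taken from the PR):

```python
import spacy

nlp = spacy.blank("en")
doc = nlp("New York City")
for token in doc:
    token.lemma_ = token.text.lower()

with doc.retokenize() as retokenizer:
    retokenizer.merge(doc[0:3])

# The merged token's default lemma is the concatenation of the existing
# lemmas with spacing preserved, instead of being unset
print(doc[0].lemma_)  # "new york city"
```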
Adriane Boyd
0041dfbc7f
Use special matcher for exceptions with spaces (#6668)
Use the special cases phrase matcher for exceptions that include space
characters so that exceptions including spaces are supported.
2021-01-06 12:05:10 +08:00
Sofie Van Landeghem
afc5714d32
multi-label textcat component (#6474)
* multi-label textcat component

* formatting

* fix comment

* cleanup

* fix from #6481

* random edit to push the tests

* add explicit error when textcat is called with multi-label gold data

* fix error nr

* small fix
2021-01-06 13:07:14 +11:00
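A minimal sketch of adding the new component (assumed usage, not code from the PR):

```python
import spacy

nlp = spacy.blank("en")
# "textcat_multilabel" scores labels independently, unlike the
# mutually exclusive "textcat" component
textcat = nlp.add_pipe("textcat_multilabel")
textcat.add_label("POLITICS")
textcat.add_label("SPORTS")
print(textcat.labels)  # ('POLITICS', 'SPORTS')
```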
Bruno
1a77607036
spaCy v3 is not saving the best version in training loop (#6629)
* Save best only if it is the best and also respect the average config

* Create bratao.md

* Update loop.py

* Remove average check

* Keep before_to_disk
2021-01-06 12:51:30 +11:00
Sofie Van Landeghem
29b59086f9
Prevent 0-length mem alloc (#6653)
* prevent 0-length mem alloc by adding asserts

* fix lexeme mem allocation
2021-01-06 12:50:17 +11:00
Ines Montani
6f83abb971
Merge pull request #6647 from svlandeg/feature/init_config_overwrite 2021-01-05 14:59:04 +11:00
Ines Montani
81f018fb67
Merge pull request #6671 from explosion/chore/tidy-autoformat
Tidy up and auto-format
2021-01-05 14:45:31 +11:00
Ines Montani
224a3590e9
Merge pull request #6654 from svlandeg/chore/tests-cleanup
Unskipping tests
2021-01-05 13:53:40 +11:00
Ines Montani
3614472e29
Merge pull request #6646 from svlandeg/feature/cli-docs [ci skip] 2021-01-05 13:52:49 +11:00
Ines Montani
9c078a5885
Update formatting for consistency [ci skip] 2021-01-05 13:52:28 +11:00
Ines Montani
a9e845426f Use --force for consistency and add docs 2021-01-05 13:49:59 +11:00
Ines Montani
c4993f16d0
Merge pull request #6651 from svlandeg/bugfix/cli_info 2021-01-05 13:44:26 +11:00
Ines Montani
991669c934 Tidy up and auto-format 2021-01-05 13:41:53 +11:00
Adriane Boyd
b57be94c78
Fix memory issues in Language.evaluate (#6386)
* Fix memory issues in Language.evaluate

Reset annotation in predicted docs before evaluating and store all data
in `examples`.

* Minor refactor to docs generator init

* Fix generator expression

* Fix final generator check

* Refactor pipeline loop

* Handle examples generator in Language.evaluate

* Add test with generator

* Use make_doc
2020-12-31 10:45:50 +11:00
svlandeg
a6a68da673 unskipping tests with python >= 3.6 2020-12-30 18:46:43 +01:00
svlandeg
d5ff0fecf8 add docs 2020-12-30 14:01:13 +01:00
svlandeg
c74ab6a313 fix imports 2020-12-30 12:40:12 +01:00
svlandeg
712a78b74a add simple unit test 2020-12-30 12:35:26 +01:00
svlandeg
4347e6d39b fixes for CLI info command 2020-12-30 12:05:58 +01:00
svlandeg
62b4fe118f prevent overwriting existing config file 2020-12-29 15:40:22 +01:00
svlandeg
2fa23b0304 fix capitalization for link 2020-12-29 15:01:22 +01:00
svlandeg
43cc6aea93 remove non-existing link 2020-12-29 14:59:39 +01:00
svlandeg
543073bf9d add pretrain example 2020-12-29 14:51:23 +01:00
svlandeg
1d0ef98873 move example 2020-12-29 14:46:03 +01:00
svlandeg
20113b8063 add train CLI example 2020-12-29 14:44:56 +01:00
Adam Bittlingmayer
f2fe60bacf
Update tokenizer_exceptions.py
See https://github.com/explosion/spaCy/pull/6643
2020-12-29 16:05:11 +04:00
Adriane Boyd
5ca57d8221
Add logger warning when serializing user hooks (#6595)
Add a warning that user hooks are lost on serialization.

Add a `user_hooks` exclude to skip the warning with pickle.
2020-12-29 11:54:32 +01:00
Sofie Van Landeghem
fa3b374c8a fix backticks in docs (#6635) 2020-12-27 22:13:34 +01:00
Sofie Van Landeghem
aa50aca519 fix documentation of 'path' in tokenizer.to_disk (#6634) 2020-12-27 22:05:05 +01:00
Tim Gates
c7feeeb660 docs: fix simple typo, speficied -> specified (#6611)
There is a small typo in spacy/cli/info.py.

Should read `specified` rather than `speficied`.
2020-12-22 10:05:56 +01:00
Adriane Boyd
cabd4ae5b1
Use logger.warning instead of logger.warn (#6596)
Use `logger.warning` instead of deprecated `logger.warn`.
2020-12-21 08:25:10 +08:00
Sofie Van Landeghem
282a3b49ea
Fix parser resizing when there is no upper layer (#6460)
* allow resizing of the parser model even when upper=False

* update from spacy.TransitionBasedParser.v1 to v2

* bugfix
2020-12-18 18:56:57 +08:00
Sofie Van Landeghem
0a923a7915
Tagger robustness (#6580)
* require labels in taggers

* ensure tagger works with incomplete data
2020-12-18 18:51:47 +08:00
Adriane Boyd
e10295c9fd
Fix memory leak when adding empty morph (#6581)
Fix lookup of empty morph in the morphology table, which fixes a memory
leak where a new morphology tag was allocated each time the empty morph
tag was added.
2020-12-18 18:51:01 +08:00
Gareth Sparks
2e4e630049 Doc.char_span arg: alignment_mode (#6591)
Currently labeled "mode", actually "alignment_mode"
2020-12-18 09:57:46 +01:00
Ines Montani
fd640afcd8 Add comment on CI strategy [ci skip] 2020-12-17 22:13:05 +11:00
Ines Montani
e9b0963827
Merge pull request #6333 from adrianeboyd/chore/python39 2020-12-17 22:11:57 +11:00
Adriane Boyd
51820180ba Reduce CI builds 2020-12-17 08:55:05 +01:00
Adriane Boyd
2df1ab8a1f Remove detailed numpy constraints from pyproject.toml 2020-12-17 08:54:20 +01:00
Ines Montani
e99cd82367 Update version pins 2020-12-17 10:21:08 +11:00
Ines Montani
47c1ec678b Merge branch 'develop' into pr/6333 2020-12-17 10:19:28 +11:00
Ines Montani
3f90bffa27
Merge pull request #6571 from adrianeboyd/bugfix/debug-data-missing-vectors
Fix alignment and vector checks in debug data
2020-12-17 10:10:47 +11:00
Ines Montani
a778d66e65 Merge branch 'master' into spacy.io 2020-12-16 17:32:53 +11:00
Ines Montani
546af3966a
Merge pull request #6577 from LeapBeyond/bug/root_logger
Prevent root logger from initialising
2020-12-16 16:42:54 +11:00
Thomas Bird
cbb8c66da3 prevent the root logger from initialising 2020-12-15 19:50:34 +00:00
Adriane Boyd
1ddf2f39c7
Switch converters to generator functions (#6547)
* Switch converters to generator functions

To reduce the memory usage when converting large corpora, refactor the
convert methods to be generator functions.

* Update tests
2020-12-15 16:47:16 +08:00
Adriane Boyd
20e18cc246 Fix alignment and vector checks in debug data
* Update token alignment check to use Example alignment
* Update missing vector check further related to changes in v3
2020-12-15 09:43:14 +01:00
Ines Montani
42c20ca2b2 Merge branch 'master' into spacy.io 2020-12-15 19:32:56 +11:00
Adriane Boyd
2668958ae6 Docs and extras updates for v2.3.5
* Update install instructions for updated packages

* Add `cuda110` and `cuda111` extras, remove upper `cupy` pins (only
compatible with `thinc>=7.4.4`)
2020-12-14 08:44:56 +01:00
Matthew Honnibal
8656a08777
Add beam_parser and beam_ner components for v3 (#6369)
* Get basic beam tests working

* Get basic beam tests working

* Compile _beam_utils

* Remove prints

* Test beam density

* Beam parser seems to train

* Draft beam NER

* Upd beam

* Add hypothesis as dev dependency

* Implement missing is-gold-parse method

* Implement early update

* Fix state hashing

* Fix test

* Fix test

* Default to non-beam in parser constructor

* Improve oracle for beam

* Start refactoring beam

* Update test

* Refactor beam

* Update nn

* Refactor beam and weight by cost

* Update ner beam settings

* Update test

* Add __init__.pxd

* Upd test

* Fix test

* Upd test

* Fix test

* Remove ring buffer history from StateC

* WIP change arc-eager transitions

* Add state tests

* Support ternary sent start values

* Fix arc eager

* Fix NER

* Pass oracle cut size for beam

* Fix ner test

* Fix beam

* Improve StateC.clone

* Improve StateClass.borrow

* Work directly with StateC, not StateClass

* Remove print statements

* Fix state copy

* Improve state class

* Refactor parser oracles

* Fix arc eager oracle

* Fix arc eager oracle

* Use a vector to implement the stack

* Refactor state data structure

* Fix alignment of sent start

* Add get_aligned_sent_starts method

* Add test for ae oracle when bad sentence starts

* Fix sentence segment handling

* Avoid Reduce that inserts illegal sentence

* Update preset SBD test

* Fix test

* Remove prints

* Fix sent starts in Example

* Improve python API of StateClass

* Tweak comments and debug output of arc eager

* Upd test

* Fix state test

* Fix state test
2020-12-13 09:08:32 +08:00
Ines Montani
85ca8c2bdd Merge branch 'master' into develop 2020-12-11 13:44:41 +11:00
Ines Montani
7b4a2f4150 Merge pull request #6545 from svlandeg/feature/discussions [ci skip] 2020-12-11 10:24:12 +11:00
Ines Montani
513c4e332a
Include custom code via spacy package command (#6531) 2020-12-10 20:36:46 +08:00
Ines Montani
2a6043fabb
Merge pull request #6530 from explosion/feature/init-config-cpu-gpu 2020-12-10 09:38:46 +11:00
Ines Montani
dfe148935e
Merge pull request #6532 from adrianeboyd/feature/nlp-batch-size-setting 2020-12-10 09:01:58 +11:00
Ines Montani
9d32e839d3 Merge branch 'develop' into feature/init-config-cpu-gpu 2020-12-10 08:50:53 +11:00
Adriane Boyd
972820e2b3 Add batch_size to data formats docs 2020-12-09 12:44:04 +01:00
Adriane Boyd
80ac8af1bf Format 2020-12-09 12:44:01 +01:00
Adriane Boyd
795b5bd049
Update website/docs/api/language.md
Co-authored-by: Ines Montani <ines@ines.io>
2020-12-09 12:23:32 +01:00
Adriane Boyd
6ee6e41234 Update docstring for Language.evaluate 2020-12-09 10:21:39 +01:00
Adriane Boyd
fa8fa474a3 Add nlp.batch_size setting
Add a default `batch_size` setting for `Language.pipe` and
`Language.evaluate` as `nlp.batch_size`.
2020-12-09 09:13:26 +01:00
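A short sketch of the new setting (assumed usage):

```python
import spacy

nlp = spacy.blank("en")
# Pipeline-wide default batch size, used by nlp.pipe and nlp.evaluate
# when no explicit batch_size argument is passed
nlp.batch_size = 64
docs = list(nlp.pipe(["First text.", "Second text.", "Third text."]))
print(len(docs))  # 3
```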
Ines Montani
e09588e6ca Update README.md [ci skip] 2020-12-09 13:10:49 +11:00
Ines Montani
f2571b5ec4
Merge pull request #6444 from adrianeboyd/chore/update-develop-from-master 2020-12-09 13:09:58 +11:00
Ines Montani
d1a0e2f116 Don't build 3.9 for now 2020-12-09 12:10:48 +11:00
Ines Montani
90171f2031
Merge pull request #6528 from svlandeg/feature/pipe_fill_config 2020-12-09 12:01:22 +11:00
Ines Montani
dfaef27f90
Merge pull request #6503 from adrianeboyd/feature/lemmatizer-rule-warning-pos
Warn on empty POS for the rule-based lemmatizer
2020-12-09 11:34:16 +11:00
Ines Montani
271923eaea Fix retokenizer 2020-12-09 11:29:55 +11:00
Ines Montani
b85bd63eca Fix test 2020-12-09 11:24:01 +11:00
Ines Montani
febf71af28 Fix test 2020-12-09 11:23:07 +11:00
Ines Montani
04b3068747 Revert landing [ci skip] 2020-12-09 11:20:45 +11:00
Ines Montani
1da1568110 Remove tag map 2020-12-09 11:13:49 +11:00
Ines Montani
34449b66fd Update matcher.md 2020-12-09 11:09:45 +11:00
Ines Montani
1980203229 Merge branch 'master' into pr/6444 2020-12-09 11:09:40 +11:00
Ines Montani
05a2812ae0 Merge branch 'develop' into pr/6444 2020-12-09 11:04:03 +11:00
Ines Montani
758ad6c3cd Make CPU the default for init config 2020-12-09 11:00:51 +11:00
Ines Montani
5d605d539d Remove output_file from init_config helper 2020-12-09 10:57:55 +11:00
Sofie Van Landeghem
cfc72c2995
Bugfix multi-label textcat reproducibility (#6481)
* add test for multi-label textcat reproducibility

* remove positive_label

* fix lengths dtype

* fix comments

* remove comment that we should not have forgotten :-)
2020-12-09 06:29:15 +08:00
Sofie Van Landeghem
de108ed3e8
Add specific error when StaticVectors can't read the vectors data (#6450) 2020-12-09 06:16:07 +08:00
svlandeg
8f8a7f1733 returning config in init_config 2020-12-08 17:37:20 +01:00
Ines Montani
8921364579
Merge pull request #6521 from explosion/feature/config-stdin
Allow reading config from stdin in spacy train
2020-12-08 22:07:43 +11:00
Ines Montani
6c7a930ee8 Fix variable 2020-12-08 20:44:59 +11:00
Ines Montani
94a5a9814f Update argument handling and documentation 2020-12-08 20:41:18 +11:00
Ines Montani
ef59ce783b Adjust install instructions [ci skip] 2020-12-08 18:06:50 +11:00
Ines Montani
d25b1606d6 Allow reading config from stdin in spacy train 2020-12-08 18:01:40 +11:00
Ines Montani
6cfa66ed1c
Make training.loop return nlp object and path (#6520) 2020-12-08 14:55:55 +08:00
Sofie Van Landeghem
2c27093c5f
require_cpu functionality (#6336)
* add require_cpu from Thinc 8.0.0rc2

* add docs

* fix test if cupy is not installed
2020-12-08 14:42:40 +08:00
Ines Montani
d8e01ca931
Merge pull request #6391 from adrianeboyd/docs/install-guide 2020-12-08 07:42:16 +01:00
Sofie Van Landeghem
f98a04434a
pretrain architectures (#6451)
* define new architectures for the pretraining objective

* add loss function as attr of the model

* cleanup

* cleanup

* shorten name

* fix typo

* remove unused error
2020-12-08 14:41:03 +08:00
Adriane Boyd
29b058ebdc
Fix spacy when retokenizing cases with affixes (#6475)
Preserve `token.spacy` corresponding to the span end token in the
original doc rather than adjusting for the current offset.

* If not modifying in place, this checks in the original document
(`doc.c` rather than `tokens`).
* If modifying in place, the document has not been modified past the
current span start position so the value at the current span end
position is valid.
2020-12-08 14:25:56 +08:00
Adriane Boyd
4448680750
Fix alignment for 1-to-1 tokens and lowercasing (#6476)
* When checking for token alignments, check not only that the tokens are
identical but that the character positions are both at the start of a
token.

  It's possible for the tokens to be identical even though the two
tokens aren't aligned one-to-one in a case like `["a'", "''"]` vs.
`["a", "''", "'"]`, where the middle tokens are identical but should not
be aligned on the token level at character position 2 since it's the
start of one token but the middle of another.

* Use the lowercased version of the token texts to create the
character-to-token alignment because lowercasing can change the string
length (e.g., for `İ`, see the not-a-bug bug report:
https://bugs.python.org/issue34723)
2020-12-08 14:25:16 +08:00
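The `İ` case mentioned above is easy to reproduce; a one-liner in plain Python (no spaCy needed) shows how lowercasing can change the string length:

```python
s = "İ"                        # LATIN CAPITAL LETTER I WITH DOT ABOVE
print(len(s), len(s.lower()))  # 1 2 -- the lowercase form is "i" plus a combining dot above
```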
Ines Montani
ee2ec52f48
Merge pull request #6409 from svlandeg/feature/trf-docs 2020-12-08 06:32:10 +01:00
Ines Montani
c2b196c2c1
Merge pull request #6419 from svlandeg/feature/rel-docs 2020-12-08 06:30:41 +01:00
Ines Montani
82e88f0e3b
Merge pull request #6379 from svlandeg/fix/labels-constructor 2020-12-08 06:29:56 +01:00
Adriane Boyd
d70950605c Warn on empty POS for the rule-based lemmatizer
Add a warning to the rule-based lemmatizer for any tokens without POS
annotation.
2020-12-04 11:46:15 +01:00
Adriane Boyd
78085fab1f
Check for spacy-nightly package in download (#6502)
Also check for spacy-nightly in download so that `--no-deps` isn't set
for normal nightly installs.
2020-12-04 09:40:03 +01:00
Ines Montani
63f83e7034
Merge pull request #6470 from adrianeboyd/feature/license-in-package 2020-12-04 03:55:54 +01:00
Sofie Van Landeghem
d6c616a125
Fixes in test suite (#6457)
* fix slow test for textcat readers

* cleanup test_issue5551

* add explicit score weight

* cleanup
2020-12-02 12:57:08 +01:00
Adriane Boyd
31ec9a906e
Clean up 3rd party license info (#6478)
Move scikit-learn license from `Scorer` to
`licenses/3rd_party_licenses.txt`.
2020-12-02 10:15:23 +01:00
Adriane Boyd
591cd48aa8 Remove config.cfg from MANIFEST 2020-12-01 12:58:02 +01:00
Adriane Boyd
b0dd13e0ba Support LICENSE in spacy package
If present, include the file `input_dir/LICENSE` at the top level of the
packaged model.
2020-11-30 13:43:58 +01:00
Adriane Boyd
1442d2f213
Improve simple training example in v3 migration (#6438)
* Create the examples once
* Use the examples in the initialization
* Provide the batch size
* Fix `begin_training` migration example
2020-11-30 09:39:45 +08:00
Sofie Van Landeghem
079f6ea474
avoid resolving the full config (#6465) 2020-11-30 09:34:29 +08:00
Ines Montani
9beba7164f Make jinja2 top-level import
No problem anymore since it's now an official dependency
2020-11-27 15:17:14 +08:00
Ines Montani
d21d2c2e59 Don't multiply accuracy by 100 2020-11-27 15:15:51 +08:00
Adriane Boyd
26296ab223
Add error message if DocBin zlib decompress fails (#6394)
Add a better error message if DocBin zlib decompress fails, indicating
that the data is not in `DocBin` format.
2020-11-27 14:39:49 +08:00
Adriane Boyd
40c583a41b Remove --prefer-binary and --only-binary from CI 2020-11-25 12:24:11 +01:00
Adriane Boyd
cf693f0eae Fix token_match in tokenizer 2020-11-25 11:49:34 +01:00
Adriane Boyd
724831b066 Merge remote-tracking branch 'upstream/master' into chore/update-develop-from-master
* Update Macedonian for v3
* Update Turkish for v3
2020-11-25 11:49:34 +01:00
svlandeg
035a78c40f Merge branch 'spacy.io' of https://github.com/explosion/spaCy into spacy.io 2020-11-24 16:18:27 +01:00
Jacob Bortell
8790fd571c Add jabortell to the contributors (#6422)
* Add jabortell to the contributors

* Update jabortell.md

Added tick to applicable statement
2020-11-24 16:18:15 +01:00
Jacob Bortell
50e3b624b4
Update rule-based-matching.md (#6421)
* Update rule-based-matching.md

Clarified case-sensitivity of dictionary-referencing attributes (POS/TAG/DEP/etc.).

Clarified "Type" column header to "Value Type"

* Update rule-based-matching.md

Improved clarity of wording
2020-11-24 16:16:06 +01:00
Adriane Boyd
6f133877aa Update source install instructions
* Don't recommend an editable install in the default source
instructions.
* Use `pip install --no-build-isolation` for editable installs.
* Remove reference to `virtualenv`.
2020-11-24 14:44:13 +01:00
Yusuke Mori
ee84f8f4cb Avoid a SyntaxError in self-attentive-parser (#6428)
* Avoid a SyntaxError in self-attentive-parser

Fix a usage of quotation marks in the example of spaCy Universe self-attentive-parser

* Create forest1988.md

Fill in the spaCy contributor agreement
2020-11-22 22:02:03 +01:00
svlandeg
218abaa69a typo 2020-11-20 22:36:49 +01:00
svlandeg
e861e928df more small corrections 2020-11-20 22:29:58 +01:00
svlandeg
5ac0867427 final fixes 2020-11-20 22:18:53 +01:00
svlandeg
331ec83493 edits and updates to implementing REL component docs 2020-11-20 21:41:52 +01:00
svlandeg
4a3e611abc small fixes and formatting 2020-11-20 15:55:05 +01:00
svlandeg
124f49feb6 update REL model code 2020-11-20 15:25:20 +01:00
svlandeg
636be3c791 Merge remote-tracking branch 'upstream/develop' into feature/trf-docs 2020-11-19 14:15:35 +01:00
Sofie Van Landeghem
165993d8e5
fix typo in transformer docs (#6404) 2020-11-19 14:11:38 +01:00
M. Revuelta Espinosa
d940bc3f87 Update universe.json (include PatternOmatic) (#6399)
Request to include PatternOmatic in spaCy Universe

Adds @revuel to contributors
2020-11-19 13:25:48 +01:00
Adriane Boyd
c2eb0992ae Fix JSON in #6395 2020-11-17 15:24:38 +01:00
Sam Edwardes
c3d9550f30 Added spaCyTextBlob to universe.json (#6395) 2020-11-17 14:38:59 +01:00
Adriane Boyd
96726ec1f6
Fix DocBin init in training example (#6396) 2020-11-17 14:36:44 +01:00
Adriane Boyd
6f014efb97 Install dev requirements before running tests 2020-11-16 10:59:50 +01:00
Adriane Boyd
53493b032a Clean installed packages before CI sdist install 2020-11-16 10:46:39 +01:00
Adriane Boyd
fb2c3075fd Remove wheel from setup_requires 2020-11-16 10:34:04 +01:00
Adriane Boyd
ed32fa80cd Update source install instructions
* Use `pip install` instead of `python setup.py install`
* For developers recommend:
  * `python setup.py build_ext --inplace -j N`
  * `python setup.py develop`
2020-11-16 10:13:51 +01:00
svlandeg
99d0412b6e add link to REL project 2020-11-15 18:35:56 +01:00
svlandeg
73fc1ed963 remove labels from morphologizer constructor 2020-11-11 21:48:50 +01:00
svlandeg
d5a920325f remove labels from constructor 2020-11-11 21:34:12 +01:00
svlandeg
fcd79e0655 remove set_morphology from docs 2020-11-11 21:32:34 +01:00
Adriane Boyd
a7e7d6c6c9
Ignore misaligned in Morphologizer.get_loss (#6363)
Fix bug where `Morphologizer.get_loss` treated misaligned annotation as
`EMPTY_MORPH` rather than ignoring it. Remove unneeded default `EMPTY_MORPH`
mappings.
2020-11-10 20:15:09 +08:00
Sofie Van Landeghem
a0c899a0ff
Fix textcat + transformer architecture (#6371)
* add pooling to textcat TransformerListener

* maybe_get_dim in case it's null
2020-11-10 20:14:47 +08:00
Ines Montani
3ca5c7082d Use pip install . in quickstart [ci skip] 2020-11-10 17:27:49 +08:00
Ines Montani
de6453940e
Merge pull request #6305 from svlandeg/feature/score-docs [ci skip] 2020-11-10 02:52:11 +01:00
Alec Chapman
8b919d77c1 add medspacy to universe and fix example w/ cov-bsv 2020-11-10 09:49:39 +08:00
Ines Montani
d7950c5ada
Merge pull request #6297 from adrianeboyd/docs/nightly-conda-install [ci skip] 2020-11-10 02:45:52 +01:00
Ines Montani
448bfbdc30 Remove conda from nightly install widget [ci skip] 2020-11-10 09:44:52 +08:00
svlandeg
789fb3d124 add docs for upstream argument of TransformerListener 2020-11-09 21:42:58 +01:00
Ines Montani
363ac73c72 Update docs [ci skip] 2020-11-09 12:43:26 +08:00
Adriane Boyd
e4c3d6748c Update TIGER link and tag description (#6344) 2020-11-05 09:33:45 +01:00
Sofie Van Landeghem
8ef056cf98
fix embed_size in Entity Linker architecture (#6343) 2020-11-04 22:20:13 +01:00
Ines Montani
019a1dd5e8 Fix v3 overview [ci skip] 2020-11-03 18:10:06 +01:00
Adriane Boyd
b3ca183269 Add python 3.9 classifier 2020-11-03 17:31:09 +01:00
Adriane Boyd
244fcb815d Add python 3.9 to CI, reenable python 3.7 2020-11-03 17:30:09 +01:00
Adriane Boyd
084fc575aa Set version to v3.0.0rc3 2020-11-03 17:29:57 +01:00
Adriane Boyd
1c4df8fd09
Replace pytokenizations with internal alignment (#6293)
* Replace pytokenizations with internal alignment

Replace pytokenizations with internal alignment algorithm that is
restricted to only allow differences in whitespace and capitalization.

* Rename `spacy.training.align` to `spacy.training.alignment` to contain
the `Alignment` dataclass
* Implement `get_alignments` in `spacy.training.align`

* Refactor trailing whitespace handling

* Remove unnecessary exception for empty docs

Allow a non-empty whitespace-only doc to be aligned with an empty doc

* Remove empty docs exceptions completely
2020-11-03 16:24:38 +01:00
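For reference, the internal alignment that replaces pytokenizations is exposed roughly like this (a minimal sketch; the token lists are illustrative and the attribute names follow the released v3 API):

```python
from spacy.training import Alignment

other_tokens = ["i", "listened", "to", "obama", "'", "s", "podcasts", "."]
spacy_tokens = ["i", "listened", "to", "obama", "'s", "podcasts", "."]
# only whitespace and capitalization may differ between the two tokenizations
align = Alignment.from_strings(other_tokens, spacy_tokens)
print(align.x2y.lengths)  # how many spaCy tokens each "other" token maps to
print(align.y2x.lengths)  # the reverse mapping
```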
Adriane Boyd
a4b32b9552
Handle missing reference values in scorer (#6286)
* Handle missing reference values in scorer

Handle missing values in reference doc during scoring where it is
possible to detect an unset state for the attribute. If no reference
docs contain annotation, `None` is returned instead of a score. `spacy
evaluate` displays `-` for missing scores and the missing scores are
saved as `None`/`null` in the metrics.

Attributes without unset states:

* `token.head`: relies on `token.dep` to recognize unset values
* `doc.cats`: unable to handle missing annotation

Additional changes:

* add optional `has_annotation` check to `score_spans` to replace
`doc.sents` hack
* update `score_token_attr_per_feat` to handle missing and empty morph
representations
* fix bug in `Doc.has_annotation` for normalization of `IS_SENT_START`
vs. `SENT_START`

* Fix import

* Update return types
2020-11-03 15:47:18 +01:00
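The unset-state detection described above leans on `Doc.has_annotation`; a minimal sketch of how a missing attribute looks from user code (the blank pipeline and text are illustrative):

```python
import spacy

nlp = spacy.blank("en")
doc = nlp("A short example.")
# no tagger or parser has run, so these attributes count as unset
print(doc.has_annotation("TAG"))
print(doc.has_annotation("DEP"))
```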
Adriane Boyd
5d2cb86c34
Fix on_match callback for DependencyMatcher (#6313)
Fix `DependencyMatcher` so that the callback is called only once per
match.
2020-10-31 12:20:27 +01:00
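For context, the callback fixed above is registered roughly like this (a minimal sketch; the pattern, match key and model name are illustrative and assume an installed English model):

```python
import spacy
from spacy.matcher import DependencyMatcher

nlp = spacy.load("en_core_web_sm")  # assumes this model is installed
matcher = DependencyMatcher(nlp.vocab)
pattern = [
    {"RIGHT_ID": "verb", "RIGHT_ATTRS": {"POS": "VERB"}},
    {"LEFT_ID": "verb", "REL_OP": ">", "RIGHT_ID": "subject",
     "RIGHT_ATTRS": {"DEP": "nsubj"}},
]

def on_match(matcher, doc, i, matches):
    # after the fix, this fires once per match instead of repeatedly
    print("matched:", [doc[t] for t in matches[i][1]])

matcher.add("VERB_SUBJECT", [pattern], on_match=on_match)
matcher(nlp("Smith founded a healthcare company."))
```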
Sofie Van Landeghem
2918923541
fix resolving of dot notation (#6326) 2020-10-31 12:17:06 +01:00
Adriane Boyd
58a7461cff Add Macedonian to website languages 2020-10-29 08:51:26 +01:00
Adriane Boyd
94aa4c7410 Add Nepali to supported languages on website (#6315) 2020-10-29 08:51:15 +01:00
Kunal Sharma
1b8f1f6f1b Adding MindMeld to Universe JSON (#6275)
* Adding Mindmeld to Universe JSON

Mindmeld is a conversational AI platform for deep-domain voice interfaces and chatbots. https://www.mindmeld.com/

* Signing contribution agreement.

Co-authored-by: kunshar2 <kunshar2@cisco.com>
2020-10-29 08:50:59 +01:00
Adriane Boyd
dc816bba9d
Fix node name typo in dependency matcher example (#6311) 2020-10-28 16:32:46 +01:00
Sofie Van Landeghem
ace6ae435b
set pydantic upper pin to 1.7 for now (#6308) 2020-10-26 23:31:08 +01:00
svlandeg
77688b0072 fix config 2020-10-26 11:14:34 +01:00
svlandeg
5878ff6bcd cleanup 2020-10-26 11:13:02 +01:00
svlandeg
e95d9caa87 small edits 2020-10-26 11:09:25 +01:00
svlandeg
a664994a81 adding score method to explanation of new component 2020-10-26 10:52:47 +01:00
svlandeg
080066ae74 remove TODO note 2020-10-26 10:37:25 +01:00
Ines Montani
2c9804038d Fix success message [ci skip] 2020-10-23 16:11:54 +02:00
Adriane Boyd
253480353c Remove zh from quickstart extras 2020-10-23 11:39:25 +02:00
Adriane Boyd
af26886fff Fix formatting 2020-10-23 11:38:14 +02:00
Adriane Boyd
c0b76f4c19 Add install step to "Compile from source" 2020-10-23 11:36:36 +02:00
Adriane Boyd
8fe7ede667 Add install step to source install quickstart 2020-10-23 11:34:43 +02:00
Adriane Boyd
4299a7f654 Setup / install / quickstart updates
* Add `cuda110` to setup.cfg and quickstart dropdown
* Switch to `pip` for pip-only packages in conda quickstart instructions
* Update zh pkuseg install message with version range and conda
* Remove `zh` from `extras_require` because the default doesn't require
additional packages
2020-10-23 11:27:54 +02:00
Ines Montani
270c836bd6
Merge pull request #6276 from adrianeboyd/chore/add-jinja2 2020-10-20 10:05:53 +02:00
Ines Montani
6523f2daac
Merge pull request #6273 from adrianeboyd/bugfix/detailed-scores-in-evaluate2 2020-10-20 10:03:09 +02:00
Adriane Boyd
3629296757 Fix requirements, remove version pins 2020-10-19 19:04:42 +02:00
Adriane Boyd
56077e7e64 Add dependency for jinja2 2020-10-19 18:58:15 +02:00
Adriane Boyd
fbe65b257b Convert accuracy numbers on website models page 2020-10-19 18:55:55 +02:00
Ines Montani
b6b1c1e23c
Merge pull request #6271 from walterhenry/develop-proof [ci skip] 2020-10-19 16:31:43 +02:00
Adriane Boyd
563a21834e Save raw scores in evaluate output 2020-10-19 15:49:09 +02:00
Adriane Boyd
dd207ca6d0 Add dep_las_per_type and more generic PRF printer 2020-10-19 15:49:02 +02:00
Adriane Boyd
4300858ecb Include per-type/feat scores in evaluate output 2020-10-19 15:48:55 +02:00
walterhenry
db24dc5614 Proofread remarks
I think these may be the last remarks for the nightly docs. Only two minor things, actually.
2020-10-19 11:11:32 +02:00
Sofie Van Landeghem
75a202ce65
TextCat updates and fixes (#6263)
* small fix in example imports

* throw error when train_corpus or dev_corpus is not a string

* small fix in custom logger example

* limit macro_auc to labels with 2 annotations

* fix typo

* also create parents of output_dir if need be

* update documentation of textcat scores

* refactor TextCatEnsemble

* fix tests for new AUC definition

* bump to 3.0.0a42

* update docs

* rename to spacy.TextCatEnsemble.v2

* spacy.TextCatEnsemble.v1 in legacy

* cleanup

* small fix

* update to 3.0.0rc2

* fix import that got lost in merge

* cursed IDE

* fix two typos
2020-10-18 14:50:41 +02:00
Ines Montani
e2f3c4e12d Fix robots [ci skip] 2020-10-16 17:44:13 +02:00
Ines Montani
a9d2293661
Merge pull request #6264 from adrianeboyd/docs/license-links [ci skip] 2020-10-16 17:05:11 +02:00
Adriane Boyd
e896803792 Add and update website license links 2020-10-16 17:01:52 +02:00
Ines Montani
c655742b8b Remove docs references to starters for now (see #6262) [ci skip] 2020-10-16 15:46:34 +02:00
Ines Montani
5a6ed01ce0
Merge pull request #6262 from adrianeboyd/bugfix/template-en-vectors 2020-10-16 15:38:08 +02:00
Ines Montani
7904285991
Merge pull request #6259 from jmargeta/fix-empty-list-validation 2020-10-16 15:35:32 +02:00
Ines Montani
9227f9a3ca Update landing [ci skip] 2020-10-16 11:46:55 +02:00
Ines Montani
c968d1560f Fix docs example [ci skip] 2020-10-16 11:33:20 +02:00
Adriane Boyd
c8d04b79e2 Sort and add vectors for langs without transformers 2020-10-16 08:25:16 +02:00
Adriane Boyd
2fbd43c603 Use core lg models as vectors models in quickstart 2020-10-16 08:17:53 +02:00
Jan Margeta
1ad2213349 Fix TokenPatternSchema pattern field validation
Empty pattern field should be considered invalid

This is fixed by replacing minItems with min_items
as described in Pydantic docs:
https://pydantic-docs.helpmanual.io/usage/schema/
2020-10-16 00:41:21 +02:00
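The distinction the fix relies on: pydantic (v1) enforces `min_items` at validation time, while `minItems` only ends up in the generated JSON schema. A minimal sketch (this `TokenPattern` model is illustrative, not spaCy's actual schema):

```python
from typing import Any, Dict, List

from pydantic import BaseModel, Field, ValidationError

class TokenPattern(BaseModel):
    # min_items is enforced by pydantic v1; minItems would only affect the JSON schema
    pattern: List[Dict[str, Any]] = Field(..., min_items=1)

try:
    TokenPattern(pattern=[])
except ValidationError as err:
    print(err)  # "ensure this value has at least 1 items"
```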
Jan Margeta
ed1c37189a Add contributor agreement for jmargeta 2020-10-16 00:38:42 +02:00
Ines Montani
ba1e004049 Fix typo [ci skip] 2020-10-15 23:39:04 +02:00
Ines Montani
32dc4f4796 Sort models sidebar alphabetically [ci skip] 2020-10-15 22:47:16 +02:00
Ines Montani
20f80587d6
Merge pull request #6257 from walterhenry/develop-proof
A few tiny typo fixes to push through with release of nightly
2020-10-15 18:17:30 +02:00
walterhenry
75b7f86383 Three small typos
Some little typos since v3.0 is out.
2020-10-15 18:06:37 +02:00
Ines Montani
09dbbe75d7 Update docs [ci skip] 2020-10-15 17:27:24 +02:00
Ines Montani
ff4267d181 Fix success message [ci skip] 2020-10-15 14:42:08 +02:00
Ines Montani
10611bf56a Increment version [ci skip] 2020-10-15 13:30:11 +02:00
Ines Montani
8eff159603 Merge branch 'spacy.io' of https://github.com/explosion/spaCy into spacy.io 2020-10-15 12:44:36 +02:00
Ines Montani
57eec9cc14 Update .gitignore 2020-10-15 12:44:32 +02:00
Ines Montani
ac77be48f2 Update netlify.toml [ci skip] 2020-10-15 12:44:03 +02:00
Ines Montani
7f05ccc170 Update docs [ci skip] 2020-10-15 12:35:30 +02:00
Ines Montani
4fa869e6f7 Update docs [ci skip] 2020-10-15 11:16:06 +02:00
Ines Montani
4e17ddf75e
Merge pull request #6256 from adrianeboyd/bugfix/docs-to-json-raw 2020-10-15 10:35:01 +02:00
Ines Montani
b1d568a4df Tidy up tests 2020-10-15 10:20:21 +02:00
Ines Montani
d165af26be Auto-format [ci skip] 2020-10-15 10:08:53 +02:00
Ines Montani
db16059f9b
Merge pull request #6255 from explosion/master-tmp 2020-10-15 10:04:07 +02:00
Adriane Boyd
a93d42861d Use null raw for has_unknown_spaces in docs_to_json 2020-10-15 09:57:54 +02:00
Ines Montani
5665a21517 Tidy up 2020-10-15 09:30:32 +02:00
Ines Montani
5d62499266 Fix tests 2020-10-15 09:29:15 +02:00
Ines Montani
178760855f Merge branch 'develop' into master-tmp 2020-10-15 09:06:03 +02:00
Ines Montani
abeafcbc08 Update docs [ci skip] 2020-10-15 08:58:30 +02:00
Ines Montani
bc85b12e6d
Merge pull request #6249 from svlandeg/feature/batch-tests 2020-10-15 08:57:56 +02:00
Ines Montani
050aa1e0e2 Update languages.json [ci skip] 2020-10-14 20:51:50 +02:00
Ines Montani
a966c271f7 Update models docs [ci skip] 2020-10-14 20:50:23 +02:00
Ines Montani
82be1478cb
Merge pull request #6253 from walterhenry/develop [ci skip] 2020-10-14 19:52:08 +02:00
Ines Montani
a2d4aaee70
Apply suggestions from code review 2020-10-14 19:51:36 +02:00
Ines Montani
d94e241fce Merge branch 'develop' into pr/6253 2020-10-14 16:55:46 +02:00
svlandeg
0796401c19 call NumpyOps instead of get_current_ops() 2020-10-14 16:55:00 +02:00
Ines Montani
cb47f25cda
Merge pull request #6252 from svlandeg/fix/docs 2020-10-14 16:43:12 +02:00
walterhenry
6af585dba5 New batch of proofs
Just tiny fixes to the docs as a proofreader
2020-10-14 16:37:57 +02:00
svlandeg
44e14ccae8 one more losses fix 2020-10-14 15:11:34 +02:00
svlandeg
478a14a619 fix few typos 2020-10-14 15:01:19 +02:00
svlandeg
0aa8851878 always return losses 2020-10-14 15:00:49 +02:00
Ines Montani
2e8dcba379 Update version pins 2020-10-14 14:59:09 +02:00
Ines Montani
1aa8e8f2af Update docs [ci skip] 2020-10-14 14:58:45 +02:00
Ines Montani
03e3bab64b Update README.md [ci skip] 2020-10-14 14:58:15 +02:00
svlandeg
e94a21638e adding tests for trained models to ensure predict reproducibility 2020-10-13 21:07:13 +02:00
svlandeg
ede979d42f formatting 2020-10-13 18:53:17 +02:00
svlandeg
ff83bfae3f naming 2020-10-13 18:52:37 +02:00
svlandeg
6ccacff54e add tests for individual spacy layers 2020-10-13 18:50:07 +02:00
svlandeg
c23041ae60 component tests single or multiple prediction 2020-10-13 16:26:53 +02:00
Ines Montani
1f49300862 Update transformer recommendations [ci skip] 2020-10-13 15:41:17 +02:00
Sofie Van Landeghem
f8a1c1afd6
avoid dropout at runtime (#6247) 2020-10-13 14:39:59 +02:00
Ines Montani
86d648740f Fix morph representation in Doc.to_json 2020-10-13 11:39:03 +02:00
Ines Montani
4d99d2b94a Update docs [ci skip] 2020-10-13 11:38:52 +02:00
Ines Montani
a0e12c136b Increment version [ci skip] 2020-10-13 10:00:53 +02:00
Ines Montani
f090f39f17
Merge pull request #6245 from svlandeg/bugfix/else
bugfix in _pipe
2020-10-13 09:59:06 +02:00
Ines Montani
63f39d3ce7
Merge pull request #6242 from svlandeg/feature/nel-docs
update NEL docs after latest refactor
2020-10-13 09:58:51 +02:00
svlandeg
1f465bea18 if-else 2020-10-13 09:27:19 +02:00
svlandeg
40276fd3be update NEL docs after latest refactor 2020-10-12 11:41:27 +02:00
Ines Montani
4fa967ea84 Increment version [ci skip] 2020-10-11 13:10:58 +02:00
Ines Montani
f4e4eeb141
Merge pull request #6239 from explosion/feature/results-table
Make console logger table more compact
2020-10-11 13:10:37 +02:00
Ines Montani
ab890a35f9 Make console logger table more compact 2020-10-11 12:55:46 +02:00
Ines Montani
99606e46fe Relax meta.json schema [ci skip] 2020-10-11 12:30:57 +02:00
Ines Montani
4430b4b809
Merge pull request #6238 from svlandeg/feature/kb-vocab-test 2020-10-11 11:55:03 +02:00
svlandeg
3a505e7e14 small edit to ensure the new word was indeed new 2020-10-10 21:05:28 +02:00
svlandeg
68d79796c6 add test for vocab after serializing KB 2020-10-10 20:59:48 +02:00
Ines Montani
539b0c10da Tidy up and auto-format 2020-10-10 19:14:48 +02:00
Ines Montani
74972744e5 Update Thinc 2020-10-10 19:08:57 +02:00
Ines Montani
bfa3931c9d
Revert added_strings change (#6236) 2020-10-10 18:55:07 +02:00
Ines Montani
796f8b9424 Increment version 2020-10-09 18:00:27 +02:00
Ines Montani
525f798841 Fix typo in test 2020-10-09 18:00:21 +02:00
Ines Montani
8ac5f22253 Adjust error message 2020-10-09 18:00:16 +02:00
Ines Montani
0068bb4433
Merge pull request #6234 from svlandeg/fix/various 2020-10-09 17:45:17 +02:00
svlandeg
08cb085f6c Merge remote-tracking branch 'upstream/develop' into fix/various 2020-10-09 17:01:27 +02:00
Ines Montani
b7cb9d95e4
Merge pull request #6229 from svlandeg/bugfix/disabled 2020-10-09 16:05:11 +02:00
Ines Montani
97ff090e49 Fix docs example [ci skip] 2020-10-09 16:03:57 +02:00
svlandeg
e972ecba72 add utf8 encoding for opening file 2020-10-09 16:03:14 +02:00
Ines Montani
9fb3244672
Merge pull request #6231 from adrianeboyd/feature/include-static-vectors 2020-10-09 15:54:52 +02:00
Ines Montani
c23ce1ae71
Merge pull request #6232 from adrianeboyd/feature/remove-span-recalculate
Remove Span._recalculate_indices
2020-10-09 15:54:13 +02:00
svlandeg
040c7c0541 fix get_dim calls in build_simple_cnn_text_classifier 2020-10-09 15:40:58 +02:00
Adriane Boyd
727370c633 Remove Span._recalculate_indices
Remove `Span._recalculate_indices`, which is a remnant from the
deprecated `Span.merge`.
2020-10-09 14:42:51 +02:00
Adriane Boyd
2dd79454af Update docs 2020-10-09 14:42:07 +02:00
svlandeg
853edace37 fix MultiHashEmbed example in documentation 2020-10-09 14:11:06 +02:00
svlandeg
06b9d213fd formatting 2020-10-09 12:19:47 +02:00
svlandeg
2cafba5f50 shorten error message for clarity 2020-10-09 12:17:35 +02:00
Ines Montani
4771a10503 Make test more explicit [ci skip] 2020-10-09 12:15:26 +02:00
Ines Montani
cc3646b06c Add xfailing test for peculiar spans failure [ci skip] 2020-10-09 12:10:25 +02:00
svlandeg
8316bc7d4a bugfix DisabledPipes 2020-10-09 12:06:20 +02:00
svlandeg
18dfb27985 Add custom error when evaluation throws a KeyError 2020-10-09 12:05:33 +02:00
Ines Montani
e50dc2c1c9 Update docs [ci skip] 2020-10-09 12:04:52 +02:00
Adriane Boyd
39aabf50ab Also rename to include_static_vectors in CharEmbed 2020-10-09 11:54:48 +02:00
Ines Montani
7c52def5da
Merge pull request #6227 from adrianeboyd/chore/update-3.0.0a36-from-master 2020-10-09 10:49:20 +02:00
Ines Montani
329b61ee7b Update docs [ci skip] 2020-10-09 10:36:06 +02:00
Florijan Stamenković
18f5c309dc Fix Issue 6207 (#6208)
* Regression test for issue 6207

* Fix issue 6207

* Sign contributor agreement

* Minor adjustments to test

Co-authored-by: Adriane Boyd <adrianeboyd@gmail.com>
2020-10-09 10:14:40 +02:00
Šarūnas Navickas
287ba94a2f Website (Universe): An entry for rita-dsl (#6138)
* Create zaibacu.md

* Add RITA-DSL entry

* Update agreement

* Fix formatting
2020-10-09 10:14:40 +02:00
delzac
668507be1b Reflect on usage doc that IS_SENT_START attribute exists (#6114)
* Reflect on usage doc that IS_SENT_START attribute exists

* Create delzac.md
2020-10-09 10:14:40 +02:00
Duygu Altinok
80fb1bffc9 Ordinal numbers for Turkish (#6142)
* minor ordinal number addition

* fixed typo

* added corresponding lexical test
2020-10-09 10:13:15 +02:00
Duygu Altinok
2fad279a44 Turkish language syntax iterators (#6191)
* added tr_vocab to config

* basic test

* added syntax iterator to Turkish lang class

* first version for Turkish syntax iter, without flat

* added simple tests with nmod, amod, det

* more tests to amod and nmod

* separated noun chunks and parser test

* rearrangement after nchunk parser separation

* added recursive NPs

* tests with complicated recursive NPs

* tests with conjed NPs

* additional tests for conj NP

* small modification for shaving off conj from NP

* added tests with flat

* more tests with flat

* added examples with flats conjed

* added inner func for flat trick

* corrected parse

Co-authored-by: Adriane Boyd <adrianeboyd@gmail.com>
2020-10-09 10:10:22 +02:00
Matthew Honnibal
67652bcbb5 Upd makefile 2020-10-08 23:18:02 +02:00
Ines Montani
59558b1b80 Update pin [ci skip] 2020-10-08 23:09:14 +02:00
Sofie Van Landeghem
d093d6343b
TrainablePipe (#6213)
* rename Pipe to TrainablePipe

* split functionality between Pipe and TrainablePipe

* remove unnecessary methods from certain components

* cleanup

* hasattr(component, "pipe") should be sufficient again

* remove serialization and vocab/cfg from Pipe

* unify _ensure_examples and validate_examples

* small fixes

* hasattr checks for self.cfg and self.vocab

* make is_resizable and is_trainable properties

* serialize strings.json instead of vocab

* fix KB IO + tests

* fix typos

* more typos

* _added_strings as a set

* few more tests specifically for _added_strings field

* bump to 3.0.0a36
2020-10-08 21:33:49 +02:00
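A quick way to see the split described above from user code (a minimal sketch; assumes an installed English pipeline):

```python
import spacy
from spacy.pipeline import TrainablePipe

nlp = spacy.load("en_core_web_sm")  # assumes this model is installed
for name, component in nlp.pipeline:
    # trainable components (tagger, parser, ner, ...) subclass TrainablePipe;
    # rule-based ones only need to expose a __call__/pipe interface
    print(name, isinstance(component, TrainablePipe))
```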
Ines Montani
5ebd1fc2cf Update docs [ci skip] 2020-10-08 16:23:12 +02:00
Ines Montani
8b4cc29dbd
Merge pull request #6224 from explosion/fix/doc-to-json
Fix morph in Doc.to_json
2020-10-08 15:18:35 +02:00
Ines Montani
8ff73f04db Fix morph in Doc.to_json 2020-10-08 14:44:35 +02:00
Ines Montani
741796e500 Update docs [ci skip] 2020-10-08 14:31:34 +02:00
Ines Montani
d1602e1ece Update docs [ci skip] 2020-10-08 11:56:50 +02:00
Ines Montani
064575d79d
Merge pull request #6216 from svlandeg/feature/nel-initialize 2020-10-08 11:14:12 +02:00
Ines Montani
1e7560f327 Update pin [ci skip] 2020-10-08 11:10:48 +02:00
Ines Montani
43e59bb22a Update docs and install extras [ci skip] 2020-10-08 10:58:50 +02:00
svlandeg
3e2e1fd323 cleanup 2020-10-08 10:37:32 +02:00
Ines Montani
eb28e8ce35
Merge pull request #6222 from explosion/fix/initialize-clear
Clear rule-based components on initialize
2020-10-08 10:37:10 +02:00
svlandeg
eaf5c265cb set_kb method for entity_linker 2020-10-08 10:34:01 +02:00
Ines Montani
010956d493 Clear rule-based components on initialize 2020-10-08 09:51:31 +02:00
Matthew Honnibal
7c94df116e Merge branch 'develop' of https://github.com/explosion/spaCy into develop 2020-10-07 21:02:37 +02:00
Matthew Honnibal
654ce9bae8 Fix makefile 2020-10-07 21:02:22 +02:00
svlandeg
efedccea8d fix tests 2020-10-07 15:29:52 +02:00
svlandeg
6b8bdb2d39 add init_config to nlp.create_pipe 2020-10-07 14:58:16 +02:00
svlandeg
33c2d4af16 move kb_loader to initialize for NEL instead of constructor 2020-10-07 14:56:00 +02:00
Ines Montani
b79a420c20 Adjust version pin [ci skip] 2020-10-07 13:16:56 +02:00
svlandeg
bcaad28eda fix typos 2020-10-07 13:05:37 +02:00
delzac
e4bea595aa Reflect on usage doc that IS_SENT_START attribute exists (#6114)
* Reflect on usage doc that IS_SENT_START attribute exists

* Create delzac.md
2020-10-06 15:11:31 +02:00
Ines Montani
ce14520789 Update docs [ci skip] 2020-10-06 14:35:17 +02:00
Matthew Honnibal
1a500f9717 Set version to v3.0.0a35 2020-10-06 14:19:07 +02:00
Sofie Van Landeghem
fff3f8ccfa
Fix packaging pin (#6212)
* pin packaging to >=20.0

* ignore spacy-pkuseg in requirements unit test
2020-10-06 14:16:05 +02:00
Matthew Honnibal
cfb9770a94
Fix empty input into StaticVectors layer (#6211)
* Add test for empty doc(s)

* Fix empty check in staticvectors

* Remove xfail

* Update spacy/ml/staticvectors.py
2020-10-06 14:15:41 +02:00
Ines Montani
2a17566da3 Update docs [ci skip] 2020-10-06 14:15:08 +02:00
Ines Montani
967377287a
Merge pull request #6210 from adrianeboyd/docs/various-v3-3 [ci skip] 2020-10-06 11:28:45 +02:00
Šarūnas Navickas
5d2fe53547 Website (Universe): An entry for rita-dsl (#6138)
* Create zaibacu.md

* Add RITA-DSL entry

* Update agreement

* Fix formatting
2020-10-06 11:25:05 +02:00
Adriane Boyd
aa9c9f3bf0 Update Chinese usage for spacy-pkuseg 2020-10-06 11:21:17 +02:00
Adriane Boyd
59982d5ef8 Add pip upgrade step to README 2020-10-06 10:40:43 +02:00
Ines Montani
2fd7122074 Update docs [ci skip] 2020-10-06 10:31:48 +02:00
Ines Montani
568e12215d
Merge pull request #6206 from svlandeg/fix/patterns-init 2020-10-06 10:27:23 +02:00
Ines Montani
2e961817cb Update docs [ci skip] 2020-10-06 10:23:01 +02:00
Ines Montani
60f9e8e1d0
Merge pull request #6209 from svlandeg/feature/doc-updates 2020-10-06 10:22:18 +02:00
svlandeg
9b4cf7b0b6 update output of debug config command 2020-10-06 09:47:23 +02:00
svlandeg
fd0f60e2bc updates to data format for training and pretraining 2020-10-06 09:28:53 +02:00
svlandeg
ff9ac39c88 read entity_ruler patterns with srsly.read_jsonl.v1 2020-10-05 22:50:14 +02:00
Ines Montani
126268ce50 Auto-format [ci skip] 2020-10-05 21:58:18 +02:00
Ines Montani
1a554bdcb1 Update docs and docstring [ci skip] 2020-10-05 21:55:27 +02:00
Ines Montani
9614e53b02 Tidy up and auto-format 2020-10-05 21:55:18 +02:00
Ines Montani
181039bd17
Merge pull request #6205 from explosion/feature/embed-features 2020-10-05 21:49:10 +02:00
Ines Montani
5ba418b08c Merge branch 'develop' of https://github.com/explosion/spaCy into develop 2020-10-05 21:44:01 +02:00
Ines Montani
8a39d5414e Update quickstart [ci skip] 2020-10-05 21:43:51 +02:00
Ines Montani
568617af58
Merge pull request #6202 from explosion/feature/project-spacy-version 2020-10-05 21:40:52 +02:00
Ines Montani
2d0c0134bc Adjust message [ci skip] 2020-10-05 21:38:23 +02:00
Ines Montani
4cf73d85bc Add [zh] to extras [ci skip] 2020-10-05 21:37:09 +02:00
Ines Montani
6abfc2911d
Merge pull request #6203 from adrianeboyd/feature/zh-spacy-pkuseg 2020-10-05 21:35:57 +02:00
Matthew Honnibal
b7e01d2024 Fix quickstart 2020-10-05 21:21:30 +02:00
Matthew Honnibal
ff8b980775 Upd quickstart template 2020-10-05 21:19:41 +02:00
Matthew Honnibal
91d0fbb588 Fix test 2020-10-05 21:13:53 +02:00
Ines Montani
9ca283a899 Merge branch 'develop' into feature/project-spacy-version 2020-10-05 21:06:07 +02:00
Ines Montani
9aa07ad001 Update quickstarts [ci skip] 2020-10-05 21:05:41 +02:00
Ines Montani
706b7f6973 Update docs 2020-10-05 20:51:22 +02:00
Ines Montani
0135f6ed95 Enable commit check via env var 2020-10-05 20:51:15 +02:00
Matthew Honnibal
919790cb47 Upd MultiHashEmbed docs 2020-10-05 20:28:21 +02:00
Matthew Honnibal
b392d48e76 Fix test 2020-10-05 20:17:07 +02:00
Ines Montani
be99f1e4de
Remove output dirs before training (#6204)
* Remove output dirs before training

* Re-raise error if cleaning fails
2020-10-05 20:11:16 +02:00
Matthew Honnibal
e50047f1c5 Check lengths match 2020-10-05 20:02:45 +02:00
Ines Montani
582701519e Remove __release__ flag 2020-10-05 20:00:49 +02:00
Ines Montani
d58fb42707 Add spacy_version option and validation for project.yml 2020-10-05 20:00:42 +02:00
Matthew Honnibal
db84d175c3 Fix test 2020-10-05 19:59:30 +02:00
Matthew Honnibal
cdd2b79b6d Remove deprecated MultiHashEmbed 2020-10-05 19:58:18 +02:00
Matthew Honnibal
6dcc4a0ba6 Simplify MultiHashEmbed signature 2020-10-05 19:57:45 +02:00
Adriane Boyd
d2806f11f2 Update to spacy-pkuseg==0.0.26 in Makefile 2020-10-05 18:08:32 +02:00
svlandeg
193e0d5a98 add docs for entity_ruler.initialize 2020-10-05 18:04:08 +02:00
svlandeg
3ac3447eee cleanup 2020-10-05 17:50:37 +02:00
svlandeg
9eb813a35d Merge remote-tracking branch 'upstream/develop' into fix/patterns-init 2020-10-05 17:49:44 +02:00
Adriane Boyd
f102ef6b54 Read features.msgpack instead of features.pkl 2020-10-05 17:47:39 +02:00
svlandeg
4e3ace4b8c is_trainable method 2020-10-05 17:43:42 +02:00
Ines Montani
84fedcebab
Make args keyword-only [ci skip]
Co-authored-by: Matthew Honnibal <honnibal+gh@gmail.com>
2020-10-05 17:07:35 +02:00
Matthew Honnibal
71e73ed0a6 Merge branch 'develop' into feature/embed-features 2020-10-05 17:00:05 +02:00
Matthew Honnibal
3ee3649b52 Fix augment 2020-10-05 16:59:49 +02:00
Matthew Honnibal
22937d25a9 Merge branch 'develop' into feature/embed-features 2020-10-05 16:42:17 +02:00
Matthew Honnibal
8deed614e9 Fix augment 2020-10-05 16:41:45 +02:00
Matthew Honnibal
4ed3e037df Fix augment 2020-10-05 16:40:55 +02:00
Matthew Honnibal
9f1bc3f24c Fix augment 2020-10-05 16:40:23 +02:00
svlandeg
dc06912c76 prevent loss keyerror for non-trainable components 2020-10-05 16:33:28 +02:00
Adriane Boyd
187234648c Revert back to "default" as default for pkuseg_user_dict 2020-10-05 16:24:28 +02:00
svlandeg
65abd77779 add finish_update to Pipe 2020-10-05 16:23:33 +02:00
Matthew Honnibal
90040aacec Fix merge 2020-10-05 16:12:01 +02:00
Matthew Honnibal
93a98e8c3e
Merge branch 'develop' into feature/embed-features 2020-10-05 15:51:31 +02:00
Matthew Honnibal
eb9ba61517 Format 2020-10-05 15:29:49 +02:00
Matthew Honnibal
7d93575f35 spacy/tests/ 2020-10-05 15:28:12 +02:00
Matthew Honnibal
f4ca9a39cb spacy/tests/ 2020-10-05 15:27:06 +02:00
Matthew Honnibal
f2f1deca66 spacy/tests/ 2020-10-05 15:24:33 +02:00
Matthew Honnibal
8ec79ad3fa Allow configuration of MultiHashEmbed features
Update arguments to MultiHashEmbed layer so that the attributes can be
controlled. A kind of tricky scheme is used to allow optional
specification of the rows. I think it's an okay balance between
flexibility and convenience.
2020-10-05 15:22:00 +02:00
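In the released v3 API this surfaces as `attrs`/`rows` arguments on the registered architecture; a minimal sketch of building the layer directly from the registry (the width, attributes and row counts are illustrative, and the same arguments apply to `spacy.MultiHashEmbed.v1` from this era):

```python
import spacy

MultiHashEmbed = spacy.registry.architectures.get("spacy.MultiHashEmbed.v2")
embed = MultiHashEmbed(
    width=96,
    # one hash-embedding table per attribute, with an explicit number of rows each
    attrs=["NORM", "PREFIX", "SUFFIX", "SHAPE"],
    rows=[5000, 2500, 2500, 2500],
    include_static_vectors=False,
)
```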
Ines Montani
7946fd84bb
Merge pull request #6200 from adrianeboyd/bugfix/vocab-disk-lookups-vectors
Always serialize lookups and vectors to disk
2020-10-05 15:15:25 +02:00
Ines Montani
8171e28b20 Remove logging [ci skip]
This would be fired on each example, which is wrong
2020-10-05 15:09:52 +02:00
svlandeg
251b3eb4e5 add initialize method for entity_ruler 2020-10-05 14:59:13 +02:00
Sofie Van Landeghem
f4f49f5877
update blis (#6198)
* allow higher blis version

* fix typo

* bump to 3.0.0a34

* fix pins in other files
2020-10-05 14:58:56 +02:00
Adriane Boyd
5d19dfc9d3 Update Chinese tokenizer for spacy-pkuseg fork 2020-10-05 14:21:53 +02:00
Matthew Honnibal
6a9d14e35a Merge branch 'develop' of https://github.com/explosion/spaCy into develop 2020-10-05 14:17:41 +02:00
Matthew Honnibal
d2b9aafb8c Fix augmenter 2020-10-05 14:14:49 +02:00
Ines Montani
6260fa3c10
Merge pull request #6201 from svlandeg/fix/error_nr 2020-10-05 14:00:57 +02:00
Ines Montani
6958510bda Include spaCy version check in project CLI 2020-10-05 13:53:07 +02:00
Ines Montani
20f2a17a09 Merge test_misc and test_util 2020-10-05 13:45:57 +02:00
svlandeg
fd2d48556c fix E902 and E903 numbering 2020-10-05 13:43:32 +02:00
Ines Montani
e3acad6264 Update docs [ci skip] 2020-10-05 13:06:20 +02:00
Ines Montani
0f64556c04
Merge pull request #6197 from svlandeg/feature/pipe-docs [ci skip] 2020-10-05 11:55:40 +02:00
Ines Montani
1c641e41c3 Remove unused import [ci skip] 2020-10-05 11:50:11 +02:00
Ines Montani
2acbec2d2b
Merge pull request #6199 from adrianeboyd/bugfix/ru-uk-lemmatizer-init
Update ru/uk lemmatizers for new nlp.initialize
2020-10-05 11:48:47 +02:00
Adriane Boyd
03cfb2d2f4 Always serialize lookups and vectors to disk 2020-10-05 09:40:20 +02:00
Adriane Boyd
b0b93854cb Update ru/uk lemmatizers for new nlp.initialize 2020-10-05 09:27:16 +02:00
svlandeg
9a6c9b133b various small fixes 2020-10-05 01:05:37 +02:00
svlandeg
52b660e9dc initialize and update explanation 2020-10-05 00:39:36 +02:00
Ines Montani
549758f67d Adjust test for now 2020-10-04 23:16:09 +02:00
Ines Montani
4b15ff7504 Increment version [ci skip] 2020-10-04 22:47:04 +02:00
Ines Montani
f1d1f78636 Make warning debug log [ci skip] 2020-10-04 22:44:21 +02:00
Ines Montani
3c36a57e84
Update data augmenters (#6196)
* Draft lower-case augmenter

* Make warning a debug log

* Update lowercase augmenter, docs and tests

Co-authored-by: Matthew Honnibal <honnibal+gh@gmail.com>
2020-10-04 17:46:29 +02:00
Ines Montani
d38dc466c5 Adjust error [ci skip] 2020-10-04 15:26:01 +02:00
Ines Montani
496228771d
Merge pull request #6194 from explosion/master-tmp 2020-10-04 15:25:41 +02:00
Ines Montani
0307a228c8
Merge pull request #6193 from explosion/fix/adjust-pipe-init
Adjust [initialize.components] on Language.remove_pipe and Language.rename_pipe
2020-10-04 15:20:54 +02:00
svlandeg
b0463fbf75 set_annotations explanation 2020-10-04 14:56:48 +02:00
Ines Montani
59deeb7da6 Merge branch 'develop' into master-tmp 2020-10-04 14:52:20 +02:00
Ines Montani
43d7652635
Merge pull request #6192 from explosion/feature/init-attr-ruler 2020-10-04 14:46:37 +02:00
Ines Montani
8f018e47f8 Adjust [initialize.components] on Language.remove_pipe and Language.rename_pipe 2020-10-04 14:43:45 +02:00
Matthew Honnibal
1780a6ea49 Merge branch 'develop' of https://github.com/explosion/spaCy into develop 2020-10-04 14:17:04 +02:00
Matthew Honnibal
84ae197dd6 Fix logger 2020-10-04 14:16:53 +02:00
Ines Montani
9b3a934361 Update docs [ci skip] 2020-10-04 14:14:55 +02:00
svlandeg
9f40d963fd highlight the two steps: the model and the pipeline component 2020-10-04 14:11:53 +02:00
Ines Montani
11347f34da Tidy up, tests and docs 2020-10-04 13:54:05 +02:00
svlandeg
452b8309f9 slight rewrite to hide some thinc implementation details 2020-10-04 13:26:46 +02:00
Matthew Honnibal
96b636c2d3 Update attribute ruler 2020-10-04 13:08:21 +02:00
Ines Montani
bcd52e5486 Tidy up errors and warnings 2020-10-04 11:16:31 +02:00
Ines Montani
ff914f4e6f Lazy-load xx 2020-10-04 11:10:26 +02:00
Ines Montani
d3b3663942 Adjust error message and add test 2020-10-04 10:11:27 +02:00
Ines Montani
2110e8f86d Auto-format 2020-10-04 10:06:49 +02:00
Ines Montani
cc08c88a89
Merge pull request #6187 from svlandeg/fix/begin_training_pipe 2020-10-04 10:01:02 +02:00
svlandeg
08ad349a18 tok2vec layer 2020-10-04 00:08:02 +02:00
svlandeg
2c4b2ee5e9 REL intro and get_candidates function 2020-10-03 23:27:05 +02:00
svlandeg
3f657ed3a1 implement warning in __init_subclass__ instead 2020-10-03 22:34:10 +02:00
Matthew Honnibal
3b2a78720c Upd morphologizer 2020-10-03 19:35:19 +02:00
Matthew Honnibal
835070cedc Upd test 2020-10-03 19:35:10 +02:00
Matthew Honnibal
70b9de8e58 Set version to v3.0.0a32 2020-10-03 19:26:52 +02:00
Matthew Honnibal
85ede32680 Format 2020-10-03 19:26:23 +02:00
Matthew Honnibal
b305f2ff5a Fix loggers 2020-10-03 19:26:10 +02:00
Matthew Honnibal
4fccd2ceaf Merge branch 'develop' of https://github.com/explosion/spaCy into develop 2020-10-03 19:13:55 +02:00
Matthew Honnibal
8ea8b7d940 Support loading labels in morphologizer 2020-10-03 19:13:42 +02:00
Ines Montani
c2401fca41 Add tests for Pipe.label_data 2020-10-03 19:12:46 +02:00
Ines Montani
80603f0fa5 Make SentenceRecognizer.label_data return None
Overwrite the method from the base class (Tagger) but don't export anything in "init labels"
2020-10-03 18:54:09 +02:00
Ines Montani
989c59918c Update docs [ci skip] 2020-10-03 18:53:39 +02:00
Ines Montani
d6c967401f Increment version 2020-10-03 17:20:47 +02:00
Ines Montani
3bc3c05fcc Tidy up and auto-format 2020-10-03 17:20:18 +02:00
Ines Montani
7c4ab7e82c Fix Lemmatizer.get_lookups_config 2020-10-03 17:16:10 +02:00
Ines Montani
dd542ec6a4
Fix label initialization of textcat component (#6190) 2020-10-03 17:07:38 +02:00
Ines Montani
989a96308f Tidy up, auto-format, types 2020-10-03 16:31:58 +02:00
Ines Montani
3b8f352eda Merge branch 'develop' of https://github.com/explosion/spaCy into develop 2020-10-03 16:08:27 +02:00
Ines Montani
35d695a031 Update docs 2020-10-03 16:08:24 +02:00
Matthew Honnibal
7b127f307e Set version to v3.0.0a30 2020-10-03 16:06:42 +02:00
Matthew Honnibal
db419f6b2f
Improve control of training progress and logging (#6184)
* Make logging and progress easier to control

* Update docs

* Cleanup errors

* Fix ConfigValidationError

* Pass stdout/stderr, not wasabi.Printer

* Fix type

* Upd logging example

* Fix logger example

* Fix type
2020-10-03 14:57:46 +02:00
Ines Montani
5fb776556a Update docs [ci skip] 2020-10-03 14:47:02 +02:00
Ines Montani
5413358ba1
Merge pull request #6188 from svlandeg/feature/small-fixes 2020-10-03 11:44:24 +02:00
Ines Montani
ae15c9de79 Raise error from caught KeyError to preserve traceback 2020-10-03 11:43:56 +02:00
Ines Montani
f758804401 Save one line of code 2020-10-03 11:41:28 +02:00
Ines Montani
eb9b3ff9c5 Update install docs and quickstarts [ci skip] 2020-10-03 11:35:42 +02:00
Ines Montani
52e4586ec1 Add transformers to extras_require [ci skip] 2020-10-03 11:13:00 +02:00
svlandeg
02247cccaf Merge remote-tracking branch 'upstream/develop' into feature/small-fixes 2020-10-02 20:48:11 +02:00
svlandeg
fb48de349c bwd compat for pipe.begin_training 2020-10-02 20:31:14 +02:00
Matthew Honnibal
6965cdf16d Fix comment 2020-10-02 17:26:21 +02:00
Ines Montani
3cf10a0729
Merge pull request #6183 from adrianeboyd/feature/quickstart-morphologizer
Add morphologizer to quickstart template
2020-10-02 17:08:01 +02:00
Adriane Boyd
62ccd5c4df
Relax model meta performance schema (#6185)
Allow more embedded per_x in `ModelMetaSchema`
2020-10-02 16:37:21 +02:00
Sofie Van Landeghem
09dcb75076
small UX fix for DocBin (#6167)
* add informative warning when messing up store_user_data DocBin flags

* add informative warning when messing up store_user_data DocBin flags

* cleanup test

* rename to patterns_path
2020-10-02 15:43:32 +02:00
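The flag the warning is about has to match on the writing and the reading side; a minimal sketch (the text and `user_data` key are illustrative):

```python
import spacy
from spacy.tokens import DocBin

nlp = spacy.blank("en")
doc = nlp("hello world")
doc.user_data["source"] = "example"

# store_user_data must be enabled when serializing *and* when deserializing,
# otherwise the user_data is silently dropped
data = DocBin(store_user_data=True, docs=[doc]).to_bytes()
restored = list(DocBin(store_user_data=True).from_bytes(data).get_docs(nlp.vocab))
print(restored[0].user_data)
```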
Ines Montani
f0b30aedad
Make lemmatizers use initialize logic (#6182)
* Make lemmatizer use initialize logic and tidy up

* Fix typo

* Raise for uninitialized tables
2020-10-02 15:42:36 +02:00
Adriane Boyd
22158dc24a Add morphologizer to quickstart template 2020-10-02 15:06:16 +02:00
Ines Montani
df06f7a792 Update docs [ci skip] 2020-10-02 13:24:33 +02:00
Ines Montani
d2aa662ab2
Merge pull request #6179 from adrianeboyd/feature/token-morph-refactor-2 [ci skip] 2020-10-02 12:10:27 +02:00
Ines Montani
0f11c2150d Merge branch 'develop' of https://github.com/explosion/spaCy into develop 2020-10-02 11:38:05 +02:00
Ines Montani
32cdc1c4f4 Update docs [ci skip] 2020-10-02 11:38:03 +02:00
Ines Montani
c41a4332e4 Add test for custom data augmentation 2020-10-02 11:37:56 +02:00
Ines Montani
6d8df081bd
Merge pull request #6180 from adrianeboyd/docs/minor-v3-2 [ci skip] 2020-10-02 11:37:25 +02:00
svlandeg
acc391c2a8 remove redundant str() call 2020-10-02 11:05:59 +02:00
Ines Montani
3856048437
Merge pull request #6178 from explosion/feature/file-readers
Integrate file readers via srsly, update orth_variants loading
2020-10-02 10:26:09 +02:00
Adriane Boyd
f83dfe62da Fix test 2020-10-02 10:17:26 +02:00
Adriane Boyd
351f352cdc Update Japanese docs and pin for sudachipy 2020-10-02 10:12:44 +02:00
Adriane Boyd
7670df04dd Update Chinese usage docs 2020-10-02 10:09:03 +02:00
Adriane Boyd
3908fff899 Remove tag map sidebar 2020-10-02 09:07:55 +02:00
Adriane Boyd
fd09e6b140 Update docs for Token.morph / Token.set_morph 2020-10-02 09:05:15 +02:00
Adriane Boyd
65dfaa4f4b Also accept MorphAnalysis in set_morph 2020-10-02 08:33:43 +02:00
Adriane Boyd
77e08c398f Switch reset value for set_morph to None 2020-10-02 08:25:15 +02:00
Ines Montani
568768643e Increment version [ci skip] 2020-10-02 01:50:13 +02:00
Ines Montani
01c1538c72 Integrate file readers 2020-10-02 01:36:06 +02:00
Ines Montani
af282ae732 Fix import 2020-10-02 01:12:34 +02:00
Ines Montani
e59ecb12c0 Auto-format 2020-10-02 01:12:30 +02:00
Ines Montani
6b94cee468 Fix docs [ci skip] 2020-10-02 01:11:19 +02:00
Matthew Honnibal
75a1569908 Merge 2020-10-01 23:07:53 +02:00
Matthew Honnibal
300e5a9928
Avoid relying on NORM in default v3 models (#6176)
* Allow CharacterEmbed to specify feature

* Default to LOWER in character embed

* Update tok2vec

* Use LOWER, not NORM
2020-10-01 23:05:55 +02:00
Ines Montani
50162b8726 Try to work around Sharp build issue [ci skip] 2020-10-01 22:27:45 +02:00
Ines Montani
5762876dcc Update default config [ci skip] 2020-10-01 22:27:37 +02:00
Adriane Boyd
86c3ec9c2b
Refactor Token morph setting (#6175)
* Refactor Token morph setting

* Remove `Token.morph_`
* Add `Token.set_morph()`
  * `0` resets `token.c.morph` to unset
  * Any other values are passed to `Morphology.add`

* Add token.morph setter to set from MorphAnalysis
2020-10-01 22:21:46 +02:00
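In user code the refactor above looks roughly like this (a minimal sketch; note that a later commit in this range switches the reset value from `0` to `None`):

```python
import spacy

nlp = spacy.blank("en")
doc = nlp("They run fast")
# set_morph accepts a FEATS string (or a MorphAnalysis); there is no Token.morph_ anymore
doc[1].set_morph("Number=Plur|Person=3")
print(doc[1].morph)     # Number=Plur|Person=3
doc[1].set_morph(None)  # reset to unset
```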
Matthew Honnibal
b854bca15c Default to LOWER in character embed 2020-10-01 22:17:58 +02:00
Matthew Honnibal
684a77870b Allow CharacterEmbed to specify feature 2020-10-01 22:17:26 +02:00
Ines Montani
da30701cd1 Increment version [ci skip] 2020-10-01 21:58:11 +02:00
Ines Montani
d48ddd6c9a Remove default initialize lookups 2020-10-01 21:54:33 +02:00
Ines Montani
1700c8541e Increment version [ci skip] 2020-10-01 17:57:16 +02:00
Ines Montani
b6b73a3ca8 Update docs [ci skip] 2020-10-01 17:45:29 +02:00
Ines Montani
f2627157c8 Update docs [ci skip] 2020-10-01 17:38:17 +02:00
svlandeg
1328c9fd14 consistently use --code instead of --code-path 2020-10-01 16:59:22 +02:00
Ines Montani
7f68f4bd92 Hide jsonl_loc on init vectors and tidy up [ci skip] 2020-10-01 16:44:17 +02:00
Adriane Boyd
27cbffff1b
Minor edit to CoNLL-U converter (#6172)
This doesn't make a difference given how the `merged_morph` values
override the `morph` values for all the final docs, but could have led
to unexpected bugs in the future if the converter is modified.
2020-10-01 16:23:42 +02:00
Sofie Van Landeghem
a22215f427
Add FeatureExtractor from Thinc (#6170)
* move featureextractor from Thinc

* Update website/docs/api/architectures.md

Co-authored-by: Ines Montani <ines@ines.io>

* Update website/docs/api/architectures.md

Co-authored-by: Ines Montani <ines@ines.io>

Co-authored-by: Ines Montani <ines@ines.io>
2020-10-01 16:22:48 +02:00
Adriane Boyd
73538782a0
Switch Doc.__init__(ents=) to IOB tags (#6173)
* Switch Doc.__init__(ents=) to IOB tags

* Fix check for "-"

* Allow "" or None as missing IOB tag
2020-10-01 16:22:18 +02:00
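After this change, entities passed to `Doc.__init__` are per-token IOB tags rather than offsets; a minimal sketch (the words, spaces and ORG label are illustrative):

```python
import spacy
from spacy.tokens import Doc

nlp = spacy.blank("en")
words = ["Apple", "is", "a", "company", "."]
spaces = [True, True, True, False, False]
# one IOB tag per token; "O" = outside, "" or None = missing/unknown
ents = ["B-ORG", "O", "O", "O", "O"]
doc = Doc(nlp.vocab, words=words, spaces=spaces, ents=ents)
print([(ent.text, ent.label_) for ent in doc.ents])  # [('Apple', 'ORG')]
```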
Adriane Boyd
df98d3ef9f
Update import from collections.abc (#6174) 2020-10-01 16:21:49 +02:00
Ines Montani
0a8a124a6e Update docs [ci skip] 2020-10-01 12:15:53 +02:00
Ines Montani
44160cd52f Tidy up [ci skip] 2020-10-01 10:41:19 +02:00
Ines Montani
381258b75b
Merge pull request #6165 from explosion/feature/update-tokenizers-initialize 2020-10-01 09:49:47 +02:00
svlandeg
6787e56315 print debugging warning before raising error if model not properly initialized 2020-10-01 09:21:00 +02:00
svlandeg
5121972930 add types of Tok2Vec embedding layers 2020-10-01 09:20:09 +02:00
Ines Montani
4b6afd3611 Remove English [initialize] default block for now to get tests to pass 2020-09-30 23:49:29 +02:00
Ines Montani
6f29f68f69 Update errors and make Tokenizer.initialize args less strict 2020-09-30 23:48:47 +02:00
Ines Montani
a103ab5f1a Update augmenter lookups and docs 2020-09-30 23:03:47 +02:00
Matthew Honnibal
5128298964 Add missing augmenter 2020-09-30 20:18:45 +02:00
Matthew Honnibal
59294e91aa Restore the 'jsonl' arg for init vectors
The lexemes.jsonl file is still used in our English vectors, and it may
be required by users as well. I think it's worth supporting the option.
2020-09-30 19:06:50 +02:00
Matthew Honnibal
c379a4274a Merge branch 'develop' of https://github.com/explosion/spaCy into develop 2020-09-30 16:52:42 +02:00
Matthew Honnibal
e58dca3028 Add read_labels 2020-09-30 16:52:27 +02:00
Ines Montani
115481aca7 Update docs [ci skip] 2020-09-30 15:16:00 +02:00
Ines Montani
23c63eefaf Tidy up env vars [ci skip] 2020-09-30 15:15:11 +02:00
Adriane Boyd
6b7bb32834 Refactor Chinese initialization 2020-09-30 11:46:45 +02:00
walterhenry
1c65b3b2c0 Proofreading
A few more small things in Usage.
2020-09-30 11:33:40 +02:00
Ines Montani
469f0e539c Fix docs [ci skip] 2020-09-30 10:24:06 +02:00
Ines Montani
34f9c26c62 Add lexeme norm defaults 2020-09-30 10:20:14 +02:00
Ines Montani
a5debb356d Tidy up and adjust logging [ci skip] 2020-09-30 01:22:08 +02:00
Ines Montani
56a2f778c4 Add logging [ci skip] 2020-09-30 01:08:55 +02:00
Ines Montani
95b2a448cf Update lookups data pin [ci skip] 2020-09-30 00:24:42 +02:00
Ines Montani
fe3f111c37
Merge pull request #6168 from explosion/fix/default-corpus-values 2020-09-30 00:24:02 +02:00
Ines Montani
b799af16de Don't raise in Pipe.initialize if not implemented 2020-09-30 00:05:27 +02:00
Ines Montani
7d04ba20c0 Update Thinc 2020-09-30 00:05:17 +02:00
Matthew Honnibal
bc61691f6f Merge branch 'develop' of https://github.com/explosion/spaCy into develop 2020-09-29 23:41:04 +02:00
Matthew Honnibal
f52249fe2e Fix data augmentation 2020-09-29 23:40:54 +02:00
Matthew Honnibal
14c4da547f Try to fix augmentation 2020-09-29 23:08:56 +02:00
Ines Montani
ae51843468 Remove augmenter from jinja template [ci skip] 2020-09-29 23:08:50 +02:00
Ines Montani
9bb958fd0a Fix debug data [ci skip] 2020-09-29 23:07:11 +02:00
Matthew Honnibal
a2aa1f6882 Disable the OVL augmentation by default 2020-09-29 23:02:40 +02:00
Ines Montani
df8dd91b6f Merge branch 'develop' into fix/default-corpus-values 2020-09-29 22:55:39 +02:00
Ines Montani
0a1ee109db Remove init from path 2020-09-29 22:53:18 +02:00
Ines Montani
ad6d40d028 Add logging 2020-09-29 22:53:14 +02:00
Ines Montani
c334a7d45f Remove 2020-09-29 22:38:39 +02:00
Ines Montani
1aeef3bfbb Make corpus paths default to None and improve errors 2020-09-29 22:33:46 +02:00
Ines Montani
0250bcf6a3 Show validation error during init 2020-09-29 22:29:09 +02:00
Ines Montani
da30bae8a6 Use __pyx_vtable__ instead of __reduce_cython__ 2020-09-29 22:04:17 +02:00
Ines Montani
43c92ec8c9 Resolve dir for better output [ci skip] 2020-09-29 22:01:04 +02:00
Ines Montani
fa47f87924 Tidy up and auto-format 2020-09-29 21:39:28 +02:00
Ines Montani
604be54a5c Support --code in evaluate CLI [ci skip] 2020-09-29 21:20:56 +02:00
Ines Montani
6467a560e3 WIP: Test updating Chinese tokenizer 2020-09-29 21:10:22 +02:00
Ines Montani
4f3102d09c Auto-format 2020-09-29 21:09:10 +02:00
Ines Montani
798040bc1d Fix language detection 2020-09-29 21:08:13 +02:00
Ines Montani
78021089f9
Merge pull request #6160 from explosion/feature/prepare 2020-09-29 20:55:13 +02:00
Ines Montani
c3f8c09d7d
Merge pull request #6154 from adrianeboyd/bugfix/chinese-tokenizer-pickle 2020-09-29 20:54:59 +02:00
Ines Montani
d3c63b7965 Merge branch 'develop' into feature/prepare 2020-09-29 20:53:05 +02:00
Ines Montani
361f91e286
Merge pull request #6135 from walterhenry/develop-proof 2020-09-29 20:49:06 +02:00
Ines Montani
b486389eec
Update website/docs/api/doc.md 2020-09-29 20:48:43 +02:00
Ines Montani
2be80379ec Fix small issues, resolve_dot_names and debug model 2020-09-29 20:38:35 +02:00
Matthew Honnibal
a4da3120b4 Fix multitasks 2020-09-29 18:33:16 +02:00
Matthew Honnibal
0b5c72fce2 Fix incorrect docstrings 2020-09-29 18:30:38 +02:00
Ines Montani
7851020653 Update tests 2020-09-29 18:14:15 +02:00
Ines Montani
71a0ee274a Move init labels to init pipeline module 2020-09-29 18:09:33 +02:00
Ines Montani
dba26186ef Handle None default args in Cython methods 2020-09-29 18:08:02 +02:00
Ines Montani
9353a82076 Auto-format 2020-09-29 18:07:48 +02:00
Ines Montani
534e1ef498 Fix template 2020-09-29 17:02:55 +02:00
Ines Montani
f2352eb701 Test with default value 2020-09-29 17:00:40 +02:00
Ines Montani
1c60f0b5e9 Merge branch 'feature/prepare' of https://github.com/explosion/spaCy into feature/prepare 2020-09-29 16:59:35 +02:00
Ines Montani
d7469283c5 Update docs [ci skip] 2020-09-29 16:59:21 +02:00
Matthew Honnibal
8ce9f44433 Merge branch 'feature/prepare' of https://github.com/explosion/spaCy into feature/prepare 2020-09-29 16:57:38 +02:00
Matthew Honnibal
e4f535a964 Fix Pipe.labels 2020-09-29 16:55:07 +02:00
Matthew Honnibal
4ad26f4a2f Move reader 2020-09-29 16:54:53 +02:00
Ines Montani
30c76dbd67 Merge branch 'feature/prepare' of https://github.com/explosion/spaCy into feature/prepare 2020-09-29 16:53:48 +02:00
Matthew Honnibal
43fc7a316d Add registry function for reading jsonl 2020-09-29 16:49:09 +02:00
Matthew Honnibal
1fd002180e Allow more components to use labels 2020-09-29 16:48:56 +02:00
Matthew Honnibal
99bff78617 Use labels in tagger 2020-09-29 16:48:44 +02:00
Matthew Honnibal
ca72608059 Fix language 2020-09-29 16:48:33 +02:00
Matthew Honnibal
10847c7f4e Fix arg 2020-09-29 16:48:07 +02:00
Ines Montani
fd594cfb9b Tighten up format 2020-09-29 16:47:55 +02:00
Matthew Honnibal
e70a00fa76 Remove unnecessary warning from train 2020-09-29 16:47:54 +02:00
Matthew Honnibal
3f0d61232d Remove outdated arg from train 2020-09-29 16:47:44 +02:00
Matthew Honnibal
e957d66b92 Merge branch 'feature/prepare' of https://github.com/explosion/spaCy into feature/prepare 2020-09-29 16:22:53 +02:00
Ines Montani
978ab54a84 Fix logging 2020-09-29 16:22:41 +02:00
Matthew Honnibal
45daf5c9fe Add init labels command 2020-09-29 16:22:37 +02:00
Matthew Honnibal
58c8d4b414 Add label_data property to pipeline 2020-09-29 16:22:13 +02:00
Ines Montani
aa2a6882d0 Fix logging 2020-09-29 16:08:39 +02:00
Ines Montani
63d1598137 Simplify config use in Language.initialize 2020-09-29 16:05:48 +02:00
Ines Montani
56f8bc73ef Add more tests 2020-09-29 15:23:34 +02:00
Sofie Van Landeghem
6a04e5adea
encoding UTF8 (#6161) 2020-09-29 14:49:55 +02:00
Ines Montani
591038b1a4 Add test 2020-09-29 12:54:52 +02:00
walterhenry
1d80b3dc1b Proofreading
Finished with the API docs and started on the Usage, but Embedding & Transformers
2020-09-29 12:39:10 +02:00
Ines Montani
adca08a12f Pass nlp forward 2020-09-29 12:21:52 +02:00
Ines Montani
f171903139 Clean up sgd and pipeline -> nlp 2020-09-29 12:20:26 +02:00
Ines Montani
612bbf85ab Update initialize.py 2020-09-29 12:14:47 +02:00
Ines Montani
42f0e4c946 Clean up 2020-09-29 12:14:08 +02:00
Matthew Honnibal
9c8b2524fe Upd initialize args 2020-09-29 12:08:37 +02:00
Matthew Honnibal
e1fdf2b7c5 Upd tests 2020-09-29 12:05:38 +02:00
Ines Montani
50410c17ac Update schemas.py 2020-09-29 12:05:38 +02:00
Matthew Honnibal
f2d1b7feb5 Clean up sgd 2020-09-29 12:00:08 +02:00
Ines Montani
78396d137f Integrate initialize settings 2020-09-29 11:57:08 +02:00
Ines Montani
dec984a9c1 Update Language.initialize and support components/tokenizer settings 2020-09-29 11:52:45 +02:00
walterhenry
c1c841940c Merge branch 'develop-proof' of https://github.com/walterhenry/spaCy into develop-proof 2020-09-29 11:47:43 +02:00
Matthew Honnibal
b3b6868639 Remove 'sgd' arg from component initialize 2020-09-29 11:42:35 +02:00
Matthew Honnibal
5276db6f3f Remove 'device' argument from Language, clean up 'sgd' arg 2020-09-29 11:42:19 +02:00
Ines Montani
4925ad760a Add init vectors 2020-09-29 10:58:50 +02:00
svlandeg
64d90039a1 encoding UTF8 2020-09-29 10:54:42 +02:00
Ines Montani
ff9a63bfbd begin_training -> initialize 2020-09-28 21:35:09 +02:00
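The rename above changes the user-facing training API roughly as follows (a minimal sketch, assuming a blank English pipeline with a tagger):

```python
import spacy

nlp = spacy.blank("en")
tagger = nlp.add_pipe("tagger")
tagger.add_label("NOUN")
# formerly nlp.begin_training(); initializes the components and returns an optimizer
optimizer = nlp.initialize()
```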
Ines Montani
046f655d86 Fix error 2020-09-28 21:17:45 +02:00
Ines Montani
a139fe672b Fix typos and refactor CLI logging 2020-09-28 21:17:10 +02:00
Ines Montani
3fa30a7f2d
Merge pull request #6159 from svlandeg/fix/pydantic-pin
upgrade pydantic pin
2020-09-28 18:18:02 +02:00
walterhenry
3360825e00 Proofreading
Another round of proofreading. All the API docs have been read through and I've grazed the Usage docs.
2020-09-28 16:50:15 +02:00
svlandeg
cd21eb2485 upgrade pydantic pin for thinc's field.default_factory 2020-09-28 16:45:48 +02:00
Ines Montani
2e9c9e74af Fix config resolution and interpolation
TODO: auto-interpolate in Thinc if config is dict (i.e. likely subsection)
2020-09-28 15:34:00 +02:00
Ines Montani
02838a1d47 Fix resolve_dot_names 2020-09-28 15:27:10 +02:00
Ines Montani
822ea4ef61 Refactor CLI 2020-09-28 15:09:59 +02:00
Ines Montani
a89e0ff7cb Fix typo 2020-09-28 12:55:21 +02:00
Ines Montani
a62337b3f3 Tidy up vocab init 2020-09-28 12:53:06 +02:00
Ines Montani
c22ecc66bb Don't support init path for now 2020-09-28 12:46:28 +02:00
Ines Montani
f49288ab81 Update default_config_pretraining.cfg 2020-09-28 12:31:54 +02:00
Ines Montani
a5f2cc0509 Tidy up and remove raw text (rehearsal) for now 2020-09-28 12:30:13 +02:00
Ines Montani
1590de11b1 Update config 2020-09-28 12:05:23 +02:00
Matthew Honnibal
9f6ad06452 Upd default config 2020-09-28 12:00:23 +02:00
Ines Montani
e44a7519cd Update CLI and add [initialize] block 2020-09-28 11:56:14 +02:00
Ines Montani
d5155376fd Update vocab init 2020-09-28 11:30:18 +02:00
Ines Montani
8b74fd19df init pipeline -> init nlp 2020-09-28 11:13:38 +02:00
Ines Montani
2fdb7285a0 Update CLI 2020-09-28 11:06:07 +02:00
Ines Montani
553bfea641 Fix commands 2020-09-28 10:53:17 +02:00
Adriane Boyd
09d42d4bf0 Add pickle5 to Makefile 2020-09-28 09:49:59 +02:00
Matthew Honnibal
44bad1474c Add init_pipeline file 2020-09-28 09:47:34 +02:00
Matthew Honnibal
65448b2e34 Remove schema=None until Optional 2020-09-28 03:42:58 +02:00
Matthew Honnibal
b886f53c31 init-pipeline runs (maybe doesn't work) 2020-09-28 03:42:47 +02:00
Matthew Honnibal
ed2aff2db3 Remove unused train code 2020-09-28 03:12:31 +02:00
Matthew Honnibal
3a0a3b8db6 Don't hard-code for 'corpora' name 2020-09-28 03:06:33 +02:00
Matthew Honnibal
a023cf3ecc Add (untested) resolve_dot_names util 2020-09-28 03:06:12 +02:00
Matthew Honnibal
a976da168c
Support data augmentation in Corpus (#6155)
* Support data augmentation in Corpus

* Note initial docs for data augmentation

* Add augmenter to quickstart

* Fix flake8

* Format

* Fix test

* Update spacy/tests/training/test_training.py

* Improve data augmentation arguments

* Update templates

* Move randomization out into caller

* Refactor

* Update spacy/training/augment.py

* Update spacy/tests/training/test_training.py

* Fix augment

* Fix test
2020-09-28 03:03:27 +02:00
Matthew Honnibal
13b1605ee6 Add init script 2020-09-28 01:08:49 +02:00
Matthew Honnibal
a3e1791c9c Upd train 2020-09-28 01:08:30 +02:00
Matthew Honnibal
b5556093e2 Start updating train script 2020-09-27 23:59:44 +02:00
Ines Montani
cad4dbddaa
Merge pull request #6156 from explosion/feature/new-thinc-config-resolution 2020-09-27 23:57:52 +02:00
Ines Montani
9016d23cc5 Fix exclude and add test 2020-09-27 23:34:03 +02:00
Ines Montani
c0c842ae5b Update Thinc version 2020-09-27 23:24:40 +02:00
Ines Montani
658fad428a Fix base schema integration 2020-09-27 22:50:36 +02:00
Ines Montani
47c6a461e5 Revert except all in CLI error handling [ci skip] 2020-09-27 22:41:00 +02:00
Ines Montani
5c53a76021 Improve CLI error handling [ci skip] 2020-09-27 22:39:04 +02:00
Ines Montani
e04bd16f7f Merge branch 'develop' into feature/new-thinc-config-resolution 2020-09-27 22:34:46 +02:00
Ines Montani
d7ad65a9bb Fix handling of error description [ci skip] 2020-09-27 22:31:57 +02:00
Ines Montani
7e938ed63e Update config resolution to use new Thinc 2020-09-27 22:21:31 +02:00
Adriane Boyd
013b66de05
Add tokenizer scoring to ja / ko / zh (#6152) 2020-09-27 22:20:45 +02:00
Adriane Boyd
a6548ead17
Add _ as a symbol (#6153)
* Add _ to StringStore in Morphology

* Add _ as a symbol

Add `_` as a symbol instead of adding to the `StringStore`.
2020-09-27 22:20:14 +02:00
Matthew Honnibal
39b178999c Tmp notes 2020-09-27 20:13:38 +02:00
Ines Montani
f29d5b9b89 Update docs [ci skip] 2020-09-27 18:39:38 +02:00
Adriane Boyd
8393dbedad Minor fixes
* Put `cfg` back in serialization
* Add `pickle5` to pytest conf
2020-09-27 15:15:53 +02:00
Adriane Boyd
54fe871935 Fix formatting, refactor pickle5 exceptions 2020-09-27 14:37:28 +02:00
Adriane Boyd
11e195d3ed Update ChineseTokenizer
* Allow `pkuseg_model` to be set to `None` on initialization
* Don't save config within tokenizer
* Force convert pkuseg_model to use pickle protocol 4 by reencoding with
`pickle5` on serialization
* Update pkuseg serialization test
2020-09-27 14:00:18 +02:00
Ines Montani
3838b14148
Merge pull request #6151 from explosion/fix/train-config-interpolation 2020-09-26 15:56:45 +02:00
Ines Montani
b4486d747d Merge branch 'develop' into fix/train-config-interpolation 2020-09-26 15:32:14 +02:00
Ines Montani
8fea06d55e
Merge pull request #6149 from adrianeboyd/feature/attributeruler-match-ids
Simplify string match IDs for AttributeRuler
2020-09-26 15:31:30 +02:00
Ines Montani
b78a60ef73
Merge pull request #6150 from explosion/feature/cli-config-validation
Improve CLI config validation with latest Thinc
2020-09-26 15:30:51 +02:00
Ines Montani
b2d07de786 Construct nlp from uninterpolated config before training 2020-09-26 15:16:59 +02:00
Ines Montani
e06ff8b71d Update docs [ci skip] 2020-09-26 13:18:08 +02:00
Ines Montani
ca3c997062 Improve CLI config validation with latest Thinc 2020-09-26 13:13:57 +02:00
Adriane Boyd
6c25e60089 Simplify string match IDs for AttributeRuler 2020-09-26 11:12:39 +02:00
Matthew Honnibal
702edf52a0 Fix attributeruler 2020-09-26 00:30:48 +02:00
Matthew Honnibal
821f37254c Fix attributeruler 2020-09-26 00:19:53 +02:00
Matthew Honnibal
98327f66a9 Fix attributeruler key 2020-09-25 23:20:50 +02:00
Matthew Honnibal
092ce4648e Make DocBin output stable data (set iteration) 2020-09-25 22:20:44 +02:00
Matthew Honnibal
26afd3bd90 Fix iteration order 2020-09-25 21:47:22 +02:00
Matthew Honnibal
3d8388969e Sort paths for cache consistency 2020-09-25 19:07:26 +02:00
Adriane Boyd
c3b5a3cfff
Clean up MorphAnalysisC struct (#6146) 2020-09-25 15:56:48 +02:00
Sofie Van Landeghem
009ba14aaf
Fix pretraining in train script (#6143)
* update pretraining API in train CLI

* bump thinc to 8.0.0a35

* bump to 3.0.0a26

* doc fixes

* small doc fix
2020-09-25 15:47:10 +02:00
Ines Montani
02a1b6ab83 Update links [ci skip] 2020-09-25 13:21:43 +02:00
Ines Montani
2cfe9340a1 Link model components [ci skip] 2020-09-25 13:21:20 +02:00
Ines Montani
35f707aa59
Merge pull request #6145 from adrianeboyd/bugfix/revert-score-spans-edits
Revert changes to Scorer.score_spans
2020-09-25 09:28:00 +02:00
Ines Montani
c7956a4047 Update models.js [ci skip] 2020-09-25 09:25:46 +02:00
Ines Montani
1aef484985 Fix version check in models directory [ci skip] 2020-09-25 09:24:11 +02:00
Adriane Boyd
50f20cf722 Revert changes to Scorer.score_spans 2020-09-25 08:21:47 +02:00
Matthew Honnibal
93d7ff309f Remove print 2020-09-24 21:05:27 +02:00
Ines Montani
2aa4d65734 Update docs [ci skip] 2020-09-24 20:41:09 +02:00
Matthew Honnibal
16475528f7
Fix skipped documents in entity scorer (#6137)
* Fix skipped documents in entity scorer

* Add back the skipping of unannotated entities

* Update spacy/scorer.py

* Use more specific NER scorer

* Fix import

* Fix get_ner_prf

* Add scorer

* Fix scorer

Co-authored-by: Ines Montani <ines@ines.io>
2020-09-24 20:38:57 +02:00
Matthew Honnibal
2abb4ba9db
Make a pre-check to speed up alignment cache (#6139)
* Dirty trick to fast-track alignment cache

* Improve alignment cache check

* Fix header

* Fix align cache

* Fix align logic
2020-09-24 18:13:39 +02:00
Ines Montani
26e28ed413 Fix combined scores if multiple components report it 2020-09-24 17:11:13 +02:00
Ines Montani
0b52b6904c Update entity_linker.py 2020-09-24 17:10:35 +02:00
Ines Montani
20b89a9717 Increment version [ci skip] 2020-09-24 16:57:02 +02:00
Adriane Boyd
3c062b3911
Add MORPH handling to Matcher (#6107)
* Add MORPH handling to Matcher

* Add `MORPH` to `Matcher` schema
* Rename `_SetMemberPredicate` to `_SetPredicate`
* Add `ISSUBSET` and `ISSUPERSET` operators to `_SetPredicate`
  * Add special handling for normalization and conversion of morph
    values into sets
  * For other attrs, `ISSUBSET` acts like `IN` and `ISSUPERSET` only
    matches for 0 or 1 values

* Update test

* Rename to IS_SUBSET and IS_SUPERSET
2020-09-24 16:55:09 +02:00
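A minimal sketch of the MORPH set matching described above, assuming the released spaCy v3 API (`Token.set_morph`, `Matcher.add`); the sentence and the morphological features are set by hand purely for illustration:

```
import spacy
from spacy.matcher import Matcher

nlp = spacy.blank("en")
doc = nlp("She walks")
# Annotate the verb by hand so the sketch doesn't depend on a trained pipeline
doc[1].set_morph("Number=Sing|Person=3|Tense=Pres")

matcher = Matcher(nlp.vocab)
# IS_SUPERSET: the token's morph values must contain all of the listed values
pattern = [{"MORPH": {"IS_SUPERSET": ["Number=Sing", "Person=3"]}}]
matcher.add("SING_3RD", [pattern])

for match_id, start, end in matcher(doc):
    print(doc[start:end].text)  # "walks"
```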
Adriane Boyd
59340606b7
Add option to disable Matcher errors (#6125)
* Add option to disable Matcher errors

* Add option to disable Matcher errors when a doc doesn't contain a
particular type of annotation

Minor additional change:

* Update `AttributeRuler.load_from_morph_rules` to allow direct `MORPH`
values

* Rename suppress_errors to allow_missing

Co-authored-by: Matthew Honnibal <honnibal+gh@gmail.com>

* Refactor annotation checks in Matcher and PhraseMatcher

Co-authored-by: Matthew Honnibal <honnibal+gh@gmail.com>
2020-09-24 16:54:39 +02:00
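A small sketch of the option described above, assuming it is exposed as an `allow_missing` flag on the matcher call: matching on `POS` against a doc with no tagger would normally raise, while `allow_missing=True` skips the check and simply yields no matches. The pattern and example text are illustrative.

```
import spacy
from spacy.matcher import Matcher

nlp = spacy.blank("en")  # tokenizer only, so the doc has no POS annotation
matcher = Matcher(nlp.vocab)
matcher.add("VERBS", [[{"POS": "VERB"}]])

doc = nlp("This text has not been tagged yet.")
matches = matcher(doc, allow_missing=True)  # no error, just an empty result
print(matches)  # []
```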
Sofie Van Landeghem
c7eedd3534
updates to NEL functionality (#6132)
* NEL: read sentences and ents from reference

* fiddling with sent_start annotations

* add KB serialization test

* KB write additional file with strings.json

* score_links function to calculate NEL P/R/F

* formatting

* documentation
2020-09-24 16:53:59 +02:00
Ines Montani
d0ef4a4cf5 Prevent division by zero in score weights 2020-09-24 16:42:13 +02:00
Matthew Honnibal
74ee456374 Merge branch 'develop' of https://github.com/explosion/spaCy into develop 2020-09-24 16:11:47 +02:00
Matthew Honnibal
0bc214c102 Fix pull 2020-09-24 16:11:33 +02:00
Ines Montani
6bc5058d13 Update models directory [ci skip] 2020-09-24 14:53:34 +02:00
Ines Montani
3f751e68f5 Increment version [ci skip] 2020-09-24 14:45:41 +02:00
Ines Montani
58dde293ce
Merge pull request #6089 from adrianeboyd/feature/doc-ents-v3-2 2020-09-24 14:44:42 +02:00
Ines Montani
74e1f192b4
Merge pull request #6134 from explosion/feature/training_before_to_disk 2020-09-24 14:44:11 +02:00
Ines Montani
24e7ac3f2b Fix download CLI [ci skip] 2020-09-24 14:43:56 +02:00
Ines Montani
3b58a8be2b Update docs 2020-09-24 14:32:42 +02:00
Ines Montani
88e54caa12 accuracy -> performance 2020-09-24 14:32:35 +02:00
Ines Montani
92f8b6959a Fix typo 2020-09-24 13:48:41 +02:00
Ines Montani
b92c8aae78 Merge branch 'develop' into pr/6135 2020-09-24 13:44:56 +02:00
Adriane Boyd
5c13e0cf1b Remove unused error 2020-09-24 13:41:55 +02:00
Ines Montani
6836b66433 Update docs and resolve todos [ci skip] 2020-09-24 13:41:25 +02:00
walterhenry
3dd5f409ec Proofreading
Proofread some API docs
2020-09-24 13:15:28 +02:00
Adriane Boyd
1c63f02f99 Add API docs 2020-09-24 12:51:16 +02:00
Ines Montani
138c8d45db Update docs 2020-09-24 12:43:39 +02:00
Ines Montani
be56c0994b Add [training.before_to_disk] callback 2020-09-24 12:40:25 +02:00
Ines Montani
d7ab6a2ffe Update docs [ci skip] 2020-09-24 12:37:21 +02:00
Adriane Boyd
8eaacaae97 Refactor Doc.ents setter to use Doc.set_ents
Additional changes:

* Entity spans with missing labels are ignored
* Fix ent_kb_id setting in `Doc.set_ents`
2020-09-24 12:36:51 +02:00
Ines Montani
c6c67b606e
Merge pull request #6133 from explosion/fix/score_weights 2020-09-24 12:00:57 +02:00
Ines Montani
f69fea8b25 Improve error handling around non-number scores 2020-09-24 11:29:07 +02:00
Ines Montani
4eb39b5c43 Fix logging 2020-09-24 11:04:35 +02:00
Ines Montani
4bbe41f017 Fix combined scores and update test 2020-09-24 10:42:47 +02:00
Sofie Van Landeghem
c645c4e7ce
fix micro PRF for textcat (#6130)
* fix micro PRF for textcat

* small fix
2020-09-24 10:31:17 +02:00
Matthew Honnibal
17a6b0a173
Make project pull order insensitive (#6131) 2020-09-24 10:30:42 +02:00
Ines Montani
ae51f580c1 Fix handling of score_weights 2020-09-24 10:27:33 +02:00
Ines Montani
e2ffe51fb5 Update docs [ci skip] 2020-09-24 10:13:41 +02:00
Ines Montani
02008e9a55 Update docs [ci skip] 2020-09-23 22:02:31 +02:00
Ines Montani
c8bda92243 Update benchmarks [ci skip] 2020-09-23 20:05:02 +02:00
Ines Montani
f25f05c503 Adjust sort order [ci skip] 2020-09-23 20:03:04 +02:00
Ines Montani
3f77eb749c Increment version [ci skip] 2020-09-23 19:50:15 +02:00
Ines Montani
cea9431a04
Merge pull request #6128 from svlandeg/fix/nr_features 2020-09-23 19:38:19 +02:00
svlandeg
b816ace4bb format 2020-09-23 17:33:13 +02:00
svlandeg
5a9fdbc8ad state_type as Literal 2020-09-23 17:32:14 +02:00
svlandeg
35dbc63578 Merge remote-tracking branch 'upstream/develop' into fix/nr_features
# Conflicts:
#	spacy/ml/models/parser.py
#	spacy/tests/serialize/test_serialize_config.py
#	website/docs/api/architectures.md
2020-09-23 17:01:13 +02:00
svlandeg
25b34bba94 throw custom error when state_type is invalid 2020-09-23 16:57:14 +02:00
Ines Montani
916050bf2f
Merge pull request #6127 from explosion/feature/literal-nr_feature_tokens 2020-09-23 16:56:08 +02:00
Ines Montani
3c3863654e Increment version [ci skip] 2020-09-23 16:54:43 +02:00
svlandeg
dd2292793f 'parser' instead of 'deps' for state_type 2020-09-23 16:53:49 +02:00
Ines Montani
50a4425cda Adjust docs 2020-09-23 16:03:32 +02:00
Ines Montani
76bbed3466 Use Literal type for nr_feature_tokens 2020-09-23 16:00:03 +02:00
Ines Montani
e4e7f5b00d Update docs [ci skip] 2020-09-23 15:44:40 +02:00
svlandeg
6c85fab316 state_type and extra_state_tokens instead of nr_feature_tokens 2020-09-23 13:35:09 +02:00
Ines Montani
7745d77a38 Fix whitespace in template [ci skip] 2020-09-23 13:21:42 +02:00
Ines Montani
af92d79e92
Merge pull request #6123 from svlandeg/fix/frozen 2020-09-23 13:04:44 +02:00
Ines Montani
a9da33c4d9 Fix infobox with ID [ci skip] 2020-09-23 13:00:56 +02:00
Ines Montani
02b69dd0d5 Update models directory [ci skip] 2020-09-23 12:56:54 +02:00
svlandeg
6435458d51 simplify expression 2020-09-23 12:12:38 +02:00
svlandeg
20b0ec5dcf avoid logging performance of frozen components 2020-09-23 10:37:12 +02:00
Ines Montani
ae5dacf75f Tidy up and add types 2020-09-23 10:14:34 +02:00
Ines Montani
6ca06cb62c Update docs and formatting [ci skip] 2020-09-23 10:14:27 +02:00
Ines Montani
888f936a73
Merge pull request #6106 from svlandeg/feature/textcat-quickstart 2020-09-23 10:11:45 +02:00
Ines Montani
60a317520a
Merge pull request #6109 from svlandeg/feature/2rename 2020-09-23 09:47:12 +02:00
Ines Montani
61235445db Update README.md [ci skip] 2020-09-23 09:45:32 +02:00
Ines Montani
566d048753 Fix project repo link [ci skip] 2020-09-23 09:43:51 +02:00
Ines Montani
930b116f00 Update docs [ci skip] 2020-09-23 09:35:21 +02:00
Ines Montani
d8f661c910 Update docs [ci skip] 2020-09-23 09:30:26 +02:00
Ines Montani
f976bab710 Remove empty file [ci skip] 2020-09-23 09:30:09 +02:00
svlandeg
556f3e4652 add pooling to NEL's TransformerListener 2020-09-23 09:24:28 +02:00
svlandeg
4a56ea72b5 fallbacks for old names 2020-09-23 09:15:07 +02:00
Sofie Van Landeghem
86a08f819d
tok2vec.update instead of predict (#6113) 2020-09-22 21:54:52 +02:00
Sofie Van Landeghem
e0e793be4d
fix KB IO (#6118) 2020-09-22 21:53:06 +02:00
Adriane Boyd
b1a7d6c528 Refactor seen token detection 2020-09-22 14:42:51 +02:00
Sofie Van Landeghem
d53c84b6d6
avoid None callback (#6100) 2020-09-22 13:54:44 +02:00
Adriane Boyd
535842e483
Merge branch 'develop' into feature/doc-ents-v3-2 2020-09-22 13:45:50 +02:00
Ines Montani
19fc72e4cd
Merge pull request #6110 from explosion/ux/validate-config-section-refs
Validate section refs in debug config
2020-09-22 13:15:41 +02:00
Ines Montani
5e3b796b12 Validate section refs in debug config 2020-09-22 12:24:39 +02:00
svlandeg
085a1c8e2b add no_output_layer to TextCatBOW config 2020-09-22 12:06:40 +02:00
svlandeg
e1b8090b9b few more fixes 2020-09-22 12:01:06 +02:00
svlandeg
b556a10808 rename converts in_to_out 2020-09-22 11:50:19 +02:00
svlandeg
e931f4d757 add textcat score 2020-09-22 10:56:43 +02:00
svlandeg
396b33257f add entity_linker to jinja template 2020-09-22 10:40:05 +02:00
Ines Montani
db7126ead9 Increment version 2020-09-22 10:31:26 +02:00
svlandeg
135de82a2d add textcat to quickstart 2020-09-22 10:22:06 +02:00
Ines Montani
f9af7d365c Update docs [ci skip] 2020-09-22 09:45:41 +02:00
Ines Montani
6316d5f398 Improve messages in project CLI [ci skip] 2020-09-22 09:45:34 +02:00
Ines Montani
49e80dbcac
Merge pull request #6103 from explosion/chore/tidy-up-tests-docs-get-doc 2020-09-22 09:45:04 +02:00
Ines Montani
709ebf5550
Merge pull request #6105 from adrianeboyd/docs/various-v3-2 [ci skip] 2020-09-22 09:41:55 +02:00
Adriane Boyd
e05d6d358d Update API sidebar MorphAnalysis link 2020-09-22 09:36:37 +02:00
Adriane Boyd
844db6ff12 Update architecture overview 2020-09-22 09:31:47 +02:00
Ines Montani
81606b29bd
Merge pull request #6104 from svlandeg/fix/debug_model [ci skip] 2020-09-22 09:31:23 +02:00
Adriane Boyd
fc9c78da25 Add MorphAnalysis to API sidebar 2020-09-22 09:23:47 +02:00
Adriane Boyd
5fbb8dfcbc Merge remote-tracking branch 'upstream/develop' into docs/various-v3-2 2020-09-22 09:22:58 +02:00
Ines Montani
beb766d0a0 Add test 2020-09-22 09:15:57 +02:00
Ines Montani
285fa934d8 Merge branch 'chore/tidy-up-tests-docs-get-doc' of https://github.com/explosion/spaCy into chore/tidy-up-tests-docs-get-doc 2020-09-22 09:10:14 +02:00
Ines Montani
69f7e52c26 Update README.md 2020-09-22 09:10:06 +02:00
svlandeg
45b29c4a5b cleanup 2020-09-21 23:17:23 +02:00
svlandeg
fa5c416db6 initialize through nlp object and with train_corpus 2020-09-21 23:09:22 +02:00
Matthew Honnibal
3abc4a5adb Slightly tidy doc.ents.__set__ 2020-09-21 22:58:03 +02:00
Ines Montani
67fbcb3da5 Tidy up tests and docs 2020-09-21 20:43:54 +02:00
Ines Montani
a5f6ab4943
Merge pull request #6098 from adrianeboyd/feature/doc-init 2020-09-21 18:35:20 +02:00
Adriane Boyd
f212303729 Add sent_starts to Doc.__init__
Add sent_starts to `Doc.__init__`. Officially specify `is_sent_start`
values but also convert to and accept `sent_start` internally.
2020-09-21 17:59:09 +02:00
svlandeg
447b3e5787 Merge remote-tracking branch 'upstream/develop' into fix/debug_model
# Conflicts:
#	spacy/cli/debug_model.py
2020-09-21 16:58:40 +02:00
Ines Montani
b3327c1e45 Increment version [ci skip] 2020-09-21 16:04:30 +02:00
Ines Montani
e8bcaa44f1 Don't auto-decompress archives with smart_open [ci skip] 2020-09-21 16:01:46 +02:00
Adriane Boyd
6aa91c7ca0 Make user_data keyword-only 2020-09-21 16:00:06 +02:00
Adriane Boyd
177df15d89 Implement Doc.set_ents 2020-09-21 15:54:05 +02:00
Ines Montani
e548654aca Update docs [ci skip] 2020-09-21 14:46:55 +02:00
Ines Montani
4b79d697ee
Merge pull request #6096 from explosion/feature/config-overrides-env-vars 2020-09-21 14:46:19 +02:00
Ines Montani
626cfd7155
Merge pull request #6099 from adrianeboyd/docs/alphabetize-api-sidebar [ci skip]
Alphabetize API sidebars
2020-09-21 14:44:43 +02:00
Adriane Boyd
13fbf6556a Merge remote-tracking branch 'upstream/develop' into feature/doc-ents-v3-2 2020-09-21 14:42:04 +02:00
svlandeg
eb9b447960 Merge remote-tracking branch 'upstream/develop' into fix/debug_model
# Conflicts:
#	spacy/cli/debug_model.py
2020-09-21 14:05:16 +02:00
Adriane Boyd
ce455f30ca Fix formatting 2020-09-21 13:53:29 +02:00
Adriane Boyd
9b8d0b7f90 Alphabetize API sidebars 2020-09-21 13:46:21 +02:00
Adriane Boyd
bc02e86494 Extend Doc.__init__ with additional annotation
Mostly copying from `spacy.tests.util.get_doc`, add additional kwargs to
`Doc.__init__` to initialize the most common doc/token values.
2020-09-21 13:36:24 +02:00
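Taken together with the `sent_starts` change above, this means a `Doc` can be built with its most common annotation in a single call. A minimal sketch, assuming the v3 keyword arguments (`tags`, `heads`, `deps`, `sent_starts`); the values are made up:

```
import spacy
from spacy.tokens import Doc

vocab = spacy.blank("en").vocab

# Token-level annotation passed straight to the constructor
doc = Doc(
    vocab,
    words=["I", "like", "cats", "."],
    tags=["PRP", "VBP", "NNS", "."],
    heads=[1, 1, 1, 1],
    deps=["nsubj", "ROOT", "dobj", "punct"],
)
print([(t.text, t.tag_, t.dep_, t.head.text) for t in doc])

# Sentence boundaries can also be provided directly via sent_starts
doc2 = Doc(
    vocab,
    words=["Hello", "world", ".", "Bye", "."],
    sent_starts=[True, False, False, True, False],
)
print([sent.text for sent in doc2.sents])
```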
Ines Montani
758ead8a47 Sync overrides with CLI overrides 2020-09-21 12:50:13 +02:00
Ines Montani
5497acf49a Support config overrides via environment variables 2020-09-21 11:25:10 +02:00
Ines Montani
1114219ae3 Tidy up and auto-format 2020-09-21 10:59:07 +02:00
Ines Montani
9d32cac736 Update docs [ci skip] 2020-09-21 10:55:36 +02:00
Adriane Boyd
cc71ec901f Fix typo in saving and loading usage docs 2020-09-21 09:08:55 +02:00
Adriane Boyd
3aa57ce6c9 Update alignment mode in Doc.char_span docs 2020-09-21 09:07:20 +02:00
Ines Montani
b9d2b29684 Update docs [ci skip] 2020-09-20 17:49:09 +02:00
Ines Montani
012b3a7096 Update docs [ci skip] 2020-09-20 17:44:58 +02:00
Ines Montani
b2302c0a1c Improve error for missing dependency 2020-09-20 17:44:51 +02:00
Ines Montani
6898b35028
Merge pull request #6094 from explosion/bugfix/run_process 2020-09-20 16:49:30 +02:00
Ines Montani
744f259b9c Update landing [ci skip] 2020-09-20 16:37:23 +02:00
Matthew Honnibal
8fb59d958c Format 2020-09-20 16:31:48 +02:00
Matthew Honnibal
dc22771f87 Fix sparse checkout 2020-09-20 16:30:05 +02:00
Matthew Honnibal
a0fb5e50db Use simple git clone call if not sparse 2020-09-20 16:22:04 +02:00
Matthew Honnibal
2c24d633d0 Use updated run_command 2020-09-20 16:21:43 +02:00
Matthew Honnibal
889128e5c5 Improve error handling in run_command 2020-09-20 16:20:57 +02:00
Ines Montani
554c9a2497 Update docs [ci skip] 2020-09-20 12:30:53 +02:00
svlandeg
6db1d5dc0d trying some stuff 2020-09-19 19:11:30 +02:00
Ines Montani
e863b3dc14
Merge pull request #6092 from adrianeboyd/bugfix/load-vocab-lookups-2 2020-09-19 12:33:38 +02:00
Sofie Van Landeghem
39872de1f6
Introducing the gpu_allocator (#6091)
* rename 'use_pytorch_for_gpu_memory' to 'gpu_allocator'

* --code instead of --code-path

* update documentation

* avoid querying the "system" section directly

* add explanation of gpu_allocator to TF/PyTorch section in docs

* fix typo

* fix typo 2

* use set_gpu_allocator from thinc 8.0.0a34

* default null instead of empty string
2020-09-19 01:17:02 +02:00
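As a rough sketch of what the new setting does in code (assuming a CUDA-capable GPU with cupy and torch installed): routing memory allocation through PyTorch via the Thinc helper mentioned above corresponds to `gpu_allocator = "pytorch"` in the config's `[system]` section.

```
from thinc.api import set_gpu_allocator, require_gpu

set_gpu_allocator("pytorch")  # let PyTorch manage GPU memory for Thinc ops too
require_gpu(0)                # run subsequent spaCy/Thinc work on GPU 0
```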
Adriane Boyd
47080fba98 Minor renaming / refactoring
* Rename loader to `spacy.LookupsDataLoader.v1`, add debugging message
* Make `Vocab.lookups` a property
2020-09-18 19:43:19 +02:00
svlandeg
73ff52b9ec hack for tok2vec listener 2020-09-18 16:43:15 +02:00
Adriane Boyd
eed4b785f5 Load vocab lookups tables at beginning of training
Similar to how vectors are handled, move the vocab lookups to be loaded
at the start of training rather than when the vocab is initialized,
since the vocab doesn't have access to the full config when it's
created.

The option moves from `nlp.load_vocab_data` to `training.lookups`.

Typically these tables will come from `spacy-lookups-data`, but any
`Lookups` object can be provided.

The loading from `spacy-lookups-data` is now strict, so configs for each
language should specify the exact tables required. This also makes it
easier to control whether the larger clusters and probs tables are
included.

To load `lexeme_norm` from `spacy-lookups-data`:

```
[training.lookups]
@misc = "spacy.LoadLookupsData.v1"
lang = ${nlp.lang}
tables = ["lexeme_norm"]
```
2020-09-18 15:59:16 +02:00
Ines Montani
0406200a1e Update docs [ci skip] 2020-09-18 15:13:13 +02:00
Ines Montani
a127fa475e
Merge pull request #6078 from svlandeg/fix/corpus 2020-09-18 14:44:21 +02:00
Matthew Honnibal
bbdb5f62b7
Temporary work-around for scoring a subset of components (#6090)
* Try hacking the scorer to work around sentence boundaries

* Upd scorer

* Set dev version

* Upd scorer hack

* Fix version

* Improve comment on hack
2020-09-18 14:26:42 +02:00
Ines Montani
d32ce121be Fix docs [ci skip] 2020-09-18 13:41:12 +02:00
Adriane Boyd
a88106e852
Remove W106: HEAD and SENT_START in doc.from_array (#6086)
* Remove W106: HEAD and SENT_START in doc.from_array

This warning was hacky and being triggered too often.

* Fix test
2020-09-18 03:01:29 +02:00
svlandeg
e4fc7e0222 fixing output sample to a proper 2D array 2020-09-17 22:34:36 +02:00
Adriane Boyd
8b650f3a78 Modify setting missing and blocked entity tokens
In order to make it easier to construct `Doc` objects as training data,
modify how missing and blocked entity tokens are set to prioritize
setting `O` and missing entity tokens for training purposes over setting
blocked entity tokens.

* `Doc.ents` setter sets tokens outside entity spans to `O` regardless
of the current state of each token

* For `Doc.ents`, setting a span with a missing label sets the `ent_iob`
to missing instead of blocked

* `Doc.block_ents(spans)` marks spans as hard `O` for use with the
`EntityRecognizer`
2020-09-17 21:27:42 +02:00
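A minimal sketch of the `Doc.ents` setter behaviour described above; the text and label are illustrative. Tokens outside the assigned spans end up as explicit `O`, which is what makes span-by-span annotation of training data practical:

```
import spacy
from spacy.tokens import Span

nlp = spacy.blank("en")
doc = nlp("Apple is looking at buying a startup")
doc.ents = [Span(doc, 0, 1, label="ORG")]

# Everything outside the ORG span is now explicitly "O"
print([(t.text, t.ent_iob_, t.ent_type_) for t in doc])
```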
Ines Montani
9062585a13
Merge pull request #6087 from explosion/docs/pretrain-usage [ci skip] 2020-09-17 19:25:24 +02:00
Ines Montani
a0b4389a38 Update docs [ci skip] 2020-09-17 19:24:48 +02:00
Matthew Honnibal
6efb7688a6 Draft pretrain usage 2020-09-17 18:17:03 +02:00
Sofie Van Landeghem
ed0fb034cb
ml_datasets v0.2.0a0 2020-09-17 18:11:10 +02:00
Ines Montani
1bb8b4f824 Merge branch 'master' into develop 2020-09-17 17:46:20 +02:00
Ines Montani
6bd0d25fb9
Merge pull request #6085 from explosion/docs/static-vectors-intro [ci skip] 2020-09-17 17:14:45 +02:00
Ines Montani
a2c8cda26f Update docs [ci skip] 2020-09-17 17:12:51 +02:00
Ines Montani
2c80f41852
Merge pull request #6084 from svlandeg/feature/init-config-pretrain [ci skip] 2020-09-17 16:59:14 +02:00
Ines Montani
2e3ce9f42f Merge branch 'feature/init-config-pretrain' of https://github.com/svlandeg/spaCy into pr/6084 2020-09-17 16:58:49 +02:00
Ines Montani
3d8e010655 Change order 2020-09-17 16:58:46 +02:00
Ines Montani
c4b414b282
Update website/docs/api/cli.md 2020-09-17 16:58:09 +02:00
Ines Montani
3865214343 Use consistent shortcut 2020-09-17 16:57:02 +02:00
Sofie Van Landeghem
e5ceec5df0
Update website/docs/api/cli.md
Co-authored-by: Ines Montani <ines@ines.io>
2020-09-17 16:56:20 +02:00
Sofie Van Landeghem
127ce0c574
Update website/docs/api/cli.md
Co-authored-by: Ines Montani <ines@ines.io>
2020-09-17 16:55:53 +02:00
Matthew Honnibal
ec751068f3 Draft text for static vectors intro 2020-09-17 16:42:53 +02:00
svlandeg
35a3931064 fix typo 2020-09-17 16:36:27 +02:00
svlandeg
5fade4feb7 fix cli abbrev 2020-09-17 16:15:20 +02:00
svlandeg
ddfc1fc146 add pretraining option to init config 2020-09-17 16:05:40 +02:00
svlandeg
3a3110ef60 remove empty files 2020-09-17 15:44:11 +02:00
svlandeg
c8c84f1ccd Merge remote-tracking branch 'upstream/develop' into fix/corpus 2020-09-17 15:43:04 +02:00
svlandeg
130ffa5fbf fix typos in docs 2020-09-17 14:59:41 +02:00
Matthew Honnibal
b57ce9a875 Merge branch 'develop' of https://github.com/explosion/spaCy into develop 2020-09-17 13:59:25 +02:00
Matthew Honnibal
30e85b2a42 Remove outdated configs 2020-09-17 13:59:12 +02:00
Ines Montani
c8fa2247e3 Merge branch 'develop' of https://github.com/explosion/spaCy into develop 2020-09-17 12:34:15 +02:00
Ines Montani
6761028c6f Update docs [ci skip] 2020-09-17 12:34:11 +02:00
svlandeg
427dbecdd6 cleanup and formatting 2020-09-17 11:48:04 +02:00
svlandeg
0c35885751 generalize corpora, dot notation for dev and train corpus 2020-09-17 11:38:59 +02:00
svlandeg
8cedb2f380 Merge branch 'fix/corpus' of https://github.com/svlandeg/spaCy into fix/corpus 2020-09-17 09:27:55 +02:00
svlandeg
781fae678b Merge remote-tracking branch 'upstream/develop' into fix/corpus 2020-09-17 09:24:36 +02:00
Sofie Van Landeghem
21dcf92964
Update website/docs/api/data-formats.md
Co-authored-by: Matthew Honnibal <honnibal+gh@gmail.com>
2020-09-17 09:21:36 +02:00
Matthew Honnibal
8303d101a5 Set version to v3.0.0a19 2020-09-17 00:18:49 +02:00
Adriane Boyd
7e4cd7575c
Refactor Docs.is_ flags (#6044)
* Refactor Docs.is_ flags

* Add derived `Doc.has_annotation` method

  * `Doc.has_annotation(attr)` returns `True` for partial annotation

  * `Doc.has_annotation(attr, require_complete=True)` returns `True` for
    complete annotation

* Add deprecation warnings to `is_tagged`, `is_parsed`, `is_sentenced`
and `is_nered`

* Add `Doc._get_array_attrs()`, which returns a full list of `Doc` attrs
for use with `Doc.to_array`, `Doc.to_bytes` and `Doc.from_docs`. The
list is the `DocBin` attributes list plus `SPACY` and `LENGTH`.

Notes on `Doc.has_annotation`:

* `HEAD` is converted to `DEP` because heads don't have an unset state

* Accept `IS_SENT_START` as a synonym of `SENT_START`

Additional changes:

* Add `NORM`, `ENT_ID` and `SENT_START` to default attributes for
`DocBin`

* In `Doc.from_array()` the presence of `DEP` causes `HEAD` to override
`SENT_START`

* In `Doc.from_array()` using `attrs` other than
`Doc._get_array_attrs()` (i.e., a user's custom list rather than our
default internal list) with both `HEAD` and `SENT_START` shows a warning
that `HEAD` will override `SENT_START`

* `set_children_from_heads` does not require dependency labels to set
sentence boundaries and sets `sent_start` for all non-sentence starts to
`-1`

* Fix call to set_children_from_heads

Co-authored-by: Matthew Honnibal <honnibal+gh@gmail.com>
2020-09-17 00:14:01 +02:00
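A small sketch of the derived `Doc.has_annotation` method described above: partial annotation is enough by default, while `require_complete=True` demands a value on every token. The attribute and the hand-set tag are illustrative.

```
import spacy

nlp = spacy.blank("en")
doc = nlp("A short example.")
doc[0].tag_ = "DT"  # annotate a single token by hand

print(doc.has_annotation("TAG"))                         # True (partial)
print(doc.has_annotation("TAG", require_complete=True))  # False
```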
Adriane Boyd
a119667a36
Clean up spacy.tokens (#6046)
* Clean up spacy.tokens

* Update `set_children_from_heads`:
  * Don't check `dep` when setting lr_* or sentence starts
  * Set all non-sentence starts to `False`

* Use `set_children_from_heads` in `Token.head` setter
  * Reduce similar/duplicate code (admittedly adds a bit of overhead)
  * Update sentence starts consistently

* Remove unused `Doc.set_parse`

* Minor changes:
  * Declare cython variables (to avoid cython warnings)
  * Clean up imports

* Modify set_children_from_heads to set token range

Modify `set_children_from_heads` so that it adjusts tokens within a
specified range rather than the whole document.

Modify the `Token.head` setter to adjust only the tokens affected by the
new head assignment.
2020-09-16 20:32:38 +02:00
Matthew Honnibal
c776594ab1 Fix 2020-09-16 18:15:14 +02:00
Matthew Honnibal
4a573d18b3 Add comment 2020-09-16 17:51:29 +02:00
Matthew Honnibal
d31afc8334 Fix Language.link_components when model is None 2020-09-16 17:49:48 +02:00
Adriane Boyd
f3db3f6fe0
Add vectors option to CharacterEmbed (#6069)
* Add vectors option to CharacterEmbed

* Update spacy/pipeline/morphologizer.pyx

* Adjust default morphologizer config

Co-authored-by: Matthew Honnibal <honnibal+gh@gmail.com>
2020-09-16 17:45:04 +02:00
Adriane Boyd
d722a439aa
Remove unneeded methods in senter and morphologizer (#6074)
Now that the tagger doesn't manage the tag map, the child classes senter
and morphologizer don't need to override the serialization methods.
2020-09-16 17:39:41 +02:00
Adriane Boyd
87c329c711
Set rule-based lemmatizers as default (#6076)
For languages without provided models and with lemmatizer rules in
`spacy-lookups-data`, make the rule-based lemmatizer the default:
Bengali, Persian, Norwegian, Swedish
2020-09-16 17:37:29 +02:00
svlandeg
0dc914b667 bump thinc to 8.0.0a33 2020-09-16 16:42:58 +02:00
svlandeg
1040e250d8 actual commit with test for custom readers with ml_datasets >= 0.2 2020-09-16 16:41:28 +02:00
svlandeg
714a5a05c6 test for custom readers with ml_datasets >= 0.2 2020-09-16 16:39:55 +02:00
svlandeg
0d1392340f Merge remote-tracking branch 'upstream/develop' into fix/corpus 2020-09-15 23:17:08 +02:00
Ines Montani
4d75040546
Merge pull request #6072 from svlandeg/bugfix/ExceptionInfo
Fix unit test with ExceptionInfo
2020-09-15 22:52:48 +02:00
svlandeg
f420aa1138 use e.value to get to the ExceptionInfo value 2020-09-15 22:30:09 +02:00
svlandeg
55f8d5478e fix example output 2020-09-15 22:09:30 +02:00
svlandeg
7336657662 corpus is a Dict 2020-09-15 22:07:16 +02:00
svlandeg
51fa929f47 rewrite train_corpus to corpus.train in config 2020-09-15 21:58:04 +02:00
svlandeg
bd87e8686e move tests to correct subdir 2020-09-15 21:40:38 +02:00
Ines Montani
aaf01689a1 Merge branch 'develop' of https://github.com/explosion/spaCy into develop 2020-09-15 14:24:42 +02:00
Ines Montani
91a6637f74 Remove extra pipe config values before merging 2020-09-15 14:24:17 +02:00
Ines Montani
d3d7f92f05 Fix lang check and error handling in Language.from_config 2020-09-15 14:24:06 +02:00
Ines Montani
2ed6e2a218 Auto-format 2020-09-15 14:20:04 +02:00
Ines Montani
2214d1bb7b
Merge pull request #6067 from explosion/feature/spacy-blank-from-config 2020-09-15 14:18:33 +02:00
Ines Montani
a3d24b02db
Merge pull request #6068 from svlandeg/fix/wandb
fix W&B logger
2020-09-15 14:00:58 +02:00
Matthew Honnibal
46e04d12db Fix make 2020-09-15 13:36:26 +02:00
Ines Montani
253ba5ef14 Raise for bad Vocab values 2020-09-15 13:25:34 +02:00
svlandeg
7677e5c0e2 fix wandb logger when calling multiple times from same script 2020-09-15 12:56:33 +02:00
Ines Montani
b7faa38960 Update docs [ci skip] 2020-09-15 12:44:03 +02:00
Matthew Honnibal
0f0870d45e Avoid baking '-m spacy' into the pex by default 2020-09-15 12:35:33 +02:00
Ines Montani
0edd695bf6 Update docs 2020-09-15 11:41:49 +02:00
Ines Montani
eff9406718 Support vocab arg in spacy.blank 2020-09-15 11:39:36 +02:00
Ines Montani
99549a5ace Fix consistency and update docs 2020-09-15 11:37:37 +02:00
Ines Montani
7dfc4bc062 Allow overriding meta from spacy.blank 2020-09-15 11:12:12 +02:00
Ines Montani
0f943157af Delegate to Language.from_config in spacy.blank 2020-09-15 11:07:55 +02:00
Ines Montani
e977086a9a Update default pretraining config [ci skip] 2020-09-15 01:12:02 +02:00
Ines Montani
154752f9c2 Update docs and consistency [ci skip] 2020-09-15 00:32:49 +02:00
Ines Montani
9cc304c194
Merge pull request #6064 from explosion/fix/sparse-checkout-ux
Fix sparse checkout and error handling
2020-09-15 00:32:20 +02:00
Matthew Honnibal
475323cd36 Set version to v3.0.0a18 2020-09-14 22:05:43 +02:00
Matthew Honnibal
e8378b57bc Fix test 2020-09-14 21:21:13 +02:00
Matthew Honnibal
adf0bab23a Merge branch 'develop' of https://github.com/explosion/spaCy into develop 2020-09-14 21:04:49 +02:00
Matthew Honnibal
ae15fa9688 Fix iob converter 2020-09-14 21:02:18 +02:00
Sofie Van Landeghem
3216a33149
positive_label config for textcat (#6062)
* hook up positive_label in textcat

* unit tests

* documentation

* formatting

* tests

* fix typo

* move verify_config to after begin_training

* revert accidental commit
2020-09-14 17:08:00 +02:00
Ines Montani
c052017025 Fix sparse checkout and error handling 2020-09-14 14:12:58 +02:00
Ines Montani
b854e0bef9 Update styleguide [ci skip] 2020-09-14 11:25:57 +02:00
Ines Montani
9afb1d9965
Merge pull request #6063 from svlandeg/feature/doc_cleanup [ci skip] 2020-09-14 10:35:43 +02:00
Ines Montani
35156429c4 Update docs [ci skip] 2020-09-14 10:34:50 +02:00
Ines Montani
80754d7065 Update README.md [ci skip] 2020-09-14 10:29:06 +02:00
Matthew Honnibal
fdd2340f6c Set version to v3.0.0a17 2020-09-13 23:52:03 +02:00
Ines Montani
081413f210 Update docs [ci skip] 2020-09-13 23:46:51 +02:00
Ines Montani
85e5910102 Update docs [ci skip] 2020-09-13 23:09:19 +02:00
Ines Montani
5ebb2a2ac8 Update docs [ci skip] 2020-09-13 22:36:20 +02:00
Ines Montani
47acb45850 Update docs [ci skip] 2020-09-13 22:30:33 +02:00
Ines Montani
2e3d067a7b Update docs [ci skip] 2020-09-13 19:29:06 +02:00
Ines Montani
416deb412f Prevent duplicate traceback on CalledProcessError [ci skip] 2020-09-13 19:28:54 +02:00
Ines Montani
61a4ef0b46 Fix syntax error 2020-09-13 19:23:09 +02:00
Ines Montani
99b26fe492 Update docs [ci skip] 2020-09-13 17:59:38 +02:00
Matthew Honnibal
b693d2d224 Fix speed report in table 2020-09-13 17:39:31 +02:00
Sofie Van Landeghem
744df9814a
define threshold for scoring textcat in TextCat config (#6055)
* define threshold for scoring textcat in TextCat config

* fix unit test and documentation
2020-09-13 14:15:52 +02:00
Adriane Boyd
ab270364f1
Modify Token.morph to enable unsetting (#6043)
Modify `Token.morph` property so that `Token.c.morph` can be reset back
to an internal value of `0`. Allow setting `Token.morph` from a hash as
long as the morph string is already in the `StringStore`, setting it
indirectly through `Token.morph_` so that the value is added to the
morphology. If the hash is not in the `StringStore`, raise an error.
2020-09-13 14:06:07 +02:00
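A minimal sketch of the behaviour described above. The message refers to the alpha-era `Token.morph_` setter; released spaCy v3 exposes the same idea through `Token.set_morph`, which is what the sketch assumes. The feature string is illustrative.

```
import spacy

nlp = spacy.blank("en")
doc = nlp("cats")

doc[0].set_morph("Number=Plur")  # the string is added to the morphology
print(doc[0].morph)              # Number=Plur

doc[0].set_morph(None)           # unset again, resetting the internal value to 0
print(doc[0].morph)              # empty analysis
```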
Adriane Boyd
c7bd631b5f
Fix token.idx for special cases with affixes (#6035) 2020-09-13 14:05:36 +02:00
Matthew Honnibal
54c40223a1
Improve v3 pretrain command (#6040)
* Starts to run

* Update pretrain script

* Update corpus

* Update pretrain schema

* Remove outdated test

* Make JsonlTexts produce Example objects.
2020-09-13 14:05:05 +02:00
Ines Montani
1316071086 Update docs [ci skip] 2020-09-13 11:31:50 +02:00
Ines Montani
febb99916d Tidy up and auto-format [ci skip] 2020-09-13 10:55:36 +02:00
Ines Montani
a5633b205f Fix handling of errors around git [ci skip] 2020-09-13 10:52:28 +02:00
Ines Montani
f8846c198d Update types and docstrings 2020-09-13 10:52:02 +02:00
Sofie Van Landeghem
e92e850c72
Raise if empty examples (#6052)
* raise error if no valid Example objects were found during initialization

* fix max_length parameter

* remove commit from other branch

Co-authored-by: Matthew Honnibal <honnibal+gh@gmail.com>
2020-09-12 21:01:53 +02:00
Ines Montani
24e138b8ac Update docs [ci skip] 2020-09-12 17:55:02 +02:00
Ines Montani
475a310b13 Update docs [ci skip] 2020-09-12 17:45:19 +02:00
Ines Montani
368ecf705a Update docs [ci skip] 2020-09-12 17:40:50 +02:00
svlandeg
c4f324d5f1 doc fixes 2020-09-12 17:38:54 +02:00
Matthew Honnibal
37347830d4 Fix reading in GloVe vectors 2020-09-12 17:31:18 +02:00
Ines Montani
b41be87213
Merge pull request #6051 from svlandeg/feature/cli-config 2020-09-12 17:12:35 +02:00
Ines Montani
8b0dabe987 Update docs [ci skip] 2020-09-12 17:05:10 +02:00
Ines Montani
eedaaaec75 Fix handling of existing asset without checksum [ci skip] 2020-09-12 17:02:53 +02:00
svlandeg
a75cfe0da6 Merge remote-tracking branch 'upstream/develop' into feature/cli-config 2020-09-12 14:44:40 +02:00
svlandeg
115147804a string_to_list to parse comma-separated string into a list 2020-09-12 14:43:22 +02:00
Ines Montani
f886f5bbc8
Merge pull request #6048 from explosion/fix/clone-compat 2020-09-12 10:30:49 +02:00
Ines Montani
87a85fe7f2
Merge pull request #6053 from svlandeg/bugfix/score_weights
prevent overwriting score_weights
2020-09-11 18:56:01 +02:00
svlandeg
711166a75a prevent overwriting score_weights 2020-09-11 15:12:05 +02:00
Ines Montani
62eec33bc4 Fix meta.json validation 2020-09-11 11:38:33 +02:00
Ines Montani
0b2e07215d Support overwriting name on spacy package 2020-09-11 11:38:28 +02:00
svlandeg
5b94aeece9 support pipeline as "list in string" 2020-09-11 11:08:46 +02:00
Ines Montani
1bce432b4a Adjust message [ci skip] 2020-09-11 10:00:49 +02:00
Ines Montani
5acd4fbcd8 Merge branch 'develop' into fix/clone-compat 2020-09-11 09:58:30 +02:00
Ines Montani
2247d62655
Merge pull request #6049 from svlandeg/feature/project-fixes 2020-09-11 09:57:40 +02:00
Ines Montani
761bd60d43 Adjust info message 2020-09-11 09:57:00 +02:00
Ines Montani
6831161bfa Resolve path to be extra sure 2020-09-11 09:56:49 +02:00
svlandeg
1723fb73c4 remove leftover junk 2020-09-10 17:44:59 +02:00
svlandeg
08a831ce83 process trailing slash if any 2020-09-10 17:39:52 +02:00
Ines Montani
3e83a509bb WIP: fix project clone compatibility 2020-09-10 15:49:13 +02:00
svlandeg
f1bc09c1e9 restore partly 2020-09-10 14:53:02 +02:00
svlandeg
3889747119 asset fix & UX 2020-09-10 14:36:53 +02:00
Ines Montani
4fec8c39a3 Update project teaser [ci skip] 2020-09-10 13:23:03 +02:00
Ines Montani
9f08ea80b4
Merge pull request #6047 from svlandeg/feature/doc-fixes
Fix branch for spacy clone + UX
2020-09-10 13:05:41 +02:00
Ines Montani
763e302dcc Update project widgets and examples [ci skip] 2020-09-10 13:04:16 +02:00
svlandeg
a36766d153 hookup branch 2020-09-10 12:00:34 +02:00
svlandeg
97d99f7efa Merge remote-tracking branch 'upstream/develop' into feature/doc-fixes 2020-09-10 11:51:34 +02:00
Ines Montani
908f3a4494 Update default projects repo [ci skip] 2020-09-10 11:42:14 +02:00
svlandeg
92f9d2f406 small UX fixes 2020-09-10 11:35:50 +02:00
svlandeg
1fc5486792 more fine-grained errors for git_sparse_checkout 2020-09-10 11:31:32 +02:00
Ines Montani
15bc3a37b4 Add --branch to project clone 2020-09-10 11:08:15 +02:00
Ines Montani
b7afd09d27 Update formatting [ci skip] 2020-09-10 11:07:09 +02:00
svlandeg
9073d99fc9 fix link to shape inference section 2020-09-10 10:22:59 +02:00
Ines Montani
0a8455a7fd Update lookups data in makefile [ci skip] 2020-09-10 09:33:59 +02:00
Ines Montani
a25bb50e36
Merge pull request #6036 from explosion/chore/update-lookups-data
Update to latest spacy-lookups-data
2020-09-09 21:47:17 +02:00
Ines Montani
1955aaaa20
Merge pull request #6045 from svlandeg/feature/more-layers-docs [ci skip] 2020-09-09 21:46:40 +02:00
Ines Montani
2e567a47c2 Update docs and formatting 2020-09-09 21:26:10 +02:00
svlandeg
aa27e3f1f2 PyTorch spelling 2020-09-09 16:27:21 +02:00
svlandeg
c89e07927e document individual component API pages 2020-09-09 16:18:38 +02:00
Sofie Van Landeghem
cb66ea7400
Remove simple_ner code (#6041)
* remove simple_ner code

* remove unused _biluo and _iob files
2020-09-09 16:11:27 +02:00
svlandeg
a8aa9a8068 document Pipe API details, crossreferences etc 2020-09-09 15:56:27 +02:00
svlandeg
9a7c6cc61a references to usage page on layers and architectures 2020-09-09 14:47:32 +02:00
svlandeg
e80898092b Merge branch 'feature/more-layers-docs' of https://github.com/svlandeg/spaCy into feature/more-layers-docs 2020-09-09 14:44:28 +02:00
svlandeg
4c080b3a98 details on Thinc shape inference 2020-09-09 13:57:05 +02:00
svlandeg
39aa740777 Merge remote-tracking branch 'upstream/develop' into feature/more-layers-docs 2020-09-09 11:59:34 +02:00
svlandeg
e39242c4e6 formatting 2020-09-09 11:25:35 +02:00
Ines Montani
24053d83ec Merge branch 'develop' of https://github.com/explosion/spaCy into develop 2020-09-09 11:20:14 +02:00
Ines Montani
406aed78ee Update docs [ci skip] 2020-09-09 11:20:07 +02:00
Sofie Van Landeghem
8e7557656f
Renaming gold & annotation_setter (#6042)
* version bump to 3.0.0a16

* rename "gold" folder to "training"

* rename 'annotation_setter' to 'set_extra_annotations'

* formatting
2020-09-09 10:31:03 +02:00
Sofie Van Landeghem
60f22e1800
Pipe API (#6034)
* ensure Language passes on valid examples for initialization

* fix tagger model initialization

* check for valid get_examples across components

* assume labels were added before begin_training

* fix senter initialization

* fix morphologizer initialization

* use methods to check arguments

* test textcat init, requires thinc>=8.0.0a31

* fix tok2vec init

* fix entity linker init

* use islice

* fix simple NER

* cleanup debug model

* fix assert statements

* fix tests

* throw error when adding a label if the output layer can't be resized anymore

* fix test

* add failing test for simple_ner

* UX improvements

* morphologizer UX

* assume begin_training gets a representative set and processes the labels

* remove assumptions for output of untrained NER model

* restore test for original purpose
2020-09-08 22:44:25 +02:00
Marek Grzenkowicz
f5d230ba00 Clarify how to choose pretrained weights files (closes #6027) [ci skip] (#6039) 2020-09-08 21:15:57 +02:00
svlandeg
a16afb79e3 add section on Thinc implementation details 2020-09-08 20:43:09 +02:00
svlandeg
1c476b4b41 how to register and use custom function 2020-09-08 20:22:20 +02:00
svlandeg
b35a26ea5d example wrapped Torch model and chaining with Thinc 2020-09-08 18:32:58 +02:00
svlandeg
d0a8849e4d fix typo 2020-09-08 18:32:12 +02:00
svlandeg
bd8f9b188b small fixes 2020-09-08 17:24:36 +02:00
Matthew Honnibal
4b82882767 Fix defaults 2020-09-08 15:31:21 +02:00
Matthew Honnibal
5d09e3e154 Set version to v3.0.0a15 2020-09-08 15:25:10 +02:00
Matthew Honnibal
ba5f4c9b32 Add words and seconds to train info 2020-09-08 15:24:47 +02:00
Matthew Honnibal
b470062153
Add CLI registry (#6037) 2020-09-08 15:23:34 +02:00
Ines Montani
40058ee626 Update to latest spacy-lookups-data 2020-09-08 12:23:06 +02:00
Ines Montani
d98ae9d918 Update docs [ci skip] 2020-09-08 10:33:48 +02:00
svlandeg
06ef66fd73 Merge remote-tracking branch 'upstream/develop' into feature/more-layers-docs 2020-09-08 10:28:42 +02:00
Ines Montani
bb62e3c8fc Fix dropdown [ci skip] 2020-09-06 23:43:50 +02:00
Matthew Honnibal
dae22f3dfa Fix ignoring of punct labels 2020-09-05 14:11:59 +02:00
Ines Montani
c443c82722 Update docs [ci skip] 2020-09-05 13:41:10 +02:00
Matthew Honnibal
12e1279f6b Set version to v3.0.0a14 2020-09-05 04:13:53 +02:00
Matthew Honnibal
4b7abaafdb Fix learn rate for non-transformer 2020-09-04 21:22:50 +02:00
Matthew Honnibal
465785a672 Fix project pull and push 2020-09-04 21:15:55 +02:00
Ines Montani
b3e338d65e Update docs [ci skip] 2020-09-04 20:58:36 +02:00
Ines Montani
a8b5f78fc3
Merge pull request #6018 from adrianeboyd/feature/dependency-matcher-v3 2020-09-04 20:51:50 +02:00
Ines Montani
157caf4dfa WIP: update docs [ci skip] 2020-09-04 16:30:31 +02:00
Ines Montani
f174c7b1f3 Merge branch 'develop' into pr/6018 2020-09-04 15:54:49 +02:00
Ines Montani
f06eed800e
Merge pull request #6029 from explosion/master-tmp 2020-09-04 15:11:55 +02:00
Ines Montani
1a9897467d Update link.js 2020-09-04 14:56:00 +02:00
Ines Montani
6a737c01eb Replace docs analytics [ci skip] 2020-09-04 14:55:13 +02:00
Ines Montani
f9550b4493 Fix components in meta.json and website [ci skip] 2020-09-04 14:42:12 +02:00
Ines Montani
c28f73ddfd Update package-lock.json 2020-09-04 14:41:55 +02:00
Ines Montani
8651022774 Fix outbound link [ci skip] 2020-09-04 14:27:46 +02:00
Ines Montani
afdf14c717 Remove Google Analytics [ci skip] 2020-09-04 14:21:41 +02:00
Ines Montani
d7cc2ee72d Fix tests 2020-09-04 14:05:55 +02:00
Ines Montani
90043a6f9b Tidy up and auto-format 2020-09-04 13:42:33 +02:00
Ines Montani
df0b68f60e Remove unicode declarations and update language data 2020-09-04 13:19:16 +02:00
Ines Montani
ba600f91c5 Tidy up imports 2020-09-04 13:15:44 +02:00
Ines Montani
864a697e63 Merge branch 'develop' into master-tmp 2020-09-04 13:15:36 +02:00
Adriane Boyd
b927893309
Merge branch 'develop' into feature/dependency-matcher-v3 2020-09-04 13:03:30 +02:00
Ines Montani
0426ba178d
Merge pull request #6028 from explosion/docs/update-docs-links [ci skip]
Update docs links in codebase
2020-09-04 13:01:47 +02:00
Ines Montani
ab1bb421ed Update docs links in codebase 2020-09-04 12:58:50 +02:00
Ines Montani
2189046869
Merge pull request #6024 from explosion/chore/registry-renaming 2020-09-04 10:54:10 +02:00
Brad Jascob
a6e437b2bf Updates spaCy Universe for amrlib (#6020)
* Updates spaCy Universe for amrlib

* Updates to doc based on feedback
2020-09-04 10:06:55 +02:00
svlandeg
c32fcdf4c9 fix typo 2020-09-04 09:10:21 +02:00
Ines Montani
595f9dc2e4 Make displacy color registry consistent with others
This was the only registry that expected the registered objects to be dictionaries instead of functions that return the data. We can still support plain dicts, but we should also support functions for consistency.
2020-09-03 23:05:41 +02:00
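A minimal sketch of registering colors as a function rather than a plain dict, per the consistency change above; the registry name `my_colors.v1`, the entity label and the hex value are all made up for illustration.

```
import spacy

@spacy.registry.displacy_colors("my_colors.v1")
def create_displacy_colors():
    # The returned dict maps entity labels to colors and is merged with the defaults
    return {"FRUIT": "#ff6961"}
```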
Ines Montani
4daf138136 Fix alphabetic ordering [ci skip] 2020-09-03 23:01:50 +02:00
Matthew Honnibal
1c07820681 Merge branch 'develop' of https://github.com/explosion/spaCy into develop 2020-09-03 18:54:21 +02:00
Matthew Honnibal
7be8a0516a Fix project pull 2020-09-03 18:54:03 +02:00
Ines Montani
b1eb98b15c Remove todos [ci skip] 2020-09-03 17:43:58 +02:00
Ines Montani
23b7d9cfa3 Prefix span getters 2020-09-03 17:37:06 +02:00
Ines Montani
5afe6447cd registry.assets -> registry.misc 2020-09-03 17:31:14 +02:00
Ines Montani
c063e55eb7 Add prefix to batchers 2020-09-03 17:30:41 +02:00
Ines Montani
804f120361 Don't use registered function version in title 2020-09-03 17:29:47 +02:00
Ines Montani
896caf45e3
Merge pull request #6023 from explosion/ux/model-terminology-consistency [ci skip] 2020-09-03 17:13:44 +02:00
Ines Montani
c53b1433b9 Adjust more arguments [ci skip] 2020-09-03 17:12:24 +02:00
Ines Montani
121809dd1e Fix anchor [ci skip] 2020-09-03 16:49:56 +02:00
Ines Montani
25a595dc10 Fix typos and wording [ci skip] 2020-09-03 16:37:45 +02:00
Ines Montani
b5a0657fd6 "model" terminology consistency in docs 2020-09-03 13:13:03 +02:00
Matthew Honnibal
f038841798 Merge branch 'develop' of https://github.com/explosion/spaCy into develop 2020-09-03 12:52:39 +02:00
Matthew Honnibal
ef0d0630a4 Let Language.use_params work with falsy inputs
The Language.use_params method was failing if you passed in None, which
meant we had to use awkward conditionals for the parameter averaging.
This solves the problem.
2020-09-03 12:51:04 +02:00
Ines Montani
b02ad8045b Update docs [ci skip] 2020-09-03 10:10:13 +02:00
Yohei Tamura
5af432e0f2
fix for empty string (#5936) 2020-09-03 10:09:03 +02:00
Ines Montani
1815c613c9 Update docs [ci skip] 2020-09-03 10:07:45 +02:00
Ines Montani
6f46d4e4d2
Merge pull request #6017 from svlandeg/feature/docs-layers [ci skip] 2020-09-03 10:03:23 +02:00
Adriane Boyd
77ac4a38aa
Simplify specials and cache checks (#6012) 2020-09-03 09:42:49 +02:00
Adriane Boyd
8b5594df86 Remove near-duplicate test 2020-09-02 20:32:01 +02:00
Matthew Honnibal
122cb02001 Fix averages 2020-09-02 19:37:43 +02:00
Adriane Boyd
960d9cfadc Officially support DependencyMatcher
Add official support for the `DependencyMatcher`. Redesign the pattern
specification. Fix and extend operator implementations. Update API docs
and add usage docs.

Patterns
--------

Refactor pattern structure to:

```
{
  "LEFT_ID": str,
  "REL_OP": str,
  "RIGHT_ID": str,
  "RIGHT_ATTRS": dict,
}
```

The first node contains only `RIGHT_ID` and `RIGHT_ATTRS` and all
subsequent nodes contain all four keys.

New operators
-------------

Because of the way patterns are constructed from left to right, it's
helpful to have `follows` operators along with `precedes` operators. Add
operators for simple precedes / follows alongside immediate precedes /
follows.

* `.*`: precedes
* `;`: immediately follows
* `;*`: follows

Operator fixes
--------------

* `<` and `<<` do not include the node itself
* Fix reversed order for all operators involving linear precedence (`.`,
  all sibling operators)
* Linear precedence operators do not match nodes outside the same parse

Additional fixes
----------------

* Use v3 Matcher API
* Support `get` and `remove`
* Support pickling
2020-09-02 17:45:29 +02:00
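A minimal usage sketch of the redesigned pattern format above, assuming an installed English pipeline with a parser (`en_core_web_sm` here is just an example); the anchor node carries only `RIGHT_ID`/`RIGHT_ATTRS` and later nodes add `LEFT_ID`/`REL_OP`:

```
import spacy
from spacy.matcher import DependencyMatcher

nlp = spacy.load("en_core_web_sm")  # any pipeline with a dependency parser
matcher = DependencyMatcher(nlp.vocab)

pattern = [
    # Anchor node: only RIGHT_ID and RIGHT_ATTRS
    {"RIGHT_ID": "verb", "RIGHT_ATTRS": {"POS": "VERB"}},
    # Subsequent nodes: LEFT_ID, REL_OP, RIGHT_ID, RIGHT_ATTRS
    {
        "LEFT_ID": "verb",
        "REL_OP": ">",  # the verb is the immediate head of the subject
        "RIGHT_ID": "subject",
        "RIGHT_ATTRS": {"DEP": "nsubj"},
    },
]
matcher.add("VERB_SUBJECT", [pattern])

doc = nlp("The cat chased the dog.")
for match_id, token_ids in matcher(doc):
    print([doc[i].text for i in token_ids])  # e.g. ['chased', 'cat']
```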
svlandeg
ab909a3f68 Merge branch 'feature/docs-layers' of https://github.com/svlandeg/spaCy into feature/docs-layers 2020-09-02 17:44:00 +02:00
svlandeg
cda45dd1ab Merge remote-tracking branch 'upstream/develop' into feature/docs-layers 2020-09-02 17:43:45 +02:00
svlandeg
19298de352 small fix 2020-09-02 17:43:11 +02:00
svlandeg
bbaea530f6 sublayers paragraph 2020-09-02 17:36:22 +02:00
svlandeg
1be7ff02a6 swapping section 2020-09-02 15:26:07 +02:00
Matthew Honnibal
737a1408d9 Improve implementation of fix #6010
Follow-ups to the parser efficiency fix.

* Avoid introducing new counter for number of pushes
* Base cut on number of transitions, keeping it more even
* Reintroduce the randomization we had in v2.
2020-09-02 14:42:32 +02:00
svlandeg
57e432ba2a editor tip as Accordion instead of Infobox 2020-09-02 14:26:57 +02:00
svlandeg
d19ec6c67b small rewrites in types paragraph 2020-09-02 14:25:18 +02:00
svlandeg
821b2d4e63 update examples 2020-09-02 14:15:50 +02:00
svlandeg
e29a33449d rewrite intro, simple Model example 2020-09-02 13:41:18 +02:00
svlandeg
422df9c2e2 Merge remote-tracking branch 'upstream/develop' into feature/docs-layers
# Conflicts:
#	website/docs/usage/layers-architectures.md
2020-09-02 13:17:11 +02:00
Sofie Van Landeghem
eb56377799
Fix overfitting test (#6011)
* remove unused MORPH_RULES

* fix textcat architecture in overfitting test
2020-09-02 13:07:41 +02:00
Adriane Boyd
b97d98783a
Fix Hungarian % tokenization (#6013) 2020-09-02 13:06:16 +02:00
Ines Montani
70238543c8 Update layers/arch docs structure [ci skip] 2020-09-02 13:04:35 +02:00
Matthew Honnibal
c1bf3a5602
Fix significant performance bug in parser training (#6010)
The parser training makes use of a trick for long documents, where we
use the oracle to cut up the document into sections, so that we can have
batch items in the middle of a document. For instance, if we have one
document of 600 words, we might make 6 states, starting at words 0, 100,
200, 300, 400 and 500.

The problem is for v3, I screwed this up and didn't stop parsing! So
instead of a batch of [100, 100, 100, 100, 100, 100], we'd have a batch
of [600, 500, 400, 300, 200, 100]. Oops.

The implementation here could probably be improved, it's annoying to
have this extra variable in the state. But this'll do.

This makes the v3 parser training 5-10 times faster, depending on document
lengths. This problem wasn't in v2.
2020-09-02 12:57:13 +02:00
svlandeg
474abb2e59 remove unused MORPH_RULES from test 2020-09-02 11:37:56 +02:00
svlandeg
6fd7f140ec custom-architectures section 2020-09-02 11:14:06 +02:00
svlandeg
3d9ae9286f small fixes 2020-09-02 10:46:38 +02:00
Sofie Van Landeghem
6bfb1b3a29
Fix sparse checkout for 'spacy project' (#6008)
* exit if cloning fails

* UX

* rewrite http link to git protocol, don't use stdin

* fixes to sparse checkout

* formatting
2020-09-01 19:49:01 +02:00
Matthew Honnibal
4cce32f090 Fix tagger initialization 2020-09-01 16:38:34 +02:00
Matthew Honnibal
046c38bd26
Remove 'cleanup' of strings (#6007)
A long time ago we went to some trouble to try to clean up "unused"
strings, to avoid the `StringStore` growing in long-running processes.

This never really worked reliably, and I think it was a really wrong
approach. It's much better to let the user reload the `nlp` object as
necessary, now that the string encoding is stable (in v1, the string IDs
were sequential integers, making reloading the NLP object really
annoying.)

The extra book-keeping does make some performance difference, and the
feature is unused, so it's past time we killed it.
2020-09-01 16:12:15 +02:00
Ines Montani
690bd77669 Add todos [ci skip] 2020-09-01 14:04:36 +02:00
Ines Montani
70b226f69d Support ignore marker in project document [ci skip] 2020-09-01 12:49:04 +02:00
Ines Montani
a4c51f0f18 Add v3 info to project docs [ci skip] 2020-09-01 12:36:21 +02:00
Ines Montani
ef9005273b Update fill-config command and add silent mode [ci skip] 2020-09-01 12:07:04 +02:00
Matthew Honnibal
027c82c068 Update makefile 2020-09-01 01:22:54 +02:00
Matthew Honnibal
bff1640a75 Try to debug tmpdir problem 2020-09-01 01:13:09 +02:00
Matthew Honnibal
61a71d8bcc Try to debug tmpdir problem 2020-09-01 01:10:53 +02:00
Matthew Honnibal
ec660e3131 Fix use_pytorch_for_gpu_memory 2020-09-01 00:41:38 +02:00
Adriane Boyd
9130094199
Prevent Tagger model init with 0 labels (#5984)
* Prevent Tagger model init with 0 labels

Raise an error before trying to initialize a tagger model with 0 labels.

* Add dummy tagger label for test

* Remove tagless tagger model initializiation

* Fix error number after merge

* Add dummy tagger label to test

* Fix formatting

Co-authored-by: Matthew Honnibal <honnibal+gh@gmail.com>
2020-08-31 21:24:33 +02:00
Matthw Honnibal
c38298b8fa Merge branch 'develop' of https://github.com/explosion/spaCy into develop 2020-08-31 19:55:55 +02:00
Matthw Honnibal
fe298fa50a Shuffle on first epoch of train 2020-08-31 19:55:22 +02:00
Ines Montani
9af82f3f11
Merge pull request #6003 from explosion/feature/matcher-as-spans 2020-08-31 17:50:56 +02:00
Sofie Van Landeghem
3ac620f09d
fix config example [ci skip] 2020-08-31 17:40:04 +02:00
Ines Montani
3929431af1 Update docs [ci skip] 2020-08-31 17:06:33 +02:00
Ines Montani
c3b6cbd740
Merge pull request #6004 from svlandeg/feature/console-ex
console logging example
2020-08-31 17:03:52 +02:00
Ines Montani
add9de5487 Deprecate (Phrase)Matcher.pipe 2020-08-31 17:01:24 +02:00
svlandeg
2c3b64a567 console logging example 2020-08-31 16:56:13 +02:00
Ines Montani
bca6bf8dda Update docs [ci skip] 2020-08-31 16:39:53 +02:00
Ines Montani
97ffb4ed05
Merge pull request #6002 from svlandeg/feature/vectors-docs 2020-08-31 16:25:18 +02:00
Ines Montani
db9f8896f5 Add docs [ci skip] 2020-08-31 16:10:41 +02:00
Ines Montani
83aff38c59
Make argument keyword-only
Co-authored-by: Matthew Honnibal <honnibal+gh@gmail.com>
2020-08-31 15:39:03 +02:00
Ines Montani
6340d1c63d Add as_spans to Matcher/PhraseMatcher 2020-08-31 14:53:22 +02:00
svlandeg
fe6c08218e fixes 2020-08-31 14:51:49 +02:00
svlandeg
0e0abb0378 fix 2020-08-31 14:50:29 +02:00
svlandeg
56ba691ecd small fixes 2020-08-31 14:46:00 +02:00
svlandeg
e47ea88aeb revert annotations refactor 2020-08-31 14:40:55 +02:00
svlandeg
13ee742fb4 example of custom logger 2020-08-31 14:24:41 +02:00
svlandeg
2c90a06fee some more information about the loggers 2020-08-31 13:43:17 +02:00
svlandeg
c18eb63483 Merge remote-tracking branch 'upstream/develop' into feature/vectors-docs
# Conflicts:
#	website/docs/usage/embeddings-transformers.md
2020-08-31 13:21:36 +02:00
Sofie Van Landeghem
ec14744ee4
Rename Transformer listener (#6001)
* rename to spacy-transformers.TransformerListener

* add some more tok2vec tests

* use select_pipes

* fix docs - annotation setter was not changed in the end
2020-08-31 12:41:39 +02:00
Ines Montani
6ac3299e2e
Merge pull request #6000 from adrianeboyd/feature/tokenizer-special-case-filter
Restrict tokenizer exceptions to ORTH and NORM
2020-08-31 12:38:38 +02:00
Adriane Boyd
216efaf5f5 Restrict tokenizer exceptions to ORTH and NORM 2020-08-31 09:55:01 +02:00
Matthew Honnibal
9341cbc013 Set version to v3.0.0a13 2020-08-30 23:10:43 +02:00
Matthew Honnibal
b69a0e332d Fix makefile 2020-08-30 20:14:52 +02:00
Matthew Honnibal
acdd7b9478 Allow wheelhouse to be set in makefile 2020-08-30 20:00:49 +02:00
Matthew Honnibal
2ee0154bd0 Fix makefile 2020-08-30 17:11:24 +02:00
Matthew Honnibal
b2463e4d04 Fix makefile 2020-08-30 16:37:04 +02:00
Matthew Honnibal
d62a3c6551 Fix makefile 2020-08-30 16:35:10 +02:00
Matthew Honnibal
af6cbb29e8 Merge branch 'develop' of https://github.com/explosion/spaCy into develop 2020-08-30 16:16:44 +02:00
Matthew Honnibal
e3d959d4b4 Fix makefile 2020-08-30 16:16:30 +02:00
Ines Montani
9b86312bab Update docs [ci skip] 2020-08-29 18:43:19 +02:00
Ines Montani
d73f7229c0
Merge pull request #5998 from adrianeboyd/docs/morph-usage-v3 2020-08-29 17:05:44 +02:00
Adriane Boyd
870774f475
Merge branch 'develop' into docs/morph-usage-v3 2020-08-29 16:00:50 +02:00
Ines Montani
45f46a5c85
Merge pull request #5993 from explosion/feature/disabled-components 2020-08-29 15:58:41 +02:00
Adriane Boyd
f9ed31a757 Update usage docs for lemmatization and morphology 2020-08-29 15:56:50 +02:00
Ines Montani
34146750d4 Use frozen list with custom errors
We don't want to break backwards compatibility too much but we also want to provide the best possible UX
2020-08-29 15:20:11 +02:00
Ines Montani
884c34d3d4
Merge pull request #5979 from explosion/chore/delete-old-files-v3 2020-08-29 15:15:34 +02:00
Ines Montani
c7cdf9a4b4 Remove include 2020-08-29 13:28:37 +02:00
Ines Montani
744f432420
Merge pull request #5994 from explosion/feature/idempotent-component-decorator 2020-08-29 13:17:13 +02:00
Ines Montani
5de3f8604d
Update spacy/util.py
Co-authored-by: Matthew Honnibal <honnibal+gh@gmail.com>
2020-08-29 13:17:06 +02:00
Ines Montani
091a9b522a Remove unused variable [ci skip] 2020-08-29 13:11:26 +02:00
Ines Montani
2bc31e15c9 Tidy up and auto-format [ci skip] 2020-08-29 13:01:10 +02:00
Ines Montani
6520d1a1df Work around set order in Language.disabled 2020-08-29 12:58:22 +02:00
Ines Montani
bc0730be3f Update docs [ci skip] 2020-08-29 12:53:14 +02:00
Ines Montani
450bf806b0
Merge pull request #5991 from adrianeboyd/docs/sent-usage-v3
Update sentence segmentation usage docs
2020-08-29 12:40:06 +02:00
Ines Montani
f45095a666
Merge pull request #5995 from adrianeboyd/bugfix/attribute-ruler-bugfixes 2020-08-29 12:38:30 +02:00
Ines Montani
b6ee284376
Merge pull request #5996 from svlandeg/feature/docs-trf-examples [ci skip]
custom transformer examples
2020-08-29 12:37:57 +02:00
Ines Montani
66d76f5126 Update docs 2020-08-29 12:36:05 +02:00
Ines Montani
e0b4984aa4 Make deprecated disable_pipes call into select_pipes 2020-08-29 12:08:46 +02:00
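The entry above routes the deprecated `disable_pipes()` call through `select_pipes()`. A short sketch of the newer call, assuming a trained pipeline such as en_core_web_sm is installed:

```python
import spacy

nlp = spacy.load("en_core_web_sm")

# Run only the selected components; everything else is restored
# automatically when the context manager exits.
with nlp.select_pipes(enable=["tok2vec", "tagger"]):
    doc = nlp("Only tagging happens inside this block.")
    print(doc[1].tag_, nlp.pipe_names)
```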
Ines Montani
15d73f4dc3 Make user-facing Language.disabled return list
More consistent with all the other properties
2020-08-29 12:08:33 +02:00
Matthew Honnibal
58f19421b1 Return empty batch from tok2vec listener if no doc.tensor 2020-08-29 03:46:50 +02:00
svlandeg
9f00a20ce4 proofreading and custom examples 2020-08-28 21:50:42 +02:00
svlandeg
5230529de2 add loggers registry & logger docs sections 2020-08-28 21:44:04 +02:00
Ines Montani
0687d7148e Rename user-facing API 2020-08-28 21:04:02 +02:00
Adriane Boyd
0104bd1600 Sort the AttributeRuler matches by rule order
Sort the returned matches by rule order (the `match_id`) so that the
rules are applied in the order they were added. This is necessary, for
instance, if the `AttributeRuler` is used for the tag map and later
rules require POS tags.
2020-08-28 21:01:06 +02:00
Ines Montani
6a999c9303 Remove outdated component attr check 2020-08-28 20:59:19 +02:00
Adriane Boyd
8674b17651 Serialize AttributeRuler.patterns
Serialize `AttributeRuler.patterns` instead of the individual lists to
simplify the serialized data and so that patterns are reloaded exactly as
they were originally provided (preserving `_attrs_unnormed`).
2020-08-28 20:44:45 +02:00
Ines Montani
10da74382f Raise if disabled components are removed before DisabledPipes.restore 2020-08-28 20:35:26 +02:00
Ines Montani
1e0363290e Remove todos and update docstrings 2020-08-28 20:34:46 +02:00
Ines Montani
cad988da7f Allow component decorators to re-run with same function 2020-08-28 16:27:22 +02:00
Ines Montani
3ce5be4b76 Allow loaded but disabled components 2020-08-28 15:20:14 +02:00
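"Loaded but disabled" components, added above, keep a component's weights available without running it. A small sketch, again assuming en_core_web_sm is installed:

```python
import spacy

# The component is loaded with the pipeline but skipped when processing text.
nlp = spacy.load("en_core_web_sm", disable=["ner"])
print(nlp.disabled)   # ['ner'], exposed as a plain list per the entries above

nlp.enable_pipe("ner")
doc = nlp("Apple is looking at buying a U.K. startup.")
print([(ent.text, ent.label_) for ent in doc.ents])
```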
Ines Montani
89f692bc8a
Merge pull request #5992 from svlandeg/feature/wandb-restrict-config 2020-08-28 15:05:29 +02:00
Ines Montani
9c4049b57f
Merge pull request #5986 from explosion/fix/language-config-interpolate-disk-bytes 2020-08-28 15:03:52 +02:00
Ines Montani
adc050cdc5 Fix code style in test [ci skip] 2020-08-28 15:03:21 +02:00
svlandeg
05a1bafa15 fix type 2020-08-28 14:08:33 +02:00
svlandeg
33883aa764 rename field 2020-08-28 14:06:23 +02:00
svlandeg
1d8c4070aa add disable_fields to wandb_logger 2020-08-28 13:55:32 +02:00
Ines Montani
a51b4f3a19 Merge branch 'develop' into fix/language-config-interpolate-disk-bytes 2020-08-28 13:21:17 +02:00
Ines Montani
03dde511b4
Merge pull request #5987 from explosion/feature/debug-config [ci skip] 2020-08-28 11:30:18 +02:00
Ines Montani
96ad472361
Merge pull request #5990 from svlandeg/fix/cli-error 2020-08-28 11:29:07 +02:00
Ines Montani
62e9967228 Merge branch 'develop' into fix/language-config-interpolate-disk-bytes 2020-08-28 11:19:36 +02:00
Ines Montani
4ca2698f85 Merge branch 'develop' into feature/debug-config 2020-08-28 11:19:17 +02:00
Adriane Boyd
48df50533d Update sentence segmentation usage docs
Update sentence segmentation usage docs to incorporate `senter`.
2020-08-28 10:58:16 +02:00
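The usage-docs update above folds the statistical senter into the sentence segmentation story. One way to use it, sketched under the assumption that the installed trained pipeline ships a `senter` component that is disabled by default:

```python
import spacy

# Drop the parser (which normally sets sentence boundaries) and switch on
# the lighter statistical sentence recognizer instead.
nlp = spacy.load("en_core_web_sm", exclude=["parser"])
nlp.enable_pipe("senter")

doc = nlp("This is a sentence. This is another one.")
print([sent.text for sent in doc.sents])
```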
svlandeg
9a8255ffd5 two tests because of different exit type 2020-08-28 10:50:26 +02:00
svlandeg
73baaf330a update error type 2020-08-28 10:46:21 +02:00
svlandeg
72a87095d9 add loggers registry 2020-08-27 20:26:28 +02:00
svlandeg
aa9e0c9c39 small fix 2020-08-27 19:56:52 +02:00
svlandeg
8cde6ccb7d Merge remote-tracking branch 'upstream/develop' into feature/vectors-docs 2020-08-27 19:56:09 +02:00
Matthew Honnibal
c558ca4485 Merge branch 'develop' of https://github.com/explosion/spaCy into develop 2020-08-27 19:47:26 +02:00
svlandeg
556e975a30 various fixes 2020-08-27 19:24:44 +02:00
Matthew Honnibal
d3ffe4ca63 Fix error when tagger was initialized with no labels 2020-08-27 18:56:58 +02:00
Ines Montani
d1780db6a4 Tidy up and use different error [ci skip] 2020-08-27 18:56:55 +02:00
Ines Montani
ff4175e839 Add more info to debug config 2020-08-27 18:17:58 +02:00
Ines Montani
daac8ebacd Don't interpolate config on Language deserialization 2020-08-27 16:44:36 +02:00
svlandeg
329e490560 small import fixes 2020-08-27 14:50:43 +02:00
svlandeg
28e4ba7270 fix references to TransformerListener 2020-08-27 14:33:28 +02:00
svlandeg
4d37ac3f33 configure_custom_sent_spans example 2020-08-27 14:14:16 +02:00
svlandeg
c68169f83f fix link 2020-08-27 10:19:43 +02:00
svlandeg
acc794c975 example of writing to other custom attribute 2020-08-27 10:10:10 +02:00
svlandeg
559b65f2e0 adjust references to null_annotation_setter to trfdata_setter 2020-08-27 09:43:32 +02:00
Matthew Honnibal
e1e1760fd6 Merge branch 'develop' of https://github.com/explosion/spaCy into develop 2020-08-27 03:22:11 +02:00
Matthew Honnibal
95adb58f15 Force tagger to pass batch of docs into model in begin_training 2020-08-27 03:21:03 +02:00
Ines Montani
cdc114e212
Merge pull request #5977 from explosion/refactor/vector-names 2020-08-26 19:03:16 +02:00
Ines Montani
8692d176f6
Merge pull request #5978 from explosion/feature/update-wasabi
Update wasabi: new diff_strings and MarkdownRenderer
2020-08-26 19:02:52 +02:00
Ines Montani
adc5d42f61 Remove more unused files 2020-08-26 15:59:46 +02:00
Ines Montani
696f167478 Add diff example to docs [ci skip] 2020-08-26 15:57:54 +02:00
Matthew Honnibal
9b22714a4e Merge branch 'develop' of https://github.com/explosion/spaCy into develop 2020-08-26 15:48:45 +02:00
Matthew Honnibal
172af24f95 Fix upload and download 2020-08-26 15:48:23 +02:00
Ines Montani
a5fff1df51 Remove outdated non-empty output dir warning [ci skip] 2020-08-26 15:45:51 +02:00
Matthew Honnibal
2d520d3b45 Remove unused error 2020-08-26 15:41:14 +02:00
Adriane Boyd
90d88729e0
Add AttributeRuler.score (#5963)
* Add AttributeRuler.score

Add scoring for TAG / POS / MORPH / LEMMA if these are present in the
assigned token attributes.

Add default score weights (that don't really make a lot of sense) so
that the scores are in the default config in some form.

* Update docs
2020-08-26 15:39:30 +02:00
Ines Montani
3aec98ca38 Update wasabi: new diff_strings and MarkdownRenderer 2020-08-26 15:33:11 +02:00
Sofie Van Landeghem
79d460e3a2
Weights & Biases logger for train CLI (#5971)
* quick test as part of train script

* train_logger in config, default ConsoleLogger in loggers catalogue

* entity typo

* add wandb_logger

* cleanup

* Update spacy/cli/train_logger.py

Co-authored-by: Ines Montani <ines@ines.io>

* move loggers to gold.loggers

Co-authored-by: Ines Montani <ines@ines.io>
2020-08-26 15:24:33 +02:00
Ines Montani
cb54f0d779
Merge pull request #5970 from explosion/chore/clean-examples-extra
Clean out /examples and /bin
2020-08-26 15:20:41 +02:00
Ines Montani
0997c30b9e
Merge pull request #5974 from explosion/feature/project-document 2020-08-26 15:14:13 +02:00
Matthew Honnibal
191fb4144f Merge branch 'develop' into refactor/vector-names 2020-08-26 14:26:45 +02:00
svlandeg
ec069627fe rename to TransformerListener 2020-08-26 13:31:01 +02:00
Ines Montani
627617a079 Tidy up and add docs [ci skip] 2020-08-26 13:24:55 +02:00
svlandeg
15902c5aa2 fix link 2020-08-26 11:51:57 +02:00
Ines Montani
8c8fbb31c8
Merge pull request #5975 from adrianeboyd/bugfix/scorer-auc-macro
Set macro AUC score in Scorer.score_cats
2020-08-26 11:49:09 +02:00
svlandeg
feb86d5206 clarify default 2020-08-26 11:21:30 +02:00
Adriane Boyd
43c61da209 Set macro AUC score in Scorer.score_cats 2020-08-26 10:49:30 +02:00
Ines Montani
aeebc6678d Small cleanup and adjustments 2020-08-26 10:26:57 +02:00
Ines Montani
31567d1e42 Link project.yml 2020-08-26 10:26:32 +02:00
Ines Montani
6c2a5ff53b Auto-link local sources 2020-08-26 10:26:06 +02:00
Matthew Honnibal
77852d2428 Fix run_command for python 3.6 2020-08-26 05:02:43 +02:00
Matthew Honnibal
884cac5fb5 Make run_command backwards compatible 2020-08-26 04:33:42 +02:00
Matthew Honnibal
6547472347 Set version to v3.0.0a12 2020-08-26 04:02:34 +02:00
Matthew Honnibal
2771e4f2b3
Fix the git "sparse checkout" functionality (#5973)
* Fix the git sparse checkout functionality

* Format
2020-08-26 04:00:14 +02:00
Ines Montani
1c958a76c1 Add comment markers to only replace auto-generated docs 2020-08-26 00:03:06 +02:00
Ines Montani
f10989e8c4 Add "project document" and more project.yml meta fields 2020-08-25 17:14:27 +02:00
Ines Montani
fdcaf86c54 Adjust docstring
End sentence earlier so it's shown as a full sentence in --help
2020-08-25 17:13:50 +02:00
Ines Montani
b89f6fa011 Fix meta defaults and error in package command 2020-08-25 17:13:33 +02:00
Ines Montani
94705c21c8 Allow reuse on validators to prevent reload error
Otherwise this will cause an error if spaCy is live reloaded, e.g. in Streamlit
2020-08-25 17:13:11 +02:00
Matthew Honnibal
4f82a02b70 Remove 'fix_pretrained_vectors_name' hack 2020-08-25 14:37:45 +02:00
Adriane Boyd
0bab7c8b91
Remove PRON_LEMMA symbol (#5968) 2020-08-25 14:21:29 +02:00
Ines Montani
b37fdbb613 Clean out /examples and /bin 2020-08-25 13:28:42 +02:00
Ines Montani
f31c4462ca Update docs [ci skip] 2020-08-25 13:27:59 +02:00
Ines Montani
dd84577a98 Update CLI utils, project.yml schema and add test 2020-08-25 11:54:53 +02:00
Ines Montani
8ac5ef1284 Update docs 2020-08-25 11:54:37 +02:00
Matthew Honnibal
ef43152af4 Update scorer 2020-08-25 02:42:47 +02:00
Matthew Honnibal
8d6e1ce306 Update v3.0.0a11 2020-08-25 00:32:08 +02:00
Matthew Honnibal
8038b87f04
Various small tweaks to project CLI (#5965)
* Fix up/download of http and local paths

* Support git_sparse_checkout for assets

* Fix scorer

* Handle already-present directories for git assets

* Improve convert command

* Fix support for existing files in git assets

* Support branches in git sparse checkout

* Format

* Fix git assets

* Document git block in assets

* Fix test

* Fix test

* Revert "Fix test"

This reverts commit cf3097260f.

* Revert "Fix test"

This reverts commit 964d636e27.

* Dont multiply p/r/f by 100

* Display scores * 100 during training
2020-08-25 00:30:52 +02:00
Adriane Boyd
abd3f2b65a
Rename Polish lemmatizer method (#5960)
Rename Polish lemmatizer method to `pos_lookup` to distinguish it from
pure token-based lookup methods.
2020-08-25 00:22:27 +02:00
Ines Montani
e12b03358b
Support removing extra values in fill-config (#5966)
* Support removing extra values in fill-config

* Fix test
2020-08-24 22:53:47 +02:00
Matthew Honnibal
f232d8db96 Report p/r/f out of 100 2020-08-24 17:17:23 +02:00
Matthew Honnibal
311e1593e6 Fix makefile 2020-08-24 17:09:05 +02:00
Matthew Honnibal
5febbee0ff Fix makefile 2020-08-24 16:59:09 +02:00
Matthew Honnibal
4fc1e57cf8 Fix makefile 2020-08-24 16:46:03 +02:00
Matthew Honnibal
f25cff1e38 Merge branch 'develop' of https://github.com/explosion/spaCy into develop 2020-08-24 16:37:34 +02:00
Matthew Honnibal
cab286fbb2 Update makefile 2020-08-24 16:32:21 +02:00
Ines Montani
0e7f99da58
Fix handling of optional [pretraining] block (#5954)
* Fix handling of optional [pretraining] block

* Remove pretraining from default config

* Fix test

* Add schema option for empty pretrain block
2020-08-24 15:56:03 +02:00
Matthew Honnibal
463f1c8623 Avoid requiring smart-open directly 2020-08-24 14:49:17 +02:00
Matthew Honnibal
2ff29e603a Update Makefile 2020-08-24 14:48:32 +02:00
Matthew Honnibal
6963260ff6 Merge branch 'develop' of https://github.com/explosion/spaCy into develop 2020-08-24 14:42:10 +02:00
Matthew Honnibal
ecd86bae84 Update Makefile 2020-08-24 14:41:56 +02:00
Matthew Honnibal
944b1246f0 Add script to get package name 2020-08-24 14:41:49 +02:00
Ines Montani
967d69ec50 Fix website deployment [ci skip] 2020-08-24 14:28:24 +02:00
Ines Montani
26405710e0 Add icon credit [ci skip] 2020-08-24 10:28:15 +02:00
Matthew Honnibal
64df37643f Update lockfile after project pull 2020-08-24 03:27:09 +02:00
Matthew Honnibal
588c28fe45 Fix project pull when deps missing 2020-08-24 01:23:36 +02:00
Matthew Honnibal
001546c19e Set version to v3.0.0a10 2020-08-23 21:15:38 +02:00
Matthew Honnibal
160a855246 Format 2020-08-23 21:15:12 +02:00
Matthew Honnibal
89f5b8abb3 Fix project push 2020-08-23 21:14:44 +02:00
Matthew Honnibal
3828bc3ed0 Merge branch 'develop' of https://github.com/explosion/spaCy into develop 2020-08-23 18:32:24 +02:00
Matthew Honnibal
e559867605
Allow spacy project to push and pull to/from remote storage (#5949)
* Add utils for working with remote storage

* WIP add remote_cache for project

* WIP add push and pull commands

* Use pathy in remote_cache

* Update util

* Update remote_cache

* Update util

* Update project assets

* Update pull script

* Update push script

* Fix type annotation in util

* Work on remote storage

* Remove site and env hash

* Fix imports

* Fix type annotation

* Require pathy

* Require pathy

* Fix import

* Add a util to handle project variable substitution

* Import push and pull commands

* Fix pull command

* Fix push command

* Fix tarfile in remote_storage

* Improve printing

* Fiddle with status messages

* Set version to v3.0.0a9

* Draft docs for spacy project remote storages

* Update docs [ci skip]

* Use Thinc config to simplify and unify template variables

* Auto-format

* Don't import Pathy globally for now

Causes slow and annoying Google Cloud warning

* Tidy up test

* Tidy up and update tests

* Update to latest Thinc

* Update docs

* variables -> vars

* Update docs [ci skip]

* Update docs [ci skip]

Co-authored-by: Ines Montani <ines@ines.io>
2020-08-23 18:32:09 +02:00
Matthew Honnibal
fe1cf7e124 Allow score_weights to list extra scores 2020-08-23 18:31:30 +02:00
Ines Montani
9bdc9e81f5 Fix error message [ci skip] 2020-08-23 12:14:02 +02:00
Ines Montani
f27aecac14 Update formatting [ci skip] 2020-08-23 11:57:56 +02:00
Ines Montani
98a9e063b6 Update docs [ci skip] 2020-08-22 17:15:05 +02:00
Matthew Honnibal
8dfc4cbfe7 Merge branch 'develop' of https://github.com/explosion/spaCy into develop 2020-08-22 17:12:09 +02:00
Matthew Honnibal
048de64d4c Suggest edits 2020-08-22 17:11:28 +02:00
Ines Montani
adcf790b96 Update docs [ci skip] 2020-08-22 17:04:16 +02:00
Ines Montani
37ebff6997 Update docs [ci skip] 2020-08-22 16:47:03 +02:00
Matthew Honnibal
8685229891 Merge branch 'develop' of https://github.com/explosion/spaCy into develop 2020-08-22 16:06:59 +02:00
Matthew Honnibal
d97695d09d Update embeddings-transformers.md 2020-08-22 15:41:35 +02:00
Ines Montani
c7c9b0451f Update docs [ci skip] 2020-08-22 13:52:52 +02:00
Ines Montani
9740f1712b Re-add font with lowercase name [ci skip] 2020-08-22 12:27:02 +02:00
Ines Montani
ff5ba14d06 Remove font [ci skip] 2020-08-22 12:26:50 +02:00
Ines Montani
71aeae89c5
Merge pull request #5948 from svlandeg/feature/docs-docs-docs [ci skip] 2020-08-22 12:18:47 +02:00
Ines Montani
27f81109d6 Update docs [ci skip] 2020-08-21 20:02:18 +02:00
Ines Montani
f102164a1f Update docs [ci skip] 2020-08-21 19:34:06 +02:00
svlandeg
1b7cfa7347 Merge remote-tracking branch 'upstream/develop' into feature/docs-docs-docs 2020-08-21 18:36:18 +02:00
svlandeg
942adf0f4d comma 2020-08-21 18:36:02 +02:00
svlandeg
262552010d context manager with space (for consistency) 2020-08-21 18:34:02 +02:00
svlandeg
da48c6a2a2 several small updates 2020-08-21 18:25:26 +02:00
svlandeg
ad2332d4b7 alphabetize registries 2020-08-21 18:10:31 +02:00
svlandeg
dc98f69b57 alphabetize registries 2020-08-21 18:10:21 +02:00
svlandeg
c6659e37d8 small fixes 2020-08-21 18:02:20 +02:00
svlandeg
518a1f97f3 remove outdated TODO's 2020-08-21 17:55:15 +02:00
svlandeg
e92bd6e1c1 alphabetize training lists 2020-08-21 17:42:19 +02:00
Ines Montani
2cc4640385 Update docs [ci skip] 2020-08-21 16:21:55 +02:00
Ines Montani
74cb6d39d0 Update docs [ci skip] 2020-08-21 16:11:38 +02:00
svlandeg
af36d77d01 fix typo in docstring 2020-08-21 15:56:03 +02:00
Matthew Honnibal
f5bcc10268 Update architectures 2020-08-21 15:34:54 +02:00
Matthew Honnibal
7ed8f4504b Update API docs for architectures 2020-08-21 15:22:19 +02:00
svlandeg
dcc21e44cb delete empty file 2020-08-21 15:17:20 +02:00
svlandeg
3060e4ae65 Merge remote-tracking branch 'upstream/develop' into feature/docs-docs-docs
# Conflicts:
#	website/src/widgets/quickstart-training-generator.js
2020-08-21 15:16:30 +02:00
svlandeg
cc926267f8 small fixes 2020-08-21 15:05:40 +02:00
Ines Montani
aa6a7cd6e7 Update docs and consistency [ci skip] 2020-08-21 13:49:18 +02:00
Ines Montani
52bd3a8b48 Update docs [ci skip] 2020-08-21 13:22:59 +02:00
Ines Montani
3826cfb8fe
Merge pull request #5930 from svlandeg/feature/init-config-fix
UX for init config
2020-08-21 12:06:33 +02:00
Ines Montani
79af7dcd6d Small wording adjustments [ci skip] 2020-08-21 12:06:19 +02:00
Ines Montani
e60442d83a Adjust label casing in displaCy NER visualizer (resolves #4866)
- Accept any case for label names in ents and colors option, even if actual predicted label uses different casing
- Don't text-transform: uppercase visually, if it's important to users that the label is represented as-is in the UI
2020-08-21 11:51:31 +02:00
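A small sketch of the displaCy behaviour the fix above relaxes: keys in the `ents` and `colors` options now match predicted labels regardless of casing, and labels are no longer force-uppercased visually. The hand-set entities are only there to avoid needing a trained model.

```python
import spacy
from spacy import displacy
from spacy.tokens import Span

nlp = spacy.blank("en")
doc = nlp("Ines founded Explosion in Berlin.")
doc.ents = [Span(doc, 2, 3, label="Org"), Span(doc, 4, 5, label="Gpe")]

# Lower-case option keys now match the mixed-case labels above.
options = {"ents": ["org", "gpe"], "colors": {"org": "#7aecec", "gpe": "#feca74"}}
html = displacy.render(doc, style="ent", options=options)
```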
Matthew Honnibal
c356e62908 Minor adjustments to quickstart template 2020-08-21 00:10:21 +02:00
Ines Montani
04e4d59235 Update docs [ci skip] 2020-08-20 16:17:25 +02:00
Ines Montani
7f2e4244df
Merge pull request #5941 from svlandeg/feature/update-more-docs 2020-08-20 11:21:24 +02:00
Ines Montani
3c6f808758 Merge branch 'develop' of https://github.com/explosion/spaCy into develop 2020-08-20 11:21:02 +02:00
Ines Montani
6ad59d59fe Merge branch 'develop' of https://github.com/explosion/spaCy into develop [ci skip] 2020-08-20 11:20:58 +02:00
Ines Montani
fb51b55eb9 Add comment [ci skip] 2020-08-20 11:20:43 +02:00
Sofie Van Landeghem
410b54e10e
Update website/docs/api/data-formats.md
Co-authored-by: Ines Montani <ines@ines.io>
2020-08-20 11:15:34 +02:00
svlandeg
ae719b354f fix typos 2020-08-20 10:20:40 +02:00
svlandeg
f728c00cbb Merge remote-tracking branch 'upstream/develop' into feature/update-more-docs
# Conflicts:
#	website/docs/api/data-formats.md
2020-08-20 10:02:13 +02:00
svlandeg
229033831a add explanation of raw_text 2020-08-20 10:00:45 +02:00
Ines Montani
2253d26b82 Update vectors and similarity docs [ci skip] 2020-08-19 21:18:26 +02:00
Ines Montani
ea6640ea72
Merge pull request #5939 from explosion/feature/thinc-v8.0.0a28
Update Thinc and config variables
2020-08-19 21:14:36 +02:00
Ines Montani
15e6feed01 Update docs [ci skip] 2020-08-19 20:37:54 +02:00
Ines Montani
7a8cc64ea8
Merge pull request #5938 from svlandeg/feature/update-docs [ci skip] 2020-08-19 20:24:02 +02:00
svlandeg
09f3cfc985 add version 2020-08-19 19:58:45 +02:00
svlandeg
7d9f00bdbf waltzing schedule 2020-08-19 19:53:00 +02:00
Ines Montani
3dd390b1a1 Update Thinc and config variables 2020-08-19 19:46:12 +02:00
svlandeg
85b39639e1 small fix 2020-08-19 19:17:36 +02:00
Sofie Van Landeghem
cb9a2402ee
Include yml files in cli folder 2020-08-19 19:05:31 +02:00
svlandeg
b96cd9fa5e fix typo 2020-08-19 18:46:08 +02:00
svlandeg
d8f6abdc23 add linking TODO back in 2020-08-19 18:00:35 +02:00
svlandeg
169b5bcda0 Merge remote-tracking branch 'upstream/develop' into feature/update-docs
# Conflicts:
#	website/docs/usage/training.md
2020-08-19 17:58:25 +02:00
svlandeg
7119295a8a badgers intro 2020-08-19 17:53:22 +02:00
svlandeg
4906a2ae6c custom functions intro 2020-08-19 17:32:35 +02:00
svlandeg
7a2e6a96f5 fix typo 2020-08-19 16:54:16 +02:00
svlandeg
648499157a rename "custom models" to "custom functions" 2020-08-19 16:53:51 +02:00
Ines Montani
63921161c8 Update docs [ci skip] 2020-08-19 16:04:21 +02:00
svlandeg
d3a8321172 fix typos 2020-08-19 15:12:12 +02:00
svlandeg
60fedb8518 fix 2 more API lines 2020-08-19 14:55:32 +02:00
svlandeg
2dfd919585 add kb_loader and get_candidates back to EL API 2020-08-19 14:52:49 +02:00
Ines Montani
e2f2ef3a5a Update init config and recommendations
- As much as I dislike YAML, it seemed like a better format here because it allows us to add comments if we want to explain the different recommendations
- Don't include the generated JS in the repo by default and build it on the fly when running or deploying the site. This ensures it's always up to date.
- Simplify jinja_to_js script and use fewer dependencies
2020-08-19 13:33:15 +02:00
Ines Montani
225f8866a1 Fix consistency 2020-08-19 12:47:57 +02:00
Ines Montani
9c25656ccc Update docs [ci skip] 2020-08-19 12:14:41 +02:00
Ines Montani
2285e59765
Merge pull request #5933 from svlandeg/feature/more-v3-docs [ci skip] 2020-08-19 11:29:02 +02:00
Ines Montani
13291e97ba Update docs [ci skip] 2020-08-19 00:28:37 +02:00
Matthew Honnibal
c0f6e77a41 Set version to v3.0.0a8 2020-08-18 23:29:00 +02:00
svlandeg
6ed67d495a format 2020-08-18 19:43:20 +02:00
svlandeg
f9fe5eb323 clean up example 2020-08-18 19:35:23 +02:00
svlandeg
a8acedd4ba example of custom reader and batcher 2020-08-18 19:15:16 +02:00
svlandeg
0d55b6ebb4 formatting 2020-08-18 18:55:56 +02:00
svlandeg
abba639565 Merge remote-tracking branch 'upstream/develop' into feature/more-v3-docs 2020-08-18 18:55:12 +02:00
Sofie Van Landeghem
358cbb21e3
Define candidate generator in EL config (#5876)
* candidate generator as separate part of EL config

* update comment

* ent instead of str as input for candidate generation

* Span instead of str: correct type indication

* fix types

* unit test to create new candidate generator

* fix replace_pipe argument passing

* move error message, general cleanup

* add vocab back to KB constructor

* provide KB as callable from Vocab arg

* rename to kb_loader, fix KB serialization as part of the EL pipe

* fix typo

* reformatting

* cleanup

* fix comment

* fix wrongly duplicated code from merge conflict

* rename dump to to_disk

* from_disk instead of load_bulk

* update test after recent removal of set_morphology in tagger

* remove old doc
2020-08-18 16:10:36 +02:00
Sofie Van Landeghem
688e77562b
Train CLI script fixes (#5931)
* fix dash replacement in overrides arguments

* perform interpolation on training config

* make sure only .spacy files are read
2020-08-18 16:06:37 +02:00
Ines Montani
82f0e20318 Update docs and consistency [ci skip] 2020-08-18 14:39:40 +02:00
Matthew Honnibal
b72bd1767f Remove todo 2020-08-18 13:52:22 +02:00
Matthew Honnibal
574fd53289 Add precision/recall description 2020-08-18 13:51:08 +02:00
Matthew Honnibal
96a9c65f97 Add model architectures intro 2020-08-18 13:50:55 +02:00
svlandeg
10e67b400c output_file required, spacy-transformers preferred instead of required 2020-08-18 13:38:43 +02:00
svlandeg
705e1cb06c typo in link 2020-08-18 12:04:05 +02:00
svlandeg
f7b76d2d83 Merge remote-tracking branch 'upstream/develop' into feature/more-v3-docs 2020-08-18 11:57:52 +02:00
svlandeg
8dcda351ec typos and quick note on default values 2020-08-18 10:23:27 +02:00
Ines Montani
ef6cf3b276 Update docs [ci skip] 2020-08-18 01:29:34 +02:00
Ines Montani
1c3bcfb488 Update docs and util consistency 2020-08-18 01:22:59 +02:00
Ines Montani
728fec0194 Update docs [ci skip] 2020-08-18 00:49:19 +02:00
Ines Montani
9299166c75
Merge pull request #5925 from explosion/docs/vectors [ci skip]
Update the 'vectors' docs page
2020-08-17 21:45:09 +02:00
Ines Montani
990c6b4c32 Update docs and CLI [ci skip] 2020-08-17 21:38:20 +02:00
svlandeg
4fe4bab1c9 typo fixes 2020-08-17 17:10:15 +02:00
svlandeg
da80c18660 merge develop into branch 2020-08-17 16:57:18 +02:00
Ines Montani
3ae5e02f4f Update docs, types and API consistency 2020-08-17 16:45:24 +02:00
Matthew Honnibal
052d82aa4e Suggest vectors changes 2020-08-17 15:32:30 +02:00
svlandeg
961e818be6 p/r definitions 2020-08-17 15:02:39 +02:00
svlandeg
6b6f7f3e73 fix windows compat 2020-08-17 14:48:58 +02:00
svlandeg
319692aa53 fix typos 2020-08-17 14:05:48 +02:00
Matthew Honnibal
61dfdd9fbd Merge branch 'develop' of https://github.com/explosion/spaCy into develop 2020-08-16 20:30:01 +02:00
Matthew Honnibal
be07567ac6 Update transformers page 2020-08-16 20:29:50 +02:00
Matthew Honnibal
8e5f99ee25 Update transformer docs intro. Also write system requirements 2020-08-16 20:13:24 +02:00
Ines Montani
2ac4b0ef3e Finish Transformer docs [ci skip] 2020-08-16 15:56:32 +02:00
Matthew Honnibal
a95a36ce2a Set version to v3.0.0a7 2020-08-16 15:51:05 +02:00
Ines Montani
6ae83bde0c Fix CLI consistency [ci skip] 2020-08-16 15:46:29 +02:00
Ines Montani
45f13cbf64
Merge pull request #5916 from explosion/feature/new-thinc-config 2020-08-16 15:24:12 +02:00
Ines Montani
34bda91695 Show warnings if there's nothing to auto-fill 2020-08-16 14:19:43 +02:00
Ines Montani
dd5804d499 Update type hints 2020-08-16 14:19:33 +02:00
Ines Montani
26b3055968 Update MANIFEST.in 2020-08-15 15:07:25 +02:00
Ines Montani
a570c304df Update quickstart, template and docs 2020-08-15 14:50:29 +02:00
Ines Montani
3272a63430
Merge pull request #5920 from explosion/fix/logging-warning-various 2020-08-15 14:41:15 +02:00
Ines Montani
daba316930 Update Thinc version 2020-08-14 18:39:51 +02:00
Ines Montani
fdcde9b0bf Add init fill-config 2020-08-14 16:49:26 +02:00
Matthew Honnibal
9ebf39fb5f Relax test 2020-08-14 16:31:09 +02:00
Ines Montani
8128e5eb35 Replace lexeme_norm warning with logging 2020-08-14 15:00:52 +02:00
Ines Montani
37814b608d Remove env_opt and simplify default Optimizer 2020-08-14 14:59:54 +02:00
Ines Montani
ab1d165bba Pass optimizer defined in config to resume/begin_training
Otherwise, this would create a default optimizer, which isn't what we want?
2020-08-14 14:59:22 +02:00
Ines Montani
e4d0990857 Only receive from listener if listener exists 2020-08-14 14:58:48 +02:00
Ines Montani
cef97e4b63 Fix path check 2020-08-14 14:58:18 +02:00
Ines Montani
db2dbc8e59 Remove unused warning 2020-08-14 14:58:03 +02:00
Ines Montani
67cc39af7f Update Thinc and include section order 2020-08-14 14:06:22 +02:00
Ines Montani
8736bfc052 Add comment about auto-generated file [ci skip] 2020-08-13 23:27:25 +02:00
Ines Montani
88b0a96801 Update for new Thinc and adjust config 2020-08-13 17:38:30 +02:00
Matthew Honnibal
965805f372 Add draft transformer template 2020-08-13 15:21:42 +02:00
Matthew Honnibal
efcf15bddf Fix quickstart cpu template 2020-08-13 15:21:26 +02:00
Ines Montani
7d526d0d40 Update docs and quickstart widget [ci skip] 2020-08-13 01:17:40 +02:00
graue70
ba84371ab0
Use init parameter (#5909) 2020-08-11 23:41:58 +02:00
Ines Montani
950832f087
Tidy up pipes (#5906)
* Tidy up pipes

* Fix init, defaults and raise custom errors

* Update docs

* Update docs [ci skip]

* Apply suggestions from code review

Co-authored-by: Matthew Honnibal <honnibal+gh@gmail.com>

* Tidy up error handling and validation, fix consistency

* Simplify get_examples check

* Remove unused import [ci skip]

Co-authored-by: Matthew Honnibal <honnibal+gh@gmail.com>
2020-08-11 23:29:31 +02:00
Ines Montani
b7ec06e331 Update docs [ci skip] 2020-08-11 20:57:23 +02:00
Ines Montani
10f42e3a39 Update docs [ci skip] 2020-08-11 00:09:49 +02:00
Ines Montani
2778d04377 Update docs [ci skip] 2020-08-10 23:41:09 +02:00
Ines Montani
f79e4c094d Remove generic type
Seems to cause error on Python 3.8 with Cython?
2020-08-10 17:24:30 +02:00
Ines Montani
adf2b1c8a9 Update graphic [ci skip] 2020-08-10 17:20:04 +02:00
Ines Montani
023ba7ae26 Update docs 2020-08-10 17:13:11 +02:00
Ines Montani
c099f6eece Add Token.lex 2020-08-10 16:43:52 +02:00
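`Token.lex`, added above, exposes a token's underlying lexeme directly. A tiny sketch:

```python
import spacy

nlp = spacy.blank("en")
doc = nlp("I like cheese")

lex = doc[2].lex   # the context-independent Lexeme behind "cheese"
print(lex.text, lex.is_alpha, lex.lower_)
print(lex.orth == nlp.vocab["cheese"].orth)  # same vocab entry
```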
Ines Montani
933a7cf8d1 Fix Lexeme.from_ptr 2020-08-10 16:43:37 +02:00
Ines Montani
64f2f84098 Update docstrings and docs [ci skip] 2020-08-10 13:45:22 +02:00
Ines Montani
a4b448eec4 Remove unused compiler flag 2020-08-10 13:13:18 +02:00
Ines Montani
12052bd8f6 Update docs [ci skip] 2020-08-10 01:20:10 +02:00
Ines Montani
0832cdd443 Fix formatting [ci skip] 2020-08-10 00:46:32 +02:00
Ines Montani
d611cbef43 Update docs [ci skip] 2020-08-10 00:42:26 +02:00
Ines Montani
c044460823 Update docs [ci skip] 2020-08-10 00:01:38 +02:00
Ines Montani
3eaeb73342 Tidy up and auto-format 2020-08-09 22:36:23 +02:00
Ines Montani
05dcab10aa Fix typo 2020-08-09 22:34:03 +02:00
Ines Montani
d5c78c7a34 Update docs and fix consistency 2020-08-09 22:31:52 +02:00
Ines Montani
7c6854d8d4 Fix missing imports 2020-08-09 22:28:29 +02:00
Matthew Honnibal
0fc13b2f14 Set version to v3.0.0a6 2020-08-09 21:53:32 +02:00
Ines Montani
a15c5fb191 Update docstrings and docs 2020-08-09 16:10:48 +02:00
Ines Montani
8d2baa153d Update tokenizer docs and add test 2020-08-09 15:24:01 +02:00
Matthew Honnibal
134d933d67 Add docstring for entity linker factory 2020-08-09 15:19:28 +02:00
Matthew Honnibal
992ee1c02f Update tagger docstring 2020-08-09 15:09:31 +02:00
Matthew Honnibal
ebf9a7acbf Add textcat docstring 2020-08-09 15:07:09 +02:00
Matthew Honnibal
8a13f510d6 Update tests 2020-08-09 15:01:16 +02:00
Matthew Honnibal
bbd8acd4bf Add docstrings for parser and NER. Simplify some arguments 2020-08-09 14:46:13 +02:00
Matthew Honnibal
39a3d64c01 Add docstrings for Tok2Vec component 2020-08-09 00:48:03 +02:00
Ines Montani
46bc513a4e Update docs [ci skip] 2020-08-07 20:14:31 +02:00
Ines Montani
fd20f84927
Merge pull request #5895 from explosion/docs/batchers
Draft docstrings for batchers
2020-08-07 20:07:10 +02:00
Ines Montani
2611d7a9af
Merge pull request #5894 from explosion/docs/model-docstrings 2020-08-07 19:17:11 +02:00
Matthew Honnibal
f5c4e0b751 Add docstrings for batchers 2020-08-07 18:51:02 +02:00
Ines Montani
fe29ceec9e Merge branch 'develop' into docs/model-docstrings 2020-08-07 18:42:01 +02:00
Ines Montani
470b6f8073 Update docs 2020-08-07 18:41:15 +02:00
Ines Montani
3a193eb8f1 Fix imports, types and default configs 2020-08-07 18:40:54 +02:00
Ines Montani
3901b088ff Update graphics and 101 [ci skip] 2020-08-07 17:14:13 +02:00
Matthew Honnibal
b1d83fc13e Fix imports 2020-08-07 16:55:54 +02:00
Matthew Honnibal
473504d837 Format 2020-08-07 16:49:00 +02:00
Matthew Honnibal
234c52a91e Add tok2vec docstrings 2020-08-07 16:48:48 +02:00
Ines Montani
5e1421e5a6 Update docs [ci skip] 2020-08-07 16:23:12 +02:00
Matthew Honnibal
547bc8a82b Add docstring notes 2020-08-07 16:17:34 +02:00
Ines Montani
b7e34c1451 Update docs [ci skip] 2020-08-07 16:13:13 +02:00
Ines Montani
6f3649923c
Merge pull request #5893 from explosion/feature/validate-arg 2020-08-07 15:47:20 +02:00
Ines Montani
e829d3bf14 Update docs [ci skip] 2020-08-07 15:46:20 +02:00
Adriane Boyd
e962784531
Add Lemmatizer and simplify related components (#5848)
* Add Lemmatizer and simplify related components

* Add `Lemmatizer` pipe with `lookup` and `rule` modes using the
`Lookups` tables.
* Reduce `Tagger` to a simple tagger that sets `Token.tag` (no pos or lemma)
* Reduce `Morphology` to only keep track of morph tags (no tag map, lemmatizer,
or morph rules)
* Remove lemmatizer from `Vocab`
* Adjust many many tests

Differences:

* No default lookup lemmas
* No special treatment of TAG in `from_array` and similar required
* Easier to modify labels in a `Tagger`
* No extra strings added from morphology / tag map

* Fix test

* Initial fix for Lemmatizer config/serialization

* Adjust init test to be more generic

* Adjust init test to force empty Lookups

* Add simple cache to rule-based lemmatizer

* Convert language-specific lemmatizers

Convert language-specific lemmatizers to component lemmatizers. Remove
previous lemmatizer class.

* Fix French and Polish lemmatizers

* Remove outdated UPOS conversions

* Update Russian lemmatizer init in tests

* Add minimal init/run tests for custom lemmatizers

* Add option to overwrite existing lemmas

* Update mode setting, lookup loading, and caching

* Make `mode` an immutable property
* Only enforce strict `load_lookups` for known supported modes
* Move caching into individual `_lemmatize` methods

* Implement strict when lang is not found in lookups

* Fix tables/lookups in make_lemmatizer

* Reallow provided lookups and allow for stricter checks

* Add lookups asset to all Lemmatizer pipe tests

* Rename lookups in lemmatizer init test

* Clean up merge

* Refactor lookup table loading

* Add helper from `load_lemmatizer_lookups` that loads required and
optional lookups tables based on settings provided by a config.

Additional slight refactor of lookups:

* Add `Lookups.set_table` to set a table from a provided `Table`
* Reorder class definitions to be able to specify type as `Table`

* Move registry assets into test methods

* Refactor lookups tables config

Use class methods within `Lemmatizer` to provide the config for
particular modes and to load the lookups from a config.

* Add pipe and score to lemmatizer

* Simplify Tagger.score

* Add missing import

* Clean up imports and auto-format

* Remove unused kwarg

* Tidy up and auto-format

* Update docstrings for Lemmatizer

Update docstrings for Lemmatizer.

Additionally modify `is_base_form` API to take `Token` instead of
individual features.

* Update docstrings

* Remove tag map values from Tagger.add_label

* Update API docs

* Fix relative link in Lemmatizer API docs
2020-08-07 15:27:13 +02:00
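The PR above turns lemmatization into a standalone `Lemmatizer` component with `lookup` and `rule` modes. A minimal sketch of adding it to a blank pipeline, assuming the spacy-lookups-data package is installed so the required tables can be loaded (and using the released v3 `initialize()` name):

```python
import spacy

nlp = spacy.blank("en")
# "lookup" mode only needs the lemma_lookup table; "rule" mode additionally
# needs POS tags plus rule/exception/index tables.
lemmatizer = nlp.add_pipe("lemmatizer", config={"mode": "lookup"})
nlp.initialize()   # loads the lookup tables

doc = nlp("the quick brown foxes jumped")
print([token.lemma_ for token in doc])
```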
Adriane Boyd
4aecccf153 Update API docs for AttributeRuler.__init__ 2020-08-07 15:17:25 +02:00
Matthew Honnibal
da6e59519e Add docstrings for simple_ner 2020-08-07 15:09:49 +02:00
Matthew Honnibal
7ef8a64df9 Add docstring for parser 2020-08-07 14:59:34 +02:00
Ines Montani
fc9a4fe827 Update attribute ruler 2020-08-07 14:43:55 +02:00
Ines Montani
a8404c3517 validation -> validate 2020-08-07 14:43:47 +02:00
Ines Montani
955d7b1b6b Update to latest Thinc 2020-08-07 14:41:35 +02:00
Ines Montani
1d01d89b79 Update CLI docs and evaluate command [ci skip] 2020-08-07 14:40:58 +02:00
Ines Montani
ef2c67cca5
Add DocBin to/from_disk methods and update docs (#5892)
* Add DocBin to/from_disk methods and update docs

* Use DocBin.from_disk in Corpus
2020-08-07 14:30:59 +02:00
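A short round-trip sketch of the `DocBin.to_disk`/`from_disk` methods added above, which write the same `.spacy` format the v3 `Corpus` reads for training data:

```python
import spacy
from spacy.tokens import DocBin

nlp = spacy.blank("en")
docs = [nlp("First document."), nlp("Second document.")]

# Serialize a collection of Doc objects into a single .spacy file ...
DocBin(docs=docs).to_disk("./sample.spacy")

# ... and load them back against a compatible vocab.
doc_bin = DocBin().from_disk("./sample.spacy")
print([doc.text for doc in doc_bin.get_docs(nlp.vocab)])
```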
Ines Montani
4ca08c6d5d
Merge pull request #5891 from adrianeboyd/docs/attribute-ruler-api
Add AttributeRuler API docs
2020-08-07 13:55:12 +02:00
Adriane Boyd
b8d0c23857 Add AttributeRuler API docs
With additional minor updates to AttributeRuler docstrings.
2020-08-07 12:43:23 +02:00
Ines Montani
21c9ea5bd7
Merge pull request #5890 from svlandeg/feature/el-docs 2020-08-07 11:56:56 +02:00
svlandeg
824f4b2107 casing consistent 2020-08-06 23:20:13 +02:00
svlandeg
b17db0e994 Merge remote-tracking branch 'upstream/develop' into feature/el-docs
# Conflicts:
#	website/docs/usage/training.md
2020-08-06 19:48:52 +02:00
svlandeg
49ddeb99ea add textcat architectures documentation 2020-08-06 19:44:47 +02:00
Adriane Boyd
06c3a5e048
Add pipe to AttributeRuler (#5889) 2020-08-06 19:43:09 +02:00
Ines Montani
9b7f198390 Fix format 2020-08-06 19:30:53 +02:00
Ines Montani
3c4389110d Remove unused imports 2020-08-06 19:30:47 +02:00
Ines Montani
e5995904d6 Update docs 2020-08-06 19:30:43 +02:00
svlandeg
e8fd0c1f1e EL architectures documentation 2020-08-06 17:41:26 +02:00
svlandeg
f396f091dc update EL API 2020-08-06 16:40:48 +02:00
Matthew Honnibal
d4525816ef
Be less choosy about reporting textcat scores (#5879)
* Set textcat scores more consistently

* Refactor textcat scores

* Fixes to scorer

* Add comments

* Add threshold

* Rename just 'f' to micro_f in textcat scorer

* Fix textcat score for two-class

* Fix syntax

* Fix textcat score

* Fix docstring
2020-08-06 16:24:13 +02:00
svlandeg
81d0b1c390 update EL pipe arguments 2020-08-06 16:22:50 +02:00
svlandeg
0b4d1e1bc4 'debug data' instead of 'debug-data' 2020-08-06 15:47:31 +02:00
svlandeg
881e3f8fd0 add docbin explanation and example 2020-08-06 15:29:44 +02:00
Adriane Boyd
5e683a6e46
Fix return values for per feat score (#5885)
* Fix return values for per feat score

Convert `PRFScore` to dict as other per type scores.

* Update tests accordingly
2020-08-06 15:14:47 +02:00
Ines Montani
5d417d3b19 WIP: Update docs [ci skip] 2020-08-06 13:10:15 +02:00
Ines Montani
4d34efa697 Tidy up docs components [ci skip] 2020-08-06 01:22:49 +02:00
Ines Montani
30f316c688 Fix server-side rendering [ci skip] 2020-08-06 00:51:55 +02:00
Ines Montani
913d21f0a3
Merge pull request #5882 from explosion/feature/raise-from
Use "raise ... from" in custom errors for better tracebacks
2020-08-06 00:35:26 +02:00
Ines Montani
06e80d95cd
Sync develop with nightly docs state (#5883)
Co-authored-by: svlandeg <sofie.vanlandeghem@gmail.com>
2020-08-06 00:28:14 +02:00
Ines Montani
d92954ac1d
Merge pull request #5881 from explosion/feature/better-error-model-shortcuts 2020-08-06 00:13:35 +02:00
Ines Montani
56c17973aa Use "raise ... from" in custom errors for better tracebacks 2020-08-05 23:53:21 +02:00
Ines Montani
5cc0d89fad
Simplify config overrides in CLI and deserialization (#5880) 2020-08-05 23:35:09 +02:00
Ines Montani
0881455a5d Update error message 2020-08-05 23:15:05 +02:00
Ines Montani
2a1fa86a0d Add better error for failed model shortcut loading 2020-08-05 23:10:29 +02:00
Ines Montani
50311a4d37 Update docs [ci skip] 2020-08-05 20:29:53 +02:00
Ines Montani
c675746ca2 Update docstrings and types 2020-08-05 20:29:46 +02:00
Ines Montani
823e533dc1
Add config callbacks for modifying nlp object before and after init (#5866)
* WIP: Concept for modifying nlp object before and after init

* Make callbacks return nlp object

Co-authored-by: Matthew Honnibal <honnibal+gh@gmail.com>

* Raise if callbacks don't return correct type

* Rename, update types, add after_pipeline_creation

Co-authored-by: Matthew Honnibal <honnibal+gh@gmail.com>
2020-08-05 19:47:54 +02:00
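The callbacks PR above lets a registered function adjust the `nlp` object around construction. A rough sketch; the config section names are given as they ended up in released v3 (`[nlp.after_pipeline_creation]` and friends) and may differ slightly from the alpha discussed here:

```python
import spacy
from spacy.language import Language

@spacy.registry.callbacks("add_custom_settings")
def create_nlp_callback():
    # Referenced from the config, roughly:
    #   [nlp.after_pipeline_creation]
    #   @callbacks = "add_custom_settings"
    def modify_nlp(nlp: Language) -> Language:
        nlp.max_length = 2_000_000
        # Callbacks must hand the (possibly modified) nlp object back.
        return nlp

    return modify_nlp
```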
Ines Montani
586d695775 Merge branch 'develop' of https://github.com/explosion/spaCy into develop 2020-08-05 16:01:11 +02:00
Ines Montani
e68459296d Tidy up and auto-format 2020-08-05 16:00:59 +02:00
Matthew Honnibal
50c0e49741 Fix train CLI 2020-08-05 15:40:47 +02:00
Matthew Honnibal
b9df4d6116 Fix textcat.begin_training if vectors set 2020-08-05 15:40:36 +02:00
Ines Montani
2a4d56e730 Update docs 2020-08-05 15:01:00 +02:00
Ines Montani
cdec46493f Update docs 2020-08-05 15:00:54 +02:00
Ines Montani
ab5ef37abb Update to latest Thinc 2020-08-05 15:00:49 +02:00
Adriane Boyd
af125875cf
Update SimpleNER (#5878)
* Fix `get_loss` to use NER annotation
* Add labels as part of cfg
* Add simple overfitting test
2020-08-05 14:43:29 +02:00
Bram Vanroy
d7926de1e5 Update universe details spacy_conll (#5871) 2020-08-05 14:34:34 +02:00
Sofie Van Landeghem
b88c5c701a
Bugfix in nlp.replace_pipe (#5875)
* bugfix and unit test

* merge two conditions
2020-08-05 09:30:58 +02:00
Ines Montani
b795f02fbd
Allow adding pipeline components from source model (#5857)
* Allow adding pipeline components from source model

* Config: name -> component

* Improve error messages

* Fix error and test

* Add frozen components and exclude logic

* Remove exclude from Language.evaluate

* Init sourced components with current vocab

* Fix error codes
2020-08-04 23:39:19 +02:00
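Sourcing a trained component from another pipeline, as added in the PR above. A sketch assuming en_core_web_sm is installed; note that components sharing a tok2vec listener may also need their tok2vec sourced (or their listeners replaced) before they will run correctly:

```python
import spacy

source_nlp = spacy.load("en_core_web_sm")

nlp = spacy.blank("en")
# Copy the trained NER component (with its weights) instead of training it.
nlp.add_pipe("ner", source=source_nlp)
print(nlp.pipe_names)   # ['ner']
```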
Sofie Van Landeghem
34873c4911
Example Dict format consistency (#5858)
* consistently use upper-case IDS in token_annotation format and for get_aligned

* remove ID from to_dict (not used in from_dict either)

* fix test

Co-authored-by: Matthew Honnibal <honnibal+gh@gmail.com>
2020-08-04 22:22:26 +02:00
Adriane Boyd
fa79a0db9f
Add AttributeRuler for token attribute exceptions (#5842)
* Add AttributeRuler for token attribute exceptions

Add the `AttributeRuler` to handle exceptions for token-level
attributes. The `AttributeRuler` uses `Matcher` patterns to identify
target spans and applies the specified attributes to the token at the
provided index in the matched span. A negative index can be used to
index from the end of the matched span. The retokenizer is used to
"merge" the individual tokens and assign them the provided attributes.

Helper functions can import existing tag maps and morph rules to the
corresponding `Matcher` patterns.

There is an additional minor bug fix for `MORPH` attributes in the
retokenizer to correctly normalize the values and to handle `MORPH`
alongside `_` in an attrs dict.

* Fix default name

* Update name in error message

* Extend AttributeRuler functionality

* Add option to initialize with a dict of AttributeRuler patterns

* Instead of silently discarding overlapping matches (the default
behavior for the retokenizer if only the attrs differ), split the
matches into disjoint sets and retokenize each set separately. This
allows, for instance, one pattern to set the POS and another pattern to
set the lemma. (If two matches modify the same attribute, it looks like
the attrs are applied in the order they were added, but it may not be
deterministic?)

* Improve types

* Sort spans before processing

* Fix index boundaries in Span

* Refactor retokenizer to separate attrs methods

Add top-level `normalize_token_attrs` and `set_token_attrs` methods.

* Update AttributeRuler to use refactored methods

Update `AttributeRuler` to replace use of full retokenizer with only the
relevant methods for normalizing and setting attributes for a single
token.

* Update spacy/pipeline/attributeruler.py

Co-authored-by: Ines Montani <ines@ines.io>

* Make API more similar to EntityRuler

* Add `AttributeRuler.add_patterns` to add patterns from a list of dicts
* Return list of dicts as property `AttributeRuler.patterns`

* Make attrs_unnormed private

* Add test loading patterns from assets

* Revert "Fix index boundaries in Span"

This reverts commit 8f8a5c3386.

* Add Span index boundary checks (#5861)

* Add Span index boundary checks

* Return Span-specific IndexError in all cases

* Simplify and fix if/else

Co-authored-by: Ines Montani <ines@ines.io>
2020-08-04 17:02:39 +02:00
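The AttributeRuler introduced above maps Matcher patterns to token-level attribute exceptions. A minimal sketch of the dict-based `add_patterns` API:

```python
import spacy

nlp = spacy.blank("en")
ruler = nlp.add_pipe("attribute_ruler")

# Each entry targets a matched span; "index" picks the token inside the
# span that receives the attributes (negative values index from the end).
ruler.add_patterns([
    {
        "patterns": [[{"LOWER": "who"}]],
        "attrs": {"TAG": "WP", "POS": "PRON"},
        "index": 0,
    }
])

doc = nlp("Who knew?")
print(doc[0].tag_, doc[0].pos_)
```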
Sofie Van Landeghem
492d1ec5de
Prevent alignment when texts don't match (#5867)
* remove empty gold.pyx

* add alignment unit test (to be used in docs)

* ensure that Alignment is only used on equal texts

* additional test using example.alignment

* formatting

Co-authored-by: Matthew Honnibal <honnibal+gh@gmail.com>
2020-08-04 16:29:18 +02:00
Matthew Honnibal
ecb3c4e8f4
Create corpus iterator and batcher from registry during training (#5865)
* Move batchers into their own module (and registry)

* Update CLI

* Update Corpus and batcher

* Update tests

* Update one config

* Merge 'evaluation' block back under [training]

* Import batchers in gold __init__

* Fix batchers

* Update config

* Update schema

* Update util

* Don't assume train and dev are actually paths

* Update onto-joint config

* Fix missing import

* Format

* Format

* Update spacy/gold/corpus.py

Co-authored-by: Ines Montani <ines@ines.io>

* Fix name

* Update default config

* Fix get_length option in batchers

* Update test

* Add comment

* Pass path into Corpus

* Update docstring

* Update schema and configs

* Update config

* Fix test

* Fix paths

* Fix print

* Fix create_train_batches

* [training.read_train] -> [training.train_corpus]

* Update onto-joint config

Co-authored-by: Ines Montani <ines@ines.io>
2020-08-04 15:09:37 +02:00
Sofie Van Landeghem
82347110f5
Default empty KB in EL component (#5872)
* EL field documentation

* documentation consistent with docs

* default empty KB, initialize vocab separately

* formatting

* add test for changing the default entity vector length

* update comment
2020-08-04 14:34:09 +02:00
Adriane Boyd
b7e3018d97
Recalculate alignment if tokenization differs (#5868)
* Recalculate alignment if tokenization differs

* Refactor cached alignment data
2020-08-04 14:31:32 +02:00
Ines Montani
934447a611
Merge pull request #5855 from svlandeg/fix/cli-debug 2020-08-03 13:09:20 +02:00
Li Zhe
296f8b65b4 fix the wrong hash url in adding-languages.md file (#5810)
* fix the wrong hash url in adding-languages.md file

change the #101 url hash path to #language-data

* filled in the spaCy Contributor Agreement
2020-08-02 23:15:56 +02:00
Ines Montani
4c055f0aa7
Add init CLI and init config (#5854)
* Add init CLI and init config draft

* Improve config validation

* Auto-format

* Don't export anything in debug config

* Update docs
2020-08-02 15:18:30 +02:00
svlandeg
6f4e46ee93 Merge remote-tracking branch 'upstream/develop' into fix/cli-debug
# Conflicts:
#	pyproject.toml
#	requirements.txt
#	setup.cfg
2020-08-01 18:38:59 +02:00
Ines Montani
e393ebd78b
Merge pull request #5851 from explosion/feature/better-pipe-analysis 2020-08-01 14:20:27 +02:00
Ines Montani
b40f44419b Simplify pipe analysis
- remove unused code
- don't print by default
- integrate attrs info into analysis output
2020-08-01 13:40:06 +02:00
Ines Montani
93144bde97 Update code block style [ci skip] 2020-07-31 18:55:55 +02:00
Ines Montani
98c6a85c8b Update docs [ci skip] 2020-07-31 18:55:38 +02:00
Ines Montani
b68c53858c Remove global 2020-07-31 18:37:58 +02:00
Ines Montani
30a76fcf6f Integrate and simplify pipe analysis 2020-07-31 18:34:35 +02:00
svlandeg
9b719dfb1a use divider in between steps 2020-07-31 18:06:48 +02:00
svlandeg
51ffc4a166 rename pipe_name to component 2020-07-31 17:58:55 +02:00
svlandeg
878327d38e printing final predictions by default to False 2020-07-31 17:36:32 +02:00
Ines Montani
2d955fbf98 Fix linting [ci skip] 2020-07-31 17:05:28 +02:00
Ines Montani
e9e8fa2466 Update docs and types 2020-07-31 17:02:54 +02:00
Ines Montani
dab31426e1 Pin to latest Thinc 2020-07-31 17:00:14 +02:00
svlandeg
cc2f58a1b0 use data_validation context manager 2020-07-31 16:49:42 +02:00
svlandeg
5fa3235d06 set DATA_VALIDATION to False for debug_model (upgrade thinc) 2020-07-31 15:21:01 +02:00
svlandeg
08d3c36c20 bugfix in train CLI 2020-07-31 15:03:43 +02:00
Ines Montani
6365837ca9
Merge pull request #5833 from explosion/feature/scorer-adjustments 2020-07-31 14:00:39 +02:00
Ines Montani
5a221f79c2 Revert "Remove keyword-only from Scorer API docs" [ci skip]
This reverts commit 7a6ac47dc1.
2020-07-31 14:00:21 +02:00
Ines Montani
160f1a5f94 Update docs [ci skip] 2020-07-31 13:26:39 +02:00
Adriane Boyd
9b509aa87f Move Language.evaluate scorer config to new arg
Move `Language.evaluate` scorer config from `component_cfg` to separate
argument `scorer_cfg`.
2020-07-31 11:05:16 +02:00
Adriane Boyd
901801b33b Fix default arguments in DependencyParser.score 2020-07-31 10:55:44 +02:00
Adriane Boyd
9d79916792 Merge branch 'develop' into feature/scorer-adjustments 2020-07-31 10:48:14 +02:00
Sofie Van Landeghem
ca491722ad
The Parser is now a Pipe (2) (#5844)
* moving syntax folder to _parser_internals

* moving nn_parser and transition_system

* move nn_parser and transition_system out of internals folder

* moving nn_parser code into transition_system file

* rename transition_system to transition_parser

* moving parser_model and _state to ml

* move _state back to internals

* The Parser now inherits from Pipe!

* small code fixes

* removing unnecessary imports

* remove link_vectors_to_models

* transition_system to internals folder

* little bit more cleanup

* newlines
2020-07-30 23:30:54 +02:00
svlandeg
0b23594953 pipe_name instead of section in debug_model 2020-07-30 20:06:28 +02:00
Ines Montani
3449c45fd9 Update docs [ci skip] 2020-07-29 19:48:26 +02:00
Ines Montani
9c80cb673d Update docs [ci skip] 2020-07-29 19:41:34 +02:00
Ines Montani
9f69afdd1e Update docs [ci skip] 2020-07-29 19:09:44 +02:00
Ines Montani
7a21775cd0
Merge pull request #5834 from explosion/feature/vectors 2020-07-29 18:49:26 +02:00
Ines Montani
6a5c853edb Fix docs [ci skip] 2020-07-29 18:45:12 +02:00
Ines Montani
158d8c1e48 Update docs [ci skip] 2020-07-29 18:44:10 +02:00
Matthew Honnibal
f7adc9d3b7 Start rewriting vectors docs 2020-07-29 17:10:06 +02:00
Ines Montani
b0f57a0cac Update docs and consistency 2020-07-29 15:14:07 +02:00
Matthew Honnibal
a2d573c039 Merge branch 'feature/vectors' of https://github.com/explosion/spaCy into feature/vectors 2020-07-29 14:56:27 +02:00
Matthew Honnibal
ebdb3f5f04 Fix config 2020-07-29 14:56:11 +02:00
Matthew Honnibal
2af741d7e3 Fix train arg 2020-07-29 14:56:01 +02:00
Matthew Honnibal
c27309f839
Merge branch 'develop' into feature/vectors 2020-07-29 14:54:10 +02:00
Ines Montani
62266fb828 Fix broken type annotation 2020-07-29 14:49:49 +02:00
Matthew Honnibal
142b58be92 Fix import 2020-07-29 14:45:09 +02:00
Matthew Honnibal
c99a653070 Adjust textcat model 2020-07-29 14:38:15 +02:00
Matthew Honnibal
9e1b11dd81 Update vectors in textcat 2020-07-29 14:35:36 +02:00
Matthew Honnibal
b5bbfec591 Update config 2020-07-29 14:26:44 +02:00
Matthew Honnibal
105cf29967 Fix DocBin 2020-07-29 14:23:13 +02:00
Ines Montani
ff0bc05da8 Fix docstrings [ci skip] 2020-07-29 14:09:37 +02:00
Ines Montani
6e2623d3f8 Fix docstring [ci skip] 2020-07-29 14:08:05 +02:00
Ines Montani
8d56260d92 Fix docstrings [ci skip] 2020-07-29 14:07:13 +02:00
Ines Montani
80b18124d2 Fix docstring [ci skip] 2020-07-29 14:03:35 +02:00
Matthew Honnibal
4bbbb41bf8 Update config 2020-07-29 14:01:14 +02:00
Matthew Honnibal
f0cf4a2dca Update tests 2020-07-29 14:01:14 +02:00
Matthew Honnibal
07b47eaac8 Update tok2vec layer 2020-07-29 14:01:13 +02:00
Matthew Honnibal
5ae8628571 Fix CharacterEmbed layer 2020-07-29 14:01:13 +02:00
Matthew Honnibal
97d3651574 Fix stray link_vectors_to_models call 2020-07-29 14:01:13 +02:00
Matthew Honnibal
c7d1ece3eb Update tests 2020-07-29 14:01:13 +02:00
Matthew Honnibal
00de30bcc2 Update CharacterEmbed function 2020-07-29 14:01:12 +02:00
Matthew Honnibal
6a6b09bd32 Update morphologizer model 2020-07-29 14:01:12 +02:00
Matthew Honnibal
20e9098e3f Update tests 2020-07-29 14:01:12 +02:00
Matthew Honnibal
c35d6282fc Add previous HashEmbedCNN tok2vec to make transition easier 2020-07-29 14:01:12 +02:00
Matthew Honnibal
1784c95827 Clean up link_vectors_to_models unused stuff 2020-07-29 14:01:11 +02:00
Matthew Honnibal
0c17ea4c85 Format 2020-07-29 14:00:13 +02:00
Matthew Honnibal
2aff3c4b5a Load vectors in 'spacy train' 2020-07-29 14:00:13 +02:00
Matthew Honnibal
7852a68a75 Fix load_vectors_into_model function 2020-07-29 14:00:13 +02:00
Matthew Honnibal
7299419fe4 Dont load vectors in Language.from_config 2020-07-29 14:00:12 +02:00
Matthew Honnibal
30dd96c540 Load vectors in Language.from_config 2020-07-29 14:00:12 +02:00
Matthew Honnibal
df95e2af64 Add load_vectors_into_model util 2020-07-29 14:00:12 +02:00
Matthew Honnibal
475d7c1c7c Fix StaticVectors class 2020-07-29 14:00:11 +02:00
Matthew Honnibal
44d350dc94 Use spaCy's StaticVectors 2020-07-29 14:00:11 +02:00
Matthew Honnibal
984754e3be Update config 2020-07-29 14:00:11 +02:00
Matthew Honnibal
acc64e138a Add import 2020-07-29 14:00:11 +02:00
Matthew Honnibal
9987ea9e4d Fix Tok2Vec begin_training 2020-07-29 14:00:10 +02:00
Matthew Honnibal
099e9331c5 Fix tok2vec 2020-07-29 14:00:10 +02:00
Matthew Honnibal
fe0cdcd461 Fixes 2020-07-29 14:00:09 +02:00
Matthew Honnibal
034d803b7a Update ptb config 2020-07-29 14:00:09 +02:00
Matthew Honnibal
123f8b832d Refactor Tok2Vec model 2020-07-29 14:00:09 +02:00
Matthew Honnibal
c6b4f63c7c Remove obsolete function 2020-07-29 14:00:09 +02:00
Matthew Honnibal
9cc7262224 Draft StaticVectors layer 2020-07-29 14:00:09 +02:00
Matthew Honnibal
cb9654e98c WIP on new StaticVectors 2020-07-29 14:00:09 +02:00
Ines Montani
e257e66ab9 Merge branch 'develop' of https://github.com/explosion/spaCy into develop 2020-07-29 11:36:45 +02:00
Ines Montani
e0ffe36e79 Update docstrings, docs and types 2020-07-29 11:36:42 +02:00
Sofie Van Landeghem
40c995b1be
Option for returning only greedy matches (#5771)
* add "greedy" option for match pattern

* distinction between greedy FIRST or LONGEST

* check for proper values, throw custom warning otherwise

* unxfail one more test

* add comment in docstring

* add test that LONGEST also prefers first match if equal length

* use c arrays for more efficient processing

* rename 'greediness' to 'greedy'
2020-07-29 11:04:43 +02:00
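A quick sketch of the `greedy` option from the PR above: quantifiers like `+` normally yield every overlapping candidate match, while `greedy="LONGEST"` keeps only the longest non-overlapping ones (`"FIRST"` prefers the earliest instead):

```python
import spacy
from spacy.matcher import Matcher

nlp = spacy.blank("en")
matcher = Matcher(nlp.vocab)
matcher.add("ALPHA_RUN", [[{"IS_ALPHA": True, "OP": "+"}]], greedy="LONGEST")

doc = nlp("quick brown fox")
for match_id, start, end in matcher(doc):
    print(doc[start:end].text)   # one longest span instead of every sub-span
```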
Adriane Boyd
191a12d75f
Fix score_weights typo in train CLI (#5835) 2020-07-29 11:04:12 +02:00
Adriane Boyd
0cddb0dbe9
Move timing into Language.evaluate (#5836)
Move timing into `Language.evaluate` so that only the processing is
timing, not processing + scoring. `Language.evaluate` returns
`scores["speed"]` as words per second, which should be identical to how
the speed was added to the scores previously. Also add the speed to the
evaluate CLI output.
2020-07-29 11:02:31 +02:00
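With the change above, `Language.evaluate` times only the processing and reports it as words per second under `scores["speed"]`. A rough sketch, assuming a trained pipeline is installed; the `Example` import path shown is the released v3 one (it lived under `spacy.gold` in the builds listed here):

```python
import spacy
from spacy.training import Example  # spacy.gold in the v3.0.0a* builds above

nlp = spacy.load("en_core_web_sm")
examples = [
    Example.from_dict(nlp.make_doc("Berlin is a city."),
                      {"entities": [(0, 6, "GPE")]}),
]

scores = nlp.evaluate(examples)
print(scores["speed"])    # words per second, timed around processing only
print(scores["ents_f"])   # accuracy scores are unaffected by the timing move
```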
Adriane Boyd
7a6ac47dc1 Remove keyword-only from Scorer API docs 2020-07-29 10:40:30 +02:00
Adriane Boyd
c689ae8f0a Fix types in Scorer 2020-07-29 10:40:30 +02:00
Ines Montani
7adffc5361 Remove unused schema 2020-07-28 23:12:47 +02:00
Ines Montani
e5d9eaf79c Tidy up docstrings and arguments 2020-07-28 23:12:42 +02:00
Ines Montani
ac24adec73 Small adjustments to Scorer and docs 2020-07-28 21:39:42 +02:00
Ines Montani
256b24b720 Update arch docs WIP [ci skip] 2020-07-28 20:33:52 +02:00
Ines Montani
2c7a32cf12 Remove unused methods 2020-07-28 16:50:02 +02:00
Ines Montani
ba22111ff4 Move error to Errors 2020-07-28 16:24:14 +02:00
Ines Montani
2748249217 Re-add meta["pipeline"] for now 2020-07-28 16:14:23 +02:00
Ines Montani
b83ead5bf5
Merge pull request #5824 from svlandeg/fix/textcat-v3 2020-07-28 15:04:25 +02:00
Ines Montani
06a97a8766 Support --opt=value format in CLI config overrides 2020-07-28 13:43:15 +02:00
Ines Montani
ae4d8a6ffd Update docstrings, docs and pipe consistency 2020-07-28 13:37:31 +02:00
Ines Montani
0094cb0d04 Remove scores list from config and document 2020-07-28 11:22:24 +02:00
Ines Montani
9b704c3db3
Merge pull request #5819 from explosion/feature/component-scores 2020-07-28 10:40:56 +02:00
Ines Montani
2f83848b1f Fix title [ci skip] 2020-07-27 18:25:38 +02:00
Ines Montani
894e20c466 Merge branch 'develop' into feature/component-scores 2020-07-27 18:14:39 +02:00
Ines Montani
d8b519c23c API docs, docstrings and argument consistency 2020-07-27 18:11:45 +02:00
svlandeg
85b2dcfd67 cleanup 2020-07-27 17:54:44 +02:00
svlandeg
8353ca5a51 remove printing of config 2020-07-27 17:53:36 +02:00
svlandeg
61068e0fb1 util function dot_to_object and corresponding unit test 2020-07-27 17:50:12 +02:00
Ines Montani
10b84e1e27 Add flag to toggle sdist creation on package [ci skip] 2020-07-27 16:52:23 +02:00
svlandeg
674c39bff9 fix train_textcat script 2020-07-27 16:48:21 +02:00
Adriane Boyd
fdf09cb231 Update Scorer API docs for score_cats 2020-07-27 15:34:42 +02:00
Adriane Boyd
34c92dfe63 Add missing Scorer imports 2020-07-27 15:08:51 +02:00
Adriane Boyd
8bb0507777 Add and update score methods and score weights
Add and update `score` methods, provided `scores`, and default weights
`default_score_weights` for pipeline components.

* `scores` provides all top-level keys returned by `score` (merely informative, similar to `assigns`).
* `default_score_weights` provides the default weights for a default config.
* The keys from `default_score_weights` determine which values will be
shown in the `spacy train` output, so keys with weight `0.0` will be
displayed but not counted toward the overall score.
2020-07-27 14:44:53 +02:00
Adriane Boyd
baf19fd652 Update cats scoring to provide overall score
* Provide top-level score as `attr_score`
* Provide a description of the score as `attr_score_desc`
* Provide all potential score keys, setting unused keys to `None`
* Update CLI evaluate accordingly
2020-07-27 12:26:10 +02:00
Adriane Boyd
f8cf378be9 Combine weights from multiple components
Combine weights from multiple components for the same score.
2020-07-27 10:21:31 +02:00
Adriane Boyd
315d7d611f Normalize spelling for spaCy (#5822) 2020-07-27 10:11:23 +02:00
Martino Mensio
b57b994d38 Sentence transformers added to spaCy universe (#5814)
* fix details for spacy-universal-sentence-encoder

* added sentence-transformers
2020-07-27 10:11:15 +02:00
Nipun Sadvilkar
1bfc177b10 ✏️ typo in pysbd code example (#5821) 2020-07-27 10:11:09 +02:00
Ines Montani
7dd53d0964 Fix typo [ci skip] 2020-07-27 00:34:00 +02:00
Ines Montani
7adbaf9a5b Update docs [ci skip] 2020-07-27 00:29:45 +02:00
Ines Montani
3d56a3f286 Make more args keyword-only 2020-07-27 00:27:53 +02:00
Matthew Honnibal
80271ac0ba Update default config 2020-07-26 15:27:39 +02:00
Ines Montani
ed61fb10fc Rename default textcat arch to TextCatEnsemble 2020-07-26 15:11:43 +02:00
Ines Montani
53d37da29a Make sure @factories is removed from config 2020-07-26 15:11:24 +02:00
Matthew Honnibal
ac5901d076 Merge branch 'develop' of https://github.com/explosion/spaCy into develop 2020-07-26 14:20:27 +02:00
Matthew Honnibal
fb5dbe30b5 Trim training 101 2020-07-26 13:43:22 +02:00
Matthew Honnibal
e6a7deb7cc Edits to the training 101 section 2020-07-26 13:42:08 +02:00
Ines Montani
4060c2d5a6 Fix test 2020-07-26 13:40:19 +02:00
Ines Montani
2470486543 Allow pipeline components to set default scores and weights 2020-07-26 13:18:43 +02:00
Ines Montani
787d066e22 Remove pipes.pyx
Probably accidentally re-added in a merge?
2020-07-26 13:08:52 +02:00
Matthew Honnibal
520d25cb50
Add smart_open dependency to fetch project assets (#5812)
* Use smart_open for project assets

* Fix assets.py

* Update pyproject.toml
2020-07-26 12:15:00 +02:00
Ines Montani
c288dba8e7 Update docs [ci skip] 2020-07-25 18:51:12 +02:00
Ines Montani
1346ee06d4
Merge pull request #5813 from explosion/chore/tidy-autoformat-types
Tidy up, autoformat, add types
2020-07-25 18:44:08 +02:00
Ines Montani
eb9acae34d
Merge pull request #5791 from adrianeboyd/docs/morphology 2020-07-25 15:10:21 +02:00
Ines Montani
e92df281ce Tidy up, autoformat, add types 2020-07-25 15:01:15 +02:00
Matthew Honnibal
71242327b2 Set version to v3.0.0a5 2020-07-25 14:06:01 +02:00
Matthew Honnibal
afd504f8c0 Update config 2020-07-25 14:04:25 +02:00
Ines Montani
cdbd6ba912
Merge pull request #5798 from explosion/feature/language-data-config 2020-07-25 13:34:49 +02:00
Matthew Honnibal
44a0b072e0 Merge branch 'feature/language-data-config' of https://github.com/explosion/spaCy into feature/language-data-config 2020-07-25 13:34:07 +02:00
Matthew Honnibal
17f39eebdc Update PTB config 2020-07-25 13:33:40 +02:00
Ines Montani
49f27a2a7b Tidy up [ci skip] 2020-07-25 13:00:49 +02:00
Ines Montani
4a0a692875 Add missing lex_attr_getters (resolves #5806 ) 2020-07-25 12:55:18 +02:00
Adriane Boyd
2bcceb80c4
Refactor the Scorer to improve flexibility (#5731)
* Refactor the Scorer to improve flexibility

Refactor the `Scorer` to improve flexibility for arbitrary pipeline
components.

* Individual pipeline components provide their own `evaluate` methods
that score a list of `Example`s and return a dictionary of scores
* `Scorer` is initialized either:
  * with a provided pipeline containing components to be scored
  * with a default pipeline containing the built-in statistical
    components (senter, tagger, morphologizer, parser, ner)
* `Scorer.score` evaluates a list of `Example`s and returns a dictionary
of scores referring to the scores provided by the components in the
pipeline

Significant differences:

* `tags_acc` is renamed to `tag_acc` to be consistent with `token_acc`
and the new `morph_acc`, `pos_acc`, and `lemma_acc`
* Scoring is no longer cumulative: `Scorer.score` scores a list of
examples rather than a single example and does not retain any state
about previously scored examples
* PRF values in the returned scores are no longer multiplied by 100

* Add kwargs to Morphologizer.evaluate

* Create generalized scoring methods in Scorer

* Generalized static scoring methods are added to `Scorer`
  * Methods require an attribute (either on Token or Doc) that is
used to key the returned scores

Naming differences:

* `uas`, `las`, and `las_per_type` in the scores dict are renamed to
`dep_uas`, `dep_las`, and `dep_las_per_type`

Scoring differences:

* `Doc.sents` is now scored as spans rather than on sentence-initial
token positions so that `Doc.sents` and `Doc.ents` can be scored with
the same method (this lowers scores since a single incorrect sentence
start results in two incorrect spans)

* Simplify / extend hasattr check for eval method

* Add hasattr check to tokenizer scoring
* Simplify to hasattr check for component scoring

* Reset Example alignment if docs are set

Reset the Example alignment if either doc is set in case the
tokenization has changed.

* Add PRF tokenization scoring for tokens as spans

Add PRF scores for tokens as character spans. The scores are:

* token_acc: # correct tokens / # gold tokens
* token_p/r/f: PRF for (token.idx, token.idx + len(token))

* Add docstring to Scorer.score_tokenization

* Rename component.evaluate() to component.score()

* Update Scorer API docs

* Update scoring for positive_label in textcat

* Fix TextCategorizer.score kwargs

* Update Language.evaluate docs

* Update score names in default config
2020-07-25 12:53:02 +02:00
Ines Montani
c003d26b94 Tidy up 2020-07-25 12:21:37 +02:00
Ines Montani
a063a82c40 Tidy up __init__.py 2020-07-25 12:14:37 +02:00
Ines Montani
8d9d28eb8b Re-add setting for vocab data and tidy up 2020-07-25 12:14:28 +02:00
Ines Montani
b9aaa4e457 Improve vocab data integration and warning 2020-07-25 11:51:30 +02:00
Ines Montani
38f6ea7a78 Simplify language data and revert detailed configs 2020-07-24 14:50:26 +02:00
Adriane Boyd
656574a01a
Update Japanese tests (#5807)
* Update POS tests to reflect current behavior (it is not entirely clear
whether the AUX/VERB mapping is indeed the desired behavior?)
* Switch to `from_config` initialization in subtoken test
2020-07-24 12:45:14 +02:00
Adriane Boyd
fdb8815ef5
Minor refactor for Morphology and MorphAnalysis (#5804)
* `MorphAnalysis.get` returns only the field values
* Move `_normalize_props` inside `Morphology` as
`Morphology.normalize_attrs` and simplify
  * Simplify POS field detection/conversion
  * Convert all non-POS features to strings
* `Morphology` returns an empty string for a missing morph to align
with the FEATS string returned for an existing morph
* Remove unused `list_to_feats`
2020-07-24 09:28:06 +02:00
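A short sketch of the behaviour described above, assuming the released v3 `Token.set_morph` API:

```python
import spacy

nlp = spacy.blank("en")
doc = nlp("cats")
doc[0].set_morph("Number=Plur")
print(doc[0].morph.get("Number"))  # ['Plur']: only the field values
print(str(doc[0].morph))           # 'Number=Plur' (FEATS string)
print(str(nlp("dog")[0].morph))    # '': a missing morph is an empty string, not None
```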
Adriane Boyd
41525901ef Move MorphAnalysis to Other section 2020-07-23 08:58:22 +02:00
Adriane Boyd
8f44584bef Update MorphAnalysis.get and related examples 2020-07-23 08:51:31 +02:00
Ines Montani
87737a5a60 Tidy up 2020-07-23 00:16:23 +02:00
Ines Montani
a624ae0675 Remove POS, TAG and LEMMA from tokenizer exceptions 2020-07-22 23:09:01 +02:00
Ines Montani
14d7d46f89 Merge branch 'develop' into feature/language-data-config 2020-07-22 22:18:53 +02:00
Ines Montani
b507f61629 Tidy up and move noun_chunks, token_match, url_match 2020-07-22 22:18:46 +02:00
Ines Montani
7fc4dadd22 Fix typo 2020-07-22 20:27:22 +02:00
Adriane Boyd
941b9e33f7 Add Token.morph_ 2020-07-22 17:59:45 +02:00
Ines Montani
d0c6d1efc5
@factories -> factory (#5801) 2020-07-22 17:29:31 +02:00
Ines Montani
be476e495e
Merge pull request #5787 from adrianeboyd/docs/morphologizer
Initial draft of Morphologizer API docs
2020-07-22 17:16:57 +02:00
Ines Montani
2c5bb59909 Use consistent --gpu-id option name 2020-07-22 16:53:41 +02:00
Ines Montani
0fcd352179 Remove omit_extra_lookups 2020-07-22 16:01:17 +02:00
Ines Montani
945f795a3e WIP: move more language data to config 2020-07-22 15:59:37 +02:00
Adriane Boyd
b84fd70cc3
Fix exceptions for Morphology.__reduce__ (#5792)
Pickle exceptions in the MORPH_RULES format instead of the internal
format after the recent `Morphology.__init__` changes.
2020-07-22 15:00:25 +02:00
Ines Montani
43b960c01b
Refactor pipeline components, config and language data (#5759)
* Update with WIP

* Update with WIP

* Update with pipeline serialization

* Update types and pipe factories

* Add deep merge, tidy up and add tests

* Fix pipe creation from config

* Don't validate default configs on load

* Update spacy/language.py

Co-authored-by: Ines Montani <ines@ines.io>

* Adjust factory/component meta error

* Clean up factory args and remove defaults

* Add test for failing empty dict defaults

* Update pipeline handling and methods

* provide KB as registry function instead of as object

* small change in test to make functionality more clear

* update example script for EL configuration

* Fix typo

* Simplify test

* Simplify test

* splitting pipes.pyx into separate files

* moving default configs to each component file

* fix batch_size type

* removing default values from component constructors where possible (TODO: test 4725)

* skip instead of xfail

* Add test for config -> nlp with multiple instances

* pipeline.pipes -> pipeline.pipe

* Tidy up, document, remove kwargs

* small cleanup/generalization for Tok2VecListener

* use DEFAULT_UPSTREAM field

* revert to avoid circular imports

* Fix tests

* Replace deprecated arg

* Make model dirs require config

* fix pickling of keyword-only arguments in constructor

* WIP: clean up and integrate full config

* Add helper to handle function args more reliably

Now also includes keyword-only args

* Fix config composition and serialization

* Improve config debugging and add visual diff

* Remove unused defaults and fix type

* Remove pipeline and factories from meta

* Update spacy/default_config.cfg

Co-authored-by: Sofie Van Landeghem <svlandeg@users.noreply.github.com>

* Update spacy/default_config.cfg

* small UX edits

* avoid printing stack trace for debug CLI commands

* Add support for language-specific factories

* specify the section of the config which holds the model to debug

* WIP: add Language.from_config

* Update with language data refactor WIP

* Auto-format

* Add backwards-compat handling for Language.factories

* Update morphologizer.pyx

* Fix morphologizer

* Update and simplify lemmatizers

* Fix Japanese tests

* Port over tagger changes

* Fix Chinese and tests

* Update to latest Thinc

* WIP: xfail first Russian lemmatizer test

* Fix component-specific overrides

* fix nO for output layers in debug_model

* Fix default value

* Fix tests and don't pass objects in config

* Fix deep merging

* Fix lemma lookup data registry

Only load the lookups if an entry is available in the registry (and if spacy-lookups-data is installed)

* Add types

* Add Vocab.from_config

* Fix typo

* Fix tests

* Make config copying more elegant

* Fix pipe analysis

* Fix lemmatizers and is_base_form

* WIP: move language defaults to config

* Fix morphology type

* Fix vocab

* Remove comment

* Update to latest Thinc

* Add morph rules to config

* Tidy up

* Remove set_morphology option from tagger factory

* Hack use_gpu

* Move [pipeline] to top-level block and make [nlp.pipeline] list

Allows separating component blocks from component order – otherwise, ordering the config would mean a changed component order, which is bad. Also allows initial config to define more components and not use all of them

* Fix use_gpu and resume in CLI

* Auto-format

* Remove resume from config

* Fix formatting and error

* [pipeline] -> [components]

* Fix types

* Fix tagger test: requires set_morphology?

Co-authored-by: Sofie Van Landeghem <svlandeg@users.noreply.github.com>
Co-authored-by: svlandeg <sofie.vanlandeghem@gmail.com>
Co-authored-by: Matthew Honnibal <honnibal+gh@gmail.com>
2020-07-22 13:42:59 +02:00
Adriane Boyd
d3385f4be2 Add Morphology and MorphAnalysis to overview 2020-07-21 13:06:22 +02:00
Adriane Boyd
fcd3a4abe3 Add morph to Token API docs 2020-07-21 13:05:58 +02:00
Adriane Boyd
14df00ae98 Add Morphology and MorphAnalysis API docs
Add initial draft of `Morphology` and `MorphAnalysis` API docs.
2020-07-21 10:33:46 +02:00
Ines Montani
311d0bde29
Merge pull request #5788 from explosion/master-tmp 2020-07-20 15:39:24 +02:00
Ines Montani
d51db72e46 Remove Python 2 marker 2020-07-20 15:01:36 +02:00
Ines Montani
e6967ca98a Revert cupy-cuda version update 2020-07-20 14:59:41 +02:00
Ines Montani
644074b954 Merge branch 'develop' into master-tmp 2020-07-20 14:58:04 +02:00
Sofie Van Landeghem
c9da9605f7
Test suite clean up (#5781)
* step_through tests: skip instead of xfail

* test_empty_doc should be fixed with new Thinc version

* remove outdated test (there are other misaligned tests now)

* xfail reason

* fix test according to french exceptions

* clarified some skipped tests

* skip Ukrainian test instead of xfail

* skip instead of xfail

* skip + reason instead of xfail

* removed obsolete tests referring to removed "set_frozen" functionality

* fix test 999

* remove unused AlignmentError

* remove xfail where possible, skip otherwise

* increment thinc release for empty_doc test
2020-07-20 14:49:54 +02:00
Sofie Van Landeghem
1b2ec94382
Hyphen infix (#5770)
* infix split on hyphen when preceded by number

* clean up

* skip Ukrainian test instead of xfail
2020-07-20 14:48:51 +02:00
Adriane Boyd
ec819fc311
Provide default output for evaluate in CLI (#5784) 2020-07-20 14:42:46 +02:00
Adriane Boyd
986f7e4d69 Initial draft of Morphologizer API docs 2020-07-20 12:53:02 +02:00
Ines Montani
cb65b36839
Merge pull request #5767 from adrianeboyd/feature/remove-tag-maps 2020-07-19 15:15:34 +02:00
Ines Montani
fa3c98f8b3 Update train.py 2020-07-19 13:40:47 +02:00
Alec Chapman
199f3ff7de Add VA COVID-19 NLP project to spaCy Universe (#5777)
* Update universe.json

Add cov-bsv to "resources"

* Update universe.json

* add contributor agreement
2020-07-19 13:38:42 +02:00
Ines Montani
796f6c52d1 Merge branch 'develop' into pr/5767 2020-07-19 13:37:46 +02:00
Adriane Boyd
39ebcd9ec9
Refactor Chinese tokenizer configuration (#5736)
* Refactor Chinese tokenizer configuration

Refactor `ChineseTokenizer` configuration so that it uses a single
`segmenter` setting to choose between character segmentation, jieba, and
pkuseg.

* replace `use_jieba`, `use_pkuseg`, `require_pkuseg` with the setting
`segmenter` with the supported values: `char`, `jieba`, `pkuseg`
* make the default segmenter plain character segmentation `char` (no
additional libraries required)

* Fix Chinese serialization test to use char default

* Warn if attempting to customize other segmenter

Add a warning if `Chinese.pkuseg_update_user_dict` is called when
another segmenter is selected.
2020-07-19 13:34:37 +02:00
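A sketch of how the single `segmenter` setting is selected, following the usage pattern of the released v3; `jieba` and `pkuseg` require the corresponding packages to be installed:

```python
from spacy.lang.zh import Chinese

# Default: plain character segmentation, no extra dependencies
nlp_char = Chinese()

# Word segmentation with jieba (or "pkuseg"), configured via the tokenizer block
cfg = {"nlp": {"tokenizer": {"segmenter": "jieba"}}}
nlp_jieba = Chinese.from_config(cfg)
```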
Adriane Boyd
9ee1c54f40
Improve tag map initialization and updating (#5764)
* Improve tag map initialization and updating

Generalize tag map initialization and updating so that the tag map can
be loaded correctly prior to loading a `Corpus` with `spacy debug-data`
and `spacy train`.

* normalize provided tag map as necessary
* use the same method for initializing and updating the tag map

* Replace rather than update tag map

Replace rather than update tag map when loading a custom tag map.
Updating the tag map is problematic due to the sorted list of tag names
and the fact that the tag map will contain lingering/unwanted tags from
the default tag map.

* Update CLI scripts

* Reinitialize cache after loading new tag map

Reinitialize the cache with the right size after loading a new tag map.
2020-07-19 13:13:57 +02:00
Adriane Boyd
b81a89f0a9
Update morphologizer (#5766)
* update `Morphologizer.begin_training` for use with `Example`

* make init and begin_training more consistent

* add `Morphology.normalize_features` to normalize outside of
`Morphology.add`

* make sure `get_loss` doesn't create unknown labels when the POS and
morph alignments differ
2020-07-19 11:10:51 +02:00
Sofie Van Landeghem
38b59d728d
Upgrade of UD eval script (#5776)
* new morph feature format

* add new languages with tokenization

* update with all new pretrained models
2020-07-19 11:10:31 +02:00
Ines Montani
68fade8f76 Add Plausible [ci skip] 2020-07-19 00:02:29 +02:00
Ines Montani
73098dbaf6 Add Plausible 2020-07-18 23:53:27 +02:00
Adriane Boyd
50db3f0cdb Serialize morph rules with tagger
Serialize `morph_rules` with the tagger alongside the `tag_map`.

Use `Morphology.load_tag_map` and `Morphology.load_morph_exceptions` to
load these settings rather than reinitializing the morphology each time
they are changed.
2020-07-17 08:22:21 +02:00
Adriane Boyd
d106cf66dd Update Morphology to load exceptions as MORPH_RULES
Update `Morphology` to load exceptions in `Morphology.__init__` and
`Morphology.load_morph_exceptions` from the format used in `MORPH_RULES`
rather than the internal format with tuple keys.

* Rename to `Morphology.exc` to `Morphology._exc` for internal use with
tuple keys
* Add `Morphology.exc` as a property that converts the internal `_exc`
back to `MORPH_RULES` format, primarily for serialization
2020-07-16 21:16:49 +02:00
Adriane Boyd
d83e3c44c5 Remove corpus-specific morph rules
* Remove corpus-specific morph rules
* Add options similar to tag maps to provide them in the `train` and
`debug-data` CLIs
2020-07-15 19:44:18 +02:00
Adriane Boyd
2f981d5af1 Remove corpus-specific tag maps
Remove corpus-specific tag maps from the language data for languages
without custom tokenizers. For languages with custom word segmenters
that also provide tags (Japanese and Korean), the tag maps for the
custom tokenizers are kept as the default.

The default tag maps for languages without custom tokenizers are now the
default tag map from `lang/tag_map.py`, UPOS -> UPOS.
2020-07-15 15:58:29 +02:00
Adriane Boyd
a7a7e0d2a6
Add morph to morphology in Doc.from_array (#5762)
* Add morph to morphology in Doc.from_array

Add morphological analyses to morphology table in `Doc.from_array`.

* Use separate vocab in DocBin roundtrip test
2020-07-14 14:07:35 +02:00
Ines Montani
872938ec76
Merge pull request #5747 from explosion/feature/refactor-config-args 2020-07-14 00:00:22 +02:00
Sofie Van Landeghem
6f3bb6f77c
fix doc.to_utf8 on GPU (#5757) 2020-07-13 23:05:33 +02:00
Ines Montani
dcfa910e4e
Merge pull request #5752 from explosion/compat/remove-object-subclass 2020-07-12 16:37:04 +02:00
Ines Montani
ed55143c0d Merge branch 'develop' into compat/remove-object-subclass 2020-07-12 14:28:52 +02:00
Ines Montani
7906ddd56c Fix test 2020-07-12 14:28:34 +02:00
Ines Montani
5f6f4ff594 Remove object subclassing 2020-07-12 14:03:23 +02:00
Ines Montani
c96535e338 Update command docstrings and docs 2020-07-12 13:53:49 +02:00
Ines Montani
0ab483037c Make debug commands subcommands of spacy debug
Also handle backwards-compatibility so the old commands don't break
2020-07-12 13:53:41 +02:00
Ines Montani
3f948b9c74 Update docs 2020-07-12 12:32:28 +02:00
Ines Montani
8a67ddd6f1 Remove unused import 2020-07-12 12:32:24 +02:00
Ines Montani
d1d7fd5f5d Don't use file paths in schemas
It should be possible to validate top-level config with file paths that don't exist
2020-07-12 12:32:08 +02:00
Ines Montani
79346853aa Add debug-config command 2020-07-12 12:31:17 +02:00
Ines Montani
3a8632c3fb Hide command from public --help for now
Not sure we want this to be officially documented yet?
2020-07-11 19:21:22 +02:00
Ines Montani
5e683d03fe Allow extra args on pretrain and debug_data 2020-07-11 19:17:59 +02:00
Ines Montani
70abcca60e Update Thinc pin 2020-07-11 17:02:54 +02:00
Ines Montani
b7111da1d7 Update config and commands 2020-07-11 13:03:53 +02:00
Ines Montani
11bbc82c24 Update cli.md [ci skip] 2020-07-10 23:37:52 +02:00
Ines Montani
9e48ea48a1 Update Thinc pin 2020-07-10 23:34:57 +02:00
Ines Montani
f99ce7fbfb Make validation errors more elegant 2020-07-10 23:34:17 +02:00
Ines Montani
9455b060d2 Update cli.md 2020-07-10 22:57:22 +02:00
Ines Montani
7b5717cac3 Merge branch 'develop' into feature/refactor-config-args 2020-07-10 22:50:07 +02:00
Ines Montani
e6a6587a9a Update projects.md [ci skip] 2020-07-10 22:41:27 +02:00
Matthew Honnibal
743f7fb73a Set version to v3.0.0a4 2020-07-10 22:40:12 +02:00
Matthew Honnibal
b68216e263
Explicitly delete objects after parser.update to free GPU memory (#5748)
* Try explicitly deleting objects

* Refactor parser model backprop slightly

* Free parser data explicitly after rehearse and update
2020-07-10 22:35:20 +02:00
Ines Montani
f2cd982e7b Update training.md 2020-07-10 22:34:27 +02:00
Ines Montani
fb6f6f584e Replace - with _ in command names
We might as well be nice if the user accidentally types --training.use-gpu
2020-07-10 22:34:22 +02:00
Ines Montani
bfa8e11ffa Update and auto-format 2020-07-10 20:52:00 +02:00
Ines Montani
0389c34b81 Merge branch 'develop' into feature/refactor-config-args 2020-07-10 20:51:52 +02:00
Ines Montani
931250e1f5 Fix pipeline component schema 2020-07-10 20:32:53 +02:00
Ines Montani
9fe1fa88ad Fix typo 2020-07-10 20:32:37 +02:00
Ines Montani
459c6aa8f0 Merge branch 'feature/refactor-config-args' of https://github.com/explosion/spaCy into feature/refactor-config-args 2020-07-10 20:01:28 +02:00
Ines Montani
defe1e7213 Pretty-print config validation errors 2020-07-10 20:01:20 +02:00
Matthew Honnibal
894f31226b Update config 2020-07-10 19:59:12 +02:00
Sofie Van Landeghem
de6a32315c
debug-model script (#5749)
* adding debug-model to print the internals for debugging purposes

* expend debug-model script with 4 stages: before, init, train, predict

* avoid requiring a seed in the train script

* small fixes
2020-07-10 19:47:53 +02:00
Ines Montani
a3667394b4 Integrate with latest Thinc and config overrides 2020-07-10 19:47:05 +02:00
Ines Montani
5cfc3edcaa Update CLI tests 2020-07-10 18:21:01 +02:00
Ines Montani
3583ea84d8 Update arg parsing 2020-07-10 18:20:52 +02:00
Ines Montani
73332ddb67 Update CLI commands to use one shared util file 2020-07-10 17:57:40 +02:00
Ines Montani
240e0a62ca Update with WIP 2020-07-10 13:31:27 +02:00
Ines Montani
a60562f208
Update project CLI hashes, directories, skipping (#5741)
* Update project CLI hashes, directories, skipping

* Improve clone success message

* Remove unused context args

* Move project-specific utils to project utils

The hashing/checksum functions may not end up being general-purpose functions and are more designed for the projects, so they shouldn't live in spacy.util

* Improve run help and add workflows

* Add note re: directory checksum speed

* Fix cloning from subdirectories and output messages

* Remove hard-coded dirs
2020-07-09 23:51:18 +02:00
Ines Montani
e624fcd5d9 Merge branch 'nightly.spacy.io' into develop 2020-07-09 23:26:26 +02:00
Ines Montani
52e9b5b472 Fix formatting 2020-07-09 23:25:58 +02:00
Ines Montani
28cdae898a Update projects.md 2020-07-09 22:35:54 +02:00
Ines Montani
7bcf9f7cfb Document new features 2020-07-09 21:10:36 +02:00
Ines Montani
797ca6f3dd Merge branch 'develop' into nightly.spacy.io 2020-07-09 20:48:24 +02:00
Matthew Honnibal
552d1ad226 Hack at tests 2020-07-09 20:25:51 +02:00
Matthew Honnibal
eb064c59cd Try to fix textcat test 2020-07-09 20:24:53 +02:00
Ines Montani
018319a640 Merge branch 'develop' of https://github.com/explosion/spaCy into develop 2020-07-09 19:44:41 +02:00
Ines Montani
05e182e421 Update CLI args and docstrings 2020-07-09 19:44:28 +02:00
Sofie Van Landeghem
dd207a28be
cleanup components API (#5726)
* add keyword separator for update functions and drop unused "state"

* few more Example tests and various small fixes

* consistently return losses after update call

* eliminate unused tensors field across pipe components

* fix name

* fix arg name
2020-07-09 19:43:39 +02:00
Ines Montani
ea01831f6a Update projects docs etc. 2020-07-09 19:43:25 +02:00
Adriane Boyd
ac4297ee39
Minor refactor to conversion of output docs (#5718)
Minor refactor of conversion of docs to output format to avoid
duplicate conversion steps.
2020-07-09 19:42:32 +02:00
Sofie Van Landeghem
c1ea55307b
Fixing reproducible training (#5735)
* Add initial reproducibility tests

* failing test for default_text_classifier (WIP)

* track trouble to underlying tok2vec layer

* add regression test for Issue 5551

* tests go green with https://github.com/explosion/thinc/pull/359

* update test

* adding fixed seeds to HashEmbed layers, seems to fix the reproducibility issue

Co-authored-by: Matthew Honnibal <honnibal+gh@gmail.com>
2020-07-09 19:39:31 +02:00
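For users, the practical entry point for reproducible runs is fixing the random seed before building the pipeline; a minimal sketch, assuming `spacy.util.fix_random_seed` as in the released v3:

```python
import spacy
from spacy.util import fix_random_seed

fix_random_seed(0)          # same seed + same data/config -> reproducible results
nlp = spacy.blank("en")
nlp.add_pipe("textcat")     # the component whose reproducibility was at issue here
```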
Matthew Honnibal
1827f22f56 Set version to v3.0.0a3 2020-07-09 19:38:04 +02:00
Matthw Honnibal
7010f1a2be Merge branch 'develop' of https://github.com/explosion/spaCy into develop 2020-07-09 19:34:11 +02:00
Matthw Honnibal
0becc5954b Update NER config 2020-07-09 19:33:54 +02:00
Matthw Honnibal
77af0a6bb4 Offer option of padding-sensitive batching 2020-07-09 14:50:20 +02:00
Matthw Honnibal
3a7f275c02 Add extra batch util 2020-07-09 14:38:41 +02:00
Matthw Honnibal
eb0798c421 Add __len__ method for Example 2020-07-09 14:38:26 +02:00
Ines Montani
175d34d8f9 Update sidebar menu 2020-07-09 11:44:09 +02:00
Ines Montani
9ee5b71412 Update cli.md 2020-07-09 11:44:00 +02:00
Ines Montani
028f8210e8 Merge branch 'develop' into nightly.spacy.io 2020-07-09 11:43:57 +02:00
Ines Montani
8f9552d9e7
Refactor project CLI (#5732)
* Make project command a submodule

* Update with WIP

* Add helper for joining commands

* Update docstrins, formatting and types

* Update assets and add support for copying local files

* Fix type

* Update success messages
2020-07-09 01:42:51 +02:00
Adriane Boyd
ad15499b3b
Fix get_loss for values outside of labels in senter (#5730)
* Fix get_loss for None alignments in senter

When converting the `sent_start` values back to `SentenceRecognizer`
labels, handle `None` alignments.

* Handle SENT_START as -1

Handle SENT_START as -1 (or -1 converted to uint64) by treating any
values other than 1 the same as 0 in `SentenceRecognizer.get_loss`.
2020-07-09 01:41:58 +02:00
Matthw Honnibal
9b49787f35 Update NER config. Getting 84.8 2020-07-08 21:38:01 +02:00
Matthw Honnibal
1b20ffac38 batch_by_words by default 2020-07-08 21:37:06 +02:00
Matthw Honnibal
93e50da46a Remove auto 'set_annotation' in training to address GPU memory 2020-07-08 21:36:51 +02:00
Matthw Honnibal
fb8a5967c1 Merge branch 'develop' of https://github.com/explosion/spaCy into develop 2020-07-08 15:27:50 +02:00
Ines Montani
0a3d41bb1d
Deprecat model shortcuts and simplify download (#5722) 2020-07-08 14:00:07 +02:00
Adriane Boyd
c9f0f75778
Update get_loss for senter and morphologizer (#5724)
* Update get_loss for senter

Update `SentenceRecognizer.get_loss` to keep it similar to `Tagger`.

* Update get_loss for morphologizer

Update `Morphologizer.get_loss` to keep it similar to `Tagger`.
2020-07-08 13:59:28 +02:00
Ines Montani
9ae4040183 Update API docs 2020-07-08 13:34:35 +02:00
svlandeg
c94279ac1b remove tensors, fix predict, get_loss and set_annotations 2020-07-08 13:11:54 +02:00
svlandeg
90b100c39f remove component.Model, update constructor, losses is return value of update 2020-07-08 12:14:30 +02:00
gandersen101
9ce10207bf Fix quote issue in spaczz universe.json 2020-07-08 11:36:18 +02:00
Matthw Honnibal
ca989f4cc4 Improve cutting logic in parser 2020-07-08 11:27:54 +02:00
Matthw Honnibal
42e1109def Support option to not batch by number of words 2020-07-08 11:26:54 +02:00
Ines Montani
f1653d281f Fix and update universe.json [ci skip] 2020-07-07 21:12:56 +02:00
Jonathan Besomi
f904f1f361 Add texthero to universe.json (#5716)
* Add texthero to universe.json

* Add spaCy contributor Agreement
2020-07-07 20:57:45 +02:00
gandersen101
9cfd294e59 Adding spaczz package to universe.json (#5717)
* Adding spaczz package to universe.json

* Adding contributor agreement.
2020-07-07 20:57:36 +02:00
Ines Montani
8cb7f9ccff
Improve assets and DVC handling (#5719)
* Improve assets and DVC handling

* Remove outdated comment [ci skip]
2020-07-07 20:51:50 +02:00
Ines Montani
2298e129e6 Update example and training docs 2020-07-07 20:30:12 +02:00
svlandeg
2b60e894cb fix component constructors, update, begin_training, reference to GoldParse 2020-07-07 19:17:19 +02:00
Sofie Van Landeghem
a39a110c4e
Few more Example unit tests (#5720)
* small fixes in Example, UX

* add gold tests for aligned_spans and get_aligned_parse

* sentencizer unnecessary
2020-07-07 18:46:00 +02:00
Matthw Honnibal
433dc3c9c9 Simplify PrecomputableAffine slightly 2020-07-07 17:22:47 +02:00
Matthw Honnibal
a4164f67ca Don't normalize gradients 2020-07-07 17:21:58 +02:00
Matthw Honnibal
8177f25b6c Merge branch 'develop' of https://github.com/explosion/spaCy into develop 2020-07-07 17:21:10 +02:00
svlandeg
14a796e3f9 add Example API with examples of Example usage 2020-07-07 14:46:41 +02:00
Ines Montani
fa00a85828
Merge pull request #5715 from explosion/chore/tidy-regression-tests 2020-07-07 11:22:07 +02:00
Matthw Honnibal
d1fd3438c3 Add dropout to parser hidden layer 2020-07-07 01:38:15 +02:00
Ines Montani
bb3ee38cf9 Update WIP 2020-07-06 22:22:37 +02:00
Ines Montani
44da24ddd0 Update doc.md 2020-07-06 18:17:00 +02:00
Ines Montani
44790c1c32 Update docs and add keyword-only tag 2020-07-06 18:14:57 +02:00
Matthw Honnibal
1eb1654941 Update configs 2020-07-06 17:51:37 +02:00
Matthw Honnibal
f25761e513 Don't randomize cuts in parser 2020-07-06 17:51:25 +02:00
Matthw Honnibal
709fc5e4ad Clarify dropout and seed in Tok2Vec 2020-07-06 17:50:21 +02:00
Matthew Honnibal
19d42f42de Set version to v3.0.0a2 2020-07-06 17:43:12 +02:00
Matthew Honnibal
cc477be952
Improve gold-standard alignment (#5711)
* Remove previous alignment

* Implement better alignment, using ragged data structure

* Use pytokenizations for alignment

* Fixes

* Fixes

* Fix overlapping entities in alignment

* Fix align split_sents

* Update test

* Commit align.py

* Try to appease setuptools

* Fix flake8

* use realistic entities for testing

* Update tests for better alignment

* Improve alignment heuristic

Co-authored-by: svlandeg <sofie.vanlandeghem@gmail.com>
2020-07-06 17:39:31 +02:00
Ines Montani
b6deef80f8 Fix class so pickling works as expected 2020-07-06 16:43:45 +02:00
Ines Montani
a35236e5f0 Update v3 docs WIP [ci skip] 2020-07-06 15:57:44 +02:00
Ines Montani
fa261d09e8 Add alternative CLI option 2020-07-06 15:57:38 +02:00
Adriane Boyd
c67fc6aa5b
Make docs_to_json backwards-compatible with v2 (#5714)
* In `spacy convert -t json` output the JSON docs wrapped in a list

* Add back token-level `ner` alongside the doc-level `entities`
2020-07-06 14:15:00 +02:00
Ines Montani
5b7b2a498d Tidy up and merge regression tests 2020-07-06 14:05:59 +02:00
Ines Montani
412dbb1f38
Remove dead and/or deprecated code (#5710)
* Remove dead and/or deprecated code

* Remove n_threads

Co-authored-by: Matthew Honnibal <honnibal+gh@gmail.com>
2020-07-06 13:06:25 +02:00
Sofie Van Landeghem
fcbf899b08
Feature/example only (#5707)
* remove _convert_examples

* fix test_gold, raise TypeError if tuples are used instead of Examples

* throwing proper errors when the wrong type of objects are passed

* fix deprecated format in tests

* fix deprecated format in parser tests

* fix tests for NEL, morph, senter, tagger, textcat

* update regression tests with new Example format

* use make_doc

* more fixes to nlp.update calls

* few more small fixes for rehearse and evaluate

* only import ml_datasets if really necessary
2020-07-06 13:02:36 +02:00
Ines Montani
63247cbe87 Update v3 docs [ci skip] 2020-07-05 16:11:16 +02:00
Matthw Honnibal
3f6f087113 Merge branch 'develop' of https://github.com/explosion/spaCy into develop 2020-07-04 23:52:12 +02:00
Matthw Honnibal
5642507823 Fix has_unknown_spaces in Doc.copy 2020-07-04 23:52:02 +02:00
Matthw Honnibal
8870a6ded7 Specify seeds in HashEmbed 2020-07-04 23:51:49 +02:00
Ines Montani
dc8c9d912f Update docs [ci skip] 2020-07-04 16:47:24 +02:00
Ines Montani
37c3bb35e2 Auto-format 2020-07-04 16:25:34 +02:00
Ines Montani
4498dfe99d Update docs 2020-07-04 16:25:30 +02:00
Ines Montani
2d9ca0cd8b Make Thinc version consistent 2020-07-04 14:39:34 +02:00
Ines Montani
6a5095621a Merge branch 'nightly.spacy.io' into develop 2020-07-04 14:23:55 +02:00
Ines Montani
abd173937f Auto-format and update URL 2020-07-04 14:23:44 +02:00
Ines Montani
99aff16d60 Make argument shortcut consistent 2020-07-04 14:23:32 +02:00
Ines Montani
1e0d54edd1 Update docs 2020-07-04 14:23:10 +02:00
Matthew Honnibal
2bd1bf81f1
Refactor pretrain and support character-based objective for v3 (#5706)
* Start adding character-based stuff

* Start adding character-based objective

* Start adding character-based stuff

* Start adding character-based objective

* Remove outdated comment

* Update pretraining models

* Add/fix character-based multi-task models

* Refactor pretrain and support character-based objective

* Update pretrain config

* Remove unused

* Fix flake8 errors

* Clean up imports

* Format

* Format

* Update Thinc version

* Raise error if vectors objective but no vectors
2020-07-03 17:57:28 +02:00
Ines Montani
fe224dc2dd Merge branch 'develop' into nightly.spacy.io 2020-07-03 16:48:27 +02:00
Ines Montani
06f1ecb308 Update v3 docs 2020-07-03 16:48:21 +02:00
Ines Montani
cdf9ee1716 Add stub for Example API docs [ci skip] 2020-07-03 15:46:10 +02:00
Ines Montani
fa8e097c04 Update convert docs [ci skip] 2020-07-03 15:42:04 +02:00
Ines Montani
84fb3a3fb3 Auto-format and fix tuple 2020-07-03 15:20:10 +02:00
Ines Montani
949d4a0a0b Merge branch 'develop' into nightly.spacy.io 2020-07-03 15:15:58 +02:00
Matthew Honnibal
e1b3e8ee11 Set version to v3.0.0a1 2020-07-03 13:21:08 +02:00
Matthew Honnibal
a902b5f217
Record whether Doc objects are built from known spacing (#5697)
* Tell convert CLI to store user data for Doc

* Remove assert

* Add has_unknown_spaces flag on Doc

* Do not tokenize docs with unknown spaces in Corpus

* Handle conversion of unknown spaces in Example

* Fixes

* Fixes

* Draft has_known_spaces support in DocBin

* Add test for serialize has_unknown_spaces

* Fix DocBin serialization when has_unknown_spaces

* Use serialization in test
2020-07-03 12:58:16 +02:00
Adriane Boyd
abad56db7d
Add conllu2docs converter (#5704)
Add conllu2docs converter adapted from conllu2json converter
2020-07-03 12:54:32 +02:00
Jan Jessewitsch
e4dcac4a4b
Merging multiple docs into one (#5032)
* Add static method to Doc to allow merging of multiple docs.

* Add error description for the error that occurs if docs with different
vocabs (from different languages) are merged in Doc.from_docs().

* Add test for Doc.from_docs() implementation.

* Fix using numpy's concatenate in Doc.from_docs.

* Replace typing's type annotations in from_docs.

* Simply remove type annotations in from_docs.

* Add documentation for Doc.from_docs to api.

* Simplify from_docs, its test and the api doc for codebase consistency.

* Fix merging of Doc objects that end with whitespaces (Achieved by simply not setting the SPACY attribute on whitespace tokens). Remove two unnecessary imports of attributes.

* Add merging of user data from Doc objects in from_docs. Add user data test case to corresponding test. Add applicable warning messages.

* Fix incorrect setting of tokens idx by using concatenated spaces (again). Add test case to corresponding test.

* Add MORPH to attrs

* Update warnings calls

* Remove out-dated error from merge

* Rename space_delimiter to ensure_whitespace

Co-authored-by: Adriane Boyd <adrianeboyd@gmail.com>
2020-07-03 11:32:42 +02:00
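A minimal sketch of the new static method (argument name per the rename noted above):

```python
import spacy
from spacy.tokens import Doc

nlp = spacy.blank("en")
doc1 = nlp("Merging docs is useful.")
doc2 = nlp("Whitespace is handled for you.")
# ensure_whitespace inserts a space between docs that don't end in one
merged = Doc.from_docs([doc1, doc2], ensure_whitespace=True)
print(merged.text)
```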
Sofie Van Landeghem
41b65fd0f8
fix to pretrain script (#5699)
* fix to pretrain script

* remove unnecessary import
2020-07-02 21:48:01 +02:00
Adriane Boyd
a723fa02a1
DocBin: add version number, missing attributes and strings (#5685)
* Add version number to DocBin

Add a version number to DocBin for future use.

* Add POS to all attributes in DocBin

* Add morph string to strings in DocBin

* Update DocBin API

* Add string for ENT_KB_ID in DocBin
2020-07-02 17:41:50 +02:00
Ines Montani
b5268955d7 Update matcher usage examples [ci skip] 2020-07-02 15:39:45 +02:00
Ines Montani
d36632553a
Merge pull request #5688 from explosion/remove-deprecated
Remove deprecated methods: Doc.print_tree, Doc.merge, Span.merge
2020-07-02 15:10:30 +02:00
Ines Montani
8a5b9a6d5f
Merge pull request #5693 from svlandeg/bugfix/nel-v3 2020-07-02 14:45:46 +02:00
Ines Montani
ee8a830248
Merge pull request #5687 from svlandeg/bugfix/init-model
Fixing init_model
2020-07-02 14:10:28 +02:00
svlandeg
04ed4d60a8 raise error when links are not aligned to tokens 2020-07-02 13:57:35 +02:00
svlandeg
f503817623 fix parsing entity links in new gold format 2020-07-02 13:48:11 +02:00
Ines Montani
aa62cdee50 Merge branch 'develop' into nightly.spacy.io 2020-07-01 22:38:23 +02:00
Ines Montani
60c2695131 Remove deprecated methods 2020-07-01 22:33:39 +02:00
Ines Montani
a4cfe9fc33 Remove inline notes on v2 changes [ci skip] 2020-07-01 22:29:22 +02:00
Ines Montani
79540e1eea Remove bin/spacy from MANIFEST 2020-07-01 22:15:18 +02:00
Ines Montani
97342f3f99
Merge pull request #5686 from tiangolo/refactor/cli-completion 2020-07-01 22:14:48 +02:00
Ines Montani
295279f74b Update netlify.toml [ci skip] 2020-07-01 22:06:43 +02:00
Sebastián Ramírez
b0f425971e Remove shellingham from dependencies 2020-07-01 21:47:50 +02:00
Ines Montani
3dff412f58 Merge branch 'nightly.spacy.io' into develop [ci skip] 2020-07-01 21:33:47 +02:00
Ines Montani
2f07144f80 Update netlify.toml [ci skip] 2020-07-01 21:33:20 +02:00
Ines Montani
58a289b309 Update branch name 2020-07-01 21:28:51 +02:00
Ines Montani
fe4cfd0632 Start updating website for v3 [ci skip] 2020-07-01 21:26:39 +02:00
svlandeg
a30bc77415 bugfixing prune_vectors and vectors_loc 2020-07-01 21:00:47 +02:00
Sebastián Ramírez
b985cc4025 📄 Add spaCy Contributor Agreement 2020-07-01 20:57:21 +02:00
Sebastián Ramírez
764499246e 🔧 Update spacy CLI script entrypoint to support completion 2020-07-01 20:21:05 +02:00
Sebastián Ramírez
b02db67247 Add shellingham for automatic shell detection
and update Typer pinning
2020-07-01 20:20:04 +02:00
Matthw Honnibal
94a0cf46fd Merge branch 'develop' of https://github.com/explosion/spaCy into develop 2020-07-01 18:45:45 +02:00
Matthw Honnibal
6a0a27e5c2 Fix max_steps 2020-07-01 18:08:14 +02:00
Ines Montani
a4650761a8 Fix package name
We specify it twice because GitHub wouldn't recognise the spaCy repo as a package (e.g. for its "used by" stats) if the name weren't specified inline
2020-07-01 16:47:26 +02:00
Ines Montani
49105034cb Auto-format 2020-07-01 16:46:56 +02:00
Ines Montani
8d90e44d74 Fix title 2020-07-01 15:38:01 +02:00
Ines Montani
85e816738f Merge branch 'develop' into spacy.io-develop 2020-07-01 15:37:03 +02:00
Ines Montani
8fb574900a Update parent package and version 2020-07-01 15:35:23 +02:00
Ines Montani
4f42bcdd13 Merge branch 'develop' of https://github.com/explosion/spaCy into develop 2020-07-01 15:33:57 +02:00
Ines Montani
38f226bda8 Update images [ci skip] 2020-07-01 15:33:54 +02:00
Matthew Honnibal
0ada186dda Set version to v3.0.0.dev14 2020-07-01 15:31:04 +02:00
Matthw Honnibal
cb51bb637b Merge branch 'develop' of https://github.com/explosion/spaCy into develop 2020-07-01 15:17:27 +02:00
Matthw Honnibal
7734cbc34d Set batch size in begin_training 2020-07-01 15:16:59 +02:00
Matthw Honnibal
1f7709e9a6 Improve max length check in corpus 2020-07-01 15:16:43 +02:00
Matthw Honnibal
2fa56484b2 Fix eval batch size 2020-07-01 15:16:25 +02:00
Matthw Honnibal
c5d12d1a22 Allow batch size to be set for evaluation in spacy train 2020-07-01 15:04:36 +02:00
Ines Montani
6e28760316 Fix 404 [ci skip] 2020-07-01 15:02:55 +02:00
Matthw Honnibal
f5532757a3 Filter out 0-length examples in Corpus 2020-07-01 15:02:37 +02:00
Ines Montani
7037512e55 Handle robots.txt for nightly/special deploys [ci skip] 2020-07-01 14:50:58 +02:00
Ines Montani
1220fd3e6c Handle robots.txt for nightly/special deploys [ci skip] 2020-07-01 14:50:38 +02:00
Ines Montani
997f6eeca7 Adjust nightly site url [ci skip] 2020-07-01 14:42:59 +02:00
Ines Montani
e1eb48e932 Add nightly social image [ci skip] 2020-07-01 14:41:13 +02:00
Ines Montani
5d02f71653 Add nightly favicon and Binder [ci skip] 2020-07-01 14:33:33 +02:00
Ines Montani
db12ee4da9 Merge branch 'develop' of https://github.com/explosion/spaCy into develop 2020-07-01 14:21:49 +02:00
Ines Montani
bc87ba97e0
Merge pull request #5681 from svlandeg/bugfix/exec-cwd 2020-07-01 14:13:19 +02:00
Ines Montani
dc6d9c2fac Auto-infer nightly state from branch 2020-07-01 14:05:11 +02:00
Ines Montani
0e28edd2cb Update netlify.toml [ci skip] 2020-07-01 13:34:52 +02:00
Ines Montani
02334aeafc Make alert more prominent 2020-07-01 13:25:13 +02:00
Ines Montani
360e54863a Update netlify.toml 2020-07-01 13:23:58 +02:00
Ines Montani
a0204e7d9a Revert change for now 2020-07-01 13:15:34 +02:00
Ines Montani
53ffee91b4 Experiment with hiding "new" tags 2020-07-01 13:11:00 +02:00
Ines Montani
f9a56a6993 Update site to support nightly mode 2020-07-01 13:03:04 +02:00
Ines Montani
5e24b8d481 Set to nightly 2020-07-01 13:02:30 +02:00
Ines Montani
26df4efa94 Add new in v3.0 2020-07-01 13:02:17 +02:00
Ines Montani
18a900abc2 Fix markup 2020-07-01 13:02:07 +02:00
Ines Montani
414dc7ace1 Merge branch 'spacy.io' into spacy.io-develop 2020-07-01 11:47:47 +02:00
Matthw Honnibal
52338a07bb Set version to v3.0.0.dev13 2020-07-01 02:49:17 +02:00
Matthw Honnibal
fa6d473390 Fix parser maxout_pieces=1 2020-07-01 02:48:58 +02:00
Matthw Honnibal
35af5819e0 Merge branch 'develop' of https://github.com/explosion/spaCy into develop 2020-07-01 01:03:39 +02:00
Matthw Honnibal
0d6edf5397 Clean up debug code in transition_system 2020-07-01 01:03:20 +02:00
Matthw Honnibal
a1b6add4c8 Fix parser gold cutting and gradient normalization 2020-07-01 01:02:58 +02:00
Matthw Honnibal
8c5a88e777 Fix per-epoch shuffling 2020-07-01 01:02:35 +02:00
svlandeg
a7d547c65e small fix 2020-06-30 21:56:17 +02:00
svlandeg
8eca7e995e add try-except to git commands to get an informative warning 2020-06-30 21:53:40 +02:00
Ines Montani
b032943c34 Fix funny printing again 2020-06-30 21:33:41 +02:00
Matthw Honnibal
d525552979 Fix efficiency of parser backprop_nonlinearity 2020-06-30 21:22:54 +02:00
Ines Montani
d64644d9d1 Adjust auto-formatting 2020-06-30 20:36:30 +02:00
Ines Montani
6da3500728 Fix command substitution 2020-06-30 20:35:51 +02:00
Álvaro Abella Bascarán
7111b9de2e Fix in docs: pipe(docs) instead of pipe(texts) (#5680)
Very minor fix in docs, specifically in this part:

```
matcher = PhraseMatcher(nlp.vocab)
for doc in matcher.pipe(texts, batch_size=50):
    pass
```

`texts` suggests the input is an iterable of strings. I replaced it with `docs`.
2020-06-30 20:01:12 +02:00
Matthias Hertel
305221f3e5 Website: fixed the token span in the text about the rule-based matching example (#5669)
* fixed token span in pattern matcher example

* contributor agreement
2020-06-30 19:58:55 +02:00
svlandeg
e7aff9c5fc bugfix exec usage in dvc.yaml 2020-06-30 18:51:20 +02:00
Ines Montani
3383d1c822
Merge pull request #5679 from svlandeg/fix/project-exp 2020-06-30 18:00:11 +02:00
svlandeg
60f97bc519 add custom warning when run_command fails 2020-06-30 17:28:43 +02:00
svlandeg
39953c7c60 fix print_run_help with new arg order 2020-06-30 17:28:09 +02:00
svlandeg
cd632d8ec2 move folder for exec argument one up 2020-06-30 17:19:36 +02:00
svlandeg
1ae6fa2554 move subcommand one place up as project_dir has default 2020-06-30 16:04:53 +02:00
svlandeg
a46b76f188 use current working dir as default throughout 2020-06-30 15:39:24 +02:00
svlandeg
b228111925 fix funny printing 2020-06-30 14:54:45 +02:00
Ines Montani
8e20505970 Resolve within working_dir context manager 2020-06-30 13:29:45 +02:00
Ines Montani
72175b5c60 Update project command 2020-06-30 13:17:26 +02:00
Ines Montani
c5e31acb06 Make working_dir yield absolute cwd path 2020-06-30 13:17:14 +02:00
Ines Montani
3aca404735 Make run_command take string and list 2020-06-30 13:17:00 +02:00
Ines Montani
7584fdafec Fix typo 2020-06-30 12:59:13 +02:00
Ines Montani
5f325b602b
Merge pull request #5674 from svlandeg/fix/small-edits 2020-06-30 12:56:14 +02:00
svlandeg
140c4896a0 split_command util function 2020-06-30 12:54:15 +02:00
Matthw Honnibal
57e09747dc Improve efficiency of get_oracle_sequences 2020-06-30 11:50:48 +02:00
Matthw Honnibal
233945bfe0 Fix init for padding 2020-06-30 11:50:24 +02:00
svlandeg
d23be563eb remove redundant setting of no_args_is_help 2020-06-30 11:23:35 +02:00
svlandeg
b311ce982f Merge remote-tracking branch 'upstream/develop' into fix/small-edits
# Conflicts:
#	spacy/cli/project.py
2020-06-30 11:17:31 +02:00
svlandeg
7e4cbda89a fix project_init for relative path 2020-06-30 11:09:53 +02:00
Matthw Honnibal
85ed5730a2 Merge branch 'develop' of https://github.com/explosion/spaCy into develop 2020-06-30 01:14:16 +02:00
Ines Montani
e8033df81e Also handle python3 and pip3 2020-06-29 20:30:42 +02:00
Ines Montani
c874dde66c Show help on "spacy project" 2020-06-29 20:11:34 +02:00
Ines Montani
1d2c646e57 Fix init and remove .dvc/plots 2020-06-29 20:07:21 +02:00
Matthw Honnibal
5bed6fc431 Merge branch 'develop' of https://github.com/explosion/spaCy into develop 2020-06-29 19:55:24 +02:00
svlandeg
1176783310 fix one more shlex.split 2020-06-29 18:37:42 +02:00
svlandeg
ff233d5743 print details on error msg (e.g. PermissionError on specific file) 2020-06-29 18:22:33 +02:00
svlandeg
894b8e7ff6 throw warning (instead of crashing) when temp dir can't be cleaned 2020-06-29 18:16:39 +02:00
svlandeg
efe7eb71f2 create subfolder in working dir 2020-06-29 17:46:08 +02:00
svlandeg
3487214ba1 fix shlex.split for non-posix 2020-06-29 17:45:47 +02:00
Ines Montani
126050f259 Improve asset fetching
Get all paths first and run dvc add once so it only shows one progress bar and one combined git command (if repo is git repo)
2020-06-29 16:55:24 +02:00
Ines Montani
7c08713baa Improve error messages 2020-06-29 16:54:47 +02:00
Ines Montani
24664efa23 Import project_run_all function 2020-06-29 16:54:19 +02:00
svlandeg
f8dddeda27 print help msg when just calling 'project' without args 2020-06-29 16:38:15 +02:00
svlandeg
bf43ebbf61 fix typos 2020-06-29 16:32:25 +02:00
Matthew Honnibal
67928036f2 Set version to v3.0.0.dev12 2020-06-29 14:45:43 +02:00
Sofie Van Landeghem
8d3c0306e1
refactor fixes (#5664)
* fixes in ud_train, UX for morphs

* update pyproject with new version of thinc

* fixes in debug_data script

* cleanup of old unused error messages

* remove obsolete TempErrors

* move error messages to errors.py

* add ENT_KB_ID to default DocBin serialization

* few fixes to simple_ner

* fix tags
2020-06-29 14:33:00 +02:00
Sofie Van Landeghem
fc3cb1fa9e
NER align tests (#5656)
* one_to_many works better. misalignment doesn't yet.

* fix tests

* restore example

* xfail alignment tests
2020-06-29 13:59:17 +02:00
Matthew Honnibal
2d9604d39c Set version to v3.0.0.dev11 2020-06-29 13:56:46 +02:00
Matthew Honnibal
0a54022138 Merge branch 'develop' of https://github.com/explosion/spaCy into develop 2020-06-29 13:56:20 +02:00
Matthew Honnibal
acbf6345c9 Fix thinc dependency 2020-06-29 13:56:07 +02:00
Matthw Honnibal
da50473701 Tweak efficiency of arc_eager.set_costs 2020-06-29 12:17:41 +02:00
Ines Montani
bac8a8d766 Merge branch 'feature/project-cli' into develop 2020-06-29 10:49:05 +02:00
Sofie Van Landeghem
cfeb2ba4d7
updating thinc also in pyproject.toml 2020-06-29 09:51:20 +02:00
Matthew Honnibal
e14bf9decb Set version to v3.0.0.dev9 2020-06-28 23:58:10 +02:00
Matthew Honnibal
7c9178d503 Update requirements 2020-06-28 23:56:45 +02:00
Matthew Honnibal
58c8f731bd Set version to v3.0.0.dev9 2020-06-28 23:53:14 +02:00
Ines Montani
569376e34e Replace curl with requests 2020-06-28 16:25:53 +02:00
Ines Montani
dbe86b3453 Update project.py 2020-06-28 15:45:19 +02:00
Ines Montani
dbfa292ed3 Output more stats in evaluate 2020-06-28 15:34:28 +02:00
Ines Montani
90b7fa8fed Run DVC command in project dir 2020-06-28 15:33:53 +02:00
Ines Montani
2f6ee0d018 Tidy up, document and add custom clone logic 2020-06-28 15:08:35 +02:00
Matthew Honnibal
dc7a9be9f8 Merge branch 'feature/project-cli' of https://github.com/explosion/spaCy into feature/project-cli 2020-06-28 14:07:53 +02:00
Matthew Honnibal
e08257d401 Add example of how to do sparse-checkout 2020-06-28 14:07:32 +02:00
Ines Montani
1b331237aa Update hashing and config update 2020-06-28 13:17:19 +02:00
Ines Montani
f385344286 Update asset logic and add import-url 2020-06-28 13:07:31 +02:00
Ines Montani
d6aa4cb478 Update asset logic 2020-06-28 12:40:11 +02:00
Ines Montani
ed46951842 Update 2020-06-28 12:24:59 +02:00
Ines Montani
d54f33441a Merge branch 'feature/project-cli' of https://github.com/explosion/spaCy into feature/project-cli 2020-06-27 21:17:00 +02:00
Ines Montani
cd0dd78276 Simplify model loading (now supported via load_model) 2020-06-27 21:16:57 +02:00
Matthew Honnibal
8e3baebdce Merge branch 'feature/project-cli' of https://github.com/explosion/spaCy into feature/project-cli 2020-06-27 21:16:18 +02:00
Matthew Honnibal
d8c70b415e Fix Example usage in evaluate 2020-06-27 21:15:25 +02:00
Ines Montani
e33d2b1bea Add success message 2020-06-27 21:15:13 +02:00
Ines Montani
42eb381ec6 Improve output handling in evaluate 2020-06-27 21:13:11 +02:00
Ines Montani
df22d490b1 Tidy up types 2020-06-27 21:13:06 +02:00
Ines Montani
6678bd80c2 Check if deps exist in non-DVC commands 2020-06-27 20:57:26 +02:00
Ines Montani
fe06697150 Fix package command and add version option 2020-06-27 20:36:08 +02:00
Ines Montani
165c37ccba Update project.py 2020-06-27 15:03:21 +02:00
Ines Montani
8979dc254f Update project init 2020-06-27 14:40:28 +02:00
Ines Montani
c96b4a37b6 Update DVC integration 2020-06-27 14:15:41 +02:00
Ines Montani
7a0fe50610 Merge branch 'develop' into feature/project-cli 2020-06-27 13:03:03 +02:00
Ines Montani
8b305253d3 Update with DVC WIP 2020-06-27 13:02:10 +02:00
Matthew Honnibal
de96a3950c Update config 2020-06-26 23:50:22 +02:00
Matthw Honnibal
4ff9a837fc Fix _fix_legacy_dict_data in Example 2020-06-26 23:46:18 +02:00
Matthw Honnibal
1d672e0c12 Revert "attempt to fix _guess_spaces"
This reverts commit 5b6ed05752.
2020-06-26 23:42:41 +02:00
Matthew Honnibal
8c29268749
Improve spacy.gold (no GoldParse, no json format!) (#5555)
* Update errors

* Remove beam for now (maybe)

Remove beam_utils

Update setup.py

Remove beam

* Remove GoldParse

WIP on removing goldparse

Get ArcEager compiling after GoldParse excise

Update setup.py

Get spacy.syntax compiling after removing GoldParse

Rename NewExample -> Example and clean up

Clean html files

Start updating tests

Update Morphologizer

* fix error numbers

* fix merge conflict

* informative error when calling to_array with wrong field

* fix error catching

* fixing language and scoring tests

* start testing get_aligned

* additional tests for new get_aligned function

* Draft create_gold_state for arc_eager oracle

* Fix import

* Fix import

* Remove TokenAnnotation code from nonproj

* fixing NER one-to-many alignment

* Fix many-to-one IOB codes

* fix test for misaligned

* attempt to fix cases with weird spaces

* fix spaces

* test_gold_biluo_different_tokenization works

* allow None as BILUO annotation

* fixed some tests + WIP roundtrip unit test

* add spaces to json output format

* minibatch utility can deal with strings, docs or examples

* fix augment (needs further testing)

* various fixes in scripts - needs to be further tested

* fix test_cli

* cleanup

* correct silly typo

* add support for MORPH in to/from_array, fix morphologizer overfitting test

* fix tagger

* fix entity linker

* ensure test keeps working with non-linked entities

* pipe() takes docs, not examples

* small bug fix

* textcat bugfix

* throw informative error when running the components with the wrong type of objects

* fix parser tests to work with example (most still failing)

* fix BiluoPushDown parsing entities

* small fixes

* bugfix tok2vec

* fix renames and simple_ner labels

* various small fixes

* prevent writing dummy values like deps because that could interfere with sent_start values

* fix the fix

* implement split_sent with aligned SENT_START attribute

* test for split sentences with various alignment issues, works

* Return ArcEagerGoldParse from ArcEager

* Update parser and NER gold stuff

* Draft new GoldCorpus class

* add links to to_dict

* clean up

* fix test checking for variants

* Fix oracles

* Start updating converters

* Move converters under spacy.gold

* Move things around

* Fix naming

* Fix name

* Update converter to produce DocBin

* Update converters

* Allow DocBin to take list of Doc objects.

* Make spacy convert output docbin

* Fix import

* Fix docbin

* Fix compile in ArcEager

* Fix import

* Serialize all attrs by default

* Update converter

* Remove jsonl converter

* Add json2docs converter

* Draft Corpus class for DocBin

* Work on train script

* Update Corpus

* Update DocBin

* Allocate Doc before starting to add words

* Make doc.from_array several times faster

* Update train.py

* Fix Corpus

* Fix parser model

* Start debugging arc_eager oracle

* Update header

* Fix parser declaration

* Xfail some tests

* Skip tests that cause crashes

* Skip test causing segfault

* Remove GoldCorpus

* Update imports

* Update after removing GoldCorpus

* Fix module name of corpus

* Fix mimport

* Work on parser oracle

* Update arc_eager oracle

* Restore ArcEager.get_cost function

* Update transition system

* Update test_arc_eager_oracle

* Remove beam test

* Update test

* Unskip

* Unskip tests

* add links to to_dict

* clean up

* fix test checking for variants

* Allow DocBin to take list of Doc objects.

* Fix compile in ArcEager

* Serialize all attrs by default

Move converters under spacy.gold

Move things around

Fix naming

Fix name

Update converter to produce DocBin

Update converters

Make spacy convert output docbin

Fix import

Fix docbin

Fix import

Update converter

Remove jsonl converter

Add json2docs converter

* Allocate Doc before starting to add words

* Make doc.from_array several times faster

* Start updating converters

* Work on train script

* Draft Corpus class for DocBin

Update Corpus

Fix Corpus

* Update DocBin

Add missing strings when serializing

* Update train.py

* Fix parser model

* Start debugging arc_eager oracle

* Update header

* Fix parser declaration

* Xfail some tests

Skip tests that cause crashes

Skip test causing segfault

* Remove GoldCorpus

Update imports

Update after removing GoldCorpus

Fix module name of corpus

Fix mimport

* Work on parser oracle

Update arc_eager oracle

Restore ArcEager.get_cost function

Update transition system

* Update tests

Remove beam test

Update test

Unskip

Unskip tests

* Add get_aligned_parse method in Example

Fix Example.get_aligned_parse

* Add kwargs to Corpus.dev_dataset to match train_dataset

* Update nonproj

* Use get_aligned_parse in ArcEager

* Add another arc-eager oracle test

* Remove Example.doc property

Remove Example.doc

Remove Example.doc

Remove Example.doc

Remove Example.doc

* Update ArcEager oracle

Fix Break oracle

* Debugging

* Fix Corpus

* Fix eg.doc

* Format

* small fixes

* limit arg for Corpus

* fix test_roundtrip_docs_to_docbin

* fix test_make_orth_variants

* fix add_label test

* Update tests

* avoid writing temp dir in json2docs, fixing 4402 test

* Update test

* Add missing costs to NER oracle

* Update test

* Work on Example.get_aligned_ner method

* Clean up debugging

* Xfail tests

* Remove prints

* Remove print

* Xfail some tests

* Replace unseen labels for parser

* Update test

* Update test

* Xfail test

* Fix Corpus

* fix imports

* fix docs_to_json

* various small fixes

* cleanup

* Support gold_preproc in Corpus

* Support gold_preproc

* Pass gold_preproc setting into corpus

* Remove debugging

* Fix gold_preproc

* Fix json2docs converter

* Fix convert command

* Fix flake8

* Fix import

* fix output_dir (converted to Path by typer)

* fix var

* bugfix: update states after creating golds to avoid out of bounds indexing

* Improve efficiency of ArcEager oracle

* pull merge_sent into iob2docs to avoid Doc creation for each line

* fix asserts

* bugfix excl Span.end in iob2docs

* Support max_length in Corpus

* Fix arc_eager oracle

* Filter out unannotated sentences in NER

* Remove debugging in parser

* Simplify NER alignment

* Fix conversion of NER data

* Fix NER init_gold_batch

* Tweak efficiency of precomputable affine

* Update onto-json default

* Update gold test for NER

* Fix parser test

* Update test

* Add NER data test

* Fix convert for single file

* Fix test

* Hack scorer to avoid evaluating non-nered data

* Fix handling of NER data in Example

* Output unlabelled spans from O biluo tags in iob_utils

* Fix unset variable

* Return kept examples from init_gold_batch

* Return examples from init_gold_batch

* Don't return Example from init_gold_batch

* Set spaces on gold doc after conversion

* Add test

* Fix spaces reading

* Improve NER alignment

* Improve handling of missing values in NER

* Restore the 'cutting' in parser training

* Add assertion

* Print epochs

* Restore random cuts in parser/ner training

* Implement Doc.copy

* Implement Example.copy

* Copy examples at the start of Language.update

* Don't unset example docs

* Tweak parser model slightly

* attempt to fix _guess_spaces

* _add_entities_to_doc first, so that links don't get overwritten

* fixing get_aligned_ner for one-to-many

* fix indexing into x_text

* small fix biluo_tags_from_offsets

* Add onto-ner config

* Simplify NER alignment

* Fix NER scoring for partially annotated documents

* fix indexing into x_text

* fix test_cli failing tests by ignoring spans in doc.ents with empty label

* Fix limit

* Improve NER alignment

* Fix count_train

* Remove print statement

* fix tests: we no longer have only None values

* fix clumsy fingers

* Fix tests

* Fix doc.ents

* Remove empty docs in Corpus and improve limit

* Update config

Co-authored-by: svlandeg <sofie.vanlandeghem@gmail.com>
2020-06-26 19:34:12 +02:00
Adriane Boyd
d777d9cc38 Extend v2.3 migration guide (#5653)
* Extend preloaded vocab section

* Add section on tag maps
2020-06-26 14:13:01 +02:00
Ines Montani
5d235fb767 Merge branch 'develop' into feature/project-cli 2020-06-25 12:27:58 +02:00
Ines Montani
01c394eb23 Update to latest Typer and remove hacks 2020-06-25 12:27:19 +02:00
Ines Montani
82a03ee18e Replace python with sys.executable 2020-06-25 12:26:53 +02:00
Adriane Boyd
a2660bd9c6 Fix backslashes in warnings config diff (#5640)
Fix backslashes in warnings config diff in v2.3 migration section.
2020-06-24 10:26:57 +02:00
Adriane Boyd
4f73ced914 Extend what's new in v2.3 with vocab / is_oov (#5635) 2020-06-23 16:50:43 +02:00
Ines Montani
8131a65dee Update __init__.py 2020-06-22 16:09:09 +02:00
Ines Montani
2ad7a02400 Merge branch 'develop' into feature/project-cli 2020-06-22 15:33:11 +02:00
Ines Montani
83b4aa05c9
Merge pull request #5626 from explosion/feature/typer 2020-06-22 06:29:03 -07:00
Ines Montani
0ee6d7a4d1 Remove project stuff from this branch 2020-06-22 14:54:38 +02:00
Ines Montani
a6b76440b7 Update project CLI 2020-06-22 14:53:31 +02:00
Adriane Boyd
fcdecefacf Add warnings example in v2.3 migration guide (#5627) 2020-06-22 14:38:06 +02:00
Ines Montani
3f2f5f9cb3 Remove ml_datasets from install dependencies 2020-06-22 12:14:51 +02:00
Ines Montani
ea9fd3abcd Replace plac with typer [ci skip] 2020-06-22 12:04:41 +02:00
Ines Montani
95cc9d657d Update srsly pin [ci skip] 2020-06-22 11:57:46 +02:00
Ines Montani
34d59b494f
Merge pull request #5619 from explosion/master-tmp 2020-06-22 02:36:08 -07:00
Ines Montani
dc5d535659 Tidy up info 2020-06-22 01:17:11 +02:00
Ines Montani
189ed56777 Fix and simplify info 2020-06-22 01:07:48 +02:00
Ines Montani
fca3907d4e Add correct uppercase variants for boolean flags 2020-06-22 00:57:28 +02:00
Ines Montani
79dd824906 Tidy up 2020-06-22 00:45:40 +02:00
Ines Montani
1e5b4d8524 Fix DVC check 2020-06-22 00:30:05 +02:00
Ines Montani
5ba1df5e78 Update project CLI 2020-06-22 00:15:06 +02:00
Ines Montani
ef5f548fb0 Tidy up and auto-format 2020-06-21 22:38:04 +02:00
Ines Montani
f77e0bc028 Merge branch 'develop' into master-tmp 2020-06-21 22:34:15 +02:00
Ines Montani
40bb918a4c Remove unicode declarations and tidy up 2020-06-21 22:34:10 +02:00
Ines Montani
e0c16c0577 Update wasabi pin 2020-06-21 22:25:34 +02:00
Ines Montani
275bab62df Refactor CLI 2020-06-21 21:35:01 +02:00
Ines Montani
c12713a8be Port CLI to Typer and add project stubs 2020-06-21 13:44:00 +02:00
svlandeg
689600e17d add additional test back in (it works now) 2020-06-20 23:23:57 +02:00
svlandeg
2f6062a8a4 add line that got removed from EntityLinker 2020-06-20 23:14:45 +02:00
svlandeg
12dc8ab208 remove redundant code from master in EntityLinker 2020-06-20 23:07:42 +02:00
svlandeg
6179774278 fix test_build_dependencies by ignoring new libs 2020-06-20 22:49:37 +02:00
svlandeg
256d4c27c8 fix tagger begin_training being called without examples 2020-06-20 22:38:00 +02:00
svlandeg
5cb812e0ab fix NER warn empty lookups (cf PR #5588) 2020-06-20 22:04:18 +02:00
svlandeg
c9242e9bf4 fix entity linker (cf PR #5548) 2020-06-20 21:47:23 +02:00
svlandeg
dc069e90b3 fix token.morph_ for v.3 (cf PR #5517) 2020-06-20 21:13:11 +02:00
Ines Montani
988d2a4eda
Add --code-path option to train CLI (#5618) 2020-06-20 18:43:12 +02:00
Ines Montani
5424b70e51 Remove v2 test 2020-06-20 16:18:53 +02:00
Ines Montani
63c22969f4 Update test_issue5230.py 2020-06-20 16:17:48 +02:00
Ines Montani
296b5d633b Remove references to Python 2 / is_python2 2020-06-20 16:11:13 +02:00
Ines Montani
0cdb631e6c Fix merge errors 2020-06-20 16:02:42 +02:00
Ines Montani
52728d8fa3 Merge branch 'develop' into master-tmp 2020-06-20 15:52:00 +02:00
Ines Montani
dbe9c29f61
Merge pull request #5617 from explosion/chore/tidy-auto-format 2020-06-20 05:47:44 -07:00
Ines Montani
f91e9e8c84 Remove F841 [ci skip] 2020-06-20 14:47:17 +02:00
Ines Montani
8283df80e9 Tidy up and auto-format 2020-06-20 14:15:04 +02:00
Adriane Boyd
66889de166 Warning for sudachipy 0.4.5 (#5611) 2020-06-19 13:45:23 +02:00
Ines Montani
959bc616dd Merge branch 'master' into spacy.io 2020-06-16 22:50:11 +02:00
Ines Montani
19b9ea0436 Fix languages.json 2020-06-16 18:34:11 +02:00
Ines Montani
ed240458f6 Try and upgrade gatsby 2020-06-16 18:28:24 +02:00
Ines Montani
0faabf3325 Merge branch 'master' into spacy.io 2020-06-16 18:13:44 +02:00
Ines Montani
19be89b2ce Merge branch 'master' into spacy.io 2020-06-16 17:36:14 +02:00
Ines Montani
ec6e35c1c2 Merge branch 'master' into spacy.io 2020-06-16 17:13:49 +02:00
Ines Montani
ec26180b8f Merge branch 'master' into spacy.io 2020-06-16 16:38:55 +02:00
Ines Montani
e9711c2f17 Merge branch 'master' into spacy.io 2020-06-16 16:10:28 +02:00
Matthew Honnibal
a1c5b694be Small fixes to train defaults 2020-06-12 02:22:13 +02:00
Sofie Van Landeghem
c0f4a1e43b
train is from-config by default (#5575)
* verbose and tag_map options

* adding init_tok2vec option and only changing the tok2vec that is specified

* adding omit_extra_lookups and verifying textcat config

* wip

* pretrain bugfix

* add replace and resume options

* train_textcat fix

* raw text functionality

* improve UX when a KeyError occurs or when input data can't be parsed

* avoid unnecessary access to goldparse in TextCat pipe

* save performance information in nlp.meta

* add noise_level to config

* move nn_parser's defaults to config file

* multitask in config - doesn't work yet

* scorer offering both F and AUC options, need to be specified in config

* add textcat verification code from old train script

* small fixes to config files

* clean up

* set default config for ner/parser to allow create_pipe to work as before

* two more test fixes

* small fixes

* cleanup

* fix NER pickling + additional unit test

* create_pipe as before
2020-06-12 02:02:07 +02:00
Martino Mensio
487be097ea adding spacy-universal-sentence-encoder (#5534)
* adding spacy-universal-sentence-encoder

* update affiliation

* updated code example
2020-06-08 20:27:27 +02:00
Ines Montani
d93cbeb14f
Add warning for loose version constraints (#5536)
* Add warning for loose version constraints

* Update wording [ci skip]

* Tweak error message

Co-authored-by: Matthew Honnibal <honnibal+gh@gmail.com>
2020-06-05 12:42:15 +02:00
Matthew Honnibal
8411d4f4e6
Merge pull request #5543 from svlandeg/feature/pretrain-config
pretrain from config
2020-06-04 19:07:12 +02:00
svlandeg
3ade455fd3 formatting 2020-06-04 16:09:55 +02:00
svlandeg
776d4f1190 cleanup 2020-06-04 16:07:30 +02:00
svlandeg
6b027d7689 remove duplicate model definition of tok2vec layer 2020-06-04 15:49:23 +02:00
svlandeg
1775f54a26 small little fixes 2020-06-03 22:17:02 +02:00
svlandeg
07886a3de3 rename init_tok2vec to resume 2020-06-03 22:00:25 +02:00
svlandeg
4ed6278663 small fixes to pretrain config, init_tok2vec TODO 2020-06-03 19:32:40 +02:00
Ines Montani
56a9d1b78c
Merge pull request #5479 from explosion/master-tmp 2020-06-03 15:31:27 +02:00
svlandeg
ddf8244df9 add normalize option to distance metric 2020-06-03 14:52:54 +02:00
svlandeg
ffe0451d09 pretrain from config 2020-06-03 14:45:00 +02:00
Ines Montani
a8875d4a4b Fix typo 2020-06-03 14:42:39 +02:00
Ines Montani
4e0610d0d4 Update warning codes 2020-06-03 14:37:09 +02:00
Ines Montani
810fce3bb1 Merge branch 'develop' into master-tmp 2020-06-03 14:36:59 +02:00
Adriane Boyd
b0ee76264b Remove debugging 2020-06-03 14:20:42 +02:00
Adriane Boyd
1d8168d1fd Fix problems with lower and whitespace in variants
Port relevant changes from #5361:

* Initialize lower flag explicitly

* Handle whitespace words from GoldParse correctly when creating raw
text with orth variants
2020-06-03 14:15:58 +02:00
Adriane Boyd
10d938f221 Update default cfg dir in train CLI 2020-06-03 14:15:50 +02:00
Adriane Boyd
f1f9c8b417 Port train CLI updates
Updates from #5362 and fix from #5387:

* `train`:

  * if training on GPU, only run evaluation/timing on CPU in the first
    iteration

  * if training is aborted, exit with a non-0 exit status
2020-06-03 14:03:43 +02:00
svlandeg
109bbdab98 update config files with separate dropout for Tok2Vec layer 2020-06-03 11:53:59 +02:00
svlandeg
eac12cbb77 make dropout in embed layers configurable 2020-06-03 11:50:16 +02:00
svlandeg
e91485dfc4 add discard_oversize parameter, move optimizer to training subsection 2020-06-03 10:04:16 +02:00
svlandeg
03c58b488c prevent infinite loop, custom warning 2020-06-03 10:00:21 +02:00
svlandeg
6504b7f161 Merge remote-tracking branch 'upstream/develop' into feature/pretrain-config 2020-06-03 08:30:16 +02:00
Matthew Honnibal
f74784575c
Merge pull request #5533 from svlandeg/bugfix/minibatch-oversize
add oversize examples before StopIteration returns
2020-06-02 22:54:38 +02:00
svlandeg
c5ac382f0a fix name clash 2020-06-02 22:24:57 +02:00
svlandeg
2bf5111ecf additional test with discard_oversize=False 2020-06-02 22:09:37 +02:00
svlandeg
aa6271b16c extending algorithm to deal better with edge cases 2020-06-02 22:05:08 +02:00
svlandeg
f2e162fc60 it's only oversized if the tolerance level is also exceeded 2020-06-02 19:59:04 +02:00
svlandeg
ef834b4cd7 fix comments 2020-06-02 19:50:44 +02:00
svlandeg
6208d322d3 slightly more challenging unit test 2020-06-02 19:47:30 +02:00
svlandeg
6651fafd5c using overflow buffer for examples within the tolerance margin 2020-06-02 19:43:39 +02:00
svlandeg
85b0597ed5 add test for minibatch util 2020-06-02 18:26:21 +02:00
svlandeg
5b350a6c99 bugfix of the bugfix 2020-06-02 17:49:33 +02:00
svlandeg
fdfd822936 rewrite minibatch_by_words function 2020-06-02 15:22:54 +02:00
svlandeg
ec52e7f886 add oversize examples before StopIteration returns 2020-06-02 13:21:55 +02:00
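The minibatch commits above describe batching by word count with a tolerance margin and an overflow buffer for examples that slightly exceed the limit. A minimal sketch of that idea, assuming plain (item, word_count) pairs rather than spaCy's actual Example objects or function signature:

```
def minibatch_by_words(items, max_words, tolerance=0.2):
    """Yield batches whose total word count stays near max_words.

    Items within the tolerance margin go into an overflow buffer and are
    emitted with the current batch; items beyond the margin are skipped
    here (the real implementation's policy may differ).
    """
    hard_limit = max_words * (1 + tolerance)
    batch, batch_size = [], 0
    overflow, overflow_size = [], 0
    for item, n_words in items:
        if n_words > hard_limit:
            continue  # genuinely oversized example
        if batch_size + n_words <= max_words:
            batch.append(item)
            batch_size += n_words
        elif batch_size + overflow_size + n_words <= hard_limit:
            overflow.append(item)
            overflow_size += n_words
        else:
            yield batch + overflow
            batch, batch_size = [item], n_words
            overflow, overflow_size = [], 0
    if batch or overflow:
        # emit the remainder before StopIteration
        yield batch + overflow

print(list(minibatch_by_words([("a b c", 3), ("d e", 2), ("f", 1)], max_words=4)))
```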
svlandeg
e0f9f448f1 remove Tensorizer 2020-06-01 23:38:48 +02:00
Ines Montani
b5ae2edcba
Merge pull request #5516 from explosion/feature/improve-model-version-deps 2020-05-31 12:54:01 +02:00
Matthw Honnibal
cd5f748e09 Add onto-joint experiment file 2020-05-30 20:27:47 +02:00
Matthw Honnibal
d1c2e88d0f Merge branch 'develop' of https://github.com/explosion/spaCy into develop 2020-05-30 19:23:12 +02:00
Ines Montani
dc186afdc5 Add warning 2020-05-30 15:34:54 +02:00
Ines Montani
2bdf787417 Merge branch 'develop' into feature/improve-model-version-deps 2020-05-30 15:20:20 +02:00
Ines Montani
368182776e Tidy up dependencies 2020-05-30 15:19:53 +02:00
Ines Montani
b7aff6020c Make functions more general purpose and update docstrings and tests 2020-05-30 15:18:53 +02:00
Ines Montani
a7e370bcbf Don't override spaCy version 2020-05-30 15:03:18 +02:00
Ines Montani
e47e5a4b10 Use more sophisticated version parsing logic 2020-05-30 15:01:58 +02:00
Ines Montani
bed62991ad Tidy up requirements 2020-05-30 14:59:55 +02:00
Ines Montani
4fd087572a WIP: improve model version deps 2020-05-28 12:51:37 +02:00
Matthw Honnibal
58750b06f8 Merge branch 'develop' of https://github.com/explosion/spaCy into develop 2020-05-27 22:18:36 +02:00
Matthew Honnibal
a44d51a3d8
Merge pull request #5496 from explosion/docs/unicode-str
unicode -> str consistency
2020-05-26 10:30:37 +02:00
Rajat
0d3cfe155f update spacy universe with my project (#5497)
* added contextualSpellCheck in spacy universe meta

* removed extra formatting by code

* updated with permanent links

* run json linter used by spacy

* filled SCA

* updated the description
2020-05-25 11:31:21 +02:00
Ines Montani
1a15896ba9 unicode -> str consistency [ci skip] 2020-05-24 18:51:10 +02:00
Ines Montani
262d306eaa unicode -> str consistency 2020-05-24 17:23:00 +02:00
Ines Montani
5d3806e059 unicode -> str consistency 2020-05-24 17:20:58 +02:00
Ines Montani
cf156ed2f4
Merge pull request #5495 from explosion/fix/simplify-is-package 2020-05-24 15:42:55 +02:00
Ines Montani
387c7aba15 Update test 2020-05-24 14:55:16 +02:00
Ines Montani
f9786d765e Simplify is_package check 2020-05-24 14:48:56 +02:00
Ines Montani
15d3a0ac3a
Merge pull request #5491 from explosion/chore/rename-pipe-analysis 2020-05-23 12:41:54 +02:00
Matthw Honnibal
2d9de8684d Support use_pytorch_for_gpu_memory config 2020-05-22 23:10:40 +02:00
Ines Montani
4465cad6c5 Rename spacy.analysis to spacy.pipe_analysis 2020-05-22 17:42:06 +02:00
Ines Montani
25d6ed3fb8
Merge pull request #5489 from explosion/feature/connected-components 2020-05-22 17:40:11 +02:00
Ines Montani
841c05b47b
Merge pull request #5490 from explosion/fix/remove-jsonschema 2020-05-22 17:39:54 +02:00
Ines Montani
569a65b60e Auto-format 2020-05-22 16:55:42 +02:00
Ines Montani
d844528c5f Add test for is_compatible_model 2020-05-22 16:55:15 +02:00
Ines Montani
12b7be1d98 Remove jsonschema from dependencies 2020-05-22 16:49:26 +02:00
Matthew Honnibal
7a73a9dcf6
Merge pull request #5488 from explosion/feature/better-model-compat
Better model compatibility and validation
2020-05-22 16:44:29 +02:00
Matthew Honnibal
f7f6df7275 Move to spacy.analysis 2020-05-22 16:43:18 +02:00
Matthew Honnibal
78d79d94ce Guess set_annotations=True in nlp.update
During `nlp.update`, components can be passed a boolean set_annotations
to indicate whether they should assign annotations to the `Doc`. This
needs to be called if downstream components expect to use the
annotations during training, e.g. if we wanted to use tagger features in
the parser.

Components can specify their assignments and requirements, so we can
figure out which components have these inter-dependencies. After
figuring this out, we can guess whether to pass set_annotations=True.

We could also call set_annotations=True always, or even just have this
as the only behaviour. The downside of this is that it would require the
`Doc` objects to be created afresh to avoid problematic modifications.
One approach would be to make a fresh copy of the `Doc` objects within
`nlp.update()`, so that we can write to the objects without any
problems. If we do that, we can drop this logic and also drop the
`set_annotations` mechanism. I would be fine with that approach,
although it runs the risk of introducing some performance overhead, and
we'll have to take care to copy all extension attributes etc.
2020-05-22 15:55:45 +02:00
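A rough sketch of the inter-dependency check described in this commit message, assuming each component simply exposes `assigns` and `requires` sets of annotation names; spaCy's real pipe-analysis helpers differ in detail:

```
from types import SimpleNamespace

def guess_set_annotations(pipeline):
    """pipeline: list of (name, component) pairs. A component should write
    its annotations to the Doc if any *later* component requires them."""
    decisions = {}
    for i, (name, component) in enumerate(pipeline):
        downstream_requires = set()
        for _, later in pipeline[i + 1:]:
            downstream_requires |= set(getattr(later, "requires", ()))
        decisions[name] = bool(set(component.assigns) & downstream_requires)
    return decisions

tagger = SimpleNamespace(assigns={"token.tag"}, requires=set())
parser = SimpleNamespace(assigns={"token.dep"}, requires={"token.tag"})
print(guess_set_annotations([("tagger", tagger), ("parser", parser)]))
# -> {'tagger': True, 'parser': False}
```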
Ines Montani
6e6db6afb6 Better model compatibility and validation 2020-05-22 15:42:46 +02:00
Ines Montani
f30b9d3038 Merge branch 'master' into spacy.io 2020-05-22 13:50:37 +02:00
Ines Montani
85064b5c22 Merge branch 'master' into spacy.io 2020-05-21 21:55:04 +02:00
Ines Montani
dc94052d6e Merge branch 'master' into spacy.io 2020-05-21 21:01:32 +02:00
Ines Montani
5753b43e60 Tidy up and fix alignment of landing cards (#5317) 2020-05-21 20:56:04 +02:00
Ines Montani
32c2bb3d99 Add course to landing [ci skip] 2020-05-21 20:50:17 +02:00
Matthw Honnibal
25b51f4fc8 Set version to v3.0.0.dev9 2020-05-21 20:47:52 +02:00
Matthw Honnibal
bc94fdabd0 Fix begin_training 2020-05-21 20:46:21 +02:00
Matthw Honnibal
d507ac28d8 Fix shape inference 2020-05-21 20:46:10 +02:00
Matthw Honnibal
df87c32a40 Pass smaller doc sample into model initialize 2020-05-21 20:17:24 +02:00
Ines Montani
581bda9f98 Update senter test and auto-format 2020-05-21 20:17:14 +02:00
Adriane Boyd
132b2a6898 Merge remote-tracking branch 'upstream/master-tmp' into HEAD 2020-05-21 19:50:30 +02:00
Adriane Boyd
17ee9ab53a Fix _SP/POS=SPACE in strings serialization tests 2020-05-21 19:49:08 +02:00
Ines Montani
245f91df78 Fix merge issues 2020-05-21 19:42:13 +02:00
Matthw Honnibal
3b5cfec1fc Tweak memory management in train_from_config 2020-05-21 19:32:04 +02:00
Matthw Honnibal
f075655deb Fix shape inference in begin_training 2020-05-21 19:26:29 +02:00
Matthw Honnibal
1729165e90 Merge branch 'develop' of https://github.com/explosion/spaCy into develop 2020-05-21 19:11:08 +02:00
Ines Montani
631e20d0c6 Fix test and schemas 2020-05-21 19:01:02 +02:00
Ines Montani
d34fc0915e Remove serialization getter 2020-05-21 18:48:21 +02:00
Ines Montani
f44897e4c6 Update warning IDs 2020-05-21 18:39:11 +02:00
Ines Montani
24f72c669c Merge branch 'develop' into master-tmp 2020-05-21 18:39:06 +02:00
Matthew Honnibal
e6c4c1a507
Merge pull request #5468 from adrianeboyd/feature/cli-conllu-misc-ner
Improve handling of NER in CoNLL-U MISC
2020-05-21 16:39:46 +02:00
Kevin Lu
a3b7ae4f98 Update universe.json 2020-05-21 13:59:09 +02:00
Adriane Boyd
4b229bfc22 Improve handling of NER in CoNLL-U MISC 2020-05-20 18:48:51 +02:00
Matthew Honnibal
609c0ba557
Fix accidentally quadratic runtime in Example.split_sents (#5464)
* Tidy up train-from-config a bit

* Fix accidentally quadratic perf in TokenAnnotation.brackets

When we're reading in the gold data, we had a nested loop where
we looped over the brackets for each token, looking for brackets
that start on that word. This is accidentally quadratic, because
we have one bracket per word (for the POS tags). So we had
an O(N**2) behaviour here that ended up being pretty slow.

To solve this I'm indexing the brackets by their starting word
on the TokenAnnotations object, and having a property to provide
the previous view.

* Fixes
2020-05-20 18:48:18 +02:00
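A minimal illustration of the fix described above, indexing brackets by their starting token once instead of scanning all brackets for every token (plain tuples stand in for the actual TokenAnnotation object):

```
from collections import defaultdict

brackets = [(0, 2, "NP"), (1, 1, "NML"), (3, 5, "VP")]  # (start, end, label)
n_tokens = 6

# Accidentally quadratic: for every token, scan every bracket.
per_token_slow = [[b for b in brackets if b[0] == i] for i in range(n_tokens)]

# Linear: index the brackets by their starting token once.
by_start = defaultdict(list)
for start, end, label in brackets:
    by_start[start].append((start, end, label))
per_token_fast = [by_start.get(i, []) for i in range(n_tokens)]

assert per_token_slow == per_token_fast
```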
Matthw Honnibal
60e8da4813 Tidy up train-from-config a bit 2020-05-20 12:56:27 +02:00
Matthw Honnibal
fda7355508 Fix train-from-config 2020-05-20 12:30:21 +02:00
Matthw Honnibal
24efd54a42 Merge from develop 2020-05-20 12:27:31 +02:00
Sofie Van Landeghem
7f5715a081
Various fixes to NEL functionality, Example class etc (#5460)
* setting KB in the EL constructor, similar to how the model is passed on

* removing wikipedia example files - moved to projects

* throw an error when nlp.update is called with 2 positional arguments

* rewriting the config logic in create_pipe to accommodate other objects (e.g. KB) in the config

* update config files with new parameters

* avoid training pipeline components that don't have a model (like sentencizer)

* various small fixes + UX improvements

* small fixes

* set thinc to 8.0.0a9 everywhere

* remove outdated comment
2020-05-20 11:41:12 +02:00
Matthew Honnibal
664a3603b0 Set version to v3.0.0.dev8 2020-05-19 17:15:39 +02:00
Matthew Honnibal
a2830c3ef5 Use thinc 8.0.0a9 2020-05-19 16:23:11 +02:00
Sofie Van Landeghem
f00de445dd
default models defined in component decorator (#5452)
* move defaults to pipeline and use in component decorator

* black formatting

* relative import
2020-05-19 16:20:03 +02:00
Sofie Van Landeghem
0d94737857
Feature toggle_pipes (#5378)
* make disable_pipes deprecated in favour of the new toggle_pipes

* rewrite disable_pipes statements

* update documentation

* remove bin/wiki_entity_linking folder

* one more fix

* remove deprecated link to documentation

* few more doc fixes

* add note about name change to the docs

* restore original disable_pipes

* small fixes

* fix typo

* fix error number to W096

* rename to select_pipes

* also make changes to the documentation

Co-authored-by: Matthew Honnibal <honnibal+gh@gmail.com>
2020-05-18 22:27:10 +02:00
Matthew Honnibal
333b1a308b
Adapt parser and NER for transformers (#5449)
* Draft layer for BILUO actions

* Fixes to biluo layer

* WIP on BILUO layer

* Add tests for BILUO layer

* Format

* Fix transitions

* Update test

* Link in the simple_ner

* Update BILUO tagger

* Update __init__

* Import simple_ner

* Update test

* Import

* Add files

* Add config

* Fix label passing for BILUO and tagger

* Fix label handling for simple_ner component

* Update simple NER test

* Update config

* Hack train script

* Update BILUO layer

* Fix SimpleNER component

* Update train_from_config

* Add biluo_to_iob helper

* Add IOB layer

* Add IOBTagger model

* Update biluo layer

* Update SimpleNER tagger

* Update BILUO

* Read random seed in train-from-config

* Update use of normal_init

* Fix normalization of gradient in SimpleNER

* Update IOBTagger

* Remove print

* Tweak masking in BILUO

* Add dropout in SimpleNER

* Update thinc

* Tidy up simple_ner

* Fix biluo model

* Unhack train-from-config

* Update setup.cfg and requirements

* Add tb_framework.py for parser model

* Try to avoid memory leak in BILUO

* Move ParserModel into spacy.ml, avoid need for subclass.

* Use updated parser model

* Remove incorrect call to model.initialize in PrecomputableAffine

* Update parser model

* Avoid divide by zero in tagger

* Add extra dropout layer in tagger

* Refine minibatch_by_words function to avoid oom

* Fix parser model after refactor

* Try to avoid div-by-zero in SimpleNER

* Fix infinite loop in minibatch_by_words

* Use SequenceCategoricalCrossentropy in Tagger

* Fix parser model when hidden layer

* Remove extra dropout from tagger

* Add extra nan check in tagger

* Fix thinc version

* Update tests and imports

* Fix test

* Update test

* Update tests

* Fix tests

* Fix test

Co-authored-by: Ines Montani <ines@ines.io>
2020-05-18 22:23:33 +02:00
Ines Montani
3100c97e69
Merge pull request #5441 from svlandeg/fix/updating 2020-05-18 10:53:41 +02:00
Ines Montani
e8ff4c1e6a
Pin flake8 version 2020-05-18 10:50:21 +02:00
svlandeg
6fb6a8518c bump to 3.0.0.dev7 and thinc to 8.0.0a8 2020-05-15 13:25:54 +02:00
svlandeg
047f3d7d94 remove ops argument for Adam 2020-05-15 13:25:00 +02:00
svlandeg
79d4f196e5 pin flake8 to 3.5.0 2020-05-15 11:53:01 +02:00
svlandeg
e0fda2bd81 throw warning when model_cfg is None 2020-05-15 11:02:10 +02:00
svlandeg
102c8c7e2f fix fan_in renaming 2020-05-12 13:56:10 +02:00
svlandeg
9fe1e23512 update to thinc 8.0.0a6 2020-05-12 13:51:25 +02:00
Travis Hoppe
afb26d788f Added author information for NLPre (#5414)
* Add author links for NLPre and update category

* Add contributor statement
2020-05-08 11:29:52 +02:00
adrianeboyd
b3969c1479 Clarify Token.pos as UPOS (#5419) 2020-05-08 10:38:21 +02:00
Matthew Honnibal
eb117e2fce Add load_config_from_str helper 2020-05-02 14:09:21 +02:00
Sofie Van Landeghem
cafe94ee04 Update NEL examples and documentation (#5370)
* simplify creation of KB by skipping dim reduction

* small fixes to train EL example script

* add KB creation and NEL training example scripts to example section

* update descriptions of example scripts in the documentation

* moving wiki_entity_linking folder from bin to projects

* remove test for wiki NEL functionality that is being moved
# Conflicts:
#	bin/wiki_entity_linking/wikipedia_processor.py
2020-04-30 09:50:02 +02:00
Ines Montani
962bf12a20
Merge pull request #5312 from odaxiom/fix/website-documentation-spacy-lookup 2020-04-29 12:54:31 +02:00
Sofie Van Landeghem
1bf2082ac4
update is_new_osx function (#5376) 2020-04-29 12:51:49 +02:00
Matthew Honnibal
b2ef6100af
Only run backprop once when shared tok2vec weights (#5331)
Previously, pipelines with shared tok2vec weights would call the
tok2vec backprop callback multiple times, once for each pipeline
component. This caused errors for PyTorch, and was inefficient.

Instead, accumulate the gradient for all but one component, and just
call the callback once.
2020-04-21 19:30:41 +02:00
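A schematic sketch of the gradient accumulation described above, assuming a tok2vec forward pass that returns shared features plus a single backprop callback; names and shapes are illustrative, not spaCy's actual API:

```
import numpy as np

def run_pipeline_step(tok2vec_forward, components, docs):
    """Run the shared tok2vec once, let each component compute its gradient
    w.r.t. the shared features, then backprop the summed gradient once."""
    outputs, bp_tok2vec = tok2vec_forward(docs)
    accumulated = [np.zeros_like(out) for out in outputs]
    for component in components:
        d_outputs = component(outputs)          # gradient w.r.t. shared features
        for acc, d_out in zip(accumulated, d_outputs):
            acc += d_out
    bp_tok2vec(accumulated)                     # called once, not once per component

def fake_tok2vec(docs):
    outs = [np.ones((len(doc), 4)) for doc in docs]
    def bp(d_outs):
        print("backprop called once, total grad:",
              sum(float(d.sum()) for d in d_outs))
    return outs, bp

def fake_component(outputs):
    return [0.5 * out for out in outputs]

run_pipeline_step(fake_tok2vec, [fake_component, fake_component], docs=["ab", "cde"])
```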
Matthew Honnibal
6918d99b6c
Improve GPU usage for train-with-config (#5330)
* Adjust for no ops in Optimizer

* Fix gpu in train-from-config

* Update train-from-config script

* Fix parser

* Fix GPU efficiency of padding backprop
2020-04-20 22:06:28 +02:00
Ines Montani
51207c9417 Update netlify.toml [ci skip] 2020-04-16 14:45:52 +02:00
Sébastien Harinck
688a328668 docs(website): fix issue on example in spacy-lookup 2020-04-15 16:47:29 +02:00
Sofie Van Landeghem
8f431ad97c tag-map-path since 2.2.4 instead of 2.2.3 (#5289) 2020-04-14 14:55:58 +02:00
Sofie Van Landeghem
42364dcd9f
Remove "pala" tokenizer exception for Spanish (#5265) 2020-04-09 10:21:20 +02:00
svlandeg
81d6aee6e7 fix json 2020-04-07 14:11:31 +02:00
Sofie Van Landeghem
528c4f6b2e Small doc fixes (#5250)
* fix link

* torchtext instead of tochtext
2020-04-07 14:00:50 +02:00
vincent d warmerdam
e47010bf3c add "whatlies" to spaCy universe (#5252)
* Add "whatlies"

We're releasing it on our side officially on the 16th of April. If possible, let's announce around the same time :)

* sign contributor thing

* Added fancy gif

as the image

* Update universe.json

Spelling error and spaCy clarification.
2020-04-07 14:00:09 +02:00
Sofie Van Landeghem
b2e93be867
Optimizer defaults (#5244)
* set optimizer defaults to mimic thinc 7 + bump to dev6

* larger error range for senter overfitting test
2020-04-03 13:02:46 +02:00
adrianeboyd
b71a11ff6d
Update morphologizer (#5108)
* Add pos and morph scoring to Scorer

Add pos, morph, and morph_per_type to `Scorer`. Report pos and morph
accuracy in `spacy evaluate`.

* Update morphologizer for v3

* switch to tagger-based morphologizer
* use `spacy.HashCharEmbedCNN` for morphologizer defaults
* add `Doc.is_morphed` flag

* Add morphologizer to train CLI

* Add basic morphologizer pipeline tests

* Add simple morphologizer training example

* Remove subword_features from CharEmbed models

Remove `subword_features` argument from `spacy.HashCharEmbedCNN.v1` and
`spacy.HashCharEmbedBiLSTM.v1` since in these cases `subword_features`
is always `False`.

* Rename setting in morphologizer example

Use `with_pos_tags` instead of `without_pos_tags`.

* Fix kwargs for spacy.HashCharEmbedBiLSTM.v1

* Remove defaults for spacy.HashCharEmbedBiLSTM.v1

Remove default `nM/nC` for `spacy.HashCharEmbedBiLSTM.v1`.

* Set random seed for textcat overfitting test
2020-04-02 14:46:32 +02:00
Sofie Van Landeghem
ab59f3124e
fix NEL overfitting test for GPU (#5236) 2020-04-02 10:32:52 +02:00
Sofie Van Landeghem
311133e579
Train textcat with config (#5143)
* bring back default build_text_classifier method

* remove _set_dims_ hack in favor of proper dim inference

* add tok2vec initialize to unit test

* small fixes

* add unit test for various textcat config settings

* logistic output layer does not have nO

* fix window_size setting

* proper fix

* fix W initialization

* Update textcat training example

* Use ml_datasets
* Convert training data to `Example` format
* Use `n_texts` to set proportionate dev size

* fix _init renaming on latest thinc

* avoid setting a non-existing dim

* update to thinc==8.0.0a2

* add BOW and CNN defaults for easy testing

* various experiments with train_textcat script, fix softmax activation in textcat bow

* allow textcat train script to work on other datasets as well

* have dataset as a parameter

* train textcat from config, with example config

* add config for training textcat

* formatting

* fix exclusive_classes

* fixing BOW for GPU

* bump thinc to 8.0.0a3 (not published yet so CI will fail)

* add in link_vectors_to_models which got deleted

Co-authored-by: Adriane Boyd <adrianeboyd@gmail.com>
2020-03-29 19:40:36 +02:00
adrianeboyd
ce0e538068
Check whether doc is instantiated in Example.get_gold_parses() (#5167)
* Check whether doc is instantiated

When creating docs to pair with gold parses, modify test to check
whether a doc is unset rather than whether it contains tokens.

* Restore test of evaluate on an empty doc

* Set a minimal gold.orig for the scorer

Without a minimal gold.orig the scorer can't evaluate empty docs. This
is the v3 equivalent of #4925.
2020-03-29 13:57:00 +02:00
Sofie Van Landeghem
d6d95674c1
bugfix in span similarity (#5155)
* bugfix in span similarity

* also rewrite doc.pyx for clarity

* formatting
2020-03-29 13:56:07 +02:00
Sofie Van Landeghem
1f9852abc3
Fix parser @ GPU (#5210)
* ensure self.bias is numpy array in parser model

* 2 more little bug fixes for parser on GPU

* removing testing GPU statement

* remove commented code
2020-03-28 23:09:35 +01:00
Sofie Van Landeghem
9b412516e7
Fixing pickling of the parser (#5218)
* fix __reduce__ for pickling parser

* setting the move object as 'state' during pickling

* unskip test_issue4725 - works again
2020-03-27 19:35:26 +01:00
Ines Montani
a0858ae761
Merge pull request #5213 from explosion/tmp/sync
Try master -> develop sync again (part 2)
2020-03-27 11:39:46 +01:00
Ines Montani
92b9b631ef xfail -> skip 2020-03-27 10:51:32 +01:00
Ines Montani
ee4bb0e3b6 Fix import 2020-03-26 21:44:18 +01:00
Ines Montani
4fe2299586 xfail hanging test 2020-03-26 20:58:13 +01:00
Ines Montani
f12a46472c Remove unicode declarations 2020-03-26 15:18:32 +01:00
Ines Montani
7453df79d1 Fix argument 2020-03-26 14:09:02 +01:00
Ines Montani
e7341db5dc Add sent_start to pattern schema 2020-03-26 14:05:40 +01:00
Ines Montani
70ee4ef4fd Fix small errors 2020-03-26 13:47:31 +01:00
Ines Montani
46568f40a7 Merge branch 'master' into tmp/sync 2020-03-26 13:38:14 +01:00
Tiljander
aa0616bafa Describing priority rules for overlapping matches (#5197)
* Describing priority rules for overlapping matches

* Create Tiljander.md

* Describing priority rules for overlapping matches

* Update website/docs/api/entityruler.md

Co-Authored-By: Ines Montani <ines@ines.io>

Co-authored-by: Ines Montani <ines@ines.io>
2020-03-26 13:14:00 +01:00
Sofie Van Landeghem
218e1706ac
Bugfix linking vectors (#5196)
* restore call to _load_vectors

* bump to thinc 8.0.0a3

* bump to 3.0.0.dev4
2020-03-25 10:20:11 +01:00
Ines Montani
e6cc681e36 Merge branch 'master' into spacy.io 2020-03-24 16:33:37 +01:00
Ines Montani
fcac1ace78 Update macOS image on Azure Pipelines 2020-03-23 22:55:47 +01:00
Ines Montani
341e8687f7
Merge pull request #5168 from svlandeg/fix/streamlit
fix showing dep arcs in streamlit script
2020-03-22 11:35:31 +01:00
svlandeg
02d87a8b2b fix showing dep arcs in streamlit script 2020-03-19 10:30:20 +01:00
Ines Montani
15af22d3f0 Merge branch 'master' into spacy.io 2020-03-17 22:22:17 +01:00
Ines Montani
a39fc5c7a8 Merge branch 'master' into spacy.io 2020-03-17 22:19:38 +01:00
Ines Montani
a6d99af811 Merge branch 'master' into spacy.io 2020-03-17 19:53:41 +01:00
Ines Montani
766db5bfa5 Merge branch 'master' into spacy.io 2020-03-16 15:05:35 +01:00
Ines Montani
558032017e
Merge pull request #5157 from svlandeg/bugfix/language
remove unnecessary itertools call
2020-03-16 15:04:25 +01:00
Ines Montani
3944c1a65d
Merge pull request #5148 from svlandeg/fix/empty-docbin
Fix serialization of empty doc
2020-03-16 15:03:54 +01:00
svlandeg
fba219f737 remove unnecessary itertools call 2020-03-16 08:31:36 +01:00
svlandeg
59000ee21d fix serialization of empty doc + unit test 2020-03-13 16:07:56 +01:00
Ines Montani
353f8486f5 Merge branch 'master' into spacy.io 2020-03-12 14:45:33 +01:00
Sofie Van Landeghem
5847be6022
Tok2Vec: extract-embed-encode (#5102)
* avoid changing original config

* fix elif structure; a batch with just an int crashes otherwise

* tok2vec example with doc2feats, encode and embed architectures

* further clean up MultiHashEmbed

* further generalize Tok2Vec to work with extract-embed-encode parts

* avoid initializing the charembed layer with Docs (for now ?)

* small fixes for bilstm config (still does not run)

* rename to core layer

* move new configs

* walk model to set nI instead of using core ref

* fix senter overfitting test to be more similar to the training data (avoid flakey behaviour)
2020-03-08 13:23:18 +01:00
adrianeboyd
c95ce96c44
Update sentence recognizer (#5109)
* Update sentence recognizer

* rename `sentrec` to `senter`
* use `spacy.HashEmbedCNN.v1` by default
* update to follow `Tagger` modifications
* remove component methods that can be inherited from `Tagger`
* add simple initialization and overfitting pipeline tests

* Update serialization test for senter
2020-03-06 14:45:02 +01:00
Sofie Van Landeghem
6ac9fc0619
Unit test for NEL functionality (#5114)
* empty begin_training for sentencizer

* overfitting unit test for entity linker

* fixed NEL IO by storing the entity_vector_length in the cfg
2020-03-06 14:42:23 +01:00
Ines Montani
3adc511cb0
Merge pull request #5070 from explosion/refactor/simplify-warnings
Simplify warnings
2020-03-04 17:11:18 +01:00
Ines Montani
b0cfab317f Merge branch 'develop' into refactor/simplify-warnings 2020-03-04 16:38:55 +01:00
Ines Montani
648f61d077
Tidy up compiler flags and imports (#5071) 2020-03-02 11:48:10 +01:00
Ines Montani
7efaa76168 Update errors.py 2020-02-28 12:23:31 +01:00
Ines Montani
37691e6d5d Simplify warnings 2020-02-28 12:20:23 +01:00
Ines Montani
5da3ad682a Tidy up and auto-format 2020-02-28 11:57:41 +01:00
Sofie Van Landeghem
06f0a8daa0
Default settings to configurations (#4995)
* fix grad_clip naming

* cleaning up pretrained_vectors out of cfg

* further refactoring Model init's

* move Model building out of pipes

* further refactor to require a model config when creating a pipe

* small fixes

* making cfg in nn_parser more consistent

* fixing nr_class for parser

* fixing nn_parser's nO

* fix printing of loss

* architectures in own file per type, consistent naming

* convenience methods default_tagger_config and default_tok2vec_config

* let create_pipe access default config if available for that component

* default_parser_config

* move defaults to separate folder

* allow reading nlp from package or dir with argument 'name'

* architecture spacy.VocabVectors.v1 to read static vectors from file

* cleanup

* default configs for nel, textcat, morphologizer, tensorizer

* fix imports

* fixing unit tests

* fixes and clean up

* fixing defaults, nO, fix unit tests

* restore parser IO

* fix IO

* 'fix' serialization test

* add *.cfg to manifest

* fix example configs with additional arguments

* replace Morpohologizer with Tagger

* add IO bit when testing overfitting of tagger (currently failing)

* fix IO - don't initialize when reading from disk

* expand overfitting tests to also check IO goes OK

* remove dropout from HashEmbed to fix Tagger performance

* add defaults for sentrec

* update thinc

* always pass a Model instance to a Pipe

* fix piped_added statement

* remove obsolete W029

* remove obsolete errors

* restore byte checking tests (work again)

* clean up test

* further test cleanup

* convert from config to Model in create_pipe

* bring back error when component is not initialized

* cleanup

* remove calls for nlp2.begin_training

* use thinc.api in imports

* allow setting charembed's nM and nC

* fix for hardcoded nM/nC + unit test

* formatting fixes

* trigger build
2020-02-27 18:42:27 +01:00
Ines Montani
f39ddda193
Merge pull request #5062 from svlandeg/bugfix/merge-conflicts
Fix sync between master and develop
2020-02-26 13:41:16 +01:00
svlandeg
fc6e34c3a1 fix bugs from porting master to develop 2020-02-26 08:44:22 +01:00
Ines Montani
192b8d45a1
Merge pull request #5008 from svlandeg/fix/build_dependencies
Re-add pyproject.toml and add tests for dependency version consistency
2020-02-25 16:52:18 +01:00
Ines Montani
b6a6cff708 Add blis to pyproject.toml 2020-02-25 16:17:23 +01:00
Ines Montani
912572e04a Only copy if file exists (not if installed from sdist etc.) 2020-02-25 16:01:58 +01:00
Ines Montani
436b26fe0f Revert other changes 2020-02-25 15:48:29 +01:00
Ines Montani
c1a5ece65f Tidy up setup and update requirements tests 2020-02-25 15:46:39 +01:00
Ines Montani
5d21d3e8b9 Merge branch 'develop' into pr/5008 2020-02-25 15:24:47 +01:00
svlandeg
d5bfebe1c5 it's moving day 2020-02-24 10:04:24 +01:00
svlandeg
217c16c7a9 running tests BEFORE deleting them ? 2020-02-24 09:38:43 +01:00
svlandeg
6f846c2cbf removing --pyargs for testing purposes 2020-02-24 09:19:08 +01:00
svlandeg
d821c95eb0 debugging prints 2020-02-23 17:38:33 +01:00
svlandeg
58568bd0cd fix 2020-02-23 16:45:37 +01:00
svlandeg
0f55e51704 assert we found the root_dir 2020-02-23 16:33:58 +01:00
svlandeg
783da088ea avoid try except 2020-02-23 16:21:21 +01:00
Ines Montani
d6c0746347 Merge branch 'master' into spacy.io 2020-02-23 13:57:01 +01:00
Ines Montani
89967f3701 Merge branch 'master' into spacy.io 2020-02-23 12:04:20 +01:00
svlandeg
9f1447bf71 where art thou, file? 2020-02-19 17:09:29 +02:00
svlandeg
9834527f2c hack to switch between CLI folder setup and local setup 2020-02-19 16:22:48 +02:00
svlandeg
5c2f645470 root dir one level up 2020-02-19 16:15:56 +02:00
svlandeg
303c4bcd4c include requirements in manifest 2020-02-19 15:52:55 +02:00
svlandeg
b20351792a assert prints for more clarity 2020-02-19 15:51:53 +02:00
Ines Montani
8137b24928
Merge pull request #5028 from explosion/refactor/remove-symlinks
Remove symlinks, data dir and related stuff
2020-02-19 00:20:23 +01:00
Ines Montani
a3335d36b8 Merge branch 'develop' into refactor/remove-symlinks 2020-02-18 17:22:20 +01:00
Ines Montani
a138acb220
Merge pull request #5027 from explosion/chore/sync-develop-master
Sync develop with master, tidy up, auto-format
2020-02-18 17:22:03 +01:00
Ines Montani
09cbeaef27 Remove symlinks, data dir and related stuff 2020-02-18 17:20:17 +01:00
Ines Montani
e3f40a6a0f Tidy up and auto-format 2020-02-18 15:38:18 +01:00
Ines Montani
1278161f47 Tidy up and fix issues 2020-02-18 15:17:03 +01:00
Ines Montani
de11ea753a Merge branch 'master' into develop 2020-02-18 14:47:23 +01:00
Ines Montani
80e95d02b1 Allow spacy attr in token pattern 2020-02-18 14:32:53 +01:00
svlandeg
2729d9164d cleanup 2020-02-12 22:59:37 +01:00
svlandeg
6bbd816569 formatting 2020-02-12 22:50:27 +01:00
svlandeg
34986c7bfd test versions of required libs across different places 2020-02-12 22:49:50 +01:00
svlandeg
2079948711 add build dependencies back to pyproject.toml 2020-02-12 22:49:21 +01:00
Ines Montani
2ed49404e3
Improve setup.py and call into Cython directly (#4952)
* Improve setup.py and call into Cython directly

* Add numpy to setup_requires

* Improve clean helper

* Update setup.cfg

* Try if it builds without pyproject.toml

* Update MANIFEST.in
2020-02-11 17:46:18 -05:00
Ines Montani
13b516289b Merge branch 'master' into spacy.io 2020-02-10 20:34:22 -05:00
Sofie Van Landeghem
9b84f987bd
fix grad_clip naming (#4967) 2020-02-10 20:33:16 -05:00
Sofie Van Landeghem
781e95cf53
Ensure doc.similarity returns a float (on develop) (#4969) 2020-02-10 20:31:49 -05:00
Ines Montani
19dc77a738 Merge branch 'master' into spacy.io 2020-02-03 13:15:54 +01:00
Sofie Van Landeghem
cabd60fa1e
Small fixes to as_example (#4957)
* label in span not writable anymore

* Revert "label in span not writable anymore"

This reverts commit ab442338c8.

* fixing yield - remove redundant list
2020-02-03 13:02:12 +01:00
Matthew Honnibal
71b93f33bb Set dev version 2020-01-30 15:41:45 +01:00
Matthew Honnibal
9df0b1360d Fix ml_datasets 2020-01-30 10:35:18 +01:00
Matthew Honnibal
ba6d78132d Fix dev version 2020-01-30 10:35:09 +01:00
Matthew Honnibal
0c5c8c37ee Depend on tqdm 2020-01-30 10:26:03 +01:00
Ines Montani
ccef9f2f44 Update version 2020-01-29 17:52:22 +01:00
adrianeboyd
5ee9d8c9b8
Add MORPH attr, add support in retokenizer (#4947)
* Add MORPH attr / symbol for token attrs

* Update retokenizer for MORPH
2020-01-29 17:45:46 +01:00
adrianeboyd
a365359b36
Add convert CLI option to merge CoNLL-U subtokens (#4722)
* Add convert CLI option to merge CoNLL-U subtokens

Add `-T` option to convert CLI that merges CoNLL-U subtokens into one
token in the converted data. Each CoNLL-U sentence is read into a `Doc`
and the `Retokenizer` is used to merge subtokens with features as
follows:

* `orth` is the merged token orth (should correspond to raw text and `# text`)

* `tag` is all subtoken tags concatenated with `_`, e.g. `ADP_DET`

* `pos` is the POS of the syntactic root of the span (as determined by
the Retokenizer)

* `morph` is all morphological features merged

* `lemma` is all subtoken lemmas concatenated with ` `, e.g. `de o`

* with `-m` all morphological features are combined with the tag using
the separator `__`, e.g.
`ADP_DET__Definite=Def|Gender=Masc|Number=Sing|PronType=Art`

* `dep` is the dependency relation for the syntactic root of the span
(as determined by the Retokenizer)

Concatenated tags will be mapped to the UD POS of the syntactic root
(e.g., `ADP`) and the morphological features will be the combined
features.

In many cases, the original UD subtokens can be reconstructed from the
available features given a language-specific lookup table, e.g.,
Portuguese `do / ADP_DET /
Definite=Def|Gender=Masc|Number=Sing|PronType=Art` is `de / ADP`, `o /
DET / Definite=Def|Gender=Masc|Number=Sing|PronType=Art` or lookup rules
for forms containing open class words like Spanish `hablarlo / VERB_PRON
/
Case=Acc|Gender=Masc|Number=Sing|Person=3|PrepCase=Npr|PronType=Prs|VerbForm=Inf`.

* Clean up imports
2020-01-29 17:44:25 +01:00
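A simplified sketch of how the listed subtoken features could be combined for one multiword token, working on plain CoNLL-U-style dicts instead of spaCy's Doc/Retokenizer; the separators follow the commit message, everything else is illustrative:

```
def merge_subtokens(surface, rows, merge_morph_into_tag=False):
    """surface: form from the CoNLL-U multiword-token range line (e.g. "do").
    rows: dicts with tag/morph/lemma for the subtokens the range covers."""
    tag = "_".join(r["tag"] for r in rows)                 # e.g. "ADP_DET"
    morph = "|".join(sorted(f for r in rows
                            for f in r["morph"].split("|") if f and f != "_"))
    lemma = " ".join(r["lemma"] for r in rows)             # e.g. "de o"
    if merge_morph_into_tag and morph:
        tag = f"{tag}__{morph}"                            # -m option behaviour
    return {"orth": surface, "tag": tag, "morph": morph, "lemma": lemma}

rows = [
    {"tag": "ADP", "morph": "_", "lemma": "de"},
    {"tag": "DET",
     "morph": "Definite=Def|Gender=Masc|Number=Sing|PronType=Art", "lemma": "o"},
]
print(merge_subtokens("do", rows, merge_morph_into_tag=True))
```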
Sofie Van Landeghem
569cc98982
Update spaCy for thinc 8.0.0 (#4920)
* Add load_from_config function

* Add train_from_config script

* Merge configs and expose via spacy.config

* Fix script

* Suggest create_evaluation_callback

* Hard-code for NER

* Fix errors

* Register command

* Add TODO

* Update train-from-config todos

* Fix imports

* Allow delayed setting of parser model nr_class

* Get train-from-config working

* Tidy up and fix scores and printing

* Hide traceback if cancelled

* Fix weighted score formatting

* Fix score formatting

* Make output_path optional

* Add Tok2Vec component

* Tidy up and add tok2vec_tensors

* Add option to copy docs in nlp.update

* Copy docs in nlp.update

* Adjust nlp.update() for set_annotations

* Don't shuffle pipes in nlp.update, decruft

* Support set_annotations arg in component update

* Support set_annotations in parser update

* Add get_gradients method

* Add get_gradients to parser

* Update errors.py

* Fix problems caused by merge

* Add _link_components method in nlp

* Add concept of 'listeners' and ControlledModel

* Support optional attributes arg in ControlledModel

* Try having tok2vec component in pipeline

* Fix tok2vec component

* Fix config

* Fix tok2vec

* Update for Example

* Update for Example

* Update config

* Add eg2doc util

* Update and add schemas/types

* Update schemas

* Fix nlp.update

* Fix tagger

* Remove hacks from train-from-config

* Remove hard-coded config str

* Calculate loss in tok2vec component

* Tidy up and use function signatures instead of models

* Support union types for registry models

* Minor cleaning in Language.update

* Make ControlledModel specifically Tok2VecListener

* Fix train_from_config

* Fix tok2vec

* Tidy up

* Add function for bilstm tok2vec

* Fix type

* Fix syntax

* Fix pytorch optimizer

* Add example configs

* Update for thinc describe changes

* Update for Thinc changes

* Update for dropout/sgd changes

* Update for dropout/sgd changes

* Unhack gradient update

* Work on refactoring _ml

* Remove _ml.py module

* WIP upgrade cli scripts for thinc

* Move some _ml stuff to util

* Import link_vectors from util

* Update train_from_config

* Import from util

* Import from util

* Temporarily add ml.component_models module

* Move ml methods

* Move typedefs

* Update load vectors

* Update gitignore

* Move imports

* Add PrecomputableAffine

* Fix imports

* Fix imports

* Fix imports

* Fix missing imports

* Update CLI scripts

* Update spacy.language

* Add stubs for building the models

* Update model definition

* Update create_default_optimizer

* Fix import

* Fix comment

* Update imports in tests

* Update imports in spacy.cli

* Fix import

* fix obsolete thinc imports

* update srsly pin

* from thinc to ml_datasets for example data such as imdb

* update ml_datasets pin

* using STATE.vectors

* small fix

* fix Sentencizer.pipe

* black formatting

* rename Affine to Linear as in thinc

* set validate explicitly to True

* rename with_square_sequences to with_list2padded

* rename with_flatten to with_list2array

* chaining layernorm

* small fixes

* revert Optimizer import

* build_nel_encoder with new thinc style

* fixes using model's get and set methods

* Tok2Vec in component models, various fixes

* fix up legacy tok2vec code

* add model initialize calls

* add in build_tagger_model

* small fixes

* setting model dims

* fixes for ParserModel

* various small fixes

* initialize thinc Models

* fixes

* consistent naming of window_size

* fixes, removing set_dropout

* work around Iterable issue

* remove legacy tok2vec

* util fix

* fix forward function of tok2vec listener

* more fixes

* trying to fix PrecomputableAffine (not successful yet)

* alloc instead of allocate

* add morphologizer

* rename residual

* rename fixes

* Fix predict function

* Update parser and parser model

* fixing few more tests

* Fix precomputable affine

* Update component model

* Update parser model

* Move backprop padding to own function, for test

* Update test

* Fix p. affine

* Update NEL

* build_bow_text_classifier and extract_ngrams

* Fix parser init

* Fix test add label

* add build_simple_cnn_text_classifier

* Fix parser init

* Set gpu off by default in example

* Fix tok2vec listener

* Fix parser model

* Small fixes

* small fix for PyTorchLSTM parameters

* revert my_compounding hack (iterable fixed now)

* fix biLSTM

* Fix uniqued

* PyTorchRNNWrapper fix

* small fixes

* use helper function to calculate cosine loss

* small fixes for build_simple_cnn_text_classifier

* putting dropout default at 0.0 to ensure the layer gets built

* using thinc util's set_dropout_rate

* moving layer normalization inside of maxout definition to optimize dropout

* temp debugging in NEL

* fixed NEL model by using init defaults !

* fixing after set_dropout_rate refactor

* proper fix

* fix test_update_doc after refactoring optimizers in thinc

* Add CharacterEmbed layer

* Construct tagger Model

* Add missing import

* Remove unused stuff

* Work on textcat

* fix test (again :)) after optimizer refactor

* fixes to allow reading Tagger from_disk without overwriting dimensions

* don't build the tok2vec prematurely

* fix CharacterEmbed init

* CharacterEmbed fixes

* Fix CharacterEmbed architecture

* fix imports

* renames from latest thinc update

* one more rename

* add initialize calls where appropriate

* fix parser initialization

* Update Thinc version

* Fix errors, auto-format and tidy up imports

* Fix validation

* fix if bias is cupy array

* revert for now

* ensure it's a numpy array before running bp in ParserStepModel

* no reason to call require_gpu twice

* use CupyOps.to_numpy instead of cupy directly

* fix initialize of ParserModel

* remove unnecessary import

* fixes for CosineDistance

* fix device renaming

* use refactored loss functions (Thinc PR 251)

* overfitting test for tagger

* experimental settings for the tagger: avoid zero-init and subword normalization

* clean up tagger overfitting test

* use previous default value for nP

* remove toy config

* bringing layernorm back (had a bug - fixed in thinc)

* revert setting nP explicitly

* remove setting default in constructor

* restore values as they used to be

* add overfitting test for NER

* add overfitting test for dep parser

* add overfitting test for textcat

* fixing init for linear (previously affine)

* larger eps window for textcat

* ensure doc is not None

* Require newer thinc

* Make float check vaguer

* Slop the textcat overfit test more

* Fix textcat test

* Fix exclusive classes for textcat

* fix after renaming of alloc methods

* fixing renames and mandatory arguments (staticvectors WIP)

* upgrade to thinc==8.0.0.dev3

* refer to vocab.vectors directly instead of its name

* rename alpha to learn_rate

* adding hashembed and staticvectors dropout

* upgrade to thinc 8.0.0.dev4

* add name back to avoid warning W020

* thinc dev4

* update srsly

* using thinc 8.0.0a0 !

Co-authored-by: Matthew Honnibal <honnibal+gh@gmail.com>
Co-authored-by: Ines Montani <ines@ines.io>
2020-01-29 17:06:46 +01:00
adrianeboyd
06b251dd1e Add support for pos/morphs/lemmas in training data (#4941)
Add support for pos/morphs/lemmas throughout `GoldParse`, `Example`, and
`docs_to_json()`.
2020-01-28 11:36:29 +01:00
adrianeboyd
adc9745718 Modify morphology to support arbitrary features (#4932)
* Restructure tag maps for MorphAnalysis changes

Prepare tag maps for upcoming MorphAnalysis changes that allow
arbitrary features.

* Use default tag map rather than duplicating for ca / uk / vi

* Import tag map into defaults for ga

* Modify tag maps so all morphological fields and features are strings
  * Move features from `"Other"` to the top level
  * Rewrite tuples as strings separated by `","`

* Rewrite morph symbols for fr lemmatizer as strings

* Export MorphAnalysis under spacy.tokens

* Modify morphology to support arbitrary features

Modify `Morphology` and `MorphAnalysis` so that arbitrary features are
supported.

* Modify `MorphAnalysisC` so that it can support arbitrary features and
multiple values per field. `MorphAnalysisC` is redesigned to contain:
  * key: hash of UD FEATS string of morphological features
  * array of `MorphFeatureC` structs that each contain a hash of `Field`
and `Field=Value` for a given morphological feature, which makes it
possible to:
    * find features by field
    * represent multiple values for a given field

* `get_field()` is renamed to `get_by_field()` and is no longer `nogil`.
Instead a new helper function `get_n_by_field()` is `nogil` and returns
`n` features by field.

* `MorphAnalysis.get()` returns all possible values for a field as a
list of individual features such as `["Tense=Pres", "Tense=Past"]`.

* `MorphAnalysis`'s `str()` and `repr()` are the UD FEATS string.

* `Morphology.feats_to_dict()` converts a UD FEATS string to a dict
where:
  * Each field has one entry in the dict
  * Multiple values remain separated by a separator in the value string

* `Token.morph_` returns the UD FEATS string and you can set
`Token.morph_` with a UD FEATS string or with a tag map dict.

* Modify get_by_field to use np.ndarray

Modify `get_by_field()` to use np.ndarray. Remove `max_results` from
`get_n_by_field()` and always iterate over all the fields.

* Rewrite without MorphFeatureC

* Add shortcut for existing feats strings as keys

Add shortcut for existing feats strings as keys in `Morphology.add()`.

* Check for '_' as empty analysis when adding morphs

* Extend helper converters in Morphology

Add and extend helper converters that convert and normalize between:

* UD FEATS strings (`"Case=dat,gen|Number=sing"`)
* per-field dict of feats (`{"Case": "dat,gen", "Number": "sing"}`)
* list of individual features (`["Case=dat", "Case=gen",
"Number=sing"]`)

All converters sort fields and values where applicable.
2020-01-23 22:01:54 +01:00
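A standalone sketch of the three equivalent representations and the conversions between them, not spaCy's actual Morphology helpers:

```
def feats_to_dict(feats):
    """'Case=dat,gen|Number=sing' -> {'Case': 'dat,gen', 'Number': 'sing'}"""
    if not feats or feats == "_":
        return {}
    return dict(pair.split("=", 1) for pair in feats.split("|"))

def dict_to_list(field_dict):
    """{'Case': 'dat,gen'} -> ['Case=dat', 'Case=gen']"""
    return sorted(f"{field}={value}"
                  for field, values in field_dict.items()
                  for value in values.split(","))

def dict_to_feats(field_dict):
    """Back to a UD FEATS string with sorted fields and values."""
    return "|".join(f"{field}={','.join(sorted(field_dict[field].split(',')))}"
                    for field in sorted(field_dict))

feats = "Case=dat,gen|Number=sing"
d = feats_to_dict(feats)
assert dict_to_list(d) == ["Case=dat", "Case=gen", "Number=sing"]
assert dict_to_feats(d) == feats
```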
Sofie Van Landeghem
0a0de85409 Fix gold training (#4938)
* label in span not writable anymore

* Revert "label in span not writable anymore"

This reverts commit ab442338c8.

* ensure doc is not None
2020-01-23 22:00:24 +01:00
adrianeboyd
199d89943e Add as_example to Sentencizer pipe() (#4933) 2020-01-22 15:40:31 +01:00
Ines Montani
fc88337cfa Merge branch 'master' into spacy.io 2020-01-19 20:13:26 +01:00
adrianeboyd
d2f3a44b42 Improve train CLI sentrec scoring (#4892)
* reorder to metrics to prioritize F over P/R
* add sentrec to model metrics
2020-01-08 16:52:14 +01:00
adrianeboyd
e55fa1899a Report length of dev dataset correctly (#4891) 2020-01-08 16:51:51 +01:00
adrianeboyd
e1b493ae85 Add sentrec shortcut to Language (#4890) 2020-01-08 16:51:24 +01:00
Ines Montani
db81604d54 Merge branch 'master' into spacy.io 2020-01-04 01:52:28 +01:00
Sofie Van Landeghem
581eeed98b Warning goldparse (#4851)
* label in span not writable anymore

* Revert "label in span not writable anymore"

This reverts commit ab442338c8.

* provide more friendly error msg for parsing file
2020-01-01 13:16:48 +01:00
Ines Montani
83e0a6f3e3
Modernize plac commands for Python 3 (#4836) 2020-01-01 13:15:46 +01:00
Ines Montani
401946d480 Un-xfail passing tests 2019-12-25 18:02:20 +01:00
Ines Montani
a892821c51 More formatting changes 2019-12-25 17:59:52 +01:00
Ines Montani
c22f075509 Update pydantic version pin [ci skip] 2019-12-25 17:29:53 +01:00
Ines Montani
33a2682d60
Add better schemas and validation using Pydantic (#4831)
* Remove unicode declarations

* Remove Python 3.5 and 2.7 from CI

* Don't require pathlib

* Replace compat helpers

* Remove OrderedDict

* Use f-strings

* Set Cython compiler language level

* Fix typo

* Re-add OrderedDict for Table

* Update setup.cfg

* Revert CONTRIBUTING.md

* Add better schemas and validation using Pydantic

* Revert lookups.md

* Remove unused import

* Update spacy/schemas.py

Co-Authored-By: Sebastián Ramírez <tiangolo@gmail.com>

* Various small fixes

* Fix docstring

Co-authored-by: Sebastián Ramírez <tiangolo@gmail.com>
2019-12-25 12:39:49 +01:00
Ines Montani
db55577c45
Drop Python 2.7 and 3.5 (#4828)
* Remove unicode declarations

* Remove Python 3.5 and 2.7 from CI

* Don't require pathlib

* Replace compat helpers

* Remove OrderedDict

* Use f-strings

* Set Cython compiler language level

* Fix typo

* Re-add OrderedDict for Table

* Update setup.cfg

* Revert CONTRIBUTING.md

* Revert lookups.md

* Revert top-level.md

* Small adjustments and docs [ci skip]
2019-12-22 01:53:56 +01:00
Ines Montani
21b6d6e0a8 Fix typo 2019-12-21 21:17:31 +01:00
Ines Montani
de33b6d566 Merge branch 'master' into develop 2019-12-21 21:15:46 +01:00
Ines Montani
d17e7dca9e Fix problems caused by merge conflict 2019-12-21 19:57:41 +01:00
Ines Montani
947dba7141 Merge branch 'master' into develop 2019-12-21 19:04:43 +01:00
Ines Montani
158b98a3ef Merge branch 'master' into develop 2019-12-21 18:55:03 +01:00
Ines Montani
554fbb04b0 Merge branch 'master' into spacy.io 2019-12-21 14:10:37 +01:00
Ines Montani
1bb11953e8 Merge branch 'master' into spacy.io 2019-12-20 23:00:31 +01:00
Ines Montani
ae9fac2d87 Merge branch 'master' into spacy.io 2019-12-13 15:57:49 +01:00
adrianeboyd
a4cacd3402 Add tag_map argument to CLI debug-data and train (#4750)
Add an argument for a path to a JSON-formatted tag map, which is used to
update and extend the default language tag map.
2019-12-13 10:46:18 +01:00
adrianeboyd
eb9b1858c4 Add NER map option to convert CLI (#4763)
Instead of a hard-coded NER tag simplification function that was only
intended for NorNE, map NER tags in CoNLL-U converter using a dict
provided as JSON as a command-line option.

Map NER entity types to new types, or to "" for 'O', e.g.:

```
{"PER": "PERSON", "BAD": ""}

=>

B-PER -> B-PERSON
B-BAD -> O
```
2019-12-11 18:20:49 +01:00
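A small sketch of applying such a mapping to IOB/BILUO tags, assuming the JSON option is loaded into a plain dict; the converter's real code differs:

```
ner_map = {"PER": "PERSON", "BAD": ""}

def map_ner_tag(tag, ner_map):
    if tag in ("O", "-", ""):
        return tag
    prefix, _, ent_type = tag.partition("-")   # "B-PER" -> ("B", "-", "PER")
    new_type = ner_map.get(ent_type, ent_type)
    return f"{prefix}-{new_type}" if new_type else "O"

assert map_ner_tag("B-PER", ner_map) == "B-PERSON"
assert map_ner_tag("B-BAD", ner_map) == "O"
assert map_ner_tag("O", ner_map) == "O"
```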
Ines Montani
e3ee88c99b Merge branch 'master' into spacy.io 2019-12-06 19:22:21 +01:00
adrianeboyd
68f711b409 Fix conllu2json n_sents and raw text (#4728)
Update conllu2json converter to include raw text in final batch.
2019-11-29 10:22:03 +01:00
adrianeboyd
79ba1a3b92 Add lemmas to GoldParse / Example / docs_to_json (#4726) 2019-11-28 14:53:44 +01:00
adrianeboyd
b841d3fe75 Add a tagger-based SentenceRecognizer (#4713)
* Add sent_starts to GoldParse

* Add SentTagger pipeline component

Add `SentTagger` pipeline component as a subclass of `Tagger`.

* Model reduces default parameters from `Tagger` to be small and fast
* Hard-coded set of two labels:
  * S (1): token at beginning of sentence
  * I (0): all other sentence positions
* Sets `token.sent_start` values

* Add sentence segmentation to Scorer

Report `sent_p/r/f` for sentence boundaries, which may be provided by
various pipeline components.

* Add sentence segmentation to CLI evaluate

* Add senttagger metrics/scoring to train CLI

* Rename SentTagger to SentenceRecognizer

* Add SentenceRecognizer to spacy.pipes imports

* Add SentenceRecognizer serialization test

* Shorten component name to sentrec

* Remove duplicates from train CLI output metrics
2019-11-28 11:10:07 +01:00
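A toy sketch of the two-label scheme above, converting token-level sent_start values to S/I tags and back (purely illustrative; the actual component is a Tagger subclass operating on Doc objects):

```
def sent_starts_to_labels(sent_starts):
    """1 at a sentence start -> 'S', everything else -> 'I'."""
    return ["S" if s == 1 else "I" for s in sent_starts]

def labels_to_sent_starts(labels):
    return [1 if label == "S" else 0 for label in labels]

labels = sent_starts_to_labels([1, 0, 0, 1, 0])
assert labels == ["S", "I", "I", "S", "I"]
assert labels_to_sent_starts(labels) == [1, 0, 0, 1, 0]
```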
adrianeboyd
9efd3ccbef Update conllu2json MISC column handling (#4715)
Update converter to handle various things in MISC column:

* `SpaceAfter=No` and set raw text accordingly
* plain NER tag
* name=NER (for NorNE)
2019-11-26 16:10:08 +01:00
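A sketch of reading the MISC column cases listed above, assuming one raw MISC string per token; illustrative only:

```
def parse_misc(misc):
    """Return (space_after, ner_tag) for one CoNLL-U MISC value."""
    space_after, ner_tag = True, None
    if misc and misc != "_":
        for part in misc.split("|"):
            if part == "SpaceAfter=No":
                space_after = False
            elif part.startswith("name="):       # NorNE-style name=B-PER
                ner_tag = part.split("=", 1)[1]
            elif "=" not in part:                # plain NER tag, e.g. B-PER
                ner_tag = part
    return space_after, ner_tag

assert parse_misc("SpaceAfter=No|name=B-PER") == (False, "B-PER")
assert parse_misc("O") == (True, "O")
assert parse_misc("_") == (True, None)
```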
adrianeboyd
9aab0a55e1 Fix conllu2json converter to output all sentences (#4716)
Make sure that the last batch of sentences is output if n_sents > 1.
2019-11-26 16:05:17 +01:00
adrianeboyd
0c9640ced3 Replace old gold alignment with new gold alignment (#4710)
Replace old gold alignment that allowed for some noise in the alignment between raw and orth with the new simpler alignment that requires that the raw and orth strings are identical except for whitespace and capitalization.

* Replace old alignment with new alignment, removing `_align.pyx` and
its tests
* Remove all quote normalizations
* Enable test for new align
  * Modify test case for quote normalization
2019-11-25 23:13:26 +01:00
adrianeboyd
392c4880d9 Restructure Example with merged sents as default (#4632)
* Switch to train_dataset() function in train CLI

* Fixes for pipe() methods in pipeline components

* Don't clobber `examples` variable with `as_example` in pipe() methods
* Remove unnecessary traversals of `examples`

* Update Parser.pipe() for Examples

* Add `as_examples` kwarg to `pipe()` with implementation to return
`Example`s

* Accept `Doc` or `Example` in `pipe()` with `_get_doc()` (copied from
`Pipe`)

* Fixes to Example implementation in spacy.gold

* Move `make_projective` from an attribute of Example to an argument of
`Example.get_gold_parses()`

* Heads of 0 are not treated as unset

* Unset heads are set to self rather than `None` (which causes problems
while projectivizing)

* Check for `Doc` (not just not `None`) when creating GoldParses for
pre-merged example

* Don't clobber `examples` variable in `iter_gold_docs()`

* Add/modify gold tests for handling projectivity

* In JSON roundtrip compare results from `dev_dataset` rather than
`train_dataset` to avoid projectivization (and other potential
modifications)

* Add test for projective train vs. nonprojective dev versions of the
same `Doc`

* Handle ignore_misaligned as arg rather than attr

Move `ignore_misaligned` from an attribute of `Example` to an argument
to `Example.get_gold_parses()`, which makes it parallel to
`make_projective`.

Add test with old and new align that checks whether `ignore_misaligned`
errors are raised as expected (only for new align).

* Remove unused attrs from gold.pxd

Remove `ignore_misaligned` and `make_projective` from `gold.pxd`

* Restructure Example with merged sents as default

An `Example` now includes a single `TokenAnnotation` that includes all
the information from one `Doc` (=JSON `paragraph`). If required, the
individual sentences can be returned as a list of examples with
`Example.split_sents()` with no raw text available.

* Input/output a single `Example.token_annotation`

* Add `sent_starts` to `TokenAnnotation` to handle sentence boundaries

* Replace `Example.merge_sents()` with `Example.split_sents()`

* Modify components to use a single `Example.token_annotation`

  * Pipeline components
  * conllu2json converter

* Rework/rename `add_token_annotation()` and `add_doc_annotation()` to
`set_token_annotation()` and `set_doc_annotation()`, functions that set
rather than appending/extending.

* Rename `morphology` to `morphs` in `TokenAnnotation` and `GoldParse`

* Add getters to `TokenAnnotation` to supply default values when a given
attribute is not available

* `Example.get_gold_parses()` in `spacy.gold._make_golds()` is only
applied to single examples, so the `GoldParse` is saved in the provided
`Example` rather than creating a new `Example` with no other internal
annotation

* Update tests for API changes and `merge_sents()` vs. `split_sents()`

* Refer to Example.goldparse in iter_gold_docs()

Use `Example.goldparse` in `iter_gold_docs()` instead of `Example.gold`
because a `None` `GoldParse` is generated with ignore_misaligned and
generating it on-the-fly can raise an unwanted AlignmentError

* Fix make_orth_variants()

Fix bug in make_orth_variants() related to conversion from multiple to
one TokenAnnotation per Example.

* Add basic test for make_orth_variants()

* Replace try/except with conditionals

* Replace default morph value with set
2019-11-25 16:03:28 +01:00
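As a rough illustration of the `split_sents()` behaviour described above, here is a hypothetical standalone helper that splits a merged token-level annotation on `sent_starts` flags (the real method operates on `TokenAnnotation` objects):

```python
def split_sents(words, sent_starts):
    """Split one merged sequence of tokens into per-sentence lists,
    treating sent_starts == 1 as the beginning of a new sentence."""
    sents, current = [], []
    for word, start in zip(words, sent_starts):
        if start == 1 and current:
            sents.append(current)
            current = []
        current.append(word)
    if current:
        sents.append(current)
    return sents

assert split_sents(["Hi", ".", "Bye", "."], [1, 0, 1, 0]) == [["Hi", "."], ["Bye", "."]]
```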
Ines Montani
4b61750985 Merge branch 'master' into spacy.io 2019-11-23 17:52:01 +01:00
adrianeboyd
44829950ba Fix Example details for train CLI / pipeline components (#4624)
* Switch to train_dataset() function in train CLI

* Fixes for pipe() methods in pipeline components

* Don't clobber `examples` variable with `as_example` in pipe() methods
* Remove unnecessary traversals of `examples`

* Update Parser.pipe() for Examples

* Add `as_examples` kwarg to `pipe()` with implementation to return
`Example`s

* Accept `Doc` or `Example` in `pipe()` with `_get_doc()` (copied from
`Pipe`)

* Fixes to Example implementation in spacy.gold

* Move `make_projective` from an attribute of Example to an argument of
`Example.get_gold_parses()`

* Heads of 0 are not treated as unset

* Unset heads are set to self rather than `None` (which causes problems
while projectivizing)

* Check for `Doc` (not just not `None`) when creating GoldParses for
pre-merged example

* Don't clobber `examples` variable in `iter_gold_docs()`

* Add/modify gold tests for handling projectivity

* In JSON roundtrip compare results from `dev_dataset` rather than
`train_dataset` to avoid projectivization (and other potential
modifications)

* Add test for projective train vs. nonprojective dev versions of the
same `Doc`

* Handle ignore_misaligned as arg rather than attr

Move `ignore_misaligned` from an attribute of `Example` to an argument
to `Example.get_gold_parses()`, which makes it parallel to
`make_projective`.

Add test with old and new align that checks whether `ignore_misaligned`
errors are raised as expected (only for new align).

* Remove unused attrs from gold.pxd

Remove `ignore_misaligned` and `make_projective` from `gold.pxd`

* Refer to Example.goldparse in iter_gold_docs()

Use `Example.goldparse` in `iter_gold_docs()` instead of `Example.gold`
because a `None` `GoldParse` is generated with ignore_misaligned and
generating it on-the-fly can raise an unwanted AlignmentError

* Update test for ignore_misaligned
2019-11-23 14:32:15 +01:00
Ines Montani
02de21d8b4 Merge branch 'master' into spacy.io 2019-11-21 19:23:19 +01:00
Ines Montani
534c4aa55b Merge branch 'master' into spacy.io 2019-11-18 12:42:04 +01:00
Ines Montani
2a38fd00bd Merge branch 'master' into spacy.io 2019-11-18 12:36:04 +01:00
Ines Montani
72e1aadb3f Merge branch 'master' into spacy.io 2019-11-17 16:09:06 +01:00
Ines Montani
587d70eb57 Merge branch 'master' into spacy.io 2019-11-15 14:31:16 +01:00
Ines Montani
2fdae80edf Merge branch 'master' into spacy.io 2019-11-14 14:04:22 +01:00
adrianeboyd
faaa832518 Generalize handling of tokenizer special cases (#4259)
* Generalize handling of tokenizer special cases

Handle tokenizer special cases more generally by using the Matcher
internally to match special cases after the affix/token_match
tokenization is complete.

Instead of only matching special cases while processing balanced or
nearly balanced prefixes and suffixes, this recognizes special cases in
a wider range of contexts:

* Allows arbitrary numbers of prefixes/affixes around special cases
* Allows special cases separated by infixes

Existing tests/settings that couldn't be preserved as before:

* The emoticon '")' is no longer a supported special case
* The emoticon ':)' in "example:)" is a false positive again

When merged with #4258 (or the relevant cache bugfix), the affix and
token_match properties should be modified to flush and reload all
special cases to use the updated internal tokenization with the Matcher.

* Remove accidentally added test case

* Really remove accidentally added test

* Reload special cases when necessary

Reload special cases when affixes or token_match are modified. Skip
reloading during initialization.

* Update error code number

* Fix offset and whitespace in Matcher special cases

* Fix offset bugs when merging and splitting tokens
* Set final whitespace on final token in inserted special case

* Improve cache flushing in tokenizer

* Separate cache and specials memory (temporarily)
* Flush cache when adding special cases
* Repeated `self._cache = PreshMap()` and `self._specials = PreshMap()`
are necessary due to this bug:
https://github.com/explosion/preshed/issues/21

* Remove reinitialized PreshMaps on cache flush

* Update UD bin scripts

* Update imports for `bin/`
* Add all currently supported languages
* Update subtok merger for new Matcher validation
* Modify blinded check to look at tokens instead of lemmas (for corpora
with tokens but not lemmas like Telugu)

* Use special Matcher only for cases with affixes

* Reinsert specials cache checks during normal tokenization for special
cases as much as possible
  * Additionally include specials cache checks while splitting on infixes
  * Since the special Matcher needs consistent affix-only tokenization
    for the special cases themselves, introduce the argument
    `with_special_cases` in order to do tokenization with or without
    specials cache checks
* After normal tokenization, postprocess with special cases Matcher for
special cases containing affixes

* Replace PhraseMatcher with Aho-Corasick

Replace PhraseMatcher with the Aho-Corasick algorithm over numpy arrays
of the hash values for the relevant attribute. The implementation is
based on FlashText.

The speed should be similar to the previous PhraseMatcher. It is now
possible to easily remove match IDs and matches don't go missing with
large keyword lists / vocabularies.

Fixes #4308.

* Restore support for pickling

* Fix internal keyword add/remove for numpy arrays

* Add test for #4248, clean up test

* Improve efficiency of special cases handling

* Use PhraseMatcher instead of Matcher
* Improve efficiency of merging/splitting special cases in document
  * Process merge/splits in one pass without repeated token shifting
  * Merge in place if no splits

* Update error message number

* Remove UD script modifications

Only used for timing/testing, should be a separate PR

* Remove final traces of UD script modifications

* Update UD bin scripts

* Update imports for `bin/`
* Add all currently supported languages
* Update subtok merger for new Matcher validation
* Modify blinded check to look at tokens instead of lemmas (for corpora
with tokens but not lemmas like Telugu)

* Add missing loop for match ID set in search loop

* Remove cruft in matching loop for partial matches

There was a bit of unnecessary code left over from FlashText in the
matching loop to handle partial token matches, which we don't have with
PhraseMatcher.

* Replace dict trie with MapStruct trie

* Fix how match ID hash is stored/added

* Update fix for match ID vocab

* Switch from map_get_unless_missing to map_get

* Switch from numpy array to Token.get_struct_attr

Access token attributes directly in Doc instead of making a copy of the
relevant values in a numpy array.

Add unsatisfactory warning for hash collision with reserved terminal
hash key. (Ideally it would change the reserved terminal hash and redo
the whole trie, but for now, I'm hoping there won't be collisions.)

* Restructure imports to export find_matches

* Implement full remove()

Remove unnecessary trie paths and free unused maps.

Parallel to Matcher, raise KeyError when attempting to remove a match ID
that has not been added.

* Switch to PhraseMatcher.find_matches

* Switch to local cdef functions for span filtering

* Switch special case reload threshold to variable

Refer to variable instead of hard-coded threshold

* Move more of special case retokenize to cdef nogil

Move as much of the special case retokenization to nogil as possible.

* Rewrap sort as stdsort for OS X

* Rewrap stdsort with specific types

* Switch to qsort

* Fix merge

* Improve cmp functions

* Fix realloc

* Fix realloc again

* Initialize span struct while retokenizing

* Temporarily skip retokenizing

* Revert "Move more of special case retokenize to cdef nogil"

This reverts commit 0b7e52c797.

* Revert "Switch to qsort"

This reverts commit a98d71a942.

* Fix specials check while caching

* Modify URL test with emoticons

The multiple suffix tests result in the emoticon `:>`, which is now
retokenized into one token as a special case after the suffixes are
split off.

* Refactor _apply_special_cases()

* Use cdef ints for span info used in multiple spots

* Modify _filter_special_spans() to prefer earlier

Parallel to #4414, modify _filter_special_spans() so that the earlier
span is preferred for overlapping spans of the same length.

* Replace MatchStruct with Entity

Replace MatchStruct with Entity since the existing Entity struct is
nearly identical.

* Replace Entity with more general SpanC

* Replace MatchStruct with SpanC

* Add error in debug-data if no dev docs are available (see #4575)

* Update azure-pipelines.yml

* Revert "Update azure-pipelines.yml"

This reverts commit ed1060cf59.

* Use latest wasabi

* Reorganise install_requires

* add dframcy to universe.json (#4580)

* Update universe.json [ci skip]

* Fix multiprocessing for as_tuples=True (#4582)

* Fix conllu script (#4579)

* force extensions to avoid clash between example scripts

* fix arg order and default file encoding

* add example config for conllu script

* newline

* move extension definitions to main function

* few more encodings fixes

* Add load_from_docbin example [ci skip]

TODO: upload the file somewhere

* Update README.md

* Add warnings about 3.8 (resolves #4593) [ci skip]

* Fixed typo: Added space between "recognize" and "various" (#4600)

* Fix DocBin.merge() example (#4599)

* Replace function registries with catalogue (#4584)

* Replace functions registries with catalogue

* Update __init__.py

* Fix test

* Revert unrelated flag [ci skip]

* Bugfix/dep matcher issue 4590 (#4601)

* add contributor agreement for prilopes

* add test for issue #4590

* fix on_match params for DependencyMatcher (#4590)

* Minor updates to language example sentences (#4608)

* Add punctuation to Spanish example sentences

* Combine multilanguage examples for lang xx

* Add punctuation to nb examples

* Always realloc to a larger size

Avoid potential (unlikely) edge case and cymem error seen in #4604.

* Add error in debug-data if no dev docs are available (see #4575)

* Update debug-data for GoldCorpus / Example

* Ignore None label in misaligned NER data
2019-11-13 21:24:35 +01:00
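For readers unfamiliar with the FlashText-style keyword matching mentioned above, here is a very small dict-trie sketch over token sequences; the actual implementation works in Cython over hash values of token attributes, so this is only an illustration of the idea:

```python
def build_trie(keywords):
    """Build a dict trie over token sequences; the None key marks a
    complete keyword."""
    trie = {}
    for kw in keywords:
        node = trie
        for token in kw:
            node = node.setdefault(token, {})
        node[None] = kw
    return trie

def find_matches(trie, tokens):
    """Scan left to right and return (start, end, keyword) for the
    longest match at each position, FlashText-style."""
    matches, i = [], 0
    while i < len(tokens):
        node, j, longest = trie, i, None
        while j < len(tokens) and tokens[j] in node:
            node = node[tokens[j]]
            j += 1
            if None in node:
                longest = (i, j, node[None])
        if longest:
            matches.append(longest)
            i = longest[1]
        else:
            i += 1
    return matches

trie = build_trie([("new", "york"), ("new", "york", "city")])
print(find_matches(trie, ["i", "love", "new", "york", "city"]))
# [(2, 5, ('new', 'york', 'city'))]
```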
adrianeboyd
d67b0f196a Fix initialization of token mappings in new align (#4640)
Initialize all values in `a2b` and `b2a`, since `numpy.empty()` otherwise
leaves them as unspecified integers.
2019-11-13 21:22:18 +01:00
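The underlying pitfall is easy to reproduce: `numpy.empty()` allocates without initializing, so alignment arrays need explicit initialization (a sentinel such as -1 is used here purely as an example):

```python
import numpy

n = 5
uninitialized = numpy.empty(n, dtype="int64")   # contents are arbitrary
initialized = numpy.full(n, -1, dtype="int64")  # every slot set explicitly

print(initialized)  # [-1 -1 -1 -1 -1]
```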
adrianeboyd
3ac4e8eb7a Fix minor issues in debug-data (#4636)
* Add error in debug-data if no dev docs are available (see #4575)

* Update debug-data for GoldCorpus / Example

* Ignore None label in misaligned NER data
2019-11-13 15:25:03 +01:00
Sofie Van Landeghem
e48a09df4e Example class for training data (#4543)
* OrigAnnot class instead of gold.orig_annot list of zipped tuples

* from_orig to replace from_annot_tuples

* rename to RawAnnot

* some unit tests for GoldParse creation and internal format

* removing orig_annot and switching to lists instead of tuple

* rewriting tuples to use RawAnnot (+ debug statements, WIP)

* fix pop() changing the data

* small fixes

* pop-append fixes

* return RawAnnot for existing GoldParse to have uniform interface

* clean up imports

* fix merge_sents

* add unit test for 4402 with new structure (not working yet)

* introduce DocAnnot

* typo fixes

* add unit test for merge_sents

* rename from_orig to from_raw

* fixing unit tests

* fix nn parser

* read_annots to produce text, doc_annot pairs

* _make_golds fix

* rename golds_to_gold_annots

* small fixes

* fix encoding

* have golds_to_gold_annots use DocAnnot

* missed a spot

* merge_sents as function in DocAnnot

* allow specifying only part of the token-level annotations

* refactor with Example class + underlying dicts

* pipeline components to work with Example objects (wip)

* input checking

* fix yielding

* fix calls to update

* small fixes

* fix scorer unit test with new format

* fix kwargs order

* fixes for ud and conllu scripts

* fix reading data for conllu script

* add in proper errors (not fixed numbering yet to avoid merge conflicts)

* fixing few more small bugs

* fix EL script
2019-11-11 17:35:27 +01:00
Ines Montani
71f5a5daa1 Merge branch 'master' into spacy.io 2019-11-11 17:12:19 +01:00
Ines Montani
b6534d7875 Merge branch 'master' into spacy.io 2019-11-06 23:07:25 +01:00
Ines Montani
e5c319a051 Merge branch 'master' into spacy.io 2019-11-05 18:30:46 +01:00
Ines Montani
d7a94edba6 Merge branch 'master' into spacy.io 2019-11-04 13:56:11 +01:00
Ines Montani
07ba9b4aa2 Merge branch 'master' into spacy.io 2019-10-31 17:30:42 +01:00
Ines Montani
86c3185f34 Update syntax iterators [ci skip] 2019-10-30 14:32:50 +01:00
Ines Montani
d8c2365b04 Update universe.json [ci skip] 2019-10-30 13:29:15 +01:00
Neel Kamath
4cbc172cc6 Add "spaCy Server" to spaCy Universe (#4553)
* Add "spaCy Server" to spaCy Universe

* Accept the spaCy Contributor Agreement
2019-10-30 13:21:25 +01:00
Nipun Sadvilkar
6316243941 project: pySBD - Python Sentence Boundary Disambiguation (#4455)
*   project: pySBD - Python Sentence Boundary Disambiguation

* 📝  Update links and description

* 🐛  Fix missing comma

* Update universe.json

pysbd as a spacy component through entrypoints

* 🚨  Fix universe.json

* 📝  Update code_example
2019-10-30 12:14:49 +01:00
Ines Montani
e4820fa667 Merge branch 'master' into spacy.io 2019-10-23 14:41:38 +02:00
Ines Montani
4909f478ce Merge branch 'master' into spacy.io 2019-10-22 14:54:59 +02:00
Ines Montani
b22b168740 Merge branch 'master' into spacy.io 2019-10-22 14:38:00 +02:00
Ines Montani
2f223a5dd8 Merge branch 'master' into spacy.io 2019-10-21 12:26:19 +02:00
Ines Montani
23c19b641a Merge branch 'master' into spacy.io 2019-10-21 12:17:45 +02:00
Ines Montani
0b2df3b879 Merge branch 'master' into spacy.io 2019-10-19 14:03:29 +02:00
Ines Montani
05ce97dbf0 Merge branch 'master' into spacy.io 2019-10-08 15:53:09 +02:00
Ines Montani
e8f284c741 Merge branch 'master' into spacy.io 2019-10-08 15:39:48 +02:00
Ines Montani
b0d4899473 Merge branch 'master' into spacy.io 2019-10-06 13:30:13 +02:00
Ines Montani
f0aea5b198 Merge branch 'master' into spacy.io 2019-10-05 11:58:12 +02:00
Ines Montani
f02fc9d48b Merge branch 'master' into spacy.io 2019-10-03 23:09:46 +02:00
Ines Montani
35ffac02a0 Merge branch 'master' into spacy.io 2019-10-03 14:29:59 +02:00
Matthew Honnibal
705ce30076 Merge remote-tracking branch 'origin/spacy.io' into spacy.io 2019-10-02 17:11:31 +02:00
Ines Montani
8ba2d99cf1 Merge branch 'master' into spacy.io 2019-10-02 16:58:34 +02:00
Ines Montani
b31d01e8cc Merge branch 'master' into spacy.io 2019-10-02 16:52:42 +02:00
Ines Montani
31cebf66a8 Update universe.json 2019-09-30 13:50:08 +02:00
Ines Montani
88fee1a768 Update models.js 2019-09-30 13:22:17 +02:00
Ines Montani
7c701784e5 Update models.js 2019-09-30 13:17:48 +02:00
Ines Montani
06d8c3a20f Revert "Merge branch 'master' into spacy.io"
This reverts commit c8bb08b545, reversing
changes made to b6a509a8d1.
2019-09-30 13:14:48 +02:00
Ines Montani
c8bb08b545 Merge branch 'master' into spacy.io 2019-09-30 12:01:18 +02:00
Ines Montani
b6a509a8d1 Fix tag 2019-09-26 16:23:02 +02:00
Ines Montani
119bf24d13 Merge branch 'master' into spacy.io 2019-09-17 14:59:52 +02:00
Ines Montani
5aab805c15 Merge branch 'master' into spacy.io 2019-09-17 14:53:23 +02:00
Ines Montani
237a62c5d5 Merge branch 'master' into spacy.io 2019-09-15 17:57:21 +02:00
Ines Montani
8c43cfc754 Merge branch 'master' into spacy.io 2019-09-14 16:42:18 +02:00
Ines Montani
d7f3c010b5 Merge branch 'master' into spacy.io 2019-09-06 11:12:10 +02:00
Ines Montani
61eaaac98d Merge branch 'master' into spacy.io 2019-09-06 10:31:42 +02:00
Ines Montani
26f92826f0 Merge branch 'master' into spacy.io 2019-09-05 10:41:56 +02:00
Ines Montani
f8042484cc Merge branch 'master' into spacy.io 2019-09-04 18:16:42 +02:00
Ines Montani
efd1c9d9f5 Merge branch 'master' into spacy.io 2019-09-04 17:11:57 +02:00
Ines Montani
e055977851 Merge branch 'master' into spacy.io 2019-08-28 13:45:35 +02:00
Ines Montani
406563964c Merge branch 'master' into spacy.io 2019-08-28 11:59:16 +02:00
Ines Montani
ad8d860a37 Merge branch 'master' into spacy.io 2019-08-27 14:05:06 +02:00
Ines Montani
06854202bb Merge branch 'master' into spacy.io 2019-08-27 12:13:55 +02:00
Ines Montani
50242289bf Merge branch 'master' into spacy.io 2019-08-27 11:53:30 +02:00
Ines Montani
d6b4e6b0dc Merge branch 'master' into spacy.io 2019-08-25 17:25:47 +02:00
Ines Montani
aa5d78ec5d Merge branch 'master' into spacy.io 2019-08-23 19:16:48 +02:00
Ines Montani
073e8d647c Merge branch 'master' into spacy.io 2019-08-21 21:36:10 +02:00
Ines Montani
39619be14d Merge branch 'master' into spacy.io 2019-08-21 12:53:51 +02:00
Ines Montani
04165df1a8 Merge branch 'master' into spacy.io 2019-08-20 14:46:56 +02:00
Ines Montani
447e0c9422 Merge branch 'master' into spacy.io 2019-08-20 13:02:16 +02:00
Ines Montani
22a31b78d6 Merge branch 'master' into spacy.io 2019-08-19 19:22:19 +02:00
Ines Montani
af3dd786b1 Merge branch 'master' into spacy.io 2019-08-19 13:59:51 +02:00
Ines Montani
50b117c072 Merge branch 'master' into spacy.io 2019-08-19 11:54:53 +02:00
Ines Montani
5a8a39c9b0 Merge branch 'master' into spacy.io 2019-08-16 17:48:40 +02:00
Ines Montani
f653e1bbea Merge branch 'master' into spacy.io 2019-08-11 11:14:10 +02:00
Ines Montani
0df13a829c Merge branch 'master' into spacy.io 2019-08-09 17:42:46 +02:00
Ines Montani
c1bd7094dc Merge branch 'master' into spacy.io 2019-08-09 17:22:27 +02:00
Ines Montani
f2516177dd Merge branch 'master' into spacy.io 2019-08-09 17:17:01 +02:00
Ines Montani
dbde9cd0f2 Merge branch 'master' into spacy.io 2019-08-08 13:03:57 +02:00
Ines Montani
f623b579d2 Merge branch 'master' into spacy.io 2019-08-08 11:21:01 +02:00
Ines Montani
95d63c74b4 Update site.json 2019-08-07 00:47:40 +02:00
Ines Montani
25a7a5fbdc Merge branch 'master' into spacy.io 2019-08-06 12:20:34 +02:00
Ines Montani
f023175ca3 Merge branch 'master' into spacy.io 2019-08-06 12:13:53 +02:00
Ines Montani
84a00ce55e Merge branch 'master' into spacy.io 2019-08-05 14:30:22 +02:00
Ines Montani
0e680046ac Update languages.json 2019-08-02 21:44:26 +02:00
Ines Montani
dcad9a14c5 Merge branch 'master' into spacy.io 2019-08-01 18:37:20 +02:00
Ines Montani
d8fcebf386 Merge branch 'master' into spacy.io 2019-08-01 18:33:23 +02:00
Ines Montani
80f6e7feaa Merge branch 'master' into spacy.io 2019-08-01 14:28:52 +02:00
Ines Montani
fde9fe321e Merge branch 'master' into spacy.io 2019-08-01 12:43:51 +02:00
Ines Montani
fcf659b576 Merge branch 'master' into spacy.io 2019-07-31 00:16:48 +02:00
Ines Montani
522a5ffbfe Merge branch 'master' into spacy.io 2019-07-30 10:33:09 +02:00
Ines Montani
d9eeae5c69 Merge branch 'master' into spacy.io 2019-07-25 17:38:19 +02:00
Ines Montani
4361da2bba Merge branch 'master' into spacy.io 2019-07-25 12:14:18 +02:00
Ines Montani
b253d5db6e Merge branch 'master' into spacy.io 2019-07-22 12:08:10 +02:00
Ines Montani
4a1977f6ad Merge branch 'master' into spacy.io 2019-07-22 11:47:28 +02:00
Ines Montani
4f0304a79c Merge branch 'master' into spacy.io 2019-07-19 13:08:31 +02:00
Ines Montani
d64362efca Merge branch 'master' into spacy.io 2019-07-19 11:51:36 +02:00
Ines Montani
d5d182c450 Merge branch 'master' into spacy.io 2019-07-17 16:26:53 +02:00
Ines Montani
463b093c27 Merge branch 'master' into spacy.io 2019-07-17 16:06:36 +02:00
Ines Montani
4c863aeb06 Merge branch 'master' into spacy.io 2019-07-17 15:35:34 +02:00
Ines Montani
7784a20ef2 Merge branch 'master' into spacy.io 2019-07-16 10:41:33 +02:00
Ines Montani
69dbd59a13 Merge branch 'master' into spacy.io 2019-07-12 14:30:49 +02:00
Ines Montani
30d6c2ccc2 Merge branch 'master' into spacy.io 2019-06-26 14:47:46 +02:00
Ines Montani
1e0bbb615b Merge branch 'master' into spacy.io 2019-06-20 10:31:34 +02:00
Ines Montani
ec4b1bf1f2 Merge branch 'master' into spacy.io 2019-06-16 14:33:31 +02:00
Ines Montani
40b540dca7 Merge branch 'master' into spacy.io 2019-06-03 12:19:23 +02:00
Ines Montani
c44d5beb12 Merge branch 'master' into spacy.io 2019-06-02 13:56:18 +02:00
Ines Montani
596c7718b2 Merge branch 'master' into spacy.io 2019-06-02 12:58:24 +02:00
Ines Montani
101da344aa Merge branch 'master' into spacy.io 2019-06-01 17:36:55 +02:00
Ines Montani
2b8bfd6cc7 Merge branch 'master' into spacy.io 2019-06-01 11:35:12 +02:00
Ines Montani
3e6281cc63 Merge branch 'master' into spacy.io 2019-05-31 16:51:12 +02:00
Ines Montani
315e191964 Merge branch 'master' into spacy.io 2019-05-29 11:12:24 +02:00
Ines Montani
1572490d57 Merge branch 'master' into spacy.io 2019-05-24 14:06:47 +02:00
Ines Montani
3cbbc4afcb Merge branch 'master' into spacy.io 2019-05-16 23:16:14 +02:00
Ines Montani
ac5990f793 Merge branch 'master' into spacy.io 2019-05-11 23:04:13 +02:00
Ines Montani
f60c9a94ba Merge branch 'master' into spacy.io 2019-05-11 18:40:29 +02:00
Ines Montani
1d1df7b5f9 Merge branch 'master' into spacy.io 2019-05-11 17:49:28 +02:00
Ines Montani
7819404127 Fix DependencyParser.predict docs (resolves #3561) 2019-05-11 15:37:30 +02:00
Ines Montani
377ab1cffb Improve Token.prob and Lexeme.prob docs (resolves #3701) 2019-05-11 15:22:34 +02:00
Aaron Kub
914f4b2938 fixing regex matcher examples (#3708) (#3719) 2019-05-10 14:24:24 +02:00
Ines Montani
f256bfbcc4 Add version tag to --base-model argument (closes #3720) 2019-05-10 14:06:06 +02:00
Ines Montani
61829f1e79 Fix typo 2019-05-09 15:36:29 +02:00
Bram Vanroy
4762f56062 Re-added Universe readme (#3688) (closes #3680) 2019-05-06 21:10:58 +02:00
Ines Montani
bf92625ede Update from master 2019-04-26 13:19:50 +02:00
1611 changed files with 166565 additions and 144637 deletions


@ -1,11 +0,0 @@
steps:
  -
    command: "fab env clean make test sdist"
    label: ":dizzy: :python:"
    artifact_paths: "dist/*.tar.gz"
  - wait
  - trigger: "spacy-sdist-against-models"
    label: ":dizzy: :hammer:"
    build:
      env:
        SPACY_VERSION: "{$SPACY_VERSION}"


@ -1,11 +0,0 @@
steps:
  -
    command: "fab env clean make test wheel"
    label: ":dizzy: :python:"
    artifact_paths: "dist/*.whl"
  - wait
  - trigger: "spacy-train-from-wheel"
    label: ":dizzy: :train:"
    build:
      env:
        SPACY_VERSION: "{$SPACY_VERSION}"

.github/FUNDING.yml (new file, +1)

@ -0,0 +1 @@
custom: [https://explosion.ai/merch, https://explosion.ai/tailored-solutions]


@ -1,18 +0,0 @@
<!--- Please provide a summary in the title and describe your issue here.
Is this a bug or feature request? If a bug, include all the steps that led to the issue.
If you're looking for help with your code, consider posting a question here:
- GitHub Discussions: https://github.com/explosion/spaCy/discussions
- Stack Overflow: http://stackoverflow.com/questions/tagged/spacy
-->
## Your Environment
<!-- Include details of your environment. If you're using spaCy 1.7+, you can also type
`python -m spacy info --markdown` and copy-paste the result here.-->
- Operating System:
- Python Version Used:
- spaCy Version Used:
- Environment Information:


@ -1,14 +1,16 @@
---
name: "\U0001F6A8 Bug Report"
about: Did you come across a bug or unexpected behaviour differing from the docs?
name: "\U0001F6A8 Submit a Bug Report"
about: Use this template if you came across a bug or unexpected behaviour differing from the docs.
---
<!-- NOTE: For questions or install related issues, please open a Discussion instead. -->
## How to reproduce the behaviour
<!-- Include a code example or the steps that led to the problem. Please try to be as specific as possible. -->
## Your Environment
<!-- Include details of your environment. If you're using spaCy 1.7+, you can also type `python -m spacy info --markdown` and copy-paste the result here.-->
<!-- Include details of your environment. You can also type `python -m spacy info --markdown` and copy-paste the result here.-->
* Operating System:
* Python Version Used:
* spaCy Version Used:


@ -1,5 +1,5 @@
---
name: "\U0001F4DA Documentation"
name: "\U0001F4DA Submit a Documentation Report"
about: Did you spot a mistake in the docs, is anything unclear or do you have a
suggestion?


@ -1,21 +0,0 @@
---
name: "\U000023F3 Installation Problem"
about: Do you have problems installing spaCy, and none of the suggestions in the docs
and other issues helped?
---
<!-- Before submitting an issue, make sure to check the docs and closed issues to see if any of the solutions work for you. Installation problems can often be related to Python environment issues and problems with compilation. -->
## How to reproduce the problem
<!-- Include the details of how the problem occurred. Which command did you run to install spaCy? Did you come across an error? What else did you try? -->
```bash
# copy-paste the error message here
```
## Your Environment
<!-- Include details of your environment. If you're using spaCy 1.7+, you can also type `python -m spacy info --markdown` and copy-paste the result here.-->
* Operating System:
* Python Version Used:
* spaCy Version Used:
* Environment Information:


@ -1,19 +0,0 @@
---
name: "\U0001F4AC Anything else?"
about: For feature and project ideas, general usage questions or help with your code, please post on the GitHub Discussions board instead.
---
<!-- Describe your issue here. Please keep in mind that the GitHub issue tracker is mostly intended for reports related to the spaCy code base and source, and for bugs and enhancements. If you're looking for help with your code, consider posting a question here:
- GitHub Discussions: https://github.com/explosion/spaCy/discussions
- Stack Overflow: http://stackoverflow.com/questions/tagged/spacy
-->
## Your Environment
<!-- Include details of your environment. If you're using spaCy 1.7+, you can also type `python -m spacy info --markdown` and copy-paste the result here.-->
- Operating System:
- Python Version Used:
- spaCy Version Used:
- Environment Information:

.github/ISSUE_TEMPLATE/config.yml (new file, +14)

@ -0,0 +1,14 @@
blank_issues_enabled: false
contact_links:
  - name: 🗯 Discussions Forum
    url: https://github.com/explosion/spaCy/discussions
    about: Install issues, usage questions, general discussion and anything else that isn't a bug report.
  - name: 📖 spaCy FAQ & Troubleshooting
    url: https://github.com/explosion/spaCy/discussions/8226
    about: Before you post, check out the FAQ for answers to common community questions!
  - name: 💫 spaCy Usage Guides & API reference
    url: https://spacy.io/usage
    about: Everything you need to know about spaCy and how to use it.
  - name: 🛠 Submit a Pull Request
    url: https://github.com/explosion/spaCy/pulls
    about: Did you spot a mistake and know how to fix it? Feel free to submit a PR straight away!


@ -14,6 +14,6 @@ or new feature, or a change to the documentation? -->
## Checklist
<!--- Before you submit the PR, go over this checklist and make sure you can
tick off all the boxes. [] -> [x] -->
- [ ] I have submitted the spaCy Contributor Agreement.
- [ ] I confirm that I have the right to submit this contribution under the project's MIT license.
- [ ] I ran the tests, and all new and existing tests passed.
- [ ] My changes don't require a change to the documentation, or if they do, I've added all required information.

.github/contributors/0x2b3bfa0.md (new file, +106)

@ -0,0 +1,106 @@
# spaCy contributor agreement
This spaCy Contributor Agreement (**"SCA"**) is based on the
[Oracle Contributor Agreement](http://www.oracle.com/technetwork/oca-405177.pdf).
The SCA applies to any contribution that you make to any product or project
managed by us (the **"project"**), and sets out the intellectual property rights
you grant to us in the contributed materials. The term **"us"** shall mean
[ExplosionAI GmbH](https://explosion.ai/legal). The term
**"you"** shall mean the person or entity identified below.
If you agree to be bound by these terms, fill in the information requested
below and include the filled-in version with your first pull request, under the
folder [`.github/contributors/`](/.github/contributors/). The name of the file
should be your GitHub username, with the extension `.md`. For example, the user
example_user would create the file `.github/contributors/example_user.md`.
Read this agreement carefully before signing. These terms and conditions
constitute a binding legal agreement.
## Contributor Agreement
1. The term "contribution" or "contributed materials" means any source code,
object code, patch, tool, sample, graphic, specification, manual,
documentation, or any other material posted or submitted by you to the project.
2. With respect to any worldwide copyrights, or copyright applications and
registrations, in your contribution:
* you hereby assign to us joint ownership, and to the extent that such
assignment is or becomes invalid, ineffective or unenforceable, you hereby
grant to us a perpetual, irrevocable, non-exclusive, worldwide, no-charge,
royalty-free, unrestricted license to exercise all rights under those
copyrights. This includes, at our option, the right to sublicense these same
rights to third parties through multiple levels of sublicensees or other
licensing arrangements;
* you agree that each of us can do all things in relation to your
contribution as if each of us were the sole owners, and if one of us makes
a derivative work of your contribution, the one who makes the derivative
work (or has it made) will be the sole owner of that derivative work;
* you agree that you will not assert any moral rights in your contribution
against us, our licensees or transferees;
* you agree that we may register a copyright in your contribution and
exercise all ownership rights associated with it; and
* you agree that neither of us has any duty to consult with, obtain the
consent of, pay or render an accounting to the other for any use or
distribution of your contribution.
3. With respect to any patents you own, or that you can license without payment
to any third party, you hereby grant to us a perpetual, irrevocable,
non-exclusive, worldwide, no-charge, royalty-free license to:
* make, have made, use, sell, offer to sell, import, and otherwise transfer
your contribution in whole or in part, alone or in combination with or
included in any product, work or materials arising out of the project to
which your contribution was submitted, and
* at our option, to sublicense these same rights to third parties through
multiple levels of sublicensees or other licensing arrangements.
4. Except as set out above, you keep all right, title, and interest in your
contribution. The rights that you grant to us under these terms are effective
on the date you first submitted a contribution to us, even if your submission
took place before the date you sign these terms.
5. You covenant, represent, warrant and agree that:
* Each contribution that you submit is and shall be an original work of
authorship and you can legally grant the rights set out in this SCA;
* to the best of your knowledge, each contribution will not violate any
third party's copyrights, trademarks, patents, or other intellectual
property rights; and
* each contribution shall be in compliance with U.S. export control laws and
other applicable export and import laws. You agree to notify us if you
become aware of any circumstance which would make any of the foregoing
representations inaccurate in any respect. We may publicly disclose your
participation in the project, including the fact that you have signed the SCA.
6. This SCA is governed by the laws of the State of California and applicable
U.S. Federal law. Any choice of law rules will not apply.
7. Please place an “x” on one of the applicable statements below. Please do NOT
mark both statements:
* [X] I am signing on behalf of myself as an individual and no other person
or entity, including my employer, has or will have rights with respect to my
contributions.
* [ ] I am signing on behalf of my employer or a legal entity and I have the
actual authority to contractually bind that entity.
## Contributor Details
| Field | Entry |
|------------------------------- | -------------------- |
| Name | Helio Machado |
| Company name (if applicable) | |
| Title or role (if applicable) | |
| Date | 2021-02-03 |
| GitHub username | 0x2b3bfa0 |
| Website (optional) | |

.github/contributors/AyushExel.md (new file, +106)

@ -0,0 +1,106 @@
# spaCy contributor agreement
[The full agreement text is identical to the copy above; signed as an individual.]
## Contributor Details
| Field | Entry |
|------------------------------- | -------------------- |
| Name | Ayush Chaurasia |
| Company name (if applicable) | |
| Title or role (if applicable) | |
| Date | 2021-03-12 |
| GitHub username | AyushExel |
| Website (optional) | |

.github/contributors/Jette16.md (new file, +106)

@ -0,0 +1,106 @@
# spaCy contributor agreement
[The full agreement text is identical to the copy above; signed as an individual.]
## Contributor Details
| Field | Entry |
|------------------------------- | -------------------- |
| Name | Henriette Behr |
| Company name (if applicable) | |
| Title or role (if applicable) | |
| Date | 23.09.2021 |
| GitHub username | Jette16 |
| Website (optional) | |


@ -0,0 +1,106 @@
# spaCy contributor agreement
[The full agreement text is identical to the copy above; signed as an individual.]
## Contributor Details
| Field | Entry |
|------------------------------- | -------------------------- |
| Name | Kenneth Enevoldsen |
| Company name (if applicable) | |
| Title or role (if applicable) | |
| Date | 2021-07-13 |
| GitHub username | KennethEnevoldsen |
| Website (optional) | www.kennethenevoldsen.com |

.github/contributors/Lucaterre.md (new file, +106)

@ -0,0 +1,106 @@
# spaCy contributor agreement
[The full agreement text is identical to the copy above; signed as an individual.]
## Contributor Details
| Field | Entry |
|------------------------------- |---------------|
| Name | Lucas Terriel |
| Company name (if applicable) | |
| Title or role (if applicable) | |
| Date | 2022-06-20 |
| GitHub username | Lucaterre |
| Website (optional) | |

.github/contributors/Pantalaymon.md (new file, +106)

@ -0,0 +1,106 @@
# spaCy contributor agreement
This spaCy Contributor Agreement (**"SCA"**) is based on the
[Oracle Contributor Agreement](http://www.oracle.com/technetwork/oca-405177.pdf).
The SCA applies to any contribution that you make to any product or project
managed by us (the **"project"**), and sets out the intellectual property rights
you grant to us in the contributed materials. The term **"us"** shall mean
[ExplosionAI GmbH](https://explosion.ai/legal). The term
**"you"** shall mean the person or entity identified below.
If you agree to be bound by these terms, fill in the information requested
below and include the filled-in version with your first pull request, under the
folder [`.github/contributors/`](/.github/contributors/). The name of the file
should be your GitHub username, with the extension `.md`. For example, the user
example_user would create the file `.github/contributors/example_user.md`.
Read this agreement carefully before signing. These terms and conditions
constitute a binding legal agreement.
## Contributor Agreement
1. The term "contribution" or "contributed materials" means any source code,
object code, patch, tool, sample, graphic, specification, manual,
documentation, or any other material posted or submitted by you to the project.
2. With respect to any worldwide copyrights, or copyright applications and
registrations, in your contribution:
* you hereby assign to us joint ownership, and to the extent that such
assignment is or becomes invalid, ineffective or unenforceable, you hereby
grant to us a perpetual, irrevocable, non-exclusive, worldwide, no-charge,
royalty-free, unrestricted license to exercise all rights under those
copyrights. This includes, at our option, the right to sublicense these same
rights to third parties through multiple levels of sublicensees or other
licensing arrangements;
* you agree that each of us can do all things in relation to your
contribution as if each of us were the sole owners, and if one of us makes
a derivative work of your contribution, the one who makes the derivative
work (or has it made) will be the sole owner of that derivative work;
* you agree that you will not assert any moral rights in your contribution
against us, our licensees or transferees;
* you agree that we may register a copyright in your contribution and
exercise all ownership rights associated with it; and
* you agree that neither of us has any duty to consult with, obtain the
consent of, pay or render an accounting to the other for any use or
distribution of your contribution.
3. With respect to any patents you own, or that you can license without payment
to any third party, you hereby grant to us a perpetual, irrevocable,
non-exclusive, worldwide, no-charge, royalty-free license to:
* make, have made, use, sell, offer to sell, import, and otherwise transfer
your contribution in whole or in part, alone or in combination with or
included in any product, work or materials arising out of the project to
which your contribution was submitted, and
* at our option, to sublicense these same rights to third parties through
multiple levels of sublicensees or other licensing arrangements.
4. Except as set out above, you keep all right, title, and interest in your
contribution. The rights that you grant to us under these terms are effective
on the date you first submitted a contribution to us, even if your submission
took place before the date you sign these terms.
5. You covenant, represent, warrant and agree that:
* Each contribution that you submit is and shall be an original work of
authorship and you can legally grant the rights set out in this SCA;
* to the best of your knowledge, each contribution will not violate any
third party's copyrights, trademarks, patents, or other intellectual
property rights; and
* each contribution shall be in compliance with U.S. export control laws and
other applicable export and import laws. You agree to notify us if you
become aware of any circumstance which would make any of the foregoing
representations inaccurate in any respect. We may publicly disclose your
participation in the project, including the fact that you have signed the SCA.
6. This SCA is governed by the laws of the State of California and applicable
U.S. Federal law. Any choice of law rules will not apply.
7. Please place an “x” on one of the applicable statements below. Please do NOT
mark both statements:
* [x] I am signing on behalf of myself as an individual and no other person
or entity, including my employer, has or will have rights with respect to my
contributions.
* [ ] I am signing on behalf of my employer or a legal entity and I have the
actual authority to contractually bind that entity.
## Contributor Details
| Field | Entry |
|------------------------------- | -------------------- |
| Name |Valentin-Gabriel Soumah|
| Company name (if applicable) | |
| Title or role (if applicable) | |
| Date | 2021-11-23 |
| GitHub username | Pantalaymon |
| Website (optional) | |

106
.github/contributors/SamEdwardes.md vendored Normal file

@@ -0,0 +1,106 @@
# spaCy contributor agreement
This spaCy Contributor Agreement (**"SCA"**) is based on the
[Oracle Contributor Agreement](http://www.oracle.com/technetwork/oca-405177.pdf).
The SCA applies to any contribution that you make to any product or project
managed by us (the **"project"**), and sets out the intellectual property rights
you grant to us in the contributed materials. The term **"us"** shall mean
[ExplosionAI GmbH](https://explosion.ai/legal). The term
**"you"** shall mean the person or entity identified below.
If you agree to be bound by these terms, fill in the information requested
below and include the filled-in version with your first pull request, under the
folder [`.github/contributors/`](/.github/contributors/). The name of the file
should be your GitHub username, with the extension `.md`. For example, the user
example_user would create the file `.github/contributors/example_user.md`.
Read this agreement carefully before signing. These terms and conditions
constitute a binding legal agreement.
## Contributor Agreement
1. The term "contribution" or "contributed materials" means any source code,
object code, patch, tool, sample, graphic, specification, manual,
documentation, or any other material posted or submitted by you to the project.
2. With respect to any worldwide copyrights, or copyright applications and
registrations, in your contribution:
* you hereby assign to us joint ownership, and to the extent that such
assignment is or becomes invalid, ineffective or unenforceable, you hereby
grant to us a perpetual, irrevocable, non-exclusive, worldwide, no-charge,
royalty-free, unrestricted license to exercise all rights under those
copyrights. This includes, at our option, the right to sublicense these same
rights to third parties through multiple levels of sublicensees or other
licensing arrangements;
* you agree that each of us can do all things in relation to your
contribution as if each of us were the sole owners, and if one of us makes
a derivative work of your contribution, the one who makes the derivative
work (or has it made) will be the sole owner of that derivative work;
* you agree that you will not assert any moral rights in your contribution
against us, our licensees or transferees;
* you agree that we may register a copyright in your contribution and
exercise all ownership rights associated with it; and
* you agree that neither of us has any duty to consult with, obtain the
consent of, pay or render an accounting to the other for any use or
distribution of your contribution.
3. With respect to any patents you own, or that you can license without payment
to any third party, you hereby grant to us a perpetual, irrevocable,
non-exclusive, worldwide, no-charge, royalty-free license to:
* make, have made, use, sell, offer to sell, import, and otherwise transfer
your contribution in whole or in part, alone or in combination with or
included in any product, work or materials arising out of the project to
which your contribution was submitted, and
* at our option, to sublicense these same rights to third parties through
multiple levels of sublicensees or other licensing arrangements.
4. Except as set out above, you keep all right, title, and interest in your
contribution. The rights that you grant to us under these terms are effective
on the date you first submitted a contribution to us, even if your submission
took place before the date you sign these terms.
5. You covenant, represent, warrant and agree that:
* Each contribution that you submit is and shall be an original work of
authorship and you can legally grant the rights set out in this SCA;
* to the best of your knowledge, each contribution will not violate any
third party's copyrights, trademarks, patents, or other intellectual
property rights; and
* each contribution shall be in compliance with U.S. export control laws and
other applicable export and import laws. You agree to notify us if you
become aware of any circumstance which would make any of the foregoing
representations inaccurate in any respect. We may publicly disclose your
participation in the project, including the fact that you have signed the SCA.
6. This SCA is governed by the laws of the State of California and applicable
U.S. Federal law. Any choice of law rules will not apply.
7. Please place an “x” on one of the applicable statements below. Please do NOT
mark both statements:
* [x] I am signing on behalf of myself as an individual and no other person
or entity, including my employer, has or will have rights with respect to my
contributions.
* [ ] I am signing on behalf of my employer or a legal entity and I have the
actual authority to contractually bind that entity.
## Contributor Details
| Field | Entry |
|------------------------------- | -------------------- |
| Name | Sam Edwardes |
| Company name (if applicable) | |
| Title or role (if applicable) | |
| Date | 2021-04-02 |
| GitHub username | SamEdwardes |
| Website (optional) | samedwardes.com |

106
.github/contributors/ZeeD.md vendored Normal file

@@ -0,0 +1,106 @@
# spaCy contributor agreement
This spaCy Contributor Agreement (**"SCA"**) is based on the
[Oracle Contributor Agreement](http://www.oracle.com/technetwork/oca-405177.pdf).
The SCA applies to any contribution that you make to any product or project
managed by us (the **"project"**), and sets out the intellectual property rights
you grant to us in the contributed materials. The term **"us"** shall mean
[ExplosionAI GmbH](https://explosion.ai/legal). The term
**"you"** shall mean the person or entity identified below.
If you agree to be bound by these terms, fill in the information requested
below and include the filled-in version with your first pull request, under the
folder [`.github/contributors/`](/.github/contributors/). The name of the file
should be your GitHub username, with the extension `.md`. For example, the user
example_user would create the file `.github/contributors/example_user.md`.
Read this agreement carefully before signing. These terms and conditions
constitute a binding legal agreement.
## Contributor Agreement
1. The term "contribution" or "contributed materials" means any source code,
object code, patch, tool, sample, graphic, specification, manual,
documentation, or any other material posted or submitted by you to the project.
2. With respect to any worldwide copyrights, or copyright applications and
registrations, in your contribution:
* you hereby assign to us joint ownership, and to the extent that such
assignment is or becomes invalid, ineffective or unenforceable, you hereby
grant to us a perpetual, irrevocable, non-exclusive, worldwide, no-charge,
royalty-free, unrestricted license to exercise all rights under those
copyrights. This includes, at our option, the right to sublicense these same
rights to third parties through multiple levels of sublicensees or other
licensing arrangements;
* you agree that each of us can do all things in relation to your
contribution as if each of us were the sole owners, and if one of us makes
a derivative work of your contribution, the one who makes the derivative
work (or has it made) will be the sole owner of that derivative work;
* you agree that you will not assert any moral rights in your contribution
against us, our licensees or transferees;
* you agree that we may register a copyright in your contribution and
exercise all ownership rights associated with it; and
* you agree that neither of us has any duty to consult with, obtain the
consent of, pay or render an accounting to the other for any use or
distribution of your contribution.
3. With respect to any patents you own, or that you can license without payment
to any third party, you hereby grant to us a perpetual, irrevocable,
non-exclusive, worldwide, no-charge, royalty-free license to:
* make, have made, use, sell, offer to sell, import, and otherwise transfer
your contribution in whole or in part, alone or in combination with or
included in any product, work or materials arising out of the project to
which your contribution was submitted, and
* at our option, to sublicense these same rights to third parties through
multiple levels of sublicensees or other licensing arrangements.
4. Except as set out above, you keep all right, title, and interest in your
contribution. The rights that you grant to us under these terms are effective
on the date you first submitted a contribution to us, even if your submission
took place before the date you sign these terms.
5. You covenant, represent, warrant and agree that:
* Each contribution that you submit is and shall be an original work of
authorship and you can legally grant the rights set out in this SCA;
* to the best of your knowledge, each contribution will not violate any
third party's copyrights, trademarks, patents, or other intellectual
property rights; and
* each contribution shall be in compliance with U.S. export control laws and
other applicable export and import laws. You agree to notify us if you
become aware of any circumstance which would make any of the foregoing
representations inaccurate in any respect. We may publicly disclose your
participation in the project, including the fact that you have signed the SCA.
6. This SCA is governed by the laws of the State of California and applicable
U.S. Federal law. Any choice of law rules will not apply.
7. Please place an “x” on one of the applicable statements below. Please do NOT
mark both statements:
* [x] I am signing on behalf of myself as an individual and no other person
or entity, including my employer, has or will have rights with respect to my
contributions.
* [ ] I am signing on behalf of my employer or a legal entity and I have the
actual authority to contractually bind that entity.
## Contributor Details
| Field | Entry |
|------------------------------- | -------------------- |
| Name | Vito De Tullio |
| Company name (if applicable) | |
| Title or role (if applicable) | |
| Date | 2021-06-01 |
| GitHub username | ZeeD |
| Website (optional) | |


@@ -98,9 +98,9 @@ mark both statements:
 | Field | Entry |
 |------------------------------- | -------------------- |
-| Name | Shantam |
+| Name | Shantam Raj |
 | Company name (if applicable) | |
 | Title or role (if applicable) | |
-| Date | 21/5/2018 |
+| Date | 10/4/2021 |
 | GitHub username | armsp |
-| Website (optional) | |
+| Website (optional) |https://shantamraj.com|

106
.github/contributors/avi197.md vendored Normal file

@@ -0,0 +1,106 @@
# spaCy contributor agreement
This spaCy Contributor Agreement (**"SCA"**) is based on the
[Oracle Contributor Agreement](http://www.oracle.com/technetwork/oca-405177.pdf).
The SCA applies to any contribution that you make to any product or project
managed by us (the **"project"**), and sets out the intellectual property rights
you grant to us in the contributed materials. The term **"us"** shall mean
[ExplosionAI GmbH](https://explosion.ai/legal). The term
**"you"** shall mean the person or entity identified below.
If you agree to be bound by these terms, fill in the information requested
below and include the filled-in version with your first pull request, under the
folder [`.github/contributors/`](/.github/contributors/). The name of the file
should be your GitHub username, with the extension `.md`. For example, the user
example_user would create the file `.github/contributors/example_user.md`.
Read this agreement carefully before signing. These terms and conditions
constitute a binding legal agreement.
## Contributor Agreement
1. The term "contribution" or "contributed materials" means any source code,
object code, patch, tool, sample, graphic, specification, manual,
documentation, or any other material posted or submitted by you to the project.
2. With respect to any worldwide copyrights, or copyright applications and
registrations, in your contribution:
* you hereby assign to us joint ownership, and to the extent that such
assignment is or becomes invalid, ineffective or unenforceable, you hereby
grant to us a perpetual, irrevocable, non-exclusive, worldwide, no-charge,
royalty-free, unrestricted license to exercise all rights under those
copyrights. This includes, at our option, the right to sublicense these same
rights to third parties through multiple levels of sublicensees or other
licensing arrangements;
* you agree that each of us can do all things in relation to your
contribution as if each of us were the sole owners, and if one of us makes
a derivative work of your contribution, the one who makes the derivative
work (or has it made) will be the sole owner of that derivative work;
* you agree that you will not assert any moral rights in your contribution
against us, our licensees or transferees;
* you agree that we may register a copyright in your contribution and
exercise all ownership rights associated with it; and
* you agree that neither of us has any duty to consult with, obtain the
consent of, pay or render an accounting to the other for any use or
distribution of your contribution.
3. With respect to any patents you own, or that you can license without payment
to any third party, you hereby grant to us a perpetual, irrevocable,
non-exclusive, worldwide, no-charge, royalty-free license to:
* make, have made, use, sell, offer to sell, import, and otherwise transfer
your contribution in whole or in part, alone or in combination with or
included in any product, work or materials arising out of the project to
which your contribution was submitted, and
* at our option, to sublicense these same rights to third parties through
multiple levels of sublicensees or other licensing arrangements.
4. Except as set out above, you keep all right, title, and interest in your
contribution. The rights that you grant to us under these terms are effective
on the date you first submitted a contribution to us, even if your submission
took place before the date you sign these terms.
5. You covenant, represent, warrant and agree that:
* Each contribution that you submit is and shall be an original work of
authorship and you can legally grant the rights set out in this SCA;
* to the best of your knowledge, each contribution will not violate any
third party's copyrights, trademarks, patents, or other intellectual
property rights; and
* each contribution shall be in compliance with U.S. export control laws and
other applicable export and import laws. You agree to notify us if you
become aware of any circumstance which would make any of the foregoing
representations inaccurate in any respect. We may publicly disclose your
participation in the project, including the fact that you have signed the SCA.
6. This SCA is governed by the laws of the State of California and applicable
U.S. Federal law. Any choice of law rules will not apply.
7. Please place an “x” on one of the applicable statements below. Please do NOT
mark both statements:
* [x] I am signing on behalf of myself as an individual and no other person
or entity, including my employer, has or will have rights with respect to my
contributions.
* [ ] I am signing on behalf of my employer or a legal entity and I have the
actual authority to contractually bind that entity.
## Contributor Details
| Field | Entry |
|------------------------------- | -------------------- |
| Name | Son Pham |
| Company name (if applicable) | |
| Title or role (if applicable) | |
| Date | 09/10/2021 |
| GitHub username | Avi197 |
| Website (optional) | |

106
.github/contributors/bbieniek.md vendored Normal file

@@ -0,0 +1,106 @@
# spaCy contributor agreement
This spaCy Contributor Agreement (**"SCA"**) is based on the
[Oracle Contributor Agreement](http://www.oracle.com/technetwork/oca-405177.pdf).
The SCA applies to any contribution that you make to any product or project
managed by us (the **"project"**), and sets out the intellectual property rights
you grant to us in the contributed materials. The term **"us"** shall mean
[ExplosionAI GmbH](https://explosion.ai/legal). The term
**"you"** shall mean the person or entity identified below.
If you agree to be bound by these terms, fill in the information requested
below and include the filled-in version with your first pull request, under the
folder [`.github/contributors/`](/.github/contributors/). The name of the file
should be your GitHub username, with the extension `.md`. For example, the user
example_user would create the file `.github/contributors/example_user.md`.
Read this agreement carefully before signing. These terms and conditions
constitute a binding legal agreement.
## Contributor Agreement
1. The term "contribution" or "contributed materials" means any source code,
object code, patch, tool, sample, graphic, specification, manual,
documentation, or any other material posted or submitted by you to the project.
2. With respect to any worldwide copyrights, or copyright applications and
registrations, in your contribution:
* you hereby assign to us joint ownership, and to the extent that such
assignment is or becomes invalid, ineffective or unenforceable, you hereby
grant to us a perpetual, irrevocable, non-exclusive, worldwide, no-charge,
royalty-free, unrestricted license to exercise all rights under those
copyrights. This includes, at our option, the right to sublicense these same
rights to third parties through multiple levels of sublicensees or other
licensing arrangements;
* you agree that each of us can do all things in relation to your
contribution as if each of us were the sole owners, and if one of us makes
a derivative work of your contribution, the one who makes the derivative
work (or has it made) will be the sole owner of that derivative work;
* you agree that you will not assert any moral rights in your contribution
against us, our licensees or transferees;
* you agree that we may register a copyright in your contribution and
exercise all ownership rights associated with it; and
* you agree that neither of us has any duty to consult with, obtain the
consent of, pay or render an accounting to the other for any use or
distribution of your contribution.
3. With respect to any patents you own, or that you can license without payment
to any third party, you hereby grant to us a perpetual, irrevocable,
non-exclusive, worldwide, no-charge, royalty-free license to:
* make, have made, use, sell, offer to sell, import, and otherwise transfer
your contribution in whole or in part, alone or in combination with or
included in any product, work or materials arising out of the project to
which your contribution was submitted, and
* at our option, to sublicense these same rights to third parties through
multiple levels of sublicensees or other licensing arrangements.
4. Except as set out above, you keep all right, title, and interest in your
contribution. The rights that you grant to us under these terms are effective
on the date you first submitted a contribution to us, even if your submission
took place before the date you sign these terms.
5. You covenant, represent, warrant and agree that:
* Each contribution that you submit is and shall be an original work of
authorship and you can legally grant the rights set out in this SCA;
* to the best of your knowledge, each contribution will not violate any
third party's copyrights, trademarks, patents, or other intellectual
property rights; and
* each contribution shall be in compliance with U.S. export control laws and
other applicable export and import laws. You agree to notify us if you
become aware of any circumstance which would make any of the foregoing
representations inaccurate in any respect. We may publicly disclose your
participation in the project, including the fact that you have signed the SCA.
6. This SCA is governed by the laws of the State of California and applicable
U.S. Federal law. Any choice of law rules will not apply.
7. Please place an “x” on one of the applicable statements below. Please do NOT
mark both statements:
* [X] I am signing on behalf of myself as an individual and no other person
or entity, including my employer, has or will have rights with respect to my
contributions.
* [ ] I am signing on behalf of my employer or a legal entity and I have the
actual authority to contractually bind that entity.
## Contributor Details
| Field | Entry |
|------------------------------- | -------------------- |
| Name | Baltazar Bieniek |
| Company name (if applicable) | |
| Title or role (if applicable) | |
| Date | 2021.08.19 |
| GitHub username | bbieniek |
| Website (optional) | https://baltazar.bieniek.org.pl/ |

106
.github/contributors/bodak.md vendored Normal file

@@ -0,0 +1,106 @@
# spaCy contributor agreement
This spaCy Contributor Agreement (**"SCA"**) is based on the
[Oracle Contributor Agreement](http://www.oracle.com/technetwork/oca-405177.pdf).
The SCA applies to any contribution that you make to any product or project
managed by us (the **"project"**), and sets out the intellectual property rights
you grant to us in the contributed materials. The term **"us"** shall mean
[ExplosionAI GmbH](https://explosion.ai/legal). The term
**"you"** shall mean the person or entity identified below.
If you agree to be bound by these terms, fill in the information requested
below and include the filled-in version with your first pull request, under the
folder [`.github/contributors/`](/.github/contributors/). The name of the file
should be your GitHub username, with the extension `.md`. For example, the user
example_user would create the file `.github/contributors/example_user.md`.
Read this agreement carefully before signing. These terms and conditions
constitute a binding legal agreement.
## Contributor Agreement
1. The term "contribution" or "contributed materials" means any source code,
object code, patch, tool, sample, graphic, specification, manual,
documentation, or any other material posted or submitted by you to the project.
2. With respect to any worldwide copyrights, or copyright applications and
registrations, in your contribution:
* you hereby assign to us joint ownership, and to the extent that such
assignment is or becomes invalid, ineffective or unenforceable, you hereby
grant to us a perpetual, irrevocable, non-exclusive, worldwide, no-charge,
royalty-free, unrestricted license to exercise all rights under those
copyrights. This includes, at our option, the right to sublicense these same
rights to third parties through multiple levels of sublicensees or other
licensing arrangements;
* you agree that each of us can do all things in relation to your
contribution as if each of us were the sole owners, and if one of us makes
a derivative work of your contribution, the one who makes the derivative
work (or has it made) will be the sole owner of that derivative work;
* you agree that you will not assert any moral rights in your contribution
against us, our licensees or transferees;
* you agree that we may register a copyright in your contribution and
exercise all ownership rights associated with it; and
* you agree that neither of us has any duty to consult with, obtain the
consent of, pay or render an accounting to the other for any use or
distribution of your contribution.
3. With respect to any patents you own, or that you can license without payment
to any third party, you hereby grant to us a perpetual, irrevocable,
non-exclusive, worldwide, no-charge, royalty-free license to:
* make, have made, use, sell, offer to sell, import, and otherwise transfer
your contribution in whole or in part, alone or in combination with or
included in any product, work or materials arising out of the project to
which your contribution was submitted, and
* at our option, to sublicense these same rights to third parties through
multiple levels of sublicensees or other licensing arrangements.
4. Except as set out above, you keep all right, title, and interest in your
contribution. The rights that you grant to us under these terms are effective
on the date you first submitted a contribution to us, even if your submission
took place before the date you sign these terms.
5. You covenant, represent, warrant and agree that:
* Each contribution that you submit is and shall be an original work of
authorship and you can legally grant the rights set out in this SCA;
* to the best of your knowledge, each contribution will not violate any
third party's copyrights, trademarks, patents, or other intellectual
property rights; and
* each contribution shall be in compliance with U.S. export control laws and
other applicable export and import laws. You agree to notify us if you
become aware of any circumstance which would make any of the foregoing
representations inaccurate in any respect. We may publicly disclose your
participation in the project, including the fact that you have signed the SCA.
6. This SCA is governed by the laws of the State of California and applicable
U.S. Federal law. Any choice of law rules will not apply.
7. Please place an “x” on one of the applicable statements below. Please do NOT
mark both statements:
* [x] I am signing on behalf of myself as an individual and no other person
or entity, including my employer, has or will have rights with respect to my
contributions.
* [ ] I am signing on behalf of my employer or a legal entity and I have the
actual authority to contractually bind that entity.
## Contributor Details
| Field | Entry |
|------------------------------- | -------------------- |
| Name | Kristian Boda |
| Company name (if applicable) | |
| Title or role (if applicable) | |
| Date | 18.05.2021 |
| GitHub username | bodak |
| Website (optional) | |

106
.github/contributors/bratao.md vendored Normal file

@@ -0,0 +1,106 @@
# spaCy contributor agreement
This spaCy Contributor Agreement (**"SCA"**) is based on the
[Oracle Contributor Agreement](http://www.oracle.com/technetwork/oca-405177.pdf).
The SCA applies to any contribution that you make to any product or project
managed by us (the **"project"**), and sets out the intellectual property rights
you grant to us in the contributed materials. The term **"us"** shall mean
[ExplosionAI GmbH](https://explosion.ai/legal). The term
**"you"** shall mean the person or entity identified below.
If you agree to be bound by these terms, fill in the information requested
below and include the filled-in version with your first pull request, under the
folder [`.github/contributors/`](/.github/contributors/). The name of the file
should be your GitHub username, with the extension `.md`. For example, the user
example_user would create the file `.github/contributors/example_user.md`.
Read this agreement carefully before signing. These terms and conditions
constitute a binding legal agreement.
## Contributor Agreement
1. The term "contribution" or "contributed materials" means any source code,
object code, patch, tool, sample, graphic, specification, manual,
documentation, or any other material posted or submitted by you to the project.
2. With respect to any worldwide copyrights, or copyright applications and
registrations, in your contribution:
* you hereby assign to us joint ownership, and to the extent that such
assignment is or becomes invalid, ineffective or unenforceable, you hereby
grant to us a perpetual, irrevocable, non-exclusive, worldwide, no-charge,
royalty-free, unrestricted license to exercise all rights under those
copyrights. This includes, at our option, the right to sublicense these same
rights to third parties through multiple levels of sublicensees or other
licensing arrangements;
* you agree that each of us can do all things in relation to your
contribution as if each of us were the sole owners, and if one of us makes
a derivative work of your contribution, the one who makes the derivative
work (or has it made) will be the sole owner of that derivative work;
* you agree that you will not assert any moral rights in your contribution
against us, our licensees or transferees;
* you agree that we may register a copyright in your contribution and
exercise all ownership rights associated with it; and
* you agree that neither of us has any duty to consult with, obtain the
consent of, pay or render an accounting to the other for any use or
distribution of your contribution.
3. With respect to any patents you own, or that you can license without payment
to any third party, you hereby grant to us a perpetual, irrevocable,
non-exclusive, worldwide, no-charge, royalty-free license to:
* make, have made, use, sell, offer to sell, import, and otherwise transfer
your contribution in whole or in part, alone or in combination with or
included in any product, work or materials arising out of the project to
which your contribution was submitted, and
* at our option, to sublicense these same rights to third parties through
multiple levels of sublicensees or other licensing arrangements.
4. Except as set out above, you keep all right, title, and interest in your
contribution. The rights that you grant to us under these terms are effective
on the date you first submitted a contribution to us, even if your submission
took place before the date you sign these terms.
5. You covenant, represent, warrant and agree that:
* Each contribution that you submit is and shall be an original work of
authorship and you can legally grant the rights set out in this SCA;
* to the best of your knowledge, each contribution will not violate any
third party's copyrights, trademarks, patents, or other intellectual
property rights; and
* each contribution shall be in compliance with U.S. export control laws and
other applicable export and import laws. You agree to notify us if you
become aware of any circumstance which would make any of the foregoing
representations inaccurate in any respect. We may publicly disclose your
participation in the project, including the fact that you have signed the SCA.
6. This SCA is governed by the laws of the State of California and applicable
U.S. Federal law. Any choice of law rules will not apply.
7. Please place an “x” on one of the applicable statements below. Please do NOT
mark both statements:
* [X] I am signing on behalf of myself as an individual and no other person
or entity, including my employer, has or will have rights with respect to my
contributions.
* [ ] I am signing on behalf of my employer or a legal entity and I have the
actual authority to contractually bind that entity.
## Contributor Details
| Field | Entry |
|------------------------------- | -------------------- |
| Name | Bruno Souza Cabral |
| Company name (if applicable) | |
| Title or role (if applicable) | |
| Date | 24/12/2020 |
| GitHub username | bratao |
| Website (optional) | |

106
.github/contributors/broaddeep.md vendored Normal file

@@ -0,0 +1,106 @@
# spaCy contributor agreement
This spaCy Contributor Agreement (**"SCA"**) is based on the
[Oracle Contributor Agreement](http://www.oracle.com/technetwork/oca-405177.pdf).
The SCA applies to any contribution that you make to any product or project
managed by us (the **"project"**), and sets out the intellectual property rights
you grant to us in the contributed materials. The term **"us"** shall mean
[ExplosionAI GmbH](https://explosion.ai/legal). The term
**"you"** shall mean the person or entity identified below.
If you agree to be bound by these terms, fill in the information requested
below and include the filled-in version with your first pull request, under the
folder [`.github/contributors/`](/.github/contributors/). The name of the file
should be your GitHub username, with the extension `.md`. For example, the user
example_user would create the file `.github/contributors/example_user.md`.
Read this agreement carefully before signing. These terms and conditions
constitute a binding legal agreement.
## Contributor Agreement
1. The term "contribution" or "contributed materials" means any source code,
object code, patch, tool, sample, graphic, specification, manual,
documentation, or any other material posted or submitted by you to the project.
2. With respect to any worldwide copyrights, or copyright applications and
registrations, in your contribution:
* you hereby assign to us joint ownership, and to the extent that such
assignment is or becomes invalid, ineffective or unenforceable, you hereby
grant to us a perpetual, irrevocable, non-exclusive, worldwide, no-charge,
royalty-free, unrestricted license to exercise all rights under those
copyrights. This includes, at our option, the right to sublicense these same
rights to third parties through multiple levels of sublicensees or other
licensing arrangements;
* you agree that each of us can do all things in relation to your
contribution as if each of us were the sole owners, and if one of us makes
a derivative work of your contribution, the one who makes the derivative
work (or has it made) will be the sole owner of that derivative work;
* you agree that you will not assert any moral rights in your contribution
against us, our licensees or transferees;
* you agree that we may register a copyright in your contribution and
exercise all ownership rights associated with it; and
* you agree that neither of us has any duty to consult with, obtain the
consent of, pay or render an accounting to the other for any use or
distribution of your contribution.
3. With respect to any patents you own, or that you can license without payment
to any third party, you hereby grant to us a perpetual, irrevocable,
non-exclusive, worldwide, no-charge, royalty-free license to:
* make, have made, use, sell, offer to sell, import, and otherwise transfer
your contribution in whole or in part, alone or in combination with or
included in any product, work or materials arising out of the project to
which your contribution was submitted, and
* at our option, to sublicense these same rights to third parties through
multiple levels of sublicensees or other licensing arrangements.
4. Except as set out above, you keep all right, title, and interest in your
contribution. The rights that you grant to us under these terms are effective
on the date you first submitted a contribution to us, even if your submission
took place before the date you sign these terms.
5. You covenant, represent, warrant and agree that:
* Each contribution that you submit is and shall be an original work of
authorship and you can legally grant the rights set out in this SCA;
* to the best of your knowledge, each contribution will not violate any
third party's copyrights, trademarks, patents, or other intellectual
property rights; and
* each contribution shall be in compliance with U.S. export control laws and
other applicable export and import laws. You agree to notify us if you
become aware of any circumstance which would make any of the foregoing
representations inaccurate in any respect. We may publicly disclose your
participation in the project, including the fact that you have signed the SCA.
6. This SCA is governed by the laws of the State of California and applicable
U.S. Federal law. Any choice of law rules will not apply.
7. Please place an “x” on one of the applicable statements below. Please do NOT
mark both statements:
* [x] I am signing on behalf of myself as an individual and no other person
or entity, including my employer, has or will have rights with respect to my
contributions.
* [ ] I am signing on behalf of my employer or a legal entity and I have the
actual authority to contractually bind that entity.
## Contributor Details
| Field | Entry |
|------------------------------- | -------------------- |
| Name | Dongjun Park |
| Company name (if applicable) | |
| Title or role (if applicable) | |
| Date | 2021-03-06 |
| GitHub username | broaddeep |
| Website (optional) | |

106
.github/contributors/bsweileh.md vendored Normal file

@@ -0,0 +1,106 @@
# spaCy contributor agreement
This spaCy Contributor Agreement (**"SCA"**) is based on the
[Oracle Contributor Agreement](http://www.oracle.com/technetwork/oca-405177.pdf).
The SCA applies to any contribution that you make to any product or project
managed by us (the **"project"**), and sets out the intellectual property rights
you grant to us in the contributed materials. The term **"us"** shall mean
[ExplosionAI GmbH](https://explosion.ai/legal). The term
**"you"** shall mean the person or entity identified below.
If you agree to be bound by these terms, fill in the information requested
below and include the filled-in version with your first pull request, under the
folder [`.github/contributors/`](/.github/contributors/). The name of the file
should be your GitHub username, with the extension `.md`. For example, the user
example_user would create the file `.github/contributors/example_user.md`.
Read this agreement carefully before signing. These terms and conditions
constitute a binding legal agreement.
## Contributor Agreement
1. The term "contribution" or "contributed materials" means any source code,
object code, patch, tool, sample, graphic, specification, manual,
documentation, or any other material posted or submitted by you to the project.
2. With respect to any worldwide copyrights, or copyright applications and
registrations, in your contribution:
* you hereby assign to us joint ownership, and to the extent that such
assignment is or becomes invalid, ineffective or unenforceable, you hereby
grant to us a perpetual, irrevocable, non-exclusive, worldwide, no-charge,
royalty-free, unrestricted license to exercise all rights under those
copyrights. This includes, at our option, the right to sublicense these same
rights to third parties through multiple levels of sublicensees or other
licensing arrangements;
* you agree that each of us can do all things in relation to your
contribution as if each of us were the sole owners, and if one of us makes
a derivative work of your contribution, the one who makes the derivative
work (or has it made) will be the sole owner of that derivative work;
* you agree that you will not assert any moral rights in your contribution
against us, our licensees or transferees;
* you agree that we may register a copyright in your contribution and
exercise all ownership rights associated with it; and
* you agree that neither of us has any duty to consult with, obtain the
consent of, pay or render an accounting to the other for any use or
distribution of your contribution.
3. With respect to any patents you own, or that you can license without payment
to any third party, you hereby grant to us a perpetual, irrevocable,
non-exclusive, worldwide, no-charge, royalty-free license to:
* make, have made, use, sell, offer to sell, import, and otherwise transfer
your contribution in whole or in part, alone or in combination with or
included in any product, work or materials arising out of the project to
which your contribution was submitted, and
* at our option, to sublicense these same rights to third parties through
multiple levels of sublicensees or other licensing arrangements.
4. Except as set out above, you keep all right, title, and interest in your
contribution. The rights that you grant to us under these terms are effective
on the date you first submitted a contribution to us, even if your submission
took place before the date you sign these terms.
5. You covenant, represent, warrant and agree that:
* Each contribution that you submit is and shall be an original work of
authorship and you can legally grant the rights set out in this SCA;
* to the best of your knowledge, each contribution will not violate any
third party's copyrights, trademarks, patents, or other intellectual
property rights; and
* each contribution shall be in compliance with U.S. export control laws and
other applicable export and import laws. You agree to notify us if you
become aware of any circumstance which would make any of the foregoing
representations inaccurate in any respect. We may publicly disclose your
participation in the project, including the fact that you have signed the SCA.
6. This SCA is governed by the laws of the State of California and applicable
U.S. Federal law. Any choice of law rules will not apply.
7. Please place an “x” on one of the applicable statements below. Please do NOT
mark both statements:
* [x] I am signing on behalf of myself as an individual and no other person
or entity, including my employer, has or will have rights with respect to my
contributions.
* [ ] I am signing on behalf of my employer or a legal entity and I have the
actual authority to contractually bind that entity.
## Contributor Details
| Field | Entry |
|------------------------------- | -------------------- |
| Name | Belal |
| Company name (if applicable) | |
| Title or role (if applicable) | |
| Date | March 13, 2021 |
| GitHub username | bsweileh |
| Website (optional) | |

106
.github/contributors/connorbrinton.md vendored Normal file

@@ -0,0 +1,106 @@
# spaCy contributor agreement
This spaCy Contributor Agreement (**"SCA"**) is based on the
[Oracle Contributor Agreement](http://www.oracle.com/technetwork/oca-405177.pdf).
The SCA applies to any contribution that you make to any product or project
managed by us (the **"project"**), and sets out the intellectual property rights
you grant to us in the contributed materials. The term **"us"** shall mean
[ExplosionAI GmbH](https://explosion.ai/legal). The term
**"you"** shall mean the person or entity identified below.
If you agree to be bound by these terms, fill in the information requested
below and include the filled-in version with your first pull request, under the
folder [`.github/contributors/`](/.github/contributors/). The name of the file
should be your GitHub username, with the extension `.md`. For example, the user
example_user would create the file `.github/contributors/example_user.md`.
Read this agreement carefully before signing. These terms and conditions
constitute a binding legal agreement.
## Contributor Agreement
1. The term "contribution" or "contributed materials" means any source code,
object code, patch, tool, sample, graphic, specification, manual,
documentation, or any other material posted or submitted by you to the project.
2. With respect to any worldwide copyrights, or copyright applications and
registrations, in your contribution:
* you hereby assign to us joint ownership, and to the extent that such
assignment is or becomes invalid, ineffective or unenforceable, you hereby
grant to us a perpetual, irrevocable, non-exclusive, worldwide, no-charge,
royalty-free, unrestricted license to exercise all rights under those
copyrights. This includes, at our option, the right to sublicense these same
rights to third parties through multiple levels of sublicensees or other
licensing arrangements;
* you agree that each of us can do all things in relation to your
contribution as if each of us were the sole owners, and if one of us makes
a derivative work of your contribution, the one who makes the derivative
work (or has it made) will be the sole owner of that derivative work;
* you agree that you will not assert any moral rights in your contribution
against us, our licensees or transferees;
* you agree that we may register a copyright in your contribution and
exercise all ownership rights associated with it; and
* you agree that neither of us has any duty to consult with, obtain the
consent of, pay or render an accounting to the other for any use or
distribution of your contribution.
3. With respect to any patents you own, or that you can license without payment
to any third party, you hereby grant to us a perpetual, irrevocable,
non-exclusive, worldwide, no-charge, royalty-free license to:
* make, have made, use, sell, offer to sell, import, and otherwise transfer
your contribution in whole or in part, alone or in combination with or
included in any product, work or materials arising out of the project to
which your contribution was submitted, and
* at our option, to sublicense these same rights to third parties through
multiple levels of sublicensees or other licensing arrangements.
4. Except as set out above, you keep all right, title, and interest in your
contribution. The rights that you grant to us under these terms are effective
on the date you first submitted a contribution to us, even if your submission
took place before the date you sign these terms.
5. You covenant, represent, warrant and agree that:
* Each contribution that you submit is and shall be an original work of
authorship and you can legally grant the rights set out in this SCA;
* to the best of your knowledge, each contribution will not violate any
third party's copyrights, trademarks, patents, or other intellectual
property rights; and
* each contribution shall be in compliance with U.S. export control laws and
other applicable export and import laws. You agree to notify us if you
become aware of any circumstance which would make any of the foregoing
representations inaccurate in any respect. We may publicly disclose your
participation in the project, including the fact that you have signed the SCA.
6. This SCA is governed by the laws of the State of California and applicable
U.S. Federal law. Any choice of law rules will not apply.
7. Please place an “x” on one of the applicable statements below. Please do NOT
mark both statements:
* [x] I am signing on behalf of myself as an individual and no other person
or entity, including my employer, has or will have rights with respect to my
contributions.
* [ ] I am signing on behalf of my employer or a legal entity and I have the
actual authority to contractually bind that entity.
## Contributor Details
| Field | Entry |
|------------------------------- | -------------------- |
| Name | Connor Brinton |
| Company name (if applicable) | |
| Title or role (if applicable) | |
| Date | July 20th, 2021 |
| GitHub username | connorbrinton |
| Website (optional) | |

106
.github/contributors/dardoria.md vendored Normal file

@@ -0,0 +1,106 @@
# spaCy contributor agreement
This spaCy Contributor Agreement (**"SCA"**) is based on the
[Oracle Contributor Agreement](http://www.oracle.com/technetwork/oca-405177.pdf).
The SCA applies to any contribution that you make to any product or project
managed by us (the **"project"**), and sets out the intellectual property rights
you grant to us in the contributed materials. The term **"us"** shall mean
[ExplosionAI GmbH](https://explosion.ai/legal). The term
**"you"** shall mean the person or entity identified below.
If you agree to be bound by these terms, fill in the information requested
below and include the filled-in version with your first pull request, under the
folder [`.github/contributors/`](/.github/contributors/). The name of the file
should be your GitHub username, with the extension `.md`. For example, the user
example_user would create the file `.github/contributors/example_user.md`.
Read this agreement carefully before signing. These terms and conditions
constitute a binding legal agreement.
## Contributor Agreement
1. The term "contribution" or "contributed materials" means any source code,
object code, patch, tool, sample, graphic, specification, manual,
documentation, or any other material posted or submitted by you to the project.
2. With respect to any worldwide copyrights, or copyright applications and
registrations, in your contribution:
* you hereby assign to us joint ownership, and to the extent that such
assignment is or becomes invalid, ineffective or unenforceable, you hereby
grant to us a perpetual, irrevocable, non-exclusive, worldwide, no-charge,
royalty-free, unrestricted license to exercise all rights under those
copyrights. This includes, at our option, the right to sublicense these same
rights to third parties through multiple levels of sublicensees or other
licensing arrangements;
* you agree that each of us can do all things in relation to your
contribution as if each of us were the sole owners, and if one of us makes
a derivative work of your contribution, the one who makes the derivative
work (or has it made) will be the sole owner of that derivative work;
* you agree that you will not assert any moral rights in your contribution
against us, our licensees or transferees;
* you agree that we may register a copyright in your contribution and
exercise all ownership rights associated with it; and
* you agree that neither of us has any duty to consult with, obtain the
consent of, pay or render an accounting to the other for any use or
distribution of your contribution.
3. With respect to any patents you own, or that you can license without payment
to any third party, you hereby grant to us a perpetual, irrevocable,
non-exclusive, worldwide, no-charge, royalty-free license to:
* make, have made, use, sell, offer to sell, import, and otherwise transfer
your contribution in whole or in part, alone or in combination with or
included in any product, work or materials arising out of the project to
which your contribution was submitted, and
* at our option, to sublicense these same rights to third parties through
multiple levels of sublicensees or other licensing arrangements.
4. Except as set out above, you keep all right, title, and interest in your
contribution. The rights that you grant to us under these terms are effective
on the date you first submitted a contribution to us, even if your submission
took place before the date you sign these terms.
5. You covenant, represent, warrant and agree that:
* Each contribution that you submit is and shall be an original work of
authorship and you can legally grant the rights set out in this SCA;
* to the best of your knowledge, each contribution will not violate any
third party's copyrights, trademarks, patents, or other intellectual
property rights; and
* each contribution shall be in compliance with U.S. export control laws and
other applicable export and import laws. You agree to notify us if you
become aware of any circumstance which would make any of the foregoing
representations inaccurate in any respect. We may publicly disclose your
participation in the project, including the fact that you have signed the SCA.
6. This SCA is governed by the laws of the State of California and applicable
U.S. Federal law. Any choice of law rules will not apply.
7. Please place an “x” on one of the applicable statements below. Please do NOT
mark both statements:
* [x] I am signing on behalf of myself as an individual and no other person
or entity, including my employer, has or will have rights with respect to my
contributions.
* [ ] I am signing on behalf of my employer or a legal entity and I have the
actual authority to contractually bind that entity.
## Contributor Details
| Field | Entry |
|------------------------------- | -------------------- |
| Name | Boian Tzonev |
| Company name (if applicable) | |
| Title or role (if applicable) | |
| Date | 18.02.2021 |
| GitHub username | dardoria |
| Website (optional) | |

106
.github/contributors/dhruvrnaik.md vendored Normal file

@@ -0,0 +1,106 @@
# spaCy contributor agreement
This spaCy Contributor Agreement (**"SCA"**) is based on the
[Oracle Contributor Agreement](http://www.oracle.com/technetwork/oca-405177.pdf).
The SCA applies to any contribution that you make to any product or project
managed by us (the **"project"**), and sets out the intellectual property rights
you grant to us in the contributed materials. The term **"us"** shall mean
[ExplosionAI GmbH](https://explosion.ai/legal). The term
**"you"** shall mean the person or entity identified below.
If you agree to be bound by these terms, fill in the information requested
below and include the filled-in version with your first pull request, under the
folder [`.github/contributors/`](/.github/contributors/). The name of the file
should be your GitHub username, with the extension `.md`. For example, the user
example_user would create the file `.github/contributors/example_user.md`.
Read this agreement carefully before signing. These terms and conditions
constitute a binding legal agreement.
## Contributor Agreement
1. The term "contribution" or "contributed materials" means any source code,
object code, patch, tool, sample, graphic, specification, manual,
documentation, or any other material posted or submitted by you to the project.
2. With respect to any worldwide copyrights, or copyright applications and
registrations, in your contribution:
* you hereby assign to us joint ownership, and to the extent that such
assignment is or becomes invalid, ineffective or unenforceable, you hereby
grant to us a perpetual, irrevocable, non-exclusive, worldwide, no-charge,
royalty-free, unrestricted license to exercise all rights under those
copyrights. This includes, at our option, the right to sublicense these same
rights to third parties through multiple levels of sublicensees or other
licensing arrangements;
* you agree that each of us can do all things in relation to your
contribution as if each of us were the sole owners, and if one of us makes
a derivative work of your contribution, the one who makes the derivative
work (or has it made) will be the sole owner of that derivative work;
* you agree that you will not assert any moral rights in your contribution
against us, our licensees or transferees;
* you agree that we may register a copyright in your contribution and
exercise all ownership rights associated with it; and
* you agree that neither of us has any duty to consult with, obtain the
consent of, pay or render an accounting to the other for any use or
distribution of your contribution.
3. With respect to any patents you own, or that you can license without payment
to any third party, you hereby grant to us a perpetual, irrevocable,
non-exclusive, worldwide, no-charge, royalty-free license to:
* make, have made, use, sell, offer to sell, import, and otherwise transfer
your contribution in whole or in part, alone or in combination with or
included in any product, work or materials arising out of the project to
which your contribution was submitted, and
* at our option, to sublicense these same rights to third parties through
multiple levels of sublicensees or other licensing arrangements.
4. Except as set out above, you keep all right, title, and interest in your
contribution. The rights that you grant to us under these terms are effective
on the date you first submitted a contribution to us, even if your submission
took place before the date you sign these terms.
5. You covenant, represent, warrant and agree that:
* Each contribution that you submit is and shall be an original work of
authorship and you can legally grant the rights set out in this SCA;
* to the best of your knowledge, each contribution will not violate any
third party's copyrights, trademarks, patents, or other intellectual
property rights; and
* each contribution shall be in compliance with U.S. export control laws and
other applicable export and import laws. You agree to notify us if you
become aware of any circumstance which would make any of the foregoing
representations inaccurate in any respect. We may publicly disclose your
participation in the project, including the fact that you have signed the SCA.
6. This SCA is governed by the laws of the State of California and applicable
U.S. Federal law. Any choice of law rules will not apply.
7. Please place an “x” on one of the applicable statements below. Please do NOT
mark both statements:
* [x] I am signing on behalf of myself as an individual and no other person
or entity, including my employer, has or will have rights with respect to my
contributions.
* [ ] I am signing on behalf of my employer or a legal entity and I have the
actual authority to contractually bind that entity.
## Contributor Details
| Field | Entry |
|------------------------------- | -------------------- |
| Name | Dhruv Naik |
| Company name (if applicable) | |
| Title or role (if applicable) | |
| Date | 26-01-2021 |
| GitHub username | dhruvrnaik |
| Website (optional) | |

.github/contributors/ezorita.md vendored Normal file
@@ -0,0 +1,106 @@
# spaCy contributor agreement
This spaCy Contributor Agreement (**"SCA"**) is based on the
[Oracle Contributor Agreement](http://www.oracle.com/technetwork/oca-405177.pdf).
The SCA applies to any contribution that you make to any product or project
managed by us (the **"project"**), and sets out the intellectual property rights
you grant to us in the contributed materials. The term **"us"** shall mean
[ExplosionAI GmbH](https://explosion.ai/legal). The term
**"you"** shall mean the person or entity identified below.
If you agree to be bound by these terms, fill in the information requested
below and include the filled-in version with your first pull request, under the
folder [`.github/contributors/`](/.github/contributors/). The name of the file
should be your GitHub username, with the extension `.md`. For example, the user
example_user would create the file `.github/contributors/example_user.md`.
Read this agreement carefully before signing. These terms and conditions
constitute a binding legal agreement.
## Contributor Agreement
1. The term "contribution" or "contributed materials" means any source code,
object code, patch, tool, sample, graphic, specification, manual,
documentation, or any other material posted or submitted by you to the project.
2. With respect to any worldwide copyrights, or copyright applications and
registrations, in your contribution:
* you hereby assign to us joint ownership, and to the extent that such
assignment is or becomes invalid, ineffective or unenforceable, you hereby
grant to us a perpetual, irrevocable, non-exclusive, worldwide, no-charge,
royalty-free, unrestricted license to exercise all rights under those
copyrights. This includes, at our option, the right to sublicense these same
rights to third parties through multiple levels of sublicensees or other
licensing arrangements;
* you agree that each of us can do all things in relation to your
contribution as if each of us were the sole owners, and if one of us makes
a derivative work of your contribution, the one who makes the derivative
work (or has it made) will be the sole owner of that derivative work;
* you agree that you will not assert any moral rights in your contribution
against us, our licensees or transferees;
* you agree that we may register a copyright in your contribution and
exercise all ownership rights associated with it; and
* you agree that neither of us has any duty to consult with, obtain the
consent of, pay or render an accounting to the other for any use or
distribution of your contribution.
3. With respect to any patents you own, or that you can license without payment
to any third party, you hereby grant to us a perpetual, irrevocable,
non-exclusive, worldwide, no-charge, royalty-free license to:
* make, have made, use, sell, offer to sell, import, and otherwise transfer
your contribution in whole or in part, alone or in combination with or
included in any product, work or materials arising out of the project to
which your contribution was submitted, and
* at our option, to sublicense these same rights to third parties through
multiple levels of sublicensees or other licensing arrangements.
4. Except as set out above, you keep all right, title, and interest in your
contribution. The rights that you grant to us under these terms are effective
on the date you first submitted a contribution to us, even if your submission
took place before the date you sign these terms.
5. You covenant, represent, warrant and agree that:
* Each contribution that you submit is and shall be an original work of
authorship and you can legally grant the rights set out in this SCA;
* to the best of your knowledge, each contribution will not violate any
third party's copyrights, trademarks, patents, or other intellectual
property rights; and
* each contribution shall be in compliance with U.S. export control laws and
other applicable export and import laws. You agree to notify us if you
become aware of any circumstance which would make any of the foregoing
representations inaccurate in any respect. We may publicly disclose your
participation in the project, including the fact that you have signed the SCA.
6. This SCA is governed by the laws of the State of California and applicable
U.S. Federal law. Any choice of law rules will not apply.
7. Please place an “x” on one of the applicable statements below. Please do NOT
mark both statements:
* [x] I am signing on behalf of myself as an individual and no other person
or entity, including my employer, has or will have rights with respect to my
contributions.
* [ ] I am signing on behalf of my employer or a legal entity and I have the
actual authority to contractually bind that entity.
## Contributor Details
| Field | Entry |
|------------------------------- | -------------------- |
| Name | Eduard Zorita |
| Company name (if applicable) | |
| Title or role (if applicable) | |
| Date | 06/17/2021 |
| GitHub username | ezorita |
| Website (optional) | |

.github/contributors/fgaim.md vendored Normal file
@@ -0,0 +1,106 @@
# spaCy contributor agreement
This spaCy Contributor Agreement (**"SCA"**) is based on the
[Oracle Contributor Agreement](http://www.oracle.com/technetwork/oca-405177.pdf).
The SCA applies to any contribution that you make to any product or project
managed by us (the **"project"**), and sets out the intellectual property rights
you grant to us in the contributed materials. The term **"us"** shall mean
[ExplosionAI GmbH](https://explosion.ai/legal). The term
**"you"** shall mean the person or entity identified below.
If you agree to be bound by these terms, fill in the information requested
below and include the filled-in version with your first pull request, under the
folder [`.github/contributors/`](/.github/contributors/). The name of the file
should be your GitHub username, with the extension `.md`. For example, the user
example_user would create the file `.github/contributors/example_user.md`.
Read this agreement carefully before signing. These terms and conditions
constitute a binding legal agreement.
## Contributor Agreement
1. The term "contribution" or "contributed materials" means any source code,
object code, patch, tool, sample, graphic, specification, manual,
documentation, or any other material posted or submitted by you to the project.
2. With respect to any worldwide copyrights, or copyright applications and
registrations, in your contribution:
* you hereby assign to us joint ownership, and to the extent that such
assignment is or becomes invalid, ineffective or unenforceable, you hereby
grant to us a perpetual, irrevocable, non-exclusive, worldwide, no-charge,
royalty-free, unrestricted license to exercise all rights under those
copyrights. This includes, at our option, the right to sublicense these same
rights to third parties through multiple levels of sublicensees or other
licensing arrangements;
* you agree that each of us can do all things in relation to your
contribution as if each of us were the sole owners, and if one of us makes
a derivative work of your contribution, the one who makes the derivative
work (or has it made) will be the sole owner of that derivative work;
* you agree that you will not assert any moral rights in your contribution
against us, our licensees or transferees;
* you agree that we may register a copyright in your contribution and
exercise all ownership rights associated with it; and
* you agree that neither of us has any duty to consult with, obtain the
consent of, pay or render an accounting to the other for any use or
distribution of your contribution.
3. With respect to any patents you own, or that you can license without payment
to any third party, you hereby grant to us a perpetual, irrevocable,
non-exclusive, worldwide, no-charge, royalty-free license to:
* make, have made, use, sell, offer to sell, import, and otherwise transfer
your contribution in whole or in part, alone or in combination with or
included in any product, work or materials arising out of the project to
which your contribution was submitted, and
* at our option, to sublicense these same rights to third parties through
multiple levels of sublicensees or other licensing arrangements.
4. Except as set out above, you keep all right, title, and interest in your
contribution. The rights that you grant to us under these terms are effective
on the date you first submitted a contribution to us, even if your submission
took place before the date you sign these terms.
5. You covenant, represent, warrant and agree that:
* Each contribution that you submit is and shall be an original work of
authorship and you can legally grant the rights set out in this SCA;
* to the best of your knowledge, each contribution will not violate any
third party's copyrights, trademarks, patents, or other intellectual
property rights; and
* each contribution shall be in compliance with U.S. export control laws and
other applicable export and import laws. You agree to notify us if you
become aware of any circumstance which would make any of the foregoing
representations inaccurate in any respect. We may publicly disclose your
participation in the project, including the fact that you have signed the SCA.
6. This SCA is governed by the laws of the State of California and applicable
U.S. Federal law. Any choice of law rules will not apply.
7. Please place an “x” on one of the applicable statements below. Please do NOT
mark both statements:
* [x] I am signing on behalf of myself as an individual and no other person
or entity, including my employer, has or will have rights with respect to my
contributions.
* [ ] I am signing on behalf of my employer or a legal entity and I have the
actual authority to contractually bind that entity.
## Contributor Details
| Field | Entry |
|------------------------------- | -------------------- |
| Name | Fitsum Gaim |
| Company name (if applicable) | |
| Title or role (if applicable) | |
| Date | 2021-08-07 |
| GitHub username | fgaim |
| Website (optional) | |

.github/contributors/fonfonx.md vendored Normal file
@@ -0,0 +1,106 @@
# spaCy contributor agreement
This spaCy Contributor Agreement (**"SCA"**) is based on the
[Oracle Contributor Agreement](http://www.oracle.com/technetwork/oca-405177.pdf).
The SCA applies to any contribution that you make to any product or project
managed by us (the **"project"**), and sets out the intellectual property rights
you grant to us in the contributed materials. The term **"us"** shall mean
[ExplosionAI GmbH](https://explosion.ai/legal). The term
**"you"** shall mean the person or entity identified below.
If you agree to be bound by these terms, fill in the information requested
below and include the filled-in version with your first pull request, under the
folder [`.github/contributors/`](/.github/contributors/). The name of the file
should be your GitHub username, with the extension `.md`. For example, the user
example_user would create the file `.github/contributors/example_user.md`.
Read this agreement carefully before signing. These terms and conditions
constitute a binding legal agreement.
## Contributor Agreement
1. The term "contribution" or "contributed materials" means any source code,
object code, patch, tool, sample, graphic, specification, manual,
documentation, or any other material posted or submitted by you to the project.
2. With respect to any worldwide copyrights, or copyright applications and
registrations, in your contribution:
* you hereby assign to us joint ownership, and to the extent that such
assignment is or becomes invalid, ineffective or unenforceable, you hereby
grant to us a perpetual, irrevocable, non-exclusive, worldwide, no-charge,
royalty-free, unrestricted license to exercise all rights under those
copyrights. This includes, at our option, the right to sublicense these same
rights to third parties through multiple levels of sublicensees or other
licensing arrangements;
* you agree that each of us can do all things in relation to your
contribution as if each of us were the sole owners, and if one of us makes
a derivative work of your contribution, the one who makes the derivative
work (or has it made) will be the sole owner of that derivative work;
* you agree that you will not assert any moral rights in your contribution
against us, our licensees or transferees;
* you agree that we may register a copyright in your contribution and
exercise all ownership rights associated with it; and
* you agree that neither of us has any duty to consult with, obtain the
consent of, pay or render an accounting to the other for any use or
distribution of your contribution.
3. With respect to any patents you own, or that you can license without payment
to any third party, you hereby grant to us a perpetual, irrevocable,
non-exclusive, worldwide, no-charge, royalty-free license to:
* make, have made, use, sell, offer to sell, import, and otherwise transfer
your contribution in whole or in part, alone or in combination with or
included in any product, work or materials arising out of the project to
which your contribution was submitted, and
* at our option, to sublicense these same rights to third parties through
multiple levels of sublicensees or other licensing arrangements.
4. Except as set out above, you keep all right, title, and interest in your
contribution. The rights that you grant to us under these terms are effective
on the date you first submitted a contribution to us, even if your submission
took place before the date you sign these terms.
5. You covenant, represent, warrant and agree that:
* Each contribution that you submit is and shall be an original work of
authorship and you can legally grant the rights set out in this SCA;
* to the best of your knowledge, each contribution will not violate any
third party's copyrights, trademarks, patents, or other intellectual
property rights; and
* each contribution shall be in compliance with U.S. export control laws and
other applicable export and import laws. You agree to notify us if you
become aware of any circumstance which would make any of the foregoing
representations inaccurate in any respect. We may publicly disclose your
participation in the project, including the fact that you have signed the SCA.
6. This SCA is governed by the laws of the State of California and applicable
U.S. Federal law. Any choice of law rules will not apply.
7. Please place an “x” on one of the applicable statements below. Please do NOT
mark both statements:
* [x] I am signing on behalf of myself as an individual and no other person
or entity, including my employer, has or will have rights with respect to my
contributions.
* [ ] I am signing on behalf of my employer or a legal entity and I have the
actual authority to contractually bind that entity.
## Contributor Details
| Field | Entry |
|------------------------------- | -------------------- |
| Name | Xavier Fontaine |
| Company name (if applicable) | |
| Title or role (if applicable) | |
| Date | 2022-04-13 |
| GitHub username | fonfonx |
| Website (optional) | |

.github/contributors/gtoffoli.md vendored Normal file
@@ -0,0 +1,106 @@
# spaCy contributor agreement
This spaCy Contributor Agreement (**"SCA"**) is based on the
[Oracle Contributor Agreement](http://www.oracle.com/technetwork/oca-405177.pdf).
The SCA applies to any contribution that you make to any product or project
managed by us (the **"project"**), and sets out the intellectual property rights
you grant to us in the contributed materials. The term **"us"** shall mean
[ExplosionAI UG (haftungsbeschränkt)](https://explosion.ai/legal). The term
**"you"** shall mean the person or entity identified below.
If you agree to be bound by these terms, fill in the information requested
below and include the filled-in version with your first pull request, under the
folder [`.github/contributors/`](/.github/contributors/). The name of the file
should be your GitHub username, with the extension `.md`. For example, the user
example_user would create the file `.github/contributors/example_user.md`.
Read this agreement carefully before signing. These terms and conditions
constitute a binding legal agreement.
## Contributor Agreement
1. The term "contribution" or "contributed materials" means any source code,
object code, patch, tool, sample, graphic, specification, manual,
documentation, or any other material posted or submitted by you to the project.
2. With respect to any worldwide copyrights, or copyright applications and
registrations, in your contribution:
* you hereby assign to us joint ownership, and to the extent that such
assignment is or becomes invalid, ineffective or unenforceable, you hereby
grant to us a perpetual, irrevocable, non-exclusive, worldwide, no-charge,
royalty-free, unrestricted license to exercise all rights under those
copyrights. This includes, at our option, the right to sublicense these same
rights to third parties through multiple levels of sublicensees or other
licensing arrangements;
* you agree that each of us can do all things in relation to your
contribution as if each of us were the sole owners, and if one of us makes
a derivative work of your contribution, the one who makes the derivative
work (or has it made) will be the sole owner of that derivative work;
* you agree that you will not assert any moral rights in your contribution
against us, our licensees or transferees;
* you agree that we may register a copyright in your contribution and
exercise all ownership rights associated with it; and
* you agree that neither of us has any duty to consult with, obtain the
consent of, pay or render an accounting to the other for any use or
distribution of your contribution.
3. With respect to any patents you own, or that you can license without payment
to any third party, you hereby grant to us a perpetual, irrevocable,
non-exclusive, worldwide, no-charge, royalty-free license to:
* make, have made, use, sell, offer to sell, import, and otherwise transfer
your contribution in whole or in part, alone or in combination with or
included in any product, work or materials arising out of the project to
which your contribution was submitted, and
* at our option, to sublicense these same rights to third parties through
multiple levels of sublicensees or other licensing arrangements.
4. Except as set out above, you keep all right, title, and interest in your
contribution. The rights that you grant to us under these terms are effective
on the date you first submitted a contribution to us, even if your submission
took place before the date you sign these terms.
5. You covenant, represent, warrant and agree that:
* Each contribution that you submit is and shall be an original work of
authorship and you can legally grant the rights set out in this SCA;
* to the best of your knowledge, each contribution will not violate any
third party's copyrights, trademarks, patents, or other intellectual
property rights; and
* each contribution shall be in compliance with U.S. export control laws and
other applicable export and import laws. You agree to notify us if you
become aware of any circumstance which would make any of the foregoing
representations inaccurate in any respect. We may publicly disclose your
participation in the project, including the fact that you have signed the SCA.
6. This SCA is governed by the laws of the State of California and applicable
U.S. Federal law. Any choice of law rules will not apply.
7. Please place an “x” on one of the applicable statements below. Please do NOT
mark both statements:
* [x] I am signing on behalf of myself as an individual and no other person
or entity, including my employer, has or will have rights with respect to my
contributions.
* [ ] I am signing on behalf of my employer or a legal entity and I have the
actual authority to contractually bind that entity.
## Contributor Details
| Field | Entry |
|------------------------------- | ------------------------ |
| Name | Giovanni Toffoli |
| Company name (if applicable) | |
| Title or role (if applicable) | |
| Date | 2021-05-12 |
| GitHub username | gtoffoli |
| Website (optional) | |

.github/contributors/hlasse.md vendored Normal file
@@ -0,0 +1,106 @@
# spaCy contributor agreement
This spaCy Contributor Agreement (**"SCA"**) is based on the
[Oracle Contributor Agreement](http://www.oracle.com/technetwork/oca-405177.pdf).
The SCA applies to any contribution that you make to any product or project
managed by us (the **"project"**), and sets out the intellectual property rights
you grant to us in the contributed materials. The term **"us"** shall mean
[ExplosionAI GmbH](https://explosion.ai/legal). The term
**"you"** shall mean the person or entity identified below.
If you agree to be bound by these terms, fill in the information requested
below and include the filled-in version with your first pull request, under the
folder [`.github/contributors/`](/.github/contributors/). The name of the file
should be your GitHub username, with the extension `.md`. For example, the user
example_user would create the file `.github/contributors/example_user.md`.
Read this agreement carefully before signing. These terms and conditions
constitute a binding legal agreement.
## Contributor Agreement
1. The term "contribution" or "contributed materials" means any source code,
object code, patch, tool, sample, graphic, specification, manual,
documentation, or any other material posted or submitted by you to the project.
2. With respect to any worldwide copyrights, or copyright applications and
registrations, in your contribution:
* you hereby assign to us joint ownership, and to the extent that such
assignment is or becomes invalid, ineffective or unenforceable, you hereby
grant to us a perpetual, irrevocable, non-exclusive, worldwide, no-charge,
royalty-free, unrestricted license to exercise all rights under those
copyrights. This includes, at our option, the right to sublicense these same
rights to third parties through multiple levels of sublicensees or other
licensing arrangements;
* you agree that each of us can do all things in relation to your
contribution as if each of us were the sole owners, and if one of us makes
a derivative work of your contribution, the one who makes the derivative
work (or has it made) will be the sole owner of that derivative work;
* you agree that you will not assert any moral rights in your contribution
against us, our licensees or transferees;
* you agree that we may register a copyright in your contribution and
exercise all ownership rights associated with it; and
* you agree that neither of us has any duty to consult with, obtain the
consent of, pay or render an accounting to the other for any use or
distribution of your contribution.
3. With respect to any patents you own, or that you can license without payment
to any third party, you hereby grant to us a perpetual, irrevocable,
non-exclusive, worldwide, no-charge, royalty-free license to:
* make, have made, use, sell, offer to sell, import, and otherwise transfer
your contribution in whole or in part, alone or in combination with or
included in any product, work or materials arising out of the project to
which your contribution was submitted, and
* at our option, to sublicense these same rights to third parties through
multiple levels of sublicensees or other licensing arrangements.
4. Except as set out above, you keep all right, title, and interest in your
contribution. The rights that you grant to us under these terms are effective
on the date you first submitted a contribution to us, even if your submission
took place before the date you sign these terms.
5. You covenant, represent, warrant and agree that:
* Each contribution that you submit is and shall be an original work of
authorship and you can legally grant the rights set out in this SCA;
* to the best of your knowledge, each contribution will not violate any
third party's copyrights, trademarks, patents, or other intellectual
property rights; and
* each contribution shall be in compliance with U.S. export control laws and
other applicable export and import laws. You agree to notify us if you
become aware of any circumstance which would make any of the foregoing
representations inaccurate in any respect. We may publicly disclose your
participation in the project, including the fact that you have signed the SCA.
6. This SCA is governed by the laws of the State of California and applicable
U.S. Federal law. Any choice of law rules will not apply.
7. Please place an “x” on one of the applicable statements below. Please do NOT
mark both statements:
* [X] I am signing on behalf of myself as an individual and no other person
or entity, including my employer, has or will have rights with respect to my
contributions.
* [ ] I am signing on behalf of my employer or a legal entity and I have the
actual authority to contractually bind that entity.
## Contributor Details
| Field | Entry |
|------------------------------- | -------------------------- |
| Name | Lasse Hansen |
| Company name (if applicable) | |
| Title or role (if applicable) | |
| Date | 2021-08-11 |
| GitHub username | HLasse |
| Website (optional) | www.lassehansen.me |

.github/contributors/jankrepl.md vendored Normal file
@@ -0,0 +1,106 @@
# spaCy contributor agreement
This spaCy Contributor Agreement (**"SCA"**) is based on the
[Oracle Contributor Agreement](http://www.oracle.com/technetwork/oca-405177.pdf).
The SCA applies to any contribution that you make to any product or project
managed by us (the **"project"**), and sets out the intellectual property rights
you grant to us in the contributed materials. The term **"us"** shall mean
[ExplosionAI GmbH](https://explosion.ai/legal). The term
**"you"** shall mean the person or entity identified below.
If you agree to be bound by these terms, fill in the information requested
below and include the filled-in version with your first pull request, under the
folder [`.github/contributors/`](/.github/contributors/). The name of the file
should be your GitHub username, with the extension `.md`. For example, the user
example_user would create the file `.github/contributors/example_user.md`.
Read this agreement carefully before signing. These terms and conditions
constitute a binding legal agreement.
## Contributor Agreement
1. The term "contribution" or "contributed materials" means any source code,
object code, patch, tool, sample, graphic, specification, manual,
documentation, or any other material posted or submitted by you to the project.
2. With respect to any worldwide copyrights, or copyright applications and
registrations, in your contribution:
* you hereby assign to us joint ownership, and to the extent that such
assignment is or becomes invalid, ineffective or unenforceable, you hereby
grant to us a perpetual, irrevocable, non-exclusive, worldwide, no-charge,
royalty-free, unrestricted license to exercise all rights under those
copyrights. This includes, at our option, the right to sublicense these same
rights to third parties through multiple levels of sublicensees or other
licensing arrangements;
* you agree that each of us can do all things in relation to your
contribution as if each of us were the sole owners, and if one of us makes
a derivative work of your contribution, the one who makes the derivative
work (or has it made) will be the sole owner of that derivative work;
* you agree that you will not assert any moral rights in your contribution
against us, our licensees or transferees;
* you agree that we may register a copyright in your contribution and
exercise all ownership rights associated with it; and
* you agree that neither of us has any duty to consult with, obtain the
consent of, pay or render an accounting to the other for any use or
distribution of your contribution.
3. With respect to any patents you own, or that you can license without payment
to any third party, you hereby grant to us a perpetual, irrevocable,
non-exclusive, worldwide, no-charge, royalty-free license to:
* make, have made, use, sell, offer to sell, import, and otherwise transfer
your contribution in whole or in part, alone or in combination with or
included in any product, work or materials arising out of the project to
which your contribution was submitted, and
* at our option, to sublicense these same rights to third parties through
multiple levels of sublicensees or other licensing arrangements.
4. Except as set out above, you keep all right, title, and interest in your
contribution. The rights that you grant to us under these terms are effective
on the date you first submitted a contribution to us, even if your submission
took place before the date you sign these terms.
5. You covenant, represent, warrant and agree that:
* Each contribution that you submit is and shall be an original work of
authorship and you can legally grant the rights set out in this SCA;
* to the best of your knowledge, each contribution will not violate any
third party's copyrights, trademarks, patents, or other intellectual
property rights; and
* each contribution shall be in compliance with U.S. export control laws and
other applicable export and import laws. You agree to notify us if you
become aware of any circumstance which would make any of the foregoing
representations inaccurate in any respect. We may publicly disclose your
participation in the project, including the fact that you have signed the SCA.
6. This SCA is governed by the laws of the State of California and applicable
U.S. Federal law. Any choice of law rules will not apply.
7. Please place an “x” on one of the applicable statements below. Please do NOT
mark both statements:
* [x] I am signing on behalf of myself as an individual and no other person
or entity, including my employer, has or will have rights with respect to my
contributions.
* [ ] I am signing on behalf of my employer or a legal entity and I have the
actual authority to contractually bind that entity.
## Contributor Details
| Field | Entry |
|------------------------------- | -------------------- |
| Name | Jan Krepl |
| Company name (if applicable) | |
| Title or role (if applicable) | |
| Date | 2021-03-09 |
| GitHub username | jankrepl |
| Website (optional) | |

.github/contributors/jklaise.md vendored Normal file
@@ -0,0 +1,106 @@
# spaCy contributor agreement
This spaCy Contributor Agreement (**"SCA"**) is based on the
[Oracle Contributor Agreement](http://www.oracle.com/technetwork/oca-405177.pdf).
The SCA applies to any contribution that you make to any product or project
managed by us (the **"project"**), and sets out the intellectual property rights
you grant to us in the contributed materials. The term **"us"** shall mean
[ExplosionAI GmbH](https://explosion.ai/legal). The term
**"you"** shall mean the person or entity identified below.
If you agree to be bound by these terms, fill in the information requested
below and include the filled-in version with your first pull request, under the
folder [`.github/contributors/`](/.github/contributors/). The name of the file
should be your GitHub username, with the extension `.md`. For example, the user
example_user would create the file `.github/contributors/example_user.md`.
Read this agreement carefully before signing. These terms and conditions
constitute a binding legal agreement.
## Contributor Agreement
1. The term "contribution" or "contributed materials" means any source code,
object code, patch, tool, sample, graphic, specification, manual,
documentation, or any other material posted or submitted by you to the project.
2. With respect to any worldwide copyrights, or copyright applications and
registrations, in your contribution:
* you hereby assign to us joint ownership, and to the extent that such
assignment is or becomes invalid, ineffective or unenforceable, you hereby
grant to us a perpetual, irrevocable, non-exclusive, worldwide, no-charge,
royalty-free, unrestricted license to exercise all rights under those
copyrights. This includes, at our option, the right to sublicense these same
rights to third parties through multiple levels of sublicensees or other
licensing arrangements;
* you agree that each of us can do all things in relation to your
contribution as if each of us were the sole owners, and if one of us makes
a derivative work of your contribution, the one who makes the derivative
work (or has it made) will be the sole owner of that derivative work;
* you agree that you will not assert any moral rights in your contribution
against us, our licensees or transferees;
* you agree that we may register a copyright in your contribution and
exercise all ownership rights associated with it; and
* you agree that neither of us has any duty to consult with, obtain the
consent of, pay or render an accounting to the other for any use or
distribution of your contribution.
3. With respect to any patents you own, or that you can license without payment
to any third party, you hereby grant to us a perpetual, irrevocable,
non-exclusive, worldwide, no-charge, royalty-free license to:
* make, have made, use, sell, offer to sell, import, and otherwise transfer
your contribution in whole or in part, alone or in combination with or
included in any product, work or materials arising out of the project to
which your contribution was submitted, and
* at our option, to sublicense these same rights to third parties through
multiple levels of sublicensees or other licensing arrangements.
4. Except as set out above, you keep all right, title, and interest in your
contribution. The rights that you grant to us under these terms are effective
on the date you first submitted a contribution to us, even if your submission
took place before the date you sign these terms.
5. You covenant, represent, warrant and agree that:
* Each contribution that you submit is and shall be an original work of
authorship and you can legally grant the rights set out in this SCA;
* to the best of your knowledge, each contribution will not violate any
third party's copyrights, trademarks, patents, or other intellectual
property rights; and
* each contribution shall be in compliance with U.S. export control laws and
other applicable export and import laws. You agree to notify us if you
become aware of any circumstance which would make any of the foregoing
representations inaccurate in any respect. We may publicly disclose your
participation in the project, including the fact that you have signed the SCA.
6. This SCA is governed by the laws of the State of California and applicable
U.S. Federal law. Any choice of law rules will not apply.
7. Please place an “x” on one of the applicable statements below. Please do NOT
mark both statements:
* [x] I am signing on behalf of myself as an individual and no other person
or entity, including my employer, has or will have rights with respect to my
contributions.
* [ ] I am signing on behalf of my employer or a legal entity and I have the
actual authority to contractually bind that entity.
## Contributor Details
| Field | Entry |
|------------------------------- | -------------------- |
| Name | Janis Klaise |
| Company name (if applicable) | |
| Title or role (if applicable) | |
| Date | 26/04/2021 |
| GitHub username | jklaise |
| Website (optional) | janisklaise.com |

.github/contributors/jmargeta.md vendored Normal file
@@ -0,0 +1,106 @@
# spaCy contributor agreement
This spaCy Contributor Agreement (**"SCA"**) is based on the
[Oracle Contributor Agreement](http://www.oracle.com/technetwork/oca-405177.pdf).
The SCA applies to any contribution that you make to any product or project
managed by us (the **"project"**), and sets out the intellectual property rights
you grant to us in the contributed materials. The term **"us"** shall mean
[ExplosionAI GmbH](https://explosion.ai/legal). The term
**"you"** shall mean the person or entity identified below.
If you agree to be bound by these terms, fill in the information requested
below and include the filled-in version with your first pull request, under the
folder [`.github/contributors/`](/.github/contributors/). The name of the file
should be your GitHub username, with the extension `.md`. For example, the user
example_user would create the file `.github/contributors/example_user.md`.
Read this agreement carefully before signing. These terms and conditions
constitute a binding legal agreement.
## Contributor Agreement
1. The term "contribution" or "contributed materials" means any source code,
object code, patch, tool, sample, graphic, specification, manual,
documentation, or any other material posted or submitted by you to the project.
2. With respect to any worldwide copyrights, or copyright applications and
registrations, in your contribution:
* you hereby assign to us joint ownership, and to the extent that such
assignment is or becomes invalid, ineffective or unenforceable, you hereby
grant to us a perpetual, irrevocable, non-exclusive, worldwide, no-charge,
royalty-free, unrestricted license to exercise all rights under those
copyrights. This includes, at our option, the right to sublicense these same
rights to third parties through multiple levels of sublicensees or other
licensing arrangements;
* you agree that each of us can do all things in relation to your
contribution as if each of us were the sole owners, and if one of us makes
a derivative work of your contribution, the one who makes the derivative
work (or has it made) will be the sole owner of that derivative work;
* you agree that you will not assert any moral rights in your contribution
against us, our licensees or transferees;
* you agree that we may register a copyright in your contribution and
exercise all ownership rights associated with it; and
* you agree that neither of us has any duty to consult with, obtain the
consent of, pay or render an accounting to the other for any use or
distribution of your contribution.
3. With respect to any patents you own, or that you can license without payment
to any third party, you hereby grant to us a perpetual, irrevocable,
non-exclusive, worldwide, no-charge, royalty-free license to:
* make, have made, use, sell, offer to sell, import, and otherwise transfer
your contribution in whole or in part, alone or in combination with or
included in any product, work or materials arising out of the project to
which your contribution was submitted, and
* at our option, to sublicense these same rights to third parties through
multiple levels of sublicensees or other licensing arrangements.
4. Except as set out above, you keep all right, title, and interest in your
contribution. The rights that you grant to us under these terms are effective
on the date you first submitted a contribution to us, even if your submission
took place before the date you sign these terms.
5. You covenant, represent, warrant and agree that:
* Each contribution that you submit is and shall be an original work of
authorship and you can legally grant the rights set out in this SCA;
* to the best of your knowledge, each contribution will not violate any
third party's copyrights, trademarks, patents, or other intellectual
property rights; and
* each contribution shall be in compliance with U.S. export control laws and
other applicable export and import laws. You agree to notify us if you
become aware of any circumstance which would make any of the foregoing
representations inaccurate in any respect. We may publicly disclose your
participation in the project, including the fact that you have signed the SCA.
6. This SCA is governed by the laws of the State of California and applicable
U.S. Federal law. Any choice of law rules will not apply.
7. Please place an “x” on one of the applicable statements below. Please do NOT
mark both statements:
* [x] I am signing on behalf of myself as an individual and no other person
or entity, including my employer, has or will have rights with respect to my
contributions.
* [ ] I am signing on behalf of my employer or a legal entity and I have the
actual authority to contractually bind that entity.
## Contributor Details
| Field | Entry |
|------------------------------- | -------------------- |
| Name | Jan Margeta |
| Company name (if applicable) | KardioMe |
| Title or role (if applicable) | Founder |
| Date | 2020-10-16 |
| GitHub username | jmargeta |
| Website (optional) | kardio.me |

.github/contributors/jmyerston.md vendored Normal file
@@ -0,0 +1,106 @@
# spaCy contributor agreement
This spaCy Contributor Agreement (**"SCA"**) is based on the
[Oracle Contributor Agreement](http://www.oracle.com/technetwork/oca-405177.pdf).
The SCA applies to any contribution that you make to any product or project
managed by us (the **"project"**), and sets out the intellectual property rights
you grant to us in the contributed materials. The term **"us"** shall mean
[ExplosionAI GmbH](https://explosion.ai/legal). The term
**"you"** shall mean the person or entity identified below.
If you agree to be bound by these terms, fill in the information requested
below and include the filled-in version with your first pull request, under the
folder [`.github/contributors/`](/.github/contributors/). The name of the file
should be your GitHub username, with the extension `.md`. For example, the user
example_user would create the file `.github/contributors/example_user.md`.
Read this agreement carefully before signing. These terms and conditions
constitute a binding legal agreement.
## Contributor Agreement
1. The term "contribution" or "contributed materials" means any source code,
object code, patch, tool, sample, graphic, specification, manual,
documentation, or any other material posted or submitted by you to the project.
2. With respect to any worldwide copyrights, or copyright applications and
registrations, in your contribution:
* you hereby assign to us joint ownership, and to the extent that such
assignment is or becomes invalid, ineffective or unenforceable, you hereby
grant to us a perpetual, irrevocable, non-exclusive, worldwide, no-charge,
royalty-free, unrestricted license to exercise all rights under those
copyrights. This includes, at our option, the right to sublicense these same
rights to third parties through multiple levels of sublicensees or other
licensing arrangements;
* you agree that each of us can do all things in relation to your
contribution as if each of us were the sole owners, and if one of us makes
a derivative work of your contribution, the one who makes the derivative
work (or has it made) will be the sole owner of that derivative work;
* you agree that you will not assert any moral rights in your contribution
against us, our licensees or transferees;
* you agree that we may register a copyright in your contribution and
exercise all ownership rights associated with it; and
* you agree that neither of us has any duty to consult with, obtain the
consent of, pay or render an accounting to the other for any use or
distribution of your contribution.
3. With respect to any patents you own, or that you can license without payment
to any third party, you hereby grant to us a perpetual, irrevocable,
non-exclusive, worldwide, no-charge, royalty-free license to:
* make, have made, use, sell, offer to sell, import, and otherwise transfer
your contribution in whole or in part, alone or in combination with or
included in any product, work or materials arising out of the project to
which your contribution was submitted, and
* at our option, to sublicense these same rights to third parties through
multiple levels of sublicensees or other licensing arrangements.
4. Except as set out above, you keep all right, title, and interest in your
contribution. The rights that you grant to us under these terms are effective
on the date you first submitted a contribution to us, even if your submission
took place before the date you sign these terms.
5. You covenant, represent, warrant and agree that:
* Each contribution that you submit is and shall be an original work of
authorship and you can legally grant the rights set out in this SCA;
* to the best of your knowledge, each contribution will not violate any
third party's copyrights, trademarks, patents, or other intellectual
property rights; and
* each contribution shall be in compliance with U.S. export control laws and
other applicable export and import laws. You agree to notify us if you
become aware of any circumstance which would make any of the foregoing
representations inaccurate in any respect. We may publicly disclose your
participation in the project, including the fact that you have signed the SCA.
6. This SCA is governed by the laws of the State of California and applicable
U.S. Federal law. Any choice of law rules will not apply.
7. Please place an “x” on one of the applicable statements below. Please do NOT
mark both statements:
* [x] I am signing on behalf of myself as an individual and no other person
or entity, including my employer, has or will have rights with respect to my
contributions.
* [ ] I am signing on behalf of my employer or a legal entity and I have the
actual authority to contractually bind that entity.
## Contributor Details
| Field | Entry |
| ----------------------------- | ----------------------------------- |
| Name | Jacobo Myerston |
| Company name (if applicable) | University of California, San Diego |
| Title or role (if applicable) | Academic |
| Date | 07/05/2021 |
| GitHub username | jmyerston |
| Website (optional) | diogenet.ucsd.edu |

.github/contributors/julien-talkair.md vendored Normal file
@@ -0,0 +1,106 @@
# spaCy contributor agreement
This spaCy Contributor Agreement (**"SCA"**) is based on the
[Oracle Contributor Agreement](http://www.oracle.com/technetwork/oca-405177.pdf).
The SCA applies to any contribution that you make to any product or project
managed by us (the **"project"**), and sets out the intellectual property rights
you grant to us in the contributed materials. The term **"us"** shall mean
[ExplosionAI GmbH](https://explosion.ai/legal). The term
**"you"** shall mean the person or entity identified below.
If you agree to be bound by these terms, fill in the information requested
below and include the filled-in version with your first pull request, under the
folder [`.github/contributors/`](/.github/contributors/). The name of the file
should be your GitHub username, with the extension `.md`. For example, the user
example_user would create the file `.github/contributors/example_user.md`.
Read this agreement carefully before signing. These terms and conditions
constitute a binding legal agreement.
## Contributor Agreement
1. The term "contribution" or "contributed materials" means any source code,
object code, patch, tool, sample, graphic, specification, manual,
documentation, or any other material posted or submitted by you to the project.
2. With respect to any worldwide copyrights, or copyright applications and
registrations, in your contribution:
* you hereby assign to us joint ownership, and to the extent that such
assignment is or becomes invalid, ineffective or unenforceable, you hereby
grant to us a perpetual, irrevocable, non-exclusive, worldwide, no-charge,
royalty-free, unrestricted license to exercise all rights under those
copyrights. This includes, at our option, the right to sublicense these same
rights to third parties through multiple levels of sublicensees or other
licensing arrangements;
* you agree that each of us can do all things in relation to your
contribution as if each of us were the sole owners, and if one of us makes
a derivative work of your contribution, the one who makes the derivative
work (or has it made) will be the sole owner of that derivative work;
* you agree that you will not assert any moral rights in your contribution
against us, our licensees or transferees;
* you agree that we may register a copyright in your contribution and
exercise all ownership rights associated with it; and
* you agree that neither of us has any duty to consult with, obtain the
consent of, pay or render an accounting to the other for any use or
distribution of your contribution.
3. With respect to any patents you own, or that you can license without payment
to any third party, you hereby grant to us a perpetual, irrevocable,
non-exclusive, worldwide, no-charge, royalty-free license to:
* make, have made, use, sell, offer to sell, import, and otherwise transfer
your contribution in whole or in part, alone or in combination with or
included in any product, work or materials arising out of the project to
which your contribution was submitted, and
* at our option, to sublicense these same rights to third parties through
multiple levels of sublicensees or other licensing arrangements.
4. Except as set out above, you keep all right, title, and interest in your
contribution. The rights that you grant to us under these terms are effective
on the date you first submitted a contribution to us, even if your submission
took place before the date you sign these terms.
5. You covenant, represent, warrant and agree that:
* Each contribution that you submit is and shall be an original work of
authorship and you can legally grant the rights set out in this SCA;
* to the best of your knowledge, each contribution will not violate any
third party's copyrights, trademarks, patents, or other intellectual
property rights; and
* each contribution shall be in compliance with U.S. export control laws and
other applicable export and import laws. You agree to notify us if you
become aware of any circumstance which would make any of the foregoing
representations inaccurate in any respect. We may publicly disclose your
participation in the project, including the fact that you have signed the SCA.
6. This SCA is governed by the laws of the State of California and applicable
U.S. Federal law. Any choice of law rules will not apply.
7. Please place an “x” on one of the applicable statements below. Please do NOT
mark both statements:
* [ ] I am signing on behalf of myself as an individual and no other person
or entity, including my employer, has or will have rights with respect to my
contributions.
* [x] I am signing on behalf of my employer or a legal entity and I have the
actual authority to contractually bind that entity.
## Contributor Details
| Field | Entry |
|------------------------------- | -------------------- |
| Name | Julien Rossi |
| Company name (if applicable) | TalkAir BV |
| Title or role (if applicable) | CTO, Partner |
| Date | June 28 2021 |
| GitHub username | julien-talkair |
| Website (optional) | |

.github/contributors/juliensalinas.md vendored Normal file
@@ -0,0 +1,106 @@
# spaCy contributor agreement
This spaCy Contributor Agreement (**"SCA"**) is based on the
[Oracle Contributor Agreement](http://www.oracle.com/technetwork/oca-405177.pdf).
The SCA applies to any contribution that you make to any product or project
managed by us (the **"project"**), and sets out the intellectual property rights
you grant to us in the contributed materials. The term **"us"** shall mean
[ExplosionAI GmbH](https://explosion.ai/legal). The term
**"you"** shall mean the person or entity identified below.
If you agree to be bound by these terms, fill in the information requested
below and include the filled-in version with your first pull request, under the
folder [`.github/contributors/`](/.github/contributors/). The name of the file
should be your GitHub username, with the extension `.md`. For example, the user
example_user would create the file `.github/contributors/example_user.md`.
Read this agreement carefully before signing. These terms and conditions
constitute a binding legal agreement.
## Contributor Agreement
1. The term "contribution" or "contributed materials" means any source code,
object code, patch, tool, sample, graphic, specification, manual,
documentation, or any other material posted or submitted by you to the project.
2. With respect to any worldwide copyrights, or copyright applications and
registrations, in your contribution:
* you hereby assign to us joint ownership, and to the extent that such
assignment is or becomes invalid, ineffective or unenforceable, you hereby
grant to us a perpetual, irrevocable, non-exclusive, worldwide, no-charge,
royalty-free, unrestricted license to exercise all rights under those
copyrights. This includes, at our option, the right to sublicense these same
rights to third parties through multiple levels of sublicensees or other
licensing arrangements;
* you agree that each of us can do all things in relation to your
contribution as if each of us were the sole owners, and if one of us makes
a derivative work of your contribution, the one who makes the derivative
work (or has it made) will be the sole owner of that derivative work;
* you agree that you will not assert any moral rights in your contribution
against us, our licensees or transferees;
* you agree that we may register a copyright in your contribution and
exercise all ownership rights associated with it; and
* you agree that neither of us has any duty to consult with, obtain the
consent of, pay or render an accounting to the other for any use or
distribution of your contribution.
3. With respect to any patents you own, or that you can license without payment
to any third party, you hereby grant to us a perpetual, irrevocable,
non-exclusive, worldwide, no-charge, royalty-free license to:
* make, have made, use, sell, offer to sell, import, and otherwise transfer
your contribution in whole or in part, alone or in combination with or
included in any product, work or materials arising out of the project to
which your contribution was submitted, and
* at our option, to sublicense these same rights to third parties through
multiple levels of sublicensees or other licensing arrangements.
4. Except as set out above, you keep all right, title, and interest in your
contribution. The rights that you grant to us under these terms are effective
on the date you first submitted a contribution to us, even if your submission
took place before the date you sign these terms.
5. You covenant, represent, warrant and agree that:
* Each contribution that you submit is and shall be an original work of
authorship and you can legally grant the rights set out in this SCA;
* to the best of your knowledge, each contribution will not violate any
third party's copyrights, trademarks, patents, or other intellectual
property rights; and
* each contribution shall be in compliance with U.S. export control laws and
other applicable export and import laws. You agree to notify us if you
become aware of any circumstance which would make any of the foregoing
representations inaccurate in any respect. We may publicly disclose your
participation in the project, including the fact that you have signed the SCA.
6. This SCA is governed by the laws of the State of California and applicable
U.S. Federal law. Any choice of law rules will not apply.
7. Please place an “x” on one of the applicable statements below. Please do NOT
mark both statements:
* [X] I am signing on behalf of myself as an individual and no other person
or entity, including my employer, has or will have rights with respect to my
contributions.
* [ ] I am signing on behalf of my employer or a legal entity and I have the
actual authority to contractually bind that entity.
## Contributor Details
| Field | Entry |
| ----------------------------- | ------------------- |
| Name | Julien Salinas |
| Company name (if applicable) | NLP Cloud |
| Title or role (if applicable) | Founder and CTO |
| Date | May 14th 2021 |
| GitHub username | juliensalinas |
| Website (optional) | https://nlpcloud.io |

.github/contributors/keshav.md vendored Normal file

@@ -0,0 +1,106 @@
# spaCy contributor agreement
This spaCy Contributor Agreement (**"SCA"**) is based on the
[Oracle Contributor Agreement](http://www.oracle.com/technetwork/oca-405177.pdf).
The SCA applies to any contribution that you make to any product or project
managed by us (the **"project"**), and sets out the intellectual property rights
you grant to us in the contributed materials. The term **"us"** shall mean
[ExplosionAI UG (haftungsbeschränkt)](https://explosion.ai/legal). The term
**"you"** shall mean the person or entity identified below.
If you agree to be bound by these terms, fill in the information requested
below and include the filled-in version with your first pull request, under the
folder [`.github/contributors/`](/.github/contributors/). The name of the file
should be your GitHub username, with the extension `.md`. For example, the user
example_user would create the file `.github/contributors/example_user.md`.
Read this agreement carefully before signing. These terms and conditions
constitute a binding legal agreement.
## Contributor Agreement
1. The term "contribution" or "contributed materials" means any source code,
object code, patch, tool, sample, graphic, specification, manual,
documentation, or any other material posted or submitted by you to the project.
2. With respect to any worldwide copyrights, or copyright applications and
registrations, in your contribution:
* you hereby assign to us joint ownership, and to the extent that such
assignment is or becomes invalid, ineffective or unenforceable, you hereby
grant to us a perpetual, irrevocable, non-exclusive, worldwide, no-charge,
royalty-free, unrestricted license to exercise all rights under those
copyrights. This includes, at our option, the right to sublicense these same
rights to third parties through multiple levels of sublicensees or other
licensing arrangements;
* you agree that each of us can do all things in relation to your
contribution as if each of us were the sole owners, and if one of us makes
a derivative work of your contribution, the one who makes the derivative
work (or has it made) will be the sole owner of that derivative work;
* you agree that you will not assert any moral rights in your contribution
against us, our licensees or transferees;
* you agree that we may register a copyright in your contribution and
exercise all ownership rights associated with it; and
* you agree that neither of us has any duty to consult with, obtain the
consent of, pay or render an accounting to the other for any use or
distribution of your contribution.
3. With respect to any patents you own, or that you can license without payment
to any third party, you hereby grant to us a perpetual, irrevocable,
non-exclusive, worldwide, no-charge, royalty-free license to:
* make, have made, use, sell, offer to sell, import, and otherwise transfer
your contribution in whole or in part, alone or in combination with or
included in any product, work or materials arising out of the project to
which your contribution was submitted, and
* at our option, to sublicense these same rights to third parties through
multiple levels of sublicensees or other licensing arrangements.
4. Except as set out above, you keep all right, title, and interest in your
contribution. The rights that you grant to us under these terms are effective
on the date you first submitted a contribution to us, even if your submission
took place before the date you sign these terms.
5. You covenant, represent, warrant and agree that:
* Each contribution that you submit is and shall be an original work of
authorship and you can legally grant the rights set out in this SCA;
* to the best of your knowledge, each contribution will not violate any
third party's copyrights, trademarks, patents, or other intellectual
property rights; and
* each contribution shall be in compliance with U.S. export control laws and
other applicable export and import laws. You agree to notify us if you
become aware of any circumstance which would make any of the foregoing
representations inaccurate in any respect. We may publicly disclose your
participation in the project, including the fact that you have signed the SCA.
6. This SCA is governed by the laws of the State of California and applicable
U.S. Federal law. Any choice of law rules will not apply.
7. Please place an “x” on one of the applicable statements below. Please do NOT
mark both statements:
* [x] I am signing on behalf of myself as an individual and no other person
or entity, including my employer, has or will have rights with respect to my
contributions.
* [ ] I am signing on behalf of my employer or a legal entity and I have the
actual authority to contractually bind that entity.
## Contributor Details
| Field | Entry |
|------------------------------- | -------------------- |
| Name | Keshav Garg |
| Company name (if applicable) | |
| Title or role (if applicable) | |
| Date | Jan 22, 2021 |
| GitHub username | KeshavG-lb |
| Website (optional) | |

.github/contributors/mariosasko.md vendored Normal file

@@ -0,0 +1,106 @@
# spaCy contributor agreement
This spaCy Contributor Agreement (**"SCA"**) is based on the
[Oracle Contributor Agreement](http://www.oracle.com/technetwork/oca-405177.pdf).
The SCA applies to any contribution that you make to any product or project
managed by us (the **"project"**), and sets out the intellectual property rights
you grant to us in the contributed materials. The term **"us"** shall mean
[ExplosionAI GmbH](https://explosion.ai/legal). The term
**"you"** shall mean the person or entity identified below.
If you agree to be bound by these terms, fill in the information requested
below and include the filled-in version with your first pull request, under the
folder [`.github/contributors/`](/.github/contributors/). The name of the file
should be your GitHub username, with the extension `.md`. For example, the user
example_user would create the file `.github/contributors/example_user.md`.
Read this agreement carefully before signing. These terms and conditions
constitute a binding legal agreement.
## Contributor Agreement
1. The term "contribution" or "contributed materials" means any source code,
object code, patch, tool, sample, graphic, specification, manual,
documentation, or any other material posted or submitted by you to the project.
2. With respect to any worldwide copyrights, or copyright applications and
registrations, in your contribution:
* you hereby assign to us joint ownership, and to the extent that such
assignment is or becomes invalid, ineffective or unenforceable, you hereby
grant to us a perpetual, irrevocable, non-exclusive, worldwide, no-charge,
royalty-free, unrestricted license to exercise all rights under those
copyrights. This includes, at our option, the right to sublicense these same
rights to third parties through multiple levels of sublicensees or other
licensing arrangements;
* you agree that each of us can do all things in relation to your
contribution as if each of us were the sole owners, and if one of us makes
a derivative work of your contribution, the one who makes the derivative
work (or has it made) will be the sole owner of that derivative work;
* you agree that you will not assert any moral rights in your contribution
against us, our licensees or transferees;
* you agree that we may register a copyright in your contribution and
exercise all ownership rights associated with it; and
* you agree that neither of us has any duty to consult with, obtain the
consent of, pay or render an accounting to the other for any use or
distribution of your contribution.
3. With respect to any patents you own, or that you can license without payment
to any third party, you hereby grant to us a perpetual, irrevocable,
non-exclusive, worldwide, no-charge, royalty-free license to:
* make, have made, use, sell, offer to sell, import, and otherwise transfer
your contribution in whole or in part, alone or in combination with or
included in any product, work or materials arising out of the project to
which your contribution was submitted, and
* at our option, to sublicense these same rights to third parties through
multiple levels of sublicensees or other licensing arrangements.
4. Except as set out above, you keep all right, title, and interest in your
contribution. The rights that you grant to us under these terms are effective
on the date you first submitted a contribution to us, even if your submission
took place before the date you sign these terms.
5. You covenant, represent, warrant and agree that:
* Each contribution that you submit is and shall be an original work of
authorship and you can legally grant the rights set out in this SCA;
* to the best of your knowledge, each contribution will not violate any
third party's copyrights, trademarks, patents, or other intellectual
property rights; and
* each contribution shall be in compliance with U.S. export control laws and
other applicable export and import laws. You agree to notify us if you
become aware of any circumstance which would make any of the foregoing
representations inaccurate in any respect. We may publicly disclose your
participation in the project, including the fact that you have signed the SCA.
6. This SCA is governed by the laws of the State of California and applicable
U.S. Federal law. Any choice of law rules will not apply.
7. Please place an “x” on one of the applicable statements below. Please do NOT
mark both statements:
* [ ] I am signing on behalf of myself as an individual and no other person
or entity, including my employer, has or will have rights with respect to my
contributions.
* [x] I am signing on behalf of my employer or a legal entity and I have the
actual authority to contractually bind that entity.
## Contributor Details
| Field | Entry |
|------------------------------- | -------------------- |
| Name | Mario Šaško |
| Company name (if applicable) | TakeLab FER |
| Title or role (if applicable) | R&D Intern |
| Date | 2021-07-12 |
| GitHub username | mariosasko |
| Website (optional) | |

.github/contributors/meghanabhange.md vendored Normal file

@@ -0,0 +1,107 @@
# spaCy contributor agreement
This spaCy Contributor Agreement (**"SCA"**) is based on the
[Oracle Contributor Agreement](http://www.oracle.com/technetwork/oca-405177.pdf).
The SCA applies to any contribution that you make to any product or project
managed by us (the **"project"**), and sets out the intellectual property rights
you grant to us in the contributed materials. The term **"us"** shall mean
[ExplosionAI UG (haftungsbeschränkt)](https://explosion.ai/legal). The term
**"you"** shall mean the person or entity identified below.
If you agree to be bound by these terms, fill in the information requested
below and include the filled-in version with your first pull request, under the
folder [`.github/contributors/`](/.github/contributors/). The name of the file
should be your GitHub username, with the extension `.md`. For example, the user
example_user would create the file `.github/contributors/example_user.md`.
Read this agreement carefully before signing. These terms and conditions
constitute a binding legal agreement.
## Contributor Agreement
1. The term "contribution" or "contributed materials" means any source code,
object code, patch, tool, sample, graphic, specification, manual,
documentation, or any other material posted or submitted by you to the project.
2. With respect to any worldwide copyrights, or copyright applications and
registrations, in your contribution:
* you hereby assign to us joint ownership, and to the extent that such
assignment is or becomes invalid, ineffective or unenforceable, you hereby
grant to us a perpetual, irrevocable, non-exclusive, worldwide, no-charge,
royalty-free, unrestricted license to exercise all rights under those
copyrights. This includes, at our option, the right to sublicense these same
rights to third parties through multiple levels of sublicensees or other
licensing arrangements;
* you agree that each of us can do all things in relation to your
contribution as if each of us were the sole owners, and if one of us makes
a derivative work of your contribution, the one who makes the derivative
work (or has it made) will be the sole owner of that derivative work;
* you agree that you will not assert any moral rights in your contribution
against us, our licensees or transferees;
* you agree that we may register a copyright in your contribution and
exercise all ownership rights associated with it; and
* you agree that neither of us has any duty to consult with, obtain the
consent of, pay or render an accounting to the other for any use or
distribution of your contribution.
3. With respect to any patents you own, or that you can license without payment
to any third party, you hereby grant to us a perpetual, irrevocable,
non-exclusive, worldwide, no-charge, royalty-free license to:
* make, have made, use, sell, offer to sell, import, and otherwise transfer
your contribution in whole or in part, alone or in combination with or
included in any product, work or materials arising out of the project to
which your contribution was submitted, and
* at our option, to sublicense these same rights to third parties through
multiple levels of sublicensees or other licensing arrangements.
4. Except as set out above, you keep all right, title, and interest in your
contribution. The rights that you grant to us under these terms are effective
on the date you first submitted a contribution to us, even if your submission
took place before the date you sign these terms.
5. You covenant, represent, warrant and agree that:
* Each contribution that you submit is and shall be an original work of
authorship and you can legally grant the rights set out in this SCA;
* to the best of your knowledge, each contribution will not violate any
third party's copyrights, trademarks, patents, or other intellectual
property rights; and
* each contribution shall be in compliance with U.S. export control laws and
other applicable export and import laws. You agree to notify us if you
become aware of any circumstance which would make any of the foregoing
representations inaccurate in any respect. We may publicly disclose your
participation in the project, including the fact that you have signed the SCA.
6. This SCA is governed by the laws of the State of California and applicable
U.S. Federal law. Any choice of law rules will not apply.
7. Please place an “x” on one of the applicable statements below. Please do NOT
mark both statements:
* [x] I am signing on behalf of myself as an individual and no other person
or entity, including my employer, has or will have rights with respect to my
contributions.
* [ ] I am signing on behalf of my employer or a legal entity and I have the
actual authority to contractually bind that entity.
## Contributor Details
| Field | Entry |
|------------------------------- | ------------------------ |
| Name | Meghana Bhange |
| Company name (if applicable) | Verloop.io |
| Title or role (if applicable) | ML Engineer |
| Date | 2020-04-21 |
| GitHub username | meghanabhange |
| Website (optional) | https://meghana.blog |

.github/contributors/narayanacharya6.md vendored Normal file

@@ -0,0 +1,106 @@
# spaCy contributor agreement
This spaCy Contributor Agreement (**"SCA"**) is based on the
[Oracle Contributor Agreement](http://www.oracle.com/technetwork/oca-405177.pdf).
The SCA applies to any contribution that you make to any product or project
managed by us (the **"project"**), and sets out the intellectual property rights
you grant to us in the contributed materials. The term **"us"** shall mean
[ExplosionAI GmbH](https://explosion.ai/legal). The term
**"you"** shall mean the person or entity identified below.
If you agree to be bound by these terms, fill in the information requested
below and include the filled-in version with your first pull request, under the
folder [`.github/contributors/`](/.github/contributors/). The name of the file
should be your GitHub username, with the extension `.md`. For example, the user
example_user would create the file `.github/contributors/example_user.md`.
Read this agreement carefully before signing. These terms and conditions
constitute a binding legal agreement.
## Contributor Agreement
1. The term "contribution" or "contributed materials" means any source code,
object code, patch, tool, sample, graphic, specification, manual,
documentation, or any other material posted or submitted by you to the project.
2. With respect to any worldwide copyrights, or copyright applications and
registrations, in your contribution:
* you hereby assign to us joint ownership, and to the extent that such
assignment is or becomes invalid, ineffective or unenforceable, you hereby
grant to us a perpetual, irrevocable, non-exclusive, worldwide, no-charge,
royalty-free, unrestricted license to exercise all rights under those
copyrights. This includes, at our option, the right to sublicense these same
rights to third parties through multiple levels of sublicensees or other
licensing arrangements;
* you agree that each of us can do all things in relation to your
contribution as if each of us were the sole owners, and if one of us makes
a derivative work of your contribution, the one who makes the derivative
work (or has it made) will be the sole owner of that derivative work;
* you agree that you will not assert any moral rights in your contribution
against us, our licensees or transferees;
* you agree that we may register a copyright in your contribution and
exercise all ownership rights associated with it; and
* you agree that neither of us has any duty to consult with, obtain the
consent of, pay or render an accounting to the other for any use or
distribution of your contribution.
3. With respect to any patents you own, or that you can license without payment
to any third party, you hereby grant to us a perpetual, irrevocable,
non-exclusive, worldwide, no-charge, royalty-free license to:
* make, have made, use, sell, offer to sell, import, and otherwise transfer
your contribution in whole or in part, alone or in combination with or
included in any product, work or materials arising out of the project to
which your contribution was submitted, and
* at our option, to sublicense these same rights to third parties through
multiple levels of sublicensees or other licensing arrangements.
4. Except as set out above, you keep all right, title, and interest in your
contribution. The rights that you grant to us under these terms are effective
on the date you first submitted a contribution to us, even if your submission
took place before the date you sign these terms.
5. You covenant, represent, warrant and agree that:
* Each contribution that you submit is and shall be an original work of
authorship and you can legally grant the rights set out in this SCA;
* to the best of your knowledge, each contribution will not violate any
third party's copyrights, trademarks, patents, or other intellectual
property rights; and
* each contribution shall be in compliance with U.S. export control laws and
other applicable export and import laws. You agree to notify us if you
become aware of any circumstance which would make any of the foregoing
representations inaccurate in any respect. We may publicly disclose your
participation in the project, including the fact that you have signed the SCA.
6. This SCA is governed by the laws of the State of California and applicable
U.S. Federal law. Any choice of law rules will not apply.
7. Please place an “x” on one of the applicable statements below. Please do NOT
mark both statements:
* [x] I am signing on behalf of myself as an individual and no other person
or entity, including my employer, has or will have rights with respect to my
contributions.
* [ ] I am signing on behalf of my employer or a legal entity and I have the
actual authority to contractually bind that entity.
## Contributor Details
| Field | Entry |
|------------------------------- | -------------------- |
| Name | Narayan Acharya |
| Company name (if applicable) | |
| Title or role (if applicable) | |
| Date | 29 APR 2021 |
| GitHub username | narayanacharya6 |
| Website (optional) | narayanacharya.com |

.github/contributors/nsorros.md vendored Normal file

@@ -0,0 +1,106 @@
# spaCy contributor agreement
This spaCy Contributor Agreement (**"SCA"**) is based on the
[Oracle Contributor Agreement](http://www.oracle.com/technetwork/oca-405177.pdf).
The SCA applies to any contribution that you make to any product or project
managed by us (the **"project"**), and sets out the intellectual property rights
you grant to us in the contributed materials. The term **"us"** shall mean
[ExplosionAI GmbH](https://explosion.ai/legal). The term
**"you"** shall mean the person or entity identified below.
If you agree to be bound by these terms, fill in the information requested
below and include the filled-in version with your first pull request, under the
folder [`.github/contributors/`](/.github/contributors/). The name of the file
should be your GitHub username, with the extension `.md`. For example, the user
example_user would create the file `.github/contributors/example_user.md`.
Read this agreement carefully before signing. These terms and conditions
constitute a binding legal agreement.
## Contributor Agreement
1. The term "contribution" or "contributed materials" means any source code,
object code, patch, tool, sample, graphic, specification, manual,
documentation, or any other material posted or submitted by you to the project.
2. With respect to any worldwide copyrights, or copyright applications and
registrations, in your contribution:
* you hereby assign to us joint ownership, and to the extent that such
assignment is or becomes invalid, ineffective or unenforceable, you hereby
grant to us a perpetual, irrevocable, non-exclusive, worldwide, no-charge,
royalty-free, unrestricted license to exercise all rights under those
copyrights. This includes, at our option, the right to sublicense these same
rights to third parties through multiple levels of sublicensees or other
licensing arrangements;
* you agree that each of us can do all things in relation to your
contribution as if each of us were the sole owners, and if one of us makes
a derivative work of your contribution, the one who makes the derivative
work (or has it made) will be the sole owner of that derivative work;
* you agree that you will not assert any moral rights in your contribution
against us, our licensees or transferees;
* you agree that we may register a copyright in your contribution and
exercise all ownership rights associated with it; and
* you agree that neither of us has any duty to consult with, obtain the
consent of, pay or render an accounting to the other for any use or
distribution of your contribution.
3. With respect to any patents you own, or that you can license without payment
to any third party, you hereby grant to us a perpetual, irrevocable,
non-exclusive, worldwide, no-charge, royalty-free license to:
* make, have made, use, sell, offer to sell, import, and otherwise transfer
your contribution in whole or in part, alone or in combination with or
included in any product, work or materials arising out of the project to
which your contribution was submitted, and
* at our option, to sublicense these same rights to third parties through
multiple levels of sublicensees or other licensing arrangements.
4. Except as set out above, you keep all right, title, and interest in your
contribution. The rights that you grant to us under these terms are effective
on the date you first submitted a contribution to us, even if your submission
took place before the date you sign these terms.
5. You covenant, represent, warrant and agree that:
* Each contribution that you submit is and shall be an original work of
authorship and you can legally grant the rights set out in this SCA;
* to the best of your knowledge, each contribution will not violate any
third party's copyrights, trademarks, patents, or other intellectual
property rights; and
* each contribution shall be in compliance with U.S. export control laws and
other applicable export and import laws. You agree to notify us if you
become aware of any circumstance which would make any of the foregoing
representations inaccurate in any respect. We may publicly disclose your
participation in the project, including the fact that you have signed the SCA.
6. This SCA is governed by the laws of the State of California and applicable
U.S. Federal law. Any choice of law rules will not apply.
7. Please place an “x” on one of the applicable statements below. Please do NOT
mark both statements:
* [x] I am signing on behalf of myself as an individual and no other person
or entity, including my employer, has or will have rights with respect to my
contributions.
* [ ] I am signing on behalf of my employer or a legal entity and I have the
actual authority to contractually bind that entity.
## Contributor Details
| Field | Entry |
|------------------------------- | -------------------- |
| Name | Nick Sorros |
| Company name (if applicable) | |
| Title or role (if applicable) | |
| Date | 2/8/2021 |
| GitHub username | nsorros |
| Website (optional) | |

.github/contributors/peter-exos.md vendored Normal file

@@ -0,0 +1,106 @@
# spaCy contributor agreement
This spaCy Contributor Agreement (**"SCA"**) is based on the
[Oracle Contributor Agreement](http://www.oracle.com/technetwork/oca-405177.pdf).
The SCA applies to any contribution that you make to any product or project
managed by us (the **"project"**), and sets out the intellectual property rights
you grant to us in the contributed materials. The term **"us"** shall mean
[ExplosionAI GmbH](https://explosion.ai/legal). The term
**"you"** shall mean the person or entity identified below.
If you agree to be bound by these terms, fill in the information requested
below and include the filled-in version with your first pull request, under the
folder [`.github/contributors/`](/.github/contributors/). The name of the file
should be your GitHub username, with the extension `.md`. For example, the user
example_user would create the file `.github/contributors/example_user.md`.
Read this agreement carefully before signing. These terms and conditions
constitute a binding legal agreement.
## Contributor Agreement
1. The term "contribution" or "contributed materials" means any source code,
object code, patch, tool, sample, graphic, specification, manual,
documentation, or any other material posted or submitted by you to the project.
2. With respect to any worldwide copyrights, or copyright applications and
registrations, in your contribution:
* you hereby assign to us joint ownership, and to the extent that such
assignment is or becomes invalid, ineffective or unenforceable, you hereby
grant to us a perpetual, irrevocable, non-exclusive, worldwide, no-charge,
royalty-free, unrestricted license to exercise all rights under those
copyrights. This includes, at our option, the right to sublicense these same
rights to third parties through multiple levels of sublicensees or other
licensing arrangements;
* you agree that each of us can do all things in relation to your
contribution as if each of us were the sole owners, and if one of us makes
a derivative work of your contribution, the one who makes the derivative
work (or has it made) will be the sole owner of that derivative work;
* you agree that you will not assert any moral rights in your contribution
against us, our licensees or transferees;
* you agree that we may register a copyright in your contribution and
exercise all ownership rights associated with it; and
* you agree that neither of us has any duty to consult with, obtain the
consent of, pay or render an accounting to the other for any use or
distribution of your contribution.
3. With respect to any patents you own, or that you can license without payment
to any third party, you hereby grant to us a perpetual, irrevocable,
non-exclusive, worldwide, no-charge, royalty-free license to:
* make, have made, use, sell, offer to sell, import, and otherwise transfer
your contribution in whole or in part, alone or in combination with or
included in any product, work or materials arising out of the project to
which your contribution was submitted, and
* at our option, to sublicense these same rights to third parties through
multiple levels of sublicensees or other licensing arrangements.
4. Except as set out above, you keep all right, title, and interest in your
contribution. The rights that you grant to us under these terms are effective
on the date you first submitted a contribution to us, even if your submission
took place before the date you sign these terms.
5. You covenant, represent, warrant and agree that:
* Each contribution that you submit is and shall be an original work of
authorship and you can legally grant the rights set out in this SCA;
* to the best of your knowledge, each contribution will not violate any
third party's copyrights, trademarks, patents, or other intellectual
property rights; and
* each contribution shall be in compliance with U.S. export control laws and
other applicable export and import laws. You agree to notify us if you
become aware of any circumstance which would make any of the foregoing
representations inaccurate in any respect. We may publicly disclose your
participation in the project, including the fact that you have signed the SCA.
6. This SCA is governed by the laws of the State of California and applicable
U.S. Federal law. Any choice of law rules will not apply.
7. Please place an “x” on one of the applicable statements below. Please do NOT
mark both statements:
* [ ] I am signing on behalf of myself as an individual and no other person
or entity, including my employer, has or will have rights with respect to my
contributions.
* [x] I am signing on behalf of my employer or a legal entity and I have the
actual authority to contractually bind that entity.
## Contributor Details
| Field | Entry |
|------------------------------- | -------------------- |
| Name | Peter Baumann |
| Company name (if applicable) | Exos Financial |
| Title or role (if applicable) | data scientist |
| Date | Feb 1st, 2021 |
| GitHub username | peter-exos |
| Website (optional) | |

.github/contributors/philipvollet.md vendored Normal file

@@ -0,0 +1,106 @@
# spaCy contributor agreement
This spaCy Contributor Agreement (**"SCA"**) is based on the
[Oracle Contributor Agreement](http://www.oracle.com/technetwork/oca-405177.pdf).
The SCA applies to any contribution that you make to any product or project
managed by us (the **"project"**), and sets out the intellectual property rights
you grant to us in the contributed materials. The term **"us"** shall mean
[ExplosionAI GmbH](https://explosion.ai/legal). The term
**"you"** shall mean the person or entity identified below.
If you agree to be bound by these terms, fill in the information requested
below and include the filled-in version with your first pull request, under the
folder [`.github/contributors/`](/.github/contributors/). The name of the file
should be your GitHub username, with the extension `.md`. For example, the user
example_user would create the file `.github/contributors/example_user.md`.
Read this agreement carefully before signing. These terms and conditions
constitute a binding legal agreement.
## Contributor Agreement
1. The term "contribution" or "contributed materials" means any source code,
object code, patch, tool, sample, graphic, specification, manual,
documentation, or any other material posted or submitted by you to the project.
2. With respect to any worldwide copyrights, or copyright applications and
registrations, in your contribution:
* you hereby assign to us joint ownership, and to the extent that such
assignment is or becomes invalid, ineffective or unenforceable, you hereby
grant to us a perpetual, irrevocable, non-exclusive, worldwide, no-charge,
royalty-free, unrestricted license to exercise all rights under those
copyrights. This includes, at our option, the right to sublicense these same
rights to third parties through multiple levels of sublicensees or other
licensing arrangements;
* you agree that each of us can do all things in relation to your
contribution as if each of us were the sole owners, and if one of us makes
a derivative work of your contribution, the one who makes the derivative
work (or has it made) will be the sole owner of that derivative work;
* you agree that you will not assert any moral rights in your contribution
against us, our licensees or transferees;
* you agree that we may register a copyright in your contribution and
exercise all ownership rights associated with it; and
* you agree that neither of us has any duty to consult with, obtain the
consent of, pay or render an accounting to the other for any use or
distribution of your contribution.
3. With respect to any patents you own, or that you can license without payment
to any third party, you hereby grant to us a perpetual, irrevocable,
non-exclusive, worldwide, no-charge, royalty-free license to:
* make, have made, use, sell, offer to sell, import, and otherwise transfer
your contribution in whole or in part, alone or in combination with or
included in any product, work or materials arising out of the project to
which your contribution was submitted, and
* at our option, to sublicense these same rights to third parties through
multiple levels of sublicensees or other licensing arrangements.
4. Except as set out above, you keep all right, title, and interest in your
contribution. The rights that you grant to us under these terms are effective
on the date you first submitted a contribution to us, even if your submission
took place before the date you sign these terms.
5. You covenant, represent, warrant and agree that:
* Each contribution that you submit is and shall be an original work of
authorship and you can legally grant the rights set out in this SCA;
* to the best of your knowledge, each contribution will not violate any
third party's copyrights, trademarks, patents, or other intellectual
property rights; and
* each contribution shall be in compliance with U.S. export control laws and
other applicable export and import laws. You agree to notify us if you
become aware of any circumstance which would make any of the foregoing
representations inaccurate in any respect. We may publicly disclose your
participation in the project, including the fact that you have signed the SCA.
6. This SCA is governed by the laws of the State of California and applicable
U.S. Federal law. Any choice of law rules will not apply.
7. Please place an “x” on one of the applicable statements below. Please do NOT
mark both statements:
* [x] I am signing on behalf of myself as an individual and no other person
or entity, including my employer, has or will have rights with respect to my
contributions.
* [ ] I am signing on behalf of my employer or a legal entity and I have the
actual authority to contractually bind that entity.
## Contributor Details
| Field | Entry |
|------------------------------- | -------------------- |
| Name | Philip Vollet |
| Company name (if applicable) | |
| Title or role (if applicable) | |
| Date | 22.09.2021 |
| GitHub username | philipvollet |
| Website (optional) | |

.github/contributors/plison.md vendored Normal file

@@ -0,0 +1,106 @@
# spaCy contributor agreement
This spaCy Contributor Agreement (**"SCA"**) is based on the
[Oracle Contributor Agreement](http://www.oracle.com/technetwork/oca-405177.pdf).
The SCA applies to any contribution that you make to any product or project
managed by us (the **"project"**), and sets out the intellectual property rights
you grant to us in the contributed materials. The term **"us"** shall mean
[ExplosionAI GmbH](https://explosion.ai/legal). The term
**"you"** shall mean the person or entity identified below.
If you agree to be bound by these terms, fill in the information requested
below and include the filled-in version with your first pull request, under the
folder [`.github/contributors/`](/.github/contributors/). The name of the file
should be your GitHub username, with the extension `.md`. For example, the user
example_user would create the file `.github/contributors/example_user.md`.
Read this agreement carefully before signing. These terms and conditions
constitute a binding legal agreement.
## Contributor Agreement
1. The term "contribution" or "contributed materials" means any source code,
object code, patch, tool, sample, graphic, specification, manual,
documentation, or any other material posted or submitted by you to the project.
2. With respect to any worldwide copyrights, or copyright applications and
registrations, in your contribution:
* you hereby assign to us joint ownership, and to the extent that such
assignment is or becomes invalid, ineffective or unenforceable, you hereby
grant to us a perpetual, irrevocable, non-exclusive, worldwide, no-charge,
royalty-free, unrestricted license to exercise all rights under those
copyrights. This includes, at our option, the right to sublicense these same
rights to third parties through multiple levels of sublicensees or other
licensing arrangements;
* you agree that each of us can do all things in relation to your
contribution as if each of us were the sole owners, and if one of us makes
a derivative work of your contribution, the one who makes the derivative
work (or has it made) will be the sole owner of that derivative work;
* you agree that you will not assert any moral rights in your contribution
against us, our licensees or transferees;
* you agree that we may register a copyright in your contribution and
exercise all ownership rights associated with it; and
* you agree that neither of us has any duty to consult with, obtain the
consent of, pay or render an accounting to the other for any use or
distribution of your contribution.
3. With respect to any patents you own, or that you can license without payment
to any third party, you hereby grant to us a perpetual, irrevocable,
non-exclusive, worldwide, no-charge, royalty-free license to:
* make, have made, use, sell, offer to sell, import, and otherwise transfer
your contribution in whole or in part, alone or in combination with or
included in any product, work or materials arising out of the project to
which your contribution was submitted, and
* at our option, to sublicense these same rights to third parties through
multiple levels of sublicensees or other licensing arrangements.
4. Except as set out above, you keep all right, title, and interest in your
contribution. The rights that you grant to us under these terms are effective
on the date you first submitted a contribution to us, even if your submission
took place before the date you sign these terms.
5. You covenant, represent, warrant and agree that:
* Each contribution that you submit is and shall be an original work of
authorship and you can legally grant the rights set out in this SCA;
* to the best of your knowledge, each contribution will not violate any
third party's copyrights, trademarks, patents, or other intellectual
property rights; and
* each contribution shall be in compliance with U.S. export control laws and
other applicable export and import laws. You agree to notify us if you
become aware of any circumstance which would make any of the foregoing
representations inaccurate in any respect. We may publicly disclose your
participation in the project, including the fact that you have signed the SCA.
6. This SCA is governed by the laws of the State of California and applicable
U.S. Federal law. Any choice of law rules will not apply.
7. Please place an “x” on one of the applicable statements below. Please do NOT
mark both statements:
* [ ] I am signing on behalf of myself as an individual and no other person
or entity, including my employer, has or will have rights with respect to my
contributions.
* [ ] I am signing on behalf of my employer or a legal entity and I have the
actual authority to contractually bind that entity.
## Contributor Details
| Field | Entry |
|------------------------------- | -------------------- |
| Name | Pierre Lison |
| Company name (if applicable) | Norsk Regnesentral |
| Title or role (if applicable) | Senior Researcher |
| Date | 22.04.2021 |
| GitHub username | plison |
| Website (optional) | www.nr.no/~plison |

.github/contributors/reneoctavio.md vendored Normal file

@@ -0,0 +1,106 @@
# spaCy contributor agreement
This spaCy Contributor Agreement (**"SCA"**) is based on the
[Oracle Contributor Agreement](http://www.oracle.com/technetwork/oca-405177.pdf).
The SCA applies to any contribution that you make to any product or project
managed by us (the **"project"**), and sets out the intellectual property rights
you grant to us in the contributed materials. The term **"us"** shall mean
[ExplosionAI GmbH](https://explosion.ai/legal). The term
**"you"** shall mean the person or entity identified below.
If you agree to be bound by these terms, fill in the information requested
below and include the filled-in version with your first pull request, under the
folder [`.github/contributors/`](/.github/contributors/). The name of the file
should be your GitHub username, with the extension `.md`. For example, the user
example_user would create the file `.github/contributors/example_user.md`.
Read this agreement carefully before signing. These terms and conditions
constitute a binding legal agreement.
## Contributor Agreement
1. The term "contribution" or "contributed materials" means any source code,
object code, patch, tool, sample, graphic, specification, manual,
documentation, or any other material posted or submitted by you to the project.
2. With respect to any worldwide copyrights, or copyright applications and
registrations, in your contribution:
* you hereby assign to us joint ownership, and to the extent that such
assignment is or becomes invalid, ineffective or unenforceable, you hereby
grant to us a perpetual, irrevocable, non-exclusive, worldwide, no-charge,
royalty-free, unrestricted license to exercise all rights under those
copyrights. This includes, at our option, the right to sublicense these same
rights to third parties through multiple levels of sublicensees or other
licensing arrangements;
* you agree that each of us can do all things in relation to your
contribution as if each of us were the sole owners, and if one of us makes
a derivative work of your contribution, the one who makes the derivative
work (or has it made) will be the sole owner of that derivative work;
* you agree that you will not assert any moral rights in your contribution
against us, our licensees or transferees;
* you agree that we may register a copyright in your contribution and
exercise all ownership rights associated with it; and
* you agree that neither of us has any duty to consult with, obtain the
consent of, pay or render an accounting to the other for any use or
distribution of your contribution.
3. With respect to any patents you own, or that you can license without payment
to any third party, you hereby grant to us a perpetual, irrevocable,
non-exclusive, worldwide, no-charge, royalty-free license to:
* make, have made, use, sell, offer to sell, import, and otherwise transfer
your contribution in whole or in part, alone or in combination with or
included in any product, work or materials arising out of the project to
which your contribution was submitted, and
* at our option, to sublicense these same rights to third parties through
multiple levels of sublicensees or other licensing arrangements.
4. Except as set out above, you keep all right, title, and interest in your
contribution. The rights that you grant to us under these terms are effective
on the date you first submitted a contribution to us, even if your submission
took place before the date you sign these terms.
5. You covenant, represent, warrant and agree that:
- Each contribution that you submit is and shall be an original work of
authorship and you can legally grant the rights set out in this SCA;
- to the best of your knowledge, each contribution will not violate any
third party's copyrights, trademarks, patents, or other intellectual
property rights; and
- each contribution shall be in compliance with U.S. export control laws and
other applicable export and import laws. You agree to notify us if you
become aware of any circumstance which would make any of the foregoing
representations inaccurate in any respect. We may publicly disclose your
participation in the project, including the fact that you have signed the SCA.
6. This SCA is governed by the laws of the State of California and applicable
U.S. Federal law. Any choice of law rules will not apply.
7. Please place an “x” on one of the applicable statements below. Please do NOT
mark both statements:
* [x] I am signing on behalf of myself as an individual and no other person
or entity, including my employer, has or will have rights with respect to my
contributions.
* [ ] I am signing on behalf of my employer or a legal entity and I have the
actual authority to contractually bind that entity.
## Contributor Details
| Field | Entry |
| ----------------------------- | -------------------- |
| Name | Rene Octavio Q. Dias |
| Company name (if applicable) | |
| Title or role (if applicable) | |
| Date | 2020-02-03 |
| GitHub username | reneoctavio |
| Website (optional) | |

.github/contributors/sevdimali.md vendored Normal file

@@ -0,0 +1,106 @@
# spaCy contributor agreement
This spaCy Contributor Agreement (**"SCA"**) is based on the
[Oracle Contributor Agreement](http://www.oracle.com/technetwork/oca-405177.pdf).
The SCA applies to any contribution that you make to any product or project
managed by us (the **"project"**), and sets out the intellectual property rights
you grant to us in the contributed materials. The term **"us"** shall mean
[ExplosionAI UG (haftungsbeschränkt)](https://explosion.ai/legal). The term
**"you"** shall mean the person or entity identified below.
If you agree to be bound by these terms, fill in the information requested
below and include the filled-in version with your first pull request, under the
folder [`.github/contributors/`](/.github/contributors/). The name of the file
should be your GitHub username, with the extension `.md`. For example, the user
example_user would create the file `.github/contributors/example_user.md`.
Read this agreement carefully before signing. These terms and conditions
constitute a binding legal agreement.
## Contributor Agreement
1. The term "contribution" or "contributed materials" means any source code,
object code, patch, tool, sample, graphic, specification, manual,
documentation, or any other material posted or submitted by you to the project.
2. With respect to any worldwide copyrights, or copyright applications and
registrations, in your contribution:
* you hereby assign to us joint ownership, and to the extent that such
assignment is or becomes invalid, ineffective or unenforceable, you hereby
grant to us a perpetual, irrevocable, non-exclusive, worldwide, no-charge,
royalty-free, unrestricted license to exercise all rights under those
copyrights. This includes, at our option, the right to sublicense these same
rights to third parties through multiple levels of sublicensees or other
licensing arrangements;
* you agree that each of us can do all things in relation to your
contribution as if each of us were the sole owners, and if one of us makes
a derivative work of your contribution, the one who makes the derivative
work (or has it made) will be the sole owner of that derivative work;
* you agree that you will not assert any moral rights in your contribution
against us, our licensees or transferees;
* you agree that we may register a copyright in your contribution and
exercise all ownership rights associated with it; and
* you agree that neither of us has any duty to consult with, obtain the
consent of, pay or render an accounting to the other for any use or
distribution of your contribution.
3. With respect to any patents you own, or that you can license without payment
to any third party, you hereby grant to us a perpetual, irrevocable,
non-exclusive, worldwide, no-charge, royalty-free license to:
* make, have made, use, sell, offer to sell, import, and otherwise transfer
your contribution in whole or in part, alone or in combination with or
included in any product, work or materials arising out of the project to
which your contribution was submitted, and
* at our option, to sublicense these same rights to third parties through
multiple levels of sublicensees or other licensing arrangements.
4. Except as set out above, you keep all right, title, and interest in your
contribution. The rights that you grant to us under these terms are effective
on the date you first submitted a contribution to us, even if your submission
took place before the date you sign these terms.
5. You covenant, represent, warrant and agree that:
* Each contribution that you submit is and shall be an original work of
authorship and you can legally grant the rights set out in this SCA;
* to the best of your knowledge, each contribution will not violate any
third party's copyrights, trademarks, patents, or other intellectual
property rights; and
* each contribution shall be in compliance with U.S. export control laws and
other applicable export and import laws. You agree to notify us if you
become aware of any circumstance which would make any of the foregoing
representations inaccurate in any respect. We may publicly disclose your
participation in the project, including the fact that you have signed the SCA.
6. This SCA is governed by the laws of the State of California and applicable
U.S. Federal law. Any choice of law rules will not apply.
7. Please place an “x” on one of the applicable statements below. Please do NOT
mark both statements:
* [x] I am signing on behalf of myself as an individual and no other person
or entity, including my employer, has or will have rights with respect to my
contributions.
* [ ] I am signing on behalf of my employer or a legal entity and I have the
actual authority to contractually bind that entity.
## Contributor Details
| Field | Entry |
|------------------------------- | -------------------- |
| Name | Sevdimali |
| Company name (if applicable) | |
| Title or role (if applicable) | |
| Date | 10/4/2021 |
| GitHub username | sevdimali |
| Website (optional) | https://sevdimali.me |

.github/contributors/shigapov.md vendored Normal file

@@ -0,0 +1,106 @@
# spaCy contributor agreement
This spaCy Contributor Agreement (**"SCA"**) is based on the
[Oracle Contributor Agreement](http://www.oracle.com/technetwork/oca-405177.pdf).
The SCA applies to any contribution that you make to any product or project
managed by us (the **"project"**), and sets out the intellectual property rights
you grant to us in the contributed materials. The term **"us"** shall mean
[ExplosionAI GmbH](https://explosion.ai/legal). The term
**"you"** shall mean the person or entity identified below.
If you agree to be bound by these terms, fill in the information requested
below and include the filled-in version with your first pull request, under the
folder [`.github/contributors/`](/.github/contributors/). The name of the file
should be your GitHub username, with the extension `.md`. For example, the user
example_user would create the file `.github/contributors/example_user.md`.
Read this agreement carefully before signing. These terms and conditions
constitute a binding legal agreement.
## Contributor Agreement
1. The term "contribution" or "contributed materials" means any source code,
object code, patch, tool, sample, graphic, specification, manual,
documentation, or any other material posted or submitted by you to the project.
2. With respect to any worldwide copyrights, or copyright applications and
registrations, in your contribution:
* you hereby assign to us joint ownership, and to the extent that such
assignment is or becomes invalid, ineffective or unenforceable, you hereby
grant to us a perpetual, irrevocable, non-exclusive, worldwide, no-charge,
royalty-free, unrestricted license to exercise all rights under those
copyrights. This includes, at our option, the right to sublicense these same
rights to third parties through multiple levels of sublicensees or other
licensing arrangements;
* you agree that each of us can do all things in relation to your
contribution as if each of us were the sole owners, and if one of us makes
a derivative work of your contribution, the one who makes the derivative
work (or has it made) will be the sole owner of that derivative work;
* you agree that you will not assert any moral rights in your contribution
against us, our licensees or transferees;
* you agree that we may register a copyright in your contribution and
exercise all ownership rights associated with it; and
* you agree that neither of us has any duty to consult with, obtain the
consent of, pay or render an accounting to the other for any use or
distribution of your contribution.
3. With respect to any patents you own, or that you can license without payment
to any third party, you hereby grant to us a perpetual, irrevocable,
non-exclusive, worldwide, no-charge, royalty-free license to:
* make, have made, use, sell, offer to sell, import, and otherwise transfer
your contribution in whole or in part, alone or in combination with or
included in any product, work or materials arising out of the project to
which your contribution was submitted, and
* at our option, to sublicense these same rights to third parties through
multiple levels of sublicensees or other licensing arrangements.
4. Except as set out above, you keep all right, title, and interest in your
contribution. The rights that you grant to us under these terms are effective
on the date you first submitted a contribution to us, even if your submission
took place before the date you sign these terms.
5. You covenant, represent, warrant and agree that:
* Each contribution that you submit is and shall be an original work of
authorship and you can legally grant the rights set out in this SCA;
* to the best of your knowledge, each contribution will not violate any
third party's copyrights, trademarks, patents, or other intellectual
property rights; and
* each contribution shall be in compliance with U.S. export control laws and
other applicable export and import laws. You agree to notify us if you
become aware of any circumstance which would make any of the foregoing
representations inaccurate in any respect. We may publicly disclose your
participation in the project, including the fact that you have signed the SCA.
6. This SCA is governed by the laws of the State of California and applicable
U.S. Federal law. Any choice of law rules will not apply.
7. Please place an “x” on one of the applicable statements below. Please do NOT
mark both statements:
* [x] I am signing on behalf of myself as an individual and no other person
or entity, including my employer, has or will have rights with respect to my
contributions.
* [ ] I am signing on behalf of my employer or a legal entity and I have the
actual authority to contractually bind that entity.
## Contributor Details
| Field | Entry |
|------------------------------- | ------------------------ |
| Name | Renat Shigapov |
| Company name (if applicable) | |
| Title or role (if applicable) | |
| Date | 2021-09-09 |
| GitHub username | shigapov |
| Website (optional) | |

.github/contributors/swfarnsworth.md
@ -0,0 +1,88 @@
## Contributor Agreement
(Agreement text identical to the SCA reproduced above.)
* [x] I am signing on behalf of myself as an individual and no other person
or entity, including my employer, has or will have rights with respect to my
contributions.
* [ ] I am signing on behalf of my employer or a legal entity and I have the
actual authority to contractually bind that entity.
## Contributor Details
| Field | Entry |
|------------------------------- | -------------------- |
| Name | Steele Farnsworth |
| Company name (if applicable) | |
| Title or role (if applicable) | |
| Date | 13 August, 2021 |
| GitHub username | swfarnsworth |
| Website (optional) | |

.github/contributors/syrull.md
@ -0,0 +1,106 @@
# spaCy contributor agreement
(Agreement text identical to the SCA reproduced above.)
* [x] I am signing on behalf of myself as an individual and no other person
or entity, including my employer, has or will have rights with respect to my
contributions.
* [ ] I am signing on behalf of my employer or a legal entity and I have the
actual authority to contractually bind that entity.
## Contributor Details
| Field | Entry |
|------------------------------- | -------------------- |
| Name | Dimitar Ganev |
| Company name (if applicable) | |
| Title or role (if applicable) | |
| Date | 2021/8/2 |
| GitHub username | syrull |
| Website (optional) | |

.github/contributors/thomashacker.md
@ -0,0 +1,106 @@
# spaCy contributor agreement
(Agreement text identical to the SCA reproduced above.)
* [x] I am signing on behalf of myself as an individual and no other person
or entity, including my employer, has or will have rights with respect to my
contributions.
* [ ] I am signing on behalf of my employer or a legal entity and I have the
actual authority to contractually bind that entity.
## Contributor Details
| Field | Entry |
|------------------------------- | -------------------- |
| Name | Edward Schmuhl |
| Company name (if applicable) | |
| Title or role (if applicable) | |
| Date | 09.07.2021 |
| GitHub username | thomashacker |
| Website (optional) | |

.github/contributors/tiangolo.md
@ -0,0 +1,106 @@
# spaCy contributor agreement
(Agreement text identical to the SCA reproduced above.)
* [ ] I am signing on behalf of myself as an individual and no other person
or entity, including my employer, has or will have rights with respect to my
contributions.
* [ ] I am signing on behalf of my employer or a legal entity and I have the
actual authority to contractually bind that entity.
## Contributor Details
| Field | Entry |
|------------------------------- | -------------------- |
| Name | Sebastián Ramírez |
| Company name (if applicable) | |
| Title or role (if applicable) | |
| Date | 2020-07-01 |
| GitHub username | tiangolo |
| Website (optional) | |

.github/contributors/werew.md
@ -0,0 +1,106 @@
# spaCy contributor agreement
(Agreement text identical to the SCA reproduced above.)
* [x] I am signing on behalf of myself as an individual and no other person
or entity, including my employer, has or will have rights with respect to my
contributions.
* [ ] I am signing on behalf of my employer or a legal entity and I have the
actual authority to contractually bind that entity.
## Contributor Details
| Field | Entry |
|------------------------------- | -------------------- |
| Name | Luigi Coniglio |
| Company name (if applicable) | |
| Title or role (if applicable) | |
| Date | 10/01/2021 |
| GitHub username | werew |
| Website (optional) | |

.github/contributors/xadrianzetx.md
@ -0,0 +1,106 @@
# spaCy contributor agreement
(Agreement text identical to the SCA reproduced above.)
* [x] I am signing on behalf of myself as an individual and no other person
or entity, including my employer, has or will have rights with respect to my
contributions.
* [ ] I am signing on behalf of my employer or a legal entity and I have the
actual authority to contractually bind that entity.
## Contributor Details
| Field | Entry |
|------------------------------- | -------------------- |
| Name | Adrian Zuber |
| Company name (if applicable) | |
| Title or role (if applicable) | |
| Date | 20-06-2021 |
| GitHub username | xadrianzetx |
| Website (optional) | |

.github/contributors/yohasebe.md
@ -0,0 +1,106 @@
# spaCy contributor agreement
(Agreement text identical to the SCA reproduced above.)
* [x] I am signing on behalf of myself as an individual and no other person
or entity, including my employer, has or will have rights with respect to my
contributions.
* [ ] I am signing on behalf of my employer or a legal entity and I have the
actual authority to contractually bind that entity.
## Contributor Details
| Field | Entry |
|------------------------------- | -------------------- |
| Name | Yoichiro Hasebe |
| Company name (if applicable) | |
| Title or role (if applicable) | |
| Date | July 4th, 2021 |
| GitHub username | yohasebe |
| Website (optional) | https://yohasebe.com |

.github/lock.yml
@ -1,19 +0,0 @@
# Configuration for lock-threads - https://github.com/dessant/lock-threads
# Number of days of inactivity before a closed issue or pull request is locked
daysUntilLock: 30
# Issues and pull requests with these labels will not be locked. Set to `[]` to disable
exemptLabels: []
# Label to add before locking, such as `outdated`. Set to `false` to disable
lockLabel: false
# Comment to post before locking. Set to `false` to disable
lockComment: >
This thread has been automatically locked since there has not been
any recent activity after it was closed. Please open a new issue for
related bugs.
# Limit to only `issues` or `pulls`
only: issues

@ -1,13 +0,0 @@
# Configuration for probot-no-response - https://github.com/probot/no-response
# Number of days of inactivity before an Issue is closed for lack of response
daysUntilClose: 14
# Label requiring a response
responseRequiredLabel: more-info-needed
# Comment to post when closing an Issue for lack of response. Set to `false` to disable
closeComment: >
This issue has been automatically closed because there has been no response
to a request for more information from the original author. With only the
information that is currently in the issue, there's not enough information
to take action. If you're the original author, feel free to reopen the issue
if you have or find the answers needed to investigate further.

.github/spacy_universe_alert.py
@ -0,0 +1,67 @@
import os
import sys
import json
from datetime import datetime
from slack_sdk.web.client import WebClient
CHANNEL = "#alerts-universe"
SLACK_TOKEN = os.environ.get("SLACK_BOT_TOKEN", "ENV VAR not available!")
DATETIME_FORMAT = "%Y-%m-%dT%H:%M:%SZ"
client = WebClient(SLACK_TOKEN)
github_context = json.loads(sys.argv[1])
event = github_context['event']
pr_title = event['pull_request']["title"]
pr_link = event['pull_request']["patch_url"].replace(".patch", "")
pr_author_url = event['sender']["html_url"]
pr_author_name = pr_author_url.rsplit('/')[-1]
pr_created_at_dt = datetime.strptime(
event['pull_request']["created_at"],
DATETIME_FORMAT
)
pr_created_at = pr_created_at_dt.strftime("%c")
pr_updated_at_dt = datetime.strptime(
event['pull_request']["updated_at"],
DATETIME_FORMAT
)
pr_updated_at = pr_updated_at_dt.strftime("%c")
blocks = [
{
"type": "section",
"text": {
"type": "mrkdwn",
"text": "📣 New spaCy Universe Project Alert ✨"
}
},
{
"type": "section",
"fields": [
{
"type": "mrkdwn",
"text": f"*Pull Request:*\n<{pr_link}|{pr_title}>"
},
{
"type": "mrkdwn",
"text": f"*Author:*\n<{pr_author_url}|{pr_author_name}>"
},
{
"type": "mrkdwn",
"text": f"*Created at:*\n {pr_created_at}"
},
{
"type": "mrkdwn",
"text": f"*Last Updated:*\n {pr_updated_at}"
}
]
}
]
client.chat_postMessage(
channel=CHANNEL,
text="spaCy universe project PR alert",
blocks=blocks
)
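For context, the script above is invoked with the GitHub event context as its single argument (see the spaCy universe alert workflow further down). A made-up sketch of the payload shape it reads, limited to the fields accessed above, looks roughly like this:
import json

# All values below are invented; only the keys the alert script touches are shown.
github_context = {
    "event": {
        "pull_request": {
            "title": "Add example-project to the spaCy universe",
            "patch_url": "https://github.com/explosion/spaCy/pull/0000.patch",
            "created_at": "2022-01-01T12:00:00Z",
            "updated_at": "2022-01-02T08:30:00Z",
        },
        "sender": {"html_url": "https://github.com/example_user"},
    }
}
print(json.dumps(github_context)[:60] + "...")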

.github/validate_universe_json.py
@ -0,0 +1,19 @@
import json
import re
import sys
from pathlib import Path
def validate_json(document):
universe_file = Path(document)
with universe_file.open() as f:
universe_data = json.load(f)
for entry in universe_data["resources"]:
if "github" in entry:
assert not re.match(
r"^(http:)|^(https:)", entry["github"]
), "Github field should be user/repo, not a url"
if __name__ == "__main__":
validate_json(str(sys.argv[1]))
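As a quick, self-contained illustration of the rule this validator enforces (the "github" field must be "user/repo" rather than a full URL; the sample entries are made up):
import re

entries = [
    {"github": "explosion/spacy-stanza"},                     # user/repo form: accepted
    {"github": "https://github.com/explosion/spacy-stanza"},  # full URL: rejected
]
for entry in entries:
    is_url = bool(re.match(r"^(http:)|^(https:)", entry["github"]))
    print(entry["github"], "->", "rejected" if is_url else "accepted")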

.github/workflows/cibuildwheel.yml
@ -0,0 +1,99 @@
name: Build
on:
push:
tags:
# ytf did they invent their own syntax that's almost regex?
# ** matches 'zero or more of any character'
- 'release-v[0-9]+.[0-9]+.[0-9]+**'
- 'prerelease-v[0-9]+.[0-9]+.[0-9]+**'
jobs:
build_wheels:
name: Build wheels on ${{ matrix.os }}
runs-on: ${{ matrix.os }}
strategy:
matrix:
# macos-13 is an intel runner, macos-14 is apple silicon
os: [ubuntu-latest, windows-latest, macos-13, macos-14, ubuntu-24.04-arm]
steps:
- uses: actions/checkout@v4
# aarch64 (arm) is built via qemu emulation
# QEMU is sadly too slow. We need to wait for public ARM support
#- name: Set up QEMU
# if: runner.os == 'Linux'
# uses: docker/setup-qemu-action@v3
# with:
# platforms: all
- name: Build wheels
uses: pypa/cibuildwheel@v2.21.3
env:
CIBW_ARCHS_LINUX: auto
with:
package-dir: .
output-dir: wheelhouse
config-file: "{package}/pyproject.toml"
- uses: actions/upload-artifact@v4
with:
name: cibw-wheels-${{ matrix.os }}-${{ strategy.job-index }}
path: ./wheelhouse/*.whl
build_sdist:
name: Build source distribution
runs-on: ubuntu-latest
steps:
- uses: actions/checkout@v4
- name: Build sdist
run: pipx run build --sdist
- uses: actions/upload-artifact@v4
with:
name: cibw-sdist
path: dist/*.tar.gz
create_release:
needs: [build_wheels, build_sdist]
runs-on: ubuntu-latest
permissions:
contents: write
checks: write
actions: read
issues: read
packages: write
pull-requests: read
repository-projects: read
statuses: read
steps:
- name: Get the tag name and determine if it's a prerelease
id: get_tag_info
run: |
FULL_TAG=${GITHUB_REF#refs/tags/}
if [[ $FULL_TAG == release-* ]]; then
TAG_NAME=${FULL_TAG#release-}
IS_PRERELEASE=false
elif [[ $FULL_TAG == prerelease-* ]]; then
TAG_NAME=${FULL_TAG#prerelease-}
IS_PRERELEASE=true
else
echo "Tag does not match expected patterns" >&2
exit 1
fi
echo "FULL_TAG=$TAG_NAME" >> $GITHUB_ENV
echo "TAG_NAME=$TAG_NAME" >> $GITHUB_ENV
echo "IS_PRERELEASE=$IS_PRERELEASE" >> $GITHUB_ENV
- uses: actions/download-artifact@v4
with:
# unpacks all CIBW artifacts into dist/
pattern: cibw-*
path: dist
merge-multiple: true
- name: Create Draft Release
id: create_release
uses: softprops/action-gh-release@v2
if: startsWith(github.ref, 'refs/tags/')
env:
GITHUB_TOKEN: ${{ secrets.GITHUB_TOKEN }}
with:
name: ${{ env.TAG_NAME }}
draft: true
prerelease: ${{ env.IS_PRERELEASE }}
files: "./dist/*"

.github/workflows/explosionbot.yml
@ -0,0 +1,28 @@
name: Explosion Bot
on:
issue_comment:
types:
- created
- edited
jobs:
explosion-bot:
if: github.repository_owner == 'explosion'
runs-on: ubuntu-latest
steps:
- name: Dump GitHub context
env:
GITHUB_CONTEXT: ${{ toJson(github) }}
run: echo "$GITHUB_CONTEXT"
- uses: actions/checkout@v4
- uses: actions/setup-python@v4
- name: Install and run explosion-bot
run: |
pip install git+https://${{ secrets.EXPLOSIONBOT_TOKEN }}@github.com/explosion/explosion-bot
python -m explosionbot
env:
INPUT_TOKEN: ${{ secrets.EXPLOSIONBOT_TOKEN }}
INPUT_BK_TOKEN: ${{ secrets.BUILDKITE_SECRET }}
ENABLED_COMMANDS: "test_gpu,test_slow,test_slow_gpu"
ALLOWED_TEAMS: "spaCy"

.github/workflows/gputests.yml.disabled
@ -0,0 +1,22 @@
name: Weekly GPU tests
on:
schedule:
- cron: '0 1 * * MON'
jobs:
weekly-gputests:
strategy:
fail-fast: false
matrix:
branch: [master, v4]
if: github.repository_owner == 'explosion'
runs-on: ubuntu-latest
steps:
- name: Trigger buildkite build
uses: buildkite/trigger-pipeline-action@v1.2.0
env:
PIPELINE: explosion-ai/spacy-slow-gpu-tests
BRANCH: ${{ matrix.branch }}
MESSAGE: ":github: Weekly GPU + slow tests - triggered from a GitHub Action"
BUILDKITE_API_ACCESS_TOKEN: ${{ secrets.BUILDKITE_SECRET }}

@ -13,9 +13,10 @@ on:
jobs:
issue-manager:
if: github.repository_owner == 'explosion'
runs-on: ubuntu-latest
steps:
- uses: tiangolo/issue-manager@0.2.1
- uses: tiangolo/issue-manager@0.4.0
with:
token: ${{ secrets.GITHUB_TOKEN }}
config: >
@ -25,5 +26,11 @@ jobs:
"message": "This issue has been automatically closed because it was answered and there was no follow-up discussion.",
"remove_label_on_comment": true,
"remove_label_on_close": true
},
"more-info-needed": {
"delay": "P7D",
"message": "This issue has been automatically closed because there has been no response to a request for more information from the original author. With only the information that is currently in the issue, there's not enough information to take action. If you're the original author, feel free to reopen the issue if you have or find the answers needed to investigate further.",
"remove_label_on_comment": true,
"remove_label_on_close": true
}
}

.github/workflows/lock.yml
@ -0,0 +1,26 @@
name: 'Lock Threads'
on:
schedule:
- cron: '0 0 * * *' # check every day
workflow_dispatch:
permissions:
issues: write
concurrency:
group: lock
jobs:
action:
if: github.repository_owner == 'explosion'
runs-on: ubuntu-latest
steps:
- uses: dessant/lock-threads@v5
with:
process-only: 'issues'
issue-inactive-days: '30'
issue-comment: >
This thread has been automatically locked since there
has not been any recent activity after it was closed.
Please open a new issue for related bugs.

.github/workflows/publish_pypi.yml
@ -0,0 +1,29 @@
# The cibuildwheel action triggers on creation of a release; this one
# triggers on publication.
# The expected workflow is to create a draft release, let the wheels
# upload, and then hit 'publish', which uploads to PyPI.
on:
release:
types:
- published
jobs:
upload_pypi:
runs-on: ubuntu-latest
environment:
name: pypi
url: https://pypi.org/p/spacy
permissions:
id-token: write
contents: read
if: github.event_name == 'release' && github.event.action == 'published'
# or, alternatively, upload to PyPI on every tag starting with 'v' (remove on: release above to use this)
# if: github.event_name == 'push' && startsWith(github.ref, 'refs/tags/v')
steps:
- uses: robinraju/release-downloader@v1
with:
tag: ${{ github.event.release.tag_name }}
fileName: '*'
out-file-path: 'dist'
- uses: pypa/gh-action-pypi-publish@release/v1

@ -0,0 +1,38 @@
name: Daily slow tests
on:
schedule:
- cron: '0 0 * * *'
jobs:
daily-slowtests:
strategy:
fail-fast: false
matrix:
branch: [master, v4]
if: github.repository_owner == 'explosion'
runs-on: ubuntu-latest
steps:
- name: Checkout
uses: actions/checkout@v4
with:
ref: ${{ matrix.branch }}
- name: Get commits from past 24 hours
id: check_commits
run: |
today=$(date '+%Y-%m-%d %H:%M:%S')
yesterday=$(date -d "yesterday" '+%Y-%m-%d %H:%M:%S')
if git log --after="$yesterday" --before="$today" | grep commit ; then
echo run_tests=true >> $GITHUB_OUTPUT
else
echo run_tests=false >> $GITHUB_OUTPUT
fi
- name: Trigger buildkite build
if: steps.check_commits.outputs.run_tests == 'true'
uses: buildkite/trigger-pipeline-action@v1.2.0
env:
PIPELINE: explosion-ai/spacy-slow-tests
BRANCH: ${{ matrix.branch }}
MESSAGE: ":github: Daily slow tests - triggered from a GitHub Action"
BUILDKITE_API_ACCESS_TOKEN: ${{ secrets.BUILDKITE_SECRET }}
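An illustrative Python version of the "commits in the past 24 hours" check in the step above; the workflow uses git log in bash, and this sketch assumes it runs inside a git checkout:
import subprocess
from datetime import datetime, timedelta

yesterday = (datetime.now() - timedelta(days=1)).strftime("%Y-%m-%d %H:%M:%S")
log = subprocess.run(
    ["git", "log", f"--after={yesterday}", "--oneline"],
    capture_output=True, text=True, check=True,
).stdout
print("run_tests=true" if log.strip() else "run_tests=false")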

@ -0,0 +1,33 @@
name: spaCy universe project alert
on:
pull_request_target:
paths:
- "website/meta/universe.json"
jobs:
build:
if: github.repository_owner == 'explosion'
runs-on: ubuntu-latest
steps:
- name: Dump GitHub context
env:
GITHUB_CONTEXT: ${{ toJson(github) }}
PR_NUMBER: ${{github.event.number}}
run: |
echo "$GITHUB_CONTEXT"
- uses: actions/checkout@v4
- uses: actions/setup-python@v4
with:
python-version: '3.10'
- name: Install Bernadette app dependency and send an alert
env:
SLACK_BOT_TOKEN: ${{ secrets.SLACK_BOT_TOKEN }}
GITHUB_CONTEXT: ${{ toJson(github) }}
CHANNEL: "#alerts-universe"
run: |
pip install slack-sdk==3.17.2 aiohttp==3.8.1
echo "$CHANNEL"
python .github/spacy_universe_alert.py "$GITHUB_CONTEXT"

.github/workflows/tests.yml
@ -0,0 +1,175 @@
name: tests
on:
push:
tags-ignore:
- '**'
branches-ignore:
- "spacy.io"
- "nightly.spacy.io"
- "v2.spacy.io"
paths-ignore:
- "*.md"
- "*.mdx"
- "website/**"
pull_request:
types: [opened, synchronize, reopened, edited]
paths-ignore:
- "*.md"
- "*.mdx"
- "website/**"
jobs:
validate:
name: Validate
if: github.repository_owner == 'explosion'
runs-on: ubuntu-latest
steps:
- name: Check out repo
uses: actions/checkout@v4
- name: Configure Python version
uses: actions/setup-python@v4
with:
python-version: "3.10"
- name: black
run: |
python -m pip install black -c requirements.txt
python -m black spacy --check
- name: isort
run: |
python -m pip install isort -c requirements.txt
python -m isort spacy --check
- name: flake8
run: |
python -m pip install flake8==5.0.4
python -m flake8 spacy --count --select=E901,E999,F821,F822,F823,W605 --show-source --statistics
# Unfortunately cython-lint isn't working after the shift to Cython 3.
#- name: cython-lint
# run: |
# python -m pip install cython-lint -c requirements.txt
# # E501: line too long, W291: trailing whitespace, E266: too many leading '#' for block comment
# cython-lint spacy --ignore E501,W291,E266
tests:
name: Test
needs: Validate
strategy:
fail-fast: true
matrix:
os: [ubuntu-latest, windows-latest, macos-latest]
python_version: ["3.9", "3.12", "3.13"]
runs-on: ${{ matrix.os }}
steps:
- name: Check out repo
uses: actions/checkout@v4
- name: Configure Python version
uses: actions/setup-python@v4
with:
python-version: ${{ matrix.python_version }}
- name: Install dependencies
run: |
python -m pip install -U build pip setuptools
python -m pip install -U -r requirements.txt
- name: Build sdist
run: |
python -m build --sdist
- name: Run mypy
run: |
python -m mypy spacy
if: matrix.python_version != '3.7'
- name: Delete source directory and .egg-info
run: |
rm -rf spacy *.egg-info
shell: bash
- name: Uninstall all packages
run: |
python -m pip freeze
python -m pip freeze --exclude pywin32 > installed.txt
python -m pip uninstall -y -r installed.txt
- name: Install from sdist
run: |
SDIST=$(python -c "import os;print(os.listdir('./dist')[-1])" 2>&1)
SPACY_NUM_BUILD_JOBS=2 python -m pip install dist/$SDIST
shell: bash
- name: Test import
run: python -W error -c "import spacy"
- name: "Test download CLI"
run: |
python -m spacy download ca_core_news_sm
python -m spacy download ca_core_news_md
python -c "import spacy; nlp=spacy.load('ca_core_news_sm'); doc=nlp('test')"
if: matrix.python_version == '3.9'
- name: "Test download_url in info CLI"
run: |
python -W error -m spacy info ca_core_news_sm | grep -q download_url
if: matrix.python_version == '3.9'
- name: "Test no warnings on load (#11713)"
run: |
python -W error -c "import ca_core_news_sm; nlp = ca_core_news_sm.load(); doc=nlp('test')"
if: matrix.python_version == '3.9'
- name: "Test convert CLI"
run: |
python -m spacy convert extra/example_data/ner_example_data/ner-token-per-line-conll2003.json .
if: matrix.python_version == '3.9'
- name: "Test debug config CLI"
run: |
python -m spacy init config -p ner -l ca ner.cfg
python -m spacy debug config ner.cfg --paths.train ner-token-per-line-conll2003.spacy --paths.dev ner-token-per-line-conll2003.spacy
if: matrix.python_version == '3.9'
- name: "Test debug data CLI"
run: |
# will have errors due to sparse data, check for summary in output
python -m spacy debug data ner.cfg --paths.train ner-token-per-line-conll2003.spacy --paths.dev ner-token-per-line-conll2003.spacy | grep -q Summary
if: matrix.python_version == '3.9'
- name: "Test train CLI"
run: |
python -m spacy train ner.cfg --paths.train ner-token-per-line-conll2003.spacy --paths.dev ner-token-per-line-conll2003.spacy --training.max_steps 10 --gpu-id -1
if: matrix.python_version == '3.9'
- name: "Test assemble CLI"
run: |
python -c "import spacy; config = spacy.util.load_config('ner.cfg'); config['components']['ner'] = {'source': 'ca_core_news_sm'}; config.to_disk('ner_source_sm.cfg')"
python -m spacy assemble ner_source_sm.cfg output_dir
env:
PYTHONWARNINGS: "error,ignore::DeprecationWarning"
if: matrix.python_version == '3.9'
- name: "Test assemble CLI vectors warning"
run: |
python -c "import spacy; config = spacy.util.load_config('ner.cfg'); config['components']['ner'] = {'source': 'ca_core_news_md'}; config.to_disk('ner_source_md.cfg')"
python -m spacy assemble ner_source_md.cfg output_dir 2>&1 | grep -q W113
if: matrix.python_version == '3.9'
- name: "Install test requirements"
run: |
python -m pip install -U -r requirements.txt
- name: "Run CPU tests"
run: |
python -m pytest --pyargs spacy -W error
if: "!(startsWith(matrix.os, 'macos') && matrix.python_version == '3.11')"
- name: "Run CPU tests with thinc-apple-ops"
run: |
python -m pip install 'spacy[apple]'
python -m pytest --pyargs spacy
if: startsWith(matrix.os, 'macos') && matrix.python_version == '3.11'

@ -0,0 +1,32 @@
name: universe validation
on:
push:
branches-ignore:
- "spacy.io"
- "nightly.spacy.io"
- "v2.spacy.io"
paths:
- "website/meta/universe.json"
pull_request:
types: [opened, synchronize, reopened, edited]
paths:
- "website/meta/universe.json"
jobs:
validate:
name: Validate
if: github.repository_owner == 'explosion'
runs-on: ubuntu-latest
steps:
- name: Check out repo
uses: actions/checkout@v4
- name: Configure Python version
uses: actions/setup-python@v4
with:
python-version: "3.7"
- name: Validate website/meta/universe.json
run: |
python .github/validate_universe_json.py website/meta/universe.json

.gitignore
@ -10,20 +10,11 @@ spacy/tests/package/setup.cfg
spacy/tests/package/pyproject.toml
spacy/tests/package/requirements.txt
# Website
website/.cache/
website/public/
website/node_modules
website/.npm
website/logs
*.log
npm-debug.log*
quickstart-training-generator.js
# Cython / C extensions
cythonize.json
spacy/*.html
*.cpp
*.c
*.so
# Vim / VSCode / editors
@ -43,12 +34,15 @@ __pycache__/
.env*
.~env/
.venv
env3.6/
venv/
env3.*/
.dev
.denv
.pypyenv
.pytest_cache/
.mypy_cache/
.hypothesis/
# Distribution / packaging
env/
@ -118,3 +112,6 @@ Desktop.ini
# Pycharm project files
*.idea
# IPython
.ipynb_checkpoints/

.pre-commit-config.yaml
@ -0,0 +1,13 @@
repos:
- repo: https://github.com/ambv/black
rev: 22.3.0
hooks:
- id: black
language_version: python3.7
additional_dependencies: ['click==8.0.4']
- repo: https://github.com/pycqa/flake8
rev: 5.0.4
hooks:
- id: flake8
args:
- "--config=setup.cfg"

@ -1,23 +0,0 @@
language: python
sudo: false
cache: pip
dist: trusty
group: edge
python:
- "2.7"
os:
- linux
install:
- "python -m pip install -U pip setuptools"
- "pip install -e . --prefer-binary"
script:
- "cat /proc/cpuinfo | grep flags | head -n 1"
- "pip install -r requirements.txt"
- "python -m pytest --tb=native spacy"
branches:
except:
- spacy.io
notifications:
slack:
secure: F8GvqnweSdzImuLL64TpfG0i5rYl89liyr9tmFVsHl4c0DNiDuGhZivUz0M1broS8svE3OPOllLfQbACG/4KxD890qfF9MoHzvRDlp7U+RtwMV/YAkYn8MGWjPIbRbX0HpGdY7O2Rc9Qy4Kk0T8ZgiqXYIqAz2Eva9/9BlSmsJQ=
email: false

@ -1,8 +0,0 @@
@software{spacy,
author = {Honnibal, Matthew and Montani, Ines and Van Landeghem, Sofie and Boyd, Adriane},
title = {{spaCy: Industrial-strength Natural Language Processing in Python}},
year = 2020,
publisher = {Zenodo},
doi = {10.5281/zenodo.1212303},
url = {https://doi.org/10.5281/zenodo.1212303}
}

CITATION.cff
@ -0,0 +1,16 @@
cff-version: 1.2.0
preferred-citation:
type: article
message: "If you use spaCy, please cite it as below."
authors:
- family-names: "Honnibal"
given-names: "Matthew"
- family-names: "Montani"
given-names: "Ines"
- family-names: "Van Landeghem"
given-names: "Sofie"
- family-names: "Boyd"
given-names: "Adriane"
title: "spaCy: Industrial-strength Natural Language Processing in Python"
doi: "10.5281/zenodo.1212303"
year: 2020

@ -2,10 +2,8 @@
# Contribute to spaCy
Thanks for your interest in contributing to spaCy 🎉 The project is maintained
by [@honnibal](https://github.com/honnibal) and [@ines](https://github.com/ines),
and we'll do our best to help you get started. This page will give you a quick
overview of how things are organised and most importantly, how to get involved.
Thanks for your interest in contributing to spaCy 🎉 This page will give you a quick
overview of how things are organized and most importantly, how to get involved.
## Table of contents
@ -37,39 +35,38 @@ so that more people can benefit from it.
When opening an issue, use a **descriptive title** and include your
**environment** (operating system, Python version, spaCy version). Our
[issue template](https://github.com/explosion/spaCy/issues/new) helps you
[issue templates](https://github.com/explosion/spaCy/issues/new/choose) help you
remember the most important details to include. If you've discovered a bug, you
can also submit a [regression test](#fixing-bugs) straight away. When you're
opening an issue to report the bug, simply refer to your pull request in the
issue body. A few more tips:
- **Describing your issue:** Try to provide as many details as possible. What
exactly goes wrong? _How_ is it failing? Is there an error?
"XY doesn't work" usually isn't that helpful for tracking down problems. Always
remember to include the code you ran and if possible, extract only the relevant
parts and don't just dump your entire script. This will make it easier for us to
reproduce the error.
- **Describing your issue:** Try to provide as many details as possible. What
exactly goes wrong? _How_ is it failing? Is there an error?
"XY doesn't work" usually isn't that helpful for tracking down problems. Always
remember to include the code you ran and if possible, extract only the relevant
parts and don't just dump your entire script. This will make it easier for us to
reproduce the error.
- **Getting info about your spaCy installation and environment:** If you're
using spaCy v1.7+, you can use the command line interface to print details and
even format them as Markdown to copy-paste into GitHub issues:
`python -m spacy info --markdown`.
- **Getting info about your spaCy installation and environment:** You can use the command line interface to print details and
even format them as Markdown to copy-paste into GitHub issues:
`python -m spacy info --markdown`.
- **Checking the model compatibility:** If you're having problems with a
[statistical model](https://spacy.io/models), it may be because the
model is incompatible with your spaCy installation. In spaCy v2.0+, you can check
this on the command line by running `python -m spacy validate`.
- **Checking the model compatibility:** If you're having problems with a
[statistical model](https://spacy.io/models), it may be because the
model is incompatible with your spaCy installation. In spaCy v2.0+, you can check
this on the command line by running `python -m spacy validate`.
- **Sharing a model's output, like dependencies and entities:** spaCy v2.0+
comes with [built-in visualizers](https://spacy.io/usage/visualizers) that
you can run from within your script or a Jupyter notebook. For some issues, it's
helpful to **include a screenshot** of the visualization. You can simply drag and
drop the image into GitHub's editor and it will be uploaded and included.
- **Sharing a model's output, like dependencies and entities:** spaCy
comes with [built-in visualizers](https://spacy.io/usage/visualizers) that
you can run from within your script or a Jupyter notebook. For some issues, it's
helpful to **include a screenshot** of the visualization. You can simply drag and
drop the image into GitHub's editor and it will be uploaded and included.
- **Sharing long blocks of code or logs:** If you need to include long code,
logs or tracebacks, you can wrap them in `<details>` and `</details>`. This
[collapses the content](https://developer.mozilla.org/en/docs/Web/HTML/Element/details)
so it only becomes visible on click, making the issue easier to read and follow.
- **Sharing long blocks of code or logs:** If you need to include long code,
logs or tracebacks, you can wrap them in `<details>` and `</details>`. This
[collapses the content](https://developer.mozilla.org/en/docs/Web/HTML/Element/details)
so it only becomes visible on click, making the issue easier to read and follow.
### Issue labels
@ -94,39 +91,39 @@ shipped in the core library, and what could be provided in other packages. Our
philosophy is to prefer a smaller core library. We generally ask the following
questions:
- **What would this feature look like if implemented in a separate package?**
Some features would be very difficult to implement externally – for example,
changes to spaCy's built-in methods. In contrast, a library of word
alignment functions could easily live as a separate package that depended on
spaCy — there's little difference between writing `import word_aligner` and
`import spacy.word_aligner`. spaCy v2.0+ makes it easy to implement
[custom pipeline components](https://spacy.io/usage/processing-pipelines#custom-components),
and add your own attributes, properties and methods to the `Doc`, `Token` and
`Span`. If you're looking to implement a new spaCy feature, starting with a
custom component package is usually the best strategy. You won't have to worry
about spaCy's internals and you can test your module in an isolated
environment. And if it works well, we can always integrate it into the core
library later.
- **What would this feature look like if implemented in a separate package?**
Some features would be very difficult to implement externally – for example,
changes to spaCy's built-in methods. In contrast, a library of word
alignment functions could easily live as a separate package that depended on
spaCy — there's little difference between writing `import word_aligner` and
`import spacy.word_aligner`. spaCy makes it easy to implement
[custom pipeline components](https://spacy.io/usage/processing-pipelines#custom-components),
and add your own attributes, properties and methods to the `Doc`, `Token` and
`Span`. If you're looking to implement a new spaCy feature, starting with a
custom component package is usually the best strategy. You won't have to worry
about spaCy's internals and you can test your module in an isolated
environment. And if it works well, we can always integrate it into the core
library later.
- **Would the feature be easier to implement if it relied on "heavy" dependencies spaCy doesn't currently require?**
Python has a very rich ecosystem. Libraries like scikit-learn, SciPy, Gensim or
TensorFlow/Keras do lots of useful things — but we don't want to have them as
dependencies. If the feature requires functionality in one of these libraries,
it's probably better to break it out into a different package.
- **Would the feature be easier to implement if it relied on "heavy" dependencies spaCy doesn't currently require?**
Python has a very rich ecosystem. Libraries like PyTorch, TensorFlow, scikit-learn, SciPy or Gensim
do lots of useful things — but we don't want to have them as default
dependencies. If the feature requires functionality in one of these libraries,
it's probably better to break it out into a different package.
- **Is the feature orthogonal to the current spaCy functionality, or overlapping?**
spaCy strongly prefers to avoid having 6 different ways of doing the same thing.
As better techniques are developed, we prefer to drop support for "the old way".
However, it's rare that one approach _entirely_ dominates another. It's very
common that there's still a use-case for the "obsolete" approach. For instance,
[WordNet](https://wordnet.princeton.edu/) is still very useful — but word
vectors are better for most use-cases, and the two approaches to lexical
semantics do a lot of the same things. spaCy therefore only supports word
vectors, and support for WordNet is currently left for other packages.
- **Is the feature orthogonal to the current spaCy functionality, or overlapping?**
spaCy strongly prefers to avoid having 6 different ways of doing the same thing.
As better techniques are developed, we prefer to drop support for "the old way".
However, it's rare that one approach _entirely_ dominates another. It's very
common that there's still a use-case for the "obsolete" approach. For instance,
[WordNet](https://wordnet.princeton.edu/) is still very useful — but word
vectors are better for most use-cases, and the two approaches to lexical
semantics do a lot of the same things. spaCy therefore only supports word
vectors, and support for WordNet is currently left for other packages.
- **Do you need the feature to get basic things done?** We do want spaCy to be
at least somewhat self-contained. If we keep needing some feature in our
recipes, that does provide some argument for bringing it "in house".
- **Do you need the feature to get basic things done?** We do want spaCy to be
at least somewhat self-contained. If we keep needing some feature in our
recipes, that does provide some argument for bringing it "in house".
### Getting started
@ -137,65 +134,65 @@ files, a compiler, [pip](https://pip.pypa.io/en/latest/installing/),
[virtualenv](https://virtualenv.pypa.io/en/stable/) and
[git](https://git-scm.com) installed. The compiler is usually the trickiest part.
```
python -m pip install -U pip
git clone https://github.com/explosion/spaCy
cd spaCy
python -m venv .env
source .env/bin/activate
export PYTHONPATH=`pwd`
pip install -r requirements.txt
python setup.py build_ext --inplace
```
If you've made changes to `.pyx` files, you need to recompile spaCy before you
If you've made changes to `.pyx` files, you need to **recompile spaCy** before you
can test your changes by re-running `python setup.py build_ext --inplace`.
Changes to `.py` files will be effective immediately.
📖 **For more details and instructions, see the documentation on [compiling spaCy from source](https://spacy.io/usage/#source) and the [quickstart widget](https://spacy.io/usage/#section-quickstart) to get the right commands for your platform and Python version.**
### Contributor agreement
If you've made a contribution to spaCy, you should fill in the
[spaCy contributor agreement](.github/CONTRIBUTOR_AGREEMENT.md) to ensure that
your contribution can be used across the project. If you agree to be bound by
the terms of the agreement, fill in the [template](.github/CONTRIBUTOR_AGREEMENT.md)
and include it with your pull request, or submit it separately to
[`.github/contributors/`](/.github/contributors). The name of the file should be
your GitHub username, with the extension `.md`. For example, the user
example_user would create the file `.github/contributors/example_user.md`.
### Fixing bugs
When fixing a bug, first create an
[issue](https://github.com/explosion/spaCy/issues) if one does not already exist.
The description text can be very short – we don't want to make this too
[issue](https://github.com/explosion/spaCy/issues) if one does not already
exist. The description text can be very short – we don't want to make this too
bureaucratic.
Next, create a test file named `test_issue[ISSUE NUMBER].py` in the
[`spacy/tests/regression`](spacy/tests/regression) folder. Test for the bug
you're fixing, and make sure the test fails. Next, add and commit your test file
referencing the issue number in the commit message. Finally, fix the bug, make
sure your test passes and reference the issue in your commit message.
Next, add a test to the relevant file in the
[`spacy/tests`](spacy/tests)folder. Then add a [pytest
mark](https://docs.pytest.org/en/6.2.x/example/markers.html#working-with-custom-markers),
`@pytest.mark.issue(NUMBER)`, to reference the issue number.
```python
# Assume you're fixing Issue #1234
@pytest.mark.issue(1234)
def test_issue1234():
    ...
```
Test for the bug you're fixing, and make sure the test fails. Next, add and
commit your test file. Finally, fix the bug, make sure your test passes and
reference the issue number in your pull request description.
📖 **For more information on how to add tests, check out the [tests README](spacy/tests/README.md).**
## Code conventions
Code should loosely follow [pep8](https://www.python.org/dev/peps/pep-0008/).
As of `v2.1.0`, spaCy uses [`black`](https://github.com/ambv/black) for code
spaCy uses [`black`](https://github.com/ambv/black) for code
formatting and [`flake8`](http://flake8.pycqa.org/en/latest/) for linting its
Python modules. If you've built spaCy from source, you'll already have both
tools installed.
As a general rule of thumb, we use f-strings for any formatting of strings.
One exception is calls to Python's `logging` functionality.
To avoid unnecessary string conversions in these cases, we use string formatting
templates with `%s` and `%d` etc.
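In practice the two cases look roughly like this (a minimal illustrative sketch, not code from the spaCy codebase; the function and names are made up):

```python
import logging

logger = logging.getLogger(__name__)


def report_progress(n_docs: int, model_name: str) -> None:
    # General string formatting: use an f-string.
    print(f"Processed {n_docs} docs with {model_name}")
    # Logging calls: pass a %-style template plus arguments, so the string is
    # only interpolated if the log record is actually emitted.
    logger.info("Processed %d docs with %s", n_docs, model_name)
```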
**⚠️ Note that formatting and linting is currently only possible for Python
modules in `.py` files, not Cython modules in `.pyx` and `.pxd` files.**
### Pre-Commit Hooks
After cloning the repo, after installing the packages from `requirements.txt`, enter the repo folder and run `pre-commit install`.
Each time a `git commit` is initiated, `black` and `flake8` will run automatically on the modified files only.
In case of error, or when `black` modified a file, the modified file needs to be `git add` once again and a new
`git commit` has to be issued.
### Code formatting
[`black`](https://github.com/ambv/black) is an opinionated Python code
formatter, optimised to produce readable code and small diffs. You can run
formatter, optimized to produce readable code and small diffs. You can run
`black` from the command-line, or via your code editor. For example, if you're
using [Visual Studio Code](https://code.visualstudio.com/), you can add the
following to your `settings.json` to use `black` for formatting and auto-format
@ -203,10 +200,10 @@ your files on save:
```json
{
"python.formatting.provider": "black",
"[python]": {
"editor.formatOnSave": true
}
"python.formatting.provider": "black",
"[python]": {
"editor.formatOnSave": true
}
}
```
@ -216,15 +213,14 @@ list of available editor integrations.
#### Disabling formatting
There are a few cases where auto-formatting doesn't improve readability – for
example, in some of the language data files like the `tag_map.py`, or in
the tests that construct `Doc` objects from lists of words and other labels.
example, in some of the language data files or in the tests that construct `Doc` objects from lists of words and other labels.
Wrapping a block in `# fmt: off` and `# fmt: on` lets you disable formatting
for that particular code. Here's an example:
```python
# fmt: off
text = "I look forward to using Thingamajig. I've been told it will make my life easier..."
heads = [1, 0, -1, -2, -1, -1, -5, -1, 3, 2, 1, 0, 2, 1, -3, 1, 1, -3, -7]
heads = [1, 1, 1, 1, 3, 4, 1, 6, 11, 11, 11, 11, 14, 14, 11, 16, 17, 14, 11]
deps = ["nsubj", "ROOT", "advmod", "prep", "pcomp", "dobj", "punct", "",
"nsubjpass", "aux", "auxpass", "ROOT", "nsubj", "aux", "ccomp",
"poss", "nsubj", "ccomp", "punct"]
@ -242,7 +238,7 @@ also want to keep an eye on unused declared variables or repeated
(i.e. overwritten) dictionary keys. If your code was formatted with `black`
(see above), you shouldn't see any formatting-related warnings.
The [`.flake8`](.flake8) config defines the configuration we use for this
The `flake8` section in [`setup.cfg`](setup.cfg) defines the configuration we use for this
codebase. For example, we're not super strict about the line length, and we're
excluding very large files like lemmatization and tokenizer exception tables.
@ -280,40 +276,32 @@ except: # noqa: E722
### Python conventions
All Python code must be written in an **intersection of Python 2 and Python 3**.
This is easy in Cython, but somewhat ugly in Python. Logic that deals with
Python or platform compatibility should only live in
[`spacy.compat`](spacy/compat.py). To distinguish them from the builtin
functions, replacement functions are suffixed with an underscore, for example
`unicode_`. If you need to access the user's version or platform information,
for example to show more specific error messages, you can use the `is_config()`
helper function.
All Python code must be written **compatible with Python 3.6+**. More detailed
code conventions can be found in the [developer docs](https://github.com/explosion/spaCy/blob/master/extra/DEVELOPER_DOCS/Code%20Conventions.md).
```python
from .compat import unicode_, is_config
compatible_unicode = unicode_('hello world')
if is_config(windows=True, python2=True):
print("You are using Python 2 on Windows.")
```
#### I/O and handling paths
Code that interacts with the file-system should accept objects that follow the
`pathlib.Path` API, without assuming that the object inherits from `pathlib.Path`.
If the function is user-facing and takes a path as an argument, it should check
whether the path is provided as a string. Strings should be converted to
`pathlib.Path` objects. Serialization and deserialization functions should always
accept **file-like objects**, as it makes the library io-agnostic. Working on
accept **file-like objects**, as it makes the library IO-agnostic. Working on
buffers makes the code more general, easier to test, and compatible with Python
3's asynchronous IO.
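A minimal sketch of this pattern (the helper names below are made up for illustration and are not spaCy's actual API):

```python
from io import BytesIO
from pathlib import Path
from typing import IO, Union


def ensure_path(path: Union[str, Path]) -> Path:
    # User-facing functions accept plain strings and convert them to Path objects.
    return Path(path) if isinstance(path, str) else path


def write_bytes(file_: IO[bytes], data: bytes) -> None:
    # Serialization helpers work on file-like objects, so they also accept
    # in-memory buffers, which keeps them IO-agnostic and easy to test.
    file_.write(data)


def save_data(path: Union[str, Path], data: bytes) -> None:
    with ensure_path(path).open("wb") as file_:
        write_bytes(file_, data)


write_bytes(BytesIO(), b"serialized bytes")  # no file system required
```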
#### Composition vs. inheritance
Although spaCy uses a lot of classes, **inheritance is viewed with some suspicion**
— it's seen as a mechanism of last resort. You should discuss plans to extend
the class hierarchy before implementing.
#### Naming conventions
We have a number of conventions around variable naming that are still being
documented, and aren't 100% strict. A general policy is that instances of the
class `Doc` should by default be called `doc`, `Token` `token`, `Lexeme` `lex`,
`Vocab` `vocab` and `Language` `nlp`. You should avoid naming variables that are
class `Doc` should by default be called `doc`, `Token` &rarr; `token`, `Lexeme` &rarr; `lex`,
`Vocab` &rarr; `vocab` and `Language` &rarr; `nlp`. You should avoid naming variables that are
of other types these names. For instance, don't name a text string `doc` — you
should usually call this `text`. Two general code style preferences further help
with naming. First, **lean away from introducing temporary variables**, as these
@ -400,7 +388,7 @@ of Python and C++, with additional complexity and syntax from numpy. The
many "traps for new players". Working in Cython is very rewarding once you're
over the initial learning curve. As with C and C++, the first way you write
something in Cython will often be the performance-optimal approach. In contrast,
Python optimisation generally requires a lot of experimentation. Is it faster to
Python optimization generally requires a lot of experimentation. Is it faster to
have an `if item in my_dict` check, or to use `.get()`? What about `try`/`except`?
Does this numpy operation create a copy? There's no way to guess the answers to
these questions, and you'll usually be dissatisfied with your results — so
@ -413,10 +401,10 @@ Python. If it's not fast enough the first time, just switch to Cython.
### Resources to get you started
- [PEP 8 Style Guide for Python Code](https://www.python.org/dev/peps/pep-0008/) (python.org)
- [Official Cython documentation](http://docs.cython.org/en/latest/) (cython.org)
- [Writing C in Cython](https://explosion.ai/blog/writing-c-in-cython) (explosion.ai)
- [Multi-threading spaCy's parser and named entity recogniser](https://explosion.ai/blog/multithreading-with-cython) (explosion.ai)
- [PEP 8 Style Guide for Python Code](https://www.python.org/dev/peps/pep-0008/) (python.org)
- [Official Cython documentation](http://docs.cython.org/en/latest/) (cython.org)
- [Writing C in Cython](https://explosion.ai/blog/writing-c-in-cython) (explosion.ai)
- [Multi-threading spaCy's parser and named entity recognizer](https://explosion.ai/blog/multithreading-with-cython) (explosion.ai)
## Adding tests
@ -428,16 +416,9 @@ name. For example, tests for the `Tokenizer` can be found in
all test files and test functions need to be prefixed with `test_`.
When adding tests, make sure to use descriptive names, keep the code short and
concise and only test for one behaviour at a time. Try to `parametrize` test
concise and only test for one behavior at a time. Try to `parametrize` test
cases wherever possible, use our pre-defined fixtures for spaCy components and
avoid unnecessary imports.
Extensive tests that take a long time should be marked with `@pytest.mark.slow`.
Tests that require the model to be loaded should be marked with
`@pytest.mark.models`. Loading the models is expensive and not necessary if
you're not actually testing the model performance. If all you need is a `Doc`
object with annotations like heads, POS tags or the dependency parse, you can
use the `get_doc()` utility function to construct it manually.
avoid unnecessary imports. Extensive tests that take a long time should be marked with `@pytest.mark.slow`.
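For example, a test following these conventions might look like this (an illustrative sketch; `en_tokenizer` stands in for one of the pre-defined tokenizer fixtures in the test suite):

```python
import pytest


@pytest.mark.parametrize(
    "text,expected_len", [("Hello world", 2), ("This is a sentence.", 5)]
)
def test_en_tokenizer_token_counts(en_tokenizer, text, expected_len):
    # Test one behavior at a time: here, only the number of tokens produced.
    tokens = en_tokenizer(text)
    assert len(tokens) == expected_len


@pytest.mark.slow
def test_something_expensive():
    # Long-running tests are marked so they can be excluded from quick runs.
    ...
```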
📖 **For more guidelines and information on how to add tests, check out the [tests README](spacy/tests/README.md).**
@ -454,27 +435,26 @@ simply click on the "Suggest edits" button at the bottom of a page.
## Publishing spaCy extensions and plugins
We're very excited about all the new possibilities for **community extensions**
and plugins in spaCy v2.0, and we can't wait to see what you build with it!
and plugins in spaCy v3.0, and we can't wait to see what you build with it!
- An extension or plugin should add substantial functionality, be
**well-documented** and **open-source**. It should be available for users to download
and install as a Python package for example via [PyPi](http://pypi.python.org).
- An extension or plugin should add substantial functionality, be
**well-documented** and **open-source**. It should be available for users to download
and install as a Python package for example via [PyPi](http://pypi.python.org).
- Extensions that write to `Doc`, `Token` or `Span` attributes should be wrapped
as [pipeline components](https://spacy.io/usage/processing-pipelines#custom-components)
that users can **add to their processing pipeline** using `nlp.add_pipe()`.
- Extensions that write to `Doc`, `Token` or `Span` attributes should be wrapped
as [pipeline components](https://spacy.io/usage/processing-pipelines#custom-components)
that users can **add to their processing pipeline** using `nlp.add_pipe()`.
- When publishing your extension on GitHub, **tag it** with the topics
[`spacy`](https://github.com/topics/spacy?o=desc&s=stars) and
[`spacy-extensions`](https://github.com/topics/spacy-extension?o=desc&s=stars)
to make it easier to find. Those are also the topics we're linking to from the
spaCy website. If you're sharing your project on Twitter, feel free to tag
[@spacy_io](https://twitter.com/spacy_io) so we can check it out.
- When publishing your extension on GitHub, **tag it** with the topics
[`spacy`](https://github.com/topics/spacy?o=desc&s=stars) and
[`spacy-extensions`](https://github.com/topics/spacy-extension?o=desc&s=stars)
to make it easier to find. Those are also the topics we're linking to from the
spaCy website. If you're sharing your project on X, feel free to tag
[@spacy_io](https://x.com/spacy_io) so we can check it out.
- Once your extension is published, you can open an issue on the
[issue tracker](https://github.com/explosion/spacy/issues) to suggest it for the
[resources directory](https://spacy.io/usage/resources#extensions) on the
website.
- Once your extension is published, you can open a
[PR](https://github.com/explosion/spaCy/pulls) to suggest it for the
[Universe](https://spacy.io/universe) page.
📖 **For more tips and best practices, see the [checklist for developing spaCy extensions](https://spacy.io/usage/processing-pipelines#extensions).**


@ -1,6 +1,6 @@
The MIT License (MIT)
Copyright (C) 2016-2020 ExplosionAI GmbH, 2016 spaCy GmbH, 2015 Matthew Honnibal
Copyright (C) 2016-2024 ExplosionAI GmbH, 2016 spaCy GmbH, 2015 Matthew Honnibal
Permission is hereby granted, free of charge, to any person obtaining a copy
of this software and associated documentation files (the "Software"), to deal


@ -1,9 +1,9 @@
recursive-include include *.h
recursive-include spacy *.txt *.pyx *.pxd
recursive-include spacy *.pyi *.pyx *.pxd *.txt *.cfg *.jinja *.toml *.hh
include LICENSE
include README.md
include bin/spacy
include pyproject.toml
recursive-exclude spacy/lang *.json
recursive-include spacy/lang *.json.gz
include spacy/py.typed
recursive-include spacy/cli *.yml
recursive-include spacy/tests *.json
recursive-include licenses *
recursive-exclude spacy *.cpp


@ -1,29 +1,55 @@
SHELL := /bin/bash
PYVER := 3.6
ifndef SPACY_EXTRAS
override SPACY_EXTRAS = spacy-lookups-data==1.0.3
endif
ifndef PYVER
override PYVER = 3.8
endif
VENV := ./env$(PYVER)
version := $(shell "bin/get-version.sh")
package := $(shell "bin/get-package.sh")
dist/spacy-$(version).pex : wheelhouse/spacy-$(version).stamp
$(VENV)/bin/pex -f ./wheelhouse --no-index --disable-cache -m spacy -o $@ spacy==$(version) jsonschema spacy-lookups-data jieba pkuseg==0.0.25 sudachipy sudachidict_core
ifndef SPACY_BIN
override SPACY_BIN = $(package)-$(version).pex
endif
ifndef WHEELHOUSE
override WHEELHOUSE = "./wheelhouse"
endif
dist/$(SPACY_BIN) : $(WHEELHOUSE)/spacy-$(PYVER)-$(version).stamp
$(VENV)/bin/pex \
-f $(WHEELHOUSE) \
--no-index \
--disable-cache \
-o $@ \
$(package)==$(version) \
$(SPACY_EXTRAS)
chmod a+rx $@
cp $@ dist/spacy.pex
dist/pytest.pex : wheelhouse/pytest-*.whl
$(VENV)/bin/pex -f ./wheelhouse --no-index --disable-cache -m pytest -o $@ pytest pytest-timeout mock
dist/pytest.pex : $(WHEELHOUSE)/pytest-*.whl
$(VENV)/bin/pex -f $(WHEELHOUSE) --no-index --disable-cache -m pytest -o $@ pytest pytest-timeout mock
chmod a+rx $@
wheelhouse/spacy-$(version).stamp : $(VENV)/bin/pex setup.py spacy/*.py* spacy/*/*.py*
$(VENV)/bin/pip wheel . -w ./wheelhouse
$(VENV)/bin/pip wheel jsonschema spacy-lookups-data jieba pkuseg==0.0.25 sudachipy sudachidict_core -w ./wheelhouse
$(WHEELHOUSE)/spacy-$(PYVER)-$(version).stamp : $(VENV)/bin/pex setup.py spacy/*.py* spacy/*/*.py*
$(VENV)/bin/pip wheel . -w $(WHEELHOUSE)
$(VENV)/bin/pip wheel $(SPACY_EXTRAS) -w $(WHEELHOUSE)
touch $@
wheelhouse/pytest-%.whl : $(VENV)/bin/pex
$(VENV)/bin/pip wheel pytest pytest-timeout mock -w ./wheelhouse
$(WHEELHOUSE)/pytest-%.whl : $(VENV)/bin/pex
$(VENV)/bin/pip wheel pytest pytest-timeout mock -w $(WHEELHOUSE)
$(VENV)/bin/pex :
python$(PYVER) -m venv $(VENV)
$(VENV)/bin/pip install -U pip setuptools pex wheel
$(VENV)/bin/pip install numpy
.PHONY : clean test
@ -33,6 +59,6 @@ test : dist/spacy-$(version).pex dist/pytest.pex
clean : setup.py
rm -rf dist/*
rm -rf ./wheelhouse
rm -rf $(WHEELHOUSE)/*
rm -rf $(VENV)
python setup.py clean --all

README.md

@ -2,105 +2,122 @@
# spaCy: Industrial-strength NLP
spaCy is a library for advanced Natural Language Processing in Python and
spaCy is a library for **advanced Natural Language Processing** in Python and
Cython. It's built on the very latest research, and was designed from day one to
be used in real products. spaCy comes with
[pretrained statistical models](https://spacy.io/models) and word vectors, and
currently supports tokenization for **60+ languages**. It features
state-of-the-art speed, convolutional **neural network models** for tagging,
parsing and **named entity recognition** and easy **deep learning** integration.
It's commercial open-source software, released under the MIT license.
be used in real products.
💫 **Version 2.3 out now!**
spaCy comes with [pretrained pipelines](https://spacy.io/models) and currently
supports tokenization and training for **70+ languages**. It features
state-of-the-art speed and **neural network models** for tagging, parsing,
**named entity recognition**, **text classification** and more, multi-task
learning with pretrained **transformers** like BERT, as well as a
production-ready [**training system**](https://spacy.io/usage/training) and easy
model packaging, deployment and workflow management. spaCy is commercial
open-source software, released under the
[MIT license](https://github.com/explosion/spaCy/blob/master/LICENSE).
💫 **Version 3.8 out now!**
[Check out the release notes here.](https://github.com/explosion/spaCy/releases)
🌙 **Version 3.0 (nightly) out now!**
[Check out the release notes here.](https://github.com/explosion/spaCy/releases/tag/v3.0.0rc1)
[![Azure Pipelines](<https://img.shields.io/azure-devops/build/explosion-ai/public/8/master.svg?logo=azure-pipelines&style=flat-square&label=build+(3.x)>)](https://dev.azure.com/explosion-ai/public/_build?definitionId=8)
[![Travis Build Status](<https://img.shields.io/travis/explosion/spaCy/master.svg?style=flat-square&logo=travis-ci&logoColor=white&label=build+(2.7)>)](https://travis-ci.org/explosion/spaCy)
[![tests](https://github.com/explosion/spaCy/actions/workflows/tests.yml/badge.svg)](https://github.com/explosion/spaCy/actions/workflows/tests.yml)
[![Current Release Version](https://img.shields.io/github/release/explosion/spacy.svg?style=flat-square&logo=github)](https://github.com/explosion/spaCy/releases)
[![pypi Version](https://img.shields.io/pypi/v/spacy.svg?style=flat-square&logo=pypi&logoColor=white)](https://pypi.org/project/spacy/)
[![conda Version](https://img.shields.io/conda/vn/conda-forge/spacy.svg?style=flat-square&logo=conda-forge&logoColor=white)](https://anaconda.org/conda-forge/spacy)
[![Python wheels](https://img.shields.io/badge/wheels-%E2%9C%93-4c1.svg?longCache=true&style=flat-square&logo=python&logoColor=white)](https://github.com/explosion/wheelwright/releases)
[![PyPi downloads](https://img.shields.io/pypi/dm/spacy?style=flat-square&logo=pypi&logoColor=white)](https://pypi.org/project/spacy/)
[![Conda downloads](https://img.shields.io/conda/dn/conda-forge/spacy?style=flat-square&logo=conda-forge&logoColor=white)](https://anaconda.org/conda-forge/spacy)
[![Model downloads](https://img.shields.io/github/downloads/explosion/spacy-models/total?style=flat-square&label=model+downloads)](https://github.com/explosion/spacy-models/releases)
[![Code style: black](https://img.shields.io/badge/code%20style-black-000000.svg?style=flat-square)](https://github.com/ambv/black)
[![spaCy on Twitter](https://img.shields.io/twitter/follow/spacy_io.svg?style=social&label=Follow)](https://twitter.com/spacy_io)
<br />
[![PyPi downloads](https://static.pepy.tech/personalized-badge/spacy?period=total&units=international_system&left_color=grey&right_color=orange&left_text=pip%20downloads)](https://pypi.org/project/spacy/)
[![Conda downloads](https://img.shields.io/conda/dn/conda-forge/spacy?label=conda%20downloads)](https://anaconda.org/conda-forge/spacy)
## 📖 Documentation
| Documentation | |
| --------------- | -------------------------------------------------------------- |
| [spaCy 101] | New to spaCy? Here's everything you need to know! |
| [Usage Guides] | How to use spaCy and its features. |
| [New in v2.3] | New features, backwards incompatibilities and migration guide. |
| [API Reference] | The detailed reference for spaCy's API. |
| [Models] | Download statistical language models for spaCy. |
| [Universe] | Libraries, extensions, demos, books and courses. |
| [Changelog] | Changes and version history. |
| [Contribute] | How to contribute to the spaCy project and code base. |
| Documentation | |
| ------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------- | -------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------- |
| ⭐️ **[spaCy 101]** | New to spaCy? Here's everything you need to know! |
| 📚 **[Usage Guides]** | How to use spaCy and its features. |
| 🚀 **[New in v3.0]** | New features, backwards incompatibilities and migration guide. |
| 🪐 **[Project Templates]** | End-to-end workflows you can clone, modify and run. |
| 🎛 **[API Reference]** | The detailed reference for spaCy's API. |
| ⏩ **[GPU Processing]** | Use spaCy with CUDA-compatible GPU processing. |
| 📦 **[Models]** | Download trained pipelines for spaCy. |
| 🦙 **[Large Language Models]** | Integrate LLMs into spaCy pipelines. |
| 🌌 **[Universe]** | Plugins, extensions, demos and books from the spaCy ecosystem. |
| ⚙️ **[spaCy VS Code Extension]** | Additional tooling and features for working with spaCy's config files. |
| 👩‍🏫 **[Online Course]** | Learn spaCy in this free and interactive online course. |
| 📰 **[Blog]** | Read about current spaCy and Prodigy development, releases, talks and more from Explosion. |
| 📺 **[Videos]** | Our YouTube channel with video tutorials, talks and more. |
| 🔴 **[Live Stream]** | Join Matt as he works on spaCy and chat about NLP, live every week. |
| 🛠 **[Changelog]** | Changes and version history. |
| 💝 **[Contribute]** | How to contribute to the spaCy project and code base. |
| 👕 **[Swag]** | Support us and our work with unique, custom-designed swag! |
| <a href="https://explosion.ai/tailored-solutions"><img src="https://github.com/explosion/spaCy/assets/13643239/36d2a42e-98c0-4599-90e1-788ef75181be" width="150" alt="Tailored Solutions"/></a> | Custom NLP consulting, implementation and strategic advice by spaCys core development team. Streamlined, production-ready, predictable and maintainable. Send us an email or take our 5-minute questionnaire, and well'be in touch! **[Learn more &rarr;](https://explosion.ai/tailored-solutions)** |
[spacy 101]: https://spacy.io/usage/spacy-101
[new in v2.3]: https://spacy.io/usage/v2-3
[new in v3.0]: https://spacy.io/usage/v3
[usage guides]: https://spacy.io/usage/
[api reference]: https://spacy.io/api/
[gpu processing]: https://spacy.io/usage#gpu
[models]: https://spacy.io/models
[large language models]: https://spacy.io/usage/large-language-models
[universe]: https://spacy.io/universe
[spacy vs code extension]: https://github.com/explosion/spacy-vscode
[videos]: https://www.youtube.com/c/ExplosionAI
[live stream]: https://www.youtube.com/playlist?list=PLBmcuObd5An5_iAxNYLJa_xWmNzsYce8c
[online course]: https://course.spacy.io
[blog]: https://explosion.ai
[project templates]: https://github.com/explosion/projects
[changelog]: https://spacy.io/usage#changelog
[contribute]: https://github.com/explosion/spaCy/blob/master/CONTRIBUTING.md
[swag]: https://explosion.ai/merch
## 💬 Where to ask questions
The spaCy project is maintained by [@honnibal](https://github.com/honnibal) and
[@ines](https://github.com/ines), along with core contributors
[@svlandeg](https://github.com/svlandeg) and
[@adrianeboyd](https://github.com/adrianeboyd). Please understand that we won't
be able to provide individual support via email. We also believe that help is
much more valuable if it's shared publicly, so that more people can benefit from
it.
The spaCy project is maintained by the [spaCy team](https://explosion.ai/about).
Please understand that we won't be able to provide individual support via email.
We also believe that help is much more valuable if it's shared publicly, so that
more people can benefit from it.
| Type | Platforms |
| ------------------------------- | --------------------------------------- |
| 🚨 **Bug Reports** | [GitHub Issue Tracker] |
| 🎁 **Feature Requests & Ideas** | [GitHub Discussions] |
| 🎁 **Feature Requests & Ideas** | [GitHub Discussions] · [Live Stream] |
| 👩‍💻 **Usage Questions** | [GitHub Discussions] · [Stack Overflow] |
| 🗯 **General Discussion** | [GitHub Discussions] |
| 🗯 **General Discussion** | [GitHub Discussions] · [Live Stream] |
[github issue tracker]: https://github.com/explosion/spaCy/issues
[github discussions]: https://github.com/explosion/spaCy/discussions
[stack overflow]: https://stackoverflow.com/questions/tagged/spacy
[live stream]: https://www.youtube.com/playlist?list=PLBmcuObd5An5_iAxNYLJa_xWmNzsYce8c
## Features
- Non-destructive **tokenization**
- **Named entity** recognition
- Support for **50+ languages**
- pretrained [statistical models](https://spacy.io/models) and word vectors
- Support for **70+ languages**
- **Trained pipelines** for different languages and tasks
- Multi-task learning with pretrained **transformers** like BERT
- Support for pretrained **word vectors** and embeddings
- State-of-the-art speed
- Easy **deep learning** integration
- Part-of-speech tagging
- Labelled dependency parsing
- Syntax-driven sentence segmentation
- Production-ready **training system**
- Linguistically-motivated **tokenization**
- Components for named **entity recognition**, part-of-speech-tagging,
dependency parsing, sentence segmentation, **text classification**,
lemmatization, morphological analysis, entity linking and more
- Easily extensible with **custom components** and attributes
- Support for custom models in **PyTorch**, **TensorFlow** and other frameworks
- Built in **visualizers** for syntax and NER
- Convenient string-to-hash mapping
- Export to numpy data arrays
- Efficient binary serialization
- Easy **model packaging** and deployment
- Easy **model packaging**, deployment and workflow management
- Robust, rigorously evaluated accuracy
📖 **For more details, see the
[facts, figures and benchmarks](https://spacy.io/usage/facts-figures).**
## Install spaCy
## Install spaCy
For detailed installation instructions, see the
[documentation](https://spacy.io/usage).
- **Operating system**: macOS / OS X · Linux · Windows (Cygwin, MinGW, Visual
Studio)
- **Python version**: Python 2.7, 3.5+ (only 64 bit)
- **Python version**: Python >=3.7, <3.13 (only 64 bit)
- **Package managers**: [pip] · [conda] (via `conda-forge`)
[pip]: https://pypi.org/project/spacy/
@ -108,30 +125,21 @@ For detailed installation instructions, see the
### pip
Using pip, spaCy releases are available as source packages and binary wheels (as
of `v2.0.13`). Before you install spaCy and its dependencies, make sure that
`pip`, `setuptools` and `wheel` are up to date.
Using pip, spaCy releases are available as source packages and binary wheels.
Before you install spaCy and its dependencies, make sure that your `pip`,
`setuptools` and `wheel` are up to date.
```bash
pip install -U pip setuptools wheel
pip install spacy
```
For installation on python 2.7 or 3.5 where binary wheels are not provided for
the most recent versions of the dependencies, you can prefer older binary
wheels over newer source packages with `--prefer-binary`:
```bash
pip install spacy --prefer-binary
```
To install additional data tables for lemmatization and normalization in
**spaCy v2.2+** you can run `pip install spacy[lookups]` or install
To install additional data tables for lemmatization and normalization you can
run `pip install spacy[lookups]` or install
[`spacy-lookups-data`](https://github.com/explosion/spacy-lookups-data)
separately. The lookups package is needed to create blank models with
lemmatization data for v2.2+ plus normalization data for v2.3+, and to
lemmatize in languages that don't yet come with pretrained models and aren't
powered by third-party libraries.
lemmatization data, and to lemmatize in languages that don't yet come with
pretrained models and aren't powered by third-party libraries.
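For example, creating a blank pipeline with a lookup-based lemmatizer only works once the lookups tables are installed (a minimal sketch following the documented lemmatizer usage; the language and example text are arbitrary):

```python
import spacy

nlp = spacy.blank("en")
# The "lookup" mode relies on the tables shipped in spacy-lookups-data.
nlp.add_pipe("lemmatizer", config={"mode": "lookup"})
nlp.initialize()  # loads the lookup tables
doc = nlp("The cats were running")
print([token.lemma_ for token in doc])
```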
When using pip it is generally recommended to install packages in a virtual
environment to avoid modifying system state:
@ -145,17 +153,14 @@ pip install spacy
### conda
Thanks to our great community, we've finally re-added conda support. You can now
install spaCy via `conda-forge`:
You can also install spaCy from `conda` via the `conda-forge` channel. For the
feedstock including the build recipe and configuration, check out
[this repository](https://github.com/conda-forge/spacy-feedstock).
```bash
conda install -c conda-forge spacy
```
For the feedstock including the build recipe and configuration, check out
[this repository](https://github.com/conda-forge/spacy-feedstock). Improvements
and pull requests to the recipe and setup are always appreciated.
### Updating spaCy
Some updates to spaCy may require downloading new statistical models. If you're
@ -172,37 +177,40 @@ If you've trained your own models, keep in mind that your training and runtime
inputs must match. After updating spaCy, we recommend **retraining your models**
with the new version.
📖 **For details on upgrading from spaCy 1.x to spaCy 2.x, see the
[migration guide](https://spacy.io/usage/v2#migrating).**
📖 **For details on upgrading from spaCy 2.x to spaCy 3.x, see the
[migration guide](https://spacy.io/usage/v3#migrating).**
## Download models
## 📦 Download model packages
As of v1.7.0, models for spaCy can be installed as **Python packages**. This
means that they're a component of your application, just like any other module.
Models can be installed using spaCy's `download` command, or manually by
pointing pip to a path or URL.
Trained pipelines for spaCy can be installed as **Python packages**. This means
that they're a component of your application, just like any other module. Models
can be installed using spaCy's [`download`](https://spacy.io/api/cli#download)
command, or manually by pointing pip to a path or URL.
| Documentation | |
| ---------------------- | ------------------------------------------------------------- |
| [Available Models] | Detailed model descriptions, accuracy figures and benchmarks. |
| [Models Documentation] | Detailed usage instructions. |
| Documentation | |
| -------------------------- | ---------------------------------------------------------------- |
| **[Available Pipelines]** | Detailed pipeline descriptions, accuracy figures and benchmarks. |
| **[Models Documentation]** | Detailed usage and installation instructions. |
| **[Training]** | How to train your own pipelines on your data. |
[available models]: https://spacy.io/models
[models documentation]: https://spacy.io/docs/usage/models
[available pipelines]: https://spacy.io/models
[models documentation]: https://spacy.io/usage/models
[training]: https://spacy.io/usage/training
```bash
# download best-matching version of specific model for your spaCy installation
# Download best-matching version of specific model for your spaCy installation
python -m spacy download en_core_web_sm
# pip install .tar.gz archive from path or URL
pip install /Users/you/en_core_web_sm-2.2.0.tar.gz
pip install https://github.com/explosion/spacy-models/releases/download/en_core_web_sm-2.2.0/en_core_web_sm-2.2.0.tar.gz
# pip install .tar.gz archive or .whl from path or URL
pip install /Users/you/en_core_web_sm-3.0.0.tar.gz
pip install /Users/you/en_core_web_sm-3.0.0-py3-none-any.whl
pip install https://github.com/explosion/spacy-models/releases/download/en_core_web_sm-3.0.0/en_core_web_sm-3.0.0.tar.gz
```
### Loading and using models
To load a model, use `spacy.load()` with the model name, a shortcut link or a
path to the model data directory.
To load a model, use [`spacy.load()`](https://spacy.io/api/top-level#spacy.load)
with the model name or a path to the model data directory.
```python
import spacy
@ -224,7 +232,7 @@ doc = nlp("This is a sentence.")
📖 **For more info and examples, check out the
[models documentation](https://spacy.io/docs/usage/models).**
## Compile from source
## Compile from source
The other way to install spaCy is to clone its
[GitHub repository](https://github.com/explosion/spaCy) and build it from
@ -234,8 +242,18 @@ Python distribution including header files, a compiler,
[pip](https://pip.pypa.io/en/latest/installing/),
[virtualenv](https://virtualenv.pypa.io/en/latest/) and
[git](https://git-scm.com) installed. The compiler part is the trickiest. How to
do that depends on your system. See notes on Ubuntu, OS X and Windows for
details.
do that depends on your system.
| Platform | |
| ----------- | ------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------- |
| **Ubuntu** | Install system-level dependencies via `apt-get`: `sudo apt-get install build-essential python-dev git` . |
| **Mac** | Install a recent version of [XCode](https://developer.apple.com/xcode/), including the so-called "Command Line Tools". macOS and OS X ship with Python and git preinstalled. |
| **Windows** | Install a version of the [Visual C++ Build Tools](https://visualstudio.microsoft.com/visual-cpp-build-tools/) or [Visual Studio Express](https://visualstudio.microsoft.com/vs/express/) that matches the version that was used to compile your Python interpreter. |
For more details and instructions, see the documentation on
[compiling spaCy from source](https://spacy.io/usage#source) and the
[quickstart widget](https://spacy.io/usage#section-quickstart) to get the right
commands for your platform and Python version.
```bash
git clone https://github.com/explosion/spaCy
@ -247,66 +265,28 @@ source .env/bin/activate
# make sure you are using the latest pip
python -m pip install -U pip setuptools wheel
pip install .
pip install -r requirements.txt
pip install --no-build-isolation --editable .
```
To install with extras:
```bash
pip install .[lookups,cuda102]
pip install --no-build-isolation --editable .[lookups,cuda102]
```
To install all dependencies required for development:
```bash
pip install -r requirements.txt
```
Compared to regular install via pip, [requirements.txt](requirements.txt)
additionally installs developer dependencies such as Cython. For more details
and instructions, see the documentation on
[compiling spaCy from source](https://spacy.io/usage#source) and the
[quickstart widget](https://spacy.io/usage#section-quickstart) to get the right
commands for your platform and Python version.
### Ubuntu
Install system-level dependencies via `apt-get`:
```bash
sudo apt-get install build-essential python-dev git
```
### macOS / OS X
Install a recent version of [XCode](https://developer.apple.com/xcode/),
including the so-called "Command Line Tools". macOS and OS X ship with Python
and git preinstalled.
### Windows
Install a version of the
[Visual C++ Build Tools](https://visualstudio.microsoft.com/visual-cpp-build-tools/)
or [Visual Studio Express](https://visualstudio.microsoft.com/vs/express/) that
matches the version that was used to compile your Python interpreter. For
official distributions these are VS 2008 (Python 2.7), VS 2010 (Python 3.4) and
VS 2015 (Python 3.5).
## Run tests
## 🚦 Run tests
spaCy comes with an [extensive test suite](spacy/tests). In order to run the
tests, you'll usually want to clone the repository and build spaCy from source.
This will also install the required development dependencies and test utilities
defined in the `requirements.txt`.
defined in the [`requirements.txt`](requirements.txt).
Alternatively, you can run `pytest` on the tests from within the installed
`spacy` package. Don't forget to also install the test utilities via spaCy's
`requirements.txt`:
[`requirements.txt`](requirements.txt):
```bash
pip install -r requirements.txt
python -m pytest --pyargs spacy
```
See [the documentation](https://spacy.io/usage#tests) for more details and
examples.


@ -1,133 +0,0 @@
trigger:
batch: true
branches:
include:
- '*'
exclude:
- 'spacy.io'
paths:
exclude:
- 'website/*'
- '*.md'
pr:
paths:
exclude:
- 'website/*'
- '*.md'
jobs:
# Perform basic checks for most important errors (syntax etc.) Uses the config
# defined in .flake8 and overwrites the selected codes.
- job: 'Validate'
pool:
vmImage: 'ubuntu-16.04'
steps:
- task: UsePythonVersion@0
inputs:
versionSpec: '3.7'
- script: |
pip install flake8
python -m flake8 spacy --count --select=E901,E999,F821,F822,F823 --show-source --statistics
displayName: 'flake8'
- job: 'Test'
dependsOn: 'Validate'
strategy:
matrix:
Python35Linux:
imageName: 'ubuntu-16.04'
python.version: '3.5'
os: linux
Python35Windows:
imageName: 'vs2017-win2016'
python.version: '3.5'
# Test on one OS per python 3.6/3.7/3.8 to speed up CI
Python36Linux:
imageName: 'ubuntu-16.04'
python.version: '3.6'
# Python36Windows:
# imageName: 'vs2017-win2016'
# python.version: '3.6'
# Python36Mac:
# imageName: 'macos-10.14'
# python.version: '3.6'
# Python37Linux:
# imageName: 'ubuntu-16.04'
# python.version: '3.7'
Python37Windows:
imageName: 'vs2017-win2016'
python.version: '3.7'
# Python37Mac:
# imageName: 'macos-10.14'
# python.version: '3.7'
# Python38Linux:
# imageName: 'ubuntu-16.04'
# python.version: '3.8'
# Python38Windows:
# imageName: 'vs2017-win2016'
# python.version: '3.8'
Python38Mac:
imageName: 'macos-10.14'
python.version: '3.8'
Python39Linux:
imageName: 'ubuntu-16.04'
python.version: '3.9'
Python39Windows:
imageName: 'vs2017-win2016'
python.version: '3.9'
Python39Mac:
imageName: 'macos-10.14'
python.version: '3.9'
maxParallel: 4
pool:
vmImage: $(imageName)
steps:
- task: UsePythonVersion@0
inputs:
versionSpec: '$(python.version)'
architecture: 'x64'
- script: python -m pip install -U pip setuptools
displayName: 'Update pip'
- script: pip install -r requirements.txt --prefer-binary
displayName: 'Install dependencies (python 3.5: prefer binary)'
condition: eq(variables['python.version'], '3.5')
- script: pip install -r requirements.txt
displayName: 'Install dependencies'
condition: not(eq(variables['python.version'], '3.5'))
- script: |
python setup.py build_ext --inplace -j 2
python setup.py sdist --formats=gztar
displayName: 'Compile and build sdist'
- task: DeleteFiles@1
inputs:
contents: 'spacy'
displayName: 'Delete source directory'
- script: |
pip freeze > installed.txt
pip uninstall -y -r installed.txt
displayName: 'Uninstall all packages'
- bash: |
SDIST=$(python -c "import os;print(os.listdir('./dist')[-1])" 2>&1)
pip install dist/$SDIST --prefer-binary
displayName: 'Install from sdist (python 3.5: prefer binary)'
condition: eq(variables['python.version'], '3.5')
- bash: |
SDIST=$(python -c "import os;print(os.listdir('./dist')[-1])" 2>&1)
pip install dist/$SDIST
displayName: 'Install from sdist'
condition: not(eq(variables['python.version'], '3.5'))
- script: |
pip install -r requirements.txt --prefer-binary
python -m pytest --pyargs spacy
displayName: 'Run tests'


@ -1,169 +0,0 @@
#!/usr/bin/env python
""" cythonize.py
Cythonize pyx files into C++ files as needed.
Usage: cythonize.py [root]
Checks pyx files to see if they have been changed relative to their
corresponding C++ files. If they have, then runs cython on these files to
recreate the C++ files.
Additionally, checks pxd files and setup.py if they have been changed. If
they have, rebuilds everything.
Change detection based on file hashes stored in JSON format.
For now, this script should be run by developers when changing Cython files
and the resulting C++ files checked in, so that end-users (and Python-only
developers) do not get the Cython dependencies.
Based upon:
https://raw.github.com/dagss/private-scipy-refactor/cythonize/cythonize.py
https://raw.githubusercontent.com/numpy/numpy/master/tools/cythonize.py
Note: this script does not check any of the dependent C++ libraries.
"""
from __future__ import print_function
import os
import sys
import json
import hashlib
import subprocess
import argparse
HASH_FILE = "cythonize.json"
def process_pyx(fromfile, tofile, language_level="-2"):
print("Processing %s" % fromfile)
try:
from Cython.Compiler.Version import version as cython_version
from distutils.version import LooseVersion
if LooseVersion(cython_version) < LooseVersion("0.19"):
raise Exception("Require Cython >= 0.19")
except ImportError:
pass
flags = ["--fast-fail", language_level]
if tofile.endswith(".cpp"):
flags += ["--cplus"]
try:
try:
r = subprocess.call(
["cython"] + flags + ["-o", tofile, fromfile], env=os.environ
) # See Issue #791
if r != 0:
raise Exception("Cython failed")
except OSError:
# There are ways of installing Cython that don't result in a cython
# executable on the path, see gh-2397.
r = subprocess.call(
[
sys.executable,
"-c",
"import sys; from Cython.Compiler.Main import "
"setuptools_main as main; sys.exit(main())",
]
+ flags
+ ["-o", tofile, fromfile]
)
if r != 0:
raise Exception("Cython failed")
except OSError:
raise OSError("Cython needs to be installed")
def preserve_cwd(path, func, *args):
orig_cwd = os.getcwd()
try:
os.chdir(path)
func(*args)
finally:
os.chdir(orig_cwd)
def load_hashes(filename):
try:
return json.load(open(filename))
except (ValueError, IOError):
return {}
def save_hashes(hash_db, filename):
with open(filename, "w") as f:
f.write(json.dumps(hash_db))
def get_hash(path):
return hashlib.md5(open(path, "rb").read()).hexdigest()
def hash_changed(base, path, db):
full_path = os.path.normpath(os.path.join(base, path))
return not get_hash(full_path) == db.get(full_path)
def hash_add(base, path, db):
full_path = os.path.normpath(os.path.join(base, path))
db[full_path] = get_hash(full_path)
def process(base, filename, db):
root, ext = os.path.splitext(filename)
if ext in [".pyx", ".cpp"]:
if hash_changed(base, filename, db) or not os.path.isfile(
os.path.join(base, root + ".cpp")
):
preserve_cwd(base, process_pyx, root + ".pyx", root + ".cpp")
hash_add(base, root + ".cpp", db)
hash_add(base, root + ".pyx", db)
def check_changes(root, db):
res = False
new_db = {}
setup_filename = "setup.py"
hash_add(".", setup_filename, new_db)
if hash_changed(".", setup_filename, db):
res = True
for base, _, files in os.walk(root):
for filename in files:
if filename.endswith(".pxd"):
hash_add(base, filename, new_db)
if hash_changed(base, filename, db):
res = True
if res:
db.clear()
db.update(new_db)
return res
def run(root):
db = load_hashes(HASH_FILE)
try:
check_changes(root, db)
for base, _, files in os.walk(root):
for filename in files:
process(base, filename, db)
finally:
save_hashes(db, HASH_FILE)
if __name__ == "__main__":
parser = argparse.ArgumentParser(
description="Cythonize pyx files into C++ files as needed"
)
parser.add_argument("root", help="root directory")
args = parser.parse_args()
run(args.root)

bin/get-package.sh Executable file

@ -0,0 +1,12 @@
#!/usr/bin/env bash
set -e
version=$(grep "__title__ = " spacy/about.py)
version=${version/__title__ = }
version=${version/\'/}
version=${version/\'/}
version=${version/\"/}
version=${version/\"/}
echo $version


@ -1,97 +0,0 @@
# coding: utf8
from __future__ import unicode_literals
import bz2
import re
import srsly
import sys
import random
import datetime
import plac
from pathlib import Path
_unset = object()
class Reddit(object):
"""Stream cleaned comments from Reddit."""
pre_format_re = re.compile(r"^[`*~]")
post_format_re = re.compile(r"[`*~]$")
url_re = re.compile(r"\[([^]]+)\]\(%%URL\)")
link_re = re.compile(r"\[([^]]+)\]\(https?://[^\)]+\)")
def __init__(self, file_path, meta_keys={"subreddit": "section"}):
"""
file_path (unicode / Path): Path to archive or directory of archives.
meta_keys (dict): Meta data key included in the Reddit corpus, mapped
to display name in Prodigy meta.
RETURNS (Reddit): The Reddit loader.
"""
self.meta = meta_keys
file_path = Path(file_path)
if not file_path.exists():
raise IOError("Can't find file path: {}".format(file_path))
if not file_path.is_dir():
self.files = [file_path]
else:
self.files = list(file_path.iterdir())
def __iter__(self):
for file_path in self.iter_files():
with bz2.open(str(file_path)) as f:
for line in f:
line = line.strip()
if not line:
continue
comment = srsly.json_loads(line)
if self.is_valid(comment):
text = self.strip_tags(comment["body"])
yield {"text": text}
def get_meta(self, item):
return {name: item.get(key, "n/a") for key, name in self.meta.items()}
def iter_files(self):
for file_path in self.files:
yield file_path
def strip_tags(self, text):
text = self.link_re.sub(r"\1", text)
text = text.replace("&gt;", ">").replace("&lt;", "<")
text = self.pre_format_re.sub("", text)
text = self.post_format_re.sub("", text)
text = re.sub(r"\s+", " ", text)
return text.strip()
def is_valid(self, comment):
return (
comment["body"] is not None
and comment["body"] != "[deleted]"
and comment["body"] != "[removed]"
)
def main(path):
reddit = Reddit(path)
for comment in reddit:
print(srsly.json_dumps(comment))
if __name__ == "__main__":
import socket
try:
BrokenPipeError
except NameError:
BrokenPipeError = socket.error
try:
plac.call(main)
except BrokenPipeError:
import os, sys
# Python flushes standard streams on exit; redirect remaining output
# to devnull to avoid another BrokenPipeError at shutdown
devnull = os.open(os.devnull, os.O_WRONLY)
os.dup2(devnull, sys.stdout.fileno())
sys.exit(1) # Python exits with error code 1 on EPIPE

bin/release.sh Executable file

@ -0,0 +1,20 @@
#!/usr/bin/env bash
set -e
# Insist repository is clean
git diff-index --quiet HEAD
version=$(grep "__version__ = " spacy/about.py)
version=${version/__version__ = }
version=${version/\'/}
version=${version/\'/}
version=${version/\"/}
version=${version/\"/}
echo "Pushing release-v"$version
git tag -d release-v$version || true
git push origin :release-v$version || true
git tag release-v$version
git push origin release-v$version


@ -1,2 +0,0 @@
#! /bin/sh
python -m spacy "$@"


@ -1,81 +0,0 @@
#!/usr/bin/env python
from __future__ import print_function, unicode_literals, division
import logging
from pathlib import Path
from collections import defaultdict
from gensim.models import Word2Vec
import plac
import spacy
logger = logging.getLogger(__name__)
class Corpus(object):
def __init__(self, directory, nlp):
self.directory = directory
self.nlp = nlp
def __iter__(self):
for text_loc in iter_dir(self.directory):
with text_loc.open("r", encoding="utf-8") as file_:
text = file_.read()
# This is to keep the input to the blank model (which doesn't
# sentencize) from being too long. It works particularly well with
# the output of [WikiExtractor](https://github.com/attardi/wikiextractor)
paragraphs = text.split('\n\n')
for par in paragraphs:
yield [word.orth_ for word in self.nlp(par)]
def iter_dir(loc):
dir_path = Path(loc)
for fn_path in dir_path.iterdir():
if fn_path.is_dir():
for sub_path in fn_path.iterdir():
yield sub_path
else:
yield fn_path
@plac.annotations(
lang=("ISO language code"),
in_dir=("Location of input directory"),
out_loc=("Location of output file"),
n_workers=("Number of workers", "option", "n", int),
size=("Dimension of the word vectors", "option", "d", int),
window=("Context window size", "option", "w", int),
min_count=("Min count", "option", "m", int),
negative=("Number of negative samples", "option", "g", int),
nr_iter=("Number of iterations", "option", "i", int),
)
def main(
lang,
in_dir,
out_loc,
negative=5,
n_workers=4,
window=5,
size=128,
min_count=10,
nr_iter=5,
):
logging.basicConfig(
format="%(asctime)s : %(levelname)s : %(message)s", level=logging.INFO
)
nlp = spacy.blank(lang)
corpus = Corpus(in_dir, nlp)
model = Word2Vec(
sentences=corpus,
size=size,
window=window,
min_count=min_count,
workers=n_workers,
sample=1e-5,
negative=negative,
)
model.save(out_loc)
if __name__ == "__main__":
plac.call(main)

View File

@ -1,2 +0,0 @@
from .conll17_ud_eval import main as ud_evaluate # noqa: F401
from .ud_train import main as ud_train # noqa: F401

View File

@ -1,614 +0,0 @@
#!/usr/bin/env python
# flake8: noqa
# CoNLL 2017 UD Parsing evaluation script.
#
# Compatible with Python 2.7 and 3.2+, can be used either as a module
# or a standalone executable.
#
# Copyright 2017 Institute of Formal and Applied Linguistics (UFAL),
# Faculty of Mathematics and Physics, Charles University, Czech Republic.
#
# Changelog:
# - [02 Jan 2017] Version 0.9: Initial release
# - [25 Jan 2017] Version 0.9.1: Fix bug in LCS alignment computation
# - [10 Mar 2017] Version 1.0: Add documentation and test
# Compare HEADs correctly using aligned words
# Allow evaluation with erroneous spaces in forms
# Compare forms in LCS case insensitively
# Detect cycles and multiple root nodes
# Compute AlignedAccuracy
# Command line usage
# ------------------
# conll17_ud_eval.py [-v] [-w weights_file] gold_conllu_file system_conllu_file
#
# - if no -v is given, only the CoNLL17 UD Shared Task evaluation LAS metric
# is printed
# - if -v is given, several metrics are printed (as precision, recall, F1 score,
# and, in case the metric is computed on aligned words, also accuracy on these):
# - Tokens: how well do the gold tokens match system tokens
# - Sentences: how well do the gold sentences match system sentences
# - Words: how well can the gold words be aligned to system words
# - UPOS: using aligned words, how well does UPOS match
# - XPOS: using aligned words, how well does XPOS match
# - Feats: using aligned words, how well does FEATS match
# - AllTags: using aligned words, how well does UPOS+XPOS+FEATS match
# - Lemmas: using aligned words, how well does LEMMA match
# - UAS: using aligned words, how well does HEAD match
# - LAS: using aligned words, how well does HEAD+DEPREL(ignoring subtypes) match
# - if weights_file is given (with lines containing deprel-weight pairs),
# one more metric is shown:
# - WeightedLAS: as LAS, but each deprel (ignoring subtypes) has different weight
# API usage
# ---------
# - load_conllu(file)
# - loads CoNLL-U file from given file object to an internal representation
# - the file object should return str on both Python 2 and Python 3
# - raises UDError exception if the given file cannot be loaded
# - evaluate(gold_ud, system_ud)
# - evaluate the given gold and system CoNLL-U files (loaded with load_conllu)
# - raises UDError if the concatenated tokens of gold and system file do not match
# - returns a dictionary with the metrics described above, each metric having
# four fields: precision, recall, f1 and aligned_accuracy (when using aligned
# words, otherwise this is None)
# Description of token matching
# -----------------------------
# In order to match tokens of gold file and system file, we consider the text
# resulting from concatenation of gold tokens and text resulting from
# concatenation of system tokens. These texts should match -- if they do not,
# the evaluation fails.
#
# If the texts do match, every token is represented as a range in this original
# text, and tokens are equal only if their range is the same.
# Description of word matching
# ----------------------------
# When matching words of gold file and system file, we first match the tokens.
# The words which are also tokens are matched as tokens, but words in multi-word
# tokens have to be handled differently.
#
# To handle multi-word tokens, we start by finding "multi-word spans".
# Multi-word span is a span in the original text such that
# - it contains at least one multi-word token
# - all multi-word tokens in the span (considering both gold and system ones)
# are completely inside the span (i.e., they do not "stick out")
# - the multi-word span is as small as possible
#
# For every multi-word span, we align the gold and system words completely
# inside this span using LCS on their FORMs. The words not intersecting
# (even partially) any multi-word span are then aligned as tokens.
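# A short worked example (hypothetical forms) of the matching described above:
# suppose the gold file contains the multi-word token "abc" expanding to the
# words "a", "b" and "c", while the system file contains the three plain tokens
# "a", "b", "c". Both files concatenate to the text "abc", so token matching
# succeeds; the gold multi-word token opens a multi-word span covering "abc",
# inside which the gold words a/b/c are aligned to the system words a/b/c by
# LCS on their FORMs, giving a Words score of 3 correct out of 3 gold and
# 3 system words (F1 = 1.0).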
from __future__ import division
from __future__ import print_function
import argparse
import io
import sys
import unittest
# CoNLL-U column names
ID, FORM, LEMMA, UPOS, XPOS, FEATS, HEAD, DEPREL, DEPS, MISC = range(10)
# UD Error is used when raising exceptions in this module
class UDError(Exception):
pass
# Load given CoNLL-U file into internal representation
def load_conllu(file, check_parse=True):
# Internal representation classes
class UDRepresentation:
def __init__(self):
# Characters of all the tokens in the whole file.
# Whitespace between tokens is not included.
self.characters = []
# List of UDSpan instances with start&end indices into `characters`.
self.tokens = []
# List of UDWord instances.
self.words = []
# List of UDSpan instances with start&end indices into `characters`.
self.sentences = []
class UDSpan:
def __init__(self, start, end, characters):
self.start = start
# Note that self.end marks the first position **after the end** of span,
# so we can use characters[start:end] or range(start, end).
self.end = end
self.characters = characters
@property
def text(self):
return ''.join(self.characters[self.start:self.end])
def __str__(self):
return self.text
def __repr__(self):
return self.text
class UDWord:
def __init__(self, span, columns, is_multiword):
# Span of this word (or MWT, see below) within ud_representation.characters.
self.span = span
# 10 columns of the CoNLL-U file: ID, FORM, LEMMA,...
self.columns = columns
# is_multiword==True means that this word is part of a multi-word token.
# In that case, self.span marks the span of the whole multi-word token.
self.is_multiword = is_multiword
# Reference to the UDWord instance representing the HEAD (or None if root).
self.parent = None
# Let's ignore language-specific deprel subtypes.
self.columns[DEPREL] = columns[DEPREL].split(':')[0]
ud = UDRepresentation()
# Load the CoNLL-U file
index, sentence_start = 0, None
linenum = 0
while True:
line = file.readline()
linenum += 1
if not line:
break
line = line.rstrip("\r\n")
# Handle sentence start boundaries
if sentence_start is None:
# Skip comments
if line.startswith("#"):
continue
# Start a new sentence
ud.sentences.append(UDSpan(index, 0, ud.characters))
sentence_start = len(ud.words)
if not line:
# Add parent UDWord links and check there are no cycles
def process_word(word):
if word.parent == "remapping":
raise UDError("There is a cycle in a sentence")
if word.parent is None:
head = int(word.columns[HEAD])
if head > len(ud.words) - sentence_start:
raise UDError("Line {}: HEAD '{}' points outside of the sentence".format(
linenum, word.columns[HEAD]))
if head:
parent = ud.words[sentence_start + head - 1]
word.parent = "remapping"
process_word(parent)
word.parent = parent
for word in ud.words[sentence_start:]:
process_word(word)
# Check there is a single root node
if check_parse:
if len([word for word in ud.words[sentence_start:] if word.parent is None]) != 1:
raise UDError("There are multiple roots in a sentence")
# End the sentence
ud.sentences[-1].end = index
sentence_start = None
continue
# Read next token/word
columns = line.split("\t")
if len(columns) != 10:
raise UDError("The CoNLL-U line {} does not contain 10 tab-separated columns: '{}'".format(linenum, line))
# Skip empty nodes
if "." in columns[ID]:
continue
# Delete spaces from FORM so gold.characters == system.characters
# even if one of them tokenizes the space.
columns[FORM] = columns[FORM].replace(" ", "")
if not columns[FORM]:
raise UDError("There is an empty FORM in the CoNLL-U file -- line %d" % linenum)
# Save token
ud.characters.extend(columns[FORM])
ud.tokens.append(UDSpan(index, index + len(columns[FORM]), ud.characters))
index += len(columns[FORM])
# Handle multi-word tokens to save word(s)
if "-" in columns[ID]:
try:
start, end = map(int, columns[ID].split("-"))
except:
raise UDError("Cannot parse multi-word token ID '{}'".format(columns[ID]))
for _ in range(start, end + 1):
word_line = file.readline().rstrip("\r\n")
word_columns = word_line.split("\t")
if len(word_columns) != 10:
print(columns)
raise UDError("The CoNLL-U line {} does not contain 10 tab-separated columns: '{}'".format(linenum, word_line))
ud.words.append(UDWord(ud.tokens[-1], word_columns, is_multiword=True))
# Basic tokens/words
else:
try:
word_id = int(columns[ID])
except:
raise UDError("Cannot parse word ID '{}'".format(columns[ID]))
if word_id != len(ud.words) - sentence_start + 1:
raise UDError("Incorrect word ID '{}' for word '{}', expected '{}'".format(columns[ID], columns[FORM], len(ud.words) - sentence_start + 1))
try:
head_id = int(columns[HEAD])
except:
raise UDError("Cannot parse HEAD '{}'".format(columns[HEAD]))
if head_id < 0:
raise UDError("HEAD cannot be negative")
ud.words.append(UDWord(ud.tokens[-1], columns, is_multiword=False))
if sentence_start is not None:
raise UDError("The CoNLL-U file does not end with empty line")
return ud
# Evaluate the gold and system treebanks (loaded using load_conllu).
def evaluate(gold_ud, system_ud, deprel_weights=None, check_parse=True):
class Score:
def __init__(self, gold_total, system_total, correct, aligned_total=None, undersegmented=None, oversegmented=None):
self.precision = correct / system_total if system_total else 0.0
self.recall = correct / gold_total if gold_total else 0.0
self.f1 = 2 * correct / (system_total + gold_total) if system_total + gold_total else 0.0
self.aligned_accuracy = correct / aligned_total if aligned_total else aligned_total
self.undersegmented = undersegmented
self.oversegmented = oversegmented
self.under_perc = len(undersegmented) / gold_total if gold_total and undersegmented else 0.0
self.over_perc = len(oversegmented) / gold_total if gold_total and oversegmented else 0.0
class AlignmentWord:
def __init__(self, gold_word, system_word):
self.gold_word = gold_word
self.system_word = system_word
self.gold_parent = None
self.system_parent_gold_aligned = None
class Alignment:
def __init__(self, gold_words, system_words):
self.gold_words = gold_words
self.system_words = system_words
self.matched_words = []
self.matched_words_map = {}
def append_aligned_words(self, gold_word, system_word):
self.matched_words.append(AlignmentWord(gold_word, system_word))
self.matched_words_map[system_word] = gold_word
def fill_parents(self):
# We represent root parents in both gold and system data by '0'.
# For gold data, we represent a non-root parent by the corresponding gold word.
# For system data, we represent a non-root parent by the gold word aligned
# to the parent system node, or by None if no gold word is aligned to the parent.
for words in self.matched_words:
words.gold_parent = words.gold_word.parent if words.gold_word.parent is not None else 0
words.system_parent_gold_aligned = self.matched_words_map.get(words.system_word.parent, None) \
if words.system_word.parent is not None else 0
def lower(text):
if sys.version_info < (3, 0) and isinstance(text, str):
return text.decode("utf-8").lower()
return text.lower()
def spans_score(gold_spans, system_spans):
correct, gi, si = 0, 0, 0
undersegmented = []
oversegmented = []
combo = 0
previous_end_si_earlier = False
previous_end_gi_earlier = False
while gi < len(gold_spans) and si < len(system_spans):
previous_si = system_spans[si-1] if si > 0 else None
previous_gi = gold_spans[gi-1] if gi > 0 else None
if system_spans[si].start < gold_spans[gi].start:
# avoid counting the same mistake twice
if not previous_end_si_earlier:
combo += 1
oversegmented.append(str(previous_gi).strip())
si += 1
elif gold_spans[gi].start < system_spans[si].start:
# avoid counting the same mistake twice
if not previous_end_gi_earlier:
combo += 1
undersegmented.append(str(previous_si).strip())
gi += 1
else:
correct += gold_spans[gi].end == system_spans[si].end
if gold_spans[gi].end < system_spans[si].end:
undersegmented.append(str(system_spans[si]).strip())
previous_end_gi_earlier = True
previous_end_si_earlier = False
elif gold_spans[gi].end > system_spans[si].end:
oversegmented.append(str(gold_spans[gi]).strip())
previous_end_si_earlier = True
previous_end_gi_earlier = False
else:
previous_end_gi_earlier = False
previous_end_si_earlier = False
si += 1
gi += 1
return Score(len(gold_spans), len(system_spans), correct, None, undersegmented, oversegmented)
def alignment_score(alignment, key_fn, weight_fn=lambda w: 1):
gold, system, aligned, correct = 0, 0, 0, 0
for word in alignment.gold_words:
gold += weight_fn(word)
for word in alignment.system_words:
system += weight_fn(word)
for words in alignment.matched_words:
aligned += weight_fn(words.gold_word)
if key_fn is None:
# Return score for whole aligned words
return Score(gold, system, aligned)
for words in alignment.matched_words:
if key_fn(words.gold_word, words.gold_parent) == key_fn(words.system_word, words.system_parent_gold_aligned):
correct += weight_fn(words.gold_word)
return Score(gold, system, correct, aligned)
def beyond_end(words, i, multiword_span_end):
if i >= len(words):
return True
if words[i].is_multiword:
return words[i].span.start >= multiword_span_end
return words[i].span.end > multiword_span_end
def extend_end(word, multiword_span_end):
if word.is_multiword and word.span.end > multiword_span_end:
return word.span.end
return multiword_span_end
def find_multiword_span(gold_words, system_words, gi, si):
# We know gold_words[gi].is_multiword or system_words[si].is_multiword.
# Find the start of the multiword span (gs, ss), so the multiword span is minimal.
# Initialize multiword_span_end characters index.
if gold_words[gi].is_multiword:
multiword_span_end = gold_words[gi].span.end
if not system_words[si].is_multiword and system_words[si].span.start < gold_words[gi].span.start:
si += 1
else: # if system_words[si].is_multiword
multiword_span_end = system_words[si].span.end
if not gold_words[gi].is_multiword and gold_words[gi].span.start < system_words[si].span.start:
gi += 1
gs, ss = gi, si
# Find the end of the multiword span
# (so both gi and si are pointing to the word following the multiword span end).
while not beyond_end(gold_words, gi, multiword_span_end) or \
not beyond_end(system_words, si, multiword_span_end):
if gi < len(gold_words) and (si >= len(system_words) or
gold_words[gi].span.start <= system_words[si].span.start):
multiword_span_end = extend_end(gold_words[gi], multiword_span_end)
gi += 1
else:
multiword_span_end = extend_end(system_words[si], multiword_span_end)
si += 1
return gs, ss, gi, si
def compute_lcs(gold_words, system_words, gi, si, gs, ss):
lcs = [[0] * (si - ss) for i in range(gi - gs)]
for g in reversed(range(gi - gs)):
for s in reversed(range(si - ss)):
if lower(gold_words[gs + g].columns[FORM]) == lower(system_words[ss + s].columns[FORM]):
lcs[g][s] = 1 + (lcs[g+1][s+1] if g+1 < gi-gs and s+1 < si-ss else 0)
lcs[g][s] = max(lcs[g][s], lcs[g+1][s] if g+1 < gi-gs else 0)
lcs[g][s] = max(lcs[g][s], lcs[g][s+1] if s+1 < si-ss else 0)
return lcs
def align_words(gold_words, system_words):
alignment = Alignment(gold_words, system_words)
gi, si = 0, 0
while gi < len(gold_words) and si < len(system_words):
if gold_words[gi].is_multiword or system_words[si].is_multiword:
# A: Multi-word tokens => align via LCS within the whole "multiword span".
gs, ss, gi, si = find_multiword_span(gold_words, system_words, gi, si)
if si > ss and gi > gs:
lcs = compute_lcs(gold_words, system_words, gi, si, gs, ss)
# Store aligned words
s, g = 0, 0
while g < gi - gs and s < si - ss:
if lower(gold_words[gs + g].columns[FORM]) == lower(system_words[ss + s].columns[FORM]):
alignment.append_aligned_words(gold_words[gs+g], system_words[ss+s])
g += 1
s += 1
elif lcs[g][s] == (lcs[g+1][s] if g+1 < gi-gs else 0):
g += 1
else:
s += 1
else:
# B: No multi-word token => align according to spans.
if (gold_words[gi].span.start, gold_words[gi].span.end) == (system_words[si].span.start, system_words[si].span.end):
alignment.append_aligned_words(gold_words[gi], system_words[si])
gi += 1
si += 1
elif gold_words[gi].span.start <= system_words[si].span.start:
gi += 1
else:
si += 1
alignment.fill_parents()
return alignment
# Check that underlying character sequences do match
if gold_ud.characters != system_ud.characters:
index = 0
while gold_ud.characters[index] == system_ud.characters[index]:
index += 1
raise UDError(
"The concatenation of tokens in gold file and in system file differ!\n" +
"First 20 differing characters in gold file: '{}' and system file: '{}'".format(
"".join(gold_ud.characters[index:index + 20]),
"".join(system_ud.characters[index:index + 20])
)
)
# Align words
alignment = align_words(gold_ud.words, system_ud.words)
# Compute the F1-scores
if check_parse:
result = {
"Tokens": spans_score(gold_ud.tokens, system_ud.tokens),
"Sentences": spans_score(gold_ud.sentences, system_ud.sentences),
"Words": alignment_score(alignment, None),
"UPOS": alignment_score(alignment, lambda w, parent: w.columns[UPOS]),
"XPOS": alignment_score(alignment, lambda w, parent: w.columns[XPOS]),
"Feats": alignment_score(alignment, lambda w, parent: w.columns[FEATS]),
"AllTags": alignment_score(alignment, lambda w, parent: (w.columns[UPOS], w.columns[XPOS], w.columns[FEATS])),
"Lemmas": alignment_score(alignment, lambda w, parent: w.columns[LEMMA]),
"UAS": alignment_score(alignment, lambda w, parent: parent),
"LAS": alignment_score(alignment, lambda w, parent: (parent, w.columns[DEPREL])),
}
else:
result = {
"Tokens": spans_score(gold_ud.tokens, system_ud.tokens),
"Sentences": spans_score(gold_ud.sentences, system_ud.sentences),
"Words": alignment_score(alignment, None),
"Feats": alignment_score(alignment, lambda w, parent: w.columns[FEATS]),
"Lemmas": alignment_score(alignment, lambda w, parent: w.columns[LEMMA]),
}
# Add WeightedLAS if weights are given
if deprel_weights is not None:
def weighted_las(word):
return deprel_weights.get(word.columns[DEPREL], 1.0)
result["WeightedLAS"] = alignment_score(alignment, lambda w, parent: (parent, w.columns[DEPREL]), weighted_las)
return result
def load_deprel_weights(weights_file):
if weights_file is None:
return None
deprel_weights = {}
for line in weights_file:
# Ignore comments and empty lines
if line.startswith("#") or not line.strip():
continue
columns = line.rstrip("\r\n").split()
if len(columns) != 2:
raise ValueError("Expected two columns in the UD Relations weights file on line '{}'".format(line))
deprel_weights[columns[0]] = float(columns[1])
return deprel_weights
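# For illustration, a weights file (hypothetical contents) simply lists one
# "deprel weight" pair per line; note that deprel subtypes were already
# stripped when the CoNLL-U files were loaded:
#     nsubj 2.0
#     punct 0.5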
def load_conllu_file(path):
_file = open(path, mode="r", **({"encoding": "utf-8"} if sys.version_info >= (3, 0) else {}))
return load_conllu(_file)
def evaluate_wrapper(args):
# Load CoNLL-U files
gold_ud = load_conllu_file(args.gold_file)
system_ud = load_conllu_file(args.system_file)
# Load weights if requested
deprel_weights = load_deprel_weights(args.weights)
return evaluate(gold_ud, system_ud, deprel_weights)
def main():
# Parse arguments
parser = argparse.ArgumentParser()
parser.add_argument("gold_file", type=str,
help="Name of the CoNLL-U file with the gold data.")
parser.add_argument("system_file", type=str,
help="Name of the CoNLL-U file with the predicted data.")
parser.add_argument("--weights", "-w", type=argparse.FileType("r"), default=None,
metavar="deprel_weights_file",
help="Compute WeightedLAS using given weights for Universal Dependency Relations.")
parser.add_argument("--verbose", "-v", default=0, action="count",
help="Print all metrics.")
args = parser.parse_args()
# Use verbose if weights are supplied
if args.weights is not None and not args.verbose:
args.verbose = 1
# Evaluate
evaluation = evaluate_wrapper(args)
# Print the evaluation
if not args.verbose:
print("LAS F1 Score: {:.2f}".format(100 * evaluation["LAS"].f1))
else:
metrics = ["Tokens", "Sentences", "Words", "UPOS", "XPOS", "Feats", "AllTags", "Lemmas", "UAS", "LAS"]
if args.weights is not None:
metrics.append("WeightedLAS")
print("Metrics | Precision | Recall | F1 Score | AligndAcc")
print("-----------+-----------+-----------+-----------+-----------")
for metric in metrics:
print("{:11}|{:10.2f} |{:10.2f} |{:10.2f} |{}".format(
metric,
100 * evaluation[metric].precision,
100 * evaluation[metric].recall,
100 * evaluation[metric].f1,
"{:10.2f}".format(100 * evaluation[metric].aligned_accuracy) if evaluation[metric].aligned_accuracy is not None else ""
))
if __name__ == "__main__":
main()
# Tests, which can be executed with `python -m unittest conll17_ud_eval`.
class TestAlignment(unittest.TestCase):
@staticmethod
def _load_words(words):
"""Prepare fake CoNLL-U files with fake HEAD to prevent multiple roots errors."""
lines, num_words = [], 0
for w in words:
parts = w.split(" ")
if len(parts) == 1:
num_words += 1
lines.append("{}\t{}\t_\t_\t_\t_\t{}\t_\t_\t_".format(num_words, parts[0], int(num_words>1)))
else:
lines.append("{}-{}\t{}\t_\t_\t_\t_\t_\t_\t_\t_".format(num_words + 1, num_words + len(parts) - 1, parts[0]))
for part in parts[1:]:
num_words += 1
lines.append("{}\t{}\t_\t_\t_\t_\t{}\t_\t_\t_".format(num_words, part, int(num_words>1)))
return load_conllu((io.StringIO if sys.version_info >= (3, 0) else io.BytesIO)("\n".join(lines+["\n"])))
def _test_exception(self, gold, system):
self.assertRaises(UDError, evaluate, self._load_words(gold), self._load_words(system))
def _test_ok(self, gold, system, correct):
metrics = evaluate(self._load_words(gold), self._load_words(system))
gold_words = sum((max(1, len(word.split(" ")) - 1) for word in gold))
system_words = sum((max(1, len(word.split(" ")) - 1) for word in system))
self.assertEqual((metrics["Words"].precision, metrics["Words"].recall, metrics["Words"].f1),
(correct / system_words, correct / gold_words, 2 * correct / (gold_words + system_words)))
def test_exception(self):
self._test_exception(["a"], ["b"])
def test_equal(self):
self._test_ok(["a"], ["a"], 1)
self._test_ok(["a", "b", "c"], ["a", "b", "c"], 3)
def test_equal_with_multiword(self):
self._test_ok(["abc a b c"], ["a", "b", "c"], 3)
self._test_ok(["a", "bc b c", "d"], ["a", "b", "c", "d"], 4)
self._test_ok(["abcd a b c d"], ["ab a b", "cd c d"], 4)
self._test_ok(["abc a b c", "de d e"], ["a", "bcd b c d", "e"], 5)
def test_alignment(self):
self._test_ok(["abcd"], ["a", "b", "c", "d"], 0)
self._test_ok(["abc", "d"], ["a", "b", "c", "d"], 1)
self._test_ok(["a", "bc", "d"], ["a", "b", "c", "d"], 2)
self._test_ok(["a", "bc b c", "d"], ["a", "b", "cd"], 2)
self._test_ok(["abc a BX c", "def d EX f"], ["ab a b", "cd c d", "ef e f"], 4)
self._test_ok(["ab a b", "cd bc d"], ["a", "bc", "d"], 2)
self._test_ok(["a", "bc b c", "d"], ["ab AX BX", "cd CX a"], 1)

View File

@ -1,293 +0,0 @@
import spacy
import time
import re
import plac
import operator
import datetime
from pathlib import Path
import xml.etree.ElementTree as ET
import conll17_ud_eval
from ud_train import write_conllu
from spacy.lang.lex_attrs import word_shape
from spacy.util import get_lang_class
# All languages in spaCy - in UD format (note that Norwegian is 'no' instead of 'nb')
ALL_LANGUAGES = ("af, ar, bg, bn, ca, cs, da, de, el, en, es, et, fa, fi, fr,"
"ga, he, hi, hr, hu, id, is, it, ja, kn, ko, lt, lv, mr, no,"
"nl, pl, pt, ro, ru, si, sk, sl, sq, sr, sv, ta, te, th, tl,"
"tr, tt, uk, ur, vi, zh")
# Non-parsing tasks that will be evaluated (works for default models)
EVAL_NO_PARSE = ['Tokens', 'Words', 'Lemmas', 'Sentences', 'Feats']
# Tasks that will be evaluated if check_parse=True (does not work for default models)
EVAL_PARSE = ['Tokens', 'Words', 'Lemmas', 'Sentences', 'Feats', 'UPOS', 'XPOS', 'AllTags', 'UAS', 'LAS']
# Minimum frequency an error should have to be printed
PRINT_FREQ = 20
# Maximum number of errors printed per category
PRINT_TOTAL = 10
space_re = re.compile("\s+")
def load_model(modelname, add_sentencizer=False):
""" Load a specific spaCy model """
loading_start = time.time()
nlp = spacy.load(modelname)
if add_sentencizer:
nlp.add_pipe(nlp.create_pipe('sentencizer'))
loading_end = time.time()
loading_time = loading_end - loading_start
if add_sentencizer:
return nlp, loading_time, modelname + '_sentencizer'
return nlp, loading_time, modelname
def load_default_model_sentencizer(lang):
""" Load a generic spaCy model and add the sentencizer for sentence tokenization"""
loading_start = time.time()
lang_class = get_lang_class(lang)
nlp = lang_class()
nlp.add_pipe(nlp.create_pipe('sentencizer'))
loading_end = time.time()
loading_time = loading_end - loading_start
return nlp, loading_time, lang + "_default_" + 'sentencizer'
def split_text(text):
return [space_re.sub(" ", par.strip()) for par in text.split("\n\n")]
def get_freq_tuples(my_list, print_total_threshold):
""" Turn a list of errors into frequency-sorted tuples thresholded by a certain total number """
d = {}
for token in my_list:
d.setdefault(token, 0)
d[token] += 1
return sorted(d.items(), key=operator.itemgetter(1), reverse=True)[:print_total_threshold]
def _contains_blinded_text(stats_xml):
""" Heuristic to determine whether the treebank has blinded texts or not """
tree = ET.parse(stats_xml)
root = tree.getroot()
total_tokens = int(root.find('size/total/tokens').text)
unique_forms = int(root.find('forms').get('unique'))
# assume the corpus is largely blinded when there are less than 1% unique tokens
return (unique_forms / total_tokens) < 0.01
def fetch_all_treebanks(ud_dir, languages, corpus, best_per_language):
"""" Fetch the txt files for all treebanks for a given set of languages """
all_treebanks = dict()
treebank_size = dict()
for l in languages:
all_treebanks[l] = []
treebank_size[l] = 0
for treebank_dir in ud_dir.iterdir():
if treebank_dir.is_dir():
for txt_path in treebank_dir.iterdir():
if txt_path.name.endswith('-ud-' + corpus + '.txt'):
file_lang = txt_path.name.split('_')[0]
if file_lang in languages:
gold_path = treebank_dir / txt_path.name.replace('.txt', '.conllu')
stats_xml = treebank_dir / "stats.xml"
# ignore treebanks where the texts are not publicly available
if not _contains_blinded_text(stats_xml):
if not best_per_language:
all_treebanks[file_lang].append(txt_path)
# check the tokens in the gold annotation to keep only the biggest treebank per language
else:
with gold_path.open(mode='r', encoding='utf-8') as gold_file:
gold_ud = conll17_ud_eval.load_conllu(gold_file)
gold_tokens = len(gold_ud.tokens)
if treebank_size[file_lang] < gold_tokens:
all_treebanks[file_lang] = [txt_path]
treebank_size[file_lang] = gold_tokens
return all_treebanks
def run_single_eval(nlp, loading_time, print_name, text_path, gold_ud, tmp_output_path, out_file, print_header,
check_parse, print_freq_tasks):
"""" Run an evaluation of a model nlp on a certain specified treebank """
with text_path.open(mode='r', encoding='utf-8') as f:
flat_text = f.read()
# STEP 1: tokenize text
tokenization_start = time.time()
texts = split_text(flat_text)
docs = list(nlp.pipe(texts))
tokenization_end = time.time()
tokenization_time = tokenization_end - tokenization_start
# STEP 2: record stats and timings
tokens_per_s = int(len(gold_ud.tokens) / tokenization_time)
print_header_1 = ['date', 'text_path', 'gold_tokens', 'model', 'loading_time', 'tokenization_time', 'tokens_per_s']
print_string_1 = [str(datetime.date.today()), text_path.name, len(gold_ud.tokens),
print_name, "%.2f" % loading_time, "%.2f" % tokenization_time, tokens_per_s]
# STEP 3: evaluate predicted tokens and features
with tmp_output_path.open(mode="w", encoding="utf8") as tmp_out_file:
write_conllu(docs, tmp_out_file)
with tmp_output_path.open(mode="r", encoding="utf8") as sys_file:
sys_ud = conll17_ud_eval.load_conllu(sys_file, check_parse=check_parse)
tmp_output_path.unlink()
scores = conll17_ud_eval.evaluate(gold_ud, sys_ud, check_parse=check_parse)
# STEP 4: format the scoring results
eval_headers = EVAL_PARSE
if not check_parse:
eval_headers = EVAL_NO_PARSE
for score_name in eval_headers:
score = scores[score_name]
print_string_1.extend(["%.2f" % score.precision,
"%.2f" % score.recall,
"%.2f" % score.f1])
print_string_1.append("-" if score.aligned_accuracy is None else "%.2f" % score.aligned_accuracy)
print_string_1.append("-" if score.undersegmented is None else "%.4f" % score.under_perc)
print_string_1.append("-" if score.oversegmented is None else "%.4f" % score.over_perc)
print_header_1.extend([score_name + '_p', score_name + '_r', score_name + '_F', score_name + '_acc',
score_name + '_under', score_name + '_over'])
if score_name in print_freq_tasks:
print_header_1.extend([score_name + '_word_under_ex', score_name + '_shape_under_ex',
score_name + '_word_over_ex', score_name + '_shape_over_ex'])
d_under_words = get_freq_tuples(score.undersegmented, PRINT_TOTAL)
d_under_shapes = get_freq_tuples([word_shape(x) for x in score.undersegmented], PRINT_TOTAL)
d_over_words = get_freq_tuples(score.oversegmented, PRINT_TOTAL)
d_over_shapes = get_freq_tuples([word_shape(x) for x in score.oversegmented], PRINT_TOTAL)
# saving to CSV with ';' as the separator, so any ';' in the example output is masked as *SEMICOLON*
print_string_1.append(
str({k: v for k, v in d_under_words if v > PRINT_FREQ}).replace(";", "*SEMICOLON*"))
print_string_1.append(
str({k: v for k, v in d_under_shapes if v > PRINT_FREQ}).replace(";", "*SEMICOLON*"))
print_string_1.append(
str({k: v for k, v in d_over_words if v > PRINT_FREQ}).replace(";", "*SEMICOLON*"))
print_string_1.append(
str({k: v for k, v in d_over_shapes if v > PRINT_FREQ}).replace(";", "*SEMICOLON*"))
# STEP 5: print the formatted results to CSV
if print_header:
out_file.write(';'.join(map(str, print_header_1)) + '\n')
out_file.write(';'.join(map(str, print_string_1)) + '\n')
def run_all_evals(models, treebanks, out_file, check_parse, print_freq_tasks):
"""" Run an evaluation for each language with its specified models and treebanks """
print_header = True
for tb_lang, treebank_list in treebanks.items():
print()
print("Language", tb_lang)
for text_path in treebank_list:
print(" Evaluating on", text_path)
gold_path = text_path.parent / (text_path.stem + '.conllu')
print(" Gold data from ", gold_path)
# nested try blocks to ensure the code can continue with the next iteration after a failure
try:
with gold_path.open(mode='r', encoding='utf-8') as gold_file:
gold_ud = conll17_ud_eval.load_conllu(gold_file)
for nlp, nlp_loading_time, nlp_name in models[tb_lang]:
try:
print(" Benchmarking", nlp_name)
tmp_output_path = text_path.parent / str('tmp_' + nlp_name + '.conllu')
run_single_eval(nlp, nlp_loading_time, nlp_name, text_path, gold_ud, tmp_output_path, out_file,
print_header, check_parse, print_freq_tasks)
print_header = False
except Exception as e:
print(" Ran into trouble: ", str(e))
except Exception as e:
print(" Ran into trouble: ", str(e))
@plac.annotations(
out_path=("Path to output CSV file", "positional", None, Path),
ud_dir=("Path to Universal Dependencies corpus", "positional", None, Path),
check_parse=("Set flag to evaluate parsing performance", "flag", "p", bool),
langs=("Enumeration of languages to evaluate (default: all)", "option", "l", str),
exclude_trained_models=("Set flag to exclude trained models", "flag", "t", bool),
exclude_multi=("Set flag to exclude the multi-language model as default baseline", "flag", "m", bool),
hide_freq=("Set flag to avoid printing out more detailed high-freq tokenization errors", "flag", "f", bool),
corpus=("Whether to run on train, dev or test", "option", "c", str),
best_per_language=("Set flag to only keep the largest treebank for each language", "flag", "b", bool)
)
def main(out_path, ud_dir, check_parse=False, langs=ALL_LANGUAGES, exclude_trained_models=False, exclude_multi=False,
hide_freq=False, corpus='train', best_per_language=False):
""""
Assemble all treebanks and models to run evaluations with.
When setting check_parse to True, the default models will not be evaluated as they don't have parsing functionality
"""
languages = [lang.strip() for lang in langs.split(",")]
print_freq_tasks = []
if not hide_freq:
print_freq_tasks = ['Tokens']
# fetching all relevant treebanks from the directory
treebanks = fetch_all_treebanks(ud_dir, languages, corpus, best_per_language)
print()
print("Loading all relevant models for", languages)
models = dict()
# multi-lang model
multi = None
if not exclude_multi and not check_parse:
multi = load_model('xx_ent_wiki_sm', add_sentencizer=True)
# initialize all models with the multi-lang model
for lang in languages:
models[lang] = [multi] if multi else []
# add default models if we don't want to evaluate parsing info
if not check_parse:
# Norwegian is 'nb' in spaCy but 'no' in the UD corpora
if lang == 'no':
models['no'].append(load_default_model_sentencizer('nb'))
else:
models[lang].append(load_default_model_sentencizer(lang))
# language-specific trained models
if not exclude_trained_models:
if 'de' in models:
models['de'].append(load_model('de_core_news_sm'))
models['de'].append(load_model('de_core_news_md'))
if 'el' in models:
models['el'].append(load_model('el_core_news_sm'))
models['el'].append(load_model('el_core_news_md'))
if 'en' in models:
models['en'].append(load_model('en_core_web_sm'))
models['en'].append(load_model('en_core_web_md'))
models['en'].append(load_model('en_core_web_lg'))
if 'es' in models:
models['es'].append(load_model('es_core_news_sm'))
models['es'].append(load_model('es_core_news_md'))
if 'fr' in models:
models['fr'].append(load_model('fr_core_news_sm'))
models['fr'].append(load_model('fr_core_news_md'))
if 'it' in models:
models['it'].append(load_model('it_core_news_sm'))
if 'nl' in models:
models['nl'].append(load_model('nl_core_news_sm'))
if 'pt' in models:
models['pt'].append(load_model('pt_core_news_sm'))
with out_path.open(mode='w', encoding='utf-8') as out_file:
run_all_evals(models, treebanks, out_file, check_parse, print_freq_tasks)
if __name__ == "__main__":
plac.call(main)

View File

@ -1,335 +0,0 @@
# flake8: noqa
"""Train for CONLL 2017 UD treebank evaluation. Takes .conllu files, writes
.conllu format for development data, allowing the official scorer to be used.
"""
from __future__ import unicode_literals
import plac
from pathlib import Path
import re
import sys
import srsly
import spacy
import spacy.util
from spacy.tokens import Token, Doc
from spacy.gold import GoldParse
from spacy.util import compounding, minibatch_by_words
from spacy.syntax.nonproj import projectivize
from spacy.matcher import Matcher
# from spacy.morphology import Fused_begin, Fused_inside
from spacy import displacy
from collections import defaultdict, Counter
from timeit import default_timer as timer
Fused_begin = None
Fused_inside = None
import itertools
import random
import numpy.random
from . import conll17_ud_eval
from spacy import lang
from spacy.lang import zh
from spacy.lang import ja
from spacy.lang import ru
################
# Data reading #
################
space_re = re.compile(r"\s+")
def split_text(text):
return [space_re.sub(" ", par.strip()) for par in text.split("\n\n")]
##############
# Evaluation #
##############
def read_conllu(file_):
docs = []
sent = []
doc = []
for line in file_:
if line.startswith("# newdoc"):
if doc:
docs.append(doc)
doc = []
elif line.startswith("#"):
continue
elif not line.strip():
if sent:
doc.append(sent)
sent = []
else:
sent.append(list(line.strip().split("\t")))
if len(sent[-1]) != 10:
print(repr(line))
raise ValueError
if sent:
doc.append(sent)
if doc:
docs.append(doc)
return docs
def evaluate(nlp, text_loc, gold_loc, sys_loc, limit=None):
if text_loc.parts[-1].endswith(".conllu"):
docs = []
with text_loc.open(encoding="utf8") as file_:
for conllu_doc in read_conllu(file_):
for conllu_sent in conllu_doc:
words = [line[1] for line in conllu_sent]
docs.append(Doc(nlp.vocab, words=words))
for name, component in nlp.pipeline:
docs = list(component.pipe(docs))
else:
with text_loc.open("r", encoding="utf8") as text_file:
texts = split_text(text_file.read())
docs = list(nlp.pipe(texts))
with sys_loc.open("w", encoding="utf8") as out_file:
write_conllu(docs, out_file)
with gold_loc.open("r", encoding="utf8") as gold_file:
gold_ud = conll17_ud_eval.load_conllu(gold_file)
with sys_loc.open("r", encoding="utf8") as sys_file:
sys_ud = conll17_ud_eval.load_conllu(sys_file)
scores = conll17_ud_eval.evaluate(gold_ud, sys_ud)
return docs, scores
def write_conllu(docs, file_):
merger = Matcher(docs[0].vocab)
merger.add("SUBTOK", None, [{"DEP": "subtok", "op": "+"}])
for i, doc in enumerate(docs):
matches = []
if doc.is_parsed:
matches = merger(doc)
spans = [doc[start : end + 1] for _, start, end in matches]
with doc.retokenize() as retokenizer:
for span in spans:
retokenizer.merge(span)
file_.write("# newdoc id = {i}\n".format(i=i))
for j, sent in enumerate(doc.sents):
file_.write("# sent_id = {i}.{j}\n".format(i=i, j=j))
file_.write("# text = {text}\n".format(text=sent.text))
for k, token in enumerate(sent):
file_.write(_get_token_conllu(token, k, len(sent)) + "\n")
file_.write("\n")
for word in sent:
if word.head.i == word.i and word.dep_ == "ROOT":
break
else:
print("Rootless sentence!")
print(sent)
print(i)
for w in sent:
print(w.i, w.text, w.head.text, w.head.i, w.dep_)
raise ValueError
def _get_token_conllu(token, k, sent_len):
if token.check_morph(Fused_begin) and (k + 1 < sent_len):
n = 1
text = [token.text]
while token.nbor(n).check_morph(Fused_inside):
text.append(token.nbor(n).text)
n += 1
id_ = "%d-%d" % (k + 1, (k + n))
fields = [id_, "".join(text)] + ["_"] * 8
lines = ["\t".join(fields)]
else:
lines = []
if token.head.i == token.i:
head = 0
else:
head = k + (token.head.i - token.i) + 1
fields = [
str(k + 1),
token.text,
token.lemma_,
token.pos_,
token.tag_,
"_",
str(head),
token.dep_.lower(),
"_",
"_",
]
if token.check_morph(Fused_begin) and (k + 1 < sent_len):
if k == 0:
fields[1] = token.norm_[0].upper() + token.norm_[1:]
else:
fields[1] = token.norm_
elif token.check_morph(Fused_inside):
fields[1] = token.norm_
elif token._.split_start is not None:
split_start = token._.split_start
split_end = token._.split_end
split_len = (split_end.i - split_start.i) + 1
n_in_split = token.i - split_start.i
subtokens = guess_fused_orths(split_start.text, [""] * split_len)
fields[1] = subtokens[n_in_split]
lines.append("\t".join(fields))
return "\n".join(lines)
def guess_fused_orths(word, ud_forms):
"""The UD data 'fused tokens' don't necessarily expand to keys that match
the form. We need orths that exactly match the string. Here we make a best
effort to divide up the word."""
if word == "".join(ud_forms):
# Happy case: we get a perfect split, with each letter accounted for.
return ud_forms
elif len(word) == sum(len(subtoken) for subtoken in ud_forms):
# Unideal, but at least lengths match.
output = []
remain = word
for subtoken in ud_forms:
assert len(subtoken) >= 1
output.append(remain[: len(subtoken)])
remain = remain[len(subtoken) :]
assert len(remain) == 0, (word, ud_forms, remain)
return output
else:
# Let's say word is 6 long, and there are three subtokens. The orths
# *must* equal the original string. Arbitrarily, split [4, 1, 1]
first = word[: len(word) - (len(ud_forms) - 1)]
output = [first]
remain = word[len(first) :]
for i in range(1, len(ud_forms)):
assert remain
output.append(remain[:1])
remain = remain[1:]
assert len(remain) == 0, (word, output, remain)
return output
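# For example (hypothetical input), guess_fused_orths("zum", ["zu", "dem"]) hits
# the final branch above (3 characters vs. 2 + 3): the first orth takes
# len(word) - (len(ud_forms) - 1) characters and each remaining orth takes one
# character, so the result is ["zu", "m"].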
def print_results(name, ud_scores):
fields = {}
if ud_scores is not None:
fields.update(
{
"words": ud_scores["Words"].f1 * 100,
"sents": ud_scores["Sentences"].f1 * 100,
"tags": ud_scores["XPOS"].f1 * 100,
"uas": ud_scores["UAS"].f1 * 100,
"las": ud_scores["LAS"].f1 * 100,
}
)
else:
fields.update({"words": 0.0, "sents": 0.0, "tags": 0.0, "uas": 0.0, "las": 0.0})
tpl = "\t".join(
(name, "{las:.1f}", "{uas:.1f}", "{tags:.1f}", "{sents:.1f}", "{words:.1f}")
)
print(tpl.format(**fields))
return fields
def get_token_split_start(token):
if token.text == "":
assert token.i != 0
i = -1
while token.nbor(i).text == "":
i -= 1
return token.nbor(i)
elif (token.i + 1) < len(token.doc) and token.nbor(1).text == "":
return token
else:
return None
def get_token_split_end(token):
if (token.i + 1) == len(token.doc):
return token if token.text == "" else None
elif token.text != "" and token.nbor(1).text != "":
return None
i = 1
while (token.i + i) < len(token.doc) and token.nbor(i).text == "":
i += 1
return token.nbor(i - 1)
##################
# Initialization #
##################
def load_nlp(experiments_dir, corpus):
nlp = spacy.load(experiments_dir / corpus / "best-model")
return nlp
def initialize_pipeline(nlp, docs, golds, config, device):
nlp.add_pipe(nlp.create_pipe("parser"))
return nlp
@plac.annotations(
test_data_dir=(
"Path to Universal Dependencies test data",
"positional",
None,
Path,
),
experiment_dir=("Parent directory with output model", "positional", None, Path),
corpus=(
"UD corpus to evaluate, e.g. UD_English, UD_Spanish, etc",
"positional",
None,
str,
),
)
def main(test_data_dir, experiment_dir, corpus):
Token.set_extension("split_start", getter=get_token_split_start)
Token.set_extension("split_end", getter=get_token_split_end)
Token.set_extension("begins_fused", default=False)
Token.set_extension("inside_fused", default=False)
lang.zh.Chinese.Defaults.use_jieba = False
lang.ja.Japanese.Defaults.use_janome = False
lang.ru.Russian.Defaults.use_pymorphy2 = False
nlp = load_nlp(experiment_dir, corpus)
treebank_code = nlp.meta["treebank"]
for section in ("test", "dev"):
if section == "dev":
section_dir = "conll17-ud-development-2017-03-19"
else:
section_dir = "conll17-ud-test-2017-05-09"
text_path = test_data_dir / "input" / section_dir / (treebank_code + ".txt")
udpipe_path = (
test_data_dir / "input" / section_dir / (treebank_code + "-udpipe.conllu")
)
gold_path = test_data_dir / "gold" / section_dir / (treebank_code + ".conllu")
header = [section, "LAS", "UAS", "TAG", "SENT", "WORD"]
print("\t".join(header))
inputs = {"gold": gold_path, "udp": udpipe_path, "raw": text_path}
for input_type in ("udp", "raw"):
input_path = inputs[input_type]
output_path = (
experiment_dir / corpus / "{section}.conllu".format(section=section)
)
parsed_docs, test_scores = evaluate(nlp, input_path, gold_path, output_path)
accuracy = print_results(input_type, test_scores)
acc_path = (
experiment_dir
/ corpus
/ "{section}-accuracy.json".format(section=section)
)
srsly.write_json(acc_path, accuracy)
if __name__ == "__main__":
plac.call(main)

View File

@ -1,570 +0,0 @@
# flake8: noqa
"""Train for CONLL 2017 UD treebank evaluation. Takes .conllu files, writes
.conllu format for development data, allowing the official scorer to be used.
"""
from __future__ import unicode_literals
import plac
from pathlib import Path
import re
import json
import tqdm
import spacy
import spacy.util
from bin.ud import conll17_ud_eval
from spacy.tokens import Token, Doc
from spacy.gold import GoldParse
from spacy.util import compounding, minibatch, minibatch_by_words
from spacy.syntax.nonproj import projectivize
from spacy.matcher import Matcher
from spacy import displacy
from collections import defaultdict
import random
from spacy import lang
from spacy.lang import zh
from spacy.lang import ja
try:
import torch
except ImportError:
torch = None
################
# Data reading #
################
space_re = re.compile("\s+")
def split_text(text):
return [space_re.sub(" ", par.strip()) for par in text.split("\n\n")]
def read_data(
nlp,
conllu_file,
text_file,
raw_text=True,
oracle_segments=False,
max_doc_length=None,
limit=None,
):
"""Read the CONLLU format into (Doc, GoldParse) tuples. If raw_text=True,
include Doc objects created using nlp.make_doc and then aligned against
the gold-standard sequences. If oracle_segments=True, include Doc objects
created from the gold-standard segments. At least one must be True."""
if not raw_text and not oracle_segments:
raise ValueError("At least one of raw_text or oracle_segments must be True")
paragraphs = split_text(text_file.read())
conllu = read_conllu(conllu_file)
# sd is spacy doc; cd is conllu doc
# cs is conllu sent, ct is conllu token
docs = []
golds = []
for doc_id, (text, cd) in enumerate(zip(paragraphs, conllu)):
sent_annots = []
for cs in cd:
sent = defaultdict(list)
for id_, word, lemma, pos, tag, morph, head, dep, _, space_after in cs:
if "." in id_:
continue
if "-" in id_:
continue
id_ = int(id_) - 1
head = int(head) - 1 if head != "0" else id_
sent["words"].append(word)
sent["tags"].append(tag)
sent["morphology"].append(_parse_morph_string(morph))
sent["morphology"][-1].add("POS_%s" % pos)
sent["heads"].append(head)
sent["deps"].append("ROOT" if dep == "root" else dep)
sent["spaces"].append(space_after == "_")
sent["entities"] = ["-"] * len(sent["words"])
sent["heads"], sent["deps"] = projectivize(sent["heads"], sent["deps"])
if oracle_segments:
docs.append(Doc(nlp.vocab, words=sent["words"], spaces=sent["spaces"]))
golds.append(GoldParse(docs[-1], **sent))
assert golds[-1].morphology is not None
sent_annots.append(sent)
if raw_text and max_doc_length and len(sent_annots) >= max_doc_length:
doc, gold = _make_gold(nlp, None, sent_annots)
assert gold.morphology is not None
sent_annots = []
docs.append(doc)
golds.append(gold)
if limit and len(docs) >= limit:
return docs, golds
if raw_text and sent_annots:
doc, gold = _make_gold(nlp, None, sent_annots)
docs.append(doc)
golds.append(gold)
if limit and len(docs) >= limit:
return docs, golds
return docs, golds
def _parse_morph_string(morph_string):
if morph_string == '_':
return set()
output = []
replacements = {'1': 'one', '2': 'two', '3': 'three'}
for feature in morph_string.split('|'):
key, value = feature.split('=')
value = replacements.get(value, value)
value = value.split(',')[0]
output.append('%s_%s' % (key, value.lower()))
return set(output)
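# For example (hypothetical input), _parse_morph_string("Case=Nom|Number=Plur,Sing|Person=1")
# keeps only the first value of multi-valued features and spells out digits,
# returning {"Case_nom", "Number_plur", "Person_one"}.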
def read_conllu(file_):
docs = []
sent = []
doc = []
for line in file_:
if line.startswith("# newdoc"):
if doc:
docs.append(doc)
doc = []
elif line.startswith("#"):
continue
elif not line.strip():
if sent:
doc.append(sent)
sent = []
else:
sent.append(list(line.strip().split("\t")))
if len(sent[-1]) != 10:
print(repr(line))
raise ValueError
if sent:
doc.append(sent)
if doc:
docs.append(doc)
return docs
def _make_gold(nlp, text, sent_annots, drop_deps=0.0):
# Flatten the conll annotations, and adjust the head indices
flat = defaultdict(list)
sent_starts = []
for sent in sent_annots:
flat["heads"].extend(len(flat["words"])+head for head in sent["heads"])
for field in ["words", "tags", "deps", "morphology", "entities", "spaces"]:
flat[field].extend(sent[field])
sent_starts.append(True)
sent_starts.extend([False] * (len(sent["words"]) - 1))
# Construct text if necessary
assert len(flat["words"]) == len(flat["spaces"])
if text is None:
text = "".join(
word + " " * space for word, space in zip(flat["words"], flat["spaces"])
)
doc = nlp.make_doc(text)
flat.pop("spaces")
gold = GoldParse(doc, **flat)
gold.sent_starts = sent_starts
for i in range(len(gold.heads)):
if random.random() < drop_deps:
gold.heads[i] = None
gold.labels[i] = None
return doc, gold
#############################
# Data transforms for spaCy #
#############################
def golds_to_gold_tuples(docs, golds):
"""Get out the annoying 'tuples' format used by begin_training, given the
GoldParse objects."""
tuples = []
for doc, gold in zip(docs, golds):
text = doc.text
ids, words, tags, heads, labels, iob = zip(*gold.orig_annot)
sents = [((ids, words, tags, heads, labels, iob), [])]
tuples.append((text, sents))
return tuples
##############
# Evaluation #
##############
def evaluate(nlp, text_loc, gold_loc, sys_loc, limit=None):
if text_loc.parts[-1].endswith(".conllu"):
docs = []
with text_loc.open(encoding="utf8") as file_:
for conllu_doc in read_conllu(file_):
for conllu_sent in conllu_doc:
words = [line[1] for line in conllu_sent]
docs.append(Doc(nlp.vocab, words=words))
for name, component in nlp.pipeline:
docs = list(component.pipe(docs))
else:
with text_loc.open("r", encoding="utf8") as text_file:
texts = split_text(text_file.read())
docs = list(nlp.pipe(texts))
with sys_loc.open("w", encoding="utf8") as out_file:
write_conllu(docs, out_file)
with gold_loc.open("r", encoding="utf8") as gold_file:
gold_ud = conll17_ud_eval.load_conllu(gold_file)
with sys_loc.open("r", encoding="utf8") as sys_file:
sys_ud = conll17_ud_eval.load_conllu(sys_file)
scores = conll17_ud_eval.evaluate(gold_ud, sys_ud)
return docs, scores
def write_conllu(docs, file_):
if not Token.has_extension("get_conllu_lines"):
Token.set_extension("get_conllu_lines", method=get_token_conllu)
if not Token.has_extension("begins_fused"):
Token.set_extension("begins_fused", default=False)
if not Token.has_extension("inside_fused"):
Token.set_extension("inside_fused", default=False)
merger = Matcher(docs[0].vocab)
merger.add("SUBTOK", None, [{"DEP": "subtok", "op": "+"}])
for i, doc in enumerate(docs):
matches = []
if doc.is_parsed:
matches = merger(doc)
spans = [doc[start : end + 1] for _, start, end in matches]
seen_tokens = set()
with doc.retokenize() as retokenizer:
for span in spans:
span_tokens = set(range(span.start, span.end))
if not span_tokens.intersection(seen_tokens):
retokenizer.merge(span)
seen_tokens.update(span_tokens)
file_.write("# newdoc id = {i}\n".format(i=i))
for j, sent in enumerate(doc.sents):
file_.write("# sent_id = {i}.{j}\n".format(i=i, j=j))
file_.write("# text = {text}\n".format(text=sent.text))
for k, token in enumerate(sent):
if token.head.i > sent[-1].i or token.head.i < sent[0].i:
for word in doc[sent[0].i - 10 : sent[0].i]:
print(word.i, word.head.i, word.text, word.dep_)
for word in sent:
print(word.i, word.head.i, word.text, word.dep_)
for word in doc[sent[-1].i : sent[-1].i + 10]:
print(word.i, word.head.i, word.text, word.dep_)
raise ValueError(
"Invalid parse: head outside sentence (%s)" % token.text
)
file_.write(token._.get_conllu_lines(k) + "\n")
file_.write("\n")
def print_progress(itn, losses, ud_scores):
fields = {
"dep_loss": losses.get("parser", 0.0),
"morph_loss": losses.get("morphologizer", 0.0),
"tag_loss": losses.get("tagger", 0.0),
"words": ud_scores["Words"].f1 * 100,
"sents": ud_scores["Sentences"].f1 * 100,
"tags": ud_scores["XPOS"].f1 * 100,
"uas": ud_scores["UAS"].f1 * 100,
"las": ud_scores["LAS"].f1 * 100,
"morph": ud_scores["Feats"].f1 * 100,
}
header = ["Epoch", "P.Loss", "M.Loss", "LAS", "UAS", "TAG", "MORPH", "SENT", "WORD"]
if itn == 0:
print("\t".join(header))
tpl = "\t".join((
"{:d}",
"{dep_loss:.1f}",
"{morph_loss:.1f}",
"{las:.1f}",
"{uas:.1f}",
"{tags:.1f}",
"{morph:.1f}",
"{sents:.1f}",
"{words:.1f}",
))
print(tpl.format(itn, **fields))
# def get_sent_conllu(sent, sent_id):
# lines = ["# sent_id = {sent_id}".format(sent_id=sent_id)]
def get_token_conllu(token, i):
if token._.begins_fused:
n = 1
while token.nbor(n)._.inside_fused:
n += 1
id_ = "%d-%d" % (i, i + n)
lines = [id_, token.text, "_", "_", "_", "_", "_", "_", "_", "_"]
else:
lines = []
if token.head.i == token.i:
head = 0
else:
head = i + (token.head.i - token.i) + 1
features = list(token.morph)
feat_str = []
replacements = {"one": "1", "two": "2", "three": "3"}
for feat in features:
if not feat.startswith("begin") and not feat.startswith("end"):
key, value = feat.split("_", 1)
value = replacements.get(value, value)
feat_str.append("%s=%s" % (key, value.title()))
if not feat_str:
feat_str = "_"
else:
feat_str = "|".join(feat_str)
fields = [str(i+1), token.text, token.lemma_, token.pos_, token.tag_, feat_str,
str(head), token.dep_.lower(), "_", "_"]
lines.append("\t".join(fields))
return "\n".join(lines)
##################
# Initialization #
##################
def load_nlp(corpus, config, vectors=None):
lang = corpus.split("_")[0]
nlp = spacy.blank(lang)
if config.vectors:
if not vectors:
raise ValueError(
"config asks for vectors, but no vectors "
"directory set on command line (use -v)"
)
if (Path(vectors) / corpus).exists():
nlp.vocab.from_disk(Path(vectors) / corpus / "vocab")
nlp.meta["treebank"] = corpus
return nlp
def initialize_pipeline(nlp, docs, golds, config, device):
nlp.add_pipe(nlp.create_pipe("tagger", config={"set_morphology": False}))
nlp.add_pipe(nlp.create_pipe("morphologizer"))
nlp.add_pipe(nlp.create_pipe("parser"))
if config.multitask_tag:
nlp.parser.add_multitask_objective("tag")
if config.multitask_sent:
nlp.parser.add_multitask_objective("sent_start")
for gold in golds:
for tag in gold.tags:
if tag is not None:
nlp.tagger.add_label(tag)
if torch is not None and device != -1:
torch.set_default_tensor_type("torch.cuda.FloatTensor")
optimizer = nlp.begin_training(
lambda: golds_to_gold_tuples(docs, golds),
device=device,
subword_features=config.subword_features,
conv_depth=config.conv_depth,
bilstm_depth=config.bilstm_depth,
)
if config.pretrained_tok2vec:
_load_pretrained_tok2vec(nlp, config.pretrained_tok2vec)
return optimizer
def _load_pretrained_tok2vec(nlp, loc):
"""Load pretrained weights for the 'token-to-vector' part of the component
models, which is typically a CNN. See 'spacy pretrain'. Experimental.
"""
with Path(loc).open("rb", encoding="utf8") as file_:
weights_data = file_.read()
loaded = []
for name, component in nlp.pipeline:
if hasattr(component, "model") and hasattr(component.model, "tok2vec"):
component.tok2vec.from_bytes(weights_data)
loaded.append(name)
return loaded
########################
# Command line helpers #
########################
class Config(object):
def __init__(
self,
vectors=None,
max_doc_length=10,
multitask_tag=False,
multitask_sent=False,
multitask_dep=False,
multitask_vectors=None,
bilstm_depth=0,
nr_epoch=30,
min_batch_size=100,
max_batch_size=1000,
batch_by_words=True,
dropout=0.2,
conv_depth=4,
subword_features=True,
vectors_dir=None,
pretrained_tok2vec=None,
):
if vectors_dir is not None:
if vectors is None:
vectors = True
if multitask_vectors is None:
multitask_vectors = True
for key, value in locals().items():
setattr(self, key, value)
@classmethod
def load(cls, loc, vectors_dir=None):
with Path(loc).open("r", encoding="utf8") as file_:
cfg = json.load(file_)
if vectors_dir is not None:
cfg["vectors_dir"] = vectors_dir
return cls(**cfg)
class Dataset(object):
def __init__(self, path, section):
self.path = path
self.section = section
self.conllu = None
self.text = None
for file_path in self.path.iterdir():
name = file_path.parts[-1]
if section in name and name.endswith("conllu"):
self.conllu = file_path
elif section in name and name.endswith("txt"):
self.text = file_path
if self.conllu is None:
msg = "Could not find .conllu file in {path} for {section}"
raise IOError(msg.format(section=section, path=path))
if self.text is None:
msg = "Could not find .txt file in {path} for {section}"
raise IOError(msg.format(section=section, path=path))
self.lang = self.conllu.parts[-1].split("-")[0].split("_")[0]
class TreebankPaths(object):
def __init__(self, ud_path, treebank, **cfg):
self.train = Dataset(ud_path / treebank, "train")
self.dev = Dataset(ud_path / treebank, "dev")
self.lang = self.train.lang
@plac.annotations(
ud_dir=("Path to Universal Dependencies corpus", "positional", None, Path),
parses_dir=("Directory to write the development parses", "positional", None, Path),
corpus=(
"UD corpus to train and evaluate on, e.g. UD_Spanish-AnCora",
"positional",
None,
str,
),
config=("Path to json formatted config file", "option", "C", Path),
limit=("Size limit", "option", "n", int),
gpu_device=("Use GPU", "option", "g", int),
use_oracle_segments=("Use oracle segments", "flag", "G", int),
vectors_dir=(
"Path to directory with pretrained vectors, named e.g. en/",
"option",
"v",
Path,
),
)
def main(
ud_dir,
parses_dir,
corpus,
config=None,
limit=0,
gpu_device=-1,
vectors_dir=None,
use_oracle_segments=False,
):
Token.set_extension("get_conllu_lines", method=get_token_conllu)
Token.set_extension("begins_fused", default=False)
Token.set_extension("inside_fused", default=False)
spacy.util.fix_random_seed()
lang.zh.Chinese.Defaults.use_jieba = False
lang.ja.Japanese.Defaults.use_janome = False
if config is not None:
config = Config.load(config, vectors_dir=vectors_dir)
else:
config = Config(vectors_dir=vectors_dir)
paths = TreebankPaths(ud_dir, corpus)
if not (parses_dir / corpus).exists():
(parses_dir / corpus).mkdir()
print("Train and evaluate", corpus, "using lang", paths.lang)
nlp = load_nlp(paths.lang, config, vectors=vectors_dir)
docs, golds = read_data(
nlp,
paths.train.conllu.open(encoding="utf8"),
paths.train.text.open(encoding="utf8"),
max_doc_length=config.max_doc_length,
limit=limit,
)
optimizer = initialize_pipeline(nlp, docs, golds, config, gpu_device)
batch_sizes = compounding(config.min_batch_size, config.max_batch_size, 1.001)
beam_prob = compounding(0.2, 0.8, 1.001)
for i in range(config.nr_epoch):
docs, golds = read_data(
nlp,
paths.train.conllu.open(encoding="utf8"),
paths.train.text.open(encoding="utf8"),
max_doc_length=config.max_doc_length,
limit=limit,
oracle_segments=use_oracle_segments,
raw_text=not use_oracle_segments,
)
Xs = list(zip(docs, golds))
random.shuffle(Xs)
if config.batch_by_words:
batches = minibatch_by_words(Xs, size=batch_sizes)
else:
batches = minibatch(Xs, size=batch_sizes)
losses = {}
n_train_words = sum(len(doc) for doc in docs)
with tqdm.tqdm(total=n_train_words, leave=False) as pbar:
for batch in batches:
batch_docs, batch_gold = zip(*batch)
pbar.update(sum(len(doc) for doc in batch_docs))
nlp.parser.cfg["beam_update_prob"] = next(beam_prob)
nlp.update(
batch_docs,
batch_gold,
sgd=optimizer,
drop=config.dropout,
losses=losses,
)
out_path = parses_dir / corpus / "epoch-{i}.conllu".format(i=i)
with nlp.use_params(optimizer.averages):
if use_oracle_segments:
parsed_docs, scores = evaluate(nlp, paths.dev.conllu,
paths.dev.conllu, out_path)
else:
parsed_docs, scores = evaluate(nlp, paths.dev.text,
paths.dev.conllu, out_path)
print_progress(i, losses, scores)
def _render_parses(i, to_render):
to_render[0].user_data["title"] = "Batch %d" % i
with Path("/tmp/parses.html").open("w", encoding="utf8") as file_:
html = displacy.render(to_render[:5], style="dep", page=True)
file_.write(html)
if __name__ == "__main__":
plac.call(main)
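# Example invocation (hypothetical script name and paths; the three positional
# arguments are the UD corpus directory, an output directory for the parses,
# and the treebank name, as declared in the plac annotations above):
#
#     python ud_train.py /data/ud-treebanks /tmp/parses UD_English-EWT -C config.json -g 0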

View File

@@ -1,5 +1,2 @@
# build version constraints for use with wheelwright + multibuild
numpy==1.15.0; python_version<='3.7'
numpy==1.17.3; python_version=='3.8'
numpy==1.19.3; python_version=='3.9'
numpy; python_version>='3.10'
# build version constraints for use with wheelwright
numpy>=2.0.0,<3.0.0

View File

@@ -2,18 +2,129 @@
# spaCy examples
The examples are Python scripts with well-behaved command line interfaces. For
more detailed usage guides, see the [documentation](https://spacy.io/usage/).
For spaCy v3 we've converted many of the [v2 example
scripts](https://github.com/explosion/spaCy/tree/v2.3.x/examples/) into
end-to-end [spacy projects](https://spacy.io/usage/projects) workflows. The
workflows include all the steps to go from data to packaged spaCy models.
To see the available arguments, you can use the `--help` or `-h` flag:
## 🪐 Pipeline component demos
```bash
$ python examples/training/train_ner.py --help
```
The simplest demos for training a single pipeline component are in the
[`pipelines`](https://github.com/explosion/projects/blob/v3/pipelines) category
including:
While we try to keep the examples up to date, they are not currently exercised
by the test suite, as some of them require significant data downloads or take
time to train. If you find that an example is no longer running,
[please tell us](https://github.com/explosion/spaCy/issues)! We know there's
nothing worse than trying to figure out what you're doing wrong, and it turns
out your code was never the problem.
- [`pipelines/ner_demo`](https://github.com/explosion/projects/blob/v3/pipelines/ner_demo):
Train a named entity recognizer
- [`pipelines/textcat_demo`](https://github.com/explosion/projects/blob/v3/pipelines/textcat_demo):
Train a text classifier
- [`pipelines/parser_intent_demo`](https://github.com/explosion/projects/blob/v3/pipelines/parser_intent_demo):
Train a dependency parser for custom semantics
## 🪐 Tutorials
The [`tutorials`](https://github.com/explosion/projects/blob/v3/tutorials)
category includes examples that work through specific NLP use cases end-to-end:
- [`tutorials/textcat_goemotions`](https://github.com/explosion/projects/blob/v3/tutorials/textcat_goemotions):
Train a text classifier to categorize emotions in Reddit posts
- [`tutorials/nel_emerson`](https://github.com/explosion/projects/blob/v3/tutorials/nel_emerson):
Use an entity linker to disambiguate mentions of the same name
Check out the [projects documentation](https://spacy.io/usage/projects) and
browse through the [available
projects](https://github.com/explosion/projects/)!
## 🚀 Get started with a demo project
The
[`pipelines/ner_demo`](https://github.com/explosion/projects/blob/v3/pipelines/ner_demo)
project converts the spaCy v2
[`train_ner.py`](https://github.com/explosion/spaCy/blob/v2.3.x/examples/training/train_ner.py)
demo script into a spaCy v3 project.
1. Clone the project:
```bash
python -m spacy project clone pipelines/ner_demo
```
2. Install requirements and download any data assets:
```bash
cd ner_demo
python -m pip install -r requirements.txt
python -m spacy project assets
```
3. Run the default workflow to convert, train and evaluate:
```bash
python -m spacy project run all
```
Sample output:
```none
Running workflow 'all'
================================== convert ==================================
Running command: /home/user/venv/bin/python scripts/convert.py en assets/train.json corpus/train.spacy
Running command: /home/user/venv/bin/python scripts/convert.py en assets/dev.json corpus/dev.spacy
=============================== create-config ===============================
Running command: /home/user/venv/bin/python -m spacy init config --lang en --pipeline ner configs/config.cfg --force
Generated config template specific for your use case
- Language: en
- Pipeline: ner
- Optimize for: efficiency
- Hardware: CPU
- Transformer: None
✔ Auto-filled config with all values
✔ Saved config
configs/config.cfg
You can now add your data and train your pipeline:
python -m spacy train config.cfg --paths.train ./train.spacy --paths.dev ./dev.spacy
=================================== train ===================================
Running command: /home/user/venv/bin/python -m spacy train configs/config.cfg --output training/ --paths.train corpus/train.spacy --paths.dev corpus/dev.spacy --training.eval_frequency 10 --training.max_steps 100 --gpu-id -1
Using CPU
=========================== Initializing pipeline ===========================
[2021-03-11 19:34:59,101] [INFO] Set up nlp object from config
[2021-03-11 19:34:59,109] [INFO] Pipeline: ['tok2vec', 'ner']
[2021-03-11 19:34:59,113] [INFO] Created vocabulary
[2021-03-11 19:34:59,113] [INFO] Finished initializing nlp object
[2021-03-11 19:34:59,265] [INFO] Initialized pipeline components: ['tok2vec', 'ner']
✔ Initialized pipeline
============================= Training pipeline =============================
Pipeline: ['tok2vec', 'ner']
Initial learn rate: 0.001
E # LOSS TOK2VEC LOSS NER ENTS_F ENTS_P ENTS_R SCORE
--- ------ ------------ -------- ------ ------ ------ ------
0 0 0.00 7.90 0.00 0.00 0.00 0.00
10 10 0.11 71.07 0.00 0.00 0.00 0.00
20 20 0.65 22.44 50.00 50.00 50.00 0.50
30 30 0.22 6.38 80.00 66.67 100.00 0.80
40 40 0.00 0.00 80.00 66.67 100.00 0.80
50 50 0.00 0.00 80.00 66.67 100.00 0.80
60 60 0.00 0.00 100.00 100.00 100.00 1.00
70 70 0.00 0.00 100.00 100.00 100.00 1.00
80 80 0.00 0.00 100.00 100.00 100.00 1.00
90 90 0.00 0.00 100.00 100.00 100.00 1.00
100 100 0.00 0.00 100.00 100.00 100.00 1.00
✔ Saved pipeline to output directory
training/model-last
```
4. Package the model:
```bash
python -m spacy project run package
```
5. Visualize the model's output with [Streamlit](https://streamlit.io):
```bash
python -m spacy project run visualize-model
```
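Once the package step has built a wheel, a minimal sketch of loading and using
the pipeline might look like the snippet below (the package name `en_ner_demo`
is an assumption and depends on your project settings; install the built wheel
with pip first):

```python
import spacy

# Assumes the wheel produced by `spacy project run package` has been installed.
nlp = spacy.load("en_ner_demo")
doc = nlp("Apple is opening a new office in San Francisco.")
print([(ent.text, ent.label_) for ent in doc.ents])
```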

View File

@@ -1,267 +0,0 @@
"""
This example shows how to use an LSTM sentiment classification model trained
using Keras in spaCy. spaCy splits the document into sentences, and each
sentence is classified using the LSTM. The scores for the sentences are then
aggregated to give the document score. This kind of hierarchical model is quite
difficult in "pure" Keras or Tensorflow, but it's very effective. The Keras
example on this dataset performs quite poorly, because it cuts off the documents
so that they're a fixed size. This hurts review accuracy a lot, because people
often summarise their rating in the final sentence.
Prerequisites:
spacy download en_vectors_web_lg
pip install keras==2.0.9
Compatible with: spaCy v2.0.0+
"""
import plac
import random
import pathlib
import cytoolz
import numpy
from keras.models import Sequential, model_from_json
from keras.layers import LSTM, Dense, Embedding, Bidirectional
from keras.layers import TimeDistributed
from keras.optimizers import Adam
import thinc.extra.datasets
from spacy.compat import pickle
import spacy
class SentimentAnalyser(object):
@classmethod
def load(cls, path, nlp, max_length=100):
with (path / "config.json").open() as file_:
model = model_from_json(file_.read())
with (path / "model").open("rb") as file_:
lstm_weights = pickle.load(file_)
embeddings = get_embeddings(nlp.vocab)
model.set_weights([embeddings] + lstm_weights)
return cls(model, max_length=max_length)
def __init__(self, model, max_length=100):
self._model = model
self.max_length = max_length
def __call__(self, doc):
X = get_features([doc], self.max_length)
y = self._model.predict(X)
self.set_sentiment(doc, y)
def pipe(self, docs, batch_size=1000):
for minibatch in cytoolz.partition_all(batch_size, docs):
minibatch = list(minibatch)
sentences = []
for doc in minibatch:
sentences.extend(doc.sents)
Xs = get_features(sentences, self.max_length)
ys = self._model.predict(Xs)
for sent, label in zip(sentences, ys):
sent.doc.sentiment += label - 0.5
for doc in minibatch:
yield doc
def set_sentiment(self, doc, y):
doc.sentiment = float(y[0])
# Sentiment has a native slot for a single float.
# For arbitrary data storage, there's:
# doc.user_data['my_data'] = y
def get_labelled_sentences(docs, doc_labels):
labels = []
sentences = []
for doc, y in zip(docs, doc_labels):
for sent in doc.sents:
sentences.append(sent)
labels.append(y)
return sentences, numpy.asarray(labels, dtype="int32")
def get_features(docs, max_length):
docs = list(docs)
Xs = numpy.zeros((len(docs), max_length), dtype="int32")
for i, doc in enumerate(docs):
j = 0
for token in doc:
vector_id = token.vocab.vectors.find(key=token.orth)
if vector_id >= 0:
Xs[i, j] = vector_id
else:
Xs[i, j] = 0
j += 1
if j >= max_length:
break
return Xs
def train(
train_texts,
train_labels,
dev_texts,
dev_labels,
lstm_shape,
lstm_settings,
lstm_optimizer,
batch_size=100,
nb_epoch=5,
by_sentence=True,
):
print("Loading spaCy")
nlp = spacy.load("en_vectors_web_lg")
nlp.add_pipe(nlp.create_pipe("sentencizer"))
embeddings = get_embeddings(nlp.vocab)
model = compile_lstm(embeddings, lstm_shape, lstm_settings)
print("Parsing texts...")
train_docs = list(nlp.pipe(train_texts))
dev_docs = list(nlp.pipe(dev_texts))
if by_sentence:
train_docs, train_labels = get_labelled_sentences(train_docs, train_labels)
dev_docs, dev_labels = get_labelled_sentences(dev_docs, dev_labels)
train_X = get_features(train_docs, lstm_shape["max_length"])
dev_X = get_features(dev_docs, lstm_shape["max_length"])
model.fit(
train_X,
train_labels,
validation_data=(dev_X, dev_labels),
epochs=nb_epoch,
batch_size=batch_size,
)
return model
def compile_lstm(embeddings, shape, settings):
model = Sequential()
model.add(
Embedding(
embeddings.shape[0],
embeddings.shape[1],
input_length=shape["max_length"],
trainable=False,
weights=[embeddings],
mask_zero=True,
)
)
model.add(TimeDistributed(Dense(shape["nr_hidden"], use_bias=False)))
model.add(
Bidirectional(
LSTM(
shape["nr_hidden"],
recurrent_dropout=settings["dropout"],
dropout=settings["dropout"],
)
)
)
model.add(Dense(shape["nr_class"], activation="sigmoid"))
model.compile(
optimizer=Adam(lr=settings["lr"]),
loss="binary_crossentropy",
metrics=["accuracy"],
)
return model
def get_embeddings(vocab):
return vocab.vectors.data
def evaluate(model_dir, texts, labels, max_length=100):
nlp = spacy.load("en_vectors_web_lg")
nlp.add_pipe(nlp.create_pipe("sentencizer"))
nlp.add_pipe(SentimentAnalyser.load(model_dir, nlp, max_length=max_length))
correct = 0
i = 0
for doc in nlp.pipe(texts, batch_size=1000):
correct += bool(doc.sentiment >= 0.5) == bool(labels[i])
i += 1
return float(correct) / i
def read_data(data_dir, limit=0):
examples = []
for subdir, label in (("pos", 1), ("neg", 0)):
for filename in (data_dir / subdir).iterdir():
with filename.open() as file_:
text = file_.read()
examples.append((text, label))
random.shuffle(examples)
if limit >= 1:
examples = examples[:limit]
return zip(*examples) # Unzips into two lists
@plac.annotations(
train_dir=("Location of training file or directory"),
dev_dir=("Location of development file or directory"),
model_dir=("Location of output model directory",),
is_runtime=("Demonstrate run-time usage", "flag", "r", bool),
nr_hidden=("Number of hidden units", "option", "H", int),
max_length=("Maximum sentence length", "option", "L", int),
dropout=("Dropout", "option", "d", float),
learn_rate=("Learn rate", "option", "e", float),
nb_epoch=("Number of training epochs", "option", "i", int),
batch_size=("Size of minibatches for training LSTM", "option", "b", int),
nr_examples=("Limit to N examples", "option", "n", int),
)
def main(
model_dir=None,
train_dir=None,
dev_dir=None,
is_runtime=False,
nr_hidden=64,
max_length=100, # Shape
dropout=0.5,
learn_rate=0.001, # General NN config
nb_epoch=5,
batch_size=256,
nr_examples=-1,
): # Training params
if model_dir is not None:
model_dir = pathlib.Path(model_dir)
if train_dir is None or dev_dir is None:
imdb_data = thinc.extra.datasets.imdb()
if is_runtime:
if dev_dir is None:
dev_texts, dev_labels = zip(*imdb_data[1])
else:
dev_texts, dev_labels = read_data(dev_dir)
acc = evaluate(model_dir, dev_texts, dev_labels, max_length=max_length)
print(acc)
else:
if train_dir is None:
train_texts, train_labels = zip(*imdb_data[0])
else:
print("Read data")
train_texts, train_labels = read_data(train_dir, limit=nr_examples)
if dev_dir is None:
dev_texts, dev_labels = zip(*imdb_data[1])
else:
dev_texts, dev_labels = read_data(dev_dir, limit=nr_examples)
train_labels = numpy.asarray(train_labels, dtype="int32")
dev_labels = numpy.asarray(dev_labels, dtype="int32")
lstm = train(
train_texts,
train_labels,
dev_texts,
dev_labels,
{"nr_hidden": nr_hidden, "max_length": max_length, "nr_class": 1},
{"dropout": dropout, "lr": learn_rate},
{},
nb_epoch=nb_epoch,
batch_size=batch_size,
)
weights = lstm.get_weights()
if model_dir is not None:
with (model_dir / "model").open("wb") as file_:
pickle.dump(weights[1:], file_)
with (model_dir / "config.json").open("w") as file_:
file_.write(lstm.to_json())
if __name__ == "__main__":
plac.call(main)
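# Runtime usage sketch (assumes a model has already been trained and saved to
# model_dir as above; mirrors the evaluate() helper):
#
#     nlp = spacy.load("en_vectors_web_lg")
#     nlp.add_pipe(nlp.create_pipe("sentencizer"))
#     nlp.add_pipe(SentimentAnalyser.load(model_dir, nlp, max_length=100))
#     doc = nlp("This movie was surprisingly good.")
#     print(doc.sentiment)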

View File

@@ -1,82 +0,0 @@
#!/usr/bin/env python
# coding: utf8
"""A simple example of extracting relations between phrases and entities using
spaCy's named entity recognizer and the dependency parse. Here, we extract
money and currency values (entities labelled as MONEY) and then check the
dependency tree to find the noun phrase they are referring to for example:
$9.4 million --> Net income.
Compatible with: spaCy v2.0.0+
Last tested with: v2.2.1
"""
from __future__ import unicode_literals, print_function
import plac
import spacy
TEXTS = [
"Net income was $9.4 million compared to the prior year of $2.7 million.",
"Revenue exceeded twelve billion dollars, with a loss of $1b.",
]
@plac.annotations(
model=("Model to load (needs parser and NER)", "positional", None, str)
)
def main(model="en_core_web_sm"):
nlp = spacy.load(model)
print("Loaded model '%s'" % model)
print("Processing %d texts" % len(TEXTS))
for text in TEXTS:
doc = nlp(text)
relations = extract_currency_relations(doc)
for r1, r2 in relations:
print("{:<10}\t{}\t{}".format(r1.text, r2.ent_type_, r2.text))
def filter_spans(spans):
# Filter a sequence of spans so they don't contain overlaps
# For spaCy 2.1.4+: this function is available as spacy.util.filter_spans()
get_sort_key = lambda span: (span.end - span.start, -span.start)
sorted_spans = sorted(spans, key=get_sort_key, reverse=True)
result = []
seen_tokens = set()
for span in sorted_spans:
# Check for end - 1 here because boundaries are inclusive
if span.start not in seen_tokens and span.end - 1 not in seen_tokens:
result.append(span)
seen_tokens.update(range(span.start, span.end))
result = sorted(result, key=lambda span: span.start)
return result
def extract_currency_relations(doc):
# Merge entities and noun chunks into one token
spans = list(doc.ents) + list(doc.noun_chunks)
spans = filter_spans(spans)
with doc.retokenize() as retokenizer:
for span in spans:
retokenizer.merge(span)
relations = []
for money in filter(lambda w: w.ent_type_ == "MONEY", doc):
if money.dep_ in ("attr", "dobj"):
subject = [w for w in money.head.lefts if w.dep_ == "nsubj"]
if subject:
subject = subject[0]
relations.append((subject, money))
elif money.dep_ == "pobj" and money.head.dep_ == "prep":
relations.append((money.head.head, money))
return relations
if __name__ == "__main__":
plac.call(main)
# Expected output:
# Net income MONEY $9.4 million
# the prior year MONEY $2.7 million
# Revenue MONEY twelve billion dollars
# a loss MONEY 1b

View File

@@ -1,67 +0,0 @@
#!/usr/bin/env python
# coding: utf8
"""This example shows how to navigate the parse tree including subtrees
attached to a word.
Based on issue #252:
"In the documents and tutorials the main thing I haven't found is
examples on how to break sentences down into small sub thoughts/chunks. The
noun_chunks is handy, but having examples on using the token.head to find small
(near-complete) sentence chunks would be neat. Lets take the example sentence:
"displaCy uses CSS and JavaScript to show you how computers understand language"
This sentence has two main parts (XCOMP & CCOMP) according to the breakdown:
[displaCy] uses CSS and Javascript [to + show]
show you how computers understand [language]
I'm assuming that we can use the token.head to build these groups."
Compatible with: spaCy v2.0.0+
Last tested with: v2.1.0
"""
from __future__ import unicode_literals, print_function
import plac
import spacy
@plac.annotations(model=("Model to load", "positional", None, str))
def main(model="en_core_web_sm"):
nlp = spacy.load(model)
print("Loaded model '%s'" % model)
doc = nlp(
"displaCy uses CSS and JavaScript to show you how computers "
"understand language"
)
# The easiest way is to find the head of the subtree you want, and then use
# the `.subtree`, `.children`, `.lefts` and `.rights` iterators. `.subtree`
# is the one that does what you're asking for most directly:
for word in doc:
if word.dep_ in ("xcomp", "ccomp"):
print("".join(w.text_with_ws for w in word.subtree))
# It'd probably be better for `word.subtree` to return a `Span` object
# instead of a generator over the tokens. If you want the `Span` you can
# get it via the `.right_edge` and `.left_edge` properties. The `Span`
# object is nice because you can easily get a vector, merge it, etc.
for word in doc:
if word.dep_ in ("xcomp", "ccomp"):
subtree_span = doc[word.left_edge.i : word.right_edge.i + 1]
print(subtree_span.text, "|", subtree_span.root.text)
# You might also want to select a head, and then select a start and end
# position by walking along its children. You could then take the
# `.left_edge` and `.right_edge` of those tokens, and use it to calculate
# a span.
if __name__ == "__main__":
plac.call(main)
# Expected output:
# to show you how computers understand language
# how computers understand language
# to show you how computers understand language | show
# how computers understand language | understand

View File

@@ -1,112 +0,0 @@
#!/usr/bin/env python
# coding: utf8
"""Match a large set of multi-word expressions in O(1) time.
The idea is to associate each word in the vocabulary with a tag, noting whether
they begin, end, or are inside at least one pattern. An additional tag is used
for single-word patterns. Complete patterns are also stored in a hash set.
When we process a document, we look up the words in the vocabulary, to
associate the words with the tags. We then search for tag-sequences that
correspond to valid candidates. Finally, we look up the candidates in the hash
set.
For instance, to search for the phrases "Barack Hussein Obama" and "Hilary
Clinton", we would associate "Barack" and "Hilary" with the B tag, Hussein with
the I tag, and Obama and Clinton with the L tag.
The document "Barack Clinton and Hilary Clinton" would have the tag sequence
[{B}, {L}, {}, {B}, {L}], so we'd get two matches. However, only the second
candidate is in the phrase dictionary, so only one is returned as a match.
The algorithm is O(n) at run-time for a document of length n because we're only
ever matching over the tag patterns. So no matter how many phrases we're
looking for, our pattern set stays very small (exact size depends on the
maximum length we're looking for, as the query language currently has no
quantifiers).
The example expects a .bz2 file from the Reddit corpus, and a patterns file,
formatted in jsonl as a sequence of entries like this:
{"text":"Anchorage"}
{"text":"Angola"}
{"text":"Ann Arbor"}
{"text":"Annapolis"}
{"text":"Appalachia"}
{"text":"Argentina"}
Reddit comments corpus:
* https://files.pushshift.io/reddit/
* https://archive.org/details/2015_reddit_comments_corpus
Compatible with: spaCy v2.0.0+
"""
from __future__ import print_function, unicode_literals, division
from bz2 import BZ2File
import time
import plac
import json
from spacy.matcher import PhraseMatcher
import spacy
@plac.annotations(
patterns_loc=("Path to gazetteer", "positional", None, str),
text_loc=("Path to Reddit corpus file", "positional", None, str),
n=("Number of texts to read", "option", "n", int),
lang=("Language class to initialise", "option", "l", str),
)
def main(patterns_loc, text_loc, n=10000, lang="en"):
nlp = spacy.blank(lang)
nlp.vocab.lex_attr_getters = {}
phrases = read_gazetteer(nlp.tokenizer, patterns_loc)
count = 0
t1 = time.time()
for ent_id, text in get_matches(nlp.tokenizer, phrases, read_text(text_loc, n=n)):
count += 1
t2 = time.time()
print("%d docs in %.3f s. %d matches" % (n, (t2 - t1), count))
def read_gazetteer(tokenizer, loc, n=-1):
for i, line in enumerate(open(loc)):
data = json.loads(line.strip())
phrase = tokenizer(data["text"])
for w in phrase:
_ = tokenizer.vocab[w.text]
if len(phrase) >= 2:
yield phrase
def read_text(bz2_loc, n=10000):
with BZ2File(bz2_loc) as file_:
for i, line in enumerate(file_):
data = json.loads(line)
yield data["body"]
if i >= n:
break
def get_matches(tokenizer, phrases, texts):
matcher = PhraseMatcher(tokenizer.vocab)
matcher.add("Phrase", None, *phrases)
for text in texts:
doc = tokenizer(text)
for w in doc:
_ = doc.vocab[w.text]
matches = matcher(doc)
for ent_id, start, end in matches:
yield (ent_id, doc[start:end].text)
if __name__ == "__main__":
if False:
import cProfile
import pstats
cProfile.runctx("plac.call(main)", globals(), locals(), "Profile.prof")
s = pstats.Stats("Profile.prof")
s.strip_dirs().sort_stats("time").print_stats()
else:
plac.call(main)
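# A minimal sketch of producing a patterns file in the JSONL format described
# in the module docstring (hypothetical path and phrase list):
#
#     import json
#     phrases = ["Barack Hussein Obama", "Hilary Clinton", "Ann Arbor"]
#     with open("patterns.jsonl", "w", encoding="utf8") as f:
#         for text in phrases:
#             f.write(json.dumps({"text": text}) + "\n")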

View File

@@ -1,114 +0,0 @@
<a href="https://explosion.ai"><img src="https://explosion.ai/assets/img/logo.svg" width="125" height="125" align="right" /></a>
# A decomposable attention model for Natural Language Inference
**by Matthew Honnibal, [@honnibal](https://github.com/honnibal)**
**Updated for spaCy 2.0+ and Keras 2.2.2+ by John Stewart, [@free-variation](https://github.com/free-variation)**
This directory contains an implementation of the entailment prediction model described
by [Parikh et al. (2016)](https://arxiv.org/pdf/1606.01933.pdf). The model is notable
for its competitive performance with very few parameters.
The model is implemented using [Keras](https://keras.io/) and [spaCy](https://spacy.io).
Keras is used to build and train the network. spaCy is used to load
the [GloVe](http://nlp.stanford.edu/projects/glove/) vectors, perform the
feature extraction, and help you apply the model at run-time. The following
demo code shows how the entailment model can be used at runtime, once the
hook is installed to customise the `.similarity()` method of spaCy's `Doc`
and `Span` objects:
```python
def demo(shape):
nlp = spacy.load('en_vectors_web_lg')
nlp.add_pipe(KerasSimilarityShim.load(nlp.path / 'similarity', nlp, shape[0]))
doc1 = nlp(u'The king of France is bald.')
doc2 = nlp(u'France has no king.')
print("Sentence 1:", doc1)
print("Sentence 2:", doc2)
entailment_type, confidence = doc1.similarity(doc2)
print("Entailment type:", entailment_type, "(Confidence:", confidence, ")")
```
Which gives the output `Entailment type: contradiction (Confidence: 0.60604566)`, showing that
the system has definite opinions about Bertrand Russell's [famous conundrum](https://users.drew.edu/jlenz/br-on-denoting.html)!
I'm working on a blog post to explain Parikh et al.'s model in more detail.
A [notebook](https://github.com/free-variation/spaCy/blob/master/examples/notebooks/Decompositional%20Attention.ipynb) is available that briefly explains this implementation.
I think it is a very interesting example of the attention mechanism, which
I didn't understand very well before working through this paper. There are
lots of ways to extend the model.
## What's where
| File | Description |
| --- | --- |
| `__main__.py` | The script that will be executed. Defines the CLI, the data reading, etc — all the boring stuff. |
| `spacy_hook.py` | Provides a class `KerasSimilarityShim` that lets you use an arbitrary function to customize spaCy's `doc.similarity()` method. Instead of the default average-of-vectors algorithm, when you call `doc1.similarity(doc2)`, you'll get the result of `your_model(doc1, doc2)`. |
| `keras_decomposable_attention.py` | Defines the neural network model. |
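As a rough sketch of the hook mechanism `spacy_hook.py` relies on (spaCy v2 API,
matching the demo above; this is not the actual `KerasSimilarityShim`
implementation), a pipeline component can reroute `doc.similarity()` by
assigning to the document's user hooks:

```python
import spacy

class SimilarityShim:
    def __init__(self, predict):
        # predict is any callable taking two Doc/Span objects and returning a score
        self.predict = predict

    def __call__(self, doc):
        # Future doc.similarity(other) and span.similarity(other) calls now
        # dispatch to our model instead of the default vector average.
        doc.user_hooks["similarity"] = self.predict
        doc.user_span_hooks["similarity"] = self.predict
        return doc

nlp = spacy.load("en_vectors_web_lg")
nlp.add_pipe(SimilarityShim(lambda d1, d2: 0.0))  # dummy predictor for illustration
```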
## Setting up
First, install [Keras](https://keras.io/), [spaCy](https://spacy.io) and the spaCy
English models (about 1GB of data):
```bash
pip install keras
pip install spacy
python -m spacy download en_vectors_web_lg
```
You'll also want to get Keras working on your GPU, and you will need a backend, such as TensorFlow or Theano.
This will depend on your set up, so you're mostly on your own for this step. If you're using AWS, try the
[NVidia AMI](https://aws.amazon.com/marketplace/pp/B00FYCDDTE). It made things pretty easy.
Once you've installed the dependencies, you can run a small preliminary test of
the Keras model:
```bash
py.test keras_parikh_entailment/keras_decomposable_attention.py
```
This compiles the model and fits it with some dummy data. You should see that
both tests passed.
Finally, download the [Stanford Natural Language Inference corpus](http://nlp.stanford.edu/projects/snli/).
## Running the example
You can run the `keras_parikh_entailment/` directory as a script, which executes the file
[`keras_parikh_entailment/__main__.py`](__main__.py). If you run the script without arguments
the usage is shown. Running it with `-h` explains the command line arguments.
The first thing you'll want to do is train the model:
```bash
python keras_parikh_entailment/ train -t <path to SNLI train JSON> -s <path to SNLI dev JSON>
```
Training takes about 300 epochs for full accuracy, and I haven't rerun the full
experiment since refactoring things to publish this example — please let me
know if I've broken something. You should get to at least 85% on the development data even after 10-15 epochs.
The other two modes demonstrate run-time usage. I never like relying on the accuracy printed
by `.fit()` methods. I never really feel confident until I've run a new process that loads
the model and starts making predictions, without access to the gold labels. I've therefore
included an `evaluate` mode.
```bash
python keras_parikh_entailment/ evaluate -s <path to SNLI train JSON>
```
Finally, there's also a little demo, which mostly exists to show
you how run-time usage will eventually look.
```bash
python keras_parikh_entailment/ demo
```
## Getting updates
We should have the blog post explaining the model ready before the end of the week. To get
notified when it's published, you can either follow me on [Twitter](https://twitter.com/honnibal)
or subscribe to our [mailing list](http://eepurl.com/ckUpQ5).

Some files were not shown because too many files have changed in this diff.