Mirror of https://github.com/explosion/spaCy.git, synced 2024-12-26 01:46:28 +03:00

Commit 9d32e839d3: Merge branch 'develop' into feature/init-config-cpu-gpu

108  .github/contributors/KKsharma99.md  (vendored, new file)

@@ -0,0 +1,108 @@

<!-- This agreement was mistakenly submitted as an update to the CONTRIBUTOR_AGREEMENT.md template. Commit: 8a2d22222dec5cf910df5a378cbcd9ea2ab53ec4. It was therefore moved over manually. -->

# spaCy contributor agreement

This spaCy Contributor Agreement (**"SCA"**) is based on the
[Oracle Contributor Agreement](http://www.oracle.com/technetwork/oca-405177.pdf).
The SCA applies to any contribution that you make to any product or project
managed by us (the **"project"**), and sets out the intellectual property rights
you grant to us in the contributed materials. The term **"us"** shall mean
[ExplosionAI UG (haftungsbeschränkt)](https://explosion.ai/legal). The term
**"you"** shall mean the person or entity identified below.

If you agree to be bound by these terms, fill in the information requested
below and include the filled-in version with your first pull request, under the
folder [`.github/contributors/`](/.github/contributors/). The name of the file
should be your GitHub username, with the extension `.md`. For example, the user
example_user would create the file `.github/contributors/example_user.md`.
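The filing convention above can be sketched as a couple of shell commands (a minimal sketch; `example_user` is the hypothetical username from the text, and the file body is abbreviated rather than the full signed agreement):

```shell
# Place the filled-in agreement under .github/contributors/,
# named after the contributor's GitHub username, with extension .md.
mkdir -p .github/contributors
printf '# spaCy contributor agreement\n' > .github/contributors/example_user.md
```

In practice the file's contents are a copy of this agreement with the checkbox marked and the details table filled in.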

Read this agreement carefully before signing. These terms and conditions
constitute a binding legal agreement.

## Contributor Agreement

1. The term "contribution" or "contributed materials" means any source code,
object code, patch, tool, sample, graphic, specification, manual,
documentation, or any other material posted or submitted by you to the project.

2. With respect to any worldwide copyrights, or copyright applications and
registrations, in your contribution:

    * you hereby assign to us joint ownership, and to the extent that such
      assignment is or becomes invalid, ineffective or unenforceable, you hereby
      grant to us a perpetual, irrevocable, non-exclusive, worldwide, no-charge,
      royalty-free, unrestricted license to exercise all rights under those
      copyrights. This includes, at our option, the right to sublicense these same
      rights to third parties through multiple levels of sublicensees or other
      licensing arrangements;

    * you agree that each of us can do all things in relation to your
      contribution as if each of us were the sole owners, and if one of us makes
      a derivative work of your contribution, the one who makes the derivative
      work (or has it made) will be the sole owner of that derivative work;

    * you agree that you will not assert any moral rights in your contribution
      against us, our licensees or transferees;

    * you agree that we may register a copyright in your contribution and
      exercise all ownership rights associated with it; and

    * you agree that neither of us has any duty to consult with, obtain the
      consent of, pay or render an accounting to the other for any use or
      distribution of your contribution.

3. With respect to any patents you own, or that you can license without payment
to any third party, you hereby grant to us a perpetual, irrevocable,
non-exclusive, worldwide, no-charge, royalty-free license to:

    * make, have made, use, sell, offer to sell, import, and otherwise transfer
      your contribution in whole or in part, alone or in combination with or
      included in any product, work or materials arising out of the project to
      which your contribution was submitted, and

    * at our option, to sublicense these same rights to third parties through
      multiple levels of sublicensees or other licensing arrangements.

4. Except as set out above, you keep all right, title, and interest in your
contribution. The rights that you grant to us under these terms are effective
on the date you first submitted a contribution to us, even if your submission
took place before the date you sign these terms.

5. You covenant, represent, warrant and agree that:

    * each contribution that you submit is and shall be an original work of
      authorship and you can legally grant the rights set out in this SCA;

    * to the best of your knowledge, each contribution will not violate any
      third party's copyrights, trademarks, patents, or other intellectual
      property rights; and

    * each contribution shall be in compliance with U.S. export control laws and
      other applicable export and import laws. You agree to notify us if you
      become aware of any circumstance which would make any of the foregoing
      representations inaccurate in any respect. We may publicly disclose your
      participation in the project, including the fact that you have signed the SCA.

6. This SCA is governed by the laws of the State of California and applicable
U.S. Federal law. Any choice of law rules will not apply.

7. Please place an “x” on one of the applicable statements below. Please do NOT
mark both statements:

    * [x] I am signing on behalf of myself as an individual and no other person
      or entity, including my employer, has or will have rights with respect to my
      contributions.

    * [ ] I am signing on behalf of my employer or a legal entity and I have the
      actual authority to contractually bind that entity.

## Contributor Details

| Field                          | Entry                |
| ------------------------------ | -------------------- |
| Name                           | Kunal Sharma         |
| Company name (if applicable)   |                      |
| Title or role (if applicable)  |                      |
| Date                           | 10/19/2020           |
| GitHub username                | KKsharma99           |
| Website (optional)             |                      |

106  .github/contributors/borijang.md  (vendored, new file)

@@ -0,0 +1,106 @@

# spaCy contributor agreement

This spaCy Contributor Agreement (**"SCA"**) is based on the
[Oracle Contributor Agreement](http://www.oracle.com/technetwork/oca-405177.pdf).
The SCA applies to any contribution that you make to any product or project
managed by us (the **"project"**), and sets out the intellectual property rights
you grant to us in the contributed materials. The term **"us"** shall mean
[ExplosionAI GmbH](https://explosion.ai/legal). The term
**"you"** shall mean the person or entity identified below.

If you agree to be bound by these terms, fill in the information requested
below and include the filled-in version with your first pull request, under the
folder [`.github/contributors/`](/.github/contributors/). The name of the file
should be your GitHub username, with the extension `.md`. For example, the user
example_user would create the file `.github/contributors/example_user.md`.

Read this agreement carefully before signing. These terms and conditions
constitute a binding legal agreement.

## Contributor Agreement

1. The term "contribution" or "contributed materials" means any source code,
object code, patch, tool, sample, graphic, specification, manual,
documentation, or any other material posted or submitted by you to the project.

2. With respect to any worldwide copyrights, or copyright applications and
registrations, in your contribution:

    * you hereby assign to us joint ownership, and to the extent that such
      assignment is or becomes invalid, ineffective or unenforceable, you hereby
      grant to us a perpetual, irrevocable, non-exclusive, worldwide, no-charge,
      royalty-free, unrestricted license to exercise all rights under those
      copyrights. This includes, at our option, the right to sublicense these same
      rights to third parties through multiple levels of sublicensees or other
      licensing arrangements;

    * you agree that each of us can do all things in relation to your
      contribution as if each of us were the sole owners, and if one of us makes
      a derivative work of your contribution, the one who makes the derivative
      work (or has it made) will be the sole owner of that derivative work;

    * you agree that you will not assert any moral rights in your contribution
      against us, our licensees or transferees;

    * you agree that we may register a copyright in your contribution and
      exercise all ownership rights associated with it; and

    * you agree that neither of us has any duty to consult with, obtain the
      consent of, pay or render an accounting to the other for any use or
      distribution of your contribution.

3. With respect to any patents you own, or that you can license without payment
to any third party, you hereby grant to us a perpetual, irrevocable,
non-exclusive, worldwide, no-charge, royalty-free license to:

    * make, have made, use, sell, offer to sell, import, and otherwise transfer
      your contribution in whole or in part, alone or in combination with or
      included in any product, work or materials arising out of the project to
      which your contribution was submitted, and

    * at our option, to sublicense these same rights to third parties through
      multiple levels of sublicensees or other licensing arrangements.

4. Except as set out above, you keep all right, title, and interest in your
contribution. The rights that you grant to us under these terms are effective
on the date you first submitted a contribution to us, even if your submission
took place before the date you sign these terms.

5. You covenant, represent, warrant and agree that:

    * each contribution that you submit is and shall be an original work of
      authorship and you can legally grant the rights set out in this SCA;

    * to the best of your knowledge, each contribution will not violate any
      third party's copyrights, trademarks, patents, or other intellectual
      property rights; and

    * each contribution shall be in compliance with U.S. export control laws and
      other applicable export and import laws. You agree to notify us if you
      become aware of any circumstance which would make any of the foregoing
      representations inaccurate in any respect. We may publicly disclose your
      participation in the project, including the fact that you have signed the SCA.

6. This SCA is governed by the laws of the State of California and applicable
U.S. Federal law. Any choice of law rules will not apply.

7. Please place an “x” on one of the applicable statements below. Please do NOT
mark both statements:

    * [ ] I am signing on behalf of myself as an individual and no other person
      or entity, including my employer, has or will have rights with respect to my
      contributions.

    * [x] I am signing on behalf of my employer or a legal entity and I have the
      actual authority to contractually bind that entity.

## Contributor Details

| Field                          | Entry                |
| ------------------------------ | -------------------- |
| Name                           | Borijan Georgievski  |
| Company name (if applicable)   | Netcetera            |
| Title or role (if applicable)  | Data Scientist       |
| Date                           | 2020.10.09           |
| GitHub username                | borijang             |
| Website (optional)             |                      |

106  .github/contributors/danielvasic.md  (vendored, new file)

@@ -0,0 +1,106 @@

# spaCy contributor agreement

This spaCy Contributor Agreement (**"SCA"**) is based on the
[Oracle Contributor Agreement](http://www.oracle.com/technetwork/oca-405177.pdf).
The SCA applies to any contribution that you make to any product or project
managed by us (the **"project"**), and sets out the intellectual property rights
you grant to us in the contributed materials. The term **"us"** shall mean
[ExplosionAI GmbH](https://explosion.ai/legal). The term
**"you"** shall mean the person or entity identified below.

If you agree to be bound by these terms, fill in the information requested
below and include the filled-in version with your first pull request, under the
folder [`.github/contributors/`](/.github/contributors/). The name of the file
should be your GitHub username, with the extension `.md`. For example, the user
example_user would create the file `.github/contributors/example_user.md`.

Read this agreement carefully before signing. These terms and conditions
constitute a binding legal agreement.

## Contributor Agreement

1. The term "contribution" or "contributed materials" means any source code,
object code, patch, tool, sample, graphic, specification, manual,
documentation, or any other material posted or submitted by you to the project.

2. With respect to any worldwide copyrights, or copyright applications and
registrations, in your contribution:

    * you hereby assign to us joint ownership, and to the extent that such
      assignment is or becomes invalid, ineffective or unenforceable, you hereby
      grant to us a perpetual, irrevocable, non-exclusive, worldwide, no-charge,
      royalty-free, unrestricted license to exercise all rights under those
      copyrights. This includes, at our option, the right to sublicense these same
      rights to third parties through multiple levels of sublicensees or other
      licensing arrangements;

    * you agree that each of us can do all things in relation to your
      contribution as if each of us were the sole owners, and if one of us makes
      a derivative work of your contribution, the one who makes the derivative
      work (or has it made) will be the sole owner of that derivative work;

    * you agree that you will not assert any moral rights in your contribution
      against us, our licensees or transferees;

    * you agree that we may register a copyright in your contribution and
      exercise all ownership rights associated with it; and

    * you agree that neither of us has any duty to consult with, obtain the
      consent of, pay or render an accounting to the other for any use or
      distribution of your contribution.

3. With respect to any patents you own, or that you can license without payment
to any third party, you hereby grant to us a perpetual, irrevocable,
non-exclusive, worldwide, no-charge, royalty-free license to:

    * make, have made, use, sell, offer to sell, import, and otherwise transfer
      your contribution in whole or in part, alone or in combination with or
      included in any product, work or materials arising out of the project to
      which your contribution was submitted, and

    * at our option, to sublicense these same rights to third parties through
      multiple levels of sublicensees or other licensing arrangements.

4. Except as set out above, you keep all right, title, and interest in your
contribution. The rights that you grant to us under these terms are effective
on the date you first submitted a contribution to us, even if your submission
took place before the date you sign these terms.

5. You covenant, represent, warrant and agree that:

    * each contribution that you submit is and shall be an original work of
      authorship and you can legally grant the rights set out in this SCA;

    * to the best of your knowledge, each contribution will not violate any
      third party's copyrights, trademarks, patents, or other intellectual
      property rights; and

    * each contribution shall be in compliance with U.S. export control laws and
      other applicable export and import laws. You agree to notify us if you
      become aware of any circumstance which would make any of the foregoing
      representations inaccurate in any respect. We may publicly disclose your
      participation in the project, including the fact that you have signed the SCA.

6. This SCA is governed by the laws of the State of California and applicable
U.S. Federal law. Any choice of law rules will not apply.

7. Please place an “x” on one of the applicable statements below. Please do NOT
mark both statements:

    * [x] I am signing on behalf of myself as an individual and no other person
      or entity, including my employer, has or will have rights with respect to my
      contributions.

    * [ ] I am signing on behalf of my employer or a legal entity and I have the
      actual authority to contractually bind that entity.

## Contributor Details

| Field                          | Entry                |
| ------------------------------ | -------------------- |
| Name                           | Daniel Vasić         |
| Company name (if applicable)   | University of Mostar |
| Title or role (if applicable)  | Teaching assistant   |
| Date                           | 13/10/2020           |
| GitHub username                | danielvasic          |
| Website (optional)             |                      |

106  .github/contributors/forest1988.md  (vendored, new file)

@@ -0,0 +1,106 @@

# spaCy contributor agreement

This spaCy Contributor Agreement (**"SCA"**) is based on the
[Oracle Contributor Agreement](http://www.oracle.com/technetwork/oca-405177.pdf).
The SCA applies to any contribution that you make to any product or project
managed by us (the **"project"**), and sets out the intellectual property rights
you grant to us in the contributed materials. The term **"us"** shall mean
[ExplosionAI GmbH](https://explosion.ai/legal). The term
**"you"** shall mean the person or entity identified below.

If you agree to be bound by these terms, fill in the information requested
below and include the filled-in version with your first pull request, under the
folder [`.github/contributors/`](/.github/contributors/). The name of the file
should be your GitHub username, with the extension `.md`. For example, the user
example_user would create the file `.github/contributors/example_user.md`.

Read this agreement carefully before signing. These terms and conditions
constitute a binding legal agreement.

## Contributor Agreement

1. The term "contribution" or "contributed materials" means any source code,
object code, patch, tool, sample, graphic, specification, manual,
documentation, or any other material posted or submitted by you to the project.

2. With respect to any worldwide copyrights, or copyright applications and
registrations, in your contribution:

    * you hereby assign to us joint ownership, and to the extent that such
      assignment is or becomes invalid, ineffective or unenforceable, you hereby
      grant to us a perpetual, irrevocable, non-exclusive, worldwide, no-charge,
      royalty-free, unrestricted license to exercise all rights under those
      copyrights. This includes, at our option, the right to sublicense these same
      rights to third parties through multiple levels of sublicensees or other
      licensing arrangements;

    * you agree that each of us can do all things in relation to your
      contribution as if each of us were the sole owners, and if one of us makes
      a derivative work of your contribution, the one who makes the derivative
      work (or has it made) will be the sole owner of that derivative work;

    * you agree that you will not assert any moral rights in your contribution
      against us, our licensees or transferees;

    * you agree that we may register a copyright in your contribution and
      exercise all ownership rights associated with it; and

    * you agree that neither of us has any duty to consult with, obtain the
      consent of, pay or render an accounting to the other for any use or
      distribution of your contribution.

3. With respect to any patents you own, or that you can license without payment
to any third party, you hereby grant to us a perpetual, irrevocable,
non-exclusive, worldwide, no-charge, royalty-free license to:

    * make, have made, use, sell, offer to sell, import, and otherwise transfer
      your contribution in whole or in part, alone or in combination with or
      included in any product, work or materials arising out of the project to
      which your contribution was submitted, and

    * at our option, to sublicense these same rights to third parties through
      multiple levels of sublicensees or other licensing arrangements.

4. Except as set out above, you keep all right, title, and interest in your
contribution. The rights that you grant to us under these terms are effective
on the date you first submitted a contribution to us, even if your submission
took place before the date you sign these terms.

5. You covenant, represent, warrant and agree that:

    * each contribution that you submit is and shall be an original work of
      authorship and you can legally grant the rights set out in this SCA;

    * to the best of your knowledge, each contribution will not violate any
      third party's copyrights, trademarks, patents, or other intellectual
      property rights; and

    * each contribution shall be in compliance with U.S. export control laws and
      other applicable export and import laws. You agree to notify us if you
      become aware of any circumstance which would make any of the foregoing
      representations inaccurate in any respect. We may publicly disclose your
      participation in the project, including the fact that you have signed the SCA.

6. This SCA is governed by the laws of the State of California and applicable
U.S. Federal law. Any choice of law rules will not apply.

7. Please place an “x” on one of the applicable statements below. Please do NOT
mark both statements:

    * [x] I am signing on behalf of myself as an individual and no other person
      or entity, including my employer, has or will have rights with respect to my
      contributions.

    * [ ] I am signing on behalf of my employer or a legal entity and I have the
      actual authority to contractually bind that entity.

## Contributor Details

| Field                          | Entry                        |
| ------------------------------ | ---------------------------- |
| Name                           | Yusuke Mori                  |
| Company name (if applicable)   |                              |
| Title or role (if applicable)  | Ph.D. student                |
| Date                           | 2020/11/22                   |
| GitHub username                | forest1988                   |
| Website (optional)             | https://forest1988.github.io |

106  .github/contributors/jabortell.md  (vendored, new file)

@@ -0,0 +1,106 @@

# spaCy contributor agreement

This spaCy Contributor Agreement (**"SCA"**) is based on the
[Oracle Contributor Agreement](http://www.oracle.com/technetwork/oca-405177.pdf).
The SCA applies to any contribution that you make to any product or project
managed by us (the **"project"**), and sets out the intellectual property rights
you grant to us in the contributed materials. The term **"us"** shall mean
[ExplosionAI GmbH](https://explosion.ai/legal). The term
**"you"** shall mean the person or entity identified below.

If you agree to be bound by these terms, fill in the information requested
below and include the filled-in version with your first pull request, under the
folder [`.github/contributors/`](/.github/contributors/). The name of the file
should be your GitHub username, with the extension `.md`. For example, the user
example_user would create the file `.github/contributors/example_user.md`.

Read this agreement carefully before signing. These terms and conditions
constitute a binding legal agreement.

## Contributor Agreement

1. The term "contribution" or "contributed materials" means any source code,
object code, patch, tool, sample, graphic, specification, manual,
documentation, or any other material posted or submitted by you to the project.

2. With respect to any worldwide copyrights, or copyright applications and
registrations, in your contribution:

    * you hereby assign to us joint ownership, and to the extent that such
      assignment is or becomes invalid, ineffective or unenforceable, you hereby
      grant to us a perpetual, irrevocable, non-exclusive, worldwide, no-charge,
      royalty-free, unrestricted license to exercise all rights under those
      copyrights. This includes, at our option, the right to sublicense these same
      rights to third parties through multiple levels of sublicensees or other
      licensing arrangements;

    * you agree that each of us can do all things in relation to your
      contribution as if each of us were the sole owners, and if one of us makes
      a derivative work of your contribution, the one who makes the derivative
      work (or has it made) will be the sole owner of that derivative work;

    * you agree that you will not assert any moral rights in your contribution
      against us, our licensees or transferees;

    * you agree that we may register a copyright in your contribution and
      exercise all ownership rights associated with it; and

    * you agree that neither of us has any duty to consult with, obtain the
      consent of, pay or render an accounting to the other for any use or
      distribution of your contribution.

3. With respect to any patents you own, or that you can license without payment
to any third party, you hereby grant to us a perpetual, irrevocable,
non-exclusive, worldwide, no-charge, royalty-free license to:

    * make, have made, use, sell, offer to sell, import, and otherwise transfer
      your contribution in whole or in part, alone or in combination with or
      included in any product, work or materials arising out of the project to
      which your contribution was submitted, and

    * at our option, to sublicense these same rights to third parties through
      multiple levels of sublicensees or other licensing arrangements.

4. Except as set out above, you keep all right, title, and interest in your
contribution. The rights that you grant to us under these terms are effective
on the date you first submitted a contribution to us, even if your submission
took place before the date you sign these terms.

5. You covenant, represent, warrant and agree that:

    * each contribution that you submit is and shall be an original work of
      authorship and you can legally grant the rights set out in this SCA;

    * to the best of your knowledge, each contribution will not violate any
      third party's copyrights, trademarks, patents, or other intellectual
      property rights; and

    * each contribution shall be in compliance with U.S. export control laws and
      other applicable export and import laws. You agree to notify us if you
      become aware of any circumstance which would make any of the foregoing
      representations inaccurate in any respect. We may publicly disclose your
      participation in the project, including the fact that you have signed the SCA.

6. This SCA is governed by the laws of the State of California and applicable
U.S. Federal law. Any choice of law rules will not apply.

7. Please place an “x” on one of the applicable statements below. Please do NOT
mark both statements:

    * [x] I am signing on behalf of myself as an individual and no other person
      or entity, including my employer, has or will have rights with respect to my
      contributions.

    * [ ] I am signing on behalf of my employer or a legal entity and I have the
      actual authority to contractually bind that entity.

## Contributor Details

| Field                          | Entry                |
| ------------------------------ | -------------------- |
| Name                           | Jacob Bortell        |
| Company name (if applicable)   |                      |
| Title or role (if applicable)  |                      |
| Date                           | 2020-11-20           |
| GitHub username                | jabortell            |
| Website (optional)             |                      |

106  .github/contributors/revuel.md  (vendored, new file)
|
@ -0,0 +1,106 @@
|
||||||
|
# spaCy contributor agreement

This spaCy Contributor Agreement (**"SCA"**) is based on the
[Oracle Contributor Agreement](http://www.oracle.com/technetwork/oca-405177.pdf).
The SCA applies to any contribution that you make to any product or project
managed by us (the **"project"**), and sets out the intellectual property rights
you grant to us in the contributed materials. The term **"us"** shall mean
[ExplosionAI UG (haftungsbeschränkt)](https://explosion.ai/legal). The term
**"you"** shall mean the person or entity identified below.

If you agree to be bound by these terms, fill in the information requested
below and include the filled-in version with your first pull request, under the
folder [`.github/contributors/`](/.github/contributors/). The name of the file
should be your GitHub username, with the extension `.md`. For example, the user
example_user would create the file `.github/contributors/example_user.md`.

Read this agreement carefully before signing. These terms and conditions
constitute a binding legal agreement.

## Contributor Agreement

1. The term "contribution" or "contributed materials" means any source code,
object code, patch, tool, sample, graphic, specification, manual,
documentation, or any other material posted or submitted by you to the project.

2. With respect to any worldwide copyrights, or copyright applications and
registrations, in your contribution:

   * you hereby assign to us joint ownership, and to the extent that such
     assignment is or becomes invalid, ineffective or unenforceable, you hereby
     grant to us a perpetual, irrevocable, non-exclusive, worldwide, no-charge,
     royalty-free, unrestricted license to exercise all rights under those
     copyrights. This includes, at our option, the right to sublicense these same
     rights to third parties through multiple levels of sublicensees or other
     licensing arrangements;

   * you agree that each of us can do all things in relation to your
     contribution as if each of us were the sole owners, and if one of us makes
     a derivative work of your contribution, the one who makes the derivative
     work (or has it made) will be the sole owner of that derivative work;

   * you agree that you will not assert any moral rights in your contribution
     against us, our licensees or transferees;

   * you agree that we may register a copyright in your contribution and
     exercise all ownership rights associated with it; and

   * you agree that neither of us has any duty to consult with, obtain the
     consent of, pay or render an accounting to the other for any use or
     distribution of your contribution.

3. With respect to any patents you own, or that you can license without payment
to any third party, you hereby grant to us a perpetual, irrevocable,
non-exclusive, worldwide, no-charge, royalty-free license to:

   * make, have made, use, sell, offer to sell, import, and otherwise transfer
     your contribution in whole or in part, alone or in combination with or
     included in any product, work or materials arising out of the project to
     which your contribution was submitted, and

   * at our option, to sublicense these same rights to third parties through
     multiple levels of sublicensees or other licensing arrangements.

4. Except as set out above, you keep all right, title, and interest in your
contribution. The rights that you grant to us under these terms are effective
on the date you first submitted a contribution to us, even if your submission
took place before the date you sign these terms.

5. You covenant, represent, warrant and agree that:

   * Each contribution that you submit is and shall be an original work of
     authorship and you can legally grant the rights set out in this SCA;

   * to the best of your knowledge, each contribution will not violate any
     third party's copyrights, trademarks, patents, or other intellectual
     property rights; and

   * each contribution shall be in compliance with U.S. export control laws and
     other applicable export and import laws. You agree to notify us if you
     become aware of any circumstance which would make any of the foregoing
     representations inaccurate in any respect. We may publicly disclose your
     participation in the project, including the fact that you have signed the SCA.

6. This SCA is governed by the laws of the State of California and applicable
U.S. Federal law. Any choice of law rules will not apply.

7. Please place an “x” on one of the applicable statements below. Please do NOT
mark both statements:

   * [x] I am signing on behalf of myself as an individual and no other person
     or entity, including my employer, has or will have rights with respect to my
     contributions.

   * [ ] I am signing on behalf of my employer or a legal entity and I have the
     actual authority to contractually bind that entity.

## Contributor Details

| Field                          | Entry                |
| ------------------------------ | -------------------- |
| Name                           | Miguel Revuelta      |
| Company name (if applicable)   |                      |
| Title or role (if applicable)  |                      |
| Date                           | 2020-11-17           |
| GitHub username                | revuel               |
| Website (optional)             |                      |
106
.github/contributors/robertsipek.md
vendored
Normal file

@@ -0,0 +1,106 @@

# spaCy contributor agreement

<!-- The standard spaCy contributor agreement text is repeated here verbatim (see above), except that the term **"us"** means [ExplosionAI GmbH](https://explosion.ai/legal). -->

## Contributor Details

| Field                          | Entry                |
| ------------------------------ | -------------------- |
| Name                           | Robert Šípek         |
| Company name (if applicable)   |                      |
| Title or role (if applicable)  |                      |
| Date                           | 22.10.2020           |
| GitHub username                | @robertsipek         |
| Website (optional)             |                      |
106
.github/contributors/vha14.md
vendored
Normal file

@@ -0,0 +1,106 @@

# spaCy contributor agreement

<!-- The standard spaCy contributor agreement text (with **"us"** meaning [ExplosionAI UG (haftungsbeschränkt)](https://explosion.ai/legal)) is repeated here verbatim; see above. -->

## Contributor Details

| Field                          | Entry                |
| ------------------------------ | -------------------- |
| Name                           | Vu Ha                |
| Company name (if applicable)   |                      |
| Title or role (if applicable)  |                      |
| Date                           | 10-23-2020           |
| GitHub username                | vha14                |
| Website (optional)             |                      |
106
.github/contributors/walterhenry.md
vendored
Normal file

@@ -0,0 +1,106 @@

# spaCy contributor agreement

<!-- The standard spaCy contributor agreement text is repeated here verbatim (see above), except that the term **"us"** means [ExplosionAI GmbH](https://explosion.ai/legal). -->

## Contributor Details

| Field                          | Entry                |
| ------------------------------ | -------------------- |
| Name                           | Walter Henry         |
| Company name (if applicable)   | ExplosionAI GmbH     |
| Title or role (if applicable)  | Executive Assistant  |
| Date                           | September 14, 2020   |
| GitHub username                | walterhenry          |
| Website (optional)             |                      |
azure-pipelines.yml

@@ -2,96 +2,113 @@ trigger:

The updated pipeline configuration:

trigger:
  batch: true
  branches:
    include:
      - "*"
    exclude:
      - "spacy.io"
  paths:
    exclude:
      - "website/*"
      - "*.md"
pr:
  paths:
    exclude:
      - "website/*"
      - "*.md"

jobs:
  # Perform basic checks for most important errors (syntax etc.) Uses the config
  # defined in .flake8 and overwrites the selected codes.
  - job: "Validate"
    pool:
      vmImage: "ubuntu-16.04"
    steps:
      - task: UsePythonVersion@0
        inputs:
          versionSpec: "3.7"
      - script: |
          pip install flake8==3.5.0
          python -m flake8 spacy --count --select=E901,E999,F821,F822,F823 --show-source --statistics
        displayName: "flake8"

  - job: "Test"
    dependsOn: "Validate"
    strategy:
      matrix:
        Python36Linux:
          imageName: "ubuntu-16.04"
          python.version: "3.6"
        Python36Windows:
          imageName: "vs2017-win2016"
          python.version: "3.6"
        Python36Mac:
          imageName: "macos-10.14"
          python.version: "3.6"
        # Don't test on 3.7 for now to speed up builds
        # Python37Linux:
        #   imageName: 'ubuntu-16.04'
        #   python.version: '3.7'
        # Python37Windows:
        #   imageName: 'vs2017-win2016'
        #   python.version: '3.7'
        # Python37Mac:
        #   imageName: 'macos-10.14'
        #   python.version: '3.7'
        Python38Linux:
          imageName: "ubuntu-16.04"
          python.version: "3.8"
        Python38Windows:
          imageName: "vs2017-win2016"
          python.version: "3.8"
        Python38Mac:
          imageName: "macos-10.14"
          python.version: "3.8"
        # Python39Linux:
        #   imageName: "ubuntu-16.04"
        #   python.version: "3.9"
        # Python39Windows:
        #   imageName: "vs2017-win2016"
        #   python.version: "3.9"
        # Python39Mac:
        #   imageName: "macos-10.14"
        #   python.version: "3.9"
      maxParallel: 4
    pool:
      vmImage: $(imageName)
    steps:
      - task: UsePythonVersion@0
        inputs:
          versionSpec: "$(python.version)"
          architecture: "x64"
      - script: |
          python -m pip install -U pip setuptools
          pip install -r requirements.txt
        displayName: "Install dependencies"
        condition: not(eq(variables['python.version'], '3.5'))
      - script: |
          python setup.py build_ext --inplace -j 2
          python setup.py sdist --formats=gztar
        displayName: "Compile and build sdist"
      - task: DeleteFiles@1
        inputs:
          contents: "spacy"
        displayName: "Delete source directory"
      - script: |
          pip freeze > installed.txt
          pip uninstall -y -r installed.txt
        displayName: "Uninstall all packages"
      - bash: |
          SDIST=$(python -c "import os;print(os.listdir('./dist')[-1])" 2>&1)
          pip install dist/$SDIST
        displayName: "Install from sdist"
        condition: not(eq(variables['python.version'], '3.5'))
      - script: |
          pip install -r requirements.txt
          python -m pytest --pyargs spacy
        displayName: "Run tests"
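The pipeline's "Install from sdist" step picks the archive to install by taking the last entry of `os.listdir('./dist')`. A minimal sketch of that selection logic (the `pick_sdist` helper and the archive name are illustrative, not part of the pipeline):

```python
import os
import tempfile

def pick_sdist(dist_dir: str) -> str:
    # Mirrors the CI one-liner: take the last entry of os.listdir().
    # This relies on dist/ holding exactly one freshly built archive;
    # with multiple files the listing order is filesystem-dependent.
    return os.listdir(dist_dir)[-1]

with tempfile.TemporaryDirectory() as dist:
    open(os.path.join(dist, "spacy-3.0.0rc2.tar.gz"), "w").close()
    print(pick_sdist(dist))  # prints the single archive name
```

Because the build step writes exactly one `.tar.gz` into a fresh `dist/`, the last-entry trick is safe in CI even though directory order is not guaranteed in general.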
5
build-constraints.txt
Normal file

@@ -0,0 +1,5 @@

# build version constraints for use with wheelwright + multibuild
numpy==1.15.0; python_version<='3.7'
numpy==1.17.3; python_version=='3.8'
numpy==1.19.3; python_version=='3.9'
numpy; python_version>='3.10'
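Each constraint line selects a different build-time numpy per interpreter via a PEP 508 environment marker. A rough sketch of the selection these markers encode (hand-rolled here for illustration; a real installer evaluates the markers themselves):

```python
def numpy_build_pin(python_version: str) -> str:
    # Hand-rolled approximation of the environment markers in
    # build-constraints.txt; not a general PEP 508 evaluator.
    major, minor = (int(part) for part in python_version.split(".")[:2])
    if (major, minor) <= (3, 7):
        return "numpy==1.15.0"
    if (major, minor) == (3, 8):
        return "numpy==1.17.3"
    if (major, minor) == (3, 9):
        return "numpy==1.19.3"
    return "numpy"  # python_version>='3.10': no pin chosen yet

print(numpy_build_pin("3.6"))  # numpy==1.15.0
print(numpy_build_pin("3.9"))  # numpy==1.19.3
```

Pinning the oldest numpy that supports each Python version at build time keeps the compiled extensions ABI-compatible with any newer numpy at runtime.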
netlify.toml

@@ -3,6 +3,8 @@ redirects = [
    {from = "https://spacy.netlify.com/*", to="https://spacy.io/:splat", force = true },
    # Subdomain for branches
    {from = "https://nightly.spacy.io/*", to="https://nightly-spacy-io.spacy.io/:splat", force = true, status = 200},
    # TODO: update this with the v2 branch build once v3 is live (status = 200)
    {from = "https://v2.spacy.io/*", to="https://spacy.io/:splat", force = true},
    # Old subdomains
    {from = "https://survey.spacy.io/*", to = "https://spacy.io", force = true},
    {from = "http://survey.spacy.io/*", to = "https://spacy.io", force = true},
@@ -1,13 +1,16 @@
 [build-system]
 requires = [
     "setuptools",
-    "wheel",
     "cython>=0.25",
     "cymem>=2.0.2,<2.1.0",
     "preshed>=3.0.2,<3.1.0",
     "murmurhash>=0.28.0,<1.1.0",
     "thinc>=8.0.0rc2,<8.1.0",
     "blis>=0.4.0,<0.8.0",
-    "pathy"
+    "pathy",
+    "numpy==1.15.0; python_version<='3.7'",
+    "numpy==1.17.3; python_version=='3.8'",
+    "numpy==1.19.3; python_version=='3.9'",
+    "numpy; python_version>='3.10'",
 ]
 build-backend = "setuptools.build_meta"
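The pins above use PEP 508 environment markers so each Python version builds against the oldest compatible numpy (important for the binary compatibility of compiled wheels). A standalone sketch, not part of the diff, of how those markers resolve per interpreter version (the function name is illustrative):

```python
def numpy_pin(python_version: tuple) -> str:
    """Return the numpy requirement matching the build constraints above.

    Mirrors the environment markers; `python_version` is e.g. (3, 8).
    """
    if python_version <= (3, 7):
        return "numpy==1.15.0"
    if python_version == (3, 8):
        return "numpy==1.17.3"
    if python_version == (3, 9):
        return "numpy==1.19.3"
    # python_version >= (3, 10): no pin, latest numpy
    return "numpy"
```

Keeping `build-constraints.txt` and the `pyproject.toml` markers in sync is the point of the change: both the wheelwright builds and regular source builds then compile against the same numpy ABI floor.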
@@ -20,6 +20,7 @@ classifiers =
     Programming Language :: Python :: 3.6
     Programming Language :: Python :: 3.7
     Programming Language :: Python :: 3.8
+    Programming Language :: Python :: 3.9
     Topic :: Scientific/Engineering

 [options]
@@ -27,7 +28,6 @@ zip_safe = false
 include_package_data = true
 python_requires = >=3.6
 setup_requires =
-    wheel
     cython>=0.25
    numpy>=1.15.0
     # We also need our Cython packages here to compile against
6  setup.py

@@ -2,9 +2,9 @@
 from setuptools import Extension, setup, find_packages
 import sys
 import platform
+import numpy
 from distutils.command.build_ext import build_ext
 from distutils.sysconfig import get_python_inc
-import numpy
 from pathlib import Path
 import shutil
 from Cython.Build import cythonize
@@ -194,8 +194,8 @@ def setup_package():
             print(f"Copied {copy_file} -> {target_dir}")

     include_dirs = [
-        get_python_inc(plat_specific=True),
         numpy.get_include(),
+        get_python_inc(plat_specific=True),
     ]
     ext_modules = []
     for name in MOD_NAMES:
@@ -212,7 +212,7 @@ def setup_package():
         ext_modules=ext_modules,
         cmdclass={"build_ext": build_ext_subclass},
         include_dirs=include_dirs,
-        package_data={"": ["*.pyx", "*.pxd", "*.pxi", "*.cpp"]},
+        package_data={"": ["*.pyx", "*.pxd", "*.pxi"]},
     )
@@ -45,14 +45,16 @@ def init_config_cli(
     if isinstance(optimize, Optimizations):  # instance of enum from the CLI
         optimize = optimize.value
     pipeline = string_to_list(pipeline)
-    init_config(
-        output_file,
+    is_stdout = str(output_file) == "-"
+    config = init_config(
         lang=lang,
         pipeline=pipeline,
         optimize=optimize,
         gpu=gpu,
         pretraining=pretraining,
+        silent=is_stdout,
     )
+    save_config(config, output_file, is_stdout=is_stdout)


 @init_cli.command("fill-config")
@@ -118,16 +120,15 @@ def fill_config(


 def init_config(
-    output_file: Path,
     *,
     lang: str,
     pipeline: List[str],
     optimize: str,
     gpu: bool,
     pretraining: bool = False,
-) -> None:
-    is_stdout = str(output_file) == "-"
-    msg = Printer(no_print=is_stdout)
+    silent: bool = True,
+) -> Config:
+    msg = Printer(no_print=silent)
     with TEMPLATE_PATH.open("r") as f:
         template = Template(f.read())
     # Filter out duplicates since tok2vec and transformer are added by template
@@ -173,7 +174,7 @@ def init_config(
         pretrain_config = util.load_config(DEFAULT_CONFIG_PRETRAIN_PATH)
         config = pretrain_config.merge(config)
     msg.good("Auto-filled config with all values")
-    save_config(config, output_file, is_stdout=is_stdout)
+    return config


 def save_config(
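The refactor above separates building the config from writing it out: `init_config` now returns a `Config` and the CLI decides whether to save or print to stdout. A minimal standalone sketch of that pattern, with illustrative names that are not part of the spaCy API:

```python
def build_config(lang: str, pipeline: list, silent: bool = True) -> dict:
    """Pure builder: no I/O, so it is reusable outside the CLI."""
    if not silent:
        print(f"Generating config for {lang}: {pipeline}")
    return {"nlp": {"lang": lang, "pipeline": pipeline}}


def save_config(config: dict, output_file: str, is_stdout: bool = False) -> None:
    """Serialization lives in one place; '-' as output means stdout."""
    text = str(config)
    if is_stdout:
        print(text)
    else:
        with open(output_file, "w", encoding="utf8") as f:
            f.write(text)
```

Returning the config instead of writing it inside the builder is what lets the same function back both `init config` on the command line and programmatic use.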
@@ -119,6 +119,10 @@ class Warnings:
             "call the {matcher} on each Doc object.")
     W107 = ("The property `Doc.{prop}` is deprecated. Use "
             "`Doc.has_annotation(\"{attr}\")` instead.")
+    W108 = ("The rule-based lemmatizer did not find POS annotation for the "
+            "token '{text}'. Check that your pipeline includes components that "
+            "assign token.pos, typically 'tagger'+'attribute_ruler' or "
+            "'morphologizer'.")


 @add_codes
@@ -210,8 +210,12 @@ _ukrainian_lower = r"а-щюяіїєґ"
 _ukrainian_upper = r"А-ЩЮЯІЇЄҐ"
 _ukrainian = r"а-щюяіїєґА-ЩЮЯІЇЄҐ"

-_upper = LATIN_UPPER + _russian_upper + _tatar_upper + _greek_upper + _ukrainian_upper
-_lower = LATIN_LOWER + _russian_lower + _tatar_lower + _greek_lower + _ukrainian_lower
+_macedonian_lower = r"ѓѕјљњќѐѝ"
+_macedonian_upper = r"ЃЅЈЉЊЌЀЍ"
+_macedonian = r"ѓѕјљњќѐѝЃЅЈЉЊЌЀЍ"

+_upper = LATIN_UPPER + _russian_upper + _tatar_upper + _greek_upper + _ukrainian_upper + _macedonian_upper
+_lower = LATIN_LOWER + _russian_lower + _tatar_lower + _greek_lower + _ukrainian_lower + _macedonian_lower

 _uncased = (
     _bengali
@@ -226,7 +230,7 @@ _uncased = (
     + _cjk
 )

-ALPHA = group_chars(LATIN + _russian + _tatar + _greek + _ukrainian + _uncased)
+ALPHA = group_chars(LATIN + _russian + _tatar + _greek + _ukrainian + _macedonian + _uncased)
 ALPHA_LOWER = group_chars(_lower + _uncased)
 ALPHA_UPPER = group_chars(_upper + _uncased)
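The diff above extends the alphabetic character classes with the Macedonian-specific Cyrillic letters so the tokenizer's `ALPHA` patterns match them. A standalone sketch of how such concatenated ranges end up in a regex character class; `group_chars` here is a simplified stand-in for spaCy's helper, not its real implementation:

```python
import re

# Macedonian-specific letters from the diff (the full ALPHA class also
# includes Latin, Russian, Tatar, Greek and Ukrainian ranges).
_macedonian_lower = "ѓѕјљњќѐѝ"
_macedonian_upper = "ЃЅЈЉЊЌЀЍ"


def group_chars(chars: str) -> str:
    """Simplified stand-in: deduplicate characters, preserving order."""
    return "".join(dict.fromkeys(chars))


# One or more Macedonian-specific letters
ALPHA_MK = re.compile(f"[{group_chars(_macedonian_lower + _macedonian_upper)}]+")
```

If a language's letters are missing from these classes, tokens containing them fail `like_alpha`-style checks and can be split incorrectly, which is why new languages patch `char_classes.py`.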
@@ -1,9 +1,16 @@
 from .stop_words import STOP_WORDS
+from .tag_map import TAG_MAP
+from ...language import Language
+from ...attrs import LANG
 from .lex_attrs import LEX_ATTRS
 from ...language import Language


 class CzechDefaults(Language.Defaults):
+    lex_attr_getters = dict(Language.Defaults.lex_attr_getters)
+    lex_attr_getters.update(LEX_ATTRS)
+    lex_attr_getters[LANG] = lambda text: "cs"
+    tag_map = TAG_MAP
     stop_words = STOP_WORDS
     lex_attr_getters = LEX_ATTRS
4312  spacy/lang/cs/tag_map.py  Normal file

File diff suppressed because it is too large
|
@ -6,10 +6,21 @@ from ...tokens import Doc, Span
|
||||||
|
|
||||||
|
|
||||||
def noun_chunks(doclike: Union[Doc, Span]) -> Iterator[Span]:
|
def noun_chunks(doclike: Union[Doc, Span]) -> Iterator[Span]:
|
||||||
"""Detect base noun phrases from a dependency parse. Works on Doc and Span."""
|
"""
|
||||||
# fmt: off
|
Detect base noun phrases from a dependency parse. Works on both Doc and Span.
|
||||||
labels = ["nsubj", "dobj", "nsubjpass", "pcomp", "pobj", "dative", "appos", "attr", "ROOT"]
|
"""
|
||||||
# fmt: on
|
labels = [
|
||||||
|
"oprd",
|
||||||
|
"nsubj",
|
||||||
|
"dobj",
|
||||||
|
"nsubjpass",
|
||||||
|
"pcomp",
|
||||||
|
"pobj",
|
||||||
|
"dative",
|
||||||
|
"appos",
|
||||||
|
"attr",
|
||||||
|
"ROOT",
|
||||||
|
]
|
||||||
doc = doclike.doc # Ensure works on both Doc and Span.
|
doc = doclike.doc # Ensure works on both Doc and Span.
|
||||||
if not doc.has_annotation("DEP"):
|
if not doc.has_annotation("DEP"):
|
||||||
raise ValueError(Errors.E029)
|
raise ValueError(Errors.E029)
|
||||||
|
|
48  spacy/lang/mk/__init__.py  Normal file

@@ -0,0 +1,48 @@
+from typing import Optional
+from thinc.api import Model
+from .lemmatizer import MacedonianLemmatizer
+from .stop_words import STOP_WORDS
+from .tokenizer_exceptions import TOKENIZER_EXCEPTIONS
+from .lex_attrs import LEX_ATTRS
+from ..tokenizer_exceptions import BASE_EXCEPTIONS
+
+from ...language import Language
+from ...attrs import LANG
+from ...util import update_exc
+from ...lookups import Lookups
+
+
+class MacedonianDefaults(Language.Defaults):
+    lex_attr_getters = dict(Language.Defaults.lex_attr_getters)
+    lex_attr_getters[LANG] = lambda text: "mk"
+
+    # Optional: replace flags with custom functions, e.g. like_num()
+    lex_attr_getters.update(LEX_ATTRS)
+
+    # Merge base exceptions and custom tokenizer exceptions
+    tokenizer_exceptions = update_exc(BASE_EXCEPTIONS, TOKENIZER_EXCEPTIONS)
+    stop_words = STOP_WORDS
+
+    @classmethod
+    def create_lemmatizer(cls, nlp=None, lookups=None):
+        if lookups is None:
+            lookups = Lookups()
+        return MacedonianLemmatizer(lookups)
+
+
+class Macedonian(Language):
+    lang = "mk"
+    Defaults = MacedonianDefaults
+
+
+@Macedonian.factory(
+    "lemmatizer",
+    assigns=["token.lemma"],
+    default_config={"model": None, "mode": "rule"},
+    default_score_weights={"lemma_acc": 1.0},
+)
+def make_lemmatizer(nlp: Language, model: Optional[Model], name: str, mode: str):
+    return MacedonianLemmatizer(nlp.vocab, model, name, mode=mode)
+
+
+__all__ = ["Macedonian"]
55  spacy/lang/mk/lemmatizer.py  Normal file

@@ -0,0 +1,55 @@
+from typing import List
+from collections import OrderedDict
+
+from ...pipeline import Lemmatizer
+from ...tokens import Token
+
+
+class MacedonianLemmatizer(Lemmatizer):
+    def rule_lemmatize(self, token: Token) -> List[str]:
+        string = token.text
+        univ_pos = token.pos_.lower()
+        morphology = token.morph.to_dict()
+
+        if univ_pos in ("", "eol", "space"):
+            return [string.lower()]
+
+        if string[-3:] == 'јќи':
+            string = string[:-3]
+            univ_pos = "verb"
+
+        if callable(self.is_base_form) and self.is_base_form(univ_pos, morphology):
+            return [string.lower()]
+        index_table = self.lookups.get_table("lemma_index", {})
+        exc_table = self.lookups.get_table("lemma_exc", {})
+        rules_table = self.lookups.get_table("lemma_rules", {})
+        if not any((index_table.get(univ_pos), exc_table.get(univ_pos), rules_table.get(univ_pos))):
+            if univ_pos == "propn":
+                return [string]
+            else:
+                return [string.lower()]
+
+        index = index_table.get(univ_pos, {})
+        exceptions = exc_table.get(univ_pos, {})
+        rules = rules_table.get(univ_pos, [])
+
+        orig = string
+        string = string.lower()
+        forms = []
+
+        for old, new in rules:
+            if string.endswith(old):
+                form = string[: len(string) - len(old)] + new
+                if not form:
+                    continue
+                if form in index or not form.isalpha():
+                    forms.append(form)
+
+        forms = list(OrderedDict.fromkeys(forms))
+        for form in exceptions.get(string, []):
+            if form not in forms:
+                forms.insert(0, form)
+        if not forms:
+            forms.append(orig)
+
+        return forms
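The lemmatizer above strips known suffixes via (old, new) rewrite rules and only keeps candidate forms found in a lookup index, falling back to the original token. A standalone sketch of that core loop, using hypothetical rule data rather than spaCy's lookup tables:

```python
from collections import OrderedDict


def rule_lemmatize(string: str, rules, index) -> list:
    """Apply (old_suffix, new_suffix) rules; keep forms found in `index`.

    `rules` and `index` are illustrative stand-ins for the lemma_rules and
    lemma_index tables the real component reads from spaCy Lookups.
    """
    forms = []
    for old, new in rules:
        if string.endswith(old):
            form = string[: len(string) - len(old)] + new
            if form and (form in index or not form.isalpha()):
                forms.append(form)
    # Deduplicate while preserving order; fall back to the input itself
    forms = list(OrderedDict.fromkeys(forms))
    return forms or [string]
```

The index check is what keeps aggressive suffix rules from inventing non-words: a stripped form is only accepted if the lexicon has seen it.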
55  spacy/lang/mk/lex_attrs.py  Normal file

@@ -0,0 +1,55 @@
+from ...attrs import LIKE_NUM
+
+
+_num_words = [
+    "нула", "еден", "една", "едно", "два", "две", "три", "четири", "пет", "шест", "седум", "осум", "девет", "десет",
+    "единаесет", "дванаесет", "тринаесет", "четиринаесет", "петнаесет", "шеснаесет", "седумнаесет", "осумнаесет",
+    "деветнаесет", "дваесет", "триесет", "четириесет", "педесет", "шеесет", "седумдесет", "осумдесет", "деведесет",
+    "сто", "двесте", "триста", "четиристотини", "петстотини", "шестотини", "седумстотини", "осумстотини",
+    "деветстотини", "илјада", "илјади", 'милион', 'милиони', 'милијарда', 'милијарди', 'билион', 'билиони',
+
+    "двајца", "тројца", "четворица", "петмина", "шестмина", "седуммина", "осуммина", "деветмина", "обата", "обајцата",
+
+    "прв", "втор", "трет", "четврт", "седм", "осм", "двестоти",
+
+    "два-три", "два-триесет", "два-триесетмина", "два-тринаесет", "два-тројца", "две-три", "две-тристотини",
+    "пет-шеесет", "пет-шеесетмина", "пет-шеснаесетмина", "пет-шест", "пет-шестмина", "пет-шестотини", "петина",
+    "осмина", "седум-осум", "седум-осумдесет", "седум-осуммина", "седум-осумнаесет", "седум-осумнаесетмина",
+    "три-четириесет", "три-четиринаесет", "шеесет", "шеесетина", "шеесетмина", "шеснаесет", "шеснаесетмина",
+    "шест-седум", "шест-седумдесет", "шест-седумнаесет", "шест-седумстотини", "шестоти", "шестотини"
+]
+
+
+def like_num(text):
+    if text.startswith(("+", "-", "±", "~")):
+        text = text[1:]
+    text = text.replace(",", "").replace(".", "")
+    if text.isdigit():
+        return True
+    if text.count("/") == 1:
+        num, denom = text.split("/")
+        if num.isdigit() and denom.isdigit():
+            return True
+
+    text_lower = text.lower()
+    if text_lower in _num_words:
+        return True
+
+    if text_lower.endswith(("а", "о", "и")):
+        if text_lower[:-1] in _num_words:
+            return True
+
+    if text_lower.endswith(("ти", "та", "то", "на")):
+        if text_lower[:-2] in _num_words:
+            return True
+
+    if text_lower.endswith(("ата", "иот", "ите", "ина", "чки")):
+        if text_lower[:-3] in _num_words:
+            return True
+
+    if text_lower.endswith(("мина", "тина")):
+        if text_lower[:-4] in _num_words:
+            return True
+    return False
+
+
+LEX_ATTRS = {LIKE_NUM: like_num}
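`like_num` above combines digit and fraction checks with a lexicon of Macedonian number words, then retries after stripping common inflectional endings so definite and plural forms are still recognized. A condensed standalone sketch of that suffix-stripping idea, with the word list trimmed to a few entries for illustration:

```python
# Trimmed, illustrative lexicon; the real one lists all Macedonian numerals
_num_words = {"еден", "два", "три", "сто", "илјада"}


def like_num(text: str) -> bool:
    text = text.lstrip("+-±~").replace(",", "").replace(".", "")
    if text.isdigit():
        return True
    lower = text.lower()
    if lower in _num_words:
        return True
    # Strip common Macedonian inflectional endings and retry the lexicon
    for ending in ("ата", "иот", "ите", "та", "то", "а", "о", "и"):
        if lower.endswith(ending) and lower[: -len(ending)] in _num_words:
            return True
    return False
```

This is the same trade-off the real attribute makes: instead of listing every inflected numeral, it stores base forms and peels suffixes of decreasing length.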
815  spacy/lang/mk/stop_words.py  Normal file

@@ -0,0 +1,815 @@
+STOP_WORDS = set(
+    """
+а
+абре
+aв
+аи
+ако
+алало
+ам
+ама
+аман
+ами
+амин
+априли-ли-ли
+ау
+аух
+ауч
+ах
+аха
+аха-ха
+аш
+ашколсум
+ашколсун
+ај
+ајде
+ајс
+аџаба
+бавно
+бам
+бам-бум
+бап
+бар
+баре
+барем
+бау
+бау-бау
+баш
+бај
+бе
+беа
+бев
+бевме
+бевте
+без
+безбели
+бездруго
+белки
+беше
+би
+бидејќи
+бим
+бис
+бла
+блазе
+богами
+божем
+боц
+браво
+бравос
+бре
+бреј
+брзо
+бришка
+бррр
+бу
+бум
+буф
+буц
+бујрум
+ваа
+вам
+варај
+варда
+вас
+вај
+ве
+велат
+вели
+версус
+веќе
+ви
+виа
+види
+вие
+вистина
+витос
+внатре
+во
+воз
+вон
+впрочем
+врв
+вред
+време
+врз
+всушност
+втор
+галиба
+ги
+гитла
+го
+годе
+годишник
+горе
+гра
+гуц
+гљу
+да
+даан
+дава
+дал
+дали
+дан
+два
+дваесет
+дванаесет
+двајца
+две
+двесте
+движам
+движат
+движи
+движиме
+движите
+движиш
+де
+деведесет
+девет
+деветнаесет
+деветстотини
+деветти
+дека
+дел
+делми
+демек
+десет
+десетина
+десетти
+деситици
+дејгиди
+дејди
+ди
+дилми
+дин
+дип
+дно
+до
+доволно
+додека
+додуша
+докај
+доколку
+доправено
+доправи
+досамоти
+доста
+држи
+дрн
+друг
+друга
+другата
+други
+другиот
+другите
+друго
+другото
+дум
+дур
+дури
+е
+евала
+еве
+евет
+ега
+егиди
+еден
+едикојси
+единаесет
+единствено
+еднаш
+едно
+ексик
+ела
+елбете
+елем
+ели
+ем
+еми
+ене
+ете
+еурека
+ех
+еј
+жими
+жити
+за
+завал
+заврши
+зад
+задека
+задоволна
+задржи
+заедно
+зар
+зарад
+заради
+заре
+зарем
+затоа
+зашто
+згора
+зема
+земе
+земува
+зер
+значи
+зошто
+зуј
+и
+иако
+из
+извезен
+изгледа
+измеѓу
+износ
+или
+или-или
+илјада
+илјади
+им
+има
+имаа
+имаат
+имавме
+имавте
+имам
+имаме
+имате
+имаш
+имаше
+име
+имено
+именува
+имплицира
+имплицираат
+имплицирам
+имплицираме
+имплицирате
+имплицираш
+инаку
+индицира
+исечок
+исклучен
+исклучена
+исклучени
+исклучено
+искористен
+искористена
+искористени
+искористено
+искористи
+искрај
+исти
+исто
+итака
+итн
+их
+иха
+ихуу
+иш
+ишала
+иј
+ка
+каде
+кажува
+како
+каков
+камоли
+кај
+ква
+ки
+кит
+кло
+клум
+кога
+кого
+кого-годе
+кое
+кои
+количество
+количина
+колку
+кому
+кон
+користена
+користени
+користено
+користи
+кот
+котрр
+кош-кош
+кој
+која
+којзнае
+којшто
+кр-кр-кр
+крај
+крек
+крз
+крк
+крц
+куку
+кукуригу
+куш
+ле
+лебами
+леле
+лели
+ли
+лиду
+луп
+ма
+макар
+малку
+марш
+мат
+мац
+машала
+ме
+мене
+место
+меѓу
+меѓувреме
+меѓутоа
+ми
+мое
+може
+можеби
+молам
+моли
+мор
+мора
+море
+мори
+мразец
+му
+муклец
+мутлак
+муц
+мјау
+на
+навидум
+навистина
+над
+надвор
+назад
+накај
+накрај
+нали
+нам
+наместо
+наоколу
+направено
+направи
+напред
+нас
+наспоред
+наспрема
+наспроти
+насред
+натаму
+натема
+начин
+наш
+наша
+наше
+наши
+нај
+најдоцна
+најмалку
+најмногу
+не
+неа
+него
+негов
+негова
+негови
+негово
+незе
+нека
+некаде
+некако
+некаков
+некого
+некое
+некои
+неколку
+некому
+некој
+некојси
+нели
+немој
+нему
+неоти
+нечиј
+нешто
+нејзе
+нејзин
+нејзини
+нејзино
+нејсе
+ни
+нив
+нивен
+нивна
+нивни
+нивно
+ние
+низ
+никаде
+никако
+никогаш
+никого
+никому
+никој
+ним
+нити
+нито
+ниту
+ничиј
+ништо
+но
+нѐ
+о
+обр
+ова
+ова-она
+оваа
+овај
+овде
+овега
+овие
+овој
+од
+одавде
+оди
+однесува
+односно
+одошто
+околу
+олеле
+олкацок
+он
+она
+онаа
+онака
+онаков
+онде
+они
+оние
+оно
+оној
+оп
+освем
+освен
+осем
+осми
+осум
+осумдесет
+осумнаесет
+осумстотитни
+отаде
+оти
+откако
+откај
+откога
+отколку
+оттаму
+оттука
+оф
+ох
+ој
+па
+пак
+папа
+пардон
+пате-ќуте
+пати
+пау
+паче
+пеесет
+пеки
+пет
+петнаесет
+петстотини
+петти
+пи
+пи-пи
+пис
+плас
+плус
+по
+побавно
+поблиску
+побрзо
+побуни
+повеќе
+повторно
+под
+подалеку
+подолу
+подоцна
+подруго
+позади
+поинаква
+поинакви
+поинакво
+поинаков
+поинаку
+покаже
+покажува
+покрај
+полно
+помалку
+помеѓу
+понатаму
+понекогаш
+понекој
+поради
+поразличен
+поразлична
+поразлични
+поразлично
+поседува
+после
+последен
+последна
+последни
+последно
+поспоро
+потег
+потоа
+пошироко
+прави
+празно
+прв
+пред
+през
+преку
+претежно
+претходен
+претходна
+претходни
+претходник
+претходно
+при
+присвои
+притоа
+причинува
+пријатно
+просто
+против
+прр
+пст
+пук
+пусто
+пуф
+пуј
+пфуј
+пшт
+ради
+различен
+различна
+различни
+различно
+разни
+разоружен
+разредлив
+рамките
+рамнообразно
+растревожено
+растреперено
+расчувствувано
+ратоборно
+рече
+роден
+с
+сакан
+сам
+сама
+сами
+самите
+само
+самоти
+свое
+свои
+свој
+своја
+се
+себе
+себеси
+сега
+седми
+седум
+седумдесет
+седумнаесет
+седумстотини
+секаде
+секаков
+секи
+секогаш
+секого
+секому
+секој
+секојдневно
+сем
+сенешто
+сепак
+сериозен
+сериозна
+сериозни
+сериозно
+сет
+сечиј
+сешто
+си
+сиктер
+сиот
+сип
+сиреч
+сите
+сичко
+скок
+скоро
+скрц
+следбеник
+следбеничка
+следен
+следователно
+следствено
+сме
+со
+соне
+сопствен
+сопствена
+сопствени
+сопствено
+сосе
+сосем
+сполај
+според
+споро
+спрема
+спроти
+спротив
+сред
+среде
+среќно
+срочен
+сст
+става
+ставаат
+ставам
+ставаме
+ставате
+ставаш
+стави
+сте
+сто
+стоп
+страна
+сум
+сума
+супер
+сус
+сѐ
+та
+таа
+така
+таква
+такви
+таков
+тамам
+таму
+тангар-мангар
+тандар-мандар
+тап
+твое
+те
+тебе
+тебека
+тек
+текот
+ти
+тие
+тизе
+тик-так
+тики
+тоа
+тогаш
+тој
+трак
+трака-трука
+трас
+треба
+трет
+три
+триесет
+тринаест
+триста
+труп
+трупа
+трус
+ту
+тука
+туку
+тукушто
+туф
+у
+уа
+убаво
+уви
+ужасно
+уз
+ура
+уу
+уф
+уха
+уш
+уште
+фазен
+фала
+фил
+филан
+фис
+фиу
+фиљан
+фоб
+фон
+ха
+ха-ха
+хе
+хеј
+хеј
+хи
+хм
+хо
+цак
+цап
+целина
+цело
+цигу-лигу
+циц
+чекај
+често
+четврт
+четири
+четириесет
+четиринаесет
+четирстотини
+чие
+чии
+чик
+чик-чирик
+чини
+чиш
+чиј
+чија
+чијшто
+чкрап
+чому
+чук
+чукш
+чуму
+чунки
+шеесет
+шеснаесет
+шест
+шести
+шестотини
+ширум
+шлак
+шлап
+шлапа-шлупа
+шлуп
+шмрк
+што
+штогоде
+штом
+штотуку
+штрак
+штрап
+штрап-штруп
+шуќур
+ѓиди
+ѓоа
+ѓоамити
+ѕан
+ѕе
+ѕин
+ја
+јадец
+јазе
+јали
+јас
+јаска
+јок
+ќе
+ќешки
+ѝ
+џагара-магара
+џанам
+џив-џив
+""".split()
+)
100
spacy/lang/mk/tokenizer_exceptions.py
Normal file
100
spacy/lang/mk/tokenizer_exceptions.py
Normal file
|
@ -0,0 +1,100 @@
|
||||||
|
from ...symbols import ORTH, NORM


_exc = {}


_abbr_exc = [
    {ORTH: "м", NORM: "метар"},
    {ORTH: "мм", NORM: "милиметар"},
    {ORTH: "цм", NORM: "центиметар"},
    {ORTH: "см", NORM: "сантиметар"},
    {ORTH: "дм", NORM: "дециметар"},
    {ORTH: "км", NORM: "километар"},
    {ORTH: "кг", NORM: "килограм"},
    {ORTH: "дкг", NORM: "декаграм"},
    {ORTH: "дг", NORM: "дециграм"},
    {ORTH: "мг", NORM: "милиграм"},
    {ORTH: "г", NORM: "грам"},
    {ORTH: "т", NORM: "тон"},
    {ORTH: "кл", NORM: "килолитар"},
    {ORTH: "хл", NORM: "хектолитар"},
    {ORTH: "дкл", NORM: "декалитар"},
    {ORTH: "л", NORM: "литар"},
    {ORTH: "дл", NORM: "децилитар"},
]
for abbr in _abbr_exc:
    _exc[abbr[ORTH]] = [abbr]

_abbr_line_exc = [
    {ORTH: "д-р", NORM: "доктор"},
    {ORTH: "м-р", NORM: "магистер"},
    {ORTH: "г-ѓа", NORM: "госпоѓа"},
    {ORTH: "г-ца", NORM: "госпоѓица"},
    {ORTH: "г-дин", NORM: "господин"},
]

for abbr in _abbr_line_exc:
    _exc[abbr[ORTH]] = [abbr]

_abbr_dot_exc = [
    {ORTH: "в.", NORM: "век"},
    {ORTH: "в.д.", NORM: "вршител на должност"},
    {ORTH: "г.", NORM: "година"},
    {ORTH: "г.г.", NORM: "господин господин"},
    {ORTH: "м.р.", NORM: "машки род"},
    {ORTH: "ж.р.", NORM: "женски род"},
    {ORTH: "с.р.", NORM: "среден род"},
    {ORTH: "н.е.", NORM: "наша ера"},
    {ORTH: "о.г.", NORM: "оваа година"},
    {ORTH: "о.м.", NORM: "овој месец"},
    {ORTH: "с.", NORM: "село"},
    {ORTH: "т.", NORM: "точка"},
    {ORTH: "т.е.", NORM: "то ест"},
    {ORTH: "т.н.", NORM: "таканаречен"},
    {ORTH: "бр.", NORM: "број"},
    {ORTH: "гр.", NORM: "град"},
    {ORTH: "др.", NORM: "другар"},
    {ORTH: "и др.", NORM: "и друго"},
    {ORTH: "и сл.", NORM: "и слично"},
    {ORTH: "кн.", NORM: "книга"},
    {ORTH: "мн.", NORM: "множина"},
    {ORTH: "на пр.", NORM: "на пример"},
    {ORTH: "св.", NORM: "свети"},
    {ORTH: "сп.", NORM: "списание"},
    {ORTH: "с.", NORM: "страница"},
    {ORTH: "стр.", NORM: "страница"},
    {ORTH: "чл.", NORM: "член"},
    {ORTH: "арх.", NORM: "архитект"},
    {ORTH: "бел.", NORM: "белешка"},
    {ORTH: "гимн.", NORM: "гимназија"},
    {ORTH: "ден.", NORM: "денар"},
    {ORTH: "ул.", NORM: "улица"},
    {ORTH: "инж.", NORM: "инженер"},
    {ORTH: "проф.", NORM: "професор"},
    {ORTH: "студ.", NORM: "студент"},
    {ORTH: "бот.", NORM: "ботаника"},
    {ORTH: "мат.", NORM: "математика"},
    {ORTH: "мед.", NORM: "медицина"},
    {ORTH: "прил.", NORM: "прилог"},
    {ORTH: "прид.", NORM: "придавка"},
    {ORTH: "сврз.", NORM: "сврзник"},
    {ORTH: "физ.", NORM: "физика"},
    {ORTH: "хем.", NORM: "хемија"},
    {ORTH: "пр. н.", NORM: "природни науки"},
    {ORTH: "истор.", NORM: "историја"},
    {ORTH: "геогр.", NORM: "географија"},
    {ORTH: "литер.", NORM: "литература"},
]

for abbr in _abbr_dot_exc:
    _exc[abbr[ORTH]] = [abbr]


TOKENIZER_EXCEPTIONS = _exc
@@ -1,4 +1,4 @@
-from .tokenizer_exceptions import TOKENIZER_EXCEPTIONS
+from .tokenizer_exceptions import TOKENIZER_EXCEPTIONS, TOKEN_MATCH
 from .stop_words import STOP_WORDS
 from .syntax_iterators import SYNTAX_ITERATORS
 from .lex_attrs import LEX_ATTRS
@@ -9,6 +9,7 @@ class TurkishDefaults(Language.Defaults):
     tokenizer_exceptions = TOKENIZER_EXCEPTIONS
     lex_attr_getters = LEX_ATTRS
     stop_words = STOP_WORDS
+    token_match = TOKEN_MATCH
     syntax_iterators = SYNTAX_ITERATORS
@@ -1,119 +1,181 @@
-from ..tokenizer_exceptions import BASE_EXCEPTIONS
+import re
+
+from ..punctuation import ALPHA_LOWER, ALPHA
 from ...symbols import ORTH, NORM
-from ...util import update_exc


-_exc = {"sağol": [{ORTH: "sağ"}, {ORTH: "ol", NORM: "olun"}]}
+_exc = {}


-for exc_data in [
+_abbr_period_exc = [
-    {ORTH: "A.B.D.", NORM: "Amerika Birleşik Devletleri"},
+    {ORTH: "A.B.D.", NORM: "Amerika"},
-    {ORTH: "Alb.", NORM: "Albay"},
+    {ORTH: "Alb.", NORM: "albay"},
-    {ORTH: "Ar.Gör.", NORM: "Araştırma Görevlisi"},
-    {ORTH: "Arş.Gör.", NORM: "Araştırma Görevlisi"},
-    {ORTH: "Asb.", NORM: "Astsubay"},
-    {ORTH: "Astsb.", NORM: "Astsubay"},
-    {ORTH: "As.İz.", NORM: "Askeri İnzibat"},
-    {ORTH: "Atğm", NORM: "Asteğmen"},
-    {ORTH: "Av.", NORM: "Avukat"},
-    {ORTH: "Apt.", NORM: "Apartmanı"},
-    {ORTH: "Bçvş.", NORM: "Başçavuş"},
+    {ORTH: "Ank.", NORM: "Ankara"},
+    {ORTH: "Ar.Gör."},
+    {ORTH: "Arş.Gör."},
+    {ORTH: "Asb.", NORM: "astsubay"},
+    {ORTH: "Astsb.", NORM: "astsubay"},
+    {ORTH: "As.İz."},
+    {ORTH: "as.iz."},
+    {ORTH: "Atğm", NORM: "asteğmen"},
+    {ORTH: "Av.", NORM: "avukat"},
+    {ORTH: "Apt.", NORM: "apartmanı"},
+    {ORTH: "apt.", NORM: "apartmanı"},
+    {ORTH: "Bçvş.", NORM: "başçavuş"},
+    {ORTH: "bçvş.", NORM: "başçavuş"},
     {ORTH: "bk.", NORM: "bakınız"},
     {ORTH: "bknz.", NORM: "bakınız"},
-    {ORTH: "Bnb.", NORM: "Binbaşı"},
+    {ORTH: "Bnb.", NORM: "binbaşı"},
     {ORTH: "bnb.", NORM: "binbaşı"},
-    {ORTH: "Böl.", NORM: "Bölümü"},
-    {ORTH: "Bşk.", NORM: "Başkanlığı"},
-    {ORTH: "Bştbp.", NORM: "Baştabip"},
-    {ORTH: "Bul.", NORM: "Bulvarı"},
-    {ORTH: "Cad.", NORM: "Caddesi"},
+    {ORTH: "Böl.", NORM: "bölümü"},
+    {ORTH: "böl.", NORM: "bölümü"},
+    {ORTH: "Bşk.", NORM: "başkanlığı"},
+    {ORTH: "bşk.", NORM: "başkanlığı"},
+    {ORTH: "Bştbp.", NORM: "baştabip"},
+    {ORTH: "bştbp.", NORM: "baştabip"},
+    {ORTH: "Bul.", NORM: "bulvarı"},
+    {ORTH: "bul.", NORM: "bulvarı"},
+    {ORTH: "Cad.", NORM: "caddesi"},
+    {ORTH: "cad.", NORM: "caddesi"},
     {ORTH: "çev.", NORM: "çeviren"},
-    {ORTH: "Çvş.", NORM: "Çavuş"},
+    {ORTH: "Çvş.", NORM: "çavuş"},
+    {ORTH: "çvş.", NORM: "çavuş"},
     {ORTH: "dak.", NORM: "dakika"},
     {ORTH: "dk.", NORM: "dakika"},
-    {ORTH: "Doç.", NORM: "Doçent"},
+    {ORTH: "Doç.", NORM: "doçent"},
-    {ORTH: "doğ.", NORM: "doğum tarihi"},
+    {ORTH: "doğ."},
+    {ORTH: "Dr.", NORM: "doktor"},
+    {ORTH: "dr.", NORM: "doktor"},
     {ORTH: "drl.", NORM: "derleyen"},
-    {ORTH: "Dz.", NORM: "Deniz"},
+    {ORTH: "Dz.", NORM: "deniz"},
-    {ORTH: "Dz.K.K.lığı", NORM: "Deniz Kuvvetleri Komutanlığı"},
+    {ORTH: "Dz.K.K.lığı"},
-    {ORTH: "Dz.Kuv.", NORM: "Deniz Kuvvetleri"},
+    {ORTH: "Dz.Kuv."},
-    {ORTH: "Dz.Kuv.K.", NORM: "Deniz Kuvvetleri Komutanlığı"},
+    {ORTH: "Dz.Kuv.K."},
     {ORTH: "dzl.", NORM: "düzenleyen"},
-    {ORTH: "Ecz.", NORM: "Eczanesi"},
+    {ORTH: "Ecz.", NORM: "eczanesi"},
+    {ORTH: "ecz.", NORM: "eczanesi"},
     {ORTH: "ekon.", NORM: "ekonomi"},
-    {ORTH: "Fak.", NORM: "Fakültesi"},
+    {ORTH: "Fak.", NORM: "fakültesi"},
-    {ORTH: "Gn.", NORM: "Genel"},
+    {ORTH: "Gn.", NORM: "genel"},
     {ORTH: "Gnkur.", NORM: "Genelkurmay"},
     {ORTH: "Gn.Kur.", NORM: "Genelkurmay"},
     {ORTH: "gr.", NORM: "gram"},
-    {ORTH: "Hst.", NORM: "Hastanesi"},
+    {ORTH: "Hst.", NORM: "hastanesi"},
+    {ORTH: "hst.", NORM: "hastanesi"},
-    {ORTH: "Hs.Uzm.", NORM: "Hesap Uzmanı"},
+    {ORTH: "Hs.Uzm."},
     {ORTH: "huk.", NORM: "hukuk"},
-    {ORTH: "Hv.", NORM: "Hava"},
+    {ORTH: "Hv.", NORM: "hava"},
-    {ORTH: "Hv.K.K.lığı", NORM: "Hava Kuvvetleri Komutanlığı"},
+    {ORTH: "Hv.K.K.lığı"},
-    {ORTH: "Hv.Kuv.", NORM: "Hava Kuvvetleri"},
+    {ORTH: "Hv.Kuv."},
-    {ORTH: "Hv.Kuv.K.", NORM: "Hava Kuvvetleri Komutanlığı"},
+    {ORTH: "Hv.Kuv.K."},
-    {ORTH: "Hz.", NORM: "Hazreti"},
+    {ORTH: "Hz.", NORM: "hazreti"},
-    {ORTH: "Hz.Öz.", NORM: "Hizmete Özel"},
+    {ORTH: "Hz.Öz."},
-    {ORTH: "İng.", NORM: "İngilizce"},
+    {ORTH: "İng.", NORM: "ingilizce"},
-    {ORTH: "Jeol.", NORM: "Jeoloji"},
+    {ORTH: "İst.", NORM: "İstanbul"},
+    {ORTH: "Jeol.", NORM: "jeoloji"},
     {ORTH: "jeol.", NORM: "jeoloji"},
-    {ORTH: "Korg.", NORM: "Korgeneral"},
+    {ORTH: "Korg.", NORM: "korgeneral"},
-    {ORTH: "Kur.", NORM: "Kurmay"},
+    {ORTH: "Kur.", NORM: "kurmay"},
-    {ORTH: "Kur.Bşk.", NORM: "Kurmay Başkanı"},
+    {ORTH: "Kur.Bşk."},
-    {ORTH: "Kuv.", NORM: "Kuvvetleri"},
+    {ORTH: "Kuv.", NORM: "kuvvetleri"},
-    {ORTH: "Ltd.", NORM: "Limited"},
+    {ORTH: "Ltd.", NORM: "limited"},
+    {ORTH: "ltd.", NORM: "limited"},
-    {ORTH: "Mah.", NORM: "Mahallesi"},
+    {ORTH: "Mah.", NORM: "mahallesi"},
     {ORTH: "mah.", NORM: "mahallesi"},
     {ORTH: "max.", NORM: "maksimum"},
     {ORTH: "min.", NORM: "minimum"},
-    {ORTH: "Müh.", NORM: "Mühendisliği"},
+    {ORTH: "Müh.", NORM: "mühendisliği"},
     {ORTH: "müh.", NORM: "mühendisliği"},
-    {ORTH: "MÖ.", NORM: "Milattan Önce"},
+    {ORTH: "M.Ö."},
+    {ORTH: "M.S."},
-    {ORTH: "Onb.", NORM: "Onbaşı"},
+    {ORTH: "Onb.", NORM: "onbaşı"},
-    {ORTH: "Ord.", NORM: "Ordinaryüs"},
+    {ORTH: "Ord.", NORM: "ordinaryüs"},
-    {ORTH: "Org.", NORM: "Orgeneral"},
+    {ORTH: "Org.", NORM: "orgeneral"},
-    {ORTH: "Ped.", NORM: "Pedagoji"},
+    {ORTH: "Ped.", NORM: "pedagoji"},
-    {ORTH: "Prof.", NORM: "Profesör"},
+    {ORTH: "Prof.", NORM: "profesör"},
+    {ORTH: "prof.", NORM: "profesör"},
-    {ORTH: "Sb.", NORM: "Subay"},
+    {ORTH: "Sb.", NORM: "subay"},
-    {ORTH: "Sn.", NORM: "Sayın"},
+    {ORTH: "Sn.", NORM: "sayın"},
     {ORTH: "sn.", NORM: "saniye"},
-    {ORTH: "Sok.", NORM: "Sokak"},
+    {ORTH: "Sok.", NORM: "sokak"},
+    {ORTH: "sok.", NORM: "sokak"},
-    {ORTH: "Şb.", NORM: "Şube"},
+    {ORTH: "Şb.", NORM: "şube"},
+    {ORTH: "şb.", NORM: "şube"},
-    {ORTH: "Şti.", NORM: "Şirketi"},
+    {ORTH: "Şti.", NORM: "şirketi"},
+    {ORTH: "şti.", NORM: "şirketi"},
-    {ORTH: "Tbp.", NORM: "Tabip"},
+    {ORTH: "Tbp.", NORM: "tabip"},
+    {ORTH: "tbp.", NORM: "tabip"},
-    {ORTH: "T.C.", NORM: "Türkiye Cumhuriyeti"},
+    {ORTH: "T.C."},
-    {ORTH: "Tel.", NORM: "Telefon"},
+    {ORTH: "Tel.", NORM: "telefon"},
     {ORTH: "tel.", NORM: "telefon"},
     {ORTH: "telg.", NORM: "telgraf"},
-    {ORTH: "Tğm.", NORM: "Teğmen"},
+    {ORTH: "Tğm.", NORM: "teğmen"},
     {ORTH: "tğm.", NORM: "teğmen"},
     {ORTH: "tic.", NORM: "ticaret"},
-    {ORTH: "Tug.", NORM: "Tugay"},
+    {ORTH: "Tug.", NORM: "tugay"},
-    {ORTH: "Tuğg.", NORM: "Tuğgeneral"},
+    {ORTH: "Tuğg.", NORM: "tuğgeneral"},
-    {ORTH: "Tümg.", NORM: "Tümgeneral"},
+    {ORTH: "Tümg.", NORM: "tümgeneral"},
-    {ORTH: "Uzm.", NORM: "Uzman"},
+    {ORTH: "Uzm.", NORM: "uzman"},
-    {ORTH: "Üçvş.", NORM: "Üstçavuş"},
+    {ORTH: "Üçvş.", NORM: "üstçavuş"},
-    {ORTH: "Üni.", NORM: "Üniversitesi"},
+    {ORTH: "Üni.", NORM: "üniversitesi"},
-    {ORTH: "Ütğm.", NORM: "Üsteğmen"},
+    {ORTH: "Ütğm.", NORM: "üsteğmen"},
-    {ORTH: "vb.", NORM: "ve benzeri"},
+    {ORTH: "vb."},
     {ORTH: "vs.", NORM: "vesaire"},
-    {ORTH: "Yard.", NORM: "Yardımcı"},
+    {ORTH: "Yard.", NORM: "yardımcı"},
-    {ORTH: "Yar.", NORM: "Yardımcı"},
+    {ORTH: "Yar.", NORM: "yardımcı"},
-    {ORTH: "Yd.Sb.", NORM: "Yedek Subay"},
+    {ORTH: "Yd.Sb."},
-    {ORTH: "Yard.Doç.", NORM: "Yardımcı Doçent"},
+    {ORTH: "Yard.Doç."},
-    {ORTH: "Yar.Doç.", NORM: "Yardımcı Doçent"},
+    {ORTH: "Yar.Doç."},
-    {ORTH: "Yb.", NORM: "Yarbay"},
+    {ORTH: "Yb.", NORM: "yarbay"},
-    {ORTH: "Yrd.", NORM: "Yardımcı"},
+    {ORTH: "Yrd.", NORM: "yardımcı"},
-    {ORTH: "Yrd.Doç.", NORM: "Yardımcı Doçent"},
+    {ORTH: "Yrd.Doç."},
-    {ORTH: "Y.Müh.", NORM: "Yüksek mühendis"},
+    {ORTH: "Y.Müh."},
-    {ORTH: "Y.Mim.", NORM: "Yüksek mimar"},
+    {ORTH: "Y.Mim."},
-]:
-    _exc[exc_data[ORTH]] = [exc_data]
+    {ORTH: "yy.", NORM: "yüzyıl"},
+]
+
+for abbr in _abbr_period_exc:
+    _exc[abbr[ORTH]] = [abbr]
+
+_abbr_exc = [
+    {ORTH: "AB", NORM: "Avrupa Birliği"},
+    {ORTH: "ABD", NORM: "Amerika"},
+    {ORTH: "ABS", NORM: "fren"},
+    {ORTH: "AOÇ"},
+    {ORTH: "ASKİ"},
+    {ORTH: "Bağ-kur", NORM: "Bağkur"},
+    {ORTH: "BDDK"},
+    {ORTH: "BJK", NORM: "Beşiktaş"},
+    {ORTH: "ESA", NORM: "Avrupa uzay ajansı"},
+    {ORTH: "FB", NORM: "Fenerbahçe"},
+    {ORTH: "GATA"},
+    {ORTH: "GS", NORM: "Galatasaray"},
+    {ORTH: "İSKİ"},
+    {ORTH: "KBB"},
+    {ORTH: "RTÜK", NORM: "radyo ve televizyon üst kurulu"},
+    {ORTH: "TBMM"},
+    {ORTH: "TC"},
+    {ORTH: "TÜİK", NORM: "Türkiye istatistik kurumu"},
+    {ORTH: "YÖK"},
+]
+
+for abbr in _abbr_exc:
+    _exc[abbr[ORTH]] = [abbr]

-for orth in ["Dr.", "yy."]:
-    _exc[orth] = [{ORTH: orth}]
+_num = r"[+-]?\d+([,.]\d+)*"
+_ord_num = r"(\d+\.)"
+_date = r"(((\d{1,2}[./-]){2})?(\d{4})|(\d{1,2}[./]\d{1,2}(\.)?))"
+_dash_num = r"(([{al}\d]+/\d+)|(\d+/[{al}]))".format(al=ALPHA)
+_roman_num = "M{0,3}(?:C[MD]|D?C{0,3})(?:X[CL]|L?X{0,3})(?:I[XV]|V?I{0,3})"
+_roman_ord = r"({rn})\.".format(rn=_roman_num)
+_time_exp = r"\d+(:\d+)*"

-TOKENIZER_EXCEPTIONS = update_exc(BASE_EXCEPTIONS, _exc)
+_inflections = r"'[{al}]+".format(al=ALPHA_LOWER)
+_abbrev_inflected = r"[{a}]+\.'[{al}]+".format(a=ALPHA, al=ALPHA_LOWER)
+
+_nums = r"(({d})|({dn})|({te})|({on})|({n})|({ro})|({rn}))({inf})?".format(d=_date, dn=_dash_num, te=_time_exp, on=_ord_num, n=_num, ro=_roman_ord, rn=_roman_num, inf=_inflections)
+
+TOKENIZER_EXCEPTIONS = _exc
+TOKEN_MATCH = re.compile(r"^({abbr})|({n})$".format(n=_nums, abbr=_abbrev_inflected)).match
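The new Turkish `TOKEN_MATCH` composes the number, date, time, ordinal, Roman-numeral, and inflected-abbreviation patterns into one anchored regex. A self-contained sketch of how those pieces combine, using ASCII character classes as stand-ins for spaCy's Unicode `ALPHA`/`ALPHA_LOWER` (the logic is otherwise the same as the hunk above):

```python
import re

# Stand-ins: spaCy's ALPHA classes cover full Unicode; ASCII suffices here.
ALPHA, ALPHA_LOWER = "a-zA-Z", "a-z"

_num = r"[+-]?\d+([,.]\d+)*"
_ord_num = r"(\d+\.)"
_date = r"(((\d{1,2}[./-]){2})?(\d{4})|(\d{1,2}[./]\d{1,2}(\.)?))"
_dash_num = r"(([{al}\d]+/\d+)|(\d+/[{al}]))".format(al=ALPHA)
_roman_num = "M{0,3}(?:C[MD]|D?C{0,3})(?:X[CL]|L?X{0,3})(?:I[XV]|V?I{0,3})"
_roman_ord = r"({rn})\.".format(rn=_roman_num)
_time_exp = r"\d+(:\d+)*"
_inflections = r"'[{al}]+".format(al=ALPHA_LOWER)
_abbrev_inflected = r"[{a}]+\.'[{al}]+".format(a=ALPHA, al=ALPHA_LOWER)

# Any numeric shape, optionally followed by an apostrophe inflection.
_nums = r"(({d})|({dn})|({te})|({on})|({n})|({ro})|({rn}))({inf})?".format(
    d=_date, dn=_dash_num, te=_time_exp, on=_ord_num, n=_num,
    ro=_roman_ord, rn=_roman_num, inf=_inflections)

TOKEN_MATCH = re.compile(
    r"^({abbr})|({n})$".format(n=_nums, abbr=_abbrev_inflected)).match

# Dates, times, Roman ordinals, decimal groups, and inflected
# abbreviations are kept as single tokens; plain words are not.
for token in ["2.10.2020", "11:30", "XIV.", "1.500,25", "Alb.'dan"]:
    assert TOKEN_MATCH(token), token
assert TOKEN_MATCH("kitap") is None
```

Note the alternation: the abbreviation branch is anchored only at the start and the numeric branch only at the end, mirroring the committed pattern.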
@@ -968,10 +968,6 @@ class Language:

         DOCS: https://nightly.spacy.io/api/language#call
         """
-        if len(text) > self.max_length:
-            raise ValueError(
-                Errors.E088.format(length=len(text), max_length=self.max_length)
-            )
         doc = self.make_doc(text)
         if component_cfg is None:
             component_cfg = {}
@@ -1045,6 +1041,11 @@ class Language:
         text (str): The text to process.
         RETURNS (Doc): The processed doc.
         """
+        if len(text) > self.max_length:
+            raise ValueError(
+                Errors.E088.format(length=len(text), max_length=self.max_length)
+            )
         return self.tokenizer(text)


     def update(
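The two `language.py` hunks above move the `max_length` guard from `__call__` into `make_doc`, so texts that are only tokenized (never run through the full pipeline) are length-checked too. A plain-Python sketch of the behavioral change (the class and names are stand-ins, not spaCy's API):

```python
class MiniLanguage:
    """Toy stand-in: make_doc now owns the length check, and
    __call__ inherits it by delegating to make_doc."""
    max_length = 10

    def make_doc(self, text):
        if len(text) > self.max_length:
            raise ValueError(
                f"Text of length {len(text)} exceeds maximum of {self.max_length}")
        return text.split()  # stand-in for running the tokenizer

    def __call__(self, text):
        doc = self.make_doc(text)  # guard fires here for both entry points
        return doc

nlp = MiniLanguage()
assert nlp.make_doc("ok go") == ["ok", "go"]
try:
    nlp.make_doc("this text is definitely too long")
except ValueError:
    pass
else:
    raise AssertionError("expected ValueError for over-length text")
```

Before the change, only `__call__` raised on over-length input; tokenize-only callers silently accepted arbitrarily long text.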
@@ -26,6 +26,7 @@ cdef enum quantifier_t:
     ZERO_PLUS
     ONE
     ONE_PLUS
+    FINAL_ID


 cdef struct AttrValueC:
@@ -2,7 +2,7 @@
 from typing import List

 from libcpp.vector cimport vector
-from libc.stdint cimport int32_t
+from libc.stdint cimport int32_t, int8_t
 from libc.string cimport memset, memcmp
 from cymem.cymem cimport Pool
 from murmurhash.mrmr cimport hash64
@@ -308,7 +308,7 @@ cdef find_matches(TokenPatternC** patterns, int n, object doclike, int length, e
         # avoid any processing or mem alloc if the document is empty
         return output
     if len(predicates) > 0:
-        predicate_cache = <char*>mem.alloc(length * len(predicates), sizeof(char))
+        predicate_cache = <int8_t*>mem.alloc(length * len(predicates), sizeof(int8_t))
     if extensions is not None and len(extensions) >= 1:
         nr_extra_attr = max(extensions.values()) + 1
         extra_attr_values = <attr_t*>mem.alloc(length * nr_extra_attr, sizeof(attr_t))
@@ -349,7 +349,7 @@ cdef find_matches(TokenPatternC** patterns, int n, object doclike, int length, e


 cdef void transition_states(vector[PatternStateC]& states, vector[MatchC]& matches,
-                            char* cached_py_predicates,
+                            int8_t* cached_py_predicates,
                             Token token, const attr_t* extra_attrs, py_predicates) except *:
     cdef int q = 0
     cdef vector[PatternStateC] new_states
@@ -421,7 +421,7 @@ cdef void transition_states(vector[PatternStateC]& states, vector[MatchC]& match
             states.push_back(new_states[i])


-cdef int update_predicate_cache(char* cache,
+cdef int update_predicate_cache(int8_t* cache,
         const TokenPatternC* pattern, Token token, predicates) except -1:
     # If the state references any extra predicates, check whether they match.
     # These are cached, so that we don't call these potentially expensive
@@ -459,7 +459,7 @@ cdef void finish_states(vector[MatchC]& matches, vector[PatternStateC]& states)

 cdef action_t get_action(PatternStateC state,
         const TokenC* token, const attr_t* extra_attrs,
-        const char* predicate_matches) nogil:
+        const int8_t* predicate_matches) nogil:
     """We need to consider:
     a) Does the token match the specification? [Yes, No]
     b) What's the quantifier? [1, 0+, ?]
@@ -517,7 +517,7 @@ cdef action_t get_action(PatternStateC state,
     Problem: If a quantifier is matching, we're adding a lot of open partials
     """
-    cdef char is_match
+    cdef int8_t is_match
     is_match = get_is_match(state, token, extra_attrs, predicate_matches)
     quantifier = get_quantifier(state)
     is_final = get_is_final(state)
@@ -569,9 +569,9 @@ cdef action_t get_action(PatternStateC state,
             return RETRY


-cdef char get_is_match(PatternStateC state,
+cdef int8_t get_is_match(PatternStateC state,
         const TokenC* token, const attr_t* extra_attrs,
-        const char* predicate_matches) nogil:
+        const int8_t* predicate_matches) nogil:
     for i in range(state.pattern.nr_py):
         if predicate_matches[state.pattern.py_predicates[i]] == -1:
             return 0
@@ -586,8 +586,8 @@ cdef char get_is_match(PatternStateC state,
     return True


-cdef char get_is_final(PatternStateC state) nogil:
-    if state.pattern[1].nr_attr == 0 and state.pattern[1].attrs != NULL:
+cdef int8_t get_is_final(PatternStateC state) nogil:
+    if state.pattern[1].quantifier == FINAL_ID:
         id_attr = state.pattern[1].attrs[0]
         if id_attr.attr != ID:
             with gil:
@@ -597,7 +597,7 @@ cdef char get_is_final(PatternStateC state) nogil:
     return 0


-cdef char get_quantifier(PatternStateC state) nogil:
+cdef int8_t get_quantifier(PatternStateC state) nogil:
     return state.pattern.quantifier


@@ -626,36 +626,20 @@ cdef TokenPatternC* init_pattern(Pool mem, attr_t entity_id, object token_specs)
         pattern[i].nr_py = len(predicates)
         pattern[i].key = hash64(pattern[i].attrs, pattern[i].nr_attr * sizeof(AttrValueC), 0)
     i = len(token_specs)
-    # Even though here, nr_attr == 0, we're storing the ID value in attrs[0] (bug-prone, thread carefully!)
-    pattern[i].attrs = <AttrValueC*>mem.alloc(2, sizeof(AttrValueC))
+    # Use quantifier to identify final ID pattern node (rather than previous
+    # uninitialized quantifier == 0/ZERO + nr_attr == 0 + non-zero-length attrs)
+    pattern[i].quantifier = FINAL_ID
+    pattern[i].attrs = <AttrValueC*>mem.alloc(1, sizeof(AttrValueC))
     pattern[i].attrs[0].attr = ID
     pattern[i].attrs[0].value = entity_id
-    pattern[i].nr_attr = 0
+    pattern[i].nr_attr = 1
     pattern[i].nr_extra_attr = 0
     pattern[i].nr_py = 0
     return pattern


 cdef attr_t get_ent_id(const TokenPatternC* pattern) nogil:
-    # There have been a few bugs here. We used to have two functions,
-    # get_ent_id and get_pattern_key that tried to do the same thing. These
-    # are now unified to try to solve the "ghost match" problem.
-    # Below is the previous implementation of get_ent_id and the comment on it,
-    # preserved for reference while we figure out whether the heisenbug in the
-    # matcher is resolved.
-    #
-    #
-    # cdef attr_t get_ent_id(const TokenPatternC* pattern) nogil:
-    #     # The code was originally designed to always have pattern[1].attrs.value
-    #     # be the ent_id when we get to the end of a pattern. However, Issue #2671
-    #     # showed this wasn't the case when we had a reject-and-continue before a
-    #     # match.
-    #     # The patch to #2671 was wrong though, which came up in #3839.
-    #     # while pattern.attrs.attr != ID:
-    #     #     pattern += 1
-    #     # return pattern.attrs.value
-    while pattern.nr_attr != 0 or pattern.nr_extra_attr != 0 or pattern.nr_py != 0 \
-            or pattern.quantifier != ZERO:
+    while pattern.quantifier != FINAL_ID:
         pattern += 1
     id_attr = pattern[0].attrs[0]
     if id_attr.attr != ID:
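The matcher change above replaces a fragile end-of-pattern convention ("`nr_attr == 0` but `attrs` non-empty") with an explicit `FINAL_ID` quantifier on the terminal node that carries the entity ID. A plain-Python stand-in for the C structs illustrates the walk in `get_ent_id` (enum values and field names here are illustrative, not the actual Cython layout):

```python
# Hypothetical enum values standing in for the quantifier_t cdef enum.
ZERO, ZERO_ONE, ZERO_PLUS, ONE, ONE_PLUS, FINAL_ID = range(6)

def get_ent_id(pattern):
    """Walk the compiled pattern to its FINAL_ID sentinel node and
    return the entity ID stored in its first attribute."""
    i = 0
    while pattern[i]["quantifier"] != FINAL_ID:
        i += 1
    return pattern[i]["attrs"][0][1]  # (attr, value) pair; value is ent_id

# A compiled two-token pattern plus its FINAL_ID terminator.
pattern = [
    {"quantifier": ONE, "attrs": [("LOWER", "hello")]},
    {"quantifier": ZERO_PLUS, "attrs": [("IS_PUNCT", True)]},
    {"quantifier": FINAL_ID, "attrs": [("ID", 12345)]},
]
assert get_ent_id(pattern) == 12345
```

With an explicit sentinel, the terminal node can never be confused with an ordinary node whose counters happen to be zero, which is what the old multi-condition `while` had to guard against.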
@@ -261,7 +261,11 @@ class EntityRuler(Pipe):

         # disable the nlp components after this one in case they hadn't been initialized / deserialised yet
         try:
-            current_index = self.nlp.pipe_names.index(self.name)
+            current_index = -1
+            for i, (name, pipe) in enumerate(self.nlp.pipeline):
+                if self == pipe:
+                    current_index = i
+                    break
             subsequent_pipes = [
                 pipe for pipe in self.nlp.pipe_names[current_index + 1 :]
             ]
@@ -4,7 +4,7 @@ from thinc.api import Model
 from pathlib import Path

 from .pipe import Pipe
-from ..errors import Errors
+from ..errors import Errors, Warnings
 from ..language import Language
 from ..training import Example
 from ..lookups import Lookups, load_lookups
@@ -197,6 +197,8 @@ class Lemmatizer(Pipe):
         string = token.text
         univ_pos = token.pos_.lower()
         if univ_pos in ("", "eol", "space"):
+            if univ_pos == "":
+                logger.warn(Warnings.W108.format(text=string))
             return [string.lower()]
         # See Issue #435 for example of where this logic is required.
         if self.is_base_form(token):
@@ -172,6 +172,11 @@ def lt_tokenizer():
     return get_lang_class("lt")().tokenizer


+@pytest.fixture(scope="session")
+def mk_tokenizer():
+    return get_lang_class("mk")().tokenizer
+
+
 @pytest.fixture(scope="session")
 def ml_tokenizer():
     return get_lang_class("ml")().tokenizer
@@ -123,6 +123,7 @@ def test_doc_api_serialize(en_tokenizer, text):
     tokens[0].norm_ = "norm"
     tokens.ents = [(tokens.vocab.strings["PRODUCT"], 0, 1)]
     tokens[0].ent_kb_id_ = "ent_kb_id"
+    tokens[0].ent_id_ = "ent_id"
     new_tokens = Doc(tokens.vocab).from_bytes(tokens.to_bytes())
     assert tokens.text == new_tokens.text
     assert [t.text for t in tokens] == [t.text for t in new_tokens]
@@ -130,6 +131,7 @@ def test_doc_api_serialize(en_tokenizer, text):
     assert new_tokens[0].lemma_ == "lemma"
     assert new_tokens[0].norm_ == "norm"
     assert new_tokens[0].ent_kb_id_ == "ent_kb_id"
+    assert new_tokens[0].ent_id_ == "ent_id"

     new_tokens = Doc(tokens.vocab).from_bytes(
         tokens.to_bytes(exclude=["tensor"]), exclude=["tensor"]
@@ -416,6 +416,13 @@ def test_doc_retokenizer_merge_lex_attrs(en_vocab):
     assert doc[1].is_stop
     assert not doc[0].is_stop
     assert not doc[1].like_num
+    # Test that norm is only set on tokens
+    doc = Doc(en_vocab, words=["eins", "zwei", "!", "!"])
+    assert doc[0].norm_ == "eins"
+    with doc.retokenize() as retokenizer:
+        retokenizer.merge(doc[0:1], attrs={"norm": "1"})
+    assert doc[0].norm_ == "1"
+    assert en_vocab["eins"].norm_ == "eins"


 def test_retokenize_skip_duplicates(en_vocab):
spacy/tests/lang/mk/__init__.py (new file, empty)

spacy/tests/lang/mk/test_text.py (new file)
@@ -0,0 +1,84 @@
import pytest
|
||||||
|
from spacy.lang.mk.lex_attrs import like_num
|
||||||
|
|
||||||
|
|
||||||
|
def test_tokenizer_handles_long_text(mk_tokenizer):
|
||||||
|
text = """
|
||||||
|
Во организациските работи или на нашите собранија со членството, никој од нас не зборуваше за
|
||||||
|
организацијата и идеологијата. Работна беше нашата работа, а не идеолошка. Што се однесува до социјализмот на
|
||||||
|
Делчев, неговата дејност зборува сама за себе - спротивно. Во суштина, водачите си имаа свои основни погледи и
|
||||||
|
свои разбирања за положбата и работите, коишто стоеја пред нив и ги завршуваа со голема упорност, настојчивост и
насоченост. Значи, идеологија имаше, само што нивната идеологија имаше своја оригиналност. Македонија денеска,
чиста рожба на животот и положбата во Македонија, кои му служеа како база на неговите побуди, беше дејност која
имаше потреба од ум за да си најде своја смисла. Таквата идеологија и заемното дејство на умот и срцето му
помогнаа на Делчев да не се занесе по патот на својата идеологија... Во суштина, Организацијата и нејзините
водачи имаа свои разбирања за работите и положбата во идеен поглед, но тоа беше врската, животот и положбата во
Македонија и го внесуваа во својата идеологија гласот на своето срце, и на крај, прибегнуваа до умот,
за да најдат смисла или да ѝ дадат. Тоа содејство и заемен сооднос на умот и срцето му помогнаа на Делчев да ја
држи својата идеологија во сообразност со положбата на работите... Водачите навистина направија една жртва
бидејќи на населението не му зборуваа за своите мисли и идеи. Тие се одрекоа од секаква субјективност во своите
мисли. Целта беше да не се зголемуваат целите и задачите како и преданоста во работата. Населението не можеше да
ги разбере овие идеи...
"""
    tokens = mk_tokenizer(text)
    assert len(tokens) == 297


@pytest.mark.parametrize(
    "word,match",
    [
        ("10", True),
        ("1", True),
        ("10.000", True),
        ("1000", True),
        ("бројка", False),
        ("999,0", True),
        ("еден", True),
        ("два", True),
        ("цифра", False),
        ("десет", True),
        ("сто", True),
        ("број", False),
        ("илјада", True),
        ("илјади", True),
        ("милион", True),
        (",", False),
        ("милијарда", True),
        ("билион", True),
    ]
)
def test_mk_lex_attrs_like_number(mk_tokenizer, word, match):
    tokens = mk_tokenizer(word)
    assert len(tokens) == 1
    assert tokens[0].like_num == match


@pytest.mark.parametrize(
    "word",
    [
        "двесте",
        "два-три",
        "пет-шест"
    ]
)
def test_mk_lex_attrs_capitals(word):
    assert like_num(word)
    assert like_num(word.upper())


@pytest.mark.parametrize(
    "word",
    [
        "првиот",
        "втора",
        "четврт",
        "четвртата",
        "петти",
        "петто",
        "стоти",
        "шеесетите",
        "седумдесетите"
    ]
)
def test_mk_lex_attrs_like_number_for_ordinal(word):
    assert like_num(word)
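The Macedonian tests above exercise a `like_num` lexical attribute that combines digit checks with a closed list of number words. A minimal, self-contained sketch of that general pattern (the word list and helper below are illustrative assumptions, not spaCy's actual `spacy/lang/mk/lex_attrs.py`):

```python
# Illustrative sketch of a like_num lexical attribute, modeled on the
# pattern spaCy language packages use. The word list is a small sample,
# not the full Macedonian list.
_num_words = ["еден", "два", "десет", "сто", "илјада", "илјади", "милион"]


def like_num(text: str) -> bool:
    # strip a leading sign, then thousands/decimal separators
    if text.startswith(("+", "-", "±", "~")):
        text = text[1:]
    stripped = text.replace(",", "").replace(".", "")
    if stripped.isdigit():
        return True
    # simple fractions like "1/2"
    if text.count("/") == 1:
        num, denom = text.split("/")
        if num.isdigit() and denom.isdigit():
            return True
    # closed class of number words; lowercase so ALL-CAPS forms match too
    return text.lower() in _num_words
```

Lowercasing before the word-list lookup is what makes a test like `like_num(word.upper())` pass for spelled-out numbers.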
@@ -2,6 +2,27 @@ import pytest
 from spacy.lang.tr.lex_attrs import like_num
+
+
+def test_tr_tokenizer_handles_long_text(tr_tokenizer):
+    text = """Pamuk nasıl ipliğe dönüştürülür?
+
+Sıkıştırılmış balyalar halindeki pamuk, iplik fabrikasına getirildiğinde hem
+lifleri birbirine dolaşmıştır, hem de tarladan toplanırken araya bitkinin
+parçaları karışmıştır. Üstelik balyalardaki pamuğun cinsi aynı olsa bile kalitesi
+değişeceğinden, önce bütün balyaların birbirine karıştırılarak harmanlanması gerekir.
+
+Daha sonra pamuk yığınları, liflerin açılıp temizlenmesi için tek bir birim halinde
+birleştirilmiş çeşitli makinelerden geçirilir.Bunlardan biri, dönen tokmaklarıyla
+pamuğu dövüp kabartarak dağınık yumaklar haline getiren ve liflerin arasındaki yabancı
+maddeleri temizleyen hallaç makinesidir. Daha sonra tarak makinesine giren pamuk demetleri,
+herbirinin yüzeyinde yüzbinlerce incecik iğne bulunan döner silindirlerin arasından geçerek lif lif ayrılır
+ve tül inceliğinde gevşek bir örtüye dönüşür. Ama bir sonraki makine bu lifleri dağınık
+ve gevşek bir biçimde birbirine yaklaştırarak 2 cm eninde bir pamuk şeridi haline getirir."""
+    tokens = tr_tokenizer(text)
+    assert len(tokens) == 146
+
+
 @pytest.mark.parametrize(
     "word",
     [

152
spacy/tests/lang/tr/test_tokenizer.py
Normal file
@@ -0,0 +1,152 @@
import pytest


ABBREV_TESTS = [
    ("Dr. Murat Bey ile görüştüm.", ["Dr.", "Murat", "Bey", "ile", "görüştüm", "."]),
    ("Dr.la görüştüm.", ["Dr.la", "görüştüm", "."]),
    ("Dr.'la görüştüm.", ["Dr.'la", "görüştüm", "."]),
    ("TBMM'de çalışıyormuş.", ["TBMM'de", "çalışıyormuş", "."]),
    ("Hem İst. hem Ank. bu konuda gayet iyi durumda.", ["Hem", "İst.", "hem", "Ank.", "bu", "konuda", "gayet", "iyi", "durumda", "."]),
    ("Hem İst. hem Ank.'da yağış var.", ["Hem", "İst.", "hem", "Ank.'da", "yağış", "var", "."]),
    ("Dr.", ["Dr."]),
    ("Yrd.Doç.", ["Yrd.Doç."]),
    ("Prof.'un", ["Prof.'un"]),
    ("Böl.'nde", ["Böl.'nde"]),
]


URL_TESTS = [
    ("Bizler de www.duygu.com.tr adında bir websitesi kurduk.", ["Bizler", "de", "www.duygu.com.tr", "adında", "bir", "websitesi", "kurduk", "."]),
    ("Bizler de https://www.duygu.com.tr adında bir websitesi kurduk.", ["Bizler", "de", "https://www.duygu.com.tr", "adında", "bir", "websitesi", "kurduk", "."]),
    ("Bizler de www.duygu.com.tr'dan satın aldık.", ["Bizler", "de", "www.duygu.com.tr'dan", "satın", "aldık", "."]),
    ("Bizler de https://www.duygu.com.tr'dan satın aldık.", ["Bizler", "de", "https://www.duygu.com.tr'dan", "satın", "aldık", "."]),
]


NUMBER_TESTS = [
    ("Rakamla 6 yazılıydı.", ["Rakamla", "6", "yazılıydı", "."]),
    ("Hava -4 dereceydi.", ["Hava", "-4", "dereceydi", "."]),
    ("Hava sıcaklığı -4ten +6ya yükseldi.", ["Hava", "sıcaklığı", "-4ten", "+6ya", "yükseldi", "."]),
    ("Hava sıcaklığı -4'ten +6'ya yükseldi.", ["Hava", "sıcaklığı", "-4'ten", "+6'ya", "yükseldi", "."]),
    ("Yarışta 6. oldum.", ["Yarışta", "6.", "oldum", "."]),
    ("Yarışta 438547745. oldum.", ["Yarışta", "438547745.", "oldum", "."]),
    ("Kitap IV. Murat hakkında.", ["Kitap", "IV.", "Murat", "hakkında", "."]),
    # ("Bana söylediği sayı 6.", ["Bana", "söylediği", "sayı", "6", "."]),
    ("Saat 6'da buluşalım.", ["Saat", "6'da", "buluşalım", "."]),
    ("Saat 6dan sonra buluşalım.", ["Saat", "6dan", "sonra", "buluşalım", "."]),
    ("6.dan sonra saymadım.", ["6.dan", "sonra", "saymadım", "."]),
    ("6.'dan sonra saymadım.", ["6.'dan", "sonra", "saymadım", "."]),
    ("Saat 6'ydı.", ["Saat", "6'ydı", "."]),
    ("5'te", ["5'te"]),
    ("6'da", ["6'da"]),
    ("9dan", ["9dan"]),
    ("19'da", ["19'da"]),
    ("VI'da", ["VI'da"]),
    ("5.", ["5."]),
    ("72.", ["72."]),
    ("VI.", ["VI."]),
    ("6.'dan", ["6.'dan"]),
    ("19.'dan", ["19.'dan"]),
    ("6.dan", ["6.dan"]),
    ("16.dan", ["16.dan"]),
    ("VI.'dan", ["VI.'dan"]),
    ("VI.dan", ["VI.dan"]),
    ("Hepsi 1994 yılında oldu.", ["Hepsi", "1994", "yılında", "oldu", "."]),
    ("Hepsi 1994'te oldu.", ["Hepsi", "1994'te", "oldu", "."]),
    ("2/3 tarihli faturayı bulamadım.", ["2/3", "tarihli", "faturayı", "bulamadım", "."]),
    ("2.3 tarihli faturayı bulamadım.", ["2.3", "tarihli", "faturayı", "bulamadım", "."]),
    ("2.3. tarihli faturayı bulamadım.", ["2.3.", "tarihli", "faturayı", "bulamadım", "."]),
    ("2/3/2020 tarihli faturayı bulamadm.", ["2/3/2020", "tarihli", "faturayı", "bulamadm", "."]),
    ("2/3/1987 tarihinden beri burda yaşıyorum.", ["2/3/1987", "tarihinden", "beri", "burda", "yaşıyorum", "."]),
    ("2-3-1987 tarihinden beri burdayım.", ["2-3-1987", "tarihinden", "beri", "burdayım", "."]),
    ("2.3.1987 tarihinden beri burdayım.", ["2.3.1987", "tarihinden", "beri", "burdayım", "."]),
    ("Bu olay 2005-2006 tarihleri arasında oldu.", ["Bu", "olay", "2005", "-", "2006", "tarihleri", "arasında", "oldu", "."]),
    ("Bu olay 4/12/2005-21/3/2006 tarihleri arasında oldu.", ["Bu", "olay", "4/12/2005", "-", "21/3/2006", "tarihleri", "arasında", "oldu", "."]),
    ("Ek fıkra: 5/11/2003-4999/3 maddesine göre uygundur.", ["Ek", "fıkra", ":", "5/11/2003", "-", "4999/3", "maddesine", "göre", "uygundur", "."]),
    ("2/A alanları: 6831 sayılı Kanunun 2nci maddesinin birinci fıkrasının (A) bendine göre", ["2/A", "alanları", ":", "6831", "sayılı", "Kanunun", "2nci", "maddesinin", "birinci", "fıkrasının", "(", "A", ")", "bendine", "göre"]),
    ("ŞEHİTTEĞMENKALMAZ Cad. No: 2/311", ["ŞEHİTTEĞMENKALMAZ", "Cad.", "No", ":", "2/311"]),
    ("2-3-2025", ["2-3-2025"]),
    ("2/3/2025", ["2/3/2025"]),
    ("Yıllardır 0.5 uç kullanıyorum.", ["Yıllardır", "0.5", "uç", "kullanıyorum", "."]),
    ("Kan değerlerim 0.5-0.7 arasıydı.", ["Kan", "değerlerim", "0.5", "-", "0.7", "arasıydı", "."]),
    ("0.5", ["0.5"]),
    ("1/2", ["1/2"]),
    ("%1", ["%", "1"]),
    ("%1lik", ["%", "1lik"]),
    ("%1'lik", ["%", "1'lik"]),
    ("%1lik dilim", ["%", "1lik", "dilim"]),
    ("%1'lik dilim", ["%", "1'lik", "dilim"]),
    ("%1.5", ["%", "1.5"]),
    # ("%1-%2 arası büyüme bekleniyor.", ["%", "1", "-", "%", "2", "arası", "büyüme", "bekleniyor", "."]),
    ("%1-2 arası büyüme bekliyoruz.", ["%", "1", "-", "2", "arası", "büyüme", "bekliyoruz", "."]),
    ("%11-12 arası büyüme bekliyoruz.", ["%", "11", "-", "12", "arası", "büyüme", "bekliyoruz", "."]),
    ("%1.5luk büyüme bekliyoruz.", ["%", "1.5luk", "büyüme", "bekliyoruz", "."]),
    ("Saat 1-2 arası gelin lütfen.", ["Saat", "1", "-", "2", "arası", "gelin", "lütfen", "."]),
    ("Saat 15:30 gibi buluşalım.", ["Saat", "15:30", "gibi", "buluşalım", "."]),
    ("Saat 15:30'da buluşalım.", ["Saat", "15:30'da", "buluşalım", "."]),
    ("Saat 15.30'da buluşalım.", ["Saat", "15.30'da", "buluşalım", "."]),
    ("Saat 15.30da buluşalım.", ["Saat", "15.30da", "buluşalım", "."]),
    ("Saat 15 civarı buluşalım.", ["Saat", "15", "civarı", "buluşalım", "."]),
    ("9’daki otobüse binsek mi?", ["9’daki", "otobüse", "binsek", "mi", "?"]),
    ("Okulumuz 3-B şubesi", ["Okulumuz", "3-B", "şubesi"]),
    ("Okulumuz 3/B şubesi", ["Okulumuz", "3/B", "şubesi"]),
    ("Okulumuz 3B şubesi", ["Okulumuz", "3B", "şubesi"]),
    ("Okulumuz 3b şubesi", ["Okulumuz", "3b", "şubesi"]),
    ("Antonio Gaudí 20. yüzyılda, 1904-1914 yılları arasında on yıl süren bir reform süreci getirmiştir.", ["Antonio", "Gaudí", "20.", "yüzyılda", ",", "1904", "-", "1914", "yılları", "arasında", "on", "yıl", "süren", "bir", "reform", "süreci", "getirmiştir", "."]),
    ("Dizel yakıtın avro bölgesi ortalaması olan 1,165 avroya kıyasla litre başına 1,335 avroya mal olduğunu gösteriyor.", ["Dizel", "yakıtın", "avro", "bölgesi", "ortalaması", "olan", "1,165", "avroya", "kıyasla", "litre", "başına", "1,335", "avroya", "mal", "olduğunu", "gösteriyor", "."]),
    ("Marcus Antonius M.Ö. 1 Ocak 49'da, Sezar'dan Vali'nin kendisini barış dostu ilan ettiği bir bildiri yayınlamıştır.", ["Marcus", "Antonius", "M.Ö.", "1", "Ocak", "49'da", ",", "Sezar'dan", "Vali'nin", "kendisini", "barış", "dostu", "ilan", "ettiği", "bir", "bildiri", "yayınlamıştır", "."])
]


PUNCT_TESTS = [
    ("Gitmedim dedim ya!", ["Gitmedim", "dedim", "ya", "!"]),
    ("Gitmedim dedim ya!!", ["Gitmedim", "dedim", "ya", "!", "!"]),
    ("Gitsek mi?", ["Gitsek", "mi", "?"]),
    ("Gitsek mi??", ["Gitsek", "mi", "?", "?"]),
    ("Gitsek mi?!?", ["Gitsek", "mi", "?", "!", "?"]),
    ("Ankara - Antalya arası otobüs işliyor.", ["Ankara", "-", "Antalya", "arası", "otobüs", "işliyor", "."]),
    ("Ankara-Antalya arası otobüs işliyor.", ["Ankara", "-", "Antalya", "arası", "otobüs", "işliyor", "."]),
    ("Sen--ben, ya da onlar.", ["Sen", "--", "ben", ",", "ya", "da", "onlar", "."]),
    ("Senden, benden, bizden şarkısını biliyor musun?", ["Senden", ",", "benden", ",", "bizden", "şarkısını", "biliyor", "musun", "?"]),
    ("Akif'le geldik, sonra da o ayrıldı.", ["Akif'le", "geldik", ",", "sonra", "da", "o", "ayrıldı", "."]),
    ("Bu adam ne dedi şimdi???", ["Bu", "adam", "ne", "dedi", "şimdi", "?", "?", "?"]),
    ("Yok hasta olmuş, yok annesi hastaymış, bahaneler işte...", ["Yok", "hasta", "olmuş", ",", "yok", "annesi", "hastaymış", ",", "bahaneler", "işte", "..."]),
    ("Ankara'dan İstanbul'a ... bir aşk hikayesi.", ["Ankara'dan", "İstanbul'a", "...", "bir", "aşk", "hikayesi", "."]),
    ("Ahmet'te", ["Ahmet'te"]),
    ("İstanbul'da", ["İstanbul'da"]),
]


GENERAL_TESTS = [
    ("1914'teki Endurance seferinde, Sir Ernest Shackleton'ın kaptanlığını yaptığı İngiliz Endurance gemisi yirmi sekiz kişi ile Antarktika'yı geçmek üzere yelken açtı.", ["1914'teki", "Endurance", "seferinde", ",", "Sir", "Ernest", "Shackleton'ın", "kaptanlığını", "yaptığı", "İngiliz", "Endurance", "gemisi", "yirmi", "sekiz", "kişi", "ile", "Antarktika'yı", "geçmek", "üzere", "yelken", "açtı", "."]),
    ("Danışılan \"%100 Cospedal\" olduğunu belirtti.", ["Danışılan", '"', "%", "100", "Cospedal", '"', "olduğunu", "belirtti", "."]),
    ("1976'da parkur artık kullanılmıyordu; 1990'da ise bir yangın, daha sonraları ahırlarla birlikte yıkılacak olan tahta tribünlerden geri kalanları da yok etmişti.", ["1976'da", "parkur", "artık", "kullanılmıyordu", ";", "1990'da", "ise", "bir", "yangın", ",", "daha", "sonraları", "ahırlarla", "birlikte", "yıkılacak", "olan", "tahta", "tribünlerden", "geri", "kalanları", "da", "yok", "etmişti", "."]),
    ("Dahiyane bir ameliyat ve zorlu bir rehabilitasyon sürecinden sonra, tamamen iyileştim.", ["Dahiyane", "bir", "ameliyat", "ve", "zorlu", "bir", "rehabilitasyon", "sürecinden", "sonra", ",", "tamamen", "iyileştim", "."]),
    ("Yaklaşık iki hafta süren bireysel erken oy kullanma döneminin ardından 5,7 milyondan fazla Floridalı sandık başına gitti.", ["Yaklaşık", "iki", "hafta", "süren", "bireysel", "erken", "oy", "kullanma", "döneminin", "ardından", "5,7", "milyondan", "fazla", "Floridalı", "sandık", "başına", "gitti", "."]),
    ("Ancak, bu ABD Çevre Koruma Ajansı'nın dünyayı bu konularda uyarmasının ardından ortaya çıktı.", ["Ancak", ",", "bu", "ABD", "Çevre", "Koruma", "Ajansı'nın", "dünyayı", "bu", "konularda", "uyarmasının", "ardından", "ortaya", "çıktı", "."]),
    ("Ortalama şansa ve 10.000 Sterlin değerinde tahvillere sahip bir yatırımcı yılda 125 Sterlin ikramiye kazanabilir.", ["Ortalama", "şansa", "ve", "10.000", "Sterlin", "değerinde", "tahvillere", "sahip", "bir", "yatırımcı", "yılda", "125", "Sterlin", "ikramiye", "kazanabilir", "."]),
    ("Granit adaları; Seyşeller ve Tioman ile Saint Helena gibi volkanik adaları kapsar.", ["Granit", "adaları", ";", "Seyşeller", "ve", "Tioman", "ile", "Saint", "Helena", "gibi", "volkanik", "adaları", "kapsar", "."]),
    ("Barış antlaşmasıyla İspanya, Amerika'ya Porto Riko, Guam ve Filipinler kolonilerini devretti.", ["Barış", "antlaşmasıyla", "İspanya", ",", "Amerika'ya", "Porto", "Riko", ",", "Guam", "ve", "Filipinler", "kolonilerini", "devretti", "."]),
    ("Makedonya'nın sınır bölgelerini güvence altına alan Philip, büyük bir Makedon ordusu kurdu ve uzun bir fetih seferi için Trakya'ya doğru yürüdü.", ["Makedonya'nın", "sınır", "bölgelerini", "güvence", "altına", "alan", "Philip", ",", "büyük", "bir", "Makedon", "ordusu", "kurdu", "ve", "uzun", "bir", "fetih", "seferi", "için", "Trakya'ya", "doğru", "yürüdü", "."]),
    ("Fransız gazetesi Le Figaro'ya göre bu hükumet planı sayesinde 42 milyon Euro kazanç sağlanabilir ve elde edilen paranın 15.5 milyonu ulusal güvenlik için kullanılabilir.", ["Fransız", "gazetesi", "Le", "Figaro'ya", "göre", "bu", "hükumet", "planı", "sayesinde", "42", "milyon", "Euro", "kazanç", "sağlanabilir", "ve", "elde", "edilen", "paranın", "15.5", "milyonu", "ulusal", "güvenlik", "için", "kullanılabilir", "."]),
    ("Ortalama şansa ve 10.000 Sterlin değerinde tahvillere sahip bir yatırımcı yılda 125 Sterlin ikramiye kazanabilir.", ["Ortalama", "şansa", "ve", "10.000", "Sterlin", "değerinde", "tahvillere", "sahip", "bir", "yatırımcı", "yılda", "125", "Sterlin", "ikramiye", "kazanabilir", "."]),
    ("3 Kasım Salı günü, Ankara Belediye Başkanı 2014'te hükümetle birlikte oluşturulan kentsel gelişim anlaşmasını askıya alma kararı verdi.", ["3", "Kasım", "Salı", "günü", ",", "Ankara", "Belediye", "Başkanı", "2014'te", "hükümetle", "birlikte", "oluşturulan", "kentsel", "gelişim", "anlaşmasını", "askıya", "alma", "kararı", "verdi", "."]),
    ("Stalin, Abakumov'u Beria'nın enerji bakanlıkları üzerindeki baskınlığına karşı MGB içinde kendi ağını kurmaya teşvik etmeye başlamıştı.", ["Stalin", ",", "Abakumov'u", "Beria'nın", "enerji", "bakanlıkları", "üzerindeki", "baskınlığına", "karşı", "MGB", "içinde", "kendi", "ağını", "kurmaya", "teşvik", "etmeye", "başlamıştı", "."]),
    ("Güney Avrupa'daki kazı alanlarının çoğunluğu gibi, bu bulgu M.Ö. 5. yüzyılın başlar", ["Güney", "Avrupa'daki", "kazı", "alanlarının", "çoğunluğu", "gibi", ",", "bu", "bulgu", "M.Ö.", "5.", "yüzyılın", "başlar"]),
    ("Sağlığın bozulması Hitchcock hayatının son yirmi yılında üretimini azalttı.", ["Sağlığın", "bozulması", "Hitchcock", "hayatının", "son", "yirmi", "yılında", "üretimini", "azalttı", "."]),
]


TESTS = (ABBREV_TESTS + URL_TESTS + NUMBER_TESTS + PUNCT_TESTS + GENERAL_TESTS)


@pytest.mark.parametrize("text,expected_tokens", TESTS)
def test_tr_tokenizer_handles_allcases(tr_tokenizer, text, expected_tokens):
    tokens = tr_tokenizer(text)
    token_list = [token.text for token in tokens if not token.is_space]
    print(token_list)
    assert expected_tokens == token_list
@@ -457,6 +457,7 @@ def test_attr_pipeline_checks(en_vocab):
         ([{"IS_LEFT_PUNCT": True}], "``"),
         ([{"IS_RIGHT_PUNCT": True}], "''"),
         ([{"IS_STOP": True}], "the"),
+        ([{"SPACY": True}], "the"),
         ([{"LIKE_NUM": True}], "1"),
         ([{"LIKE_URL": True}], "http://example.com"),
         ([{"LIKE_EMAIL": True}], "mail@example.com"),
@@ -4,7 +4,9 @@ from pathlib import Path
 def test_build_dependencies():
     # Check that library requirements are pinned exactly the same across different setup files.
+    # TODO: correct checks for numpy rather than ignoring
     libs_ignore_requirements = [
+        "numpy",
         "pytest",
         "pytest-timeout",
         "mock",
@@ -12,6 +14,7 @@ def test_build_dependencies():
     ]
     # ignore language-specific packages that shouldn't be installed by all
     libs_ignore_setup = [
+        "numpy",
         "fugashi",
         "natto-py",
         "pythainlp",
@@ -67,7 +70,7 @@ def test_build_dependencies():
         line = line.strip().strip(",").strip('"')
         if not line.startswith("#"):
             lib, v = _parse_req(line)
-            if lib:
+            if lib and lib not in libs_ignore_requirements:
                 req_v = req_dict.get(lib, None)
                 assert (lib + v) == (lib + req_v), (
                     "{} has different version in pyproject.toml and in requirements.txt: "
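The check above asserts that every library pinned in both `requirements.txt` and `pyproject.toml` uses the same version spec, skipping an ignore list. A rough, self-contained illustration of the idea (`parse_req` and `check_pins` are hypothetical stand-ins, not spaCy's private `_parse_req`):

```python
import re


def parse_req(line):
    # Split a requirement line like "thinc>=8.0.0,<8.1.0" into
    # (name, version-spec); hypothetical stand-in for spaCy's _parse_req.
    m = re.match(r"([A-Za-z0-9_-]+)(.*)", line.strip())
    return (m.group(1), m.group(2)) if m else (None, None)


def check_pins(requirements, setup_lines, ignore=()):
    # Return (lib, requirements-spec, setup-spec) for every library whose
    # pin differs between the two files, skipping ignored libraries.
    req_dict = dict(parse_req(line) for line in requirements)
    mismatches = []
    for line in setup_lines:
        lib, v = parse_req(line)
        if lib and lib not in ignore and lib in req_dict and req_dict[lib] != v:
            mismatches.append((lib, req_dict[lib], v))
    return mismatches
```

The real test asserts on each mismatch individually; collecting them into a list, as here, is just a convenient way to show the comparison.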
@@ -197,3 +197,21 @@ def test_entity_ruler_overlapping_spans(nlp):
     doc = ruler(nlp.make_doc("foo bar baz"))
     assert len(doc.ents) == 1
     assert doc.ents[0].label_ == "FOOBAR"
+
+
+@pytest.mark.parametrize("n_process", [1, 2])
+def test_entity_ruler_multiprocessing(nlp, n_process):
+    texts = [
+        "I enjoy eating Pizza Hut pizza."
+    ]
+
+    patterns = [
+        {"label": "FASTFOOD", "pattern": "Pizza Hut", "id": "1234"}
+    ]
+
+    ruler = nlp.add_pipe("entity_ruler")
+    ruler.add_patterns(patterns)
+
+    for doc in nlp.pipe(texts, n_process=n_process):
+        for ent in doc.ents:
+            assert ent.ent_id_ == "1234"
@@ -1,4 +1,6 @@
 import pytest
+import logging
+import mock
 from spacy import util, registry
 from spacy.lang.en import English
 from spacy.lookups import Lookups
@@ -54,9 +56,18 @@ def test_lemmatizer_config(nlp):
     lemmatizer = nlp.add_pipe("lemmatizer", config={"mode": "rule"})
     nlp.initialize()
+
+    # warning if no POS assigned
+    doc = nlp.make_doc("coping")
+    logger = logging.getLogger("spacy")
+    with mock.patch.object(logger, "warn") as mock_warn:
+        doc = lemmatizer(doc)
+        mock_warn.assert_called_once()
+
+    # works with POS
     doc = nlp.make_doc("coping")
-    doc[0].pos_ = "VERB"
     assert doc[0].lemma_ == ""
+    doc[0].pos_ = "VERB"
     doc = lemmatizer(doc)
     assert doc[0].text == "coping"
     assert doc[0].lemma_ == "cope"
@@ -8,7 +8,7 @@ from spacy.cli.init_config import init_config, RECOMMENDATIONS
 from spacy.cli._util import validate_project_commands, parse_config_overrides
 from spacy.cli._util import load_project_config, substitute_project_variables
 from spacy.cli._util import string_to_list
-from thinc.api import ConfigValidationError
+from thinc.api import ConfigValidationError, Config
 import srsly
 import os
@@ -368,7 +368,8 @@ def test_parse_cli_overrides():
 @pytest.mark.parametrize("optimize", ["efficiency", "accuracy"])
 def test_init_config(lang, pipeline, optimize):
     # TODO: add more tests and also check for GPU with transformers
-    init_config("-", lang=lang, pipeline=pipeline, optimize=optimize, gpu=False)
+    config = init_config(lang=lang, pipeline=pipeline, optimize=optimize, gpu=False)
+    assert isinstance(config, Config)


 def test_model_recommendations():
@@ -404,9 +404,7 @@ cdef class Tokenizer:
         cdef unicode minus_suf
         cdef size_t last_size = 0
         while string and len(string) != last_size:
-            if self.token_match and self.token_match(string) \
-                    and not self.find_prefix(string) \
-                    and not self.find_suffix(string):
+            if self.token_match and self.token_match(string):
                 break
             if with_special_cases and self._specials.get(hash_string(string)) != NULL:
                 break
@@ -679,6 +677,8 @@ cdef class Tokenizer:
                 break
             suffixes.append(("SUFFIX", substring[split:]))
             substring = substring[:split]
+            if len(substring) == 0:
+                continue
             if token_match(substring):
                 tokens.append(("TOKEN_MATCH", substring))
                 substring = ''
@@ -11,7 +11,7 @@ from .span cimport Span
 from .token cimport Token
 from ..lexeme cimport Lexeme, EMPTY_LEXEME
 from ..structs cimport LexemeC, TokenC
-from ..attrs cimport MORPH
+from ..attrs cimport MORPH, NORM
 from ..vocab cimport Vocab

 from .underscore import is_writable_attr
@@ -372,9 +372,10 @@ def _split(Doc doc, int token_index, orths, heads, attrs):
         # Set attributes on both token and lexeme to take care of token
         # attribute vs. lexical attribute without having to enumerate
         # them. If an attribute name is not valid, set_struct_attr will
-        # ignore it.
+        # ignore it. Exception: set NORM only on tokens.
         Token.set_struct_attr(token, attr_name, get_string_id(attr_value))
-        Lexeme.set_struct_attr(<LexemeC*>token.lex, attr_name, get_string_id(attr_value))
+        if attr_name != NORM:
+            Lexeme.set_struct_attr(<LexemeC*>token.lex, attr_name, get_string_id(attr_value))
         # Assign correct dependencies to the inner token
         for i, head in enumerate(heads):
             doc.c[token_index + i].head = head
@@ -435,6 +436,7 @@ def set_token_attrs(Token py_token, attrs):
         # Set attributes on both token and lexeme to take care of token
         # attribute vs. lexical attribute without having to enumerate
         # them. If an attribute name is not valid, set_struct_attr will
-        # ignore it.
+        # ignore it. Exception: set NORM only on tokens.
         Token.set_struct_attr(token, attr_name, attr_value)
-        Lexeme.set_struct_attr(<LexemeC*>lex, attr_name, attr_value)
+        if attr_name != NORM:
+            Lexeme.set_struct_attr(<LexemeC*>lex, attr_name, attr_value)
@@ -5,7 +5,6 @@ from libc.stdint cimport uint8_t
 ctypedef float weight_t
 ctypedef uint64_t hash_t
 ctypedef uint64_t class_t
-ctypedef char* utf8_t
 ctypedef uint64_t attr_t
 ctypedef uint64_t flags_t
 ctypedef uint16_t len_t
@@ -1295,6 +1295,13 @@ def combine_score_weights(


 class DummyTokenizer:
+    def __call__(self, text):
+        raise NotImplementedError
+
+    def pipe(self, texts, **kwargs):
+        for text in texts:
+            yield self(text)
+
     # add dummy methods for to_bytes, from_bytes, to_disk and from_disk to
     # allow serialization (see #1557)
     def to_bytes(self, **kwargs):
@@ -4,7 +4,7 @@ from cymem.cymem cimport Pool
 from murmurhash.mrmr cimport hash64

 from .structs cimport LexemeC, TokenC
-from .typedefs cimport utf8_t, attr_t, hash_t
+from .typedefs cimport attr_t, hash_t
 from .strings cimport StringStore
 from .morphology cimport Morphology
@@ -305,6 +305,9 @@ cdef class Vocab:
         DOCS: https://nightly.spacy.io/api/vocab#prune_vectors
         """
         xp = get_array_module(self.vectors.data)
+        # Make sure all vectors are in the vocab
+        for orth in self.vectors:
+            self[orth]
         # Make prob negative so it sorts by rank ascending
         # (key2row contains the rank)
         priority = [(-lex.prob, self.vectors.key2row[lex.orth], lex.orth)
@@ -39,7 +39,9 @@ rule-based matching are:
 | `IS_ALPHA`, `IS_ASCII`, `IS_DIGIT` | Token text consists of alphabetic characters, ASCII characters, digits. ~~bool~~ |
 | `IS_LOWER`, `IS_UPPER`, `IS_TITLE` | Token text is in lowercase, uppercase, titlecase. ~~bool~~ |
 | `IS_PUNCT`, `IS_SPACE`, `IS_STOP` | Token is punctuation, whitespace, stop word. ~~bool~~ |
+| `IS_SENT_START` | Token is start of sentence. ~~bool~~ |
 | `LIKE_NUM`, `LIKE_URL`, `LIKE_EMAIL` | Token text resembles a number, URL, email. ~~bool~~ |
+| `SPACY` | Token has a trailing space. ~~bool~~ |
 | `POS`, `TAG`, `MORPH`, `DEP`, `LEMMA`, `SHAPE` | The token's simple and extended part-of-speech tag, morphological analysis, dependency label, lemma, shape. ~~str~~ |
 | `ENT_TYPE` | The token's entity label. ~~str~~ |
 | `_` <Tag variant="new">2.1</Tag> | Properties in [custom extension attributes](/usage/processing-pipelines#custom-components-attributes). ~~Dict[str, Any]~~ |
@@ -61,7 +63,7 @@ matched:
 | `!` | Negate the pattern, by requiring it to match exactly 0 times. |
 | `?` | Make the pattern optional, by allowing it to match 0 or 1 times. |
 | `+` | Require the pattern to match 1 or more times. |
 | `*` | Allow the pattern to match 0 or more times. |

 Token patterns can also map to a **dictionary of properties** instead of a
 single value to indicate whether the expected value is a member of a list or how
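As a toy illustration of the operator semantics in the table above (a sketch over plain strings under simplified assumptions, not spaCy's actual `Matcher` implementation), the quantifiers behave like regex-style repetition over a token sequence:

```python
# Toy matcher illustrating the !, ?, + and * operators from the table
# above; patterns are (value, op) pairs matched against plain strings.
def match(pattern, tokens):
    def rec(pi, ti):
        if pi == len(pattern):
            return ti == len(tokens)
        value, op = pattern[pi]
        here = ti < len(tokens) and tokens[ti] == value
        if op == "":    # default: match exactly once
            return here and rec(pi + 1, ti + 1)
        if op == "!":   # require the pattern to match exactly 0 times
            return not here and rec(pi + 1, ti)
        if op == "?":   # optional: match 0 or 1 times
            return (here and rec(pi + 1, ti + 1)) or rec(pi + 1, ti)
        if op == "+":   # match 1 or more times (with backtracking)
            return here and (rec(pi, ti + 1) or rec(pi + 1, ti + 1))
        if op == "*":   # match 0 or more times
            return (here and rec(pi, ti + 1)) or rec(pi + 1, ti)
        raise ValueError(f"unknown operator: {op!r}")
    return rec(0, 0)
```

So `[("very", "+"), ("happy", "")]` accepts `["very", "very", "happy"]`, while `[("very", "?"), ("happy", "")]` also accepts just `["happy"]`, mirroring the `+` and `?` rows of the table.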
@@ -158,21 +158,22 @@ The available token pattern keys correspond to a number of
 [`Token` attributes](/api/token#attributes). The supported attributes for
 rule-based matching are:

 | Attribute | Description |
-| ----------------------------------------------- | ------------------------------------------------------------------------------------------------------------------------- |
+| ----------------------------------------------- | ---------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------- |
 | `ORTH` | The exact verbatim text of a token. ~~str~~ |
 | `TEXT` <Tag variant="new">2.1</Tag> | The exact verbatim text of a token. ~~str~~ |
 | `LOWER` | The lowercase form of the token text. ~~str~~ |
 | `LENGTH` | The length of the token text. ~~int~~ |
 | `IS_ALPHA`, `IS_ASCII`, `IS_DIGIT` | Token text consists of alphabetic characters, ASCII characters, digits. ~~bool~~ |
 | `IS_LOWER`, `IS_UPPER`, `IS_TITLE` | Token text is in lowercase, uppercase, titlecase. ~~bool~~ |
 | `IS_PUNCT`, `IS_SPACE`, `IS_STOP` | Token is punctuation, whitespace, stop word. ~~bool~~ |
+| `IS_SENT_START` | Token is start of sentence. ~~bool~~ |
|
| `IS_SENT_START` | Token is start of sentence. ~~bool~~ |
|
||||||
| `LIKE_NUM`, `LIKE_URL`, `LIKE_EMAIL` | Token text resembles a number, URL, email. ~~bool~~ |
|
| `LIKE_NUM`, `LIKE_URL`, `LIKE_EMAIL` | Token text resembles a number, URL, email. ~~bool~~ |
|
||||||
| `POS`, `TAG`, `MORPH`, `DEP`, `LEMMA`, `SHAPE` | The token's simple and extended part-of-speech tag, morphological analysis, dependency label, lemma, shape. ~~str~~ |
|
| `SPACY` | Token has a trailing space. ~~bool~~ |
|
||||||
| `ENT_TYPE` | The token's entity label. ~~str~~ |
|
| `POS`, `TAG`, `MORPH`, `DEP`, `LEMMA`, `SHAPE` | The token's simple and extended part-of-speech tag, morphological analysis, dependency label, lemma, shape. Note that the values of these attributes are case-sensitive. For a list of available part-of-speech tags and dependency labels, see the [Annotation Specifications](/api/annotation). ~~str~~ |
|
||||||
| `_` <Tag variant="new">2.1</Tag> | Properties in [custom extension attributes](/usage/processing-pipelines#custom-components-attributes). ~~Dict[str, Any]~~ |
|
| `ENT_TYPE` | The token's entity label. ~~str~~ |
|
||||||
| `OP` | [Operator or quantifier](#quantifiers) to determine how often to match a token pattern. ~~str~~ |
|
| `_` <Tag variant="new">2.1</Tag> | Properties in [custom extension attributes](/usage/processing-pipelines#custom-components-attributes). ~~Dict[str, Any]~~ |
|
||||||
|
| `OP` | [Operator or quantifier](#quantifiers) to determine how often to match a token pattern. ~~str~~ |
|
||||||
|
|
||||||
<Accordion title="Does it matter if the attribute names are uppercase or lowercase?">
|
<Accordion title="Does it matter if the attribute names are uppercase or lowercase?">
|
||||||
|
|
||||||
|
|
|
@ -199,6 +199,36 @@
|
||||||
"name": "Vietnamese",
|
"name": "Vietnamese",
|
||||||
"dependencies": [{ "name": "Pyvi", "url": "https://github.com/trungtv/pyvi" }]
|
"dependencies": [{ "name": "Pyvi", "url": "https://github.com/trungtv/pyvi" }]
|
||||||
},
|
},
|
||||||
|
{
|
||||||
|
"code": "lij",
|
||||||
|
"name": "Ligurian",
|
||||||
|
"example": "Sta chì a l'é unna fraxe.",
|
||||||
|
"has_examples": true
|
||||||
|
},
|
||||||
|
{
|
||||||
|
"code": "hy",
|
||||||
|
"name": "Armenian",
|
||||||
|
"has_examples": true
|
||||||
|
},
|
||||||
|
{
|
||||||
|
"code": "gu",
|
||||||
|
"name": "Gujarati",
|
||||||
|
"has_examples": true
|
||||||
|
},
|
||||||
|
{
|
||||||
|
"code": "ml",
|
||||||
|
"name": "Malayalam",
|
||||||
|
"has_examples": true
|
||||||
|
},
|
||||||
|
{
|
||||||
|
"code": "ne",
|
||||||
|
"name": "Nepali",
|
||||||
|
"has_examples": true
|
||||||
|
},
|
||||||
|
{
|
||||||
|
"code": "mk",
|
||||||
|
"name": "Macedonian"
|
||||||
|
},
|
||||||
{
|
{
|
||||||
"code": "xx",
|
"code": "xx",
|
||||||
"name": "Multi-language",
|
"name": "Multi-language",
|
||||||
|
|
|
@ -1,5 +1,36 @@
|
||||||
{
|
{
|
||||||
"resources": [
|
"resources": [
|
||||||
|
{
|
||||||
|
"id": "spacy-textblob",
|
||||||
|
"title": "spaCyTextBlob",
|
||||||
|
"slogan": "Easy sentiment analysis for spaCy using TextBlob",
|
||||||
|
"description": "spaCyTextBlob is a pipeline component that enables sentiment analysis using the [TextBlob](https://github.com/sloria/TextBlob) library. It will add the additional extenstion `._.sentiment` to `Doc`, `Span`, and `Token` objects.",
|
||||||
|
"github": "SamEdwardes/spaCyTextBlob",
|
||||||
|
"pip": "spacytextblob",
|
||||||
|
"code_example": [
|
||||||
|
"import spacy",
|
||||||
|
"from spacytextblob.spacytextblob import SpacyTextBlob",
|
||||||
|
"",
|
||||||
|
"nlp = spacy.load('en_core_web_sm')",
|
||||||
|
"spacy_text_blob = SpacyTextBlob()",
|
||||||
|
"nlp.add_pipe(spacy_text_blob)",
|
||||||
|
"text = 'I had a really horrible day. It was the worst day ever! But every now and then I have a really good day that makes me happy.'",
|
||||||
|
"doc = nlp(text)",
|
||||||
|
"doc._.sentiment.polarity # Polarity: -0.125",
|
||||||
|
"doc._.sentiment.subjectivity # Sujectivity: 0.9",
|
||||||
|
"doc._.sentiment.assessments # Assessments: [(['really', 'horrible'], -1.0, 1.0, None), (['worst', '!'], -1.0, 1.0, None), (['really', 'good'], 0.7, 0.6000000000000001, None), (['happy'], 0.8, 1.0, None)]"
|
||||||
|
],
|
||||||
|
"code_language": "python",
|
||||||
|
"url": "https://spacytextblob.netlify.app/",
|
||||||
|
"author": "Sam Edwardes",
|
||||||
|
"author_links": {
|
||||||
|
"twitter": "TheReaLSamlam",
|
||||||
|
"github": "SamEdwardes",
|
||||||
|
"website": "https://samedwardes.com"
|
||||||
|
},
|
||||||
|
"category": ["pipeline"],
|
||||||
|
"tags": ["sentiment", "textblob"]
|
||||||
|
},
|
||||||
{
|
{
|
||||||
"id": "spacy-ray",
|
"id": "spacy-ray",
|
||||||
"title": "spacy-ray",
|
"title": "spacy-ray",
|
||||||
|
@ -788,6 +819,22 @@
|
||||||
"category": ["conversational"],
|
"category": ["conversational"],
|
||||||
"tags": ["chatbots"]
|
"tags": ["chatbots"]
|
||||||
},
|
},
|
||||||
|
{
|
||||||
|
"id": "mindmeld",
|
||||||
|
"title": "MindMeld - Conversational AI platform",
|
||||||
|
"slogan": "Conversational AI platform for deep-domain voice interfaces and chatbots",
|
||||||
|
"description": "The MindMeld Conversational AI platform is among the most advanced AI platforms for building production-quality conversational applications. It is a Python-based machine learning framework which encompasses all of the algorithms and utilities required for this purpose. (https://github.com/cisco/mindmeld)",
|
||||||
|
"github": "cisco/mindmeld",
|
||||||
|
"pip": "mindmeld",
|
||||||
|
"thumb": "https://www.mindmeld.com/img/mindmeld-logo.png",
|
||||||
|
"category": ["conversational", "ner"],
|
||||||
|
"tags": ["chatbots"],
|
||||||
|
"author": "Cisco",
|
||||||
|
"author_links": {
|
||||||
|
"github": "cisco/mindmeld",
|
||||||
|
"website": "https://www.mindmeld.com/"
|
||||||
|
}
|
||||||
|
},
|
||||||
{
|
{
|
||||||
"id": "torchtext",
|
"id": "torchtext",
|
||||||
"title": "torchtext",
|
"title": "torchtext",
|
||||||
|
@ -1648,7 +1695,7 @@
|
||||||
"",
|
"",
|
||||||
"nlp = spacy.load('en')",
|
"nlp = spacy.load('en')",
|
||||||
"nlp.add_pipe(BeneparComponent('benepar_en'))",
|
"nlp.add_pipe(BeneparComponent('benepar_en'))",
|
||||||
"doc = nlp('The time for action is now. It's never too late to do something.')",
|
"doc = nlp('The time for action is now. It is never too late to do something.')",
|
||||||
"sent = list(doc.sents)[0]",
|
"sent = list(doc.sents)[0]",
|
||||||
"print(sent._.parse_string)",
|
"print(sent._.parse_string)",
|
||||||
"# (S (NP (NP (DT The) (NN time)) (PP (IN for) (NP (NN action)))) (VP (VBZ is) (ADVP (RB now))) (. .))",
|
"# (S (NP (NP (DT The) (NN time)) (PP (IN for) (NP (NN action)))) (VP (VBZ is) (ADVP (RB now))) (. .))",
|
||||||
|
@ -2527,14 +2574,14 @@
|
||||||
"description": "A spaCy rule-based pipeline for identifying positive cases of COVID-19 from clinical text. A version of this system was deployed as part of the US Department of Veterans Affairs biosurveillance response to COVID-19.",
|
"description": "A spaCy rule-based pipeline for identifying positive cases of COVID-19 from clinical text. A version of this system was deployed as part of the US Department of Veterans Affairs biosurveillance response to COVID-19.",
|
||||||
"pip": "cov-bsv",
|
"pip": "cov-bsv",
|
||||||
"code_example": [
|
"code_example": [
|
||||||
"import cov_bsv",
|
"import cov_bsv",
|
||||||
"",
|
"",
|
||||||
"nlp = cov_bsv.load()",
|
"nlp = cov_bsv.load()",
|
||||||
"text = 'Pt tested for COVID-19. His wife was recently diagnosed with novel coronavirus. SARS-COV-2: Detected'",
|
"doc = nlp('Pt tested for COVID-19. His wife was recently diagnosed with novel coronavirus. SARS-COV-2: Detected')",
|
||||||
"",
|
"",
|
||||||
"print(doc.ents)",
|
"print(doc.ents)",
|
||||||
"print(doc._.cov_classification)",
|
"print(doc._.cov_classification)",
|
||||||
"cov_bsv.visualize_doc(doc)"
|
"cov_bsv.visualize_doc(doc)"
|
||||||
],
|
],
|
||||||
"category": ["pipeline", "standalone", "biomedical", "scientific"],
|
"category": ["pipeline", "standalone", "biomedical", "scientific"],
|
||||||
"tags": ["clinical", "epidemiology", "covid-19", "surveillance"],
|
"tags": ["clinical", "epidemiology", "covid-19", "surveillance"],
|
||||||
|
@ -2542,6 +2589,35 @@
|
||||||
"author_links": {
|
"author_links": {
|
||||||
"github": "abchapman93"
|
"github": "abchapman93"
|
||||||
}
|
}
|
||||||
|
},
|
||||||
|
{
|
||||||
|
"id": "medspacy",
|
||||||
|
"title": "medspaCy",
|
||||||
|
"thumb": "https://raw.githubusercontent.com/medspacy/medspacy/master/images/medspacy_logo.png",
|
||||||
|
"slogan": "A toolkit for clinical NLP with spaCy.",
|
||||||
|
"github": "medspacy/medspacy",
|
||||||
|
"description": "A toolkit for clinical NLP with spaCy. Features include sentence splitting, section detection, and asserting negation, family history, and uncertainty.",
|
||||||
|
"pip": "medspacy",
|
||||||
|
"code_example": [
|
||||||
|
"import medspacy",
|
||||||
|
"from medspacy.ner import TargetRule",
|
||||||
|
"",
|
||||||
|
"nlp = medspacy.load()",
|
||||||
|
"print(nlp.pipe_names)",
|
||||||
|
"",
|
||||||
|
"nlp.get_pipe('target_matcher').add([TargetRule('stroke', 'CONDITION'), TargetRule('diabetes', 'CONDITION'), TargetRule('pna', 'CONDITION')])",
|
||||||
|
"doc = nlp('Patient has hx of stroke. Mother diagnosed with diabetes. No evidence of pna.')",
|
||||||
|
"",
|
||||||
|
"for ent in doc.ents:",
|
||||||
|
" print(ent, ent._.is_negated, ent._.is_family, ent._.is_historical)",
|
||||||
|
"medspacy.visualization.visualize_ent(doc)"
|
||||||
|
],
|
||||||
|
"category": ["biomedical", "scientific", "research"],
|
||||||
|
"tags": ["clinical"],
|
||||||
|
"author": "medspacy",
|
||||||
|
"author_links": {
|
||||||
|
"github": "medspacy"
|
||||||
|
}
|
||||||
},
|
},
|
||||||
{
|
{
|
||||||
"id": "rita-dsl",
|
"id": "rita-dsl",
|
||||||
|
@ -2578,6 +2654,32 @@
|
||||||
"author_links": {
|
"author_links": {
|
||||||
"github": "zaibacu"
|
"github": "zaibacu"
|
||||||
}
|
}
|
||||||
|
},
|
||||||
|
{
|
||||||
|
"id": "PatternOmatic",
|
||||||
|
"title": "PatternOmatic",
|
||||||
|
"slogan": "Finds linguistic patterns effortlessly",
|
||||||
|
"description": "Discover spaCy's linguistic patterns matching a given set of String samples to be used by the spaCy's Rule Based Matcher",
|
||||||
|
"github": "revuel/PatternOmatic",
|
||||||
|
"pip": "PatternOmatic",
|
||||||
|
"code_example": [
|
||||||
|
"from PatternOmatic.api import find_patterns",
|
||||||
|
"",
|
||||||
|
"samples = ['I am a cat!', 'You are a dog!', 'She is an owl!']",
|
||||||
|
"",
|
||||||
|
"patterns_found, _ = find_patterns(samples)",
|
||||||
|
"",
|
||||||
|
"print(f'Patterns found: {patterns_found}')"
|
||||||
|
],
|
||||||
|
"code_language": "python",
|
||||||
|
"thumb": "https://svgshare.com/i/R3P.svg",
|
||||||
|
"image": "https://svgshare.com/i/R3P.svg",
|
||||||
|
"author": "Miguel Revuelta Espinosa",
|
||||||
|
"author_links": {
|
||||||
|
"github": "revuel"
|
||||||
|
},
|
||||||
|
"category": ["scientific", "research", "standalone"],
|
||||||
|
"tags": ["Evolutionary Computation", "Grammatical Evolution"]
|
||||||
}
|
}
|
||||||
],
|
],
|
||||||
|
|
||||||
|
|
|
@ -207,42 +207,49 @@ const Landing = ({ data }) => {
|
||||||
|
|
||||||
<LandingBannerGrid>
|
<LandingBannerGrid>
|
||||||
<LandingBanner
|
<LandingBanner
|
||||||
to="https://course.spacy.io"
|
title="spaCy v3.0 nightly: Transformer-based pipelines, new training system, project templates & more"
|
||||||
button="Start the course"
|
label="Try the pre-release"
|
||||||
background="#f6f6f6"
|
to="https://nightly.spacy.io"
|
||||||
color="#252a33"
|
button="See what's new"
|
||||||
|
background="#8758fe"
|
||||||
|
color="#ffffff"
|
||||||
small
|
small
|
||||||
>
|
>
|
||||||
<Link to="https://course.spacy.io" hidden>
|
spaCy v3.0 features all new <strong>transformer-based pipelines</strong> that
|
||||||
|
bring spaCy's accuracy right up to the current <strong>state-of-the-art</strong>
|
||||||
|
. You can use any pretrained transformer to train your own pipelines, and even
|
||||||
|
share one transformer between multiple components with{' '}
|
||||||
|
<strong>multi-task learning</strong>. Training is now fully configurable and
|
||||||
|
extensible, and you can define your own custom models using{' '}
|
||||||
|
<strong>PyTorch</strong>, <strong>TensorFlow</strong> and other frameworks. The
|
||||||
|
new spaCy projects system lets you describe whole{' '}
|
||||||
|
<strong>end-to-end workflows</strong> in a single file, giving you an easy path
|
||||||
|
from prototype to production, and making it easy to clone and adapt
|
||||||
|
best-practice projects for your own use cases.
|
||||||
|
</LandingBanner>
|
||||||
|
|
||||||
|
<LandingBanner
|
||||||
|
title="Prodigy: Radically efficient machine teaching"
|
||||||
|
label="From the makers of spaCy"
|
||||||
|
to="https://prodi.gy"
|
||||||
|
button="Try it out"
|
||||||
|
background="#f6f6f6"
|
||||||
|
color="#000"
|
||||||
|
small
|
||||||
|
>
|
||||||
|
<Link to="https://prodi.gy" hidden>
|
||||||
<img
|
<img
|
||||||
src={courseImage}
|
src={prodigyImage}
|
||||||
alt="Advanced NLP with spaCy: A free online course"
|
alt="Prodigy: Radically efficient machine teaching"
|
||||||
/>
|
/>
|
||||||
</Link>
|
</Link>
|
||||||
<br />
|
<br />
|
||||||
<br />
|
<br />
|
||||||
In this <strong>free and interactive online course</strong> you’ll learn how to
|
Prodigy is an <strong>annotation tool</strong> so efficient that data scientists
|
||||||
use spaCy to build advanced natural language understanding systems, using both
|
can do the annotation themselves, enabling a new level of rapid iteration.
|
||||||
rule-based and machine learning approaches. It includes{' '}
|
Whether you're working on entity recognition, intent detection or image
|
||||||
<strong>55 exercises</strong> featuring videos, slide decks, multiple-choice
|
classification, Prodigy can help you <strong>train and evaluate</strong> your
|
||||||
questions and interactive coding practice in the browser.
|
models faster.
|
||||||
</LandingBanner>
|
|
||||||
<LandingBanner
|
|
||||||
title="spaCy IRL: Two days of NLP"
|
|
||||||
label="Watch the videos"
|
|
||||||
to="https://www.youtube.com/playlist?list=PLBmcuObd5An4UC6jvK_-eSl6jCvP1gwXc"
|
|
||||||
button="Watch the videos"
|
|
||||||
background="#ffc194"
|
|
||||||
backgroundImage={irlBackground}
|
|
||||||
color="#1a1e23"
|
|
||||||
small
|
|
||||||
>
|
|
||||||
We were pleased to invite the spaCy community and other folks working on NLP to
|
|
||||||
Berlin for a small and intimate event. We booked a beautiful venue, hand-picked
|
|
||||||
an awesome lineup of speakers and scheduled plenty of social time to get to know
|
|
||||||
each other. The YouTube playlist includes 12 talks about NLP research,
|
|
||||||
development and applications, with keynotes by Sebastian Ruder (DeepMind) and
|
|
||||||
Yoav Goldberg (Allen AI).
|
|
||||||
</LandingBanner>
|
</LandingBanner>
|
||||||
</LandingBannerGrid>
|
</LandingBannerGrid>
|
||||||
|
|
||||||
|
|
Loading…
Reference in New Issue
Block a user