Merge branch 'develop' into feature/init-config-cpu-gpu

2026-03-05 12:21:27 +03:00 · 2020-12-10 08:50:53 +11:00 · 2020-12-10 08:50:53 +11:00 · 9d32e839d3
commit 9d32e839d3
parent febf71af28 e09588e6ca
56 changed files with 7199 additions and 291 deletions
--- a/.github/contributors/KKsharma99.md
+++ b/.github/contributors/KKsharma99.md
@ -0,0 +1,108 @@
+<!-- This agreement was mistakenly submitted as an update to the CONTRIBUTOR_AGREEMENT.md template. Commit: 8a2d22222dec5cf910df5a378cbcd9ea2ab53ec4. It was therefore moved over manually. -->
+
+# spaCy contributor agreement
+
+This spaCy Contributor Agreement (**"SCA"**) is based on the
+[Oracle Contributor Agreement](http://www.oracle.com/technetwork/oca-405177.pdf).
+The SCA applies to any contribution that you make to any product or project
+managed by us (the **"project"**), and sets out the intellectual property rights
+you grant to us in the contributed materials. The term **"us"** shall mean
+[ExplosionAI UG (haftungsbeschränkt)](https://explosion.ai/legal). The term
+**"you"** shall mean the person or entity identified below.
+
+If you agree to be bound by these terms, fill in the information requested
+below and include the filled-in version with your first pull request, under the
+folder [`.github/contributors/`](/.github/contributors/). The name of the file
+should be your GitHub username, with the extension `.md`. For example, the user
+example_user would create the file `.github/contributors/example_user.md`.
+
+Read this agreement carefully before signing. These terms and conditions
+constitute a binding legal agreement.
+
+## Contributor Agreement
+
+1. The term "contribution" or "contributed materials" means any source code,
+object code, patch, tool, sample, graphic, specification, manual,
+documentation, or any other material posted or submitted by you to the project.
+
+2. With respect to any worldwide copyrights, or copyright applications and
+registrations, in your contribution:
+
+    * you hereby assign to us joint ownership, and to the extent that such
+    assignment is or becomes invalid, ineffective or unenforceable, you hereby
+    grant to us a perpetual, irrevocable, non-exclusive, worldwide, no-charge,
+    royalty-free, unrestricted license to exercise all rights under those
+    copyrights. This includes, at our option, the right to sublicense these same
+    rights to third parties through multiple levels of sublicensees or other
+    licensing arrangements;
+
+    * you agree that each of us can do all things in relation to your
+    contribution as if each of us were the sole owners, and if one of us makes
+    a derivative work of your contribution, the one who makes the derivative
+    work (or has it made) will be the sole owner of that derivative work;
+
+    * you agree that you will not assert any moral rights in your contribution
+    against us, our licensees or transferees;
+
+    * you agree that we may register a copyright in your contribution and
+    exercise all ownership rights associated with it; and
+
+    * you agree that neither of us has any duty to consult with, obtain the
+    consent of, pay or render an accounting to the other for any use or
+    distribution of your contribution.
+
+3. With respect to any patents you own, or that you can license without payment
+to any third party, you hereby grant to us a perpetual, irrevocable,
+non-exclusive, worldwide, no-charge, royalty-free license to:
+
+    * make, have made, use, sell, offer to sell, import, and otherwise transfer
+    your contribution in whole or in part, alone or in combination with or
+    included in any product, work or materials arising out of the project to
+    which your contribution was submitted, and
+
+    * at our option, to sublicense these same rights to third parties through
+    multiple levels of sublicensees or other licensing arrangements.
+
+4. Except as set out above, you keep all right, title, and interest in your
+contribution. The rights that you grant to us under these terms are effective
+on the date you first submitted a contribution to us, even if your submission
+took place before the date you sign these terms.
+
+5. You covenant, represent, warrant and agree that:
+
+    * Each contribution that you submit is and shall be an original work of
+    authorship and you can legally grant the rights set out in this SCA;
+
+    * to the best of your knowledge, each contribution will not violate any
+    third party's copyrights, trademarks, patents, or other intellectual
+    property rights; and
+
+    * each contribution shall be in compliance with U.S. export control laws and
+    other applicable export and import laws. You agree to notify us if you
+    become aware of any circumstance which would make any of the foregoing
+    representations inaccurate in any respect. We may publicly disclose your 
+    participation in the project, including the fact that you have signed the SCA.
+
+6. This SCA is governed by the laws of the State of California and applicable
+U.S. Federal law. Any choice of law rules will not apply.
+
+7. Please place an “x” on one of the applicable statements below. Please do NOT
+mark both statements:
+
+    * [x] I am signing on behalf of myself as an individual and no other person
+    or entity, including my employer, has or will have rights with respect to my
+    contributions.
+
+    * [ ] I am signing on behalf of my employer or a legal entity and I have the
+    actual authority to contractually bind that entity.
+
+## Contributor Details
+
+| Field                          | Entry                |
+|------------------------------- | -------------------- |
+| Name                           | Kunal Sharma         |
+| Company name (if applicable)   |                      |
+| Title or role (if applicable)  |                      |
+| Date                           | 10/19/2020           |
+| GitHub username                | KKsharma99           |
+| Website (optional)             |                      |
--- a/.github/contributors/borijang.md
+++ b/.github/contributors/borijang.md
@ -0,0 +1,106 @@
+# spaCy contributor agreement
+
+This spaCy Contributor Agreement (**"SCA"**) is based on the
+[Oracle Contributor Agreement](http://www.oracle.com/technetwork/oca-405177.pdf).
+The SCA applies to any contribution that you make to any product or project
+managed by us (the **"project"**), and sets out the intellectual property rights
+you grant to us in the contributed materials. The term **"us"** shall mean
+[ExplosionAI GmbH](https://explosion.ai/legal). The term
+**"you"** shall mean the person or entity identified below.
+
+If you agree to be bound by these terms, fill in the information requested
+below and include the filled-in version with your first pull request, under the
+folder [`.github/contributors/`](/.github/contributors/). The name of the file
+should be your GitHub username, with the extension `.md`. For example, the user
+example_user would create the file `.github/contributors/example_user.md`.
+
+Read this agreement carefully before signing. These terms and conditions
+constitute a binding legal agreement.
+
+## Contributor Agreement
+
+1. The term "contribution" or "contributed materials" means any source code,
+object code, patch, tool, sample, graphic, specification, manual,
+documentation, or any other material posted or submitted by you to the project.
+
+2. With respect to any worldwide copyrights, or copyright applications and
+registrations, in your contribution:
+
+    * you hereby assign to us joint ownership, and to the extent that such
+    assignment is or becomes invalid, ineffective or unenforceable, you hereby
+    grant to us a perpetual, irrevocable, non-exclusive, worldwide, no-charge,
+    royalty-free, unrestricted license to exercise all rights under those
+    copyrights. This includes, at our option, the right to sublicense these same
+    rights to third parties through multiple levels of sublicensees or other
+    licensing arrangements;
+
+    * you agree that each of us can do all things in relation to your
+    contribution as if each of us were the sole owners, and if one of us makes
+    a derivative work of your contribution, the one who makes the derivative
+    work (or has it made) will be the sole owner of that derivative work;
+
+    * you agree that you will not assert any moral rights in your contribution
+    against us, our licensees or transferees;
+
+    * you agree that we may register a copyright in your contribution and
+    exercise all ownership rights associated with it; and
+
+    * you agree that neither of us has any duty to consult with, obtain the
+    consent of, pay or render an accounting to the other for any use or
+    distribution of your contribution.
+
+3. With respect to any patents you own, or that you can license without payment
+to any third party, you hereby grant to us a perpetual, irrevocable,
+non-exclusive, worldwide, no-charge, royalty-free license to:
+
+    * make, have made, use, sell, offer to sell, import, and otherwise transfer
+    your contribution in whole or in part, alone or in combination with or
+    included in any product, work or materials arising out of the project to
+    which your contribution was submitted, and
+
+    * at our option, to sublicense these same rights to third parties through
+    multiple levels of sublicensees or other licensing arrangements.
+
+4. Except as set out above, you keep all right, title, and interest in your
+contribution. The rights that you grant to us under these terms are effective
+on the date you first submitted a contribution to us, even if your submission
+took place before the date you sign these terms.
+
+5. You covenant, represent, warrant and agree that:
+
+    * Each contribution that you submit is and shall be an original work of
+    authorship and you can legally grant the rights set out in this SCA;
+
+    * to the best of your knowledge, each contribution will not violate any
+    third party's copyrights, trademarks, patents, or other intellectual
+    property rights; and
+
+    * each contribution shall be in compliance with U.S. export control laws and
+    other applicable export and import laws. You agree to notify us if you
+    become aware of any circumstance which would make any of the foregoing
+    representations inaccurate in any respect. We may publicly disclose your
+    participation in the project, including the fact that you have signed the SCA.
+
+6. This SCA is governed by the laws of the State of California and applicable
+U.S. Federal law. Any choice of law rules will not apply.
+
+7. Please place an “x” on one of the applicable statements below. Please do NOT
+mark both statements:
+
+    * [ ] I am signing on behalf of myself as an individual and no other person
+    or entity, including my employer, has or will have rights with respect to my
+    contributions.
+
+    * [x] I am signing on behalf of my employer or a legal entity and I have the
+    actual authority to contractually bind that entity.
+
+## Contributor Details
+
+| Field                          | Entry                |
+|------------------------------- | -------------------- |
+| Name                           | Borijan Georgievski  |
+| Company name (if applicable)   | Netcetera            |
+| Title or role (if applicable)  | Data Scientist       |
+| Date                           | 2020.10.09           |
+| GitHub username                | borijang             |
+| Website (optional)             |                      |
--- a/.github/contributors/danielvasic.md
+++ b/.github/contributors/danielvasic.md
@ -0,0 +1,106 @@
+# spaCy contributor agreement
+
+This spaCy Contributor Agreement (**"SCA"**) is based on the
+[Oracle Contributor Agreement](http://www.oracle.com/technetwork/oca-405177.pdf).
+The SCA applies to any contribution that you make to any product or project
+managed by us (the **"project"**), and sets out the intellectual property rights
+you grant to us in the contributed materials. The term **"us"** shall mean
+[ExplosionAI GmbH](https://explosion.ai/legal). The term
+**"you"** shall mean the person or entity identified below.
+
+If you agree to be bound by these terms, fill in the information requested
+below and include the filled-in version with your first pull request, under the
+folder [`.github/contributors/`](/.github/contributors/). The name of the file
+should be your GitHub username, with the extension `.md`. For example, the user
+example_user would create the file `.github/contributors/example_user.md`.
+
+Read this agreement carefully before signing. These terms and conditions
+constitute a binding legal agreement.
+
+## Contributor Agreement
+
+1. The term "contribution" or "contributed materials" means any source code,
+object code, patch, tool, sample, graphic, specification, manual,
+documentation, or any other material posted or submitted by you to the project.
+
+2. With respect to any worldwide copyrights, or copyright applications and
+registrations, in your contribution:
+
+    * you hereby assign to us joint ownership, and to the extent that such
+    assignment is or becomes invalid, ineffective or unenforceable, you hereby
+    grant to us a perpetual, irrevocable, non-exclusive, worldwide, no-charge,
+    royalty-free, unrestricted license to exercise all rights under those
+    copyrights. This includes, at our option, the right to sublicense these same
+    rights to third parties through multiple levels of sublicensees or other
+    licensing arrangements;
+
+    * you agree that each of us can do all things in relation to your
+    contribution as if each of us were the sole owners, and if one of us makes
+    a derivative work of your contribution, the one who makes the derivative
+    work (or has it made) will be the sole owner of that derivative work;
+
+    * you agree that you will not assert any moral rights in your contribution
+    against us, our licensees or transferees;
+
+    * you agree that we may register a copyright in your contribution and
+    exercise all ownership rights associated with it; and
+
+    * you agree that neither of us has any duty to consult with, obtain the
+    consent of, pay or render an accounting to the other for any use or
+    distribution of your contribution.
+
+3. With respect to any patents you own, or that you can license without payment
+to any third party, you hereby grant to us a perpetual, irrevocable,
+non-exclusive, worldwide, no-charge, royalty-free license to:
+
+    * make, have made, use, sell, offer to sell, import, and otherwise transfer
+    your contribution in whole or in part, alone or in combination with or
+    included in any product, work or materials arising out of the project to
+    which your contribution was submitted, and
+
+    * at our option, to sublicense these same rights to third parties through
+    multiple levels of sublicensees or other licensing arrangements.
+
+4. Except as set out above, you keep all right, title, and interest in your
+contribution. The rights that you grant to us under these terms are effective
+on the date you first submitted a contribution to us, even if your submission
+took place before the date you sign these terms.
+
+5. You covenant, represent, warrant and agree that:
+
+    * Each contribution that you submit is and shall be an original work of
+    authorship and you can legally grant the rights set out in this SCA;
+
+    * to the best of your knowledge, each contribution will not violate any
+    third party's copyrights, trademarks, patents, or other intellectual
+    property rights; and
+
+    * each contribution shall be in compliance with U.S. export control laws and
+    other applicable export and import laws. You agree to notify us if you
+    become aware of any circumstance which would make any of the foregoing
+    representations inaccurate in any respect. We may publicly disclose your
+    participation in the project, including the fact that you have signed the SCA.
+
+6. This SCA is governed by the laws of the State of California and applicable
+U.S. Federal law. Any choice of law rules will not apply.
+
+7. Please place an “x” on one of the applicable statements below. Please do NOT
+mark both statements:
+
+    * [x] I am signing on behalf of myself as an individual and no other person
+    or entity, including my employer, has or will have rights with respect to my
+    contributions.
+
+    * [ ] I am signing on behalf of my employer or a legal entity and I have the
+    actual authority to contractually bind that entity.
+
+## Contributor Details
+
+| Field                          | Entry                |
+|------------------------------- | -------------------- |
+| Name                           | Daniel Vasić         |
+| Company name (if applicable)   | University of Mostar |
+| Title or role (if applicable)  | Teaching assistant   |
+| Date                           | 13/10/2020           |
+| GitHub username                | danielvasic          |
+| Website (optional)             |                      |
--- a/.github/contributors/forest1988.md
+++ b/.github/contributors/forest1988.md
@ -0,0 +1,106 @@
+# spaCy contributor agreement
+
+This spaCy Contributor Agreement (**"SCA"**) is based on the
+[Oracle Contributor Agreement](http://www.oracle.com/technetwork/oca-405177.pdf).
+The SCA applies to any contribution that you make to any product or project
+managed by us (the **"project"**), and sets out the intellectual property rights
+you grant to us in the contributed materials. The term **"us"** shall mean
+[ExplosionAI GmbH](https://explosion.ai/legal). The term
+**"you"** shall mean the person or entity identified below.
+
+If you agree to be bound by these terms, fill in the information requested
+below and include the filled-in version with your first pull request, under the
+folder [`.github/contributors/`](/.github/contributors/). The name of the file
+should be your GitHub username, with the extension `.md`. For example, the user
+example_user would create the file `.github/contributors/example_user.md`.
+
+Read this agreement carefully before signing. These terms and conditions
+constitute a binding legal agreement.
+
+## Contributor Agreement
+
+1. The term "contribution" or "contributed materials" means any source code,
+object code, patch, tool, sample, graphic, specification, manual,
+documentation, or any other material posted or submitted by you to the project.
+
+2. With respect to any worldwide copyrights, or copyright applications and
+registrations, in your contribution:
+
+    * you hereby assign to us joint ownership, and to the extent that such
+    assignment is or becomes invalid, ineffective or unenforceable, you hereby
+    grant to us a perpetual, irrevocable, non-exclusive, worldwide, no-charge,
+    royalty-free, unrestricted license to exercise all rights under those
+    copyrights. This includes, at our option, the right to sublicense these same
+    rights to third parties through multiple levels of sublicensees or other
+    licensing arrangements;
+
+    * you agree that each of us can do all things in relation to your
+    contribution as if each of us were the sole owners, and if one of us makes
+    a derivative work of your contribution, the one who makes the derivative
+    work (or has it made) will be the sole owner of that derivative work;
+
+    * you agree that you will not assert any moral rights in your contribution
+    against us, our licensees or transferees;
+
+    * you agree that we may register a copyright in your contribution and
+    exercise all ownership rights associated with it; and
+
+    * you agree that neither of us has any duty to consult with, obtain the
+    consent of, pay or render an accounting to the other for any use or
+    distribution of your contribution.
+
+3. With respect to any patents you own, or that you can license without payment
+to any third party, you hereby grant to us a perpetual, irrevocable,
+non-exclusive, worldwide, no-charge, royalty-free license to:
+
+    * make, have made, use, sell, offer to sell, import, and otherwise transfer
+    your contribution in whole or in part, alone or in combination with or
+    included in any product, work or materials arising out of the project to
+    which your contribution was submitted, and
+
+    * at our option, to sublicense these same rights to third parties through
+    multiple levels of sublicensees or other licensing arrangements.
+
+4. Except as set out above, you keep all right, title, and interest in your
+contribution. The rights that you grant to us under these terms are effective
+on the date you first submitted a contribution to us, even if your submission
+took place before the date you sign these terms.
+
+5. You covenant, represent, warrant and agree that:
+
+    * Each contribution that you submit is and shall be an original work of
+    authorship and you can legally grant the rights set out in this SCA;
+
+    * to the best of your knowledge, each contribution will not violate any
+    third party's copyrights, trademarks, patents, or other intellectual
+    property rights; and
+
+    * each contribution shall be in compliance with U.S. export control laws and
+    other applicable export and import laws. You agree to notify us if you
+    become aware of any circumstance which would make any of the foregoing
+    representations inaccurate in any respect. We may publicly disclose your
+    participation in the project, including the fact that you have signed the SCA.
+
+6. This SCA is governed by the laws of the State of California and applicable
+U.S. Federal law. Any choice of law rules will not apply.
+
+7. Please place an “x” on one of the applicable statements below. Please do NOT
+mark both statements:
+
+    * [x] I am signing on behalf of myself as an individual and no other person
+    or entity, including my employer, has or will have rights with respect to my
+    contributions.
+
+    * [ ] I am signing on behalf of my employer or a legal entity and I have the
+    actual authority to contractually bind that entity.
+
+## Contributor Details
+
+| Field                          | Entry                |
+|------------------------------- | -------------------- |
+| Name                           | Yusuke Mori          |
+| Company name (if applicable)   |                      |
+| Title or role (if applicable)  | Ph.D. student        |
+| Date                           | 2020/11/22           |
+| GitHub username                | forest1988           |
+| Website (optional)             | https://forest1988.github.io  |
--- a/.github/contributors/jabortell.md
+++ b/.github/contributors/jabortell.md
@ -0,0 +1,106 @@
+# spaCy contributor agreement
+
+This spaCy Contributor Agreement (**"SCA"**) is based on the
+[Oracle Contributor Agreement](http://www.oracle.com/technetwork/oca-405177.pdf).
+The SCA applies to any contribution that you make to any product or project
+managed by us (the **"project"**), and sets out the intellectual property rights
+you grant to us in the contributed materials. The term **"us"** shall mean
+[ExplosionAI GmbH](https://explosion.ai/legal). The term
+**"you"** shall mean the person or entity identified below.
+
+If you agree to be bound by these terms, fill in the information requested
+below and include the filled-in version with your first pull request, under the
+folder [`.github/contributors/`](/.github/contributors/). The name of the file
+should be your GitHub username, with the extension `.md`. For example, the user
+example_user would create the file `.github/contributors/example_user.md`.
+
+Read this agreement carefully before signing. These terms and conditions
+constitute a binding legal agreement.
+
+## Contributor Agreement
+
+1. The term "contribution" or "contributed materials" means any source code,
+object code, patch, tool, sample, graphic, specification, manual,
+documentation, or any other material posted or submitted by you to the project.
+
+2. With respect to any worldwide copyrights, or copyright applications and
+registrations, in your contribution:
+
+    * you hereby assign to us joint ownership, and to the extent that such
+    assignment is or becomes invalid, ineffective or unenforceable, you hereby
+    grant to us a perpetual, irrevocable, non-exclusive, worldwide, no-charge,
+    royalty-free, unrestricted license to exercise all rights under those
+    copyrights. This includes, at our option, the right to sublicense these same
+    rights to third parties through multiple levels of sublicensees or other
+    licensing arrangements;
+
+    * you agree that each of us can do all things in relation to your
+    contribution as if each of us were the sole owners, and if one of us makes
+    a derivative work of your contribution, the one who makes the derivative
+    work (or has it made) will be the sole owner of that derivative work;
+
+    * you agree that you will not assert any moral rights in your contribution
+    against us, our licensees or transferees;
+
+    * you agree that we may register a copyright in your contribution and
+    exercise all ownership rights associated with it; and
+
+    * you agree that neither of us has any duty to consult with, obtain the
+    consent of, pay or render an accounting to the other for any use or
+    distribution of your contribution.
+
+3. With respect to any patents you own, or that you can license without payment
+to any third party, you hereby grant to us a perpetual, irrevocable,
+non-exclusive, worldwide, no-charge, royalty-free license to:
+
+    * make, have made, use, sell, offer to sell, import, and otherwise transfer
+    your contribution in whole or in part, alone or in combination with or
+    included in any product, work or materials arising out of the project to
+    which your contribution was submitted, and
+
+    * at our option, to sublicense these same rights to third parties through
+    multiple levels of sublicensees or other licensing arrangements.
+
+4. Except as set out above, you keep all right, title, and interest in your
+contribution. The rights that you grant to us under these terms are effective
+on the date you first submitted a contribution to us, even if your submission
+took place before the date you sign these terms.
+
+5. You covenant, represent, warrant and agree that:
+
+    * Each contribution that you submit is and shall be an original work of
+    authorship and you can legally grant the rights set out in this SCA;
+
+    * to the best of your knowledge, each contribution will not violate any
+    third party's copyrights, trademarks, patents, or other intellectual
+    property rights; and
+
+    * each contribution shall be in compliance with U.S. export control laws and
+    other applicable export and import laws. You agree to notify us if you
+    become aware of any circumstance which would make any of the foregoing
+    representations inaccurate in any respect. We may publicly disclose your
+    participation in the project, including the fact that you have signed the SCA.
+
+6. This SCA is governed by the laws of the State of California and applicable
+U.S. Federal law. Any choice of law rules will not apply.
+
+7. Please place an “x” on one of the applicable statements below. Please do NOT
+mark both statements:
+
+    * [x] I am signing on behalf of myself as an individual and no other person
+    or entity, including my employer, has or will have rights with respect to my
+    contributions.
+
+    * [ ] I am signing on behalf of my employer or a legal entity and I have the
+    actual authority to contractually bind that entity.
+
+## Contributor Details
+
+| Field                          | Entry                |
+|------------------------------- | -------------------- |
+| Name                           | Jacob Bortell        |
+| Company name (if applicable)   |                      |
+| Title or role (if applicable)  |                      |
+| Date                           | 2020-11-20           |
+| GitHub username                | jabortell            |
+| Website (optional)             |                      |
--- a/.github/contributors/revuel.md
+++ b/.github/contributors/revuel.md
@ -0,0 +1,106 @@
+# spaCy contributor agreement
+
+This spaCy Contributor Agreement (**"SCA"**) is based on the
+[Oracle Contributor Agreement](http://www.oracle.com/technetwork/oca-405177.pdf).
+The SCA applies to any contribution that you make to any product or project
+managed by us (the **"project"**), and sets out the intellectual property rights
+you grant to us in the contributed materials. The term **"us"** shall mean
+[ExplosionAI UG (haftungsbeschränkt)](https://explosion.ai/legal). The term
+**"you"** shall mean the person or entity identified below.
+
+If you agree to be bound by these terms, fill in the information requested
+below and include the filled-in version with your first pull request, under the
+folder [`.github/contributors/`](/.github/contributors/). The name of the file
+should be your GitHub username, with the extension `.md`. For example, the user
+example_user would create the file `.github/contributors/example_user.md`.
+
+Read this agreement carefully before signing. These terms and conditions
+constitute a binding legal agreement.
+
+## Contributor Agreement
+
+1. The term "contribution" or "contributed materials" means any source code,
+object code, patch, tool, sample, graphic, specification, manual,
+documentation, or any other material posted or submitted by you to the project.
+
+2. With respect to any worldwide copyrights, or copyright applications and
+registrations, in your contribution:
+
+    * you hereby assign to us joint ownership, and to the extent that such
+    assignment is or becomes invalid, ineffective or unenforceable, you hereby
+    grant to us a perpetual, irrevocable, non-exclusive, worldwide, no-charge,
+    royalty-free, unrestricted license to exercise all rights under those
+    copyrights. This includes, at our option, the right to sublicense these same
+    rights to third parties through multiple levels of sublicensees or other
+    licensing arrangements;
+
+    * you agree that each of us can do all things in relation to your
+    contribution as if each of us were the sole owners, and if one of us makes
+    a derivative work of your contribution, the one who makes the derivative
+    work (or has it made) will be the sole owner of that derivative work;
+
+    * you agree that you will not assert any moral rights in your contribution
+    against us, our licensees or transferees;
+
+    * you agree that we may register a copyright in your contribution and
+    exercise all ownership rights associated with it; and
+
+    * you agree that neither of us has any duty to consult with, obtain the
+    consent of, pay or render an accounting to the other for any use or
+    distribution of your contribution.
+
+3. With respect to any patents you own, or that you can license without payment
+to any third party, you hereby grant to us a perpetual, irrevocable,
+non-exclusive, worldwide, no-charge, royalty-free license to:
+
+    * make, have made, use, sell, offer to sell, import, and otherwise transfer
+    your contribution in whole or in part, alone or in combination with or
+    included in any product, work or materials arising out of the project to
+    which your contribution was submitted, and
+
+    * at our option, to sublicense these same rights to third parties through
+    multiple levels of sublicensees or other licensing arrangements.
+
+4. Except as set out above, you keep all right, title, and interest in your
+contribution. The rights that you grant to us under these terms are effective
+on the date you first submitted a contribution to us, even if your submission
+took place before the date you sign these terms.
+
+5. You covenant, represent, warrant and agree that:
+
+    * Each contribution that you submit is and shall be an original work of
+    authorship and you can legally grant the rights set out in this SCA;
+
+    * to the best of your knowledge, each contribution will not violate any
+    third party's copyrights, trademarks, patents, or other intellectual
+    property rights; and
+
+    * each contribution shall be in compliance with U.S. export control laws and
+    other applicable export and import laws. You agree to notify us if you
+    become aware of any circumstance which would make any of the foregoing
+    representations inaccurate in any respect. We may publicly disclose your 
+    participation in the project, including the fact that you have signed the SCA.
+
+6. This SCA is governed by the laws of the State of California and applicable
+U.S. Federal law. Any choice of law rules will not apply.
+
+7. Please place an “x” on one of the applicable statements below. Please do NOT
+mark both statements:
+
+    * [x] I am signing on behalf of myself as an individual and no other person
+    or entity, including my employer, has or will have rights with respect to my
+    contributions.
+
+    * [ ] I am signing on behalf of my employer or a legal entity and I have the
+    actual authority to contractually bind that entity.
+
+## Contributor Details
+
+| Field                          | Entry                |
+|------------------------------- | -------------------- |
+| Name                           | Miguel Revuelta      |
+| Company name (if applicable)   |                      |
+| Title or role (if applicable)  |                      |
+| Date                           | 2020-11-17           |
+| GitHub username                | revuel               |
+| Website (optional)             |                      |
--- a/.github/contributors/robertsipek.md
+++ b/.github/contributors/robertsipek.md
@@ -0,0 +1,106 @@
+# spaCy contributor agreement
+
+This spaCy Contributor Agreement (**"SCA"**) is based on the
+[Oracle Contributor Agreement](http://www.oracle.com/technetwork/oca-405177.pdf).
+The SCA applies to any contribution that you make to any product or project
+managed by us (the **"project"**), and sets out the intellectual property rights
+you grant to us in the contributed materials. The term **"us"** shall mean
+[ExplosionAI GmbH](https://explosion.ai/legal). The term
+**"you"** shall mean the person or entity identified below.
+
+If you agree to be bound by these terms, fill in the information requested
+below and include the filled-in version with your first pull request, under the
+folder [`.github/contributors/`](/.github/contributors/). The name of the file
+should be your GitHub username, with the extension `.md`. For example, the user
+example_user would create the file `.github/contributors/example_user.md`.
+
+Read this agreement carefully before signing. These terms and conditions
+constitute a binding legal agreement.
+
+## Contributor Agreement
+
+1. The term "contribution" or "contributed materials" means any source code,
+object code, patch, tool, sample, graphic, specification, manual,
+documentation, or any other material posted or submitted by you to the project.
+
+2. With respect to any worldwide copyrights, or copyright applications and
+registrations, in your contribution:
+
+    * you hereby assign to us joint ownership, and to the extent that such
+    assignment is or becomes invalid, ineffective or unenforceable, you hereby
+    grant to us a perpetual, irrevocable, non-exclusive, worldwide, no-charge,
+    royalty-free, unrestricted license to exercise all rights under those
+    copyrights. This includes, at our option, the right to sublicense these same
+    rights to third parties through multiple levels of sublicensees or other
+    licensing arrangements;
+
+    * you agree that each of us can do all things in relation to your
+    contribution as if each of us were the sole owners, and if one of us makes
+    a derivative work of your contribution, the one who makes the derivative
+    work (or has it made) will be the sole owner of that derivative work;
+
+    * you agree that you will not assert any moral rights in your contribution
+    against us, our licensees or transferees;
+
+    * you agree that we may register a copyright in your contribution and
+    exercise all ownership rights associated with it; and
+
+    * you agree that neither of us has any duty to consult with, obtain the
+    consent of, pay or render an accounting to the other for any use or
+    distribution of your contribution.
+
+3. With respect to any patents you own, or that you can license without payment
+to any third party, you hereby grant to us a perpetual, irrevocable,
+non-exclusive, worldwide, no-charge, royalty-free license to:
+
+    * make, have made, use, sell, offer to sell, import, and otherwise transfer
+    your contribution in whole or in part, alone or in combination with or
+    included in any product, work or materials arising out of the project to
+    which your contribution was submitted, and
+
+    * at our option, to sublicense these same rights to third parties through
+    multiple levels of sublicensees or other licensing arrangements.
+
+4. Except as set out above, you keep all right, title, and interest in your
+contribution. The rights that you grant to us under these terms are effective
+on the date you first submitted a contribution to us, even if your submission
+took place before the date you sign these terms.
+
+5. You covenant, represent, warrant and agree that:
+
+    * Each contribution that you submit is and shall be an original work of
+    authorship and you can legally grant the rights set out in this SCA;
+
+    * to the best of your knowledge, each contribution will not violate any
+    third party's copyrights, trademarks, patents, or other intellectual
+    property rights; and
+
+    * each contribution shall be in compliance with U.S. export control laws and
+    other applicable export and import laws. You agree to notify us if you
+    become aware of any circumstance which would make any of the foregoing
+    representations inaccurate in any respect. We may publicly disclose your
+    participation in the project, including the fact that you have signed the SCA.
+
+6. This SCA is governed by the laws of the State of California and applicable
+U.S. Federal law. Any choice of law rules will not apply.
+
+7. Please place an “x” on one of the applicable statements below. Please do NOT
+mark both statements:
+
+    * [x] I am signing on behalf of myself as an individual and no other person
+    or entity, including my employer, has or will have rights with respect to my
+    contributions.
+
+    * [ ] I am signing on behalf of my employer or a legal entity and I have the
+    actual authority to contractually bind that entity.
+
+## Contributor Details
+
+| Field                          | Entry                |
+|------------------------------- | -------------------- |
+| Name                           | Robert Šípek         |
+| Company name (if applicable)   |                      |
+| Title or role (if applicable)  |                      |
+| Date                           | 22.10.2020           |
+| GitHub username                | @robertsipek         |
+| Website (optional)             |                      |
--- a/.github/contributors/vha14.md
+++ b/.github/contributors/vha14.md
@@ -0,0 +1,106 @@
+# spaCy contributor agreement
+
+This spaCy Contributor Agreement (**"SCA"**) is based on the
+[Oracle Contributor Agreement](http://www.oracle.com/technetwork/oca-405177.pdf).
+The SCA applies to any contribution that you make to any product or project
+managed by us (the **"project"**), and sets out the intellectual property rights
+you grant to us in the contributed materials. The term **"us"** shall mean
+[ExplosionAI UG (haftungsbeschränkt)](https://explosion.ai/legal). The term
+**"you"** shall mean the person or entity identified below.
+
+If you agree to be bound by these terms, fill in the information requested
+below and include the filled-in version with your first pull request, under the
+folder [`.github/contributors/`](/.github/contributors/). The name of the file
+should be your GitHub username, with the extension `.md`. For example, the user
+example_user would create the file `.github/contributors/example_user.md`.
+
+Read this agreement carefully before signing. These terms and conditions
+constitute a binding legal agreement.
+
+## Contributor Agreement
+
+1. The term "contribution" or "contributed materials" means any source code,
+object code, patch, tool, sample, graphic, specification, manual,
+documentation, or any other material posted or submitted by you to the project.
+
+2. With respect to any worldwide copyrights, or copyright applications and
+registrations, in your contribution:
+
+    * you hereby assign to us joint ownership, and to the extent that such
+    assignment is or becomes invalid, ineffective or unenforceable, you hereby
+    grant to us a perpetual, irrevocable, non-exclusive, worldwide, no-charge,
+    royalty-free, unrestricted license to exercise all rights under those
+    copyrights. This includes, at our option, the right to sublicense these same
+    rights to third parties through multiple levels of sublicensees or other
+    licensing arrangements;
+
+    * you agree that each of us can do all things in relation to your
+    contribution as if each of us were the sole owners, and if one of us makes
+    a derivative work of your contribution, the one who makes the derivative
+    work (or has it made) will be the sole owner of that derivative work;
+
+    * you agree that you will not assert any moral rights in your contribution
+    against us, our licensees or transferees;
+
+    * you agree that we may register a copyright in your contribution and
+    exercise all ownership rights associated with it; and
+
+    * you agree that neither of us has any duty to consult with, obtain the
+    consent of, pay or render an accounting to the other for any use or
+    distribution of your contribution.
+
+3. With respect to any patents you own, or that you can license without payment
+to any third party, you hereby grant to us a perpetual, irrevocable,
+non-exclusive, worldwide, no-charge, royalty-free license to:
+
+    * make, have made, use, sell, offer to sell, import, and otherwise transfer
+    your contribution in whole or in part, alone or in combination with or
+    included in any product, work or materials arising out of the project to
+    which your contribution was submitted, and
+
+    * at our option, to sublicense these same rights to third parties through
+    multiple levels of sublicensees or other licensing arrangements.
+
+4. Except as set out above, you keep all right, title, and interest in your
+contribution. The rights that you grant to us under these terms are effective
+on the date you first submitted a contribution to us, even if your submission
+took place before the date you sign these terms.
+
+5. You covenant, represent, warrant and agree that:
+
+    * Each contribution that you submit is and shall be an original work of
+    authorship and you can legally grant the rights set out in this SCA;
+
+    * to the best of your knowledge, each contribution will not violate any
+    third party's copyrights, trademarks, patents, or other intellectual
+    property rights; and
+
+    * each contribution shall be in compliance with U.S. export control laws and
+    other applicable export and import laws. You agree to notify us if you
+    become aware of any circumstance which would make any of the foregoing
+    representations inaccurate in any respect. We may publicly disclose your 
+    participation in the project, including the fact that you have signed the SCA.
+
+6. This SCA is governed by the laws of the State of California and applicable
+U.S. Federal law. Any choice of law rules will not apply.
+
+7. Please place an “x” on one of the applicable statements below. Please do NOT
+mark both statements:
+
+    * [x] I am signing on behalf of myself as an individual and no other person
+    or entity, including my employer, has or will have rights with respect to my
+    contributions.
+
+    * [ ] I am signing on behalf of my employer or a legal entity and I have the
+    actual authority to contractually bind that entity.
+
+## Contributor Details
+
+| Field                          | Entry                |
+|------------------------------- | -------------------- |
+| Name                           | Vu Ha                |
+| Company name (if applicable)   |                      |
+| Title or role (if applicable)  |                      |
+| Date                           | 10-23-2020           |
+| GitHub username                | vha14                |
+| Website (optional)             |                      |
--- a/.github/contributors/walterhenry.md
+++ b/.github/contributors/walterhenry.md
@@ -0,0 +1,106 @@
+# spaCy contributor agreement
+
+This spaCy Contributor Agreement (**"SCA"**) is based on the
+[Oracle Contributor Agreement](http://www.oracle.com/technetwork/oca-405177.pdf).
+The SCA applies to any contribution that you make to any product or project
+managed by us (the **"project"**), and sets out the intellectual property rights
+you grant to us in the contributed materials. The term **"us"** shall mean
+[ExplosionAI GmbH](https://explosion.ai/legal). The term
+**"you"** shall mean the person or entity identified below.
+
+If you agree to be bound by these terms, fill in the information requested
+below and include the filled-in version with your first pull request, under the
+folder [`.github/contributors/`](/.github/contributors/). The name of the file
+should be your GitHub username, with the extension `.md`. For example, the user
+example_user would create the file `.github/contributors/example_user.md`.
+
+Read this agreement carefully before signing. These terms and conditions
+constitute a binding legal agreement.
+
+## Contributor Agreement
+
+1. The term "contribution" or "contributed materials" means any source code,
+object code, patch, tool, sample, graphic, specification, manual,
+documentation, or any other material posted or submitted by you to the project.
+
+2. With respect to any worldwide copyrights, or copyright applications and
+registrations, in your contribution:
+
+    * you hereby assign to us joint ownership, and to the extent that such
+    assignment is or becomes invalid, ineffective or unenforceable, you hereby
+    grant to us a perpetual, irrevocable, non-exclusive, worldwide, no-charge,
+    royalty-free, unrestricted license to exercise all rights under those
+    copyrights. This includes, at our option, the right to sublicense these same
+    rights to third parties through multiple levels of sublicensees or other
+    licensing arrangements;
+
+    * you agree that each of us can do all things in relation to your
+    contribution as if each of us were the sole owners, and if one of us makes
+    a derivative work of your contribution, the one who makes the derivative
+    work (or has it made) will be the sole owner of that derivative work;
+
+    * you agree that you will not assert any moral rights in your contribution
+    against us, our licensees or transferees;
+
+    * you agree that we may register a copyright in your contribution and
+    exercise all ownership rights associated with it; and
+
+    * you agree that neither of us has any duty to consult with, obtain the
+    consent of, pay or render an accounting to the other for any use or
+    distribution of your contribution.
+
+3. With respect to any patents you own, or that you can license without payment
+to any third party, you hereby grant to us a perpetual, irrevocable,
+non-exclusive, worldwide, no-charge, royalty-free license to:
+
+    * make, have made, use, sell, offer to sell, import, and otherwise transfer
+    your contribution in whole or in part, alone or in combination with or
+    included in any product, work or materials arising out of the project to
+    which your contribution was submitted, and
+
+    * at our option, to sublicense these same rights to third parties through
+    multiple levels of sublicensees or other licensing arrangements.
+
+4. Except as set out above, you keep all right, title, and interest in your
+contribution. The rights that you grant to us under these terms are effective
+on the date you first submitted a contribution to us, even if your submission
+took place before the date you sign these terms.
+
+5. You covenant, represent, warrant and agree that:
+
+    * Each contribution that you submit is and shall be an original work of
+    authorship and you can legally grant the rights set out in this SCA;
+
+    * to the best of your knowledge, each contribution will not violate any
+    third party's copyrights, trademarks, patents, or other intellectual
+    property rights; and
+
+    * each contribution shall be in compliance with U.S. export control laws and
+    other applicable export and import laws. You agree to notify us if you
+    become aware of any circumstance which would make any of the foregoing
+    representations inaccurate in any respect. We may publicly disclose your
+    participation in the project, including the fact that you have signed the SCA.
+
+6. This SCA is governed by the laws of the State of California and applicable
+U.S. Federal law. Any choice of law rules will not apply.
+
+7. Please place an “x” on one of the applicable statements below. Please do NOT
+mark both statements:
+
+    * [x] I am signing on behalf of myself as an individual and no other person
+    or entity, including my employer, has or will have rights with respect to my
+    contributions.
+
+    * [ ] I am signing on behalf of my employer or a legal entity and I have the
+    actual authority to contractually bind that entity.
+
+## Contributor Details
+
+| Field                          | Entry                |
+|------------------------------- | -------------------- |
+| Name                           | Walter Henry         |
+| Company name (if applicable)   | ExplosionAI GmbH     |
+| Title or role (if applicable)  | Executive Assistant  |
+| Date                           | September 14, 2020   |
+| GitHub username                | walterhenry          |
+| Website (optional)             |                      |
--- a/azure-pipelines.yml
+++ b/azure-pipelines.yml
@@ -2,96 +2,113 @@ trigger:
  batch: true
  branches:
    include:
-    - '*'
+      - "*"
    exclude:
-    - 'spacy.io'
+      - "spacy.io"
  paths:
    exclude:
-    - 'website/*'
-    - '*.md'
+      - "website/*"
+      - "*.md"
 pr:
  paths:
    exclude:
-    - 'website/*'
-    - '*.md'
+      - "website/*"
+      - "*.md"

 jobs:
+  # Perform basic checks for the most important errors (syntax, etc.). Uses the config
+  # defined in .flake8 and overwrites the selected codes.
+  - job: "Validate"
+    pool:
+      vmImage: "ubuntu-16.04"
+    steps:
+      - task: UsePythonVersion@0
+        inputs:
+          versionSpec: "3.7"
+      - script: |
+          pip install flake8==3.5.0
+          python -m flake8 spacy --count --select=E901,E999,F821,F822,F823 --show-source --statistics
+        displayName: "flake8"

-# Perform basic checks for most important errors (syntax etc.) Uses the config
-# defined in .flake8 and overwrites the selected codes.
- job: 'Validate'
-  pool:
-    vmImage: 'ubuntu-16.04'
-  steps:
-  - task: UsePythonVersion@0
-    inputs:
-      versionSpec: '3.7'
-  - script: |
-      pip install flake8==3.5.0
-      python -m flake8 spacy --count --select=E901,E999,F821,F822,F823 --show-source --statistics
-    displayName: 'flake8'
+  - job: "Test"
+    dependsOn: "Validate"
+    strategy:
+      matrix:
+        Python36Linux:
+          imageName: "ubuntu-16.04"
+          python.version: "3.6"
+        Python36Windows:
+          imageName: "vs2017-win2016"
+          python.version: "3.6"
+        Python36Mac:
+          imageName: "macos-10.14"
+          python.version: "3.6"
+        # Don't test on 3.7 for now to speed up builds
+        # Python37Linux:
+        #   imageName: 'ubuntu-16.04'
+        #   python.version: '3.7'
+        # Python37Windows:
+        #   imageName: 'vs2017-win2016'
+        #   python.version: '3.7'
+        # Python37Mac:
+        #   imageName: 'macos-10.14'
+        #   python.version: '3.7'
+        Python38Linux:
+          imageName: "ubuntu-16.04"
+          python.version: "3.8"
+        Python38Windows:
+          imageName: "vs2017-win2016"
+          python.version: "3.8"
+        Python38Mac:
+          imageName: "macos-10.14"
+          python.version: "3.8"
+        # Python39Linux:
+        #   imageName: "ubuntu-16.04"
+        #   python.version: "3.9"
+        # Python39Windows:
+        #   imageName: "vs2017-win2016"
+        #   python.version: "3.9"
+        # Python39Mac:
+        #   imageName: "macos-10.14"
+        #   python.version: "3.9"
+      maxParallel: 4
+    pool:
+      vmImage: $(imageName)

- job: 'Test'
-  dependsOn: 'Validate'
-  strategy:
-    matrix:
-      Python36Linux:
-        imageName: 'ubuntu-16.04'
-        python.version: '3.6'
-      Python36Windows:
-        imageName: 'vs2017-win2016'
-        python.version: '3.6'
-      Python36Mac:
-        imageName: 'macos-10.14'
-        python.version: '3.6'
-      # Don't test on 3.7 for now to speed up builds
-      # Python37Linux:
-      #   imageName: 'ubuntu-16.04'
-      #   python.version: '3.7'
-      # Python37Windows:
-      #   imageName: 'vs2017-win2016'
-      #   python.version: '3.7'
-      # Python37Mac:
-      #   imageName: 'macos-10.14'
-      #   python.version: '3.7'
-      Python38Linux:
-        imageName: 'ubuntu-16.04'
-        python.version: '3.8'
-      Python38Windows:
-        imageName: 'vs2017-win2016'
-        python.version: '3.8'
-      Python38Mac:
-        imageName: 'macos-10.14'
-        python.version: '3.8'
-    maxParallel: 4
-  pool:
-    vmImage: $(imageName)
+    steps:
+      - task: UsePythonVersion@0
+        inputs:
+          versionSpec: "$(python.version)"
+          architecture: "x64"

-  steps:
-  - task: UsePythonVersion@0
-    inputs:
-      versionSpec: '$(python.version)'
-      architecture: 'x64'
+      - script: |
+          python -m pip install -U pip setuptools
+          pip install -r requirements.txt
+        displayName: "Install dependencies"
+        condition: not(eq(variables['python.version'], '3.5'))

-  - script: |
-      python -m pip install -U setuptools
-      pip install -r requirements.txt
-    displayName: 'Install dependencies'
+      - script: |
+          python setup.py build_ext --inplace -j 2
+          python setup.py sdist --formats=gztar
+        displayName: "Compile and build sdist"

-  - script: |
-      python setup.py build_ext --inplace
-      python setup.py sdist --formats=gztar
-    displayName: 'Compile and build sdist'
+      - task: DeleteFiles@1
+        inputs:
+          contents: "spacy"
+        displayName: "Delete source directory"

-  - task: DeleteFiles@1
-    inputs:
-      contents: 'spacy'
-    displayName: 'Delete source directory'
+      - script: |
+          pip freeze > installed.txt
+          pip uninstall -y -r installed.txt
+        displayName: "Uninstall all packages"

-  - bash: |
-      SDIST=$(python -c "import os;print(os.listdir('./dist')[-1])" 2>&1)
-      pip install dist/$SDIST
-    displayName: 'Install from sdist'
+      - bash: |
+          SDIST=$(python -c "import os;print(os.listdir('./dist')[-1])" 2>&1)
+          pip install dist/$SDIST
+        displayName: "Install from sdist"
+        condition: not(eq(variables['python.version'], '3.5'))

-  - script: python -m pytest --pyargs spacy
-    displayName: 'Run tests'
+      - script: |
+          pip install -r requirements.txt
+          python -m pytest --pyargs spacy
+        displayName: "Run tests"
--- a/build-constraints.txt
+++ b/build-constraints.txt
@@ -0,0 +1,5 @@
+# build version constraints for use with wheelwright + multibuild
+numpy==1.15.0; python_version<='3.7'
+numpy==1.17.3; python_version=='3.8'
+numpy==1.19.3; python_version=='3.9'
+numpy; python_version>='3.10'
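Each line in `build-constraints.txt` (and the matching `numpy` entries added to `pyproject.toml`) carries a PEP 508 environment marker, so exactly one numpy pin should apply to any given interpreter. The sketch below mirrors that selection with tuple comparison. `numpy_pin_for` is a hypothetical helper written for illustration; it is not part of pip, setuptools, or spaCy, which evaluate the markers themselves.

```python
import sys
from typing import Tuple


def numpy_pin_for(python_version: Tuple[int, int]) -> str:
    """Return the numpy requirement the markers above would select.

    Hypothetical helper: mirrors build-constraints.txt line by line,
    comparing (major, minor) tuples the way version markers are meant to.
    """
    if python_version <= (3, 7):
        return "numpy==1.15.0"
    if python_version == (3, 8):
        return "numpy==1.17.3"
    if python_version == (3, 9):
        return "numpy==1.19.3"
    # python_version >= (3, 10): unpinned
    return "numpy"


if __name__ == "__main__":
    # Print the pin that would apply to the running interpreter.
    print(numpy_pin_for(tuple(sys.version_info[:2])))
```

Comparing tuples rather than raw strings matters here: as plain strings, `"3.10" <= "3.7"` is true, whereas `(3, 10) <= (3, 7)` is false, which is the intended reading of the `python_version>='3.10'` marker.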
--- a/netlify.toml
+++ b/netlify.toml
@@ -3,6 +3,8 @@ redirects = [
    {from = "https://spacy.netlify.com/*", to="https://spacy.io/:splat", force = true },
    # Subdomain for branches
    {from = "https://nightly.spacy.io/*", to="https://nightly-spacy-io.spacy.io/:splat", force = true, status = 200},
+    # TODO: update this with the v2 branch build once v3 is live (status = 200)
+    {from = "https://v2.spacy.io/*", to="https://spacy.io/:splat", force = true},
    # Old subdomains
    {from = "https://survey.spacy.io/*", to = "https://spacy.io", force = true},
    {from = "http://survey.spacy.io/*", to = "https://spacy.io", force = true},
--- a/pyproject.toml
+++ b/pyproject.toml
@@ -1,13 +1,16 @@
 [build-system]
 requires = [
    "setuptools",
-    "wheel",
    "cython>=0.25",
    "cymem>=2.0.2,<2.1.0",
    "preshed>=3.0.2,<3.1.0",
    "murmurhash>=0.28.0,<1.1.0",
    "thinc>=8.0.0rc2,<8.1.0",
    "blis>=0.4.0,<0.8.0",
-    "pathy"
+    "pathy",
+    "numpy==1.15.0; python_version<='3.7'",
+    "numpy==1.17.3; python_version=='3.8'",
+    "numpy==1.19.3; python_version=='3.9'",
+    "numpy; python_version>='3.10'",
 ]
 build-backend = "setuptools.build_meta"
--- a/setup.cfg
+++ b/setup.cfg
@@ -20,6 +20,7 @@ classifiers =
    Programming Language :: Python :: 3.6
    Programming Language :: Python :: 3.7
    Programming Language :: Python :: 3.8
+    Programming Language :: Python :: 3.9
    Topic :: Scientific/Engineering

 [options]
@@ -27,7 +28,6 @@ zip_safe = false
 include_package_data = true
 python_requires = >=3.6
 setup_requires =
-    wheel
    cython>=0.25
    numpy>=1.15.0
    # We also need our Cython packages here to compile against
--- a/setup.py
+++ b/setup.py
@@ -2,9 +2,9 @@
 from setuptools import Extension, setup, find_packages
 import sys
 import platform
+import numpy
 from distutils.command.build_ext import build_ext
 from distutils.sysconfig import get_python_inc
-import numpy
 from pathlib import Path
 import shutil
 from Cython.Build import cythonize
@@ -194,8 +194,8 @@ def setup_package():
            print(f"Copied {copy_file} -> {target_dir}")

    include_dirs = [
-        get_python_inc(plat_specific=True),
        numpy.get_include(),
+        get_python_inc(plat_specific=True),
    ]
    ext_modules = []
    for name in MOD_NAMES:
@@ -212,7 +212,7 @@ def setup_package():
        ext_modules=ext_modules,
        cmdclass={"build_ext": build_ext_subclass},
        include_dirs=include_dirs,
-        package_data={"": ["*.pyx", "*.pxd", "*.pxi", "*.cpp"]},
+        package_data={"": ["*.pyx", "*.pxd", "*.pxi"]},
    )


--- a/spacy/cli/init_config.py
+++ b/spacy/cli/init_config.py
@@ -45,14 +45,16 @@ def init_config_cli(
    if isinstance(optimize, Optimizations):  # instance of enum from the CLI
        optimize = optimize.value
    pipeline = string_to_list(pipeline)
-    init_config(
-        output_file,
+    is_stdout = str(output_file) == "-"
+    config = init_config(
        lang=lang,
        pipeline=pipeline,
        optimize=optimize,
        gpu=gpu,
        pretraining=pretraining,
+        silent=is_stdout,
    )
+    save_config(config, output_file, is_stdout=is_stdout)


@init_cli.command("fill-config")
@@ -118,16 +120,15 @@ def fill_config(


 def init_config(
-    output_file: Path,
    *,
    lang: str,
    pipeline: List[str],
    optimize: str,
    gpu: bool,
    pretraining: bool = False,
-) -> None:
-    is_stdout = str(output_file) == "-"
-    msg = Printer(no_print=is_stdout)
+    silent: bool = True,
+) -> Config:
+    msg = Printer(no_print=silent)
    with TEMPLATE_PATH.open("r") as f:
        template = Template(f.read())
    # Filter out duplicates since tok2vec and transformer are added by template
@@ -173,7 +174,7 @@ def init_config(
            pretrain_config = util.load_config(DEFAULT_CONFIG_PRETRAIN_PATH)
            config = pretrain_config.merge(config)
    msg.good("Auto-filled config with all values")
-    save_config(config, output_file, is_stdout=is_stdout)
+    return config


 def save_config(
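With this change, `init_config` no longer receives an `output_file`: it returns the `Config` and takes a `silent` flag, while `init_config_cli` works out whether the target is `-` (stdout) and then calls `save_config`. Presumably the point of `silent` is that the `Printer` messages also go to stdout, so they would be mixed into the piped config. The sketch below reproduces that split in plain Python. `build_config`, `write_config` and `run_cli` are hypothetical stand-ins, and JSON is used instead of spaCy's config format purely for illustration.

```python
import io
import json
import sys
from pathlib import Path
from typing import Dict, List, Optional, TextIO, Union

Config = Dict[str, Dict[str, object]]


def build_config(lang: str, pipeline: List[str], *, silent: bool = True,
                 stream: Optional[TextIO] = None) -> Config:
    """Build and return a toy config; report progress only when not silent."""
    config: Config = {"nlp": {"lang": lang, "pipeline": list(pipeline)}}
    if not silent:
        print("Auto-filled config with all values", file=stream or sys.stdout)
    return config


def write_config(config: Config, output_file: Union[str, Path], *,
                 stream: Optional[TextIO] = None) -> None:
    """Write to `stream` (default: stdout) when output_file is "-", else to disk."""
    text = json.dumps(config, indent=2) + "\n"
    if str(output_file) == "-":
        (stream or sys.stdout).write(text)
    else:
        Path(output_file).write_text(text, encoding="utf8")


def run_cli(lang: str, pipeline: List[str], output_file: Union[str, Path], *,
            stream: Optional[TextIO] = None) -> Config:
    """CLI-style wrapper: silence progress output when the config goes to stdout."""
    is_stdout = str(output_file) == "-"
    config = build_config(lang, pipeline, silent=is_stdout, stream=stream)
    write_config(config, output_file, stream=stream)
    return config


if __name__ == "__main__":
    buffer = io.StringIO()
    run_cli("en", ["tagger", "parser"], "-", stream=buffer)
    print(buffer.getvalue(), end="")  # only JSON, no progress line
```

Returning the config instead of writing it inside the builder also lets other callers reuse the generated config without touching the filesystem.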
--- a/spacy/errors.py
+++ b/spacy/errors.py
@@ -119,6 +119,10 @@ class Warnings:
            "call the {matcher} on each Doc object.")
    W107 = ("The property `Doc.{prop}` is deprecated. Use "
            "`Doc.has_annotation(\"{attr}\")` instead.")
+    W108 = ("The rule-based lemmatizer did not find POS annotation for the "
+            "token '{text}'. Check that your pipeline includes components that "
+            "assign token.pos, typically 'tagger'+'attribute_ruler' or "
+            "'morphologizer'.")


@add_codes
--- a/spacy/lang/char_classes.py
+++ b/spacy/lang/char_classes.py
@@ -210,8 +210,12 @@ _ukrainian_lower = r"а-щюяіїєґ"
 _ukrainian_upper = r"А-ЩЮЯІЇЄҐ"
 _ukrainian = r"а-щюяіїєґА-ЩЮЯІЇЄҐ"

-_upper = LATIN_UPPER + _russian_upper + _tatar_upper + _greek_upper + _ukrainian_upper
-_lower = LATIN_LOWER + _russian_lower + _tatar_lower + _greek_lower + _ukrainian_lower
+_macedonian_lower = r"ѓѕјљњќѐѝ"
+_macedonian_upper = r"ЃЅЈЉЊЌЀЍ"
+_macedonian = r"ѓѕјљњќѐѝЃЅЈЉЊЌЀЍ"
+
+_upper = LATIN_UPPER + _russian_upper + _tatar_upper + _greek_upper + _ukrainian_upper + _macedonian_upper
+_lower = LATIN_LOWER + _russian_lower + _tatar_lower + _greek_lower + _ukrainian_lower + _macedonian_lower

 _uncased = (
    _bengali
@@ -226,7 +230,7 @@ _uncased = (
    + _cjk
 )

-ALPHA = group_chars(LATIN + _russian + _tatar + _greek + _ukrainian + _uncased)
+ALPHA = group_chars(LATIN + _russian + _tatar + _greek + _ukrainian + _macedonian + _uncased)
 ALPHA_LOWER = group_chars(_lower + _uncased)
 ALPHA_UPPER = group_chars(_upper + _uncased)

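The new `_macedonian_*` strings are needed because these letters lie outside the contiguous basic Cyrillic ranges `а-я` (U+0430 to U+044F) and `А-Я` (U+0410 to U+042F), so a character class built only from such ranges will not match them. The minimal stdlib check below illustrates this. `is_lowercase_word` is a hypothetical helper, and it uses only the basic lowercase range rather than spaCy's full `ALPHA_LOWER` class.

```python
import re

BASIC_CYRILLIC_LOWER = "а-я"        # U+0430 to U+044F, illustrative only
MACEDONIAN_LOWER = "ѓѕјљњќѐѝ"       # copied from _macedonian_lower above
MACEDONIAN_UPPER = "ЃЅЈЉЊЌЀЍ"       # copied from _macedonian_upper above


def is_lowercase_word(word: str, extra: str = "") -> bool:
    """True if every character of `word` is in the basic range plus `extra`."""
    return re.fullmatch(f"[{BASIC_CYRILLIC_LOWER}{extra}]+", word) is not None


if __name__ == "__main__":
    for word in ("мама", "љубов", "њива"):
        # Without the Macedonian letters, words containing љ or њ fail to match.
        print(word, is_lowercase_word(word), is_lowercase_word(word, MACEDONIAN_LOWER))
```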
--- a/spacy/lang/cs/__init__.py
+++ b/spacy/lang/cs/__init__.py
@@ -1,9 +1,11 @@
 from .stop_words import STOP_WORDS
+from .tag_map import TAG_MAP
 from .lex_attrs import LEX_ATTRS
 from ...language import Language


 class CzechDefaults(Language.Defaults):
+    tag_map = TAG_MAP
    stop_words = STOP_WORDS
    lex_attr_getters = LEX_ATTRS

--- a/spacy/lang/cs/tag_map.py
+++ b/spacy/lang/cs/tag_map.py
--- a/spacy/lang/en/syntax_iterators.py
+++ b/spacy/lang/en/syntax_iterators.py
@@ -6,10 +6,21 @@ from ...tokens import Doc, Span


 def noun_chunks(doclike: Union[Doc, Span]) -> Iterator[Span]:
-    """Detect base noun phrases from a dependency parse. Works on Doc and Span."""
-    # fmt: off
-    labels = ["nsubj", "dobj", "nsubjpass", "pcomp", "pobj", "dative", "appos", "attr", "ROOT"]
-    # fmt: on
+    """
+    Detect base noun phrases from a dependency parse. Works on both Doc and Span.
+    """
+    labels = [
+        "oprd",
+        "nsubj",
+        "dobj",
+        "nsubjpass",
+        "pcomp",
+        "pobj",
+        "dative",
+        "appos",
+        "attr",
+        "ROOT",
+    ]
    doc = doclike.doc  # Ensure works on both Doc and Span.
    if not doc.has_annotation("DEP"):
        raise ValueError(Errors.E029)
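The only functional change in this hunk is that `"oprd"` (object predicate) joins the dependency labels that can head a base noun phrase. The sketch below isolates that head-selection step on a hand-annotated sentence. `Tok` and `chunk_heads` are hypothetical, the parse is assumed rather than produced by a spaCy model, and the real `noun_chunks` additionally handles conjuncts and builds non-overlapping spans from each head's left edge.

```python
from typing import Iterable, List, NamedTuple, Set


class Tok(NamedTuple):
    text: str
    pos: str  # coarse-grained part-of-speech tag
    dep: str  # dependency label


# Label sets copied from the old and new versions of noun_chunks above.
OLD_LABELS = {"nsubj", "dobj", "nsubjpass", "pcomp", "pobj", "dative", "appos", "attr", "ROOT"}
NEW_LABELS = OLD_LABELS | {"oprd"}
NOMINAL_POS = {"NOUN", "PROPN", "PRON"}


def chunk_heads(tokens: Iterable[Tok], labels: Set[str]) -> List[str]:
    """Texts of nominal tokens whose dependency label marks a chunk head (simplified)."""
    return [t.text for t in tokens if t.pos in NOMINAL_POS and t.dep in labels]


# Assumed parse of "They elected him president": "president" is an object predicate.
SENTENCE = [
    Tok("They", "PRON", "nsubj"),
    Tok("elected", "VERB", "ROOT"),
    Tok("him", "PRON", "dobj"),
    Tok("president", "NOUN", "oprd"),
]

if __name__ == "__main__":
    print(chunk_heads(SENTENCE, OLD_LABELS))
    print(chunk_heads(SENTENCE, NEW_LABELS))
```

With the old label list the object predicate is skipped; with `"oprd"` added it becomes a noun-chunk head.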
--- a/spacy/lang/mk/__init__.py
+++ b/spacy/lang/mk/__init__.py
@@ -0,0 +1,48 @@
+from typing import Optional
+from thinc.api import Model
+from .lemmatizer import MacedonianLemmatizer
+from .stop_words import STOP_WORDS
+from .tokenizer_exceptions import TOKENIZER_EXCEPTIONS
+from .lex_attrs import LEX_ATTRS
+from ..tokenizer_exceptions import BASE_EXCEPTIONS
+
+from ...language import Language
+from ...attrs import LANG
+from ...util import update_exc
+from ...lookups import Lookups
+
+
+class MacedonianDefaults(Language.Defaults):
+    lex_attr_getters = dict(Language.Defaults.lex_attr_getters)
+    lex_attr_getters[LANG] = lambda text: "mk"
+
+    # Optional: replace flags with custom functions, e.g. like_num()
+    lex_attr_getters.update(LEX_ATTRS)
+
+    # Merge base exceptions and custom tokenizer exceptions
+    tokenizer_exceptions = update_exc(BASE_EXCEPTIONS, TOKENIZER_EXCEPTIONS)
+    stop_words = STOP_WORDS
+
+    @classmethod
+    def create_lemmatizer(cls, nlp=None, lookups=None):
+        if lookups is None:
+            lookups = Lookups()
+        return MacedonianLemmatizer(lookups)
+
+
+class Macedonian(Language):
+    lang = "mk"
+    Defaults = MacedonianDefaults
+
+
+@Macedonian.factory(
+    "lemmatizer",
+    assigns=["token.lemma"],
+    default_config={"model": None, "mode": "rule"},
+    default_score_weights={"lemma_acc": 1.0},
+)
+def make_lemmatizer(nlp: Language, model: Optional[Model], name: str, mode: str):
+    return MacedonianLemmatizer(nlp.vocab, model, name, mode=mode)
+
+
+__all__ = ["Macedonian"]
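`@Macedonian.factory(...)` registers `make_lemmatizer` together with a default config (`{"model": None, "mode": "rule"}`). The idea can be pictured with the minimal registry below. `factory`, `create` and `FACTORIES` are hypothetical names, and spaCy's own factory machinery is considerably more involved.

```python
from typing import Any, Callable, Dict, Optional

# Hypothetical registry: factory name -> constructor plus its default config.
FACTORIES: Dict[str, Dict[str, Any]] = {}


def factory(name: str, default_config: Dict[str, Any]) -> Callable[[Callable], Callable]:
    """Register a constructor under `name` together with its default config."""
    def register(func: Callable) -> Callable:
        FACTORIES[name] = {"func": func, "defaults": dict(default_config)}
        return func
    return register


@factory("lemmatizer", default_config={"model": None, "mode": "rule"})
def make_lemmatizer(name: str, model: Any, mode: str) -> Dict[str, Any]:
    # A real factory returns a pipeline component; a dict keeps the sketch self-contained.
    return {"name": name, "model": model, "mode": mode}


def create(name: str, overrides: Optional[Dict[str, Any]] = None) -> Any:
    """Merge user overrides over the registered defaults and call the factory."""
    entry = FACTORIES[name]
    config = {**entry["defaults"], **(overrides or {})}
    return entry["func"](name=name, **config)


print(create("lemmatizer"))                      # {'name': 'lemmatizer', 'model': None, 'mode': 'rule'}
print(create("lemmatizer", {"mode": "lookup"}))  # {'name': 'lemmatizer', 'model': None, 'mode': 'lookup'}
```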
--- a/spacy/lang/mk/lemmatizer.py
+++ b/spacy/lang/mk/lemmatizer.py
@ -0,0 +1,55 @@
+from typing import List
+from collections import OrderedDict
+
+from ...pipeline import Lemmatizer
+from ...tokens import Token
+
+
+class MacedonianLemmatizer(Lemmatizer):
+    def rule_lemmatize(self, token: Token) -> List[str]:
+        string = token.text
+        univ_pos = token.pos_.lower()
+        morphology = token.morph.to_dict()
+
+        if univ_pos in ("", "eol", "space"):
+            return [string.lower()]
+
+        if string[-3:] == 'јќи':
+            string = string[:-3]
+            univ_pos = "verb"
+
+        if callable(self.is_base_form) and self.is_base_form(token):
+            return [string.lower()]
+        index_table = self.lookups.get_table("lemma_index", {})
+        exc_table = self.lookups.get_table("lemma_exc", {})
+        rules_table = self.lookups.get_table("lemma_rules", {})
+        if not any((index_table.get(univ_pos), exc_table.get(univ_pos), rules_table.get(univ_pos))):
+            if univ_pos == "propn":
+                return [string]
+            else:
+                return [string.lower()]
+
+        index = index_table.get(univ_pos, {})
+        exceptions = exc_table.get(univ_pos, {})
+        rules = rules_table.get(univ_pos, [])
+
+        orig = string
+        string = string.lower()
+        forms = []
+
+        for old, new in rules:
+            if string.endswith(old):
+                form = string[: len(string) - len(old)] + new
+                if not form:
+                    continue
+                if form in index or not form.isalpha():
+                    forms.append(form)
+
+        forms = list(OrderedDict.fromkeys(forms))
+        for form in exceptions.get(string, []):
+            if form not in forms:
+                forms.insert(0, form)
+        if not forms:
+            forms.append(orig)
+
+        return forms
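The rule branch of `MacedonianLemmatizer.rule_lemmatize` can be exercised without spaCy. The sketch below keeps the same suffix-rewrite loop, order-preserving de-duplication and exception prepending, but uses tiny hypothetical tables in place of the `lemma_index`, `lemma_exc` and `lemma_rules` lookups.

```python
from collections import OrderedDict
from typing import Dict, List, Set, Tuple

# Hypothetical toy tables; the real ones come from the lookups tables.
INDEX: Set[str] = {"книга", "маса"}
EXCEPTIONS: Dict[str, List[str]] = {"луѓе": ["човек"]}
RULES: List[Tuple[str, str]] = [("ите", "а"), ("и", "а"), ("и", "")]


def rule_lemmatize(word: str) -> List[str]:
    orig = word
    word = word.lower()
    forms: List[str] = []
    for old, new in RULES:
        if word.endswith(old):
            form = word[: len(word) - len(old)] + new
            # Keep non-empty candidates that are known lemmas (or non-alphabetic).
            if form and (form in INDEX or not form.isalpha()):
                forms.append(form)
    # De-duplicate while preserving order.
    forms = list(OrderedDict.fromkeys(forms))
    # Exceptions take priority, so they are prepended.
    for form in EXCEPTIONS.get(word, []):
        if form not in forms:
            forms.insert(0, form)
    # Fall back to the original (case-preserved) string.
    return forms or [orig]


print(rule_lemmatize("книги"))   # ['книга']
print(rule_lemmatize("луѓе"))    # ['човек']
```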
--- a/spacy/lang/mk/lex_attrs.py
+++ b/spacy/lang/mk/lex_attrs.py
@ -0,0 +1,55 @@
+from ...attrs import LIKE_NUM
+
+_num_words = [
+    "нула", "еден", "една", "едно", "два", "две", "три", "четири", "пет", "шест", "седум", "осум", "девет", "десет",
+    "единаесет", "дванаесет", "тринаесет", "четиринаесет", "петнаесет", "шеснаесет", "седумнаесет", "осумнаесет",
+    "деветнаесет", "дваесет", "триесет", "четириесет", "педесет", "шеесет", "седумдесет", "осумдесет", "деведесет",
+    "сто", "двесте", "триста", "четиристотини", "петстотини", "шестотини", "седумстотини", "осумстотини",
+    "деветстотини", "илјада", "илјади", 'милион', 'милиони', 'милијарда', 'милијарди', 'билион', 'билиони',
+
+    "двајца", "тројца", "четворица", "петмина", "шестмина", "седуммина", "осуммина", "деветмина", "обата", "обајцата",
+
+    "прв", "втор", "трет", "четврт", "седм", "осм", "двестоти",
+
+    "два-три", "два-триесет", "два-триесетмина", "два-тринаесет", "два-тројца", "две-три", "две-тристотини",
+    "пет-шеесет", "пет-шеесетмина", "пет-шеснаесетмина", "пет-шест", "пет-шестмина", "пет-шестотини", "петина",
+    "осмина", "седум-осум", "седум-осумдесет", "седум-осуммина", "седум-осумнаесет", "седум-осумнаесетмина",
+    "три-четириесет", "три-четиринаесет", "шеесет", "шеесетина", "шеесетмина", "шеснаесет", "шеснаесетмина",
+    "шест-седум", "шест-седумдесет", "шест-седумнаесет", "шест-седумстотини", "шестоти", "шестотини"
+]
+
+
+def like_num(text):
+    if text.startswith(("+", "-", "±", "~")):
+        text = text[1:]
+    text = text.replace(",", "").replace(".", "")
+    if text.isdigit():
+        return True
+    if text.count("/") == 1:
+        num, denom = text.split("/")
+        if num.isdigit() and denom.isdigit():
+            return True
+
+    text_lower = text.lower()
+    if text_lower in _num_words:
+        return True
+
+    if text_lower.endswith(("а", "о", "и")):
+        if text_lower[:-1] in _num_words:
+            return True
+
+    if text_lower.endswith(("ти", "та", "то", "на")):
+        if text_lower[:-2] in _num_words:
+            return True
+
+    if text_lower.endswith(("ата", "иот", "ите", "ина", "чки")):
+        if text_lower[:-3] in _num_words:
+            return True
+
+    if text_lower.endswith(("мина", "тина")):
+        if text_lower[:-4] in _num_words:
+            return True
+    return False
+
+
+LEX_ATTRS = {LIKE_NUM: like_num}
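The word branch of `like_num` strips suffix groups of length 1 to 4 and checks whether the remaining stem is a listed number word. A table-driven sketch of the same idea follows. The word set is a hypothetical subset of `_num_words`, and digit/fraction handling is omitted.

```python
from typing import Set

# Hypothetical subset of the number-word list.
NUM_WORDS: Set[str] = {"пет", "седм", "десет", "двајца"}

# (suffix length, suffixes), tried in the same order as like_num above.
SUFFIX_TIERS = [
    (1, ("а", "о", "и")),
    (2, ("ти", "та", "то", "на")),
    (3, ("ата", "иот", "ите", "ина", "чки")),
    (4, ("мина", "тина")),
]


def like_num_word(text: str, words: Set[str] = NUM_WORDS) -> bool:
    lower = text.lower()
    if lower in words:
        return True
    for size, suffixes in SUFFIX_TIERS:
        # str.endswith accepts a tuple of candidate suffixes.
        if lower.endswith(suffixes) and lower[:-size] in words:
            return True
    return False


print(like_num_word("петти"), like_num_word("седмиот"), like_num_word("маса"))  # True True False
```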
--- a/spacy/lang/mk/stop_words.py
+++ b/spacy/lang/mk/stop_words.py
@ -0,0 +1,815 @@
+STOP_WORDS = set(
+    """
+а
+абре
+ав
+аи
+ако
+алало
+ам
+ама
+аман
+ами
+амин
+априли-ли-ли
+ау
+аух
+ауч
+ах
+аха
+аха-ха
+аш
+ашколсум
+ашколсун
+ај
+ајде
+ајс
+аџаба
+бавно
+бам
+бам-бум
+бап
+бар
+баре
+барем
+бау
+бау-бау
+баш
+бај
+бе
+беа
+бев
+бевме
+бевте
+без
+безбели
+бездруго
+белки
+беше
+би
+бидејќи
+бим
+бис
+бла
+блазе
+богами
+божем
+боц
+браво
+бравос
+бре
+бреј
+брзо
+бришка
+бррр
+бу
+бум
+буф
+буц
+бујрум
+ваа
+вам
+варај
+варда
+вас
+вај
+ве
+велат
+вели
+версус
+веќе
+ви
+виа
+види
+вие
+вистина
+витос
+внатре
+во
+воз
+вон
+впрочем
+врв
+вред
+време
+врз
+всушност
+втор
+галиба
+ги
+гитла
+го
+годе
+годишник
+горе
+гра
+гуц
+гљу
+да
+даан
+дава
+дал
+дали
+дан
+два
+дваесет
+дванаесет
+двајца
+две
+двесте
+движам
+движат
+движи
+движиме
+движите
+движиш
+де
+деведесет
+девет
+деветнаесет
+деветстотини
+деветти
+дека
+дел
+делми
+демек
+десет
+десетина
+десетти
+десетици
+дејгиди
+дејди
+ди
+дилми
+дин
+дип
+дно
+до
+доволно
+додека
+додуша
+докај
+доколку
+доправено
+доправи
+досамоти
+доста
+држи
+дрн
+друг
+друга
+другата
+други
+другиот
+другите
+друго
+другото
+дум
+дур
+дури
+е
+евала
+еве
+евет
+ега
+егиди
+еден
+едикојси
+единаесет
+единствено
+еднаш
+едно
+ексик
+ела
+елбете
+елем
+ели
+ем
+еми
+ене
+ете
+еурека
+ех
+еј
+жими
+жити
+за
+завал
+заврши
+зад
+задека
+задоволна
+задржи
+заедно
+зар
+зарад
+заради
+заре
+зарем
+затоа
+зашто
+згора
+зема
+земе
+земува
+зер
+значи
+зошто
+зуј
+и
+иако
+из
+извезен
+изгледа
+измеѓу
+износ
+или
+или-или
+илјада
+илјади
+им
+има
+имаа
+имаат
+имавме
+имавте
+имам
+имаме
+имате
+имаш
+имаше
+име
+имено
+именува
+имплицира
+имплицираат
+имплицирам
+имплицираме
+имплицирате
+имплицираш
+инаку
+индицира
+исечок
+исклучен
+исклучена
+исклучени
+исклучено
+искористен
+искористена
+искористени
+искористено
+искористи
+искрај
+исти
+исто
+итака
+итн
+их
+иха
+ихуу
+иш
+ишала
+иј
+ка
+каде
+кажува
+како
+каков
+камоли
+кај
+ква
+ки
+кит
+кло
+клум
+кога
+кого
+кого-годе
+кое
+кои
+количество
+количина
+колку
+кому
+кон
+користена
+користени
+користено
+користи
+кот
+котрр
+кош-кош
+кој
+која
+којзнае
+којшто
+кр-кр-кр
+крај
+крек
+крз
+крк
+крц
+куку
+кукуригу
+куш
+ле
+лебами
+леле
+лели
+ли
+лиду
+луп
+ма
+макар
+малку
+марш
+мат
+мац
+машала
+ме
+мене
+место
+меѓу
+меѓувреме
+меѓутоа
+ми
+мое
+може
+можеби
+молам
+моли
+мор
+мора
+море
+мори
+мразец
+му
+муклец
+мутлак
+муц
+мјау
+на
+навидум
+навистина
+над
+надвор
+назад
+накај
+накрај
+нали
+нам
+наместо
+наоколу
+направено
+направи
+напред
+нас
+наспоред
+наспрема
+наспроти
+насред
+натаму
+натема
+начин
+наш
+наша
+наше
+наши
+нај
+најдоцна
+најмалку
+најмногу
+не
+неа
+него
+негов
+негова
+негови
+негово
+незе
+нека
+некаде
+некако
+некаков
+некого
+некое
+некои
+неколку
+некому
+некој
+некојси
+нели
+немој
+нему
+неоти
+нечиј
+нешто
+нејзе
+нејзин
+нејзини
+нејзино
+нејсе
+ни
+нив
+нивен
+нивна
+нивни
+нивно
+ние
+низ
+никаде
+никако
+никогаш
+никого
+никому
+никој
+ним
+нити
+нито
+ниту
+ничиј
+ништо
+но
+нѐ
+о
+обр
+ова
+ова-она
+оваа
+овај
+овде
+овега
+овие
+овој
+од
+одавде
+оди
+однесува
+односно
+одошто
+околу
+олеле
+олкацок
+он
+она
+онаа
+онака
+онаков
+онде
+они
+оние
+оно
+оној
+оп
+освем
+освен
+осем
+осми
+осум
+осумдесет
+осумнаесет
+осумстотини
+отаде
+оти
+откако
+откај
+откога
+отколку
+оттаму
+оттука
+оф
+ох
+ој
+па
+пак
+папа
+пардон
+пате-ќуте
+пати
+пау
+паче
+пеесет
+пеки
+пет
+петнаесет
+петстотини
+петти
+пи
+пи-пи
+пис
+плас
+плус
+по
+побавно
+поблиску
+побрзо
+побуни
+повеќе
+повторно
+под
+подалеку
+подолу
+подоцна
+подруго
+позади
+поинаква
+поинакви
+поинакво
+поинаков
+поинаку
+покаже
+покажува
+покрај
+полно
+помалку
+помеѓу
+понатаму
+понекогаш
+понекој
+поради
+поразличен
+поразлична
+поразлични
+поразлично
+поседува
+после
+последен
+последна
+последни
+последно
+поспоро
+потег
+потоа
+пошироко
+прави
+празно
+прв
+пред
+през
+преку
+претежно
+претходен
+претходна
+претходни
+претходник
+претходно
+при
+присвои
+притоа
+причинува
+пријатно
+просто
+против
+прр
+пст
+пук
+пусто
+пуф
+пуј
+пфуј
+пшт
+ради
+различен
+различна
+различни
+различно
+разни
+разоружен
+разредлив
+рамките
+рамнообразно
+растревожено
+растреперено
+расчувствувано
+ратоборно
+рече
+роден
+с
+сакан
+сам
+сама
+сами
+самите
+само
+самоти
+свое
+свои
+свој
+своја
+се
+себе
+себеси
+сега
+седми
+седум
+седумдесет
+седумнаесет
+седумстотини
+секаде
+секаков
+секи
+секогаш
+секого
+секому
+секој
+секојдневно
+сем
+сенешто
+сепак
+сериозен
+сериозна
+сериозни
+сериозно
+сет
+сечиј
+сешто
+си
+сиктер
+сиот
+сип
+сиреч
+сите
+сичко
+скок
+скоро
+скрц
+следбеник
+следбеничка
+следен
+следователно
+следствено
+сме
+со
+соне
+сопствен
+сопствена
+сопствени
+сопствено
+сосе
+сосем
+сполај
+според
+споро
+спрема
+спроти
+спротив
+сред
+среде
+среќно
+срочен
+сст
+става
+ставаат
+ставам
+ставаме
+ставате
+ставаш
+стави
+сте
+сто
+стоп
+страна
+сум
+сума
+супер
+сус
+сѐ
+та
+таа
+така
+таква
+такви
+таков
+тамам
+таму
+тангар-мангар
+тандар-мандар
+тап
+твое
+те
+тебе
+тебека
+тек
+текот
+ти
+тие
+тизе
+тик-так
+тики
+тоа
+тогаш
+тој
+трак
+трака-трука
+трас
+треба
+трет
+три
+триесет
+тринаесет
+триста
+труп
+трупа
+трус
+ту
+тука
+туку
+тукушто
+туф
+у
+уа
+убаво
+уви
+ужасно
+уз
+ура
+уу
+уф
+уха
+уш
+уште
+фазен
+фала
+фил
+филан
+фис
+фиу
+фиљан
+фоб
+фон
+ха
+ха-ха
+хе
+хеј
+хеј
+хи
+хм
+хо
+цак
+цап
+целина
+цело
+цигу-лигу
+циц
+чекај
+често
+четврт
+четири
+четириесет
+четиринаесет
+четиристотини
+чие
+чии
+чик
+чик-чирик
+чини
+чиш
+чиј
+чија
+чијшто
+чкрап
+чому
+чук
+чукш
+чуму
+чунки
+шеесет
+шеснаесет
+шест
+шести
+шестотини
+ширум
+шлак
+шлап
+шлапа-шлупа
+шлуп
+шмрк
+што
+штогоде
+штом
+штотуку
+штрак
+штрап
+штрап-штруп
+шуќур
+ѓиди
+ѓоа
+ѓоамити
+ѕан
+ѕе
+ѕин
+ја
+јадец
+јазе
+јали
+јас
+јаска
+јок
+ќе
+ќешки
+ѝ
+џагара-магара
+џанам
+џив-џив
+    """.split()
+)
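A Latin letter can be visually identical to a Cyrillic one, so a mixed-script stop word never matches Cyrillic input. A small check along these lines flags such entries. It is a hypothetical helper that uses only the standard `unicodedata` module.

```python
import unicodedata
from typing import Iterable, List


def non_cyrillic_entries(words: Iterable[str]) -> List[str]:
    """Return entries containing a letter outside the Cyrillic script (e.g. a Latin look-alike)."""
    bad = []
    for word in words:
        # Hyphens and combining marks are not letters, so they are ignored.
        if any(ch.isalpha() and not unicodedata.name(ch, "").startswith("CYRILLIC") for ch in word):
            bad.append(word)
    return sorted(bad)


# "a\u0432" is a Latin "a" followed by a Cyrillic "в".
sample = {"\u0430\u0432", "a\u0432", "аха-ха", "н\u0450", "\u045d"}
print(non_cyrillic_entries(sample))  # ['aв']
```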
--- a/spacy/lang/mk/tokenizer_exceptions.py
+++ b/spacy/lang/mk/tokenizer_exceptions.py
@ -0,0 +1,100 @@
+from ...symbols import ORTH, NORM
+
+
+_exc = {}
+
+
+_abbr_exc = [
+    {ORTH: "м", NORM: "метар"},
+    {ORTH: "мм", NORM: "милиметар"},
+    {ORTH: "цм", NORM: "центиметар"},
+    {ORTH: "см", NORM: "сантиметар"},
+    {ORTH: "дм", NORM: "дециметар"},
+    {ORTH: "км", NORM: "километар"},
+    {ORTH: "кг", NORM: "килограм"},
+    {ORTH: "дкг", NORM: "декаграм"},
+    {ORTH: "дг", NORM: "дециграм"},
+    {ORTH: "мг", NORM: "милиграм"},
+    {ORTH: "г", NORM: "грам"},
+    {ORTH: "т", NORM: "тон"},
+    {ORTH: "кл", NORM: "килолитар"},
+    {ORTH: "хл", NORM: "хектолитар"},
+    {ORTH: "дкл", NORM: "декалитар"},
+    {ORTH: "л", NORM: "литар"},
+    {ORTH: "дл", NORM: "децилитар"}
+
+]
+for abbr in _abbr_exc:
+    _exc[abbr[ORTH]] = [abbr]
+
+_abbr_line_exc = [
+    {ORTH: "д-р", NORM: "доктор"},
+    {ORTH: "м-р", NORM: "магистер"},
+    {ORTH: "г-ѓа", NORM: "госпоѓа"},
+    {ORTH: "г-ца", NORM: "госпоѓица"},
+    {ORTH: "г-дин", NORM: "господин"},
+
+]
+
+for abbr in _abbr_line_exc:
+    _exc[abbr[ORTH]] = [abbr]
+
+_abbr_dot_exc = [
+    {ORTH: "в.", NORM: "век"},
+    {ORTH: "в.д.", NORM: "вршител на должност"},
+    {ORTH: "г.", NORM: "година"},
+    {ORTH: "г.г.", NORM: "господин господин"},
+    {ORTH: "м.р.", NORM: "машки род"},
+    {ORTH: "ж.р.", NORM: "женски род"},
+    {ORTH: "с.р.", NORM: "среден род"},
+    {ORTH: "н.е.", NORM: "наша ера"},
+    {ORTH: "о.г.", NORM: "оваа година"},
+    {ORTH: "о.м.", NORM: "овој месец"},
+    {ORTH: "с.", NORM: "село"},
+    {ORTH: "т.", NORM: "точка"},
+    {ORTH: "т.е.", NORM: "то ест"},
+    {ORTH: "т.н.", NORM: "таканаречен"},
+
+    {ORTH: "бр.", NORM: "број"},
+    {ORTH: "гр.", NORM: "град"},
+    {ORTH: "др.", NORM: "другар"},
+    {ORTH: "и др.", NORM: "и друго"},
+    {ORTH: "и сл.", NORM: "и слично"},
+    {ORTH: "кн.", NORM: "книга"},
+    {ORTH: "мн.", NORM: "множина"},
+    {ORTH: "на пр.", NORM: "на пример"},
+    {ORTH: "св.", NORM: "свети"},
+    {ORTH: "сп.", NORM: "списание"},
+    {ORTH: "с.", NORM: "страница"},
+    {ORTH: "стр.", NORM: "страница"},
+    {ORTH: "чл.", NORM: "член"},
+
+    {ORTH: "арх.", NORM: "архитект"},
+    {ORTH: "бел.", NORM: "белешка"},
+    {ORTH: "гимн.", NORM: "гимназија"},
+    {ORTH: "ден.", NORM: "денар"},
+    {ORTH: "ул.", NORM: "улица"},
+    {ORTH: "инж.", NORM: "инженер"},
+    {ORTH: "проф.", NORM: "професор"},
+    {ORTH: "студ.", NORM: "студент"},
+    {ORTH: "бот.", NORM: "ботаника"},
+    {ORTH: "мат.", NORM: "математика"},
+    {ORTH: "мед.", NORM: "медицина"},
+    {ORTH: "прил.", NORM: "прилог"},
+    {ORTH: "прид.", NORM: "придавка"},
+    {ORTH: "сврз.", NORM: "сврзник"},
+    {ORTH: "физ.", NORM: "физика"},
+    {ORTH: "хем.", NORM: "хемија"},
+    {ORTH: "пр. н.", NORM: "природни науки"},
+    {ORTH: "истор.", NORM: "историја"},
+    {ORTH: "геогр.", NORM: "географија"},
+    {ORTH: "литер.", NORM: "литература"},
+
+
+]
+
+for abbr in _abbr_dot_exc:
+    _exc[abbr[ORTH]] = [abbr]
+
+
+TOKENIZER_EXCEPTIONS = _exc
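Each list is folded into `_exc` keyed by its `ORTH` string, so a form listed twice keeps only its last entry. In `_abbr_dot_exc`, `с.` appears both as "село" and as "страница", and the later entry wins. The sketch reproduces the fold with plain dicts (string keys stand in for the `ORTH`/`NORM` symbols) and adds a hypothetical duplicate check.

```python
from typing import Dict, List

# Plain string keys stand in for spaCy's ORTH/NORM symbols in this sketch.
ORTH, NORM = "orth", "norm"

entries = [
    {ORTH: "с.", NORM: "село"},
    {ORTH: "т.е.", NORM: "то ест"},
    {ORTH: "с.", NORM: "страница"},
]

exc: Dict[str, List[dict]] = {}
for abbr in entries:
    exc[abbr[ORTH]] = [abbr]  # a repeated ORTH overwrites the earlier entry

print(exc["с."][0][NORM])  # страница


def duplicate_orths(items: List[dict]) -> List[str]:
    """List ORTH values that occur more than once, in first-repeat order."""
    seen, dups = set(), []
    for item in items:
        if item[ORTH] in seen and item[ORTH] not in dups:
            dups.append(item[ORTH])
        seen.add(item[ORTH])
    return dups


print(duplicate_orths(entries))  # ['с.']
```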
--- a/spacy/lang/tr/init.py
+++ b/spacy/lang/tr/init.py
@ -1,4 +1,4 @@
-from .tokenizer_exceptions import TOKENIZER_EXCEPTIONS
+from .tokenizer_exceptions import TOKENIZER_EXCEPTIONS, TOKEN_MATCH
 from .stop_words import STOP_WORDS
 from .syntax_iterators import SYNTAX_ITERATORS
 from .lex_attrs import LEX_ATTRS
@ -9,6 +9,7 @@ class TurkishDefaults(Language.Defaults):
    tokenizer_exceptions = TOKENIZER_EXCEPTIONS
    lex_attr_getters = LEX_ATTRS
    stop_words = STOP_WORDS
+    token_match = TOKEN_MATCH
    syntax_iterators = SYNTAX_ITERATORS


--- a/spacy/lang/tr/tokenizer_exceptions.py
+++ b/spacy/lang/tr/tokenizer_exceptions.py
@ -1,119 +1,181 @@
-from ..tokenizer_exceptions import BASE_EXCEPTIONS
+import re
+
+from ..punctuation import ALPHA_LOWER, ALPHA
 from ...symbols import ORTH, NORM
-from ...util import update_exc


-_exc = {"sağol": [{ORTH: "sağ"}, {ORTH: "ol", NORM: "olun"}]}
+_exc = {}


-for exc_data in [
-    {ORTH: "A.B.D.", NORM: "Amerika Birleşik Devletleri"},
-    {ORTH: "Alb.", NORM: "Albay"},
-    {ORTH: "Ar.Gör.", NORM: "Araştırma Görevlisi"},
-    {ORTH: "Arş.Gör.", NORM: "Araştırma Görevlisi"},
-    {ORTH: "Asb.", NORM: "Astsubay"},
-    {ORTH: "Astsb.", NORM: "Astsubay"},
-    {ORTH: "As.İz.", NORM: "Askeri İnzibat"},
-    {ORTH: "Atğm", NORM: "Asteğmen"},
-    {ORTH: "Av.", NORM: "Avukat"},
-    {ORTH: "Apt.", NORM: "Apartmanı"},
-    {ORTH: "Bçvş.", NORM: "Başçavuş"},
+_abbr_period_exc = [
+    {ORTH: "A.B.D.", NORM: "Amerika"},
+    {ORTH: "Alb.", NORM: "albay"},
+    {ORTH: "Ank.", NORM: "Ankara"},
+    {ORTH: "Ar.Gör."},
+    {ORTH: "Arş.Gör."},
+    {ORTH: "Asb.", NORM: "astsubay"},
+    {ORTH: "Astsb.", NORM: "astsubay"},
+    {ORTH: "As.İz."},
+    {ORTH: "as.iz."},
+    {ORTH: "Atğm", NORM: "asteğmen"},
+    {ORTH: "Av.", NORM: "avukat"},
+    {ORTH: "Apt.", NORM: "apartmanı"},
+    {ORTH: "apt.", NORM: "apartmanı"},
+    {ORTH: "Bçvş.", NORM: "başçavuş"},
+    {ORTH: "bçvş.", NORM: "başçavuş"},
    {ORTH: "bk.", NORM: "bakınız"},
    {ORTH: "bknz.", NORM: "bakınız"},
-    {ORTH: "Bnb.", NORM: "Binbaşı"},
+    {ORTH: "Bnb.", NORM: "binbaşı"},
    {ORTH: "bnb.", NORM: "binbaşı"},
-    {ORTH: "Böl.", NORM: "Bölümü"},
-    {ORTH: "Bşk.", NORM: "Başkanlığı"},
-    {ORTH: "Bştbp.", NORM: "Baştabip"},
-    {ORTH: "Bul.", NORM: "Bulvarı"},
-    {ORTH: "Cad.", NORM: "Caddesi"},
+    {ORTH: "Böl.", NORM: "bölümü"},
+    {ORTH: "böl.", NORM: "bölümü"},
+    {ORTH: "Bşk.", NORM: "başkanlığı"},
+    {ORTH: "bşk.", NORM: "başkanlığı"},
+    {ORTH: "Bştbp.", NORM: "baştabip"},
+    {ORTH: "bştbp.", NORM: "baştabip"},
+    {ORTH: "Bul.", NORM: "bulvarı"},
+    {ORTH: "bul.", NORM: "bulvarı"},
+    {ORTH: "Cad.", NORM: "caddesi"},
+    {ORTH: "cad.", NORM: "caddesi"},
    {ORTH: "çev.", NORM: "çeviren"},
-    {ORTH: "Çvş.", NORM: "Çavuş"},
+    {ORTH: "Çvş.", NORM: "çavuş"},
+    {ORTH: "çvş.", NORM: "çavuş"},
    {ORTH: "dak.", NORM: "dakika"},
    {ORTH: "dk.", NORM: "dakika"},
-    {ORTH: "Doç.", NORM: "Doçent"},
-    {ORTH: "doğ.", NORM: "doğum tarihi"},
+    {ORTH: "Doç.", NORM: "doçent"},
+    {ORTH: "doğ."},
+    {ORTH: "Dr.", NORM: "doktor"},
+    {ORTH: "dr.", NORM: "doktor"},
    {ORTH: "drl.", NORM: "derleyen"},
-    {ORTH: "Dz.", NORM: "Deniz"},
-    {ORTH: "Dz.K.K.lığı", NORM: "Deniz Kuvvetleri Komutanlığı"},
-    {ORTH: "Dz.Kuv.", NORM: "Deniz Kuvvetleri"},
-    {ORTH: "Dz.Kuv.K.", NORM: "Deniz Kuvvetleri Komutanlığı"},
+    {ORTH: "Dz.", NORM: "deniz"},
+    {ORTH: "Dz.K.K.lığı"},
+    {ORTH: "Dz.Kuv."},
+    {ORTH: "Dz.Kuv.K."},
    {ORTH: "dzl.", NORM: "düzenleyen"},
-    {ORTH: "Ecz.", NORM: "Eczanesi"},
+    {ORTH: "Ecz.", NORM: "eczanesi"},
+    {ORTH: "ecz.", NORM: "eczanesi"},
    {ORTH: "ekon.", NORM: "ekonomi"},
-    {ORTH: "Fak.", NORM: "Fakültesi"},
-    {ORTH: "Gn.", NORM: "Genel"},
+    {ORTH: "Fak.", NORM: "fakültesi"},
+    {ORTH: "Gn.", NORM: "genel"},
    {ORTH: "Gnkur.", NORM: "Genelkurmay"},
    {ORTH: "Gn.Kur.", NORM: "Genelkurmay"},
    {ORTH: "gr.", NORM: "gram"},
-    {ORTH: "Hst.", NORM: "Hastanesi"},
-    {ORTH: "Hs.Uzm.", NORM: "Hesap Uzmanı"},
+    {ORTH: "Hst.", NORM: "hastanesi"},
+    {ORTH: "hst.", NORM: "hastanesi"},
+    {ORTH: "Hs.Uzm."},
    {ORTH: "huk.", NORM: "hukuk"},
-    {ORTH: "Hv.", NORM: "Hava"},
-    {ORTH: "Hv.K.K.lığı", NORM: "Hava Kuvvetleri Komutanlığı"},
-    {ORTH: "Hv.Kuv.", NORM: "Hava Kuvvetleri"},
-    {ORTH: "Hv.Kuv.K.", NORM: "Hava Kuvvetleri Komutanlığı"},
-    {ORTH: "Hz.", NORM: "Hazreti"},
-    {ORTH: "Hz.Öz.", NORM: "Hizmete Özel"},
-    {ORTH: "İng.", NORM: "İngilizce"},
-    {ORTH: "Jeol.", NORM: "Jeoloji"},
+    {ORTH: "Hv.", NORM: "hava"},
+    {ORTH: "Hv.K.K.lığı"},
+    {ORTH: "Hv.Kuv."},
+    {ORTH: "Hv.Kuv.K."},
+    {ORTH: "Hz.", NORM: "hazreti"},
+    {ORTH: "Hz.Öz."},
+    {ORTH: "İng.", NORM: "ingilizce"},
+    {ORTH: "İst.", NORM: "İstanbul"},
+    {ORTH: "Jeol.", NORM: "jeoloji"},
    {ORTH: "jeol.", NORM: "jeoloji"},
-    {ORTH: "Korg.", NORM: "Korgeneral"},
-    {ORTH: "Kur.", NORM: "Kurmay"},
-    {ORTH: "Kur.Bşk.", NORM: "Kurmay Başkanı"},
-    {ORTH: "Kuv.", NORM: "Kuvvetleri"},
-    {ORTH: "Ltd.", NORM: "Limited"},
-    {ORTH: "Mah.", NORM: "Mahallesi"},
+    {ORTH: "Korg.", NORM: "korgeneral"},
+    {ORTH: "Kur.", NORM: "kurmay"},
+    {ORTH: "Kur.Bşk."},
+    {ORTH: "Kuv.", NORM: "kuvvetleri"},
+    {ORTH: "Ltd.", NORM: "limited"},
+    {ORTH: "ltd.", NORM: "limited"},
+    {ORTH: "Mah.", NORM: "mahallesi"},
    {ORTH: "mah.", NORM: "mahallesi"},
    {ORTH: "max.", NORM: "maksimum"},
    {ORTH: "min.", NORM: "minimum"},
-    {ORTH: "Müh.", NORM: "Mühendisliği"},
+    {ORTH: "Müh.", NORM: "mühendisliği"},
    {ORTH: "müh.", NORM: "mühendisliği"},
-    {ORTH: "MÖ.", NORM: "Milattan Önce"},
-    {ORTH: "Onb.", NORM: "Onbaşı"},
-    {ORTH: "Ord.", NORM: "Ordinaryüs"},
-    {ORTH: "Org.", NORM: "Orgeneral"},
-    {ORTH: "Ped.", NORM: "Pedagoji"},
-    {ORTH: "Prof.", NORM: "Profesör"},
-    {ORTH: "Sb.", NORM: "Subay"},
-    {ORTH: "Sn.", NORM: "Sayın"},
+    {ORTH: "M.Ö."},
+    {ORTH: "M.S."},
+    {ORTH: "Onb.", NORM: "onbaşı"},
+    {ORTH: "Ord.", NORM: "ordinaryüs"},
+    {ORTH: "Org.", NORM: "orgeneral"},
+    {ORTH: "Ped.", NORM: "pedagoji"},
+    {ORTH: "Prof.", NORM: "profesör"},
+    {ORTH: "prof.", NORM: "profesör"},
+    {ORTH: "Sb.", NORM: "subay"},
+    {ORTH: "Sn.", NORM: "sayın"},
    {ORTH: "sn.", NORM: "saniye"},
-    {ORTH: "Sok.", NORM: "Sokak"},
-    {ORTH: "Şb.", NORM: "Şube"},
-    {ORTH: "Şti.", NORM: "Şirketi"},
-    {ORTH: "Tbp.", NORM: "Tabip"},
-    {ORTH: "T.C.", NORM: "Türkiye Cumhuriyeti"},
-    {ORTH: "Tel.", NORM: "Telefon"},
+    {ORTH: "Sok.", NORM: "sokak"},
+    {ORTH: "sok.", NORM: "sokak"},
+    {ORTH: "Şb.", NORM: "şube"},
+    {ORTH: "şb.", NORM: "şube"},
+    {ORTH: "Şti.", NORM: "şirketi"},
+    {ORTH: "şti.", NORM: "şirketi"},
+    {ORTH: "Tbp.", NORM: "tabip"},
+    {ORTH: "tbp.", NORM: "tabip"},
+    {ORTH: "T.C."},
+    {ORTH: "Tel.", NORM: "telefon"},
    {ORTH: "tel.", NORM: "telefon"},
    {ORTH: "telg.", NORM: "telgraf"},
-    {ORTH: "Tğm.", NORM: "Teğmen"},
+    {ORTH: "Tğm.", NORM: "teğmen"},
    {ORTH: "tğm.", NORM: "teğmen"},
    {ORTH: "tic.", NORM: "ticaret"},
-    {ORTH: "Tug.", NORM: "Tugay"},
-    {ORTH: "Tuğg.", NORM: "Tuğgeneral"},
-    {ORTH: "Tümg.", NORM: "Tümgeneral"},
-    {ORTH: "Uzm.", NORM: "Uzman"},
-    {ORTH: "Üçvş.", NORM: "Üstçavuş"},
-    {ORTH: "Üni.", NORM: "Üniversitesi"},
-    {ORTH: "Ütğm.", NORM: "Üsteğmen"},
-    {ORTH: "vb.", NORM: "ve benzeri"},
+    {ORTH: "Tug.", NORM: "tugay"},
+    {ORTH: "Tuğg.", NORM: "tuğgeneral"},
+    {ORTH: "Tümg.", NORM: "tümgeneral"},
+    {ORTH: "Uzm.", NORM: "uzman"},
+    {ORTH: "Üçvş.", NORM: "üstçavuş"},
+    {ORTH: "Üni.", NORM: "üniversitesi"},
+    {ORTH: "Ütğm.", NORM: "üsteğmen"},
+    {ORTH: "vb."},
    {ORTH: "vs.", NORM: "vesaire"},
-    {ORTH: "Yard.", NORM: "Yardımcı"},
-    {ORTH: "Yar.", NORM: "Yardımcı"},
-    {ORTH: "Yd.Sb.", NORM: "Yedek Subay"},
-    {ORTH: "Yard.Doç.", NORM: "Yardımcı Doçent"},
-    {ORTH: "Yar.Doç.", NORM: "Yardımcı Doçent"},
-    {ORTH: "Yb.", NORM: "Yarbay"},
-    {ORTH: "Yrd.", NORM: "Yardımcı"},
-    {ORTH: "Yrd.Doç.", NORM: "Yardımcı Doçent"},
-    {ORTH: "Y.Müh.", NORM: "Yüksek mühendis"},
-    {ORTH: "Y.Mim.", NORM: "Yüksek mimar"},
-]:
-    _exc[exc_data[ORTH]] = [exc_data]
+    {ORTH: "Yard.", NORM: "yardımcı"},
+    {ORTH: "Yar.", NORM: "yardımcı"},
+    {ORTH: "Yd.Sb."},
+    {ORTH: "Yard.Doç."},
+    {ORTH: "Yar.Doç."},
+    {ORTH: "Yb.", NORM: "yarbay"},
+    {ORTH: "Yrd.", NORM: "yardımcı"},
+    {ORTH: "Yrd.Doç."},
+    {ORTH: "Y.Müh."},
+    {ORTH: "Y.Mim."},
+    {ORTH: "yy.", NORM: "yüzyıl"},
+]
+
+for abbr in _abbr_period_exc:
+    _exc[abbr[ORTH]] = [abbr]
+
+_abbr_exc = [
+    {ORTH: "AB", NORM: "Avrupa Birliği"},
+    {ORTH: "ABD", NORM: "Amerika"},
+    {ORTH: "ABS", NORM: "fren"},
+    {ORTH: "AOÇ"},
+    {ORTH: "ASKİ"},
+    {ORTH: "Bağ-kur", NORM: "Bağkur"},
+    {ORTH: "BDDK"},
+    {ORTH: "BJK", NORM: "Beşiktaş"},
+    {ORTH: "ESA", NORM: "Avrupa uzay ajansı"},
+    {ORTH: "FB", NORM: "Fenerbahçe"},
+    {ORTH: "GATA"},
+    {ORTH: "GS", NORM: "Galatasaray"},
+    {ORTH: "İSKİ"},
+    {ORTH: "KBB"},
+    {ORTH: "RTÜK", NORM: "radyo ve televizyon üst kurulu"},
+    {ORTH: "TBMM"},
+    {ORTH: "TC"},
+    {ORTH: "TÜİK", NORM: "Türkiye istatistik kurumu"},
+    {ORTH: "YÖK"},
+]
+
+for abbr in _abbr_exc:
+    _exc[abbr[ORTH]] = [abbr]


-for orth in ["Dr.", "yy."]:
-    _exc[orth] = [{ORTH: orth}]

+_num = r"[+-]?\d+([,.]\d+)*"
+_ord_num = r"(\d+\.)"
+_date = r"(((\d{1,2}[./-]){2})?(\d{4})|(\d{1,2}[./]\d{1,2}(\.)?))"
+_dash_num = r"(([{al}\d]+/\d+)|(\d+/[{al}]))".format(al=ALPHA)
+_roman_num = r"M{0,3}(?:C[MD]|D?C{0,3})(?:X[CL]|L?X{0,3})(?:I[XV]|V?I{0,3})"
+_roman_ord = r"({rn})\.".format(rn=_roman_num)
+_time_exp = r"\d+(:\d+)*"

-TOKENIZER_EXCEPTIONS = update_exc(BASE_EXCEPTIONS, _exc)
+_inflections = r"'[{al}]+".format(al=ALPHA_LOWER)
+_abbrev_inflected = r"[{a}]+\.'[{al}]+".format(a=ALPHA, al=ALPHA_LOWER)
+
+_nums = r"(({d})|({dn})|({te})|({on})|({n})|({ro})|({rn}))({inf})?".format(d=_date, dn=_dash_num, te=_time_exp, on=_ord_num, n=_num, ro=_roman_ord, rn=_roman_num, inf=_inflections)
+
+TOKENIZER_EXCEPTIONS = _exc
+TOKEN_MATCH = re.compile(r"^(?:({abbr})|({n}))$".format(n=_nums, abbr=_abbrev_inflected)).match
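In Python's `re`, alternation has the lowest precedence. In a pattern of the form `^A|B$`, the `^` therefore binds only to `A` and the `$` only to `B`. Combined with `.match`, the `A` branch is then satisfied by a mere prefix, so a token followed by trailing punctuation is still reported as a match. The sketch compares both anchorings. It uses simplified character classes, which is an assumption, since spaCy's `ALPHA`/`ALPHA_LOWER` cover many more scripts.

```python
import re

# Simplified stand-ins (assumption) for spaCy's much larger ALPHA / ALPHA_LOWER classes.
ALPHA = "a-zA-ZçğıöşüÇĞİÖŞÜ"
ALPHA_LOWER = "a-zçğıöşü"

abbrev_inflected = r"[{a}]+\.'[{al}]+".format(a=ALPHA, al=ALPHA_LOWER)
nums = r"[+-]?\d+([,.]\d+)*('[{al}]+)?".format(al=ALPHA_LOWER)

# "^A|B$": "^" applies only to A and "$" only to B.
loose = re.compile(r"^({abbr})|({n})$".format(abbr=abbrev_inflected, n=nums)).match
# "^(?:A|B)$": whichever alternative matches must span the whole string.
strict = re.compile(r"^(?:({abbr})|({n}))$".format(abbr=abbrev_inflected, n=nums)).match

for text in ["Dr.'da", "Dr.'da,", "1990'da"]:
    print(text, bool(loose(text)), bool(strict(text)))
# Dr.'da True True
# Dr.'da, True False
# 1990'da True True
```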
--- a/spacy/language.py
+++ b/spacy/language.py
@ -968,10 +968,6 @@ class Language:

        DOCS: https://nightly.spacy.io/api/language#call
        """
-        if len(text) > self.max_length:
-            raise ValueError(
-                Errors.E088.format(length=len(text), max_length=self.max_length)
-            )
        doc = self.make_doc(text)
        if component_cfg is None:
            component_cfg = {}
@ -1045,6 +1041,10 @@ class Language:
        text (str): The text to process.
        RETURNS (Doc): The processed doc.
        """
+        if len(text) > self.max_length:
+            raise ValueError(
+                Errors.E088.format(length=len(text), max_length=self.max_length)
+            )
        return self.tokenizer(text)

    def update(
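Moving the `E088` length check from `__call__` into `make_doc` means that any code path building a doc through `make_doc` is guarded, not only direct calls. The sketch below shows that shape. It uses a hypothetical class, and a whitespace split stands in for the tokenizer.

```python
from typing import Iterable, Iterator, List


class TinyPipeline:
    """Hypothetical stand-in: every entry point builds its doc through make_doc."""

    def __init__(self, max_length: int = 10) -> None:
        self.max_length = max_length

    def make_doc(self, text: str) -> List[str]:
        # The length guard lives here, so __call__ and pipe share it.
        if len(text) > self.max_length:
            raise ValueError(f"Text of length {len(text)} exceeds max_length {self.max_length}")
        return text.split()

    def __call__(self, text: str) -> List[str]:
        return self.make_doc(text)

    def pipe(self, texts: Iterable[str]) -> Iterator[List[str]]:
        for text in texts:
            yield self.make_doc(text)


nlp = TinyPipeline(max_length=10)
print(nlp("a b c"))  # ['a', 'b', 'c']
```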
--- a/spacy/matcher/matcher.pxd
+++ b/spacy/matcher/matcher.pxd
@ -26,6 +26,7 @@ cdef enum quantifier_t:
    ZERO_PLUS
    ONE
    ONE_PLUS
+    FINAL_ID


 cdef struct AttrValueC:
--- a/spacy/matcher/matcher.pyx
+++ b/spacy/matcher/matcher.pyx
@ -2,7 +2,7 @@
 from typing import List

 from libcpp.vector cimport vector
-from libc.stdint cimport int32_t
+from libc.stdint cimport int32_t, int8_t
 from libc.string cimport memset, memcmp
 from cymem.cymem cimport Pool
 from murmurhash.mrmr cimport hash64
@ -308,7 +308,7 @@ cdef find_matches(TokenPatternC** patterns, int n, object doclike, int length, e
        # avoid any processing or mem alloc if the document is empty
        return output
    if len(predicates) > 0:
-        predicate_cache = <char*>mem.alloc(length * len(predicates), sizeof(char))
+        predicate_cache = <int8_t*>mem.alloc(length * len(predicates), sizeof(int8_t))
    if extensions is not None and len(extensions) >= 1:
        nr_extra_attr = max(extensions.values()) + 1
        extra_attr_values = <attr_t*>mem.alloc(length * nr_extra_attr, sizeof(attr_t))
@ -349,7 +349,7 @@ cdef find_matches(TokenPatternC** patterns, int n, object doclike, int length, e


 cdef void transition_states(vector[PatternStateC]& states, vector[MatchC]& matches,
-                            char* cached_py_predicates,
+                            int8_t* cached_py_predicates,
        Token token, const attr_t* extra_attrs, py_predicates) except *:
    cdef int q = 0
    cdef vector[PatternStateC] new_states
@ -421,7 +421,7 @@ cdef void transition_states(vector[PatternStateC]& states, vector[MatchC]& match
        states.push_back(new_states[i])


-cdef int update_predicate_cache(char* cache,
+cdef int update_predicate_cache(int8_t* cache,
        const TokenPatternC* pattern, Token token, predicates) except -1:
    # If the state references any extra predicates, check whether they match.
    # These are cached, so that we don't call these potentially expensive
@ -459,7 +459,7 @@ cdef void finish_states(vector[MatchC]& matches, vector[PatternStateC]& states)

 cdef action_t get_action(PatternStateC state,
        const TokenC* token, const attr_t* extra_attrs,
-        const char* predicate_matches) nogil:
+        const int8_t* predicate_matches) nogil:
    """We need to consider:
    a) Does the token match the specification? [Yes, No]
    b) What's the quantifier? [1, 0+, ?]
@ -517,7 +517,7 @@ cdef action_t get_action(PatternStateC state,

    Problem: If a quantifier is matching, we're adding a lot of open partials
    """
-    cdef char is_match
+    cdef int8_t is_match
    is_match = get_is_match(state, token, extra_attrs, predicate_matches)
    quantifier = get_quantifier(state)
    is_final = get_is_final(state)
@ -569,9 +569,9 @@ cdef action_t get_action(PatternStateC state,
          return RETRY


-cdef char get_is_match(PatternStateC state,
+cdef int8_t get_is_match(PatternStateC state,
        const TokenC* token, const attr_t* extra_attrs,
-        const char* predicate_matches) nogil:
+        const int8_t* predicate_matches) nogil:
    for i in range(state.pattern.nr_py):
        if predicate_matches[state.pattern.py_predicates[i]] == -1:
            return 0
@ -586,8 +586,8 @@ cdef char get_is_match(PatternStateC state,
    return True


-cdef char get_is_final(PatternStateC state) nogil:
-    if state.pattern[1].nr_attr == 0 and state.pattern[1].attrs != NULL:
+cdef int8_t get_is_final(PatternStateC state) nogil:
+    if state.pattern[1].quantifier == FINAL_ID:
        id_attr = state.pattern[1].attrs[0]
        if id_attr.attr != ID:
            with gil:
@ -597,7 +597,7 @@ cdef char get_is_final(PatternStateC state) nogil:
        return 0


-cdef char get_quantifier(PatternStateC state) nogil:
+cdef int8_t get_quantifier(PatternStateC state) nogil:
    return state.pattern.quantifier


@ -626,36 +626,20 @@ cdef TokenPatternC* init_pattern(Pool mem, attr_t entity_id, object token_specs)
        pattern[i].nr_py = len(predicates)
        pattern[i].key = hash64(pattern[i].attrs, pattern[i].nr_attr * sizeof(AttrValueC), 0)
    i = len(token_specs)
-    # Even though here, nr_attr == 0, we're storing the ID value in attrs[0] (bug-prone, thread carefully!)
-    pattern[i].attrs = <AttrValueC*>mem.alloc(2, sizeof(AttrValueC))
+    # Use quantifier to identify final ID pattern node (rather than previous
+    # uninitialized quantifier == 0/ZERO + nr_attr == 0 + non-zero-length attrs)
+    pattern[i].quantifier = FINAL_ID
+    pattern[i].attrs = <AttrValueC*>mem.alloc(1, sizeof(AttrValueC))
    pattern[i].attrs[0].attr = ID
    pattern[i].attrs[0].value = entity_id
-    pattern[i].nr_attr = 0
+    pattern[i].nr_attr = 1
    pattern[i].nr_extra_attr = 0
    pattern[i].nr_py = 0
    return pattern


 cdef attr_t get_ent_id(const TokenPatternC* pattern) nogil:
-    # There have been a few bugs here. We used to have two functions,
-    # get_ent_id and get_pattern_key that tried to do the same thing. These
-    # are now unified to try to solve the "ghost match" problem.
-    # Below is the previous implementation of get_ent_id and the comment on it,
-    # preserved for reference while we figure out whether the heisenbug in the
-    # matcher is resolved.
-    #
-    #
-    #     cdef attr_t get_ent_id(const TokenPatternC* pattern) nogil:
-    #         # The code was originally designed to always have pattern[1].attrs.value
-    #         # be the ent_id when we get to the end of a pattern. However, Issue #2671
-    #         # showed this wasn't the case when we had a reject-and-continue before a
-    #         # match.
-    #         # The patch to #2671 was wrong though, which came up in #3839.
-    #         while pattern.attrs.attr != ID:
-    #             pattern += 1
-    #         return pattern.attrs.value
-    while pattern.nr_attr != 0 or pattern.nr_extra_attr != 0 or pattern.nr_py != 0 \
-            or pattern.quantifier != ZERO:
+    while pattern.quantifier != FINAL_ID:
        pattern += 1
    id_attr = pattern[0].attrs[0]
    if id_attr.attr != ID:
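The matcher now marks the node that carries the entity ID with an explicit `FINAL_ID` quantifier. Previously it inferred that node from `nr_attr == 0` plus an uninitialized (zero) quantifier. The hypothetical Python model below (not spaCy's C structs) shows why an explicit sentinel is sturdier: a pattern node that legitimately has no attributes and a `ZERO` quantifier satisfies the old-style test, but never the new one.

```python
from dataclasses import dataclass, field
from enum import IntEnum
from typing import List, Tuple


class Quant(IntEnum):
    # Hypothetical subset of quantifiers plus an explicit end-of-pattern marker.
    ZERO = 0
    ONE = 1
    FINAL_ID = 2


@dataclass
class Node:
    quantifier: Quant
    attrs: List[Tuple[str, int]] = field(default_factory=list)


def final_index_heuristic(pattern: List[Node]) -> int:
    # Old idea: the terminal node is the first one that "looks empty".
    for i, node in enumerate(pattern):
        if not node.attrs and node.quantifier == Quant.ZERO:
            return i
    raise ValueError("no terminal node")


def final_index_sentinel(pattern: List[Node]) -> int:
    # New idea: the terminal node is tagged explicitly.
    for i, node in enumerate(pattern):
        if node.quantifier == Quant.FINAL_ID:
            return i
    raise ValueError("no terminal node")


def get_ent_id(pattern: List[Node]) -> int:
    attr, value = pattern[final_index_sentinel(pattern)].attrs[0]
    if attr != "ID":
        raise ValueError("terminal node does not hold an ID")
    return value


pattern = [
    Node(Quant.ONE, [("ORTH", 1)]),
    Node(Quant.ZERO),  # a node with no attribute constraints
    Node(Quant.ONE, [("ORTH", 2)]),
    Node(Quant.FINAL_ID, [("ID", 42)]),
]
print(final_index_heuristic(pattern), final_index_sentinel(pattern), get_ent_id(pattern))  # 1 3 42
```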
--- a/spacy/pipeline/entityruler.py
+++ b/spacy/pipeline/entityruler.py
@ -261,7 +261,11 @@ class EntityRuler(Pipe):

        # disable the nlp components after this one in case they hadn't been initialized / deserialised yet
        try:
-            current_index = self.nlp.pipe_names.index(self.name)
+            current_index = -1
+            for i, (name, pipe) in enumerate(self.nlp.pipeline):
+                if self == pipe:
+                    current_index = i
+                    break
            subsequent_pipes = [
                pipe for pipe in self.nlp.pipe_names[current_index + 1 :]
            ]
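The `EntityRuler` now finds its own position by comparing component objects in `nlp.pipeline`, instead of looking up `self.name` in `nlp.pipe_names`. The sketch below shows that lookup over a plain list of `(name, component)` tuples. `index_of_component`, `subsequent_pipe_names` and `Component` are hypothetical helpers. The sketch uses `is` where the diff uses `==`; the two agree unless a component class overrides `__eq__`.

One consequence of the new code: if the component is not found, `current_index` stays `-1`, so the slice `pipe_names[current_index + 1:]` covers every pipe.

```python
from typing import Any, List, Sequence, Tuple


def index_of_component(pipeline: Sequence[Tuple[str, Any]], component: Any) -> int:
    """Return the position of `component` in `pipeline`, or -1 if it is absent."""
    for i, (_name, pipe) in enumerate(pipeline):
        if pipe is component:
            return i
    return -1


def subsequent_pipe_names(pipeline: Sequence[Tuple[str, Any]], component: Any) -> List[str]:
    # Same slicing as the diff: everything after the component's index.
    names = [name for name, _ in pipeline]
    return names[index_of_component(pipeline, component) + 1:]


class Component:
    def __init__(self, name: str):
        self.name = name


# The ruler's own `name` differs from its key in the pipeline, so a name lookup would fail.
ruler = Component("entity_ruler")
pipeline = [("tagger", Component("tagger")), ("my_ruler", ruler), ("ner", Component("ner"))]
```

The example shows one situation where the identity lookup succeeds and a lookup by `self.name` would not.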
--- a/spacy/pipeline/lemmatizer.py
+++ b/spacy/pipeline/lemmatizer.py
@ -4,7 +4,7 @@ from thinc.api import Model
 from pathlib import Path

 from .pipe import Pipe
-from ..errors import Errors
+from ..errors import Errors, Warnings
 from ..language import Language
 from ..training import Example
 from ..lookups import Lookups, load_lookups
@ -197,6 +197,8 @@ class Lemmatizer(Pipe):
        string = token.text
        univ_pos = token.pos_.lower()
        if univ_pos in ("", "eol", "space"):
+            if univ_pos == "":
+                logger.warn(Warnings.W108.format(text=string))
            return [string.lower()]
        # See Issue #435 for an example of where this logic is required.
        if self.is_base_form(token):
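After this change, the rule-based path still falls back to the lowercased text when the POS is `""`, `"eol"` or `"space"`. It now also logs warning `W108`, but only for the empty-POS case.

The sketch below reproduces that control flow with the standard `logging` module. `rule_lemmatize`, the logger name and the toy suffix rules are hypothetical. spaCy's real `Lemmatizer.rule_lemmatize` does considerably more, consulting lookup tables and exceptions.

```python
import logging
from typing import Dict, List, Tuple

logger = logging.getLogger("lemmatizer_sketch")

# Toy suffix rules keyed by lowercase universal POS (hypothetical data).
RULES: Dict[str, List[Tuple[str, str]]] = {"verb": [("ing", "e"), ("ed", "")]}


def rule_lemmatize(text: str, pos: str) -> List[str]:
    univ_pos = pos.lower()
    if univ_pos in ("", "eol", "space"):
        if univ_pos == "":
            # Mirrors the new warning: without a POS tag no rules can apply.
            logger.warning("No POS tag for %r; using the lowercase form", text)
        return [text.lower()]
    lower = text.lower()
    for old, new in RULES.get(univ_pos, []):
        if lower.endswith(old):
            return [lower[: len(lower) - len(old)] + new]
    return [lower]
```

With a `VERB` tag, `"coping"` becomes `"cope"`. Without a tag, the word is returned lowercased and one warning is logged.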
--- a/spacy/tests/conftest.py
+++ b/spacy/tests/conftest.py
@ -172,6 +172,11 @@ def lt_tokenizer():
    return get_lang_class("lt")().tokenizer


+@pytest.fixture(scope="session")
+def mk_tokenizer():
+    return get_lang_class("mk")().tokenizer
+
+
@pytest.fixture(scope="session")
 def ml_tokenizer():
    return get_lang_class("ml")().tokenizer
--- a/spacy/tests/doc/test_doc_api.py
+++ b/spacy/tests/doc/test_doc_api.py
@ -123,6 +123,7 @@ def test_doc_api_serialize(en_tokenizer, text):
    tokens[0].norm_ = "norm"
    tokens.ents = [(tokens.vocab.strings["PRODUCT"], 0, 1)]
    tokens[0].ent_kb_id_ = "ent_kb_id"
+    tokens[0].ent_id_ = "ent_id"
    new_tokens = Doc(tokens.vocab).from_bytes(tokens.to_bytes())
    assert tokens.text == new_tokens.text
    assert [t.text for t in tokens] == [t.text for t in new_tokens]
@ -130,6 +131,7 @@ def test_doc_api_serialize(en_tokenizer, text):
    assert new_tokens[0].lemma_ == "lemma"
    assert new_tokens[0].norm_ == "norm"
    assert new_tokens[0].ent_kb_id_ == "ent_kb_id"
+    assert new_tokens[0].ent_id_ == "ent_id"

    new_tokens = Doc(tokens.vocab).from_bytes(
        tokens.to_bytes(exclude=["tensor"]), exclude=["tensor"]
--- a/spacy/tests/doc/test_retokenize_merge.py
+++ b/spacy/tests/doc/test_retokenize_merge.py
@ -416,6 +416,13 @@ def test_doc_retokenizer_merge_lex_attrs(en_vocab):
    assert doc[1].is_stop
    assert not doc[0].is_stop
    assert not doc[1].like_num
+    # Test that norm is only set on tokens
+    doc = Doc(en_vocab, words=["eins", "zwei", "!", "!"])
+    assert doc[0].norm_ == "eins"
+    with doc.retokenize() as retokenizer:
+        retokenizer.merge(doc[0:1], attrs={"norm": "1"})
+    assert doc[0].norm_ == "1"
+    assert en_vocab["eins"].norm_ == "eins"


 def test_retokenize_skip_duplicates(en_vocab):
--- a/spacy/tests/lang/mk/__init__.py
+++ b/spacy/tests/lang/mk/__init__.py
--- a/spacy/tests/lang/mk/test_text.py
+++ b/spacy/tests/lang/mk/test_text.py
@ -0,0 +1,84 @@
+import pytest
+from spacy.lang.mk.lex_attrs import like_num
+
+
+def test_tokenizer_handles_long_text(mk_tokenizer):
+    text = """
+    Во организациските работи или на нашите собранија со членството, никој од нас не зборуваше за 
+    организацијата и идеологијата. Работна беше нашата работа, а не идеолошка. Што се однесува до социјализмот на 
+    Делчев, неговата дејност зборува сама за себе - спротивно. Во суштина, водачите си имаа свои основни погледи и 
+    свои разбирања за положбата и работите, коишто стоеја пред нив и ги завршуваа со голема упорност, настојчивост и 
+    насоченост. Значи, идеологија имаше, само што нивната идеологија имаше своја оригиналност. Македонија денеска, 
+    чиста рожба на животот и положбата во Македонија, кои му служеа како база на неговите побуди, беше дејност која 
+    имаше потреба од ум за да си најде своја смисла. Таквата идеологија и заемното дејство на умот и срцето му 
+    помогнаа на Делчев да не се занесе по патот на својата идеологија... Во суштина, Организацијата и нејзините 
+    водачи имаа свои разбирања за работите и положбата во идеен поглед, но тоа беше врската, животот и положбата во 
+    Македонија и го внесуваа во својата идеологија гласот на своето срце, и на крај, прибегнуваа до умот, 
+    за да најдат смисла или да ѝ дадат. Тоа содејство и заемен сооднос на умот и срцето му помогнаа на Делчев да ја 
+    држи својата идеологија во сообразност со положбата на работите... Водачите навистина направија една жртва 
+    бидејќи на населението не му зборуваа за своите мисли и идеи. Тие се одрекоа од секаква субјективност во своите 
+    мисли. Целта беше да не се зголемуваат целите и задачите како и преданоста во работата. Населението не можеше да 
+    ги разбере овие идеи... 
+    """
+    tokens = mk_tokenizer(text)
+    assert len(tokens) == 297
+
+
+@pytest.mark.parametrize(
+    "word,match",
+    [
+        ("10", True),
+        ("1", True),
+        ("10.000", True),
+        ("1000", True),
+        ("бројка", False),
+        ("999,0", True),
+        ("еден", True),
+        ("два", True),
+        ("цифра", False),
+        ("десет", True),
+        ("сто", True),
+        ("број", False),
+        ("илјада", True),
+        ("илјади", True),
+        ("милион", True),
+        (",", False),
+        ("милијарда", True),
+        ("билион", True),
+    ]
+)
+def test_mk_lex_attrs_like_number(mk_tokenizer, word, match):
+    tokens = mk_tokenizer(word)
+    assert len(tokens) == 1
+    assert tokens[0].like_num == match
+
+
+@pytest.mark.parametrize(
+    "word",
+    [
+        "двесте",
+        "два-три",
+        "пет-шест"
+    ]
+)
+def test_mk_lex_attrs_capitals(word):
+    assert like_num(word)
+    assert like_num(word.upper())
+
+
+@pytest.mark.parametrize(
+    "word",
+    [
+        "првиот",
+        "втора",
+        "четврт",
+        "четвртата",
+        "петти",
+        "петто",
+        "стоти",
+        "шеесетите",
+        "седумдесетите"
+    ]
+)
+def test_mk_lex_attrs_like_number_for_ordinal(word):
+    assert like_num(word)
--- a/spacy/tests/lang/tr/test_text.py
+++ b/spacy/tests/lang/tr/test_text.py
@ -2,6 +2,27 @@ import pytest
 from spacy.lang.tr.lex_attrs import like_num


+def test_tr_tokenizer_handles_long_text(tr_tokenizer):
+    text = """Pamuk nasıl ipliğe dönüştürülür?
+
+Sıkıştırılmış balyalar halindeki pamuk, iplik fabrikasına getirildiğinde hem 
+lifleri birbirine dolaşmıştır, hem de tarladan toplanırken araya bitkinin 
+parçaları karışmıştır. Üstelik balyalardaki pamuğun cinsi aynı olsa bile kalitesi 
+değişeceğinden, önce bütün balyaların birbirine karıştırılarak harmanlanması gerekir.
+
+Daha sonra pamuk yığınları, liflerin açılıp temizlenmesi için tek bir birim halinde 
+birleştirilmiş çeşitli makinelerden geçirilir.Bunlardan biri, dönen tokmaklarıyla
+pamuğu dövüp kabartarak dağınık yumaklar haline getiren ve liflerin arasındaki yabancı
+maddeleri temizleyen hallaç makinesidir. Daha sonra tarak makinesine giren pamuk demetleri,
+herbirinin yüzeyinde yüzbinlerce incecik iğne bulunan döner silindirlerin arasından geçerek lif lif ayrılır
+ve tül inceliğinde gevşek bir örtüye dönüşür. Ama bir sonraki makine bu lifleri dağınık 
+ve gevşek bir biçimde birbirine yaklaştırarak 2 cm eninde bir pamuk şeridi haline getirir."""
+    tokens = tr_tokenizer(text)
+    assert len(tokens) == 146
+
+
+
+
@pytest.mark.parametrize(
    "word",
    [
--- a/spacy/tests/lang/tr/test_tokenizer.py
+++ b/spacy/tests/lang/tr/test_tokenizer.py
@ -0,0 +1,152 @@
+import pytest
+
+
+ABBREV_TESTS = [
+        ("Dr. Murat Bey ile görüştüm.", ["Dr.", "Murat", "Bey", "ile", "görüştüm", "."]),
+        ("Dr.la görüştüm.", ["Dr.la", "görüştüm", "."]),
+        ("Dr.'la görüştüm.", ["Dr.'la", "görüştüm", "."]),
+        ("TBMM'de çalışıyormuş.", ["TBMM'de", "çalışıyormuş", "."]),
+        ("Hem İst. hem Ank. bu konuda gayet iyi durumda.", ["Hem", "İst.", "hem", "Ank.", "bu", "konuda", "gayet", "iyi", "durumda", "."]),
+        ("Hem İst. hem Ank.'da yağış var.", ["Hem", "İst.", "hem", "Ank.'da", "yağış", "var", "."]),
+        ("Dr.", ["Dr."]),
+        ("Yrd.Doç.", ["Yrd.Doç."]),
+        ("Prof.'un", ["Prof.'un"]),
+        ("Böl.'nde", ["Böl.'nde"]),
+]
+
+
+
+URL_TESTS = [
+        ("Bizler de www.duygu.com.tr adında bir websitesi kurduk.", ["Bizler", "de", "www.duygu.com.tr", "adında", "bir", "websitesi", "kurduk", "."]),
+        ("Bizler de https://www.duygu.com.tr adında bir websitesi kurduk.", ["Bizler", "de", "https://www.duygu.com.tr", "adında", "bir", "websitesi", "kurduk", "."]),
+        ("Bizler de www.duygu.com.tr'dan satın aldık.", ["Bizler", "de", "www.duygu.com.tr'dan", "satın", "aldık", "."]),
+        ("Bizler de https://www.duygu.com.tr'dan satın aldık.", ["Bizler", "de", "https://www.duygu.com.tr'dan", "satın", "aldık", "."]),
+]
+
+
+
+NUMBER_TESTS = [
+        ("Rakamla 6 yazılıydı.", ["Rakamla", "6", "yazılıydı", "."]),
+        ("Hava -4 dereceydi.", ["Hava", "-4", "dereceydi", "."]),
+        ("Hava sıcaklığı -4ten +6ya yükseldi.", ["Hava", "sıcaklığı", "-4ten", "+6ya", "yükseldi", "."]),
+        ("Hava sıcaklığı -4'ten +6'ya yükseldi.", ["Hava", "sıcaklığı", "-4'ten", "+6'ya", "yükseldi", "."]),
+        ("Yarışta 6. oldum.", ["Yarışta", "6.", "oldum", "."]),
+        ("Yarışta 438547745. oldum.", ["Yarışta", "438547745.", "oldum", "."]),
+        ("Kitap IV. Murat hakkında.", ["Kitap", "IV.", "Murat", "hakkında", "."]),
+        #("Bana söylediği sayı 6.", ["Bana", "söylediği", "sayı", "6", "."]),
+        ("Saat 6'da buluşalım.", ["Saat", "6'da", "buluşalım", "."]),
+        ("Saat 6dan sonra buluşalım.", ["Saat", "6dan", "sonra", "buluşalım", "."]),
+        ("6.dan sonra saymadım.", ["6.dan", "sonra", "saymadım", "."]),
+        ("6.'dan sonra saymadım.", ["6.'dan", "sonra", "saymadım", "."]),
+        ("Saat 6'ydı.", ["Saat", "6'ydı", "."]),
+        ("5'te", ["5'te"]),
+        ("6'da", ["6'da"]),
+        ("9dan", ["9dan"]),
+        ("19'da", ["19'da"]),
+        ("VI'da", ["VI'da"]),
+        ("5.", ["5."]),
+        ("72.", ["72."]),
+        ("VI.", ["VI."]),
+        ("6.'dan", ["6.'dan"]),
+        ("19.'dan", ["19.'dan"]),
+        ("6.dan", ["6.dan"]),
+        ("16.dan", ["16.dan"]),
+        ("VI.'dan", ["VI.'dan"]),
+        ("VI.dan", ["VI.dan"]),
+        ("Hepsi 1994 yılında oldu.", ["Hepsi", "1994", "yılında", "oldu", "."]),
+        ("Hepsi 1994'te oldu.", ["Hepsi", "1994'te", "oldu", "."]),
+        ("2/3 tarihli faturayı bulamadım.", ["2/3", "tarihli", "faturayı", "bulamadım", "."]),
+        ("2.3 tarihli faturayı bulamadım.", ["2.3", "tarihli", "faturayı", "bulamadım", "."]),
+        ("2.3. tarihli faturayı bulamadım.", ["2.3.", "tarihli", "faturayı", "bulamadım", "."]),
+        ("2/3/2020 tarihli faturayı bulamadım.", ["2/3/2020", "tarihli", "faturayı", "bulamadım", "."]),
+        ("2/3/1987 tarihinden beri burda yaşıyorum.", ["2/3/1987", "tarihinden", "beri", "burda", "yaşıyorum", "."]),
+        ("2-3-1987 tarihinden beri burdayım.", ["2-3-1987", "tarihinden", "beri", "burdayım", "."]),
+        ("2.3.1987 tarihinden beri burdayım.", ["2.3.1987", "tarihinden", "beri", "burdayım", "."]),
+        ("Bu olay 2005-2006 tarihleri arasında oldu.", ["Bu", "olay", "2005", "-", "2006", "tarihleri", "arasında", "oldu", "."]),
+        ("Bu olay 4/12/2005-21/3/2006 tarihleri arasında oldu.", ["Bu", "olay", "4/12/2005", "-", "21/3/2006", "tarihleri", "arasında", "oldu", ".",]),
+        ("Ek fıkra: 5/11/2003-4999/3 maddesine göre uygundur.", ["Ek", "fıkra", ":", "5/11/2003", "-", "4999/3", "maddesine", "göre", "uygundur", "."]),
+        ("2/A alanları: 6831 sayılı Kanunun 2nci maddesinin birinci fıkrasının (A) bendine göre", ["2/A", "alanları", ":", "6831", "sayılı", "Kanunun", "2nci", "maddesinin", "birinci", "fıkrasının", "(", "A", ")", "bendine", "göre"]),
+        ("ŞEHİTTEĞMENKALMAZ Cad. No: 2/311", ["ŞEHİTTEĞMENKALMAZ", "Cad.", "No", ":", "2/311"]),
+        ("2-3-2025", ["2-3-2025",]),
+        ("2/3/2025", ["2/3/2025"]),
+        ("Yıllardır 0.5 uç kullanıyorum.", ["Yıllardır", "0.5", "uç", "kullanıyorum", "."]),
+        ("Kan değerlerim 0.5-0.7 arasıydı.", ["Kan", "değerlerim", "0.5", "-", "0.7", "arasıydı", "."]),
+        ("0.5", ["0.5"]),
+        ("1/2", ["1/2"]),
+        ("%1", ["%", "1"]),
+        ("%1lik", ["%", "1lik"]),
+        ("%1'lik", ["%", "1'lik"]),
+        ("%1lik dilim", ["%", "1lik", "dilim"]),
+        ("%1'lik dilim", ["%", "1'lik", "dilim"]),
+        ("%1.5", ["%", "1.5"]),
+        #("%1-%2 arası büyüme bekleniyor.", ["%", "1", "-", "%", "2", "arası", "büyüme", "bekleniyor", "."]),
+        ("%1-2 arası büyüme bekliyoruz.", ["%", "1", "-", "2", "arası", "büyüme", "bekliyoruz", "."]),
+        ("%11-12 arası büyüme bekliyoruz.", ["%", "11", "-", "12", "arası", "büyüme", "bekliyoruz", "."]),
+        ("%1.5luk büyüme bekliyoruz.", ["%", "1.5luk", "büyüme", "bekliyoruz", "."]),
+        ("Saat 1-2 arası gelin lütfen.", ["Saat", "1", "-", "2", "arası", "gelin", "lütfen", "."]),
+        ("Saat 15:30 gibi buluşalım.", ["Saat", "15:30", "gibi", "buluşalım", "."]),
+        ("Saat 15:30'da buluşalım.", ["Saat", "15:30'da", "buluşalım", "."]),
+        ("Saat 15.30'da buluşalım.", ["Saat", "15.30'da", "buluşalım", "."]),
+        ("Saat 15.30da buluşalım.", ["Saat", "15.30da", "buluşalım", "."]),
+        ("Saat 15 civarı buluşalım.", ["Saat", "15", "civarı", "buluşalım", "."]),
+        ("9’daki otobüse binsek mi?", ["9’daki", "otobüse", "binsek", "mi", "?"]),
+        ("Okulumuz 3-B şubesi", ["Okulumuz", "3-B", "şubesi"]),
+        ("Okulumuz 3/B şubesi", ["Okulumuz", "3/B", "şubesi"]),
+        ("Okulumuz 3B şubesi", ["Okulumuz", "3B", "şubesi"]),
+        ("Okulumuz 3b şubesi", ["Okulumuz", "3b", "şubesi"]),
+        ("Antonio Gaudí 20. yüzyılda, 1904-1914 yılları arasında on yıl süren bir reform süreci getirmiştir.", ["Antonio", "Gaudí", "20.", "yüzyılda", ",", "1904", "-", "1914", "yılları", "arasında", "on", "yıl", "süren", "bir", "reform", "süreci", "getirmiştir", "."]),
+        ("Dizel yakıtın avro bölgesi ortalaması olan 1,165 avroya kıyasla litre başına 1,335 avroya mal olduğunu gösteriyor.", ["Dizel", "yakıtın", "avro", "bölgesi", "ortalaması", "olan", "1,165", "avroya", "kıyasla", "litre", "başına", "1,335", "avroya", "mal", "olduğunu", "gösteriyor", "."]),
+        ("Marcus Antonius M.Ö. 1 Ocak 49'da, Sezar'dan Vali'nin kendisini barış dostu ilan ettiği bir bildiri yayınlamıştır.", ["Marcus", "Antonius", "M.Ö.", "1", "Ocak", "49'da", ",", "Sezar'dan", "Vali'nin", "kendisini", "barış", "dostu", "ilan", "ettiği", "bir", "bildiri", "yayınlamıştır", "."])
+]
+
+
+PUNCT_TESTS = [
+        ("Gitmedim dedim ya!", ["Gitmedim", "dedim", "ya", "!"]),
+        ("Gitmedim dedim ya!!", ["Gitmedim", "dedim", "ya", "!", "!"]),
+        ("Gitsek mi?", ["Gitsek", "mi", "?"]),
+        ("Gitsek mi??", ["Gitsek", "mi", "?", "?"]),
+        ("Gitsek mi?!?", ["Gitsek", "mi", "?", "!", "?"]),
+        ("Ankara - Antalya arası otobüs işliyor.", ["Ankara", "-", "Antalya", "arası", "otobüs", "işliyor", "."]),
+        ("Ankara-Antalya arası otobüs işliyor.", ["Ankara", "-", "Antalya", "arası", "otobüs", "işliyor", "."]),
+        ("Sen--ben, ya da onlar.", ["Sen", "--", "ben", ",", "ya", "da", "onlar", "."]),
+        ("Senden, benden, bizden şarkısını biliyor musun?", ["Senden", ",", "benden", ",", "bizden", "şarkısını", "biliyor", "musun", "?"]),
+        ("Akif'le geldik, sonra da o ayrıldı.", ["Akif'le", "geldik", ",", "sonra", "da", "o", "ayrıldı", "."]),
+        ("Bu adam ne dedi şimdi???", ["Bu", "adam", "ne", "dedi", "şimdi", "?", "?", "?"]),
+        ("Yok hasta olmuş, yok annesi hastaymış, bahaneler işte...", ["Yok", "hasta", "olmuş", ",", "yok", "annesi", "hastaymış", ",", "bahaneler", "işte", "..."]),
+        ("Ankara'dan İstanbul'a ... bir aşk hikayesi.", ["Ankara'dan", "İstanbul'a", "...", "bir", "aşk", "hikayesi", "."]),
+        ("Ahmet'te", ["Ahmet'te"]),
+        ("İstanbul'da", ["İstanbul'da"]),
+]
+
+GENERAL_TESTS = [
+        ("1914'teki Endurance seferinde, Sir Ernest Shackleton'ın kaptanlığını yaptığı İngiliz Endurance gemisi yirmi sekiz kişi ile Antarktika'yı geçmek üzere yelken açtı.", ["1914'teki", "Endurance", "seferinde", ",", "Sir", "Ernest", "Shackleton'ın", "kaptanlığını", "yaptığı", "İngiliz", "Endurance", "gemisi", "yirmi", "sekiz", "kişi", "ile", "Antarktika'yı", "geçmek", "üzere", "yelken", "açtı", "."]),
+        ("Danışılan \"%100 Cospedal\" olduğunu belirtti.", ["Danışılan", '"', "%", "100", "Cospedal", '"', "olduğunu", "belirtti", "."]),
+        ("1976'da parkur artık kullanılmıyordu; 1990'da ise bir yangın, daha sonraları ahırlarla birlikte yıkılacak olan tahta tribünlerden geri kalanları da yok etmişti.", ["1976'da", "parkur", "artık", "kullanılmıyordu", ";", "1990'da", "ise", "bir", "yangın", ",", "daha", "sonraları", "ahırlarla", "birlikte", "yıkılacak", "olan", "tahta", "tribünlerden", "geri", "kalanları", "da", "yok", "etmişti", "."]),
+        ("Dahiyane bir ameliyat ve zorlu bir rehabilitasyon sürecinden sonra, tamamen iyileştim.", ["Dahiyane", "bir", "ameliyat", "ve", "zorlu", "bir", "rehabilitasyon", "sürecinden", "sonra", ",", "tamamen", "iyileştim", "."]),
+        ("Yaklaşık iki hafta süren bireysel erken oy kullanma döneminin ardından 5,7 milyondan fazla Floridalı sandık başına gitti.", ["Yaklaşık", "iki", "hafta", "süren", "bireysel", "erken", "oy", "kullanma", "döneminin", "ardından", "5,7", "milyondan", "fazla", "Floridalı", "sandık", "başına", "gitti", "."]),
+        ("Ancak, bu ABD Çevre Koruma Ajansı'nın dünyayı bu konularda uyarmasının ardından ortaya çıktı.", ["Ancak", ",", "bu", "ABD", "Çevre", "Koruma", "Ajansı'nın", "dünyayı", "bu", "konularda", "uyarmasının", "ardından", "ortaya", "çıktı", "."]),
+        ("Ortalama şansa ve 10.000 Sterlin değerinde tahvillere sahip bir yatırımcı yılda 125 Sterlin ikramiye kazanabilir.", ["Ortalama", "şansa", "ve", "10.000", "Sterlin", "değerinde", "tahvillere", "sahip", "bir", "yatırımcı", "yılda", "125", "Sterlin", "ikramiye", "kazanabilir", "."]),
+        ("Granit adaları; Seyşeller ve Tioman ile Saint Helena gibi volkanik adaları kapsar.", ["Granit", "adaları", ";", "Seyşeller", "ve", "Tioman", "ile", "Saint", "Helena", "gibi", "volkanik", "adaları", "kapsar", "."]),
+        ("Barış antlaşmasıyla İspanya, Amerika'ya Porto Riko, Guam ve Filipinler kolonilerini devretti.", ["Barış", "antlaşmasıyla", "İspanya", ",", "Amerika'ya", "Porto", "Riko", ",", "Guam", "ve", "Filipinler", "kolonilerini", "devretti", "."]),
+        ("Makedonya'nın sınır bölgelerini güvence altına alan Philip, büyük bir Makedon ordusu kurdu ve uzun bir fetih seferi için Trakya'ya doğru yürüdü.", ["Makedonya'nın", "sınır", "bölgelerini", "güvence", "altına", "alan", "Philip", ",", "büyük", "bir", "Makedon", "ordusu", "kurdu", "ve", "uzun", "bir", "fetih", "seferi", "için", "Trakya'ya", "doğru", "yürüdü", "."]),
+        ("Fransız gazetesi Le Figaro'ya göre bu hükumet planı sayesinde 42 milyon Euro kazanç sağlanabilir ve elde edilen paranın 15.5 milyonu ulusal güvenlik için kullanılabilir.", ["Fransız", "gazetesi", "Le", "Figaro'ya", "göre", "bu", "hükumet", "planı", "sayesinde", "42", "milyon", "Euro", "kazanç", "sağlanabilir", "ve", "elde", "edilen", "paranın", "15.5", "milyonu", "ulusal", "güvenlik", "için", "kullanılabilir", "."]),
+        ("Ortalama şansa ve 10.000 Sterlin değerinde tahvillere sahip bir yatırımcı yılda 125 Sterlin ikramiye kazanabilir.", ["Ortalama", "şansa", "ve", "10.000", "Sterlin", "değerinde", "tahvillere", "sahip", "bir", "yatırımcı", "yılda", "125", "Sterlin", "ikramiye", "kazanabilir", "."]),
+        ("3 Kasım Salı günü, Ankara Belediye Başkanı 2014'te hükümetle birlikte oluşturulan kentsel gelişim anlaşmasını askıya alma kararı verdi.", ["3", "Kasım", "Salı", "günü", ",", "Ankara", "Belediye", "Başkanı", "2014'te", "hükümetle", "birlikte", "oluşturulan", "kentsel", "gelişim", "anlaşmasını", "askıya", "alma", "kararı", "verdi", "."]),
+        ("Stalin, Abakumov'u Beria'nın enerji bakanlıkları üzerindeki baskınlığına karşı MGB içinde kendi ağını kurmaya teşvik etmeye başlamıştı.", ["Stalin", ",", "Abakumov'u", "Beria'nın", "enerji", "bakanlıkları", "üzerindeki", "baskınlığına", "karşı", "MGB", "içinde", "kendi", "ağını", "kurmaya", "teşvik", "etmeye", "başlamıştı", "."]),
+        ("Güney Avrupa'daki kazı alanlarının çoğunluğu gibi, bu bulgu M.Ö. 5. yüzyılın başlar", ["Güney", "Avrupa'daki", "kazı", "alanlarının", "çoğunluğu", "gibi", ",", "bu", "bulgu", "M.Ö.", "5.", "yüzyılın", "başlar"]),
+        ("Sağlığın bozulması Hitchcock hayatının son yirmi yılında üretimini azalttı.", ["Sağlığın", "bozulması", "Hitchcock", "hayatının", "son", "yirmi", "yılında", "üretimini", "azalttı", "."]),
+]
+
+
+
+TESTS = (ABBREV_TESTS + URL_TESTS + NUMBER_TESTS + PUNCT_TESTS + GENERAL_TESTS)
+
+
+
+@pytest.mark.parametrize("text,expected_tokens", TESTS)
+def test_tr_tokenizer_handles_allcases(tr_tokenizer, text, expected_tokens):
+    tokens = tr_tokenizer(text)
+    token_list = [token.text for token in tokens if not token.is_space]
+    print(token_list)
+    assert expected_tokens == token_list
+
--- a/spacy/tests/matcher/test_matcher_api.py
+++ b/spacy/tests/matcher/test_matcher_api.py
@ -457,6 +457,7 @@ def test_attr_pipeline_checks(en_vocab):
        ([{"IS_LEFT_PUNCT": True}], "``"),
        ([{"IS_RIGHT_PUNCT": True}], "''"),
        ([{"IS_STOP": True}], "the"),
+        ([{"SPACY": True}], "the"),
        ([{"LIKE_NUM": True}], "1"),
        ([{"LIKE_URL": True}], "http://example.com"),
        ([{"LIKE_EMAIL": True}], "mail@example.com"),
--- a/spacy/tests/package/test_requirements.py
+++ b/spacy/tests/package/test_requirements.py
@ -4,7 +4,9 @@ from pathlib import Path

 def test_build_dependencies():
    # Check that library requirements are pinned exactly the same across different setup files.
+    # TODO: correct checks for numpy rather than ignoring
    libs_ignore_requirements = [
+        "numpy",
        "pytest",
        "pytest-timeout",
        "mock",
@ -12,6 +14,7 @@ def test_build_dependencies():
    ]
    # ignore language-specific packages that shouldn't be installed by all
    libs_ignore_setup = [
+        "numpy",
        "fugashi",
        "natto-py",
        "pythainlp",
@ -67,7 +70,7 @@ def test_build_dependencies():
        line = line.strip().strip(",").strip('"')
        if not line.startswith("#"):
            lib, v = _parse_req(line)
-            if lib:
+            if lib and lib not in libs_ignore_requirements:
                req_v = req_dict.get(lib, None)
                assert (lib + v) == (lib + req_v), (
                    "{} has different version in pyproject.toml and in requirements.txt: "
--- a/spacy/tests/pipeline/test_entity_ruler.py
+++ b/spacy/tests/pipeline/test_entity_ruler.py
@ -197,3 +197,21 @@ def test_entity_ruler_overlapping_spans(nlp):
    doc = ruler(nlp.make_doc("foo bar baz"))
    assert len(doc.ents) == 1
    assert doc.ents[0].label_ == "FOOBAR"
+
+
+@pytest.mark.parametrize("n_process", [1, 2])
+def test_entity_ruler_multiprocessing(nlp, n_process):
+    texts = [
+        "I enjoy eating Pizza Hut pizza."
+    ]
+
+    patterns = [
+        {"label": "FASTFOOD", "pattern": "Pizza Hut", "id": "1234"}
+    ]
+
+    ruler = nlp.add_pipe("entity_ruler")
+    ruler.add_patterns(patterns)
+
+    for doc in nlp.pipe(texts, n_process=n_process):
+        for ent in doc.ents:
+            assert ent.ent_id_ == "1234"
--- a/spacy/tests/pipeline/test_lemmatizer.py
+++ b/spacy/tests/pipeline/test_lemmatizer.py
@ -1,4 +1,6 @@
 import pytest
+import logging
+import mock
 from spacy import util, registry
 from spacy.lang.en import English
 from spacy.lookups import Lookups
@ -54,9 +56,18 @@ def test_lemmatizer_config(nlp):
    lemmatizer = nlp.add_pipe("lemmatizer", config={"mode": "rule"})
    nlp.initialize()

+    # warning if no POS assigned
+    doc = nlp.make_doc("coping")
+    logger = logging.getLogger("spacy")
+    with mock.patch.object(logger, "warn") as mock_warn:
+        doc = lemmatizer(doc)
+        mock_warn.assert_called_once()
+
+    # works with POS
    doc = nlp.make_doc("coping")
-    doc[0].pos_ = "VERB"
    assert doc[0].lemma_ == ""
+    doc[0].pos_ = "VERB"
+    doc = lemmatizer(doc)
    doc = lemmatizer(doc)
    assert doc[0].text == "coping"
    assert doc[0].lemma_ == "cope"
--- a/spacy/tests/test_cli.py
+++ b/spacy/tests/test_cli.py
@ -8,7 +8,7 @@ from spacy.cli.init_config import init_config, RECOMMENDATIONS
 from spacy.cli._util import validate_project_commands, parse_config_overrides
 from spacy.cli._util import load_project_config, substitute_project_variables
 from spacy.cli._util import string_to_list
-from thinc.api import ConfigValidationError
+from thinc.api import ConfigValidationError, Config
 import srsly
 import os

@ -368,7 +368,8 @@ def test_parse_cli_overrides():
@pytest.mark.parametrize("optimize", ["efficiency", "accuracy"])
 def test_init_config(lang, pipeline, optimize):
    # TODO: add more tests and also check for GPU with transformers
-    init_config("-", lang=lang, pipeline=pipeline, optimize=optimize, gpu=False)
+    config = init_config(lang=lang, pipeline=pipeline, optimize=optimize, gpu=False)
+    assert isinstance(config, Config)


 def test_model_recommendations():
--- a/spacy/tokenizer.pyx
+++ b/spacy/tokenizer.pyx
@ -404,9 +404,7 @@ cdef class Tokenizer:
        cdef unicode minus_suf
        cdef size_t last_size = 0
        while string and len(string) != last_size:
-            if self.token_match and self.token_match(string) \
-                    and not self.find_prefix(string) \
-                    and not self.find_suffix(string):
+            if self.token_match and self.token_match(string):
                break
            if with_special_cases and self._specials.get(hash_string(string)) != NULL:
                break
@ -679,6 +677,8 @@ cdef class Tokenizer:
                            break
                        suffixes.append(("SUFFIX", substring[split:]))
                        substring = substring[:split]
+                if len(substring) == 0:
+                    continue
                if token_match(substring):
                    tokens.append(("TOKEN_MATCH", substring))
                    substring = ''
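The first tokenizer hunk changes precedence. Previously, a `token_match` hit stopped affix splitting only when neither `find_prefix` nor `find_suffix` matched. Now a hit stops splitting immediately. The second hunk, in the loop that labels pieces as `"SUFFIX"` and `"TOKEN_MATCH"`, skips `token_match` once prefix and suffix splitting have consumed the whole substring.

The sketch below is a heavily simplified pure-Python model of that behavior. `split_affixes` and the three regular expressions are hypothetical. spaCy's real loop also handles special cases, infixes and other details omitted here.

```python
import re
from typing import Callable, List


def split_affixes(text: str,
                  token_match: Callable[[str], object],
                  prefix_re: re.Pattern,
                  suffix_re: re.Pattern) -> List[str]:
    """Peel prefixes and suffixes off `text`, stopping early on a token_match hit."""
    prefixes: List[str] = []
    suffixes: List[str] = []
    last_size = -1
    while text and len(text) != last_size:
        # New precedence: a token_match hit wins even if an affix regex also matches.
        if token_match(text):
            break
        last_size = len(text)
        m = prefix_re.search(text)
        if m and m.group():
            prefixes.append(m.group())
            text = text[m.end():]
            continue
        m = suffix_re.search(text)
        if m and m.group():
            suffixes.insert(0, m.group())
            text = text[: m.start()]
    # Like the new `len(substring) == 0` guard: never emit an empty core token.
    return prefixes + ([text] if text else []) + suffixes


TOKEN_MATCH = re.compile(r"^www\.\S+$").match
PREFIX_RE = re.compile(r'^[("]')
SUFFIX_RE = re.compile(r"[.!?)]$")
```

Under the new precedence, `"www.example.com."` stays a single token because `token_match` fires before the trailing `.` is considered. `"hello."` still loses its suffix. A lone `"!"` leaves an empty core, which is simply dropped.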
--- a/spacy/tokens/_retokenize.pyx
+++ b/spacy/tokens/_retokenize.pyx
@ -11,7 +11,7 @@ from .span cimport Span
 from .token cimport Token
 from ..lexeme cimport Lexeme, EMPTY_LEXEME
 from ..structs cimport LexemeC, TokenC
-from ..attrs cimport MORPH
+from ..attrs cimport MORPH, NORM
 from ..vocab cimport Vocab

 from .underscore import is_writable_attr
@ -372,9 +372,10 @@ def _split(Doc doc, int token_index, orths, heads, attrs):
                # Set attributes on both token and lexeme to take care of token
                # attribute vs. lexical attribute without having to enumerate
                # them. If an attribute name is not valid, set_struct_attr will
-                # ignore it.
+                # ignore it. Exception: set NORM only on tokens.
                Token.set_struct_attr(token, attr_name, get_string_id(attr_value))
-                Lexeme.set_struct_attr(<LexemeC*>token.lex, attr_name, get_string_id(attr_value))
+                if attr_name != NORM:
+                    Lexeme.set_struct_attr(<LexemeC*>token.lex, attr_name, get_string_id(attr_value))
    # Assign correct dependencies to the inner token
    for i, head in enumerate(heads):
        doc.c[token_index + i].head = head
@ -435,6 +436,7 @@ def set_token_attrs(Token py_token, attrs):
            # Set attributes on both token and lexeme to take care of token
            # attribute vs. lexical attribute without having to enumerate
            # them. If an attribute name is not valid, set_struct_attr will
-            # ignore it.
+            # ignore it. Exception: set NORM only on tokens.
            Token.set_struct_attr(token, attr_name, attr_value)
-            Lexeme.set_struct_attr(<LexemeC*>lex, attr_name, attr_value)
+            if attr_name != NORM:
+                Lexeme.set_struct_attr(<LexemeC*>lex, attr_name, attr_value)
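Both retokenizer hunks now write `NORM` only to the token, not to the shared lexeme. Overriding the norm of one merged or split token therefore no longer changes the norm of every other token with the same text. The new case in `test_doc_retokenizer_merge_lex_attrs` checks `en_vocab["eins"].norm_` for exactly this.

The dictionary-based sketch below models the rule. `Lexeme`, `Token` and `TOKEN_ONLY_ATTRS` are simplified stand-ins, not spaCy's `LexemeC`/`TokenC` structs.

```python
from dataclasses import dataclass, field
from typing import Any, Dict

# Attributes written to the token only; everything else is also written to the lexeme.
TOKEN_ONLY_ATTRS = {"NORM"}


@dataclass
class Lexeme:
    text: str
    attrs: Dict[str, Any] = field(default_factory=dict)


@dataclass
class Token:
    lex: Lexeme  # shared by every token with the same text
    attrs: Dict[str, Any] = field(default_factory=dict)


def set_token_attrs(token: Token, attrs: Dict[str, Any]) -> None:
    for name, value in attrs.items():
        token.attrs[name] = value
        if name not in TOKEN_ONLY_ATTRS:
            token.lex.attrs[name] = value


vocab = {"eins": Lexeme("eins", {"NORM": "eins"})}
first, second = Token(vocab["eins"]), Token(vocab["eins"])
set_token_attrs(first, {"NORM": "1", "IS_STOP": True})
```

The token-level `NORM` changes while the lexeme keeps `"eins"`. A lexical attribute such as `IS_STOP` still reaches every token that shares the lexeme.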
--- a/spacy/typedefs.pxd
+++ b/spacy/typedefs.pxd
@ -5,7 +5,6 @@ from libc.stdint cimport uint8_t
 ctypedef float weight_t
 ctypedef uint64_t hash_t
 ctypedef uint64_t class_t
-ctypedef char* utf8_t
 ctypedef uint64_t attr_t
 ctypedef uint64_t flags_t
 ctypedef uint16_t len_t
--- a/spacy/util.py
+++ b/spacy/util.py
@ -1295,6 +1295,13 @@ def combine_score_weights(


 class DummyTokenizer:
+    def __call__(self, text):
+        raise NotImplementedError
+
+    def pipe(self, texts, **kwargs):
+        for text in texts:
+            yield self(text)
+
    # add dummy methods for to_bytes, from_bytes, to_disk and from_disk to
    # allow serialization (see #1557)
    def to_bytes(self, **kwargs):
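`DummyTokenizer` now declares `__call__`, which raises `NotImplementedError`, and a default `pipe` that calls `self(text)` for each text. A custom tokenizer built on it therefore only has to implement `__call__` to support batched use.

The sketch below mirrors those two methods in a standalone class so that it runs without spaCy. `DummyTokenizerSketch` and `WhitespaceTokenizer` are hypothetical names. The sketch returns plain lists of strings, whereas a real tokenizer returns `Doc` objects.

```python
from typing import Iterable, Iterator, List


class DummyTokenizerSketch:
    # Mirrors the two methods added to spacy.util.DummyTokenizer in this diff.
    def __call__(self, text: str):
        raise NotImplementedError

    def pipe(self, texts: Iterable[str], **kwargs) -> Iterator:
        for text in texts:
            yield self(text)


class WhitespaceTokenizer(DummyTokenizerSketch):
    def __call__(self, text: str) -> List[str]:
        return text.split()
```

Because `pipe` is a generator, texts are tokenized lazily as the caller iterates.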
--- a/spacy/vocab.pxd
+++ b/spacy/vocab.pxd
@ -4,7 +4,7 @@ from cymem.cymem cimport Pool
 from murmurhash.mrmr cimport hash64

 from .structs cimport LexemeC, TokenC
-from .typedefs cimport utf8_t, attr_t, hash_t
+from .typedefs cimport attr_t, hash_t
 from .strings cimport StringStore
 from .morphology cimport Morphology

--- a/spacy/vocab.pyx
+++ b/spacy/vocab.pyx
@ -305,6 +305,9 @@ cdef class Vocab:
        DOCS: https://nightly.spacy.io/api/vocab#prune_vectors
        """
        xp = get_array_module(self.vectors.data)
+        # Make sure all vectors are in the vocab
+        for orth in self.vectors:
+            self[orth]
        # Make prob negative so it sorts by rank ascending
        # (key2row contains the rank)
        priority = [(-lex.prob, self.vectors.key2row[lex.orth], lex.orth)
--- a/website/docs/api/matcher.md
+++ b/website/docs/api/matcher.md
@ -39,7 +39,9 @@ rule-based matching are:
 |  `IS_ALPHA`, `IS_ASCII`, `IS_DIGIT`             | Token text consists of alphabetic characters, ASCII characters, digits. ~~bool~~                                          |
 |  `IS_LOWER`, `IS_UPPER`, `IS_TITLE`             | Token text is in lowercase, uppercase, titlecase. ~~bool~~                                                                |
 |  `IS_PUNCT`, `IS_SPACE`, `IS_STOP`              | Token is punctuation, whitespace, stop word. ~~bool~~                                                                     |
+|  `IS_SENT_START`                                | Token is start of sentence. ~~bool~~                                                                                      |
 |  `LIKE_NUM`, `LIKE_URL`, `LIKE_EMAIL`           | Token text resembles a number, URL, email. ~~bool~~                                                                       |
+| `SPACY`                                         | Token has a trailing space. ~~bool~~                                                                                      |
 |  `POS`, `TAG`, `MORPH`, `DEP`, `LEMMA`, `SHAPE` | The token's simple and extended part-of-speech tag, morphological analysis, dependency label, lemma, shape. ~~str~~       |
 | `ENT_TYPE`                                      | The token's entity label. ~~str~~                                                                                         |
 | `_` <Tag variant="new">2.1</Tag>                | Properties in [custom extension attributes](/usage/processing-pipelines#custom-components-attributes). ~~Dict[str, Any]~~ |
@ -61,7 +63,7 @@ matched:
 | `!` | Negate the pattern, by requiring it to match exactly 0 times.    |
 | `?` | Make the pattern optional, by allowing it to match 0 or 1 times. |
 | `+` | Require the pattern to match 1 or more times.                    |
-| `*` | Allow the pattern to match 0 or more times.                   |
+| `*` | Allow the pattern to match 0 or more times.                      |

 Token patterns can also map to a **dictionary of properties** instead of a
 single value to indicate whether the expected value is a member of a list or how
--- a/website/docs/usage/rule-based-matching.md
+++ b/website/docs/usage/rule-based-matching.md
@ -158,21 +158,22 @@ The available token pattern keys correspond to a number of
 [`Token` attributes](/api/token#attributes). The supported attributes for
 rule-based matching are:

-| Attribute                                       |  Description                                                                                                              |
-| ----------------------------------------------- | ------------------------------------------------------------------------------------------------------------------------- |
-| `ORTH`                                          | The exact verbatim text of a token. ~~str~~                                                                               |
-| `TEXT` <Tag variant="new">2.1</Tag>             | The exact verbatim text of a token. ~~str~~                                                                               |
-| `LOWER`                                         | The lowercase form of the token text. ~~str~~                                                                             |
-|  `LENGTH`                                       | The length of the token text. ~~int~~                                                                                     |
-|  `IS_ALPHA`, `IS_ASCII`, `IS_DIGIT`             | Token text consists of alphabetic characters, ASCII characters, digits. ~~bool~~                                          |
-|  `IS_LOWER`, `IS_UPPER`, `IS_TITLE`             | Token text is in lowercase, uppercase, titlecase. ~~bool~~                                                                |
-|  `IS_PUNCT`, `IS_SPACE`, `IS_STOP`              | Token is punctuation, whitespace, stop word. ~~bool~~                                                                     |
-|  `IS_SENT_START`                                | Token is start of sentence. ~~bool~~                                                                                      |
-|  `LIKE_NUM`, `LIKE_URL`, `LIKE_EMAIL`           | Token text resembles a number, URL, email. ~~bool~~                                                                       |
-|  `POS`, `TAG`, `MORPH`, `DEP`, `LEMMA`, `SHAPE` | The token's simple and extended part-of-speech tag, morphological analysis, dependency label, lemma, shape. ~~str~~       |
-| `ENT_TYPE`                                      | The token's entity label. ~~str~~                                                                                         |
-| `_` <Tag variant="new">2.1</Tag>                | Properties in [custom extension attributes](/usage/processing-pipelines#custom-components-attributes). ~~Dict[str, Any]~~ |
-| `OP`                                            | [Operator or quantifier](#quantifiers) to determine how often to match a token pattern. ~~str~~                           |
+| Attribute                                       |  Description                                                                                                                                                                                                                                                                                              |
+| ----------------------------------------------- | --------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------- |
+| `ORTH`                                          | The exact verbatim text of a token. ~~str~~                                                                                                                                                                                                                                                               |
+| `TEXT` <Tag variant="new">2.1</Tag>             | The exact verbatim text of a token. ~~str~~                                                                                                                                                                                                                                                               |
+| `LOWER`                                         | The lowercase form of the token text. ~~str~~                                                                                                                                                                                                                                                             |
+|  `LENGTH`                                       | The length of the token text. ~~int~~                                                                                                                                                                                                                                                                     |
+|  `IS_ALPHA`, `IS_ASCII`, `IS_DIGIT`             | Token text consists of alphabetic characters, ASCII characters, digits. ~~bool~~                                                                                                                                                                                                                          |
+|  `IS_LOWER`, `IS_UPPER`, `IS_TITLE`             | Token text is in lowercase, uppercase, titlecase. ~~bool~~                                                                                                                                                                                                                                                |
+|  `IS_PUNCT`, `IS_SPACE`, `IS_STOP`              | Token is punctuation, whitespace, stop word. ~~bool~~                                                                                                                                                                                                                                                     |
+|  `IS_SENT_START`                                | Token is start of sentence. ~~bool~~                                                                                                                                                                                                                                                                      |
+|  `LIKE_NUM`, `LIKE_URL`, `LIKE_EMAIL`           | Token text resembles a number, URL, email. ~~bool~~                                                                                                                                                                                                                                                       |
+| `SPACY`                                         | Token has a trailing space. ~~bool~~                                                                                                                                                                                                                                                                      |
+|  `POS`, `TAG`, `MORPH`, `DEP`, `LEMMA`, `SHAPE` | The token's simple and extended part-of-speech tag, morphological analysis, dependency label, lemma, shape. Note that the values of these attributes are case-sensitive. For a list of available part-of-speech tags and dependency labels, see the [Annotation Specifications](/api/annotation). ~~str~~ |
+| `ENT_TYPE`                                      | The token's entity label. ~~str~~                                                                                                                                                                                                                                                                         |
+| `_` <Tag variant="new">2.1</Tag>                | Properties in [custom extension attributes](/usage/processing-pipelines#custom-components-attributes). ~~Dict[str, Any]~~                                                                                                                                                                                 |
+| `OP`                                            | [Operator or quantifier](#quantifiers) to determine how often to match a token pattern. ~~str~~                                                                                                                                                                                                           |
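
To see how these keys combine in practice, here is a minimal sketch. It assumes spaCy v3 (v2.2.2+ also accepts a list of patterns in `Matcher.add`) and a blank English pipeline, whose default tokenizer splits a hyphen between two letters into its own token. The match key `HYPHENATED_NO_SPACE` and the sample sentence are illustrative only. The `SPACY` key is what separates the attached form `New-York` from the spaced form `New - York`.

```python
import spacy
from spacy.matcher import Matcher

# Blank English pipeline: tokenizer only, so no trained model is required.
nlp = spacy.blank("en")
matcher = Matcher(nlp.vocab)

# Hypothetical pattern: word + hyphen + word, written without spaces.
# "SPACY": False requires that the token has NO trailing whitespace.
pattern = [
    {"IS_ALPHA": True, "SPACY": False},
    {"ORTH": "-", "SPACY": False},
    {"IS_ALPHA": True},
]
matcher.add("HYPHENATED_NO_SPACE", [pattern])

doc = nlp("We visited New-York, not New - York.")
# The matcher returns (match_id, start, end) tuples of token indices.
spans = [doc[start:end].text for _, start, end in matcher(doc)]
print(spans)
```

Removing `"SPACY": False` from the first two token dicts would make the spaced form `New - York` match as well.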

 <Accordion title="Does it matter if the attribute names are uppercase or lowercase?">

--- a/website/meta/languages.json
+++ b/website/meta/languages.json
@ -199,6 +199,36 @@
            "name": "Vietnamese",
            "dependencies": [{ "name": "Pyvi", "url": "https://github.com/trungtv/pyvi" }]
        },
+        {
+            "code": "lij",
+            "name": "Ligurian",
+            "example": "Sta chì a l'é unna fraxe.",
+            "has_examples": true
+        },
+        {
+            "code": "hy",
+            "name": "Armenian",
+            "has_examples": true
+        },
+        {
+            "code": "gu",
+            "name": "Gujarati",
+            "has_examples": true
+        },
+        {
+            "code": "ml",
+            "name": "Malayalam",
+            "has_examples": true
+        },
+        {
+            "code": "ne",
+            "name": "Nepali",
+            "has_examples": true
+        },
+        {
+            "code": "mk",
+            "name": "Macedonian"
+        },
        {
            "code": "xx",
            "name": "Multi-language",
--- a/website/meta/universe.json
+++ b/website/meta/universe.json
@ -1,5 +1,36 @@
 {
    "resources": [
+        {
+            "id": "spacy-textblob",
+            "title": "spaCyTextBlob",
+            "slogan": "Easy sentiment analysis for spaCy using TextBlob",
+            "description": "spaCyTextBlob is a pipeline component that enables sentiment analysis using the [TextBlob](https://github.com/sloria/TextBlob) library. It adds the extension attribute `._.sentiment` to `Doc`, `Span`, and `Token` objects.",
+            "github": "SamEdwardes/spaCyTextBlob",
+            "pip": "spacytextblob",
+            "code_example": [
+            "import spacy",
+            "from spacytextblob.spacytextblob import SpacyTextBlob",
+            "",
+            "nlp = spacy.load('en_core_web_sm')",
+            "spacy_text_blob = SpacyTextBlob()",
+            "nlp.add_pipe(spacy_text_blob)",
+            "text = 'I had a really horrible day. It was the worst day ever! But every now and then I have a really good day that makes me happy.'",
+            "doc = nlp(text)",
+            "doc._.sentiment.polarity      # Polarity: -0.125",
+            "doc._.sentiment.subjectivity  # Subjectivity: 0.9",
+            "doc._.sentiment.assessments   # Assessments: [(['really', 'horrible'], -1.0, 1.0, None), (['worst', '!'], -1.0, 1.0, None), (['really', 'good'], 0.7, 0.6000000000000001, None), (['happy'], 0.8, 1.0, None)]"
+            ],
+            "code_language": "python",
+            "url": "https://spacytextblob.netlify.app/",
+            "author": "Sam Edwardes",
+            "author_links": {
+                "twitter": "TheReaLSamlam",
+                "github": "SamEdwardes",
+                "website": "https://samedwardes.com"
+            },
+            "category": ["pipeline"],
+            "tags": ["sentiment", "textblob"]
+        },
        {
            "id": "spacy-ray",
            "title": "spacy-ray",
@ -788,6 +819,22 @@
            "category": ["conversational"],
            "tags": ["chatbots"]
        },
+        {
+            "id": "mindmeld",
+            "title": "MindMeld - Conversational AI platform",
+            "slogan": "Conversational AI platform for deep-domain voice interfaces and chatbots",
+            "description": "The MindMeld Conversational AI platform is a Python-based machine learning framework for building production-quality conversational applications. It encompasses all of the algorithms and utilities required for this purpose.",
+            "github": "cisco/mindmeld",
+            "pip": "mindmeld",
+            "thumb": "https://www.mindmeld.com/img/mindmeld-logo.png",
+            "category": ["conversational", "ner"],
+            "tags": ["chatbots"],
+            "author": "Cisco",
+            "author_links": {
+                "github": "cisco/mindmeld",
+                "website": "https://www.mindmeld.com/"
+            }
+        },
        {
            "id": "torchtext",
            "title": "torchtext",
@ -1648,7 +1695,7 @@
                "",
                "nlp = spacy.load('en')",
                "nlp.add_pipe(BeneparComponent('benepar_en'))",
-                "doc = nlp('The time for action is now. It's never too late to do something.')",
+                "doc = nlp('The time for action is now. It is never too late to do something.')",
                "sent = list(doc.sents)[0]",
                "print(sent._.parse_string)",
                "# (S (NP (NP (DT The) (NN time)) (PP (IN for) (NP (NN action)))) (VP (VBZ is) (ADVP (RB now))) (. .))",
@ -2527,14 +2574,14 @@
            "description": "A spaCy rule-based pipeline for identifying positive cases of COVID-19 from clinical text. A version of this system was deployed as part of the US Department of Veterans Affairs biosurveillance response to COVID-19.",
            "pip": "cov-bsv",
            "code_example": [
-                "import cov_bsv",
-                "",
-                "nlp = cov_bsv.load()",
-                "text = 'Pt tested for COVID-19. His wife was recently diagnosed with novel coronavirus. SARS-COV-2: Detected'",
-                "",
-                "print(doc.ents)",
-                "print(doc._.cov_classification)",
-                "cov_bsv.visualize_doc(doc)"
+              "import cov_bsv",
+              "",
+              "nlp = cov_bsv.load()",
+              "doc = nlp('Pt tested for COVID-19. His wife was recently diagnosed with novel coronavirus. SARS-COV-2: Detected')",
+              "",
+              "print(doc.ents)",
+              "print(doc._.cov_classification)",
+              "cov_bsv.visualize_doc(doc)"
            ],
            "category": ["pipeline", "standalone", "biomedical", "scientific"],
            "tags": ["clinical", "epidemiology", "covid-19", "surveillance"],
@ -2542,6 +2589,35 @@
            "author_links": {
                "github": "abchapman93"
            }
+        },
+        {
+            "id": "medspacy",
+            "title": "medspaCy",
+            "thumb": "https://raw.githubusercontent.com/medspacy/medspacy/master/images/medspacy_logo.png",
+            "slogan": "A toolkit for clinical NLP with spaCy.",
+            "github": "medspacy/medspacy",
+            "description": "A toolkit for clinical NLP with spaCy. Features include sentence splitting, section detection, and assertion detection for negation, family history, and uncertainty.",
+            "pip": "medspacy",
+            "code_example": [
+              "import medspacy",
+              "from medspacy.ner import TargetRule",
+              "",
+              "nlp = medspacy.load()",
+              "print(nlp.pipe_names)",
+              "",
+              "nlp.get_pipe('target_matcher').add([TargetRule('stroke', 'CONDITION'), TargetRule('diabetes', 'CONDITION'), TargetRule('pna', 'CONDITION')])",
+              "doc = nlp('Patient has hx of stroke. Mother diagnosed with diabetes. No evidence of pna.')",
+              "",
+              "for ent in doc.ents:",
+              "    print(ent, ent._.is_negated, ent._.is_family, ent._.is_historical)",
+              "medspacy.visualization.visualize_ent(doc)"
+            ],
+            "category": ["biomedical", "scientific", "research"],
+            "tags": ["clinical"],
+            "author": "medspacy",
+            "author_links": {
+                "github": "medspacy"
+            }
        },
 	      {
            "id": "rita-dsl",
@ -2578,6 +2654,32 @@
            "author_links": {
                "github": "zaibacu"
            }
+        },
+        {
+            "id": "PatternOmatic",
+            "title": "PatternOmatic",
+            "slogan": "Finds linguistic patterns effortlessly",
+            "description": "Discovers linguistic patterns that match a given set of string samples, for use with spaCy's rule-based Matcher.",
+            "github": "revuel/PatternOmatic",
+            "pip": "PatternOmatic",
+            "code_example": [
+                "from PatternOmatic.api import find_patterns",
+                "",
+                "samples = ['I am a cat!', 'You are a dog!', 'She is an owl!']",
+                "",
+                "patterns_found, _ = find_patterns(samples)",
+                "",
+                "print(f'Patterns found: {patterns_found}')"
+            ],
+            "code_language": "python",
+            "thumb": "https://svgshare.com/i/R3P.svg",
+            "image": "https://svgshare.com/i/R3P.svg",
+            "author": "Miguel Revuelta Espinosa",
+            "author_links": {
+                "github": "revuel"
+            },
+            "category": ["scientific", "research", "standalone"],
+            "tags": ["Evolutionary Computation", "Grammatical Evolution"]
        }
    ],

--- a/website/src/widgets/landing.js
+++ b/website/src/widgets/landing.js
@ -207,42 +207,49 @@ const Landing = ({ data }) => {

            <LandingBannerGrid>
                <LandingBanner
-                    to="https://course.spacy.io"
-                    button="Start the course"
-                    background="#f6f6f6"
-                    color="#252a33"
+                    title="spaCy v3.0 nightly: Transformer-based pipelines, new training system, project templates &amp; more"
+                    label="Try the pre-release"
+                    to="https://nightly.spacy.io"
+                    button="See what's new"
+                    background="#8758fe"
+                    color="#ffffff"
                    small
                >
-                    <Link to="https://course.spacy.io" hidden>
+                    spaCy v3.0 features all-new <strong>transformer-based pipelines</strong> that
+                    bring spaCy's accuracy right up to the current <strong>state-of-the-art</strong>
+                    . You can use any pretrained transformer to train your own pipelines, and even
+                    share one transformer between multiple components with{' '}
+                    <strong>multi-task learning</strong>. Training is now fully configurable and
+                    extensible, and you can define your own custom models using{' '}
+                    <strong>PyTorch</strong>, <strong>TensorFlow</strong> and other frameworks. The
+                    new spaCy projects system lets you describe whole{' '}
+                    <strong>end-to-end workflows</strong> in a single file, giving you a smooth path
+                    from prototype to production and making it simple to clone and adapt
+                    best-practice projects for your own use cases.
+                </LandingBanner>
+
+                <LandingBanner
+                    title="Prodigy: Radically efficient machine teaching"
+                    label="From the makers of spaCy"
+                    to="https://prodi.gy"
+                    button="Try it out"
+                    background="#f6f6f6"
+                    color="#000"
+                    small
+                >
+                    <Link to="https://prodi.gy" hidden>
                        <img
-                            src={courseImage}
-                            alt="Advanced NLP with spaCy: A free online course"
+                            src={prodigyImage}
+                            alt="Prodigy: Radically efficient machine teaching"
                        />
                    </Link>
                    <br />
                    <br />
-                    In this <strong>free and interactive online course</strong> you’ll learn how to
-                    use spaCy to build advanced natural language understanding systems, using both
-                    rule-based and machine learning approaches. It includes{' '}
-                    <strong>55 exercises</strong> featuring videos, slide decks, multiple-choice
-                    questions and interactive coding practice in the browser.
-                </LandingBanner>
-                <LandingBanner
-                    title="spaCy IRL: Two days of NLP"
-                    label="Watch the videos"
-                    to="https://www.youtube.com/playlist?list=PLBmcuObd5An4UC6jvK_-eSl6jCvP1gwXc"
-                    button="Watch the videos"
-                    background="#ffc194"
-                    backgroundImage={irlBackground}
-                    color="#1a1e23"
-                    small
-                >
-                    We were pleased to invite the spaCy community and other folks working on NLP to
-                    Berlin for a small and intimate event. We booked a beautiful venue, hand-picked
-                    an awesome lineup of speakers and scheduled plenty of social time to get to know
-                    each other. The YouTube playlist includes 12 talks about NLP research,
-                    development and applications, with keynotes by Sebastian Ruder (DeepMind) and
-                    Yoav Goldberg (Allen AI).
+                    Prodigy is an <strong>annotation tool</strong> so efficient that data scientists
+                    can do the annotation themselves, enabling a new level of rapid iteration.
+                    Whether you're working on entity recognition, intent detection or image
+                    classification, Prodigy can help you <strong>train and evaluate</strong> your
+                    models faster.
                </LandingBanner>
            </LandingBannerGrid>