Merge branch 'master' into spacy.io

2025-12-20 00:24:39 +03:00 · 2020-05-21 21:01:32 +02:00 · 2020-05-21 21:01:32 +02:00 · dc94052d6e
commit dc94052d6e
parent 5753b43e60 5ce02c1b17
233 changed files with 11750 additions and 11008 deletions
--- a/.github/contributors/Baciccin.md
+++ b/.github/contributors/Baciccin.md
@@ -0,0 +1,106 @@
+# spaCy contributor agreement
+
+This spaCy Contributor Agreement (**"SCA"**) is based on the
+[Oracle Contributor Agreement](http://www.oracle.com/technetwork/oca-405177.pdf).
+The SCA applies to any contribution that you make to any product or project
+managed by us (the **"project"**), and sets out the intellectual property rights
+you grant to us in the contributed materials. The term **"us"** shall mean
+[ExplosionAI GmbH](https://explosion.ai/legal). The term
+**"you"** shall mean the person or entity identified below.
+
+If you agree to be bound by these terms, fill in the information requested
+below and include the filled-in version with your first pull request, under the
+folder [`.github/contributors/`](/.github/contributors/). The name of the file
+should be your GitHub username, with the extension `.md`. For example, the user
+example_user would create the file `.github/contributors/example_user.md`.
+
+Read this agreement carefully before signing. These terms and conditions
+constitute a binding legal agreement.
+
+## Contributor Agreement
+
+1. The term "contribution" or "contributed materials" means any source code,
+object code, patch, tool, sample, graphic, specification, manual,
+documentation, or any other material posted or submitted by you to the project.
+
+2. With respect to any worldwide copyrights, or copyright applications and
+registrations, in your contribution:
+
+    * you hereby assign to us joint ownership, and to the extent that such
+    assignment is or becomes invalid, ineffective or unenforceable, you hereby
+    grant to us a perpetual, irrevocable, non-exclusive, worldwide, no-charge,
+    royalty-free, unrestricted license to exercise all rights under those
+    copyrights. This includes, at our option, the right to sublicense these same
+    rights to third parties through multiple levels of sublicensees or other
+    licensing arrangements;
+
+    * you agree that each of us can do all things in relation to your
+    contribution as if each of us were the sole owners, and if one of us makes
+    a derivative work of your contribution, the one who makes the derivative
+    work (or has it made) will be the sole owner of that derivative work;
+
+    * you agree that you will not assert any moral rights in your contribution
+    against us, our licensees or transferees;
+
+    * you agree that we may register a copyright in your contribution and
+    exercise all ownership rights associated with it; and
+
+    * you agree that neither of us has any duty to consult with, obtain the
+    consent of, pay or render an accounting to the other for any use or
+    distribution of your contribution.
+
+3. With respect to any patents you own, or that you can license without payment
+to any third party, you hereby grant to us a perpetual, irrevocable,
+non-exclusive, worldwide, no-charge, royalty-free license to:
+
+    * make, have made, use, sell, offer to sell, import, and otherwise transfer
+    your contribution in whole or in part, alone or in combination with or
+    included in any product, work or materials arising out of the project to
+    which your contribution was submitted, and
+
+    * at our option, to sublicense these same rights to third parties through
+    multiple levels of sublicensees or other licensing arrangements.
+
+4. Except as set out above, you keep all right, title, and interest in your
+contribution. The rights that you grant to us under these terms are effective
+on the date you first submitted a contribution to us, even if your submission
+took place before the date you sign these terms.
+
+5. You covenant, represent, warrant and agree that:
+
+    * Each contribution that you submit is and shall be an original work of
+    authorship and you can legally grant the rights set out in this SCA;
+
+    * to the best of your knowledge, each contribution will not violate any
+    third party's copyrights, trademarks, patents, or other intellectual
+    property rights; and
+
+    * each contribution shall be in compliance with U.S. export control laws and
+    other applicable export and import laws. You agree to notify us if you
+    become aware of any circumstance which would make any of the foregoing
+    representations inaccurate in any respect. We may publicly disclose your
+    participation in the project, including the fact that you have signed the SCA.
+
+6. This SCA is governed by the laws of the State of California and applicable
+U.S. Federal law. Any choice of law rules will not apply.
+
+7. Please place an “x” on one of the applicable statements below. Please do NOT
+mark both statements:
+
+    * [x] I am signing on behalf of myself as an individual and no other person
+    or entity, including my employer, has or will have rights with respect to my
+    contributions.
+
+    * [ ] I am signing on behalf of my employer or a legal entity and I have the
+    actual authority to contractually bind that entity.
+
+## Contributor Details
+
+| Field                          | Entry                    |
+|------------------------------- | ------------------------ |
+| Name                           | Giovanni Battista Parodi |
+| Company name (if applicable)   |                          |
+| Title or role (if applicable)  |                          |
+| Date                           | 2020-03-19               |
+| GitHub username                | Baciccin                 |
+| Website (optional)             |                          |
--- a/.github/contributors/MiniLau.md
+++ b/.github/contributors/MiniLau.md
@@ -0,0 +1,106 @@
+# spaCy contributor agreement
+
+This spaCy Contributor Agreement (**"SCA"**) is based on the
+[Oracle Contributor Agreement](http://www.oracle.com/technetwork/oca-405177.pdf).
+The SCA applies to any contribution that you make to any product or project
+managed by us (the **"project"**), and sets out the intellectual property rights
+you grant to us in the contributed materials. The term **"us"** shall mean
+[ExplosionAI GmbH](https://explosion.ai/legal). The term
+**"you"** shall mean the person or entity identified below.
+
+If you agree to be bound by these terms, fill in the information requested
+below and include the filled-in version with your first pull request, under the
+folder [`.github/contributors/`](/.github/contributors/). The name of the file
+should be your GitHub username, with the extension `.md`. For example, the user
+example_user would create the file `.github/contributors/example_user.md`.
+
+Read this agreement carefully before signing. These terms and conditions
+constitute a binding legal agreement.
+
+## Contributor Agreement
+
+1. The term "contribution" or "contributed materials" means any source code,
+object code, patch, tool, sample, graphic, specification, manual,
+documentation, or any other material posted or submitted by you to the project.
+
+2. With respect to any worldwide copyrights, or copyright applications and
+registrations, in your contribution:
+
+    * you hereby assign to us joint ownership, and to the extent that such
+    assignment is or becomes invalid, ineffective or unenforceable, you hereby
+    grant to us a perpetual, irrevocable, non-exclusive, worldwide, no-charge,
+    royalty-free, unrestricted license to exercise all rights under those
+    copyrights. This includes, at our option, the right to sublicense these same
+    rights to third parties through multiple levels of sublicensees or other
+    licensing arrangements;
+
+    * you agree that each of us can do all things in relation to your
+    contribution as if each of us were the sole owners, and if one of us makes
+    a derivative work of your contribution, the one who makes the derivative
+    work (or has it made) will be the sole owner of that derivative work;
+
+    * you agree that you will not assert any moral rights in your contribution
+    against us, our licensees or transferees;
+
+    * you agree that we may register a copyright in your contribution and
+    exercise all ownership rights associated with it; and
+
+    * you agree that neither of us has any duty to consult with, obtain the
+    consent of, pay or render an accounting to the other for any use or
+    distribution of your contribution.
+
+3. With respect to any patents you own, or that you can license without payment
+to any third party, you hereby grant to us a perpetual, irrevocable,
+non-exclusive, worldwide, no-charge, royalty-free license to:
+
+    * make, have made, use, sell, offer to sell, import, and otherwise transfer
+    your contribution in whole or in part, alone or in combination with or
+    included in any product, work or materials arising out of the project to
+    which your contribution was submitted, and
+
+    * at our option, to sublicense these same rights to third parties through
+    multiple levels of sublicensees or other licensing arrangements.
+
+4. Except as set out above, you keep all right, title, and interest in your
+contribution. The rights that you grant to us under these terms are effective
+on the date you first submitted a contribution to us, even if your submission
+took place before the date you sign these terms.
+
+5. You covenant, represent, warrant and agree that:
+
+    * Each contribution that you submit is and shall be an original work of
+    authorship and you can legally grant the rights set out in this SCA;
+
+    * to the best of your knowledge, each contribution will not violate any
+    third party's copyrights, trademarks, patents, or other intellectual
+    property rights; and
+
+    * each contribution shall be in compliance with U.S. export control laws and
+    other applicable export and import laws. You agree to notify us if you
+    become aware of any circumstance which would make any of the foregoing
+    representations inaccurate in any respect. We may publicly disclose your
+    participation in the project, including the fact that you have signed the SCA.
+
+6. This SCA is governed by the laws of the State of California and applicable
+U.S. Federal law. Any choice of law rules will not apply.
+
+7. Please place an “x” on one of the applicable statements below. Please do NOT
+mark both statements:
+
+    * [x] I am signing on behalf of myself as an individual and no other person
+    or entity, including my employer, has or will have rights with respect to my
+    contributions.
+
+    * [ ] I am signing on behalf of my employer or a legal entity and I have the
+    actual authority to contractually bind that entity.
+
+## Contributor Details
+
+| Field                          | Entry                |
+|------------------------------- | -------------------- |
+| Name                           |  Desausoi Laurent    |
+| Company name (if applicable)   |          /           |
+| Title or role (if applicable)  |          /           |
+| Date                           |  22 November 2019    |
+| GitHub username                |        MiniLau       |
+| Website (optional)             |          /           |
--- a/.github/contributors/Mlawrence95.md
+++ b/.github/contributors/Mlawrence95.md
@@ -0,0 +1,106 @@
+# spaCy contributor agreement
+
+This spaCy Contributor Agreement (**"SCA"**) is based on the
+[Oracle Contributor Agreement](http://www.oracle.com/technetwork/oca-405177.pdf).
+The SCA applies to any contribution that you make to any product or project
+managed by us (the **"project"**), and sets out the intellectual property rights
+you grant to us in the contributed materials. The term **"us"** shall mean
+[ExplosionAI GmbH](https://explosion.ai/legal). The term
+**"you"** shall mean the person or entity identified below.
+
+If you agree to be bound by these terms, fill in the information requested
+below and include the filled-in version with your first pull request, under the
+folder [`.github/contributors/`](/.github/contributors/). The name of the file
+should be your GitHub username, with the extension `.md`. For example, the user
+example_user would create the file `.github/contributors/example_user.md`.
+
+Read this agreement carefully before signing. These terms and conditions
+constitute a binding legal agreement.
+
+## Contributor Agreement
+
+1. The term "contribution" or "contributed materials" means any source code,
+object code, patch, tool, sample, graphic, specification, manual,
+documentation, or any other material posted or submitted by you to the project.
+
+2. With respect to any worldwide copyrights, or copyright applications and
+registrations, in your contribution:
+
+    * you hereby assign to us joint ownership, and to the extent that such
+    assignment is or becomes invalid, ineffective or unenforceable, you hereby
+    grant to us a perpetual, irrevocable, non-exclusive, worldwide, no-charge,
+    royalty-free, unrestricted license to exercise all rights under those
+    copyrights. This includes, at our option, the right to sublicense these same
+    rights to third parties through multiple levels of sublicensees or other
+    licensing arrangements;
+
+    * you agree that each of us can do all things in relation to your
+    contribution as if each of us were the sole owners, and if one of us makes
+    a derivative work of your contribution, the one who makes the derivative
+    work (or has it made) will be the sole owner of that derivative work;
+
+    * you agree that you will not assert any moral rights in your contribution
+    against us, our licensees or transferees;
+
+    * you agree that we may register a copyright in your contribution and
+    exercise all ownership rights associated with it; and
+
+    * you agree that neither of us has any duty to consult with, obtain the
+    consent of, pay or render an accounting to the other for any use or
+    distribution of your contribution.
+
+3. With respect to any patents you own, or that you can license without payment
+to any third party, you hereby grant to us a perpetual, irrevocable,
+non-exclusive, worldwide, no-charge, royalty-free license to:
+
+    * make, have made, use, sell, offer to sell, import, and otherwise transfer
+    your contribution in whole or in part, alone or in combination with or
+    included in any product, work or materials arising out of the project to
+    which your contribution was submitted, and
+
+    * at our option, to sublicense these same rights to third parties through
+    multiple levels of sublicensees or other licensing arrangements.
+
+4. Except as set out above, you keep all right, title, and interest in your
+contribution. The rights that you grant to us under these terms are effective
+on the date you first submitted a contribution to us, even if your submission
+took place before the date you sign these terms.
+
+5. You covenant, represent, warrant and agree that:
+
+    * Each contribution that you submit is and shall be an original work of
+    authorship and you can legally grant the rights set out in this SCA;
+
+    * to the best of your knowledge, each contribution will not violate any
+    third party's copyrights, trademarks, patents, or other intellectual
+    property rights; and
+
+    * each contribution shall be in compliance with U.S. export control laws and
+    other applicable export and import laws. You agree to notify us if you
+    become aware of any circumstance which would make any of the foregoing
+    representations inaccurate in any respect. We may publicly disclose your
+    participation in the project, including the fact that you have signed the SCA.
+
+6. This SCA is governed by the laws of the State of California and applicable
+U.S. Federal law. Any choice of law rules will not apply.
+
+7. Please place an “x” on one of the applicable statements below. Please do NOT
+mark both statements:
+
+    * [ x ] I am signing on behalf of myself as an individual and no other person
+    or entity, including my employer, has or will have rights with respect to my
+    contributions.
+
+    * [ ] I am signing on behalf of my employer or a legal entity and I have the
+    actual authority to contractually bind that entity.
+
+## Contributor Details
+
+| Field                          | Entry                |
+|------------------------------- | -------------------- |
+| Name                           |   Mike Lawrence      |
+| Company name (if applicable)   |        NA            |
+| Title or role (if applicable)  |        NA            |
+| Date                           |     April 17, 2020   |
+| GitHub username                |     Mlawrence95      |
+| Website (optional)             |                      |
--- a/.github/contributors/YohannesDatasci.md
+++ b/.github/contributors/YohannesDatasci.md
@@ -0,0 +1,106 @@
+# spaCy contributor agreement
+
+This spaCy Contributor Agreement (**"SCA"**) is based on the
+[Oracle Contributor Agreement](http://www.oracle.com/technetwork/oca-405177.pdf).
+The SCA applies to any contribution that you make to any product or project
+managed by us (the **"project"**), and sets out the intellectual property rights
+you grant to us in the contributed materials. The term **"us"** shall mean
+[ExplosionAI GmbH](https://explosion.ai/legal). The term
+**"you"** shall mean the person or entity identified below.
+
+If you agree to be bound by these terms, fill in the information requested
+below and include the filled-in version with your first pull request, under the
+folder [`.github/contributors/`](/.github/contributors/). The name of the file
+should be your GitHub username, with the extension `.md`. For example, the user
+example_user would create the file `.github/contributors/example_user.md`.
+
+Read this agreement carefully before signing. These terms and conditions
+constitute a binding legal agreement.
+
+## Contributor Agreement
+
+1. The term "contribution" or "contributed materials" means any source code,
+object code, patch, tool, sample, graphic, specification, manual,
+documentation, or any other material posted or submitted by you to the project.
+
+2. With respect to any worldwide copyrights, or copyright applications and
+registrations, in your contribution:
+
+    * you hereby assign to us joint ownership, and to the extent that such
+    assignment is or becomes invalid, ineffective or unenforceable, you hereby
+    grant to us a perpetual, irrevocable, non-exclusive, worldwide, no-charge,
+    royalty-free, unrestricted license to exercise all rights under those
+    copyrights. This includes, at our option, the right to sublicense these same
+    rights to third parties through multiple levels of sublicensees or other
+    licensing arrangements;
+
+    * you agree that each of us can do all things in relation to your
+    contribution as if each of us were the sole owners, and if one of us makes
+    a derivative work of your contribution, the one who makes the derivative
+    work (or has it made) will be the sole owner of that derivative work;
+
+    * you agree that you will not assert any moral rights in your contribution
+    against us, our licensees or transferees;
+
+    * you agree that we may register a copyright in your contribution and
+    exercise all ownership rights associated with it; and
+
+    * you agree that neither of us has any duty to consult with, obtain the
+    consent of, pay or render an accounting to the other for any use or
+    distribution of your contribution.
+
+3. With respect to any patents you own, or that you can license without payment
+to any third party, you hereby grant to us a perpetual, irrevocable,
+non-exclusive, worldwide, no-charge, royalty-free license to:
+
+    * make, have made, use, sell, offer to sell, import, and otherwise transfer
+    your contribution in whole or in part, alone or in combination with or
+    included in any product, work or materials arising out of the project to
+    which your contribution was submitted, and
+
+    * at our option, to sublicense these same rights to third parties through
+    multiple levels of sublicensees or other licensing arrangements.
+
+4. Except as set out above, you keep all right, title, and interest in your
+contribution. The rights that you grant to us under these terms are effective
+on the date you first submitted a contribution to us, even if your submission
+took place before the date you sign these terms.
+
+5. You covenant, represent, warrant and agree that:
+
+    * Each contribution that you submit is and shall be an original work of
+    authorship and you can legally grant the rights set out in this SCA;
+
+    * to the best of your knowledge, each contribution will not violate any
+    third party's copyrights, trademarks, patents, or other intellectual
+    property rights; and
+
+    * each contribution shall be in compliance with U.S. export control laws and
+    other applicable export and import laws. You agree to notify us if you
+    become aware of any circumstance which would make any of the foregoing
+    representations inaccurate in any respect. We may publicly disclose your
+    participation in the project, including the fact that you have signed the SCA.
+
+6. This SCA is governed by the laws of the State of California and applicable
+U.S. Federal law. Any choice of law rules will not apply.
+
+7. Please place an “x” on one of the applicable statements below. Please do NOT
+mark both statements:
+
+    * [X] I am signing on behalf of myself as an individual and no other person
+    or entity, including my employer, has or will have rights with respect to my
+    contributions.
+
+    * [ ] I am signing on behalf of my employer or a legal entity and I have the
+    actual authority to contractually bind that entity.
+
+## Contributor Details
+
+| Field                          | Entry                |
+|------------------------------- | -------------------- |
+| Name                           |     Yohannes         |
+| Company name (if applicable)   |                      |
+| Title or role (if applicable)  |                      |
+| Date                           |     2020-04-02       |
+| GitHub username                |   YohannesDatasci    |
+| Website (optional)             |                      |
--- a/.github/contributors/chopeen.md
+++ b/.github/contributors/chopeen.md
@@ -0,0 +1,106 @@
+# spaCy contributor agreement
+
+This spaCy Contributor Agreement (**"SCA"**) is based on the
+[Oracle Contributor Agreement](http://www.oracle.com/technetwork/oca-405177.pdf).
+The SCA applies to any contribution that you make to any product or project
+managed by us (the **"project"**), and sets out the intellectual property rights
+you grant to us in the contributed materials. The term **"us"** shall mean
+[ExplosionAI GmbH](https://explosion.ai/legal). The term
+**"you"** shall mean the person or entity identified below.
+
+If you agree to be bound by these terms, fill in the information requested
+below and include the filled-in version with your first pull request, under the
+folder [`.github/contributors/`](/.github/contributors/). The name of the file
+should be your GitHub username, with the extension `.md`. For example, the user
+example_user would create the file `.github/contributors/example_user.md`.
+
+Read this agreement carefully before signing. These terms and conditions
+constitute a binding legal agreement.
+
+## Contributor Agreement
+
+1. The term "contribution" or "contributed materials" means any source code,
+object code, patch, tool, sample, graphic, specification, manual,
+documentation, or any other material posted or submitted by you to the project.
+
+2. With respect to any worldwide copyrights, or copyright applications and
+registrations, in your contribution:
+
+    * you hereby assign to us joint ownership, and to the extent that such
+    assignment is or becomes invalid, ineffective or unenforceable, you hereby
+    grant to us a perpetual, irrevocable, non-exclusive, worldwide, no-charge,
+    royalty-free, unrestricted license to exercise all rights under those
+    copyrights. This includes, at our option, the right to sublicense these same
+    rights to third parties through multiple levels of sublicensees or other
+    licensing arrangements;
+
+    * you agree that each of us can do all things in relation to your
+    contribution as if each of us were the sole owners, and if one of us makes
+    a derivative work of your contribution, the one who makes the derivative
+    work (or has it made) will be the sole owner of that derivative work;
+
+    * you agree that you will not assert any moral rights in your contribution
+    against us, our licensees or transferees;
+
+    * you agree that we may register a copyright in your contribution and
+    exercise all ownership rights associated with it; and
+
+    * you agree that neither of us has any duty to consult with, obtain the
+    consent of, pay or render an accounting to the other for any use or
+    distribution of your contribution.
+
+3. With respect to any patents you own, or that you can license without payment
+to any third party, you hereby grant to us a perpetual, irrevocable,
+non-exclusive, worldwide, no-charge, royalty-free license to:
+
+    * make, have made, use, sell, offer to sell, import, and otherwise transfer
+    your contribution in whole or in part, alone or in combination with or
+    included in any product, work or materials arising out of the project to
+    which your contribution was submitted, and
+
+    * at our option, to sublicense these same rights to third parties through
+    multiple levels of sublicensees or other licensing arrangements.
+
+4. Except as set out above, you keep all right, title, and interest in your
+contribution. The rights that you grant to us under these terms are effective
+on the date you first submitted a contribution to us, even if your submission
+took place before the date you sign these terms.
+
+5. You covenant, represent, warrant and agree that:
+
+    * Each contribution that you submit is and shall be an original work of
+    authorship and you can legally grant the rights set out in this SCA;
+
+    * to the best of your knowledge, each contribution will not violate any
+    third party's copyrights, trademarks, patents, or other intellectual
+    property rights; and
+
+    * each contribution shall be in compliance with U.S. export control laws and
+    other applicable export and import laws. You agree to notify us if you
+    become aware of any circumstance which would make any of the foregoing
+    representations inaccurate in any respect. We may publicly disclose your
+    participation in the project, including the fact that you have signed the SCA.
+
+6. This SCA is governed by the laws of the State of California and applicable
+U.S. Federal law. Any choice of law rules will not apply.
+
+7. Please place an “x” on one of the applicable statements below. Please do NOT
+mark both statements:
+
+    * [x] I am signing on behalf of myself as an individual and no other person
+    or entity, including my employer, has or will have rights with respect to my
+    contributions.
+
+    * [ ] I am signing on behalf of my employer or a legal entity and I have the
+    actual authority to contractually bind that entity.
+
+## Contributor Details
+
+| Field                          | Entry                |
+|------------------------------- | -------------------- |
+| Name                           | Marek Grzenkowicz    |
+| Company name (if applicable)   |                      |
+| Title or role (if applicable)  |                      |
+| Date                           | 2020.04.10           |
+| GitHub username                | chopeen              |
+| Website (optional)             |                      |
--- a/.github/contributors/elben10
+++ b/.github/contributors/elben10
@@ -0,0 +1,106 @@
+# spaCy contributor agreement
+
+This spaCy Contributor Agreement (**"SCA"**) is based on the
+[Oracle Contributor Agreement](http://www.oracle.com/technetwork/oca-405177.pdf).
+The SCA applies to any contribution that you make to any product or project
+managed by us (the **"project"**), and sets out the intellectual property rights
+you grant to us in the contributed materials. The term **"us"** shall mean
+[ExplosionAI GmbH](https://explosion.ai/legal). The term
+**"you"** shall mean the person or entity identified below.
+
+If you agree to be bound by these terms, fill in the information requested
+below and include the filled-in version with your first pull request, under the
+folder [`.github/contributors/`](/.github/contributors/). The name of the file
+should be your GitHub username, with the extension `.md`. For example, the user
+example_user would create the file `.github/contributors/example_user.md`.
+
+Read this agreement carefully before signing. These terms and conditions
+constitute a binding legal agreement.
+
+## Contributor Agreement
+
+1. The term "contribution" or "contributed materials" means any source code,
+object code, patch, tool, sample, graphic, specification, manual,
+documentation, or any other material posted or submitted by you to the project.
+
+2. With respect to any worldwide copyrights, or copyright applications and
+registrations, in your contribution:
+
+    * you hereby assign to us joint ownership, and to the extent that such
+    assignment is or becomes invalid, ineffective or unenforceable, you hereby
+    grant to us a perpetual, irrevocable, non-exclusive, worldwide, no-charge,
+    royalty-free, unrestricted license to exercise all rights under those
+    copyrights. This includes, at our option, the right to sublicense these same
+    rights to third parties through multiple levels of sublicensees or other
+    licensing arrangements;
+
+    * you agree that each of us can do all things in relation to your
+    contribution as if each of us were the sole owners, and if one of us makes
+    a derivative work of your contribution, the one who makes the derivative
+    work (or has it made) will be the sole owner of that derivative work;
+
+    * you agree that you will not assert any moral rights in your contribution
+    against us, our licensees or transferees;
+
+    * you agree that we may register a copyright in your contribution and
+    exercise all ownership rights associated with it; and
+
+    * you agree that neither of us has any duty to consult with, obtain the
+    consent of, pay or render an accounting to the other for any use or
+    distribution of your contribution.
+
+3. With respect to any patents you own, or that you can license without payment
+to any third party, you hereby grant to us a perpetual, irrevocable,
+non-exclusive, worldwide, no-charge, royalty-free license to:
+
+    * make, have made, use, sell, offer to sell, import, and otherwise transfer
+    your contribution in whole or in part, alone or in combination with or
+    included in any product, work or materials arising out of the project to
+    which your contribution was submitted, and
+
+    * at our option, to sublicense these same rights to third parties through
+    multiple levels of sublicensees or other licensing arrangements.
+
+4. Except as set out above, you keep all right, title, and interest in your
+contribution. The rights that you grant to us under these terms are effective
+on the date you first submitted a contribution to us, even if your submission
+took place before the date you sign these terms.
+
+5. You covenant, represent, warrant and agree that:
+
+    * Each contribution that you submit is and shall be an original work of
+    authorship and you can legally grant the rights set out in this SCA;
+
+    * to the best of your knowledge, each contribution will not violate any
+    third party's copyrights, trademarks, patents, or other intellectual
+    property rights; and
+
+    * each contribution shall be in compliance with U.S. export control laws and
+    other applicable export and import laws. You agree to notify us if you
+    become aware of any circumstance which would make any of the foregoing
+    representations inaccurate in any respect. We may publicly disclose your
+    participation in the project, including the fact that you have signed the SCA.
+
+6. This SCA is governed by the laws of the State of California and applicable
+U.S. Federal law. Any choice of law rules will not apply.
+
+7. Please place an “x” on one of the applicable statements below. Please do NOT
+mark both statements:
+
+    * [x] I am signing on behalf of myself as an individual and no other person
+    or entity, including my employer, has or will have rights with respect to my
+    contributions.
+
+    * [ ] I am signing on behalf of my employer or a legal entity and I have the
+    actual authority to contractually bind that entity.
+
+## Contributor Details
+
+| Field                          | Entry                |
+|------------------------------- | -------------------- |
+| Name                           | Jakob Jul Elben      |
+| Company name (if applicable)   | N/A                  |
+| Title or role (if applicable)  | N/A                  |
+| Date                           | April 16th, 2020     |
+| GitHub username                | elben10              |
+| Website (optional)             | N/A                  |
--- a/.github/contributors/ilivans.md
+++ b/.github/contributors/ilivans.md
@ -0,0 +1,106 @@
+# spaCy contributor agreement
+
+This spaCy Contributor Agreement (**"SCA"**) is based on the
+[Oracle Contributor Agreement](http://www.oracle.com/technetwork/oca-405177.pdf).
+The SCA applies to any contribution that you make to any product or project
+managed by us (the **"project"**), and sets out the intellectual property rights
+you grant to us in the contributed materials. The term **"us"** shall mean
+[ExplosionAI UG (haftungsbeschränkt)](https://explosion.ai/legal). The term
+**"you"** shall mean the person or entity identified below.
+
+If you agree to be bound by these terms, fill in the information requested
+below and include the filled-in version with your first pull request, under the
+folder [`.github/contributors/`](/.github/contributors/). The name of the file
+should be your GitHub username, with the extension `.md`. For example, the user
+example_user would create the file `.github/contributors/example_user.md`.
+
+Read this agreement carefully before signing. These terms and conditions
+constitute a binding legal agreement.
+
+## Contributor Agreement
+
+1. The term "contribution" or "contributed materials" means any source code,
+object code, patch, tool, sample, graphic, specification, manual,
+documentation, or any other material posted or submitted by you to the project.
+
+2. With respect to any worldwide copyrights, or copyright applications and
+registrations, in your contribution:
+
+    * you hereby assign to us joint ownership, and to the extent that such
+    assignment is or becomes invalid, ineffective or unenforceable, you hereby
+    grant to us a perpetual, irrevocable, non-exclusive, worldwide, no-charge,
+    royalty-free, unrestricted license to exercise all rights under those
+    copyrights. This includes, at our option, the right to sublicense these same
+    rights to third parties through multiple levels of sublicensees or other
+    licensing arrangements;
+
+    * you agree that each of us can do all things in relation to your
+    contribution as if each of us were the sole owners, and if one of us makes
+    a derivative work of your contribution, the one who makes the derivative
+    work (or has it made) will be the sole owner of that derivative work;
+
+    * you agree that you will not assert any moral rights in your contribution
+    against us, our licensees or transferees;
+
+    * you agree that we may register a copyright in your contribution and
+    exercise all ownership rights associated with it; and
+
+    * you agree that neither of us has any duty to consult with, obtain the
+    consent of, pay or render an accounting to the other for any use or
+    distribution of your contribution.
+
+3. With respect to any patents you own, or that you can license without payment
+to any third party, you hereby grant to us a perpetual, irrevocable,
+non-exclusive, worldwide, no-charge, royalty-free license to:
+
+    * make, have made, use, sell, offer to sell, import, and otherwise transfer
+    your contribution in whole or in part, alone or in combination with or
+    included in any product, work or materials arising out of the project to
+    which your contribution was submitted, and
+
+    * at our option, to sublicense these same rights to third parties through
+    multiple levels of sublicensees or other licensing arrangements.
+
+4. Except as set out above, you keep all right, title, and interest in your
+contribution. The rights that you grant to us under these terms are effective
+on the date you first submitted a contribution to us, even if your submission
+took place before the date you sign these terms.
+
+5. You covenant, represent, warrant and agree that:
+
+    * Each contribution that you submit is and shall be an original work of
+    authorship and you can legally grant the rights set out in this SCA;
+
+    * to the best of your knowledge, each contribution will not violate any
+    third party's copyrights, trademarks, patents, or other intellectual
+    property rights; and
+
+    * each contribution shall be in compliance with U.S. export control laws and
+    other applicable export and import laws. You agree to notify us if you
+    become aware of any circumstance which would make any of the foregoing
+    representations inaccurate in any respect. We may publicly disclose your
+    participation in the project, including the fact that you have signed the SCA.
+
+6. This SCA is governed by the laws of the State of California and applicable
+U.S. Federal law. Any choice of law rules will not apply.
+
+7. Please place an “x” on one of the applicable statements below. Please do NOT
+mark both statements:
+
+    * [x] I am signing on behalf of myself as an individual and no other person
+    or entity, including my employer, has or will have rights with respect to my
+    contributions.
+
+    * [ ] I am signing on behalf of my employer or a legal entity and I have the
+    actual authority to contractually bind that entity.
+
+## Contributor Details
+
+| Field                          | Entry                    |
+|------------------------------- | ------------------------ |
+| Name                           | Ilia Ivanov              |
+| Company name (if applicable)   | Chattermill              |
+| Title or role (if applicable)  | DL Engineer              |
+| Date                           | 2020-05-14               |
+| GitHub username                | ilivans                  |
+| Website (optional)             |                          |
--- a/.github/contributors/jacse.md
+++ b/.github/contributors/jacse.md
@ -0,0 +1,106 @@
+# spaCy contributor agreement
+
+This spaCy Contributor Agreement (**"SCA"**) is based on the
+[Oracle Contributor Agreement](http://www.oracle.com/technetwork/oca-405177.pdf).
+The SCA applies to any contribution that you make to any product or project
+managed by us (the **"project"**), and sets out the intellectual property rights
+you grant to us in the contributed materials. The term **"us"** shall mean
+[ExplosionAI GmbH](https://explosion.ai/legal). The term
+**"you"** shall mean the person or entity identified below.
+
+If you agree to be bound by these terms, fill in the information requested
+below and include the filled-in version with your first pull request, under the
+folder [`.github/contributors/`](/.github/contributors/). The name of the file
+should be your GitHub username, with the extension `.md`. For example, the user
+example_user would create the file `.github/contributors/example_user.md`.
+
+Read this agreement carefully before signing. These terms and conditions
+constitute a binding legal agreement.
+
+## Contributor Agreement
+
+1. The term "contribution" or "contributed materials" means any source code,
+object code, patch, tool, sample, graphic, specification, manual,
+documentation, or any other material posted or submitted by you to the project.
+
+2. With respect to any worldwide copyrights, or copyright applications and
+registrations, in your contribution:
+
+    * you hereby assign to us joint ownership, and to the extent that such
+    assignment is or becomes invalid, ineffective or unenforceable, you hereby
+    grant to us a perpetual, irrevocable, non-exclusive, worldwide, no-charge,
+    royalty-free, unrestricted license to exercise all rights under those
+    copyrights. This includes, at our option, the right to sublicense these same
+    rights to third parties through multiple levels of sublicensees or other
+    licensing arrangements;
+
+    * you agree that each of us can do all things in relation to your
+    contribution as if each of us were the sole owners, and if one of us makes
+    a derivative work of your contribution, the one who makes the derivative
+    work (or has it made) will be the sole owner of that derivative work;
+
+    * you agree that you will not assert any moral rights in your contribution
+    against us, our licensees or transferees;
+
+    * you agree that we may register a copyright in your contribution and
+    exercise all ownership rights associated with it; and
+
+    * you agree that neither of us has any duty to consult with, obtain the
+    consent of, pay or render an accounting to the other for any use or
+    distribution of your contribution.
+
+3. With respect to any patents you own, or that you can license without payment
+to any third party, you hereby grant to us a perpetual, irrevocable,
+non-exclusive, worldwide, no-charge, royalty-free license to:
+
+    * make, have made, use, sell, offer to sell, import, and otherwise transfer
+    your contribution in whole or in part, alone or in combination with or
+    included in any product, work or materials arising out of the project to
+    which your contribution was submitted, and
+
+    * at our option, to sublicense these same rights to third parties through
+    multiple levels of sublicensees or other licensing arrangements.
+
+4. Except as set out above, you keep all right, title, and interest in your
+contribution. The rights that you grant to us under these terms are effective
+on the date you first submitted a contribution to us, even if your submission
+took place before the date you sign these terms.
+
+5. You covenant, represent, warrant and agree that:
+
+    * Each contribution that you submit is and shall be an original work of
+    authorship and you can legally grant the rights set out in this SCA;
+
+    * to the best of your knowledge, each contribution will not violate any
+    third party's copyrights, trademarks, patents, or other intellectual
+    property rights; and
+
+    * each contribution shall be in compliance with U.S. export control laws and
+    other applicable export and import laws. You agree to notify us if you
+    become aware of any circumstance which would make any of the foregoing
+    representations inaccurate in any respect. We may publicly disclose your
+    participation in the project, including the fact that you have signed the SCA.
+
+6. This SCA is governed by the laws of the State of California and applicable
+U.S. Federal law. Any choice of law rules will not apply.
+
+7. Please place an “x” on one of the applicable statements below. Please do NOT
+mark both statements:
+
+    * [x] I am signing on behalf of myself as an individual and no other person
+    or entity, including my employer, has or will have rights with respect to my
+    contributions.
+
+    * [ ] I am signing on behalf of my employer or a legal entity and I have the
+    actual authority to contractually bind that entity.
+
+## Contributor Details
+
+| Field                          | Entry                |
+|------------------------------- | -------------------- |
+| Name                           | Jacob Lauritzen      |
+| Company name (if applicable)   |                      |
+| Title or role (if applicable)  |                      |
+| Date                           | 2020-03-30           |
+| GitHub username                | jacse                |
+| Website (optional)             |                      |
--- a/.github/contributors/kevinlu1248.md
+++ b/.github/contributors/kevinlu1248.md
@ -0,0 +1,106 @@
+# spaCy contributor agreement
+
+This spaCy Contributor Agreement (**"SCA"**) is based on the
+[Oracle Contributor Agreement](http://www.oracle.com/technetwork/oca-405177.pdf).
+The SCA applies to any contribution that you make to any product or project
+managed by us (the **"project"**), and sets out the intellectual property rights
+you grant to us in the contributed materials. The term **"us"** shall mean
+[ExplosionAI GmbH](https://explosion.ai/legal). The term
+**"you"** shall mean the person or entity identified below.
+
+If you agree to be bound by these terms, fill in the information requested
+below and include the filled-in version with your first pull request, under the
+folder [`.github/contributors/`](/.github/contributors/). The name of the file
+should be your GitHub username, with the extension `.md`. For example, the user
+example_user would create the file `.github/contributors/example_user.md`.
+
+Read this agreement carefully before signing. These terms and conditions
+constitute a binding legal agreement.
+
+## Contributor Agreement
+
+1. The term "contribution" or "contributed materials" means any source code,
+object code, patch, tool, sample, graphic, specification, manual,
+documentation, or any other material posted or submitted by you to the project.
+
+2. With respect to any worldwide copyrights, or copyright applications and
+registrations, in your contribution:
+
+    * you hereby assign to us joint ownership, and to the extent that such
+    assignment is or becomes invalid, ineffective or unenforceable, you hereby
+    grant to us a perpetual, irrevocable, non-exclusive, worldwide, no-charge,
+    royalty-free, unrestricted license to exercise all rights under those
+    copyrights. This includes, at our option, the right to sublicense these same
+    rights to third parties through multiple levels of sublicensees or other
+    licensing arrangements;
+
+    * you agree that each of us can do all things in relation to your
+    contribution as if each of us were the sole owners, and if one of us makes
+    a derivative work of your contribution, the one who makes the derivative
+    work (or has it made) will be the sole owner of that derivative work;
+
+    * you agree that you will not assert any moral rights in your contribution
+    against us, our licensees or transferees;
+
+    * you agree that we may register a copyright in your contribution and
+    exercise all ownership rights associated with it; and
+
+    * you agree that neither of us has any duty to consult with, obtain the
+    consent of, pay or render an accounting to the other for any use or
+    distribution of your contribution.
+
+3. With respect to any patents you own, or that you can license without payment
+to any third party, you hereby grant to us a perpetual, irrevocable,
+non-exclusive, worldwide, no-charge, royalty-free license to:
+
+    * make, have made, use, sell, offer to sell, import, and otherwise transfer
+    your contribution in whole or in part, alone or in combination with or
+    included in any product, work or materials arising out of the project to
+    which your contribution was submitted, and
+
+    * at our option, to sublicense these same rights to third parties through
+    multiple levels of sublicensees or other licensing arrangements.
+
+4. Except as set out above, you keep all right, title, and interest in your
+contribution. The rights that you grant to us under these terms are effective
+on the date you first submitted a contribution to us, even if your submission
+took place before the date you sign these terms.
+
+5. You covenant, represent, warrant and agree that:
+
+    * Each contribution that you submit is and shall be an original work of
+    authorship and you can legally grant the rights set out in this SCA;
+
+    * to the best of your knowledge, each contribution will not violate any
+    third party's copyrights, trademarks, patents, or other intellectual
+    property rights; and
+
+    * each contribution shall be in compliance with U.S. export control laws and
+    other applicable export and import laws. You agree to notify us if you
+    become aware of any circumstance which would make any of the foregoing
+    representations inaccurate in any respect. We may publicly disclose your
+    participation in the project, including the fact that you have signed the SCA.
+
+6. This SCA is governed by the laws of the State of California and applicable
+U.S. Federal law. Any choice of law rules will not apply.
+
+7. Please place an “x” on one of the applicable statements below. Please do NOT
+mark both statements:
+
+    * [x] I am signing on behalf of myself as an individual and no other person
+    or entity, including my employer, has or will have rights with respect to my
+    contributions.
+
+    * [ ] I am signing on behalf of my employer or a legal entity and I have the
+    actual authority to contractually bind that entity.
+
+## Contributor Details
+
+| Field                          | Entry                |
+|------------------------------- | -------------------- |
+| Name                           | Kevin Lu             |
+| Company name (if applicable)   |                      |
+| Title or role (if applicable)  | Student              |
+| Date                           |                      |
+| GitHub username                | kevinlu1248          |
+| Website (optional)             |                      |
--- a/.github/contributors/laszabine.md
+++ b/.github/contributors/laszabine.md
@ -0,0 +1,106 @@
+# spaCy contributor agreement
+
+This spaCy Contributor Agreement (**"SCA"**) is based on the
+[Oracle Contributor Agreement](http://www.oracle.com/technetwork/oca-405177.pdf).
+The SCA applies to any contribution that you make to any product or project
+managed by us (the **"project"**), and sets out the intellectual property rights
+you grant to us in the contributed materials. The term **"us"** shall mean
+[ExplosionAI GmbH](https://explosion.ai/legal). The term
+**"you"** shall mean the person or entity identified below.
+
+If you agree to be bound by these terms, fill in the information requested
+below and include the filled-in version with your first pull request, under the
+folder [`.github/contributors/`](/.github/contributors/). The name of the file
+should be your GitHub username, with the extension `.md`. For example, the user
+example_user would create the file `.github/contributors/example_user.md`.
+
+Read this agreement carefully before signing. These terms and conditions
+constitute a binding legal agreement.
+
+## Contributor Agreement
+
+1. The term "contribution" or "contributed materials" means any source code,
+object code, patch, tool, sample, graphic, specification, manual,
+documentation, or any other material posted or submitted by you to the project.
+
+2. With respect to any worldwide copyrights, or copyright applications and
+registrations, in your contribution:
+
+    * you hereby assign to us joint ownership, and to the extent that such
+    assignment is or becomes invalid, ineffective or unenforceable, you hereby
+    grant to us a perpetual, irrevocable, non-exclusive, worldwide, no-charge,
+    royalty-free, unrestricted license to exercise all rights under those
+    copyrights. This includes, at our option, the right to sublicense these same
+    rights to third parties through multiple levels of sublicensees or other
+    licensing arrangements;
+
+    * you agree that each of us can do all things in relation to your
+    contribution as if each of us were the sole owners, and if one of us makes
+    a derivative work of your contribution, the one who makes the derivative
+    work (or has it made) will be the sole owner of that derivative work;
+
+    * you agree that you will not assert any moral rights in your contribution
+    against us, our licensees or transferees;
+
+    * you agree that we may register a copyright in your contribution and
+    exercise all ownership rights associated with it; and
+
+    * you agree that neither of us has any duty to consult with, obtain the
+    consent of, pay or render an accounting to the other for any use or
+    distribution of your contribution.
+
+3. With respect to any patents you own, or that you can license without payment
+to any third party, you hereby grant to us a perpetual, irrevocable,
+non-exclusive, worldwide, no-charge, royalty-free license to:
+
+    * make, have made, use, sell, offer to sell, import, and otherwise transfer
+    your contribution in whole or in part, alone or in combination with or
+    included in any product, work or materials arising out of the project to
+    which your contribution was submitted, and
+
+    * at our option, to sublicense these same rights to third parties through
+    multiple levels of sublicensees or other licensing arrangements.
+
+4. Except as set out above, you keep all right, title, and interest in your
+contribution. The rights that you grant to us under these terms are effective
+on the date you first submitted a contribution to us, even if your submission
+took place before the date you sign these terms.
+
+5. You covenant, represent, warrant and agree that:
+
+    * Each contribution that you submit is and shall be an original work of
+    authorship and you can legally grant the rights set out in this SCA;
+
+    * to the best of your knowledge, each contribution will not violate any
+    third party's copyrights, trademarks, patents, or other intellectual
+    property rights; and
+
+    * each contribution shall be in compliance with U.S. export control laws and
+    other applicable export and import laws. You agree to notify us if you
+    become aware of any circumstance which would make any of the foregoing
+    representations inaccurate in any respect. We may publicly disclose your
+    participation in the project, including the fact that you have signed the SCA.
+
+6. This SCA is governed by the laws of the State of California and applicable
+U.S. Federal law. Any choice of law rules will not apply.
+
+7. Please place an “x” on one of the applicable statements below. Please do NOT
+mark both statements:
+
+    * [x] I am signing on behalf of myself as an individual and no other person
+    or entity, including my employer, has or will have rights with respect to my
+    contributions.
+
+    * [ ] I am signing on behalf of my employer or a legal entity and I have the
+    actual authority to contractually bind that entity.
+
+## Contributor Details
+
+| Field                          | Entry                          |
+|------------------------------- | ------------------------------ |
+| Name                           | Sabine Laszakovits             |
+| Company name (if applicable)   | Austrian Academy of Sciences   |
+| Title or role (if applicable)  | Data analyst                   |
+| Date                           | 2020-04-16                     |
+| GitHub username                | laszabine                      |
+| Website (optional)             | https://sabine.laszakovits.net |
--- a/.github/contributors/leicmi.md
+++ b/.github/contributors/leicmi.md
@ -0,0 +1,106 @@
+# spaCy contributor agreement
+
+This spaCy Contributor Agreement (**"SCA"**) is based on the
+[Oracle Contributor Agreement](http://www.oracle.com/technetwork/oca-405177.pdf).
+The SCA applies to any contribution that you make to any product or project
+managed by us (the **"project"**), and sets out the intellectual property rights
+you grant to us in the contributed materials. The term **"us"** shall mean
+[ExplosionAI GmbH](https://explosion.ai/legal). The term
+**"you"** shall mean the person or entity identified below.
+
+If you agree to be bound by these terms, fill in the information requested
+below and include the filled-in version with your first pull request, under the
+folder [`.github/contributors/`](/.github/contributors/). The name of the file
+should be your GitHub username, with the extension `.md`. For example, the user
+example_user would create the file `.github/contributors/example_user.md`.
+
+Read this agreement carefully before signing. These terms and conditions
+constitute a binding legal agreement.
+
+## Contributor Agreement
+
+1. The term "contribution" or "contributed materials" means any source code,
+object code, patch, tool, sample, graphic, specification, manual,
+documentation, or any other material posted or submitted by you to the project.
+
+2. With respect to any worldwide copyrights, or copyright applications and
+registrations, in your contribution:
+
+    * you hereby assign to us joint ownership, and to the extent that such
+    assignment is or becomes invalid, ineffective or unenforceable, you hereby
+    grant to us a perpetual, irrevocable, non-exclusive, worldwide, no-charge,
+    royalty-free, unrestricted license to exercise all rights under those
+    copyrights. This includes, at our option, the right to sublicense these same
+    rights to third parties through multiple levels of sublicensees or other
+    licensing arrangements;
+
+    * you agree that each of us can do all things in relation to your
+    contribution as if each of us were the sole owners, and if one of us makes
+    a derivative work of your contribution, the one who makes the derivative
+    work (or has it made) will be the sole owner of that derivative work;
+
+    * you agree that you will not assert any moral rights in your contribution
+    against us, our licensees or transferees;
+
+    * you agree that we may register a copyright in your contribution and
+    exercise all ownership rights associated with it; and
+
+    * you agree that neither of us has any duty to consult with, obtain the
+    consent of, pay or render an accounting to the other for any use or
+    distribution of your contribution.
+
+3. With respect to any patents you own, or that you can license without payment
+to any third party, you hereby grant to us a perpetual, irrevocable,
+non-exclusive, worldwide, no-charge, royalty-free license to:
+
+    * make, have made, use, sell, offer to sell, import, and otherwise transfer
+    your contribution in whole or in part, alone or in combination with or
+    included in any product, work or materials arising out of the project to
+    which your contribution was submitted, and
+
+    * at our option, to sublicense these same rights to third parties through
+    multiple levels of sublicensees or other licensing arrangements.
+
+4. Except as set out above, you keep all right, title, and interest in your
+contribution. The rights that you grant to us under these terms are effective
+on the date you first submitted a contribution to us, even if your submission
+took place before the date you sign these terms.
+
+5. You covenant, represent, warrant and agree that:
+
+    * Each contribution that you submit is and shall be an original work of
+    authorship and you can legally grant the rights set out in this SCA;
+
+    * to the best of your knowledge, each contribution will not violate any
+    third party's copyrights, trademarks, patents, or other intellectual
+    property rights; and
+
+    * each contribution shall be in compliance with U.S. export control laws and
+    other applicable export and import laws. You agree to notify us if you
+    become aware of any circumstance which would make any of the foregoing
+    representations inaccurate in any respect. We may publicly disclose your
+    participation in the project, including the fact that you have signed the SCA.
+
+6. This SCA is governed by the laws of the State of California and applicable
+U.S. Federal law. Any choice of law rules will not apply.
+
+7. Please place an “x” on one of the applicable statements below. Please do NOT
+mark both statements:
+
+    * [x] I am signing on behalf of myself as an individual and no other person
+    or entity, including my employer, has or will have rights with respect to my
+    contributions.
+
+    * [ ] I am signing on behalf of my employer or a legal entity and I have the
+    actual authority to contractually bind that entity.
+
+## Contributor Details
+
+| Field                          | Entry                |
+|------------------------------- | -------------------- |
+| Name                           | Michael Leichtfried  |
+| Company name (if applicable)   |                      |
+| Title or role (if applicable)  |                      |
+| Date                           | 30.03.2020           |
+| GitHub username                | leicmi               |
+| Website (optional)             |                      |
--- a/.github/contributors/louisguitton.md
+++ b/.github/contributors/louisguitton.md
@ -0,0 +1,106 @@
+# spaCy contributor agreement
+
+This spaCy Contributor Agreement (**"SCA"**) is based on the
+[Oracle Contributor Agreement](http://www.oracle.com/technetwork/oca-405177.pdf).
+The SCA applies to any contribution that you make to any product or project
+managed by us (the **"project"**), and sets out the intellectual property rights
+you grant to us in the contributed materials. The term **"us"** shall mean
+[ExplosionAI GmbH](https://explosion.ai/legal). The term
+**"you"** shall mean the person or entity identified below.
+
+If you agree to be bound by these terms, fill in the information requested
+below and include the filled-in version with your first pull request, under the
+folder [`.github/contributors/`](/.github/contributors/). The name of the file
+should be your GitHub username, with the extension `.md`. For example, the user
+example_user would create the file `.github/contributors/example_user.md`.
+
+Read this agreement carefully before signing. These terms and conditions
+constitute a binding legal agreement.
+
+## Contributor Agreement
+
+1. The term "contribution" or "contributed materials" means any source code,
+object code, patch, tool, sample, graphic, specification, manual,
+documentation, or any other material posted or submitted by you to the project.
+
+2. With respect to any worldwide copyrights, or copyright applications and
+registrations, in your contribution:
+
+    * you hereby assign to us joint ownership, and to the extent that such
+    assignment is or becomes invalid, ineffective or unenforceable, you hereby
+    grant to us a perpetual, irrevocable, non-exclusive, worldwide, no-charge,
+    royalty-free, unrestricted license to exercise all rights under those
+    copyrights. This includes, at our option, the right to sublicense these same
+    rights to third parties through multiple levels of sublicensees or other
+    licensing arrangements;
+
+    * you agree that each of us can do all things in relation to your
+    contribution as if each of us were the sole owners, and if one of us makes
+    a derivative work of your contribution, the one who makes the derivative
+    work (or has it made) will be the sole owner of that derivative work;
+
+    * you agree that you will not assert any moral rights in your contribution
+    against us, our licensees or transferees;
+
+    * you agree that we may register a copyright in your contribution and
+    exercise all ownership rights associated with it; and
+
+    * you agree that neither of us has any duty to consult with, obtain the
+    consent of, pay or render an accounting to the other for any use or
+    distribution of your contribution.
+
+3. With respect to any patents you own, or that you can license without payment
+to any third party, you hereby grant to us a perpetual, irrevocable,
+non-exclusive, worldwide, no-charge, royalty-free license to:
+
+    * make, have made, use, sell, offer to sell, import, and otherwise transfer
+    your contribution in whole or in part, alone or in combination with or
+    included in any product, work or materials arising out of the project to
+    which your contribution was submitted, and
+
+    * at our option, to sublicense these same rights to third parties through
+    multiple levels of sublicensees or other licensing arrangements.
+
+4. Except as set out above, you keep all right, title, and interest in your
+contribution. The rights that you grant to us under these terms are effective
+on the date you first submitted a contribution to us, even if your submission
+took place before the date you sign these terms.
+
+5. You covenant, represent, warrant and agree that:
+
+    * Each contribution that you submit is and shall be an original work of
+    authorship and you can legally grant the rights set out in this SCA;
+
+    * to the best of your knowledge, each contribution will not violate any
+    third party's copyrights, trademarks, patents, or other intellectual
+    property rights; and
+
+    * each contribution shall be in compliance with U.S. export control laws and
+    other applicable export and import laws. You agree to notify us if you
+    become aware of any circumstance which would make any of the foregoing
+    representations inaccurate in any respect. We may publicly disclose your
+    participation in the project, including the fact that you have signed the SCA.
+
+6. This SCA is governed by the laws of the State of California and applicable
+U.S. Federal law. Any choice of law rules will not apply.
+
+7. Please place an “x” on one of the applicable statements below. Please do NOT
+mark both statements:
+
+    * [x] I am signing on behalf of myself as an individual and no other person
+    or entity, including my employer, has or will have rights with respect to my
+    contributions.
+
+    * [ ] I am signing on behalf of my employer or a legal entity and I have the
+    actual authority to contractually bind that entity.
+
+## Contributor Details
+
+| Field                          | Entry                |
+|------------------------------- | -------------------- |
+| Name                           | Louis Guitton        |
+| Company name (if applicable)   |                      |
+| Title or role (if applicable)  |                      |
+| Date                           | 2020-04-25           |
+| GitHub username                | louisguitton         |
+| Website (optional)             | https://guitton.co/  |
--- a/.github/contributors/michael-k.md
+++ b/.github/contributors/michael-k.md
@ -0,0 +1,106 @@
+# spaCy contributor agreement
+
+This spaCy Contributor Agreement (**"SCA"**) is based on the
+[Oracle Contributor Agreement](http://www.oracle.com/technetwork/oca-405177.pdf).
+The SCA applies to any contribution that you make to any product or project
+managed by us (the **"project"**), and sets out the intellectual property rights
+you grant to us in the contributed materials. The term **"us"** shall mean
+[ExplosionAI GmbH](https://explosion.ai/legal). The term
+**"you"** shall mean the person or entity identified below.
+
+If you agree to be bound by these terms, fill in the information requested
+below and include the filled-in version with your first pull request, under the
+folder [`.github/contributors/`](/.github/contributors/). The name of the file
+should be your GitHub username, with the extension `.md`. For example, the user
+example_user would create the file `.github/contributors/example_user.md`.
+
+Read this agreement carefully before signing. These terms and conditions
+constitute a binding legal agreement.
+
+## Contributor Agreement
+
+1. The term "contribution" or "contributed materials" means any source code,
+object code, patch, tool, sample, graphic, specification, manual,
+documentation, or any other material posted or submitted by you to the project.
+
+2. With respect to any worldwide copyrights, or copyright applications and
+registrations, in your contribution:
+
+    * you hereby assign to us joint ownership, and to the extent that such
+    assignment is or becomes invalid, ineffective or unenforceable, you hereby
+    grant to us a perpetual, irrevocable, non-exclusive, worldwide, no-charge,
+    royalty-free, unrestricted license to exercise all rights under those
+    copyrights. This includes, at our option, the right to sublicense these same
+    rights to third parties through multiple levels of sublicensees or other
+    licensing arrangements;
+
+    * you agree that each of us can do all things in relation to your
+    contribution as if each of us were the sole owners, and if one of us makes
+    a derivative work of your contribution, the one who makes the derivative
+    work (or has it made) will be the sole owner of that derivative work;
+
+    * you agree that you will not assert any moral rights in your contribution
+    against us, our licensees or transferees;
+
+    * you agree that we may register a copyright in your contribution and
+    exercise all ownership rights associated with it; and
+
+    * you agree that neither of us has any duty to consult with, obtain the
+    consent of, pay or render an accounting to the other for any use or
+    distribution of your contribution.
+
+3. With respect to any patents you own, or that you can license without payment
+to any third party, you hereby grant to us a perpetual, irrevocable,
+non-exclusive, worldwide, no-charge, royalty-free license to:
+
+    * make, have made, use, sell, offer to sell, import, and otherwise transfer
+    your contribution in whole or in part, alone or in combination with or
+    included in any product, work or materials arising out of the project to
+    which your contribution was submitted, and
+
+    * at our option, to sublicense these same rights to third parties through
+    multiple levels of sublicensees or other licensing arrangements.
+
+4. Except as set out above, you keep all right, title, and interest in your
+contribution. The rights that you grant to us under these terms are effective
+on the date you first submitted a contribution to us, even if your submission
+took place before the date you sign these terms.
+
+5. You covenant, represent, warrant and agree that:
+
+    * Each contribution that you submit is and shall be an original work of
+    authorship and you can legally grant the rights set out in this SCA;
+
+    * to the best of your knowledge, each contribution will not violate any
+    third party's copyrights, trademarks, patents, or other intellectual
+    property rights; and
+
+    * each contribution shall be in compliance with U.S. export control laws and
+    other applicable export and import laws. You agree to notify us if you
+    become aware of any circumstance which would make any of the foregoing
+    representations inaccurate in any respect. We may publicly disclose your
+    participation in the project, including the fact that you have signed the SCA.
+
+6. This SCA is governed by the laws of the State of California and applicable
+U.S. Federal law. Any choice of law rules will not apply.
+
+7. Please place an “x” on one of the applicable statements below. Please do NOT
+mark both statements:
+
+    * [X] I am signing on behalf of myself as an individual and no other person
+    or entity, including my employer, has or will have rights with respect to my
+    contributions.
+
+    * [ ] I am signing on behalf of my employer or a legal entity and I have the
+    actual authority to contractually bind that entity.
+
+## Contributor Details
+
+| Field                          | Entry                |
+|------------------------------- | -------------------- |
+| Name                           | Michael Käufl        |
+| Company name (if applicable)   |                      |
+| Title or role (if applicable)  |                      |
+| Date                           | 2020-04-23           |
+| GitHub username                | michael-k            |
+| Website (optional)             |                      |
--- a/.github/contributors/nikhilsaldanha.md
+++ b/.github/contributors/nikhilsaldanha.md
@ -0,0 +1,106 @@
+# spaCy contributor agreement
+
+This spaCy Contributor Agreement (**"SCA"**) is based on the
+[Oracle Contributor Agreement](http://www.oracle.com/technetwork/oca-405177.pdf).
+The SCA applies to any contribution that you make to any product or project
+managed by us (the **"project"**), and sets out the intellectual property rights
+you grant to us in the contributed materials. The term **"us"** shall mean
+[ExplosionAI GmbH](https://explosion.ai/legal). The term
+**"you"** shall mean the person or entity identified below.
+
+If you agree to be bound by these terms, fill in the information requested
+below and include the filled-in version with your first pull request, under the
+folder [`.github/contributors/`](/.github/contributors/). The name of the file
+should be your GitHub username, with the extension `.md`. For example, the user
+example_user would create the file `.github/contributors/example_user.md`.
+
+Read this agreement carefully before signing. These terms and conditions
+constitute a binding legal agreement.
+
+## Contributor Agreement
+
+1. The term "contribution" or "contributed materials" means any source code,
+object code, patch, tool, sample, graphic, specification, manual,
+documentation, or any other material posted or submitted by you to the project.
+
+2. With respect to any worldwide copyrights, or copyright applications and
+registrations, in your contribution:
+
+    * you hereby assign to us joint ownership, and to the extent that such
+    assignment is or becomes invalid, ineffective or unenforceable, you hereby
+    grant to us a perpetual, irrevocable, non-exclusive, worldwide, no-charge,
+    royalty-free, unrestricted license to exercise all rights under those
+    copyrights. This includes, at our option, the right to sublicense these same
+    rights to third parties through multiple levels of sublicensees or other
+    licensing arrangements;
+
+    * you agree that each of us can do all things in relation to your
+    contribution as if each of us were the sole owners, and if one of us makes
+    a derivative work of your contribution, the one who makes the derivative
+    work (or has it made) will be the sole owner of that derivative work;
+
+    * you agree that you will not assert any moral rights in your contribution
+    against us, our licensees or transferees;
+
+    * you agree that we may register a copyright in your contribution and
+    exercise all ownership rights associated with it; and
+
+    * you agree that neither of us has any duty to consult with, obtain the
+    consent of, pay or render an accounting to the other for any use or
+    distribution of your contribution.
+
+3. With respect to any patents you own, or that you can license without payment
+to any third party, you hereby grant to us a perpetual, irrevocable,
+non-exclusive, worldwide, no-charge, royalty-free license to:
+
+    * make, have made, use, sell, offer to sell, import, and otherwise transfer
+    your contribution in whole or in part, alone or in combination with or
+    included in any product, work or materials arising out of the project to
+    which your contribution was submitted, and
+
+    * at our option, to sublicense these same rights to third parties through
+    multiple levels of sublicensees or other licensing arrangements.
+
+4. Except as set out above, you keep all right, title, and interest in your
+contribution. The rights that you grant to us under these terms are effective
+on the date you first submitted a contribution to us, even if your submission
+took place before the date you sign these terms.
+
+5. You covenant, represent, warrant and agree that:
+
+    * Each contribution that you submit is and shall be an original work of
+    authorship and you can legally grant the rights set out in this SCA;
+
+    * to the best of your knowledge, each contribution will not violate any
+    third party's copyrights, trademarks, patents, or other intellectual
+    property rights; and
+
+    * each contribution shall be in compliance with U.S. export control laws and
+    other applicable export and import laws. You agree to notify us if you
+    become aware of any circumstance which would make any of the foregoing
+    representations inaccurate in any respect. We may publicly disclose your
+    participation in the project, including the fact that you have signed the SCA.
+
+6. This SCA is governed by the laws of the State of California and applicable
+U.S. Federal law. Any choice of law rules will not apply.
+
+7. Please place an “x” on one of the applicable statements below. Please do NOT
+mark both statements:
+
+    * [x] I am signing on behalf of myself as an individual and no other person
+    or entity, including my employer, has or will have rights with respect to my
+    contributions.
+
+    * [x] I am signing on behalf of my employer or a legal entity and I have the
+    actual authority to contractually bind that entity.
+
+## Contributor Details
+
+| Field                          | Entry                |
+|------------------------------- | -------------------- |
+| Name                           | Nikhil Saldanha      |
+| Company name (if applicable)   |                      |
+| Title or role (if applicable)  |                      |
+| Date                           | 2020-03-17           |
+| GitHub username                | nikhilsaldanha       |
+| Website (optional)             |                      |
--- a/.github/contributors/osori.md
+++ b/.github/contributors/osori.md
@ -0,0 +1,106 @@
+# spaCy contributor agreement
+
+This spaCy Contributor Agreement (**"SCA"**) is based on the
+[Oracle Contributor Agreement](http://www.oracle.com/technetwork/oca-405177.pdf).
+The SCA applies to any contribution that you make to any product or project
+managed by us (the **"project"**), and sets out the intellectual property rights
+you grant to us in the contributed materials. The term **"us"** shall mean
+[ExplosionAI GmbH](https://explosion.ai/legal). The term
+**"you"** shall mean the person or entity identified below.
+
+If you agree to be bound by these terms, fill in the information requested
+below and include the filled-in version with your first pull request, under the
+folder [`.github/contributors/`](/.github/contributors/). The name of the file
+should be your GitHub username, with the extension `.md`. For example, the user
+example_user would create the file `.github/contributors/example_user.md`.
+
+Read this agreement carefully before signing. These terms and conditions
+constitute a binding legal agreement.
+
+## Contributor Agreement
+
+1. The term "contribution" or "contributed materials" means any source code,
+object code, patch, tool, sample, graphic, specification, manual,
+documentation, or any other material posted or submitted by you to the project.
+
+2. With respect to any worldwide copyrights, or copyright applications and
+registrations, in your contribution:
+
+    * you hereby assign to us joint ownership, and to the extent that such
+    assignment is or becomes invalid, ineffective or unenforceable, you hereby
+    grant to us a perpetual, irrevocable, non-exclusive, worldwide, no-charge,
+    royalty-free, unrestricted license to exercise all rights under those
+    copyrights. This includes, at our option, the right to sublicense these same
+    rights to third parties through multiple levels of sublicensees or other
+    licensing arrangements;
+
+    * you agree that each of us can do all things in relation to your
+    contribution as if each of us were the sole owners, and if one of us makes
+    a derivative work of your contribution, the one who makes the derivative
+    work (or has it made) will be the sole owner of that derivative work;
+
+    * you agree that you will not assert any moral rights in your contribution
+    against us, our licensees or transferees;
+
+    * you agree that we may register a copyright in your contribution and
+    exercise all ownership rights associated with it; and
+
+    * you agree that neither of us has any duty to consult with, obtain the
+    consent of, pay or render an accounting to the other for any use or
+    distribution of your contribution.
+
+3. With respect to any patents you own, or that you can license without payment
+to any third party, you hereby grant to us a perpetual, irrevocable,
+non-exclusive, worldwide, no-charge, royalty-free license to:
+
+    * make, have made, use, sell, offer to sell, import, and otherwise transfer
+    your contribution in whole or in part, alone or in combination with or
+    included in any product, work or materials arising out of the project to
+    which your contribution was submitted, and
+
+    * at our option, to sublicense these same rights to third parties through
+    multiple levels of sublicensees or other licensing arrangements.
+
+4. Except as set out above, you keep all right, title, and interest in your
+contribution. The rights that you grant to us under these terms are effective
+on the date you first submitted a contribution to us, even if your submission
+took place before the date you sign these terms.
+
+5. You covenant, represent, warrant and agree that:
+
+    * Each contribution that you submit is and shall be an original work of
+    authorship and you can legally grant the rights set out in this SCA;
+
+    * to the best of your knowledge, each contribution will not violate any
+    third party's copyrights, trademarks, patents, or other intellectual
+    property rights; and
+
+    * each contribution shall be in compliance with U.S. export control laws and
+    other applicable export and import laws. You agree to notify us if you
+    become aware of any circumstance which would make any of the foregoing
+    representations inaccurate in any respect. We may publicly disclose your
+    participation in the project, including the fact that you have signed the SCA.
+
+6. This SCA is governed by the laws of the State of California and applicable
+U.S. Federal law. Any choice of law rules will not apply.
+
+7. Please place an “x” on one of the applicable statements below. Please do NOT
+mark both statements:
+
+    * [x] I am signing on behalf of myself as an individual and no other person
+    or entity, including my employer, has or will have rights with respect to my
+    contributions.
+
+    * [ ] I am signing on behalf of my employer or a legal entity and I have the
+    actual authority to contractually bind that entity.
+
+## Contributor Details
+
+| Field                          | Entry                |
+|------------------------------- | -------------------- |
+| Name                           | Ilkyu Ju             |
+| Company name (if applicable)   |                      |
+| Title or role (if applicable)  |                      |
+| Date                           | 2020-05-17           |
+| GitHub username                | osori                |
+| Website (optional)             |                      |
--- a/.github/contributors/paoloq.md
+++ b/.github/contributors/paoloq.md
@ -0,0 +1,106 @@
+# spaCy contributor agreement
+
+This spaCy Contributor Agreement (**"SCA"**) is based on the
+[Oracle Contributor Agreement](http://www.oracle.com/technetwork/oca-405177.pdf).
+The SCA applies to any contribution that you make to any product or project
+managed by us (the **"project"**), and sets out the intellectual property rights
+you grant to us in the contributed materials. The term **"us"** shall mean
+[ExplosionAI GmbH](https://explosion.ai/legal). The term
+**"you"** shall mean the person or entity identified below.
+
+If you agree to be bound by these terms, fill in the information requested
+below and include the filled-in version with your first pull request, under the
+folder [`.github/contributors/`](/.github/contributors/). The name of the file
+should be your GitHub username, with the extension `.md`. For example, the user
+example_user would create the file `.github/contributors/example_user.md`.
+
+Read this agreement carefully before signing. These terms and conditions
+constitute a binding legal agreement.
+
+## Contributor Agreement
+
+1. The term "contribution" or "contributed materials" means any source code,
+object code, patch, tool, sample, graphic, specification, manual,
+documentation, or any other material posted or submitted by you to the project.
+
+2. With respect to any worldwide copyrights, or copyright applications and
+registrations, in your contribution:
+
+    * you hereby assign to us joint ownership, and to the extent that such
+    assignment is or becomes invalid, ineffective or unenforceable, you hereby
+    grant to us a perpetual, irrevocable, non-exclusive, worldwide, no-charge,
+    royalty-free, unrestricted license to exercise all rights under those
+    copyrights. This includes, at our option, the right to sublicense these same
+    rights to third parties through multiple levels of sublicensees or other
+    licensing arrangements;
+
+    * you agree that each of us can do all things in relation to your
+    contribution as if each of us were the sole owners, and if one of us makes
+    a derivative work of your contribution, the one who makes the derivative
+    work (or has it made) will be the sole owner of that derivative work;
+
+    * you agree that you will not assert any moral rights in your contribution
+    against us, our licensees or transferees;
+
+    * you agree that we may register a copyright in your contribution and
+    exercise all ownership rights associated with it; and
+
+    * you agree that neither of us has any duty to consult with, obtain the
+    consent of, pay or render an accounting to the other for any use or
+    distribution of your contribution.
+
+3. With respect to any patents you own, or that you can license without payment
+to any third party, you hereby grant to us a perpetual, irrevocable,
+non-exclusive, worldwide, no-charge, royalty-free license to:
+
+    * make, have made, use, sell, offer to sell, import, and otherwise transfer
+    your contribution in whole or in part, alone or in combination with or
+    included in any product, work or materials arising out of the project to
+    which your contribution was submitted, and
+
+    * at our option, to sublicense these same rights to third parties through
+    multiple levels of sublicensees or other licensing arrangements.
+
+4. Except as set out above, you keep all right, title, and interest in your
+contribution. The rights that you grant to us under these terms are effective
+on the date you first submitted a contribution to us, even if your submission
+took place before the date you sign these terms.
+
+5. You covenant, represent, warrant and agree that:
+
+    * Each contribution that you submit is and shall be an original work of
+    authorship and you can legally grant the rights set out in this SCA;
+
+    * to the best of your knowledge, each contribution will not violate any
+    third party's copyrights, trademarks, patents, or other intellectual
+    property rights; and
+
+    * each contribution shall be in compliance with U.S. export control laws and
+    other applicable export and import laws. You agree to notify us if you
+    become aware of any circumstance which would make any of the foregoing
+    representations inaccurate in any respect. We may publicly disclose your
+    participation in the project, including the fact that you have signed the SCA.
+
+6. This SCA is governed by the laws of the State of California and applicable
+U.S. Federal law. Any choice of law rules will not apply.
+
+7. Please place an “x” on one of the applicable statements below. Please do NOT
+mark both statements:
+
+    * [x] I am signing on behalf of myself as an individual and no other person
+    or entity, including my employer, has or will have rights with respect to my
+    contributions.
+
+    * [ ] I am signing on behalf of my employer or a legal entity and I have the
+    actual authority to contractually bind that entity.
+
+## Contributor Details
+
+| Field                          | Entry                |
+|------------------------------- | -------------------- |
+| Name                           | Paolo Arduin         |
+| Company name (if applicable)   |                      |
+| Title or role (if applicable)  |                      |
+| Date                           | 9 April 2020         |
+| GitHub username                | paoloq               |
+| Website (optional)             |                      |
--- a/.github/contributors/punitvara.md
+++ b/.github/contributors/punitvara.md
@ -0,0 +1,107 @@
+# spaCy contributor agreement
+
+This spaCy Contributor Agreement (**"SCA"**) is based on the
+[Oracle Contributor Agreement](http://www.oracle.com/technetwork/oca-405177.pdf).
+The SCA applies to any contribution that you make to any product or project
+managed by us (the **"project"**), and sets out the intellectual property rights
+you grant to us in the contributed materials. The term **"us"** shall mean
+[ExplosionAI UG (haftungsbeschränkt)](https://explosion.ai/legal). The term
+**"you"** shall mean the person or entity identified below.
+
+If you agree to be bound by these terms, fill in the information requested
+below and include the filled-in version with your first pull request, under the
+folder [`.github/contributors/`](/.github/contributors/). The name of the file
+should be your GitHub username, with the extension `.md`. For example, the user
+example_user would create the file `.github/contributors/example_user.md`.
+
+Read this agreement carefully before signing. These terms and conditions
+constitute a binding legal agreement.
+
+## Contributor Agreement
+
+1. The term "contribution" or "contributed materials" means any source code,
+object code, patch, tool, sample, graphic, specification, manual,
+documentation, or any other material posted or submitted by you to the project.
+
+2. With respect to any worldwide copyrights, or copyright applications and
+registrations, in your contribution:
+
+    * you hereby assign to us joint ownership, and to the extent that such
+    assignment is or becomes invalid, ineffective or unenforceable, you hereby
+    grant to us a perpetual, irrevocable, non-exclusive, worldwide, no-charge,
+    royalty-free, unrestricted license to exercise all rights under those
+    copyrights. This includes, at our option, the right to sublicense these same
+    rights to third parties through multiple levels of sublicensees or other
+    licensing arrangements;
+
+    * you agree that each of us can do all things in relation to your
+    contribution as if each of us were the sole owners, and if one of us makes
+    a derivative work of your contribution, the one who makes the derivative
+    work (or has it made) will be the sole owner of that derivative work;
+
+    * you agree that you will not assert any moral rights in your contribution
+    against us, our licensees or transferees;
+
+    * you agree that we may register a copyright in your contribution and
+    exercise all ownership rights associated with it; and
+
+    * you agree that neither of us has any duty to consult with, obtain the
+    consent of, pay or render an accounting to the other for any use or
+    distribution of your contribution.
+
+3. With respect to any patents you own, or that you can license without payment
+to any third party, you hereby grant to us a perpetual, irrevocable,
+non-exclusive, worldwide, no-charge, royalty-free license to:
+
+    * make, have made, use, sell, offer to sell, import, and otherwise transfer
+    your contribution in whole or in part, alone or in combination with or
+    included in any product, work or materials arising out of the project to
+    which your contribution was submitted, and
+
+    * at our option, to sublicense these same rights to third parties through
+    multiple levels of sublicensees or other licensing arrangements.
+
+4. Except as set out above, you keep all right, title, and interest in your
+contribution. The rights that you grant to us under these terms are effective
+on the date you first submitted a contribution to us, even if your submission
+took place before the date you sign these terms.
+
+5. You covenant, represent, warrant and agree that:
+
+    * Each contribution that you submit is and shall be an original work of
+    authorship and you can legally grant the rights set out in this SCA;
+
+    * to the best of your knowledge, each contribution will not violate any
+    third party's copyrights, trademarks, patents, or other intellectual
+    property rights; and
+
+    * each contribution shall be in compliance with U.S. export control laws and
+    other applicable export and import laws. You agree to notify us if you
+    become aware of any circumstance which would make any of the foregoing
+    representations inaccurate in any respect. We may publicly disclose your 
+    participation in the project, including the fact that you have signed the SCA.
+
+6. This SCA is governed by the laws of the State of California and applicable
+U.S. Federal law. Any choice of law rules will not apply.
+
+7. Please place an “x” on one of the applicable statements below. Please do NOT
+mark both statements:
+
+    * [x] I am signing on behalf of myself as an individual and no other person
+    or entity, including my employer, has or will have rights with respect to my
+    contributions.
+
+    * [ ] I am signing on behalf of my employer or a legal entity and I have the
+    actual authority to contractually bind that entity.
+
+## Contributor Details
+
+| Field                          | Entry                    |
+|------------------------------- | ------------------------ |
+| Name                           | Punit Vara               |
+| Company name (if applicable)   |                          |
+| Title or role (if applicable)  |                          |
+| Date                           | 2020-04-26               |
+| GitHub username                | punitvara                |
+| Website (optional)             | https://punitvara.com    |
+
--- a/.github/contributors/sabiqueqb.md
+++ b/.github/contributors/sabiqueqb.md
@ -0,0 +1,106 @@
+# spaCy contributor agreement
+
+This spaCy Contributor Agreement (**"SCA"**) is based on the
+[Oracle Contributor Agreement](http://www.oracle.com/technetwork/oca-405177.pdf).
+The SCA applies to any contribution that you make to any product or project
+managed by us (the **"project"**), and sets out the intellectual property rights
+you grant to us in the contributed materials. The term **"us"** shall mean
+[ExplosionAI GmbH](https://explosion.ai/legal). The term
+**"you"** shall mean the person or entity identified below.
+
+If you agree to be bound by these terms, fill in the information requested
+below and include the filled-in version with your first pull request, under the
+folder [`.github/contributors/`](/.github/contributors/). The name of the file
+should be your GitHub username, with the extension `.md`. For example, the user
+example_user would create the file `.github/contributors/example_user.md`.
+
+Read this agreement carefully before signing. These terms and conditions
+constitute a binding legal agreement.
+
+## Contributor Agreement
+
+1. The term "contribution" or "contributed materials" means any source code,
+object code, patch, tool, sample, graphic, specification, manual,
+documentation, or any other material posted or submitted by you to the project.
+
+2. With respect to any worldwide copyrights, or copyright applications and
+registrations, in your contribution:
+
+    * you hereby assign to us joint ownership, and to the extent that such
+    assignment is or becomes invalid, ineffective or unenforceable, you hereby
+    grant to us a perpetual, irrevocable, non-exclusive, worldwide, no-charge,
+    royalty-free, unrestricted license to exercise all rights under those
+    copyrights. This includes, at our option, the right to sublicense these same
+    rights to third parties through multiple levels of sublicensees or other
+    licensing arrangements;
+
+    * you agree that each of us can do all things in relation to your
+    contribution as if each of us were the sole owners, and if one of us makes
+    a derivative work of your contribution, the one who makes the derivative
+    work (or has it made) will be the sole owner of that derivative work;
+
+    * you agree that you will not assert any moral rights in your contribution
+    against us, our licensees or transferees;
+
+    * you agree that we may register a copyright in your contribution and
+    exercise all ownership rights associated with it; and
+
+    * you agree that neither of us has any duty to consult with, obtain the
+    consent of, pay or render an accounting to the other for any use or
+    distribution of your contribution.
+
+3. With respect to any patents you own, or that you can license without payment
+to any third party, you hereby grant to us a perpetual, irrevocable,
+non-exclusive, worldwide, no-charge, royalty-free license to:
+
+    * make, have made, use, sell, offer to sell, import, and otherwise transfer
+    your contribution in whole or in part, alone or in combination with or
+    included in any product, work or materials arising out of the project to
+    which your contribution was submitted, and
+
+    * at our option, to sublicense these same rights to third parties through
+    multiple levels of sublicensees or other licensing arrangements.
+
+4. Except as set out above, you keep all right, title, and interest in your
+contribution. The rights that you grant to us under these terms are effective
+on the date you first submitted a contribution to us, even if your submission
+took place before the date you sign these terms.
+
+5. You covenant, represent, warrant and agree that:
+
+    * Each contribution that you submit is and shall be an original work of
+    authorship and you can legally grant the rights set out in this SCA;
+
+    * to the best of your knowledge, each contribution will not violate any
+    third party's copyrights, trademarks, patents, or other intellectual
+    property rights; and
+
+    * each contribution shall be in compliance with U.S. export control laws and
+    other applicable export and import laws. You agree to notify us if you
+    become aware of any circumstance which would make any of the foregoing
+    representations inaccurate in any respect. We may publicly disclose your
+    participation in the project, including the fact that you have signed the SCA.
+
+6. This SCA is governed by the laws of the State of California and applicable
+U.S. Federal law. Any choice of law rules will not apply.
+
+7. Please place an “x” on one of the applicable statements below. Please do NOT
+mark both statements:
+
+    * [ ] I am signing on behalf of myself as an individual and no other person
+    or entity, including my employer, has or will have rights with respect to my
+    contributions.
+
+    * [x] I am signing on behalf of my employer or a legal entity and I have the
+    actual authority to contractually bind that entity.
+
+## Contributor Details
+
+| Field                          | Entry                |
+|------------------------------- | -------------------- |
+| Name                           | Sabique Ahammed Lava |
+| Company name (if applicable)   | QBurst               |
+| Title or role (if applicable)  | Senior Engineer      |
+| Date                           | 24 Apr 2020          |
+| GitHub username                | sabiqueqb            |
+| Website (optional)             |                      |
--- a/.github/contributors/sebastienharinck.md
+++ b/.github/contributors/sebastienharinck.md
@ -0,0 +1,106 @@
+# spaCy contributor agreement
+
+This spaCy Contributor Agreement (**"SCA"**) is based on the
+[Oracle Contributor Agreement](http://www.oracle.com/technetwork/oca-405177.pdf).
+The SCA applies to any contribution that you make to any product or project
+managed by us (the **"project"**), and sets out the intellectual property rights
+you grant to us in the contributed materials. The term **"us"** shall mean
+[ExplosionAI GmbH](https://explosion.ai/legal). The term
+**"you"** shall mean the person or entity identified below.
+
+If you agree to be bound by these terms, fill in the information requested
+below and include the filled-in version with your first pull request, under the
+folder [`.github/contributors/`](/.github/contributors/). The name of the file
+should be your GitHub username, with the extension `.md`. For example, the user
+example_user would create the file `.github/contributors/example_user.md`.
+
+Read this agreement carefully before signing. These terms and conditions
+constitute a binding legal agreement.
+
+## Contributor Agreement
+
+1. The term "contribution" or "contributed materials" means any source code,
+object code, patch, tool, sample, graphic, specification, manual,
+documentation, or any other material posted or submitted by you to the project.
+
+2. With respect to any worldwide copyrights, or copyright applications and
+registrations, in your contribution:
+
+    * you hereby assign to us joint ownership, and to the extent that such
+    assignment is or becomes invalid, ineffective or unenforceable, you hereby
+    grant to us a perpetual, irrevocable, non-exclusive, worldwide, no-charge,
+    royalty-free, unrestricted license to exercise all rights under those
+    copyrights. This includes, at our option, the right to sublicense these same
+    rights to third parties through multiple levels of sublicensees or other
+    licensing arrangements;
+
+    * you agree that each of us can do all things in relation to your
+    contribution as if each of us were the sole owners, and if one of us makes
+    a derivative work of your contribution, the one who makes the derivative
+    work (or has it made) will be the sole owner of that derivative work;
+
+    * you agree that you will not assert any moral rights in your contribution
+    against us, our licensees or transferees;
+
+    * you agree that we may register a copyright in your contribution and
+    exercise all ownership rights associated with it; and
+
+    * you agree that neither of us has any duty to consult with, obtain the
+    consent of, pay or render an accounting to the other for any use or
+    distribution of your contribution.
+
+3. With respect to any patents you own, or that you can license without payment
+to any third party, you hereby grant to us a perpetual, irrevocable,
+non-exclusive, worldwide, no-charge, royalty-free license to:
+
+    * make, have made, use, sell, offer to sell, import, and otherwise transfer
+    your contribution in whole or in part, alone or in combination with or
+    included in any product, work or materials arising out of the project to
+    which your contribution was submitted, and
+
+    * at our option, to sublicense these same rights to third parties through
+    multiple levels of sublicensees or other licensing arrangements.
+
+4. Except as set out above, you keep all right, title, and interest in your
+contribution. The rights that you grant to us under these terms are effective
+on the date you first submitted a contribution to us, even if your submission
+took place before the date you sign these terms.
+
+5. You covenant, represent, warrant and agree that:
+
+    * Each contribution that you submit is and shall be an original work of
+    authorship and you can legally grant the rights set out in this SCA;
+
+    * to the best of your knowledge, each contribution will not violate any
+    third party's copyrights, trademarks, patents, or other intellectual
+    property rights; and
+
+    * each contribution shall be in compliance with U.S. export control laws and
+    other applicable export and import laws. You agree to notify us if you
+    become aware of any circumstance which would make any of the foregoing
+    representations inaccurate in any respect. We may publicly disclose your
+    participation in the project, including the fact that you have signed the SCA.
+
+6. This SCA is governed by the laws of the State of California and applicable
+U.S. Federal law. Any choice of law rules will not apply.
+
+7. Please place an “x” on one of the applicable statements below. Please do NOT
+mark both statements:
+
+    * [ ] I am signing on behalf of myself as an individual and no other person
+    or entity, including my employer, has or will have rights with respect to my
+    contributions.
+
+    * [x] I am signing on behalf of my employer or a legal entity and I have the
+    actual authority to contractually bind that entity.
+
+## Contributor Details
+
+| Field                          | Entry                                        |
+|------------------------------- | -------------------------------------------- |
+| Name                           | Sébastien Harinck                            |
+| Company name (if applicable)   | Odaxiom                                      |
+| Title or role (if applicable)  | ML Engineer                                  |
+| Date                           | 2020-04-15                                   |
+| GitHub username                | sebastienharinck                             |
+| Website (optional)             | [https://odaxiom.com](https://odaxiom.com)   |
--- a/.github/contributors/thomasthiebaud.md
+++ b/.github/contributors/thomasthiebaud.md
@ -0,0 +1,106 @@
+# spaCy contributor agreement
+
+This spaCy Contributor Agreement (**"SCA"**) is based on the
+[Oracle Contributor Agreement](http://www.oracle.com/technetwork/oca-405177.pdf).
+The SCA applies to any contribution that you make to any product or project
+managed by us (the **"project"**), and sets out the intellectual property rights
+you grant to us in the contributed materials. The term **"us"** shall mean
+[ExplosionAI UG (haftungsbeschränkt)](https://explosion.ai/legal). The term
+**"you"** shall mean the person or entity identified below.
+
+If you agree to be bound by these terms, fill in the information requested
+below and include the filled-in version with your first pull request, under the
+folder [`.github/contributors/`](/.github/contributors/). The name of the file
+should be your GitHub username, with the extension `.md`. For example, the user
+example_user would create the file `.github/contributors/example_user.md`.
+
+Read this agreement carefully before signing. These terms and conditions
+constitute a binding legal agreement.
+
+## Contributor Agreement
+
+1.  The term "contribution" or "contributed materials" means any source code,
+    object code, patch, tool, sample, graphic, specification, manual,
+    documentation, or any other material posted or submitted by you to the project.
+
+2.  With respect to any worldwide copyrights, or copyright applications and
+    registrations, in your contribution:
+
+        * you hereby assign to us joint ownership, and to the extent that such
+        assignment is or becomes invalid, ineffective or unenforceable, you hereby
+        grant to us a perpetual, irrevocable, non-exclusive, worldwide, no-charge,
+        royalty-free, unrestricted license to exercise all rights under those
+        copyrights. This includes, at our option, the right to sublicense these same
+        rights to third parties through multiple levels of sublicensees or other
+        licensing arrangements;
+
+        * you agree that each of us can do all things in relation to your
+        contribution as if each of us were the sole owners, and if one of us makes
+        a derivative work of your contribution, the one who makes the derivative
+        work (or has it made) will be the sole owner of that derivative work;
+
+        * you agree that you will not assert any moral rights in your contribution
+        against us, our licensees or transferees;
+
+        * you agree that we may register a copyright in your contribution and
+        exercise all ownership rights associated with it; and
+
+        * you agree that neither of us has any duty to consult with, obtain the
+        consent of, pay or render an accounting to the other for any use or
+        distribution of your contribution.
+
+3.  With respect to any patents you own, or that you can license without payment
+    to any third party, you hereby grant to us a perpetual, irrevocable,
+    non-exclusive, worldwide, no-charge, royalty-free license to:
+
+        * make, have made, use, sell, offer to sell, import, and otherwise transfer
+        your contribution in whole or in part, alone or in combination with or
+        included in any product, work or materials arising out of the project to
+        which your contribution was submitted, and
+
+        * at our option, to sublicense these same rights to third parties through
+        multiple levels of sublicensees or other licensing arrangements.
+
+4.  Except as set out above, you keep all right, title, and interest in your
+    contribution. The rights that you grant to us under these terms are effective
+    on the date you first submitted a contribution to us, even if your submission
+    took place before the date you sign these terms.
+
+5.  You covenant, represent, warrant and agree that:
+
+    - Each contribution that you submit is and shall be an original work of
+      authorship and you can legally grant the rights set out in this SCA;
+
+    - to the best of your knowledge, each contribution will not violate any
+      third party's copyrights, trademarks, patents, or other intellectual
+      property rights; and
+
+    - each contribution shall be in compliance with U.S. export control laws and
+      other applicable export and import laws. You agree to notify us if you
+      become aware of any circumstance which would make any of the foregoing
+      representations inaccurate in any respect. We may publicly disclose your
+      participation in the project, including the fact that you have signed the SCA.
+
+6.  This SCA is governed by the laws of the State of California and applicable
+    U.S. Federal law. Any choice of law rules will not apply.
+
+7.  Please place an “x” on one of the applicable statements below. Please do NOT
+    mark both statements:
+
+        * [x] I am signing on behalf of myself as an individual and no other person
+        or entity, including my employer, has or will have rights with respect to my
+        contributions.
+
+        * [ ] I am signing on behalf of my employer or a legal entity and I have the
+        actual authority to contractually bind that entity.
+
+## Contributor Details
+
+| Field                         | Entry           |
+| ----------------------------- | --------------- |
+| Name                          | Thomas Thiebaud |
+| Company name (if applicable)  |                 |
+| Title or role (if applicable) |                 |
+| Date                          | 2020-04-07      |
+| GitHub username               | thomasthiebaud  |
+| Website (optional)            |                 |
--- a/.github/contributors/tommilligan.md
+++ b/.github/contributors/tommilligan.md
@ -0,0 +1,106 @@
+# spaCy contributor agreement
+
+This spaCy Contributor Agreement (**"SCA"**) is based on the
+[Oracle Contributor Agreement](http://www.oracle.com/technetwork/oca-405177.pdf).
+The SCA applies to any contribution that you make to any product or project
+managed by us (the **"project"**), and sets out the intellectual property rights
+you grant to us in the contributed materials. The term **"us"** shall mean
+[ExplosionAI GmbH](https://explosion.ai/legal). The term
+**"you"** shall mean the person or entity identified below.
+
+If you agree to be bound by these terms, fill in the information requested
+below and include the filled-in version with your first pull request, under the
+folder [`.github/contributors/`](/.github/contributors/). The name of the file
+should be your GitHub username, with the extension `.md`. For example, the user
+example_user would create the file `.github/contributors/example_user.md`.
+
+Read this agreement carefully before signing. These terms and conditions
+constitute a binding legal agreement.
+
+## Contributor Agreement
+
+1.  The term "contribution" or "contributed materials" means any source code,
+    object code, patch, tool, sample, graphic, specification, manual,
+    documentation, or any other material posted or submitted by you to the project.
+
+2.  With respect to any worldwide copyrights, or copyright applications and
+    registrations, in your contribution:
+
+        * you hereby assign to us joint ownership, and to the extent that such
+        assignment is or becomes invalid, ineffective or unenforceable, you hereby
+        grant to us a perpetual, irrevocable, non-exclusive, worldwide, no-charge,
+        royalty-free, unrestricted license to exercise all rights under those
+        copyrights. This includes, at our option, the right to sublicense these same
+        rights to third parties through multiple levels of sublicensees or other
+        licensing arrangements;
+
+        * you agree that each of us can do all things in relation to your
+        contribution as if each of us were the sole owners, and if one of us makes
+        a derivative work of your contribution, the one who makes the derivative
+        work (or has it made) will be the sole owner of that derivative work;
+
+        * you agree that you will not assert any moral rights in your contribution
+        against us, our licensees or transferees;
+
+        * you agree that we may register a copyright in your contribution and
+        exercise all ownership rights associated with it; and
+
+        * you agree that neither of us has any duty to consult with, obtain the
+        consent of, pay or render an accounting to the other for any use or
+        distribution of your contribution.
+
+3.  With respect to any patents you own, or that you can license without payment
+    to any third party, you hereby grant to us a perpetual, irrevocable,
+    non-exclusive, worldwide, no-charge, royalty-free license to:
+
+        * make, have made, use, sell, offer to sell, import, and otherwise transfer
+        your contribution in whole or in part, alone or in combination with or
+        included in any product, work or materials arising out of the project to
+        which your contribution was submitted, and
+
+        * at our option, to sublicense these same rights to third parties through
+        multiple levels of sublicensees or other licensing arrangements.
+
+4.  Except as set out above, you keep all right, title, and interest in your
+    contribution. The rights that you grant to us under these terms are effective
+    on the date you first submitted a contribution to us, even if your submission
+    took place before the date you sign these terms.
+
+5.  You covenant, represent, warrant and agree that:
+
+    - Each contribution that you submit is and shall be an original work of
+      authorship and you can legally grant the rights set out in this SCA;
+
+    - to the best of your knowledge, each contribution will not violate any
+      third party's copyrights, trademarks, patents, or other intellectual
+      property rights; and
+
+    - each contribution shall be in compliance with U.S. export control laws and
+      other applicable export and import laws. You agree to notify us if you
+      become aware of any circumstance which would make any of the foregoing
+      representations inaccurate in any respect. We may publicly disclose your
+      participation in the project, including the fact that you have signed the SCA.
+
+6.  This SCA is governed by the laws of the State of California and applicable
+    U.S. Federal law. Any choice of law rules will not apply.
+
+7.  Please place an “x” on one of the applicable statements below. Please do NOT
+    mark both statements:
+
+        * [x] I am signing on behalf of myself as an individual and no other person
+        or entity, including my employer, has or will have rights with respect to my
+        contributions.
+
+        * [ ] I am signing on behalf of my employer or a legal entity and I have the
+        actual authority to contractually bind that entity.
+
+## Contributor Details
+
+| Field                         | Entry        |
+| ----------------------------- | ------------ |
+| Name                          | Tom Milligan |
+| Company name (if applicable)  |              |
+| Title or role (if applicable) |              |
+| Date                          | 2020-03-24   |
+| GitHub username               | tommilligan  |
+| Website (optional)            |              |
--- a/.github/contributors/umarbutler.md
+++ b/.github/contributors/umarbutler.md
@ -0,0 +1,106 @@
+# spaCy contributor agreement
+
+This spaCy Contributor Agreement (**"SCA"**) is based on the
+[Oracle Contributor Agreement](http://www.oracle.com/technetwork/oca-405177.pdf).
+The SCA applies to any contribution that you make to any product or project
+managed by us (the **"project"**), and sets out the intellectual property rights
+you grant to us in the contributed materials. The term **"us"** shall mean
+[ExplosionAI UG (haftungsbeschränkt)](https://explosion.ai/legal). The term
+**"you"** shall mean the person or entity identified below.
+
+If you agree to be bound by these terms, fill in the information requested
+below and include the filled-in version with your first pull request, under the
+folder [`.github/contributors/`](/.github/contributors/). The name of the file
+should be your GitHub username, with the extension `.md`. For example, the user
+example_user would create the file `.github/contributors/example_user.md`.
+
+Read this agreement carefully before signing. These terms and conditions
+constitute a binding legal agreement.
+
+## Contributor Agreement
+
+1. The term "contribution" or "contributed materials" means any source code,
+object code, patch, tool, sample, graphic, specification, manual,
+documentation, or any other material posted or submitted by you to the project.
+
+2. With respect to any worldwide copyrights, or copyright applications and
+registrations, in your contribution:
+
+    * you hereby assign to us joint ownership, and to the extent that such
+    assignment is or becomes invalid, ineffective or unenforceable, you hereby
+    grant to us a perpetual, irrevocable, non-exclusive, worldwide, no-charge,
+    royalty-free, unrestricted license to exercise all rights under those
+    copyrights. This includes, at our option, the right to sublicense these same
+    rights to third parties through multiple levels of sublicensees or other
+    licensing arrangements;
+
+    * you agree that each of us can do all things in relation to your
+    contribution as if each of us were the sole owners, and if one of us makes
+    a derivative work of your contribution, the one who makes the derivative
+    work (or has it made) will be the sole owner of that derivative work;
+
+    * you agree that you will not assert any moral rights in your contribution
+    against us, our licensees or transferees;
+
+    * you agree that we may register a copyright in your contribution and
+    exercise all ownership rights associated with it; and
+
+    * you agree that neither of us has any duty to consult with, obtain the
+    consent of, pay or render an accounting to the other for any use or
+    distribution of your contribution.
+
+3. With respect to any patents you own, or that you can license without payment
+to any third party, you hereby grant to us a perpetual, irrevocable,
+non-exclusive, worldwide, no-charge, royalty-free license to:
+
+    * make, have made, use, sell, offer to sell, import, and otherwise transfer
+    your contribution in whole or in part, alone or in combination with or
+    included in any product, work or materials arising out of the project to
+    which your contribution was submitted, and
+
+    * at our option, to sublicense these same rights to third parties through
+    multiple levels of sublicensees or other licensing arrangements.
+
+4. Except as set out above, you keep all right, title, and interest in your
+contribution. The rights that you grant to us under these terms are effective
+on the date you first submitted a contribution to us, even if your submission
+took place before the date you sign these terms.
+
+5. You covenant, represent, warrant and agree that:
+
+    * Each contribution that you submit is and shall be an original work of
+    authorship and you can legally grant the rights set out in this SCA;
+
+    * to the best of your knowledge, each contribution will not violate any
+    third party's copyrights, trademarks, patents, or other intellectual
+    property rights; and
+
+    * each contribution shall be in compliance with U.S. export control laws and
+    other applicable export and import laws. You agree to notify us if you
+    become aware of any circumstance which would make any of the foregoing
+    representations inaccurate in any respect. We may publicly disclose your
+    participation in the project, including the fact that you have signed the SCA.
+
+6. This SCA is governed by the laws of the State of California and applicable
+U.S. Federal law. Any choice of law rules will not apply.
+
+7. Please place an “x” on one of the applicable statements below. Please do NOT
+mark both statements:
+
+    * [x] I am signing on behalf of myself as an individual and no other person
+    or entity, including my employer, has or will have rights with respect to my
+    contributions.
+
+    * [ ] I am signing on behalf of my employer or a legal entity and I have the
+    actual authority to contractually bind that entity.
+
+## Contributor Details
+
+| Field                          | Entry                    |
+|------------------------------- | ------------------------ |
+| Name                           | Umar Butler              |
+| Company name (if applicable)   |                          |
+| Title or role (if applicable)  |                          |
+| Date                           | 2020-04-09               |
+| GitHub username                | umarbutler               |
+| Website (optional)             | https://umarbutler.com   |
--- a/.github/contributors/vishnupriyavr.md
+++ b/.github/contributors/vishnupriyavr.md
@ -0,0 +1,106 @@
+# spaCy contributor agreement
+
+This spaCy Contributor Agreement (**"SCA"**) is based on the
+[Oracle Contributor Agreement](http://www.oracle.com/technetwork/oca-405177.pdf).
+The SCA applies to any contribution that you make to any product or project
+managed by us (the **"project"**), and sets out the intellectual property rights
+you grant to us in the contributed materials. The term **"us"** shall mean
+[ExplosionAI UG (haftungsbeschränkt)](https://explosion.ai/legal). The term
+**"you"** shall mean the person or entity identified below.
+
+If you agree to be bound by these terms, fill in the information requested
+below and include the filled-in version with your first pull request, under the
+folder [`.github/contributors/`](/.github/contributors/). The name of the file
+should be your GitHub username, with the extension `.md`. For example, the user
+example_user would create the file `.github/contributors/example_user.md`.
+
+Read this agreement carefully before signing. These terms and conditions
+constitute a binding legal agreement.
+
+## Contributor Agreement
+
+1. The term "contribution" or "contributed materials" means any source code,
+object code, patch, tool, sample, graphic, specification, manual,
+documentation, or any other material posted or submitted by you to the project.
+
+2. With respect to any worldwide copyrights, or copyright applications and
+registrations, in your contribution:
+
+    * you hereby assign to us joint ownership, and to the extent that such
+    assignment is or becomes invalid, ineffective or unenforceable, you hereby
+    grant to us a perpetual, irrevocable, non-exclusive, worldwide, no-charge,
+    royalty-free, unrestricted license to exercise all rights under those
+    copyrights. This includes, at our option, the right to sublicense these same
+    rights to third parties through multiple levels of sublicensees or other
+    licensing arrangements;
+
+    * you agree that each of us can do all things in relation to your
+    contribution as if each of us were the sole owners, and if one of us makes
+    a derivative work of your contribution, the one who makes the derivative
+    work (or has it made) will be the sole owner of that derivative work;
+
+    * you agree that you will not assert any moral rights in your contribution
+    against us, our licensees or transferees;
+
+    * you agree that we may register a copyright in your contribution and
+    exercise all ownership rights associated with it; and
+
+    * you agree that neither of us has any duty to consult with, obtain the
+    consent of, pay or render an accounting to the other for any use or
+    distribution of your contribution.
+
+3. With respect to any patents you own, or that you can license without payment
+to any third party, you hereby grant to us a perpetual, irrevocable,
+non-exclusive, worldwide, no-charge, royalty-free license to:
+
+    * make, have made, use, sell, offer to sell, import, and otherwise transfer
+    your contribution in whole or in part, alone or in combination with or
+    included in any product, work or materials arising out of the project to
+    which your contribution was submitted, and
+
+    * at our option, to sublicense these same rights to third parties through
+    multiple levels of sublicensees or other licensing arrangements.
+
+4. Except as set out above, you keep all right, title, and interest in your
+contribution. The rights that you grant to us under these terms are effective
+on the date you first submitted a contribution to us, even if your submission
+took place before the date you sign these terms.
+
+5. You covenant, represent, warrant and agree that:
+
+    * Each contribution that you submit is and shall be an original work of
+    authorship and you can legally grant the rights set out in this SCA;
+
+    * to the best of your knowledge, each contribution will not violate any
+    third party's copyrights, trademarks, patents, or other intellectual
+    property rights; and
+
+    * each contribution shall be in compliance with U.S. export control laws and
+    other applicable export and import laws. You agree to notify us if you
+    become aware of any circumstance which would make any of the foregoing
+    representations inaccurate in any respect. We may publicly disclose your
+    participation in the project, including the fact that you have signed the SCA.
+
+6. This SCA is governed by the laws of the State of California and applicable
+U.S. Federal law. Any choice of law rules will not apply.
+
+7. Please place an “x” on one of the applicable statements below. Please do NOT
+mark both statements:
+
+    * [x] I am signing on behalf of myself as an individual and no other person
+    or entity, including my employer, has or will have rights with respect to my
+    contributions.
+
+    * [ ] I am signing on behalf of my employer or a legal entity and I have the
+    actual authority to contractually bind that entity.
+
+## Contributor Details
+
+| Field                          | Entry                    |
+|------------------------------- | ------------------------ |
+| Name                           | Vishnu Priya VR          |
+| Company name (if applicable)   | Uniphore                 |
+| Title or role (if applicable)  | NLP/AI Engineer          |
+| Date                           | 2020-05-03               |
+| GitHub username                | vishnupriyavr            |
+| Website (optional)             |                          |
--- a/.github/contributors/vondersam.md
+++ b/.github/contributors/vondersam.md
@ -0,0 +1,106 @@
+# spaCy contributor agreement
+
+This spaCy Contributor Agreement (**"SCA"**) is based on the
+[Oracle Contributor Agreement](http://www.oracle.com/technetwork/oca-405177.pdf).
+The SCA applies to any contribution that you make to any product or project
+managed by us (the **"project"**), and sets out the intellectual property rights
+you grant to us in the contributed materials. The term **"us"** shall mean
+[ExplosionAI GmbH](https://explosion.ai/legal). The term
+**"you"** shall mean the person or entity identified below.
+
+If you agree to be bound by these terms, fill in the information requested
+below and include the filled-in version with your first pull request, under the
+folder [`.github/contributors/`](/.github/contributors/). The name of the file
+should be your GitHub username, with the extension `.md`. For example, the user
+example_user would create the file `.github/contributors/example_user.md`.
+
+Read this agreement carefully before signing. These terms and conditions
+constitute a binding legal agreement.
+
+## Contributor Agreement
+
+1. The term "contribution" or "contributed materials" means any source code,
+object code, patch, tool, sample, graphic, specification, manual,
+documentation, or any other material posted or submitted by you to the project.
+
+2. With respect to any worldwide copyrights, or copyright applications and
+registrations, in your contribution:
+
+    * you hereby assign to us joint ownership, and to the extent that such
+    assignment is or becomes invalid, ineffective or unenforceable, you hereby
+    grant to us a perpetual, irrevocable, non-exclusive, worldwide, no-charge,
+    royalty-free, unrestricted license to exercise all rights under those
+    copyrights. This includes, at our option, the right to sublicense these same
+    rights to third parties through multiple levels of sublicensees or other
+    licensing arrangements;
+
+    * you agree that each of us can do all things in relation to your
+    contribution as if each of us were the sole owners, and if one of us makes
+    a derivative work of your contribution, the one who makes the derivative
+    work (or has it made) will be the sole owner of that derivative work;
+
+    * you agree that you will not assert any moral rights in your contribution
+    against us, our licensees or transferees;
+
+    * you agree that we may register a copyright in your contribution and
+    exercise all ownership rights associated with it; and
+
+    * you agree that neither of us has any duty to consult with, obtain the
+    consent of, pay or render an accounting to the other for any use or
+    distribution of your contribution.
+
+3. With respect to any patents you own, or that you can license without payment
+to any third party, you hereby grant to us a perpetual, irrevocable,
+non-exclusive, worldwide, no-charge, royalty-free license to:
+
+    * make, have made, use, sell, offer to sell, import, and otherwise transfer
+    your contribution in whole or in part, alone or in combination with or
+    included in any product, work or materials arising out of the project to
+    which your contribution was submitted, and
+
+    * at our option, to sublicense these same rights to third parties through
+    multiple levels of sublicensees or other licensing arrangements.
+
+4. Except as set out above, you keep all right, title, and interest in your
+contribution. The rights that you grant to us under these terms are effective
+on the date you first submitted a contribution to us, even if your submission
+took place before the date you sign these terms.
+
+5. You covenant, represent, warrant and agree that:
+
+    * Each contribution that you submit is and shall be an original work of
+    authorship and you can legally grant the rights set out in this SCA;
+
+    * to the best of your knowledge, each contribution will not violate any
+    third party's copyrights, trademarks, patents, or other intellectual
+    property rights; and
+
+    * each contribution shall be in compliance with U.S. export control laws and
+    other applicable export and import laws. You agree to notify us if you
+    become aware of any circumstance which would make any of the foregoing
+    representations inaccurate in any respect. We may publicly disclose your
+    participation in the project, including the fact that you have signed the SCA.
+
+6. This SCA is governed by the laws of the State of California and applicable
+U.S. Federal law. Any choice of law rules will not apply.
+
+7. Please place an “x” on one of the applicable statements below. Please do NOT
+mark both statements:
+
+    * [x] I am signing on behalf of myself as an individual and no other person
+    or entity, including my employer, has or will have rights with respect to my
+    contributions.
+
+    * [ ] I am signing on behalf of my employer or a legal entity and I have the
+    actual authority to contractually bind that entity.
+
+## Contributor Details
+
+| Field                          | Entry                   |
+|------------------------------- | ------------------------|
+| Name                           | Samuel Rodríguez Medina |
+| Company name (if applicable)   |                         |
+| Title or role (if applicable)  | Computational linguist  |
+| Date                           | 28 April 2020           |
+| GitHub username                | vondersam               |
+| Website (optional)             |                         |
--- a/examples/training/rehearsal.py
+++ b/examples/training/rehearsal.py
@ -1,6 +1,7 @@
 """Prevent catastrophic forgetting with rehearsal updates."""
 import plac
 import random
+import warnings
 import srsly
 import spacy
 from spacy.gold import GoldParse
@ -66,7 +67,10 @@ def main(model_name, unlabelled_loc):
    pipe_exceptions = ["ner", "trf_wordpiecer", "trf_tok2vec"]
    other_pipes = [pipe for pipe in nlp.pipe_names if pipe not in pipe_exceptions]
    sizes = compounding(1.0, 4.0, 1.001)
-    with nlp.disable_pipes(*other_pipes):
+    with nlp.disable_pipes(*other_pipes), warnings.catch_warnings():
+        # show warnings for misaligned entity spans once
+        warnings.filterwarnings("once", category=UserWarning, module='spacy')
+
        for itn in range(n_iter):
            random.shuffle(TRAIN_DATA)
            random.shuffle(raw_docs)
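
Note on the `with` statement in this hunk: to enter two context managers in one `with`, Python separates them with a comma. The expression `a and b` evaluates to `b` whenever `a` is truthy, so only `b` would be entered and exited. The following is a minimal sketch of the difference. `Tracked` is a hypothetical stand-in class and not part of spaCy; whether the first manager is truthy in the real scripts depends on what `nlp.disable_pipes` returns.

```python
class Tracked(object):
    """Hypothetical stand-in context manager that logs enter/exit calls."""

    def __init__(self, name, log):
        self.name = name
        self.log = log

    def __enter__(self):
        self.log.append("enter " + self.name)
        return self

    def __exit__(self, *exc_info):
        self.log.append("exit " + self.name)
        return False  # do not swallow exceptions


def run_with_and(log):
    # `a and b` is evaluated first; since `a` is truthy the result is `b`,
    # so only `b` is used as the context manager.
    with Tracked("a", log) and Tracked("b", log):
        pass
    return log


def run_with_comma(log):
    # The comma form enters both managers and exits them in reverse order.
    with Tracked("a", log), Tracked("b", log):
        pass
    return log


and_log = run_with_and([])
comma_log = run_with_comma([])
```

In the comma form both managers are entered and both are exited when the block ends, which is what restoring disabled pipes and scoping a warnings filter both rely on.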
--- a/examples/training/train_entity_linker.py
+++ b/examples/training/train_entity_linker.py
@ -64,7 +64,7 @@ def main(kb_path, vocab_path=None, output_dir=None, n_iter=50):
    """Create a blank model with the specified vocab, set up the pipeline and train the entity linker.
    The `vocab` should be the one used during creation of the KB."""
    vocab = Vocab().from_disk(vocab_path)
-    # create blank Language class with correct vocab
+    # create blank English model with correct vocab
    nlp = spacy.blank("en", vocab=vocab)
    nlp.vocab.vectors.name = "spacy_pretrained_vectors"
    print("Created blank 'en' model with vocab from '%s'" % vocab_path)
--- a/examples/training/train_ner.py
+++ b/examples/training/train_ner.py
@ -8,12 +8,13 @@ For more details, see the documentation:
 * NER: https://spacy.io/usage/linguistic-features#named-entities

 Compatible with: spaCy v2.0.0+
-Last tested with: v2.1.0
+Last tested with: v2.2.4
 """
 from __future__ import unicode_literals, print_function

 import plac
 import random
+import warnings
 from pathlib import Path
 import spacy
 from spacy.util import minibatch, compounding
@ -57,7 +58,11 @@ def main(model=None, output_dir=None, n_iter=100):
    # get names of other pipes to disable them during training
    pipe_exceptions = ["ner", "trf_wordpiecer", "trf_tok2vec"]
    other_pipes = [pipe for pipe in nlp.pipe_names if pipe not in pipe_exceptions]
-    with nlp.disable_pipes(*other_pipes):  # only train NER
+    # only train NER
+    with nlp.disable_pipes(*other_pipes), warnings.catch_warnings():
+        # show warnings for misaligned entity spans once
+        warnings.filterwarnings("once", category=UserWarning, module='spacy')
+
        # reset and initialize the weights randomly – but only if we're
        # training a new model
        if model is None:
--- a/examples/training/train_new_entity_type.py
+++ b/examples/training/train_new_entity_type.py
@ -24,12 +24,13 @@ For more details, see the documentation:
 * NER: https://spacy.io/usage/linguistic-features#named-entities

 Compatible with: spaCy v2.1.0+
-Last tested with: v2.1.0
+Last tested with: v2.2.4
 """
 from __future__ import unicode_literals, print_function

 import plac
 import random
+import warnings
 from pathlib import Path
 import spacy
 from spacy.util import minibatch, compounding
@ -97,7 +98,11 @@ def main(model=None, new_model_name="animal", output_dir=None, n_iter=30):
    # get names of other pipes to disable them during training
    pipe_exceptions = ["ner", "trf_wordpiecer", "trf_tok2vec"]
    other_pipes = [pipe for pipe in nlp.pipe_names if pipe not in pipe_exceptions]
-    with nlp.disable_pipes(*other_pipes):  # only train NER
+    # only train NER
+    with nlp.disable_pipes(*other_pipes), warnings.catch_warnings():
+        # show warnings for misaligned entity spans once
+        warnings.filterwarnings("once", category=UserWarning, module='spacy')
+
        sizes = compounding(1.0, 4.0, 1.001)
        # batch up the examples using spaCy's minibatch
        for itn in range(n_iter):
--- a/setup.cfg
+++ b/setup.cfg
@ -30,7 +30,7 @@ zip_safe = false
 include_package_data = true
 scripts =
    bin/spacy
-python_requires = >=2.7,!=3.0.*,!=3.1.*,!=3.2.*,!=3.3.*
+python_requires = >=2.7,!=3.0.*,!=3.1.*,!=3.2.*,!=3.3.*,!=3.4.*
 setup_requires =
    wheel
    cython>=0.25
@ -59,19 +59,23 @@ install_requires =

 [options.extras_require]
 lookups =
-    spacy_lookups_data>=0.0.5,<0.2.0
+    spacy_lookups_data>=0.3.1,<0.4.0
 cuda =
-    cupy>=5.0.0b4
+    cupy>=5.0.0b4,<9.0.0
 cuda80 =
-    cupy-cuda80>=5.0.0b4
+    cupy-cuda80>=5.0.0b4,<9.0.0
 cuda90 =
-    cupy-cuda90>=5.0.0b4
+    cupy-cuda90>=5.0.0b4,<9.0.0
 cuda91 =
-    cupy-cuda91>=5.0.0b4
+    cupy-cuda91>=5.0.0b4,<9.0.0
 cuda92 =
-    cupy-cuda92>=5.0.0b4
+    cupy-cuda92>=5.0.0b4,<9.0.0
 cuda100 =
-    cupy-cuda100>=5.0.0b4
+    cupy-cuda100>=5.0.0b4,<9.0.0
+cuda101 =
+    cupy-cuda101>=5.0.0b4,<9.0.0
+cuda102 =
+    cupy-cuda102>=5.0.0b4,<9.0.0
 # Language tokenizers with external dependencies
 ja =
    fugashi>=0.1.3
--- a/setup.py
+++ b/setup.py
@ -31,7 +31,6 @@ PACKAGES = find_packages()


 MOD_NAMES = [
-    "spacy._align",
    "spacy.parts_of_speech",
    "spacy.strings",
    "spacy.lexeme",
--- a/spacy/__init__.py
+++ b/spacy/__init__.py
@ -13,7 +13,7 @@ from . import pipeline
 from .cli.info import info as cli_info
 from .glossary import explain
 from .about import __version__
-from .errors import Errors, Warnings, deprecation_warning
+from .errors import Errors, Warnings
 from . import util
 from .util import registry
 from .language import component
@ -26,7 +26,7 @@ if sys.maxunicode == 65535:
 def load(name, **overrides):
    depr_path = overrides.get("path")
    if depr_path not in (True, False, None):
-        deprecation_warning(Warnings.W001.format(path=depr_path))
+        warnings.warn(Warnings.W001.format(path=depr_path), DeprecationWarning)
    return util.load_model(name, **overrides)
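
Note on the switch from a custom `deprecation_warning` helper to the standard library: `warnings.warn(message, DeprecationWarning)` attaches the category directly, so callers can filter or record it with the usual `warnings` machinery. A minimal sketch, assuming a placeholder message (`W001_SKETCH`) and a stub function (`load_sketch`) that are not spaCy's own:

```python
import warnings

# Placeholder text; the real W001 message is not shown in this diff.
W001_SKETCH = "The 'path' keyword is deprecated (got {path})."


def load_sketch(name, **overrides):
    """Stub mirroring the shape of the check in the hunk above."""
    depr_path = overrides.get("path")
    if depr_path not in (True, False, None):
        warnings.warn(W001_SKETCH.format(path=depr_path), DeprecationWarning)
    return name


with warnings.catch_warnings(record=True) as caught:
    # "always" makes sure the warning is recorded even if it was seen before.
    warnings.simplefilter("always")
    load_sketch("en", path="/tmp/model")  # triggers the warning
    load_sketch("en")                     # no deprecated argument, no warning

categories = [w.category for w in caught]
```

Because `DeprecationWarning` is ignored by default in many contexts, a test or a user who wants to see it may need an explicit filter such as the `simplefilter("always")` call above.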


--- a/spacy/_align.pyx
+++ b/spacy/_align.pyx
@ -1,255 +0,0 @@
-# cython: infer_types=True
-'''Do Levenshtein alignment, for evaluation of tokenized input.
-
-Random notes:
-
-  r i n g
-  0 1 2 3 4
-r 1 0 1 2 3
-a 2 1 1 2 3
-n 3 2 2 1 2
-g 4 3 3 2 1
-
-0,0: (1,1)=min(0+0,1+1,1+1)=0 S
-1,0: (2,1)=min(1+1,0+1,2+1)=1 D
-2,0: (3,1)=min(2+1,3+1,1+1)=2 D
-3,0: (4,1)=min(3+1,4+1,2+1)=3 D
-0,1: (1,2)=min(1+1,2+1,0+1)=1 D
-1,1: (2,2)=min(0+1,1+1,1+1)=1 S
-2,1: (3,2)=min(1+1,1+1,2+1)=2 S or I
-3,1: (4,2)=min(2+1,2+1,3+1)=3 S or I
-0,2: (1,3)=min(2+1,3+1,1+1)=2 I
-1,2: (2,3)=min(1+1,2+1,1+1)=2 S or I
-2,2: (3,3)
-3,2: (4,3)
-At state (i, j) we're asking "How do I transform S[:i+1] to T[:j+1]?"
-
-We know the costs to transition:
-
-S[:i]   -> T[:j]   (at D[i,j])
-S[:i+1] -> T[:j]   (at D[i+1,j])
-S[:i]   -> T[:j+1] (at D[i,j+1])
-    
-Further, now we can transform:
-S[:i+1] -> S[:i] (DEL) for 1,
-T[:j+1] -> T[:j] (INS) for 1.
-S[i+1]  -> T[j+1] (SUB) for 0 or 1
-
-Therefore we have the costs:
-SUB: Cost(S[:i]->T[:j])   + Cost(S[i]->S[j])
-i.e. D[i, j] + S[i+1] != T[j+1]
-INS: Cost(S[:i+1]->T[:j]) + Cost(T[:j+1]->T[:j])
-i.e. D[i+1,j] + 1
-DEL: Cost(S[:i]->T[:j+1]) + Cost(S[:i+1]->S[:i]) 
-i.e. D[i,j+1] + 1
-
-    Source string S has length m, with index i
-    Target string T has length n, with index j
-
-    Output two alignment vectors: i2j (length m) and j2i (length n)
-    # function LevenshteinDistance(char s[1..m], char t[1..n]):
-    # for all i and j, d[i,j] will hold the Levenshtein distance between
-    # the first i characters of s and the first j characters of t
-    # note that d has (m+1)*(n+1) values
-    # set each element in d to zero
-    ring rang
-      - r i n g
-    - 0 0 0 0 0
-    r 0 0 0 0 0
-    a 0 0 0 0 0
-    n 0 0 0 0 0
-    g 0 0 0 0 0
-
-    # source prefixes can be transformed into empty string by
-    # dropping all characters
-    # d[i, 0] := i
-    ring rang
-      - r i n g
-    - 0 0 0 0 0
-    r 1 0 0 0 0
-    a 2 0 0 0 0
-    n 3 0 0 0 0
-    g 4 0 0 0 0
-
-    # target prefixes can be reached from empty source prefix
-    # by inserting every character
-    # d[0, j] := j
-      - r i n g
-    - 0 1 2 3 4
-    r 1 0 0 0 0
-    a 2 0 0 0 0
-    n 3 0 0 0 0
-    g 4 0 0 0 0
-
-'''
-from __future__ import unicode_literals
-from libc.stdint cimport uint32_t
-import numpy
-cimport numpy as np
-from .compat import unicode_
-from murmurhash.mrmr cimport hash32
-
-
-def align(S, T):
-    cdef int m = len(S)
-    cdef int n = len(T)
-    cdef np.ndarray matrix = numpy.zeros((m+1, n+1), dtype='int32')
-    cdef np.ndarray i2j = numpy.zeros((m,), dtype='i')
-    cdef np.ndarray j2i = numpy.zeros((n,), dtype='i')
-
-    cdef np.ndarray S_arr = _convert_sequence(S)
-    cdef np.ndarray T_arr = _convert_sequence(T)
-
-    fill_matrix(<int*>matrix.data,
-        <const int*>S_arr.data, m, <const int*>T_arr.data, n)
-    fill_i2j(i2j, matrix)
-    fill_j2i(j2i, matrix)
-    for i in range(i2j.shape[0]):
-        if i2j[i] >= 0 and len(S[i]) != len(T[i2j[i]]):
-            i2j[i] = -1
-    for j in range(j2i.shape[0]):
-        if j2i[j] >= 0 and len(T[j]) != len(S[j2i[j]]):
-            j2i[j] = -1
-    return matrix[-1,-1], i2j, j2i, matrix
-
-
-def multi_align(np.ndarray i2j, np.ndarray j2i, i_lengths, j_lengths):
-    '''Let's say we had:
-
-    Guess: [aa bb cc dd]
-    Truth: [aa bbcc dd]
-    i2j: [0, None, -2, 2]
-    j2i: [0, -2, 3]
-
-    We want:
-
-    i2j_multi: {1: 1, 2: 1}
-    j2i_multi: {}
-    '''
-    i2j_miss = _get_regions(i2j, i_lengths)
-    j2i_miss = _get_regions(j2i, j_lengths)
-
-    i2j_multi, j2i_multi = _get_mapping(i2j_miss, j2i_miss, i_lengths, j_lengths)
-    return i2j_multi, j2i_multi
-
-
-def _get_regions(alignment, lengths):
-    regions = {}
-    start = None
-    offset = 0
-    for i in range(len(alignment)):
-        if alignment[i] < 0:
-            if start is None:
-                start = offset
-                regions.setdefault(start, [])
-            regions[start].append(i)
-        else:
-            start = None
-        offset += lengths[i]
-    return regions
-
-
-def _get_mapping(miss1, miss2, lengths1, lengths2):
-    i2j = {}
-    j2i = {}
-    for start, region1 in miss1.items():
-        if not region1 or start not in miss2:
-            continue
-        region2 = miss2[start]
-        if sum(lengths1[i] for i in region1) == sum(lengths2[i] for i in region2):
-            j = region2.pop(0)
-            buff = []
-            # Consume tokens from region 1, until we meet the length of the
-            # first token in region2. If we do, align the tokens. If
-            # we exceed the length, break.
-            while region1:
-                buff.append(region1.pop(0))
-                if sum(lengths1[i] for i in buff) == lengths2[j]:
-                    for i in buff:
-                        i2j[i] = j
-                    j2i[j] = buff[-1]
-                    j += 1
-                    buff = []
-                elif sum(lengths1[i] for i in buff) > lengths2[j]:
-                    break
-            else:
-                if buff and sum(lengths1[i] for i in buff) == lengths2[j]:
-                    for i in buff:
-                        i2j[i] = j
-                    j2i[j] = buff[-1]
-    return i2j, j2i
-
-
-def _convert_sequence(seq):
-    if isinstance(seq, numpy.ndarray):
-        return numpy.ascontiguousarray(seq, dtype='uint32_t')
-    cdef np.ndarray output = numpy.zeros((len(seq),), dtype='uint32')
-    cdef bytes item_bytes
-    for i, item in enumerate(seq):
-        if item == "``":
-            item = '"'
-        elif item == "''":
-            item = '"'
-        if isinstance(item, unicode):
-            item_bytes = item.encode('utf8')
-        else:
-            item_bytes = item
-        output[i] = hash32(<void*><char*>item_bytes, len(item_bytes), 0)
-    return output
-
-
-cdef void fill_matrix(int* D, 
-        const int* S, int m, const int* T, int n) nogil:
-    m1 = m+1
-    n1 = n+1
-    for i in range(m1*n1):
-        D[i] = 0
- 
-    for i in range(m1):
-        D[i*n1] = i
- 
-    for j in range(n1):
-        D[j] = j
- 
-    cdef int sub_cost, ins_cost, del_cost
-    for j in range(n):
-        for i in range(m):
-            i_j = i*n1 + j
-            i1_j1 = (i+1)*n1 + j+1
-            i1_j = (i+1)*n1 + j
-            i_j1 = i*n1 + j+1
-            if S[i] != T[j]:
-                sub_cost = D[i_j] + 1
-            else:
-                sub_cost = D[i_j]
-            del_cost = D[i_j1] + 1
-            ins_cost = D[i1_j] + 1
-            best = min(min(sub_cost, ins_cost), del_cost)
-            D[i1_j1] = best
-
-
-cdef void fill_i2j(np.ndarray i2j, np.ndarray D) except *:
-    j = D.shape[1]-2
-    cdef int i = D.shape[0]-2
-    while i >= 0:
-        while D[i+1, j] < D[i+1, j+1]:
-            j -= 1
-        if D[i, j+1] < D[i+1, j+1]:
-            i2j[i] = -1
-        else:
-            i2j[i] = j
-            j -= 1
-        i -= 1
-
-cdef void fill_j2i(np.ndarray j2i, np.ndarray D) except *:
-    i = D.shape[0]-2
-    cdef int j = D.shape[1]-2
-    while j >= 0:
-        while D[i, j+1] < D[i+1, j+1]:
-            i -= 1
-        if D[i+1, j] < D[i+1, j+1]:
-            j2i[j] = -1
-        else:
-            j2i[j] = i
-            i -= 1
-        j -= 1
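
Note on the removed module: the docstring and `fill_matrix` describe the standard Levenshtein dynamic programme. The pure-Python sketch below follows the same recurrence (substitution, insertion and deletion each cost 1, a match costs 0) and reproduces the `ring`/`rang` table from the docstring. The function name `levenshtein_matrix` is illustrative and does not come from the deleted file, and the alignment back-tracing done by `fill_i2j` and `fill_j2i` is left out.

```python
def levenshtein_matrix(source, target):
    """Return the (m+1) x (n+1) edit-distance matrix for two sequences."""
    m, n = len(source), len(target)
    dist = [[0] * (n + 1) for _ in range(m + 1)]
    # Source prefixes become the empty string by deleting every item.
    for i in range(m + 1):
        dist[i][0] = i
    # Target prefixes are reached from the empty string by inserting every item.
    for j in range(n + 1):
        dist[0][j] = j
    for i in range(m):
        for j in range(n):
            sub_cost = dist[i][j] + (source[i] != target[j])
            del_cost = dist[i][j + 1] + 1
            ins_cost = dist[i + 1][j] + 1
            dist[i + 1][j + 1] = min(sub_cost, ins_cost, del_cost)
    return dist


matrix = levenshtein_matrix("rang", "ring")
distance = matrix[-1][-1]
```

The bottom-right cell holds the total edit distance, which is 1 here because only the vowel differs.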
--- a/spacy/_ml.py
+++ b/spacy/_ml.py
@ -2,6 +2,7 @@
 from __future__ import unicode_literals

 import numpy
+import warnings
 from thinc.v2v import Model, Maxout, Softmax, Affine, ReLu
 from thinc.t2t import ExtractWindow, ParametricAttention
 from thinc.t2v import Pooling, sum_pool, mean_pool
@ -22,7 +23,7 @@ from thinc.neural._classes.affine import _set_dimensions_if_needed
 import thinc.extra.load_nlp

 from .attrs import ID, ORTH, LOWER, NORM, PREFIX, SUFFIX, SHAPE
-from .errors import Errors, user_warning, Warnings
+from .errors import Errors, Warnings
 from . import util
 from . import ml as new_ml
 from .ml import _legacy_tok2vec
@ -278,18 +279,19 @@ class PrecomputableAffine(Model):
                break


-def link_vectors_to_models(vocab):
+def link_vectors_to_models(vocab, skip_rank=False):
    vectors = vocab.vectors
    if vectors.name is None:
        vectors.name = VECTORS_KEY
        if vectors.data.size != 0:
-            user_warning(Warnings.W020.format(shape=vectors.data.shape))
+            warnings.warn(Warnings.W020.format(shape=vectors.data.shape))
    ops = Model.ops
+    if not skip_rank:
        for word in vocab:
            if word.orth in vectors.key2row:
                word.rank = vectors.key2row[word.orth]
            else:
-            word.rank = 0
+                word.rank = util.OOV_RANK
    data = ops.asarray(vectors.data)
    # Set an entry here, so that vectors are accessed by StaticVectors
    # (unideal, I know)
@ -299,7 +301,7 @@ def link_vectors_to_models(vocab):
            # This is a hack to avoid the problem in #3853.
            old_name = vectors.name
            new_name = vectors.name + "_%d" % data.shape[0]
-            user_warning(Warnings.W019.format(old=old_name, new=new_name))
+            warnings.warn(Warnings.W019.format(old=old_name, new=new_name))
            vectors.name = new_name
            key = (ops.device, vectors.name)
    thinc.extra.load_nlp.VECTORS[key] = data
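
Note on replacing `word.rank = 0` with `util.OOV_RANK`: row 0 is a valid vector row, so using 0 as the fallback makes a word without a vector indistinguishable from the word stored in the first row. A rough sketch of the idea follows. The value of `util.OOV_RANK` is not shown in this diff, so `OOV_RANK_SKETCH` is a hypothetical sentinel, and `assign_ranks` is an illustrative helper rather than spaCy code.

```python
OOV_RANK_SKETCH = 2 ** 64 - 1  # hypothetical sentinel, not taken from spacy.util


def assign_ranks(words, key2row, oov_rank):
    """Map each word to its vector row, or to oov_rank when it has no row."""
    return {word: key2row.get(word, oov_rank) for word in words}


key2row = {"apple": 0, "pear": 1}
words = ["apple", "pear", "zzyzx"]

old_ranks = assign_ranks(words, key2row, oov_rank=0)
new_ranks = assign_ranks(words, key2row, oov_rank=OOV_RANK_SKETCH)

# With 0 as the fallback, the unknown word collides with the word in row 0.
collides_before = old_ranks["zzyzx"] == old_ranks["apple"]
collides_after = new_ranks["zzyzx"] == new_ranks["apple"]
```

Any value that can never be a real row index would serve as the sentinel; the point is only that it stays out of the range of valid rows.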
@ -693,9 +695,11 @@ def build_text_classifier(nr_class, width=64, **cfg):
        )

        linear_model = build_bow_text_classifier(
-            nr_class, ngram_size=cfg.get("ngram_size", 1), exclusive_classes=False
+            nr_class,
+            ngram_size=cfg.get("ngram_size", 1),
+            exclusive_classes=cfg.get("exclusive_classes", False),
        )
-        if cfg.get("exclusive_classes"):
+        if cfg.get("exclusive_classes", False):
            output_layer = Softmax(nr_class, nr_class * 2)
        else:
            output_layer = (
--- a/spacy/analysis.py
+++ b/spacy/analysis.py
@ -1,11 +1,13 @@
 # coding: utf8
 from __future__ import unicode_literals

+import warnings
+
 from collections import OrderedDict
 from wasabi import Printer

 from .tokens import Doc, Token, Span
-from .errors import Errors, Warnings, user_warning
+from .errors import Errors, Warnings


 def analyze_pipes(pipeline, name, pipe, index, warn=True):
@ -34,7 +36,7 @@ def analyze_pipes(pipeline, name, pipe, index, warn=True):
        if not fulfilled:
            problems.append(annot)
            if warn:
-                user_warning(Warnings.W025.format(name=name, attr=annot))
+                warnings.warn(Warnings.W025.format(name=name, attr=annot))
    return problems


--- a/spacy/attrs.pxd
+++ b/spacy/attrs.pxd
@ -15,7 +15,7 @@ cdef enum attr_id_t:
    LIKE_NUM
    LIKE_EMAIL
    IS_STOP
-    IS_OOV
+    IS_OOV_DEPRECATED
    IS_BRACKET
    IS_QUOTE
    IS_LEFT_PUNCT
@ -94,3 +94,4 @@ cdef enum attr_id_t:
    ENT_ID = symbols.ENT_ID

    IDX
+    SENT_END
--- a/spacy/attrs.pyx
+++ b/spacy/attrs.pyx
@ -16,7 +16,7 @@ IDS = {
    "LIKE_NUM": LIKE_NUM,
    "LIKE_EMAIL": LIKE_EMAIL,
    "IS_STOP": IS_STOP,
-    "IS_OOV": IS_OOV,
+    "IS_OOV_DEPRECATED": IS_OOV_DEPRECATED,
    "IS_BRACKET": IS_BRACKET,
    "IS_QUOTE": IS_QUOTE,
    "IS_LEFT_PUNCT": IS_LEFT_PUNCT,
@ -88,6 +88,7 @@ IDS = {
    "ENT_KB_ID": ENT_KB_ID,
    "HEAD": HEAD,
    "SENT_START": SENT_START,
+    "SENT_END": SENT_END,
    "SPACY": SPACY,
    "PROB": PROB,
    "LANG": LANG,
--- a/spacy/cli/debug_data.py
+++ b/spacy/cli/debug_data.py
@ -108,9 +108,11 @@ def debug_data(
    msg.good("Corpus is loadable")

    # Create all gold data here to avoid iterating over the train_docs constantly
-    gold_train_data = _compile_gold(train_docs, pipeline)
-    gold_train_unpreprocessed_data = _compile_gold(train_docs_unpreprocessed, pipeline)
-    gold_dev_data = _compile_gold(dev_docs, pipeline)
+    gold_train_data = _compile_gold(train_docs, pipeline, nlp)
+    gold_train_unpreprocessed_data = _compile_gold(
+        train_docs_unpreprocessed, pipeline, nlp
+    )
+    gold_dev_data = _compile_gold(dev_docs, pipeline, nlp)

    train_texts = gold_train_data["texts"]
    dev_texts = gold_dev_data["texts"]
@ -182,6 +184,21 @@ def debug_data(
                nlp.vocab.vectors_length,
            )
        )
+        n_missing_vectors = sum(gold_train_data["words_missing_vectors"].values())
+        msg.warn(
+            "{} words in training data without vectors ({:0.2f}%)".format(
+                n_missing_vectors, 100.0 * n_missing_vectors / gold_train_data["n_words"],
+            ),
+        )
+        msg.text(
+            "10 most common words without vectors: {}".format(
+                _format_labels(
+                    gold_train_data["words_missing_vectors"].most_common(10),
+                    counts=True,
+                )
+            ),
+            show=verbose,
+        )
    else:
        msg.info("No word vectors present in the model")

@ -562,7 +579,7 @@ def _load_file(file_path, msg):
    )


-def _compile_gold(train_docs, pipeline):
+def _compile_gold(train_docs, pipeline, nlp):
    data = {
        "ner": Counter(),
        "cats": Counter(),
@ -574,6 +591,7 @@ def _compile_gold(train_docs, pipeline):
        "punct_ents": 0,
        "n_words": 0,
        "n_misaligned_words": 0,
+        "words_missing_vectors": Counter(),
        "n_sents": 0,
        "n_nonproj": 0,
        "n_cycles": 0,
@ -586,6 +604,10 @@ def _compile_gold(train_docs, pipeline):
        data["n_words"] += len(valid_words)
        data["n_misaligned_words"] += len(gold.words) - len(valid_words)
        data["texts"].add(doc.text)
+        if len(nlp.vocab.vectors):
+            for word in valid_words:
+                if nlp.vocab.strings[word] not in nlp.vocab.vectors:
+                    data["words_missing_vectors"].update([word])
        if "ner" in pipeline:
            for i, label in enumerate(gold.ner):
                if label is None:
@ -636,7 +658,11 @@ def _format_labels(labels, counts=False):
 def _get_examples_without_label(data, label):
    count = 0
    for doc, gold in data:
-        labels = [label.split("-")[1] for label in gold.ner if label not in ("O", "-")]
+        labels = [
+            label.split("-")[1]
+            for label in gold.ner
+            if label is not None and label not in ("O", "-")
+        ]
        if label not in labels:
            count += 1
    return count
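
Note on the new missing-vector report: the change counts words with no vector in a `Counter` and prints the total, a share of all words, and the most common offenders. The sketch below shows that bookkeeping on plain Python data. `summarize_missing` and the sample word lists are illustrative assumptions, with a set standing in for the `nlp.vocab.vectors` lookup. The share is scaled by 100 before it is formatted next to a `%` sign.

```python
from collections import Counter


def summarize_missing(words, known_words):
    """Count words absent from known_words and format a one-line summary."""
    missing = Counter(word for word in words if word not in known_words)
    n_missing = sum(missing.values())
    # Scale to a percentage and guard against an empty corpus.
    share = 100.0 * n_missing / len(words) if words else 0.0
    summary = "{} words without vectors ({:0.2f}%)".format(n_missing, share)
    return missing, summary


missing, summary = summarize_missing(
    ["cat", "dog", "zzyzx", "cat", "zzyzx", "qwrt"], {"cat", "dog"}
)
top_word = missing.most_common(1)
```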
--- a/spacy/cli/evaluate.py
+++ b/spacy/cli/evaluate.py
@ -43,6 +43,9 @@ def evaluate(
    if displacy_path and not displacy_path.exists():
        msg.fail("Visualization output directory not found", displacy_path, exits=1)
    corpus = GoldCorpus(data_path, data_path)
+    if model.startswith("blank:"):
+        nlp = util.get_lang_class(model.replace("blank:", ""))()
+    else:
        nlp = util.load_model(model)
    dev_docs = list(corpus.dev_docs(nlp, gold_preproc=gold_preproc))
    begin = timer()
--- a/spacy/cli/init_model.py
+++ b/spacy/cli/init_model.py
@ -12,11 +12,14 @@ import tarfile
 import gzip
 import zipfile
 import srsly
+import warnings
 from wasabi import msg

 from ..vectors import Vectors
-from ..errors import Errors, Warnings, user_warning
-from ..util import ensure_path, get_lang_class
+from ..errors import Errors, Warnings
+from ..util import ensure_path, get_lang_class, load_model, OOV_RANK
+from ..lookups import Lookups
+

 try:
    import ftfy
@ -34,6 +37,12 @@ DEFAULT_OOV_PROB = -20
    jsonl_loc=("Location of JSONL-formatted attributes file", "option", "j", Path),
    clusters_loc=("Optional location of brown clusters data", "option", "c", str),
    vectors_loc=("Optional vectors file in Word2Vec format", "option", "v", str),
+    truncate_vectors=(
+        "Optional number of vectors to truncate to when reading in vectors file",
+        "option",
+        "t",
+        int,
+    ),
    prune_vectors=("Optional number of vectors to prune to", "option", "V", int),
    vectors_name=(
        "Optional name for the word vectors, e.g. en_core_web_lg.vectors",
@ -42,6 +51,8 @@ DEFAULT_OOV_PROB = -20
        str,
    ),
    model_name=("Optional name for the model meta", "option", "mn", str),
+    omit_extra_lookups=("Don't include extra lookups in model", "flag", "OEL", bool),
+    base_model=("Base model (for languages with custom tokenizers)", "option", "b", str),
 )
 def init_model(
    lang,
@ -50,9 +61,12 @@ def init_model(
    clusters_loc=None,
    jsonl_loc=None,
    vectors_loc=None,
+    truncate_vectors=0,
    prune_vectors=-1,
    vectors_name=None,
    model_name=None,
+    omit_extra_lookups=False,
+    base_model=None,
 ):
    """
    Create a new model from raw data, like word frequencies, Brown clusters
@ -84,10 +98,19 @@ def init_model(
        lex_attrs = read_attrs_from_deprecated(freqs_loc, clusters_loc)

    with msg.loading("Creating model..."):
-        nlp = create_model(lang, lex_attrs, name=model_name)
+        nlp = create_model(lang, lex_attrs, name=model_name, base_model=base_model)
+
+    # Create empty extra lexeme tables so the data from spacy-lookups-data
+    # isn't loaded if these features are accessed
+    if omit_extra_lookups:
+        nlp.vocab.lookups_extra = Lookups()
+        nlp.vocab.lookups_extra.add_table("lexeme_cluster")
+        nlp.vocab.lookups_extra.add_table("lexeme_prob")
+        nlp.vocab.lookups_extra.add_table("lexeme_settings")
+
    msg.good("Successfully created model")
    if vectors_loc is not None:
-        add_vectors(nlp, vectors_loc, prune_vectors, vectors_name)
+        add_vectors(nlp, vectors_loc, truncate_vectors, prune_vectors, vectors_name)
    vec_added = len(nlp.vocab.vectors)
    lex_added = len(nlp.vocab)
    msg.good(
@ -144,20 +167,23 @@ def read_attrs_from_deprecated(freqs_loc, clusters_loc):
    return lex_attrs


-def create_model(lang, lex_attrs, name=None):
+def create_model(lang, lex_attrs, name=None, base_model=None):
+    if base_model:
+        nlp = load_model(base_model)
+        # keep the tokenizer but remove any existing pipeline components due to
+        # potentially conflicting vectors
+        for pipe in nlp.pipe_names:
+            nlp.remove_pipe(pipe)
+    else:
        lang_class = get_lang_class(lang)
        nlp = lang_class()
    for lexeme in nlp.vocab:
-        lexeme.rank = 0
-    lex_added = 0
+        lexeme.rank = OOV_RANK
    for attrs in lex_attrs:
        if "settings" in attrs:
            continue
        lexeme = nlp.vocab[attrs["orth"]]
        lexeme.set_attrs(**attrs)
-        lexeme.is_oov = False
-        lex_added += 1
-        lex_added += 1
    if len(nlp.vocab):
        oov_prob = min(lex.prob for lex in nlp.vocab) - 1
    else:
@ -168,25 +194,24 @@ def create_model(lang, lex_attrs, name=None):
    return nlp


-def add_vectors(nlp, vectors_loc, prune_vectors, name=None):
+def add_vectors(nlp, vectors_loc, truncate_vectors, prune_vectors, name=None):
    vectors_loc = ensure_path(vectors_loc)
    if vectors_loc and vectors_loc.parts[-1].endswith(".npz"):
        nlp.vocab.vectors = Vectors(data=numpy.load(vectors_loc.open("rb")))
        for lex in nlp.vocab:
-            if lex.rank:
+            if lex.rank and lex.rank != OOV_RANK:
                nlp.vocab.vectors.add(lex.orth, row=lex.rank)
    else:
        if vectors_loc:
            with msg.loading("Reading vectors from {}".format(vectors_loc)):
-                vectors_data, vector_keys = read_vectors(vectors_loc)
+                vectors_data, vector_keys = read_vectors(vectors_loc, truncate_vectors)
            msg.good("Loaded vectors from {}".format(vectors_loc))
        else:
            vectors_data, vector_keys = (None, None)
        if vector_keys is not None:
            for word in vector_keys:
                if word not in nlp.vocab:
-                    lexeme = nlp.vocab[word]
-                    lexeme.is_oov = False
+                    nlp.vocab[word]
        if vectors_data is not None:
            nlp.vocab.vectors = Vectors(data=vectors_data, keys=vector_keys)
    if name is None:
@ -198,9 +223,11 @@ def add_vectors(nlp, vectors_loc, prune_vectors, name=None):
        nlp.vocab.prune_vectors(prune_vectors)


-def read_vectors(vectors_loc):
+def read_vectors(vectors_loc, truncate_vectors=0):
    f = open_file(vectors_loc)
    shape = tuple(int(size) for size in next(f).split())
+    if truncate_vectors >= 1:
+        shape = (truncate_vectors, shape[1])
    vectors_data = numpy.zeros(shape=shape, dtype="f")
    vectors_keys = []
    for i, line in enumerate(tqdm(f)):
@ -211,6 +238,8 @@ def read_vectors(vectors_loc):
            msg.fail(Errors.E094.format(line_num=i, loc=vectors_loc), exits=1)
        vectors_data[i] = numpy.asarray(pieces, dtype="f")
        vectors_keys.append(word)
+        if i == truncate_vectors - 1:
+            break
    return vectors_data, vectors_keys


@ -246,7 +275,7 @@ def read_freqs(freqs_loc, max_length=100, min_doc_freq=5, min_freq=50):
 def read_clusters(clusters_loc):
    clusters = {}
    if ftfy is None:
-        user_warning(Warnings.W004)
+        warnings.warn(Warnings.W004)
    with clusters_loc.open() as f:
        for line in tqdm(f):
            try:
--- a/spacy/cli/train.py
+++ b/spacy/cli/train.py
@ -15,9 +15,9 @@ import random

 from .._ml import create_default_optimizer
 from ..util import use_gpu as set_gpu
-from ..attrs import PROB, IS_OOV, CLUSTER, LANG
 from ..gold import GoldCorpus
 from ..compat import path2str
+from ..lookups import Lookups
 from .. import util
 from .. import about

@ -58,6 +58,7 @@ from .. import about
    textcat_arch=("Textcat model architecture", "option", "ta", str),
    textcat_positive_label=("Textcat positive label for binary classes with two labels", "option", "tpl", str),
    tag_map_path=("Location of JSON-formatted tag map", "option", "tm", Path),
+    omit_extra_lookups=("Don't include extra lookups in model", "flag", "OEL", bool),
    verbose=("Display more information for debug", "flag", "VV", bool),
    debug=("Run data diagnostics before training", "flag", "D", bool),
    # fmt: on
@ -97,6 +98,7 @@ def train(
    textcat_arch="bow",
    textcat_positive_label=None,
    tag_map_path=None,
+    omit_extra_lookups=False,
    verbose=False,
    debug=False,
 ):
@ -225,7 +227,9 @@ def train(
                            exits=1,
                        )
                msg.text("Extending component from base model '{}'".format(pipe))
-        disabled_pipes = nlp.disable_pipes([p for p in nlp.pipe_names if p not in pipeline])
+        disabled_pipes = nlp.disable_pipes(
+            [p for p in nlp.pipe_names if p not in pipeline]
+        )
    else:
        msg.text("Starting with blank model '{}'".format(lang))
        lang_cls = util.get_lang_class(lang)
@ -246,6 +250,14 @@ def train(
    # Update tag map with provided mapping
    nlp.vocab.morphology.tag_map.update(tag_map)

+    # Create empty extra lexeme tables so the data from spacy-lookups-data
+    # isn't loaded if these features are accessed
+    if omit_extra_lookups:
+        nlp.vocab.lookups_extra = Lookups()
+        nlp.vocab.lookups_extra.add_table("lexeme_cluster")
+        nlp.vocab.lookups_extra.add_table("lexeme_prob")
+        nlp.vocab.lookups_extra.add_table("lexeme_settings")
+
    if vectors:
        msg.text("Loading vector from model '{}'".format(vectors))
        _load_vectors(nlp, vectors)
@ -361,7 +373,7 @@ def train(
            if len(textcat_labels) == 2:
                msg.warn(
                    "If the textcat component is a binary classifier with "
-                    "exclusive classes, provide '--textcat_positive_label' for "
+                    "exclusive classes, provide '--textcat-positive-label' for "
                    "an evaluation on the positive class."
                )
            msg.text(
@ -415,10 +427,10 @@ def train(
                            losses=losses,
                        )
                    except ValueError as e:
-                        msg.warn("Error during training")
+                        err = "Error during training"
                        if init_tok2vec:
-                            msg.warn("Did you provide the same parameters during 'train' as during 'pretrain'?")
-                        msg.fail("Original error message: {}".format(e), exits=1)
+                            err += " Did you provide the same parameters during 'train' as during 'pretrain'?"
+                        msg.fail(err, "Original error message: {}".format(e), exits=1)
                    if raw_text:
                        # If raw text is available, perform 'rehearsal' updates,
                        # which use unlabelled data to reduce overfitting.
@ -452,6 +464,9 @@ def train(
                        cpu_wps = nwords / (end_time - start_time)
                    else:
                        gpu_wps = nwords / (end_time - start_time)
+                        # Only evaluate on CPU in the first iteration (for
+                        # timing) if GPU is enabled
+                        if i == 0:
                            with Model.use_device("cpu"):
                                nlp_loaded = util.load_model_from_path(epoch_model_path)
                                for name, component in nlp_loaded.pipeline:
@ -546,7 +561,11 @@ def train(
                        )
                        break
    except Exception as e:
-        msg.warn("Aborting and saving the final best model. Encountered exception: {}".format(e))
+        msg.warn(
+            "Aborting and saving the final best model. "
+            "Encountered exception: {}".format(e),
+            exits=1,
+        )
    finally:
        best_pipes = nlp.pipe_names
        if disabled_pipes:
@ -561,15 +580,25 @@ def train(
            final_meta.setdefault("speed", {})
            final_meta["speed"].setdefault("cpu", None)
            final_meta["speed"].setdefault("gpu", None)
+            meta.setdefault("speed", {})
+            meta["speed"].setdefault("cpu", None)
+            meta["speed"].setdefault("gpu", None)
            # combine cpu and gpu speeds with the base model speeds
            if final_meta["speed"]["cpu"] and meta["speed"]["cpu"]:
-                speed = _get_total_speed([final_meta["speed"]["cpu"], meta["speed"]["cpu"]])
+                speed = _get_total_speed(
+                    [final_meta["speed"]["cpu"], meta["speed"]["cpu"]]
+                )
                final_meta["speed"]["cpu"] = speed
            if final_meta["speed"]["gpu"] and meta["speed"]["gpu"]:
-                speed = _get_total_speed([final_meta["speed"]["gpu"], meta["speed"]["gpu"]])
+                speed = _get_total_speed(
+                    [final_meta["speed"]["gpu"], meta["speed"]["gpu"]]
+                )
                final_meta["speed"]["gpu"] = speed
            # if there were no speeds to update, overwrite with meta
-            if final_meta["speed"]["cpu"] is None and final_meta["speed"]["gpu"] is None:
+            if (
+                final_meta["speed"]["cpu"] is None
+                and final_meta["speed"]["gpu"] is None
+            ):
                final_meta["speed"].update(meta["speed"])
            # note: beam speeds are not combined with the base model
            if has_beam_widths:
@ -611,15 +640,6 @@ def _create_progress_bar(total):

 def _load_vectors(nlp, vectors):
    util.load_model(vectors, vocab=nlp.vocab)
-    for lex in nlp.vocab:
-        values = {}
-        for attr, func in nlp.vocab.lex_attr_getters.items():
-            # These attrs are expected to be set by data. Others should
-            # be set by calling the language functions.
-            if attr not in (CLUSTER, PROB, IS_OOV, LANG):
-                values[lex.vocab.strings[attr]] = func(lex.orth_)
-        lex.set_attrs(**values)
-        lex.is_oov = False


 def _load_pretrained_tok2vec(nlp, loc):
@ -661,6 +681,8 @@ def _find_best(experiment_dir, component):
        if epoch_model.is_dir() and epoch_model.parts[-1] != "model-final":
            accs = srsly.read_json(epoch_model / "accuracy.json")
            scores = [accs.get(metric, 0.0) for metric in _get_metrics(component)]
+            # remove per_type dicts from score list for max() comparison
+            scores = [score for score in scores if isinstance(score, float)]
            accuracies.append((scores, epoch_model))
    if accuracies:
        return max(accuracies)[1]
--- a/spacy/displacy/__init__.py
+++ b/spacy/displacy/__init__.py
@ -7,10 +7,12 @@ USAGE: https://spacy.io/usage/visualizers
 """
 from __future__ import unicode_literals

+import warnings
+
 from .render import DependencyRenderer, EntityRenderer
 from ..tokens import Doc, Span
 from ..compat import b_to_str
-from ..errors import Errors, Warnings, user_warning
+from ..errors import Errors, Warnings
 from ..util import is_in_jupyter


@ -89,7 +91,7 @@ def serve(
    from wsgiref import simple_server

    if is_in_jupyter():
-        user_warning(Warnings.W011)
+        warnings.warn(Warnings.W011)

    render(docs, style=style, page=page, minify=minify, options=options, manual=manual)
    httpd = simple_server.make_server(host, port, app)
@ -119,7 +121,7 @@ def parse_deps(orig_doc, options={}):
    """
    doc = Doc(orig_doc.vocab).from_bytes(orig_doc.to_bytes(exclude=["user_data"]))
    if not doc.is_parsed:
-        user_warning(Warnings.W005)
+        warnings.warn(Warnings.W005)
    if options.get("collapse_phrases", False):
        with doc.retokenize() as retokenizer:
            for np in list(doc.noun_chunks):
@ -146,9 +148,14 @@ def parse_deps(orig_doc, options={}):
                retokenizer.merge(span, attrs=attrs)
    fine_grained = options.get("fine_grained")
    add_lemma = options.get("add_lemma")
-    words = [{"text": w.text,
+    words = [
+        {
+            "text": w.text,
            "tag": w.tag_ if fine_grained else w.pos_,
-              "lemma": w.lemma_ if add_lemma else None} for w in doc]
+            "lemma": w.lemma_ if add_lemma else None,
+        }
+        for w in doc
+    ]

    arcs = []
    for word in doc:
@ -179,7 +186,7 @@ def parse_ents(doc, options={}):
        for ent in doc.ents
    ]
    if not ents:
-        user_warning(Warnings.W006)
+        warnings.warn(Warnings.W006)
    title = doc.user_data.get("title", None) if hasattr(doc, "user_data") else None
    settings = get_doc_settings(doc)
    return {"text": doc.text, "ents": ents, "title": title, "settings": settings}
--- a/spacy/displacy/render.py
+++ b/spacy/displacy/render.py
@ -3,7 +3,13 @@ from __future__ import unicode_literals

 import uuid

-from .templates import TPL_DEP_SVG, TPL_DEP_WORDS, TPL_DEP_WORDS_LEMMA, TPL_DEP_ARCS, TPL_ENTS
+from .templates import (
+    TPL_DEP_SVG,
+    TPL_DEP_WORDS,
+    TPL_DEP_WORDS_LEMMA,
+    TPL_DEP_ARCS,
+    TPL_ENTS,
+)
 from .templates import TPL_ENT, TPL_ENT_RTL, TPL_FIGURE, TPL_TITLE, TPL_PAGE
 from ..util import minify_html, escape_html, registry
 from ..errors import Errors
@ -83,7 +89,10 @@ class DependencyRenderer(object):
        self.width = self.offset_x + len(words) * self.distance
        self.height = self.offset_y + 3 * self.word_spacing
        self.id = render_id
-        words = [self.render_word(w["text"], w["tag"],  w.get("lemma", None), i) for i, w in enumerate(words)]
+        words = [
+            self.render_word(w["text"], w["tag"], w.get("lemma", None), i)
+            for i, w in enumerate(words)
+        ]
        arcs = [
            self.render_arrow(a["label"], a["start"], a["end"], a["dir"], i)
            for i, a in enumerate(arcs)
@ -101,7 +110,9 @@ class DependencyRenderer(object):
            lang=self.lang,
        )

-    def render_word(self, text, tag, lemma, i,):
+    def render_word(
+        self, text, tag, lemma, i,
+    ):
        """Render individual word.

        text (unicode): Word text.
@ -115,7 +126,9 @@ class DependencyRenderer(object):
            x = self.width - x
        html_text = escape_html(text)
        if lemma is not None:
-            return TPL_DEP_WORDS_LEMMA.format(text=html_text, tag=tag, lemma=lemma, x=x, y=y)
+            return TPL_DEP_WORDS_LEMMA.format(
+                text=html_text, tag=tag, lemma=lemma, x=x, y=y
+            )
        return TPL_DEP_WORDS.format(text=html_text, tag=tag, x=x, y=y)

    def render_arrow(self, label, start, end, direction, i):
--- a/spacy/errors.py
+++ b/spacy/errors.py
@ -1,17 +1,16 @@
 # coding: utf8
 from __future__ import unicode_literals

-import os
-import warnings
-import inspect
-

 def add_codes(err_cls):
    """Add error codes to string messages via class attribute names."""

-    class ErrorsWithCodes(object):
+    class ErrorsWithCodes(err_cls):
        def __getattribute__(self, code):
-            msg = getattr(err_cls, code)
+            msg = super().__getattribute__(code)
+            if code.startswith("__"):  # python system attributes like __class__
+                return msg
+            else:
                return "[{code}] {msg}".format(code=code, msg=msg)

    return ErrorsWithCodes()
@ -93,8 +92,7 @@ class Warnings(object):
    W022 = ("Training a new part-of-speech tagger using a model with no "
            "lemmatization rules or data. This means that the trained model "
            "may not be able to lemmatize correctly. If this is intentional "
-            "or the language you're using doesn't have lemmatization data, "
-            "you can ignore this warning by setting SPACY_WARNING_IGNORE=W022. "
+            "or the language you're using doesn't have lemmatization data. "
            "If this is surprising, make sure you have the spacy-lookups-data "
            "package installed.")
    W023 = ("Multiprocessing of Language.pipe is not supported in Python 2. "
@ -110,7 +108,13 @@ class Warnings(object):
    W028 = ("Doc.from_array was called with a vector of type '{type}', "
            "but is expecting one of type 'uint64' instead. This may result "
            "in problems with the vocab further on in the pipeline.")
-
+    W029 = ("Unable to align tokens with entities from character offsets. "
+            "Discarding entity annotation for the text: {text}.")
+    W030 = ("Some entities could not be aligned in the text \"{text}\" with "
+            "entities \"{entities}\". Use "
+            "`spacy.gold.biluo_tags_from_offsets(nlp.make_doc(text), entities)`"
+            " to check the alignment. Misaligned entities ('-') will be "
+            "ignored during training.")


@add_codes
@ -552,6 +556,17 @@ class Errors(object):
            "array.")
    E191 = ("Invalid head: the head token must be from the same doc as the "
            "token itself.")
+    E192 = ("Unable to resize vectors in place with cupy.")
+    E193 = ("Unable to resize vectors in place if the resized vector dimension "
+            "({new_dim}) is not the same as the current vector dimension "
+            "({curr_dim}).")
+    E194 = ("Unable to aligned mismatched text '{text}' and words '{words}'.")
+    E195 = ("Matcher can be called on {good} only, got {got}.")
+    E196 = ("Refusing to write to token.is_sent_end. Sentence boundaries can "
+            "only be fixed with token.is_sent_start.")
+    E197 = ("Row out of bounds, unable to add row {row} for key {key}.")
+    E198 = ("Unable to return {n} most similar vectors for the current vectors "
+            "table, which contains {n_rows} vectors.")


@add_codes
@ -586,64 +601,3 @@ class MatchPatternError(ValueError):

 class AlignmentError(ValueError):
    pass
-
-
-class ModelsWarning(UserWarning):
-    pass
-
-
-WARNINGS = {
-    "user": UserWarning,
-    "deprecation": DeprecationWarning,
-    "models": ModelsWarning,
-}
-
-
-def _get_warn_types(arg):
-    if arg == "":  # don't show any warnings
-        return []
-    if not arg or arg == "all":  # show all available warnings
-        return WARNINGS.keys()
-    return [w_type.strip() for w_type in arg.split(",") if w_type.strip() in WARNINGS]
-
-
-def _get_warn_excl(arg):
-    if not arg:
-        return []
-    return [w_id.strip() for w_id in arg.split(",")]
-
-
-SPACY_WARNING_FILTER = os.environ.get("SPACY_WARNING_FILTER")
-SPACY_WARNING_TYPES = _get_warn_types(os.environ.get("SPACY_WARNING_TYPES"))
-SPACY_WARNING_IGNORE = _get_warn_excl(os.environ.get("SPACY_WARNING_IGNORE"))
-
-
-def user_warning(message):
-    _warn(message, "user")
-
-
-def deprecation_warning(message):
-    _warn(message, "deprecation")
-
-
-def models_warning(message):
-    _warn(message, "models")
-
-
-def _warn(message, warn_type="user"):
-    """
-    message (unicode): The message to display.
-    category (Warning): The Warning to show.
-    """
-    if message.startswith("["):
-        w_id = message.split("[", 1)[1].split("]", 1)[0]  # get ID from string
-    else:
-        w_id = None
-    ignore_warning = w_id and w_id in SPACY_WARNING_IGNORE
-    if warn_type in SPACY_WARNING_TYPES and not ignore_warning:
-        category = WARNINGS[warn_type]
-        stack = inspect.stack()[-1]
-        with warnings.catch_warnings():
-            if SPACY_WARNING_FILTER:
-                warnings.simplefilter(SPACY_WARNING_FILTER, category)
-            warnings.warn_explicit(message, category, stack[1], stack[2])
--- a/spacy/gold.pyx
+++ b/spacy/gold.pyx
@ -10,10 +10,11 @@ import shutil
 import itertools
 from pathlib import Path
 import srsly
+import warnings

 from .syntax import nonproj
 from .tokens import Doc, Span
-from .errors import Errors, AlignmentError, user_warning, Warnings
+from .errors import Errors, AlignmentError, Warnings
 from .compat import path2str
 from . import util
 from .util import minibatch, itershuffle
@ -21,7 +22,6 @@ from .util import minibatch, itershuffle
 from libc.stdio cimport FILE, fopen, fclose, fread, fwrite, feof, fseek


-USE_NEW_ALIGN = False
 punct_re = re.compile(r"\W")


@ -73,57 +73,8 @@ def merge_sents(sents):
    return [(m_deps, (m_cats, m_brackets))]


-_ALIGNMENT_NORM_MAP = [("``", "'"), ("''", "'"), ('"', "'"), ("`", "'")]
-
-
 def _normalize_for_alignment(tokens):
-    tokens = [w.replace(" ", "").lower() for w in tokens]
-    output = []
-    for token in tokens:
-        token = token.replace(" ", "").lower()
-        for before, after in _ALIGNMENT_NORM_MAP:
-            token = token.replace(before, after)
-        output.append(token)
-    return output
-
-
-def _align_before_v2_2_2(tokens_a, tokens_b):
-    """Calculate alignment tables between two tokenizations, using the Levenshtein
-    algorithm. The alignment is case-insensitive.
-
-    tokens_a (List[str]): The candidate tokenization.
-    tokens_b (List[str]): The reference tokenization.
-    RETURNS: (tuple): A 5-tuple consisting of the following information:
-      * cost (int): The number of misaligned tokens.
-      * a2b (List[int]): Mapping of indices in `tokens_a` to indices in `tokens_b`.
-        For instance, if `a2b[4] == 6`, that means that `tokens_a[4]` aligns
-        to `tokens_b[6]`. If there's no one-to-one alignment for a token,
-        it has the value -1.
-      * b2a (List[int]): The same as `a2b`, but mapping the other direction.
-      * a2b_multi (Dict[int, int]): A dictionary mapping indices in `tokens_a`
-        to indices in `tokens_b`, where multiple tokens of `tokens_a` align to
-        the same token of `tokens_b`.
-      * b2a_multi (Dict[int, int]): As with `a2b_multi`, but mapping the other
-            direction.
-    """
-    from . import _align
-    if tokens_a == tokens_b:
-        alignment = numpy.arange(len(tokens_a))
-        return 0, alignment, alignment, {}, {}
-    tokens_a = [w.replace(" ", "").lower() for w in tokens_a]
-    tokens_b = [w.replace(" ", "").lower() for w in tokens_b]
-    cost, i2j, j2i, matrix = _align.align(tokens_a, tokens_b)
-    i2j_multi, j2i_multi = _align.multi_align(i2j, j2i, [len(w) for w in tokens_a],
-                                                        [len(w) for w in tokens_b])
-    for i, j in list(i2j_multi.items()):
-        if i2j_multi.get(i+1) != j and i2j_multi.get(i-1) != j:
-            i2j[i] = j
-            i2j_multi.pop(i)
-    for j, i in list(j2i_multi.items()):
-        if j2i_multi.get(j+1) != i and j2i_multi.get(j-1) != i:
-            j2i[j] = i
-            j2i_multi.pop(j)
-    return cost, i2j, j2i, i2j_multi, j2i_multi
+    return [w.replace(" ", "").lower() for w in tokens]


 def align(tokens_a, tokens_b):
@ -144,8 +95,6 @@ def align(tokens_a, tokens_b):
      * b2a_multi (Dict[int, int]): As with `a2b_multi`, but mapping the other
            direction.
    """
-    if not USE_NEW_ALIGN:
-        return _align_before_v2_2_2(tokens_a, tokens_b)
    tokens_a = _normalize_for_alignment(tokens_a)
    tokens_b = _normalize_for_alignment(tokens_b)
    cost = 0
@ -382,6 +331,8 @@ class GoldCorpus(object):
 def make_orth_variants(nlp, raw, paragraph_tuples, orth_variant_level=0.0):
    if random.random() >= orth_variant_level:
        return raw, paragraph_tuples
+    raw_orig = str(raw)
+    lower = False
    if random.random() >= 0.5:
        lower = True
        if raw is not None:
@ -442,8 +393,11 @@ def make_orth_variants(nlp, raw, paragraph_tuples, orth_variant_level=0.0):
            ids, words, tags, heads, labels, ner = sent_tuples
            for word in words:
                match_found = False
+                # skip whitespace words
+                if word.isspace():
+                    match_found = True
                # add identical word
-                if word not in variants and raw[raw_idx:].startswith(word):
+                elif word not in variants and raw[raw_idx:].startswith(word):
                    variant_raw += word
                    raw_idx += len(word)
                    match_found = True
@ -458,7 +412,7 @@ def make_orth_variants(nlp, raw, paragraph_tuples, orth_variant_level=0.0):
                # something went wrong, abort
                # (add a warning message?)
                if not match_found:
-                    return raw, paragraph_tuples
+                    return raw_orig, paragraph_tuples
                # add following whitespace
                while raw_idx < len(raw) and re.match("\s", raw[raw_idx]):
                    variant_raw += raw[raw_idx]
@ -560,7 +514,7 @@ def _json_iterate(loc):
        py_raw = file_.read()
    cdef long file_length = len(py_raw)
    if file_length > 2 ** 30:
-        user_warning(Warnings.W027.format(size=file_length))
+        warnings.warn(Warnings.W027.format(size=file_length))

    raw = <char*>py_raw
    cdef int square_depth = 0
@ -700,8 +654,19 @@ cdef class GoldParse:
        # if self.lenght > 0, this is modified latter.
        self.orig_annot = []

+        # temporary doc for aligning entity annotation
+        entdoc = None
+
        # avoid allocating memory if the doc does not contain any tokens
-        if self.length > 0:
+        if self.length == 0:
+            self.words = []
+            self.tags = []
+            self.heads = []
+            self.labels = []
+            self.ner = []
+            self.morphology = []
+
+        else:
            if words is None:
                words = [token.text for token in doc]
            if tags is None:
@ -722,7 +687,25 @@ cdef class GoldParse:
                entities = [(ent if ent is not None else "-") for ent in entities]
                if not isinstance(entities[0], basestring):
                    # Assume we have entities specified by character offset.
-                    entities = biluo_tags_from_offsets(doc, entities)
+                    # Create a temporary Doc corresponding to provided words
+                    # (to preserve gold tokenization) and text (to preserve
+                    # character offsets).
+                    entdoc_words, entdoc_spaces = util.get_words_and_spaces(words, doc.text)
+                    entdoc = Doc(doc.vocab, words=entdoc_words, spaces=entdoc_spaces)
+                    entdoc_entities = biluo_tags_from_offsets(entdoc, entities)
+                    # There may be some additional whitespace tokens in the
+                    # temporary doc, so check that the annotations align with
+                    # the provided words while building a list of BILUO labels.
+                    entities = []
+                    words_offset = 0
+                    for i in range(len(entdoc_words)):
+                        if words[i + words_offset] == entdoc_words[i]:
+                            entities.append(entdoc_entities[i])
+                        else:
+                            words_offset -= 1
+                    if len(entities) != len(words):
+                        warnings.warn(Warnings.W029.format(text=doc.text))
+                        entities = ["-" for _ in words]

            # These are filled by the tagger/parser/entity recogniser
            self.c.tags = <int*>self.mem.alloc(len(doc), sizeof(int))
@ -749,7 +732,8 @@ cdef class GoldParse:
            # If we under-segment, we'll have one predicted word that covers a
            # sequence of gold words.
            # If we "mis-segment", we'll have a sequence of predicted words covering
-            # a sequence of gold words. That's many-to-many -- we don't do that.
+            # a sequence of gold words. That's many-to-many -- we don't do that
+            # except for NER spans where the start and end can be aligned.
            cost, i2j, j2i, i2j_multi, j2i_multi = align([t.orth_ for t in doc], words)

            self.cand_to_gold = [(j if j >= 0 else None) for j in i2j]
@ -772,7 +756,6 @@ cdef class GoldParse:
                        self.tags[i] = tags[i2j_multi[i]]
                        self.morphology[i] = morphology[i2j_multi[i]]
                        is_last = i2j_multi[i] != i2j_multi.get(i+1)
-                        is_first = i2j_multi[i] != i2j_multi.get(i-1)
                        # Set next word in multi-token span as head, until last
                        if not is_last:
                            self.heads[i] = i+1
@ -782,29 +765,9 @@ cdef class GoldParse:
                            if head_i:
                                self.heads[i] = self.gold_to_cand[head_i]
                            self.labels[i] = deps[i2j_multi[i]]
-                        # Now set NER...This is annoying because if we've split
-                        # got an entity word split into two, we need to adjust the
-                        # BILUO tags. We can't have BB or LL etc.
-                        # Case 1: O -- easy.
                        ner_tag = entities[i2j_multi[i]]
-                        if ner_tag == "O":
-                            self.ner[i] = "O"
-                        # Case 2: U. This has to become a B I* L sequence.
-                        elif ner_tag.startswith("U-"):
-                            if is_first:
-                                self.ner[i] = ner_tag.replace("U-", "B-", 1)
-                            elif is_last:
-                                self.ner[i] = ner_tag.replace("U-", "L-", 1)
-                            else:
-                                self.ner[i] = ner_tag.replace("U-", "I-", 1)
-                        # Case 3: L. If not last, change to I.
-                        elif ner_tag.startswith("L-"):
-                            if is_last:
-                                self.ner[i] = ner_tag
-                            else:
-                                self.ner[i] = ner_tag.replace("L-", "I-", 1)
-                        # Case 4: I. Stays correct
-                        elif ner_tag.startswith("I-"):
+                        # Assign O/- for many-to-one O/- NER tags
+                        if ner_tag in ("O", "-"):
                             self.ner[i] = ner_tag
                else:
                    self.words[i] = words[gold_i]
@ -816,6 +779,39 @@ cdef class GoldParse:
                        self.heads[i] = self.gold_to_cand[heads[gold_i]]
                    self.labels[i] = deps[gold_i]
                    self.ner[i] = entities[gold_i]
+            # Assign O/- for one-to-many O/- NER tags
+            for j, cand_j in enumerate(self.gold_to_cand):
+                if cand_j is None:
+                    if j in j2i_multi:
+                        i = j2i_multi[j]
+                        ner_tag = entities[j]
+                        if ner_tag in ("O", "-"):
+                            self.ner[i] = ner_tag
+
+            # If there is entity annotation and some tokens remain unaligned,
+            # align all entities at the character level to account for all
+            # possible token misalignments within the entity spans
+            if any([e not in ("O", "-") for e in entities]) and None in self.ner:
+                # If the temporary entdoc wasn't created above, initialize it
+                if not entdoc:
+                    entdoc_words, entdoc_spaces = util.get_words_and_spaces(words, doc.text)
+                    entdoc = Doc(doc.vocab, words=entdoc_words, spaces=entdoc_spaces)
+                # Get offsets based on gold words and BILUO entities
+                entdoc_offsets = offsets_from_biluo_tags(entdoc, entities)
+                aligned_offsets = []
+                aligned_spans = []
+                # Filter offsets to identify those that align with doc tokens
+                for offset in entdoc_offsets:
+                    span = doc.char_span(offset[0], offset[1])
+                    if span and not span.text.isspace():
+                        aligned_offsets.append(offset)
+                        aligned_spans.append(span)
+                # Convert back to BILUO for doc tokens and assign NER for all
+                # aligned spans
+                biluo_tags = biluo_tags_from_offsets(doc, aligned_offsets, missing=None)
+                for span in aligned_spans:
+                    for i in range(span.start, span.end):
+                        self.ner[i] = biluo_tags[i]

            # Prevent whitespace that isn't within entities from being tagged as
            # an entity.
@ -961,6 +957,12 @@ def biluo_tags_from_offsets(doc, entities, missing="O"):
                break
        else:
            biluo[token.i] = missing
+    if "-" in biluo:
+        ent_str = str(entities)
+        warnings.warn(Warnings.W030.format(
+            text=doc.text[:50] + "..." if len(doc.text) > 50 else doc.text,
+            entities=ent_str[:50] + "..." if len(ent_str) > 50 else ent_str
+        ))
    return biluo


--- a/spacy/kb.pxd
+++ b/spacy/kb.pxd
@ -6,7 +6,7 @@ from libcpp.vector cimport vector
 from libc.stdint cimport int32_t, int64_t
 from libc.stdio cimport FILE

-from spacy.vocab cimport Vocab
+from .vocab cimport Vocab
 from .typedefs cimport hash_t

 from .structs cimport KBEntryC, AliasC
@ -169,4 +169,3 @@ cdef class Reader:
    cdef int read_alias(self, int64_t* entry_index, float* prob) except -1

    cdef int _read(self, void* value, size_t size) except -1
-
--- a/spacy/kb.pyx
+++ b/spacy/kb.pyx
@ -1,21 +1,20 @@
 # cython: infer_types=True
 # cython: profile=True
 # coding: utf8
-from spacy.errors import Errors, Warnings, user_warning
-
-from pathlib import Path
 from cymem.cymem cimport Pool
 from preshed.maps cimport PreshMap
-
 from cpython.exc cimport PyErr_SetFromErrno
-
 from libc.stdio cimport fopen, fclose, fread, fwrite, feof, fseek
 from libc.stdint cimport int32_t, int64_t
+from libcpp.vector cimport vector
+
+import warnings
+from os import path
+from pathlib import Path

 from .typedefs cimport hash_t

-from os import path
-from libcpp.vector cimport vector
+from .errors import Errors, Warnings


 cdef class Candidate:
@ -115,7 +114,7 @@ cdef class KnowledgeBase:

        # Return if this entity was added before
        if entity_hash in self._entry_index:
-            user_warning(Warnings.W018.format(entity=entity))
+            warnings.warn(Warnings.W018.format(entity=entity))
            return

        # Raise an error if the provided entity vector is not of the correct length
@ -147,7 +146,7 @@ cdef class KnowledgeBase:
            # only process this entity if its unique ID hadn't been added before
            entity_hash = self.vocab.strings.add(entity_list[i])
            if entity_hash in self._entry_index:
-                user_warning(Warnings.W018.format(entity=entity_list[i]))
+                warnings.warn(Warnings.W018.format(entity=entity_list[i]))

            else:
                entity_vector = vector_list[i]
@ -195,7 +194,7 @@ cdef class KnowledgeBase:

        # Check whether this alias was added before
        if alias_hash in self._alias_index:
-            user_warning(Warnings.W017.format(alias=alias))
+            warnings.warn(Warnings.W017.format(alias=alias))
            return

        cdef vector[int64_t] entry_indices
@ -252,7 +251,7 @@ cdef class KnowledgeBase:

        if is_present:
            if not ignore_warnings:
-                user_warning(Warnings.W024.format(entity=entity, alias=alias))
+                warnings.warn(Warnings.W024.format(entity=entity, alias=alias))
        else:
            entry_indices.push_back(int(entry_index))
            alias_entry.entry_indices = entry_indices
@ -584,5 +583,3 @@ cdef class Reader:
    cdef int _read(self, void* value, size_t size) except -1:
        status = fread(value, size, 1, self._fp)
        return status
-
-
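
Replacing `user_warning` with the standard `warnings.warn` means callers can control these messages with the standard-library filters. The sketch below illustrates this with a hypothetical `TinyKB` class and message text; it is not spaCy's `KnowledgeBase`, only an assumed stand-in with the same "warn and return on duplicate" behaviour.

```python
import warnings


class TinyKB:
    """Hypothetical stand-in for a knowledge base that rejects duplicate entities."""

    def __init__(self):
        self._entries = {}

    def add_entity(self, entity, freq):
        # Mirror the diff: warn and return early if the entity was added before.
        if entity in self._entries:
            warnings.warn("Entity '{}' already exists in the KB".format(entity))
            return
        self._entries[entity] = freq


kb = TinyKB()
kb.add_entity("Q42", 12)

# Record warnings instead of printing them, e.g. inside a test.
with warnings.catch_warnings(record=True) as caught:
    warnings.simplefilter("always")
    kb.add_entity("Q42", 5)  # duplicate: warns, keeps the first frequency
```

The same mechanism lets a caller silence the message with `warnings.filterwarnings("ignore", ...)` or turn it into an exception with `"error"`.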
--- a/spacy/lang/da/__init__.py
+++ b/spacy/lang/da/__init__.py
@ -2,7 +2,6 @@
 from __future__ import unicode_literals

 from .tokenizer_exceptions import TOKENIZER_EXCEPTIONS
-from .norm_exceptions import NORM_EXCEPTIONS
 from .punctuation import TOKENIZER_INFIXES, TOKENIZER_SUFFIXES
 from .stop_words import STOP_WORDS
 from .lex_attrs import LEX_ATTRS
@ -10,19 +9,15 @@ from .morph_rules import MORPH_RULES
 from ..tag_map import TAG_MAP

 from ..tokenizer_exceptions import BASE_EXCEPTIONS
-from ..norm_exceptions import BASE_NORMS
 from ...language import Language
-from ...attrs import LANG, NORM
-from ...util import update_exc, add_lookups
+from ...attrs import LANG
+from ...util import update_exc


 class DanishDefaults(Language.Defaults):
    lex_attr_getters = dict(Language.Defaults.lex_attr_getters)
    lex_attr_getters.update(LEX_ATTRS)
    lex_attr_getters[LANG] = lambda text: "da"
-    lex_attr_getters[NORM] = add_lookups(
-        Language.Defaults.lex_attr_getters[NORM], BASE_NORMS, NORM_EXCEPTIONS
-    )
    tokenizer_exceptions = update_exc(BASE_EXCEPTIONS, TOKENIZER_EXCEPTIONS)
    morph_rules = MORPH_RULES
    infixes = TOKENIZER_INFIXES
--- a/spacy/lang/da/examples.py
+++ b/spacy/lang/da/examples.py
@ -9,10 +9,13 @@ Example sentences to test spaCy and its language models.
 >>> docs = nlp.pipe(sentences)
 """

-
 sentences = [
-    "Apple overvejer at købe et britisk startup for 1 milliard dollar",
-    "Selvkørende biler flytter forsikringsansvaret over på producenterne",
-    "San Francisco overvejer at forbyde udbringningsrobotter på fortov",
-    "London er en stor by i Storbritannien",
+    "Apple overvejer at købe et britisk startup for 1 milliard dollar.",
+    "Selvkørende biler flytter forsikringsansvaret over på producenterne.",
+    "San Francisco overvejer at forbyde udbringningsrobotter på fortovet.",
+    "London er en storby i Storbritannien.",
+    "Hvor er du?",
+    "Hvem er Frankrigs præsident?",
+    "Hvad er hovedstaden i USA?",
+    "Hvornår blev Barack Obama født?",
 ]
--- a/spacy/lang/da/norm_exceptions.py
+++ b/spacy/lang/da/norm_exceptions.py
@ -1,527 +0,0 @@
-# coding: utf8
-"""
-Special-case rules for normalizing tokens to improve the model's predictions.
-For example 'mysterium' vs 'mysterie' and similar.
-"""
-from __future__ import unicode_literals
-
-
-# Sources:
-# 1: https://dsn.dk/retskrivning/om-retskrivningsordbogen/mere-om-retskrivningsordbogen-2012/endrede-stave-og-ordformer/
-# 2: http://www.tjerry-korrektur.dk/ord-med-flere-stavemaader/
-
-_exc = {
-    # Alternative spelling
-    "a-kraft-værk": "a-kraftværk",  # 1
-    "ålborg": "aalborg",  # 2
-    "århus": "aarhus",
-    "accessoirer": "accessoires",  # 1
-    "affektert": "affekteret",  # 1
-    "afrikander": "afrikaaner",  # 1
-    "aftabuere": "aftabuisere",  # 1
-    "aftabuering": "aftabuisering",  # 1
-    "akvarium": "akvarie",  # 1
-    "alenefader": "alenefar",  # 1
-    "alenemoder": "alenemor",  # 1
-    "alkoholambulatorium": "alkoholambulatorie",  # 1
-    "ambulatorium": "ambulatorie",  # 1
-    "ananassene": "ananasserne",  # 2
-    "anførelsestegn": "anførselstegn",  # 1
-    "anseelig": "anselig",  # 2
-    "antioxydant": "antioxidant",  # 1
-    "artrig": "artsrig",  # 1
-    "auditorium": "auditorie",  # 1
-    "avocado": "avokado",  # 2
-    "bagerst": "bagest",  # 2
-    "bagstræv": "bagstræb",  # 1
-    "bagstræver": "bagstræber",  # 1
-    "bagstræverisk": "bagstræberisk",  # 1
-    "balde": "balle",  # 2
-    "barselorlov": "barselsorlov",  # 1
-    "barselvikar": "barselsvikar",  # 1
-    "baskien": "baskerlandet",  # 1
-    "bayrisk": "bayersk",  # 1
-    "bedstefader": "bedstefar",  # 1
-    "bedstemoder": "bedstemor",  # 1
-    "behefte": "behæfte",  # 1
-    "beheftelse": "behæftelse",  # 1
-    "bidragydende": "bidragsydende",  # 1
-    "bidragyder": "bidragsyder",  # 1
-    "billiondel": "billiontedel",  # 1
-    "blaseret": "blasert",  # 1
-    "bleskifte": "bleskift",  # 1
-    "blodbroder": "blodsbroder",  # 2
-    "blyantspidser": "blyantsspidser",  # 2
-    "boligministerium": "boligministerie",  # 1
-    "borhul": "borehul",  # 1
-    "broder": "bror",  # 2
-    "buldog": "bulldog",  # 2
-    "bådhus": "bådehus",  # 1
-    "børnepleje": "barnepleje",  # 1
-    "børneseng": "barneseng",  # 1
-    "børnestol": "barnestol",  # 1
-    "cairo": "kairo",  # 1
-    "cambodia": "cambodja",  # 1
-    "cambodianer": "cambodjaner",  # 1
-    "cambodiansk": "cambodjansk",  # 1
-    "camouflage": "kamuflage",  # 2
-    "campylobacter": "kampylobakter",  # 1
-    "centeret": "centret",  # 2
-    "chefskahyt": "chefkahyt",  # 1
-    "chefspost": "chefpost",  # 1
-    "chefssekretær": "chefsekretær",  # 1
-    "chefsstol": "chefstol",  # 1
-    "cirkulærskrivelse": "cirkulæreskrivelse",  # 1
-    "cognacsglas": "cognacglas",  # 1
-    "columnist": "kolumnist",  # 1
-    "cricket": "kricket",  # 2
-    "dagplejemoder": "dagplejemor",  # 1
-    "damaskesdug": "damaskdug",  # 1
-    "damp-barn": "dampbarn",  # 1
-    "delfinarium": "delfinarie",  # 1
-    "dentallaboratorium": "dentallaboratorie",  # 1
-    "diaramme": "diasramme",  # 1
-    "diaré": "diarré",  # 1
-    "dioxyd": "dioxid",  # 1
-    "dommedagsprædiken": "dommedagspræken",  # 1
-    "donut": "doughnut",  # 2
-    "driftmæssig": "driftsmæssig",  # 1
-    "driftsikker": "driftssikker",  # 1
-    "driftsikring": "driftssikring",  # 1
-    "drikkejogurt": "drikkeyoghurt",  # 1
-    "drivein": "drive-in",  # 1
-    "driveinbiograf": "drive-in-biograf",  # 1
-    "drøvel": "drøbel",  # 1
-    "dødskriterium": "dødskriterie",  # 1
-    "e-mail-adresse": "e-mailadresse",  # 1
-    "e-post-adresse": "e-postadresse",  # 1
-    "egypten": "ægypten",  # 2
-    "ekskommunicere": "ekskommunikere",  # 1
-    "eksperimentarium": "eksperimentarie",  # 1
-    "elsass": "Alsace",  # 1
-    "elsasser": "alsacer",  # 1
-    "elsassisk": "alsacisk",  # 1
-    "elvetal": "ellevetal",  # 1
-    "elvetiden": "ellevetiden",  # 1
-    "elveårig": "elleveårig",  # 1
-    "elveårs": "elleveårs",  # 1
-    "elveårsbarn": "elleveårsbarn",  # 1
-    "elvte": "ellevte",  # 1
-    "elvtedel": "ellevtedel",  # 1
-    "energiministerium": "energiministerie",  # 1
-    "erhvervsministerium": "erhvervsministerie",  # 1
-    "espaliere": "spaliere",  # 2
-    "evangelium": "evangelie",  # 1
-    "fagministerium": "fagministerie",  # 1
-    "fakse": "faxe",  # 1
-    "fangstkvota": "fangstkvote",  # 1
-    "fader": "far",  # 2
-    "farbroder": "farbror",  # 1
-    "farfader": "farfar",  # 1
-    "farmoder": "farmor",  # 1
-    "federal": "føderal",  # 1
-    "federalisering": "føderalisering",  # 1
-    "federalisme": "føderalisme",  # 1
-    "federalist": "føderalist",  # 1
-    "federalistisk": "føderalistisk",  # 1
-    "federation": "føderation",  # 1
-    "federativ": "føderativ",  # 1
-    "fejlbeheftet": "fejlbehæftet",  # 1
-    "femetagers": "femetages",  # 2
-    "femhundredekroneseddel": "femhundredkroneseddel",  # 2
-    "filmpremiere": "filmpræmiere",  # 2
-    "finansimperium": "finansimperie",  # 1
-    "finansministerium": "finansministerie",  # 1
-    "firehjulstræk": "firhjulstræk",  # 2
-    "fjernstudium": "fjernstudie",  # 1
-    "formalier": "formalia",  # 1
-    "formandsskift": "formandsskifte",  # 1
-    "fornemst": "fornemmest",  # 2
-    "fornuftparti": "fornuftsparti",  # 1
-    "fornuftstridig": "fornuftsstridig",  # 1
-    "fornuftvæsen": "fornuftsvæsen",  # 1
-    "fornuftægteskab": "fornuftsægteskab",  # 1
-    "forretningsministerium": "forretningsministerie",  # 1
-    "forskningsministerium": "forskningsministerie",  # 1
-    "forstudium": "forstudie",  # 1
-    "forsvarsministerium": "forsvarsministerie",  # 1
-    "frilægge": "fritlægge",  # 1
-    "frilæggelse": "fritlæggelse",  # 1
-    "frilægning": "fritlægning",  # 1
-    "fristille": "fritstille",  # 1
-    "fristilling": "fritstilling",  # 1
-    "fuldttegnet": "fuldtegnet",  # 1
-    "fødestedskriterium": "fødestedskriterie",  # 1
-    "fødevareministerium": "fødevareministerie",  # 1
-    "følesløs": "følelsesløs",  # 1
-    "følgeligt": "følgelig",  # 1
-    "førne": "førn",  # 1
-    "gearskift": "gearskifte",  # 2
-    "gladeligt": "gladelig",  # 1
-    "glosehefte": "glosehæfte",  # 1
-    "glædeløs": "glædesløs",  # 1
-    "gonoré": "gonorré",  # 1
-    "grangiveligt": "grangivelig",  # 1
-    "grundliggende": "grundlæggende",  # 2
-    "grønsag": "grøntsag",  # 2
-    "gudbenådet": "gudsbenådet",  # 1
-    "gudfader": "gudfar",  # 1
-    "gudmoder": "gudmor",  # 1
-    "gulvmop": "gulvmoppe",  # 1
-    "gymnasium": "gymnasie",  # 1
-    "hackning": "hacking",  # 1
-    "halvbroder": "halvbror",  # 1
-    "halvelvetiden": "halvellevetiden",  # 1
-    "handelsgymnasium": "handelsgymnasie",  # 1
-    "hefte": "hæfte",  # 1
-    "hefteklamme": "hæfteklamme",  # 1
-    "heftelse": "hæftelse",  # 1
-    "heftemaskine": "hæftemaskine",  # 1
-    "heftepistol": "hæftepistol",  # 1
-    "hefteplaster": "hæfteplaster",  # 1
-    "heftestraf": "hæftestraf",  # 1
-    "heftning": "hæftning",  # 1
-    "helbroder": "helbror",  # 1
-    "hjemmeklasse": "hjemklasse",  # 1
-    "hjulspin": "hjulspind",  # 1
-    "huggevåben": "hugvåben",  # 1
-    "hulmurisolering": "hulmursisolering",  # 1
-    "hurtiggående": "hurtigtgående",  # 2
-    "hurtigttørrende": "hurtigtørrende",  # 2
-    "husmoder": "husmor",  # 1
-    "hydroxyd": "hydroxid",  # 1
-    "håndmikser": "håndmixer",  # 1
-    "højtaler": "højttaler",  # 2
-    "hønemoder": "hønemor",  # 1
-    "ide": "idé",  # 2
-    "imperium": "imperie",  # 1
-    "imponerthed": "imponerethed",  # 1
-    "inbox": "indboks",  # 2
-    "indenrigsministerium": "indenrigsministerie",  # 1
-    "indhefte": "indhæfte",  # 1
-    "indheftning": "indhæftning",  # 1
-    "indicium": "indicie",  # 1
-    "indkassere": "inkassere",  # 2
-    "iota": "jota",  # 1
-    "jobskift": "jobskifte",  # 1
-    "jogurt": "yoghurt",  # 1
-    "jukeboks": "jukebox",  # 1
-    "justitsministerium": "justitsministerie",  # 1
-    "kalorifere": "kalorifer",  # 1
-    "kandidatstipendium": "kandidatstipendie",  # 1
-    "kannevas": "kanvas",  # 1
-    "kaperssauce": "kaperssovs",  # 1
-    "kigge": "kikke",  # 2
-    "kirkeministerium": "kirkeministerie",  # 1
-    "klapmydse": "klapmyds",  # 1
-    "klimakterium": "klimakterie",  # 1
-    "klogeligt": "klogelig",  # 1
-    "knivblad": "knivsblad",  # 1
-    "kollegaer": "kolleger",  # 2
-    "kollegium": "kollegie",  # 1
-    "kollegiehefte": "kollegiehæfte",  # 1
-    "kollokviumx": "kollokvium",  # 1
-    "kommissorium": "kommissorie",  # 1
-    "kompendium": "kompendie",  # 1
-    "komplicerthed": "komplicerethed",  # 1
-    "konfederation": "konføderation",  # 1
-    "konfedereret": "konfødereret",  # 1
-    "konferensstudium": "konferensstudie",  # 1
-    "konservatorium": "konservatorie",  # 1
-    "konsulere": "konsultere",  # 1
-    "kradsbørstig": "krasbørstig",  # 2
-    "kravsspecifikation": "kravspecifikation",  # 1
-    "krematorium": "krematorie",  # 1
-    "krep": "crepe",  # 1
-    "krepnylon": "crepenylon",  # 1
-    "kreppapir": "crepepapir",  # 1
-    "kricket": "cricket",  # 2
-    "kriterium": "kriterie",  # 1
-    "kroat": "kroater",  # 2
-    "kroki": "croquis",  # 1
-    "kronprinsepar": "kronprinspar",  # 2
-    "kropdoven": "kropsdoven",  # 1
-    "kroplus": "kropslus",  # 1
-    "krøllefedt": "krølfedt",  # 1
-    "kulturministerium": "kulturministerie",  # 1
-    "kuponhefte": "kuponhæfte",  # 1
-    "kvota": "kvote",  # 1
-    "kvotaordning": "kvoteordning",  # 1
-    "laboratorium": "laboratorie",  # 1
-    "laksfarve": "laksefarve",  # 1
-    "laksfarvet": "laksefarvet",  # 1
-    "laksrød": "lakserød",  # 1
-    "laksyngel": "lakseyngel",  # 1
-    "laksørred": "lakseørred",  # 1
-    "landbrugsministerium": "landbrugsministerie",  # 1
-    "landskampstemning": "landskampsstemning",  # 1
-    "langust": "languster",  # 1
-    "lappegrejer": "lappegrej",  # 1
-    "lavløn": "lavtløn",  # 1
-    "lillebroder": "lillebror",  # 1
-    "linear": "lineær",  # 1
-    "loftlampe": "loftslampe",  # 2
-    "log-in": "login",  # 1
-    "login": "log-in",  # 2
-    "lovmedholdig": "lovmedholdelig",  # 1
-    "ludder": "luder",  # 2
-    "lysholder": "lyseholder",  # 1
-    "lægeskifte": "lægeskift",  # 1
-    "lærvillig": "lærevillig",  # 1
-    "løgsauce": "løgsovs",  # 1
-    "madmoder": "madmor",  # 1
-    "majonæse": "mayonnaise",  # 1
-    "mareridtagtig": "mareridtsagtig",  # 1
-    "margen": "margin",  # 2
-    "martyrium": "martyrie",  # 1
-    "mellemstatlig": "mellemstatslig",  # 1
-    "menneskene": "menneskerne",  # 2
-    "metropolis": "metropol",  # 1
-    "miks": "mix",  # 1
-    "mikse": "mixe",  # 1
-    "miksepult": "mixerpult",  # 1
-    "mikser": "mixer",  # 1
-    "mikserpult": "mixerpult",  # 1
-    "mikslån": "mixlån",  # 1
-    "miksning": "mixning",  # 1
-    "miljøministerium": "miljøministerie",  # 1
-    "milliarddel": "milliardtedel",  # 1
-    "milliondel": "milliontedel",  # 1
-    "ministerium": "ministerie",  # 1
-    "mop": "moppe",  # 1
-    "moder": "mor",  # 2
-    "moratorium": "moratorie",  # 1
-    "morbroder": "morbror",  # 1
-    "morfader": "morfar",  # 1
-    "mormoder": "mormor",  # 1
-    "musikkonservatorium": "musikkonservatorie",  # 1
-    "muslingskal": "muslingeskal",  # 1
-    "mysterium": "mysterie",  # 1
-    "naturalieydelse": "naturalydelse",  # 1
-    "naturalieøkonomi": "naturaløkonomi",  # 1
-    "navnebroder": "navnebror",  # 1
-    "nerium": "nerie",  # 1
-    "nådeløs": "nådesløs",  # 1
-    "nærforestående": "nærtforestående",  # 1
-    "nærstående": "nærtstående",  # 1
-    "observatorium": "observatorie",  # 1
-    "oldefader": "oldefar",  # 1
-    "oldemoder": "oldemor",  # 1
-    "opgraduere": "opgradere",  # 1
-    "opgraduering": "opgradering",  # 1
-    "oratorium": "oratorie",  # 1
-    "overbookning": "overbooking",  # 1
-    "overpræsidium": "overpræsidie",  # 1
-    "overstatlig": "overstatslig",  # 1
-    "oxyd": "oxid",  # 1
-    "oxydere": "oxidere",  # 1
-    "oxydering": "oxidering",  # 1
-    "pakkenellike": "pakkenelliker",  # 1
-    "papirtynd": "papirstynd",  # 1
-    "pastoralseminarium": "pastoralseminarie",  # 1
-    "peanutsene": "peanuttene",  # 2
-    "penalhus": "pennalhus",  # 2
-    "pensakrav": "pensumkrav",  # 1
-    "pepperoni": "peperoni",  # 1
-    "peruaner": "peruvianer",  # 1
-    "petrole": "petrol",  # 1
-    "piltast": "piletast",  # 1
-    "piltaste": "piletast",  # 1
-    "planetarium": "planetarie",  # 1
-    "plasteret": "plastret",  # 2
-    "plastic": "plastik",  # 2
-    "play-off-kamp": "playoffkamp",  # 1
-    "plejefader": "plejefar",  # 1
-    "plejemoder": "plejemor",  # 1
-    "podium": "podie",  # 2
-    "praha": "prag",  # 2
-    "preciøs": "pretiøs",  # 2
-    "privilegium": "privilegie",  # 1
-    "progredere": "progrediere",  # 1
-    "præsidium": "præsidie",  # 1
-    "psykodelisk": "psykedelisk",  # 1
-    "pudsegrejer": "pudsegrej",  # 1
-    "referensgruppe": "referencegruppe",  # 1
-    "referensramme": "referenceramme",  # 1
-    "refugium": "refugie",  # 1
-    "registeret": "registret",  # 2
-    "remedium": "remedie",  # 1
-    "remiks": "remix",  # 1
-    "reservert": "reserveret",  # 1
-    "ressortministerium": "ressortministerie",  # 1
-    "ressource": "resurse",  # 2
-    "resætte": "resette",  # 1
-    "rettelig": "retteligt",  # 1
-    "rettetaste": "rettetast",  # 1
-    "returtaste": "returtast",  # 1
-    "risici": "risikoer",  # 2
-    "roll-on": "rollon",  # 1
-    "rollehefte": "rollehæfte",  # 1
-    "rostbøf": "roastbeef",  # 1
-    "rygsæksturist": "rygsækturist",  # 1
-    "rødstjært": "rødstjert",  # 1
-    "saddel": "sadel",  # 2
-    "samaritan": "samaritaner",  # 2
-    "sanatorium": "sanatorie",  # 1
-    "sauce": "sovs",  # 1
-    "scanning": "skanning",  # 2
-    "sceneskifte": "sceneskift",  # 1
-    "scilla": "skilla",  # 1
-    "sejflydende": "sejtflydende",  # 1
-    "selvstudium": "selvstudie",  # 1
-    "seminarium": "seminarie",  # 1
-    "sennepssauce": "sennepssovs ",  # 1
-    "servitutbeheftet": "servitutbehæftet",  # 1
-    "sit-in": "sitin",  # 1
-    "skatteministerium": "skatteministerie",  # 1
-    "skifer": "skiffer",  # 2
-    "skyldsfølelse": "skyldfølelse",  # 1
-    "skysauce": "skysovs",  # 1
-    "sladdertaske": "sladretaske",  # 2
-    "sladdervorn": "sladrevorn",  # 2
-    "slagsbroder": "slagsbror",  # 1
-    "slettetaste": "slettetast",  # 1
-    "smørsauce": "smørsovs",  # 1
-    "snitsel": "schnitzel",  # 1
-    "snobbeeffekt": "snobeffekt",  # 2
-    "socialministerium": "socialministerie",  # 1
-    "solarium": "solarie",  # 1
-    "soldebroder": "soldebror",  # 1
-    "spagetti": "spaghetti",  # 1
-    "spagettistrop": "spaghettistrop",  # 1
-    "spagettiwestern": "spaghettiwestern",  # 1
-    "spin-off": "spinoff",  # 1
-    "spinnefiskeri": "spindefiskeri",  # 1
-    "spolorm": "spoleorm",  # 1
-    "sproglaboratorium": "sproglaboratorie",  # 1
-    "spækbræt": "spækkebræt",  # 2
-    "stand-in": "standin",  # 1
-    "stand-up-comedy": "standupcomedy",  # 1
-    "stand-up-komiker": "standupkomiker",  # 1
-    "statsministerium": "statsministerie",  # 1
-    "stedbroder": "stedbror",  # 1
-    "stedfader": "stedfar",  # 1
-    "stedmoder": "stedmor",  # 1
-    "stilehefte": "stilehæfte",  # 1
-    "stipendium": "stipendie",  # 1
-    "stjært": "stjert",  # 1
-    "stjærthage": "stjerthage",  # 1
-    "storebroder": "storebror",  # 1
-    "stortå": "storetå",  # 1
-    "strabads": "strabadser",  # 1
-    "strømlinjet": "strømlinet",  # 1
-    "studium": "studie",  # 1
-    "stænkelap": "stænklap",  # 1
-    "sundhedsministerium": "sundhedsministerie",  # 1
-    "suppositorium": "suppositorie",  # 1
-    "svejts": "schweiz",  # 1
-    "svejtser": "schweizer",  # 1
-    "svejtserfranc": "schweizerfranc",  # 1
-    "svejtserost": "schweizerost",  # 1
-    "svejtsisk": "schweizisk",  # 1
-    "svigerfader": "svigerfar",  # 1
-    "svigermoder": "svigermor",  # 1
-    "svirebroder": "svirebror",  # 1
-    "symposium": "symposie",  # 1
-    "sælarium": "sælarie",  # 1
-    "søreme": "sørme",  # 2
-    "søterritorium": "søterritorie",  # 1
-    "t-bone-steak": "t-bonesteak",  # 1
-    "tabgivende": "tabsgivende",  # 1
-    "tabuere": "tabuisere",  # 1
-    "tabuering": "tabuisering",  # 1
-    "tackle": "takle",  # 2
-    "tackling": "takling",  # 2
-    "taifun": "tyfon",  # 1
-    "take-off": "takeoff",  # 1
-    "taknemlig": "taknemmelig",  # 2
-    "talehørelærer": "tale-høre-lærer",  # 1
-    "talehøreundervisning": "tale-høre-undervisning",  # 1
-    "tandstik": "tandstikker",  # 1
-    "tao": "dao",  # 1
-    "taoisme": "daoisme",  # 1
-    "taoist": "daoist",  # 1
-    "taoistisk": "daoistisk",  # 1
-    "taverne": "taverna",  # 1
-    "teateret": "teatret",  # 2
-    "tekno": "techno",  # 1
-    "temposkifte": "temposkift",  # 1
-    "terrarium": "terrarie",  # 1
-    "territorium": "territorie",  # 1
-    "tesis": "tese",  # 1
-    "tidsstudium": "tidsstudie",  # 1
-    "tipoldefader": "tipoldefar",  # 1
-    "tipoldemoder": "tipoldemor",  # 1
-    "tomatsauce": "tomatsovs",  # 1
-    "tonart": "toneart",  # 1
-    "trafikministerium": "trafikministerie",  # 1
-    "tredve": "tredive",  # 1
-    "tredver": "trediver",  # 1
-    "tredveårig": "trediveårig",  # 1
-    "tredveårs": "trediveårs",  # 1
-    "tredveårsfødselsdag": "trediveårsfødselsdag",  # 1
-    "tredvte": "tredivte",  # 1
-    "tredvtedel": "tredivtedel",  # 1
-    "troldunge": "troldeunge",  # 1
-    "trommestikke": "trommestik",  # 1
-    "trubadur": "troubadour",  # 2
-    "trøstepræmie": "trøstpræmie",  # 2
-    "tummerum": "trummerum",  # 1
-    "tumultuarisk": "tumultarisk",  # 1
-    "tunghørighed": "tunghørhed",  # 1
-    "tus": "tusch",  # 2
-    "tusind": "tusinde",  # 2
-    "tvillingbroder": "tvillingebror",  # 1
-    "tvillingbror": "tvillingebror",  # 1
-    "tvillingebroder": "tvillingebror",  # 1
-    "ubeheftet": "ubehæftet",  # 1
-    "udenrigsministerium": "udenrigsministerie",  # 1
-    "udhulning": "udhuling",  # 1
-    "udslaggivende": "udslagsgivende",  # 1
-    "udspekulert": "udspekuleret",  # 1
-    "udviklingsministerium": "udviklingsministerie",  # 1
-    "uforpligtigende": "uforpligtende",  # 1
-    "uheldvarslende": "uheldsvarslende",  # 1
-    "uimponerthed": "uimponerethed",  # 1
-    "undervisningsministerium": "undervisningsministerie",  # 1
-    "unægtelig": "unægteligt",  # 1
-    "urinale": "urinal",  # 1
-    "uvederheftig": "uvederhæftig",  # 1
-    "vabel": "vable",  # 2
-    "vadi": "wadi",  # 1
-    "vaklevorn": "vakkelvorn",  # 1
-    "vanadin": "vanadium",  # 1
-    "vaselin": "vaseline",  # 1
-    "vederheftig": "vederhæftig",  # 1
-    "vedhefte": "vedhæfte",  # 1
-    "velar": "velær",  # 1
-    "videndeling": "vidensdeling",  # 2
-    "vinkelanførelsestegn": "vinkelanførselstegn",  # 1
-    "vipstjært": "vipstjert",  # 1
-    "vismut": "bismut",  # 1
-    "visvas": "vissevasse",  # 1
-    "voksværk": "vokseværk",  # 1
-    "værtdyr": "værtsdyr",  # 1
-    "værtplante": "værtsplante",  # 1
-    "wienersnitsel": "wienerschnitzel",  # 1
-    "yderliggående": "yderligtgående",  # 2
-    "zombi": "zombie",  # 1
-    "ægbakke": "æggebakke",  # 1
-    "ægformet": "æggeformet",  # 1
-    "ægleder": "æggeleder",  # 1
-    "ækvilibrist": "ekvilibrist",  # 2
-    "æselsøre": "æseløre",  # 1
-    "øjehule": "øjenhule",  # 1
-    "øjelåg": "øjenlåg",  # 1
-    "øjeåbner": "øjenåbner",  # 1
-    "økonomiministerium": "økonomiministerie",  # 1
-    "ørenring": "ørering",  # 2
-    "øvehefte": "øvehæfte",  # 1
-}
-
-
-NORM_EXCEPTIONS = {}
-
-for string, norm in _exc.items():
-    NORM_EXCEPTIONS[string] = norm
-    NORM_EXCEPTIONS[string.title()] = norm
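
For reference, the loop removed above registered each exception under two keys: the lowercase form and its `str.title()` form. A minimal sketch with two entries taken from the deleted table shows the effect. One detail worth noting is that `str.title()` capitalizes after every non-letter, so hyphenated keys get a capital on each part.

```python
# Two entries copied from the deleted table, for illustration only.
_exc = {
    "mysterium": "mysterie",
    "e-mail-adresse": "e-mailadresse",
}

NORM_EXCEPTIONS = {}

for string, norm in _exc.items():
    NORM_EXCEPTIONS[string] = norm
    # str.title() upper-cases the first letter of every alphabetic run,
    # so "e-mail-adresse" becomes "E-Mail-Adresse", not "E-mail-adresse".
    NORM_EXCEPTIONS[string.title()] = norm

print(sorted(NORM_EXCEPTIONS))
```

Both variants map to the same lowercase norm; a sentence-initial form such as "E-mail-adresse" would not be matched by either key.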
--- a/spacy/lang/da/tokenizer_exceptions.py
+++ b/spacy/lang/da/tokenizer_exceptions.py
@ -6,7 +6,7 @@ Source: https://forkortelse.dk/ and various others.

 from __future__ import unicode_literals

-from ...symbols import ORTH, LEMMA, NORM, TAG, PUNCT
+from ...symbols import ORTH, LEMMA, NORM


 _exc = {}
@ -52,7 +52,7 @@ for exc_data in [
    {ORTH: "Ons.", LEMMA: "onsdag"},
    {ORTH: "Fre.", LEMMA: "fredag"},
    {ORTH: "Lør.", LEMMA: "lørdag"},
-    {ORTH: "og/eller", LEMMA: "og/eller", NORM: "og/eller", TAG: "CC"},
+    {ORTH: "og/eller", LEMMA: "og/eller", NORM: "og/eller"},
 ]:
    _exc[exc_data[ORTH]] = [exc_data]

@ -70,6 +70,7 @@ for orth in [
    "A/S",
    "B.C.",
    "BK.",
+    "B.T.",
    "Dr.",
    "Boul.",
    "Chr.",
@ -79,6 +80,7 @@ for orth in [
    "Hf.",
    "i/s",
    "I/S",
+    "Inc.",
    "Kprs.",
    "L.A.",
    "Ll.",
@ -149,6 +151,7 @@ for orth in [
    "bygn.",
    "c/o",
    "ca.",
+    "cm.",
    "cand.",
    "d.d.",
    "d.m.",
@ -172,10 +175,12 @@ for orth in [
    "dl.",
    "do.",
    "dobb.",
+    "dr.",
    "dr.h.c",
    "dr.phil.",
    "ds.",
    "dvs.",
+    "d.v.s.",
    "e.b.",
    "e.l.",
    "e.o.",
@ -297,10 +302,14 @@ for orth in [
    "kap.",
    "kbh.",
    "kem.",
+    "kg.",
+    "kgs.",
    "kgl.",
    "kl.",
    "kld.",
+    "km.",
    "km/t",
+    "km/t.",
    "knsp.",
    "komm.",
    "kons.",
@ -311,6 +320,7 @@ for orth in [
    "kt.",
    "ktr.",
    "kv.",
+    "kvm.",
    "kvt.",
    "l.c.",
    "lab.",
@ -357,6 +367,7 @@ for orth in [
    "nto.",
    "nuv.",
    "o/m",
+    "o/m.",
    "o.a.",
    "o.fl.",
    "o.h.",
@ -526,6 +537,7 @@ for orth in [
    "vejl.",
    "vh.",
    "vha.",
+    "vind.",
    "vs.",
    "vsa.",
    "vær.",
@ -565,7 +577,7 @@ for h in range(1, 31 + 1):
    for period in ["."]:
        _exc["%d%s" % (h, period)] = [{ORTH: "%d." % h}]

-_custom_base_exc = {"i.": [{ORTH: "i", LEMMA: "i", NORM: "i"}, {ORTH: ".", TAG: PUNCT}]}
+_custom_base_exc = {"i.": [{ORTH: "i", LEMMA: "i", NORM: "i"}, {ORTH: "."}]}
 _exc.update(_custom_base_exc)

 TOKENIZER_EXCEPTIONS = _exc
--- a/spacy/lang/de/__init__.py
+++ b/spacy/lang/de/__init__.py
@ -2,7 +2,6 @@
 from __future__ import unicode_literals

 from .tokenizer_exceptions import TOKENIZER_EXCEPTIONS
-from .norm_exceptions import NORM_EXCEPTIONS
 from .punctuation import TOKENIZER_PREFIXES, TOKENIZER_SUFFIXES
 from .punctuation import TOKENIZER_INFIXES
 from .tag_map import TAG_MAP
@ -10,18 +9,14 @@ from .stop_words import STOP_WORDS
 from .syntax_iterators import SYNTAX_ITERATORS

 from ..tokenizer_exceptions import BASE_EXCEPTIONS
-from ..norm_exceptions import BASE_NORMS
 from ...language import Language
-from ...attrs import LANG, NORM
-from ...util import update_exc, add_lookups
+from ...attrs import LANG
+from ...util import update_exc


 class GermanDefaults(Language.Defaults):
    lex_attr_getters = dict(Language.Defaults.lex_attr_getters)
    lex_attr_getters[LANG] = lambda text: "de"
-    lex_attr_getters[NORM] = add_lookups(
-        Language.Defaults.lex_attr_getters[NORM], NORM_EXCEPTIONS, BASE_NORMS
-    )
    tokenizer_exceptions = update_exc(BASE_EXCEPTIONS, TOKENIZER_EXCEPTIONS)
    prefixes = TOKENIZER_PREFIXES
    suffixes = TOKENIZER_SUFFIXES
--- a/spacy/lang/de/norm_exceptions.py
+++ b/spacy/lang/de/norm_exceptions.py
@ -1,16 +0,0 @@
-# coding: utf8
-from __future__ import unicode_literals
-
-# Here we only want to include the absolute most common words. Otherwise,
-# this list would get impossibly long for German – especially considering the
-# old vs. new spelling rules, and all possible cases.
-
-
-_exc = {"daß": "dass"}
-
-
-NORM_EXCEPTIONS = {}
-
-for string, norm in _exc.items():
-    NORM_EXCEPTIONS[string] = norm
-    NORM_EXCEPTIONS[string.title()] = norm
--- a/spacy/lang/de/punctuation.py
+++ b/spacy/lang/de/punctuation.py
@ -2,12 +2,12 @@
 from __future__ import unicode_literals

 from ..char_classes import LIST_ELLIPSES, LIST_ICONS, LIST_PUNCT, LIST_QUOTES
-from ..char_classes import LIST_CURRENCY, CURRENCY, UNITS, PUNCT
+from ..char_classes import CURRENCY, UNITS, PUNCT
 from ..char_classes import CONCAT_QUOTES, ALPHA, ALPHA_LOWER, ALPHA_UPPER
-from ..punctuation import _prefixes, _suffixes
+from ..punctuation import TOKENIZER_PREFIXES as BASE_TOKENIZER_PREFIXES


-_prefixes = ["``",] + list(_prefixes)
+_prefixes = ["``"] + BASE_TOKENIZER_PREFIXES

 _suffixes = (
    ["''", "/"]
--- a/spacy/lang/de/syntax_iterators.py
+++ b/spacy/lang/de/syntax_iterators.py
@ -2,9 +2,10 @@
 from __future__ import unicode_literals

 from ...symbols import NOUN, PROPN, PRON
+from ...errors import Errors


-def noun_chunks(obj):
+def noun_chunks(doclike):
    """
    Detect base noun phrases from a dependency parse. Works on both Doc and Span.
    """
@ -27,13 +28,17 @@ def noun_chunks(obj):
        "og",
        "app",
    ]
-    doc = obj.doc  # Ensure works on both Doc and Span.
+    doc = doclike.doc  # Ensure works on both Doc and Span.
+
+    if not doc.is_parsed:
+        raise ValueError(Errors.E029)
+
    np_label = doc.vocab.strings.add("NP")
    np_deps = set(doc.vocab.strings.add(label) for label in labels)
    close_app = doc.vocab.strings.add("nk")

    rbracket = 0
-    for i, word in enumerate(obj):
+    for i, word in enumerate(doclike):
        if i < rbracket:
            continue
        if word.pos in (NOUN, PROPN, PRON) and word.dep in np_deps:
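
The new `is_parsed` guard sits inside a generator function, so the `ValueError` surfaces when the caller first iterates, not when `noun_chunks` is called. The sketch below shows this under stated assumptions: `FakeDoc` and the `E029` message text are hypothetical stand-ins for spaCy's `Doc` and `Errors.E029`.

```python
# Hypothetical message text; the real one is defined in spacy/errors.py.
E029 = "noun_chunks requires the dependency parse, which this document lacks."


class FakeDoc:
    """Hypothetical stand-in exposing only what the guard needs."""

    def __init__(self, words, is_parsed):
        self.words = words
        self.is_parsed = is_parsed
        self.doc = self  # a Doc's .doc is itself; a Span's .doc is its parent

    def __iter__(self):
        return iter(self.words)


def noun_chunks(doclike):
    doc = doclike.doc  # works for both Doc-like and Span-like inputs
    if not doc.is_parsed:
        raise ValueError(E029)
    for i, word in enumerate(doclike):
        yield i, i + 1, word


chunks = noun_chunks(FakeDoc(["Haus"], is_parsed=False))  # no error yet
try:
    next(chunks)  # the guard runs here
except ValueError as err:
    print("raised:", err)
```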
--- a/spacy/lang/el/init.py
+++ b/spacy/lang/el/init.py
@ -10,21 +10,16 @@ from .lemmatizer import GreekLemmatizer
 from .syntax_iterators import SYNTAX_ITERATORS
 from .punctuation import TOKENIZER_PREFIXES, TOKENIZER_SUFFIXES, TOKENIZER_INFIXES
 from ..tokenizer_exceptions import BASE_EXCEPTIONS
-from .norm_exceptions import NORM_EXCEPTIONS
-from ..norm_exceptions import BASE_NORMS
 from ...language import Language
 from ...lookups import Lookups
-from ...attrs import LANG, NORM
-from ...util import update_exc, add_lookups
+from ...attrs import LANG
+from ...util import update_exc


 class GreekDefaults(Language.Defaults):
    lex_attr_getters = dict(Language.Defaults.lex_attr_getters)
    lex_attr_getters.update(LEX_ATTRS)
    lex_attr_getters[LANG] = lambda text: "el"
-    lex_attr_getters[NORM] = add_lookups(
-        Language.Defaults.lex_attr_getters[NORM], BASE_NORMS, NORM_EXCEPTIONS
-    )
    tokenizer_exceptions = update_exc(BASE_EXCEPTIONS, TOKENIZER_EXCEPTIONS)
    stop_words = STOP_WORDS
    tag_map = TAG_MAP
--- a/spacy/lang/el/norm_exceptions.py
+++ b/spacy/lang/el/norm_exceptions.py
--- a/spacy/lang/el/syntax_iterators.py
+++ b/spacy/lang/el/syntax_iterators.py
@ -2,9 +2,10 @@
 from __future__ import unicode_literals

 from ...symbols import NOUN, PROPN, PRON
+from ...errors import Errors


-def noun_chunks(obj):
+def noun_chunks(doclike):
    """
    Detect base noun phrases. Works on both Doc and Span.
    """
@ -13,34 +14,34 @@ def noun_chunks(obj):
    # obj tag corrects some DEP tagger mistakes.
    # Further improvement of the models will eliminate the need for this tag.
    labels = ["nsubj", "obj", "iobj", "appos", "ROOT", "obl"]
-    doc = obj.doc  # Ensure works on both Doc and Span.
+    doc = doclike.doc  # Ensure works on both Doc and Span.
+
+    if not doc.is_parsed:
+        raise ValueError(Errors.E029)
+
    np_deps = [doc.vocab.strings.add(label) for label in labels]
    conj = doc.vocab.strings.add("conj")
    nmod = doc.vocab.strings.add("nmod")
    np_label = doc.vocab.strings.add("NP")
-    seen = set()
-    for i, word in enumerate(obj):
+    prev_end = -1
+    for i, word in enumerate(doclike):
        if word.pos not in (NOUN, PROPN, PRON):
            continue
        # Prevent nested chunks from being produced
-        if word.i in seen:
+        if word.left_edge.i <= prev_end:
            continue
        if word.dep in np_deps:
-            if any(w.i in seen for w in word.subtree):
-                continue
            flag = False
            if word.pos == NOUN:
                #  check for patterns such as γραμμή παραγωγής
                for potential_nmod in word.rights:
                    if potential_nmod.dep == nmod:
-                        seen.update(
-                            j for j in range(word.left_edge.i, potential_nmod.i + 1)
-                        )
+                        prev_end = potential_nmod.i
                        yield word.left_edge.i, potential_nmod.i + 1, np_label
                        flag = True
                        break
            if flag is False:
-                seen.update(j for j in range(word.left_edge.i, word.i + 1))
+                prev_end = word.i
                yield word.left_edge.i, word.i + 1, np_label
        elif word.dep == conj:
            # covers the case: έχει όμορφα και έξυπνα παιδιά
@ -49,9 +50,7 @@ def noun_chunks(obj):
                head = head.head
            # If the head is an NP, and we're coordinated to it, we're an NP
            if head.dep in np_deps:
-                if any(w.i in seen for w in word.subtree):
-                    continue
-                seen.update(j for j in range(word.left_edge.i, word.i + 1))
+                prev_end = word.i
                yield word.left_edge.i, word.i + 1, np_label


--- a/spacy/lang/en/__init__.py
+++ b/spacy/lang/en/__init__.py
@ -2,7 +2,6 @@
 from __future__ import unicode_literals

 from .tokenizer_exceptions import TOKENIZER_EXCEPTIONS
-from .norm_exceptions import NORM_EXCEPTIONS
 from .tag_map import TAG_MAP
 from .stop_words import STOP_WORDS
 from .lex_attrs import LEX_ATTRS
@ -10,10 +9,9 @@ from .morph_rules import MORPH_RULES
 from .syntax_iterators import SYNTAX_ITERATORS

 from ..tokenizer_exceptions import BASE_EXCEPTIONS
-from ..norm_exceptions import BASE_NORMS
 from ...language import Language
-from ...attrs import LANG, NORM
-from ...util import update_exc, add_lookups
+from ...attrs import LANG
+from ...util import update_exc


 def _return_en(_):
@ -24,9 +22,6 @@ class EnglishDefaults(Language.Defaults):
    lex_attr_getters = dict(Language.Defaults.lex_attr_getters)
    lex_attr_getters.update(LEX_ATTRS)
    lex_attr_getters[LANG] = _return_en
-    lex_attr_getters[NORM] = add_lookups(
-        Language.Defaults.lex_attr_getters[NORM], BASE_NORMS, NORM_EXCEPTIONS
-    )
    tokenizer_exceptions = update_exc(BASE_EXCEPTIONS, TOKENIZER_EXCEPTIONS)
    tag_map = TAG_MAP
    stop_words = STOP_WORDS
--- a/spacy/lang/en/norm_exceptions.py
+++ b/spacy/lang/en/norm_exceptions.py
--- a/spacy/lang/en/syntax_iterators.py
+++ b/spacy/lang/en/syntax_iterators.py
@ -2,9 +2,10 @@
 from __future__ import unicode_literals

 from ...symbols import NOUN, PROPN, PRON
+from ...errors import Errors


-def noun_chunks(obj):
+def noun_chunks(doclike):
    """
    Detect base noun phrases from a dependency parse. Works on both Doc and Span.
    """
@ -19,21 +20,23 @@ def noun_chunks(obj):
        "attr",
        "ROOT",
    ]
-    doc = obj.doc  # Ensure works on both Doc and Span.
+    doc = doclike.doc  # Ensure works on both Doc and Span.
+
+    if not doc.is_parsed:
+        raise ValueError(Errors.E029)
+
    np_deps = [doc.vocab.strings.add(label) for label in labels]
    conj = doc.vocab.strings.add("conj")
    np_label = doc.vocab.strings.add("NP")
-    seen = set()
-    for i, word in enumerate(obj):
+    prev_end = -1
+    for i, word in enumerate(doclike):
        if word.pos not in (NOUN, PROPN, PRON):
            continue
        # Prevent nested chunks from being produced
-        if word.i in seen:
+        if word.left_edge.i <= prev_end:
            continue
        if word.dep in np_deps:
-            if any(w.i in seen for w in word.subtree):
-                continue
-            seen.update(j for j in range(word.left_edge.i, word.i + 1))
+            prev_end = word.i
            yield word.left_edge.i, word.i + 1, np_label
        elif word.dep == conj:
            head = word.head
@ -41,9 +44,7 @@ def noun_chunks(obj):
                head = head.head
            # If the head is an NP, and we're coordinated to it, we're an NP
            if head.dep in np_deps:
-                if any(w.i in seen for w in word.subtree):
-                    continue
-                seen.update(j for j in range(word.left_edge.i, word.i + 1))
+                prev_end = word.i
                yield word.left_edge.i, word.i + 1, np_label
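
A minimal sketch of the `prev_end` bookkeeping introduced in the hunk above, assuming each candidate noun is reduced to a `(left_edge, head_index)` pair visited in document order. `filter_nested` and the sample pairs are hypothetical and not part of spaCy; the sketch only illustrates why a single integer can stand in for the old `seen` set.

```python
def filter_nested(candidates):
    """Yield (start, end) spans, skipping nested or overlapping candidates.

    `candidates` is an iterable of (left_edge, head_index) pairs, ordered by
    head_index. A candidate is dropped when its left edge does not lie
    strictly after the end of the previously yielded span.
    """
    prev_end = -1
    for left_edge, head_index in candidates:
        # Same test as `word.left_edge.i <= prev_end` in the diff.
        if left_edge <= prev_end:
            continue
        prev_end = head_index
        # End index is exclusive, mirroring `word.i + 1`.
        yield left_edge, head_index + 1


# Hypothetical candidates: the second one starts inside the first span.
spans = list(filter_nested([(0, 2), (1, 3), (4, 5)]))
print(spans)
```

Because spans are produced left to right, remembering only the last end index is enough to reject anything that would overlap an earlier chunk.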


--- a/spacy/lang/en/tokenizer_exceptions.py
+++ b/spacy/lang/en/tokenizer_exceptions.py
@ -77,12 +77,12 @@ for pron in ["i", "you", "he", "she", "it", "we", "they"]:

        _exc[orth + "'d"] = [
            {ORTH: orth, LEMMA: PRON_LEMMA, NORM: pron, TAG: "PRP"},
-            {ORTH: "'d", LEMMA: "would", NORM: "would", TAG: "MD"},
+            {ORTH: "'d", NORM: "'d"},
        ]

        _exc[orth + "d"] = [
            {ORTH: orth, LEMMA: PRON_LEMMA, NORM: pron, TAG: "PRP"},
-            {ORTH: "d", LEMMA: "would", NORM: "would", TAG: "MD"},
+            {ORTH: "d", NORM: "'d"},
        ]

        _exc[orth + "'d've"] = [
@ -195,7 +195,10 @@ for word in ["who", "what", "when", "where", "why", "how", "there", "that"]:
            {ORTH: "'d", NORM: "'d"},
        ]

-        _exc[orth + "d"] = [{ORTH: orth, LEMMA: word, NORM: word}, {ORTH: "d"}]
+        _exc[orth + "d"] = [
+            {ORTH: orth, LEMMA: word, NORM: word},
+            {ORTH: "d", NORM: "'d"},
+        ]

        _exc[orth + "'d've"] = [
            {ORTH: orth, LEMMA: word, NORM: word},
--- a/spacy/lang/es/__init__.py
+++ b/spacy/lang/es/__init__.py
@ -6,6 +6,7 @@ from .tag_map import TAG_MAP
 from .stop_words import STOP_WORDS
 from .lex_attrs import LEX_ATTRS
 from .syntax_iterators import SYNTAX_ITERATORS
+from .punctuation import TOKENIZER_INFIXES, TOKENIZER_SUFFIXES

 from ..tokenizer_exceptions import BASE_EXCEPTIONS
 from ..norm_exceptions import BASE_NORMS
@ -23,6 +24,8 @@ class SpanishDefaults(Language.Defaults):
    )
    tokenizer_exceptions = update_exc(BASE_EXCEPTIONS, TOKENIZER_EXCEPTIONS)
    tag_map = TAG_MAP
+    infixes = TOKENIZER_INFIXES
+    suffixes = TOKENIZER_SUFFIXES
    stop_words = STOP_WORDS
    syntax_iterators = SYNTAX_ITERATORS

--- a/spacy/lang/es/lex_attrs.py
+++ b/spacy/lang/es/lex_attrs.py
@ -26,6 +26,15 @@ _num_words = [
    "dieciocho",
    "diecinueve",
    "veinte",
+    "veintiuno",
+    "veintidós",
+    "veintitrés",
+    "veinticuatro",
+    "veinticinco",
+    "veintiséis",
+    "veintisiete",
+    "veintiocho",
+    "veintinueve",
    "treinta",
    "cuarenta",
    "cincuenta",
--- a/spacy/lang/es/punctuation.py
+++ b/spacy/lang/es/punctuation.py
@ -0,0 +1,47 @@
+# coding: utf8
+from __future__ import unicode_literals
+
+from ..char_classes import LIST_PUNCT, LIST_ELLIPSES, LIST_QUOTES
+from ..char_classes import LIST_ICONS, CURRENCY, LIST_UNITS, PUNCT
+from ..char_classes import CONCAT_QUOTES, ALPHA_LOWER, ALPHA_UPPER, ALPHA
+from ..char_classes import merge_chars
+
+
+_list_units = [u for u in LIST_UNITS if u != "%"]
+_units = merge_chars(" ".join(_list_units))
+_concat_quotes = CONCAT_QUOTES + "—–"
+
+
+_suffixes = (
+    ["—", "–"]
+    + LIST_PUNCT
+    + LIST_ELLIPSES
+    + LIST_QUOTES
+    + LIST_ICONS
+    + [
+        r"(?<=[0-9])\+",
+        r"(?<=°[FfCcKk])\.",
+        r"(?<=[0-9])(?:{c})".format(c=CURRENCY),
+        r"(?<=[0-9])(?:{u})".format(u=_units),
+        r"(?<=[0-9{al}{e}{p}(?:{q})])\.".format(
+            al=ALPHA_LOWER, e=r"%²\-\+", q=_concat_quotes, p=PUNCT
+        ),
+        r"(?<=[{au}][{au}])\.".format(au=ALPHA_UPPER),
+    ]
+)
+
+_infixes = (
+    LIST_ELLIPSES
+    + LIST_ICONS
+    + [
+        r"(?<=[0-9])[+\*^](?=[0-9-])",
+        r"(?<=[{al}{q}])\.(?=[{au}{q}])".format(
+            al=ALPHA_LOWER, au=ALPHA_UPPER, q=_concat_quotes
+        ),
+        r"(?<=[{a}]),(?=[{a}])".format(a=ALPHA),
+        r"(?<=[{a}0-9])[:<>=/](?=[{a}])".format(a=ALPHA),
+    ]
+)
+
+TOKENIZER_SUFFIXES = _suffixes
+TOKENIZER_INFIXES = _infixes
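
The suffix rules above rely on lookbehind assertions so that a unit or `+` is only split off when a digit precedes it. A rough sketch using only the standard `re` module follows. `LIST_UNITS` here is a tiny stand-in for the real character-class list, `merge_chars` is assumed to turn a space-separated string into a regex alternation, and anchoring each pattern with `$` is this sketch's assumption about how suffix patterns get applied. `split_suffix` is a hypothetical helper.

```python
import re

# Simplified stand-in for the imported unit list (assumption).
LIST_UNITS = ["km", "kg", "%"]


def merge_chars(chars):
    # Assumed behaviour: "km kg" -> "km|kg"
    return chars.strip().replace(" ", "|")


# Mirror the diff: drop "%" from the units before building the alternation.
list_units = [u for u in LIST_UNITS if u != "%"]
units = merge_chars(" ".join(list_units))

suffix_patterns = [
    r"(?<=[0-9])\+",
    r"(?<=[0-9])(?:{u})".format(u=units),
]
# Anchor every pattern at the end of the string.
suffix_re = re.compile("|".join(p + "$" for p in suffix_patterns))


def split_suffix(text):
    """Split one trailing suffix off `text` if any pattern matches."""
    match = suffix_re.search(text)
    if match is None:
        return [text]
    return [text[: match.start()], text[match.start():]]


print(split_suffix("10km"))
print(split_suffix("10%"))
print(split_suffix("km"))
```

In this sketch `10%` stays in one piece because `%` was filtered out of the units and no other rule covers it; the real defaults contain further patterns that are not modelled here.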
--- a/spacy/lang/es/syntax_iterators.py
+++ b/spacy/lang/es/syntax_iterators.py
@ -2,10 +2,15 @@
 from __future__ import unicode_literals

 from ...symbols import NOUN, PROPN, PRON, VERB, AUX
+from ...errors import Errors


-def noun_chunks(obj):
-    doc = obj.doc
+def noun_chunks(doclike):
+    doc = doclike.doc
+
+    if not doc.is_parsed:
+        raise ValueError(Errors.E029)
+
    if not len(doc):
        return
    np_label = doc.vocab.strings.add("NP")
@ -16,7 +21,7 @@ def noun_chunks(obj):
    np_right_deps = [doc.vocab.strings.add(label) for label in right_labels]
    stop_deps = [doc.vocab.strings.add(label) for label in stop_labels]
    token = doc[0]
-    while token and token.i < len(doc):
+    while token and token.i < len(doclike):
        if token.pos in [PROPN, NOUN, PRON]:
            left, right = noun_bounds(
                doc, token, np_left_deps, np_right_deps, stop_deps
--- a/spacy/lang/es/tokenizer_exceptions.py
+++ b/spacy/lang/es/tokenizer_exceptions.py
@ -43,14 +43,16 @@ for orth in [
    "Av.",
    "Avda.",
    "Cía.",
+    "EE.UU.",
    "etc.",
+    "fig.",
    "Gob.",
    "Gral.",
    "Ing.",
    "J.C.",
+    "km/h",
    "Lic.",
    "m.n.",
-    "no.",
    "núm.",
    "P.D.",
    "Prof.",
--- a/spacy/lang/eu/examples.py
+++ b/spacy/lang/eu/examples.py
@ -10,5 +10,5 @@ Example sentences to test spaCy and its language models.

 sentences = [
    "bilbon ko castinga egin da eta nik jakin ez zuetako inork egin al du edota parte hartu duen ezagunik ba al du",
-    "gaur telebistan entzunda denok martetik gatoz hortaz martzianoak gara beno nire ustez batzuk beste batzuk baino martzianoagoak dira"
+    "gaur telebistan entzunda denok martetik gatoz hortaz martzianoak gara beno nire ustez batzuk beste batzuk baino martzianoagoak dira",
 ]
--- a/spacy/lang/eu/lex_attrs.py
+++ b/spacy/lang/eu/lex_attrs.py
@ -59,7 +59,6 @@ behin
 """.split()


-
 def like_num(text):
    if text.startswith(("+", "-", "±", "~")):
        text = text[1:]
--- a/spacy/lang/eu/stop_words.py
+++ b/spacy/lang/eu/stop_words.py
@ -5,7 +5,7 @@ from __future__ import unicode_literals
 # https://www.ranks.nl/stopwords/basque
 # https://www.mustgo.com/worldlanguages/basque/
 STOP_WORDS = set(
-"""
+    """
 al
 anitz
 arabera
--- a/spacy/lang/fa/__init__.py
+++ b/spacy/lang/fa/__init__.py
@ -10,6 +10,7 @@ from .lex_attrs import LEX_ATTRS
 from .tokenizer_exceptions import TOKENIZER_EXCEPTIONS
 from .tag_map import TAG_MAP
 from .punctuation import TOKENIZER_SUFFIXES
+from .syntax_iterators import SYNTAX_ITERATORS


 class PersianDefaults(Language.Defaults):
@ -24,6 +25,7 @@ class PersianDefaults(Language.Defaults):
    tag_map = TAG_MAP
    suffixes = TOKENIZER_SUFFIXES
    writing_system = {"direction": "rtl", "has_case": False, "has_letters": True}
+    syntax_iterators = SYNTAX_ITERATORS


 class Persian(Language):
--- a/spacy/lang/fa/syntax_iterators.py
+++ b/spacy/lang/fa/syntax_iterators.py
@ -2,9 +2,10 @@
 from __future__ import unicode_literals

 from ...symbols import NOUN, PROPN, PRON
+from ...errors import Errors


-def noun_chunks(obj):
+def noun_chunks(doclike):
    """
    Detect base noun phrases from a dependency parse. Works on both Doc and Span.
    """
@ -19,21 +20,23 @@ def noun_chunks(obj):
        "attr",
        "ROOT",
    ]
-    doc = obj.doc  # Ensure works on both Doc and Span.
+    doc = doclike.doc  # Ensure works on both Doc and Span.
+
+    if not doc.is_parsed:
+        raise ValueError(Errors.E029)
+
    np_deps = [doc.vocab.strings.add(label) for label in labels]
    conj = doc.vocab.strings.add("conj")
    np_label = doc.vocab.strings.add("NP")
-    seen = set()
-    for i, word in enumerate(obj):
+    prev_end = -1
+    for i, word in enumerate(doclike):
        if word.pos not in (NOUN, PROPN, PRON):
            continue
        # Prevent nested chunks from being produced
-        if word.i in seen:
+        if word.left_edge.i <= prev_end:
            continue
        if word.dep in np_deps:
-            if any(w.i in seen for w in word.subtree):
-                continue
-            seen.update(j for j in range(word.left_edge.i, word.i + 1))
+            prev_end = word.i
            yield word.left_edge.i, word.i + 1, np_label
        elif word.dep == conj:
            head = word.head
@ -41,9 +44,7 @@ def noun_chunks(obj):
                head = head.head
            # If the head is an NP, and we're coordinated to it, we're an NP
            if head.dep in np_deps:
-                if any(w.i in seen for w in word.subtree):
-                    continue
-                seen.update(j for j in range(word.left_edge.i, word.i + 1))
+                prev_end = word.i
                yield word.left_edge.i, word.i + 1, np_label


--- a/spacy/lang/fr/__init__.py
+++ b/spacy/lang/fr/__init__.py
@ -2,7 +2,8 @@
 from __future__ import unicode_literals

 from .tokenizer_exceptions import TOKENIZER_EXCEPTIONS, TOKEN_MATCH
-from .punctuation import TOKENIZER_SUFFIXES, TOKENIZER_INFIXES
+from .punctuation import TOKENIZER_PREFIXES, TOKENIZER_INFIXES
+from .punctuation import TOKENIZER_SUFFIXES
 from .tag_map import TAG_MAP
 from .stop_words import STOP_WORDS
 from .lex_attrs import LEX_ATTRS
@ -27,6 +28,7 @@ class FrenchDefaults(Language.Defaults):
    tokenizer_exceptions = update_exc(BASE_EXCEPTIONS, TOKENIZER_EXCEPTIONS)
    tag_map = TAG_MAP
    stop_words = STOP_WORDS
+    prefixes = TOKENIZER_PREFIXES
    infixes = TOKENIZER_INFIXES
    suffixes = TOKENIZER_SUFFIXES
    token_match = TOKEN_MATCH
--- a/spacy/lang/fr/punctuation.py
+++ b/spacy/lang/fr/punctuation.py
@ -1,15 +1,26 @@
 # coding: utf8
 from __future__ import unicode_literals

-from ..punctuation import TOKENIZER_INFIXES
+from ..punctuation import TOKENIZER_PREFIXES, TOKENIZER_INFIXES
 from ..char_classes import LIST_PUNCT, LIST_ELLIPSES, LIST_QUOTES, CURRENCY
 from ..char_classes import CONCAT_QUOTES, UNITS, ALPHA, ALPHA_LOWER, ALPHA_UPPER
+from ..char_classes import merge_chars


-ELISION = " ' ’ ".strip().replace(" ", "").replace("\n", "")
-HYPHENS = r"- – — ‐ ‑".strip().replace(" ", "").replace("\n", "")
+ELISION = "' ’".replace(" ", "")
+HYPHENS = r"- – — ‐ ‑".replace(" ", "")
+_prefixes_elision = "d l n"
+_prefixes_elision += " " + _prefixes_elision.upper()
+_hyphen_suffixes = "ce clés elle en il ils je là moi nous on t vous"
+_hyphen_suffixes += " " + _hyphen_suffixes.upper()


+_prefixes = TOKENIZER_PREFIXES + [
+    r"(?:({pe})[{el}])(?=[{a}])".format(
+        a=ALPHA, el=ELISION, pe=merge_chars(_prefixes_elision)
+    )
+]
+
 _suffixes = (
    LIST_PUNCT
    + LIST_ELLIPSES
@ -17,7 +28,6 @@ _suffixes = (
    + [
        r"(?<=[0-9])\+",
        r"(?<=°[FfCcKk])\.",  # °C. -> ["°C", "."]
-        r"(?<=[0-9])°[FfCcKk]",  # 4°C -> ["4", "°C"]
        r"(?<=[0-9])%",  # 4% -> ["4", "%"]
        r"(?<=[0-9])(?:{c})".format(c=CURRENCY),
        r"(?<=[0-9])(?:{u})".format(u=UNITS),
@ -25,14 +35,17 @@ _suffixes = (
            al=ALPHA_LOWER, e=r"%²\-\+", q=CONCAT_QUOTES
        ),
        r"(?<=[{au}][{au}])\.".format(au=ALPHA_UPPER),
+        r"(?<=[{a}])[{h}]({hs})".format(
+            a=ALPHA, h=HYPHENS, hs=merge_chars(_hyphen_suffixes)
+        ),
    ]
 )

-
 _infixes = TOKENIZER_INFIXES + [
    r"(?<=[{a}][{el}])(?=[{a}])".format(a=ALPHA, el=ELISION)
 ]


+TOKENIZER_PREFIXES = _prefixes
 TOKENIZER_SUFFIXES = _suffixes
 TOKENIZER_INFIXES = _infixes
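
To make the new elision prefix and hyphenated-pronoun suffix concrete, here is a sketch built with the standard `re` module. `ALPHA` is a simplified stand-in for the imported character class, `merge_chars` is assumed to convert a space-separated string into an alternation, and the `$` anchor on the suffix is an assumption of this sketch rather than something shown in the diff.

```python
import re

# Simplified stand-in for the imported ALPHA character class (assumption).
ALPHA = "a-zA-ZÀ-ÖØ-öø-ÿ"
ELISION = "' ’".replace(" ", "")
HYPHENS = r"- – — ‐ ‑".replace(" ", "")


def merge_chars(chars):
    # Assumed behaviour: "d l n" -> "d|l|n"
    return chars.strip().replace(" ", "|")


prefixes_elision = "d l n"
prefixes_elision += " " + prefixes_elision.upper()
hyphen_suffixes = "ce clés elle en il ils je là moi nous on t vous"
hyphen_suffixes += " " + hyphen_suffixes.upper()

# Elided article or particle at the start of a token, e.g. l' or d’
prefix_re = re.compile(
    r"(?:({pe})[{el}])(?=[{a}])".format(
        a=ALPHA, el=ELISION, pe=merge_chars(prefixes_elision)
    )
)
# Hyphen plus pronoun at the end of a token, e.g. -moi or -ils
suffix_re = re.compile(
    r"(?<=[{a}])[{h}]({hs})$".format(
        a=ALPHA, h=HYPHENS, hs=merge_chars(hyphen_suffixes)
    )
)

print(prefix_re.match("l'avion").group(0))
print(suffix_re.search("dis-moi").group(0))
print(suffix_re.search("peut-être"))
```

The prefix pattern only fires at the start of a string when used with `match`, so a word such as `aujourd'hui` is left alone by it.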
--- a/spacy/lang/fr/syntax_iterators.py
+++ b/spacy/lang/fr/syntax_iterators.py
@ -2,9 +2,10 @@
 from __future__ import unicode_literals

 from ...symbols import NOUN, PROPN, PRON
+from ...errors import Errors


-def noun_chunks(obj):
+def noun_chunks(doclike):
    """
    Detect base noun phrases from a dependency parse. Works on both Doc and Span.
    """
@ -18,21 +19,23 @@ def noun_chunks(obj):
        "nmod",
        "nmod:poss",
    ]
-    doc = obj.doc  # Ensure works on both Doc and Span.
+    doc = doclike.doc  # Ensure works on both Doc and Span.
+
+    if not doc.is_parsed:
+        raise ValueError(Errors.E029)
+
    np_deps = [doc.vocab.strings[label] for label in labels]
    conj = doc.vocab.strings.add("conj")
    np_label = doc.vocab.strings.add("NP")
-    seen = set()
-    for i, word in enumerate(obj):
+    prev_end = -1
+    for i, word in enumerate(doclike):
        if word.pos not in (NOUN, PROPN, PRON):
            continue
        # Prevent nested chunks from being produced
-        if word.i in seen:
+        if word.left_edge.i <= prev_end:
            continue
        if word.dep in np_deps:
-            if any(w.i in seen for w in word.subtree):
-                continue
-            seen.update(j for j in range(word.left_edge.i, word.right_edge.i + 1))
+            prev_end = word.right_edge.i
            yield word.left_edge.i, word.right_edge.i + 1, np_label
        elif word.dep == conj:
            head = word.head
@ -40,9 +43,7 @@ def noun_chunks(obj):
                head = head.head
            # If the head is an NP, and we're coordinated to it, we're an NP
            if head.dep in np_deps:
-                if any(w.i in seen for w in word.subtree):
-                    continue
-                seen.update(j for j in range(word.left_edge.i, word.right_edge.i + 1))
+                prev_end = word.right_edge.i
                yield word.left_edge.i, word.right_edge.i + 1, np_label


--- a/spacy/lang/fr/tokenizer_exceptions.py
+++ b/spacy/lang/fr/tokenizer_exceptions.py
@ -6,7 +6,7 @@ import re
 from .punctuation import ELISION, HYPHENS
 from ..tokenizer_exceptions import URL_PATTERN
 from ..char_classes import ALPHA_LOWER, ALPHA
-from ...symbols import ORTH, LEMMA, TAG
+from ...symbols import ORTH, LEMMA

 # not using the large _tokenizer_exceptions_list by default as it slows down the tokenizer
 # from ._tokenizer_exceptions_list import FR_BASE_EXCEPTIONS
@ -56,7 +56,28 @@ for exc_data in [
    _exc[exc_data[ORTH]] = [exc_data]


-for orth in ["etc."]:
+for orth in [
+    "après-midi",
+    "au-delà",
+    "au-dessus",
+    "celle-ci",
+    "celles-ci",
+    "celui-ci",
+    "cf.",
+    "ci-dessous",
+    "elle-même",
+    "en-dessous",
+    "etc.",
+    "jusque-là",
+    "lui-même",
+    "MM.",
+    "No.",
+    "peut-être",
+    "pp.",
+    "quelques-uns",
+    "rendez-vous",
+    "Vol.",
+]:
    _exc[orth] = [{ORTH: orth}]


@ -72,7 +93,7 @@ for verb, verb_lemma in [
        for pronoun in ["elle", "il", "on"]:
            token = "{}-t-{}".format(orth, pronoun)
            _exc[token] = [
-                {LEMMA: verb_lemma, ORTH: orth, TAG: "VERB"},
+                {LEMMA: verb_lemma, ORTH: orth},  # , TAG: "VERB"},
                {LEMMA: "t", ORTH: "-t"},
                {LEMMA: pronoun, ORTH: "-" + pronoun},
            ]
@ -81,7 +102,7 @@ for verb, verb_lemma in [("est", "être")]:
    for orth in [verb, verb.title()]:
        token = "{}-ce".format(orth)
        _exc[token] = [
-            {LEMMA: verb_lemma, ORTH: orth, TAG: "VERB"},
+            {LEMMA: verb_lemma, ORTH: orth},  # , TAG: "VERB"},
            {LEMMA: "ce", ORTH: "-ce"},
        ]

@ -89,12 +110,29 @@ for verb, verb_lemma in [("est", "être")]:
 for pre, pre_lemma in [("qu'", "que"), ("n'", "ne")]:
    for orth in [pre, pre.title()]:
        _exc["%sest-ce" % orth] = [
-            {LEMMA: pre_lemma, ORTH: orth, TAG: "ADV"},
-            {LEMMA: "être", ORTH: "est", TAG: "VERB"},
+            {LEMMA: pre_lemma, ORTH: orth},
+            {LEMMA: "être", ORTH: "est"},
            {LEMMA: "ce", ORTH: "-ce"},
        ]


+for verb, pronoun in [("est", "il"), ("EST", "IL")]:
+    token = "{}-{}".format(verb, pronoun)
+    _exc[token] = [
+        {LEMMA: "être", ORTH: verb},
+        {LEMMA: pronoun, ORTH: "-" + pronoun},
+    ]
+
+
+for s, verb, pronoun in [("s", "est", "il"), ("S", "EST", "IL")]:
+    token = "{}'{}-{}".format(s, verb, pronoun)
+    _exc[token] = [
+        {LEMMA: "se", ORTH: s + "'"},
+        {LEMMA: "être", ORTH: verb},
+        {LEMMA: pronoun, ORTH: "-" + pronoun},
+    ]
+
+
 _infixes_exc = []
 orig_elision = "'"
 orig_hyphen = "-"
@ -423,5 +461,5 @@ _regular_exp.append(URL_PATTERN)

 TOKENIZER_EXCEPTIONS = _exc
 TOKEN_MATCH = re.compile(
-    "|".join("(?:{})".format(m) for m in _regular_exp), re.IGNORECASE | re.UNICODE
+    "(?iu)" + "|".join("(?:{})".format(m) for m in _regular_exp)
 ).match
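
The last hunk moves the flags from `re.compile` arguments into an inline `(?iu)` group at the start of the pattern. A short sketch of the equivalence, using two made-up patterns in place of the real `_regular_exp` list (an assumption for illustration only):

```python
import re

# Hypothetical stand-ins for the entries of _regular_exp.
patterns = [r"aujourd'hui", r"https?://\S+"]
joined = "|".join("(?:{})".format(p) for p in patterns)

# Flags embedded in the pattern text itself.
with_inline = re.compile("(?iu)" + joined)
# Flags passed as arguments, as in the removed line.
with_args = re.compile(joined, re.IGNORECASE | re.UNICODE)

for text in ["AUJOURD'HUI", "HTTP://example.com", "demain"]:
    print(text, bool(with_inline.match(text)), bool(with_args.match(text)))
```

Both compiled objects match the same inputs. The inline form keeps the flags attached to the pattern string; the diff itself does not state why the change was made.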
--- a/spacy/lang/gu/__init__.py
+++ b/spacy/lang/gu/__init__.py
@ -0,0 +1,18 @@
+# coding: utf8
+from __future__ import unicode_literals
+
+from .stop_words import STOP_WORDS
+
+from ...language import Language
+
+
+class GujaratiDefaults(Language.Defaults):
+    stop_words = STOP_WORDS
+
+
+class Gujarati(Language):
+    lang = "gu"
+    Defaults = GujaratiDefaults
+
+
+__all__ = ["Gujarati"]
--- a/spacy/lang/gu/examples.py
+++ b/spacy/lang/gu/examples.py
@ -0,0 +1,22 @@
+# coding: utf8
+from __future__ import unicode_literals
+
+
+"""
+Example sentences to test spaCy and its language models.
+
+>>> from spacy.lang.gu.examples import sentences
+>>> docs = nlp.pipe(sentences)
+"""
+
+
+sentences = [
+    "લોકશાહી એ સરકારનું એક એવું તંત્ર છે જ્યાં નાગરિકો મત દ્વારા સત્તાનો ઉપયોગ કરે છે.",
+    "તે ગુજરાત રાજ્યના ધરમપુર શહેરમાં આવેલું હતું",
+    "કર્ણદેવ પહેલો સોલંકી વંશનો રાજા હતો",
+    "તેજપાળને બે પત્ની હતી",
+    "ગુજરાતમાં ભારતીય જનતા પક્ષનો ઉદય આ સમયગાળા દરમિયાન થયો",
+    "આંદોલનકારીઓએ ચીમનભાઇ પટેલના રાજીનામાની માંગણી કરી.",
+    "અહિયાં શું જોડાય છે?",
+    "મંદિરનો પૂર્વાભિમુખ ભાગ નાના મંડપ સાથે થોડો લંબચોરસ આકારનો છે.",
+]
--- a/spacy/lang/gu/stop_words.py
+++ b/spacy/lang/gu/stop_words.py
@ -0,0 +1,91 @@
+# coding: utf8
+from __future__ import unicode_literals
+
+STOP_WORDS = set(
+    """
+એમ
+આ
+એ
+રહી
+છે
+છો
+હતા
+હતું
+હતી
+હોય
+હતો
+શકે
+તે
+તેના
+તેનું
+તેને
+તેની
+તેઓ
+તેમને
+તેમના
+તેમણે
+તેમનું
+તેમાં
+અને
+અહીં
+થી
+થઈ
+થાય
+જે
+ને
+કે
+ના
+ની
+નો
+ને
+નું
+શું
+માં
+પણ
+પર
+જેવા
+જેવું
+જાય
+જેમ
+જેથી
+માત્ર
+માટે
+પરથી
+આવ્યું
+એવી
+આવી
+રીતે
+સુધી
+થાય
+થઈ
+સાથે
+લાગે
+હોવા
+છતાં
+રહેલા
+કરી
+કરે
+કેટલા
+કોઈ
+કેમ
+કર્યો
+કર્યુ
+કરે
+સૌથી
+ત્યારબાદ
+તથા
+દ્વારા
+જુઓ
+જાઓ
+જ્યારે
+ત્યારે
+શકો
+નથી
+હવે
+અથવા
+થતો
+દર
+એટલો
+પરંતુ
+""".split()
+)
--- a/spacy/lang/hy/__init__.py
+++ b/spacy/lang/hy/__init__.py
@ -0,0 +1,26 @@
+# coding: utf8
+from __future__ import unicode_literals
+
+from .stop_words import STOP_WORDS
+from .lex_attrs import LEX_ATTRS
+from .tag_map import TAG_MAP
+
+from ...attrs import LANG
+from ...language import Language
+
+
+class ArmenianDefaults(Language.Defaults):
+    lex_attr_getters = dict(Language.Defaults.lex_attr_getters)
+    lex_attr_getters[LANG] = lambda text: "hy"
+
+    lex_attr_getters.update(LEX_ATTRS)
+    stop_words = STOP_WORDS
+    tag_map = TAG_MAP
+
+
+class Armenian(Language):
+    lang = "hy"
+    Defaults = ArmenianDefaults
+
+
+__all__ = ["Armenian"]
--- a/spacy/lang/hy/examples.py
+++ b/spacy/lang/hy/examples.py
@ -0,0 +1,16 @@
+# coding: utf8
+from __future__ import unicode_literals
+
+"""
+Example sentences to test spaCy and its language models.
+>>> from spacy.lang.hy.examples import sentences
+>>> docs = nlp.pipe(sentences)
+"""
+
+
+sentences = [
+    "Լոնդոնը Միացյալ Թագավորության մեծ քաղաք է։",
+    "Ո՞վ է Ֆրանսիայի նախագահը։",
+    "Որն է Միացյալ Նահանգների մայրաքաղաքը։",
+    "Ե՞րբ է ծնվել Բարաք Օբաման։",
+]
--- a/spacy/lang/hy/lex_attrs.py
+++ b/spacy/lang/hy/lex_attrs.py
@ -0,0 +1,59 @@
+# coding: utf8
+from __future__ import unicode_literals
+
+from ...attrs import LIKE_NUM
+
+
+_num_words = [
+    "զրօ",
+    "մէկ",
+    "երկու",
+    "երեք",
+    "չորս",
+    "հինգ",
+    "վեց",
+    "յոթ",
+    "ութ",
+    "ինը",
+    "տասը",
+    "տասնմեկ",
+    "տասներկու",
+    "տասներեք",
+    "տասնչորս",
+    "տասնհինգ",
+    "տասնվեց",
+    "տասնյոթ",
+    "տասնութ",
+    "տասնինը",
+    "քսան", "երեսուն",
+    "քառասուն",
+    "հիսուն",
+    "վաթսուն",
+    "յոթանասուն",
+    "ութսուն",
+    "ինիսուն",
+    "հարյուր",
+    "հազար",
+    "միլիոն",
+    "միլիարդ",
+    "տրիլիոն",
+    "քվինտիլիոն",
+]
+
+
+def like_num(text):
+    if text.startswith(("+", "-", "±", "~")):
+        text = text[1:]
+    text = text.replace(",", "").replace(".", "")
+    if text.isdigit():
+        return True
+    if text.count("/") == 1:
+        num, denom = text.split("/")
+        if num.isdigit() and denom.isdigit():
+            return True
+    if text.lower() in _num_words:
+        return True
+    return False
+
+
+LEX_ATTRS = {LIKE_NUM: like_num}
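
A quick way to see what `like_num` accepts is to run a condensed copy outside spaCy. The sketch below reuses the logic from the file above with a three-word subset of the number list; `NUM_WORDS` and the sample inputs are chosen here for illustration.

```python
# Small subset of the Armenian number words from the list above.
NUM_WORDS = ["երկու", "հինգ", "հազար"]


def like_num(text):
    # Drop one leading sign-like character.
    if text.startswith(("+", "-", "±", "~")):
        text = text[1:]
    # Ignore thousands and decimal separators.
    text = text.replace(",", "").replace(".", "")
    if text.isdigit():
        return True
    # Simple fractions such as 1/2.
    if text.count("/") == 1:
        num, denom = text.split("/")
        if num.isdigit() and denom.isdigit():
            return True
    return text.lower() in NUM_WORDS


for sample in ["10", "+3", "1,000.5", "1/2", "1/2/3", "հինգ", "abc"]:
    print(sample, like_num(sample))
```

Note that the word lookup is an exact match against the list, so every entry has to be a separate, correctly spelled string for the check to succeed.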
--- a/spacy/lang/hy/stop_words.py
+++ b/spacy/lang/hy/stop_words.py
@ -0,0 +1,110 @@
+# coding: utf8
+from __future__ import unicode_literals
+
+STOP_WORDS = set(
+    """
+նա
+ողջը
+այստեղ
+ենք
+նա
+էիր
+որպես
+ուրիշ
+բոլորը
+այն
+այլ
+նույնչափ
+էի
+մի
+և
+ողջ
+ես
+ոմն
+հետ
+նրանք
+ամենքը
+ըստ
+ինչ-ինչ
+այսպես
+համայն
+մի
+նաև
+նույնքան
+դա
+ովևէ
+համար
+այնտեղ
+էին
+որոնք
+սույն
+ինչ-որ
+ամենը
+նույնպիսի
+ու
+իր
+որոշ
+միևնույն
+ի
+այնպիսի
+մենք
+ամեն ոք
+նույն
+երբևէ
+այն
+որևէ
+ին
+այդպես
+նրա
+որը
+վրա
+դու
+էինք
+այդպիսի
+էիք
+յուրաքանչյուրը
+եմ
+պիտի
+այդ
+ամբողջը
+հետո
+եք
+ամեն
+այլ
+կամ
+այսքան
+որ
+այնպես
+այսինչ
+բոլոր
+է
+մեկնումեկը
+այդչափ
+այնքան
+ամբողջ
+երբևիցե
+այնչափ
+ամենայն
+մյուս
+այնինչ
+իսկ
+այդտեղ
+այս
+սա
+են
+ամեն ինչ
+որևիցե
+ում
+մեկը
+այդ
+դուք
+այսչափ
+այդքան
+այսպիսի
+էր
+յուրաքանչյուր
+այս
+մեջ
+թ
+""".split()
+)
--- a/spacy/lang/hy/tag_map.py
+++ b/spacy/lang/hy/tag_map.py
--- a/spacy/lang/id/__init__.py
+++ b/spacy/lang/id/__init__.py
@ -4,25 +4,20 @@ from __future__ import unicode_literals
 from .stop_words import STOP_WORDS
 from .punctuation import TOKENIZER_SUFFIXES, TOKENIZER_PREFIXES, TOKENIZER_INFIXES
 from .tokenizer_exceptions import TOKENIZER_EXCEPTIONS
-from .norm_exceptions import NORM_EXCEPTIONS
 from .lex_attrs import LEX_ATTRS
 from .syntax_iterators import SYNTAX_ITERATORS
 from .tag_map import TAG_MAP

 from ..tokenizer_exceptions import BASE_EXCEPTIONS
-from ..norm_exceptions import BASE_NORMS
 from ...language import Language
-from ...attrs import LANG, NORM
-from ...util import update_exc, add_lookups
+from ...attrs import LANG
+from ...util import update_exc


 class IndonesianDefaults(Language.Defaults):
    lex_attr_getters = dict(Language.Defaults.lex_attr_getters)
    lex_attr_getters[LANG] = lambda text: "id"
    lex_attr_getters.update(LEX_ATTRS)
-    lex_attr_getters[NORM] = add_lookups(
-        Language.Defaults.lex_attr_getters[NORM], BASE_NORMS, NORM_EXCEPTIONS
-    )
    tokenizer_exceptions = update_exc(BASE_EXCEPTIONS, TOKENIZER_EXCEPTIONS)
    stop_words = STOP_WORDS
    prefixes = TOKENIZER_PREFIXES
--- a/spacy/lang/id/norm_exceptions.py
+++ b/spacy/lang/id/norm_exceptions.py
@ -1,532 +0,0 @@
-# coding: utf8
-from __future__ import unicode_literals
-
-# Daftar kosakata yang sering salah dieja
-# https://id.wikipedia.org/wiki/Wikipedia:Daftar_kosakata_bahasa_Indonesia_yang_sering_salah_dieja
-_exc = {
-    # Slang and abbreviations
-    "silahkan": "silakan",
-    "yg": "yang",
-    "kalo": "kalau",
-    "cawu": "caturwulan",
-    "ok": "oke",
-    "gak": "tidak",
-    "enggak": "tidak",
-    "nggak": "tidak",
-    "ndak": "tidak",
-    "ngga": "tidak",
-    "dgn": "dengan",
-    "tdk": "tidak",
-    "jg": "juga",
-    "klo": "kalau",
-    "denger": "dengar",
-    "pinter": "pintar",
-    "krn": "karena",
-    "nemuin": "menemukan",
-    "jgn": "jangan",
-    "udah": "sudah",
-    "sy": "saya",
-    "udh": "sudah",
-    "dapetin": "mendapatkan",
-    "ngelakuin": "melakukan",
-    "ngebuat": "membuat",
-    "membikin": "membuat",
-    "bikin": "buat",
-    # Daftar kosakata yang sering salah dieja
-    "malpraktik": "malapraktik",
-    "malfungsi": "malafungsi",
-    "malserap": "malaserap",
-    "maladaptasi": "malaadaptasi",
-    "malsuai": "malasuai",
-    "maldistribusi": "maladistribusi",
-    "malgizi": "malagizi",
-    "malsikap": "malasikap",
-    "memperhatikan": "memerhatikan",
-    "akte": "akta",
-    "cemilan": "camilan",
-    "esei": "esai",
-    "frase": "frasa",
-    "kafeteria": "kafetaria",
-    "ketapel": "katapel",
-    "kenderaan": "kendaraan",
-    "menejemen": "manajemen",
-    "menejer": "manajer",
-    "mesjid": "masjid",
-    "rebo": "rabu",
-    "seksama": "saksama",
-    "senggama": "sanggama",
-    "sekedar": "sekadar",
-    "seprei": "seprai",
-    "semedi": "semadi",
-    "samadi": "semadi",
-    "amandemen": "amendemen",
-    "algoritma": "algoritme",
-    "aritmatika": "aritmetika",
-    "metoda": "metode",
-    "materai": "meterai",
-    "meterei": "meterai",
-    "kalendar": "kalender",
-    "kadaluwarsa": "kedaluwarsa",
-    "katagori": "kategori",
-    "parlamen": "parlemen",
-    "sekular": "sekuler",
-    "selular": "seluler",
-    "sirkular": "sirkuler",
-    "survai": "survei",
-    "survey": "survei",
-    "aktuil": "aktual",
-    "formil": "formal",
-    "trotoir": "trotoar",
-    "komersiil": "komersial",
-    "komersil": "komersial",
-    "tradisionil": "tradisionial",
-    "orisinil": "orisinal",
-    "orijinil": "orisinal",
-    "afdol": "afdal",
-    "antri": "antre",
-    "apotik": "apotek",
-    "atlit": "atlet",
-    "atmosfir": "atmosfer",
-    "cidera": "cedera",
-    "cendikiawan": "cendekiawan",
-    "cepet": "cepat",
-    "cinderamata": "cenderamata",
-    "debet": "debit",
-    "difinisi": "definisi",
-    "dekrit": "dekret",
-    "disain": "desain",
-    "diskripsi": "deskripsi",
-    "diskotik": "diskotek",
-    "eksim": "eksem",
-    "exim": "eksem",
-    "faidah": "faedah",
-    "ekstrim": "ekstrem",
-    "ekstrimis": "ekstremis",
-    "komplit": "komplet",
-    "konkrit": "konkret",
-    "kongkrit": "konkret",
-    "kongkret": "konkret",
-    "kridit": "kredit",
-    "musium": "museum",
-    "pinalti": "penalti",
-    "piranti": "peranti",
-    "pinsil": "pensil",
-    "personil": "personel",
-    "sistim": "sistem",
-    "teoritis": "teoretis",
-    "vidio": "video",
-    "cengkeh": "cengkih",
-    "desertasi": "disertasi",
-    "hakekat": "hakikat",
-    "intelejen": "intelijen",
-    "kaedah": "kaidah",
-    "kempes": "kempis",
-    "kementrian": "kementerian",
-    "ledeng": "leding",
-    "nasehat": "nasihat",
-    "penasehat": "penasihat",
-    "praktek": "praktik",
-    "praktekum": "praktikum",
-    "resiko": "risiko",
-    "retsleting": "ritsleting",
-    "senen": "senin",
-    "amuba": "ameba",
-    "punggawa": "penggawa",
-    "surban": "serban",
-    "nomer": "nomor",
-    "sorban": "serban",
-    "bis": "bus",
-    "agribisnis": "agrobisnis",
-    "kantung": "kantong",
-    "khutbah": "khotbah",
-    "mandur": "mandor",
-    "rubuh": "roboh",
-    "pastur": "pastor",
-    "supir": "sopir",
-    "goncang": "guncang",
-    "goa": "gua",
-    "kaos": "kaus",
-    "kokoh": "kukuh",
-    "komulatif": "kumulatif",
-    "kolomnis": "kolumnis",
-    "korma": "kurma",
-    "lobang": "lubang",
-    "limo": "limusin",
-    "limosin": "limusin",
-    "mangkok": "mangkuk",
-    "saos": "saus",
-    "sop": "sup",
-    "sorga": "surga",
-    "tegor": "tegur",
-    "telor": "telur",
-    "obrak-abrik": "ubrak-abrik",
-    "ekwivalen": "ekuivalen",
-    "frekwensi": "frekuensi",
-    "konsekwensi": "konsekuensi",
-    "kwadran": "kuadran",
-    "kwadrat": "kuadrat",
-    "kwalifikasi": "kualifikasi",
-    "kwalitas": "kualitas",
-    "kwalitet": "kualitas",
-    "kwalitatif": "kualitatif",
-    "kwantitas": "kuantitas",
-    "kwantitatif": "kuantitatif",
-    "kwantum": "kuantum",
-    "kwartal": "kuartal",
-    "kwintal": "kuintal",
-    "kwitansi": "kuitansi",
-    "kwatir": "khawatir",
-    "kuatir": "khawatir",
-    "jadual": "jadwal",
-    "hirarki": "hierarki",
-    "karir": "karier",
-    "aktip": "aktif",
-    "daptar": "daftar",
-    "efektip": "efektif",
-    "epektif": "efektif",
-    "epektip": "efektif",
-    "Pebruari": "Februari",
-    "pisik": "fisik",
-    "pondasi": "fondasi",
-    "photo": "foto",
-    "photokopi": "fotokopi",
-    "hapal": "hafal",
-    "insap": "insaf",
-    "insyaf": "insaf",
-    "konperensi": "konferensi",
-    "kreatip": "kreatif",
-    "kreativ": "kreatif",
-    "maap": "maaf",
-    "napsu": "nafsu",
-    "negatip": "negatif",
-    "negativ": "negatif",
-    "objektip": "objektif",
-    "obyektip": "objektif",
-    "obyektif": "objektif",
-    "pasip": "pasif",
-    "pasiv": "pasif",
-    "positip": "positif",
-    "positiv": "positif",
-    "produktip": "produktif",
-    "produktiv": "produktif",
-    "sarap": "saraf",
-    "sertipikat": "sertifikat",
-    "subjektip": "subjektif",
-    "subyektip": "subjektif",
-    "subyektif": "subjektif",
-    "tarip": "tarif",
-    "transitip": "transitif",
-    "transitiv": "transitif",
-    "faham": "paham",
-    "fikir": "pikir",
-    "berfikir": "berpikir",
-    "telefon": "telepon",
-    "telfon": "telepon",
-    "telpon": "telepon",
-    "tilpon": "telepon",
-    "nafas": "napas",
-    "bernafas": "bernapas",
-    "pernafasan": "pernapasan",
-    "vermak": "permak",
-    "vulpen": "pulpen",
-    "aktifis": "aktivis",
-    "konfeksi": "konveksi",
-    "motifasi": "motivasi",
-    "Nopember": "November",
-    "propinsi": "provinsi",
-    "babtis": "baptis",
-    "jerembab": "jerembap",
-    "lembab": "lembap",
-    "sembab": "sembap",
-    "saptu": "sabtu",
-    "tekat": "tekad",
-    "bejad": "bejat",
-    "nekad": "nekat",
-    "otoped": "otopet",
-    "skuad": "skuat",
-    "jenius": "genius",
-    "marjin": "margin",
-    "marjinal": "marginal",
-    "obyek": "objek",
-    "subyek": "subjek",
-    "projek": "proyek",
-    "azas": "asas",
-    "ijasah": "ijazah",
-    "jenasah": "jenazah",
-    "plasa": "plaza",
-    "bathin": "batin",
-    "Katholik": "Katolik",
-    "orthografi": "ortografi",
-    "pathogen": "patogen",
-    "theologi": "teologi",
-    "ijin": "izin",
-    "rejeki": "rezeki",
-    "rejim": "rezim",
-    "jaman": "zaman",
-    "jamrud": "zamrud",
-    "jinah": "zina",
-    "perjinahan": "perzinaan",
-    "anugrah": "anugerah",
-    "cendrawasih": "cenderawasih",
-    "jendral": "jenderal",
-    "kripik": "keripik",
-    "krupuk": "kerupuk",
-    "ksatria": "kesatria",
-    "mentri": "menteri",
-    "negri": "negeri",
-    "Prancis": "Perancis",
-    "sebrang": "seberang",
-    "menyebrang": "menyeberang",
-    "Sumatra": "Sumatera",
-    "trampil": "terampil",
-    "isteri": "istri",
-    "justeru": "justru",
-    "perajurit": "prajurit",
-    "putera": "putra",
-    "puteri": "putri",
-    "samudera": "samudra",
-    "sastera": "sastra",
-    "sutera": "sutra",
-    "terompet": "trompet",
-    "iklas": "ikhlas",
-    "iktisar": "ikhtisar",
-    "kafilah": "khafilah",
-    "kawatir": "khawatir",
-    "kotbah": "khotbah",
-    "kusyuk": "khusyuk",
-    "makluk": "makhluk",
-    "mahluk": "makhluk",
-    "mahkluk": "makhluk",
-    "nahkoda": "nakhoda",
-    "nakoda": "nakhoda",
-    "tahta": "takhta",
-    "takhyul": "takhayul",
-    "tahyul": "takhayul",
-    "tahayul": "takhayul",
-    "akhli": "ahli",
-    "anarkhi": "anarki",
-    "kharisma": "karisma",
-    "kharismatik": "karismatik",
-    "mahsud": "maksud",
-    "makhsud": "maksud",
-    "rakhmat": "rahmat",
-    "tekhnik": "teknik",
-    "tehnik": "teknik",
-    "tehnologi": "teknologi",
-    "ikhwal": "ihwal",
-    "expor": "ekspor",
-    "extra": "ekstra",
-    "komplex": "komplek",
-    "sex": "seks",
-    "taxi": "taksi",
-    "extasi": "ekstasi",
-    "syaraf": "saraf",
-    "syurga": "surga",
-    "mashur": "masyhur",
-    "masyur": "masyhur",
-    "mahsyur": "masyhur",
-    "mashyur": "masyhur",
-    "muadzin": "muazin",
-    "adzan": "azan",
-    "ustadz": "ustaz",
-    "ustad": "ustaz",
-    "ustadzah": "ustaz",
-    "dzikir": "zikir",
-    "dzuhur": "zuhur",
-    "dhuhur": "zuhur",
-    "zhuhur": "zuhur",
-    "analisa": "analisis",
-    "diagnosa": "diagnosis",
-    "hipotesa": "hipotesis",
-    "sintesa": "sintesis",
-    "aktiviti": "aktivitas",
-    "aktifitas": "aktivitas",
-    "efektifitas": "efektivitas",
-    "komuniti": "komunitas",
-    "kreatifitas": "kreativitas",
-    "produktifitas": "produktivitas",
-    "realiti": "realitas",
-    "realita": "realitas",
-    "selebriti": "selebritas",
-    "spotifitas": "sportivitas",
-    "universiti": "universitas",
-    "utiliti": "utilitas",
-    "validiti": "validitas",
-    "dilokalisir": "dilokalisasi",
-    "didramatisir": "didramatisasi",
-    "dipolitisir": "dipolitisasi",
-    "dinetralisir": "dinetralisasi",
-    "dikonfrontir": "dikonfrontasi",
-    "mendominir": "mendominasi",
-    "koordinir": "koordinasi",
-    "proklamir": "proklamasi",
-    "terorganisir": "terorganisasi",
-    "terealisir": "terealisasi",
-    "robah": "ubah",
-    "dirubah": "diubah",
-    "merubah": "mengubah",
-    "terlanjur": "telanjur",
-    "terlantar": "telantar",
-    "penglepasan": "pelepasan",
-    "pelihatan": "penglihatan",
-    "pemukiman": "permukiman",
-    "pengrumahan": "perumahan",
-    "penyewaan": "persewaan",
-    "menyintai": "mencintai",
-    "menyolok": "mencolok",
-    "contek": "sontek",
-    "mencontek": "menyontek",
-    "pungkir": "mungkir",
-    "dipungkiri": "dimungkiri",
-    "kupungkiri": "kumungkiri",
-    "kaupungkiri": "kaumungkiri",
-    "nampak": "tampak",
-    "nampaknya": "tampaknya",
-    "nongkrong": "tongkrong",
-    "berternak": "beternak",
-    "berterbangan": "beterbangan",
-    "berserta": "beserta",
-    "berperkara": "beperkara",
-    "berpergian": "bepergian",
-    "berkerja": "bekerja",
-    "berberapa": "beberapa",
-    "terbersit": "tebersit",
-    "terpercaya": "tepercaya",
-    "terperdaya": "teperdaya",
-    "terpercik": "tepercik",
-    "terpergok": "tepergok",
-    "aksesoris": "aksesori",
-    "handal": "andal",
-    "hantar": "antar",
-    "panutan": "anutan",
-    "atsiri": "asiri",
-    "bhakti": "bakti",
-    "china": "cina",
-    "dharma": "darma",
-    "diktaktor": "diktator",
-    "eksport": "ekspor",
-    "hembus": "embus",
-    "hadits": "hadis",
-    "hadist": "hadits",
-    "harafiah": "harfiah",
-    "himbau": "imbau",
-    "import": "impor",
-    "inget": "ingat",
-    "hisap": "isap",
-    "interprestasi": "interpretasi",
-    "kangker": "kanker",
-    "konggres": "kongres",
-    "lansekap": "lanskap",
-    "maghrib": "magrib",
-    "emak": "mak",
-    "moderen": "modern",
-    "pasport": "paspor",
-    "perduli": "peduli",
-    "ramadhan": "ramadan",
-    "rapih": "rapi",
-    "Sansekerta": "Sanskerta",
-    "shalat": "salat",
-    "sholat": "salat",
-    "silahkan": "silakan",
-    "standard": "standar",
-    "hutang": "utang",
-    "zinah": "zina",
-    "ambulan": "ambulans",
-    "antartika": "sntarktika",
-    "arteri": "arteria",
-    "asik": "asyik",
-    "australi": "australia",
-    "denga": "dengan",
-    "depo": "depot",
-    "detil": "detail",
-    "ensiklopedi": "ensiklopedia",
-    "elit": "elite",
-    "frustasi": "frustrasi",
-    "gladi": "geladi",
-    "greget": "gereget",
-    "itali": "italia",
-    "karna": "karena",
-    "klenteng": "kelenteng",
-    "erling": "kerling",
-    "kontruksi": "konstruksi",
-    "masal": "massal",
-    "merk": "merek",
-    "respon": "respons",
-    "diresponi": "direspons",
-    "skak": "sekak",
-    "stir": "setir",
-    "singapur": "singapura",
-    "standarisasi": "standardisasi",
-    "varitas": "varietas",
-    "amphibi": "amfibi",
-    "anjlog": "anjlok",
-    "alpukat": "avokad",
-    "alpokat": "avokad",
-    "bolpen": "pulpen",
-    "cabe": "cabai",
-    "cabay": "cabai",
-    "ceret": "cerek",
-    "differensial": "diferensial",
-    "duren": "durian",
-    "faksimili": "faksimile",
-    "faksimil": "faksimile",
-    "graha": "gerha",
-    "goblog": "goblok",
-    "gombrong": "gombroh",
-    "horden": "gorden",
-    "korden": "gorden",
-    "gubug": "gubuk",
-    "imaginasi": "imajinasi",
-    "jerigen": "jeriken",
-    "jirigen": "jeriken",
-    "carut-marut": "karut-marut",
-    "kwota": "kuota",
-    "mahzab": "mazhab",
-    "mempesona": "memesona",
-    "milyar": "miliar",
-    "missi": "misi",
-    "nenas": "nanas",
-    "negoisasi": "negosiasi",
-    "automotif": "otomotif",
-    "pararel": "paralel",
-    "paska": "pasca",
-    "prosen": "persen",
-    "pete": "petai",
-    "petay": "petai",
-    "proffesor": "profesor",
-    "rame": "ramai",
-    "rapot": "rapor",
-    "rileks": "relaks",
-    "rileksasi": "relaksasi",
-    "renumerasi": "remunerasi",
-    "seketaris": "sekretaris",
-    "sekertaris": "sekretaris",
-    "sensorik": "sensoris",
-    "sentausa": "sentosa",
-    "strawberi": "stroberi",
-    "strawbery": "stroberi",
-    "taqwa": "takwa",
-    "tauco": "taoco",
-    "tauge": "taoge",
-    "toge": "taoge",
-    "tauladan": "teladan",
-    "taubat": "tobat",
-    "trilyun": "triliun",
-    "vissi": "visi",
-    "coklat": "cokelat",
-    "narkotika": "narkotik",
-    "oase": "oasis",
-    "politisi": "politikus",
-    "terong": "terung",
-    "wool": "wol",
-    "himpit": "impit",
-    "mujizat": "mukjizat",
-    "mujijat": "mukjizat",
-    "yag": "yang",
-}
-
-NORM_EXCEPTIONS = {}
-
-for string, norm in _exc.items():
-    NORM_EXCEPTIONS[string] = norm
-    NORM_EXCEPTIONS[string.title()] = norm
--- a/spacy/lang/id/syntax_iterators.py
+++ b/spacy/lang/id/syntax_iterators.py
@ -2,9 +2,10 @@
 from __future__ import unicode_literals

 from ...symbols import NOUN, PROPN, PRON
+from ...errors import Errors


-def noun_chunks(obj):
+def noun_chunks(doclike):
    """
    Detect base noun phrases from a dependency parse. Works on both Doc and Span.
    """
@ -18,21 +19,23 @@ def noun_chunks(obj):
        "nmod",
        "nmod:poss",
    ]
-    doc = obj.doc  # Ensure works on both Doc and Span.
+    doc = doclike.doc  # Ensure works on both Doc and Span.
+
+    if not doc.is_parsed:
+        raise ValueError(Errors.E029)
+
    np_deps = [doc.vocab.strings[label] for label in labels]
    conj = doc.vocab.strings.add("conj")
    np_label = doc.vocab.strings.add("NP")
-    seen = set()
-    for i, word in enumerate(obj):
+    prev_end = -1
+    for i, word in enumerate(doclike):
        if word.pos not in (NOUN, PROPN, PRON):
            continue
        # Prevent nested chunks from being produced
-        if word.i in seen:
+        if word.left_edge.i <= prev_end:
            continue
        if word.dep in np_deps:
-            if any(w.i in seen for w in word.subtree):
-                continue
-            seen.update(j for j in range(word.left_edge.i, word.right_edge.i + 1))
+            prev_end = word.right_edge.i
            yield word.left_edge.i, word.right_edge.i + 1, np_label
        elif word.dep == conj:
            head = word.head
@ -40,9 +43,7 @@ def noun_chunks(obj):
                head = head.head
            # If the head is an NP, and we're coordinated to it, we're an NP
            if head.dep in np_deps:
-                if any(w.i in seen for w in word.subtree):
-                    continue
-                seen.update(j for j in range(word.left_edge.i, word.right_edge.i + 1))
+                prev_end = word.right_edge.i
                yield word.left_edge.i, word.right_edge.i + 1, np_label


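Note on the `noun_chunks` hunk above: the `seen` set is replaced by a single `prev_end` index, which works because candidate words are visited in document order. A minimal sketch of that idea follows, assuming candidates arrive as `(left_edge, right_edge)` index pairs with an inclusive right edge; the function name `non_nested_spans` is hypothetical and not part of spaCy.

```python
def non_nested_spans(candidates):
    """Yield (start, end) spans, skipping any candidate that starts
    inside the most recently yielded span.

    `candidates` is an iterable of (left_edge, right_edge) token indices
    in document order; right_edge is inclusive, mirroring
    Token.right_edge.i. This is an illustrative stand-in, not spaCy code.
    """
    prev_end = -1  # inclusive index of the last token already covered
    for left, right in candidates:
        # A candidate whose left edge falls at or before prev_end would
        # overlap or nest inside the previous chunk, so drop it.
        if left <= prev_end:
            continue
        prev_end = right
        # Convert to a half-open interval, as the diff does with "+ 1".
        yield left, right + 1
```

Compared with a set of seen indices, this keeps a single integer of state per call, at the cost of relying on the candidates being in left-to-right order.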
--- a/spacy/lang/it/__init__.py
+++ b/spacy/lang/it/__init__.py
@ -4,7 +4,7 @@ from __future__ import unicode_literals
 from .stop_words import STOP_WORDS
 from .tag_map import TAG_MAP
 from .tokenizer_exceptions import TOKENIZER_EXCEPTIONS
-from .punctuation import TOKENIZER_INFIXES
+from .punctuation import TOKENIZER_PREFIXES, TOKENIZER_INFIXES

 from ..tokenizer_exceptions import BASE_EXCEPTIONS
 from ..norm_exceptions import BASE_NORMS
@ -22,6 +22,7 @@ class ItalianDefaults(Language.Defaults):
    tokenizer_exceptions = update_exc(BASE_EXCEPTIONS, TOKENIZER_EXCEPTIONS)
    stop_words = STOP_WORDS
    tag_map = TAG_MAP
+    prefixes = TOKENIZER_PREFIXES
    infixes = TOKENIZER_INFIXES


--- a/spacy/lang/it/punctuation.py
+++ b/spacy/lang/it/punctuation.py
@ -1,15 +1,32 @@
 # coding: utf8
 from __future__ import unicode_literals

-from ..punctuation import TOKENIZER_INFIXES
-from ..char_classes import ALPHA
+from ..punctuation import TOKENIZER_PREFIXES as BASE_TOKENIZER_PREFIXES
+from ..char_classes import LIST_ELLIPSES, LIST_ICONS
+from ..char_classes import ALPHA, HYPHENS, CONCAT_QUOTES
+from ..char_classes import ALPHA_LOWER, ALPHA_UPPER


-ELISION = " ' ’ ".strip().replace(" ", "")
+ELISION = "'’"


-_infixes = TOKENIZER_INFIXES + [
-    r"(?<=[{a}][{el}])(?=[{a}])".format(a=ALPHA, el=ELISION)
-]
+_prefixes = [r"'[0-9][0-9]", r"[0-9]+°"] + BASE_TOKENIZER_PREFIXES

+
+_infixes = (
+    LIST_ELLIPSES
+    + LIST_ICONS
+    + [
+        r"(?<=[0-9])[+\-\*^](?=[0-9-])",
+        r"(?<=[{al}{q}])\.(?=[{au}{q}])".format(
+            al=ALPHA_LOWER, au=ALPHA_UPPER, q=CONCAT_QUOTES
+        ),
+        r"(?<=[{a}]),(?=[{a}])".format(a=ALPHA),
+        r"(?<=[{a}])(?:{h})(?=[{al}])".format(a=ALPHA, h=HYPHENS, al=ALPHA_LOWER),
+        r"(?<=[{a}0-9])[:<>=\/](?=[{a}])".format(a=ALPHA),
+        r"(?<=[{a}][{el}])(?=[{a}0-9\"])".format(a=ALPHA, el=ELISION),
+    ]
+)
+
+TOKENIZER_PREFIXES = _prefixes
 TOKENIZER_INFIXES = _infixes
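Note on the elision infix above: the last pattern is zero-width, so it marks a split point right after the apostrophe without consuming any characters, and the new lookahead also accepts digits and a double quote. A small sketch with the standard `re` module may make this concrete. `ALPHA` here is a simplified Latin-1 stand-in for spaCy's much larger character class, and `split_on_elision` is a hypothetical helper, not the tokenizer's actual code path.

```python
import re

# Simplified stand-in for spacy.lang.char_classes.ALPHA (assumption).
ALPHA = "A-Za-zÀ-ÖØ-öø-ÿ"
ELISION = "'’"

# Same shape as the infix in the diff: letter + apostrophe behind,
# letter, digit or double quote ahead.
ELISION_INFIX = re.compile(
    r"(?<=[{a}][{el}])(?=[{a}0-9\"])".format(a=ALPHA, el=ELISION)
)


def split_on_elision(text):
    """Split `text` at every zero-width elision boundary."""
    pieces = []
    start = 0
    for match in ELISION_INFIX.finditer(text):
        pieces.append(text[start:match.start()])
        start = match.start()
    pieces.append(text[start:])
    return pieces
```

Under these assumptions, `split_on_elision("dell'arte")` gives `["dell'", "arte"]`, while `"po'"` stays whole because nothing follows the apostrophe.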
--- a/spacy/lang/it/tokenizer_exceptions.py
+++ b/spacy/lang/it/tokenizer_exceptions.py
@ -2,6 +2,56 @@
 from __future__ import unicode_literals
 from ...symbols import ORTH, LEMMA

-_exc = {"po'": [{ORTH: "po'", LEMMA: "poco"}]}
+_exc = {
+    "all'art.": [{ORTH: "all'"}, {ORTH: "art."}],
+    "dall'art.": [{ORTH: "dall'"}, {ORTH: "art."}],
+    "dell'art.": [{ORTH: "dell'"}, {ORTH: "art."}],
+    "L'art.": [{ORTH: "L'"}, {ORTH: "art."}],
+    "l'art.": [{ORTH: "l'"}, {ORTH: "art."}],
+    "nell'art.": [{ORTH: "nell'"}, {ORTH: "art."}],
+    "po'": [{ORTH: "po'", LEMMA: "poco"}],
+    "sett..": [{ORTH: "sett."}, {ORTH: "."}],
+}
+
+for orth in [
+    "..",
+    "....",
+    "al.",
+    "all-path",
+    "art.",
+    "Art.",
+    "artt.",
+    "att.",
+    "by-pass",
+    "c.d.",
+    "centro-sinistra",
+    "check-up",
+    "Civ.",
+    "cm.",
+    "Cod.",
+    "col.",
+    "Cost.",
+    "d.C.",
+    'de"',
+    "distr.",
+    "E'",
+    "ecc.",
+    "e-mail",
+    "e/o",
+    "etc.",
+    "Jr.",
+    "n°",
+    "nord-est",
+    "pag.",
+    "Proc.",
+    "prof.",
+    "sett.",
+    "s.p.a.",
+    "ss.",
+    "St.",
+    "tel.",
+    "week-end",
+]:
+    _exc[orth] = [{ORTH: orth}]

 TOKENIZER_EXCEPTIONS = _exc
--- a/spacy/lang/kn/examples.py
+++ b/spacy/lang/kn/examples.py
@ -0,0 +1,22 @@
+# coding: utf8
+from __future__ import unicode_literals
+
+
+"""
+Example sentences to test spaCy and its language models.
+
+>>> from spacy.lang.en.examples import sentences
+>>> docs = nlp.pipe(sentences)
+"""
+
+
+sentences = [
+    "ಆಪಲ್ ಒಂದು ಯು.ಕೆ. ಸ್ಟಾರ್ಟ್ಅಪ್ ಅನ್ನು ೧ ಶತಕೋಟಿ ಡಾಲರ್ಗಳಿಗೆ ಖರೀದಿಸಲು ನೋಡುತ್ತಿದೆ.",
+    "ಸ್ವಾಯತ್ತ ಕಾರುಗಳು ವಿಮಾ ಹೊಣೆಗಾರಿಕೆಯನ್ನು ತಯಾರಕರ ಕಡೆಗೆ ಬದಲಾಯಿಸುತ್ತವೆ.",
+    "ಕಾಲುದಾರಿ ವಿತರಣಾ ರೋಬೋಟ್‌ಗಳನ್ನು ನಿಷೇಧಿಸುವುದನ್ನು ಸ್ಯಾನ್ ಫ್ರಾನ್ಸಿಸ್ಕೊ ಪರಿಗಣಿಸುತ್ತದೆ.",
+    "ಲಂಡನ್ ಯುನೈಟೆಡ್ ಕಿಂಗ್‌ಡಂನ ದೊಡ್ಡ ನಗರ.",
+    "ನೀನು ಎಲ್ಲಿದಿಯಾ?",
+    "ಫ್ರಾನ್ಸಾದ ಅಧ್ಯಕ್ಷರು ಯಾರು?",
+    "ಯುನೈಟೆಡ್ ಸ್ಟೇಟ್ಸ್ನ ರಾಜಧಾನಿ ಯಾವುದು?",
+    "ಬರಾಕ್ ಒಬಾಮ ಯಾವಾಗ ಜನಿಸಿದರು?",
+]
--- a/spacy/lang/ko/examples.py
+++ b/spacy/lang/ko/examples.py
@ -9,8 +9,8 @@ Example sentences to test spaCy and its language models.
 """

 sentences = [
-    "애플이 영국의 신생 기업을 10억 달러에 구매를 고려중이다.",
-    "자동 운전 자동차의 손해 배상 책임에 자동차 메이커에 일정한 부담을 요구하겠다.",
-    "자동 배달 로봇이 보도를 주행하는 것을 샌프란시스코시가 금지를 검토중이라고 합니다.",
+    "애플이 영국의 스타트업을 10억 달러에 인수하는 것을 알아보고 있다.",
+    "자율주행 자동차의 손해 배상 책임이 제조 업체로 옮겨 가다",
+    "샌프란시스코 시가 자동 배달 로봇의 보도 주행 금지를 검토 중이라고 합니다.",
    "런던은 영국의 수도이자 가장 큰 도시입니다.",
 ]
--- a/spacy/lang/lb/__init__.py
+++ b/spacy/lang/lb/__init__.py
@ -2,26 +2,21 @@
 from __future__ import unicode_literals

 from .tokenizer_exceptions import TOKENIZER_EXCEPTIONS
-from .norm_exceptions import NORM_EXCEPTIONS
 from .punctuation import TOKENIZER_INFIXES
 from .lex_attrs import LEX_ATTRS
 from .tag_map import TAG_MAP
 from .stop_words import STOP_WORDS

 from ..tokenizer_exceptions import BASE_EXCEPTIONS
-from ..norm_exceptions import BASE_NORMS
 from ...language import Language
-from ...attrs import LANG, NORM
-from ...util import update_exc, add_lookups
+from ...attrs import LANG
+from ...util import update_exc


 class LuxembourgishDefaults(Language.Defaults):
    lex_attr_getters = dict(Language.Defaults.lex_attr_getters)
    lex_attr_getters.update(LEX_ATTRS)
    lex_attr_getters[LANG] = lambda text: "lb"
-    lex_attr_getters[NORM] = add_lookups(
-        Language.Defaults.lex_attr_getters[NORM], NORM_EXCEPTIONS, BASE_NORMS
-    )
    tokenizer_exceptions = update_exc(BASE_EXCEPTIONS, TOKENIZER_EXCEPTIONS)
    stop_words = STOP_WORDS
    tag_map = TAG_MAP
--- a/spacy/lang/lb/norm_exceptions.py
+++ b/spacy/lang/lb/norm_exceptions.py
@ -1,16 +0,0 @@
-# coding: utf8
-from __future__ import unicode_literals
-
-# TODO
-# norm execptions: find a possibility to deal with the zillions of spelling
-# variants (vläicht = vlaicht, vleicht, viläicht, viläischt, etc. etc.)
-# here one could include the most common spelling mistakes
-
-_exc = {"dass": "datt", "viläicht": "vläicht"}
-
-
-NORM_EXCEPTIONS = {}
-
-for string, norm in _exc.items():
-    NORM_EXCEPTIONS[string] = norm
-    NORM_EXCEPTIONS[string.title()] = norm
--- a/spacy/lang/lex_attrs.py
+++ b/spacy/lang/lex_attrs.py
@ -186,10 +186,6 @@ def suffix(string):
    return string[-3:]


-def cluster(string):
-    return 0
-
-
 def is_alpha(string):
    return string.isalpha()

@ -218,20 +214,11 @@ def is_stop(string, stops=set()):
    return string.lower() in stops


-def is_oov(string):
-    return True
-
-
-def get_prob(string):
-    return -20.0
-
-
 LEX_ATTRS = {
    attrs.LOWER: lower,
    attrs.NORM: lower,
    attrs.PREFIX: prefix,
    attrs.SUFFIX: suffix,
-    attrs.CLUSTER: cluster,
    attrs.IS_ALPHA: is_alpha,
    attrs.IS_DIGIT: is_digit,
    attrs.IS_LOWER: is_lower,
@ -239,8 +226,6 @@ LEX_ATTRS = {
    attrs.IS_TITLE: is_title,
    attrs.IS_UPPER: is_upper,
    attrs.IS_STOP: is_stop,
-    attrs.IS_OOV: is_oov,
-    attrs.PROB: get_prob,
    attrs.LIKE_EMAIL: like_email,
    attrs.LIKE_NUM: like_num,
    attrs.IS_PUNCT: is_punct,
--- a/spacy/lang/lij/__init__.py
+++ b/spacy/lang/lij/__init__.py
@ -0,0 +1,31 @@
+# coding: utf8
+from __future__ import unicode_literals
+
+from .stop_words import STOP_WORDS
+from .tokenizer_exceptions import TOKENIZER_EXCEPTIONS
+from .punctuation import TOKENIZER_INFIXES
+
+from ..tokenizer_exceptions import BASE_EXCEPTIONS
+from ..norm_exceptions import BASE_NORMS
+from ...language import Language
+from ...attrs import LANG, NORM
+from ...util import update_exc, add_lookups
+
+
+class LigurianDefaults(Language.Defaults):
+    lex_attr_getters = dict(Language.Defaults.lex_attr_getters)
+    lex_attr_getters[LANG] = lambda text: "lij"
+    lex_attr_getters[NORM] = add_lookups(
+        Language.Defaults.lex_attr_getters[NORM], BASE_NORMS
+    )
+    tokenizer_exceptions = update_exc(BASE_EXCEPTIONS, TOKENIZER_EXCEPTIONS)
+    stop_words = STOP_WORDS
+    infixes = TOKENIZER_INFIXES
+
+
+class Ligurian(Language):
+    lang = "lij"
+    Defaults = LigurianDefaults
+
+
+__all__ = ["Ligurian"]
--- a/spacy/lang/lij/examples.py
+++ b/spacy/lang/lij/examples.py
@ -0,0 +1,18 @@
+# coding: utf8
+from __future__ import unicode_literals
+
+
+"""
+Example sentences to test spaCy and its language models.
+
+>>> from spacy.lang.lij.examples import sentences
+>>> docs = nlp.pipe(sentences)
+"""
+
+
+sentences = [
+    "Sciusciâ e sciorbî no se peu.",
+    "Graçie di çetroin, che me son arrivæ.",
+    "Vegnime apreuvo, che ve fasso pescâ di òmmi.",
+    "Bella pe sempre l'ægua inta conchetta quande unn'agoggia d'ægua a se â trapaña.",
+]
--- a/spacy/lang/lij/punctuation.py
+++ b/spacy/lang/lij/punctuation.py
@ -0,0 +1,15 @@
+# coding: utf8
+from __future__ import unicode_literals
+
+from ..punctuation import TOKENIZER_INFIXES
+from ..char_classes import ALPHA
+
+
+ELISION = " ' ’ ".strip().replace(" ", "").replace("\n", "")
+
+
+_infixes = TOKENIZER_INFIXES + [
+    r"(?<=[{a}][{el}])(?=[{a}])".format(a=ALPHA, el=ELISION)
+]
+
+TOKENIZER_INFIXES = _infixes
--- a/spacy/lang/lij/stop_words.py
+++ b/spacy/lang/lij/stop_words.py
@ -0,0 +1,43 @@
+# coding: utf8
+from __future__ import unicode_literals
+
+
+STOP_WORDS = set(
+    """
+a à â a-a a-e a-i a-o aiva aloa an ancheu ancon apreuvo ascì atra atre atri atro avanti avei
+
+bella belle belli bello ben
+
+ch' che chì chi ciù co-a co-e co-i co-o comm' comme con cösa coscì cöse
+
+d' da da-a da-e da-i da-o dapeu de delongo derê di do doe doî donde dòppo
+
+é e ê ea ean emmo en ëse
+
+fin fiña
+
+gh' ghe guæei
+
+i î in insemme int' inta inte inti into
+
+l' lê lì lô
+
+m' ma manco me megio meno mezo mi
+
+na n' ne ni ninte nisciun nisciuña no
+
+o ò ô oua
+
+parte pe pe-a pe-i pe-e pe-o perché pittin pö primma pròpio
+
+quæ quand' quande quarche quella quelle quelli quello
+
+s' sce scê sci sciâ sciô sciù se segge seu sò solo son sott' sta stæta stæte stæti stæto ste sti sto
+
+tanta tante tanti tanto te ti torna tra tròppo tutta tutte tutti tutto
+
+un uña unn' unna
+
+za zu
+""".split()
+)
--- a/spacy/lang/lij/tokenizer_exceptions.py
+++ b/spacy/lang/lij/tokenizer_exceptions.py
@ -0,0 +1,52 @@
+# coding: utf8
+from __future__ import unicode_literals
+from ...symbols import ORTH, LEMMA
+
+_exc = {}
+
+for raw, lemma in [
+    ("a-a", "a-o"),
+    ("a-e", "a-o"),
+    ("a-o", "a-o"),
+    ("a-i", "a-o"),
+    ("co-a", "co-o"),
+    ("co-e", "co-o"),
+    ("co-i", "co-o"),
+    ("co-o", "co-o"),
+    ("da-a", "da-o"),
+    ("da-e", "da-o"),
+    ("da-i", "da-o"),
+    ("da-o", "da-o"),
+    ("pe-a", "pe-o"),
+    ("pe-e", "pe-o"),
+    ("pe-i", "pe-o"),
+    ("pe-o", "pe-o"),
+]:
+    for orth in [raw, raw.capitalize()]:
+        _exc[orth] = [{ORTH: orth, LEMMA: lemma}]
+
+# Prefix + prepositions with à (e.g. "sott'a-o")
+
+for prep, prep_lemma in [
+    ("a-a", "a-o"),
+    ("a-e", "a-o"),
+    ("a-o", "a-o"),
+    ("a-i", "a-o"),
+]:
+    for prefix, prefix_lemma in [
+        ("sott'", "sotta"),
+        ("sott’", "sotta"),
+        ("contr'", "contra"),
+        ("contr’", "contra"),
+        ("ch'", "che"),
+        ("ch’", "che"),
+        ("s'", "se"),
+        ("s’", "se"),
+    ]:
+        for prefix_orth in [prefix, prefix.capitalize()]:
+            _exc[prefix_orth + prep] = [
+                {ORTH: prefix_orth, LEMMA: prefix_lemma},
+                {ORTH: prep, LEMMA: prep_lemma},
+            ]
+
+TOKENIZER_EXCEPTIONS = _exc
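Note on the Ligurian exceptions above: both loops expand a short table into lower-case and capitalized entries, and the second loop additionally splits each key into a prefix token and a preposition token. The sketch below reproduces that expansion on plain dictionaries. The string keys `"orth"` and `"lemma"` stand in for the `ORTH` and `LEMMA` symbols, and both function names are hypothetical.

```python
ORTH, LEMMA = "orth", "lemma"  # stand-ins for spacy.symbols (assumption)


def single_token_exceptions(pairs):
    """Map each form and its capitalized variant to a one-token analysis."""
    exc = {}
    for raw, lemma in pairs:
        for orth in (raw, raw.capitalize()):
            exc[orth] = [{ORTH: orth, LEMMA: lemma}]
    return exc


def prefixed_exceptions(preps, prefixes):
    """Map prefix + preposition strings to a two-token analysis."""
    exc = {}
    for prep, prep_lemma in preps:
        for prefix, prefix_lemma in prefixes:
            for prefix_orth in (prefix, prefix.capitalize()):
                # The concatenated surface form is the key; the value
                # lists the tokens it should be split into.
                exc[prefix_orth + prep] = [
                    {ORTH: prefix_orth, LEMMA: prefix_lemma},
                    {ORTH: prep, LEMMA: prep_lemma},
                ]
    return exc
```

For example, `prefixed_exceptions([("a-o", "a-o")], [("sott'", "sotta")])` produces the keys `sott'a-o` and `Sott'a-o`, each split into two tokens whose `ORTH` values concatenate back to the key.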
--- a/spacy/lang/lt/__init__.py
+++ b/spacy/lang/lt/__init__.py
@ -1,6 +1,7 @@
 # coding: utf8
 from __future__ import unicode_literals

+from .punctuation import TOKENIZER_INFIXES, TOKENIZER_SUFFIXES
 from .tokenizer_exceptions import TOKENIZER_EXCEPTIONS
 from .stop_words import STOP_WORDS
 from .lex_attrs import LEX_ATTRS
@ -26,7 +27,13 @@ class LithuanianDefaults(Language.Defaults):
    )
    lex_attr_getters.update(LEX_ATTRS)

-    tokenizer_exceptions = update_exc(BASE_EXCEPTIONS, TOKENIZER_EXCEPTIONS)
+    infixes = TOKENIZER_INFIXES
+    suffixes = TOKENIZER_SUFFIXES
+    mod_base_exceptions = {
+        exc: val for exc, val in BASE_EXCEPTIONS.items() if not exc.endswith(".")
+    }
+    del mod_base_exceptions["8)"]
+    tokenizer_exceptions = update_exc(mod_base_exceptions, TOKENIZER_EXCEPTIONS)
    stop_words = STOP_WORDS
    tag_map = TAG_MAP
    morph_rules = MORPH_RULES
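Note on the Lithuanian defaults above: the base exceptions are filtered before merging, so that keys ending in a period, plus the `8)` emoticon, no longer shadow the language-specific rules. A minimal sketch of that filtering step on a toy dictionary follows. The function name `drop_period_exceptions` and the sample data in the usage note are hypothetical, and the sketch uses `dict.pop` with a default where the diff uses `del`, so a missing key does not raise `KeyError` here.

```python
def drop_period_exceptions(base_exceptions, also_remove=("8)",)):
    """Return a copy of `base_exceptions` without keys that end in "."
    and without the explicitly listed keys."""
    filtered = {
        exc: val for exc, val in base_exceptions.items() if not exc.endswith(".")
    }
    for key in also_remove:
        # pop() with a default tolerates absent keys; `del` would raise.
        filtered.pop(key, None)
    return filtered
```

With a toy input such as `{"a.": 1, "8)": 2, ":)": 3}`, only the `":)"` entry survives, and the input dictionary is left unmodified.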