Mirror of https://github.com/explosion/spaCy.git
Merge branch 'master' into feature/nel-wiki

Commit d83a1e3052

.github/contributors/BreakBB.md (vendored, new file)
@@ -0,0 +1,106 @@

# spaCy contributor agreement

This spaCy Contributor Agreement (**"SCA"**) is based on the
[Oracle Contributor Agreement](http://www.oracle.com/technetwork/oca-405177.pdf).
The SCA applies to any contribution that you make to any product or project
managed by us (the **"project"**), and sets out the intellectual property rights
you grant to us in the contributed materials. The term **"us"** shall mean
[ExplosionAI UG (haftungsbeschränkt)](https://explosion.ai/legal). The term
**"you"** shall mean the person or entity identified below.

If you agree to be bound by these terms, fill in the information requested
below and include the filled-in version with your first pull request, under the
folder [`.github/contributors/`](/.github/contributors/). The name of the file
should be your GitHub username, with the extension `.md`. For example, the user
example_user would create the file `.github/contributors/example_user.md`.

Read this agreement carefully before signing. These terms and conditions
constitute a binding legal agreement.

## Contributor Agreement

1. The term "contribution" or "contributed materials" means any source code,
object code, patch, tool, sample, graphic, specification, manual,
documentation, or any other material posted or submitted by you to the project.

2. With respect to any worldwide copyrights, or copyright applications and
registrations, in your contribution:

* you hereby assign to us joint ownership, and to the extent that such
assignment is or becomes invalid, ineffective or unenforceable, you hereby
grant to us a perpetual, irrevocable, non-exclusive, worldwide, no-charge,
royalty-free, unrestricted license to exercise all rights under those
copyrights. This includes, at our option, the right to sublicense these same
rights to third parties through multiple levels of sublicensees or other
licensing arrangements;

* you agree that each of us can do all things in relation to your
contribution as if each of us were the sole owners, and if one of us makes
a derivative work of your contribution, the one who makes the derivative
work (or has it made) will be the sole owner of that derivative work;

* you agree that you will not assert any moral rights in your contribution
against us, our licensees or transferees;

* you agree that we may register a copyright in your contribution and
exercise all ownership rights associated with it; and

* you agree that neither of us has any duty to consult with, obtain the
consent of, pay or render an accounting to the other for any use or
distribution of your contribution.

3. With respect to any patents you own, or that you can license without payment
to any third party, you hereby grant to us a perpetual, irrevocable,
non-exclusive, worldwide, no-charge, royalty-free license to:

* make, have made, use, sell, offer to sell, import, and otherwise transfer
your contribution in whole or in part, alone or in combination with or
included in any product, work or materials arising out of the project to
which your contribution was submitted, and

* at our option, to sublicense these same rights to third parties through
multiple levels of sublicensees or other licensing arrangements.

4. Except as set out above, you keep all right, title, and interest in your
contribution. The rights that you grant to us under these terms are effective
on the date you first submitted a contribution to us, even if your submission
took place before the date you sign these terms.

5. You covenant, represent, warrant and agree that:

* Each contribution that you submit is and shall be an original work of
authorship and you can legally grant the rights set out in this SCA;

* to the best of your knowledge, each contribution will not violate any
third party's copyrights, trademarks, patents, or other intellectual
property rights; and

* each contribution shall be in compliance with U.S. export control laws and
other applicable export and import laws. You agree to notify us if you
become aware of any circumstance which would make any of the foregoing
representations inaccurate in any respect. We may publicly disclose your
participation in the project, including the fact that you have signed the SCA.

6. This SCA is governed by the laws of the State of California and applicable
U.S. Federal law. Any choice of law rules will not apply.

7. Please place an “x” on one of the applicable statements below. Please do NOT
mark both statements:

* [x] I am signing on behalf of myself as an individual and no other person
or entity, including my employer, has or will have rights with respect to my
contributions.

* [ ] I am signing on behalf of my employer or a legal entity and I have the
actual authority to contractually bind that entity.

## Contributor Details

| Field                          | Entry                |
| ------------------------------ | -------------------- |
| Name                           | Björn Böing          |
| Company name (if applicable)   |                      |
| Title or role (if applicable)  |                      |
| Date                           | 15.04.2019           |
| GitHub username                | BreakBB              |
| Website (optional)             |                      |

.github/contributors/Dobita21.md (vendored, new file)
@@ -0,0 +1,106 @@

# spaCy contributor agreement

This spaCy Contributor Agreement (**"SCA"**) is based on the
[Oracle Contributor Agreement](http://www.oracle.com/technetwork/oca-405177.pdf).
The SCA applies to any contribution that you make to any product or project
managed by us (the **"project"**), and sets out the intellectual property rights
you grant to us in the contributed materials. The term **"us"** shall mean
[ExplosionAI GmbH](https://explosion.ai/legal). The term
**"you"** shall mean the person or entity identified below.

If you agree to be bound by these terms, fill in the information requested
below and include the filled-in version with your first pull request, under the
folder [`.github/contributors/`](/.github/contributors/). The name of the file
should be your GitHub username, with the extension `.md`. For example, the user
example_user would create the file `.github/contributors/example_user.md`.

Read this agreement carefully before signing. These terms and conditions
constitute a binding legal agreement.

## Contributor Agreement

1. The term "contribution" or "contributed materials" means any source code,
object code, patch, tool, sample, graphic, specification, manual,
documentation, or any other material posted or submitted by you to the project.

2. With respect to any worldwide copyrights, or copyright applications and
registrations, in your contribution:

* you hereby assign to us joint ownership, and to the extent that such
assignment is or becomes invalid, ineffective or unenforceable, you hereby
grant to us a perpetual, irrevocable, non-exclusive, worldwide, no-charge,
royalty-free, unrestricted license to exercise all rights under those
copyrights. This includes, at our option, the right to sublicense these same
rights to third parties through multiple levels of sublicensees or other
licensing arrangements;

* you agree that each of us can do all things in relation to your
contribution as if each of us were the sole owners, and if one of us makes
a derivative work of your contribution, the one who makes the derivative
work (or has it made) will be the sole owner of that derivative work;

* you agree that you will not assert any moral rights in your contribution
against us, our licensees or transferees;

* you agree that we may register a copyright in your contribution and
exercise all ownership rights associated with it; and

* you agree that neither of us has any duty to consult with, obtain the
consent of, pay or render an accounting to the other for any use or
distribution of your contribution.

3. With respect to any patents you own, or that you can license without payment
to any third party, you hereby grant to us a perpetual, irrevocable,
non-exclusive, worldwide, no-charge, royalty-free license to:

* make, have made, use, sell, offer to sell, import, and otherwise transfer
your contribution in whole or in part, alone or in combination with or
included in any product, work or materials arising out of the project to
which your contribution was submitted, and

* at our option, to sublicense these same rights to third parties through
multiple levels of sublicensees or other licensing arrangements.

4. Except as set out above, you keep all right, title, and interest in your
contribution. The rights that you grant to us under these terms are effective
on the date you first submitted a contribution to us, even if your submission
took place before the date you sign these terms.

5. You covenant, represent, warrant and agree that:

* Each contribution that you submit is and shall be an original work of
authorship and you can legally grant the rights set out in this SCA;

* to the best of your knowledge, each contribution will not violate any
third party's copyrights, trademarks, patents, or other intellectual
property rights; and

* each contribution shall be in compliance with U.S. export control laws and
other applicable export and import laws. You agree to notify us if you
become aware of any circumstance which would make any of the foregoing
representations inaccurate in any respect. We may publicly disclose your
participation in the project, including the fact that you have signed the SCA.

6. This SCA is governed by the laws of the State of California and applicable
U.S. Federal law. Any choice of law rules will not apply.

7. Please place an “x” on one of the applicable statements below. Please do NOT
mark both statements:

* [x] I am signing on behalf of myself as an individual and no other person
or entity, including my employer, has or will have rights with respect to my
contributions.

* [ ] I am signing on behalf of my employer or a legal entity and I have the
actual authority to contractually bind that entity.

## Contributor Details

| Field                          | Entry                |
| ------------------------------ | -------------------- |
| Name                           | Nattapol             |
| Company name (if applicable)   |                      |
| Title or role (if applicable)  |                      |
| Date                           | 18.04.2019           |
| GitHub username                | Dobita21             |
| Website (optional)             |                      |

.github/contributors/F0rge1cE.md (vendored, new file)
@@ -0,0 +1,106 @@

# spaCy contributor agreement

This spaCy Contributor Agreement (**"SCA"**) is based on the
[Oracle Contributor Agreement](http://www.oracle.com/technetwork/oca-405177.pdf).
The SCA applies to any contribution that you make to any product or project
managed by us (the **"project"**), and sets out the intellectual property rights
you grant to us in the contributed materials. The term **"us"** shall mean
[ExplosionAI GmbH](https://explosion.ai/legal). The term
**"you"** shall mean the person or entity identified below.

If you agree to be bound by these terms, fill in the information requested
below and include the filled-in version with your first pull request, under the
folder [`.github/contributors/`](/.github/contributors/). The name of the file
should be your GitHub username, with the extension `.md`. For example, the user
example_user would create the file `.github/contributors/example_user.md`.

Read this agreement carefully before signing. These terms and conditions
constitute a binding legal agreement.

## Contributor Agreement

1. The term "contribution" or "contributed materials" means any source code,
object code, patch, tool, sample, graphic, specification, manual,
documentation, or any other material posted or submitted by you to the project.

2. With respect to any worldwide copyrights, or copyright applications and
registrations, in your contribution:

* you hereby assign to us joint ownership, and to the extent that such
assignment is or becomes invalid, ineffective or unenforceable, you hereby
grant to us a perpetual, irrevocable, non-exclusive, worldwide, no-charge,
royalty-free, unrestricted license to exercise all rights under those
copyrights. This includes, at our option, the right to sublicense these same
rights to third parties through multiple levels of sublicensees or other
licensing arrangements;

* you agree that each of us can do all things in relation to your
contribution as if each of us were the sole owners, and if one of us makes
a derivative work of your contribution, the one who makes the derivative
work (or has it made) will be the sole owner of that derivative work;

* you agree that you will not assert any moral rights in your contribution
against us, our licensees or transferees;

* you agree that we may register a copyright in your contribution and
exercise all ownership rights associated with it; and

* you agree that neither of us has any duty to consult with, obtain the
consent of, pay or render an accounting to the other for any use or
distribution of your contribution.

3. With respect to any patents you own, or that you can license without payment
to any third party, you hereby grant to us a perpetual, irrevocable,
non-exclusive, worldwide, no-charge, royalty-free license to:

* make, have made, use, sell, offer to sell, import, and otherwise transfer
your contribution in whole or in part, alone or in combination with or
included in any product, work or materials arising out of the project to
which your contribution was submitted, and

* at our option, to sublicense these same rights to third parties through
multiple levels of sublicensees or other licensing arrangements.

4. Except as set out above, you keep all right, title, and interest in your
contribution. The rights that you grant to us under these terms are effective
on the date you first submitted a contribution to us, even if your submission
took place before the date you sign these terms.

5. You covenant, represent, warrant and agree that:

* Each contribution that you submit is and shall be an original work of
authorship and you can legally grant the rights set out in this SCA;

* to the best of your knowledge, each contribution will not violate any
third party's copyrights, trademarks, patents, or other intellectual
property rights; and

* each contribution shall be in compliance with U.S. export control laws and
other applicable export and import laws. You agree to notify us if you
become aware of any circumstance which would make any of the foregoing
representations inaccurate in any respect. We may publicly disclose your
participation in the project, including the fact that you have signed the SCA.

6. This SCA is governed by the laws of the State of California and applicable
U.S. Federal law. Any choice of law rules will not apply.

7. Please place an “x” on one of the applicable statements below. Please do NOT
mark both statements:

* [x] I am signing on behalf of myself as an individual and no other person
or entity, including my employer, has or will have rights with respect to my
contributions.

* [x] I am signing on behalf of my employer or a legal entity and I have the
actual authority to contractually bind that entity.

## Contributor Details

| Field                          | Entry                |
| ------------------------------ | -------------------- |
| Name                           | Icarus Xu            |
| Company name (if applicable)   |                      |
| Title or role (if applicable)  |                      |
| Date                           | 05/06/2019           |
| GitHub username                | F0rge1cE             |
| Website (optional)             |                      |

.github/contributors/NirantK.md (vendored, new file)
@@ -0,0 +1,106 @@

# spaCy contributor agreement

This spaCy Contributor Agreement (**"SCA"**) is based on the
[Oracle Contributor Agreement](http://www.oracle.com/technetwork/oca-405177.pdf).
The SCA applies to any contribution that you make to any product or project
managed by us (the **"project"**), and sets out the intellectual property rights
you grant to us in the contributed materials. The term **"us"** shall mean
[ExplosionAI GmbH](https://explosion.ai/legal). The term
**"you"** shall mean the person or entity identified below.

If you agree to be bound by these terms, fill in the information requested
below and include the filled-in version with your first pull request, under the
folder [`.github/contributors/`](/.github/contributors/). The name of the file
should be your GitHub username, with the extension `.md`. For example, the user
example_user would create the file `.github/contributors/example_user.md`.

Read this agreement carefully before signing. These terms and conditions
constitute a binding legal agreement.

## Contributor Agreement

1. The term "contribution" or "contributed materials" means any source code,
object code, patch, tool, sample, graphic, specification, manual,
documentation, or any other material posted or submitted by you to the project.

2. With respect to any worldwide copyrights, or copyright applications and
registrations, in your contribution:

* you hereby assign to us joint ownership, and to the extent that such
assignment is or becomes invalid, ineffective or unenforceable, you hereby
grant to us a perpetual, irrevocable, non-exclusive, worldwide, no-charge,
royalty-free, unrestricted license to exercise all rights under those
copyrights. This includes, at our option, the right to sublicense these same
rights to third parties through multiple levels of sublicensees or other
licensing arrangements;

* you agree that each of us can do all things in relation to your
contribution as if each of us were the sole owners, and if one of us makes
a derivative work of your contribution, the one who makes the derivative
work (or has it made) will be the sole owner of that derivative work;

* you agree that you will not assert any moral rights in your contribution
against us, our licensees or transferees;

* you agree that we may register a copyright in your contribution and
exercise all ownership rights associated with it; and

* you agree that neither of us has any duty to consult with, obtain the
consent of, pay or render an accounting to the other for any use or
distribution of your contribution.

3. With respect to any patents you own, or that you can license without payment
to any third party, you hereby grant to us a perpetual, irrevocable,
non-exclusive, worldwide, no-charge, royalty-free license to:

* make, have made, use, sell, offer to sell, import, and otherwise transfer
your contribution in whole or in part, alone or in combination with or
included in any product, work or materials arising out of the project to
which your contribution was submitted, and

* at our option, to sublicense these same rights to third parties through
multiple levels of sublicensees or other licensing arrangements.

4. Except as set out above, you keep all right, title, and interest in your
contribution. The rights that you grant to us under these terms are effective
on the date you first submitted a contribution to us, even if your submission
took place before the date you sign these terms.

5. You covenant, represent, warrant and agree that:

* Each contribution that you submit is and shall be an original work of
authorship and you can legally grant the rights set out in this SCA;

* to the best of your knowledge, each contribution will not violate any
third party's copyrights, trademarks, patents, or other intellectual
property rights; and

* each contribution shall be in compliance with U.S. export control laws and
other applicable export and import laws. You agree to notify us if you
become aware of any circumstance which would make any of the foregoing
representations inaccurate in any respect. We may publicly disclose your
participation in the project, including the fact that you have signed the SCA.

6. This SCA is governed by the laws of the State of California and applicable
U.S. Federal law. Any choice of law rules will not apply.

7. Please place an “x” on one of the applicable statements below. Please do NOT
mark both statements:

* [x] I am signing on behalf of myself as an individual and no other person
or entity, including my employer, has or will have rights with respect to my
contributions.

* [ ] I am signing on behalf of my employer or a legal entity and I have the
actual authority to contractually bind that entity.

## Contributor Details

| Field                          | Entry                |
| ------------------------------ | -------------------- |
| Name                           | Nirant Kasliwal      |
| Company name (if applicable)   |                      |
| Title or role (if applicable)  |                      |
| Date                           |                      |
| GitHub username                | NirantK              |
| Website (optional)             | https://nirantk.com  |

.github/contributors/aaronkub.md (vendored, new file)
@@ -0,0 +1,106 @@

# spaCy contributor agreement

This spaCy Contributor Agreement (**"SCA"**) is based on the
[Oracle Contributor Agreement](http://www.oracle.com/technetwork/oca-405177.pdf).
The SCA applies to any contribution that you make to any product or project
managed by us (the **"project"**), and sets out the intellectual property rights
you grant to us in the contributed materials. The term **"us"** shall mean
[ExplosionAI GmbH](https://explosion.ai/legal). The term
**"you"** shall mean the person or entity identified below.

If you agree to be bound by these terms, fill in the information requested
below and include the filled-in version with your first pull request, under the
folder [`.github/contributors/`](/.github/contributors/). The name of the file
should be your GitHub username, with the extension `.md`. For example, the user
example_user would create the file `.github/contributors/example_user.md`.

Read this agreement carefully before signing. These terms and conditions
constitute a binding legal agreement.

## Contributor Agreement

1. The term "contribution" or "contributed materials" means any source code,
object code, patch, tool, sample, graphic, specification, manual,
documentation, or any other material posted or submitted by you to the project.

2. With respect to any worldwide copyrights, or copyright applications and
registrations, in your contribution:

* you hereby assign to us joint ownership, and to the extent that such
assignment is or becomes invalid, ineffective or unenforceable, you hereby
grant to us a perpetual, irrevocable, non-exclusive, worldwide, no-charge,
royalty-free, unrestricted license to exercise all rights under those
copyrights. This includes, at our option, the right to sublicense these same
rights to third parties through multiple levels of sublicensees or other
licensing arrangements;

* you agree that each of us can do all things in relation to your
contribution as if each of us were the sole owners, and if one of us makes
a derivative work of your contribution, the one who makes the derivative
work (or has it made) will be the sole owner of that derivative work;

* you agree that you will not assert any moral rights in your contribution
against us, our licensees or transferees;

* you agree that we may register a copyright in your contribution and
exercise all ownership rights associated with it; and

* you agree that neither of us has any duty to consult with, obtain the
consent of, pay or render an accounting to the other for any use or
distribution of your contribution.

3. With respect to any patents you own, or that you can license without payment
to any third party, you hereby grant to us a perpetual, irrevocable,
non-exclusive, worldwide, no-charge, royalty-free license to:

* make, have made, use, sell, offer to sell, import, and otherwise transfer
your contribution in whole or in part, alone or in combination with or
included in any product, work or materials arising out of the project to
which your contribution was submitted, and

* at our option, to sublicense these same rights to third parties through
multiple levels of sublicensees or other licensing arrangements.

4. Except as set out above, you keep all right, title, and interest in your
contribution. The rights that you grant to us under these terms are effective
on the date you first submitted a contribution to us, even if your submission
took place before the date you sign these terms.

5. You covenant, represent, warrant and agree that:

* Each contribution that you submit is and shall be an original work of
authorship and you can legally grant the rights set out in this SCA;

* to the best of your knowledge, each contribution will not violate any
third party's copyrights, trademarks, patents, or other intellectual
property rights; and

* each contribution shall be in compliance with U.S. export control laws and
other applicable export and import laws. You agree to notify us if you
become aware of any circumstance which would make any of the foregoing
representations inaccurate in any respect. We may publicly disclose your
participation in the project, including the fact that you have signed the SCA.

6. This SCA is governed by the laws of the State of California and applicable
U.S. Federal law. Any choice of law rules will not apply.

7. Please place an “x” on one of the applicable statements below. Please do NOT
mark both statements:

* [x] I am signing on behalf of myself as an individual and no other person
or entity, including my employer, has or will have rights with respect to my
contributions.

* [ ] I am signing on behalf of my employer or a legal entity and I have the
actual authority to contractually bind that entity.

## Contributor Details

| Field                          | Entry                |
| ------------------------------ | -------------------- |
| Name                           | Aaron Kub            |
| Company name (if applicable)   |                      |
| Title or role (if applicable)  |                      |
| Date                           | 2019-05-09           |
| GitHub username                | aaronkub             |
| Website (optional)             |                      |

.github/contributors/amitness.md (vendored, new file)
@@ -0,0 +1,106 @@

# spaCy contributor agreement

This spaCy Contributor Agreement (**"SCA"**) is based on the
[Oracle Contributor Agreement](http://www.oracle.com/technetwork/oca-405177.pdf).
The SCA applies to any contribution that you make to any product or project
managed by us (the **"project"**), and sets out the intellectual property rights
you grant to us in the contributed materials. The term **"us"** shall mean
[ExplosionAI GmbH](https://explosion.ai/legal). The term
**"you"** shall mean the person or entity identified below.

If you agree to be bound by these terms, fill in the information requested
below and include the filled-in version with your first pull request, under the
folder [`.github/contributors/`](/.github/contributors/). The name of the file
should be your GitHub username, with the extension `.md`. For example, the user
example_user would create the file `.github/contributors/example_user.md`.

Read this agreement carefully before signing. These terms and conditions
constitute a binding legal agreement.

## Contributor Agreement

1. The term "contribution" or "contributed materials" means any source code,
object code, patch, tool, sample, graphic, specification, manual,
documentation, or any other material posted or submitted by you to the project.

2. With respect to any worldwide copyrights, or copyright applications and
registrations, in your contribution:

* you hereby assign to us joint ownership, and to the extent that such
assignment is or becomes invalid, ineffective or unenforceable, you hereby
grant to us a perpetual, irrevocable, non-exclusive, worldwide, no-charge,
royalty-free, unrestricted license to exercise all rights under those
copyrights. This includes, at our option, the right to sublicense these same
rights to third parties through multiple levels of sublicensees or other
licensing arrangements;

* you agree that each of us can do all things in relation to your
contribution as if each of us were the sole owners, and if one of us makes
a derivative work of your contribution, the one who makes the derivative
work (or has it made) will be the sole owner of that derivative work;

* you agree that you will not assert any moral rights in your contribution
against us, our licensees or transferees;

* you agree that we may register a copyright in your contribution and
exercise all ownership rights associated with it; and

* you agree that neither of us has any duty to consult with, obtain the
consent of, pay or render an accounting to the other for any use or
distribution of your contribution.

3. With respect to any patents you own, or that you can license without payment
to any third party, you hereby grant to us a perpetual, irrevocable,
non-exclusive, worldwide, no-charge, royalty-free license to:

* make, have made, use, sell, offer to sell, import, and otherwise transfer
your contribution in whole or in part, alone or in combination with or
included in any product, work or materials arising out of the project to
which your contribution was submitted, and

* at our option, to sublicense these same rights to third parties through
multiple levels of sublicensees or other licensing arrangements.

4. Except as set out above, you keep all right, title, and interest in your
contribution. The rights that you grant to us under these terms are effective
on the date you first submitted a contribution to us, even if your submission
took place before the date you sign these terms.

5. You covenant, represent, warrant and agree that:

* Each contribution that you submit is and shall be an original work of
authorship and you can legally grant the rights set out in this SCA;

* to the best of your knowledge, each contribution will not violate any
third party's copyrights, trademarks, patents, or other intellectual
property rights; and

* each contribution shall be in compliance with U.S. export control laws and
other applicable export and import laws. You agree to notify us if you
become aware of any circumstance which would make any of the foregoing
representations inaccurate in any respect. We may publicly disclose your
participation in the project, including the fact that you have signed the SCA.

6. This SCA is governed by the laws of the State of California and applicable
U.S. Federal law. Any choice of law rules will not apply.

7. Please place an “x” on one of the applicable statements below. Please do NOT
mark both statements:

* [X] I am signing on behalf of myself as an individual and no other person
or entity, including my employer, has or will have rights with respect to my
contributions.

* [ ] I am signing on behalf of my employer or a legal entity and I have the
actual authority to contractually bind that entity.

## Contributor Details

| Field                          | Entry                |
| ------------------------------ | -------------------- |
| Name                           | Amit Chaudhary       |
| Company name (if applicable)   |                      |
| Title or role (if applicable)  |                      |
| Date                           | April 29, 2019       |
| GitHub username                | amitness             |
| Website (optional)             | https://amitness.com |

.github/contributors/bjascob.md (vendored, new file)
@@ -0,0 +1,106 @@

# spaCy contributor agreement

This spaCy Contributor Agreement (**"SCA"**) is based on the
[Oracle Contributor Agreement](http://www.oracle.com/technetwork/oca-405177.pdf).
The SCA applies to any contribution that you make to any product or project
managed by us (the **"project"**), and sets out the intellectual property rights
you grant to us in the contributed materials. The term **"us"** shall mean
[ExplosionAI GmbH](https://explosion.ai/legal). The term
**"you"** shall mean the person or entity identified below.

If you agree to be bound by these terms, fill in the information requested
below and include the filled-in version with your first pull request, under the
folder [`.github/contributors/`](/.github/contributors/). The name of the file
should be your GitHub username, with the extension `.md`. For example, the user
example_user would create the file `.github/contributors/example_user.md`.

Read this agreement carefully before signing. These terms and conditions
constitute a binding legal agreement.

## Contributor Agreement

1. The term "contribution" or "contributed materials" means any source code,
object code, patch, tool, sample, graphic, specification, manual,
documentation, or any other material posted or submitted by you to the project.

2. With respect to any worldwide copyrights, or copyright applications and
registrations, in your contribution:

* you hereby assign to us joint ownership, and to the extent that such
assignment is or becomes invalid, ineffective or unenforceable, you hereby
grant to us a perpetual, irrevocable, non-exclusive, worldwide, no-charge,
royalty-free, unrestricted license to exercise all rights under those
copyrights. This includes, at our option, the right to sublicense these same
rights to third parties through multiple levels of sublicensees or other
licensing arrangements;

* you agree that each of us can do all things in relation to your
contribution as if each of us were the sole owners, and if one of us makes
a derivative work of your contribution, the one who makes the derivative
work (or has it made) will be the sole owner of that derivative work;

* you agree that you will not assert any moral rights in your contribution
against us, our licensees or transferees;

* you agree that we may register a copyright in your contribution and
exercise all ownership rights associated with it; and

* you agree that neither of us has any duty to consult with, obtain the
consent of, pay or render an accounting to the other for any use or
distribution of your contribution.

3. With respect to any patents you own, or that you can license without payment
to any third party, you hereby grant to us a perpetual, irrevocable,
non-exclusive, worldwide, no-charge, royalty-free license to:

* make, have made, use, sell, offer to sell, import, and otherwise transfer
your contribution in whole or in part, alone or in combination with or
included in any product, work or materials arising out of the project to
which your contribution was submitted, and

* at our option, to sublicense these same rights to third parties through
multiple levels of sublicensees or other licensing arrangements.

4. Except as set out above, you keep all right, title, and interest in your
contribution. The rights that you grant to us under these terms are effective
on the date you first submitted a contribution to us, even if your submission
took place before the date you sign these terms.

5. You covenant, represent, warrant and agree that:

* Each contribution that you submit is and shall be an original work of
authorship and you can legally grant the rights set out in this SCA;

* to the best of your knowledge, each contribution will not violate any
third party's copyrights, trademarks, patents, or other intellectual
property rights; and

* each contribution shall be in compliance with U.S. export control laws and
other applicable export and import laws. You agree to notify us if you
become aware of any circumstance which would make any of the foregoing
representations inaccurate in any respect. We may publicly disclose your
participation in the project, including the fact that you have signed the SCA.

6. This SCA is governed by the laws of the State of California and applicable
U.S. Federal law. Any choice of law rules will not apply.

7. Please place an “x” on one of the applicable statements below. Please do NOT
mark both statements:

* [x] I am signing on behalf of myself as an individual and no other person
or entity, including my employer, has or will have rights with respect to my
contributions.

* [ ] I am signing on behalf of my employer or a legal entity and I have the
actual authority to contractually bind that entity.

## Contributor Details

| Field                          | Entry                |
| ------------------------------ | -------------------- |
| Name                           | Brad Jascob          |
| Company name (if applicable)   | n/a                  |
| Title or role (if applicable)  | Software Engineer    |
| Date                           | 04/25/2019           |
| GitHub username                | bjascob              |
| Website (optional)             | n/a                  |

.github/contributors/bryant1410.md (vendored, new file)
@@ -0,0 +1,106 @@

# spaCy contributor agreement

This spaCy Contributor Agreement (**"SCA"**) is based on the
[Oracle Contributor Agreement](http://www.oracle.com/technetwork/oca-405177.pdf).
The SCA applies to any contribution that you make to any product or project
managed by us (the **"project"**), and sets out the intellectual property rights
you grant to us in the contributed materials. The term **"us"** shall mean
[ExplosionAI GmbH](https://explosion.ai/legal). The term
**"you"** shall mean the person or entity identified below.

If you agree to be bound by these terms, fill in the information requested
below and include the filled-in version with your first pull request, under the
folder [`.github/contributors/`](/.github/contributors/). The name of the file
should be your GitHub username, with the extension `.md`. For example, the user
example_user would create the file `.github/contributors/example_user.md`.

Read this agreement carefully before signing. These terms and conditions
constitute a binding legal agreement.

## Contributor Agreement

1. The term "contribution" or "contributed materials" means any source code,
object code, patch, tool, sample, graphic, specification, manual,
documentation, or any other material posted or submitted by you to the project.

2. With respect to any worldwide copyrights, or copyright applications and
registrations, in your contribution:

* you hereby assign to us joint ownership, and to the extent that such
assignment is or becomes invalid, ineffective or unenforceable, you hereby
grant to us a perpetual, irrevocable, non-exclusive, worldwide, no-charge,
royalty-free, unrestricted license to exercise all rights under those
copyrights. This includes, at our option, the right to sublicense these same
rights to third parties through multiple levels of sublicensees or other
licensing arrangements;

* you agree that each of us can do all things in relation to your
contribution as if each of us were the sole owners, and if one of us makes
a derivative work of your contribution, the one who makes the derivative
work (or has it made) will be the sole owner of that derivative work;

* you agree that you will not assert any moral rights in your contribution
against us, our licensees or transferees;

* you agree that we may register a copyright in your contribution and
exercise all ownership rights associated with it; and

* you agree that neither of us has any duty to consult with, obtain the
consent of, pay or render an accounting to the other for any use or
distribution of your contribution.

3. With respect to any patents you own, or that you can license without payment
to any third party, you hereby grant to us a perpetual, irrevocable,
non-exclusive, worldwide, no-charge, royalty-free license to:

* make, have made, use, sell, offer to sell, import, and otherwise transfer
your contribution in whole or in part, alone or in combination with or
included in any product, work or materials arising out of the project to
which your contribution was submitted, and

* at our option, to sublicense these same rights to third parties through
multiple levels of sublicensees or other licensing arrangements.

4. Except as set out above, you keep all right, title, and interest in your
contribution. The rights that you grant to us under these terms are effective
on the date you first submitted a contribution to us, even if your submission
took place before the date you sign these terms.

5. You covenant, represent, warrant and agree that:

* Each contribution that you submit is and shall be an original work of
authorship and you can legally grant the rights set out in this SCA;

* to the best of your knowledge, each contribution will not violate any
third party's copyrights, trademarks, patents, or other intellectual
property rights; and

* each contribution shall be in compliance with U.S. export control laws and
other applicable export and import laws. You agree to notify us if you
become aware of any circumstance which would make any of the foregoing
representations inaccurate in any respect. We may publicly disclose your
participation in the project, including the fact that you have signed the SCA.

6. This SCA is governed by the laws of the State of California and applicable
U.S. Federal law. Any choice of law rules will not apply.

7. Please place an “x” on one of the applicable statements below. Please do NOT
mark both statements:

* [x] I am signing on behalf of myself as an individual and no other person
or entity, including my employer, has or will have rights with respect to my
contributions.

* [ ] I am signing on behalf of my employer or a legal entity and I have the
actual authority to contractually bind that entity.

## Contributor Details

| Field                          | Entry                |
| ------------------------------ | -------------------- |
| Name                           | Santiago Castro      |
| Company name (if applicable)   |                      |
| Title or role (if applicable)  |                      |
| Date                           | 2019-04-09           |
| GitHub username                | bryant1410           |
| Website (optional)             |                      |

.github/contributors/celikomer.md (vendored, new file)
@@ -0,0 +1,106 @@

# spaCy contributor agreement

This spaCy Contributor Agreement (**"SCA"**) is based on the
[Oracle Contributor Agreement](http://www.oracle.com/technetwork/oca-405177.pdf).
The SCA applies to any contribution that you make to any product or project
managed by us (the **"project"**), and sets out the intellectual property rights
you grant to us in the contributed materials. The term **"us"** shall mean
[ExplosionAI GmbH](https://explosion.ai/legal). The term
**"you"** shall mean the person or entity identified below.

If you agree to be bound by these terms, fill in the information requested
below and include the filled-in version with your first pull request, under the
folder [`.github/contributors/`](/.github/contributors/). The name of the file
should be your GitHub username, with the extension `.md`. For example, the user
example_user would create the file `.github/contributors/example_user.md`.

Read this agreement carefully before signing. These terms and conditions
constitute a binding legal agreement.

## Contributor Agreement

1. The term "contribution" or "contributed materials" means any source code,
object code, patch, tool, sample, graphic, specification, manual,
documentation, or any other material posted or submitted by you to the project.

2. With respect to any worldwide copyrights, or copyright applications and
registrations, in your contribution:

* you hereby assign to us joint ownership, and to the extent that such
assignment is or becomes invalid, ineffective or unenforceable, you hereby
grant to us a perpetual, irrevocable, non-exclusive, worldwide, no-charge,
royalty-free, unrestricted license to exercise all rights under those
copyrights. This includes, at our option, the right to sublicense these same
rights to third parties through multiple levels of sublicensees or other
licensing arrangements;

* you agree that each of us can do all things in relation to your
contribution as if each of us were the sole owners, and if one of us makes
a derivative work of your contribution, the one who makes the derivative
work (or has it made) will be the sole owner of that derivative work;

* you agree that you will not assert any moral rights in your contribution
against us, our licensees or transferees;

* you agree that we may register a copyright in your contribution and
exercise all ownership rights associated with it; and

* you agree that neither of us has any duty to consult with, obtain the
consent of, pay or render an accounting to the other for any use or
distribution of your contribution.

3. With respect to any patents you own, or that you can license without payment
to any third party, you hereby grant to us a perpetual, irrevocable,
non-exclusive, worldwide, no-charge, royalty-free license to:

* make, have made, use, sell, offer to sell, import, and otherwise transfer
your contribution in whole or in part, alone or in combination with or
included in any product, work or materials arising out of the project to
which your contribution was submitted, and

* at our option, to sublicense these same rights to third parties through
multiple levels of sublicensees or other licensing arrangements.

4. Except as set out above, you keep all right, title, and interest in your
contribution. The rights that you grant to us under these terms are effective
on the date you first submitted a contribution to us, even if your submission
took place before the date you sign these terms.

5. You covenant, represent, warrant and agree that:

* Each contribution that you submit is and shall be an original work of
authorship and you can legally grant the rights set out in this SCA;

* to the best of your knowledge, each contribution will not violate any
third party's copyrights, trademarks, patents, or other intellectual
property rights; and

* each contribution shall be in compliance with U.S. export control laws and
other applicable export and import laws. You agree to notify us if you
become aware of any circumstance which would make any of the foregoing
representations inaccurate in any respect. We may publicly disclose your
participation in the project, including the fact that you have signed the SCA.

6. This SCA is governed by the laws of the State of California and applicable
U.S. Federal law. Any choice of law rules will not apply.

7. Please place an “x” on one of the applicable statements below. Please do NOT
mark both statements:

* [x] I am signing on behalf of myself as an individual and no other person
or entity, including my employer, has or will have rights with respect to my
contributions.

* [ ] I am signing on behalf of my employer or a legal entity and I have the
actual authority to contractually bind that entity.

## Contributor Details

| Field                          | Entry                |
| ------------------------------ | -------------------- |
| Name                           | Omer Celik           |
| Company name (if applicable)   |                      |
| Title or role (if applicable)  |                      |
| Date                           | 04/11/2019           |
| GitHub username                | celikomer            |
| Website (optional)             | www.ocelik.com       |
106
.github/contributors/estr4ng7d.md
vendored
Normal file

@@ -0,0 +1,106 @@

# spaCy contributor agreement

7. Please place an “x” on one of the applicable statements below. Please do NOT
mark both statements:

* [x] I am signing on behalf of myself as an individual and no other person
or entity, including my employer, has or will have rights with respect to my
contributions.

* [ ] I am signing on behalf of my employer or a legal entity and I have the
actual authority to contractually bind that entity.

## Contributor Details

| Field                          | Entry                |
|------------------------------- | -------------------- |
| Name                           | Amey Baviskar        |
| Company name (if applicable)   |                      |
| Title or role (if applicable)  |                      |
| Date                           | 21-May-2019          |
| GitHub username                | estr4ng7d            |
| Website (optional)             |                      |

106
.github/contributors/fizban99.md
vendored
Normal file

@@ -0,0 +1,106 @@

# spaCy contributor agreement

7. Please place an “x” on one of the applicable statements below. Please do NOT
mark both statements:

* [x] I am signing on behalf of myself as an individual and no other person
or entity, including my employer, has or will have rights with respect to my
contributions.

* [ ] I am signing on behalf of my employer or a legal entity and I have the
actual authority to contractually bind that entity.

## Contributor Details

| Field                          | Entry                |
|------------------------------- | -------------------- |
| Name                           | A.I.M.               |
| Company name (if applicable)   |                      |
| Title or role (if applicable)  |                      |
| Date                           | 16.04.2019           |
| GitHub username                | fizban99             |
| Website (optional)             |                      |

106
.github/contributors/henry860916.md
vendored
Normal file

@@ -0,0 +1,106 @@

# spaCy contributor agreement

7. Please place an “x” on one of the applicable statements below. Please do NOT
mark both statements:

* [x] I am signing on behalf of myself as an individual and no other person
or entity, including my employer, has or will have rights with respect to my
contributions.

* [ ] I am signing on behalf of my employer or a legal entity and I have the
actual authority to contractually bind that entity.

## Contributor Details

| Field                          | Entry                |
|------------------------------- | -------------------- |
| Name                           | Henry Zhang          |
| Company name (if applicable)   |                      |
| Title or role (if applicable)  |                      |
| Date                           | 2019-04-30           |
| GitHub username                | henry860916          |
| Website (optional)             |                      |

106
.github/contributors/ldorigo.md
vendored
Normal file

@@ -0,0 +1,106 @@

# spaCy contributor agreement

7. Please place an “x” on one of the applicable statements below. Please do NOT
mark both statements:

* [x] I am signing on behalf of myself as an individual and no other person
or entity, including my employer, has or will have rights with respect to my
contributions.

* [ ] I am signing on behalf of my employer or a legal entity and I have the
actual authority to contractually bind that entity.

## Contributor Details

| Field                          | Entry                |
|------------------------------- | -------------------- |
| Name                           | Luca Dorigo          |
| Company name (if applicable)   | /                    |
| Title or role (if applicable)  | /                    |
| Date                           | 08.05.2019           |
| GitHub username                | ldorigo              |
| Website (optional)             | /                    |

106
.github/contributors/munozbravo.md
vendored
Normal file

@@ -0,0 +1,106 @@

# spaCy contributor agreement

7. Please place an “x” on one of the applicable statements below. Please do NOT
mark both statements:

* [x] I am signing on behalf of myself as an individual and no other person
or entity, including my employer, has or will have rights with respect to my
contributions.

* [ ] I am signing on behalf of my employer or a legal entity and I have the
actual authority to contractually bind that entity.

## Contributor Details

| Field                          | Entry                |
|------------------------------- | -------------------- |
| Name                           | Germán Muñoz         |
| Company name (if applicable)   |                      |
| Title or role (if applicable)  |                      |
| Date                           | 2019-06-01           |
| GitHub username                | munozbravo           |
| Website (optional)             |                      |

106
.github/contributors/nipunsadvilkar.md
vendored
Normal file

@@ -0,0 +1,106 @@

# spaCy contributor agreement

7. Please place an “x” on one of the applicable statements below. Please do NOT
mark both statements:

* [x] I am signing on behalf of myself as an individual and no other person
or entity, including my employer, has or will have rights with respect to my
contributions.

* [x] I am signing on behalf of my employer or a legal entity and I have the
actual authority to contractually bind that entity.

## Contributor Details

| Field                          | Entry                             |
|------------------------------- | --------------------------------- |
| Name                           | Nipun Sadvilkar                   |
| Company name (if applicable)   |                                   |
| Title or role (if applicable)  |                                   |
| Date                           | 31st May, 2019                    |
| GitHub username                | nipunsadvilkar                    |
| Website (optional)             | https://nipunsadvilkar.github.io/ |

106
.github/contributors/pickfire.md
vendored
Normal file

@@ -0,0 +1,106 @@

# spaCy contributor agreement

7. Please place an “x” on one of the applicable statements below. Please do NOT
mark both statements:

* [ ] I am signing on behalf of myself as an individual and no other person
or entity, including my employer, has or will have rights with respect to my
contributions.

* [x] I am signing on behalf of my employer or a legal entity and I have the
actual authority to contractually bind that entity.

## Contributor Details

| Field                          | Entry                |
|------------------------------- | -------------------- |
| Name                           | Ivan Tham Jun Hoe    |
| Company name (if applicable)   | Semut                |
| Title or role (if applicable)  | Data Analyst         |
| Date                           | Apr 11, 2019         |
| GitHub username                | pickfire             |
| Website (optional)             | https://pickfire.tk  |

106
.github/contributors/richardpaulhudson.md
vendored
Normal file

@@ -0,0 +1,106 @@

# spaCy contributor agreement

7. Please place an “x” on one of the applicable statements below. Please do NOT
mark both statements:

* [x] I am signing on behalf of myself as an individual and no other person
or entity, including my employer, has or will have rights with respect to my
contributions.

* [ ] I am signing on behalf of my employer or a legal entity and I have the
actual authority to contractually bind that entity.

## Contributor Details

| Field                          | Entry                   |
|------------------------------- | ----------------------- |
| Name                           | Richard Paul Hudson     |
| Company name (if applicable)   | msg systems ag          |
| Title or role (if applicable)  | Principal IT Consultant |
| Date                           | 06. May 2019            |
| GitHub username                | richardpaulhudson       |
| Website (optional)             |                         |

106
.github/contributors/ujwal-narayan.md
vendored
Normal file

@@ -0,0 +1,106 @@

# spaCy contributor agreement

7. Please place an “x” on one of the applicable statements below. Please do NOT
mark both statements:

* [x] I am signing on behalf of myself as an individual and no other person
or entity, including my employer, has or will have rights with respect to my
contributions.

* [ ] I am signing on behalf of my employer or a legal entity and I have the
actual authority to contractually bind that entity.

## Contributor Details

| Field                          | Entry                |
|------------------------------- | -------------------- |
| Name                           | Ujwal Narayan        |
| Company name (if applicable)   |                      |
| Title or role (if applicable)  |                      |
| Date                           | 17/05/2019           |
| GitHub username                | ujwal-narayan        |
| Website (optional)             |                      |

106 .github/contributors/xssChauhan.md vendored Normal file
@@ -0,0 +1,106 @@

# spaCy contributor agreement

This spaCy Contributor Agreement (**"SCA"**) is based on the
[Oracle Contributor Agreement](http://www.oracle.com/technetwork/oca-405177.pdf).
The SCA applies to any contribution that you make to any product or project
managed by us (the **"project"**), and sets out the intellectual property rights
you grant to us in the contributed materials. The term **"us"** shall mean
[ExplosionAI GmbH](https://explosion.ai/legal). The term
**"you"** shall mean the person or entity identified below.

## Contributor Details

| Field                          | Entry                |
|------------------------------- | -------------------- |
| Name                           | Shikhar Chauhan      |
| Company name (if applicable)   |                      |
| Title or role (if applicable)  |                      |
| Date                           | 12/11/2019           |
| GitHub username                | xssChauhan           |
| Website (optional)             |                      |
106 .github/contributors/yaph.md vendored Normal file
@@ -0,0 +1,106 @@

# spaCy contributor agreement

This spaCy Contributor Agreement (**"SCA"**) is based on the
[Oracle Contributor Agreement](http://www.oracle.com/technetwork/oca-405177.pdf).
The SCA applies to any contribution that you make to any product or project
managed by us (the **"project"**), and sets out the intellectual property rights
you grant to us in the contributed materials. The term **"us"** shall mean
[ExplosionAI GmbH](https://explosion.ai/legal). The term
**"you"** shall mean the person or entity identified below.

## Contributor Details

| Field                          | Entry                |
|------------------------------- | -------------------- |
| Name                           | Ramiro Gómez         |
| Company name (if applicable)   |                      |
| Title or role (if applicable)  |                      |
| Date                           | 2019-04-29           |
| GitHub username                | yaph                 |
| Website (optional)             | http://ramiro.org/   |
@@ -447,17 +447,7 @@ use the `get_doc()` utility function to construct it manually.
 
 ## Updating the website
 
-Our [website and docs](https://spacy.io) are implemented in
-[Jade/Pug](https://www.jade-lang.org), and built or served by
-[Harp](https://harpjs.com). Jade/Pug is an extensible templating language with a
-readable syntax, that compiles to HTML. Here's how to view the site locally:
-
-```bash
-sudo npm install --global harp
-git clone https://github.com/explosion/spaCy
-cd spaCy/website
-harp server
-```
+For instructions on how to build and run the [website](https://spacy.io) locally see **[Setup and installation](https://github.com/explosion/spaCy/blob/master/website/README.md#setup-and-installation-setup)** in the *website* directory's README.
 
 The docs can always use another example or more detail, and they should always
 be up to date and not misleading. To quickly find the correct file to edit,
16 README.md
@@ -6,11 +6,10 @@ spaCy is a library for advanced Natural Language Processing in Python and
 Cython. It's built on the very latest research, and was designed from day one
 to be used in real products. spaCy comes with
 [pre-trained statistical models](https://spacy.io/models) and word vectors, and
-currently supports tokenization for **45+ languages**. It features the
-**fastest syntactic parser** in the world, convolutional
-**neural network models** for tagging, parsing and **named entity recognition**
-and easy **deep learning** integration. It's commercial open-source software,
-released under the MIT license.
+currently supports tokenization for **49+ languages**. It features
+state-of-the-art speed, convolutional **neural network models** for tagging,
+parsing and **named entity recognition** and easy **deep learning** integration.
+It's commercial open-source software, released under the MIT license.
 
 💫 **Version 2.1 out now!** [Check out the release notes here.](https://github.com/explosion/spaCy/releases)
 
@@ -66,11 +65,11 @@ valuable if it's shared publicly, so that more people can benefit from it.
 
 ## Features
 
-- **Fastest syntactic parser** in the world
-- **Named entity** recognition
 - Non-destructive **tokenization**
-- Support for **45+ languages**
+- **Named entity** recognition
+- Support for **49+ languages**
 - Pre-trained [statistical models](https://spacy.io/models) and word vectors
+- State-of-the-art speed
 - Easy **deep learning** integration
 - Part-of-speech tagging
 - Labelled dependency parsing
@@ -80,7 +79,6 @@ valuable if it's shared publicly, so that more people can benefit from it.
 - Export to numpy data arrays
 - Efficient binary serialization
 - Easy **model packaging** and deployment
-- State-of-the-art speed
 - Robust, rigorously evaluated accuracy
 
 📖 **For more details, see the
@@ -16,4 +16,4 @@ version=${version/\'/}
 version=${version/\"/}
 version=${version/\"/}
 git tag "v$version"
-git push origin "v$version" --tags
+git push origin "v$version"
@@ -36,11 +36,27 @@ def main(model="en_core_web_sm"):
         print("{:<10}\t{}\t{}".format(r1.text, r2.ent_type_, r2.text))
 
 
+def filter_spans(spans):
+    # Filter a sequence of spans so they don't contain overlaps
+    get_sort_key = lambda span: (span.end - span.start, span.start)
+    sorted_spans = sorted(spans, key=get_sort_key, reverse=True)
+    result = []
+    seen_tokens = set()
+    for span in sorted_spans:
+        if span.start not in seen_tokens and span.end - 1 not in seen_tokens:
+            result.append(span)
+            seen_tokens.update(range(span.start, span.end))
+    return result
+
+
 def extract_currency_relations(doc):
-    # merge entities and noun chunks into one token
-    seen_tokens = set()
+    # Merge entities and noun chunks into one token
     spans = list(doc.ents) + list(doc.noun_chunks)
-    for span in spans:
-        span.merge()
+    spans = filter_spans(spans)
+    with doc.retokenize() as retokenizer:
+        for span in spans:
+            retokenizer.merge(span)
 
     relations = []
     for money in filter(lambda w: w.ent_type_ == "MONEY", doc):
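To make the merge pattern above concrete, here is a minimal, self-contained sketch. It assumes the small English model `en_core_web_sm` is installed; `filter_spans` is the helper added in the hunk above, repeated so the snippet runs on its own.

```python
import spacy


def filter_spans(spans):
    # Keep the longest spans first and drop any span overlapping one already kept
    get_sort_key = lambda span: (span.end - span.start, span.start)
    sorted_spans = sorted(spans, key=get_sort_key, reverse=True)
    result = []
    seen_tokens = set()
    for span in sorted_spans:
        if span.start not in seen_tokens and span.end - 1 not in seen_tokens:
            result.append(span)
            seen_tokens.update(range(span.start, span.end))
    return result


nlp = spacy.load("en_core_web_sm")  # assumption: this model is installed
doc = nlp("Net income was $9.4 million compared to the prior year of $2.7 million.")

# Entities and noun chunks can overlap, so filter before merging
spans = filter_spans(list(doc.ents) + list(doc.noun_chunks))
with doc.retokenize() as retokenizer:
    for span in spans:
        retokenizer.merge(span)

print([token.text for token in doc])
```

Unlike the older per-span `span.merge()`, all merges queued inside one `doc.retokenize()` block are applied together, so the token indices of the collected spans stay valid while iterating.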
@@ -9,9 +9,10 @@ srsly>=0.0.5,<1.1.0
 # Third party dependencies
 numpy>=1.15.0
 requests>=2.13.0,<3.0.0
-jsonschema>=2.6.0,<3.0.0
 plac<1.0.0,>=0.9.6
 pathlib==1.0.1; python_version < "3.4"
+# Optional dependencies
+jsonschema>=2.6.0,<3.1.0
 # Development dependencies
 cython>=0.25
 pytest>=4.0.0,<4.1.0
3 setup.py
@@ -209,7 +209,7 @@ def setup_package():
         generate_cython(root, "spacy")
 
     setup(
-        name=about["__title__"],
+        name="spacy",
         zip_safe=False,
         packages=PACKAGES,
        package_data=PACKAGE_DATA,
@@ -232,7 +232,6 @@ def setup_package():
             "blis>=0.2.2,<0.3.0",
             "plac<1.0.0,>=0.9.6",
             "requests>=2.13.0,<3.0.0",
-            "jsonschema>=2.6.0,<3.0.0",
             "wasabi>=0.2.0,<1.1.0",
             "srsly>=0.0.5,<1.1.0",
             'pathlib==1.0.1; python_version < "3.4"',
@@ -4,13 +4,13 @@
 # fmt: off
 
 __title__ = "spacy"
-__version__ = "2.1.3"
+__version__ = "2.1.4"
 __summary__ = "Industrial-strength Natural Language Processing (NLP) with Python and Cython"
 __uri__ = "https://spacy.io"
 __author__ = "Explosion AI"
 __email__ = "contact@explosion.ai"
 __license__ = "MIT"
-__release__ = True
+__release__ = False
 
 __download_url__ = "https://github.com/explosion/spacy-models/releases/download"
 __compatibility__ = "https://raw.githubusercontent.com/explosion/spacy-models/master/compatibility.json"
@@ -39,7 +39,7 @@ FILE_TYPES_STDOUT = ("json", "jsonl")
 def convert(
     input_file,
     output_dir="-",
-    file_type="jsonl",
+    file_type="json",
     n_sents=1,
     morphology=False,
     converter="auto",
@@ -48,8 +48,8 @@ def convert(
     """
     Convert files into JSON format for use with train command and other
     experiment management functions. If no output_dir is specified, the data
-    is written to stdout, so you can pipe them forward to a JSONL file:
-    $ spacy convert some_file.conllu > some_file.jsonl
+    is written to stdout, so you can pipe them forward to a JSON file:
+    $ spacy convert some_file.conllu > some_file.json
     """
     msg = Printer()
     input_path = Path(input_file)
|
@ -11,14 +11,8 @@ def iob2json(input_data, n_sents=10, *args, **kwargs):
|
||||||
"""
|
"""
|
||||||
Convert IOB files into JSON format for use with train cli.
|
Convert IOB files into JSON format for use with train cli.
|
||||||
"""
|
"""
|
||||||
docs = []
|
sentences = read_iob(input_data.split("\n"))
|
||||||
for group in minibatch(docs, n_sents):
|
docs = merge_sentences(sentences, n_sents)
|
||||||
group = list(group)
|
|
||||||
first = group.pop(0)
|
|
||||||
to_extend = first["paragraphs"][0]["sentences"]
|
|
||||||
for sent in group[1:]:
|
|
||||||
to_extend.extend(sent["paragraphs"][0]["sentences"])
|
|
||||||
docs.append(first)
|
|
||||||
return docs
|
return docs
|
||||||
|
|
||||||
|
|
||||||
|
@ -27,7 +21,6 @@ def read_iob(raw_sents):
|
||||||
for line in raw_sents:
|
for line in raw_sents:
|
||||||
if not line.strip():
|
if not line.strip():
|
||||||
continue
|
continue
|
||||||
# tokens = [t.split("|") for t in line.split()]
|
|
||||||
tokens = [re.split("[^\w\-]", line.strip())]
|
tokens = [re.split("[^\w\-]", line.strip())]
|
||||||
if len(tokens[0]) == 3:
|
if len(tokens[0]) == 3:
|
||||||
words, pos, iob = zip(*tokens)
|
words, pos, iob = zip(*tokens)
|
||||||
|
@ -49,3 +42,15 @@ def read_iob(raw_sents):
|
||||||
paragraphs = [{"sentences": [sent]} for sent in sentences]
|
paragraphs = [{"sentences": [sent]} for sent in sentences]
|
||||||
docs = [{"id": 0, "paragraphs": [para]} for para in paragraphs]
|
docs = [{"id": 0, "paragraphs": [para]} for para in paragraphs]
|
||||||
return docs
|
return docs
|
||||||
|
|
||||||
|
|
||||||
|
def merge_sentences(docs, n_sents):
|
||||||
|
merged = []
|
||||||
|
for group in minibatch(docs, size=n_sents):
|
||||||
|
group = list(group)
|
||||||
|
first = group.pop(0)
|
||||||
|
to_extend = first["paragraphs"][0]["sentences"]
|
||||||
|
for sent in group[1:]:
|
||||||
|
to_extend.extend(sent["paragraphs"][0]["sentences"])
|
||||||
|
merged.append(first)
|
||||||
|
return merged
|
||||||
|
|
|
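As a rough illustration of what the new `merge_sentences` helper does, the sketch below feeds it hand-made stand-ins for the converter's sentence-level output (the helper is repeated here so the snippet is self-contained; only `spacy.util.minibatch` is assumed).

```python
from spacy.util import minibatch


def merge_sentences(docs, n_sents):
    # Collapse every n_sents single-sentence docs into one doc
    merged = []
    for group in minibatch(docs, size=n_sents):
        group = list(group)
        first = group.pop(0)
        to_extend = first["paragraphs"][0]["sentences"]
        for sent in group[1:]:
            to_extend.extend(sent["paragraphs"][0]["sentences"])
        merged.append(first)
    return merged


# Three minimal sentence-level docs in spaCy's JSON training format (toy values)
docs = [
    {"id": 0, "paragraphs": [{"sentences": [{"tokens": [{"orth": "Hello"}]}]}]},
    {"id": 0, "paragraphs": [{"sentences": [{"tokens": [{"orth": "World"}]}]}]},
    {"id": 0, "paragraphs": [{"sentences": [{"tokens": [{"orth": "Bye"}]}]}]},
]
print(len(merge_sentences(docs, n_sents=3)))  # 1 merged doc instead of 3
```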
@@ -17,6 +17,7 @@ from .. import displacy
     gpu_id=("Use GPU", "option", "g", int),
     displacy_path=("Directory to output rendered parses as HTML", "option", "dp", str),
     displacy_limit=("Limit of parses to render as HTML", "option", "dl", int),
+    return_scores=("Return dict containing model scores", "flag", "R", bool),
 )
 def evaluate(
     model,
@@ -25,6 +26,7 @@ def evaluate(
     gold_preproc=False,
     displacy_path=None,
     displacy_limit=25,
+    return_scores=False,
 ):
     """
     Evaluate a model. To render a sample of parses in a HTML file, set an
@@ -75,6 +77,8 @@ def evaluate(
             ents=render_ents,
         )
         msg.good("Generated {} parses as HTML".format(displacy_limit), displacy_path)
+    if return_scores:
+        return scorer.scores
 
 
 def render_parses(docs, output_path, model_name="", limit=250, deps=True, ents=True):
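A minimal sketch of how the new flag can be used from Python. The model name and data path below are placeholders: any installed model package and a dev set in spaCy's JSON training format would do, and the exact keys in the returned dict depend on which pipeline components are present.

```python
from spacy.cli import evaluate

# "en_core_web_sm" and "dev.json" are hypothetical stand-ins
scores = evaluate("en_core_web_sm", "dev.json", return_scores=True)
print(scores)  # e.g. token accuracy, uas/las, ents_p/r/f, depending on the pipeline
```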
@@ -181,7 +181,7 @@ def read_vectors(vectors_loc):
     vectors_keys = []
     for i, line in enumerate(tqdm(f)):
         line = line.rstrip()
-        pieces = line.rsplit(" ", vectors_data.shape[1] + 1)
+        pieces = line.rsplit(" ", vectors_data.shape[1])
         word = pieces.pop(0)
         if len(pieces) != vectors_data.shape[1]:
             msg.fail(Errors.E094.format(line_num=i, loc=vectors_loc), exits=1)
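A quick illustration, with made-up numbers, of why the row is split from the right with `maxsplit` equal to the vector width: the key itself may contain spaces, and only the trailing fields are floats.

```python
dim = 3
line = "New York 0.12 0.34 0.56"  # toy row: a multi-word key plus dim float values

pieces = line.rsplit(" ", dim)
word = pieces.pop(0)
print(word)                # "New York"
print(len(pieces) == dim)  # True, so the E094 length check passes
```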
@@ -34,7 +34,8 @@ from .. import util
     max_length=("Max words per example.", "option", "xw", int),
     min_length=("Min words per example.", "option", "nw", int),
     seed=("Seed for random number generators", "option", "s", float),
-    nr_iter=("Number of iterations to pretrain", "option", "i", int),
+    n_iter=("Number of iterations to pretrain", "option", "i", int),
+    n_save_every=("Save model every X batches.", "option", "se", int),
 )
 def pretrain(
     texts_loc,
@@ -46,11 +47,12 @@ def pretrain(
     loss_func="cosine",
     use_vectors=False,
     dropout=0.2,
-    nr_iter=1000,
+    n_iter=1000,
     batch_size=3000,
     max_length=500,
     min_length=5,
     seed=0,
+    n_save_every=None,
 ):
     """
     Pre-train the 'token-to-vector' (tok2vec) layer of pipeline components,
@@ -115,9 +117,26 @@ def pretrain(
     msg.divider("Pre-training tok2vec layer")
     row_settings = {"widths": (3, 10, 10, 6, 4), "aligns": ("r", "r", "r", "r", "r")}
     msg.row(("#", "# Words", "Total Loss", "Loss", "w/s"), **row_settings)
-    for epoch in range(nr_iter):
-        for batch in util.minibatch_by_words(
-            ((text, None) for text in texts), size=batch_size
+
+    def _save_model(epoch, is_temp=False):
+        is_temp_str = ".temp" if is_temp else ""
+        with model.use_params(optimizer.averages):
+            with (output_dir / ("model%d%s.bin" % (epoch, is_temp_str))).open(
+                "wb"
+            ) as file_:
+                file_.write(model.tok2vec.to_bytes())
+            log = {
+                "nr_word": tracker.nr_word,
+                "loss": tracker.loss,
+                "epoch_loss": tracker.epoch_loss,
+                "epoch": epoch,
+            }
+            with (output_dir / "log.jsonl").open("a") as file_:
+                file_.write(srsly.json_dumps(log) + "\n")
+
+    for epoch in range(n_iter):
+        for batch_id, batch in enumerate(
+            util.minibatch_by_words(((text, None) for text in texts), size=batch_size)
         ):
             docs = make_docs(
                 nlp,
@@ -133,17 +152,9 @@ def pretrain(
             msg.row(progress, **row_settings)
             if texts_loc == "-" and tracker.words_per_epoch[epoch] >= 10 ** 7:
                 break
-        with model.use_params(optimizer.averages):
-            with (output_dir / ("model%d.bin" % epoch)).open("wb") as file_:
-                file_.write(model.tok2vec.to_bytes())
-        log = {
-            "nr_word": tracker.nr_word,
-            "loss": tracker.loss,
-            "epoch_loss": tracker.epoch_loss,
-            "epoch": epoch,
-        }
-        with (output_dir / "log.jsonl").open("a") as file_:
-            file_.write(srsly.json_dumps(log) + "\n")
+            if n_save_every and (batch_id % n_save_every == 0):
+                _save_model(epoch, is_temp=True)
+        _save_model(epoch)
         tracker.epoch_loss = 0.0
         if texts_loc != "-":
             # Reshuffle the texts if texts were loaded from a file
@@ -170,10 +181,10 @@ def make_update(model, docs, optimizer, drop=0.0, objective="L2"):
 def make_docs(nlp, batch, min_length, max_length):
     docs = []
     for record in batch:
-        text = record["text"]
         if "tokens" in record:
             doc = Doc(nlp.vocab, words=record["tokens"])
         else:
+            text = record["text"]
             doc = nlp.make_doc(text)
         if "heads" in record:
             heads = record["heads"]
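Since `_save_model` now appends one JSON record per save to `log.jsonl`, progress can be inspected afterwards with srsly. The output directory name below is a hypothetical stand-in for whatever was passed to `spacy pretrain`.

```python
import srsly

# Replace "pretrain_output" with the output directory used for pretraining
for entry in srsly.read_jsonl("pretrain_output/log.jsonl"):
    print(entry["epoch"], entry["nr_word"], entry["epoch_loss"])
```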
@@ -16,6 +16,7 @@ import random
 from .._ml import create_default_optimizer
 from ..attrs import PROB, IS_OOV, CLUSTER, LANG
 from ..gold import GoldCorpus
+from ..compat import path2str
 from .. import util
 from .. import about
 
@@ -35,6 +36,12 @@ from .. import about
     pipeline=("Comma-separated names of pipeline components", "option", "p", str),
     vectors=("Model to load vectors from", "option", "v", str),
     n_iter=("Number of iterations", "option", "n", int),
+    n_early_stopping=(
+        "Maximum number of training epochs without dev accuracy improvement",
+        "option",
+        "ne",
+        int,
+    ),
     n_examples=("Number of examples", "option", "ns", int),
     use_gpu=("Use GPU", "option", "g", int),
     version=("Model version", "option", "V", str),
@@ -74,6 +81,7 @@ def train(
     pipeline="tagger,parser,ner",
     vectors=None,
     n_iter=30,
+    n_early_stopping=None,
     n_examples=0,
     use_gpu=-1,
     version="0.0.0",
@@ -101,6 +109,7 @@ def train(
     train_path = util.ensure_path(train_path)
     dev_path = util.ensure_path(dev_path)
     meta_path = util.ensure_path(meta_path)
+    output_path = util.ensure_path(output_path)
     if raw_text is not None:
         raw_text = list(srsly.read_jsonl(raw_text))
     if not train_path or not train_path.exists():
@@ -222,6 +231,8 @@ def train(
     msg.row(row_head, **row_settings)
     msg.row(["-" * width for width in row_settings["widths"]], **row_settings)
     try:
+        iter_since_best = 0
+        best_score = 0.0
         for i in range(n_iter):
             train_docs = corpus.train_docs(
                 nlp, noise_level=noise_level, gold_preproc=gold_preproc, max_length=0
@@ -276,7 +287,9 @@ def train(
                 gpu_wps = nwords / (end_time - start_time)
                 with Model.use_device("cpu"):
                     nlp_loaded = util.load_model_from_path(epoch_model_path)
-                    nlp_loaded.parser.cfg["beam_width"]
+                    for name, component in nlp_loaded.pipeline:
+                        if hasattr(component, "cfg"):
+                            component.cfg["beam_width"] = beam_width
                     dev_docs = list(
                         corpus.dev_docs(nlp_loaded, gold_preproc=gold_preproc)
                     )
@@ -328,6 +341,24 @@ def train(
                     gpu_wps=gpu_wps,
                 )
                 msg.row(progress, **row_settings)
+            # Early stopping
+            if n_early_stopping is not None:
+                current_score = _score_for_model(meta)
+                if current_score < best_score:
+                    iter_since_best += 1
+                else:
+                    iter_since_best = 0
+                    best_score = current_score
+                if iter_since_best >= n_early_stopping:
+                    msg.text(
+                        "Early stopping, best iteration "
+                        "is: {}".format(i - iter_since_best)
+                    )
+                    msg.text(
+                        "Best score = {}; Final iteration "
+                        "score = {}".format(best_score, current_score)
+                    )
+                    break
     finally:
         with nlp.use_params(optimizer.averages):
             final_model_path = output_path / "model-final"
@@ -338,6 +369,20 @@ def train(
             msg.good("Created best model", best_model_path)
 
 
+def _score_for_model(meta):
+    """ Returns mean score between tasks in pipeline that can be used for early stopping. """
+    mean_acc = list()
+    pipes = meta["pipeline"]
+    acc = meta["accuracy"]
+    if "tagger" in pipes:
+        mean_acc.append(acc["tags_acc"])
+    if "parser" in pipes:
+        mean_acc.append((acc["uas"] + acc["las"]) / 2)
+    if "ner" in pipes:
+        mean_acc.append((acc["ents_p"] + acc["ents_r"] + acc["ents_f"]) / 3)
+    return sum(mean_acc) / len(mean_acc)
+
+
 @contextlib.contextmanager
 def _create_progress_bar(total):
     if int(os.environ.get("LOG_FRIENDLY", 0)):
@@ -379,10 +424,12 @@ def _collate_best_model(meta, output_path, components):
     for component in components:
         bests[component] = _find_best(output_path, component)
     best_dest = output_path / "model-best"
-    shutil.copytree(output_path / "model-final", best_dest)
+    shutil.copytree(path2str(output_path / "model-final"), path2str(best_dest))
     for component, best_component_src in bests.items():
-        shutil.rmtree(best_dest / component)
-        shutil.copytree(best_component_src / component, best_dest / component)
+        shutil.rmtree(path2str(best_dest / component))
+        shutil.copytree(
+            path2str(best_component_src / component), path2str(best_dest / component)
+        )
         accs = srsly.read_json(best_component_src / "accuracy.json")
         for metric in _get_metrics(component):
             meta["accuracy"][metric] = accs[metric]
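The early-stopping bookkeeping above boils down to a patience counter over the per-epoch dev score. A toy, standalone version (not spaCy's actual training loop, with made-up scores) behaves like this:

```python
def first_stop_iteration(dev_scores, patience):
    # Mirrors the new n_early_stopping logic: stop once the score has not
    # improved for `patience` consecutive iterations
    best_score, iter_since_best = 0.0, 0
    for i, score in enumerate(dev_scores):
        if score < best_score:
            iter_since_best += 1
        else:
            iter_since_best = 0
            best_score = score
        if iter_since_best >= patience:
            return i
    return None


print(first_stop_iteration([0.70, 0.74, 0.73, 0.72, 0.71], patience=3))  # 4
```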
@@ -92,7 +92,9 @@ def symlink_to(orig, dest):
     if is_windows:
         import subprocess
 
-        subprocess.call(["mklink", "/d", path2str(orig), path2str(dest)], shell=True)
+        subprocess.check_call(
+            ["mklink", "/d", path2str(orig), path2str(dest)], shell=True
+        )
     else:
        orig.symlink_to(dest)
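The point of switching to `check_call` is that a failed `mklink` now raises instead of being silently ignored. The behaviour is easy to see with any failing command; the POSIX `false` command below is just a stand-in for a command that exits with a non-zero status.

```python
import subprocess

# subprocess.call() only returns the exit code; check_call() raises on failure
try:
    subprocess.check_call(["false"])  # always exits with status 1
except subprocess.CalledProcessError as err:
    print("command failed with exit code", err.returncode)
```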
@@ -19,7 +19,7 @@ RENDER_WRAPPER = None
 
 
 def render(
-    docs, style="dep", page=False, minify=False, jupyter=False, options={}, manual=False
+    docs, style="dep", page=False, minify=False, jupyter=None, options={}, manual=False
 ):
     """Render displaCy visualisation.
 
@@ -27,7 +27,7 @@ def render(
     style (unicode): Visualisation style, 'dep' or 'ent'.
     page (bool): Render markup as full HTML page.
     minify (bool): Minify HTML markup.
-    jupyter (bool): Experimental, use Jupyter's `display()` to output markup.
+    jupyter (bool): Override Jupyter auto-detection.
     options (dict): Visualiser-specific options, e.g. colors.
     manual (bool): Don't parse `Doc` and instead expect a dict/list of dicts.
     RETURNS (unicode): Rendered HTML markup.
@@ -53,7 +53,8 @@ def render(
     html = _html["parsed"]
     if RENDER_WRAPPER is not None:
         html = RENDER_WRAPPER(html)
-    if jupyter or is_in_jupyter():  # return HTML rendered by IPython display()
+    if jupyter or (jupyter is None and is_in_jupyter()):
+        # return HTML rendered by IPython display()
         from IPython.core.display import display, HTML
 
         return display(HTML(html))
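With `jupyter=None` as the new default, auto-detection can still be overridden either way. For example, to force the raw markup even inside a notebook (a sketch that assumes `en_core_web_sm` is installed):

```python
import spacy
from spacy import displacy

nlp = spacy.load("en_core_web_sm")  # assumption: this model is installed
doc = nlp("Autonomous cars shift insurance liability toward manufacturers.")

# jupyter=False always returns the HTML string;
# jupyter=True always hands the markup to IPython's display()
html = displacy.render(doc, style="dep", jupyter=False)
print(html[:80])
```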
@@ -141,8 +141,14 @@ class Errors(object):
     E023 = ("Error cleaning up beam: The same state occurred twice at "
             "memory address {addr} and position {i}.")
     E024 = ("Could not find an optimal move to supervise the parser. Usually, "
-            "this means the GoldParse was not correct. For example, are all "
-            "labels added to the model?")
+            "this means that the model can't be updated in a way that's valid "
+            "and satisfies the correct annotations specified in the GoldParse. "
+            "For example, are all labels added to the model? If you're "
+            "training a named entity recognizer, also make sure that none of "
+            "your annotated entity spans have leading or trailing whitespace. "
+            "You can also use the experimental `debug-data` command to "
+            "validate your JSON-formatted training data. For details, run:\n"
+            "python -m spacy debug-data --help")
     E025 = ("String is too long: {length} characters. Max is 2**30.")
     E026 = ("Error accessing token at position {i}: out of bounds in Doc of "
             "length {length}.")
@@ -383,6 +389,10 @@ class Errors(object):
     E133 = ("The sum of prior probabilities for alias '{alias}' should not exceed 1, "
             "but found {sum}.")
     E134 = ("Alias '{alias}' defined for unknown entity '{entity}'.")
+    E135 = ("If you meant to replace a built-in component, use `create_pipe`: "
+            "`nlp.replace_pipe('{name}', nlp.create_pipe('{name}'))`")
+    E136 = ("This additional feature requires the jsonschema library to be "
+            "installed:\npip install jsonschema")
 
 
 @add_codes
@@ -168,6 +168,7 @@ GLOSSARY = {
     # Dependency Labels (English)
     # ClearNLP / Universal Dependencies
     # https://github.com/clir/clearnlp-guidelines/blob/master/md/specifications/dependency_labels.md
+    "acl": "clausal modifier of noun (adjectival clause)",
     "acomp": "adjectival complement",
     "advcl": "adverbial clause modifier",
     "advmod": "adverbial modifier",
@@ -177,22 +178,32 @@ GLOSSARY = {
     "attr": "attribute",
     "aux": "auxiliary",
     "auxpass": "auxiliary (passive)",
+    "case": "case marking",
     "cc": "coordinating conjunction",
     "ccomp": "clausal complement",
+    "clf": "classifier",
     "complm": "complementizer",
+    "compound": "compound",
     "conj": "conjunct",
     "cop": "copula",
     "csubj": "clausal subject",
     "csubjpass": "clausal subject (passive)",
+    "dative": "dative",
     "dep": "unclassified dependent",
     "det": "determiner",
+    "discourse": "discourse element",
+    "dislocated": "dislocated elements",
     "dobj": "direct object",
     "expl": "expletive",
+    "fixed": "fixed multiword expression",
+    "flat": "flat multiword expression",
+    "goeswith": "goes with",
     "hmod": "modifier in hyphenation",
     "hyph": "hyphen",
     "infmod": "infinitival modifier",
     "intj": "interjection",
     "iobj": "indirect object",
+    "list": "list",
     "mark": "marker",
     "meta": "meta modifier",
     "neg": "negation modifier",
@@ -201,11 +212,15 @@ GLOSSARY = {
     "npadvmod": "noun phrase as adverbial modifier",
     "nsubj": "nominal subject",
     "nsubjpass": "nominal subject (passive)",
+    "nounmod": "modifier of nominal",
+    "npmod": "noun phrase as adverbial modifier",
     "num": "number modifier",
     "number": "number compound modifier",
+    "nummod": "numeric modifier",
     "oprd": "object predicate",
     "obj": "object",
     "obl": "oblique nominal",
+    "orphan": "orphan",
     "parataxis": "parataxis",
     "partmod": "participal modifier",
     "pcomp": "complement of preposition",
@@ -218,7 +233,10 @@ GLOSSARY = {
     "punct": "punctuation",
     "quantmod": "modifier of quantifier",
     "rcmod": "relative clause modifier",
+    "relcl": "relative clause modifier",
+    "reparandum": "overridden disfluency",
     "root": "root",
+    "vocative": "vocative",
     "xcomp": "open clausal complement",
     # Dependency labels (German)
     # TIGER Treebank
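These glossary entries are what `spacy.explain()` returns, so the newly added Universal Dependencies labels should resolve once a spaCy build containing this change is installed:

```python
import spacy

print(spacy.explain("relcl"))   # relative clause modifier
print(spacy.explain("nummod"))  # numeric modifier
print(spacy.explain("case"))    # case marking
```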
@@ -532,7 +532,7 @@ cdef class GoldParse:
                     self.labels[i] = deps[i2j_multi[i]]
                     # Now set NER...This is annoying because if we've split
                     # got an entity word split into two, we need to adjust the
-                    # BILOU tags. We can't have BB or LL etc.
+                    # BILUO tags. We can't have BB or LL etc.
                     # Case 1: O -- easy.
                     ner_tag = entities[i2j_multi[i]]
                     if ner_tag == "O":
@@ -5,8 +5,8 @@ from __future__ import unicode_literals
 STOP_WORDS = set(
     """
 á a ab aber ach acht achte achten achter achtes ag alle allein allem allen
-aller allerdings alles allgemeinen als also am an andere anderen andern anders
-auch auf aus ausser außer ausserdem außerdem
+aller allerdings alles allgemeinen als also am an andere anderen anderem andern
+anders auch auf aus ausser außer ausserdem außerdem
 
 bald bei beide beiden beim beispiel bekannt bereits besonders besser besten bin
 bis bisher bist
@@ -35,8 +35,8 @@ großen grosser großer grosses großes gut gute guter gutes
 habe haben habt hast hat hatte hätte hatten hätten heisst heißt her heute hier
 hin hinter hoch
 
-ich ihm ihn ihnen ihr ihre ihrem ihrer ihres im immer in indem infolgedessen
-ins irgend ist
+ich ihm ihn ihnen ihr ihre ihrem ihren ihrer ihres im immer in indem
+infolgedessen ins irgend ist
 
 ja jahr jahre jahren je jede jedem jeden jeder jedermann jedermanns jedoch
 jemand jemandem jemanden jene jenem jenen jener jenes jetzt
|
||||||
must my myself
|
must my myself
|
||||||
|
|
||||||
name namely neither never nevertheless next nine no nobody none noone nor not
|
name namely neither never nevertheless next nine no nobody none noone nor not
|
||||||
nothing now nowhere
|
nothing now nowhere
|
||||||
|
|
||||||
of off often on once one only onto or other others otherwise our ours ourselves
|
of off often on once one only onto or other others otherwise our ours ourselves
|
||||||
out over own
|
out over own
|
||||||
|
@ -75,4 +75,3 @@ STOP_WORDS.update(contractions)
|
||||||
for apostrophe in ["‘", "’"]:
|
for apostrophe in ["‘", "’"]:
|
||||||
for stopword in contractions:
|
for stopword in contractions:
|
||||||
STOP_WORDS.add(stopword.replace("'", apostrophe))
|
STOP_WORDS.add(stopword.replace("'", apostrophe))
|
||||||
|
|
||||||
|
|
|
@@ -4,6 +4,7 @@ from __future__ import unicode_literals
 from .tokenizer_exceptions import TOKENIZER_EXCEPTIONS
 from .tag_map import TAG_MAP
 from .stop_words import STOP_WORDS
+from .lex_attrs import LEX_ATTRS
 from .lemmatizer import LOOKUP
 from .syntax_iterators import SYNTAX_ITERATORS
 
@@ -16,6 +17,7 @@ from ...util import update_exc, add_lookups
 
 class SpanishDefaults(Language.Defaults):
     lex_attr_getters = dict(Language.Defaults.lex_attr_getters)
+    lex_attr_getters.update(LEX_ATTRS)
     lex_attr_getters[LANG] = lambda text: "es"
     lex_attr_getters[NORM] = add_lookups(
         Language.Defaults.lex_attr_getters[NORM], BASE_NORMS
59 spacy/lang/es/lex_attrs.py Normal file
@@ -0,0 +1,59 @@
+# coding: utf8
+from __future__ import unicode_literals
+
+from ...attrs import LIKE_NUM
+
+
+_num_words = [
+    "cero",
+    "uno",
+    "dos",
+    "tres",
+    "cuatro",
+    "cinco",
+    "seis",
+    "siete",
+    "ocho",
+    "nueve",
+    "diez",
+    "once",
+    "doce",
+    "trece",
+    "catorce",
+    "quince",
+    "dieciséis",
+    "diecisiete",
+    "dieciocho",
+    "diecinueve",
+    "veinte",
+    "treinta",
+    "cuarenta",
+    "cincuenta",
+    "sesenta",
+    "setenta",
+    "ochenta",
+    "noventa",
+    "cien",
+    "mil",
+    "millón",
+    "billón",
+    "trillón",
+]
+
+
+def like_num(text):
+    if text.startswith(("+", "-", "±", "~")):
+        text = text[1:]
+    text = text.replace(",", "").replace(".", "")
+    if text.isdigit():
+        return True
+    if text.count("/") == 1:
+        num, denom = text.split("/")
+        if num.isdigit() and denom.isdigit():
+            return True
+    if text.lower() in _num_words:
+        return True
+    return False
+
+
+LEX_ATTRS = {LIKE_NUM: like_num}
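A small sketch of the effect of the new `lex_attrs.py`: with the Spanish lexical attributes registered, `Token.like_num` recognises digit strings as well as the listed number words. A blank `es` pipeline is enough, no trained model required.

```python
import spacy

nlp = spacy.blank("es")
doc = nlp("Tiene veinte años y 3,5 millones de seguidores")
print([(token.text, token.like_num) for token in doc])
# "veinte" and "3,5" should both come back as number-like
```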
@@ -11,9 +11,9 @@ Example sentences to test spaCy and its language models.
 
 
 sentences = [
-    "Apple cherche a acheter une startup anglaise pour 1 milliard de dollard",
-    "Les voitures autonomes voient leur assurances décalées vers les constructeurs",
-    "San Francisco envisage d'interdire les robots coursiers",
+    "Apple cherche à acheter une startup anglaise pour 1 milliard de dollars",
+    "Les voitures autonomes déplacent la responsabilité de l'assurance vers les constructeurs",
+    "San Francisco envisage d'interdire les robots coursiers sur les trottoirs",
     "Londres est une grande ville du Royaume-Uni",
     "L’Italie choisit ArcelorMittal pour reprendre la plus grande aciérie d’Europe",
     "Apple lance HomePod parce qu'il se sent menacé par l'Echo d'Amazon",
@ -7,88 +7,89 @@ from ...symbols import NOUN, PRON, AUX, SCONJ, INTJ, PART, PROPN
|
||||||
|
|
||||||
# POS explanations for indonesian available from https://www.aclweb.org/anthology/Y12-1014
TAG_MAP = {
    "NSD": {POS: NOUN},
    "Z--": {POS: PUNCT},
    "VSA": {POS: VERB},
    "CC-": {POS: NUM},
    "R--": {POS: ADP},
    "D--": {POS: ADV},
    "ASP": {POS: ADJ},
    "S--": {POS: SCONJ},
    "VSP": {POS: VERB},
    "H--": {POS: CCONJ},
    "F--": {POS: X},
    "B--": {POS: DET},
    "CO-": {POS: NUM},
    "G--": {POS: ADV},
    "PS3": {POS: PRON},
    "W--": {POS: ADV},
    "O--": {POS: AUX},
    "PP1": {POS: PRON},
    "ASS": {POS: ADJ},
    "PS1": {POS: PRON},
    "APP": {POS: ADJ},
    "CD-": {POS: NUM},
    "VPA": {POS: VERB},
    "VPP": {POS: VERB},
    "X--": {POS: X},
    "CO-+PS3": {POS: NUM},
    "NSD+PS3": {POS: NOUN},
    "ASP+PS3": {POS: ADJ},
    "M--": {POS: AUX},
    "VSA+PS3": {POS: VERB},
    "R--+PS3": {POS: ADP},
    "W--+T--": {POS: ADV},
-    "PS2": {POS:PRON},
+    "PS2": {POS: PRON},
-    "NSD+PS1": {POS:NOUN},
+    "NSD+PS1": {POS: NOUN},
    "PP3": {POS: PRON},
    "VSA+T--": {POS: VERB},
    "D--+T--": {POS: ADV},
    "VSP+PS3": {POS: VERB},
    "F--+PS3": {POS: X},
    "M--+T--": {POS: AUX},
    "F--+T--": {POS: X},
    "PUNCT": {POS: PUNCT},
    "PROPN": {POS: PROPN},
    "I--": {POS: INTJ},
    "S--+PS3": {POS: SCONJ},
    "ASP+T--": {POS: ADJ},
    "CC-+PS3": {POS: NUM},
    "NSD+PS2": {POS: NOUN},
    "B--+T--": {POS: DET},
    "H--+T--": {POS: CCONJ},
    "VSA+PS2": {POS: VERB},
    "NSF": {POS: NOUN},
    "PS1+VSA": {POS: PRON},
    "NPD": {POS: NOUN},
-    "PP2": {POS:PRON},
+    "PP2": {POS: PRON},
    "VSA+PS1": {POS: VERB},
    "T--": {POS: PART},
    "NSM": {POS: NOUN},
    "NUM": {POS: NUM},
    "ASP+PS2": {POS: ADJ},
    "G--+T--": {POS: PART},
    "D--+PS3": {POS: ADV},
    "R--+PS2": {POS: ADP},
    "NSM+PS3": {POS: NOUN},
    "VSP+T--": {POS: VERB},
    "M--+PS3": {POS: AUX},
    "ASS+PS3": {POS: ADJ},
    "G--+PS3": {POS: PART},
    "F--+PS1": {POS: X},
    "NSD+T--": {POS: NOUN},
    "PP1+T--": {POS: PRON},
    "B--+PS3": {POS: DET},
    "NOUN": {POS: NOUN},
    "NPD+PS3": {POS: NOUN},
    "R--+PS1": {POS: ADP},
    "F--+PS2": {POS: X},
    "CD-+PS3": {POS: NUM},
-    "PS1+VSA+T--":{POS: VERB},
+    "PS1+VSA+T--": {POS: VERB},
    "PS2+VSA": {POS: VERB},
    "VERB": {POS: VERB},
    "CC-+T--": {POS: NUM},
-    "NPD+PS2":{POS: NOUN},
+    "NPD+PS2": {POS: NOUN},
-    "D--+PS2":{POS: ADV},
+    "D--+PS2": {POS: ADV},
    "PP3+T--": {POS: PRON},
-    "X": {POS: X}}
+    "X": {POS: X},
+}
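For context: spaCy uses a language's TAG_MAP to resolve each fine-grained tag string to the coarse universal POS symbol stored under the POS key. A minimal sketch of that lookup follows; the two entries are copied from the table above, and the `coarse_pos` helper is illustrative only, not part of the diff.

    # Illustrative only: how a TAG_MAP entry resolves a fine-grained tag
    # to a coarse universal POS symbol.
    from spacy.symbols import POS, NOUN, VERB

    TAG_MAP = {
        "NSD": {POS: NOUN},
        "VSA": {POS: VERB},
    }

    def coarse_pos(tag):
        # Return the universal POS id stored under the POS key.
        return TAG_MAP[tag][POS]

    print(coarse_pos("NSD") == NOUN)  # True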
@ -4,67 +4,87 @@ from __future__ import unicode_literals
|
||||||
|
|
||||||
STOP_WORDS = set(
|
STOP_WORDS = set(
|
||||||
"""
|
"""
|
||||||
ಈ
|
|
||||||
ಮತ್ತು
|
|
||||||
ಹಾಗೂ
|
|
||||||
ಅವರು
|
|
||||||
ಅವರ
|
|
||||||
ಬಗ್ಗೆ
|
|
||||||
ಎಂಬ
|
|
||||||
ಆದರೆ
|
|
||||||
ಅವರನ್ನು
|
|
||||||
ಆದರೆ
|
|
||||||
ತಮ್ಮ
|
|
||||||
ಒಂದು
|
|
||||||
ಎಂದರು
|
|
||||||
ಮೇಲೆ
|
|
||||||
ಹೇಳಿದರು
|
|
||||||
ಸೇರಿದಂತೆ
|
|
||||||
ಬಳಿಕ
|
|
||||||
ಆ
|
|
||||||
ಯಾವುದೇ
|
|
||||||
ಅವರಿಗೆ
|
|
||||||
ನಡೆದ
|
|
||||||
ಕುರಿತು
|
|
||||||
ಇದು
|
|
||||||
ಅವರು
|
|
||||||
ಕಳೆದ
|
|
||||||
ಇದೇ
|
|
||||||
ತಿಳಿಸಿದರು
|
|
||||||
ಹೀಗಾಗಿ
|
|
||||||
ಕೂಡ
|
|
||||||
ತನ್ನ
|
|
||||||
ತಿಳಿಸಿದ್ದಾರೆ
|
|
||||||
ನಾನು
|
|
||||||
ಹೇಳಿದ್ದಾರೆ
|
|
||||||
ಈಗ
|
|
||||||
ಎಲ್ಲ
|
|
||||||
ನನ್ನ
|
|
||||||
ನಮ್ಮ
|
|
||||||
ಈಗಾಗಲೇ
|
|
||||||
ಇದಕ್ಕೆ
|
|
||||||
ಹಲವು
|
ಹಲವು
|
||||||
ಇದೆ
|
ಮೂಲಕ
|
||||||
ಮತ್ತೆ
|
ಹಾಗೂ
|
||||||
ಮಾಡುವ
|
|
||||||
ನೀಡಿದರು
|
|
||||||
ನಾವು
|
|
||||||
ನೀಡಿದ
|
|
||||||
ಇದರಿಂದ
|
|
||||||
ಅದು
|
ಅದು
|
||||||
ಇದನ್ನು
|
|
||||||
ನೀಡಿದ್ದಾರೆ
|
ನೀಡಿದ್ದಾರೆ
|
||||||
|
ಯಾವ
|
||||||
|
ಎಂದರು
|
||||||
|
ಅವರು
|
||||||
|
ಈಗ
|
||||||
|
ಎಂಬ
|
||||||
|
ಹಾಗಾಗಿ
|
||||||
|
ಅಷ್ಟೇ
|
||||||
|
ನಾವು
|
||||||
|
ಇದೇ
|
||||||
|
ಹೇಳಿ
|
||||||
|
ತಮ್ಮ
|
||||||
|
ಹೀಗೆ
|
||||||
|
ನಮ್ಮ
|
||||||
|
ಬೇರೆ
|
||||||
|
ನೀಡಿದರು
|
||||||
|
ಮತ್ತೆ
|
||||||
|
ಇದು
|
||||||
|
ಈ
|
||||||
|
ನೀವು
|
||||||
|
ನಾನು
|
||||||
|
ಇತ್ತು
|
||||||
|
ಎಲ್ಲಾ
|
||||||
|
ಯಾವುದೇ
|
||||||
|
ನಡೆದ
|
||||||
ಅದನ್ನು
|
ಅದನ್ನು
|
||||||
ಇಲ್ಲಿ
|
ಎಂದರೆ
|
||||||
ಆಗ
|
|
||||||
ಬಂದಿದೆ.
|
|
||||||
ಅದೇ
|
|
||||||
ಇರುವ
|
|
||||||
ಅಲ್ಲದೆ
|
|
||||||
ಕೆಲವು
|
|
||||||
ನೀಡಿದೆ
|
ನೀಡಿದೆ
|
||||||
|
ಹೀಗಾಗಿ
|
||||||
|
ಜೊತೆಗೆ
|
||||||
|
ಇದರಿಂದ
|
||||||
|
ನನಗೆ
|
||||||
|
ಅಲ್ಲದೆ
|
||||||
|
ಎಷ್ಟು
|
||||||
ಇದರ
|
ಇದರ
|
||||||
|
ಇಲ್ಲ
|
||||||
|
ಕಳೆದ
|
||||||
|
ತುಂಬಾ
|
||||||
|
ಈಗಾಗಲೇ
|
||||||
|
ಮಾಡಿ
|
||||||
|
ಅದಕ್ಕೆ
|
||||||
|
ಬಗ್ಗೆ
|
||||||
|
ಅವರ
|
||||||
|
ಇದನ್ನು
|
||||||
|
ಆ
|
||||||
|
ಇದೆ
|
||||||
|
ಹೆಚ್ಚು
|
||||||
ಇನ್ನು
|
ಇನ್ನು
|
||||||
|
ಎಲ್ಲ
|
||||||
|
ಇರುವ
|
||||||
|
ಅವರಿಗೆ
|
||||||
|
ನಿಮ್ಮ
|
||||||
|
ಏನು
|
||||||
|
ಕೂಡ
|
||||||
|
ಇಲ್ಲಿ
|
||||||
|
ನನ್ನನ್ನು
|
||||||
|
ಕೆಲವು
|
||||||
|
ಮಾತ್ರ
|
||||||
|
ಬಳಿಕ
|
||||||
|
ಅಂತ
|
||||||
|
ತನ್ನ
|
||||||
|
ಆಗ
|
||||||
|
ಅಥವಾ
|
||||||
|
ಅಲ್ಲ
|
||||||
|
ಕೇವಲ
|
||||||
|
ಆದರೆ
|
||||||
|
ಮತ್ತು
|
||||||
|
ಇನ್ನೂ
|
||||||
|
ಅದೇ
|
||||||
|
ಆಗಿ
|
||||||
|
ಅವರನ್ನು
|
||||||
|
ಹೇಳಿದ್ದಾರೆ
|
||||||
ನಡೆದಿದೆ
|
ನಡೆದಿದೆ
|
||||||
|
ಇದಕ್ಕೆ
|
||||||
|
ಎಂಬುದು
|
||||||
|
ಎಂದು
|
||||||
|
ನನ್ನ
|
||||||
|
ಮೇಲೆ
|
||||||
""".split()
|
""".split()
|
||||||
)
|
)
|
||||||
|
|
20
spacy/lang/mr/__init__.py
Normal file

@@ -0,0 +1,20 @@
+#coding: utf8
+from __future__ import unicode_literals
+
+from .stop_words import STOP_WORDS
+from ...language import Language
+from ...attrs import LANG
+
+
+class MarathiDefaults(Language.Defaults):
+    lex_attr_getters = dict(Language.Defaults.lex_attr_getters)
+    lex_attr_getters[LANG] = lambda text: "mr"
+    stop_words = STOP_WORDS
+
+
+class Marathi(Language):
+    lang = "mr"
+    Defaults = MarathiDefaults
+
+
+__all__ = ["Marathi"]
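As a quick usage sketch (not part of the diff): the new `Marathi` base language can be instantiated like any other blank spaCy language. This commit only supplies the "mr" language code and stop words, so the pipeline below does tokenization only and the example text is arbitrary.

    # Minimal sketch: the new base language provides tokenization and stop words.
    from spacy.lang.mr import Marathi

    nlp = Marathi()
    doc = nlp("नमस्कार जग")  # arbitrary example text; default tokenization applies
    print([token.text for token in doc])
    print(len(nlp.Defaults.stop_words) > 0)  # stop words loaded from stop_words.py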
196
spacy/lang/mr/stop_words.py
Normal file

@@ -0,0 +1,196 @@
|
||||||
|
# coding: utf8
|
||||||
|
from __future__ import unicode_literals
|
||||||
|
|
||||||
|
|
||||||
|
# Source: https://github.com/stopwords-iso/stopwords-mr/blob/master/stopwords-mr.txt, https://github.com/6/stopwords-json/edit/master/dist/mr.json
|
||||||
|
STOP_WORDS = set(
|
||||||
|
"""
|
||||||
|
न
|
||||||
|
अतरी
|
||||||
|
तो
|
||||||
|
हें
|
||||||
|
तें
|
||||||
|
कां
|
||||||
|
आणि
|
||||||
|
जें
|
||||||
|
जे
|
||||||
|
मग
|
||||||
|
ते
|
||||||
|
मी
|
||||||
|
जो
|
||||||
|
परी
|
||||||
|
गा
|
||||||
|
हे
|
||||||
|
ऐसें
|
||||||
|
आतां
|
||||||
|
नाहीं
|
||||||
|
तेथ
|
||||||
|
हा
|
||||||
|
तया
|
||||||
|
असे
|
||||||
|
म्हणे
|
||||||
|
काय
|
||||||
|
कीं
|
||||||
|
जैसें
|
||||||
|
तंव
|
||||||
|
तूं
|
||||||
|
होय
|
||||||
|
जैसा
|
||||||
|
आहे
|
||||||
|
पैं
|
||||||
|
तैसा
|
||||||
|
जरी
|
||||||
|
म्हणोनि
|
||||||
|
एक
|
||||||
|
ऐसा
|
||||||
|
जी
|
||||||
|
ना
|
||||||
|
मज
|
||||||
|
एथ
|
||||||
|
या
|
||||||
|
जेथ
|
||||||
|
जया
|
||||||
|
तुज
|
||||||
|
तेणें
|
||||||
|
तैं
|
||||||
|
पां
|
||||||
|
असो
|
||||||
|
करी
|
||||||
|
ऐसी
|
||||||
|
येणें
|
||||||
|
जाहला
|
||||||
|
तेंचि
|
||||||
|
आघवें
|
||||||
|
होती
|
||||||
|
कांहीं
|
||||||
|
होऊनि
|
||||||
|
एकें
|
||||||
|
मातें
|
||||||
|
ठायीं
|
||||||
|
ये
|
||||||
|
सकळ
|
||||||
|
केलें
|
||||||
|
जेणें
|
||||||
|
जाण
|
||||||
|
जैसी
|
||||||
|
होये
|
||||||
|
जेवीं
|
||||||
|
एऱ्हवीं
|
||||||
|
मीचि
|
||||||
|
किरीटी
|
||||||
|
दिसे
|
||||||
|
देवा
|
||||||
|
हो
|
||||||
|
तरि
|
||||||
|
कीजे
|
||||||
|
तैसे
|
||||||
|
आपण
|
||||||
|
तिये
|
||||||
|
कर्म
|
||||||
|
नोहे
|
||||||
|
इये
|
||||||
|
पडे
|
||||||
|
माझें
|
||||||
|
तैसी
|
||||||
|
लागे
|
||||||
|
नाना
|
||||||
|
जंव
|
||||||
|
कीर
|
||||||
|
अधिक
|
||||||
|
अनेक
|
||||||
|
अशी
|
||||||
|
असलयाचे
|
||||||
|
असलेल्या
|
||||||
|
असा
|
||||||
|
असून
|
||||||
|
असे
|
||||||
|
आज
|
||||||
|
आणि
|
||||||
|
आता
|
||||||
|
आपल्या
|
||||||
|
आला
|
||||||
|
आली
|
||||||
|
आले
|
||||||
|
आहे
|
||||||
|
आहेत
|
||||||
|
एक
|
||||||
|
एका
|
||||||
|
कमी
|
||||||
|
करणयात
|
||||||
|
करून
|
||||||
|
का
|
||||||
|
काम
|
||||||
|
काय
|
||||||
|
काही
|
||||||
|
किवा
|
||||||
|
की
|
||||||
|
केला
|
||||||
|
केली
|
||||||
|
केले
|
||||||
|
कोटी
|
||||||
|
गेल्या
|
||||||
|
घेऊन
|
||||||
|
जात
|
||||||
|
झाला
|
||||||
|
झाली
|
||||||
|
झाले
|
||||||
|
झालेल्या
|
||||||
|
टा
|
||||||
|
तर
|
||||||
|
तरी
|
||||||
|
तसेच
|
||||||
|
ता
|
||||||
|
ती
|
||||||
|
तीन
|
||||||
|
ते
|
||||||
|
तो
|
||||||
|
त्या
|
||||||
|
त्याचा
|
||||||
|
त्याची
|
||||||
|
त्याच्या
|
||||||
|
त्याना
|
||||||
|
त्यानी
|
||||||
|
त्यामुळे
|
||||||
|
त्री
|
||||||
|
दिली
|
||||||
|
दोन
|
||||||
|
न
|
||||||
|
पण
|
||||||
|
पम
|
||||||
|
परयतन
|
||||||
|
पाटील
|
||||||
|
म
|
||||||
|
मात्र
|
||||||
|
माहिती
|
||||||
|
मी
|
||||||
|
मुबी
|
||||||
|
म्हणजे
|
||||||
|
म्हणाले
|
||||||
|
म्हणून
|
||||||
|
या
|
||||||
|
याचा
|
||||||
|
याची
|
||||||
|
याच्या
|
||||||
|
याना
|
||||||
|
यानी
|
||||||
|
येणार
|
||||||
|
येत
|
||||||
|
येथील
|
||||||
|
येथे
|
||||||
|
लाख
|
||||||
|
व
|
||||||
|
व्यकत
|
||||||
|
सर्व
|
||||||
|
सागित्ले
|
||||||
|
सुरू
|
||||||
|
हजार
|
||||||
|
हा
|
||||||
|
ही
|
||||||
|
हे
|
||||||
|
होणार
|
||||||
|
होत
|
||||||
|
होता
|
||||||
|
होती
|
||||||
|
होते
|
||||||
|
""".split()
|
||||||
|
)
|
|
@@ -6,10 +6,7 @@ from .lex_attrs import LEX_ATTRS
from .tag_map import TAG_MAP
from .tokenizer_exceptions import TOKENIZER_EXCEPTIONS
from .punctuation import TOKENIZER_INFIXES, TOKENIZER_SUFFIXES
-from .lemmatizer import LOOKUP, LEMMA_EXC, LEMMA_INDEX, RULES
-from .lemmatizer.lemmatizer import DutchLemmatizer
+from .lemmatizer import LOOKUP, LEMMA_EXC, LEMMA_INDEX, RULES, DutchLemmatizer

from ..tokenizer_exceptions import BASE_EXCEPTIONS
from ..norm_exceptions import BASE_NORMS
from ...language import Language

@@ -21,9 +18,10 @@ class DutchDefaults(Language.Defaults):
    lex_attr_getters = dict(Language.Defaults.lex_attr_getters)
    lex_attr_getters.update(LEX_ATTRS)
-    lex_attr_getters[LANG] = lambda text: 'nl'
-    lex_attr_getters[NORM] = add_lookups(Language.Defaults.lex_attr_getters[NORM],
-                                         BASE_NORMS)
+    lex_attr_getters[LANG] = lambda text: "nl"
+    lex_attr_getters[NORM] = add_lookups(
+        Language.Defaults.lex_attr_getters[NORM], BASE_NORMS
+    )
    tokenizer_exceptions = update_exc(BASE_EXCEPTIONS, TOKENIZER_EXCEPTIONS)
    stop_words = STOP_WORDS
    tag_map = TAG_MAP

@@ -36,15 +34,14 @@ class DutchDefaults(Language.Defaults):
        lemma_index = LEMMA_INDEX
        lemma_exc = LEMMA_EXC
        lemma_lookup = LOOKUP
-        return DutchLemmatizer(index=lemma_index,
-                               exceptions=lemma_exc,
-                               lookup=lemma_lookup,
-                               rules=rules)
+        return DutchLemmatizer(
+            index=lemma_index, exceptions=lemma_exc, lookup=lemma_lookup, rules=rules
+        )


class Dutch(Language):
-    lang = 'nl'
+    lang = "nl"
    Defaults = DutchDefaults


-__all__ = ['Dutch']
+__all__ = ["Dutch"]
@@ -18,23 +18,26 @@ from ._adpositions import ADPOSITIONS
from ._determiners import DETERMINERS

from .lookup import LOOKUP

from ._lemma_rules import RULES

+from .lemmatizer import DutchLemmatizer


-LEMMA_INDEX = {"adj": ADJECTIVES,
-               "noun": NOUNS,
-               "verb": VERBS,
-               "adp": ADPOSITIONS,
-               "det": DETERMINERS}
+LEMMA_INDEX = {
+    "adj": ADJECTIVES,
+    "noun": NOUNS,
+    "verb": VERBS,
+    "adp": ADPOSITIONS,
+    "det": DETERMINERS,
+}

-LEMMA_EXC = {"adj": ADJECTIVES_IRREG,
-             "adv": ADVERBS_IRREG,
-             "adp": ADPOSITIONS_IRREG,
-             "noun": NOUNS_IRREG,
-             "verb": VERBS_IRREG,
-             "det": DETERMINERS_IRREG,
-             "pron": PRONOUNS_IRREG}
+LEMMA_EXC = {
+    "adj": ADJECTIVES_IRREG,
+    "adv": ADVERBS_IRREG,
+    "adp": ADPOSITIONS_IRREG,
+    "noun": NOUNS_IRREG,
+    "verb": VERBS_IRREG,
+    "det": DETERMINERS_IRREG,
+    "pron": PRONOUNS_IRREG,
+}

+__all__ = ["LOOKUP", "LEMMA_EXC", "LEMMA_INDEX", "RULES", "DutchLemmatizer"]
@@ -1,7 +1,7 @@
# coding: utf8
from __future__ import unicode_literals

-from ...symbols import ORTH, LEMMA, TAG, NORM, PRON_LEMMA
+from ...symbols import ORTH

# Extensive list of both common and uncommon dutch abbreviations copied from
# github.com/diasks2/pragmatic_segmenter, a Ruby library for rule-based

@@ -16,7 +16,7 @@ from ...symbols import ORTH, LEMMA, TAG, NORM, PRON_LEMMA
# are extremely domain-specific. Tokenizer performance may benefit from some
# slight pruning, although no performance regression has been observed so far.

+# fmt: off
abbrevs = ['a.2d.', 'a.a.', 'a.a.j.b.', 'a.f.t.', 'a.g.j.b.',
           'a.h.v.', 'a.h.w.', 'a.hosp.', 'a.i.', 'a.j.b.', 'a.j.t.',
           'a.m.', 'a.m.r.', 'a.p.m.', 'a.p.r.', 'a.p.t.', 'a.s.',

@@ -326,7 +326,7 @@ abbrevs = ['a.2d.', 'a.a.', 'a.a.j.b.', 'a.f.t.', 'a.g.j.b.',
           'wtvb.', 'ww.', 'x.d.', 'z.a.', 'z.g.', 'z.i.', 'z.j.',
           'z.o.z.', 'z.p.', 'z.s.m.', 'zg.', 'zgn.', 'zn.', 'znw.',
           'zr.', 'zr.', 'ms.', 'zr.ms.']
+# fmt: on

_exc = {}
for orth in abbrevs:
@@ -53,4 +53,11 @@ BASE_NORMS = {
    "US$": "$",
    "C$": "$",
    "A$": "$",
+    "₺": "$",
+    "₹": "$",
+    "৳": "$",
+    "₩": "$",
+    "Mex$": "$",
+    "₣": "$",
+    "E£": "$",
}
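These entries extend the shared NORM lookup table, which the language defaults chain together via `add_lookups` (as seen in the Dutch and Thai `__init__` hunks in this diff), so all of these currency symbols normalise to "$". A rough standalone sketch of that chained lookup; the `norm` helper below is a simplified stand-in, not the real `spacy.util.add_lookups`.

    # Simplified stand-in for the NORM lookup chain: check each table in turn,
    # fall back to the lowercase form. Only a tiny excerpt of the table is used.
    BASE_NORMS = {"US$": "$", "₹": "$", "₩": "$"}

    def norm(string, *lookups):
        for table in lookups:
            if string in table:
                return table[string]
        return string.lower()

    print(norm("₹", BASE_NORMS))      # "$"
    print(norm("Gimme", BASE_NORMS))  # "gimme"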
@@ -4,11 +4,14 @@ from __future__ import unicode_literals
from .tokenizer_exceptions import TOKENIZER_EXCEPTIONS
from .tag_map import TAG_MAP
from .stop_words import STOP_WORDS
+from .norm_exceptions import NORM_EXCEPTIONS
+from .lex_attrs import LEX_ATTRS

-from ...attrs import LANG
+from ..norm_exceptions import BASE_NORMS
+from ...attrs import LANG, NORM
from ...language import Language
from ...tokens import Doc
-from ...util import DummyTokenizer
+from ...util import DummyTokenizer, add_lookups


class ThaiTokenizer(DummyTokenizer):

@@ -25,15 +28,18 @@ class ThaiTokenizer(DummyTokenizer):
        self.vocab = nlp.vocab if nlp is not None else cls.create_vocab(nlp)

    def __call__(self, text):
-        words = list(self.word_tokenize(text, "newmm"))
+        words = list(self.word_tokenize(text))
        spaces = [False] * len(words)
        return Doc(self.vocab, words=words, spaces=spaces)


class ThaiDefaults(Language.Defaults):
    lex_attr_getters = dict(Language.Defaults.lex_attr_getters)
+    lex_attr_getters.update(LEX_ATTRS)
    lex_attr_getters[LANG] = lambda _text: "th"
+    lex_attr_getters[NORM] = add_lookups(
+        Language.Defaults.lex_attr_getters[NORM], BASE_NORMS, NORM_EXCEPTIONS
+    )
    tokenizer_exceptions = dict(TOKENIZER_EXCEPTIONS)
    tag_map = TAG_MAP
    stop_words = STOP_WORDS
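A hedged usage sketch for the Thai changes above: `ThaiTokenizer` wraps an external word segmenter (the optional `pythainlp` package must be installed for the code below to run), and `__call__` builds a `Doc` with one token per segmented word and no trailing spaces.

    # Minimal sketch; requires the optional pythainlp dependency.
    from spacy.lang.th import Thai

    nlp = Thai()
    doc = nlp("ผมรักคุณ")  # arbitrary example sentence
    print([token.text for token in doc])         # segmented words
    print([token.whitespace_ for token in doc])  # all empty: spaces=[False] * len(words)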
62
spacy/lang/th/lex_attrs.py
Normal file

@@ -0,0 +1,62 @@
+# coding: utf8
+from __future__ import unicode_literals
+
+from ...attrs import LIKE_NUM
+
+
+_num_words = [
+    "ศูนย์",
+    "หนึ่ง",
+    "สอง",
+    "สาม",
+    "สี่",
+    "ห้า",
+    "หก",
+    "เจ็ด",
+    "แปด",
+    "เก้า",
+    "สิบ",
+    "สิบเอ็ด",
+    "ยี่สิบ",
+    "ยี่สิบเอ็ด",
+    "สามสิบ",
+    "สามสิบเอ็ด",
+    "สี่สิบ",
+    "สี่สิบเอ็ด",
+    "ห้าสิบ",
+    "ห้าสิบเอ็ด",
+    "หกสิบเอ็ด",
+    "เจ็ดสิบ",
+    "เจ็ดสิบเอ็ด",
+    "แปดสิบ",
+    "แปดสิบเอ็ด",
+    "เก้าสิบ",
+    "เก้าสิบเอ็ด",
+    "ร้อย",
+    "พัน",
+    "ล้าน",
+    "พันล้าน",
+    "หมื่นล้าน",
+    "แสนล้าน",
+    "ล้านล้าน",
+    "ล้านล้านล้าน",
+    "ล้านล้านล้านล้าน",
+]
+
+
+def like_num(text):
+    if text.startswith(("+", "-", "±", "~")):
+        text = text[1:]
+    text = text.replace(",", "").replace(".", "")
+    if text.isdigit():
+        return True
+    if text.count("/") == 1:
+        num, denom = text.split("/")
+        if num.isdigit() and denom.isdigit():
+            return True
+    if text in _num_words:
+        return True
+    return False
+
+
+LEX_ATTRS = {LIKE_NUM: like_num}
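The `like_num` hook above is registered through `LEX_ATTRS`, so it ultimately backs `Token.like_num` for Thai. Exercising the function directly shows the three paths it accepts: digit strings with separators stripped, simple fractions, and words from `_num_words`.

    from spacy.lang.th.lex_attrs import like_num

    print(like_num("10,000.5"))  # True: "," and "." are stripped before isdigit()
    print(like_num("1/2"))       # True: simple fractions with digit parts
    print(like_num("สิบเอ็ด"))    # True: listed Thai number word
    print(like_num("แมว"))       # False: not a number word ("cat")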
113
spacy/lang/th/norm_exceptions.py
Normal file

@@ -0,0 +1,113 @@
|
||||||
|
# coding: utf8
|
||||||
|
from __future__ import unicode_literals
|
||||||
|
|
||||||
|
|
||||||
|
_exc = {
|
||||||
|
# Conjugation and Diversion invalid to Tonal form (ผันอักษรและเสียงไม่ตรงกับรูปวรรณยุกต์)
|
||||||
|
"สนุ๊กเกอร์": "สนุกเกอร์",
|
||||||
|
"โน้ต": "โน้ต",
|
||||||
|
# Misspelled because of being lazy or hustle (สะกดผิดเพราะขี้เกียจพิมพ์ หรือเร่งรีบ)
|
||||||
|
"โทสับ": "โทรศัพท์",
|
||||||
|
"พุ่งนี้": "พรุ่งนี้",
|
||||||
|
# Strange (ให้ดูแปลกตา)
|
||||||
|
"ชะมะ": "ใช่ไหม",
|
||||||
|
"ชิมิ": "ใช่ไหม",
|
||||||
|
"ชะ": "ใช่ไหม",
|
||||||
|
"ช่ายมะ": "ใช่ไหม",
|
||||||
|
"ป่าว": "เปล่า",
|
||||||
|
"ป่ะ": "เปล่า",
|
||||||
|
"ปล่าว": "เปล่า",
|
||||||
|
"คัย": "ใคร",
|
||||||
|
"ไค": "ใคร",
|
||||||
|
"คราย": "ใคร",
|
||||||
|
"เตง": "ตัวเอง",
|
||||||
|
"ตะเอง": "ตัวเอง",
|
||||||
|
"รึ": "หรือ",
|
||||||
|
"เหรอ": "หรือ",
|
||||||
|
"หรา": "หรือ",
|
||||||
|
"หรอ": "หรือ",
|
||||||
|
"ชั้น": "ฉัน",
|
||||||
|
"ชั้ล": "ฉัน",
|
||||||
|
"ช้าน": "ฉัน",
|
||||||
|
"เทอ": "เธอ",
|
||||||
|
"เทอร์": "เธอ",
|
||||||
|
"เทอว์": "เธอ",
|
||||||
|
"แกร": "แก",
|
||||||
|
"ป๋ม": "ผม",
|
||||||
|
"บ่องตง": "บอกตรงๆ",
|
||||||
|
"ถ่ามตง": "ถามตรงๆ",
|
||||||
|
"ต่อมตง": "ตอบตรงๆ",
|
||||||
|
"เพิ่ล": "เพื่อน",
|
||||||
|
"จอบอ": "จอบอ",
|
||||||
|
"ดั้ย": "ได้",
|
||||||
|
"ขอบคุง": "ขอบคุณ",
|
||||||
|
"ยังงัย": "ยังไง",
|
||||||
|
"Inw": "เทพ",
|
||||||
|
"uou": "นอน",
|
||||||
|
"Lกรีeu": "เกรียน",
|
||||||
|
# Misspelled to express emotions (คำที่สะกดผิดเพื่อแสดงอารมณ์)
|
||||||
|
"เปงราย": "เป็นอะไร",
|
||||||
|
"เปนรัย": "เป็นอะไร",
|
||||||
|
"เปงรัย": "เป็นอะไร",
|
||||||
|
"เป็นอัลไล": "เป็นอะไร",
|
||||||
|
"ทามมาย": "ทำไม",
|
||||||
|
"ทามมัย": "ทำไม",
|
||||||
|
"จังรุย": "จังเลย",
|
||||||
|
"จังเยย": "จังเลย",
|
||||||
|
"จุงเบย": "จังเลย",
|
||||||
|
"ไม่รู้": "มะรุ",
|
||||||
|
"เฮ่ย": "เฮ้ย",
|
||||||
|
"เห้ย": "เฮ้ย",
|
||||||
|
"น่าร็อค": "น่ารัก",
|
||||||
|
"น่าร๊าก": "น่ารัก",
|
||||||
|
"ตั้ลล๊าก": "น่ารัก",
|
||||||
|
"คือร๊ะ": "คืออะไร",
|
||||||
|
"โอป่ะ": "โอเคหรือเปล่า",
|
||||||
|
"น่ามคาน": "น่ารำคาญ",
|
||||||
|
"น่ามสาร": "น่าสงสาร",
|
||||||
|
"วงวาร": "สงสาร",
|
||||||
|
"บับว่า": "แบบว่า",
|
||||||
|
"อัลไล": "อะไร",
|
||||||
|
"อิจ": "อิจฉา",
|
||||||
|
# Reduce rough words or Avoid to software filter (คำที่สะกดผิดเพื่อลดความหยาบของคำ หรืออาจใช้หลีกเลี่ยงการกรองคำหยาบของซอฟต์แวร์)
|
||||||
|
"กรู": "กู",
|
||||||
|
"กุ": "กู",
|
||||||
|
"กรุ": "กู",
|
||||||
|
"ตู": "กู",
|
||||||
|
"ตรู": "กู",
|
||||||
|
"มรึง": "มึง",
|
||||||
|
"เมิง": "มึง",
|
||||||
|
"มืง": "มึง",
|
||||||
|
"มุง": "มึง",
|
||||||
|
"สาด": "สัตว์",
|
||||||
|
"สัส": "สัตว์",
|
||||||
|
"สัก": "สัตว์",
|
||||||
|
"แสรด": "สัตว์",
|
||||||
|
"โคโตะ": "โคตร",
|
||||||
|
"โคด": "โคตร",
|
||||||
|
"โครต": "โคตร",
|
||||||
|
"โคตะระ": "โคตร",
|
||||||
|
"พ่อง": "พ่อมึง",
|
||||||
|
"แม่เมิง": "แม่มึง",
|
||||||
|
"เชี่ย": "เหี้ย",
|
||||||
|
# Imitate words (คำเลียนเสียง โดยส่วนใหญ่จะเพิ่มทัณฑฆาต หรือซ้ำตัวอักษร)
|
||||||
|
"แอร๊ยย": "อ๊าย",
|
||||||
|
"อร๊ายยย": "อ๊าย",
|
||||||
|
"มันส์": "มัน",
|
||||||
|
"วู๊วววววววว์": "วู้",
|
||||||
|
# Acronym (แบบคำย่อ)
|
||||||
|
"หมาลัย": "มหาวิทยาลัย",
|
||||||
|
"วิดวะ": "วิศวะ",
|
||||||
|
"สินสาด ": "ศิลปศาสตร์",
|
||||||
|
"สินกำ ": "ศิลปกรรมศาสตร์",
|
||||||
|
"เสารีย์ ": "อนุเสาวรีย์ชัยสมรภูมิ",
|
||||||
|
"เมกา ": "อเมริกา",
|
||||||
|
"มอไซค์ ": "มอเตอร์ไซค์",
|
||||||
|
}
|
||||||
|
|
||||||
|
|
||||||
|
NORM_EXCEPTIONS = {}
|
||||||
|
|
||||||
|
for string, norm in _exc.items():
|
||||||
|
NORM_EXCEPTIONS[string] = norm
|
||||||
|
NORM_EXCEPTIONS[string.title()] = norm
|
|
@ -5,7 +5,7 @@ from ...symbols import ORTH, LEMMA
|
||||||
|
|
||||||
|
|
||||||
_exc = {
|
_exc = {
|
||||||
#หน่วยงานรัฐ / government agency
|
# หน่วยงานรัฐ / government agency
|
||||||
"กกต.": [{ORTH: "กกต.", LEMMA: "คณะกรรมการการเลือกตั้ง"}],
|
"กกต.": [{ORTH: "กกต.", LEMMA: "คณะกรรมการการเลือกตั้ง"}],
|
||||||
"กทท.": [{ORTH: "กทท.", LEMMA: "การท่าเรือแห่งประเทศไทย"}],
|
"กทท.": [{ORTH: "กทท.", LEMMA: "การท่าเรือแห่งประเทศไทย"}],
|
||||||
"กทพ.": [{ORTH: "กทพ.", LEMMA: "การทางพิเศษแห่งประเทศไทย"}],
|
"กทพ.": [{ORTH: "กทพ.", LEMMA: "การทางพิเศษแห่งประเทศไทย"}],
|
||||||
|
@ -44,11 +44,21 @@ _exc = {
|
||||||
"ธอส.": [{ORTH: "ธอส.", LEMMA: "ธนาคารอาคารสงเคราะห์"}],
|
"ธอส.": [{ORTH: "ธอส.", LEMMA: "ธนาคารอาคารสงเคราะห์"}],
|
||||||
"นย.": [{ORTH: "นย.", LEMMA: "นาวิกโยธิน"}],
|
"นย.": [{ORTH: "นย.", LEMMA: "นาวิกโยธิน"}],
|
||||||
"ปตท.": [{ORTH: "ปตท.", LEMMA: "การปิโตรเลียมแห่งประเทศไทย"}],
|
"ปตท.": [{ORTH: "ปตท.", LEMMA: "การปิโตรเลียมแห่งประเทศไทย"}],
|
||||||
"ป.ป.ช.": [{ORTH: "ป.ป.ช.", LEMMA: "คณะกรรมการป้องกันและปราบปรามการทุจริตและประพฤติมิชอบในวงราชการ"}],
|
"ป.ป.ช.": [
|
||||||
|
{
|
||||||
|
ORTH: "ป.ป.ช.",
|
||||||
|
LEMMA: "คณะกรรมการป้องกันและปราบปรามการทุจริตและประพฤติมิชอบในวงราชการ",
|
||||||
|
}
|
||||||
|
],
|
||||||
"ป.ป.ส.": [{ORTH: "ป.ป.ส.", LEMMA: "คณะกรรมการป้องกันและปราบปรามยาเสพติด"}],
|
"ป.ป.ส.": [{ORTH: "ป.ป.ส.", LEMMA: "คณะกรรมการป้องกันและปราบปรามยาเสพติด"}],
|
||||||
"บพร.": [{ORTH: "บพร.", LEMMA: "กรมการบินพลเรือน"}],
|
"บพร.": [{ORTH: "บพร.", LEMMA: "กรมการบินพลเรือน"}],
|
||||||
"บย.": [{ORTH: "บย.", LEMMA: "กองบินยุทธการ"}],
|
"บย.": [{ORTH: "บย.", LEMMA: "กองบินยุทธการ"}],
|
||||||
"พสวท.": [{ORTH: "พสวท.", LEMMA: "โครงการพัฒนาและส่งเสริมผู้มีความรู้ความสามารถพิเศษทางวิทยาศาสตร์และเทคโนโลยี"}],
|
"พสวท.": [
|
||||||
|
{
|
||||||
|
ORTH: "พสวท.",
|
||||||
|
LEMMA: "โครงการพัฒนาและส่งเสริมผู้มีความรู้ความสามารถพิเศษทางวิทยาศาสตร์และเทคโนโลยี",
|
||||||
|
}
|
||||||
|
],
|
||||||
"มอก.": [{ORTH: "มอก.", LEMMA: "สำนักงานมาตรฐานผลิตภัณฑ์อุตสาหกรรม"}],
|
"มอก.": [{ORTH: "มอก.", LEMMA: "สำนักงานมาตรฐานผลิตภัณฑ์อุตสาหกรรม"}],
|
||||||
"ยธ.": [{ORTH: "ยธ.", LEMMA: "กรมโยธาธิการ"}],
|
"ยธ.": [{ORTH: "ยธ.", LEMMA: "กรมโยธาธิการ"}],
|
||||||
"รพช.": [{ORTH: "รพช.", LEMMA: "สำนักงานเร่งรัดพัฒนาชนบท"}],
|
"รพช.": [{ORTH: "รพช.", LEMMA: "สำนักงานเร่งรัดพัฒนาชนบท"}],
|
||||||
|
@ -71,11 +81,15 @@ _exc = {
|
||||||
"สปช.": [{ORTH: "สปช.", LEMMA: "สำนักงานคณะกรรมการการประถมศึกษาแห่งชาติ"}],
|
"สปช.": [{ORTH: "สปช.", LEMMA: "สำนักงานคณะกรรมการการประถมศึกษาแห่งชาติ"}],
|
||||||
"สปอ.": [{ORTH: "สปอ.", LEMMA: "สำนักงานการประถมศึกษาอำเภอ"}],
|
"สปอ.": [{ORTH: "สปอ.", LEMMA: "สำนักงานการประถมศึกษาอำเภอ"}],
|
||||||
"สพช.": [{ORTH: "สพช.", LEMMA: "สำนักงานคณะกรรมการนโยบายพลังงานแห่งชาติ"}],
|
"สพช.": [{ORTH: "สพช.", LEMMA: "สำนักงานคณะกรรมการนโยบายพลังงานแห่งชาติ"}],
|
||||||
"สยช.": [{ORTH: "สยช.", LEMMA: "สำนักงานคณะกรรมการส่งเสริมและประสานงานเยาวชนแห่งชาติ"}],
|
"สยช.": [
|
||||||
|
{ORTH: "สยช.", LEMMA: "สำนักงานคณะกรรมการส่งเสริมและประสานงานเยาวชนแห่งชาติ"}
|
||||||
|
],
|
||||||
"สวช.": [{ORTH: "สวช.", LEMMA: "สำนักงานคณะกรรมการวัฒนธรรมแห่งชาติ"}],
|
"สวช.": [{ORTH: "สวช.", LEMMA: "สำนักงานคณะกรรมการวัฒนธรรมแห่งชาติ"}],
|
||||||
"สวท.": [{ORTH: "สวท.", LEMMA: "สถานีวิทยุกระจายเสียงแห่งประเทศไทย"}],
|
"สวท.": [{ORTH: "สวท.", LEMMA: "สถานีวิทยุกระจายเสียงแห่งประเทศไทย"}],
|
||||||
"สวทช.": [{ORTH: "สวทช.", LEMMA: "สำนักงานพัฒนาวิทยาศาสตร์และเทคโนโลยีแห่งชาติ"}],
|
"สวทช.": [{ORTH: "สวทช.", LEMMA: "สำนักงานพัฒนาวิทยาศาสตร์และเทคโนโลยีแห่งชาติ"}],
|
||||||
"สคช.": [{ORTH: "สคช.", LEMMA: "สำนักงานคณะกรรมการพัฒนาการเศรษฐกิจและสังคมแห่งชาติ"}],
|
"สคช.": [
|
||||||
|
{ORTH: "สคช.", LEMMA: "สำนักงานคณะกรรมการพัฒนาการเศรษฐกิจและสังคมแห่งชาติ"}
|
||||||
|
],
|
||||||
"สสว.": [{ORTH: "สสว.", LEMMA: "สำนักงานส่งเสริมวิสาหกิจขนาดกลางและขนาดย่อม"}],
|
"สสว.": [{ORTH: "สสว.", LEMMA: "สำนักงานส่งเสริมวิสาหกิจขนาดกลางและขนาดย่อม"}],
|
||||||
"สสส.": [{ORTH: "สสส.", LEMMA: "สำนักงานกองทุนสนับสนุนการสร้างเสริมสุขภาพ"}],
|
"สสส.": [{ORTH: "สสส.", LEMMA: "สำนักงานกองทุนสนับสนุนการสร้างเสริมสุขภาพ"}],
|
||||||
"สสวท.": [{ORTH: "สสวท.", LEMMA: "สถาบันส่งเสริมการสอนวิทยาศาสตร์และเทคโนโลยี"}],
|
"สสวท.": [{ORTH: "สสวท.", LEMMA: "สถาบันส่งเสริมการสอนวิทยาศาสตร์และเทคโนโลยี"}],
|
||||||
|
@ -85,7 +99,7 @@ _exc = {
|
||||||
"อปพร.": [{ORTH: "อปพร.", LEMMA: "อาสาสมัครป้องกันภัยฝ่ายพลเรือน"}],
|
"อปพร.": [{ORTH: "อปพร.", LEMMA: "อาสาสมัครป้องกันภัยฝ่ายพลเรือน"}],
|
||||||
"อย.": [{ORTH: "อย.", LEMMA: "สำนักงานคณะกรรมการอาหารและยา"}],
|
"อย.": [{ORTH: "อย.", LEMMA: "สำนักงานคณะกรรมการอาหารและยา"}],
|
||||||
"อ.ส.ม.ท.": [{ORTH: "อ.ส.ม.ท.", LEMMA: "องค์การสื่อสารมวลชนแห่งประเทศไทย"}],
|
"อ.ส.ม.ท.": [{ORTH: "อ.ส.ม.ท.", LEMMA: "องค์การสื่อสารมวลชนแห่งประเทศไทย"}],
|
||||||
#มหาวิทยาลัย / สถานศึกษา / university / college
|
# มหาวิทยาลัย / สถานศึกษา / university / college
|
||||||
"มทส.": [{ORTH: "มทส.", LEMMA: "มหาวิทยาลัยเทคโนโลยีสุรนารี"}],
|
"มทส.": [{ORTH: "มทส.", LEMMA: "มหาวิทยาลัยเทคโนโลยีสุรนารี"}],
|
||||||
"มธ.": [{ORTH: "มธ.", LEMMA: "มหาวิทยาลัยธรรมศาสตร์"}],
|
"มธ.": [{ORTH: "มธ.", LEMMA: "มหาวิทยาลัยธรรมศาสตร์"}],
|
||||||
"ม.อ.": [{ORTH: "ม.อ.", LEMMA: "มหาวิทยาลัยสงขลานครินทร์"}],
|
"ม.อ.": [{ORTH: "ม.อ.", LEMMA: "มหาวิทยาลัยสงขลานครินทร์"}],
|
||||||
|
@ -93,7 +107,7 @@ _exc = {
|
||||||
"มมส.": [{ORTH: "มมส.", LEMMA: "มหาวิทยาลัยมหาสารคาม"}],
|
"มมส.": [{ORTH: "มมส.", LEMMA: "มหาวิทยาลัยมหาสารคาม"}],
|
||||||
"วท.": [{ORTH: "วท.", LEMMA: "วิทยาลัยเทคนิค"}],
|
"วท.": [{ORTH: "วท.", LEMMA: "วิทยาลัยเทคนิค"}],
|
||||||
"สตม.": [{ORTH: "สตม.", LEMMA: "สำนักงานตรวจคนเข้าเมือง (ตำรวจ)"}],
|
"สตม.": [{ORTH: "สตม.", LEMMA: "สำนักงานตรวจคนเข้าเมือง (ตำรวจ)"}],
|
||||||
#ยศ / rank
|
# ยศ / rank
|
||||||
"ดร.": [{ORTH: "ดร.", LEMMA: "ดอกเตอร์"}],
|
"ดร.": [{ORTH: "ดร.", LEMMA: "ดอกเตอร์"}],
|
||||||
"ด.ต.": [{ORTH: "ด.ต.", LEMMA: "ดาบตำรวจ"}],
|
"ด.ต.": [{ORTH: "ด.ต.", LEMMA: "ดาบตำรวจ"}],
|
||||||
"จ.ต.": [{ORTH: "จ.ต.", LEMMA: "จ่าตรี"}],
|
"จ.ต.": [{ORTH: "จ.ต.", LEMMA: "จ่าตรี"}],
|
||||||
|
@ -133,10 +147,14 @@ _exc = {
|
||||||
"ผญบ.": [{ORTH: "ผญบ.", LEMMA: "ผู้ใหญ่บ้าน"}],
|
"ผญบ.": [{ORTH: "ผญบ.", LEMMA: "ผู้ใหญ่บ้าน"}],
|
||||||
"ผบ.": [{ORTH: "ผบ.", LEMMA: "ผู้บังคับบัญชา"}],
|
"ผบ.": [{ORTH: "ผบ.", LEMMA: "ผู้บังคับบัญชา"}],
|
||||||
"ผบก.": [{ORTH: "ผบก.", LEMMA: "ผู้บังคับบัญชาการ (ตำรวจ)"}],
|
"ผบก.": [{ORTH: "ผบก.", LEMMA: "ผู้บังคับบัญชาการ (ตำรวจ)"}],
|
||||||
"ผบก.": [{ORTH: "ผบก.", LEMMA: "ผู้บังคับการ (ตำรวจ)"}],
|
|
||||||
"ผบก.น.": [{ORTH: "ผบก.น.", LEMMA: "ผู้บังคับการตำรวจนครบาล"}],
|
"ผบก.น.": [{ORTH: "ผบก.น.", LEMMA: "ผู้บังคับการตำรวจนครบาล"}],
|
||||||
"ผบก.ป.": [{ORTH: "ผบก.ป.", LEMMA: "ผู้บังคับการตำรวจกองปราบปราม"}],
|
"ผบก.ป.": [{ORTH: "ผบก.ป.", LEMMA: "ผู้บังคับการตำรวจกองปราบปราม"}],
|
||||||
"ผบก.ปค.": [{ORTH: "ผบก.ปค.", LEMMA: "ผู้บังคับการ กองบังคับการปกครอง (โรงเรียนนายร้อยตำรวจ)"}],
|
"ผบก.ปค.": [
|
||||||
|
{
|
||||||
|
ORTH: "ผบก.ปค.",
|
||||||
|
LEMMA: "ผู้บังคับการ กองบังคับการปกครอง (โรงเรียนนายร้อยตำรวจ)",
|
||||||
|
}
|
||||||
|
],
|
||||||
"ผบก.ปม.": [{ORTH: "ผบก.ปม.", LEMMA: "ผู้บังคับการตำรวจป่าไม้"}],
|
"ผบก.ปม.": [{ORTH: "ผบก.ปม.", LEMMA: "ผู้บังคับการตำรวจป่าไม้"}],
|
||||||
"ผบก.ภ.": [{ORTH: "ผบก.ภ.", LEMMA: "ผู้บังคับการตำรวจภูธร"}],
|
"ผบก.ภ.": [{ORTH: "ผบก.ภ.", LEMMA: "ผู้บังคับการตำรวจภูธร"}],
|
||||||
"ผบช.": [{ORTH: "ผบช.", LEMMA: "ผู้บัญชาการ (ตำรวจ)"}],
|
"ผบช.": [{ORTH: "ผบช.", LEMMA: "ผู้บัญชาการ (ตำรวจ)"}],
|
||||||
|
@ -177,7 +195,6 @@ _exc = {
|
||||||
"พล.อ.ต.": [{ORTH: "พล.อ.ต.", LEMMA: "พลอากาศตรี"}],
|
"พล.อ.ต.": [{ORTH: "พล.อ.ต.", LEMMA: "พลอากาศตรี"}],
|
||||||
"พล.อ.ท.": [{ORTH: "พล.อ.ท.", LEMMA: "พลอากาศโท"}],
|
"พล.อ.ท.": [{ORTH: "พล.อ.ท.", LEMMA: "พลอากาศโท"}],
|
||||||
"พล.อ.อ.": [{ORTH: "พล.อ.อ.", LEMMA: "พลอากาศเอก"}],
|
"พล.อ.อ.": [{ORTH: "พล.อ.อ.", LEMMA: "พลอากาศเอก"}],
|
||||||
"พ.อ.": [{ORTH: "พ.อ.", LEMMA: "พันเอก"}],
|
|
||||||
"พ.อ.พิเศษ": [{ORTH: "พ.อ.พิเศษ", LEMMA: "พันเอกพิเศษ"}],
|
"พ.อ.พิเศษ": [{ORTH: "พ.อ.พิเศษ", LEMMA: "พันเอกพิเศษ"}],
|
||||||
"พ.อ.ต.": [{ORTH: "พ.อ.ต.", LEMMA: "พันจ่าอากาศตรี"}],
|
"พ.อ.ต.": [{ORTH: "พ.อ.ต.", LEMMA: "พันจ่าอากาศตรี"}],
|
||||||
"พ.อ.ท.": [{ORTH: "พ.อ.ท.", LEMMA: "พันจ่าอากาศโท"}],
|
"พ.อ.ท.": [{ORTH: "พ.อ.ท.", LEMMA: "พันจ่าอากาศโท"}],
|
||||||
|
@ -209,7 +226,7 @@ _exc = {
|
||||||
"ส.อ.": [{ORTH: "ส.อ.", LEMMA: "สิบเอก"}],
|
"ส.อ.": [{ORTH: "ส.อ.", LEMMA: "สิบเอก"}],
|
||||||
"อจ.": [{ORTH: "อจ.", LEMMA: "อาจารย์"}],
|
"อจ.": [{ORTH: "อจ.", LEMMA: "อาจารย์"}],
|
||||||
"อจญ.": [{ORTH: "อจญ.", LEMMA: "อาจารย์ใหญ่"}],
|
"อจญ.": [{ORTH: "อจญ.", LEMMA: "อาจารย์ใหญ่"}],
|
||||||
#วุฒิ / bachelor degree
|
# วุฒิ / bachelor degree
|
||||||
"ป.": [{ORTH: "ป.", LEMMA: "ประถมศึกษา"}],
|
"ป.": [{ORTH: "ป.", LEMMA: "ประถมศึกษา"}],
|
||||||
"ป.กศ.": [{ORTH: "ป.กศ.", LEMMA: "ประกาศนียบัตรวิชาการศึกษา"}],
|
"ป.กศ.": [{ORTH: "ป.กศ.", LEMMA: "ประกาศนียบัตรวิชาการศึกษา"}],
|
||||||
"ป.กศ.สูง": [{ORTH: "ป.กศ.สูง", LEMMA: "ประกาศนียบัตรวิชาการศึกษาชั้นสูง"}],
|
"ป.กศ.สูง": [{ORTH: "ป.กศ.สูง", LEMMA: "ประกาศนียบัตรวิชาการศึกษาชั้นสูง"}],
|
||||||
|
@ -283,20 +300,20 @@ _exc = {
|
||||||
"อ.บ.": [{ORTH: "อ.บ.", LEMMA: "อักษรศาสตรบัณฑิต"}],
|
"อ.บ.": [{ORTH: "อ.บ.", LEMMA: "อักษรศาสตรบัณฑิต"}],
|
||||||
"อ.ม.": [{ORTH: "อ.ม.", LEMMA: "อักษรศาสตรมหาบัณฑิต"}],
|
"อ.ม.": [{ORTH: "อ.ม.", LEMMA: "อักษรศาสตรมหาบัณฑิต"}],
|
||||||
"อ.ด.": [{ORTH: "อ.ด.", LEMMA: "อักษรศาสตรดุษฎีบัณฑิต"}],
|
"อ.ด.": [{ORTH: "อ.ด.", LEMMA: "อักษรศาสตรดุษฎีบัณฑิต"}],
|
||||||
#ปี / เวลา / year / time
|
# ปี / เวลา / year / time
|
||||||
"ชม.": [{ORTH: "ชม.", LEMMA: "ชั่วโมง"}],
|
"ชม.": [{ORTH: "ชม.", LEMMA: "ชั่วโมง"}],
|
||||||
"จ.ศ.": [{ORTH: "จ.ศ.", LEMMA: "จุลศักราช"}],
|
"จ.ศ.": [{ORTH: "จ.ศ.", LEMMA: "จุลศักราช"}],
|
||||||
"ค.ศ.": [{ORTH: "ค.ศ.", LEMMA: "คริสต์ศักราช"}],
|
"ค.ศ.": [{ORTH: "ค.ศ.", LEMMA: "คริสต์ศักราช"}],
|
||||||
"ฮ.ศ.": [{ORTH: "ฮ.ศ.", LEMMA: "ฮิจเราะห์ศักราช"}],
|
"ฮ.ศ.": [{ORTH: "ฮ.ศ.", LEMMA: "ฮิจเราะห์ศักราช"}],
|
||||||
"ว.ด.ป.": [{ORTH: "ว.ด.ป.", LEMMA: "วัน เดือน ปี"}],
|
"ว.ด.ป.": [{ORTH: "ว.ด.ป.", LEMMA: "วัน เดือน ปี"}],
|
||||||
#ระยะทาง / distance
|
# ระยะทาง / distance
|
||||||
"ฮม.": [{ORTH: "ฮม.", LEMMA: "เฮกโตเมตร"}],
|
"ฮม.": [{ORTH: "ฮม.", LEMMA: "เฮกโตเมตร"}],
|
||||||
"ดคม.": [{ORTH: "ดคม.", LEMMA: "เดคาเมตร"}],
|
"ดคม.": [{ORTH: "ดคม.", LEMMA: "เดคาเมตร"}],
|
||||||
"ดม.": [{ORTH: "ดม.", LEMMA: "เดซิเมตร"}],
|
"ดม.": [{ORTH: "ดม.", LEMMA: "เดซิเมตร"}],
|
||||||
"มม.": [{ORTH: "มม.", LEMMA: "มิลลิเมตร"}],
|
"มม.": [{ORTH: "มม.", LEMMA: "มิลลิเมตร"}],
|
||||||
"ซม.": [{ORTH: "ซม.", LEMMA: "เซนติเมตร"}],
|
"ซม.": [{ORTH: "ซม.", LEMMA: "เซนติเมตร"}],
|
||||||
"กม.": [{ORTH: "กม.", LEMMA: "กิโลเมตร"}],
|
"กม.": [{ORTH: "กม.", LEMMA: "กิโลเมตร"}],
|
||||||
#น้ำหนัก / weight
|
# น้ำหนัก / weight
|
||||||
"น.น.": [{ORTH: "น.น.", LEMMA: "น้ำหนัก"}],
|
"น.น.": [{ORTH: "น.น.", LEMMA: "น้ำหนัก"}],
|
||||||
"ฮก.": [{ORTH: "ฮก.", LEMMA: "เฮกโตกรัม"}],
|
"ฮก.": [{ORTH: "ฮก.", LEMMA: "เฮกโตกรัม"}],
|
||||||
"ดคก.": [{ORTH: "ดคก.", LEMMA: "เดคากรัม"}],
|
"ดคก.": [{ORTH: "ดคก.", LEMMA: "เดคากรัม"}],
|
||||||
|
@ -305,7 +322,7 @@ _exc = {
|
||||||
"มก.": [{ORTH: "มก.", LEMMA: "มิลลิกรัม"}],
|
"มก.": [{ORTH: "มก.", LEMMA: "มิลลิกรัม"}],
|
||||||
"ก.": [{ORTH: "ก.", LEMMA: "กรัม"}],
|
"ก.": [{ORTH: "ก.", LEMMA: "กรัม"}],
|
||||||
"กก.": [{ORTH: "กก.", LEMMA: "กิโลกรัม"}],
|
"กก.": [{ORTH: "กก.", LEMMA: "กิโลกรัม"}],
|
||||||
#ปริมาตร / volume
|
# ปริมาตร / volume
|
||||||
"ฮล.": [{ORTH: "ฮล.", LEMMA: "เฮกโตลิตร"}],
|
"ฮล.": [{ORTH: "ฮล.", LEMMA: "เฮกโตลิตร"}],
|
||||||
"ดคล.": [{ORTH: "ดคล.", LEMMA: "เดคาลิตร"}],
|
"ดคล.": [{ORTH: "ดคล.", LEMMA: "เดคาลิตร"}],
|
||||||
"ดล.": [{ORTH: "ดล.", LEMMA: "เดซิลิตร"}],
|
"ดล.": [{ORTH: "ดล.", LEMMA: "เดซิลิตร"}],
|
||||||
|
@ -313,12 +330,12 @@ _exc = {
|
||||||
"ล.": [{ORTH: "ล.", LEMMA: "ลิตร"}],
|
"ล.": [{ORTH: "ล.", LEMMA: "ลิตร"}],
|
||||||
"กล.": [{ORTH: "กล.", LEMMA: "กิโลลิตร"}],
|
"กล.": [{ORTH: "กล.", LEMMA: "กิโลลิตร"}],
|
||||||
"ลบ.": [{ORTH: "ลบ.", LEMMA: "ลูกบาศก์"}],
|
"ลบ.": [{ORTH: "ลบ.", LEMMA: "ลูกบาศก์"}],
|
||||||
#พื้นที่ / area
|
# พื้นที่ / area
|
||||||
"ตร.ซม.": [{ORTH: "ตร.ซม.", LEMMA: "ตารางเซนติเมตร"}],
|
"ตร.ซม.": [{ORTH: "ตร.ซม.", LEMMA: "ตารางเซนติเมตร"}],
|
||||||
"ตร.ม.": [{ORTH: "ตร.ม.", LEMMA: "ตารางเมตร"}],
|
"ตร.ม.": [{ORTH: "ตร.ม.", LEMMA: "ตารางเมตร"}],
|
||||||
"ตร.ว.": [{ORTH: "ตร.ว.", LEMMA: "ตารางวา"}],
|
"ตร.ว.": [{ORTH: "ตร.ว.", LEMMA: "ตารางวา"}],
|
||||||
"ตร.กม.": [{ORTH: "ตร.กม.", LEMMA: "ตารางกิโลเมตร"}],
|
"ตร.กม.": [{ORTH: "ตร.กม.", LEMMA: "ตารางกิโลเมตร"}],
|
||||||
#เดือน / month
|
# เดือน / month
|
||||||
"ม.ค.": [{ORTH: "ม.ค.", LEMMA: "มกราคม"}],
|
"ม.ค.": [{ORTH: "ม.ค.", LEMMA: "มกราคม"}],
|
||||||
"ก.พ.": [{ORTH: "ก.พ.", LEMMA: "กุมภาพันธ์"}],
|
"ก.พ.": [{ORTH: "ก.พ.", LEMMA: "กุมภาพันธ์"}],
|
||||||
"มี.ค.": [{ORTH: "มี.ค.", LEMMA: "มีนาคม"}],
|
"มี.ค.": [{ORTH: "มี.ค.", LEMMA: "มีนาคม"}],
|
||||||
|
@ -331,22 +348,22 @@ _exc = {
|
||||||
"ต.ค.": [{ORTH: "ต.ค.", LEMMA: "ตุลาคม"}],
|
"ต.ค.": [{ORTH: "ต.ค.", LEMMA: "ตุลาคม"}],
|
||||||
"พ.ย.": [{ORTH: "พ.ย.", LEMMA: "พฤศจิกายน"}],
|
"พ.ย.": [{ORTH: "พ.ย.", LEMMA: "พฤศจิกายน"}],
|
||||||
"ธ.ค.": [{ORTH: "ธ.ค.", LEMMA: "ธันวาคม"}],
|
"ธ.ค.": [{ORTH: "ธ.ค.", LEMMA: "ธันวาคม"}],
|
||||||
#เพศ / gender
|
# เพศ / gender
|
||||||
"ช.": [{ORTH: "ช.", LEMMA: "ชาย"}],
|
"ช.": [{ORTH: "ช.", LEMMA: "ชาย"}],
|
||||||
"ญ.": [{ORTH: "ญ.", LEMMA: "หญิง"}],
|
"ญ.": [{ORTH: "ญ.", LEMMA: "หญิง"}],
|
||||||
"ด.ช.": [{ORTH: "ด.ช.", LEMMA: "เด็กชาย"}],
|
"ด.ช.": [{ORTH: "ด.ช.", LEMMA: "เด็กชาย"}],
|
||||||
"ด.ญ.": [{ORTH: "ด.ญ.", LEMMA: "เด็กหญิง"}],
|
"ด.ญ.": [{ORTH: "ด.ญ.", LEMMA: "เด็กหญิง"}],
|
||||||
#ที่อยู่ / address
|
# ที่อยู่ / address
|
||||||
"ถ.": [{ORTH: "ถ.", LEMMA: "ถนน"}],
|
"ถ.": [{ORTH: "ถ.", LEMMA: "ถนน"}],
|
||||||
"ต.": [{ORTH: "ต.", LEMMA: "ตำบล"}],
|
"ต.": [{ORTH: "ต.", LEMMA: "ตำบล"}],
|
||||||
"อ.": [{ORTH: "อ.", LEMMA: "อำเภอ"}],
|
"อ.": [{ORTH: "อ.", LEMMA: "อำเภอ"}],
|
||||||
"จ.": [{ORTH: "จ.", LEMMA: "จังหวัด"}],
|
"จ.": [{ORTH: "จ.", LEMMA: "จังหวัด"}],
|
||||||
#สรรพนาม / pronoun
|
# สรรพนาม / pronoun
|
||||||
"ข้าฯ": [{ORTH: "ข้าฯ", LEMMA: "ข้าพระพุทธเจ้า"}],
|
"ข้าฯ": [{ORTH: "ข้าฯ", LEMMA: "ข้าพระพุทธเจ้า"}],
|
||||||
"ทูลเกล้าฯ": [{ORTH: "ทูลเกล้าฯ", LEMMA: "ทูลเกล้าทูลกระหม่อม"}],
|
"ทูลเกล้าฯ": [{ORTH: "ทูลเกล้าฯ", LEMMA: "ทูลเกล้าทูลกระหม่อม"}],
|
||||||
"น้อมเกล้าฯ": [{ORTH: "น้อมเกล้าฯ", LEMMA: "น้อมเกล้าน้อมกระหม่อม"}],
|
"น้อมเกล้าฯ": [{ORTH: "น้อมเกล้าฯ", LEMMA: "น้อมเกล้าน้อมกระหม่อม"}],
|
||||||
"โปรดเกล้าฯ": [{ORTH: "โปรดเกล้าฯ", LEMMA: "โปรดเกล้าโปรดกระหม่อม"}],
|
"โปรดเกล้าฯ": [{ORTH: "โปรดเกล้าฯ", LEMMA: "โปรดเกล้าโปรดกระหม่อม"}],
|
||||||
#การเมือง / politic
|
# การเมือง / politic
|
||||||
"ขจก.": [{ORTH: "ขจก.", LEMMA: "ขบวนการโจรก่อการร้าย"}],
|
"ขจก.": [{ORTH: "ขจก.", LEMMA: "ขบวนการโจรก่อการร้าย"}],
|
||||||
"ขบด.": [{ORTH: "ขบด.", LEMMA: "ขบวนการแบ่งแยกดินแดน"}],
|
"ขบด.": [{ORTH: "ขบด.", LEMMA: "ขบวนการแบ่งแยกดินแดน"}],
|
||||||
"นปช.": [{ORTH: "นปช.", LEMMA: "แนวร่วมประชาธิปไตยขับไล่เผด็จการ"}],
|
"นปช.": [{ORTH: "นปช.", LEMMA: "แนวร่วมประชาธิปไตยขับไล่เผด็จการ"}],
|
||||||
|
@ -363,7 +380,7 @@ _exc = {
|
||||||
"สจ.": [{ORTH: "สจ.", LEMMA: "สมาชิกสภาจังหวัด"}],
|
"สจ.": [{ORTH: "สจ.", LEMMA: "สมาชิกสภาจังหวัด"}],
|
||||||
"สว.": [{ORTH: "สว.", LEMMA: "สมาชิกวุฒิสภา"}],
|
"สว.": [{ORTH: "สว.", LEMMA: "สมาชิกวุฒิสภา"}],
|
||||||
"ส.ส.": [{ORTH: "ส.ส.", LEMMA: "สมาชิกสภาผู้แทนราษฎร"}],
|
"ส.ส.": [{ORTH: "ส.ส.", LEMMA: "สมาชิกสภาผู้แทนราษฎร"}],
|
||||||
#ทั่วไป / general
|
# ทั่วไป / general
|
||||||
"ก.ข.ค.": [{ORTH: "ก.ข.ค.", LEMMA: "ก้างขวางคอ"}],
|
"ก.ข.ค.": [{ORTH: "ก.ข.ค.", LEMMA: "ก้างขวางคอ"}],
|
||||||
"กทม.": [{ORTH: "กทม.", LEMMA: "กรุงเทพมหานคร"}],
|
"กทม.": [{ORTH: "กทม.", LEMMA: "กรุงเทพมหานคร"}],
|
||||||
"กรุงเทพฯ": [{ORTH: "กรุงเทพฯ", LEMMA: "กรุงเทพมหานคร"}],
|
"กรุงเทพฯ": [{ORTH: "กรุงเทพฯ", LEMMA: "กรุงเทพมหานคร"}],
|
||||||
|
@ -376,7 +393,12 @@ _exc = {
|
||||||
"จก.": [{ORTH: "จก.", LEMMA: "จำกัด"}],
|
"จก.": [{ORTH: "จก.", LEMMA: "จำกัด"}],
|
||||||
"จขกท.": [{ORTH: "จขกท.", LEMMA: "เจ้าของกระทู้"}],
|
"จขกท.": [{ORTH: "จขกท.", LEMMA: "เจ้าของกระทู้"}],
|
||||||
"จนท.": [{ORTH: "จนท.", LEMMA: "เจ้าหน้าที่"}],
|
"จนท.": [{ORTH: "จนท.", LEMMA: "เจ้าหน้าที่"}],
|
||||||
"จ.ป.ร.": [{ORTH: "จ.ป.ร.", LEMMA: "มหาจุฬาลงกรณ ปรมราชาธิราช (พระปรมาภิไธยในพระบาทสมเด็จพระจุลจอมเกล้าเจ้าอยู่หัว)"}],
|
"จ.ป.ร.": [
|
||||||
|
{
|
||||||
|
ORTH: "จ.ป.ร.",
|
||||||
|
LEMMA: "มหาจุฬาลงกรณ ปรมราชาธิราช (พระปรมาภิไธยในพระบาทสมเด็จพระจุลจอมเกล้าเจ้าอยู่หัว)",
|
||||||
|
}
|
||||||
|
],
|
||||||
"จ.ม.": [{ORTH: "จ.ม.", LEMMA: "จดหมาย"}],
|
"จ.ม.": [{ORTH: "จ.ม.", LEMMA: "จดหมาย"}],
|
||||||
"จย.": [{ORTH: "จย.", LEMMA: "จักรยาน"}],
|
"จย.": [{ORTH: "จย.", LEMMA: "จักรยาน"}],
|
||||||
"จยย.": [{ORTH: "จยย.", LEMMA: "จักรยานยนต์"}],
|
"จยย.": [{ORTH: "จยย.", LEMMA: "จักรยานยนต์"}],
|
||||||
|
@ -387,7 +409,9 @@ _exc = {
|
||||||
"น.ศ.": [{ORTH: "น.ศ.", LEMMA: "นักศึกษา"}],
|
"น.ศ.": [{ORTH: "น.ศ.", LEMMA: "นักศึกษา"}],
|
||||||
"น.ส.": [{ORTH: "น.ส.", LEMMA: "นางสาว"}],
|
"น.ส.": [{ORTH: "น.ส.", LEMMA: "นางสาว"}],
|
||||||
"น.ส.๓": [{ORTH: "น.ส.๓", LEMMA: "หนังสือรับรองการทำประโยชน์ในที่ดิน"}],
|
"น.ส.๓": [{ORTH: "น.ส.๓", LEMMA: "หนังสือรับรองการทำประโยชน์ในที่ดิน"}],
|
||||||
"น.ส.๓ ก.": [{ORTH: "น.ส.๓ ก", LEMMA: "หนังสือแสดงกรรมสิทธิ์ในที่ดิน (มีระวางกำหนด)"}],
|
"น.ส.๓ ก.": [
|
||||||
|
{ORTH: "น.ส.๓ ก", LEMMA: "หนังสือแสดงกรรมสิทธิ์ในที่ดิน (มีระวางกำหนด)"}
|
||||||
|
],
|
||||||
"นสพ.": [{ORTH: "นสพ.", LEMMA: "หนังสือพิมพ์"}],
|
"นสพ.": [{ORTH: "นสพ.", LEMMA: "หนังสือพิมพ์"}],
|
||||||
"บ.ก.": [{ORTH: "บ.ก.", LEMMA: "บรรณาธิการ"}],
|
"บ.ก.": [{ORTH: "บ.ก.", LEMMA: "บรรณาธิการ"}],
|
||||||
"บจก.": [{ORTH: "บจก.", LEMMA: "บริษัทจำกัด"}],
|
"บจก.": [{ORTH: "บจก.", LEMMA: "บริษัทจำกัด"}],
|
||||||
|
@ -410,7 +434,12 @@ _exc = {
|
||||||
"พขร.": [{ORTH: "พขร.", LEMMA: "พนักงานขับรถ"}],
|
"พขร.": [{ORTH: "พขร.", LEMMA: "พนักงานขับรถ"}],
|
||||||
"ภ.ง.ด.": [{ORTH: "ภ.ง.ด.", LEMMA: "ภาษีเงินได้"}],
|
"ภ.ง.ด.": [{ORTH: "ภ.ง.ด.", LEMMA: "ภาษีเงินได้"}],
|
||||||
"ภ.ง.ด.๙": [{ORTH: "ภ.ง.ด.๙", LEMMA: "แบบแสดงรายการเสียภาษีเงินได้ของกรมสรรพากร"}],
|
"ภ.ง.ด.๙": [{ORTH: "ภ.ง.ด.๙", LEMMA: "แบบแสดงรายการเสียภาษีเงินได้ของกรมสรรพากร"}],
|
||||||
"ภ.ป.ร.": [{ORTH: "ภ.ป.ร.", LEMMA: "ภูมิพลอดุยเดช ปรมราชาธิราช (พระปรมาภิไธยในพระบาทสมเด็จพระปรมินทรมหาภูมิพลอดุลยเดช)"}],
|
"ภ.ป.ร.": [
|
||||||
|
{
|
||||||
|
ORTH: "ภ.ป.ร.",
|
||||||
|
LEMMA: "ภูมิพลอดุยเดช ปรมราชาธิราช (พระปรมาภิไธยในพระบาทสมเด็จพระปรมินทรมหาภูมิพลอดุลยเดช)",
|
||||||
|
}
|
||||||
|
],
|
||||||
"ภ.พ.": [{ORTH: "ภ.พ.", LEMMA: "ภาษีมูลค่าเพิ่ม"}],
|
"ภ.พ.": [{ORTH: "ภ.พ.", LEMMA: "ภาษีมูลค่าเพิ่ม"}],
|
||||||
"ร.": [{ORTH: "ร.", LEMMA: "รัชกาล"}],
|
"ร.": [{ORTH: "ร.", LEMMA: "รัชกาล"}],
|
||||||
"ร.ง.": [{ORTH: "ร.ง.", LEMMA: "โรงงาน"}],
|
"ร.ง.": [{ORTH: "ร.ง.", LEMMA: "โรงงาน"}],
|
||||||
|
@ -438,7 +467,6 @@ _exc = {
|
||||||
"เสธ.": [{ORTH: "เสธ.", LEMMA: "เสนาธิการ"}],
|
"เสธ.": [{ORTH: "เสธ.", LEMMA: "เสนาธิการ"}],
|
||||||
"หจก.": [{ORTH: "หจก.", LEMMA: "ห้างหุ้นส่วนจำกัด"}],
|
"หจก.": [{ORTH: "หจก.", LEMMA: "ห้างหุ้นส่วนจำกัด"}],
|
||||||
"ห.ร.ม.": [{ORTH: "ห.ร.ม.", LEMMA: "ตัวหารร่วมมาก"}],
|
"ห.ร.ม.": [{ORTH: "ห.ร.ม.", LEMMA: "ตัวหารร่วมมาก"}],
|
||||||
|
|
||||||
}
|
}
|
||||||
|
|
||||||
|
|
||||||
|
|
|
@@ -333,6 +333,11 @@ class Language(object):
        """
        if name not in self.pipe_names:
            raise ValueError(Errors.E001.format(name=name, opts=self.pipe_names))
+        if not hasattr(component, "__call__"):
+            msg = Errors.E003.format(component=repr(component), name=name)
+            if isinstance(component, basestring_) and component in self.factories:
+                msg += Errors.E135.format(name=name)
+            raise ValueError(msg)
        self.pipeline[self.pipe_names.index(name)] = (name, component)

    def rename_pipe(self, old_name, new_name):
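The added check makes `replace_pipe` fail fast when the replacement is not callable, instead of breaking later when the pipeline runs; E135 additionally hints at the case where a factory name was passed as a string. A small usage sketch (assumes a blank English pipeline and a dummy component; not part of the diff):

    from spacy.lang.en import English

    def my_component(doc):
        return doc

    nlp = English()
    nlp.add_pipe(my_component, name="my_component")
    try:
        nlp.replace_pipe("my_component", {})        # not callable: now rejected up front
    except ValueError as err:
        print("rejected:", err)
    nlp.replace_pipe("my_component", my_component)  # a callable is accepted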
@@ -412,7 +417,9 @@ class Language(object):
        golds (iterable): A batch of `GoldParse` objects.
        drop (float): The droput rate.
        sgd (callable): An optimizer.
-        RETURNS (dict): Results from the update.
+        losses (dict): Dictionary to update with the loss, keyed by component.
+        component_cfg (dict): Config parameters for specific pipeline
+            components, keyed by component name.

        DOCS: https://spacy.io/api/language#update
        """
@@ -593,6 +600,19 @@ class Language(object):
    def evaluate(
        self, docs_golds, verbose=False, batch_size=256, scorer=None, component_cfg=None
    ):
+        """Evaluate a model's pipeline components.
+
+        docs_golds (iterable): Tuples of `Doc` and `GoldParse` objects.
+        verbose (bool): Print debugging information.
+        batch_size (int): Batch size to use.
+        scorer (Scorer): Optional `Scorer` to use. If not passed in, a new one
+            will be created.
+        component_cfg (dict): An optional dictionary with extra keyword
+            arguments for specific components.
+        RETURNS (Scorer): The scorer containing the evaluation results.
+
+        DOCS: https://spacy.io/api/language#evaluate
+        """
        if scorer is None:
            scorer = Scorer()
        if component_cfg is None:
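To make the documented `evaluate` signature concrete, here is a rough sketch that builds the `(Doc, GoldParse)` pairs by hand and reads the returned `Scorer`; the blank pipeline and the single toy example are illustrative assumptions, not part of the diff.

    # Rough sketch of the documented signature, assuming a spaCy v2-style GoldParse.
    from spacy.lang.en import English
    from spacy.gold import GoldParse

    nlp = English()                       # blank pipeline: only tokenization is scored
    doc = nlp.make_doc("I like London")
    gold = GoldParse(doc, words=["I", "like", "London"])
    scorer = nlp.evaluate([(doc, gold)])  # RETURNS (Scorer)
    print(scorer.token_acc)
    print(scorer.scores)                  # dict of uas, las, ents_*, tags_acc, token_acc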
@@ -1,5 +1,6 @@
# coding: utf8
from __future__ import unicode_literals
+from collections import OrderedDict

from .symbols import POS, NOUN, VERB, ADJ, PUNCT, PROPN
from .symbols import VerbForm_inf, VerbForm_none, Number_sing, Degree_pos

@@ -118,8 +119,8 @@ def lemmatize(string, index, exceptions, rules):
            forms.append(form)
        else:
            oov_forms.append(form)
-    # Remove duplicates, and sort forms generated by rules alphabetically.
-    forms = list(set(forms))
+    # Remove duplicates but preserve the ordering of applied "rules"
+    forms = list(OrderedDict.fromkeys(forms))
    # Put exceptions at the front of the list, so they get priority.
    # This is a dodgy heuristic -- but it's the best we can do until we get
    # frequencies on this. We can at least prune out problematic exceptions,
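The `set()` to `OrderedDict.fromkeys()` switch is purely about determinism: both drop duplicate candidate forms, but only the latter keeps them in the order the rules produced them, so the lemmatizer's eventual choice no longer depends on arbitrary set ordering. A standalone illustration with made-up candidate forms:

    from collections import OrderedDict

    forms = ["ponies", "poni", "pony", "poni"]  # hypothetical rule outputs, in rule order
    print(list(OrderedDict.fromkeys(forms)))    # ['ponies', 'poni', 'pony'] - first-seen order kept
    # list(set(forms)) would also deduplicate, but in an arbitrary order,
    # which made the selected lemma non-deterministic across runs.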
@@ -48,7 +48,10 @@ cdef class Matcher:
        self._extra_predicates = []
        self.vocab = vocab
        self.mem = Pool()
-        self.validator = get_json_validator(TOKEN_PATTERN_SCHEMA) if validate else None
+        if validate:
+            self.validator = get_json_validator(TOKEN_PATTERN_SCHEMA)
+        else:
+            self.validator = None

    def __reduce__(self):
        data = (self.vocab, self._patterns, self._callbacks)

@@ -105,7 +108,7 @@ cdef class Matcher:
                raise ValueError(Errors.E012.format(key=key))
            if self.validator:
                errors[i] = validate_json(pattern, self.validator)
-        if errors:
+        if any(err for err in errors.values()):
            raise MatchPatternError(key, errors)
        key = self._normalize_key(key)
        for pattern in patterns:
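The `if errors:` to `if any(...)` change matters because the validator records an entry per pattern even when that entry is an empty list, and a non-empty dict is always truthy. A short plain-Python illustration of the difference (mirroring the fix, not the Cython internals):

    errors = {0: [], 1: []}                     # validation ran but found nothing wrong
    print(bool(errors))                         # True: the old check would raise anyway
    print(any(err for err in errors.values()))  # False: the new check lets the patterns through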
@@ -127,7 +127,7 @@ cdef class PhraseMatcher:
                and self.attr not in (DEP, POS, TAG, LEMMA):
            string_attr = self.vocab.strings[self.attr]
            user_warning(Warnings.W012.format(key=key, attr=string_attr))
-        tags = get_bilou(length)
+        tags = get_biluo(length)
        phrase_key = <attr_t*>mem.alloc(length, sizeof(attr_t))
        for i, tag in enumerate(tags):
            attr_value = self.get_lex_value(doc, i)

@@ -230,7 +230,7 @@ cdef class PhraseMatcher:
        return "matcher:{}-{}".format(string_attr_name, string_attr_value)


-def get_bilou(length):
+def get_biluo(length):
    if length == 0:
        raise ValueError(Errors.E127)
    elif length == 1:
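`get_biluo` (previously misspelled `get_bilou`) produces the BILUO-style tag sequence for a phrase of a given length: a one-token phrase is a unit, longer phrases get beginning, inside and last tags. Since the full body lies outside this hunk, the sketch below is a plain-Python approximation of that contract, not the Cython implementation.

    def biluo_tags(length):
        # Approximation of the helper's contract: U for single tokens,
        # B ... I ... L for multi-token phrases; length 0 is an error (Errors.E127).
        if length == 0:
            raise ValueError("phrase must contain at least one token")
        if length == 1:
            return ["U"]
        return ["B"] + ["I"] * (length - 2) + ["L"]

    print(biluo_tags(1))  # ['U']
    print(biluo_tags(3))  # ['B', 'I', 'L']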
@@ -109,6 +109,7 @@ cdef class Morphology:
        analysis.tag = rich_tag
        analysis.lemma = self.lemmatize(analysis.tag.pos, token.lex.orth,
                                        self.tag_map.get(tag_str, {}))
+
        self._cache.set(tag_id, token.lex.orth, analysis)
        if token.lemma == 0:
            token.lemma = analysis.lemma

@@ -140,7 +141,7 @@ cdef class Morphology:
        if tag not in self.reverse_index:
            return
        tag_id = self.reverse_index[tag]
-        orth = self.strings[orth_str]
+        orth = self.strings.add(orth_str)
        cdef RichTagC rich_tag = self.rich_tags[tag_id]
        attrs = intify_attrs(attrs, self.strings, _do_deprecated=True)
        cached = <MorphAnalysisC*>self._cache.get(tag_id, orth)
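Switching from a plain lookup to `self.strings.add(orth_str)` guarantees the string is interned in the `StringStore`, so the hash that ends up in the morphology cache can later be resolved back to text. In user-facing terms (a hedged sketch of the spaCy v2-style API, with an illustrative value):

    from spacy.strings import StringStore

    strings = StringStore()
    key = strings.add("zwemmen")  # interns the string and returns its 64-bit hash
    print(strings[key])           # "zwemmen" - reverse lookup works because it was added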
@@ -35,7 +35,17 @@ class PRFScore(object):


class Scorer(object):
+    """Compute evaluation scores."""
+
    def __init__(self, eval_punct=False):
+        """Initialize the Scorer.
+
+        eval_punct (bool): Evaluate the dependency attachments to and from
+            punctuation.
+        RETURNS (Scorer): The newly created object.
+
+        DOCS: https://spacy.io/api/scorer#init
+        """
        self.tokens = PRFScore()
        self.sbd = PRFScore()
        self.unlabelled = PRFScore()

@@ -46,34 +56,46 @@ class Scorer(object):

    @property
    def tags_acc(self):
+        """RETURNS (float): Part-of-speech tag accuracy (fine grained tags,
+        i.e. `Token.tag`).
+        """
        return self.tags.fscore * 100

    @property
    def token_acc(self):
+        """RETURNS (float): Tokenization accuracy."""
        return self.tokens.precision * 100

    @property
    def uas(self):
+        """RETURNS (float): Unlabelled dependency score."""
        return self.unlabelled.fscore * 100

    @property
    def las(self):
+        """RETURNS (float): Labelled depdendency score."""
        return self.labelled.fscore * 100

    @property
    def ents_p(self):
+        """RETURNS (float): Named entity accuracy (precision)."""
        return self.ner.precision * 100

    @property
    def ents_r(self):
+        """RETURNS (float): Named entity accuracy (recall)."""
        return self.ner.recall * 100

    @property
    def ents_f(self):
+        """RETURNS (float): Named entity accuracy (F-score)."""
        return self.ner.fscore * 100

    @property
    def scores(self):
+        """RETURNS (dict): All scores with keys `uas`, `las`, `ents_p`,
+        `ents_r`, `ents_f`, `tags_acc` and `token_acc`.
+        """
        return {
            "uas": self.uas,
            "las": self.las,
@@ -84,9 +106,20 @@ class Scorer(object):
            "token_acc": self.token_acc,
        }

-    def score(self, tokens, gold, verbose=False, punct_labels=("p", "punct")):
-        if len(tokens) != len(gold):
-            gold = GoldParse.from_annot_tuples(tokens, zip(*gold.orig_annot))
+    def score(self, doc, gold, verbose=False, punct_labels=("p", "punct")):
+        """Update the evaluation scores from a single Doc / GoldParse pair.
+
+        doc (Doc): The predicted annotations.
+        gold (GoldParse): The correct annotations.
+        verbose (bool): Print debugging information.
+        punct_labels (tuple): Dependency labels for punctuation. Used to
+            evaluate dependency attachments to punctuation if `eval_punct` is
+            `True`.
+
+        DOCS: https://spacy.io/api/scorer#score
+        """
+        if len(doc) != len(gold):
+            gold = GoldParse.from_annot_tuples(doc, zip(*gold.orig_annot))
        gold_deps = set()
        gold_tags = set()
        gold_ents = set(tags_to_entities([annot[-1] for annot in gold.orig_annot]))

@@ -96,7 +129,7 @@ class Scorer(object):
            gold_deps.add((id_, head, dep.lower()))
        cand_deps = set()
        cand_tags = set()
-        for token in tokens:
+        for token in doc:
            if token.orth_.isspace():
                continue
            gold_i = gold.cand_to_gold[token.i]

@@ -116,7 +149,7 @@ class Scorer(object):
            cand_deps.add((gold_i, gold_head, token.dep_.lower()))
        if "-" not in [token[-1] for token in gold.orig_annot]:
            cand_ents = set()
-            for ent in tokens.ents:
+            for ent in doc.ents:
                first = gold.cand_to_gold[ent.start]
                last = gold.cand_to_gold[ent.end - 1]
                if first is None or last is None:
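Tying the new Scorer docstrings together: a `Scorer` starts out empty and exposes the documented score properties immediately; it is normally filled by `Language.evaluate` (see the sketch after the evaluate hunk above) or by calling `score()` per `Doc`/`GoldParse` pair. A minimal sketch of the documented surface:

    from spacy.scorer import Scorer

    scorer = Scorer()           # eval_punct=False by default, per the new docstring
    print(sorted(scorer.scores))
    # ['ents_f', 'ents_p', 'ents_r', 'las', 'tags_acc', 'token_acc', 'uas']
    print(scorer.token_acc)     # 0.0 until score() has seen Doc / GoldParse pairs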
@@ -6,6 +6,7 @@ from spacy.attrs import ORTH, LENGTH
from spacy.tokens import Doc, Span
from spacy.vocab import Vocab
from spacy.errors import ModelsWarning
+from spacy.util import filter_spans

from ..util import get_doc

@@ -219,3 +220,21 @@ def test_span_ents_property(doc):
    assert sentences[2].ents[0].label_ == "PRODUCT"
    assert sentences[2].ents[0].start == 11
    assert sentences[2].ents[0].end == 14
+
+
+def test_filter_spans(doc):
+    # Test filtering duplicates
+    spans = [doc[1:4], doc[6:8], doc[1:4], doc[10:14]]
+    filtered = filter_spans(spans)
+    assert len(filtered) == 3
+    assert filtered[0].start == 1 and filtered[0].end == 4
+    assert filtered[1].start == 6 and filtered[1].end == 8
+    assert filtered[2].start == 10 and filtered[2].end == 14
+    # Test filtering overlaps with longest preference
+    spans = [doc[1:4], doc[1:3], doc[5:10], doc[7:9], doc[1:4]]
+    filtered = filter_spans(spans)
+    assert len(filtered) == 2
+    assert len(filtered[0]) == 3
+    assert len(filtered[1]) == 5
+    assert filtered[0].start == 1 and filtered[0].end == 4
+    assert filtered[1].start == 5 and filtered[1].end == 10
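The new test exercises `spacy.util.filter_spans`, which removes duplicate spans and resolves overlaps by preferring longer spans. For intuition, here is a rough pure-Python sketch of that selection rule over (start, end) pairs; the tie-breaking choice below (earliest start) is an assumption of the sketch, not a claim about the library internals.

    def keep_longest_spans(spans):
        # Keep longest spans first (ties: earliest start) and drop anything
        # that overlaps a span already kept. Mirrors what the test asserts.
        result, seen = [], set()
        for start, end in sorted(spans, key=lambda s: (s[0] - s[1], s[0])):
            if not any(i in seen for i in range(start, end)):
                result.append((start, end))
                seen.update(range(start, end))
        return sorted(result)

    print(keep_longest_spans([(1, 4), (6, 8), (1, 4), (10, 14)]))       # [(1, 4), (6, 8), (10, 14)]
    print(keep_longest_spans([(1, 4), (1, 3), (5, 10), (7, 9), (1, 4)]))  # [(1, 4), (5, 10)]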
@@ -140,3 +140,28 @@ def test_underscore_mutable_defaults_dict(en_vocab):
    assert len(token1._.mutable) == 2
    assert token1._.mutable["x"] == ["y"]
    assert len(token2._.mutable) == 0
+
+
+def test_underscore_dir(en_vocab):
+    """Test that dir() correctly returns extension attributes. This enables
+    things like tab-completion for the attributes in doc._."""
+    Doc.set_extension("test_dir", default=None)
+    doc = Doc(en_vocab, words=["hello", "world"])
+    assert "_" in dir(doc)
+    assert "test_dir" in dir(doc._)
+    assert "test_dir" not in dir(doc[0]._)
+    assert "test_dir" not in dir(doc[0:2]._)
+
+
+def test_underscore_docstring(en_vocab):
+    """Test that docstrings are available for extension methods, even though
+    they're partials."""
+
+    def test_method(doc, arg1=1, arg2=2):
+        """I am a docstring"""
+        return (arg1, arg2)
+
+    Doc.set_extension("test_docstrings", method=test_method)
+    doc = Doc(en_vocab, words=["hello", "world"])
+    assert test_method.__doc__ == "I am a docstring"
+    assert doc._.test_docstrings.__doc__.rsplit(". ")[-1] == "I am a docstring"
@@ -52,11 +52,13 @@ def test_get_pipe(nlp, name):
     assert nlp.get_pipe(name) == new_pipe


-@pytest.mark.parametrize("name,replacement", [("my_component", lambda doc: doc)])
-def test_replace_pipe(nlp, name, replacement):
+@pytest.mark.parametrize("name,replacement,not_callable", [("my_component", lambda doc: doc, {})])
+def test_replace_pipe(nlp, name, replacement, not_callable):
     with pytest.raises(ValueError):
         nlp.replace_pipe(name, new_pipe)
     nlp.add_pipe(new_pipe, name=name)
+    with pytest.raises(ValueError):
+        nlp.replace_pipe(name, not_callable)
     nlp.replace_pipe(name, replacement)
     assert nlp.get_pipe(name) != new_pipe
     assert nlp.get_pipe(name) == replacement

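For reference, a minimal sketch of the behaviour this extended test pins down, outside the fixture setup; the component name is illustrative, and the `ValueError` for non-callable replacements is exactly what the new assertion checks:

```python
import spacy

nlp = spacy.blank("en")
nlp.add_pipe(lambda doc: doc, name="my_component")

# Replacing a pipe with something that isn't callable should be rejected
try:
    nlp.replace_pipe("my_component", {})
except ValueError:
    print("non-callable replacement rejected")

# Replacing it with a callable swaps the component in place
nlp.replace_pipe("my_component", lambda doc: doc)
print(nlp.pipe_names)  # ['my_component']
```
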
@@ -6,20 +6,16 @@ import pytest
 from spacy.lang.en import English


-@pytest.mark.xfail(reason="Current default suffix rules avoid one upper-case letter before a dot.")
+@pytest.mark.xfail(reason="default suffix rules avoid one upper-case letter before dot")
 def test_issue3449():
     nlp = English()
-    nlp.add_pipe(nlp.create_pipe('sentencizer'))
+    nlp.add_pipe(nlp.create_pipe("sentencizer"))

     text1 = "He gave the ball to I. Do you want to go to the movies with I?"
     text2 = "He gave the ball to I. Do you want to go to the movies with I?"
     text3 = "He gave the ball to I.\nDo you want to go to the movies with I?"

     t1 = nlp(text1)
     t2 = nlp(text2)
     t3 = nlp(text3)
-    assert t1[5].text == 'I'
-    assert t2[5].text == 'I'
-    assert t3[5].text == 'I'
+    assert t1[5].text == "I"
+    assert t2[5].text == "I"
+    assert t3[5].text == "I"

15
spacy/tests/regression/test_issue3549.py
Normal file

@@ -0,0 +1,15 @@
# coding: utf8
from __future__ import unicode_literals

import pytest
from spacy.matcher import Matcher
from spacy.errors import MatchPatternError


def test_issue3549(en_vocab):
    """Test that match pattern validation doesn't raise on empty errors."""
    matcher = Matcher(en_vocab, validate=True)
    pattern = [{"LOWER": "hello"}, {"LOWER": "world"}]
    matcher.add("GOOD", None, pattern)
    with pytest.raises(MatchPatternError):
        matcher.add("BAD", None, [{"X": "Y"}])

17
spacy/tests/regression/test_issue3555.py
Normal file

@@ -0,0 +1,17 @@
# coding: utf8
from __future__ import unicode_literals

import pytest
from spacy.tokens import Doc, Token
from spacy.matcher import Matcher


@pytest.mark.xfail
def test_issue3555(en_vocab):
    """Test that custom extensions with default None don't break matcher."""
    Token.set_extension("issue3555", default=None)
    matcher = Matcher(en_vocab)
    pattern = [{"LEMMA": "have"}, {"_": {"issue3555": True}}]
    matcher.add("TEST", None, pattern)
    doc = Doc(en_vocab, words=["have", "apple"])
    matcher(doc)

15
spacy/tests/regression/test_issue3803.py
Normal file

@@ -0,0 +1,15 @@
# coding: utf8
from __future__ import unicode_literals

import pytest

from spacy.lang.es import Spanish


def test_issue3803():
    """Test that spanish num-like tokens have True for like_num attribute."""
    nlp = Spanish()
    text = "2 dos 1000 mil 12 doce"
    doc = nlp(text)

    assert [t.like_num for t in doc] == [True, True, True, True, True, True]

@@ -3,11 +3,13 @@ from __future__ import unicode_literals

 import pytest
 import os
+import ctypes
 from pathlib import Path
 from spacy import util
 from spacy import prefer_gpu, require_gpu
-from spacy.compat import symlink_to, symlink_remove, path2str
+from spacy.compat import symlink_to, symlink_remove, path2str, is_windows
 from spacy._ml import PrecomputableAffine
+from subprocess import CalledProcessError


 @pytest.fixture

@@ -28,12 +30,25 @@ def symlink_setup_target(request, symlink_target, symlink):
     # https://github.com/pytest-dev/pytest/issues/2508#issuecomment-309934240

     def cleanup():
-        symlink_remove(symlink)
+        # Remove symlink only if it was created
+        if symlink.exists():
+            symlink_remove(symlink)
         os.rmdir(path2str(symlink_target))

     request.addfinalizer(cleanup)


+@pytest.fixture
+def is_admin():
+    """Determine if the tests are run as admin or not."""
+    try:
+        admin = os.getuid() == 0
+    except AttributeError:
+        admin = ctypes.windll.shell32.IsUserAnAdmin() != 0
+
+    return admin
+
+
 @pytest.mark.parametrize("text", ["hello/world", "hello world"])
 def test_util_ensure_path_succeeds(text):
     path = util.ensure_path(text)

@@ -88,7 +103,20 @@ def test_require_gpu():
         require_gpu()


-def test_create_symlink_windows(symlink_setup_target, symlink_target, symlink):
+def test_create_symlink_windows(
+    symlink_setup_target, symlink_target, symlink, is_admin
+):
+    """Test the creation of symlinks on windows. If run as admin or not on windows it should succeed, otherwise a CalledProcessError should be raised."""
     assert symlink_target.exists()
-    symlink_to(symlink, symlink_target)
-    assert symlink.exists()
+
+    if is_admin or not is_windows:
+        try:
+            symlink_to(symlink, symlink_target)
+            assert symlink.exists()
+        except CalledProcessError as e:
+            pytest.fail(e)
+    else:
+        with pytest.raises(CalledProcessError):
+            symlink_to(symlink, symlink_target)
+
+        assert not symlink.exists()

@@ -25,6 +25,11 @@ class Underscore(object):
         object.__setattr__(self, "_start", start)
         object.__setattr__(self, "_end", end)

+    def __dir__(self):
+        # Hack to enable autocomplete on custom extensions
+        extensions = list(self._extensions.keys())
+        return ["set", "get", "has"] + extensions
+
     def __getattr__(self, name):
         if name not in self._extensions:
             raise AttributeError(Errors.E046.format(name=name))

@@ -32,7 +37,16 @@ class Underscore(object):
         if getter is not None:
             return getter(self._obj)
         elif method is not None:
-            return functools.partial(method, self._obj)
+            method_partial = functools.partial(method, self._obj)
+            # Hack to port over docstrings of the original function
+            # See https://stackoverflow.com/q/27362727/6400719
+            method_docstring = method.__doc__ or ""
+            method_docstring_prefix = (
+                "This method is a partial function and its first argument "
+                "(the object it's called on) will be filled automatically. "
+            )
+            method_partial.__doc__ = method_docstring_prefix + method_docstring
+            return method_partial
         else:
             key = self._get_key(name)
             if key in self._doc.user_data:

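The docstring hack above is ordinary `functools.partial` plus an explicit `__doc__` assignment, which partial objects allow; a standalone sketch with made-up names, not spaCy API:

```python
import functools


def greet(obj, name):
    """Return a greeting from obj to name."""
    return "{} says hi to {}".format(obj, name)


# Bind the first argument, then copy the original docstring onto the partial,
# prefixed with a note explaining that the first argument is pre-filled.
bound = functools.partial(greet, "spaCy")
prefix = "This method is a partial function and its first argument is pre-filled. "
bound.__doc__ = prefix + (greet.__doc__ or "")

print(bound("Ines"))   # "spaCy says hi to Ines"
print(bound.__doc__)   # prefixed docstring, picked up by help() and IDE tooltips
```
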
@@ -14,8 +14,11 @@ import functools
 import itertools
 import numpy.random
 import srsly
-from jsonschema import Draft4Validator

+try:
+    import jsonschema
+except ImportError:
+    jsonschema = None

 try:
     import cupy.random

@@ -510,7 +513,7 @@ def decaying(start, stop, decay):
     curr = float(start)
     while True:
         yield max(curr, stop)
-        curr -= (decay)
+        curr -= decay


 def minibatch_by_words(items, size, tuples=True, count_words=len):

@@ -571,6 +574,28 @@ def itershuffle(iterable, bufsize=1000):
         raise StopIteration


+def filter_spans(spans):
+    """Filter a sequence of spans and remove duplicates or overlaps. Useful for
+    creating named entities (where one token can only be part of one entity) or
+    when merging spans with `Retokenizer.merge`. When spans overlap, the (first)
+    longest span is preferred over shorter spans.
+
+    spans (iterable): The spans to filter.
+    RETURNS (list): The filtered spans.
+    """
+    get_sort_key = lambda span: (span.end - span.start, span.start)
+    sorted_spans = sorted(spans, key=get_sort_key, reverse=True)
+    result = []
+    seen_tokens = set()
+    for span in sorted_spans:
+        # Check for end - 1 here because boundaries are inclusive
+        if span.start not in seen_tokens and span.end - 1 not in seen_tokens:
+            result.append(span)
+            seen_tokens.update(range(span.start, span.end))
+    result = sorted(result, key=lambda span: span.start)
+    return result
+
+
 def to_bytes(getters, exclude):
     serialized = OrderedDict()
     for key, getter in getters.items():

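A quick usage sketch for the new helper (requires a spaCy build that includes `spacy.util.filter_spans`; the sentence and span offsets are just an illustration):

```python
import spacy
from spacy.util import filter_spans

nlp = spacy.blank("en")
doc = nlp("This is a sentence about New York City in winter")

# Duplicates and overlaps: the (first) longest span wins, the rest are dropped
spans = [doc[3:7], doc[5:8], doc[3:7], doc[0:2]]
filtered = filter_spans(spans)
print([(span.start, span.end, span.text) for span in filtered])
# [(0, 2, 'This is'), (3, 7, 'sentence about New York')]
```
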
@@ -660,7 +685,9 @@ def get_json_validator(schema):
     # validator that's used (e.g. different draft implementation), without
     # having to change it all across the codebase.
     # TODO: replace with (stable) Draft6Validator, if available
-    return Draft4Validator(schema)
+    if jsonschema is None:
+        raise ValueError(Errors.E136)
+    return jsonschema.Draft4Validator(schema)


 def validate_schema(schema):

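The returned object is plain `jsonschema` usage; a minimal sketch of what `Draft4Validator` does, using a toy schema rather than spaCy's real match-pattern schema:

```python
import jsonschema  # optional dependency, hence the try/except guard above

schema = {
    "type": "array",
    "items": {
        "type": "object",
        "properties": {"LOWER": {"type": "string"}},
        "additionalProperties": False,
    },
}
validator = jsonschema.Draft4Validator(schema)

print(validator.is_valid([{"LOWER": "hello"}]))                  # True
print([e.message for e in validator.iter_errors([{"X": "Y"}])])  # schema violations
```
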
@@ -457,7 +457,7 @@ sit amet dignissim justo congue.
 ## Setup and installation {#setup}

 Before running the setup, make sure your versions of
-[Node](https://nodejs.org/en/) and [npm](https://www.npmjs.com/) are up to date.
+[Node](https://nodejs.org/en/) and [npm](https://www.npmjs.com/) are up to date. Node v10.15 or later is required.

 ```bash
 # Clone the repository

94
website/UNIVERSE.md
Normal file

@@ -0,0 +1,94 @@
<a href="https://explosion.ai"><img src="https://explosion.ai/assets/img/logo.svg" width="125" height="125" align="right" /></a>

# spaCy Universe

The [spaCy Universe](https://spacy.io/universe) collects the many great resources developed with or for spaCy. It
includes standalone packages, plugins, extensions, educational materials,
operational utilities and bindings for other languages.

If you have a project that you want the spaCy community to make use of, you can
suggest it by submitting a pull request to this repository. The Universe
database is open-source and collected in a simple JSON file.

Looking for inspiration for your own spaCy plugin or extension? Check out the
[`project idea`](https://github.com/explosion/spaCy/labels/project%20idea) label
on the issue tracker.

## Checklist

### Projects

✅ Libraries and packages should be **open-source** (with a user-friendly license) and at least somewhat **documented** (e.g. a simple `README` with usage instructions).

✅ We're happy to include work in progress and prereleases, but we'd like to keep the emphasis on projects that should be useful to the community **right away**.

✅ Demos and visualizers should be available via a **public URL**.

### Educational Materials

✅ Books should be **available for purchase or download** (not just pre-order). Ebooks and self-published books are fine, too, if they include enough substantial content.

✅ The `"url"` of book entries should either point to the publisher's website or a reseller of your choice (ideally one that ships worldwide or as close as possible).

✅ If an online course is only available behind a paywall, it should at least have a **free excerpt** or chapter available, so users know what to expect.

## JSON format

To add a project, fork this repository, edit the [`universe.json`](meta/universe.json)
and add an object of the following format to the list of `"resources"`. Before
you submit your pull request, make sure to use a linter to verify that your
markup is correct.

```json
{
    "id": "unique-project-id",
    "title": "Project title",
    "slogan": "A short summary",
    "description": "A longer description – *Markdown allowed!*",
    "github": "user/repo",
    "pip": "package-name",
    "code_example": [
        "import spacy",
        "import package_name",
        "",
        "nlp = spacy.load('en')",
        "nlp.add_pipe(package_name)"
    ],
    "code_language": "python",
    "url": "https://example.com",
    "thumb": "https://example.com/thumb.jpg",
    "image": "https://example.com/image.jpg",
    "author": "Your Name",
    "author_links": {
        "twitter": "username",
        "github": "username",
        "website": "https://example.com"
    },
    "category": ["pipeline", "standalone"],
    "tags": ["some-tag", "etc"]
}
```

| Field | Type | Description |
| --- | --- | --- |
| `id` | string | Unique ID of the project. |
| `title` | string | Project title. If not set, the `id` will be used as the display title. |
| `slogan` | string | A short description of the project. Displayed in the overview and under the title. |
| `description` | string | A longer description of the project. Markdown is allowed, but should be limited to basic formatting like bold, italics, code or links. |
| `github` | string | Associated GitHub repo in the format `user/repo`. Will be displayed as a link and used for release, license and star badges. |
| `pip` | string | Package name on pip. If available, the installation command will be displayed. |
| `cran` | string | For R packages: package name on CRAN. If available, the installation command will be displayed. |
| `code_example` | array | Short example that shows how to use the project. Formatted as an array with one string per line. |
| `code_language` | string | Defaults to `'python'`. Optional code language used for syntax highlighting with [Prism](http://prismjs.com/). |
| `url` | string | Optional project link to display as button. |
| `thumb` | string | Optional URL to project thumbnail to display in overview and project header. Recommended size is 100x100px. |
| `image` | string | Optional URL to project image to display with description. |
| `author` | string | Name(s) of project author(s). |
| `author_links` | object | Usernames and links to display as icons to author info. Currently supports `twitter` and `github` usernames, as well as `website` link. |
| `category` | list | One or more categories to assign to project. Must be one of the available options. |
| `tags` | list | Still experimental and not used for filtering: one or more tags to assign to project. |

To separate them from the projects, educational materials also specify
`"type": "education"`. Books can also set a `"cover"` field containing a URL
to a cover image. If available, it's used in the overview and displayed on
the individual book page.

@@ -510,7 +510,7 @@ described in any single publication. The model is a greedy transition-based
 parser guided by a linear model whose weights are learned using the averaged
 perceptron loss, via the
 [dynamic oracle](http://www.aclweb.org/anthology/C12-1059) imitation learning
-strategy. The transition system is equivalent to the BILOU tagging scheme.
+strategy. The transition system is equivalent to the BILUO tagging scheme.

 ## Models and training data {#training}

@@ -189,7 +189,7 @@ using the [`package`](/api/cli#package) command.

 <Infobox title="Changed in v2.1" variant="warning">

-As of spaCy 2.1, the `--no-tagger`, `--no-parser` and `--no-parser` flags have
+As of spaCy 2.1, the `--no-tagger`, `--no-parser` and `--no-entities` flags have
 been replaced by a `--pipeline` option, which lets you define comma-separated
 names of pipeline components to train. For example, `--pipeline tagger,parser`
 will only train the tagger and parser.

@@ -198,7 +198,7 @@ will only train the tagger and parser.

 ```bash
 $ python -m spacy train [lang] [output_path] [train_path] [dev_path]
-[--base-model] [--pipeline] [--vectors] [--n-iter] [--n-examples] [--use-gpu]
+[--base-model] [--pipeline] [--vectors] [--n-iter] [--n-early-stopping] [--n-examples] [--use-gpu]
 [--version] [--meta-path] [--init-tok2vec] [--parser-multitasks]
 [--entity-multitasks] [--gold-preproc] [--noise-level] [--learn-tokens]
 [--verbose]

@@ -210,10 +210,11 @@ $ python -m spacy train [lang] [output_path] [train_path] [dev_path]
 | `output_path` | positional | Directory to store model in. Will be created if it doesn't exist. |
 | `train_path` | positional | Location of JSON-formatted training data. Can be a file or a directory of files. |
 | `dev_path` | positional | Location of JSON-formatted development data for evaluation. Can be a file or a directory of files. |
-| `--base-model`, `-b` | option | Optional name of base model to update. Can be any loadable spaCy model. |
+| `--base-model`, `-b` <Tag variant="new">2.1</Tag> | option | Optional name of base model to update. Can be any loadable spaCy model. |
 | `--pipeline`, `-p` <Tag variant="new">2.1</Tag> | option | Comma-separated names of pipeline components to train. Defaults to `'tagger,parser,ner'`. |
 | `--vectors`, `-v` | option | Model to load vectors from. |
 | `--n-iter`, `-n` | option | Number of iterations (default: `30`). |
+| `--n-early-stopping`, `-ne` | option | Maximum number of training epochs without dev accuracy improvement. |
 | `--n-examples`, `-ns` | option | Number of examples to use (defaults to `0` for all examples). |
 | `--use-gpu`, `-g` | option | Whether to use GPU. Can be either `0`, `1` or `-1`. |
 | `--version`, `-V` | option | Model version. Will be written out to the model's `meta.json` after training. |

@@ -274,7 +275,7 @@ an approximate language-modeling objective. Specifically, we load pre-trained
 vectors, and train a component like a CNN, BiLSTM, etc to predict vectors which
 match the pre-trained ones. The weights are saved to a directory after each
 epoch. You can then pass a path to one of these pre-trained weights files to the
-'spacy train' command.
+`spacy train` command.

 This technique may be especially helpful if you have little labelled data.
 However, it's still quite experimental, so your mileage may vary. To load the

@@ -285,24 +286,26 @@ improvement.

 ```bash
 $ python -m spacy pretrain [texts_loc] [vectors_model] [output_dir] [--width]
 [--depth] [--embed-rows] [--dropout] [--seed] [--n-iter] [--use-vectors]
+[--n-save_every]
 ```

 | Argument | Type | Description |
-| ---------------------- | ---------- | --------------------------------------------------------------------------------------------------------------------------------- |
+| ----------------------- | ---------- | --------------------------------------------------------------------------------------------------------------------------------- |
 | `texts_loc` | positional | Path to JSONL file with raw texts to learn from, with text provided as the key `"text"`. [See here](#pretrain-jsonl) for details. |
 | `vectors_model` | positional | Name or path to spaCy model with vectors to learn from. |
 | `output_dir` | positional | Directory to write models to on each epoch. |
 | `--width`, `-cw` | option | Width of CNN layers. |
 | `--depth`, `-cd` | option | Depth of CNN layers. |
 | `--embed-rows`, `-er` | option | Number of embedding rows. |
 | `--dropout`, `-d` | option | Dropout rate. |
 | `--batch-size`, `-bs` | option | Number of words per training batch. |
 | `--max-length`, `-xw` | option | Maximum words per example. Longer examples are discarded. |
 | `--min-length`, `-nw` | option | Minimum words per example. Shorter examples are discarded. |
 | `--seed`, `-s` | option | Seed for random number generators. |
 | `--n-iter`, `-i` | option | Number of iterations to pretrain. |
 | `--use-vectors`, `-uv` | flag | Whether to use the static vectors as input features. |
+| `--n-save_every`, `-se` | option | Save model every X batches. |
 | **CREATES** | weights | The pre-trained weights that can be used to initialize `spacy train`. |

 ### JSONL format for raw text {#pretrain-jsonl}

@@ -324,7 +327,7 @@ tokenization can be provided.

 | Key | Type | Description |
 | -------- | ------- | -------------------------------------------- |
-| `text` | unicode | The raw input text. |
+| `text` | unicode | The raw input text. Is not required if `tokens` available. |
 | `tokens` | list | Optional tokenization, one string per token. |

 ```json

@@ -332,6 +335,7 @@ tokenization can be provided.
 {"text": "Can I ask where you work now and what you do, and if you enjoy it?"}
 {"text": "They may just pull out of the Seattle market completely, at least until they have autonomous vehicles."}
 {"text": "My cynical view on this is that it will never be free to the public. Reason: what would be the draw of joining the military? Right now their selling point is free Healthcare and Education. Ironically both are run horribly and most, that I've talked to, come out wishing they never went in."}
+{"tokens": ["If", "tokens", "are", "provided", "then", "we", "can", "skip", "the", "raw", "input", "text"]}
 ```

 ## Init Model {#init-model new="2"}

|
||||||
|
|
||||||
```bash
|
```bash
|
||||||
$ python -m spacy evaluate [model] [data_path] [--displacy-path] [--displacy-limit]
|
$ python -m spacy evaluate [model] [data_path] [--displacy-path] [--displacy-limit]
|
||||||
[--gpu-id] [--gold-preproc]
|
[--gpu-id] [--gold-preproc] [--return-scores]
|
||||||
```
|
```
|
||||||
|
|
||||||
| Argument | Type | Description |
|
| Argument | Type | Description |
|
||||||
|
@ -386,6 +390,7 @@ $ python -m spacy evaluate [model] [data_path] [--displacy-path] [--displacy-lim
|
||||||
| `--displacy-limit`, `-dl` | option | Number of parses to generate per file. Defaults to `25`. Keep in mind that a significantly higher number might cause the `.html` files to render slowly. |
|
| `--displacy-limit`, `-dl` | option | Number of parses to generate per file. Defaults to `25`. Keep in mind that a significantly higher number might cause the `.html` files to render slowly. |
|
||||||
| `--gpu-id`, `-g` | option | GPU to use, if any. Defaults to `-1` for CPU. |
|
| `--gpu-id`, `-g` | option | GPU to use, if any. Defaults to `-1` for CPU. |
|
||||||
| `--gold-preproc`, `-G` | flag | Use gold preprocessing. |
|
| `--gold-preproc`, `-G` | flag | Use gold preprocessing. |
|
||||||
|
| `--return-scores`, `-R` | flag | Return dict containing model scores. |
|
||||||
| **CREATES** | `stdout`, HTML | Training results and optional displaCy visualizations. |
|
| **CREATES** | `stdout`, HTML | Training results and optional displaCy visualizations. |
|
||||||
|
|
||||||
## Package {#package}
|
## Package {#package}
|
||||||
|
|
|
@@ -172,7 +172,7 @@ struct.
 | `prefix` | <Abbr title="uint64_t">`attr_t`</Abbr> | Length-N substring from the start of the lexeme. Defaults to `N=1`. |
 | `suffix` | <Abbr title="uint64_t">`attr_t`</Abbr> | Length-N substring from the end of the lexeme. Defaults to `N=3`. |
 | `cluster` | <Abbr title="uint64_t">`attr_t`</Abbr> | Brown cluster ID. |
-| `prob` | `float` | Smoothed log probability estimate of the lexeme's type. |
+| `prob` | `float` | Smoothed log probability estimate of the lexeme's word type (context-independent entry in the vocabulary). |
 | `sentiment` | `float` | A scalar value indicating positivity or negativity. |

 ### Lexeme.get_struct_attr {#lexeme_get_struct_attr tag="staticmethod, nogil" source="spacy/lexeme.pxd"}

@@ -102,10 +102,10 @@ Apply the pipeline's model to a batch of docs, without modifying them.
 > scores = parser.predict([doc1, doc2])
 > ```

 | Name | Type | Description |
-| ----------- | -------- | ---------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------- |
+| ----------- | ------------------- | ---------------------------------------------- |
 | `docs` | iterable | The documents to predict. |
-| **RETURNS** | tuple | A `(scores, tensors)` tuple where `scores` is the model's prediction for each document and `tensors` is the token representations used to predict the scores. Each tensor is an array with one row for each token in the document. |
+| **RETURNS** | `syntax.StateClass` | A helper class for the parse state (internal). |

 ## DependencyParser.set_annotations {#set_annotations tag="method"}

@@ -119,8 +119,27 @@ Update the models in the pipeline.
 | `golds` | iterable | A batch of `GoldParse` objects or dictionaries. Dictionaries will be used to create [`GoldParse`](/api/goldparse) objects. For the available keys and their usage, see [`GoldParse.__init__`](/api/goldparse#init). |
 | `drop` | float | The dropout rate. |
 | `sgd` | callable | An optimizer. |
+| `losses` | dict | Dictionary to update with the loss, keyed by pipeline component. |
 | `component_cfg` <Tag variant="new">2.1</Tag> | dict | Config parameters for specific pipeline components, keyed by component name. |
-| **RETURNS** | dict | Results from the update. |
+
+## Language.evaluate {#evaluate tag="method"}
+
+Evaluate a model's pipeline components.
+
+> #### Example
+>
+> ```python
+> scorer = nlp.evaluate(docs_golds, verbose=True)
+> print(scorer.scores)
+> ```
+
+| Name | Type | Description |
+| -------------------------------------------- | -------- | ------------------------------------------------------------------------------------- |
+| `docs_golds` | iterable | Tuples of `Doc` and `GoldParse` objects. |
+| `verbose` | bool | Print debugging information. |
+| `batch_size` | int | The batch size to use. |
+| `scorer` | `Scorer` | Optional [`Scorer`](/api/scorer) to use. If not passed in, a new one will be created. |
+| `component_cfg` <Tag variant="new">2.1</Tag> | dict | Config parameters for specific pipeline components, keyed by component name. |

 ## Language.begin_training {#begin_training tag="method"}

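A slightly fuller sketch of `evaluate` than the one-liner above, assuming the `en_core_web_sm` model is installed; the entity offsets are hand-made for this one sentence:

```python
import spacy
from spacy.gold import GoldParse

nlp = spacy.load("en_core_web_sm")
text = "Facebook was founded by Mark Zuckerberg."
doc = nlp.make_doc(text)
gold = GoldParse(doc, entities=[(0, 8, "ORG"), (24, 39, "PERSON")])

# evaluate runs the pipeline over the docs and scores them against the golds
scorer = nlp.evaluate([(doc, gold)], verbose=False)
print(scorer.scores)  # uas, las, ents_p, ents_r, ents_f, tags_acc, token_acc
```
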
@@ -128,7 +128,6 @@ The L2 norm of the lexeme's vector representation.
 | `text` | unicode | Verbatim text content. |
 | `orth` | int | ID of the verbatim text content. |
 | `orth_` | unicode | Verbatim text content (identical to `Lexeme.text`). Exists mostly for consistency with the other attributes. |
-| `lex_id` | int | ID of the lexeme's lexical type. |
 | `rank` | int | Sequential ID of the lexemes's lexical type, used to index into tables, e.g. for word vectors. |
 | `flags` | int | Container of the lexeme's binary flags. |
 | `norm` | int | The lexemes's norm, i.e. a normalized form of the lexeme text. |

@@ -161,6 +160,6 @@ The L2 norm of the lexeme's vector representation.
 | `is_stop` | bool | Is the lexeme part of a "stop list"? |
 | `lang` | int | Language of the parent vocabulary. |
 | `lang_` | unicode | Language of the parent vocabulary. |
-| `prob` | float | Smoothed log probability estimate of the lexeme's type. |
+| `prob` | float | Smoothed log probability estimate of the lexeme's word type (context-independent entry in the vocabulary). |
 | `cluster` | int | Brown cluster ID. |
 | `sentiment` | float | A scalar value indicating the positivity or negativity of the lexeme. |

58
website/docs/api/scorer.md
Normal file

@@ -0,0 +1,58 @@
---
title: Scorer
teaser: Compute evaluation scores
tag: class
source: spacy/scorer.py
---

The `Scorer` computes and stores evaluation scores. It's typically created by
[`Language.evaluate`](/api/language#evaluate).

## Scorer.\_\_init\_\_ {#init tag="method"}

Create a new `Scorer`.

> #### Example
>
> ```python
> from spacy.scorer import Scorer
>
> scorer = Scorer()
> ```

| Name | Type | Description |
| ------------ | -------- | ------------------------------------------------------------ |
| `eval_punct` | bool | Evaluate the dependency attachments to and from punctuation. |
| **RETURNS** | `Scorer` | The newly created object. |

## Scorer.score {#score tag="method"}

Update the evaluation scores from a single [`Doc`](/api/doc) /
[`GoldParse`](/api/goldparse) pair.

> #### Example
>
> ```python
> scorer = Scorer()
> scorer.score(doc, gold)
> ```

| Name | Type | Description |
| -------------- | ----------- | -------------------------------------------------------------------------------------------------------------------- |
| `doc` | `Doc` | The predicted annotations. |
| `gold` | `GoldParse` | The correct annotations. |
| `verbose` | bool | Print debugging information. |
| `punct_labels` | tuple | Dependency labels for punctuation. Used to evaluate dependency attachments to punctuation if `eval_punct` is `True`. |

## Properties

| Name | Type | Description |
| ----------- | ----- | -------------------------------------------------------------------------------------------- |
| `token_acc` | float | Tokenization accuracy. |
| `tags_acc` | float | Part-of-speech tag accuracy (fine grained tags, i.e. `Token.tag`). |
| `uas` | float | Unlabelled dependency score. |
| `las` | float | Labelled dependency score. |
| `ents_p` | float | Named entity accuracy (precision). |
| `ents_r` | float | Named entity accuracy (recall). |
| `ents_f` | float | Named entity accuracy (F-score). |
| `scores` | dict | All scores with keys `uas`, `las`, `ents_p`, `ents_r`, `ents_f`, `tags_acc` and `token_acc`. |

@@ -424,7 +424,7 @@ The L2 norm of the token's vector representation.
 | `ent_type` | int | Named entity type. |
 | `ent_type_` | unicode | Named entity type. |
 | `ent_iob` | int | IOB code of named entity tag. `3` means the token begins an entity, `2` means it is outside an entity, `1` means it is inside an entity, and `0` means no entity tag is set. |
-| `ent_iob_` | unicode | IOB code of named entity tag. `3` means the token begins an entity, `2` means it is outside an entity, `1` means it is inside an entity, and `0` means no entity tag is set. |
+| `ent_iob_` | unicode | IOB code of named entity tag. "B" means the token begins an entity, "I" means it is inside an entity, "O" means it is outside an entity, and "" means no entity tag is set. |
 | `ent_id` | int | ID of the entity the token is an instance of, if any. Currently not used, but potentially for coreference resolution. |
 | `ent_id_` | unicode | ID of the entity the token is an instance of, if any. Currently not used, but potentially for coreference resolution. |
 | `lemma` | int | Base form of the token, with no inflectional suffixes. |

@@ -465,10 +465,10 @@ The L2 norm of the token's vector representation.
 | `dep_` | unicode | Syntactic dependency relation. |
 | `lang` | int | Language of the parent document's vocabulary. |
 | `lang_` | unicode | Language of the parent document's vocabulary. |
-| `prob` | float | Smoothed log probability estimate of token's type. |
+| `prob` | float | Smoothed log probability estimate of token's word type (context-independent entry in the vocabulary). |
 | `idx` | int | The character offset of the token within the parent document. |
 | `sentiment` | float | A scalar value indicating the positivity or negativity of the token. |
-| `lex_id` | int | Sequential ID of the token's lexical type. |
+| `lex_id` | int | Sequential ID of the token's lexical type, used to index into tables, e.g. for word vectors. |
 | `rank` | int | Sequential ID of the token's lexical type, used to index into tables, e.g. for word vectors. |
 | `cluster` | int | Brown cluster ID. |
 | `_` | `Underscore` | User space for adding custom [attribute extensions](/usage/processing-pipelines#custom-components-attributes). |

@@ -211,16 +211,16 @@ Render a dependency parse tree or named entity visualization.
 > html = displacy.render(doc, style="dep")
 > ```

 | Name | Type | Description | Default |
-| ----------- | ------------------- | ------------------------------------------------------------------------------------------------------------------------------------ | ---------------------- |
+| ----------- | ------------------- | --------------------------------------------------------------------------------------------------------------------------------------------------------- | ------- |
 | `docs` | list, `Doc`, `Span` | Document(s) to visualize. |
 | `style` | unicode | Visualization style, `'dep'` or `'ent'`. | `'dep'` |
 | `page` | bool | Render markup as full HTML page. | `False` |
 | `minify` | bool | Minify HTML markup. | `False` |
-| `jupyter` | bool | Explicitly enable "[Jupyter](http://jupyter.org/) mode" to return markup ready to be rendered in a notebook. | detected automatically |
+| `jupyter` | bool | Explicitly enable or disable "[Jupyter](http://jupyter.org/) mode" to return markup ready to be rendered in a notebook. Detected automatically if `None`. | `None` |
 | `options` | dict | [Visualizer-specific options](#options), e.g. colors. | `{}` |
 | `manual` | bool | Don't parse `Doc` and instead, expect a dict or list of dicts. [See here](/usage/visualizers#manual-usage) for formats and examples. | `False` |
 | **RETURNS** | unicode | Rendered HTML markup. |

 ### Visualizer options {#displacy_options}

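A short sketch of the new explicit switch, assuming `en_core_web_sm` is installed; with `jupyter=False` you always get the raw markup back, even inside a notebook:

```python
import spacy
from spacy import displacy

nlp = spacy.load("en_core_web_sm")
doc = nlp("Google was founded in California.")

# Force raw HTML output instead of auto-detected notebook rendering
html = displacy.render(doc, style="ent", page=True, jupyter=False)
with open("ents.html", "w", encoding="utf8") as f:
    f.write(html)
```
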
@@ -351,7 +351,7 @@ the two-letter language code.
 | `name` | unicode | Two-letter language code, e.g. `'en'`. |
 | `cls` | `Language` | The language class, e.g. `English`. |

-### util.lang_class_is_loaded (#util.lang_class_is_loaded tag="function" new="2.1")
+### util.lang_class_is_loaded {#util.lang_class_is_loaded tag="function" new="2.1"}

 Check whether a `Language` class is already loaded. `Language` classes are
 loaded lazily, to avoid expensive setup code associated with the language data.

@@ -654,6 +654,27 @@ for batching. Larger `buffsize` means less bias.
 | `buffsize` | int | Items to hold back. |
 | **YIELDS** | iterable | The shuffled iterator. |

+### util.filter_spans {#util.filter_spans tag="function" new="2.1.4"}
+
+Filter a sequence of [`Span`](/api/span) objects and remove duplicates or
+overlaps. Useful for creating named entities (where one token can only be part
+of one entity) or when merging spans with
+[`Retokenizer.merge`](/api/doc#retokenizer.merge). When spans overlap, the
+(first) longest span is preferred over shorter spans.
+
+> #### Example
+>
+> ```python
+> doc = nlp("This is a sentence.")
+> spans = [doc[0:2], doc[0:2], doc[0:4]]
+> filtered = filter_spans(spans)
+> ```
+
+| Name | Type | Description |
+| ----------- | -------- | -------------------- |
+| `spans` | iterable | The spans to filter. |
+| **RETURNS** | list | The filtered spans. |
+
 ## Compatibility functions {#compat source="spacy/compaty.py"}

 All Python code is written in an **intersection of Python 2 and Python 3**. This

@@ -306,7 +306,7 @@ vectors, they will be counted individually.

 Load [GloVe](https://nlp.stanford.edu/projects/glove/) vectors from a directory.
 Assumes binary format, that the vocab is in a `vocab.txt`, and that vectors are
-named `vectors.{size}.[fd`.bin], e.g. `vectors.128.f.bin` for 128d float32
+named `vectors.{size}.[fd.bin]`, e.g. `vectors.128.f.bin` for 128d float32
 vectors, `vectors.300.d.bin` for 300d float64 (double) vectors, etc. By default
 GloVe outputs 64-bit vectors.

Binary file not shown.
Before Width: | Height: | Size: 1.6 MiB

BIN
website/docs/images/course.jpg
Normal file
Binary file not shown.
After Width: | Height: | Size: 270 KiB

@@ -4,7 +4,7 @@ example, everything that's in your `nlp` object. This means you'll have to
 translate its contents and structure into a format that can be saved, like a
 file or a byte string. This process is called serialization. spaCy comes with
 **built-in serialization methods** and supports the
-[Pickle protocol](http://www.diveintopython3.net/serializing.html#dump).
+[Pickle protocol](https://www.diveinto.org/python3/serializing.html#dump).

 > #### What's pickle?
 >

@@ -50,7 +50,7 @@ together.

 ## Benchmarks {#benchmarks}

-Two peer-reviewed papers in 2015 confirm that spaCy offers the **fastest
+Two peer-reviewed papers in 2015 confirmed that spaCy offers the **fastest
 syntactic parser in the world** and that **its accuracy is within 1% of the
 best** available. The few systems that are more accurate are 20× slower or more.

@@ -326,7 +326,7 @@ URLs.

 ```text
 ### requirements.txt
 spacy>=2.0.0,<3.0.0
-https://github.com/spacy-models/releases/download/en_core_web_sm-2.0.0/en_core_web_sm-2.0.0.tar.gz#egg=en_core_web_sm
+https://github.com/explosion/spacy-models/releases/download/en_core_web_sm-2.1.0/en_core_web_sm-2.1.0.tar.gz#egg=en_core_web_sm
 ```

 Specifying `#egg=` with the package name tells pip which package to expect from

@@ -260,7 +260,7 @@ def my_component(doc):

 nlp = spacy.load("en_core_web_sm")
 nlp.add_pipe(my_component, name="print_info", last=True)
-print(nlp.pipe_names)  # ['print_info', 'tagger', 'parser', 'ner']
+print(nlp.pipe_names)  # ['tagger', 'parser', 'ner', 'print_info']
 doc = nlp(u"This is a sentence.")
 ```

@@ -214,7 +214,8 @@ example, you might want to match different spellings of a word, without having
 to add a new pattern for each spelling.

 ```python
-pattern = [{"TEXT": {"REGEX": "^([Uu](\\.?|nited) ?[Ss](\\.?|tates)"}},
+pattern = [{"TEXT": {"REGEX": "^[Uu](\\.?|nited)$"}},
+           {"TEXT": {"REGEX": "^[Ss](\\.?|tates)$"}},
            {"LOWER": "president"}]
 ```

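For context, the corrected pattern drops straight into a `Matcher`; a runnable sketch on a blank English pipeline (no model download needed):

```python
import spacy
from spacy.matcher import Matcher

nlp = spacy.blank("en")
matcher = Matcher(nlp.vocab)
# One token pattern per token: "United"/"U."/"U", then "States"/"S."/"S", then "president"
pattern = [{"TEXT": {"REGEX": "^[Uu](\\.?|nited)$"}},
           {"TEXT": {"REGEX": "^[Ss](\\.?|tates)$"}},
           {"LOWER": "president"}]
matcher.add("US_PRESIDENT", None, pattern)

doc = nlp("The United States president gave a speech.")
for match_id, start, end in matcher(doc):
    print(doc[start:end].text)  # "United States president"
```
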
@@ -227,7 +228,7 @@ attributes:
 pattern = [{"TAG": {"REGEX": "^V"}}]

 # Match custom attribute values with regular expressions
-pattern = [{"_": {"country": {"REGEX": "^([Uu](\\.?|nited) ?[Ss](\\.?|tates)"}}}]
+pattern = [{"_": {"country": {"REGEX": "^[Uu](\\.?|nited) ?[Ss](\\.?|tates)$"}}}]
 ```

 <Infobox title="Regular expressions in older versions" variant="warning">

@@ -404,7 +405,7 @@ class BadHTMLMerger(object):
         for match_id, start, end in matches:
             spans.append(doc[start:end])
         with doc.retokenize() as retokenizer:
-            for span in hashtags:
+            for span in spans:
                 retokenizer.merge(span)
                 for token in span:
                     token._.bad_html = True  # Mark token as bad HTML

@@ -678,7 +679,7 @@ for match_id, start, end in matches:
     if doc.vocab.strings[match_id] == "HASHTAG":
         hashtags.append(doc[start:end])
 with doc.retokenize() as retokenizer:
-    for span in spans:
+    for span in hashtags:
         retokenizer.merge(span)
         for token in span:
             token._.is_hashtag = True

@@ -712,9 +713,9 @@ from spacy.matcher import PhraseMatcher
 
 nlp = spacy.load('en_core_web_sm')
 matcher = PhraseMatcher(nlp.vocab)
-terminology_list = [u"Barack Obama", u"Angela Merkel", u"Washington, D.C."]
+terms = [u"Barack Obama", u"Angela Merkel", u"Washington, D.C."]
 # Only run nlp.make_doc to speed things up
-patterns = [nlp.make_doc(text) for text in terminology_list]
+patterns = [nlp.make_doc(text) for text in terms]
 matcher.add("TerminologyList", None, *patterns)
 
 doc = nlp(u"German Chancellor Angela Merkel and US President Barack Obama "
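The rename above is purely cosmetic (`terminology_list` becomes `terms`), but the surrounding example is worth spelling out: `nlp.make_doc` only runs the tokenizer, which is all the `PhraseMatcher` needs, so building patterns this way avoids running the full pipeline over every terminology entry. A sketch of how the example reads after the change (the second line of the example sentence is an assumed continuation):

```python
import spacy
from spacy.matcher import PhraseMatcher

nlp = spacy.load("en_core_web_sm")
matcher = PhraseMatcher(nlp.vocab)

terms = [u"Barack Obama", u"Angela Merkel", u"Washington, D.C."]
# Only run nlp.make_doc to speed things up
patterns = [nlp.make_doc(text) for text in terms]
matcher.add("TerminologyList", None, *patterns)

doc = nlp(u"German Chancellor Angela Merkel and US President Barack Obama "
          u"met in Washington, D.C.")
for match_id, start, end in matcher(doc):
    print(doc[start:end].text)
```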
@@ -29,6 +29,19 @@ quick introduction.
 > [pull requests](https://github.com/explosion/spaCy/pulls). You can find a
 > "Suggest edits" link at the bottom of each page that points you to the source.
 
+<Infobox title="Take the free interactive course">
+
+[![Advanced NLP with spaCy](../images/course.jpg)](https://course.spacy.io)
+
+In this course you'll learn how to use spaCy to build advanced natural language
+understanding systems, using both rule-based and machine learning approaches. It
+includes 55 exercises featuring interactive coding practice, multiple-choice
+questions and slide decks.
+
+<p><Button to="https://course.spacy.io" variant="primary">Start the course</Button></p>
+
+</Infobox>
+
 ## What's spaCy? {#whats-spacy}
 
 <Grid cols={2}>
@@ -89,27 +102,12 @@ systems, or to pre-process text for **deep learning**.
 integrated and opinionated. spaCy tries to avoid asking the user to choose
 between multiple algorithms that deliver equivalent functionality. Keeping the
 menu small lets spaCy deliver generally better performance and developer
-experience.M
+experience.
 
 - **spaCy is not a company**. It's an open-source library. Our company
 publishing spaCy and other software is called
 [Explosion AI](https://explosion.ai).
 
-<Infobox title="Download the spaCy Cheat Sheet!">
-
-[![spaCy Cheatsheet](../images/cheatsheet.jpg)](http://datacamp-community-prod.s3.amazonaws.com/29aa28bf-570a-4965-8f54-d6a541ae4e06)
-
-For the launch of our
-["Advanced NLP with spaCy"](https://www.datacamp.com/courses/advanced-nlp-with-spacy)
-course on DataCamp we created the first official spaCy cheat sheet! A handy
-two-page reference to the most important concepts and features, from loading
-models and accessing linguistic annotations, to custom pipeline components and
-rule-based matching.
-
-<p><Button to="http://datacamp-community-prod.s3.amazonaws.com/29aa28bf-570a-4965-8f54-d6a541ae4e06" variant="primary">Download</Button></p>
-
-</Infobox>
-
 ## Features {#features}
 
 In the documentation, you'll come across mentions of spaCy's features and
@@ -136,7 +136,7 @@ The entity visualizer lets you customize the following `options`:
 | Argument | Type | Description | Default |
 | -------- | ---- | ------------------------------------------------------------------------------------- | ------- |
 | `ents` | list | Entity types to highlight (`None` for all types). | `None` |
-| `colors` | dict | Color overrides. Entity types in lowercase should be mapped to color names or values. | `{}` |
+| `colors` | dict | Color overrides. Entity types in uppercase should be mapped to color names or values. | `{}` |
 
 If you specify a list of `ents`, only those entity types will be rendered – for
 example, you can choose to display `PERSON` entities. Internally, the visualizer
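The corrected table row above says that keys in `colors` are uppercase entity labels, matching the labels the models actually produce (`ORG`, `PERSON`, ...). A minimal sketch of passing both options to displaCy (the example text and the hex color are arbitrary):

```python
import spacy
from spacy import displacy

nlp = spacy.load("en_core_web_sm")
doc = nlp(u"Google was founded by Larry Page and Sergey Brin.")

options = {"ents": ["ORG", "PERSON"],        # only render these entity types
           "colors": {"ORG": "#ffd700"}}     # uppercase label mapped to a color value
html = displacy.render(doc, style="ent", options=options)
```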
@@ -90,7 +90,8 @@
 { "text": "StringStore", "url": "/api/stringstore" },
 { "text": "Vectors", "url": "/api/vectors" },
 { "text": "GoldParse", "url": "/api/goldparse" },
-{ "text": "GoldCorpus", "url": "/api/goldcorpus" }
+{ "text": "GoldCorpus", "url": "/api/goldcorpus" },
+{ "text": "Scorer", "url": "/api/scorer" }
 ]
 },
 {
@@ -1,5 +1,107 @@
 {
 "resources": [
+{
+"id": "nlp-architect",
+"title": "NLP Architect",
+"slogan": "Python lib for exploring Deep NLP & NLU by Intel AI",
+"github": "NervanaSystems/nlp-architect",
+"pip": "nlp-architect",
+"thumb": "https://i.imgur.com/vMideRx.png",
+"category": ["standalone", "research"],
+"tags": ["pytorch"]
+},
+{
+"id": "NeuroNER",
+"title": "NeuroNER",
+"slogan": "Named-entity recognition using neural networks",
+"github": "Franck-Dernoncourt/NeuroNER",
+"pip": "pyneuroner[cpu]",
+"code_example": [
+"from neuroner import neuromodel",
+"nn = neuromodel.NeuroNER(train_model=False, use_pretrained_model=True)"
+],
+"category": ["ner"],
+"tags": ["standalone"]
+},
+{
+"id": "NLPre",
+"title": "NLPre",
+"slogan": "Natural Language Preprocessing Library for health data and more",
+"github": "NIHOPA/NLPre",
+"pip": "nlpre",
+"code_example": [
+"from nlpre import titlecaps, dedash, identify_parenthetical_phrases",
+"from nlpre import replace_acronyms, replace_from_dictionary",
+"ABBR = identify_parenthetical_phrases()(text)",
+"parsers = [dedash(), titlecaps(), replace_acronyms(ABBR),",
+" replace_from_dictionary(prefix='MeSH_')]",
+"for f in parsers:",
+" text = f(text)",
+"print(text)"
+],
+"category": ["scientific"]
+},
+{
+"id": "Chatterbot",
+"title": "Chatterbot",
+"slogan": "A machine-learning based conversational dialog engine for creating chat bots",
+"github": "gunthercox/ChatterBot",
+"pip": "chatterbot",
+"thumb": "https://i.imgur.com/eyAhwXk.jpg",
+"code_example": [
+"from chatterbot import ChatBot",
+"from chatterbot.trainers import ListTrainer",
+"# Create a new chat bot named Charlie",
+"chatbot = ChatBot('Charlie')",
+"trainer = ListTrainer(chatbot)",
+"trainer.train([",
+"'Hi, can I help you?',",
+"'Sure, I would like to book a flight to Iceland.",
+"'Your flight has been booked.'",
+"])",
+"",
+"response = chatbot.get_response('I would like to book a flight.')"
+],
+"author": "Gunther Cox",
+"author_links": {
+"github": "gunthercox"
+},
+"category": ["conversational", "standalone"],
+"tags": ["chatbots"]
+},
+{
+"id": "saber",
+"title": "saber",
+"slogan": "Deep-learning based tool for information extraction in the biomedical domain",
+"github": "BaderLab/saber",
+"pip": "saber",
+"thumb": "https://raw.githubusercontent.com/BaderLab/saber/master/docs/img/saber_logo.png",
+"code_example": [
+"from saber.saber import Saber",
+"saber = Saber()",
+"saber.load('PRGE')",
+"saber.annotate('The phosphorylation of Hdm2 by MK2 promotes the ubiquitination of p53.')"
+],
+"author": "Bader Lab, University of Toronto",
+"category": ["scientific"],
+"tags": ["keras", "biomedical"]
+},
+{
+"id": "alibi",
+"title": "alibi",
+"slogan": "Algorithms for monitoring and explaining machine learning models ",
+"github": "SeldonIO/alibi",
+"pip": "alibi",
+"thumb": "https://i.imgur.com/YkzQHRp.png",
+"code_example": [
+"from alibi.explainers import AnchorTabular",
+"explainer = AnchorTabular(predict_fn, feature_names)",
+"explainer.fit(X_train)",
+"explainer.explain(x)"
+],
+"author": "Seldon",
+"category": ["standalone", "research"]
+},
 {
 "id": "spacymoji",
 "slogan": "Emoji handling and meta data as a spaCy pipeline component",
@@ -143,7 +245,7 @@
 "doc = nlp(my_doc_text)"
 ],
 "author": "tc64",
-"author_link": {
+"author_links": {
 "github": "tc64"
 },
 "category": ["pipeline"]
@@ -346,7 +448,7 @@
 "author_links": {
 "github": "huggingface"
 },
-"category": ["standalone", "conversational"],
+"category": ["standalone", "conversational", "models"],
 "tags": ["coref"]
 },
 {
@@ -538,7 +640,7 @@
 "twitter": "allenai_org",
 "website": "http://allenai.org"
 },
-"category": ["models", "research"]
+"category": ["scientific", "models", "research"]
 },
 {
 "id": "textacy",
@@ -601,7 +703,7 @@
 "github": "ahalterman",
 "twitter": "ahalterman"
 },
-"category": ["standalone"]
+"category": ["standalone", "scientific"]
 },
 {
 "id": "kindred",
@@ -626,7 +728,7 @@
 "author_links": {
 "github": "jakelever"
 },
-"category": ["standalone"]
+"category": ["standalone", "scientific"]
 },
 {
 "id": "sense2vec",
@@ -837,6 +939,42 @@
 },
 "category": ["standalone"]
 },
+{
+"id": "prefect",
+"title": "Prefect",
+"slogan": "Workflow management system designed for modern infrastructure",
+"github": "PrefectHQ/prefect",
+"pip": "prefect",
+"thumb": "https://i.imgur.com/oLTwr0e.png",
+"code_example": [
+"from prefect import Flow",
+"from prefect.tasks.spacy.spacy_tasks import SpacyNLP",
+"import spacy",
+"",
+"nlp = spacy.load(\"en_core_web_sm\")",
+"",
+"with Flow(\"Natural Language Processing\") as flow:",
+" doc = SpacyNLP(text=\"This is some text\", nlp=nlp)",
+"",
+"flow.run()"
+],
+"author": "Prefect",
+"author_links": {
+"website": "https://prefect.io"
+},
+"category": ["standalone"]
+},
+{
+"id": "graphbrain",
+"title": "Graphbrain",
+"slogan": "Automated meaning extraction and text understanding",
+"description": "Graphbrain is an Artificial Intelligence open-source software library and scientific research tool. Its aim is to facilitate automated meaning extraction and text understanding, as well as the exploration and inference of knowledge.",
+"github": "graphbrain/graphbrain",
+"pip": "graphbrain",
+"thumb": "https://i.imgur.com/cct9W1E.png",
+"author": "Graphbrain",
+"category": ["standalone"]
+},
 {
 "type": "education",
 "id": "oreilly-python-ds",
@@ -883,36 +1021,6 @@
 "author": "Bhargav Srinivasa-Desikan",
 "category": ["books"]
 },
-{
-"type": "education",
-"id": "datacamp-nlp-fundamentals",
-"title": "Natural Language Processing Fundamentals in Python",
-"slogan": "Datacamp, 2017",
-"description": "In this course, you'll learn Natural Language Processing (NLP) basics, such as how to identify and separate words, how to extract topics in a text, and how to build your own fake news classifier. You'll also learn how to use basic libraries such as NLTK, alongside libraries which utilize deep learning to solve common NLP problems. This course will give you the foundation to process and parse text as you move forward in your Python learning.",
-"url": "https://www.datacamp.com/courses/natural-language-processing-fundamentals-in-python",
-"thumb": "https://i.imgur.com/0Zks7c0.jpg",
-"author": "Katharine Jarmul",
-"author_links": {
-"twitter": "kjam"
-},
-"category": ["courses"]
-},
-{
-"type": "education",
-"id": "datacamp-advanced-nlp",
-"title": "Advanced Natural Language Processing with spaCy",
-"slogan": "Datacamp, 2019",
-"description": "If you're working with a lot of text, you'll eventually want to know more about it. For example, what's it about? What do the words mean in context? Who is doing what to whom? What companies and products are mentioned? Which texts are similar to each other? In this course, you'll learn how to use spaCy, a fast-growing industry standard library for NLP in Python, to build advanced natural language understanding systems, using both rule-based and machine learning approaches.",
-"url": "https://www.datacamp.com/courses/advanced-nlp-with-spacy",
-"thumb": "https://i.imgur.com/0Zks7c0.jpg",
-"author": "Ines Montani",
-"author_links": {
-"twitter": "_inesmontani",
-"github": "ines",
-"website": "https://ines.io"
-},
-"category": ["courses"]
-},
 {
 "type": "education",
 "id": "learning-path-spacy",
@@ -924,6 +1032,23 @@
 "author": "Aaron Kramer",
 "category": ["courses"]
 },
+{
+"type": "education",
+"id": "spacy-course",
+"title": "Advanced NLP with spaCy",
+"slogan": "spaCy, 2019",
+"description": "In this free interactive course, you'll learn how to use spaCy to build advanced natural language understanding systems, using both rule-based and machine learning approaches.",
+"url": "https://course.spacy.io",
+"image": "https://i.imgur.com/JC00pHW.jpg",
+"thumb": "https://i.imgur.com/5RXLtrr.jpg",
+"author": "Ines Montani",
+"author_links": {
+"twitter": "_inesmontani",
+"github": "ines",
+"website": "https://ines.io"
+},
+"category": ["courses"]
+},
 {
 "type": "education",
 "id": "video-spacys-ner-model",
@@ -1010,6 +1135,22 @@
 },
 "category": ["podcasts"]
 },
+{
+"type": "education",
+"id": "twimlai-podcast",
+"title": "TWiML & AI: Practical NLP with spaCy and Prodigy",
+"slogan": "May 2019",
+"description": "\"Ines and I caught up to discuss her various projects, including the aforementioned SpaCy, an open-source NLP library built with a focus on industry and production use cases. In our conversation, Ines gives us an overview of the SpaCy Library, a look at some of the use cases that excite her, and the Spacy community and contributors. We also discuss her work with Prodigy, an annotation service tool that uses continuous active learning to train models, and finally, what other exciting projects she is working on.\"",
+"thumb": "https://i.imgur.com/ng2F5gK.png",
+"url": "https://twimlai.com/twiml-talk-262-practical-natural-language-processing-with-spacy-and-prodigy-w-ines-montani",
+"iframe": "https://html5-player.libsyn.com/embed/episode/id/9691514/height/90/theme/custom/thumbnail/no/preload/no/direction/backward/render-playlist/no/custom-color/3e85b1/",
+"iframe_height": 90,
+"author": "Sam Charrington",
+"author_links": {
+"website": "https://twimlai.com"
+},
+"category": ["podcasts"]
+},
 {
 "id": "adam_qas",
 "title": "ADAM: Question Answering System",
@@ -1068,7 +1209,7 @@
 "github": "ecohealthalliance",
 "website": " https://ecohealthalliance.org/"
 },
-"category": ["research", "standalone"]
+"category": ["scientific", "standalone"]
 },
 {
 "id": "self-attentive-parser",
@@ -1311,8 +1452,100 @@
 "website": "http://w4nderlu.st"
 },
 "category": ["standalone", "research"]
+},
+{
+"id": "gracyql",
+"title": "gracyql",
+"slogan": "A thin GraphQL wrapper around spacy",
+"github": "oterrier/gracyql",
+"description": "An example of a basic [Starlette](https://github.com/encode/starlette) app using [Spacy](https://github.com/explosion/spaCy) and [Graphene](https://github.com/graphql-python/graphene). The main goal is to be able to use the amazing power of spaCy from other languages and retrieving only the information you need thanks to the GraphQL query definition. The GraphQL schema tries to mimic as much as possible the original Spacy API with classes Doc, Span and Token.",
+"thumb": "https://i.imgur.com/xC7zpTO.png",
+"category": ["apis"],
+"tags": ["graphql"],
+"code_example": [
+"query ParserDisabledQuery {",
+" nlp(model: \"en\", disable: [\"parser\", \"ner\"]) {",
+" doc(text: \"I live in Grenoble, France\") {",
+" text",
+" tokens {",
+" id",
+" pos",
+" lemma",
+" dep",
+" }",
+" ents {",
+" start",
+" end",
+" label",
+" }",
+" }",
+" }",
+"}"
+],
+"code_language": "json",
+"author": "Olivier Terrier",
+"author_links": {
+"github": "oterrier"
+}
+},
+{
+"id": "pyInflect",
+"slogan": "A python module for word inflections",
+"description": "This package uses the [spaCy 2.0 extensions](https://spacy.io/usage/processing-pipelines#extensions) to add word inflections to the system.",
+"github": "bjascob/pyInflect",
+"pip": "pyinflect",
+"code_example": [
+"import spacy",
+"import pyinflect",
+"",
+"nlp = spacy.load('en_core_web_sm')",
+"doc = nlp('This is an example.')",
+"doc[3].tag_ # NN",
+"doc[3]._.inflect('NNS') # examples"
+],
+"author": "Brad Jascob",
+"author_links": {
+"github": "bjascob"
+},
+"category": ["pipeline"],
+"tags": ["inflection"]
+},
+{
+"id": "NGym",
+"title": "NeuralGym",
+"slogan": "A little Windows GUI for training models with spaCy",
+"description": "NeuralGym is a Python application for Windows with a graphical user interface to train models with spaCy. Run the application, select an output folder, a training data file in spaCy's data format, a spaCy model or blank model and press 'Start'.",
+"github": "d5555/NeuralGym",
+"url": "https://github.com/d5555/NeuralGym",
+"image": "https://github.com/d5555/NeuralGym/raw/master/NGym.png",
+"thumb": "https://github.com/d5555/NeuralGym/raw/master/NGym/web.png",
+"author": "d5555",
+"category": ["training"],
+"tags": ["windows"]
+},
+{
+"id": "holmes",
+"title": "Holmes",
+"slogan": "Information extraction from English and German texts based on predicate logic",
+"github": "msg-systems/holmes-extractor",
+"url": "https://github.com/msg-systems/holmes-extractor",
+"description": "Holmes is a Python 3 library that supports a number of use cases involving information extraction from English and German texts, including chatbot, structural search, topic matching and supervised document classification.",
+"pip": "holmes-extractor",
+"category": ["conversational", "standalone"],
+"tags": ["chatbots", "text-processing"],
+"code_example": [
+"import holmes_extractor as holmes",
+"holmes_manager = holmes.Manager(model='en_coref_lg')",
+"holmes_manager.register_search_phrase('A big dog chases a cat')",
+"holmes_manager.start_chatbot_mode_console()"
+],
+"author": "Richard Paul Hudson",
+"author_links": {
+"github": "richardpaulhudson"
+}
 }
 ],
 
 "categories": [
 {
 "label": "Projects",
|
||||||
"title": "Research",
|
"title": "Research",
|
||||||
"description": "Frameworks and utilities for developing better NLP models, especially using neural networks"
|
"description": "Frameworks and utilities for developing better NLP models, especially using neural networks"
|
||||||
},
|
},
|
||||||
|
{
|
||||||
|
"id": "scientific",
|
||||||
|
"title": "Scientific",
|
||||||
|
"description": "Frameworks and utilities for scientific text processing"
|
||||||
|
},
|
||||||
{
|
{
|
||||||
"id": "visualizers",
|
"id": "visualizers",
|
||||||
"title": "Visualizers",
|
"title": "Visualizers",
|
||||||
|
@ -1356,6 +1594,11 @@
|
||||||
"id": "standalone",
|
"id": "standalone",
|
||||||
"title": "Standalone",
|
"title": "Standalone",
|
||||||
"description": "Self-contained libraries or tools that use spaCy under the hood"
|
"description": "Self-contained libraries or tools that use spaCy under the hood"
|
||||||
|
},
|
||||||
|
{
|
||||||
|
"id": "models",
|
||||||
|
"title": "Models",
|
||||||
|
"description": "Third-party pre-trained models for different languages and domains"
|
||||||
}
|
}
|
||||||
]
|
]
|
||||||
},
|
},
|
||||||
|
|
|
@ -93,6 +93,7 @@ const SEO = ({ description, lang, title, section, sectionTitle, bodyClass }) =>
|
||||||
|
|
||||||
return (
|
return (
|
||||||
<Helmet
|
<Helmet
|
||||||
|
defer={false}
|
||||||
htmlAttributes={{ lang }}
|
htmlAttributes={{ lang }}
|
||||||
bodyAttributes={{ class: bodyClass }}
|
bodyAttributes={{ class: bodyClass }}
|
||||||
title={pageTitle}
|
title={pageTitle}
|
||||||
|
|
|
@ -125,7 +125,7 @@ const UniverseContent = ({ content = [], categories, pageContext, location, mdxC
|
||||||
</p>
|
</p>
|
||||||
|
|
||||||
<InlineList>
|
<InlineList>
|
||||||
<Button variant="primary" to={github('website/universe/README.md')}>
|
<Button variant="primary" to={github('website/UNIVERSE.md')}>
|
||||||
Read the docs
|
Read the docs
|
||||||
</Button>
|
</Button>
|
||||||
<Button icon="code" to={github('website/meta/universe.json')}>
|
<Button icon="code" to={github('website/meta/universe.json')}>
|
||||||
|
|
|
@ -75,16 +75,6 @@ const Landing = ({ data }) => {
|
||||||
<LandingSubtitle>in Python</LandingSubtitle>
|
<LandingSubtitle>in Python</LandingSubtitle>
|
||||||
</LandingHeader>
|
</LandingHeader>
|
||||||
<LandingGrid blocks>
|
<LandingGrid blocks>
|
||||||
<LandingCard title="Fastest in the world">
|
|
||||||
<p>
|
|
||||||
spaCy excels at large-scale information extraction tasks. It's written from
|
|
||||||
the ground up in carefully memory-managed Cython. Independent research has
|
|
||||||
confirmed that spaCy is the fastest in the world. If your application needs
|
|
||||||
to process entire web dumps, spaCy is the library you want to be using.
|
|
||||||
</p>
|
|
||||||
<LandingButton to="/usage/facts-figures">Facts & Figures</LandingButton>
|
|
||||||
</LandingCard>
|
|
||||||
|
|
||||||
<LandingCard title="Get things done">
|
<LandingCard title="Get things done">
|
||||||
<p>
|
<p>
|
||||||
spaCy is designed to help you do real work — to build real products, or
|
spaCy is designed to help you do real work — to build real products, or
|
||||||
|
@ -92,7 +82,16 @@ const Landing = ({ data }) => {
|
||||||
wasting it. It's easy to install, and its API is simple and productive. We
|
wasting it. It's easy to install, and its API is simple and productive. We
|
||||||
like to think of spaCy as the Ruby on Rails of Natural Language Processing.
|
like to think of spaCy as the Ruby on Rails of Natural Language Processing.
|
||||||
</p>
|
</p>
|
||||||
<LandingButton to="/usage">Get started</LandingButton>
|
<LandingButton to="/usage/spacy-101">Get started</LandingButton>
|
||||||
|
</LandingCard>
|
||||||
|
<LandingCard title="Blazing fast">
|
||||||
|
<p>
|
||||||
|
spaCy excels at large-scale information extraction tasks. It's written from
|
||||||
|
the ground up in carefully memory-managed Cython. Independent research in
|
||||||
|
2015 found spaCy to be the fastest in the world. If your application needs
|
||||||
|
to process entire web dumps, spaCy is the library you want to be using.
|
||||||
|
</p>
|
||||||
|
<LandingButton to="/usage/facts-figures">Facts & Figures</LandingButton>
|
||||||
</LandingCard>
|
</LandingCard>
|
||||||
|
|
||||||
<LandingCard title="Deep learning">
|
<LandingCard title="Deep learning">
|
||||||
|
@ -129,6 +128,7 @@ const Landing = ({ data }) => {
|
||||||
<Li>
|
<Li>
|
||||||
Pre-trained <strong>word vectors</strong>
|
Pre-trained <strong>word vectors</strong>
|
||||||
</Li>
|
</Li>
|
||||||
|
<Li>State-of-the-art speed</Li>
|
||||||
<Li>
|
<Li>
|
||||||
Easy <strong>deep learning</strong> integration
|
Easy <strong>deep learning</strong> integration
|
||||||
</Li>
|
</Li>
|
||||||
|
@ -144,7 +144,6 @@ const Landing = ({ data }) => {
|
||||||
<Li>
|
<Li>
|
||||||
Easy <strong>model packaging</strong> and deployment
|
Easy <strong>model packaging</strong> and deployment
|
||||||
</Li>
|
</Li>
|
||||||
<Li>State-of-the-art speed</Li>
|
|
||||||
<Li>Robust, rigorously evaluated accuracy</Li>
|
<Li>Robust, rigorously evaluated accuracy</Li>
|
||||||
</Ul>
|
</Ul>
|
||||||
</LandingCol>
|
</LandingCol>
|
||||||
|
|