Merge branch 'master' into develop

2025-11-27 13:26:07 +03:00 · 2019-02-07 20:54:07 +01:00 · 2019-02-07 20:54:07 +01:00 · 5d0b60999d
commit 5d0b60999d
parent dbeebfa3a2 04aa041c9e
77 changed files with 293374 additions and 292084 deletions
--- a/.github/contributors/DeNeutoy.md
+++ b/.github/contributors/DeNeutoy.md
@@ -0,0 +1,106 @@
 # spaCy contributor agreement
 This spaCy Contributor Agreement (**"SCA"**) is based on the
 [Oracle Contributor Agreement](http://www.oracle.com/technetwork/oca-405177.pdf).
 The SCA applies to any contribution that you make to any product or project
 managed by us (the **"project"**), and sets out the intellectual property rights
 you grant to us in the contributed materials. The term **"us"** shall mean
 [ExplosionAI UG (haftungsbeschränkt)](https://explosion.ai/legal). The term
 **"you"** shall mean the person or entity identified below.
 If you agree to be bound by these terms, fill in the information requested
 below and include the filled-in version with your first pull request, under the
 folder [`.github/contributors/`](/.github/contributors/). The name of the file
 should be your GitHub username, with the extension `.md`. For example, the user
 example_user would create the file `.github/contributors/example_user.md`.
 Read this agreement carefully before signing. These terms and conditions
 constitute a binding legal agreement.
 ## Contributor Agreement
 1. The term "contribution" or "contributed materials" means any source code,
 object code, patch, tool, sample, graphic, specification, manual,
 documentation, or any other material posted or submitted by you to the project.
 2. With respect to any worldwide copyrights, or copyright applications and
 registrations, in your contribution:
    * you hereby assign to us joint ownership, and to the extent that such
    assignment is or becomes invalid, ineffective or unenforceable, you hereby
    grant to us a perpetual, irrevocable, non-exclusive, worldwide, no-charge,
    royalty-free, unrestricted license to exercise all rights under those
    copyrights. This includes, at our option, the right to sublicense these same
    rights to third parties through multiple levels of sublicensees or other
    licensing arrangements;
    * you agree that each of us can do all things in relation to your
    contribution as if each of us were the sole owners, and if one of us makes
    a derivative work of your contribution, the one who makes the derivative
    work (or has it made will be the sole owner of that derivative work;
    * you agree that you will not assert any moral rights in your contribution
    against us, our licensees or transferees;
    * you agree that we may register a copyright in your contribution and
    exercise all ownership rights associated with it; and
    * you agree that neither of us has any duty to consult with, obtain the
    consent of, pay or render an accounting to the other for any use or
    distribution of your contribution.
 3. With respect to any patents you own, or that you can license without payment
 to any third party, you hereby grant to us a perpetual, irrevocable,
 non-exclusive, worldwide, no-charge, royalty-free license to:
    * make, have made, use, sell, offer to sell, import, and otherwise transfer
    your contribution in whole or in part, alone or in combination with or
    included in any product, work or materials arising out of the project to
    which your contribution was submitted, and
    * at our option, to sublicense these same rights to third parties through
    multiple levels of sublicensees or other licensing arrangements.
 4. Except as set out above, you keep all right, title, and interest in your
 contribution. The rights that you grant to us under these terms are effective
 on the date you first submitted a contribution to us, even if your submission
 took place before the date you sign these terms.
 5. You covenant, represent, warrant and agree that:
    * Each contribution that you submit is and shall be an original work of
    authorship and you can legally grant the rights set out in this SCA;
    * to the best of your knowledge, each contribution will not violate any
    third party's copyrights, trademarks, patents, or other intellectual
    property rights; and
    * each contribution shall be in compliance with U.S. export control laws and
    other applicable export and import laws. You agree to notify us if you
    become aware of any circumstance which would make any of the foregoing
    representations inaccurate in any respect. We may publicly disclose your
    participation in the project, including the fact that you have signed the SCA.
 6. This SCA is governed by the laws of the State of California and applicable
 U.S. Federal law. Any choice of law rules will not apply.
 7. Please place an “x” on one of the applicable statement below. Please do NOT
 mark both statements:
    * [x] I am signing on behalf of myself as an individual and no other person
    or entity, including my employer, has or will have rights with respect to my
    contributions.
    * [ ] I am signing on behalf of my employer or a legal entity and I have the
    actual authority to contractually bind that entity.
 ## Contributor Details
 | Field                          | Entry                |
 |------------------------------- | -------------------- |
 | Name                           |Mark Neumann                      |
 | Company name (if applicable)   |Allen Institute for AI                      |
 | Title or role (if applicable)  |Research Engineer                      |
 | Date                           | 13/01/2019                      |
 | GitHub username                |@Deneutoy                      |
 | Website (optional)             |markneumann.xyz                      |
--- a/.github/contributors/Loghijiaha.md
+++ b/.github/contributors/Loghijiaha.md
@@ -0,0 +1,106 @@
 # spaCy contributor agreement
 This spaCy Contributor Agreement (**"SCA"**) is based on the
 [Oracle Contributor Agreement](http://www.oracle.com/technetwork/oca-405177.pdf).
 The SCA applies to any contribution that you make to any product or project
 managed by us (the **"project"**), and sets out the intellectual property rights
 you grant to us in the contributed materials. The term **"us"** shall mean
 [ExplosionAI UG (haftungsbeschränkt)](https://explosion.ai/legal). The term
 **"you"** shall mean the person or entity identified below.
 If you agree to be bound by these terms, fill in the information requested
 below and include the filled-in version with your first pull request, under the
 folder [`.github/contributors/`](/.github/contributors/). The name of the file
 should be your GitHub username, with the extension `.md`. For example, the user
 example_user would create the file `.github/contributors/example_user.md`.
 Read this agreement carefully before signing. These terms and conditions
 constitute a binding legal agreement.
 ## Contributor Agreement
 1. The term "contribution" or "contributed materials" means any source code,
 object code, patch, tool, sample, graphic, specification, manual,
 documentation, or any other material posted or submitted by you to the project.
 2. With respect to any worldwide copyrights, or copyright applications and
 registrations, in your contribution:
    * you hereby assign to us joint ownership, and to the extent that such
    assignment is or becomes invalid, ineffective or unenforceable, you hereby
    grant to us a perpetual, irrevocable, non-exclusive, worldwide, no-charge,
    royalty-free, unrestricted license to exercise all rights under those
    copyrights. This includes, at our option, the right to sublicense these same
    rights to third parties through multiple levels of sublicensees or other
    licensing arrangements;
    * you agree that each of us can do all things in relation to your
    contribution as if each of us were the sole owners, and if one of us makes
    a derivative work of your contribution, the one who makes the derivative
    work (or has it made will be the sole owner of that derivative work;
    * you agree that you will not assert any moral rights in your contribution
    against us, our licensees or transferees;
    * you agree that we may register a copyright in your contribution and
    exercise all ownership rights associated with it; and
    * you agree that neither of us has any duty to consult with, obtain the
    consent of, pay or render an accounting to the other for any use or
    distribution of your contribution.
 3. With respect to any patents you own, or that you can license without payment
 to any third party, you hereby grant to us a perpetual, irrevocable,
 non-exclusive, worldwide, no-charge, royalty-free license to:
    * make, have made, use, sell, offer to sell, import, and otherwise transfer
    your contribution in whole or in part, alone or in combination with or
    included in any product, work or materials arising out of the project to
    which your contribution was submitted, and
    * at our option, to sublicense these same rights to third parties through
    multiple levels of sublicensees or other licensing arrangements.
 4. Except as set out above, you keep all right, title, and interest in your
 contribution. The rights that you grant to us under these terms are effective
 on the date you first submitted a contribution to us, even if your submission
 took place before the date you sign these terms.
 5. You covenant, represent, warrant and agree that:
    * Each contribution that you submit is and shall be an original work of
    authorship and you can legally grant the rights set out in this SCA;
    * to the best of your knowledge, each contribution will not violate any
    third party's copyrights, trademarks, patents, or other intellectual
    property rights; and
    * each contribution shall be in compliance with U.S. export control laws and
    other applicable export and import laws. You agree to notify us if you
    become aware of any circumstance which would make any of the foregoing
    representations inaccurate in any respect. We may publicly disclose your
    participation in the project, including the fact that you have signed the SCA.
 6. This SCA is governed by the laws of the State of California and applicable
 U.S. Federal law. Any choice of law rules will not apply.
 7. Please place an “x” on one of the applicable statement below. Please do NOT
 mark both statements:
    * [ x] I am signing on behalf of myself as an individual and no other person
    or entity, including my employer, has or will have rights with respect to my
    contributions.
    * [ x] I am signing on behalf of my employer or a legal entity and I have the
    actual authority to contractually bind that entity.
 ## Contributor Details
 | Field                          | Entry                |
 |------------------------------- | -------------------- |
 | Name                           | Loghi Perinpanayagam |
 | Company name (if applicable)   |                      |
 | Title or role (if applicable)  |   Student            |
 | Date                           |   13 Jan, 2019       |
 | GitHub username                |   loghijiaha         |
 | Website (optional)             |                      |
--- a/.github/contributors/PolyglotOpenstreetmap.md
+++ b/.github/contributors/PolyglotOpenstreetmap.md
@@ -0,0 +1,106 @@
 # spaCy contributor agreement
 This spaCy Contributor Agreement (**"SCA"**) is based on the
 [Oracle Contributor Agreement](http://www.oracle.com/technetwork/oca-405177.pdf).
 The SCA applies to any contribution that you make to any product or project
 managed by us (the **"project"**), and sets out the intellectual property rights
 you grant to us in the contributed materials. The term **"us"** shall mean
 [ExplosionAI UG (haftungsbeschränkt)](https://explosion.ai/legal). The term
 **"you"** shall mean the person or entity identified below.
 If you agree to be bound by these terms, fill in the information requested
 below and include the filled-in version with your first pull request, under the
 folder [`.github/contributors/`](/.github/contributors/). The name of the file
 should be your GitHub username, with the extension `.md`. For example, the user
 example_user would create the file `.github/contributors/example_user.md`.
 Read this agreement carefully before signing. These terms and conditions
 constitute a binding legal agreement.
 ## Contributor Agreement
 1. The term "contribution" or "contributed materials" means any source code,
 object code, patch, tool, sample, graphic, specification, manual,
 documentation, or any other material posted or submitted by you to the project.
 2. With respect to any worldwide copyrights, or copyright applications and
 registrations, in your contribution:
    * you hereby assign to us joint ownership, and to the extent that such
    assignment is or becomes invalid, ineffective or unenforceable, you hereby
    grant to us a perpetual, irrevocable, non-exclusive, worldwide, no-charge,
    royalty-free, unrestricted license to exercise all rights under those
    copyrights. This includes, at our option, the right to sublicense these same
    rights to third parties through multiple levels of sublicensees or other
    licensing arrangements;
    * you agree that each of us can do all things in relation to your
    contribution as if each of us were the sole owners, and if one of us makes
    a derivative work of your contribution, the one who makes the derivative
    work (or has it made will be the sole owner of that derivative work;
    * you agree that you will not assert any moral rights in your contribution
    against us, our licensees or transferees;
    * you agree that we may register a copyright in your contribution and
    exercise all ownership rights associated with it; and
    * you agree that neither of us has any duty to consult with, obtain the
    consent of, pay or render an accounting to the other for any use or
    distribution of your contribution.
 3. With respect to any patents you own, or that you can license without payment
 to any third party, you hereby grant to us a perpetual, irrevocable,
 non-exclusive, worldwide, no-charge, royalty-free license to:
    * make, have made, use, sell, offer to sell, import, and otherwise transfer
    your contribution in whole or in part, alone or in combination with or
    included in any product, work or materials arising out of the project to
    which your contribution was submitted, and
    * at our option, to sublicense these same rights to third parties through
    multiple levels of sublicensees or other licensing arrangements.
 4. Except as set out above, you keep all right, title, and interest in your
 contribution. The rights that you grant to us under these terms are effective
 on the date you first submitted a contribution to us, even if your submission
 took place before the date you sign these terms.
 5. You covenant, represent, warrant and agree that:
    * Each contribution that you submit is and shall be an original work of
    authorship and you can legally grant the rights set out in this SCA;
    * to the best of your knowledge, each contribution will not violate any
    third party's copyrights, trademarks, patents, or other intellectual
    property rights; and
    * each contribution shall be in compliance with U.S. export control laws and
    other applicable export and import laws. You agree to notify us if you
    become aware of any circumstance which would make any of the foregoing
    representations inaccurate in any respect. We may publicly disclose your
    participation in the project, including the fact that you have signed the SCA.
 6. This SCA is governed by the laws of the State of California and applicable
 U.S. Federal law. Any choice of law rules will not apply.
 7. Please place an “x” on one of the applicable statement below. Please do NOT
 mark both statements:
    * [x] I am signing on behalf of myself as an individual and no other person
    or entity, including my employer, has or will have rights with respect to my
    contributions.
    * [ ] I am signing on behalf of my employer or a legal entity and I have the
    actual authority to contractually bind that entity.
 ## Contributor Details
 | Field                          | Entry                |
 |------------------------------- | -------------------- |
 | Name                           | Jo                   |
 | Company name (if applicable)   |                      |
 | Title or role (if applicable)  |                      |
 | Date                           | 2018-01-26           |
 | GitHub username                | PolyglotOpenstreetmap|
 | Website (optional)             |                      |
--- a/.github/contributors/adrianeboyd.md
+++ b/.github/contributors/adrianeboyd.md
@@ -0,0 +1,106 @@
 # spaCy contributor agreement
 This spaCy Contributor Agreement (**"SCA"**) is based on the
 [Oracle Contributor Agreement](http://www.oracle.com/technetwork/oca-405177.pdf).
 The SCA applies to any contribution that you make to any product or project
 managed by us (the **"project"**), and sets out the intellectual property rights
 you grant to us in the contributed materials. The term **"us"** shall mean
 [ExplosionAI UG (haftungsbeschränkt)](https://explosion.ai/legal). The term
 **"you"** shall mean the person or entity identified below.
 If you agree to be bound by these terms, fill in the information requested
 below and include the filled-in version with your first pull request, under the
 folder [`.github/contributors/`](/.github/contributors/). The name of the file
 should be your GitHub username, with the extension `.md`. For example, the user
 example_user would create the file `.github/contributors/example_user.md`.
 Read this agreement carefully before signing. These terms and conditions
 constitute a binding legal agreement.
 ## Contributor Agreement
 1. The term "contribution" or "contributed materials" means any source code,
 object code, patch, tool, sample, graphic, specification, manual,
 documentation, or any other material posted or submitted by you to the project.
 2. With respect to any worldwide copyrights, or copyright applications and
 registrations, in your contribution:
    * you hereby assign to us joint ownership, and to the extent that such
    assignment is or becomes invalid, ineffective or unenforceable, you hereby
    grant to us a perpetual, irrevocable, non-exclusive, worldwide, no-charge,
    royalty-free, unrestricted license to exercise all rights under those
    copyrights. This includes, at our option, the right to sublicense these same
    rights to third parties through multiple levels of sublicensees or other
    licensing arrangements;
    * you agree that each of us can do all things in relation to your
    contribution as if each of us were the sole owners, and if one of us makes
    a derivative work of your contribution, the one who makes the derivative
    work (or has it made will be the sole owner of that derivative work;
    * you agree that you will not assert any moral rights in your contribution
    against us, our licensees or transferees;
    * you agree that we may register a copyright in your contribution and
    exercise all ownership rights associated with it; and
    * you agree that neither of us has any duty to consult with, obtain the
    consent of, pay or render an accounting to the other for any use or
    distribution of your contribution.
 3. With respect to any patents you own, or that you can license without payment
 to any third party, you hereby grant to us a perpetual, irrevocable,
 non-exclusive, worldwide, no-charge, royalty-free license to:
    * make, have made, use, sell, offer to sell, import, and otherwise transfer
    your contribution in whole or in part, alone or in combination with or
    included in any product, work or materials arising out of the project to
    which your contribution was submitted, and
    * at our option, to sublicense these same rights to third parties through
    multiple levels of sublicensees or other licensing arrangements.
 4. Except as set out above, you keep all right, title, and interest in your
 contribution. The rights that you grant to us under these terms are effective
 on the date you first submitted a contribution to us, even if your submission
 took place before the date you sign these terms.
 5. You covenant, represent, warrant and agree that:
    * Each contribution that you submit is and shall be an original work of
    authorship and you can legally grant the rights set out in this SCA;
    * to the best of your knowledge, each contribution will not violate any
    third party's copyrights, trademarks, patents, or other intellectual
    property rights; and
    * each contribution shall be in compliance with U.S. export control laws and
    other applicable export and import laws. You agree to notify us if you
    become aware of any circumstance which would make any of the foregoing
    representations inaccurate in any respect. We may publicly disclose your
    participation in the project, including the fact that you have signed the SCA.
 6. This SCA is governed by the laws of the State of California and applicable
 U.S. Federal law. Any choice of law rules will not apply.
 7. Please place an “x” on one of the applicable statement below. Please do NOT
 mark both statements:
    * [x] I am signing on behalf of myself as an individual and no other person
    or entity, including my employer, has or will have rights with respect to my
    contributions.
    * [ ] I am signing on behalf of my employer or a legal entity and I have the
    actual authority to contractually bind that entity.
 ## Contributor Details
 | Field                          | Entry                |
 |------------------------------- | -------------------- |
 | Name                           | Adriane Boyd         |
 | Company name (if applicable)   |                      |
 | Title or role (if applicable)  |                      |
 | Date                           | 28 January 2019      |
 | GitHub username                | adrianeboyd          |
 | Website (optional)             |                      |
--- a/.github/contributors/alvations.md
+++ b/.github/contributors/alvations.md
@@ -0,0 +1,106 @@
 # spaCy contributor agreement
 This spaCy Contributor Agreement (**"SCA"**) is based on the
 [Oracle Contributor Agreement](http://www.oracle.com/technetwork/oca-405177.pdf).
 The SCA applies to any contribution that you make to any product or project
 managed by us (the **"project"**), and sets out the intellectual property rights
 you grant to us in the contributed materials. The term **"us"** shall mean
 [ExplosionAI UG (haftungsbeschränkt)](https://explosion.ai/legal). The term
 **"you"** shall mean the person or entity identified below.
 If you agree to be bound by these terms, fill in the information requested
 below and include the filled-in version with your first pull request, under the
 folder [`.github/contributors/`](/.github/contributors/). The name of the file
 should be your GitHub username, with the extension `.md`. For example, the user
 example_user would create the file `.github/contributors/example_user.md`.
 Read this agreement carefully before signing. These terms and conditions
 constitute a binding legal agreement.
 ## Contributor Agreement
 1. The term "contribution" or "contributed materials" means any source code,
 object code, patch, tool, sample, graphic, specification, manual,
 documentation, or any other material posted or submitted by you to the project.
 2. With respect to any worldwide copyrights, or copyright applications and
 registrations, in your contribution:
    * you hereby assign to us joint ownership, and to the extent that such
    assignment is or becomes invalid, ineffective or unenforceable, you hereby
    grant to us a perpetual, irrevocable, non-exclusive, worldwide, no-charge,
    royalty-free, unrestricted license to exercise all rights under those
    copyrights. This includes, at our option, the right to sublicense these same
    rights to third parties through multiple levels of sublicensees or other
    licensing arrangements;
    * you agree that each of us can do all things in relation to your
    contribution as if each of us were the sole owners, and if one of us makes
    a derivative work of your contribution, the one who makes the derivative
    work (or has it made will be the sole owner of that derivative work;
    * you agree that you will not assert any moral rights in your contribution
    against us, our licensees or transferees;
    * you agree that we may register a copyright in your contribution and
    exercise all ownership rights associated with it; and
    * you agree that neither of us has any duty to consult with, obtain the
    consent of, pay or render an accounting to the other for any use or
    distribution of your contribution.
 3. With respect to any patents you own, or that you can license without payment
 to any third party, you hereby grant to us a perpetual, irrevocable,
 non-exclusive, worldwide, no-charge, royalty-free license to:
    * make, have made, use, sell, offer to sell, import, and otherwise transfer
    your contribution in whole or in part, alone or in combination with or
    included in any product, work or materials arising out of the project to
    which your contribution was submitted, and
    * at our option, to sublicense these same rights to third parties through
    multiple levels of sublicensees or other licensing arrangements.
 4. Except as set out above, you keep all right, title, and interest in your
 contribution. The rights that you grant to us under these terms are effective
 on the date you first submitted a contribution to us, even if your submission
 took place before the date you sign these terms.
 5. You covenant, represent, warrant and agree that:
    * Each contribution that you submit is and shall be an original work of
    authorship and you can legally grant the rights set out in this SCA;
    * to the best of your knowledge, each contribution will not violate any
    third party's copyrights, trademarks, patents, or other intellectual
    property rights; and
    * each contribution shall be in compliance with U.S. export control laws and
    other applicable export and import laws. You agree to notify us if you
    become aware of any circumstance which would make any of the foregoing
    representations inaccurate in any respect. We may publicly disclose your
    participation in the project, including the fact that you have signed the SCA.
 6. This SCA is governed by the laws of the State of California and applicable
 U.S. Federal law. Any choice of law rules will not apply.
 7. Please place an “x” on one of the applicable statement below. Please do NOT
 mark both statements:
    * [ ] I am signing on behalf of myself as an individual and no other person
    or entity, including my employer, has or will have rights with respect to my
    contributions.
    * [ ] I am signing on behalf of my employer or a legal entity and I have the
    actual authority to contractually bind that entity.
 ## Contributor Details
 | Field                          | Entry                |
 |------------------------------- | -------------------- |
 | Name                           |  Liling              |
 | Company name (if applicable)   |                      |
 | Title or role (if applicable)  |                      |
 | Date                           |  04 Jan 2019         |
 | GitHub username                |  alvations           |
 | Website (optional)             |                      |
--- a/.github/contributors/amperinet.md
+++ b/.github/contributors/amperinet.md
@@ -101,6 +101,6 @@ mark both statements:
 | Name                           | Amandine Périnet        |
 | Company name (if applicable)   | 365Talents              |
 | Title or role (if applicable)  | Data Science Researcher |
-| Date                           | 12/12/2018              |
+| Date                           | 28/01/2019              |
 | GitHub username                | amperinet               |
 | Website (optional)             |                         |
--- a/.github/contributors/boena.md
+++ b/.github/contributors/boena.md
@@ -0,0 +1,106 @@
 # spaCy contributor agreement
 This spaCy Contributor Agreement (**"SCA"**) is based on the
 [Oracle Contributor Agreement](http://www.oracle.com/technetwork/oca-405177.pdf).
 The SCA applies to any contribution that you make to any product or project
 managed by us (the **"project"**), and sets out the intellectual property rights
 you grant to us in the contributed materials. The term **"us"** shall mean
 [ExplosionAI UG (haftungsbeschränkt)](https://explosion.ai/legal). The term
 **"you"** shall mean the person or entity identified below.
 If you agree to be bound by these terms, fill in the information requested
 below and include the filled-in version with your first pull request, under the
 folder [`.github/contributors/`](/.github/contributors/). The name of the file
 should be your GitHub username, with the extension `.md`. For example, the user
 example_user would create the file `.github/contributors/example_user.md`.
 Read this agreement carefully before signing. These terms and conditions
 constitute a binding legal agreement.
 ## Contributor Agreement
 1. The term "contribution" or "contributed materials" means any source code,
 object code, patch, tool, sample, graphic, specification, manual,
 documentation, or any other material posted or submitted by you to the project.
 2. With respect to any worldwide copyrights, or copyright applications and
 registrations, in your contribution:
    * you hereby assign to us joint ownership, and to the extent that such
    assignment is or becomes invalid, ineffective or unenforceable, you hereby
    grant to us a perpetual, irrevocable, non-exclusive, worldwide, no-charge,
    royalty-free, unrestricted license to exercise all rights under those
    copyrights. This includes, at our option, the right to sublicense these same
    rights to third parties through multiple levels of sublicensees or other
    licensing arrangements;
    * you agree that each of us can do all things in relation to your
    contribution as if each of us were the sole owners, and if one of us makes
    a derivative work of your contribution, the one who makes the derivative
    work (or has it made will be the sole owner of that derivative work;
    * you agree that you will not assert any moral rights in your contribution
    against us, our licensees or transferees;
    * you agree that we may register a copyright in your contribution and
    exercise all ownership rights associated with it; and
    * you agree that neither of us has any duty to consult with, obtain the
    consent of, pay or render an accounting to the other for any use or
    distribution of your contribution.
 3. With respect to any patents you own, or that you can license without payment
 to any third party, you hereby grant to us a perpetual, irrevocable,
 non-exclusive, worldwide, no-charge, royalty-free license to:
    * make, have made, use, sell, offer to sell, import, and otherwise transfer
    your contribution in whole or in part, alone or in combination with or
    included in any product, work or materials arising out of the project to
    which your contribution was submitted, and
    * at our option, to sublicense these same rights to third parties through
    multiple levels of sublicensees or other licensing arrangements.
 4. Except as set out above, you keep all right, title, and interest in your
 contribution. The rights that you grant to us under these terms are effective
 on the date you first submitted a contribution to us, even if your submission
 took place before the date you sign these terms.
 5. You covenant, represent, warrant and agree that:
    * Each contribution that you submit is and shall be an original work of
    authorship and you can legally grant the rights set out in this SCA;
    * to the best of your knowledge, each contribution will not violate any
    third party's copyrights, trademarks, patents, or other intellectual
    property rights; and
    * each contribution shall be in compliance with U.S. export control laws and
    other applicable export and import laws. You agree to notify us if you
    become aware of any circumstance which would make any of the foregoing
    representations inaccurate in any respect. We may publicly disclose your
    participation in the project, including the fact that you have signed the SCA.
 6. This SCA is governed by the laws of the State of California and applicable
 U.S. Federal law. Any choice of law rules will not apply.
 7. Please place an “x” on one of the applicable statements below. Please do NOT
 mark both statements:
    * [x] I am signing on behalf of myself as an individual and no other person
    or entity, including my employer, has or will have rights with respect to my
    contributions.
    * [ ] I am signing on behalf of my employer or a legal entity and I have the
    actual authority to contractually bind that entity.
 ## Contributor Details
 | Field                          | Entry                |
 |------------------------------- | -------------------- |
 | Name                           | Björn Lennartsson                     |
 | Company name (if applicable)   | Uptrail AB                     |
 | Title or role (if applicable)  | CTO                     |
 | Date                           | 2019-01-15                     |
 | GitHub username                | boena                     |
 | Website (optional)             | www.uptrail.com                     |
--- a/.github/contributors/foufaster.md
+++ b/.github/contributors/foufaster.md
@ -0,0 +1,106 @@
 # spaCy contributor agreement
 This spaCy Contributor Agreement (**"SCA"**) is based on the
 [Oracle Contributor Agreement](http://www.oracle.com/technetwork/oca-405177.pdf).
 The SCA applies to any contribution that you make to any product or project
 managed by us (the **"project"**), and sets out the intellectual property rights
 you grant to us in the contributed materials. The term **"us"** shall mean
 [ExplosionAI UG (haftungsbeschränkt)](https://explosion.ai/legal). The term
 **"you"** shall mean the person or entity identified below.
 If you agree to be bound by these terms, fill in the information requested
 below and include the filled-in version with your first pull request, under the
 folder [`.github/contributors/`](/.github/contributors/). The name of the file
 should be your GitHub username, with the extension `.md`. For example, the user
 example_user would create the file `.github/contributors/example_user.md`.
 Read this agreement carefully before signing. These terms and conditions
 constitute a binding legal agreement.
 ## Contributor Agreement
 1. The term "contribution" or "contributed materials" means any source code,
 object code, patch, tool, sample, graphic, specification, manual,
 documentation, or any other material posted or submitted by you to the project.
 2. With respect to any worldwide copyrights, or copyright applications and
 registrations, in your contribution:
    * you hereby assign to us joint ownership, and to the extent that such
    assignment is or becomes invalid, ineffective or unenforceable, you hereby
    grant to us a perpetual, irrevocable, non-exclusive, worldwide, no-charge,
    royalty-free, unrestricted license to exercise all rights under those
    copyrights. This includes, at our option, the right to sublicense these same
    rights to third parties through multiple levels of sublicensees or other
    licensing arrangements;
    * you agree that each of us can do all things in relation to your
    contribution as if each of us were the sole owners, and if one of us makes
    a derivative work of your contribution, the one who makes the derivative
    work (or has it made) will be the sole owner of that derivative work;
    * you agree that you will not assert any moral rights in your contribution
    against us, our licensees or transferees;
    * you agree that we may register a copyright in your contribution and
    exercise all ownership rights associated with it; and
    * you agree that neither of us has any duty to consult with, obtain the
    consent of, pay or render an accounting to the other for any use or
    distribution of your contribution.
 3. With respect to any patents you own, or that you can license without payment
 to any third party, you hereby grant to us a perpetual, irrevocable,
 non-exclusive, worldwide, no-charge, royalty-free license to:
    * make, have made, use, sell, offer to sell, import, and otherwise transfer
    your contribution in whole or in part, alone or in combination with or
    included in any product, work or materials arising out of the project to
    which your contribution was submitted, and
    * at our option, to sublicense these same rights to third parties through
    multiple levels of sublicensees or other licensing arrangements.
 4. Except as set out above, you keep all right, title, and interest in your
 contribution. The rights that you grant to us under these terms are effective
 on the date you first submitted a contribution to us, even if your submission
 took place before the date you sign these terms.
 5. You covenant, represent, warrant and agree that:
    * Each contribution that you submit is and shall be an original work of
    authorship and you can legally grant the rights set out in this SCA;
    * to the best of your knowledge, each contribution will not violate any
    third party's copyrights, trademarks, patents, or other intellectual
    property rights; and
    * each contribution shall be in compliance with U.S. export control laws and
    other applicable export and import laws. You agree to notify us if you
    become aware of any circumstance which would make any of the foregoing
    representations inaccurate in any respect. We may publicly disclose your
    participation in the project, including the fact that you have signed the SCA.
 6. This SCA is governed by the laws of the State of California and applicable
 U.S. Federal law. Any choice of law rules will not apply.
 7. Please place an “x” on one of the applicable statements below. Please do NOT
 mark both statements:
    * [x] I am signing on behalf of myself as an individual and no other person
    or entity, including my employer, has or will have rights with respect to my
    contributions.
    * [ ] I am signing on behalf of my employer or a legal entity and I have the
    actual authority to contractually bind that entity.
 ## Contributor Details
 | Field                          | Entry                |
 |------------------------------- | -------------------- |
 | Name                           |Anès Foufa            |
 | Company name (if applicable)   |                      |
 | Title or role (if applicable)  |NLP developer         |
 | Date                           |21/01/2019            |
 | GitHub username                |foufaster             |
 | Website (optional)             |                      |
--- a/.github/contributors/ozcankasal.md
+++ b/.github/contributors/ozcankasal.md
@ -0,0 +1,106 @@
 # spaCy contributor agreement
 This spaCy Contributor Agreement (**"SCA"**) is based on the
 [Oracle Contributor Agreement](http://www.oracle.com/technetwork/oca-405177.pdf).
 The SCA applies to any contribution that you make to any product or project
 managed by us (the **"project"**), and sets out the intellectual property rights
 you grant to us in the contributed materials. The term **"us"** shall mean
 [ExplosionAI UG (haftungsbeschränkt)](https://explosion.ai/legal). The term
 **"you"** shall mean the person or entity identified below.
 If you agree to be bound by these terms, fill in the information requested
 below and include the filled-in version with your first pull request, under the
 folder [`.github/contributors/`](/.github/contributors/). The name of the file
 should be your GitHub username, with the extension `.md`. For example, the user
 example_user would create the file `.github/contributors/example_user.md`.
 Read this agreement carefully before signing. These terms and conditions
 constitute a binding legal agreement.
 ## Contributor Agreement
 1. The term "contribution" or "contributed materials" means any source code,
 object code, patch, tool, sample, graphic, specification, manual,
 documentation, or any other material posted or submitted by you to the project.
 2. With respect to any worldwide copyrights, or copyright applications and
 registrations, in your contribution:
    * you hereby assign to us joint ownership, and to the extent that such
    assignment is or becomes invalid, ineffective or unenforceable, you hereby
    grant to us a perpetual, irrevocable, non-exclusive, worldwide, no-charge,
    royalty-free, unrestricted license to exercise all rights under those
    copyrights. This includes, at our option, the right to sublicense these same
    rights to third parties through multiple levels of sublicensees or other
    licensing arrangements;
    * you agree that each of us can do all things in relation to your
    contribution as if each of us were the sole owners, and if one of us makes
    a derivative work of your contribution, the one who makes the derivative
    work (or has it made) will be the sole owner of that derivative work;
    * you agree that you will not assert any moral rights in your contribution
    against us, our licensees or transferees;
    * you agree that we may register a copyright in your contribution and
    exercise all ownership rights associated with it; and
    * you agree that neither of us has any duty to consult with, obtain the
    consent of, pay or render an accounting to the other for any use or
    distribution of your contribution.
 3. With respect to any patents you own, or that you can license without payment
 to any third party, you hereby grant to us a perpetual, irrevocable,
 non-exclusive, worldwide, no-charge, royalty-free license to:
    * make, have made, use, sell, offer to sell, import, and otherwise transfer
    your contribution in whole or in part, alone or in combination with or
    included in any product, work or materials arising out of the project to
    which your contribution was submitted, and
    * at our option, to sublicense these same rights to third parties through
    multiple levels of sublicensees or other licensing arrangements.
 4. Except as set out above, you keep all right, title, and interest in your
 contribution. The rights that you grant to us under these terms are effective
 on the date you first submitted a contribution to us, even if your submission
 took place before the date you sign these terms.
 5. You covenant, represent, warrant and agree that:
    * Each contribution that you submit is and shall be an original work of
    authorship and you can legally grant the rights set out in this SCA;
    * to the best of your knowledge, each contribution will not violate any
    third party's copyrights, trademarks, patents, or other intellectual
    property rights; and
    * each contribution shall be in compliance with U.S. export control laws and
    other applicable export and import laws. You agree to notify us if you
    become aware of any circumstance which would make any of the foregoing
    representations inaccurate in any respect. We may publicly disclose your
    participation in the project, including the fact that you have signed the SCA.
 6. This SCA is governed by the laws of the State of California and applicable
 U.S. Federal law. Any choice of law rules will not apply.
 7. Please place an “x” on one of the applicable statements below. Please do NOT
 mark both statements:
    * [x] I am signing on behalf of myself as an individual and no other person
    or entity, including my employer, has or will have rights with respect to my
    contributions.
    * [ ] I am signing on behalf of my employer or a legal entity and I have the
    actual authority to contractually bind that entity.
 ## Contributor Details
 | Field                          | Entry                |
 |------------------------------- | -------------------- |
 | Name                           | Ozcan Kasal          |
 | Company name (if applicable)   |                      |
 | Title or role (if applicable)  |                      |
 | Date                           | December 21, 2018    |
 | GitHub username                | ozcankasal           |
 | Website (optional)             |                      |
--- a/.github/contributors/retnuh.md
+++ b/.github/contributors/retnuh.md
@ -0,0 +1,106 @@
 # spaCy contributor agreement
 This spaCy Contributor Agreement (**"SCA"**) is based on the
 [Oracle Contributor Agreement](http://www.oracle.com/technetwork/oca-405177.pdf).
 The SCA applies to any contribution that you make to any product or project
 managed by us (the **"project"**), and sets out the intellectual property rights
 you grant to us in the contributed materials. The term **"us"** shall mean
 [ExplosionAI UG (haftungsbeschränkt)](https://explosion.ai/legal). The term
 **"you"** shall mean the person or entity identified below.
 If you agree to be bound by these terms, fill in the information requested
 below and include the filled-in version with your first pull request, under the
 folder [`.github/contributors/`](/.github/contributors/). The name of the file
 should be your GitHub username, with the extension `.md`. For example, the user
 example_user would create the file `.github/contributors/example_user.md`.
 Read this agreement carefully before signing. These terms and conditions
 constitute a binding legal agreement.
 ## Contributor Agreement
 1.  The term "contribution" or "contributed materials" means any source code,
    object code, patch, tool, sample, graphic, specification, manual,
    documentation, or any other material posted or submitted by you to the project.
 2.  With respect to any worldwide copyrights, or copyright applications and
    registrations, in your contribution:
        * you hereby assign to us joint ownership, and to the extent that such
        assignment is or becomes invalid, ineffective or unenforceable, you hereby
        grant to us a perpetual, irrevocable, non-exclusive, worldwide, no-charge,
        royalty-free, unrestricted license to exercise all rights under those
        copyrights. This includes, at our option, the right to sublicense these same
        rights to third parties through multiple levels of sublicensees or other
        licensing arrangements;
        * you agree that each of us can do all things in relation to your
        contribution as if each of us were the sole owners, and if one of us makes
        a derivative work of your contribution, the one who makes the derivative
        work (or has it made) will be the sole owner of that derivative work;
        * you agree that you will not assert any moral rights in your contribution
        against us, our licensees or transferees;
        * you agree that we may register a copyright in your contribution and
        exercise all ownership rights associated with it; and
        * you agree that neither of us has any duty to consult with, obtain the
        consent of, pay or render an accounting to the other for any use or
        distribution of your contribution.
 3.  With respect to any patents you own, or that you can license without payment
    to any third party, you hereby grant to us a perpetual, irrevocable,
    non-exclusive, worldwide, no-charge, royalty-free license to:
        * make, have made, use, sell, offer to sell, import, and otherwise transfer
        your contribution in whole or in part, alone or in combination with or
        included in any product, work or materials arising out of the project to
        which your contribution was submitted, and
        * at our option, to sublicense these same rights to third parties through
        multiple levels of sublicensees or other licensing arrangements.
 4.  Except as set out above, you keep all right, title, and interest in your
    contribution. The rights that you grant to us under these terms are effective
    on the date you first submitted a contribution to us, even if your submission
    took place before the date you sign these terms.
 5.  You covenant, represent, warrant and agree that:
    - Each contribution that you submit is and shall be an original work of
      authorship and you can legally grant the rights set out in this SCA;
    - to the best of your knowledge, each contribution will not violate any
      third party's copyrights, trademarks, patents, or other intellectual
      property rights; and
    - each contribution shall be in compliance with U.S. export control laws and
      other applicable export and import laws. You agree to notify us if you
      become aware of any circumstance which would make any of the foregoing
      representations inaccurate in any respect. We may publicly disclose your
      participation in the project, including the fact that you have signed the SCA.
 6.  This SCA is governed by the laws of the State of California and applicable
    U.S. Federal law. Any choice of law rules will not apply.
 7.  Please place an “x” on one of the applicable statements below. Please do NOT
    mark both statements:
        * [x] I am signing on behalf of myself as an individual and no other person
        or entity, including my employer, has or will have rights with respect to my
        contributions.
        * [ ] I am signing on behalf of my employer or a legal entity and I have the
        actual authority to contractually bind that entity.
 ## Contributor Details
 | Field                         | Entry        |
 | ----------------------------- | ------------ |
 | Name                          | Hunter Kelly |
 | Company name (if applicable)  |              |
 | Title or role (if applicable) |              |
 | Date                          | 2019-01-10   |
 | GitHub username               | retnuh       |
 | Website (optional)            |              |
--- a/.github/contributors/willprice.md
+++ b/.github/contributors/willprice.md
@ -0,0 +1,106 @@
 # spaCy contributor agreement
 This spaCy Contributor Agreement (**"SCA"**) is based on the
 [Oracle Contributor Agreement](http://www.oracle.com/technetwork/oca-405177.pdf).
 The SCA applies to any contribution that you make to any product or project
 managed by us (the **"project"**), and sets out the intellectual property rights
 you grant to us in the contributed materials. The term **"us"** shall mean
 [ExplosionAI UG (haftungsbeschränkt)](https://explosion.ai/legal). The term
 **"you"** shall mean the person or entity identified below.
 If you agree to be bound by these terms, fill in the information requested
 below and include the filled-in version with your first pull request, under the
 folder [`.github/contributors/`](/.github/contributors/). The name of the file
 should be your GitHub username, with the extension `.md`. For example, the user
 example_user would create the file `.github/contributors/example_user.md`.
 Read this agreement carefully before signing. These terms and conditions
 constitute a binding legal agreement.
 ## Contributor Agreement
 1. The term "contribution" or "contributed materials" means any source code,
 object code, patch, tool, sample, graphic, specification, manual,
 documentation, or any other material posted or submitted by you to the project.
 2. With respect to any worldwide copyrights, or copyright applications and
 registrations, in your contribution:
    * you hereby assign to us joint ownership, and to the extent that such
    assignment is or becomes invalid, ineffective or unenforceable, you hereby
    grant to us a perpetual, irrevocable, non-exclusive, worldwide, no-charge,
    royalty-free, unrestricted license to exercise all rights under those
    copyrights. This includes, at our option, the right to sublicense these same
    rights to third parties through multiple levels of sublicensees or other
    licensing arrangements;
    * you agree that each of us can do all things in relation to your
    contribution as if each of us were the sole owners, and if one of us makes
    a derivative work of your contribution, the one who makes the derivative
    work (or has it made) will be the sole owner of that derivative work;
    * you agree that you will not assert any moral rights in your contribution
    against us, our licensees or transferees;
    * you agree that we may register a copyright in your contribution and
    exercise all ownership rights associated with it; and
    * you agree that neither of us has any duty to consult with, obtain the
    consent of, pay or render an accounting to the other for any use or
    distribution of your contribution.
 3. With respect to any patents you own, or that you can license without payment
 to any third party, you hereby grant to us a perpetual, irrevocable,
 non-exclusive, worldwide, no-charge, royalty-free license to:
    * make, have made, use, sell, offer to sell, import, and otherwise transfer
    your contribution in whole or in part, alone or in combination with or
    included in any product, work or materials arising out of the project to
    which your contribution was submitted, and
    * at our option, to sublicense these same rights to third parties through
    multiple levels of sublicensees or other licensing arrangements.
 4. Except as set out above, you keep all right, title, and interest in your
 contribution. The rights that you grant to us under these terms are effective
 on the date you first submitted a contribution to us, even if your submission
 took place before the date you sign these terms.
 5. You covenant, represent, warrant and agree that:
    * Each contribution that you submit is and shall be an original work of
    authorship and you can legally grant the rights set out in this SCA;
    * to the best of your knowledge, each contribution will not violate any
    third party's copyrights, trademarks, patents, or other intellectual
    property rights; and
    * each contribution shall be in compliance with U.S. export control laws and
    other applicable export and import laws. You agree to notify us if you
    become aware of any circumstance which would make any of the foregoing
    representations inaccurate in any respect. We may publicly disclose your
    participation in the project, including the fact that you have signed the SCA.
 6. This SCA is governed by the laws of the State of California and applicable
 U.S. Federal law. Any choice of law rules will not apply.
 7. Please place an “x” on one of the applicable statements below. Please do NOT
 mark both statements:
    * [x] I am signing on behalf of myself as an individual and no other person
    or entity, including my employer, has or will have rights with respect to my
    contributions.
    * [ ] I am signing on behalf of my employer or a legal entity and I have the
    actual authority to contractually bind that entity.
 ## Contributor Details
 | Field                          | Entry                 |
 |------------------------------- | --------------------- |
 | Name                           | Will Price            |
 | Company name (if applicable)   | N/A                   |
 | Title or role (if applicable)  | N/A                   |
 | Date                           | 26/12/2018            |
 | GitHub username                | willprice             |
 | Website (optional)             | https://willprice.org |
--- a/MANIFEST.in
+++ b/MANIFEST.in
@ -1,4 +1,5 @@
 recursive-include include *.h
 include LICENSE
 include README.md
+include pyproject.toml
 include bin/spacy
--- a/contributer_agreement.md
+++ b/contributer_agreement.md
@ -0,0 +1,106 @@
 # spaCy contributor agreement
 This spaCy Contributor Agreement (**"SCA"**) is based on the
 [Oracle Contributor Agreement](http://www.oracle.com/technetwork/oca-405177.pdf).
 The SCA applies to any contribution that you make to any product or project
 managed by us (the **"project"**), and sets out the intellectual property rights
 you grant to us in the contributed materials. The term **"us"** shall mean
 [ExplosionAI UG (haftungsbeschränkt)](https://explosion.ai/legal). The term
 **"you"** shall mean the person or entity identified below.
 If you agree to be bound by these terms, fill in the information requested
 below and include the filled-in version with your first pull request, under the
 folder [`.github/contributors/`](/.github/contributors/). The name of the file
 should be your GitHub username, with the extension `.md`. For example, the user
 example_user would create the file `.github/contributors/example_user.md`.
 Read this agreement carefully before signing. These terms and conditions
 constitute a binding legal agreement.
 ## Contributor Agreement
 1. The term "contribution" or "contributed materials" means any source code,
 object code, patch, tool, sample, graphic, specification, manual,
 documentation, or any other material posted or submitted by you to the project.
 2. With respect to any worldwide copyrights, or copyright applications and
 registrations, in your contribution:
    * you hereby assign to us joint ownership, and to the extent that such
    assignment is or becomes invalid, ineffective or unenforceable, you hereby
    grant to us a perpetual, irrevocable, non-exclusive, worldwide, no-charge,
    royalty-free, unrestricted license to exercise all rights under those
    copyrights. This includes, at our option, the right to sublicense these same
    rights to third parties through multiple levels of sublicensees or other
    licensing arrangements;
    * you agree that each of us can do all things in relation to your
    contribution as if each of us were the sole owners, and if one of us makes
    a derivative work of your contribution, the one who makes the derivative
    work (or has it made) will be the sole owner of that derivative work;
    * you agree that you will not assert any moral rights in your contribution
    against us, our licensees or transferees;
    * you agree that we may register a copyright in your contribution and
    exercise all ownership rights associated with it; and
    * you agree that neither of us has any duty to consult with, obtain the
    consent of, pay or render an accounting to the other for any use or
    distribution of your contribution.
 3. With respect to any patents you own, or that you can license without payment
 to any third party, you hereby grant to us a perpetual, irrevocable,
 non-exclusive, worldwide, no-charge, royalty-free license to:
    * make, have made, use, sell, offer to sell, import, and otherwise transfer
    your contribution in whole or in part, alone or in combination with or
    included in any product, work or materials arising out of the project to
    which your contribution was submitted, and
    * at our option, to sublicense these same rights to third parties through
    multiple levels of sublicensees or other licensing arrangements.
 4. Except as set out above, you keep all right, title, and interest in your
 contribution. The rights that you grant to us under these terms are effective
 on the date you first submitted a contribution to us, even if your submission
 took place before the date you sign these terms.
 5. You covenant, represent, warrant and agree that:
    * Each contribution that you submit is and shall be an original work of
    authorship and you can legally grant the rights set out in this SCA;
    * to the best of your knowledge, each contribution will not violate any
    third party's copyrights, trademarks, patents, or other intellectual
    property rights; and
    * each contribution shall be in compliance with U.S. export control laws and
    other applicable export and import laws. You agree to notify us if you
    become aware of any circumstance which would make any of the foregoing
    representations inaccurate in any respect. We may publicly disclose your
    participation in the project, including the fact that you have signed the SCA.
 6. This SCA is governed by the laws of the State of California and applicable
 U.S. Federal law. Any choice of law rules will not apply.
 7. Please place an “x” on one of the applicable statements below. Please do NOT
 mark both statements:
    * [x] I am signing on behalf of myself as an individual and no other person
    or entity, including my employer, has or will have rights with respect to my
    contributions.
    * [ ] I am signing on behalf of my employer or a legal entity and I have the
    actual authority to contractually bind that entity.
 ## Contributor Details
 | Field                          | Entry                |
 |------------------------------- | -------------------- |
 | Name                           | Laura Baakman        |
 | Company name (if applicable)   |                      |
 | Title or role (if applicable)  |                      |
 | Date                           | February 7, 2019     |
 | GitHub username                | lauraBaakman         |
 | Website (optional)             |                      |
--- a/examples/information_extraction/phrase_matcher.py
+++ b/examples/information_extraction/phrase_matcher.py
@ -58,7 +58,7 @@ import spacy
    lang=("Language class to initialise", "option", "l", str),
 )
 def main(patterns_loc, text_loc, n=10000, lang="en"):
-    nlp = spacy.blank("en")
+    nlp = spacy.blank(lang)
    nlp.vocab.lex_attr_getters = {}
    phrases = read_gazetteer(nlp.tokenizer, patterns_loc)
    count = 0
--- a/examples/training/train_textcat.py
+++ b/examples/training/train_textcat.py
@ -26,6 +26,11 @@ from spacy.util import minibatch, compounding
    n_iter=("Number of training iterations", "option", "n", int),
 )
 def main(model=None, output_dir=None, n_iter=20, n_texts=2000):
+    if output_dir is not None:
+        output_dir = Path(output_dir)
+        if not output_dir.exists():
+            output_dir.mkdir()
    if model is not None:
        nlp = spacy.load(model)  # load existing spaCy model
        print("Loaded model '%s'" % model)
@ -87,9 +92,6 @@ def main(model=None, output_dir=None, n_iter=20, n_texts=2000):
    print(test_text, doc.cats)
    if output_dir is not None:
-        output_dir = Path(output_dir)
-        if not output_dir.exists():
-            output_dir.mkdir()
        with nlp.use_params(optimizer.averages):
            nlp.to_disk(output_dir)
        print("Saved model to", output_dir)
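The two hunks above move the output-directory setup from the end of `main` to the start. A minimal standalone sketch of that setup step follows. `prepare_output_dir` is a hypothetical helper name, not part of the example script, and the idea that the move is meant to surface path problems before training starts is an assumption rather than something the commit states.

```python
from pathlib import Path


def prepare_output_dir(output_dir=None):
    """Create the output directory up front (hypothetical helper).

    Mirrors the relocated lines: nothing happens when no directory is
    requested, and an existing directory is left untouched.
    """
    if output_dir is None:
        return None
    output_dir = Path(output_dir)
    if not output_dir.exists():
        # mkdir() without parents=True raises if the parent is missing,
        # so a bad path is reported here rather than after training.
        output_dir.mkdir()
    return output_dir
```

Calling this before any training work means an unusable path fails immediately instead of after `n_iter` iterations.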
--- a/examples/training/training-data.json
+++ b/examples/training/training-data.json
@ -1,6 +1,6 @@
 [
    {
-      "id": "wsj_0200",
+      "id": 42,
      "paragraphs": [
        {
          "raw": "In an Oct. 19 review of \"The Misanthrope\" at Chicago's Goodman Theatre (\"Revitalized Classics Take the Stage in Windy City,\" Leisure & Arts), the role of Celimene, played by Kim Cattrall, was mistakenly attributed to Christina Haag. Ms. Haag plays Elianti.",
--- a/pyproject.toml
+++ b/pyproject.toml
@ -0,0 +1,10 @@
 [build-system]
 requires = ["setuptools",
            "wheel>0.32.0,<0.33.0",
            "Cython",
            "cymem>=2.0.2,<2.1.0",
            "preshed>=2.0.1,<2.1.0",
            "murmurhash>=0.28.0,<1.1.0",
            "thinc>=6.12.1,<6.13.0",
            ]
 build-backend = "setuptools.build_meta"
--- a/requirements.txt
+++ b/requirements.txt
@ -14,7 +14,7 @@ plac<1.0.0,>=0.9.6
 pathlib==1.0.1; python_version < "3.4"
 # Development dependencies
 cython>=0.25
-pytest>=4.0.0,<5.0.0
+pytest>=4.0.0,<4.1.0
 pytest-timeout>=1.3.0,<2.0.0
 mock>=2.0.0,<3.0.0
 flake8>=3.5.0,<3.6.0
--- a/setup.py
+++ b/setup.py
@ -246,6 +246,7 @@ def setup_package():
                "cuda92": ["cupy-cuda92>=4.0"],
                "cuda100": ["cupy-cuda100>=4.0"],
            },
            python_requires=">=2.7,!=3.0.*,!=3.1.*,!=3.2.*,!=3.3.*",
            classifiers=[
                "Development Status :: 5 - Production/Stable",
                "Environment :: Console",
--- a/spacy/cli/converters/iob2json.py
+++ b/spacy/cli/converters/iob2json.py
@ -31,9 +31,13 @@ def read_iob(raw_sents):
        tokens = [re.split("[^\w\-]", line.strip())]
        if len(tokens[0]) == 3:
            words, pos, iob = zip(*tokens)
-        else:
+        elif len(tokens[0]) == 2:
            words, iob = zip(*tokens)
            pos = ["-"] * len(words)
        else:
            raise ValueError(
                "The iob/iob2 file is not formatted correctly. Try checking whitespace and delimiters."
            )
        biluo = iob_to_biluo(iob)
        sentences.append(
            [
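The hunk above replaces a catch-all `else` with an explicit two-column branch, so malformed lines now fail loudly instead of being unpacked incorrectly. A minimal sketch of that dispatch in isolation, assuming each token has already been split into a list of columns; `split_iob_columns` is a hypothetical helper name and is not part of spaCy:

```python
def split_iob_columns(tokens):
    """Split per-token columns into words, POS tags and IOB tags.

    Hypothetical helper that mirrors the branch in read_iob: three columns
    mean word/pos/iob, two columns mean word/iob, anything else is an error.
    """
    n_cols = len(tokens[0])
    if n_cols == 3:
        words, pos, iob = zip(*tokens)
    elif n_cols == 2:
        words, iob = zip(*tokens)
        pos = ("-",) * len(words)  # placeholder tag when POS is missing
    else:
        raise ValueError(
            "The iob/iob2 file is not formatted correctly. "
            "Try checking whitespace and delimiters."
        )
    return list(words), list(pos), list(iob)


print(split_iob_columns([["London", "NNP", "B-GPE"], ["is", "VBZ", "O"]]))
print(split_iob_columns([["London", "B-GPE"]]))
```

A line with one column, or with four or more, now raises `ValueError` rather than failing later with an unpacking error.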
--- a/spacy/cli/init_model.py
+++ b/spacy/cli/init_model.py
@ -208,7 +208,11 @@ def read_freqs(freqs_loc, max_length=100, min_doc_freq=5, min_freq=50):
            doc_freq = int(doc_freq)
            freq = int(freq)
            if doc_freq >= min_doc_freq and freq >= min_freq and len(key) < max_length:
-                word = literal_eval(key)
+                try:
                    word = literal_eval(key)
                except SyntaxError:
                    # Take odd strings literally.
                    word = literal_eval("'%s'" % key)
                smooth_count = counts.smoother(int(freq))
                probs[word] = math.log(smooth_count) - log_total
    oov_prob = math.log(counts.smoother(0)) - log_total
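The fallback added above can be tried outside `read_freqs`. This is a sketch only: `parse_key` is a hypothetical wrapper name, and the sample keys are made up for illustration.

```python
from ast import literal_eval


def parse_key(key):
    """Evaluate a frequency-file key, retrying it as a quoted string."""
    try:
        return literal_eval(key)
    except SyntaxError:
        # Take odd strings literally, as in the hunk above.
        return literal_eval("'%s'" % key)


print(parse_key("'the'"))  # a properly quoted key
print(parse_key("a b"))    # not valid Python syntax, so it is re-quoted
```

The retry only catches `SyntaxError`. A key that itself contains a single quote would still fail on the second attempt, and a bare identifier raises `ValueError` instead, so neither case is covered by this fallback.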
--- a/spacy/displacy/__init__.py
+++ b/spacy/displacy/__init__.py
@ -9,7 +9,6 @@ from ..util import is_in_jupyter
 _html = {}
 IS_JUPYTER = is_in_jupyter()
 RENDER_WRAPPER = None
@ -18,7 +17,7 @@ def render(
    style="dep",
    page=False,
    minify=False,
-    jupyter=IS_JUPYTER,
+    jupyter=False,
    options={},
    manual=False,
 ):
@ -51,7 +50,7 @@ def render(
    html = _html["parsed"]
    if RENDER_WRAPPER is not None:
        html = RENDER_WRAPPER(html)
-    if jupyter:  # return HTML rendered by IPython display()
+    if jupyter or is_in_jupyter():  # return HTML rendered by IPython display()
        from IPython.core.display import display, HTML
        return display(HTML(html))
--- a/spacy/displacy/render.py
+++ b/spacy/displacy/render.py
@ -1,7 +1,7 @@
 # coding: utf8
 from __future__ import unicode_literals
-import random
+import uuid
 from .templates import TPL_DEP_SVG, TPL_DEP_WORDS, TPL_DEP_ARCS
 from .templates import TPL_ENT, TPL_ENTS, TPL_FIGURE, TPL_TITLE, TPL_PAGE
@ -41,7 +41,7 @@ class DependencyRenderer(object):
        """
        # Create a random ID prefix to make sure parses don't receive the
        # same ID, even if they're identical
-        id_prefix = random.randint(0, 999)
+        id_prefix = uuid.uuid4().hex
        rendered = [
            self.render_svg("{}-{}".format(id_prefix, i), p["words"], p["arcs"])
            for i, p in enumerate(parsed)
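The change above matters because `random.randint(0, 999)` can only produce 1000 distinct prefixes, so two renders on the same page can end up with the same SVG IDs. A small sketch of the new ID scheme, assuming only the `"{}-{}"` format shown in the hunk; `make_svg_ids` is a hypothetical helper, not part of displaCy:

```python
import uuid


def make_svg_ids(n_parses):
    """Build one ID per parse, sharing a random 32-character hex prefix."""
    id_prefix = uuid.uuid4().hex
    return ["{}-{}".format(id_prefix, i) for i in range(n_parses)]


first = make_svg_ids(2)
second = make_svg_ids(2)
print(first)
print(second)
```

Each call draws a fresh version 4 UUID, so the prefixes of `first` and `second` differ even when the parses are identical.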
--- a/spacy/lang/fr/lemmatizer/__init__.py
+++ b/spacy/lang/fr/lemmatizer/__init__.py
@ -4,20 +4,24 @@ from __future__ import unicode_literals
 from .lookup import LOOKUP
 from ._adjectives import ADJECTIVES
 from ._adjectives_irreg import ADJECTIVES_IRREG
 from ._adp_irreg import ADP_IRREG
 from ._adverbs import ADVERBS
 from ._auxiliary_verbs_irreg import AUXILIARY_VERBS_IRREG
 from ._cconj_irreg import CCONJ_IRREG
 from ._dets_irreg import DETS_IRREG
 from ._lemma_rules import ADJECTIVE_RULES, NOUN_RULES, VERB_RULES
 from ._nouns import NOUNS
 from ._nouns_irreg import NOUNS_IRREG
 from ._pronouns_irreg import PRONOUNS_IRREG
 from ._sconj_irreg import SCONJ_IRREG
 from ._verbs import VERBS
 from ._verbs_irreg import VERBS_IRREG
 from ._dets_irreg import DETS_IRREG
 from ._pronouns_irreg import PRONOUNS_IRREG
 from ._auxiliary_verbs_irreg import AUXILIARY_VERBS_IRREG
 from ._lemma_rules import ADJECTIVE_RULES, NOUN_RULES, VERB_RULES
 LEMMA_INDEX = {'adj': ADJECTIVES, 'adv': ADVERBS, 'noun': NOUNS, 'verb': VERBS}
-LEMMA_EXC = {'adj': ADJECTIVES_IRREG, 'noun': NOUNS_IRREG, 'verb': VERBS_IRREG, 
+LEMMA_EXC = {'adj': ADJECTIVES_IRREG, 'adp': ADP_IRREG, 'aux': AUXILIARY_VERBS_IRREG,
-             'det': DETS_IRREG, 'pron': PRONOUNS_IRREG, 'aux': AUXILIARY_VERBS_IRREG}
+             'cconj': CCONJ_IRREG, 'det': DETS_IRREG, 'noun': NOUNS_IRREG, 'verb': VERBS_IRREG, 
             'pron': PRONOUNS_IRREG, 'sconj': SCONJ_IRREG}
 LEMMA_RULES = {'adj': ADJECTIVE_RULES, 'noun': NOUN_RULES, 'verb': VERB_RULES}
--- a/spacy/lang/fr/lemmatizer/_adjectives.py
+++ b/spacy/lang/fr/lemmatizer/_adjectives.py
--- a/spacy/lang/fr/lemmatizer/_adjectives_irreg.py
+++ b/spacy/lang/fr/lemmatizer/_adjectives_irreg.py
--- a/spacy/lang/fr/lemmatizer/_adp_irreg.py
+++ b/spacy/lang/fr/lemmatizer/_adp_irreg.py
@ -0,0 +1,24 @@
 # coding: utf8
 from __future__ import unicode_literals
 ADP_IRREG = {
 	"a": ("à",),
 	"apr.": ("après",),
 	"aux": ("à",),
 	"av.": ("avant",),
 	"avt": ("avant",),
 	"cf.": ("cf",),
 	"conf.": ("cf",),
 	"confer": ("cf",),
 	"d'": ("de",),
 	"des": ("de",),
 	"du": ("de",),
 	"jusqu'": ("jusque",),
 	"pdt": ("pendant",),
        "+": ("plus",),
        "pr": ("pour",),
 	"/": ("sur",),
 	"versus": ("vs",),
 	"vs.": ("vs",)
 }
--- a/spacy/lang/fr/lemmatizer/_adverbs.py
+++ b/spacy/lang/fr/lemmatizer/_adverbs.py
--- a/spacy/lang/fr/lemmatizer/_cconj_irreg.py
+++ b/spacy/lang/fr/lemmatizer/_cconj_irreg.py
@ -0,0 +1,17 @@
 # coding: utf8
 from __future__ import unicode_literals
 CCONJ_IRREG = {
 	"&amp;": ("et",),
 	"c-à-d": ("c'est-à-dire",),
 	"c.-à.-d.": ("c'est-à-dire",),
 	"càd": ("c'est-à-dire",),
 	"&": ("et",),
 	"et|ou": ("et-ou",),
 	"et/ou": ("et-ou",),
 	"i.e.": ("c'est-à-dire",),
 	"ie": ("c'est-à-dire",),
 	"ou/et": ("et-ou",),
 	"+": ("plus",)
 }
--- a/spacy/lang/fr/lemmatizer/_dets_irreg.py
+++ b/spacy/lang/fr/lemmatizer/_dets_irreg.py
@ -4,20 +4,27 @@ from __future__ import unicode_literals
 DETS_IRREG = {
    "aucune": ("aucun",),
    "cents": ("cent",),
    "certaine": ("certain",),
    "certaines": ("certain",),
    "certains": ("certain",),
    "ces": ("ce",),
    "cet": ("ce",),
    "cette": ("ce",),
-    "cents": ("cent",),
+    "des": ("un",),
    "certaines": ("certains",),
    "différentes": ("différents",),
    "diverse": ("divers",),
    "diverses": ("divers",),
    "du": ("de",),
    "la": ("le",),
    "les": ("le",),
    "l'": ("le",),
    "laquelle": ("lequel",),
    "les": ("le",),
    "lesdites": ("ledit",),
    "lesdits": ("ledit",),
    "leurs": ("leur",),
    "lesquelles": ("lequel",),
    "lesquels": ("lequel",),
-    "leurs": ("leur",),
+    "l'": ("le",),
    "mainte": ("maint",),
    "maintes": ("maint",),
    "maints": ("maint",),
@ -27,23 +34,29 @@ DETS_IRREG = {
    "nulle": ("nul",),
    "nulles": ("nul",),
    "nuls": ("nul",),
    "pareille": ("pareil",),
    "pareilles": ("pareil",),
    "pareils": ("pareil",),
    "quelle": ("quel",),
    "quelles": ("quel",),
-    "quels": ("quel",),
+    "qq": ("quelque",),
-    "quelqu'": ("quelque",),
+    "qqes": ("quelque",),
    "qqs": ("quelque",),
    "quelques": ("quelque",),
    "quelqu'": ("quelque",),
    "quels": ("quel",),
    "sa": ("son",),
    "ses": ("son",),
    "telle": ("tel",),
    "telles": ("tel",),
    "tels": ("tel",),
    "ta": ("ton",),
    "telles": ("tel",),
    "telle": ("tel",),
    "tels": ("tel",),
    "tes": ("ton",),
    "tous": ("tout",),
    "toute": ("tout",),
    "toutes": ("tout",),
-    "des": ("un",),
+    "toute": ("tout",),
    "une": ("un",),
    "vingts": ("vingt",),
    "vot'": ("votre",),
    "vos": ("votre",),
 }
--- a/spacy/lang/fr/lemmatizer/_lemma_rules.py
+++ b/spacy/lang/fr/lemmatizer/_lemma_rules.py
@ -63,36 +63,8 @@ NOUN_RULES = [
    ["w", "w"],
    ["y", "y"],
    ["z", "z"],
-    ["as", "a"],
+    ["s", ""],
-    ["aux", "au"],
+    ["x", ""],
    ["cs", "c"],
    ["chs", "ch"],
    ["ds", "d"],
    ["és", "é"],
    ["es", "e"],
    ["eux", "eu"],
    ["fs", "f"],
    ["gs", "g"],
    ["hs", "h"],
    ["is", "i"],
    ["ïs", "ï"],
    ["js", "j"],
    ["ks", "k"],
    ["ls", "l"],
    ["ms", "m"],
    ["ns", "n"],
    ["oux", "ou"],
    ["os", "o"],
    ["ps", "p"],
    ["qs", "q"],
    ["rs", "r"],
    ["ses", "se"],
    ["se", "se"],
    ["ts", "t"],
    ["us", "u"],
    ["vs", "v"],
    ["ws", "w"],
    ["ys", "y"],
    ["nt(e", "nt"],
    ["nt(e)", "nt"],
    ["al(e", "ale"],
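The hunk above adds two general rules, `["s", ""]` and `["x", ""]`, alongside the per-letter suffix pairs. A simplified sketch of how `[old_suffix, new_suffix]` pairs like these could be applied to a noun; `apply_suffix_rules` is a hypothetical function, and the real lemmatizer also consults the index and exception tables, which this sketch leaves out:

```python
def apply_suffix_rules(word, rules):
    """Return candidate lemmas produced by [old_suffix, new_suffix] rules."""
    candidates = []
    for old, new in rules:
        if word.endswith(old):
            form = word[: len(word) - len(old)] + new
            # Skip empty results and duplicates produced by overlapping rules.
            if form and form not in candidates:
                candidates.append(form)
    return candidates


NOUN_RULES_SKETCH = [["s", ""], ["x", ""]]  # only the two rules added above

print(apply_suffix_rules("maisons", NOUN_RULES_SKETCH))
print(apply_suffix_rules("bijoux", NOUN_RULES_SKETCH))
```

A word that matches neither suffix yields an empty candidate list, which a caller would normally treat as "keep the word unchanged".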
--- a/spacy/lang/fr/lemmatizer/_nouns.py
+++ b/spacy/lang/fr/lemmatizer/_nouns.py
--- a/spacy/lang/fr/lemmatizer/_nouns_irreg.py
+++ b/spacy/lang/fr/lemmatizer/_nouns_irreg.py
--- a/spacy/lang/fr/lemmatizer/_pronouns_irreg.py
+++ b/spacy/lang/fr/lemmatizer/_pronouns_irreg.py
@ -4,37 +4,89 @@ from __future__ import unicode_literals
 PRONOUNS_IRREG = {
    "aucune": ("aucun",),
-    "celle-ci": ("celui-ci",),
+    "autres": ("autre",),
-    "celles-ci": ("celui-ci",),
+    "ça": ("cela",),
-    "ceux-ci": ("celui-ci",),
+    "c'": ("ce",),
    "celle-là": ("celui-là",),
    "celles-là": ("celui-là",),
    "ceux-là": ("celui-là",),
    "celle": ("celui",),
    "celle-ci": ("celui-ci",),
    "celle-là": ("celui-là",),
    "celles": ("celui",),
-    "ceux": ("celui",),
+    "celles-ci": ("celui-ci",),
    "celles-là": ("celui-là",),
    "certaines": ("certains",),
    "ceux": ("celui",),
    "ceux-ci": ("celui-ci",),
    "ceux-là": ("celui-là",),
    "chacune": ("chacun",),
    "-elle": ("lui",),
    "elle": ("lui",),
    "elle-même": ("lui-même",),
    "-elles": ("lui",),
    "elles": ("lui",),
    "elles-mêmes": ("lui-même",),
    "eux": ("lui",),
    "eux-mêmes": ("lui-même",),
    "icelle": ("icelui",),
    "icelles": ("icelui",),
    "iceux": ("icelui",),
    "-il": ("il",),
    "-ils": ("il",),
    "ils": ("il",),
    "-je": ("je",),
    "j'": ("je",),
    "la": ("le",),
    "les": ("le",),
    "laquelle": ("lequel",),
    "l'autre": ("l'autre",),
    "les": ("le",),
    "lesquelles": ("lequel",),
    "lesquels": ("lequel",),
-    "elle-même": ("lui-même",),
+    "-leur": ("leur",),
-    "elles-mêmes": ("lui-même",),
+    "l'on": ("on",),
-    "eux-mêmes": ("lui-même",),
+    "-lui": ("lui",),
    "l'une": ("l'un",),
    "mêmes": ("même",),
    "-m'": ("me",),
    "m'": ("me",),
    "-moi": ("moi",),
    "nous-mêmes": ("nous-même",),
    "-nous": ("nous",),
    "-on": ("on",),
    "qqchose": ("quelque chose",),
    "qqch": ("quelque chose",),
    "qqc": ("quelque chose",),
    "qqn": ("quelqu'un",),
    "quelle": ("quel",),
    "quelles": ("quel",),
-    "quels": ("quel",),
+    "quelques-unes": ("quelques-uns",),
    "quelques-unes": ("quelqu'un",),
    "quelques-uns": ("quelqu'un",),
    "quelque-une": ("quelqu'un",),
    "quelqu'une": ("quelqu'un",),
    "quels": ("quel",),
    "qu": ("que",),
-    "telle": ("tel",),
+    "s'": ("se",),
    "-t-elle": ("elle",),
    "-t-elles": ("elle",),
    "telles": ("tel",),
    "telle": ("tel",),
    "tels": ("tel",),
-    "toutes": ("tous",),
+    "-t-en": ("en",),
    "-t-il": ("il",),
    "-t-ils": ("il",),
    "-toi": ("toi",),
    "-t-on": ("on",),
    "tous": ("tout",),
    "toutes": ("tout",),
    "toute": ("tout",),
    "-t'": ("te",),
    "t'": ("te",),
    "-tu": ("tu",),
    "-t-y": ("y",),
    "unes": ("un",),
    "une": ("un",),
    "uns": ("un",),
    "vous-mêmes": ("vous-même",),
    "vous-même": ("vous-même",),
    "-vous": ("vous",),
    "-vs": ("vous",),
    "vs": ("vous",),
    "-y": ("y",),
 }
--- a/spacy/lang/fr/lemmatizer/_sconj_irreg.py
+++ b/spacy/lang/fr/lemmatizer/_sconj_irreg.py
@ -0,0 +1,19 @@
 # coding: utf8
 from __future__ import unicode_literals
 SCONJ_IRREG = {
 	"lorsqu'": ("lorsque",),
 	"pac'que": ("parce que",),
 	"pac'qu'": ("parce que",),
 	"parc'que": ("parce que",),
 	"parc'qu'": ("parce que",),
 	"paske": ("parce que",),
 	"pask'": ("parce que",),
 	"pcq": ("parce que",),
 	"+": ("plus",),
 	"puisqu'": ("puisque",),
 	"qd": ("quand",),
 	"quoiqu'": ("quoique",),
 	"qu'": ("que",)
 }
--- a/spacy/lang/fr/lemmatizer/_verbs.py
+++ b/spacy/lang/fr/lemmatizer/_verbs.py
@ -6,63 +6,64 @@ VERBS = set(
    """
 abaisser abandonner abdiquer abecquer abéliser aberrer abhorrer abîmer abjurer
 ablater abluer ablutionner abominer abonder abonner aborder aborner aboucher
- abouler abouter abraquer abraser abreuver abricoter abriter absenter absinther
+ abouler abouter aboutonner abracadabrer abraquer abraser abreuver abricoter
- absolutiser absorber abuser académifier académiser acagnarder accabler
+ abriter absenter absinther absolutiser absorber abuser académifier académiser
- accagner accaparer accastiller accentuer accepter accessoiriser accidenter
+ acagnarder accabler accagner accaparer accastiller accentuer accepter
- acclamer acclimater accointer accolader accoler accommoder accompagner
+ accessoiriser accidenter acclamer acclimater accointer accolader accoler
- accorder accorer accoster accoter accoucher accouder accouer accoupler
+ accommoder accompagner accorder accorer accoster accoter accoucher accouder
- accoutrer accoutumer accouver accrassiner accréditer accrocher acculer
+ accouer accoupler accoutrer accoutumer accouver accrassiner accréditer
- acculturer accumuler accuser acenser acétaliser acétyler achalander acharner
+ accrocher acculer acculturer accumuler accuser acenser acétaliser acétyler
- acheminer achopper achromatiser aciduler aciériser acliquer acoquiner acquêter
+ achalander acharner acheminer achopper achromatiser aciduler aciériser
- acquitter acter actiniser actionner activer actoriser actualiser acupuncturer
+ acliquer acoquiner acquêter acquitter acter actiniser actionner activer
- acyler adapter additionner adenter adieuser adirer adjectiver adjectiviser
+ actoriser actualiser acupuncturer acyler adapter additionner adenter adieuser
- adjurer adjuver administrer admirer admonester adoniser adonner adopter adorer
+ adirer adjectiver adjectiviser adjurer adjuver administrer admirer admonester
- adorner adosser adouber adresser adsorber aduler adverbialiser aéroporter
+ adoniser adonner adopter adorer adorner adosser adouber adresser adsorber
- aérosoliser aérosonder aérotransporter affabuler affacturer affairer affaisser
+ aduler adverbialiser aéroporter aérosoliser aérosonder aérotransporter
- affaiter affaler affamer affecter affectionner affermer afficher affider
+ affabuler affacturer affairer affaisser affaiter affaler affamer affecter
- affiler affiner affirmer affistoler affixer affleurer afflouer affluer affoler
+ affectionner affermer afficher affider affiler affiner affirmer affistoler
- afforester affouiller affourcher affriander affricher affrioler affriquer
+ affixer affleurer afflouer affluer affoler afforester affouiller affourcher
- affriter affronter affruiter affubler affurer affûter afghaniser afistoler
+ affriander affricher affrioler affriquer affriter affronter affruiter affubler
- africaniser agatiser agenouiller agglutiner aggraver agioter agiter agoniser
+ affurer affûter afghaniser afistoler africaniser agatiser agenouiller
- agourmander agrafer agrainer agrémenter agresser agriffer agripper
+ agglutiner aggraver agioter agiter agoniser agourmander agrafer agrainer
- agroalimentariser agrouper aguetter aguicher ahaner aheurter aicher aider
+ agrémenter agresser agricher agriffer agripper agroalimentariser agrouper
- aigretter aiguer aiguiller aiguillonner aiguiser ailer ailler ailloliser
+ aguetter aguicher aguiller ahaner aheurter aicher aider aigretter aiguer
- aimanter aimer airer ajointer ajourer ajourner ajouter ajuster ajuter
+ aiguiller aiguillonner aiguiser ailer ailler ailloliser aimanter aimer airer
- alambiquer alarmer albaniser albitiser alcaliniser alcaliser alcooliser
+ ajointer ajourer ajourner ajouter ajuster ajuter alambiquer alarmer albaniser
- alcoolyser alcoyler aldoliser alerter aleviner algébriser algérianiser
+ albitiser alcaliniser alcaliser alcooliser alcoolyser alcoyler aldoliser
- algorithmiser aligner alimenter alinéater alinéatiser aliter alkyler allaiter
+ alerter aleviner algébriser algérianiser algorithmiser aligner alimenter
- allectomiser allégoriser allitiser allivrer allocutionner alloter allouer
+ alinéater alinéatiser aliter alkyler allaiter allectomiser allégoriser
- alluder allumer allusionner alluvionner allyler aloter alpaguer alphabétiser
+ allitiser allivrer allocutionner alloter allouer alluder allumer allusionner
- alterner aluminer aluminiser aluner alvéoler alvéoliser amabiliser amadouer
+ alluvionner allyler aloter alpaguer alphabétiser alterner aluminer aluminiser
- amalgamer amariner amarrer amateloter ambitionner ambler ambrer ambuler
+ aluner alvéoler alvéoliser amabiliser amadouer amalgamer amariner amarrer
- améliorer amender amenuiser américaniser ameulonner ameuter amhariser amiauler
+ amateloter ambitionner ambler ambrer ambuler améliorer amender amenuiser
- amicoter amidonner amignarder amignoter amignotter aminer ammoniaquer
+ américaniser ameulonner ameuter amhariser amiauler amicoter amidonner
- ammoniser ammoxyder amocher amouiller amouracher amourer amphotériser ampouler
+ amignarder amignoter amignotter aminer ammoniaquer ammoniser ammoxyder amocher
- amputer amunitionner amurer amuser anagrammatiser anagrammer analyser
+ amouiller amouracher amourer amphotériser ampouler amputer amunitionner amurer
- anamorphoser anaphylactiser anarchiser anastomoser anathématiser anatomiser
+ amuser anagrammatiser anagrammer analyser anamorphoser anaphylactiser
- ancher anchoiter ancrer anecdoter anecdotiser angéliser anglaiser angler
+ anarchiser anastomoser anathématiser anatomiser ancher anchoiter ancrer
- angliciser angoisser anguler animaliser animer aniser ankyloser annexer
+ anecdoter anecdotiser angéliser anglaiser angler angliciser angoisser anguler
- annihiler annoter annualiser annuler anodiser ânonner anser antagoniser
+ animaliser animer aniser ankyloser annexer annihiler annoter annualiser
- antéposer antérioriser anthropomorphiser anticiper anticoaguler antidater
+ annuler anodiser ânonner anser antagoniser antéposer antérioriser
- antiparasiter antiquer antiseptiser anuiter aoûter apaiser apériter apetisser
+ anthropomorphiser anticiper anticoaguler antidater antiparasiter antiquer
- apeurer apicaliser apiquer aplaner apologiser aponévrotomiser aponter aposter
+ antiseptiser anuiter aoûter apaiser apériter apetisser apeurer apicaliser
- apostiller apostoliser apostropher apostumer apothéoser appareiller apparenter
+ apiquer aplaner apologiser aponévrotomiser aponter aposter apostiller
- appeauter appertiser appliquer appointer appoltronner apponter apporter
+ apostoliser apostropher apostumer apothéoser appareiller apparenter appeauter
- apposer appréhender apprêter apprivoiser approcher approuver approvisionner
+ appertiser appliquer appointer appoltronner apponter apporter apposer
- approximer apurer aquareller arabiser araméiser aramer araser arbitrer arborer
+ appréhender apprêter apprivoiser approcher approuver approvisionner approximer
- arboriser arcbouter arc-bouter archaïser architecturer archiver arçonner
+ apurer aquareller arabiser araméiser aramer araser arbitrer arborer arboriser
- ardoiser aréniser arer argenter argentiniser argoter argotiser argumenter
+ arcbouter arc-bouter archaïser architecturer archiver arçonner ardoiser
- arianiser arimer ariser aristocratiser aristotéliser arithmétiser armaturer
+ aréniser arer argenter argentiniser argoter argotiser argumenter arianiser
- armer arnaquer aromatiser arpenter arquebuser arquer arracher arraisonner
+ arimer ariser aristocratiser aristotéliser arithmétiser armaturer armer
- arrenter arrêter arrher arrimer arriser arriver arroser arsouiller
+ arnaquer aromatiser arpenter arquebuser arquer arracher arraisonner arrenter
- artérialiser articler articuler artificialiser artistiquer aryaniser aryler
+ arrêter arrher arrimer arriser arriver arroser arsouiller artérialiser
- ascensionner ascétiser aseptiser asexuer asianiser asiatiser aspecter
+ articler articuler artificialiser artistiquer aryaniser aryler ascensionner
- asphalter aspirer assabler assaisonner assassiner assembler assener asséner
+ ascétiser aseptiser asexuer asianiser asiatiser aspecter asphalter aspirer
- assermenter asserter assibiler assigner assimiler assister assoiffer assoler
+ assabler assaisonner assassiner assembler assener asséner assermenter asserter
- assommer assoner assoter assumer assurer asticoter astiquer athéiser
+ assibiler assigner assimiler assister assoiffer assoler assommer assoner
- atlantiser atomiser atourner atropiniser attabler attacher attaquer attarder
+ assoter assumer assurer asticoter astiquer athéiser atlantiser atomiser
- attenter attentionner atténuer atterrer attester attifer attirer attiser
+ atourner atropiniser attabler attacher attaquer attarder attenter attentionner
- attitrer attraper attremper attribuer attrister attrouper aubiner
+ atténuer atterrer attester attifer attirer attiser attitrer attoucher attraper
 attremper attribuer attriquer attrister attrouper aubader aubiner
 audiovisualiser auditer auditionner augmenter augurer aulofer auloffer aumôner
 auner auréoler ausculter authentiquer autoaccuser autoadapter autoadministrer
 autoagglutiner autoalimenter autoallumer autoamputer autoanalyser autoancrer
@ -73,10 +74,10 @@ VERBS = set(
 autodéterminer autodévelopper autodévorer autodicter autodiscipliner
 autodupliquer autoéduquer autoenchâsser autoenseigner autoépurer autoéquiper
 autoévaporiser autoévoluer autoféconder autofertiliser autoflageller
- autofonder autoformer autofretter autogouverner autogreffer autoguider auto-
+ autofonder autoformer autofretter autogouverner autogreffer autoguider
- immuniser auto-ioniser autolégitimer autolimiter autoliquider autolyser
+ auto-immuniser auto-ioniser autolégitimer autolimiter autoliquider autolyser
- automatiser automédiquer automitrailler automutiler autonomiser auto-
+ automatiser automédiquer automitrailler automutiler autonomiser
- optimaliser auto-optimiser autoorganiser autoperpétuer autopersuader
+ auto-optimaliser auto-optimiser autoorganiser autoperpétuer autopersuader
 autopiloter autopolliniser autoporter autopositionner autoproclamer
 autopropulser autoréaliser autorecruter autoréglementer autoréguler
 autorelaxer autoréparer autoriser autosélectionner autosevrer autostabiliser
@ -84,7 +85,7 @@ VERBS = set(
 autotracter autotransformer autovacciner autoventiler avaler avaliser
 aventurer aveugler avillonner aviner avironner aviser avitailler aviver
 avoiner avoisiner avorter avouer axéniser axer axiomatiser azimuter azoter
- azurer babiller babouiner bâcher bachonner bachoter bâcler badauder
+ azurer babiller babouiner bâcher bachonner bachoter bâcler badauder bader
 badigeonner badiner baffer bafouer bafouiller bâfrer bagarrer bagoter bagouler
 baguenauder baguer baguetter bahuter baigner bailler bâiller baîller
 bâillonner baîllonner baiser baisoter baisouiller baisser bakéliser balader
@ -135,9 +136,9 @@ VERBS = set(
 brouillonner broussailler brousser brouter bruiner bruisser bruiter brûler
 brumer brumiser bruncher brusquer brutaliser bruter bûcher bucoliser
 budgétiser buer buffériser buffler bugler bugner buiser buissonner bulgariser
- buquer bureaucratiser buriner buser busquer buter butiner butonner butter
+ buller buquer bureaucratiser buriner buser busquer buter butiner butonner
- buvoter byzantiner byzantiniser cabaler cabaliser cabaner câbler cabosser
+ butter buvoter byzantiner byzantiniser cabaler cabaliser cabaner câbler
- caboter cabotiner cabrer cabrioler cacaber cacaoter cacarder cacher
+ cabosser caboter cabotiner cabrer cabrioler cacaber cacaoter cacarder cacher
 cachetonner cachotter cadastrer cadavériser cadeauter cadetter cadoter cadrer
 cafarder cafeter cafouiller cafter cageoler cagnarder cagner caguer cahoter
 caillebotter cailler caillouter cajoler calaminer calamistrer calamiter
@ -185,65 +186,66 @@ VERBS = set(
 claveliser claver clavetter clayonner cléricaliser clicher cligner clignoter
 climatiser clinquanter clinquer cliper cliquer clisser cliver clochardiser
 clocher clocter cloisonner cloîtrer cloner cloper clopiner cloquer clôturer
- clouer clouter coaccuser coacerver coacher coadapter coagglutiner coaguler
+ clotûrer clouer clouter coaccuser coacerver coacher coadapter coagglutiner
- coaliser coaltarer coaltariser coanimer coarticuler cobelligérer cocaïniser
+ coaguler coaliser coaltarer coaltariser coanimer coarticuler cobelligérer
- cocarder cocheniller cocher côcher cochonner coconiser coconner cocooner
+ cocaïniser cocarder cocheniller cocher côcher cochonner coconiser coconner
- cocoter coder codéterminer codiller coéditer coéduquer coexister coexploiter
+ cocooner cocoter coder codéterminer codiller coéditer coéduquer coexister
- coexprimer coffiner coffrer cofonder cogiter cogner cogouverner cohabiter
+ coexploiter coexprimer coffiner coffrer cofonder cogiter cogner cogouverner
- cohériter cohober coiffer coincher coincider coïncider coïter colchiciner
+ cohabiter cohériter cohober coiffer coincher coincider coïncider coïter
- collaber collaborer collationner collecter collectionner collectiviser coller
+ colchiciner collaber collaborer collationner collecter collectionner
- collisionner colloquer colluvionner colmater colombianiser colombiner
+ collectiviser coller collisionner colloquer colluvionner colmater
- coloniser colorer coloriser colostomiser colporter colpotomiser coltiner
+ colombianiser colombiner coloniser colorer coloriser colostomiser colporter
- columniser combiner combler commander commanditer commémorer commenter
+ colpotomiser coltiner columniser combiner combler commander commanditer
- commercialiser comminer commissionner commotionner commuer communaliser
+ commémorer commenter commercialiser comminer commissionner commotionner
- communautariser communiquer communiser commuter compacifier compacter comparer
+ commuer communaliser communautariser communiquer communiser commuter
- compartimenter compenser compiler compisser complanter complémenter
+ compacifier compacter comparer compartimenter compenser compiler compisser
- complétiviser complexer complimenter compliquer comploter comporter composer
+ complanter complémenter complétiviser complexer complimenter compliquer
- composter compoter compounder compresser comprimer comptabiliser compter
+ comploter comporter composer composter compoter compounder compresser
- compulser computer computériser concentrer conceptualiser concerner concerter
+ comprimer comptabiliser compter compulser computer computériser concentrer
- concher conciliabuler concocter concomiter concorder concrétionner concrétiser
+ conceptualiser concerner concerter concher conciliabuler concocter concomiter
- concubiner condamner condenser condimenter conditionner confabuler
+ concorder concrétionner concrétiser concubiner condamner condenser condimenter
- confectionner confédéraliser confesser confessionnaliser configurer confiner
+ conditionner confabuler confectionner confédéraliser confesser
- confirmer confisquer confiter confluer conformer conforter confronter
+ confessionnaliser configurer confiner confirmer confisquer confiter confluer
- confusionner congestionner conglober conglutiner congoliser congratuler
+ conformer conforter confronter confusionner congestionner conglober
- coniser conjecturer conjointer conjuger conjuguer conjurer connecter conniver
+ conglutiner congoliser congratuler coniser conjecturer conjointer conjuger
- connoter conquêter consacrer conscientiser conseiller conserver consigner
+ conjuguer conjurer connecter conniver connoter conquêter consacrer
- consister consoler consolider consommariser consommer consonantiser consoner
+ conscientiser conseiller conserver consigner consister consoler consolider
- conspirer conspuer constater consteller conster consterner constiper
+ consommariser consommer consonantiser consoner conspirer conspuer constater
- constituer constitutionnaliser consulter consumer contacter contagionner
+ consteller conster consterner constiper constituer constitutionnaliser
- containeriser containériser contaminer contemner contempler conteneuriser
+ consulter consumer contacter contagionner containeriser containériser
- contenter conter contester contextualiser continentaliser contingenter
+ contaminer contemner contempler conteneuriser contenter conter contester
- continuer contorsionner contourner contracter contractualiser contracturer
+ contextualiser continentaliser contingenter continuer contorsionner contourner
- contraposer contraster contre-attaquer contrebouter contrebuter contrecalquer
+ contracter contractualiser contracturer contraposer contraster contre-attaquer
- contrecarrer contre-expertiser contreficher contrefraser contre-indiquer
+ contrebouter contrebuter contrecalquer contrecarrer contre-expertiser
- contremander contremanifester contremarcher contremarquer contreminer
+ contreficher contrefraser contre-indiquer contremander contremanifester
- contremurer contrenquêter contreplaquer contrepointer contrer contresigner
+ contremarcher contremarquer contreminer contremurer contrenquêter
- contrespionner contretyper contreventer contribuer contrister contrôler
+ contreplaquer contrepointer contrer contresigner contrespionner contretyper
- controuver controverser contusionner conventionnaliser conventionner
+ contreventer contribuer contrister contrôler controuver controverser
- conventualiser converser convoiter convoler convoquer convulser convulsionner
+ contusionner conventionnaliser conventionner conventualiser converser
- cooccuper coopératiser coopter coordonner coorganiser coparrainer coparticiper
+ convoiter convoler convoquer convulser convulsionner cooccuper coopératiser
- copermuter copiner copolycondenser copolymériser coprésenter coprésider copser
+ coopter coordonner coorganiser coparrainer coparticiper copermuter copiner
- copter copuler copyrighter coqueliner coquer coqueriquer coquiller corailler
+ copolycondenser copolymériser coprésenter coprésider copser copter copuler
- corder cordonner coréaliser coréaniser coréguler coresponsabiliser cornaquer
+ copyrighter coqueliner coquer coqueriquer coquiller corailler corder cordonner
- cornemuser corner coroniser corporiser correctionaliser correctionnaliser
+ coréaliser coréaniser coréguler coresponsabiliser cornaquer cornemuser corner
- correler corréler corroborer corroder corser corticaliser cosigner cosmétiquer
+ coroniser corporiser correctionaliser correctionnaliser correler corréler
- cosser costumer coter cotillonner cotiser cotonner cotransfecter couaquer
+ corroborer corroder corser corticaliser cosigner cosmétiquer cosser costumer
- couarder couchailler coucher couchoter couchotter coucouer coucouler couder
+ coter cotillonner cotiser cotonner cotransfecter couaquer couarder couchailler
- coudrer couillonner couiner couler coulisser coupailler coupeller couper
+ coucher couchoter couchotter coucouer coucouler couder coudrer couillonner
- couperoser coupler couponner courailler courbaturer courber courbetter
+ couiner couler coulisser coupailler coupeller couper couperoser coupler
- courcailler couronner courrieler courser courtauder court-circuiter courtiser
+ couponner courailler courbaturer courber courbetter courcailler couronner
- cousiner coussiner coûter couturer couver cracher crachiner crachoter
+ courrieler courser courtauder court-circuiter courtiser cousiner coussiner
- crachouiller crailler cramer craminer cramper cramponner crampser cramser
+ coûter couturer couver cracher crachiner crachoter crachouiller crailler
- craner crâner crânoter cranter crapahuter crapaüter crapser crapuler craquer
+ cramer craminer cramper cramponner crampser cramser craner crâner crânoter
- crasher cratériser craticuler cratoniser cravacher cravater crawler crayonner
+ cranter crapahuter crapaüter crapser crapuler craquer crasher cratériser
- crédibiliser créditer crématiser créoliser créosoter crêper crépiner crépiter
+ craticuler cratoniser cravacher cravater crawler crayonner crédibiliser
- crésyler crêter crétiniser creuser criailler cribler criminaliser criquer
+ créditer crématiser créoliser créosoter crêper crépiner crépiter crésyler
- crisper crisser cristalliser criticailler critiquer crocher croiser crôler
+ crêter crétiniser creuser criailler cribler criminaliser criquer crisper
- croquer croskiller crosser crotoniser crotter crouler croupionner crouponner
+ crisser cristalliser criticailler critiquer crocher croiser crôler croquer
 croskiller crosser crotoniser crotter crouler croupionner crouponner
 croustiller croûter croûtonner cryoappliquer cryocautériser cryocoaguler
 cryoconcentrer cryodécaper cryoébarber cryofixer cryogéniser cryomarquer
- cryosorber crypter cuber cueiller cuider cuisiner cuiter cuivrer culbuter
+ cryosorber crypter cuber cueiller cuider cuisiner cuivrer culbuter culer
- culer culminer culotter culpabiliser cultiver culturaliser cumuler curariser
+ culminer culotter culpabiliser cultiver culturaliser cumuler curariser
 curedenter curer curetter customiser cuter cutiniser cuver cyaniser cyanoser
 cyanurer cybernétiser cycler cycliser cycloner cylindrer dactylocoder daguer
 daguerréotyper daïer daigner dailler daller damasquiner damer damner
@ -748,8 +750,8 @@ VERBS = set(
 mithridatiser mitonner mitrailler mixer mixter mixtionner mobiliser modaliser
 modéliser modérantiser moderniser moduler moellonner mofler moirer moiser
 moissonner molarder molariser moléculariser molester moletter mollarder
- molletter monarchiser mondaniser monder mondialiser monétariser monétiser
+ molletonner molletter monarchiser mondaniser monder mondialiser monétariser
- moniliser monologuer monomériser monophtonguer monopoler monopoliser
+ monétiser moniliser monologuer monomériser monophtonguer monopoler monopoliser
 monoprogrammer monosiallitiser monotoniser monseigneuriser monter montrer
 monumentaliser moquer moquetter morailler moraliser mordailler mordiller
 mordillonner mordorer mordoriser morfailler morfaler morfiler morfler morganer
@ -792,63 +794,64 @@ VERBS = set(
 palpiter palucher panacher panader pancarter paner paniquer panneauter panner
 pannetonner panoramiquer panser pantiner pantomimer pantoufler paoner paonner
 papelarder papillonner papilloter papoter papouiller paquer paraboliser
- parachuter parader parafer paraffiner paralléliser paralyser paramétriser
+ parachuter parader parafer paraffiner paraisonner paralléliser paralyser
- parangonner parapher paraphraser parasiter parcellariser parceller parcelliser
+ paramétriser parangonner parapher paraphraser parasiter parcellariser
- parcheminer parcoriser pardonner parementer parenthétiser parer paresser
+ parceller parcelliser parcheminer parcoriser pardonner parementer
- parfiler parfumer parisianiser parjurer parkériser parlementer parler parloter
+ parenthétiser parer paresser parfiler parfumer parisianiser parjurer
- parlotter parquer parrainer participer particulariser partitionner partouzer
+ parkériser parlementer parler parloter parlotter parquer parrainer participer
- pasquiner pasquiniser passefiler passementer passepoiler passeriller
+ particulariser partitionner partouzer pasquiner pasquiniser passefiler
- passionnaliser passionner pasteller pasteuriser pasticher pastiller pastoriser
+ passementer passepoiler passeriller passionnaliser passionner pasteller
- patafioler pateliner patenter paternaliser paterner pathétiser patienter
+ pasteuriser pasticher pastiller pastoriser patafioler pateliner patenter
- patiner pâtisser patoiser pâtonner patouiller patrimonialiser patrociner
+ paternaliser paterner pathétiser patienter patiner pâtisser patoiser pâtonner
- patronner patrouiller patter pâturer paumer paupériser pauser pavaner paver
+ patouiller patrimonialiser patrociner patronner patrouiller patter pâturer
- pavoiser peaufiner pébriner pécher pêcher pécloter pectiser pédaler pédanter
+ paumer paupériser pauser pavaner paver pavoiser peaufiner pébriner pécher
- pédantiser pédiculiser pédicurer pédimenter peigner peiner peinturer
+ pêcher pécloter pectiser pédaler pédanter pédantiser pédiculiser pédicurer
- peinturlurer péjorer pelaner pelauder péleriner pèleriner pelletiser
+ pédimenter peigner peiner peinturer peinturlurer péjorer pelaner pelauder
- pelleverser pelliculer peloter pelotonner pelucher pelurer pénaliser pencher
+ péleriner pèleriner pelletiser pelleverser pelliculer peloter pelotonner
- pendeloquer pendiller pendouiller penduler pénéplaner penser pensionner
+ pelucher pelurer pénaliser pencher pendeloquer pendiller pendouiller penduler
- peptiser peptoniser percaliner percher percoler percuter perdurer pérégriner
+ pénéplaner penser pensionner peptiser peptoniser percaliner percher percoler
- pérenniser perfectionner perforer performer perfuser péricliter périmer
+ percuter perdurer pérégriner pérenniser perfectionner perforer performer
- périodiser périphériser périphraser péritoniser perler permanenter permaner
+ perfuser péricliter périmer périodiser périphériser périphraser péritoniser
- perméabiliser permuter pérorer pérouaniser peroxyder perpétuer perquisitionner
+ perler permanenter permaner perméabiliser permuter pérorer pérouaniser
- perreyer perruquer persécuter persifler persiller persister personnaliser
+ peroxyder perpétuer perquisitionner perreyer perruquer persécuter persifler
- persuader perturber pervibrer pester pétarader pétarder pétiller pétitionner
+ persiller persister personnaliser persuader perturber pervibrer pester
- pétocher pétouiller pétrarquiser pétroliser pétuner peupler pexer
+ pétarader pétarder pétiller pétitionner pétocher pétouiller pétrarquiser
- phacoémulsifier phagocyter phalangiser pharyngaliser phéniquer phénoler
+ pétroliser pétuner peupler pexer phacoémulsifier phagocyter phalangiser
- phényler philosophailler philosopher phlébotomiser phlegmatiser phlogistiquer
+ pharyngaliser phéniquer phénoler phényler philosophailler philosopher
- phonétiser phonologiser phosphater phosphorer phosphoriser phosphoryler
+ phlébotomiser phlegmatiser phlogistiquer phonétiser phonologiser phosphater
- photoactiver photocomposer photograver photo-ioniser photoïoniser photomonter
+ phosphorer phosphoriser phosphoryler photoactiver photocomposer photograver
- photophosphoryler photopolymériser photosensibiliser phraser piaffer piailler
+ photo-ioniser photoïoniser photomonter photophosphoryler photopolymériser
- pianomiser pianoter piauler pickler picocher picoler picorer picoter picouser
+ photosensibiliser phraser piaffer piailler pianomiser pianoter piauler pickler
- picouzer picrater pictonner picturaliser pidginiser piédestaliser pierrer
+ picocher picoler picorer picoter picouser picouzer picrater pictonner
- piétiner piétonnifier piétonniser pieuter pifer piffer piffrer pigeonner
+ picturaliser pidginiser piédestaliser pierrer piétiner piétonnifier
- pigmenter pigner pignocher pignoler piler piller pilloter pilonner piloter
+ piétonniser pieuter pifer piffer piffrer pigeonner pigmenter pigner pignocher
- pimenter pinailler pinceauter pinçoter pindariser pinter piocher pionner
+ pignoler piler piller pilloter pilonner piloter pimenter pinailler pinceauter
- piotter piper piqueniquer pique-niquer piquer piquetonner piquouser piquouzer
+ pinçoter pindariser pinter piocher pionner piotter piper piqueniquer
- pirater pirouetter piser pisser pissoter pissouiller pistacher pister pistoler
+ pique-niquer piquer piquetonner piquouser piquouzer pirater pirouetter piser
- pistonner pitancher pitcher piter pitonner pituiter pivoter placarder
+ pisser pissoter pissouiller pistacher pister pistoler pistonner pitancher
- placardiser plafonner plaider plainer plaisanter plamer plancher planer
+ pitcher piter pitonner pituiter pivoter placarder placardiser plafonner
- planétariser planétiser planquer planter plaquer plasmolyser plastiquer
+ plaider plainer plaisanter plamer plancher planer planétariser planétiser
- plastronner platiner platiniser platoniser plâtrer plébisciter pleurailler
+ planquer planter plaquer plasmolyser plastiquer plastronner platiner
- pleuraliser pleurer pleurnicher pleuroter pleuviner pleuvioter pleuvoter
+ platiniser platoniser plâtrer plébisciter pleurailler pleuraliser pleurer
- plisser plissoter plomber ploquer plotiniser plouter ploutrer plucher
+ pleurnicher pleuroter pleuviner pleuvioter pleuvoter plisser plissoter plomber
- plumarder plumer pluraliser plussoyer pluviner pluvioter pocharder pocher
+ ploquer plotiniser plouter ploutrer plucher plumarder plumer pluraliser
- pochetronner pochtronner poculer podzoliser poêler poétiser poignarder poigner
+ plussoyer pluviner pluvioter pocharder pocher pochetronner pochtronner poculer
- poiler poinçonner pointer pointiller poireauter poirer poiroter poisser
+ podzoliser poêler poétiser poignarder poigner poiler poinçonner pointer
- poitriner poivrer poivroter polariser poldériser polémiquer polissonner
+ pointiller poireauter poirer poiroter poisser poitriner poivrer poivroter
- politicailler politiquer politiser polker polliciser polliniser polluer
+ polariser poldériser polémiquer polissonner politicailler politiquer politiser
- poloniser polychromer polycontaminer polygoner polygoniser polymériser
+ polker polliciser polliniser polluer poloniser polychromer polycontaminer
- polyploïdiser polytransfuser polyviser pommader pommer pomper pomponner
+ polygoner polygoniser polymériser polyploïdiser polytransfuser polyviser
- ponctionner ponctuer ponter pontiller populariser poquer porer porphyriser
+ pommader pommer pomper pomponner ponctionner ponctuer ponter pontiller
- porter porteuser portionner portoricaniser portraicturer portraiturer poser
+ populariser poquer porer porphyriser porter porteuser portionner
- positionner positiver possibiliser postdater poster postérioriser posticher
+ portoricaniser portraicturer portraiturer poser positionner positiver
- postillonner postposer postsonoriser postsynchroniser postuler potabiliser
+ possibiliser postdater poster postérioriser posticher postillonner postposer
- potentialiser poter poteyer potiner poudrer pouffer pouiller pouliner pouloper
+ postsonoriser postsynchroniser postuler potabiliser potentialiser poter
- poulotter pouponner pourpenser pourprer poussailler pousser poutser praliner
+ poteyer potiner poudrer pouffer pouiller pouliner pouloper poulotter pouponner
- pratiquer préaccentuer préadapter préallouer préassembler préassimiler
+ pourpenser pourprer poussailler pousser poutser praliner pratiquer
- préaviser précariser précautionner prêchailler préchauffer préchauler prêcher
+ préaccentuer préadapter préallouer préassembler préassimiler préaviser
- précipiter préciser préciter précompter préconditionner préconfigurer
+ précariser précautionner prêchailler préchauffer préchauler prêcher précipiter
- préconiser préconstituer précoter prédater prédécouper prédésigner prédestiner
+ préciser préciter précompter préconditionner préconfigurer préconiser
 préconstituer précoter prédater prédécouper prédésigner prédestiner
 prédéterminer prédiffuser prédilectionner prédiquer prédisposer prédominer
 préemballer préempter préencoller préenregistrer préenrober préexaminer
 préexister préfabriquer préfaner préfigurer préfixer préformater préformer
@ -879,8 +882,8 @@ VERBS = set(
 raccommoder raccompagner raccorder raccoutrer raccoutumer raccrocher racémiser
 rachalander racher raciner racketter racler râcler racoler raconter racoquiner
 radariser rader radicaliser radiner radioactiver radiobaliser radiocommander
- radioconserver radiodétecter radiodiffuser radioexposer radioguider radio-
+ radioconserver radiodétecter radiodiffuser radioexposer radioguider
- immuniser radiolocaliser radiopasteuriser radiosonder radiostériliser
+ radio-immuniser radiolocaliser radiopasteuriser radiosonder radiostériliser
 radiotéléphoner radiotéléviser radoter radouber rafaler raffermer raffiler
 raffiner raffluer raffoler raffûter rafistoler rafler ragoter ragoûter
 ragrafer raguer raguser raiguiser railler rainer rainurer raisonner rajouter
@ -1123,19 +1126,21 @@ VERBS = set(
 sommer somnambuler somniloquer somnoler sonder sonnailler sonner sonoriser
 sophistiquer sorguer soubresauter souder souffler souffroter soufrer souhaiter
 souiller souillonner soûler souligner soûlotter soumissionner soupailler
- soupçonner souper soupirer souquer sourciller sourdiner sous-capitaliser sous-
+ soupçonner souper soupirer souquer sourciller sourdiner sous-alimenter
- catégoriser sousestimer sous-estimer sous-industrialiser sous-médicaliser
+ sous-capitaliser sous-catégoriser sous-équiper sousestimer sous-estimer
- sousperformer sous-qualifier soussigner sous-titrer sous-utiliser soutacher
+ sous-évaluer sous-exploiter sous-exposer sous-industrialiser sous-louer
- souter soutirer soviétiser spammer spasmer spatialiser spatuler spécialiser
+ sous-médicaliser sousperformer sous-qualifier soussigner sous-titrer
- spéculer sphéroïdiser spilitiser spiraler spiraliser spirantiser spiritualiser
+ sous-traiter sous-utiliser sous-virer soutacher souter soutirer soviétiser
- spitter splénectomiser spléniser sponsoriser sporter sporuler sprinter
+ spammer spasmer spatialiser spatuler spécialiser spéculer sphéroïdiser
- squatériser squatter squatteriser squattériser squeezer stabiliser stabuler
+ spilitiser spiraler spiraliser spirantiser spiritualiser spitter
- staffer stagner staliniser standardiser standoliser stanioler stariser
+ splénectomiser spléniser sponsoriser sporter sporuler sprinter squatériser
- stationner statistiquer statuer stelliter stenciler stendhaliser sténoser
+ squatter squatteriser squattériser squeezer stabiliser stabuler staffer
- sténotyper stepper stéréotyper stériliser stigmatiser stimuler stipuler
+ stagner staliniser standardiser standoliser stanioler stariser stationner
- stocker stoloniser stopper stranguler stratégiser stresser strider striduler
+ statistiquer statuer stelliter stenciler stendhaliser sténoser sténotyper
- striper stripper striquer stronker strouiller structurer strychniser stuquer
+ stepper stéréotyper stériliser stigmatiser stimuler stipuler stocker
- styler styliser subalterniser subdiviser subdivisionner subériser subjectiver
+ stoloniser stopper stranguler stratégiser stresser strider striduler striper
 stripper striquer stronker strouiller structurer strychniser stuquer styler
 styliser subalterniser subdiviser subdivisionner subériser subjectiver
 subjectiviser subjuguer sublimer sublimiser subluxer subminiaturiser subodorer
 subordonner suborner subsister substanter substantialiser substantiver
 substituer subsumer subtiliser suburbaniser subventionner succomber suçoter
--- a/spacy/lang/fr/lemmatizer/_verbs_irreg.py
+++ b/spacy/lang/fr/lemmatizer/_verbs_irreg.py
--- a/spacy/lang/fr/lemmatizer/lemmatizer.py
+++ b/spacy/lang/fr/lemmatizer/lemmatizer.py
@ -1,7 +1,7 @@
 # coding: utf8
 from __future__ import unicode_literals
-from ....symbols import POS, NOUN, VERB, ADJ, ADV, PRON, DET, AUX, PUNCT
+from ....symbols import POS, NOUN, VERB, ADJ, ADV, PRON, DET, AUX, PUNCT, ADP, SCONJ, CCONJ
 from ....symbols import VerbForm_inf, VerbForm_none, Number_sing, Degree_pos
 from .lookup import LOOKUP
@ -9,7 +9,7 @@ from .lookup import LOOKUP
 French language lemmatizer applies the default rule based lemmatization
 procedure with some modifications for better French language support.
-The parts of speech 'ADV', 'PRON', 'DET' and 'AUX' are added to use the 
+The parts of speech 'ADV', 'PRON', 'DET', 'ADP' and 'AUX' are added to use the 
 rule-based lemmatization. As a last resort, the lemmatizer checks in 
 the lookup table.
 '''
@ -34,16 +34,22 @@ class FrenchLemmatizer(object):
            univ_pos = 'verb'
        elif univ_pos in (ADJ, 'ADJ', 'adj'):
            univ_pos = 'adj'
        elif univ_pos in (ADP, 'ADP', 'adp'):
            univ_pos = 'adp'
        elif univ_pos in (ADV, 'ADV', 'adv'):
            univ_pos = 'adv'
        elif univ_pos in (PRON, 'PRON', 'pron'):
            univ_pos = 'pron'
        elif univ_pos in (DET, 'DET', 'det'):
            univ_pos = 'det'
        elif univ_pos in (AUX, 'AUX', 'aux'):
            univ_pos = 'aux'
        elif univ_pos in (CCONJ, 'CCONJ', 'cconj'):
            univ_pos = 'cconj'
        elif univ_pos in (DET, 'DET', 'det'):
            univ_pos = 'det'
        elif univ_pos in (PRON, 'PRON', 'pron'):
            univ_pos = 'pron'
        elif univ_pos in (PUNCT, 'PUNCT', 'punct'):
            univ_pos = 'punct'
        elif univ_pos in (SCONJ, 'SCONJ', 'sconj'):
            univ_pos = 'sconj'
        else:
            return [self.lookup(string)]
        # See Issue #435 for example of where this logic is required.
@ -100,7 +106,7 @@ class FrenchLemmatizer(object):
    def lookup(self, string):
        if string in self.lookup_table:
-            return self.lookup_table[string]
+            return self.lookup_table[string][0]
        return string
@ -125,7 +131,7 @@ def lemmatize(string, index, exceptions, rules):
    if not forms:
        forms.extend(oov_forms)
    if not forms and string in LOOKUP.keys():
-        forms.append(LOOKUP[string])
+        forms.append(LOOKUP[string][0])
    if not forms:
        forms.append(string)
    return list(set(forms))
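
The two `[0]` changes above suggest that lookup-table values are now sequences of candidate lemmas rather than single strings. A minimal sketch of that access pattern, assuming a hypothetical toy table `TOY_LOOKUP` whose values are tuples (the real `LOOKUP` data is not shown in this diff):

```python
# Hypothetical toy table: each surface form maps to a tuple of candidate
# lemmas. This shape is an assumption inferred from the `[0]` indexing.
TOY_LOOKUP = {
    "chevaux": ("cheval",),
    "suis": ("être", "suivre"),
}


def lookup(string, table=TOY_LOOKUP):
    """Return the first candidate lemma, or the string itself if unknown."""
    if string in table:
        return table[string][0]
    return string
```

If the table values were still plain strings, `[0]` would silently return the first character of the lemma, so the indexing and the table format probably have to change together.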
--- a/spacy/lang/fr/lemmatizer/lookup.py
+++ b/spacy/lang/fr/lemmatizer/lookup.py
--- a/spacy/lang/ja/init.py
+++ b/spacy/lang/ja/init.py
@ -1,16 +1,15 @@
 # encoding: utf8
 from __future__ import unicode_literals, print_function
 from ...language import Language
 from ...attrs import LANG
 from ...tokens import Doc, Token
 from ...tokenizer import Tokenizer
 from ... import util
 from .tag_map import TAG_MAP
 import re
 from collections import namedtuple
 from .tag_map import TAG_MAP
 from ...attrs import LANG
 from ...language import Language
 from ...tokens import Doc, Token
 from ...util import DummyTokenizer
 ShortUnitWord = namedtuple("ShortUnitWord", ["surface", "lemma", "pos"])
@ -46,12 +45,12 @@ def resolve_pos(token):
    # PoS mappings.
    if token.pos == "連体詞,*,*,*":
-        if re.match("^[こそあど此其彼]の", token.surface):
+        if re.match(r"[こそあど此其彼]の", token.surface):
            return token.pos + ",DET"
-        if re.match("^[こそあど此其彼]", token.surface):
+        if re.match(r"[こそあど此其彼]", token.surface):
            return token.pos + ",PRON"
-        else:
+        return token.pos + ",ADJ"
-            return token.pos + ",ADJ"
+
    return token.pos
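
Dropping the leading `^` in the hunk above should not change behaviour, because `re.match` only matches at the start of the string. A small sketch of the same three-way decision, using a hypothetical helper `classify_adnominal` that is not part of the diff:

```python
import re

# Patterns copied from the hunk above; the helper itself is illustrative only.
DET_PATTERN = re.compile(r"[こそあど此其彼]の")
PRON_PATTERN = re.compile(r"[こそあど此其彼]")


def classify_adnominal(surface):
    """Mirror the DET / PRON / ADJ fallthrough for adnominal tokens."""
    if DET_PATTERN.match(surface):
        return "DET"
    if PRON_PATTERN.match(surface):
        return "PRON"
    return "ADJ"


def caret_is_redundant(surface):
    """Check that the anchored and unanchored patterns agree under re.match."""
    with_caret = re.match("^[こそあど此其彼]の", surface) is not None
    without_caret = re.match(r"[こそあど此其彼]の", surface) is not None
    return with_caret == without_caret
```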
@ -68,7 +67,8 @@ def detailed_tokens(tokenizer, text):
        pos = ",".join(parts[0:4])
        if len(parts) > 7:
-            # this information is only available for words in the tokenizer dictionary
+            # this information is only available for words in the tokenizer
            # dictionary
            base = parts[7]
        words.append(ShortUnitWord(surface, base, pos))
@ -76,38 +76,27 @@ def detailed_tokens(tokenizer, text):
    return words
-class JapaneseTokenizer(object):
+class JapaneseTokenizer(DummyTokenizer):
    def __init__(self, cls, nlp=None):
        self.vocab = nlp.vocab if nlp is not None else cls.create_vocab(nlp)
-        MeCab = try_mecab_import()
-        self.tokenizer = MeCab.Tagger()
+        self.tokenizer = try_mecab_import().Tagger()
        self.tokenizer.parseToNode("")  # see #2901
    def __call__(self, text):
        dtokens = detailed_tokens(self.tokenizer, text)
        words = [x.surface for x in dtokens]
-        doc = Doc(self.vocab, words=words, spaces=[False] * len(words))
+        spaces = [False] * len(words)
        doc = Doc(self.vocab, words=words, spaces=spaces)
        for token, dtoken in zip(doc, dtokens):
            token._.mecab_tag = dtoken.pos
            token.tag_ = resolve_pos(dtoken)
            token.lemma_ = dtoken.lemma
        return doc
    # add dummy methods for to_bytes, from_bytes, to_disk and from_disk to
    # allow serialization (see #1557)
    def to_bytes(self, **exclude):
        return b""
    def from_bytes(self, bytes_data, **exclude):
        return self
    def to_disk(self, path, **exclude):
        return None
    def from_disk(self, path, **exclude):
        return self
 class JapaneseCharacterSegmenter(object):
    def __init__(self, vocab):
@ -154,7 +143,8 @@ class JapaneseCharacterSegmenter(object):
 class JapaneseDefaults(Language.Defaults):
    lex_attr_getters = dict(Language.Defaults.lex_attr_getters)
-    lex_attr_getters[LANG] = lambda text: "ja"
+    lex_attr_getters[LANG] = lambda _text: "ja"
    tag_map = TAG_MAP
    use_janome = True
@ -169,7 +159,6 @@ class JapaneseDefaults(Language.Defaults):
 class Japanese(Language):
    lang = "ja"
    Defaults = JapaneseDefaults
    Tokenizer = JapaneseTokenizer
    def make_doc(self, text):
        return self.tokenizer(text)
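
The tokenizer now inherits from `DummyTokenizer`, while the four hand-written serialization stubs appear in the same hunk. A minimal sketch of what such a base class could look like, assuming it only supplies no-op serialization; `NoOpSerializationMixin` and `WhitespaceTokenizer` are hypothetical names, and the real `spacy.util.DummyTokenizer` is not shown in this diff:

```python
class NoOpSerializationMixin(object):
    """Hypothetical stand-in for a DummyTokenizer-style base class."""

    def to_bytes(self, **exclude):
        # Nothing to serialize, so return an empty byte string.
        return b""

    def from_bytes(self, bytes_data, **exclude):
        # Ignore the payload and hand back the same object.
        return self

    def to_disk(self, path, **exclude):
        return None

    def from_disk(self, path, **exclude):
        return self


class WhitespaceTokenizer(NoOpSerializationMixin):
    """Toy tokenizer (illustrative only) that inherits the no-op methods."""

    def __call__(self, text):
        return text.split()
```

Moving the stubs into one shared base class would keep tokenizers that wrap external libraries free of repeated boilerplate.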
--- a/spacy/lang/sv/init.py
+++ b/spacy/lang/sv/init.py
@ -5,6 +5,7 @@ from .tokenizer_exceptions import TOKENIZER_EXCEPTIONS
 from .stop_words import STOP_WORDS
 from .morph_rules import MORPH_RULES
 from .lemmatizer import LEMMA_RULES, LOOKUP
 from .punctuation import TOKENIZER_INFIXES, TOKENIZER_SUFFIXES
 from ..tokenizer_exceptions import BASE_EXCEPTIONS
 from ..norm_exceptions import BASE_NORMS
@ -20,12 +21,14 @@ class SwedishDefaults(Language.Defaults):
        Language.Defaults.lex_attr_getters[NORM], BASE_NORMS
    )
    tokenizer_exceptions = update_exc(BASE_EXCEPTIONS, TOKENIZER_EXCEPTIONS)
    morph_rules = MORPH_RULES
    infixes = TOKENIZER_INFIXES
    suffixes = TOKENIZER_SUFFIXES
    stop_words = STOP_WORDS
    lemma_rules = LEMMA_RULES
    lemma_lookup = LOOKUP
    morph_rules = MORPH_RULES
 class Swedish(Language):
    lang = "sv"
    Defaults = SwedishDefaults
--- a/spacy/lang/sv/lemmatizer/lookup.py
+++ b/spacy/lang/sv/lemmatizer/lookup.py
@ -233167,7 +233167,6 @@ LOOKUP = {
    "jades": "jade",
    "jaet": "ja",
    "jaets": "ja",
    "jag": "jaga",
    "jagad": "jaga",
    "jagade": "jaga",
    "jagades": "jaga",
--- a/spacy/lang/sv/punctuation.py
+++ b/spacy/lang/sv/punctuation.py
@ -0,0 +1,25 @@
 # coding: utf8
 """Punctuation stolen from Danish"""
 from __future__ import unicode_literals
 from ..char_classes import LIST_ELLIPSES, LIST_ICONS
 from ..char_classes import QUOTES, ALPHA, ALPHA_LOWER, ALPHA_UPPER
 from ..punctuation import TOKENIZER_SUFFIXES
 _quotes = QUOTES.replace("'", '')
 _infixes = (LIST_ELLIPSES + LIST_ICONS +
            [r'(?<=[{}])\.(?=[{}])'.format(ALPHA_LOWER, ALPHA_UPPER),
             r'(?<=[{a}])[,!?](?=[{a}])'.format(a=ALPHA),
             r'(?<=[{a}"])[:<>=](?=[{a}])'.format(a=ALPHA),
             r'(?<=[{a}]),(?=[{a}])'.format(a=ALPHA),
             r'(?<=[{a}])([{q}\)\]\(\[])(?=[{a}])'.format(a=ALPHA, q=_quotes),
             r'(?<=[{a}])--(?=[{a}])'.format(a=ALPHA)])
 _suffixes = [suffix for suffix in TOKENIZER_SUFFIXES if suffix not in ["'s", "'S", "’s", "’S", r"\'"]]
 _suffixes += [r"(?<=[^sSxXzZ])\'"]
 TOKENIZER_INFIXES = _infixes
 TOKENIZER_SUFFIXES = _suffixes
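
The added suffix rule only splits off a trailing apostrophe when the preceding character is not `s`, `x` or `z`. A small sketch of that lookbehind in isolation, assuming suffix patterns are applied at the end of the string (hence the appended `$`); `splits_trailing_apostrophe` is a hypothetical helper:

```python
import re

# Same pattern as the added suffix rule, anchored at the end of the string.
# The anchoring is an assumption about how suffix rules are applied.
APOSTROPHE_SUFFIX = re.compile(r"(?<=[^sSxXzZ])\'$")


def splits_trailing_apostrophe(word):
    """Return True if the rule would split the final apostrophe off `word`."""
    return APOSTROPHE_SUFFIX.search(word) is not None
```

Under this rule a form such as `Lars'` keeps its apostrophe attached, while `Lisa'` has it split off.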
--- a/spacy/lang/sv/tokenizer_exceptions.py
+++ b/spacy/lang/sv/tokenizer_exceptions.py
@ -26,14 +26,15 @@ for verb_data in [
            {ORTH: "u", LEMMA: PRON_LEMMA, NORM: "du"},
        ]
-
+# The weekday abbreviation "sön." ("söndag" / "söner") is left out because
 # it is ambiguous. The same applies to the abbreviations "jul." and "Jul."
 # ("juli" / "jul").
 for exc_data in [
    {ORTH: "jan.", LEMMA: "januari"},
    {ORTH: "febr.", LEMMA: "februari"},
    {ORTH: "feb.", LEMMA: "februari"},
    {ORTH: "apr.", LEMMA: "april"},
    {ORTH: "jun.", LEMMA: "juni"},
    {ORTH: "jul.", LEMMA: "juli"},
    {ORTH: "aug.", LEMMA: "augusti"},
    {ORTH: "sept.", LEMMA: "september"},
    {ORTH: "sep.", LEMMA: "september"},
@ -46,13 +47,11 @@ for exc_data in [
    {ORTH: "tors.", LEMMA: "torsdag"},
    {ORTH: "fre.", LEMMA: "fredag"},
    {ORTH: "lör.", LEMMA: "lördag"},
    {ORTH: "sön.", LEMMA: "söndag"},
    {ORTH: "Jan.", LEMMA: "Januari"},
    {ORTH: "Febr.", LEMMA: "Februari"},
    {ORTH: "Feb.", LEMMA: "Februari"},
    {ORTH: "Apr.", LEMMA: "April"},
    {ORTH: "Jun.", LEMMA: "Juni"},
    {ORTH: "Jul.", LEMMA: "Juli"},
    {ORTH: "Aug.", LEMMA: "Augusti"},
    {ORTH: "Sept.", LEMMA: "September"},
    {ORTH: "Sep.", LEMMA: "September"},
@ -65,28 +64,32 @@ for exc_data in [
    {ORTH: "Tors.", LEMMA: "Torsdag"},
    {ORTH: "Fre.", LEMMA: "Fredag"},
    {ORTH: "Lör.", LEMMA: "Lördag"},
    {ORTH: "Sön.", LEMMA: "Söndag"},
    {ORTH: "sthlm", LEMMA: "Stockholm"},
    {ORTH: "gbg", LEMMA: "Göteborg"},
 ]:
    _exc[exc_data[ORTH]] = [exc_data]
 # Specific case abbreviations only
 for orth in ["AB", "Dr.", "H.M.", "H.K.H.", "m/s", "M/S", "Ph.d.", "S:t", "s:t"]:
    _exc[orth] = [{ORTH: orth}]
 ABBREVIATIONS = [
    "ang",
    "anm",
    "bil",
    "bl.a",
    "d.v.s",
    "doc",
    "dvs",
    "e.d",
    "e.kr",
-    "el",
+    "el.",
    "eng",
    "etc",
    "exkl",
-    "f",
+    "ev",
    "f.",
    "f.d",
    "f.kr",
    "f.n",
@ -97,10 +100,11 @@ ABBREVIATIONS = [
    "fr.o.m",
    "förf",
    "inkl",
-    "jur",
+    "iofs",
    "jur.",
    "kap",
    "kl",
-    "kor",
+    "kor.",
    "kr",
    "kungl",
    "lat",
@ -109,9 +113,10 @@ ABBREVIATIONS = [
    "m.m",
    "max",
    "milj",
-    "min",
+    "min.",
    "mos",
    "mt",
    "mvh",
    "o.d",
    "o.s.v",
    "obs",
@ -125,21 +130,27 @@ ABBREVIATIONS = [
    "s.k",
    "s.t",
    "sid",
    "s:t",
    "t.ex",
    "t.h",
    "t.o.m",
    "t.v",
    "tel",
-    "ung",
+    "ung.",
    "vol",
    "v.",
    "äv",
    "övers",
 ]
-ABBREVIATIONS = [abbr + "." for abbr in ABBREVIATIONS] + ABBREVIATIONS
+
 # Also add each abbreviation with a trailing period, unless it already ends in one.
 for abbr in list(ABBREVIATIONS):
    if not abbr.endswith("."):
        ABBREVIATIONS.append(abbr + ".")
 for orth in ABBREVIATIONS:
    _exc[orth] = [{ORTH: orth}]
    capitalized = orth.capitalize()
    _exc[capitalized] = [{ORTH: capitalized}]
 # Sentences ending in "i." (as in "... peka i."), "m." (as in "...än 2000 m."),
 # should be tokenized as two separate tokens.
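
The abbreviation handling in this file can be sketched without spaCy: each abbreviation gets a period-terminated variant unless it already ends in one, and every variant is registered in both its original and capitalized form. The string key `"ORTH"` below is a stand-in for spaCy's `ORTH` symbol, and both function names are hypothetical:

```python
def with_trailing_period(abbreviations):
    """Return the abbreviations plus a dotted variant for each undotted one."""
    expanded = list(abbreviations)
    for abbr in abbreviations:
        if not abbr.endswith("."):
            expanded.append(abbr + ".")
    return expanded


def build_exceptions(abbreviations):
    """Map each variant and its capitalized form to a single-token entry."""
    exc = {}
    for orth in with_trailing_period(abbreviations):
        exc[orth] = [{"ORTH": orth}]
        capitalized = orth.capitalize()
        exc[capitalized] = [{"ORTH": capitalized}]
    return exc
```

Building a new list instead of appending to the one being iterated gives the same result here and avoids relying on the loop visiting items appended mid-iteration.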
--- a/spacy/lang/ta/init.py
+++ b/spacy/lang/ta/init.py
@ -0,0 +1,24 @@
 # import language-specific data
 from .stop_words import STOP_WORDS
 from .lex_attrs import LEX_ATTRS
 from ..tokenizer_exceptions import BASE_EXCEPTIONS
 from ...language import Language
 from ...attrs import LANG
 from ...util import update_exc
 # create Defaults class in the module scope (necessary for pickling!)
 class TamilDefaults(Language.Defaults):
    lex_attr_getters = dict(Language.Defaults.lex_attr_getters)
    lex_attr_getters[LANG] = lambda text: 'ta' # language ISO code
    # optional: replace flags with custom functions, e.g. like_num()
    lex_attr_getters.update(LEX_ATTRS)
 # create actual Language class
 class Tamil(Language):
    lang = 'ta' # language ISO code
    Defaults = TamilDefaults # override defaults
 # set default export – this allows the language class to be lazy-loaded
 __all__ = ['Tamil']
--- a/spacy/lang/ta/examples.py
+++ b/spacy/lang/ta/examples.py
@ -0,0 +1,21 @@
 # coding: utf8
 from __future__ import unicode_literals
 """
 Example sentences to test spaCy and its language models.
 >>> from spacy.lang.ta.examples import sentences
 >>> docs = nlp.pipe(sentences)
 """
 sentences = [
    "கிறிஸ்துமஸ் மற்றும் இனிய புத்தாண்டு வாழ்த்துக்கள்",
    "எனக்கு என் குழந்தைப் பருவம் நினைவிருக்கிறது",
    "உங்கள் பெயர் என்ன?",
    "ஏறத்தாழ இலங்கைத் தமிழரில் மூன்றிலொரு பங்கினர் இலங்கையை விட்டு வெளியேறிப் பிற நாடுகளில் வாழ்கின்றனர்",
    "இந்த ஃபோனுடன் சுமார் ரூ.2,990 மதிப்புள்ள போட் ராக்கர்ஸ் நிறுவனத்தின் ஸ்போர்ட் புளூடூத் ஹெட்போன்ஸ்  இலவசமாக வழங்கப்படவுள்ளது.",
    "மட்டக்களப்பில் பல இடங்களில் வீட்டுத் திட்டங்களுக்கு இன்று அடிக்கல் நாட்டல்",
    "ஐ போன்க்கு முகத்தை வைத்து அன்லாக் செய்யும் முறை மற்றும்  விரலால் தொட்டு அன்லாக் செய்யும் முறையை வாட்ஸ் ஆப் நிறுவனம் இதற்கு முன் கண்டுபிடித்தது"
 ]
--- a/spacy/lang/ta/lex_attrs.py
+++ b/spacy/lang/ta/lex_attrs.py
@ -0,0 +1,44 @@
 # coding: utf8
 from __future__ import unicode_literals
 from ...attrs import LIKE_NUM
 _numeral_suffixes = {'பத்து': 'பது', 'ற்று': 'று', 'ரத்து':'ரம்' , 'சத்து': 'சம்'}
 _num_words = ['பூச்சியம்', 'ஒரு', 'ஒன்று', 'இரண்டு', 'மூன்று', 'நான்கு', 'ஐந்து', 'ஆறு', 'ஏழு',
              'எட்டு', 'ஒன்பது', 'பத்து', 'பதினொன்று', 'பன்னிரண்டு', 'பதின்மூன்று', 'பதினான்கு',
              'பதினைந்து', 'பதினாறு', 'பதினேழு', 'பதினெட்டு', 'பத்தொன்பது', 'இருபது',
              'முப்பது', 'நாற்பது', 'ஐம்பது', 'அறுபது', 'எழுபது', 'எண்பது', 'தொண்ணூறு',
              'நூறு', 'இருநூறு', 'முன்னூறு', 'நாநூறு', 'ஐநூறு', 'அறுநூறு', 'எழுநூறு', 'எண்ணூறு', 'தொள்ளாயிரம்',
              'ஆயிரம்', 'ஒராயிரம்', 'லட்சம்', 'மில்லியன்', 'கோடி', 'பில்லியன்', 'டிரில்லியன்']
 # 20-89, 90-899, and 900-99999 and above have different suffixes
 def suffix_filter(text):
    # Replace a known numeral suffix with its base form, if one matches.
    for num_suffix, replacement in _numeral_suffixes.items():
        if text.endswith(num_suffix):
            return text[:-len(num_suffix)] + replacement
    return text
 def like_num(text):
    text = text.replace(',', '').replace('.', '')
    if text.isdigit():
        return True
    if text.count('/') == 1:
        num, denom = text.split('/')
        if num.isdigit() and denom.isdigit():
            return True
    if text.lower() in _num_words:
        return True
    elif suffix_filter(text) in _num_words:
        return True
    return False
 LEX_ATTRS = {
    LIKE_NUM: like_num
 }
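
The number check above combines three tests: plain digits, simple fractions, and number words after suffix normalization. A self-contained sketch with a deliberately tiny vocabulary; `NUMERAL_SUFFIXES` and `NUM_WORDS` below are cut-down copies for illustration, not the full tables from the file:

```python
# Cut-down copies of the tables in lex_attrs.py (illustrative subset only).
NUMERAL_SUFFIXES = {"பத்து": "பது"}
NUM_WORDS = {"பத்து", "இருபது"}


def normalize_suffix(text):
    """Swap a combining numeral suffix for its standalone form."""
    for suffix, replacement in NUMERAL_SUFFIXES.items():
        if text.endswith(suffix):
            return text[: -len(suffix)] + replacement
    return text


def like_num(text):
    """Return True for digits, simple fractions, and known number words."""
    text = text.replace(",", "").replace(".", "")
    if text.isdigit():
        return True
    if text.count("/") == 1:
        num, denom = text.split("/")
        if num.isdigit() and denom.isdigit():
            return True
    return text in NUM_WORDS or normalize_suffix(text) in NUM_WORDS
```

The sketch has no debugging output, since a lexical attribute getter like this may run once per vocabulary entry.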
--- a/spacy/lang/ta/norm_exceptions.py
+++ b/spacy/lang/ta/norm_exceptions.py
@ -0,0 +1,148 @@
 # coding: utf8
 from __future__ import unicode_literals
 _exc = {
    # Regional words and their normalized forms
    # Sri Lanka - Wikipedia
    "இங்க": "இங்கே",
    "வாங்க": "வாருங்கள்",
    'ஒண்டு':'ஒன்று',
    'கண்டு': 'கன்று',
    'கொண்டு': 'கொன்று',
    'பண்டி': 'பன்றி',
    'பச்ச': 'பச்சை',
    'அம்பது': 'ஐம்பது',
    'வெச்ச': 'வைத்து',
    'வச்ச': 'வைத்து',
    'வச்சி': 'வைத்து',
    'வாளைப்பழம்':'வாழைப்பழம்',
    'மண்ணு': 'மண்',
    'பொன்னு': 'பொன்',
    'சாவல்': 'சேவல்',
    'அங்கால': 'அங்கு ',
    'அசுப்பு': 'நடமாட்டம்',
    'எழுவான் கரை': 'எழுவான்கரை',
    'ஓய்யாரம்': 'எழில் ',
    'ஒளும்பு': 'எழும்பு',
    'ஓர்மை': 'துணிவு',
    'கச்சை': 'கோவணம்',
    'கடப்பு': 'தெருவாசல்',
    'சுள்ளி': 'காய்ந்த குச்சி',
    'திறாவுதல்': 'தடவுதல்',
    'நாசமறுப்பு': 'தொல்லை',
    'பரிசாரி': 'வைத்தியன்',
    'பறவாதி': 'பேராசைக்காரன்',
    'பிசினி': 'உலோபி ',
    'விசர்': 'பைத்தியம்',
    'ஏனம்': 'பாத்திரம்',
    'ஏலா': 'இயலாது',
    'ஒசில்': 'அழகு',
    'ஒள்ளுப்பம்': 'கொஞ்சம்',
    # Sri Lankan and Indian
    'குத்துமதிப்பு': '',
    'நூனாயம்': 'நூல்நயம்',
    'பைய': 'மெதுவாக',
    'மண்டை': 'தலை',
    'வெள்ளனே': 'சீக்கிரம்',
    'உசுப்பு': 'எழுப்பு',
    'ஆணம்': 'குழம்பு',
    'உறக்கம்': 'தூக்கம்',
    'பஸ்': 'பேருந்து',
    'களவு': 'திருட்டு ',
    # Relationship terms
    'புருசன்': 'கணவன்',
    'பொஞ்சாதி': 'மனைவி',
    'புள்ள': 'பிள்ளை',
    'பிள்ள': 'பிள்ளை',
    'ஆம்பிளப்புள்ள': 'ஆண் பிள்ளை',
    'பொம்பிளப்புள்ள': 'பெண் பிள்ளை',
    'அண்ணாச்சி': 'அண்ணா',
    'அக்காச்சி': 'அக்கா',
    'தங்கச்சி': 'தங்கை',
    # Difference words
    'பொடியன்': 'சிறுவன்',
    'பொட்டை': 'சிறுமி',
    'பிறகு': 'பின்பு',
    'டக்கென்டு': 'விரைவாக',
    'கெதியா': 'விரைவாக',
    'கிறுகி': 'திரும்பி',
    'போயித்து வாறன்': 'போய் வருகிறேன்',
    'வருவாங்களா': 'வருவார்களா',
    # Common spoken forms
    'சொல்லு': 'சொல்',
    'கேளு': 'கேள்',
    'சொல்லுங்க': 'சொல்லுங்கள்',
    'கேளுங்க': 'கேளுங்கள்',
    'நீங்கள்': 'நீ',
    'உன்': 'உன்னுடைய',
    # Portuguese formal words
    'அலவாங்கு': 'கடப்பாரை',
    'ஆசுப்பத்திரி': 'மருத்துவமனை',
    'உரோதை': 'சில்லு',
    'கடுதாசி': 'கடிதம்',
    'கதிரை': 'நாற்காலி',
    'குசினி': 'அடுக்களை',
    'கோப்பை': 'கிண்ணம்',
    'சப்பாத்து': 'காலணி',
    'தாச்சி': 'இரும்புச் சட்டி',
    'துவாய்': 'துவாலை',
    'தவறணை': 'மதுக்கடை',
    'பீப்பா': 'மரத்தாழி',
    'யன்னல்': 'சாளரம்',
    'வாங்கு': 'மரஇருக்கை',
    # Dutch formal words
    'இறாக்கை': 'பற்சட்டம்',
    'இலாட்சி': 'இழுப்பறை',
    'கந்தோர்': 'பணிமனை',
    'நொத்தாரிசு': 'ஆவண எழுத்துபதிவாளர்',
    # English formal words
    'இஞ்சினியர்': 'பொறியியலாளர்',
    'சூப்பு': 'ரசம்',
    'செக்': 'காசோலை',
    'சேட்டு': 'மேற்ச்சட்டை',
    'மார்க்கட்டு': 'சந்தை',
    'விண்ணன்': 'கெட்டிக்காரன்',
    # Arabic formal words
    'ஈமான்': 'நம்பிக்கை',
    'சுன்னத்து': 'விருத்தசேதனம்',
    'செய்த்தான்': 'பிசாசு',
    'மவுத்து': 'இறப்பு',
    'ஹலால்': 'அங்கீகரிக்கப்பட்டது',
    'கறாம்': 'நிராகரிக்கப்பட்டது',
    # Persian, Hindustani and Hindi formal words
    'சுமார்': 'கிட்டத்தட்ட',
    'சிப்பாய்': 'போர்வீரன்',
    'சிபார்சு': 'சிபாரிசு',
    'ஜமீன்': 'பணக்காரா்',
    'அசல்': 'மெய்யான',
    'அந்தஸ்து': 'கௌரவம்',
    'ஆஜர்': 'சமா்ப்பித்தல்',
    'உசார்': 'எச்சரிக்கை',
    'அச்சா': 'நல்ல',
    # English words used in text conversations
    "bcoz": "ஏனெனில்",
    "bcuz": "ஏனெனில்",
    "fav": "விருப்பமான",
    "morning": "காலை வணக்கம்",
    "gdeveng": "மாலை வணக்கம்",
    "gdnyt": "இரவு வணக்கம்",
    "gdnit": "இரவு வணக்கம்",
    "plz": "தயவு செய்து",
    "pls": "தயவு செய்து",
    "thx": "நன்றி",
    "thanx": "நன்றி",
 }
 NORM_EXCEPTIONS = {}
 for string, norm in _exc.items():
    NORM_EXCEPTIONS[string] = norm
--- a/spacy/lang/ta/stop_words.py
+++ b/spacy/lang/ta/stop_words.py
@ -0,0 +1,133 @@
 # coding: utf8
 from __future__ import unicode_literals
 # Stop words
 STOP_WORDS = set("""
 ஒரு
 என்று
 மற்றும்
 இந்த
 இது
 என்ற
 கொண்டு
 என்பது
 பல
 ஆகும்
 அல்லது
 அவர்
 நான்
 உள்ள
 அந்த
 இவர்
 என
 முதல்
 என்ன
 இருந்து
 சில
 என்
 போன்ற
 வேண்டும்
 வந்து
 இதன்
 அது
 அவன்
 தான்
 பலரும்
 என்னும்
 மேலும்
 பின்னர்
 கொண்ட
 இருக்கும்
 தனது
 உள்ளது
 போது
 என்றும்
 அதன்
 தன்
 பிறகு
 அவர்கள்
 வரை
 அவள்
 நீ
 ஆகிய
 இருந்தது
 உள்ளன
 வந்த
 இருந்த
 மிகவும்
 இங்கு
 மீது
 ஓர்
 இவை
 இந்தக்
 பற்றி
 வரும்
 வேறு
 இரு
 இதில்
 போல்
 இப்போது
 அவரது
 மட்டும்
 இந்தப்
 எனும்
 மேல்
 பின்
 சேர்ந்த
 ஆகியோர்
 எனக்கு
 இன்னும்
 அந்தப்
 அன்று
 ஒரே
 மிக
 அங்கு
 பல்வேறு
 விட்டு
 பெரும்
 அதை
 பற்றிய
 உன்
 அதிக
 அந்தக்
 பேர்
 இதனால்
 அவை
 அதே
 ஏன்
 முறை
 யார்
 என்பதை
 எல்லாம்
 மட்டுமே
 இங்கே
 அங்கே
 இடம்
 இடத்தில்
 அதில்
 நாம்
 அதற்கு
 எனவே
 பிற
 சிறு
 மற்ற
 விட
 எந்த
 எனவும்
 எனப்படும்
 எனினும்
 அடுத்த
 இதனை
 இதை
 கொள்ள
 இந்தத்
 இதற்கு
 அதனால்
 தவிர
 போல
 வரையில்
 சற்று
 எனக்
 """.split())
--- a/spacy/lang/th/init.py
+++ b/spacy/lang/th/init.py
@ -5,24 +5,14 @@ from .tokenizer_exceptions import TOKENIZER_EXCEPTIONS
 from .tag_map import TAG_MAP
 from .stop_words import STOP_WORDS
 from ...tokens import Doc
 from ...language import Language
 from ...attrs import LANG
 from ...language import Language
 from ...tokens import Doc
 from ...util import DummyTokenizer
-class ThaiDefaults(Language.Defaults):
+class ThaiTokenizer(DummyTokenizer):
-    lex_attr_getters = dict(Language.Defaults.lex_attr_getters)
+    def __init__(self, cls, nlp=None):
    lex_attr_getters[LANG] = lambda text: "th"
    tokenizer_exceptions = dict(TOKENIZER_EXCEPTIONS)
    tag_map = TAG_MAP
    stop_words = STOP_WORDS
 class Thai(Language):
    lang = "th"
    Defaults = ThaiDefaults
    def make_doc(self, text):
        try:
            from pythainlp.tokenize import word_tokenize
        except ImportError:
@ -30,8 +20,35 @@ class Thai(Language):
                "The Thai tokenizer requires the PyThaiNLP library: "
                "https://github.com/PyThaiNLP/pythainlp"
            )
-        words = [x for x in list(word_tokenize(text, "newmm"))]
+
-        return Doc(self.vocab, words=words, spaces=[False] * len(words))
+        self.word_tokenize = word_tokenize
        self.vocab = nlp.vocab if nlp is not None else cls.create_vocab(nlp)
    def __call__(self, text):
        words = list(self.word_tokenize(text, "newmm"))
        spaces = [False] * len(words)
        return Doc(self.vocab, words=words, spaces=spaces)
 class ThaiDefaults(Language.Defaults):
    lex_attr_getters = dict(Language.Defaults.lex_attr_getters)
    lex_attr_getters[LANG] = lambda _text: "th"
    tokenizer_exceptions = dict(TOKENIZER_EXCEPTIONS)
    tag_map = TAG_MAP
    stop_words = STOP_WORDS
    @classmethod
    def create_tokenizer(cls, nlp=None):
        return ThaiTokenizer(cls, nlp)
 class Thai(Language):
    lang = "th"
    Defaults = ThaiDefaults
    def make_doc(self, text):
        return self.tokenizer(text)
 __all__ = ["Thai"]
--- a/spacy/lang/tr/lex_attrs.py
+++ b/spacy/lang/tr/lex_attrs.py
@ -5,6 +5,7 @@ from ...attrs import LIKE_NUM
 # Thirteen, fifteen etc. are written separately: on üç
 _num_words = [
    "bir",
    "iki",
@ -28,6 +29,7 @@ _num_words = [
    "bin",
    "milyon",
    "milyar",
    "trilyon",
    "katrilyon",
    "kentilyon",
 ]
--- a/spacy/tests/doc/test_doc_api.py
+++ b/spacy/tests/doc/test_doc_api.py
@ -353,10 +353,38 @@ def test_doc_api_similarity_match():
        assert doc.similarity(doc2) == 0.0
-def test_lowest_common_ancestor(en_tokenizer):
+@pytest.mark.parametrize(
-    tokens = en_tokenizer("the lazy dog slept")
+    "sentence,heads,lca_matrix",
-    doc = get_doc(tokens.vocab, words=[t.text for t in tokens], heads=[2, 1, 1, 0])
+    [
        (
            "the lazy dog slept",
            [2, 1, 1, 0],
            numpy.array([[0, 2, 2, 3], [2, 1, 2, 3], [2, 2, 2, 3], [3, 3, 3, 3]]),
        ),
        (
            "The lazy dog slept. The quick fox jumped",
            [2, 1, 1, 0, -1, 2, 1, 1, 0],
            numpy.array(
                [
                    [0, 2, 2, 3, 3, -1, -1, -1, -1],
                    [2, 1, 2, 3, 3, -1, -1, -1, -1],
                    [2, 2, 2, 3, 3, -1, -1, -1, -1],
                    [3, 3, 3, 3, 3, -1, -1, -1, -1],
                    [3, 3, 3, 3, 4, -1, -1, -1, -1],
                    [-1, -1, -1, -1, -1, 5, 7, 7, 8],
                    [-1, -1, -1, -1, -1, 7, 6, 7, 8],
                    [-1, -1, -1, -1, -1, 7, 7, 7, 8],
                    [-1, -1, -1, -1, -1, 8, 8, 8, 8],
                ]
            ),
        ),
    ],
 )
 def test_lowest_common_ancestor(en_tokenizer, sentence, heads, lca_matrix):
    tokens = en_tokenizer(sentence)
    doc = get_doc(tokens.vocab, [t.text for t in tokens], heads=heads)
    lca = doc.get_lca_matrix()
    assert (lca == lca_matrix).all()
    assert lca[1, 1] == 1
    assert lca[0, 1] == 2
    assert lca[1, 2] == 2
--- a/spacy/tests/doc/test_span.py
+++ b/spacy/tests/doc/test_span.py
@ -80,10 +80,24 @@ def test_spans_lca_matrix(en_tokenizer):
    tokens = en_tokenizer("the lazy dog slept")
    doc = get_doc(tokens.vocab, words=[t.text for t in tokens], heads=[2, 1, 1, 0])
    lca = doc[:2].get_lca_matrix()
-    assert lca[0, 0] == 0
+    assert lca.shape == (2, 2)
-    assert lca[0, 1] == -1
+    assert lca[0, 0] == 0  # the & the -> the
-    assert lca[1, 0] == -1
+    assert lca[0, 1] == -1  # the & lazy -> dog (out of span)
-    assert lca[1, 1] == 1
+    assert lca[1, 0] == -1  # lazy & the -> dog (out of span)
    assert lca[1, 1] == 1  # lazy & lazy -> lazy
    lca = doc[1:].get_lca_matrix()
    assert lca.shape == (3, 3)
    assert lca[0, 0] == 0  # lazy & lazy -> lazy
    assert lca[0, 1] == 1  # lazy & dog -> dog
    assert lca[0, 2] == 2  # lazy & slept -> slept
    lca = doc[2:].get_lca_matrix()
    assert lca.shape == (2, 2)
    assert lca[0, 0] == 0  # dog & dog -> dog
    assert lca[0, 1] == 1  # dog & slept -> slept
    assert lca[1, 0] == 1  # slept & dog -> slept
    assert lca[1, 1] == 1  # slept & slept -> slept
 def test_span_similarity_match():
@ -158,15 +172,17 @@ def test_span_as_doc(doc):
 def test_span_string_label(doc):
-    span = Span(doc, 0, 1, label='hello')
+    span = Span(doc, 0, 1, label="hello")
-    assert span.label_ == 'hello'
+    assert span.label_ == "hello"
-    assert span.label == doc.vocab.strings['hello']
+    assert span.label == doc.vocab.strings["hello"]
 def test_span_string_set_label(doc):
    span = Span(doc, 0, 1)
-    span.label_ = 'hello'
+    span.label_ = "hello"
-    assert span.label_ == 'hello'
+    assert span.label_ == "hello"
-    assert span.label == doc.vocab.strings['hello']
+    assert span.label == doc.vocab.strings["hello"]
 def test_span_ents_property(doc):
    """Test span.ents for the """
--- a/spacy/tests/lang/sv/test_exceptions.py
+++ b/spacy/tests/lang/sv/test_exceptions.py
@ -0,0 +1,53 @@
 # coding: utf8
 from __future__ import unicode_literals
 import pytest
 SV_TOKEN_EXCEPTION_TESTS = [
    ('Smörsåsen används bl.a. till fisk', ['Smörsåsen', 'används', 'bl.a.', 'till', 'fisk']),
    ('Jag kommer först kl. 13 p.g.a. diverse förseningar', ['Jag', 'kommer', 'först', 'kl.', '13', 'p.g.a.', 'diverse', 'förseningar']),
    ('Anders I. tycker om ord med i i.', ["Anders", "I.", "tycker", "om", "ord", "med", "i", "i", "."])
 ]
@pytest.mark.parametrize('text,expected_tokens', SV_TOKEN_EXCEPTION_TESTS)
 def test_sv_tokenizer_handles_exception_cases(sv_tokenizer, text, expected_tokens):
    tokens = sv_tokenizer(text)
    token_list = [token.text for token in tokens if not token.is_space]
    assert expected_tokens == token_list
@pytest.mark.parametrize('text', ["driveru", "hajaru", "Serru", "Fixaru"])
 def test_sv_tokenizer_handles_verb_exceptions(sv_tokenizer, text):
    tokens = sv_tokenizer(text)
    assert len(tokens) == 2
    assert tokens[1].text == "u"
@pytest.mark.parametrize('text',
                         ["bl.a", "m.a.o.", "Jan.", "Dec.", "kr.", "osv."])
 def test_sv_tokenizer_handles_abbr(sv_tokenizer, text):
    tokens = sv_tokenizer(text)
    assert len(tokens) == 1
@pytest.mark.parametrize('text', ["Jul.", "jul.", "sön.", "Sön."])
 def test_sv_tokenizer_handles_ambiguous_abbr(sv_tokenizer, text):
    tokens = sv_tokenizer(text)
    assert len(tokens) == 2
 def test_sv_tokenizer_handles_exc_in_text(sv_tokenizer):
    text = "Det er bl.a. ikke meningen"
    tokens = sv_tokenizer(text)
    assert len(tokens) == 5
    assert tokens[2].text == "bl.a."
 def test_sv_tokenizer_handles_custom_base_exc(sv_tokenizer):
    text = "Her er noget du kan kigge i."
    tokens = sv_tokenizer(text)
    assert len(tokens) == 8
    assert tokens[6].text == "i"
    assert tokens[7].text == "."
--- a/spacy/tests/lang/sv/test_lemmatizer.py
+++ b/spacy/tests/lang/sv/test_lemmatizer.py
@ -0,0 +1,15 @@
 # coding: utf-8
 from __future__ import unicode_literals
 import pytest
@pytest.mark.parametrize('string,lemma', [('DNA-profilernas', 'DNA-profil'),
                                          ('Elfenbenskustens', 'Elfenbenskusten'),
                                          ('abortmotståndarens', 'abortmotståndare'),
                                          ('kolesterols', 'kolesterol'),
                                          ('portionssnusernas', 'portionssnus'),
                                          ('åsyns', 'åsyn')])
 def test_lemmatizer_lookup_assigns(sv_tokenizer, string, lemma):
    tokens = sv_tokenizer(string)
    assert tokens[0].lemma_ == lemma
--- a/spacy/tests/lang/sv/test_prefix_suffix_infix.py
+++ b/spacy/tests/lang/sv/test_prefix_suffix_infix.py
@ -0,0 +1,37 @@
 # coding: utf-8
 """Test that tokenizer prefixes, suffixes and infixes are handled correctly."""
 from __future__ import unicode_literals
 import pytest
@pytest.mark.parametrize('text', ["(under)"])
 def test_tokenizer_splits_no_special(sv_tokenizer, text):
    tokens = sv_tokenizer(text)
    assert len(tokens) == 3
@pytest.mark.parametrize('text', ["gitta'r", "Björn's", "Lars'"])
 def test_tokenizer_handles_no_punct(sv_tokenizer, text):
    tokens = sv_tokenizer(text)
    assert len(tokens) == 1
@pytest.mark.parametrize('text', ["svart.Gul", "Hej.Världen"])
 def test_tokenizer_splits_period_infix(sv_tokenizer, text):
    tokens = sv_tokenizer(text)
    assert len(tokens) == 3
@pytest.mark.parametrize('text', ["Hej,Världen", "en,två"])
 def test_tokenizer_splits_comma_infix(sv_tokenizer, text):
    tokens = sv_tokenizer(text)
    assert len(tokens) == 3
    assert tokens[0].text == text.split(",")[0]
    assert tokens[1].text == ","
    assert tokens[2].text == text.split(",")[1]
@pytest.mark.parametrize('text', ["svart...Gul", "svart...gul"])
 def test_tokenizer_splits_ellipsis_infix(sv_tokenizer, text):
    tokens = sv_tokenizer(text)
    assert len(tokens) == 3
--- a/spacy/tests/lang/sv/test_text.py
+++ b/spacy/tests/lang/sv/test_text.py
@ -0,0 +1,21 @@
 # coding: utf-8
 """Test that longer and mixed texts are tokenized correctly."""
 from __future__ import unicode_literals
 import pytest
 def test_sv_tokenizer_handles_long_text(sv_tokenizer):
    text = """Det var så härligt ute på landet. Det var sommar, majsen var gul, havren grön,
 höet var uppställt i stackar nere vid den gröna ängen, och där gick storken på sina långa,
 röda ben och snackade engelska, för det språket hade han lärt sig av sin mor.
 Runt om åkrar och äng låg den stora skogen, och mitt i skogen fanns djupa sjöar; jo, det var verkligen trevligt ute på landet!"""
    tokens = sv_tokenizer(text)
    assert len(tokens) == 86
 def test_sv_tokenizer_handles_trailing_dot_for_i_in_sentence(sv_tokenizer):
    text = "Provar att tokenisera en mening med ord i."
    tokens = sv_tokenizer(text)
    assert len(tokens) == 9
--- a/spacy/tests/regression/test_issue2396.py
+++ b/spacy/tests/regression/test_issue2396.py
@ -5,27 +5,31 @@ from ..util import get_doc
 import pytest
 import numpy
 from numpy.testing import assert_array_equal
-@pytest.mark.parametrize('words,heads,matrix', [
+@pytest.mark.parametrize(
-    (
+    "sentence,heads,matrix",
-        'She created a test for spacy'.split(),
+    [
-        [1, 0, 1, -2, -1, -1],
+        (
-        numpy.array([
+            "She created a test for spacy",
-            [0, 1, 1, 1, 1, 1],
+            [1, 0, 1, -2, -1, -1],
-            [1, 1, 1, 1, 1, 1],
+            numpy.array(
-            [1, 1, 2, 3, 3, 3],
+                [
-            [1, 1, 3, 3, 3, 3],
+                    [0, 1, 1, 1, 1, 1],
-            [1, 1, 3, 3, 4, 4],
+                    [1, 1, 1, 1, 1, 1],
-            [1, 1, 3, 3, 4, 5]], dtype=numpy.int32)
+                    [1, 1, 2, 3, 3, 3],
-    )
+                    [1, 1, 3, 3, 3, 3],
-    ])
+                    [1, 1, 3, 3, 4, 4],
-def test_issue2396(en_vocab, words, heads, matrix):
+                    [1, 1, 3, 3, 4, 5],
-    doc = get_doc(en_vocab, words=words, heads=heads)
+                ],
-
+                dtype=numpy.int32,
            ),
        )
    ],
 )
 def test_issue2396(en_tokenizer, sentence, heads, matrix):
    tokens = en_tokenizer(sentence)
    doc = get_doc(tokens.vocab, [t.text for t in tokens], heads=heads)
    span = doc[:]
-    assert_array_equal(doc.get_lca_matrix(), matrix)
+    assert (doc.get_lca_matrix() == matrix).all()
-    assert_array_equal(span.get_lca_matrix(), matrix)
+    assert (span.get_lca_matrix() == matrix).all()
--- a/spacy/tests/regression/test_issue2901.py
+++ b/spacy/tests/regression/test_issue2901.py
@ -10,7 +10,7 @@ def test_issue2901():
    """Test that `nlp` doesn't fail."""
    try:
        nlp = Japanese()
-    except:
+    except ImportError:
        pytest.skip()
    doc = nlp("pythonが大好きです")
--- a/spacy/tests/regression/test_issue3178.py
+++ b/spacy/tests/regression/test_issue3178.py
@ -0,0 +1,10 @@
 from __future__ import unicode_literals
 import pytest
 import spacy
@pytest.mark.models('fr')
 def test_issue3178(FR):
    texts = ['Je suis la mauvaise herbe', "Me, myself and moi"]
    for text in texts:
        FR(text)
--- a/spacy/tokens/doc.pyx
+++ b/spacy/tokens/doc.pyx
@ -1075,21 +1075,30 @@ cdef int [:,:] _get_lca_matrix(Doc doc, int start, int end):
    cdef int [:,:] lca_matrix
    n_tokens = end - start
-    lca_matrix = numpy.empty((n_tokens, n_tokens), dtype=numpy.int32)
+    lca_mat = numpy.empty((n_tokens, n_tokens), dtype=numpy.int32)
    lca_mat.fill(-1)
    lca_matrix = lca_mat
-    for j in range(start, end):
+    for j in range(n_tokens):
-        token_j = doc[j]
+        token_j = doc[start + j]
        # the common ancestor of token and itself is itself:
        lca_matrix[j, j] = j
-        for k in range(j + 1, end):
+        # we will only iterate through tokens in the same sentence
-            lca = _get_tokens_lca(token_j, doc[k])
+        sent = token_j.sent
        sent_start = sent.start
        j_idx_in_sent = start + j - sent_start
        n_missing_tokens_in_sent = len(sent) - j_idx_in_sent
        # make sure we do not go past `end`, in cases where `end` < sent.end
        max_range = min(j + n_missing_tokens_in_sent, end)
        for k in range(j + 1, max_range):
            lca = _get_tokens_lca(token_j, doc[start + k])
            # if lca is outside of span, we set it to -1
            if not start <= lca < end:
                lca_matrix[j, k] = -1
                lca_matrix[k, j] = -1
            else:
-                lca_matrix[j, k] = lca
+                lca_matrix[j, k] = lca - start
-                lca_matrix[k, j] = lca
+                lca_matrix[k, j] = lca - start
    return lca_matrix
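
The behaviour this hunk aims for (indices relative to the span start, and -1 for token pairs that share no ancestor inside the span) can be illustrated with a plain-Python sketch. This is not the Cython implementation: `lca_matrix`, `rel_heads` and `path_to_root` are hypothetical names, the input uses the relative head offsets seen in the tests of this commit, and the sketch assumes every sentence forms a proper tree whose root has offset 0.

```python
def lca_matrix(rel_heads, start=0, end=None):
    """Toy LCA matrix for tokens[start:end] (illustrative only, not spaCy's code).

    rel_heads follows the relative-offset convention used in the tests:
    head index = token index + offset, and an offset of 0 marks a root.
    Entries are span-relative indices, or -1 when the two tokens have no
    common ancestor inside the span (different sentences, or LCA out of span).
    """
    heads = [i + offset for i, offset in enumerate(rel_heads)]
    if end is None:
        end = len(heads)
    n_tokens = end - start

    def path_to_root(i):
        # Walk from token i up to its root, collecting absolute indices.
        path = [i]
        while heads[path[-1]] != path[-1]:
            path.append(heads[path[-1]])
        return path

    matrix = [[-1] * n_tokens for _ in range(n_tokens)]
    for j in range(n_tokens):
        # The common ancestor of a token and itself is the token itself.
        matrix[j][j] = j
        path_j = path_to_root(start + j)
        for k in range(j + 1, n_tokens):
            ancestors_k = set(path_to_root(start + k))
            # First node on j's path that is also an ancestor of k (or k itself).
            lca = next((a for a in path_j if a in ancestors_k), None)
            if lca is not None and start <= lca < end:
                matrix[j][k] = matrix[k][j] = lca - start
    return matrix
```

Called as `lca_matrix([2, 1, 1, 0])` this mirrors the "the lazy dog slept" case in the parametrized test, and `lca_matrix([2, 1, 1, 0], 0, 2)` mirrors the `doc[:2]` span case where the ancestor "dog" falls outside the span. Tokens in different sentences never share a root, so the sketch leaves those cells at -1 without needing explicit sentence boundaries.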
--- a/spacy/tokens/span.pyx
+++ b/spacy/tokens/span.pyx
@ -524,9 +524,9 @@ cdef class Span:
            return len(list(self.rights))
    property subtree:
-        """Tokens that descend from tokens in the span, but fall outside it.
+        """Tokens within the span and tokens which descend from them.
-        YIELDS (Token): A descendant of a token within the span.
+        YIELDS (Token): A token within the span, or a descendant from it.
        """
        def __get__(self):
            for word in self.lefts:
--- a/spacy/tokens/token.pyx
+++ b/spacy/tokens/token.pyx
@ -457,10 +457,11 @@ cdef class Token:
            yield from self.rights
    property subtree:
-        """A sequence of all the token's syntactic descendents.
+        """A sequence containing the token and all the token's syntactic
        descendants.
        YIELDS (Token): A descendent token such that
-            `self.is_ancestor(descendent)`.
+            `self.is_ancestor(descendent) or token == self`.
        """
        def __get__(self):
            for word in self.lefts:
--- a/spacy/util.py
+++ b/spacy/util.py
@ -253,7 +253,6 @@ def get_entry_point(key, value):
 def is_in_jupyter():
    """Check if user is running spaCy from a Jupyter notebook by detecting the
    IPython kernel. Mainly used for the displaCy visualizer.
    RETURNS (bool): True if in Jupyter, False if not.
    """
    # https://stackoverflow.com/a/39662359/6400719
@ -667,3 +666,19 @@ class SimpleFrozenDict(dict):
    def update(self, other):
        raise NotImplementedError(Errors.E095)
 class DummyTokenizer(object):
    # add dummy methods for to_bytes, from_bytes, to_disk and from_disk to
    # allow serialization (see #1557)
    def to_bytes(self, **exclude):
        return b''
    def from_bytes(self, _bytes_data, **exclude):
        return self
    def to_disk(self, _path, **exclude):
        return None
    def from_disk(self, _path, **exclude):
        return self
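
As a rough illustration of how this base class is meant to be used, the sketch below repeats the no-op serialization methods and adds a hypothetical `WhitespaceTokenizer` subclass. To stay self-contained it returns a plain list of strings rather than a spaCy `Doc`, so it shows only the inheritance pattern, not a working spaCy tokenizer.

```python
class DummyTokenizer(object):
    # Same no-op serialization hooks as in the hunk above.
    def to_bytes(self, **exclude):
        return b""

    def from_bytes(self, _bytes_data, **exclude):
        return self

    def to_disk(self, _path, **exclude):
        return None

    def from_disk(self, _path, **exclude):
        return self


class WhitespaceTokenizer(DummyTokenizer):
    """Hypothetical tokenizer; a real one would build and return a Doc."""

    def __call__(self, text):
        # Split on whitespace only; there is no state to serialize.
        return text.split()
```

Because the subclass inherits `to_bytes`, `from_bytes`, `to_disk` and `from_disk`, code that serializes every pipeline component can call those methods on it without special-casing tokenizers that have nothing to save, which appears to be the motivation given in the comment referencing #1557.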
--- a/website/api/_annotation/_dep-labels.jade
+++ b/website/api/_annotation/_dep-labels.jade
@ -150,3 +150,9 @@ p
        +dep-row("re", "repeated element")
        +dep-row("rs", "reported speech")
        +dep-row("sb", "subject")
        +dep-row("sbp", "passivised subject")
        +dep-row("sp", "subject or predicate")
        +dep-row("svp", "separable verb prefix")
        +dep-row("uc", "unit component")
        +dep-row("vo", "vocative")
        +dep-row("ROOT", "root")
--- a/website/api/phrasematcher.jade
+++ b/website/api/phrasematcher.jade
@ -5,7 +5,7 @@ include ../_includes/_mixins
 p
    |  The #[code PhraseMatcher] lets you efficiently match large terminology
    |  lists. While the #[+api("matcher") #[code Matcher]] lets you match
-    |  squences based on lists of token descriptions, the #[code PhraseMatcher]
+    |  sequences based on lists of token descriptions, the #[code PhraseMatcher]
    |  accepts match patterns in the form of #[code Doc] objects.
 +h(2, "init") PhraseMatcher.__init__
--- a/website/api/span.jade
+++ b/website/api/span.jade
@ -489,7 +489,7 @@ p
    +tag property
    +tag-model("parse")
-p Tokens that descend from tokens in the span, but fall outside it.
+p Tokens within the span and tokens which descend from them.
 +aside-code("Example").
    doc = nlp(u'Give it back! He pleaded.')
@ -500,7 +500,7 @@ p Tokens that descend from tokens in the span, but fall outside it.
    +row("foot")
        +cell yields
        +cell #[code Token]
-        +cell A descendant of a token within the span.
+        +cell A token within the span, or a descendant from it.
 +h(2, "has_vector") Span.has_vector
    +tag property
--- a/website/api/token.jade
+++ b/website/api/token.jade
@ -1,3 +1,4 @@
 //- 💫 DOCS > API > TOKEN
 include ../_includes/_mixins
@ -405,7 +406,7 @@ p
    +tag property
    +tag-model("parse")
-p A sequence of all the token's syntactic descendants.
+p A sequence containing the token and all the token's syntactic descendants.
 +aside-code("Example").
    doc = nlp(u'Give it back! He pleaded.')
@ -416,7 +417,7 @@ p A sequence of all the token's syntactic descendants.
    +row("foot")
        +cell yields
        +cell #[code Token]
-        +cell A descendant token such that #[code self.is_ancestor(descendant)].
+        +cell A descendant token such that #[code self.is_ancestor(token) or token == self].
 +h(2, "is_sent_start") Token.is_sent_start
    +tag property
--- a/website/universe/universe.json
+++ b/website/universe/universe.json
@ -1083,20 +1083,31 @@
            "category": ["pipeline"]
        },
        {
-            "id": "spacy2conllu",
+            "id": "spacy-conll",
-            "title": "spaCy2CoNLLU",
+            "title": "spacy_conll",
            "slogan": "Parse text with spaCy and print the output in CoNLL-U format",
-            "description": "Simple script to parse text with spaCy and print the output in CoNLL-U format",
+            "description": "This module allows you to parse a text to CoNLL-U format. You can use it as a command line tool, or embed it in your own scripts.",
            "code_example": [
-                "python parse_as_conllu.py [-h] --input_file INPUT_FILE [--output_file OUTPUT_FILE] --model MODEL"
+                "from spacy_conll import Spacy2ConllParser",
                "spacyconll = Spacy2ConllParser()",
                "",
                "# `parse` returns a generator of the parsed sentences",
                "for parsed_sent in spacyconll.parse(input_str='I like cookies.\nWhat about you?\nI don't like 'em!'):",
                "    do_something_(parsed_sent)",
                "",
                "# `parseprint` prints output to stdout (default) or a file (use `output_file` parameter)",
                "# This method is called when using the command line",
                "spacyconll.parseprint(input_str='I like cookies.')"
            ],
-            "code_language": "bash",
+            "code_language": "python",
-            "author": "Raquel G. Alhama",
+            "author": "Bram Vanroy",
            "author_links": {
-                "github": "rgalhama"
+                "github": "BramVanroy",
                "website": "https://bramvanroy.be"
            },
-            "github": "rgalhama/spaCy2CoNLLU",
+            "github": "BramVanroy/spacy_conll",
-            "category": ["training"]
+            "category": ["standalone"]
        }
    ],
    "projectCats": {
--- a/website/usage/_linguistic-features/_named-entities.jade
+++ b/website/usage/_linguistic-features/_named-entities.jade
@ -159,7 +159,7 @@ p
    |  To provide training examples to the entity recogniser, you'll first need
    |  to create an instance of the #[+api("goldparse") #[code GoldParse]] class.
    |  You can specify your annotations in a stand-off format or as token tags.
-    |  If a character offset in your entity annotations don't fall on a token
+    |  If a character offset in your entity annotations doesn't fall on a token
    |  boundary, the #[code GoldParse] class will treat that annotation as a
    |  missing value.  This allows for more realistic training, because the
    |  entity recogniser is allowed to learn from examples that may feature
--- a/website/usage/_linguistic-features/_rule-based-matching.jade
+++ b/website/usage/_linguistic-features/_rule-based-matching.jade
@ -444,7 +444,7 @@ p
    |  Let's say you're analysing user comments and you want to find out what
    |  people are saying about Facebook. You want to start off by finding
    |  adjectives following "Facebook is" or "Facebook was". This is obviously
-    |  a very rudimentary solution, but it'll be fast, and a great way get an
+    |  a very rudimentary solution, but it'll be fast, and a great way to get an
    |  idea for what's in your data. Your pattern could look like this:
 +code.
--- a/website/usage/_linguistic-features/_sentence-segmentation.jade
+++ b/website/usage/_linguistic-features/_sentence-segmentation.jade
@ -40,7 +40,7 @@ p
    |  constrained to predict parses consistent with the sentence boundaries.
 +infobox("Important note", "⚠️")
-    |  To prevent inconsitent state, you can only set boundaries #[em before] a
+    |  To prevent inconsistent state, you can only set boundaries #[em before] a
    |  document is parsed (and #[code Doc.is_parsed] is #[code False]). To
    |  ensure that your component is added in the right place, you can set
    |  #[code before='parser'] or #[code first=True] when adding it to the
--- a/website/usage/_linguistic-features/_tokenization.jade
+++ b/website/usage/_linguistic-features/_tokenization.jade
@ -21,7 +21,7 @@ p
    |  which needs to be split into two tokens: #[code {ORTH: "do"}] and
    |  #[code {ORTH: "n't", LEMMA: "not"}]. The prefixes, suffixes and infixes
    |  mostly define punctuation rules – for example, when to split off periods
-    |  (at the end of a sentence), and when to leave token containing periods
+    |  (at the end of a sentence), and when to leave tokens containing periods
    |  intact (abbreviations like "U.S.").
 +graphic("/assets/img/language_data.svg")
--- a/website/usage/_processing-pipelines/_multithreading.jade
+++ b/website/usage/_processing-pipelines/_multithreading.jade
@ -43,7 +43,7 @@ p
 p
    |  This example shows how to use multiple cores to process text using
-    |  spaCy and #[+a("https://pythonhosted.org/joblib/") Joblib]. We're
+    |  spaCy and #[+a("https://joblib.readthedocs.io/en/latest/parallel.html") Joblib]. We're
    |  exporting part-of-speech-tagged, true-cased, (very roughly)
    |  sentence-separated text, with each "sentence" on a newline, and
    |  spaces between tokens. Data is loaded from the IMDB movie reviews
--- a/website/usage/_visualizers/_ent.jade
+++ b/website/usage/_visualizers/_ent.jade
@ -74,7 +74,7 @@ p
    displacy.serve(doc, style='ent')
 p
-    |  This feature is espeically handy if you're using displaCy to compare
+    |  This feature is especially handy if you're using displaCy to compare
    |  performance at different stages of a process, e.g. during training. Here
    |  you could use the title for a brief description of the text example and
    |  the number of iterations.
--- a/website/usage/_visualizers/_html.jade
+++ b/website/usage/_visualizers/_html.jade
@ -61,7 +61,7 @@ p
        output_path.open('w', encoding='utf-8').write(svg)
 p
-    |  The above code will generate the dependency visualizations as to
+    |  The above code will generate the dependency visualizations as
    |  two files, #[code This-is-an-example.svg] and #[code This-is-another-one.svg].
--- a/website/usage/visualizers.jade
+++ b/website/usage/visualizers.jade
@ -24,7 +24,7 @@ include ../_includes/_mixins
        |  standards.
    p
-        |  The quickest way visualize  #[code Doc] is to use
+        |  The quickest way to visualize  #[code Doc] is to use
        |  #[+api("displacy#serve") #[code displacy.serve]]. This will spin up a
        |  simple web server and let you view the result straight from your browser.
        |  displaCy can either take a single #[code Doc] or a list of #[code Doc]