Merge branch 'master' into develop

Ines Montani 2019-02-07 20:54:07 +01:00
commit 5d0b60999d
77 changed files with 293374 additions and 292084 deletions

.github/contributors/DeNeutoy.md vendored Normal file

@@ -0,0 +1,106 @@
# spaCy contributor agreement
This spaCy Contributor Agreement (**"SCA"**) is based on the
[Oracle Contributor Agreement](http://www.oracle.com/technetwork/oca-405177.pdf).
The SCA applies to any contribution that you make to any product or project
managed by us (the **"project"**), and sets out the intellectual property rights
you grant to us in the contributed materials. The term **"us"** shall mean
[ExplosionAI UG (haftungsbeschränkt)](https://explosion.ai/legal). The term
**"you"** shall mean the person or entity identified below.
If you agree to be bound by these terms, fill in the information requested
below and include the filled-in version with your first pull request, under the
folder [`.github/contributors/`](/.github/contributors/). The name of the file
should be your GitHub username, with the extension `.md`. For example, the user
example_user would create the file `.github/contributors/example_user.md`.
Read this agreement carefully before signing. These terms and conditions
constitute a binding legal agreement.
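The file-naming rule above is mechanical enough to express as a short sketch (the helper name `contributor_file` is our own, not part of the project):

```python
def contributor_file(github_username: str) -> str:
    """Build the agreement file path from a GitHub username,
    following the naming rule described above."""
    return f".github/contributors/{github_username}.md"

# The example from the text: user "example_user"
print(contributor_file("example_user"))  # .github/contributors/example_user.md
```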
## Contributor Agreement
1. The term "contribution" or "contributed materials" means any source code,
object code, patch, tool, sample, graphic, specification, manual,
documentation, or any other material posted or submitted by you to the project.
2. With respect to any worldwide copyrights, or copyright applications and
registrations, in your contribution:
* you hereby assign to us joint ownership, and to the extent that such
assignment is or becomes invalid, ineffective or unenforceable, you hereby
grant to us a perpetual, irrevocable, non-exclusive, worldwide, no-charge,
royalty-free, unrestricted license to exercise all rights under those
copyrights. This includes, at our option, the right to sublicense these same
rights to third parties through multiple levels of sublicensees or other
licensing arrangements;
* you agree that each of us can do all things in relation to your
contribution as if each of us were the sole owners, and if one of us makes
a derivative work of your contribution, the one who makes the derivative
work (or has it made) will be the sole owner of that derivative work;
* you agree that you will not assert any moral rights in your contribution
against us, our licensees or transferees;
* you agree that we may register a copyright in your contribution and
exercise all ownership rights associated with it; and
* you agree that neither of us has any duty to consult with, obtain the
consent of, pay or render an accounting to the other for any use or
distribution of your contribution.
3. With respect to any patents you own, or that you can license without payment
to any third party, you hereby grant to us a perpetual, irrevocable,
non-exclusive, worldwide, no-charge, royalty-free license to:
* make, have made, use, sell, offer to sell, import, and otherwise transfer
your contribution in whole or in part, alone or in combination with or
included in any product, work or materials arising out of the project to
which your contribution was submitted, and
* at our option, to sublicense these same rights to third parties through
multiple levels of sublicensees or other licensing arrangements.
4. Except as set out above, you keep all right, title, and interest in your
contribution. The rights that you grant to us under these terms are effective
on the date you first submitted a contribution to us, even if your submission
took place before the date you sign these terms.
5. You covenant, represent, warrant and agree that:
* Each contribution that you submit is and shall be an original work of
authorship and you can legally grant the rights set out in this SCA;
* to the best of your knowledge, each contribution will not violate any
third party's copyrights, trademarks, patents, or other intellectual
property rights; and
* each contribution shall be in compliance with U.S. export control laws and
other applicable export and import laws. You agree to notify us if you
become aware of any circumstance which would make any of the foregoing
representations inaccurate in any respect. We may publicly disclose your
participation in the project, including the fact that you have signed the SCA.
6. This SCA is governed by the laws of the State of California and applicable
U.S. Federal law. Any choice of law rules will not apply.
7. Please place an “x” on one of the applicable statements below. Please do NOT
mark both statements:
* [x] I am signing on behalf of myself as an individual and no other person
or entity, including my employer, has or will have rights with respect to my
contributions.
* [ ] I am signing on behalf of my employer or a legal entity and I have the
actual authority to contractually bind that entity.
## Contributor Details
| Field                         | Entry                  |
| ----------------------------- | ---------------------- |
| Name                          | Mark Neumann           |
| Company name (if applicable)  | Allen Institute for AI |
| Title or role (if applicable) | Research Engineer      |
| Date                          | 13/01/2019             |
| GitHub username               | @DeNeutoy              |
| Website (optional)            | markneumann.xyz        |

.github/contributors/Loghijiaha.md vendored Normal file

@@ -0,0 +1,106 @@
# spaCy contributor agreement
7. Please place an “x” on one of the applicable statements below. Please do NOT
mark both statements:
* [x] I am signing on behalf of myself as an individual and no other person
or entity, including my employer, has or will have rights with respect to my
contributions.
* [x] I am signing on behalf of my employer or a legal entity and I have the
actual authority to contractually bind that entity.
## Contributor Details
| Field | Entry |
|------------------------------- | -------------------- |
| Name | Loghi Perinpanayagam |
| Company name (if applicable) | |
| Title or role (if applicable) | Student |
| Date | 13 Jan, 2019 |
| GitHub username | loghijiaha |
| Website (optional) | |


@@ -0,0 +1,106 @@
# spaCy contributor agreement
7. Please place an “x” on one of the applicable statements below. Please do NOT
mark both statements:
* [x] I am signing on behalf of myself as an individual and no other person
or entity, including my employer, has or will have rights with respect to my
contributions.
* [ ] I am signing on behalf of my employer or a legal entity and I have the
actual authority to contractually bind that entity.
## Contributor Details
| Field | Entry |
|------------------------------- | -------------------- |
| Name | Jo |
| Company name (if applicable) | |
| Title or role (if applicable) | |
| Date | 2018-01-26 |
| GitHub username | PolyglotOpenstreetmap|
| Website (optional) | |

.github/contributors/adrianeboyd.md vendored Normal file

@@ -0,0 +1,106 @@
# spaCy contributor agreement
7. Please place an “x” on one of the applicable statements below. Please do NOT
mark both statements:
* [x] I am signing on behalf of myself as an individual and no other person
or entity, including my employer, has or will have rights with respect to my
contributions.
* [ ] I am signing on behalf of my employer or a legal entity and I have the
actual authority to contractually bind that entity.
## Contributor Details
| Field | Entry |
|------------------------------- | -------------------- |
| Name | Adriane Boyd |
| Company name (if applicable) | |
| Title or role (if applicable) | |
| Date | 28 January 2019 |
| GitHub username | adrianeboyd |
| Website (optional) | |

.github/contributors/alvations.md vendored Normal file

@@ -0,0 +1,106 @@
# spaCy contributor agreement
7. Please place an “x” on one of the applicable statements below. Please do NOT
mark both statements:
* [ ] I am signing on behalf of myself as an individual and no other person
or entity, including my employer, has or will have rights with respect to my
contributions.
* [ ] I am signing on behalf of my employer or a legal entity and I have the
actual authority to contractually bind that entity.
## Contributor Details
| Field | Entry |
|------------------------------- | -------------------- |
| Name | Liling |
| Company name (if applicable) | |
| Title or role (if applicable) | |
| Date | 04 Jan 2019 |
| GitHub username | alvations |
| Website (optional) | |


@@ -101,6 +101,6 @@ mark both statements:
 | Name | Amandine Périnet |
 | Company name (if applicable) | 365Talents |
 | Title or role (if applicable) | Data Science Researcher |
-| Date | 12/12/2018 |
+| Date | 28/01/2019 |
 | GitHub username | amperinet |
 | Website (optional) | |

.github/contributors/boena.md vendored Normal file

@@ -0,0 +1,106 @@
# spaCy contributor agreement
7. Please place an “x” on one of the applicable statements below. Please do NOT
mark both statements:
* [x] I am signing on behalf of myself as an individual and no other person
or entity, including my employer, has or will have rights with respect to my
contributions.
* [ ] I am signing on behalf of my employer or a legal entity and I have the
actual authority to contractually bind that entity.
## Contributor Details
| Field | Entry |
|------------------------------- | -------------------- |
| Name | Björn Lennartsson |
| Company name (if applicable) | Uptrail AB |
| Title or role (if applicable) | CTO |
| Date | 2019-01-15 |
| GitHub username | boena |
| Website (optional) | www.uptrail.com |

.github/contributors/foufaster.md vendored Normal file

@@ -0,0 +1,106 @@
## Contributor Details
| Field | Entry |
|------------------------------- | -------------------- |
| Name                           | Anès Foufa           |
| Company name (if applicable)   |                      |
| Title or role (if applicable)  | NLP developer        |
| Date                           | 21/01/2019           |
| GitHub username                | foufaster            |
| Website (optional) | |
.github/contributors/ozcankasal.md vendored Normal file
@@ -0,0 +1,106 @@
## Contributor Details
| Field | Entry |
|------------------------------- | -------------------- |
| Name | Ozcan Kasal |
| Company name (if applicable) | |
| Title or role (if applicable) | |
| Date | December 21, 2018 |
| GitHub username | ozcankasal |
| Website (optional) | |
.github/contributors/retnuh.md vendored Normal file
@@ -0,0 +1,106 @@
## Contributor Details
| Field | Entry |
| ----------------------------- | ------------ |
| Name | Hunter Kelly |
| Company name (if applicable) | |
| Title or role (if applicable) | |
| Date | 2019-01-10 |
| GitHub username | retnuh |
| Website (optional) | |
.github/contributors/willprice.md vendored Normal file
@@ -0,0 +1,106 @@
## Contributor Details
| Field | Entry |
|------------------------------- | --------------------- |
| Name | Will Price |
| Company name (if applicable) | N/A |
| Title or role (if applicable) | N/A |
| Date | 26/12/2018 |
| GitHub username | willprice |
| Website (optional) | https://willprice.org |
@@ -1,4 +1,5 @@
 recursive-include include *.h
 include LICENSE
 include README.md
+include pyproject.toml
 include bin/spacy
contributer_agreement.md Normal file
@@ -0,0 +1,106 @@
## Contributor Details
| Field | Entry |
|------------------------------- | -------------------- |
| Name | Laura Baakman |
| Company name (if applicable) | |
| Title or role (if applicable) | |
| Date | February 7, 2019 |
| GitHub username | lauraBaakman |
| Website (optional) | |
@@ -58,7 +58,7 @@ import spacy
     lang=("Language class to initialise", "option", "l", str),
 )
 def main(patterns_loc, text_loc, n=10000, lang="en"):
-    nlp = spacy.blank("en")
+    nlp = spacy.blank(lang)
     nlp.vocab.lex_attr_getters = {}
     phrases = read_gazetteer(nlp.tokenizer, patterns_loc)
     count = 0
@@ -26,6 +26,11 @@ from spacy.util import minibatch, compounding
     n_iter=("Number of training iterations", "option", "n", int),
 )
 def main(model=None, output_dir=None, n_iter=20, n_texts=2000):
+    if output_dir is not None:
+        output_dir = Path(output_dir)
+        if not output_dir.exists():
+            output_dir.mkdir()
     if model is not None:
         nlp = spacy.load(model)  # load existing spaCy model
         print("Loaded model '%s'" % model)
@@ -87,9 +92,6 @@ def main(model=None, output_dir=None, n_iter=20, n_texts=2000):
         print(test_text, doc.cats)
     if output_dir is not None:
-        output_dir = Path(output_dir)
-        if not output_dir.exists():
-            output_dir.mkdir()
         with nlp.use_params(optimizer.averages):
             nlp.to_disk(output_dir)
         print("Saved model to", output_dir)
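Hoisting the `output_dir` handling to the top of `main` means a bad output path fails before training starts, not after all `n_iter` epochs have run. A minimal sketch of the same pattern (the `ensure_output_dir` helper name is ours, for illustration):

```python
import tempfile
from pathlib import Path


def ensure_output_dir(output_dir):
    # Validate and create the output directory up front, so a typo in the
    # path surfaces before hours of training rather than at save time.
    if output_dir is not None:
        output_dir = Path(output_dir)
        if not output_dir.exists():
            output_dir.mkdir()
    return output_dir


out = ensure_output_dir(Path(tempfile.mkdtemp()) / "model")
print(out.exists())  # prints True
```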
@@ -1,6 +1,6 @@
 [
     {
-        "id": "wsj_0200",
+        "id": 42,
         "paragraphs": [
             {
                 "raw": "In an Oct. 19 review of \"The Misanthrope\" at Chicago's Goodman Theatre (\"Revitalized Classics Take the Stage in Windy City,\" Leisure & Arts), the role of Celimene, played by Kim Cattrall, was mistakenly attributed to Christina Haag. Ms. Haag plays Elianti.",
pyproject.toml Normal file
@@ -0,0 +1,10 @@
[build-system]
requires = ["setuptools",
            "wheel>0.32.0,<0.33.0",
            "Cython",
            "cymem>=2.0.2,<2.1.0",
            "preshed>=2.0.1,<2.1.0",
            "murmurhash>=0.28.0,<1.1.0",
            "thinc>=6.12.1,<6.13.0",
]
build-backend = "setuptools.build_meta"
@@ -14,7 +14,7 @@ plac<1.0.0,>=0.9.6
 pathlib==1.0.1; python_version < "3.4"
 # Development dependencies
 cython>=0.25
-pytest>=4.0.0,<5.0.0
+pytest>=4.0.0,<4.1.0
 pytest-timeout>=1.3.0,<2.0.0
 mock>=2.0.0,<3.0.0
 flake8>=3.5.0,<3.6.0
@@ -246,6 +246,7 @@ def setup_package():
             "cuda92": ["cupy-cuda92>=4.0"],
             "cuda100": ["cupy-cuda100>=4.0"],
         },
+        python_requires=">=2.7,!=3.0.*,!=3.1.*,!=3.2.*,!=3.3.*",
         classifiers=[
             "Development Status :: 5 - Production/Stable",
             "Environment :: Console",
@@ -31,9 +31,13 @@ def read_iob(raw_sents):
         tokens = [re.split("[^\w\-]", line.strip())]
         if len(tokens[0]) == 3:
             words, pos, iob = zip(*tokens)
-        else:
+        elif len(tokens[0]) == 2:
             words, iob = zip(*tokens)
             pos = ["-"] * len(words)
+        else:
+            raise ValueError(
+                "The iob/iob2 file is not formatted correctly. Try checking whitespace and delimiters."
+            )
         biluo = iob_to_biluo(iob)
         sentences.append(
             [
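The new `elif`/`else` replaces an opaque tuple-unpacking failure on malformed lines with an explicit error message. The column-count dispatch can be sketched in isolation like this (`split_iob_line` is a hypothetical helper for illustration, not spaCy API):

```python
def split_iob_line(cols):
    # Same dispatch as read_iob above: 3 columns = word/POS/IOB tag,
    # 2 columns = word/IOB (POS is filled with "-"); anything else is
    # treated as a malformed iob/iob2 line.
    if len(cols) == 3:
        word, pos, iob = cols
    elif len(cols) == 2:
        word, iob = cols
        pos = "-"
    else:
        raise ValueError(
            "The iob/iob2 file is not formatted correctly. "
            "Try checking whitespace and delimiters."
        )
    return word, pos, iob


print(split_iob_line(["London", "NNP", "B-GPE"]))  # prints ('London', 'NNP', 'B-GPE')
```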
@@ -208,7 +208,11 @@ def read_freqs(freqs_loc, max_length=100, min_doc_freq=5, min_freq=50):
         doc_freq = int(doc_freq)
         freq = int(freq)
         if doc_freq >= min_doc_freq and freq >= min_freq and len(key) < max_length:
+            try:
+                word = literal_eval(key)
+            except SyntaxError:
+                # Take odd strings literally.
+                word = literal_eval("'%s'" % key)
             smooth_count = counts.smoother(int(freq))
             probs[word] = math.log(smooth_count) - log_total
     oov_prob = math.log(counts.smoother(0)) - log_total
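Frequency keys are stored in `repr()` form, so most parse cleanly with `literal_eval`; the `try`/`except` keeps tokens that are not valid Python literals instead of crashing on them. Roughly, as a stand-alone sketch (ours, not spaCy's code; it also catches `ValueError`, which bare expressions raise, and assumes the key contains no single quotes):

```python
from ast import literal_eval


def parse_freq_key(key):
    # Keys like "'dog'" or "1984" are valid literals; odd tokens such as
    # "n-gram" are not, so wrap them in quotes and take them literally.
    try:
        return literal_eval(key)
    except (SyntaxError, ValueError):
        return literal_eval("'%s'" % key)


print(parse_freq_key("'dog'"))   # prints dog
print(parse_freq_key("1984"))    # prints 1984
print(parse_freq_key("n-gram"))  # prints n-gram
```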
@@ -9,7 +9,6 @@ from ..util import is_in_jupyter
 _html = {}
-IS_JUPYTER = is_in_jupyter()
 RENDER_WRAPPER = None
@@ -18,7 +17,7 @@ def render(
     style="dep",
     page=False,
     minify=False,
-    jupyter=IS_JUPYTER,
+    jupyter=False,
     options={},
     manual=False,
 ):
@@ -51,7 +50,7 @@ def render(
         html = _html["parsed"]
     if RENDER_WRAPPER is not None:
         html = RENDER_WRAPPER(html)
-    if jupyter:  # return HTML rendered by IPython display()
+    if jupyter or is_in_jupyter():  # return HTML rendered by IPython display()
         from IPython.core.display import display, HTML
         return display(HTML(html))
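Calling `is_in_jupyter()` at render time, rather than baking its result into an import-time default, matters because the environment check is otherwise frozen at whatever it was when the module was first imported. A stand-alone sketch of the same lazy-check pattern (with a plain `detect_notebook` callable standing in for spaCy's `is_in_jupyter`):

```python
def render(html, jupyter=False, detect_notebook=lambda: False):
    # The environment check runs on every call (lazy): the caller can still
    # force jupyter=True, and auto-detection reflects the current
    # environment rather than the state at import time.
    if jupyter or detect_notebook():
        return ("ipython-display", html)
    return html


print(render("<svg/>"))                # prints <svg/>
print(render("<svg/>", jupyter=True))  # prints ('ipython-display', '<svg/>')
```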
@@ -1,7 +1,7 @@
 # coding: utf8
 from __future__ import unicode_literals
-import random
+import uuid
 from .templates import TPL_DEP_SVG, TPL_DEP_WORDS, TPL_DEP_ARCS
 from .templates import TPL_ENT, TPL_ENTS, TPL_FIGURE, TPL_TITLE, TPL_PAGE
@@ -41,7 +41,7 @@ class DependencyRenderer(object):
         """
         # Create a random ID prefix to make sure parses don't receive the
         # same ID, even if they're identical
-        id_prefix = random.randint(0, 999)
+        id_prefix = uuid.uuid4().hex
         rendered = [
             self.render_svg("{}-{}".format(id_prefix, i), p["words"], p["arcs"])
             for i, p in enumerate(parsed)
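`random.randint(0, 999)` yields only 1,000 distinct prefixes, so by the birthday bound duplicate SVG element IDs become likely after rendering only a few dozen parses on one page. `uuid.uuid4().hex` draws from a 122-bit random space, making collisions negligible. A quick sketch:

```python
import uuid

# One ID prefix per rendered parse; uuid4 makes duplicates astronomically
# unlikely, so SVG element IDs stay unique page-wide.
prefixes = [uuid.uuid4().hex for _ in range(10000)]
print(len(set(prefixes)))  # prints 10000
```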
@@ -4,20 +4,24 @@ from __future__ import unicode_literals
 from .lookup import LOOKUP
 from ._adjectives import ADJECTIVES
 from ._adjectives_irreg import ADJECTIVES_IRREG
+from ._adp_irreg import ADP_IRREG
 from ._adverbs import ADVERBS
+from ._auxiliary_verbs_irreg import AUXILIARY_VERBS_IRREG
+from ._cconj_irreg import CCONJ_IRREG
+from ._dets_irreg import DETS_IRREG
+from ._lemma_rules import ADJECTIVE_RULES, NOUN_RULES, VERB_RULES
 from ._nouns import NOUNS
 from ._nouns_irreg import NOUNS_IRREG
+from ._pronouns_irreg import PRONOUNS_IRREG
+from ._sconj_irreg import SCONJ_IRREG
 from ._verbs import VERBS
 from ._verbs_irreg import VERBS_IRREG
-from ._dets_irreg import DETS_IRREG
-from ._pronouns_irreg import PRONOUNS_IRREG
-from ._auxiliary_verbs_irreg import AUXILIARY_VERBS_IRREG
-from ._lemma_rules import ADJECTIVE_RULES, NOUN_RULES, VERB_RULES
 LEMMA_INDEX = {'adj': ADJECTIVES, 'adv': ADVERBS, 'noun': NOUNS, 'verb': VERBS}
-LEMMA_EXC = {'adj': ADJECTIVES_IRREG, 'noun': NOUNS_IRREG, 'verb': VERBS_IRREG,
-             'det': DETS_IRREG, 'pron': PRONOUNS_IRREG, 'aux': AUXILIARY_VERBS_IRREG}
+LEMMA_EXC = {'adj': ADJECTIVES_IRREG, 'adp': ADP_IRREG, 'aux': AUXILIARY_VERBS_IRREG,
+             'cconj': CCONJ_IRREG, 'det': DETS_IRREG, 'noun': NOUNS_IRREG, 'verb': VERBS_IRREG,
+             'pron': PRONOUNS_IRREG, 'sconj': SCONJ_IRREG}
 LEMMA_RULES = {'adj': ADJECTIVE_RULES, 'noun': NOUN_RULES, 'verb': VERB_RULES}
File diff suppressed because it is too large
@@ -863,6 +863,7 @@ ADJECTIVES_IRREG = {
     "affixales": ("affixal",),
     "affixe": ("affixe",),
     "affixées": ("affixé",),
+    "afflanqué": ("afflanquer",),
     "affleuré": ("affleurer",),
     "affleurée": ("affleurer",),
     "affleurées": ("affleurer",),
@@ -1063,11 +1064,15 @@ ADJECTIVES_IRREG = {
     "agrobiologique": ("agrobiologique",),
     "agrochimique": ("agrochimique",),
     "agroclimatologique": ("agroclimatologique",),
+    "agro-environnementales": ("agro-environnemental",),
+    "agro-environnementaux": ("agro-environnemental",),
     "agrogéologique": ("agrogéologique",),
     "agrologique": ("agrologique",),
     "agrométéorologique": ("agrométéorologique",),
     "agronomique": ("agronomique",),
+    "agro-pastorales": ("agro-pastoral",),
     "agropastorales": ("agropastoral",),
+    "agro-pastoraux": ("agro-pastoral",),
     "agrotechnique": ("agrotechnique",),
     "aguerri": ("aguerrir",),
     "aguerrie": ("aguerrir",),
@@ -1677,6 +1682,7 @@ ADJECTIVES_IRREG = {
     "amitotique": ("amitotique",),
     "ammoniacales": ("ammoniacal",),
     "ammoniaque": ("ammoniac",),
+    "ammoniaquées": ("ammoniaqué",),
     "ammoniaques": ("ammoniac",),
     "ammonifié": ("ammonifier",),
     "ammonifiée": ("ammonifier",),
@@ -1984,6 +1990,7 @@ ADJECTIVES_IRREG = {
     "anglicisées": ("angliciser",),
     "anglicisés": ("angliciser",),
     "anglo-arabe": ("anglo-arabe",),
+    "anglo-égyptienne": ("anglo-égyptien",),
     "anglo-irlandais": ("anglo-irlandais",),
     "anglomane": ("anglomane",),
     "anglo-normandes": ("anglo-normand",),
@@ -2227,6 +2234,7 @@ ADJECTIVES_IRREG = {
     "antiacridiennes": ("antiacridien",),
     "antiadhésive": ("antiadhésif",),
     "antiadhésives": ("antiadhésif",),
+    "anti-aérienne": ("anti-aérien",),
     "antiaérienne": ("antiaérien",),
     "antiaériennes": ("antiaérien",),
     "antialcoolique": ("antialcoolique",),
@@ -3007,6 +3015,7 @@ ADJECTIVES_IRREG = {
     "arasées": ("araser",),
     "arasés": ("araser",),
     "aratoire": ("aratoire",),
+    "araucaniennes": ("araucanien",),
     "arbitrable": ("arbitrable",),
     "arbitragiste": ("arbitragiste",),
     "arbitraire": ("arbitraire",),
@@ -3061,6 +3070,7 @@ ADJECTIVES_IRREG = {
     "archimédienne": ("archimédien",),
     "archimédiennes": ("archimédien",),
     "archimillionnaire": ("archimillionnaire",),
+    "archinulles": ("archinul",),
     "archipalliales": ("archipallial",),
     "archipélagique": ("archipélagique",),
     "archipleines": ("archiplein",),
@@ -3284,6 +3294,7 @@ ADJECTIVES_IRREG = {
     "arthroscopique": ("arthroscopique",),
     "arthrosique": ("arthrosique",),
     "arthrosynoviales": ("arthrosynovial",),
+    "arthurienne": ("arthurien",),
     "articulaire": ("articulaire",),
     "articulatoire": ("articulatoire",),
     "articulatrice": ("articulateur",),
@@ -4186,6 +4197,7 @@ ADJECTIVES_IRREG = {
     "autrichienne": ("autrichien",),
     "autrichiennes": ("autrichien",),
     "auvergnates": ("auvergnat",),
+    "Auxerroises": ("auxerrois",),
     "auxiliaire": ("auxiliaire",),
     "auxiliatrice": ("auxiliateur",),
     "auxiliatrices": ("auxiliateur",),
@@ -4341,6 +4353,7 @@ ADJECTIVES_IRREG = {
     "baasiste": ("baasiste",),
     "baassiste": ("baassiste",),
     "babelienne": ("babelien",),
+    "babélienne": ("babélien",),
     "babeliennes": ("babelien",),
     "babies": ("babi",),
     "babillardes": ("babillard",),
@@ -4464,6 +4477,7 @@ ADJECTIVES_IRREG = {
     "balafrées": ("balafré",),
     "balais": ("balais",),
     "balaise": ("balaise",),
+    "balaises": ("balaise",),
     "balancées": ("balancé",),
     "balayé": ("balayer",),
     "balayée": ("balayer",),
@@ -4646,12 +4660,14 @@ ADJECTIVES_IRREG = {
     "baroquisés": ("baroquiser",),
     "barotraumatique": ("barotraumatique",),
     "barotrope": ("barotrope",),
+    "barreaudées": ("barreaudé",),
     "barrées": ("barré",),
     "barri": ("barrir",),
     "barricadé": ("barricader",),
     "barricadée": ("barricader",),
     "barricadées": ("barricader",),
     "barricadés": ("barricader",),
+    "barricadières": ("barricadier",),
     "barrie": ("barrir",),
     "barries": ("barrir",),
     "barris": ("barrir",),
@@ -4730,6 +4746,7 @@ ADJECTIVES_IRREG = {
     "bathyales": ("bathyal",),
     "bathymétrique": ("bathymétrique",),
     "bathypélagique": ("bathypélagique",),
"bâtière": ("bâtier",),
"bâties": ("bâti",),
"batifolantes": ("batifolant",),
"bâtissable": ("bâtissable",),
@ -4781,7 +4798,6 @@ ADJECTIVES_IRREG = {
"beauvaisinnes": ("beauvaisin",),
"beauvoirienne": ("beauvoirien",),
"beauvoiriennes": ("beauvoirien",),
"beaux": ("bel",),
"bébête": ("bébête",),
"bécarre": ("bécarre",),
"bêché": ("bêcher",),
@ -4947,6 +4963,7 @@ ADJECTIVES_IRREG = {
"berrichonnes": ("berrichon",),
"berruyère": ("berruyer",),
"berruyères": ("berruyer",),
"bérullienne": ("bérullien",),
"besogné": ("besogner",),
"besognée": ("besogner",),
"besognées": ("besogner",),
@ -5625,6 +5642,7 @@ ADJECTIVES_IRREG = {
"bostonnés": ("bostonner",),
"botanique": ("botanique",),
"botes": ("bot",),
"botswanaises": ("botswanais",),
"botté": ("botter",),
"bottée": ("botter",),
"bottées": ("botter",),
@ -5680,6 +5698,8 @@ ADJECTIVES_IRREG = {
"bouffies": ("bouffi",),
"bouffonne": ("bouffon",),
"bouffonnes": ("bouffon",),
"bouffonneuse": ("bouffonneux",),
"bouffonneuses": ("bouffonneux",),
"bougé": ("bouger",),
"bougée": ("bouger",),
"bougées": ("bouger",),
@ -5803,6 +5823,7 @@ ADJECTIVES_IRREG = {
"bouturés": ("bouturer",),
"bouvière": ("bouvier",),
"bouvières": ("bouvier",),
"bouvilloises": ("bouvillois",),
"bovines": ("bovin",),
"bowalisé": ("bowaliser",),
"bowalisée": ("bowaliser",),
@ -5984,6 +6005,7 @@ ADJECTIVES_IRREG = {
"brillantinées": ("brillantiner",),
"brillantinés": ("brillantiner",),
"brillantissime": ("brillantissime",),
"brilliantes": ("brilliant",),
"brimbalé": ("brimbaler",),
"brimbalée": ("brimbaler",),
"brimbalées": ("brimbaler",),
@ -6071,6 +6093,7 @@ ADJECTIVES_IRREG = {
"brouettée": ("brouetter",),
"brouettées": ("brouetter",),
"brouettés": ("brouetter",),
"brouillardeuse": ("brouillardeux",),
"brouillées": ("brouillé",),
"brouilleuse": ("brouilleur",),
"brouilleuses": ("brouilleur",),
@ -6087,6 +6110,8 @@ ADJECTIVES_IRREG = {
"broutée": ("brouter",),
"broutées": ("brouter",),
"broutés": ("brouter",),
"brouteuse": ("brouteur",),
"brouteuses": ("brouteur",),
"brownienne": ("brownien",),
"browniennes": ("brownien",),
"broyé": ("broyer",),
@ -6123,6 +6148,7 @@ ADJECTIVES_IRREG = {
"brusquée": ("brusquer",),
"brusquées": ("brusquer",),
"brusqués": ("brusquer",),
"brusquette": ("brusquet",),
"brutales": ("brutal",),
"brutalisé": ("brutaliser",),
"brutalisée": ("brutaliser",),
@ -6143,6 +6169,7 @@ ADJECTIVES_IRREG = {
"buccinatrices": ("buccinateur",),
"bucco-dentaire": ("bucco-dentaire",),
"bucco-génitales": ("bucco-génital",),
"buccogénitales": ("buccogénital",),
"bucco-pharyngées": ("bucco-pharyngé",),
"bûché": ("bûcher",),
"bûchée": ("bûcher",),
@ -7339,6 +7366,7 @@ ADJECTIVES_IRREG = {
"centripète": ("centripète",),
"centriste": ("centriste",),
"centroacinaire": ("centroacinaire",),
"centroaméricaines": ("centroaméricain",),
"centrolobulaire": ("centrolobulaire",),
"centromédullaire": ("centromédullaire",),
"centronucléaire": ("centronucléaire",),
@ -8049,6 +8077,7 @@ ADJECTIVES_IRREG = {
"chosifiées": ("chosifier",),
"chosifiés": ("chosifier",),
"chosiste": ("chosiste",),
"choucardes": ("choucard",),
"chouchoute": ("chouchou",),
"chouchouté": ("chouchouter",),
"chouchoutée": ("chouchouter",),
@ -8060,6 +8089,8 @@ ADJECTIVES_IRREG = {
"choucroutées": ("choucrouter",),
"choucroutés": ("choucrouter",),
"chouette": ("chouette",),
"chouraveuse": ("chouraveur",),
"chouraveuses": ("chouraveur",),
"chouré": ("chourer",),
"chourée": ("chourer",),
"chourées": ("chourer",),
@ -8140,6 +8171,7 @@ ADJECTIVES_IRREG = {
"chrysanthémique": ("chrysanthémique",),
"chryséléphantines": ("chryséléphantin",),
"chrysophanique": ("chrysophanique",),
"chtarbées": ("chtarbé",),
"chthonienne": ("chthonien",),
"chthoniennes": ("chthonien",),
"ch'ti": ("petit",),
@ -8177,6 +8209,7 @@ ADJECTIVES_IRREG = {
"chymifiées": ("chymifier",),
"chymifiés": ("chymifier",),
"chypriote": ("chypriote",),
"ci-annexées": ("ci-annexé",),
"cibiste": ("cibiste",),
"ciblées": ("ciblé",),
"cicatricielle": ("cicatriciel",),
@ -8425,6 +8458,7 @@ ADJECTIVES_IRREG = {
"classable": ("classable",),
"classe": ("classe",),
"classées": ("classé",),
"classieuse": ("classieux",),
"classificatoire": ("classificatoire",),
"classificatrice": ("classificateur",),
"classificatrices": ("classificateur",),
@ -8682,6 +8716,7 @@ ADJECTIVES_IRREG = {
"coédités": ("coéditer",),
"coéditrice": ("coéditeur",),
"coéditrices": ("coéditeur",),
"co-éducative": ("co-éducatif",),
"coeliaque": ("coeliaque",),
"coelioscopique": ("coelioscopique",),
"coelomique": ("coelomique",),
@ -8754,11 +8789,11 @@ ADJECTIVES_IRREG = {
"coincidés": ("coincider",),
"cois": ("cois",),
"coïtales": ("coïtal",),
"coite": ("coi",),
"coite": ("cois",),
"coïté": ("coïter",),
"coïtée": ("coïter",),
"coïtées": ("coïter",),
"coites": ("coi",),
"coites": ("cois",),
"coïtés": ("coïter",),
"cokéfiable": ("cokéfiable",),
"cokéfié": ("cokéfier",),
@ -9219,6 +9254,7 @@ ADJECTIVES_IRREG = {
"concélébrées": ("concélébrer",),
"concélébrés": ("concélébrer",),
"concentrationnaire": ("concentrationnaire",),
"concentratives": ("concentratif",),
"concentrées": ("concentré",),
"concentrique": ("concentrique",),
"conceptualisé": ("conceptualiser",),
@ -10378,9 +10414,7 @@ ADJECTIVES_IRREG = {
"coulés": ("couler",),
"coulis": ("coulis",),
"coulissantes": ("coulissant",),
"coulisse": ("coulis",),
"coulissées": ("coulissé",),
"coulisses": ("coulis",),
"coumarinique": ("coumarinique",),
"coumarique": ("coumarique",),
"coupable": ("coupable",),
@ -12255,6 +12289,7 @@ ADJECTIVES_IRREG = {
"définitionnelle": ("définitionnel",),
"définitionnelles": ("définitionnel",),
"définitive": ("définitif",),
"definitives": ("definitif",),
"définitives": ("définitif",),
"définitoire": ("définitoire",),
"défiscalisé": ("défiscaliser",),
@ -14036,6 +14071,8 @@ ADJECTIVES_IRREG = {
"désertées": ("déserter",),
"désertes": ("désert",),
"désertés": ("déserter",),
"déserteuse": ("déserteux",),
"déserteuses": ("déserteux",),
"déserticole": ("déserticole",),
"désertifié": ("désertifier",),
"désertifiée": ("désertifier",),
@ -15676,6 +15713,8 @@ ADJECTIVES_IRREG = {
"domitiennes": ("domitien",),
"dommageable": ("dommageable",),
"dommage": ("dommage",),
"domoticienne": ("domoticien",),
"domoticiennes": ("domoticien",),
"domotisé": ("domotiser",),
"domotisée": ("domotiser",),
"domotisées": ("domotiser",),
@ -18204,6 +18243,7 @@ ADJECTIVES_IRREG = {
"entrevus": ("entrevoir",),
"entriste": ("entriste",),
"entropique": ("entropique",),
"entr'ouvertes": ("entr'ouvert",),
"entrouvertes": ("entrouvert",),
"entrustée": ("entruster",),
"entrusté": ("entruster",),
@ -19539,6 +19579,7 @@ ADJECTIVES_IRREG = {
"existentialiste": ("existentialiste",),
"existentielle": ("existentiel",),
"existentielles": ("existentiel",),
"exlusives": ("exlusif",),
"exobiologique": ("exobiologique",),
"exocardiaque": ("exocardiaque",),
"exocarpe": ("exocarpe",),
@ -19860,6 +19901,7 @@ ADJECTIVES_IRREG = {
"extraterritorialisé": ("extraterritorialiser",),
"extraterritorialisés": ("extraterritorialiser",),
"extratropicales": ("extratropical",),
"extra-utérines": ("extra-utérin",),
"extravagantes": ("extravagant",),
"extravaguée": ("extravaguer",),
"extravaguées": ("extravaguer",),
@ -21140,9 +21182,12 @@ ADJECTIVES_IRREG = {
"franc-maçonne": ("franc-maçon",),
"franc-maçonnes": ("franc-maçon",),
"franc-maçonnique": ("franc-maçonnique",),
"franco-algérienne": ("franco-algérien",),
"franco-américaines": ("franco-américain",),
"franco-anglaises": ("franco-anglais",),
"franco-belge": ("franco-belge",),
"franco-britannique": ("franco-britannique",),
"franco-chinoises": ("franco-chinois",),
"franco-françaises": ("franco-français",),
"franco-français": ("franco-français",),
"franco-italienne": ("franco-italien",),
@ -21273,6 +21318,7 @@ ADJECTIVES_IRREG = {
"fripés": ("friper",),
"friponne": ("fripon",),
"friponnes": ("fripon",),
"fripouillardes": ("fripouillard",),
"friquées": ("friqué",),
"frisantes": ("frisant",),
"frisées": ("frisé",),
@ -23262,6 +23308,8 @@ ADJECTIVES_IRREG = {
"heptatubulaire": ("heptatubulaire",),
"heptylique": ("heptylique",),
"heptynecarboxylique": ("heptynecarboxylique",),
"héraclitéenne": ("héraclitéen",),
"héraclitienne": ("héraclitien",),
"héraldique": ("héraldique",),
"herbacées": ("herbacé",),
"herbagée": ("herbager",),
@ -23316,6 +23364,8 @@ ADJECTIVES_IRREG = {
"herniées": ("hernié",),
"hernieuse": ("hernieux",),
"hernieuses": ("hernieux",),
"hernusienne": ("hernusien",),
"hernusiennes": ("hernusien",),
"héroï-comique": ("héroï-comique",),
"héroïnomane": ("héroïnomane",),
"héroïque": ("héroïque",),
@ -23543,6 +23593,7 @@ ADJECTIVES_IRREG = {
"historicisé": ("historiciser",),
"historicisés": ("historiciser",),
"historiciste": ("historiciste",),
"historico-culturelle": ("historico-culturel",),
"historiées": ("historié",),
"historienne": ("historien",),
"historiennes": ("historien",),
@ -24802,6 +24853,7 @@ ADJECTIVES_IRREG = {
"impolies": ("impoli",),
"impolitique": ("impolitique",),
"impolluable": ("impolluable",),
"impolluées": ("impollué",),
"impondérable": ("impondérable",),
"impopulaire": ("impopulaire",),
"importable": ("importable",),
@ -24951,6 +25003,7 @@ ADJECTIVES_IRREG = {
"inapaisées": ("inapaisé",),
"inaperçues": ("inaperçu",),
"inappareillable": ("inappareillable",),
"inapparentes": ("inapparent",),
"inapplicable": ("inapplicable",),
"inappliquées": ("inappliqué",),
"inappréciable": ("inappréciable",),
@ -31426,8 +31479,6 @@ ADJECTIVES_IRREG = {
"multinorme": ("multinorme",),
"multioculaire": ("multioculaire",),
"multipare": ("multipare",),
"multipartite": ("multiparti",),
"multipartites": ("multiparti",),
"multipas": ("multipas",),
"multipasse": ("multipasse",),
"multiphasique": ("multiphasique",),
@ -32787,6 +32838,7 @@ ADJECTIVES_IRREG = {
"observés": ("observer",),
"obsessionnelle": ("obsessionnel",),
"obsessionnelles": ("obsessionnel",),
"obsessive": ("obsessif",),
"obsidionales": ("obsidional",),
"obsolescentes": ("obsolescent",),
"obsolète": ("obsolète",),
@ -35336,8 +35388,8 @@ ADJECTIVES_IRREG = {
"pétersbourgeoises": ("pétersbourgeois",),
"pétersbourgeois": ("pétersbourgeois",),
"pétés": ("péter",),
"péteuse": ("péteux",),
"péteuses": ("péteux",),
"péteuse": ("péteur",),
"péteuses": ("péteur",),
"pétillantes": ("pétillant",),
"pétiniste": ("pétiniste",),
"pétiolaire": ("pétiolaire",),
@ -36263,8 +36315,8 @@ ADJECTIVES_IRREG = {
"plumées": ("plumer",),
"plumé": ("plumer",),
"plumés": ("plumer",),
"plumeuse": ("plumeux",),
"plumeuses": ("plumeux",),
"plumeuse": ("plumeur",),
"plumeuses": ("plumeur",),
"plurales": ("plural",),
"pluralisée": ("pluraliser",),
"pluralisées": ("pluraliser",),
@ -43778,9 +43830,12 @@ ADJECTIVES_IRREG = {
"saoulé": ("saouler",),
"saoûlé": ("saoûler",),
"saoules": ("saoul",),
"saoules": ("saoûl",),
"saoûles": ("saoûl",),
"saoulés": ("saouler",),
"saoûlés": ("saoûler",),
"saoul": ("saoûl",),
"saouls": ("saoûl",),
"sapée": ("saper",),
"sapées": ("saper",),
"sapé": ("saper",),
@ -45235,7 +45290,7 @@ ADJECTIVES_IRREG = {
"sophianique": ("sophianique",),
"sophiologique": ("sophiologique",),
"sophistiquée": ("sophistiquer",),
"sophistiquées": ("sophistiqué",),
"sophistiquées": ("sophistiquer",),
"sophistique": ("sophistique",),
"sophistiqué": ("sophistiquer",),
"sophistiqués": ("sophistiquer",),
@ -45318,7 +45373,10 @@ ADJECTIVES_IRREG = {
"soûlantes": ("soûlant",),
"soûlée": ("soûler",),
"soûlées": ("soûler",),
"soûle": ("saoûl",),
"soûlé": ("soûler",),
"soules": ("saoûl",),
"soûles": ("saoûl",),
"soûles": ("soûl",),
"soûlés": ("soûler",),
"soulevée": ("soulever",),
@ -45329,6 +45387,9 @@ ADJECTIVES_IRREG = {
"soulignées": ("souligner",),
"souligné": ("souligner",),
"soulignés": ("souligner",),
"soul": ("saoûl",),
"soûl": ("saoûl",),
"souls": ("saoûl",),
"soumise": ("soumettre",),
"soumises": ("soumis",),
"soumissionnée": ("soumissionner",),


@ -0,0 +1,24 @@
# coding: utf8
from __future__ import unicode_literals
ADP_IRREG = {
"a": ("à",),
"apr.": ("après",),
"aux": ("à",),
"av.": ("avant",),
"avt": ("avant",),
"cf.": ("cf",),
"conf.": ("cf",),
"confer": ("cf",),
"d'": ("de",),
"des": ("de",),
"du": ("de",),
"jusqu'": ("jusque",),
"pdt": ("pendant",),
"+": ("plus",),
"pr": ("pour",),
"/": ("sur",),
"versus": ("vs",),
"vs.": ("vs",)
}
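
A table like `ADP_IRREG` maps contracted or abbreviated prepositions to a tuple of lemmas. A minimal sketch of a lookup against it, with unknown forms falling through unchanged (a small excerpt is re-declared here so the snippet runs on its own):

```python
# Small excerpt of the ADP_IRREG table above, re-declared for a runnable sketch.
ADP_IRREG = {
    "aux": ("à",),
    "des": ("de",),
    "du": ("de",),
    "pdt": ("pendant",),
}

def adp_lemma(form):
    # Known forms take the first lemma in the tuple; unknown forms pass through.
    return ADP_IRREG.get(form, (form,))[0]

print(adp_lemma("aux"), adp_lemma("du"), adp_lemma("sous"))  # prints: à de sous
```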

File diff suppressed because it is too large


@ -0,0 +1,17 @@
# coding: utf8
from __future__ import unicode_literals
CCONJ_IRREG = {
"&amp;": ("et",),
"c-à-d": ("c'est-à-dire",),
"c.-à.-d.": ("c'est-à-dire",),
"càd": ("c'est-à-dire",),
"&": ("et",),
"et|ou": ("et-ou",),
"et/ou": ("et-ou",),
"i.e.": ("c'est-à-dire",),
"ie": ("c'est-à-dire",),
"ou/et": ("et-ou",),
"+": ("plus",)
}


@ -4,20 +4,27 @@ from __future__ import unicode_literals
DETS_IRREG = {
"aucune": ("aucun",),
"cents": ("cent",),
"certaine": ("certain",),
"certaines": ("certain",),
"certains": ("certain",),
"ces": ("ce",),
"cet": ("ce",),
"cette": ("ce",),
"cents": ("cent",),
"certaines": ("certains",),
"des": ("un",),
"différentes": ("différents",),
"diverse": ("divers",),
"diverses": ("divers",),
"du": ("de",),
"la": ("le",),
"les": ("le",),
"l'": ("le",),
"laquelle": ("lequel",),
"les": ("le",),
"lesdites": ("ledit",),
"lesdits": ("ledit",),
"leurs": ("leur",),
"lesquelles": ("lequel",),
"lesquels": ("lequel",),
"leurs": ("leur",),
"l'": ("le",),
"mainte": ("maint",),
"maintes": ("maint",),
"maints": ("maint",),
@ -27,23 +34,29 @@ DETS_IRREG = {
"nulle": ("nul",),
"nulles": ("nul",),
"nuls": ("nul",),
"pareille": ("pareil",),
"pareilles": ("pareil",),
"pareils": ("pareil",),
"quelle": ("quel",),
"quelles": ("quel",),
"quels": ("quel",),
"quelqu'": ("quelque",),
"qq": ("quelque",),
"qqes": ("quelque",),
"qqs": ("quelque",),
"quelques": ("quelque",),
"quelqu'": ("quelque",),
"quels": ("quel",),
"sa": ("son",),
"ses": ("son",),
"telle": ("tel",),
"telles": ("tel",),
"tels": ("tel",),
"ta": ("ton",),
"telles": ("tel",),
"telle": ("tel",),
"tels": ("tel",),
"tes": ("ton",),
"tous": ("tout",),
"toute": ("tout",),
"toutes": ("tout",),
"des": ("un",),
"toute": ("tout",),
"une": ("un",),
"vingts": ("vingt",),
"vot'": ("votre",),
"vos": ("votre",),
}


@ -63,36 +63,8 @@ NOUN_RULES = [
["w", "w"],
["y", "y"],
["z", "z"],
["as", "a"],
["aux", "au"],
["cs", "c"],
["chs", "ch"],
["ds", "d"],
["és", "é"],
["es", "e"],
["eux", "eu"],
["fs", "f"],
["gs", "g"],
["hs", "h"],
["is", "i"],
["ïs", "ï"],
["js", "j"],
["ks", "k"],
["ls", "l"],
["ms", "m"],
["ns", "n"],
["oux", "ou"],
["os", "o"],
["ps", "p"],
["qs", "q"],
["rs", "r"],
["ses", "se"],
["se", "se"],
["ts", "t"],
["us", "u"],
["vs", "v"],
["ws", "w"],
["ys", "y"],
["s", ""],
["x", ""],
["nt(e", "nt"],
["nt(e)", "nt"],
["al(e", "ale"],
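
The hunk above drops the per-letter plural rules in favor of the generic `["s", ""]` and `["x", ""]` entries that subsume them. A quick sanity check of that equivalence, with the deleted suffix pairs copied from the hunk as an illustration:

```python
# Each removed per-letter rule, e.g. ["as", "a"], produced exactly what the
# generic ["s", ""] (or ["x", ""]) rule produces: strip the final letter.
removed_s_rules = [["as", "a"], ["cs", "c"], ["ds", "d"], ["és", "é"],
                   ["is", "i"], ["ls", "l"], ["ts", "t"], ["us", "u"]]
removed_x_rules = [["aux", "au"], ["eux", "eu"], ["oux", "ou"]]

for old, new in removed_s_rules:
    assert old.endswith("s") and old[:-1] == new
for old, new in removed_x_rules:
    assert old.endswith("x") and old[:-1] == new

print("generic rules subsume the removed ones")
```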

File diff suppressed because it is too large

File diff suppressed because it is too large


@ -4,37 +4,89 @@ from __future__ import unicode_literals
PRONOUNS_IRREG = {
"aucune": ("aucun",),
"celle-ci": ("celui-ci",),
"celles-ci": ("celui-ci",),
"ceux-ci": ("celui-ci",),
"celle-là": ("celui-là",),
"celles-là": ("celui-là",),
"ceux-là": ("celui-là",),
"autres": ("autre",),
"ça": ("cela",),
"c'": ("ce",),
"celle": ("celui",),
"celle-ci": ("celui-ci",),
"celle-là": ("celui-là",),
"celles": ("celui",),
"ceux": ("celui",),
"celles-ci": ("celui-ci",),
"celles-là": ("celui-là",),
"certaines": ("certains",),
"ceux": ("celui",),
"ceux-ci": ("celui-ci",),
"ceux-là": ("celui-là",),
"chacune": ("chacun",),
"-elle": ("lui",),
"elle": ("lui",),
"elle-même": ("lui-même",),
"-elles": ("lui",),
"elles": ("lui",),
"elles-mêmes": ("lui-même",),
"eux": ("lui",),
"eux-mêmes": ("lui-même",),
"icelle": ("icelui",),
"icelles": ("icelui",),
"iceux": ("icelui",),
"-il": ("il",),
"-ils": ("il",),
"ils": ("il",),
"-je": ("je",),
"j'": ("je",),
"la": ("le",),
"les": ("le",),
"laquelle": ("lequel",),
"l'autre": ("l'autre",),
"les": ("le",),
"lesquelles": ("lequel",),
"lesquels": ("lequel",),
"elle-même": ("lui-même",),
"elles-mêmes": ("lui-même",),
"eux-mêmes": ("lui-même",),
"-leur": ("leur",),
"l'on": ("on",),
"-lui": ("lui",),
"l'une": ("l'un",),
"mêmes": ("même",),
"-m'": ("me",),
"m'": ("me",),
"-moi": ("moi",),
"nous-mêmes": ("nous-même",),
"-nous": ("nous",),
"-on": ("on",),
"qqchose": ("quelque chose",),
"qqch": ("quelque chose",),
"qqc": ("quelque chose",),
"qqn": ("quelqu'un",),
"quelle": ("quel",),
"quelles": ("quel",),
"quels": ("quel",),
"quelques-unes": ("quelqu'un",),
"quelques-uns": ("quelqu'un",),
"quelques-unes": ("quelques-uns",),
"quelque-une": ("quelqu'un",),
"quelqu'une": ("quelqu'un",),
"quels": ("quel",),
"qu": ("que",),
"telle": ("tel",),
"s'": ("se",),
"-t-elle": ("elle",),
"-t-elles": ("elle",),
"telles": ("tel",),
"telle": ("tel",),
"tels": ("tel",),
"toutes": ("tous",),
"-t-en": ("en",),
"-t-il": ("il",),
"-t-ils": ("il",),
"-toi": ("toi",),
"-t-on": ("on",),
"tous": ("tout",),
"toutes": ("tout",),
"toute": ("tout",),
"-t'": ("te",),
"t'": ("te",),
"-tu": ("tu",),
"-t-y": ("y",),
"unes": ("un",),
"une": ("un",),
"uns": ("un",),
"vous-mêmes": ("vous-même",),
"vous-même": ("vous-même",),
"-vous": ("vous",),
"-vs": ("vous",),
"vs": ("vous",),
"-y": ("y",),
}


@ -0,0 +1,19 @@
# coding: utf8
from __future__ import unicode_literals
SCONJ_IRREG = {
"lorsqu'": ("lorsque",),
"pac'que": ("parce que",),
"pac'qu'": ("parce que",),
"parc'que": ("parce que",),
"parc'qu'": ("parce que",),
"paske": ("parce que",),
"pask'": ("parce que",),
"pcq": ("parce que",),
"+": ("plus",),
"puisqu'": ("puisque",),
"qd": ("quand",),
"quoiqu'": ("quoique",),
"qu'": ("que",)
}


@ -6,63 +6,64 @@ VERBS = set(
"""
abaisser abandonner abdiquer abecquer abéliser aberrer abhorrer abîmer abjurer
ablater abluer ablutionner abominer abonder abonner aborder aborner aboucher
abouler abouter abraquer abraser abreuver abricoter abriter absenter absinther
absolutiser absorber abuser académifier académiser acagnarder accabler
accagner accaparer accastiller accentuer accepter accessoiriser accidenter
acclamer acclimater accointer accolader accoler accommoder accompagner
accorder accorer accoster accoter accoucher accouder accouer accoupler
accoutrer accoutumer accouver accrassiner accréditer accrocher acculer
acculturer accumuler accuser acenser acétaliser acétyler achalander acharner
acheminer achopper achromatiser aciduler aciériser acliquer acoquiner acquêter
acquitter acter actiniser actionner activer actoriser actualiser acupuncturer
acyler adapter additionner adenter adieuser adirer adjectiver adjectiviser
adjurer adjuver administrer admirer admonester adoniser adonner adopter adorer
adorner adosser adouber adresser adsorber aduler adverbialiser aéroporter
aérosoliser aérosonder aérotransporter affabuler affacturer affairer affaisser
affaiter affaler affamer affecter affectionner affermer afficher affider
affiler affiner affirmer affistoler affixer affleurer afflouer affluer affoler
afforester affouiller affourcher affriander affricher affrioler affriquer
affriter affronter affruiter affubler affurer affûter afghaniser afistoler
africaniser agatiser agenouiller agglutiner aggraver agioter agiter agoniser
agourmander agrafer agrainer agrémenter agresser agriffer agripper
agroalimentariser agrouper aguetter aguicher ahaner aheurter aicher aider
aigretter aiguer aiguiller aiguillonner aiguiser ailer ailler ailloliser
aimanter aimer airer ajointer ajourer ajourner ajouter ajuster ajuter
alambiquer alarmer albaniser albitiser alcaliniser alcaliser alcooliser
alcoolyser alcoyler aldoliser alerter aleviner algébriser algérianiser
algorithmiser aligner alimenter alinéater alinéatiser aliter alkyler allaiter
allectomiser allégoriser allitiser allivrer allocutionner alloter allouer
alluder allumer allusionner alluvionner allyler aloter alpaguer alphabétiser
alterner aluminer aluminiser aluner alvéoler alvéoliser amabiliser amadouer
amalgamer amariner amarrer amateloter ambitionner ambler ambrer ambuler
améliorer amender amenuiser américaniser ameulonner ameuter amhariser amiauler
amicoter amidonner amignarder amignoter amignotter aminer ammoniaquer
ammoniser ammoxyder amocher amouiller amouracher amourer amphotériser ampouler
amputer amunitionner amurer amuser anagrammatiser anagrammer analyser
anamorphoser anaphylactiser anarchiser anastomoser anathématiser anatomiser
ancher anchoiter ancrer anecdoter anecdotiser angéliser anglaiser angler
angliciser angoisser anguler animaliser animer aniser ankyloser annexer
annihiler annoter annualiser annuler anodiser ânonner anser antagoniser
antéposer antérioriser anthropomorphiser anticiper anticoaguler antidater
antiparasiter antiquer antiseptiser anuiter aoûter apaiser apériter apetisser
apeurer apicaliser apiquer aplaner apologiser aponévrotomiser aponter aposter
apostiller apostoliser apostropher apostumer apothéoser appareiller apparenter
appeauter appertiser appliquer appointer appoltronner apponter apporter
apposer appréhender apprêter apprivoiser approcher approuver approvisionner
approximer apurer aquareller arabiser araméiser aramer araser arbitrer arborer
arboriser arcbouter arc-bouter archaïser architecturer archiver arçonner
ardoiser aréniser arer argenter argentiniser argoter argotiser argumenter
arianiser arimer ariser aristocratiser aristotéliser arithmétiser armaturer
armer arnaquer aromatiser arpenter arquebuser arquer arracher arraisonner
arrenter arrêter arrher arrimer arriser arriver arroser arsouiller
artérialiser articler articuler artificialiser artistiquer aryaniser aryler
ascensionner ascétiser aseptiser asexuer asianiser asiatiser aspecter
asphalter aspirer assabler assaisonner assassiner assembler assener asséner
assermenter asserter assibiler assigner assimiler assister assoiffer assoler
assommer assoner assoter assumer assurer asticoter astiquer athéiser
atlantiser atomiser atourner atropiniser attabler attacher attaquer attarder
attenter attentionner atténuer atterrer attester attifer attirer attiser
attitrer attraper attremper attribuer attrister attrouper aubiner
abouler abouter aboutonner abracadabrer abraquer abraser abreuver abricoter
abriter absenter absinther absolutiser absorber abuser académifier académiser
acagnarder accabler accagner accaparer accastiller accentuer accepter
accessoiriser accidenter acclamer acclimater accointer accolader accoler
accommoder accompagner accorder accorer accoster accoter accoucher accouder
accouer accoupler accoutrer accoutumer accouver accrassiner accréditer
accrocher acculer acculturer accumuler accuser acenser acétaliser acétyler
achalander acharner acheminer achopper achromatiser aciduler aciériser
acliquer acoquiner acquêter acquitter acter actiniser actionner activer
actoriser actualiser acupuncturer acyler adapter additionner adenter adieuser
adirer adjectiver adjectiviser adjurer adjuver administrer admirer admonester
adoniser adonner adopter adorer adorner adosser adouber adresser adsorber
aduler adverbialiser aéroporter aérosoliser aérosonder aérotransporter
affabuler affacturer affairer affaisser affaiter affaler affamer affecter
affectionner affermer afficher affider affiler affiner affirmer affistoler
affixer affleurer afflouer affluer affoler afforester affouiller affourcher
affriander affricher affrioler affriquer affriter affronter affruiter affubler
affurer affûter afghaniser afistoler africaniser agatiser agenouiller
agglutiner aggraver agioter agiter agoniser agourmander agrafer agrainer
agrémenter agresser agricher agriffer agripper agroalimentariser agrouper
aguetter aguicher aguiller ahaner aheurter aicher aider aigretter aiguer
aiguiller aiguillonner aiguiser ailer ailler ailloliser aimanter aimer airer
ajointer ajourer ajourner ajouter ajuster ajuter alambiquer alarmer albaniser
albitiser alcaliniser alcaliser alcooliser alcoolyser alcoyler aldoliser
alerter aleviner algébriser algérianiser algorithmiser aligner alimenter
alinéater alinéatiser aliter alkyler allaiter allectomiser allégoriser
allitiser allivrer allocutionner alloter allouer alluder allumer allusionner
alluvionner allyler aloter alpaguer alphabétiser alterner aluminer aluminiser
aluner alvéoler alvéoliser amabiliser amadouer amalgamer amariner amarrer
amateloter ambitionner ambler ambrer ambuler améliorer amender amenuiser
américaniser ameulonner ameuter amhariser amiauler amicoter amidonner
amignarder amignoter amignotter aminer ammoniaquer ammoniser ammoxyder amocher
amouiller amouracher amourer amphotériser ampouler amputer amunitionner amurer
amuser anagrammatiser anagrammer analyser anamorphoser anaphylactiser
anarchiser anastomoser anathématiser anatomiser ancher anchoiter ancrer
anecdoter anecdotiser angéliser anglaiser angler angliciser angoisser anguler
animaliser animer aniser ankyloser annexer annihiler annoter annualiser
annuler anodiser ânonner anser antagoniser antéposer antérioriser
anthropomorphiser anticiper anticoaguler antidater antiparasiter antiquer
antiseptiser anuiter aoûter apaiser apériter apetisser apeurer apicaliser
apiquer aplaner apologiser aponévrotomiser aponter aposter apostiller
apostoliser apostropher apostumer apothéoser appareiller apparenter appeauter
appertiser appliquer appointer appoltronner apponter apporter apposer
appréhender apprêter apprivoiser approcher approuver approvisionner approximer
apurer aquareller arabiser araméiser aramer araser arbitrer arborer arboriser
arcbouter arc-bouter archaïser architecturer archiver arçonner ardoiser
aréniser arer argenter argentiniser argoter argotiser argumenter arianiser
arimer ariser aristocratiser aristotéliser arithmétiser armaturer armer
arnaquer aromatiser arpenter arquebuser arquer arracher arraisonner arrenter
arrêter arrher arrimer arriser arriver arroser arsouiller artérialiser
articler articuler artificialiser artistiquer aryaniser aryler ascensionner
ascétiser aseptiser asexuer asianiser asiatiser aspecter asphalter aspirer
assabler assaisonner assassiner assembler assener asséner assermenter asserter
assibiler assigner assimiler assister assoiffer assoler assommer assoner
assoter assumer assurer asticoter astiquer athéiser atlantiser atomiser
atourner atropiniser attabler attacher attaquer attarder attenter attentionner
atténuer atterrer attester attifer attirer attiser attitrer attoucher attraper
attremper attribuer attriquer attrister attrouper aubader aubiner
audiovisualiser auditer auditionner augmenter augurer aulofer auloffer aumôner
auner auréoler ausculter authentiquer autoaccuser autoadapter autoadministrer
autoagglutiner autoalimenter autoallumer autoamputer autoanalyser autoancrer
@ -73,10 +74,10 @@ VERBS = set(
autodéterminer autodévelopper autodévorer autodicter autodiscipliner
autodupliquer autoéduquer autoenchâsser autoenseigner autoépurer autoéquiper
autoévaporiser autoévoluer autoféconder autofertiliser autoflageller
autofonder autoformer autofretter autogouverner autogreffer autoguider auto-
immuniser auto-ioniser autolégitimer autolimiter autoliquider autolyser
automatiser automédiquer automitrailler automutiler autonomiser auto-
optimaliser auto-optimiser autoorganiser autoperpétuer autopersuader
autofonder autoformer autofretter autogouverner autogreffer autoguider
auto-immuniser auto-ioniser autolégitimer autolimiter autoliquider autolyser
automatiser automédiquer automitrailler automutiler autonomiser
auto-optimaliser auto-optimiser autoorganiser autoperpétuer autopersuader
autopiloter autopolliniser autoporter autopositionner autoproclamer
autopropulser autoréaliser autorecruter autoréglementer autoréguler
autorelaxer autoréparer autoriser autosélectionner autosevrer autostabiliser
@ -84,7 +85,7 @@ VERBS = set(
autotracter autotransformer autovacciner autoventiler avaler avaliser
aventurer aveugler avillonner aviner avironner aviser avitailler aviver
avoiner avoisiner avorter avouer axéniser axer axiomatiser azimuter azoter
azurer babiller babouiner bâcher bachonner bachoter bâcler badauder
azurer babiller babouiner bâcher bachonner bachoter bâcler badauder bader
badigeonner badiner baffer bafouer bafouiller bâfrer bagarrer bagoter bagouler
baguenauder baguer baguetter bahuter baigner bailler bâiller baîller
bâillonner baîllonner baiser baisoter baisouiller baisser bakéliser balader
@ -135,9 +136,9 @@ VERBS = set(
brouillonner broussailler brousser brouter bruiner bruisser bruiter brûler
brumer brumiser bruncher brusquer brutaliser bruter bûcher bucoliser
budgétiser buer buffériser buffler bugler bugner buiser buissonner bulgariser
buquer bureaucratiser buriner buser busquer buter butiner butonner butter
buvoter byzantiner byzantiniser cabaler cabaliser cabaner câbler cabosser
caboter cabotiner cabrer cabrioler cacaber cacaoter cacarder cacher
buller buquer bureaucratiser buriner buser busquer buter butiner butonner
butter buvoter byzantiner byzantiniser cabaler cabaliser cabaner câbler
cabosser caboter cabotiner cabrer cabrioler cacaber cacaoter cacarder cacher
cachetonner cachotter cadastrer cadavériser cadeauter cadetter cadoter cadrer
cafarder cafeter cafouiller cafter cageoler cagnarder cagner caguer cahoter
caillebotter cailler caillouter cajoler calaminer calamistrer calamiter
@ -185,65 +186,66 @@ VERBS = set(
claveliser claver clavetter clayonner cléricaliser clicher cligner clignoter
climatiser clinquanter clinquer cliper cliquer clisser cliver clochardiser
clocher clocter cloisonner cloîtrer cloner cloper clopiner cloquer clôturer
clouer clouter coaccuser coacerver coacher coadapter coagglutiner coaguler
coaliser coaltarer coaltariser coanimer coarticuler cobelligérer cocaïniser
cocarder cocheniller cocher côcher cochonner coconiser coconner cocooner
cocoter coder codéterminer codiller coéditer coéduquer coexister coexploiter
coexprimer coffiner coffrer cofonder cogiter cogner cogouverner cohabiter
cohériter cohober coiffer coincher coincider coïncider coïter colchiciner
collaber collaborer collationner collecter collectionner collectiviser coller
collisionner colloquer colluvionner colmater colombianiser colombiner
coloniser colorer coloriser colostomiser colporter colpotomiser coltiner
columniser combiner combler commander commanditer commémorer commenter
commercialiser comminer commissionner commotionner commuer communaliser
communautariser communiquer communiser commuter compacifier compacter comparer
compartimenter compenser compiler compisser complanter complémenter
complétiviser complexer complimenter compliquer comploter comporter composer
composter compoter compounder compresser comprimer comptabiliser compter
compulser computer computériser concentrer conceptualiser concerner concerter
concher conciliabuler concocter concomiter concorder concrétionner concrétiser
concubiner condamner condenser condimenter conditionner confabuler
confectionner confédéraliser confesser confessionnaliser configurer confiner
confirmer confisquer confiter confluer conformer conforter confronter
confusionner congestionner conglober conglutiner congoliser congratuler
coniser conjecturer conjointer conjuger conjuguer conjurer connecter conniver
connoter conquêter consacrer conscientiser conseiller conserver consigner
consister consoler consolider consommariser consommer consonantiser consoner
conspirer conspuer constater consteller conster consterner constiper
constituer constitutionnaliser consulter consumer contacter contagionner
containeriser containériser contaminer contemner contempler conteneuriser
contenter conter contester contextualiser continentaliser contingenter
continuer contorsionner contourner contracter contractualiser contracturer
contraposer contraster contre-attaquer contrebouter contrebuter contrecalquer
contrecarrer contre-expertiser contreficher contrefraser contre-indiquer
contremander contremanifester contremarcher contremarquer contreminer
contremurer contrenquêter contreplaquer contrepointer contrer contresigner
contrespionner contretyper contreventer contribuer contrister contrôler
controuver controverser contusionner conventionnaliser conventionner
conventualiser converser convoiter convoler convoquer convulser convulsionner
cooccuper coopératiser coopter coordonner coorganiser coparrainer coparticiper
copermuter copiner copolycondenser copolymériser coprésenter coprésider copser
copter copuler copyrighter coqueliner coquer coqueriquer coquiller corailler
corder cordonner coréaliser coréaniser coréguler coresponsabiliser cornaquer
cornemuser corner coroniser corporiser correctionaliser correctionnaliser
correler corréler corroborer corroder corser corticaliser cosigner cosmétiquer
cosser costumer coter cotillonner cotiser cotonner cotransfecter couaquer
couarder couchailler coucher couchoter couchotter coucouer coucouler couder
coudrer couillonner couiner couler coulisser coupailler coupeller couper
couperoser coupler couponner courailler courbaturer courber courbetter
courcailler couronner courrieler courser courtauder court-circuiter courtiser
cousiner coussiner coûter couturer couver cracher crachiner crachoter
crachouiller crailler cramer craminer cramper cramponner crampser cramser
craner crâner crânoter cranter crapahuter crapaüter crapser crapuler craquer
crasher cratériser craticuler cratoniser cravacher cravater crawler crayonner
crédibiliser créditer crématiser créoliser créosoter crêper crépiner crépiter
crésyler crêter crétiniser creuser criailler cribler criminaliser criquer
crisper crisser cristalliser criticailler critiquer crocher croiser crôler
croquer croskiller crosser crotoniser crotter crouler croupionner crouponner
clotûrer clouer clouter coaccuser coacerver coacher coadapter coagglutiner
coaguler coaliser coaltarer coaltariser coanimer coarticuler cobelligérer
cocaïniser cocarder cocheniller cocher côcher cochonner coconiser coconner
cocooner cocoter coder codéterminer codiller coéditer coéduquer coexister
coexploiter coexprimer coffiner coffrer cofonder cogiter cogner cogouverner
cohabiter cohériter cohober coiffer coincher coincider coïncider coïter
colchiciner collaber collaborer collationner collecter collectionner
collectiviser coller collisionner colloquer colluvionner colmater
colombianiser colombiner coloniser colorer coloriser colostomiser colporter
colpotomiser coltiner columniser combiner combler commander commanditer
commémorer commenter commercialiser comminer commissionner commotionner
commuer communaliser communautariser communiquer communiser commuter
compacifier compacter comparer compartimenter compenser compiler compisser
complanter complémenter complétiviser complexer complimenter compliquer
comploter comporter composer composter compoter compounder compresser
comprimer comptabiliser compter compulser computer computériser concentrer
conceptualiser concerner concerter concher conciliabuler concocter concomiter
concorder concrétionner concrétiser concubiner condamner condenser condimenter
conditionner confabuler confectionner confédéraliser confesser
confessionnaliser configurer confiner confirmer confisquer confiter confluer
conformer conforter confronter confusionner congestionner conglober
conglutiner congoliser congratuler coniser conjecturer conjointer conjuger
conjuguer conjurer connecter conniver connoter conquêter consacrer
conscientiser conseiller conserver consigner consister consoler consolider
consommariser consommer consonantiser consoner conspirer conspuer constater
consteller conster consterner constiper constituer constitutionnaliser
consulter consumer contacter contagionner containeriser containériser
contaminer contemner contempler conteneuriser contenter conter contester
contextualiser continentaliser contingenter continuer contorsionner contourner
contracter contractualiser contracturer contraposer contraster contre-attaquer
contrebouter contrebuter contrecalquer contrecarrer contre-expertiser
contreficher contrefraser contre-indiquer contremander contremanifester
contremarcher contremarquer contreminer contremurer contrenquêter
contreplaquer contrepointer contrer contresigner contrespionner contretyper
contreventer contribuer contrister contrôler controuver controverser
contusionner conventionnaliser conventionner conventualiser converser
convoiter convoler convoquer convulser convulsionner cooccuper coopératiser
coopter coordonner coorganiser coparrainer coparticiper copermuter copiner
copolycondenser copolymériser coprésenter coprésider copser copter copuler
copyrighter coqueliner coquer coqueriquer coquiller corailler corder cordonner
coréaliser coréaniser coréguler coresponsabiliser cornaquer cornemuser corner
coroniser corporiser correctionaliser correctionnaliser correler corréler
corroborer corroder corser corticaliser cosigner cosmétiquer cosser costumer
coter cotillonner cotiser cotonner cotransfecter couaquer couarder couchailler
coucher couchoter couchotter coucouer coucouler couder coudrer couillonner
couiner couler coulisser coupailler coupeller couper couperoser coupler
couponner courailler courbaturer courber courbetter courcailler couronner
courrieler courser courtauder court-circuiter courtiser cousiner coussiner
coûter couturer couver cracher crachiner crachoter crachouiller crailler
cramer craminer cramper cramponner crampser cramser craner crâner crânoter
cranter crapahuter crapaüter crapser crapuler craquer crasher cratériser
craticuler cratoniser cravacher cravater crawler crayonner crédibiliser
créditer crématiser créoliser créosoter crêper crépiner crépiter crésyler
crêter crétiniser creuser criailler cribler criminaliser criquer crisper
crisser cristalliser criticailler critiquer crocher croiser crôler croquer
croskiller crosser crotoniser crotter crouler croupionner crouponner
croustiller croûter croûtonner cryoappliquer cryocautériser cryocoaguler
cryoconcentrer cryodécaper cryoébarber cryofixer cryogéniser cryomarquer
cryosorber crypter cuber cueiller cuider cuisiner cuiter cuivrer culbuter
culer culminer culotter culpabiliser cultiver culturaliser cumuler curariser
cryosorber crypter cuber cueiller cuider cuisiner cuivrer culbuter culer
culminer culotter culpabiliser cultiver culturaliser cumuler curariser
curedenter curer curetter customiser cuter cutiniser cuver cyaniser cyanoser
cyanurer cybernétiser cycler cycliser cycloner cylindrer dactylocoder daguer
daguerréotyper daïer daigner dailler daller damasquiner damer damner
@@ -748,8 +750,8 @@ VERBS = set(
mithridatiser mitonner mitrailler mixer mixter mixtionner mobiliser modaliser
modéliser modérantiser moderniser moduler moellonner mofler moirer moiser
moissonner molarder molariser moléculariser molester moletter mollarder
molletter monarchiser mondaniser monder mondialiser monétariser monétiser
moniliser monologuer monomériser monophtonguer monopoler monopoliser
molletonner molletter monarchiser mondaniser monder mondialiser monétariser
monétiser moniliser monologuer monomériser monophtonguer monopoler monopoliser
monoprogrammer monosiallitiser monotoniser monseigneuriser monter montrer
monumentaliser moquer moquetter morailler moraliser mordailler mordiller
mordillonner mordorer mordoriser morfailler morfaler morfiler morfler morganer
@@ -792,63 +794,64 @@ VERBS = set(
palpiter palucher panacher panader pancarter paner paniquer panneauter panner
pannetonner panoramiquer panser pantiner pantomimer pantoufler paoner paonner
papelarder papillonner papilloter papoter papouiller paquer paraboliser
parachuter parader parafer paraffiner paralléliser paralyser paramétriser
parangonner parapher paraphraser parasiter parcellariser parceller parcelliser
parcheminer parcoriser pardonner parementer parenthétiser parer paresser
parfiler parfumer parisianiser parjurer parkériser parlementer parler parloter
parlotter parquer parrainer participer particulariser partitionner partouzer
pasquiner pasquiniser passefiler passementer passepoiler passeriller
passionnaliser passionner pasteller pasteuriser pasticher pastiller pastoriser
patafioler pateliner patenter paternaliser paterner pathétiser patienter
patiner pâtisser patoiser pâtonner patouiller patrimonialiser patrociner
patronner patrouiller patter pâturer paumer paupériser pauser pavaner paver
pavoiser peaufiner pébriner pécher pêcher pécloter pectiser pédaler pédanter
pédantiser pédiculiser pédicurer pédimenter peigner peiner peinturer
peinturlurer péjorer pelaner pelauder péleriner pèleriner pelletiser
pelleverser pelliculer peloter pelotonner pelucher pelurer pénaliser pencher
pendeloquer pendiller pendouiller penduler pénéplaner penser pensionner
peptiser peptoniser percaliner percher percoler percuter perdurer pérégriner
pérenniser perfectionner perforer performer perfuser péricliter périmer
périodiser périphériser périphraser péritoniser perler permanenter permaner
perméabiliser permuter pérorer pérouaniser peroxyder perpétuer perquisitionner
perreyer perruquer persécuter persifler persiller persister personnaliser
persuader perturber pervibrer pester pétarader pétarder pétiller pétitionner
pétocher pétouiller pétrarquiser pétroliser pétuner peupler pexer
phacoémulsifier phagocyter phalangiser pharyngaliser phéniquer phénoler
phényler philosophailler philosopher phlébotomiser phlegmatiser phlogistiquer
phonétiser phonologiser phosphater phosphorer phosphoriser phosphoryler
photoactiver photocomposer photograver photo-ioniser photoïoniser photomonter
photophosphoryler photopolymériser photosensibiliser phraser piaffer piailler
pianomiser pianoter piauler pickler picocher picoler picorer picoter picouser
picouzer picrater pictonner picturaliser pidginiser piédestaliser pierrer
piétiner piétonnifier piétonniser pieuter pifer piffer piffrer pigeonner
pigmenter pigner pignocher pignoler piler piller pilloter pilonner piloter
pimenter pinailler pinceauter pinçoter pindariser pinter piocher pionner
piotter piper piqueniquer pique-niquer piquer piquetonner piquouser piquouzer
pirater pirouetter piser pisser pissoter pissouiller pistacher pister pistoler
pistonner pitancher pitcher piter pitonner pituiter pivoter placarder
placardiser plafonner plaider plainer plaisanter plamer plancher planer
planétariser planétiser planquer planter plaquer plasmolyser plastiquer
plastronner platiner platiniser platoniser plâtrer plébisciter pleurailler
pleuraliser pleurer pleurnicher pleuroter pleuviner pleuvioter pleuvoter
plisser plissoter plomber ploquer plotiniser plouter ploutrer plucher
plumarder plumer pluraliser plussoyer pluviner pluvioter pocharder pocher
pochetronner pochtronner poculer podzoliser poêler poétiser poignarder poigner
poiler poinçonner pointer pointiller poireauter poirer poiroter poisser
poitriner poivrer poivroter polariser poldériser polémiquer polissonner
politicailler politiquer politiser polker polliciser polliniser polluer
poloniser polychromer polycontaminer polygoner polygoniser polymériser
polyploïdiser polytransfuser polyviser pommader pommer pomper pomponner
ponctionner ponctuer ponter pontiller populariser poquer porer porphyriser
porter porteuser portionner portoricaniser portraicturer portraiturer poser
positionner positiver possibiliser postdater poster postérioriser posticher
postillonner postposer postsonoriser postsynchroniser postuler potabiliser
potentialiser poter poteyer potiner poudrer pouffer pouiller pouliner pouloper
poulotter pouponner pourpenser pourprer poussailler pousser poutser praliner
pratiquer préaccentuer préadapter préallouer préassembler préassimiler
préaviser précariser précautionner prêchailler préchauffer préchauler prêcher
précipiter préciser préciter précompter préconditionner préconfigurer
préconiser préconstituer précoter prédater prédécouper prédésigner prédestiner
parachuter parader parafer paraffiner paraisonner paralléliser paralyser
paramétriser parangonner parapher paraphraser parasiter parcellariser
parceller parcelliser parcheminer parcoriser pardonner parementer
parenthétiser parer paresser parfiler parfumer parisianiser parjurer
parkériser parlementer parler parloter parlotter parquer parrainer participer
particulariser partitionner partouzer pasquiner pasquiniser passefiler
passementer passepoiler passeriller passionnaliser passionner pasteller
pasteuriser pasticher pastiller pastoriser patafioler pateliner patenter
paternaliser paterner pathétiser patienter patiner pâtisser patoiser pâtonner
patouiller patrimonialiser patrociner patronner patrouiller patter pâturer
paumer paupériser pauser pavaner paver pavoiser peaufiner pébriner pécher
pêcher pécloter pectiser pédaler pédanter pédantiser pédiculiser pédicurer
pédimenter peigner peiner peinturer peinturlurer péjorer pelaner pelauder
péleriner pèleriner pelletiser pelleverser pelliculer peloter pelotonner
pelucher pelurer pénaliser pencher pendeloquer pendiller pendouiller penduler
pénéplaner penser pensionner peptiser peptoniser percaliner percher percoler
percuter perdurer pérégriner pérenniser perfectionner perforer performer
perfuser péricliter périmer périodiser périphériser périphraser péritoniser
perler permanenter permaner perméabiliser permuter pérorer pérouaniser
peroxyder perpétuer perquisitionner perreyer perruquer persécuter persifler
persiller persister personnaliser persuader perturber pervibrer pester
pétarader pétarder pétiller pétitionner pétocher pétouiller pétrarquiser
pétroliser pétuner peupler pexer phacoémulsifier phagocyter phalangiser
pharyngaliser phéniquer phénoler phényler philosophailler philosopher
phlébotomiser phlegmatiser phlogistiquer phonétiser phonologiser phosphater
phosphorer phosphoriser phosphoryler photoactiver photocomposer photograver
photo-ioniser photoïoniser photomonter photophosphoryler photopolymériser
photosensibiliser phraser piaffer piailler pianomiser pianoter piauler pickler
picocher picoler picorer picoter picouser picouzer picrater pictonner
picturaliser pidginiser piédestaliser pierrer piétiner piétonnifier
piétonniser pieuter pifer piffer piffrer pigeonner pigmenter pigner pignocher
pignoler piler piller pilloter pilonner piloter pimenter pinailler pinceauter
pinçoter pindariser pinter piocher pionner piotter piper piqueniquer
pique-niquer piquer piquetonner piquouser piquouzer pirater pirouetter piser
pisser pissoter pissouiller pistacher pister pistoler pistonner pitancher
pitcher piter pitonner pituiter pivoter placarder placardiser plafonner
plaider plainer plaisanter plamer plancher planer planétariser planétiser
planquer planter plaquer plasmolyser plastiquer plastronner platiner
platiniser platoniser plâtrer plébisciter pleurailler pleuraliser pleurer
pleurnicher pleuroter pleuviner pleuvioter pleuvoter plisser plissoter plomber
ploquer plotiniser plouter ploutrer plucher plumarder plumer pluraliser
plussoyer pluviner pluvioter pocharder pocher pochetronner pochtronner poculer
podzoliser poêler poétiser poignarder poigner poiler poinçonner pointer
pointiller poireauter poirer poiroter poisser poitriner poivrer poivroter
polariser poldériser polémiquer polissonner politicailler politiquer politiser
polker polliciser polliniser polluer poloniser polychromer polycontaminer
polygoner polygoniser polymériser polyploïdiser polytransfuser polyviser
pommader pommer pomper pomponner ponctionner ponctuer ponter pontiller
populariser poquer porer porphyriser porter porteuser portionner
portoricaniser portraicturer portraiturer poser positionner positiver
possibiliser postdater poster postérioriser posticher postillonner postposer
postsonoriser postsynchroniser postuler potabiliser potentialiser poter
poteyer potiner poudrer pouffer pouiller pouliner pouloper poulotter pouponner
pourpenser pourprer poussailler pousser poutser praliner pratiquer
préaccentuer préadapter préallouer préassembler préassimiler préaviser
précariser précautionner prêchailler préchauffer préchauler prêcher précipiter
préciser préciter précompter préconditionner préconfigurer préconiser
préconstituer précoter prédater prédécouper prédésigner prédestiner
prédéterminer prédiffuser prédilectionner prédiquer prédisposer prédominer
préemballer préempter préencoller préenregistrer préenrober préexaminer
préexister préfabriquer préfaner préfigurer préfixer préformater préformer
@@ -879,8 +882,8 @@ VERBS = set(
raccommoder raccompagner raccorder raccoutrer raccoutumer raccrocher racémiser
rachalander racher raciner racketter racler râcler racoler raconter racoquiner
radariser rader radicaliser radiner radioactiver radiobaliser radiocommander
radioconserver radiodétecter radiodiffuser radioexposer radioguider radio-
immuniser radiolocaliser radiopasteuriser radiosonder radiostériliser
radioconserver radiodétecter radiodiffuser radioexposer radioguider
radio-immuniser radiolocaliser radiopasteuriser radiosonder radiostériliser
radiotéléphoner radiotéléviser radoter radouber rafaler raffermer raffiler
raffiner raffluer raffoler raffûter rafistoler rafler ragoter ragoûter
ragrafer raguer raguser raiguiser railler rainer rainurer raisonner rajouter
@@ -1123,19 +1126,21 @@ VERBS = set(
sommer somnambuler somniloquer somnoler sonder sonnailler sonner sonoriser
sophistiquer sorguer soubresauter souder souffler souffroter soufrer souhaiter
souiller souillonner soûler souligner soûlotter soumissionner soupailler
soupçonner souper soupirer souquer sourciller sourdiner sous-capitaliser sous-
catégoriser sousestimer sous-estimer sous-industrialiser sous-médicaliser
sousperformer sous-qualifier soussigner sous-titrer sous-utiliser soutacher
souter soutirer soviétiser spammer spasmer spatialiser spatuler spécialiser
spéculer sphéroïdiser spilitiser spiraler spiraliser spirantiser spiritualiser
spitter splénectomiser spléniser sponsoriser sporter sporuler sprinter
squatériser squatter squatteriser squattériser squeezer stabiliser stabuler
staffer stagner staliniser standardiser standoliser stanioler stariser
stationner statistiquer statuer stelliter stenciler stendhaliser sténoser
sténotyper stepper stéréotyper stériliser stigmatiser stimuler stipuler
stocker stoloniser stopper stranguler stratégiser stresser strider striduler
striper stripper striquer stronker strouiller structurer strychniser stuquer
styler styliser subalterniser subdiviser subdivisionner subériser subjectiver
soupçonner souper soupirer souquer sourciller sourdiner sous-alimenter
sous-capitaliser sous-catégoriser sous-équiper sousestimer sous-estimer
sous-évaluer sous-exploiter sous-exposer sous-industrialiser sous-louer
sous-médicaliser sousperformer sous-qualifier soussigner sous-titrer
sous-traiter sous-utiliser sous-virer soutacher souter soutirer soviétiser
spammer spasmer spatialiser spatuler spécialiser spéculer sphéroïdiser
spilitiser spiraler spiraliser spirantiser spiritualiser spitter
splénectomiser spléniser sponsoriser sporter sporuler sprinter squatériser
squatter squatteriser squattériser squeezer stabiliser stabuler staffer
stagner staliniser standardiser standoliser stanioler stariser stationner
statistiquer statuer stelliter stenciler stendhaliser sténoser sténotyper
stepper stéréotyper stériliser stigmatiser stimuler stipuler stocker
stoloniser stopper stranguler stratégiser stresser strider striduler striper
stripper striquer stronker strouiller structurer strychniser stuquer styler
styliser subalterniser subdiviser subdivisionner subériser subjectiver
subjectiviser subjuguer sublimer sublimiser subluxer subminiaturiser subodorer
subordonner suborner subsister substanter substantialiser substantiver
substituer subsumer subtiliser suburbaniser subventionner succomber suçoter

File diff suppressed because it is too large


@@ -1,7 +1,7 @@
# coding: utf8
from __future__ import unicode_literals
from ....symbols import POS, NOUN, VERB, ADJ, ADV, PRON, DET, AUX, PUNCT
from ....symbols import POS, NOUN, VERB, ADJ, ADV, PRON, DET, AUX, PUNCT, ADP, SCONJ, CCONJ
from ....symbols import VerbForm_inf, VerbForm_none, Number_sing, Degree_pos
from .lookup import LOOKUP
@@ -9,7 +9,7 @@ from .lookup import LOOKUP
French language lemmatizer applies the default rule based lemmatization
procedure with some modifications for better French language support.
The parts of speech 'ADV', 'PRON', 'DET' and 'AUX' are added to use the
The parts of speech 'ADV', 'PRON', 'DET', 'ADP' and 'AUX' are added to use the
rule-based lemmatization. As a last resort, the lemmatizer checks in
the lookup table.
'''
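The rules-then-lookup fallback described in the docstring can be sketched roughly like this. This is a simplified illustration of the pattern only, not spaCy's actual implementation; `RULES` and `LOOKUP` here are hypothetical stand-in tables:

```python
# Sketch of rule-based lemmatization with a lookup table as the last resort.
# RULES and LOOKUP are illustrative stand-ins, not spaCy's real data.
RULES = {"verb": [("issons", "ir"), ("ons", "er")]}
LOOKUP = {"yeux": "oeil"}

def lemmatize(string, univ_pos):
    # Try each suffix rule for the given part of speech first.
    for old, new in RULES.get(univ_pos, []):
        if string.endswith(old):
            return string[: -len(old)] + new
    # As a last resort, check the lookup table; out-of-vocabulary strings
    # lemmatize to themselves.
    return LOOKUP.get(string, string)

print(lemmatize("finissons", "verb"))  # finir
print(lemmatize("yeux", "noun"))       # oeil
```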
@@ -34,16 +34,22 @@ class FrenchLemmatizer(object):
univ_pos = 'verb'
elif univ_pos in (ADJ, 'ADJ', 'adj'):
univ_pos = 'adj'
elif univ_pos in (ADP, 'ADP', 'adp'):
univ_pos = 'adp'
elif univ_pos in (ADV, 'ADV', 'adv'):
univ_pos = 'adv'
elif univ_pos in (PRON, 'PRON', 'pron'):
univ_pos = 'pron'
elif univ_pos in (DET, 'DET', 'det'):
univ_pos = 'det'
elif univ_pos in (AUX, 'AUX', 'aux'):
univ_pos = 'aux'
elif univ_pos in (CCONJ, 'CCONJ', 'cconj'):
univ_pos = 'cconj'
elif univ_pos in (DET, 'DET', 'det'):
univ_pos = 'det'
elif univ_pos in (PRON, 'PRON', 'pron'):
univ_pos = 'pron'
elif univ_pos in (PUNCT, 'PUNCT', 'punct'):
univ_pos = 'punct'
elif univ_pos in (SCONJ, 'SCONJ', 'sconj'):
univ_pos = 'sconj'
else:
return [self.lookup(string)]
# See Issue #435 for an example of where this logic is required.
@@ -100,7 +106,7 @@ class FrenchLemmatizer(object):
def lookup(self, string):
if string in self.lookup_table:
return self.lookup_table[string]
return self.lookup_table[string][0]
return string
@@ -125,7 +131,7 @@ def lemmatize(string, index, exceptions, rules):
if not forms:
forms.extend(oov_forms)
if not forms and string in LOOKUP.keys():
forms.append(LOOKUP[string])
forms.append(LOOKUP[string][0])
if not forms:
forms.append(string)
return list(set(forms))
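The switch from `LOOKUP[string]` to `LOOKUP[string][0]` suggests the lookup-table values are now sequences of candidate lemmas rather than single strings, with callers taking the first candidate. The access pattern can be illustrated with toy data (not the real table):

```python
# Toy lookup table whose values are tuples of candidate lemmas; callers take
# the first entry rather than returning the whole sequence.
LOOKUP = {"abaissaient": ("abaisser",), "yeux": ("oeil", "yeux")}

def first_lemma(string):
    if string in LOOKUP:
        return LOOKUP[string][0]
    # Out-of-vocabulary strings lemmatize to themselves.
    return string

print(first_lemma("yeux"))  # oeil
```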

File diff suppressed because it is too large


@@ -1,16 +1,15 @@
# encoding: utf8
from __future__ import unicode_literals, print_function
from ...language import Language
from ...attrs import LANG
from ...tokens import Doc, Token
from ...tokenizer import Tokenizer
from ... import util
from .tag_map import TAG_MAP
import re
from collections import namedtuple
from .tag_map import TAG_MAP
from ...attrs import LANG
from ...language import Language
from ...tokens import Doc, Token
from ...util import DummyTokenizer
ShortUnitWord = namedtuple("ShortUnitWord", ["surface", "lemma", "pos"])
@@ -46,12 +45,12 @@ def resolve_pos(token):
# PoS mappings.
if token.pos == "連体詞,*,*,*":
if re.match("^[こそあど此其彼]の", token.surface):
if re.match(r"[こそあど此其彼]の", token.surface):
return token.pos + ",DET"
if re.match("^[こそあど此其彼]", token.surface):
if re.match(r"[こそあど此其彼]", token.surface):
return token.pos + ",PRON"
else:
return token.pos + ",ADJ"
return token.pos
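One note on the regex change above: `re.match` only ever matches at the start of the string, so dropping the leading `^` in favor of a raw string does not change behavior. A quick check:

```python
import re

# re.match anchors at the beginning of the string, so a leading '^' is
# redundant; the raw-string form behaves identically.
assert bool(re.match(r"[こそあど此其彼]の", "この本")) == bool(re.match("^[こそあど此其彼]の", "この本"))
# No match when the pattern does not occur at the very start.
assert re.match(r"[こそあど此其彼]", "本この") is None
```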
@@ -68,7 +67,8 @@ def detailed_tokens(tokenizer, text):
pos = ",".join(parts[0:4])
if len(parts) > 7:
# this information is only available for words in the tokenizer dictionary
# this information is only available for words in the tokenizer
# dictionary
base = parts[7]
words.append(ShortUnitWord(surface, base, pos))
@@ -76,38 +76,27 @@ def detailed_tokens(tokenizer, text):
return words
class JapaneseTokenizer(object):
class JapaneseTokenizer(DummyTokenizer):
def __init__(self, cls, nlp=None):
self.vocab = nlp.vocab if nlp is not None else cls.create_vocab(nlp)
MeCab = try_mecab_import()
self.tokenizer = MeCab.Tagger()
self.tokenizer = try_mecab_import().Tagger()
self.tokenizer.parseToNode("") # see #2901
def __call__(self, text):
dtokens = detailed_tokens(self.tokenizer, text)
words = [x.surface for x in dtokens]
doc = Doc(self.vocab, words=words, spaces=[False] * len(words))
spaces = [False] * len(words)
doc = Doc(self.vocab, words=words, spaces=spaces)
for token, dtoken in zip(doc, dtokens):
token._.mecab_tag = dtoken.pos
token.tag_ = resolve_pos(dtoken)
token.lemma_ = dtoken.lemma
return doc
# add dummy methods for to_bytes, from_bytes, to_disk and from_disk to
# allow serialization (see #1557)
def to_bytes(self, **exclude):
return b""
def from_bytes(self, bytes_data, **exclude):
return self
def to_disk(self, path, **exclude):
return None
def from_disk(self, path, **exclude):
return self
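The removed no-op methods are exactly what `DummyTokenizer` now centralizes; a base class of that shape looks roughly like this (a sketch of the pattern based on the methods removed above, not necessarily spaCy's actual `DummyTokenizer`):

```python
class DummyTokenizer(object):
    # No-op serialization hooks so custom tokenizers can take part in
    # nlp.to_disk() / nlp.to_bytes() without storing any state themselves
    # (see #1557 in the comment above).
    def to_bytes(self, **exclude):
        return b""

    def from_bytes(self, bytes_data, **exclude):
        return self

    def to_disk(self, path, **exclude):
        return None

    def from_disk(self, path, **exclude):
        return self
```

Subclasses such as `JapaneseTokenizer` then only need to implement `__init__` and `__call__`.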
class JapaneseCharacterSegmenter(object):
def __init__(self, vocab):
@@ -154,7 +143,8 @@ class JapaneseCharacterSegmenter(object):
class JapaneseDefaults(Language.Defaults):
lex_attr_getters = dict(Language.Defaults.lex_attr_getters)
lex_attr_getters[LANG] = lambda text: "ja"
lex_attr_getters[LANG] = lambda _text: "ja"
tag_map = TAG_MAP
use_janome = True
@@ -169,7 +159,6 @@ class JapaneseDefaults(Language.Defaults):
class Japanese(Language):
lang = "ja"
Defaults = JapaneseDefaults
Tokenizer = JapaneseTokenizer
def make_doc(self, text):
return self.tokenizer(text)


@@ -5,6 +5,7 @@ from .tokenizer_exceptions import TOKENIZER_EXCEPTIONS
from .stop_words import STOP_WORDS
from .morph_rules import MORPH_RULES
from .lemmatizer import LEMMA_RULES, LOOKUP
from .punctuation import TOKENIZER_INFIXES, TOKENIZER_SUFFIXES
from ..tokenizer_exceptions import BASE_EXCEPTIONS
from ..norm_exceptions import BASE_NORMS
@@ -20,12 +21,14 @@ class SwedishDefaults(Language.Defaults):
Language.Defaults.lex_attr_getters[NORM], BASE_NORMS
)
tokenizer_exceptions = update_exc(BASE_EXCEPTIONS, TOKENIZER_EXCEPTIONS)
morph_rules = MORPH_RULES
infixes = TOKENIZER_INFIXES
suffixes = TOKENIZER_SUFFIXES
stop_words = STOP_WORDS
lemma_rules = LEMMA_RULES
lemma_lookup = LOOKUP
morph_rules = MORPH_RULES
class Swedish(Language):
lang = "sv"
Defaults = SwedishDefaults


@@ -233167,7 +233167,6 @@ LOOKUP = {
"jades": "jade",
"jaet": "ja",
"jaets": "ja",
"jag": "jaga",
"jagad": "jaga",
"jagade": "jaga",
"jagades": "jaga",


@@ -0,0 +1,25 @@
# coding: utf8
"""Punctuation stolen from Danish"""
from __future__ import unicode_literals
from ..char_classes import LIST_ELLIPSES, LIST_ICONS
from ..char_classes import QUOTES, ALPHA, ALPHA_LOWER, ALPHA_UPPER
from ..punctuation import TOKENIZER_SUFFIXES
_quotes = QUOTES.replace("'", '')
_infixes = (LIST_ELLIPSES + LIST_ICONS +
[r'(?<=[{}])\.(?=[{}])'.format(ALPHA_LOWER, ALPHA_UPPER),
r'(?<=[{a}])[,!?](?=[{a}])'.format(a=ALPHA),
r'(?<=[{a}"])[:<>=](?=[{a}])'.format(a=ALPHA),
r'(?<=[{a}]),(?=[{a}])'.format(a=ALPHA),
r'(?<=[{a}])([{q}\)\]\(\[])(?=[\{a}])'.format(a=ALPHA, q=_quotes),
r'(?<=[{a}])--(?=[{a}])'.format(a=ALPHA)])
_suffixes = [suffix for suffix in TOKENIZER_SUFFIXES if suffix not in ["'s", "'S", "s", "S", r"\'"]]
_suffixes += [r"(?<=[^sSxXzZ])\'"]
TOKENIZER_INFIXES = _infixes
TOKENIZER_SUFFIXES = _suffixes
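The added suffix rule `(?<=[^sSxXzZ])\'` splits a trailing apostrophe off a token unless the preceding letter is a sibilant, where the apostrophe marks the Swedish genitive (e.g. `Lars'`). Its effect can be checked directly with `re`; the `$` anchor here is only for the demo, since spaCy applies suffix patterns at the end of the token:

```python
import re

# Trailing apostrophe is split off after non-sibilant letters...
suffix = re.compile(r"(?<=[^sSxXzZ])\'$")
assert suffix.search("hunden'") is not None
# ...but kept after s/S/x/X/z/Z, where it marks the Swedish genitive.
assert suffix.search("Lars'") is None
```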


@@ -26,14 +26,15 @@ for verb_data in [
{ORTH: "u", LEMMA: PRON_LEMMA, NORM: "du"},
]
# Abbreviations for weekdays "sön." (for "söndag" / "söner")
# are left out because they are ambiguous. The same is the case
# for abbreviations "jul." and "Jul." ("juli" / "jul").
for exc_data in [
{ORTH: "jan.", LEMMA: "januari"},
{ORTH: "febr.", LEMMA: "februari"},
{ORTH: "feb.", LEMMA: "februari"},
{ORTH: "apr.", LEMMA: "april"},
{ORTH: "jun.", LEMMA: "juni"},
{ORTH: "jul.", LEMMA: "juli"},
{ORTH: "aug.", LEMMA: "augusti"},
{ORTH: "sept.", LEMMA: "september"},
{ORTH: "sep.", LEMMA: "september"},
@@ -46,13 +47,11 @@ for exc_data in [
{ORTH: "tors.", LEMMA: "torsdag"},
{ORTH: "fre.", LEMMA: "fredag"},
{ORTH: "lör.", LEMMA: "lördag"},
{ORTH: "sön.", LEMMA: "söndag"},
{ORTH: "Jan.", LEMMA: "Januari"},
{ORTH: "Febr.", LEMMA: "Februari"},
{ORTH: "Feb.", LEMMA: "Februari"},
{ORTH: "Apr.", LEMMA: "April"},
{ORTH: "Jun.", LEMMA: "Juni"},
{ORTH: "Jul.", LEMMA: "Juli"},
{ORTH: "Aug.", LEMMA: "Augusti"},
{ORTH: "Sept.", LEMMA: "September"},
{ORTH: "Sep.", LEMMA: "September"},
@@ -65,28 +64,32 @@ for exc_data in [
{ORTH: "Tors.", LEMMA: "Torsdag"},
{ORTH: "Fre.", LEMMA: "Fredag"},
{ORTH: "Lör.", LEMMA: "Lördag"},
{ORTH: "Sön.", LEMMA: "Söndag"},
{ORTH: "sthlm", LEMMA: "Stockholm"},
{ORTH: "gbg", LEMMA: "Göteborg"},
]:
_exc[exc_data[ORTH]] = [exc_data]
# Specific case abbreviations only
for orth in ["AB", "Dr.", "H.M.", "H.K.H.", "m/s", "M/S", "Ph.d.", "S:t", "s:t"]:
_exc[orth] = [{ORTH: orth}]
ABBREVIATIONS = [
"ang",
"anm",
"bil",
"bl.a",
"d.v.s",
"doc",
"dvs",
"e.d",
"e.kr",
"el",
"el.",
"eng",
"etc",
"exkl",
"f",
"ev",
"f.",
"f.d",
"f.kr",
"f.n",
@@ -97,10 +100,11 @@ ABBREVIATIONS = [
"fr.o.m",
"förf",
"inkl",
"jur",
"iofs",
"jur.",
"kap",
"kl",
"kor",
"kor.",
"kr",
"kungl",
"lat",
@@ -109,9 +113,10 @@ ABBREVIATIONS = [
"m.m",
"max",
"milj",
"min",
"min.",
"mos",
"mt",
"mvh",
"o.d",
"o.s.v",
"obs",
@@ -125,21 +130,27 @@ ABBREVIATIONS = [
"s.k",
"s.t",
"sid",
"s:t",
"t.ex",
"t.h",
"t.o.m",
"t.v",
"tel",
"ung",
"ung.",
"vol",
"v.",
"äv",
"övers",
]
ABBREVIATIONS = [abbr + "." for abbr in ABBREVIATIONS] + ABBREVIATIONS
# Add a variant with trailing punctuation for each abbreviation. If the
# abbreviation already ends in a period, skip it.
for abbr in ABBREVIATIONS:
    if not abbr.endswith("."):
        ABBREVIATIONS.append(abbr + ".")
for orth in ABBREVIATIONS:
_exc[orth] = [{ORTH: orth}]
capitalized = orth.capitalize()
_exc[capitalized] = [{ORTH: capitalized}]
# Sentences ending in "i." (as in "... peka i."), "m." (as in "...än 2000 m."),
# should be tokenized as two separate tokens.

spacy/lang/ta/__init__.py Normal file

@@ -0,0 +1,24 @@
# import language-specific data
from .stop_words import STOP_WORDS
from .lex_attrs import LEX_ATTRS
from ..tokenizer_exceptions import BASE_EXCEPTIONS
from ...language import Language
from ...attrs import LANG
from ...util import update_exc
# create Defaults class in the module scope (necessary for pickling!)
class TamilDefaults(Language.Defaults):
lex_attr_getters = dict(Language.Defaults.lex_attr_getters)
lex_attr_getters[LANG] = lambda text: 'ta' # language ISO code
# optional: replace flags with custom functions, e.g. like_num()
lex_attr_getters.update(LEX_ATTRS)
# create actual Language class
class Tamil(Language):
lang = 'ta' # language ISO code
Defaults = TamilDefaults # override defaults
# set default export this allows the language class to be lazy-loaded
__all__ = ['Tamil']

spacy/lang/ta/examples.py Normal file

@@ -0,0 +1,21 @@
# coding: utf8
from __future__ import unicode_literals
"""
Example sentences to test spaCy and its language models.
>>> from spacy.lang.ta.examples import sentences
>>> docs = nlp.pipe(sentences)
"""
sentences = [
"கிறிஸ்துமஸ் மற்றும் இனிய புத்தாண்டு வாழ்த்துக்கள்",
"எனக்கு என் குழந்தைப் பருவம் நினைவிருக்கிறது",
"உங்கள் பெயர் என்ன?",
"ஏறத்தாழ இலங்கைத் தமிழரில் மூன்றிலொரு பங்கினர் இலங்கையை விட்டு வெளியேறிப் பிற நாடுகளில் வாழ்கின்றனர்",
"இந்த ஃபோனுடன் சுமார் ரூ.2,990 மதிப்புள்ள போட் ராக்கர்ஸ் நிறுவனத்தின் ஸ்போர்ட் புளூடூத் ஹெட்போன்ஸ் இலவசமாக வழங்கப்படவுள்ளது.",
"மட்டக்களப்பில் பல இடங்களில் வீட்டுத் திட்டங்களுக்கு இன்று அடிக்கல் நாட்டல்",
"ஐ போன்க்கு முகத்தை வைத்து அன்லாக் செய்யும் முறை மற்றும் விரலால் தொட்டு அன்லாக் செய்யும் முறையை வாட்ஸ் ஆப் நிறுவனம் இதற்கு முன் கண்டுபிடித்தது"
]


@ -0,0 +1,44 @@
# coding: utf8
from __future__ import unicode_literals
from ...attrs import LIKE_NUM
_numeral_suffixes = {'பத்து': 'பது', 'ற்று': 'று', 'ரத்து': 'ரம்', 'சத்து': 'சம்'}
_num_words = ['பூச்சியம்', 'ஒரு', 'ஒன்று', 'இரண்டு', 'மூன்று', 'நான்கு', 'ஐந்து', 'ஆறு', 'ஏழு',
'எட்டு', 'ஒன்பது', 'பத்து', 'பதினொன்று', 'பன்னிரண்டு', 'பதின்மூன்று', 'பதினான்கு',
'பதினைந்து', 'பதினாறு', 'பதினேழு', 'பதினெட்டு', 'பத்தொன்பது', 'இருபது',
'முப்பது', 'நாற்பது', 'ஐம்பது', 'அறுபது', 'எழுபது', 'எண்பது', 'தொண்ணூறு',
'நூறு', 'இருநூறு', 'முன்னூறு', 'நாநூறு', 'ஐநூறு', 'அறுநூறு', 'எழுநூறு', 'எண்ணூறு', 'தொள்ளாயிரம்',
'ஆயிரம்', 'ஒராயிரம்', 'லட்சம்', 'மில்லியன்', 'கோடி', 'பில்லியன்', 'டிரில்லியன்']
# 20-89, 90-899, 900-99999 and above have different suffixes
def suffix_filter(text):
# text without numeral suffixes
for num_suffix in _numeral_suffixes.keys():
length = len(num_suffix)
if len(text) < length:
continue  # this suffix is longer than the text; try the next one
elif text.endswith(num_suffix):
return text[:-length] + _numeral_suffixes[num_suffix]
return text
def like_num(text):
text = text.replace(',', '').replace('.', '')
if text.isdigit():
return True
if text.count('/') == 1:
num, denom = text.split('/')
if num.isdigit() and denom.isdigit():
return True
if text.lower() in _num_words:
return True
elif suffix_filter(text) in _num_words:
return True
return False
LEX_ATTRS = {
LIKE_NUM: like_num
}
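The `LIKE_NUM` flag above layers three checks: plain digits after stripping separators, simple fractions, and membership in the number-word list (directly or after `suffix_filter` normalisation). A minimal sketch of the same layering, with an English stand-in word list instead of the Tamil data:

```python
_num_words = {"one", "two", "ten", "hundred"}  # stand-in for the Tamil list

def like_num(text):
    # strip thousands separators and decimal points first
    text = text.replace(",", "").replace(".", "")
    if text.isdigit():
        return True
    # simple fractions such as "3/4"
    if text.count("/") == 1:
        num, denom = text.split("/")
        if num.isdigit() and denom.isdigit():
            return True
    return text.lower() in _num_words
```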


@ -0,0 +1,148 @@
# coding: utf8
from __future__ import unicode_literals
_exc = {
# Regional words mapped to standard forms
# Sri Lanka - Wikipedia
"இங்க": "இங்கே",
"வாங்க": "வாருங்கள்",
'ஒண்டு': 'ஒன்று',
'கண்டு': 'கன்று',
'கொண்டு': 'கொன்று',
'பண்டி': 'பன்றி',
'பச்ச': 'பச்சை',
'அம்பது': 'ஐம்பது',
'வெச்ச': 'வைத்து',
'வச்ச': 'வைத்து',
'வச்சி': 'வைத்து',
'வாளைப்பழம்': 'வாழைப்பழம்',
'மண்ணு': 'மண்',
'பொன்னு': 'பொன்',
'சாவல்': 'சேவல்',
'அங்கால': 'அங்கு',
'அசுப்பு': 'நடமாட்டம்',
'எழுவான் கரை': 'எழுவான்கரை',
'ஓய்யாரம்': 'எழில்',
'ஒளும்பு': 'எழும்பு',
'ஓர்மை': 'துணிவு',
'கச்சை': 'கோவணம்',
'கடப்பு': 'தெருவாசல்',
'சுள்ளி': 'காய்ந்த குச்சி',
'திறாவுதல்': 'தடவுதல்',
'நாசமறுப்பு': 'தொல்லை',
'பரிசாரி': 'வைத்தியன்',
'பறவாதி': 'பேராசைக்காரன்',
'பிசினி': 'உலோபி',
'விசர்': 'பைத்தியம்',
'ஏனம்': 'பாத்திரம்',
'ஏலா': 'இயலாது',
'ஒசில்': 'அழகு',
'ஒள்ளுப்பம்': 'கொஞ்சம்',
# Sri Lankan and Indian
'குத்துமதிப்பு': '',
'நூனாயம்': 'நூல்நயம்',
'பைய': 'மெதுவாக',
'மண்டை': 'தலை',
'வெள்ளனே': 'சீக்கிரம்',
'உசுப்பு': 'எழுப்பு',
'ஆணம்': 'குழம்பு',
'உறக்கம்': 'தூக்கம்',
'பஸ்': 'பேருந்து',
'களவு': 'திருட்டு',
# relationship words
'புருசன்': 'கணவன்',
'பொஞ்சாதி': 'மனைவி',
'புள்ள': 'பிள்ளை',
'பிள்ள': 'பிள்ளை',
'ஆம்பிளப்புள்ள': 'ஆண் பிள்ளை',
'பொம்பிளப்புள்ள': 'பெண் பிள்ளை',
'அண்ணாச்சி': 'அண்ணா',
'அக்காச்சி': 'அக்கா',
'தங்கச்சி': 'தங்கை',
# different words
'பொடியன்': 'சிறுவன்',
'பொட்டை': 'சிறுமி',
'பிறகு': 'பின்பு',
'டக்கென்டு': 'விரைவாக',
'கெதியா': 'விரைவாக',
'கிறுகி': 'திரும்பி',
'போயித்து வாறன்': 'போய் வருகிறேன்',
'வருவாங்களா': 'வருவார்களா',
# regular spoken forms
'சொல்லு': 'சொல்',
'கேளு': 'கேள்',
'சொல்லுங்க': 'சொல்லுங்கள்',
'கேளுங்க': 'கேளுங்கள்',
'நீங்கள்': 'நீ',
'உன்': 'உன்னுடைய',
# Portuguese formal words
'அலவாங்கு': 'கடப்பாரை',
'ஆசுப்பத்திரி': 'மருத்துவமனை',
'உரோதை': 'சில்லு',
'கடுதாசி': 'கடிதம்',
'கதிரை': 'நாற்காலி',
'குசினி': 'அடுக்களை',
'கோப்பை': 'கிண்ணம்',
'சப்பாத்து': 'காலணி',
'தாச்சி': 'இரும்புச் சட்டி',
'துவாய்': 'துவாலை',
'தவறணை': 'மதுக்கடை',
'பீப்பா': 'மரத்தாழி',
'யன்னல்': 'சாளரம்',
'வாங்கு': 'மரஇருக்கை',
# Dutch formal words
'இறாக்கை': 'பற்சட்டம்',
'இலாட்சி': 'இழுப்பறை',
'கந்தோர்': 'பணிமனை',
'நொத்தாரிசு': 'ஆவண எழுத்துபதிவாளர்',
# English formal words
'இஞ்சினியர்': 'பொறியியலாளர்',
'சூப்பு': 'ரசம்',
'செக்': 'காசோலை',
'சேட்டு': 'மேற்ச்சட்டை',
'மார்க்கட்டு': 'சந்தை',
'விண்ணன்': 'கெட்டிக்காரன்',
# Arabic formal words
'ஈமான்': 'நம்பிக்கை',
'சுன்னத்து': 'விருத்தசேதனம்',
'செய்த்தான்': 'பிசாசு',
'மவுத்து': 'இறப்பு',
'ஹலால்': 'அங்கீகரிக்கப்பட்டது',
'கறாம்': 'நிராகரிக்கப்பட்டது',
# Persian, Hindustani and Hindi formal words
'சுமார்': 'கிட்டத்தட்ட',
'சிப்பாய்': 'போர்வீரன்',
'சிபார்சு': 'சிபாரிசு',
'ஜமீன்': 'பணக்காரா்',
'அசல்': 'மெய்யான',
'அந்தஸ்து': 'கௌரவம்',
'ஆஜர்': 'சமா்ப்பித்தல்',
'உசார்': 'எச்சரிக்கை',
'அச்சா': 'நல்ல',
# English words used in text conversations
"bcoz": "ஏனெனில்",
"bcuz": "ஏனெனில்",
"fav": "விருப்பமான",
"morning": "காலை வணக்கம்",
"gdeveng": "மாலை வணக்கம்",
"gdnyt": "இரவு வணக்கம்",
"gdnit": "இரவு வணக்கம்",
"plz": "தயவு செய்து",
"pls": "தயவு செய்து",
"thx": "நன்றி",
"thanx": "நன்றி",
}
NORM_EXCEPTIONS = dict(_exc)
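In spaCy v2 a table like this is typically merged into the `NORM` lexical attribute getter (via `spacy.util.add_lookups`); the effect, sketched here in plain Python with stand-in entries, is a dictionary lookup with a fallback:

```python
NORM_EXCEPTIONS = {"bcoz": "because", "plz": "please"}  # stand-in entries

def norm(token_text):
    # return the normalised form if listed, else the text itself
    return NORM_EXCEPTIONS.get(token_text, token_text)
```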

133
spacy/lang/ta/stop_words.py Normal file

@ -0,0 +1,133 @@
# coding: utf8
from __future__ import unicode_literals
# Stop words
STOP_WORDS = set("""
ஒர
என
மற
இந
இத
என
எனபத
பல
ஆக
அலலத
அவர
உள
அந
இவர
என
தல
என
இர
ி
என
வந
இதன
அத
அவன
பலர
என
ினர
இர
தனத
உளளத
என
அதன
தன
ிறக
அவரகள
வர
அவள
ஆகி
இரதத
உளளன
வந
இர
ிகவ
இங
ஓர
இவ
இநதக
பறி
வர
இர
இதி
இப
அவரத
மட
இநதப
என
ி
ஆகி
எனக
இன
அநதப
அன
ஒர
ி
அங
பல
ி
அத
பறி
உன
அதி
அநதக
இதன
அவ
அத
ஏன
எனபத
எல
மட
இங
அங
இடம
இடதி
அதி
அதற
எனவ
ி
ி
மற
ி
எந
எனவ
எனபபட
எனி
அட
இதன
இத
இநதத
இதற
அதன
தவி
வரி
சற
எனக
""".split())


@ -5,24 +5,14 @@ from .tokenizer_exceptions import TOKENIZER_EXCEPTIONS
from .tag_map import TAG_MAP
from .stop_words import STOP_WORDS
from ...tokens import Doc
from ...language import Language
from ...attrs import LANG
from ...language import Language
from ...tokens import Doc
from ...util import DummyTokenizer
class ThaiDefaults(Language.Defaults):
lex_attr_getters = dict(Language.Defaults.lex_attr_getters)
lex_attr_getters[LANG] = lambda text: "th"
tokenizer_exceptions = dict(TOKENIZER_EXCEPTIONS)
tag_map = TAG_MAP
stop_words = STOP_WORDS
class Thai(Language):
lang = "th"
Defaults = ThaiDefaults
def make_doc(self, text):
class ThaiTokenizer(DummyTokenizer):
def __init__(self, cls, nlp=None):
try:
from pythainlp.tokenize import word_tokenize
except ImportError:
@ -30,8 +20,35 @@ class Thai(Language):
"The Thai tokenizer requires the PyThaiNLP library: "
"https://github.com/PyThaiNLP/pythainlp"
)
words = [x for x in list(word_tokenize(text, "newmm"))]
return Doc(self.vocab, words=words, spaces=[False] * len(words))
self.word_tokenize = word_tokenize
self.vocab = nlp.vocab if nlp is not None else cls.create_vocab(nlp)
def __call__(self, text):
words = list(self.word_tokenize(text, "newmm"))
spaces = [False] * len(words)
return Doc(self.vocab, words=words, spaces=spaces)
class ThaiDefaults(Language.Defaults):
lex_attr_getters = dict(Language.Defaults.lex_attr_getters)
lex_attr_getters[LANG] = lambda _text: "th"
tokenizer_exceptions = dict(TOKENIZER_EXCEPTIONS)
tag_map = TAG_MAP
stop_words = STOP_WORDS
@classmethod
def create_tokenizer(cls, nlp=None):
return ThaiTokenizer(cls, nlp)
class Thai(Language):
lang = "th"
Defaults = ThaiDefaults
def make_doc(self, text):
return self.tokenizer(text)
__all__ = ["Thai"]
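The refactor above moves tokenization from `Thai.make_doc` into a dedicated callable class: the optional dependency is imported once in `__init__` (so an `ImportError` surfaces at setup time), and `__call__` returns words plus all-`False` spaces, since the external tokenizer consumes the whitespace. A dependency-free sketch of the same shape, using `str.split` in place of PyThaiNLP:

```python
class WhitespaceTokenizer(object):
    def __init__(self, split_fn=None):
        # the real ThaiTokenizer stores pythainlp's word_tokenize here
        self.split_fn = split_fn or str.split

    def __call__(self, text):
        words = list(self.split_fn(text))
        spaces = [False] * len(words)  # whitespace already consumed
        return words, spaces
```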


@ -5,6 +5,7 @@ from ...attrs import LIKE_NUM
# Thirteen, fifteen etc. are written separate: on üç
_num_words = [
"bir",
"iki",
@ -28,6 +29,7 @@ _num_words = [
"bin",
"milyon",
"milyar",
"trilyon",
"katrilyon",
"kentilyon",
]


@ -353,10 +353,38 @@ def test_doc_api_similarity_match():
assert doc.similarity(doc2) == 0.0
def test_lowest_common_ancestor(en_tokenizer):
tokens = en_tokenizer("the lazy dog slept")
doc = get_doc(tokens.vocab, words=[t.text for t in tokens], heads=[2, 1, 1, 0])
@pytest.mark.parametrize(
"sentence,heads,lca_matrix",
[
(
"the lazy dog slept",
[2, 1, 1, 0],
numpy.array([[0, 2, 2, 3], [2, 1, 2, 3], [2, 2, 2, 3], [3, 3, 3, 3]]),
),
(
"The lazy dog slept. The quick fox jumped",
[2, 1, 1, 0, -1, 2, 1, 1, 0],
numpy.array(
[
[0, 2, 2, 3, 3, -1, -1, -1, -1],
[2, 1, 2, 3, 3, -1, -1, -1, -1],
[2, 2, 2, 3, 3, -1, -1, -1, -1],
[3, 3, 3, 3, 3, -1, -1, -1, -1],
[3, 3, 3, 3, 4, -1, -1, -1, -1],
[-1, -1, -1, -1, -1, 5, 7, 7, 8],
[-1, -1, -1, -1, -1, 7, 6, 7, 8],
[-1, -1, -1, -1, -1, 7, 7, 7, 8],
[-1, -1, -1, -1, -1, 8, 8, 8, 8],
]
),
),
],
)
def test_lowest_common_ancestor(en_tokenizer, sentence, heads, lca_matrix):
tokens = en_tokenizer(sentence)
doc = get_doc(tokens.vocab, [t.text for t in tokens], heads=heads)
lca = doc.get_lca_matrix()
assert (lca == lca_matrix).all()
assert lca[1, 1] == 1
assert lca[0, 1] == 2
assert lca[1, 2] == 2
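For intuition, the expected matrices above can be reproduced with a short pure-Python sketch. Heads are relative offsets (as in the test parameters), and the LCA of two tokens is the deepest index lying on both of their ancestor chains; tokens in different sentences share no ancestor and get -1:

```python
def ancestor_chain(i, heads):
    # indices from token i up to its sentence root (head offset 0)
    chain = [i]
    while heads[i] != 0:
        i = i + heads[i]
        chain.append(i)
    return chain

def lca_matrix(heads):
    n = len(heads)
    chains = [ancestor_chain(i, heads) for i in range(n)]
    mat = [[-1] * n for _ in range(n)]
    for j in range(n):
        for k in range(n):
            # first entry of j's chain that also lies on k's chain
            shared = [a for a in chains[j] if a in chains[k]]
            if shared:
                mat[j][k] = shared[0]
    return mat
```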


@ -80,10 +80,24 @@ def test_spans_lca_matrix(en_tokenizer):
tokens = en_tokenizer("the lazy dog slept")
doc = get_doc(tokens.vocab, words=[t.text for t in tokens], heads=[2, 1, 1, 0])
lca = doc[:2].get_lca_matrix()
assert lca[0, 0] == 0
assert lca[0, 1] == -1
assert lca[1, 0] == -1
assert lca[1, 1] == 1
assert lca.shape == (2, 2)
assert lca[0, 0] == 0 # the & the -> the
assert lca[0, 1] == -1 # the & lazy -> dog (out of span)
assert lca[1, 0] == -1 # lazy & the -> dog (out of span)
assert lca[1, 1] == 1 # lazy & lazy -> lazy
lca = doc[1:].get_lca_matrix()
assert lca.shape == (3, 3)
assert lca[0, 0] == 0 # lazy & lazy -> lazy
assert lca[0, 1] == 1 # lazy & dog -> dog
assert lca[0, 2] == 2 # lazy & slept -> slept
lca = doc[2:].get_lca_matrix()
assert lca.shape == (2, 2)
assert lca[0, 0] == 0 # dog & dog -> dog
assert lca[0, 1] == 1 # dog & slept -> slept
assert lca[1, 0] == 1 # slept & dog -> slept
assert lca[1, 1] == 1 # slept & slept -> slept
def test_span_similarity_match():
@ -158,15 +172,17 @@ def test_span_as_doc(doc):
def test_span_string_label(doc):
span = Span(doc, 0, 1, label='hello')
assert span.label_ == 'hello'
assert span.label == doc.vocab.strings['hello']
span = Span(doc, 0, 1, label="hello")
assert span.label_ == "hello"
assert span.label == doc.vocab.strings["hello"]
def test_span_string_set_label(doc):
span = Span(doc, 0, 1)
span.label_ = 'hello'
assert span.label_ == 'hello'
assert span.label == doc.vocab.strings['hello']
span.label_ = "hello"
assert span.label_ == "hello"
assert span.label == doc.vocab.strings["hello"]
def test_span_ents_property(doc):
"""Test span.ents for the """


@ -0,0 +1,53 @@
# coding: utf8
from __future__ import unicode_literals
import pytest
SV_TOKEN_EXCEPTION_TESTS = [
('Smörsåsen används bl.a. till fisk', ['Smörsåsen', 'används', 'bl.a.', 'till', 'fisk']),
('Jag kommer först kl. 13 p.g.a. diverse förseningar', ['Jag', 'kommer', 'först', 'kl.', '13', 'p.g.a.', 'diverse', 'förseningar']),
('Anders I. tycker om ord med i i.', ["Anders", "I.", "tycker", "om", "ord", "med", "i", "i", "."])
]
@pytest.mark.parametrize('text,expected_tokens', SV_TOKEN_EXCEPTION_TESTS)
def test_sv_tokenizer_handles_exception_cases(sv_tokenizer, text, expected_tokens):
tokens = sv_tokenizer(text)
token_list = [token.text for token in tokens if not token.is_space]
assert expected_tokens == token_list
@pytest.mark.parametrize('text', ["driveru", "hajaru", "Serru", "Fixaru"])
def test_sv_tokenizer_handles_verb_exceptions(sv_tokenizer, text):
tokens = sv_tokenizer(text)
assert len(tokens) == 2
assert tokens[1].text == "u"
@pytest.mark.parametrize('text',
["bl.a", "m.a.o.", "Jan.", "Dec.", "kr.", "osv."])
def test_sv_tokenizer_handles_abbr(sv_tokenizer, text):
tokens = sv_tokenizer(text)
assert len(tokens) == 1
@pytest.mark.parametrize('text', ["Jul.", "jul.", "sön.", "Sön."])
def test_sv_tokenizer_handles_ambiguous_abbr(sv_tokenizer, text):
tokens = sv_tokenizer(text)
assert len(tokens) == 2
def test_sv_tokenizer_handles_exc_in_text(sv_tokenizer):
text = "Det er bl.a. ikke meningen"
tokens = sv_tokenizer(text)
assert len(tokens) == 5
assert tokens[2].text == "bl.a."
def test_sv_tokenizer_handles_custom_base_exc(sv_tokenizer):
text = "Her er noget du kan kigge i."
tokens = sv_tokenizer(text)
assert len(tokens) == 8
assert tokens[6].text == "i"
assert tokens[7].text == "."


@ -0,0 +1,15 @@
# coding: utf-8
from __future__ import unicode_literals
import pytest
@pytest.mark.parametrize('string,lemma', [('DNA-profilernas', 'DNA-profil'),
('Elfenbenskustens', 'Elfenbenskusten'),
('abortmotståndarens', 'abortmotståndare'),
('kolesterols', 'kolesterol'),
('portionssnusernas', 'portionssnus'),
('åsyns', 'åsyn')])
def test_lemmatizer_lookup_assigns(sv_tokenizer, string, lemma):
tokens = sv_tokenizer(string)
assert tokens[0].lemma_ == lemma


@ -0,0 +1,37 @@
# coding: utf-8
"""Test that tokenizer prefixes, suffixes and infixes are handled correctly."""
from __future__ import unicode_literals
import pytest
@pytest.mark.parametrize('text', ["(under)"])
def test_tokenizer_splits_no_special(sv_tokenizer, text):
tokens = sv_tokenizer(text)
assert len(tokens) == 3
@pytest.mark.parametrize('text', ["gitta'r", "Björn's", "Lars'"])
def test_tokenizer_handles_no_punct(sv_tokenizer, text):
tokens = sv_tokenizer(text)
assert len(tokens) == 1
@pytest.mark.parametrize('text', ["svart.Gul", "Hej.Världen"])
def test_tokenizer_splits_period_infix(sv_tokenizer, text):
tokens = sv_tokenizer(text)
assert len(tokens) == 3
@pytest.mark.parametrize('text', ["Hej,Världen", "en,två"])
def test_tokenizer_splits_comma_infix(sv_tokenizer, text):
tokens = sv_tokenizer(text)
assert len(tokens) == 3
assert tokens[0].text == text.split(",")[0]
assert tokens[1].text == ","
assert tokens[2].text == text.split(",")[1]
@pytest.mark.parametrize('text', ["svart...Gul", "svart...gul"])
def test_tokenizer_splits_ellipsis_infix(sv_tokenizer, text):
tokens = sv_tokenizer(text)
assert len(tokens) == 3


@ -0,0 +1,21 @@
# coding: utf-8
"""Test that longer and mixed texts are tokenized correctly."""
from __future__ import unicode_literals
import pytest
def test_sv_tokenizer_handles_long_text(sv_tokenizer):
text = """Det var så härligt ute på landet. Det var sommar, majsen var gul, havren grön,
höet var uppställt i stackar nere vid den gröna ängen, och där gick storken sina långa,
röda ben och snackade engelska, för det språket hade han lärt sig av sin mor.
Runt om åkrar och äng låg den stora skogen, och mitt i skogen fanns djupa sjöar; jo, det var verkligen trevligt ute landet!"""
tokens = sv_tokenizer(text)
assert len(tokens) == 86
def test_sv_tokenizer_handles_trailing_dot_for_i_in_sentence(sv_tokenizer):
text = "Provar att tokenisera en mening med ord i."
tokens = sv_tokenizer(text)
assert len(tokens) == 9


@ -5,27 +5,31 @@ from ..util import get_doc
import pytest
import numpy
from numpy.testing import assert_array_equal
@pytest.mark.parametrize('words,heads,matrix', [
@pytest.mark.parametrize(
"sentence,heads,matrix",
[
(
'She created a test for spacy'.split(),
"She created a test for spacy",
[1, 0, 1, -2, -1, -1],
numpy.array([
numpy.array(
[
[0, 1, 1, 1, 1, 1],
[1, 1, 1, 1, 1, 1],
[1, 1, 2, 3, 3, 3],
[1, 1, 3, 3, 3, 3],
[1, 1, 3, 3, 4, 4],
[1, 1, 3, 3, 4, 5]], dtype=numpy.int32)
[1, 1, 3, 3, 4, 5],
],
dtype=numpy.int32,
),
)
])
def test_issue2396(en_vocab, words, heads, matrix):
doc = get_doc(en_vocab, words=words, heads=heads)
],
)
def test_issue2396(en_tokenizer, sentence, heads, matrix):
tokens = en_tokenizer(sentence)
doc = get_doc(tokens.vocab, [t.text for t in tokens], heads=heads)
span = doc[:]
assert_array_equal(doc.get_lca_matrix(), matrix)
assert_array_equal(span.get_lca_matrix(), matrix)
assert (doc.get_lca_matrix() == matrix).all()
assert (span.get_lca_matrix() == matrix).all()


@ -10,7 +10,7 @@ def test_issue2901():
"""Test that `nlp` doesn't fail."""
try:
nlp = Japanese()
except:
except ImportError:
pytest.skip()
doc = nlp("pythonが大好きです")


@ -0,0 +1,10 @@
from __future__ import unicode_literals
import pytest
import spacy
@pytest.mark.models('fr')
def test_issue1959(FR):
texts = ['Je suis la mauvaise herbe', "Me, myself and moi"]
for text in texts:
FR(text)


@ -1075,21 +1075,30 @@ cdef int [:,:] _get_lca_matrix(Doc doc, int start, int end):
cdef int [:,:] lca_matrix
n_tokens = end - start
lca_matrix = numpy.empty((n_tokens, n_tokens), dtype=numpy.int32)
lca_mat = numpy.empty((n_tokens, n_tokens), dtype=numpy.int32)
lca_mat.fill(-1)
lca_matrix = lca_mat
for j in range(start, end):
token_j = doc[j]
for j in range(n_tokens):
token_j = doc[start + j]
# the common ancestor of token and itself is itself:
lca_matrix[j, j] = j
for k in range(j + 1, end):
lca = _get_tokens_lca(token_j, doc[k])
# we will only iterate through tokens in the same sentence
sent = token_j.sent
sent_start = sent.start
j_idx_in_sent = start + j - sent_start
n_missing_tokens_in_sent = len(sent) - j_idx_in_sent
# make sure we do not go past `end`, in cases where `end` < sent.end
max_range = min(j + n_missing_tokens_in_sent, end)
for k in range(j + 1, max_range):
lca = _get_tokens_lca(token_j, doc[start + k])
# if lca is outside of span, we set it to -1
if not start <= lca < end:
lca_matrix[j, k] = -1
lca_matrix[k, j] = -1
else:
lca_matrix[j, k] = lca
lca_matrix[k, j] = lca
lca_matrix[j, k] = lca - start
lca_matrix[k, j] = lca - start
return lca_matrix
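The span-restricted behaviour implemented above (iterate within the sentence, rebase indices by `start`, emit -1 when the ancestor falls outside `[start, end)`) can be mirrored in a pure-Python sketch, again treating heads as relative offsets:

```python
def _ancestor_chain(i, heads):
    chain = [i]
    while heads[i] != 0:  # offset 0 marks the root
        i = i + heads[i]
        chain.append(i)
    return chain

def span_lca_matrix(heads, start, end):
    n = end - start
    chains = [_ancestor_chain(start + i, heads) for i in range(n)]
    mat = [[-1] * n for _ in range(n)]
    for j in range(n):
        for k in range(n):
            shared = [a for a in chains[j] if a in chains[k]]
            # rebase to span-local indices; -1 if the LCA is outside the span
            if shared and start <= shared[0] < end:
                mat[j][k] = shared[0] - start
    return mat
```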


@ -524,9 +524,9 @@ cdef class Span:
return len(list(self.rights))
property subtree:
"""Tokens that descend from tokens in the span, but fall outside it.
"""Tokens within the span and tokens which descend from them.
YIELDS (Token): A descendant of a token within the span.
YIELDS (Token): A token within the span, or a descendant from it.
"""
def __get__(self):
for word in self.lefts:


@ -457,10 +457,11 @@ cdef class Token:
yield from self.rights
property subtree:
"""A sequence of all the token's syntactic descendents.
"""A sequence containing the token and all the token's syntactic
descendants.
YIELDS (Token): A descendent token such that
`self.is_ancestor(descendent)`.
`self.is_ancestor(descendent) or token == self`.
"""
def __get__(self):
for word in self.lefts:


@ -253,7 +253,6 @@ def get_entry_point(key, value):
def is_in_jupyter():
"""Check if user is running spaCy from a Jupyter notebook by detecting the
IPython kernel. Mainly used for the displaCy visualizer.
RETURNS (bool): True if in Jupyter, False if not.
"""
# https://stackoverflow.com/a/39662359/6400719
@ -667,3 +666,19 @@ class SimpleFrozenDict(dict):
def update(self, other):
raise NotImplementedError(Errors.E095)
class DummyTokenizer(object):
# add dummy methods for to_bytes, from_bytes, to_disk and from_disk to
# allow serialization (see #1557)
def to_bytes(self, **exclude):
return b''
def from_bytes(self, _bytes_data, **exclude):
return self
def to_disk(self, _path, **exclude):
return None
def from_disk(self, _path, **exclude):
return self


@ -150,3 +150,9 @@ p
+dep-row("re", "repeated element")
+dep-row("rs", "reported speech")
+dep-row("sb", "subject")
+dep-row("sbp", "passivised subject")
+dep-row("sp", "subject or predicate")
+dep-row("svp", "separable verb prefix")
+dep-row("uc", "unit component")
+dep-row("vo", "vocative")
+dep-row("ROOT", "root")


@ -5,7 +5,7 @@ include ../_includes/_mixins
p
| The #[code PhraseMatcher] lets you efficiently match large terminology
| lists. While the #[+api("matcher") #[code Matcher]] lets you match
| squences based on lists of token descriptions, the #[code PhraseMatcher]
| sequences based on lists of token descriptions, the #[code PhraseMatcher]
| accepts match patterns in the form of #[code Doc] objects.
+h(2, "init") PhraseMatcher.__init__


@ -489,7 +489,7 @@ p
+tag property
+tag-model("parse")
p Tokens that descend from tokens in the span, but fall outside it.
p Tokens within the span and tokens which descend from them.
+aside-code("Example").
doc = nlp(u'Give it back! He pleaded.')
@ -500,7 +500,7 @@ p Tokens that descend from tokens in the span, but fall outside it.
+row("foot")
+cell yields
+cell #[code Token]
+cell A descendant of a token within the span.
+cell A token within the span, or a descendant from it.
+h(2, "has_vector") Span.has_vector
+tag property


@ -1,3 +1,4 @@
//- 💫 DOCS > API > TOKEN
include ../_includes/_mixins
@ -405,7 +406,7 @@ p
+tag property
+tag-model("parse")
p A sequence of all the token's syntactic descendants.
p A sequence containing the token and all the token's syntactic descendants.
+aside-code("Example").
doc = nlp(u'Give it back! He pleaded.')
@ -416,7 +417,7 @@ p A sequence of all the token's syntactic descendants.
+row("foot")
+cell yields
+cell #[code Token]
+cell A descendant token such that #[code self.is_ancestor(descendant)].
+cell A descendant token such that #[code self.is_ancestor(token) or token == self].
+h(2, "is_sent_start") Token.is_sent_start
+tag property


@ -1083,20 +1083,31 @@
"category": ["pipeline"]
},
{
"id": "spacy2conllu",
"title": "spaCy2CoNLLU",
"id": "spacy-conll",
"title": "spacy_conll",
"slogan": "Parse text with spaCy and print the output in CoNLL-U format",
"description": "Simple script to parse text with spaCy and print the output in CoNLL-U format",
"description": "This module allows you to parse a text to CoNLL-U format. You can use it as a command line tool, or embed it in your own scripts.",
"code_example": [
"python parse_as_conllu.py [-h] --input_file INPUT_FILE [--output_file OUTPUT_FILE] --model MODEL"
"from spacy_conll import Spacy2ConllParser",
"spacyconll = Spacy2ConllParser()",
"",
"# `parse` returns a generator of the parsed sentences",
"for parsed_sent in spacyconll.parse(input_str='I like cookies.\nWhat about you?\nI do not like them!'):",
" do_something_(parsed_sent)",
"",
"# `parseprint` prints output to stdout (default) or a file (use `output_file` parameter)",
"# This method is called when using the command line",
"spacyconll.parseprint(input_str='I like cookies.')"
],
"code_language": "bash",
"author": "Raquel G. Alhama",
"code_language": "python",
"author": "Bram Vanroy",
"author_links": {
"github": "rgalhama"
"github": "BramVanroy",
"website": "https://bramvanroy.be"
},
"github": "rgalhama/spaCy2CoNLLU",
"category": ["training"]
"github": "BramVanroy/spacy_conll",
"category": ["standalone"]
}
],
"projectCats": {


@ -159,7 +159,7 @@ p
| To provide training examples to the entity recogniser, you'll first need
| to create an instance of the #[+api("goldparse") #[code GoldParse]] class.
| You can specify your annotations in a stand-off format or as token tags.
| If a character offset in your entity annotations don't fall on a token
| If a character offset in your entity annotations doesn't fall on a token
| boundary, the #[code GoldParse] class will treat that annotation as a
| missing value. This allows for more realistic training, because the
| entity recogniser is allowed to learn from examples that may feature


@ -444,7 +444,7 @@ p
| Let's say you're analysing user comments and you want to find out what
| people are saying about Facebook. You want to start off by finding
| adjectives following "Facebook is" or "Facebook was". This is obviously
| a very rudimentary solution, but it'll be fast, and a great way get an
| a very rudimentary solution, but it'll be fast, and a great way to get an
| idea for what's in your data. Your pattern could look like this:
+code.


@ -40,7 +40,7 @@ p
| constrained to predict parses consistent with the sentence boundaries.
+infobox("Important note", "⚠️")
| To prevent inconsitent state, you can only set boundaries #[em before] a
| To prevent inconsistent state, you can only set boundaries #[em before] a
| document is parsed (and #[code Doc.is_parsed] is #[code False]). To
| ensure that your component is added in the right place, you can set
| #[code before='parser'] or #[code first=True] when adding it to the


@ -21,7 +21,7 @@ p
| which needs to be split into two tokens: #[code {ORTH: "do"}] and
| #[code {ORTH: "n't", LEMMA: "not"}]. The prefixes, suffixes and infixes
mostly define punctuation rules, for example, when to split off periods
| (at the end of a sentence), and when to leave token containing periods
| (at the end of a sentence), and when to leave tokens containing periods
| intact (abbreviations like "U.S.").
+graphic("/assets/img/language_data.svg")


@ -43,7 +43,7 @@ p
p
| This example shows how to use multiple cores to process text using
| spaCy and #[+a("https://pythonhosted.org/joblib/") Joblib]. We're
| spaCy and #[+a("https://joblib.readthedocs.io/en/latest/parallel.html") Joblib]. We're
| exporting part-of-speech-tagged, true-cased, (very roughly)
| sentence-separated text, with each "sentence" on a newline, and
| spaces between tokens. Data is loaded from the IMDB movie reviews


@ -74,7 +74,7 @@ p
displacy.serve(doc, style='ent')
p
| This feature is espeically handy if you're using displaCy to compare
| This feature is especially handy if you're using displaCy to compare
| performance at different stages of a process, e.g. during training. Here
| you could use the title for a brief description of the text example and
| the number of iterations.


@ -61,7 +61,7 @@ p
output_path.open('w', encoding='utf-8').write(svg)
p
| The above code will generate the dependency visualizations as to
| The above code will generate the dependency visualizations as
| two files, #[code This-is-an-example.svg] and #[code This-is-another-one.svg].


@ -24,7 +24,7 @@ include ../_includes/_mixins
| standards.
p
| The quickest way visualize #[code Doc] is to use
| The quickest way to visualize #[code Doc] is to use
| #[+api("displacy#serve") #[code displacy.serve]]. This will spin up a
| simple web server and let you view the result straight from your browser.
| displaCy can either take a single #[code Doc] or a list of #[code Doc]