Merge branch 'master' into develop

2025-10-18 09:44:16 +03:00 · 2018-12-18 13:48:10 +01:00 · 2018-12-18 13:48:10 +01:00 · 61d09c481b
commit 61d09c481b
parent 92f4b9c8ea 52f3c95004
56 changed files with 691923 additions and 334997 deletions
--- a/.github/contributors/Brixjohn.md
+++ b/.github/contributors/Brixjohn.md
@@ -0,0 +1,106 @@
+# spaCy contributor agreement
+
+This spaCy Contributor Agreement (**"SCA"**) is based on the
+[Oracle Contributor Agreement](http://www.oracle.com/technetwork/oca-405177.pdf).
+The SCA applies to any contribution that you make to any product or project
+managed by us (the **"project"**), and sets out the intellectual property rights
+you grant to us in the contributed materials. The term **"us"** shall mean
+[ExplosionAI UG (haftungsbeschränkt)](https://explosion.ai/legal). The term
+**"you"** shall mean the person or entity identified below.
+
+If you agree to be bound by these terms, fill in the information requested
+below and include the filled-in version with your first pull request, under the
+folder [`.github/contributors/`](/.github/contributors/). The name of the file
+should be your GitHub username, with the extension `.md`. For example, the user
+example_user would create the file `.github/contributors/example_user.md`.
+
+Read this agreement carefully before signing. These terms and conditions
+constitute a binding legal agreement.
+
+## Contributor Agreement
+
+1. The term "contribution" or "contributed materials" means any source code,
+object code, patch, tool, sample, graphic, specification, manual,
+documentation, or any other material posted or submitted by you to the project.
+
+2. With respect to any worldwide copyrights, or copyright applications and
+registrations, in your contribution:
+
+    * you hereby assign to us joint ownership, and to the extent that such
+    assignment is or becomes invalid, ineffective or unenforceable, you hereby
+    grant to us a perpetual, irrevocable, non-exclusive, worldwide, no-charge,
+    royalty-free, unrestricted license to exercise all rights under those
+    copyrights. This includes, at our option, the right to sublicense these same
+    rights to third parties through multiple levels of sublicensees or other
+    licensing arrangements;
+
+    * you agree that each of us can do all things in relation to your
+    contribution as if each of us were the sole owners, and if one of us makes
+    a derivative work of your contribution, the one who makes the derivative
+    work (or has it made) will be the sole owner of that derivative work;
+
+    * you agree that you will not assert any moral rights in your contribution
+    against us, our licensees or transferees;
+
+    * you agree that we may register a copyright in your contribution and
+    exercise all ownership rights associated with it; and
+
+    * you agree that neither of us has any duty to consult with, obtain the
+    consent of, pay or render an accounting to the other for any use or
+    distribution of your contribution.
+
+3. With respect to any patents you own, or that you can license without payment
+to any third party, you hereby grant to us a perpetual, irrevocable,
+non-exclusive, worldwide, no-charge, royalty-free license to:
+
+    * make, have made, use, sell, offer to sell, import, and otherwise transfer
+    your contribution in whole or in part, alone or in combination with or
+    included in any product, work or materials arising out of the project to
+    which your contribution was submitted, and
+
+    * at our option, to sublicense these same rights to third parties through
+    multiple levels of sublicensees or other licensing arrangements.
+
+4. Except as set out above, you keep all right, title, and interest in your
+contribution. The rights that you grant to us under these terms are effective
+on the date you first submitted a contribution to us, even if your submission
+took place before the date you sign these terms.
+
+5. You covenant, represent, warrant and agree that:
+
+    * Each contribution that you submit is and shall be an original work of
+    authorship and you can legally grant the rights set out in this SCA;
+
+    * to the best of your knowledge, each contribution will not violate any
+    third party's copyrights, trademarks, patents, or other intellectual
+    property rights; and
+
+    * each contribution shall be in compliance with U.S. export control laws and
+    other applicable export and import laws. You agree to notify us if you
+    become aware of any circumstance which would make any of the foregoing
+    representations inaccurate in any respect. We may publicly disclose your
+    participation in the project, including the fact that you have signed the SCA.
+
+6. This SCA is governed by the laws of the State of California and applicable
+U.S. Federal law. Any choice of law rules will not apply.
+
+7. Please place an “x” on one of the applicable statements below. Please do NOT
+mark both statements:
+
+    * [ ] I am signing on behalf of myself as an individual and no other person
+    or entity, including my employer, has or will have rights with respect to my
+    contributions.
+
+    * [X] I am signing on behalf of my employer or a legal entity and I have the
+    actual authority to contractually bind that entity.
+
+## Contributor Details
+
+| Field                          | Entry                |
+|------------------------------- | -------------------- |
+| Name                           | Brixter John Lumabi  |
+| Company name (if applicable)   | Stratpoint           |
+| Title or role (if applicable)  | Software Developer   |
+| Date                           | 18 December 2018     |
+| GitHub username                | Brixjohn             |
+| Website (optional)             |                      |
--- a/.github/contributors/amperinet.md
+++ b/.github/contributors/amperinet.md
@@ -0,0 +1,106 @@
+# spaCy contributor agreement
+
+This spaCy Contributor Agreement (**"SCA"**) is based on the
+[Oracle Contributor Agreement](http://www.oracle.com/technetwork/oca-405177.pdf).
+The SCA applies to any contribution that you make to any product or project
+managed by us (the **"project"**), and sets out the intellectual property rights
+you grant to us in the contributed materials. The term **"us"** shall mean
+[ExplosionAI UG (haftungsbeschränkt)](https://explosion.ai/legal). The term
+**"you"** shall mean the person or entity identified below.
+
+If you agree to be bound by these terms, fill in the information requested
+below and include the filled-in version with your first pull request, under the
+folder [`.github/contributors/`](/.github/contributors/). The name of the file
+should be your GitHub username, with the extension `.md`. For example, the user
+example_user would create the file `.github/contributors/example_user.md`.
+
+Read this agreement carefully before signing. These terms and conditions
+constitute a binding legal agreement.
+
+## Contributor Agreement
+
+1. The term "contribution" or "contributed materials" means any source code,
+object code, patch, tool, sample, graphic, specification, manual,
+documentation, or any other material posted or submitted by you to the project.
+
+2. With respect to any worldwide copyrights, or copyright applications and
+registrations, in your contribution:
+
+    * you hereby assign to us joint ownership, and to the extent that such
+    assignment is or becomes invalid, ineffective or unenforceable, you hereby
+    grant to us a perpetual, irrevocable, non-exclusive, worldwide, no-charge,
+    royalty-free, unrestricted license to exercise all rights under those
+    copyrights. This includes, at our option, the right to sublicense these same
+    rights to third parties through multiple levels of sublicensees or other
+    licensing arrangements;
+
+    * you agree that each of us can do all things in relation to your
+    contribution as if each of us were the sole owners, and if one of us makes
+    a derivative work of your contribution, the one who makes the derivative
+    work (or has it made) will be the sole owner of that derivative work;
+
+    * you agree that you will not assert any moral rights in your contribution
+    against us, our licensees or transferees;
+
+    * you agree that we may register a copyright in your contribution and
+    exercise all ownership rights associated with it; and
+
+    * you agree that neither of us has any duty to consult with, obtain the
+    consent of, pay or render an accounting to the other for any use or
+    distribution of your contribution.
+
+3. With respect to any patents you own, or that you can license without payment
+to any third party, you hereby grant to us a perpetual, irrevocable,
+non-exclusive, worldwide, no-charge, royalty-free license to:
+
+    * make, have made, use, sell, offer to sell, import, and otherwise transfer
+    your contribution in whole or in part, alone or in combination with or
+    included in any product, work or materials arising out of the project to
+    which your contribution was submitted, and
+
+    * at our option, to sublicense these same rights to third parties through
+    multiple levels of sublicensees or other licensing arrangements.
+
+4. Except as set out above, you keep all right, title, and interest in your
+contribution. The rights that you grant to us under these terms are effective
+on the date you first submitted a contribution to us, even if your submission
+took place before the date you sign these terms.
+
+5. You covenant, represent, warrant and agree that:
+
+    * Each contribution that you submit is and shall be an original work of
+    authorship and you can legally grant the rights set out in this SCA;
+
+    * to the best of your knowledge, each contribution will not violate any
+    third party's copyrights, trademarks, patents, or other intellectual
+    property rights; and
+
+    * each contribution shall be in compliance with U.S. export control laws and
+    other applicable export and import laws. You agree to notify us if you
+    become aware of any circumstance which would make any of the foregoing
+    representations inaccurate in any respect. We may publicly disclose your
+    participation in the project, including the fact that you have signed the SCA.
+
+6. This SCA is governed by the laws of the State of California and applicable
+U.S. Federal law. Any choice of law rules will not apply.
+
+7. Please place an “x” on one of the applicable statements below. Please do NOT
+mark both statements:
+
+    * [x] I am signing on behalf of myself as an individual and no other person
+    or entity, including my employer, has or will have rights with respect to my
+    contributions.
+
+    * [ ] I am signing on behalf of my employer or a legal entity and I have the
+    actual authority to contractually bind that entity.
+
+## Contributor Details
+
+| Field                          | Entry                   |
+|------------------------------- | ----------------------- |
+| Name                           | Amandine Périnet        |
+| Company name (if applicable)   | 365Talents              |
+| Title or role (if applicable)  | Data Science Researcher |
+| Date                           | 12/12/2018              |
+| GitHub username                | amperinet               |
+| Website (optional)             |                         |
--- a/.github/contributors/beatesi.md
+++ b/.github/contributors/beatesi.md
@@ -0,0 +1,106 @@
+# spaCy contributor agreement
+
+This spaCy Contributor Agreement (**"SCA"**) is based on the
+[Oracle Contributor Agreement](http://www.oracle.com/technetwork/oca-405177.pdf).
+The SCA applies to any contribution that you make to any product or project
+managed by us (the **"project"**), and sets out the intellectual property rights
+you grant to us in the contributed materials. The term **"us"** shall mean
+[ExplosionAI UG (haftungsbeschränkt)](https://explosion.ai/legal). The term
+**"you"** shall mean the person or entity identified below.
+
+If you agree to be bound by these terms, fill in the information requested
+below and include the filled-in version with your first pull request, under the
+folder [`.github/contributors/`](/.github/contributors/). The name of the file
+should be your GitHub username, with the extension `.md`. For example, the user
+example_user would create the file `.github/contributors/example_user.md`.
+
+Read this agreement carefully before signing. These terms and conditions
+constitute a binding legal agreement.
+
+## Contributor Agreement
+
+1. The term "contribution" or "contributed materials" means any source code,
+object code, patch, tool, sample, graphic, specification, manual,
+documentation, or any other material posted or submitted by you to the project.
+
+2. With respect to any worldwide copyrights, or copyright applications and
+registrations, in your contribution:
+
+    * you hereby assign to us joint ownership, and to the extent that such
+    assignment is or becomes invalid, ineffective or unenforceable, you hereby
+    grant to us a perpetual, irrevocable, non-exclusive, worldwide, no-charge,
+    royalty-free, unrestricted license to exercise all rights under those
+    copyrights. This includes, at our option, the right to sublicense these same
+    rights to third parties through multiple levels of sublicensees or other
+    licensing arrangements;
+
+    * you agree that each of us can do all things in relation to your
+    contribution as if each of us were the sole owners, and if one of us makes
+    a derivative work of your contribution, the one who makes the derivative
+    work (or has it made) will be the sole owner of that derivative work;
+
+    * you agree that you will not assert any moral rights in your contribution
+    against us, our licensees or transferees;
+
+    * you agree that we may register a copyright in your contribution and
+    exercise all ownership rights associated with it; and
+
+    * you agree that neither of us has any duty to consult with, obtain the
+    consent of, pay or render an accounting to the other for any use or
+    distribution of your contribution.
+
+3. With respect to any patents you own, or that you can license without payment
+to any third party, you hereby grant to us a perpetual, irrevocable,
+non-exclusive, worldwide, no-charge, royalty-free license to:
+
+    * make, have made, use, sell, offer to sell, import, and otherwise transfer
+    your contribution in whole or in part, alone or in combination with or
+    included in any product, work or materials arising out of the project to
+    which your contribution was submitted, and
+
+    * at our option, to sublicense these same rights to third parties through
+    multiple levels of sublicensees or other licensing arrangements.
+
+4. Except as set out above, you keep all right, title, and interest in your
+contribution. The rights that you grant to us under these terms are effective
+on the date you first submitted a contribution to us, even if your submission
+took place before the date you sign these terms.
+
+5. You covenant, represent, warrant and agree that:
+
+    * Each contribution that you submit is and shall be an original work of
+    authorship and you can legally grant the rights set out in this SCA;
+
+    * to the best of your knowledge, each contribution will not violate any
+    third party's copyrights, trademarks, patents, or other intellectual
+    property rights; and
+
+    * each contribution shall be in compliance with U.S. export control laws and
+    other applicable export and import laws. You agree to notify us if you
+    become aware of any circumstance which would make any of the foregoing
+    representations inaccurate in any respect. We may publicly disclose your
+    participation in the project, including the fact that you have signed the SCA.
+
+6. This SCA is governed by the laws of the State of California and applicable
+U.S. Federal law. Any choice of law rules will not apply.
+
+7. Please place an “x” on one of the applicable statements below. Please do NOT
+mark both statements:
+
+    * [ ] I am signing on behalf of myself as an individual and no other person
+    or entity, including my employer, has or will have rights with respect to my
+    contributions.
+
+    * [x] I am signing on behalf of my employer or a legal entity and I have the
+    actual authority to contractually bind that entity.
+
+## Contributor Details
+
+| Field                          | Entry                |
+|------------------------------- | -------------------- |
+| Name                           | Beate Sildnes        |
+| Company name (if applicable)   | NAV                  |
+| Title or role (if applicable)  | Data Scientist       |
+| Date                           | 04.12.2018           |
+| GitHub username                | beatesi              |
+| Website (optional)             |                      |
--- a/.github/contributors/chezou.md
+++ b/.github/contributors/chezou.md
@@ -0,0 +1,106 @@
+# spaCy contributor agreement
+
+This spaCy Contributor Agreement (**"SCA"**) is based on the
+[Oracle Contributor Agreement](http://www.oracle.com/technetwork/oca-405177.pdf).
+The SCA applies to any contribution that you make to any product or project
+managed by us (the **"project"**), and sets out the intellectual property rights
+you grant to us in the contributed materials. The term **"us"** shall mean
+[ExplosionAI UG (haftungsbeschränkt)](https://explosion.ai/legal). The term
+**"you"** shall mean the person or entity identified below.
+
+If you agree to be bound by these terms, fill in the information requested
+below and include the filled-in version with your first pull request, under the
+folder [`.github/contributors/`](/.github/contributors/). The name of the file
+should be your GitHub username, with the extension `.md`. For example, the user
+example_user would create the file `.github/contributors/example_user.md`.
+
+Read this agreement carefully before signing. These terms and conditions
+constitute a binding legal agreement.
+
+## Contributor Agreement
+
+1. The term "contribution" or "contributed materials" means any source code,
+object code, patch, tool, sample, graphic, specification, manual,
+documentation, or any other material posted or submitted by you to the project.
+
+2. With respect to any worldwide copyrights, or copyright applications and
+registrations, in your contribution:
+
+    * you hereby assign to us joint ownership, and to the extent that such
+    assignment is or becomes invalid, ineffective or unenforceable, you hereby
+    grant to us a perpetual, irrevocable, non-exclusive, worldwide, no-charge,
+    royalty-free, unrestricted license to exercise all rights under those
+    copyrights. This includes, at our option, the right to sublicense these same
+    rights to third parties through multiple levels of sublicensees or other
+    licensing arrangements;
+
+    * you agree that each of us can do all things in relation to your
+    contribution as if each of us were the sole owners, and if one of us makes
+    a derivative work of your contribution, the one who makes the derivative
+    work (or has it made) will be the sole owner of that derivative work;
+
+    * you agree that you will not assert any moral rights in your contribution
+    against us, our licensees or transferees;
+
+    * you agree that we may register a copyright in your contribution and
+    exercise all ownership rights associated with it; and
+
+    * you agree that neither of us has any duty to consult with, obtain the
+    consent of, pay or render an accounting to the other for any use or
+    distribution of your contribution.
+
+3. With respect to any patents you own, or that you can license without payment
+to any third party, you hereby grant to us a perpetual, irrevocable,
+non-exclusive, worldwide, no-charge, royalty-free license to:
+
+    * make, have made, use, sell, offer to sell, import, and otherwise transfer
+    your contribution in whole or in part, alone or in combination with or
+    included in any product, work or materials arising out of the project to
+    which your contribution was submitted, and
+
+    * at our option, to sublicense these same rights to third parties through
+    multiple levels of sublicensees or other licensing arrangements.
+
+4. Except as set out above, you keep all right, title, and interest in your
+contribution. The rights that you grant to us under these terms are effective
+on the date you first submitted a contribution to us, even if your submission
+took place before the date you sign these terms.
+
+5. You covenant, represent, warrant and agree that:
+
+    * Each contribution that you submit is and shall be an original work of
+    authorship and you can legally grant the rights set out in this SCA;
+
+    * to the best of your knowledge, each contribution will not violate any
+    third party's copyrights, trademarks, patents, or other intellectual
+    property rights; and
+
+    * each contribution shall be in compliance with U.S. export control laws and
+    other applicable export and import laws. You agree to notify us if you
+    become aware of any circumstance which would make any of the foregoing
+    representations inaccurate in any respect. We may publicly disclose your
+    participation in the project, including the fact that you have signed the SCA.
+
+6. This SCA is governed by the laws of the State of California and applicable
+U.S. Federal law. Any choice of law rules will not apply.
+
+7. Please place an “x” on one of the applicable statements below. Please do NOT
+mark both statements:
+
+    * [x] I am signing on behalf of myself as an individual and no other person
+    or entity, including my employer, has or will have rights with respect to my
+    contributions.
+
+    * [ ] I am signing on behalf of my employer or a legal entity and I have the
+    actual authority to contractually bind that entity.
+
+## Contributor Details
+
+| Field                          | Entry                |
+|------------------------------- | -------------------- |
+| Name                           | Aki Ariga            |
+| Company name (if applicable)   |                      |
+| Title or role (if applicable)  |                      |
+| Date                           | 07/12/2018           |
+| GitHub username                | chezou               |
+| Website (optional)             | chezo.uno            |
--- a/.github/contributors/svlandeg.md
+++ b/.github/contributors/svlandeg.md
@@ -0,0 +1,106 @@
+# spaCy contributor agreement
+
+This spaCy Contributor Agreement (**"SCA"**) is based on the
+[Oracle Contributor Agreement](http://www.oracle.com/technetwork/oca-405177.pdf).
+The SCA applies to any contribution that you make to any product or project
+managed by us (the **"project"**), and sets out the intellectual property rights
+you grant to us in the contributed materials. The term **"us"** shall mean
+[ExplosionAI UG (haftungsbeschränkt)](https://explosion.ai/legal). The term
+**"you"** shall mean the person or entity identified below.
+
+If you agree to be bound by these terms, fill in the information requested
+below and include the filled-in version with your first pull request, under the
+folder [`.github/contributors/`](/.github/contributors/). The name of the file
+should be your GitHub username, with the extension `.md`. For example, the user
+example_user would create the file `.github/contributors/example_user.md`.
+
+Read this agreement carefully before signing. These terms and conditions
+constitute a binding legal agreement.
+
+## Contributor Agreement
+
+1. The term "contribution" or "contributed materials" means any source code,
+object code, patch, tool, sample, graphic, specification, manual,
+documentation, or any other material posted or submitted by you to the project.
+
+2. With respect to any worldwide copyrights, or copyright applications and
+registrations, in your contribution:
+
+    * you hereby assign to us joint ownership, and to the extent that such
+    assignment is or becomes invalid, ineffective or unenforceable, you hereby
+    grant to us a perpetual, irrevocable, non-exclusive, worldwide, no-charge,
+    royalty-free, unrestricted license to exercise all rights under those
+    copyrights. This includes, at our option, the right to sublicense these same
+    rights to third parties through multiple levels of sublicensees or other
+    licensing arrangements;
+
+    * you agree that each of us can do all things in relation to your
+    contribution as if each of us were the sole owners, and if one of us makes
+    a derivative work of your contribution, the one who makes the derivative
+    work (or has it made) will be the sole owner of that derivative work;
+
+    * you agree that you will not assert any moral rights in your contribution
+    against us, our licensees or transferees;
+
+    * you agree that we may register a copyright in your contribution and
+    exercise all ownership rights associated with it; and
+
+    * you agree that neither of us has any duty to consult with, obtain the
+    consent of, pay or render an accounting to the other for any use or
+    distribution of your contribution.
+
+3. With respect to any patents you own, or that you can license without payment
+to any third party, you hereby grant to us a perpetual, irrevocable,
+non-exclusive, worldwide, no-charge, royalty-free license to:
+
+    * make, have made, use, sell, offer to sell, import, and otherwise transfer
+    your contribution in whole or in part, alone or in combination with or
+    included in any product, work or materials arising out of the project to
+    which your contribution was submitted, and
+
+    * at our option, to sublicense these same rights to third parties through
+    multiple levels of sublicensees or other licensing arrangements.
+
+4. Except as set out above, you keep all right, title, and interest in your
+contribution. The rights that you grant to us under these terms are effective
+on the date you first submitted a contribution to us, even if your submission
+took place before the date you sign these terms.
+
+5. You covenant, represent, warrant and agree that:
+
+    * Each contribution that you submit is and shall be an original work of
+    authorship and you can legally grant the rights set out in this SCA;
+
+    * to the best of your knowledge, each contribution will not violate any
+    third party's copyrights, trademarks, patents, or other intellectual
+    property rights; and
+
+    * each contribution shall be in compliance with U.S. export control laws and
+    other applicable export and import laws. You agree to notify us if you
+    become aware of any circumstance which would make any of the foregoing
+    representations inaccurate in any respect. We may publicly disclose your
+    participation in the project, including the fact that you have signed the SCA.
+
+6. This SCA is governed by the laws of the State of California and applicable
+U.S. Federal law. Any choice of law rules will not apply.
+
+7. Please place an “x” on one of the applicable statements below. Please do NOT
+mark both statements:
+
+    * [x] I am signing on behalf of myself as an individual and no other person
+    or entity, including my employer, has or will have rights with respect to my
+    contributions.
+
+    * [ ] I am signing on behalf of my employer or a legal entity and I have the
+    actual authority to contractually bind that entity.
+
+## Contributor Details
+
+| Field                          | Entry                |
+|------------------------------- | -------------------- |
+| Name                           | Sofie Van Landeghem  |
+| Company name (if applicable)   |                      |
+| Title or role (if applicable)  |                      |
+| Date                           | 29 Nov 2018          |
+| GitHub username                | svlandeg             |
+| Website (optional)             |                      |
--- a/.github/contributors/wxv.md
+++ b/.github/contributors/wxv.md
@@ -0,0 +1,106 @@
+# spaCy contributor agreement
+
+This spaCy Contributor Agreement (**"SCA"**) is based on the
+[Oracle Contributor Agreement](http://www.oracle.com/technetwork/oca-405177.pdf).
+The SCA applies to any contribution that you make to any product or project
+managed by us (the **"project"**), and sets out the intellectual property rights
+you grant to us in the contributed materials. The term **"us"** shall mean
+[ExplosionAI UG (haftungsbeschränkt)](https://explosion.ai/legal). The term
+**"you"** shall mean the person or entity identified below.
+
+If you agree to be bound by these terms, fill in the information requested
+below and include the filled-in version with your first pull request, under the
+folder [`.github/contributors/`](/.github/contributors/). The name of the file
+should be your GitHub username, with the extension `.md`. For example, the user
+example_user would create the file `.github/contributors/example_user.md`.
+
+Read this agreement carefully before signing. These terms and conditions
+constitute a binding legal agreement.
+
+## Contributor Agreement
+
+1. The term "contribution" or "contributed materials" means any source code,
+object code, patch, tool, sample, graphic, specification, manual,
+documentation, or any other material posted or submitted by you to the project.
+
+2. With respect to any worldwide copyrights, or copyright applications and
+registrations, in your contribution:
+
+    * you hereby assign to us joint ownership, and to the extent that such
+    assignment is or becomes invalid, ineffective or unenforceable, you hereby
+    grant to us a perpetual, irrevocable, non-exclusive, worldwide, no-charge,
+    royalty-free, unrestricted license to exercise all rights under those
+    copyrights. This includes, at our option, the right to sublicense these same
+    rights to third parties through multiple levels of sublicensees or other
+    licensing arrangements;
+
+    * you agree that each of us can do all things in relation to your
+    contribution as if each of us were the sole owners, and if one of us makes
+    a derivative work of your contribution, the one who makes the derivative
+    work (or has it made) will be the sole owner of that derivative work;
+
+    * you agree that you will not assert any moral rights in your contribution
+    against us, our licensees or transferees;
+
+    * you agree that we may register a copyright in your contribution and
+    exercise all ownership rights associated with it; and
+
+    * you agree that neither of us has any duty to consult with, obtain the
+    consent of, pay or render an accounting to the other for any use or
+    distribution of your contribution.
+
+3. With respect to any patents you own, or that you can license without payment
+to any third party, you hereby grant to us a perpetual, irrevocable,
+non-exclusive, worldwide, no-charge, royalty-free license to:
+
+    * make, have made, use, sell, offer to sell, import, and otherwise transfer
+    your contribution in whole or in part, alone or in combination with or
+    included in any product, work or materials arising out of the project to
+    which your contribution was submitted, and
+
+    * at our option, to sublicense these same rights to third parties through
+    multiple levels of sublicensees or other licensing arrangements.
+
+4. Except as set out above, you keep all right, title, and interest in your
+contribution. The rights that you grant to us under these terms are effective
+on the date you first submitted a contribution to us, even if your submission
+took place before the date you sign these terms.
+
+5. You covenant, represent, warrant and agree that:
+
+    * Each contribution that you submit is and shall be an original work of
+    authorship and you can legally grant the rights set out in this SCA;
+
+    * to the best of your knowledge, each contribution will not violate any
+    third party's copyrights, trademarks, patents, or other intellectual
+    property rights; and
+
+    * each contribution shall be in compliance with U.S. export control laws and
+    other applicable export and import laws. You agree to notify us if you
+    become aware of any circumstance which would make any of the foregoing
+    representations inaccurate in any respect. We may publicly disclose your
+    participation in the project, including the fact that you have signed the SCA.
+
+6. This SCA is governed by the laws of the State of California and applicable
+U.S. Federal law. Any choice of law rules will not apply.
+
+7. Please place an “x” on one of the applicable statements below. Please do NOT
+mark both statements:
+
+    * [x] I am signing on behalf of myself as an individual and no other person
+    or entity, including my employer, has or will have rights with respect to my
+    contributions.
+
+    * [ ] I am signing on behalf of my employer or a legal entity and I have the
+    actual authority to contractually bind that entity.
+
+## Contributor Details
+
+| Field                          | Entry                |
+|------------------------------- | -------------------- |
+| Name                           | Jason Xu             |
+| Company name (if applicable)   |                      |
+| Title or role (if applicable)  |                      |
+| Date                           | 2018-11-29           |
+| GitHub username                | wxv                  |
+| Website (optional)             |                      |
--- a/examples/keras_parikh_entailment/main.py
+++ b/examples/keras_parikh_entailment/main.py
@ -20,9 +20,10 @@ import os
 import importlib
 from keras import backend as K

+
 def set_keras_backend(backend):
    if K.backend() != backend:
-        os.environ['KERAS_BACKEND'] = backend
+        os.environ["KERAS_BACKEND"] = backend
        importlib.reload(K)
        assert K.backend() == backend
    if backend == "tensorflow":
@ -32,6 +33,7 @@ def set_keras_backend(backend):
        K.set_session(K.tf.Session(config=cfg))
        K.clear_session()

+
 set_keras_backend("tensorflow")


@ -40,9 +42,8 @@ def train(train_loc, dev_loc, shape, settings):
    dev_texts1, dev_texts2, dev_labels = read_snli(dev_loc)

    print("Loading spaCy")
-    nlp = spacy.load('en_vectors_web_lg')
+    nlp = spacy.load("en_vectors_web_lg")
    assert nlp.path is not None
-
    print("Processing texts...")
    train_X = create_dataset(nlp, train_texts1, train_texts2, 100, shape[0])
    dev_X = create_dataset(nlp, dev_texts1, dev_texts2, 100, shape[0])
@ -54,29 +55,28 @@ def train(train_loc, dev_loc, shape, settings):
    model.fit(
        train_X,
        train_labels,
-        validation_data = (dev_X, dev_labels),
-        epochs = settings['nr_epoch'],
-        batch_size = settings['batch_size'])
-
-    if not (nlp.path / 'similarity').exists():
-        (nlp.path / 'similarity').mkdir()
-    print("Saving to", nlp.path / 'similarity')
+        validation_data=(dev_X, dev_labels),
+        epochs=settings["nr_epoch"],
+        batch_size=settings["batch_size"],
+    )
+    if not (nlp.path / "similarity").exists():
+        (nlp.path / "similarity").mkdir()
+    print("Saving to", nlp.path / "similarity")
    weights = model.get_weights()
    # remove the embedding matrix.  We can reconstruct it.
    del weights[1]
-    with (nlp.path / 'similarity' / 'model').open('wb') as file_:
+    with (nlp.path / "similarity" / "model").open("wb") as file_:
        pickle.dump(weights, file_)
-    with (nlp.path / 'similarity' / 'config.json').open('w') as file_:
+    with (nlp.path / "similarity" / "config.json").open("w") as file_:
        file_.write(model.to_json())


 def evaluate(dev_loc, shape):
    dev_texts1, dev_texts2, dev_labels = read_snli(dev_loc)
-    nlp = spacy.load('en_vectors_web_lg')
-    nlp.add_pipe(KerasSimilarityShim.load(nlp.path / 'similarity', nlp, shape[0]))
-
-    total = 0.
-    correct = 0.
+    nlp = spacy.load("en_vectors_web_lg")
+    nlp.add_pipe(KerasSimilarityShim.load(nlp.path / "similarity", nlp, shape[0]))
+    total = 0.0
+    correct = 0.0
    for text1, text2, label in zip(dev_texts1, dev_texts2, dev_labels):
        doc1 = nlp(text1)
        doc2 = nlp(text2)
@ -88,11 +88,11 @@ def evaluate(dev_loc, shape):


 def demo(shape):
-    nlp = spacy.load('en_vectors_web_lg')
-    nlp.add_pipe(KerasSimilarityShim.load(nlp.path / 'similarity', nlp, shape[0]))
+    nlp = spacy.load("en_vectors_web_lg")
+    nlp.add_pipe(KerasSimilarityShim.load(nlp.path / "similarity", nlp, shape[0]))

-    doc1 = nlp(u'The king of France is bald.')
-    doc2 = nlp(u'France has no king.')
+    doc1 = nlp(u"The king of France is bald.")
+    doc2 = nlp(u"France has no king.")

    print("Sentence 1:", doc1)
    print("Sentence 2:", doc2)
@ -101,30 +101,31 @@ def demo(shape):
    print("Entailment type:", entailment_type, "(Confidence:", confidence, ")")


-LABELS = {'entailment': 0, 'contradiction': 1, 'neutral': 2}
+LABELS = {"entailment": 0, "contradiction": 1, "neutral": 2}
+
+
 def read_snli(path):
    texts1 = []
    texts2 = []
    labels = []
-    with open(path, 'r') as file_:
+    with open(path, "r") as file_:
        for line in file_:
            eg = json.loads(line)
-            label = eg['gold_label']
-            if label == '-':  # per Parikh, ignore - SNLI entries
+            label = eg["gold_label"]
+            if label == "-":  # per Parikh, ignore - SNLI entries
                continue
-            texts1.append(eg['sentence1'])
-            texts2.append(eg['sentence2'])
+            texts1.append(eg["sentence1"])
+            texts2.append(eg["sentence2"])
            labels.append(LABELS[label])
-    return texts1, texts2, to_categorical(np.asarray(labels, dtype='int32'))
+    return texts1, texts2, to_categorical(np.asarray(labels, dtype="int32"))
+

 def create_dataset(nlp, texts, hypotheses, num_unk, max_length):
    sents = texts + hypotheses
-
    sents_as_ids = []
    for sent in sents:
        doc = nlp(sent)
        word_ids = []
-
        for i, token in enumerate(doc):
            # skip odd spaces from tokenizer
            if token.has_vector and token.vector_norm == 0:
@ -140,13 +141,12 @@ def create_dataset(nlp, texts, hypotheses, num_unk, max_length):
                word_ids.append(token.rank % num_unk + 1)

        # there must be a simpler way of generating padded arrays from lists...
-        word_id_vec = np.zeros((max_length), dtype='int')
+        word_id_vec = np.zeros((max_length), dtype="int")
        clipped_len = min(max_length, len(word_ids))
        word_id_vec[:clipped_len] = word_ids[:clipped_len]
        sents_as_ids.append(word_id_vec)

-
-    return [np.array(sents_as_ids[:len(texts)]), np.array(sents_as_ids[len(texts):])]
+    return [np.array(sents_as_ids[: len(texts)]), np.array(sents_as_ids[len(texts) :])]


@plac.annotations(
@ -159,39 +159,49 @@ def create_dataset(nlp, texts, hypotheses, num_unk, max_length):
    learn_rate=("Learning rate", "option", "r", float),
    batch_size=("Batch size for neural network training", "option", "b", int),
    nr_epoch=("Number of training epochs", "option", "e", int),
-    entail_dir=("Direction of entailment", "option", "D", str, ["both", "left", "right"])
+    entail_dir=(
+        "Direction of entailment",
+        "option",
+        "D",
+        str,
+        ["both", "left", "right"],
+    ),
 )
-def main(mode, train_loc, dev_loc,
-        max_length = 50,
-        nr_hidden = 200,
-        dropout = 0.2,
-        learn_rate = 0.001,
-        batch_size = 1024,
-        nr_epoch = 10,
-        entail_dir="both"):
-
+def main(
+    mode,
+    train_loc,
+    dev_loc,
+    max_length=50,
+    nr_hidden=200,
+    dropout=0.2,
+    learn_rate=0.001,
+    batch_size=1024,
+    nr_epoch=10,
+    entail_dir="both",
+):
    shape = (max_length, nr_hidden, 3)
    settings = {
-        'lr': learn_rate,
-        'dropout': dropout,
-        'batch_size': batch_size,
-        'nr_epoch': nr_epoch,
-        'entail_dir': entail_dir
+        "lr": learn_rate,
+        "dropout": dropout,
+        "batch_size": batch_size,
+        "nr_epoch": nr_epoch,
+        "entail_dir": entail_dir,
    }

-    if mode == 'train':
+    if mode == "train":
        if train_loc == None or dev_loc == None:
            print("Train mode requires paths to training and development data sets.")
            sys.exit(1)
        train(train_loc, dev_loc, shape, settings)
-    elif mode == 'evaluate':
-        if  dev_loc == None:
+    elif mode == "evaluate":
+        if dev_loc == None:
            print("Evaluate mode requires paths to test data set.")
            sys.exit(1)
        correct, total = evaluate(dev_loc, shape)
-        print(correct, '/', total, correct / total)
+        print(correct, "/", total, correct / total)
    else:
        demo(shape)

-if __name__ == '__main__':
+
+if __name__ == "__main__":
    plac.call(main)
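Note on the padding step in `create_dataset` above, whose inline comment asks for a simpler way to build padded arrays: each sentence's id list is clipped to `max_length` and the remainder is filled with zeros. A minimal list-based sketch of that idea, without NumPy, might look like this. `pad_or_clip` and `pad_value` are hypothetical names used only for illustration; they are not part of the example script.

```python
def pad_or_clip(word_ids, max_length, pad_value=0):
    """Return a list of exactly max_length ids.

    Longer inputs are truncated; shorter inputs are right-padded
    with pad_value (0 here, matching the zero-initialised array
    used in create_dataset).
    """
    clipped = list(word_ids[:max_length])
    return clipped + [pad_value] * (max_length - len(clipped))


short = pad_or_clip([5, 7, 9], 5)
long_ = pad_or_clip([1, 2, 3, 4, 5, 6], 4)
print(short)
print(long_)
```

The script itself keeps NumPy arrays because the Keras model consumes them directly; the list version only illustrates the clip-then-pad logic.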
--- a/examples/keras_parikh_entailment/keras_decomposable_attention.py
+++ b/examples/keras_parikh_entailment/keras_decomposable_attention.py
@ -5,11 +5,12 @@ import numpy as np
 from keras import layers, Model, models, optimizers
 from keras import backend as K

+
 def build_model(vectors, shape, settings):
    max_length, nr_hidden, nr_class = shape

-    input1 = layers.Input(shape=(max_length,), dtype='int32', name='words1')
-    input2 = layers.Input(shape=(max_length,), dtype='int32', name='words2')
+    input1 = layers.Input(shape=(max_length,), dtype="int32", name="words1")
+    input2 = layers.Input(shape=(max_length,), dtype="int32", name="words2")

    # embeddings (projected)
    embed = create_embedding(vectors, max_length, nr_hidden)
@ -23,11 +24,11 @@ def build_model(vectors, shape, settings):

    G = create_feedforward(nr_hidden)

-    if settings['entail_dir'] == 'both':
+    if settings["entail_dir"] == "both":
        norm_weights_a = layers.Lambda(normalizer(1))(att_weights)
        norm_weights_b = layers.Lambda(normalizer(2))(att_weights)
        alpha = layers.dot([norm_weights_a, a], axes=1)
-        beta  = layers.dot([norm_weights_b, b], axes=1)
+        beta = layers.dot([norm_weights_b, b], axes=1)

        # step 2: compare
        comp1 = layers.concatenate([a, beta])
@ -40,7 +41,7 @@ def build_model(vectors, shape, settings):
        v2_sum = layers.Lambda(sum_word)(v2)
        concat = layers.concatenate([v1_sum, v2_sum])

-    elif settings['entail_dir'] == 'left':
+    elif settings["entail_dir"] == "left":
        norm_weights_a = layers.Lambda(normalizer(1))(att_weights)
        alpha = layers.dot([norm_weights_a, a], axes=1)
        comp2 = layers.concatenate([b, alpha])
@ -50,7 +51,7 @@ def build_model(vectors, shape, settings):

    else:
        norm_weights_b = layers.Lambda(normalizer(2))(att_weights)
-        beta  = layers.dot([norm_weights_b, b], axes=1)
+        beta = layers.dot([norm_weights_b, b], axes=1)
        comp1 = layers.concatenate([a, beta])
        v1 = layers.TimeDistributed(G)(comp1)
        v1_sum = layers.Lambda(sum_word)(v1)
@ -58,80 +59,86 @@ def build_model(vectors, shape, settings):

    H = create_feedforward(nr_hidden)
    out = H(concat)
-    out = layers.Dense(nr_class, activation='softmax')(out)
+    out = layers.Dense(nr_class, activation="softmax")(out)

    model = Model([input1, input2], out)

    model.compile(
-        optimizer=optimizers.Adam(lr=settings['lr']),
-        loss='categorical_crossentropy',
-        metrics=['accuracy'])
+        optimizer=optimizers.Adam(lr=settings["lr"]),
+        loss="categorical_crossentropy",
+        metrics=["accuracy"],
+    )

    return model


 def create_embedding(vectors, max_length, projected_dim):
-    return models.Sequential([
-        layers.Embedding(
-            vectors.shape[0],
-            vectors.shape[1],
-            input_length=max_length,
-            weights=[vectors],
-            trainable=False),
+    return models.Sequential(
+        [
+            layers.Embedding(
+                vectors.shape[0],
+                vectors.shape[1],
+                input_length=max_length,
+                weights=[vectors],
+                trainable=False,
+            ),
+            layers.TimeDistributed(
+                layers.Dense(projected_dim, activation=None, use_bias=False)
+            ),
+        ]
+    )

-        layers.TimeDistributed(
-            layers.Dense(projected_dim,
-                         activation=None,
-                         use_bias=False))
-    ])

-def create_feedforward(num_units=200, activation='relu', dropout_rate=0.2):
-    return models.Sequential([
-        layers.Dense(num_units, activation=activation),
-        layers.Dropout(dropout_rate),
-        layers.Dense(num_units, activation=activation),
-        layers.Dropout(dropout_rate)
-    ])
+def create_feedforward(num_units=200, activation="relu", dropout_rate=0.2):
+    return models.Sequential(
+        [
+            layers.Dense(num_units, activation=activation),
+            layers.Dropout(dropout_rate),
+            layers.Dense(num_units, activation=activation),
+            layers.Dropout(dropout_rate),
+        ]
+    )


 def normalizer(axis):
    def _normalize(att_weights):
        exp_weights = K.exp(att_weights)
        sum_weights = K.sum(exp_weights, axis=axis, keepdims=True)
-        return exp_weights/sum_weights
+        return exp_weights / sum_weights
+
    return _normalize

+
 def sum_word(x):
    return K.sum(x, axis=1)


 def test_build_model():
-    vectors = np.ndarray((100, 8), dtype='float32')
+    vectors = np.ndarray((100, 8), dtype="float32")
    shape = (10, 16, 3)
-    settings = {'lr': 0.001, 'dropout': 0.2, 'gru_encode':True, 'entail_dir':'both'}
+    settings = {"lr": 0.001, "dropout": 0.2, "gru_encode": True, "entail_dir": "both"}
    model = build_model(vectors, shape, settings)


 def test_fit_model():
-
    def _generate_X(nr_example, length, nr_vector):
-        X1 = np.ndarray((nr_example, length), dtype='int32')
+        X1 = np.ndarray((nr_example, length), dtype="int32")
        X1 *= X1 < nr_vector
        X1 *= 0 <= X1
-        X2 = np.ndarray((nr_example, length), dtype='int32')
+        X2 = np.ndarray((nr_example, length), dtype="int32")
        X2 *= X2 < nr_vector
        X2 *= 0 <= X2
        return [X1, X2]

    def _generate_Y(nr_example, nr_class):
-        ys = np.zeros((nr_example, nr_class), dtype='int32')
+        ys = np.zeros((nr_example, nr_class), dtype="int32")
        for i in range(nr_example):
            ys[i, i % nr_class] = 1
        return ys

-    vectors = np.ndarray((100, 8), dtype='float32')
+    vectors = np.ndarray((100, 8), dtype="float32")
    shape = (10, 16, 3)
-    settings = {'lr': 0.001, 'dropout': 0.2, 'gru_encode':True, 'entail_dir':'both'}
+    settings = {"lr": 0.001, "dropout": 0.2, "gru_encode": True, "entail_dir": "both"}
    model = build_model(vectors, shape, settings)

    train_X = _generate_X(20, shape[0], vectors.shape[0])
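Note on `normalizer` above: it exponentiates the attention weights and divides by their sum along one axis, which is a softmax along that axis. The sketch below shows the same computation in plain Python for a single 2-D attention matrix. It assumes the batch dimension is dropped, so axes 1 and 2 in the Keras code correspond to axes 0 and 1 here; `normalize` is a hypothetical stand-in, not the function from this file.

```python
import math


def normalize(weights, axis):
    """Softmax over a 2-D list of lists along axis 0 (columns) or 1 (rows)."""
    exp_w = [[math.exp(w) for w in row] for row in weights]
    if axis == 1:
        # Each row is divided by its own sum.
        return [[e / sum(row) for e in row] for row in exp_w]
    # axis == 0: each column is divided by its own sum.
    col_sums = [sum(col) for col in zip(*exp_w)]
    return [[e / s for e, s in zip(row, col_sums)] for row in exp_w]


att = [[0.0, 1.0], [2.0, 0.0]]
by_row = normalize(att, axis=1)  # every row sums to 1
by_col = normalize(att, axis=0)  # every column sums to 1
```

Normalizing the same score matrix along the two different axes is what yields the two attention directions (`alpha` and `beta`) in `build_model`.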
--- a/examples/training/train_ner.py
+++ b/examples/training/train_ner.py
@ -59,7 +59,7 @@ def main(model=None, output_dir=None, n_iter=100):
        # reset and initialize the weights randomly – but only if we're
        # training a new model
        if model is None:
-            optimizer = nlp.begin_training()
+            nlp.begin_training()
        for itn in range(n_iter):
            random.shuffle(TRAIN_DATA)
            losses = {}
--- a/examples/training/train_textcat.py
+++ b/examples/training/train_textcat.py
@ -90,7 +90,8 @@ def main(model=None, output_dir=None, n_iter=20, n_texts=2000):
        output_dir = Path(output_dir)
        if not output_dir.exists():
            output_dir.mkdir()
-        nlp.to_disk(output_dir)
+        with nlp.use_params(optimizer.averages):
+            nlp.to_disk(output_dir)
        print("Saved model to", output_dir)

        # test the saved model
--- a/setup.py
+++ b/setup.py
@ -98,6 +98,14 @@ if os.environ.get("USE_OPENMP", USE_OPENMP_DEFAULT) == "1":
        COMPILE_OPTIONS["other"].append("-fopenmp")
        LINK_OPTIONS["other"].append("-fopenmp")

+if sys.platform == "darwin":
+    # On Mac, use libc++ because Apple deprecated use of
+    # libstdc++.
+    COMPILE_OPTIONS["other"].append("-stdlib=libc++")
+    LINK_OPTIONS["other"].append("-lc++")
+    # g++ (used by unix compiler on mac) links to libstdc++ as a default lib.
+    # See: https://stackoverflow.com/questions/1653047/avoid-linking-to-libstdc
+    LINK_OPTIONS["other"].append("-nodefaultlibs")

 # By subclassing build_extensions we have the actual compiler that will be used which is really known only after finalize_options
 # http://stackoverflow.com/questions/724664/python-distutils-how-to-get-a-compiler-that-is-going-to-be-used
@ -183,6 +191,7 @@ def setup_package():
        for mod_name in MOD_NAMES:
            mod_path = mod_name.replace(".", "/") + ".cpp"
            extra_link_args = []
+            extra_compile_args = []
            # ???
            # Imported from patch from @mikepb
            # See Issue #267. Running blind here...
--- a/spacy/cli/converters/iob2json.py
+++ b/spacy/cli/converters/iob2json.py
@ -4,6 +4,8 @@ from __future__ import unicode_literals
 from ...gold import iob_to_biluo
 from ...util import minibatch

+import re
+

 def iob2json(input_data, n_sents=10, *args, **kwargs):
    """
@ -25,7 +27,8 @@ def read_iob(raw_sents):
    for line in raw_sents:
        if not line.strip():
            continue
-        tokens = [t.split("|") for t in line.split()]
+        # tokens = [t.split("|") for t in line.split()]
+        tokens = [re.split(r"[^\w\-]", line.strip())]
        if len(tokens[0]) == 3:
            words, pos, iob = zip(*tokens)
        else:
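Note on the tokenization change in `read_iob` above: the two expressions produce differently shaped results, which matters for the `len(tokens[0]) == 3` check that follows. The sketch below runs both on one made-up sample line; the sample text is an assumption for illustration, and the pattern is written as a raw string here.

```python
import re

line = "I|PRP|O like|VBP|O London|NNP|B-GPE"

# Previous expression: split on whitespace, then split each token on "|".
# Result: one [word, pos, iob] triple per token.
old_tokens = [t.split("|") for t in line.split()]

# New expression: a single flat split on any character that is neither a
# word character nor a hyphen, wrapped in a one-element list.
new_tokens = [re.split(r"[^\w\-]", line.strip())]

print(old_tokens)
print(new_tokens)
```

With the new expression, `tokens[0]` holds every field of the line in one flat list, so for this sample its length is 9 rather than 3. Multi-token lines in the `word|pos|iob` format may therefore take a different branch than before.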
--- a/spacy/lang/fa/stop_words.py
+++ b/spacy/lang/fa/stop_words.py
@ -1,6 +1,7 @@
 # coding: utf8
 from __future__ import unicode_literals

+# stop words from HAZM package

 # Stop words from HAZM package
 STOP_WORDS = set(
--- a/spacy/lang/fr/lemmatizer/_adjectives.py
+++ b/spacy/lang/fr/lemmatizer/_adjectives.py
--- a/spacy/lang/fr/lemmatizer/_adjectives_irreg.py
+++ b/spacy/lang/fr/lemmatizer/_adjectives_irreg.py
--- a/spacy/lang/fr/lemmatizer/_adverbs.py
+++ b/spacy/lang/fr/lemmatizer/_adverbs.py
--- a/spacy/lang/fr/lemmatizer/_auxiliary_verbs_irreg.py
+++ b/spacy/lang/fr/lemmatizer/_auxiliary_verbs_irreg.py
@ -3,6 +3,7 @@ from __future__ import unicode_literals


 AUXILIARY_VERBS_IRREG = {
+    "été": ("être",),
    "suis": ("être",),
    "es": ("être",),
    "est": ("être",),
@ -83,4 +84,286 @@ AUXILIARY_VERBS_IRREG = {
    "eussiez": ("avoir",),
    "eussent": ("avoir",),
    "ayant": ("avoir",),
+    "eu": ("avoir",),
+    "eue": ("avoir",),
+    "eues": ("avoir",),
+    "devaient": ("devoir",),
+    "devais": ("devoir",),
+    "devait": ("devoir",),
+    "devant": ("devoir",),
+    "devez": ("devoir",),
+    "deviez": ("devoir",),
+    "devions": ("devoir",),
+    "devons": ("devoir",),
+    "devra": ("devoir",),
+    "devrai": ("devoir",),
+    "devraient": ("devoir",),
+    "devrais": ("devoir",),
+    "devrait": ("devoir",),
+    "devras": ("devoir",),
+    "devrez": ("devoir",),
+    "devriez": ("devoir",),
+    "devrions": ("devoir",),
+    "devrons": ("devoir",),
+    "devront": ("devoir",),
+    "dois": ("devoir",),
+    "doit": ("devoir",),
+    "doive": ("devoir",),
+    "doivent": ("devoir",),
+    "doives": ("devoir",),
+    "dû": ("devoir",),
+    "due": ("devoir",),
+    "dues": ("devoir",),
+    "dûmes": ("devoir",),
+    "durent": ("devoir",),
+    "dus": ("devoir",),
+    "dûs": ("devoir",),
+    "dusse": ("devoir",),
+    "dussent": ("devoir",),
+    "dusses": ("devoir",),
+    "dussiez": ("devoir",),
+    "dussions": ("devoir",),
+    "dut": ("devoir",),
+    "dût": ("devoir",),
+    "dûtes": ("devoir",),
+    "peut": ("pouvoir",),
+    "peuvent": ("pouvoir",),
+    "peux": ("pouvoir",),
+    "pourraient": ("pouvoir",),
+    "pourrai": ("pouvoir",),
+    "pourrais": ("pouvoir",),
+    "pourrait": ("pouvoir",),
+    "pourra": ("pouvoir",),
+    "pourras": ("pouvoir",),
+    "pourrez": ("pouvoir",),
+    "pourriez": ("pouvoir",),
+    "pourrions": ("pouvoir",),
+    "pourrons": ("pouvoir",),
+    "pourront": ("pouvoir",),
+    "pouvaient": ("pouvoir",),
+    "pouvais": ("pouvoir",),
+    "pouvait": ("pouvoir",),
+    "pouvez": ("pouvoir",),
+    "pouviez": ("pouvoir",),
+    "pouvions": ("pouvoir",),
+    "pouvons": ("pouvoir",),
+    "pûmes": ("pouvoir",),
+    "pu": ("pouvoir",),
+    "purent": ("pouvoir",),
+    "pus": ("pouvoir",),
+    "pûtes": ("pouvoir",),
+    "put": ("pouvoir",),
+    "pouvant": ("pouvoir",),
+    "puisse": ("pouvoir",),
+    "puissions": ("pouvoir",),
+    "puissiez": ("pouvoir",),
+    "puissent": ("pouvoir",),
+    "pusse": ("pouvoir",),
+    "pusses": ("pouvoir",),
+    "pussions": ("pouvoir",),
+    "pussiez": ("pouvoir",),
+    "pussent": ("pouvoir",),
+    "faisaient": ("faire",),
+    "faisais": ("faire",),
+    "faisait": ("faire",),
+    "faisant": ("faire",),
+    "fais": ("faire",),
+    "faisiez": ("faire",),
+    "faisions": ("faire",),
+    "faisons": ("faire",),
+    "faite": ("faire",),
+    "faites": ("faire",),
+    "fait": ("faire",),
+    "faits": ("faire",),
+    "fasse": ("faire",),
+    "fassent": ("faire",),
+    "fasses": ("faire",),
+    "fassiez": ("faire",),
+    "fassions": ("faire",),
+    "fera": ("faire",),
+    "feraient": ("faire",),
+    "ferai": ("faire",),
+    "ferais": ("faire",),
+    "ferait": ("faire",),
+    "feras": ("faire",),
+    "ferez": ("faire",),
+    "feriez": ("faire",),
+    "ferions": ("faire",),
+    "ferons": ("faire",),
+    "feront": ("faire",),
+    "fîmes": ("faire",),
+    "firent": ("faire",),
+    "fis": ("faire",),
+    "fisse": ("faire",),
+    "fissent": ("faire",),
+    "fisses": ("faire",),
+    "fissiez": ("faire",),
+    "fissions": ("faire",),
+    "fîtes": ("faire",),
+    "fit": ("faire",),
+    "fît": ("faire",),
+    "font": ("faire",),
+    "veuillent": ("vouloir",),
+    "veuilles": ("vouloir",),
+    "veuille": ("vouloir",),
+    "veuillez": ("vouloir",),
+    "veuillons": ("vouloir",),
+    "veulent": ("vouloir",),
+    "veut": ("vouloir",),
+    "veux": ("vouloir",),
+    "voudraient": ("vouloir",),
+    "voudrais": ("vouloir",),
+    "voudrait": ("vouloir",),
+    "voudrai": ("vouloir",),
+    "voudras": ("vouloir",),
+    "voudra": ("vouloir",),
+    "voudrez": ("vouloir",),
+    "voudriez": ("vouloir",),
+    "voudrions": ("vouloir",),
+    "voudrons": ("vouloir",),
+    "voudront": ("vouloir",),
+    "voulaient": ("vouloir",),
+    "voulais": ("vouloir",),
+    "voulait": ("vouloir",),
+    "voulant": ("vouloir",),
+    "voulez": ("vouloir",),
+    "vouliez": ("vouloir",),
+    "voulions": ("vouloir",),
+    "voulons": ("vouloir",),
+    "voulues": ("vouloir",),
+    "voulue": ("vouloir",),
+    "voulûmes": ("vouloir",),
+    "voulurent": ("vouloir",),
+    "voulussent": ("vouloir",),
+    "voulusses": ("vouloir",),
+    "voulusse": ("vouloir",),
+    "voulussiez": ("vouloir",),
+    "voulussions": ("vouloir",),
+    "voulus": ("vouloir",),
+    "voulûtes": ("vouloir",),
+    "voulut": ("vouloir",),
+    "voulût": ("vouloir",),
+    "voulu": ("vouloir",),
+    "sachant": ("savoir",),
+    "sachent": ("savoir",),
+    "sache": ("savoir",),
+    "saches": ("savoir",),
+    "sachez": ("savoir",),
+    "sachiez": ("savoir",),
+    "sachions": ("savoir",),
+    "sachons": ("savoir",),
+    "sais": ("savoir",),
+    "sait": ("savoir",),
+    "sauraient": ("savoir",),
+    "saurai": ("savoir",),
+    "saurais": ("savoir",),
+    "saurait": ("savoir",),
+    "saura": ("savoir",),
+    "sauras": ("savoir",),
+    "saurez": ("savoir",),
+    "sauriez": ("savoir",),
+    "saurions": ("savoir",),
+    "saurons": ("savoir",),
+    "sauront": ("savoir",),
+    "savaient": ("savoir",),
+    "savais": ("savoir",),
+    "savait": ("savoir",),
+    "savent": ("savoir",),
+    "savez": ("savoir",),
+    "saviez": ("savoir",),
+    "savions": ("savoir",),
+    "savons": ("savoir",),
+    "sue": ("savoir",),
+    "sues": ("savoir",),
+    "sûmes": ("savoir",),
+    "surent": ("savoir",),
+    "su": ("savoir",),
+    "sus": ("savoir",),
+    "sussent": ("savoir",),
+    "susse": ("savoir",),
+    "susses": ("savoir",),
+    "sussiez": ("savoir",),
+    "sussions": ("savoir",),
+    "sûtes": ("savoir",),
+    "sut": ("savoir",),
+    "sût": ("savoir",),
+    "venaient": ("venir",),
+    "venais": ("venir",),
+    "venait": ("venir",),
+    "venant": ("venir",),
+    "venez": ("venir",),
+    "veniez": ("venir",),
+    "venions": ("venir",),
+    "venons": ("venir",),
+    "venues": ("venir",),
+    "venue": ("venir",),
+    "venus": ("venir",),
+    "venu": ("venir",),
+    "viendraient": ("venir",),
+    "viendrais": ("venir",),
+    "viendrait": ("venir",),
+    "viendrai": ("venir",),
+    "viendras": ("venir",),
+    "viendra": ("venir",),
+    "viendrez": ("venir",),
+    "viendriez": ("venir",),
+    "viendrions": ("venir",),
+    "viendrons": ("venir",),
+    "viendront": ("venir",),
+    "viennent": ("venir",),
+    "viennes": ("venir",),
+    "vienne": ("venir",),
+    "viens": ("venir",),
+    "vient": ("venir",),
+    "vînmes": ("venir",),
+    "vinrent": ("venir",),
+    "vinssent": ("venir",),
+    "vinsses": ("venir",),
+    "vinsse": ("venir",),
+    "vinssiez": ("venir",),
+    "vinssions": ("venir",),
+    "vins": ("venir",),
+    "vîntes": ("venir",),
+    "vint": ("venir",),
+    "vînt": ("venir",),
+    "aille": ("aller",),
+    "aillent": ("aller",),
+    "ailles": ("aller",),
+    "alla": ("aller",),
+    "allai": ("aller",),
+    "allaient": ("aller",),
+    "allais": ("aller",),
+    "allait": ("aller",),
+    "allâmes": ("aller",),
+    "allant": ("aller",),
+    "allas": ("aller",),
+    "allasse": ("aller",),
+    "allassent": ("aller",),
+    "allasses": ("aller",),
+    "allassiez": ("aller",),
+    "allassions": ("aller",),
+    "allât": ("aller",),
+    "allâtes": ("aller",),
+    "allé": ("aller",),
+    "allée": ("aller",),
+    "allées": ("aller",),
+    "allèrent": ("aller",),
+    "allés": ("aller",),
+    "allez": ("aller",),
+    "allons": ("aller",),
+    "ira": ("aller",),
+    "irai": ("aller",),
+    "iraient": ("aller",),
+    "irais": ("aller",),
+    "irait": ("aller",),
+    "iras": ("aller",),
+    "irez": ("aller",),
+    "iriez": ("aller",),
+    "irions": ("aller",),
+    "irons": ("aller",),
+    "iront": ("aller",),
+    "va": ("aller",),
+    "vais": ("aller",),
+    "vas": ("aller",),
+    "vont": ("aller",)
 }
--- a/spacy/lang/fr/lemmatizer/_lemma_rules.py
+++ b/spacy/lang/fr/lemmatizer/_lemma_rules.py
@ -2,10 +2,113 @@
 from __future__ import unicode_literals


-ADJECTIVE_RULES = [["s", ""], ["e", ""], ["es", ""]]
+ADJECTIVE_RULES = [
+    ["a", "a"],
+    ["aux", "al"],
+    ["c", "c"],
+    ["d", "d"],
+    ["e", ""],
+    ["é", "é"],
+    ["eux", "eux"],
+    ["f", "f"],
+    ["i", "i"],
+    ["ï", "ï"],
+    ["l", "l"],
+    ["m", "m"],
+    ["n", "n"],
+    ["o", "o"],
+    ["p", "p"],
+    ["r", "r"],
+    ["s", ""],
+    ["t", "t"],
+    ["u", "u"],
+    ["y", "y"],
+]


-NOUN_RULES = [["s", ""]]
+NOUN_RULES = [
+    ["a", "a"],
+    ["à", "à"],
+    ["â", "â"],
+    ["b", "b"],
+    ["c", "c"],
+    ["ç", "ç"],
+    ["d", "d"],
+    ["e", "e"],
+    ["é", "é"],
+    ["è", "è"],
+    ["ê", "ê"],
+    ["ë", "ë"],
+    ["f", "f"],
+    ["g", "g"],
+    ["h", "h"],
+    ["i", "i"],
+    ["î", "î"],
+    ["ï", "ï"],
+    ["j", "j"],
+    ["k", "k"],
+    ["l", "l"],
+    ["m", "m"],
+    ["n", "n"],
+    ["o", "o"],
+    ["ô", "ô"],
+    ["ö", "ö"],
+    ["p", "p"],
+    ["q", "q"],
+    ["r", "r"],
+    ["t", "t"],
+    ["u", "u"],
+    ["û", "û"],
+    ["v", "v"],
+    ["w", "w"],
+    ["y", "y"],
+    ["z", "z"],
+    ["as", "a"],
+    ["aux", "au"],
+    ["cs", "c"],
+    ["chs", "ch"],
+    ["ds", "d"],
+    ["és", "é"],
+    ["es", "e"],
+    ["eux", "eu"],
+    ["fs", "f"],
+    ["gs", "g"],
+    ["hs", "h"],
+    ["is", "i"],
+    ["ïs", "ï"],
+    ["js", "j"],
+    ["ks", "k"],
+    ["ls", "l"],
+    ["ms", "m"],
+    ["ns", "n"],
+    ["oux", "ou"],
+    ["os", "o"],
+    ["ps", "p"],
+    ["qs", "q"],
+    ["rs", "r"],
+    ["ses", "se"],
+    ["se", "se"],
+    ["ts", "t"],
+    ["us", "u"],
+    ["vs", "v"],
+    ["ws", "w"],
+    ["ys", "y"],
+    ["nt(e", "nt"],
+    ["nt(e)", "nt"],
+    ["al(e", "ale"],
+    ["é(", "é"],
+    ["é(e", "é"],
+    ["é.e", "é"],
+    ["el(le", "el"],
+    ["eurs(rices", "eur"],
+    ["eur(rice", "eur"],
+    ["eux(se", "eux"],
+    ["ial(e", "ial"],
+    ["er(ère", "er"],
+    ["eur(se", "eur"],
+    ["teur(trice", "teur"],
+    ["teurs(trices", "teur"],
+]


 VERB_RULES = [
@ -47,4 +150,11 @@ VERB_RULES = [
    ["assiez", "er"],
    ["assent", "er"],
    ["ant", "er"],
+    ["ante", "er"],
+    ["ants", "er"],
+    ["antes", "er"],
+    ["u(er", "u"],
+    ["és(ées", "er"],
+    ["é()e", "er"],
+    ["é()", "er"],
 ]
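Note on the rule lists above: each entry appears to be a `[suffix, replacement]` pair. The sketch below shows one plausible way such pairs can be applied, assuming a rule fires when the word ends with the suffix. `apply_suffix_rules` is a hypothetical helper, and the real lemmatizer likely does more (lookup tables, exception lists, filtering of candidates).

```python
def apply_suffix_rules(word, rules):
    """Return one candidate lemma for every rule whose suffix matches."""
    candidates = []
    for old, new in rules:
        if word.endswith(old):
            # Strip the matched suffix and append the replacement.
            candidates.append(word[: len(word) - len(old)] + new)
    return candidates


# A few pairs copied from ADJECTIVE_RULES and NOUN_RULES in this diff.
adjective_sample = [["aux", "al"], ["e", ""], ["s", ""]]
noun_sample = [["oux", "ou"], ["ts", "t"]]

print(apply_suffix_rules("nationaux", adjective_sample))
print(apply_suffix_rules("grandes", adjective_sample))
print(apply_suffix_rules("bijoux", noun_sample))
```

Under this reading, identity pairs such as `["a", "a"]` simply return the word unchanged as a candidate whenever it ends in that letter.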
--- a/spacy/lang/fr/lemmatizer/_nouns.py
+++ b/spacy/lang/fr/lemmatizer/_nouns.py
--- a/spacy/lang/fr/lemmatizer/_nouns_irreg.py
+++ b/spacy/lang/fr/lemmatizer/_nouns_irreg.py
--- a/spacy/lang/fr/lemmatizer/_verbs.py
+++ b/spacy/lang/fr/lemmatizer/_verbs.py
--- a/spacy/lang/fr/lemmatizer/_verbs_irreg.py
+++ b/spacy/lang/fr/lemmatizer/_verbs_irreg.py
--- a/spacy/lang/fr/tokenizer_exceptions.py
+++ b/spacy/lang/fr/tokenizer_exceptions.py
@ -94,15 +94,19 @@ for pre, pre_lemma in [("qu'", "que"), ("n'", "ne")]:

 _infixes_exc = []
 orig_elision = "'"
-orig_hyphen = '-'
+orig_hyphen = "-"

 # loop through the elision and hyphen characters, and try to substitute the ones that weren't used in the original list
 for infix in FR_BASE_EXCEPTIONS:
    variants_infix = {infix}
    for elision_char in [x for x in ELISION if x != orig_elision]:
-        variants_infix.update([word.replace(orig_elision, elision_char) for word in variants_infix])
-    for hyphen_char in [x for x in ['-', '‐'] if x != orig_hyphen]:
-        variants_infix.update([word.replace(orig_hyphen, hyphen_char) for word in variants_infix])
+        variants_infix.update(
+            [word.replace(orig_elision, elision_char) for word in variants_infix]
+        )
+    for hyphen_char in [x for x in ["-", "‐"] if x != orig_hyphen]:
+        variants_infix.update(
+            [word.replace(orig_hyphen, hyphen_char) for word in variants_infix]
+        )
    variants_infix.update([upper_first_letter(word) for word in variants_infix])
    _infixes_exc.extend(variants_infix)

@ -327,7 +331,9 @@ _regular_exp = [
    "^chape[{hyphen}]chut[{alpha}]+$".format(hyphen=HYPHENS, alpha=ALPHA_LOWER),
    "^down[{hyphen}]load[{alpha}]*$".format(hyphen=HYPHENS, alpha=ALPHA_LOWER),
    "^[ée]tats[{hyphen}]uni[{alpha}]*$".format(hyphen=HYPHENS, alpha=ALPHA_LOWER),
-    "^droits?[{hyphen}]de[{hyphen}]l'homm[{alpha}]+$".format(hyphen=HYPHENS, alpha=ALPHA_LOWER),
+    "^droits?[{hyphen}]de[{hyphen}]l'homm[{alpha}]+$".format(
+        hyphen=HYPHENS, alpha=ALPHA_LOWER
+    ),
    "^fac[{hyphen}]simil[{alpha}]*$".format(hyphen=HYPHENS, alpha=ALPHA_LOWER),
    "^fleur[{hyphen}]bleuis[{alpha}]+$".format(hyphen=HYPHENS, alpha=ALPHA_LOWER),
    "^flic[{hyphen}]flaqu[{alpha}]+$".format(hyphen=HYPHENS, alpha=ALPHA_LOWER),
@ -380,25 +386,32 @@ _regular_exp += [
 ]

 # catching cases like entr'abat
-_elision_prefix = ['r?é?entr', 'grande?s?', 'r']
+_elision_prefix = ["r?é?entr", "grande?s?", "r"]
 _regular_exp += [
    "^{prefix}[{elision}][{alpha}][{alpha}{elision}{hyphen}\-]*$".format(
-        prefix=p,
-        elision=ELISION,
-        hyphen=_other_hyphens,
-        alpha=ALPHA_LOWER,
+        prefix=p, elision=ELISION, hyphen=_other_hyphens, alpha=ALPHA_LOWER
    )
    for p in _elision_prefix
 ]

 # catching cases like saut-de-ski, pet-en-l'air
-_hyphen_combination = ['l[èe]s?', 'la', 'en', 'des?', 'd[eu]', 'sur', 'sous', 'aux?', 'à', 'et', "près", "saint"]
+_hyphen_combination = [
+    "l[èe]s?",
+    "la",
+    "en",
+    "des?",
+    "d[eu]",
+    "sur",
+    "sous",
+    "aux?",
+    "à",
+    "et",
+    "près",
+    "saint",
+]
 _regular_exp += [
    "^[{alpha}]+[{hyphen}]{hyphen_combo}[{hyphen}](?:l[{elision}])?[{alpha}]+$".format(
-        hyphen_combo=hc,
-        elision=ELISION,
-        hyphen=HYPHENS,
-        alpha=ALPHA_LOWER,
+        hyphen_combo=hc, elision=ELISION, hyphen=HYPHENS, alpha=ALPHA_LOWER
    )
    for hc in _hyphen_combination
 ]
--- a/spacy/lang/id/norm_exceptions.py
+++ b/spacy/lang/id/norm_exceptions.py
@ -1,3 +1,10 @@
+"""
+Slang and abbreviations
+
+Daftar kosakata yang sering salah dieja
+https://id.wikipedia.org/wiki/Wikipedia:Daftar_kosakata_bahasa_Indonesia_yang_sering_salah_dieja
+
+"""
 # coding: utf8
 from __future__ import unicode_literals

--- a/spacy/lang/id/stop_words.py
+++ b/spacy/lang/id/stop_words.py
@ -1,3 +1,6 @@
+"""
+List of stop words in Bahasa Indonesia.
+"""
 # coding: utf8
 from __future__ import unicode_literals

--- a/spacy/lang/id/tokenizer_exceptions.py
+++ b/spacy/lang/id/tokenizer_exceptions.py
@ -1,3 +1,7 @@
+"""
+Daftar singkatan dan Akronim dari:
+https://id.wiktionary.org/wiki/Wiktionary:Daftar_singkatan_dan_akronim_bahasa_Indonesia#A
+"""
 # coding: utf8
 from __future__ import unicode_literals

--- a/spacy/lang/nb/lemmatizer/_adjectives_wordforms.py
+++ b/spacy/lang/nb/lemmatizer/_adjectives_wordforms.py
--- a/spacy/lang/nb/lemmatizer/_adverbs_wordforms.py
+++ b/spacy/lang/nb/lemmatizer/_adverbs_wordforms.py
@ -1,6 +1,6 @@
 # coding: utf8
 """
-All wordforms are extracted from Norsk Ordbank in Norwegian Bokmål 2005
+All wordforms are extracted from Norsk Ordbank in Norwegian Bokmål 2005, updated 20180627
 (CLARINO NB - Språkbanken), Nasjonalbiblioteket, Norway:
 https://www.nb.no/sprakbanken/show?serial=oai%3Anb.no%3Asbr-5&lang=en
 License:
@ -15,9 +15,7 @@ ADVERBS_WORDFORMS = {
 'à la grecque': ('à la grecque',),
 'à la mode': ('à la mode',),
 'òg': ('òg',),
-'a': ('a',),
 'a cappella': ('a cappella',),
-'a conto': ('a conto',),
 'a konto': ('a konto',),
 'a posteriori': ('a posteriori',),
 'a prima vista': ('a prima vista',),
@ -34,6 +32,12 @@ ADVERBS_WORDFORMS = {
 'ad undas': ('ad undas',),
 'adagio': ('adagio',),
 'akkurat': ('akkurat',),
+'aktenfor': ('aktenfor',),
+'aktenfra': ('aktenfra',),
+'akter': ('akter',),
+'akterinn': ('akterinn',),
+'akterover': ('akterover',),
+'akterut': ('akterut',),
 'al fresco': ('al fresco',),
 'al secco': ('al secco',),
 'aldeles': ('aldeles',),
@ -46,6 +50,9 @@ ADVERBS_WORDFORMS = {
 'allegro': ('allegro',),
 'aller': ('aller',),
 'allerede': ('allerede',),
+'allesteds': ('allesteds',),
+'allestedsfra': ('allestedsfra',),
+'allestedshen': ('allestedshen',),
 'allikevel': ('allikevel',),
 'alltid': ('alltid',),
 'alltids': ('alltids',),
@ -60,8 +67,12 @@ ADVERBS_WORDFORMS = {
 'andelsvis': ('andelsvis',),
 'andfares': ('andfares',),
 'andføttes': ('andføttes',),
+'annensteds': ('annensteds',),
+'annenstedsfra': ('annenstedsfra',),
+'annenstedshen': ('annenstedshen',),
 'annetsteds': ('annetsteds',),
 'annetstedsfra': ('annetstedsfra',),
+'annetstedsfra': ('annetstedsfra',),
 'annetstedshen': ('annetstedshen',),
 'anno': ('anno',),
 'anslagsvis': ('anslagsvis',),
@ -72,21 +83,35 @@ ADVERBS_WORDFORMS = {
 'apropos': ('apropos',),
 'argende': ('argende',),
 'at': ('at',),
+'att': ('att',),
+'attende': ('attende',),
 'atter': ('atter',),
 'attpåtil': ('attpåtil',),
 'attåt': ('attåt',),
 'au': ('au',),
+'aust': ('aust',),
+'austa': ('austa',),
+'austafjells': ('austafjells',),
+'av gårde': ('av gårde',),
+'av sted': ('av sted',),
 'avdelingsvis': ('avdelingsvis',),
 'avdragsvis': ('avdragsvis',),
 'avhendes': ('avhendes',),
 'avhends': ('avhends',),
 'avsatsvis': ('avsatsvis',),
+'babord': ('babord',),
+'bakfra': ('bakfra',),
 'bakk': ('bakk',),
 'baklengs': ('baklengs',),
+'bakover': ('bakover',),
+'bakut': ('bakut',),
 'bare': ('bare',),
 'bataljonsvis': ('bataljonsvis',),
+'beint fram': ('beint fram',),
 'bekende': ('bekende',),
 'belgende': ('belgende',),
+'bent fram': ('bent fram',),
+'bent frem': ('bent frem',),
 'betids': ('betids',),
 'bi': ('bi',),
 'bidevind': ('bidevind',),
@ -102,17 +127,21 @@ ADVERBS_WORDFORMS = {
 'bom': ('bom',),
 'bommende': ('bommende',),
 'bona fide': ('bona fide',),
+'bort': ('bort',),
+'borte': ('borte',),
+'bortimot': ('bortimot',),
 'brennfort': ('brennfort',),
-'brutto': ('brutto',),
 'bråtevis': ('bråtevis',),
 'bums': ('bums',),
 'buntevis': ('buntevis',),
 'buntvis': ('buntvis',),
 'bus': ('bus',),
+'bygdimellom': ('bygdimellom',),
 'cantabile': ('cantabile',),
 'cf': ('cf',),
 'cif': ('cif',),
 'cirka': ('cirka',),
+'comme il faut': ('comme il faut',),
 'crescendo': ('crescendo',),
 'da': ('da',),
 'dagevis': ('dagevis',),
@ -127,18 +156,38 @@ ADVERBS_WORDFORMS = {
 'delkredere': ('delkredere',),
 'dels': ('dels',),
 'delvis': ('delvis',),
+'den gang': ('den gang',),
+'der': ('der',),
+'der borte': ('der borte',),
+'der hen': ('der hen',),
+'der inne': ('der inne',),
+'der nede': ('der nede',),
+'der oppe': ('der oppe',),
+'der ute': ('der ute',),
 'derav': ('derav',),
 'deretter': ('deretter',),
 'derfor': ('derfor',),
+'derfra': ('derfra',),
+'deri': ('deri',),
+'deriblant': ('deriblant',),
+'derifra': ('derifra',),
 'derimot': ('derimot',),
 'dermed': ('dermed',),
 'dernest': ('dernest',),
+'derom': ('derom',),
+'derpå': ('derpå',),
+'dertil': ('dertil',),
+'derved': ('derved',),
 'dess': ('dess',),
 'dessuten': ('dessuten',),
 'dessverre': ('dessverre',),
 'desto': ('desto',),
 'diminuendo': ('diminuendo',),
 'dis': ('dis',),
+'dit': ('dit',),
+'dit hen': ('dit hen',),
+'ditover': ('ditover',),
+'ditto': ('ditto',),
 'dog': ('dog',),
 'dolce': ('dolce',),
 'dorgende': ('dorgende',),
@ -158,10 +207,10 @@ ADVERBS_WORDFORMS = {
 'eitrende': ('eitrende',),
 'eks': ('eks',),
 'eksempelvis': ('eksempelvis',),
+'eksklusiv': ('eksklusiv',),
+'eksklusive': ('eksklusive',),
 'ekspress': ('ekspress',),
 'ekstempore': ('ekstempore',),
-'eldende': ('eldende',),
-'eldende': ('eldende',),
 'ellers': ('ellers',),
 'en': ('en',),
 'en bloc': ('en bloc',),
@ -175,6 +224,8 @@ ADVERBS_WORDFORMS = {
 'enda': ('enda',),
 'endatil': ('endatil',),
 'ende': ('ende',),
+'ende fram': ('ende fram',),
+'ende frem': ('ende frem',),
 'ender': ('ender',),
 'endog': ('endog',),
 'ene': ('ene',),
@ -183,10 +234,12 @@ ADVERBS_WORDFORMS = {
 'enkom': ('enkom',),
 'enn': ('enn',),
 'ennå': ('ennå',),
+'ensteds': ('ensteds',),
 'eo ipso': ('eo ipso',),
 'ergo': ('ergo',),
 'et cetera': ('et cetera',),
 'etappevis': ('etappevis',),
+'etsteds': ('etsteds',),
 'etterhånden': ('etterhånden',),
 'etterpå': ('etterpå',),
 'etterskottsvis': ('etterskottsvis',),
@ -195,9 +248,10 @@ ADVERBS_WORDFORMS = {
 'ex auditorio': ('ex auditorio',),
 'ex cathedra': ('ex cathedra',),
 'ex officio': ('ex officio',),
+'exit': ('exit',),
+'f.o.r.': ('f.o.r.',),
 'fas': ('fas',),
 'fatt': ('fatt',),
-'fatt': ('fatt',),
 'feil': ('feil',),
 'femti-femti': ('femti-femti',),
 'fifty-fifty': ('fifty-fifty',),
@ -208,44 +262,64 @@ ADVERBS_WORDFORMS = {
 'flunkende': ('flunkende',),
 'flust': ('flust',),
 'fly': ('fly',),
+'fløyten': ('fløyten',),
 'fob': ('fob',),
 'for': ('for',),
+'for hånden': ('for hånden',),
 'for lengst': ('for lengst',),
 'for resten': ('for resten',),
 'for så vidt': ('for så vidt',),
+'for tida': ('for tida',),
+'for tiden': ('for tiden',),
 'for visst': ('for visst',),
 'for øvrig': ('for øvrig',),
 'fordevind': ('fordevind',),
 'fordum': ('fordum',),
 'fore': ('fore',),
+'forfra': ('forfra',),
 'forhakkende': ('forhakkende',),
 'forholdsvis': ('forholdsvis',),
 'forhåpentlig': ('forhåpentlig',),
 'forhåpentligvis': ('forhåpentligvis',),
 'forlengs': ('forlengs',),
 'formelig': ('formelig',),
+'forover': ('forover',),
 'forresten': ('forresten',),
 'forsøksvis': ('forsøksvis',),
+'fort': ('fort',),
+'fortere': ('fort',),
+'fortest': ('fort',),
 'forte': ('forte',),
 'fortfarende': ('fortfarende',),
 'fortissimo': ('fortissimo',),
 'fortrinnsvis': ('fortrinnsvis',),
+'forut': ('forut',),
+'fra borde': ('fra borde',),
+'fram': ('fram',),
+'framføre': ('framføre',),
 'framleis': ('framleis',),
 'framlengs': ('framlengs',),
+'framme': ('framme',),
 'framstupes': ('framstupes',),
 'framstups': ('framstups',),
 'franko': ('franko',),
 'free on board': ('free on board',),
 'free on rail': ('free on rail',),
+'frem': ('frem',),
+'fremad': ('fremad',),
 'fremdeles': ('fremdeles',),
 'fremlengs': ('fremlengs',),
+'fremme': ('fremme',),
 'fremstupes': ('fremstupes',),
 'fremstups': ('fremstups',),
 'furioso': ('furioso',),
 'fylkesvis': ('fylkesvis',),
 'følgelig': ('følgelig',),
+'føre': ('føre',),
 'først': ('først',),
 'ganske': ('ganske',),
+'gardimellom': ('gardimellom',),
+'gatelangs': ('gatelangs',),
 'gid': ('gid',),
 'givetvis': ('givetvis',),
 'gjerne': ('gjerne',),
@ -267,17 +341,56 @@ ADVERBS_WORDFORMS = {
 'gørrende': ('gørrende',),
 'hakk': ('hakk',),
 'hakkende': ('hakkende',),
+'halvveges': ('halvveges',),
+'halvvegs': ('halvvegs',),
 'halvveis': ('halvveis',),
 'haugevis': ('haugevis',),
 'heden': ('heden',),
+'heim': ('heim',),
+'heim att': ('heim att',),
 'heiman': ('heiman',),
+'heime': ('heime',),
+'heimefra': ('heimefra',),
+'heimetter': ('heimetter',),
+'heimom': ('heimom',),
+'heimover': ('heimover',),
 'heldigvis': ('heldigvis',),
 'heller': ('heller',),
 'helst': ('helst',),
+'hen': ('hen',),
 'henholdsvis': ('henholdsvis',),
+'henne': ('henne',),
+'her': ('her',),
+'herav': ('herav',),
+'heretter': ('heretter',),
+'herfra': ('herfra',),
+'heri': ('heri',),
+'heriblant': ('heriblant',),
+'herifra': ('herifra',),
+'herigjennom': ('herigjennom',),
+'herimot': ('herimot',),
+'hermed': ('hermed',),
+'herom': ('herom',),
+'herover': ('herover',),
+'herpå': ('herpå',),
 'herre': ('herre',),
 'hersens': ('hersens',),
+'hertil': ('hertil',),
+'herunder': ('herunder',),
+'herved': ('herved',),
 'himlende': ('himlende',),
+'hisset': ('hisset',),
+'hist': ('hist',),
+'hit': ('hit',),
+'hitover': ('hitover',),
+'hittil': ('hittil',),
+'hjem': ('hjem',),
+'hjemad': ('hjemad',),
+'hjemetter': ('hjemetter',),
+'hjemme': ('hjemme',),
+'hjemmefra': ('hjemmefra',),
+'hjemom': ('hjemom',),
+'hjemover': ('hjemover',),
 'hodekulls': ('hodekulls',),
 'hodestupes': ('hodestupes',),
 'hodestups': ('hodestups',),
@ -288,15 +401,41 @@ ADVERBS_WORDFORMS = {
 'hundretusenvis': ('hundretusenvis',),
 'hundrevis': ('hundrevis',),
 'hurra-meg-rundt': ('hurra-meg-rundt',),
+'husimellom': ('husimellom',),
 'hvi': ('hvi',),
 'hvor': ('hvor',),
+'hvor hen': ('hvor hen',),
 'hvorav': ('hvorav',),
 'hvordan': ('hvordan',),
+'hvoretter': ('hvoretter',),
 'hvorfor': ('hvorfor',),
+'hvorfra': ('hvorfra',),
+'hvori': ('hvori',),
+'hvoriblant': ('hvoriblant',),
+'hvorimot': ('hvorimot',),
+'hvorledes': ('hvorledes',),
+'hvormed': ('hvormed',),
+'hvorom': ('hvorom',),
+'hvorpå': ('hvorpå',),
 'hånt': ('hånt',),
 'høylig': ('høylig',),
 'høyst': ('høyst',),
+'i aften': ('i aften',),
+'i aftes': ('i aftes',),
 'i alle fall': ('i alle fall',),
+'i dag': ('i dag',),
+'i fjor': ('i fjor',),
+'i fleng': ('i fleng',),
+'i forfjor': ('i forfjor',),
+'i forgårs': ('i forgårs',),
+'i gjerde': ('i gjerde',),
+'i gjære': ('i gjære',),
+'i grunnen': ('i grunnen',),
+'i går': ('i går',),
+'i hende': ('i hende',),
+'i hjel': ('i hjel',),
+'i hug': ('i hug',),
+'i huleste': ('i huleste',),
 'i stedet': ('i stedet',),
 'iallfall': ('iallfall',),
 'ibidem': ('ibidem',),
@ -304,7 +443,7 @@ ADVERBS_WORDFORMS = {
 'igjen': ('igjen',),
 'ikke': ('ikke',),
 'ildende': ('ildende',),
-'ildende': ('ildende',),
+'ille': ('ille',),
 'imens': ('imens',),
 'imidlertid': ('imidlertid',),
 'in absentia': ('in absentia',),
@ -334,10 +473,22 @@ ADVERBS_WORDFORMS = {
 'in vivo': ('in vivo',),
 'ingenlunde': ('ingenlunde',),
 'ingensteds': ('ingensteds',),
+'inklusiv': ('inklusiv',),
+'inklusive': ('inklusive',),
 'inkognito': ('inkognito',),
+'inn': ('inn',),
+'innad': ('innad',),
+'innafra': ('innafra',),
+'innalands': ('innalands',),
+'innaskjærs': ('innaskjærs',),
+'inne': ('inne',),
 'innenat': ('innenat',),
+'innenfra': ('innenfra',),
+'innenlands': ('innenlands',),
+'innenskjærs': ('innenskjærs',),
 'innledningsvis': ('innledningsvis',),
 'innleiingsvis': ('innleiingsvis',),
+'innomhus': ('innomhus',),
 'isteden': ('isteden',),
 'især': ('især',),
 'item': ('item',),
@ -380,12 +531,26 @@ ADVERBS_WORDFORMS = {
 'lagerfritt': ('lagerfritt',),
 'lagom': ('lagom',),
 'lagvis': ('lagvis',),
+'landimellom': ('landimellom',),
+'landverts': ('landverts',),
+'langt': ('langt',),
+'lenger': ('langt',),
+'lengst': ('langt',),
+'langveges': ('langveges',),
+'langvegesfra': ('langvegesfra',),
+'langvegs': ('langvegs',),
+'langvegsfra': ('langvegsfra',),
+'langveis': ('langveis',),
+'langveisfra': ('langveisfra',),
 'larghetto': ('larghetto',),
 'largo': ('largo',),
 'lassevis': ('lassevis',),
 'legato': ('legato',),
 'leilighetsvis': ('leilighetsvis',),
 'lell': ('lell',),
+'lenge': ('lenge',),
+'lenger': ('lenge',),
+'lengst': ('lenge',),
 'lenger': ('lenger',),
 'liddelig': ('liddelig',),
 'like': ('like',),
@ -408,19 +573,25 @@ ADVERBS_WORDFORMS = {
 'maestoso': ('maestoso',),
 'mala fide': ('mala fide',),
 'malapropos': ('malapropos',),
+'mannemellom': ('mannemellom',),
 'massevis': ('massevis',),
 'med rette': ('med rette',),
 'medio': ('medio',),
 'medium': ('medium',),
+'medsols': ('medsols',),
+'medstrøms': ('medstrøms',),
 'meget': ('meget',),
 'mengdevis': ('mengdevis',),
 'metervis': ('metervis',),
 'mezzoforte': ('mezzoforte',),
 'midsommers': ('midsommers',),
-'midsommers': ('midsommers',),
 'midt': ('midt',),
+'midtfjords': ('midtfjords',),
+'midtskips': ('midtskips',),
 'midtsommers': ('midtsommers',),
-'midtsommers': ('midtsommers',),
+'midtveges': ('midtveges',),
+'midtvegs': ('midtvegs',),
+'midtveis': ('midtveis',),
 'midtvinters': ('midtvinters',),
 'midvinters': ('midvinters',),
 'milevis': ('milevis',),
@ -445,6 +616,13 @@ ADVERBS_WORDFORMS = {
 'naturligvis': ('naturligvis',),
 'nauende': ('nauende',),
 'navnlig': ('navnlig',),
+'ned': ('ned',),
+'nedad': ('nedad',),
+'nedatil': ('nedatil',),
+'nede': ('nede',),
+'nedentil': ('nedentil',),
+'nedenunder': ('nedenunder',),
+'nedstrøms': ('nedstrøms',),
 'neigu': ('neigu',),
 'neimen': ('neimen',),
 'nemlig': ('nemlig',),
@ -452,31 +630,46 @@ ADVERBS_WORDFORMS = {
 'nesegrus': ('nesegrus',),
 'nest': ('nest',),
 'nesten': ('nesten',),
-'netto': ('netto',),
 'nettopp': ('nettopp',),
 'noenlunde': ('noenlunde',),
 'noensinne': ('noensinne',),
 'noensteds': ('noensteds',),
 'nok': ('nok',),
-'nok': ('nok',),
 'noksom': ('noksom',),
 'nokså': ('nokså',),
 'non stop': ('non stop',),
 'nonstop': ('nonstop',),
+'nord': ('nord',),
+'nordafjells': ('nordafjells',),
+'nordaust': ('nordaust',),
+'nordenfjells': ('nordenfjells',),
+'nordost': ('nordost',),
+'nordvest': ('nordvest',),
+'nordøst': ('nordøst',),
 'notabene': ('notabene',),
-'nu': ('nu',),
-'nylig': ('nylig',),
 'nyss': ('nyss',),
 'nå': ('nå',),
 'når': ('når',),
 'nåvel': ('nåvel',),
+'nær': ('nær',),
+'nærere': ('nær',),
+'nærmere': ('nær',),
+'nærest': ('nær',),
+'nærmest': ('nær',),
 'nære': ('nære',),
 'nærere': ('nærere',),
 'nærest': ('nærest',),
+'nærme': ('nærme',),
 'nærmere': ('nærmere',),
 'nærmest': ('nærmest',),
+'nødig': ('nødig',),
+'nødigere': ('nødig',),
+'nødigst': ('nødig',),
 'nødvendigvis': ('nødvendigvis',),
 'offside': ('offside',),
+'ofte': ('ofte',),
+'oftere': ('ofte',),
+'oftest': ('ofte',),
 'også': ('også',),
 'om att': ('om att',),
 'om igjen': ('om igjen',),
@ -485,11 +678,18 @@ ADVERBS_WORDFORMS = {
 'omsonst': ('omsonst',),
 'omtrent': ('omtrent',),
 'onnimellom': ('onnimellom',),
+'opp': ('opp',),
 'opp att': ('opp att',),
 'opp ned': ('opp ned',),
 'oppad': ('oppad',),
+'oppe': ('oppe',),
 'oppstrøms': ('oppstrøms',),
+'ost': ('ost',),
+'ovabords': ('ovabords',),
+'ovatil': ('ovatil',),
 'oven': ('oven',),
+'ovenbords': ('ovenbords',),
+'oventil': ('oventil',),
 'overalt': ('overalt',),
 'overens': ('overens',),
 'overhodet': ('overhodet',),
@ -506,8 +706,6 @@ ADVERBS_WORDFORMS = {
 'partout': ('partout',),
 'parvis': ('parvis',),
 'per capita': ('per capita',),
-'peu à peu': ('peu à peu',),
-'peu om peu': ('peu om peu',),
 'pianissimo': ('pianissimo',),
 'piano': ('piano',),
 'pinende': ('pinende',),
@ -554,7 +752,6 @@ ADVERBS_WORDFORMS = {
 'respektive': ('respektive',),
 'rettsøles': ('rettsøles',),
 'reverenter': ('reverenter',),
-'riktig nok': ('riktig nok',),
 'riktignok': ('riktignok',),
 'rimeligvis': ('rimeligvis',),
 'ringside': ('ringside',),
@ -567,6 +764,8 @@ ADVERBS_WORDFORMS = {
 'saktelig': ('saktelig',),
 'saktens': ('saktens',),
 'sammen': ('sammen',),
+'sammesteds': ('sammesteds',),
+'sammestedsfra': ('sammestedsfra',),
 'samstundes': ('samstundes',),
 'samt': ('samt',),
 'sann': ('sann',),
@ -578,6 +777,7 @@ ADVERBS_WORDFORMS = {
 'senhøstes': ('senhøstes',),
 'sia': ('sia',),
 'sic': ('sic',),
+'sidelangs': ('sidelangs',),
 'sidelengs': ('sidelengs',),
 'siden': ('siden',),
 'sideveges': ('sideveges',),
@ -587,9 +787,9 @@ ADVERBS_WORDFORMS = {
 'silde': ('silde',),
 'simpelthen': ('simpelthen',),
 'sine anno': ('sine anno',),
+'sistpå': ('sistpå',),
 'sjelden': ('sjelden',),
 'sjøleies': ('sjøleies',),
-'sjøleis': ('sjøleis',),
 'sjøverts': ('sjøverts',),
 'skeis': ('skeis',),
 'skiftevis': ('skiftevis',),
@ -607,6 +807,9 @@ ADVERBS_WORDFORMS = {
 'smekk': ('smekk',),
 'smellende': ('smellende',),
 'småningom': ('småningom',),
+'snart': ('snart',),
+'snarere': ('snart',),
+'snarest': ('snart',),
 'sneisevis': ('sneisevis',),
 'snesevis': ('snesevis',),
 'snuft': ('snuft',),
@ -616,6 +819,7 @@ ADVERBS_WORDFORMS = {
 'snyte': ('snyte',),
 'solo': ('solo',),
 'sommerstid': ('sommerstid',),
+'sommesteds': ('sommesteds',),
 'spenna': ('spenna',),
 'spent': ('spent',),
 'spika': ('spika',),
@ -651,6 +855,7 @@ ADVERBS_WORDFORMS = {
 'styggelig': ('styggelig',),
 'styggende': ('styggende',),
 'stykkevis': ('stykkevis',),
+'styrbord': ('styrbord',),
 'støtt': ('støtt',),
 'støtvis': ('støtvis',),
 'støytvis': ('støytvis',),
@ -658,6 +863,12 @@ ADVERBS_WORDFORMS = {
 'summa summarum': ('summa summarum',),
 'surr': ('surr',),
 'svinaktig': ('svinaktig',),
+'svint': ('svint',),
+'svintere': ('svint',),
+'svintest': ('svint',),
+'syd': ('syd',),
+'sydost': ('sydost',),
+'sydvest': ('sydvest',),
 'sydøst': ('sydøst',),
 'synderlig': ('synderlig',),
 'så': ('så',),
@ -672,6 +883,13 @@ ADVERBS_WORDFORMS = {
 'søkk': ('søkk',),
 'søkkende': ('søkkende',),
 'sønder': ('sønder',),
+'sønna': ('sønna',),
+'sønnafjells': ('sønnafjells',),
+'sønnenfjells': ('sønnenfjells',),
+'sør': ('sør',),
+'søraust': ('søraust',),
+'sørvest': ('sørvest',),
+'sørøst': ('sørøst',),
 'takimellom': ('takimellom',),
 'takomtil': ('takomtil',),
 'temmelig': ('temmelig',),
@ -679,10 +897,15 @@ ADVERBS_WORDFORMS = {
 'tidligdags': ('tidligdags',),
 'tidsnok': ('tidsnok',),
 'tidvis': ('tidvis',),
+'til like': ('til like',),
+'tilbake': ('tilbake',),
 'tilfeldigvis': ('tilfeldigvis',),
 'tilmed': ('tilmed',),
 'tilnærmelsesvis': ('tilnærmelsesvis',),
 'timevis': ('timevis',),
+'titt': ('titt',),
+'tiere': ('titt',),
+'tiest': ('titt',),
 'tjokkende': ('tjokkende',),
 'tomreipes': ('tomreipes',),
 'tott': ('tott',),
@ -695,44 +918,55 @@ ADVERBS_WORDFORMS = {
 'trutt': ('trutt',),
 'turevis': ('turevis',),
 'turvis': ('turvis',),
-'tusenfold': ('tusenfold',),
 'tusenvis': ('tusenvis',),
 'tvers': ('tvers',),
 'tvert': ('tvert',),
 'tydeligvis': ('tydeligvis',),
-'tynnevis': ('tynnevis',),
-'tynnevis': ('tynnevis',),
 'tålig': ('tålig',),
 'tønnevis': ('tønnevis',),
-'tønnevis': ('tønnevis',),
 'ufravendt': ('ufravendt',),
 'ugjerne': ('ugjerne',),
 'uheldigvis': ('uheldigvis',),
 'ukevis': ('ukevis',),
-'ukevis': ('ukevis',),
+'ultimo': ('ultimo',),
 'ulykkeligvis': ('ulykkeligvis',),
 'uløyves': ('uløyves',),
+'undas': ('undas',),
 'underhånden': ('underhånden',),
 'undertiden': ('undertiden',),
+'undervegs': ('undervegs',),
+'underveis': ('underveis',),
 'unntakelsesvis': ('unntakelsesvis',),
 'unntaksvis': ('unntaksvis',),
 'ustyggelig': ('ustyggelig',),
+'ut': ('ut',),
 'utaboks': ('utaboks',),
+'utad': ('utad',),
+'utalands': ('utalands',),
 'utbygdes': ('utbygdes',),
 'utdragsvis': ('utdragsvis',),
+'ute': ('ute',),
 'utelukkende': ('utelukkende',),
 'utenat': ('utenat',),
 'utenboks': ('utenboks',),
+'utenlands': ('utenlands',),
+'utomhus': ('utomhus',),
 'uvegerlig': ('uvegerlig',),
 'uviselig': ('uviselig',),
 'uvislig': ('uvislig',),
 'va banque': ('va banque',),
 'vanligvis': ('vanligvis',),
 'vann': ('vann',),
-'vekevis': ('vekevis',),
-'vekevis': ('vekevis',),
+'ved like': ('ved like',),
+'veggimellom': ('veggimellom',),
+'vekk': ('vekk',),
+'vekke': ('vekke',),
 'vekselvis': ('vekselvis',),
 'vel': ('vel',),
+'vest': ('vest',),
+'vesta': ('vesta',),
+'vestafjells': ('vestafjells',),
+'vestenfjells': ('vestenfjells',),
 'vibrato': ('vibrato',),
 'vice versa': ('vice versa',),
 'vide': ('vide',),
@ -741,7 +975,6 @@ ADVERBS_WORDFORMS = {
 'viselig': ('viselig',),
 'visselig': ('visselig',),
 'visst': ('visst',),
-'visst nok': ('visst nok',),
 'visstnok': ('visstnok',),
 'vivace': ('vivace',),
 'vonlig': ('vonlig',),
@ -754,40 +987,183 @@ ADVERBS_WORDFORMS = {
 'årlig års': ('årlig års',),
 'åssen': ('åssen',),
 'ørende': ('ørende',),
+'øst': ('øst',),
+'østa': ('østa',),
+'østafjells': ('østafjells',),
+'østenfjells': ('østenfjells',),
 'øyensynlig': ('øyensynlig',),
 'antageligvis': ('antageligvis',),
-'coolly': ('coolly',),
-'kor': ('kor',),
-'korfor': ('korfor',),
-'kor': ('kor',),
-'korfor': ('korfor',),
-'medels': ('medels',),
-'nasegrus': ('nasegrus',),
 'overimorgen': ('overimorgen',),
 'unntagelsesvis': ('unntagelsesvis',),
-'åffer': ('åffer',),
-'åffer': ('åffer',),
 'sist': ('sist',),
-'seinhaustes': ('seinhaustes',),
 'stetse': ('stetse',),
 'stikk': ('stikk',),
 'storlig': ('storlig',),
-'A': ('A',),
-'for': ('for',),
+'still going strong': ('still going strong',),
+'til og med': ('til og med',),
+'i hu': ('i hu',),
+'dengang': ('dengang',),
+'derborte': ('derborte',),
+'derefter': ('derefter',),
+'derinne': ('derinne',),
+'dernede': ('dernede',),
+'deromkring': ('deromkring',),
+'etterhvert': ('etterhvert',),
+'fordømrade': ('fordømrade',),
+'foreksempel': ('foreksempel',),
+'forsåvidt': ('forsåvidt',),
+'forøvrig': ('forøvrig',),
+'herefter': ('herefter',),
+'hvertfall': ('hvertfall',),
+'idag': ('idag',),
+'ifjor': ('ifjor',),
+'i gang': ('i gang',),
+'igår': ('igår',),
+'ihvertfall': ('ihvertfall',),
+'ikveld': ('ikveld',),
+'iland': ('iland',),
+'imorgen': ('imorgen',),
+'imøte': ('imøte',),
+'inatt': ('inatt',),
+'iorden': ('iorden',),
+'istand': ('istand',),
+'istedet': ('istedet',),
+'javisst': ('javisst',),
+'neivisst': ('neivisst',),
+'fortsatt': ('fortsatt',),
+'slik': ('slik',),
+'sådan': ('sådan',),
+'sånn': ('sånn',),
+'for eksempel': ('for eksempel',),
+'fra barnsbein av': ('fra barnsbein av',),
+'fra barnsben av': ('fra barnsben av',),
+'fra oven': ('fra oven',),
+'på vidvanke': ('på vidvanke',),
+'rubb og stubb': ('rubb og stubb',),
+'akterifra': ('akterifra',),
+'andsynes': ('andsynes',),
+'austenom': ('austenom',),
+'avslutningsvis': ('avslutningsvis',),
+'bøttevis': ('bøttevis',),
+'bakenfra': ('bakenfra',),
+'bakenom': ('bakenom',),
+'baki': ('baki',),
+'bedriftsvis': ('bedriftsvis',),
+'beklageligvis': ('beklageligvis',),
 'benveges': ('benveges',),
+'benveies': ('benveies',),
+'bistrende': ('bistrende',),
+'bitvis': ('bitvis',),
+'bortenom': ('bortenom',),
+'bortmed': ('bortmed',),
+'bråfort': ('bråfort',),
 'bunkevis': ('bunkevis',),
+'ca': ('ca',),
+'derigjennom': ('derigjennom',),
+'derover': ('derover',),
+'dessuaktet': ('dessuaktet',),
+'distriktsvis': ('distriktsvis',),
+'doloroso': ('doloroso',),
+'erfaringsvis': ('erfaringsvis',),
+'falskelig': ('falskelig',),
+'fjellstøtt': ('fjellstøtt',),
+'flekkvis': ('flekkvis',),
+'flerveis': ('flerveis',),
+'forholdvis': ('forholdvis',),
+'fornemmelig': ('fornemmelig',),
+'fornuftigvis': ('fornuftigvis',),
+'forsiktigvis': ('forsiktigvis',),
+'forskottsvis': ('forskottsvis',),
+'forskuddsvis': ('forskuddsvis',),
+'forutsetningsvis': ('forutsetningsvis',),
+'framt': ('framt',),
+'fremt': ('fremt',),
+'godhetsfullt': ('godhetsfullt',),
+'hvortil': ('hvortil',),
+'hvorunder': ('hvorunder',),
+'hvorved': ('hvorved',),
+'iltrende': ('iltrende',),
+'innatil': ('innatil',),
+'innentil': ('innentil',),
+'innigjennom': ('innigjennom',),
+'kilometervis': ('kilometervis',),
+'klattvis': ('klattvis',),
+'kolonnevis': ('kolonnevis',),
+'kommunevis': ('kommunevis',),
+'listelig': ('listelig',),
+'lusende': ('lusende',),
+'mildelig': ('mildelig',),
+'milevidt': ('milevidt',),
+'nordøstover': ('nordøstover',),
+'ovenover': ('ovenover',),
+'periodevis': ('periodevis',),
+'pirende': ('pirende',),
+'priori': ('priori',),
+'rettnok': ('rettnok',),
+'rykkvis': ('rykkvis',),
+'sørøstover': ('sørøstover',),
+'sørvestover': ('sørvestover',),
+'sedvanligvis': ('sedvanligvis',),
+'seksjonsvis': ('seksjonsvis',),
+'styggfort': ('styggfort',),
+'stykkomtil': ('stykkomtil',),
+'sydvestover': ('sydvestover',),
+'terminvis': ('terminvis',),
+'tertialvis': ('tertialvis',),
+'utdannelsesmessig': ('utdannelsesmessig',),
+'vis-à-vis': ('vis-à-vis',),
+'før': ('før',),
+'jo': ('jo',),
+'såvel': ('såvel',),
+'efterhvert': ('efterhvert',),
+'liksom': ('liksom',),
+'dann og vann': ('dann og vann',),
+'jaggu': ('jaggu',),
+'joggu': ('joggu',),
+'knekk': ('knekk',),
+'live': ('live',),
+'og': ('og',),
+'sabla': ('sabla',),
+'sikksakk': ('sikksakk',),
+'stadig': ('stadig',),
+'rett og slett': ('rett og slett',),
+'såvidt': ('såvidt',),
+'for moro skyld': ('for moro skyld',),
+'omlag': ('omlag',),
+'nattestid': ('nattestid',),
+'sørpe': ('sørpe',),
+'A.': ('A.',),
 'selv': ('selv',),
+'forlengst': ('forlengst',),
 'sjøl': ('sjøl',),
+'drita': ('drita',),
+'ennu': ('ennu',),
 'skauleies': ('skauleies',),
-'da capo': ('da capo',),
+'iallefall': ('iallefall',),
+'til alters': ('til alters',),
+'pokka': ('pokka',),
+'tilslutt': ('tilslutt',),
+'i steden': ('i steden',),
+'m.a.': ('m.a.',),
+'til syvende og sist': ('til syvende og sist',),
+'i en fei': ('i en fei',),
+'ender og da': ('ender og da',),
+'ender og gang': ('ender og gang',),
+'fra arilds tid': ('fra arilds tid',),
+'i hør og heim': ('i hør og heim',),
+'for fote': ('for fote',),
+'natterstid': ('natterstid',),
+'natterstider': ('natterstider',),
+'høgstdags': ('høgstdags',),
+'høgstnattes': ('høgstnattes',),
 'beint frem': ('beint frem',),
-'beintfrem': ('beintfrem',),
 'beinveges': ('beinveges',),
 'beinvegs': ('beinvegs',),
 'beinveis': ('beinveis',),
 'benvegs': ('benvegs',),
 'benveis': ('benveis',),
 'en garde': ('en garde',),
+'etter hvert': ('etter hvert',),
 'framåt': ('framåt',),
 'krittende': ('krittende',),
 'kvivitt': ('kvivitt',),
@ -801,5 +1177,14 @@ ADVERBS_WORDFORMS = {
 'til sammen': ('til sammen',),
 'tomrepes': ('tomrepes',),
 'medurs': ('medurs',),
-'moturs': ('moturs',)
+'moturs': ('moturs',),
+'til ansvar': ('til ansvar',),
+'til ansvars': ('til ansvars',),
+'til fullnads': ('til fullnads',),
+'concertando': ('concertando',),
+'lesto': ('lesto',),
+'tardando': ('tardando',),
+'natters tid': ('natters tid',),
+'natters tider': ('natters tider',),
+'snydens': ('snydens',)
 }
--- a/spacy/lang/nb/lemmatizer/_nouns_wordforms.py
+++ b/spacy/lang/nb/lemmatizer/_nouns_wordforms.py
--- a/spacy/lang/nb/lemmatizer/_verbs_wordforms.py
+++ b/spacy/lang/nb/lemmatizer/_verbs_wordforms.py
--- a/spacy/lang/nb/lemmatizer/lookup.py
+++ b/spacy/lang/nb/lemmatizer/lookup.py
--- a/spacy/lang/tl/__init__.py
+++ b/spacy/lang/tl/__init__.py
@ -0,0 +1,73 @@
+# coding: utf8
+from __future__ import unicode_literals
+
+from .tokenizer_exceptions import TOKENIZER_EXCEPTIONS
+from .stop_words import STOP_WORDS
+from .lex_attrs import LEX_ATTRS
+
+# uncomment if files are available
+# from .norm_exceptions import NORM_EXCEPTIONS
+from .tag_map import TAG_MAP
+# from .morph_rules import MORPH_RULES
+
+# uncomment if lookup-based lemmatizer is available
+from .lemmatizer import LOOKUP
+# from ...lemmatizerlookup import Lemmatizer
+
+from .tokenizer_exceptions import TOKENIZER_EXCEPTIONS
+from ..tokenizer_exceptions import BASE_EXCEPTIONS
+from ..norm_exceptions import BASE_NORMS
+from ...language import Language
+from ...attrs import LANG, NORM
+from ...util import update_exc, add_lookups
+
+def _return_tl(_):
+    return 'tl'
+
+
+# Create a Language subclass
+# Documentation: https://spacy.io/docs/usage/adding-languages
+
+# This file should be placed in spacy/lang/xx (ISO code of language).
+# Before submitting a pull request, make sure to remove all comments from the
+# language data files, and run at least the basic tokenizer tests. Simply add the
+# language ID to the list of languages in spacy/tests/conftest.py to include it
+# in the basic tokenizer sanity tests. You can optionally add a fixture for the
+# language's tokenizer and add more specific tests. For more info, see the
+# tests documentation: https://github.com/explosion/spaCy/tree/master/spacy/tests
+
+
+class TagalogDefaults(Language.Defaults):
+    lex_attr_getters = dict(Language.Defaults.lex_attr_getters)
+    lex_attr_getters[LANG] = _return_tl # ISO code
+    # add more norm exception dictionaries here
+    lex_attr_getters[NORM] = add_lookups(Language.Defaults.lex_attr_getters[NORM], BASE_NORMS)
+
+    # overwrite functions for lexical attributes
+    lex_attr_getters.update(LEX_ATTRS)
+
+    # add custom tokenizer exceptions to base exceptions
+    tokenizer_exceptions = update_exc(BASE_EXCEPTIONS, TOKENIZER_EXCEPTIONS)
+
+    # add stop words
+    stop_words = STOP_WORDS
+
+    # if available: add tag map
+    # tag_map = dict(TAG_MAP)
+
+    # if available: add morph rules
+    # morph_rules = dict(MORPH_RULES)
+
+    # if available: add lookup lemmatizer
+    # @classmethod
+    # def create_lemmatizer(cls, nlp=None):
+    #     return Lemmatizer(LOOKUP)
+
+
+class Tagalog(Language):
+    lang = 'tl' # ISO code
+    Defaults = TagalogDefaults # set Defaults to custom language defaults
+
+
+# set default export – this allows the language class to be lazy-loaded
+__all__ = ['Tagalog']
--- a/spacy/lang/tl/lemmatizer.py
+++ b/spacy/lang/tl/lemmatizer.py
@ -0,0 +1,18 @@
+# coding: utf8
+from __future__ import unicode_literals
+
+
+# Adding a lemmatizer lookup table
+# Documentation: https://spacy.io/docs/usage/adding-languages#lemmatizer
+# Entries should be added in the following format:
+
+
+LOOKUP = {
+    "kaugnayan": "ugnay",
+    "sangkatauhan": "tao",
+    "kanayunan": "nayon",
+    "pandaigdigan": "daigdig",
+    "kasaysayan": "saysay",
+    "kabayanihan": "bayani",
+    "karuwagan": "duwag"
+}
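
Review note: a minimal sketch of how a lookup table like the one above could be applied. `lookup_lemma` is a hypothetical helper introduced only for illustration; it is not spaCy's `Lemmatizer` API, and the table is a shortened copy of the entries in the diff.

```python
# Shortened copy of the LOOKUP table from the diff.
LOOKUP = {
    "kaugnayan": "ugnay",
    "sangkatauhan": "tao",
    "kanayunan": "nayon",
}


def lookup_lemma(word, table=LOOKUP):
    """Hypothetical helper: return the table entry, or the word itself if absent."""
    return table.get(word, word)


print(lookup_lemma("kaugnayan"))  # word with an entry in the table
print(lookup_lemma("aso"))        # word without an entry falls back to itself
```

Falling back to the surface form is a design choice assumed here so that unknown words pass through unchanged.
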
--- a/spacy/lang/tl/lex_attrs.py
+++ b/spacy/lang/tl/lex_attrs.py
@ -0,0 +1,43 @@
+# coding: utf8
+from __future__ import unicode_literals
+
+# import the symbols for the attrs you want to overwrite
+from ...attrs import LIKE_NUM
+
+
+# Overwriting functions for lexical attributes
+# Documentation: https://localhost:1234/docs/usage/adding-languages#lex-attrs
+# Most of these functions, like is_lower or like_url, should be language-
+# independent. Others, like like_num (which includes both digits and number
+# words), require customisation.
+
+
+# Example: check if token resembles a number
+
+_num_words = ['sero', 'isa', 'dalawa', 'tatlo', 'apat', 'lima', 'anim', 'pito',
+              'walo', 'siyam', 'sampu', 'labing-isa', 'labindalawa', 'labintatlo', 'labing-apat',
+              'labinlima', 'labing-anim', 'labimpito', 'labing-walo', 'labinsiyam', 'dalawampu',
+              'tatlumpu', 'apatnapu', 'limampu', 'animnapu', 'pitumpu', 'walumpu', 'siyamnapu',
+              'daan', 'libo', 'milyon', 'bilyon', 'trilyon', 'quadrilyon',
+              'gajilyon', 'bazilyon']
+
+
+def like_num(text):
+    text = text.replace(',', '').replace('.', '')
+    if text.isdigit():
+        return True
+    if text.count('/') == 1:
+        num, denom = text.split('/')
+        if num.isdigit() and denom.isdigit():
+            return True
+    if text in _num_words:
+        return True
+    return False
+
+
+# Create dictionary of functions to overwrite. The default lex_attr_getters are
+# updated with this one, so only the functions defined here are overwritten.
+
+LEX_ATTRS = {
+    LIKE_NUM: like_num
+}
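
Note on `like_num` above: the function strips `,` and `.` before testing for digits, accepts simple fractions, and then falls back to an exact, case-sensitive match against the word list. A standalone sketch of the same logic, with a shortened word list chosen only for illustration:

```python
# Shortened word list for illustration; the diff defines the full list.
_num_words = ['sero', 'isa', 'dalawa', 'tatlo', 'sampu', 'daan', 'libo']


def like_num(text):
    # Remove thousands and decimal separators before the digit check.
    text = text.replace(',', '').replace('.', '')
    if text.isdigit():
        return True
    # Accept simple fractions such as "1/2".
    if text.count('/') == 1:
        num, denom = text.split('/')
        if num.isdigit() and denom.isdigit():
            return True
    # Exact match only: the comparison is case-sensitive as written.
    return text in _num_words


for sample in ['1,000.50', '1/2', 'dalawa', 'Dalawa', 'aso']:
    print(sample, like_num(sample))
```

Because the word lookup is an exact match, a capitalised number word at the start of a sentence is not recognised. Whether that is intended is not stated in the diff.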
--- a/spacy/lang/tl/stop_words.py
+++ b/spacy/lang/tl/stop_words.py
@ -0,0 +1,162 @@
+# encoding: utf8
+from __future__ import unicode_literals
+
+
+# Add stop words
+# Documentation: https://spacy.io/docs/usage/adding-languages#stop-words
+# To improve readability, words should be ordered alphabetically and separated
+# by spaces and newlines. When adding stop words from an online source, always
+# include the link in a comment. Make sure to proofread and double-check the
+# words – lists available online are often known to contain mistakes.
+
+# data from https://github.com/stopwords-iso/stopwords-tl/blob/master/stopwords-tl.txt
+
+STOP_WORDS = set("""
+    akin
+    aking
+    ako
+    alin
+    am
+    amin
+    aming
+    ang
+    ano
+    anumang
+    apat
+    at
+    atin
+    ating
+    ay
+    bababa
+    bago
+    bakit
+    bawat
+    bilang
+    dahil
+    dalawa
+    dapat
+    din
+    dito
+    doon
+    gagawin
+    gayunman
+    ginagawa
+    ginawa
+    ginawang
+    gumawa
+    gusto
+    habang
+    hanggang
+    hindi
+    huwag
+    iba
+    ibaba
+    ibabaw
+    ibig
+    ikaw
+    ilagay
+    ilalim
+    ilan
+    inyong
+    isa
+    isang
+    itaas
+    ito
+    iyo
+    iyon
+    iyong
+    ka
+    kahit
+    kailangan
+    kailanman
+    kami
+    kanila
+    kanilang
+    kanino
+    kanya
+    kanyang
+    kapag
+    kapwa
+    karamihan
+    katiyakan
+    katulad
+    kaya
+    kaysa
+    ko
+    kong
+    kulang
+    kumuha
+    kung
+    laban
+    lahat
+    lamang
+    likod
+    lima
+    maaari
+    maaaring
+    maging
+    mahusay
+    makita
+    marami
+    marapat
+    masyado
+    may
+    mayroon
+    mga
+    minsan
+    mismo
+    mula
+    muli
+    na
+    nabanggit
+    naging
+    nagkaroon
+    nais
+    nakita
+    namin
+    napaka
+    narito
+    nasaan
+    ng
+    ngayon
+    ni
+    nila
+    nilang
+    nito
+    niya
+    niyang
+    noon
+    o
+    pa
+    paano
+    pababa
+    paggawa
+    pagitan
+    pagkakaroon
+    pagkatapos
+    palabas
+    pamamagitan
+    panahon
+    pangalawa
+    para
+    paraan
+    pareho
+    pataas
+    pero
+    pumunta
+    pumupunta
+    sa
+    saan
+    sabi
+    sabihin
+    sarili
+    sila
+    sino
+    siya
+    tatlo
+    tayo
+    tulad
+    tungkol
+    una
+    walang
+""".split())
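
Note on the `set("""...""".split())` idiom above: `str.split()` with no arguments splits on any run of whitespace and drops empty strings, so indentation and line breaks inside the triple-quoted block do not matter. A small sketch using four words from the list:

```python
# Four entries from the Tagalog stop list, laid out unevenly on purpose.
STOP_WORDS = set("""
    akin
    aking
    ako   alin
""".split())

print(sorted(STOP_WORDS))
print("ako" in STOP_WORDS)
```

Building a set also removes accidental duplicates, which helps when a list is copied from an online source.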
--- a/spacy/lang/tl/tag_map.py
+++ b/spacy/lang/tl/tag_map.py
@ -0,0 +1,36 @@
+# coding: utf8
+from __future__ import unicode_literals
+
+from ...symbols import POS, ADV, NOUN, ADP, PRON, SCONJ, PROPN, DET, SYM, INTJ
+from ...symbols import PUNCT, NUM, AUX, X, CONJ, ADJ, VERB, PART, SPACE, CCONJ
+
+
+# Add a tag map
+# Documentation: https://spacy.io/docs/usage/adding-languages#tag-map
+# Universal Dependencies: http://universaldependencies.org/u/pos/all.html
+# The keys of the tag map should be strings in your tag set. The dictionary must
+# have an entry POS whose value is one of the Universal Dependencies tags.
+# Optionally, you can also include morphological features or other attributes.
+
+
+TAG_MAP = {
+    "ADV":      {POS: ADV},
+    "NOUN":     {POS: NOUN},
+    "ADP":      {POS: ADP},
+    "PRON":     {POS: PRON},
+    "SCONJ":    {POS: SCONJ},
+    "PROPN":    {POS: PROPN},
+    "DET":      {POS: DET},
+    "SYM":      {POS: SYM},
+    "INTJ":     {POS: INTJ},
+    "PUNCT":    {POS: PUNCT},
+    "NUM":      {POS: NUM},
+    "AUX":      {POS: AUX},
+    "X":        {POS: X},
+    "CONJ":     {POS: CONJ},
+    "CCONJ":    {POS: CCONJ},
+    "ADJ":      {POS: ADJ},
+    "VERB":     {POS: VERB},
+    "PART":     {POS: PART},
+    "SP":       {POS: SPACE}
+}
--- a/spacy/lang/tl/tokenizer_exceptions.py
+++ b/spacy/lang/tl/tokenizer_exceptions.py
@ -0,0 +1,48 @@
+# coding: utf8
+from __future__ import unicode_literals
+
+# import symbols – if you need to use more, add them here
+from ...symbols import ORTH, LEMMA, TAG, NORM, ADP, DET
+
+
+# Add tokenizer exceptions
+# Documentation: https://spacy.io/docs/usage/adding-languages#tokenizer-exceptions
+# Feel free to use custom logic to generate repetitive exceptions more efficiently.
+# If an exception is split into more than one token, the ORTH values combined always
+# need to match the original string.
+
+# Exceptions should be added in the following format:
+
+_exc = {
+    "tayo'y": [
+        {ORTH: "tayo", LEMMA: "tayo"},
+        {ORTH: "'y", LEMMA: "ay"}],
+    "isa'y": [
+        {ORTH: "isa", LEMMA: "isa"},
+        {ORTH: "'y", LEMMA: "ay"}],
+    "baya'y": [
+        {ORTH: "baya", LEMMA: "bayan"},
+        {ORTH: "'y", LEMMA: "ay"}],
+    "sa'yo": [
+        {ORTH: "sa", LEMMA: "sa"},
+        {ORTH: "'yo", LEMMA: "iyo"}],
+    "ano'ng": [
+        {ORTH: "ano", LEMMA: "ano"},
+        {ORTH: "'ng", LEMMA: "ang"}],
+    "siya'y": [
+        {ORTH: "siya", LEMMA: "siya"},
+        {ORTH: "'y", LEMMA: "ay"}],
+    "nawa'y": [
+        {ORTH: "nawa", LEMMA: "nawa"},
+        {ORTH: "'y", LEMMA: "ay"}],
+    "papa'no": [
+        {ORTH: "papa'no", LEMMA: "papaano"}],
+    "'di": [
+        {ORTH: "'di", LEMMA: "hindi"}]
+}
+
+
+# To keep things clean and readable, it's recommended to only declare the
+# TOKENIZER_EXCEPTIONS at the bottom:
+
+TOKENIZER_EXCEPTIONS = _exc
--- a/spacy/matcher.pyx
+++ b/spacy/matcher.pyx
@ -291,6 +291,8 @@ cdef char get_quantifier(PatternStateC state) nogil:

 DEF PADDING = 5

+DEF PADDING = 5
+

 cdef TokenPatternC* init_pattern(Pool mem, attr_t entity_id,
                                 object token_specs) except NULL:
--- a/spacy/tests/doc/test_doc_api.py
+++ b/spacy/tests/doc/test_doc_api.py
@ -189,6 +189,25 @@ def test_doc_api_merge(en_tokenizer):
    assert doc[5].text_with_ws == "all night"
    assert doc[5].tag_ == "NAMED"

+    # merge both with bulk merge
+    doc = en_tokenizer(text)
+    assert len(doc) == 9
+    with doc.retokenize() as retokenizer:
+        retokenizer.merge(
+            doc[4:7], attrs={"tag": "NAMED", "lemma": "LEMMA", "ent_type": "TYPE"}
+        )
+        retokenizer.merge(
+            doc[7:9], attrs={"tag": "NAMED", "lemma": "LEMMA", "ent_type": "TYPE"}
+        )
+
+    assert len(doc) == 6
+    assert doc[4].text == "the beach boys"
+    assert doc[4].text_with_ws == "the beach boys "
+    assert doc[4].tag_ == "NAMED"
+    assert doc[5].text == "all night"
+    assert doc[5].text_with_ws == "all night"
+    assert doc[5].tag_ == "NAMED"
+

 def test_doc_api_merge_children(en_tokenizer):
    """Test that attachments work correctly after merging."""
--- a/spacy/tests/doc/test_span_merge.py
+++ b/spacy/tests/doc/test_span_merge.py
@ -67,6 +67,22 @@ def test_spans_merge_non_disjoint(en_tokenizer):
            )


+def test_spans_merge_non_disjoint(en_tokenizer):
+    text = "Los Angeles start."
+    tokens = en_tokenizer(text)
+    doc = get_doc(tokens.vocab, [t.text for t in tokens])
+    with pytest.raises(ValueError):
+        with doc.retokenize() as retokenizer:
+            retokenizer.merge(
+                doc[0:2],
+                attrs={"tag": "NNP", "lemma": "Los Angeles", "ent_type": "GPE"},
+            )
+            retokenizer.merge(
+                doc[0:1],
+                attrs={"tag": "NNP", "lemma": "Los Angeles", "ent_type": "GPE"},
+            )
+
+
 def test_span_np_merges(en_tokenizer):
    text = "displaCy is a parse tool built with Javascript"
    heads = [1, 0, 2, 1, -3, -1, -1, -1]
--- a/spacy/tests/lang/fr/test_exceptions.py
+++ b/spacy/tests/lang/fr/test_exceptions.py
@ -5,15 +5,36 @@ import pytest


@pytest.mark.parametrize(
-    "text", ["aujourd'hui", "Aujourd'hui", "prud'hommes", "prud’hommal",
-             "audio-numérique", "Audio-numérique",
-             "entr'amis", "entr'abat", "rentr'ouvertes", "grand'hamien",
-             "Châteauneuf-la-Forêt", "Château-Guibert",
-             "11-septembre", "11-Septembre", "refox-trottâmes",
-             "K-POP", "K-Pop", "K-pop", "z'yeutes",
-             "black-outeront", "états-unienne",
-             "courtes-pattes", "court-pattes",
-             "saut-de-ski", "Écourt-Saint-Quentin", "Bout-de-l'Îlien", "pet-en-l'air"]
+    "text",
+    [
+        "aujourd'hui",
+        "Aujourd'hui",
+        "prud'hommes",
+        "prud’hommal",
+        "audio-numérique",
+        "Audio-numérique",
+        "entr'amis",
+        "entr'abat",
+        "rentr'ouvertes",
+        "grand'hamien",
+        "Châteauneuf-la-Forêt",
+        "Château-Guibert",
+        "11-septembre",
+        "11-Septembre",
+        "refox-trottâmes",
+        "K-POP",
+        "K-Pop",
+        "K-pop",
+        "z'yeutes",
+        "black-outeront",
+        "états-unienne",
+        "courtes-pattes",
+        "court-pattes",
+        "saut-de-ski",
+        "Écourt-Saint-Quentin",
+        "Bout-de-l'Îlien",
+        "pet-en-l'air",
+    ],
 )
 def test_fr_tokenizer_infix_exceptions(fr_tokenizer, text):
    tokens = fr_tokenizer(text)
--- a/spacy/tests/regression/_test_issue1622.py
+++ b/spacy/tests/regression/_test_issue1622.py
@ -0,0 +1,89 @@
+# coding: utf-8
+from __future__ import unicode_literals
+import json
+from tempfile import NamedTemporaryFile
+import pytest
+
+from ...cli.train import train
+
+
+def test_cli_trained_model_can_be_saved(tmpdir):
+    lang = 'nl'
+    output_dir = str(tmpdir)
+    train_file = NamedTemporaryFile('wb', dir=output_dir, delete=False)
+    train_corpus = [
+        {
+            "id": "identifier_0",
+            "paragraphs": [
+                {
+                    "raw": "Jan houdt van Marie.\n",
+                    "sentences": [
+                        {
+                            "tokens": [
+                                {
+                                    "id": 0,
+                                    "dep": "nsubj",
+                                    "head": 1,
+                                    "tag": "NOUN",
+                                    "orth": "Jan",
+                                    "ner": "B-PER"
+                                },
+                                {
+                                    "id": 1,
+                                    "dep": "ROOT",
+                                    "head": 0,
+                                    "tag": "VERB",
+                                    "orth": "houdt",
+                                    "ner": "O"
+                                },
+                                {
+                                    "id": 2,
+                                    "dep": "case",
+                                    "head": 1,
+                                    "tag": "ADP",
+                                    "orth": "van",
+                                    "ner": "O"
+                                },
+                                {
+                                    "id": 3,
+                                    "dep": "obj",
+                                    "head": -2,
+                                    "tag": "NOUN",
+                                    "orth": "Marie",
+                                    "ner": "B-PER"
+                                },
+                                {
+                                    "id": 4,
+                                    "dep": "punct",
+                                    "head": -3,
+                                    "tag": "PUNCT",
+                                    "orth": ".",
+                                    "ner": "O"
+                                },
+                                {
+                                    "id": 5,
+                                    "dep": "",
+                                    "head": -1,
+                                    "tag": "SPACE",
+                                    "orth": "\n",
+                                    "ner": "O"
+                                }
+                            ],
+                            "brackets": []
+                        }
+                    ]
+                }
+            ]
+        }
+    ]
+
+    train_file.write(json.dumps(train_corpus).encode('utf-8'))
+    train_file.close()
+    train_data = train_file.name
+    dev_data = train_data
+
+    # spacy train -n 1 -g -1 nl output_nl training_corpus.json training \
+    # corpus.json
+    train(lang, output_dir, train_data, dev_data, n_iter=1)
+
+    assert True
--- a/spacy/tests/regression/test_issue2800.py
+++ b/spacy/tests/regression/test_issue2800.py
@ -0,0 +1,36 @@
+'''Test issue that arises when too many labels are added to NER model.'''
+from __future__ import unicode_literals
+
+import random
+from ...lang.en import English
+
+def train_model(train_data, entity_types):
+    nlp = English(pipeline=[])
+
+    ner = nlp.create_pipe("ner")
+    nlp.add_pipe(ner)
+
+    for entity_type in list(entity_types):
+        ner.add_label(entity_type)
+
+    optimizer = nlp.begin_training()
+
+    # Start training
+    for i in range(20):
+        losses = {}
+        index = 0
+        random.shuffle(train_data)
+
+        for statement, entities in train_data:
+            nlp.update([statement], [entities], sgd=optimizer, losses=losses, drop=0.5)
+    return nlp
+
+
+def test_train_with_many_entity_types():
+    train_data = []
+    train_data.extend([("One sentence", {"entities": []})])
+    entity_types = [str(i) for i in range(1000)]
+
+    model = train_model(train_data, entity_types)
+
+    
--- a/spacy/tests/test_symlink_windows.py
+++ b/spacy/tests/test_symlink_windows.py
@ -0,0 +1,40 @@
+# coding: utf-8
+from __future__ import unicode_literals
+
+import pytest
+import os
+from pathlib import Path
+
+from ..compat import symlink_to, symlink_remove, path2str
+
+
+def target_local_path():
+    return "./foo-target"
+
+
+def link_local_path():
+    return "./foo-symlink"
+
+
+@pytest.fixture(scope="function")
+def setup_target(request):
+    target = Path(target_local_path())
+    if not target.exists():
+        os.mkdir(path2str(target))
+
+    # yield -- need to cleanup even if assertion fails
+    # https://github.com/pytest-dev/pytest/issues/2508#issuecomment-309934240
+    def cleanup():
+        symlink_remove(Path(link_local_path()))
+        os.rmdir(target_local_path())
+
+    request.addfinalizer(cleanup)
+
+
+def test_create_symlink_windows(setup_target):
+    target = Path(target_local_path())
+    link = Path(link_local_path())
+    assert target.exists()
+
+    symlink_to(link, target)
+    assert link.exists()
--- a/spacy/tokens/token.pyx
+++ b/spacy/tokens/token.pyx
@ -865,7 +865,7 @@ cdef class Token:
            return Lexeme.c_check_flag(self.c.lex, IS_LEFT_PUNCT)

    property is_right_punct:
-        """RETURNS (bool): Whether the token is a left punctuation mark."""
+        """RETURNS (bool): Whether the token is a right punctuation mark."""
        def __get__(self):
            return Lexeme.c_check_flag(self.c.lex, IS_RIGHT_PUNCT)

--- a/website/api/_annotation/_named-entities.jade
+++ b/website/api/_annotation/_named-entities.jade
@ -2,7 +2,7 @@

 p
    |  Models trained on the
-    |  #[+a("https://catalog.ldc.upenn.edu/ldc2013t19") OntoNotes 5] corpus
+    |  #[+a("https://catalog.ldc.upenn.edu/LDC2013T19") OntoNotes 5] corpus
    |  support the following entity types:

 +table(["Type", "Description"])
--- a/website/api/cli.jade
+++ b/website/api/cli.jade
@ -245,7 +245,7 @@ p The following file format converters are available:

    +row
        +cell #[code iob]
-        +cell IOB named entity recognition format.
+        +cell IOB or IOB2 named entity recognition format.

 +h(3, "train") Train

--- a/website/api/span.jade
+++ b/website/api/span.jade
@ -352,6 +352,7 @@ p Retokenize the document, such that the span is merged into a single token.
 +h(2, "ents") Span.ents
    +tag property
    +tag-model("NER")
+    +tag-new("2.0.12")

 p
    |  Iterate over the entities in the span. Yields named-entity
--- a/website/api/token.jade
+++ b/website/api/token.jade
@ -714,7 +714,7 @@ p The L2 norm of the token's vector representation.
        +cell bool
        +cell
            |  Does the token consist of ASCII characters? Equivalent to
-            |  #[code [any(ord(c) >= 128 for c in token.text)]].
+            |  #[code all(ord(c) &lt; 128 for c in token.text)].

    +row
        +cell #[code is_digit]
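
Note on the `is_ascii` documentation fix above: the old expression wrapped an `any(...)` over non-ASCII characters in a list, which inverts the meaning and, as a one-element list, is always truthy. A short sketch comparing the two expressions (the function names are hypothetical and are not spaCy API):

```python
def is_ascii(text):
    # Corrected expression from the docs: every character is below 128.
    return all(ord(c) < 128 for c in text)


def old_doc_expression(text):
    # Expression the docs showed before this change, kept for comparison.
    return [any(ord(c) >= 128 for c in text)]


for sample in ["spaCy", "numérique"]:
    print(sample, is_ascii(sample), old_doc_expression(sample))
```

Note that `all(...)` over an empty string is `True`, which follows from Python semantics rather than from anything stated in the diff.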
--- a/website/usage/_install/_instructions.jade
+++ b/website/usage/_install/_instructions.jade
@ -91,8 +91,8 @@ p

 p
    |  spaCy can be installed on GPU by specifying #[code spacy[cuda]],
-    |  #[code spacy[cuda90]], #[code spacy[cuda91]], #[code spacy[cuda92]] or
-    |  #[code spacy[cuda10]]. If you know your cuda version, using the more
+    |  #[code spacy[cuda90]], #[code spacy[cuda91]] or #[code spacy[cuda92]].
+    |  If you know your cuda version, using the more
    |  explicit specifier allows cupy to be installed via wheel, saving some
    |  compilation time. The specifiers should install two libraries:
    |  #[+a("https://cupy.chainer.org") #[code cupy]] and
--- a/website/usage/_linguistic-features/_rule-based-matching.jade
+++ b/website/usage/_linguistic-features/_rule-based-matching.jade
@ -206,7 +206,8 @@ p
    nlp = spacy.load('en_core_web_sm')
    matcher = PhraseMatcher(nlp.vocab)
    terminology_list = ['Barack Obama', 'Angela Merkel', 'Washington, D.C.']
-    patterns = [nlp(text) for text in terminology_list]
+    # Only run nlp.make_doc to speed things up
+    patterns = [nlp.make_doc(text) for text in terminology_list]
    matcher.add('TerminologyList', None, *patterns)

    doc = nlp(u"German Chancellor Angela Merkel and US President Barack Obama "
--- a/website/usage/_models/_languages.jade
+++ b/website/usage/_models/_languages.jade
@ -44,7 +44,7 @@ p

    +list.o-no-block
        +item #[strong Chinese]: #[+a("https://github.com/fxsjy/jieba") Jieba]
-        +item #[strong Japanese]: #[+a("https://github.com/taku910/mecab") MeCab]
+        +item #[strong Japanese]: #[+a("https://github.com/taku910/mecab") MeCab] with #[+a("http://unidic.ninjal.ac.jp/back_number#unidic_cwj") Unidic]
        +item #[strong Thai]: #[+a("https://github.com/wannaphongcom/pythainlp") pythainlp]
        +item #[strong Vietnamese]: #[+a("https://github.com/trungtv/pyvi") Pyvi]
        +item #[strong Russian]: #[+a("https://github.com/kmike/pymorphy2") pymorphy2]
--- a/website/usage/_processing-pipelines/_custom-components.jade
+++ b/website/usage/_processing-pipelines/_custom-components.jade
@ -72,7 +72,7 @@ p
        name = 'entity_matcher'

        def __init__(self, nlp, terms, label):
-            patterns = [nlp(text) for text in terms]
+            patterns = [nlp.make_doc(text) for text in terms]
            self.matcher = PhraseMatcher(nlp.vocab)
            self.matcher.add(label, None, *patterns)

--- a/website/usage/_v2/_migrating.jade
+++ b/website/usage/_v2/_migrating.jade
@ -240,7 +240,7 @@ p
 +code-new.
    from spacy.matcher import PhraseMatcher
    matcher = PhraseMatcher(nlp.vocab)
-    patterns = [nlp(text) for text in large_terminology_list]
+    patterns = [nlp.make_doc(text) for text in large_terminology_list]
    matcher.add('PRODUCT', None, *patterns)

 +code-old.