diff --git a/.github/contributors/F0rge1cE.md b/.github/contributors/F0rge1cE.md new file mode 100644 index 000000000..f9f987100 --- /dev/null +++ b/.github/contributors/F0rge1cE.md @@ -0,0 +1,106 @@ +# spaCy contributor agreement + +This spaCy Contributor Agreement (**"SCA"**) is based on the +[Oracle Contributor Agreement](http://www.oracle.com/technetwork/oca-405177.pdf). +The SCA applies to any contribution that you make to any product or project +managed by us (the **"project"**), and sets out the intellectual property rights +you grant to us in the contributed materials. The term **"us"** shall mean +[ExplosionAI GmbH](https://explosion.ai/legal). The term +**"you"** shall mean the person or entity identified below. + +If you agree to be bound by these terms, fill in the information requested +below and include the filled-in version with your first pull request, under the +folder [`.github/contributors/`](/.github/contributors/). The name of the file +should be your GitHub username, with the extension `.md`. For example, the user +example_user would create the file `.github/contributors/example_user.md`. + +Read this agreement carefully before signing. These terms and conditions +constitute a binding legal agreement. + +## Contributor Agreement + +1. The term "contribution" or "contributed materials" means any source code, +object code, patch, tool, sample, graphic, specification, manual, +documentation, or any other material posted or submitted by you to the project. + +2. With respect to any worldwide copyrights, or copyright applications and +registrations, in your contribution: + + * you hereby assign to us joint ownership, and to the extent that such + assignment is or becomes invalid, ineffective or unenforceable, you hereby + grant to us a perpetual, irrevocable, non-exclusive, worldwide, no-charge, + royalty-free, unrestricted license to exercise all rights under those + copyrights. 
This includes, at our option, the right to sublicense these same + rights to third parties through multiple levels of sublicensees or other + licensing arrangements; + + * you agree that each of us can do all things in relation to your + contribution as if each of us were the sole owners, and if one of us makes + a derivative work of your contribution, the one who makes the derivative + work (or has it made will be the sole owner of that derivative work; + + * you agree that you will not assert any moral rights in your contribution + against us, our licensees or transferees; + + * you agree that we may register a copyright in your contribution and + exercise all ownership rights associated with it; and + + * you agree that neither of us has any duty to consult with, obtain the + consent of, pay or render an accounting to the other for any use or + distribution of your contribution. + +3. With respect to any patents you own, or that you can license without payment +to any third party, you hereby grant to us a perpetual, irrevocable, +non-exclusive, worldwide, no-charge, royalty-free license to: + + * make, have made, use, sell, offer to sell, import, and otherwise transfer + your contribution in whole or in part, alone or in combination with or + included in any product, work or materials arising out of the project to + which your contribution was submitted, and + + * at our option, to sublicense these same rights to third parties through + multiple levels of sublicensees or other licensing arrangements. + +4. Except as set out above, you keep all right, title, and interest in your +contribution. The rights that you grant to us under these terms are effective +on the date you first submitted a contribution to us, even if your submission +took place before the date you sign these terms. + +5. 
You covenant, represent, warrant and agree that: + + * Each contribution that you submit is and shall be an original work of + authorship and you can legally grant the rights set out in this SCA; + + * to the best of your knowledge, each contribution will not violate any + third party's copyrights, trademarks, patents, or other intellectual + property rights; and + + * each contribution shall be in compliance with U.S. export control laws and + other applicable export and import laws. You agree to notify us if you + become aware of any circumstance which would make any of the foregoing + representations inaccurate in any respect. We may publicly disclose your + participation in the project, including the fact that you have signed the SCA. + +6. This SCA is governed by the laws of the State of California and applicable +U.S. Federal law. Any choice of law rules will not apply. + +7. Please place an “x” on one of the applicable statement below. Please do NOT +mark both statements: + + * [x] I am signing on behalf of myself as an individual and no other person + or entity, including my employer, has or will have rights with respect to my + contributions. + + * [x] I am signing on behalf of my employer or a legal entity and I have the + actual authority to contractually bind that entity. + +## Contributor Details + +| Field | Entry | +|------------------------------- | -------------------- | +| Name | Icarus Xu | +| Company name (if applicable) | | +| Title or role (if applicable) | | +| Date | 05/06/2019 | +| GitHub username | F0rge1cE | +| Website (optional) | | diff --git a/.github/contributors/amitness.md b/.github/contributors/amitness.md new file mode 100644 index 000000000..dd27e7481 --- /dev/null +++ b/.github/contributors/amitness.md @@ -0,0 +1,106 @@ +# spaCy contributor agreement + +This spaCy Contributor Agreement (**"SCA"**) is based on the +[Oracle Contributor Agreement](http://www.oracle.com/technetwork/oca-405177.pdf). 
+The SCA applies to any contribution that you make to any product or project +managed by us (the **"project"**), and sets out the intellectual property rights +you grant to us in the contributed materials. The term **"us"** shall mean +[ExplosionAI GmbH](https://explosion.ai/legal). The term +**"you"** shall mean the person or entity identified below. + +If you agree to be bound by these terms, fill in the information requested +below and include the filled-in version with your first pull request, under the +folder [`.github/contributors/`](/.github/contributors/). The name of the file +should be your GitHub username, with the extension `.md`. For example, the user +example_user would create the file `.github/contributors/example_user.md`. + +Read this agreement carefully before signing. These terms and conditions +constitute a binding legal agreement. + +## Contributor Agreement + +1. The term "contribution" or "contributed materials" means any source code, +object code, patch, tool, sample, graphic, specification, manual, +documentation, or any other material posted or submitted by you to the project. + +2. With respect to any worldwide copyrights, or copyright applications and +registrations, in your contribution: + + * you hereby assign to us joint ownership, and to the extent that such + assignment is or becomes invalid, ineffective or unenforceable, you hereby + grant to us a perpetual, irrevocable, non-exclusive, worldwide, no-charge, + royalty-free, unrestricted license to exercise all rights under those + copyrights. 
This includes, at our option, the right to sublicense these same + rights to third parties through multiple levels of sublicensees or other + licensing arrangements; + + * you agree that each of us can do all things in relation to your + contribution as if each of us were the sole owners, and if one of us makes + a derivative work of your contribution, the one who makes the derivative + work (or has it made will be the sole owner of that derivative work; + + * you agree that you will not assert any moral rights in your contribution + against us, our licensees or transferees; + + * you agree that we may register a copyright in your contribution and + exercise all ownership rights associated with it; and + + * you agree that neither of us has any duty to consult with, obtain the + consent of, pay or render an accounting to the other for any use or + distribution of your contribution. + +3. With respect to any patents you own, or that you can license without payment +to any third party, you hereby grant to us a perpetual, irrevocable, +non-exclusive, worldwide, no-charge, royalty-free license to: + + * make, have made, use, sell, offer to sell, import, and otherwise transfer + your contribution in whole or in part, alone or in combination with or + included in any product, work or materials arising out of the project to + which your contribution was submitted, and + + * at our option, to sublicense these same rights to third parties through + multiple levels of sublicensees or other licensing arrangements. + +4. Except as set out above, you keep all right, title, and interest in your +contribution. The rights that you grant to us under these terms are effective +on the date you first submitted a contribution to us, even if your submission +took place before the date you sign these terms. + +5. 
You covenant, represent, warrant and agree that: + + * Each contribution that you submit is and shall be an original work of + authorship and you can legally grant the rights set out in this SCA; + + * to the best of your knowledge, each contribution will not violate any + third party's copyrights, trademarks, patents, or other intellectual + property rights; and + + * each contribution shall be in compliance with U.S. export control laws and + other applicable export and import laws. You agree to notify us if you + become aware of any circumstance which would make any of the foregoing + representations inaccurate in any respect. We may publicly disclose your + participation in the project, including the fact that you have signed the SCA. + +6. This SCA is governed by the laws of the State of California and applicable +U.S. Federal law. Any choice of law rules will not apply. + +7. Please place an “x” on one of the applicable statement below. Please do NOT +mark both statements: + + * [X] I am signing on behalf of myself as an individual and no other person + or entity, including my employer, has or will have rights with respect to my + contributions. + + * [ ] I am signing on behalf of my employer or a legal entity and I have the + actual authority to contractually bind that entity. 
+ +## Contributor Details + +| Field | Entry | +|------------------------------- | -------------------- | +| Name | Amit Chaudhary | +| Company name (if applicable) | | +| Title or role (if applicable) | | +| Date | April 29, 2019 | +| GitHub username | amitness | +| Website (optional) | https://amitness.com | diff --git a/.github/contributors/henry860916.md b/.github/contributors/henry860916.md new file mode 100644 index 000000000..b01f81edd --- /dev/null +++ b/.github/contributors/henry860916.md @@ -0,0 +1,106 @@ +# spaCy contributor agreement + +This spaCy Contributor Agreement (**"SCA"**) is based on the +[Oracle Contributor Agreement](http://www.oracle.com/technetwork/oca-405177.pdf). +The SCA applies to any contribution that you make to any product or project +managed by us (the **"project"**), and sets out the intellectual property rights +you grant to us in the contributed materials. The term **"us"** shall mean +[ExplosionAI UG (haftungsbeschränkt)](https://explosion.ai/legal). The term +**"you"** shall mean the person or entity identified below. + +If you agree to be bound by these terms, fill in the information requested +below and include the filled-in version with your first pull request, under the +folder [`.github/contributors/`](/.github/contributors/). The name of the file +should be your GitHub username, with the extension `.md`. For example, the user +example_user would create the file `.github/contributors/example_user.md`. + +Read this agreement carefully before signing. These terms and conditions +constitute a binding legal agreement. + +## Contributor Agreement + +1. The term "contribution" or "contributed materials" means any source code, +object code, patch, tool, sample, graphic, specification, manual, +documentation, or any other material posted or submitted by you to the project. + +2. 
With respect to any worldwide copyrights, or copyright applications and +registrations, in your contribution: + + * you hereby assign to us joint ownership, and to the extent that such + assignment is or becomes invalid, ineffective or unenforceable, you hereby + grant to us a perpetual, irrevocable, non-exclusive, worldwide, no-charge, + royalty-free, unrestricted license to exercise all rights under those + copyrights. This includes, at our option, the right to sublicense these same + rights to third parties through multiple levels of sublicensees or other + licensing arrangements; + + * you agree that each of us can do all things in relation to your + contribution as if each of us were the sole owners, and if one of us makes + a derivative work of your contribution, the one who makes the derivative + work (or has it made will be the sole owner of that derivative work; + + * you agree that you will not assert any moral rights in your contribution + against us, our licensees or transferees; + + * you agree that we may register a copyright in your contribution and + exercise all ownership rights associated with it; and + + * you agree that neither of us has any duty to consult with, obtain the + consent of, pay or render an accounting to the other for any use or + distribution of your contribution. + +3. With respect to any patents you own, or that you can license without payment +to any third party, you hereby grant to us a perpetual, irrevocable, +non-exclusive, worldwide, no-charge, royalty-free license to: + + * make, have made, use, sell, offer to sell, import, and otherwise transfer + your contribution in whole or in part, alone or in combination with or + included in any product, work or materials arising out of the project to + which your contribution was submitted, and + + * at our option, to sublicense these same rights to third parties through + multiple levels of sublicensees or other licensing arrangements. + +4. 
Except as set out above, you keep all right, title, and interest in your +contribution. The rights that you grant to us under these terms are effective +on the date you first submitted a contribution to us, even if your submission +took place before the date you sign these terms. + +5. You covenant, represent, warrant and agree that: + + * Each contribution that you submit is and shall be an original work of + authorship and you can legally grant the rights set out in this SCA; + + * to the best of your knowledge, each contribution will not violate any + third party's copyrights, trademarks, patents, or other intellectual + property rights; and + + * each contribution shall be in compliance with U.S. export control laws and + other applicable export and import laws. You agree to notify us if you + become aware of any circumstance which would make any of the foregoing + representations inaccurate in any respect. We may publicly disclose your + participation in the project, including the fact that you have signed the SCA. + +6. This SCA is governed by the laws of the State of California and applicable +U.S. Federal law. Any choice of law rules will not apply. + +7. Please place an “x” on one of the applicable statement below. Please do NOT +mark both statements: + + * [x] I am signing on behalf of myself as an individual and no other person + or entity, including my employer, has or will have rights with respect to my + contributions. + + * [ ] I am signing on behalf of my employer or a legal entity and I have the + actual authority to contractually bind that entity. 
+ +## Contributor Details + +| Field | Entry | +|------------------------------- | ------------------------ | +| Name | Henry Zhang | +| Company name (if applicable) | | +| Title or role (if applicable) | | +| Date | 2019-04-30 | +| GitHub username | henry860916 | +| Website (optional) | | diff --git a/.github/contributors/ldorigo.md b/.github/contributors/ldorigo.md new file mode 100644 index 000000000..c37e8bf1d --- /dev/null +++ b/.github/contributors/ldorigo.md @@ -0,0 +1,106 @@ +# spaCy contributor agreement + +This spaCy Contributor Agreement (**"SCA"**) is based on the +[Oracle Contributor Agreement](http://www.oracle.com/technetwork/oca-405177.pdf). +The SCA applies to any contribution that you make to any product or project +managed by us (the **"project"**), and sets out the intellectual property rights +you grant to us in the contributed materials. The term **"us"** shall mean +[ExplosionAI GmbH](https://explosion.ai/legal). The term +**"you"** shall mean the person or entity identified below. + +If you agree to be bound by these terms, fill in the information requested +below and include the filled-in version with your first pull request, under the +folder [`.github/contributors/`](/.github/contributors/). The name of the file +should be your GitHub username, with the extension `.md`. For example, the user +example_user would create the file `.github/contributors/example_user.md`. + +Read this agreement carefully before signing. These terms and conditions +constitute a binding legal agreement. + +## Contributor Agreement + +1. The term "contribution" or "contributed materials" means any source code, +object code, patch, tool, sample, graphic, specification, manual, +documentation, or any other material posted or submitted by you to the project. + +2. 
With respect to any worldwide copyrights, or copyright applications and +registrations, in your contribution: + + * you hereby assign to us joint ownership, and to the extent that such + assignment is or becomes invalid, ineffective or unenforceable, you hereby + grant to us a perpetual, irrevocable, non-exclusive, worldwide, no-charge, + royalty-free, unrestricted license to exercise all rights under those + copyrights. This includes, at our option, the right to sublicense these same + rights to third parties through multiple levels of sublicensees or other + licensing arrangements; + + * you agree that each of us can do all things in relation to your + contribution as if each of us were the sole owners, and if one of us makes + a derivative work of your contribution, the one who makes the derivative + work (or has it made will be the sole owner of that derivative work; + + * you agree that you will not assert any moral rights in your contribution + against us, our licensees or transferees; + + * you agree that we may register a copyright in your contribution and + exercise all ownership rights associated with it; and + + * you agree that neither of us has any duty to consult with, obtain the + consent of, pay or render an accounting to the other for any use or + distribution of your contribution. + +3. With respect to any patents you own, or that you can license without payment +to any third party, you hereby grant to us a perpetual, irrevocable, +non-exclusive, worldwide, no-charge, royalty-free license to: + + * make, have made, use, sell, offer to sell, import, and otherwise transfer + your contribution in whole or in part, alone or in combination with or + included in any product, work or materials arising out of the project to + which your contribution was submitted, and + + * at our option, to sublicense these same rights to third parties through + multiple levels of sublicensees or other licensing arrangements. + +4. 
Except as set out above, you keep all right, title, and interest in your +contribution. The rights that you grant to us under these terms are effective +on the date you first submitted a contribution to us, even if your submission +took place before the date you sign these terms. + +5. You covenant, represent, warrant and agree that: + + * Each contribution that you submit is and shall be an original work of + authorship and you can legally grant the rights set out in this SCA; + + * to the best of your knowledge, each contribution will not violate any + third party's copyrights, trademarks, patents, or other intellectual + property rights; and + + * each contribution shall be in compliance with U.S. export control laws and + other applicable export and import laws. You agree to notify us if you + become aware of any circumstance which would make any of the foregoing + representations inaccurate in any respect. We may publicly disclose your + participation in the project, including the fact that you have signed the SCA. + +6. This SCA is governed by the laws of the State of California and applicable +U.S. Federal law. Any choice of law rules will not apply. + +7. Please place an “x” on one of the applicable statement below. Please do NOT +mark both statements: + + * [x] I am signing on behalf of myself as an individual and no other person + or entity, including my employer, has or will have rights with respect to my + contributions. + + * [ ] I am signing on behalf of my employer or a legal entity and I have the + actual authority to contractually bind that entity. 
+ +## Contributor Details + +| Field | Entry | +|------------------------------- | -------------------- | +| Name | Luca Dorigo | +| Company name (if applicable) | / | +| Title or role (if applicable) | / | +| Date | 08.05.2019 | +| GitHub username | ldorigo | +| Website (optional) | / | diff --git a/.github/contributors/richardpaulhudson.md b/.github/contributors/richardpaulhudson.md new file mode 100644 index 000000000..3d68b98c2 --- /dev/null +++ b/.github/contributors/richardpaulhudson.md @@ -0,0 +1,106 @@ +# spaCy contributor agreement + +This spaCy Contributor Agreement (**"SCA"**) is based on the +[Oracle Contributor Agreement](http://www.oracle.com/technetwork/oca-405177.pdf). +The SCA applies to any contribution that you make to any product or project +managed by us (the **"project"**), and sets out the intellectual property rights +you grant to us in the contributed materials. The term **"us"** shall mean +[ExplosionAI GmbH](https://explosion.ai/legal). The term +**"you"** shall mean the person or entity identified below. + +If you agree to be bound by these terms, fill in the information requested +below and include the filled-in version with your first pull request, under the +folder [`.github/contributors/`](/.github/contributors/). The name of the file +should be your GitHub username, with the extension `.md`. For example, the user +example_user would create the file `.github/contributors/example_user.md`. + +Read this agreement carefully before signing. These terms and conditions +constitute a binding legal agreement. + +## Contributor Agreement + +1. The term "contribution" or "contributed materials" means any source code, +object code, patch, tool, sample, graphic, specification, manual, +documentation, or any other material posted or submitted by you to the project. + +2. 
With respect to any worldwide copyrights, or copyright applications and +registrations, in your contribution: + + * you hereby assign to us joint ownership, and to the extent that such + assignment is or becomes invalid, ineffective or unenforceable, you hereby + grant to us a perpetual, irrevocable, non-exclusive, worldwide, no-charge, + royalty-free, unrestricted license to exercise all rights under those + copyrights. This includes, at our option, the right to sublicense these same + rights to third parties through multiple levels of sublicensees or other + licensing arrangements; + + * you agree that each of us can do all things in relation to your + contribution as if each of us were the sole owners, and if one of us makes + a derivative work of your contribution, the one who makes the derivative + work (or has it made will be the sole owner of that derivative work; + + * you agree that you will not assert any moral rights in your contribution + against us, our licensees or transferees; + + * you agree that we may register a copyright in your contribution and + exercise all ownership rights associated with it; and + + * you agree that neither of us has any duty to consult with, obtain the + consent of, pay or render an accounting to the other for any use or + distribution of your contribution. + +3. With respect to any patents you own, or that you can license without payment +to any third party, you hereby grant to us a perpetual, irrevocable, +non-exclusive, worldwide, no-charge, royalty-free license to: + + * make, have made, use, sell, offer to sell, import, and otherwise transfer + your contribution in whole or in part, alone or in combination with or + included in any product, work or materials arising out of the project to + which your contribution was submitted, and + + * at our option, to sublicense these same rights to third parties through + multiple levels of sublicensees or other licensing arrangements. + +4. 
Except as set out above, you keep all right, title, and interest in your +contribution. The rights that you grant to us under these terms are effective +on the date you first submitted a contribution to us, even if your submission +took place before the date you sign these terms. + +5. You covenant, represent, warrant and agree that: + + * Each contribution that you submit is and shall be an original work of + authorship and you can legally grant the rights set out in this SCA; + + * to the best of your knowledge, each contribution will not violate any + third party's copyrights, trademarks, patents, or other intellectual + property rights; and + + * each contribution shall be in compliance with U.S. export control laws and + other applicable export and import laws. You agree to notify us if you + become aware of any circumstance which would make any of the foregoing + representations inaccurate in any respect. We may publicly disclose your + participation in the project, including the fact that you have signed the SCA. + +6. This SCA is governed by the laws of the State of California and applicable +U.S. Federal law. Any choice of law rules will not apply. + +7. Please place an “x” on one of the applicable statement below. Please do NOT +mark both statements: + + * [x] I am signing on behalf of myself as an individual and no other person + or entity, including my employer, has or will have rights with respect to my + contributions. + + * [ ] I am signing on behalf of my employer or a legal entity and I have the + actual authority to contractually bind that entity. + +## Contributor Details + +| Field | Entry | +|------------------------------- | -------------------- | +| Name | Richard Paul Hudson | +| Company name (if applicable) | msg systems ag | +| Title or role (if applicable) | Principal IT Consultant| +| Date | 06. 
May 2019 | +| GitHub username | richardpaulhudson | +| Website (optional) | | diff --git a/.github/contributors/yaph.md b/.github/contributors/yaph.md new file mode 100644 index 000000000..d3697bcbc --- /dev/null +++ b/.github/contributors/yaph.md @@ -0,0 +1,106 @@ +# spaCy contributor agreement + +This spaCy Contributor Agreement (**"SCA"**) is based on the +[Oracle Contributor Agreement](http://www.oracle.com/technetwork/oca-405177.pdf). +The SCA applies to any contribution that you make to any product or project +managed by us (the **"project"**), and sets out the intellectual property rights +you grant to us in the contributed materials. The term **"us"** shall mean +[ExplosionAI GmbH](https://explosion.ai/legal). The term +**"you"** shall mean the person or entity identified below. + +If you agree to be bound by these terms, fill in the information requested +below and include the filled-in version with your first pull request, under the +folder [`.github/contributors/`](/.github/contributors/). The name of the file +should be your GitHub username, with the extension `.md`. For example, the user +example_user would create the file `.github/contributors/example_user.md`. + +Read this agreement carefully before signing. These terms and conditions +constitute a binding legal agreement. + +## Contributor Agreement + +1. The term "contribution" or "contributed materials" means any source code, +object code, patch, tool, sample, graphic, specification, manual, +documentation, or any other material posted or submitted by you to the project. + +2. With respect to any worldwide copyrights, or copyright applications and +registrations, in your contribution: + + * you hereby assign to us joint ownership, and to the extent that such + assignment is or becomes invalid, ineffective or unenforceable, you hereby + grant to us a perpetual, irrevocable, non-exclusive, worldwide, no-charge, + royalty-free, unrestricted license to exercise all rights under those + copyrights. 
This includes, at our option, the right to sublicense these same + rights to third parties through multiple levels of sublicensees or other + licensing arrangements; + + * you agree that each of us can do all things in relation to your + contribution as if each of us were the sole owners, and if one of us makes + a derivative work of your contribution, the one who makes the derivative + work (or has it made will be the sole owner of that derivative work; + + * you agree that you will not assert any moral rights in your contribution + against us, our licensees or transferees; + + * you agree that we may register a copyright in your contribution and + exercise all ownership rights associated with it; and + + * you agree that neither of us has any duty to consult with, obtain the + consent of, pay or render an accounting to the other for any use or + distribution of your contribution. + +3. With respect to any patents you own, or that you can license without payment +to any third party, you hereby grant to us a perpetual, irrevocable, +non-exclusive, worldwide, no-charge, royalty-free license to: + + * make, have made, use, sell, offer to sell, import, and otherwise transfer + your contribution in whole or in part, alone or in combination with or + included in any product, work or materials arising out of the project to + which your contribution was submitted, and + + * at our option, to sublicense these same rights to third parties through + multiple levels of sublicensees or other licensing arrangements. + +4. Except as set out above, you keep all right, title, and interest in your +contribution. The rights that you grant to us under these terms are effective +on the date you first submitted a contribution to us, even if your submission +took place before the date you sign these terms. + +5. 
You covenant, represent, warrant and agree that: + + * Each contribution that you submit is and shall be an original work of + authorship and you can legally grant the rights set out in this SCA; + + * to the best of your knowledge, each contribution will not violate any + third party's copyrights, trademarks, patents, or other intellectual + property rights; and + + * each contribution shall be in compliance with U.S. export control laws and + other applicable export and import laws. You agree to notify us if you + become aware of any circumstance which would make any of the foregoing + representations inaccurate in any respect. We may publicly disclose your + participation in the project, including the fact that you have signed the SCA. + +6. This SCA is governed by the laws of the State of California and applicable +U.S. Federal law. Any choice of law rules will not apply. + +7. Please place an “x” on one of the applicable statement below. Please do NOT +mark both statements: + + * [x] I am signing on behalf of myself as an individual and no other person + or entity, including my employer, has or will have rights with respect to my + contributions. + + * [ ] I am signing on behalf of my employer or a legal entity and I have the + actual authority to contractually bind that entity. + +## Contributor Details + +| Field | Entry | +|------------------------------- | -------------------- | +| Name | Ramiro Gómez | +| Company name (if applicable) | | +| Title or role (if applicable) | | +| Date | 2019-04-29 | +| GitHub username | yaph | +| Website (optional) | http://ramiro.org/ | diff --git a/CONTRIBUTING.md b/CONTRIBUTING.md index 3c681f74f..82de54f01 100644 --- a/CONTRIBUTING.md +++ b/CONTRIBUTING.md @@ -447,17 +447,7 @@ use the `get_doc()` utility function to construct it manually. ## Updating the website -Our [website and docs](https://spacy.io) are implemented in -[Jade/Pug](https://www.jade-lang.org), and built or served by -[Harp](https://harpjs.com). 
Jade/Pug is an extensible templating language with a -readable syntax, that compiles to HTML. Here's how to view the site locally: - -```bash -sudo npm install --global harp -git clone https://github.com/explosion/spaCy -cd spaCy/website -harp server -``` +For instructions on how to build and run the [website](https://spacy.io) locally see **[Setup and installation](https://github.com/explosion/spaCy/blob/master/website/README.md#setup-and-installation-setup)** in the *website* directory's README. The docs can always use another example or more detail, and they should always be up to date and not misleading. To quickly find the correct file to edit, diff --git a/examples/information_extraction/entity_relations.py b/examples/information_extraction/entity_relations.py index ffc8164e1..138247623 100644 --- a/examples/information_extraction/entity_relations.py +++ b/examples/information_extraction/entity_relations.py @@ -36,11 +36,27 @@ def main(model="en_core_web_sm"): print("{:<10}\t{}\t{}".format(r1.text, r2.ent_type_, r2.text)) +def filter_spans(spans): + # Filter a sequence of spans so they don't contain overlaps + get_sort_key = lambda span: (span.end - span.start, span.start) + sorted_spans = sorted(spans, key=get_sort_key, reverse=True) + result = [] + seen_tokens = set() + for span in sorted_spans: + if span.start not in seen_tokens and span.end - 1 not in seen_tokens: + result.append(span) + seen_tokens.update(range(span.start, span.end)) + return result + + def extract_currency_relations(doc): - # merge entities and noun chunks into one token + # Merge entities and noun chunks into one token + seen_tokens = set() spans = list(doc.ents) + list(doc.noun_chunks) - for span in spans: - span.merge() + spans = filter_spans(spans) + with doc.retokenize() as retokenizer: + for span in spans: + retokenizer.merge(span) relations = [] for money in filter(lambda w: w.ent_type_ == "MONEY", doc): diff --git a/requirements.txt b/requirements.txt index bf95839b5..169fb37cd 
100644
--- a/requirements.txt
+++ b/requirements.txt
@@ -9,7 +9,7 @@ srsly>=0.0.5,<1.1.0
 # Third party dependencies
 numpy>=1.15.0
 requests>=2.13.0,<3.0.0
-jsonschema>=2.6.0,<3.0.0
+jsonschema>=2.6.0,<3.1.0
 plac<1.0.0,>=0.9.6
 pathlib==1.0.1; python_version < "3.4"
 # Development dependencies
diff --git a/setup.py b/setup.py
index 23d535058..2c05f8d70 100755
--- a/setup.py
+++ b/setup.py
@@ -232,7 +232,7 @@ def setup_package():
                 "blis>=0.2.2,<0.3.0",
                 "plac<1.0.0,>=0.9.6",
                 "requests>=2.13.0,<3.0.0",
-                "jsonschema>=2.6.0,<3.0.0",
+                "jsonschema>=2.6.0,<3.1.0",
                 "wasabi>=0.2.0,<1.1.0",
                 "srsly>=0.0.5,<1.1.0",
                 'pathlib==1.0.1; python_version < "3.4"',
diff --git a/spacy/cli/init_model.py b/spacy/cli/init_model.py
index 8ffbe7976..6626b52e4 100644
--- a/spacy/cli/init_model.py
+++ b/spacy/cli/init_model.py
@@ -181,7 +181,7 @@ def read_vectors(vectors_loc):
     vectors_keys = []
     for i, line in enumerate(tqdm(f)):
         line = line.rstrip()
-        pieces = line.rsplit(" ", vectors_data.shape[1] + 1)
+        pieces = line.rsplit(" ", vectors_data.shape[1])
         word = pieces.pop(0)
         if len(pieces) != vectors_data.shape[1]:
             msg.fail(Errors.E094.format(line_num=i, loc=vectors_loc), exits=1)
diff --git a/spacy/cli/pretrain.py b/spacy/cli/pretrain.py
index ef91937a6..b2c22d929 100644
--- a/spacy/cli/pretrain.py
+++ b/spacy/cli/pretrain.py
@@ -181,10 +181,10 @@ def make_update(model, docs, optimizer, drop=0.0, objective="L2"):
 def make_docs(nlp, batch, min_length, max_length):
     docs = []
     for record in batch:
-        text = record["text"]
         if "tokens" in record:
             doc = Doc(nlp.vocab, words=record["tokens"])
         else:
+            text = record["text"]
             doc = nlp.make_doc(text)
         if "heads" in record:
             heads = record["heads"]
diff --git a/spacy/cli/train.py b/spacy/cli/train.py
index 63c6242de..5b7cffb6b 100644
--- a/spacy/cli/train.py
+++ b/spacy/cli/train.py
@@ -16,6 +16,7 @@ import random
 from .._ml import create_default_optimizer
 from ..attrs import PROB, IS_OOV, CLUSTER, LANG
 from ..gold import GoldCorpus
+from ..compat import path2str
 from .. import util
 from .. import about

@@ -423,10 +424,12 @@ def _collate_best_model(meta, output_path, components):
     for component in components:
         bests[component] = _find_best(output_path, component)
     best_dest = output_path / "model-best"
-    shutil.copytree(output_path / "model-final", best_dest)
+    shutil.copytree(path2str(output_path / "model-final"), path2str(best_dest))
     for component, best_component_src in bests.items():
-        shutil.rmtree(best_dest / component)
-        shutil.copytree(best_component_src / component, best_dest / component)
+        shutil.rmtree(path2str(best_dest / component))
+        shutil.copytree(
+            path2str(best_component_src / component), path2str(best_dest / component)
+        )
         accs = srsly.read_json(best_component_src / "accuracy.json")
         for metric in _get_metrics(component):
             meta["accuracy"][metric] = accs[metric]
diff --git a/spacy/glossary.py b/spacy/glossary.py
index 6e393bba2..ff38e7138 100644
--- a/spacy/glossary.py
+++ b/spacy/glossary.py
@@ -168,6 +168,7 @@ GLOSSARY = {
     # Dependency Labels (English)
     # ClearNLP / Universal Dependencies
     # https://github.com/clir/clearnlp-guidelines/blob/master/md/specifications/dependency_labels.md
+    "acl": "clausal modifier of noun (adjectival clause)",
     "acomp": "adjectival complement",
     "advcl": "adverbial clause modifier",
     "advmod": "adverbial modifier",
@@ -177,22 +178,32 @@ GLOSSARY = {
     "attr": "attribute",
     "aux": "auxiliary",
     "auxpass": "auxiliary (passive)",
+    "case": "case marking",
     "cc": "coordinating conjunction",
     "ccomp": "clausal complement",
+    "clf": "classifier",
     "complm": "complementizer",
+    "compound": "compound",
     "conj": "conjunct",
     "cop": "copula",
     "csubj": "clausal subject",
     "csubjpass": "clausal subject (passive)",
+    "dative": "dative",
     "dep": "unclassified dependent",
     "det": "determiner",
+    "discourse": "discourse element",
+    "dislocated": "dislocated elements",
     "dobj": "direct object",
     "expl": "expletive",
+    "fixed": "fixed multiword expression",
+    "flat": "flat multiword expression",
+    "goeswith":
"goes with", "hmod": "modifier in hyphenation", "hyph": "hyphen", "infmod": "infinitival modifier", "intj": "interjection", "iobj": "indirect object", + "list": "list", "mark": "marker", "meta": "meta modifier", "neg": "negation modifier", @@ -201,11 +212,15 @@ GLOSSARY = { "npadvmod": "noun phrase as adverbial modifier", "nsubj": "nominal subject", "nsubjpass": "nominal subject (passive)", + "nounmod": "modifier of nominal", + "npmod": "noun phrase as adverbial modifier", "num": "number modifier", "number": "number compound modifier", + "nummod": "numeric modifier", "oprd": "object predicate", "obj": "object", "obl": "oblique nominal", + "orphan": "orphan", "parataxis": "parataxis", "partmod": "participal modifier", "pcomp": "complement of preposition", @@ -218,7 +233,10 @@ GLOSSARY = { "punct": "punctuation", "quantmod": "modifier of quantifier", "rcmod": "relative clause modifier", + "relcl": "relative clause modifier", + "reparandum": "overridden disfluency", "root": "root", + "vocative": "vocative", "xcomp": "open clausal complement", # Dependency labels (German) # TIGER Treebank diff --git a/spacy/lang/de/stop_words.py b/spacy/lang/de/stop_words.py index b5d25bf04..cf3204d5e 100644 --- a/spacy/lang/de/stop_words.py +++ b/spacy/lang/de/stop_words.py @@ -5,8 +5,8 @@ from __future__ import unicode_literals STOP_WORDS = set( """ á a ab aber ach acht achte achten achter achtes ag alle allein allem allen -aller allerdings alles allgemeinen als also am an andere anderen andern anders -auch auf aus ausser außer ausserdem außerdem +aller allerdings alles allgemeinen als also am an andere anderen anderem andern +anders auch auf aus ausser außer ausserdem außerdem bald bei beide beiden beim beispiel bekannt bereits besonders besser besten bin bis bisher bist @@ -35,8 +35,8 @@ großen grosser großer grosses großes gut gute guter gutes habe haben habt hast hat hatte hätte hatten hätten heisst heißt her heute hier hin hinter hoch -ich ihm ihn ihnen ihr ihre ihrem ihrer 
ihres im immer in indem infolgedessen -ins irgend ist +ich ihm ihn ihnen ihr ihre ihrem ihren ihrer ihres im immer in indem +infolgedessen ins irgend ist ja jahr jahre jahren je jede jedem jeden jeder jedermann jedermanns jedoch jemand jemandem jemanden jene jenem jenen jener jenes jetzt diff --git a/spacy/lang/fr/examples.py b/spacy/lang/fr/examples.py index d2f6a91d2..bf508022e 100644 --- a/spacy/lang/fr/examples.py +++ b/spacy/lang/fr/examples.py @@ -11,9 +11,9 @@ Example sentences to test spaCy and its language models. sentences = [ - "Apple cherche a acheter une startup anglaise pour 1 milliard de dollard", - "Les voitures autonomes voient leur assurances décalées vers les constructeurs", - "San Francisco envisage d'interdire les robots coursiers", + "Apple cherche à acheter une startup anglaise pour 1 milliard de dollars", + "Les voitures autonomes déplacent la responsabilité de l'assurance vers les constructeurs", + "San Francisco envisage d'interdire les robots coursiers sur les trottoirs", "Londres est une grande ville du Royaume-Uni", "L’Italie choisit ArcelorMittal pour reprendre la plus grande aciérie d’Europe", "Apple lance HomePod parce qu'il se sent menacé par l'Echo d'Amazon", diff --git a/spacy/lang/th/__init__.py b/spacy/lang/th/__init__.py index ba5b86d77..06970fbd7 100644 --- a/spacy/lang/th/__init__.py +++ b/spacy/lang/th/__init__.py @@ -5,6 +5,7 @@ from .tokenizer_exceptions import TOKENIZER_EXCEPTIONS from .tag_map import TAG_MAP from .stop_words import STOP_WORDS from .norm_exceptions import NORM_EXCEPTIONS +from .lex_attrs import LEX_ATTRS from ..norm_exceptions import BASE_NORMS from ...attrs import LANG, NORM @@ -27,13 +28,14 @@ class ThaiTokenizer(DummyTokenizer): self.vocab = nlp.vocab if nlp is not None else cls.create_vocab(nlp) def __call__(self, text): - words = list(self.word_tokenize(text, "newmm")) + words = list(self.word_tokenize(text)) spaces = [False] * len(words) return Doc(self.vocab, words=words, spaces=spaces) class 
ThaiDefaults(Language.Defaults):
     lex_attr_getters = dict(Language.Defaults.lex_attr_getters)
+    lex_attr_getters.update(LEX_ATTRS)
     lex_attr_getters[LANG] = lambda _text: "th"
     lex_attr_getters[NORM] = add_lookups(
         Language.Defaults.lex_attr_getters[NORM], BASE_NORMS, NORM_EXCEPTIONS
diff --git a/spacy/lang/th/lex_attrs.py b/spacy/lang/th/lex_attrs.py
new file mode 100644
index 000000000..047d046c2
--- /dev/null
+++ b/spacy/lang/th/lex_attrs.py
@@ -0,0 +1,62 @@
+# coding: utf8
+from __future__ import unicode_literals
+
+from ...attrs import LIKE_NUM
+
+
+_num_words = [
+    "ศูนย์",
+    "หนึ่ง",
+    "สอง",
+    "สาม",
+    "สี่",
+    "ห้า",
+    "หก",
+    "เจ็ด",
+    "แปด",
+    "เก้า",
+    "สิบ",
+    "สิบเอ็ด",
+    "ยี่สิบ",
+    "ยี่สิบเอ็ด",
+    "สามสิบ",
+    "สามสิบเอ็ด",
+    "สี่สิบ",
+    "สี่สิบเอ็ด",
+    "ห้าสิบ",
+    "ห้าสิบเอ็ด",
+    "หกสิบเอ็ด",
+    "เจ็ดสิบ",
+    "เจ็ดสิบเอ็ด",
+    "แปดสิบ",
+    "แปดสิบเอ็ด",
+    "เก้าสิบ",
+    "เก้าสิบเอ็ด",
+    "ร้อย",
+    "พัน",
+    "ล้าน",
+    "พันล้าน",
+    "หมื่นล้าน",
+    "แสนล้าน",
+    "ล้านล้าน",
+    "ล้านล้านล้าน",
+    "ล้านล้านล้านล้าน",
+]
+
+
+def like_num(text):
+    if text.startswith(("+", "-", "±", "~")):
+        text = text[1:]
+    text = text.replace(",", "").replace(".", "")
+    if text.isdigit():
+        return True
+    if text.count("/") == 1:
+        num, denom = text.split("/")
+        if num.isdigit() and denom.isdigit():
+            return True
+    if text in _num_words:
+        return True
+    return False
+
+
+LEX_ATTRS = {LIKE_NUM: like_num}
diff --git a/spacy/lang/th/norm_exceptions.py b/spacy/lang/th/norm_exceptions.py
index 497779cf9..ed1b3e760 100644
--- a/spacy/lang/th/norm_exceptions.py
+++ b/spacy/lang/th/norm_exceptions.py
@@ -111,4 +111,3 @@ NORM_EXCEPTIONS = {}
 for string, norm in _exc.items():
     NORM_EXCEPTIONS[string] = norm
     NORM_EXCEPTIONS[string.title()] = norm
-
diff --git a/spacy/lemmatizer.py b/spacy/lemmatizer.py
index 1aea308f9..f9e35f44a 100644
--- a/spacy/lemmatizer.py
+++ b/spacy/lemmatizer.py
@@ -1,5 +1,6 @@
 # coding: utf8
 from __future__ import unicode_literals
+from collections import OrderedDict

 from .symbols import POS, NOUN, VERB, ADJ, PUNCT, PROPN
 from .symbols import VerbForm_inf, VerbForm_none, Number_sing, Degree_pos
@@ -118,8 +119,8 @@ def lemmatize(string, index, exceptions, rules):
                 forms.append(form)
             else:
                 oov_forms.append(form)
-    # Remove duplicates, and sort forms generated by rules alphabetically.
-    forms = list(set(forms))
+    # Remove duplicates but preserve the ordering of applied "rules"
+    forms = list(OrderedDict.fromkeys(forms))
     # Put exceptions at the front of the list, so they get priority.
     # This is a dodgy heuristic -- but it's the best we can do until we get
     # frequencies on this. We can at least prune out problematic exceptions,
diff --git a/spacy/tests/doc/test_span.py b/spacy/tests/doc/test_span.py
index 13f7f2771..60b711741 100644
--- a/spacy/tests/doc/test_span.py
+++ b/spacy/tests/doc/test_span.py
@@ -6,6 +6,7 @@ from spacy.attrs import ORTH, LENGTH
 from spacy.tokens import Doc, Span
 from spacy.vocab import Vocab
 from spacy.errors import ModelsWarning
+from spacy.util import filter_spans

 from ..util import get_doc

@@ -219,3 +220,21 @@ def test_span_ents_property(doc):
     assert sentences[2].ents[0].label_ == "PRODUCT"
     assert sentences[2].ents[0].start == 11
     assert sentences[2].ents[0].end == 14
+
+
+def test_filter_spans(doc):
+    # Test filtering duplicates
+    spans = [doc[1:4], doc[6:8], doc[1:4], doc[10:14]]
+    filtered = filter_spans(spans)
+    assert len(filtered) == 3
+    assert filtered[0].start == 1 and filtered[0].end == 4
+    assert filtered[1].start == 6 and filtered[1].end == 8
+    assert filtered[2].start == 10 and filtered[2].end == 14
+    # Test filtering overlaps with longest preference
+    spans = [doc[1:4], doc[1:3], doc[5:10], doc[7:9], doc[1:4]]
+    filtered = filter_spans(spans)
+    assert len(filtered) == 2
+    assert len(filtered[0]) == 3
+    assert len(filtered[1]) == 5
+    assert filtered[0].start == 1 and filtered[0].end == 4
+    assert filtered[1].start == 5 and filtered[1].end == 10
diff --git a/spacy/util.py b/spacy/util.py
index
1cea8b6ca..475d556d0 100644 --- a/spacy/util.py +++ b/spacy/util.py @@ -510,7 +510,7 @@ def decaying(start, stop, decay): curr = float(start) while True: yield max(curr, stop) - curr -= (decay) + curr -= decay def minibatch_by_words(items, size, tuples=True, count_words=len): @@ -571,6 +571,28 @@ def itershuffle(iterable, bufsize=1000): raise StopIteration +def filter_spans(spans): + """Filter a sequence of spans and remove duplicates or overlaps. Useful for + creating named entities (where one token can only be part of one entity) or + when merging spans with `Retokenizer.merge`. When spans overlap, the (first) + longest span is preferred over shorter spans. + + spans (iterable): The spans to filter. + RETURNS (list): The filtered spans. + """ + get_sort_key = lambda span: (span.end - span.start, span.start) + sorted_spans = sorted(spans, key=get_sort_key, reverse=True) + result = [] + seen_tokens = set() + for span in sorted_spans: + # Check for end - 1 here because boundaries are inclusive + if span.start not in seen_tokens and span.end - 1 not in seen_tokens: + result.append(span) + seen_tokens.update(range(span.start, span.end)) + result = sorted(result, key=lambda span: span.start) + return result + + def to_bytes(getters, exclude): serialized = OrderedDict() for key, getter in getters.items(): diff --git a/website/README.md b/website/README.md index 900e637ae..be817225d 100644 --- a/website/README.md +++ b/website/README.md @@ -457,7 +457,7 @@ sit amet dignissim justo congue. ## Setup and installation {#setup} Before running the setup, make sure your versions of -[Node](https://nodejs.org/en/) and [npm](https://www.npmjs.com/) are up to date. +[Node](https://nodejs.org/en/) and [npm](https://www.npmjs.com/) are up to date. Node v10.15 or later is required. 
```bash # Clone the repository diff --git a/website/docs/api/cli.md b/website/docs/api/cli.md index d9886004a..7788d7a8f 100644 --- a/website/docs/api/cli.md +++ b/website/docs/api/cli.md @@ -198,7 +198,7 @@ will only train the tagger and parser. ```bash $ python -m spacy train [lang] [output_path] [train_path] [dev_path] -[--base-model] [--pipeline] [--vectors] [--n-iter] [--n-examples] [--use-gpu] +[--base-model] [--pipeline] [--vectors] [--n-iter] [--n-early-stopping] [--n-examples] [--use-gpu] [--version] [--meta-path] [--init-tok2vec] [--parser-multitasks] [--entity-multitasks] [--gold-preproc] [--noise-level] [--learn-tokens] [--verbose] @@ -214,6 +214,7 @@ $ python -m spacy train [lang] [output_path] [train_path] [dev_path] | `--pipeline`, `-p` 2.1 | option | Comma-separated names of pipeline components to train. Defaults to `'tagger,parser,ner'`. | | `--vectors`, `-v` | option | Model to load vectors from. | | `--n-iter`, `-n` | option | Number of iterations (default: `30`). | +| `--n-early-stopping`, `-ne` | option | Maximum number of training epochs without dev accuracy improvement. | | `--n-examples`, `-ns` | option | Number of examples to use (defaults to `0` for all examples). | | `--use-gpu`, `-g` | option | Whether to use GPU. Can be either `0`, `1` or `-1`. | | `--version`, `-V` | option | Model version. Will be written out to the model's `meta.json` after training. | @@ -285,24 +286,26 @@ improvement. ```bash $ python -m spacy pretrain [texts_loc] [vectors_model] [output_dir] [--width] [--depth] [--embed-rows] [--dropout] [--seed] [--n-iter] [--use-vectors] +[--n-save_every] ``` -| Argument | Type | Description | -| ---------------------- | ---------- | --------------------------------------------------------------------------------------------------------------------------------- | -| `texts_loc` | positional | Path to JSONL file with raw texts to learn from, with text provided as the key `"text"`. [See here](#pretrain-jsonl) for details. 
| -| `vectors_model` | positional | Name or path to spaCy model with vectors to learn from. | -| `output_dir` | positional | Directory to write models to on each epoch. | -| `--width`, `-cw` | option | Width of CNN layers. | -| `--depth`, `-cd` | option | Depth of CNN layers. | -| `--embed-rows`, `-er` | option | Number of embedding rows. | -| `--dropout`, `-d` | option | Dropout rate. | -| `--batch-size`, `-bs` | option | Number of words per training batch. | -| `--max-length`, `-xw` | option | Maximum words per example. Longer examples are discarded. | -| `--min-length`, `-nw` | option | Minimum words per example. Shorter examples are discarded. | -| `--seed`, `-s` | option | Seed for random number generators. | -| `--n-iter`, `-i` | option | Number of iterations to pretrain. | -| `--use-vectors`, `-uv` | flag | Whether to use the static vectors as input features. | -| **CREATES** | weights | The pre-trained weights that can be used to initialize `spacy train`. | +| Argument | Type | Description | +| ----------------------- | ---------- | --------------------------------------------------------------------------------------------------------------------------------- | +| `texts_loc` | positional | Path to JSONL file with raw texts to learn from, with text provided as the key `"text"`. [See here](#pretrain-jsonl) for details. | +| `vectors_model` | positional | Name or path to spaCy model with vectors to learn from. | +| `output_dir` | positional | Directory to write models to on each epoch. | +| `--width`, `-cw` | option | Width of CNN layers. | +| `--depth`, `-cd` | option | Depth of CNN layers. | +| `--embed-rows`, `-er` | option | Number of embedding rows. | +| `--dropout`, `-d` | option | Dropout rate. | +| `--batch-size`, `-bs` | option | Number of words per training batch. | +| `--max-length`, `-xw` | option | Maximum words per example. Longer examples are discarded. | +| `--min-length`, `-nw` | option | Minimum words per example. 
Shorter examples are discarded. | +| `--seed`, `-s` | option | Seed for random number generators. | +| `--n-iter`, `-i` | option | Number of iterations to pretrain. | +| `--use-vectors`, `-uv` | flag | Whether to use the static vectors as input features. | +| `--n-save_every`, `-se` | option | Save model every X batches. | +| **CREATES** | weights | The pre-trained weights that can be used to initialize `spacy train`. | ### JSONL format for raw text {#pretrain-jsonl} @@ -324,7 +327,7 @@ tokenization can be provided. | Key | Type | Description | | -------- | ------- | -------------------------------------------- | -| `text` | unicode | The raw input text. | +| `text` | unicode | The raw input text. Is not required if `tokens` available. | | `tokens` | list | Optional tokenization, one string per token. | ```json @@ -332,6 +335,7 @@ tokenization can be provided. {"text": "Can I ask where you work now and what you do, and if you enjoy it?"} {"text": "They may just pull out of the Seattle market completely, at least until they have autonomous vehicles."} {"text": "My cynical view on this is that it will never be free to the public. Reason: what would be the draw of joining the military? Right now their selling point is free Healthcare and Education. Ironically both are run horribly and most, that I've talked to, come out wishing they never went in."} +{"tokens": ["If", "tokens", "are", "provided", "then", "we", "can", "skip", "the", "raw", "input", "text"]} ``` ## Init Model {#init-model new="2"} @@ -375,7 +379,7 @@ pipeline. ```bash $ python -m spacy evaluate [model] [data_path] [--displacy-path] [--displacy-limit] -[--gpu-id] [--gold-preproc] +[--gpu-id] [--gold-preproc] [--return-scores] ``` | Argument | Type | Description | @@ -386,6 +390,7 @@ $ python -m spacy evaluate [model] [data_path] [--displacy-path] [--displacy-lim | `--displacy-limit`, `-dl` | option | Number of parses to generate per file. Defaults to `25`. 
Keep in mind that a significantly higher number might cause the `.html` files to render slowly. | | `--gpu-id`, `-g` | option | GPU to use, if any. Defaults to `-1` for CPU. | | `--gold-preproc`, `-G` | flag | Use gold preprocessing. | +| `--return-scores`, `-R` | flag | Return dict containing model scores. | | **CREATES** | `stdout`, HTML | Training results and optional displaCy visualizations. | ## Package {#package} diff --git a/website/docs/api/top-level.md b/website/docs/api/top-level.md index 924aca283..9d5bdc527 100644 --- a/website/docs/api/top-level.md +++ b/website/docs/api/top-level.md @@ -211,16 +211,16 @@ Render a dependency parse tree or named entity visualization. > html = displacy.render(doc, style="dep") > ``` -| Name | Type | Description | Default | -| ----------- | ------------------- | ------------------------------------------------------------------------------------------------------------------------------------ | ---------------------- | -| `docs` | list, `Doc`, `Span` | Document(s) to visualize. | -| `style` | unicode | Visualization style, `'dep'` or `'ent'`. | `'dep'` | -| `page` | bool | Render markup as full HTML page. | `False` | -| `minify` | bool | Minify HTML markup. | `False` | -| `jupyter` | bool | Explicitly enable "[Jupyter](http://jupyter.org/) mode" to return markup ready to be rendered in a notebook. | detected automatically | -| `options` | dict | [Visualizer-specific options](#options), e.g. colors. | `{}` | -| `manual` | bool | Don't parse `Doc` and instead, expect a dict or list of dicts. [See here](/usage/visualizers#manual-usage) for formats and examples. | `False` | -| **RETURNS** | unicode | Rendered HTML markup. | +| Name | Type | Description | Default | +| ----------- | ------------------- | --------------------------------------------------------------------------------------------------------------------------------------------------------- | ------- | +| `docs` | list, `Doc`, `Span` | Document(s) to visualize. 
| +| `style` | unicode | Visualization style, `'dep'` or `'ent'`. | `'dep'` | +| `page` | bool | Render markup as full HTML page. | `False` | +| `minify` | bool | Minify HTML markup. | `False` | +| `jupyter` | bool | Explicitly enable or disable "[Jupyter](http://jupyter.org/) mode" to return markup ready to be rendered in a notebook. Detected automatically if `None`. | `None` | +| `options` | dict | [Visualizer-specific options](#options), e.g. colors. | `{}` | +| `manual` | bool | Don't parse `Doc` and instead, expect a dict or list of dicts. [See here](/usage/visualizers#manual-usage) for formats and examples. | `False` | +| **RETURNS** | unicode | Rendered HTML markup. | ### Visualizer options {#displacy_options} @@ -654,6 +654,27 @@ for batching. Larger `buffsize` means less bias. | `buffsize` | int | Items to hold back. | | **YIELDS** | iterable | The shuffled iterator. | +### util.filter_spans {#util.filter_spans tag="function" new="2.1.4"} + +Filter a sequence of [`Span`](/api/span) objects and remove duplicates or +overlaps. Useful for creating named entities (where one token can only be part +of one entity) or when merging spans with +[`Retokenizer.merge`](/api/doc#retokenizer.merge). When spans overlap, the +(first) longest span is preferred over shorter spans. + +> #### Example +> +> ```python +> doc = nlp("This is a sentence.") +> spans = [doc[0:2], doc[0:2], doc[0:4]] +> filtered = filter_spans(spans) +> ``` + +| Name | Type | Description | +| ----------- | -------- | -------------------- | +| `spans` | iterable | The spans to filter. | +| **RETURNS** | list | The filtered spans. | + ## Compatibility functions {#compat source="spacy/compaty.py"} All Python code is written in an **intersection of Python 2 and Python 3**. 
This diff --git a/website/docs/usage/101/_serialization.md b/website/docs/usage/101/_serialization.md index 9b00ece04..828b796b3 100644 --- a/website/docs/usage/101/_serialization.md +++ b/website/docs/usage/101/_serialization.md @@ -4,7 +4,7 @@ example, everything that's in your `nlp` object. This means you'll have to translate its contents and structure into a format that can be saved, like a file or a byte string. This process is called serialization. spaCy comes with **built-in serialization methods** and supports the -[Pickle protocol](http://www.diveintopython3.net/serializing.html#dump). +[Pickle protocol](https://www.diveinto.org/python3/serializing.html#dump). > #### What's pickle? > diff --git a/website/docs/usage/processing-pipelines.md b/website/docs/usage/processing-pipelines.md index 8eaf81652..871ca3db6 100644 --- a/website/docs/usage/processing-pipelines.md +++ b/website/docs/usage/processing-pipelines.md @@ -260,7 +260,7 @@ def my_component(doc): nlp = spacy.load("en_core_web_sm") nlp.add_pipe(my_component, name="print_info", last=True) -print(nlp.pipe_names) # ['print_info', 'tagger', 'parser', 'ner'] +print(nlp.pipe_names) # ['tagger', 'parser', 'ner', 'print_info'] doc = nlp(u"This is a sentence.") ``` diff --git a/website/docs/usage/rule-based-matching.md b/website/docs/usage/rule-based-matching.md index a0959bfbc..13d3dcd32 100644 --- a/website/docs/usage/rule-based-matching.md +++ b/website/docs/usage/rule-based-matching.md @@ -713,9 +713,9 @@ from spacy.matcher import PhraseMatcher nlp = spacy.load('en_core_web_sm') matcher = PhraseMatcher(nlp.vocab) -terminology_list = [u"Barack Obama", u"Angela Merkel", u"Washington, D.C."] +terms = [u"Barack Obama", u"Angela Merkel", u"Washington, D.C."] # Only run nlp.make_doc to speed things up -patterns = [nlp.make_doc(text) for text in terminology_list] +patterns = [nlp.make_doc(text) for text in terms] matcher.add("TerminologyList", None, *patterns) doc = nlp(u"German Chancellor Angela Merkel and US 
President Barack Obama " diff --git a/website/docs/usage/spacy-101.md b/website/docs/usage/spacy-101.md index a089957d5..03feb03b1 100644 --- a/website/docs/usage/spacy-101.md +++ b/website/docs/usage/spacy-101.md @@ -102,7 +102,7 @@ systems, or to pre-process text for **deep learning**. integrated and opinionated. spaCy tries to avoid asking the user to choose between multiple algorithms that deliver equivalent functionality. Keeping the menu small lets spaCy deliver generally better performance and developer - experience.M + experience. - **spaCy is not a company**. It's an open-source library. Our company publishing spaCy and other software is called diff --git a/website/meta/universe.json b/website/meta/universe.json index a6a8bf247..151b41452 100644 --- a/website/meta/universe.json +++ b/website/meta/universe.json @@ -980,6 +980,22 @@ }, "category": ["podcasts"] }, + { + "type": "education", + "id": "twimlai-podcast", + "title": "TWiML & AI: Practical NLP with spaCy and Prodigy", + "slogan": "May 2019", + "description": "\"Ines and I caught up to discuss her various projects, including the aforementioned SpaCy, an open-source NLP library built with a focus on industry and production use cases. In our conversation, Ines gives us an overview of the SpaCy Library, a look at some of the use cases that excite her, and the Spacy community and contributors. 
We also discuss her work with Prodigy, an annotation service tool that uses continuous active learning to train models, and finally, what other exciting projects she is working on.\"", + "thumb": "https://i.imgur.com/ng2F5gK.png", + "url": "https://twimlai.com/twiml-talk-262-practical-natural-language-processing-with-spacy-and-prodigy-w-ines-montani", + "iframe": "https://html5-player.libsyn.com/embed/episode/id/9691514/height/90/theme/custom/thumbnail/no/preload/no/direction/backward/render-playlist/no/custom-color/3e85b1/", + "iframe_height": 90, + "author": "Sam Charrington", + "author_links": { + "website": "https://twimlai.com" + }, + "category": ["podcasts"] + }, { "id": "adam_qas", "title": "ADAM: Question Answering System", @@ -1338,8 +1354,43 @@ }, "category": ["pipeline"], "tags": ["inflection"] + }, + { + "id": "NGym", + "title": "NeuralGym", + "slogan": "A little Windows GUI for training models with spaCy", + "description": "NeuralGym is a Python application for Windows with a graphical user interface to train models with spaCy. 
Run the application, select an output folder, a training data file in spaCy's data format, a spaCy model or blank model and press 'Start'.", + "github": "d5555/NeuralGym", + "url": "https://github.com/d5555/NeuralGym", + "image": "https://github.com/d5555/NeuralGym/raw/master/NGym.png", + "thumb": "https://github.com/d5555/NeuralGym/raw/master/NGym/web.png", + "author": "d5555", + "category": ["training"], + "tags": ["windows"] + }, + { + "id": "holmes", + "title": "Holmes", + "slogan": "Information extraction from English and German texts based on predicate logic", + "github": "msg-systems/holmes-extractor", + "url": "https://github.com/msg-systems/holmes-extractor", + "description": "Holmes is a Python 3 library that supports a number of use cases involving information extraction from English and German texts, including chatbot, structural search, topic matching and supervised document classification.", + "pip": "holmes-extractor", + "category": ["conversational", "research", "standalone"], + "tags": ["chatbots", "text-processing"], + "code_example": [ + "import holmes_extractor as holmes", + "holmes_manager = holmes.Manager(model='en_coref_lg')", + "holmes_manager.register_search_phrase('A big dog chases a cat')", + "holmes_manager.start_chatbot_mode_console()" + ], + "author": "Richard Paul Hudson", + "author_links": { + "github": "richardpaulhudson" + } } ], + "categories": [ { "label": "Projects",