mirror of
https://github.com/explosion/spaCy.git
synced 2025-01-25 16:54:24 +03:00
Merge branch 'master' into spacy.io
commit
1d1df7b5f9
106
.github/contributors/F0rge1cE.md
vendored
Normal file
@@ -0,0 +1,106 @@
# spaCy contributor agreement

This spaCy Contributor Agreement (**"SCA"**) is based on the
[Oracle Contributor Agreement](http://www.oracle.com/technetwork/oca-405177.pdf).
The SCA applies to any contribution that you make to any product or project
managed by us (the **"project"**), and sets out the intellectual property rights
you grant to us in the contributed materials. The term **"us"** shall mean
[ExplosionAI GmbH](https://explosion.ai/legal). The term
**"you"** shall mean the person or entity identified below.

If you agree to be bound by these terms, fill in the information requested
below and include the filled-in version with your first pull request, under the
folder [`.github/contributors/`](/.github/contributors/). The name of the file
should be your GitHub username, with the extension `.md`. For example, the user
example_user would create the file `.github/contributors/example_user.md`.

Read this agreement carefully before signing. These terms and conditions
constitute a binding legal agreement.

## Contributor Agreement

1. The term "contribution" or "contributed materials" means any source code,
object code, patch, tool, sample, graphic, specification, manual,
documentation, or any other material posted or submitted by you to the project.

2. With respect to any worldwide copyrights, or copyright applications and
registrations, in your contribution:

    * you hereby assign to us joint ownership, and to the extent that such
    assignment is or becomes invalid, ineffective or unenforceable, you hereby
    grant to us a perpetual, irrevocable, non-exclusive, worldwide, no-charge,
    royalty-free, unrestricted license to exercise all rights under those
    copyrights. This includes, at our option, the right to sublicense these same
    rights to third parties through multiple levels of sublicensees or other
    licensing arrangements;

    * you agree that each of us can do all things in relation to your
    contribution as if each of us were the sole owners, and if one of us makes
    a derivative work of your contribution, the one who makes the derivative
    work (or has it made will be the sole owner of that derivative work;

    * you agree that you will not assert any moral rights in your contribution
    against us, our licensees or transferees;

    * you agree that we may register a copyright in your contribution and
    exercise all ownership rights associated with it; and

    * you agree that neither of us has any duty to consult with, obtain the
    consent of, pay or render an accounting to the other for any use or
    distribution of your contribution.

3. With respect to any patents you own, or that you can license without payment
to any third party, you hereby grant to us a perpetual, irrevocable,
non-exclusive, worldwide, no-charge, royalty-free license to:

    * make, have made, use, sell, offer to sell, import, and otherwise transfer
    your contribution in whole or in part, alone or in combination with or
    included in any product, work or materials arising out of the project to
    which your contribution was submitted, and

    * at our option, to sublicense these same rights to third parties through
    multiple levels of sublicensees or other licensing arrangements.

4. Except as set out above, you keep all right, title, and interest in your
contribution. The rights that you grant to us under these terms are effective
on the date you first submitted a contribution to us, even if your submission
took place before the date you sign these terms.

5. You covenant, represent, warrant and agree that:

    * Each contribution that you submit is and shall be an original work of
    authorship and you can legally grant the rights set out in this SCA;

    * to the best of your knowledge, each contribution will not violate any
    third party's copyrights, trademarks, patents, or other intellectual
    property rights; and

    * each contribution shall be in compliance with U.S. export control laws and
    other applicable export and import laws. You agree to notify us if you
    become aware of any circumstance which would make any of the foregoing
    representations inaccurate in any respect. We may publicly disclose your
    participation in the project, including the fact that you have signed the SCA.

6. This SCA is governed by the laws of the State of California and applicable
U.S. Federal law. Any choice of law rules will not apply.

7. Please place an “x” on one of the applicable statement below. Please do NOT
mark both statements:

    * [x] I am signing on behalf of myself as an individual and no other person
    or entity, including my employer, has or will have rights with respect to my
    contributions.

    * [x] I am signing on behalf of my employer or a legal entity and I have the
    actual authority to contractually bind that entity.

## Contributor Details

| Field                          | Entry                |
|------------------------------- | -------------------- |
| Name                           | Icarus Xu            |
| Company name (if applicable)   |                      |
| Title or role (if applicable)  |                      |
| Date                           | 05/06/2019           |
| GitHub username                | F0rge1cE             |
| Website (optional)             |                      |
106
.github/contributors/amitness.md
vendored
Normal file
@@ -0,0 +1,106 @@
# spaCy contributor agreement

This spaCy Contributor Agreement (**"SCA"**) is based on the
[Oracle Contributor Agreement](http://www.oracle.com/technetwork/oca-405177.pdf).
The SCA applies to any contribution that you make to any product or project
managed by us (the **"project"**), and sets out the intellectual property rights
you grant to us in the contributed materials. The term **"us"** shall mean
[ExplosionAI GmbH](https://explosion.ai/legal). The term
**"you"** shall mean the person or entity identified below.

If you agree to be bound by these terms, fill in the information requested
below and include the filled-in version with your first pull request, under the
folder [`.github/contributors/`](/.github/contributors/). The name of the file
should be your GitHub username, with the extension `.md`. For example, the user
example_user would create the file `.github/contributors/example_user.md`.

Read this agreement carefully before signing. These terms and conditions
constitute a binding legal agreement.

## Contributor Agreement

1. The term "contribution" or "contributed materials" means any source code,
object code, patch, tool, sample, graphic, specification, manual,
documentation, or any other material posted or submitted by you to the project.

2. With respect to any worldwide copyrights, or copyright applications and
registrations, in your contribution:

    * you hereby assign to us joint ownership, and to the extent that such
    assignment is or becomes invalid, ineffective or unenforceable, you hereby
    grant to us a perpetual, irrevocable, non-exclusive, worldwide, no-charge,
    royalty-free, unrestricted license to exercise all rights under those
    copyrights. This includes, at our option, the right to sublicense these same
    rights to third parties through multiple levels of sublicensees or other
    licensing arrangements;

    * you agree that each of us can do all things in relation to your
    contribution as if each of us were the sole owners, and if one of us makes
    a derivative work of your contribution, the one who makes the derivative
    work (or has it made will be the sole owner of that derivative work;

    * you agree that you will not assert any moral rights in your contribution
    against us, our licensees or transferees;

    * you agree that we may register a copyright in your contribution and
    exercise all ownership rights associated with it; and

    * you agree that neither of us has any duty to consult with, obtain the
    consent of, pay or render an accounting to the other for any use or
    distribution of your contribution.

3. With respect to any patents you own, or that you can license without payment
to any third party, you hereby grant to us a perpetual, irrevocable,
non-exclusive, worldwide, no-charge, royalty-free license to:

    * make, have made, use, sell, offer to sell, import, and otherwise transfer
    your contribution in whole or in part, alone or in combination with or
    included in any product, work or materials arising out of the project to
    which your contribution was submitted, and

    * at our option, to sublicense these same rights to third parties through
    multiple levels of sublicensees or other licensing arrangements.

4. Except as set out above, you keep all right, title, and interest in your
contribution. The rights that you grant to us under these terms are effective
on the date you first submitted a contribution to us, even if your submission
took place before the date you sign these terms.

5. You covenant, represent, warrant and agree that:

    * Each contribution that you submit is and shall be an original work of
    authorship and you can legally grant the rights set out in this SCA;

    * to the best of your knowledge, each contribution will not violate any
    third party's copyrights, trademarks, patents, or other intellectual
    property rights; and

    * each contribution shall be in compliance with U.S. export control laws and
    other applicable export and import laws. You agree to notify us if you
    become aware of any circumstance which would make any of the foregoing
    representations inaccurate in any respect. We may publicly disclose your
    participation in the project, including the fact that you have signed the SCA.

6. This SCA is governed by the laws of the State of California and applicable
U.S. Federal law. Any choice of law rules will not apply.

7. Please place an “x” on one of the applicable statement below. Please do NOT
mark both statements:

    * [X] I am signing on behalf of myself as an individual and no other person
    or entity, including my employer, has or will have rights with respect to my
    contributions.

    * [ ] I am signing on behalf of my employer or a legal entity and I have the
    actual authority to contractually bind that entity.

## Contributor Details

| Field                          | Entry                |
|------------------------------- | -------------------- |
| Name                           | Amit Chaudhary       |
| Company name (if applicable)   |                      |
| Title or role (if applicable)  |                      |
| Date                           | April 29, 2019       |
| GitHub username                | amitness             |
| Website (optional)             | https://amitness.com |
106
.github/contributors/henry860916.md
vendored
Normal file
@@ -0,0 +1,106 @@
# spaCy contributor agreement

This spaCy Contributor Agreement (**"SCA"**) is based on the
[Oracle Contributor Agreement](http://www.oracle.com/technetwork/oca-405177.pdf).
The SCA applies to any contribution that you make to any product or project
managed by us (the **"project"**), and sets out the intellectual property rights
you grant to us in the contributed materials. The term **"us"** shall mean
[ExplosionAI UG (haftungsbeschränkt)](https://explosion.ai/legal). The term
**"you"** shall mean the person or entity identified below.

If you agree to be bound by these terms, fill in the information requested
below and include the filled-in version with your first pull request, under the
folder [`.github/contributors/`](/.github/contributors/). The name of the file
should be your GitHub username, with the extension `.md`. For example, the user
example_user would create the file `.github/contributors/example_user.md`.

Read this agreement carefully before signing. These terms and conditions
constitute a binding legal agreement.

## Contributor Agreement

1. The term "contribution" or "contributed materials" means any source code,
object code, patch, tool, sample, graphic, specification, manual,
documentation, or any other material posted or submitted by you to the project.

2. With respect to any worldwide copyrights, or copyright applications and
registrations, in your contribution:

    * you hereby assign to us joint ownership, and to the extent that such
    assignment is or becomes invalid, ineffective or unenforceable, you hereby
    grant to us a perpetual, irrevocable, non-exclusive, worldwide, no-charge,
    royalty-free, unrestricted license to exercise all rights under those
    copyrights. This includes, at our option, the right to sublicense these same
    rights to third parties through multiple levels of sublicensees or other
    licensing arrangements;

    * you agree that each of us can do all things in relation to your
    contribution as if each of us were the sole owners, and if one of us makes
    a derivative work of your contribution, the one who makes the derivative
    work (or has it made will be the sole owner of that derivative work;

    * you agree that you will not assert any moral rights in your contribution
    against us, our licensees or transferees;

    * you agree that we may register a copyright in your contribution and
    exercise all ownership rights associated with it; and

    * you agree that neither of us has any duty to consult with, obtain the
    consent of, pay or render an accounting to the other for any use or
    distribution of your contribution.

3. With respect to any patents you own, or that you can license without payment
to any third party, you hereby grant to us a perpetual, irrevocable,
non-exclusive, worldwide, no-charge, royalty-free license to:

    * make, have made, use, sell, offer to sell, import, and otherwise transfer
    your contribution in whole or in part, alone or in combination with or
    included in any product, work or materials arising out of the project to
    which your contribution was submitted, and

    * at our option, to sublicense these same rights to third parties through
    multiple levels of sublicensees or other licensing arrangements.

4. Except as set out above, you keep all right, title, and interest in your
contribution. The rights that you grant to us under these terms are effective
on the date you first submitted a contribution to us, even if your submission
took place before the date you sign these terms.

5. You covenant, represent, warrant and agree that:

    * Each contribution that you submit is and shall be an original work of
    authorship and you can legally grant the rights set out in this SCA;

    * to the best of your knowledge, each contribution will not violate any
    third party's copyrights, trademarks, patents, or other intellectual
    property rights; and

    * each contribution shall be in compliance with U.S. export control laws and
    other applicable export and import laws. You agree to notify us if you
    become aware of any circumstance which would make any of the foregoing
    representations inaccurate in any respect. We may publicly disclose your
    participation in the project, including the fact that you have signed the SCA.

6. This SCA is governed by the laws of the State of California and applicable
U.S. Federal law. Any choice of law rules will not apply.

7. Please place an “x” on one of the applicable statement below. Please do NOT
mark both statements:

    * [x] I am signing on behalf of myself as an individual and no other person
    or entity, including my employer, has or will have rights with respect to my
    contributions.

    * [ ] I am signing on behalf of my employer or a legal entity and I have the
    actual authority to contractually bind that entity.

## Contributor Details

| Field                          | Entry                    |
|------------------------------- | ------------------------ |
| Name                           | Henry Zhang              |
| Company name (if applicable)   |                          |
| Title or role (if applicable)  |                          |
| Date                           | 2019-04-30               |
| GitHub username                | henry860916              |
| Website (optional)             |                          |
106
.github/contributors/ldorigo.md
vendored
Normal file
@@ -0,0 +1,106 @@
# spaCy contributor agreement

This spaCy Contributor Agreement (**"SCA"**) is based on the
[Oracle Contributor Agreement](http://www.oracle.com/technetwork/oca-405177.pdf).
The SCA applies to any contribution that you make to any product or project
managed by us (the **"project"**), and sets out the intellectual property rights
you grant to us in the contributed materials. The term **"us"** shall mean
[ExplosionAI GmbH](https://explosion.ai/legal). The term
**"you"** shall mean the person or entity identified below.

If you agree to be bound by these terms, fill in the information requested
below and include the filled-in version with your first pull request, under the
folder [`.github/contributors/`](/.github/contributors/). The name of the file
should be your GitHub username, with the extension `.md`. For example, the user
example_user would create the file `.github/contributors/example_user.md`.

Read this agreement carefully before signing. These terms and conditions
constitute a binding legal agreement.

## Contributor Agreement

1. The term "contribution" or "contributed materials" means any source code,
object code, patch, tool, sample, graphic, specification, manual,
documentation, or any other material posted or submitted by you to the project.

2. With respect to any worldwide copyrights, or copyright applications and
registrations, in your contribution:

    * you hereby assign to us joint ownership, and to the extent that such
    assignment is or becomes invalid, ineffective or unenforceable, you hereby
    grant to us a perpetual, irrevocable, non-exclusive, worldwide, no-charge,
    royalty-free, unrestricted license to exercise all rights under those
    copyrights. This includes, at our option, the right to sublicense these same
    rights to third parties through multiple levels of sublicensees or other
    licensing arrangements;

    * you agree that each of us can do all things in relation to your
    contribution as if each of us were the sole owners, and if one of us makes
    a derivative work of your contribution, the one who makes the derivative
    work (or has it made will be the sole owner of that derivative work;

    * you agree that you will not assert any moral rights in your contribution
    against us, our licensees or transferees;

    * you agree that we may register a copyright in your contribution and
    exercise all ownership rights associated with it; and

    * you agree that neither of us has any duty to consult with, obtain the
    consent of, pay or render an accounting to the other for any use or
    distribution of your contribution.

3. With respect to any patents you own, or that you can license without payment
to any third party, you hereby grant to us a perpetual, irrevocable,
non-exclusive, worldwide, no-charge, royalty-free license to:

    * make, have made, use, sell, offer to sell, import, and otherwise transfer
    your contribution in whole or in part, alone or in combination with or
    included in any product, work or materials arising out of the project to
    which your contribution was submitted, and

    * at our option, to sublicense these same rights to third parties through
    multiple levels of sublicensees or other licensing arrangements.

4. Except as set out above, you keep all right, title, and interest in your
contribution. The rights that you grant to us under these terms are effective
on the date you first submitted a contribution to us, even if your submission
took place before the date you sign these terms.

5. You covenant, represent, warrant and agree that:

    * Each contribution that you submit is and shall be an original work of
    authorship and you can legally grant the rights set out in this SCA;

    * to the best of your knowledge, each contribution will not violate any
    third party's copyrights, trademarks, patents, or other intellectual
    property rights; and

    * each contribution shall be in compliance with U.S. export control laws and
    other applicable export and import laws. You agree to notify us if you
    become aware of any circumstance which would make any of the foregoing
    representations inaccurate in any respect. We may publicly disclose your
    participation in the project, including the fact that you have signed the SCA.

6. This SCA is governed by the laws of the State of California and applicable
U.S. Federal law. Any choice of law rules will not apply.

7. Please place an “x” on one of the applicable statement below. Please do NOT
mark both statements:

    * [x] I am signing on behalf of myself as an individual and no other person
    or entity, including my employer, has or will have rights with respect to my
    contributions.

    * [ ] I am signing on behalf of my employer or a legal entity and I have the
    actual authority to contractually bind that entity.

## Contributor Details

| Field                          | Entry                |
|------------------------------- | -------------------- |
| Name                           | Luca Dorigo          |
| Company name (if applicable)   | /                    |
| Title or role (if applicable)  | /                    |
| Date                           | 08.05.2019           |
| GitHub username                | ldorigo              |
| Website (optional)             | /                    |
106
.github/contributors/richardpaulhudson.md
vendored
Normal file
@@ -0,0 +1,106 @@
# spaCy contributor agreement

This spaCy Contributor Agreement (**"SCA"**) is based on the
[Oracle Contributor Agreement](http://www.oracle.com/technetwork/oca-405177.pdf).
The SCA applies to any contribution that you make to any product or project
managed by us (the **"project"**), and sets out the intellectual property rights
you grant to us in the contributed materials. The term **"us"** shall mean
[ExplosionAI GmbH](https://explosion.ai/legal). The term
**"you"** shall mean the person or entity identified below.

If you agree to be bound by these terms, fill in the information requested
below and include the filled-in version with your first pull request, under the
folder [`.github/contributors/`](/.github/contributors/). The name of the file
should be your GitHub username, with the extension `.md`. For example, the user
example_user would create the file `.github/contributors/example_user.md`.

Read this agreement carefully before signing. These terms and conditions
constitute a binding legal agreement.

## Contributor Agreement

1. The term "contribution" or "contributed materials" means any source code,
object code, patch, tool, sample, graphic, specification, manual,
documentation, or any other material posted or submitted by you to the project.

2. With respect to any worldwide copyrights, or copyright applications and
registrations, in your contribution:

    * you hereby assign to us joint ownership, and to the extent that such
    assignment is or becomes invalid, ineffective or unenforceable, you hereby
    grant to us a perpetual, irrevocable, non-exclusive, worldwide, no-charge,
    royalty-free, unrestricted license to exercise all rights under those
    copyrights. This includes, at our option, the right to sublicense these same
    rights to third parties through multiple levels of sublicensees or other
    licensing arrangements;

    * you agree that each of us can do all things in relation to your
    contribution as if each of us were the sole owners, and if one of us makes
    a derivative work of your contribution, the one who makes the derivative
    work (or has it made will be the sole owner of that derivative work;

    * you agree that you will not assert any moral rights in your contribution
    against us, our licensees or transferees;

    * you agree that we may register a copyright in your contribution and
    exercise all ownership rights associated with it; and

    * you agree that neither of us has any duty to consult with, obtain the
    consent of, pay or render an accounting to the other for any use or
    distribution of your contribution.

3. With respect to any patents you own, or that you can license without payment
to any third party, you hereby grant to us a perpetual, irrevocable,
non-exclusive, worldwide, no-charge, royalty-free license to:

    * make, have made, use, sell, offer to sell, import, and otherwise transfer
    your contribution in whole or in part, alone or in combination with or
    included in any product, work or materials arising out of the project to
    which your contribution was submitted, and

    * at our option, to sublicense these same rights to third parties through
    multiple levels of sublicensees or other licensing arrangements.

4. Except as set out above, you keep all right, title, and interest in your
contribution. The rights that you grant to us under these terms are effective
on the date you first submitted a contribution to us, even if your submission
took place before the date you sign these terms.

5. You covenant, represent, warrant and agree that:

    * Each contribution that you submit is and shall be an original work of
    authorship and you can legally grant the rights set out in this SCA;

    * to the best of your knowledge, each contribution will not violate any
    third party's copyrights, trademarks, patents, or other intellectual
    property rights; and

    * each contribution shall be in compliance with U.S. export control laws and
    other applicable export and import laws. You agree to notify us if you
    become aware of any circumstance which would make any of the foregoing
    representations inaccurate in any respect. We may publicly disclose your
    participation in the project, including the fact that you have signed the SCA.

6. This SCA is governed by the laws of the State of California and applicable
U.S. Federal law. Any choice of law rules will not apply.

7. Please place an “x” on one of the applicable statement below. Please do NOT
mark both statements:

    * [x] I am signing on behalf of myself as an individual and no other person
    or entity, including my employer, has or will have rights with respect to my
    contributions.

    * [ ] I am signing on behalf of my employer or a legal entity and I have the
    actual authority to contractually bind that entity.

## Contributor Details

| Field                          | Entry                   |
|------------------------------- | ----------------------- |
| Name                           | Richard Paul Hudson     |
| Company name (if applicable)   | msg systems ag          |
| Title or role (if applicable)  | Principal IT Consultant |
| Date                           | 06. May 2019            |
| GitHub username                | richardpaulhudson       |
| Website (optional)             |                         |
106
.github/contributors/yaph.md
vendored
Normal file
|
@ -0,0 +1,106 @@
|
|||
# spaCy contributor agreement
|
||||
|
||||
This spaCy Contributor Agreement (**"SCA"**) is based on the
|
||||
[Oracle Contributor Agreement](http://www.oracle.com/technetwork/oca-405177.pdf).
|
||||
The SCA applies to any contribution that you make to any product or project
|
||||
managed by us (the **"project"**), and sets out the intellectual property rights
|
||||
you grant to us in the contributed materials. The term **"us"** shall mean
|
||||
[ExplosionAI GmbH](https://explosion.ai/legal). The term
|
||||
**"you"** shall mean the person or entity identified below.
|
||||
|
||||
If you agree to be bound by these terms, fill in the information requested
|
||||
below and include the filled-in version with your first pull request, under the
|
||||
folder [`.github/contributors/`](/.github/contributors/). The name of the file
|
||||
should be your GitHub username, with the extension `.md`. For example, the user
|
||||
example_user would create the file `.github/contributors/example_user.md`.
|
||||
|
||||
Read this agreement carefully before signing. These terms and conditions
|
||||
constitute a binding legal agreement.
|
||||
|
||||
## Contributor Agreement
|
||||
|
||||
1. The term "contribution" or "contributed materials" means any source code,
|
||||
object code, patch, tool, sample, graphic, specification, manual,
|
||||
documentation, or any other material posted or submitted by you to the project.
|
||||
|
||||
2. With respect to any worldwide copyrights, or copyright applications and
|
||||
registrations, in your contribution:
|
||||
|
||||
* you hereby assign to us joint ownership, and to the extent that such
|
||||
assignment is or becomes invalid, ineffective or unenforceable, you hereby
|
||||
grant to us a perpetual, irrevocable, non-exclusive, worldwide, no-charge,
|
||||
royalty-free, unrestricted license to exercise all rights under those
|
||||
copyrights. This includes, at our option, the right to sublicense these same
|
||||
rights to third parties through multiple levels of sublicensees or other
|
||||
licensing arrangements;
|
||||
|
||||
* you agree that each of us can do all things in relation to your
|
||||
contribution as if each of us were the sole owners, and if one of us makes
|
||||
a derivative work of your contribution, the one who makes the derivative
|
||||
work (or has it made) will be the sole owner of that derivative work;
|
||||
|
||||
* you agree that you will not assert any moral rights in your contribution
|
||||
against us, our licensees or transferees;
|
||||
|
||||
* you agree that we may register a copyright in your contribution and
|
||||
exercise all ownership rights associated with it; and
|
||||
|
||||
* you agree that neither of us has any duty to consult with, obtain the
|
||||
consent of, pay or render an accounting to the other for any use or
|
||||
distribution of your contribution.
|
||||
|
||||
3. With respect to any patents you own, or that you can license without payment
|
||||
to any third party, you hereby grant to us a perpetual, irrevocable,
|
||||
non-exclusive, worldwide, no-charge, royalty-free license to:
|
||||
|
||||
* make, have made, use, sell, offer to sell, import, and otherwise transfer
|
||||
your contribution in whole or in part, alone or in combination with or
|
||||
included in any product, work or materials arising out of the project to
|
||||
which your contribution was submitted, and
|
||||
|
||||
* at our option, to sublicense these same rights to third parties through
|
||||
multiple levels of sublicensees or other licensing arrangements.
|
||||
|
||||
4. Except as set out above, you keep all right, title, and interest in your
|
||||
contribution. The rights that you grant to us under these terms are effective
|
||||
on the date you first submitted a contribution to us, even if your submission
|
||||
took place before the date you sign these terms.
|
||||
|
||||
5. You covenant, represent, warrant and agree that:
|
||||
|
||||
* Each contribution that you submit is and shall be an original work of
|
||||
authorship and you can legally grant the rights set out in this SCA;
|
||||
|
||||
* to the best of your knowledge, each contribution will not violate any
|
||||
third party's copyrights, trademarks, patents, or other intellectual
|
||||
property rights; and
|
||||
|
||||
* each contribution shall be in compliance with U.S. export control laws and
|
||||
other applicable export and import laws. You agree to notify us if you
|
||||
become aware of any circumstance which would make any of the foregoing
|
||||
representations inaccurate in any respect. We may publicly disclose your
|
||||
participation in the project, including the fact that you have signed the SCA.
|
||||
|
||||
6. This SCA is governed by the laws of the State of California and applicable
|
||||
U.S. Federal law. Any choice of law rules will not apply.
|
||||
|
||||
7. Please place an “x” on one of the applicable statements below. Please do NOT
|
||||
mark both statements:
|
||||
|
||||
* [x] I am signing on behalf of myself as an individual and no other person
|
||||
or entity, including my employer, has or will have rights with respect to my
|
||||
contributions.
|
||||
|
||||
* [ ] I am signing on behalf of my employer or a legal entity and I have the
|
||||
actual authority to contractually bind that entity.
|
||||
|
||||
## Contributor Details
|
||||
|
||||
| Field | Entry |
|
||||
|------------------------------- | -------------------- |
|
||||
| Name | Ramiro Gómez |
|
||||
| Company name (if applicable) | |
|
||||
| Title or role (if applicable) | |
|
||||
| Date | 2019-04-29 |
|
||||
| GitHub username | yaph |
|
||||
| Website (optional) | http://ramiro.org/ |
|
|
@ -447,17 +447,7 @@ use the `get_doc()` utility function to construct it manually.
|
|||
|
||||
## Updating the website
|
||||
|
||||
Our [website and docs](https://spacy.io) are implemented in
|
||||
[Jade/Pug](https://www.jade-lang.org), and built or served by
|
||||
[Harp](https://harpjs.com). Jade/Pug is an extensible templating language with a
|
||||
readable syntax, that compiles to HTML. Here's how to view the site locally:
|
||||
|
||||
```bash
|
||||
sudo npm install --global harp
|
||||
git clone https://github.com/explosion/spaCy
|
||||
cd spaCy/website
|
||||
harp server
|
||||
```
|
||||
For instructions on how to build and run the [website](https://spacy.io) locally see **[Setup and installation](https://github.com/explosion/spaCy/blob/master/website/README.md#setup-and-installation-setup)** in the *website* directory's README.
|
||||
|
||||
The docs can always use another example or more detail, and they should always
|
||||
be up to date and not misleading. To quickly find the correct file to edit,
|
||||
|
|
|
@ -36,11 +36,27 @@ def main(model="en_core_web_sm"):
|
|||
print("{:<10}\t{}\t{}".format(r1.text, r2.ent_type_, r2.text))
|
||||
|
||||
|
||||
def filter_spans(spans):
|
||||
# Filter a sequence of spans so they don't contain overlaps
|
||||
get_sort_key = lambda span: (span.end - span.start, span.start)
|
||||
sorted_spans = sorted(spans, key=get_sort_key, reverse=True)
|
||||
result = []
|
||||
seen_tokens = set()
|
||||
for span in sorted_spans:
|
||||
if span.start not in seen_tokens and span.end - 1 not in seen_tokens:
|
||||
result.append(span)
|
||||
seen_tokens.update(range(span.start, span.end))
|
||||
return result
|
||||
|
||||
|
||||
def extract_currency_relations(doc):
|
||||
# merge entities and noun chunks into one token
|
||||
# Merge entities and noun chunks into one token
|
||||
seen_tokens = set()
|
||||
spans = list(doc.ents) + list(doc.noun_chunks)
|
||||
for span in spans:
|
||||
span.merge()
|
||||
spans = filter_spans(spans)
|
||||
with doc.retokenize() as retokenizer:
|
||||
for span in spans:
|
||||
retokenizer.merge(span)
|
||||
|
||||
relations = []
|
||||
for money in filter(lambda w: w.ent_type_ == "MONEY", doc):
|
||||
|
|
|
@ -9,7 +9,7 @@ srsly>=0.0.5,<1.1.0
|
|||
# Third party dependencies
|
||||
numpy>=1.15.0
|
||||
requests>=2.13.0,<3.0.0
|
||||
jsonschema>=2.6.0,<3.0.0
|
||||
jsonschema>=2.6.0,<3.1.0
|
||||
plac<1.0.0,>=0.9.6
|
||||
pathlib==1.0.1; python_version < "3.4"
|
||||
# Development dependencies
|
||||
|
|
2
setup.py
|
@ -232,7 +232,7 @@ def setup_package():
|
|||
"blis>=0.2.2,<0.3.0",
|
||||
"plac<1.0.0,>=0.9.6",
|
||||
"requests>=2.13.0,<3.0.0",
|
||||
"jsonschema>=2.6.0,<3.0.0",
|
||||
"jsonschema>=2.6.0,<3.1.0",
|
||||
"wasabi>=0.2.0,<1.1.0",
|
||||
"srsly>=0.0.5,<1.1.0",
|
||||
'pathlib==1.0.1; python_version < "3.4"',
|
||||
|
|
|
@ -181,7 +181,7 @@ def read_vectors(vectors_loc):
|
|||
vectors_keys = []
|
||||
for i, line in enumerate(tqdm(f)):
|
||||
line = line.rstrip()
|
||||
pieces = line.rsplit(" ", vectors_data.shape[1] + 1)
|
||||
pieces = line.rsplit(" ", vectors_data.shape[1])
|
||||
word = pieces.pop(0)
|
||||
if len(pieces) != vectors_data.shape[1]:
|
||||
msg.fail(Errors.E094.format(line_num=i, loc=vectors_loc), exits=1)
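The effect of lowering the `rsplit` limit can be illustrated with a small standalone sketch. It assumes a vectors line has the form `word v1 v2 ... vn` and that the word itself may contain a space; `split_vector_line` is a hypothetical helper written for this note, not part of spaCy.

```python
def split_vector_line(line, width):
    """Split one vectors line into (word, values), splitting from the right.

    Hypothetical helper: `width` is the number of vector dimensions.
    """
    # At most `width` splits from the right keep a word with spaces intact.
    pieces = line.rstrip().rsplit(" ", width)
    word = pieces.pop(0)
    return word, [float(p) for p in pieces]


line = "New York 0.1 0.2 0.3"
print(split_vector_line(line, 3))
# One extra split (the old limit) breaks the word apart and leaves
# one piece too many after the word is popped off.
print(line.rsplit(" ", 3 + 1))
```

With a limit of `width`, the word stays whole and exactly `width` values remain, which is what the length check in the hunk above expects.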
|
||||
|
|
|
@ -181,10 +181,10 @@ def make_update(model, docs, optimizer, drop=0.0, objective="L2"):
|
|||
def make_docs(nlp, batch, min_length, max_length):
|
||||
docs = []
|
||||
for record in batch:
|
||||
text = record["text"]
|
||||
if "tokens" in record:
|
||||
doc = Doc(nlp.vocab, words=record["tokens"])
|
||||
else:
|
||||
text = record["text"]
|
||||
doc = nlp.make_doc(text)
|
||||
if "heads" in record:
|
||||
heads = record["heads"]
|
||||
|
|
|
@ -16,6 +16,7 @@ import random
|
|||
from .._ml import create_default_optimizer
|
||||
from ..attrs import PROB, IS_OOV, CLUSTER, LANG
|
||||
from ..gold import GoldCorpus
|
||||
from ..compat import path2str
|
||||
from .. import util
|
||||
from .. import about
|
||||
|
||||
|
@ -423,10 +424,12 @@ def _collate_best_model(meta, output_path, components):
|
|||
for component in components:
|
||||
bests[component] = _find_best(output_path, component)
|
||||
best_dest = output_path / "model-best"
|
||||
shutil.copytree(output_path / "model-final", best_dest)
|
||||
shutil.copytree(path2str(output_path / "model-final"), path2str(best_dest))
|
||||
for component, best_component_src in bests.items():
|
||||
shutil.rmtree(best_dest / component)
|
||||
shutil.copytree(best_component_src / component, best_dest / component)
|
||||
shutil.rmtree(path2str(best_dest / component))
|
||||
shutil.copytree(
|
||||
path2str(best_component_src / component), path2str(best_dest / component)
|
||||
)
|
||||
accs = srsly.read_json(best_component_src / "accuracy.json")
|
||||
for metric in _get_metrics(component):
|
||||
meta["accuracy"][metric] = accs[metric]
|
||||
|
|
|
@ -168,6 +168,7 @@ GLOSSARY = {
|
|||
# Dependency Labels (English)
|
||||
# ClearNLP / Universal Dependencies
|
||||
# https://github.com/clir/clearnlp-guidelines/blob/master/md/specifications/dependency_labels.md
|
||||
"acl": "clausal modifier of noun (adjectival clause)",
|
||||
"acomp": "adjectival complement",
|
||||
"advcl": "adverbial clause modifier",
|
||||
"advmod": "adverbial modifier",
|
||||
|
@ -177,22 +178,32 @@ GLOSSARY = {
|
|||
"attr": "attribute",
|
||||
"aux": "auxiliary",
|
||||
"auxpass": "auxiliary (passive)",
|
||||
"case": "case marking",
|
||||
"cc": "coordinating conjunction",
|
||||
"ccomp": "clausal complement",
|
||||
"clf": "classifier",
|
||||
"complm": "complementizer",
|
||||
"compound": "compound",
|
||||
"conj": "conjunct",
|
||||
"cop": "copula",
|
||||
"csubj": "clausal subject",
|
||||
"csubjpass": "clausal subject (passive)",
|
||||
"dative": "dative",
|
||||
"dep": "unclassified dependent",
|
||||
"det": "determiner",
|
||||
"discourse": "discourse element",
|
||||
"dislocated": "dislocated elements",
|
||||
"dobj": "direct object",
|
||||
"expl": "expletive",
|
||||
"fixed": "fixed multiword expression",
|
||||
"flat": "flat multiword expression",
|
||||
"goeswith": "goes with",
|
||||
"hmod": "modifier in hyphenation",
|
||||
"hyph": "hyphen",
|
||||
"infmod": "infinitival modifier",
|
||||
"intj": "interjection",
|
||||
"iobj": "indirect object",
|
||||
"list": "list",
|
||||
"mark": "marker",
|
||||
"meta": "meta modifier",
|
||||
"neg": "negation modifier",
|
||||
|
@ -201,11 +212,15 @@ GLOSSARY = {
|
|||
"npadvmod": "noun phrase as adverbial modifier",
|
||||
"nsubj": "nominal subject",
|
||||
"nsubjpass": "nominal subject (passive)",
|
||||
"nounmod": "modifier of nominal",
|
||||
"npmod": "noun phrase as adverbial modifier",
|
||||
"num": "number modifier",
|
||||
"number": "number compound modifier",
|
||||
"nummod": "numeric modifier",
|
||||
"oprd": "object predicate",
|
||||
"obj": "object",
|
||||
"obl": "oblique nominal",
|
||||
"orphan": "orphan",
|
||||
"parataxis": "parataxis",
|
||||
"partmod": "participial modifier",
|
||||
"pcomp": "complement of preposition",
|
||||
|
@ -218,7 +233,10 @@ GLOSSARY = {
|
|||
"punct": "punctuation",
|
||||
"quantmod": "modifier of quantifier",
|
||||
"rcmod": "relative clause modifier",
|
||||
"relcl": "relative clause modifier",
|
||||
"reparandum": "overridden disfluency",
|
||||
"root": "root",
|
||||
"vocative": "vocative",
|
||||
"xcomp": "open clausal complement",
|
||||
# Dependency labels (German)
|
||||
# TIGER Treebank
|
||||
|
|
|
@ -5,8 +5,8 @@ from __future__ import unicode_literals
|
|||
STOP_WORDS = set(
|
||||
"""
|
||||
á a ab aber ach acht achte achten achter achtes ag alle allein allem allen
|
||||
aller allerdings alles allgemeinen als also am an andere anderen andern anders
|
||||
auch auf aus ausser außer ausserdem außerdem
|
||||
aller allerdings alles allgemeinen als also am an andere anderen anderem andern
|
||||
anders auch auf aus ausser außer ausserdem außerdem
|
||||
|
||||
bald bei beide beiden beim beispiel bekannt bereits besonders besser besten bin
|
||||
bis bisher bist
|
||||
|
@ -35,8 +35,8 @@ großen grosser großer grosses großes gut gute guter gutes
|
|||
habe haben habt hast hat hatte hätte hatten hätten heisst heißt her heute hier
|
||||
hin hinter hoch
|
||||
|
||||
ich ihm ihn ihnen ihr ihre ihrem ihrer ihres im immer in indem infolgedessen
|
||||
ins irgend ist
|
||||
ich ihm ihn ihnen ihr ihre ihrem ihren ihrer ihres im immer in indem
|
||||
infolgedessen ins irgend ist
|
||||
|
||||
ja jahr jahre jahren je jede jedem jeden jeder jedermann jedermanns jedoch
|
||||
jemand jemandem jemanden jene jenem jenen jener jenes jetzt
|
||||
|
|
|
@ -11,9 +11,9 @@ Example sentences to test spaCy and its language models.
|
|||
|
||||
|
||||
sentences = [
|
||||
"Apple cherche a acheter une startup anglaise pour 1 milliard de dollard",
|
||||
"Les voitures autonomes voient leur assurances décalées vers les constructeurs",
|
||||
"San Francisco envisage d'interdire les robots coursiers",
|
||||
"Apple cherche à acheter une startup anglaise pour 1 milliard de dollars",
|
||||
"Les voitures autonomes déplacent la responsabilité de l'assurance vers les constructeurs",
|
||||
"San Francisco envisage d'interdire les robots coursiers sur les trottoirs",
|
||||
"Londres est une grande ville du Royaume-Uni",
|
||||
"L’Italie choisit ArcelorMittal pour reprendre la plus grande aciérie d’Europe",
|
||||
"Apple lance HomePod parce qu'il se sent menacé par l'Echo d'Amazon",
|
||||
|
|
|
@ -5,6 +5,7 @@ from .tokenizer_exceptions import TOKENIZER_EXCEPTIONS
|
|||
from .tag_map import TAG_MAP
|
||||
from .stop_words import STOP_WORDS
|
||||
from .norm_exceptions import NORM_EXCEPTIONS
|
||||
from .lex_attrs import LEX_ATTRS
|
||||
|
||||
from ..norm_exceptions import BASE_NORMS
|
||||
from ...attrs import LANG, NORM
|
||||
|
@ -27,13 +28,14 @@ class ThaiTokenizer(DummyTokenizer):
|
|||
self.vocab = nlp.vocab if nlp is not None else cls.create_vocab(nlp)
|
||||
|
||||
def __call__(self, text):
|
||||
words = list(self.word_tokenize(text, "newmm"))
|
||||
words = list(self.word_tokenize(text))
|
||||
spaces = [False] * len(words)
|
||||
return Doc(self.vocab, words=words, spaces=spaces)
|
||||
|
||||
|
||||
class ThaiDefaults(Language.Defaults):
|
||||
lex_attr_getters = dict(Language.Defaults.lex_attr_getters)
|
||||
lex_attr_getters.update(LEX_ATTRS)
|
||||
lex_attr_getters[LANG] = lambda _text: "th"
|
||||
lex_attr_getters[NORM] = add_lookups(
|
||||
Language.Defaults.lex_attr_getters[NORM], BASE_NORMS, NORM_EXCEPTIONS
|
||||
|
|
62
spacy/lang/th/lex_attrs.py
Normal file
|
@ -0,0 +1,62 @@
|
|||
# coding: utf8
|
||||
from __future__ import unicode_literals
|
||||
|
||||
from ...attrs import LIKE_NUM
|
||||
|
||||
|
||||
_num_words = [
|
||||
"ศูนย์",
|
||||
"หนึ่ง",
|
||||
"สอง",
|
||||
"สาม",
|
||||
"สี่",
|
||||
"ห้า",
|
||||
"หก",
|
||||
"เจ็ด",
|
||||
"แปด",
|
||||
"เก้า",
|
||||
"สิบ",
|
||||
"สิบเอ็ด",
|
||||
"ยี่สิบ",
|
||||
"ยี่สิบเอ็ด",
|
||||
"สามสิบ",
|
||||
"สามสิบเอ็ด",
|
||||
"สี่สิบ",
|
||||
"สี่สิบเอ็ด",
|
||||
"ห้าสิบ",
|
||||
"ห้าสิบเอ็ด",
|
||||
"หกสิบเอ็ด",
|
||||
"เจ็ดสิบ",
|
||||
"เจ็ดสิบเอ็ด",
|
||||
"แปดสิบ",
|
||||
"แปดสิบเอ็ด",
|
||||
"เก้าสิบ",
|
||||
"เก้าสิบเอ็ด",
|
||||
"ร้อย",
|
||||
"พัน",
|
||||
"ล้าน",
|
||||
"พันล้าน",
|
||||
"หมื่นล้าน",
|
||||
"แสนล้าน",
|
||||
"ล้านล้าน",
|
||||
"ล้านล้านล้าน",
|
||||
"ล้านล้านล้านล้าน",
|
||||
]
|
||||
|
||||
|
||||
def like_num(text):
|
||||
if text.startswith(("+", "-", "±", "~")):
|
||||
text = text[1:]
|
||||
text = text.replace(",", "").replace(".", "")
|
||||
if text.isdigit():
|
||||
return True
|
||||
if text.count("/") == 1:
|
||||
num, denom = text.split("/")
|
||||
if num.isdigit() and denom.isdigit():
|
||||
return True
|
||||
if text in _num_words:
|
||||
return True
|
||||
return False
|
||||
|
||||
|
||||
LEX_ATTRS = {LIKE_NUM: like_num}
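A quick way to see what the new `like_num` accepts is to run it on a few strings. The sketch below is a standalone copy for illustration only, assuming Python 3; `NUM_WORDS` is a deliberately abbreviated, hypothetical stand-in for the full `_num_words` list in the file.

```python
# Abbreviated word list for illustration; the real module lists many more.
NUM_WORDS = {"ศูนย์", "หนึ่ง", "สอง", "สาม", "สิบ", "ร้อย", "พัน", "ล้าน"}


def like_num(text):
    # Ignore a single leading sign or approximation marker.
    if text.startswith(("+", "-", "±", "~")):
        text = text[1:]
    # Drop thousands separators and decimal points before the digit check.
    text = text.replace(",", "").replace(".", "")
    if text.isdigit():
        return True
    # Simple fractions such as 3/4.
    if text.count("/") == 1:
        num, denom = text.split("/")
        if num.isdigit() and denom.isdigit():
            return True
    return text in NUM_WORDS


for token in ["1,000", "-3.5", "3/4", "สิบ", "แมว"]:
    print(token, like_num(token))
```

The first four tokens are treated as number-like, while the last one (an ordinary noun) is not.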
|
|
@ -111,4 +111,3 @@ NORM_EXCEPTIONS = {}
|
|||
for string, norm in _exc.items():
|
||||
NORM_EXCEPTIONS[string] = norm
|
||||
NORM_EXCEPTIONS[string.title()] = norm
|
||||
|
||||
|
|
|
@ -1,5 +1,6 @@
|
|||
# coding: utf8
|
||||
from __future__ import unicode_literals
|
||||
from collections import OrderedDict
|
||||
|
||||
from .symbols import POS, NOUN, VERB, ADJ, PUNCT, PROPN
|
||||
from .symbols import VerbForm_inf, VerbForm_none, Number_sing, Degree_pos
|
||||
|
@ -118,8 +119,8 @@ def lemmatize(string, index, exceptions, rules):
|
|||
forms.append(form)
|
||||
else:
|
||||
oov_forms.append(form)
|
||||
# Remove duplicates, and sort forms generated by rules alphabetically.
|
||||
forms = list(set(forms))
|
||||
# Remove duplicates but preserve the ordering of applied "rules"
|
||||
forms = list(OrderedDict.fromkeys(forms))
|
||||
# Put exceptions at the front of the list, so they get priority.
|
||||
# This is a dodgy heuristic -- but it's the best we can do until we get
|
||||
# frequencies on this. We can at least prune out problematic exceptions,
|
||||
|
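The difference between the two deduplication approaches in the lemmatizer hunk above can be shown in isolation. This is a minimal sketch; the `forms` list is made up for illustration and does not come from spaCy's rules.

```python
from collections import OrderedDict

# Made-up candidate forms, in the order the rules produced them.
forms = ["running", "run", "running", "runn"]

# OrderedDict.fromkeys drops repeats but keeps first-seen order,
# whereas set() gives no ordering guarantee.
unique_in_order = list(OrderedDict.fromkeys(forms))
print(unique_in_order)  # ['running', 'run', 'runn']
```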
|
|
@ -6,6 +6,7 @@ from spacy.attrs import ORTH, LENGTH
|
|||
from spacy.tokens import Doc, Span
|
||||
from spacy.vocab import Vocab
|
||||
from spacy.errors import ModelsWarning
|
||||
from spacy.util import filter_spans
|
||||
|
||||
from ..util import get_doc
|
||||
|
||||
|
@ -219,3 +220,21 @@ def test_span_ents_property(doc):
|
|||
assert sentences[2].ents[0].label_ == "PRODUCT"
|
||||
assert sentences[2].ents[0].start == 11
|
||||
assert sentences[2].ents[0].end == 14
|
||||
|
||||
|
||||
def test_filter_spans(doc):
|
||||
# Test filtering duplicates
|
||||
spans = [doc[1:4], doc[6:8], doc[1:4], doc[10:14]]
|
||||
filtered = filter_spans(spans)
|
||||
assert len(filtered) == 3
|
||||
assert filtered[0].start == 1 and filtered[0].end == 4
|
||||
assert filtered[1].start == 6 and filtered[1].end == 8
|
||||
assert filtered[2].start == 10 and filtered[2].end == 14
|
||||
# Test filtering overlaps with longest preference
|
||||
spans = [doc[1:4], doc[1:3], doc[5:10], doc[7:9], doc[1:4]]
|
||||
filtered = filter_spans(spans)
|
||||
assert len(filtered) == 2
|
||||
assert len(filtered[0]) == 3
|
||||
assert len(filtered[1]) == 5
|
||||
assert filtered[0].start == 1 and filtered[0].end == 4
|
||||
assert filtered[1].start == 5 and filtered[1].end == 10
|
||||
|
|
|
@ -510,7 +510,7 @@ def decaying(start, stop, decay):
|
|||
curr = float(start)
|
||||
while True:
|
||||
yield max(curr, stop)
|
||||
curr -= (decay)
|
||||
curr -= decay
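For reference, the behaviour of this generator can be checked with a short standalone sketch. The function body is copied from the hunk; the start, stop and decay values are arbitrary and chosen only so the floating-point results are exact.

```python
from itertools import islice


def decaying(start, stop, decay):
    curr = float(start)
    while True:
        # Never yield a value below the floor given by `stop`.
        yield max(curr, stop)
        curr -= decay


# The generator is infinite, so take only the first five values.
print(list(islice(decaying(1.0, 0.5, 0.25), 5)))  # [1.0, 0.75, 0.5, 0.5, 0.5]
```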
|
||||
|
||||
|
||||
def minibatch_by_words(items, size, tuples=True, count_words=len):
|
||||
|
@ -571,6 +571,28 @@ def itershuffle(iterable, bufsize=1000):
|
|||
raise StopIteration
|
||||
|
||||
|
||||
def filter_spans(spans):
|
||||
"""Filter a sequence of spans and remove duplicates or overlaps. Useful for
|
||||
creating named entities (where one token can only be part of one entity) or
|
||||
when merging spans with `Retokenizer.merge`. When spans overlap, the (first)
|
||||
longest span is preferred over shorter spans.
|
||||
|
||||
spans (iterable): The spans to filter.
|
||||
RETURNS (list): The filtered spans.
|
||||
"""
|
||||
get_sort_key = lambda span: (span.end - span.start, span.start)
|
||||
sorted_spans = sorted(spans, key=get_sort_key, reverse=True)
|
||||
result = []
|
||||
seen_tokens = set()
|
||||
for span in sorted_spans:
|
||||
# Check for end - 1 here because boundaries are inclusive
|
||||
if span.start not in seen_tokens and span.end - 1 not in seen_tokens:
|
||||
result.append(span)
|
||||
seen_tokens.update(range(span.start, span.end))
|
||||
result = sorted(result, key=lambda span: span.start)
|
||||
return result
|
||||
|
||||
|
||||
def to_bytes(getters, exclude):
|
||||
serialized = OrderedDict()
|
||||
for key, getter in getters.items():
|
||||
|
|
|
@ -457,7 +457,7 @@ sit amet dignissim justo congue.
|
|||
## Setup and installation {#setup}
|
||||
|
||||
Before running the setup, make sure your versions of
|
||||
[Node](https://nodejs.org/en/) and [npm](https://www.npmjs.com/) are up to date.
|
||||
[Node](https://nodejs.org/en/) and [npm](https://www.npmjs.com/) are up to date. Node v10.15 or later is required.
|
||||
|
||||
```bash
|
||||
# Clone the repository
|
||||
|
|
|
@ -198,7 +198,7 @@ will only train the tagger and parser.
|
|||
|
||||
```bash
|
||||
$ python -m spacy train [lang] [output_path] [train_path] [dev_path]
|
||||
[--base-model] [--pipeline] [--vectors] [--n-iter] [--n-examples] [--use-gpu]
|
||||
[--base-model] [--pipeline] [--vectors] [--n-iter] [--n-early-stopping] [--n-examples] [--use-gpu]
|
||||
[--version] [--meta-path] [--init-tok2vec] [--parser-multitasks]
|
||||
[--entity-multitasks] [--gold-preproc] [--noise-level] [--learn-tokens]
|
||||
[--verbose]
|
||||
|
@ -214,6 +214,7 @@ $ python -m spacy train [lang] [output_path] [train_path] [dev_path]
|
|||
| `--pipeline`, `-p` <Tag variant="new">2.1</Tag> | option | Comma-separated names of pipeline components to train. Defaults to `'tagger,parser,ner'`. |
|
||||
| `--vectors`, `-v` | option | Model to load vectors from. |
|
||||
| `--n-iter`, `-n` | option | Number of iterations (default: `30`). |
|
||||
| `--n-early-stopping`, `-ne` | option | Maximum number of training epochs without dev accuracy improvement. |
|
||||
| `--n-examples`, `-ns` | option | Number of examples to use (defaults to `0` for all examples). |
|
||||
| `--use-gpu`, `-g` | option | Whether to use GPU. Can be either `0`, `1` or `-1`. |
|
||||
| `--version`, `-V` | option | Model version. Will be written out to the model's `meta.json` after training. |
|
||||
|
@ -285,24 +286,26 @@ improvement.
|
|||
```bash
|
||||
$ python -m spacy pretrain [texts_loc] [vectors_model] [output_dir] [--width]
|
||||
[--depth] [--embed-rows] [--dropout] [--seed] [--n-iter] [--use-vectors]
|
||||
[--n-save_every]
|
||||
```
|
||||
|
||||
| Argument | Type | Description |
|
||||
| ---------------------- | ---------- | --------------------------------------------------------------------------------------------------------------------------------- |
|
||||
| `texts_loc` | positional | Path to JSONL file with raw texts to learn from, with text provided as the key `"text"`. [See here](#pretrain-jsonl) for details. |
|
||||
| `vectors_model` | positional | Name or path to spaCy model with vectors to learn from. |
|
||||
| `output_dir` | positional | Directory to write models to on each epoch. |
|
||||
| `--width`, `-cw` | option | Width of CNN layers. |
|
||||
| `--depth`, `-cd` | option | Depth of CNN layers. |
|
||||
| `--embed-rows`, `-er` | option | Number of embedding rows. |
|
||||
| `--dropout`, `-d` | option | Dropout rate. |
|
||||
| `--batch-size`, `-bs` | option | Number of words per training batch. |
|
||||
| `--max-length`, `-xw` | option | Maximum words per example. Longer examples are discarded. |
|
||||
| `--min-length`, `-nw` | option | Minimum words per example. Shorter examples are discarded. |
|
||||
| `--seed`, `-s` | option | Seed for random number generators. |
|
||||
| `--n-iter`, `-i` | option | Number of iterations to pretrain. |
|
||||
| `--use-vectors`, `-uv` | flag | Whether to use the static vectors as input features. |
|
||||
| **CREATES** | weights | The pre-trained weights that can be used to initialize `spacy train`. |
|
||||
| Argument | Type | Description |
|
||||
| ----------------------- | ---------- | --------------------------------------------------------------------------------------------------------------------------------- |
|
||||
| `texts_loc` | positional | Path to JSONL file with raw texts to learn from, with text provided as the key `"text"`. [See here](#pretrain-jsonl) for details. |
|
||||
| `vectors_model` | positional | Name or path to spaCy model with vectors to learn from. |
|
||||
| `output_dir` | positional | Directory to write models to on each epoch. |
|
||||
| `--width`, `-cw` | option | Width of CNN layers. |
|
||||
| `--depth`, `-cd` | option | Depth of CNN layers. |
|
||||
| `--embed-rows`, `-er` | option | Number of embedding rows. |
|
||||
| `--dropout`, `-d` | option | Dropout rate. |
|
||||
| `--batch-size`, `-bs` | option | Number of words per training batch. |
|
||||
| `--max-length`, `-xw` | option | Maximum words per example. Longer examples are discarded. |
|
||||
| `--min-length`, `-nw` | option | Minimum words per example. Shorter examples are discarded. |
|
||||
| `--seed`, `-s` | option | Seed for random number generators. |
|
||||
| `--n-iter`, `-i` | option | Number of iterations to pretrain. |
|
||||
| `--use-vectors`, `-uv` | flag | Whether to use the static vectors as input features. |
|
||||
| `--n-save_every`, `-se` | option | Save model every X batches. |
|
||||
| **CREATES** | weights | The pre-trained weights that can be used to initialize `spacy train`. |
|
||||
|
||||
### JSONL format for raw text {#pretrain-jsonl}
|
||||
|
||||
|
@ -324,7 +327,7 @@ tokenization can be provided.
|
|||
|
||||
| Key | Type | Description |
|
||||
| -------- | ------- | -------------------------------------------- |
|
||||
| `text` | unicode | The raw input text. |
|
||||
| `text` | unicode | The raw input text. Not required if `tokens` is available. |
|
||||
| `tokens` | list | Optional tokenization, one string per token. |
|
||||
|
||||
```json
|
||||
|
@ -332,6 +335,7 @@ tokenization can be provided.
|
|||
{"text": "Can I ask where you work now and what you do, and if you enjoy it?"}
|
||||
{"text": "They may just pull out of the Seattle market completely, at least until they have autonomous vehicles."}
|
||||
{"text": "My cynical view on this is that it will never be free to the public. Reason: what would be the draw of joining the military? Right now their selling point is free Healthcare and Education. Ironically both are run horribly and most, that I've talked to, come out wishing they never went in."}
|
||||
{"tokens": ["If", "tokens", "are", "provided", "then", "we", "can", "skip", "the", "raw", "input", "text"]}
|
||||
```
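As a rough sketch of how records in this format might be consumed, the snippet below reads each JSONL line and prefers `tokens` over `text`. `record_to_words` is a hypothetical helper, and the whitespace split is only an assumed stand-in for a real tokenizer such as `nlp.make_doc`.

```python
import json


def record_to_words(record):
    """Return the token list for one JSONL record (hypothetical helper)."""
    if "tokens" in record:
        # Pre-tokenized input: the raw text is not needed.
        return list(record["tokens"])
    # Assumption: whitespace splitting stands in for a real tokenizer.
    return record["text"].split()


lines = [
    '{"text": "Can I ask where you work now?"}',
    '{"tokens": ["If", "tokens", "are", "provided"]}',
]
for line in lines:
    print(record_to_words(json.loads(line)))
```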
|
||||
|
||||
## Init Model {#init-model new="2"}
|
||||
|
@ -375,7 +379,7 @@ pipeline.
|
|||
|
||||
```bash
|
||||
$ python -m spacy evaluate [model] [data_path] [--displacy-path] [--displacy-limit]
|
||||
[--gpu-id] [--gold-preproc]
|
||||
[--gpu-id] [--gold-preproc] [--return-scores]
|
||||
```
|
||||
|
||||
| Argument | Type | Description |
|
||||
|
@ -386,6 +390,7 @@ $ python -m spacy evaluate [model] [data_path] [--displacy-path] [--displacy-lim
|
|||
| `--displacy-limit`, `-dl` | option | Number of parses to generate per file. Defaults to `25`. Keep in mind that a significantly higher number might cause the `.html` files to render slowly. |
|
||||
| `--gpu-id`, `-g` | option | GPU to use, if any. Defaults to `-1` for CPU. |
|
||||
| `--gold-preproc`, `-G` | flag | Use gold preprocessing. |
|
||||
| `--return-scores`, `-R` | flag | Return dict containing model scores. |
|
||||
| **CREATES** | `stdout`, HTML | Training results and optional displaCy visualizations. |
|
||||
|
||||
## Package {#package}
|
||||
|
|
|
@ -211,16 +211,16 @@ Render a dependency parse tree or named entity visualization.
|
|||
> html = displacy.render(doc, style="dep")
|
||||
> ```
|
||||
|
||||
| Name | Type | Description | Default |
|
||||
| ----------- | ------------------- | ------------------------------------------------------------------------------------------------------------------------------------ | ---------------------- |
|
||||
| `docs` | list, `Doc`, `Span` | Document(s) to visualize. |
|
||||
| `style` | unicode | Visualization style, `'dep'` or `'ent'`. | `'dep'` |
|
||||
| `page` | bool | Render markup as full HTML page. | `False` |
|
||||
| `minify` | bool | Minify HTML markup. | `False` |
|
||||
| `jupyter` | bool | Explicitly enable "[Jupyter](http://jupyter.org/) mode" to return markup ready to be rendered in a notebook. | detected automatically |
|
||||
| `options` | dict | [Visualizer-specific options](#options), e.g. colors. | `{}` |
|
||||
| `manual` | bool | Don't parse `Doc` and instead, expect a dict or list of dicts. [See here](/usage/visualizers#manual-usage) for formats and examples. | `False` |
|
||||
| **RETURNS** | unicode | Rendered HTML markup. |
|
||||
| Name | Type | Description | Default |
|
||||
| ----------- | ------------------- | --------------------------------------------------------------------------------------------------------------------------------------------------------- | ------- |
|
||||
| `docs` | list, `Doc`, `Span` | Document(s) to visualize. |
|
||||
| `style` | unicode | Visualization style, `'dep'` or `'ent'`. | `'dep'` |
|
||||
| `page` | bool | Render markup as full HTML page. | `False` |
|
||||
| `minify` | bool | Minify HTML markup. | `False` |
|
||||
| `jupyter` | bool | Explicitly enable or disable "[Jupyter](http://jupyter.org/) mode" to return markup ready to be rendered in a notebook. Detected automatically if `None`. | `None` |
|
||||
| `options` | dict | [Visualizer-specific options](#options), e.g. colors. | `{}` |
|
||||
| `manual` | bool | Don't parse `Doc` and instead, expect a dict or list of dicts. [See here](/usage/visualizers#manual-usage) for formats and examples. | `False` |
|
||||
| **RETURNS** | unicode | Rendered HTML markup. |
|
||||
|
||||
### Visualizer options {#displacy_options}
|
||||
|
||||
|
@ -654,6 +654,27 @@ for batching. Larger `buffsize` means less bias.
|
|||
| `buffsize` | int | Items to hold back. |
|
||||
| **YIELDS** | iterable | The shuffled iterator. |
|
||||
|
||||
### util.filter_spans {#util.filter_spans tag="function" new="2.1.4"}
|
||||
|
||||
Filter a sequence of [`Span`](/api/span) objects and remove duplicates or
|
||||
overlaps. Useful for creating named entities (where one token can only be part
|
||||
of one entity) or when merging spans with
|
||||
[`Retokenizer.merge`](/api/doc#retokenizer.merge). When spans overlap, the
|
||||
(first) longest span is preferred over shorter spans.
|
||||
|
||||
> #### Example
|
||||
>
|
||||
> ```python
|
||||
> doc = nlp("This is a sentence.")
|
||||
> spans = [doc[0:2], doc[0:2], doc[0:4]]
|
||||
> filtered = filter_spans(spans)
|
||||
> ```
|
||||
|
||||
| Name | Type | Description |
|
||||
| ----------- | -------- | -------------------- |
|
||||
| `spans` | iterable | The spans to filter. |
|
||||
| **RETURNS** | list | The filtered spans. |
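
The filtering rule described above can be sketched without spaCy, as a rough illustration only. The sketch works on plain `(start, end)` tuples with an exclusive end, which is an assumption made for the example; `filter_intervals` is a hypothetical name, not a spaCy function. It mirrors the sort key and overlap check used by `filter_spans` in this commit.

```python
def filter_intervals(intervals):
    """Drop duplicate or overlapping (start, end) intervals, longest first.

    Hypothetical stand-in for util.filter_spans that uses tuples instead of
    Span objects. `end` is exclusive, as with Span.end.
    """
    # Longest intervals are visited first, so they win over shorter overlaps.
    by_length = sorted(intervals, key=lambda iv: (iv[1] - iv[0], iv[0]), reverse=True)
    kept = []
    seen = set()
    for start, end in by_length:
        # The last covered index is end - 1 because end is exclusive.
        if start not in seen and end - 1 not in seen:
            kept.append((start, end))
            seen.update(range(start, end))
    # Return the survivors in document order.
    return sorted(kept, key=lambda iv: iv[0])


print(filter_intervals([(1, 4), (1, 3), (5, 10), (7, 9), (1, 4)]))
```

For this input the duplicate `(1, 4)` and the shorter overlapping intervals are dropped, leaving `[(1, 4), (5, 10)]`.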
|
||||
|
||||
## Compatibility functions {#compat source="spacy/compat.py"}
|
||||
|
||||
All Python code is written in an **intersection of Python 2 and Python 3**. This
|
||||
|
|
|
@ -4,7 +4,7 @@ example, everything that's in your `nlp` object. This means you'll have to
|
|||
translate its contents and structure into a format that can be saved, like a
|
||||
file or a byte string. This process is called serialization. spaCy comes with
|
||||
**built-in serialization methods** and supports the
|
||||
[Pickle protocol](http://www.diveintopython3.net/serializing.html#dump).
|
||||
[Pickle protocol](https://www.diveinto.org/python3/serializing.html#dump).
|
||||
|
||||
> #### What's pickle?
|
||||
>
|
||||
|
|
|
@ -260,7 +260,7 @@ def my_component(doc):
|
|||
|
||||
nlp = spacy.load("en_core_web_sm")
|
||||
nlp.add_pipe(my_component, name="print_info", last=True)
|
||||
print(nlp.pipe_names) # ['print_info', 'tagger', 'parser', 'ner']
|
||||
print(nlp.pipe_names) # ['tagger', 'parser', 'ner', 'print_info']
|
||||
doc = nlp(u"This is a sentence.")
|
||||
|
||||
```
|
||||
|
|
|
@ -713,9 +713,9 @@ from spacy.matcher import PhraseMatcher
|
|||
|
||||
nlp = spacy.load('en_core_web_sm')
|
||||
matcher = PhraseMatcher(nlp.vocab)
|
||||
terminology_list = [u"Barack Obama", u"Angela Merkel", u"Washington, D.C."]
|
||||
terms = [u"Barack Obama", u"Angela Merkel", u"Washington, D.C."]
|
||||
# Only run nlp.make_doc to speed things up
|
||||
patterns = [nlp.make_doc(text) for text in terminology_list]
|
||||
patterns = [nlp.make_doc(text) for text in terms]
|
||||
matcher.add("TerminologyList", None, *patterns)
|
||||
|
||||
doc = nlp(u"German Chancellor Angela Merkel and US President Barack Obama "
|
||||
|
|
|
@ -102,7 +102,7 @@ systems, or to pre-process text for **deep learning**.
|
|||
integrated and opinionated. spaCy tries to avoid asking the user to choose
|
||||
between multiple algorithms that deliver equivalent functionality. Keeping the
|
||||
menu small lets spaCy deliver generally better performance and developer
|
||||
experience.M
|
||||
experience.
|
||||
|
||||
- **spaCy is not a company**. It's an open-source library. Our company
|
||||
publishing spaCy and other software is called
|
||||
|
|
|
@@ -980,6 +980,22 @@
            },
            "category": ["podcasts"]
        },
        {
            "type": "education",
            "id": "twimlai-podcast",
            "title": "TWiML & AI: Practical NLP with spaCy and Prodigy",
            "slogan": "May 2019",
            "description": "\"Ines and I caught up to discuss her various projects, including the aforementioned SpaCy, an open-source NLP library built with a focus on industry and production use cases. In our conversation, Ines gives us an overview of the SpaCy Library, a look at some of the use cases that excite her, and the Spacy community and contributors. We also discuss her work with Prodigy, an annotation service tool that uses continuous active learning to train models, and finally, what other exciting projects she is working on.\"",
            "thumb": "https://i.imgur.com/ng2F5gK.png",
            "url": "https://twimlai.com/twiml-talk-262-practical-natural-language-processing-with-spacy-and-prodigy-w-ines-montani",
            "iframe": "https://html5-player.libsyn.com/embed/episode/id/9691514/height/90/theme/custom/thumbnail/no/preload/no/direction/backward/render-playlist/no/custom-color/3e85b1/",
            "iframe_height": 90,
            "author": "Sam Charrington",
            "author_links": {
                "website": "https://twimlai.com"
            },
            "category": ["podcasts"]
        },
        {
            "id": "adam_qas",
            "title": "ADAM: Question Answering System",

@@ -1338,8 +1354,43 @@
            },
            "category": ["pipeline"],
            "tags": ["inflection"]
        },
        {
            "id": "NGym",
            "title": "NeuralGym",
            "slogan": "A little Windows GUI for training models with spaCy",
            "description": "NeuralGym is a Python application for Windows with a graphical user interface to train models with spaCy. Run the application, select an output folder, a training data file in spaCy's data format, a spaCy model or blank model and press 'Start'.",
            "github": "d5555/NeuralGym",
            "url": "https://github.com/d5555/NeuralGym",
            "image": "https://github.com/d5555/NeuralGym/raw/master/NGym.png",
            "thumb": "https://github.com/d5555/NeuralGym/raw/master/NGym/web.png",
            "author": "d5555",
            "category": ["training"],
            "tags": ["windows"]
        },
        {
            "id": "holmes",
            "title": "Holmes",
            "slogan": "Information extraction from English and German texts based on predicate logic",
            "github": "msg-systems/holmes-extractor",
            "url": "https://github.com/msg-systems/holmes-extractor",
            "description": "Holmes is a Python 3 library that supports a number of use cases involving information extraction from English and German texts, including chatbot, structural search, topic matching and supervised document classification.",
            "pip": "holmes-extractor",
            "category": ["conversational", "research", "standalone"],
            "tags": ["chatbots", "text-processing"],
            "code_example": [
                "import holmes_extractor as holmes",
                "holmes_manager = holmes.Manager(model='en_coref_lg')",
                "holmes_manager.register_search_phrase('A big dog chases a cat')",
                "holmes_manager.start_chatbot_mode_console()"
            ],
            "author": "Richard Paul Hudson",
            "author_links": {
                "github": "richardpaulhudson"
            }
        }
    ],

    "categories": [
        {
            "label": "Projects",