Mirror of https://github.com/explosion/spaCy.git (synced 2025-01-25 00:34:20 +03:00)

Merge branch 'master' into feature/nel-wiki

commit d83a1e3052

.github/contributors/BreakBB.md (106 lines, vendored, normal file)

@@ -0,0 +1,106 @@
# spaCy contributor agreement

This spaCy Contributor Agreement (**"SCA"**) is based on the
[Oracle Contributor Agreement](http://www.oracle.com/technetwork/oca-405177.pdf).
The SCA applies to any contribution that you make to any product or project
managed by us (the **"project"**), and sets out the intellectual property rights
you grant to us in the contributed materials. The term **"us"** shall mean
[ExplosionAI UG (haftungsbeschränkt)](https://explosion.ai/legal). The term
**"you"** shall mean the person or entity identified below.

If you agree to be bound by these terms, fill in the information requested
below and include the filled-in version with your first pull request, under the
folder [`.github/contributors/`](/.github/contributors/). The name of the file
should be your GitHub username, with the extension `.md`. For example, the user
example_user would create the file `.github/contributors/example_user.md`.

Read this agreement carefully before signing. These terms and conditions
constitute a binding legal agreement.

## Contributor Agreement

1. The term "contribution" or "contributed materials" means any source code,
object code, patch, tool, sample, graphic, specification, manual,
documentation, or any other material posted or submitted by you to the project.

2. With respect to any worldwide copyrights, or copyright applications and
registrations, in your contribution:

    * you hereby assign to us joint ownership, and to the extent that such
    assignment is or becomes invalid, ineffective or unenforceable, you hereby
    grant to us a perpetual, irrevocable, non-exclusive, worldwide, no-charge,
    royalty-free, unrestricted license to exercise all rights under those
    copyrights. This includes, at our option, the right to sublicense these same
    rights to third parties through multiple levels of sublicensees or other
    licensing arrangements;

    * you agree that each of us can do all things in relation to your
    contribution as if each of us were the sole owners, and if one of us makes
    a derivative work of your contribution, the one who makes the derivative
    work (or has it made will be the sole owner of that derivative work;

    * you agree that you will not assert any moral rights in your contribution
    against us, our licensees or transferees;

    * you agree that we may register a copyright in your contribution and
    exercise all ownership rights associated with it; and

    * you agree that neither of us has any duty to consult with, obtain the
    consent of, pay or render an accounting to the other for any use or
    distribution of your contribution.

3. With respect to any patents you own, or that you can license without payment
to any third party, you hereby grant to us a perpetual, irrevocable,
non-exclusive, worldwide, no-charge, royalty-free license to:

    * make, have made, use, sell, offer to sell, import, and otherwise transfer
    your contribution in whole or in part, alone or in combination with or
    included in any product, work or materials arising out of the project to
    which your contribution was submitted, and

    * at our option, to sublicense these same rights to third parties through
    multiple levels of sublicensees or other licensing arrangements.

4. Except as set out above, you keep all right, title, and interest in your
contribution. The rights that you grant to us under these terms are effective
on the date you first submitted a contribution to us, even if your submission
took place before the date you sign these terms.

5. You covenant, represent, warrant and agree that:

    * Each contribution that you submit is and shall be an original work of
    authorship and you can legally grant the rights set out in this SCA;

    * to the best of your knowledge, each contribution will not violate any
    third party's copyrights, trademarks, patents, or other intellectual
    property rights; and

    * each contribution shall be in compliance with U.S. export control laws and
    other applicable export and import laws. You agree to notify us if you
    become aware of any circumstance which would make any of the foregoing
    representations inaccurate in any respect. We may publicly disclose your
    participation in the project, including the fact that you have signed the SCA.

6. This SCA is governed by the laws of the State of California and applicable
U.S. Federal law. Any choice of law rules will not apply.

7. Please place an “x” on one of the applicable statement below. Please do NOT
mark both statements:

    * [x] I am signing on behalf of myself as an individual and no other person
    or entity, including my employer, has or will have rights with respect to my
    contributions.

    * [ ] I am signing on behalf of my employer or a legal entity and I have the
    actual authority to contractually bind that entity.

## Contributor Details

| Field                          | Entry                    |
|------------------------------- | ------------------------ |
| Name                           | Björn Böing              |
| Company name (if applicable)   |                          |
| Title or role (if applicable)  |                          |
| Date                           | 15.04.2019               |
| GitHub username                | BreakBB                  |
| Website (optional)             |                          |

.github/contributors/Dobita21.md (106 lines, vendored, normal file)

@@ -0,0 +1,106 @@
# spaCy contributor agreement

This spaCy Contributor Agreement (**"SCA"**) is based on the
[Oracle Contributor Agreement](http://www.oracle.com/technetwork/oca-405177.pdf).
The SCA applies to any contribution that you make to any product or project
managed by us (the **"project"**), and sets out the intellectual property rights
you grant to us in the contributed materials. The term **"us"** shall mean
[ExplosionAI GmbH](https://explosion.ai/legal). The term
**"you"** shall mean the person or entity identified below.

If you agree to be bound by these terms, fill in the information requested
below and include the filled-in version with your first pull request, under the
folder [`.github/contributors/`](/.github/contributors/). The name of the file
should be your GitHub username, with the extension `.md`. For example, the user
example_user would create the file `.github/contributors/example_user.md`.

Read this agreement carefully before signing. These terms and conditions
constitute a binding legal agreement.

## Contributor Agreement

1. The term "contribution" or "contributed materials" means any source code,
object code, patch, tool, sample, graphic, specification, manual,
documentation, or any other material posted or submitted by you to the project.

2. With respect to any worldwide copyrights, or copyright applications and
registrations, in your contribution:

    * you hereby assign to us joint ownership, and to the extent that such
    assignment is or becomes invalid, ineffective or unenforceable, you hereby
    grant to us a perpetual, irrevocable, non-exclusive, worldwide, no-charge,
    royalty-free, unrestricted license to exercise all rights under those
    copyrights. This includes, at our option, the right to sublicense these same
    rights to third parties through multiple levels of sublicensees or other
    licensing arrangements;

    * you agree that each of us can do all things in relation to your
    contribution as if each of us were the sole owners, and if one of us makes
    a derivative work of your contribution, the one who makes the derivative
    work (or has it made will be the sole owner of that derivative work;

    * you agree that you will not assert any moral rights in your contribution
    against us, our licensees or transferees;

    * you agree that we may register a copyright in your contribution and
    exercise all ownership rights associated with it; and

    * you agree that neither of us has any duty to consult with, obtain the
    consent of, pay or render an accounting to the other for any use or
    distribution of your contribution.

3. With respect to any patents you own, or that you can license without payment
to any third party, you hereby grant to us a perpetual, irrevocable,
non-exclusive, worldwide, no-charge, royalty-free license to:

    * make, have made, use, sell, offer to sell, import, and otherwise transfer
    your contribution in whole or in part, alone or in combination with or
    included in any product, work or materials arising out of the project to
    which your contribution was submitted, and

    * at our option, to sublicense these same rights to third parties through
    multiple levels of sublicensees or other licensing arrangements.

4. Except as set out above, you keep all right, title, and interest in your
contribution. The rights that you grant to us under these terms are effective
on the date you first submitted a contribution to us, even if your submission
took place before the date you sign these terms.

5. You covenant, represent, warrant and agree that:

    * Each contribution that you submit is and shall be an original work of
    authorship and you can legally grant the rights set out in this SCA;

    * to the best of your knowledge, each contribution will not violate any
    third party's copyrights, trademarks, patents, or other intellectual
    property rights; and

    * each contribution shall be in compliance with U.S. export control laws and
    other applicable export and import laws. You agree to notify us if you
    become aware of any circumstance which would make any of the foregoing
    representations inaccurate in any respect. We may publicly disclose your
    participation in the project, including the fact that you have signed the SCA.

6. This SCA is governed by the laws of the State of California and applicable
U.S. Federal law. Any choice of law rules will not apply.

7. Please place an “x” on one of the applicable statement below. Please do NOT
mark both statements:

    * [x] I am signing on behalf of myself as an individual and no other person
    or entity, including my employer, has or will have rights with respect to my
    contributions.

    * [ ] I am signing on behalf of my employer or a legal entity and I have the
    actual authority to contractually bind that entity.

## Contributor Details

| Field                          | Entry                |
|------------------------------- | -------------------- |
| Name                           | Nattapol             |
| Company name (if applicable)   |                      |
| Title or role (if applicable)  |                      |
| Date                           | 18.04.2019           |
| GitHub username                | Dobita21             |
| Website (optional)             |                      |

.github/contributors/F0rge1cE.md (106 lines, vendored, normal file)

@@ -0,0 +1,106 @@
# spaCy contributor agreement

This spaCy Contributor Agreement (**"SCA"**) is based on the
[Oracle Contributor Agreement](http://www.oracle.com/technetwork/oca-405177.pdf).
The SCA applies to any contribution that you make to any product or project
managed by us (the **"project"**), and sets out the intellectual property rights
you grant to us in the contributed materials. The term **"us"** shall mean
[ExplosionAI GmbH](https://explosion.ai/legal). The term
**"you"** shall mean the person or entity identified below.

If you agree to be bound by these terms, fill in the information requested
below and include the filled-in version with your first pull request, under the
folder [`.github/contributors/`](/.github/contributors/). The name of the file
should be your GitHub username, with the extension `.md`. For example, the user
example_user would create the file `.github/contributors/example_user.md`.

Read this agreement carefully before signing. These terms and conditions
constitute a binding legal agreement.

## Contributor Agreement

1. The term "contribution" or "contributed materials" means any source code,
object code, patch, tool, sample, graphic, specification, manual,
documentation, or any other material posted or submitted by you to the project.

2. With respect to any worldwide copyrights, or copyright applications and
registrations, in your contribution:

    * you hereby assign to us joint ownership, and to the extent that such
    assignment is or becomes invalid, ineffective or unenforceable, you hereby
    grant to us a perpetual, irrevocable, non-exclusive, worldwide, no-charge,
    royalty-free, unrestricted license to exercise all rights under those
    copyrights. This includes, at our option, the right to sublicense these same
    rights to third parties through multiple levels of sublicensees or other
    licensing arrangements;

    * you agree that each of us can do all things in relation to your
    contribution as if each of us were the sole owners, and if one of us makes
    a derivative work of your contribution, the one who makes the derivative
    work (or has it made will be the sole owner of that derivative work;

    * you agree that you will not assert any moral rights in your contribution
    against us, our licensees or transferees;

    * you agree that we may register a copyright in your contribution and
    exercise all ownership rights associated with it; and

    * you agree that neither of us has any duty to consult with, obtain the
    consent of, pay or render an accounting to the other for any use or
    distribution of your contribution.

3. With respect to any patents you own, or that you can license without payment
to any third party, you hereby grant to us a perpetual, irrevocable,
non-exclusive, worldwide, no-charge, royalty-free license to:

    * make, have made, use, sell, offer to sell, import, and otherwise transfer
    your contribution in whole or in part, alone or in combination with or
    included in any product, work or materials arising out of the project to
    which your contribution was submitted, and

    * at our option, to sublicense these same rights to third parties through
    multiple levels of sublicensees or other licensing arrangements.

4. Except as set out above, you keep all right, title, and interest in your
contribution. The rights that you grant to us under these terms are effective
on the date you first submitted a contribution to us, even if your submission
took place before the date you sign these terms.

5. You covenant, represent, warrant and agree that:

    * Each contribution that you submit is and shall be an original work of
    authorship and you can legally grant the rights set out in this SCA;

    * to the best of your knowledge, each contribution will not violate any
    third party's copyrights, trademarks, patents, or other intellectual
    property rights; and

    * each contribution shall be in compliance with U.S. export control laws and
    other applicable export and import laws. You agree to notify us if you
    become aware of any circumstance which would make any of the foregoing
    representations inaccurate in any respect. We may publicly disclose your
    participation in the project, including the fact that you have signed the SCA.

6. This SCA is governed by the laws of the State of California and applicable
U.S. Federal law. Any choice of law rules will not apply.

7. Please place an “x” on one of the applicable statement below. Please do NOT
mark both statements:

    * [x] I am signing on behalf of myself as an individual and no other person
    or entity, including my employer, has or will have rights with respect to my
    contributions.

    * [x] I am signing on behalf of my employer or a legal entity and I have the
    actual authority to contractually bind that entity.

## Contributor Details

| Field                          | Entry                |
|------------------------------- | -------------------- |
| Name                           | Icarus Xu            |
| Company name (if applicable)   |                      |
| Title or role (if applicable)  |                      |
| Date                           | 05/06/2019           |
| GitHub username                | F0rge1cE             |
| Website (optional)             |                      |

.github/contributors/NirantK.md (106 lines, vendored, normal file)

@@ -0,0 +1,106 @@
# spaCy contributor agreement

This spaCy Contributor Agreement (**"SCA"**) is based on the
[Oracle Contributor Agreement](http://www.oracle.com/technetwork/oca-405177.pdf).
The SCA applies to any contribution that you make to any product or project
managed by us (the **"project"**), and sets out the intellectual property rights
you grant to us in the contributed materials. The term **"us"** shall mean
[ExplosionAI GmbH](https://explosion.ai/legal). The term
**"you"** shall mean the person or entity identified below.

If you agree to be bound by these terms, fill in the information requested
below and include the filled-in version with your first pull request, under the
folder [`.github/contributors/`](/.github/contributors/). The name of the file
should be your GitHub username, with the extension `.md`. For example, the user
example_user would create the file `.github/contributors/example_user.md`.

Read this agreement carefully before signing. These terms and conditions
constitute a binding legal agreement.

## Contributor Agreement

1. The term "contribution" or "contributed materials" means any source code,
object code, patch, tool, sample, graphic, specification, manual,
documentation, or any other material posted or submitted by you to the project.

2. With respect to any worldwide copyrights, or copyright applications and
registrations, in your contribution:

    * you hereby assign to us joint ownership, and to the extent that such
    assignment is or becomes invalid, ineffective or unenforceable, you hereby
    grant to us a perpetual, irrevocable, non-exclusive, worldwide, no-charge,
    royalty-free, unrestricted license to exercise all rights under those
    copyrights. This includes, at our option, the right to sublicense these same
    rights to third parties through multiple levels of sublicensees or other
    licensing arrangements;

    * you agree that each of us can do all things in relation to your
    contribution as if each of us were the sole owners, and if one of us makes
    a derivative work of your contribution, the one who makes the derivative
    work (or has it made will be the sole owner of that derivative work;

    * you agree that you will not assert any moral rights in your contribution
    against us, our licensees or transferees;

    * you agree that we may register a copyright in your contribution and
    exercise all ownership rights associated with it; and

    * you agree that neither of us has any duty to consult with, obtain the
    consent of, pay or render an accounting to the other for any use or
    distribution of your contribution.

3. With respect to any patents you own, or that you can license without payment
to any third party, you hereby grant to us a perpetual, irrevocable,
non-exclusive, worldwide, no-charge, royalty-free license to:

    * make, have made, use, sell, offer to sell, import, and otherwise transfer
    your contribution in whole or in part, alone or in combination with or
    included in any product, work or materials arising out of the project to
    which your contribution was submitted, and

    * at our option, to sublicense these same rights to third parties through
    multiple levels of sublicensees or other licensing arrangements.

4. Except as set out above, you keep all right, title, and interest in your
contribution. The rights that you grant to us under these terms are effective
on the date you first submitted a contribution to us, even if your submission
took place before the date you sign these terms.

5. You covenant, represent, warrant and agree that:

    * Each contribution that you submit is and shall be an original work of
    authorship and you can legally grant the rights set out in this SCA;

    * to the best of your knowledge, each contribution will not violate any
    third party's copyrights, trademarks, patents, or other intellectual
    property rights; and

    * each contribution shall be in compliance with U.S. export control laws and
    other applicable export and import laws. You agree to notify us if you
    become aware of any circumstance which would make any of the foregoing
    representations inaccurate in any respect. We may publicly disclose your
    participation in the project, including the fact that you have signed the SCA.

6. This SCA is governed by the laws of the State of California and applicable
U.S. Federal law. Any choice of law rules will not apply.

7. Please place an “x” on one of the applicable statement below. Please do NOT
mark both statements:

    * [x] I am signing on behalf of myself as an individual and no other person
    or entity, including my employer, has or will have rights with respect to my
    contributions.

    * [ ] I am signing on behalf of my employer or a legal entity and I have the
    actual authority to contractually bind that entity.

## Contributor Details

| Field                          | Entry                |
|------------------------------- | -------------------- |
| Name                           | Nirant Kasliwal      |
| Company name (if applicable)   |                      |
| Title or role (if applicable)  |                      |
| Date                           |                      |
| GitHub username                | NirantK              |
| Website (optional)             | https://nirantk.com  |

.github/contributors/aaronkub.md (106 lines, vendored, normal file)

@@ -0,0 +1,106 @@
# spaCy contributor agreement

This spaCy Contributor Agreement (**"SCA"**) is based on the
[Oracle Contributor Agreement](http://www.oracle.com/technetwork/oca-405177.pdf).
The SCA applies to any contribution that you make to any product or project
managed by us (the **"project"**), and sets out the intellectual property rights
you grant to us in the contributed materials. The term **"us"** shall mean
[ExplosionAI GmbH](https://explosion.ai/legal). The term
**"you"** shall mean the person or entity identified below.

If you agree to be bound by these terms, fill in the information requested
below and include the filled-in version with your first pull request, under the
folder [`.github/contributors/`](/.github/contributors/). The name of the file
should be your GitHub username, with the extension `.md`. For example, the user
example_user would create the file `.github/contributors/example_user.md`.

Read this agreement carefully before signing. These terms and conditions
constitute a binding legal agreement.

## Contributor Agreement

1. The term "contribution" or "contributed materials" means any source code,
object code, patch, tool, sample, graphic, specification, manual,
documentation, or any other material posted or submitted by you to the project.

2. With respect to any worldwide copyrights, or copyright applications and
registrations, in your contribution:

    * you hereby assign to us joint ownership, and to the extent that such
    assignment is or becomes invalid, ineffective or unenforceable, you hereby
    grant to us a perpetual, irrevocable, non-exclusive, worldwide, no-charge,
    royalty-free, unrestricted license to exercise all rights under those
    copyrights. This includes, at our option, the right to sublicense these same
    rights to third parties through multiple levels of sublicensees or other
    licensing arrangements;

    * you agree that each of us can do all things in relation to your
    contribution as if each of us were the sole owners, and if one of us makes
    a derivative work of your contribution, the one who makes the derivative
    work (or has it made will be the sole owner of that derivative work;

    * you agree that you will not assert any moral rights in your contribution
    against us, our licensees or transferees;

    * you agree that we may register a copyright in your contribution and
    exercise all ownership rights associated with it; and

    * you agree that neither of us has any duty to consult with, obtain the
    consent of, pay or render an accounting to the other for any use or
    distribution of your contribution.

3. With respect to any patents you own, or that you can license without payment
to any third party, you hereby grant to us a perpetual, irrevocable,
non-exclusive, worldwide, no-charge, royalty-free license to:

    * make, have made, use, sell, offer to sell, import, and otherwise transfer
    your contribution in whole or in part, alone or in combination with or
    included in any product, work or materials arising out of the project to
    which your contribution was submitted, and

    * at our option, to sublicense these same rights to third parties through
    multiple levels of sublicensees or other licensing arrangements.

4. Except as set out above, you keep all right, title, and interest in your
contribution. The rights that you grant to us under these terms are effective
on the date you first submitted a contribution to us, even if your submission
took place before the date you sign these terms.

5. You covenant, represent, warrant and agree that:

    * Each contribution that you submit is and shall be an original work of
    authorship and you can legally grant the rights set out in this SCA;

    * to the best of your knowledge, each contribution will not violate any
    third party's copyrights, trademarks, patents, or other intellectual
    property rights; and

    * each contribution shall be in compliance with U.S. export control laws and
    other applicable export and import laws. You agree to notify us if you
    become aware of any circumstance which would make any of the foregoing
    representations inaccurate in any respect. We may publicly disclose your
    participation in the project, including the fact that you have signed the SCA.

6. This SCA is governed by the laws of the State of California and applicable
U.S. Federal law. Any choice of law rules will not apply.

7. Please place an “x” on one of the applicable statement below. Please do NOT
mark both statements:

    * [x] I am signing on behalf of myself as an individual and no other person
    or entity, including my employer, has or will have rights with respect to my
    contributions.

    * [ ] I am signing on behalf of my employer or a legal entity and I have the
    actual authority to contractually bind that entity.

## Contributor Details

| Field                          | Entry                |
|------------------------------- | -------------------- |
| Name                           | Aaron Kub            |
| Company name (if applicable)   |                      |
| Title or role (if applicable)  |                      |
| Date                           | 2019-05-09           |
| GitHub username                | aaronkub             |
| Website (optional)             |                      |

.github/contributors/amitness.md (106 lines, vendored, normal file)

@@ -0,0 +1,106 @@
# spaCy contributor agreement

This spaCy Contributor Agreement (**"SCA"**) is based on the
[Oracle Contributor Agreement](http://www.oracle.com/technetwork/oca-405177.pdf).
The SCA applies to any contribution that you make to any product or project
managed by us (the **"project"**), and sets out the intellectual property rights
you grant to us in the contributed materials. The term **"us"** shall mean
[ExplosionAI GmbH](https://explosion.ai/legal). The term
**"you"** shall mean the person or entity identified below.

If you agree to be bound by these terms, fill in the information requested
below and include the filled-in version with your first pull request, under the
folder [`.github/contributors/`](/.github/contributors/). The name of the file
should be your GitHub username, with the extension `.md`. For example, the user
example_user would create the file `.github/contributors/example_user.md`.

Read this agreement carefully before signing. These terms and conditions
constitute a binding legal agreement.

## Contributor Agreement

1. The term "contribution" or "contributed materials" means any source code,
object code, patch, tool, sample, graphic, specification, manual,
documentation, or any other material posted or submitted by you to the project.

2. With respect to any worldwide copyrights, or copyright applications and
registrations, in your contribution:

    * you hereby assign to us joint ownership, and to the extent that such
    assignment is or becomes invalid, ineffective or unenforceable, you hereby
    grant to us a perpetual, irrevocable, non-exclusive, worldwide, no-charge,
    royalty-free, unrestricted license to exercise all rights under those
    copyrights. This includes, at our option, the right to sublicense these same
    rights to third parties through multiple levels of sublicensees or other
    licensing arrangements;

    * you agree that each of us can do all things in relation to your
    contribution as if each of us were the sole owners, and if one of us makes
    a derivative work of your contribution, the one who makes the derivative
    work (or has it made will be the sole owner of that derivative work;

    * you agree that you will not assert any moral rights in your contribution
    against us, our licensees or transferees;

    * you agree that we may register a copyright in your contribution and
    exercise all ownership rights associated with it; and

    * you agree that neither of us has any duty to consult with, obtain the
    consent of, pay or render an accounting to the other for any use or
    distribution of your contribution.

3. With respect to any patents you own, or that you can license without payment
to any third party, you hereby grant to us a perpetual, irrevocable,
non-exclusive, worldwide, no-charge, royalty-free license to:

    * make, have made, use, sell, offer to sell, import, and otherwise transfer
    your contribution in whole or in part, alone or in combination with or
    included in any product, work or materials arising out of the project to
    which your contribution was submitted, and

    * at our option, to sublicense these same rights to third parties through
    multiple levels of sublicensees or other licensing arrangements.

4. Except as set out above, you keep all right, title, and interest in your
contribution. The rights that you grant to us under these terms are effective
on the date you first submitted a contribution to us, even if your submission
took place before the date you sign these terms.

5. You covenant, represent, warrant and agree that:

    * Each contribution that you submit is and shall be an original work of
    authorship and you can legally grant the rights set out in this SCA;

    * to the best of your knowledge, each contribution will not violate any
    third party's copyrights, trademarks, patents, or other intellectual
    property rights; and

    * each contribution shall be in compliance with U.S. export control laws and
other applicable export and import laws. You agree to notify us if you
|
||||
become aware of any circumstance which would make any of the foregoing
|
||||
representations inaccurate in any respect. We may publicly disclose your
|
||||
participation in the project, including the fact that you have signed the SCA.
|
||||
|
||||
6. This SCA is governed by the laws of the State of California and applicable
|
||||
U.S. Federal law. Any choice of law rules will not apply.
|
||||
|
||||
7. Please place an “x” on one of the applicable statements below. Please do NOT
|
||||
mark both statements:
|
||||
|
||||
* [X] I am signing on behalf of myself as an individual and no other person
|
||||
or entity, including my employer, has or will have rights with respect to my
|
||||
contributions.
|
||||
|
||||
* [ ] I am signing on behalf of my employer or a legal entity and I have the
|
||||
actual authority to contractually bind that entity.
|
||||
|
||||
## Contributor Details
|
||||
|
||||
| Field                          | Entry                |
|------------------------------- | -------------------- |
| Name                           | Amit Chaudhary       |
| Company name (if applicable)   |                      |
| Title or role (if applicable)  |                      |
| Date                           | April 29, 2019       |
| GitHub username                | amitness             |
| Website (optional)             | https://amitness.com |
|
106
.github/contributors/bjascob.md
vendored
Normal file
|
@ -0,0 +1,106 @@
|
|||
# spaCy contributor agreement
|
||||
|
||||
This spaCy Contributor Agreement (**"SCA"**) is based on the
|
||||
[Oracle Contributor Agreement](http://www.oracle.com/technetwork/oca-405177.pdf).
|
||||
The SCA applies to any contribution that you make to any product or project
|
||||
managed by us (the **"project"**), and sets out the intellectual property rights
|
||||
you grant to us in the contributed materials. The term **"us"** shall mean
|
||||
[ExplosionAI GmbH](https://explosion.ai/legal). The term
|
||||
**"you"** shall mean the person or entity identified below.
|
||||
|
||||
If you agree to be bound by these terms, fill in the information requested
|
||||
below and include the filled-in version with your first pull request, under the
|
||||
folder [`.github/contributors/`](/.github/contributors/). The name of the file
|
||||
should be your GitHub username, with the extension `.md`. For example, the user
|
||||
example_user would create the file `.github/contributors/example_user.md`.
|
||||
|
||||
Read this agreement carefully before signing. These terms and conditions
|
||||
constitute a binding legal agreement.
|
||||
|
||||
## Contributor Agreement
|
||||
|
||||
1. The term "contribution" or "contributed materials" means any source code,
|
||||
object code, patch, tool, sample, graphic, specification, manual,
|
||||
documentation, or any other material posted or submitted by you to the project.
|
||||
|
||||
2. With respect to any worldwide copyrights, or copyright applications and
|
||||
registrations, in your contribution:
|
||||
|
||||
* you hereby assign to us joint ownership, and to the extent that such
|
||||
assignment is or becomes invalid, ineffective or unenforceable, you hereby
|
||||
grant to us a perpetual, irrevocable, non-exclusive, worldwide, no-charge,
|
||||
royalty-free, unrestricted license to exercise all rights under those
|
||||
copyrights. This includes, at our option, the right to sublicense these same
|
||||
rights to third parties through multiple levels of sublicensees or other
|
||||
licensing arrangements;
|
||||
|
||||
* you agree that each of us can do all things in relation to your
|
||||
contribution as if each of us were the sole owners, and if one of us makes
|
||||
a derivative work of your contribution, the one who makes the derivative
|
||||
work (or has it made) will be the sole owner of that derivative work;
|
||||
|
||||
* you agree that you will not assert any moral rights in your contribution
|
||||
against us, our licensees or transferees;
|
||||
|
||||
* you agree that we may register a copyright in your contribution and
|
||||
exercise all ownership rights associated with it; and
|
||||
|
||||
* you agree that neither of us has any duty to consult with, obtain the
|
||||
consent of, pay or render an accounting to the other for any use or
|
||||
distribution of your contribution.
|
||||
|
||||
3. With respect to any patents you own, or that you can license without payment
|
||||
to any third party, you hereby grant to us a perpetual, irrevocable,
|
||||
non-exclusive, worldwide, no-charge, royalty-free license to:
|
||||
|
||||
* make, have made, use, sell, offer to sell, import, and otherwise transfer
|
||||
your contribution in whole or in part, alone or in combination with or
|
||||
included in any product, work or materials arising out of the project to
|
||||
which your contribution was submitted, and
|
||||
|
||||
* at our option, to sublicense these same rights to third parties through
|
||||
multiple levels of sublicensees or other licensing arrangements.
|
||||
|
||||
4. Except as set out above, you keep all right, title, and interest in your
|
||||
contribution. The rights that you grant to us under these terms are effective
|
||||
on the date you first submitted a contribution to us, even if your submission
|
||||
took place before the date you sign these terms.
|
||||
|
||||
5. You covenant, represent, warrant and agree that:
|
||||
|
||||
* Each contribution that you submit is and shall be an original work of
|
||||
authorship and you can legally grant the rights set out in this SCA;
|
||||
|
||||
* to the best of your knowledge, each contribution will not violate any
|
||||
third party's copyrights, trademarks, patents, or other intellectual
|
||||
property rights; and
|
||||
|
||||
* each contribution shall be in compliance with U.S. export control laws and
|
||||
other applicable export and import laws. You agree to notify us if you
|
||||
become aware of any circumstance which would make any of the foregoing
|
||||
representations inaccurate in any respect. We may publicly disclose your
|
||||
participation in the project, including the fact that you have signed the SCA.
|
||||
|
||||
6. This SCA is governed by the laws of the State of California and applicable
|
||||
U.S. Federal law. Any choice of law rules will not apply.
|
||||
|
||||
7. Please place an “x” on one of the applicable statements below. Please do NOT
|
||||
mark both statements:
|
||||
|
||||
* [x] I am signing on behalf of myself as an individual and no other person
|
||||
or entity, including my employer, has or will have rights with respect to my
|
||||
contributions.
|
||||
|
||||
* [ ] I am signing on behalf of my employer or a legal entity and I have the
|
||||
actual authority to contractually bind that entity.
|
||||
|
||||
## Contributor Details
|
||||
|
||||
| Field                          | Entry                |
|------------------------------- | -------------------- |
| Name                           | Brad Jascob          |
| Company name (if applicable)   | n/a                  |
| Title or role (if applicable)  | Software Engineer    |
| Date                           | 04/25/2019           |
| GitHub username                | bjascob              |
| Website (optional)             | n/a                  |
|
106
.github/contributors/bryant1410.md
vendored
Normal file
|
@ -0,0 +1,106 @@
|
|||
# spaCy contributor agreement
|
||||
|
||||
This spaCy Contributor Agreement (**"SCA"**) is based on the
|
||||
[Oracle Contributor Agreement](http://www.oracle.com/technetwork/oca-405177.pdf).
|
||||
The SCA applies to any contribution that you make to any product or project
|
||||
managed by us (the **"project"**), and sets out the intellectual property rights
|
||||
you grant to us in the contributed materials. The term **"us"** shall mean
|
||||
[ExplosionAI GmbH](https://explosion.ai/legal). The term
|
||||
**"you"** shall mean the person or entity identified below.
|
||||
|
||||
If you agree to be bound by these terms, fill in the information requested
|
||||
below and include the filled-in version with your first pull request, under the
|
||||
folder [`.github/contributors/`](/.github/contributors/). The name of the file
|
||||
should be your GitHub username, with the extension `.md`. For example, the user
|
||||
example_user would create the file `.github/contributors/example_user.md`.
|
||||
|
||||
Read this agreement carefully before signing. These terms and conditions
|
||||
constitute a binding legal agreement.
|
||||
|
||||
## Contributor Agreement
|
||||
|
||||
1. The term "contribution" or "contributed materials" means any source code,
|
||||
object code, patch, tool, sample, graphic, specification, manual,
|
||||
documentation, or any other material posted or submitted by you to the project.
|
||||
|
||||
2. With respect to any worldwide copyrights, or copyright applications and
|
||||
registrations, in your contribution:
|
||||
|
||||
* you hereby assign to us joint ownership, and to the extent that such
|
||||
assignment is or becomes invalid, ineffective or unenforceable, you hereby
|
||||
grant to us a perpetual, irrevocable, non-exclusive, worldwide, no-charge,
|
||||
royalty-free, unrestricted license to exercise all rights under those
|
||||
copyrights. This includes, at our option, the right to sublicense these same
|
||||
rights to third parties through multiple levels of sublicensees or other
|
||||
licensing arrangements;
|
||||
|
||||
* you agree that each of us can do all things in relation to your
|
||||
contribution as if each of us were the sole owners, and if one of us makes
|
||||
a derivative work of your contribution, the one who makes the derivative
|
||||
work (or has it made) will be the sole owner of that derivative work;
|
||||
|
||||
* you agree that you will not assert any moral rights in your contribution
|
||||
against us, our licensees or transferees;
|
||||
|
||||
* you agree that we may register a copyright in your contribution and
|
||||
exercise all ownership rights associated with it; and
|
||||
|
||||
* you agree that neither of us has any duty to consult with, obtain the
|
||||
consent of, pay or render an accounting to the other for any use or
|
||||
distribution of your contribution.
|
||||
|
||||
3. With respect to any patents you own, or that you can license without payment
|
||||
to any third party, you hereby grant to us a perpetual, irrevocable,
|
||||
non-exclusive, worldwide, no-charge, royalty-free license to:
|
||||
|
||||
* make, have made, use, sell, offer to sell, import, and otherwise transfer
|
||||
your contribution in whole or in part, alone or in combination with or
|
||||
included in any product, work or materials arising out of the project to
|
||||
which your contribution was submitted, and
|
||||
|
||||
* at our option, to sublicense these same rights to third parties through
|
||||
multiple levels of sublicensees or other licensing arrangements.
|
||||
|
||||
4. Except as set out above, you keep all right, title, and interest in your
|
||||
contribution. The rights that you grant to us under these terms are effective
|
||||
on the date you first submitted a contribution to us, even if your submission
|
||||
took place before the date you sign these terms.
|
||||
|
||||
5. You covenant, represent, warrant and agree that:
|
||||
|
||||
* Each contribution that you submit is and shall be an original work of
|
||||
authorship and you can legally grant the rights set out in this SCA;
|
||||
|
||||
* to the best of your knowledge, each contribution will not violate any
|
||||
third party's copyrights, trademarks, patents, or other intellectual
|
||||
property rights; and
|
||||
|
||||
* each contribution shall be in compliance with U.S. export control laws and
|
||||
other applicable export and import laws. You agree to notify us if you
|
||||
become aware of any circumstance which would make any of the foregoing
|
||||
representations inaccurate in any respect. We may publicly disclose your
|
||||
participation in the project, including the fact that you have signed the SCA.
|
||||
|
||||
6. This SCA is governed by the laws of the State of California and applicable
|
||||
U.S. Federal law. Any choice of law rules will not apply.
|
||||
|
||||
7. Please place an “x” on one of the applicable statements below. Please do NOT
|
||||
mark both statements:
|
||||
|
||||
* [x] I am signing on behalf of myself as an individual and no other person
|
||||
or entity, including my employer, has or will have rights with respect to my
|
||||
contributions.
|
||||
|
||||
* [ ] I am signing on behalf of my employer or a legal entity and I have the
|
||||
actual authority to contractually bind that entity.
|
||||
|
||||
## Contributor Details
|
||||
|
||||
| Field                          | Entry                |
|------------------------------- | -------------------- |
| Name                           | Santiago Castro      |
| Company name (if applicable)   |                      |
| Title or role (if applicable)  |                      |
| Date                           | 2019-04-09           |
| GitHub username                | bryant1410           |
| Website (optional)             |                      |
|
106
.github/contributors/celikomer.md
vendored
Normal file
|
@ -0,0 +1,106 @@
|
|||
# spaCy contributor agreement
|
||||
|
||||
This spaCy Contributor Agreement (**"SCA"**) is based on the
|
||||
[Oracle Contributor Agreement](http://www.oracle.com/technetwork/oca-405177.pdf).
|
||||
The SCA applies to any contribution that you make to any product or project
|
||||
managed by us (the **"project"**), and sets out the intellectual property rights
|
||||
you grant to us in the contributed materials. The term **"us"** shall mean
|
||||
[ExplosionAI GmbH](https://explosion.ai/legal). The term
|
||||
**"you"** shall mean the person or entity identified below.
|
||||
|
||||
If you agree to be bound by these terms, fill in the information requested
|
||||
below and include the filled-in version with your first pull request, under the
|
||||
folder [`.github/contributors/`](/.github/contributors/). The name of the file
|
||||
should be your GitHub username, with the extension `.md`. For example, the user
|
||||
example_user would create the file `.github/contributors/example_user.md`.
|
||||
|
||||
Read this agreement carefully before signing. These terms and conditions
|
||||
constitute a binding legal agreement.
|
||||
|
||||
## Contributor Agreement
|
||||
|
||||
1. The term "contribution" or "contributed materials" means any source code,
|
||||
object code, patch, tool, sample, graphic, specification, manual,
|
||||
documentation, or any other material posted or submitted by you to the project.
|
||||
|
||||
2. With respect to any worldwide copyrights, or copyright applications and
|
||||
registrations, in your contribution:
|
||||
|
||||
* you hereby assign to us joint ownership, and to the extent that such
|
||||
assignment is or becomes invalid, ineffective or unenforceable, you hereby
|
||||
grant to us a perpetual, irrevocable, non-exclusive, worldwide, no-charge,
|
||||
royalty-free, unrestricted license to exercise all rights under those
|
||||
copyrights. This includes, at our option, the right to sublicense these same
|
||||
rights to third parties through multiple levels of sublicensees or other
|
||||
licensing arrangements;
|
||||
|
||||
* you agree that each of us can do all things in relation to your
|
||||
contribution as if each of us were the sole owners, and if one of us makes
|
||||
a derivative work of your contribution, the one who makes the derivative
|
||||
work (or has it made) will be the sole owner of that derivative work;
|
||||
|
||||
* you agree that you will not assert any moral rights in your contribution
|
||||
against us, our licensees or transferees;
|
||||
|
||||
* you agree that we may register a copyright in your contribution and
|
||||
exercise all ownership rights associated with it; and
|
||||
|
||||
* you agree that neither of us has any duty to consult with, obtain the
|
||||
consent of, pay or render an accounting to the other for any use or
|
||||
distribution of your contribution.
|
||||
|
||||
3. With respect to any patents you own, or that you can license without payment
|
||||
to any third party, you hereby grant to us a perpetual, irrevocable,
|
||||
non-exclusive, worldwide, no-charge, royalty-free license to:
|
||||
|
||||
* make, have made, use, sell, offer to sell, import, and otherwise transfer
|
||||
your contribution in whole or in part, alone or in combination with or
|
||||
included in any product, work or materials arising out of the project to
|
||||
which your contribution was submitted, and
|
||||
|
||||
* at our option, to sublicense these same rights to third parties through
|
||||
multiple levels of sublicensees or other licensing arrangements.
|
||||
|
||||
4. Except as set out above, you keep all right, title, and interest in your
|
||||
contribution. The rights that you grant to us under these terms are effective
|
||||
on the date you first submitted a contribution to us, even if your submission
|
||||
took place before the date you sign these terms.
|
||||
|
||||
5. You covenant, represent, warrant and agree that:
|
||||
|
||||
* Each contribution that you submit is and shall be an original work of
|
||||
authorship and you can legally grant the rights set out in this SCA;
|
||||
|
||||
* to the best of your knowledge, each contribution will not violate any
|
||||
third party's copyrights, trademarks, patents, or other intellectual
|
||||
property rights; and
|
||||
|
||||
* each contribution shall be in compliance with U.S. export control laws and
|
||||
other applicable export and import laws. You agree to notify us if you
|
||||
become aware of any circumstance which would make any of the foregoing
|
||||
representations inaccurate in any respect. We may publicly disclose your
|
||||
participation in the project, including the fact that you have signed the SCA.
|
||||
|
||||
6. This SCA is governed by the laws of the State of California and applicable
|
||||
U.S. Federal law. Any choice of law rules will not apply.
|
||||
|
||||
7. Please place an “x” on one of the applicable statements below. Please do NOT
|
||||
mark both statements:
|
||||
|
||||
* [x] I am signing on behalf of myself as an individual and no other person
|
||||
or entity, including my employer, has or will have rights with respect to my
|
||||
contributions.
|
||||
|
||||
* [ ] I am signing on behalf of my employer or a legal entity and I have the
|
||||
actual authority to contractually bind that entity.
|
||||
|
||||
## Contributor Details
|
||||
|
||||
| Field                          | Entry                |
|------------------------------- | -------------------- |
| Name                           | Omer Celik           |
| Company name (if applicable)   |                      |
| Title or role (if applicable)  |                      |
| Date                           | 04/11/2019           |
| GitHub username                | celikomer            |
| Website (optional)             | www.ocelik.com       |
|
106
.github/contributors/estr4ng7d.md
vendored
Normal file
|
@ -0,0 +1,106 @@
|
|||
# spaCy contributor agreement
|
||||
|
||||
This spaCy Contributor Agreement (**"SCA"**) is based on the
|
||||
[Oracle Contributor Agreement](http://www.oracle.com/technetwork/oca-405177.pdf).
|
||||
The SCA applies to any contribution that you make to any product or project
|
||||
managed by us (the **"project"**), and sets out the intellectual property rights
|
||||
you grant to us in the contributed materials. The term **"us"** shall mean
|
||||
[ExplosionAI GmbH](https://explosion.ai/legal). The term
|
||||
**"you"** shall mean the person or entity identified below.
|
||||
|
||||
If you agree to be bound by these terms, fill in the information requested
|
||||
below and include the filled-in version with your first pull request, under the
|
||||
folder [`.github/contributors/`](/.github/contributors/). The name of the file
|
||||
should be your GitHub username, with the extension `.md`. For example, the user
|
||||
example_user would create the file `.github/contributors/example_user.md`.
|
||||
|
||||
Read this agreement carefully before signing. These terms and conditions
|
||||
constitute a binding legal agreement.
|
||||
|
||||
## Contributor Agreement
|
||||
|
||||
1. The term "contribution" or "contributed materials" means any source code,
|
||||
object code, patch, tool, sample, graphic, specification, manual,
|
||||
documentation, or any other material posted or submitted by you to the project.
|
||||
|
||||
2. With respect to any worldwide copyrights, or copyright applications and
|
||||
registrations, in your contribution:
|
||||
|
||||
* you hereby assign to us joint ownership, and to the extent that such
|
||||
assignment is or becomes invalid, ineffective or unenforceable, you hereby
|
||||
grant to us a perpetual, irrevocable, non-exclusive, worldwide, no-charge,
|
||||
royalty-free, unrestricted license to exercise all rights under those
|
||||
copyrights. This includes, at our option, the right to sublicense these same
|
||||
rights to third parties through multiple levels of sublicensees or other
|
||||
licensing arrangements;
|
||||
|
||||
* you agree that each of us can do all things in relation to your
|
||||
contribution as if each of us were the sole owners, and if one of us makes
|
||||
a derivative work of your contribution, the one who makes the derivative
|
||||
work (or has it made) will be the sole owner of that derivative work;
|
||||
|
||||
* you agree that you will not assert any moral rights in your contribution
|
||||
against us, our licensees or transferees;
|
||||
|
||||
* you agree that we may register a copyright in your contribution and
|
||||
exercise all ownership rights associated with it; and
|
||||
|
||||
* you agree that neither of us has any duty to consult with, obtain the
|
||||
consent of, pay or render an accounting to the other for any use or
|
||||
distribution of your contribution.
|
||||
|
||||
3. With respect to any patents you own, or that you can license without payment
|
||||
to any third party, you hereby grant to us a perpetual, irrevocable,
|
||||
non-exclusive, worldwide, no-charge, royalty-free license to:
|
||||
|
||||
* make, have made, use, sell, offer to sell, import, and otherwise transfer
|
||||
your contribution in whole or in part, alone or in combination with or
|
||||
included in any product, work or materials arising out of the project to
|
||||
which your contribution was submitted, and
|
||||
|
||||
* at our option, to sublicense these same rights to third parties through
|
||||
multiple levels of sublicensees or other licensing arrangements.
|
||||
|
||||
4. Except as set out above, you keep all right, title, and interest in your
|
||||
contribution. The rights that you grant to us under these terms are effective
|
||||
on the date you first submitted a contribution to us, even if your submission
|
||||
took place before the date you sign these terms.
|
||||
|
||||
5. You covenant, represent, warrant and agree that:
|
||||
|
||||
* Each contribution that you submit is and shall be an original work of
|
||||
authorship and you can legally grant the rights set out in this SCA;
|
||||
|
||||
* to the best of your knowledge, each contribution will not violate any
|
||||
third party's copyrights, trademarks, patents, or other intellectual
|
||||
property rights; and
|
||||
|
||||
* each contribution shall be in compliance with U.S. export control laws and
|
||||
other applicable export and import laws. You agree to notify us if you
|
||||
become aware of any circumstance which would make any of the foregoing
|
||||
representations inaccurate in any respect. We may publicly disclose your
|
||||
participation in the project, including the fact that you have signed the SCA.
|
||||
|
||||
6. This SCA is governed by the laws of the State of California and applicable
|
||||
U.S. Federal law. Any choice of law rules will not apply.
|
||||
|
||||
7. Please place an “x” on one of the applicable statements below. Please do NOT
|
||||
mark both statements:
|
||||
|
||||
* [x] I am signing on behalf of myself as an individual and no other person
|
||||
or entity, including my employer, has or will have rights with respect to my
|
||||
contributions.
|
||||
|
||||
* [ ] I am signing on behalf of my employer or a legal entity and I have the
|
||||
actual authority to contractually bind that entity.
|
||||
|
||||
## Contributor Details
|
||||
|
||||
| Field                          | Entry                |
|------------------------------- | -------------------- |
| Name                           | Amey Baviskar        |
| Company name (if applicable)   |                      |
| Title or role (if applicable)  |                      |
| Date                           | 21-May-2019          |
| GitHub username                | estr4ng7d            |
| Website (optional)             |                      |
|
106
.github/contributors/fizban99.md
vendored
Normal file
|
@ -0,0 +1,106 @@
|
|||
# spaCy contributor agreement
|
||||
|
||||
This spaCy Contributor Agreement (**"SCA"**) is based on the
|
||||
[Oracle Contributor Agreement](http://www.oracle.com/technetwork/oca-405177.pdf).
|
||||
The SCA applies to any contribution that you make to any product or project
|
||||
managed by us (the **"project"**), and sets out the intellectual property rights
|
||||
you grant to us in the contributed materials. The term **"us"** shall mean
|
||||
[ExplosionAI GmbH](https://explosion.ai/legal). The term
|
||||
**"you"** shall mean the person or entity identified below.
|
||||
|
||||
If you agree to be bound by these terms, fill in the information requested
|
||||
below and include the filled-in version with your first pull request, under the
|
||||
folder [`.github/contributors/`](/.github/contributors/). The name of the file
|
||||
should be your GitHub username, with the extension `.md`. For example, the user
|
||||
example_user would create the file `.github/contributors/example_user.md`.
|
||||
|
||||
Read this agreement carefully before signing. These terms and conditions
|
||||
constitute a binding legal agreement.
|
||||
|
||||
## Contributor Agreement
|
||||
|
||||
1. The term "contribution" or "contributed materials" means any source code,
|
||||
object code, patch, tool, sample, graphic, specification, manual,
|
||||
documentation, or any other material posted or submitted by you to the project.
|
||||
|
||||
2. With respect to any worldwide copyrights, or copyright applications and
|
||||
registrations, in your contribution:
|
||||
|
||||
* you hereby assign to us joint ownership, and to the extent that such
|
||||
assignment is or becomes invalid, ineffective or unenforceable, you hereby
|
||||
grant to us a perpetual, irrevocable, non-exclusive, worldwide, no-charge,
|
||||
royalty-free, unrestricted license to exercise all rights under those
|
||||
copyrights. This includes, at our option, the right to sublicense these same
|
||||
rights to third parties through multiple levels of sublicensees or other
|
||||
licensing arrangements;
|
||||
|
||||
* you agree that each of us can do all things in relation to your
|
||||
contribution as if each of us were the sole owners, and if one of us makes
|
||||
a derivative work of your contribution, the one who makes the derivative
|
||||
work (or has it made) will be the sole owner of that derivative work;
|
||||
|
||||
* you agree that you will not assert any moral rights in your contribution
|
||||
against us, our licensees or transferees;
|
||||
|
||||
* you agree that we may register a copyright in your contribution and
|
||||
exercise all ownership rights associated with it; and
|
||||
|
||||
* you agree that neither of us has any duty to consult with, obtain the
|
||||
consent of, pay or render an accounting to the other for any use or
|
||||
distribution of your contribution.
|
||||
|
||||
3. With respect to any patents you own, or that you can license without payment
|
||||
to any third party, you hereby grant to us a perpetual, irrevocable,
|
||||
non-exclusive, worldwide, no-charge, royalty-free license to:
|
||||
|
||||
* make, have made, use, sell, offer to sell, import, and otherwise transfer
|
||||
your contribution in whole or in part, alone or in combination with or
|
||||
included in any product, work or materials arising out of the project to
|
||||
which your contribution was submitted, and
|
||||
|
||||
* at our option, to sublicense these same rights to third parties through
|
||||
multiple levels of sublicensees or other licensing arrangements.
|
||||
|
||||
4. Except as set out above, you keep all right, title, and interest in your
|
||||
contribution. The rights that you grant to us under these terms are effective
|
||||
on the date you first submitted a contribution to us, even if your submission
|
||||
took place before the date you sign these terms.
|
||||
|
||||
5. You covenant, represent, warrant and agree that:
|
||||
|
||||
* Each contribution that you submit is and shall be an original work of
|
||||
authorship and you can legally grant the rights set out in this SCA;
|
||||
|
||||
* to the best of your knowledge, each contribution will not violate any
|
||||
third party's copyrights, trademarks, patents, or other intellectual
|
||||
property rights; and
|
||||
|
||||
* each contribution shall be in compliance with U.S. export control laws and
|
||||
other applicable export and import laws. You agree to notify us if you
|
||||
become aware of any circumstance which would make any of the foregoing
|
||||
representations inaccurate in any respect. We may publicly disclose your
|
||||
participation in the project, including the fact that you have signed the SCA.
|
||||
|
||||
6. This SCA is governed by the laws of the State of California and applicable
|
||||
U.S. Federal law. Any choice of law rules will not apply.
|
||||
|
||||
7. Please place an “x” on one of the applicable statements below. Please do NOT
|
||||
mark both statements:
|
||||
|
||||
* [x] I am signing on behalf of myself as an individual and no other person
|
||||
or entity, including my employer, has or will have rights with respect to my
|
||||
contributions.
|
||||
|
||||
* [ ] I am signing on behalf of my employer or a legal entity and I have the
|
||||
actual authority to contractually bind that entity.
|
||||
|
||||
## Contributor Details
|
||||
|
||||
| Field | Entry |
|
||||
|------------------------------- | -------------------- |
|
||||
| Name | A.I.M. |
|
||||
| Company name (if applicable) | |
|
||||
| Title or role (if applicable) | |
|
||||
| Date | 16.04.2019 |
|
||||
| GitHub username | fizban99 |
|
||||
| Website (optional) | |
|
106
.github/contributors/henry860916.md
vendored
Normal file
|
@ -0,0 +1,106 @@
|
|||
# spaCy contributor agreement
|
||||
|
||||
This spaCy Contributor Agreement (**"SCA"**) is based on the
|
||||
[Oracle Contributor Agreement](http://www.oracle.com/technetwork/oca-405177.pdf).
|
||||
The SCA applies to any contribution that you make to any product or project
|
||||
managed by us (the **"project"**), and sets out the intellectual property rights
|
||||
you grant to us in the contributed materials. The term **"us"** shall mean
|
||||
[ExplosionAI UG (haftungsbeschränkt)](https://explosion.ai/legal). The term
|
||||
**"you"** shall mean the person or entity identified below.
|
||||
|
||||
If you agree to be bound by these terms, fill in the information requested
|
||||
below and include the filled-in version with your first pull request, under the
|
||||
folder [`.github/contributors/`](/.github/contributors/). The name of the file
|
||||
should be your GitHub username, with the extension `.md`. For example, the user
|
||||
example_user would create the file `.github/contributors/example_user.md`.
|
||||
|
||||
Read this agreement carefully before signing. These terms and conditions
|
||||
constitute a binding legal agreement.
|
||||
|
||||
## Contributor Agreement
|
||||
|
||||
1. The term "contribution" or "contributed materials" means any source code,
|
||||
object code, patch, tool, sample, graphic, specification, manual,
|
||||
documentation, or any other material posted or submitted by you to the project.
|
||||
|
||||
2. With respect to any worldwide copyrights, or copyright applications and
|
||||
registrations, in your contribution:
|
||||
|
||||
* you hereby assign to us joint ownership, and to the extent that such
|
||||
assignment is or becomes invalid, ineffective or unenforceable, you hereby
|
||||
grant to us a perpetual, irrevocable, non-exclusive, worldwide, no-charge,
|
||||
royalty-free, unrestricted license to exercise all rights under those
|
||||
copyrights. This includes, at our option, the right to sublicense these same
|
||||
rights to third parties through multiple levels of sublicensees or other
|
||||
licensing arrangements;
|
||||
|
||||
* you agree that each of us can do all things in relation to your
|
||||
contribution as if each of us were the sole owners, and if one of us makes
|
||||
a derivative work of your contribution, the one who makes the derivative
|
||||
work (or has it made) will be the sole owner of that derivative work;
|
||||
|
||||
* you agree that you will not assert any moral rights in your contribution
|
||||
against us, our licensees or transferees;
|
||||
|
||||
* you agree that we may register a copyright in your contribution and
|
||||
exercise all ownership rights associated with it; and
|
||||
|
||||
* you agree that neither of us has any duty to consult with, obtain the
|
||||
consent of, pay or render an accounting to the other for any use or
|
||||
distribution of your contribution.
|
||||
|
||||
3. With respect to any patents you own, or that you can license without payment
|
||||
to any third party, you hereby grant to us a perpetual, irrevocable,
|
||||
non-exclusive, worldwide, no-charge, royalty-free license to:
|
||||
|
||||
* make, have made, use, sell, offer to sell, import, and otherwise transfer
|
||||
your contribution in whole or in part, alone or in combination with or
|
||||
included in any product, work or materials arising out of the project to
|
||||
which your contribution was submitted, and
|
||||
|
||||
* at our option, to sublicense these same rights to third parties through
|
||||
multiple levels of sublicensees or other licensing arrangements.
|
||||
|
||||
4. Except as set out above, you keep all right, title, and interest in your
|
||||
contribution. The rights that you grant to us under these terms are effective
|
||||
on the date you first submitted a contribution to us, even if your submission
|
||||
took place before the date you sign these terms.
|
||||
|
||||
5. You covenant, represent, warrant and agree that:
|
||||
|
||||
* Each contribution that you submit is and shall be an original work of
|
||||
authorship and you can legally grant the rights set out in this SCA;
|
||||
|
||||
* to the best of your knowledge, each contribution will not violate any
|
||||
third party's copyrights, trademarks, patents, or other intellectual
|
||||
property rights; and
|
||||
|
||||
* each contribution shall be in compliance with U.S. export control laws and
|
||||
other applicable export and import laws. You agree to notify us if you
|
||||
become aware of any circumstance which would make any of the foregoing
|
||||
representations inaccurate in any respect. We may publicly disclose your
|
||||
participation in the project, including the fact that you have signed the SCA.
|
||||
|
||||
6. This SCA is governed by the laws of the State of California and applicable
|
||||
U.S. Federal law. Any choice of law rules will not apply.
|
||||
|
||||
7. Please place an “x” on one of the applicable statements below. Please do NOT
|
||||
mark both statements:
|
||||
|
||||
* [x] I am signing on behalf of myself as an individual and no other person
|
||||
or entity, including my employer, has or will have rights with respect to my
|
||||
contributions.
|
||||
|
||||
* [ ] I am signing on behalf of my employer or a legal entity and I have the
|
||||
actual authority to contractually bind that entity.
|
||||
|
||||
## Contributor Details
|
||||
|
||||
| Field | Entry |
|
||||
|------------------------------- | ------------------------ |
|
||||
| Name | Henry Zhang |
|
||||
| Company name (if applicable) | |
|
||||
| Title or role (if applicable) | |
|
||||
| Date | 2019-04-30 |
|
||||
| GitHub username | henry860916 |
|
||||
| Website (optional) | |
|
106
.github/contributors/ldorigo.md
vendored
Normal file
|
@ -0,0 +1,106 @@
|
|||
# spaCy contributor agreement
|
||||
|
||||
This spaCy Contributor Agreement (**"SCA"**) is based on the
|
||||
[Oracle Contributor Agreement](http://www.oracle.com/technetwork/oca-405177.pdf).
|
||||
The SCA applies to any contribution that you make to any product or project
|
||||
managed by us (the **"project"**), and sets out the intellectual property rights
|
||||
you grant to us in the contributed materials. The term **"us"** shall mean
|
||||
[ExplosionAI GmbH](https://explosion.ai/legal). The term
|
||||
**"you"** shall mean the person or entity identified below.
|
||||
|
||||
If you agree to be bound by these terms, fill in the information requested
|
||||
below and include the filled-in version with your first pull request, under the
|
||||
folder [`.github/contributors/`](/.github/contributors/). The name of the file
|
||||
should be your GitHub username, with the extension `.md`. For example, the user
|
||||
example_user would create the file `.github/contributors/example_user.md`.
|
||||
|
||||
Read this agreement carefully before signing. These terms and conditions
|
||||
constitute a binding legal agreement.
|
||||
|
||||
## Contributor Agreement
|
||||
|
||||
1. The term "contribution" or "contributed materials" means any source code,
|
||||
object code, patch, tool, sample, graphic, specification, manual,
|
||||
documentation, or any other material posted or submitted by you to the project.
|
||||
|
||||
2. With respect to any worldwide copyrights, or copyright applications and
|
||||
registrations, in your contribution:
|
||||
|
||||
* you hereby assign to us joint ownership, and to the extent that such
|
||||
assignment is or becomes invalid, ineffective or unenforceable, you hereby
|
||||
grant to us a perpetual, irrevocable, non-exclusive, worldwide, no-charge,
|
||||
royalty-free, unrestricted license to exercise all rights under those
|
||||
copyrights. This includes, at our option, the right to sublicense these same
|
||||
rights to third parties through multiple levels of sublicensees or other
|
||||
licensing arrangements;
|
||||
|
||||
* you agree that each of us can do all things in relation to your
|
||||
contribution as if each of us were the sole owners, and if one of us makes
|
||||
a derivative work of your contribution, the one who makes the derivative
|
||||
work (or has it made) will be the sole owner of that derivative work;
|
||||
|
||||
* you agree that you will not assert any moral rights in your contribution
|
||||
against us, our licensees or transferees;
|
||||
|
||||
* you agree that we may register a copyright in your contribution and
|
||||
exercise all ownership rights associated with it; and
|
||||
|
||||
* you agree that neither of us has any duty to consult with, obtain the
|
||||
consent of, pay or render an accounting to the other for any use or
|
||||
distribution of your contribution.
|
||||
|
||||
3. With respect to any patents you own, or that you can license without payment
|
||||
to any third party, you hereby grant to us a perpetual, irrevocable,
|
||||
non-exclusive, worldwide, no-charge, royalty-free license to:
|
||||
|
||||
* make, have made, use, sell, offer to sell, import, and otherwise transfer
|
||||
your contribution in whole or in part, alone or in combination with or
|
||||
included in any product, work or materials arising out of the project to
|
||||
which your contribution was submitted, and
|
||||
|
||||
* at our option, to sublicense these same rights to third parties through
|
||||
multiple levels of sublicensees or other licensing arrangements.
|
||||
|
||||
4. Except as set out above, you keep all right, title, and interest in your
|
||||
contribution. The rights that you grant to us under these terms are effective
|
||||
on the date you first submitted a contribution to us, even if your submission
|
||||
took place before the date you sign these terms.
|
||||
|
||||
5. You covenant, represent, warrant and agree that:
|
||||
|
||||
* Each contribution that you submit is and shall be an original work of
|
||||
authorship and you can legally grant the rights set out in this SCA;
|
||||
|
||||
* to the best of your knowledge, each contribution will not violate any
|
||||
third party's copyrights, trademarks, patents, or other intellectual
|
||||
property rights; and
|
||||
|
||||
* each contribution shall be in compliance with U.S. export control laws and
|
||||
other applicable export and import laws. You agree to notify us if you
|
||||
become aware of any circumstance which would make any of the foregoing
|
||||
representations inaccurate in any respect. We may publicly disclose your
|
||||
participation in the project, including the fact that you have signed the SCA.
|
||||
|
||||
6. This SCA is governed by the laws of the State of California and applicable
|
||||
U.S. Federal law. Any choice of law rules will not apply.
|
||||
|
||||
7. Please place an “x” on one of the applicable statements below. Please do NOT
|
||||
mark both statements:
|
||||
|
||||
* [x] I am signing on behalf of myself as an individual and no other person
|
||||
or entity, including my employer, has or will have rights with respect to my
|
||||
contributions.
|
||||
|
||||
* [ ] I am signing on behalf of my employer or a legal entity and I have the
|
||||
actual authority to contractually bind that entity.
|
||||
|
||||
## Contributor Details
|
||||
|
||||
| Field | Entry |
|
||||
|------------------------------- | -------------------- |
|
||||
| Name | Luca Dorigo |
|
||||
| Company name (if applicable) | / |
|
||||
| Title or role (if applicable) | / |
|
||||
| Date | 08.05.2019 |
|
||||
| GitHub username | ldorigo |
|
||||
| Website (optional) | / |
|
106
.github/contributors/munozbravo.md
vendored
Normal file
|
@ -0,0 +1,106 @@
|
|||
# spaCy contributor agreement
|
||||
|
||||
This spaCy Contributor Agreement (**"SCA"**) is based on the
|
||||
[Oracle Contributor Agreement](http://www.oracle.com/technetwork/oca-405177.pdf).
|
||||
The SCA applies to any contribution that you make to any product or project
|
||||
managed by us (the **"project"**), and sets out the intellectual property rights
|
||||
you grant to us in the contributed materials. The term **"us"** shall mean
|
||||
[ExplosionAI GmbH](https://explosion.ai/legal). The term
|
||||
**"you"** shall mean the person or entity identified below.
|
||||
|
||||
If you agree to be bound by these terms, fill in the information requested
|
||||
below and include the filled-in version with your first pull request, under the
|
||||
folder [`.github/contributors/`](/.github/contributors/). The name of the file
|
||||
should be your GitHub username, with the extension `.md`. For example, the user
|
||||
example_user would create the file `.github/contributors/example_user.md`.
|
||||
|
||||
Read this agreement carefully before signing. These terms and conditions
|
||||
constitute a binding legal agreement.
|
||||
|
||||
## Contributor Agreement
|
||||
|
||||
1. The term "contribution" or "contributed materials" means any source code,
|
||||
object code, patch, tool, sample, graphic, specification, manual,
|
||||
documentation, or any other material posted or submitted by you to the project.
|
||||
|
||||
2. With respect to any worldwide copyrights, or copyright applications and
|
||||
registrations, in your contribution:
|
||||
|
||||
* you hereby assign to us joint ownership, and to the extent that such
|
||||
assignment is or becomes invalid, ineffective or unenforceable, you hereby
|
||||
grant to us a perpetual, irrevocable, non-exclusive, worldwide, no-charge,
|
||||
royalty-free, unrestricted license to exercise all rights under those
|
||||
copyrights. This includes, at our option, the right to sublicense these same
|
||||
rights to third parties through multiple levels of sublicensees or other
|
||||
licensing arrangements;
|
||||
|
||||
* you agree that each of us can do all things in relation to your
|
||||
contribution as if each of us were the sole owners, and if one of us makes
|
||||
a derivative work of your contribution, the one who makes the derivative
|
||||
work (or has it made) will be the sole owner of that derivative work;
|
||||
|
||||
* you agree that you will not assert any moral rights in your contribution
|
||||
against us, our licensees or transferees;
|
||||
|
||||
* you agree that we may register a copyright in your contribution and
|
||||
exercise all ownership rights associated with it; and
|
||||
|
||||
* you agree that neither of us has any duty to consult with, obtain the
|
||||
consent of, pay or render an accounting to the other for any use or
|
||||
distribution of your contribution.
|
||||
|
||||
3. With respect to any patents you own, or that you can license without payment
|
||||
to any third party, you hereby grant to us a perpetual, irrevocable,
|
||||
non-exclusive, worldwide, no-charge, royalty-free license to:
|
||||
|
||||
* make, have made, use, sell, offer to sell, import, and otherwise transfer
|
||||
your contribution in whole or in part, alone or in combination with or
|
||||
included in any product, work or materials arising out of the project to
|
||||
which your contribution was submitted, and
|
||||
|
||||
* at our option, to sublicense these same rights to third parties through
|
||||
multiple levels of sublicensees or other licensing arrangements.
|
||||
|
||||
4. Except as set out above, you keep all right, title, and interest in your
|
||||
contribution. The rights that you grant to us under these terms are effective
|
||||
on the date you first submitted a contribution to us, even if your submission
|
||||
took place before the date you sign these terms.
|
||||
|
||||
5. You covenant, represent, warrant and agree that:
|
||||
|
||||
* Each contribution that you submit is and shall be an original work of
|
||||
authorship and you can legally grant the rights set out in this SCA;
|
||||
|
||||
* to the best of your knowledge, each contribution will not violate any
|
||||
third party's copyrights, trademarks, patents, or other intellectual
|
||||
property rights; and
|
||||
|
||||
* each contribution shall be in compliance with U.S. export control laws and
|
||||
other applicable export and import laws. You agree to notify us if you
|
||||
become aware of any circumstance which would make any of the foregoing
|
||||
representations inaccurate in any respect. We may publicly disclose your
|
||||
participation in the project, including the fact that you have signed the SCA.
|
||||
|
||||
6. This SCA is governed by the laws of the State of California and applicable
|
||||
U.S. Federal law. Any choice of law rules will not apply.
|
||||
|
||||
7. Please place an “x” on one of the applicable statements below. Please do NOT
|
||||
mark both statements:
|
||||
|
||||
* [x] I am signing on behalf of myself as an individual and no other person
|
||||
or entity, including my employer, has or will have rights with respect to my
|
||||
contributions.
|
||||
|
||||
* [ ] I am signing on behalf of my employer or a legal entity and I have the
|
||||
actual authority to contractually bind that entity.
|
||||
|
||||
## Contributor Details
|
||||
|
||||
| Field | Entry |
|
||||
|------------------------------- | -------------------- |
|
||||
| Name | Germán Muñoz |
|
||||
| Company name (if applicable) | |
|
||||
| Title or role (if applicable) | |
|
||||
| Date | 2019-06-01 |
|
||||
| GitHub username | munozbravo |
|
||||
| Website (optional) | |
|
106
.github/contributors/nipunsadvilkar.md
vendored
Normal file
|
@ -0,0 +1,106 @@
|
|||
# spaCy contributor agreement
|
||||
|
||||
This spaCy Contributor Agreement (**"SCA"**) is based on the
|
||||
[Oracle Contributor Agreement](http://www.oracle.com/technetwork/oca-405177.pdf).
|
||||
The SCA applies to any contribution that you make to any product or project
|
||||
managed by us (the **"project"**), and sets out the intellectual property rights
|
||||
you grant to us in the contributed materials. The term **"us"** shall mean
|
||||
[ExplosionAI GmbH](https://explosion.ai/legal). The term
|
||||
**"you"** shall mean the person or entity identified below.
|
||||
|
||||
If you agree to be bound by these terms, fill in the information requested
|
||||
below and include the filled-in version with your first pull request, under the
|
||||
folder [`.github/contributors/`](/.github/contributors/). The name of the file
|
||||
should be your GitHub username, with the extension `.md`. For example, the user
|
||||
example_user would create the file `.github/contributors/example_user.md`.
|
||||
|
||||
Read this agreement carefully before signing. These terms and conditions
|
||||
constitute a binding legal agreement.
|
||||
|
||||
## Contributor Agreement
|
||||
|
||||
1. The term "contribution" or "contributed materials" means any source code,
|
||||
object code, patch, tool, sample, graphic, specification, manual,
|
||||
documentation, or any other material posted or submitted by you to the project.
|
||||
|
||||
2. With respect to any worldwide copyrights, or copyright applications and
|
||||
registrations, in your contribution:
|
||||
|
||||
* you hereby assign to us joint ownership, and to the extent that such
|
||||
assignment is or becomes invalid, ineffective or unenforceable, you hereby
|
||||
grant to us a perpetual, irrevocable, non-exclusive, worldwide, no-charge,
|
||||
royalty-free, unrestricted license to exercise all rights under those
|
||||
copyrights. This includes, at our option, the right to sublicense these same
|
||||
rights to third parties through multiple levels of sublicensees or other
|
||||
licensing arrangements;
|
||||
|
||||
* you agree that each of us can do all things in relation to your
|
||||
contribution as if each of us were the sole owners, and if one of us makes
|
||||
a derivative work of your contribution, the one who makes the derivative
|
||||
work (or has it made) will be the sole owner of that derivative work;
|
||||
|
||||
* you agree that you will not assert any moral rights in your contribution
|
||||
against us, our licensees or transferees;
|
||||
|
||||
* you agree that we may register a copyright in your contribution and
|
||||
exercise all ownership rights associated with it; and
|
||||
|
||||
* you agree that neither of us has any duty to consult with, obtain the
|
||||
consent of, pay or render an accounting to the other for any use or
|
||||
distribution of your contribution.
|
||||
|
||||
3. With respect to any patents you own, or that you can license without payment
|
||||
to any third party, you hereby grant to us a perpetual, irrevocable,
|
||||
non-exclusive, worldwide, no-charge, royalty-free license to:
|
||||
|
||||
* make, have made, use, sell, offer to sell, import, and otherwise transfer
|
||||
your contribution in whole or in part, alone or in combination with or
|
||||
included in any product, work or materials arising out of the project to
|
||||
which your contribution was submitted, and
|
||||
|
||||
* at our option, to sublicense these same rights to third parties through
|
||||
multiple levels of sublicensees or other licensing arrangements.
|
||||
|
||||
4. Except as set out above, you keep all right, title, and interest in your
|
||||
contribution. The rights that you grant to us under these terms are effective
|
||||
on the date you first submitted a contribution to us, even if your submission
|
||||
took place before the date you sign these terms.
|
||||
|
||||
5. You covenant, represent, warrant and agree that:
|
||||
|
||||
* Each contribution that you submit is and shall be an original work of
|
||||
authorship and you can legally grant the rights set out in this SCA;
|
||||
|
||||
* to the best of your knowledge, each contribution will not violate any
|
||||
third party's copyrights, trademarks, patents, or other intellectual
|
||||
property rights; and
|
||||
|
||||
* each contribution shall be in compliance with U.S. export control laws and
|
||||
other applicable export and import laws. You agree to notify us if you
|
||||
become aware of any circumstance which would make any of the foregoing
|
||||
representations inaccurate in any respect. We may publicly disclose your
|
||||
participation in the project, including the fact that you have signed the SCA.
|
||||
|
||||
6. This SCA is governed by the laws of the State of California and applicable
|
||||
U.S. Federal law. Any choice of law rules will not apply.
|
||||
|
||||
7. Please place an “x” on one of the applicable statements below. Please do NOT
|
||||
mark both statements:
|
||||
|
||||
* [x] I am signing on behalf of myself as an individual and no other person
|
||||
or entity, including my employer, has or will have rights with respect to my
|
||||
contributions.
|
||||
|
||||
* [x] I am signing on behalf of my employer or a legal entity and I have the
|
||||
actual authority to contractually bind that entity.
|
||||
|
||||
## Contributor Details
|
||||
|
||||
| Field | Entry |
|
||||
|------------------------------- | -------------------- |
|
||||
| Name | Nipun Sadvilkar |
|
||||
| Company name (if applicable) | |
|
||||
| Title or role (if applicable) | |
|
||||
| Date | 31st May, 2019 |
|
||||
| GitHub username | nipunsadvilkar |
|
||||
| Website (optional) | https://nipunsadvilkar.github.io/ |
|
106
.github/contributors/pickfire.md
vendored
Normal file
|
@ -0,0 +1,106 @@
|
|||
# spaCy contributor agreement
|
||||
|
||||
This spaCy Contributor Agreement (**"SCA"**) is based on the
|
||||
[Oracle Contributor Agreement](http://www.oracle.com/technetwork/oca-405177.pdf).
|
||||
The SCA applies to any contribution that you make to any product or project
|
||||
managed by us (the **"project"**), and sets out the intellectual property rights
|
||||
you grant to us in the contributed materials. The term **"us"** shall mean
|
||||
[ExplosionAI UG (haftungsbeschränkt)](https://explosion.ai/legal). The term
|
||||
**"you"** shall mean the person or entity identified below.
|
||||
|
||||
If you agree to be bound by these terms, fill in the information requested
|
||||
below and include the filled-in version with your first pull request, under the
|
||||
folder [`.github/contributors/`](/.github/contributors/). The name of the file
|
||||
should be your GitHub username, with the extension `.md`. For example, the user
|
||||
example_user would create the file `.github/contributors/example_user.md`.
|
||||
|
||||
Read this agreement carefully before signing. These terms and conditions
|
||||
constitute a binding legal agreement.
|
||||
|
||||
## Contributor Agreement
|
||||
|
||||
1. The term "contribution" or "contributed materials" means any source code,
|
||||
object code, patch, tool, sample, graphic, specification, manual,
|
||||
documentation, or any other material posted or submitted by you to the project.
|
||||
|
||||
2. With respect to any worldwide copyrights, or copyright applications and
|
||||
registrations, in your contribution:
|
||||
|
||||
* you hereby assign to us joint ownership, and to the extent that such
|
||||
assignment is or becomes invalid, ineffective or unenforceable, you hereby
|
||||
grant to us a perpetual, irrevocable, non-exclusive, worldwide, no-charge,
|
||||
royalty-free, unrestricted license to exercise all rights under those
|
||||
copyrights. This includes, at our option, the right to sublicense these same
|
||||
rights to third parties through multiple levels of sublicensees or other
|
||||
licensing arrangements;
|
||||
|
||||
* you agree that each of us can do all things in relation to your
|
||||
contribution as if each of us were the sole owners, and if one of us makes
|
||||
a derivative work of your contribution, the one who makes the derivative
|
||||
work (or has it made) will be the sole owner of that derivative work;
|
||||
|
||||
* you agree that you will not assert any moral rights in your contribution
|
||||
against us, our licensees or transferees;
|
||||
|
||||
* you agree that we may register a copyright in your contribution and
|
||||
exercise all ownership rights associated with it; and
|
||||
|
||||
* you agree that neither of us has any duty to consult with, obtain the
|
||||
consent of, pay or render an accounting to the other for any use or
|
||||
distribution of your contribution.
|
||||
|
||||
3. With respect to any patents you own, or that you can license without payment
|
||||
to any third party, you hereby grant to us a perpetual, irrevocable,
|
||||
non-exclusive, worldwide, no-charge, royalty-free license to:
|
||||
|
||||
* make, have made, use, sell, offer to sell, import, and otherwise transfer
|
||||
your contribution in whole or in part, alone or in combination with or
|
||||
included in any product, work or materials arising out of the project to
|
||||
which your contribution was submitted, and
|
||||
|
||||
* at our option, to sublicense these same rights to third parties through
|
||||
multiple levels of sublicensees or other licensing arrangements.
|
||||
|
||||
4. Except as set out above, you keep all right, title, and interest in your
|
||||
contribution. The rights that you grant to us under these terms are effective
|
||||
on the date you first submitted a contribution to us, even if your submission
|
||||
took place before the date you sign these terms.
|
||||
|
||||
5. You covenant, represent, warrant and agree that:
|
||||
|
||||
* Each contribution that you submit is and shall be an original work of
|
||||
authorship and you can legally grant the rights set out in this SCA;
|
||||
|
||||
* to the best of your knowledge, each contribution will not violate any
|
||||
third party's copyrights, trademarks, patents, or other intellectual
|
||||
property rights; and
|
||||
|
||||
* each contribution shall be in compliance with U.S. export control laws and
|
||||
other applicable export and import laws. You agree to notify us if you
|
||||
become aware of any circumstance which would make any of the foregoing
|
||||
representations inaccurate in any respect. We may publicly disclose your
|
||||
participation in the project, including the fact that you have signed the SCA.
|
||||
|
||||
6. This SCA is governed by the laws of the State of California and applicable
|
||||
U.S. Federal law. Any choice of law rules will not apply.
|
||||
|
||||
7. Please place an “x” on one of the applicable statements below. Please do NOT
|
||||
mark both statements:
|
||||
|
||||
* [ ] I am signing on behalf of myself as an individual and no other person
|
||||
or entity, including my employer, has or will have rights with respect to my
|
||||
contributions.
|
||||
|
||||
* [x] I am signing on behalf of my employer or a legal entity and I have the
|
||||
actual authority to contractually bind that entity.
|
||||
|
||||
## Contributor Details
|
||||
|
||||
| Field | Entry |
|
||||
|------------------------------- | -------------------- |
|
||||
| Name | Ivan Tham Jun Hoe |
|
||||
| Company name (if applicable) | Semut |
|
||||
| Title or role (if applicable) | Data Analyst |
|
||||
| Date | Apr 11, 2019 |
|
||||
| GitHub username | pickfire |
|
||||
| Website (optional) | https://pickfire.tk |
|
106
.github/contributors/richardpaulhudson.md
vendored
Normal file
|
@ -0,0 +1,106 @@
|
|||
# spaCy contributor agreement
|
||||
|
||||
This spaCy Contributor Agreement (**"SCA"**) is based on the
|
||||
[Oracle Contributor Agreement](http://www.oracle.com/technetwork/oca-405177.pdf).
|
||||
The SCA applies to any contribution that you make to any product or project
|
||||
managed by us (the **"project"**), and sets out the intellectual property rights
|
||||
you grant to us in the contributed materials. The term **"us"** shall mean
|
||||
[ExplosionAI GmbH](https://explosion.ai/legal). The term
|
||||
**"you"** shall mean the person or entity identified below.
|
||||
|
||||
If you agree to be bound by these terms, fill in the information requested
|
||||
below and include the filled-in version with your first pull request, under the
|
||||
folder [`.github/contributors/`](/.github/contributors/). The name of the file
|
||||
should be your GitHub username, with the extension `.md`. For example, the user
|
||||
example_user would create the file `.github/contributors/example_user.md`.
|
||||
|
||||
Read this agreement carefully before signing. These terms and conditions
|
||||
constitute a binding legal agreement.
|
||||
|
||||
## Contributor Agreement
|
||||
|
||||
1. The term "contribution" or "contributed materials" means any source code,
|
||||
object code, patch, tool, sample, graphic, specification, manual,
|
||||
documentation, or any other material posted or submitted by you to the project.
|
||||
|
||||
2. With respect to any worldwide copyrights, or copyright applications and
|
||||
registrations, in your contribution:
|
||||
|
||||
* you hereby assign to us joint ownership, and to the extent that such
|
||||
assignment is or becomes invalid, ineffective or unenforceable, you hereby
|
||||
grant to us a perpetual, irrevocable, non-exclusive, worldwide, no-charge,
|
||||
royalty-free, unrestricted license to exercise all rights under those
|
||||
copyrights. This includes, at our option, the right to sublicense these same
|
||||
rights to third parties through multiple levels of sublicensees or other
|
||||
licensing arrangements;
|
||||
|
||||
* you agree that each of us can do all things in relation to your
|
||||
contribution as if each of us were the sole owners, and if one of us makes
|
||||
a derivative work of your contribution, the one who makes the derivative
|
||||
work (or has it made will be the sole owner of that derivative work;
|
||||
|
||||
* you agree that you will not assert any moral rights in your contribution
|
||||
against us, our licensees or transferees;
|
||||
|
||||
* you agree that we may register a copyright in your contribution and
|
||||
exercise all ownership rights associated with it; and
|
||||
|
||||
* you agree that neither of us has any duty to consult with, obtain the
|
||||
consent of, pay or render an accounting to the other for any use or
|
||||
distribution of your contribution.
|
||||
|
||||
3. With respect to any patents you own, or that you can license without payment
|
||||
to any third party, you hereby grant to us a perpetual, irrevocable,
|
||||
non-exclusive, worldwide, no-charge, royalty-free license to:
|
||||
|
||||
* make, have made, use, sell, offer to sell, import, and otherwise transfer
|
||||
your contribution in whole or in part, alone or in combination with or
|
||||
included in any product, work or materials arising out of the project to
|
||||
which your contribution was submitted, and
|
||||
|
||||
* at our option, to sublicense these same rights to third parties through
|
||||
multiple levels of sublicensees or other licensing arrangements.
|
||||
|
||||
4. Except as set out above, you keep all right, title, and interest in your
|
||||
contribution. The rights that you grant to us under these terms are effective
|
||||
on the date you first submitted a contribution to us, even if your submission
|
||||
took place before the date you sign these terms.
|
||||
|
||||
5. You covenant, represent, warrant and agree that:
|
||||
|
||||
* Each contribution that you submit is and shall be an original work of
|
||||
authorship and you can legally grant the rights set out in this SCA;
|
||||
|
||||
* to the best of your knowledge, each contribution will not violate any
|
||||
third party's copyrights, trademarks, patents, or other intellectual
|
||||
property rights; and
|
||||
|
||||
* each contribution shall be in compliance with U.S. export control laws and
|
||||
other applicable export and import laws. You agree to notify us if you
|
||||
become aware of any circumstance which would make any of the foregoing
|
||||
representations inaccurate in any respect. We may publicly disclose your
|
||||
participation in the project, including the fact that you have signed the SCA.
|
||||
|
||||
6. This SCA is governed by the laws of the State of California and applicable
|
||||
U.S. Federal law. Any choice of law rules will not apply.
|
||||
|
||||
7. Please place an “x” on one of the applicable statement below. Please do NOT
|
||||
mark both statements:
|
||||
|
||||
* [x] I am signing on behalf of myself as an individual and no other person
|
||||
or entity, including my employer, has or will have rights with respect to my
|
||||
contributions.
|
||||
|
||||
* [ ] I am signing on behalf of my employer or a legal entity and I have the
|
||||
actual authority to contractually bind that entity.
|
||||
|
||||
## Contributor Details
|
||||
|
||||
| Field | Entry |
|
||||
|------------------------------- | -------------------- |
|
||||
| Name | Richard Paul Hudson |
|
||||
| Company name (if applicable) | msg systems ag |
|
||||
| Title or role (if applicable) | Principal IT Consultant|
|
||||
| Date | 06. May 2019 |
|
||||
| GitHub username | richardpaulhudson |
|
||||
| Website (optional) | |
|
106
.github/contributors/ujwal-narayan.md
vendored
Normal file
|
@ -0,0 +1,106 @@
|
|||
# spaCy contributor agreement
|
||||
|
||||
This spaCy Contributor Agreement (**"SCA"**) is based on the
|
||||
[Oracle Contributor Agreement](http://www.oracle.com/technetwork/oca-405177.pdf).
|
||||
The SCA applies to any contribution that you make to any product or project
|
||||
managed by us (the **"project"**), and sets out the intellectual property rights
|
||||
you grant to us in the contributed materials. The term **"us"** shall mean
|
||||
[ExplosionAI GmbH](https://explosion.ai/legal). The term
|
||||
**"you"** shall mean the person or entity identified below.
|
||||
|
||||
If you agree to be bound by these terms, fill in the information requested
|
||||
below and include the filled-in version with your first pull request, under the
|
||||
folder [`.github/contributors/`](/.github/contributors/). The name of the file
|
||||
should be your GitHub username, with the extension `.md`. For example, the user
|
||||
example_user would create the file `.github/contributors/example_user.md`.
|
||||
|
||||
Read this agreement carefully before signing. These terms and conditions
|
||||
constitute a binding legal agreement.
|
||||
|
||||
## Contributor Agreement
|
||||
|
||||
1. The term "contribution" or "contributed materials" means any source code,
|
||||
object code, patch, tool, sample, graphic, specification, manual,
|
||||
documentation, or any other material posted or submitted by you to the project.
|
||||
|
||||
2. With respect to any worldwide copyrights, or copyright applications and
|
||||
registrations, in your contribution:
|
||||
|
||||
* you hereby assign to us joint ownership, and to the extent that such
|
||||
assignment is or becomes invalid, ineffective or unenforceable, you hereby
|
||||
grant to us a perpetual, irrevocable, non-exclusive, worldwide, no-charge,
|
||||
royalty-free, unrestricted license to exercise all rights under those
|
||||
copyrights. This includes, at our option, the right to sublicense these same
|
||||
rights to third parties through multiple levels of sublicensees or other
|
||||
licensing arrangements;
|
||||
|
||||
* you agree that each of us can do all things in relation to your
|
||||
contribution as if each of us were the sole owners, and if one of us makes
|
||||
a derivative work of your contribution, the one who makes the derivative
|
||||
work (or has it made will be the sole owner of that derivative work;
|
||||
|
||||
* you agree that you will not assert any moral rights in your contribution
|
||||
against us, our licensees or transferees;
|
||||
|
||||
* you agree that we may register a copyright in your contribution and
|
||||
exercise all ownership rights associated with it; and
|
||||
|
||||
* you agree that neither of us has any duty to consult with, obtain the
|
||||
consent of, pay or render an accounting to the other for any use or
|
||||
distribution of your contribution.
|
||||
|
||||
3. With respect to any patents you own, or that you can license without payment
|
||||
to any third party, you hereby grant to us a perpetual, irrevocable,
|
||||
non-exclusive, worldwide, no-charge, royalty-free license to:
|
||||
|
||||
* make, have made, use, sell, offer to sell, import, and otherwise transfer
|
||||
your contribution in whole or in part, alone or in combination with or
|
||||
included in any product, work or materials arising out of the project to
|
||||
which your contribution was submitted, and
|
||||
|
||||
* at our option, to sublicense these same rights to third parties through
|
||||
multiple levels of sublicensees or other licensing arrangements.
|
||||
|
||||
4. Except as set out above, you keep all right, title, and interest in your
|
||||
contribution. The rights that you grant to us under these terms are effective
|
||||
on the date you first submitted a contribution to us, even if your submission
|
||||
took place before the date you sign these terms.
|
||||
|
||||
5. You covenant, represent, warrant and agree that:
|
||||
|
||||
* Each contribution that you submit is and shall be an original work of
|
||||
authorship and you can legally grant the rights set out in this SCA;
|
||||
|
||||
* to the best of your knowledge, each contribution will not violate any
|
||||
third party's copyrights, trademarks, patents, or other intellectual
|
||||
property rights; and
|
||||
|
||||
* each contribution shall be in compliance with U.S. export control laws and
|
||||
other applicable export and import laws. You agree to notify us if you
|
||||
become aware of any circumstance which would make any of the foregoing
|
||||
representations inaccurate in any respect. We may publicly disclose your
|
||||
participation in the project, including the fact that you have signed the SCA.
|
||||
|
||||
6. This SCA is governed by the laws of the State of California and applicable
|
||||
U.S. Federal law. Any choice of law rules will not apply.
|
||||
|
||||
7. Please place an “x” on one of the applicable statement below. Please do NOT
|
||||
mark both statements:
|
||||
|
||||
* [x] I am signing on behalf of myself as an individual and no other person
|
||||
or entity, including my employer, has or will have rights with respect to my
|
||||
contributions.
|
||||
|
||||
* [ ] I am signing on behalf of my employer or a legal entity and I have the
|
||||
actual authority to contractually bind that entity.
|
||||
|
||||
## Contributor Details
|
||||
|
||||
| Field | Entry |
|
||||
|------------------------------- | -------------------- |
|
||||
| Name | Ujwal Narayan |
|
||||
| Company name (if applicable) | |
|
||||
| Title or role (if applicable) | |
|
||||
| Date | 17/05/2019 |
|
||||
| GitHub username | ujwal-narayan |
|
||||
| Website (optional) | |
|
106
.github/contributors/xssChauhan.md
vendored
Normal file
|
@ -0,0 +1,106 @@
|
|||
# spaCy contributor agreement
|
||||
|
||||
This spaCy Contributor Agreement (**"SCA"**) is based on the
|
||||
[Oracle Contributor Agreement](http://www.oracle.com/technetwork/oca-405177.pdf).
|
||||
The SCA applies to any contribution that you make to any product or project
|
||||
managed by us (the **"project"**), and sets out the intellectual property rights
|
||||
you grant to us in the contributed materials. The term **"us"** shall mean
|
||||
[ExplosionAI GmbH](https://explosion.ai/legal). The term
|
||||
**"you"** shall mean the person or entity identified below.
|
||||
|
||||
If you agree to be bound by these terms, fill in the information requested
|
||||
below and include the filled-in version with your first pull request, under the
|
||||
folder [`.github/contributors/`](/.github/contributors/). The name of the file
|
||||
should be your GitHub username, with the extension `.md`. For example, the user
|
||||
example_user would create the file `.github/contributors/example_user.md`.
|
||||
|
||||
Read this agreement carefully before signing. These terms and conditions
|
||||
constitute a binding legal agreement.
|
||||
|
||||
## Contributor Agreement
|
||||
|
||||
1. The term "contribution" or "contributed materials" means any source code,
|
||||
object code, patch, tool, sample, graphic, specification, manual,
|
||||
documentation, or any other material posted or submitted by you to the project.
|
||||
|
||||
2. With respect to any worldwide copyrights, or copyright applications and
|
||||
registrations, in your contribution:
|
||||
|
||||
* you hereby assign to us joint ownership, and to the extent that such
|
||||
assignment is or becomes invalid, ineffective or unenforceable, you hereby
|
||||
grant to us a perpetual, irrevocable, non-exclusive, worldwide, no-charge,
|
||||
royalty-free, unrestricted license to exercise all rights under those
|
||||
copyrights. This includes, at our option, the right to sublicense these same
|
||||
rights to third parties through multiple levels of sublicensees or other
|
||||
licensing arrangements;
|
||||
|
||||
* you agree that each of us can do all things in relation to your
|
||||
contribution as if each of us were the sole owners, and if one of us makes
|
||||
a derivative work of your contribution, the one who makes the derivative
|
||||
work (or has it made will be the sole owner of that derivative work;
|
||||
|
||||
* you agree that you will not assert any moral rights in your contribution
|
||||
against us, our licensees or transferees;
|
||||
|
||||
* you agree that we may register a copyright in your contribution and
|
||||
exercise all ownership rights associated with it; and
|
||||
|
||||
* you agree that neither of us has any duty to consult with, obtain the
|
||||
consent of, pay or render an accounting to the other for any use or
|
||||
distribution of your contribution.
|
||||
|
||||
3. With respect to any patents you own, or that you can license without payment
|
||||
to any third party, you hereby grant to us a perpetual, irrevocable,
|
||||
non-exclusive, worldwide, no-charge, royalty-free license to:
|
||||
|
||||
* make, have made, use, sell, offer to sell, import, and otherwise transfer
|
||||
your contribution in whole or in part, alone or in combination with or
|
||||
included in any product, work or materials arising out of the project to
|
||||
which your contribution was submitted, and
|
||||
|
||||
* at our option, to sublicense these same rights to third parties through
|
||||
multiple levels of sublicensees or other licensing arrangements.
|
||||
|
||||
4. Except as set out above, you keep all right, title, and interest in your
|
||||
contribution. The rights that you grant to us under these terms are effective
|
||||
on the date you first submitted a contribution to us, even if your submission
|
||||
took place before the date you sign these terms.
|
||||
|
||||
5. You covenant, represent, warrant and agree that:
|
||||
|
||||
* Each contribution that you submit is and shall be an original work of
|
||||
authorship and you can legally grant the rights set out in this SCA;
|
||||
|
||||
* to the best of your knowledge, each contribution will not violate any
|
||||
third party's copyrights, trademarks, patents, or other intellectual
|
||||
property rights; and
|
||||
|
||||
* each contribution shall be in compliance with U.S. export control laws and
|
||||
other applicable export and import laws. You agree to notify us if you
|
||||
become aware of any circumstance which would make any of the foregoing
|
||||
representations inaccurate in any respect. We may publicly disclose your
|
||||
participation in the project, including the fact that you have signed the SCA.
|
||||
|
||||
6. This SCA is governed by the laws of the State of California and applicable
|
||||
U.S. Federal law. Any choice of law rules will not apply.
|
||||
|
||||
7. Please place an “x” on one of the applicable statement below. Please do NOT
|
||||
mark both statements:
|
||||
|
||||
* [x] I am signing on behalf of myself as an individual and no other person
|
||||
or entity, including my employer, has or will have rights with respect to my
|
||||
contributions.
|
||||
|
||||
* [ ] I am signing on behalf of my employer or a legal entity and I have the
|
||||
actual authority to contractually bind that entity.
|
||||
|
||||
## Contributor Details
|
||||
|
||||
| Field | Entry |
|
||||
|------------------------------- | -------------------- |
|
||||
| Name | Shikhar Chauhan |
|
||||
| Company name (if applicable) | |
|
||||
| Title or role (if applicable) | |
|
||||
| Date | 12/11/2019 |
|
||||
| GitHub username | xssChauhan |
|
||||
| Website (optional) | |
|
106
.github/contributors/yaph.md
vendored
Normal file
|
@ -0,0 +1,106 @@
|
|||
# spaCy contributor agreement
|
||||
|
||||
This spaCy Contributor Agreement (**"SCA"**) is based on the
|
||||
[Oracle Contributor Agreement](http://www.oracle.com/technetwork/oca-405177.pdf).
|
||||
The SCA applies to any contribution that you make to any product or project
|
||||
managed by us (the **"project"**), and sets out the intellectual property rights
|
||||
you grant to us in the contributed materials. The term **"us"** shall mean
|
||||
[ExplosionAI GmbH](https://explosion.ai/legal). The term
|
||||
**"you"** shall mean the person or entity identified below.
|
||||
|
||||
If you agree to be bound by these terms, fill in the information requested
|
||||
below and include the filled-in version with your first pull request, under the
|
||||
folder [`.github/contributors/`](/.github/contributors/). The name of the file
|
||||
should be your GitHub username, with the extension `.md`. For example, the user
|
||||
example_user would create the file `.github/contributors/example_user.md`.
|
||||
|
||||
Read this agreement carefully before signing. These terms and conditions
|
||||
constitute a binding legal agreement.
|
||||
|
||||
## Contributor Agreement
|
||||
|
||||
1. The term "contribution" or "contributed materials" means any source code,
|
||||
object code, patch, tool, sample, graphic, specification, manual,
|
||||
documentation, or any other material posted or submitted by you to the project.
|
||||
|
||||
2. With respect to any worldwide copyrights, or copyright applications and
|
||||
registrations, in your contribution:
|
||||
|
||||
* you hereby assign to us joint ownership, and to the extent that such
|
||||
assignment is or becomes invalid, ineffective or unenforceable, you hereby
|
||||
grant to us a perpetual, irrevocable, non-exclusive, worldwide, no-charge,
|
||||
royalty-free, unrestricted license to exercise all rights under those
|
||||
copyrights. This includes, at our option, the right to sublicense these same
|
||||
rights to third parties through multiple levels of sublicensees or other
|
||||
licensing arrangements;
|
||||
|
||||
* you agree that each of us can do all things in relation to your
|
||||
contribution as if each of us were the sole owners, and if one of us makes
|
||||
a derivative work of your contribution, the one who makes the derivative
|
||||
work (or has it made will be the sole owner of that derivative work;
|
||||
|
||||
* you agree that you will not assert any moral rights in your contribution
|
||||
against us, our licensees or transferees;
|
||||
|
||||
* you agree that we may register a copyright in your contribution and
|
||||
exercise all ownership rights associated with it; and
|
||||
|
||||
* you agree that neither of us has any duty to consult with, obtain the
|
||||
consent of, pay or render an accounting to the other for any use or
|
||||
distribution of your contribution.
|
||||
|
||||
3. With respect to any patents you own, or that you can license without payment
|
||||
to any third party, you hereby grant to us a perpetual, irrevocable,
|
||||
non-exclusive, worldwide, no-charge, royalty-free license to:
|
||||
|
||||
* make, have made, use, sell, offer to sell, import, and otherwise transfer
|
||||
your contribution in whole or in part, alone or in combination with or
|
||||
included in any product, work or materials arising out of the project to
|
||||
which your contribution was submitted, and
|
||||
|
||||
* at our option, to sublicense these same rights to third parties through
|
||||
multiple levels of sublicensees or other licensing arrangements.
|
||||
|
||||
4. Except as set out above, you keep all right, title, and interest in your
|
||||
contribution. The rights that you grant to us under these terms are effective
|
||||
on the date you first submitted a contribution to us, even if your submission
|
||||
took place before the date you sign these terms.
|
||||
|
||||
5. You covenant, represent, warrant and agree that:
|
||||
|
||||
* Each contribution that you submit is and shall be an original work of
|
||||
authorship and you can legally grant the rights set out in this SCA;
|
||||
|
||||
* to the best of your knowledge, each contribution will not violate any
|
||||
third party's copyrights, trademarks, patents, or other intellectual
|
||||
property rights; and
|
||||
|
||||
* each contribution shall be in compliance with U.S. export control laws and
|
||||
other applicable export and import laws. You agree to notify us if you
|
||||
become aware of any circumstance which would make any of the foregoing
|
||||
representations inaccurate in any respect. We may publicly disclose your
|
||||
participation in the project, including the fact that you have signed the SCA.
|
||||
|
||||
6. This SCA is governed by the laws of the State of California and applicable
|
||||
U.S. Federal law. Any choice of law rules will not apply.
|
||||
|
||||
7. Please place an “x” on one of the applicable statement below. Please do NOT
|
||||
mark both statements:
|
||||
|
||||
* [x] I am signing on behalf of myself as an individual and no other person
|
||||
or entity, including my employer, has or will have rights with respect to my
|
||||
contributions.
|
||||
|
||||
* [ ] I am signing on behalf of my employer or a legal entity and I have the
|
||||
actual authority to contractually bind that entity.
|
||||
|
||||
## Contributor Details
|
||||
|
||||
| Field | Entry |
|
||||
|------------------------------- | -------------------- |
|
||||
| Name | Ramiro Gómez |
|
||||
| Company name (if applicable) | |
|
||||
| Title or role (if applicable) | |
|
||||
| Date | 2019-04-29 |
|
||||
| GitHub username | yaph |
|
||||
| Website (optional) | http://ramiro.org/ |
|
|
@ -447,17 +447,7 @@ use the `get_doc()` utility function to construct it manually.
|
|||
|
||||
## Updating the website
|
||||
|
||||
Our [website and docs](https://spacy.io) are implemented in
|
||||
[Jade/Pug](https://www.jade-lang.org), and built or served by
|
||||
[Harp](https://harpjs.com). Jade/Pug is an extensible templating language with a
|
||||
readable syntax, that compiles to HTML. Here's how to view the site locally:
|
||||
|
||||
```bash
|
||||
sudo npm install --global harp
|
||||
git clone https://github.com/explosion/spaCy
|
||||
cd spaCy/website
|
||||
harp server
|
||||
```
|
||||
For instructions on how to build and run the [website](https://spacy.io) locally see **[Setup and installation](https://github.com/explosion/spaCy/blob/master/website/README.md#setup-and-installation-setup)** in the *website* directory's README.
|
||||
|
||||
The docs can always use another example or more detail, and they should always
|
||||
be up to date and not misleading. To quickly find the correct file to edit,
|
||||
|
|
16
README.md
|
@ -6,11 +6,10 @@ spaCy is a library for advanced Natural Language Processing in Python and
|
|||
Cython. It's built on the very latest research, and was designed from day one
|
||||
to be used in real products. spaCy comes with
|
||||
[pre-trained statistical models](https://spacy.io/models) and word vectors, and
|
||||
currently supports tokenization for **45+ languages**. It features the
|
||||
**fastest syntactic parser** in the world, convolutional
|
||||
**neural network models** for tagging, parsing and **named entity recognition**
|
||||
and easy **deep learning** integration. It's commercial open-source software,
|
||||
released under the MIT license.
|
||||
currently supports tokenization for **49+ languages**. It features
|
||||
state-of-the-art speed, convolutional **neural network models** for tagging,
|
||||
parsing and **named entity recognition** and easy **deep learning** integration.
|
||||
It's commercial open-source software, released under the MIT license.
|
||||
|
||||
💫 **Version 2.1 out now!** [Check out the release notes here.](https://github.com/explosion/spaCy/releases)
|
||||
|
||||
|
@ -66,11 +65,11 @@ valuable if it's shared publicly, so that more people can benefit from it.
|
|||
|
||||
## Features
|
||||
|
||||
- **Fastest syntactic parser** in the world
|
||||
- **Named entity** recognition
|
||||
- Non-destructive **tokenization**
|
||||
- Support for **45+ languages**
|
||||
- **Named entity** recognition
|
||||
- Support for **49+ languages**
|
||||
- Pre-trained [statistical models](https://spacy.io/models) and word vectors
|
||||
- State-of-the-art speed
|
||||
- Easy **deep learning** integration
|
||||
- Part-of-speech tagging
|
||||
- Labelled dependency parsing
|
||||
|
@ -80,7 +79,6 @@ valuable if it's shared publicly, so that more people can benefit from it.
|
|||
- Export to numpy data arrays
|
||||
- Efficient binary serialization
|
||||
- Easy **model packaging** and deployment
|
||||
- State-of-the-art speed
|
||||
- Robust, rigorously evaluated accuracy
|
||||
|
||||
📖 **For more details, see the
|
||||
|
|
|
@ -16,4 +16,4 @@ version=${version/\'/}
|
|||
version=${version/\"/}
|
||||
version=${version/\"/}
|
||||
git tag "v$version"
|
||||
git push origin "v$version" --tags
|
||||
git push origin "v$version"
|
||||
|
|
|
@ -36,11 +36,27 @@ def main(model="en_core_web_sm"):
|
|||
print("{:<10}\t{}\t{}".format(r1.text, r2.ent_type_, r2.text))
|
||||
|
||||
|
||||
def filter_spans(spans):
|
||||
# Filter a sequence of spans so they don't contain overlaps
|
||||
get_sort_key = lambda span: (span.end - span.start, span.start)
|
||||
sorted_spans = sorted(spans, key=get_sort_key, reverse=True)
|
||||
result = []
|
||||
seen_tokens = set()
|
||||
for span in sorted_spans:
|
||||
if span.start not in seen_tokens and span.end - 1 not in seen_tokens:
|
||||
result.append(span)
|
||||
seen_tokens.update(range(span.start, span.end))
|
||||
return result
|
||||
|
||||
|
||||
def extract_currency_relations(doc):
|
||||
# merge entities and noun chunks into one token
|
||||
# Merge entities and noun chunks into one token
|
||||
seen_tokens = set()
|
||||
spans = list(doc.ents) + list(doc.noun_chunks)
|
||||
for span in spans:
|
||||
span.merge()
|
||||
spans = filter_spans(spans)
|
||||
with doc.retokenize() as retokenizer:
|
||||
for span in spans:
|
||||
retokenizer.merge(span)
|
||||
|
||||
relations = []
|
||||
for money in filter(lambda w: w.ent_type_ == "MONEY", doc):
|
||||
|
|
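The `filter_spans` helper in the hunk above can be tried without loading a model. The sketch below is a minimal illustration, assuming a hypothetical `Span` namedtuple that stands in for spaCy's span objects and carries only token offsets. It shows how sorting longest-first and tracking seen tokens drops the shorter of two overlapping spans.

```python
from collections import namedtuple

# Hypothetical stand-in for a spaCy Span: only token offsets and a label.
Span = namedtuple("Span", ["start", "end", "label"])


def filter_spans(spans):
    # Sort by (length, start) descending, so longer spans are considered first.
    get_sort_key = lambda span: (span.end - span.start, span.start)
    sorted_spans = sorted(spans, key=get_sort_key, reverse=True)
    result = []
    seen_tokens = set()
    for span in sorted_spans:
        # Keep a span only if neither its first nor its last token is taken.
        if span.start not in seen_tokens and span.end - 1 not in seen_tokens:
            result.append(span)
            seen_tokens.update(range(span.start, span.end))
    return result


spans = [
    Span(0, 2, "ent"),         # overlaps the noun chunk below
    Span(0, 3, "noun_chunk"),  # longest span, so it is kept
    Span(4, 5, "ent"),         # no overlap, so it is kept
]
kept = filter_spans(spans)
```

The result is ordered by length rather than by position in the text. That should not matter when the spans are only passed on to be merged, as in the hunk above.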
|
@ -9,9 +9,10 @@ srsly>=0.0.5,<1.1.0
|
|||
# Third party dependencies
|
||||
numpy>=1.15.0
|
||||
requests>=2.13.0,<3.0.0
|
||||
jsonschema>=2.6.0,<3.0.0
|
||||
plac<1.0.0,>=0.9.6
|
||||
pathlib==1.0.1; python_version < "3.4"
|
||||
# Optional dependencies
|
||||
jsonschema>=2.6.0,<3.1.0
|
||||
# Development dependencies
|
||||
cython>=0.25
|
||||
pytest>=4.0.0,<4.1.0
|
||||
|
|
3
setup.py
|
@ -209,7 +209,7 @@ def setup_package():
|
|||
generate_cython(root, "spacy")
|
||||
|
||||
setup(
|
||||
name=about["__title__"],
|
||||
name="spacy",
|
||||
zip_safe=False,
|
||||
packages=PACKAGES,
|
||||
package_data=PACKAGE_DATA,
|
||||
|
@ -232,7 +232,6 @@ def setup_package():
|
|||
"blis>=0.2.2,<0.3.0",
|
||||
"plac<1.0.0,>=0.9.6",
|
||||
"requests>=2.13.0,<3.0.0",
|
||||
"jsonschema>=2.6.0,<3.0.0",
|
||||
"wasabi>=0.2.0,<1.1.0",
|
||||
"srsly>=0.0.5,<1.1.0",
|
||||
'pathlib==1.0.1; python_version < "3.4"',
|
||||
|
|
|
@ -4,13 +4,13 @@
|
|||
# fmt: off
|
||||
|
||||
__title__ = "spacy"
|
||||
__version__ = "2.1.3"
|
||||
__version__ = "2.1.4"
|
||||
__summary__ = "Industrial-strength Natural Language Processing (NLP) with Python and Cython"
|
||||
__uri__ = "https://spacy.io"
|
||||
__author__ = "Explosion AI"
|
||||
__email__ = "contact@explosion.ai"
|
||||
__license__ = "MIT"
|
||||
__release__ = True
|
||||
__release__ = False
|
||||
|
||||
__download_url__ = "https://github.com/explosion/spacy-models/releases/download"
|
||||
__compatibility__ = "https://raw.githubusercontent.com/explosion/spacy-models/master/compatibility.json"
|
||||
|
|
|
@ -39,7 +39,7 @@ FILE_TYPES_STDOUT = ("json", "jsonl")
|
|||
def convert(
|
||||
input_file,
|
||||
output_dir="-",
|
||||
file_type="jsonl",
|
||||
file_type="json",
|
||||
n_sents=1,
|
||||
morphology=False,
|
||||
converter="auto",
|
||||
|
@ -48,8 +48,8 @@ def convert(
|
|||
"""
|
||||
Convert files into JSON format for use with train command and other
|
||||
experiment management functions. If no output_dir is specified, the data
|
||||
is written to stdout, so you can pipe them forward to a JSONL file:
|
||||
$ spacy convert some_file.conllu > some_file.jsonl
|
||||
is written to stdout, so you can pipe them forward to a JSON file:
|
||||
$ spacy convert some_file.conllu > some_file.json
|
||||
"""
|
||||
msg = Printer()
|
||||
input_path = Path(input_file)
|
||||
|
|
|
@ -11,14 +11,8 @@ def iob2json(input_data, n_sents=10, *args, **kwargs):
|
|||
"""
|
||||
Convert IOB files into JSON format for use with train cli.
|
||||
"""
|
||||
docs = []
|
||||
for group in minibatch(docs, n_sents):
|
||||
group = list(group)
|
||||
first = group.pop(0)
|
||||
to_extend = first["paragraphs"][0]["sentences"]
|
||||
for sent in group[1:]:
|
||||
to_extend.extend(sent["paragraphs"][0]["sentences"])
|
||||
docs.append(first)
|
||||
sentences = read_iob(input_data.split("\n"))
|
||||
docs = merge_sentences(sentences, n_sents)
|
||||
return docs
|
||||
|
||||
|
||||
|
@ -27,7 +21,6 @@ def read_iob(raw_sents):
|
|||
for line in raw_sents:
|
||||
if not line.strip():
|
||||
continue
|
||||
# tokens = [t.split("|") for t in line.split()]
|
||||
tokens = [re.split("[^\w\-]", line.strip())]
|
||||
if len(tokens[0]) == 3:
|
||||
words, pos, iob = zip(*tokens)
|
||||
|
@ -49,3 +42,15 @@ def read_iob(raw_sents):
|
|||
paragraphs = [{"sentences": [sent]} for sent in sentences]
|
||||
docs = [{"id": 0, "paragraphs": [para]} for para in paragraphs]
|
||||
return docs
|
||||
|
||||
|
||||
def merge_sentences(docs, n_sents):
|
||||
merged = []
|
||||
for group in minibatch(docs, size=n_sents):
|
||||
group = list(group)
|
||||
first = group.pop(0)
|
||||
to_extend = first["paragraphs"][0]["sentences"]
|
||||
for sent in group[1:]:
|
||||
to_extend.extend(sent["paragraphs"][0]["sentences"])
|
||||
merged.append(first)
|
||||
return merged
|
||||
|
|
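The grouping done by `merge_sentences` can be sketched on plain dictionaries. The example below is an illustration under stated assumptions, not the code from this commit: `minibatch` is a hypothetical fixed-size chunker standing in for spaCy's helper, and the inner loop iterates over `group` rather than `group[1:]`. That difference is deliberate, since after `group.pop(0)` a slice starting at index 1 would appear to skip one document per batch.

```python
def minibatch(items, size):
    # Hypothetical stand-in for spaCy's minibatch helper: fixed-size chunks.
    items = list(items)
    for i in range(0, len(items), size):
        yield items[i : i + size]


def merge_sentences(docs, n_sents):
    merged = []
    for group in minibatch(docs, size=n_sents):
        group = list(group)
        # The first doc of each batch collects the sentences of the others.
        first = group.pop(0)
        to_extend = first["paragraphs"][0]["sentences"]
        # Assumption: every remaining doc in the batch should be merged.
        for doc in group:
            to_extend.extend(doc["paragraphs"][0]["sentences"])
        merged.append(first)
    return merged


# Five single-sentence docs, shaped like the output of read_iob.
docs = [{"id": 0, "paragraphs": [{"sentences": ["s%d" % i]}]} for i in range(5)]
merged = merge_sentences(docs, n_sents=3)
sentence_groups = [doc["paragraphs"][0]["sentences"] for doc in merged]
```

With `n_sents=3`, five input docs become two output docs, the first holding three sentences and the second holding two.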
|
@ -17,6 +17,7 @@ from .. import displacy
|
|||
gpu_id=("Use GPU", "option", "g", int),
|
||||
displacy_path=("Directory to output rendered parses as HTML", "option", "dp", str),
|
||||
displacy_limit=("Limit of parses to render as HTML", "option", "dl", int),
|
||||
return_scores=("Return dict containing model scores", "flag", "R", bool),
|
||||
)
|
||||
def evaluate(
|
||||
model,
|
||||
|
@ -25,6 +26,7 @@ def evaluate(
|
|||
gold_preproc=False,
|
||||
displacy_path=None,
|
||||
displacy_limit=25,
|
||||
return_scores=False,
|
||||
):
|
||||
"""
|
||||
Evaluate a model. To render a sample of parses in a HTML file, set an
|
||||
|
@ -75,6 +77,8 @@ def evaluate(
|
|||
ents=render_ents,
|
||||
)
|
||||
msg.good("Generated {} parses as HTML".format(displacy_limit), displacy_path)
|
||||
if return_scores:
|
||||
return scorer.scores
|
||||
|
||||
|
||||
def render_parses(docs, output_path, model_name="", limit=250, deps=True, ents=True):
|
||||
|
|
|
@ -181,7 +181,7 @@ def read_vectors(vectors_loc):
|
|||
vectors_keys = []
|
||||
for i, line in enumerate(tqdm(f)):
|
||||
line = line.rstrip()
|
||||
pieces = line.rsplit(" ", vectors_data.shape[1] + 1)
|
||||
pieces = line.rsplit(" ", vectors_data.shape[1])
|
||||
word = pieces.pop(0)
|
||||
if len(pieces) != vectors_data.shape[1]:
|
||||
msg.fail(Errors.E094.format(line_num=i, loc=vectors_loc), exits=1)
|
||||
|
|
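The effect of the `rsplit` change in `read_vectors` is easiest to see on a key that contains a space. The sketch below is a simplified illustration: `split_vector_line` is a hypothetical helper, not part of spaCy, and it raises a `ValueError` where the real code reports `Errors.E094`.

```python
def split_vector_line(line, width):
    # Split from the right at most `width` times, so everything left of the
    # last `width` fields is treated as the word, even if it contains spaces.
    pieces = line.rstrip().rsplit(" ", width)
    word = pieces.pop(0)
    if len(pieces) != width:
        raise ValueError("expected %d values, got %d" % (width, len(pieces)))
    return word, [float(piece) for piece in pieces]


line = "New York 0.1 0.2 0.3"
word, values = split_vector_line(line, 3)

# Allowing one extra split (the previous behaviour) breaks the key apart,
# leaving four trailing fields for a three-dimensional vector.
old_pieces = line.rsplit(" ", 3 + 1)
```

For keys without spaces both variants give the same pieces, which may be why the extra split went unnoticed.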
|
@ -34,7 +34,8 @@ from .. import util
|
|||
max_length=("Max words per example.", "option", "xw", int),
|
||||
min_length=("Min words per example.", "option", "nw", int),
|
||||
seed=("Seed for random number generators", "option", "s", float),
|
||||
nr_iter=("Number of iterations to pretrain", "option", "i", int),
|
||||
n_iter=("Number of iterations to pretrain", "option", "i", int),
|
||||
n_save_every=("Save model every X batches.", "option", "se", int),
|
||||
)
|
||||
def pretrain(
|
||||
texts_loc,
|
||||
|
@ -46,11 +47,12 @@ def pretrain(
|
|||
loss_func="cosine",
|
||||
use_vectors=False,
|
||||
dropout=0.2,
|
||||
nr_iter=1000,
|
||||
n_iter=1000,
|
||||
batch_size=3000,
|
||||
max_length=500,
|
||||
min_length=5,
|
||||
seed=0,
|
||||
n_save_every=None,
|
||||
):
|
||||
"""
|
||||
Pre-train the 'token-to-vector' (tok2vec) layer of pipeline components,
|
||||
|
@ -115,9 +117,26 @@ def pretrain(
|
|||
msg.divider("Pre-training tok2vec layer")
|
||||
row_settings = {"widths": (3, 10, 10, 6, 4), "aligns": ("r", "r", "r", "r", "r")}
|
||||
msg.row(("#", "# Words", "Total Loss", "Loss", "w/s"), **row_settings)
|
||||
for epoch in range(nr_iter):
|
||||
for batch in util.minibatch_by_words(
|
||||
((text, None) for text in texts), size=batch_size
|
||||
|
||||
def _save_model(epoch, is_temp=False):
|
||||
is_temp_str = ".temp" if is_temp else ""
|
||||
with model.use_params(optimizer.averages):
|
||||
with (output_dir / ("model%d%s.bin" % (epoch, is_temp_str))).open(
|
||||
"wb"
|
||||
) as file_:
|
||||
file_.write(model.tok2vec.to_bytes())
|
||||
log = {
|
||||
"nr_word": tracker.nr_word,
|
||||
"loss": tracker.loss,
|
||||
"epoch_loss": tracker.epoch_loss,
|
||||
"epoch": epoch,
|
||||
}
|
||||
with (output_dir / "log.jsonl").open("a") as file_:
|
||||
file_.write(srsly.json_dumps(log) + "\n")
|
||||
|
||||
for epoch in range(n_iter):
|
||||
for batch_id, batch in enumerate(
|
||||
util.minibatch_by_words(((text, None) for text in texts), size=batch_size)
|
||||
):
|
||||
docs = make_docs(
|
||||
nlp,
|
||||
|
@ -133,17 +152,9 @@ def pretrain(
|
|||
msg.row(progress, **row_settings)
|
||||
if texts_loc == "-" and tracker.words_per_epoch[epoch] >= 10 ** 7:
|
||||
break
|
||||
with model.use_params(optimizer.averages):
|
||||
with (output_dir / ("model%d.bin" % epoch)).open("wb") as file_:
|
||||
file_.write(model.tok2vec.to_bytes())
|
||||
log = {
|
||||
"nr_word": tracker.nr_word,
|
||||
"loss": tracker.loss,
|
||||
"epoch_loss": tracker.epoch_loss,
|
||||
"epoch": epoch,
|
||||
}
|
||||
with (output_dir / "log.jsonl").open("a") as file_:
|
||||
file_.write(srsly.json_dumps(log) + "\n")
|
||||
if n_save_every and (batch_id % n_save_every == 0):
|
||||
_save_model(epoch, is_temp=True)
|
||||
_save_model(epoch)
|
||||
tracker.epoch_loss = 0.0
|
||||
if texts_loc != "-":
|
||||
# Reshuffle the texts if texts were loaded from a file
|
||||
|
@ -170,10 +181,10 @@ def make_update(model, docs, optimizer, drop=0.0, objective="L2"):
|
|||
def make_docs(nlp, batch, min_length, max_length):
|
||||
docs = []
|
||||
for record in batch:
|
||||
text = record["text"]
|
||||
if "tokens" in record:
|
||||
doc = Doc(nlp.vocab, words=record["tokens"])
|
||||
else:
|
||||
text = record["text"]
|
||||
doc = nlp.make_doc(text)
|
||||
if "heads" in record:
|
||||
heads = record["heads"]
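
The checkpointing added to `pretrain` above can be sketched as two small helpers. These are hypothetical stand-ins (`checkpoint_name` and `should_save_temp` do not appear in the diff); they only reproduce the filename pattern and the `n_save_every` condition as shown.

```python
def checkpoint_name(epoch, is_temp=False):
    """Build the file name used by `_save_model` in the diff."""
    is_temp_str = ".temp" if is_temp else ""
    return "model%d%s.bin" % (epoch, is_temp_str)


def should_save_temp(batch_id, n_save_every):
    """Return True when an intermediate checkpoint would be written."""
    return bool(n_save_every and (batch_id % n_save_every == 0))
```

Two consequences follow from the condition as written: batch 0 of every epoch triggers a temporary save, and because the temporary file name depends only on the epoch, later temporary saves within the same epoch overwrite earlier ones.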
|
||||
|
|
|
@ -16,6 +16,7 @@ import random
|
|||
from .._ml import create_default_optimizer
|
||||
from ..attrs import PROB, IS_OOV, CLUSTER, LANG
|
||||
from ..gold import GoldCorpus
|
||||
from ..compat import path2str
|
||||
from .. import util
|
||||
from .. import about
|
||||
|
||||
|
@ -35,6 +36,12 @@ from .. import about
|
|||
pipeline=("Comma-separated names of pipeline components", "option", "p", str),
|
||||
vectors=("Model to load vectors from", "option", "v", str),
|
||||
n_iter=("Number of iterations", "option", "n", int),
|
||||
n_early_stopping=(
|
||||
"Maximum number of training epochs without dev accuracy improvement",
|
||||
"option",
|
||||
"ne",
|
||||
int,
|
||||
),
|
||||
n_examples=("Number of examples", "option", "ns", int),
|
||||
use_gpu=("Use GPU", "option", "g", int),
|
||||
version=("Model version", "option", "V", str),
|
||||
|
@ -74,6 +81,7 @@ def train(
|
|||
pipeline="tagger,parser,ner",
|
||||
vectors=None,
|
||||
n_iter=30,
|
||||
n_early_stopping=None,
|
||||
n_examples=0,
|
||||
use_gpu=-1,
|
||||
version="0.0.0",
|
||||
|
@ -101,6 +109,7 @@ def train(
|
|||
train_path = util.ensure_path(train_path)
|
||||
dev_path = util.ensure_path(dev_path)
|
||||
meta_path = util.ensure_path(meta_path)
|
||||
output_path = util.ensure_path(output_path)
|
||||
if raw_text is not None:
|
||||
raw_text = list(srsly.read_jsonl(raw_text))
|
||||
if not train_path or not train_path.exists():
|
||||
|
@ -222,6 +231,8 @@ def train(
|
|||
msg.row(row_head, **row_settings)
|
||||
msg.row(["-" * width for width in row_settings["widths"]], **row_settings)
|
||||
try:
|
||||
iter_since_best = 0
|
||||
best_score = 0.0
|
||||
for i in range(n_iter):
|
||||
train_docs = corpus.train_docs(
|
||||
nlp, noise_level=noise_level, gold_preproc=gold_preproc, max_length=0
|
||||
|
@ -276,7 +287,9 @@ def train(
|
|||
gpu_wps = nwords / (end_time - start_time)
|
||||
with Model.use_device("cpu"):
|
||||
nlp_loaded = util.load_model_from_path(epoch_model_path)
|
||||
nlp_loaded.parser.cfg["beam_width"]
|
||||
for name, component in nlp_loaded.pipeline:
|
||||
if hasattr(component, "cfg"):
|
||||
component.cfg["beam_width"] = beam_width
|
||||
dev_docs = list(
|
||||
corpus.dev_docs(nlp_loaded, gold_preproc=gold_preproc)
|
||||
)
|
||||
|
@ -328,6 +341,24 @@ def train(
|
|||
gpu_wps=gpu_wps,
|
||||
)
|
||||
msg.row(progress, **row_settings)
|
||||
# Early stopping
|
||||
if n_early_stopping is not None:
|
||||
current_score = _score_for_model(meta)
|
||||
if current_score < best_score:
|
||||
iter_since_best += 1
|
||||
else:
|
||||
iter_since_best = 0
|
||||
best_score = current_score
|
||||
if iter_since_best >= n_early_stopping:
|
||||
msg.text(
|
||||
"Early stopping, best iteration "
|
||||
"is: {}".format(i - iter_since_best)
|
||||
)
|
||||
msg.text(
|
||||
"Best score = {}; Final iteration "
|
||||
"score = {}".format(best_score, current_score)
|
||||
)
|
||||
break
|
||||
finally:
|
||||
with nlp.use_params(optimizer.averages):
|
||||
final_model_path = output_path / "model-final"
|
||||
|
@ -338,6 +369,20 @@ def train(
|
|||
msg.good("Created best model", best_model_path)
|
||||
|
||||
|
||||
def _score_for_model(meta):
|
||||
""" Returns mean score between tasks in pipeline that can be used for early stopping. """
|
||||
mean_acc = list()
|
||||
pipes = meta["pipeline"]
|
||||
acc = meta["accuracy"]
|
||||
if "tagger" in pipes:
|
||||
mean_acc.append(acc["tags_acc"])
|
||||
if "parser" in pipes:
|
||||
mean_acc.append((acc["uas"] + acc["las"]) / 2)
|
||||
if "ner" in pipes:
|
||||
mean_acc.append((acc["ents_p"] + acc["ents_r"] + acc["ents_f"]) / 3)
|
||||
return sum(mean_acc) / len(mean_acc)
|
||||
|
||||
|
||||
@contextlib.contextmanager
|
||||
def _create_progress_bar(total):
|
||||
if int(os.environ.get("LOG_FRIENDLY", 0)):
|
||||
|
@ -379,10 +424,12 @@ def _collate_best_model(meta, output_path, components):
|
|||
for component in components:
|
||||
bests[component] = _find_best(output_path, component)
|
||||
best_dest = output_path / "model-best"
|
||||
shutil.copytree(output_path / "model-final", best_dest)
|
||||
shutil.copytree(path2str(output_path / "model-final"), path2str(best_dest))
|
||||
for component, best_component_src in bests.items():
|
||||
shutil.rmtree(best_dest / component)
|
||||
shutil.copytree(best_component_src / component, best_dest / component)
|
||||
shutil.rmtree(path2str(best_dest / component))
|
||||
shutil.copytree(
|
||||
path2str(best_component_src / component), path2str(best_dest / component)
|
||||
)
|
||||
accs = srsly.read_json(best_component_src / "accuracy.json")
|
||||
for metric in _get_metrics(component):
|
||||
meta["accuracy"][metric] = accs[metric]
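
The early-stopping logic added to `train` above can be exercised outside the training loop. The sketch below copies the scoring and patience logic from the diff into standalone functions; `early_stopping_iteration` is a hypothetical wrapper introduced only for illustration, and the score values are made up.

```python
def score_for_model(meta):
    """Mean score across the tagger, parser and NER, as in the diff."""
    mean_acc = []
    pipes = meta["pipeline"]
    acc = meta["accuracy"]
    if "tagger" in pipes:
        mean_acc.append(acc["tags_acc"])
    if "parser" in pipes:
        mean_acc.append((acc["uas"] + acc["las"]) / 2)
    if "ner" in pipes:
        mean_acc.append((acc["ents_p"] + acc["ents_r"] + acc["ents_f"]) / 3)
    return sum(mean_acc) / len(mean_acc)


def early_stopping_iteration(scores, n_early_stopping):
    """Replay the patience counter over a list of per-iteration scores.

    Returns (best_iteration, best_score), or (None, best_score) if the
    loop never stops early.
    """
    iter_since_best = 0
    best_score = 0.0
    for i, current_score in enumerate(scores):
        if current_score < best_score:
            iter_since_best += 1
        else:
            iter_since_best = 0
            best_score = current_score
        if iter_since_best >= n_early_stopping:
            return i - iter_since_best, best_score
    return None, best_score


example_meta = {
    "pipeline": ["tagger", "ner"],
    "accuracy": {"tags_acc": 90.0, "ents_p": 80.0, "ents_r": 70.0, "ents_f": 75.0},
}
stopped = early_stopping_iteration([0.5, 0.6, 0.55, 0.58, 0.59, 0.7], 2)
```

Note that, as written in the diff, `_score_for_model` divides by the number of scored components, so a pipeline containing none of `tagger`, `parser` or `ner` would raise a `ZeroDivisionError` when early stopping is enabled.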
|
||||
|
|
|
@ -92,7 +92,9 @@ def symlink_to(orig, dest):
|
|||
if is_windows:
|
||||
import subprocess
|
||||
|
||||
subprocess.call(["mklink", "/d", path2str(orig), path2str(dest)], shell=True)
|
||||
subprocess.check_call(
|
||||
["mklink", "/d", path2str(orig), path2str(dest)], shell=True
|
||||
)
|
||||
else:
|
||||
orig.symlink_to(dest)
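
The switch from `subprocess.call` to `subprocess.check_call` in `symlink_to` changes how a failing command is reported. A minimal, platform-independent sketch of the difference, using a child Python process that exits with status 3 in place of `mklink`:

```python
import subprocess
import sys

# A command that always fails with exit status 3.
failing = [sys.executable, "-c", "import sys; sys.exit(3)"]

# call() returns the exit status and leaves it to the caller to check it.
status = subprocess.call(failing)

# check_call() raises CalledProcessError on a non-zero exit status.
try:
    subprocess.check_call(failing)
    raised = False
    returncode = 0
except subprocess.CalledProcessError as err:
    raised = True
    returncode = err.returncode
```

With `call`, a failed `mklink` would pass silently because the return value was discarded; with `check_call`, the failure surfaces as an exception.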
|
||||
|
||||
|
|
|
@ -19,7 +19,7 @@ RENDER_WRAPPER = None
|
|||
|
||||
|
||||
def render(
|
||||
docs, style="dep", page=False, minify=False, jupyter=False, options={}, manual=False
|
||||
docs, style="dep", page=False, minify=False, jupyter=None, options={}, manual=False
|
||||
):
|
||||
"""Render displaCy visualisation.
|
||||
|
||||
|
@ -27,7 +27,7 @@ def render(
|
|||
style (unicode): Visualisation style, 'dep' or 'ent'.
|
||||
page (bool): Render markup as full HTML page.
|
||||
minify (bool): Minify HTML markup.
|
||||
jupyter (bool): Experimental, use Jupyter's `display()` to output markup.
|
||||
jupyter (bool): Override Jupyter auto-detection.
|
||||
options (dict): Visualiser-specific options, e.g. colors.
|
||||
manual (bool): Don't parse `Doc` and instead expect a dict/list of dicts.
|
||||
RETURNS (unicode): Rendered HTML markup.
|
||||
|
@ -53,7 +53,8 @@ def render(
|
|||
html = _html["parsed"]
|
||||
if RENDER_WRAPPER is not None:
|
||||
html = RENDER_WRAPPER(html)
|
||||
if jupyter or is_in_jupyter(): # return HTML rendered by IPython display()
|
||||
if jupyter or (jupyter is None and is_in_jupyter()):
|
||||
# return HTML rendered by IPython display()
|
||||
from IPython.core.display import display, HTML
|
||||
|
||||
return display(HTML(html))
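
The new `jupyter=None` default in `render` turns the argument into a three-state override. The condition can be sketched as a pure function; `should_use_jupyter` and its `in_jupyter` parameter are hypothetical names standing in for the inline expression and the result of `is_in_jupyter()`.

```python
def should_use_jupyter(jupyter, in_jupyter):
    """Mirror `jupyter or (jupyter is None and is_in_jupyter())`.

    jupyter=True  forces display() output,
    jupyter=False suppresses it even inside a notebook,
    jupyter=None  falls back to auto-detection.
    """
    return bool(jupyter or (jupyter is None and in_jupyter))
```

Under the old condition (`jupyter or is_in_jupyter()`), passing `jupyter=False` inside a notebook could not disable the notebook output; the explicit `None` check is what makes that possible.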
|
||||
|
|
|
@ -141,8 +141,14 @@ class Errors(object):
|
|||
E023 = ("Error cleaning up beam: The same state occurred twice at "
|
||||
"memory address {addr} and position {i}.")
|
||||
E024 = ("Could not find an optimal move to supervise the parser. Usually, "
|
||||
"this means the GoldParse was not correct. For example, are all "
|
||||
"labels added to the model?")
|
||||
"this means that the model can't be updated in a way that's valid "
|
||||
"and satisfies the correct annotations specified in the GoldParse. "
|
||||
"For example, are all labels added to the model? If you're "
|
||||
"training a named entity recognizer, also make sure that none of "
|
||||
"your annotated entity spans have leading or trailing whitespace. "
|
||||
"You can also use the experimental `debug-data` command to "
|
||||
"validate your JSON-formatted training data. For details, run:\n"
|
||||
"python -m spacy debug-data --help")
|
||||
E025 = ("String is too long: {length} characters. Max is 2**30.")
|
||||
E026 = ("Error accessing token at position {i}: out of bounds in Doc of "
|
||||
"length {length}.")
|
||||
|
@ -383,6 +389,10 @@ class Errors(object):
|
|||
E133 = ("The sum of prior probabilities for alias '{alias}' should not exceed 1, "
|
||||
"but found {sum}.")
|
||||
E134 = ("Alias '{alias}' defined for unknown entity '{entity}'.")
|
||||
E135 = ("If you meant to replace a built-in component, use `create_pipe`: "
|
||||
"`nlp.replace_pipe('{name}', nlp.create_pipe('{name}'))`")
|
||||
E136 = ("This additional feature requires the jsonschema library to be "
|
||||
"installed:\npip install jsonschema")
|
||||
|
||||
|
||||
@add_codes
|
||||
|
|
|
@ -168,6 +168,7 @@ GLOSSARY = {
|
|||
# Dependency Labels (English)
|
||||
# ClearNLP / Universal Dependencies
|
||||
# https://github.com/clir/clearnlp-guidelines/blob/master/md/specifications/dependency_labels.md
|
||||
"acl": "clausal modifier of noun (adjectival clause)",
|
||||
"acomp": "adjectival complement",
|
||||
"advcl": "adverbial clause modifier",
|
||||
"advmod": "adverbial modifier",
|
||||
|
@ -177,22 +178,32 @@ GLOSSARY = {
|
|||
"attr": "attribute",
|
||||
"aux": "auxiliary",
|
||||
"auxpass": "auxiliary (passive)",
|
||||
"case": "case marking",
|
||||
"cc": "coordinating conjunction",
|
||||
"ccomp": "clausal complement",
|
||||
"clf": "classifier",
|
||||
"complm": "complementizer",
|
||||
"compound": "compound",
|
||||
"conj": "conjunct",
|
||||
"cop": "copula",
|
||||
"csubj": "clausal subject",
|
||||
"csubjpass": "clausal subject (passive)",
|
||||
"dative": "dative",
|
||||
"dep": "unclassified dependent",
|
||||
"det": "determiner",
|
||||
"discourse": "discourse element",
|
||||
"dislocated": "dislocated elements",
|
||||
"dobj": "direct object",
|
||||
"expl": "expletive",
|
||||
"fixed": "fixed multiword expression",
|
||||
"flat": "flat multiword expression",
|
||||
"goeswith": "goes with",
|
||||
"hmod": "modifier in hyphenation",
|
||||
"hyph": "hyphen",
|
||||
"infmod": "infinitival modifier",
|
||||
"intj": "interjection",
|
||||
"iobj": "indirect object",
|
||||
"list": "list",
|
||||
"mark": "marker",
|
||||
"meta": "meta modifier",
|
||||
"neg": "negation modifier",
|
||||
|
@ -201,11 +212,15 @@ GLOSSARY = {
|
|||
"npadvmod": "noun phrase as adverbial modifier",
|
||||
"nsubj": "nominal subject",
|
||||
"nsubjpass": "nominal subject (passive)",
|
||||
"nounmod": "modifier of nominal",
|
||||
"npmod": "noun phrase as adverbial modifier",
|
||||
"num": "number modifier",
|
||||
"number": "number compound modifier",
|
||||
"nummod": "numeric modifier",
|
||||
"oprd": "object predicate",
|
||||
"obj": "object",
|
||||
"obl": "oblique nominal",
|
||||
"orphan": "orphan",
|
||||
"parataxis": "parataxis",
|
||||
"partmod": "participal modifier",
|
||||
"pcomp": "complement of preposition",
|
||||
|
@ -218,7 +233,10 @@ GLOSSARY = {
|
|||
"punct": "punctuation",
|
||||
"quantmod": "modifier of quantifier",
|
||||
"rcmod": "relative clause modifier",
|
||||
"relcl": "relative clause modifier",
|
||||
"reparandum": "overridden disfluency",
|
||||
"root": "root",
|
||||
"vocative": "vocative",
|
||||
"xcomp": "open clausal complement",
|
||||
# Dependency labels (German)
|
||||
# TIGER Treebank
|
||||
|
|
|
@ -532,7 +532,7 @@ cdef class GoldParse:
|
|||
self.labels[i] = deps[i2j_multi[i]]
|
||||
# Now set NER...This is annoying because if we've split
|
||||
# got an entity word split into two, we need to adjust the
|
||||
# BILOU tags. We can't have BB or LL etc.
|
||||
# BILUO tags. We can't have BB or LL etc.
|
||||
# Case 1: O -- easy.
|
||||
ner_tag = entities[i2j_multi[i]]
|
||||
if ner_tag == "O":
|
||||
|
|
|
@ -5,8 +5,8 @@ from __future__ import unicode_literals
|
|||
STOP_WORDS = set(
|
||||
"""
|
||||
á a ab aber ach acht achte achten achter achtes ag alle allein allem allen
|
||||
aller allerdings alles allgemeinen als also am an andere anderen andern anders
|
||||
auch auf aus ausser außer ausserdem außerdem
|
||||
aller allerdings alles allgemeinen als also am an andere anderen anderem andern
|
||||
anders auch auf aus ausser außer ausserdem außerdem
|
||||
|
||||
bald bei beide beiden beim beispiel bekannt bereits besonders besser besten bin
|
||||
bis bisher bist
|
||||
|
@ -35,8 +35,8 @@ großen grosser großer grosses großes gut gute guter gutes
|
|||
habe haben habt hast hat hatte hätte hatten hätten heisst heißt her heute hier
|
||||
hin hinter hoch
|
||||
|
||||
ich ihm ihn ihnen ihr ihre ihrem ihrer ihres im immer in indem infolgedessen
|
||||
ins irgend ist
|
||||
ich ihm ihn ihnen ihr ihre ihrem ihren ihrer ihres im immer in indem
|
||||
infolgedessen ins irgend ist
|
||||
|
||||
ja jahr jahre jahren je jede jedem jeden jeder jedermann jedermanns jedoch
|
||||
jemand jemandem jemanden jene jenem jenen jener jenes jetzt
|
||||
|
|
|
@ -39,7 +39,7 @@ made make many may me meanwhile might mine more moreover most mostly move much
|
|||
must my myself
|
||||
|
||||
name namely neither never nevertheless next nine no nobody none noone nor not
|
||||
nothing now nowhere
|
||||
nothing now nowhere
|
||||
|
||||
of off often on once one only onto or other others otherwise our ours ourselves
|
||||
out over own
|
||||
|
@ -75,4 +75,3 @@ STOP_WORDS.update(contractions)
|
|||
for apostrophe in ["‘", "’"]:
|
||||
for stopword in contractions:
|
||||
STOP_WORDS.add(stopword.replace("'", apostrophe))
|
||||
|
||||
|
|
|
@ -4,6 +4,7 @@ from __future__ import unicode_literals
|
|||
from .tokenizer_exceptions import TOKENIZER_EXCEPTIONS
|
||||
from .tag_map import TAG_MAP
|
||||
from .stop_words import STOP_WORDS
|
||||
from .lex_attrs import LEX_ATTRS
|
||||
from .lemmatizer import LOOKUP
|
||||
from .syntax_iterators import SYNTAX_ITERATORS
|
||||
|
||||
|
@ -16,6 +17,7 @@ from ...util import update_exc, add_lookups
|
|||
|
||||
class SpanishDefaults(Language.Defaults):
|
||||
lex_attr_getters = dict(Language.Defaults.lex_attr_getters)
|
||||
lex_attr_getters.update(LEX_ATTRS)
|
||||
lex_attr_getters[LANG] = lambda text: "es"
|
||||
lex_attr_getters[NORM] = add_lookups(
|
||||
Language.Defaults.lex_attr_getters[NORM], BASE_NORMS
|
||||
|
|
59
spacy/lang/es/lex_attrs.py
Normal file
|
@ -0,0 +1,59 @@
|
|||
# coding: utf8
|
||||
from __future__ import unicode_literals
|
||||
|
||||
from ...attrs import LIKE_NUM
|
||||
|
||||
|
||||
_num_words = [
|
||||
"cero",
|
||||
"uno",
|
||||
"dos",
|
||||
"tres",
|
||||
"cuatro",
|
||||
"cinco",
|
||||
"seis",
|
||||
"siete",
|
||||
"ocho",
|
||||
"nueve",
|
||||
"diez",
|
||||
"once",
|
||||
"doce",
|
||||
"trece",
|
||||
"catorce",
|
||||
"quince",
|
||||
"dieciséis",
|
||||
"diecisiete",
|
||||
"dieciocho",
|
||||
"diecinueve",
|
||||
"veinte",
|
||||
"treinta",
|
||||
"cuarenta",
|
||||
"cincuenta",
|
||||
"sesenta",
|
||||
"setenta",
|
||||
"ochenta",
|
||||
"noventa",
|
||||
"cien",
|
||||
"mil",
|
||||
"millón",
|
||||
"billón",
|
||||
"trillón",
|
||||
]
|
||||
|
||||
|
||||
def like_num(text):
|
||||
if text.startswith(("+", "-", "±", "~")):
|
||||
text = text[1:]
|
||||
text = text.replace(",", "").replace(".", "")
|
||||
if text.isdigit():
|
||||
return True
|
||||
if text.count("/") == 1:
|
||||
num, denom = text.split("/")
|
||||
if num.isdigit() and denom.isdigit():
|
||||
return True
|
||||
if text.lower() in _num_words:
|
||||
return True
|
||||
return False
|
||||
|
||||
|
||||
LEX_ATTRS = {LIKE_NUM: like_num}
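
The behaviour of the new Spanish `like_num` can be checked without spaCy installed. The sketch below copies the function body from the file above but uses a shortened `_num_words` list, so it is an illustration rather than the full attribute getter.

```python
# Shortened word list for illustration; the real file lists many more.
_num_words = ["cero", "uno", "dos", "dieciséis", "mil", "millón"]


def like_num(text):
    # Drop a single leading sign or approximation marker.
    if text.startswith(("+", "-", "±", "~")):
        text = text[1:]
    # Ignore thousands and decimal separators.
    text = text.replace(",", "").replace(".", "")
    if text.isdigit():
        return True
    # Simple fractions such as 1/2.
    if text.count("/") == 1:
        num, denom = text.split("/")
        if num.isdigit() and denom.isdigit():
            return True
    # Spelled-out numbers, matched case-insensitively.
    if text.lower() in _num_words:
        return True
    return False
```

Because both `,` and `.` are stripped before the digit check, the function accepts either separator convention, for example `10.000` as well as `10,000`.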
|
|
@ -11,9 +11,9 @@ Example sentences to test spaCy and its language models.
|
|||
|
||||
|
||||
sentences = [
|
||||
"Apple cherche a acheter une startup anglaise pour 1 milliard de dollard",
|
||||
"Les voitures autonomes voient leur assurances décalées vers les constructeurs",
|
||||
"San Francisco envisage d'interdire les robots coursiers",
|
||||
"Apple cherche à acheter une startup anglaise pour 1 milliard de dollars",
|
||||
"Les voitures autonomes déplacent la responsabilité de l'assurance vers les constructeurs",
|
||||
"San Francisco envisage d'interdire les robots coursiers sur les trottoirs",
|
||||
"Londres est une grande ville du Royaume-Uni",
|
||||
"L’Italie choisit ArcelorMittal pour reprendre la plus grande aciérie d’Europe",
|
||||
"Apple lance HomePod parce qu'il se sent menacé par l'Echo d'Amazon",
|
||||
|
|
|
@ -7,88 +7,89 @@ from ...symbols import NOUN, PRON, AUX, SCONJ, INTJ, PART, PROPN
|
|||
|
||||
# POS explanations for indonesian available from https://www.aclweb.org/anthology/Y12-1014
|
||||
TAG_MAP = {
|
||||
"NSD": {POS: NOUN},
|
||||
"Z--": {POS: PUNCT},
|
||||
"VSA": {POS: VERB},
|
||||
"CC-": {POS: NUM},
|
||||
"R--": {POS: ADP},
|
||||
"D--": {POS: ADV},
|
||||
"ASP": {POS: ADJ},
|
||||
"S--": {POS: SCONJ},
|
||||
"VSP": {POS: VERB},
|
||||
"H--": {POS: CCONJ},
|
||||
"F--": {POS: X},
|
||||
"B--": {POS: DET},
|
||||
"CO-": {POS: NUM},
|
||||
"G--": {POS: ADV},
|
||||
"PS3": {POS: PRON},
|
||||
"W--": {POS: ADV},
|
||||
"O--": {POS: AUX},
|
||||
"PP1": {POS: PRON},
|
||||
"ASS": {POS: ADJ},
|
||||
"PS1": {POS: PRON},
|
||||
"APP": {POS: ADJ},
|
||||
"CD-": {POS: NUM},
|
||||
"VPA": {POS: VERB},
|
||||
"VPP": {POS: VERB},
|
||||
"X--": {POS: X},
|
||||
"CO-+PS3": {POS: NUM},
|
||||
"NSD+PS3": {POS: NOUN},
|
||||
"ASP+PS3": {POS: ADJ},
|
||||
"M--": {POS: AUX},
|
||||
"VSA+PS3": {POS: VERB},
|
||||
"R--+PS3": {POS: ADP},
|
||||
"W--+T--": {POS: ADV},
|
||||
"PS2": {POS:PRON},
|
||||
"NSD+PS1": {POS:NOUN},
|
||||
"PP3": {POS: PRON},
|
||||
"VSA+T--": {POS: VERB},
|
||||
"D--+T--": {POS: ADV},
|
||||
"VSP+PS3": {POS: VERB},
|
||||
"F--+PS3": {POS: X},
|
||||
"M--+T--": {POS: AUX},
|
||||
"F--+T--": {POS: X},
|
||||
"PUNCT": {POS: PUNCT},
|
||||
"PROPN": {POS: PROPN},
|
||||
"I--": {POS: INTJ},
|
||||
"S--+PS3": {POS: SCONJ},
|
||||
"ASP+T--": {POS: ADJ},
|
||||
"CC-+PS3": {POS: NUM},
|
||||
"NSD+PS2": {POS: NOUN},
|
||||
"B--+T--": {POS: DET},
|
||||
"H--+T--": {POS: CCONJ},
|
||||
"VSA+PS2": {POS: VERB},
|
||||
"NSF": {POS: NOUN},
|
||||
"PS1+VSA": {POS: PRON},
|
||||
"NPD": {POS: NOUN},
|
||||
"PP2": {POS:PRON},
|
||||
"VSA+PS1": {POS: VERB},
|
||||
"T--": {POS: PART},
|
||||
"NSM": {POS: NOUN},
|
||||
"NUM": {POS: NUM},
|
||||
"ASP+PS2": {POS: ADJ},
|
||||
"G--+T--": {POS: PART},
|
||||
"D--+PS3": {POS: ADV},
|
||||
"R--+PS2": {POS: ADP},
|
||||
"NSM+PS3": {POS: NOUN},
|
||||
"VSP+T--": {POS: VERB},
|
||||
"M--+PS3": {POS: AUX},
|
||||
"ASS+PS3": {POS: ADJ},
|
||||
"G--+PS3": {POS: PART},
|
||||
"F--+PS1": {POS: X},
|
||||
"NSD+T--": {POS: NOUN},
|
||||
"PP1+T--": {POS: PRON},
|
||||
"B--+PS3": {POS: DET},
|
||||
"NOUN": {POS: NOUN},
|
||||
"NPD+PS3": {POS: NOUN},
|
||||
"R--+PS1": {POS: ADP},
|
||||
"F--+PS2": {POS: X},
|
||||
"CD-+PS3": {POS: NUM},
|
||||
"PS1+VSA+T--":{POS: VERB},
|
||||
"PS2+VSA": {POS: VERB},
|
||||
"VERB": {POS: VERB},
|
||||
"CC-+T--": {POS: NUM},
|
||||
"NPD+PS2":{POS: NOUN},
|
||||
"D--+PS2":{POS: ADV},
|
||||
"PP3+T--": {POS: PRON},
|
||||
"X": {POS: X}}
|
||||
"NSD": {POS: NOUN},
|
||||
"Z--": {POS: PUNCT},
|
||||
"VSA": {POS: VERB},
|
||||
"CC-": {POS: NUM},
|
||||
"R--": {POS: ADP},
|
||||
"D--": {POS: ADV},
|
||||
"ASP": {POS: ADJ},
|
||||
"S--": {POS: SCONJ},
|
||||
"VSP": {POS: VERB},
|
||||
"H--": {POS: CCONJ},
|
||||
"F--": {POS: X},
|
||||
"B--": {POS: DET},
|
||||
"CO-": {POS: NUM},
|
||||
"G--": {POS: ADV},
|
||||
"PS3": {POS: PRON},
|
||||
"W--": {POS: ADV},
|
||||
"O--": {POS: AUX},
|
||||
"PP1": {POS: PRON},
|
||||
"ASS": {POS: ADJ},
|
||||
"PS1": {POS: PRON},
|
||||
"APP": {POS: ADJ},
|
||||
"CD-": {POS: NUM},
|
||||
"VPA": {POS: VERB},
|
||||
"VPP": {POS: VERB},
|
||||
"X--": {POS: X},
|
||||
"CO-+PS3": {POS: NUM},
|
||||
"NSD+PS3": {POS: NOUN},
|
||||
"ASP+PS3": {POS: ADJ},
|
||||
"M--": {POS: AUX},
|
||||
"VSA+PS3": {POS: VERB},
|
||||
"R--+PS3": {POS: ADP},
|
||||
"W--+T--": {POS: ADV},
|
||||
"PS2": {POS: PRON},
|
||||
"NSD+PS1": {POS: NOUN},
|
||||
"PP3": {POS: PRON},
|
||||
"VSA+T--": {POS: VERB},
|
||||
"D--+T--": {POS: ADV},
|
||||
"VSP+PS3": {POS: VERB},
|
||||
"F--+PS3": {POS: X},
|
||||
"M--+T--": {POS: AUX},
|
||||
"F--+T--": {POS: X},
|
||||
"PUNCT": {POS: PUNCT},
|
||||
"PROPN": {POS: PROPN},
|
||||
"I--": {POS: INTJ},
|
||||
"S--+PS3": {POS: SCONJ},
|
||||
"ASP+T--": {POS: ADJ},
|
||||
"CC-+PS3": {POS: NUM},
|
||||
"NSD+PS2": {POS: NOUN},
|
||||
"B--+T--": {POS: DET},
|
||||
"H--+T--": {POS: CCONJ},
|
||||
"VSA+PS2": {POS: VERB},
|
||||
"NSF": {POS: NOUN},
|
||||
"PS1+VSA": {POS: PRON},
|
||||
"NPD": {POS: NOUN},
|
||||
"PP2": {POS: PRON},
|
||||
"VSA+PS1": {POS: VERB},
|
||||
"T--": {POS: PART},
|
||||
"NSM": {POS: NOUN},
|
||||
"NUM": {POS: NUM},
|
||||
"ASP+PS2": {POS: ADJ},
|
||||
"G--+T--": {POS: PART},
|
||||
"D--+PS3": {POS: ADV},
|
||||
"R--+PS2": {POS: ADP},
|
||||
"NSM+PS3": {POS: NOUN},
|
||||
"VSP+T--": {POS: VERB},
|
||||
"M--+PS3": {POS: AUX},
|
||||
"ASS+PS3": {POS: ADJ},
|
||||
"G--+PS3": {POS: PART},
|
||||
"F--+PS1": {POS: X},
|
||||
"NSD+T--": {POS: NOUN},
|
||||
"PP1+T--": {POS: PRON},
|
||||
"B--+PS3": {POS: DET},
|
||||
"NOUN": {POS: NOUN},
|
||||
"NPD+PS3": {POS: NOUN},
|
||||
"R--+PS1": {POS: ADP},
|
||||
"F--+PS2": {POS: X},
|
||||
"CD-+PS3": {POS: NUM},
|
||||
"PS1+VSA+T--": {POS: VERB},
|
||||
"PS2+VSA": {POS: VERB},
|
||||
"VERB": {POS: VERB},
|
||||
"CC-+T--": {POS: NUM},
|
||||
"NPD+PS2": {POS: NOUN},
|
||||
"D--+PS2": {POS: ADV},
|
||||
"PP3+T--": {POS: PRON},
|
||||
"X": {POS: X},
|
||||
}
|
||||
|
|
|
@ -4,67 +4,87 @@ from __future__ import unicode_literals
|
|||
|
||||
STOP_WORDS = set(
|
||||
"""
|
||||
ಈ
|
||||
ಮತ್ತು
|
||||
ಹಾಗೂ
|
||||
ಅವರು
|
||||
ಅವರ
|
||||
ಬಗ್ಗೆ
|
||||
ಎಂಬ
|
||||
ಆದರೆ
|
||||
ಅವರನ್ನು
|
||||
ಆದರೆ
|
||||
ತಮ್ಮ
|
||||
ಒಂದು
|
||||
ಎಂದರು
|
||||
ಮೇಲೆ
|
||||
ಹೇಳಿದರು
|
||||
ಸೇರಿದಂತೆ
|
||||
ಬಳಿಕ
|
||||
ಆ
|
||||
ಯಾವುದೇ
|
||||
ಅವರಿಗೆ
|
||||
ನಡೆದ
|
||||
ಕುರಿತು
|
||||
ಇದು
|
||||
ಅವರು
|
||||
ಕಳೆದ
|
||||
ಇದೇ
|
||||
ತಿಳಿಸಿದರು
|
||||
ಹೀಗಾಗಿ
|
||||
ಕೂಡ
|
||||
ತನ್ನ
|
||||
ತಿಳಿಸಿದ್ದಾರೆ
|
||||
ನಾನು
|
||||
ಹೇಳಿದ್ದಾರೆ
|
||||
ಈಗ
|
||||
ಎಲ್ಲ
|
||||
ನನ್ನ
|
||||
ನಮ್ಮ
|
||||
ಈಗಾಗಲೇ
|
||||
ಇದಕ್ಕೆ
|
||||
ಹಲವು
|
||||
ಇದೆ
|
||||
ಮತ್ತೆ
|
||||
ಮಾಡುವ
|
||||
ನೀಡಿದರು
|
||||
ನಾವು
|
||||
ನೀಡಿದ
|
||||
ಇದರಿಂದ
|
||||
ಮೂಲಕ
|
||||
ಹಾಗೂ
|
||||
ಅದು
|
||||
ಇದನ್ನು
|
||||
ನೀಡಿದ್ದಾರೆ
|
||||
ಯಾವ
|
||||
ಎಂದರು
|
||||
ಅವರು
|
||||
ಈಗ
|
||||
ಎಂಬ
|
||||
ಹಾಗಾಗಿ
|
||||
ಅಷ್ಟೇ
|
||||
ನಾವು
|
||||
ಇದೇ
|
||||
ಹೇಳಿ
|
||||
ತಮ್ಮ
|
||||
ಹೀಗೆ
|
||||
ನಮ್ಮ
|
||||
ಬೇರೆ
|
||||
ನೀಡಿದರು
|
||||
ಮತ್ತೆ
|
||||
ಇದು
|
||||
ಈ
|
||||
ನೀವು
|
||||
ನಾನು
|
||||
ಇತ್ತು
|
||||
ಎಲ್ಲಾ
|
||||
ಯಾವುದೇ
|
||||
ನಡೆದ
|
||||
ಅದನ್ನು
|
||||
ಇಲ್ಲಿ
|
||||
ಆಗ
|
||||
ಬಂದಿದೆ.
|
||||
ಅದೇ
|
||||
ಇರುವ
|
||||
ಅಲ್ಲದೆ
|
||||
ಕೆಲವು
|
||||
ಎಂದರೆ
|
||||
ನೀಡಿದೆ
|
||||
ಹೀಗಾಗಿ
|
||||
ಜೊತೆಗೆ
|
||||
ಇದರಿಂದ
|
||||
ನನಗೆ
|
||||
ಅಲ್ಲದೆ
|
||||
ಎಷ್ಟು
|
||||
ಇದರ
|
||||
ಇಲ್ಲ
|
||||
ಕಳೆದ
|
||||
ತುಂಬಾ
|
||||
ಈಗಾಗಲೇ
|
||||
ಮಾಡಿ
|
||||
ಅದಕ್ಕೆ
|
||||
ಬಗ್ಗೆ
|
||||
ಅವರ
|
||||
ಇದನ್ನು
|
||||
ಆ
|
||||
ಇದೆ
|
||||
ಹೆಚ್ಚು
|
||||
ಇನ್ನು
|
||||
ಎಲ್ಲ
|
||||
ಇರುವ
|
||||
ಅವರಿಗೆ
|
||||
ನಿಮ್ಮ
|
||||
ಏನು
|
||||
ಕೂಡ
|
||||
ಇಲ್ಲಿ
|
||||
ನನ್ನನ್ನು
|
||||
ಕೆಲವು
|
||||
ಮಾತ್ರ
|
||||
ಬಳಿಕ
|
||||
ಅಂತ
|
||||
ತನ್ನ
|
||||
ಆಗ
|
||||
ಅಥವಾ
|
||||
ಅಲ್ಲ
|
||||
ಕೇವಲ
|
||||
ಆದರೆ
|
||||
ಮತ್ತು
|
||||
ಇನ್ನೂ
|
||||
ಅದೇ
|
||||
ಆಗಿ
|
||||
ಅವರನ್ನು
|
||||
ಹೇಳಿದ್ದಾರೆ
|
||||
ನಡೆದಿದೆ
|
||||
ಇದಕ್ಕೆ
|
||||
ಎಂಬುದು
|
||||
ಎಂದು
|
||||
ನನ್ನ
|
||||
ಮೇಲೆ
|
||||
""".split()
|
||||
)
|
||||
|
|
20
spacy/lang/mr/__init__.py
Normal file
|
@ -0,0 +1,20 @@
|
|||
#coding: utf8
|
||||
from __future__ import unicode_literals
|
||||
|
||||
from .stop_words import STOP_WORDS
|
||||
from ...language import Language
|
||||
from ...attrs import LANG
|
||||
|
||||
|
||||
class MarathiDefaults(Language.Defaults):
|
||||
lex_attr_getters = dict(Language.Defaults.lex_attr_getters)
|
||||
lex_attr_getters[LANG] = lambda text: "mr"
|
||||
stop_words = STOP_WORDS
|
||||
|
||||
|
||||
class Marathi(Language):
|
||||
lang = "mr"
|
||||
Defaults = MarathiDefaults
|
||||
|
||||
|
||||
__all__ = ["Marathi"]
|
196
spacy/lang/mr/stop_words.py
Normal file
|
@ -0,0 +1,196 @@
|
|||
# coding: utf8
|
||||
from __future__ import unicode_literals
|
||||
|
||||
|
||||
# Source: https://github.com/stopwords-iso/stopwords-mr/blob/master/stopwords-mr.txt, https://github.com/6/stopwords-json/edit/master/dist/mr.json
|
||||
STOP_WORDS = set(
|
||||
"""
|
||||
न
|
||||
अतरी
|
||||
तो
|
||||
हें
|
||||
तें
|
||||
कां
|
||||
आणि
|
||||
जें
|
||||
जे
|
||||
मग
|
||||
ते
|
||||
मी
|
||||
जो
|
||||
परी
|
||||
गा
|
||||
हे
|
||||
ऐसें
|
||||
आतां
|
||||
नाहीं
|
||||
तेथ
|
||||
हा
|
||||
तया
|
||||
असे
|
||||
म्हणे
|
||||
काय
|
||||
कीं
|
||||
जैसें
|
||||
तंव
|
||||
तूं
|
||||
होय
|
||||
जैसा
|
||||
आहे
|
||||
पैं
|
||||
तैसा
|
||||
जरी
|
||||
म्हणोनि
|
||||
एक
|
||||
ऐसा
|
||||
जी
|
||||
ना
|
||||
मज
|
||||
एथ
|
||||
या
|
||||
जेथ
|
||||
जया
|
||||
तुज
|
||||
तेणें
|
||||
तैं
|
||||
पां
|
||||
असो
|
||||
करी
|
||||
ऐसी
|
||||
येणें
|
||||
जाहला
|
||||
तेंचि
|
||||
आघवें
|
||||
होती
|
||||
कांहीं
|
||||
होऊनि
|
||||
एकें
|
||||
मातें
|
||||
ठायीं
|
||||
ये
|
||||
सकळ
|
||||
केलें
|
||||
जेणें
|
||||
जाण
|
||||
जैसी
|
||||
होये
|
||||
जेवीं
|
||||
एऱ्हवीं
|
||||
मीचि
|
||||
किरीटी
|
||||
दिसे
|
||||
देवा
|
||||
हो
|
||||
तरि
|
||||
कीजे
|
||||
तैसे
|
||||
आपण
|
||||
तिये
|
||||
कर्म
|
||||
नोहे
|
||||
इये
|
||||
पडे
|
||||
माझें
|
||||
तैसी
|
||||
लागे
|
||||
नाना
|
||||
जंव
|
||||
कीर
|
||||
अधिक
|
||||
अनेक
|
||||
अशी
|
||||
असलयाचे
|
||||
असलेल्या
|
||||
असा
|
||||
असून
|
||||
असे
|
||||
आज
|
||||
आणि
|
||||
आता
|
||||
आपल्या
|
||||
आला
|
||||
आली
|
||||
आले
|
||||
आहे
|
||||
आहेत
|
||||
एक
|
||||
एका
|
||||
कमी
|
||||
करणयात
|
||||
करून
|
||||
का
|
||||
काम
|
||||
काय
|
||||
काही
|
||||
किवा
|
||||
की
|
||||
केला
|
||||
केली
|
||||
केले
|
||||
कोटी
|
||||
गेल्या
|
||||
घेऊन
|
||||
जात
|
||||
झाला
|
||||
झाली
|
||||
झाले
|
||||
झालेल्या
|
||||
टा
|
||||
तर
|
||||
तरी
|
||||
तसेच
|
||||
ता
|
||||
ती
|
||||
तीन
|
||||
ते
|
||||
तो
|
||||
त्या
|
||||
त्याचा
|
||||
त्याची
|
||||
त्याच्या
|
||||
त्याना
|
||||
त्यानी
|
||||
त्यामुळे
|
||||
त्री
|
||||
दिली
|
||||
दोन
|
||||
न
|
||||
पण
|
||||
पम
|
||||
परयतन
|
||||
पाटील
|
||||
म
|
||||
मात्र
|
||||
माहिती
|
||||
मी
|
||||
मुबी
|
||||
म्हणजे
|
||||
म्हणाले
|
||||
म्हणून
|
||||
या
|
||||
याचा
|
||||
याची
|
||||
याच्या
|
||||
याना
|
||||
यानी
|
||||
येणार
|
||||
येत
|
||||
येथील
|
||||
येथे
|
||||
लाख
|
||||
व
|
||||
व्यकत
|
||||
सर्व
|
||||
सागित्ले
|
||||
सुरू
|
||||
हजार
|
||||
हा
|
||||
ही
|
||||
हे
|
||||
होणार
|
||||
होत
|
||||
होता
|
||||
होती
|
||||
होते
|
||||
""".split()
|
||||
)
|
|
@ -6,10 +6,7 @@ from .lex_attrs import LEX_ATTRS
|
|||
from .tag_map import TAG_MAP
|
||||
from .tokenizer_exceptions import TOKENIZER_EXCEPTIONS
|
||||
from .punctuation import TOKENIZER_INFIXES, TOKENIZER_SUFFIXES
|
||||
|
||||
from .lemmatizer import LOOKUP, LEMMA_EXC, LEMMA_INDEX, RULES
|
||||
from .lemmatizer.lemmatizer import DutchLemmatizer
|
||||
|
||||
from .lemmatizer import LOOKUP, LEMMA_EXC, LEMMA_INDEX, RULES, DutchLemmatizer
|
||||
from ..tokenizer_exceptions import BASE_EXCEPTIONS
|
||||
from ..norm_exceptions import BASE_NORMS
|
||||
from ...language import Language
|
||||
|
@ -21,9 +18,10 @@ class DutchDefaults(Language.Defaults):
|
|||
|
||||
lex_attr_getters = dict(Language.Defaults.lex_attr_getters)
|
||||
lex_attr_getters.update(LEX_ATTRS)
|
||||
lex_attr_getters[LANG] = lambda text: 'nl'
|
||||
lex_attr_getters[NORM] = add_lookups(Language.Defaults.lex_attr_getters[NORM],
|
||||
BASE_NORMS)
|
||||
lex_attr_getters[LANG] = lambda text: "nl"
|
||||
lex_attr_getters[NORM] = add_lookups(
|
||||
Language.Defaults.lex_attr_getters[NORM], BASE_NORMS
|
||||
)
|
||||
tokenizer_exceptions = update_exc(BASE_EXCEPTIONS, TOKENIZER_EXCEPTIONS)
|
||||
stop_words = STOP_WORDS
|
||||
tag_map = TAG_MAP
|
||||
|
@ -36,15 +34,14 @@ class DutchDefaults(Language.Defaults):
|
|||
lemma_index = LEMMA_INDEX
|
||||
lemma_exc = LEMMA_EXC
|
||||
lemma_lookup = LOOKUP
|
||||
return DutchLemmatizer(index=lemma_index,
|
||||
exceptions=lemma_exc,
|
||||
lookup=lemma_lookup,
|
||||
rules=rules)
|
||||
return DutchLemmatizer(
|
||||
index=lemma_index, exceptions=lemma_exc, lookup=lemma_lookup, rules=rules
|
||||
)
|
||||
|
||||
|
||||
class Dutch(Language):
|
||||
lang = 'nl'
|
||||
lang = "nl"
|
||||
Defaults = DutchDefaults
|
||||
|
||||
|
||||
__all__ = ['Dutch']
|
||||
__all__ = ["Dutch"]
|
||||
|
|
|
@ -18,23 +18,26 @@ from ._adpositions import ADPOSITIONS
|
|||
from ._determiners import DETERMINERS
|
||||
|
||||
from .lookup import LOOKUP
|
||||
|
||||
from ._lemma_rules import RULES
|
||||
|
||||
from .lemmatizer import DutchLemmatizer
|
||||
|
||||
|
||||
LEMMA_INDEX = {"adj": ADJECTIVES,
|
||||
"noun": NOUNS,
|
||||
"verb": VERBS,
|
||||
"adp": ADPOSITIONS,
|
||||
"det": DETERMINERS}
|
||||
LEMMA_INDEX = {
|
||||
"adj": ADJECTIVES,
|
||||
"noun": NOUNS,
|
||||
"verb": VERBS,
|
||||
"adp": ADPOSITIONS,
|
||||
"det": DETERMINERS,
|
||||
}
|
||||
|
||||
LEMMA_EXC = {"adj": ADJECTIVES_IRREG,
|
||||
"adv": ADVERBS_IRREG,
|
||||
"adp": ADPOSITIONS_IRREG,
|
||||
"noun": NOUNS_IRREG,
|
||||
"verb": VERBS_IRREG,
|
||||
"det": DETERMINERS_IRREG,
|
||||
"pron": PRONOUNS_IRREG}
|
||||
LEMMA_EXC = {
|
||||
"adj": ADJECTIVES_IRREG,
|
||||
"adv": ADVERBS_IRREG,
|
||||
"adp": ADPOSITIONS_IRREG,
|
||||
"noun": NOUNS_IRREG,
|
||||
"verb": VERBS_IRREG,
|
||||
"det": DETERMINERS_IRREG,
|
||||
"pron": PRONOUNS_IRREG,
|
||||
}
|
||||
|
||||
__all__ = ["LOOKUP", "LEMMA_EXC", "LEMMA_INDEX", "RULES", "DutchLemmatizer"]
|
||||
|
|
|
@ -1,7 +1,7 @@
|
|||
# coding: utf8
|
||||
from __future__ import unicode_literals
|
||||
|
||||
from ...symbols import ORTH, LEMMA, TAG, NORM, PRON_LEMMA
|
||||
from ...symbols import ORTH
|
||||
|
||||
# Extensive list of both common and uncommon dutch abbreviations copied from
|
||||
# github.com/diasks2/pragmatic_segmenter, a Ruby library for rule-based
|
||||
|
@ -16,7 +16,7 @@ from ...symbols import ORTH, LEMMA, TAG, NORM, PRON_LEMMA
|
|||
# are extremely domain-specific. Tokenizer performance may benefit from some
|
||||
# slight pruning, although no performance regression has been observed so far.
|
||||
|
||||
|
||||
# fmt: off
|
||||
abbrevs = ['a.2d.', 'a.a.', 'a.a.j.b.', 'a.f.t.', 'a.g.j.b.',
|
||||
'a.h.v.', 'a.h.w.', 'a.hosp.', 'a.i.', 'a.j.b.', 'a.j.t.',
|
||||
'a.m.', 'a.m.r.', 'a.p.m.', 'a.p.r.', 'a.p.t.', 'a.s.',
|
||||
|
@ -326,7 +326,7 @@ abbrevs = ['a.2d.', 'a.a.', 'a.a.j.b.', 'a.f.t.', 'a.g.j.b.',
|
|||
'wtvb.', 'ww.', 'x.d.', 'z.a.', 'z.g.', 'z.i.', 'z.j.',
|
||||
'z.o.z.', 'z.p.', 'z.s.m.', 'zg.', 'zgn.', 'zn.', 'znw.',
|
||||
'zr.', 'zr.', 'ms.', 'zr.ms.']
|
||||
|
||||
# fmt: on
|
||||
|
||||
_exc = {}
|
||||
for orth in abbrevs:
|
||||
|
|
|
@ -53,4 +53,11 @@ BASE_NORMS = {
|
|||
"US$": "$",
|
||||
"C$": "$",
|
||||
"A$": "$",
|
||||
"₺": "$",
|
||||
"₹": "$",
|
||||
"৳": "$",
|
||||
"₩": "$",
|
||||
"Mex$": "$",
|
||||
"₣": "$",
|
||||
"E£": "$",
|
||||
}
|
||||
|
|
|
@ -4,11 +4,14 @@ from __future__ import unicode_literals
|
|||
from .tokenizer_exceptions import TOKENIZER_EXCEPTIONS
|
||||
from .tag_map import TAG_MAP
|
||||
from .stop_words import STOP_WORDS
|
||||
from .norm_exceptions import NORM_EXCEPTIONS
|
||||
from .lex_attrs import LEX_ATTRS
|
||||
|
||||
from ...attrs import LANG
|
||||
from ..norm_exceptions import BASE_NORMS
|
||||
from ...attrs import LANG, NORM
|
||||
from ...language import Language
|
||||
from ...tokens import Doc
|
||||
from ...util import DummyTokenizer
|
||||
from ...util import DummyTokenizer, add_lookups
|
||||
|
||||
|
||||
class ThaiTokenizer(DummyTokenizer):
|
||||
|
@ -25,15 +28,18 @@ class ThaiTokenizer(DummyTokenizer):
|
|||
self.vocab = nlp.vocab if nlp is not None else cls.create_vocab(nlp)
|
||||
|
||||
def __call__(self, text):
|
||||
words = list(self.word_tokenize(text, "newmm"))
|
||||
words = list(self.word_tokenize(text))
|
||||
spaces = [False] * len(words)
|
||||
return Doc(self.vocab, words=words, spaces=spaces)
|
||||
|
||||
|
||||
class ThaiDefaults(Language.Defaults):
|
||||
lex_attr_getters = dict(Language.Defaults.lex_attr_getters)
|
||||
lex_attr_getters.update(LEX_ATTRS)
|
||||
lex_attr_getters[LANG] = lambda _text: "th"
|
||||
|
||||
lex_attr_getters[NORM] = add_lookups(
|
||||
Language.Defaults.lex_attr_getters[NORM], BASE_NORMS, NORM_EXCEPTIONS
|
||||
)
|
||||
tokenizer_exceptions = dict(TOKENIZER_EXCEPTIONS)
|
||||
tag_map = TAG_MAP
|
||||
stop_words = STOP_WORDS
|
||||
|
|
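The `NORM` getter above chains two lookup tables in front of the default getter. A minimal sketch of that idea, assuming the tables are consulted in the order given and the default getter is the fallback; `chain_lookups` is a hypothetical stand-in written for illustration and is not spaCy's `add_lookups`:

```python
def chain_lookups(default_func, *lookups):
    """Hypothetical helper: return a getter that tries each table in order."""

    def getter(string):
        for table in lookups:
            if string in table:
                return table[string]
        # No table knows the string, so fall back to the default getter.
        return default_func(string)

    return getter


# Small excerpts of the tables shown in this diff (not the full data).
base_norms = {"US$": "$", "₩": "$"}
thai_norms = {"ขอบคุง": "ขอบคุณ"}

# Assumption: lower-casing stands in for the default NORM getter.
get_norm = chain_lookups(lambda s: s.lower(), base_norms, thai_norms)

print(get_norm("US$"))
print(get_norm("ขอบคุง"))
print(get_norm("Hello"))
```

With this ordering, an entry in the base table would win over a language-specific entry for the same string.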
62
spacy/lang/th/lex_attrs.py
Normal file
|
@ -0,0 +1,62 @@
|
|||
# coding: utf8
|
||||
from __future__ import unicode_literals
|
||||
|
||||
from ...attrs import LIKE_NUM
|
||||
|
||||
|
||||
_num_words = [
|
||||
"ศูนย์",
|
||||
"หนึ่ง",
|
||||
"สอง",
|
||||
"สาม",
|
||||
"สี่",
|
||||
"ห้า",
|
||||
"หก",
|
||||
"เจ็ด",
|
||||
"แปด",
|
||||
"เก้า",
|
||||
"สิบ",
|
||||
"สิบเอ็ด",
|
||||
"ยี่สิบ",
|
||||
"ยี่สิบเอ็ด",
|
||||
"สามสิบ",
|
||||
"สามสิบเอ็ด",
|
||||
"สี่สิบ",
|
||||
"สี่สิบเอ็ด",
|
||||
"ห้าสิบ",
|
||||
"ห้าสิบเอ็ด",
|
||||
"หกสิบเอ็ด",
|
||||
"เจ็ดสิบ",
|
||||
"เจ็ดสิบเอ็ด",
|
||||
"แปดสิบ",
|
||||
"แปดสิบเอ็ด",
|
||||
"เก้าสิบ",
|
||||
"เก้าสิบเอ็ด",
|
||||
"ร้อย",
|
||||
"พัน",
|
||||
"ล้าน",
|
||||
"พันล้าน",
|
||||
"หมื่นล้าน",
|
||||
"แสนล้าน",
|
||||
"ล้านล้าน",
|
||||
"ล้านล้านล้าน",
|
||||
"ล้านล้านล้านล้าน",
|
||||
]
|
||||
|
||||
|
||||
def like_num(text):
|
||||
if text.startswith(("+", "-", "±", "~")):
|
||||
text = text[1:]
|
||||
text = text.replace(",", "").replace(".", "")
|
||||
if text.isdigit():
|
||||
return True
|
||||
if text.count("/") == 1:
|
||||
num, denom = text.split("/")
|
||||
if num.isdigit() and denom.isdigit():
|
||||
return True
|
||||
if text in _num_words:
|
||||
return True
|
||||
return False
|
||||
|
||||
|
||||
LEX_ATTRS = {LIKE_NUM: like_num}
|
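To see what the new Thai `like_num` accepts, here is a self-contained copy of its logic with a shortened word list. The trimmed `_num_words` and the sample inputs are chosen for illustration only; the control flow follows the function in the diff:

```python
# Shortened word list for illustration; the real file lists many more.
_num_words = ["ศูนย์", "หนึ่ง", "สอง", "สิบ", "ร้อย", "พัน", "ล้าน"]


def like_num(text):
    # Drop one leading sign or approximation marker.
    if text.startswith(("+", "-", "±", "~")):
        text = text[1:]
    # Ignore thousands separators and decimal points.
    text = text.replace(",", "").replace(".", "")
    if text.isdigit():
        return True
    # Simple fractions such as 1/2.
    if text.count("/") == 1:
        num, denom = text.split("/")
        if num.isdigit() and denom.isdigit():
            return True
    # Spelled-out numbers.
    if text in _num_words:
        return True
    return False


for sample in ["1,000", "-3.5", "1/2", "สิบ", "แมว"]:
    print(sample, like_num(sample))
```

Because the check relies on `str.isdigit()`, Thai digits such as `๑๒` should also count as number-like.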
113
spacy/lang/th/norm_exceptions.py
Normal file
|
@ -0,0 +1,113 @@
|
|||
# coding: utf8
|
||||
from __future__ import unicode_literals
|
||||
|
||||
|
||||
_exc = {
|
||||
# Spellings whose tone does not match the written tone mark (ผันอักษรและเสียงไม่ตรงกับรูปวรรณยุกต์)
|
||||
"สนุ๊กเกอร์": "สนุกเกอร์",
|
||||
"โน้ต": "โน้ต",
|
||||
# Misspelled out of laziness or haste (สะกดผิดเพราะขี้เกียจพิมพ์ หรือเร่งรีบ)
|
||||
"โทสับ": "โทรศัพท์",
|
||||
"พุ่งนี้": "พรุ่งนี้",
|
||||
# Deliberately unusual spellings (ให้ดูแปลกตา)
|
||||
"ชะมะ": "ใช่ไหม",
|
||||
"ชิมิ": "ใช่ไหม",
|
||||
"ชะ": "ใช่ไหม",
|
||||
"ช่ายมะ": "ใช่ไหม",
|
||||
"ป่าว": "เปล่า",
|
||||
"ป่ะ": "เปล่า",
|
||||
"ปล่าว": "เปล่า",
|
||||
"คัย": "ใคร",
|
||||
"ไค": "ใคร",
|
||||
"คราย": "ใคร",
|
||||
"เตง": "ตัวเอง",
|
||||
"ตะเอง": "ตัวเอง",
|
||||
"รึ": "หรือ",
|
||||
"เหรอ": "หรือ",
|
||||
"หรา": "หรือ",
|
||||
"หรอ": "หรือ",
|
||||
"ชั้น": "ฉัน",
|
||||
"ชั้ล": "ฉัน",
|
||||
"ช้าน": "ฉัน",
|
||||
"เทอ": "เธอ",
|
||||
"เทอร์": "เธอ",
|
||||
"เทอว์": "เธอ",
|
||||
"แกร": "แก",
|
||||
"ป๋ม": "ผม",
|
||||
"บ่องตง": "บอกตรงๆ",
|
||||
"ถ่ามตง": "ถามตรงๆ",
|
||||
"ต่อมตง": "ตอบตรงๆ",
|
||||
"เพิ่ล": "เพื่อน",
|
||||
"จอบอ": "จอบอ",
|
||||
"ดั้ย": "ได้",
|
||||
"ขอบคุง": "ขอบคุณ",
|
||||
"ยังงัย": "ยังไง",
|
||||
"Inw": "เทพ",
|
||||
"uou": "นอน",
|
||||
"Lกรีeu": "เกรียน",
|
||||
# Misspelled to express emotions (คำที่สะกดผิดเพื่อแสดงอารมณ์)
|
||||
"เปงราย": "เป็นอะไร",
|
||||
"เปนรัย": "เป็นอะไร",
|
||||
"เปงรัย": "เป็นอะไร",
|
||||
"เป็นอัลไล": "เป็นอะไร",
|
||||
"ทามมาย": "ทำไม",
|
||||
"ทามมัย": "ทำไม",
|
||||
"จังรุย": "จังเลย",
|
||||
"จังเยย": "จังเลย",
|
||||
"จุงเบย": "จังเลย",
|
||||
"มะรุ": "ไม่รู้",
|
||||
"เฮ่ย": "เฮ้ย",
|
||||
"เห้ย": "เฮ้ย",
|
||||
"น่าร็อค": "น่ารัก",
|
||||
"น่าร๊าก": "น่ารัก",
|
||||
"ตั้ลล๊าก": "น่ารัก",
|
||||
"คือร๊ะ": "คืออะไร",
|
||||
"โอป่ะ": "โอเคหรือเปล่า",
|
||||
"น่ามคาน": "น่ารำคาญ",
|
||||
"น่ามสาร": "น่าสงสาร",
|
||||
"วงวาร": "สงสาร",
|
||||
"บับว่า": "แบบว่า",
|
||||
"อัลไล": "อะไร",
|
||||
"อิจ": "อิจฉา",
|
||||
# Softened profanity, or spellings meant to evade software profanity filters (คำที่สะกดผิดเพื่อลดความหยาบของคำ หรืออาจใช้หลีกเลี่ยงการกรองคำหยาบของซอฟต์แวร์)
|
||||
"กรู": "กู",
|
||||
"กุ": "กู",
|
||||
"กรุ": "กู",
|
||||
"ตู": "กู",
|
||||
"ตรู": "กู",
|
||||
"มรึง": "มึง",
|
||||
"เมิง": "มึง",
|
||||
"มืง": "มึง",
|
||||
"มุง": "มึง",
|
||||
"สาด": "สัตว์",
|
||||
"สัส": "สัตว์",
|
||||
"สัก": "สัตว์",
|
||||
"แสรด": "สัตว์",
|
||||
"โคโตะ": "โคตร",
|
||||
"โคด": "โคตร",
|
||||
"โครต": "โคตร",
|
||||
"โคตะระ": "โคตร",
|
||||
"พ่อง": "พ่อมึง",
|
||||
"แม่เมิง": "แม่มึง",
|
||||
"เชี่ย": "เหี้ย",
|
||||
# Onomatopoeia, mostly formed by adding a thanthakhat mark or repeating letters (คำเลียนเสียง โดยส่วนใหญ่จะเพิ่มทัณฑฆาต หรือซ้ำตัวอักษร)
|
||||
"แอร๊ยย": "อ๊าย",
|
||||
"อร๊ายยย": "อ๊าย",
|
||||
"มันส์": "มัน",
|
||||
"วู๊วววววววว์": "วู้",
|
||||
# Shortened forms (แบบคำย่อ)
|
||||
"หมาลัย": "มหาวิทยาลัย",
|
||||
"วิดวะ": "วิศวะ",
|
||||
"สินสาด": "ศิลปศาสตร์",
|
||||
"สินกำ": "ศิลปกรรมศาสตร์",
|
||||
"เสารีย์": "อนุเสาวรีย์ชัยสมรภูมิ",
|
||||
"เมกา": "อเมริกา",
|
||||
"มอไซค์": "มอเตอร์ไซค์",
|
||||
}
|
||||
|
||||
|
||||
NORM_EXCEPTIONS = {}
|
||||
|
||||
for string, norm in _exc.items():
|
||||
NORM_EXCEPTIONS[string] = norm
|
||||
NORM_EXCEPTIONS[string.title()] = norm
|
|
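The loop that builds `NORM_EXCEPTIONS` registers each key twice, once as written and once title-cased. A small sketch of the effect, using two entries from the table above; since Thai script has no letter case, `str.title()` leaves Thai keys unchanged and the second assignment simply rewrites the same key:

```python
# Two entries taken from the table in this diff.
_exc = {"uou": "นอน", "ชิมิ": "ใช่ไหม"}

norm_exceptions = {}
for string, norm in _exc.items():
    norm_exceptions[string] = norm
    # Adds "Uou" for the Latin key; a no-op for the Thai key.
    norm_exceptions[string.title()] = norm

print(sorted(norm_exceptions))
```

So the title-cased variants only add entries for the handful of keys that contain Latin letters.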
@ -5,7 +5,7 @@ from ...symbols import ORTH, LEMMA
|
|||
|
||||
|
||||
_exc = {
|
||||
#หน่วยงานรัฐ / government agency
|
||||
# หน่วยงานรัฐ / government agency
|
||||
"กกต.": [{ORTH: "กกต.", LEMMA: "คณะกรรมการการเลือกตั้ง"}],
|
||||
"กทท.": [{ORTH: "กทท.", LEMMA: "การท่าเรือแห่งประเทศไทย"}],
|
||||
"กทพ.": [{ORTH: "กทพ.", LEMMA: "การทางพิเศษแห่งประเทศไทย"}],
|
||||
|
@ -44,11 +44,21 @@ _exc = {
|
|||
"ธอส.": [{ORTH: "ธอส.", LEMMA: "ธนาคารอาคารสงเคราะห์"}],
|
||||
"นย.": [{ORTH: "นย.", LEMMA: "นาวิกโยธิน"}],
|
||||
"ปตท.": [{ORTH: "ปตท.", LEMMA: "การปิโตรเลียมแห่งประเทศไทย"}],
|
||||
"ป.ป.ช.": [{ORTH: "ป.ป.ช.", LEMMA: "คณะกรรมการป้องกันและปราบปรามการทุจริตและประพฤติมิชอบในวงราชการ"}],
|
||||
"ป.ป.ช.": [
|
||||
{
|
||||
ORTH: "ป.ป.ช.",
|
||||
LEMMA: "คณะกรรมการป้องกันและปราบปรามการทุจริตและประพฤติมิชอบในวงราชการ",
|
||||
}
|
||||
],
|
||||
"ป.ป.ส.": [{ORTH: "ป.ป.ส.", LEMMA: "คณะกรรมการป้องกันและปราบปรามยาเสพติด"}],
|
||||
"บพร.": [{ORTH: "บพร.", LEMMA: "กรมการบินพลเรือน"}],
|
||||
"บย.": [{ORTH: "บย.", LEMMA: "กองบินยุทธการ"}],
|
||||
"พสวท.": [{ORTH: "พสวท.", LEMMA: "โครงการพัฒนาและส่งเสริมผู้มีความรู้ความสามารถพิเศษทางวิทยาศาสตร์และเทคโนโลยี"}],
|
||||
"พสวท.": [
|
||||
{
|
||||
ORTH: "พสวท.",
|
||||
LEMMA: "โครงการพัฒนาและส่งเสริมผู้มีความรู้ความสามารถพิเศษทางวิทยาศาสตร์และเทคโนโลยี",
|
||||
}
|
||||
],
|
||||
"มอก.": [{ORTH: "มอก.", LEMMA: "สำนักงานมาตรฐานผลิตภัณฑ์อุตสาหกรรม"}],
|
||||
"ยธ.": [{ORTH: "ยธ.", LEMMA: "กรมโยธาธิการ"}],
|
||||
"รพช.": [{ORTH: "รพช.", LEMMA: "สำนักงานเร่งรัดพัฒนาชนบท"}],
|
||||
|
@ -71,11 +81,15 @@ _exc = {
|
|||
"สปช.": [{ORTH: "สปช.", LEMMA: "สำนักงานคณะกรรมการการประถมศึกษาแห่งชาติ"}],
|
||||
"สปอ.": [{ORTH: "สปอ.", LEMMA: "สำนักงานการประถมศึกษาอำเภอ"}],
|
||||
"สพช.": [{ORTH: "สพช.", LEMMA: "สำนักงานคณะกรรมการนโยบายพลังงานแห่งชาติ"}],
|
||||
"สยช.": [{ORTH: "สยช.", LEMMA: "สำนักงานคณะกรรมการส่งเสริมและประสานงานเยาวชนแห่งชาติ"}],
|
||||
"สยช.": [
|
||||
{ORTH: "สยช.", LEMMA: "สำนักงานคณะกรรมการส่งเสริมและประสานงานเยาวชนแห่งชาติ"}
|
||||
],
|
||||
"สวช.": [{ORTH: "สวช.", LEMMA: "สำนักงานคณะกรรมการวัฒนธรรมแห่งชาติ"}],
|
||||
"สวท.": [{ORTH: "สวท.", LEMMA: "สถานีวิทยุกระจายเสียงแห่งประเทศไทย"}],
|
||||
"สวทช.": [{ORTH: "สวทช.", LEMMA: "สำนักงานพัฒนาวิทยาศาสตร์และเทคโนโลยีแห่งชาติ"}],
|
||||
"สคช.": [{ORTH: "สคช.", LEMMA: "สำนักงานคณะกรรมการพัฒนาการเศรษฐกิจและสังคมแห่งชาติ"}],
|
||||
"สคช.": [
|
||||
{ORTH: "สคช.", LEMMA: "สำนักงานคณะกรรมการพัฒนาการเศรษฐกิจและสังคมแห่งชาติ"}
|
||||
],
|
||||
"สสว.": [{ORTH: "สสว.", LEMMA: "สำนักงานส่งเสริมวิสาหกิจขนาดกลางและขนาดย่อม"}],
|
||||
"สสส.": [{ORTH: "สสส.", LEMMA: "สำนักงานกองทุนสนับสนุนการสร้างเสริมสุขภาพ"}],
|
||||
"สสวท.": [{ORTH: "สสวท.", LEMMA: "สถาบันส่งเสริมการสอนวิทยาศาสตร์และเทคโนโลยี"}],
|
||||
|
@ -85,7 +99,7 @@ _exc = {
|
|||
"อปพร.": [{ORTH: "อปพร.", LEMMA: "อาสาสมัครป้องกันภัยฝ่ายพลเรือน"}],
|
||||
"อย.": [{ORTH: "อย.", LEMMA: "สำนักงานคณะกรรมการอาหารและยา"}],
|
||||
"อ.ส.ม.ท.": [{ORTH: "อ.ส.ม.ท.", LEMMA: "องค์การสื่อสารมวลชนแห่งประเทศไทย"}],
|
||||
#มหาวิทยาลัย / สถานศึกษา / university / college
|
||||
# มหาวิทยาลัย / สถานศึกษา / university / college
|
||||
"มทส.": [{ORTH: "มทส.", LEMMA: "มหาวิทยาลัยเทคโนโลยีสุรนารี"}],
|
||||
"มธ.": [{ORTH: "มธ.", LEMMA: "มหาวิทยาลัยธรรมศาสตร์"}],
|
||||
"ม.อ.": [{ORTH: "ม.อ.", LEMMA: "มหาวิทยาลัยสงขลานครินทร์"}],
|
||||
|
@ -93,7 +107,7 @@ _exc = {
|
|||
"มมส.": [{ORTH: "มมส.", LEMMA: "มหาวิทยาลัยมหาสารคาม"}],
|
||||
"วท.": [{ORTH: "วท.", LEMMA: "วิทยาลัยเทคนิค"}],
|
||||
"สตม.": [{ORTH: "สตม.", LEMMA: "สำนักงานตรวจคนเข้าเมือง (ตำรวจ)"}],
|
||||
#ยศ / rank
|
||||
# ยศ / rank
|
||||
"ดร.": [{ORTH: "ดร.", LEMMA: "ดอกเตอร์"}],
|
||||
"ด.ต.": [{ORTH: "ด.ต.", LEMMA: "ดาบตำรวจ"}],
|
||||
"จ.ต.": [{ORTH: "จ.ต.", LEMMA: "จ่าตรี"}],
|
||||
|
@ -133,10 +147,14 @@ _exc = {
|
|||
"ผญบ.": [{ORTH: "ผญบ.", LEMMA: "ผู้ใหญ่บ้าน"}],
|
||||
"ผบ.": [{ORTH: "ผบ.", LEMMA: "ผู้บังคับบัญชา"}],
|
||||
"ผบก.": [{ORTH: "ผบก.", LEMMA: "ผู้บังคับบัญชาการ (ตำรวจ)"}],
|
||||
"ผบก.": [{ORTH: "ผบก.", LEMMA: "ผู้บังคับการ (ตำรวจ)"}],
|
||||
"ผบก.น.": [{ORTH: "ผบก.น.", LEMMA: "ผู้บังคับการตำรวจนครบาล"}],
|
||||
"ผบก.ป.": [{ORTH: "ผบก.ป.", LEMMA: "ผู้บังคับการตำรวจกองปราบปราม"}],
|
||||
"ผบก.ปค.": [{ORTH: "ผบก.ปค.", LEMMA: "ผู้บังคับการ กองบังคับการปกครอง (โรงเรียนนายร้อยตำรวจ)"}],
|
||||
"ผบก.ปค.": [
|
||||
{
|
||||
ORTH: "ผบก.ปค.",
|
||||
LEMMA: "ผู้บังคับการ กองบังคับการปกครอง (โรงเรียนนายร้อยตำรวจ)",
|
||||
}
|
||||
],
|
||||
"ผบก.ปม.": [{ORTH: "ผบก.ปม.", LEMMA: "ผู้บังคับการตำรวจป่าไม้"}],
|
||||
"ผบก.ภ.": [{ORTH: "ผบก.ภ.", LEMMA: "ผู้บังคับการตำรวจภูธร"}],
|
||||
"ผบช.": [{ORTH: "ผบช.", LEMMA: "ผู้บัญชาการ (ตำรวจ)"}],
|
||||
|
@ -177,7 +195,6 @@ _exc = {
|
|||
"พล.อ.ต.": [{ORTH: "พล.อ.ต.", LEMMA: "พลอากาศตรี"}],
|
||||
"พล.อ.ท.": [{ORTH: "พล.อ.ท.", LEMMA: "พลอากาศโท"}],
|
||||
"พล.อ.อ.": [{ORTH: "พล.อ.อ.", LEMMA: "พลอากาศเอก"}],
|
||||
"พ.อ.": [{ORTH: "พ.อ.", LEMMA: "พันเอก"}],
|
||||
"พ.อ.พิเศษ": [{ORTH: "พ.อ.พิเศษ", LEMMA: "พันเอกพิเศษ"}],
|
||||
"พ.อ.ต.": [{ORTH: "พ.อ.ต.", LEMMA: "พันจ่าอากาศตรี"}],
|
||||
"พ.อ.ท.": [{ORTH: "พ.อ.ท.", LEMMA: "พันจ่าอากาศโท"}],
|
||||
|
@ -209,7 +226,7 @@ _exc = {
|
|||
"ส.อ.": [{ORTH: "ส.อ.", LEMMA: "สิบเอก"}],
|
||||
"อจ.": [{ORTH: "อจ.", LEMMA: "อาจารย์"}],
|
||||
"อจญ.": [{ORTH: "อจญ.", LEMMA: "อาจารย์ใหญ่"}],
|
||||
#วุฒิ / educational qualification
|
||||
# วุฒิ / educational qualification
|
||||
"ป.": [{ORTH: "ป.", LEMMA: "ประถมศึกษา"}],
|
||||
"ป.กศ.": [{ORTH: "ป.กศ.", LEMMA: "ประกาศนียบัตรวิชาการศึกษา"}],
|
||||
"ป.กศ.สูง": [{ORTH: "ป.กศ.สูง", LEMMA: "ประกาศนียบัตรวิชาการศึกษาชั้นสูง"}],
|
||||
|
@ -283,20 +300,20 @@ _exc = {
|
|||
"อ.บ.": [{ORTH: "อ.บ.", LEMMA: "อักษรศาสตรบัณฑิต"}],
|
||||
"อ.ม.": [{ORTH: "อ.ม.", LEMMA: "อักษรศาสตรมหาบัณฑิต"}],
|
||||
"อ.ด.": [{ORTH: "อ.ด.", LEMMA: "อักษรศาสตรดุษฎีบัณฑิต"}],
|
||||
#ปี / เวลา / year / time
|
||||
# ปี / เวลา / year / time
|
||||
"ชม.": [{ORTH: "ชม.", LEMMA: "ชั่วโมง"}],
|
||||
"จ.ศ.": [{ORTH: "จ.ศ.", LEMMA: "จุลศักราช"}],
|
||||
"ค.ศ.": [{ORTH: "ค.ศ.", LEMMA: "คริสต์ศักราช"}],
|
||||
"ฮ.ศ.": [{ORTH: "ฮ.ศ.", LEMMA: "ฮิจเราะห์ศักราช"}],
|
||||
"ว.ด.ป.": [{ORTH: "ว.ด.ป.", LEMMA: "วัน เดือน ปี"}],
|
||||
#ระยะทาง / distance
|
||||
# ระยะทาง / distance
|
||||
"ฮม.": [{ORTH: "ฮม.", LEMMA: "เฮกโตเมตร"}],
|
||||
"ดคม.": [{ORTH: "ดคม.", LEMMA: "เดคาเมตร"}],
|
||||
"ดม.": [{ORTH: "ดม.", LEMMA: "เดซิเมตร"}],
|
||||
"มม.": [{ORTH: "มม.", LEMMA: "มิลลิเมตร"}],
|
||||
"ซม.": [{ORTH: "ซม.", LEMMA: "เซนติเมตร"}],
|
||||
"กม.": [{ORTH: "กม.", LEMMA: "กิโลเมตร"}],
|
||||
#น้ำหนัก / weight
|
||||
# น้ำหนัก / weight
|
||||
"น.น.": [{ORTH: "น.น.", LEMMA: "น้ำหนัก"}],
|
||||
"ฮก.": [{ORTH: "ฮก.", LEMMA: "เฮกโตกรัม"}],
|
||||
"ดคก.": [{ORTH: "ดคก.", LEMMA: "เดคากรัม"}],
|
||||
|
@ -305,7 +322,7 @@ _exc = {
|
|||
"มก.": [{ORTH: "มก.", LEMMA: "มิลลิกรัม"}],
|
||||
"ก.": [{ORTH: "ก.", LEMMA: "กรัม"}],
|
||||
"กก.": [{ORTH: "กก.", LEMMA: "กิโลกรัม"}],
|
||||
#ปริมาตร / volume
|
||||
# ปริมาตร / volume
|
||||
"ฮล.": [{ORTH: "ฮล.", LEMMA: "เฮกโตลิตร"}],
|
||||
"ดคล.": [{ORTH: "ดคล.", LEMMA: "เดคาลิตร"}],
|
||||
"ดล.": [{ORTH: "ดล.", LEMMA: "เดซิลิตร"}],
|
||||
|
@ -313,12 +330,12 @@ _exc = {
|
|||
"ล.": [{ORTH: "ล.", LEMMA: "ลิตร"}],
|
||||
"กล.": [{ORTH: "กล.", LEMMA: "กิโลลิตร"}],
|
||||
"ลบ.": [{ORTH: "ลบ.", LEMMA: "ลูกบาศก์"}],
|
||||
#พื้นที่ / area
|
||||
# พื้นที่ / area
|
||||
"ตร.ซม.": [{ORTH: "ตร.ซม.", LEMMA: "ตารางเซนติเมตร"}],
|
||||
"ตร.ม.": [{ORTH: "ตร.ม.", LEMMA: "ตารางเมตร"}],
|
||||
"ตร.ว.": [{ORTH: "ตร.ว.", LEMMA: "ตารางวา"}],
|
||||
"ตร.กม.": [{ORTH: "ตร.กม.", LEMMA: "ตารางกิโลเมตร"}],
|
||||
#เดือน / month
|
||||
# เดือน / month
|
||||
"ม.ค.": [{ORTH: "ม.ค.", LEMMA: "มกราคม"}],
|
||||
"ก.พ.": [{ORTH: "ก.พ.", LEMMA: "กุมภาพันธ์"}],
|
||||
"มี.ค.": [{ORTH: "มี.ค.", LEMMA: "มีนาคม"}],
|
||||
|
@ -331,22 +348,22 @@ _exc = {
|
|||
"ต.ค.": [{ORTH: "ต.ค.", LEMMA: "ตุลาคม"}],
|
||||
"พ.ย.": [{ORTH: "พ.ย.", LEMMA: "พฤศจิกายน"}],
|
||||
"ธ.ค.": [{ORTH: "ธ.ค.", LEMMA: "ธันวาคม"}],
|
||||
#เพศ / gender
|
||||
# เพศ / gender
|
||||
"ช.": [{ORTH: "ช.", LEMMA: "ชาย"}],
|
||||
"ญ.": [{ORTH: "ญ.", LEMMA: "หญิง"}],
|
||||
"ด.ช.": [{ORTH: "ด.ช.", LEMMA: "เด็กชาย"}],
|
||||
"ด.ญ.": [{ORTH: "ด.ญ.", LEMMA: "เด็กหญิง"}],
|
||||
#ที่อยู่ / address
|
||||
# ที่อยู่ / address
|
||||
"ถ.": [{ORTH: "ถ.", LEMMA: "ถนน"}],
|
||||
"ต.": [{ORTH: "ต.", LEMMA: "ตำบล"}],
|
||||
"อ.": [{ORTH: "อ.", LEMMA: "อำเภอ"}],
|
||||
"จ.": [{ORTH: "จ.", LEMMA: "จังหวัด"}],
|
||||
#สรรพนาม / pronoun
|
||||
# สรรพนาม / pronoun
|
||||
"ข้าฯ": [{ORTH: "ข้าฯ", LEMMA: "ข้าพระพุทธเจ้า"}],
|
||||
"ทูลเกล้าฯ": [{ORTH: "ทูลเกล้าฯ", LEMMA: "ทูลเกล้าทูลกระหม่อม"}],
|
||||
"น้อมเกล้าฯ": [{ORTH: "น้อมเกล้าฯ", LEMMA: "น้อมเกล้าน้อมกระหม่อม"}],
|
||||
"โปรดเกล้าฯ": [{ORTH: "โปรดเกล้าฯ", LEMMA: "โปรดเกล้าโปรดกระหม่อม"}],
|
||||
#การเมือง / politics
|
||||
# การเมือง / politics
|
||||
"ขจก.": [{ORTH: "ขจก.", LEMMA: "ขบวนการโจรก่อการร้าย"}],
|
||||
"ขบด.": [{ORTH: "ขบด.", LEMMA: "ขบวนการแบ่งแยกดินแดน"}],
|
||||
"นปช.": [{ORTH: "นปช.", LEMMA: "แนวร่วมประชาธิปไตยขับไล่เผด็จการ"}],
|
||||
|
@ -363,7 +380,7 @@ _exc = {
|
|||
"สจ.": [{ORTH: "สจ.", LEMMA: "สมาชิกสภาจังหวัด"}],
|
||||
"สว.": [{ORTH: "สว.", LEMMA: "สมาชิกวุฒิสภา"}],
|
||||
"ส.ส.": [{ORTH: "ส.ส.", LEMMA: "สมาชิกสภาผู้แทนราษฎร"}],
|
||||
#ทั่วไป / general
|
||||
# ทั่วไป / general
|
||||
"ก.ข.ค.": [{ORTH: "ก.ข.ค.", LEMMA: "ก้างขวางคอ"}],
|
||||
"กทม.": [{ORTH: "กทม.", LEMMA: "กรุงเทพมหานคร"}],
|
||||
"กรุงเทพฯ": [{ORTH: "กรุงเทพฯ", LEMMA: "กรุงเทพมหานคร"}],
|
||||
|
@ -376,7 +393,12 @@ _exc = {
|
|||
"จก.": [{ORTH: "จก.", LEMMA: "จำกัด"}],
|
||||
"จขกท.": [{ORTH: "จขกท.", LEMMA: "เจ้าของกระทู้"}],
|
||||
"จนท.": [{ORTH: "จนท.", LEMMA: "เจ้าหน้าที่"}],
|
||||
"จ.ป.ร.": [{ORTH: "จ.ป.ร.", LEMMA: "มหาจุฬาลงกรณ ปรมราชาธิราช (พระปรมาภิไธยในพระบาทสมเด็จพระจุลจอมเกล้าเจ้าอยู่หัว)"}],
|
||||
"จ.ป.ร.": [
|
||||
{
|
||||
ORTH: "จ.ป.ร.",
|
||||
LEMMA: "มหาจุฬาลงกรณ ปรมราชาธิราช (พระปรมาภิไธยในพระบาทสมเด็จพระจุลจอมเกล้าเจ้าอยู่หัว)",
|
||||
}
|
||||
],
|
||||
"จ.ม.": [{ORTH: "จ.ม.", LEMMA: "จดหมาย"}],
|
||||
"จย.": [{ORTH: "จย.", LEMMA: "จักรยาน"}],
|
||||
"จยย.": [{ORTH: "จยย.", LEMMA: "จักรยานยนต์"}],
|
||||
|
@ -387,7 +409,9 @@ _exc = {
|
|||
"น.ศ.": [{ORTH: "น.ศ.", LEMMA: "นักศึกษา"}],
|
||||
"น.ส.": [{ORTH: "น.ส.", LEMMA: "นางสาว"}],
|
||||
"น.ส.๓": [{ORTH: "น.ส.๓", LEMMA: "หนังสือรับรองการทำประโยชน์ในที่ดิน"}],
|
||||
"น.ส.๓ ก.": [{ORTH: "น.ส.๓ ก", LEMMA: "หนังสือแสดงกรรมสิทธิ์ในที่ดิน (มีระวางกำหนด)"}],
|
||||
"น.ส.๓ ก.": [
|
||||
{ORTH: "น.ส.๓ ก", LEMMA: "หนังสือแสดงกรรมสิทธิ์ในที่ดิน (มีระวางกำหนด)"}
|
||||
],
|
||||
"นสพ.": [{ORTH: "นสพ.", LEMMA: "หนังสือพิมพ์"}],
|
||||
"บ.ก.": [{ORTH: "บ.ก.", LEMMA: "บรรณาธิการ"}],
|
||||
"บจก.": [{ORTH: "บจก.", LEMMA: "บริษัทจำกัด"}],
|
||||
|
@ -410,7 +434,12 @@ _exc = {
|
|||
"พขร.": [{ORTH: "พขร.", LEMMA: "พนักงานขับรถ"}],
|
||||
"ภ.ง.ด.": [{ORTH: "ภ.ง.ด.", LEMMA: "ภาษีเงินได้"}],
|
||||
"ภ.ง.ด.๙": [{ORTH: "ภ.ง.ด.๙", LEMMA: "แบบแสดงรายการเสียภาษีเงินได้ของกรมสรรพากร"}],
|
||||
"ภ.ป.ร.": [{ORTH: "ภ.ป.ร.", LEMMA: "ภูมิพลอดุลยเดช ปรมราชาธิราช (พระปรมาภิไธยในพระบาทสมเด็จพระปรมินทรมหาภูมิพลอดุลยเดช)"}],
|
||||
"ภ.ป.ร.": [
|
||||
{
|
||||
ORTH: "ภ.ป.ร.",
|
||||
LEMMA: "ภูมิพลอดุลยเดช ปรมราชาธิราช (พระปรมาภิไธยในพระบาทสมเด็จพระปรมินทรมหาภูมิพลอดุลยเดช)",
|
||||
}
|
||||
],
|
||||
"ภ.พ.": [{ORTH: "ภ.พ.", LEMMA: "ภาษีมูลค่าเพิ่ม"}],
|
||||
"ร.": [{ORTH: "ร.", LEMMA: "รัชกาล"}],
|
||||
"ร.ง.": [{ORTH: "ร.ง.", LEMMA: "โรงงาน"}],
|
||||
|
@ -438,7 +467,6 @@ _exc = {
|
|||
"เสธ.": [{ORTH: "เสธ.", LEMMA: "เสนาธิการ"}],
|
||||
"หจก.": [{ORTH: "หจก.", LEMMA: "ห้างหุ้นส่วนจำกัด"}],
|
||||
"ห.ร.ม.": [{ORTH: "ห.ร.ม.", LEMMA: "ตัวหารร่วมมาก"}],
|
||||
|
||||
}
|
||||
|
||||
|
||||
|
|
|
@ -333,6 +333,11 @@ class Language(object):
|
|||
"""
|
||||
if name not in self.pipe_names:
|
||||
raise ValueError(Errors.E001.format(name=name, opts=self.pipe_names))
|
||||
if not hasattr(component, "__call__"):
|
||||
msg = Errors.E003.format(component=repr(component), name=name)
|
||||
if isinstance(component, basestring_) and component in self.factories:
|
||||
msg += Errors.E135.format(name=name)
|
||||
raise ValueError(msg)
|
||||
self.pipeline[self.pipe_names.index(name)] = (name, component)
|
||||
|
||||
def rename_pipe(self, old_name, new_name):
|
||||
|
@ -412,7 +417,9 @@ class Language(object):
|
|||
golds (iterable): A batch of `GoldParse` objects.
|
||||
drop (float): The dropout rate.
|
||||
sgd (callable): An optimizer.
|
||||
RETURNS (dict): Results from the update.
|
||||
losses (dict): Dictionary to update with the loss, keyed by component.
|
||||
component_cfg (dict): Config parameters for specific pipeline
|
||||
components, keyed by component name.
|
||||
|
||||
DOCS: https://spacy.io/api/language#update
|
||||
"""
|
||||
|
@ -593,6 +600,19 @@ class Language(object):
|
|||
def evaluate(
|
||||
self, docs_golds, verbose=False, batch_size=256, scorer=None, component_cfg=None
|
||||
):
|
||||
"""Evaluate a model's pipeline components.
|
||||
|
||||
docs_golds (iterable): Tuples of `Doc` and `GoldParse` objects.
|
||||
verbose (bool): Print debugging information.
|
||||
batch_size (int): Batch size to use.
|
||||
scorer (Scorer): Optional `Scorer` to use. If not passed in, a new one
|
||||
will be created.
|
||||
component_cfg (dict): An optional dictionary with extra keyword
|
||||
arguments for specific components.
|
||||
RETURNS (Scorer): The scorer containing the evaluation results.
|
||||
|
||||
DOCS: https://spacy.io/api/language#evaluate
|
||||
"""
|
||||
if scorer is None:
|
||||
scorer = Scorer()
|
||||
if component_cfg is None:
|
||||
|
|
|
@ -1,5 +1,6 @@
|
|||
# coding: utf8
|
||||
from __future__ import unicode_literals
|
||||
from collections import OrderedDict
|
||||
|
||||
from .symbols import POS, NOUN, VERB, ADJ, PUNCT, PROPN
|
||||
from .symbols import VerbForm_inf, VerbForm_none, Number_sing, Degree_pos
|
||||
|
@ -118,8 +119,8 @@ def lemmatize(string, index, exceptions, rules):
|
|||
forms.append(form)
|
||||
else:
|
||||
oov_forms.append(form)
|
||||
# Remove duplicates, and sort forms generated by rules alphabetically.
|
||||
forms = list(set(forms))
|
||||
# Remove duplicates but preserve the ordering of applied "rules"
|
||||
forms = list(OrderedDict.fromkeys(forms))
|
||||
# Put exceptions at the front of the list, so they get priority.
|
||||
# This is a dodgy heuristic -- but it's the best we can do until we get
|
||||
# frequencies on this. We can at least prune out problematic exceptions,
|
||||
|
|
|
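The lemmatizer change swaps `list(set(forms))` for `list(OrderedDict.fromkeys(forms))`. A short sketch of the difference, with a made-up `forms` list: both remove duplicates, but only the second keeps the order in which the rules produced the forms, whereas a `set` gives no ordering guarantee:

```python
from collections import OrderedDict

# Hypothetical candidate forms, in the order the rules generated them.
forms = ["walks", "walk", "walks", "wal"]

# Old behaviour: duplicates removed, order unspecified.
unordered = list(set(forms))

# New behaviour: duplicates removed, first occurrence order preserved.
ordered = list(OrderedDict.fromkeys(forms))

print(ordered)
```

Preserving the order matters here because later code gives priority to forms near the front of the list.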
@ -48,7 +48,10 @@ cdef class Matcher:
|
|||
self._extra_predicates = []
|
||||
self.vocab = vocab
|
||||
self.mem = Pool()
|
||||
self.validator = get_json_validator(TOKEN_PATTERN_SCHEMA) if validate else None
|
||||
if validate:
|
||||
self.validator = get_json_validator(TOKEN_PATTERN_SCHEMA)
|
||||
else:
|
||||
self.validator = None
|
||||
|
||||
def __reduce__(self):
|
||||
data = (self.vocab, self._patterns, self._callbacks)
|
||||
|
@ -105,7 +108,7 @@ cdef class Matcher:
|
|||
raise ValueError(Errors.E012.format(key=key))
|
||||
if self.validator:
|
||||
errors[i] = validate_json(pattern, self.validator)
|
||||
if errors:
|
||||
if any(err for err in errors.values()):
|
||||
raise MatchPatternError(key, errors)
|
||||
key = self._normalize_key(key)
|
||||
for pattern in patterns:
|
||||
|
|
|
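The `Matcher.add` fix replaces `if errors:` with `if any(err for err in errors.values()):`. The sketch below shows why, under the assumption that the validator returns an empty list for a valid pattern; `validate_token_patterns` and `ALLOWED_KEYS` are hypothetical and only mimic the shape of the `errors` dict:

```python
ALLOWED_KEYS = {"LOWER", "LEMMA", "ORTH"}  # hypothetical subset of valid keys


def validate_token_patterns(patterns):
    """Return {pattern index: list of error messages}, empty list if valid."""
    errors = {}
    for i, pattern in enumerate(patterns):
        errors[i] = [
            "unknown key: {}".format(key)
            for token in pattern
            for key in token
            if key not in ALLOWED_KEYS
        ]
    return errors


good = validate_token_patterns([[{"LOWER": "hello"}, {"LOWER": "world"}]])
bad = validate_token_patterns([[{"X": "Y"}]])

# The dict is non-empty even for a valid pattern, so `if errors:` misfires.
print(bool(good), any(good.values()))
print(bool(bad), any(bad.values()))
```

Checking the values rather than the dict itself is what lets a valid pattern pass when validation is enabled.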
@ -127,7 +127,7 @@ cdef class PhraseMatcher:
|
|||
and self.attr not in (DEP, POS, TAG, LEMMA):
|
||||
string_attr = self.vocab.strings[self.attr]
|
||||
user_warning(Warnings.W012.format(key=key, attr=string_attr))
|
||||
tags = get_bilou(length)
|
||||
tags = get_biluo(length)
|
||||
phrase_key = <attr_t*>mem.alloc(length, sizeof(attr_t))
|
||||
for i, tag in enumerate(tags):
|
||||
attr_value = self.get_lex_value(doc, i)
|
||||
|
@ -230,7 +230,7 @@ cdef class PhraseMatcher:
|
|||
return "matcher:{}-{}".format(string_attr_name, string_attr_value)
|
||||
|
||||
|
||||
def get_bilou(length):
|
||||
def get_biluo(length):
|
||||
if length == 0:
|
||||
raise ValueError(Errors.E127)
|
||||
elif length == 1:
|
||||
|
|
|
@ -109,6 +109,7 @@ cdef class Morphology:
|
|||
analysis.tag = rich_tag
|
||||
analysis.lemma = self.lemmatize(analysis.tag.pos, token.lex.orth,
|
||||
self.tag_map.get(tag_str, {}))
|
||||
|
||||
self._cache.set(tag_id, token.lex.orth, analysis)
|
||||
if token.lemma == 0:
|
||||
token.lemma = analysis.lemma
|
||||
|
@ -140,7 +141,7 @@ cdef class Morphology:
|
|||
if tag not in self.reverse_index:
|
||||
return
|
||||
tag_id = self.reverse_index[tag]
|
||||
orth = self.strings[orth_str]
|
||||
orth = self.strings.add(orth_str)
|
||||
cdef RichTagC rich_tag = self.rich_tags[tag_id]
|
||||
attrs = intify_attrs(attrs, self.strings, _do_deprecated=True)
|
||||
cached = <MorphAnalysisC*>self._cache.get(tag_id, orth)
|
||||
|
|
|
@ -35,7 +35,17 @@ class PRFScore(object):
|
|||
|
||||
|
||||
class Scorer(object):
|
||||
"""Compute evaluation scores."""
|
||||
|
||||
def __init__(self, eval_punct=False):
|
||||
"""Initialize the Scorer.
|
||||
|
||||
eval_punct (bool): Evaluate the dependency attachments to and from
|
||||
punctuation.
|
||||
RETURNS (Scorer): The newly created object.
|
||||
|
||||
DOCS: https://spacy.io/api/scorer#init
|
||||
"""
|
||||
self.tokens = PRFScore()
|
||||
self.sbd = PRFScore()
|
||||
self.unlabelled = PRFScore()
|
||||
|
@ -46,34 +56,46 @@ class Scorer(object):
|
|||
|
||||
@property
|
||||
def tags_acc(self):
|
||||
"""RETURNS (float): Part-of-speech tag accuracy (fine grained tags,
|
||||
i.e. `Token.tag`).
|
||||
"""
|
||||
return self.tags.fscore * 100
|
||||
|
||||
@property
|
||||
def token_acc(self):
|
||||
"""RETURNS (float): Tokenization accuracy."""
|
||||
return self.tokens.precision * 100
|
||||
|
||||
@property
|
||||
def uas(self):
|
||||
"""RETURNS (float): Unlabelled dependency score."""
|
||||
return self.unlabelled.fscore * 100
|
||||
|
||||
@property
|
||||
def las(self):
|
||||
"""RETURNS (float): Labelled dependency score."""
|
||||
return self.labelled.fscore * 100
|
||||
|
||||
@property
|
||||
def ents_p(self):
|
||||
"""RETURNS (float): Named entity accuracy (precision)."""
|
||||
return self.ner.precision * 100
|
||||
|
||||
@property
|
||||
def ents_r(self):
|
||||
"""RETURNS (float): Named entity accuracy (recall)."""
|
||||
return self.ner.recall * 100
|
||||
|
||||
@property
|
||||
def ents_f(self):
|
||||
"""RETURNS (float): Named entity accuracy (F-score)."""
|
||||
return self.ner.fscore * 100
|
||||
|
||||
@property
|
||||
def scores(self):
|
||||
"""RETURNS (dict): All scores with keys `uas`, `las`, `ents_p`,
|
||||
`ents_r`, `ents_f`, `tags_acc` and `token_acc`.
|
||||
"""
|
||||
return {
|
||||
"uas": self.uas,
|
||||
"las": self.las,
|
||||
|
@ -84,9 +106,20 @@ class Scorer(object):
|
|||
"token_acc": self.token_acc,
|
||||
}
|
||||
|
||||
def score(self, tokens, gold, verbose=False, punct_labels=("p", "punct")):
|
||||
if len(tokens) != len(gold):
|
||||
gold = GoldParse.from_annot_tuples(tokens, zip(*gold.orig_annot))
|
||||
def score(self, doc, gold, verbose=False, punct_labels=("p", "punct")):
|
||||
"""Update the evaluation scores from a single Doc / GoldParse pair.
|
||||
|
||||
doc (Doc): The predicted annotations.
|
||||
gold (GoldParse): The correct annotations.
|
||||
verbose (bool): Print debugging information.
|
||||
punct_labels (tuple): Dependency labels for punctuation. Used to
|
||||
evaluate dependency attachments to punctuation if `eval_punct` is
|
||||
`True`.
|
||||
|
||||
DOCS: https://spacy.io/api/scorer#score
|
||||
"""
|
||||
if len(doc) != len(gold):
|
||||
gold = GoldParse.from_annot_tuples(doc, zip(*gold.orig_annot))
|
||||
gold_deps = set()
|
||||
gold_tags = set()
|
||||
gold_ents = set(tags_to_entities([annot[-1] for annot in gold.orig_annot]))
|
||||
|
@ -96,7 +129,7 @@ class Scorer(object):
|
|||
gold_deps.add((id_, head, dep.lower()))
|
||||
cand_deps = set()
|
||||
cand_tags = set()
|
||||
for token in tokens:
|
||||
for token in doc:
|
||||
if token.orth_.isspace():
|
||||
continue
|
||||
gold_i = gold.cand_to_gold[token.i]
|
||||
|
@ -116,7 +149,7 @@ class Scorer(object):
|
|||
cand_deps.add((gold_i, gold_head, token.dep_.lower()))
|
||||
if "-" not in [token[-1] for token in gold.orig_annot]:
|
||||
cand_ents = set()
|
||||
for ent in tokens.ents:
|
||||
for ent in doc.ents:
|
||||
first = gold.cand_to_gold[ent.start]
|
||||
last = gold.cand_to_gold[ent.end - 1]
|
||||
if first is None or last is None:
|
||||
|
|
|
@ -6,6 +6,7 @@ from spacy.attrs import ORTH, LENGTH
|
|||
from spacy.tokens import Doc, Span
|
||||
from spacy.vocab import Vocab
|
||||
from spacy.errors import ModelsWarning
|
||||
from spacy.util import filter_spans
|
||||
|
||||
from ..util import get_doc
|
||||
|
||||
|
@ -219,3 +220,21 @@ def test_span_ents_property(doc):
|
|||
assert sentences[2].ents[0].label_ == "PRODUCT"
|
||||
assert sentences[2].ents[0].start == 11
|
||||
assert sentences[2].ents[0].end == 14
|
||||
|
||||
|
||||
def test_filter_spans(doc):
|
||||
# Test filtering duplicates
|
||||
spans = [doc[1:4], doc[6:8], doc[1:4], doc[10:14]]
|
||||
filtered = filter_spans(spans)
|
||||
assert len(filtered) == 3
|
||||
assert filtered[0].start == 1 and filtered[0].end == 4
|
||||
assert filtered[1].start == 6 and filtered[1].end == 8
|
||||
assert filtered[2].start == 10 and filtered[2].end == 14
|
||||
# Test filtering overlaps with longest preference
|
||||
spans = [doc[1:4], doc[1:3], doc[5:10], doc[7:9], doc[1:4]]
|
||||
filtered = filter_spans(spans)
|
||||
assert len(filtered) == 2
|
||||
assert len(filtered[0]) == 3
|
||||
assert len(filtered[1]) == 5
|
||||
assert filtered[0].start == 1 and filtered[0].end == 4
|
||||
assert filtered[1].start == 5 and filtered[1].end == 10
|
||||
|
|
|
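The behaviour exercised by `test_filter_spans` (drop duplicates, prefer the longest span when spans overlap) can be sketched on plain `(start, end)` tuples. This is an illustration of the expected results only, not spaCy's `filter_spans`; the tie-breaking rule (earlier start wins among equal lengths) is an assumption:

```python
def filter_spans_sketch(spans):
    """Keep non-overlapping (start, end) pairs, longest first."""
    # Longest spans first; among equal lengths, the earlier start first.
    by_length = sorted(spans, key=lambda s: (s[1] - s[0], -s[0]), reverse=True)
    kept = []
    seen_tokens = set()
    for start, end in by_length:
        # Skip any span sharing a token with one already kept.
        if any(i in seen_tokens for i in range(start, end)):
            continue
        kept.append((start, end))
        seen_tokens.update(range(start, end))
    # Return in document order.
    return sorted(kept)


print(filter_spans_sketch([(1, 4), (6, 8), (1, 4), (10, 14)]))
print(filter_spans_sketch([(1, 4), (1, 3), (5, 10), (7, 9), (1, 4)]))
```

The two calls mirror the span boundaries used in the test above.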
@ -140,3 +140,28 @@ def test_underscore_mutable_defaults_dict(en_vocab):
|
|||
assert len(token1._.mutable) == 2
|
||||
assert token1._.mutable["x"] == ["y"]
|
||||
assert len(token2._.mutable) == 0
|
||||
|
||||
|
||||
def test_underscore_dir(en_vocab):
|
||||
"""Test that dir() correctly returns extension attributes. This enables
|
||||
things like tab-completion for the attributes in doc._."""
|
||||
Doc.set_extension("test_dir", default=None)
|
||||
doc = Doc(en_vocab, words=["hello", "world"])
|
||||
assert "_" in dir(doc)
|
||||
assert "test_dir" in dir(doc._)
|
||||
assert "test_dir" not in dir(doc[0]._)
|
||||
assert "test_dir" not in dir(doc[0:2]._)
|
||||
|
||||
|
||||
def test_underscore_docstring(en_vocab):
|
||||
"""Test that docstrings are available for extension methods, even though
|
||||
they're partials."""
|
||||
|
||||
def test_method(doc, arg1=1, arg2=2):
|
||||
"""I am a docstring"""
|
||||
return (arg1, arg2)
|
||||
|
||||
Doc.set_extension("test_docstrings", method=test_method)
|
||||
doc = Doc(en_vocab, words=["hello", "world"])
|
||||
assert test_method.__doc__ == "I am a docstring"
|
||||
assert doc._.test_docstrings.__doc__.rsplit(". ")[-1] == "I am a docstring"
|
||||
|
|
|
@ -52,11 +52,13 @@ def test_get_pipe(nlp, name):
|
|||
assert nlp.get_pipe(name) == new_pipe
|
||||
|
||||
|
||||
@pytest.mark.parametrize("name,replacement", [("my_component", lambda doc: doc)])
|
||||
def test_replace_pipe(nlp, name, replacement):
|
||||
@pytest.mark.parametrize("name,replacement,not_callable", [("my_component", lambda doc: doc, {})])
|
||||
def test_replace_pipe(nlp, name, replacement, not_callable):
|
||||
with pytest.raises(ValueError):
|
||||
nlp.replace_pipe(name, new_pipe)
|
||||
nlp.add_pipe(new_pipe, name=name)
|
||||
with pytest.raises(ValueError):
|
||||
nlp.replace_pipe(name, not_callable)
|
||||
nlp.replace_pipe(name, replacement)
|
||||
assert nlp.get_pipe(name) != new_pipe
|
||||
assert nlp.get_pipe(name) == replacement
|
||||
|
|
|
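The updated test expects `replace_pipe` to raise `ValueError` both for an unknown name and for a replacement that is not callable. A toy sketch of that contract; `TinyPipeline` and its error messages are hypothetical and much simpler than `Language`:

```python
class TinyPipeline(object):
    """Hypothetical stand-in for the pipeline bookkeeping in Language."""

    def __init__(self):
        self.pipeline = []  # list of (name, component) tuples

    @property
    def pipe_names(self):
        return [name for name, _ in self.pipeline]

    def add_pipe(self, component, name):
        self.pipeline.append((name, component))

    def get_pipe(self, name):
        return dict(self.pipeline)[name]

    def replace_pipe(self, name, component):
        if name not in self.pipe_names:
            raise ValueError("no component named {!r}".format(name))
        # Reject objects that cannot be called on a Doc.
        if not callable(component):
            raise ValueError("component {!r} is not callable".format(component))
        self.pipeline[self.pipe_names.index(name)] = (name, component)


nlp = TinyPipeline()
nlp.add_pipe(lambda doc: doc, "my_component")
replacement = lambda doc: doc
nlp.replace_pipe("my_component", replacement)
print(nlp.get_pipe("my_component") is replacement)
```

Passing `{}` as the replacement, as the parametrized test does, would hit the second check.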
@ -6,20 +6,16 @@ import pytest
|
|||
from spacy.lang.en import English
|
||||
|
||||
|
||||
@pytest.mark.xfail(reason="Current default suffix rules avoid one upper-case letter before a dot.")
|
||||
@pytest.mark.xfail(reason="default suffix rules avoid one upper-case letter before dot")
|
||||
def test_issue3449():
|
||||
nlp = English()
|
||||
nlp.add_pipe(nlp.create_pipe('sentencizer'))
|
||||
|
||||
nlp.add_pipe(nlp.create_pipe("sentencizer"))
|
||||
text1 = "He gave the ball to I. Do you want to go to the movies with I?"
|
||||
text2 = "He gave the ball to I. Do you want to go to the movies with I?"
|
||||
text3 = "He gave the ball to I.\nDo you want to go to the movies with I?"
|
||||
|
||||
t1 = nlp(text1)
|
||||
t2 = nlp(text2)
|
||||
t3 = nlp(text3)
|
||||
|
||||
assert t1[5].text == 'I'
|
||||
assert t2[5].text == 'I'
|
||||
assert t3[5].text == 'I'
|
||||
|
||||
assert t1[5].text == "I"
|
||||
assert t2[5].text == "I"
|
||||
assert t3[5].text == "I"
|
||||
|
|
15
spacy/tests/regression/test_issue3549.py
Normal file
|
@ -0,0 +1,15 @@
|
|||
# coding: utf8
|
||||
from __future__ import unicode_literals
|
||||
|
||||
import pytest
|
||||
from spacy.matcher import Matcher
|
||||
from spacy.errors import MatchPatternError
|
||||
|
||||
|
||||
def test_issue3549(en_vocab):
|
||||
"""Test that match pattern validation doesn't raise on empty errors."""
|
||||
matcher = Matcher(en_vocab, validate=True)
|
||||
pattern = [{"LOWER": "hello"}, {"LOWER": "world"}]
|
||||
matcher.add("GOOD", None, pattern)
|
||||
with pytest.raises(MatchPatternError):
|
||||
matcher.add("BAD", None, [{"X": "Y"}])
|
17
spacy/tests/regression/test_issue3555.py
Normal file
|
@ -0,0 +1,17 @@
|
|||
# coding: utf8
|
||||
from __future__ import unicode_literals
|
||||
|
||||
import pytest
|
||||
from spacy.tokens import Doc, Token
|
||||
from spacy.matcher import Matcher
|
||||
|
||||
|
||||
@pytest.mark.xfail
|
||||
def test_issue3555(en_vocab):
|
||||
"""Test that custom extensions with default None don't break matcher."""
|
||||
Token.set_extension("issue3555", default=None)
|
||||
matcher = Matcher(en_vocab)
|
||||
pattern = [{"LEMMA": "have"}, {"_": {"issue3555": True}}]
|
||||
matcher.add("TEST", None, pattern)
|
||||
doc = Doc(en_vocab, words=["have", "apple"])
|
||||
matcher(doc)
|
15
spacy/tests/regression/test_issue3803.py
Normal file
|
@ -0,0 +1,15 @@
|
|||
# coding: utf8
|
||||
from __future__ import unicode_literals
|
||||
|
||||
import pytest
|
||||
|
||||
from spacy.lang.es import Spanish
|
||||
|
||||
|
||||
def test_issue3803():
|
||||
"""Test that Spanish num-like tokens have True for like_num attribute."""
|
||||
nlp = Spanish()
|
||||
text = "2 dos 1000 mil 12 doce"
|
||||
doc = nlp(text)
|
||||
|
||||
assert [t.like_num for t in doc] == [True, True, True, True, True, True]
|
|
@ -3,11 +3,13 @@ from __future__ import unicode_literals
|
|||
|
||||
import pytest
|
||||
import os
|
||||
import ctypes
|
||||
from pathlib import Path
|
||||
from spacy import util
|
||||
from spacy import prefer_gpu, require_gpu
|
||||
from spacy.compat import symlink_to, symlink_remove, path2str
|
||||
from spacy.compat import symlink_to, symlink_remove, path2str, is_windows
|
||||
from spacy._ml import PrecomputableAffine
|
||||
from subprocess import CalledProcessError
|
||||
|
||||
|
||||
@pytest.fixture
|
||||
|
@ -28,12 +30,25 @@ def symlink_setup_target(request, symlink_target, symlink):
|
|||
# https://github.com/pytest-dev/pytest/issues/2508#issuecomment-309934240
|
||||
|
||||
def cleanup():
|
||||
symlink_remove(symlink)
|
||||
# Remove symlink only if it was created
|
||||
if symlink.exists():
|
||||
symlink_remove(symlink)
|
||||
os.rmdir(path2str(symlink_target))
|
||||
|
||||
request.addfinalizer(cleanup)
|
||||
|
||||
|
||||
@pytest.fixture
|
||||
def is_admin():
|
||||
"""Determine if the tests are run as admin or not."""
|
||||
try:
|
||||
admin = os.getuid() == 0
|
||||
except AttributeError:
|
||||
admin = ctypes.windll.shell32.IsUserAnAdmin() != 0
|
||||
|
||||
return admin
|
||||
|
||||
|
||||
@pytest.mark.parametrize("text", ["hello/world", "hello world"])
|
||||
def test_util_ensure_path_succeeds(text):
|
||||
path = util.ensure_path(text)
|
||||
|
@ -88,7 +103,20 @@ def test_require_gpu():
|
|||
require_gpu()
|
||||
|
||||
|
||||
def test_create_symlink_windows(symlink_setup_target, symlink_target, symlink):
|
||||
def test_create_symlink_windows(
|
||||
symlink_setup_target, symlink_target, symlink, is_admin
|
||||
):
|
||||
"""Test the creation of symlinks on Windows. When run as admin, or on a non-Windows platform, it should succeed; otherwise a CalledProcessError should be raised."""
|
||||
assert symlink_target.exists()
|
||||
symlink_to(symlink, symlink_target)
|
||||
assert symlink.exists()
|
||||
|
||||
if is_admin or not is_windows:
|
||||
try:
|
||||
symlink_to(symlink, symlink_target)
|
||||
assert symlink.exists()
|
||||
except CalledProcessError as e:
|
||||
pytest.fail(e)
|
||||
else:
|
||||
with pytest.raises(CalledProcessError):
|
||||
symlink_to(symlink, symlink_target)
|
||||
|
||||
assert not symlink.exists()
|
||||
|
|
|
@ -25,6 +25,11 @@ class Underscore(object):
|
|||
object.__setattr__(self, "_start", start)
|
||||
object.__setattr__(self, "_end", end)
|
||||
|
||||
def __dir__(self):
|
||||
# Hack to enable autocomplete on custom extensions
|
||||
extensions = list(self._extensions.keys())
|
||||
return ["set", "get", "has"] + extensions
|
||||
|
||||
def __getattr__(self, name):
|
||||
if name not in self._extensions:
|
||||
raise AttributeError(Errors.E046.format(name=name))
|
||||
|
@ -32,7 +37,16 @@ class Underscore(object):
|
|||
if getter is not None:
|
||||
return getter(self._obj)
|
||||
elif method is not None:
|
||||
return functools.partial(method, self._obj)
|
||||
method_partial = functools.partial(method, self._obj)
|
||||
# Hack to port over docstrings of the original function
|
||||
# See https://stackoverflow.com/q/27362727/6400719
|
||||
method_docstring = method.__doc__ or ""
|
||||
method_docstring_prefix = (
|
||||
"This method is a partial function and its first argument "
|
||||
"(the object it's called on) will be filled automatically. "
|
||||
)
|
||||
method_partial.__doc__ = method_docstring_prefix + method_docstring
|
||||
return method_partial
|
||||
else:
|
||||
key = self._get_key(name)
|
||||
if key in self._doc.user_data:
|
||||
|
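The docstring handling added to `Underscore.__getattr__` above can be tried in isolation. This is a minimal sketch rather than spaCy code: `greet` is a hypothetical stand-in for a registered extension method, and only `functools.partial` from the standard library is used.

```python
import functools


def greet(obj, greeting="Hello"):
    """Return a greeting for obj."""
    return "{} {}".format(greeting, obj)


# Bind the first argument, as the hunk does with self._obj
method_partial = functools.partial(greet, "world")

# A partial object does not inherit the wrapped function's docstring,
# so it is copied over by hand with an explanatory prefix.
prefix = (
    "This method is a partial function and its first argument "
    "(the object it's called on) will be filled automatically. "
)
method_partial.__doc__ = prefix + (greet.__doc__ or "")

result = method_partial()
```

With this in place, `help(method_partial)` shows the prefixed docstring instead of the generic `functools.partial` documentation.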
|
|
@ -14,8 +14,11 @@ import functools
|
|||
import itertools
|
||||
import numpy.random
|
||||
import srsly
|
||||
from jsonschema import Draft4Validator
|
||||
|
||||
try:
|
||||
import jsonschema
|
||||
except ImportError:
|
||||
jsonschema = None
|
||||
|
||||
try:
|
||||
import cupy.random
|
||||
|
@ -510,7 +513,7 @@ def decaying(start, stop, decay):
|
|||
curr = float(start)
|
||||
while True:
|
||||
yield max(curr, stop)
|
||||
curr -= (decay)
|
||||
curr -= decay
|
||||
|
||||
|
||||
def minibatch_by_words(items, size, tuples=True, count_words=len):
|
||||
|
@ -571,6 +574,28 @@ def itershuffle(iterable, bufsize=1000):
|
|||
raise StopIteration
|
||||
|
||||
|
||||
def filter_spans(spans):
|
||||
"""Filter a sequence of spans and remove duplicates or overlaps. Useful for
|
||||
creating named entities (where one token can only be part of one entity) or
|
||||
when merging spans with `Retokenizer.merge`. When spans overlap, the (first)
|
||||
longest span is preferred over shorter spans.
|
||||
|
||||
spans (iterable): The spans to filter.
|
||||
RETURNS (list): The filtered spans.
|
||||
"""
|
||||
get_sort_key = lambda span: (span.end - span.start, span.start)
|
||||
sorted_spans = sorted(spans, key=get_sort_key, reverse=True)
|
||||
result = []
|
||||
seen_tokens = set()
|
||||
for span in sorted_spans:
|
||||
# Check for end - 1 here because boundaries are inclusive
|
||||
if span.start not in seen_tokens and span.end - 1 not in seen_tokens:
|
||||
result.append(span)
|
||||
seen_tokens.update(range(span.start, span.end))
|
||||
result = sorted(result, key=lambda span: span.start)
|
||||
return result
|
||||
|
||||
|
||||
def to_bytes(getters, exclude):
|
||||
serialized = OrderedDict()
|
||||
for key, getter in getters.items():
|
||||
|
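The filtering logic of `filter_spans` in the hunk above only relies on the `start` and `end` offsets of each span, so it can be exercised without loading spaCy. The sketch below copies the function body and feeds it a hypothetical `Span` namedtuple (an assumption for illustration, not spaCy's `Span` class).

```python
from collections import namedtuple

# Hypothetical stand-in for a spaCy Span: token offsets plus a label
Span = namedtuple("Span", ["start", "end", "label"])


def filter_spans(spans):
    # Sort so that the longest spans are considered first
    get_sort_key = lambda span: (span.end - span.start, span.start)
    sorted_spans = sorted(spans, key=get_sort_key, reverse=True)
    result = []
    seen_tokens = set()
    for span in sorted_spans:
        # span.end is exclusive, so the last covered token is end - 1
        if span.start not in seen_tokens and span.end - 1 not in seen_tokens:
            result.append(span)
            seen_tokens.update(range(span.start, span.end))
    # Restore document order
    return sorted(result, key=lambda span: span.start)


spans = [Span(0, 2, "A"), Span(1, 4, "B"), Span(4, 5, "C"), Span(4, 5, "D")]
filtered = filter_spans(spans)
labels = [span.label for span in filtered]
```

Here `B` covers three tokens and therefore wins over the overlapping `A`, while `D` is dropped as an exact duplicate of `C`.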
@ -660,7 +685,9 @@ def get_json_validator(schema):
|
|||
# validator that's used (e.g. different draft implementation), without
|
||||
# having to change it all across the codebase.
|
||||
# TODO: replace with (stable) Draft6Validator, if available
|
||||
return Draft4Validator(schema)
|
||||
if jsonschema is None:
|
||||
raise ValueError(Errors.E136)
|
||||
return jsonschema.Draft4Validator(schema)
|
||||
|
||||
|
||||
def validate_schema(schema):
|
||||
|
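The two `jsonschema` hunks above turn a hard import into an optional dependency that only fails when validation is actually requested. A minimal sketch of that pattern follows; the error message is a hypothetical placeholder, since the text of `Errors.E136` is not shown in this diff.

```python
try:
    import jsonschema
except ImportError:
    # The package is optional: remember that it is missing
    jsonschema = None


def get_json_validator(schema):
    if jsonschema is None:
        # Hypothetical message, standing in for Errors.E136
        raise ValueError("jsonschema is required for pattern validation")
    return jsonschema.Draft4Validator(schema)


try:
    validator = get_json_validator({"type": "object"})
except ValueError:
    validator = None
```

Importing the module never fails this way, and only callers that ask for a validator see the error.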
|
|
@ -457,7 +457,7 @@ sit amet dignissim justo congue.
|
|||
## Setup and installation {#setup}
|
||||
|
||||
Before running the setup, make sure your versions of
|
||||
[Node](https://nodejs.org/en/) and [npm](https://www.npmjs.com/) are up to date.
|
||||
[Node](https://nodejs.org/en/) and [npm](https://www.npmjs.com/) are up to date. Node v10.15 or later is required.
|
||||
|
||||
```bash
|
||||
# Clone the repository
|
||||
|
|
94
website/UNIVERSE.md
Normal file
|
@ -0,0 +1,94 @@
|
|||
<a href="https://explosion.ai"><img src="https://explosion.ai/assets/img/logo.svg" width="125" height="125" align="right" /></a>
|
||||
|
||||
# spaCy Universe
|
||||
|
||||
The [spaCy Universe](https://spacy.io/universe) collects the many great resources developed with or for spaCy. It
|
||||
includes standalone packages, plugins, extensions, educational materials,
|
||||
operational utilities and bindings for other languages.
|
||||
|
||||
If you have a project that you want the spaCy community to make use of, you can
|
||||
suggest it by submitting a pull request to this repository. The Universe
|
||||
database is open-source and collected in a simple JSON file.
|
||||
|
||||
Looking for inspiration for your own spaCy plugin or extension? Check out the
|
||||
[`project idea`](https://github.com/explosion/spaCy/labels/project%20idea) label
|
||||
on the issue tracker.
|
||||
|
||||
## Checklist
|
||||
|
||||
### Projects
|
||||
|
||||
✅ Libraries and packages should be **open-source** (with a user-friendly license) and at least somewhat **documented** (e.g. a simple `README` with usage instructions).
|
||||
|
||||
✅ We're happy to include work in progress and prereleases, but we'd like to keep the emphasis on projects that should be useful to the community **right away**.
|
||||
|
||||
✅ Demos and visualizers should be available via a **public URL**.
|
||||
|
||||
### Educational Materials
|
||||
|
||||
✅ Books should be **available for purchase or download** (not just pre-order). Ebooks and self-published books are fine, too, if they include enough substantial content.
|
||||
|
||||
✅ The `"url"` of book entries should either point to the publisher's website or a reseller of your choice (ideally one that ships worldwide or as close as possible).
|
||||
|
||||
✅ If an online course is only available behind a paywall, it should at least have a **free excerpt** or chapter available, so users know what to expect.
|
||||
|
||||
## JSON format
|
||||
|
||||
To add a project, fork this repository, edit the [`universe.json`](meta/universe.json)
|
||||
and add an object of the following format to the list of `"resources"`. Before
|
||||
you submit your pull request, make sure to use a linter to verify that your
|
||||
markup is correct.
|
||||
|
||||
```json
|
||||
{
|
||||
"id": "unique-project-id",
|
||||
"title": "Project title",
|
||||
"slogan": "A short summary",
|
||||
"description": "A longer description – *Markdown allowed!*",
|
||||
"github": "user/repo",
|
||||
"pip": "package-name",
|
||||
"code_example": [
|
||||
"import spacy",
|
||||
"import package_name",
|
||||
"",
|
||||
"nlp = spacy.load('en')",
|
||||
"nlp.add_pipe(package_name)"
|
||||
],
|
||||
"code_language": "python",
|
||||
"url": "https://example.com",
|
||||
"thumb": "https://example.com/thumb.jpg",
|
||||
"image": "https://example.com/image.jpg",
|
||||
"author": "Your Name",
|
||||
"author_links": {
|
||||
"twitter": "username",
|
||||
"github": "username",
|
||||
"website": "https://example.com"
|
||||
},
|
||||
"category": ["pipeline", "standalone"],
|
||||
"tags": ["some-tag", "etc"]
|
||||
}
|
||||
```
|
||||
|
||||
| Field | Type | Description |
|
||||
| --- | --- | --- |
|
||||
| `id` | string | Unique ID of the project. |
|
||||
| `title` | string | Project title. If not set, the `id` will be used as the display title. |
|
||||
| `slogan` | string | A short description of the project. Displayed in the overview and under the title. |
|
||||
| `description` | string | A longer description of the project. Markdown is allowed, but should be limited to basic formatting like bold, italics, code or links. |
|
||||
| `github` | string | Associated GitHub repo in the format `user/repo`. Will be displayed as a link and used for release, license and star badges. |
|
||||
| `pip` | string | Package name on pip. If available, the installation command will be displayed. |
|
||||
| `cran` | string | For R packages: package name on CRAN. If available, the installation command will be displayed. |
|
||||
| `code_example` | array | Short example that shows how to use the project. Formatted as an array with one string per line. |
|
||||
| `code_language` | string | Defaults to `'python'`. Optional code language used for syntax highlighting with [Prism](http://prismjs.com/). |
|
||||
| `url` | string | Optional project link to display as button. |
|
||||
| `thumb` | string | Optional URL to project thumbnail to display in overview and project header. Recommended size is 100x100px. |
|
||||
| `image` | string | Optional URL to project image to display with description. |
|
||||
| `author` | string | Name(s) of project author(s). |
|
||||
| `author_links` | object | Usernames and links to display as icons to author info. Currently supports `twitter` and `github` usernames, as well as `website` link. |
|
||||
| `category` | list | One or more categories to assign to project. Must be one of the available options. |
|
||||
| `tags` | list | Still experimental and not used for filtering: one or more tags to assign to project. |
|
||||
|
||||
To separate them from the projects, educational materials also specify
|
||||
`"type": "education"`. Books can also set a `"cover"` field containing a URL
|
||||
to a cover image. If available, it's used in the overview and displayed on
|
||||
the individual book page.
|
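UNIVERSE.md asks contributors to verify their markup before opening a pull request. One way to do that without a dedicated linter is a short script; the sketch below is an assumption about how such a check could look, using a hypothetical two-entry excerpt and relying only on the `"resources"` list and the unique `id` field described above.

```python
import json

# Hypothetical excerpt of universe.json, for illustration only
raw = """
{
  "resources": [
    {"id": "unique-project-id", "title": "Project title"},
    {"id": "another-project", "title": "Another project"}
  ]
}
"""


def find_duplicate_ids(raw_json):
    """Parse the JSON and return a sorted list of ids used more than once."""
    data = json.loads(raw_json)  # raises ValueError if the markup is invalid
    seen, duplicates = set(), set()
    for entry in data["resources"]:
        if entry["id"] in seen:
            duplicates.add(entry["id"])
        seen.add(entry["id"])
    return sorted(duplicates)


duplicates = find_duplicate_ids(raw)
```

An empty list means the file parsed and every `id` is unique.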
|
@ -510,7 +510,7 @@ described in any single publication. The model is a greedy transition-based
|
|||
parser guided by a linear model whose weights are learned using the averaged
|
||||
perceptron loss, via the
|
||||
[dynamic oracle](http://www.aclweb.org/anthology/C12-1059) imitation learning
|
||||
strategy. The transition system is equivalent to the BILOU tagging scheme.
|
||||
strategy. The transition system is equivalent to the BILUO tagging scheme.
|
||||
|
||||
## Models and training data {#training}
|
||||
|
||||
|
|
|
@ -189,7 +189,7 @@ using the [`package`](/api/cli#package) command.
|
|||
|
||||
<Infobox title="Changed in v2.1" variant="warning">
|
||||
|
||||
As of spaCy 2.1, the `--no-tagger`, `--no-parser` and `--no-parser` flags have
|
||||
As of spaCy 2.1, the `--no-tagger`, `--no-parser` and `--no-entities` flags have
|
||||
been replaced by a `--pipeline` option, which lets you define comma-separated
|
||||
names of pipeline components to train. For example, `--pipeline tagger,parser`
|
||||
will only train the tagger and parser.
|
||||
|
@ -198,7 +198,7 @@ will only train the tagger and parser.
|
|||
|
||||
```bash
|
||||
$ python -m spacy train [lang] [output_path] [train_path] [dev_path]
|
||||
[--base-model] [--pipeline] [--vectors] [--n-iter] [--n-examples] [--use-gpu]
|
||||
[--base-model] [--pipeline] [--vectors] [--n-iter] [--n-early-stopping] [--n-examples] [--use-gpu]
|
||||
[--version] [--meta-path] [--init-tok2vec] [--parser-multitasks]
|
||||
[--entity-multitasks] [--gold-preproc] [--noise-level] [--learn-tokens]
|
||||
[--verbose]
|
||||
|
@ -210,10 +210,11 @@ $ python -m spacy train [lang] [output_path] [train_path] [dev_path]
|
|||
| `output_path` | positional | Directory to store model in. Will be created if it doesn't exist. |
|
||||
| `train_path` | positional | Location of JSON-formatted training data. Can be a file or a directory of files. |
|
||||
| `dev_path` | positional | Location of JSON-formatted development data for evaluation. Can be a file or a directory of files. |
|
||||
| `--base-model`, `-b` | option | Optional name of base model to update. Can be any loadable spaCy model. |
|
||||
| `--base-model`, `-b` <Tag variant="new">2.1</Tag> | option | Optional name of base model to update. Can be any loadable spaCy model. |
|
||||
| `--pipeline`, `-p` <Tag variant="new">2.1</Tag> | option | Comma-separated names of pipeline components to train. Defaults to `'tagger,parser,ner'`. |
|
||||
| `--vectors`, `-v` | option | Model to load vectors from. |
|
||||
| `--n-iter`, `-n` | option | Number of iterations (default: `30`). |
|
||||
| `--n-early-stopping`, `-ne` | option | Maximum number of training epochs without dev accuracy improvement. |
|
||||
| `--n-examples`, `-ns` | option | Number of examples to use (defaults to `0` for all examples). |
|
||||
| `--use-gpu`, `-g` | option | Whether to use GPU. Can be either `0`, `1` or `-1`. |
|
||||
| `--version`, `-V` | option | Model version. Will be written out to the model's `meta.json` after training. |
|
||||
|
@ -274,7 +275,7 @@ an approximate language-modeling objective. Specifically, we load pre-trained
|
|||
vectors, and train a component like a CNN, BiLSTM, etc to predict vectors which
|
||||
match the pre-trained ones. The weights are saved to a directory after each
|
||||
epoch. You can then pass a path to one of these pre-trained weights files to the
|
||||
'spacy train' command.
|
||||
`spacy train` command.
|
||||
|
||||
This technique may be especially helpful if you have little labelled data.
|
||||
However, it's still quite experimental, so your mileage may vary. To load the
|
||||
|
@ -285,24 +286,26 @@ improvement.
|
|||
```bash
|
||||
$ python -m spacy pretrain [texts_loc] [vectors_model] [output_dir] [--width]
|
||||
[--depth] [--embed-rows] [--dropout] [--seed] [--n-iter] [--use-vectors]
|
||||
[--n-save_every]
|
||||
```
|
||||
|
||||
| Argument | Type | Description |
|
||||
| ---------------------- | ---------- | --------------------------------------------------------------------------------------------------------------------------------- |
|
||||
| `texts_loc` | positional | Path to JSONL file with raw texts to learn from, with text provided as the key `"text"`. [See here](#pretrain-jsonl) for details. |
|
||||
| `vectors_model` | positional | Name or path to spaCy model with vectors to learn from. |
|
||||
| `output_dir` | positional | Directory to write models to on each epoch. |
|
||||
| `--width`, `-cw` | option | Width of CNN layers. |
|
||||
| `--depth`, `-cd` | option | Depth of CNN layers. |
|
||||
| `--embed-rows`, `-er` | option | Number of embedding rows. |
|
||||
| `--dropout`, `-d` | option | Dropout rate. |
|
||||
| `--batch-size`, `-bs` | option | Number of words per training batch. |
|
||||
| `--max-length`, `-xw` | option | Maximum words per example. Longer examples are discarded. |
|
||||
| `--min-length`, `-nw` | option | Minimum words per example. Shorter examples are discarded. |
|
||||
| `--seed`, `-s` | option | Seed for random number generators. |
|
||||
| `--n-iter`, `-i` | option | Number of iterations to pretrain. |
|
||||
| `--use-vectors`, `-uv` | flag | Whether to use the static vectors as input features. |
|
||||
| **CREATES** | weights | The pre-trained weights that can be used to initialize `spacy train`. |
|
||||
| Argument | Type | Description |
|
||||
| ----------------------- | ---------- | --------------------------------------------------------------------------------------------------------------------------------- |
|
||||
| `texts_loc` | positional | Path to JSONL file with raw texts to learn from, with text provided as the key `"text"`. [See here](#pretrain-jsonl) for details. |
|
||||
| `vectors_model` | positional | Name or path to spaCy model with vectors to learn from. |
|
||||
| `output_dir` | positional | Directory to write models to on each epoch. |
|
||||
| `--width`, `-cw` | option | Width of CNN layers. |
|
||||
| `--depth`, `-cd` | option | Depth of CNN layers. |
|
||||
| `--embed-rows`, `-er` | option | Number of embedding rows. |
|
||||
| `--dropout`, `-d` | option | Dropout rate. |
|
||||
| `--batch-size`, `-bs` | option | Number of words per training batch. |
|
||||
| `--max-length`, `-xw` | option | Maximum words per example. Longer examples are discarded. |
|
||||
| `--min-length`, `-nw` | option | Minimum words per example. Shorter examples are discarded. |
|
||||
| `--seed`, `-s` | option | Seed for random number generators. |
|
||||
| `--n-iter`, `-i` | option | Number of iterations to pretrain. |
|
||||
| `--use-vectors`, `-uv` | flag | Whether to use the static vectors as input features. |
|
||||
| `--n-save_every`, `-se` | option | Save model every X batches. |
|
||||
| **CREATES** | weights | The pre-trained weights that can be used to initialize `spacy train`. |
|
||||
|
||||
### JSONL format for raw text {#pretrain-jsonl}
|
||||
|
||||
|
@ -324,7 +327,7 @@ tokenization can be provided.
|
|||
|
||||
| Key | Type | Description |
|
||||
| -------- | ------- | -------------------------------------------- |
|
||||
| `text` | unicode | The raw input text. |
|
||||
| `text` | unicode | The raw input text. Not required if `tokens` is provided. |
|
||||
| `tokens` | list | Optional tokenization, one string per token. |
|
||||
|
||||
```json
|
||||
|
@ -332,6 +335,7 @@ tokenization can be provided.
|
|||
{"text": "Can I ask where you work now and what you do, and if you enjoy it?"}
|
||||
{"text": "They may just pull out of the Seattle market completely, at least until they have autonomous vehicles."}
|
||||
{"text": "My cynical view on this is that it will never be free to the public. Reason: what would be the draw of joining the military? Right now their selling point is free Healthcare and Education. Ironically both are run horribly and most, that I've talked to, come out wishing they never went in."}
|
||||
{"tokens": ["If", "tokens", "are", "provided", "then", "we", "can", "skip", "the", "raw", "input", "text"]}
|
||||
```
|
||||
|
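A file in the JSONL format shown above can be produced with the standard library alone. The following is a minimal sketch, assuming each record carries either a `"text"` or a `"tokens"` key as described in the table; `to_jsonl` is a hypothetical helper name, not part of spaCy.

```python
import json

records = [
    {"text": "Can I ask where you work now and what you do, and if you enjoy it?"},
    {"tokens": ["If", "tokens", "are", "provided"]},
]


def to_jsonl(records):
    """Serialize records as JSONL: one JSON object per line."""
    lines = []
    for record in records:
        # Each record needs raw text or a pre-tokenized list of strings
        if "text" not in record and "tokens" not in record:
            raise ValueError("record needs a 'text' or 'tokens' key")
        lines.append(json.dumps(record))
    return "\n".join(lines) + "\n"


jsonl = to_jsonl(records)
# Round trip: every line parses back into the original record
parsed = [json.loads(line) for line in jsonl.splitlines()]
```

Writing `jsonl` to a file gives an input that can be passed as `texts_loc`.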
||||
## Init Model {#init-model new="2"}
|
||||
|
@ -375,7 +379,7 @@ pipeline.
|
|||
|
||||
```bash
|
||||
$ python -m spacy evaluate [model] [data_path] [--displacy-path] [--displacy-limit]
|
||||
[--gpu-id] [--gold-preproc]
|
||||
[--gpu-id] [--gold-preproc] [--return-scores]
|
||||
```
|
||||
|
||||
| Argument | Type | Description |
|
||||
|
@ -386,6 +390,7 @@ $ python -m spacy evaluate [model] [data_path] [--displacy-path] [--displacy-lim
|
|||
| `--displacy-limit`, `-dl` | option | Number of parses to generate per file. Defaults to `25`. Keep in mind that a significantly higher number might cause the `.html` files to render slowly. |
|
||||
| `--gpu-id`, `-g` | option | GPU to use, if any. Defaults to `-1` for CPU. |
|
||||
| `--gold-preproc`, `-G` | flag | Use gold preprocessing. |
|
||||
| `--return-scores`, `-R` | flag | Return dict containing model scores. |
|
||||
| **CREATES** | `stdout`, HTML | Training results and optional displaCy visualizations. |
|
||||
|
||||
## Package {#package}
|
||||
|
|
|
@ -172,7 +172,7 @@ struct.
|
|||
| `prefix` | <Abbr title="uint64_t">`attr_t`</Abbr> | Length-N substring from the start of the lexeme. Defaults to `N=1`. |
|
||||
| `suffix` | <Abbr title="uint64_t">`attr_t`</Abbr> | Length-N substring from the end of the lexeme. Defaults to `N=3`. |
|
||||
| `cluster` | <Abbr title="uint64_t">`attr_t`</Abbr> | Brown cluster ID. |
|
||||
| `prob` | `float` | Smoothed log probability estimate of the lexeme's type. |
|
||||
| `prob` | `float` | Smoothed log probability estimate of the lexeme's word type (context-independent entry in the vocabulary). |
|
||||
| `sentiment` | `float` | A scalar value indicating positivity or negativity. |
|
||||
|
||||
### Lexeme.get_struct_attr {#lexeme_get_struct_attr tag="staticmethod, nogil" source="spacy/lexeme.pxd"}
|
||||
|
|
|
@ -102,10 +102,10 @@ Apply the pipeline's model to a batch of docs, without modifying them.
|
|||
> scores = parser.predict([doc1, doc2])
|
||||
> ```
|
||||
|
||||
| Name | Type | Description |
|
||||
| ----------- | -------- | ---------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------- |
|
||||
| `docs` | iterable | The documents to predict. |
|
||||
| **RETURNS** | tuple | A `(scores, tensors)` tuple where `scores` is the model's prediction for each document and `tensors` is the token representations used to predict the scores. Each tensor is an array with one row for each token in the document. |
|
||||
| Name | Type | Description |
|
||||
| ----------- | ------------------- | ---------------------------------------------- |
|
||||
| `docs` | iterable | The documents to predict. |
|
||||
| **RETURNS** | `syntax.StateClass` | A helper class for the parse state (internal). |
|
||||
|
||||
## DependencyParser.set_annotations {#set_annotations tag="method"}
|
||||
|
||||
|
|
|
@ -119,8 +119,27 @@ Update the models in the pipeline.
|
|||
| `golds` | iterable | A batch of `GoldParse` objects or dictionaries. Dictionaries will be used to create [`GoldParse`](/api/goldparse) objects. For the available keys and their usage, see [`GoldParse.__init__`](/api/goldparse#init). |
|
||||
| `drop` | float | The dropout rate. |
|
||||
| `sgd` | callable | An optimizer. |
|
||||
| `losses` | dict | Dictionary to update with the loss, keyed by pipeline component. |
|
||||
| `component_cfg` <Tag variant="new">2.1</Tag> | dict | Config parameters for specific pipeline components, keyed by component name. |
|
||||
| **RETURNS** | dict | Results from the update. |
|
||||
|
||||
## Language.evaluate {#evaluate tag="method"}
|
||||
|
||||
Evaluate a model's pipeline components.
|
||||
|
||||
> #### Example
|
||||
>
|
||||
> ```python
|
||||
> scorer = nlp.evaluate(docs_golds, verbose=True)
|
||||
> print(scorer.scores)
|
||||
> ```
|
||||
|
||||
| Name | Type | Description |
|
||||
| -------------------------------------------- | -------- | ------------------------------------------------------------------------------------- |
|
||||
| `docs_golds` | iterable | Tuples of `Doc` and `GoldParse` objects. |
|
||||
| `verbose` | bool | Print debugging information. |
|
||||
| `batch_size` | int | The batch size to use. |
|
||||
| `scorer` | `Scorer` | Optional [`Scorer`](/api/scorer) to use. If not passed in, a new one will be created. |
|
||||
| `component_cfg` <Tag variant="new">2.1</Tag> | dict | Config parameters for specific pipeline components, keyed by component name. |
|
||||
|
||||
## Language.begin_training {#begin_training tag="method"}
|
||||
|
||||
|
|
|
@ -128,7 +128,6 @@ The L2 norm of the lexeme's vector representation.
|
|||
| `text` | unicode | Verbatim text content. |
|
||||
| `orth` | int | ID of the verbatim text content. |
|
||||
| `orth_` | unicode | Verbatim text content (identical to `Lexeme.text`). Exists mostly for consistency with the other attributes. |
|
||||
| `lex_id` | int | ID of the lexeme's lexical type. |
|
||||
| `rank` | int | Sequential ID of the lexeme's lexical type, used to index into tables, e.g. for word vectors. |
|
||||
| `flags` | int | Container of the lexeme's binary flags. |
|
||||
| `norm` | int | The lexeme's norm, i.e. a normalized form of the lexeme text. |
|
||||
|
@ -161,6 +160,6 @@ The L2 norm of the lexeme's vector representation.
|
|||
| `is_stop` | bool | Is the lexeme part of a "stop list"? |
|
||||
| `lang` | int | Language of the parent vocabulary. |
|
||||
| `lang_` | unicode | Language of the parent vocabulary. |
|
||||
| `prob` | float | Smoothed log probability estimate of the lexeme's type. |
|
||||
| `prob` | float | Smoothed log probability estimate of the lexeme's word type (context-independent entry in the vocabulary). |
|
||||
| `cluster` | int | Brown cluster ID. |
|
||||
| `sentiment` | float | A scalar value indicating the positivity or negativity of the lexeme. |
|
||||
|
|
58
website/docs/api/scorer.md
Normal file
|
@ -0,0 +1,58 @@
|
|||
---
|
||||
title: Scorer
|
||||
teaser: Compute evaluation scores
|
||||
tag: class
|
||||
source: spacy/scorer.py
|
||||
---
|
||||
|
||||
The `Scorer` computes and stores evaluation scores. It's typically created by
|
||||
[`Language.evaluate`](/api/language#evaluate).
|
||||
|
||||
## Scorer.\_\_init\_\_ {#init tag="method"}
|
||||
|
||||
Create a new `Scorer`.
|
||||
|
||||
> #### Example
|
||||
>
|
||||
> ```python
|
||||
> from spacy.scorer import Scorer
|
||||
>
|
||||
> scorer = Scorer()
|
||||
> ```
|
||||
|
||||
| Name | Type | Description |
|
||||
| ------------ | -------- | ------------------------------------------------------------ |
|
||||
| `eval_punct` | bool | Evaluate the dependency attachments to and from punctuation. |
|
||||
| **RETURNS** | `Scorer` | The newly created object. |
|
||||
|
||||
## Scorer.score {#score tag="method"}
|
||||
|
||||
Update the evaluation scores from a single [`Doc`](/api/doc) /
|
||||
[`GoldParse`](/api/goldparse) pair.
|
||||
|
||||
> #### Example
|
||||
>
|
||||
> ```python
|
||||
> scorer = Scorer()
|
||||
> scorer.score(doc, gold)
|
||||
> ```
|
||||
|
||||
| Name | Type | Description |
|
||||
| -------------- | ----------- | -------------------------------------------------------------------------------------------------------------------- |
|
||||
| `doc` | `Doc` | The predicted annotations. |
|
||||
| `gold` | `GoldParse` | The correct annotations. |
|
||||
| `verbose` | bool | Print debugging information. |
|
||||
| `punct_labels` | tuple | Dependency labels for punctuation. Used to evaluate dependency attachments to punctuation if `eval_punct` is `True`. |
|
||||
|
||||
## Properties
|
||||
|
||||
| Name | Type | Description |
|
||||
| ----------- | ----- | -------------------------------------------------------------------------------------------- |
|
||||
| `token_acc` | float | Tokenization accuracy. |
|
||||
| `tags_acc` | float | Part-of-speech tag accuracy (fine grained tags, i.e. `Token.tag`). |
|
||||
| `uas` | float | Unlabelled dependency score. |
|
||||
| `las` | float | Labelled dependency score. |
|
||||
| `ents_p` | float | Named entity accuracy (precision). |
|
||||
| `ents_r` | float | Named entity accuracy (recall). |
|
||||
| `ents_f` | float | Named entity accuracy (F-score). |
|
||||
| `scores` | dict | All scores with keys `uas`, `las`, `ents_p`, `ents_r`, `ents_f`, `tags_acc` and `token_acc`. |
|
|
@ -424,7 +424,7 @@ The L2 norm of the token's vector representation.
|
|||
| `ent_type` | int | Named entity type. |
|
||||
| `ent_type_` | unicode | Named entity type. |
|
||||
| `ent_iob` | int | IOB code of named entity tag. `3` means the token begins an entity, `2` means it is outside an entity, `1` means it is inside an entity, and `0` means no entity tag is set. |
|
||||
| `ent_iob_` | unicode | IOB code of named entity tag. `3` means the token begins an entity, `2` means it is outside an entity, `1` means it is inside an entity, and `0` means no entity tag is set. |
|
||||
| `ent_iob_` | unicode | IOB code of named entity tag. "B" means the token begins an entity, "I" means it is inside an entity, "O" means it is outside an entity, and "" means no entity tag is set. |
|
||||
| `ent_id` | int | ID of the entity the token is an instance of, if any. Currently not used, but potentially for coreference resolution. |
|
||||
| `ent_id_` | unicode | ID of the entity the token is an instance of, if any. Currently not used, but potentially for coreference resolution. |
|
||||
| `lemma` | int | Base form of the token, with no inflectional suffixes. |
|
||||
|
@ -465,10 +465,10 @@ The L2 norm of the token's vector representation.
|
|||
| `dep_` | unicode | Syntactic dependency relation. |
|
||||
| `lang` | int | Language of the parent document's vocabulary. |
|
||||
| `lang_` | unicode | Language of the parent document's vocabulary. |
|
||||
| `prob` | float | Smoothed log probability estimate of token's type. |
|
||||
| `prob` | float | Smoothed log probability estimate of token's word type (context-independent entry in the vocabulary). |
|
||||
| `idx` | int | The character offset of the token within the parent document. |
|
||||
| `sentiment` | float | A scalar value indicating the positivity or negativity of the token. |
|
||||
| `lex_id` | int | Sequential ID of the token's lexical type. |
|
||||
| `lex_id` | int | Sequential ID of the token's lexical type, used to index into tables, e.g. for word vectors. |
|
||||
| `rank` | int | Sequential ID of the token's lexical type, used to index into tables, e.g. for word vectors. |
|
||||
| `cluster` | int | Brown cluster ID. |
|
||||
| `_` | `Underscore` | User space for adding custom [attribute extensions](/usage/processing-pipelines#custom-components-attributes). |
|
||||
|
|
|
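The `ent_iob` and `ent_iob_` rows in the hunk above describe a single integer-to-string mapping. A minimal sketch of that mapping in plain Python, assuming only the codes listed in the table; `IOB_CODES` and `iob_to_str` are hypothetical names for illustration, not part of spaCy:

```python
# Integer IOB codes as documented for Token.ent_iob, mapped to the
# string values documented for Token.ent_iob_.
IOB_CODES = {0: "", 1: "I", 2: "O", 3: "B"}


def iob_to_str(code):
    """Return the string IOB tag for an integer code (hypothetical helper)."""
    try:
        return IOB_CODES[code]
    except KeyError:
        raise ValueError("Unknown IOB code: {!r}".format(code))


# Begin, inside, outside, and "no tag set", in that order.
tags = [iob_to_str(code) for code in (3, 1, 2, 0)]
print(tags)
```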
@ -211,16 +211,16 @@ Render a dependency parse tree or named entity visualization.
|
|||
> html = displacy.render(doc, style="dep")
|
||||
> ```
|
||||
|
||||
| Name | Type | Description | Default |
|
||||
| ----------- | ------------------- | ------------------------------------------------------------------------------------------------------------------------------------ | ---------------------- |
|
||||
| `docs` | list, `Doc`, `Span` | Document(s) to visualize. |
|
||||
| `style` | unicode | Visualization style, `'dep'` or `'ent'`. | `'dep'` |
|
||||
| `page` | bool | Render markup as full HTML page. | `False` |
|
||||
| `minify` | bool | Minify HTML markup. | `False` |
|
||||
| `jupyter` | bool | Explicitly enable "[Jupyter](http://jupyter.org/) mode" to return markup ready to be rendered in a notebook. | detected automatically |
|
||||
| `options` | dict | [Visualizer-specific options](#options), e.g. colors. | `{}` |
|
||||
| `manual` | bool | Don't parse `Doc` and instead, expect a dict or list of dicts. [See here](/usage/visualizers#manual-usage) for formats and examples. | `False` |
|
||||
| **RETURNS** | unicode | Rendered HTML markup. |
|
||||
| Name | Type | Description | Default |
|
||||
| ----------- | ------------------- | --------------------------------------------------------------------------------------------------------------------------------------------------------- | ------- |
|
||||
| `docs` | list, `Doc`, `Span` | Document(s) to visualize. |
|
||||
| `style` | unicode | Visualization style, `'dep'` or `'ent'`. | `'dep'` |
|
||||
| `page` | bool | Render markup as full HTML page. | `False` |
|
||||
| `minify` | bool | Minify HTML markup. | `False` |
|
||||
| `jupyter` | bool | Explicitly enable or disable "[Jupyter](http://jupyter.org/) mode" to return markup ready to be rendered in a notebook. Detected automatically if `None`. | `None` |
|
||||
| `options` | dict | [Visualizer-specific options](#options), e.g. colors. | `{}` |
|
||||
| `manual` | bool | Don't parse `Doc` and instead, expect a dict or list of dicts. [See here](/usage/visualizers#manual-usage) for formats and examples. | `False` |
|
||||
| **RETURNS** | unicode | Rendered HTML markup. |
|
||||
|
||||
### Visualizer options {#displacy_options}
|
||||
|
||||
|
@ -351,7 +351,7 @@ the two-letter language code.
|
|||
| `name` | unicode | Two-letter language code, e.g. `'en'`. |
|
||||
| `cls` | `Language` | The language class, e.g. `English`. |
|
||||
|
||||
### util.lang_class_is_loaded (#util.lang_class_is_loaded tag="function" new="2.1")
|
||||
### util.lang_class_is_loaded {#util.lang_class_is_loaded tag="function" new="2.1"}
|
||||
|
||||
Check whether a `Language` class is already loaded. `Language` classes are
|
||||
loaded lazily, to avoid expensive setup code associated with the language data.
|
||||
|
@ -654,6 +654,27 @@ for batching. Larger `buffsize` means less bias.
|
|||
| `buffsize` | int | Items to hold back. |
|
||||
| **YIELDS** | iterable | The shuffled iterator. |
|
||||
|
||||
### util.filter_spans {#util.filter_spans tag="function" new="2.1.4"}
|
||||
|
||||
Filter a sequence of [`Span`](/api/span) objects and remove duplicates or
|
||||
overlaps. Useful for creating named entities (where one token can only be part
|
||||
of one entity) or when merging spans with
|
||||
[`Retokenizer.merge`](/api/doc#retokenizer.merge). When spans overlap, the
|
||||
(first) longest span is preferred over shorter spans.
|
||||
|
||||
> #### Example
|
||||
>
|
||||
> ```python
|
||||
> doc = nlp("This is a sentence.")
|
||||
> spans = [doc[0:2], doc[0:2], doc[0:4]]
|
||||
> filtered = filter_spans(spans)
|
||||
> ```
|
||||
|
||||
| Name | Type | Description |
|
||||
| ----------- | -------- | -------------------- |
|
||||
| `spans` | iterable | The spans to filter. |
|
||||
| **RETURNS** | list | The filtered spans. |
|
||||
|
||||
## Compatibility functions {#compat source="spacy/compat.py"}
|
||||
|
||||
All Python code is written in an **intersection of Python 2 and Python 3**. This
|
||||
|
|
|
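The rule stated for `util.filter_spans` (drop duplicates and overlaps, prefer the first longest span) can be sketched without spaCy by working on `(start, end)` token offsets. This is an illustration of the documented behavior under that assumption, not necessarily how the library implements it; `filter_offsets` is a hypothetical name:

```python
def filter_offsets(spans):
    """Keep the longest non-overlapping (start, end) pairs.

    Ties in length go to the span that starts earlier.
    """
    # Longest first; for equal length, the earlier start sorts first.
    ordered = sorted(spans, key=lambda s: (s[1] - s[0], -s[0]), reverse=True)
    kept = []
    seen = set()  # token indices already claimed by a kept span
    for start, end in ordered:
        # Candidates are never longer than a kept span, so checking
        # both end points is enough to detect an overlap.
        if start not in seen and end - 1 not in seen:
            kept.append((start, end))
            seen.update(range(start, end))
    return sorted(kept)


# Offsets corresponding to [doc[0:2], doc[0:2], doc[0:4]] in the example above.
print(filter_offsets([(0, 2), (0, 2), (0, 4)]))
```

With the offsets from the documented example, only the four-token span survives, since both shorter spans overlap it.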
@ -306,7 +306,7 @@ vectors, they will be counted individually.
|
|||
|
||||
Load [GloVe](https://nlp.stanford.edu/projects/glove/) vectors from a directory.
|
||||
Assumes binary format, that the vocab is in a `vocab.txt`, and that vectors are
|
||||
named `vectors.{size}.[fd`.bin], e.g. `vectors.128.f.bin` for 128d float32
|
||||
named `vectors.{size}.[fd.bin]`, e.g. `vectors.128.f.bin` for 128d float32
|
||||
vectors, `vectors.300.d.bin` for 300d float64 (double) vectors, etc. By default
|
||||
GloVe outputs 64-bit vectors.
|
||||
|
||||
|
|
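The naming scheme in this hunk is easiest to read from its two examples, `vectors.128.f.bin` and `vectors.300.d.bin`. A small sketch that parses such names, assuming the pattern `vectors.<size>.<f or d>.bin` inferred from those examples; `parse_vectors_name` is a hypothetical helper, not a spaCy function:

```python
import re

# Pattern inferred from the documented examples:
# vectors.128.f.bin (float32) and vectors.300.d.bin (float64).
VECTORS_NAME = re.compile(r"^vectors\.(\d+)\.([fd])\.bin$")
DTYPES = {"f": "float32", "d": "float64"}


def parse_vectors_name(filename):
    """Return (width, dtype name) for a vectors file name (hypothetical helper)."""
    match = VECTORS_NAME.match(filename)
    if match is None:
        raise ValueError("Unexpected vectors file name: {}".format(filename))
    return int(match.group(1)), DTYPES[match.group(2)]


print(parse_vectors_name("vectors.128.f.bin"))
print(parse_vectors_name("vectors.300.d.bin"))
```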
Binary file not shown.
Before Width: | Height: | Size: 1.6 MiB |
BIN
website/docs/images/course.jpg
Normal file
Binary file not shown.
After Width: | Height: | Size: 270 KiB |
|
@ -4,7 +4,7 @@ example, everything that's in your `nlp` object. This means you'll have to
|
|||
translate its contents and structure into a format that can be saved, like a
|
||||
file or a byte string. This process is called serialization. spaCy comes with
|
||||
**built-in serialization methods** and supports the
|
||||
[Pickle protocol](http://www.diveintopython3.net/serializing.html#dump).
|
||||
[Pickle protocol](https://www.diveinto.org/python3/serializing.html#dump).
|
||||
|
||||
> #### What's pickle?
|
||||
>
|
||||
|
|
|
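The serialization idea described in this hunk (turning an object into a byte string and back) can be shown with the standard library alone. A minimal sketch, assuming a plain dictionary as a stand-in for whatever object needs saving; nothing here is spaCy-specific:

```python
import pickle

# Stand-in for any picklable Python object (an assumption for illustration).
data = {"text": "This is a sentence.", "tokens": ["This", "is", "a", "sentence", "."]}

# Serialize to a byte string, then restore an equal copy from it.
payload = pickle.dumps(data)
restored = pickle.loads(payload)

print(type(payload).__name__, restored == data)
```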
@ -50,7 +50,7 @@ together.
|
|||
|
||||
## Benchmarks {#benchmarks}
|
||||
|
||||
Two peer-reviewed papers in 2015 confirm that spaCy offers the **fastest
|
||||
Two peer-reviewed papers in 2015 confirmed that spaCy offers the **fastest
|
||||
syntactic parser in the world** and that **its accuracy is within 1% of the
|
||||
best** available. The few systems that are more accurate are 20× slower or more.
|
||||
|
||||
|
|
|
@ -326,7 +326,7 @@ URLs.
|
|||
```text
|
||||
### requirements.txt
|
||||
spacy>=2.0.0,<3.0.0
|
||||
https://github.com/spacy-models/releases/download/en_core_web_sm-2.0.0/en_core_web_sm-2.0.0.tar.gz#egg=en_core_web_sm
|
||||
https://github.com/explosion/spacy-models/releases/download/en_core_web_sm-2.1.0/en_core_web_sm-2.1.0.tar.gz#egg=en_core_web_sm
|
||||
```
|
||||
|
||||
Specifying `#egg=` with the package name tells pip which package to expect from
|
||||
|
|
|
@ -260,7 +260,7 @@ def my_component(doc):
|
|||
|
||||
nlp = spacy.load("en_core_web_sm")
|
||||
nlp.add_pipe(my_component, name="print_info", last=True)
|
||||
print(nlp.pipe_names) # ['print_info', 'tagger', 'parser', 'ner']
|
||||
print(nlp.pipe_names) # ['tagger', 'parser', 'ner', 'print_info']
|
||||
doc = nlp(u"This is a sentence.")
|
||||
|
||||
```
|
||||
|
|
|
@ -214,7 +214,8 @@ example, you might want to match different spellings of a word, without having
|
|||
to add a new pattern for each spelling.
|
||||
|
||||
```python
|
||||
pattern = [{"TEXT": {"REGEX": "^([Uu](\\.?|nited) ?[Ss](\\.?|tates)"}},
|
||||
pattern = [{"TEXT": {"REGEX": "^[Uu](\\.?|nited)$"}},
|
||||
{"TEXT": {"REGEX": "^[Ss](\\.?|tates)$"}},
|
||||
{"LOWER": "president"}]
|
||||
```
|
||||
|
||||
|
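The change in the hunk above splits one expression into one expression per token, which suggests each `REGEX` is tested against a single token's text. That reading can be sketched with the standard `re` module on a pre-split token list. Whitespace splitting is an assumption here and differs from spaCy's tokenizer; `matches_at` is a hypothetical helper:

```python
import re

# Per-token expressions copied from the updated pattern in the hunk above.
TOKEN_PATTERNS = [
    re.compile(r"^[Uu](\.?|nited)$"),
    re.compile(r"^[Ss](\.?|tates)$"),
]


def matches_at(tokens, start):
    """Check whether the per-token patterns match consecutive tokens from `start`."""
    window = tokens[start:start + len(TOKEN_PATTERNS)]
    if len(window) < len(TOKEN_PATTERNS):
        return False
    return all(p.search(t) for p, t in zip(TOKEN_PATTERNS, window))


tokens = "The United States president spoke".split()
hits = [i for i in range(len(tokens)) if matches_at(tokens, i)]
print(hits)
```

Note that the removed single-token expression also has an unclosed group, so `re.compile` would reject it outright.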
@ -227,7 +228,7 @@ attributes:
|
|||
pattern = [{"TAG": {"REGEX": "^V"}}]
|
||||
|
||||
# Match custom attribute values with regular expressions
|
||||
pattern = [{"_": {"country": {"REGEX": "^([Uu](\\.?|nited) ?[Ss](\\.?|tates)"}}}]
|
||||
pattern = [{"_": {"country": {"REGEX": "^[Uu](\\.?|nited) ?[Ss](\\.?|tates)$"}}}]
|
||||
```
|
||||
|
||||
<Infobox title="Regular expressions in older versions" variant="warning">
|
||||
|
@ -404,7 +405,7 @@ class BadHTMLMerger(object):
|
|||
for match_id, start, end in matches:
|
||||
spans.append(doc[start:end])
|
||||
with doc.retokenize() as retokenizer:
|
||||
for span in hashtags:
|
||||
for span in spans:
|
||||
retokenizer.merge(span)
|
||||
for token in span:
|
||||
token._.bad_html = True # Mark token as bad HTML
|
||||
|
@ -678,7 +679,7 @@ for match_id, start, end in matches:
|
|||
if doc.vocab.strings[match_id] == "HASHTAG":
|
||||
hashtags.append(doc[start:end])
|
||||
with doc.retokenize() as retokenizer:
|
||||
for span in spans:
|
||||
for span in hashtags:
|
||||
retokenizer.merge(span)
|
||||
for token in span:
|
||||
token._.is_hashtag = True
|
||||
|
@ -712,9 +713,9 @@ from spacy.matcher import PhraseMatcher
|
|||
|
||||
nlp = spacy.load('en_core_web_sm')
|
||||
matcher = PhraseMatcher(nlp.vocab)
|
||||
terminology_list = [u"Barack Obama", u"Angela Merkel", u"Washington, D.C."]
|
||||
terms = [u"Barack Obama", u"Angela Merkel", u"Washington, D.C."]
|
||||
# Only run nlp.make_doc to speed things up
|
||||
patterns = [nlp.make_doc(text) for text in terminology_list]
|
||||
patterns = [nlp.make_doc(text) for text in terms]
|
||||
matcher.add("TerminologyList", None, *patterns)
|
||||
|
||||
doc = nlp(u"German Chancellor Angela Merkel and US President Barack Obama "
|
||||
|
|
|
@ -29,6 +29,19 @@ quick introduction.
|
|||
> [pull requests](https://github.com/explosion/spaCy/pulls). You can find a
|
||||
> "Suggest edits" link at the bottom of each page that points you to the source.
|
||||
|
||||
<Infobox title="Take the free interactive course">
|
||||
|
||||
[![Advanced NLP with spaCy](../images/course.jpg)](https://course.spacy.io)
|
||||
|
||||
In this course you'll learn how to use spaCy to build advanced natural language
|
||||
understanding systems, using both rule-based and machine learning approaches. It
|
||||
includes 55 exercises featuring interactive coding practice, multiple-choice
|
||||
questions and slide decks.
|
||||
|
||||
<p><Button to="https://course.spacy.io" variant="primary">Start the course</Button></p>
|
||||
|
||||
</Infobox>
|
||||
|
||||
## What's spaCy? {#whats-spacy}
|
||||
|
||||
<Grid cols={2}>
|
||||
|
@ -89,27 +102,12 @@ systems, or to pre-process text for **deep learning**.
|
|||
integrated and opinionated. spaCy tries to avoid asking the user to choose
|
||||
between multiple algorithms that deliver equivalent functionality. Keeping the
|
||||
menu small lets spaCy deliver generally better performance and developer
|
||||
experience.M
|
||||
experience.
|
||||
|
||||
- **spaCy is not a company**. It's an open-source library. Our company
|
||||
publishing spaCy and other software is called
|
||||
[Explosion AI](https://explosion.ai).
|
||||
|
||||
<Infobox title="Download the spaCy Cheat Sheet!">
|
||||
|
||||
[![spaCy Cheatsheet](../images/cheatsheet.jpg)](http://datacamp-community-prod.s3.amazonaws.com/29aa28bf-570a-4965-8f54-d6a541ae4e06)
|
||||
|
||||
For the launch of our
|
||||
["Advanced NLP with spaCy"](https://www.datacamp.com/courses/advanced-nlp-with-spacy)
|
||||
course on DataCamp we created the first official spaCy cheat sheet! A handy
|
||||
two-page reference to the most important concepts and features, from loading
|
||||
models and accessing linguistic annotations, to custom pipeline components and
|
||||
rule-based matching.
|
||||
|
||||
<p><Button to="http://datacamp-community-prod.s3.amazonaws.com/29aa28bf-570a-4965-8f54-d6a541ae4e06" variant="primary">Download</Button></p>
|
||||
|
||||
</Infobox>
|
||||
|
||||
## Features {#features}
|
||||
|
||||
In the documentation, you'll come across mentions of spaCy's features and
|
||||
|
|
|
@ -136,7 +136,7 @@ The entity visualizer lets you customize the following `options`:
|
|||
| Argument | Type | Description | Default |
|
||||
| -------- | ---- | ------------------------------------------------------------------------------------- | ------- |
|
||||
| `ents` | list | Entity types to highlight (`None` for all types). | `None` |
|
||||
| `colors` | dict | Color overrides. Entity types in lowercase should be mapped to color names or values. | `{}` |
|
||||
| `colors` | dict | Color overrides. Entity types in uppercase should be mapped to color names or values. | `{}` |
|
||||
|
||||
If you specify a list of `ents`, only those entity types will be rendered – for
|
||||
example, you can choose to display `PERSON` entities. Internally, the visualizer
|
||||
|
|
|
@ -90,7 +90,8 @@
|
|||
{ "text": "StringStore", "url": "/api/stringstore" },
|
||||
{ "text": "Vectors", "url": "/api/vectors" },
|
||||
{ "text": "GoldParse", "url": "/api/goldparse" },
|
||||
{ "text": "GoldCorpus", "url": "/api/goldcorpus" }
|
||||
{ "text": "GoldCorpus", "url": "/api/goldcorpus" },
|
||||
{ "text": "Scorer", "url": "/api/scorer" }
|
||||
]
|
||||
},
|
||||
{
|
||||
|
|
|
@ -1,5 +1,107 @@
|
|||
{
|
||||
"resources": [
|
||||
{
|
||||
"id": "nlp-architect",
|
||||
"title": "NLP Architect",
|
||||
"slogan": "Python lib for exploring Deep NLP & NLU by Intel AI",
|
||||
"github": "NervanaSystems/nlp-architect",
|
||||
"pip": "nlp-architect",
|
||||
"thumb": "https://i.imgur.com/vMideRx.png",
|
||||
"category": ["standalone", "research"],
|
||||
"tags": ["pytorch"]
|
||||
},
|
||||
{
|
||||
"id": "NeuroNER",
|
||||
"title": "NeuroNER",
|
||||
"slogan": "Named-entity recognition using neural networks",
|
||||
"github": "Franck-Dernoncourt/NeuroNER",
|
||||
"pip": "pyneuroner[cpu]",
|
||||
"code_example": [
|
||||
"from neuroner import neuromodel",
|
||||
"nn = neuromodel.NeuroNER(train_model=False, use_pretrained_model=True)"
|
||||
],
|
||||
"category": ["ner"],
|
||||
"tags": ["standalone"]
|
||||
},
|
||||
{
|
||||
"id": "NLPre",
|
||||
"title": "NLPre",
|
||||
"slogan": "Natural Language Preprocessing Library for health data and more",
|
||||
"github": "NIHOPA/NLPre",
|
||||
"pip": "nlpre",
|
||||
"code_example": [
|
||||
"from nlpre import titlecaps, dedash, identify_parenthetical_phrases",
|
||||
"from nlpre import replace_acronyms, replace_from_dictionary",
|
||||
"ABBR = identify_parenthetical_phrases()(text)",
|
||||
"parsers = [dedash(), titlecaps(), replace_acronyms(ABBR),",
|
||||
" replace_from_dictionary(prefix='MeSH_')]",
|
||||
"for f in parsers:",
|
||||
" text = f(text)",
|
||||
"print(text)"
|
||||
],
|
||||
"category": ["scientific"]
|
||||
},
|
||||
{
|
||||
"id": "Chatterbot",
|
||||
"title": "Chatterbot",
|
||||
"slogan": "A machine-learning based conversational dialog engine for creating chat bots",
|
||||
"github": "gunthercox/ChatterBot",
|
||||
"pip": "chatterbot",
|
||||
"thumb": "https://i.imgur.com/eyAhwXk.jpg",
|
||||
"code_example": [
|
||||
"from chatterbot import ChatBot",
|
||||
"from chatterbot.trainers import ListTrainer",
|
||||
"# Create a new chat bot named Charlie",
|
||||
"chatbot = ChatBot('Charlie')",
|
||||
"trainer = ListTrainer(chatbot)",
|
||||
"trainer.train([",
|
||||
"'Hi, can I help you?',",
|
||||
"'Sure, I would like to book a flight to Iceland.',",
|
||||
"'Your flight has been booked.'",
|
||||
"])",
|
||||
"",
|
||||
"response = chatbot.get_response('I would like to book a flight.')"
|
||||
],
|
||||
"author": "Gunther Cox",
|
||||
"author_links": {
|
||||
"github": "gunthercox"
|
||||
},
|
||||
"category": ["conversational", "standalone"],
|
||||
"tags": ["chatbots"]
|
||||
},
|
||||
{
|
||||
"id": "saber",
|
||||
"title": "saber",
|
||||
"slogan": "Deep-learning based tool for information extraction in the biomedical domain",
|
||||
"github": "BaderLab/saber",
|
||||
"pip": "saber",
|
||||
"thumb": "https://raw.githubusercontent.com/BaderLab/saber/master/docs/img/saber_logo.png",
|
||||
"code_example": [
|
||||
"from saber.saber import Saber",
|
||||
"saber = Saber()",
|
||||
"saber.load('PRGE')",
|
||||
"saber.annotate('The phosphorylation of Hdm2 by MK2 promotes the ubiquitination of p53.')"
|
||||
],
|
||||
"author": "Bader Lab, University of Toronto",
|
||||
"category": ["scientific"],
|
||||
"tags": ["keras", "biomedical"]
|
||||
},
|
||||
{
|
||||
"id": "alibi",
|
||||
"title": "alibi",
|
||||
"slogan": "Algorithms for monitoring and explaining machine learning models ",
|
||||
"github": "SeldonIO/alibi",
|
||||
"pip": "alibi",
|
||||
"thumb": "https://i.imgur.com/YkzQHRp.png",
|
||||
"code_example": [
|
||||
"from alibi.explainers import AnchorTabular",
|
||||
"explainer = AnchorTabular(predict_fn, feature_names)",
|
||||
"explainer.fit(X_train)",
|
||||
"explainer.explain(x)"
|
||||
],
|
||||
"author": "Seldon",
|
||||
"category": ["standalone", "research"]
|
||||
},
|
||||
{
|
||||
"id": "spacymoji",
|
||||
"slogan": "Emoji handling and meta data as a spaCy pipeline component",
|
||||
|
@ -143,7 +245,7 @@
|
|||
"doc = nlp(my_doc_text)"
|
||||
],
|
||||
"author": "tc64",
|
||||
"author_link": {
|
||||
"author_links": {
|
||||
"github": "tc64"
|
||||
},
|
||||
"category": ["pipeline"]
|
||||
|
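The `author_link` to `author_links` fix in the hunk above is the kind of slip a small check could catch. A sketch of such a check, assuming only the key names visible in this diff; `check_resource`, the sample entry, and the category subset are hypothetical and not part of the website build:

```python
import json

# Illustrative subset of category ids that appear in this diff.
KNOWN_CATEGORIES = {"scientific", "models", "standalone", "research", "pipeline"}


def check_resource(resource):
    """Return a list of problems found in one resource entry (hypothetical helper)."""
    problems = []
    if "author_link" in resource:
        problems.append("use 'author_links' instead of 'author_link'")
    for category in resource.get("category", []):
        if category not in KNOWN_CATEGORIES:
            problems.append("unknown category: {}".format(category))
    return problems


# Made-up entry that reproduces the misspelled key.
entry = json.loads(
    '{"id": "example", "author": "tc64", '
    '"author_link": {"github": "tc64"}, "category": ["pipeline"]}'
)
print(check_resource(entry))
```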
@ -346,7 +448,7 @@
|
|||
"author_links": {
|
||||
"github": "huggingface"
|
||||
},
|
||||
"category": ["standalone", "conversational"],
|
||||
"category": ["standalone", "conversational", "models"],
|
||||
"tags": ["coref"]
|
||||
},
|
||||
{
|
||||
|
@ -538,7 +640,7 @@
|
|||
"twitter": "allenai_org",
|
||||
"website": "http://allenai.org"
|
||||
},
|
||||
"category": ["models", "research"]
|
||||
"category": ["scientific", "models", "research"]
|
||||
},
|
||||
{
|
||||
"id": "textacy",
|
||||
|
@ -601,7 +703,7 @@
|
|||
"github": "ahalterman",
|
||||
"twitter": "ahalterman"
|
||||
},
|
||||
"category": ["standalone"]
|
||||
"category": ["standalone", "scientific"]
|
||||
},
|
||||
{
|
||||
"id": "kindred",
|
||||
|
@ -626,7 +728,7 @@
|
|||
"author_links": {
|
||||
"github": "jakelever"
|
||||
},
|
||||
"category": ["standalone"]
|
||||
"category": ["standalone", "scientific"]
|
||||
},
|
||||
{
|
||||
"id": "sense2vec",
|
||||
|
@ -837,6 +939,42 @@
|
|||
},
|
||||
"category": ["standalone"]
|
||||
},
|
||||
{
|
||||
"id": "prefect",
|
||||
"title": "Prefect",
|
||||
"slogan": "Workflow management system designed for modern infrastructure",
|
||||
"github": "PrefectHQ/prefect",
|
||||
"pip": "prefect",
|
||||
"thumb": "https://i.imgur.com/oLTwr0e.png",
|
||||
"code_example": [
|
||||
"from prefect import Flow",
|
||||
"from prefect.tasks.spacy.spacy_tasks import SpacyNLP",
|
||||
"import spacy",
|
||||
"",
|
||||
"nlp = spacy.load(\"en_core_web_sm\")",
|
||||
"",
|
||||
"with Flow(\"Natural Language Processing\") as flow:",
|
||||
" doc = SpacyNLP(text=\"This is some text\", nlp=nlp)",
|
||||
"",
|
||||
"flow.run()"
|
||||
],
|
||||
"author": "Prefect",
|
||||
"author_links": {
|
||||
"website": "https://prefect.io"
|
||||
},
|
||||
"category": ["standalone"]
|
||||
},
|
||||
{
|
||||
"id": "graphbrain",
|
||||
"title": "Graphbrain",
|
||||
"slogan": "Automated meaning extraction and text understanding",
|
||||
"description": "Graphbrain is an Artificial Intelligence open-source software library and scientific research tool. Its aim is to facilitate automated meaning extraction and text understanding, as well as the exploration and inference of knowledge.",
|
||||
"github": "graphbrain/graphbrain",
|
||||
"pip": "graphbrain",
|
||||
"thumb": "https://i.imgur.com/cct9W1E.png",
|
||||
"author": "Graphbrain",
|
||||
"category": ["standalone"]
|
||||
},
|
||||
{
|
||||
"type": "education",
|
||||
"id": "oreilly-python-ds",
|
||||
|
@ -883,36 +1021,6 @@
|
|||
"author": "Bhargav Srinivasa-Desikan",
|
||||
"category": ["books"]
|
||||
},
|
||||
{
|
||||
"type": "education",
|
||||
"id": "datacamp-nlp-fundamentals",
|
||||
"title": "Natural Language Processing Fundamentals in Python",
|
||||
"slogan": "Datacamp, 2017",
|
||||
"description": "In this course, you'll learn Natural Language Processing (NLP) basics, such as how to identify and separate words, how to extract topics in a text, and how to build your own fake news classifier. You'll also learn how to use basic libraries such as NLTK, alongside libraries which utilize deep learning to solve common NLP problems. This course will give you the foundation to process and parse text as you move forward in your Python learning.",
|
||||
"url": "https://www.datacamp.com/courses/natural-language-processing-fundamentals-in-python",
|
||||
"thumb": "https://i.imgur.com/0Zks7c0.jpg",
|
||||
"author": "Katharine Jarmul",
|
||||
"author_links": {
|
||||
"twitter": "kjam"
|
||||
},
|
||||
"category": ["courses"]
|
||||
},
|
||||
{
|
||||
"type": "education",
|
||||
"id": "datacamp-advanced-nlp",
|
||||
"title": "Advanced Natural Language Processing with spaCy",
|
||||
"slogan": "Datacamp, 2019",
|
||||
"description": "If you're working with a lot of text, you'll eventually want to know more about it. For example, what's it about? What do the words mean in context? Who is doing what to whom? What companies and products are mentioned? Which texts are similar to each other? In this course, you'll learn how to use spaCy, a fast-growing industry standard library for NLP in Python, to build advanced natural language understanding systems, using both rule-based and machine learning approaches.",
|
||||
"url": "https://www.datacamp.com/courses/advanced-nlp-with-spacy",
|
||||
"thumb": "https://i.imgur.com/0Zks7c0.jpg",
|
||||
"author": "Ines Montani",
|
||||
"author_links": {
|
||||
"twitter": "_inesmontani",
|
||||
"github": "ines",
|
||||
"website": "https://ines.io"
|
||||
},
|
||||
"category": ["courses"]
|
||||
},
|
||||
{
|
||||
"type": "education",
|
||||
"id": "learning-path-spacy",
|
||||
|
@ -924,6 +1032,23 @@
|
|||
"author": "Aaron Kramer",
|
||||
"category": ["courses"]
|
||||
},
|
||||
{
|
||||
"type": "education",
|
||||
"id": "spacy-course",
|
||||
"title": "Advanced NLP with spaCy",
|
||||
"slogan": "spaCy, 2019",
|
||||
"description": "In this free interactive course, you'll learn how to use spaCy to build advanced natural language understanding systems, using both rule-based and machine learning approaches.",
|
||||
"url": "https://course.spacy.io",
|
||||
"image": "https://i.imgur.com/JC00pHW.jpg",
|
||||
"thumb": "https://i.imgur.com/5RXLtrr.jpg",
|
||||
"author": "Ines Montani",
|
||||
"author_links": {
|
||||
"twitter": "_inesmontani",
|
||||
"github": "ines",
|
||||
"website": "https://ines.io"
|
||||
},
|
||||
"category": ["courses"]
|
||||
},
|
||||
{
|
||||
"type": "education",
|
||||
"id": "video-spacys-ner-model",
|
||||
|
@ -1010,6 +1135,22 @@
|
|||
},
|
||||
"category": ["podcasts"]
|
||||
},
|
||||
{
|
||||
"type": "education",
|
||||
"id": "twimlai-podcast",
|
||||
"title": "TWiML & AI: Practical NLP with spaCy and Prodigy",
|
||||
"slogan": "May 2019",
|
||||
"description": "\"Ines and I caught up to discuss her various projects, including the aforementioned SpaCy, an open-source NLP library built with a focus on industry and production use cases. In our conversation, Ines gives us an overview of the SpaCy Library, a look at some of the use cases that excite her, and the Spacy community and contributors. We also discuss her work with Prodigy, an annotation service tool that uses continuous active learning to train models, and finally, what other exciting projects she is working on.\"",
|
||||
"thumb": "https://i.imgur.com/ng2F5gK.png",
|
||||
"url": "https://twimlai.com/twiml-talk-262-practical-natural-language-processing-with-spacy-and-prodigy-w-ines-montani",
|
||||
"iframe": "https://html5-player.libsyn.com/embed/episode/id/9691514/height/90/theme/custom/thumbnail/no/preload/no/direction/backward/render-playlist/no/custom-color/3e85b1/",
|
||||
"iframe_height": 90,
|
||||
"author": "Sam Charrington",
|
||||
"author_links": {
|
||||
"website": "https://twimlai.com"
|
||||
},
|
||||
"category": ["podcasts"]
|
||||
},
|
||||
{
|
||||
"id": "adam_qas",
|
||||
"title": "ADAM: Question Answering System",
|
||||
|
@ -1068,7 +1209,7 @@
|
|||
"github": "ecohealthalliance",
|
||||
"website": " https://ecohealthalliance.org/"
|
||||
},
|
||||
"category": ["research", "standalone"]
|
||||
"category": ["scientific", "standalone"]
|
||||
},
|
||||
{
|
||||
"id": "self-attentive-parser",
|
||||
|
@ -1311,8 +1452,100 @@
|
|||
"website": "http://w4nderlu.st"
|
||||
},
|
||||
"category": ["standalone", "research"]
|
||||
},
|
||||
{
|
||||
"id": "gracyql",
|
||||
"title": "gracyql",
|
||||
"slogan": "A thin GraphQL wrapper around spacy",
|
||||
"github": "oterrier/gracyql",
|
||||
"description": "An example of a basic [Starlette](https://github.com/encode/starlette) app using [Spacy](https://github.com/explosion/spaCy) and [Graphene](https://github.com/graphql-python/graphene). The main goal is to be able to use the amazing power of spaCy from other languages and retrieving only the information you need thanks to the GraphQL query definition. The GraphQL schema tries to mimic as much as possible the original Spacy API with classes Doc, Span and Token.",
|
||||
"thumb": "https://i.imgur.com/xC7zpTO.png",
|
||||
"category": ["apis"],
|
||||
"tags": ["graphql"],
|
||||
"code_example": [
|
||||
"query ParserDisabledQuery {",
|
||||
" nlp(model: \"en\", disable: [\"parser\", \"ner\"]) {",
|
||||
" doc(text: \"I live in Grenoble, France\") {",
|
||||
" text",
|
||||
" tokens {",
|
||||
" id",
|
||||
" pos",
|
||||
" lemma",
|
||||
" dep",
|
||||
" }",
|
||||
" ents {",
|
||||
" start",
|
||||
" end",
|
||||
" label",
|
||||
" }",
|
||||
" }",
|
||||
" }",
|
||||
"}"
|
||||
],
|
||||
"code_language": "json",
|
||||
"author": "Olivier Terrier",
|
||||
"author_links": {
|
||||
"github": "oterrier"
|
||||
}
|
||||
},
|
||||
{
|
||||
"id": "pyInflect",
|
||||
"slogan": "A python module for word inflections",
|
||||
"description": "This package uses the [spaCy 2.0 extensions](https://spacy.io/usage/processing-pipelines#extensions) to add word inflections to the system.",
|
||||
"github": "bjascob/pyInflect",
|
||||
"pip": "pyinflect",
|
||||
"code_example": [
|
||||
"import spacy",
|
||||
"import pyinflect",
|
||||
"",
|
||||
"nlp = spacy.load('en_core_web_sm')",
|
||||
"doc = nlp('This is an example.')",
|
||||
"doc[3].tag_ # NN",
|
||||
"doc[3]._.inflect('NNS') # examples"
|
||||
],
|
||||
"author": "Brad Jascob",
|
||||
"author_links": {
|
||||
"github": "bjascob"
|
||||
},
|
||||
"category": ["pipeline"],
|
||||
"tags": ["inflection"]
|
||||
},
|
||||
{
|
||||
"id": "NGym",
|
||||
"title": "NeuralGym",
|
||||
"slogan": "A little Windows GUI for training models with spaCy",
|
||||
"description": "NeuralGym is a Python application for Windows with a graphical user interface to train models with spaCy. Run the application, select an output folder, a training data file in spaCy's data format, a spaCy model or blank model and press 'Start'.",
|
||||
"github": "d5555/NeuralGym",
|
||||
"url": "https://github.com/d5555/NeuralGym",
|
||||
"image": "https://github.com/d5555/NeuralGym/raw/master/NGym.png",
|
||||
"thumb": "https://github.com/d5555/NeuralGym/raw/master/NGym/web.png",
|
||||
"author": "d5555",
|
||||
"category": ["training"],
|
||||
"tags": ["windows"]
|
||||
},
|
||||
{
|
||||
"id": "holmes",
|
||||
"title": "Holmes",
|
||||
"slogan": "Information extraction from English and German texts based on predicate logic",
|
||||
"github": "msg-systems/holmes-extractor",
|
||||
"url": "https://github.com/msg-systems/holmes-extractor",
|
||||
"description": "Holmes is a Python 3 library that supports a number of use cases involving information extraction from English and German texts, including chatbot, structural search, topic matching and supervised document classification.",
|
||||
"pip": "holmes-extractor",
|
||||
"category": ["conversational", "standalone"],
|
||||
"tags": ["chatbots", "text-processing"],
|
||||
"code_example": [
|
||||
"import holmes_extractor as holmes",
|
||||
"holmes_manager = holmes.Manager(model='en_coref_lg')",
|
||||
"holmes_manager.register_search_phrase('A big dog chases a cat')",
|
||||
"holmes_manager.start_chatbot_mode_console()"
|
||||
],
|
||||
"author": "Richard Paul Hudson",
|
||||
"author_links": {
|
||||
"github": "richardpaulhudson"
|
||||
}
|
||||
}
|
||||
],
|
||||
|
||||
"categories": [
|
||||
{
|
||||
"label": "Projects",
|
||||
|
@ -1337,6 +1570,11 @@
|
|||
"title": "Research",
|
||||
"description": "Frameworks and utilities for developing better NLP models, especially using neural networks"
|
||||
},
|
||||
{
|
||||
"id": "scientific",
|
||||
"title": "Scientific",
|
||||
"description": "Frameworks and utilities for scientific text processing"
|
||||
},
|
||||
{
|
||||
"id": "visualizers",
|
||||
"title": "Visualizers",
|
||||
|
@ -1356,6 +1594,11 @@
|
|||
"id": "standalone",
|
||||
"title": "Standalone",
|
||||
"description": "Self-contained libraries or tools that use spaCy under the hood"
|
||||
},
|
||||
{
|
||||
"id": "models",
|
||||
"title": "Models",
|
||||
"description": "Third-party pre-trained models for different languages and domains"
|
||||
}
|
||||
]
|
||||
},
|
||||
|
|
|
@ -93,6 +93,7 @@ const SEO = ({ description, lang, title, section, sectionTitle, bodyClass }) =>
|
|||
|
||||
return (
|
||||
<Helmet
|
||||
defer={false}
|
||||
htmlAttributes={{ lang }}
|
||||
bodyAttributes={{ class: bodyClass }}
|
||||
title={pageTitle}
|
||||
|
|
|
@ -125,7 +125,7 @@ const UniverseContent = ({ content = [], categories, pageContext, location, mdxC
|
|||
</p>
|
||||
|
||||
<InlineList>
|
||||
<Button variant="primary" to={github('website/universe/README.md')}>
|
||||
<Button variant="primary" to={github('website/UNIVERSE.md')}>
|
||||
Read the docs
|
||||
</Button>
|
||||
<Button icon="code" to={github('website/meta/universe.json')}>
|
||||
|
|
|
@ -75,16 +75,6 @@ const Landing = ({ data }) => {
|
|||
<LandingSubtitle>in Python</LandingSubtitle>
|
||||
</LandingHeader>
|
||||
<LandingGrid blocks>
|
||||
<LandingCard title="Fastest in the world">
|
||||
<p>
|
||||
spaCy excels at large-scale information extraction tasks. It's written from
|
||||
the ground up in carefully memory-managed Cython. Independent research has
|
||||
confirmed that spaCy is the fastest in the world. If your application needs
|
||||
to process entire web dumps, spaCy is the library you want to be using.
|
||||
</p>
|
||||
<LandingButton to="/usage/facts-figures">Facts & Figures</LandingButton>
|
||||
</LandingCard>
|
||||
|
||||
<LandingCard title="Get things done">
|
||||
<p>
|
||||
spaCy is designed to help you do real work — to build real products, or
|
||||
|
@ -92,7 +82,16 @@ const Landing = ({ data }) => {
|
|||
wasting it. It's easy to install, and its API is simple and productive. We
|
||||
like to think of spaCy as the Ruby on Rails of Natural Language Processing.
|
||||
</p>
|
||||
<LandingButton to="/usage">Get started</LandingButton>
|
||||
<LandingButton to="/usage/spacy-101">Get started</LandingButton>
|
||||
</LandingCard>
|
||||
<LandingCard title="Blazing fast">
|
||||
<p>
|
||||
spaCy excels at large-scale information extraction tasks. It's written from
|
||||
the ground up in carefully memory-managed Cython. Independent research in
|
||||
2015 found spaCy to be the fastest in the world. If your application needs
|
||||
to process entire web dumps, spaCy is the library you want to be using.
|
||||
</p>
|
||||
<LandingButton to="/usage/facts-figures">Facts & Figures</LandingButton>
|
||||
</LandingCard>
|
||||
|
||||
<LandingCard title="Deep learning">
|
||||
|
@ -129,6 +128,7 @@ const Landing = ({ data }) => {
|
|||
<Li>
|
||||
Pre-trained <strong>word vectors</strong>
|
||||
</Li>
|
||||
<Li>State-of-the-art speed</Li>
|
||||
<Li>
|
||||
Easy <strong>deep learning</strong> integration
|
||||
</Li>
|
||||
|
@ -144,7 +144,6 @@ const Landing = ({ data }) => {
|
|||
<Li>
|
||||
Easy <strong>model packaging</strong> and deployment
|
||||
</Li>
|
||||
<Li>State-of-the-art speed</Li>
|
||||
<Li>Robust, rigorously evaluated accuracy</Li>
|
||||
</Ul>
|
||||
</LandingCol>
|
||||
|
|