Mirror of https://github.com/explosion/spaCy.git (synced 2025-01-12 18:26:30 +03:00)
Merge pull request #5479 from explosion/master-tmp
commit 56a9d1b78c
.github/contributors/MiniLau.md (Normal file, vendored, 106 lines added)
@ -0,0 +1,106 @@

# spaCy contributor agreement

This spaCy Contributor Agreement (**"SCA"**) is based on the
[Oracle Contributor Agreement](http://www.oracle.com/technetwork/oca-405177.pdf).
The SCA applies to any contribution that you make to any product or project
managed by us (the **"project"**), and sets out the intellectual property rights
you grant to us in the contributed materials. The term **"us"** shall mean
[ExplosionAI GmbH](https://explosion.ai/legal). The term
**"you"** shall mean the person or entity identified below.

If you agree to be bound by these terms, fill in the information requested
below and include the filled-in version with your first pull request, under the
folder [`.github/contributors/`](/.github/contributors/). The name of the file
should be your GitHub username, with the extension `.md`. For example, the user
example_user would create the file `.github/contributors/example_user.md`.

Read this agreement carefully before signing. These terms and conditions
constitute a binding legal agreement.

## Contributor Agreement

1. The term "contribution" or "contributed materials" means any source code,
object code, patch, tool, sample, graphic, specification, manual,
documentation, or any other material posted or submitted by you to the project.

2. With respect to any worldwide copyrights, or copyright applications and
registrations, in your contribution:

* you hereby assign to us joint ownership, and to the extent that such
assignment is or becomes invalid, ineffective or unenforceable, you hereby
grant to us a perpetual, irrevocable, non-exclusive, worldwide, no-charge,
royalty-free, unrestricted license to exercise all rights under those
copyrights. This includes, at our option, the right to sublicense these same
rights to third parties through multiple levels of sublicensees or other
licensing arrangements;

* you agree that each of us can do all things in relation to your
contribution as if each of us were the sole owners, and if one of us makes
a derivative work of your contribution, the one who makes the derivative
work (or has it made will be the sole owner of that derivative work;

* you agree that you will not assert any moral rights in your contribution
against us, our licensees or transferees;

* you agree that we may register a copyright in your contribution and
exercise all ownership rights associated with it; and

* you agree that neither of us has any duty to consult with, obtain the
consent of, pay or render an accounting to the other for any use or
distribution of your contribution.

3. With respect to any patents you own, or that you can license without payment
to any third party, you hereby grant to us a perpetual, irrevocable,
non-exclusive, worldwide, no-charge, royalty-free license to:

* make, have made, use, sell, offer to sell, import, and otherwise transfer
your contribution in whole or in part, alone or in combination with or
included in any product, work or materials arising out of the project to
which your contribution was submitted, and

* at our option, to sublicense these same rights to third parties through
multiple levels of sublicensees or other licensing arrangements.

4. Except as set out above, you keep all right, title, and interest in your
contribution. The rights that you grant to us under these terms are effective
on the date you first submitted a contribution to us, even if your submission
took place before the date you sign these terms.

5. You covenant, represent, warrant and agree that:

* Each contribution that you submit is and shall be an original work of
authorship and you can legally grant the rights set out in this SCA;

* to the best of your knowledge, each contribution will not violate any
third party's copyrights, trademarks, patents, or other intellectual
property rights; and

* each contribution shall be in compliance with U.S. export control laws and
other applicable export and import laws. You agree to notify us if you
become aware of any circumstance which would make any of the foregoing
representations inaccurate in any respect. We may publicly disclose your
participation in the project, including the fact that you have signed the SCA.

6. This SCA is governed by the laws of the State of California and applicable
U.S. Federal law. Any choice of law rules will not apply.

7. Please place an “x” on one of the applicable statement below. Please do NOT
mark both statements:

* [x] I am signing on behalf of myself as an individual and no other person
or entity, including my employer, has or will have rights with respect to my
contributions.

* [ ] I am signing on behalf of my employer or a legal entity and I have the
actual authority to contractually bind that entity.

## Contributor Details

| Field | Entry |
|------------------------------- | -------------------- |
| Name | Desausoi Laurent |
| Company name (if applicable) | / |
| Title or role (if applicable) | / |
| Date | 22 November 2019 |
| GitHub username | MiniLau |
| Website (optional) | / |
.github/contributors/Mlawrence95.md (Normal file, vendored, 106 lines added)
@ -0,0 +1,106 @@

# spaCy contributor agreement

This spaCy Contributor Agreement (**"SCA"**) is based on the
[Oracle Contributor Agreement](http://www.oracle.com/technetwork/oca-405177.pdf).
The SCA applies to any contribution that you make to any product or project
managed by us (the **"project"**), and sets out the intellectual property rights
you grant to us in the contributed materials. The term **"us"** shall mean
[ExplosionAI GmbH](https://explosion.ai/legal). The term
**"you"** shall mean the person or entity identified below.

If you agree to be bound by these terms, fill in the information requested
below and include the filled-in version with your first pull request, under the
folder [`.github/contributors/`](/.github/contributors/). The name of the file
should be your GitHub username, with the extension `.md`. For example, the user
example_user would create the file `.github/contributors/example_user.md`.

Read this agreement carefully before signing. These terms and conditions
constitute a binding legal agreement.

## Contributor Agreement

1. The term "contribution" or "contributed materials" means any source code,
object code, patch, tool, sample, graphic, specification, manual,
documentation, or any other material posted or submitted by you to the project.

2. With respect to any worldwide copyrights, or copyright applications and
registrations, in your contribution:

* you hereby assign to us joint ownership, and to the extent that such
assignment is or becomes invalid, ineffective or unenforceable, you hereby
grant to us a perpetual, irrevocable, non-exclusive, worldwide, no-charge,
royalty-free, unrestricted license to exercise all rights under those
copyrights. This includes, at our option, the right to sublicense these same
rights to third parties through multiple levels of sublicensees or other
licensing arrangements;

* you agree that each of us can do all things in relation to your
contribution as if each of us were the sole owners, and if one of us makes
a derivative work of your contribution, the one who makes the derivative
work (or has it made will be the sole owner of that derivative work;

* you agree that you will not assert any moral rights in your contribution
against us, our licensees or transferees;

* you agree that we may register a copyright in your contribution and
exercise all ownership rights associated with it; and

* you agree that neither of us has any duty to consult with, obtain the
consent of, pay or render an accounting to the other for any use or
distribution of your contribution.

3. With respect to any patents you own, or that you can license without payment
to any third party, you hereby grant to us a perpetual, irrevocable,
non-exclusive, worldwide, no-charge, royalty-free license to:

* make, have made, use, sell, offer to sell, import, and otherwise transfer
your contribution in whole or in part, alone or in combination with or
included in any product, work or materials arising out of the project to
which your contribution was submitted, and

* at our option, to sublicense these same rights to third parties through
multiple levels of sublicensees or other licensing arrangements.

4. Except as set out above, you keep all right, title, and interest in your
contribution. The rights that you grant to us under these terms are effective
on the date you first submitted a contribution to us, even if your submission
took place before the date you sign these terms.

5. You covenant, represent, warrant and agree that:

* Each contribution that you submit is and shall be an original work of
authorship and you can legally grant the rights set out in this SCA;

* to the best of your knowledge, each contribution will not violate any
third party's copyrights, trademarks, patents, or other intellectual
property rights; and

* each contribution shall be in compliance with U.S. export control laws and
other applicable export and import laws. You agree to notify us if you
become aware of any circumstance which would make any of the foregoing
representations inaccurate in any respect. We may publicly disclose your
participation in the project, including the fact that you have signed the SCA.

6. This SCA is governed by the laws of the State of California and applicable
U.S. Federal law. Any choice of law rules will not apply.

7. Please place an “x” on one of the applicable statement below. Please do NOT
mark both statements:

* [ x ] I am signing on behalf of myself as an individual and no other person
or entity, including my employer, has or will have rights with respect to my
contributions.

* [ ] I am signing on behalf of my employer or a legal entity and I have the
actual authority to contractually bind that entity.

## Contributor Details

| Field | Entry |
|------------------------------- | -------------------- |
| Name | Mike Lawrence |
| Company name (if applicable) | NA |
| Title or role (if applicable) | NA |
| Date | April 17, 2020 |
| GitHub username | Mlawrence95 |
| Website (optional) | |
.github/contributors/YohannesDatasci.md (Normal file, vendored, 106 lines added)
@ -0,0 +1,106 @@

# spaCy contributor agreement

This spaCy Contributor Agreement (**"SCA"**) is based on the
[Oracle Contributor Agreement](http://www.oracle.com/technetwork/oca-405177.pdf).
The SCA applies to any contribution that you make to any product or project
managed by us (the **"project"**), and sets out the intellectual property rights
you grant to us in the contributed materials. The term **"us"** shall mean
[ExplosionAI GmbH](https://explosion.ai/legal). The term
**"you"** shall mean the person or entity identified below.

If you agree to be bound by these terms, fill in the information requested
below and include the filled-in version with your first pull request, under the
folder [`.github/contributors/`](/.github/contributors/). The name of the file
should be your GitHub username, with the extension `.md`. For example, the user
example_user would create the file `.github/contributors/example_user.md`.

Read this agreement carefully before signing. These terms and conditions
constitute a binding legal agreement.

## Contributor Agreement

1. The term "contribution" or "contributed materials" means any source code,
object code, patch, tool, sample, graphic, specification, manual,
documentation, or any other material posted or submitted by you to the project.

2. With respect to any worldwide copyrights, or copyright applications and
registrations, in your contribution:

* you hereby assign to us joint ownership, and to the extent that such
assignment is or becomes invalid, ineffective or unenforceable, you hereby
grant to us a perpetual, irrevocable, non-exclusive, worldwide, no-charge,
royalty-free, unrestricted license to exercise all rights under those
copyrights. This includes, at our option, the right to sublicense these same
rights to third parties through multiple levels of sublicensees or other
licensing arrangements;

* you agree that each of us can do all things in relation to your
contribution as if each of us were the sole owners, and if one of us makes
a derivative work of your contribution, the one who makes the derivative
work (or has it made will be the sole owner of that derivative work;

* you agree that you will not assert any moral rights in your contribution
against us, our licensees or transferees;

* you agree that we may register a copyright in your contribution and
exercise all ownership rights associated with it; and

* you agree that neither of us has any duty to consult with, obtain the
consent of, pay or render an accounting to the other for any use or
distribution of your contribution.

3. With respect to any patents you own, or that you can license without payment
to any third party, you hereby grant to us a perpetual, irrevocable,
non-exclusive, worldwide, no-charge, royalty-free license to:

* make, have made, use, sell, offer to sell, import, and otherwise transfer
your contribution in whole or in part, alone or in combination with or
included in any product, work or materials arising out of the project to
which your contribution was submitted, and

* at our option, to sublicense these same rights to third parties through
multiple levels of sublicensees or other licensing arrangements.

4. Except as set out above, you keep all right, title, and interest in your
contribution. The rights that you grant to us under these terms are effective
on the date you first submitted a contribution to us, even if your submission
took place before the date you sign these terms.

5. You covenant, represent, warrant and agree that:

* Each contribution that you submit is and shall be an original work of
authorship and you can legally grant the rights set out in this SCA;

* to the best of your knowledge, each contribution will not violate any
third party's copyrights, trademarks, patents, or other intellectual
property rights; and

* each contribution shall be in compliance with U.S. export control laws and
other applicable export and import laws. You agree to notify us if you
become aware of any circumstance which would make any of the foregoing
representations inaccurate in any respect. We may publicly disclose your
participation in the project, including the fact that you have signed the SCA.

6. This SCA is governed by the laws of the State of California and applicable
U.S. Federal law. Any choice of law rules will not apply.

7. Please place an “x” on one of the applicable statement below. Please do NOT
mark both statements:

* [X] I am signing on behalf of myself as an individual and no other person
or entity, including my employer, has or will have rights with respect to my
contributions.

* [ ] I am signing on behalf of my employer or a legal entity and I have the
actual authority to contractually bind that entity.

## Contributor Details

| Field | Entry |
|------------------------------- | -------------------- |
| Name | Yohannes |
| Company name (if applicable) | |
| Title or role (if applicable) | |
| Date | 2020-04-02 |
| GitHub username | YohannesDatasci |
| Website (optional) | |
.github/contributors/chopeen.md (Normal file, vendored, 106 lines added)
@ -0,0 +1,106 @@

# spaCy contributor agreement

This spaCy Contributor Agreement (**"SCA"**) is based on the
[Oracle Contributor Agreement](http://www.oracle.com/technetwork/oca-405177.pdf).
The SCA applies to any contribution that you make to any product or project
managed by us (the **"project"**), and sets out the intellectual property rights
you grant to us in the contributed materials. The term **"us"** shall mean
[ExplosionAI GmbH](https://explosion.ai/legal). The term
**"you"** shall mean the person or entity identified below.

If you agree to be bound by these terms, fill in the information requested
below and include the filled-in version with your first pull request, under the
folder [`.github/contributors/`](/.github/contributors/). The name of the file
should be your GitHub username, with the extension `.md`. For example, the user
example_user would create the file `.github/contributors/example_user.md`.

Read this agreement carefully before signing. These terms and conditions
constitute a binding legal agreement.

## Contributor Agreement

1. The term "contribution" or "contributed materials" means any source code,
object code, patch, tool, sample, graphic, specification, manual,
documentation, or any other material posted or submitted by you to the project.

2. With respect to any worldwide copyrights, or copyright applications and
registrations, in your contribution:

* you hereby assign to us joint ownership, and to the extent that such
assignment is or becomes invalid, ineffective or unenforceable, you hereby
grant to us a perpetual, irrevocable, non-exclusive, worldwide, no-charge,
royalty-free, unrestricted license to exercise all rights under those
copyrights. This includes, at our option, the right to sublicense these same
rights to third parties through multiple levels of sublicensees or other
licensing arrangements;

* you agree that each of us can do all things in relation to your
contribution as if each of us were the sole owners, and if one of us makes
a derivative work of your contribution, the one who makes the derivative
work (or has it made will be the sole owner of that derivative work;

* you agree that you will not assert any moral rights in your contribution
against us, our licensees or transferees;

* you agree that we may register a copyright in your contribution and
exercise all ownership rights associated with it; and

* you agree that neither of us has any duty to consult with, obtain the
consent of, pay or render an accounting to the other for any use or
distribution of your contribution.

3. With respect to any patents you own, or that you can license without payment
to any third party, you hereby grant to us a perpetual, irrevocable,
non-exclusive, worldwide, no-charge, royalty-free license to:

* make, have made, use, sell, offer to sell, import, and otherwise transfer
your contribution in whole or in part, alone or in combination with or
included in any product, work or materials arising out of the project to
which your contribution was submitted, and

* at our option, to sublicense these same rights to third parties through
multiple levels of sublicensees or other licensing arrangements.

4. Except as set out above, you keep all right, title, and interest in your
contribution. The rights that you grant to us under these terms are effective
on the date you first submitted a contribution to us, even if your submission
took place before the date you sign these terms.

5. You covenant, represent, warrant and agree that:

* Each contribution that you submit is and shall be an original work of
authorship and you can legally grant the rights set out in this SCA;

* to the best of your knowledge, each contribution will not violate any
third party's copyrights, trademarks, patents, or other intellectual
property rights; and

* each contribution shall be in compliance with U.S. export control laws and
other applicable export and import laws. You agree to notify us if you
become aware of any circumstance which would make any of the foregoing
representations inaccurate in any respect. We may publicly disclose your
participation in the project, including the fact that you have signed the SCA.

6. This SCA is governed by the laws of the State of California and applicable
U.S. Federal law. Any choice of law rules will not apply.

7. Please place an “x” on one of the applicable statement below. Please do NOT
mark both statements:

* [x] I am signing on behalf of myself as an individual and no other person
or entity, including my employer, has or will have rights with respect to my
contributions.

* [ ] I am signing on behalf of my employer or a legal entity and I have the
actual authority to contractually bind that entity.

## Contributor Details

| Field | Entry |
|------------------------------- | -------------------- |
| Name | Marek Grzenkowicz |
| Company name (if applicable) | |
| Title or role (if applicable) | |
| Date | 2020.04.10 |
| GitHub username | chopeen |
| Website (optional) | |
.github/contributors/elben10 (Normal file, vendored, 106 lines added)
@ -0,0 +1,106 @@

# spaCy contributor agreement

This spaCy Contributor Agreement (**"SCA"**) is based on the
[Oracle Contributor Agreement](http://www.oracle.com/technetwork/oca-405177.pdf).
The SCA applies to any contribution that you make to any product or project
managed by us (the **"project"**), and sets out the intellectual property rights
you grant to us in the contributed materials. The term **"us"** shall mean
[ExplosionAI GmbH](https://explosion.ai/legal). The term
**"you"** shall mean the person or entity identified below.

If you agree to be bound by these terms, fill in the information requested
below and include the filled-in version with your first pull request, under the
folder [`.github/contributors/`](/.github/contributors/). The name of the file
should be your GitHub username, with the extension `.md`. For example, the user
example_user would create the file `.github/contributors/example_user.md`.

Read this agreement carefully before signing. These terms and conditions
constitute a binding legal agreement.

## Contributor Agreement

1. The term "contribution" or "contributed materials" means any source code,
object code, patch, tool, sample, graphic, specification, manual,
documentation, or any other material posted or submitted by you to the project.

2. With respect to any worldwide copyrights, or copyright applications and
registrations, in your contribution:

* you hereby assign to us joint ownership, and to the extent that such
assignment is or becomes invalid, ineffective or unenforceable, you hereby
grant to us a perpetual, irrevocable, non-exclusive, worldwide, no-charge,
royalty-free, unrestricted license to exercise all rights under those
copyrights. This includes, at our option, the right to sublicense these same
rights to third parties through multiple levels of sublicensees or other
licensing arrangements;

* you agree that each of us can do all things in relation to your
contribution as if each of us were the sole owners, and if one of us makes
a derivative work of your contribution, the one who makes the derivative
work (or has it made will be the sole owner of that derivative work;

* you agree that you will not assert any moral rights in your contribution
against us, our licensees or transferees;

* you agree that we may register a copyright in your contribution and
exercise all ownership rights associated with it; and

* you agree that neither of us has any duty to consult with, obtain the
consent of, pay or render an accounting to the other for any use or
distribution of your contribution.

3. With respect to any patents you own, or that you can license without payment
to any third party, you hereby grant to us a perpetual, irrevocable,
non-exclusive, worldwide, no-charge, royalty-free license to:

* make, have made, use, sell, offer to sell, import, and otherwise transfer
your contribution in whole or in part, alone or in combination with or
included in any product, work or materials arising out of the project to
which your contribution was submitted, and

* at our option, to sublicense these same rights to third parties through
multiple levels of sublicensees or other licensing arrangements.

4. Except as set out above, you keep all right, title, and interest in your
contribution. The rights that you grant to us under these terms are effective
on the date you first submitted a contribution to us, even if your submission
took place before the date you sign these terms.

5. You covenant, represent, warrant and agree that:

* Each contribution that you submit is and shall be an original work of
authorship and you can legally grant the rights set out in this SCA;

* to the best of your knowledge, each contribution will not violate any
third party's copyrights, trademarks, patents, or other intellectual
property rights; and

* each contribution shall be in compliance with U.S. export control laws and
other applicable export and import laws. You agree to notify us if you
become aware of any circumstance which would make any of the foregoing
representations inaccurate in any respect. We may publicly disclose your
participation in the project, including the fact that you have signed the SCA.

6. This SCA is governed by the laws of the State of California and applicable
U.S. Federal law. Any choice of law rules will not apply.

7. Please place an “x” on one of the applicable statement below. Please do NOT
mark both statements:

* [x] I am signing on behalf of myself as an individual and no other person
or entity, including my employer, has or will have rights with respect to my
contributions.

* [ ] I am signing on behalf of my employer or a legal entity and I have the
actual authority to contractually bind that entity.

## Contributor Details

| Field | Entry |
|------------------------------- | -------------------- |
| Name | Jakob Jul Elben |
| Company name (if applicable) | N/A |
| Title or role (if applicable) | N/A |
| Date | April 16th, 2020 |
| GitHub username | elben10 |
| Website (optional) | N/A |
.github/contributors/ilivans.md (Normal file, vendored, 106 lines added)
@ -0,0 +1,106 @@

# spaCy contributor agreement

This spaCy Contributor Agreement (**"SCA"**) is based on the
[Oracle Contributor Agreement](http://www.oracle.com/technetwork/oca-405177.pdf).
The SCA applies to any contribution that you make to any product or project
managed by us (the **"project"**), and sets out the intellectual property rights
you grant to us in the contributed materials. The term **"us"** shall mean
[ExplosionAI UG (haftungsbeschränkt)](https://explosion.ai/legal). The term
**"you"** shall mean the person or entity identified below.

If you agree to be bound by these terms, fill in the information requested
below and include the filled-in version with your first pull request, under the
folder [`.github/contributors/`](/.github/contributors/). The name of the file
should be your GitHub username, with the extension `.md`. For example, the user
example_user would create the file `.github/contributors/example_user.md`.

Read this agreement carefully before signing. These terms and conditions
constitute a binding legal agreement.

## Contributor Agreement

1. The term "contribution" or "contributed materials" means any source code,
object code, patch, tool, sample, graphic, specification, manual,
documentation, or any other material posted or submitted by you to the project.

2. With respect to any worldwide copyrights, or copyright applications and
registrations, in your contribution:

* you hereby assign to us joint ownership, and to the extent that such
assignment is or becomes invalid, ineffective or unenforceable, you hereby
grant to us a perpetual, irrevocable, non-exclusive, worldwide, no-charge,
royalty-free, unrestricted license to exercise all rights under those
copyrights. This includes, at our option, the right to sublicense these same
rights to third parties through multiple levels of sublicensees or other
licensing arrangements;

* you agree that each of us can do all things in relation to your
contribution as if each of us were the sole owners, and if one of us makes
a derivative work of your contribution, the one who makes the derivative
work (or has it made will be the sole owner of that derivative work;

* you agree that you will not assert any moral rights in your contribution
against us, our licensees or transferees;

* you agree that we may register a copyright in your contribution and
exercise all ownership rights associated with it; and

* you agree that neither of us has any duty to consult with, obtain the
consent of, pay or render an accounting to the other for any use or
distribution of your contribution.

3. With respect to any patents you own, or that you can license without payment
to any third party, you hereby grant to us a perpetual, irrevocable,
non-exclusive, worldwide, no-charge, royalty-free license to:

* make, have made, use, sell, offer to sell, import, and otherwise transfer
your contribution in whole or in part, alone or in combination with or
included in any product, work or materials arising out of the project to
which your contribution was submitted, and

* at our option, to sublicense these same rights to third parties through
multiple levels of sublicensees or other licensing arrangements.

4. Except as set out above, you keep all right, title, and interest in your
contribution. The rights that you grant to us under these terms are effective
on the date you first submitted a contribution to us, even if your submission
took place before the date you sign these terms.

5. You covenant, represent, warrant and agree that:

* Each contribution that you submit is and shall be an original work of
authorship and you can legally grant the rights set out in this SCA;

* to the best of your knowledge, each contribution will not violate any
third party's copyrights, trademarks, patents, or other intellectual
property rights; and

* each contribution shall be in compliance with U.S. export control laws and
|
||||
other applicable export and import laws. You agree to notify us if you
|
||||
become aware of any circumstance which would make any of the foregoing
|
||||
representations inaccurate in any respect. We may publicly disclose your
|
||||
participation in the project, including the fact that you have signed the SCA.
|
||||
|
||||
6. This SCA is governed by the laws of the State of California and applicable
|
||||
U.S. Federal law. Any choice of law rules will not apply.
|
||||
|
||||
7. Please place an “x” on one of the applicable statement below. Please do NOT
|
||||
mark both statements:
|
||||
|
||||
* [x] I am signing on behalf of myself as an individual and no other person
|
||||
or entity, including my employer, has or will have rights with respect to my
|
||||
contributions.
|
||||
|
||||
* [ ] I am signing on behalf of my employer or a legal entity and I have the
|
||||
actual authority to contractually bind that entity.
|
||||
|
||||
## Contributor Details
|
||||
|
||||
| Field | Entry |
|
||||
|------------------------------- | ------------------------ |
|
||||
| Name | Ilia Ivanov |
|
||||
| Company name (if applicable) | Chattermill |
|
||||
| Title or role (if applicable) | DL Engineer |
|
||||
| Date | 2020-05-14 |
|
||||
| GitHub username | ilivans |
|
||||
| Website (optional) | |
|
106
.github/contributors/jacse.md
vendored
Normal file
@ -0,0 +1,106 @@
# spaCy contributor agreement

This spaCy Contributor Agreement (**"SCA"**) is based on the
[Oracle Contributor Agreement](http://www.oracle.com/technetwork/oca-405177.pdf).
The SCA applies to any contribution that you make to any product or project
managed by us (the **"project"**), and sets out the intellectual property rights
you grant to us in the contributed materials. The term **"us"** shall mean
[ExplosionAI GmbH](https://explosion.ai/legal). The term
**"you"** shall mean the person or entity identified below.

If you agree to be bound by these terms, fill in the information requested
below and include the filled-in version with your first pull request, under the
folder [`.github/contributors/`](/.github/contributors/). The name of the file
should be your GitHub username, with the extension `.md`. For example, the user
example_user would create the file `.github/contributors/example_user.md`.

Read this agreement carefully before signing. These terms and conditions
constitute a binding legal agreement.

## Contributor Agreement

1. The term "contribution" or "contributed materials" means any source code,
object code, patch, tool, sample, graphic, specification, manual,
documentation, or any other material posted or submitted by you to the project.

2. With respect to any worldwide copyrights, or copyright applications and
registrations, in your contribution:

* you hereby assign to us joint ownership, and to the extent that such
assignment is or becomes invalid, ineffective or unenforceable, you hereby
grant to us a perpetual, irrevocable, non-exclusive, worldwide, no-charge,
royalty-free, unrestricted license to exercise all rights under those
copyrights. This includes, at our option, the right to sublicense these same
rights to third parties through multiple levels of sublicensees or other
licensing arrangements;

* you agree that each of us can do all things in relation to your
contribution as if each of us were the sole owners, and if one of us makes
a derivative work of your contribution, the one who makes the derivative
work (or has it made) will be the sole owner of that derivative work;

* you agree that you will not assert any moral rights in your contribution
against us, our licensees or transferees;

* you agree that we may register a copyright in your contribution and
exercise all ownership rights associated with it; and

* you agree that neither of us has any duty to consult with, obtain the
consent of, pay or render an accounting to the other for any use or
distribution of your contribution.

3. With respect to any patents you own, or that you can license without payment
to any third party, you hereby grant to us a perpetual, irrevocable,
non-exclusive, worldwide, no-charge, royalty-free license to:

* make, have made, use, sell, offer to sell, import, and otherwise transfer
your contribution in whole or in part, alone or in combination with or
included in any product, work or materials arising out of the project to
which your contribution was submitted, and

* at our option, to sublicense these same rights to third parties through
multiple levels of sublicensees or other licensing arrangements.

4. Except as set out above, you keep all right, title, and interest in your
contribution. The rights that you grant to us under these terms are effective
on the date you first submitted a contribution to us, even if your submission
took place before the date you sign these terms.

5. You covenant, represent, warrant and agree that:

* Each contribution that you submit is and shall be an original work of
authorship and you can legally grant the rights set out in this SCA;

* to the best of your knowledge, each contribution will not violate any
third party's copyrights, trademarks, patents, or other intellectual
property rights; and

* each contribution shall be in compliance with U.S. export control laws and
other applicable export and import laws. You agree to notify us if you
become aware of any circumstance which would make any of the foregoing
representations inaccurate in any respect. We may publicly disclose your
participation in the project, including the fact that you have signed the SCA.

6. This SCA is governed by the laws of the State of California and applicable
U.S. Federal law. Any choice of law rules will not apply.

7. Please place an “x” on one of the applicable statements below. Please do NOT
mark both statements:

* [x] I am signing on behalf of myself as an individual and no other person
or entity, including my employer, has or will have rights with respect to my
contributions.

* [ ] I am signing on behalf of my employer or a legal entity and I have the
actual authority to contractually bind that entity.

## Contributor Details

| Field | Entry |
|------------------------------- | -------------------- |
| Name | Jacob Lauritzen |
| Company name (if applicable) | |
| Title or role (if applicable) | |
| Date | 2020-03-30 |
| GitHub username | jacse |
| Website (optional) | |
106
.github/contributors/kevinlu1248.md
vendored
Normal file
@ -0,0 +1,106 @@
# spaCy contributor agreement

This spaCy Contributor Agreement (**"SCA"**) is based on the
[Oracle Contributor Agreement](http://www.oracle.com/technetwork/oca-405177.pdf).
The SCA applies to any contribution that you make to any product or project
managed by us (the **"project"**), and sets out the intellectual property rights
you grant to us in the contributed materials. The term **"us"** shall mean
[ExplosionAI GmbH](https://explosion.ai/legal). The term
**"you"** shall mean the person or entity identified below.

If you agree to be bound by these terms, fill in the information requested
below and include the filled-in version with your first pull request, under the
folder [`.github/contributors/`](/.github/contributors/). The name of the file
should be your GitHub username, with the extension `.md`. For example, the user
example_user would create the file `.github/contributors/example_user.md`.

Read this agreement carefully before signing. These terms and conditions
constitute a binding legal agreement.

## Contributor Agreement

1. The term "contribution" or "contributed materials" means any source code,
object code, patch, tool, sample, graphic, specification, manual,
documentation, or any other material posted or submitted by you to the project.

2. With respect to any worldwide copyrights, or copyright applications and
registrations, in your contribution:

* you hereby assign to us joint ownership, and to the extent that such
assignment is or becomes invalid, ineffective or unenforceable, you hereby
grant to us a perpetual, irrevocable, non-exclusive, worldwide, no-charge,
royalty-free, unrestricted license to exercise all rights under those
copyrights. This includes, at our option, the right to sublicense these same
rights to third parties through multiple levels of sublicensees or other
licensing arrangements;

* you agree that each of us can do all things in relation to your
contribution as if each of us were the sole owners, and if one of us makes
a derivative work of your contribution, the one who makes the derivative
work (or has it made) will be the sole owner of that derivative work;

* you agree that you will not assert any moral rights in your contribution
against us, our licensees or transferees;

* you agree that we may register a copyright in your contribution and
exercise all ownership rights associated with it; and

* you agree that neither of us has any duty to consult with, obtain the
consent of, pay or render an accounting to the other for any use or
distribution of your contribution.

3. With respect to any patents you own, or that you can license without payment
to any third party, you hereby grant to us a perpetual, irrevocable,
non-exclusive, worldwide, no-charge, royalty-free license to:

* make, have made, use, sell, offer to sell, import, and otherwise transfer
your contribution in whole or in part, alone or in combination with or
included in any product, work or materials arising out of the project to
which your contribution was submitted, and

* at our option, to sublicense these same rights to third parties through
multiple levels of sublicensees or other licensing arrangements.

4. Except as set out above, you keep all right, title, and interest in your
contribution. The rights that you grant to us under these terms are effective
on the date you first submitted a contribution to us, even if your submission
took place before the date you sign these terms.

5. You covenant, represent, warrant and agree that:

* Each contribution that you submit is and shall be an original work of
authorship and you can legally grant the rights set out in this SCA;

* to the best of your knowledge, each contribution will not violate any
third party's copyrights, trademarks, patents, or other intellectual
property rights; and

* each contribution shall be in compliance with U.S. export control laws and
other applicable export and import laws. You agree to notify us if you
become aware of any circumstance which would make any of the foregoing
representations inaccurate in any respect. We may publicly disclose your
participation in the project, including the fact that you have signed the SCA.

6. This SCA is governed by the laws of the State of California and applicable
U.S. Federal law. Any choice of law rules will not apply.

7. Please place an “x” on one of the applicable statements below. Please do NOT
mark both statements:

* [x] I am signing on behalf of myself as an individual and no other person
or entity, including my employer, has or will have rights with respect to my
contributions.

* [ ] I am signing on behalf of my employer or a legal entity and I have the
actual authority to contractually bind that entity.

## Contributor Details

| Field | Entry |
|------------------------------- | -------------------- |
| Name | Kevin Lu |
| Company name (if applicable) | |
| Title or role (if applicable) | Student |
| Date | |
| GitHub username | kevinlu1248 |
| Website (optional) | |
106
.github/contributors/koaning.md
vendored
Normal file
@ -0,0 +1,106 @@
# spaCy contributor agreement

This spaCy Contributor Agreement (**"SCA"**) is based on the
[Oracle Contributor Agreement](http://www.oracle.com/technetwork/oca-405177.pdf).
The SCA applies to any contribution that you make to any product or project
managed by us (the **"project"**), and sets out the intellectual property rights
you grant to us in the contributed materials. The term **"us"** shall mean
[ExplosionAI UG (haftungsbeschränkt)](https://explosion.ai/legal). The term
**"you"** shall mean the person or entity identified below.

If you agree to be bound by these terms, fill in the information requested
below and include the filled-in version with your first pull request, under the
folder [`.github/contributors/`](/.github/contributors/). The name of the file
should be your GitHub username, with the extension `.md`. For example, the user
example_user would create the file `.github/contributors/example_user.md`.

Read this agreement carefully before signing. These terms and conditions
constitute a binding legal agreement.

## Contributor Agreement

1. The term "contribution" or "contributed materials" means any source code,
object code, patch, tool, sample, graphic, specification, manual,
documentation, or any other material posted or submitted by you to the project.

2. With respect to any worldwide copyrights, or copyright applications and
registrations, in your contribution:

* you hereby assign to us joint ownership, and to the extent that such
assignment is or becomes invalid, ineffective or unenforceable, you hereby
grant to us a perpetual, irrevocable, non-exclusive, worldwide, no-charge,
royalty-free, unrestricted license to exercise all rights under those
copyrights. This includes, at our option, the right to sublicense these same
rights to third parties through multiple levels of sublicensees or other
licensing arrangements;

* you agree that each of us can do all things in relation to your
contribution as if each of us were the sole owners, and if one of us makes
a derivative work of your contribution, the one who makes the derivative
work (or has it made) will be the sole owner of that derivative work;

* you agree that you will not assert any moral rights in your contribution
against us, our licensees or transferees;

* you agree that we may register a copyright in your contribution and
exercise all ownership rights associated with it; and

* you agree that neither of us has any duty to consult with, obtain the
consent of, pay or render an accounting to the other for any use or
distribution of your contribution.

3. With respect to any patents you own, or that you can license without payment
to any third party, you hereby grant to us a perpetual, irrevocable,
non-exclusive, worldwide, no-charge, royalty-free license to:

* make, have made, use, sell, offer to sell, import, and otherwise transfer
your contribution in whole or in part, alone or in combination with or
included in any product, work or materials arising out of the project to
which your contribution was submitted, and

* at our option, to sublicense these same rights to third parties through
multiple levels of sublicensees or other licensing arrangements.

4. Except as set out above, you keep all right, title, and interest in your
contribution. The rights that you grant to us under these terms are effective
on the date you first submitted a contribution to us, even if your submission
took place before the date you sign these terms.

5. You covenant, represent, warrant and agree that:

* Each contribution that you submit is and shall be an original work of
authorship and you can legally grant the rights set out in this SCA;

* to the best of your knowledge, each contribution will not violate any
third party's copyrights, trademarks, patents, or other intellectual
property rights; and

* each contribution shall be in compliance with U.S. export control laws and
other applicable export and import laws. You agree to notify us if you
become aware of any circumstance which would make any of the foregoing
representations inaccurate in any respect. We may publicly disclose your
participation in the project, including the fact that you have signed the SCA.

6. This SCA is governed by the laws of the State of California and applicable
U.S. Federal law. Any choice of law rules will not apply.

7. Please place an “x” on one of the applicable statements below. Please do NOT
mark both statements:

* [x] I am signing on behalf of myself as an individual and no other person
or entity, including my employer, has or will have rights with respect to my
contributions.

* [ ] I am signing on behalf of my employer or a legal entity and I have the
actual authority to contractually bind that entity.

## Contributor Details

| Field | Entry |
|------------------------------- | ------------------------ |
| Name | Vincent D. Warmerdam |
| Company name (if applicable) | |
| Title or role (if applicable) | Data Person |
| Date | 2020-03-01 |
| GitHub username | koaning |
| Website (optional) | https://koaning.io |
106
.github/contributors/laszabine.md
vendored
Normal file
@ -0,0 +1,106 @@
# spaCy contributor agreement

This spaCy Contributor Agreement (**"SCA"**) is based on the
[Oracle Contributor Agreement](http://www.oracle.com/technetwork/oca-405177.pdf).
The SCA applies to any contribution that you make to any product or project
managed by us (the **"project"**), and sets out the intellectual property rights
you grant to us in the contributed materials. The term **"us"** shall mean
[ExplosionAI GmbH](https://explosion.ai/legal). The term
**"you"** shall mean the person or entity identified below.

If you agree to be bound by these terms, fill in the information requested
below and include the filled-in version with your first pull request, under the
folder [`.github/contributors/`](/.github/contributors/). The name of the file
should be your GitHub username, with the extension `.md`. For example, the user
example_user would create the file `.github/contributors/example_user.md`.

Read this agreement carefully before signing. These terms and conditions
constitute a binding legal agreement.

## Contributor Agreement

1. The term "contribution" or "contributed materials" means any source code,
object code, patch, tool, sample, graphic, specification, manual,
documentation, or any other material posted or submitted by you to the project.

2. With respect to any worldwide copyrights, or copyright applications and
registrations, in your contribution:

* you hereby assign to us joint ownership, and to the extent that such
assignment is or becomes invalid, ineffective or unenforceable, you hereby
grant to us a perpetual, irrevocable, non-exclusive, worldwide, no-charge,
royalty-free, unrestricted license to exercise all rights under those
copyrights. This includes, at our option, the right to sublicense these same
rights to third parties through multiple levels of sublicensees or other
licensing arrangements;

* you agree that each of us can do all things in relation to your
contribution as if each of us were the sole owners, and if one of us makes
a derivative work of your contribution, the one who makes the derivative
work (or has it made) will be the sole owner of that derivative work;

* you agree that you will not assert any moral rights in your contribution
against us, our licensees or transferees;

* you agree that we may register a copyright in your contribution and
exercise all ownership rights associated with it; and

* you agree that neither of us has any duty to consult with, obtain the
consent of, pay or render an accounting to the other for any use or
distribution of your contribution.

3. With respect to any patents you own, or that you can license without payment
to any third party, you hereby grant to us a perpetual, irrevocable,
non-exclusive, worldwide, no-charge, royalty-free license to:

* make, have made, use, sell, offer to sell, import, and otherwise transfer
your contribution in whole or in part, alone or in combination with or
included in any product, work or materials arising out of the project to
which your contribution was submitted, and

* at our option, to sublicense these same rights to third parties through
multiple levels of sublicensees or other licensing arrangements.

4. Except as set out above, you keep all right, title, and interest in your
contribution. The rights that you grant to us under these terms are effective
on the date you first submitted a contribution to us, even if your submission
took place before the date you sign these terms.

5. You covenant, represent, warrant and agree that:

* Each contribution that you submit is and shall be an original work of
authorship and you can legally grant the rights set out in this SCA;

* to the best of your knowledge, each contribution will not violate any
third party's copyrights, trademarks, patents, or other intellectual
property rights; and

* each contribution shall be in compliance with U.S. export control laws and
other applicable export and import laws. You agree to notify us if you
become aware of any circumstance which would make any of the foregoing
representations inaccurate in any respect. We may publicly disclose your
participation in the project, including the fact that you have signed the SCA.

6. This SCA is governed by the laws of the State of California and applicable
U.S. Federal law. Any choice of law rules will not apply.

7. Please place an “x” on one of the applicable statements below. Please do NOT
mark both statements:

* [x] I am signing on behalf of myself as an individual and no other person
or entity, including my employer, has or will have rights with respect to my
contributions.

* [ ] I am signing on behalf of my employer or a legal entity and I have the
actual authority to contractually bind that entity.

## Contributor Details

| Field | Entry |
|------------------------------- | -------------------- |
| Name | Sabine Laszakovits |
| Company name (if applicable) | Austrian Academy of Sciences |
| Title or role (if applicable) | Data analyst |
| Date | 2020-04-16 |
| GitHub username | laszabine |
| Website (optional) | https://sabine.laszakovits.net |
106
.github/contributors/leicmi.md
vendored
Normal file
@ -0,0 +1,106 @@
# spaCy contributor agreement

This spaCy Contributor Agreement (**"SCA"**) is based on the
[Oracle Contributor Agreement](http://www.oracle.com/technetwork/oca-405177.pdf).
The SCA applies to any contribution that you make to any product or project
managed by us (the **"project"**), and sets out the intellectual property rights
you grant to us in the contributed materials. The term **"us"** shall mean
[ExplosionAI GmbH](https://explosion.ai/legal). The term
**"you"** shall mean the person or entity identified below.

If you agree to be bound by these terms, fill in the information requested
below and include the filled-in version with your first pull request, under the
folder [`.github/contributors/`](/.github/contributors/). The name of the file
should be your GitHub username, with the extension `.md`. For example, the user
example_user would create the file `.github/contributors/example_user.md`.

Read this agreement carefully before signing. These terms and conditions
constitute a binding legal agreement.

## Contributor Agreement

1. The term "contribution" or "contributed materials" means any source code,
object code, patch, tool, sample, graphic, specification, manual,
documentation, or any other material posted or submitted by you to the project.

2. With respect to any worldwide copyrights, or copyright applications and
registrations, in your contribution:

* you hereby assign to us joint ownership, and to the extent that such
assignment is or becomes invalid, ineffective or unenforceable, you hereby
grant to us a perpetual, irrevocable, non-exclusive, worldwide, no-charge,
royalty-free, unrestricted license to exercise all rights under those
copyrights. This includes, at our option, the right to sublicense these same
rights to third parties through multiple levels of sublicensees or other
licensing arrangements;

* you agree that each of us can do all things in relation to your
contribution as if each of us were the sole owners, and if one of us makes
a derivative work of your contribution, the one who makes the derivative
work (or has it made) will be the sole owner of that derivative work;

* you agree that you will not assert any moral rights in your contribution
against us, our licensees or transferees;

* you agree that we may register a copyright in your contribution and
exercise all ownership rights associated with it; and

* you agree that neither of us has any duty to consult with, obtain the
consent of, pay or render an accounting to the other for any use or
distribution of your contribution.

3. With respect to any patents you own, or that you can license without payment
to any third party, you hereby grant to us a perpetual, irrevocable,
non-exclusive, worldwide, no-charge, royalty-free license to:

* make, have made, use, sell, offer to sell, import, and otherwise transfer
your contribution in whole or in part, alone or in combination with or
included in any product, work or materials arising out of the project to
which your contribution was submitted, and

* at our option, to sublicense these same rights to third parties through
multiple levels of sublicensees or other licensing arrangements.

4. Except as set out above, you keep all right, title, and interest in your
contribution. The rights that you grant to us under these terms are effective
on the date you first submitted a contribution to us, even if your submission
took place before the date you sign these terms.

5. You covenant, represent, warrant and agree that:

* Each contribution that you submit is and shall be an original work of
authorship and you can legally grant the rights set out in this SCA;

* to the best of your knowledge, each contribution will not violate any
third party's copyrights, trademarks, patents, or other intellectual
property rights; and

* each contribution shall be in compliance with U.S. export control laws and
other applicable export and import laws. You agree to notify us if you
become aware of any circumstance which would make any of the foregoing
representations inaccurate in any respect. We may publicly disclose your
participation in the project, including the fact that you have signed the SCA.

6. This SCA is governed by the laws of the State of California and applicable
U.S. Federal law. Any choice of law rules will not apply.

7. Please place an “x” on one of the applicable statements below. Please do NOT
mark both statements:

* [x] I am signing on behalf of myself as an individual and no other person
or entity, including my employer, has or will have rights with respect to my
contributions.

* [ ] I am signing on behalf of my employer or a legal entity and I have the
actual authority to contractually bind that entity.

## Contributor Details

| Field | Entry |
|------------------------------- | -------------------- |
| Name | Michael Leichtfried |
| Company name (if applicable) | |
| Title or role (if applicable) | |
| Date | 30.03.2020 |
| GitHub username | leicmi |
| Website (optional) | |
106
.github/contributors/louisguitton.md
vendored
Normal file
@ -0,0 +1,106 @@
# spaCy contributor agreement

This spaCy Contributor Agreement (**"SCA"**) is based on the
[Oracle Contributor Agreement](http://www.oracle.com/technetwork/oca-405177.pdf).
The SCA applies to any contribution that you make to any product or project
managed by us (the **"project"**), and sets out the intellectual property rights
you grant to us in the contributed materials. The term **"us"** shall mean
[ExplosionAI GmbH](https://explosion.ai/legal). The term
**"you"** shall mean the person or entity identified below.

If you agree to be bound by these terms, fill in the information requested
below and include the filled-in version with your first pull request, under the
folder [`.github/contributors/`](/.github/contributors/). The name of the file
should be your GitHub username, with the extension `.md`. For example, the user
example_user would create the file `.github/contributors/example_user.md`.

Read this agreement carefully before signing. These terms and conditions
constitute a binding legal agreement.

## Contributor Agreement

1. The term "contribution" or "contributed materials" means any source code,
object code, patch, tool, sample, graphic, specification, manual,
documentation, or any other material posted or submitted by you to the project.

2. With respect to any worldwide copyrights, or copyright applications and
registrations, in your contribution:

* you hereby assign to us joint ownership, and to the extent that such
assignment is or becomes invalid, ineffective or unenforceable, you hereby
grant to us a perpetual, irrevocable, non-exclusive, worldwide, no-charge,
royalty-free, unrestricted license to exercise all rights under those
copyrights. This includes, at our option, the right to sublicense these same
rights to third parties through multiple levels of sublicensees or other
licensing arrangements;

* you agree that each of us can do all things in relation to your
contribution as if each of us were the sole owners, and if one of us makes
a derivative work of your contribution, the one who makes the derivative
work (or has it made) will be the sole owner of that derivative work;

* you agree that you will not assert any moral rights in your contribution
against us, our licensees or transferees;

* you agree that we may register a copyright in your contribution and
exercise all ownership rights associated with it; and

* you agree that neither of us has any duty to consult with, obtain the
consent of, pay or render an accounting to the other for any use or
distribution of your contribution.

3. With respect to any patents you own, or that you can license without payment
to any third party, you hereby grant to us a perpetual, irrevocable,
non-exclusive, worldwide, no-charge, royalty-free license to:

* make, have made, use, sell, offer to sell, import, and otherwise transfer
your contribution in whole or in part, alone or in combination with or
included in any product, work or materials arising out of the project to
which your contribution was submitted, and

* at our option, to sublicense these same rights to third parties through
multiple levels of sublicensees or other licensing arrangements.

4. Except as set out above, you keep all right, title, and interest in your
contribution. The rights that you grant to us under these terms are effective
on the date you first submitted a contribution to us, even if your submission
took place before the date you sign these terms.

5. You covenant, represent, warrant and agree that:

* Each contribution that you submit is and shall be an original work of
authorship and you can legally grant the rights set out in this SCA;

* to the best of your knowledge, each contribution will not violate any
third party's copyrights, trademarks, patents, or other intellectual
property rights; and

* each contribution shall be in compliance with U.S. export control laws and
other applicable export and import laws. You agree to notify us if you
become aware of any circumstance which would make any of the foregoing
representations inaccurate in any respect. We may publicly disclose your
participation in the project, including the fact that you have signed the SCA.

6. This SCA is governed by the laws of the State of California and applicable
U.S. Federal law. Any choice of law rules will not apply.

7. Please place an “x” on one of the applicable statements below. Please do NOT
mark both statements:

* [x] I am signing on behalf of myself as an individual and no other person
or entity, including my employer, has or will have rights with respect to my
contributions.

* [ ] I am signing on behalf of my employer or a legal entity and I have the
actual authority to contractually bind that entity.

## Contributor Details

| Field | Entry |
|------------------------------- | -------------------- |
| Name | Louis Guitton |
| Company name (if applicable) | |
| Title or role (if applicable) | |
| Date | 2020-04-25 |
| GitHub username | louisguitton |
| Website (optional) | https://guitton.co/ |
106
.github/contributors/michael-k.md
vendored
Normal file
@ -0,0 +1,106 @@
# spaCy contributor agreement

This spaCy Contributor Agreement (**"SCA"**) is based on the
[Oracle Contributor Agreement](http://www.oracle.com/technetwork/oca-405177.pdf).
The SCA applies to any contribution that you make to any product or project
managed by us (the **"project"**), and sets out the intellectual property rights
you grant to us in the contributed materials. The term **"us"** shall mean
[ExplosionAI GmbH](https://explosion.ai/legal). The term
**"you"** shall mean the person or entity identified below.

If you agree to be bound by these terms, fill in the information requested
below and include the filled-in version with your first pull request, under the
folder [`.github/contributors/`](/.github/contributors/). The name of the file
should be your GitHub username, with the extension `.md`. For example, the user
example_user would create the file `.github/contributors/example_user.md`.

Read this agreement carefully before signing. These terms and conditions
constitute a binding legal agreement.

## Contributor Agreement

1. The term "contribution" or "contributed materials" means any source code,
object code, patch, tool, sample, graphic, specification, manual,
documentation, or any other material posted or submitted by you to the project.

2. With respect to any worldwide copyrights, or copyright applications and
registrations, in your contribution:

* you hereby assign to us joint ownership, and to the extent that such
assignment is or becomes invalid, ineffective or unenforceable, you hereby
grant to us a perpetual, irrevocable, non-exclusive, worldwide, no-charge,
royalty-free, unrestricted license to exercise all rights under those
copyrights. This includes, at our option, the right to sublicense these same
rights to third parties through multiple levels of sublicensees or other
licensing arrangements;

* you agree that each of us can do all things in relation to your
contribution as if each of us were the sole owners, and if one of us makes
a derivative work of your contribution, the one who makes the derivative
work (or has it made) will be the sole owner of that derivative work;

* you agree that you will not assert any moral rights in your contribution
against us, our licensees or transferees;

* you agree that we may register a copyright in your contribution and
exercise all ownership rights associated with it; and

* you agree that neither of us has any duty to consult with, obtain the
consent of, pay or render an accounting to the other for any use or
distribution of your contribution.

3. With respect to any patents you own, or that you can license without payment
to any third party, you hereby grant to us a perpetual, irrevocable,
non-exclusive, worldwide, no-charge, royalty-free license to:

* make, have made, use, sell, offer to sell, import, and otherwise transfer
your contribution in whole or in part, alone or in combination with or
included in any product, work or materials arising out of the project to
which your contribution was submitted, and

* at our option, to sublicense these same rights to third parties through
multiple levels of sublicensees or other licensing arrangements.

4. Except as set out above, you keep all right, title, and interest in your
contribution. The rights that you grant to us under these terms are effective
on the date you first submitted a contribution to us, even if your submission
took place before the date you sign these terms.

5. You covenant, represent, warrant and agree that:

* Each contribution that you submit is and shall be an original work of
authorship and you can legally grant the rights set out in this SCA;

* to the best of your knowledge, each contribution will not violate any
third party's copyrights, trademarks, patents, or other intellectual
property rights; and

* each contribution shall be in compliance with U.S. export control laws and
other applicable export and import laws. You agree to notify us if you
become aware of any circumstance which would make any of the foregoing
representations inaccurate in any respect. We may publicly disclose your
participation in the project, including the fact that you have signed the SCA.

6. This SCA is governed by the laws of the State of California and applicable
U.S. Federal law. Any choice of law rules will not apply.

7. Please place an “x” on one of the applicable statements below. Please do NOT
mark both statements:

* [X] I am signing on behalf of myself as an individual and no other person
or entity, including my employer, has or will have rights with respect to my
contributions.

* [ ] I am signing on behalf of my employer or a legal entity and I have the
actual authority to contractually bind that entity.

## Contributor Details

| Field | Entry |
|------------------------------- | -------------------- |
| Name | Michael Käufl |
| Company name (if applicable) | |
| Title or role (if applicable) | |
| Date | 2020-04-23 |
| GitHub username | michael-k |
| Website (optional) | |
106
.github/contributors/nikhilsaldanha.md
vendored
Normal file
@ -0,0 +1,106 @@
# spaCy contributor agreement

This spaCy Contributor Agreement (**"SCA"**) is based on the
[Oracle Contributor Agreement](http://www.oracle.com/technetwork/oca-405177.pdf).
The SCA applies to any contribution that you make to any product or project
managed by us (the **"project"**), and sets out the intellectual property rights
you grant to us in the contributed materials. The term **"us"** shall mean
[ExplosionAI GmbH](https://explosion.ai/legal). The term
**"you"** shall mean the person or entity identified below.

If you agree to be bound by these terms, fill in the information requested
below and include the filled-in version with your first pull request, under the
folder [`.github/contributors/`](/.github/contributors/). The name of the file
should be your GitHub username, with the extension `.md`. For example, the user
example_user would create the file `.github/contributors/example_user.md`.

Read this agreement carefully before signing. These terms and conditions
constitute a binding legal agreement.

## Contributor Agreement

1. The term "contribution" or "contributed materials" means any source code,
object code, patch, tool, sample, graphic, specification, manual,
documentation, or any other material posted or submitted by you to the project.

2. With respect to any worldwide copyrights, or copyright applications and
registrations, in your contribution:

* you hereby assign to us joint ownership, and to the extent that such
assignment is or becomes invalid, ineffective or unenforceable, you hereby
grant to us a perpetual, irrevocable, non-exclusive, worldwide, no-charge,
royalty-free, unrestricted license to exercise all rights under those
copyrights. This includes, at our option, the right to sublicense these same
rights to third parties through multiple levels of sublicensees or other
licensing arrangements;

* you agree that each of us can do all things in relation to your
contribution as if each of us were the sole owners, and if one of us makes
a derivative work of your contribution, the one who makes the derivative
work (or has it made) will be the sole owner of that derivative work;

* you agree that you will not assert any moral rights in your contribution
against us, our licensees or transferees;

* you agree that we may register a copyright in your contribution and
exercise all ownership rights associated with it; and

* you agree that neither of us has any duty to consult with, obtain the
consent of, pay or render an accounting to the other for any use or
distribution of your contribution.

3. With respect to any patents you own, or that you can license without payment
to any third party, you hereby grant to us a perpetual, irrevocable,
non-exclusive, worldwide, no-charge, royalty-free license to:

* make, have made, use, sell, offer to sell, import, and otherwise transfer
your contribution in whole or in part, alone or in combination with or
included in any product, work or materials arising out of the project to
which your contribution was submitted, and

* at our option, to sublicense these same rights to third parties through
multiple levels of sublicensees or other licensing arrangements.

4. Except as set out above, you keep all right, title, and interest in your
contribution. The rights that you grant to us under these terms are effective
on the date you first submitted a contribution to us, even if your submission
took place before the date you sign these terms.

5. You covenant, represent, warrant and agree that:

* Each contribution that you submit is and shall be an original work of
authorship and you can legally grant the rights set out in this SCA;

* to the best of your knowledge, each contribution will not violate any
third party's copyrights, trademarks, patents, or other intellectual
property rights; and

* each contribution shall be in compliance with U.S. export control laws and
other applicable export and import laws. You agree to notify us if you
become aware of any circumstance which would make any of the foregoing
representations inaccurate in any respect. We may publicly disclose your
participation in the project, including the fact that you have signed the SCA.

6. This SCA is governed by the laws of the State of California and applicable
U.S. Federal law. Any choice of law rules will not apply.

7. Please place an “x” on one of the applicable statements below. Please do NOT
mark both statements:

* [x] I am signing on behalf of myself as an individual and no other person
or entity, including my employer, has or will have rights with respect to my
contributions.

* [x] I am signing on behalf of my employer or a legal entity and I have the
actual authority to contractually bind that entity.

## Contributor Details

| Field | Entry |
|------------------------------- | -------------------- |
| Name | Nikhil Saldanha |
| Company name (if applicable) | |
| Title or role (if applicable) | |
| Date | 2020-03-17 |
| GitHub username | nikhilsaldanha |
| Website (optional) | |
106
.github/contributors/osori.md
vendored
Normal file
@ -0,0 +1,106 @@
# spaCy contributor agreement

This spaCy Contributor Agreement (**"SCA"**) is based on the
[Oracle Contributor Agreement](http://www.oracle.com/technetwork/oca-405177.pdf).
The SCA applies to any contribution that you make to any product or project
managed by us (the **"project"**), and sets out the intellectual property rights
you grant to us in the contributed materials. The term **"us"** shall mean
[ExplosionAI GmbH](https://explosion.ai/legal). The term
**"you"** shall mean the person or entity identified below.

If you agree to be bound by these terms, fill in the information requested
below and include the filled-in version with your first pull request, under the
folder [`.github/contributors/`](/.github/contributors/). The name of the file
should be your GitHub username, with the extension `.md`. For example, the user
example_user would create the file `.github/contributors/example_user.md`.

Read this agreement carefully before signing. These terms and conditions
constitute a binding legal agreement.

## Contributor Agreement

1. The term "contribution" or "contributed materials" means any source code,
object code, patch, tool, sample, graphic, specification, manual,
documentation, or any other material posted or submitted by you to the project.

2. With respect to any worldwide copyrights, or copyright applications and
registrations, in your contribution:

* you hereby assign to us joint ownership, and to the extent that such
assignment is or becomes invalid, ineffective or unenforceable, you hereby
grant to us a perpetual, irrevocable, non-exclusive, worldwide, no-charge,
royalty-free, unrestricted license to exercise all rights under those
copyrights. This includes, at our option, the right to sublicense these same
rights to third parties through multiple levels of sublicensees or other
licensing arrangements;

* you agree that each of us can do all things in relation to your
contribution as if each of us were the sole owners, and if one of us makes
a derivative work of your contribution, the one who makes the derivative
work (or has it made) will be the sole owner of that derivative work;

* you agree that you will not assert any moral rights in your contribution
against us, our licensees or transferees;

* you agree that we may register a copyright in your contribution and
exercise all ownership rights associated with it; and

* you agree that neither of us has any duty to consult with, obtain the
consent of, pay or render an accounting to the other for any use or
distribution of your contribution.

3. With respect to any patents you own, or that you can license without payment
to any third party, you hereby grant to us a perpetual, irrevocable,
non-exclusive, worldwide, no-charge, royalty-free license to:

* make, have made, use, sell, offer to sell, import, and otherwise transfer
your contribution in whole or in part, alone or in combination with or
included in any product, work or materials arising out of the project to
which your contribution was submitted, and

* at our option, to sublicense these same rights to third parties through
multiple levels of sublicensees or other licensing arrangements.

4. Except as set out above, you keep all right, title, and interest in your
contribution. The rights that you grant to us under these terms are effective
on the date you first submitted a contribution to us, even if your submission
took place before the date you sign these terms.

5. You covenant, represent, warrant and agree that:

* Each contribution that you submit is and shall be an original work of
authorship and you can legally grant the rights set out in this SCA;

* to the best of your knowledge, each contribution will not violate any
third party's copyrights, trademarks, patents, or other intellectual
property rights; and

* each contribution shall be in compliance with U.S. export control laws and
other applicable export and import laws. You agree to notify us if you
become aware of any circumstance which would make any of the foregoing
representations inaccurate in any respect. We may publicly disclose your
participation in the project, including the fact that you have signed the SCA.

6. This SCA is governed by the laws of the State of California and applicable
U.S. Federal law. Any choice of law rules will not apply.

7. Please place an “x” on one of the applicable statements below. Please do NOT
mark both statements:

* [x] I am signing on behalf of myself as an individual and no other person
or entity, including my employer, has or will have rights with respect to my
contributions.

* [ ] I am signing on behalf of my employer or a legal entity and I have the
actual authority to contractually bind that entity.

## Contributor Details

| Field | Entry |
|------------------------------- | -------------------- |
| Name | Ilkyu Ju |
| Company name (if applicable) | |
| Title or role (if applicable) | |
| Date | 2020-05-17 |
| GitHub username | osori |
| Website (optional) | |
106
.github/contributors/paoloq.md
vendored
Normal file
@ -0,0 +1,106 @@
# spaCy contributor agreement

This spaCy Contributor Agreement (**"SCA"**) is based on the
[Oracle Contributor Agreement](http://www.oracle.com/technetwork/oca-405177.pdf).
The SCA applies to any contribution that you make to any product or project
managed by us (the **"project"**), and sets out the intellectual property rights
you grant to us in the contributed materials. The term **"us"** shall mean
[ExplosionAI GmbH](https://explosion.ai/legal). The term
**"you"** shall mean the person or entity identified below.

If you agree to be bound by these terms, fill in the information requested
below and include the filled-in version with your first pull request, under the
folder [`.github/contributors/`](/.github/contributors/). The name of the file
should be your GitHub username, with the extension `.md`. For example, the user
example_user would create the file `.github/contributors/example_user.md`.

Read this agreement carefully before signing. These terms and conditions
constitute a binding legal agreement.

## Contributor Agreement

1. The term "contribution" or "contributed materials" means any source code,
object code, patch, tool, sample, graphic, specification, manual,
documentation, or any other material posted or submitted by you to the project.

2. With respect to any worldwide copyrights, or copyright applications and
registrations, in your contribution:

* you hereby assign to us joint ownership, and to the extent that such
assignment is or becomes invalid, ineffective or unenforceable, you hereby
grant to us a perpetual, irrevocable, non-exclusive, worldwide, no-charge,
royalty-free, unrestricted license to exercise all rights under those
copyrights. This includes, at our option, the right to sublicense these same
rights to third parties through multiple levels of sublicensees or other
licensing arrangements;

* you agree that each of us can do all things in relation to your
contribution as if each of us were the sole owners, and if one of us makes
a derivative work of your contribution, the one who makes the derivative
work (or has it made) will be the sole owner of that derivative work;

* you agree that you will not assert any moral rights in your contribution
against us, our licensees or transferees;

* you agree that we may register a copyright in your contribution and
exercise all ownership rights associated with it; and

* you agree that neither of us has any duty to consult with, obtain the
consent of, pay or render an accounting to the other for any use or
distribution of your contribution.

3. With respect to any patents you own, or that you can license without payment
to any third party, you hereby grant to us a perpetual, irrevocable,
non-exclusive, worldwide, no-charge, royalty-free license to:

* make, have made, use, sell, offer to sell, import, and otherwise transfer
your contribution in whole or in part, alone or in combination with or
included in any product, work or materials arising out of the project to
which your contribution was submitted, and

* at our option, to sublicense these same rights to third parties through
multiple levels of sublicensees or other licensing arrangements.

4. Except as set out above, you keep all right, title, and interest in your
contribution. The rights that you grant to us under these terms are effective
on the date you first submitted a contribution to us, even if your submission
took place before the date you sign these terms.

5. You covenant, represent, warrant and agree that:

* Each contribution that you submit is and shall be an original work of
authorship and you can legally grant the rights set out in this SCA;

* to the best of your knowledge, each contribution will not violate any
third party's copyrights, trademarks, patents, or other intellectual
property rights; and

* each contribution shall be in compliance with U.S. export control laws and
other applicable export and import laws. You agree to notify us if you
become aware of any circumstance which would make any of the foregoing
representations inaccurate in any respect. We may publicly disclose your
participation in the project, including the fact that you have signed the SCA.

6. This SCA is governed by the laws of the State of California and applicable
U.S. Federal law. Any choice of law rules will not apply.

7. Please place an “x” on one of the applicable statements below. Please do NOT
mark both statements:

* [x] I am signing on behalf of myself as an individual and no other person
or entity, including my employer, has or will have rights with respect to my
contributions.

* [ ] I am signing on behalf of my employer or a legal entity and I have the
actual authority to contractually bind that entity.

## Contributor Details

| Field | Entry |
|------------------------------- | -------------------- |
| Name | Paolo Arduin |
| Company name (if applicable) | |
| Title or role (if applicable) | |
| Date | 9 April 2020 |
| GitHub username | paoloq |
| Website (optional) | |
107
.github/contributors/punitvara.md
vendored
Normal file
@ -0,0 +1,107 @@
# spaCy contributor agreement

This spaCy Contributor Agreement (**"SCA"**) is based on the
[Oracle Contributor Agreement](http://www.oracle.com/technetwork/oca-405177.pdf).
The SCA applies to any contribution that you make to any product or project
managed by us (the **"project"**), and sets out the intellectual property rights
you grant to us in the contributed materials. The term **"us"** shall mean
[ExplosionAI UG (haftungsbeschränkt)](https://explosion.ai/legal). The term
**"you"** shall mean the person or entity identified below.

If you agree to be bound by these terms, fill in the information requested
below and include the filled-in version with your first pull request, under the
folder [`.github/contributors/`](/.github/contributors/). The name of the file
should be your GitHub username, with the extension `.md`. For example, the user
example_user would create the file `.github/contributors/example_user.md`.

Read this agreement carefully before signing. These terms and conditions
constitute a binding legal agreement.

## Contributor Agreement

1. The term "contribution" or "contributed materials" means any source code,
object code, patch, tool, sample, graphic, specification, manual,
documentation, or any other material posted or submitted by you to the project.

2. With respect to any worldwide copyrights, or copyright applications and
registrations, in your contribution:

* you hereby assign to us joint ownership, and to the extent that such
assignment is or becomes invalid, ineffective or unenforceable, you hereby
grant to us a perpetual, irrevocable, non-exclusive, worldwide, no-charge,
royalty-free, unrestricted license to exercise all rights under those
copyrights. This includes, at our option, the right to sublicense these same
rights to third parties through multiple levels of sublicensees or other
licensing arrangements;

* you agree that each of us can do all things in relation to your
contribution as if each of us were the sole owners, and if one of us makes
a derivative work of your contribution, the one who makes the derivative
|
||||
work (or has it made will be the sole owner of that derivative work;
|
||||
|
||||
* you agree that you will not assert any moral rights in your contribution
|
||||
against us, our licensees or transferees;
|
||||
|
||||
* you agree that we may register a copyright in your contribution and
|
||||
exercise all ownership rights associated with it; and
|
||||
|
||||
* you agree that neither of us has any duty to consult with, obtain the
|
||||
consent of, pay or render an accounting to the other for any use or
|
||||
distribution of your contribution.
|
||||
|
||||
3. With respect to any patents you own, or that you can license without payment
|
||||
to any third party, you hereby grant to us a perpetual, irrevocable,
|
||||
non-exclusive, worldwide, no-charge, royalty-free license to:
|
||||
|
||||
* make, have made, use, sell, offer to sell, import, and otherwise transfer
|
||||
your contribution in whole or in part, alone or in combination with or
|
||||
included in any product, work or materials arising out of the project to
|
||||
which your contribution was submitted, and
|
||||
|
||||
* at our option, to sublicense these same rights to third parties through
|
||||
multiple levels of sublicensees or other licensing arrangements.
|
||||
|
||||
4. Except as set out above, you keep all right, title, and interest in your
|
||||
contribution. The rights that you grant to us under these terms are effective
|
||||
on the date you first submitted a contribution to us, even if your submission
|
||||
took place before the date you sign these terms.
|
||||
|
||||
5. You covenant, represent, warrant and agree that:
|
||||
|
||||
* Each contribution that you submit is and shall be an original work of
|
||||
authorship and you can legally grant the rights set out in this SCA;
|
||||
|
||||
* to the best of your knowledge, each contribution will not violate any
|
||||
third party's copyrights, trademarks, patents, or other intellectual
|
||||
property rights; and
|
||||
|
||||
* each contribution shall be in compliance with U.S. export control laws and
|
||||
other applicable export and import laws. You agree to notify us if you
|
||||
become aware of any circumstance which would make any of the foregoing
|
||||
representations inaccurate in any respect. We may publicly disclose your
|
||||
participation in the project, including the fact that you have signed the SCA.
|
||||
|
||||
6. This SCA is governed by the laws of the State of California and applicable
|
||||
U.S. Federal law. Any choice of law rules will not apply.
|
||||
|
||||
7. Please place an “x” on one of the applicable statement below. Please do NOT
|
||||
mark both statements:
|
||||
|
||||
* [x] I am signing on behalf of myself as an individual and no other person
|
||||
or entity, including my employer, has or will have rights with respect to my
|
||||
contributions.
|
||||
|
||||
* [ ] I am signing on behalf of my employer or a legal entity and I have the
|
||||
actual authority to contractually bind that entity.
|
||||
|
||||
## Contributor Details
|
||||
|
||||
| Field | Entry |
|
||||
|------------------------------- | ------------------------ |
|
||||
| Name | Punit Vara |
|
||||
| Company name (if applicable) | |
|
||||
| Title or role (if applicable) | |
|
||||
| Date | 2020-04-26 |
|
||||
| GitHub username | punitvara |
|
||||
| Website (optional) | https://punitvara.com |
|
||||
|
106
.github/contributors/sabiqueqb.md
vendored
Normal file
106
.github/contributors/sabiqueqb.md
vendored
Normal file
|
@ -0,0 +1,106 @@
|
|||
# spaCy contributor agreement
|
||||
|
||||
This spaCy Contributor Agreement (**"SCA"**) is based on the
|
||||
[Oracle Contributor Agreement](http://www.oracle.com/technetwork/oca-405177.pdf).
The SCA applies to any contribution that you make to any product or project
managed by us (the **"project"**), and sets out the intellectual property rights
you grant to us in the contributed materials. The term **"us"** shall mean
[ExplosionAI GmbH](https://explosion.ai/legal). The term
**"you"** shall mean the person or entity identified below.

If you agree to be bound by these terms, fill in the information requested
below and include the filled-in version with your first pull request, under the
folder [`.github/contributors/`](/.github/contributors/). The name of the file
should be your GitHub username, with the extension `.md`. For example, the user
example_user would create the file `.github/contributors/example_user.md`.

Read this agreement carefully before signing. These terms and conditions
constitute a binding legal agreement.

## Contributor Agreement

1. The term "contribution" or "contributed materials" means any source code,
object code, patch, tool, sample, graphic, specification, manual,
documentation, or any other material posted or submitted by you to the project.

2. With respect to any worldwide copyrights, or copyright applications and
registrations, in your contribution:

* you hereby assign to us joint ownership, and to the extent that such
assignment is or becomes invalid, ineffective or unenforceable, you hereby
grant to us a perpetual, irrevocable, non-exclusive, worldwide, no-charge,
royalty-free, unrestricted license to exercise all rights under those
copyrights. This includes, at our option, the right to sublicense these same
rights to third parties through multiple levels of sublicensees or other
licensing arrangements;

* you agree that each of us can do all things in relation to your
contribution as if each of us were the sole owners, and if one of us makes
a derivative work of your contribution, the one who makes the derivative
work (or has it made will be the sole owner of that derivative work;

* you agree that you will not assert any moral rights in your contribution
against us, our licensees or transferees;

* you agree that we may register a copyright in your contribution and
exercise all ownership rights associated with it; and

* you agree that neither of us has any duty to consult with, obtain the
consent of, pay or render an accounting to the other for any use or
distribution of your contribution.

3. With respect to any patents you own, or that you can license without payment
to any third party, you hereby grant to us a perpetual, irrevocable,
non-exclusive, worldwide, no-charge, royalty-free license to:

* make, have made, use, sell, offer to sell, import, and otherwise transfer
your contribution in whole or in part, alone or in combination with or
included in any product, work or materials arising out of the project to
which your contribution was submitted, and

* at our option, to sublicense these same rights to third parties through
multiple levels of sublicensees or other licensing arrangements.

4. Except as set out above, you keep all right, title, and interest in your
contribution. The rights that you grant to us under these terms are effective
on the date you first submitted a contribution to us, even if your submission
took place before the date you sign these terms.

5. You covenant, represent, warrant and agree that:

* Each contribution that you submit is and shall be an original work of
authorship and you can legally grant the rights set out in this SCA;

* to the best of your knowledge, each contribution will not violate any
third party's copyrights, trademarks, patents, or other intellectual
property rights; and

* each contribution shall be in compliance with U.S. export control laws and
other applicable export and import laws. You agree to notify us if you
become aware of any circumstance which would make any of the foregoing
representations inaccurate in any respect. We may publicly disclose your
participation in the project, including the fact that you have signed the SCA.

6. This SCA is governed by the laws of the State of California and applicable
U.S. Federal law. Any choice of law rules will not apply.

7. Please place an “x” on one of the applicable statement below. Please do NOT
mark both statements:

* [ ] I am signing on behalf of myself as an individual and no other person
or entity, including my employer, has or will have rights with respect to my
contributions.

* [x] I am signing on behalf of my employer or a legal entity and I have the
actual authority to contractually bind that entity.

## Contributor Details

| Field | Entry |
|------------------------------- | -------------------- |
| Name | Sabique Ahammed Lava |
| Company name (if applicable) | QBurst |
| Title or role (if applicable) | Senior Engineer |
| Date | 24 Apr 2020 |
| GitHub username | sabiqueqb |
| Website (optional) | |

.github/contributors/sebastienharinck.md

# spaCy contributor agreement

This spaCy Contributor Agreement (**"SCA"**) is based on the
[Oracle Contributor Agreement](http://www.oracle.com/technetwork/oca-405177.pdf).
The SCA applies to any contribution that you make to any product or project
managed by us (the **"project"**), and sets out the intellectual property rights
you grant to us in the contributed materials. The term **"us"** shall mean
[ExplosionAI GmbH](https://explosion.ai/legal). The term
**"you"** shall mean the person or entity identified below.

If you agree to be bound by these terms, fill in the information requested
below and include the filled-in version with your first pull request, under the
folder [`.github/contributors/`](/.github/contributors/). The name of the file
should be your GitHub username, with the extension `.md`. For example, the user
example_user would create the file `.github/contributors/example_user.md`.

Read this agreement carefully before signing. These terms and conditions
constitute a binding legal agreement.

## Contributor Agreement

1. The term "contribution" or "contributed materials" means any source code,
object code, patch, tool, sample, graphic, specification, manual,
documentation, or any other material posted or submitted by you to the project.

2. With respect to any worldwide copyrights, or copyright applications and
registrations, in your contribution:

* you hereby assign to us joint ownership, and to the extent that such
assignment is or becomes invalid, ineffective or unenforceable, you hereby
grant to us a perpetual, irrevocable, non-exclusive, worldwide, no-charge,
royalty-free, unrestricted license to exercise all rights under those
copyrights. This includes, at our option, the right to sublicense these same
rights to third parties through multiple levels of sublicensees or other
licensing arrangements;

* you agree that each of us can do all things in relation to your
contribution as if each of us were the sole owners, and if one of us makes
a derivative work of your contribution, the one who makes the derivative
work (or has it made will be the sole owner of that derivative work;

* you agree that you will not assert any moral rights in your contribution
against us, our licensees or transferees;

* you agree that we may register a copyright in your contribution and
exercise all ownership rights associated with it; and

* you agree that neither of us has any duty to consult with, obtain the
consent of, pay or render an accounting to the other for any use or
distribution of your contribution.

3. With respect to any patents you own, or that you can license without payment
to any third party, you hereby grant to us a perpetual, irrevocable,
non-exclusive, worldwide, no-charge, royalty-free license to:

* make, have made, use, sell, offer to sell, import, and otherwise transfer
your contribution in whole or in part, alone or in combination with or
included in any product, work or materials arising out of the project to
which your contribution was submitted, and

* at our option, to sublicense these same rights to third parties through
multiple levels of sublicensees or other licensing arrangements.

4. Except as set out above, you keep all right, title, and interest in your
contribution. The rights that you grant to us under these terms are effective
on the date you first submitted a contribution to us, even if your submission
took place before the date you sign these terms.

5. You covenant, represent, warrant and agree that:

* Each contribution that you submit is and shall be an original work of
authorship and you can legally grant the rights set out in this SCA;

* to the best of your knowledge, each contribution will not violate any
third party's copyrights, trademarks, patents, or other intellectual
property rights; and

* each contribution shall be in compliance with U.S. export control laws and
other applicable export and import laws. You agree to notify us if you
become aware of any circumstance which would make any of the foregoing
representations inaccurate in any respect. We may publicly disclose your
participation in the project, including the fact that you have signed the SCA.

6. This SCA is governed by the laws of the State of California and applicable
U.S. Federal law. Any choice of law rules will not apply.

7. Please place an “x” on one of the applicable statement below. Please do NOT
mark both statements:

* [ ] I am signing on behalf of myself as an individual and no other person
or entity, including my employer, has or will have rights with respect to my
contributions.

* [x] I am signing on behalf of my employer or a legal entity and I have the
actual authority to contractually bind that entity.

## Contributor Details

| Field | Entry |
|------------------------------- | -------------------------------------------- |
| Name | Sébastien Harinck |
| Company name (if applicable) | Odaxiom |
| Title or role (if applicable) | ML Engineer |
| Date | 2020-04-15 |
| GitHub username | sebastienharinck |
| Website (optional) | [https://odaxiom.com](https://odaxiom.com) |

.github/contributors/thomasthiebaud.md

# spaCy contributor agreement

This spaCy Contributor Agreement (**"SCA"**) is based on the
[Oracle Contributor Agreement](http://www.oracle.com/technetwork/oca-405177.pdf).
The SCA applies to any contribution that you make to any product or project
managed by us (the **"project"**), and sets out the intellectual property rights
you grant to us in the contributed materials. The term **"us"** shall mean
[ExplosionAI UG (haftungsbeschränkt)](https://explosion.ai/legal). The term
**"you"** shall mean the person or entity identified below.

If you agree to be bound by these terms, fill in the information requested
below and include the filled-in version with your first pull request, under the
folder [`.github/contributors/`](/.github/contributors/). The name of the file
should be your GitHub username, with the extension `.md`. For example, the user
example_user would create the file `.github/contributors/example_user.md`.

Read this agreement carefully before signing. These terms and conditions
constitute a binding legal agreement.

## Contributor Agreement

1. The term "contribution" or "contributed materials" means any source code,
object code, patch, tool, sample, graphic, specification, manual,
documentation, or any other material posted or submitted by you to the project.

2. With respect to any worldwide copyrights, or copyright applications and
registrations, in your contribution:

* you hereby assign to us joint ownership, and to the extent that such
assignment is or becomes invalid, ineffective or unenforceable, you hereby
grant to us a perpetual, irrevocable, non-exclusive, worldwide, no-charge,
royalty-free, unrestricted license to exercise all rights under those
copyrights. This includes, at our option, the right to sublicense these same
rights to third parties through multiple levels of sublicensees or other
licensing arrangements;

* you agree that each of us can do all things in relation to your
contribution as if each of us were the sole owners, and if one of us makes
a derivative work of your contribution, the one who makes the derivative
work (or has it made will be the sole owner of that derivative work;

* you agree that you will not assert any moral rights in your contribution
against us, our licensees or transferees;

* you agree that we may register a copyright in your contribution and
exercise all ownership rights associated with it; and

* you agree that neither of us has any duty to consult with, obtain the
consent of, pay or render an accounting to the other for any use or
distribution of your contribution.

3. With respect to any patents you own, or that you can license without payment
to any third party, you hereby grant to us a perpetual, irrevocable,
non-exclusive, worldwide, no-charge, royalty-free license to:

* make, have made, use, sell, offer to sell, import, and otherwise transfer
your contribution in whole or in part, alone or in combination with or
included in any product, work or materials arising out of the project to
which your contribution was submitted, and

* at our option, to sublicense these same rights to third parties through
multiple levels of sublicensees or other licensing arrangements.

4. Except as set out above, you keep all right, title, and interest in your
contribution. The rights that you grant to us under these terms are effective
on the date you first submitted a contribution to us, even if your submission
took place before the date you sign these terms.

5. You covenant, represent, warrant and agree that:

- Each contribution that you submit is and shall be an original work of
authorship and you can legally grant the rights set out in this SCA;

- to the best of your knowledge, each contribution will not violate any
third party's copyrights, trademarks, patents, or other intellectual
property rights; and

- each contribution shall be in compliance with U.S. export control laws and
other applicable export and import laws. You agree to notify us if you
become aware of any circumstance which would make any of the foregoing
representations inaccurate in any respect. We may publicly disclose your
participation in the project, including the fact that you have signed the SCA.

6. This SCA is governed by the laws of the State of California and applicable
U.S. Federal law. Any choice of law rules will not apply.

7. Please place an “x” on one of the applicable statement below. Please do NOT
mark both statements:

* [x] I am signing on behalf of myself as an individual and no other person
or entity, including my employer, has or will have rights with respect to my
contributions.

* [ ] I am signing on behalf of my employer or a legal entity and I have the
actual authority to contractually bind that entity.

## Contributor Details

| Field | Entry |
| ----------------------------- | --------------- |
| Name | Thomas Thiebaud |
| Company name (if applicable) | |
| Title or role (if applicable) | |
| Date | 2020-04-07 |
| GitHub username | thomasthiebaud |
| Website (optional) | |

.github/contributors/thoppe.md

# spaCy contributor agreement

This spaCy Contributor Agreement (**"SCA"**) is based on the
[Oracle Contributor Agreement](http://www.oracle.com/technetwork/oca-405177.pdf).
The SCA applies to any contribution that you make to any product or project
managed by us (the **"project"**), and sets out the intellectual property rights
you grant to us in the contributed materials. The term **"us"** shall mean
[ExplosionAI GmbH](https://explosion.ai/legal). The term
**"you"** shall mean the person or entity identified below.

If you agree to be bound by these terms, fill in the information requested
below and include the filled-in version with your first pull request, under the
folder [`.github/contributors/`](/.github/contributors/). The name of the file
should be your GitHub username, with the extension `.md`. For example, the user
example_user would create the file `.github/contributors/example_user.md`.

Read this agreement carefully before signing. These terms and conditions
constitute a binding legal agreement.

## Contributor Agreement

1. The term "contribution" or "contributed materials" means any source code,
object code, patch, tool, sample, graphic, specification, manual,
documentation, or any other material posted or submitted by you to the project.

2. With respect to any worldwide copyrights, or copyright applications and
registrations, in your contribution:

* you hereby assign to us joint ownership, and to the extent that such
assignment is or becomes invalid, ineffective or unenforceable, you hereby
grant to us a perpetual, irrevocable, non-exclusive, worldwide, no-charge,
royalty-free, unrestricted license to exercise all rights under those
copyrights. This includes, at our option, the right to sublicense these same
rights to third parties through multiple levels of sublicensees or other
licensing arrangements;

* you agree that each of us can do all things in relation to your
contribution as if each of us were the sole owners, and if one of us makes
a derivative work of your contribution, the one who makes the derivative
work (or has it made will be the sole owner of that derivative work;

* you agree that you will not assert any moral rights in your contribution
against us, our licensees or transferees;

* you agree that we may register a copyright in your contribution and
exercise all ownership rights associated with it; and

* you agree that neither of us has any duty to consult with, obtain the
consent of, pay or render an accounting to the other for any use or
distribution of your contribution.

3. With respect to any patents you own, or that you can license without payment
to any third party, you hereby grant to us a perpetual, irrevocable,
non-exclusive, worldwide, no-charge, royalty-free license to:

* make, have made, use, sell, offer to sell, import, and otherwise transfer
your contribution in whole or in part, alone or in combination with or
included in any product, work or materials arising out of the project to
which your contribution was submitted, and

* at our option, to sublicense these same rights to third parties through
multiple levels of sublicensees or other licensing arrangements.

4. Except as set out above, you keep all right, title, and interest in your
contribution. The rights that you grant to us under these terms are effective
on the date you first submitted a contribution to us, even if your submission
took place before the date you sign these terms.

5. You covenant, represent, warrant and agree that:

* Each contribution that you submit is and shall be an original work of
authorship and you can legally grant the rights set out in this SCA;

* to the best of your knowledge, each contribution will not violate any
third party's copyrights, trademarks, patents, or other intellectual
property rights; and

* each contribution shall be in compliance with U.S. export control laws and
other applicable export and import laws. You agree to notify us if you
become aware of any circumstance which would make any of the foregoing
representations inaccurate in any respect. We may publicly disclose your
participation in the project, including the fact that you have signed the SCA.

6. This SCA is governed by the laws of the State of California and applicable
U.S. Federal law. Any choice of law rules will not apply.

7. Please place an “x” on one of the applicable statement below. Please do NOT
mark both statements:

* [x] I am signing on behalf of myself as an individual and no other person
or entity, including my employer, has or will have rights with respect to my
contributions.

* [ ] I am signing on behalf of my employer or a legal entity and I have the
actual authority to contractually bind that entity.

## Contributor Details

| Field | Entry |
|------------------------------- | -------------------- |
| Name | Travis Hoppe |
| Company name (if applicable) | |
| Title or role (if applicable) | Data Scientist |
| Date | 07 May 2020 |
| GitHub username | thoppe |
| Website (optional) | http://thoppe.github.io/ |

.github/contributors/tommilligan.md

# spaCy contributor agreement

This spaCy Contributor Agreement (**"SCA"**) is based on the
[Oracle Contributor Agreement](http://www.oracle.com/technetwork/oca-405177.pdf).
|
||||
The SCA applies to any contribution that you make to any product or project
|
||||
managed by us (the **"project"**), and sets out the intellectual property rights
|
||||
you grant to us in the contributed materials. The term **"us"** shall mean
|
||||
[ExplosionAI GmbH](https://explosion.ai/legal). The term
|
||||
**"you"** shall mean the person or entity identified below.
|
||||
|
||||
If you agree to be bound by these terms, fill in the information requested
|
||||
below and include the filled-in version with your first pull request, under the
|
||||
folder [`.github/contributors/`](/.github/contributors/). The name of the file
|
||||
should be your GitHub username, with the extension `.md`. For example, the user
|
||||
example_user would create the file `.github/contributors/example_user.md`.
|
||||
|
||||
Read this agreement carefully before signing. These terms and conditions
|
||||
constitute a binding legal agreement.
|
||||
|
||||
## Contributor Agreement
|
||||
|
||||
1. The term "contribution" or "contributed materials" means any source code,
|
||||
object code, patch, tool, sample, graphic, specification, manual,
|
||||
documentation, or any other material posted or submitted by you to the project.
|
||||
|
||||
2. With respect to any worldwide copyrights, or copyright applications and
|
||||
registrations, in your contribution:
|
||||
|
||||
* you hereby assign to us joint ownership, and to the extent that such
|
||||
assignment is or becomes invalid, ineffective or unenforceable, you hereby
|
||||
grant to us a perpetual, irrevocable, non-exclusive, worldwide, no-charge,
|
||||
royalty-free, unrestricted license to exercise all rights under those
|
||||
copyrights. This includes, at our option, the right to sublicense these same
|
||||
rights to third parties through multiple levels of sublicensees or other
|
||||
licensing arrangements;
|
||||
|
||||
* you agree that each of us can do all things in relation to your
|
||||
contribution as if each of us were the sole owners, and if one of us makes
|
||||
a derivative work of your contribution, the one who makes the derivative
|
||||
work (or has it made will be the sole owner of that derivative work;
|
||||

* you agree that you will not assert any moral rights in your contribution
against us, our licensees or transferees;

* you agree that we may register a copyright in your contribution and
exercise all ownership rights associated with it; and

* you agree that neither of us has any duty to consult with, obtain the
consent of, pay or render an accounting to the other for any use or
distribution of your contribution.

3. With respect to any patents you own, or that you can license without payment
to any third party, you hereby grant to us a perpetual, irrevocable,
non-exclusive, worldwide, no-charge, royalty-free license to:

* make, have made, use, sell, offer to sell, import, and otherwise transfer
your contribution in whole or in part, alone or in combination with or
included in any product, work or materials arising out of the project to
which your contribution was submitted, and

* at our option, to sublicense these same rights to third parties through
multiple levels of sublicensees or other licensing arrangements.

4. Except as set out above, you keep all right, title, and interest in your
contribution. The rights that you grant to us under these terms are effective
on the date you first submitted a contribution to us, even if your submission
took place before the date you sign these terms.

5. You covenant, represent, warrant and agree that:

- Each contribution that you submit is and shall be an original work of
authorship and you can legally grant the rights set out in this SCA;

- to the best of your knowledge, each contribution will not violate any
third party's copyrights, trademarks, patents, or other intellectual
property rights; and

- each contribution shall be in compliance with U.S. export control laws and
other applicable export and import laws. You agree to notify us if you
become aware of any circumstance which would make any of the foregoing
representations inaccurate in any respect. We may publicly disclose your
participation in the project, including the fact that you have signed the SCA.

6. This SCA is governed by the laws of the State of California and applicable
U.S. Federal law. Any choice of law rules will not apply.

7. Please place an “x” on one of the applicable statement below. Please do NOT
mark both statements:

* [x] I am signing on behalf of myself as an individual and no other person
or entity, including my employer, has or will have rights with respect to my
contributions.

* [ ] I am signing on behalf of my employer or a legal entity and I have the
actual authority to contractually bind that entity.

## Contributor Details

| Field | Entry |
| ----------------------------- | ------------ |
| Name | Tom Milligan |
| Company name (if applicable) | |
| Title or role (if applicable) | |
| Date | 2020-03-24 |
| GitHub username | tommilligan |
| Website (optional) | |
106
.github/contributors/umarbutler.md
vendored
Normal file
@@ -0,0 +1,106 @@
# spaCy contributor agreement

This spaCy Contributor Agreement (**"SCA"**) is based on the
[Oracle Contributor Agreement](http://www.oracle.com/technetwork/oca-405177.pdf).
The SCA applies to any contribution that you make to any product or project
managed by us (the **"project"**), and sets out the intellectual property rights
you grant to us in the contributed materials. The term **"us"** shall mean
[ExplosionAI UG (haftungsbeschränkt)](https://explosion.ai/legal). The term
**"you"** shall mean the person or entity identified below.

If you agree to be bound by these terms, fill in the information requested
below and include the filled-in version with your first pull request, under the
folder [`.github/contributors/`](/.github/contributors/). The name of the file
should be your GitHub username, with the extension `.md`. For example, the user
example_user would create the file `.github/contributors/example_user.md`.

Read this agreement carefully before signing. These terms and conditions
constitute a binding legal agreement.

## Contributor Agreement

1. The term "contribution" or "contributed materials" means any source code,
object code, patch, tool, sample, graphic, specification, manual,
documentation, or any other material posted or submitted by you to the project.

2. With respect to any worldwide copyrights, or copyright applications and
registrations, in your contribution:

* you hereby assign to us joint ownership, and to the extent that such
assignment is or becomes invalid, ineffective or unenforceable, you hereby
grant to us a perpetual, irrevocable, non-exclusive, worldwide, no-charge,
royalty-free, unrestricted license to exercise all rights under those
copyrights. This includes, at our option, the right to sublicense these same
rights to third parties through multiple levels of sublicensees or other
licensing arrangements;

* you agree that each of us can do all things in relation to your
contribution as if each of us were the sole owners, and if one of us makes
a derivative work of your contribution, the one who makes the derivative
work (or has it made will be the sole owner of that derivative work;

* you agree that you will not assert any moral rights in your contribution
against us, our licensees or transferees;

* you agree that we may register a copyright in your contribution and
exercise all ownership rights associated with it; and

* you agree that neither of us has any duty to consult with, obtain the
consent of, pay or render an accounting to the other for any use or
distribution of your contribution.

3. With respect to any patents you own, or that you can license without payment
to any third party, you hereby grant to us a perpetual, irrevocable,
non-exclusive, worldwide, no-charge, royalty-free license to:

* make, have made, use, sell, offer to sell, import, and otherwise transfer
your contribution in whole or in part, alone or in combination with or
included in any product, work or materials arising out of the project to
which your contribution was submitted, and

* at our option, to sublicense these same rights to third parties through
multiple levels of sublicensees or other licensing arrangements.

4. Except as set out above, you keep all right, title, and interest in your
contribution. The rights that you grant to us under these terms are effective
on the date you first submitted a contribution to us, even if your submission
took place before the date you sign these terms.

5. You covenant, represent, warrant and agree that:

* Each contribution that you submit is and shall be an original work of
authorship and you can legally grant the rights set out in this SCA;

* to the best of your knowledge, each contribution will not violate any
third party's copyrights, trademarks, patents, or other intellectual
property rights; and

* each contribution shall be in compliance with U.S. export control laws and
other applicable export and import laws. You agree to notify us if you
become aware of any circumstance which would make any of the foregoing
representations inaccurate in any respect. We may publicly disclose your
participation in the project, including the fact that you have signed the SCA.

6. This SCA is governed by the laws of the State of California and applicable
U.S. Federal law. Any choice of law rules will not apply.

7. Please place an “x” on one of the applicable statement below. Please do NOT
mark both statements:

* [x] I am signing on behalf of myself as an individual and no other person
or entity, including my employer, has or will have rights with respect to my
contributions.

* [ ] I am signing on behalf of my employer or a legal entity and I have the
actual authority to contractually bind that entity.

## Contributor Details

| Field | Entry |
|------------------------------- | ------------------------ |
| Name | Umar Butler |
| Company name (if applicable) | |
| Title or role (if applicable) | |
| Date | 2020-04-09 |
| GitHub username | umarbutler |
| Website (optional) | https://umarbutler.com |
106
.github/contributors/vishnupriyavr.md
vendored
Normal file
@@ -0,0 +1,106 @@
# spaCy contributor agreement

This spaCy Contributor Agreement (**"SCA"**) is based on the
[Oracle Contributor Agreement](http://www.oracle.com/technetwork/oca-405177.pdf).
The SCA applies to any contribution that you make to any product or project
managed by us (the **"project"**), and sets out the intellectual property rights
you grant to us in the contributed materials. The term **"us"** shall mean
[ExplosionAI UG (haftungsbeschränkt)](https://explosion.ai/legal). The term
**"you"** shall mean the person or entity identified below.

If you agree to be bound by these terms, fill in the information requested
below and include the filled-in version with your first pull request, under the
folder [`.github/contributors/`](/.github/contributors/). The name of the file
should be your GitHub username, with the extension `.md`. For example, the user
example_user would create the file `.github/contributors/example_user.md`.

Read this agreement carefully before signing. These terms and conditions
constitute a binding legal agreement.

## Contributor Agreement

1. The term "contribution" or "contributed materials" means any source code,
object code, patch, tool, sample, graphic, specification, manual,
documentation, or any other material posted or submitted by you to the project.

2. With respect to any worldwide copyrights, or copyright applications and
registrations, in your contribution:

* you hereby assign to us joint ownership, and to the extent that such
assignment is or becomes invalid, ineffective or unenforceable, you hereby
grant to us a perpetual, irrevocable, non-exclusive, worldwide, no-charge,
royalty-free, unrestricted license to exercise all rights under those
copyrights. This includes, at our option, the right to sublicense these same
rights to third parties through multiple levels of sublicensees or other
licensing arrangements;

* you agree that each of us can do all things in relation to your
contribution as if each of us were the sole owners, and if one of us makes
a derivative work of your contribution, the one who makes the derivative
work (or has it made will be the sole owner of that derivative work;

* you agree that you will not assert any moral rights in your contribution
against us, our licensees or transferees;

* you agree that we may register a copyright in your contribution and
exercise all ownership rights associated with it; and

* you agree that neither of us has any duty to consult with, obtain the
consent of, pay or render an accounting to the other for any use or
distribution of your contribution.

3. With respect to any patents you own, or that you can license without payment
to any third party, you hereby grant to us a perpetual, irrevocable,
non-exclusive, worldwide, no-charge, royalty-free license to:

* make, have made, use, sell, offer to sell, import, and otherwise transfer
your contribution in whole or in part, alone or in combination with or
included in any product, work or materials arising out of the project to
which your contribution was submitted, and

* at our option, to sublicense these same rights to third parties through
multiple levels of sublicensees or other licensing arrangements.

4. Except as set out above, you keep all right, title, and interest in your
contribution. The rights that you grant to us under these terms are effective
on the date you first submitted a contribution to us, even if your submission
took place before the date you sign these terms.

5. You covenant, represent, warrant and agree that:

* Each contribution that you submit is and shall be an original work of
authorship and you can legally grant the rights set out in this SCA;

* to the best of your knowledge, each contribution will not violate any
third party's copyrights, trademarks, patents, or other intellectual
property rights; and

* each contribution shall be in compliance with U.S. export control laws and
other applicable export and import laws. You agree to notify us if you
become aware of any circumstance which would make any of the foregoing
representations inaccurate in any respect. We may publicly disclose your
participation in the project, including the fact that you have signed the SCA.

6. This SCA is governed by the laws of the State of California and applicable
U.S. Federal law. Any choice of law rules will not apply.

7. Please place an “x” on one of the applicable statement below. Please do NOT
mark both statements:

* [x] I am signing on behalf of myself as an individual and no other person
or entity, including my employer, has or will have rights with respect to my
contributions.

* [ ] I am signing on behalf of my employer or a legal entity and I have the
actual authority to contractually bind that entity.

## Contributor Details

| Field | Entry |
|------------------------------- | ------------------------ |
| Name | Vishnu Priya VR |
| Company name (if applicable) | Uniphore |
| Title or role (if applicable) | NLP/AI Engineer |
| Date | 2020-05-03 |
| GitHub username | vishnupriyavr |
| Website (optional) | |
106
.github/contributors/vondersam.md
vendored
Normal file
@@ -0,0 +1,106 @@
# spaCy contributor agreement

This spaCy Contributor Agreement (**"SCA"**) is based on the
[Oracle Contributor Agreement](http://www.oracle.com/technetwork/oca-405177.pdf).
The SCA applies to any contribution that you make to any product or project
managed by us (the **"project"**), and sets out the intellectual property rights
you grant to us in the contributed materials. The term **"us"** shall mean
[ExplosionAI GmbH](https://explosion.ai/legal). The term
**"you"** shall mean the person or entity identified below.

If you agree to be bound by these terms, fill in the information requested
below and include the filled-in version with your first pull request, under the
folder [`.github/contributors/`](/.github/contributors/). The name of the file
should be your GitHub username, with the extension `.md`. For example, the user
example_user would create the file `.github/contributors/example_user.md`.

Read this agreement carefully before signing. These terms and conditions
constitute a binding legal agreement.

## Contributor Agreement

1. The term "contribution" or "contributed materials" means any source code,
object code, patch, tool, sample, graphic, specification, manual,
documentation, or any other material posted or submitted by you to the project.

2. With respect to any worldwide copyrights, or copyright applications and
registrations, in your contribution:

* you hereby assign to us joint ownership, and to the extent that such
assignment is or becomes invalid, ineffective or unenforceable, you hereby
grant to us a perpetual, irrevocable, non-exclusive, worldwide, no-charge,
royalty-free, unrestricted license to exercise all rights under those
copyrights. This includes, at our option, the right to sublicense these same
rights to third parties through multiple levels of sublicensees or other
licensing arrangements;

* you agree that each of us can do all things in relation to your
contribution as if each of us were the sole owners, and if one of us makes
a derivative work of your contribution, the one who makes the derivative
work (or has it made will be the sole owner of that derivative work;

* you agree that you will not assert any moral rights in your contribution
against us, our licensees or transferees;

* you agree that we may register a copyright in your contribution and
exercise all ownership rights associated with it; and

* you agree that neither of us has any duty to consult with, obtain the
consent of, pay or render an accounting to the other for any use or
distribution of your contribution.

3. With respect to any patents you own, or that you can license without payment
to any third party, you hereby grant to us a perpetual, irrevocable,
non-exclusive, worldwide, no-charge, royalty-free license to:

* make, have made, use, sell, offer to sell, import, and otherwise transfer
your contribution in whole or in part, alone or in combination with or
included in any product, work or materials arising out of the project to
which your contribution was submitted, and

* at our option, to sublicense these same rights to third parties through
multiple levels of sublicensees or other licensing arrangements.

4. Except as set out above, you keep all right, title, and interest in your
contribution. The rights that you grant to us under these terms are effective
on the date you first submitted a contribution to us, even if your submission
took place before the date you sign these terms.

5. You covenant, represent, warrant and agree that:

* Each contribution that you submit is and shall be an original work of
authorship and you can legally grant the rights set out in this SCA;

* to the best of your knowledge, each contribution will not violate any
third party's copyrights, trademarks, patents, or other intellectual
property rights; and

* each contribution shall be in compliance with U.S. export control laws and
other applicable export and import laws. You agree to notify us if you
become aware of any circumstance which would make any of the foregoing
representations inaccurate in any respect. We may publicly disclose your
participation in the project, including the fact that you have signed the SCA.

6. This SCA is governed by the laws of the State of California and applicable
U.S. Federal law. Any choice of law rules will not apply.

7. Please place an “x” on one of the applicable statement below. Please do NOT
mark both statements:

* [x] I am signing on behalf of myself as an individual and no other person
or entity, including my employer, has or will have rights with respect to my
contributions.

* [ ] I am signing on behalf of my employer or a legal entity and I have the
actual authority to contractually bind that entity.

## Contributor Details

| Field | Entry |
|------------------------------- | ------------------------|
| Name | Samuel Rodríguez Medina |
| Company name (if applicable) | |
| Title or role (if applicable) | Computational linguist |
| Date | 28 April 2020 |
| GitHub username | vondersam |
| Website (optional) | |
@@ -1,15 +1,15 @@
#!/usr/bin/env python
# coding: utf8

"""Example of defining and (pre)training spaCy's knowledge base,
"""Example of defining a knowledge base in spaCy,
which is needed to implement entity linking functionality.

For more details, see the documentation:
* Knowledge base: https://spacy.io/api/kb
* Entity Linking: https://spacy.io/usage/linguistic-features#entity-linking

Compatible with: spaCy v2.2.3
Last tested with: v2.2.3
Compatible with: spaCy v2.2.4
Last tested with: v2.2.4
"""
from __future__ import unicode_literals, print_function

@@ -20,24 +20,18 @@ from spacy.vocab import Vocab
import spacy
from spacy.kb import KnowledgeBase

from bin.wiki_entity_linking.train_descriptions import EntityEncoder


# Q2146908 (Russ Cochran): American golfer
# Q7381115 (Russ Cochran): publisher
ENTITIES = {"Q2146908": ("American golfer", 342), "Q7381115": ("publisher", 17)}

INPUT_DIM = 300 # dimension of pretrained input vectors
DESC_WIDTH = 64 # dimension of output entity vectors


@plac.annotations(
model=("Model name, should have pretrained word embeddings", "positional", None, str),
output_dir=("Optional output directory", "option", "o", Path),
n_iter=("Number of training iterations", "option", "n", int),
)
def main(model=None, output_dir=None, n_iter=50):
"""Load the model, create the KB and pretrain the entity encodings.
def main(model=None, output_dir=None):
"""Load the model and create the KB with pre-defined entity encodings.
If an output_dir is provided, the KB will be stored there in a file 'kb'.
The updated vocab will also be written to a directory in the output_dir."""

@@ -51,33 +45,23 @@ def main(model=None, output_dir=None, n_iter=50):
" cf. https://spacy.io/usage/models#languages."
)

kb = KnowledgeBase(vocab=nlp.vocab)
# You can change the dimension of vectors in your KB by using an encoder that changes the dimensionality.
# For simplicity, we'll just use the original vector dimension here instead.
vectors_dim = nlp.vocab.vectors.shape[1]
kb = KnowledgeBase(vocab=nlp.vocab, entity_vector_length=vectors_dim)

# set up the data
entity_ids = []
descriptions = []
descr_embeddings = []
freqs = []
for key, value in ENTITIES.items():
desc, freq = value
entity_ids.append(key)
descriptions.append(desc)
descr_embeddings.append(nlp(desc).vector)
freqs.append(freq)

# training entity description encodings
# this part can easily be replaced with a custom entity encoder
encoder = EntityEncoder(
nlp=nlp,
input_dim=INPUT_DIM,
desc_width=DESC_WIDTH,
epochs=n_iter,
)
encoder.train(description_list=descriptions, to_print=True)

# get the pretrained entity vectors
embeddings = encoder.apply_encoder(descriptions)

# set the entities, can also be done by calling `kb.add_entity` for each entity
kb.set_entities(entity_list=entity_ids, freq_list=freqs, vector_list=embeddings)
kb.set_entities(entity_list=entity_ids, freq_list=freqs, vector_list=descr_embeddings)

# adding aliases, the entities need to be defined in the KB beforehand
kb.add_alias(
@@ -113,8 +97,8 @@ def main(model=None, output_dir=None, n_iter=50):
vocab2 = Vocab().from_disk(vocab_path)
kb2 = KnowledgeBase(vocab=vocab2)
kb2.load_bulk(kb_path)
_print_kb(kb2)
print()
_print_kb(kb2)


def _print_kb(kb):
@@ -126,6 +110,5 @@ if __name__ == "__main__":
plac.call(main)

# Expected output:

# 2 kb entities: ['Q2146908', 'Q7381115']
# 1 kb aliases: ['Russ Cochran']
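The updated knowledge-base example creates the KB with `entity_vector_length=vectors_dim` and stores `nlp(desc).vector` for each entity. Every vector it adds must therefore have exactly `nlp.vocab.vectors.shape[1]` entries. The pure-Python sketch below illustrates that constraint without spaCy. It makes three assumptions. The word-vector table is hypothetical and three-dimensional. Tokenization is naive whitespace splitting. The averaging roughly follows spaCy's documented default, in which `Doc.vector` is the mean of the token vectors and unknown tokens contribute zero vectors.

```python
# Hypothetical word vectors; a real pipeline reads them from
# nlp.vocab.vectors, whose width is nlp.vocab.vectors.shape[1].
WORD_VECTORS = {
    "american": [1.0, 0.0, 2.0],
    "golfer": [3.0, 2.0, 0.0],
    "publisher": [0.0, 4.0, 1.0],
}
VECTORS_DIM = 3  # stands in for vectors_dim in the example script

ENTITIES = {"Q2146908": ("American golfer", 342), "Q7381115": ("publisher", 17)}


def description_vector(text):
    """Average the token vectors of `text`; unknown tokens count as zeros."""
    tokens = text.lower().split()  # naive stand-in for spaCy's tokenizer
    if not tokens:
        return [0.0] * VECTORS_DIM
    rows = [WORD_VECTORS.get(tok, [0.0] * VECTORS_DIM) for tok in tokens]
    return [sum(column) / len(rows) for column in zip(*rows)]


entity_vectors = {}
for qid, (desc, freq) in ENTITIES.items():
    vector = description_vector(desc)
    # A KB created with entity_vector_length=VECTORS_DIM expects vectors
    # of exactly that length, so check it before storing the vector.
    if len(vector) != VECTORS_DIM:
        raise ValueError(f"{qid}: expected {VECTORS_DIM} dims, got {len(vector)}")
    entity_vectors[qid] = vector

print(entity_vectors["Q2146908"])  # [2.0, 1.0, 1.0]
print(entity_vectors["Q7381115"])  # [0.0, 4.0, 1.0]
```

Reading the width from the vocab keeps the KB and the description vectors in step. As the comment in the hunk notes, a separate encoder would be needed to store vectors of a different size.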
@@ -1,6 +1,7 @@
"""Prevent catastrophic forgetting with rehearsal updates."""
import plac
import random
import warnings
import srsly
import spacy
from spacy.gold import GoldParse
@@ -63,7 +64,10 @@ def main(model_name, unlabelled_loc):
optimizer.b2 = 0.0

sizes = compounding(1.0, 4.0, 1.001)
with nlp.select_pipes(enable="ner"):
with nlp.select_pipes(enable="ner") and warnings.catch_warnings():
# show warnings for misaligned entity spans once
warnings.filterwarnings("once", category=UserWarning, module="spacy")

for itn in range(n_iter):
random.shuffle(TRAIN_DATA)
random.shuffle(raw_docs)
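Several hunks in this commit write `with nlp.select_pipes(enable="ner") and warnings.catch_warnings():`. In Python, `A and B` is an ordinary boolean expression. When `A` is truthy it evaluates to `B`, so only `B` is entered and exited, and `A.__exit__` never runs. Writing `with A, B:` instead enters both context managers and exits them in reverse order. The sketch below shows the difference using a hypothetical `FakeDisabledPipes` class. As far as we know, it behaves like spaCy's pipe-disabling helper: it changes state as soon as it is constructed and undoes the change only in `__exit__`.

```python
import warnings


class FakeDisabledPipes:
    """Hypothetical stand-in for a context manager that acts on creation.

    It switches a component off in __init__ and only switches it back on
    in __exit__.
    """

    def __init__(self, pipeline):
        self.pipeline = pipeline
        pipeline["parser_enabled"] = False  # takes effect immediately

    def __enter__(self):
        return self

    def __exit__(self, exc_type, exc_value, traceback):
        self.pipeline["parser_enabled"] = True  # restore on exit
        return False  # do not swallow exceptions


def run(use_comma):
    """Return whether the component is enabled again after the block."""
    pipeline = {"parser_enabled": True}
    if use_comma:
        # `with A, B:` enters both context managers and exits both.
        with FakeDisabledPipes(pipeline), warnings.catch_warnings():
            pass
    else:
        # `A and B` evaluates to B because A is truthy, so only
        # catch_warnings() is entered and exited; A.__exit__ never runs.
        with FakeDisabledPipes(pipeline) and warnings.catch_warnings():
            pass
    return pipeline["parser_enabled"]


print(run(use_comma=False))  # False: the component was never re-enabled
print(run(use_comma=True))   # True: __exit__ restored it
```

In the training scripts, the comma form `with nlp.select_pipes(enable="ner"), warnings.catch_warnings():` would make both context managers take effect as the comment intends.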
@@ -1,15 +1,15 @@
#!/usr/bin/env python
# coding: utf8

"""Example of training spaCy's entity linker, starting off with an
existing model and a pre-defined knowledge base.
"""Example of training spaCy's entity linker, starting off with a predefined
knowledge base and corresponding vocab, and a blank English model.

For more details, see the documentation:
* Training: https://spacy.io/usage/training
* Entity Linking: https://spacy.io/usage/linguistic-features#entity-linking

Compatible with: spaCy v2.2.3
Last tested with: v2.2.3
Compatible with: spaCy v2.2.4
Last tested with: v2.2.4
"""
from __future__ import unicode_literals, print_function

@@ -17,13 +17,10 @@ import plac
import random
from pathlib import Path

import srsly
from spacy.vocab import Vocab

import spacy
from spacy.kb import KnowledgeBase
from spacy.pipeline import EntityRuler
from spacy.tokens import Span
from spacy.util import minibatch, compounding


@@ -66,18 +63,20 @@ def main(kb_path, vocab_path=None, output_dir=None, n_iter=50):
"""Create a blank model with the specified vocab, set up the pipeline and train the entity linker.
The `vocab` should be the one used during creation of the KB."""
vocab = Vocab().from_disk(vocab_path)
# create blank Language class with correct vocab
# create blank English model with correct vocab
nlp = spacy.blank("en", vocab=vocab)
nlp.vocab.vectors.name = "nel_vectors"
print("Created blank 'en' model with vocab from '%s'" % vocab_path)

# Add a sentencizer component. Alternatively, add a dependency parser for higher accuracy.
nlp.add_pipe(nlp.create_pipe('sentencizer'))
nlp.add_pipe(nlp.create_pipe("sentencizer"))

# Add a custom component to recognize "Russ Cochran" as an entity for the example training data.
# Note that in a realistic application, an actual NER algorithm should be used instead.
ruler = EntityRuler(nlp)
patterns = [{"label": "PERSON", "pattern": [{"LOWER": "russ"}, {"LOWER": "cochran"}]}]
patterns = [
{"label": "PERSON", "pattern": [{"LOWER": "russ"}, {"LOWER": "cochran"}]}
]
ruler.add_patterns(patterns)
nlp.add_pipe(ruler)

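The `EntityRuler` patterns in this hunk are token patterns. Each dictionary describes one token, and `{"LOWER": "russ"}` matches a token whose lowercase text is `russ`. The two-token pattern therefore labels "Russ Cochran" as `PERSON` regardless of capitalization. The pure-Python sketch below simplifies that matching logic and is hypothetical. It supports only exact `LOWER` attributes, with no operators or other token attributes, and it splits the made-up input text on whitespace instead of using spaCy's tokenizer.

```python
def find_lower_matches(tokens, pattern):
    """Return (start, end) token spans whose lowercase forms match `pattern`.

    Simplified sketch: every pattern entry must be {"LOWER": value} and
    matches exactly one token (no operators such as "?" or "+").
    """
    size = len(pattern)
    spans = []
    for start in range(len(tokens) - size + 1):
        window = tokens[start:start + size]
        if all(tok.lower() == spec["LOWER"] for tok, spec in zip(window, pattern)):
            spans.append((start, start + size))
    return spans


patterns = [{"label": "PERSON", "pattern": [{"LOWER": "russ"}, {"LOWER": "cochran"}]}]

# Hypothetical input, tokenized naively on whitespace.
tokens = "Yesterday RUSS cochran played golf".split()
for entry in patterns:
    for start, end in find_lower_matches(tokens, entry["pattern"]):
        print(entry["label"], " ".join(tokens[start:end]))  # PERSON RUSS cochran
```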
@@ -8,12 +8,13 @@ For more details, see the documentation:
* NER: https://spacy.io/usage/linguistic-features#named-entities

Compatible with: spaCy v2.0.0+
Last tested with: v2.1.0
Last tested with: v2.2.4
"""
from __future__ import unicode_literals, print_function

import plac
import random
import warnings
from pathlib import Path
import spacy
from spacy.util import minibatch, compounding
@@ -55,12 +56,17 @@ def main(model=None, output_dir=None, n_iter=100):
print("Add label", ent[2])
ner.add_label(ent[2])

with nlp.select_pipes(enable="ner"): # only train NER
with nlp.select_pipes(enable="ner") and warnings.catch_warnings():
# show warnings for misaligned entity spans once
warnings.filterwarnings("once", category=UserWarning, module="spacy")

# reset and initialize the weights randomly – but only if we're
# training a new model
if model is None:
nlp.begin_training()
print("Transitions", list(enumerate(nlp.get_pipe("simple_ner").get_tag_names())))
print(
"Transitions", list(enumerate(nlp.get_pipe("simple_ner").get_tag_names()))
)
for itn in range(n_iter):
random.shuffle(TRAIN_DATA)
losses = {}
@@ -24,12 +24,13 @@ For more details, see the documentation:
* NER: https://spacy.io/usage/linguistic-features#named-entities

Compatible with: spaCy v2.1.0+
Last tested with: v2.1.0
Last tested with: v2.2.4
"""
from __future__ import unicode_literals, print_function

import plac
import random
import warnings
from pathlib import Path
import spacy
from spacy.util import minibatch, compounding
@@ -94,8 +95,10 @@ def main(model=None, new_model_name="animal", output_dir=None, n_iter=30):
else:
optimizer = nlp.resume_training()
move_names = list(ner.move_names)
with nlp.select_pipes(enable="ner") and warnings.catch_warnings():
# show warnings for misaligned entity spans once
warnings.filterwarnings("once", category=UserWarning, module="spacy")

with nlp.select_pipes(enable="ner"): # only train NER
sizes = compounding(1.0, 4.0, 1.001)
# batch up the examples using spaCy's minibatch
for itn in range(n_iter):
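The rehearsal example and this one both build their batch sizes with `compounding(1.0, 4.0, 1.001)`. As documented for `spacy.util.compounding`, this produces an infinite series. It starts at 1.0, is multiplied by 1.001 at each step, and is clipped at 4.0. `minibatch(..., size=sizes)` then takes the next value for each batch. The stand-alone version below is an illustrative sketch of that behaviour, not spaCy's own source.

```python
from itertools import islice


def compounding(start, stop, compound):
    """Yield start, start * compound, start * compound**2, ... clipped at stop."""

    def clip(value):
        # Supports shrinking schedules (start > stop) as well as growing ones.
        return max(value, stop) if start > stop else min(value, stop)

    current = float(start)
    while True:
        yield clip(current)
        current *= compound


sizes = compounding(1.0, 4.0, 1.001)
first_three = list(islice(sizes, 3))

# The size grows by 0.1% per batch, so it needs about ln(4) / ln(1.001)
# steps before it reaches the cap of 4.0.
steps = 0
for size in compounding(1.0, 4.0, 1.001):
    if size == 4.0:
        break
    steps += 1
print(steps)  # 1387
```

With these settings the batch size stays close to 1 for the first few hundred batches. It only reaches 4 after about 1,400 batches, which keeps the early updates small.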
62
netlify.toml
@@ -7,42 +7,42 @@ redirects = [
{from = "https://alpha.spacy.io/*", to = "https://spacy.io", force = true},
{from = "http://alpha.spacy.io/*", to = "https://spacy.io", force = true},
# Old demos
{from = "/demos/*", to = "https://explosion.ai/demos/:splat"},
{from = "/demos/*", to = "https://explosion.ai/demos/:splat", force = true},
# Old blog
{from = "/blog/*", to = "https://explosion.ai/blog/:splat"},
{from = "/feed", to = "https://explosion.ai/feed"},
{from = "/feed.xml", to = "https://explosion.ai/feed"},
{from = "/blog/*", to = "https://explosion.ai/blog/:splat", force = true},
{from = "/feed", to = "https://explosion.ai/feed", force = true},
{from = "/feed.xml", to = "https://explosion.ai/feed", force = true},
# Old documentation pages (1.x)
{from = "/docs/usage/processing-text", to = "/usage/linguistic-features"},
{from = "/docs/usage/deep-learning", to = "/usage/training"},
{from = "/docs/usage/pos-tagging", to = "/usage/linguistic-features#pos-tagging"},
{from = "/docs/usage/dependency-parse", to = "/usage/linguistic-features#dependency-parse"},
{from = "/docs/usage/entity-recognition", to = "/usage/linguistic-features#named-entities"},
{from = "/docs/usage/word-vectors-similarities", to = "/usage/vectors-similarity"},
{from = "/docs/usage/customizing-tokenizer", to = "/usage/linguistic-features#tokenization"},
{from = "/docs/usage/language-processing-pipeline", to = "/usage/processing-pipelines"},
{from = "/docs/usage/customizing-pipeline", to = "/usage/processing-pipelines"},
{from = "/docs/usage/training-ner", to = "/usage/training#ner"},
{from = "/docs/usage/tutorials", to = "/usage/examples"},
{from = "/docs/usage/data-model", to = "/api"},
{from = "/docs/usage/cli", to = "/api/cli"},
{from = "/docs/usage/lightning-tour", to = "/usage/spacy-101#lightning-tour"},
{from = "/docs/api/language-models", to = "/usage/models#languages"},
{from = "/docs/api/spacy", to = "/docs/api/top-level"},
{from = "/docs/api/displacy", to = "/api/top-level#displacy"},
{from = "/docs/api/util", to = "/api/top-level#util"},
{from = "/docs/api/features", to = "/models/#architecture"},
{from = "/docs/api/philosophy", to = "/usage/spacy-101"},
{from = "/docs/usage/showcase", to = "/universe"},
{from = "/tutorials/load-new-word-vectors", to = "/usage/vectors-similarity#custom"},
{from = "/tutorials", to = "/usage/examples"},
{from = "/docs/usage/processing-text", to = "/usage/linguistic-features", force = true},
{from = "/docs/usage/deep-learning", to = "/usage/training", force = true},
{from = "/docs/usage/pos-tagging", to = "/usage/linguistic-features#pos-tagging", force = true},
{from = "/docs/usage/dependency-parse", to = "/usage/linguistic-features#dependency-parse", force = true},
{from = "/docs/usage/entity-recognition", to = "/usage/linguistic-features#named-entities", force = true},
{from = "/docs/usage/word-vectors-similarities", to = "/usage/vectors-similarity", force = true},
{from = "/docs/usage/customizing-tokenizer", to = "/usage/linguistic-features#tokenization", force = true},
{from = "/docs/usage/language-processing-pipeline", to = "/usage/processing-pipelines", force = true},
{from = "/docs/usage/customizing-pipeline", to = "/usage/processing-pipelines", force = true},
{from = "/docs/usage/training-ner", to = "/usage/training#ner", force = true},
{from = "/docs/usage/tutorials", to = "/usage/examples", force = true},
{from = "/docs/usage/data-model", to = "/api", force = true},
{from = "/docs/usage/cli", to = "/api/cli", force = true},
{from = "/docs/usage/lightning-tour", to = "/usage/spacy-101#lightning-tour", force = true},
{from = "/docs/api/language-models", to = "/usage/models#languages", force = true},
{from = "/docs/api/spacy", to = "/docs/api/top-level", force = true},
{from = "/docs/api/displacy", to = "/api/top-level#displacy", force = true},
{from = "/docs/api/util", to = "/api/top-level#util", force = true},
{from = "/docs/api/features", to = "/models/#architecture", force = true},
{from = "/docs/api/philosophy", to = "/usage/spacy-101", force = true},
{from = "/docs/usage/showcase", to = "/universe", force = true},
{from = "/tutorials/load-new-word-vectors", to = "/usage/vectors-similarity#custom", force = true},
{from = "/tutorials", to = "/usage/examples", force = true},
# Rewrite all other docs pages to /
{from = "/docs/*", to = "/:splat"},
# Updated documentation pages
{from = "/usage/resources", to = "/universe"},
{from = "/usage/lightning-tour", to = "/usage/spacy-101#lightning-tour"},
{from = "/usage/linguistic-features#rule-based-matching", to = "/usage/rule-based-matching"},
{from = "/models/comparison", to = "/models"},
{from = "/usage/resources", to = "/universe", force = true},
{from = "/usage/lightning-tour", to = "/usage/spacy-101#lightning-tour", force = true},
|
||||
{from = "/usage/linguistic-features#rule-based-matching", to = "/usage/rule-based-matching", force = true},
|
||||
{from = "/models/comparison", to = "/models", force = true},
|
||||
{from = "/api/#section-cython", to = "/api/cython", force = true},
|
||||
{from = "/api/#cython", to = "/api/cython", force = true},
|
||||
{from = "/api/sentencesegmenter", to="/api/sentencizer"},
|
||||
|
|
18
setup.cfg
@@ -61,19 +61,23 @@ install_requires =

[options.extras_require]
lookups =
spacy_lookups_data>=0.0.5,<0.2.0
spacy_lookups_data>=0.3.1,<0.4.0
cuda =
cupy>=5.0.0b4
cupy>=5.0.0b4,<9.0.0
cuda80 =
cupy-cuda80>=5.0.0b4
cupy-cuda80>=5.0.0b4,<9.0.0
cuda90 =
cupy-cuda90>=5.0.0b4
cupy-cuda90>=5.0.0b4,<9.0.0
cuda91 =
cupy-cuda91>=5.0.0b4
cupy-cuda91>=5.0.0b4,<9.0.0
cuda92 =
cupy-cuda92>=5.0.0b4
cupy-cuda92>=5.0.0b4,<9.0.0
cuda100 =
cupy-cuda100>=5.0.0b4
cupy-cuda100>=5.0.0b4,<9.0.0
cuda101 =
cupy-cuda101>=5.0.0b4,<9.0.0
cuda102 =
cupy-cuda102>=5.0.0b4,<9.0.0
# Language tokenizers with external dependencies
ja =
fugashi>=0.1.3

@@ -15,7 +15,7 @@ cdef enum attr_id_t:
LIKE_NUM
LIKE_EMAIL
IS_STOP
IS_OOV
IS_OOV_DEPRECATED
IS_BRACKET
IS_QUOTE
IS_LEFT_PUNCT
@@ -95,3 +95,4 @@ cdef enum attr_id_t:
ENT_ID = symbols.ENT_ID

IDX
SENT_END

@@ -13,7 +13,7 @@ IDS = {
"LIKE_NUM": LIKE_NUM,
"LIKE_EMAIL": LIKE_EMAIL,
"IS_STOP": IS_STOP,
"IS_OOV": IS_OOV,
"IS_OOV_DEPRECATED": IS_OOV_DEPRECATED,
"IS_BRACKET": IS_BRACKET,
"IS_QUOTE": IS_QUOTE,
"IS_LEFT_PUNCT": IS_LEFT_PUNCT,
@@ -85,6 +85,7 @@ IDS = {
"ENT_KB_ID": ENT_KB_ID,
"HEAD": HEAD,
"SENT_START": SENT_START,
"SENT_END": SENT_END,
"SPACY": SPACY,
"PROB": PROB,
"LANG": LANG,

@@ -89,11 +89,11 @@ def debug_data(
msg.good("Corpus is loadable")

# Create all gold data here to avoid iterating over the train_dataset constantly
gold_train_data = _compile_gold(train_dataset, pipeline)
gold_train_data = _compile_gold(train_dataset, pipeline, nlp)
gold_train_unpreprocessed_data = _compile_gold(
train_dataset_unpreprocessed, pipeline
)
gold_dev_data = _compile_gold(dev_dataset, pipeline)
gold_dev_data = _compile_gold(dev_dataset, pipeline, nlp)

train_texts = gold_train_data["texts"]
dev_texts = gold_dev_data["texts"]
@@ -151,6 +151,21 @@ def debug_data(
f"{len(nlp.vocab.vectors)} vectors ({nlp.vocab.vectors.n_keys} "
f"unique keys, {nlp.vocab.vectors_length} dimensions)"
)
n_missing_vectors = sum(gold_train_data["words_missing_vectors"].values())
msg.warn(
"{} words in training data without vectors ({:0.2f}%)".format(
n_missing_vectors, n_missing_vectors / gold_train_data["n_words"],
),
)
msg.text(
"10 most common words without vectors: {}".format(
_format_labels(
gold_train_data["words_missing_vectors"].most_common(10),
counts=True,
)
),
show=verbose,
)
else:
msg.info("No word vectors present in the model")

@@ -450,7 +465,7 @@ def _load_file(file_path, msg):
)


def _compile_gold(examples, pipeline):
def _compile_gold(examples, pipeline, nlp):
data = {
"ner": Counter(),
"cats": Counter(),
@@ -462,6 +477,7 @@ def _compile_gold(examples, pipeline):
"punct_ents": 0,
"n_words": 0,
"n_misaligned_words": 0,
"words_missing_vectors": Counter(),
"n_sents": 0,
"n_nonproj": 0,
"n_cycles": 0,
@@ -476,6 +492,10 @@ def _compile_gold(examples, pipeline):
data["n_words"] += len(valid_words)
data["n_misaligned_words"] += len(gold.words) - len(valid_words)
data["texts"].add(doc.text)
if len(nlp.vocab.vectors):
for word in valid_words:
if nlp.vocab.strings[word] not in nlp.vocab.vectors:
data["words_missing_vectors"].update([word])
if "ner" in pipeline:
for i, label in enumerate(gold.ner):
if label is None:

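The `words_missing_vectors` counter added to `_compile_gold` tallies every valid training word that has no entry in the vectors table. `debug_data` then reports how many such words there are and lists the ten most frequent. The following is a minimal standalone sketch of that bookkeeping. It assumes a plain set of strings in place of `nlp.vocab.vectors`, and the helper name `count_missing_vectors` is hypothetical. One deliberate difference: the format call in the hunk divides by `n_words` without scaling, whereas the sketch multiplies by 100 before printing the `%` sign.

```python
from collections import Counter


def count_missing_vectors(words, vector_keys):
    """Tally words that have no entry in the vector table.

    `vector_keys` is a hypothetical stand-in for the keys of
    nlp.vocab.vectors; here it is just a set of strings.
    """
    missing = Counter()
    for word in words:
        if word not in vector_keys:
            missing.update([word])
    return missing


train_words = ["the", "cat", "sat", "on", "the", "mat", "zyzzyva", "zyzzyva"]
vector_keys = {"the", "cat", "sat", "on", "mat"}

missing = count_missing_vectors(train_words, vector_keys)
n_missing = sum(missing.values())
# Scale by 100 so the number printed next to "%" really is a percentage
pct = 100 * n_missing / len(train_words)
print(f"{n_missing} words in training data without vectors ({pct:0.2f}%)")
print("10 most common words without vectors:", missing.most_common(10))
```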
@@ -32,7 +32,10 @@ def evaluate(
if displacy_path and not displacy_path.exists():
msg.fail("Visualization output directory not found", displacy_path, exits=1)
corpus = GoldCorpus(data_path, data_path)
nlp = util.load_model(model)
if model.startswith("blank:"):
nlp = util.get_lang_class(model.replace("blank:", ""))()
else:
nlp = util.load_model(model)
dev_dataset = list(corpus.dev_dataset(nlp, gold_preproc=gold_preproc))
begin = timer()
scorer = nlp.evaluate(dev_dataset, verbose=False)

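With this change, `evaluate` accepts `blank:<lang>` as well as a model name or path. When the prefix is present, it instantiates the bare language class via `util.get_lang_class(<lang>)()` instead of calling `util.load_model`. The sketch below isolates that dispatch in a hypothetical helper, `resolve_model_spec`. To keep it runnable without spaCy, the helper only reports which branch would be taken.

```python
def resolve_model_spec(model):
    """Decide how a model argument should be turned into a pipeline.

    Returns ("blank", lang) for "blank:<lang>" (create an empty pipeline for
    that language) and ("load", model) for anything else (package or path).
    """
    prefix = "blank:"
    if model.startswith(prefix):
        # Strip only the leading prefix; str.replace would drop every occurrence
        return "blank", model[len(prefix):]
    return "load", model


print(resolve_model_spec("blank:da"))        # ('blank', 'da')
print(resolve_model_spec("en_core_web_sm"))  # ('load', 'en_core_web_sm')
```

The sketch slices off the prefix rather than using the hunk's `model.replace("blank:", "")`. For a plain language code such as `da`, the two behave identically.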
@@ -8,12 +8,13 @@ import tarfile
import gzip
import zipfile
import srsly
from wasabi import msg
import warnings
from wasabi import msg

from ..vectors import Vectors
from ..errors import Errors, Warnings
from ..util import ensure_path, get_lang_class
from ..util import ensure_path, get_lang_class, load_model, OOV_RANK
from ..lookups import Lookups

try:
import ftfy
@@ -33,8 +34,11 @@ def init_model(
jsonl_loc: ("Location of JSONL-formatted attributes file", "option", "j", Path) = None,
vectors_loc: ("Optional vectors file in Word2Vec format", "option", "v", str) = None,
prune_vectors: ("Optional number of vectors to prune to", "option", "V", int) = -1,
truncate_vectors: ("Optional number of vectors to truncate to when reading in vectors file", "option", "t", int) = 0,
vectors_name: ("Optional name for the word vectors, e.g. en_core_web_lg.vectors", "option", "vn", str) = None,
model_name: ("Optional name for the model meta", "option", "mn", str) = None,
omit_extra_lookups: ("Don't include extra lookups in model", "flag", "OEL", bool) = False,
base_model: ("Base model (for languages with custom tokenizers)", "option", "b", str) = None
# fmt: on
):
"""
@@ -67,10 +71,19 @@ def init_model(
lex_attrs = read_attrs_from_deprecated(freqs_loc, clusters_loc)

with msg.loading("Creating model..."):
nlp = create_model(lang, lex_attrs, name=model_name)
nlp = create_model(lang, lex_attrs, name=model_name, base_model=base_model)

# Create empty extra lexeme tables so the data from spacy-lookups-data
# isn't loaded if these features are accessed
if omit_extra_lookups:
nlp.vocab.lookups_extra = Lookups()
nlp.vocab.lookups_extra.add_table("lexeme_cluster")
nlp.vocab.lookups_extra.add_table("lexeme_prob")
nlp.vocab.lookups_extra.add_table("lexeme_settings")

msg.good("Successfully created model")
if vectors_loc is not None:
add_vectors(nlp, vectors_loc, prune_vectors, vectors_name)
add_vectors(nlp, vectors_loc, truncate_vectors, prune_vectors, vectors_name)
vec_added = len(nlp.vocab.vectors)
lex_added = len(nlp.vocab)
msg.good(
@@ -126,20 +139,23 @@ def read_attrs_from_deprecated(freqs_loc, clusters_loc):
return lex_attrs


def create_model(lang, lex_attrs, name=None):
lang_class = get_lang_class(lang)
nlp = lang_class()
def create_model(lang, lex_attrs, name=None, base_model=None):
if base_model:
nlp = load_model(base_model)
# keep the tokenizer but remove any existing pipeline components due to
# potentially conflicting vectors
for pipe in nlp.pipe_names:
nlp.remove_pipe(pipe)
else:
lang_class = get_lang_class(lang)
nlp = lang_class()
for lexeme in nlp.vocab:
lexeme.rank = 0
lex_added = 0
lexeme.rank = OOV_RANK
for attrs in lex_attrs:
if "settings" in attrs:
continue
lexeme = nlp.vocab[attrs["orth"]]
lexeme.set_attrs(**attrs)
lexeme.is_oov = False
lex_added += 1
lex_added += 1
if len(nlp.vocab):
oov_prob = min(lex.prob for lex in nlp.vocab) - 1
else:
@@ -150,12 +166,12 @@ def create_model(lang, lex_attrs, name=None):
return nlp


def add_vectors(nlp, vectors_loc, prune_vectors, name=None):
def add_vectors(nlp, vectors_loc, truncate_vectors, prune_vectors, name=None):
vectors_loc = ensure_path(vectors_loc)
if vectors_loc and vectors_loc.parts[-1].endswith(".npz"):
nlp.vocab.vectors = Vectors(data=numpy.load(vectors_loc.open("rb")))
for lex in nlp.vocab:
if lex.rank:
if lex.rank and lex.rank != OOV_RANK:
nlp.vocab.vectors.add(lex.orth, row=lex.rank)
else:
if vectors_loc:
@@ -167,8 +183,7 @@ def add_vectors(nlp, vectors_loc, prune_vectors, name=None):
if vector_keys is not None:
for word in vector_keys:
if word not in nlp.vocab:
lexeme = nlp.vocab[word]
lexeme.is_oov = False
nlp.vocab[word]
if vectors_data is not None:
nlp.vocab.vectors = Vectors(data=vectors_data, keys=vector_keys)
if name is None:
@@ -180,9 +195,11 @@ def add_vectors(nlp, vectors_loc, prune_vectors, name=None):
nlp.vocab.prune_vectors(prune_vectors)


def read_vectors(vectors_loc):
def read_vectors(vectors_loc, truncate_vectors=0):
f = open_file(vectors_loc)
shape = tuple(int(size) for size in next(f).split())
if truncate_vectors >= 1:
shape = (truncate_vectors, shape[1])
vectors_data = numpy.zeros(shape=shape, dtype="f")
vectors_keys = []
for i, line in enumerate(tqdm(f)):
@@ -193,6 +210,8 @@ def read_vectors(vectors_loc):
msg.fail(Errors.E094.format(line_num=i, loc=vectors_loc), exits=1)
vectors_data[i] = numpy.asarray(pieces, dtype="f")
vectors_keys.append(word)
if i == truncate_vectors - 1:
break
return vectors_data, vectors_keys

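The new `truncate_vectors` option sizes the vectors table from the option instead of from the file header when it is at least 1. Reading then stops after that many rows, and the default of 0 means the `i == truncate_vectors - 1` check never fires. Below is a dependency-free sketch of the same control flow. The name `read_vectors_sketch` is hypothetical, plain lists stand in for `numpy.zeros`, and a `ValueError` stands in for `msg.fail(Errors.E094...)`.

```python
import io


def read_vectors_sketch(lines, truncate_vectors=0):
    """Parse word2vec-style text vectors, optionally keeping only the first rows.

    The first line is "<n_rows> <n_dims>". With truncate_vectors >= 1 the
    table is sized for that many rows and reading stops once it is full.
    """
    it = iter(lines)
    shape = tuple(int(size) for size in next(it).split())
    if truncate_vectors >= 1:
        shape = (truncate_vectors, shape[1])
    # Preallocated rows, standing in for numpy.zeros(shape=shape, dtype="f")
    vectors_data = [[0.0] * shape[1] for _ in range(shape[0])]
    vectors_keys = []
    for i, line in enumerate(it):
        pieces = line.rstrip().rsplit(" ", shape[1])
        word = pieces.pop(0)
        if len(pieces) != shape[1]:
            raise ValueError(f"Invalid vector on line {i}")
        vectors_data[i] = [float(x) for x in pieces]
        vectors_keys.append(word)
        if i == truncate_vectors - 1:
            break  # never true when truncate_vectors == 0
    return vectors_data, vectors_keys


text = "3 2\ncat 0.1 0.2\ndog 0.3 0.4\nfish 0.5 0.6\n"
data, keys = read_vectors_sketch(io.StringIO(text), truncate_vectors=2)
print(keys)  # ['cat', 'dog']
print(data)  # [[0.1, 0.2], [0.3, 0.4]]
```

Preallocating from the (possibly truncated) shape means a large vectors file can be read partially without allocating memory for rows that will be discarded.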
@@ -11,8 +11,8 @@ import random

from ..util import create_default_optimizer
from ..util import use_gpu as set_gpu
from ..attrs import PROB, IS_OOV, CLUSTER, LANG
from ..gold import GoldCorpus
from ..lookups import Lookups
from .. import util
from .. import about

@@ -46,6 +46,7 @@ def train(
textcat_arch: ("Textcat model architecture", "option", "ta", str) = "bow",
textcat_positive_label: ("Textcat positive label for binary classes with two labels", "option", "tpl", str) = None,
tag_map_path: ("Location of JSON-formatted tag map", "option", "tm", Path) = None,
omit_extra_lookups: ("Don't include extra lookups in model", "flag", "OEL", bool) = False,
verbose: ("Display more information for debug", "flag", "VV", bool) = False,
debug: ("Run data diagnostics before training", "flag", "D", bool) = False,
# fmt: on
@@ -111,7 +112,7 @@ def train(
eval_beam_widths.sort()
has_beam_widths = eval_beam_widths != [1]

default_dir = Path(__file__).parent.parent / "ml" / "models" / "defaults"
default_dir = Path(__file__).parent.parent / "pipeline" / "defaults"

# Set up the base model and pipeline. If a base model is specified, load
# the model and make sure the pipeline matches the pipeline setting. If
@@ -252,6 +253,18 @@ def train(
# Update tag map with provided mapping
nlp.vocab.morphology.tag_map.update(tag_map)

# Create empty extra lexeme tables so the data from spacy-lookups-data
# isn't loaded if these features are accessed
if omit_extra_lookups:
nlp.vocab.lookups_extra = Lookups()
nlp.vocab.lookups_extra.add_table("lexeme_cluster")
nlp.vocab.lookups_extra.add_table("lexeme_prob")
nlp.vocab.lookups_extra.add_table("lexeme_settings")

if vectors:
msg.text("Loading vector from model '{}'".format(vectors))
_load_vectors(nlp, vectors)

# Multitask objectives
multitask_options = [("parser", parser_multitasks), ("ner", entity_multitasks)]
for pipe_name, multitasks in multitask_options:
@@ -355,7 +368,7 @@ def train(
if len(textcat_labels) == 2:
msg.warn(
"If the textcat component is a binary classifier with "
"exclusive classes, provide '--textcat_positive_label' for "
"exclusive classes, provide '--textcat-positive-label' for "
"an evaluation on the positive class."
)
msg.text(
@@ -445,22 +458,25 @@ def train(
cpu_wps = nwords / (end_time - start_time)
else:
gpu_wps = nwords / (end_time - start_time)
with use_ops("numpy"):
nlp_loaded = util.load_model_from_path(epoch_model_path)
for name, component in nlp_loaded.pipeline:
if hasattr(component, "cfg"):
component.cfg["beam_width"] = beam_width
dev_dataset = list(
corpus.dev_dataset(
nlp_loaded,
gold_preproc=gold_preproc,
ignore_misaligned=True,
# Evaluate on CPU in the first iteration only (for
# timing) when GPU is enabled
if i == 0:
with use_ops("numpy"):
nlp_loaded = util.load_model_from_path(epoch_model_path)
for name, component in nlp_loaded.pipeline:
if hasattr(component, "cfg"):
component.cfg["beam_width"] = beam_width
dev_dataset = list(
corpus.dev_dataset(
nlp_loaded,
gold_preproc=gold_preproc,
ignore_misaligned=True,
)
)
)
start_time = timer()
scorer = nlp_loaded.evaluate(dev_dataset, verbose=verbose)
end_time = timer()
cpu_wps = nwords / (end_time - start_time)
start_time = timer()
scorer = nlp_loaded.evaluate(dev_dataset, verbose=verbose)
end_time = timer()
cpu_wps = nwords / (end_time - start_time)
acc_loc = output_path / f"model{i}" / "accuracy.json"
srsly.write_json(acc_loc, scorer.scores)

@@ -536,7 +552,7 @@ def train(
)
break
except Exception as e:
msg.warn(f"Aborting and saving final best model. Encountered exception: {e}")
msg.warn(f"Aborting and saving final best model. Encountered exception: {e}", exits=1)
finally:
best_pipes = nlp.pipe_names
if disabled_pipes:
@@ -614,17 +630,7 @@ def _create_progress_bar(total):


def _load_vectors(nlp, vectors):
loaded_model = util.load_model(vectors, vocab=nlp.vocab)
for lex in nlp.vocab:
values = {}
for attr, func in nlp.vocab.lex_attr_getters.items():
# These attrs are expected to be set by data. Others should
# be set by calling the language functions.
if attr not in (CLUSTER, PROB, IS_OOV, LANG):
values[lex.vocab.strings[attr]] = func(lex.orth_)
lex.set_attrs(**values)
lex.is_oov = False
return loaded_model
util.load_model(vectors, vocab=nlp.vocab)


def _load_pretrained_tok2vec(nlp, loc):

@@ -1,10 +1,13 @@
def add_codes(err_cls):
"""Add error codes to string messages via class attribute names."""

class ErrorsWithCodes(object):
class ErrorsWithCodes(err_cls):
def __getattribute__(self, code):
msg = getattr(err_cls, code)
return f"[{code}] {msg}"
msg = super().__getattribute__(code)
if code.startswith("__"): # python system attributes like __class__
return msg
else:
return "[{code}] {msg}".format(code=code, msg=msg)

return ErrorsWithCodes()

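The rewritten `add_codes` decorator subclasses the decorated class and resolves attributes through `super().__getattribute__`. It then prefixes only non-dunder names with their code. The old version passed every name, dunders included, to `getattr(err_cls, code)` and prefixed the result, so even `__class__` came back as a string. The reproduction below shows the new behavior. The `E001` message is made up for illustration and is not one of spaCy's errors.

```python
def add_codes(err_cls):
    """Add error codes to string messages via class attribute names."""

    class ErrorsWithCodes(err_cls):
        def __getattribute__(self, code):
            msg = super().__getattribute__(code)
            if code.startswith("__"):  # Python system attributes like __class__
                return msg
            return "[{code}] {msg}".format(code=code, msg=msg)

    return ErrorsWithCodes()


@add_codes
class Errors(object):
    # Hypothetical message, for illustration only
    E001 = "Unable to find component '{name}'."


print(Errors.E001.format(name="tagger"))  # [E001] Unable to find component 'tagger'.
print(Errors.__class__.__name__)          # ErrorsWithCodes
```

Note that `Errors` is an instance of the generated subclass, not a class. Every lookup such as `Errors.E001` therefore goes through the overridden `__getattribute__`, while dunder lookups behave as usual.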
@@ -88,6 +91,8 @@ class Warnings(object):
"or the language you're using doesn't have lemmatization data, "
"you can ignore this warning. If this is surprising, make sure you "
"have the spacy-lookups-data package installed.")
W023 = ("Multiprocessing of Language.pipe is not supported in Python 2. "
"'n_process' will be set to 1.")
W024 = ("Entity '{entity}' - Alias '{alias}' combination already exists in "
"the Knowledge Base.")
W025 = ("'{name}' requires '{attr}' to be assigned, but none of the "
@@ -99,9 +104,13 @@ class Warnings(object):
W028 = ("Doc.from_array was called with a vector of type '{type}', "
"but is expecting one of type 'uint64' instead. This may result "
"in problems with the vocab further on in the pipeline.")
W029 = ("Skipping unsupported morphological feature(s): {feature}. "
"Provide features as a dict {{\"Field1\": \"Value1,Value2\"}} or "
"string \"Field1=Value1,Value2|Field2=Value3\".")
W029 = ("Unable to align tokens with entities from character offsets. "
"Discarding entity annotation for the text: {text}.")
W030 = ("Some entities could not be aligned in the text \"{text}\" with "
"entities \"{entities}\". Use "
"`spacy.gold.biluo_tags_from_offsets(nlp.make_doc(text), entities)`"
" to check the alignment. Misaligned entities ('-') will be "
"ignored during training.")

# TODO: fix numbering after merging develop into master
W095 = ("Model '{model}' ({model_version}) requires spaCy {version} and is "
@@ -118,6 +127,9 @@ class Warnings(object):
"so a default configuration was used.")
W099 = ("Expected 'dict' type for the 'model' argument of pipe '{pipe}', "
"but got '{type}' instead, so ignoring it.")
W100 = ("Skipping unsupported morphological feature(s): {feature}. "
"Provide features as a dict {{\"Field1\": \"Value1,Value2\"}} or "
"string \"Field1=Value1,Value2|Field2=Value3\".")


@add_codes
@@ -551,6 +563,17 @@ class Errors(object):
"array.")
E191 = ("Invalid head: the head token must be from the same doc as the "
"token itself.")
E192 = ("Unable to resize vectors in place with cupy.")
E193 = ("Unable to resize vectors in place if the resized vector dimension "
"({new_dim}) is not the same as the current vector dimension "
"({curr_dim}).")
E194 = ("Unable to aligned mismatched text '{text}' and words '{words}'.")
E195 = ("Matcher can be called on {good} only, got {got}.")
E196 = ("Refusing to write to token.is_sent_end. Sentence boundaries can "
"only be fixed with token.is_sent_start.")
E197 = ("Row out of bounds, unable to add row {row} for key {key}.")
E198 = ("Unable to return {n} most similar vectors for the current vectors "
"table, which contains {n_rows} vectors.")

# TODO: fix numbering after merging develop into master

130
spacy/gold.pyx
@@ -47,13 +47,27 @@ def tags_to_entities(tags):
return entities


def merge_sents(sents):
m_deps = [[], [], [], [], [], []]
m_cats = {}
m_brackets = []
i = 0
for (ids, words, tags, heads, labels, ner), (cats, brackets) in sents:
m_deps[0].extend(id_ + i for id_ in ids)
m_deps[1].extend(words)
m_deps[2].extend(tags)
m_deps[3].extend(head + i for head in heads)
m_deps[4].extend(labels)
m_deps[5].extend(ner)
m_brackets.extend((b["first"] + i, b["last"] + i, b["label"])
for b in brackets)
m_cats.update(cats)
i += len(ids)
return [(m_deps, (m_cats, m_brackets))]


def _normalize_for_alignment(tokens):
tokens = [w.replace(" ", "").lower() for w in tokens]
output = []
for token in tokens:
token = token.replace(" ", "").lower()
output.append(token)
return output
return [w.replace(" ", "").lower() for w in tokens]


def align(tokens_a, tokens_b):
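`merge_sents`, as shown in this hunk, flattens per-sentence annotation tuples into one document-level tuple. While doing so, it shifts token IDs, head indices and bracket offsets by the number of tokens already merged. Running a standalone copy of the function on a toy two-sentence input makes the offsetting visible. The words, tags and labels in the input are made up.

```python
def merge_sents(sents):
    # Flat per-column lists: ids, words, tags, heads, labels, ner
    m_deps = [[], [], [], [], [], []]
    m_cats = {}
    m_brackets = []
    i = 0  # number of tokens merged so far
    for (ids, words, tags, heads, labels, ner), (cats, brackets) in sents:
        m_deps[0].extend(id_ + i for id_ in ids)
        m_deps[1].extend(words)
        m_deps[2].extend(tags)
        m_deps[3].extend(head + i for head in heads)
        m_deps[4].extend(labels)
        m_deps[5].extend(ner)
        m_brackets.extend((b["first"] + i, b["last"] + i, b["label"])
                          for b in brackets)
        m_cats.update(cats)
        i += len(ids)
    return [(m_deps, (m_cats, m_brackets))]


# Two toy sentences; heads are token indices within each sentence
sents = [
    (([0, 1], ["Hello", "!"], ["UH", "."], [0, 0], ["ROOT", "punct"], ["O", "O"]),
     ({"GREETING": 1.0}, [])),
    (([0, 1], ["See", "you"], ["VB", "PRP"], [0, 0], ["ROOT", "dobj"], ["O", "O"]),
     ({}, [{"first": 0, "last": 1, "label": "S"}])),
]
[(deps, (cats, brackets))] = merge_sents(sents)
print(deps[0])   # [0, 1, 2, 3]
print(deps[3])   # [0, 0, 2, 2]
print(brackets)  # [(2, 3, 'S')]
print(cats)      # {'GREETING': 1.0}
```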
@@ -348,6 +362,7 @@ def make_orth_variants(nlp, example, orth_variant_level=0.0):
if not example.token_annotation:
return example
raw = example.text
lower = False
if random.random() >= 0.5:
lower = True
if raw is not None:
@@ -415,8 +430,11 @@ def make_orth_variants(nlp, example, orth_variant_level=0.0):
raw_idx += 1
for word in variant_example.token_annotation.words:
match_found = False
# skip whitespace words
if word.isspace():
match_found = True
# add identical word
if word not in variants and raw[raw_idx:].startswith(word):
elif word not in variants and raw[raw_idx:].startswith(word):
variant_raw += word
raw_idx += len(word)
match_found = True
@@ -1031,8 +1049,17 @@ cdef class GoldParse:
self.cats = {} if cats is None else dict(cats)
self.links = {} if links is None else dict(links)

# temporary doc for aligning entity annotation
entdoc = None

# avoid allocating memory if the doc does not contain any tokens
if self.length == 0:
self.words = []
self.tags = []
self.heads = []
self.labels = []
self.ner = []
self.morphs = []
# set a minimal orig so that the scorer can score an empty doc
self.orig = TokenAnnotation(ids=[])
else:
@@ -1062,7 +1089,25 @@ cdef class GoldParse:
entities = [(ent if ent is not None else "-") for ent in entities]
if not isinstance(entities[0], str):
# Assume we have entities specified by character offset.
entities = biluo_tags_from_offsets(doc, entities)
# Create a temporary Doc corresponding to provided words
# (to preserve gold tokenization) and text (to preserve
# character offsets).
entdoc_words, entdoc_spaces = util.get_words_and_spaces(words, doc.text)
entdoc = Doc(doc.vocab, words=entdoc_words, spaces=entdoc_spaces)
entdoc_entities = biluo_tags_from_offsets(entdoc, entities)
# There may be some additional whitespace tokens in the
# temporary doc, so check that the annotations align with
# the provided words while building a list of BILUO labels.
entities = []
words_offset = 0
for i in range(len(entdoc_words)):
if words[i + words_offset] == entdoc_words[i]:
entities.append(entdoc_entities[i])
else:
words_offset -= 1
if len(entities) != len(words):
warnings.warn(Warnings.W029.format(text=doc.text))
entities = ["-" for _ in words]

# These are filled by the tagger/parser/entity recogniser
self.c.tags = <int*>self.mem.alloc(len(doc), sizeof(int))
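When entities are given as character offsets, `GoldParse` now builds a temporary `Doc` from the gold words plus the text's whitespace. That temporary doc may contain extra whitespace-only tokens. The `words_offset` loop drops the tags of those extra tokens so the BILUO list lines up with the provided words again. The sketch below isolates just that loop. The helper name `align_entdoc_tags` and the toy inputs are hypothetical, and the sketch returns the all-`"-"` fallback without emitting W029.

```python
def align_entdoc_tags(words, entdoc_words, entdoc_tags):
    """Map BILUO tags from a temporary doc back onto the gold words.

    `entdoc_words` may contain extra whitespace-only tokens that are not in
    `words`. Each mismatch decrements `words_offset`, so the next entdoc
    token is compared against the same gold word again.
    """
    tags = []
    words_offset = 0
    for i in range(len(entdoc_words)):
        if words[i + words_offset] == entdoc_words[i]:
            tags.append(entdoc_tags[i])
        else:
            words_offset -= 1
    if len(tags) != len(words):
        # Same fallback as the hunk: discard the entity annotation
        return ["-" for _ in words]
    return tags


words = ["New", "York", "rocks"]
entdoc_words = ["New", "York", " ", "rocks"]  # extra whitespace token
entdoc_tags = ["B-GPE", "L-GPE", "O", "O"]
print(align_entdoc_tags(words, entdoc_words, entdoc_tags))  # ['B-GPE', 'L-GPE', 'O']
```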
@@ -1092,7 +1137,8 @@ cdef class GoldParse:
# If we under-segment, we'll have one predicted word that covers a
# sequence of gold words.
# If we "mis-segment", we'll have a sequence of predicted words covering
# a sequence of gold words. That's many-to-many -- we don't do that.
# a sequence of gold words. That's many-to-many -- we don't do that
# except for NER spans where the start and end can be aligned.
cost, i2j, j2i, i2j_multi, j2i_multi = align([t.orth_ for t in doc], words)

self.cand_to_gold = [(j if j >= 0 else None) for j in i2j]
@@ -1123,7 +1169,6 @@ cdef class GoldParse:
self.lemmas[i] = lemmas[i2j_multi[i]]
self.sent_starts[i] = sent_starts[i2j_multi[i]]
is_last = i2j_multi[i] != i2j_multi.get(i+1)
is_first = i2j_multi[i] != i2j_multi.get(i-1)
# Set next word in multi-token span as head, until last
if not is_last:
self.heads[i] = i+1
@@ -1133,30 +1178,10 @@ cdef class GoldParse:
if head_i:
self.heads[i] = self.gold_to_cand[head_i]
self.labels[i] = deps[i2j_multi[i]]
# Now set NER...This is annoying because if we've split
# got an entity word split into two, we need to adjust the
# BILUO tags. We can't have BB or LL etc.
# Case 1: O -- easy.
ner_tag = entities[i2j_multi[i]]
if ner_tag == "O":
self.ner[i] = "O"
# Case 2: U. This has to become a B I* L sequence.
elif ner_tag.startswith("U-"):
if is_first:
self.ner[i] = ner_tag.replace("U-", "B-", 1)
elif is_last:
self.ner[i] = ner_tag.replace("U-", "L-", 1)
else:
self.ner[i] = ner_tag.replace("U-", "I-", 1)
# Case 3: L. If not last, change to I.
elif ner_tag.startswith("L-"):
if is_last:
self.ner[i] = ner_tag
else:
self.ner[i] = ner_tag.replace("L-", "I-", 1)
# Case 4: I. Stays correct
elif ner_tag.startswith("I-"):
self.ner[i] = ner_tag
# Assign O/- for many-to-one O/- NER tags
if ner_tag in ("O", "-"):
self.ner[i] = ner_tag
else:
self.words[i] = words[gold_i]
self.tags[i] = tags[gold_i]
@@ -1170,6 +1195,39 @@ cdef class GoldParse:
self.heads[i] = self.gold_to_cand[heads[gold_i]]
self.labels[i] = deps[gold_i]
self.ner[i] = entities[gold_i]
# Assign O/- for one-to-many O/- NER tags
for j, cand_j in enumerate(self.gold_to_cand):
if cand_j is None:
if j in j2i_multi:
i = j2i_multi[j]
ner_tag = entities[j]
if ner_tag in ("O", "-"):
self.ner[i] = ner_tag

# If there is entity annotation and some tokens remain unaligned,
# align all entities at the character level to account for all
# possible token misalignments within the entity spans
if any([e not in ("O", "-") for e in entities]) and None in self.ner:
# If the temporary entdoc wasn't created above, initialize it
if not entdoc:
entdoc_words, entdoc_spaces = util.get_words_and_spaces(words, doc.text)
entdoc = Doc(doc.vocab, words=entdoc_words, spaces=entdoc_spaces)
# Get offsets based on gold words and BILUO entities
entdoc_offsets = offsets_from_biluo_tags(entdoc, entities)
aligned_offsets = []
aligned_spans = []
# Filter offsets to identify those that align with doc tokens
for offset in entdoc_offsets:
span = doc.char_span(offset[0], offset[1])
if span and not span.text.isspace():
aligned_offsets.append(offset)
aligned_spans.append(span)
# Convert back to BILUO for doc tokens and assign NER for all
# aligned spans
biluo_tags = biluo_tags_from_offsets(doc, aligned_offsets, missing=None)
for span in aligned_spans:
for i in range(span.start, span.end):
self.ner[i] = biluo_tags[i]

# Prevent whitespace that isn't within entities from being tagged as
# an entity.
@@ -1303,6 +1361,12 @@ def biluo_tags_from_offsets(doc, entities, missing="O"):
break
else:
biluo[token.i] = missing
if "-" in biluo:
ent_str = str(entities)
warnings.warn(Warnings.W030.format(
text=doc.text[:50] + "..." if len(doc.text) > 50 else doc.text,
entities=ent_str[:50] + "..." if len(ent_str) > 50 else ent_str
))
return biluo

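`biluo_tags_from_offsets` now emits W030 whenever any token keeps the `-` tag, which happens when an entity's character offsets do not coincide with token boundaries. The following is a simplified, dependency-free sketch of that conversion. It assumes one `(start_char, end_char)` pair per token in place of a `Doc`, and the helper name `biluo_from_offsets` is hypothetical. As in the hunk, every tag starts out as `-`, and tokens that touch no entity character at all are reset to `missing`.

```python
import warnings


def biluo_from_offsets(token_spans, entities, missing="O"):
    """Convert (start_char, end_char, label) entities to BILUO tags.

    token_spans holds one (start_char, end_char) pair per token. Every tag
    starts as "-"; aligned entities overwrite it, and tokens that overlap no
    entity character get `missing`. Whatever is still "-" was misaligned.
    """
    biluo = ["-" for _ in token_spans]
    starts = {start: i for i, (start, _) in enumerate(token_spans)}
    ends = {end: i for i, (_, end) in enumerate(token_spans)}
    for start_char, end_char, label in entities:
        first, last = starts.get(start_char), ends.get(end_char)
        if first is None or last is None or first > last:
            continue  # entity boundaries don't match token boundaries
        if first == last:
            biluo[first] = f"U-{label}"
        else:
            biluo[first] = f"B-{label}"
            for i in range(first + 1, last):
                biluo[i] = f"I-{label}"
            biluo[last] = f"L-{label}"
    entity_chars = {c for s, e, _ in entities for c in range(s, e)}
    for i, (start, end) in enumerate(token_spans):
        if not any(c in entity_chars for c in range(start, end)):
            biluo[i] = missing
    if "-" in biluo:
        # Truncate long values, as the W030 call in the hunk does
        ent_str = str(entities)
        ent_str = ent_str[:50] + "..." if len(ent_str) > 50 else ent_str
        warnings.warn(f"Some entities could not be aligned: {ent_str}")
    return biluo


# "I like New York": one (start, end) character span per token
spans = [(0, 1), (2, 6), (7, 10), (11, 15)]
print(biluo_from_offsets(spans, [(7, 15, "GPE")]))  # ['O', 'O', 'B-GPE', 'L-GPE']
print(biluo_from_offsets(spans, [(8, 15, "GPE")]))  # ['O', 'O', '-', '-'], plus a warning
```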
@@ -1,24 +1,19 @@
from .tokenizer_exceptions import TOKENIZER_EXCEPTIONS
from .norm_exceptions import NORM_EXCEPTIONS
from .punctuation import TOKENIZER_INFIXES, TOKENIZER_SUFFIXES
from .stop_words import STOP_WORDS
from .lex_attrs import LEX_ATTRS
from .morph_rules import MORPH_RULES

from ..tokenizer_exceptions import BASE_EXCEPTIONS
from ..norm_exceptions import BASE_NORMS
from ...language import Language
from ...attrs import LANG, NORM
from ...util import update_exc, add_lookups
from ...attrs import LANG
from ...util import update_exc


class DanishDefaults(Language.Defaults):
lex_attr_getters = dict(Language.Defaults.lex_attr_getters)
lex_attr_getters.update(LEX_ATTRS)
lex_attr_getters[LANG] = lambda text: "da"
lex_attr_getters[NORM] = add_lookups(
Language.Defaults.lex_attr_getters[NORM], BASE_NORMS, NORM_EXCEPTIONS
)
tokenizer_exceptions = update_exc(BASE_EXCEPTIONS, TOKENIZER_EXCEPTIONS)
morph_rules = MORPH_RULES
infixes = TOKENIZER_INFIXES

@@ -5,10 +5,13 @@ Example sentences to test spaCy and its language models.
>>> docs = nlp.pipe(sentences)
"""


sentences = [
"Apple overvejer at købe et britisk startup for 1 milliard dollar",
"Selvkørende biler flytter forsikringsansvaret over på producenterne",
"San Francisco overvejer at forbyde udbringningsrobotter på fortov",
"London er en stor by i Storbritannien",
"Apple overvejer at købe et britisk startup for 1 milliard dollar.",
"Selvkørende biler flytter forsikringsansvaret over på producenterne.",
"San Francisco overvejer at forbyde udbringningsrobotter på fortovet.",
"London er en storby i Storbritannien.",
"Hvor er du?",
"Hvem er Frankrings president?",
"Hvad er hovedstaden i USA?",
"Hvornår blev Barack Obama født?",
]

@@ -1,524 +0,0 @@
-"""
-Special-case rules for normalizing tokens to improve the model's predictions.
-For example 'mysterium' vs 'mysterie' and similar.
-"""
-
-# Sources:
-# 1: https://dsn.dk/retskrivning/om-retskrivningsordbogen/mere-om-retskrivningsordbogen-2012/endrede-stave-og-ordformer/
-# 2: http://www.tjerry-korrektur.dk/ord-med-flere-stavemaader/
-
-_exc = {
-    # Alternative spelling
-    "a-kraft-værk": "a-kraftværk",  # 1
-    "ålborg": "aalborg",  # 2
-    "århus": "aarhus",
-    "accessoirer": "accessoires",  # 1
-    "affektert": "affekteret",  # 1
-    "afrikander": "afrikaaner",  # 1
-    "aftabuere": "aftabuisere",  # 1
-    "aftabuering": "aftabuisering",  # 1
-    "akvarium": "akvarie",  # 1
-    "alenefader": "alenefar",  # 1
-    "alenemoder": "alenemor",  # 1
-    "alkoholambulatorium": "alkoholambulatorie",  # 1
-    "ambulatorium": "ambulatorie",  # 1
-    "ananassene": "ananasserne",  # 2
-    "anførelsestegn": "anførselstegn",  # 1
-    "anseelig": "anselig",  # 2
-    "antioxydant": "antioxidant",  # 1
-    "artrig": "artsrig",  # 1
-    "auditorium": "auditorie",  # 1
-    "avocado": "avokado",  # 2
-    "bagerst": "bagest",  # 2
-    "bagstræv": "bagstræb",  # 1
-    "bagstræver": "bagstræber",  # 1
-    "bagstræverisk": "bagstræberisk",  # 1
-    "balde": "balle",  # 2
-    "barselorlov": "barselsorlov",  # 1
-    "barselvikar": "barselsvikar",  # 1
-    "baskien": "baskerlandet",  # 1
-    "bayrisk": "bayersk",  # 1
-    "bedstefader": "bedstefar",  # 1
-    "bedstemoder": "bedstemor",  # 1
-    "behefte": "behæfte",  # 1
-    "beheftelse": "behæftelse",  # 1
-    "bidragydende": "bidragsydende",  # 1
-    "bidragyder": "bidragsyder",  # 1
-    "billiondel": "billiontedel",  # 1
-    "blaseret": "blasert",  # 1
-    "bleskifte": "bleskift",  # 1
-    "blodbroder": "blodsbroder",  # 2
-    "blyantspidser": "blyantsspidser",  # 2
-    "boligministerium": "boligministerie",  # 1
-    "borhul": "borehul",  # 1
-    "broder": "bror",  # 2
-    "buldog": "bulldog",  # 2
-    "bådhus": "bådehus",  # 1
-    "børnepleje": "barnepleje",  # 1
-    "børneseng": "barneseng",  # 1
-    "børnestol": "barnestol",  # 1
-    "cairo": "kairo",  # 1
-    "cambodia": "cambodja",  # 1
-    "cambodianer": "cambodjaner",  # 1
-    "cambodiansk": "cambodjansk",  # 1
-    "camouflage": "kamuflage",  # 2
-    "campylobacter": "kampylobakter",  # 1
-    "centeret": "centret",  # 2
-    "chefskahyt": "chefkahyt",  # 1
-    "chefspost": "chefpost",  # 1
-    "chefssekretær": "chefsekretær",  # 1
-    "chefsstol": "chefstol",  # 1
-    "cirkulærskrivelse": "cirkulæreskrivelse",  # 1
-    "cognacsglas": "cognacglas",  # 1
-    "columnist": "kolumnist",  # 1
-    "cricket": "kricket",  # 2
-    "dagplejemoder": "dagplejemor",  # 1
-    "damaskesdug": "damaskdug",  # 1
-    "damp-barn": "dampbarn",  # 1
-    "delfinarium": "delfinarie",  # 1
-    "dentallaboratorium": "dentallaboratorie",  # 1
-    "diaramme": "diasramme",  # 1
-    "diaré": "diarré",  # 1
-    "dioxyd": "dioxid",  # 1
-    "dommedagsprædiken": "dommedagspræken",  # 1
-    "donut": "doughnut",  # 2
-    "driftmæssig": "driftsmæssig",  # 1
-    "driftsikker": "driftssikker",  # 1
-    "driftsikring": "driftssikring",  # 1
-    "drikkejogurt": "drikkeyoghurt",  # 1
-    "drivein": "drive-in",  # 1
-    "driveinbiograf": "drive-in-biograf",  # 1
-    "drøvel": "drøbel",  # 1
-    "dødskriterium": "dødskriterie",  # 1
-    "e-mail-adresse": "e-mailadresse",  # 1
-    "e-post-adresse": "e-postadresse",  # 1
-    "egypten": "ægypten",  # 2
-    "ekskommunicere": "ekskommunikere",  # 1
-    "eksperimentarium": "eksperimentarie",  # 1
-    "elsass": "Alsace",  # 1
-    "elsasser": "alsacer",  # 1
-    "elsassisk": "alsacisk",  # 1
-    "elvetal": "ellevetal",  # 1
-    "elvetiden": "ellevetiden",  # 1
-    "elveårig": "elleveårig",  # 1
-    "elveårs": "elleveårs",  # 1
-    "elveårsbarn": "elleveårsbarn",  # 1
-    "elvte": "ellevte",  # 1
-    "elvtedel": "ellevtedel",  # 1
-    "energiministerium": "energiministerie",  # 1
-    "erhvervsministerium": "erhvervsministerie",  # 1
-    "espaliere": "spaliere",  # 2
-    "evangelium": "evangelie",  # 1
-    "fagministerium": "fagministerie",  # 1
-    "fakse": "faxe",  # 1
-    "fangstkvota": "fangstkvote",  # 1
-    "fader": "far",  # 2
-    "farbroder": "farbror",  # 1
-    "farfader": "farfar",  # 1
-    "farmoder": "farmor",  # 1
-    "federal": "føderal",  # 1
-    "federalisering": "føderalisering",  # 1
-    "federalisme": "føderalisme",  # 1
-    "federalist": "føderalist",  # 1
-    "federalistisk": "føderalistisk",  # 1
-    "federation": "føderation",  # 1
-    "federativ": "føderativ",  # 1
-    "fejlbeheftet": "fejlbehæftet",  # 1
-    "femetagers": "femetages",  # 2
-    "femhundredekroneseddel": "femhundredkroneseddel",  # 2
-    "filmpremiere": "filmpræmiere",  # 2
-    "finansimperium": "finansimperie",  # 1
-    "finansministerium": "finansministerie",  # 1
-    "firehjulstræk": "firhjulstræk",  # 2
-    "fjernstudium": "fjernstudie",  # 1
-    "formalier": "formalia",  # 1
-    "formandsskift": "formandsskifte",  # 1
-    "fornemst": "fornemmest",  # 2
-    "fornuftparti": "fornuftsparti",  # 1
-    "fornuftstridig": "fornuftsstridig",  # 1
-    "fornuftvæsen": "fornuftsvæsen",  # 1
-    "fornuftægteskab": "fornuftsægteskab",  # 1
-    "forretningsministerium": "forretningsministerie",  # 1
-    "forskningsministerium": "forskningsministerie",  # 1
-    "forstudium": "forstudie",  # 1
-    "forsvarsministerium": "forsvarsministerie",  # 1
-    "frilægge": "fritlægge",  # 1
-    "frilæggelse": "fritlæggelse",  # 1
-    "frilægning": "fritlægning",  # 1
-    "fristille": "fritstille",  # 1
-    "fristilling": "fritstilling",  # 1
-    "fuldttegnet": "fuldtegnet",  # 1
-    "fødestedskriterium": "fødestedskriterie",  # 1
-    "fødevareministerium": "fødevareministerie",  # 1
-    "følesløs": "følelsesløs",  # 1
-    "følgeligt": "følgelig",  # 1
-    "førne": "førn",  # 1
-    "gearskift": "gearskifte",  # 2
-    "gladeligt": "gladelig",  # 1
-    "glosehefte": "glosehæfte",  # 1
-    "glædeløs": "glædesløs",  # 1
-    "gonoré": "gonorré",  # 1
-    "grangiveligt": "grangivelig",  # 1
-    "grundliggende": "grundlæggende",  # 2
-    "grønsag": "grøntsag",  # 2
-    "gudbenådet": "gudsbenådet",  # 1
-    "gudfader": "gudfar",  # 1
-    "gudmoder": "gudmor",  # 1
-    "gulvmop": "gulvmoppe",  # 1
-    "gymnasium": "gymnasie",  # 1
-    "hackning": "hacking",  # 1
-    "halvbroder": "halvbror",  # 1
-    "halvelvetiden": "halvellevetiden",  # 1
-    "handelsgymnasium": "handelsgymnasie",  # 1
-    "hefte": "hæfte",  # 1
-    "hefteklamme": "hæfteklamme",  # 1
-    "heftelse": "hæftelse",  # 1
-    "heftemaskine": "hæftemaskine",  # 1
-    "heftepistol": "hæftepistol",  # 1
-    "hefteplaster": "hæfteplaster",  # 1
-    "heftestraf": "hæftestraf",  # 1
-    "heftning": "hæftning",  # 1
-    "helbroder": "helbror",  # 1
-    "hjemmeklasse": "hjemklasse",  # 1
-    "hjulspin": "hjulspind",  # 1
-    "huggevåben": "hugvåben",  # 1
-    "hulmurisolering": "hulmursisolering",  # 1
-    "hurtiggående": "hurtigtgående",  # 2
-    "hurtigttørrende": "hurtigtørrende",  # 2
-    "husmoder": "husmor",  # 1
-    "hydroxyd": "hydroxid",  # 1
-    "håndmikser": "håndmixer",  # 1
-    "højtaler": "højttaler",  # 2
-    "hønemoder": "hønemor",  # 1
-    "ide": "idé",  # 2
-    "imperium": "imperie",  # 1
-    "imponerthed": "imponerethed",  # 1
-    "inbox": "indboks",  # 2
-    "indenrigsministerium": "indenrigsministerie",  # 1
-    "indhefte": "indhæfte",  # 1
-    "indheftning": "indhæftning",  # 1
-    "indicium": "indicie",  # 1
-    "indkassere": "inkassere",  # 2
-    "iota": "jota",  # 1
-    "jobskift": "jobskifte",  # 1
-    "jogurt": "yoghurt",  # 1
-    "jukeboks": "jukebox",  # 1
-    "justitsministerium": "justitsministerie",  # 1
-    "kalorifere": "kalorifer",  # 1
-    "kandidatstipendium": "kandidatstipendie",  # 1
-    "kannevas": "kanvas",  # 1
-    "kaperssauce": "kaperssovs",  # 1
-    "kigge": "kikke",  # 2
-    "kirkeministerium": "kirkeministerie",  # 1
-    "klapmydse": "klapmyds",  # 1
-    "klimakterium": "klimakterie",  # 1
-    "klogeligt": "klogelig",  # 1
-    "knivblad": "knivsblad",  # 1
-    "kollegaer": "kolleger",  # 2
-    "kollegium": "kollegie",  # 1
-    "kollegiehefte": "kollegiehæfte",  # 1
-    "kollokviumx": "kollokvium",  # 1
-    "kommissorium": "kommissorie",  # 1
-    "kompendium": "kompendie",  # 1
-    "komplicerthed": "komplicerethed",  # 1
-    "konfederation": "konføderation",  # 1
-    "konfedereret": "konfødereret",  # 1
-    "konferensstudium": "konferensstudie",  # 1
-    "konservatorium": "konservatorie",  # 1
-    "konsulere": "konsultere",  # 1
-    "kradsbørstig": "krasbørstig",  # 2
-    "kravsspecifikation": "kravspecifikation",  # 1
-    "krematorium": "krematorie",  # 1
-    "krep": "crepe",  # 1
-    "krepnylon": "crepenylon",  # 1
-    "kreppapir": "crepepapir",  # 1
-    "kricket": "cricket",  # 2
-    "kriterium": "kriterie",  # 1
-    "kroat": "kroater",  # 2
-    "kroki": "croquis",  # 1
-    "kronprinsepar": "kronprinspar",  # 2
-    "kropdoven": "kropsdoven",  # 1
-    "kroplus": "kropslus",  # 1
-    "krøllefedt": "krølfedt",  # 1
-    "kulturministerium": "kulturministerie",  # 1
-    "kuponhefte": "kuponhæfte",  # 1
-    "kvota": "kvote",  # 1
-    "kvotaordning": "kvoteordning",  # 1
-    "laboratorium": "laboratorie",  # 1
-    "laksfarve": "laksefarve",  # 1
-    "laksfarvet": "laksefarvet",  # 1
-    "laksrød": "lakserød",  # 1
-    "laksyngel": "lakseyngel",  # 1
-    "laksørred": "lakseørred",  # 1
-    "landbrugsministerium": "landbrugsministerie",  # 1
-    "landskampstemning": "landskampsstemning",  # 1
-    "langust": "languster",  # 1
-    "lappegrejer": "lappegrej",  # 1
-    "lavløn": "lavtløn",  # 1
-    "lillebroder": "lillebror",  # 1
-    "linear": "lineær",  # 1
-    "loftlampe": "loftslampe",  # 2
-    "log-in": "login",  # 1
-    "login": "log-in",  # 2
-    "lovmedholdig": "lovmedholdelig",  # 1
-    "ludder": "luder",  # 2
-    "lysholder": "lyseholder",  # 1
-    "lægeskifte": "lægeskift",  # 1
-    "lærvillig": "lærevillig",  # 1
-    "løgsauce": "løgsovs",  # 1
-    "madmoder": "madmor",  # 1
-    "majonæse": "mayonnaise",  # 1
-    "mareridtagtig": "mareridtsagtig",  # 1
-    "margen": "margin",  # 2
-    "martyrium": "martyrie",  # 1
-    "mellemstatlig": "mellemstatslig",  # 1
-    "menneskene": "menneskerne",  # 2
-    "metropolis": "metropol",  # 1
-    "miks": "mix",  # 1
-    "mikse": "mixe",  # 1
-    "miksepult": "mixerpult",  # 1
-    "mikser": "mixer",  # 1
-    "mikserpult": "mixerpult",  # 1
-    "mikslån": "mixlån",  # 1
-    "miksning": "mixning",  # 1
-    "miljøministerium": "miljøministerie",  # 1
-    "milliarddel": "milliardtedel",  # 1
-    "milliondel": "milliontedel",  # 1
-    "ministerium": "ministerie",  # 1
-    "mop": "moppe",  # 1
-    "moder": "mor",  # 2
-    "moratorium": "moratorie",  # 1
-    "morbroder": "morbror",  # 1
-    "morfader": "morfar",  # 1
-    "mormoder": "mormor",  # 1
-    "musikkonservatorium": "musikkonservatorie",  # 1
-    "muslingskal": "muslingeskal",  # 1
-    "mysterium": "mysterie",  # 1
-    "naturalieydelse": "naturalydelse",  # 1
-    "naturalieøkonomi": "naturaløkonomi",  # 1
-    "navnebroder": "navnebror",  # 1
-    "nerium": "nerie",  # 1
-    "nådeløs": "nådesløs",  # 1
-    "nærforestående": "nærtforestående",  # 1
-    "nærstående": "nærtstående",  # 1
-    "observatorium": "observatorie",  # 1
-    "oldefader": "oldefar",  # 1
-    "oldemoder": "oldemor",  # 1
-    "opgraduere": "opgradere",  # 1
-    "opgraduering": "opgradering",  # 1
-    "oratorium": "oratorie",  # 1
-    "overbookning": "overbooking",  # 1
-    "overpræsidium": "overpræsidie",  # 1
-    "overstatlig": "overstatslig",  # 1
-    "oxyd": "oxid",  # 1
-    "oxydere": "oxidere",  # 1
-    "oxydering": "oxidering",  # 1
-    "pakkenellike": "pakkenelliker",  # 1
-    "papirtynd": "papirstynd",  # 1
-    "pastoralseminarium": "pastoralseminarie",  # 1
-    "peanutsene": "peanuttene",  # 2
-    "penalhus": "pennalhus",  # 2
-    "pensakrav": "pensumkrav",  # 1
-    "pepperoni": "peperoni",  # 1
-    "peruaner": "peruvianer",  # 1
-    "petrole": "petrol",  # 1
-    "piltast": "piletast",  # 1
-    "piltaste": "piletast",  # 1
-    "planetarium": "planetarie",  # 1
-    "plasteret": "plastret",  # 2
-    "plastic": "plastik",  # 2
-    "play-off-kamp": "playoffkamp",  # 1
-    "plejefader": "plejefar",  # 1
-    "plejemoder": "plejemor",  # 1
-    "podium": "podie",  # 2
-    "praha": "prag",  # 2
-    "preciøs": "pretiøs",  # 2
-    "privilegium": "privilegie",  # 1
-    "progredere": "progrediere",  # 1
-    "præsidium": "præsidie",  # 1
-    "psykodelisk": "psykedelisk",  # 1
-    "pudsegrejer": "pudsegrej",  # 1
-    "referensgruppe": "referencegruppe",  # 1
-    "referensramme": "referenceramme",  # 1
-    "refugium": "refugie",  # 1
-    "registeret": "registret",  # 2
-    "remedium": "remedie",  # 1
-    "remiks": "remix",  # 1
-    "reservert": "reserveret",  # 1
-    "ressortministerium": "ressortministerie",  # 1
-    "ressource": "resurse",  # 2
-    "resætte": "resette",  # 1
-    "rettelig": "retteligt",  # 1
-    "rettetaste": "rettetast",  # 1
-    "returtaste": "returtast",  # 1
-    "risici": "risikoer",  # 2
-    "roll-on": "rollon",  # 1
-    "rollehefte": "rollehæfte",  # 1
-    "rostbøf": "roastbeef",  # 1
-    "rygsæksturist": "rygsækturist",  # 1
-    "rødstjært": "rødstjert",  # 1
-    "saddel": "sadel",  # 2
-    "samaritan": "samaritaner",  # 2
-    "sanatorium": "sanatorie",  # 1
-    "sauce": "sovs",  # 1
-    "scanning": "skanning",  # 2
-    "sceneskifte": "sceneskift",  # 1
-    "scilla": "skilla",  # 1
-    "sejflydende": "sejtflydende",  # 1
-    "selvstudium": "selvstudie",  # 1
-    "seminarium": "seminarie",  # 1
-    "sennepssauce": "sennepssovs ",  # 1
-    "servitutbeheftet": "servitutbehæftet",  # 1
-    "sit-in": "sitin",  # 1
-    "skatteministerium": "skatteministerie",  # 1
-    "skifer": "skiffer",  # 2
-    "skyldsfølelse": "skyldfølelse",  # 1
-    "skysauce": "skysovs",  # 1
-    "sladdertaske": "sladretaske",  # 2
-    "sladdervorn": "sladrevorn",  # 2
-    "slagsbroder": "slagsbror",  # 1
-    "slettetaste": "slettetast",  # 1
-    "smørsauce": "smørsovs",  # 1
-    "snitsel": "schnitzel",  # 1
-    "snobbeeffekt": "snobeffekt",  # 2
-    "socialministerium": "socialministerie",  # 1
-    "solarium": "solarie",  # 1
-    "soldebroder": "soldebror",  # 1
-    "spagetti": "spaghetti",  # 1
-    "spagettistrop": "spaghettistrop",  # 1
-    "spagettiwestern": "spaghettiwestern",  # 1
-    "spin-off": "spinoff",  # 1
-    "spinnefiskeri": "spindefiskeri",  # 1
-    "spolorm": "spoleorm",  # 1
-    "sproglaboratorium": "sproglaboratorie",  # 1
-    "spækbræt": "spækkebræt",  # 2
-    "stand-in": "standin",  # 1
-    "stand-up-comedy": "standupcomedy",  # 1
-    "stand-up-komiker": "standupkomiker",  # 1
-    "statsministerium": "statsministerie",  # 1
-    "stedbroder": "stedbror",  # 1
-    "stedfader": "stedfar",  # 1
-    "stedmoder": "stedmor",  # 1
-    "stilehefte": "stilehæfte",  # 1
-    "stipendium": "stipendie",  # 1
-    "stjært": "stjert",  # 1
-    "stjærthage": "stjerthage",  # 1
-    "storebroder": "storebror",  # 1
-    "stortå": "storetå",  # 1
-    "strabads": "strabadser",  # 1
-    "strømlinjet": "strømlinet",  # 1
-    "studium": "studie",  # 1
-    "stænkelap": "stænklap",  # 1
-    "sundhedsministerium": "sundhedsministerie",  # 1
-    "suppositorium": "suppositorie",  # 1
-    "svejts": "schweiz",  # 1
-    "svejtser": "schweizer",  # 1
-    "svejtserfranc": "schweizerfranc",  # 1
-    "svejtserost": "schweizerost",  # 1
-    "svejtsisk": "schweizisk",  # 1
-    "svigerfader": "svigerfar",  # 1
-    "svigermoder": "svigermor",  # 1
-    "svirebroder": "svirebror",  # 1
-    "symposium": "symposie",  # 1
-    "sælarium": "sælarie",  # 1
-    "søreme": "sørme",  # 2
-    "søterritorium": "søterritorie",  # 1
-    "t-bone-steak": "t-bonesteak",  # 1
-    "tabgivende": "tabsgivende",  # 1
-    "tabuere": "tabuisere",  # 1
-    "tabuering": "tabuisering",  # 1
-    "tackle": "takle",  # 2
-    "tackling": "takling",  # 2
-    "taifun": "tyfon",  # 1
-    "take-off": "takeoff",  # 1
-    "taknemlig": "taknemmelig",  # 2
-    "talehørelærer": "tale-høre-lærer",  # 1
-    "talehøreundervisning": "tale-høre-undervisning",  # 1
-    "tandstik": "tandstikker",  # 1
-    "tao": "dao",  # 1
-    "taoisme": "daoisme",  # 1
-    "taoist": "daoist",  # 1
-    "taoistisk": "daoistisk",  # 1
-    "taverne": "taverna",  # 1
-    "teateret": "teatret",  # 2
-    "tekno": "techno",  # 1
-    "temposkifte": "temposkift",  # 1
-    "terrarium": "terrarie",  # 1
-    "territorium": "territorie",  # 1
-    "tesis": "tese",  # 1
-    "tidsstudium": "tidsstudie",  # 1
-    "tipoldefader": "tipoldefar",  # 1
-    "tipoldemoder": "tipoldemor",  # 1
-    "tomatsauce": "tomatsovs",  # 1
-    "tonart": "toneart",  # 1
-    "trafikministerium": "trafikministerie",  # 1
-    "tredve": "tredive",  # 1
-    "tredver": "trediver",  # 1
-    "tredveårig": "trediveårig",  # 1
-    "tredveårs": "trediveårs",  # 1
-    "tredveårsfødselsdag": "trediveårsfødselsdag",  # 1
-    "tredvte": "tredivte",  # 1
-    "tredvtedel": "tredivtedel",  # 1
-    "troldunge": "troldeunge",  # 1
-    "trommestikke": "trommestik",  # 1
-    "trubadur": "troubadour",  # 2
-    "trøstepræmie": "trøstpræmie",  # 2
-    "tummerum": "trummerum",  # 1
-    "tumultuarisk": "tumultarisk",  # 1
-    "tunghørighed": "tunghørhed",  # 1
-    "tus": "tusch",  # 2
-    "tusind": "tusinde",  # 2
-    "tvillingbroder": "tvillingebror",  # 1
-    "tvillingbror": "tvillingebror",  # 1
-    "tvillingebroder": "tvillingebror",  # 1
-    "ubeheftet": "ubehæftet",  # 1
-    "udenrigsministerium": "udenrigsministerie",  # 1
-    "udhulning": "udhuling",  # 1
-    "udslaggivende": "udslagsgivende",  # 1
-    "udspekulert": "udspekuleret",  # 1
-    "udviklingsministerium": "udviklingsministerie",  # 1
-    "uforpligtigende": "uforpligtende",  # 1
-    "uheldvarslende": "uheldsvarslende",  # 1
-    "uimponerthed": "uimponerethed",  # 1
-    "undervisningsministerium": "undervisningsministerie",  # 1
-    "unægtelig": "unægteligt",  # 1
-    "urinale": "urinal",  # 1
-    "uvederheftig": "uvederhæftig",  # 1
-    "vabel": "vable",  # 2
-    "vadi": "wadi",  # 1
-    "vaklevorn": "vakkelvorn",  # 1
-    "vanadin": "vanadium",  # 1
-    "vaselin": "vaseline",  # 1
-    "vederheftig": "vederhæftig",  # 1
-    "vedhefte": "vedhæfte",  # 1
-    "velar": "velær",  # 1
-    "videndeling": "vidensdeling",  # 2
-    "vinkelanførelsestegn": "vinkelanførselstegn",  # 1
-    "vipstjært": "vipstjert",  # 1
-    "vismut": "bismut",  # 1
-    "visvas": "vissevasse",  # 1
-    "voksværk": "vokseværk",  # 1
-    "værtdyr": "værtsdyr",  # 1
-    "værtplante": "værtsplante",  # 1
-    "wienersnitsel": "wienerschnitzel",  # 1
-    "yderliggående": "yderligtgående",  # 2
-    "zombi": "zombie",  # 1
-    "ægbakke": "æggebakke",  # 1
-    "ægformet": "æggeformet",  # 1
-    "ægleder": "æggeleder",  # 1
-    "ækvilibrist": "ekvilibrist",  # 2
-    "æselsøre": "æseløre",  # 1
-    "øjehule": "øjenhule",  # 1
-    "øjelåg": "øjenlåg",  # 1
-    "øjeåbner": "øjenåbner",  # 1
-    "økonomiministerium": "økonomiministerie",  # 1
-    "ørenring": "ørering",  # 2
-    "øvehefte": "øvehæfte",  # 1
-}
-
-
-NORM_EXCEPTIONS = {}
-
-for string, norm in _exc.items():
-    NORM_EXCEPTIONS[string] = norm
-    NORM_EXCEPTIONS[string.title()] = norm
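The closing loop of the deleted file registers every key twice, once as written and once title-cased, so that capitalised forms (for example at the start of a sentence) also hit the table. The sketch below uses three entries copied from the table above. It also shows one Python subtlety: `str.title()` capitalises every alphabetic run, so hyphenated keys end up with several capitals.

```python
# Three entries copied from the removed Danish _exc table.
_exc = {
    "moder": "mor",
    "ålborg": "aalborg",
    "e-mail-adresse": "e-mailadresse",
}

NORM_EXCEPTIONS = {}
for string, norm in _exc.items():
    NORM_EXCEPTIONS[string] = norm  # form as written
    NORM_EXCEPTIONS[string.title()] = norm  # title-cased variant

# str.title() capitalises each hyphen-separated segment.
print(NORM_EXCEPTIONS["E-Mail-Adresse"])  # e-mailadresse
print("E-mail-adresse" in NORM_EXCEPTIONS)  # False
```

A sentence-initial "E-mail-adresse" would therefore not match the title-cased key.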
@@ -2,7 +2,7 @@
 Tokenizer Exceptions.
 Source: https://forkortelse.dk/ and various others.
 """
-from ...symbols import ORTH, LEMMA, NORM, TAG, PUNCT
+from ...symbols import ORTH, LEMMA, NORM
 
 
 _exc = {}
@@ -48,7 +48,7 @@ for exc_data in [
     {ORTH: "Ons.", LEMMA: "onsdag"},
     {ORTH: "Fre.", LEMMA: "fredag"},
     {ORTH: "Lør.", LEMMA: "lørdag"},
-    {ORTH: "og/eller", LEMMA: "og/eller", NORM: "og/eller", TAG: "CC"},
+    {ORTH: "og/eller", LEMMA: "og/eller", NORM: "og/eller"},
 ]:
     _exc[exc_data[ORTH]] = [exc_data]
 
@@ -573,7 +573,7 @@ for h in range(1, 31 + 1):
     for period in ["."]:
         _exc[f"{h}{period}"] = [{ORTH: f"{h}."}]
 
-_custom_base_exc = {"i.": [{ORTH: "i", LEMMA: "i", NORM: "i"}, {ORTH: ".", TAG: PUNCT}]}
+_custom_base_exc = {"i.": [{ORTH: "i", LEMMA: "i", NORM: "i"}, {ORTH: "."}]}
 _exc.update(_custom_base_exc)
 
 TOKENIZER_EXCEPTIONS = _exc
@@ -1,5 +1,4 @@
 from .tokenizer_exceptions import TOKENIZER_EXCEPTIONS
-from .norm_exceptions import NORM_EXCEPTIONS
 from .punctuation import TOKENIZER_PREFIXES, TOKENIZER_SUFFIXES
 from .punctuation import TOKENIZER_INFIXES
 from .tag_map import TAG_MAP
@@ -7,18 +6,14 @@ from .stop_words import STOP_WORDS
 from .syntax_iterators import SYNTAX_ITERATORS
 
 from ..tokenizer_exceptions import BASE_EXCEPTIONS
-from ..norm_exceptions import BASE_NORMS
 from ...language import Language
-from ...attrs import LANG, NORM
-from ...util import update_exc, add_lookups
+from ...attrs import LANG
+from ...util import update_exc
 
 
 class GermanDefaults(Language.Defaults):
     lex_attr_getters = dict(Language.Defaults.lex_attr_getters)
     lex_attr_getters[LANG] = lambda text: "de"
-    lex_attr_getters[NORM] = add_lookups(
-        Language.Defaults.lex_attr_getters[NORM], NORM_EXCEPTIONS, BASE_NORMS
-    )
     tokenizer_exceptions = update_exc(BASE_EXCEPTIONS, TOKENIZER_EXCEPTIONS)
     prefixes = TOKENIZER_PREFIXES
     suffixes = TOKENIZER_SUFFIXES
@@ -1,13 +0,0 @@
-# Here we only want to include the absolute most common words. Otherwise,
-# this list would get impossibly long for German – especially considering the
-# old vs. new spelling rules, and all possible cases.
-
-
-_exc = {"daß": "dass"}
-
-
-NORM_EXCEPTIONS = {}
-
-for string, norm in _exc.items():
-    NORM_EXCEPTIONS[string] = norm
-    NORM_EXCEPTIONS[string.title()] = norm
@@ -1,7 +1,8 @@
 from ...symbols import NOUN, PROPN, PRON
+from ...errors import Errors
 
 
-def noun_chunks(obj):
+def noun_chunks(doclike):
     """
     Detect base noun phrases from a dependency parse. Works on both Doc and Span.
     """
@@ -24,13 +25,17 @@ def noun_chunks(obj):
         "og",
         "app",
     ]
-    doc = obj.doc  # Ensure works on both Doc and Span.
+    doc = doclike.doc  # Ensure works on both Doc and Span.
+
+    if not doc.is_parsed:
+        raise ValueError(Errors.E029)
+
     np_label = doc.vocab.strings.add("NP")
     np_deps = set(doc.vocab.strings.add(label) for label in labels)
     close_app = doc.vocab.strings.add("nk")
 
     rbracket = 0
-    for i, word in enumerate(obj):
+    for i, word in enumerate(doclike):
         if i < rbracket:
             continue
         if word.pos in (NOUN, PROPN, PRON) and word.dep in np_deps:
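The rename from `obj` to `doclike` documents that these iterators accept either a `Doc` or a `Span`. `doclike.doc` always reaches the parent `Doc`, which supplies the `is_parsed` check and the vocab, while iterating `doclike` visits only the tokens of the object that was passed in. The mock classes below are hypothetical stand-ins, not spaCy's types, and only make those two behaviours visible. Note that `word.i` stays a `Doc`-level index even inside a `Span`, which is what `rbracket` and `seen` compare against.

```python
class Token:
    def __init__(self, doc, i, text):
        self.doc, self.i, self.text = doc, i, text


class Doc:
    def __init__(self, words, is_parsed=True):
        self.is_parsed = is_parsed
        self.tokens = [Token(self, i, w) for i, w in enumerate(words)]

    @property
    def doc(self):
        # Mock assumption: a Doc returns itself, so doclike.doc always works.
        return self

    def __iter__(self):
        return iter(self.tokens)

    def __getitem__(self, key):
        if isinstance(key, slice):
            return Span(self, key.start, key.stop)
        return self.tokens[key]


class Span:
    def __init__(self, doc, start, end):
        self.doc, self.start, self.end = doc, start, end

    def __iter__(self):
        return iter(self.doc.tokens[self.start : self.end])


def tokens_with_doc_index(doclike):
    doc = doclike.doc  # parent Doc for both Doc and Span
    if not doc.is_parsed:
        # Stand-in for raise ValueError(Errors.E029)
        raise ValueError("noun_chunks requires a dependency parse")
    for word in doclike:  # only the tokens of the object passed in
        yield word.i, word.text


doc = Doc(["Der", "große", "Hund", "bellt"])
print(list(tokens_with_doc_index(doc[1:3])))  # [(1, 'große'), (2, 'Hund')]
```

Because the function is a generator, the `is_parsed` check only runs once iteration starts, not when the function is called. The same applies to any generator-based chunk iterator.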
@@ -6,21 +6,16 @@ from .lemmatizer import GreekLemmatizer
 from .syntax_iterators import SYNTAX_ITERATORS
 from .punctuation import TOKENIZER_PREFIXES, TOKENIZER_SUFFIXES, TOKENIZER_INFIXES
 from ..tokenizer_exceptions import BASE_EXCEPTIONS
-from .norm_exceptions import NORM_EXCEPTIONS
-from ..norm_exceptions import BASE_NORMS
 from ...language import Language
 from ...lookups import Lookups
-from ...attrs import LANG, NORM
-from ...util import update_exc, add_lookups
+from ...attrs import LANG
+from ...util import update_exc
 
 
 class GreekDefaults(Language.Defaults):
     lex_attr_getters = dict(Language.Defaults.lex_attr_getters)
     lex_attr_getters.update(LEX_ATTRS)
     lex_attr_getters[LANG] = lambda text: "el"
-    lex_attr_getters[NORM] = add_lookups(
-        Language.Defaults.lex_attr_getters[NORM], BASE_NORMS, NORM_EXCEPTIONS
-    )
     tokenizer_exceptions = update_exc(BASE_EXCEPTIONS, TOKENIZER_EXCEPTIONS)
     stop_words = STOP_WORDS
     tag_map = TAG_MAP
File diff suppressed because it is too large
@@ -1,7 +1,8 @@
 from ...symbols import NOUN, PROPN, PRON
+from ...errors import Errors
 
 
-def noun_chunks(obj):
+def noun_chunks(doclike):
     """
     Detect base noun phrases. Works on both Doc and Span.
     """
@@ -10,13 +11,17 @@ def noun_chunks(obj):
     # obj tag corrects some DEP tagger mistakes.
     # Further improvement of the models will eliminate the need for this tag.
     labels = ["nsubj", "obj", "iobj", "appos", "ROOT", "obl"]
-    doc = obj.doc  # Ensure works on both Doc and Span.
+    doc = doclike.doc  # Ensure works on both Doc and Span.
+
+    if not doc.is_parsed:
+        raise ValueError(Errors.E029)
+
     np_deps = [doc.vocab.strings.add(label) for label in labels]
     conj = doc.vocab.strings.add("conj")
     nmod = doc.vocab.strings.add("nmod")
     np_label = doc.vocab.strings.add("NP")
     seen = set()
-    for i, word in enumerate(obj):
+    for i, word in enumerate(doclike):
         if word.pos not in (NOUN, PROPN, PRON):
             continue
         # Prevent nested chunks from being produced
@@ -1,5 +1,4 @@
 from .tokenizer_exceptions import TOKENIZER_EXCEPTIONS
-from .norm_exceptions import NORM_EXCEPTIONS
 from .tag_map import TAG_MAP
 from .stop_words import STOP_WORDS
 from .lex_attrs import LEX_ATTRS
@@ -7,10 +6,9 @@ from .morph_rules import MORPH_RULES
 from .syntax_iterators import SYNTAX_ITERATORS
 
 from ..tokenizer_exceptions import BASE_EXCEPTIONS
-from ..norm_exceptions import BASE_NORMS
 from ...language import Language
-from ...attrs import LANG, NORM
-from ...util import update_exc, add_lookups
+from ...attrs import LANG
+from ...util import update_exc
 
 
 def _return_en(_):
@@ -21,9 +19,6 @@ class EnglishDefaults(Language.Defaults):
     lex_attr_getters = dict(Language.Defaults.lex_attr_getters)
     lex_attr_getters.update(LEX_ATTRS)
     lex_attr_getters[LANG] = _return_en
-    lex_attr_getters[NORM] = add_lookups(
-        Language.Defaults.lex_attr_getters[NORM], BASE_NORMS, NORM_EXCEPTIONS
-    )
     tokenizer_exceptions = update_exc(BASE_EXCEPTIONS, TOKENIZER_EXCEPTIONS)
     tag_map = TAG_MAP
     stop_words = STOP_WORDS
File diff suppressed because it is too large
@@ -1,7 +1,8 @@
 from ...symbols import NOUN, PROPN, PRON
+from ...errors import Errors
 
 
-def noun_chunks(obj):
+def noun_chunks(doclike):
     """
     Detect base noun phrases from a dependency parse. Works on both Doc and Span.
     """
@@ -16,12 +17,16 @@ def noun_chunks(obj):
         "attr",
         "ROOT",
     ]
-    doc = obj.doc  # Ensure works on both Doc and Span.
+    doc = doclike.doc  # Ensure works on both Doc and Span.
+
+    if not doc.is_parsed:
+        raise ValueError(Errors.E029)
+
     np_deps = [doc.vocab.strings.add(label) for label in labels]
     conj = doc.vocab.strings.add("conj")
     np_label = doc.vocab.strings.add("NP")
     seen = set()
-    for i, word in enumerate(obj):
+    for i, word in enumerate(doclike):
         if word.pos not in (NOUN, PROPN, PRON):
             continue
         # Prevent nested chunks from being produced
@@ -74,12 +74,12 @@ for pron in ["i", "you", "he", "she", "it", "we", "they"]:
 
         _exc[orth + "'d"] = [
             {ORTH: orth, LEMMA: PRON_LEMMA, NORM: pron, TAG: "PRP"},
-            {ORTH: "'d", LEMMA: "would", NORM: "would", TAG: "MD"},
+            {ORTH: "'d", NORM: "'d"},
         ]
 
         _exc[orth + "d"] = [
             {ORTH: orth, LEMMA: PRON_LEMMA, NORM: pron, TAG: "PRP"},
-            {ORTH: "d", LEMMA: "would", NORM: "would", TAG: "MD"},
+            {ORTH: "d", NORM: "'d"},
         ]
 
         _exc[orth + "'d've"] = [
@@ -192,7 +192,10 @@ for word in ["who", "what", "when", "where", "why", "how", "there", "that"]:
             {ORTH: "'d", NORM: "'d"},
         ]
 
-        _exc[orth + "d"] = [{ORTH: orth, LEMMA: word, NORM: word}, {ORTH: "d"}]
+        _exc[orth + "d"] = [
+            {ORTH: orth, LEMMA: word, NORM: word},
+            {ORTH: "d", NORM: "'d"},
+        ]
 
         _exc[orth + "'d've"] = [
             {ORTH: orth, LEMMA: word, NORM: word},
@@ -3,6 +3,7 @@ from .tag_map import TAG_MAP
 from .stop_words import STOP_WORDS
 from .lex_attrs import LEX_ATTRS
 from .syntax_iterators import SYNTAX_ITERATORS
+from .punctuation import TOKENIZER_INFIXES, TOKENIZER_SUFFIXES
 
 from ..tokenizer_exceptions import BASE_EXCEPTIONS
 from ..norm_exceptions import BASE_NORMS
@@ -20,6 +21,8 @@ class SpanishDefaults(Language.Defaults):
     )
     tokenizer_exceptions = update_exc(BASE_EXCEPTIONS, TOKENIZER_EXCEPTIONS)
     tag_map = TAG_MAP
+    infixes = TOKENIZER_INFIXES
+    suffixes = TOKENIZER_SUFFIXES
     stop_words = STOP_WORDS
     syntax_iterators = SYNTAX_ITERATORS
 
@@ -23,6 +23,15 @@ _num_words = [
     "dieciocho",
     "diecinueve",
     "veinte",
+    "veintiuno",
+    "veintidós",
+    "veintitrés",
+    "veinticuatro",
+    "veinticinco",
+    "veintiséis",
+    "veintisiete",
+    "veintiocho",
+    "veintinueve",
     "treinta",
     "cuarenta",
     "cincuenta",
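Spanish writes 21 to 29 as single words (veintiuno through veintinueve), while 31 and up are phrases such as "treinta y uno". Each of these compounds therefore needs its own `_num_words` entry to be recognised as a number word. The sketch below shows how such a list is typically consumed by a `like_num` lexical attribute. The function body is an assumption modelled on the common pattern in spaCy language modules, not a copy of the Spanish file, and the list is a hypothetical subset.

```python
# Hypothetical subset of the Spanish _num_words list from the diff.
_num_words = ["veinte", "veintiuno", "veintidós", "veintitrés", "treinta"]


def like_num(text):
    # Ignore a leading sign, then drop thousands/decimal separators.
    if text.startswith(("+", "-", "±", "~")):
        text = text[1:]
    text = text.replace(",", "").replace(".", "")
    if text.isdigit():
        return True
    # Simple fractions such as "3/4".
    if text.count("/") == 1:
        num, denom = text.split("/")
        if num.isdigit() and denom.isdigit():
            return True
    # Spelled-out numbers, compared in lowercase.
    return text.lower() in _num_words


print(like_num("Veintidós"), like_num("1.000,5"), like_num("perro"))  # True True False
```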
47
spacy/lang/es/punctuation.py
Normal file
@@ -0,0 +1,47 @@
+# coding: utf8
+from __future__ import unicode_literals
+
+from ..char_classes import LIST_PUNCT, LIST_ELLIPSES, LIST_QUOTES
+from ..char_classes import LIST_ICONS, CURRENCY, LIST_UNITS, PUNCT
+from ..char_classes import CONCAT_QUOTES, ALPHA_LOWER, ALPHA_UPPER, ALPHA
+from ..char_classes import merge_chars
+
+
+_list_units = [u for u in LIST_UNITS if u != "%"]
+_units = merge_chars(" ".join(_list_units))
+_concat_quotes = CONCAT_QUOTES + "—–"
+
+
+_suffixes = (
+    ["—", "–"]
+    + LIST_PUNCT
+    + LIST_ELLIPSES
+    + LIST_QUOTES
+    + LIST_ICONS
+    + [
+        r"(?<=[0-9])\+",
+        r"(?<=°[FfCcKk])\.",
+        r"(?<=[0-9])(?:{c})".format(c=CURRENCY),
+        r"(?<=[0-9])(?:{u})".format(u=_units),
+        r"(?<=[0-9{al}{e}{p}(?:{q})])\.".format(
+            al=ALPHA_LOWER, e=r"%²\-\+", q=_concat_quotes, p=PUNCT
+        ),
+        r"(?<=[{au}][{au}])\.".format(au=ALPHA_UPPER),
+    ]
+)
+
+_infixes = (
+    LIST_ELLIPSES
+    + LIST_ICONS
+    + [
+        r"(?<=[0-9])[+\*^](?=[0-9-])",
+        r"(?<=[{al}{q}])\.(?=[{au}{q}])".format(
+            al=ALPHA_LOWER, au=ALPHA_UPPER, q=_concat_quotes
+        ),
+        r"(?<=[{a}]),(?=[{a}])".format(a=ALPHA),
+        r"(?<=[{a}0-9])[:<>=/](?=[{a}])".format(a=ALPHA),
+    ]
+)
+
+TOKENIZER_SUFFIXES = _suffixes
+TOKENIZER_INFIXES = _infixes
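Suffix patterns like these are matched against the end of a token. The sketch below joins a few of the new rules into one end-anchored regex, in the style we understand spaCy's `compile_suffix_regex` to use. `CURRENCY`, `LIST_UNITS` and `ALPHA_UPPER` are tiny hypothetical stand-ins; the real `char_classes` values are far larger. The sketch also mirrors the `_list_units` filter. With `"%"` dropped from the unit list, the unit rule no longer splits it off, and in this reduced sketch nothing else does either.

```python
import re

# Hypothetical, heavily reduced stand-ins for spaCy's char_classes.
CURRENCY = r"\$|€|£"
LIST_UNITS = ["km", "kg", "cm", "m", "%"]
ALPHA_UPPER = "A-ZÁÉÍÓÚÑ"

# Same filter as the new file: "%" is not treated as a unit.
_list_units = [u for u in LIST_UNITS if u != "%"]
# Assumed equivalent of merge_chars(" ".join(_list_units)): an alternation.
_units = "|".join(_list_units)

suffixes = [
    r"(?<=[0-9])\+",
    r"(?<=[0-9])(?:{c})".format(c=CURRENCY),
    r"(?<=[0-9])(?:{u})".format(u=_units),
    r"(?<=[{au}][{au}])\.".format(au=ALPHA_UPPER),
]

# Anchor every alternative at the end of the string and OR them together.
suffix_re = re.compile("|".join(piece + "$" for piece in suffixes))


def split_suffix(text):
    # Return (stem, suffix); the suffix is "" when no rule matches.
    match = suffix_re.search(text)
    if match is None:
        return text, ""
    return text[: match.start()], text[match.start() :]


for word in ["10km", "50€", "3+", "ONU.", "50%"]:
    print(word, split_suffix(word))
```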
@@ -1,8 +1,13 @@
 from ...symbols import NOUN, PROPN, PRON, VERB, AUX
+from ...errors import Errors
 
 
-def noun_chunks(obj):
-    doc = obj.doc
+def noun_chunks(doclike):
+    doc = doclike.doc
+
+    if not doc.is_parsed:
+        raise ValueError(Errors.E029)
+
     if not len(doc):
         return
     np_label = doc.vocab.strings.add("NP")
@@ -13,7 +18,7 @@ def noun_chunks(obj):
     np_right_deps = [doc.vocab.strings.add(label) for label in right_labels]
     stop_deps = [doc.vocab.strings.add(label) for label in stop_labels]
     token = doc[0]
-    while token and token.i < len(doc):
+    while token and token.i < len(doclike):
         if token.pos in [PROPN, NOUN, PRON]:
             left, right = noun_bounds(
                 doc, token, np_left_deps, np_right_deps, stop_deps
@@ -39,14 +39,16 @@ for orth in [
    "Av.",
    "Avda.",
    "Cía.",
    "EE.UU.",
    "etc.",
    "fig.",
    "Gob.",
    "Gral.",
    "Ing.",
    "J.C.",
    "km/h",
    "Lic.",
    "m.n.",
    "no.",
    "núm.",
    "P.D.",
    "Prof.",
@@ -7,6 +7,7 @@ from .lex_attrs import LEX_ATTRS
 from .tokenizer_exceptions import TOKENIZER_EXCEPTIONS
 from .tag_map import TAG_MAP
 from .punctuation import TOKENIZER_SUFFIXES
+from .syntax_iterators import SYNTAX_ITERATORS


 class PersianDefaults(Language.Defaults):
@@ -21,6 +22,7 @@ class PersianDefaults(Language.Defaults):
     tag_map = TAG_MAP
     suffixes = TOKENIZER_SUFFIXES
     writing_system = {"direction": "rtl", "has_case": False, "has_letters": True}
+    syntax_iterators = SYNTAX_ITERATORS


 class Persian(Language):
@@ -1,7 +1,8 @@
 from ...symbols import NOUN, PROPN, PRON
+from ...errors import Errors


-def noun_chunks(obj):
+def noun_chunks(doclike):
     """
     Detect base noun phrases from a dependency parse. Works on both Doc and Span.
     """
@@ -16,12 +17,16 @@ def noun_chunks(obj):
         "attr",
         "ROOT",
     ]
-    doc = obj.doc  # Ensure works on both Doc and Span.
+    doc = doclike.doc  # Ensure works on both Doc and Span.
+
+    if not doc.is_parsed:
+        raise ValueError(Errors.E029)
+
     np_deps = [doc.vocab.strings.add(label) for label in labels]
     conj = doc.vocab.strings.add("conj")
     np_label = doc.vocab.strings.add("NP")
     seen = set()
-    for i, word in enumerate(obj):
+    for i, word in enumerate(doclike):
         if word.pos not in (NOUN, PROPN, PRON):
             continue
         # Prevent nested chunks from being produced
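The rename from `obj` to `doclike` reflects that these iterators accept either a `Doc` or a `Span`: `doclike.doc` always reaches the parent document (used for the `is_parsed` check and the string store), while iterating `doclike` itself keeps the search inside the span. The toy classes below are hypothetical stand-ins, not spaCy's `Doc` and `Span`, and the error text is invented; they only illustrate that access pattern.

```python
class ToyDoc:
    """Hypothetical stand-in for a parsed Doc: a list of (word, pos) pairs."""

    def __init__(self, tokens, is_parsed=True):
        self.tokens = tokens
        self.is_parsed = is_parsed

    @property
    def doc(self):
        return self  # a document's .doc is the document itself

    def __iter__(self):
        return iter(self.tokens)

    def span(self, start, end):
        return ToySpan(self, start, end)


class ToySpan:
    """Hypothetical stand-in for a Span: a window onto its parent ToyDoc."""

    def __init__(self, doc, start, end):
        self._doc, self.start, self.end = doc, start, end

    @property
    def doc(self):
        return self._doc  # a span's .doc is the parent document

    def __iter__(self):
        return iter(self._doc.tokens[self.start : self.end])


def nouns(doclike):
    doc = doclike.doc  # works for both ToyDoc and ToySpan
    if not doc.is_parsed:
        raise ValueError("document is not parsed")  # invented message
    # Iterate over doclike, not doc, so a span only yields its own tokens.
    return [word for word, pos in doclike if pos in ("NOUN", "PROPN", "PRON")]


doc = ToyDoc([("the", "DET"), ("cat", "NOUN"), ("saw", "VERB"), ("Ana", "PROPN")])
print(nouns(doc))              # ['cat', 'Ana']
print(nouns(doc.span(2, 4)))   # ['Ana']
```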
@@ -1,7 +1,8 @@
 from ...symbols import NOUN, PROPN, PRON
+from ...errors import Errors


-def noun_chunks(obj):
+def noun_chunks(doclike):
     """
     Detect base noun phrases from a dependency parse. Works on both Doc and Span.
     """
@@ -15,12 +16,16 @@ def noun_chunks(obj):
         "nmod",
         "nmod:poss",
     ]
-    doc = obj.doc  # Ensure works on both Doc and Span.
+    doc = doclike.doc  # Ensure works on both Doc and Span.
+
+    if not doc.is_parsed:
+        raise ValueError(Errors.E029)
+
     np_deps = [doc.vocab.strings[label] for label in labels]
     conj = doc.vocab.strings.add("conj")
     np_label = doc.vocab.strings.add("NP")
     seen = set()
-    for i, word in enumerate(obj):
+    for i, word in enumerate(doclike):
         if word.pos not in (NOUN, PROPN, PRON):
             continue
         # Prevent nested chunks from being produced
@@ -458,5 +458,5 @@ _regular_exp.append(URL_PATTERN)

 TOKENIZER_EXCEPTIONS = _exc
 TOKEN_MATCH = re.compile(
-    "|".join("(?:{})".format(m) for m in _regular_exp), re.IGNORECASE | re.UNICODE
+    "(?iu)" + "|".join("(?:{})".format(m) for m in _regular_exp)
 ).match
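The new `TOKEN_MATCH` moves the flags from the `re.compile` arguments into an inline `(?iu)` prefix. In Python's `re` module both forms compile to a case-insensitive Unicode pattern, but only the inline form keeps the flags inside the pattern string itself, so they survive if the expression is rebuilt from `.pattern` alone (a plausible motivation; the commit does not state one). Recent Python versions require such global inline flags at the very start of the pattern, which is where this change puts them. The two entries in `_regular_exp` below are hypothetical placeholders for the real URL and word patterns.

```python
import re

# Hypothetical stand-ins for the patterns collected in _regular_exp.
_regular_exp = [r"https?://\S+", r"[a-z]+-[a-z]+"]
joined = "|".join("(?:{})".format(m) for m in _regular_exp)

# Old style: flags passed as arguments to re.compile.
old = re.compile(joined, re.IGNORECASE | re.UNICODE)
# New style: the same flags embedded at the start of the pattern string.
new = re.compile("(?iu)" + joined)

# Rebuilding from the pattern string alone drops the argument flags,
# but keeps the inline ones.
rebuilt_old = re.compile(old.pattern)
rebuilt_new = re.compile(new.pattern)

print(bool(rebuilt_old.match("HTTP://EXAMPLE.COM")))  # False
print(bool(rebuilt_new.match("HTTP://EXAMPLE.COM")))  # True
```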
18
spacy/lang/gu/__init__.py
Normal file
@@ -0,0 +1,18 @@
# coding: utf8
from __future__ import unicode_literals

from .stop_words import STOP_WORDS

from ...language import Language


class GujaratiDefaults(Language.Defaults):
    stop_words = STOP_WORDS


class Gujarati(Language):
    lang = "gu"
    Defaults = GujaratiDefaults


__all__ = ["Gujarati"]
22
spacy/lang/gu/examples.py
Normal file
@@ -0,0 +1,22 @@
# coding: utf8
from __future__ import unicode_literals


"""
Example sentences to test spaCy and its language models.

>>> from spacy.lang.gu.examples import sentences
>>> docs = nlp.pipe(sentences)
"""


sentences = [
    "લોકશાહી એ સરકારનું એક એવું તંત્ર છે જ્યાં નાગરિકો મત દ્વારા સત્તાનો ઉપયોગ કરે છે.",
    "તે ગુજરાત રાજ્યના ધરમપુર શહેરમાં આવેલું હતું",
    "કર્ણદેવ પહેલો સોલંકી વંશનો રાજા હતો",
    "તેજપાળને બે પત્ની હતી",
    "ગુજરાતમાં ભારતીય જનતા પક્ષનો ઉદય આ સમયગાળા દરમિયાન થયો",
    "આંદોલનકારીઓએ ચીમનભાઇ પટેલના રાજીનામાની માંગણી કરી.",
    "અહિયાં શું જોડાય છે?",
    "મંદિરનો પૂર્વાભિમુખ ભાગ નાના મંડપ સાથે થોડો લંબચોરસ આકારનો છે.",
]
91
spacy/lang/gu/stop_words.py
Normal file
@@ -0,0 +1,91 @@
# coding: utf8
from __future__ import unicode_literals

STOP_WORDS = set(
    """
એમ
આ
એ
રહી
છે
છો
હતા
હતું
હતી
હોય
હતો
શકે
તે
તેના
તેનું
તેને
તેની
તેઓ
તેમને
તેમના
તેમણે
તેમનું
તેમાં
અને
અહીં
થી
થઈ
થાય
જે
ને
કે
ના
ની
નો
ને
નું
શું
માં
પણ
પર
જેવા
જેવું
જાય
જેમ
જેથી
માત્ર
માટે
પરથી
આવ્યું
એવી
આવી
રીતે
સુધી
થાય
થઈ
સાથે
લાગે
હોવા
છતાં
રહેલા
કરી
કરે
કેટલા
કોઈ
કેમ
કર્યો
કર્યુ
કરે
સૌથી
ત્યારબાદ
તથા
દ્વારા
જુઓ
જાઓ
જ્યારે
ત્યારે
શકો
નથી
હવે
અથવા
થતો
દર
એટલો
પરંતુ
""".split()
)
26
spacy/lang/hy/__init__.py
Normal file
@@ -0,0 +1,26 @@
# coding: utf8
from __future__ import unicode_literals

from .stop_words import STOP_WORDS
from .lex_attrs import LEX_ATTRS
from .tag_map import TAG_MAP

from ...attrs import LANG
from ...language import Language


class ArmenianDefaults(Language.Defaults):
    lex_attr_getters = dict(Language.Defaults.lex_attr_getters)
    lex_attr_getters[LANG] = lambda text: "hy"

    lex_attr_getters.update(LEX_ATTRS)
    stop_words = STOP_WORDS
    tag_map = TAG_MAP


class Armenian(Language):
    lang = "hy"
    Defaults = ArmenianDefaults


__all__ = ["Armenian"]
16
spacy/lang/hy/examples.py
Normal file
@@ -0,0 +1,16 @@
# coding: utf8
from __future__ import unicode_literals

"""
Example sentences to test spaCy and its language models.
>>> from spacy.lang.hy.examples import sentences
>>> docs = nlp.pipe(sentences)
"""


sentences = [
    "Լոնդոնը Միացյալ Թագավորության մեծ քաղաք է։",
    "Ո՞վ է Ֆրանսիայի նախագահը։",
    "Որն է Միացյալ Նահանգների մայրաքաղաքը։",
    "Ե՞րբ է ծնվել Բարաք Օբաման։",
]
59
spacy/lang/hy/lex_attrs.py
Normal file
@@ -0,0 +1,59 @@
# coding: utf8
from __future__ import unicode_literals

from ...attrs import LIKE_NUM


_num_words = [
    "զրօ",
    "մէկ",
    "երկու",
    "երեք",
    "չորս",
    "հինգ",
    "վեց",
    "յոթ",
    "ութ",
    "ինը",
    "տասը",
    "տասնմեկ",
    "տասներկու",
    "տասներեք",
    "տասնչորս",
    "տասնհինգ",
    "տասնվեց",
    "տասնյոթ",
    "տասնութ",
    "տասնինը",
    "քսան", "երեսուն",
    "քառասուն",
    "հիսուն",
    "վաթսուն",
    "յոթանասուն",
    "ութսուն",
    "իննսուն",
    "հարյուր",
    "հազար",
    "միլիոն",
    "միլիարդ",
    "տրիլիոն",
    "քվինտիլիոն",
]


def like_num(text):
    if text.startswith(("+", "-", "±", "~")):
        text = text[1:]
    text = text.replace(",", "").replace(".", "")
    if text.isdigit():
        return True
    if text.count("/") == 1:
        num, denom = text.split("/")
        if num.isdigit() and denom.isdigit():
            return True
    if text.lower() in _num_words:
        return True
    return False


LEX_ATTRS = {LIKE_NUM: like_num}
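Two details of the Armenian `like_num` are easy to check in isolation. Python joins adjacent string literals at compile time, so a list entry written without a separating comma, such as `"քսան" "երեսուն"`, becomes a single concatenated word and neither number matches on its own. The sketch below copies the `like_num` logic verbatim but uses a hypothetical three-word `_num_words` list.

```python
# Hypothetical three-word excerpt standing in for the full _num_words list.
_num_words = ["քսան", "երեսուն", "հազար"]


def like_num(text):
    # Same checks as the Armenian like_num above.
    if text.startswith(("+", "-", "±", "~")):
        text = text[1:]
    text = text.replace(",", "").replace(".", "")
    if text.isdigit():
        return True
    if text.count("/") == 1:
        num, denom = text.split("/")
        if num.isdigit() and denom.isdigit():
            return True
    return text.lower() in _num_words


# Adjacent literals without a comma collapse into one list item.
broken = ["քսան" "երեսուն"]

print(like_num("1,000"), like_num("-3/4"), like_num("երեսուն"))  # True True True
print(len(broken))  # 1
```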
110
spacy/lang/hy/stop_words.py
Normal file
@@ -0,0 +1,110 @@
# coding: utf8
from __future__ import unicode_literals

STOP_WORDS = set(
    """
նա
ողջը
այստեղ
ենք
նա
էիր
որպես
ուրիշ
բոլորը
այն
այլ
նույնչափ
էի
մի
և
ողջ
ես
ոմն
հետ
նրանք
ամենքը
ըստ
ինչ-ինչ
այսպես
համայն
մի
նաև
նույնքան
դա
ովևէ
համար
այնտեղ
էին
որոնք
սույն
ինչ-որ
ամենը
նույնպիսի
ու
իր
որոշ
միևնույն
ի
այնպիսի
մենք
ամեն ոք
նույն
երբևէ
այն
որևէ
ին
այդպես
նրա
որը
վրա
դու
էինք
այդպիսի
էիք
յուրաքանչյուրը
եմ
պիտի
այդ
ամբողջը
հետո
եք
ամեն
այլ
կամ
այսքան
որ
այնպես
այսինչ
բոլոր
է
մեկնումեկը
այդչափ
այնքան
ամբողջ
երբևիցե
այնչափ
ամենայն
մյուս
այնինչ
իսկ
այդտեղ
այս
սա
են
ամեն ինչ
որևիցե
ում
մեկը
այդ
դուք
այսչափ
այդքան
այսպիսի
էր
յուրաքանչյուր
այս
մեջ
թ
""".split()
)
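Because these stop-word modules build the set with `str.split()`, the multi-word entries `ամեն ոք` and `ամեն ինչ` are stored as the separate words `ամեն`, `ոք` and `ինչ`, and repeated entries such as `նա` or `մի` simply collapse into one set member. A trimmed reproduction with a hypothetical four-line excerpt:

```python
# Hypothetical excerpt of the Armenian list, built the same way.
STOP_WORDS = set(
    """
նա
ամեն ոք
ամեն ինչ
նա
""".split()
)

print(sorted(STOP_WORDS))
print("ամեն ոք" in STOP_WORDS)  # False: the phrase was split on whitespace
```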
2478
spacy/lang/hy/tag_map.py
Normal file
File diff suppressed because it is too large
@@ -1,25 +1,20 @@
 from .stop_words import STOP_WORDS
 from .punctuation import TOKENIZER_SUFFIXES, TOKENIZER_PREFIXES, TOKENIZER_INFIXES
 from .tokenizer_exceptions import TOKENIZER_EXCEPTIONS
-from .norm_exceptions import NORM_EXCEPTIONS
 from .lex_attrs import LEX_ATTRS
 from .syntax_iterators import SYNTAX_ITERATORS
 from .tag_map import TAG_MAP

 from ..tokenizer_exceptions import BASE_EXCEPTIONS
-from ..norm_exceptions import BASE_NORMS
 from ...language import Language
-from ...attrs import LANG, NORM
-from ...util import update_exc, add_lookups
+from ...attrs import LANG
+from ...util import update_exc


 class IndonesianDefaults(Language.Defaults):
     lex_attr_getters = dict(Language.Defaults.lex_attr_getters)
     lex_attr_getters[LANG] = lambda text: "id"
     lex_attr_getters.update(LEX_ATTRS)
-    lex_attr_getters[NORM] = add_lookups(
-        Language.Defaults.lex_attr_getters[NORM], BASE_NORMS, NORM_EXCEPTIONS
-    )
     tokenizer_exceptions = update_exc(BASE_EXCEPTIONS, TOKENIZER_EXCEPTIONS)
     stop_words = STOP_WORDS
     prefixes = TOKENIZER_PREFIXES
@@ -1,529 +0,0 @@
# Daftar kosakata yang sering salah dieja
# https://id.wikipedia.org/wiki/Wikipedia:Daftar_kosakata_bahasa_Indonesia_yang_sering_salah_dieja
_exc = {
    # Slang and abbreviations
    "silahkan": "silakan",
    "yg": "yang",
    "kalo": "kalau",
    "cawu": "caturwulan",
    "ok": "oke",
    "gak": "tidak",
    "enggak": "tidak",
    "nggak": "tidak",
    "ndak": "tidak",
    "ngga": "tidak",
    "dgn": "dengan",
    "tdk": "tidak",
    "jg": "juga",
    "klo": "kalau",
    "denger": "dengar",
    "pinter": "pintar",
    "krn": "karena",
    "nemuin": "menemukan",
    "jgn": "jangan",
    "udah": "sudah",
    "sy": "saya",
    "udh": "sudah",
    "dapetin": "mendapatkan",
    "ngelakuin": "melakukan",
    "ngebuat": "membuat",
    "membikin": "membuat",
    "bikin": "buat",
    # Daftar kosakata yang sering salah dieja
    "malpraktik": "malapraktik",
    "malfungsi": "malafungsi",
    "malserap": "malaserap",
    "maladaptasi": "malaadaptasi",
    "malsuai": "malasuai",
    "maldistribusi": "maladistribusi",
    "malgizi": "malagizi",
    "malsikap": "malasikap",
    "memperhatikan": "memerhatikan",
    "akte": "akta",
    "cemilan": "camilan",
    "esei": "esai",
    "frase": "frasa",
    "kafeteria": "kafetaria",
    "ketapel": "katapel",
    "kenderaan": "kendaraan",
    "menejemen": "manajemen",
    "menejer": "manajer",
    "mesjid": "masjid",
    "rebo": "rabu",
    "seksama": "saksama",
    "senggama": "sanggama",
    "sekedar": "sekadar",
    "seprei": "seprai",
    "semedi": "semadi",
    "samadi": "semadi",
    "amandemen": "amendemen",
    "algoritma": "algoritme",
    "aritmatika": "aritmetika",
    "metoda": "metode",
    "materai": "meterai",
    "meterei": "meterai",
    "kalendar": "kalender",
    "kadaluwarsa": "kedaluwarsa",
    "katagori": "kategori",
    "parlamen": "parlemen",
    "sekular": "sekuler",
    "selular": "seluler",
    "sirkular": "sirkuler",
    "survai": "survei",
    "survey": "survei",
    "aktuil": "aktual",
    "formil": "formal",
    "trotoir": "trotoar",
    "komersiil": "komersial",
    "komersil": "komersial",
    "tradisionil": "tradisionial",
    "orisinil": "orisinal",
    "orijinil": "orisinal",
    "afdol": "afdal",
    "antri": "antre",
    "apotik": "apotek",
    "atlit": "atlet",
    "atmosfir": "atmosfer",
    "cidera": "cedera",
    "cendikiawan": "cendekiawan",
    "cepet": "cepat",
    "cinderamata": "cenderamata",
    "debet": "debit",
    "difinisi": "definisi",
    "dekrit": "dekret",
    "disain": "desain",
    "diskripsi": "deskripsi",
    "diskotik": "diskotek",
    "eksim": "eksem",
    "exim": "eksem",
    "faidah": "faedah",
    "ekstrim": "ekstrem",
    "ekstrimis": "ekstremis",
    "komplit": "komplet",
    "konkrit": "konkret",
    "kongkrit": "konkret",
    "kongkret": "konkret",
    "kridit": "kredit",
    "musium": "museum",
    "pinalti": "penalti",
    "piranti": "peranti",
    "pinsil": "pensil",
    "personil": "personel",
    "sistim": "sistem",
    "teoritis": "teoretis",
    "vidio": "video",
    "cengkeh": "cengkih",
    "desertasi": "disertasi",
    "hakekat": "hakikat",
    "intelejen": "intelijen",
    "kaedah": "kaidah",
    "kempes": "kempis",
    "kementrian": "kementerian",
    "ledeng": "leding",
    "nasehat": "nasihat",
    "penasehat": "penasihat",
    "praktek": "praktik",
    "praktekum": "praktikum",
    "resiko": "risiko",
    "retsleting": "ritsleting",
    "senen": "senin",
    "amuba": "ameba",
    "punggawa": "penggawa",
    "surban": "serban",
    "nomer": "nomor",
    "sorban": "serban",
    "bis": "bus",
    "agribisnis": "agrobisnis",
    "kantung": "kantong",
    "khutbah": "khotbah",
    "mandur": "mandor",
    "rubuh": "roboh",
    "pastur": "pastor",
    "supir": "sopir",
    "goncang": "guncang",
    "goa": "gua",
    "kaos": "kaus",
    "kokoh": "kukuh",
    "komulatif": "kumulatif",
    "kolomnis": "kolumnis",
    "korma": "kurma",
    "lobang": "lubang",
    "limo": "limusin",
    "limosin": "limusin",
    "mangkok": "mangkuk",
    "saos": "saus",
    "sop": "sup",
    "sorga": "surga",
    "tegor": "tegur",
    "telor": "telur",
    "obrak-abrik": "ubrak-abrik",
    "ekwivalen": "ekuivalen",
    "frekwensi": "frekuensi",
    "konsekwensi": "konsekuensi",
    "kwadran": "kuadran",
    "kwadrat": "kuadrat",
    "kwalifikasi": "kualifikasi",
    "kwalitas": "kualitas",
    "kwalitet": "kualitas",
    "kwalitatif": "kualitatif",
    "kwantitas": "kuantitas",
    "kwantitatif": "kuantitatif",
    "kwantum": "kuantum",
    "kwartal": "kuartal",
    "kwintal": "kuintal",
    "kwitansi": "kuitansi",
    "kwatir": "khawatir",
    "kuatir": "khawatir",
    "jadual": "jadwal",
    "hirarki": "hierarki",
    "karir": "karier",
    "aktip": "aktif",
    "daptar": "daftar",
    "efektip": "efektif",
    "epektif": "efektif",
    "epektip": "efektif",
    "Pebruari": "Februari",
    "pisik": "fisik",
    "pondasi": "fondasi",
    "photo": "foto",
    "photokopi": "fotokopi",
    "hapal": "hafal",
    "insap": "insaf",
    "insyaf": "insaf",
    "konperensi": "konferensi",
    "kreatip": "kreatif",
    "kreativ": "kreatif",
    "maap": "maaf",
    "napsu": "nafsu",
    "negatip": "negatif",
    "negativ": "negatif",
    "objektip": "objektif",
    "obyektip": "objektif",
    "obyektif": "objektif",
    "pasip": "pasif",
    "pasiv": "pasif",
    "positip": "positif",
    "positiv": "positif",
    "produktip": "produktif",
    "produktiv": "produktif",
    "sarap": "saraf",
    "sertipikat": "sertifikat",
    "subjektip": "subjektif",
    "subyektip": "subjektif",
    "subyektif": "subjektif",
    "tarip": "tarif",
    "transitip": "transitif",
    "transitiv": "transitif",
    "faham": "paham",
    "fikir": "pikir",
    "berfikir": "berpikir",
    "telefon": "telepon",
    "telfon": "telepon",
    "telpon": "telepon",
    "tilpon": "telepon",
    "nafas": "napas",
    "bernafas": "bernapas",
    "pernafasan": "pernapasan",
    "vermak": "permak",
    "vulpen": "pulpen",
    "aktifis": "aktivis",
    "konfeksi": "konveksi",
    "motifasi": "motivasi",
    "Nopember": "November",
    "propinsi": "provinsi",
    "babtis": "baptis",
    "jerembab": "jerembap",
    "lembab": "lembap",
    "sembab": "sembap",
    "saptu": "sabtu",
    "tekat": "tekad",
    "bejad": "bejat",
    "nekad": "nekat",
    "otoped": "otopet",
    "skuad": "skuat",
    "jenius": "genius",
    "marjin": "margin",
    "marjinal": "marginal",
    "obyek": "objek",
    "subyek": "subjek",
    "projek": "proyek",
    "azas": "asas",
    "ijasah": "ijazah",
    "jenasah": "jenazah",
    "plasa": "plaza",
    "bathin": "batin",
    "Katholik": "Katolik",
    "orthografi": "ortografi",
    "pathogen": "patogen",
    "theologi": "teologi",
    "ijin": "izin",
    "rejeki": "rezeki",
    "rejim": "rezim",
    "jaman": "zaman",
    "jamrud": "zamrud",
    "jinah": "zina",
    "perjinahan": "perzinaan",
    "anugrah": "anugerah",
    "cendrawasih": "cenderawasih",
    "jendral": "jenderal",
    "kripik": "keripik",
    "krupuk": "kerupuk",
    "ksatria": "kesatria",
    "mentri": "menteri",
    "negri": "negeri",
    "Prancis": "Perancis",
    "sebrang": "seberang",
    "menyebrang": "menyeberang",
    "Sumatra": "Sumatera",
    "trampil": "terampil",
    "isteri": "istri",
    "justeru": "justru",
    "perajurit": "prajurit",
    "putera": "putra",
    "puteri": "putri",
    "samudera": "samudra",
    "sastera": "sastra",
    "sutera": "sutra",
    "terompet": "trompet",
    "iklas": "ikhlas",
    "iktisar": "ikhtisar",
    "kafilah": "khafilah",
    "kawatir": "khawatir",
    "kotbah": "khotbah",
    "kusyuk": "khusyuk",
    "makluk": "makhluk",
    "mahluk": "makhluk",
    "mahkluk": "makhluk",
    "nahkoda": "nakhoda",
    "nakoda": "nakhoda",
    "tahta": "takhta",
    "takhyul": "takhayul",
    "tahyul": "takhayul",
    "tahayul": "takhayul",
    "akhli": "ahli",
    "anarkhi": "anarki",
    "kharisma": "karisma",
    "kharismatik": "karismatik",
    "mahsud": "maksud",
    "makhsud": "maksud",
    "rakhmat": "rahmat",
    "tekhnik": "teknik",
    "tehnik": "teknik",
    "tehnologi": "teknologi",
    "ikhwal": "ihwal",
    "expor": "ekspor",
    "extra": "ekstra",
    "komplex": "komplek",
    "sex": "seks",
    "taxi": "taksi",
    "extasi": "ekstasi",
    "syaraf": "saraf",
    "syurga": "surga",
    "mashur": "masyhur",
    "masyur": "masyhur",
    "mahsyur": "masyhur",
    "mashyur": "masyhur",
    "muadzin": "muazin",
    "adzan": "azan",
    "ustadz": "ustaz",
    "ustad": "ustaz",
    "ustadzah": "ustaz",
    "dzikir": "zikir",
    "dzuhur": "zuhur",
    "dhuhur": "zuhur",
    "zhuhur": "zuhur",
    "analisa": "analisis",
    "diagnosa": "diagnosis",
    "hipotesa": "hipotesis",
    "sintesa": "sintesis",
    "aktiviti": "aktivitas",
    "aktifitas": "aktivitas",
    "efektifitas": "efektivitas",
    "komuniti": "komunitas",
    "kreatifitas": "kreativitas",
    "produktifitas": "produktivitas",
    "realiti": "realitas",
    "realita": "realitas",
    "selebriti": "selebritas",
    "spotifitas": "sportivitas",
    "universiti": "universitas",
    "utiliti": "utilitas",
    "validiti": "validitas",
    "dilokalisir": "dilokalisasi",
    "didramatisir": "didramatisasi",
    "dipolitisir": "dipolitisasi",
    "dinetralisir": "dinetralisasi",
    "dikonfrontir": "dikonfrontasi",
    "mendominir": "mendominasi",
    "koordinir": "koordinasi",
    "proklamir": "proklamasi",
    "terorganisir": "terorganisasi",
    "terealisir": "terealisasi",
    "robah": "ubah",
    "dirubah": "diubah",
    "merubah": "mengubah",
    "terlanjur": "telanjur",
    "terlantar": "telantar",
    "penglepasan": "pelepasan",
    "pelihatan": "penglihatan",
    "pemukiman": "permukiman",
    "pengrumahan": "perumahan",
    "penyewaan": "persewaan",
    "menyintai": "mencintai",
    "menyolok": "mencolok",
    "contek": "sontek",
    "mencontek": "menyontek",
    "pungkir": "mungkir",
    "dipungkiri": "dimungkiri",
    "kupungkiri": "kumungkiri",
    "kaupungkiri": "kaumungkiri",
    "nampak": "tampak",
    "nampaknya": "tampaknya",
    "nongkrong": "tongkrong",
    "berternak": "beternak",
    "berterbangan": "beterbangan",
    "berserta": "beserta",
    "berperkara": "beperkara",
    "berpergian": "bepergian",
    "berkerja": "bekerja",
    "berberapa": "beberapa",
    "terbersit": "tebersit",
    "terpercaya": "tepercaya",
    "terperdaya": "teperdaya",
    "terpercik": "tepercik",
    "terpergok": "tepergok",
    "aksesoris": "aksesori",
    "handal": "andal",
    "hantar": "antar",
    "panutan": "anutan",
    "atsiri": "asiri",
    "bhakti": "bakti",
    "china": "cina",
    "dharma": "darma",
    "diktaktor": "diktator",
    "eksport": "ekspor",
    "hembus": "embus",
    "hadits": "hadis",
    "hadist": "hadits",
    "harafiah": "harfiah",
    "himbau": "imbau",
    "import": "impor",
    "inget": "ingat",
    "hisap": "isap",
    "interprestasi": "interpretasi",
    "kangker": "kanker",
    "konggres": "kongres",
    "lansekap": "lanskap",
    "maghrib": "magrib",
    "emak": "mak",
    "moderen": "modern",
    "pasport": "paspor",
    "perduli": "peduli",
    "ramadhan": "ramadan",
    "rapih": "rapi",
    "Sansekerta": "Sanskerta",
    "shalat": "salat",
    "sholat": "salat",
    "silahkan": "silakan",
    "standard": "standar",
    "hutang": "utang",
    "zinah": "zina",
    "ambulan": "ambulans",
    "antartika": "sntarktika",
    "arteri": "arteria",
    "asik": "asyik",
    "australi": "australia",
    "denga": "dengan",
    "depo": "depot",
    "detil": "detail",
    "ensiklopedi": "ensiklopedia",
    "elit": "elite",
    "frustasi": "frustrasi",
    "gladi": "geladi",
    "greget": "gereget",
    "itali": "italia",
    "karna": "karena",
    "klenteng": "kelenteng",
    "erling": "kerling",
    "kontruksi": "konstruksi",
    "masal": "massal",
    "merk": "merek",
    "respon": "respons",
    "diresponi": "direspons",
    "skak": "sekak",
    "stir": "setir",
    "singapur": "singapura",
    "standarisasi": "standardisasi",
    "varitas": "varietas",
    "amphibi": "amfibi",
    "anjlog": "anjlok",
    "alpukat": "avokad",
    "alpokat": "avokad",
    "bolpen": "pulpen",
    "cabe": "cabai",
    "cabay": "cabai",
    "ceret": "cerek",
    "differensial": "diferensial",
    "duren": "durian",
    "faksimili": "faksimile",
    "faksimil": "faksimile",
    "graha": "gerha",
    "goblog": "goblok",
    "gombrong": "gombroh",
    "horden": "gorden",
    "korden": "gorden",
    "gubug": "gubuk",
    "imaginasi": "imajinasi",
    "jerigen": "jeriken",
    "jirigen": "jeriken",
    "carut-marut": "karut-marut",
    "kwota": "kuota",
    "mahzab": "mazhab",
    "mempesona": "memesona",
    "milyar": "miliar",
    "missi": "misi",
    "nenas": "nanas",
    "negoisasi": "negosiasi",
    "automotif": "otomotif",
    "pararel": "paralel",
    "paska": "pasca",
    "prosen": "persen",
    "pete": "petai",
    "petay": "petai",
    "proffesor": "profesor",
    "rame": "ramai",
    "rapot": "rapor",
    "rileks": "relaks",
    "rileksasi": "relaksasi",
    "renumerasi": "remunerasi",
    "seketaris": "sekretaris",
    "sekertaris": "sekretaris",
    "sensorik": "sensoris",
    "sentausa": "sentosa",
    "strawberi": "stroberi",
    "strawbery": "stroberi",
    "taqwa": "takwa",
    "tauco": "taoco",
    "tauge": "taoge",
    "toge": "taoge",
    "tauladan": "teladan",
    "taubat": "tobat",
    "trilyun": "triliun",
    "vissi": "visi",
    "coklat": "cokelat",
    "narkotika": "narkotik",
    "oase": "oasis",
    "politisi": "politikus",
    "terong": "terung",
    "wool": "wol",
    "himpit": "impit",
    "mujizat": "mukjizat",
    "mujijat": "mukjizat",
    "yag": "yang",
}

NORM_EXCEPTIONS = {}

for string, norm in _exc.items():
    NORM_EXCEPTIONS[string] = norm
    NORM_EXCEPTIONS[string.title()] = norm
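Before this commit, `IndonesianDefaults` chained the deleted table with `BASE_NORMS` through `add_lookups`, and `LuxembourgishDefaults` did the same with the two tables in the opposite order. The stand-alone sketch below re-implements that lookup chain for illustration only: it is modeled on the behavior of `spacy.util.add_lookups` (the first table containing the string wins, otherwise the default function is applied), not imported from it. The two-entry `_exc` excerpt and the `base` table are hypothetical.

```python
import functools


def _get_attr_unless_lookup(default_func, lookups, string):
    # Return the value from the first table that contains the string,
    # otherwise fall back to the default attribute function.
    for lookup in lookups:
        if string in lookup:
            return lookup[string]
    return default_func(string)


def add_lookups(default_func, *lookups):
    # Drop empty tables, then bind the remaining ones in priority order.
    lookups = [lookup for lookup in lookups if lookup]
    return functools.partial(_get_attr_unless_lookup, default_func, lookups)


# Hypothetical excerpt, expanded with title-cased keys like the deleted module.
_exc = {"yg": "yang", "gak": "tidak"}
NORM_EXCEPTIONS = {}
for string, norm in _exc.items():
    NORM_EXCEPTIONS[string] = norm
    NORM_EXCEPTIONS[string.title()] = norm

base = {"gak": "nggak"}  # hypothetical competing entry in a base table
get_norm = add_lookups(str.lower, NORM_EXCEPTIONS, base)
base_first = add_lookups(str.lower, base, NORM_EXCEPTIONS)

print(get_norm("Yg"), get_norm("YG"))          # yang yg
print(get_norm("gak"), base_first("gak"))      # tidak nggak
```

Only the lowercase and title-cased spellings are covered, so an all-caps form like "YG" falls through to the default `str.lower`, and whichever table is listed first wins when both contain the same key.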
@@ -1,7 +1,8 @@
 from ...symbols import NOUN, PROPN, PRON
+from ...errors import Errors


-def noun_chunks(obj):
+def noun_chunks(doclike):
     """
     Detect base noun phrases from a dependency parse. Works on both Doc and Span.
     """
@@ -15,12 +16,16 @@ def noun_chunks(obj):
         "nmod",
         "nmod:poss",
     ]
-    doc = obj.doc  # Ensure works on both Doc and Span.
+    doc = doclike.doc  # Ensure works on both Doc and Span.
+
+    if not doc.is_parsed:
+        raise ValueError(Errors.E029)
+
     np_deps = [doc.vocab.strings[label] for label in labels]
     conj = doc.vocab.strings.add("conj")
     np_label = doc.vocab.strings.add("NP")
     seen = set()
-    for i, word in enumerate(obj):
+    for i, word in enumerate(doclike):
         if word.pos not in (NOUN, PROPN, PRON):
             continue
         # Prevent nested chunks from being produced
22
spacy/lang/kn/examples.py
Normal file
@@ -0,0 +1,22 @@
# coding: utf8
from __future__ import unicode_literals


"""
Example sentences to test spaCy and its language models.

>>> from spacy.lang.kn.examples import sentences
>>> docs = nlp.pipe(sentences)
"""


sentences = [
    "ಆಪಲ್ ಒಂದು ಯು.ಕೆ. ಸ್ಟಾರ್ಟ್ಅಪ್ ಅನ್ನು ೧ ಶತಕೋಟಿ ಡಾಲರ್ಗಳಿಗೆ ಖರೀದಿಸಲು ನೋಡುತ್ತಿದೆ.",
    "ಸ್ವಾಯತ್ತ ಕಾರುಗಳು ವಿಮಾ ಹೊಣೆಗಾರಿಕೆಯನ್ನು ತಯಾರಕರ ಕಡೆಗೆ ಬದಲಾಯಿಸುತ್ತವೆ.",
    "ಕಾಲುದಾರಿ ವಿತರಣಾ ರೋಬೋಟ್ಗಳನ್ನು ನಿಷೇಧಿಸುವುದನ್ನು ಸ್ಯಾನ್ ಫ್ರಾನ್ಸಿಸ್ಕೊ ಪರಿಗಣಿಸುತ್ತದೆ.",
    "ಲಂಡನ್ ಯುನೈಟೆಡ್ ಕಿಂಗ್ಡಂನ ದೊಡ್ಡ ನಗರ.",
    "ನೀನು ಎಲ್ಲಿದಿಯಾ?",
    "ಫ್ರಾನ್ಸಾದ ಅಧ್ಯಕ್ಷರು ಯಾರು?",
    "ಯುನೈಟೆಡ್ ಸ್ಟೇಟ್ಸ್ನ ರಾಜಧಾನಿ ಯಾವುದು?",
    "ಬರಾಕ್ ಒಬಾಮ ಯಾವಾಗ ಜನಿಸಿದರು?",
]
@@ -6,8 +6,8 @@ Example sentences to test spaCy and its language models.
 """

 sentences = [
-    "애플이 영국의 신생 기업을 10억 달러에 구매를 고려중이다.",
-    "자동 운전 자동차의 손해 배상 책임에 자동차 메이커에 일정한 부담을 요구하겠다.",
-    "자동 배달 로봇이 보도를 주행하는 것을 샌프란시스코시가 금지를 검토중이라고 합니다.",
+    "애플이 영국의 스타트업을 10억 달러에 인수하는 것을 알아보고 있다.",
+    "자율주행 자동차의 손해 배상 책임이 제조 업체로 옮겨 가다",
+    "샌프란시스코 시가 자동 배달 로봇의 보도 주행 금지를 검토 중이라고 합니다.",
     "런던은 영국의 수도이자 가장 큰 도시입니다.",
 ]
@@ -1,24 +1,19 @@
 from .tokenizer_exceptions import TOKENIZER_EXCEPTIONS
-from .norm_exceptions import NORM_EXCEPTIONS
 from .punctuation import TOKENIZER_INFIXES
 from .lex_attrs import LEX_ATTRS
 from .tag_map import TAG_MAP
 from .stop_words import STOP_WORDS

 from ..tokenizer_exceptions import BASE_EXCEPTIONS
-from ..norm_exceptions import BASE_NORMS
 from ...language import Language
-from ...attrs import LANG, NORM
-from ...util import update_exc, add_lookups
+from ...attrs import LANG
+from ...util import update_exc


 class LuxembourgishDefaults(Language.Defaults):
     lex_attr_getters = dict(Language.Defaults.lex_attr_getters)
     lex_attr_getters.update(LEX_ATTRS)
     lex_attr_getters[LANG] = lambda text: "lb"
-    lex_attr_getters[NORM] = add_lookups(
-        Language.Defaults.lex_attr_getters[NORM], NORM_EXCEPTIONS, BASE_NORMS
-    )
     tokenizer_exceptions = update_exc(BASE_EXCEPTIONS, TOKENIZER_EXCEPTIONS)
     stop_words = STOP_WORDS
     tag_map = TAG_MAP
@@ -1,13 +0,0 @@
# TODO
# norm execptions: find a possibility to deal with the zillions of spelling
# variants (vläicht = vlaicht, vleicht, viläicht, viläischt, etc. etc.)
# here one could include the most common spelling mistakes

_exc = {"dass": "datt", "viläicht": "vläicht"}


NORM_EXCEPTIONS = {}

for string, norm in _exc.items():
    NORM_EXCEPTIONS[string] = norm
    NORM_EXCEPTIONS[string.title()] = norm
@@ -183,10 +183,6 @@ def suffix(string):
     return string[-3:]


-def cluster(string):
-    return 0
-
-
 def is_alpha(string):
     return string.isalpha()

@@ -215,20 +211,11 @@ def is_stop(string, stops=set()):
     return string.lower() in stops


-def is_oov(string):
-    return True
-
-
-def get_prob(string):
-    return -20.0
-
-
 LEX_ATTRS = {
     attrs.LOWER: lower,
     attrs.NORM: lower,
     attrs.PREFIX: prefix,
     attrs.SUFFIX: suffix,
-    attrs.CLUSTER: cluster,
     attrs.IS_ALPHA: is_alpha,
     attrs.IS_DIGIT: is_digit,
     attrs.IS_LOWER: is_lower,
@@ -236,8 +223,6 @@ LEX_ATTRS = {
    attrs.IS_TITLE: is_title,
    attrs.IS_UPPER: is_upper,
    attrs.IS_STOP: is_stop,
    attrs.IS_OOV: is_oov,
    attrs.PROB: get_prob,
    attrs.LIKE_EMAIL: like_email,
    attrs.LIKE_NUM: like_num,
    attrs.IS_PUNCT: is_punct,
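`LEX_ATTRS` maps each lexical attribute to a getter that computes it from the string alone; the hunk headers (`-183,10 +183,6`, `-215,20 +211,11`, `-236,8 +223,6`) show this file shrinking around the placeholder getters `cluster`, `is_oov` and `get_prob`. A minimal sketch of how such a getter table can be applied, assuming plain string keys instead of the `attrs.*` constants; all names below are illustrative, not spaCy's code:

```python
def lower(string):
    return string.lower()


def prefix(string):
    return string[0]


def suffix(string):
    return string[-3:]


def is_alpha(string):
    return string.isalpha()


def is_stop(string, stops=frozenset({"the", "a", "an"})):
    # Toy stop list, an assumption for this sketch.
    return string.lower() in stops


# Hypothetical string keys stand in for spaCy's integer attribute IDs.
LEX_ATTRS = {
    "LOWER": lower,
    "NORM": lower,
    "PREFIX": prefix,
    "SUFFIX": suffix,
    "IS_ALPHA": is_alpha,
    "IS_STOP": is_stop,
}


def lexeme_attrs(string, getters=LEX_ATTRS):
    """Compute every attribute in the table from the raw string."""
    return {name: getter(string) for name, getter in getters.items()}


walking_attrs = lexeme_attrs("Walking")
```

In this sketch, deleting a key from the table simply means that attribute is no longer computed from the string.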
18
spacy/lang/ml/__init__.py
Normal file
@@ -0,0 +1,18 @@
# coding: utf8
from __future__ import unicode_literals

from .stop_words import STOP_WORDS

from ...language import Language


class MalayalamDefaults(Language.Defaults):
    stop_words = STOP_WORDS


class Malayalam(Language):
    lang = "ml"
    Defaults = MalayalamDefaults


__all__ = ["Malayalam"]
19
spacy/lang/ml/examples.py
Normal file
@@ -0,0 +1,19 @@
# coding: utf8
from __future__ import unicode_literals


"""
Example sentences to test spaCy and its language models.

>>> from spacy.lang.ml.examples import sentences
>>> docs = nlp.pipe(sentences)
"""


sentences = [
    "അനാവശ്യമായി കണ്ണിലും മൂക്കിലും വായിലും സ്പർശിക്കാതിരിക്കുക",
    "പൊതുരംഗത്ത് മലയാള ഭാഷയുടെ സമഗ്രപുരോഗതി ലക്ഷ്യമാക്കി പ്രവർത്തിക്കുന്ന സംഘടനയായ മലയാളഐക്യവേദിയുടെ വിദ്യാർത്ഥിക്കൂട്ടായ്മയാണ് വിദ്യാർത്ഥി മലയാളവേദി",
    "എന്താണ് കവാടങ്ങൾ?",
    "ചുരുക്കത്തിൽ വിക്കിപീഡിയയുടെ ഉള്ളടക്കത്തിലേക്കുള്ള പടിപ്പുരകളാണ് കവാടങ്ങൾ. അവ ലളിതവും വായനക്കാരനെ ആകർഷിക്കുന്നതുമായിരിക്കും",
    "പതിനൊന്നുപേർ വീതമുള്ള രണ്ടു ടീമുകൾ കളിക്കുന്ന സംഘകായിക വിനോദമാണു ക്രിക്കറ്റ്",
]
80
spacy/lang/ml/lex_attrs.py
Normal file
@@ -0,0 +1,80 @@
# coding: utf8
from __future__ import unicode_literals

from ...attrs import LIKE_NUM


# reference 2: https://www.omniglot.com/language/numbers/malayalam.htm

_num_words = [
    "പൂജ്യം ",
    "ഒന്ന് ",
    "രണ്ട് ",
    "മൂന്ന് ",
    "നാല് ",
    "അഞ്ച് ",
    "ആറ് ",
    "ഏഴ് ",
    "എട്ട് ",
    "ഒന്പത് ",
    "പത്ത് ",
    "പതിനൊന്ന്",
    "പന്ത്രണ്ട്",
    "പതി മൂന്നു",
    "പതിനാല്",
    "പതിനഞ്ച്",
    "പതിനാറ്",
    "പതിനേഴ്",
    "പതിനെട്ട്",
    "പത്തൊമ്പതു",
    "ഇരുപത്",
    "ഇരുപത്തിഒന്ന്",
    "ഇരുപത്തിരണ്ട്",
    "ഇരുപത്തിമൂന്ന്",
    "ഇരുപത്തിനാല്",
    "ഇരുപത്തിഅഞ്ചു",
    "ഇരുപത്തിആറ്",
    "ഇരുപത്തിഏഴ്",
    "ഇരുപത്തിഎട്ടു",
    "ഇരുപത്തിഒന്പത്",
    "മുപ്പത്",
    "മുപ്പത്തിഒന്ന്",
    "മുപ്പത്തിരണ്ട്",
    "മുപ്പത്തിമൂന്ന്",
    "മുപ്പത്തിനാല്",
    "മുപ്പത്തിഅഞ്ചു",
    "മുപ്പത്തിആറ്",
    "മുപ്പത്തിഏഴ്",
    "മുപ്പത്തിഎട്ട്",
    "മുപ്പത്തിഒന്പതു",
    "നാല്പത് ",
    "അന്പത് ",
    "അറുപത് ",
    "എഴുപത് ",
    "എണ്പത് ",
    "തൊണ്ണൂറ് ",
    "നുറ് ",
    "ആയിരം ",
    "പത്തുലക്ഷം",
]


def like_num(text):
    """
    Check if text resembles a number
    """
    if text.startswith(("+", "-", "±", "~")):
        text = text[1:]
    text = text.replace(",", "").replace(".", "")
    if text.isdigit():
        return True
    if text.count("/") == 1:
        num, denom = text.split("/")
        if num.isdigit() and denom.isdigit():
            return True
    if text in _num_words:
        return True
    return False


LEX_ATTRS = {LIKE_NUM: like_num}
17
spacy/lang/ml/stop_words.py
Normal file
@@ -0,0 +1,17 @@
# coding: utf8
from __future__ import unicode_literals


STOP_WORDS = set(
    """
അത്
ഇത്
ആയിരുന്നു
ആകുന്നു
വരെ
അന്നേരം
അന്ന്
ഇന്ന്
ആണ്
""".split()
)
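`STOP_WORDS` is built by splitting a triple-quoted string on whitespace, so each line holds one entry and blank lines or indentation do not matter. A small sketch of that idiom with three of the words above, plus a hypothetical `content_tokens` filter that is not part of spaCy:

```python
STOP_WORDS = set(
    """
അത്
ഇത്
ആണ്
""".split()
)


def content_tokens(tokens):
    """Drop tokens that appear in STOP_WORDS (exact string match)."""
    return [tok for tok in tokens if tok not in STOP_WORDS]
```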
@@ -1,7 +1,8 @@
from ...symbols import NOUN, PROPN, PRON
from ...errors import Errors


def noun_chunks(obj):
def noun_chunks(doclike):
    """
    Detect base noun phrases from a dependency parse. Works on both Doc and Span.
    """
@@ -15,12 +16,16 @@ def noun_chunks(obj):
        "nmod",
        "nmod:poss",
    ]
    doc = obj.doc  # Ensure works on both Doc and Span.
    doc = doclike.doc  # Ensure works on both Doc and Span.

    if not doc.is_parsed:
        raise ValueError(Errors.E029)

    np_deps = [doc.vocab.strings[label] for label in labels]
    conj = doc.vocab.strings.add("conj")
    np_label = doc.vocab.strings.add("NP")
    seen = set()
    for i, word in enumerate(obj):
    for i, word in enumerate(doclike):
        if word.pos not in (NOUN, PROPN, PRON):
            continue
        # Prevent nested chunks from being produced
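`noun_chunks` walks the parse and yields a span from each nominal head's left edge to the head itself, skipping heads that would produce nested chunks. A simplified, self-contained sketch of that idea on a toy `Token` dataclass (not spaCy's `Doc`/`Span` API): it tracks the end of the previous chunk instead of the `seen` set, omits whatever the elided rest of the function does with `conj`, and its `NP_DEPS` set is an assumption because the diff only shows the tail of `labels`:

```python
from dataclasses import dataclass
from typing import Iterator, List, Tuple


@dataclass
class Token:
    i: int     # position in the sentence
    text: str
    pos: str   # coarse part-of-speech tag
    dep: str   # dependency label
    head: int  # index of the head token; the root points at itself


# Assumed label set; only "nmod" and "nmod:poss" are visible in the hunk.
NP_DEPS = {"nsubj", "obj", "iobj", "obl", "nmod", "nmod:poss", "ROOT"}


def left_edge(tokens: List[Token], i: int) -> int:
    """Smallest index of any token in the subtree rooted at token i."""
    edge = i
    for tok in tokens:
        j = tok.i
        while True:  # climb the head chain until we reach i or the root
            if j == i:
                edge = min(edge, tok.i)
                break
            if tokens[j].head == j:
                break
            j = tokens[j].head
    return edge


def noun_chunks(tokens: List[Token]) -> Iterator[Tuple[int, int]]:
    """Yield (start, end) spans for base noun phrases, skipping nested ones."""
    prev_end = -1
    for tok in tokens:
        if tok.pos not in ("NOUN", "PROPN", "PRON"):
            continue
        start = left_edge(tokens, tok.i)
        # Prevent nested chunks: skip heads whose subtree reaches back
        # into the previous chunk.
        if start <= prev_end:
            continue
        if tok.dep in NP_DEPS:
            prev_end = tok.i
            yield start, tok.i + 1


sentence = [
    Token(0, "The", "DET", "det", 2),
    Token(1, "big", "ADJ", "amod", 2),
    Token(2, "dog", "NOUN", "nsubj", 3),
    Token(3, "chased", "VERB", "ROOT", 3),
    Token(4, "a", "DET", "det", 5),
    Token(5, "cat", "NOUN", "obj", 3),
]
chunks = [" ".join(t.text for t in sentence[s:e]) for s, e in noun_chunks(sentence)]
```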
@@ -2,7 +2,8 @@ from .stop_words import STOP_WORDS
from .lex_attrs import LEX_ATTRS
from .tag_map import TAG_MAP
from .tokenizer_exceptions import TOKENIZER_EXCEPTIONS
from .punctuation import TOKENIZER_INFIXES, TOKENIZER_SUFFIXES
from .punctuation import TOKENIZER_PREFIXES, TOKENIZER_INFIXES
from .punctuation import TOKENIZER_SUFFIXES
from .lemmatizer import DutchLemmatizer
from ..tokenizer_exceptions import BASE_EXCEPTIONS
from ..norm_exceptions import BASE_NORMS
@@ -22,6 +23,7 @@ class DutchDefaults(Language.Defaults):
    tokenizer_exceptions = update_exc(BASE_EXCEPTIONS, TOKENIZER_EXCEPTIONS)
    stop_words = STOP_WORDS
    tag_map = TAG_MAP
    prefixes = TOKENIZER_PREFIXES
    infixes = TOKENIZER_INFIXES
    suffixes = TOKENIZER_SUFFIXES
@@ -1,7 +1,11 @@
from ..char_classes import LIST_ELLIPSES, LIST_ICONS
from ..char_classes import LIST_ELLIPSES, LIST_ICONS, LIST_UNITS, merge_chars
from ..char_classes import LIST_PUNCT, LIST_QUOTES, CURRENCY, PUNCT
from ..char_classes import CONCAT_QUOTES, ALPHA, ALPHA_LOWER, ALPHA_UPPER

from ..punctuation import TOKENIZER_SUFFIXES as DEFAULT_TOKENIZER_SUFFIXES
from ..punctuation import TOKENIZER_PREFIXES as BASE_TOKENIZER_PREFIXES


_prefixes = [",,"] + BASE_TOKENIZER_PREFIXES


# Copied from `de` package. Main purpose is to ensure that hyphens are not
@@ -19,20 +23,33 @@ _infixes = (
        r"(?<=[{a}]),(?=[{a}])".format(a=ALPHA),
        r"(?<=[{a}])([{q}\)\]\(\[])(?=[{a}])".format(a=ALPHA, q=_quotes),
        r"(?<=[{a}])--(?=[{a}])".format(a=ALPHA),
        r"(?<=[0-9])-(?=[0-9])",
    ]
)


# Remove "'s" suffix from suffix list. In Dutch, "'s" is a plural ending when
# it occurs as a suffix and a clitic for "eens" in standalone use. To avoid
# ambiguity it's better to just leave it attached when it occurs as a suffix.
default_suffix_blacklist = ("'s", "'S", "’s", "’S")
_suffixes = [
    suffix
    for suffix in DEFAULT_TOKENIZER_SUFFIXES
    if suffix not in default_suffix_blacklist
]
_list_units = [u for u in LIST_UNITS if u != "%"]
_units = merge_chars(" ".join(_list_units))

_suffixes = (
    ["''"]
    + LIST_PUNCT
    + LIST_ELLIPSES
    + LIST_QUOTES
    + LIST_ICONS
    + ["—", "–"]
    + [
        r"(?<=[0-9])\+",
        r"(?<=°[FfCcKk])\.",
        r"(?<=[0-9])(?:{c})".format(c=CURRENCY),
        r"(?<=[0-9])(?:{u})".format(u=_units),
        r"(?<=[0-9{al}{e}{p}(?:{q})])\.".format(
            al=ALPHA_LOWER, e=r"%²\-\+", q=CONCAT_QUOTES, p=PUNCT
        ),
        r"(?<=[{au}][{au}])\.".format(au=ALPHA_UPPER),
    ]
)


TOKENIZER_PREFIXES = _prefixes
TOKENIZER_INFIXES = _infixes
TOKENIZER_SUFFIXES = _suffixes
File diff suppressed because it is too large
@@ -1,14 +1,16 @@
from .tokenizer_exceptions import TOKENIZER_EXCEPTIONS
from .punctuation import TOKENIZER_INFIXES
from .punctuation import TOKENIZER_PREFIXES, TOKENIZER_INFIXES
from .punctuation import TOKENIZER_SUFFIXES
from .tag_map import TAG_MAP
from .stop_words import STOP_WORDS
from .lex_attrs import LEX_ATTRS
from .lemmatizer import PolishLemmatizer

from ..tokenizer_exceptions import BASE_EXCEPTIONS
from ..norm_exceptions import BASE_NORMS
from ...language import Language
from ...attrs import LANG, NORM
from ...util import update_exc, add_lookups
from ...util import add_lookups
from ...lookups import Lookups


class PolishDefaults(Language.Defaults):
@@ -18,10 +20,21 @@ class PolishDefaults(Language.Defaults):
    lex_attr_getters[NORM] = add_lookups(
        Language.Defaults.lex_attr_getters[NORM], BASE_NORMS
    )
    tokenizer_exceptions = update_exc(BASE_EXCEPTIONS, TOKENIZER_EXCEPTIONS)
    mod_base_exceptions = {
        exc: val for exc, val in BASE_EXCEPTIONS.items() if not exc.endswith(".")
    }
    tokenizer_exceptions = mod_base_exceptions
    stop_words = STOP_WORDS
    tag_map = TAG_MAP
    prefixes = TOKENIZER_PREFIXES
    infixes = TOKENIZER_INFIXES
    suffixes = TOKENIZER_SUFFIXES

    @classmethod
    def create_lemmatizer(cls, nlp=None, lookups=None):
        if lookups is None:
            lookups = Lookups()
        return PolishLemmatizer(lookups)


class Polish(Language):
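In the `PolishDefaults` body, `mod_base_exceptions` keeps only the base tokenizer exceptions whose key does not end in `.`, and `create_lemmatizer` builds an empty `Lookups()` when none is passed. A toy sketch of both patterns; `BASE_EXCEPTIONS` here is a made-up four-entry table and `ToyLookups` is a hypothetical stand-in for the real lookups container:

```python
# Made-up stand-in for the shared BASE_EXCEPTIONS table.
BASE_EXCEPTIONS = {
    "a.m.": [{"ORTH": "a.m."}],
    "e.g.": [{"ORTH": "e.g."}],
    ":)": [{"ORTH": ":)"}],
    "\n": [{"ORTH": "\n"}],
}

# Same filter as in PolishDefaults: drop every exception ending in ".".
mod_base_exceptions = {
    exc: val for exc, val in BASE_EXCEPTIONS.items() if not exc.endswith(".")
}


class ToyLookups:
    """Hypothetical stand-in for a lookups container."""

    def __init__(self):
        self.tables = {}


def create_lemmatizer(lookups=None):
    # A None default gives each call its own fresh container instead of
    # one mutable default object shared across calls.
    if lookups is None:
        lookups = ToyLookups()
    return lookups
```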
File diff suppressed because it is too large
106
spacy/lang/pl/lemmatizer.py
Normal file
@@ -0,0 +1,106 @@
# coding: utf-8
from __future__ import unicode_literals

from ...lemmatizer import Lemmatizer
from ...parts_of_speech import NAMES


class PolishLemmatizer(Lemmatizer):
    # This lemmatizer implements lookup lemmatization based on
    # the Morfeusz dictionary (morfeusz.sgjp.pl/en) by the Institute of Computer Science PAS.
    # It utilizes some prefix-based improvements for
    # verb and adjective lemmatization, as well as case-sensitive
    # lemmatization for nouns.
    def __init__(self, lookups, *args, **kwargs):
        # this lemmatizer is lookup based, so it does not require an index, exception list, or rules
        super().__init__(lookups)
        self.lemma_lookups = {}
        for tag in [
            "ADJ",
            "ADP",
            "ADV",
            "AUX",
            "NOUN",
            "NUM",
            "PART",
            "PRON",
            "VERB",
            "X",
        ]:
            self.lemma_lookups[tag] = self.lookups.get_table(
                "lemma_lookup_" + tag.lower(), {}
            )
        self.lemma_lookups["DET"] = self.lemma_lookups["X"]
        self.lemma_lookups["PROPN"] = self.lemma_lookups["NOUN"]

    def __call__(self, string, univ_pos, morphology=None):
        if isinstance(univ_pos, int):
            univ_pos = NAMES.get(univ_pos, "X")
        univ_pos = univ_pos.upper()

        if univ_pos == "NOUN":
            return self.lemmatize_noun(string, morphology)

        if univ_pos != "PROPN":
            string = string.lower()

        if univ_pos == "ADJ":
            return self.lemmatize_adj(string, morphology)
        elif univ_pos == "VERB":
            return self.lemmatize_verb(string, morphology)

        lemma_dict = self.lemma_lookups.get(univ_pos, {})
        return [lemma_dict.get(string, string.lower())]

    def lemmatize_adj(self, string, morphology):
        # this method utilizes different procedures for adjectives
        # with 'nie' and 'naj' prefixes
        lemma_dict = self.lemma_lookups["ADJ"]

        if string[:3] == "nie":
            search_string = string[3:]
            if search_string[:3] == "naj":
                naj_search_string = search_string[3:]
                if naj_search_string in lemma_dict:
                    return [lemma_dict[naj_search_string]]
            if search_string in lemma_dict:
                return [lemma_dict[search_string]]

        if string[:3] == "naj":
            naj_search_string = string[3:]
            if naj_search_string in lemma_dict:
                return [lemma_dict[naj_search_string]]

        return [lemma_dict.get(string, string)]

    def lemmatize_verb(self, string, morphology):
        # this method utilizes a different procedure for verbs
        # with 'nie' prefix
        lemma_dict = self.lemma_lookups["VERB"]

        if string[:3] == "nie":
            search_string = string[3:]
            if search_string in lemma_dict:
                return [lemma_dict[search_string]]

        return [lemma_dict.get(string, string)]

    def lemmatize_noun(self, string, morphology):
        # this method is case-sensitive, in order to work
        # for incorrectly tagged proper names
        lemma_dict = self.lemma_lookups["NOUN"]

        if string != string.lower():
            if string.lower() in lemma_dict:
                return [lemma_dict[string.lower()]]
            elif string in lemma_dict:
                return [lemma_dict[string]]
            return [string.lower()]

        return [lemma_dict.get(string, string)]

    def lookup(self, string, orth=None):
        return string.lower()

    def lemmatize(self, string, index, exceptions, rules):
        raise NotImplementedError
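`lemmatize_adj` strips a leading `nie` (negation) and/or `naj` (superlative) before looking up the remainder, and falls back to the full form when the stripped stem is unknown. A standalone copy of that control flow, with the table passed in as an argument and a hypothetical three-entry dictionary in place of the Morfeusz-derived lookups:

```python
def lemmatize_adj(string, lemma_dict):
    # Same branching as PolishLemmatizer.lemmatize_adj, minus the class.
    if string[:3] == "nie":
        search_string = string[3:]
        if search_string[:3] == "naj":
            naj_search_string = search_string[3:]
            if naj_search_string in lemma_dict:
                return [lemma_dict[naj_search_string]]
        if search_string in lemma_dict:
            return [lemma_dict[search_string]]

    if string[:3] == "naj":
        naj_search_string = string[3:]
        if naj_search_string in lemma_dict:
            return [lemma_dict[naj_search_string]]

    return [lemma_dict.get(string, string)]


# Hypothetical toy table, not taken from the real lookups.
ADJ = {"nowa": "nowy", "nowsza": "nowy", "niebieska": "niebieski"}
```

Words that merely begin with these letters, such as `niebieska` ("blue"), survive because the stripped stem is not in the table; a successful strip, on the other hand, drops the negation from the returned lemma.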
@@ -1,23 +0,0 @@

Copyright (c) 2019, Marcin Miłkowski
All rights reserved.

Redistribution and use in source and binary forms, with or without
modification, are permitted provided that the following conditions are met:

1. Redistributions of source code must retain the above copyright notice, this
   list of conditions and the following disclaimer.
2. Redistributions in binary form must reproduce the above copyright notice,
   this list of conditions and the following disclaimer in the documentation
   and/or other materials provided with the distribution.

THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS "AS IS" AND
ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE IMPLIED
WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE ARE
DISCLAIMED. IN NO EVENT SHALL THE COPYRIGHT OWNER OR CONTRIBUTORS BE LIABLE FOR
ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES
(INCLUDING, BUT NOT LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES;
LOSS OF USE, DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND
ON ANY THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT
(INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE OF THIS
SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.
@@ -1,19 +1,45 @@
from ..char_classes import LIST_ELLIPSES, CONCAT_ICONS
from ..char_classes import LIST_ELLIPSES, LIST_PUNCT, LIST_HYPHENS
from ..char_classes import LIST_ICONS, LIST_QUOTES, CURRENCY, UNITS, PUNCT
from ..char_classes import CONCAT_QUOTES, ALPHA, ALPHA_LOWER, ALPHA_UPPER
from ..punctuation import TOKENIZER_PREFIXES as BASE_TOKENIZER_PREFIXES

_quotes = CONCAT_QUOTES.replace("'", "")

_prefixes = _prefixes = [
    r"(długo|krótko|jedno|dwu|trzy|cztero)-"
] + BASE_TOKENIZER_PREFIXES

_infixes = (
    LIST_ELLIPSES
    + [CONCAT_ICONS]
    + LIST_ICONS
    + LIST_HYPHENS
    + [
        r"(?<=[{al}])\.(?=[{au}])".format(al=ALPHA_LOWER, au=ALPHA_UPPER),
        r"(?<=[0-9{al}])\.(?=[0-9{au}])".format(al=ALPHA, au=ALPHA_UPPER),
        r"(?<=[{a}])[,!?](?=[{a}])".format(a=ALPHA),
        r"(?<=[{a}])[:<>=](?=[{a}])".format(a=ALPHA),
        r"(?<=[{a}])--(?=[{a}])".format(a=ALPHA),
        r"(?<=[{a}])[:<>=\/](?=[{a}])".format(a=ALPHA),
        r"(?<=[{a}]),(?=[{a}])".format(a=ALPHA),
        r"(?<=[{a}])([{q}\)\]\(\[])(?=[\-{a}])".format(a=ALPHA, q=CONCAT_QUOTES),
        r"(?<=[{a}])([{q}\)\]\(\[])(?=[\-{a}])".format(a=ALPHA, q=_quotes),
    ]
)

_suffixes = (
    ["''", "’’", r"\.", "…"]
    + LIST_PUNCT
    + LIST_QUOTES
    + LIST_ICONS
    + [
        r"(?<=[0-9])\+",
        r"(?<=°[FfCcKk])\.",
        r"(?<=[0-9])(?:{c})".format(c=CURRENCY),
        r"(?<=[0-9])(?:{u})".format(u=UNITS),
        r"(?<=[0-9{al}{e}{p}(?:{q})])\.".format(
            al=ALPHA_LOWER, e=r"%²\-\+", q=CONCAT_QUOTES, p=PUNCT
        ),
        r"(?<=[{au}])\.".format(au=ALPHA_UPPER),
    ]
)


TOKENIZER_PREFIXES = _prefixes
TOKENIZER_INFIXES = _infixes
TOKENIZER_SUFFIXES = _suffixes
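The new Polish prefix rule matches compounding stems such as `jedno`, `dwu` or `długo` followed by a hyphen. A sketch of what that pattern matches, assuming it is applied at the start of a token (the `^` anchor and the `leading_prefix` helper are our additions, not spaCy code):

```python
import re

# The Polish-specific prefix pattern from the hunk, anchored at token start.
POLISH_PREFIX = re.compile(r"^(długo|krótko|jedno|dwu|trzy|cztero)-")


def leading_prefix(token):
    """Return the matched prefix such as "dwu-", or None if there is none."""
    match = POLISH_PREFIX.match(token)
    return match.group() if match else None
```

A stem without the hyphen (`dwudziesty`) or a stem outside the alternation (`pół-`) is not matched.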
@@ -1,23 +0,0 @@
from ._tokenizer_exceptions_list import PL_BASE_EXCEPTIONS
from ...symbols import POS, ADV, NOUN, ORTH, LEMMA, ADJ


_exc = {}

for exc_data in [
    {ORTH: "m.in.", LEMMA: "między innymi", POS: ADV},
    {ORTH: "inż.", LEMMA: "inżynier", POS: NOUN},
    {ORTH: "mgr.", LEMMA: "magister", POS: NOUN},
    {ORTH: "tzn.", LEMMA: "to znaczy", POS: ADV},
    {ORTH: "tj.", LEMMA: "to jest", POS: ADV},
    {ORTH: "tzw.", LEMMA: "tak zwany", POS: ADJ},
]:
    _exc[exc_data[ORTH]] = [exc_data]

for orth in ["w.", "r."]:
    _exc[orth] = [{ORTH: orth}]

for orth in PL_BASE_EXCEPTIONS:
    _exc[orth] = [{ORTH: orth}]

TOKENIZER_EXCEPTIONS = _exc
@@ -2,22 +2,17 @@ from .tokenizer_exceptions import TOKENIZER_EXCEPTIONS
from .stop_words import STOP_WORDS
from .lex_attrs import LEX_ATTRS
from .tag_map import TAG_MAP
from .norm_exceptions import NORM_EXCEPTIONS

from ..tokenizer_exceptions import BASE_EXCEPTIONS
from .punctuation import TOKENIZER_INFIXES, TOKENIZER_PREFIXES
from ..norm_exceptions import BASE_NORMS
from ...language import Language
from ...attrs import LANG, NORM
from ...util import update_exc, add_lookups
from ...attrs import LANG
from ...util import update_exc


class PortugueseDefaults(Language.Defaults):
    lex_attr_getters = dict(Language.Defaults.lex_attr_getters)
    lex_attr_getters[LANG] = lambda text: "pt"
    lex_attr_getters[NORM] = add_lookups(
        Language.Defaults.lex_attr_getters[NORM], BASE_NORMS, NORM_EXCEPTIONS
    )
    lex_attr_getters.update(LEX_ATTRS)
    tokenizer_exceptions = update_exc(BASE_EXCEPTIONS, TOKENIZER_EXCEPTIONS)
    stop_words = STOP_WORDS
@@ -1,20 +0,0 @@
# These exceptions are used to add NORM values based on a token's ORTH value.
# Individual languages can also add their own exceptions and overwrite them -
# for example, British vs. American spelling in English.

# Norms are only set if no alternative is provided in the tokenizer exceptions.
# Note that this does not change any other token attributes. Its main purpose
# is to normalise the word representations so that equivalent tokens receive
# similar representations. For example: $ and € are very different, but they're
# both currency symbols. By normalising currency symbols to $, all symbols are
# seen as similar, no matter how common they are in the training data.


NORM_EXCEPTIONS = {
    "R$": "$",  # Real
    "r$": "$",  # Real
    "Cz$": "$",  # Cruzado
    "cz$": "$",  # Cruzado
    "NCz$": "$",  # Cruzado Novo
    "ncz$": "$",  # Cruzado Novo
}
@@ -1,25 +1,20 @@
from .stop_words import STOP_WORDS
from .tokenizer_exceptions import TOKENIZER_EXCEPTIONS
from .norm_exceptions import NORM_EXCEPTIONS
from .lex_attrs import LEX_ATTRS
from .tag_map import TAG_MAP
from .lemmatizer import RussianLemmatizer

from ..tokenizer_exceptions import BASE_EXCEPTIONS
from ..norm_exceptions import BASE_NORMS
from ...util import update_exc, add_lookups
from ...util import update_exc
from ...language import Language
from ...lookups import Lookups
from ...attrs import LANG, NORM
from ...attrs import LANG


class RussianDefaults(Language.Defaults):
    lex_attr_getters = dict(Language.Defaults.lex_attr_getters)
    lex_attr_getters.update(LEX_ATTRS)
    lex_attr_getters[LANG] = lambda text: "ru"
    lex_attr_getters[NORM] = add_lookups(
        Language.Defaults.lex_attr_getters[NORM], BASE_NORMS, NORM_EXCEPTIONS
    )
    tokenizer_exceptions = update_exc(BASE_EXCEPTIONS, TOKENIZER_EXCEPTIONS)
    stop_words = STOP_WORDS
    tag_map = TAG_MAP
@@ -1,32 +0,0 @@
_exc = {
    # Slang
    "прив": "привет",
    "дарова": "привет",
    "дак": "так",
    "дык": "так",
    "здарова": "привет",
    "пакедава": "пока",
    "пакедаво": "пока",
    "ща": "сейчас",
    "спс": "спасибо",
    "пжлст": "пожалуйста",
    "плиз": "пожалуйста",
    "ладненько": "ладно",
    "лады": "ладно",
    "лан": "ладно",
    "ясн": "ясно",
    "всм": "всмысле",
    "хош": "хочешь",
    "хаюшки": "привет",
    "оч": "очень",
    "че": "что",
    "чо": "что",
    "шо": "что",
}


NORM_EXCEPTIONS = {}

for string, norm in _exc.items():
    NORM_EXCEPTIONS[string] = norm
    NORM_EXCEPTIONS[string.title()] = norm
@@ -1,21 +1,16 @@
from .stop_words import STOP_WORDS
from .tokenizer_exceptions import TOKENIZER_EXCEPTIONS
from .norm_exceptions import NORM_EXCEPTIONS
from .lex_attrs import LEX_ATTRS
from ..tokenizer_exceptions import BASE_EXCEPTIONS
from ..norm_exceptions import BASE_NORMS
from ...language import Language
from ...attrs import LANG, NORM
from ...util import update_exc, add_lookups
from ...attrs import LANG
from ...util import update_exc


class SerbianDefaults(Language.Defaults):
    lex_attr_getters = dict(Language.Defaults.lex_attr_getters)
    lex_attr_getters.update(LEX_ATTRS)
    lex_attr_getters[LANG] = lambda text: "sr"
    lex_attr_getters[NORM] = add_lookups(
        Language.Defaults.lex_attr_getters[NORM], BASE_NORMS, NORM_EXCEPTIONS
    )
    tokenizer_exceptions = update_exc(BASE_EXCEPTIONS, TOKENIZER_EXCEPTIONS)
    stop_words = STOP_WORDS
@@ -1,22 +0,0 @@
_exc = {
    # Slang
    "ћале": "отац",
    "кева": "мајка",
    "смор": "досада",
    "кец": "јединица",
    "тебра": "брат",
    "штребер": "ученик",
    "факс": "факултет",
    "профа": "професор",
    "бус": "аутобус",
    "пискарало": "службеник",
    "бакутанер": "бака",
    "џибер": "простак",
}


NORM_EXCEPTIONS = {}

for string, norm in _exc.items():
    NORM_EXCEPTIONS[string] = norm
    NORM_EXCEPTIONS[string.title()] = norm
@@ -1,6 +1,7 @@
from .tokenizer_exceptions import TOKENIZER_EXCEPTIONS
from .tag_map import TAG_MAP
from .stop_words import STOP_WORDS
from .lex_attrs import LEX_ATTRS
from .morph_rules import MORPH_RULES

# Punctuation stolen from Danish
@@ -16,6 +17,7 @@ from .syntax_iterators import SYNTAX_ITERATORS


class SwedishDefaults(Language.Defaults):
    lex_attr_getters = dict(Language.Defaults.lex_attr_getters)
    lex_attr_getters.update(LEX_ATTRS)
    lex_attr_getters[LANG] = lambda text: "sv"
    lex_attr_getters[NORM] = add_lookups(
        Language.Defaults.lex_attr_getters[NORM], BASE_NORMS
Some files were not shown because too many files have changed in this diff