mirror of
https://github.com/explosion/spaCy.git
synced 2025-10-24 04:31:17 +03:00
Merge pull request #6029 from explosion/master-tmp
commit
f06eed800e
107
.github/contributors/bittlingmayer.md
vendored
Normal file
@@ -0,0 +1,107 @@
# spaCy contributor agreement

This spaCy Contributor Agreement (**"SCA"**) is based on the
[Oracle Contributor Agreement](http://www.oracle.com/technetwork/oca-405177.pdf).
The SCA applies to any contribution that you make to any product or project
managed by us (the **"project"**), and sets out the intellectual property rights
you grant to us in the contributed materials. The term **"us"** shall mean
[ExplosionAI GmbH](https://explosion.ai/legal). The term
**"you"** shall mean the person or entity identified below.

If you agree to be bound by these terms, fill in the information requested
below and include the filled-in version with your first pull request, under the
folder [`.github/contributors/`](/.github/contributors/). The name of the file
should be your GitHub username, with the extension `.md`. For example, the user
example_user would create the file `.github/contributors/example_user.md`.

Read this agreement carefully before signing. These terms and conditions
constitute a binding legal agreement.

## Contributor Agreement

1. The term "contribution" or "contributed materials" means any source code,
object code, patch, tool, sample, graphic, specification, manual,
documentation, or any other material posted or submitted by you to the project.

2. With respect to any worldwide copyrights, or copyright applications and
registrations, in your contribution:

    * you hereby assign to us joint ownership, and to the extent that such
    assignment is or becomes invalid, ineffective or unenforceable, you hereby
    grant to us a perpetual, irrevocable, non-exclusive, worldwide, no-charge,
    royalty-free, unrestricted license to exercise all rights under those
    copyrights. This includes, at our option, the right to sublicense these same
    rights to third parties through multiple levels of sublicensees or other
    licensing arrangements;

    * you agree that each of us can do all things in relation to your
    contribution as if each of us were the sole owners, and if one of us makes
    a derivative work of your contribution, the one who makes the derivative
    work (or has it made) will be the sole owner of that derivative work;

    * you agree that you will not assert any moral rights in your contribution
    against us, our licensees or transferees;

    * you agree that we may register a copyright in your contribution and
    exercise all ownership rights associated with it; and

    * you agree that neither of us has any duty to consult with, obtain the
    consent of, pay or render an accounting to the other for any use or
    distribution of your contribution.

3. With respect to any patents you own, or that you can license without payment
to any third party, you hereby grant to us a perpetual, irrevocable,
non-exclusive, worldwide, no-charge, royalty-free license to:

    * make, have made, use, sell, offer to sell, import, and otherwise transfer
    your contribution in whole or in part, alone or in combination with or
    included in any product, work or materials arising out of the project to
    which your contribution was submitted, and

    * at our option, to sublicense these same rights to third parties through
    multiple levels of sublicensees or other licensing arrangements.

4. Except as set out above, you keep all right, title, and interest in your
contribution. The rights that you grant to us under these terms are effective
on the date you first submitted a contribution to us, even if your submission
took place before the date you sign these terms.

5. You covenant, represent, warrant and agree that:

    * Each contribution that you submit is and shall be an original work of
    authorship and you can legally grant the rights set out in this SCA;

    * to the best of your knowledge, each contribution will not violate any
    third party's copyrights, trademarks, patents, or other intellectual
    property rights; and

    * each contribution shall be in compliance with U.S. export control laws and
    other applicable export and import laws. You agree to notify us if you
    become aware of any circumstance which would make any of the foregoing
    representations inaccurate in any respect. We may publicly disclose your
    participation in the project, including the fact that you have signed the SCA.

6. This SCA is governed by the laws of the State of California and applicable
U.S. Federal law. Any choice of law rules will not apply.

7. Please place an “x” on one of the applicable statements below. Please do NOT
mark both statements:

    * [x] I am signing on behalf of myself as an individual and no other person
    or entity, including my employer, has or will have rights with respect to my
    contributions.

    * [ ] I am signing on behalf of my employer or a legal entity and I have the
    actual authority to contractually bind that entity.

## Contributor Details

| Field                          | Entry                |
|------------------------------- | -------------------- |
| Name                           | Adam Bittlingmayer   |
| Company name (if applicable)   |                      |
| Title or role (if applicable)  |                      |
| Date                           | 12 Aug 2020          |
| GitHub username                | bittlingmayer        |
| Website (optional)             |                      |

106
.github/contributors/graue70.md
vendored
Normal file
@@ -0,0 +1,106 @@
# spaCy contributor agreement

This spaCy Contributor Agreement (**"SCA"**) is based on the
[Oracle Contributor Agreement](http://www.oracle.com/technetwork/oca-405177.pdf).
The SCA applies to any contribution that you make to any product or project
managed by us (the **"project"**), and sets out the intellectual property rights
you grant to us in the contributed materials. The term **"us"** shall mean
[ExplosionAI GmbH](https://explosion.ai/legal). The term
**"you"** shall mean the person or entity identified below.

If you agree to be bound by these terms, fill in the information requested
below and include the filled-in version with your first pull request, under the
folder [`.github/contributors/`](/.github/contributors/). The name of the file
should be your GitHub username, with the extension `.md`. For example, the user
example_user would create the file `.github/contributors/example_user.md`.

Read this agreement carefully before signing. These terms and conditions
constitute a binding legal agreement.

## Contributor Agreement

1. The term "contribution" or "contributed materials" means any source code,
object code, patch, tool, sample, graphic, specification, manual,
documentation, or any other material posted or submitted by you to the project.

2. With respect to any worldwide copyrights, or copyright applications and
registrations, in your contribution:

    * you hereby assign to us joint ownership, and to the extent that such
    assignment is or becomes invalid, ineffective or unenforceable, you hereby
    grant to us a perpetual, irrevocable, non-exclusive, worldwide, no-charge,
    royalty-free, unrestricted license to exercise all rights under those
    copyrights. This includes, at our option, the right to sublicense these same
    rights to third parties through multiple levels of sublicensees or other
    licensing arrangements;

    * you agree that each of us can do all things in relation to your
    contribution as if each of us were the sole owners, and if one of us makes
    a derivative work of your contribution, the one who makes the derivative
    work (or has it made) will be the sole owner of that derivative work;

    * you agree that you will not assert any moral rights in your contribution
    against us, our licensees or transferees;

    * you agree that we may register a copyright in your contribution and
    exercise all ownership rights associated with it; and

    * you agree that neither of us has any duty to consult with, obtain the
    consent of, pay or render an accounting to the other for any use or
    distribution of your contribution.

3. With respect to any patents you own, or that you can license without payment
to any third party, you hereby grant to us a perpetual, irrevocable,
non-exclusive, worldwide, no-charge, royalty-free license to:

    * make, have made, use, sell, offer to sell, import, and otherwise transfer
    your contribution in whole or in part, alone or in combination with or
    included in any product, work or materials arising out of the project to
    which your contribution was submitted, and

    * at our option, to sublicense these same rights to third parties through
    multiple levels of sublicensees or other licensing arrangements.

4. Except as set out above, you keep all right, title, and interest in your
contribution. The rights that you grant to us under these terms are effective
on the date you first submitted a contribution to us, even if your submission
took place before the date you sign these terms.

5. You covenant, represent, warrant and agree that:

    * Each contribution that you submit is and shall be an original work of
    authorship and you can legally grant the rights set out in this SCA;

    * to the best of your knowledge, each contribution will not violate any
    third party's copyrights, trademarks, patents, or other intellectual
    property rights; and

    * each contribution shall be in compliance with U.S. export control laws and
    other applicable export and import laws. You agree to notify us if you
    become aware of any circumstance which would make any of the foregoing
    representations inaccurate in any respect. We may publicly disclose your
    participation in the project, including the fact that you have signed the SCA.

6. This SCA is governed by the laws of the State of California and applicable
U.S. Federal law. Any choice of law rules will not apply.

7. Please place an “x” on one of the applicable statements below. Please do NOT
mark both statements:

    * [x] I am signing on behalf of myself as an individual and no other person
    or entity, including my employer, has or will have rights with respect to my
    contributions.

    * [ ] I am signing on behalf of my employer or a legal entity and I have the
    actual authority to contractually bind that entity.

## Contributor Details

| Field                          | Entry                |
|------------------------------- | -------------------- |
| Name                           | Thomas               |
| Company name (if applicable)   |                      |
| Title or role (if applicable)  |                      |
| Date                           | 2020-08-11           |
| GitHub username                | graue70              |
| Website (optional)             |                      |

106
.github/contributors/holubvl3.md
vendored
Normal file
@@ -0,0 +1,106 @@
# spaCy contributor agreement

This spaCy Contributor Agreement (**"SCA"**) is based on the
[Oracle Contributor Agreement](http://www.oracle.com/technetwork/oca-405177.pdf).
The SCA applies to any contribution that you make to any product or project
managed by us (the **"project"**), and sets out the intellectual property rights
you grant to us in the contributed materials. The term **"us"** shall mean
[ExplosionAI GmbH](https://explosion.ai/legal). The term
**"you"** shall mean the person or entity identified below.

If you agree to be bound by these terms, fill in the information requested
below and include the filled-in version with your first pull request, under the
folder [`.github/contributors/`](/.github/contributors/). The name of the file
should be your GitHub username, with the extension `.md`. For example, the user
example_user would create the file `.github/contributors/example_user.md`.

Read this agreement carefully before signing. These terms and conditions
constitute a binding legal agreement.

## Contributor Agreement

1. The term "contribution" or "contributed materials" means any source code,
object code, patch, tool, sample, graphic, specification, manual,
documentation, or any other material posted or submitted by you to the project.

2. With respect to any worldwide copyrights, or copyright applications and
registrations, in your contribution:

    * you hereby assign to us joint ownership, and to the extent that such
    assignment is or becomes invalid, ineffective or unenforceable, you hereby
    grant to us a perpetual, irrevocable, non-exclusive, worldwide, no-charge,
    royalty-free, unrestricted license to exercise all rights under those
    copyrights. This includes, at our option, the right to sublicense these same
    rights to third parties through multiple levels of sublicensees or other
    licensing arrangements;

    * you agree that each of us can do all things in relation to your
    contribution as if each of us were the sole owners, and if one of us makes
    a derivative work of your contribution, the one who makes the derivative
    work (or has it made) will be the sole owner of that derivative work;

    * you agree that you will not assert any moral rights in your contribution
    against us, our licensees or transferees;

    * you agree that we may register a copyright in your contribution and
    exercise all ownership rights associated with it; and

    * you agree that neither of us has any duty to consult with, obtain the
    consent of, pay or render an accounting to the other for any use or
    distribution of your contribution.

3. With respect to any patents you own, or that you can license without payment
to any third party, you hereby grant to us a perpetual, irrevocable,
non-exclusive, worldwide, no-charge, royalty-free license to:

    * make, have made, use, sell, offer to sell, import, and otherwise transfer
    your contribution in whole or in part, alone or in combination with or
    included in any product, work or materials arising out of the project to
    which your contribution was submitted, and

    * at our option, to sublicense these same rights to third parties through
    multiple levels of sublicensees or other licensing arrangements.

4. Except as set out above, you keep all right, title, and interest in your
contribution. The rights that you grant to us under these terms are effective
on the date you first submitted a contribution to us, even if your submission
took place before the date you sign these terms.

5. You covenant, represent, warrant and agree that:

    * Each contribution that you submit is and shall be an original work of
    authorship and you can legally grant the rights set out in this SCA;

    * to the best of your knowledge, each contribution will not violate any
    third party's copyrights, trademarks, patents, or other intellectual
    property rights; and

    * each contribution shall be in compliance with U.S. export control laws and
    other applicable export and import laws. You agree to notify us if you
    become aware of any circumstance which would make any of the foregoing
    representations inaccurate in any respect. We may publicly disclose your
    participation in the project, including the fact that you have signed the SCA.

6. This SCA is governed by the laws of the State of California and applicable
U.S. Federal law. Any choice of law rules will not apply.

7. Please place an “x” on one of the applicable statements below. Please do NOT
mark both statements:

    * [x] I am signing on behalf of myself as an individual and no other person
    or entity, including my employer, has or will have rights with respect to my
    contributions.

    * [ ] I am signing on behalf of my employer or a legal entity and I have the
    actual authority to contractually bind that entity.

## Contributor Details

| Field                          | Entry                |
|------------------------------- | -------------------- |
| Name                           | Vladimir Holubec     |
| Company name (if applicable)   |                      |
| Title or role (if applicable)  |                      |
| Date                           | 30.07.2020           |
| GitHub username                | holubvl3             |
| Website (optional)             |                      |

106
.github/contributors/idoshr.md
vendored
Normal file
@@ -0,0 +1,106 @@
# spaCy contributor agreement

This spaCy Contributor Agreement (**"SCA"**) is based on the
[Oracle Contributor Agreement](http://www.oracle.com/technetwork/oca-405177.pdf).
The SCA applies to any contribution that you make to any product or project
managed by us (the **"project"**), and sets out the intellectual property rights
you grant to us in the contributed materials. The term **"us"** shall mean
[ExplosionAI GmbH](https://explosion.ai/legal). The term
**"you"** shall mean the person or entity identified below.

If you agree to be bound by these terms, fill in the information requested
below and include the filled-in version with your first pull request, under the
folder [`.github/contributors/`](/.github/contributors/). The name of the file
should be your GitHub username, with the extension `.md`. For example, the user
example_user would create the file `.github/contributors/example_user.md`.

Read this agreement carefully before signing. These terms and conditions
constitute a binding legal agreement.

## Contributor Agreement

1. The term "contribution" or "contributed materials" means any source code,
object code, patch, tool, sample, graphic, specification, manual,
documentation, or any other material posted or submitted by you to the project.

2. With respect to any worldwide copyrights, or copyright applications and
registrations, in your contribution:

    * you hereby assign to us joint ownership, and to the extent that such
    assignment is or becomes invalid, ineffective or unenforceable, you hereby
    grant to us a perpetual, irrevocable, non-exclusive, worldwide, no-charge,
    royalty-free, unrestricted license to exercise all rights under those
    copyrights. This includes, at our option, the right to sublicense these same
    rights to third parties through multiple levels of sublicensees or other
    licensing arrangements;

    * you agree that each of us can do all things in relation to your
    contribution as if each of us were the sole owners, and if one of us makes
    a derivative work of your contribution, the one who makes the derivative
    work (or has it made) will be the sole owner of that derivative work;

    * you agree that you will not assert any moral rights in your contribution
    against us, our licensees or transferees;

    * you agree that we may register a copyright in your contribution and
    exercise all ownership rights associated with it; and

    * you agree that neither of us has any duty to consult with, obtain the
    consent of, pay or render an accounting to the other for any use or
    distribution of your contribution.

3. With respect to any patents you own, or that you can license without payment
to any third party, you hereby grant to us a perpetual, irrevocable,
non-exclusive, worldwide, no-charge, royalty-free license to:

    * make, have made, use, sell, offer to sell, import, and otherwise transfer
    your contribution in whole or in part, alone or in combination with or
    included in any product, work or materials arising out of the project to
    which your contribution was submitted, and

    * at our option, to sublicense these same rights to third parties through
    multiple levels of sublicensees or other licensing arrangements.

4. Except as set out above, you keep all right, title, and interest in your
contribution. The rights that you grant to us under these terms are effective
on the date you first submitted a contribution to us, even if your submission
took place before the date you sign these terms.

5. You covenant, represent, warrant and agree that:

    * Each contribution that you submit is and shall be an original work of
    authorship and you can legally grant the rights set out in this SCA;

    * to the best of your knowledge, each contribution will not violate any
    third party's copyrights, trademarks, patents, or other intellectual
    property rights; and

    * each contribution shall be in compliance with U.S. export control laws and
    other applicable export and import laws. You agree to notify us if you
    become aware of any circumstance which would make any of the foregoing
    representations inaccurate in any respect. We may publicly disclose your
    participation in the project, including the fact that you have signed the SCA.

6. This SCA is governed by the laws of the State of California and applicable
U.S. Federal law. Any choice of law rules will not apply.

7. Please place an “x” on one of the applicable statements below. Please do NOT
mark both statements:

    * [x] I am signing on behalf of myself as an individual and no other person
    or entity, including my employer, has or will have rights with respect to my
    contributions.

    * [ ] I am signing on behalf of my employer or a legal entity and I have the
    actual authority to contractually bind that entity.

## Contributor Details

| Field                          | Entry                |
|------------------------------- | -------------------- |
| Name                           | Ido Shraga           |
| Company name (if applicable)   |                      |
| Title or role (if applicable)  |                      |
| Date                           | 20-09-2020           |
| GitHub username                | idoshr               |
| Website (optional)             |                      |

106
.github/contributors/jgutix.md
vendored
Normal file
@@ -0,0 +1,106 @@
# spaCy contributor agreement

This spaCy Contributor Agreement (**"SCA"**) is based on the
[Oracle Contributor Agreement](http://www.oracle.com/technetwork/oca-405177.pdf).
The SCA applies to any contribution that you make to any product or project
managed by us (the **"project"**), and sets out the intellectual property rights
you grant to us in the contributed materials. The term **"us"** shall mean
[ExplosionAI GmbH](https://explosion.ai/legal). The term
**"you"** shall mean the person or entity identified below.

If you agree to be bound by these terms, fill in the information requested
below and include the filled-in version with your first pull request, under the
folder [`.github/contributors/`](/.github/contributors/). The name of the file
should be your GitHub username, with the extension `.md`. For example, the user
example_user would create the file `.github/contributors/example_user.md`.

Read this agreement carefully before signing. These terms and conditions
constitute a binding legal agreement.

## Contributor Agreement

1. The term "contribution" or "contributed materials" means any source code,
object code, patch, tool, sample, graphic, specification, manual,
documentation, or any other material posted or submitted by you to the project.

2. With respect to any worldwide copyrights, or copyright applications and
registrations, in your contribution:

    * you hereby assign to us joint ownership, and to the extent that such
    assignment is or becomes invalid, ineffective or unenforceable, you hereby
    grant to us a perpetual, irrevocable, non-exclusive, worldwide, no-charge,
    royalty-free, unrestricted license to exercise all rights under those
    copyrights. This includes, at our option, the right to sublicense these same
    rights to third parties through multiple levels of sublicensees or other
    licensing arrangements;

    * you agree that each of us can do all things in relation to your
    contribution as if each of us were the sole owners, and if one of us makes
    a derivative work of your contribution, the one who makes the derivative
    work (or has it made) will be the sole owner of that derivative work;

    * you agree that you will not assert any moral rights in your contribution
    against us, our licensees or transferees;

    * you agree that we may register a copyright in your contribution and
    exercise all ownership rights associated with it; and

    * you agree that neither of us has any duty to consult with, obtain the
    consent of, pay or render an accounting to the other for any use or
    distribution of your contribution.

3. With respect to any patents you own, or that you can license without payment
to any third party, you hereby grant to us a perpetual, irrevocable,
non-exclusive, worldwide, no-charge, royalty-free license to:

    * make, have made, use, sell, offer to sell, import, and otherwise transfer
    your contribution in whole or in part, alone or in combination with or
    included in any product, work or materials arising out of the project to
    which your contribution was submitted, and

    * at our option, to sublicense these same rights to third parties through
    multiple levels of sublicensees or other licensing arrangements.

4. Except as set out above, you keep all right, title, and interest in your
contribution. The rights that you grant to us under these terms are effective
on the date you first submitted a contribution to us, even if your submission
took place before the date you sign these terms.

5. You covenant, represent, warrant and agree that:

    * Each contribution that you submit is and shall be an original work of
    authorship and you can legally grant the rights set out in this SCA;

    * to the best of your knowledge, each contribution will not violate any
    third party's copyrights, trademarks, patents, or other intellectual
    property rights; and

    * each contribution shall be in compliance with U.S. export control laws and
    other applicable export and import laws. You agree to notify us if you
    become aware of any circumstance which would make any of the foregoing
    representations inaccurate in any respect. We may publicly disclose your
    participation in the project, including the fact that you have signed the SCA.

6. This SCA is governed by the laws of the State of California and applicable
U.S. Federal law. Any choice of law rules will not apply.

7. Please place an “x” on one of the applicable statements below. Please do NOT
mark both statements:

    * [x] I am signing on behalf of myself as an individual and no other person
    or entity, including my employer, has or will have rights with respect to my
    contributions.

    * [ ] I am signing on behalf of my employer or a legal entity and I have the
    actual authority to contractually bind that entity.

## Contributor Details

| Field                          | Entry                |
|------------------------------- | -------------------- |
| Name                           | Juan Gutiérrez       |
| Company name (if applicable)   | Ojtli                |
| Title or role (if applicable)  |                      |
| Date                           | 2020-08-28           |
| GitHub username                | jgutix               |
| Website (optional)             | ojtli.app            |

106
.github/contributors/leyendecker.md
vendored
Normal file
@@ -0,0 +1,106 @@
# spaCy contributor agreement

This spaCy Contributor Agreement (**"SCA"**) is based on the
[Oracle Contributor Agreement](http://www.oracle.com/technetwork/oca-405177.pdf).
The SCA applies to any contribution that you make to any product or project
managed by us (the **"project"**), and sets out the intellectual property rights
you grant to us in the contributed materials. The term **"us"** shall mean
[ExplosionAI GmbH](https://explosion.ai/legal). The term
**"you"** shall mean the person or entity identified below.

If you agree to be bound by these terms, fill in the information requested
below and include the filled-in version with your first pull request, under the
folder [`.github/contributors/`](/.github/contributors/). The name of the file
should be your GitHub username, with the extension `.md`. For example, the user
example_user would create the file `.github/contributors/example_user.md`.

Read this agreement carefully before signing. These terms and conditions
constitute a binding legal agreement.

## Contributor Agreement

1. The term "contribution" or "contributed materials" means any source code,
object code, patch, tool, sample, graphic, specification, manual,
documentation, or any other material posted or submitted by you to the project.

2. With respect to any worldwide copyrights, or copyright applications and
|
||||||
|
registrations, in your contribution:
|
||||||
|
|
||||||
|
* you hereby assign to us joint ownership, and to the extent that such
|
||||||
|
assignment is or becomes invalid, ineffective or unenforceable, you hereby
|
||||||
|
grant to us a perpetual, irrevocable, non-exclusive, worldwide, no-charge,
|
||||||
|
royalty-free, unrestricted license to exercise all rights under those
|
||||||
|
copyrights. This includes, at our option, the right to sublicense these same
|
||||||
|
rights to third parties through multiple levels of sublicensees or other
|
||||||
|
licensing arrangements;
|
||||||
|
|
||||||
|
* you agree that each of us can do all things in relation to your
|
||||||
|
contribution as if each of us were the sole owners, and if one of us makes
|
||||||
|
a derivative work of your contribution, the one who makes the derivative
|
||||||
|
work (or has it made will be the sole owner of that derivative work;
|
||||||
|
|
||||||
|
* you agree that you will not assert any moral rights in your contribution
|
||||||
|
against us, our licensees or transferees;
|
||||||
|
|
||||||
|
* you agree that we may register a copyright in your contribution and
|
||||||
|
exercise all ownership rights associated with it; and
|
||||||
|
|
||||||
|
* you agree that neither of us has any duty to consult with, obtain the
|
||||||
|
consent of, pay or render an accounting to the other for any use or
|
||||||
|
distribution of your contribution.
|
||||||
|
|
||||||
|
3. With respect to any patents you own, or that you can license without payment
|
||||||
|
to any third party, you hereby grant to us a perpetual, irrevocable,
|
||||||
|
non-exclusive, worldwide, no-charge, royalty-free license to:
|
||||||
|
|
||||||
|
* make, have made, use, sell, offer to sell, import, and otherwise transfer
|
||||||
|
your contribution in whole or in part, alone or in combination with or
|
||||||
|
included in any product, work or materials arising out of the project to
|
||||||
|
which your contribution was submitted, and
|
||||||
|
|
||||||
|
* at our option, to sublicense these same rights to third parties through
|
||||||
|
multiple levels of sublicensees or other licensing arrangements.
|
||||||
|
|
||||||
|
4. Except as set out above, you keep all right, title, and interest in your
|
||||||
|
contribution. The rights that you grant to us under these terms are effective
|
||||||
|
on the date you first submitted a contribution to us, even if your submission
|
||||||
|
took place before the date you sign these terms.
|
||||||
|
|
||||||
|
5. You covenant, represent, warrant and agree that:
|
||||||
|
|
||||||
|
* Each contribution that you submit is and shall be an original work of
|
||||||
|
authorship and you can legally grant the rights set out in this SCA;
|
||||||
|
|
||||||
|
* to the best of your knowledge, each contribution will not violate any
|
||||||
|
third party's copyrights, trademarks, patents, or other intellectual
|
||||||
|
property rights; and
|
||||||
|
|
||||||
|
* each contribution shall be in compliance with U.S. export control laws and
|
||||||
|
other applicable export and import laws. You agree to notify us if you
|
||||||
|
become aware of any circumstance which would make any of the foregoing
|
||||||
|
representations inaccurate in any respect. We may publicly disclose your
|
||||||
|
participation in the project, including the fact that you have signed the SCA.
|
||||||
|
|
||||||
|
6. This SCA is governed by the laws of the State of California and applicable
|
||||||
|
U.S. Federal law. Any choice of law rules will not apply.
|
||||||
|
|
||||||
|
7. Please place an “x” on one of the applicable statement below. Please do NOT
|
||||||
|
mark both statements:
|
||||||
|
|
||||||
|
* [x] I am signing on behalf of myself as an individual and no other person
|
||||||
|
or entity, including my employer, has or will have rights with respect to my
|
||||||
|
contributions.
|
||||||
|
|
||||||
|
* [ ] I am signing on behalf of my employer or a legal entity and I have the
|
||||||
|
actual authority to contractually bind that entity.
|
||||||
|
|
||||||
|
## Contributor Details
|
||||||
|
|
||||||
|
| Field | Entry |
|
||||||
|
|------------------------------- | ---------------------------- |
|
||||||
|
| Name | Gustavo Zadrozny Leyendecker |
|
||||||
|
| Company name (if applicable) | |
|
||||||
|
| Title or role (if applicable) | |
|
||||||
|
| Date | July 29, 2020 |
|
||||||
|
| GitHub username | leyendecker |
|
||||||
|
| Website (optional) | |
|
106
.github/contributors/lizhe2004.md
vendored
Normal file
|
@ -0,0 +1,106 @@
|
||||||
|
# spaCy contributor agreement
|
||||||
|
|
||||||
|
This spaCy Contributor Agreement (**"SCA"**) is based on the
|
||||||
|
[Oracle Contributor Agreement](http://www.oracle.com/technetwork/oca-405177.pdf).
|
||||||
|
The SCA applies to any contribution that you make to any product or project
|
||||||
|
managed by us (the **"project"**), and sets out the intellectual property rights
|
||||||
|
you grant to us in the contributed materials. The term **"us"** shall mean
|
||||||
|
[ExplosionAI UG (haftungsbeschränkt)](https://explosion.ai/legal). The term
|
||||||
|
**"you"** shall mean the person or entity identified below.
|
||||||
|
|
||||||
|
If you agree to be bound by these terms, fill in the information requested
|
||||||
|
below and include the filled-in version with your first pull request, under the
|
||||||
|
folder [`.github/contributors/`](/.github/contributors/). The name of the file
|
||||||
|
should be your GitHub username, with the extension `.md`. For example, the user
|
||||||
|
example_user would create the file `.github/contributors/example_user.md`.
|
||||||
|
|
||||||
|
Read this agreement carefully before signing. These terms and conditions
|
||||||
|
constitute a binding legal agreement.
|
||||||
|
|
||||||
|
## Contributor Agreement
|
||||||
|
|
||||||
|
1. The term "contribution" or "contributed materials" means any source code,
|
||||||
|
object code, patch, tool, sample, graphic, specification, manual,
|
||||||
|
documentation, or any other material posted or submitted by you to the project.
|
||||||
|
|
||||||
|
2. With respect to any worldwide copyrights, or copyright applications and
|
||||||
|
registrations, in your contribution:
|
||||||
|
|
||||||
|
* you hereby assign to us joint ownership, and to the extent that such
|
||||||
|
assignment is or becomes invalid, ineffective or unenforceable, you hereby
|
||||||
|
grant to us a perpetual, irrevocable, non-exclusive, worldwide, no-charge,
|
||||||
|
royalty-free, unrestricted license to exercise all rights under those
|
||||||
|
copyrights. This includes, at our option, the right to sublicense these same
|
||||||
|
rights to third parties through multiple levels of sublicensees or other
|
||||||
|
licensing arrangements;
|
||||||
|
|
||||||
|
* you agree that each of us can do all things in relation to your
|
||||||
|
contribution as if each of us were the sole owners, and if one of us makes
|
||||||
|
a derivative work of your contribution, the one who makes the derivative
|
||||||
|
work (or has it made will be the sole owner of that derivative work;
|
||||||
|
|
||||||
|
* you agree that you will not assert any moral rights in your contribution
|
||||||
|
against us, our licensees or transferees;
|
||||||
|
|
||||||
|
* you agree that we may register a copyright in your contribution and
|
||||||
|
exercise all ownership rights associated with it; and
|
||||||
|
|
||||||
|
* you agree that neither of us has any duty to consult with, obtain the
|
||||||
|
consent of, pay or render an accounting to the other for any use or
|
||||||
|
distribution of your contribution.
|
||||||
|
|
||||||
|
3. With respect to any patents you own, or that you can license without payment
|
||||||
|
to any third party, you hereby grant to us a perpetual, irrevocable,
|
||||||
|
non-exclusive, worldwide, no-charge, royalty-free license to:
|
||||||
|
|
||||||
|
* make, have made, use, sell, offer to sell, import, and otherwise transfer
|
||||||
|
your contribution in whole or in part, alone or in combination with or
|
||||||
|
included in any product, work or materials arising out of the project to
|
||||||
|
which your contribution was submitted, and
|
||||||
|
|
||||||
|
* at our option, to sublicense these same rights to third parties through
|
||||||
|
multiple levels of sublicensees or other licensing arrangements.
|
||||||
|
|
||||||
|
4. Except as set out above, you keep all right, title, and interest in your
|
||||||
|
contribution. The rights that you grant to us under these terms are effective
|
||||||
|
on the date you first submitted a contribution to us, even if your submission
|
||||||
|
took place before the date you sign these terms.
|
||||||
|
|
||||||
|
5. You covenant, represent, warrant and agree that:
|
||||||
|
|
||||||
|
* Each contribution that you submit is and shall be an original work of
|
||||||
|
authorship and you can legally grant the rights set out in this SCA;
|
||||||
|
|
||||||
|
* to the best of your knowledge, each contribution will not violate any
|
||||||
|
third party's copyrights, trademarks, patents, or other intellectual
|
||||||
|
property rights; and
|
||||||
|
|
||||||
|
* each contribution shall be in compliance with U.S. export control laws and
|
||||||
|
other applicable export and import laws. You agree to notify us if you
|
||||||
|
become aware of any circumstance which would make any of the foregoing
|
||||||
|
representations inaccurate in any respect. We may publicly disclose your
|
||||||
|
participation in the project, including the fact that you have signed the SCA.
|
||||||
|
|
||||||
|
6. This SCA is governed by the laws of the State of California and applicable
|
||||||
|
U.S. Federal law. Any choice of law rules will not apply.
|
||||||
|
|
||||||
|
7. Please place an “x” on one of the applicable statement below. Please do NOT
|
||||||
|
mark both statements:
|
||||||
|
|
||||||
|
* [x] I am signing on behalf of myself as an individual and no other person
|
||||||
|
or entity, including my employer, has or will have rights with respect to my
|
||||||
|
contributions.
|
||||||
|
|
||||||
|
* [ ] I am signing on behalf of my employer or a legal entity and I have the
|
||||||
|
actual authority to contractually bind that entity.
|
||||||
|
|
||||||
|
## Contributor Details
|
||||||
|
|
||||||
|
| Field | Entry |
|
||||||
|
|------------------------------- | ------------------------ |
|
||||||
|
| Name | Zhe li |
|
||||||
|
| Company name (if applicable) | |
|
||||||
|
| Title or role (if applicable) | |
|
||||||
|
| Date | 2020-07-24 |
|
||||||
|
| GitHub username | lizhe2004 |
|
||||||
|
| Website (optional) | http://www.huahuaxia.net|
|
106
.github/contributors/snsten.md
vendored
Normal file
|
@ -0,0 +1,106 @@
|
||||||
|
# spaCy contributor agreement
|
||||||
|
|
||||||
|
This spaCy Contributor Agreement (**"SCA"**) is based on the
|
||||||
|
[Oracle Contributor Agreement](http://www.oracle.com/technetwork/oca-405177.pdf).
|
||||||
|
The SCA applies to any contribution that you make to any product or project
|
||||||
|
managed by us (the **"project"**), and sets out the intellectual property rights
|
||||||
|
you grant to us in the contributed materials. The term **"us"** shall mean
|
||||||
|
[ExplosionAI GmbH](https://explosion.ai/legal). The term
|
||||||
|
**"you"** shall mean the person or entity identified below.
|
||||||
|
|
||||||
|
If you agree to be bound by these terms, fill in the information requested
|
||||||
|
below and include the filled-in version with your first pull request, under the
|
||||||
|
folder [`.github/contributors/`](/.github/contributors/). The name of the file
|
||||||
|
should be your GitHub username, with the extension `.md`. For example, the user
|
||||||
|
example_user would create the file `.github/contributors/example_user.md`.
|
||||||
|
|
||||||
|
Read this agreement carefully before signing. These terms and conditions
|
||||||
|
constitute a binding legal agreement.
|
||||||
|
|
||||||
|
## Contributor Agreement
|
||||||
|
|
||||||
|
1. The term "contribution" or "contributed materials" means any source code,
|
||||||
|
object code, patch, tool, sample, graphic, specification, manual,
|
||||||
|
documentation, or any other material posted or submitted by you to the project.
|
||||||
|
|
||||||
|
2. With respect to any worldwide copyrights, or copyright applications and
|
||||||
|
registrations, in your contribution:
|
||||||
|
|
||||||
|
* you hereby assign to us joint ownership, and to the extent that such
|
||||||
|
assignment is or becomes invalid, ineffective or unenforceable, you hereby
|
||||||
|
grant to us a perpetual, irrevocable, non-exclusive, worldwide, no-charge,
|
||||||
|
royalty-free, unrestricted license to exercise all rights under those
|
||||||
|
copyrights. This includes, at our option, the right to sublicense these same
|
||||||
|
rights to third parties through multiple levels of sublicensees or other
|
||||||
|
licensing arrangements;
|
||||||
|
|
||||||
|
* you agree that each of us can do all things in relation to your
|
||||||
|
contribution as if each of us were the sole owners, and if one of us makes
|
||||||
|
a derivative work of your contribution, the one who makes the derivative
|
||||||
|
work (or has it made will be the sole owner of that derivative work;
|
||||||
|
|
||||||
|
* you agree that you will not assert any moral rights in your contribution
|
||||||
|
against us, our licensees or transferees;
|
||||||
|
|
||||||
|
* you agree that we may register a copyright in your contribution and
|
||||||
|
exercise all ownership rights associated with it; and
|
||||||
|
|
||||||
|
* you agree that neither of us has any duty to consult with, obtain the
|
||||||
|
consent of, pay or render an accounting to the other for any use or
|
||||||
|
distribution of your contribution.
|
||||||
|
|
||||||
|
3. With respect to any patents you own, or that you can license without payment
|
||||||
|
to any third party, you hereby grant to us a perpetual, irrevocable,
|
||||||
|
non-exclusive, worldwide, no-charge, royalty-free license to:
|
||||||
|
|
||||||
|
* make, have made, use, sell, offer to sell, import, and otherwise transfer
|
||||||
|
your contribution in whole or in part, alone or in combination with or
|
||||||
|
included in any product, work or materials arising out of the project to
|
||||||
|
which your contribution was submitted, and
|
||||||
|
|
||||||
|
* at our option, to sublicense these same rights to third parties through
|
||||||
|
multiple levels of sublicensees or other licensing arrangements.
|
||||||
|
|
||||||
|
4. Except as set out above, you keep all right, title, and interest in your
|
||||||
|
contribution. The rights that you grant to us under these terms are effective
|
||||||
|
on the date you first submitted a contribution to us, even if your submission
|
||||||
|
took place before the date you sign these terms.
|
||||||
|
|
||||||
|
5. You covenant, represent, warrant and agree that:
|
||||||
|
|
||||||
|
* Each contribution that you submit is and shall be an original work of
|
||||||
|
authorship and you can legally grant the rights set out in this SCA;
|
||||||
|
|
||||||
|
* to the best of your knowledge, each contribution will not violate any
|
||||||
|
third party's copyrights, trademarks, patents, or other intellectual
|
||||||
|
property rights; and
|
||||||
|
|
||||||
|
* each contribution shall be in compliance with U.S. export control laws and
|
||||||
|
other applicable export and import laws. You agree to notify us if you
|
||||||
|
become aware of any circumstance which would make any of the foregoing
|
||||||
|
representations inaccurate in any respect. We may publicly disclose your
|
||||||
|
participation in the project, including the fact that you have signed the SCA.
|
||||||
|
|
||||||
|
6. This SCA is governed by the laws of the State of California and applicable
|
||||||
|
U.S. Federal law. Any choice of law rules will not apply.
|
||||||
|
|
||||||
|
7. Please place an “x” on one of the applicable statement below. Please do NOT
|
||||||
|
mark both statements:
|
||||||
|
|
||||||
|
* [x] I am signing on behalf of myself as an individual and no other person
|
||||||
|
or entity, including my employer, has or will have rights with respect to my
|
||||||
|
contributions.
|
||||||
|
|
||||||
|
* [ ] I am signing on behalf of my employer or a legal entity and I have the
|
||||||
|
actual authority to contractually bind that entity.
|
||||||
|
|
||||||
|
## Contributor Details
|
||||||
|
|
||||||
|
| Field | Entry |
|
||||||
|
|------------------------------- | -------------------- |
|
||||||
|
| Name | Shashank Shekhar |
|
||||||
|
| Company name (if applicable) | |
|
||||||
|
| Title or role (if applicable) | |
|
||||||
|
| Date | 2020-08-23 |
|
||||||
|
| GitHub username | snsten |
|
||||||
|
| Website (optional) | |
|
106
.github/contributors/solarmist.md
vendored
Normal file
|
@ -0,0 +1,106 @@
|
||||||
|
# spaCy contributor agreement
|
||||||
|
|
||||||
|
This spaCy Contributor Agreement (**"SCA"**) is based on the
|
||||||
|
[Oracle Contributor Agreement](http://www.oracle.com/technetwork/oca-405177.pdf).
|
||||||
|
The SCA applies to any contribution that you make to any product or project
|
||||||
|
managed by us (the **"project"**), and sets out the intellectual property rights
|
||||||
|
you grant to us in the contributed materials. The term **"us"** shall mean
|
||||||
|
[ExplosionAI GmbH](https://explosion.ai/legal). The term
|
||||||
|
**"you"** shall mean the person or entity identified below.
|
||||||
|
|
||||||
|
If you agree to be bound by these terms, fill in the information requested
|
||||||
|
below and include the filled-in version with your first pull request, under the
|
||||||
|
folder [`.github/contributors/`](/.github/contributors/). The name of the file
|
||||||
|
should be your GitHub username, with the extension `.md`. For example, the user
|
||||||
|
example_user would create the file `.github/contributors/example_user.md`.
|
||||||
|
|
||||||
|
Read this agreement carefully before signing. These terms and conditions
|
||||||
|
constitute a binding legal agreement.
|
||||||
|
|
||||||
|
## Contributor Agreement
|
||||||
|
|
||||||
|
1. The term "contribution" or "contributed materials" means any source code,
|
||||||
|
object code, patch, tool, sample, graphic, specification, manual,
|
||||||
|
documentation, or any other material posted or submitted by you to the project.
|
||||||
|
|
||||||
|
2. With respect to any worldwide copyrights, or copyright applications and
|
||||||
|
registrations, in your contribution:
|
||||||
|
|
||||||
|
* you hereby assign to us joint ownership, and to the extent that such
|
||||||
|
assignment is or becomes invalid, ineffective or unenforceable, you hereby
|
||||||
|
grant to us a perpetual, irrevocable, non-exclusive, worldwide, no-charge,
|
||||||
|
royalty-free, unrestricted license to exercise all rights under those
|
||||||
|
copyrights. This includes, at our option, the right to sublicense these same
|
||||||
|
rights to third parties through multiple levels of sublicensees or other
|
||||||
|
licensing arrangements;
|
||||||
|
|
||||||
|
* you agree that each of us can do all things in relation to your
|
||||||
|
contribution as if each of us were the sole owners, and if one of us makes
|
||||||
|
a derivative work of your contribution, the one who makes the derivative
|
||||||
|
work (or has it made will be the sole owner of that derivative work;
|
||||||
|
|
||||||
|
* you agree that you will not assert any moral rights in your contribution
|
||||||
|
against us, our licensees or transferees;
|
||||||
|
|
||||||
|
* you agree that we may register a copyright in your contribution and
|
||||||
|
exercise all ownership rights associated with it; and
|
||||||
|
|
||||||
|
* you agree that neither of us has any duty to consult with, obtain the
|
||||||
|
consent of, pay or render an accounting to the other for any use or
|
||||||
|
distribution of your contribution.
|
||||||
|
|
||||||
|
3. With respect to any patents you own, or that you can license without payment
|
||||||
|
to any third party, you hereby grant to us a perpetual, irrevocable,
|
||||||
|
non-exclusive, worldwide, no-charge, royalty-free license to:
|
||||||
|
|
||||||
|
* make, have made, use, sell, offer to sell, import, and otherwise transfer
|
||||||
|
your contribution in whole or in part, alone or in combination with or
|
||||||
|
included in any product, work or materials arising out of the project to
|
||||||
|
which your contribution was submitted, and
|
||||||
|
|
||||||
|
* at our option, to sublicense these same rights to third parties through
|
||||||
|
multiple levels of sublicensees or other licensing arrangements.
|
||||||
|
|
||||||
|
4. Except as set out above, you keep all right, title, and interest in your
|
||||||
|
contribution. The rights that you grant to us under these terms are effective
|
||||||
|
on the date you first submitted a contribution to us, even if your submission
|
||||||
|
took place before the date you sign these terms.
|
||||||
|
|
||||||
|
5. You covenant, represent, warrant and agree that:
|
||||||
|
|
||||||
|
* Each contribution that you submit is and shall be an original work of
|
||||||
|
authorship and you can legally grant the rights set out in this SCA;
|
||||||
|
|
||||||
|
* to the best of your knowledge, each contribution will not violate any
|
||||||
|
third party's copyrights, trademarks, patents, or other intellectual
|
||||||
|
property rights; and
|
||||||
|
|
||||||
|
* each contribution shall be in compliance with U.S. export control laws and
|
||||||
|
other applicable export and import laws. You agree to notify us if you
|
||||||
|
become aware of any circumstance which would make any of the foregoing
|
||||||
|
representations inaccurate in any respect. We may publicly disclose your
|
||||||
|
participation in the project, including the fact that you have signed the SCA.
|
||||||
|
|
||||||
|
6. This SCA is governed by the laws of the State of California and applicable
|
||||||
|
U.S. Federal law. Any choice of law rules will not apply.
|
||||||
|
|
||||||
|
7. Please place an “x” on one of the applicable statement below. Please do NOT
|
||||||
|
mark both statements:
|
||||||
|
|
||||||
|
* [x] I am signing on behalf of myself as an individual and no other person
|
||||||
|
or entity, including my employer, has or will have rights with respect my
|
||||||
|
contributions.
|
||||||
|
|
||||||
|
* [ ] I am signing on behalf of my employer or a legal entity and I have the
|
||||||
|
actual authority to contractually bind that entity.
|
||||||
|
|
||||||
|
## Contributor Details
|
||||||
|
|
||||||
|
| Field | Entry |
|
||||||
|
|------------------------------- | ------------------------- |
|
||||||
|
| Name | Joshua Olson |
|
||||||
|
| Company name (if applicable) | |
|
||||||
|
| Title or role (if applicable) | |
|
||||||
|
| Date | 2020-07-22 |
|
||||||
|
| GitHub username | solarmist |
|
||||||
|
| Website (optional) | http://blog.solarmist.net |
|
106
.github/contributors/tilusnet.md
vendored
Normal file
|
@ -0,0 +1,106 @@
|
||||||
|
# spaCy contributor agreement
|
||||||
|
|
||||||
|
This spaCy Contributor Agreement (**"SCA"**) is based on the
|
||||||
|
[Oracle Contributor Agreement](http://www.oracle.com/technetwork/oca-405177.pdf).
|
||||||
|
The SCA applies to any contribution that you make to any product or project
|
||||||
|
managed by us (the **"project"**), and sets out the intellectual property rights
|
||||||
|
you grant to us in the contributed materials. The term **"us"** shall mean
|
||||||
|
[ExplosionAI GmbH](https://explosion.ai/legal). The term
|
||||||
|
**"you"** shall mean the person or entity identified below.
|
||||||
|
|
||||||
|
If you agree to be bound by these terms, fill in the information requested
|
||||||
|
below and include the filled-in version with your first pull request, under the
|
||||||
|
folder [`.github/contributors/`](/.github/contributors/). The name of the file
|
||||||
|
should be your GitHub username, with the extension `.md`. For example, the user
|
||||||
|
example_user would create the file `.github/contributors/example_user.md`.
|
||||||
|
|
||||||
|
Read this agreement carefully before signing. These terms and conditions
|
||||||
|
constitute a binding legal agreement.
|
||||||
|
|
||||||
|
## Contributor Agreement
|
||||||
|
|
||||||
|
1. The term "contribution" or "contributed materials" means any source code,
|
||||||
|
object code, patch, tool, sample, graphic, specification, manual,
|
||||||
|
documentation, or any other material posted or submitted by you to the project.
|
||||||
|
|
||||||
|
2. With respect to any worldwide copyrights, or copyright applications and
|
||||||
|
registrations, in your contribution:
|
||||||
|
|
||||||
|
* you hereby assign to us joint ownership, and to the extent that such
|
||||||
|
assignment is or becomes invalid, ineffective or unenforceable, you hereby
|
||||||
|
grant to us a perpetual, irrevocable, non-exclusive, worldwide, no-charge,
|
||||||
|
royalty-free, unrestricted license to exercise all rights under those
|
||||||
|
copyrights. This includes, at our option, the right to sublicense these same
|
||||||
|
rights to third parties through multiple levels of sublicensees or other
|
||||||
|
licensing arrangements;
|
||||||
|
|
||||||
|
* you agree that each of us can do all things in relation to your
|
||||||
|
contribution as if each of us were the sole owners, and if one of us makes
|
||||||
|
a derivative work of your contribution, the one who makes the derivative
|
||||||
|
work (or has it made will be the sole owner of that derivative work;
|
||||||
|
|
||||||
|
* you agree that you will not assert any moral rights in your contribution
|
||||||
|
against us, our licensees or transferees;
|
||||||
|
|
||||||
|
* you agree that we may register a copyright in your contribution and
|
||||||
|
exercise all ownership rights associated with it; and
|
||||||
|
|
||||||
|
* you agree that neither of us has any duty to consult with, obtain the
|
||||||
|
consent of, pay or render an accounting to the other for any use or
|
||||||
|
distribution of your contribution.
|
||||||
|
|
||||||
|
3. With respect to any patents you own, or that you can license without payment
|
||||||
|
to any third party, you hereby grant to us a perpetual, irrevocable,
|
||||||
|
non-exclusive, worldwide, no-charge, royalty-free license to:
|
||||||
|
|
||||||
|
* make, have made, use, sell, offer to sell, import, and otherwise transfer
|
||||||
|
your contribution in whole or in part, alone or in combination with or
|
||||||
|
included in any product, work or materials arising out of the project to
|
||||||
|
which your contribution was submitted, and
|
||||||
|
|
||||||
|
* at our option, to sublicense these same rights to third parties through
|
||||||
|
multiple levels of sublicensees or other licensing arrangements.
|
||||||
|
|
||||||
|
4. Except as set out above, you keep all right, title, and interest in your
|
||||||
|
contribution. The rights that you grant to us under these terms are effective
|
||||||
|
on the date you first submitted a contribution to us, even if your submission
|
||||||
|
took place before the date you sign these terms.
|
||||||
|
|
||||||
|
5. You covenant, represent, warrant and agree that:
|
||||||
|
|
||||||
|
* Each contribution that you submit is and shall be an original work of
|
||||||
|
authorship and you can legally grant the rights set out in this SCA;
|
||||||
|
|
||||||
|
* to the best of your knowledge, each contribution will not violate any
|
||||||
|
third party's copyrights, trademarks, patents, or other intellectual
|
||||||
|
property rights; and
|
||||||
|
|
||||||
|
* each contribution shall be in compliance with U.S. export control laws and
|
||||||
|
other applicable export and import laws. You agree to notify us if you
|
||||||
|
become aware of any circumstance which would make any of the foregoing
|
||||||
|
representations inaccurate in any respect. We may publicly disclose your
|
||||||
|
participation in the project, including the fact that you have signed the SCA.
|
||||||
|
|
||||||
|
6. This SCA is governed by the laws of the State of California and applicable
|
||||||
|
U.S. Federal law. Any choice of law rules will not apply.
|
||||||
|
|
||||||
|
7. Please place an “x” on one of the applicable statement below. Please do NOT
|
||||||
|
mark both statements:
|
||||||
|
|
||||||
|
* [x] I am signing on behalf of myself as an individual and no other person
|
||||||
|
or entity, including my employer, has or will have rights with respect to my
|
||||||
|
contributions.
|
||||||
|
|
||||||
|
* [ ] I am signing on behalf of my employer or a legal entity and I have the
|
||||||
|
actual authority to contractually bind that entity.
|
||||||
|
|
||||||
|
## Contributor Details
|
||||||
|
|
||||||
|
| Field | Entry |
|
||||||
|
|------------------------------- | -------------------- |
|
||||||
|
| Name | Attila Szász |
|
||||||
|
| Company name (if applicable) | |
|
||||||
|
| Title or role (if applicable) | |
|
||||||
|
| Date | 12 Aug 2020 |
|
||||||
|
| GitHub username | tilusnet |
|
||||||
|
| Website (optional) | |
|
38
licenses/3rd_party_licenses.txt
Normal file
|
@ -0,0 +1,38 @@
|
||||||
|
Third Party Licenses for spaCy
|
||||||
|
==============================
|
||||||
|
|
||||||
|
NumPy
|
||||||
|
-----
|
||||||
|
|
||||||
|
* Files: setup.py
|
||||||
|
|
||||||
|
Copyright (c) 2005-2020, NumPy Developers.
|
||||||
|
All rights reserved.
|
||||||
|
|
||||||
|
Redistribution and use in source and binary forms, with or without
|
||||||
|
modification, are permitted provided that the following conditions are
|
||||||
|
met:
|
||||||
|
|
||||||
|
* Redistributions of source code must retain the above copyright
|
||||||
|
notice, this list of conditions and the following disclaimer.
|
||||||
|
|
||||||
|
* Redistributions in binary form must reproduce the above
|
||||||
|
copyright notice, this list of conditions and the following
|
||||||
|
disclaimer in the documentation and/or other materials provided
|
||||||
|
with the distribution.
|
||||||
|
|
||||||
|
* Neither the name of the NumPy Developers nor the names of any
|
||||||
|
contributors may be used to endorse or promote products derived
|
||||||
|
from this software without specific prior written permission.
|
||||||
|
|
||||||
|
THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS
|
||||||
|
"AS IS" AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT
|
||||||
|
LIMITED TO, THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR
|
||||||
|
A PARTICULAR PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL THE COPYRIGHT
|
||||||
|
OWNER OR CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL,
|
||||||
|
SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT
|
||||||
|
LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF USE,
|
||||||
|
DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY
|
||||||
|
THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT
|
||||||
|
(INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE
|
||||||
|
OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.
|
|
@ -329,7 +329,11 @@ class EntityRenderer:
|
||||||
else:
|
else:
|
||||||
markup += entity
|
markup += entity
|
||||||
offset = end
|
offset = end
|
||||||
markup += escape_html(text[offset:])
|
fragments = text[offset:].split("\n")
|
||||||
|
for i, fragment in enumerate(fragments):
|
||||||
|
markup += escape_html(fragment)
|
||||||
|
if len(fragments) > 1 and i != len(fragments) - 1:
|
||||||
|
markup += "</br>"
|
||||||
markup = TPL_ENTS.format(content=markup, dir=self.direction)
|
markup = TPL_ENTS.format(content=markup, dir=self.direction)
|
||||||
if title:
|
if title:
|
||||||
markup = TPL_TITLE.format(title=title) + markup
|
markup = TPL_TITLE.format(title=title) + markup
|
||||||
|
|
|
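Note on the `EntityRenderer` hunk above: the added lines split the text after the last entity on newlines, escape each fragment separately, and place a break between fragments. A minimal standalone sketch of that logic follows. The name `render_tail` is hypothetical, and `html.escape` is used here only as a stand-in for spaCy's own `escape_html` helper, which may escape a different set of characters. The `"</br>"` string is kept verbatim from the hunk.

```python
from html import escape as escape_html  # stand-in (assumption), not spaCy's helper


def render_tail(text: str, offset: int) -> str:
    """Escape the text after `offset`, inserting a break between lines.

    Hypothetical helper that mirrors only the lines added in the hunk.
    """
    markup = ""
    fragments = text[offset:].split("\n")
    for i, fragment in enumerate(fragments):
        markup += escape_html(fragment)
        # Add a break after every fragment except the last one.
        if len(fragments) > 1 and i != len(fragments) - 1:
            markup += "</br>"
    return markup


if __name__ == "__main__":
    print(render_tail("first line\nsecond <line>", 0))
```

With a single-line tail the loop runs once and no break is added, so the output matches the old `escape_html(text[offset:])` behaviour for text without newlines.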
@ -76,6 +76,10 @@ class Warnings:
|
||||||
"If this is surprising, make sure you have the spacy-lookups-data "
|
"If this is surprising, make sure you have the spacy-lookups-data "
|
||||||
"package installed. The languages with lexeme normalization tables "
|
"package installed. The languages with lexeme normalization tables "
|
||||||
"are currently: {langs}")
|
"are currently: {langs}")
|
||||||
|
W034 = ("Please install the package spacy-lookups-data in order to include "
|
||||||
|
"the default lexeme normalization table for the language '{lang}'.")
|
||||||
|
W035 = ('Discarding subpattern "{pattern}" due to an unrecognized '
|
||||||
|
"attribute or operator.")
|
||||||
|
|
||||||
# TODO: fix numbering after merging develop into master
|
# TODO: fix numbering after merging develop into master
|
||||||
W090 = ("Could not locate any binary .spacy files in path '{path}'.")
|
W090 = ("Could not locate any binary .spacy files in path '{path}'.")
|
||||||
|
@ -474,6 +478,9 @@ class Errors:
|
||||||
E198 = ("Unable to return {n} most similar vectors for the current vectors "
|
E198 = ("Unable to return {n} most similar vectors for the current vectors "
|
||||||
"table, which contains {n_rows} vectors.")
|
"table, which contains {n_rows} vectors.")
|
||||||
E199 = ("Unable to merge 0-length span at doc[{start}:{end}].")
|
E199 = ("Unable to merge 0-length span at doc[{start}:{end}].")
|
||||||
|
E200 = ("Specifying a base model with a pretrained component '{component}' "
|
||||||
|
"can not be combined with adding a pretrained Tok2Vec layer.")
|
||||||
|
E201 = ("Span index out of range.")
|
||||||
|
|
||||||
# TODO: fix numbering after merging develop into master
|
# TODO: fix numbering after merging develop into master
|
||||||
E925 = ("Invalid color values for displaCy visualizer: expected dictionary "
|
E925 = ("Invalid color values for displaCy visualizer: expected dictionary "
|
||||||
|
|
|
@ -1,9 +1,11 @@
|
||||||
from .stop_words import STOP_WORDS
|
from .stop_words import STOP_WORDS
|
||||||
|
from .lex_attrs import LEX_ATTRS
|
||||||
from ...language import Language
|
from ...language import Language
|
||||||
|
|
||||||
|
|
||||||
class CzechDefaults(Language.Defaults):
|
class CzechDefaults(Language.Defaults):
|
||||||
stop_words = STOP_WORDS
|
stop_words = STOP_WORDS
|
||||||
|
lex_attr_getters = LEX_ATTRS
|
||||||
|
|
||||||
|
|
||||||
class Czech(Language):
|
class Czech(Language):
|
||||||
|
|
38
spacy/lang/cs/examples.py
Normal file
|
@ -0,0 +1,38 @@
|
||||||
|
"""
|
||||||
|
Example sentences to test spaCy and its language models.
|
||||||
|
>>> from spacy.lang.cs.examples import sentences
|
||||||
|
>>> docs = nlp.pipe(sentences)
|
||||||
|
"""
|
||||||
|
|
||||||
|
|
||||||
|
sentences = [
|
||||||
|
"Máma mele maso.",
|
||||||
|
"Příliš žluťoučký kůň úpěl ďábelské ódy.",
|
||||||
|
"ArcGIS je geografický informační systém určený pro práci s prostorovými daty.",
|
||||||
|
"Může data vytvářet a spravovat, ale především je dokáže analyzovat, najít v nich nové vztahy a vše přehledně vizualizovat.",
|
||||||
|
"Dnes je krásné počasí.",
|
||||||
|
"Nestihl autobus, protože pozdě vstal z postele.",
|
||||||
|
"Než budeš jíst, jdi si umýt ruce.",
|
||||||
|
"Dnes je neděle.",
|
||||||
|
"Škola začíná v 8:00.",
|
||||||
|
"Poslední autobus jede v jedenáct hodin večer.",
|
||||||
|
"V roce 2020 se téměř zastavila světová ekonomika.",
|
||||||
|
"Praha je hlavní město České republiky.",
|
||||||
|
"Kdy půjdeš ven?",
|
||||||
|
"Kam pojedete na dovolenou?",
|
||||||
|
"Kolik stojí iPhone 12?",
|
||||||
|
"Průměrná mzda je 30000 Kč.",
|
||||||
|
"1. ledna 1993 byla založena Česká republika.",
|
||||||
|
"Co se stalo 21.8.1968?",
|
||||||
|
"Moje telefonní číslo je 712 345 678.",
|
||||||
|
"Můj pes má blechy.",
|
||||||
|
"Když bude přes noc více než 20°, tak nás čeká tropická noc.",
|
||||||
|
"Kolik bylo letos tropických nocí?",
|
||||||
|
"Jak to mám udělat?",
|
||||||
|
"Bydlíme ve čtvrtém patře.",
|
||||||
|
"Vysílají 30. sezonu seriálu Simpsonovi.",
|
||||||
|
"Adresa ČVUT je Thákurova 7, 166 29, Praha 6.",
|
||||||
|
"Jaké PSČ má Praha 1?",
|
||||||
|
"PSČ Prahy 1 je 110 00.",
|
||||||
|
"Za 20 minut jede vlak.",
|
||||||
|
]
|
61
spacy/lang/cs/lex_attrs.py
Normal file
|
@ -0,0 +1,61 @@
|
||||||
|
from ...attrs import LIKE_NUM
|
||||||
|
|
||||||
|
_num_words = [
|
||||||
|
"nula",
|
||||||
|
"jedna",
|
||||||
|
"dva",
|
||||||
|
"tři",
|
||||||
|
"čtyři",
|
||||||
|
"pět",
|
||||||
|
"šest",
|
||||||
|
"sedm",
|
||||||
|
"osm",
|
||||||
|
"devět",
|
||||||
|
"deset",
|
||||||
|
"jedenáct",
|
||||||
|
"dvanáct",
|
||||||
|
"třináct",
|
||||||
|
"čtrnáct",
|
||||||
|
"patnáct",
|
||||||
|
"šestnáct",
|
||||||
|
"sedmnáct",
|
||||||
|
"osmnáct",
|
||||||
|
"devatenáct",
|
||||||
|
"dvacet",
|
||||||
|
"třicet",
|
||||||
|
"čtyřicet",
|
||||||
|
"padesát",
|
||||||
|
"šedesát",
|
||||||
|
"sedmdesát",
|
||||||
|
"osmdesát",
|
||||||
|
"devadesát",
|
||||||
|
"sto",
|
||||||
|
"tisíc",
|
||||||
|
"milion",
|
||||||
|
"miliarda",
|
||||||
|
"bilion",
|
||||||
|
"biliarda",
|
||||||
|
"trilion",
|
||||||
|
"triliarda",
|
||||||
|
"kvadrilion",
|
||||||
|
"kvadriliarda",
|
||||||
|
"kvintilion",
|
||||||
|
]
|
||||||
|
|
||||||
|
|
||||||
|
def like_num(text):
|
||||||
|
if text.startswith(("+", "-", "±", "~")):
|
||||||
|
text = text[1:]
|
||||||
|
text = text.replace(",", "").replace(".", "")
|
||||||
|
if text.isdigit():
|
||||||
|
return True
|
||||||
|
if text.count("/") == 1:
|
||||||
|
num, denom = text.split("/")
|
||||||
|
if num.isdigit() and denom.isdigit():
|
||||||
|
return True
|
||||||
|
if text.lower() in _num_words:
|
||||||
|
return True
|
||||||
|
return False
|
||||||
|
|
||||||
|
|
||||||
|
LEX_ATTRS = {LIKE_NUM: like_num}
|
|
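Note on `spacy/lang/cs/lex_attrs.py` above: the new `like_num` strips one leading sign, removes `,` and `.` separators, then accepts digit strings, simple fractions, and known number words. The sketch below copies that logic so it can be run without spaCy. The word list is abbreviated for illustration (the full list is in the diff), and the sample inputs are taken from the Czech example sentences added in the same commit.

```python
# Abbreviated for illustration; the real file lists many more words.
_num_words = ["nula", "jedna", "dva", "tři", "sto", "tisíc", "milion"]


def like_num(text):
    # Drop a single leading sign or approximation marker.
    if text.startswith(("+", "-", "±", "~")):
        text = text[1:]
    # Ignore thousands and decimal separators.
    text = text.replace(",", "").replace(".", "")
    if text.isdigit():
        return True
    # Accept simple fractions such as "1/2".
    if text.count("/") == 1:
        num, denom = text.split("/")
        if num.isdigit() and denom.isdigit():
            return True
    # Accept number words, case-insensitively.
    if text.lower() in _num_words:
        return True
    return False


if __name__ == "__main__":
    for token in ["30000", "21.8.1968", "1/2", "Tři", "8:00", "maso"]:
        print(token, like_num(token))
```

Because every `.` is removed before the digit check, a date-like token such as `21.8.1968` is reported as number-like, while `8:00` is not, since `:` is left in place.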
@ -1,14 +1,23 @@
|
||||||
# Source: https://github.com/Alir3z4/stop-words
|
# Source: https://github.com/Alir3z4/stop-words
|
||||||
|
# Source: https://github.com/stopwords-iso/stopwords-cs/blob/master/stopwords-cs.txt
|
||||||
|
|
||||||
STOP_WORDS = set(
|
STOP_WORDS = set(
|
||||||
"""
|
"""
|
||||||
ačkoli
|
a
|
||||||
|
aby
|
||||||
ahoj
|
ahoj
|
||||||
|
ačkoli
|
||||||
ale
|
ale
|
||||||
|
alespoň
|
||||||
anebo
|
anebo
|
||||||
|
ani
|
||||||
|
aniž
|
||||||
ano
|
ano
|
||||||
|
atd.
|
||||||
|
atp.
|
||||||
asi
|
asi
|
||||||
aspoň
|
aspoň
|
||||||
|
až
|
||||||
během
|
během
|
||||||
bez
|
bez
|
||||||
beze
|
beze
|
||||||
|
@ -21,12 +30,14 @@ budeš
|
||||||
budete
|
budete
|
||||||
budou
|
budou
|
||||||
budu
|
budu
|
||||||
|
by
|
||||||
byl
|
byl
|
||||||
byla
|
byla
|
||||||
byli
|
byli
|
||||||
bylo
|
bylo
|
||||||
byly
|
byly
|
||||||
bys
|
bys
|
||||||
|
být
|
||||||
čau
|
čau
|
||||||
chce
|
chce
|
||||||
chceme
|
chceme
|
||||||
|
@ -35,14 +46,21 @@ chcete
|
||||||
chci
|
chci
|
||||||
chtějí
|
chtějí
|
||||||
chtít
|
chtít
|
||||||
chut'
|
chuť
|
||||||
chuti
|
chuti
|
||||||
co
|
co
|
||||||
|
což
|
||||||
|
cz
|
||||||
|
či
|
||||||
|
článek
|
||||||
|
článku
|
||||||
|
články
|
||||||
čtrnáct
|
čtrnáct
|
||||||
čtyři
|
čtyři
|
||||||
dál
|
dál
|
||||||
dále
|
dále
|
||||||
daleko
|
daleko
|
||||||
|
další
|
||||||
děkovat
|
děkovat
|
||||||
děkujeme
|
děkujeme
|
||||||
děkuji
|
děkuji
|
||||||
|
@ -50,6 +68,7 @@ den
|
||||||
deset
|
deset
|
||||||
devatenáct
|
devatenáct
|
||||||
devět
|
devět
|
||||||
|
dnes
|
||||||
do
|
do
|
||||||
dobrý
|
dobrý
|
||||||
docela
|
docela
|
||||||
|
@ -57,9 +76,15 @@ dva
|
||||||
dvacet
|
dvacet
|
||||||
dvanáct
|
dvanáct
|
||||||
dvě
|
dvě
|
||||||
|
email
|
||||||
|
ho
|
||||||
hodně
|
hodně
|
||||||
|
i
|
||||||
já
|
já
|
||||||
jak
|
jak
|
||||||
|
jakmile
|
||||||
|
jako
|
||||||
|
jakož
|
||||||
jde
|
jde
|
||||||
je
|
je
|
||||||
jeden
|
jeden
|
||||||
|
@ -69,25 +94,39 @@ jedno
|
||||||
jednou
|
jednou
|
||||||
jedou
|
jedou
|
||||||
jeho
|
jeho
|
||||||
|
jehož
|
||||||
|
jej
|
||||||
její
|
její
|
||||||
jejich
|
jejich
|
||||||
|
jejichž
|
||||||
|
jehož
|
||||||
|
jelikož
|
||||||
jemu
|
jemu
|
||||||
jen
|
jen
|
||||||
jenom
|
jenom
|
||||||
|
jenž
|
||||||
|
jež
|
||||||
ještě
|
ještě
|
||||||
jestli
|
jestli
|
||||||
jestliže
|
jestliže
|
||||||
|
ještě
|
||||||
|
ji
|
||||||
jí
|
jí
|
||||||
jich
|
jich
|
||||||
jím
|
jím
|
||||||
|
jim
|
||||||
jimi
|
jimi
|
||||||
jinak
|
jinak
|
||||||
jsem
|
jiné
|
||||||
|
již
|
||||||
jsi
|
jsi
|
||||||
jsme
|
jsme
|
||||||
|
jsem
|
||||||
jsou
|
jsou
|
||||||
jste
|
jste
|
||||||
|
k
|
||||||
kam
|
kam
|
||||||
|
každý
|
||||||
kde
|
kde
|
||||||
kdo
|
kdo
|
||||||
kdy
|
kdy
|
||||||
|
@ -96,10 +135,13 @@ ke
|
||||||
kolik
|
kolik
|
||||||
kromě
|
kromě
|
||||||
která
|
která
|
||||||
|
kterak
|
||||||
|
kterou
|
||||||
které
|
které
|
||||||
kteří
|
kteří
|
||||||
který
|
který
|
||||||
kvůli
|
kvůli
|
||||||
|
ku
|
||||||
má
|
má
|
||||||
mají
|
mají
|
||||||
málo
|
málo
|
||||||
|
@ -110,8 +152,10 @@ máte
|
||||||
mé
|
mé
|
||||||
mě
|
mě
|
||||||
mezi
|
mezi
|
||||||
|
mi
|
||||||
mí
|
mí
|
||||||
mít
|
mít
|
||||||
|
mne
|
||||||
mně
|
mně
|
||||||
mnou
|
mnou
|
||||||
moc
|
moc
|
||||||
|
@ -134,6 +178,7 @@ nás
|
||||||
náš
|
náš
|
||||||
naše
|
naše
|
||||||
naši
|
naši
|
||||||
|
načež
|
||||||
ne
|
ne
|
||||||
ně
|
ně
|
||||||
nebo
|
nebo
|
||||||
|
@ -141,6 +186,7 @@ nebyl
|
||||||
nebyla
|
nebyla
|
||||||
nebyli
|
nebyli
|
||||||
nebyly
|
nebyly
|
||||||
|
nechť
|
||||||
něco
|
něco
|
||||||
nedělá
|
nedělá
|
||||||
nedělají
|
nedělají
|
||||||
|
@ -150,6 +196,7 @@ neděláš
|
||||||
neděláte
|
neděláte
|
||||||
nějak
|
nějak
|
||||||
nejsi
|
nejsi
|
||||||
|
nejsou
|
||||||
někde
|
někde
|
||||||
někdo
|
někdo
|
||||||
nemají
|
nemají
|
||||||
|
@ -157,15 +204,22 @@ nemáme
|
||||||
nemáte
|
nemáte
|
||||||
neměl
|
neměl
|
||||||
němu
|
němu
|
||||||
|
němuž
|
||||||
není
|
není
|
||||||
nestačí
|
nestačí
|
||||||
|
ně
|
||||||
nevadí
|
nevadí
|
||||||
|
nové
|
||||||
|
nový
|
||||||
|
noví
|
||||||
než
|
než
|
||||||
nic
|
nic
|
||||||
nich
|
nich
|
||||||
|
ní
|
||||||
ním
|
ním
|
||||||
nimi
|
nimi
|
||||||
nula
|
nula
|
||||||
|
o
|
||||||
od
|
od
|
||||||
ode
|
ode
|
||||||
on
|
on
|
||||||
|
@ -179,22 +233,37 @@ pak
|
||||||
patnáct
|
patnáct
|
||||||
pět
|
pět
|
||||||
po
|
po
|
||||||
|
pod
|
||||||
|
pokud
|
||||||
pořád
|
pořád
|
||||||
|
pouze
|
||||||
potom
|
potom
|
||||||
pozdě
|
pozdě
|
||||||
|
pravé
|
||||||
před
|
před
|
||||||
|
přede
|
||||||
přes
|
přes
|
||||||
přese
|
přece
|
||||||
pro
|
pro
|
||||||
proč
|
proč
|
||||||
prosím
|
prosím
|
||||||
prostě
|
prostě
|
||||||
|
proto
|
||||||
proti
|
proti
|
||||||
|
první
|
||||||
|
právě
|
||||||
protože
|
protože
|
||||||
|
při
|
||||||
|
přičemž
|
||||||
rovně
|
rovně
|
||||||
|
s
|
||||||
se
|
se
|
||||||
sedm
|
sedm
|
||||||
sedmnáct
|
sedmnáct
|
||||||
|
si
|
||||||
|
sice
|
||||||
|
skoro
|
||||||
|
sic
|
||||||
šest
|
šest
|
||||||
šestnáct
|
šestnáct
|
||||||
skoro
|
skoro
|
||||||
|
@ -203,41 +272,69 @@ smí
|
||||||
snad
|
snad
|
||||||
spolu
|
spolu
|
||||||
sta
|
sta
|
||||||
|
svůj
|
||||||
|
své
|
||||||
|
svá
|
||||||
|
svých
|
||||||
|
svým
|
||||||
|
svými
|
||||||
|
svůj
|
||||||
sté
|
sté
|
||||||
sto
|
sto
|
||||||
|
strana
|
||||||
ta
|
ta
|
||||||
tady
|
tady
|
||||||
tak
|
tak
|
||||||
takhle
|
takhle
|
||||||
taky
|
taky
|
||||||
|
také
|
||||||
|
takže
|
||||||
tam
|
tam
|
||||||
tamhle
|
támhle
|
||||||
tamhleto
|
támhleto
|
||||||
tamto
|
tamto
|
||||||
tě
|
tě
|
||||||
tebe
|
tebe
|
||||||
tebou
|
tebou
|
||||||
ted'
|
teď
|
||||||
tedy
|
tedy
|
||||||
ten
|
ten
|
||||||
|
tento
|
||||||
|
této
|
||||||
ti
|
ti
|
||||||
|
tím
|
||||||
|
tímto
|
||||||
tisíc
|
tisíc
|
||||||
tisíce
|
tisíce
|
||||||
to
|
to
|
||||||
tobě
|
tobě
|
||||||
tohle
|
tohle
|
||||||
|
tohoto
|
||||||
|
tom
|
||||||
|
tomto
|
||||||
|
tomu
|
||||||
|
tomuto
|
||||||
toto
|
toto
|
||||||
třeba
|
třeba
|
||||||
tři
|
tři
|
||||||
třináct
|
třináct
|
||||||
trošku
|
trošku
|
||||||
|
trochu
|
||||||
|
tu
|
||||||
|
tuto
|
||||||
tvá
|
tvá
|
||||||
tvé
|
tvé
|
||||||
tvoje
|
tvoje
|
||||||
tvůj
|
tvůj
|
||||||
ty
|
ty
|
||||||
|
tyto
|
||||||
|
těm
|
||||||
|
těma
|
||||||
|
těmi
|
||||||
|
u
|
||||||
určitě
|
určitě
|
||||||
už
|
už
|
||||||
|
v
|
||||||
vám
|
vám
|
||||||
vámi
|
vámi
|
||||||
vás
|
vás
|
||||||
|
@ -247,13 +344,19 @@ vaši
|
||||||
ve
|
ve
|
||||||
večer
|
večer
|
||||||
vedle
|
vedle
|
||||||
|
více
|
||||||
vlastně
|
vlastně
|
||||||
|
však
|
||||||
|
všechen
|
||||||
všechno
|
všechno
|
||||||
všichni
|
všichni
|
||||||
vůbec
|
vůbec
|
||||||
vy
|
vy
|
||||||
vždy
|
vždy
|
||||||
|
z
|
||||||
|
zda
|
||||||
za
|
za
|
||||||
|
zde
|
||||||
zač
|
zač
|
||||||
zatímco
|
zatímco
|
||||||
ze
|
ze
|
||||||
|
|
0
spacy/lang/cs/test_text.py
Normal file
|
@ -8,6 +8,14 @@ _num_words = [
|
||||||
"fifty", "sixty", "seventy", "eighty", "ninety", "hundred", "thousand",
|
"fifty", "sixty", "seventy", "eighty", "ninety", "hundred", "thousand",
|
||||||
"million", "billion", "trillion", "quadrillion", "gajillion", "bazillion"
|
"million", "billion", "trillion", "quadrillion", "gajillion", "bazillion"
|
||||||
]
|
]
|
||||||
|
_ordinal_words = [
|
||||||
|
"first", "second", "third", "fourth", "fifth", "sixth", "seventh", "eighth",
|
||||||
|
"ninth", "tenth", "eleventh", "twelfth", "thirteenth", "fourteenth",
|
||||||
|
"fifteenth", "sixteenth", "seventeenth", "eighteenth", "nineteenth",
|
||||||
|
"twentieth", "thirtieth", "fortieth", "fiftieth", "sixtieth", "seventieth",
|
||||||
|
"eightieth", "ninetieth", "hundredth", "thousandth", "millionth", "billionth",
|
||||||
|
"trillionth", "quadrillionth", "gajillionth", "bazillionth"
|
||||||
|
]
|
||||||
# fmt: on
|
# fmt: on
|
||||||
|
|
||||||
|
|
||||||
|
@ -21,8 +29,15 @@ def like_num(text: str) -> bool:
|
||||||
num, denom = text.split("/")
|
num, denom = text.split("/")
|
||||||
if num.isdigit() and denom.isdigit():
|
if num.isdigit() and denom.isdigit():
|
||||||
return True
|
return True
|
||||||
if text.lower() in _num_words:
|
text_lower = text.lower()
|
||||||
|
if text_lower in _num_words:
|
||||||
return True
|
return True
|
||||||
|
# Check ordinal number
|
||||||
|
if text_lower in _ordinal_words:
|
||||||
|
return True
|
||||||
|
if text_lower.endswith("th"):
|
||||||
|
if text_lower[:-2].isdigit():
|
||||||
|
return True
|
||||||
return False
|
return False
|
||||||
|
|
||||||
|
|
||||||
|
|
|
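Note on the English `like_num` hunk above: the added lines accept spelled-out ordinals and digit strings followed by `th`. The sketch below isolates just those two checks in a hypothetical helper named `looks_ordinal`, with an abbreviated word list. It is not the full function from the diff, which also handles signs, separators, fractions, and cardinal words.

```python
# Abbreviated for illustration; the diff adds a longer list.
_ordinal_words = ["first", "second", "third", "fourth", "fifth", "seventh", "tenth"]


def looks_ordinal(text: str) -> bool:
    """Hypothetical helper covering only the ordinal checks added in the hunk."""
    text_lower = text.lower()
    # Spelled-out ordinals, matched case-insensitively.
    if text_lower in _ordinal_words:
        return True
    # Digits followed by "th", e.g. "10th".
    if text_lower.endswith("th"):
        if text_lower[:-2].isdigit():
            return True
    return False


if __name__ == "__main__":
    for token in ["Third", "10th", "21st", "earth"]:
        print(token, looks_ordinal(token))
```

Only the `th` suffix is checked, so forms such as `21st`, `2nd`, or `3rd` are not recognised by this branch as written in the hunk.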
@ -19,8 +19,7 @@ def noun_chunks(doclike: Union[Doc, Span]) -> Iterator[Span]:
|
||||||
np_left_deps = [doc.vocab.strings.add(label) for label in left_labels]
|
np_left_deps = [doc.vocab.strings.add(label) for label in left_labels]
|
||||||
np_right_deps = [doc.vocab.strings.add(label) for label in right_labels]
|
np_right_deps = [doc.vocab.strings.add(label) for label in right_labels]
|
||||||
stop_deps = [doc.vocab.strings.add(label) for label in stop_labels]
|
stop_deps = [doc.vocab.strings.add(label) for label in stop_labels]
|
||||||
token = doc[0]
|
for token in doclike:
|
||||||
while token and token.i < len(doclike):
|
|
||||||
if token.pos in [PROPN, NOUN, PRON]:
|
if token.pos in [PROPN, NOUN, PRON]:
|
||||||
left, right = noun_bounds(
|
left, right = noun_bounds(
|
||||||
doc, token, np_left_deps, np_right_deps, stop_deps
|
doc, token, np_left_deps, np_right_deps, stop_deps
|
||||||
|
|
|
@ -1,9 +1,11 @@
|
||||||
from .stop_words import STOP_WORDS
|
from .stop_words import STOP_WORDS
|
||||||
|
from .lex_attrs import LEX_ATTRS
|
||||||
from ...language import Language
|
from ...language import Language
|
||||||
|
|
||||||
|
|
||||||
class HebrewDefaults(Language.Defaults):
|
class HebrewDefaults(Language.Defaults):
|
||||||
stop_words = STOP_WORDS
|
stop_words = STOP_WORDS
|
||||||
|
lex_attr_getters = LEX_ATTRS
|
||||||
writing_system = {"direction": "rtl", "has_case": False, "has_letters": True}
|
writing_system = {"direction": "rtl", "has_case": False, "has_letters": True}
|
||||||
|
|
||||||
|
|
||||||
|
|
95
spacy/lang/he/lex_attrs.py
Normal file
|
@ -0,0 +1,95 @@
|
||||||
|
from ...attrs import LIKE_NUM
|
||||||
|
|
||||||
|
_num_words = [
|
||||||
|
"אפס",
|
||||||
|
"אחד",
|
||||||
|
"אחת",
|
||||||
|
"שתיים",
|
||||||
|
"שתים",
|
||||||
|
"שניים",
|
||||||
|
"שנים",
|
||||||
|
"שלוש",
|
||||||
|
"שלושה",
|
||||||
|
"ארבע",
|
||||||
|
"ארבעה",
|
||||||
|
"חמש",
|
||||||
|
"חמישה",
|
||||||
|
"שש",
|
||||||
|
"שישה",
|
||||||
|
"שבע",
|
||||||
|
"שבעה",
|
||||||
|
"שמונה",
|
||||||
|
"תשע",
|
||||||
|
"תשעה",
|
||||||
|
"עשר",
|
||||||
|
"עשרה",
|
||||||
|
"אחד עשר",
|
||||||
|
"אחת עשרה",
|
||||||
|
"שנים עשר",
|
||||||
|
"שתים עשרה",
|
||||||
|
"שלושה עשר",
|
||||||
|
"שלוש עשרה",
|
||||||
|
"ארבעה עשר",
|
||||||
|
"ארבע עשרה",
|
||||||
|
"חמישה עשר",
|
||||||
|
"חמש עשרה",
|
||||||
|
"ששה עשר",
|
||||||
|
"שש עשרה",
|
||||||
|
"שבעה עשר",
|
||||||
|
"שבע עשרה",
|
||||||
|
"שמונה עשר",
|
||||||
|
"שמונה עשרה",
|
||||||
|
"תשעה עשר",
|
||||||
|
"תשע עשרה",
|
||||||
|
"עשרים",
|
||||||
|
"שלושים",
|
||||||
|
"ארבעים",
|
||||||
|
"חמישים",
|
||||||
|
"שישים",
|
||||||
|
"שבעים",
|
||||||
|
"שמונים",
|
||||||
|
"תשעים",
|
||||||
|
"מאה",
|
||||||
|
"אלף",
|
||||||
|
"מליון",
|
||||||
|
"מליארד",
|
||||||
|
"טריליון",
|
||||||
|
]
|
||||||
|
|
||||||
|
|
||||||
|
_ordinal_words = [
|
||||||
|
"ראשון",
|
||||||
|
"שני",
|
||||||
|
"שלישי",
|
||||||
|
"רביעי",
|
||||||
|
"חמישי",
|
||||||
|
"שישי",
|
||||||
|
"שביעי",
|
||||||
|
"שמיני",
|
||||||
|
"תשיעי",
|
||||||
|
"עשירי",
|
||||||
|
]
|
||||||
|
|
||||||
|
|
||||||
|
def like_num(text):
|
||||||
|
if text.startswith(("+", "-", "±", "~")):
|
||||||
|
text = text[1:]
|
||||||
|
text = text.replace(",", "").replace(".", "")
|
||||||
|
if text.isdigit():
|
||||||
|
return True
|
||||||
|
|
||||||
|
if text.count("/") == 1:
|
||||||
|
num, denom = text.split("/")
|
||||||
|
if num.isdigit() and denom.isdigit():
|
||||||
|
return True
|
||||||
|
|
||||||
|
if text in _num_words:
|
||||||
|
return True
|
||||||
|
|
||||||
|
# Check ordinal number
|
||||||
|
if text in _ordinal_words:
|
||||||
|
return True
|
||||||
|
return False
|
||||||
|
|
||||||
|
|
||||||
|
LEX_ATTRS = {LIKE_NUM: like_num}
|
|
@ -39,7 +39,6 @@ STOP_WORDS = set(
|
||||||
בין
|
בין
|
||||||
עם
|
עם
|
||||||
עד
|
עד
|
||||||
נגר
|
|
||||||
על
|
על
|
||||||
אל
|
אל
|
||||||
מול
|
מול
|
||||||
|
@ -58,7 +57,7 @@ STOP_WORDS = set(
|
||||||
עליך
|
עליך
|
||||||
עלינו
|
עלינו
|
||||||
עליכם
|
עליכם
|
||||||
לעיכן
|
עליכן
|
||||||
עליהם
|
עליהם
|
||||||
עליהן
|
עליהן
|
||||||
כל
|
כל
|
||||||
|
@ -67,8 +66,8 @@ STOP_WORDS = set(
|
||||||
כך
|
כך
|
||||||
ככה
|
ככה
|
||||||
כזה
|
כזה
|
||||||
|
כזאת
|
||||||
זה
|
זה
|
||||||
זות
|
|
||||||
אותי
|
אותי
|
||||||
אותה
|
אותה
|
||||||
אותם
|
אותם
|
||||||
|
@ -91,7 +90,7 @@ STOP_WORDS = set(
|
||||||
איתכן
|
איתכן
|
||||||
יהיה
|
יהיה
|
||||||
תהיה
|
תהיה
|
||||||
היתי
|
הייתי
|
||||||
היתה
|
היתה
|
||||||
היה
|
היה
|
||||||
להיות
|
להיות
|
||||||
|
@ -101,8 +100,6 @@ STOP_WORDS = set(
|
||||||
עצמם
|
עצמם
|
||||||
עצמן
|
עצמן
|
||||||
עצמנו
|
עצמנו
|
||||||
עצמהם
|
|
||||||
עצמהן
|
|
||||||
מי
|
מי
|
||||||
מה
|
מה
|
||||||
איפה
|
איפה
|
||||||
|
@ -153,6 +150,7 @@ STOP_WORDS = set(
|
||||||
לאו
|
לאו
|
||||||
אי
|
אי
|
||||||
כלל
|
כלל
|
||||||
|
בעד
|
||||||
נגד
|
נגד
|
||||||
אם
|
אם
|
||||||
עם
|
עם
|
||||||
|
@ -196,7 +194,6 @@ STOP_WORDS = set(
|
||||||
אשר
|
אשר
|
||||||
ואילו
|
ואילו
|
||||||
למרות
|
למרות
|
||||||
אס
|
|
||||||
כמו
|
כמו
|
||||||
כפי
|
כפי
|
||||||
אז
|
אז
|
||||||
|
@ -204,8 +201,8 @@ STOP_WORDS = set(
|
||||||
כן
|
כן
|
||||||
לכן
|
לכן
|
||||||
לפיכך
|
לפיכך
|
||||||
מאד
|
|
||||||
עז
|
עז
|
||||||
|
מאוד
|
||||||
מעט
|
מעט
|
||||||
מעטים
|
מעטים
|
||||||
במידה
|
במידה
|
||||||
|
|
|
@ -15,4 +15,6 @@ sentences = [
|
||||||
"फ्रांस के राष्ट्रपति कौन हैं?",
|
"फ्रांस के राष्ट्रपति कौन हैं?",
|
||||||
"संयुक्त राज्यों की राजधानी क्या है?",
|
"संयुक्त राज्यों की राजधानी क्या है?",
|
||||||
"बराक ओबामा का जन्म कब हुआ था?",
|
"बराक ओबामा का जन्म कब हुआ था?",
|
||||||
|
"जवाहरलाल नेहरू भारत के पहले प्रधानमंत्री हैं।",
|
||||||
|
"राजेंद्र प्रसाद, भारत के पहले राष्ट्रपति, दो कार्यकाल के लिए कार्यालय रखने वाले एकमात्र व्यक्ति हैं।",
|
||||||
]
|
]
|
||||||
|
|
|
@ -254,7 +254,7 @@ def get_dtokens_and_spaces(dtokens, text, gap_tag="空白"):
|
||||||
return text_dtokens, text_spaces
|
return text_dtokens, text_spaces
|
||||||
|
|
||||||
# align words and dtokens by referring text, and insert gap tokens for the space char spans
|
# align words and dtokens by referring text, and insert gap tokens for the space char spans
|
||||||
for word, dtoken in zip(words, dtokens):
|
for i, (word, dtoken) in enumerate(zip(words, dtokens)):
|
||||||
# skip all space tokens
|
# skip all space tokens
|
||||||
if word.isspace():
|
if word.isspace():
|
||||||
continue
|
continue
|
||||||
|
@ -275,7 +275,7 @@ def get_dtokens_and_spaces(dtokens, text, gap_tag="空白"):
|
||||||
text_spaces.append(False)
|
text_spaces.append(False)
|
||||||
text_pos += len(word)
|
text_pos += len(word)
|
||||||
# poll a space char after the word
|
# poll a space char after the word
|
||||||
if text_pos < len(text) and text[text_pos] == " ":
|
if i + 1 < len(dtokens) and dtokens[i + 1].surface == " ":
|
||||||
text_spaces[-1] = True
|
text_spaces[-1] = True
|
||||||
text_pos += 1
|
text_pos += 1
|
||||||
|
|
||||||
|
|
|
@ -8,7 +8,7 @@ from .. import attrs
|
||||||
_like_email = re.compile(r"([a-zA-Z0-9_.+-]+@[a-zA-Z0-9-]+\.[a-zA-Z0-9-.]+)").match
|
_like_email = re.compile(r"([a-zA-Z0-9_.+-]+@[a-zA-Z0-9-]+\.[a-zA-Z0-9-.]+)").match
|
||||||
_tlds = set(
|
_tlds = set(
|
||||||
"com|org|edu|gov|net|mil|aero|asia|biz|cat|coop|info|int|jobs|mobi|museum|"
|
"com|org|edu|gov|net|mil|aero|asia|biz|cat|coop|info|int|jobs|mobi|museum|"
|
||||||
"name|pro|tel|travel|xxx|ac|ad|ae|af|ag|ai|al|am|an|ao|aq|ar|as|at|au|aw|"
|
"name|pro|tel|travel|xyz|icu|xxx|ac|ad|ae|af|ag|ai|al|am|an|ao|aq|ar|as|at|au|aw|"
|
||||||
"ax|az|ba|bb|bd|be|bf|bg|bh|bi|bj|bm|bn|bo|br|bs|bt|bv|bw|by|bz|ca|cc|cd|"
|
"ax|az|ba|bb|bd|be|bf|bg|bh|bi|bj|bm|bn|bo|br|bs|bt|bv|bw|by|bz|ca|cc|cd|"
|
||||||
"cf|cg|ch|ci|ck|cl|cm|cn|co|cr|cs|cu|cv|cx|cy|cz|dd|de|dj|dk|dm|do|dz|ec|"
|
"cf|cg|ch|ci|ck|cl|cm|cn|co|cr|cs|cu|cv|cx|cy|cz|dd|de|dj|dk|dm|do|dz|ec|"
|
||||||
"ee|eg|eh|er|es|et|eu|fi|fj|fk|fm|fo|fr|ga|gb|gd|ge|gf|gg|gh|gi|gl|gm|gn|"
|
"ee|eg|eh|er|es|et|eu|fi|fj|fk|fm|fo|fr|ga|gb|gd|ge|gf|gg|gh|gi|gl|gm|gn|"
|
||||||
|
|
|
@ -1,7 +1,3 @@
|
||||||
# coding: utf8
|
|
||||||
from __future__ import unicode_literals
|
|
||||||
|
|
||||||
|
|
||||||
# Source: https://github.com/sanjaalcorps/NepaliStopWords/blob/master/NepaliStopWords.txt
|
# Source: https://github.com/sanjaalcorps/NepaliStopWords/blob/master/NepaliStopWords.txt
|
||||||
|
|
||||||
STOP_WORDS = set(
|
STOP_WORDS = set(
|
||||||
|
|
16
spacy/lang/sa/__init__.py
Normal file
|
@ -0,0 +1,16 @@
|
||||||
|
from .stop_words import STOP_WORDS
|
||||||
|
from .lex_attrs import LEX_ATTRS
|
||||||
|
from ...language import Language
|
||||||
|
|
||||||
|
|
||||||
|
class SanskritDefaults(Language.Defaults):
|
||||||
|
lex_attr_getters = LEX_ATTRS
|
||||||
|
stop_words = STOP_WORDS
|
||||||
|
|
||||||
|
|
||||||
|
class Sanskrit(Language):
|
||||||
|
lang = "sa"
|
||||||
|
Defaults = SanskritDefaults
|
||||||
|
|
||||||
|
|
||||||
|
__all__ = ["Sanskrit"]
|
15
spacy/lang/sa/examples.py
Normal file
|
@ -0,0 +1,15 @@
|
||||||
|
"""
|
||||||
|
Example sentences to test spaCy and its language models.
|
||||||
|
|
||||||
|
>>> from spacy.lang.sa.examples import sentences
|
||||||
|
>>> docs = nlp.pipe(sentences)
|
||||||
|
"""
|
||||||
|
|
||||||
|
|
||||||
|
sentences = [
|
||||||
|
"अभ्यावहति कल्याणं विविधं वाक् सुभाषिता ।",
|
||||||
|
"मनसि व्याकुले चक्षुः पश्यन्नपि न पश्यति ।",
|
||||||
|
"यस्य बुद्धिर्बलं तस्य निर्बुद्धेस्तु कुतो बलम्?",
|
||||||
|
"परो अपि हितवान् बन्धुः बन्धुः अपि अहितः परः ।",
|
||||||
|
"अहितः देहजः व्याधिः हितम् आरण्यं औषधम् ॥",
|
||||||
|
]
|
127
spacy/lang/sa/lex_attrs.py
Normal file
|
@ -0,0 +1,127 @@
|
||||||
|
from ...attrs import LIKE_NUM
|
||||||
|
|
||||||
|
# reference 1: https://en.wikibooks.org/wiki/Sanskrit/Numbers
|
||||||
|
|
||||||
|
_num_words = [
|
||||||
|
"एकः",
|
||||||
|
"द्वौ",
|
||||||
|
"त्रयः",
|
||||||
|
"चत्वारः",
|
||||||
|
"पञ्च",
|
||||||
|
"षट्",
|
||||||
|
"सप्त",
|
||||||
|
"अष्ट",
|
||||||
|
"नव",
|
||||||
|
"दश",
|
||||||
|
"एकादश",
|
||||||
|
"द्वादश",
|
||||||
|
"त्रयोदश",
|
||||||
|
"चतुर्दश",
|
||||||
|
"पञ्चदश",
|
||||||
|
"षोडश",
|
||||||
|
"सप्तदश",
|
||||||
|
"अष्टादश",
|
||||||
|
"एकान्नविंशति",
|
||||||
|
"विंशति",
|
||||||
|
"एकाविंशति",
|
||||||
|
"द्वाविंशति",
|
||||||
|
"त्रयोविंशति",
|
||||||
|
"चतुर्विंशति",
|
||||||
|
"पञ्चविंशति",
|
||||||
|
"षड्विंशति",
|
||||||
|
"सप्तविंशति",
|
||||||
|
"अष्टाविंशति",
|
||||||
|
"एकान्नत्रिंशत्",
|
||||||
|
"त्रिंशत्",
|
||||||
|
"एकत्रिंशत्",
|
||||||
|
"द्वात्रिंशत्",
|
||||||
|
"त्रयत्रिंशत्",
|
||||||
|
"चतुस्त्रिंशत्",
|
||||||
|
"पञ्चत्रिंशत्",
|
||||||
|
"षट्त्रिंशत्",
|
||||||
|
"सप्तत्रिंशत्",
|
||||||
|
"अष्टात्रिंशत्",
|
||||||
|
"एकोनचत्वारिंशत्",
|
||||||
|
"चत्वारिंशत्",
|
||||||
|
"एकचत्वारिंशत्",
|
||||||
|
"द्वाचत्वारिंशत्",
|
||||||
|
"त्रयश्चत्वारिंशत्",
|
||||||
|
"चतुश्चत्वारिंशत्",
|
||||||
|
"पञ्चचत्वारिंशत्",
|
||||||
|
"षट्चत्वारिंशत्",
|
||||||
|
"सप्तचत्वारिंशत्",
|
||||||
|
"अष्टाचत्वारिंशत्",
|
||||||
|
"एकोनपञ्चाशत्",
|
||||||
|
"पञ्चाशत्",
|
||||||
|
"एकपञ्चाशत्",
|
||||||
|
"द्विपञ्चाशत्",
|
||||||
|
"त्रिपञ्चाशत्",
|
||||||
|
"चतुःपञ्चाशत्",
|
||||||
|
"पञ्चपञ्चाशत्",
|
||||||
|
"षट्पञ्चाशत्",
|
||||||
|
"सप्तपञ्चाशत्",
|
||||||
|
"अष्टपञ्चाशत्",
|
||||||
|
"एकोनषष्ठिः",
|
||||||
|
"षष्ठिः",
|
||||||
|
"एकषष्ठिः",
|
||||||
|
"द्विषष्ठिः",
|
||||||
|
"त्रिषष्ठिः",
|
||||||
|
"चतुःषष्ठिः",
|
||||||
|
"पञ्चषष्ठिः",
|
||||||
|
"षट्षष्ठिः",
|
||||||
|
"सप्तषष्ठिः",
|
||||||
|
"अष्टषष्ठिः",
|
||||||
|
"एकोनसप्ततिः",
|
||||||
|
"सप्ततिः",
|
||||||
|
"एकसप्ततिः",
|
||||||
|
"द्विसप्ततिः",
|
||||||
|
"त्रिसप्ततिः",
|
||||||
|
"चतुःसप्ततिः",
|
||||||
|
"पञ्चसप्ततिः",
|
||||||
|
"षट्सप्ततिः",
|
||||||
|
"सप्तसप्ततिः",
|
||||||
|
"अष्टसप्ततिः",
|
||||||
|
"एकोनाशीतिः",
|
||||||
|
"अशीतिः",
|
||||||
|
"एकाशीतिः",
|
||||||
|
"द्वशीतिः",
|
||||||
|
"त्र्यशीतिः",
|
||||||
|
"चतुरशीतिः",
|
||||||
|
"पञ्चाशीतिः",
|
||||||
|
"षडशीतिः",
|
||||||
|
"सप्ताशीतिः",
|
||||||
|
"अष्टाशीतिः",
|
||||||
|
"एकोननवतिः",
|
||||||
|
"नवतिः",
|
||||||
|
"एकनवतिः",
|
||||||
|
"द्विनवतिः",
|
||||||
|
"त्रिनवतिः",
|
||||||
|
"चतुर्नवतिः",
|
||||||
|
"पञ्चनवतिः",
|
||||||
|
"षण्णवतिः",
|
||||||
|
"सप्तनवतिः",
|
||||||
|
"अष्टनवतिः",
|
||||||
|
"एकोनशतम्",
|
||||||
|
"शतम्",
|
||||||
|
]
|
||||||
|
|
||||||
|
|
||||||
|
def like_num(text):
|
||||||
|
"""
|
||||||
|
Check if text resembles a number
|
||||||
|
"""
|
||||||
|
if text.startswith(("+", "-", "±", "~")):
|
||||||
|
text = text[1:]
|
||||||
|
text = text.replace(",", "").replace(".", "")
|
||||||
|
if text.isdigit():
|
||||||
|
return True
|
||||||
|
if text.count("/") == 1:
|
||||||
|
num, denom = text.split("/")
|
||||||
|
if num.isdigit() and denom.isdigit():
|
||||||
|
return True
|
||||||
|
if text in _num_words:
|
||||||
|
return True
|
||||||
|
return False
|
||||||
|
|
||||||
|
|
||||||
|
LEX_ATTRS = {LIKE_NUM: like_num}
|
515
spacy/lang/sa/stop_words.py
Normal file
|
@ -0,0 +1,515 @@
|
||||||
|
# Source: https://gist.github.com/Akhilesh28/fe8b8e180f64b72e64751bc31cb6d323
|
||||||
|
|
||||||
|
STOP_WORDS = set(
|
||||||
|
"""
|
||||||
|
अहम्
|
||||||
|
आवाम्
|
||||||
|
वयम्
|
||||||
|
माम् मा
|
||||||
|
आवाम्
|
||||||
|
अस्मान् नः
|
||||||
|
मया
|
||||||
|
आवाभ्याम्
|
||||||
|
अस्माभिस्
|
||||||
|
मह्यम् मे
|
||||||
|
आवाभ्याम् नौ
|
||||||
|
अस्मभ्यम् नः
|
||||||
|
मत्
|
||||||
|
आवाभ्याम्
|
||||||
|
अस्मत्
|
||||||
|
मम मे
|
||||||
|
आवयोः
|
||||||
|
अस्माकम् नः
|
||||||
|
मयि
|
||||||
|
आवयोः
|
||||||
|
अस्मासु
|
||||||
|
त्वम्
|
||||||
|
युवाम्
|
||||||
|
यूयम्
|
||||||
|
त्वाम् त्वा
|
||||||
|
युवाम् वाम्
|
||||||
|
युष्मान् वः
|
||||||
|
त्वया
|
||||||
|
युवाभ्याम्
|
||||||
|
युष्माभिः
|
||||||
|
तुभ्यम् ते
|
||||||
|
युवाभ्याम् वाम्
|
||||||
|
युष्मभ्यम् वः
|
||||||
|
त्वत्
|
||||||
|
युवाभ्याम्
|
||||||
|
युष्मत्
|
||||||
|
तव ते
|
||||||
|
युवयोः वाम्
|
||||||
|
युष्माकम् वः
|
||||||
|
त्वयि
|
||||||
|
युवयोः
|
||||||
|
युष्मासु
|
||||||
|
सः
|
||||||
|
तौ
|
||||||
|
ते
|
||||||
|
तम्
|
||||||
|
तौ
|
||||||
|
तान्
|
||||||
|
तेन
|
||||||
|
ताभ्याम्
|
||||||
|
तैः
|
||||||
|
तस्मै
|
||||||
|
ताभ्याम्
|
||||||
|
तेभ्यः
|
||||||
|
तस्मात्
|
||||||
|
ताभ्याम्
|
||||||
|
तेभ्यः
|
||||||
|
तस्य
|
||||||
|
तयोः
|
||||||
|
तेषाम्
|
||||||
|
तस्मिन्
|
||||||
|
तयोः
|
||||||
|
तेषु
|
||||||
|
सा
|
||||||
|
ते
|
||||||
|
ताः
|
||||||
|
ताम्
|
||||||
|
ते
|
||||||
|
ताः
|
||||||
|
तया
|
||||||
|
ताभ्याम्
|
||||||
|
ताभिः
|
||||||
|
तस्यै
|
||||||
|
ताभ्याम्
|
||||||
|
ताभ्यः
|
||||||
|
तस्याः
|
||||||
|
ताभ्याम्
|
||||||
|
ताभ्यः
|
||||||
|
तस्य
|
||||||
|
तयोः
|
||||||
|
तासाम्
|
||||||
|
तस्याम्
|
||||||
|
तयोः
|
||||||
|
तासु
|
||||||
|
तत्
|
||||||
|
ते
|
||||||
|
तानि
|
||||||
|
तत्
|
||||||
|
ते
|
||||||
|
तानि
|
||||||
|
तया
|
||||||
|
ताभ्याम्
|
||||||
|
ताभिः
|
||||||
|
तस्यै
|
||||||
|
ताभ्याम्
|
||||||
|
ताभ्यः
|
||||||
|
तस्याः
|
||||||
|
ताभ्याम्
|
||||||
|
ताभ्यः
|
||||||
|
तस्य
|
||||||
|
तयोः
|
||||||
|
तासाम्
|
||||||
|
तस्याम्
|
||||||
|
तयोः
|
||||||
|
तासु
|
||||||
|
अयम्
|
||||||
|
इमौ
|
||||||
|
इमे
|
||||||
|
इमम्
|
||||||
|
इमौ
|
||||||
|
इमान्
|
||||||
|
अनेन
|
||||||
|
आभ्याम्
|
||||||
|
एभिः
|
||||||
|
अस्मै
|
||||||
|
आभ्याम्
|
||||||
|
एभ्यः
|
||||||
|
अस्मात्
|
||||||
|
आभ्याम्
|
||||||
|
एभ्यः
|
||||||
|
अस्य
|
||||||
|
अनयोः
|
||||||
|
एषाम्
|
||||||
|
अस्मिन्
|
||||||
|
अनयोः
|
||||||
|
एषु
|
||||||
|
इयम्
|
||||||
|
इमे
|
||||||
|
इमाः
|
||||||
|
इमाम्
|
||||||
|
इमे
|
||||||
|
इमाः
|
||||||
|
अनया
|
||||||
|
आभ्याम्
|
||||||
|
आभिः
|
||||||
|
अस्यै
|
||||||
|
आभ्याम्
|
||||||
|
आभ्यः
|
||||||
|
अस्याः
|
||||||
|
आभ्याम्
|
||||||
|
आभ्यः
|
||||||
|
अस्याः
|
||||||
|
अनयोः
|
||||||
|
आसाम्
|
||||||
|
अस्याम्
|
||||||
|
अनयोः
|
||||||
|
आसु
|
||||||
|
इदम्
|
||||||
|
इमे
|
||||||
|
इमानि
|
||||||
|
इदम्
|
||||||
|
इमे
|
||||||
|
इमानि
|
||||||
|
अनेन
|
||||||
|
आभ्याम्
|
||||||
|
एभिः
|
||||||
|
अस्मै
|
||||||
|
आभ्याम्
|
||||||
|
एभ्यः
|
||||||
|
अस्मात्
|
||||||
|
आभ्याम्
|
||||||
|
एभ्यः
|
||||||
|
अस्य
|
||||||
|
अनयोः
|
||||||
|
एषाम्
|
||||||
|
अस्मिन्
|
||||||
|
अनयोः
|
||||||
|
एषु
|
||||||
|
एषः
|
||||||
|
एतौ
|
||||||
|
एते
|
||||||
|
एतम् एनम्
|
||||||
|
एतौ एनौ
|
||||||
|
एतान् एनान्
|
||||||
|
एतेन
|
||||||
|
एताभ्याम्
|
||||||
|
एतैः
|
||||||
|
एतस्मै
|
||||||
|
एताभ्याम्
|
||||||
|
एतेभ्यः
|
||||||
|
एतस्मात्
|
||||||
|
एताभ्याम्
|
||||||
|
एतेभ्यः
|
||||||
|
एतस्य
|
||||||
|
एतस्मिन्
|
||||||
|
एतेषाम्
|
||||||
|
एतस्मिन्
|
||||||
|
एतस्मिन्
|
||||||
|
एतेषु
|
||||||
|
एषा
|
||||||
|
एते
|
||||||
|
एताः
|
||||||
|
एताम् एनाम्
|
||||||
|
एते एने
|
||||||
|
एताः एनाः
|
||||||
|
एतया एनया
|
||||||
|
एताभ्याम्
|
||||||
|
एताभिः
|
||||||
|
एतस्यै
|
||||||
|
एताभ्याम्
|
||||||
|
एताभ्यः
|
||||||
|
एतस्याः
|
||||||
|
एताभ्याम्
|
||||||
|
एताभ्यः
|
||||||
|
एतस्याः
|
||||||
|
एतयोः एनयोः
|
||||||
|
एतासाम्
|
||||||
|
एतस्याम्
|
||||||
|
एतयोः एनयोः
|
||||||
|
एतासु
|
||||||
|
एतत् एतद्
|
||||||
|
एते
|
||||||
|
एतानि
|
||||||
|
एतत् एतद् एनत् एनद्
|
||||||
|
एते एने
|
||||||
|
एतानि एनानि
|
||||||
|
एतेन एनेन
|
||||||
|
एताभ्याम्
|
||||||
|
एतैः
|
||||||
|
एतस्मै
|
||||||
|
एताभ्याम्
|
||||||
|
एतेभ्यः
|
||||||
|
एतस्मात्
|
||||||
|
एताभ्याम्
|
||||||
|
एतेभ्यः
|
||||||
|
एतस्य
|
||||||
|
एतयोः एनयोः
|
||||||
|
एतेषाम्
|
||||||
|
एतस्मिन्
|
||||||
|
एतयोः एनयोः
|
||||||
|
एतेषु
|
||||||
|
असौ
|
||||||
|
अमू
|
||||||
|
अमी
|
||||||
|
अमूम्
|
||||||
|
अमू
|
||||||
|
अमून्
|
||||||
|
अमुना
|
||||||
|
अमूभ्याम्
|
||||||
|
अमीभिः
|
||||||
|
अमुष्मै
|
||||||
|
अमूभ्याम्
|
||||||
|
अमीभ्यः
|
||||||
|
अमुष्मात्
|
||||||
|
अमूभ्याम्
|
||||||
|
अमीभ्यः
|
||||||
|
अमुष्य
|
||||||
|
अमुयोः
|
||||||
|
अमीषाम्
|
||||||
|
अमुष्मिन्
|
||||||
|
अमुयोः
|
||||||
|
अमीषु
|
||||||
|
असौ
|
||||||
|
अमू
|
||||||
|
अमूः
|
||||||
|
अमूम्
|
||||||
|
अमू
|
||||||
|
अमूः
|
||||||
|
अमुया
|
||||||
|
अमूभ्याम्
|
||||||
|
अमूभिः
|
||||||
|
अमुष्यै
|
||||||
|
अमूभ्याम्
|
||||||
|
अमूभ्यः
|
||||||
|
अमुष्याः
|
||||||
|
अमूभ्याम्
|
||||||
|
अमूभ्यः
|
||||||
|
अमुष्याः
|
||||||
|
अमुयोः
|
||||||
|
अमूषाम्
|
||||||
|
अमुष्याम्
|
||||||
|
अमुयोः
|
||||||
|
अमूषु
|
||||||
|
अमु
|
||||||
|
अमुनी
|
||||||
|
अमूनि
|
||||||
|
अमु
|
||||||
|
अमुनी
|
||||||
|
अमूनि
|
||||||
|
अमुना
|
||||||
|
अमूभ्याम्
|
||||||
|
अमीभिः
|
||||||
|
अमुष्मै
|
||||||
|
अमूभ्याम्
|
||||||
|
अमीभ्यः
|
||||||
|
अमुष्मात्
|
||||||
|
अमूभ्याम्
|
||||||
|
अमीभ्यः
|
||||||
|
अमुष्य
|
||||||
|
अमुयोः
|
||||||
|
अमीषाम्
|
||||||
|
अमुष्मिन्
|
||||||
|
अमुयोः
|
||||||
|
अमीषु
|
||||||
|
कः
|
||||||
|
कौ
|
||||||
|
के
|
||||||
|
कम्
|
||||||
|
कौ
|
||||||
|
कान्
|
||||||
|
केन
|
||||||
|
काभ्याम्
|
||||||
|
कैः
|
||||||
|
कस्मै
|
||||||
|
काभ्याम्
|
||||||
|
केभ्य
|
||||||
|
कस्मात्
|
||||||
|
काभ्याम्
|
||||||
|
केभ्य
|
||||||
|
कस्य
|
||||||
|
कयोः
|
||||||
|
केषाम्
|
||||||
|
कस्मिन्
|
||||||
|
कयोः
|
||||||
|
केषु
|
||||||
|
का
|
||||||
|
के
|
||||||
|
काः
|
||||||
|
काम्
|
||||||
|
के
|
||||||
|
काः
|
||||||
|
कया
|
||||||
|
काभ्याम्
|
||||||
|
काभिः
|
||||||
|
कस्यै
|
||||||
|
काभ्याम्
|
||||||
|
काभ्यः
|
||||||
|
कस्याः
|
||||||
|
काभ्याम्
|
||||||
|
काभ्यः
|
||||||
|
कस्याः
|
||||||
|
कयोः
|
||||||
|
कासाम्
|
||||||
|
कस्याम्
|
||||||
|
कयोः
|
||||||
|
कासु
|
||||||
|
किम्
|
||||||
|
के
|
||||||
|
कानि
|
||||||
|
किम्
|
||||||
|
के
|
||||||
|
कानि
|
||||||
|
केन
|
||||||
|
काभ्याम्
|
||||||
|
कैः
|
||||||
|
कस्मै
|
||||||
|
काभ्याम्
|
||||||
|
केभ्य
|
||||||
|
कस्मात्
|
||||||
|
काभ्याम्
|
||||||
|
केभ्य
|
||||||
|
कस्य
|
||||||
|
कयोः
|
||||||
|
केषाम्
|
||||||
|
कस्मिन्
|
||||||
|
कयोः
|
||||||
|
केषु
|
||||||
|
भवान्
|
||||||
|
भवन्तौ
|
||||||
|
भवन्तः
|
||||||
|
भवन्तम्
|
||||||
|
भवन्तौ
|
||||||
|
भवतः
|
||||||
|
भवता
|
||||||
|
भवद्भ्याम्
|
||||||
|
भवद्भिः
|
||||||
|
भवते
|
||||||
|
भवद्भ्याम्
|
||||||
|
भवद्भ्यः
|
||||||
|
भवतः
|
||||||
|
भवद्भ्याम्
|
||||||
|
भवद्भ्यः
|
||||||
|
भवतः
|
||||||
|
भवतोः
|
||||||
|
भवताम्
|
||||||
|
भवति
|
||||||
|
भवतोः
|
||||||
|
भवत्सु
|
||||||
|
भवती
|
||||||
|
भवत्यौ
|
||||||
|
भवत्यः
|
||||||
|
भवतीम्
|
||||||
|
भवत्यौ
|
||||||
|
भवतीः
|
||||||
|
भवत्या
|
||||||
|
भवतीभ्याम्
|
||||||
|
भवतीभिः
|
||||||
|
भवत्यै
|
||||||
|
भवतीभ्याम्
|
||||||
|
भवतीभिः
|
||||||
|
भवत्याः
|
||||||
|
भवतीभ्याम्
|
||||||
|
भवतीभिः
|
||||||
|
भवत्याः
|
||||||
|
भवत्योः
|
||||||
|
भवतीनाम्
|
||||||
|
भवत्याम्
|
||||||
|
भवत्योः
|
||||||
|
भवतीषु
|
||||||
|
भवत्
|
||||||
|
भवती
|
||||||
|
भवन्ति
|
||||||
|
भवत्
|
||||||
|
भवती
|
||||||
|
भवन्ति
|
||||||
|
भवता
|
||||||
|
भवद्भ्याम्
|
||||||
|
भवद्भिः
|
||||||
|
भवते
|
||||||
|
भवद्भ्याम्
|
||||||
|
भवद्भ्यः
|
||||||
|
भवतः
|
||||||
|
भवद्भ्याम्
|
||||||
|
भवद्भ्यः
|
||||||
|
भवतः
|
||||||
|
भवतोः
|
||||||
|
भवताम्
|
||||||
|
भवति
|
||||||
|
भवतोः
|
||||||
|
भवत्सु
|
||||||
|
अये
|
||||||
|
अरे
|
||||||
|
अरेरे
|
||||||
|
अविधा
|
||||||
|
असाधुना
|
||||||
|
अस्तोभ
|
||||||
|
अहह
|
||||||
|
अहावस्
|
||||||
|
आम्
|
||||||
|
आर्यहलम्
|
||||||
|
आह
|
||||||
|
आहो
|
||||||
|
इस्
|
||||||
|
उम्
|
||||||
|
उवे
|
||||||
|
काम्
|
||||||
|
कुम्
|
||||||
|
चमत्
|
||||||
|
टसत्
|
||||||
|
दृन्
|
||||||
|
धिक्
|
||||||
|
पाट्
|
||||||
|
फत्
|
||||||
|
फाट्
|
||||||
|
फुडुत्
|
||||||
|
बत
|
||||||
|
बाल्
|
||||||
|
वट्
|
||||||
|
व्यवस्तोभति व्यवस्तुभ्
|
||||||
|
षाट्
|
||||||
|
स्तोभ
|
||||||
|
हुम्मा
|
||||||
|
हूम्
|
||||||
|
अति
|
||||||
|
अधि
|
||||||
|
अनु
|
||||||
|
अप
|
||||||
|
अपि
|
||||||
|
अभि
|
||||||
|
अव
|
||||||
|
आ
|
||||||
|
उद्
|
||||||
|
उप
|
||||||
|
नि
|
||||||
|
निर्
|
||||||
|
परा
|
||||||
|
परि
|
||||||
|
प्र
|
||||||
|
प्रति
|
||||||
|
वि
|
||||||
|
सम्
|
||||||
|
अथवा उत
|
||||||
|
अन्यथा
|
||||||
|
इव
|
||||||
|
च
|
||||||
|
चेत् यदि
|
||||||
|
तु परन्तु
|
||||||
|
यतः करणेन हि यतस् यदर्थम् यदर्थे यर्हि यथा यत्कारणम् येन ही हिन
|
||||||
|
यथा यतस्
|
||||||
|
यद्यपि
|
||||||
|
यात् अवधेस् यावति
|
||||||
|
येन प्रकारेण
|
||||||
|
स्थाने
|
||||||
|
अह
|
||||||
|
एव
|
||||||
|
एवम्
|
||||||
|
कच्चित्
|
||||||
|
कु
|
||||||
|
कुवित्
|
||||||
|
कूपत्
|
||||||
|
च
|
||||||
|
चण्
|
||||||
|
चेत्
|
||||||
|
तत्र
|
||||||
|
नकिम्
|
||||||
|
नह
|
||||||
|
नुनम्
|
||||||
|
नेत्
|
||||||
|
भूयस्
|
||||||
|
मकिम्
|
||||||
|
मकिर्
|
||||||
|
यत्र
|
||||||
|
युगपत्
|
||||||
|
वा
|
||||||
|
शश्वत्
|
||||||
|
सूपत्
|
||||||
|
ह
|
||||||
|
हन्त
|
||||||
|
हि
|
||||||
|
""".split()
|
||||||
|
)
|
|
@ -34,13 +34,13 @@ URL_PATTERN = (
|
||||||
r"|"
|
r"|"
|
||||||
# host & domain names
|
# host & domain names
|
||||||
# mods: match is case-sensitive, so include [A-Z]
|
# mods: match is case-sensitive, so include [A-Z]
|
||||||
"(?:" # noqa: E131
|
r"(?:" # noqa: E131
|
||||||
"(?:"
|
r"(?:"
|
||||||
"[A-Za-z0-9\u00a1-\uffff]"
|
r"[A-Za-z0-9\u00a1-\uffff]"
|
||||||
"[A-Za-z0-9\u00a1-\uffff_-]{0,62}"
|
r"[A-Za-z0-9\u00a1-\uffff_-]{0,62}"
|
||||||
")?"
|
r")?"
|
||||||
"[A-Za-z0-9\u00a1-\uffff]\."
|
r"[A-Za-z0-9\u00a1-\uffff]\."
|
||||||
")+"
|
r")+"
|
||||||
# TLD identifier
|
# TLD identifier
|
||||||
# mods: use ALPHA_LOWER instead of a wider range so that this doesn't match
|
# mods: use ALPHA_LOWER instead of a wider range so that this doesn't match
|
||||||
# strings like "lower.Upper", which can be split on "." by infixes in some
|
# strings like "lower.Upper", which can be split on "." by infixes in some
|
||||||
|
@ -128,6 +128,8 @@ emoticons = set(
|
||||||
:-]
|
:-]
|
||||||
[:
|
[:
|
||||||
[-:
|
[-:
|
||||||
|
[=
|
||||||
|
=]
|
||||||
:o)
|
:o)
|
||||||
(o:
|
(o:
|
||||||
:}
|
:}
|
||||||
|
@ -159,6 +161,8 @@ emoticons = set(
|
||||||
=|
|
=|
|
||||||
:|
|
:|
|
||||||
:-|
|
:-|
|
||||||
|
]=
|
||||||
|
=[
|
||||||
:1
|
:1
|
||||||
:P
|
:P
|
||||||
:-P
|
:-P
|
||||||
|
|
|
@ -1,9 +1,8 @@
|
||||||
from typing import Optional, Any, Dict, Callable, Iterable, Union, List, Pattern
|
from typing import Optional, Any, Dict, Callable, Iterable, Union, List, Pattern
|
||||||
from typing import Tuple, Iterator, Optional
|
from typing import Tuple, Iterator
|
||||||
from dataclasses import dataclass
|
from dataclasses import dataclass
|
||||||
import random
|
import random
|
||||||
import itertools
|
import itertools
|
||||||
import weakref
|
|
||||||
import functools
|
import functools
|
||||||
from contextlib import contextmanager
|
from contextlib import contextmanager
|
||||||
from copy import deepcopy
|
from copy import deepcopy
|
||||||
|
@ -1378,8 +1377,6 @@ class Language:
|
||||||
docs = (self.make_doc(text) for text in texts)
|
docs = (self.make_doc(text) for text in texts)
|
||||||
for pipe in pipes:
|
for pipe in pipes:
|
||||||
docs = pipe(docs)
|
docs = pipe(docs)
|
||||||
|
|
||||||
nr_seen = 0
|
|
||||||
for doc in docs:
|
for doc in docs:
|
||||||
yield doc
|
yield doc
|
||||||
|
|
||||||
|
|
|
@ -829,9 +829,11 @@ def _get_extra_predicates(spec, extra_predicates):
|
||||||
attr = "ORTH"
|
attr = "ORTH"
|
||||||
attr = IDS.get(attr.upper())
|
attr = IDS.get(attr.upper())
|
||||||
if isinstance(value, dict):
|
if isinstance(value, dict):
|
||||||
|
processed = False
|
||||||
|
value_with_upper_keys = {k.upper(): v for k, v in value.items()}
|
||||||
for type_, cls in predicate_types.items():
|
for type_, cls in predicate_types.items():
|
||||||
if type_ in value:
|
if type_ in value_with_upper_keys:
|
||||||
predicate = cls(len(extra_predicates), attr, value[type_], type_)
|
predicate = cls(len(extra_predicates), attr, value_with_upper_keys[type_], type_)
|
||||||
# Don't create a redundant predicates.
|
# Don't create a redundant predicates.
|
||||||
# This helps with efficiency, as we're caching the results.
|
# This helps with efficiency, as we're caching the results.
|
||||||
if predicate.key in seen_predicates:
|
if predicate.key in seen_predicates:
|
||||||
|
@ -840,6 +842,9 @@ def _get_extra_predicates(spec, extra_predicates):
|
||||||
extra_predicates.append(predicate)
|
extra_predicates.append(predicate)
|
||||||
output.append(predicate.i)
|
output.append(predicate.i)
|
||||||
seen_predicates[predicate.key] = predicate.i
|
seen_predicates[predicate.key] = predicate.i
|
||||||
|
processed = True
|
||||||
|
if not processed:
|
||||||
|
warnings.warn(Warnings.W035.format(pattern=value))
|
||||||
return output
|
return output
|
||||||
|
|
||||||
|
|
||||||
|
|
|
@ -133,7 +133,7 @@ class EntityRuler:
|
||||||
matches = set(
|
matches = set(
|
||||||
[(m_id, start, end) for m_id, start, end in matches if start != end]
|
[(m_id, start, end) for m_id, start, end in matches if start != end]
|
||||||
)
|
)
|
||||||
get_sort_key = lambda m: (m[2] - m[1], m[1])
|
get_sort_key = lambda m: (m[2] - m[1], -m[1])
|
||||||
matches = sorted(matches, key=get_sort_key, reverse=True)
|
matches = sorted(matches, key=get_sort_key, reverse=True)
|
||||||
entities = list(doc.ents)
|
entities = list(doc.ents)
|
||||||
new_entities = []
|
new_entities = []
|
||||||
|
|
|
@ -57,12 +57,13 @@ def validate_token_pattern(obj: list) -> List[str]:
|
||||||
|
|
||||||
|
|
||||||
class TokenPatternString(BaseModel):
|
class TokenPatternString(BaseModel):
|
||||||
REGEX: Optional[StrictStr]
|
REGEX: Optional[StrictStr] = Field(None, alias="regex")
|
||||||
IN: Optional[List[StrictStr]]
|
IN: Optional[List[StrictStr]] = Field(None, alias="in")
|
||||||
NOT_IN: Optional[List[StrictStr]]
|
NOT_IN: Optional[List[StrictStr]] = Field(None, alias="not_in")
|
||||||
|
|
||||||
class Config:
|
class Config:
|
||||||
extra = "forbid"
|
extra = "forbid"
|
||||||
|
allow_population_by_field_name = True # allow alias and field name
|
||||||
|
|
||||||
@validator("*", pre=True, each_item=True, allow_reuse=True)
|
@validator("*", pre=True, each_item=True, allow_reuse=True)
|
||||||
def raise_for_none(cls, v):
|
def raise_for_none(cls, v):
|
||||||
|
@ -72,9 +73,9 @@ class TokenPatternString(BaseModel):
|
||||||
|
|
||||||
|
|
||||||
class TokenPatternNumber(BaseModel):
|
class TokenPatternNumber(BaseModel):
|
||||||
REGEX: Optional[StrictStr] = None
|
REGEX: Optional[StrictStr] = Field(None, alias="regex")
|
||||||
IN: Optional[List[StrictInt]] = None
|
IN: Optional[List[StrictInt]] = Field(None, alias="in")
|
||||||
NOT_IN: Optional[List[StrictInt]] = None
|
NOT_IN: Optional[List[StrictInt]] = Field(None, alias="not_in")
|
||||||
EQ: Union[StrictInt, StrictFloat] = Field(None, alias="==")
|
EQ: Union[StrictInt, StrictFloat] = Field(None, alias="==")
|
||||||
NEQ: Union[StrictInt, StrictFloat] = Field(None, alias="!=")
|
NEQ: Union[StrictInt, StrictFloat] = Field(None, alias="!=")
|
||||||
GEQ: Union[StrictInt, StrictFloat] = Field(None, alias=">=")
|
GEQ: Union[StrictInt, StrictFloat] = Field(None, alias=">=")
|
||||||
|
@ -84,6 +85,7 @@ class TokenPatternNumber(BaseModel):
|
||||||
|
|
||||||
class Config:
|
class Config:
|
||||||
extra = "forbid"
|
extra = "forbid"
|
||||||
|
allow_population_by_field_name = True # allow alias and field name
|
||||||
|
|
||||||
@validator("*", pre=True, each_item=True, allow_reuse=True)
|
@validator("*", pre=True, each_item=True, allow_reuse=True)
|
||||||
def raise_for_none(cls, v):
|
def raise_for_none(cls, v):
|
||||||
|
|
|
@ -44,6 +44,11 @@ def ca_tokenizer():
|
||||||
return get_lang_class("ca")().tokenizer
|
return get_lang_class("ca")().tokenizer
|
||||||
|
|
||||||
|
|
||||||
|
@pytest.fixture(scope="session")
|
||||||
|
def cs_tokenizer():
|
||||||
|
return get_lang_class("cs")().tokenizer
|
||||||
|
|
||||||
|
|
||||||
@pytest.fixture(scope="session")
|
@pytest.fixture(scope="session")
|
||||||
def da_tokenizer():
|
def da_tokenizer():
|
||||||
return get_lang_class("da")().tokenizer
|
return get_lang_class("da")().tokenizer
|
||||||
|
@ -204,6 +209,11 @@ def ru_lemmatizer():
|
||||||
return get_lang_class("ru")().add_pipe("lemmatizer")
|
return get_lang_class("ru")().add_pipe("lemmatizer")
|
||||||
|
|
||||||
|
|
||||||
|
@pytest.fixture(scope="session")
|
||||||
|
def sa_tokenizer():
|
||||||
|
return get_lang_class("sa")().tokenizer
|
||||||
|
|
||||||
|
|
||||||
@pytest.fixture(scope="session")
|
@pytest.fixture(scope="session")
|
||||||
def sr_tokenizer():
|
def sr_tokenizer():
|
||||||
return get_lang_class("sr")().tokenizer
|
return get_lang_class("sr")().tokenizer
|
||||||
|
|
|
@ -162,11 +162,36 @@ def test_spans_are_hashable(en_tokenizer):
|
||||||
|
|
||||||
def test_spans_by_character(doc):
|
def test_spans_by_character(doc):
|
||||||
span1 = doc[1:-2]
|
span1 = doc[1:-2]
|
||||||
|
|
||||||
|
# default and specified alignment mode "strict"
|
||||||
span2 = doc.char_span(span1.start_char, span1.end_char, label="GPE")
|
span2 = doc.char_span(span1.start_char, span1.end_char, label="GPE")
|
||||||
assert span1.start_char == span2.start_char
|
assert span1.start_char == span2.start_char
|
||||||
assert span1.end_char == span2.end_char
|
assert span1.end_char == span2.end_char
|
||||||
assert span2.label_ == "GPE"
|
assert span2.label_ == "GPE"
|
||||||
|
|
||||||
|
span2 = doc.char_span(
|
||||||
|
span1.start_char, span1.end_char, label="GPE", alignment_mode="strict"
|
||||||
|
)
|
||||||
|
assert span1.start_char == span2.start_char
|
||||||
|
assert span1.end_char == span2.end_char
|
||||||
|
assert span2.label_ == "GPE"
|
||||||
|
|
||||||
|
# alignment mode "contract"
|
||||||
|
span2 = doc.char_span(
|
||||||
|
span1.start_char - 3, span1.end_char, label="GPE", alignment_mode="contract"
|
||||||
|
)
|
||||||
|
assert span1.start_char == span2.start_char
|
||||||
|
assert span1.end_char == span2.end_char
|
||||||
|
assert span2.label_ == "GPE"
|
||||||
|
|
||||||
|
# alignment mode "expand"
|
||||||
|
span2 = doc.char_span(
|
||||||
|
span1.start_char + 1, span1.end_char, label="GPE", alignment_mode="expand"
|
||||||
|
)
|
||||||
|
assert span1.start_char == span2.start_char
|
||||||
|
assert span1.end_char == span2.end_char
|
||||||
|
assert span2.label_ == "GPE"
|
||||||
|
|
||||||
|
|
||||||
def test_span_to_array(doc):
|
def test_span_to_array(doc):
|
||||||
span = doc[1:-2]
|
span = doc[1:-2]
|
||||||
|
|
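Note on the `alignment_mode` tests above: the three modes differ only in how a character offset that falls inside a token is resolved. The following is a minimal pure-Python sketch of that logic, not spaCy's Cython implementation. The `Token` tuple, the `TOKENS` sample and the tuple return value are assumptions made for illustration; the real method returns a `Span` or `None`.

```python
from typing import List, Optional, Tuple

# Hypothetical stand-in for spaCy's TokenC: (start_char, text, has_trailing_space).
Token = Tuple[int, str, bool]

# Tokens of the sample text "I like New York now".
TOKENS: List[Token] = [
    (0, "I", True), (2, "like", True), (7, "New", True),
    (11, "York", True), (16, "now", False),
]


def token_by_char(tokens: List[Token], char_idx: int) -> int:
    """Binary search for the token whose text or trailing space covers char_idx."""
    lo, hi = 0, len(tokens) - 1
    while lo <= hi:
        mid = (lo + hi) // 2
        start, text, space = tokens[mid]
        if char_idx < start:
            hi = mid - 1
        elif char_idx >= start + len(text) + int(space):
            lo = mid + 1
        else:
            return mid
    return -1


def char_span(
    tokens: List[Token], start_idx: int, end_idx: int, alignment_mode: str = "strict"
) -> Optional[Tuple[int, int]]:
    """Return (first_token, last_token + 1), or None if no valid span exists."""
    start = token_by_char(tokens, start_idx)
    if start < 0 or (alignment_mode == "strict" and start_idx != tokens[start][0]):
        return None
    # end_idx is exclusive, so look up the token one character before it.
    end = token_by_char(tokens, end_idx - 1)
    if end < 0:
        return None
    end_char = tokens[end][0] + len(tokens[end][1])
    if alignment_mode == "strict" and end_idx != end_char:
        return None
    if alignment_mode == "contract":
        # Drop tokens that are only partially covered.
        if tokens[start][0] < start_idx:
            start += 1
        if end_idx < end_char:
            end -= 1
        if end < start:
            return None
    elif alignment_mode == "expand":
        # An offset on a token's trailing space belongs to the next token.
        if start_idx == tokens[start][0] + len(tokens[start][1]):
            start += 1
    return (start, end + 1)


print(char_span(TOKENS, 7, 15))              # (2, 4): exactly "New York"
print(char_span(TOKENS, 8, 15))              # None: offset 8 is inside "New"
print(char_span(TOKENS, 8, 15, "contract"))  # (3, 4): only "York" is fully inside
print(char_span(TOKENS, 8, 15, "expand"))    # (2, 4): "New" is partially covered
```

The binary search counts the trailing space as part of a token, which is why the "expand" branch needs the extra check for an offset that lands on that space.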
0
spacy/tests/lang/cs/__init__.py
Normal file
23
spacy/tests/lang/cs/test_text.py
Normal file
|
@ -0,0 +1,23 @@
|
||||||
|
import pytest
|
||||||
|
|
||||||
|
|
||||||
|
@pytest.mark.parametrize(
|
||||||
|
"text,match",
|
||||||
|
[
|
||||||
|
("10", True),
|
||||||
|
("1", True),
|
||||||
|
("10.000", True),
|
||||||
|
("1000", True),
|
||||||
|
("999,0", True),
|
||||||
|
("devatenáct", True),
|
||||||
|
("osmdesát", True),
|
||||||
|
("kvadrilion", True),
|
||||||
|
("Pes", False),
|
||||||
|
(",", False),
|
||||||
|
("1/2", True),
|
||||||
|
],
|
||||||
|
)
|
||||||
|
def test_lex_attrs_like_number(cs_tokenizer, text, match):
|
||||||
|
tokens = cs_tokenizer(text)
|
||||||
|
assert len(tokens) == 1
|
||||||
|
assert tokens[0].like_num == match
|
|
@ -56,6 +56,11 @@ def test_lex_attrs_like_number(en_tokenizer, text, match):
|
||||||
assert tokens[0].like_num == match
|
assert tokens[0].like_num == match
|
||||||
|
|
||||||
|
|
||||||
|
@pytest.mark.parametrize("word", ["third", "Millionth", "100th", "Hundredth"])
|
||||||
|
def test_en_lex_attrs_like_number_for_ordinal(word):
|
||||||
|
assert like_num(word)
|
||||||
|
|
||||||
|
|
||||||
@pytest.mark.parametrize("word", ["eleven"])
|
@pytest.mark.parametrize("word", ["eleven"])
|
||||||
def test_en_lex_attrs_capitals(word):
|
def test_en_lex_attrs_capitals(word):
|
||||||
assert like_num(word)
|
assert like_num(word)
|
||||||
|
|
|
@ -1,4 +1,5 @@
|
||||||
import pytest
|
import pytest
|
||||||
|
from spacy.lang.he.lex_attrs import like_num
|
||||||
|
|
||||||
|
|
||||||
@pytest.mark.parametrize(
|
@pytest.mark.parametrize(
|
||||||
|
@ -39,3 +40,30 @@ def test_he_tokenizer_handles_abbreviation(he_tokenizer, text, expected_tokens):
|
||||||
def test_he_tokenizer_handles_punct(he_tokenizer, text, expected_tokens):
|
def test_he_tokenizer_handles_punct(he_tokenizer, text, expected_tokens):
|
||||||
tokens = he_tokenizer(text)
|
tokens = he_tokenizer(text)
|
||||||
assert expected_tokens == [token.text for token in tokens]
|
assert expected_tokens == [token.text for token in tokens]
|
||||||
|
|
||||||
|
|
||||||
|
@pytest.mark.parametrize(
|
||||||
|
"text,match",
|
||||||
|
[
|
||||||
|
("10", True),
|
||||||
|
("1", True),
|
||||||
|
("10,000", True),
|
||||||
|
("10,00", True),
|
||||||
|
("999.0", True),
|
||||||
|
("אחד", True),
|
||||||
|
("שתיים", True),
|
||||||
|
("מליון", True),
|
||||||
|
("כלב", False),
|
||||||
|
(",", False),
|
||||||
|
("1/2", True),
|
||||||
|
],
|
||||||
|
)
|
||||||
|
def test_lex_attrs_like_number(he_tokenizer, text, match):
|
||||||
|
tokens = he_tokenizer(text)
|
||||||
|
assert len(tokens) == 1
|
||||||
|
assert tokens[0].like_num == match
|
||||||
|
|
||||||
|
|
||||||
|
@pytest.mark.parametrize("word", ["שלישי", "מליון", "עשירי", "מאה", "עשר", "אחד עשר"])
|
||||||
|
def test_he_lex_attrs_like_number_for_ordinal(word):
|
||||||
|
assert like_num(word)
|
||||||
|
|
|
@ -1,6 +1,3 @@
|
||||||
# coding: utf-8
|
|
||||||
from __future__ import unicode_literals
|
|
||||||
|
|
||||||
import pytest
|
import pytest
|
||||||
|
|
||||||
|
|
||||||
|
|
0
spacy/tests/lang/sa/__init__.py
Normal file
42
spacy/tests/lang/sa/test_text.py
Normal file
|
@ -0,0 +1,42 @@
|
||||||
|
import pytest
|
||||||
|
|
||||||
|
|
||||||
|
def test_sa_tokenizer_handles_long_text(sa_tokenizer):
|
||||||
|
text = """नानाविधानि दिव्यानि नानावर्णाकृतीनि च।।"""
|
||||||
|
tokens = sa_tokenizer(text)
|
||||||
|
assert len(tokens) == 6
|
||||||
|
|
||||||
|
|
||||||
|
@pytest.mark.parametrize(
|
||||||
|
"text,length",
|
||||||
|
[
|
||||||
|
("श्री भगवानुवाच पश्य मे पार्थ रूपाणि शतशोऽथ सहस्रशः।", 9,),
|
||||||
|
("गुणान् सर्वान् स्वभावो मूर्ध्नि वर्तते ।", 6),
|
||||||
|
],
|
||||||
|
)
|
||||||
|
def test_sa_tokenizer_handles_cnts(sa_tokenizer, text, length):
|
||||||
|
tokens = sa_tokenizer(text)
|
||||||
|
assert len(tokens) == length
|
||||||
|
|
||||||
|
|
||||||
|
@pytest.mark.parametrize(
|
||||||
|
"text,match",
|
||||||
|
[
|
||||||
|
("10", True),
|
||||||
|
("1", True),
|
||||||
|
("10.000", True),
|
||||||
|
("1000", True),
|
||||||
|
("999,0", True),
|
||||||
|
("एकः ", True),
|
||||||
|
("दश", True),
|
||||||
|
("पञ्चदश", True),
|
||||||
|
("चत्वारिंशत् ", True),
|
||||||
|
("कूपे", False),
|
||||||
|
(",", False),
|
||||||
|
("1/2", True),
|
||||||
|
],
|
||||||
|
)
|
||||||
|
def test_lex_attrs_like_number(sa_tokenizer, text, match):
|
||||||
|
tokens = sa_tokenizer(text)
|
||||||
|
assert len(tokens) == 1
|
||||||
|
assert tokens[0].like_num == match
|
|
@ -59,3 +59,12 @@ def test_minimal_pattern_validation(en_vocab, pattern, n_errors, n_min_errors):
|
||||||
matcher.add("TEST", [pattern])
|
matcher.add("TEST", [pattern])
|
||||||
elif n_errors == 0:
|
elif n_errors == 0:
|
||||||
matcher.add("TEST", [pattern])
|
matcher.add("TEST", [pattern])
|
||||||
|
|
||||||
|
|
||||||
|
def test_pattern_errors(en_vocab):
|
||||||
|
matcher = Matcher(en_vocab)
|
||||||
|
# normalize "regex" to upper like "text"
|
||||||
|
matcher.add("TEST1", [[{"text": {"regex": "regex"}}]])
|
||||||
|
# error if subpattern attribute isn't recognized and processed
|
||||||
|
with pytest.raises(MatchPatternError):
|
||||||
|
matcher.add("TEST2", [[{"TEXT": {"XX": "xx"}}]])
|
||||||
|
|
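Note on `test_pattern_errors` above: the matcher change upper-cases the keys of a sub-pattern dict before looking up predicate types, and warns when none of the keys is recognized. The following is a rough sketch of that idea only. `PREDICATE_TYPES`, `find_predicates` and the warning text are hypothetical names chosen for illustration, not spaCy's internals.

```python
import warnings
from typing import Any, Dict, List, Tuple

# Illustrative list of operator names; assumed here, not copied from spaCy.
PREDICATE_TYPES = ("REGEX", "IN", "NOT_IN", "==", "!=", ">=", "<=", ">", "<")


def find_predicates(value: Dict[str, Any]) -> List[Tuple[str, Any]]:
    """Match predicate keys case-insensitively and warn if nothing matched."""
    value_with_upper_keys = {k.upper(): v for k, v in value.items()}
    found = [
        (type_, value_with_upper_keys[type_])
        for type_ in PREDICATE_TYPES
        if type_ in value_with_upper_keys
    ]
    if not found:
        warnings.warn(f"Unrecognized sub-pattern keys in {value!r}")
    return found


print(find_predicates({"regex": "a+"}))  # [('REGEX', 'a+')]
```

With this normalization `{"regex": ...}` and `{"REGEX": ...}` behave the same, while an unknown key such as `"XX"` is reported instead of being silently ignored.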
|
@ -150,3 +150,15 @@ def test_entity_ruler_properties(nlp, patterns):
|
||||||
ruler = EntityRuler(nlp, patterns=patterns, overwrite_ents=True)
|
ruler = EntityRuler(nlp, patterns=patterns, overwrite_ents=True)
|
||||||
assert sorted(ruler.labels) == sorted(["HELLO", "BYE", "COMPLEX", "TECH_ORG"])
|
assert sorted(ruler.labels) == sorted(["HELLO", "BYE", "COMPLEX", "TECH_ORG"])
|
||||||
assert sorted(ruler.ent_ids) == ["a1", "a2"]
|
assert sorted(ruler.ent_ids) == ["a1", "a2"]
|
||||||
|
|
||||||
|
|
||||||
|
def test_entity_ruler_overlapping_spans(nlp):
|
||||||
|
ruler = EntityRuler(nlp)
|
||||||
|
patterns = [
|
||||||
|
{"label": "FOOBAR", "pattern": "foo bar"},
|
||||||
|
{"label": "BARBAZ", "pattern": "bar baz"},
|
||||||
|
]
|
||||||
|
ruler.add_patterns(patterns)
|
||||||
|
doc = ruler(nlp.make_doc("foo bar baz"))
|
||||||
|
assert len(doc.ents) == 1
|
||||||
|
assert doc.ents[0].label_ == "FOOBAR"
|
||||||
|
|
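Note on `test_entity_ruler_overlapping_spans` above: the one-character change to `get_sort_key` in `EntityRuler` decides which of two equally long, overlapping matches wins. Below is a small sketch of the effect under the assumption that matches are consumed greedily in sorted order and later overlapping matches are skipped. `pick_non_overlapping` is a simplified, hypothetical stand-in for the ruler's own filtering.

```python
from typing import Callable, List, Tuple

Match = Tuple[str, int, int]  # (match_id, start_token, end_token_exclusive)

# "foo bar baz": FOOBAR covers tokens 0-1, BARBAZ covers tokens 1-2.
MATCHES: List[Match] = [("BARBAZ", 1, 3), ("FOOBAR", 0, 2)]


def old_key(m: Match) -> Tuple[int, int]:
    # Before the change: ties on length are broken by the larger start index.
    return (m[2] - m[1], m[1])


def new_key(m: Match) -> Tuple[int, int]:
    # After the change: negating the start makes the earlier match sort first.
    return (m[2] - m[1], -m[1])


def pick_non_overlapping(matches: List[Match], key: Callable) -> List[Match]:
    """Greedily keep matches in sorted order, skipping any that overlap."""
    chosen, seen = [], set()
    for m_id, start, end in sorted(matches, key=key, reverse=True):
        if not any(i in seen for i in range(start, end)):
            chosen.append((m_id, start, end))
            seen.update(range(start, end))
    return chosen


print(pick_non_overlapping(MATCHES, old_key))  # [('BARBAZ', 1, 3)]
print(pick_non_overlapping(MATCHES, new_key))  # [('FOOBAR', 0, 2)]
```

Because the sort uses `reverse=True`, the start index has to be negated for "longest first, then leftmost first" to hold.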
|
@ -71,6 +71,6 @@ def test_overfitting_IO():
|
||||||
|
|
||||||
def test_tagger_requires_labels():
|
def test_tagger_requires_labels():
|
||||||
nlp = English()
|
nlp = English()
|
||||||
tagger = nlp.add_pipe("tagger")
|
nlp.add_pipe("tagger")
|
||||||
with pytest.raises(ValueError):
|
with pytest.raises(ValueError):
|
||||||
optimizer = nlp.begin_training()
|
nlp.begin_training()
|
||||||
|
|
23
spacy/tests/regression/test_issue5838.py
Normal file
|
@ -0,0 +1,23 @@
|
||||||
|
from spacy.lang.en import English
|
||||||
|
from spacy.tokens import Span
|
||||||
|
from spacy import displacy
|
||||||
|
|
||||||
|
|
||||||
|
SAMPLE_TEXT = """First line
|
||||||
|
Second line, with ent
|
||||||
|
Third line
|
||||||
|
Fourth line
|
||||||
|
"""
|
||||||
|
|
||||||
|
|
||||||
|
def test_issue5838():
|
||||||
|
# Displacy's EntityRenderer break line
|
||||||
|
# not working after last entity
|
||||||
|
|
||||||
|
nlp = English()
|
||||||
|
doc = nlp(SAMPLE_TEXT)
|
||||||
|
doc.ents = [Span(doc, 7, 8, label="test")]
|
||||||
|
|
||||||
|
html = displacy.render(doc, style="ent")
|
||||||
|
found = html.count("</br>")
|
||||||
|
assert found == 4
|
27
spacy/tests/regression/test_issue5918.py
Normal file
|
@ -0,0 +1,27 @@
|
||||||
|
from spacy.lang.en import English
|
||||||
|
from spacy.pipeline import merge_entities
|
||||||
|
|
||||||
|
|
||||||
|
def test_issue5918():
|
||||||
|
# Test edge case when merging entities.
|
||||||
|
nlp = English()
|
||||||
|
ruler = nlp.add_pipe("entity_ruler")
|
||||||
|
patterns = [
|
||||||
|
{"label": "ORG", "pattern": "Digicon Inc"},
|
||||||
|
{"label": "ORG", "pattern": "Rotan Mosle Inc's"},
|
||||||
|
{"label": "ORG", "pattern": "Rotan Mosle Technology Partners Ltd"},
|
||||||
|
]
|
||||||
|
ruler.add_patterns(patterns)
|
||||||
|
|
||||||
|
text = """
|
||||||
|
Digicon Inc said it has completed the previously-announced disposition
|
||||||
|
of its computer systems division to an investment group led by
|
||||||
|
Rotan Mosle Inc's Rotan Mosle Technology Partners Ltd affiliate.
|
||||||
|
"""
|
||||||
|
doc = nlp(text)
|
||||||
|
assert len(doc.ents) == 3
|
||||||
|
# make it so that the third span's head is within the entity (ent_iob=I)
|
||||||
|
# bug #5918 would wrongly transfer that I to the full entity, resulting in 2 instead of 3 final ents.
|
||||||
|
doc[29].head = doc[33]
|
||||||
|
doc = merge_entities(doc)
|
||||||
|
assert len(doc.ents) == 3
|
|
@ -135,6 +135,7 @@ TRAIN_DATA = [
|
||||||
("Eat blue ham", {"tags": ["V", "J", "N"]}),
|
("Eat blue ham", {"tags": ["V", "J", "N"]}),
|
||||||
]
|
]
|
||||||
|
|
||||||
|
|
||||||
def test_tok2vec_listener():
|
def test_tok2vec_listener():
|
||||||
orig_config = Config().from_str(cfg_string)
|
orig_config = Config().from_str(cfg_string)
|
||||||
nlp, config = util.load_model_from_config(orig_config, auto_fill=True, validate=True)
|
nlp, config = util.load_model_from_config(orig_config, auto_fill=True, validate=True)
|
||||||
|
|
|
@ -29,6 +29,7 @@ NAUGHTY_STRINGS = [
|
||||||
r"₀₁₂",
|
r"₀₁₂",
|
||||||
r"⁰⁴⁵₀₁₂",
|
r"⁰⁴⁵₀₁₂",
|
||||||
r"ด้้้้้็็็็็้้้้้็็็็็้้้้้้้้็็็็็้้้้้็็็็็้้้้้้้้็็็็็้้้้้็็็็็้้้้้้้้็็็็็้้้้้็็็็ ด้้้้้็็็็็้้้้้็็็็็้้้้้้้้็็็็็้้้้้็็็็็้้้้้้้้็็็็็้้้้้็็็็็้้้้้้้้็็็็็้้้้้็็็็ ด้้้้้็็็็็้้้้้็็็็็้้้้้้้้็็็็็้้้้้็็็็็้้้้้้้้็็็็็้้้้้็็็็็้้้้้้้้็็็็็้้้้้็็็็",
|
r"ด้้้้้็็็็็้้้้้็็็็็้้้้้้้้็็็็็้้้้้็็็็็้้้้้้้้็็็็็้้้้้็็็็็้้้้้้้้็็็็็้้้้้็็็็ ด้้้้้็็็็็้้้้้็็็็็้้้้้้้้็็็็็้้้้้็็็็็้้้้้้้้็็็็็้้้้้็็็็็้้้้้้้้็็็็็้้้้้็็็็ ด้้้้้็็็็็้้้้้็็็็็้้้้้้้้็็็็็้้้้้็็็็็้้้้้้้้็็็็็้้้้้็็็็็้้้้้้้้็็็็็้้้้้็็็็",
|
||||||
|
r" ̄ ̄",
|
||||||
# Two-Byte Characters
|
# Two-Byte Characters
|
||||||
r"田中さんにあげて下さい",
|
r"田中さんにあげて下さい",
|
||||||
r"パーティーへ行かないか",
|
r"パーティーへ行かないか",
|
||||||
|
|
|
@ -15,7 +15,7 @@ def test_tokenizer_splits_double_space(tokenizer, text):
|
||||||
|
|
||||||
|
|
||||||
@pytest.mark.parametrize("text", ["lorem ipsum "])
|
@pytest.mark.parametrize("text", ["lorem ipsum "])
|
||||||
def test_tokenizer_handles_double_trainling_ws(tokenizer, text):
|
def test_tokenizer_handles_double_trailing_ws(tokenizer, text):
|
||||||
tokens = tokenizer(text)
|
tokens = tokenizer(text)
|
||||||
assert repr(tokens.text_with_ws) == repr(text)
|
assert repr(tokens.text_with_ws) == repr(text)
|
||||||
|
|
||||||
|
|
|
@ -169,6 +169,8 @@ def _merge(Doc doc, merges):
|
||||||
spans.append(span)
|
spans.append(span)
|
||||||
# House the new merged token where it starts
|
# House the new merged token where it starts
|
||||||
token = &doc.c[start]
|
token = &doc.c[start]
|
||||||
|
start_ent_iob = doc.c[start].ent_iob
|
||||||
|
start_ent_type = doc.c[start].ent_type
|
||||||
# Initially set attributes to attributes of span root
|
# Initially set attributes to attributes of span root
|
||||||
token.tag = doc.c[span.root.i].tag
|
token.tag = doc.c[span.root.i].tag
|
||||||
token.pos = doc.c[span.root.i].pos
|
token.pos = doc.c[span.root.i].pos
|
||||||
|
@ -181,8 +183,8 @@ def _merge(Doc doc, merges):
|
||||||
merged_iob = 3
|
merged_iob = 3
|
||||||
# If start token is I-ENT and previous token is of the same
|
# If start token is I-ENT and previous token is of the same
|
||||||
# type, then I-ENT (could check I-ENT from start to span root)
|
# type, then I-ENT (could check I-ENT from start to span root)
|
||||||
if doc.c[start].ent_iob == 1 and start > 0 \
|
if start_ent_iob == 1 and start > 0 \
|
||||||
and doc.c[start].ent_type == token.ent_type \
|
and start_ent_type == token.ent_type \
|
||||||
and doc.c[start - 1].ent_type == token.ent_type:
|
and doc.c[start - 1].ent_type == token.ent_type:
|
||||||
merged_iob = 1
|
merged_iob = 1
|
||||||
token.ent_iob = merged_iob
|
token.ent_iob = merged_iob
|
||||||
|
|
|
@ -336,17 +336,25 @@ cdef class Doc:
|
||||||
def doc(self):
|
def doc(self):
|
||||||
return self
|
return self
|
||||||
|
|
||||||
def char_span(self, int start_idx, int end_idx, label=0, kb_id=0, vector=None):
|
def char_span(self, int start_idx, int end_idx, label=0, kb_id=0, vector=None, alignment_mode="strict"):
|
||||||
"""Create a `Span` object from the slice `doc.text[start : end]`.
|
"""Create a `Span` object from the slice
|
||||||
|
`doc.text[start_idx : end_idx]`. Returns None if no valid `Span` can be
|
||||||
|
created.
|
||||||
|
|
||||||
doc (Doc): The parent document.
|
doc (Doc): The parent document.
|
||||||
start (int): The index of the first character of the span.
|
start_idx (int): The index of the first character of the span.
|
||||||
end (int): The index of the first character after the span.
|
end_idx (int): The index of the first character after the span.
|
||||||
label (uint64 or string): A label to attach to the Span, e.g. for
|
label (uint64 or string): A label to attach to the Span, e.g. for
|
||||||
named entities.
|
named entities.
|
||||||
kb_id (uint64 or string): An ID from a KB to capture the meaning of a named entity.
|
kb_id (uint64 or string): An ID from a KB to capture the meaning of a
|
||||||
|
named entity.
|
||||||
vector (ndarray[ndim=1, dtype='float32']): A meaning representation of
|
vector (ndarray[ndim=1, dtype='float32']): A meaning representation of
|
||||||
the span.
|
the span.
|
||||||
|
alignment_mode (str): How character indices are aligned to token
|
||||||
|
boundaries. Options: "strict" (character indices must be aligned
|
||||||
|
with token boundaries), "contract" (span of all tokens completely
|
||||||
|
within the character span), "expand" (span of all tokens at least
|
||||||
|
partially covered by the character span). Defaults to "strict".
|
||||||
RETURNS (Span): The newly constructed object.
|
RETURNS (Span): The newly constructed object.
|
||||||
|
|
||||||
DOCS: https://nightly.spacy.io/api/doc#char_span
|
DOCS: https://nightly.spacy.io/api/doc#char_span
|
||||||
|
@ -355,12 +363,29 @@ cdef class Doc:
|
||||||
label = self.vocab.strings.add(label)
|
label = self.vocab.strings.add(label)
|
||||||
if not isinstance(kb_id, int):
|
if not isinstance(kb_id, int):
|
||||||
kb_id = self.vocab.strings.add(kb_id)
|
kb_id = self.vocab.strings.add(kb_id)
|
||||||
cdef int start = token_by_start(self.c, self.length, start_idx)
|
if alignment_mode not in ("strict", "contract", "expand"):
|
||||||
if start == -1:
|
alignment_mode = "strict"
|
||||||
|
cdef int start = token_by_char(self.c, self.length, start_idx)
|
||||||
|
if start < 0 or (alignment_mode == "strict" and start_idx != self[start].idx):
|
||||||
return None
|
return None
|
||||||
cdef int end = token_by_end(self.c, self.length, end_idx)
|
# end_idx is exclusive, so find the token at one char before
|
||||||
if end == -1:
|
cdef int end = token_by_char(self.c, self.length, end_idx - 1)
|
||||||
|
if end < 0 or (alignment_mode == "strict" and end_idx != self[end].idx + len(self[end])):
|
||||||
return None
|
return None
|
||||||
|
# Adjust start and end by alignment_mode
|
||||||
|
if alignment_mode == "contract":
|
||||||
|
if self[start].idx < start_idx:
|
||||||
|
start += 1
|
||||||
|
if end_idx < self[end].idx + len(self[end]):
|
||||||
|
end -= 1
|
||||||
|
# if no tokens are completely within the span, return None
|
||||||
|
if end < start:
|
||||||
|
return None
|
||||||
|
elif alignment_mode == "expand":
|
||||||
|
# Don't consider the trailing whitespace to be part of the previous
|
||||||
|
# token
|
||||||
|
if start_idx == self[start].idx + len(self[start]):
|
||||||
|
start += 1
|
||||||
# Currently we have the token index, we want the range-end index
|
# Currently we have the token index, we want the range-end index
|
||||||
end += 1
|
end += 1
|
||||||
cdef Span span = Span(self, start, end, label=label, kb_id=kb_id, vector=vector)
|
cdef Span span = Span(self, start, end, label=label, kb_id=kb_id, vector=vector)
|
||||||
|
@ -1268,23 +1293,35 @@ cdef class Doc:
|
||||||
|
|
||||||
|
|
||||||
cdef int token_by_start(const TokenC* tokens, int length, int start_char) except -2:
|
cdef int token_by_start(const TokenC* tokens, int length, int start_char) except -2:
|
||||||
cdef int i
|
cdef int i = token_by_char(tokens, length, start_char)
|
||||||
for i in range(length):
|
if i >= 0 and tokens[i].idx == start_char:
|
||||||
if tokens[i].idx == start_char:
|
return i
|
||||||
return i
|
|
||||||
else:
|
else:
|
||||||
return -1
|
return -1
|
||||||
|
|
||||||
|
|
||||||
cdef int token_by_end(const TokenC* tokens, int length, int end_char) except -2:
|
cdef int token_by_end(const TokenC* tokens, int length, int end_char) except -2:
|
||||||
cdef int i
|
# end_char is exclusive, so find the token at one char before
|
||||||
for i in range(length):
|
cdef int i = token_by_char(tokens, length, end_char - 1)
|
||||||
if tokens[i].idx + tokens[i].lex.length == end_char:
|
if i >= 0 and tokens[i].idx + tokens[i].lex.length == end_char:
|
||||||
return i
|
return i
|
||||||
else:
|
else:
|
||||||
return -1
|
return -1
|
||||||
|
|
||||||
|
|
||||||
|
cdef int token_by_char(const TokenC* tokens, int length, int char_idx) except -2:
|
||||||
|
cdef int start = 0, mid, end = length - 1
|
||||||
|
while start <= end:
|
||||||
|
mid = (start + end) / 2
|
||||||
|
if char_idx < tokens[mid].idx:
|
||||||
|
end = mid - 1
|
||||||
|
elif char_idx >= tokens[mid].idx + tokens[mid].lex.length + tokens[mid].spacy:
|
||||||
|
start = mid + 1
|
||||||
|
else:
|
||||||
|
return mid
|
||||||
|
return -1
|
||||||
|
|
||||||
|
|
||||||
cdef int set_children_from_heads(TokenC* tokens, int length) except -1:
|
cdef int set_children_from_heads(TokenC* tokens, int length) except -1:
|
||||||
cdef TokenC* head
|
cdef TokenC* head
|
||||||
cdef TokenC* child
|
cdef TokenC* child
|
||||||
|
|
|
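A pure-Python sketch of the lookup logic introduced in the hunks above may help when reading the Cython. It is not the library's implementation: the `(idx, length, spacy)` tuple layout and the helper name `char_span_bounds` are assumptions made for illustration, and the helper returns a pair of token indices rather than a `Span`.

```python
from typing import List, Optional, Tuple

# Hypothetical token layout: (idx, length, spacy), i.e. start offset, text
# length, and 1 if a trailing space follows the token (else 0). These mirror
# the TokenC fields read in the diff, but the tuple itself is an assumption.
Token = Tuple[int, int, int]


def token_by_char(tokens: List[Token], char_idx: int) -> int:
    """Binary search for the token whose text or trailing space covers char_idx."""
    start, end = 0, len(tokens) - 1
    while start <= end:
        mid = (start + end) // 2
        idx, length, spacy = tokens[mid]
        if char_idx < idx:
            end = mid - 1
        elif char_idx >= idx + length + spacy:
            start = mid + 1
        else:
            return mid
    return -1


def char_span_bounds(
    tokens: List[Token], start_idx: int, end_idx: int, alignment_mode: str = "strict"
) -> Optional[Tuple[int, int]]:
    """Return (start, end) token indices, end exclusive, or None if no valid span."""
    if alignment_mode not in ("strict", "contract", "expand"):
        alignment_mode = "strict"
    start = token_by_char(tokens, start_idx)
    if start < 0 or (alignment_mode == "strict" and start_idx != tokens[start][0]):
        return None
    # end_idx is exclusive, so look up the token one character earlier
    end = token_by_char(tokens, end_idx - 1)
    if end < 0 or (
        alignment_mode == "strict" and end_idx != tokens[end][0] + tokens[end][1]
    ):
        return None
    if alignment_mode == "contract":
        if tokens[start][0] < start_idx:
            start += 1
        if end_idx < tokens[end][0] + tokens[end][1]:
            end -= 1
        # no token lies completely within the character span
        if end < start:
            return None
    elif alignment_mode == "expand":
        # trailing whitespace does not count as part of the previous token
        if start_idx == tokens[start][0] + tokens[start][1]:
            start += 1
    # convert the last token index into an exclusive range end
    return start, end + 1


# Tokens for the text "I like New York"
tokens = [(0, 1, 1), (2, 4, 1), (7, 3, 1), (11, 4, 0)]
print(char_span_bounds(tokens, 7, 15))              # exact token boundaries
print(char_span_bounds(tokens, 8, 15))              # misaligned start, strict mode
print(char_span_bounds(tokens, 8, 15, "contract"))  # drops the partly covered token
print(char_span_bounds(tokens, 8, 13, "expand"))    # grows to cover both tokens
```

In this sketch, as in the diff, an unrecognised `alignment_mode` silently falls back to `"strict"` rather than raising an error.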
@ -186,8 +186,9 @@ Remove a previously registered extension.
|
||||||
|
|
||||||
## Doc.char_span {#char_span tag="method" new="2"}
|
## Doc.char_span {#char_span tag="method" new="2"}
|
||||||
|
|
||||||
Create a `Span` object from the slice `doc.text[start:end]`. Returns `None` if
|
Create a `Span` object from the slice `doc.text[start_idx:end_idx]`. Returns
|
||||||
the character indices don't map to a valid span.
|
`None` if the character indices don't map to a valid span using the default mode
|
||||||
|
`"strict"`.
|
||||||
|
|
||||||
> #### Example
|
> #### Example
|
||||||
>
|
>
|
||||||
|
@ -197,14 +198,15 @@ the character indices don't map to a valid span.
|
||||||
> assert span.text == "New York"
|
> assert span.text == "New York"
|
||||||
> ```
|
> ```
|
||||||
|
|
||||||
| Name | Description |
|
| Name | Description |
|
||||||
| ------------------------------------ | ----------------------------------------------------------------------------------------- |
|
| ------------------------------------ | --------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------- |
|
||||||
| `start` | The index of the first character of the span. ~~int~~ |
|
| `start` | The index of the first character of the span. ~~int~~ |
|
||||||
| `end` | The index of the first character after the span. ~~int~~ |
|
| `end` | The index of the first character after the span. ~~int~~ |
|
||||||
| `label` | A label to attach to the span, e.g. for named entities. ~~Union[int, str]~~ |
|
| `label` | A label to attach to the span, e.g. for named entities. ~~Union[int, str]~~ |
|
||||||
| `kb_id` <Tag variant="new">2.2</Tag> | An ID from a knowledge base to capture the meaning of a named entity. ~~Union[int, str]~~ |
|
| `kb_id` <Tag variant="new">2.2</Tag> | An ID from a knowledge base to capture the meaning of a named entity. ~~Union[int, str]~~ |
|
||||||
| `vector` | A meaning representation of the span. ~~numpy.ndarray[ndim=1, dtype=float32]~~ |
|
| `vector` | A meaning representation of the span. ~~numpy.ndarray[ndim=1, dtype=float32]~~ |
|
||||||
| **RETURNS** | The newly constructed object or `None`. ~~Optional[Span]~~ |
|
| `alignment_mode` | How character indices snap to token boundaries. Options: `"strict"` (no snapping), `"contract"` (span of all tokens completely within the character span), `"expand"` (span of all tokens at least partially covered by the character span). Defaults to `"strict"`. ~~str~~ |
|
||||||
|
| **RETURNS** | The newly constructed object or `None`. ~~Optional[Span]~~ |
|
||||||
|
|
||||||
## Doc.similarity {#similarity tag="method" model="vectors"}
|
## Doc.similarity {#similarity tag="method" model="vectors"}
|
||||||
|
|
||||||
|
|
|
@ -1021,7 +1021,7 @@ expressions – for example,
|
||||||
[`compile_suffix_regex`](/api/top-level#util.compile_suffix_regex):
|
[`compile_suffix_regex`](/api/top-level#util.compile_suffix_regex):
|
||||||
|
|
||||||
```python
|
```python
|
||||||
suffixes = nlp.Defaults.suffixes + (r'''-+$''',)
|
suffixes = nlp.Defaults.suffixes + [r'''-+$''',]
|
||||||
suffix_regex = spacy.util.compile_suffix_regex(suffixes)
|
suffix_regex = spacy.util.compile_suffix_regex(suffixes)
|
||||||
nlp.tokenizer.suffix_search = suffix_regex.search
|
nlp.tokenizer.suffix_search = suffix_regex.search
|
||||||
```
|
```
|
||||||
|
|
|
@ -1,5 +1,30 @@
|
||||||
{
|
{
|
||||||
"resources": [
|
"resources": [
|
||||||
|
{
|
||||||
|
"id": "spacy-sentence-bert",
|
||||||
|
"title": "spaCy - sentence-transformers",
|
||||||
|
"slogan": "Pipelines for pretrained sentence-transformers (BERT, RoBERTa, XLM-RoBERTa & Co.) directly within spaCy",
|
||||||
|
"description": "This library lets you use the embeddings from [sentence-transformers](https://github.com/UKPLab/sentence-transformers) of Docs, Spans and Tokens directly from spaCy. Most models are for the english language but three of them are multilingual.",
|
||||||
|
"github": "MartinoMensio/spacy-sentence-bert",
|
||||||
|
"pip": "spacy-sentence-bert",
|
||||||
|
"code_example": [
|
||||||
|
"import spacy_sentence_bert",
|
||||||
|
"# load one of the models listed at https://github.com/MartinoMensio/spacy-sentence-bert/",
|
||||||
|
"nlp = spacy_sentence_bert.load_model('en_roberta_large_nli_stsb_mean_tokens')",
|
||||||
|
"# get two documents",
|
||||||
|
"doc_1 = nlp('Hi there, how are you?')",
|
||||||
|
"doc_2 = nlp('Hello there, how are you doing today?')",
|
||||||
|
"# use the similarity method that is based on the vectors, on Doc, Span or Token",
|
||||||
|
"print(doc_1.similarity(doc_2[0:7]))"
|
||||||
|
],
|
||||||
|
"category": ["models", "pipeline"],
|
||||||
|
"author": "Martino Mensio",
|
||||||
|
"author_links": {
|
||||||
|
"twitter": "MartinoMensio",
|
||||||
|
"github": "MartinoMensio",
|
||||||
|
"website": "https://martinomensio.github.io"
|
||||||
|
}
|
||||||
|
},
|
||||||
{
|
{
|
||||||
"id": "spacy-streamlit",
|
"id": "spacy-streamlit",
|
||||||
"title": "spacy-streamlit",
|
"title": "spacy-streamlit",
|
||||||
|
@ -55,13 +80,14 @@
|
||||||
},
|
},
|
||||||
{
|
{
|
||||||
"id": "spacy-universal-sentence-encoder",
|
"id": "spacy-universal-sentence-encoder",
|
||||||
"title": "SpaCy - Universal Sentence Encoder",
|
"title": "spaCy - Universal Sentence Encoder",
|
||||||
"slogan": "Make use of Google's Universal Sentence Encoder directly within SpaCy",
|
"slogan": "Make use of Google's Universal Sentence Encoder directly within spaCy",
|
||||||
"description": "This library lets you use Universal Sentence Encoder embeddings of Docs, Spans and Tokens directly from TensorFlow Hub",
|
"description": "This library lets you use Universal Sentence Encoder embeddings of Docs, Spans and Tokens directly from TensorFlow Hub",
|
||||||
"github": "MartinoMensio/spacy-universal-sentence-encoder-tfhub",
|
"github": "MartinoMensio/spacy-universal-sentence-encoder",
|
||||||
|
"pip": "spacy-universal-sentence-encoder",
|
||||||
"code_example": [
|
"code_example": [
|
||||||
"import spacy_universal_sentence_encoder",
|
"import spacy_universal_sentence_encoder",
|
||||||
"load one of the models: ['en_use_md', 'en_use_lg', 'xx_use_md', 'xx_use_lg']",
|
"# load one of the models: ['en_use_md', 'en_use_lg', 'xx_use_md', 'xx_use_lg']",
|
||||||
"nlp = spacy_universal_sentence_encoder.load_model('en_use_lg')",
|
"nlp = spacy_universal_sentence_encoder.load_model('en_use_lg')",
|
||||||
"# get two documents",
|
"# get two documents",
|
||||||
"doc_1 = nlp('Hi there, how are you?')",
|
"doc_1 = nlp('Hi there, how are you?')",
|
||||||
|
@ -1436,7 +1462,7 @@
|
||||||
"id": "podcast-init",
|
"id": "podcast-init",
|
||||||
"title": "Podcast.__init__ #87: spaCy with Matthew Honnibal",
|
"title": "Podcast.__init__ #87: spaCy with Matthew Honnibal",
|
||||||
"slogan": "December 2017",
|
"slogan": "December 2017",
|
||||||
"description": "As the amount of text available on the internet and in businesses continues to increase, the need for fast and accurate language analysis becomes more prominent. This week Matthew Honnibal, the creator of SpaCy, talks about his experiences researching natural language processing and creating a library to make his findings accessible to industry.",
|
"description": "As the amount of text available on the internet and in businesses continues to increase, the need for fast and accurate language analysis becomes more prominent. This week Matthew Honnibal, the creator of spaCy, talks about his experiences researching natural language processing and creating a library to make his findings accessible to industry.",
|
||||||
"iframe": "https://www.pythonpodcast.com/wp-content/plugins/podlove-podcasting-plugin-for-wordpress/lib/modules/podlove_web_player/player_v4/dist/share.html?episode=https://www.pythonpodcast.com/?podlove_player4=176",
|
"iframe": "https://www.pythonpodcast.com/wp-content/plugins/podlove-podcasting-plugin-for-wordpress/lib/modules/podlove_web_player/player_v4/dist/share.html?episode=https://www.pythonpodcast.com/?podlove_player4=176",
|
||||||
"iframe_height": 200,
|
"iframe_height": 200,
|
||||||
"thumb": "https://i.imgur.com/rpo6BuY.png",
|
"thumb": "https://i.imgur.com/rpo6BuY.png",
|
||||||
|
@ -1452,7 +1478,7 @@
|
||||||
"id": "podcast-init2",
|
"id": "podcast-init2",
|
||||||
"title": "Podcast.__init__ #256: An Open Source Toolchain For NLP From Explosion AI",
|
"title": "Podcast.__init__ #256: An Open Source Toolchain For NLP From Explosion AI",
|
||||||
"slogan": "March 2020",
|
"slogan": "March 2020",
|
||||||
"description": "The state of the art in natural language processing is a constantly moving target. With the rise of deep learning, previously cutting edge techniques have given way to robust language models. Through it all the team at Explosion AI have built a strong presence with the trifecta of SpaCy, Thinc, and Prodigy to support fast and flexible data labeling to feed deep learning models and performant and scalable text processing. In this episode founder and open source author Matthew Honnibal shares his experience growing a business around cutting edge open source libraries for the machine learning developent process.",
|
"description": "The state of the art in natural language processing is a constantly moving target. With the rise of deep learning, previously cutting edge techniques have given way to robust language models. Through it all the team at Explosion AI have built a strong presence with the trifecta of spaCy, Thinc, and Prodigy to support fast and flexible data labeling to feed deep learning models and performant and scalable text processing. In this episode founder and open source author Matthew Honnibal shares his experience growing a business around cutting edge open source libraries for the machine learning developent process.",
|
||||||
"iframe": "https://cdn.podlove.org/web-player/share.html?episode=https%3A%2F%2Fwww.pythonpodcast.com%2F%3Fpodlove_player4%3D614",
|
"iframe": "https://cdn.podlove.org/web-player/share.html?episode=https%3A%2F%2Fwww.pythonpodcast.com%2F%3Fpodlove_player4%3D614",
|
||||||
"iframe_height": 200,
|
"iframe_height": 200,
|
||||||
"thumb": "https://i.imgur.com/rpo6BuY.png",
|
"thumb": "https://i.imgur.com/rpo6BuY.png",
|
||||||
|
@ -1483,7 +1509,7 @@
|
||||||
"id": "twimlai-podcast",
|
"id": "twimlai-podcast",
|
||||||
"title": "TWiML & AI: Practical NLP with spaCy and Prodigy",
|
"title": "TWiML & AI: Practical NLP with spaCy and Prodigy",
|
||||||
"slogan": "May 2019",
|
"slogan": "May 2019",
|
||||||
"description": "\"Ines and I caught up to discuss her various projects, including the aforementioned SpaCy, an open-source NLP library built with a focus on industry and production use cases. In our conversation, Ines gives us an overview of the SpaCy Library, a look at some of the use cases that excite her, and the Spacy community and contributors. We also discuss her work with Prodigy, an annotation service tool that uses continuous active learning to train models, and finally, what other exciting projects she is working on.\"",
|
"description": "\"Ines and I caught up to discuss her various projects, including the aforementioned spaCy, an open-source NLP library built with a focus on industry and production use cases. In our conversation, Ines gives us an overview of the spaCy Library, a look at some of the use cases that excite her, and the Spacy community and contributors. We also discuss her work with Prodigy, an annotation service tool that uses continuous active learning to train models, and finally, what other exciting projects she is working on.\"",
|
||||||
"thumb": "https://i.imgur.com/ng2F5gK.png",
|
"thumb": "https://i.imgur.com/ng2F5gK.png",
|
||||||
"url": "https://twimlai.com/twiml-talk-262-practical-natural-language-processing-with-spacy-and-prodigy-w-ines-montani",
|
"url": "https://twimlai.com/twiml-talk-262-practical-natural-language-processing-with-spacy-and-prodigy-w-ines-montani",
|
||||||
"iframe": "https://html5-player.libsyn.com/embed/episode/id/9691514/height/90/theme/custom/thumbnail/no/preload/no/direction/backward/render-playlist/no/custom-color/3e85b1/",
|
"iframe": "https://html5-player.libsyn.com/embed/episode/id/9691514/height/90/theme/custom/thumbnail/no/preload/no/direction/backward/render-playlist/no/custom-color/3e85b1/",
|
||||||
|
@ -1515,7 +1541,7 @@
|
||||||
"id": "practical-ai-podcast",
|
"id": "practical-ai-podcast",
|
||||||
"title": "Practical AI: Modern NLP with spaCy",
|
"title": "Practical AI: Modern NLP with spaCy",
|
||||||
"slogan": "December 2019",
|
"slogan": "December 2019",
|
||||||
"description": "\"SpaCy is awesome for NLP! It’s easy to use, has widespread adoption, is open source, and integrates the latest language models. Ines Montani and Matthew Honnibal (core developers of spaCy and co-founders of Explosion) join us to discuss the history of the project, its capabilities, and the latest trends in NLP. We also dig into the practicalities of taking NLP workflows to production. You don’t want to miss this episode!\"",
|
"description": "\"spaCy is awesome for NLP! It’s easy to use, has widespread adoption, is open source, and integrates the latest language models. Ines Montani and Matthew Honnibal (core developers of spaCy and co-founders of Explosion) join us to discuss the history of the project, its capabilities, and the latest trends in NLP. We also dig into the practicalities of taking NLP workflows to production. You don’t want to miss this episode!\"",
|
||||||
"thumb": "https://i.imgur.com/jn8Bcdw.png",
|
"thumb": "https://i.imgur.com/jn8Bcdw.png",
|
||||||
"url": "https://changelog.com/practicalai/68",
|
"url": "https://changelog.com/practicalai/68",
|
||||||
"author": "Daniel Whitenack & Chris Benson",
|
"author": "Daniel Whitenack & Chris Benson",
|
||||||
|
@ -1770,26 +1796,33 @@
|
||||||
{
|
{
|
||||||
"id": "spacy-conll",
|
"id": "spacy-conll",
|
||||||
"title": "spacy_conll",
|
"title": "spacy_conll",
|
||||||
"slogan": "Parse text with spaCy and gets its output in CoNLL-U format",
|
"slogan": "Parsing to CoNLL with spaCy, spacy-stanza, and spacy-udpipe",
|
||||||
"description": "This module allows you to parse a text to CoNLL-U format. It contains a pipeline component for spaCy that adds CoNLL-U properties to a Doc and its sentences. It can also be used as a command-line tool.",
|
"description": "This module allows you to parse text into CoNLL-U format. You can use it as a command line tool, or embed it in your own scripts by adding it as a custom pipeline component to a spaCy, spacy-stanfordnlp, spacy-stanza, or spacy-udpipe pipeline. It also provides an easy-to-use function to quickly initialize a parser. CoNLL-related properties are added to Doc elements, sentence Spans, and Tokens.",
|
||||||
"code_example": [
|
"code_example": [
|
||||||
"import spacy",
|
"from spacy_conll import init_parser",
|
||||||
"from spacy_conll import ConllFormatter",
|
|
||||||
"",
|
"",
|
||||||
"nlp = spacy.load('en')",
|
"",
|
||||||
"conllformatter = ConllFormatter(nlp)",
|
"# Initialise English parser, already including the ConllFormatter as a pipeline component.",
|
||||||
"nlp.add_pipe(conllformatter, after='parser')",
|
"# Indicate that we want to get the CoNLL headers in the string output.",
|
||||||
"doc = nlp('I like cookies. Do you?')",
|
"# `use_gpu` and `verbose` are specific to stanza (and stanfordnlp). These keywords arguments",
|
||||||
"conll = doc._.conll",
|
"# are passed onto their Pipeline() initialisation",
|
||||||
"print(doc._.conll_str_headers)",
|
"nlp = init_parser(\"stanza\",",
|
||||||
"print(doc._.conll_str)"
|
" \"en\",",
|
||||||
|
" parser_opts={\"use_gpu\": True, \"verbose\": False},",
|
||||||
|
" include_headers=True)",
|
||||||
|
"# Parse a given string",
|
||||||
|
"doc = nlp(\"A cookie is a baked or cooked food that is typically small, flat and sweet. It usually contains flour, sugar and some type of oil or fat.\")",
|
||||||
|
"",
|
||||||
|
"# Get the CoNLL representation of the whole document, including headers",
|
||||||
|
"conll = doc._.conll_str",
|
||||||
|
"print(conll)"
|
||||||
],
|
],
|
||||||
"code_language": "python",
|
"code_language": "python",
|
||||||
"author": "Bram Vanroy",
|
"author": "Bram Vanroy",
|
||||||
"author_links": {
|
"author_links": {
|
||||||
"github": "BramVanroy",
|
"github": "BramVanroy",
|
||||||
"twitter": "BramVanroy",
|
"twitter": "BramVanroy",
|
||||||
"website": "https://bramvanroy.be"
|
"website": "http://bramvanroy.be"
|
||||||
},
|
},
|
||||||
"github": "BramVanroy/spacy_conll",
|
"github": "BramVanroy/spacy_conll",
|
||||||
"category": ["standalone", "pipeline"],
|
"category": ["standalone", "pipeline"],
|
||||||
|
@ -1935,6 +1968,28 @@
|
||||||
"category": ["pipeline"],
|
"category": ["pipeline"],
|
||||||
"tags": ["inflection", "lemmatizer"]
|
"tags": ["inflection", "lemmatizer"]
|
||||||
},
|
},
|
||||||
|
{
|
||||||
|
"id": "amrlib",
|
||||||
|
"slogan": "A python library that makes AMR parsing, generation and visualization simple.",
|
||||||
|
"description": "amrlib is a python module and spaCy add-in for Abstract Meaning Representation (AMR). The system can parse sentences to AMR graphs or generate text from existing graphs. It includes a GUI for visualization and experimentation.",
|
||||||
|
"github": "bjascob/amrlib",
|
||||||
|
"pip": "amrlib",
|
||||||
|
"code_example": [
|
||||||
|
"import spacy",
|
||||||
|
"import amrlib",
|
||||||
|
"amrlib.setup_spacy_extension()",
|
||||||
|
"nlp = spacy.load('en_core_web_sm')",
|
||||||
|
"doc = nlp('This is a test of the spaCy extension. The test has multiple sentences.')",
|
||||||
|
"graphs = doc._.to_amr()",
|
||||||
|
"for graph in graphs:",
|
||||||
|
" print(graph)"
|
||||||
|
],
|
||||||
|
"author": "Brad Jascob",
|
||||||
|
"author_links": {
|
||||||
|
"github": "bjascob"
|
||||||
|
},
|
||||||
|
"category": ["pipeline"]
|
||||||
|
},
|
||||||
{
|
{
|
||||||
"id": "blackstone",
|
"id": "blackstone",
|
||||||
"title": "Blackstone",
|
"title": "Blackstone",
|
||||||
|
@ -2138,7 +2193,7 @@
|
||||||
"category": ["scientific"],
|
"category": ["scientific"],
|
||||||
"tags": ["sentence segmentation"],
|
"tags": ["sentence segmentation"],
|
||||||
"code_example": [
|
"code_example": [
|
||||||
"from pysbd.util import PySBDFactory",
|
"from pysbd.utils import PySBDFactory",
|
||||||
"",
|
"",
|
||||||
"nlp = spacy.blank('en')",
|
"nlp = spacy.blank('en')",
|
||||||
"nlp.add_pipe(PySBDFactory(nlp))",
|
"nlp.add_pipe(PySBDFactory(nlp))",
|
||||||
|
|