Mirror of https://github.com/explosion/spaCy.git
Merge branch 'master' into feature/nel-wiki

Commit d83a1e3052

.github/contributors/BreakBB.md (vendored, new file)
@@ -0,0 +1,106 @@

# spaCy contributor agreement

This spaCy Contributor Agreement (**"SCA"**) is based on the
[Oracle Contributor Agreement](http://www.oracle.com/technetwork/oca-405177.pdf).
The SCA applies to any contribution that you make to any product or project
managed by us (the **"project"**), and sets out the intellectual property rights
you grant to us in the contributed materials. The term **"us"** shall mean
[ExplosionAI UG (haftungsbeschränkt)](https://explosion.ai/legal). The term
**"you"** shall mean the person or entity identified below.

If you agree to be bound by these terms, fill in the information requested
below and include the filled-in version with your first pull request, under the
folder [`.github/contributors/`](/.github/contributors/). The name of the file
should be your GitHub username, with the extension `.md`. For example, the user
example_user would create the file `.github/contributors/example_user.md`.

Read this agreement carefully before signing. These terms and conditions
constitute a binding legal agreement.

## Contributor Agreement

1. The term "contribution" or "contributed materials" means any source code,
object code, patch, tool, sample, graphic, specification, manual,
documentation, or any other material posted or submitted by you to the project.

2. With respect to any worldwide copyrights, or copyright applications and
registrations, in your contribution:

* you hereby assign to us joint ownership, and to the extent that such
assignment is or becomes invalid, ineffective or unenforceable, you hereby
grant to us a perpetual, irrevocable, non-exclusive, worldwide, no-charge,
royalty-free, unrestricted license to exercise all rights under those
copyrights. This includes, at our option, the right to sublicense these same
rights to third parties through multiple levels of sublicensees or other
licensing arrangements;

* you agree that each of us can do all things in relation to your
contribution as if each of us were the sole owners, and if one of us makes
a derivative work of your contribution, the one who makes the derivative
work (or has it made) will be the sole owner of that derivative work;

* you agree that you will not assert any moral rights in your contribution
against us, our licensees or transferees;

* you agree that we may register a copyright in your contribution and
exercise all ownership rights associated with it; and

* you agree that neither of us has any duty to consult with, obtain the
consent of, pay or render an accounting to the other for any use or
distribution of your contribution.

3. With respect to any patents you own, or that you can license without payment
to any third party, you hereby grant to us a perpetual, irrevocable,
non-exclusive, worldwide, no-charge, royalty-free license to:

* make, have made, use, sell, offer to sell, import, and otherwise transfer
your contribution in whole or in part, alone or in combination with or
included in any product, work or materials arising out of the project to
which your contribution was submitted, and

* at our option, to sublicense these same rights to third parties through
multiple levels of sublicensees or other licensing arrangements.

4. Except as set out above, you keep all right, title, and interest in your
contribution. The rights that you grant to us under these terms are effective
on the date you first submitted a contribution to us, even if your submission
took place before the date you sign these terms.

5. You covenant, represent, warrant and agree that:

* Each contribution that you submit is and shall be an original work of
authorship and you can legally grant the rights set out in this SCA;

* to the best of your knowledge, each contribution will not violate any
third party's copyrights, trademarks, patents, or other intellectual
property rights; and

* each contribution shall be in compliance with U.S. export control laws and
other applicable export and import laws. You agree to notify us if you
become aware of any circumstance which would make any of the foregoing
representations inaccurate in any respect. We may publicly disclose your
participation in the project, including the fact that you have signed the SCA.

6. This SCA is governed by the laws of the State of California and applicable
U.S. Federal law. Any choice of law rules will not apply.

7. Please place an “x” on one of the applicable statements below. Please do NOT
mark both statements:

* [x] I am signing on behalf of myself as an individual and no other person
or entity, including my employer, has or will have rights with respect to my
contributions.

* [ ] I am signing on behalf of my employer or a legal entity and I have the
actual authority to contractually bind that entity.

## Contributor Details

| Field                          | Entry                |
| ------------------------------ | -------------------- |
| Name                           | Björn Böing          |
| Company name (if applicable)   |                      |
| Title or role (if applicable)  |                      |
| Date                           | 15.04.2019           |
| GitHub username                | BreakBB              |
| Website (optional)             |                      |

.github/contributors/Dobita21.md (vendored, new file)
@@ -0,0 +1,106 @@

# spaCy contributor agreement

This spaCy Contributor Agreement (**"SCA"**) is based on the
[Oracle Contributor Agreement](http://www.oracle.com/technetwork/oca-405177.pdf).
The SCA applies to any contribution that you make to any product or project
managed by us (the **"project"**), and sets out the intellectual property rights
you grant to us in the contributed materials. The term **"us"** shall mean
[ExplosionAI GmbH](https://explosion.ai/legal). The term
**"you"** shall mean the person or entity identified below.

If you agree to be bound by these terms, fill in the information requested
below and include the filled-in version with your first pull request, under the
folder [`.github/contributors/`](/.github/contributors/). The name of the file
should be your GitHub username, with the extension `.md`. For example, the user
example_user would create the file `.github/contributors/example_user.md`.

Read this agreement carefully before signing. These terms and conditions
constitute a binding legal agreement.

## Contributor Agreement

1. The term "contribution" or "contributed materials" means any source code,
object code, patch, tool, sample, graphic, specification, manual,
documentation, or any other material posted or submitted by you to the project.

2. With respect to any worldwide copyrights, or copyright applications and
registrations, in your contribution:

* you hereby assign to us joint ownership, and to the extent that such
assignment is or becomes invalid, ineffective or unenforceable, you hereby
grant to us a perpetual, irrevocable, non-exclusive, worldwide, no-charge,
royalty-free, unrestricted license to exercise all rights under those
copyrights. This includes, at our option, the right to sublicense these same
rights to third parties through multiple levels of sublicensees or other
licensing arrangements;

* you agree that each of us can do all things in relation to your
contribution as if each of us were the sole owners, and if one of us makes
a derivative work of your contribution, the one who makes the derivative
work (or has it made) will be the sole owner of that derivative work;

* you agree that you will not assert any moral rights in your contribution
against us, our licensees or transferees;

* you agree that we may register a copyright in your contribution and
exercise all ownership rights associated with it; and

* you agree that neither of us has any duty to consult with, obtain the
consent of, pay or render an accounting to the other for any use or
distribution of your contribution.

3. With respect to any patents you own, or that you can license without payment
to any third party, you hereby grant to us a perpetual, irrevocable,
non-exclusive, worldwide, no-charge, royalty-free license to:

* make, have made, use, sell, offer to sell, import, and otherwise transfer
your contribution in whole or in part, alone or in combination with or
included in any product, work or materials arising out of the project to
which your contribution was submitted, and

* at our option, to sublicense these same rights to third parties through
multiple levels of sublicensees or other licensing arrangements.

4. Except as set out above, you keep all right, title, and interest in your
contribution. The rights that you grant to us under these terms are effective
on the date you first submitted a contribution to us, even if your submission
took place before the date you sign these terms.

5. You covenant, represent, warrant and agree that:

* Each contribution that you submit is and shall be an original work of
authorship and you can legally grant the rights set out in this SCA;

* to the best of your knowledge, each contribution will not violate any
third party's copyrights, trademarks, patents, or other intellectual
property rights; and

* each contribution shall be in compliance with U.S. export control laws and
other applicable export and import laws. You agree to notify us if you
become aware of any circumstance which would make any of the foregoing
representations inaccurate in any respect. We may publicly disclose your
participation in the project, including the fact that you have signed the SCA.

6. This SCA is governed by the laws of the State of California and applicable
U.S. Federal law. Any choice of law rules will not apply.

7. Please place an “x” on one of the applicable statements below. Please do NOT
mark both statements:

* [x] I am signing on behalf of myself as an individual and no other person
or entity, including my employer, has or will have rights with respect to my
contributions.

* [ ] I am signing on behalf of my employer or a legal entity and I have the
actual authority to contractually bind that entity.

## Contributor Details

| Field                          | Entry                |
| ------------------------------ | -------------------- |
| Name                           | Nattapol             |
| Company name (if applicable)   |                      |
| Title or role (if applicable)  |                      |
| Date                           | 18.04.2019           |
| GitHub username                | Dobita21             |
| Website (optional)             |                      |

.github/contributors/F0rge1cE.md (vendored, new file)
@@ -0,0 +1,106 @@

# spaCy contributor agreement

This spaCy Contributor Agreement (**"SCA"**) is based on the
[Oracle Contributor Agreement](http://www.oracle.com/technetwork/oca-405177.pdf).
The SCA applies to any contribution that you make to any product or project
managed by us (the **"project"**), and sets out the intellectual property rights
you grant to us in the contributed materials. The term **"us"** shall mean
[ExplosionAI GmbH](https://explosion.ai/legal). The term
**"you"** shall mean the person or entity identified below.

If you agree to be bound by these terms, fill in the information requested
below and include the filled-in version with your first pull request, under the
folder [`.github/contributors/`](/.github/contributors/). The name of the file
should be your GitHub username, with the extension `.md`. For example, the user
example_user would create the file `.github/contributors/example_user.md`.

Read this agreement carefully before signing. These terms and conditions
constitute a binding legal agreement.

## Contributor Agreement

1. The term "contribution" or "contributed materials" means any source code,
object code, patch, tool, sample, graphic, specification, manual,
documentation, or any other material posted or submitted by you to the project.

2. With respect to any worldwide copyrights, or copyright applications and
registrations, in your contribution:

* you hereby assign to us joint ownership, and to the extent that such
assignment is or becomes invalid, ineffective or unenforceable, you hereby
grant to us a perpetual, irrevocable, non-exclusive, worldwide, no-charge,
royalty-free, unrestricted license to exercise all rights under those
copyrights. This includes, at our option, the right to sublicense these same
rights to third parties through multiple levels of sublicensees or other
licensing arrangements;

* you agree that each of us can do all things in relation to your
contribution as if each of us were the sole owners, and if one of us makes
a derivative work of your contribution, the one who makes the derivative
work (or has it made) will be the sole owner of that derivative work;

* you agree that you will not assert any moral rights in your contribution
against us, our licensees or transferees;

* you agree that we may register a copyright in your contribution and
exercise all ownership rights associated with it; and

* you agree that neither of us has any duty to consult with, obtain the
consent of, pay or render an accounting to the other for any use or
distribution of your contribution.

3. With respect to any patents you own, or that you can license without payment
to any third party, you hereby grant to us a perpetual, irrevocable,
non-exclusive, worldwide, no-charge, royalty-free license to:

* make, have made, use, sell, offer to sell, import, and otherwise transfer
your contribution in whole or in part, alone or in combination with or
included in any product, work or materials arising out of the project to
which your contribution was submitted, and

* at our option, to sublicense these same rights to third parties through
multiple levels of sublicensees or other licensing arrangements.

4. Except as set out above, you keep all right, title, and interest in your
contribution. The rights that you grant to us under these terms are effective
on the date you first submitted a contribution to us, even if your submission
took place before the date you sign these terms.

5. You covenant, represent, warrant and agree that:

* Each contribution that you submit is and shall be an original work of
authorship and you can legally grant the rights set out in this SCA;

* to the best of your knowledge, each contribution will not violate any
third party's copyrights, trademarks, patents, or other intellectual
property rights; and

* each contribution shall be in compliance with U.S. export control laws and
other applicable export and import laws. You agree to notify us if you
become aware of any circumstance which would make any of the foregoing
representations inaccurate in any respect. We may publicly disclose your
participation in the project, including the fact that you have signed the SCA.

6. This SCA is governed by the laws of the State of California and applicable
U.S. Federal law. Any choice of law rules will not apply.

7. Please place an “x” on one of the applicable statements below. Please do NOT
mark both statements:

* [x] I am signing on behalf of myself as an individual and no other person
or entity, including my employer, has or will have rights with respect to my
contributions.

* [x] I am signing on behalf of my employer or a legal entity and I have the
actual authority to contractually bind that entity.

## Contributor Details

| Field                          | Entry                |
| ------------------------------ | -------------------- |
| Name                           | Icarus Xu            |
| Company name (if applicable)   |                      |
| Title or role (if applicable)  |                      |
| Date                           | 05/06/2019           |
| GitHub username                | F0rge1cE             |
| Website (optional)             |                      |

.github/contributors/NirantK.md (vendored, new file)
@@ -0,0 +1,106 @@

# spaCy contributor agreement

This spaCy Contributor Agreement (**"SCA"**) is based on the
[Oracle Contributor Agreement](http://www.oracle.com/technetwork/oca-405177.pdf).
The SCA applies to any contribution that you make to any product or project
managed by us (the **"project"**), and sets out the intellectual property rights
you grant to us in the contributed materials. The term **"us"** shall mean
[ExplosionAI GmbH](https://explosion.ai/legal). The term
**"you"** shall mean the person or entity identified below.

If you agree to be bound by these terms, fill in the information requested
below and include the filled-in version with your first pull request, under the
folder [`.github/contributors/`](/.github/contributors/). The name of the file
should be your GitHub username, with the extension `.md`. For example, the user
example_user would create the file `.github/contributors/example_user.md`.

Read this agreement carefully before signing. These terms and conditions
constitute a binding legal agreement.

## Contributor Agreement

1. The term "contribution" or "contributed materials" means any source code,
object code, patch, tool, sample, graphic, specification, manual,
documentation, or any other material posted or submitted by you to the project.

2. With respect to any worldwide copyrights, or copyright applications and
registrations, in your contribution:

* you hereby assign to us joint ownership, and to the extent that such
assignment is or becomes invalid, ineffective or unenforceable, you hereby
grant to us a perpetual, irrevocable, non-exclusive, worldwide, no-charge,
royalty-free, unrestricted license to exercise all rights under those
copyrights. This includes, at our option, the right to sublicense these same
rights to third parties through multiple levels of sublicensees or other
licensing arrangements;

* you agree that each of us can do all things in relation to your
contribution as if each of us were the sole owners, and if one of us makes
a derivative work of your contribution, the one who makes the derivative
work (or has it made) will be the sole owner of that derivative work;

* you agree that you will not assert any moral rights in your contribution
against us, our licensees or transferees;

* you agree that we may register a copyright in your contribution and
exercise all ownership rights associated with it; and

* you agree that neither of us has any duty to consult with, obtain the
consent of, pay or render an accounting to the other for any use or
distribution of your contribution.

3. With respect to any patents you own, or that you can license without payment
to any third party, you hereby grant to us a perpetual, irrevocable,
non-exclusive, worldwide, no-charge, royalty-free license to:

* make, have made, use, sell, offer to sell, import, and otherwise transfer
your contribution in whole or in part, alone or in combination with or
included in any product, work or materials arising out of the project to
which your contribution was submitted, and

* at our option, to sublicense these same rights to third parties through
multiple levels of sublicensees or other licensing arrangements.

4. Except as set out above, you keep all right, title, and interest in your
contribution. The rights that you grant to us under these terms are effective
on the date you first submitted a contribution to us, even if your submission
took place before the date you sign these terms.

5. You covenant, represent, warrant and agree that:

* Each contribution that you submit is and shall be an original work of
authorship and you can legally grant the rights set out in this SCA;

* to the best of your knowledge, each contribution will not violate any
third party's copyrights, trademarks, patents, or other intellectual
property rights; and

* each contribution shall be in compliance with U.S. export control laws and
other applicable export and import laws. You agree to notify us if you
become aware of any circumstance which would make any of the foregoing
representations inaccurate in any respect. We may publicly disclose your
participation in the project, including the fact that you have signed the SCA.

6. This SCA is governed by the laws of the State of California and applicable
U.S. Federal law. Any choice of law rules will not apply.

7. Please place an “x” on one of the applicable statements below. Please do NOT
mark both statements:

* [x] I am signing on behalf of myself as an individual and no other person
or entity, including my employer, has or will have rights with respect to my
contributions.

* [ ] I am signing on behalf of my employer or a legal entity and I have the
actual authority to contractually bind that entity.

## Contributor Details

| Field                          | Entry                |
| ------------------------------ | -------------------- |
| Name                           | Nirant Kasliwal      |
| Company name (if applicable)   |                      |
| Title or role (if applicable)  |                      |
| Date                           |                      |
| GitHub username                | NirantK              |
| Website (optional)             | https://nirantk.com  |

.github/contributors/aaronkub.md (vendored, new file)
@@ -0,0 +1,106 @@

# spaCy contributor agreement

This spaCy Contributor Agreement (**"SCA"**) is based on the
[Oracle Contributor Agreement](http://www.oracle.com/technetwork/oca-405177.pdf).
The SCA applies to any contribution that you make to any product or project
managed by us (the **"project"**), and sets out the intellectual property rights
you grant to us in the contributed materials. The term **"us"** shall mean
[ExplosionAI GmbH](https://explosion.ai/legal). The term
**"you"** shall mean the person or entity identified below.

If you agree to be bound by these terms, fill in the information requested
below and include the filled-in version with your first pull request, under the
folder [`.github/contributors/`](/.github/contributors/). The name of the file
should be your GitHub username, with the extension `.md`. For example, the user
example_user would create the file `.github/contributors/example_user.md`.

Read this agreement carefully before signing. These terms and conditions
constitute a binding legal agreement.

## Contributor Agreement

1. The term "contribution" or "contributed materials" means any source code,
object code, patch, tool, sample, graphic, specification, manual,
documentation, or any other material posted or submitted by you to the project.

2. With respect to any worldwide copyrights, or copyright applications and
registrations, in your contribution:

* you hereby assign to us joint ownership, and to the extent that such
assignment is or becomes invalid, ineffective or unenforceable, you hereby
grant to us a perpetual, irrevocable, non-exclusive, worldwide, no-charge,
royalty-free, unrestricted license to exercise all rights under those
copyrights. This includes, at our option, the right to sublicense these same
rights to third parties through multiple levels of sublicensees or other
licensing arrangements;

* you agree that each of us can do all things in relation to your
contribution as if each of us were the sole owners, and if one of us makes
a derivative work of your contribution, the one who makes the derivative
work (or has it made) will be the sole owner of that derivative work;

* you agree that you will not assert any moral rights in your contribution
against us, our licensees or transferees;

* you agree that we may register a copyright in your contribution and
exercise all ownership rights associated with it; and

* you agree that neither of us has any duty to consult with, obtain the
consent of, pay or render an accounting to the other for any use or
distribution of your contribution.

3. With respect to any patents you own, or that you can license without payment
to any third party, you hereby grant to us a perpetual, irrevocable,
non-exclusive, worldwide, no-charge, royalty-free license to:

* make, have made, use, sell, offer to sell, import, and otherwise transfer
your contribution in whole or in part, alone or in combination with or
included in any product, work or materials arising out of the project to
which your contribution was submitted, and

* at our option, to sublicense these same rights to third parties through
multiple levels of sublicensees or other licensing arrangements.

4. Except as set out above, you keep all right, title, and interest in your
contribution. The rights that you grant to us under these terms are effective
on the date you first submitted a contribution to us, even if your submission
took place before the date you sign these terms.

5. You covenant, represent, warrant and agree that:

* Each contribution that you submit is and shall be an original work of
authorship and you can legally grant the rights set out in this SCA;

* to the best of your knowledge, each contribution will not violate any
third party's copyrights, trademarks, patents, or other intellectual
property rights; and

* each contribution shall be in compliance with U.S. export control laws and
other applicable export and import laws. You agree to notify us if you
become aware of any circumstance which would make any of the foregoing
representations inaccurate in any respect. We may publicly disclose your
participation in the project, including the fact that you have signed the SCA.

6. This SCA is governed by the laws of the State of California and applicable
U.S. Federal law. Any choice of law rules will not apply.

7. Please place an “x” on one of the applicable statements below. Please do NOT
mark both statements:

* [x] I am signing on behalf of myself as an individual and no other person
or entity, including my employer, has or will have rights with respect to my
contributions.

* [ ] I am signing on behalf of my employer or a legal entity and I have the
actual authority to contractually bind that entity.

## Contributor Details

| Field                          | Entry                |
| ------------------------------ | -------------------- |
| Name                           | Aaron Kub            |
| Company name (if applicable)   |                      |
| Title or role (if applicable)  |                      |
| Date                           | 2019-05-09           |
| GitHub username                | aaronkub             |
| Website (optional)             |                      |

.github/contributors/amitness.md (vendored, new file)
@@ -0,0 +1,106 @@

# spaCy contributor agreement

This spaCy Contributor Agreement (**"SCA"**) is based on the
[Oracle Contributor Agreement](http://www.oracle.com/technetwork/oca-405177.pdf).
The SCA applies to any contribution that you make to any product or project
managed by us (the **"project"**), and sets out the intellectual property rights
you grant to us in the contributed materials. The term **"us"** shall mean
[ExplosionAI GmbH](https://explosion.ai/legal). The term
**"you"** shall mean the person or entity identified below.

If you agree to be bound by these terms, fill in the information requested
below and include the filled-in version with your first pull request, under the
folder [`.github/contributors/`](/.github/contributors/). The name of the file
should be your GitHub username, with the extension `.md`. For example, the user
example_user would create the file `.github/contributors/example_user.md`.

Read this agreement carefully before signing. These terms and conditions
constitute a binding legal agreement.

## Contributor Agreement

1. The term "contribution" or "contributed materials" means any source code,
object code, patch, tool, sample, graphic, specification, manual,
documentation, or any other material posted or submitted by you to the project.

2. With respect to any worldwide copyrights, or copyright applications and
registrations, in your contribution:

* you hereby assign to us joint ownership, and to the extent that such
assignment is or becomes invalid, ineffective or unenforceable, you hereby
grant to us a perpetual, irrevocable, non-exclusive, worldwide, no-charge,
royalty-free, unrestricted license to exercise all rights under those
copyrights. This includes, at our option, the right to sublicense these same
rights to third parties through multiple levels of sublicensees or other
licensing arrangements;

* you agree that each of us can do all things in relation to your
contribution as if each of us were the sole owners, and if one of us makes
a derivative work of your contribution, the one who makes the derivative
work (or has it made) will be the sole owner of that derivative work;

* you agree that you will not assert any moral rights in your contribution
against us, our licensees or transferees;

* you agree that we may register a copyright in your contribution and
exercise all ownership rights associated with it; and

* you agree that neither of us has any duty to consult with, obtain the
consent of, pay or render an accounting to the other for any use or
distribution of your contribution.

3. With respect to any patents you own, or that you can license without payment
to any third party, you hereby grant to us a perpetual, irrevocable,
non-exclusive, worldwide, no-charge, royalty-free license to:

* make, have made, use, sell, offer to sell, import, and otherwise transfer
your contribution in whole or in part, alone or in combination with or
included in any product, work or materials arising out of the project to
which your contribution was submitted, and

* at our option, to sublicense these same rights to third parties through
multiple levels of sublicensees or other licensing arrangements.

4. Except as set out above, you keep all right, title, and interest in your
contribution. The rights that you grant to us under these terms are effective
on the date you first submitted a contribution to us, even if your submission
took place before the date you sign these terms.

5. You covenant, represent, warrant and agree that:

* Each contribution that you submit is and shall be an original work of
authorship and you can legally grant the rights set out in this SCA;

* to the best of your knowledge, each contribution will not violate any
third party's copyrights, trademarks, patents, or other intellectual
property rights; and

* each contribution shall be in compliance with U.S. export control laws and
other applicable export and import laws. You agree to notify us if you
become aware of any circumstance which would make any of the foregoing
representations inaccurate in any respect. We may publicly disclose your
participation in the project, including the fact that you have signed the SCA.

6. This SCA is governed by the laws of the State of California and applicable
U.S. Federal law. Any choice of law rules will not apply.

7. Please place an “x” on one of the applicable statements below. Please do NOT
mark both statements:

* [X] I am signing on behalf of myself as an individual and no other person
or entity, including my employer, has or will have rights with respect to my
contributions.

* [ ] I am signing on behalf of my employer or a legal entity and I have the
actual authority to contractually bind that entity.

## Contributor Details

| Field                          | Entry                |
| ------------------------------ | -------------------- |
| Name                           | Amit Chaudhary       |
| Company name (if applicable)   |                      |
| Title or role (if applicable)  |                      |
| Date                           | April 29, 2019       |
| GitHub username                | amitness             |
| Website (optional)             | https://amitness.com |

.github/contributors/bjascob.md (vendored, new file)
@@ -0,0 +1,106 @@

# spaCy contributor agreement

This spaCy Contributor Agreement (**"SCA"**) is based on the
[Oracle Contributor Agreement](http://www.oracle.com/technetwork/oca-405177.pdf).
The SCA applies to any contribution that you make to any product or project
managed by us (the **"project"**), and sets out the intellectual property rights
you grant to us in the contributed materials. The term **"us"** shall mean
[ExplosionAI GmbH](https://explosion.ai/legal). The term
**"you"** shall mean the person or entity identified below.

If you agree to be bound by these terms, fill in the information requested
below and include the filled-in version with your first pull request, under the
folder [`.github/contributors/`](/.github/contributors/). The name of the file
should be your GitHub username, with the extension `.md`. For example, the user
example_user would create the file `.github/contributors/example_user.md`.

Read this agreement carefully before signing. These terms and conditions
constitute a binding legal agreement.

## Contributor Agreement

1. The term "contribution" or "contributed materials" means any source code,
object code, patch, tool, sample, graphic, specification, manual,
documentation, or any other material posted or submitted by you to the project.

2. With respect to any worldwide copyrights, or copyright applications and
registrations, in your contribution:

* you hereby assign to us joint ownership, and to the extent that such
assignment is or becomes invalid, ineffective or unenforceable, you hereby
grant to us a perpetual, irrevocable, non-exclusive, worldwide, no-charge,
royalty-free, unrestricted license to exercise all rights under those
copyrights. This includes, at our option, the right to sublicense these same
rights to third parties through multiple levels of sublicensees or other
licensing arrangements;

* you agree that each of us can do all things in relation to your
contribution as if each of us were the sole owners, and if one of us makes
a derivative work of your contribution, the one who makes the derivative
work (or has it made) will be the sole owner of that derivative work;

* you agree that you will not assert any moral rights in your contribution
against us, our licensees or transferees;

* you agree that we may register a copyright in your contribution and
exercise all ownership rights associated with it; and

* you agree that neither of us has any duty to consult with, obtain the
consent of, pay or render an accounting to the other for any use or
distribution of your contribution.

3. With respect to any patents you own, or that you can license without payment
to any third party, you hereby grant to us a perpetual, irrevocable,
non-exclusive, worldwide, no-charge, royalty-free license to:

* make, have made, use, sell, offer to sell, import, and otherwise transfer
your contribution in whole or in part, alone or in combination with or
included in any product, work or materials arising out of the project to
which your contribution was submitted, and

* at our option, to sublicense these same rights to third parties through
multiple levels of sublicensees or other licensing arrangements.

4. Except as set out above, you keep all right, title, and interest in your
contribution. The rights that you grant to us under these terms are effective
on the date you first submitted a contribution to us, even if your submission
took place before the date you sign these terms.

5. You covenant, represent, warrant and agree that:

* Each contribution that you submit is and shall be an original work of
authorship and you can legally grant the rights set out in this SCA;

* to the best of your knowledge, each contribution will not violate any
third party's copyrights, trademarks, patents, or other intellectual
property rights; and

* each contribution shall be in compliance with U.S. export control laws and
other applicable export and import laws. You agree to notify us if you
become aware of any circumstance which would make any of the foregoing
representations inaccurate in any respect. We may publicly disclose your
participation in the project, including the fact that you have signed the SCA.

6. This SCA is governed by the laws of the State of California and applicable
U.S. Federal law. Any choice of law rules will not apply.

7. Please place an “x” on one of the applicable statements below. Please do NOT
mark both statements:

* [x] I am signing on behalf of myself as an individual and no other person
or entity, including my employer, has or will have rights with respect to my
contributions.

* [ ] I am signing on behalf of my employer or a legal entity and I have the
actual authority to contractually bind that entity.

## Contributor Details

| Field                          | Entry                |
| ------------------------------ | -------------------- |
| Name                           | Brad Jascob          |
| Company name (if applicable)   | n/a                  |
| Title or role (if applicable)  | Software Engineer    |
| Date                           | 04/25/2019           |
| GitHub username                | bjascob              |
| Website (optional)             | n/a                  |

.github/contributors/bryant1410.md (vendored, new file)
@@ -0,0 +1,106 @@

# spaCy contributor agreement

This spaCy Contributor Agreement (**"SCA"**) is based on the
[Oracle Contributor Agreement](http://www.oracle.com/technetwork/oca-405177.pdf).
The SCA applies to any contribution that you make to any product or project
managed by us (the **"project"**), and sets out the intellectual property rights
you grant to us in the contributed materials. The term **"us"** shall mean
[ExplosionAI GmbH](https://explosion.ai/legal). The term
**"you"** shall mean the person or entity identified below.

If you agree to be bound by these terms, fill in the information requested
below and include the filled-in version with your first pull request, under the
folder [`.github/contributors/`](/.github/contributors/). The name of the file
should be your GitHub username, with the extension `.md`. For example, the user
example_user would create the file `.github/contributors/example_user.md`.

Read this agreement carefully before signing. These terms and conditions
constitute a binding legal agreement.

## Contributor Agreement

1. The term "contribution" or "contributed materials" means any source code,
object code, patch, tool, sample, graphic, specification, manual,
documentation, or any other material posted or submitted by you to the project.

2. With respect to any worldwide copyrights, or copyright applications and
registrations, in your contribution:

* you hereby assign to us joint ownership, and to the extent that such
assignment is or becomes invalid, ineffective or unenforceable, you hereby
grant to us a perpetual, irrevocable, non-exclusive, worldwide, no-charge,
royalty-free, unrestricted license to exercise all rights under those
copyrights. This includes, at our option, the right to sublicense these same
rights to third parties through multiple levels of sublicensees or other
licensing arrangements;

* you agree that each of us can do all things in relation to your
contribution as if each of us were the sole owners, and if one of us makes
a derivative work of your contribution, the one who makes the derivative
work (or has it made) will be the sole owner of that derivative work;

* you agree that you will not assert any moral rights in your contribution
against us, our licensees or transferees;

* you agree that we may register a copyright in your contribution and
exercise all ownership rights associated with it; and

* you agree that neither of us has any duty to consult with, obtain the
consent of, pay or render an accounting to the other for any use or
distribution of your contribution.

3. With respect to any patents you own, or that you can license without payment
to any third party, you hereby grant to us a perpetual, irrevocable,
non-exclusive, worldwide, no-charge, royalty-free license to:

* make, have made, use, sell, offer to sell, import, and otherwise transfer
your contribution in whole or in part, alone or in combination with or
included in any product, work or materials arising out of the project to
which your contribution was submitted, and

* at our option, to sublicense these same rights to third parties through
multiple levels of sublicensees or other licensing arrangements.

4. Except as set out above, you keep all right, title, and interest in your
contribution. The rights that you grant to us under these terms are effective
on the date you first submitted a contribution to us, even if your submission
took place before the date you sign these terms.

5. You covenant, represent, warrant and agree that:

* Each contribution that you submit is and shall be an original work of
authorship and you can legally grant the rights set out in this SCA;

* to the best of your knowledge, each contribution will not violate any
third party's copyrights, trademarks, patents, or other intellectual
property rights; and

* each contribution shall be in compliance with U.S. export control laws and
other applicable export and import laws. You agree to notify us if you
become aware of any circumstance which would make any of the foregoing
representations inaccurate in any respect. We may publicly disclose your
participation in the project, including the fact that you have signed the SCA.

6. This SCA is governed by the laws of the State of California and applicable
U.S. Federal law. Any choice of law rules will not apply.

7. Please place an “x” on one of the applicable statements below. Please do NOT
mark both statements:

* [x] I am signing on behalf of myself as an individual and no other person
or entity, including my employer, has or will have rights with respect to my
contributions.

* [ ] I am signing on behalf of my employer or a legal entity and I have the
actual authority to contractually bind that entity.

## Contributor Details

| Field                          | Entry                |
| ------------------------------ | -------------------- |
| Name                           | Santiago Castro      |
| Company name (if applicable)   |                      |
| Title or role (if applicable)  |                      |
| Date                           | 2019-04-09           |
| GitHub username                | bryant1410           |
| Website (optional)             |                      |

.github/contributors/celikomer.md (vendored, new file)
@@ -0,0 +1,106 @@

# spaCy contributor agreement

This spaCy Contributor Agreement (**"SCA"**) is based on the
[Oracle Contributor Agreement](http://www.oracle.com/technetwork/oca-405177.pdf).
The SCA applies to any contribution that you make to any product or project
managed by us (the **"project"**), and sets out the intellectual property rights
you grant to us in the contributed materials. The term **"us"** shall mean
[ExplosionAI GmbH](https://explosion.ai/legal). The term
**"you"** shall mean the person or entity identified below.

If you agree to be bound by these terms, fill in the information requested
below and include the filled-in version with your first pull request, under the
folder [`.github/contributors/`](/.github/contributors/). The name of the file
should be your GitHub username, with the extension `.md`. For example, the user
example_user would create the file `.github/contributors/example_user.md`.

Read this agreement carefully before signing. These terms and conditions
constitute a binding legal agreement.

## Contributor Agreement

1. The term "contribution" or "contributed materials" means any source code,
object code, patch, tool, sample, graphic, specification, manual,
documentation, or any other material posted or submitted by you to the project.

2. With respect to any worldwide copyrights, or copyright applications and
registrations, in your contribution:

* you hereby assign to us joint ownership, and to the extent that such
assignment is or becomes invalid, ineffective or unenforceable, you hereby
grant to us a perpetual, irrevocable, non-exclusive, worldwide, no-charge,
royalty-free, unrestricted license to exercise all rights under those
copyrights. This includes, at our option, the right to sublicense these same
rights to third parties through multiple levels of sublicensees or other
licensing arrangements;

* you agree that each of us can do all things in relation to your
contribution as if each of us were the sole owners, and if one of us makes
a derivative work of your contribution, the one who makes the derivative
work (or has it made) will be the sole owner of that derivative work;

* you agree that you will not assert any moral rights in your contribution
against us, our licensees or transferees;

* you agree that we may register a copyright in your contribution and
exercise all ownership rights associated with it; and

* you agree that neither of us has any duty to consult with, obtain the
consent of, pay or render an accounting to the other for any use or
distribution of your contribution.

3. With respect to any patents you own, or that you can license without payment
to any third party, you hereby grant to us a perpetual, irrevocable,
non-exclusive, worldwide, no-charge, royalty-free license to:

* make, have made, use, sell, offer to sell, import, and otherwise transfer
your contribution in whole or in part, alone or in combination with or
included in any product, work or materials arising out of the project to
which your contribution was submitted, and

* at our option, to sublicense these same rights to third parties through
multiple levels of sublicensees or other licensing arrangements.

4. Except as set out above, you keep all right, title, and interest in your
contribution. The rights that you grant to us under these terms are effective
on the date you first submitted a contribution to us, even if your submission
took place before the date you sign these terms.

5. You covenant, represent, warrant and agree that:

* Each contribution that you submit is and shall be an original work of
authorship and you can legally grant the rights set out in this SCA;

* to the best of your knowledge, each contribution will not violate any
third party's copyrights, trademarks, patents, or other intellectual
property rights; and

* each contribution shall be in compliance with U.S. export control laws and
other applicable export and import laws. You agree to notify us if you
become aware of any circumstance which would make any of the foregoing
representations inaccurate in any respect. We may publicly disclose your
participation in the project, including the fact that you have signed the SCA.

6. This SCA is governed by the laws of the State of California and applicable
U.S. Federal law. Any choice of law rules will not apply.

7. Please place an “x” on one of the applicable statements below. Please do NOT
mark both statements:

* [x] I am signing on behalf of myself as an individual and no other person
or entity, including my employer, has or will have rights with respect to my
contributions.

* [ ] I am signing on behalf of my employer or a legal entity and I have the
actual authority to contractually bind that entity.

## Contributor Details

| Field                          | Entry                |
| ------------------------------ | -------------------- |
| Name                           | Omer Celik           |
| Company name (if applicable)   |                      |
| Title or role (if applicable)  |                      |
| Date                           | 04/11/2019           |
| GitHub username                | celikomer            |
| Website (optional)             | www.ocelik.com       |
106
.github/contributors/estr4ng7d.md
vendored
Normal file

@@ -0,0 +1,106 @@

# spaCy contributor agreement

7. Please place an “x” on one of the applicable statements below. Please do NOT
mark both statements:

* [x] I am signing on behalf of myself as an individual and no other person
or entity, including my employer, has or will have rights with respect to my
contributions.

* [ ] I am signing on behalf of my employer or a legal entity and I have the
actual authority to contractually bind that entity.

## Contributor Details

| Field                          | Entry                |
|------------------------------- | -------------------- |
| Name                           | Amey Baviskar        |
| Company name (if applicable)   |                      |
| Title or role (if applicable)  |                      |
| Date                           | 21-May-2019          |
| GitHub username                | estr4ng7d            |
| Website (optional)             |                      |

106
.github/contributors/fizban99.md
vendored
Normal file

@@ -0,0 +1,106 @@

# spaCy contributor agreement

7. Please place an “x” on one of the applicable statements below. Please do NOT
mark both statements:

* [x] I am signing on behalf of myself as an individual and no other person
or entity, including my employer, has or will have rights with respect to my
contributions.

* [ ] I am signing on behalf of my employer or a legal entity and I have the
actual authority to contractually bind that entity.

## Contributor Details

| Field                          | Entry                |
|------------------------------- | -------------------- |
| Name                           | A.I.M.               |
| Company name (if applicable)   |                      |
| Title or role (if applicable)  |                      |
| Date                           | 16.04.2019           |
| GitHub username                | fizban99             |
| Website (optional)             |                      |

106
.github/contributors/henry860916.md
vendored
Normal file

@@ -0,0 +1,106 @@

# spaCy contributor agreement

7. Please place an “x” on one of the applicable statements below. Please do NOT
mark both statements:

* [x] I am signing on behalf of myself as an individual and no other person
or entity, including my employer, has or will have rights with respect to my
contributions.

* [ ] I am signing on behalf of my employer or a legal entity and I have the
actual authority to contractually bind that entity.

## Contributor Details

| Field                          | Entry                |
|------------------------------- | -------------------- |
| Name                           | Henry Zhang          |
| Company name (if applicable)   |                      |
| Title or role (if applicable)  |                      |
| Date                           | 2019-04-30           |
| GitHub username                | henry860916          |
| Website (optional)             |                      |

106
.github/contributors/ldorigo.md
vendored
Normal file

@@ -0,0 +1,106 @@

# spaCy contributor agreement

7. Please place an “x” on one of the applicable statements below. Please do NOT
mark both statements:

* [x] I am signing on behalf of myself as an individual and no other person
or entity, including my employer, has or will have rights with respect to my
contributions.

* [ ] I am signing on behalf of my employer or a legal entity and I have the
actual authority to contractually bind that entity.

## Contributor Details

| Field                          | Entry                |
|------------------------------- | -------------------- |
| Name                           | Luca Dorigo          |
| Company name (if applicable)   | /                    |
| Title or role (if applicable)  | /                    |
| Date                           | 08.05.2019           |
| GitHub username                | ldorigo              |
| Website (optional)             | /                    |

106
.github/contributors/munozbravo.md
vendored
Normal file

@@ -0,0 +1,106 @@

# spaCy contributor agreement

7. Please place an “x” on one of the applicable statements below. Please do NOT
mark both statements:

* [x] I am signing on behalf of myself as an individual and no other person
or entity, including my employer, has or will have rights with respect to my
contributions.

* [ ] I am signing on behalf of my employer or a legal entity and I have the
actual authority to contractually bind that entity.

## Contributor Details

| Field                          | Entry                |
|------------------------------- | -------------------- |
| Name                           | Germán Muñoz         |
| Company name (if applicable)   |                      |
| Title or role (if applicable)  |                      |
| Date                           | 2019-06-01           |
| GitHub username                | munozbravo           |
| Website (optional)             |                      |

106
.github/contributors/nipunsadvilkar.md
vendored
Normal file

@@ -0,0 +1,106 @@

# spaCy contributor agreement

7. Please place an “x” on one of the applicable statements below. Please do NOT
mark both statements:

* [x] I am signing on behalf of myself as an individual and no other person
or entity, including my employer, has or will have rights with respect to my
contributions.

* [x] I am signing on behalf of my employer or a legal entity and I have the
actual authority to contractually bind that entity.

## Contributor Details

| Field                          | Entry                             |
|------------------------------- | --------------------------------- |
| Name                           | Nipun Sadvilkar                   |
| Company name (if applicable)   |                                   |
| Title or role (if applicable)  |                                   |
| Date                           | 31st May, 2019                    |
| GitHub username                | nipunsadvilkar                    |
| Website (optional)             | https://nipunsadvilkar.github.io/ |

106
.github/contributors/pickfire.md
vendored
Normal file

@@ -0,0 +1,106 @@

# spaCy contributor agreement

7. Please place an “x” on one of the applicable statements below. Please do NOT
mark both statements:

* [ ] I am signing on behalf of myself as an individual and no other person
or entity, including my employer, has or will have rights with respect to my
contributions.

* [x] I am signing on behalf of my employer or a legal entity and I have the
actual authority to contractually bind that entity.

## Contributor Details

| Field                          | Entry                |
|------------------------------- | -------------------- |
| Name                           | Ivan Tham Jun Hoe    |
| Company name (if applicable)   | Semut                |
| Title or role (if applicable)  | Data Analyst         |
| Date                           | Apr 11, 2019         |
| GitHub username                | pickfire             |
| Website (optional)             | https://pickfire.tk  |

106
.github/contributors/richardpaulhudson.md
vendored
Normal file

@@ -0,0 +1,106 @@

# spaCy contributor agreement

7. Please place an “x” on one of the applicable statements below. Please do NOT
mark both statements:

* [x] I am signing on behalf of myself as an individual and no other person
or entity, including my employer, has or will have rights with respect to my
contributions.

* [ ] I am signing on behalf of my employer or a legal entity and I have the
actual authority to contractually bind that entity.

## Contributor Details

| Field                          | Entry                   |
|------------------------------- | ----------------------- |
| Name                           | Richard Paul Hudson     |
| Company name (if applicable)   | msg systems ag          |
| Title or role (if applicable)  | Principal IT Consultant |
| Date                           | 06. May 2019            |
| GitHub username                | richardpaulhudson       |
| Website (optional)             |                         |

106
.github/contributors/ujwal-narayan.md
vendored
Normal file

@@ -0,0 +1,106 @@

# spaCy contributor agreement

7. Please place an “x” on one of the applicable statements below. Please do NOT
mark both statements:

* [x] I am signing on behalf of myself as an individual and no other person
or entity, including my employer, has or will have rights with respect to my
contributions.

* [ ] I am signing on behalf of my employer or a legal entity and I have the
actual authority to contractually bind that entity.

## Contributor Details

| Field                          | Entry                |
|------------------------------- | -------------------- |
| Name                           | Ujwal Narayan        |
| Company name (if applicable)   |                      |
| Title or role (if applicable)  |                      |
| Date                           | 17/05/2019           |
| GitHub username                | ujwal-narayan        |
| Website (optional)             |                      |

106 .github/contributors/xssChauhan.md vendored Normal file
@@ -0,0 +1,106 @@

# spaCy contributor agreement

This spaCy Contributor Agreement (**"SCA"**) is based on the
[Oracle Contributor Agreement](http://www.oracle.com/technetwork/oca-405177.pdf).
The SCA applies to any contribution that you make to any product or project
managed by us (the **"project"**), and sets out the intellectual property rights
you grant to us in the contributed materials. The term **"us"** shall mean
[ExplosionAI GmbH](https://explosion.ai/legal). The term
**"you"** shall mean the person or entity identified below.

## Contributor Details

| Field                          | Entry                |
|------------------------------- | -------------------- |
| Name                           | Shikhar Chauhan      |
| Company name (if applicable)   |                      |
| Title or role (if applicable)  |                      |
| Date                           | 12/11/2019           |
| GitHub username                | xssChauhan           |
| Website (optional)             |                      |
106 .github/contributors/yaph.md vendored Normal file
@@ -0,0 +1,106 @@

# spaCy contributor agreement

This spaCy Contributor Agreement (**"SCA"**) is based on the
[Oracle Contributor Agreement](http://www.oracle.com/technetwork/oca-405177.pdf).
The SCA applies to any contribution that you make to any product or project
managed by us (the **"project"**), and sets out the intellectual property rights
you grant to us in the contributed materials. The term **"us"** shall mean
[ExplosionAI GmbH](https://explosion.ai/legal). The term
**"you"** shall mean the person or entity identified below.

## Contributor Details

| Field                          | Entry                |
|------------------------------- | -------------------- |
| Name                           | Ramiro Gómez         |
| Company name (if applicable)   |                      |
| Title or role (if applicable)  |                      |
| Date                           | 2019-04-29           |
| GitHub username                | yaph                 |
| Website (optional)             | http://ramiro.org/   |
@@ -447,17 +447,7 @@ use the `get_doc()` utility function to construct it manually.
 
 ## Updating the website
 
-Our [website and docs](https://spacy.io) are implemented in
-[Jade/Pug](https://www.jade-lang.org), and built or served by
-[Harp](https://harpjs.com). Jade/Pug is an extensible templating language with a
-readable syntax, that compiles to HTML. Here's how to view the site locally:
-
-```bash
-sudo npm install --global harp
-git clone https://github.com/explosion/spaCy
-cd spaCy/website
-harp server
-```
+For instructions on how to build and run the [website](https://spacy.io) locally see **[Setup and installation](https://github.com/explosion/spaCy/blob/master/website/README.md#setup-and-installation-setup)** in the *website* directory's README.
 
 The docs can always use another example or more detail, and they should always
 be up to date and not misleading. To quickly find the correct file to edit,
16 README.md
@@ -6,11 +6,10 @@ spaCy is a library for advanced Natural Language Processing in Python and
 Cython. It's built on the very latest research, and was designed from day one
 to be used in real products. spaCy comes with
 [pre-trained statistical models](https://spacy.io/models) and word vectors, and
-currently supports tokenization for **45+ languages**. It features the
-**fastest syntactic parser** in the world, convolutional
-**neural network models** for tagging, parsing and **named entity recognition**
-and easy **deep learning** integration. It's commercial open-source software,
-released under the MIT license.
+currently supports tokenization for **49+ languages**. It features
+state-of-the-art speed, convolutional **neural network models** for tagging,
+parsing and **named entity recognition** and easy **deep learning** integration.
+It's commercial open-source software, released under the MIT license.
 
 💫 **Version 2.1 out now!** [Check out the release notes here.](https://github.com/explosion/spaCy/releases)
 
@@ -66,11 +65,11 @@ valuable if it's shared publicly, so that more people can benefit from it.
 
 ## Features
 
-- **Fastest syntactic parser** in the world
-- **Named entity** recognition
 - Non-destructive **tokenization**
-- Support for **45+ languages**
+- **Named entity** recognition
+- Support for **49+ languages**
 - Pre-trained [statistical models](https://spacy.io/models) and word vectors
+- State-of-the-art speed
 - Easy **deep learning** integration
 - Part-of-speech tagging
 - Labelled dependency parsing
@@ -80,7 +79,6 @@ valuable if it's shared publicly, so that more people can benefit from it.
 - Export to numpy data arrays
 - Efficient binary serialization
 - Easy **model packaging** and deployment
-- State-of-the-art speed
 - Robust, rigorously evaluated accuracy
 
 📖 **For more details, see the
@@ -16,4 +16,4 @@ version=${version/\'/}
 version=${version/\"/}
 version=${version/\"/}
 git tag "v$version"
-git push origin "v$version" --tags
+git push origin "v$version"
@@ -36,11 +36,27 @@ def main(model="en_core_web_sm"):
         print("{:<10}\t{}\t{}".format(r1.text, r2.ent_type_, r2.text))
 
 
+def filter_spans(spans):
+    # Filter a sequence of spans so they don't contain overlaps
+    get_sort_key = lambda span: (span.end - span.start, span.start)
+    sorted_spans = sorted(spans, key=get_sort_key, reverse=True)
+    result = []
+    seen_tokens = set()
+    for span in sorted_spans:
+        if span.start not in seen_tokens and span.end - 1 not in seen_tokens:
+            result.append(span)
+            seen_tokens.update(range(span.start, span.end))
+    return result
+
+
 def extract_currency_relations(doc):
-    # merge entities and noun chunks into one token
-    seen_tokens = set()
+    # Merge entities and noun chunks into one token
     spans = list(doc.ents) + list(doc.noun_chunks)
-    for span in spans:
-        span.merge()
+    spans = filter_spans(spans)
+    with doc.retokenize() as retokenizer:
+        for span in spans:
+            retokenizer.merge(span)
 
     relations = []
     for money in filter(lambda w: w.ent_type_ == "MONEY", doc):
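To make the merge pattern above concrete, here is a minimal, self-contained sketch. It assumes the small English model `en_core_web_sm` is installed; `filter_spans` is the helper added in the hunk above, repeated so the snippet runs on its own.

```python
import spacy


def filter_spans(spans):
    # Keep the longest spans first and drop any span overlapping one already kept
    get_sort_key = lambda span: (span.end - span.start, span.start)
    sorted_spans = sorted(spans, key=get_sort_key, reverse=True)
    result = []
    seen_tokens = set()
    for span in sorted_spans:
        if span.start not in seen_tokens and span.end - 1 not in seen_tokens:
            result.append(span)
            seen_tokens.update(range(span.start, span.end))
    return result


nlp = spacy.load("en_core_web_sm")  # assumption: this model is installed
doc = nlp("Net income was $9.4 million compared to the prior year of $2.7 million.")

# Entities and noun chunks can overlap, so filter before merging
spans = filter_spans(list(doc.ents) + list(doc.noun_chunks))
with doc.retokenize() as retokenizer:
    for span in spans:
        retokenizer.merge(span)

print([token.text for token in doc])
```

Unlike the older per-span `span.merge()`, all merges queued inside one `doc.retokenize()` block are applied together, so the token indices of the collected spans stay valid while iterating.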
@@ -9,9 +9,10 @@ srsly>=0.0.5,<1.1.0
 # Third party dependencies
 numpy>=1.15.0
 requests>=2.13.0,<3.0.0
-jsonschema>=2.6.0,<3.0.0
 plac<1.0.0,>=0.9.6
 pathlib==1.0.1; python_version < "3.4"
+# Optional dependencies
+jsonschema>=2.6.0,<3.1.0
 # Development dependencies
 cython>=0.25
 pytest>=4.0.0,<4.1.0
3 setup.py
@@ -209,7 +209,7 @@ def setup_package():
         generate_cython(root, "spacy")
 
     setup(
-        name=about["__title__"],
+        name="spacy",
         zip_safe=False,
         packages=PACKAGES,
        package_data=PACKAGE_DATA,
@@ -232,7 +232,6 @@ def setup_package():
             "blis>=0.2.2,<0.3.0",
             "plac<1.0.0,>=0.9.6",
             "requests>=2.13.0,<3.0.0",
-            "jsonschema>=2.6.0,<3.0.0",
             "wasabi>=0.2.0,<1.1.0",
             "srsly>=0.0.5,<1.1.0",
             'pathlib==1.0.1; python_version < "3.4"',
@@ -4,13 +4,13 @@
 # fmt: off
 
 __title__ = "spacy"
-__version__ = "2.1.3"
+__version__ = "2.1.4"
 __summary__ = "Industrial-strength Natural Language Processing (NLP) with Python and Cython"
 __uri__ = "https://spacy.io"
 __author__ = "Explosion AI"
 __email__ = "contact@explosion.ai"
 __license__ = "MIT"
-__release__ = True
+__release__ = False
 
 __download_url__ = "https://github.com/explosion/spacy-models/releases/download"
 __compatibility__ = "https://raw.githubusercontent.com/explosion/spacy-models/master/compatibility.json"
@@ -39,7 +39,7 @@ FILE_TYPES_STDOUT = ("json", "jsonl")
 def convert(
     input_file,
     output_dir="-",
-    file_type="jsonl",
+    file_type="json",
     n_sents=1,
     morphology=False,
     converter="auto",
@@ -48,8 +48,8 @@ def convert(
     """
     Convert files into JSON format for use with train command and other
     experiment management functions. If no output_dir is specified, the data
-    is written to stdout, so you can pipe them forward to a JSONL file:
-    $ spacy convert some_file.conllu > some_file.jsonl
+    is written to stdout, so you can pipe them forward to a JSON file:
+    $ spacy convert some_file.conllu > some_file.json
     """
     msg = Printer()
     input_path = Path(input_file)
|
@ -11,14 +11,8 @@ def iob2json(input_data, n_sents=10, *args, **kwargs):
|
||||||
"""
|
"""
|
||||||
Convert IOB files into JSON format for use with train cli.
|
Convert IOB files into JSON format for use with train cli.
|
||||||
"""
|
"""
|
||||||
docs = []
|
sentences = read_iob(input_data.split("\n"))
|
||||||
for group in minibatch(docs, n_sents):
|
docs = merge_sentences(sentences, n_sents)
|
||||||
group = list(group)
|
|
||||||
first = group.pop(0)
|
|
||||||
to_extend = first["paragraphs"][0]["sentences"]
|
|
||||||
for sent in group[1:]:
|
|
||||||
to_extend.extend(sent["paragraphs"][0]["sentences"])
|
|
||||||
docs.append(first)
|
|
||||||
return docs
|
return docs
|
||||||
|
|
||||||
|
|
||||||
|
@ -27,7 +21,6 @@ def read_iob(raw_sents):
|
||||||
for line in raw_sents:
|
for line in raw_sents:
|
||||||
if not line.strip():
|
if not line.strip():
|
||||||
continue
|
continue
|
||||||
# tokens = [t.split("|") for t in line.split()]
|
|
||||||
tokens = [re.split("[^\w\-]", line.strip())]
|
tokens = [re.split("[^\w\-]", line.strip())]
|
||||||
if len(tokens[0]) == 3:
|
if len(tokens[0]) == 3:
|
||||||
words, pos, iob = zip(*tokens)
|
words, pos, iob = zip(*tokens)
|
||||||
|
@ -49,3 +42,15 @@ def read_iob(raw_sents):
|
||||||
paragraphs = [{"sentences": [sent]} for sent in sentences]
|
paragraphs = [{"sentences": [sent]} for sent in sentences]
|
||||||
docs = [{"id": 0, "paragraphs": [para]} for para in paragraphs]
|
docs = [{"id": 0, "paragraphs": [para]} for para in paragraphs]
|
||||||
return docs
|
return docs
|
||||||
|
|
||||||
|
|
||||||
|
def merge_sentences(docs, n_sents):
|
||||||
|
merged = []
|
||||||
|
for group in minibatch(docs, size=n_sents):
|
||||||
|
group = list(group)
|
||||||
|
first = group.pop(0)
|
||||||
|
to_extend = first["paragraphs"][0]["sentences"]
|
||||||
|
for sent in group[1:]:
|
||||||
|
to_extend.extend(sent["paragraphs"][0]["sentences"])
|
||||||
|
merged.append(first)
|
||||||
|
return merged
|
||||||
|
|
|
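As a rough illustration of what the new `merge_sentences` helper does, the sketch below feeds it hand-made stand-ins for the converter's sentence-level output (the helper is repeated here so the snippet is self-contained; only `spacy.util.minibatch` is assumed).

```python
from spacy.util import minibatch


def merge_sentences(docs, n_sents):
    # Collapse every n_sents single-sentence docs into one doc
    merged = []
    for group in minibatch(docs, size=n_sents):
        group = list(group)
        first = group.pop(0)
        to_extend = first["paragraphs"][0]["sentences"]
        for sent in group[1:]:
            to_extend.extend(sent["paragraphs"][0]["sentences"])
        merged.append(first)
    return merged


# Three minimal sentence-level docs in spaCy's JSON training format (toy values)
docs = [
    {"id": 0, "paragraphs": [{"sentences": [{"tokens": [{"orth": "Hello"}]}]}]},
    {"id": 0, "paragraphs": [{"sentences": [{"tokens": [{"orth": "World"}]}]}]},
    {"id": 0, "paragraphs": [{"sentences": [{"tokens": [{"orth": "Bye"}]}]}]},
]
print(len(merge_sentences(docs, n_sents=3)))  # 1 merged doc instead of 3
```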
@@ -17,6 +17,7 @@ from .. import displacy
     gpu_id=("Use GPU", "option", "g", int),
     displacy_path=("Directory to output rendered parses as HTML", "option", "dp", str),
     displacy_limit=("Limit of parses to render as HTML", "option", "dl", int),
+    return_scores=("Return dict containing model scores", "flag", "R", bool),
 )
 def evaluate(
     model,
@@ -25,6 +26,7 @@ def evaluate(
     gold_preproc=False,
     displacy_path=None,
     displacy_limit=25,
+    return_scores=False,
 ):
     """
     Evaluate a model. To render a sample of parses in a HTML file, set an
@@ -75,6 +77,8 @@ def evaluate(
             ents=render_ents,
         )
         msg.good("Generated {} parses as HTML".format(displacy_limit), displacy_path)
+    if return_scores:
+        return scorer.scores
 
 
 def render_parses(docs, output_path, model_name="", limit=250, deps=True, ents=True):
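A minimal sketch of how the new flag can be used from Python. The model name and data path below are placeholders: any installed model package and a dev set in spaCy's JSON training format would do, and the exact keys in the returned dict depend on which pipeline components are present.

```python
from spacy.cli import evaluate

# "en_core_web_sm" and "dev.json" are hypothetical stand-ins
scores = evaluate("en_core_web_sm", "dev.json", return_scores=True)
print(scores)  # e.g. token accuracy, uas/las, ents_p/r/f, depending on the pipeline
```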
@@ -181,7 +181,7 @@ def read_vectors(vectors_loc):
     vectors_keys = []
     for i, line in enumerate(tqdm(f)):
         line = line.rstrip()
-        pieces = line.rsplit(" ", vectors_data.shape[1] + 1)
+        pieces = line.rsplit(" ", vectors_data.shape[1])
         word = pieces.pop(0)
         if len(pieces) != vectors_data.shape[1]:
             msg.fail(Errors.E094.format(line_num=i, loc=vectors_loc), exits=1)
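A quick illustration, with made-up numbers, of why the row is split from the right with `maxsplit` equal to the vector width: the key itself may contain spaces, and only the trailing fields are floats.

```python
dim = 3
line = "New York 0.12 0.34 0.56"  # toy row: a multi-word key plus dim float values

pieces = line.rsplit(" ", dim)
word = pieces.pop(0)
print(word)                # "New York"
print(len(pieces) == dim)  # True, so the E094 length check passes
```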
@@ -34,7 +34,8 @@ from .. import util
     max_length=("Max words per example.", "option", "xw", int),
     min_length=("Min words per example.", "option", "nw", int),
     seed=("Seed for random number generators", "option", "s", float),
-    nr_iter=("Number of iterations to pretrain", "option", "i", int),
+    n_iter=("Number of iterations to pretrain", "option", "i", int),
+    n_save_every=("Save model every X batches.", "option", "se", int),
 )
 def pretrain(
     texts_loc,
@@ -46,11 +47,12 @@ def pretrain(
     loss_func="cosine",
     use_vectors=False,
     dropout=0.2,
-    nr_iter=1000,
+    n_iter=1000,
     batch_size=3000,
     max_length=500,
     min_length=5,
     seed=0,
+    n_save_every=None,
 ):
     """
     Pre-train the 'token-to-vector' (tok2vec) layer of pipeline components,
@@ -115,9 +117,26 @@ def pretrain(
     msg.divider("Pre-training tok2vec layer")
     row_settings = {"widths": (3, 10, 10, 6, 4), "aligns": ("r", "r", "r", "r", "r")}
     msg.row(("#", "# Words", "Total Loss", "Loss", "w/s"), **row_settings)
-    for epoch in range(nr_iter):
-        for batch in util.minibatch_by_words(
-            ((text, None) for text in texts), size=batch_size
+
+    def _save_model(epoch, is_temp=False):
+        is_temp_str = ".temp" if is_temp else ""
+        with model.use_params(optimizer.averages):
+            with (output_dir / ("model%d%s.bin" % (epoch, is_temp_str))).open(
+                "wb"
+            ) as file_:
+                file_.write(model.tok2vec.to_bytes())
+            log = {
+                "nr_word": tracker.nr_word,
+                "loss": tracker.loss,
+                "epoch_loss": tracker.epoch_loss,
+                "epoch": epoch,
+            }
+            with (output_dir / "log.jsonl").open("a") as file_:
+                file_.write(srsly.json_dumps(log) + "\n")
+
+    for epoch in range(n_iter):
+        for batch_id, batch in enumerate(
+            util.minibatch_by_words(((text, None) for text in texts), size=batch_size)
         ):
             docs = make_docs(
                 nlp,
@@ -133,17 +152,9 @@ def pretrain(
             msg.row(progress, **row_settings)
             if texts_loc == "-" and tracker.words_per_epoch[epoch] >= 10 ** 7:
                 break
-        with model.use_params(optimizer.averages):
-            with (output_dir / ("model%d.bin" % epoch)).open("wb") as file_:
-                file_.write(model.tok2vec.to_bytes())
-        log = {
-            "nr_word": tracker.nr_word,
-            "loss": tracker.loss,
-            "epoch_loss": tracker.epoch_loss,
-            "epoch": epoch,
-        }
-        with (output_dir / "log.jsonl").open("a") as file_:
-            file_.write(srsly.json_dumps(log) + "\n")
+            if n_save_every and (batch_id % n_save_every == 0):
+                _save_model(epoch, is_temp=True)
+        _save_model(epoch)
         tracker.epoch_loss = 0.0
         if texts_loc != "-":
             # Reshuffle the texts if texts were loaded from a file
@@ -170,10 +181,10 @@ def make_update(model, docs, optimizer, drop=0.0, objective="L2"):
 def make_docs(nlp, batch, min_length, max_length):
     docs = []
     for record in batch:
-        text = record["text"]
         if "tokens" in record:
             doc = Doc(nlp.vocab, words=record["tokens"])
         else:
+            text = record["text"]
             doc = nlp.make_doc(text)
         if "heads" in record:
             heads = record["heads"]
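Since `_save_model` now appends one JSON record per save to `log.jsonl`, progress can be inspected afterwards with srsly. The output directory name below is a hypothetical stand-in for whatever was passed to `spacy pretrain`.

```python
import srsly

# Replace "pretrain_output" with the output directory used for pretraining
for entry in srsly.read_jsonl("pretrain_output/log.jsonl"):
    print(entry["epoch"], entry["nr_word"], entry["epoch_loss"])
```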
@@ -16,6 +16,7 @@ import random
 from .._ml import create_default_optimizer
 from ..attrs import PROB, IS_OOV, CLUSTER, LANG
 from ..gold import GoldCorpus
+from ..compat import path2str
 from .. import util
 from .. import about
 
@@ -35,6 +36,12 @@ from .. import about
     pipeline=("Comma-separated names of pipeline components", "option", "p", str),
     vectors=("Model to load vectors from", "option", "v", str),
     n_iter=("Number of iterations", "option", "n", int),
+    n_early_stopping=(
+        "Maximum number of training epochs without dev accuracy improvement",
+        "option",
+        "ne",
+        int,
+    ),
     n_examples=("Number of examples", "option", "ns", int),
     use_gpu=("Use GPU", "option", "g", int),
     version=("Model version", "option", "V", str),
@@ -74,6 +81,7 @@ def train(
     pipeline="tagger,parser,ner",
     vectors=None,
     n_iter=30,
+    n_early_stopping=None,
     n_examples=0,
     use_gpu=-1,
     version="0.0.0",
@@ -101,6 +109,7 @@ def train(
     train_path = util.ensure_path(train_path)
     dev_path = util.ensure_path(dev_path)
     meta_path = util.ensure_path(meta_path)
+    output_path = util.ensure_path(output_path)
     if raw_text is not None:
         raw_text = list(srsly.read_jsonl(raw_text))
     if not train_path or not train_path.exists():
@@ -222,6 +231,8 @@ def train(
     msg.row(row_head, **row_settings)
     msg.row(["-" * width for width in row_settings["widths"]], **row_settings)
     try:
+        iter_since_best = 0
+        best_score = 0.0
         for i in range(n_iter):
             train_docs = corpus.train_docs(
                 nlp, noise_level=noise_level, gold_preproc=gold_preproc, max_length=0
@@ -276,7 +287,9 @@ def train(
                 gpu_wps = nwords / (end_time - start_time)
                 with Model.use_device("cpu"):
                     nlp_loaded = util.load_model_from_path(epoch_model_path)
-                    nlp_loaded.parser.cfg["beam_width"]
+                    for name, component in nlp_loaded.pipeline:
+                        if hasattr(component, "cfg"):
+                            component.cfg["beam_width"] = beam_width
                     dev_docs = list(
                         corpus.dev_docs(nlp_loaded, gold_preproc=gold_preproc)
                     )
@@ -328,6 +341,24 @@ def train(
                     gpu_wps=gpu_wps,
                 )
                 msg.row(progress, **row_settings)
+            # Early stopping
+            if n_early_stopping is not None:
+                current_score = _score_for_model(meta)
+                if current_score < best_score:
+                    iter_since_best += 1
+                else:
+                    iter_since_best = 0
+                    best_score = current_score
+                if iter_since_best >= n_early_stopping:
+                    msg.text(
+                        "Early stopping, best iteration "
+                        "is: {}".format(i - iter_since_best)
+                    )
+                    msg.text(
+                        "Best score = {}; Final iteration "
+                        "score = {}".format(best_score, current_score)
+                    )
+                    break
     finally:
         with nlp.use_params(optimizer.averages):
             final_model_path = output_path / "model-final"
@@ -338,6 +369,20 @@ def train(
             msg.good("Created best model", best_model_path)
 
 
+def _score_for_model(meta):
+    """ Returns mean score between tasks in pipeline that can be used for early stopping. """
+    mean_acc = list()
+    pipes = meta["pipeline"]
+    acc = meta["accuracy"]
+    if "tagger" in pipes:
+        mean_acc.append(acc["tags_acc"])
+    if "parser" in pipes:
+        mean_acc.append((acc["uas"] + acc["las"]) / 2)
+    if "ner" in pipes:
+        mean_acc.append((acc["ents_p"] + acc["ents_r"] + acc["ents_f"]) / 3)
+    return sum(mean_acc) / len(mean_acc)
+
+
 @contextlib.contextmanager
 def _create_progress_bar(total):
     if int(os.environ.get("LOG_FRIENDLY", 0)):
@@ -379,10 +424,12 @@ def _collate_best_model(meta, output_path, components):
     for component in components:
         bests[component] = _find_best(output_path, component)
     best_dest = output_path / "model-best"
-    shutil.copytree(output_path / "model-final", best_dest)
+    shutil.copytree(path2str(output_path / "model-final"), path2str(best_dest))
     for component, best_component_src in bests.items():
-        shutil.rmtree(best_dest / component)
-        shutil.copytree(best_component_src / component, best_dest / component)
+        shutil.rmtree(path2str(best_dest / component))
+        shutil.copytree(
+            path2str(best_component_src / component), path2str(best_dest / component)
+        )
         accs = srsly.read_json(best_component_src / "accuracy.json")
         for metric in _get_metrics(component):
             meta["accuracy"][metric] = accs[metric]
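The early-stopping bookkeeping above boils down to a patience counter over the per-epoch dev score. A toy, standalone version (not spaCy's actual training loop, with made-up scores) behaves like this:

```python
def first_stop_iteration(dev_scores, patience):
    # Mirrors the new n_early_stopping logic: stop once the score has not
    # improved for `patience` consecutive iterations
    best_score, iter_since_best = 0.0, 0
    for i, score in enumerate(dev_scores):
        if score < best_score:
            iter_since_best += 1
        else:
            iter_since_best = 0
            best_score = score
        if iter_since_best >= patience:
            return i
    return None


print(first_stop_iteration([0.70, 0.74, 0.73, 0.72, 0.71], patience=3))  # 4
```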
@@ -92,7 +92,9 @@ def symlink_to(orig, dest):
     if is_windows:
         import subprocess
 
-        subprocess.call(["mklink", "/d", path2str(orig), path2str(dest)], shell=True)
+        subprocess.check_call(
+            ["mklink", "/d", path2str(orig), path2str(dest)], shell=True
+        )
     else:
        orig.symlink_to(dest)
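The point of switching to `check_call` is that a failed `mklink` now raises instead of being silently ignored. The behaviour is easy to see with any failing command; the POSIX `false` command below is just a stand-in for a command that exits with a non-zero status.

```python
import subprocess

# subprocess.call() only returns the exit code; check_call() raises on failure
try:
    subprocess.check_call(["false"])  # always exits with status 1
except subprocess.CalledProcessError as err:
    print("command failed with exit code", err.returncode)
```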
@@ -19,7 +19,7 @@ RENDER_WRAPPER = None
 
 
 def render(
-    docs, style="dep", page=False, minify=False, jupyter=False, options={}, manual=False
+    docs, style="dep", page=False, minify=False, jupyter=None, options={}, manual=False
 ):
     """Render displaCy visualisation.
 
@@ -27,7 +27,7 @@ def render(
     style (unicode): Visualisation style, 'dep' or 'ent'.
     page (bool): Render markup as full HTML page.
     minify (bool): Minify HTML markup.
-    jupyter (bool): Experimental, use Jupyter's `display()` to output markup.
+    jupyter (bool): Override Jupyter auto-detection.
     options (dict): Visualiser-specific options, e.g. colors.
     manual (bool): Don't parse `Doc` and instead expect a dict/list of dicts.
     RETURNS (unicode): Rendered HTML markup.
@@ -53,7 +53,8 @@ def render(
     html = _html["parsed"]
     if RENDER_WRAPPER is not None:
         html = RENDER_WRAPPER(html)
-    if jupyter or is_in_jupyter():  # return HTML rendered by IPython display()
+    if jupyter or (jupyter is None and is_in_jupyter()):
+        # return HTML rendered by IPython display()
         from IPython.core.display import display, HTML
 
         return display(HTML(html))
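With `jupyter=None` as the new default, auto-detection can still be overridden either way. For example, to force the raw markup even inside a notebook (a sketch that assumes `en_core_web_sm` is installed):

```python
import spacy
from spacy import displacy

nlp = spacy.load("en_core_web_sm")  # assumption: this model is installed
doc = nlp("Autonomous cars shift insurance liability toward manufacturers.")

# jupyter=False always returns the HTML string;
# jupyter=True always hands the markup to IPython's display()
html = displacy.render(doc, style="dep", jupyter=False)
print(html[:80])
```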
@@ -141,8 +141,14 @@ class Errors(object):
     E023 = ("Error cleaning up beam: The same state occurred twice at "
             "memory address {addr} and position {i}.")
     E024 = ("Could not find an optimal move to supervise the parser. Usually, "
-            "this means the GoldParse was not correct. For example, are all "
-            "labels added to the model?")
+            "this means that the model can't be updated in a way that's valid "
+            "and satisfies the correct annotations specified in the GoldParse. "
+            "For example, are all labels added to the model? If you're "
+            "training a named entity recognizer, also make sure that none of "
+            "your annotated entity spans have leading or trailing whitespace. "
+            "You can also use the experimental `debug-data` command to "
+            "validate your JSON-formatted training data. For details, run:\n"
+            "python -m spacy debug-data --help")
     E025 = ("String is too long: {length} characters. Max is 2**30.")
     E026 = ("Error accessing token at position {i}: out of bounds in Doc of "
             "length {length}.")
@@ -383,6 +389,10 @@ class Errors(object):
     E133 = ("The sum of prior probabilities for alias '{alias}' should not exceed 1, "
             "but found {sum}.")
     E134 = ("Alias '{alias}' defined for unknown entity '{entity}'.")
+    E135 = ("If you meant to replace a built-in component, use `create_pipe`: "
+            "`nlp.replace_pipe('{name}', nlp.create_pipe('{name}'))`")
+    E136 = ("This additional feature requires the jsonschema library to be "
+            "installed:\npip install jsonschema")
 
 
 @add_codes
@@ -168,6 +168,7 @@ GLOSSARY = {
     # Dependency Labels (English)
     # ClearNLP / Universal Dependencies
     # https://github.com/clir/clearnlp-guidelines/blob/master/md/specifications/dependency_labels.md
+    "acl": "clausal modifier of noun (adjectival clause)",
     "acomp": "adjectival complement",
     "advcl": "adverbial clause modifier",
     "advmod": "adverbial modifier",
@@ -177,22 +178,32 @@ GLOSSARY = {
     "attr": "attribute",
     "aux": "auxiliary",
     "auxpass": "auxiliary (passive)",
+    "case": "case marking",
     "cc": "coordinating conjunction",
     "ccomp": "clausal complement",
+    "clf": "classifier",
     "complm": "complementizer",
+    "compound": "compound",
     "conj": "conjunct",
     "cop": "copula",
     "csubj": "clausal subject",
     "csubjpass": "clausal subject (passive)",
+    "dative": "dative",
     "dep": "unclassified dependent",
     "det": "determiner",
+    "discourse": "discourse element",
+    "dislocated": "dislocated elements",
     "dobj": "direct object",
     "expl": "expletive",
+    "fixed": "fixed multiword expression",
+    "flat": "flat multiword expression",
+    "goeswith": "goes with",
     "hmod": "modifier in hyphenation",
     "hyph": "hyphen",
     "infmod": "infinitival modifier",
     "intj": "interjection",
     "iobj": "indirect object",
+    "list": "list",
     "mark": "marker",
     "meta": "meta modifier",
     "neg": "negation modifier",
@@ -201,11 +212,15 @@ GLOSSARY = {
     "npadvmod": "noun phrase as adverbial modifier",
     "nsubj": "nominal subject",
     "nsubjpass": "nominal subject (passive)",
+    "nounmod": "modifier of nominal",
+    "npmod": "noun phrase as adverbial modifier",
     "num": "number modifier",
     "number": "number compound modifier",
+    "nummod": "numeric modifier",
     "oprd": "object predicate",
     "obj": "object",
     "obl": "oblique nominal",
+    "orphan": "orphan",
     "parataxis": "parataxis",
     "partmod": "participal modifier",
     "pcomp": "complement of preposition",
@@ -218,7 +233,10 @@ GLOSSARY = {
     "punct": "punctuation",
     "quantmod": "modifier of quantifier",
     "rcmod": "relative clause modifier",
+    "relcl": "relative clause modifier",
+    "reparandum": "overridden disfluency",
     "root": "root",
+    "vocative": "vocative",
     "xcomp": "open clausal complement",
     # Dependency labels (German)
     # TIGER Treebank
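These glossary entries are what `spacy.explain()` returns, so the newly added Universal Dependencies labels should resolve once a spaCy build containing this change is installed:

```python
import spacy

print(spacy.explain("relcl"))   # relative clause modifier
print(spacy.explain("nummod"))  # numeric modifier
print(spacy.explain("case"))    # case marking
```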
@@ -532,7 +532,7 @@ cdef class GoldParse:
                     self.labels[i] = deps[i2j_multi[i]]
                     # Now set NER...This is annoying because if we've split
                     # got an entity word split into two, we need to adjust the
-                    # BILOU tags. We can't have BB or LL etc.
+                    # BILUO tags. We can't have BB or LL etc.
                     # Case 1: O -- easy.
                     ner_tag = entities[i2j_multi[i]]
                     if ner_tag == "O":
@@ -5,8 +5,8 @@ from __future__ import unicode_literals
 STOP_WORDS = set(
     """
 á a ab aber ach acht achte achten achter achtes ag alle allein allem allen
-aller allerdings alles allgemeinen als also am an andere anderen andern anders
-auch auf aus ausser außer ausserdem außerdem
+aller allerdings alles allgemeinen als also am an andere anderen anderem andern
+anders auch auf aus ausser außer ausserdem außerdem
 
 bald bei beide beiden beim beispiel bekannt bereits besonders besser besten bin
 bis bisher bist
@@ -35,8 +35,8 @@ großen grosser großer grosses großes gut gute guter gutes
 habe haben habt hast hat hatte hätte hatten hätten heisst heißt her heute hier
 hin hinter hoch
 
-ich ihm ihn ihnen ihr ihre ihrem ihrer ihres im immer in indem infolgedessen
-ins irgend ist
+ich ihm ihn ihnen ihr ihre ihrem ihren ihrer ihres im immer in indem
+infolgedessen ins irgend ist
 
 ja jahr jahre jahren je jede jedem jeden jeder jedermann jedermanns jedoch
 jemand jemandem jemanden jene jenem jenen jener jenes jetzt
|
||||||
must my myself
|
must my myself
|
||||||
|
|
||||||
name namely neither never nevertheless next nine no nobody none noone nor not
|
name namely neither never nevertheless next nine no nobody none noone nor not
|
||||||
nothing now nowhere
|
nothing now nowhere
|
||||||
|
|
||||||
of off often on once one only onto or other others otherwise our ours ourselves
|
of off often on once one only onto or other others otherwise our ours ourselves
|
||||||
out over own
|
out over own
|
||||||
|
@ -75,4 +75,3 @@ STOP_WORDS.update(contractions)
|
||||||
for apostrophe in ["‘", "’"]:
|
for apostrophe in ["‘", "’"]:
|
||||||
for stopword in contractions:
|
for stopword in contractions:
|
||||||
STOP_WORDS.add(stopword.replace("'", apostrophe))
|
STOP_WORDS.add(stopword.replace("'", apostrophe))
|
||||||
|
|
||||||
|
|
|
@@ -4,6 +4,7 @@ from __future__ import unicode_literals
 from .tokenizer_exceptions import TOKENIZER_EXCEPTIONS
 from .tag_map import TAG_MAP
 from .stop_words import STOP_WORDS
+from .lex_attrs import LEX_ATTRS
 from .lemmatizer import LOOKUP
 from .syntax_iterators import SYNTAX_ITERATORS
 
@@ -16,6 +17,7 @@ from ...util import update_exc, add_lookups
 
 class SpanishDefaults(Language.Defaults):
     lex_attr_getters = dict(Language.Defaults.lex_attr_getters)
+    lex_attr_getters.update(LEX_ATTRS)
     lex_attr_getters[LANG] = lambda text: "es"
     lex_attr_getters[NORM] = add_lookups(
         Language.Defaults.lex_attr_getters[NORM], BASE_NORMS
59 spacy/lang/es/lex_attrs.py Normal file
@@ -0,0 +1,59 @@
+# coding: utf8
+from __future__ import unicode_literals
+
+from ...attrs import LIKE_NUM
+
+
+_num_words = [
+    "cero",
+    "uno",
+    "dos",
+    "tres",
+    "cuatro",
+    "cinco",
+    "seis",
+    "siete",
+    "ocho",
+    "nueve",
+    "diez",
+    "once",
+    "doce",
+    "trece",
+    "catorce",
+    "quince",
+    "dieciséis",
+    "diecisiete",
+    "dieciocho",
+    "diecinueve",
+    "veinte",
+    "treinta",
+    "cuarenta",
+    "cincuenta",
+    "sesenta",
+    "setenta",
+    "ochenta",
+    "noventa",
+    "cien",
+    "mil",
+    "millón",
+    "billón",
+    "trillón",
+]
+
+
+def like_num(text):
+    if text.startswith(("+", "-", "±", "~")):
+        text = text[1:]
+    text = text.replace(",", "").replace(".", "")
+    if text.isdigit():
+        return True
+    if text.count("/") == 1:
+        num, denom = text.split("/")
+        if num.isdigit() and denom.isdigit():
+            return True
+    if text.lower() in _num_words:
+        return True
+    return False
+
+
+LEX_ATTRS = {LIKE_NUM: like_num}
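A small sketch of the effect of the new `lex_attrs.py`: with the Spanish lexical attributes registered, `Token.like_num` recognises digit strings as well as the listed number words. A blank `es` pipeline is enough, no trained model required.

```python
import spacy

nlp = spacy.blank("es")
doc = nlp("Tiene veinte años y 3,5 millones de seguidores")
print([(token.text, token.like_num) for token in doc])
# "veinte" and "3,5" should both come back as number-like
```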
@@ -11,9 +11,9 @@ Example sentences to test spaCy and its language models.
 
 
 sentences = [
-    "Apple cherche a acheter une startup anglaise pour 1 milliard de dollard",
-    "Les voitures autonomes voient leur assurances décalées vers les constructeurs",
-    "San Francisco envisage d'interdire les robots coursiers",
+    "Apple cherche à acheter une startup anglaise pour 1 milliard de dollars",
+    "Les voitures autonomes déplacent la responsabilité de l'assurance vers les constructeurs",
+    "San Francisco envisage d'interdire les robots coursiers sur les trottoirs",
     "Londres est une grande ville du Royaume-Uni",
     "L’Italie choisit ArcelorMittal pour reprendre la plus grande aciérie d’Europe",
     "Apple lance HomePod parce qu'il se sent menacé par l'Echo d'Amazon",
@ -7,88 +7,89 @@ from ...symbols import NOUN, PRON, AUX, SCONJ, INTJ, PART, PROPN
|
||||||
|
|
||||||
# POS explanations for indonesian available from https://www.aclweb.org/anthology/Y12-1014
TAG_MAP = {
    "NSD": {POS: NOUN},
    "Z--": {POS: PUNCT},
    "VSA": {POS: VERB},
    "CC-": {POS: NUM},
    "R--": {POS: ADP},
    "D--": {POS: ADV},
    "ASP": {POS: ADJ},
    "S--": {POS: SCONJ},
    "VSP": {POS: VERB},
    "H--": {POS: CCONJ},
    "F--": {POS: X},
    "B--": {POS: DET},
    "CO-": {POS: NUM},
    "G--": {POS: ADV},
    "PS3": {POS: PRON},
    "W--": {POS: ADV},
    "O--": {POS: AUX},
    "PP1": {POS: PRON},
    "ASS": {POS: ADJ},
    "PS1": {POS: PRON},
    "APP": {POS: ADJ},
    "CD-": {POS: NUM},
    "VPA": {POS: VERB},
    "VPP": {POS: VERB},
    "X--": {POS: X},
    "CO-+PS3": {POS: NUM},
    "NSD+PS3": {POS: NOUN},
    "ASP+PS3": {POS: ADJ},
    "M--": {POS: AUX},
    "VSA+PS3": {POS: VERB},
    "R--+PS3": {POS: ADP},
    "W--+T--": {POS: ADV},
-    "PS2": {POS:PRON},
+    "PS2": {POS: PRON},
-    "NSD+PS1": {POS:NOUN},
+    "NSD+PS1": {POS: NOUN},
    "PP3": {POS: PRON},
    "VSA+T--": {POS: VERB},
    "D--+T--": {POS: ADV},
    "VSP+PS3": {POS: VERB},
    "F--+PS3": {POS: X},
    "M--+T--": {POS: AUX},
    "F--+T--": {POS: X},
    "PUNCT": {POS: PUNCT},
    "PROPN": {POS: PROPN},
    "I--": {POS: INTJ},
    "S--+PS3": {POS: SCONJ},
    "ASP+T--": {POS: ADJ},
    "CC-+PS3": {POS: NUM},
    "NSD+PS2": {POS: NOUN},
    "B--+T--": {POS: DET},
    "H--+T--": {POS: CCONJ},
    "VSA+PS2": {POS: VERB},
    "NSF": {POS: NOUN},
    "PS1+VSA": {POS: PRON},
    "NPD": {POS: NOUN},
-    "PP2": {POS:PRON},
+    "PP2": {POS: PRON},
    "VSA+PS1": {POS: VERB},
    "T--": {POS: PART},
    "NSM": {POS: NOUN},
    "NUM": {POS: NUM},
    "ASP+PS2": {POS: ADJ},
    "G--+T--": {POS: PART},
    "D--+PS3": {POS: ADV},
    "R--+PS2": {POS: ADP},
    "NSM+PS3": {POS: NOUN},
    "VSP+T--": {POS: VERB},
    "M--+PS3": {POS: AUX},
    "ASS+PS3": {POS: ADJ},
    "G--+PS3": {POS: PART},
    "F--+PS1": {POS: X},
    "NSD+T--": {POS: NOUN},
    "PP1+T--": {POS: PRON},
    "B--+PS3": {POS: DET},
    "NOUN": {POS: NOUN},
    "NPD+PS3": {POS: NOUN},
    "R--+PS1": {POS: ADP},
    "F--+PS2": {POS: X},
    "CD-+PS3": {POS: NUM},
-    "PS1+VSA+T--":{POS: VERB},
+    "PS1+VSA+T--": {POS: VERB},
    "PS2+VSA": {POS: VERB},
    "VERB": {POS: VERB},
    "CC-+T--": {POS: NUM},
-    "NPD+PS2":{POS: NOUN},
+    "NPD+PS2": {POS: NOUN},
-    "D--+PS2":{POS: ADV},
+    "D--+PS2": {POS: ADV},
    "PP3+T--": {POS: PRON},
-    "X": {POS: X}}
+    "X": {POS: X},
+}
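For context: spaCy uses a language's TAG_MAP to resolve each fine-grained tag string to the coarse universal POS symbol stored under the POS key. A minimal sketch of that lookup follows; the two entries are copied from the table above, and the `coarse_pos` helper is illustrative only, not part of the diff.

    # Illustrative only: how a TAG_MAP entry resolves a fine-grained tag
    # to a coarse universal POS symbol.
    from spacy.symbols import POS, NOUN, VERB

    TAG_MAP = {
        "NSD": {POS: NOUN},
        "VSA": {POS: VERB},
    }

    def coarse_pos(tag):
        # Return the universal POS id stored under the POS key.
        return TAG_MAP[tag][POS]

    print(coarse_pos("NSD") == NOUN)  # True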
@ -4,67 +4,87 @@ from __future__ import unicode_literals
|
||||||
|
|
||||||
STOP_WORDS = set(
|
STOP_WORDS = set(
|
||||||
"""
|
"""
|
||||||
ಈ
|
|
||||||
ಮತ್ತು
|
|
||||||
ಹಾಗೂ
|
|
||||||
ಅವರು
|
|
||||||
ಅವರ
|
|
||||||
ಬಗ್ಗೆ
|
|
||||||
ಎಂಬ
|
|
||||||
ಆದರೆ
|
|
||||||
ಅವರನ್ನು
|
|
||||||
ಆದರೆ
|
|
||||||
ತಮ್ಮ
|
|
||||||
ಒಂದು
|
|
||||||
ಎಂದರು
|
|
||||||
ಮೇಲೆ
|
|
||||||
ಹೇಳಿದರು
|
|
||||||
ಸೇರಿದಂತೆ
|
|
||||||
ಬಳಿಕ
|
|
||||||
ಆ
|
|
||||||
ಯಾವುದೇ
|
|
||||||
ಅವರಿಗೆ
|
|
||||||
ನಡೆದ
|
|
||||||
ಕುರಿತು
|
|
||||||
ಇದು
|
|
||||||
ಅವರು
|
|
||||||
ಕಳೆದ
|
|
||||||
ಇದೇ
|
|
||||||
ತಿಳಿಸಿದರು
|
|
||||||
ಹೀಗಾಗಿ
|
|
||||||
ಕೂಡ
|
|
||||||
ತನ್ನ
|
|
||||||
ತಿಳಿಸಿದ್ದಾರೆ
|
|
||||||
ನಾನು
|
|
||||||
ಹೇಳಿದ್ದಾರೆ
|
|
||||||
ಈಗ
|
|
||||||
ಎಲ್ಲ
|
|
||||||
ನನ್ನ
|
|
||||||
ನಮ್ಮ
|
|
||||||
ಈಗಾಗಲೇ
|
|
||||||
ಇದಕ್ಕೆ
|
|
||||||
ಹಲವು
|
ಹಲವು
|
||||||
ಇದೆ
|
ಮೂಲಕ
|
||||||
ಮತ್ತೆ
|
ಹಾಗೂ
|
||||||
ಮಾಡುವ
|
|
||||||
ನೀಡಿದರು
|
|
||||||
ನಾವು
|
|
||||||
ನೀಡಿದ
|
|
||||||
ಇದರಿಂದ
|
|
||||||
ಅದು
|
ಅದು
|
||||||
ಇದನ್ನು
|
|
||||||
ನೀಡಿದ್ದಾರೆ
|
ನೀಡಿದ್ದಾರೆ
|
||||||
|
ಯಾವ
|
||||||
|
ಎಂದರು
|
||||||
|
ಅವರು
|
||||||
|
ಈಗ
|
||||||
|
ಎಂಬ
|
||||||
|
ಹಾಗಾಗಿ
|
||||||
|
ಅಷ್ಟೇ
|
||||||
|
ನಾವು
|
||||||
|
ಇದೇ
|
||||||
|
ಹೇಳಿ
|
||||||
|
ತಮ್ಮ
|
||||||
|
ಹೀಗೆ
|
||||||
|
ನಮ್ಮ
|
||||||
|
ಬೇರೆ
|
||||||
|
ನೀಡಿದರು
|
||||||
|
ಮತ್ತೆ
|
||||||
|
ಇದು
|
||||||
|
ಈ
|
||||||
|
ನೀವು
|
||||||
|
ನಾನು
|
||||||
|
ಇತ್ತು
|
||||||
|
ಎಲ್ಲಾ
|
||||||
|
ಯಾವುದೇ
|
||||||
|
ನಡೆದ
|
||||||
ಅದನ್ನು
|
ಅದನ್ನು
|
||||||
ಇಲ್ಲಿ
|
ಎಂದರೆ
|
||||||
ಆಗ
|
|
||||||
ಬಂದಿದೆ.
|
|
||||||
ಅದೇ
|
|
||||||
ಇರುವ
|
|
||||||
ಅಲ್ಲದೆ
|
|
||||||
ಕೆಲವು
|
|
||||||
ನೀಡಿದೆ
|
ನೀಡಿದೆ
|
||||||
|
ಹೀಗಾಗಿ
|
||||||
|
ಜೊತೆಗೆ
|
||||||
|
ಇದರಿಂದ
|
||||||
|
ನನಗೆ
|
||||||
|
ಅಲ್ಲದೆ
|
||||||
|
ಎಷ್ಟು
|
||||||
ಇದರ
|
ಇದರ
|
||||||
|
ಇಲ್ಲ
|
||||||
|
ಕಳೆದ
|
||||||
|
ತುಂಬಾ
|
||||||
|
ಈಗಾಗಲೇ
|
||||||
|
ಮಾಡಿ
|
||||||
|
ಅದಕ್ಕೆ
|
||||||
|
ಬಗ್ಗೆ
|
||||||
|
ಅವರ
|
||||||
|
ಇದನ್ನು
|
||||||
|
ಆ
|
||||||
|
ಇದೆ
|
||||||
|
ಹೆಚ್ಚು
|
||||||
ಇನ್ನು
|
ಇನ್ನು
|
||||||
|
ಎಲ್ಲ
|
||||||
|
ಇರುವ
|
||||||
|
ಅವರಿಗೆ
|
||||||
|
ನಿಮ್ಮ
|
||||||
|
ಏನು
|
||||||
|
ಕೂಡ
|
||||||
|
ಇಲ್ಲಿ
|
||||||
|
ನನ್ನನ್ನು
|
||||||
|
ಕೆಲವು
|
||||||
|
ಮಾತ್ರ
|
||||||
|
ಬಳಿಕ
|
||||||
|
ಅಂತ
|
||||||
|
ತನ್ನ
|
||||||
|
ಆಗ
|
||||||
|
ಅಥವಾ
|
||||||
|
ಅಲ್ಲ
|
||||||
|
ಕೇವಲ
|
||||||
|
ಆದರೆ
|
||||||
|
ಮತ್ತು
|
||||||
|
ಇನ್ನೂ
|
||||||
|
ಅದೇ
|
||||||
|
ಆಗಿ
|
||||||
|
ಅವರನ್ನು
|
||||||
|
ಹೇಳಿದ್ದಾರೆ
|
||||||
ನಡೆದಿದೆ
|
ನಡೆದಿದೆ
|
||||||
|
ಇದಕ್ಕೆ
|
||||||
|
ಎಂಬುದು
|
||||||
|
ಎಂದು
|
||||||
|
ನನ್ನ
|
||||||
|
ಮೇಲೆ
|
||||||
""".split()
|
""".split()
|
||||||
)
|
)
|
||||||
|
|
20
spacy/lang/mr/__init__.py
Normal file

@@ -0,0 +1,20 @@
+#coding: utf8
+from __future__ import unicode_literals
+
+from .stop_words import STOP_WORDS
+from ...language import Language
+from ...attrs import LANG
+
+
+class MarathiDefaults(Language.Defaults):
+    lex_attr_getters = dict(Language.Defaults.lex_attr_getters)
+    lex_attr_getters[LANG] = lambda text: "mr"
+    stop_words = STOP_WORDS
+
+
+class Marathi(Language):
+    lang = "mr"
+    Defaults = MarathiDefaults
+
+
+__all__ = ["Marathi"]
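As a quick usage sketch (not part of the diff): the new `Marathi` base language can be instantiated like any other blank spaCy language. This commit only supplies the "mr" language code and stop words, so the pipeline below does tokenization only and the example text is arbitrary.

    # Minimal sketch: the new base language provides tokenization and stop words.
    from spacy.lang.mr import Marathi

    nlp = Marathi()
    doc = nlp("नमस्कार जग")  # arbitrary example text; default tokenization applies
    print([token.text for token in doc])
    print(len(nlp.Defaults.stop_words) > 0)  # stop words loaded from stop_words.py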
196
spacy/lang/mr/stop_words.py
Normal file

@@ -0,0 +1,196 @@
|
||||||
|
# coding: utf8
|
||||||
|
from __future__ import unicode_literals
|
||||||
|
|
||||||
|
|
||||||
|
# Source: https://github.com/stopwords-iso/stopwords-mr/blob/master/stopwords-mr.txt, https://github.com/6/stopwords-json/edit/master/dist/mr.json
|
||||||
|
STOP_WORDS = set(
|
||||||
|
"""
|
||||||
|
न
|
||||||
|
अतरी
|
||||||
|
तो
|
||||||
|
हें
|
||||||
|
तें
|
||||||
|
कां
|
||||||
|
आणि
|
||||||
|
जें
|
||||||
|
जे
|
||||||
|
मग
|
||||||
|
ते
|
||||||
|
मी
|
||||||
|
जो
|
||||||
|
परी
|
||||||
|
गा
|
||||||
|
हे
|
||||||
|
ऐसें
|
||||||
|
आतां
|
||||||
|
नाहीं
|
||||||
|
तेथ
|
||||||
|
हा
|
||||||
|
तया
|
||||||
|
असे
|
||||||
|
म्हणे
|
||||||
|
काय
|
||||||
|
कीं
|
||||||
|
जैसें
|
||||||
|
तंव
|
||||||
|
तूं
|
||||||
|
होय
|
||||||
|
जैसा
|
||||||
|
आहे
|
||||||
|
पैं
|
||||||
|
तैसा
|
||||||
|
जरी
|
||||||
|
म्हणोनि
|
||||||
|
एक
|
||||||
|
ऐसा
|
||||||
|
जी
|
||||||
|
ना
|
||||||
|
मज
|
||||||
|
एथ
|
||||||
|
या
|
||||||
|
जेथ
|
||||||
|
जया
|
||||||
|
तुज
|
||||||
|
तेणें
|
||||||
|
तैं
|
||||||
|
पां
|
||||||
|
असो
|
||||||
|
करी
|
||||||
|
ऐसी
|
||||||
|
येणें
|
||||||
|
जाहला
|
||||||
|
तेंचि
|
||||||
|
आघवें
|
||||||
|
होती
|
||||||
|
कांहीं
|
||||||
|
होऊनि
|
||||||
|
एकें
|
||||||
|
मातें
|
||||||
|
ठायीं
|
||||||
|
ये
|
||||||
|
सकळ
|
||||||
|
केलें
|
||||||
|
जेणें
|
||||||
|
जाण
|
||||||
|
जैसी
|
||||||
|
होये
|
||||||
|
जेवीं
|
||||||
|
एऱ्हवीं
|
||||||
|
मीचि
|
||||||
|
किरीटी
|
||||||
|
दिसे
|
||||||
|
देवा
|
||||||
|
हो
|
||||||
|
तरि
|
||||||
|
कीजे
|
||||||
|
तैसे
|
||||||
|
आपण
|
||||||
|
तिये
|
||||||
|
कर्म
|
||||||
|
नोहे
|
||||||
|
इये
|
||||||
|
पडे
|
||||||
|
माझें
|
||||||
|
तैसी
|
||||||
|
लागे
|
||||||
|
नाना
|
||||||
|
जंव
|
||||||
|
कीर
|
||||||
|
अधिक
|
||||||
|
अनेक
|
||||||
|
अशी
|
||||||
|
असलयाचे
|
||||||
|
असलेल्या
|
||||||
|
असा
|
||||||
|
असून
|
||||||
|
असे
|
||||||
|
आज
|
||||||
|
आणि
|
||||||
|
आता
|
||||||
|
आपल्या
|
||||||
|
आला
|
||||||
|
आली
|
||||||
|
आले
|
||||||
|
आहे
|
||||||
|
आहेत
|
||||||
|
एक
|
||||||
|
एका
|
||||||
|
कमी
|
||||||
|
करणयात
|
||||||
|
करून
|
||||||
|
का
|
||||||
|
काम
|
||||||
|
काय
|
||||||
|
काही
|
||||||
|
किवा
|
||||||
|
की
|
||||||
|
केला
|
||||||
|
केली
|
||||||
|
केले
|
||||||
|
कोटी
|
||||||
|
गेल्या
|
||||||
|
घेऊन
|
||||||
|
जात
|
||||||
|
झाला
|
||||||
|
झाली
|
||||||
|
झाले
|
||||||
|
झालेल्या
|
||||||
|
टा
|
||||||
|
तर
|
||||||
|
तरी
|
||||||
|
तसेच
|
||||||
|
ता
|
||||||
|
ती
|
||||||
|
तीन
|
||||||
|
ते
|
||||||
|
तो
|
||||||
|
त्या
|
||||||
|
त्याचा
|
||||||
|
त्याची
|
||||||
|
त्याच्या
|
||||||
|
त्याना
|
||||||
|
त्यानी
|
||||||
|
त्यामुळे
|
||||||
|
त्री
|
||||||
|
दिली
|
||||||
|
दोन
|
||||||
|
न
|
||||||
|
पण
|
||||||
|
पम
|
||||||
|
परयतन
|
||||||
|
पाटील
|
||||||
|
म
|
||||||
|
मात्र
|
||||||
|
माहिती
|
||||||
|
मी
|
||||||
|
मुबी
|
||||||
|
म्हणजे
|
||||||
|
म्हणाले
|
||||||
|
म्हणून
|
||||||
|
या
|
||||||
|
याचा
|
||||||
|
याची
|
||||||
|
याच्या
|
||||||
|
याना
|
||||||
|
यानी
|
||||||
|
येणार
|
||||||
|
येत
|
||||||
|
येथील
|
||||||
|
येथे
|
||||||
|
लाख
|
||||||
|
व
|
||||||
|
व्यकत
|
||||||
|
सर्व
|
||||||
|
सागित्ले
|
||||||
|
सुरू
|
||||||
|
हजार
|
||||||
|
हा
|
||||||
|
ही
|
||||||
|
हे
|
||||||
|
होणार
|
||||||
|
होत
|
||||||
|
होता
|
||||||
|
होती
|
||||||
|
होते
|
||||||
|
""".split()
|
||||||
|
)
|
|
@@ -6,10 +6,7 @@ from .lex_attrs import LEX_ATTRS
from .tag_map import TAG_MAP
from .tokenizer_exceptions import TOKENIZER_EXCEPTIONS
from .punctuation import TOKENIZER_INFIXES, TOKENIZER_SUFFIXES
-from .lemmatizer import LOOKUP, LEMMA_EXC, LEMMA_INDEX, RULES
-from .lemmatizer.lemmatizer import DutchLemmatizer
+from .lemmatizer import LOOKUP, LEMMA_EXC, LEMMA_INDEX, RULES, DutchLemmatizer

from ..tokenizer_exceptions import BASE_EXCEPTIONS
from ..norm_exceptions import BASE_NORMS
from ...language import Language

@@ -21,9 +18,10 @@ class DutchDefaults(Language.Defaults):
    lex_attr_getters = dict(Language.Defaults.lex_attr_getters)
    lex_attr_getters.update(LEX_ATTRS)
-    lex_attr_getters[LANG] = lambda text: 'nl'
-    lex_attr_getters[NORM] = add_lookups(Language.Defaults.lex_attr_getters[NORM],
-                                         BASE_NORMS)
+    lex_attr_getters[LANG] = lambda text: "nl"
+    lex_attr_getters[NORM] = add_lookups(
+        Language.Defaults.lex_attr_getters[NORM], BASE_NORMS
+    )
    tokenizer_exceptions = update_exc(BASE_EXCEPTIONS, TOKENIZER_EXCEPTIONS)
    stop_words = STOP_WORDS
    tag_map = TAG_MAP

@@ -36,15 +34,14 @@ class DutchDefaults(Language.Defaults):
        lemma_index = LEMMA_INDEX
        lemma_exc = LEMMA_EXC
        lemma_lookup = LOOKUP
-        return DutchLemmatizer(index=lemma_index,
-                               exceptions=lemma_exc,
-                               lookup=lemma_lookup,
-                               rules=rules)
+        return DutchLemmatizer(
+            index=lemma_index, exceptions=lemma_exc, lookup=lemma_lookup, rules=rules
+        )


class Dutch(Language):
-    lang = 'nl'
+    lang = "nl"
    Defaults = DutchDefaults


-__all__ = ['Dutch']
+__all__ = ["Dutch"]
@@ -18,23 +18,26 @@ from ._adpositions import ADPOSITIONS
from ._determiners import DETERMINERS

from .lookup import LOOKUP

from ._lemma_rules import RULES

+from .lemmatizer import DutchLemmatizer


-LEMMA_INDEX = {"adj": ADJECTIVES,
-               "noun": NOUNS,
-               "verb": VERBS,
-               "adp": ADPOSITIONS,
-               "det": DETERMINERS}
+LEMMA_INDEX = {
+    "adj": ADJECTIVES,
+    "noun": NOUNS,
+    "verb": VERBS,
+    "adp": ADPOSITIONS,
+    "det": DETERMINERS,
+}

-LEMMA_EXC = {"adj": ADJECTIVES_IRREG,
-             "adv": ADVERBS_IRREG,
-             "adp": ADPOSITIONS_IRREG,
-             "noun": NOUNS_IRREG,
-             "verb": VERBS_IRREG,
-             "det": DETERMINERS_IRREG,
-             "pron": PRONOUNS_IRREG}
+LEMMA_EXC = {
+    "adj": ADJECTIVES_IRREG,
+    "adv": ADVERBS_IRREG,
+    "adp": ADPOSITIONS_IRREG,
+    "noun": NOUNS_IRREG,
+    "verb": VERBS_IRREG,
+    "det": DETERMINERS_IRREG,
+    "pron": PRONOUNS_IRREG,
+}

+__all__ = ["LOOKUP", "LEMMA_EXC", "LEMMA_INDEX", "RULES", "DutchLemmatizer"]
@@ -1,7 +1,7 @@
# coding: utf8
from __future__ import unicode_literals

-from ...symbols import ORTH, LEMMA, TAG, NORM, PRON_LEMMA
+from ...symbols import ORTH

# Extensive list of both common and uncommon dutch abbreviations copied from
# github.com/diasks2/pragmatic_segmenter, a Ruby library for rule-based

@@ -16,7 +16,7 @@ from ...symbols import ORTH, LEMMA, TAG, NORM, PRON_LEMMA
# are extremely domain-specific. Tokenizer performance may benefit from some
# slight pruning, although no performance regression has been observed so far.

+# fmt: off
abbrevs = ['a.2d.', 'a.a.', 'a.a.j.b.', 'a.f.t.', 'a.g.j.b.',
           'a.h.v.', 'a.h.w.', 'a.hosp.', 'a.i.', 'a.j.b.', 'a.j.t.',
           'a.m.', 'a.m.r.', 'a.p.m.', 'a.p.r.', 'a.p.t.', 'a.s.',

@@ -326,7 +326,7 @@ abbrevs = ['a.2d.', 'a.a.', 'a.a.j.b.', 'a.f.t.', 'a.g.j.b.',
           'wtvb.', 'ww.', 'x.d.', 'z.a.', 'z.g.', 'z.i.', 'z.j.',
           'z.o.z.', 'z.p.', 'z.s.m.', 'zg.', 'zgn.', 'zn.', 'znw.',
           'zr.', 'zr.', 'ms.', 'zr.ms.']
+# fmt: on

_exc = {}
for orth in abbrevs:
@@ -53,4 +53,11 @@ BASE_NORMS = {
    "US$": "$",
    "C$": "$",
    "A$": "$",
+    "₺": "$",
+    "₹": "$",
+    "৳": "$",
+    "₩": "$",
+    "Mex$": "$",
+    "₣": "$",
+    "E£": "$",
}
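These entries extend the shared NORM lookup table, which the language defaults chain together via `add_lookups` (as seen in the Dutch and Thai `__init__` hunks in this diff), so all of these currency symbols normalise to "$". A rough standalone sketch of that chained lookup; the `norm` helper below is a simplified stand-in, not the real `spacy.util.add_lookups`.

    # Simplified stand-in for the NORM lookup chain: check each table in turn,
    # fall back to the lowercase form. Only a tiny excerpt of the table is used.
    BASE_NORMS = {"US$": "$", "₹": "$", "₩": "$"}

    def norm(string, *lookups):
        for table in lookups:
            if string in table:
                return table[string]
        return string.lower()

    print(norm("₹", BASE_NORMS))      # "$"
    print(norm("Gimme", BASE_NORMS))  # "gimme"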
@@ -4,11 +4,14 @@ from __future__ import unicode_literals
from .tokenizer_exceptions import TOKENIZER_EXCEPTIONS
from .tag_map import TAG_MAP
from .stop_words import STOP_WORDS
+from .norm_exceptions import NORM_EXCEPTIONS
+from .lex_attrs import LEX_ATTRS

-from ...attrs import LANG
+from ..norm_exceptions import BASE_NORMS
+from ...attrs import LANG, NORM
from ...language import Language
from ...tokens import Doc
-from ...util import DummyTokenizer
+from ...util import DummyTokenizer, add_lookups


class ThaiTokenizer(DummyTokenizer):

@@ -25,15 +28,18 @@ class ThaiTokenizer(DummyTokenizer):
        self.vocab = nlp.vocab if nlp is not None else cls.create_vocab(nlp)

    def __call__(self, text):
-        words = list(self.word_tokenize(text, "newmm"))
+        words = list(self.word_tokenize(text))
        spaces = [False] * len(words)
        return Doc(self.vocab, words=words, spaces=spaces)


class ThaiDefaults(Language.Defaults):
    lex_attr_getters = dict(Language.Defaults.lex_attr_getters)
+    lex_attr_getters.update(LEX_ATTRS)
    lex_attr_getters[LANG] = lambda _text: "th"
+    lex_attr_getters[NORM] = add_lookups(
+        Language.Defaults.lex_attr_getters[NORM], BASE_NORMS, NORM_EXCEPTIONS
+    )
    tokenizer_exceptions = dict(TOKENIZER_EXCEPTIONS)
    tag_map = TAG_MAP
    stop_words = STOP_WORDS
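A hedged usage sketch for the Thai changes above: `ThaiTokenizer` wraps an external word segmenter (the optional `pythainlp` package must be installed for the code below to run), and `__call__` builds a `Doc` with one token per segmented word and no trailing spaces.

    # Minimal sketch; requires the optional pythainlp dependency.
    from spacy.lang.th import Thai

    nlp = Thai()
    doc = nlp("ผมรักคุณ")  # arbitrary example sentence
    print([token.text for token in doc])         # segmented words
    print([token.whitespace_ for token in doc])  # all empty: spaces=[False] * len(words)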
62
spacy/lang/th/lex_attrs.py
Normal file

@@ -0,0 +1,62 @@
+# coding: utf8
+from __future__ import unicode_literals
+
+from ...attrs import LIKE_NUM
+
+
+_num_words = [
+    "ศูนย์",
+    "หนึ่ง",
+    "สอง",
+    "สาม",
+    "สี่",
+    "ห้า",
+    "หก",
+    "เจ็ด",
+    "แปด",
+    "เก้า",
+    "สิบ",
+    "สิบเอ็ด",
+    "ยี่สิบ",
+    "ยี่สิบเอ็ด",
+    "สามสิบ",
+    "สามสิบเอ็ด",
+    "สี่สิบ",
+    "สี่สิบเอ็ด",
+    "ห้าสิบ",
+    "ห้าสิบเอ็ด",
+    "หกสิบเอ็ด",
+    "เจ็ดสิบ",
+    "เจ็ดสิบเอ็ด",
+    "แปดสิบ",
+    "แปดสิบเอ็ด",
+    "เก้าสิบ",
+    "เก้าสิบเอ็ด",
+    "ร้อย",
+    "พัน",
+    "ล้าน",
+    "พันล้าน",
+    "หมื่นล้าน",
+    "แสนล้าน",
+    "ล้านล้าน",
+    "ล้านล้านล้าน",
+    "ล้านล้านล้านล้าน",
+]
+
+
+def like_num(text):
+    if text.startswith(("+", "-", "±", "~")):
+        text = text[1:]
+    text = text.replace(",", "").replace(".", "")
+    if text.isdigit():
+        return True
+    if text.count("/") == 1:
+        num, denom = text.split("/")
+        if num.isdigit() and denom.isdigit():
+            return True
+    if text in _num_words:
+        return True
+    return False
+
+
+LEX_ATTRS = {LIKE_NUM: like_num}
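The `like_num` hook above is registered through `LEX_ATTRS`, so it ultimately backs `Token.like_num` for Thai. Exercising the function directly shows the three paths it accepts: digit strings with separators stripped, simple fractions, and words from `_num_words`.

    from spacy.lang.th.lex_attrs import like_num

    print(like_num("10,000.5"))  # True: "," and "." are stripped before isdigit()
    print(like_num("1/2"))       # True: simple fractions with digit parts
    print(like_num("สิบเอ็ด"))    # True: listed Thai number word
    print(like_num("แมว"))       # False: not a number word ("cat")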
113
spacy/lang/th/norm_exceptions.py
Normal file

@@ -0,0 +1,113 @@
|
||||||
|
# coding: utf8
|
||||||
|
from __future__ import unicode_literals
|
||||||
|
|
||||||
|
|
||||||
|
_exc = {
|
||||||
|
# Conjugation and Diversion invalid to Tonal form (ผันอักษรและเสียงไม่ตรงกับรูปวรรณยุกต์)
|
||||||
|
"สนุ๊กเกอร์": "สนุกเกอร์",
|
||||||
|
"โน้ต": "โน้ต",
|
||||||
|
# Misspelled because of being lazy or hustle (สะกดผิดเพราะขี้เกียจพิมพ์ หรือเร่งรีบ)
|
||||||
|
"โทสับ": "โทรศัพท์",
|
||||||
|
"พุ่งนี้": "พรุ่งนี้",
|
||||||
|
# Strange (ให้ดูแปลกตา)
|
||||||
|
"ชะมะ": "ใช่ไหม",
|
||||||
|
"ชิมิ": "ใช่ไหม",
|
||||||
|
"ชะ": "ใช่ไหม",
|
||||||
|
"ช่ายมะ": "ใช่ไหม",
|
||||||
|
"ป่าว": "เปล่า",
|
||||||
|
"ป่ะ": "เปล่า",
|
||||||
|
"ปล่าว": "เปล่า",
|
||||||
|
"คัย": "ใคร",
|
||||||
|
"ไค": "ใคร",
|
||||||
|
"คราย": "ใคร",
|
||||||
|
"เตง": "ตัวเอง",
|
||||||
|
"ตะเอง": "ตัวเอง",
|
||||||
|
"รึ": "หรือ",
|
||||||
|
"เหรอ": "หรือ",
|
||||||
|
"หรา": "หรือ",
|
||||||
|
"หรอ": "หรือ",
|
||||||
|
"ชั้น": "ฉัน",
|
||||||
|
"ชั้ล": "ฉัน",
|
||||||
|
"ช้าน": "ฉัน",
|
||||||
|
"เทอ": "เธอ",
|
||||||
|
"เทอร์": "เธอ",
|
||||||
|
"เทอว์": "เธอ",
|
||||||
|
"แกร": "แก",
|
||||||
|
"ป๋ม": "ผม",
|
||||||
|
"บ่องตง": "บอกตรงๆ",
|
||||||
|
"ถ่ามตง": "ถามตรงๆ",
|
||||||
|
"ต่อมตง": "ตอบตรงๆ",
|
||||||
|
"เพิ่ล": "เพื่อน",
|
||||||
|
"จอบอ": "จอบอ",
|
||||||
|
"ดั้ย": "ได้",
|
||||||
|
"ขอบคุง": "ขอบคุณ",
|
||||||
|
"ยังงัย": "ยังไง",
|
||||||
|
"Inw": "เทพ",
|
||||||
|
"uou": "นอน",
|
||||||
|
"Lกรีeu": "เกรียน",
|
||||||
|
# Misspelled to express emotions (คำที่สะกดผิดเพื่อแสดงอารมณ์)
|
||||||
|
"เปงราย": "เป็นอะไร",
|
||||||
|
"เปนรัย": "เป็นอะไร",
|
||||||
|
"เปงรัย": "เป็นอะไร",
|
||||||
|
"เป็นอัลไล": "เป็นอะไร",
|
||||||
|
"ทามมาย": "ทำไม",
|
||||||
|
"ทามมัย": "ทำไม",
|
||||||
|
"จังรุย": "จังเลย",
|
||||||
|
"จังเยย": "จังเลย",
|
||||||
|
"จุงเบย": "จังเลย",
|
||||||
|
"ไม่รู้": "มะรุ",
|
||||||
|
"เฮ่ย": "เฮ้ย",
|
||||||
|
"เห้ย": "เฮ้ย",
|
||||||
|
"น่าร็อค": "น่ารัก",
|
||||||
|
"น่าร๊าก": "น่ารัก",
|
||||||
|
"ตั้ลล๊าก": "น่ารัก",
|
||||||
|
"คือร๊ะ": "คืออะไร",
|
||||||
|
"โอป่ะ": "โอเคหรือเปล่า",
|
||||||
|
"น่ามคาน": "น่ารำคาญ",
|
||||||
|
"น่ามสาร": "น่าสงสาร",
|
||||||
|
"วงวาร": "สงสาร",
|
||||||
|
"บับว่า": "แบบว่า",
|
||||||
|
"อัลไล": "อะไร",
|
||||||
|
"อิจ": "อิจฉา",
|
||||||
|
# Reduce rough words or Avoid to software filter (คำที่สะกดผิดเพื่อลดความหยาบของคำ หรืออาจใช้หลีกเลี่ยงการกรองคำหยาบของซอฟต์แวร์)
|
||||||
|
"กรู": "กู",
|
||||||
|
"กุ": "กู",
|
||||||
|
"กรุ": "กู",
|
||||||
|
"ตู": "กู",
|
||||||
|
"ตรู": "กู",
|
||||||
|
"มรึง": "มึง",
|
||||||
|
"เมิง": "มึง",
|
||||||
|
"มืง": "มึง",
|
||||||
|
"มุง": "มึง",
|
||||||
|
"สาด": "สัตว์",
|
||||||
|
"สัส": "สัตว์",
|
||||||
|
"สัก": "สัตว์",
|
||||||
|
"แสรด": "สัตว์",
|
||||||
|
"โคโตะ": "โคตร",
|
||||||
|
"โคด": "โคตร",
|
||||||
|
"โครต": "โคตร",
|
||||||
|
"โคตะระ": "โคตร",
|
||||||
|
"พ่อง": "พ่อมึง",
|
||||||
|
"แม่เมิง": "แม่มึง",
|
||||||
|
"เชี่ย": "เหี้ย",
|
||||||
|
# Imitate words (คำเลียนเสียง โดยส่วนใหญ่จะเพิ่มทัณฑฆาต หรือซ้ำตัวอักษร)
|
||||||
|
"แอร๊ยย": "อ๊าย",
|
||||||
|
"อร๊ายยย": "อ๊าย",
|
||||||
|
"มันส์": "มัน",
|
||||||
|
"วู๊วววววววว์": "วู้",
|
||||||
|
# Acronym (แบบคำย่อ)
|
||||||
|
"หมาลัย": "มหาวิทยาลัย",
|
||||||
|
"วิดวะ": "วิศวะ",
|
||||||
|
"สินสาด ": "ศิลปศาสตร์",
|
||||||
|
"สินกำ ": "ศิลปกรรมศาสตร์",
|
||||||
|
"เสารีย์ ": "อนุเสาวรีย์ชัยสมรภูมิ",
|
||||||
|
"เมกา ": "อเมริกา",
|
||||||
|
"มอไซค์ ": "มอเตอร์ไซค์",
|
||||||
|
}
|
||||||
|
|
||||||
|
|
||||||
|
NORM_EXCEPTIONS = {}
|
||||||
|
|
||||||
|
for string, norm in _exc.items():
|
||||||
|
NORM_EXCEPTIONS[string] = norm
|
||||||
|
NORM_EXCEPTIONS[string.title()] = norm
|
|
@ -5,7 +5,7 @@ from ...symbols import ORTH, LEMMA
|
||||||
|
|
||||||
|
|
||||||
_exc = {
|
_exc = {
|
||||||
#หน่วยงานรัฐ / government agency
|
# หน่วยงานรัฐ / government agency
|
||||||
"กกต.": [{ORTH: "กกต.", LEMMA: "คณะกรรมการการเลือกตั้ง"}],
|
"กกต.": [{ORTH: "กกต.", LEMMA: "คณะกรรมการการเลือกตั้ง"}],
|
||||||
"กทท.": [{ORTH: "กทท.", LEMMA: "การท่าเรือแห่งประเทศไทย"}],
|
"กทท.": [{ORTH: "กทท.", LEMMA: "การท่าเรือแห่งประเทศไทย"}],
|
||||||
"กทพ.": [{ORTH: "กทพ.", LEMMA: "การทางพิเศษแห่งประเทศไทย"}],
|
"กทพ.": [{ORTH: "กทพ.", LEMMA: "การทางพิเศษแห่งประเทศไทย"}],
|
||||||
|
@ -44,11 +44,21 @@ _exc = {
|
||||||
"ธอส.": [{ORTH: "ธอส.", LEMMA: "ธนาคารอาคารสงเคราะห์"}],
|
"ธอส.": [{ORTH: "ธอส.", LEMMA: "ธนาคารอาคารสงเคราะห์"}],
|
||||||
"นย.": [{ORTH: "นย.", LEMMA: "นาวิกโยธิน"}],
|
"นย.": [{ORTH: "นย.", LEMMA: "นาวิกโยธิน"}],
|
||||||
"ปตท.": [{ORTH: "ปตท.", LEMMA: "การปิโตรเลียมแห่งประเทศไทย"}],
|
"ปตท.": [{ORTH: "ปตท.", LEMMA: "การปิโตรเลียมแห่งประเทศไทย"}],
|
||||||
"ป.ป.ช.": [{ORTH: "ป.ป.ช.", LEMMA: "คณะกรรมการป้องกันและปราบปรามการทุจริตและประพฤติมิชอบในวงราชการ"}],
|
"ป.ป.ช.": [
|
||||||
|
{
|
||||||
|
ORTH: "ป.ป.ช.",
|
||||||
|
LEMMA: "คณะกรรมการป้องกันและปราบปรามการทุจริตและประพฤติมิชอบในวงราชการ",
|
||||||
|
}
|
||||||
|
],
|
||||||
"ป.ป.ส.": [{ORTH: "ป.ป.ส.", LEMMA: "คณะกรรมการป้องกันและปราบปรามยาเสพติด"}],
|
"ป.ป.ส.": [{ORTH: "ป.ป.ส.", LEMMA: "คณะกรรมการป้องกันและปราบปรามยาเสพติด"}],
|
||||||
"บพร.": [{ORTH: "บพร.", LEMMA: "กรมการบินพลเรือน"}],
|
"บพร.": [{ORTH: "บพร.", LEMMA: "กรมการบินพลเรือน"}],
|
||||||
"บย.": [{ORTH: "บย.", LEMMA: "กองบินยุทธการ"}],
|
"บย.": [{ORTH: "บย.", LEMMA: "กองบินยุทธการ"}],
|
||||||
"พสวท.": [{ORTH: "พสวท.", LEMMA: "โครงการพัฒนาและส่งเสริมผู้มีความรู้ความสามารถพิเศษทางวิทยาศาสตร์และเทคโนโลยี"}],
|
"พสวท.": [
|
||||||
|
{
|
||||||
|
ORTH: "พสวท.",
|
||||||
|
LEMMA: "โครงการพัฒนาและส่งเสริมผู้มีความรู้ความสามารถพิเศษทางวิทยาศาสตร์และเทคโนโลยี",
|
||||||
|
}
|
||||||
|
],
|
||||||
"มอก.": [{ORTH: "มอก.", LEMMA: "สำนักงานมาตรฐานผลิตภัณฑ์อุตสาหกรรม"}],
|
"มอก.": [{ORTH: "มอก.", LEMMA: "สำนักงานมาตรฐานผลิตภัณฑ์อุตสาหกรรม"}],
|
||||||
"ยธ.": [{ORTH: "ยธ.", LEMMA: "กรมโยธาธิการ"}],
|
"ยธ.": [{ORTH: "ยธ.", LEMMA: "กรมโยธาธิการ"}],
|
||||||
"รพช.": [{ORTH: "รพช.", LEMMA: "สำนักงานเร่งรัดพัฒนาชนบท"}],
|
"รพช.": [{ORTH: "รพช.", LEMMA: "สำนักงานเร่งรัดพัฒนาชนบท"}],
|
||||||
|
@ -71,11 +81,15 @@ _exc = {
|
||||||
"สปช.": [{ORTH: "สปช.", LEMMA: "สำนักงานคณะกรรมการการประถมศึกษาแห่งชาติ"}],
|
"สปช.": [{ORTH: "สปช.", LEMMA: "สำนักงานคณะกรรมการการประถมศึกษาแห่งชาติ"}],
|
||||||
"สปอ.": [{ORTH: "สปอ.", LEMMA: "สำนักงานการประถมศึกษาอำเภอ"}],
|
"สปอ.": [{ORTH: "สปอ.", LEMMA: "สำนักงานการประถมศึกษาอำเภอ"}],
|
||||||
"สพช.": [{ORTH: "สพช.", LEMMA: "สำนักงานคณะกรรมการนโยบายพลังงานแห่งชาติ"}],
|
"สพช.": [{ORTH: "สพช.", LEMMA: "สำนักงานคณะกรรมการนโยบายพลังงานแห่งชาติ"}],
|
||||||
"สยช.": [{ORTH: "สยช.", LEMMA: "สำนักงานคณะกรรมการส่งเสริมและประสานงานเยาวชนแห่งชาติ"}],
|
"สยช.": [
|
||||||
|
{ORTH: "สยช.", LEMMA: "สำนักงานคณะกรรมการส่งเสริมและประสานงานเยาวชนแห่งชาติ"}
|
||||||
|
],
|
||||||
"สวช.": [{ORTH: "สวช.", LEMMA: "สำนักงานคณะกรรมการวัฒนธรรมแห่งชาติ"}],
|
"สวช.": [{ORTH: "สวช.", LEMMA: "สำนักงานคณะกรรมการวัฒนธรรมแห่งชาติ"}],
|
||||||
"สวท.": [{ORTH: "สวท.", LEMMA: "สถานีวิทยุกระจายเสียงแห่งประเทศไทย"}],
|
"สวท.": [{ORTH: "สวท.", LEMMA: "สถานีวิทยุกระจายเสียงแห่งประเทศไทย"}],
|
||||||
"สวทช.": [{ORTH: "สวทช.", LEMMA: "สำนักงานพัฒนาวิทยาศาสตร์และเทคโนโลยีแห่งชาติ"}],
|
"สวทช.": [{ORTH: "สวทช.", LEMMA: "สำนักงานพัฒนาวิทยาศาสตร์และเทคโนโลยีแห่งชาติ"}],
|
||||||
"สคช.": [{ORTH: "สคช.", LEMMA: "สำนักงานคณะกรรมการพัฒนาการเศรษฐกิจและสังคมแห่งชาติ"}],
|
"สคช.": [
|
||||||
|
{ORTH: "สคช.", LEMMA: "สำนักงานคณะกรรมการพัฒนาการเศรษฐกิจและสังคมแห่งชาติ"}
|
||||||
|
],
|
||||||
"สสว.": [{ORTH: "สสว.", LEMMA: "สำนักงานส่งเสริมวิสาหกิจขนาดกลางและขนาดย่อม"}],
|
"สสว.": [{ORTH: "สสว.", LEMMA: "สำนักงานส่งเสริมวิสาหกิจขนาดกลางและขนาดย่อม"}],
|
||||||
"สสส.": [{ORTH: "สสส.", LEMMA: "สำนักงานกองทุนสนับสนุนการสร้างเสริมสุขภาพ"}],
|
"สสส.": [{ORTH: "สสส.", LEMMA: "สำนักงานกองทุนสนับสนุนการสร้างเสริมสุขภาพ"}],
|
||||||
"สสวท.": [{ORTH: "สสวท.", LEMMA: "สถาบันส่งเสริมการสอนวิทยาศาสตร์และเทคโนโลยี"}],
|
"สสวท.": [{ORTH: "สสวท.", LEMMA: "สถาบันส่งเสริมการสอนวิทยาศาสตร์และเทคโนโลยี"}],
|
||||||
|
@ -85,7 +99,7 @@ _exc = {
|
||||||
"อปพร.": [{ORTH: "อปพร.", LEMMA: "อาสาสมัครป้องกันภัยฝ่ายพลเรือน"}],
|
"อปพร.": [{ORTH: "อปพร.", LEMMA: "อาสาสมัครป้องกันภัยฝ่ายพลเรือน"}],
|
||||||
"อย.": [{ORTH: "อย.", LEMMA: "สำนักงานคณะกรรมการอาหารและยา"}],
|
"อย.": [{ORTH: "อย.", LEMMA: "สำนักงานคณะกรรมการอาหารและยา"}],
|
||||||
"อ.ส.ม.ท.": [{ORTH: "อ.ส.ม.ท.", LEMMA: "องค์การสื่อสารมวลชนแห่งประเทศไทย"}],
|
"อ.ส.ม.ท.": [{ORTH: "อ.ส.ม.ท.", LEMMA: "องค์การสื่อสารมวลชนแห่งประเทศไทย"}],
|
||||||
#มหาวิทยาลัย / สถานศึกษา / university / college
|
# มหาวิทยาลัย / สถานศึกษา / university / college
|
||||||
"มทส.": [{ORTH: "มทส.", LEMMA: "มหาวิทยาลัยเทคโนโลยีสุรนารี"}],
|
"มทส.": [{ORTH: "มทส.", LEMMA: "มหาวิทยาลัยเทคโนโลยีสุรนารี"}],
|
||||||
"มธ.": [{ORTH: "มธ.", LEMMA: "มหาวิทยาลัยธรรมศาสตร์"}],
|
"มธ.": [{ORTH: "มธ.", LEMMA: "มหาวิทยาลัยธรรมศาสตร์"}],
|
||||||
"ม.อ.": [{ORTH: "ม.อ.", LEMMA: "มหาวิทยาลัยสงขลานครินทร์"}],
|
"ม.อ.": [{ORTH: "ม.อ.", LEMMA: "มหาวิทยาลัยสงขลานครินทร์"}],
|
||||||
|
@ -93,7 +107,7 @@ _exc = {
|
||||||
"มมส.": [{ORTH: "มมส.", LEMMA: "มหาวิทยาลัยมหาสารคาม"}],
|
"มมส.": [{ORTH: "มมส.", LEMMA: "มหาวิทยาลัยมหาสารคาม"}],
|
||||||
"วท.": [{ORTH: "วท.", LEMMA: "วิทยาลัยเทคนิค"}],
|
"วท.": [{ORTH: "วท.", LEMMA: "วิทยาลัยเทคนิค"}],
|
||||||
"สตม.": [{ORTH: "สตม.", LEMMA: "สำนักงานตรวจคนเข้าเมือง (ตำรวจ)"}],
|
"สตม.": [{ORTH: "สตม.", LEMMA: "สำนักงานตรวจคนเข้าเมือง (ตำรวจ)"}],
|
||||||
#ยศ / rank
|
# ยศ / rank
|
||||||
"ดร.": [{ORTH: "ดร.", LEMMA: "ดอกเตอร์"}],
|
"ดร.": [{ORTH: "ดร.", LEMMA: "ดอกเตอร์"}],
|
||||||
"ด.ต.": [{ORTH: "ด.ต.", LEMMA: "ดาบตำรวจ"}],
|
"ด.ต.": [{ORTH: "ด.ต.", LEMMA: "ดาบตำรวจ"}],
|
||||||
"จ.ต.": [{ORTH: "จ.ต.", LEMMA: "จ่าตรี"}],
|
"จ.ต.": [{ORTH: "จ.ต.", LEMMA: "จ่าตรี"}],
|
||||||
|
@ -133,10 +147,14 @@ _exc = {
|
||||||
"ผญบ.": [{ORTH: "ผญบ.", LEMMA: "ผู้ใหญ่บ้าน"}],
|
"ผญบ.": [{ORTH: "ผญบ.", LEMMA: "ผู้ใหญ่บ้าน"}],
|
||||||
"ผบ.": [{ORTH: "ผบ.", LEMMA: "ผู้บังคับบัญชา"}],
|
"ผบ.": [{ORTH: "ผบ.", LEMMA: "ผู้บังคับบัญชา"}],
|
||||||
"ผบก.": [{ORTH: "ผบก.", LEMMA: "ผู้บังคับบัญชาการ (ตำรวจ)"}],
|
"ผบก.": [{ORTH: "ผบก.", LEMMA: "ผู้บังคับบัญชาการ (ตำรวจ)"}],
|
||||||
"ผบก.": [{ORTH: "ผบก.", LEMMA: "ผู้บังคับการ (ตำรวจ)"}],
|
|
||||||
"ผบก.น.": [{ORTH: "ผบก.น.", LEMMA: "ผู้บังคับการตำรวจนครบาล"}],
|
"ผบก.น.": [{ORTH: "ผบก.น.", LEMMA: "ผู้บังคับการตำรวจนครบาล"}],
|
||||||
"ผบก.ป.": [{ORTH: "ผบก.ป.", LEMMA: "ผู้บังคับการตำรวจกองปราบปราม"}],
|
"ผบก.ป.": [{ORTH: "ผบก.ป.", LEMMA: "ผู้บังคับการตำรวจกองปราบปราม"}],
|
||||||
"ผบก.ปค.": [{ORTH: "ผบก.ปค.", LEMMA: "ผู้บังคับการ กองบังคับการปกครอง (โรงเรียนนายร้อยตำรวจ)"}],
|
"ผบก.ปค.": [
|
||||||
|
{
|
||||||
|
ORTH: "ผบก.ปค.",
|
||||||
|
LEMMA: "ผู้บังคับการ กองบังคับการปกครอง (โรงเรียนนายร้อยตำรวจ)",
|
||||||
|
}
|
||||||
|
],
|
||||||
"ผบก.ปม.": [{ORTH: "ผบก.ปม.", LEMMA: "ผู้บังคับการตำรวจป่าไม้"}],
|
"ผบก.ปม.": [{ORTH: "ผบก.ปม.", LEMMA: "ผู้บังคับการตำรวจป่าไม้"}],
|
||||||
"ผบก.ภ.": [{ORTH: "ผบก.ภ.", LEMMA: "ผู้บังคับการตำรวจภูธร"}],
|
"ผบก.ภ.": [{ORTH: "ผบก.ภ.", LEMMA: "ผู้บังคับการตำรวจภูธร"}],
|
||||||
"ผบช.": [{ORTH: "ผบช.", LEMMA: "ผู้บัญชาการ (ตำรวจ)"}],
|
"ผบช.": [{ORTH: "ผบช.", LEMMA: "ผู้บัญชาการ (ตำรวจ)"}],
|
||||||
|
@ -177,7 +195,6 @@ _exc = {
|
||||||
"พล.อ.ต.": [{ORTH: "พล.อ.ต.", LEMMA: "พลอากาศตรี"}],
|
"พล.อ.ต.": [{ORTH: "พล.อ.ต.", LEMMA: "พลอากาศตรี"}],
|
||||||
"พล.อ.ท.": [{ORTH: "พล.อ.ท.", LEMMA: "พลอากาศโท"}],
|
"พล.อ.ท.": [{ORTH: "พล.อ.ท.", LEMMA: "พลอากาศโท"}],
|
||||||
"พล.อ.อ.": [{ORTH: "พล.อ.อ.", LEMMA: "พลอากาศเอก"}],
|
"พล.อ.อ.": [{ORTH: "พล.อ.อ.", LEMMA: "พลอากาศเอก"}],
|
||||||
"พ.อ.": [{ORTH: "พ.อ.", LEMMA: "พันเอก"}],
|
|
||||||
"พ.อ.พิเศษ": [{ORTH: "พ.อ.พิเศษ", LEMMA: "พันเอกพิเศษ"}],
|
"พ.อ.พิเศษ": [{ORTH: "พ.อ.พิเศษ", LEMMA: "พันเอกพิเศษ"}],
|
||||||
"พ.อ.ต.": [{ORTH: "พ.อ.ต.", LEMMA: "พันจ่าอากาศตรี"}],
|
"พ.อ.ต.": [{ORTH: "พ.อ.ต.", LEMMA: "พันจ่าอากาศตรี"}],
|
||||||
"พ.อ.ท.": [{ORTH: "พ.อ.ท.", LEMMA: "พันจ่าอากาศโท"}],
|
"พ.อ.ท.": [{ORTH: "พ.อ.ท.", LEMMA: "พันจ่าอากาศโท"}],
|
||||||
|
@ -209,7 +226,7 @@ _exc = {
|
||||||
"ส.อ.": [{ORTH: "ส.อ.", LEMMA: "สิบเอก"}],
|
"ส.อ.": [{ORTH: "ส.อ.", LEMMA: "สิบเอก"}],
|
||||||
"อจ.": [{ORTH: "อจ.", LEMMA: "อาจารย์"}],
|
"อจ.": [{ORTH: "อจ.", LEMMA: "อาจารย์"}],
|
||||||
"อจญ.": [{ORTH: "อจญ.", LEMMA: "อาจารย์ใหญ่"}],
|
"อจญ.": [{ORTH: "อจญ.", LEMMA: "อาจารย์ใหญ่"}],
|
||||||
#วุฒิ / bachelor degree
|
# วุฒิ / bachelor degree
|
||||||
"ป.": [{ORTH: "ป.", LEMMA: "ประถมศึกษา"}],
|
"ป.": [{ORTH: "ป.", LEMMA: "ประถมศึกษา"}],
|
||||||
"ป.กศ.": [{ORTH: "ป.กศ.", LEMMA: "ประกาศนียบัตรวิชาการศึกษา"}],
|
"ป.กศ.": [{ORTH: "ป.กศ.", LEMMA: "ประกาศนียบัตรวิชาการศึกษา"}],
|
||||||
"ป.กศ.สูง": [{ORTH: "ป.กศ.สูง", LEMMA: "ประกาศนียบัตรวิชาการศึกษาชั้นสูง"}],
|
"ป.กศ.สูง": [{ORTH: "ป.กศ.สูง", LEMMA: "ประกาศนียบัตรวิชาการศึกษาชั้นสูง"}],
|
||||||
|
@ -283,20 +300,20 @@ _exc = {
|
||||||
"อ.บ.": [{ORTH: "อ.บ.", LEMMA: "อักษรศาสตรบัณฑิต"}],
|
"อ.บ.": [{ORTH: "อ.บ.", LEMMA: "อักษรศาสตรบัณฑิต"}],
|
||||||
"อ.ม.": [{ORTH: "อ.ม.", LEMMA: "อักษรศาสตรมหาบัณฑิต"}],
|
"อ.ม.": [{ORTH: "อ.ม.", LEMMA: "อักษรศาสตรมหาบัณฑิต"}],
|
||||||
"อ.ด.": [{ORTH: "อ.ด.", LEMMA: "อักษรศาสตรดุษฎีบัณฑิต"}],
|
"อ.ด.": [{ORTH: "อ.ด.", LEMMA: "อักษรศาสตรดุษฎีบัณฑิต"}],
|
||||||
#ปี / เวลา / year / time
|
# ปี / เวลา / year / time
|
||||||
"ชม.": [{ORTH: "ชม.", LEMMA: "ชั่วโมง"}],
|
"ชม.": [{ORTH: "ชม.", LEMMA: "ชั่วโมง"}],
|
||||||
"จ.ศ.": [{ORTH: "จ.ศ.", LEMMA: "จุลศักราช"}],
|
"จ.ศ.": [{ORTH: "จ.ศ.", LEMMA: "จุลศักราช"}],
|
||||||
"ค.ศ.": [{ORTH: "ค.ศ.", LEMMA: "คริสต์ศักราช"}],
|
"ค.ศ.": [{ORTH: "ค.ศ.", LEMMA: "คริสต์ศักราช"}],
|
||||||
"ฮ.ศ.": [{ORTH: "ฮ.ศ.", LEMMA: "ฮิจเราะห์ศักราช"}],
|
"ฮ.ศ.": [{ORTH: "ฮ.ศ.", LEMMA: "ฮิจเราะห์ศักราช"}],
|
||||||
"ว.ด.ป.": [{ORTH: "ว.ด.ป.", LEMMA: "วัน เดือน ปี"}],
|
"ว.ด.ป.": [{ORTH: "ว.ด.ป.", LEMMA: "วัน เดือน ปี"}],
|
||||||
#ระยะทาง / distance
|
# ระยะทาง / distance
|
||||||
"ฮม.": [{ORTH: "ฮม.", LEMMA: "เฮกโตเมตร"}],
|
"ฮม.": [{ORTH: "ฮม.", LEMMA: "เฮกโตเมตร"}],
|
||||||
"ดคม.": [{ORTH: "ดคม.", LEMMA: "เดคาเมตร"}],
|
"ดคม.": [{ORTH: "ดคม.", LEMMA: "เดคาเมตร"}],
|
||||||
"ดม.": [{ORTH: "ดม.", LEMMA: "เดซิเมตร"}],
|
"ดม.": [{ORTH: "ดม.", LEMMA: "เดซิเมตร"}],
|
||||||
"มม.": [{ORTH: "มม.", LEMMA: "มิลลิเมตร"}],
|
"มม.": [{ORTH: "มม.", LEMMA: "มิลลิเมตร"}],
|
||||||
"ซม.": [{ORTH: "ซม.", LEMMA: "เซนติเมตร"}],
|
"ซม.": [{ORTH: "ซม.", LEMMA: "เซนติเมตร"}],
|
||||||
"กม.": [{ORTH: "กม.", LEMMA: "กิโลเมตร"}],
|
"กม.": [{ORTH: "กม.", LEMMA: "กิโลเมตร"}],
|
||||||
#น้ำหนัก / weight
|
# น้ำหนัก / weight
|
||||||
"น.น.": [{ORTH: "น.น.", LEMMA: "น้ำหนัก"}],
|
"น.น.": [{ORTH: "น.น.", LEMMA: "น้ำหนัก"}],
|
||||||
"ฮก.": [{ORTH: "ฮก.", LEMMA: "เฮกโตกรัม"}],
|
"ฮก.": [{ORTH: "ฮก.", LEMMA: "เฮกโตกรัม"}],
|
||||||
"ดคก.": [{ORTH: "ดคก.", LEMMA: "เดคากรัม"}],
|
"ดคก.": [{ORTH: "ดคก.", LEMMA: "เดคากรัม"}],
|
||||||
|
@ -305,7 +322,7 @@ _exc = {
|
||||||
"มก.": [{ORTH: "มก.", LEMMA: "มิลลิกรัม"}],
|
"มก.": [{ORTH: "มก.", LEMMA: "มิลลิกรัม"}],
|
||||||
"ก.": [{ORTH: "ก.", LEMMA: "กรัม"}],
|
"ก.": [{ORTH: "ก.", LEMMA: "กรัม"}],
|
||||||
"กก.": [{ORTH: "กก.", LEMMA: "กิโลกรัม"}],
|
"กก.": [{ORTH: "กก.", LEMMA: "กิโลกรัม"}],
|
||||||
#ปริมาตร / volume
|
# ปริมาตร / volume
|
||||||
"ฮล.": [{ORTH: "ฮล.", LEMMA: "เฮกโตลิตร"}],
|
"ฮล.": [{ORTH: "ฮล.", LEMMA: "เฮกโตลิตร"}],
|
||||||
"ดคล.": [{ORTH: "ดคล.", LEMMA: "เดคาลิตร"}],
|
"ดคล.": [{ORTH: "ดคล.", LEMMA: "เดคาลิตร"}],
|
||||||
"ดล.": [{ORTH: "ดล.", LEMMA: "เดซิลิตร"}],
|
"ดล.": [{ORTH: "ดล.", LEMMA: "เดซิลิตร"}],
|
||||||
|
@ -313,12 +330,12 @@ _exc = {
|
||||||
"ล.": [{ORTH: "ล.", LEMMA: "ลิตร"}],
|
"ล.": [{ORTH: "ล.", LEMMA: "ลิตร"}],
|
||||||
"กล.": [{ORTH: "กล.", LEMMA: "กิโลลิตร"}],
|
"กล.": [{ORTH: "กล.", LEMMA: "กิโลลิตร"}],
|
||||||
"ลบ.": [{ORTH: "ลบ.", LEMMA: "ลูกบาศก์"}],
|
"ลบ.": [{ORTH: "ลบ.", LEMMA: "ลูกบาศก์"}],
|
||||||
#พื้นที่ / area
|
# พื้นที่ / area
|
||||||
"ตร.ซม.": [{ORTH: "ตร.ซม.", LEMMA: "ตารางเซนติเมตร"}],
|
"ตร.ซม.": [{ORTH: "ตร.ซม.", LEMMA: "ตารางเซนติเมตร"}],
|
||||||
"ตร.ม.": [{ORTH: "ตร.ม.", LEMMA: "ตารางเมตร"}],
|
"ตร.ม.": [{ORTH: "ตร.ม.", LEMMA: "ตารางเมตร"}],
|
||||||
"ตร.ว.": [{ORTH: "ตร.ว.", LEMMA: "ตารางวา"}],
|
"ตร.ว.": [{ORTH: "ตร.ว.", LEMMA: "ตารางวา"}],
|
||||||
"ตร.กม.": [{ORTH: "ตร.กม.", LEMMA: "ตารางกิโลเมตร"}],
|
"ตร.กม.": [{ORTH: "ตร.กม.", LEMMA: "ตารางกิโลเมตร"}],
|
||||||
#เดือน / month
|
# เดือน / month
|
||||||
"ม.ค.": [{ORTH: "ม.ค.", LEMMA: "มกราคม"}],
|
"ม.ค.": [{ORTH: "ม.ค.", LEMMA: "มกราคม"}],
|
||||||
"ก.พ.": [{ORTH: "ก.พ.", LEMMA: "กุมภาพันธ์"}],
|
"ก.พ.": [{ORTH: "ก.พ.", LEMMA: "กุมภาพันธ์"}],
|
||||||
"มี.ค.": [{ORTH: "มี.ค.", LEMMA: "มีนาคม"}],
|
"มี.ค.": [{ORTH: "มี.ค.", LEMMA: "มีนาคม"}],
|
||||||
|
@ -331,22 +348,22 @@ _exc = {
|
||||||
"ต.ค.": [{ORTH: "ต.ค.", LEMMA: "ตุลาคม"}],
|
"ต.ค.": [{ORTH: "ต.ค.", LEMMA: "ตุลาคม"}],
|
||||||
"พ.ย.": [{ORTH: "พ.ย.", LEMMA: "พฤศจิกายน"}],
|
"พ.ย.": [{ORTH: "พ.ย.", LEMMA: "พฤศจิกายน"}],
|
||||||
"ธ.ค.": [{ORTH: "ธ.ค.", LEMMA: "ธันวาคม"}],
|
"ธ.ค.": [{ORTH: "ธ.ค.", LEMMA: "ธันวาคม"}],
|
||||||
#เพศ / gender
|
# เพศ / gender
|
||||||
"ช.": [{ORTH: "ช.", LEMMA: "ชาย"}],
|
"ช.": [{ORTH: "ช.", LEMMA: "ชาย"}],
|
||||||
"ญ.": [{ORTH: "ญ.", LEMMA: "หญิง"}],
|
"ญ.": [{ORTH: "ญ.", LEMMA: "หญิง"}],
|
||||||
"ด.ช.": [{ORTH: "ด.ช.", LEMMA: "เด็กชาย"}],
|
"ด.ช.": [{ORTH: "ด.ช.", LEMMA: "เด็กชาย"}],
|
||||||
"ด.ญ.": [{ORTH: "ด.ญ.", LEMMA: "เด็กหญิง"}],
|
"ด.ญ.": [{ORTH: "ด.ญ.", LEMMA: "เด็กหญิง"}],
|
||||||
#ที่อยู่ / address
|
# ที่อยู่ / address
|
||||||
"ถ.": [{ORTH: "ถ.", LEMMA: "ถนน"}],
|
"ถ.": [{ORTH: "ถ.", LEMMA: "ถนน"}],
|
||||||
"ต.": [{ORTH: "ต.", LEMMA: "ตำบล"}],
|
"ต.": [{ORTH: "ต.", LEMMA: "ตำบล"}],
|
||||||
"อ.": [{ORTH: "อ.", LEMMA: "อำเภอ"}],
|
"อ.": [{ORTH: "อ.", LEMMA: "อำเภอ"}],
|
||||||
"จ.": [{ORTH: "จ.", LEMMA: "จังหวัด"}],
|
"จ.": [{ORTH: "จ.", LEMMA: "จังหวัด"}],
|
||||||
#สรรพนาม / pronoun
|
# สรรพนาม / pronoun
|
||||||
"ข้าฯ": [{ORTH: "ข้าฯ", LEMMA: "ข้าพระพุทธเจ้า"}],
|
"ข้าฯ": [{ORTH: "ข้าฯ", LEMMA: "ข้าพระพุทธเจ้า"}],
|
||||||
"ทูลเกล้าฯ": [{ORTH: "ทูลเกล้าฯ", LEMMA: "ทูลเกล้าทูลกระหม่อม"}],
|
"ทูลเกล้าฯ": [{ORTH: "ทูลเกล้าฯ", LEMMA: "ทูลเกล้าทูลกระหม่อม"}],
|
||||||
"น้อมเกล้าฯ": [{ORTH: "น้อมเกล้าฯ", LEMMA: "น้อมเกล้าน้อมกระหม่อม"}],
|
"น้อมเกล้าฯ": [{ORTH: "น้อมเกล้าฯ", LEMMA: "น้อมเกล้าน้อมกระหม่อม"}],
|
||||||
"โปรดเกล้าฯ": [{ORTH: "โปรดเกล้าฯ", LEMMA: "โปรดเกล้าโปรดกระหม่อม"}],
|
"โปรดเกล้าฯ": [{ORTH: "โปรดเกล้าฯ", LEMMA: "โปรดเกล้าโปรดกระหม่อม"}],
|
||||||
#การเมือง / politic
|
# การเมือง / politic
|
||||||
"ขจก.": [{ORTH: "ขจก.", LEMMA: "ขบวนการโจรก่อการร้าย"}],
|
"ขจก.": [{ORTH: "ขจก.", LEMMA: "ขบวนการโจรก่อการร้าย"}],
|
||||||
"ขบด.": [{ORTH: "ขบด.", LEMMA: "ขบวนการแบ่งแยกดินแดน"}],
|
"ขบด.": [{ORTH: "ขบด.", LEMMA: "ขบวนการแบ่งแยกดินแดน"}],
|
||||||
"นปช.": [{ORTH: "นปช.", LEMMA: "แนวร่วมประชาธิปไตยขับไล่เผด็จการ"}],
|
"นปช.": [{ORTH: "นปช.", LEMMA: "แนวร่วมประชาธิปไตยขับไล่เผด็จการ"}],
|
||||||
|
@ -363,7 +380,7 @@ _exc = {
|
||||||
"สจ.": [{ORTH: "สจ.", LEMMA: "สมาชิกสภาจังหวัด"}],
|
"สจ.": [{ORTH: "สจ.", LEMMA: "สมาชิกสภาจังหวัด"}],
|
||||||
"สว.": [{ORTH: "สว.", LEMMA: "สมาชิกวุฒิสภา"}],
|
"สว.": [{ORTH: "สว.", LEMMA: "สมาชิกวุฒิสภา"}],
|
||||||
"ส.ส.": [{ORTH: "ส.ส.", LEMMA: "สมาชิกสภาผู้แทนราษฎร"}],
|
"ส.ส.": [{ORTH: "ส.ส.", LEMMA: "สมาชิกสภาผู้แทนราษฎร"}],
|
||||||
#ทั่วไป / general
|
# ทั่วไป / general
|
||||||
"ก.ข.ค.": [{ORTH: "ก.ข.ค.", LEMMA: "ก้างขวางคอ"}],
|
"ก.ข.ค.": [{ORTH: "ก.ข.ค.", LEMMA: "ก้างขวางคอ"}],
|
||||||
"กทม.": [{ORTH: "กทม.", LEMMA: "กรุงเทพมหานคร"}],
|
"กทม.": [{ORTH: "กทม.", LEMMA: "กรุงเทพมหานคร"}],
|
||||||
"กรุงเทพฯ": [{ORTH: "กรุงเทพฯ", LEMMA: "กรุงเทพมหานคร"}],
|
"กรุงเทพฯ": [{ORTH: "กรุงเทพฯ", LEMMA: "กรุงเทพมหานคร"}],
|
||||||
|
@ -376,7 +393,12 @@ _exc = {
|
||||||
"จก.": [{ORTH: "จก.", LEMMA: "จำกัด"}],
|
"จก.": [{ORTH: "จก.", LEMMA: "จำกัด"}],
|
||||||
"จขกท.": [{ORTH: "จขกท.", LEMMA: "เจ้าของกระทู้"}],
|
"จขกท.": [{ORTH: "จขกท.", LEMMA: "เจ้าของกระทู้"}],
|
||||||
"จนท.": [{ORTH: "จนท.", LEMMA: "เจ้าหน้าที่"}],
|
"จนท.": [{ORTH: "จนท.", LEMMA: "เจ้าหน้าที่"}],
|
||||||
"จ.ป.ร.": [{ORTH: "จ.ป.ร.", LEMMA: "มหาจุฬาลงกรณ ปรมราชาธิราช (พระปรมาภิไธยในพระบาทสมเด็จพระจุลจอมเกล้าเจ้าอยู่หัว)"}],
|
"จ.ป.ร.": [
|
||||||
|
{
|
||||||
|
ORTH: "จ.ป.ร.",
|
||||||
|
LEMMA: "มหาจุฬาลงกรณ ปรมราชาธิราช (พระปรมาภิไธยในพระบาทสมเด็จพระจุลจอมเกล้าเจ้าอยู่หัว)",
|
||||||
|
}
|
||||||
|
],
|
||||||
"จ.ม.": [{ORTH: "จ.ม.", LEMMA: "จดหมาย"}],
|
"จ.ม.": [{ORTH: "จ.ม.", LEMMA: "จดหมาย"}],
|
||||||
"จย.": [{ORTH: "จย.", LEMMA: "จักรยาน"}],
|
"จย.": [{ORTH: "จย.", LEMMA: "จักรยาน"}],
|
||||||
"จยย.": [{ORTH: "จยย.", LEMMA: "จักรยานยนต์"}],
|
"จยย.": [{ORTH: "จยย.", LEMMA: "จักรยานยนต์"}],
|
||||||
|
@ -387,7 +409,9 @@ _exc = {
|
||||||
"น.ศ.": [{ORTH: "น.ศ.", LEMMA: "นักศึกษา"}],
|
"น.ศ.": [{ORTH: "น.ศ.", LEMMA: "นักศึกษา"}],
|
||||||
"น.ส.": [{ORTH: "น.ส.", LEMMA: "นางสาว"}],
|
"น.ส.": [{ORTH: "น.ส.", LEMMA: "นางสาว"}],
|
||||||
"น.ส.๓": [{ORTH: "น.ส.๓", LEMMA: "หนังสือรับรองการทำประโยชน์ในที่ดิน"}],
|
"น.ส.๓": [{ORTH: "น.ส.๓", LEMMA: "หนังสือรับรองการทำประโยชน์ในที่ดิน"}],
|
||||||
"น.ส.๓ ก.": [{ORTH: "น.ส.๓ ก", LEMMA: "หนังสือแสดงกรรมสิทธิ์ในที่ดิน (มีระวางกำหนด)"}],
|
"น.ส.๓ ก.": [
|
||||||
|
{ORTH: "น.ส.๓ ก", LEMMA: "หนังสือแสดงกรรมสิทธิ์ในที่ดิน (มีระวางกำหนด)"}
|
||||||
|
],
|
||||||
"นสพ.": [{ORTH: "นสพ.", LEMMA: "หนังสือพิมพ์"}],
|
"นสพ.": [{ORTH: "นสพ.", LEMMA: "หนังสือพิมพ์"}],
|
||||||
"บ.ก.": [{ORTH: "บ.ก.", LEMMA: "บรรณาธิการ"}],
|
"บ.ก.": [{ORTH: "บ.ก.", LEMMA: "บรรณาธิการ"}],
|
||||||
"บจก.": [{ORTH: "บจก.", LEMMA: "บริษัทจำกัด"}],
|
"บจก.": [{ORTH: "บจก.", LEMMA: "บริษัทจำกัด"}],
|
||||||
|
@ -410,7 +434,12 @@ _exc = {
|
||||||
"พขร.": [{ORTH: "พขร.", LEMMA: "พนักงานขับรถ"}],
|
"พขร.": [{ORTH: "พขร.", LEMMA: "พนักงานขับรถ"}],
|
||||||
"ภ.ง.ด.": [{ORTH: "ภ.ง.ด.", LEMMA: "ภาษีเงินได้"}],
|
"ภ.ง.ด.": [{ORTH: "ภ.ง.ด.", LEMMA: "ภาษีเงินได้"}],
|
||||||
"ภ.ง.ด.๙": [{ORTH: "ภ.ง.ด.๙", LEMMA: "แบบแสดงรายการเสียภาษีเงินได้ของกรมสรรพากร"}],
|
"ภ.ง.ด.๙": [{ORTH: "ภ.ง.ด.๙", LEMMA: "แบบแสดงรายการเสียภาษีเงินได้ของกรมสรรพากร"}],
|
||||||
"ภ.ป.ร.": [{ORTH: "ภ.ป.ร.", LEMMA: "ภูมิพลอดุยเดช ปรมราชาธิราช (พระปรมาภิไธยในพระบาทสมเด็จพระปรมินทรมหาภูมิพลอดุลยเดช)"}],
|
"ภ.ป.ร.": [
|
||||||
|
{
|
||||||
|
ORTH: "ภ.ป.ร.",
|
||||||
|
LEMMA: "ภูมิพลอดุยเดช ปรมราชาธิราช (พระปรมาภิไธยในพระบาทสมเด็จพระปรมินทรมหาภูมิพลอดุลยเดช)",
|
||||||
|
}
|
||||||
|
],
|
||||||
"ภ.พ.": [{ORTH: "ภ.พ.", LEMMA: "ภาษีมูลค่าเพิ่ม"}],
|
"ภ.พ.": [{ORTH: "ภ.พ.", LEMMA: "ภาษีมูลค่าเพิ่ม"}],
|
||||||
"ร.": [{ORTH: "ร.", LEMMA: "รัชกาล"}],
|
"ร.": [{ORTH: "ร.", LEMMA: "รัชกาล"}],
|
||||||
"ร.ง.": [{ORTH: "ร.ง.", LEMMA: "โรงงาน"}],
|
"ร.ง.": [{ORTH: "ร.ง.", LEMMA: "โรงงาน"}],
|
||||||
|
@ -438,7 +467,6 @@ _exc = {
|
||||||
"เสธ.": [{ORTH: "เสธ.", LEMMA: "เสนาธิการ"}],
|
"เสธ.": [{ORTH: "เสธ.", LEMMA: "เสนาธิการ"}],
|
||||||
"หจก.": [{ORTH: "หจก.", LEMMA: "ห้างหุ้นส่วนจำกัด"}],
|
"หจก.": [{ORTH: "หจก.", LEMMA: "ห้างหุ้นส่วนจำกัด"}],
|
||||||
"ห.ร.ม.": [{ORTH: "ห.ร.ม.", LEMMA: "ตัวหารร่วมมาก"}],
|
"ห.ร.ม.": [{ORTH: "ห.ร.ม.", LEMMA: "ตัวหารร่วมมาก"}],
|
||||||
|
|
||||||
}
|
}
|
||||||
|
|
||||||
|
|
||||||
|
|
|
@@ -333,6 +333,11 @@ class Language(object):
        """
        if name not in self.pipe_names:
            raise ValueError(Errors.E001.format(name=name, opts=self.pipe_names))
+        if not hasattr(component, "__call__"):
+            msg = Errors.E003.format(component=repr(component), name=name)
+            if isinstance(component, basestring_) and component in self.factories:
+                msg += Errors.E135.format(name=name)
+            raise ValueError(msg)
        self.pipeline[self.pipe_names.index(name)] = (name, component)

    def rename_pipe(self, old_name, new_name):
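The added check makes `replace_pipe` fail fast when the replacement is not callable, instead of breaking later when the pipeline runs; E135 additionally hints at the case where a factory name was passed as a string. A small usage sketch (assumes a blank English pipeline and a dummy component; not part of the diff):

    from spacy.lang.en import English

    def my_component(doc):
        return doc

    nlp = English()
    nlp.add_pipe(my_component, name="my_component")
    try:
        nlp.replace_pipe("my_component", {})        # not callable: now rejected up front
    except ValueError as err:
        print("rejected:", err)
    nlp.replace_pipe("my_component", my_component)  # a callable is accepted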
@@ -412,7 +417,9 @@ class Language(object):
        golds (iterable): A batch of `GoldParse` objects.
        drop (float): The droput rate.
        sgd (callable): An optimizer.
-        RETURNS (dict): Results from the update.
+        losses (dict): Dictionary to update with the loss, keyed by component.
+        component_cfg (dict): Config parameters for specific pipeline
+            components, keyed by component name.

        DOCS: https://spacy.io/api/language#update
        """
@@ -593,6 +600,19 @@ class Language(object):
    def evaluate(
        self, docs_golds, verbose=False, batch_size=256, scorer=None, component_cfg=None
    ):
+        """Evaluate a model's pipeline components.
+
+        docs_golds (iterable): Tuples of `Doc` and `GoldParse` objects.
+        verbose (bool): Print debugging information.
+        batch_size (int): Batch size to use.
+        scorer (Scorer): Optional `Scorer` to use. If not passed in, a new one
+            will be created.
+        component_cfg (dict): An optional dictionary with extra keyword
+            arguments for specific components.
+        RETURNS (Scorer): The scorer containing the evaluation results.
+
+        DOCS: https://spacy.io/api/language#evaluate
+        """
        if scorer is None:
            scorer = Scorer()
        if component_cfg is None:
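To make the documented `evaluate` signature concrete, here is a rough sketch that builds the `(Doc, GoldParse)` pairs by hand and reads the returned `Scorer`; the blank pipeline and the single toy example are illustrative assumptions, not part of the diff.

    # Rough sketch of the documented signature, assuming a spaCy v2-style GoldParse.
    from spacy.lang.en import English
    from spacy.gold import GoldParse

    nlp = English()                       # blank pipeline: only tokenization is scored
    doc = nlp.make_doc("I like London")
    gold = GoldParse(doc, words=["I", "like", "London"])
    scorer = nlp.evaluate([(doc, gold)])  # RETURNS (Scorer)
    print(scorer.token_acc)
    print(scorer.scores)                  # dict of uas, las, ents_*, tags_acc, token_acc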
@@ -1,5 +1,6 @@
# coding: utf8
from __future__ import unicode_literals
+from collections import OrderedDict

from .symbols import POS, NOUN, VERB, ADJ, PUNCT, PROPN
from .symbols import VerbForm_inf, VerbForm_none, Number_sing, Degree_pos

@@ -118,8 +119,8 @@ def lemmatize(string, index, exceptions, rules):
            forms.append(form)
        else:
            oov_forms.append(form)
-    # Remove duplicates, and sort forms generated by rules alphabetically.
-    forms = list(set(forms))
+    # Remove duplicates but preserve the ordering of applied "rules"
+    forms = list(OrderedDict.fromkeys(forms))
    # Put exceptions at the front of the list, so they get priority.
    # This is a dodgy heuristic -- but it's the best we can do until we get
    # frequencies on this. We can at least prune out problematic exceptions,
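The `set()` to `OrderedDict.fromkeys()` switch is purely about determinism: both drop duplicate candidate forms, but only the latter keeps them in the order the rules produced them, so the lemmatizer's eventual choice no longer depends on arbitrary set ordering. A standalone illustration with made-up candidate forms:

    from collections import OrderedDict

    forms = ["ponies", "poni", "pony", "poni"]  # hypothetical rule outputs, in rule order
    print(list(OrderedDict.fromkeys(forms)))    # ['ponies', 'poni', 'pony'] - first-seen order kept
    # list(set(forms)) would also deduplicate, but in an arbitrary order,
    # which made the selected lemma non-deterministic across runs.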
@@ -48,7 +48,10 @@ cdef class Matcher:
        self._extra_predicates = []
        self.vocab = vocab
        self.mem = Pool()
-        self.validator = get_json_validator(TOKEN_PATTERN_SCHEMA) if validate else None
+        if validate:
+            self.validator = get_json_validator(TOKEN_PATTERN_SCHEMA)
+        else:
+            self.validator = None

    def __reduce__(self):
        data = (self.vocab, self._patterns, self._callbacks)

@@ -105,7 +108,7 @@ cdef class Matcher:
                raise ValueError(Errors.E012.format(key=key))
            if self.validator:
                errors[i] = validate_json(pattern, self.validator)
-        if errors:
+        if any(err for err in errors.values()):
            raise MatchPatternError(key, errors)
        key = self._normalize_key(key)
        for pattern in patterns:
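The `if errors:` to `if any(...)` change matters because the validator records an entry per pattern even when that entry is an empty list, and a non-empty dict is always truthy. A short plain-Python illustration of the difference (mirroring the fix, not the Cython internals):

    errors = {0: [], 1: []}                     # validation ran but found nothing wrong
    print(bool(errors))                         # True: the old check would raise anyway
    print(any(err for err in errors.values()))  # False: the new check lets the patterns through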
@@ -127,7 +127,7 @@ cdef class PhraseMatcher:
                and self.attr not in (DEP, POS, TAG, LEMMA):
            string_attr = self.vocab.strings[self.attr]
            user_warning(Warnings.W012.format(key=key, attr=string_attr))
-        tags = get_bilou(length)
+        tags = get_biluo(length)
        phrase_key = <attr_t*>mem.alloc(length, sizeof(attr_t))
        for i, tag in enumerate(tags):
            attr_value = self.get_lex_value(doc, i)

@@ -230,7 +230,7 @@ cdef class PhraseMatcher:
        return "matcher:{}-{}".format(string_attr_name, string_attr_value)


-def get_bilou(length):
+def get_biluo(length):
    if length == 0:
        raise ValueError(Errors.E127)
    elif length == 1:
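`get_biluo` (previously misspelled `get_bilou`) produces the BILUO-style tag sequence for a phrase of a given length: a one-token phrase is a unit, longer phrases get beginning, inside and last tags. Since the full body lies outside this hunk, the sketch below is a plain-Python approximation of that contract, not the Cython implementation.

    def biluo_tags(length):
        # Approximation of the helper's contract: U for single tokens,
        # B ... I ... L for multi-token phrases; length 0 is an error (Errors.E127).
        if length == 0:
            raise ValueError("phrase must contain at least one token")
        if length == 1:
            return ["U"]
        return ["B"] + ["I"] * (length - 2) + ["L"]

    print(biluo_tags(1))  # ['U']
    print(biluo_tags(3))  # ['B', 'I', 'L']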
@@ -109,6 +109,7 @@ cdef class Morphology:
        analysis.tag = rich_tag
        analysis.lemma = self.lemmatize(analysis.tag.pos, token.lex.orth,
                                        self.tag_map.get(tag_str, {}))
+
        self._cache.set(tag_id, token.lex.orth, analysis)
        if token.lemma == 0:
            token.lemma = analysis.lemma

@@ -140,7 +141,7 @@ cdef class Morphology:
        if tag not in self.reverse_index:
            return
        tag_id = self.reverse_index[tag]
-        orth = self.strings[orth_str]
+        orth = self.strings.add(orth_str)
        cdef RichTagC rich_tag = self.rich_tags[tag_id]
        attrs = intify_attrs(attrs, self.strings, _do_deprecated=True)
        cached = <MorphAnalysisC*>self._cache.get(tag_id, orth)
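Switching from a plain lookup to `self.strings.add(orth_str)` guarantees the string is interned in the `StringStore`, so the hash that ends up in the morphology cache can later be resolved back to text. In user-facing terms (a hedged sketch of the spaCy v2-style API, with an illustrative value):

    from spacy.strings import StringStore

    strings = StringStore()
    key = strings.add("zwemmen")  # interns the string and returns its 64-bit hash
    print(strings[key])           # "zwemmen" - reverse lookup works because it was added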
@@ -35,7 +35,17 @@ class PRFScore(object):


class Scorer(object):
+    """Compute evaluation scores."""
+
    def __init__(self, eval_punct=False):
+        """Initialize the Scorer.
+
+        eval_punct (bool): Evaluate the dependency attachments to and from
+            punctuation.
+        RETURNS (Scorer): The newly created object.
+
+        DOCS: https://spacy.io/api/scorer#init
+        """
        self.tokens = PRFScore()
        self.sbd = PRFScore()
        self.unlabelled = PRFScore()

@@ -46,34 +56,46 @@ class Scorer(object):

    @property
    def tags_acc(self):
+        """RETURNS (float): Part-of-speech tag accuracy (fine grained tags,
+        i.e. `Token.tag`).
+        """
        return self.tags.fscore * 100

    @property
    def token_acc(self):
+        """RETURNS (float): Tokenization accuracy."""
        return self.tokens.precision * 100

    @property
    def uas(self):
+        """RETURNS (float): Unlabelled dependency score."""
        return self.unlabelled.fscore * 100

    @property
    def las(self):
+        """RETURNS (float): Labelled depdendency score."""
        return self.labelled.fscore * 100

    @property
    def ents_p(self):
+        """RETURNS (float): Named entity accuracy (precision)."""
        return self.ner.precision * 100

    @property
    def ents_r(self):
+        """RETURNS (float): Named entity accuracy (recall)."""
        return self.ner.recall * 100

    @property
    def ents_f(self):
+        """RETURNS (float): Named entity accuracy (F-score)."""
        return self.ner.fscore * 100

    @property
    def scores(self):
+        """RETURNS (dict): All scores with keys `uas`, `las`, `ents_p`,
+        `ents_r`, `ents_f`, `tags_acc` and `token_acc`.
+        """
        return {
            "uas": self.uas,
            "las": self.las,
@@ -84,9 +106,20 @@ class Scorer(object):
            "token_acc": self.token_acc,
        }

-    def score(self, tokens, gold, verbose=False, punct_labels=("p", "punct")):
-        if len(tokens) != len(gold):
-            gold = GoldParse.from_annot_tuples(tokens, zip(*gold.orig_annot))
+    def score(self, doc, gold, verbose=False, punct_labels=("p", "punct")):
+        """Update the evaluation scores from a single Doc / GoldParse pair.
+
+        doc (Doc): The predicted annotations.
+        gold (GoldParse): The correct annotations.
+        verbose (bool): Print debugging information.
+        punct_labels (tuple): Dependency labels for punctuation. Used to
+            evaluate dependency attachments to punctuation if `eval_punct` is
+            `True`.
+
+        DOCS: https://spacy.io/api/scorer#score
+        """
+        if len(doc) != len(gold):
+            gold = GoldParse.from_annot_tuples(doc, zip(*gold.orig_annot))
        gold_deps = set()
        gold_tags = set()
        gold_ents = set(tags_to_entities([annot[-1] for annot in gold.orig_annot]))

@@ -96,7 +129,7 @@ class Scorer(object):
            gold_deps.add((id_, head, dep.lower()))
        cand_deps = set()
        cand_tags = set()
-        for token in tokens:
+        for token in doc:
            if token.orth_.isspace():
                continue
            gold_i = gold.cand_to_gold[token.i]

@@ -116,7 +149,7 @@ class Scorer(object):
            cand_deps.add((gold_i, gold_head, token.dep_.lower()))
        if "-" not in [token[-1] for token in gold.orig_annot]:
            cand_ents = set()
-            for ent in tokens.ents:
+            for ent in doc.ents:
                first = gold.cand_to_gold[ent.start]
                last = gold.cand_to_gold[ent.end - 1]
                if first is None or last is None:
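Tying the new Scorer docstrings together: a `Scorer` starts out empty and exposes the documented score properties immediately; it is normally filled by `Language.evaluate` (see the sketch after the evaluate hunk above) or by calling `score()` per `Doc`/`GoldParse` pair. A minimal sketch of the documented surface:

    from spacy.scorer import Scorer

    scorer = Scorer()           # eval_punct=False by default, per the new docstring
    print(sorted(scorer.scores))
    # ['ents_f', 'ents_p', 'ents_r', 'las', 'tags_acc', 'token_acc', 'uas']
    print(scorer.token_acc)     # 0.0 until score() has seen Doc / GoldParse pairs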
@@ -6,6 +6,7 @@ from spacy.attrs import ORTH, LENGTH
from spacy.tokens import Doc, Span
from spacy.vocab import Vocab
from spacy.errors import ModelsWarning
+from spacy.util import filter_spans

from ..util import get_doc

@@ -219,3 +220,21 @@ def test_span_ents_property(doc):
    assert sentences[2].ents[0].label_ == "PRODUCT"
    assert sentences[2].ents[0].start == 11
    assert sentences[2].ents[0].end == 14
+
+
+def test_filter_spans(doc):
+    # Test filtering duplicates
+    spans = [doc[1:4], doc[6:8], doc[1:4], doc[10:14]]
+    filtered = filter_spans(spans)
+    assert len(filtered) == 3
+    assert filtered[0].start == 1 and filtered[0].end == 4
+    assert filtered[1].start == 6 and filtered[1].end == 8
+    assert filtered[2].start == 10 and filtered[2].end == 14
+    # Test filtering overlaps with longest preference
+    spans = [doc[1:4], doc[1:3], doc[5:10], doc[7:9], doc[1:4]]
+    filtered = filter_spans(spans)
+    assert len(filtered) == 2
+    assert len(filtered[0]) == 3
+    assert len(filtered[1]) == 5
+    assert filtered[0].start == 1 and filtered[0].end == 4
+    assert filtered[1].start == 5 and filtered[1].end == 10
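The new test exercises `spacy.util.filter_spans`, which removes duplicate spans and resolves overlaps by preferring longer spans. For intuition, here is a rough pure-Python sketch of that selection rule over (start, end) pairs; the tie-breaking choice below (earliest start) is an assumption of the sketch, not a claim about the library internals.

    def keep_longest_spans(spans):
        # Keep longest spans first (ties: earliest start) and drop anything
        # that overlaps a span already kept. Mirrors what the test asserts.
        result, seen = [], set()
        for start, end in sorted(spans, key=lambda s: (s[0] - s[1], s[0])):
            if not any(i in seen for i in range(start, end)):
                result.append((start, end))
                seen.update(range(start, end))
        return sorted(result)

    print(keep_longest_spans([(1, 4), (6, 8), (1, 4), (10, 14)]))       # [(1, 4), (6, 8), (10, 14)]
    print(keep_longest_spans([(1, 4), (1, 3), (5, 10), (7, 9), (1, 4)]))  # [(1, 4), (5, 10)]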
@@ -140,3 +140,28 @@ def test_underscore_mutable_defaults_dict(en_vocab):
    assert len(token1._.mutable) == 2
    assert token1._.mutable["x"] == ["y"]
    assert len(token2._.mutable) == 0
+
+
+def test_underscore_dir(en_vocab):
+    """Test that dir() correctly returns extension attributes. This enables
+    things like tab-completion for the attributes in doc._."""
+    Doc.set_extension("test_dir", default=None)
+    doc = Doc(en_vocab, words=["hello", "world"])
+    assert "_" in dir(doc)
+    assert "test_dir" in dir(doc._)
+    assert "test_dir" not in dir(doc[0]._)
+    assert "test_dir" not in dir(doc[0:2]._)
+
+
+def test_underscore_docstring(en_vocab):
+    """Test that docstrings are available for extension methods, even though
+    they're partials."""
+
+    def test_method(doc, arg1=1, arg2=2):
+        """I am a docstring"""
+        return (arg1, arg2)
+
+    Doc.set_extension("test_docstrings", method=test_method)
+    doc = Doc(en_vocab, words=["hello", "world"])
+    assert test_method.__doc__ == "I am a docstring"
+    assert doc._.test_docstrings.__doc__.rsplit(". ")[-1] == "I am a docstring"
@@ -52,11 +52,13 @@ def test_get_pipe(nlp, name):
     assert nlp.get_pipe(name) == new_pipe


-@pytest.mark.parametrize("name,replacement", [("my_component", lambda doc: doc)])
-def test_replace_pipe(nlp, name, replacement):
+@pytest.mark.parametrize("name,replacement,not_callable", [("my_component", lambda doc: doc, {})])
+def test_replace_pipe(nlp, name, replacement, not_callable):
     with pytest.raises(ValueError):
         nlp.replace_pipe(name, new_pipe)
     nlp.add_pipe(new_pipe, name=name)
+    with pytest.raises(ValueError):
+        nlp.replace_pipe(name, not_callable)
     nlp.replace_pipe(name, replacement)
     assert nlp.get_pipe(name) != new_pipe
     assert nlp.get_pipe(name) == replacement

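For reference, a minimal sketch of the behaviour this extended test pins down, outside the fixture setup; the component name is illustrative, and the `ValueError` for non-callable replacements is exactly what the new assertion checks:

```python
import spacy

nlp = spacy.blank("en")
nlp.add_pipe(lambda doc: doc, name="my_component")

# Replacing a pipe with something that isn't callable should be rejected
try:
    nlp.replace_pipe("my_component", {})
except ValueError:
    print("non-callable replacement rejected")

# Replacing it with a callable swaps the component in place
nlp.replace_pipe("my_component", lambda doc: doc)
print(nlp.pipe_names)  # ['my_component']
```
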
@@ -6,20 +6,16 @@ import pytest
 from spacy.lang.en import English


-@pytest.mark.xfail(reason="Current default suffix rules avoid one upper-case letter before a dot.")
+@pytest.mark.xfail(reason="default suffix rules avoid one upper-case letter before dot")
 def test_issue3449():
     nlp = English()
-    nlp.add_pipe(nlp.create_pipe('sentencizer'))
+    nlp.add_pipe(nlp.create_pipe("sentencizer"))

     text1 = "He gave the ball to I. Do you want to go to the movies with I?"
     text2 = "He gave the ball to I. Do you want to go to the movies with I?"
     text3 = "He gave the ball to I.\nDo you want to go to the movies with I?"

     t1 = nlp(text1)
     t2 = nlp(text2)
     t3 = nlp(text3)
-    assert t1[5].text == 'I'
-    assert t2[5].text == 'I'
-    assert t3[5].text == 'I'
+    assert t1[5].text == "I"
+    assert t2[5].text == "I"
+    assert t3[5].text == "I"

15
spacy/tests/regression/test_issue3549.py
Normal file

@@ -0,0 +1,15 @@
# coding: utf8
from __future__ import unicode_literals

import pytest
from spacy.matcher import Matcher
from spacy.errors import MatchPatternError


def test_issue3549(en_vocab):
    """Test that match pattern validation doesn't raise on empty errors."""
    matcher = Matcher(en_vocab, validate=True)
    pattern = [{"LOWER": "hello"}, {"LOWER": "world"}]
    matcher.add("GOOD", None, pattern)
    with pytest.raises(MatchPatternError):
        matcher.add("BAD", None, [{"X": "Y"}])

17
spacy/tests/regression/test_issue3555.py
Normal file

@@ -0,0 +1,17 @@
# coding: utf8
from __future__ import unicode_literals

import pytest
from spacy.tokens import Doc, Token
from spacy.matcher import Matcher


@pytest.mark.xfail
def test_issue3555(en_vocab):
    """Test that custom extensions with default None don't break matcher."""
    Token.set_extension("issue3555", default=None)
    matcher = Matcher(en_vocab)
    pattern = [{"LEMMA": "have"}, {"_": {"issue3555": True}}]
    matcher.add("TEST", None, pattern)
    doc = Doc(en_vocab, words=["have", "apple"])
    matcher(doc)

15
spacy/tests/regression/test_issue3803.py
Normal file

@@ -0,0 +1,15 @@
# coding: utf8
from __future__ import unicode_literals

import pytest

from spacy.lang.es import Spanish


def test_issue3803():
    """Test that spanish num-like tokens have True for like_num attribute."""
    nlp = Spanish()
    text = "2 dos 1000 mil 12 doce"
    doc = nlp(text)

    assert [t.like_num for t in doc] == [True, True, True, True, True, True]

@@ -3,11 +3,13 @@ from __future__ import unicode_literals

 import pytest
 import os
+import ctypes
 from pathlib import Path
 from spacy import util
 from spacy import prefer_gpu, require_gpu
-from spacy.compat import symlink_to, symlink_remove, path2str
+from spacy.compat import symlink_to, symlink_remove, path2str, is_windows
 from spacy._ml import PrecomputableAffine
+from subprocess import CalledProcessError


 @pytest.fixture

@@ -28,12 +30,25 @@ def symlink_setup_target(request, symlink_target, symlink):
     # https://github.com/pytest-dev/pytest/issues/2508#issuecomment-309934240

     def cleanup():
-        symlink_remove(symlink)
+        # Remove symlink only if it was created
+        if symlink.exists():
+            symlink_remove(symlink)
         os.rmdir(path2str(symlink_target))

     request.addfinalizer(cleanup)


+@pytest.fixture
+def is_admin():
+    """Determine if the tests are run as admin or not."""
+    try:
+        admin = os.getuid() == 0
+    except AttributeError:
+        admin = ctypes.windll.shell32.IsUserAnAdmin() != 0
+
+    return admin
+
+
 @pytest.mark.parametrize("text", ["hello/world", "hello world"])
 def test_util_ensure_path_succeeds(text):
     path = util.ensure_path(text)

@@ -88,7 +103,20 @@ def test_require_gpu():
         require_gpu()


-def test_create_symlink_windows(symlink_setup_target, symlink_target, symlink):
+def test_create_symlink_windows(
+    symlink_setup_target, symlink_target, symlink, is_admin
+):
+    """Test the creation of symlinks on windows. If run as admin or not on windows it should succeed, otherwise a CalledProcessError should be raised."""
     assert symlink_target.exists()
-    symlink_to(symlink, symlink_target)
-    assert symlink.exists()
+
+    if is_admin or not is_windows:
+        try:
+            symlink_to(symlink, symlink_target)
+            assert symlink.exists()
+        except CalledProcessError as e:
+            pytest.fail(e)
+    else:
+        with pytest.raises(CalledProcessError):
+            symlink_to(symlink, symlink_target)
+
+        assert not symlink.exists()

@@ -25,6 +25,11 @@ class Underscore(object):
         object.__setattr__(self, "_start", start)
         object.__setattr__(self, "_end", end)

+    def __dir__(self):
+        # Hack to enable autocomplete on custom extensions
+        extensions = list(self._extensions.keys())
+        return ["set", "get", "has"] + extensions
+
     def __getattr__(self, name):
         if name not in self._extensions:
             raise AttributeError(Errors.E046.format(name=name))

@@ -32,7 +37,16 @@ class Underscore(object):
         if getter is not None:
             return getter(self._obj)
         elif method is not None:
-            return functools.partial(method, self._obj)
+            method_partial = functools.partial(method, self._obj)
+            # Hack to port over docstrings of the original function
+            # See https://stackoverflow.com/q/27362727/6400719
+            method_docstring = method.__doc__ or ""
+            method_docstring_prefix = (
+                "This method is a partial function and its first argument "
+                "(the object it's called on) will be filled automatically. "
+            )
+            method_partial.__doc__ = method_docstring_prefix + method_docstring
+            return method_partial
         else:
             key = self._get_key(name)
             if key in self._doc.user_data:

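The docstring hack above is ordinary `functools.partial` plus an explicit `__doc__` assignment, which partial objects allow; a standalone sketch with made-up names, not spaCy API:

```python
import functools


def greet(obj, name):
    """Return a greeting from obj to name."""
    return "{} says hi to {}".format(obj, name)


# Bind the first argument, then copy the original docstring onto the partial,
# prefixed with a note explaining that the first argument is pre-filled.
bound = functools.partial(greet, "spaCy")
prefix = "This method is a partial function and its first argument is pre-filled. "
bound.__doc__ = prefix + (greet.__doc__ or "")

print(bound("Ines"))   # "spaCy says hi to Ines"
print(bound.__doc__)   # prefixed docstring, picked up by help() and IDE tooltips
```
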
@@ -14,8 +14,11 @@ import functools
 import itertools
 import numpy.random
 import srsly
-from jsonschema import Draft4Validator

+try:
+    import jsonschema
+except ImportError:
+    jsonschema = None

 try:
     import cupy.random

@@ -510,7 +513,7 @@ def decaying(start, stop, decay):
     curr = float(start)
     while True:
         yield max(curr, stop)
-        curr -= (decay)
+        curr -= decay


 def minibatch_by_words(items, size, tuples=True, count_words=len):

@@ -571,6 +574,28 @@ def itershuffle(iterable, bufsize=1000):
         raise StopIteration


+def filter_spans(spans):
+    """Filter a sequence of spans and remove duplicates or overlaps. Useful for
+    creating named entities (where one token can only be part of one entity) or
+    when merging spans with `Retokenizer.merge`. When spans overlap, the (first)
+    longest span is preferred over shorter spans.
+
+    spans (iterable): The spans to filter.
+    RETURNS (list): The filtered spans.
+    """
+    get_sort_key = lambda span: (span.end - span.start, span.start)
+    sorted_spans = sorted(spans, key=get_sort_key, reverse=True)
+    result = []
+    seen_tokens = set()
+    for span in sorted_spans:
+        # Check for end - 1 here because boundaries are inclusive
+        if span.start not in seen_tokens and span.end - 1 not in seen_tokens:
+            result.append(span)
+            seen_tokens.update(range(span.start, span.end))
+    result = sorted(result, key=lambda span: span.start)
+    return result
+
+
 def to_bytes(getters, exclude):
     serialized = OrderedDict()
     for key, getter in getters.items():

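A quick usage sketch for the new helper (requires a spaCy build that includes `spacy.util.filter_spans`; the sentence and span offsets are just an illustration):

```python
import spacy
from spacy.util import filter_spans

nlp = spacy.blank("en")
doc = nlp("This is a sentence about New York City in winter")

# Duplicates and overlaps: the (first) longest span wins, the rest are dropped
spans = [doc[3:7], doc[5:8], doc[3:7], doc[0:2]]
filtered = filter_spans(spans)
print([(span.start, span.end, span.text) for span in filtered])
# [(0, 2, 'This is'), (3, 7, 'sentence about New York')]
```
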
@@ -660,7 +685,9 @@ def get_json_validator(schema):
     # validator that's used (e.g. different draft implementation), without
     # having to change it all across the codebase.
     # TODO: replace with (stable) Draft6Validator, if available
-    return Draft4Validator(schema)
+    if jsonschema is None:
+        raise ValueError(Errors.E136)
+    return jsonschema.Draft4Validator(schema)


 def validate_schema(schema):

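The returned object is plain `jsonschema` usage; a minimal sketch of what `Draft4Validator` does, using a toy schema rather than spaCy's real match-pattern schema:

```python
import jsonschema  # optional dependency, hence the try/except guard above

schema = {
    "type": "array",
    "items": {
        "type": "object",
        "properties": {"LOWER": {"type": "string"}},
        "additionalProperties": False,
    },
}
validator = jsonschema.Draft4Validator(schema)

print(validator.is_valid([{"LOWER": "hello"}]))                  # True
print([e.message for e in validator.iter_errors([{"X": "Y"}])])  # schema violations
```
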
@@ -457,7 +457,7 @@ sit amet dignissim justo congue.
 ## Setup and installation {#setup}

 Before running the setup, make sure your versions of
-[Node](https://nodejs.org/en/) and [npm](https://www.npmjs.com/) are up to date.
+[Node](https://nodejs.org/en/) and [npm](https://www.npmjs.com/) are up to date. Node v10.15 or later is required.

 ```bash
 # Clone the repository

94
website/UNIVERSE.md
Normal file

@@ -0,0 +1,94 @@
<a href="https://explosion.ai"><img src="https://explosion.ai/assets/img/logo.svg" width="125" height="125" align="right" /></a>

# spaCy Universe

The [spaCy Universe](https://spacy.io/universe) collects the many great resources developed with or for spaCy. It
includes standalone packages, plugins, extensions, educational materials,
operational utilities and bindings for other languages.

If you have a project that you want the spaCy community to make use of, you can
suggest it by submitting a pull request to this repository. The Universe
database is open-source and collected in a simple JSON file.

Looking for inspiration for your own spaCy plugin or extension? Check out the
[`project idea`](https://github.com/explosion/spaCy/labels/project%20idea) label
on the issue tracker.

## Checklist

### Projects

✅ Libraries and packages should be **open-source** (with a user-friendly license) and at least somewhat **documented** (e.g. a simple `README` with usage instructions).

✅ We're happy to include work in progress and prereleases, but we'd like to keep the emphasis on projects that should be useful to the community **right away**.

✅ Demos and visualizers should be available via a **public URL**.

### Educational Materials

✅ Books should be **available for purchase or download** (not just pre-order). Ebooks and self-published books are fine, too, if they include enough substantial content.

✅ The `"url"` of book entries should either point to the publisher's website or a reseller of your choice (ideally one that ships worldwide or as close as possible).

✅ If an online course is only available behind a paywall, it should at least have a **free excerpt** or chapter available, so users know what to expect.

## JSON format

To add a project, fork this repository, edit the [`universe.json`](meta/universe.json)
and add an object of the following format to the list of `"resources"`. Before
you submit your pull request, make sure to use a linter to verify that your
markup is correct.

```json
{
    "id": "unique-project-id",
    "title": "Project title",
    "slogan": "A short summary",
    "description": "A longer description – *Markdown allowed!*",
    "github": "user/repo",
    "pip": "package-name",
    "code_example": [
        "import spacy",
        "import package_name",
        "",
        "nlp = spacy.load('en')",
        "nlp.add_pipe(package_name)"
    ],
    "code_language": "python",
    "url": "https://example.com",
    "thumb": "https://example.com/thumb.jpg",
    "image": "https://example.com/image.jpg",
    "author": "Your Name",
    "author_links": {
        "twitter": "username",
        "github": "username",
        "website": "https://example.com"
    },
    "category": ["pipeline", "standalone"],
    "tags": ["some-tag", "etc"]
}
```

| Field | Type | Description |
| --- | --- | --- |
| `id` | string | Unique ID of the project. |
| `title` | string | Project title. If not set, the `id` will be used as the display title. |
| `slogan` | string | A short description of the project. Displayed in the overview and under the title. |
| `description` | string | A longer description of the project. Markdown is allowed, but should be limited to basic formatting like bold, italics, code or links. |
| `github` | string | Associated GitHub repo in the format `user/repo`. Will be displayed as a link and used for release, license and star badges. |
| `pip` | string | Package name on pip. If available, the installation command will be displayed. |
| `cran` | string | For R packages: package name on CRAN. If available, the installation command will be displayed. |
| `code_example` | array | Short example that shows how to use the project. Formatted as an array with one string per line. |
| `code_language` | string | Defaults to `'python'`. Optional code language used for syntax highlighting with [Prism](http://prismjs.com/). |
| `url` | string | Optional project link to display as button. |
| `thumb` | string | Optional URL to project thumbnail to display in overview and project header. Recommended size is 100x100px. |
| `image` | string | Optional URL to project image to display with description. |
| `author` | string | Name(s) of project author(s). |
| `author_links` | object | Usernames and links to display as icons to author info. Currently supports `twitter` and `github` usernames, as well as `website` link. |
| `category` | list | One or more categories to assign to project. Must be one of the available options. |
| `tags` | list | Still experimental and not used for filtering: one or more tags to assign to project. |

To separate them from the projects, educational materials also specify
`"type": "education"`. Books can also set a `"cover"` field containing a URL
to a cover image. If available, it's used in the overview and displayed on
the individual book page.

@@ -510,7 +510,7 @@ described in any single publication. The model is a greedy transition-based
 parser guided by a linear model whose weights are learned using the averaged
 perceptron loss, via the
 [dynamic oracle](http://www.aclweb.org/anthology/C12-1059) imitation learning
-strategy. The transition system is equivalent to the BILOU tagging scheme.
+strategy. The transition system is equivalent to the BILUO tagging scheme.

 ## Models and training data {#training}

@@ -189,7 +189,7 @@ using the [`package`](/api/cli#package) command.

 <Infobox title="Changed in v2.1" variant="warning">

-As of spaCy 2.1, the `--no-tagger`, `--no-parser` and `--no-parser` flags have
+As of spaCy 2.1, the `--no-tagger`, `--no-parser` and `--no-entities` flags have
 been replaced by a `--pipeline` option, which lets you define comma-separated
 names of pipeline components to train. For example, `--pipeline tagger,parser`
 will only train the tagger and parser.

@@ -198,7 +198,7 @@ will only train the tagger and parser.

 ```bash
 $ python -m spacy train [lang] [output_path] [train_path] [dev_path]
-[--base-model] [--pipeline] [--vectors] [--n-iter] [--n-examples] [--use-gpu]
+[--base-model] [--pipeline] [--vectors] [--n-iter] [--n-early-stopping] [--n-examples] [--use-gpu]
 [--version] [--meta-path] [--init-tok2vec] [--parser-multitasks]
 [--entity-multitasks] [--gold-preproc] [--noise-level] [--learn-tokens]
 [--verbose]

@@ -210,10 +210,11 @@ $ python -m spacy train [lang] [output_path] [train_path] [dev_path]
 | `output_path` | positional | Directory to store model in. Will be created if it doesn't exist. |
 | `train_path` | positional | Location of JSON-formatted training data. Can be a file or a directory of files. |
 | `dev_path` | positional | Location of JSON-formatted development data for evaluation. Can be a file or a directory of files. |
-| `--base-model`, `-b` | option | Optional name of base model to update. Can be any loadable spaCy model. |
+| `--base-model`, `-b` <Tag variant="new">2.1</Tag> | option | Optional name of base model to update. Can be any loadable spaCy model. |
 | `--pipeline`, `-p` <Tag variant="new">2.1</Tag> | option | Comma-separated names of pipeline components to train. Defaults to `'tagger,parser,ner'`. |
 | `--vectors`, `-v` | option | Model to load vectors from. |
 | `--n-iter`, `-n` | option | Number of iterations (default: `30`). |
+| `--n-early-stopping`, `-ne` | option | Maximum number of training epochs without dev accuracy improvement. |
 | `--n-examples`, `-ns` | option | Number of examples to use (defaults to `0` for all examples). |
 | `--use-gpu`, `-g` | option | Whether to use GPU. Can be either `0`, `1` or `-1`. |
 | `--version`, `-V` | option | Model version. Will be written out to the model's `meta.json` after training. |

@@ -274,7 +275,7 @@ an approximate language-modeling objective. Specifically, we load pre-trained
 vectors, and train a component like a CNN, BiLSTM, etc to predict vectors which
 match the pre-trained ones. The weights are saved to a directory after each
 epoch. You can then pass a path to one of these pre-trained weights files to the
-'spacy train' command.
+`spacy train` command.

 This technique may be especially helpful if you have little labelled data.
 However, it's still quite experimental, so your mileage may vary. To load the

@@ -285,24 +286,26 @@ improvement.

 ```bash
 $ python -m spacy pretrain [texts_loc] [vectors_model] [output_dir] [--width]
 [--depth] [--embed-rows] [--dropout] [--seed] [--n-iter] [--use-vectors]
+[--n-save_every]
 ```

 | Argument | Type | Description |
-| ---------------------- | ---------- | --------------------------------------------------------------------------------------------------------------------------------- |
+| ----------------------- | ---------- | --------------------------------------------------------------------------------------------------------------------------------- |
 | `texts_loc` | positional | Path to JSONL file with raw texts to learn from, with text provided as the key `"text"`. [See here](#pretrain-jsonl) for details. |
 | `vectors_model` | positional | Name or path to spaCy model with vectors to learn from. |
 | `output_dir` | positional | Directory to write models to on each epoch. |
 | `--width`, `-cw` | option | Width of CNN layers. |
 | `--depth`, `-cd` | option | Depth of CNN layers. |
 | `--embed-rows`, `-er` | option | Number of embedding rows. |
 | `--dropout`, `-d` | option | Dropout rate. |
 | `--batch-size`, `-bs` | option | Number of words per training batch. |
 | `--max-length`, `-xw` | option | Maximum words per example. Longer examples are discarded. |
 | `--min-length`, `-nw` | option | Minimum words per example. Shorter examples are discarded. |
 | `--seed`, `-s` | option | Seed for random number generators. |
 | `--n-iter`, `-i` | option | Number of iterations to pretrain. |
 | `--use-vectors`, `-uv` | flag | Whether to use the static vectors as input features. |
+| `--n-save_every`, `-se` | option | Save model every X batches. |
 | **CREATES** | weights | The pre-trained weights that can be used to initialize `spacy train`. |

 ### JSONL format for raw text {#pretrain-jsonl}

@@ -324,7 +327,7 @@ tokenization can be provided.

 | Key | Type | Description |
 | -------- | ------- | -------------------------------------------- |
-| `text` | unicode | The raw input text. |
+| `text` | unicode | The raw input text. Is not required if `tokens` available. |
 | `tokens` | list | Optional tokenization, one string per token. |

 ```json

@@ -332,6 +335,7 @@ tokenization can be provided.
 {"text": "Can I ask where you work now and what you do, and if you enjoy it?"}
 {"text": "They may just pull out of the Seattle market completely, at least until they have autonomous vehicles."}
 {"text": "My cynical view on this is that it will never be free to the public. Reason: what would be the draw of joining the military? Right now their selling point is free Healthcare and Education. Ironically both are run horribly and most, that I've talked to, come out wishing they never went in."}
+{"tokens": ["If", "tokens", "are", "provided", "then", "we", "can", "skip", "the", "raw", "input", "text"]}
 ```

 ## Init Model {#init-model new="2"}

|
||||||
|
|
||||||
```bash
|
```bash
|
||||||
$ python -m spacy evaluate [model] [data_path] [--displacy-path] [--displacy-limit]
|
$ python -m spacy evaluate [model] [data_path] [--displacy-path] [--displacy-limit]
|
||||||
[--gpu-id] [--gold-preproc]
|
[--gpu-id] [--gold-preproc] [--return-scores]
|
||||||
```
|
```
|
||||||
|
|
||||||
| Argument | Type | Description |
|
| Argument | Type | Description |
|
||||||
|
@ -386,6 +390,7 @@ $ python -m spacy evaluate [model] [data_path] [--displacy-path] [--displacy-lim
|
||||||
| `--displacy-limit`, `-dl` | option | Number of parses to generate per file. Defaults to `25`. Keep in mind that a significantly higher number might cause the `.html` files to render slowly. |
|
| `--displacy-limit`, `-dl` | option | Number of parses to generate per file. Defaults to `25`. Keep in mind that a significantly higher number might cause the `.html` files to render slowly. |
|
||||||
| `--gpu-id`, `-g` | option | GPU to use, if any. Defaults to `-1` for CPU. |
|
| `--gpu-id`, `-g` | option | GPU to use, if any. Defaults to `-1` for CPU. |
|
||||||
| `--gold-preproc`, `-G` | flag | Use gold preprocessing. |
|
| `--gold-preproc`, `-G` | flag | Use gold preprocessing. |
|
||||||
|
| `--return-scores`, `-R` | flag | Return dict containing model scores. |
|
||||||
| **CREATES** | `stdout`, HTML | Training results and optional displaCy visualizations. |
|
| **CREATES** | `stdout`, HTML | Training results and optional displaCy visualizations. |
|
||||||
|
|
||||||
## Package {#package}
|
## Package {#package}
|
||||||
|
|
|
@@ -172,7 +172,7 @@ struct.
 | `prefix` | <Abbr title="uint64_t">`attr_t`</Abbr> | Length-N substring from the start of the lexeme. Defaults to `N=1`. |
 | `suffix` | <Abbr title="uint64_t">`attr_t`</Abbr> | Length-N substring from the end of the lexeme. Defaults to `N=3`. |
 | `cluster` | <Abbr title="uint64_t">`attr_t`</Abbr> | Brown cluster ID. |
-| `prob` | `float` | Smoothed log probability estimate of the lexeme's type. |
+| `prob` | `float` | Smoothed log probability estimate of the lexeme's word type (context-independent entry in the vocabulary). |
 | `sentiment` | `float` | A scalar value indicating positivity or negativity. |

 ### Lexeme.get_struct_attr {#lexeme_get_struct_attr tag="staticmethod, nogil" source="spacy/lexeme.pxd"}

@@ -102,10 +102,10 @@ Apply the pipeline's model to a batch of docs, without modifying them.
 > scores = parser.predict([doc1, doc2])
 > ```

 | Name | Type | Description |
-| ----------- | -------- | ---------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------- |
+| ----------- | ------------------- | ---------------------------------------------- |
 | `docs` | iterable | The documents to predict. |
-| **RETURNS** | tuple | A `(scores, tensors)` tuple where `scores` is the model's prediction for each document and `tensors` is the token representations used to predict the scores. Each tensor is an array with one row for each token in the document. |
+| **RETURNS** | `syntax.StateClass` | A helper class for the parse state (internal). |

 ## DependencyParser.set_annotations {#set_annotations tag="method"}

@@ -119,8 +119,27 @@ Update the models in the pipeline.
 | `golds` | iterable | A batch of `GoldParse` objects or dictionaries. Dictionaries will be used to create [`GoldParse`](/api/goldparse) objects. For the available keys and their usage, see [`GoldParse.__init__`](/api/goldparse#init). |
 | `drop` | float | The dropout rate. |
 | `sgd` | callable | An optimizer. |
+| `losses` | dict | Dictionary to update with the loss, keyed by pipeline component. |
 | `component_cfg` <Tag variant="new">2.1</Tag> | dict | Config parameters for specific pipeline components, keyed by component name. |
-| **RETURNS** | dict | Results from the update. |
+
+## Language.evaluate {#evaluate tag="method"}
+
+Evaluate a model's pipeline components.
+
+> #### Example
+>
+> ```python
+> scorer = nlp.evaluate(docs_golds, verbose=True)
+> print(scorer.scores)
+> ```
+
+| Name | Type | Description |
+| -------------------------------------------- | -------- | ------------------------------------------------------------------------------------- |
+| `docs_golds` | iterable | Tuples of `Doc` and `GoldParse` objects. |
+| `verbose` | bool | Print debugging information. |
+| `batch_size` | int | The batch size to use. |
+| `scorer` | `Scorer` | Optional [`Scorer`](/api/scorer) to use. If not passed in, a new one will be created. |
+| `component_cfg` <Tag variant="new">2.1</Tag> | dict | Config parameters for specific pipeline components, keyed by component name. |

 ## Language.begin_training {#begin_training tag="method"}

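A slightly fuller sketch of `evaluate` than the one-liner above, assuming the `en_core_web_sm` model is installed; the entity offsets are hand-made for this one sentence:

```python
import spacy
from spacy.gold import GoldParse

nlp = spacy.load("en_core_web_sm")
text = "Facebook was founded by Mark Zuckerberg."
doc = nlp.make_doc(text)
gold = GoldParse(doc, entities=[(0, 8, "ORG"), (24, 39, "PERSON")])

# evaluate runs the pipeline over the docs and scores them against the golds
scorer = nlp.evaluate([(doc, gold)], verbose=False)
print(scorer.scores)  # uas, las, ents_p, ents_r, ents_f, tags_acc, token_acc
```
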
@@ -128,7 +128,6 @@ The L2 norm of the lexeme's vector representation.
 | `text` | unicode | Verbatim text content. |
 | `orth` | int | ID of the verbatim text content. |
 | `orth_` | unicode | Verbatim text content (identical to `Lexeme.text`). Exists mostly for consistency with the other attributes. |
-| `lex_id` | int | ID of the lexeme's lexical type. |
 | `rank` | int | Sequential ID of the lexemes's lexical type, used to index into tables, e.g. for word vectors. |
 | `flags` | int | Container of the lexeme's binary flags. |
 | `norm` | int | The lexemes's norm, i.e. a normalized form of the lexeme text. |

@@ -161,6 +160,6 @@ The L2 norm of the lexeme's vector representation.
 | `is_stop` | bool | Is the lexeme part of a "stop list"? |
 | `lang` | int | Language of the parent vocabulary. |
 | `lang_` | unicode | Language of the parent vocabulary. |
-| `prob` | float | Smoothed log probability estimate of the lexeme's type. |
+| `prob` | float | Smoothed log probability estimate of the lexeme's word type (context-independent entry in the vocabulary). |
 | `cluster` | int | Brown cluster ID. |
 | `sentiment` | float | A scalar value indicating the positivity or negativity of the lexeme. |

58
website/docs/api/scorer.md
Normal file

@@ -0,0 +1,58 @@
---
title: Scorer
teaser: Compute evaluation scores
tag: class
source: spacy/scorer.py
---

The `Scorer` computes and stores evaluation scores. It's typically created by
[`Language.evaluate`](/api/language#evaluate).

## Scorer.\_\_init\_\_ {#init tag="method"}

Create a new `Scorer`.

> #### Example
>
> ```python
> from spacy.scorer import Scorer
>
> scorer = Scorer()
> ```

| Name | Type | Description |
| ------------ | -------- | ------------------------------------------------------------ |
| `eval_punct` | bool | Evaluate the dependency attachments to and from punctuation. |
| **RETURNS** | `Scorer` | The newly created object. |

## Scorer.score {#score tag="method"}

Update the evaluation scores from a single [`Doc`](/api/doc) /
[`GoldParse`](/api/goldparse) pair.

> #### Example
>
> ```python
> scorer = Scorer()
> scorer.score(doc, gold)
> ```

| Name | Type | Description |
| -------------- | ----------- | -------------------------------------------------------------------------------------------------------------------- |
| `doc` | `Doc` | The predicted annotations. |
| `gold` | `GoldParse` | The correct annotations. |
| `verbose` | bool | Print debugging information. |
| `punct_labels` | tuple | Dependency labels for punctuation. Used to evaluate dependency attachments to punctuation if `eval_punct` is `True`. |

## Properties

| Name | Type | Description |
| ----------- | ----- | -------------------------------------------------------------------------------------------- |
| `token_acc` | float | Tokenization accuracy. |
| `tags_acc` | float | Part-of-speech tag accuracy (fine grained tags, i.e. `Token.tag`). |
| `uas` | float | Unlabelled dependency score. |
| `las` | float | Labelled dependency score. |
| `ents_p` | float | Named entity accuracy (precision). |
| `ents_r` | float | Named entity accuracy (recall). |
| `ents_f` | float | Named entity accuracy (F-score). |
| `scores` | dict | All scores with keys `uas`, `las`, `ents_p`, `ents_r`, `ents_f`, `tags_acc` and `token_acc`. |

@@ -424,7 +424,7 @@ The L2 norm of the token's vector representation.
 | `ent_type` | int | Named entity type. |
 | `ent_type_` | unicode | Named entity type. |
 | `ent_iob` | int | IOB code of named entity tag. `3` means the token begins an entity, `2` means it is outside an entity, `1` means it is inside an entity, and `0` means no entity tag is set. |
-| `ent_iob_` | unicode | IOB code of named entity tag. `3` means the token begins an entity, `2` means it is outside an entity, `1` means it is inside an entity, and `0` means no entity tag is set. |
+| `ent_iob_` | unicode | IOB code of named entity tag. "B" means the token begins an entity, "I" means it is inside an entity, "O" means it is outside an entity, and "" means no entity tag is set. |
 | `ent_id` | int | ID of the entity the token is an instance of, if any. Currently not used, but potentially for coreference resolution. |
 | `ent_id_` | unicode | ID of the entity the token is an instance of, if any. Currently not used, but potentially for coreference resolution. |
 | `lemma` | int | Base form of the token, with no inflectional suffixes. |

@@ -465,10 +465,10 @@ The L2 norm of the token's vector representation.
 | `dep_` | unicode | Syntactic dependency relation. |
 | `lang` | int | Language of the parent document's vocabulary. |
 | `lang_` | unicode | Language of the parent document's vocabulary. |
-| `prob` | float | Smoothed log probability estimate of token's type. |
+| `prob` | float | Smoothed log probability estimate of token's word type (context-independent entry in the vocabulary). |
 | `idx` | int | The character offset of the token within the parent document. |
 | `sentiment` | float | A scalar value indicating the positivity or negativity of the token. |
-| `lex_id` | int | Sequential ID of the token's lexical type. |
+| `lex_id` | int | Sequential ID of the token's lexical type, used to index into tables, e.g. for word vectors. |
 | `rank` | int | Sequential ID of the token's lexical type, used to index into tables, e.g. for word vectors. |
 | `cluster` | int | Brown cluster ID. |
 | `_` | `Underscore` | User space for adding custom [attribute extensions](/usage/processing-pipelines#custom-components-attributes). |

@@ -211,16 +211,16 @@ Render a dependency parse tree or named entity visualization.
 > html = displacy.render(doc, style="dep")
 > ```

 | Name | Type | Description | Default |
-| ----------- | ------------------- | ------------------------------------------------------------------------------------------------------------------------------------ | ---------------------- |
+| ----------- | ------------------- | --------------------------------------------------------------------------------------------------------------------------------------------------------- | ------- |
 | `docs` | list, `Doc`, `Span` | Document(s) to visualize. |
 | `style` | unicode | Visualization style, `'dep'` or `'ent'`. | `'dep'` |
 | `page` | bool | Render markup as full HTML page. | `False` |
 | `minify` | bool | Minify HTML markup. | `False` |
-| `jupyter` | bool | Explicitly enable "[Jupyter](http://jupyter.org/) mode" to return markup ready to be rendered in a notebook. | detected automatically |
+| `jupyter` | bool | Explicitly enable or disable "[Jupyter](http://jupyter.org/) mode" to return markup ready to be rendered in a notebook. Detected automatically if `None`. | `None` |
 | `options` | dict | [Visualizer-specific options](#options), e.g. colors. | `{}` |
 | `manual` | bool | Don't parse `Doc` and instead, expect a dict or list of dicts. [See here](/usage/visualizers#manual-usage) for formats and examples. | `False` |
 | **RETURNS** | unicode | Rendered HTML markup. |

 ### Visualizer options {#displacy_options}

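A short sketch of the new explicit switch, assuming `en_core_web_sm` is installed; with `jupyter=False` you always get the raw markup back, even inside a notebook:

```python
import spacy
from spacy import displacy

nlp = spacy.load("en_core_web_sm")
doc = nlp("Google was founded in California.")

# Force raw HTML output instead of auto-detected notebook rendering
html = displacy.render(doc, style="ent", page=True, jupyter=False)
with open("ents.html", "w", encoding="utf8") as f:
    f.write(html)
```
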
@@ -351,7 +351,7 @@ the two-letter language code.
 | `name` | unicode | Two-letter language code, e.g. `'en'`. |
 | `cls` | `Language` | The language class, e.g. `English`. |

-### util.lang_class_is_loaded (#util.lang_class_is_loaded tag="function" new="2.1")
+### util.lang_class_is_loaded {#util.lang_class_is_loaded tag="function" new="2.1"}

 Check whether a `Language` class is already loaded. `Language` classes are
 loaded lazily, to avoid expensive setup code associated with the language data.

@@ -654,6 +654,27 @@ for batching. Larger `buffsize` means less bias.
 | `buffsize` | int | Items to hold back. |
 | **YIELDS** | iterable | The shuffled iterator. |

+### util.filter_spans {#util.filter_spans tag="function" new="2.1.4"}
+
+Filter a sequence of [`Span`](/api/span) objects and remove duplicates or
+overlaps. Useful for creating named entities (where one token can only be part
+of one entity) or when merging spans with
+[`Retokenizer.merge`](/api/doc#retokenizer.merge). When spans overlap, the
+(first) longest span is preferred over shorter spans.
+
+> #### Example
+>
+> ```python
+> doc = nlp("This is a sentence.")
+> spans = [doc[0:2], doc[0:2], doc[0:4]]
+> filtered = filter_spans(spans)
+> ```
+
+| Name | Type | Description |
+| ----------- | -------- | -------------------- |
+| `spans` | iterable | The spans to filter. |
+| **RETURNS** | list | The filtered spans. |
+
 ## Compatibility functions {#compat source="spacy/compaty.py"}

 All Python code is written in an **intersection of Python 2 and Python 3**. This

@@ -306,7 +306,7 @@ vectors, they will be counted individually.

 Load [GloVe](https://nlp.stanford.edu/projects/glove/) vectors from a directory.
 Assumes binary format, that the vocab is in a `vocab.txt`, and that vectors are
-named `vectors.{size}.[fd`.bin], e.g. `vectors.128.f.bin` for 128d float32
+named `vectors.{size}.[fd.bin]`, e.g. `vectors.128.f.bin` for 128d float32
 vectors, `vectors.300.d.bin` for 300d float64 (double) vectors, etc. By default
 GloVe outputs 64-bit vectors.

Binary file not shown.
Before Width: | Height: | Size: 1.6 MiB

BIN
website/docs/images/course.jpg
Normal file
Binary file not shown.
After Width: | Height: | Size: 270 KiB

@@ -4,7 +4,7 @@ example, everything that's in your `nlp` object. This means you'll have to
 translate its contents and structure into a format that can be saved, like a
 file or a byte string. This process is called serialization. spaCy comes with
 **built-in serialization methods** and supports the
-[Pickle protocol](http://www.diveintopython3.net/serializing.html#dump).
+[Pickle protocol](https://www.diveinto.org/python3/serializing.html#dump).

 > #### What's pickle?
 >

@@ -50,7 +50,7 @@ together.

 ## Benchmarks {#benchmarks}

-Two peer-reviewed papers in 2015 confirm that spaCy offers the **fastest
+Two peer-reviewed papers in 2015 confirmed that spaCy offers the **fastest
 syntactic parser in the world** and that **its accuracy is within 1% of the
 best** available. The few systems that are more accurate are 20× slower or more.

@@ -326,7 +326,7 @@ URLs.

 ```text
 ### requirements.txt
 spacy>=2.0.0,<3.0.0
-https://github.com/spacy-models/releases/download/en_core_web_sm-2.0.0/en_core_web_sm-2.0.0.tar.gz#egg=en_core_web_sm
+https://github.com/explosion/spacy-models/releases/download/en_core_web_sm-2.1.0/en_core_web_sm-2.1.0.tar.gz#egg=en_core_web_sm
 ```

 Specifying `#egg=` with the package name tells pip which package to expect from

@@ -260,7 +260,7 @@ def my_component(doc):

 nlp = spacy.load("en_core_web_sm")
 nlp.add_pipe(my_component, name="print_info", last=True)
-print(nlp.pipe_names)  # ['print_info', 'tagger', 'parser', 'ner']
+print(nlp.pipe_names)  # ['tagger', 'parser', 'ner', 'print_info']
 doc = nlp(u"This is a sentence.")
 ```

@@ -214,7 +214,8 @@ example, you might want to match different spellings of a word, without having
 to add a new pattern for each spelling.

 ```python
-pattern = [{"TEXT": {"REGEX": "^([Uu](\\.?|nited) ?[Ss](\\.?|tates)"}},
+pattern = [{"TEXT": {"REGEX": "^[Uu](\\.?|nited)$"}},
+           {"TEXT": {"REGEX": "^[Ss](\\.?|tates)$"}},
            {"LOWER": "president"}]
 ```

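For context, the corrected pattern drops straight into a `Matcher`; a runnable sketch on a blank English pipeline (no model download needed):

```python
import spacy
from spacy.matcher import Matcher

nlp = spacy.blank("en")
matcher = Matcher(nlp.vocab)
# One token pattern per token: "United"/"U."/"U", then "States"/"S."/"S", then "president"
pattern = [{"TEXT": {"REGEX": "^[Uu](\\.?|nited)$"}},
           {"TEXT": {"REGEX": "^[Ss](\\.?|tates)$"}},
           {"LOWER": "president"}]
matcher.add("US_PRESIDENT", None, pattern)

doc = nlp("The United States president gave a speech.")
for match_id, start, end in matcher(doc):
    print(doc[start:end].text)  # "United States president"
```
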
@@ -227,7 +228,7 @@ attributes:
 pattern = [{"TAG": {"REGEX": "^V"}}]

 # Match custom attribute values with regular expressions
-pattern = [{"_": {"country": {"REGEX": "^([Uu](\\.?|nited) ?[Ss](\\.?|tates)"}}}]
+pattern = [{"_": {"country": {"REGEX": "^[Uu](\\.?|nited) ?[Ss](\\.?|tates)$"}}}]
 ```

 <Infobox title="Regular expressions in older versions" variant="warning">

@@ -404,7 +405,7 @@ class BadHTMLMerger(object):
         for match_id, start, end in matches:
             spans.append(doc[start:end])
         with doc.retokenize() as retokenizer:
-            for span in hashtags:
+            for span in spans:
                 retokenizer.merge(span)
                 for token in span:
                     token._.bad_html = True  # Mark token as bad HTML

@@ -678,7 +679,7 @@ for match_id, start, end in matches:
     if doc.vocab.strings[match_id] == "HASHTAG":
         hashtags.append(doc[start:end])
 with doc.retokenize() as retokenizer:
-    for span in spans:
+    for span in hashtags:
         retokenizer.merge(span)
         for token in span:
             token._.is_hashtag = True

@@ -712,9 +713,9 @@ from spacy.matcher import PhraseMatcher
 
 nlp = spacy.load('en_core_web_sm')
 matcher = PhraseMatcher(nlp.vocab)
-terminology_list = [u"Barack Obama", u"Angela Merkel", u"Washington, D.C."]
+terms = [u"Barack Obama", u"Angela Merkel", u"Washington, D.C."]
 # Only run nlp.make_doc to speed things up
-patterns = [nlp.make_doc(text) for text in terminology_list]
+patterns = [nlp.make_doc(text) for text in terms]
 matcher.add("TerminologyList", None, *patterns)
 
 doc = nlp(u"German Chancellor Angela Merkel and US President Barack Obama "
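The rename above is purely cosmetic (`terminology_list` becomes `terms`), but the surrounding example is worth spelling out: `nlp.make_doc` only runs the tokenizer, which is all the `PhraseMatcher` needs, so building patterns this way avoids running the full pipeline over every terminology entry. A sketch of how the example reads after the change (the second line of the example sentence is an assumed continuation):

```python
import spacy
from spacy.matcher import PhraseMatcher

nlp = spacy.load("en_core_web_sm")
matcher = PhraseMatcher(nlp.vocab)

terms = [u"Barack Obama", u"Angela Merkel", u"Washington, D.C."]
# Only run nlp.make_doc to speed things up
patterns = [nlp.make_doc(text) for text in terms]
matcher.add("TerminologyList", None, *patterns)

doc = nlp(u"German Chancellor Angela Merkel and US President Barack Obama "
          u"met in Washington, D.C.")
for match_id, start, end in matcher(doc):
    print(doc[start:end].text)
```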
@@ -29,6 +29,19 @@ quick introduction.
 > [pull requests](https://github.com/explosion/spaCy/pulls). You can find a
 > "Suggest edits" link at the bottom of each page that points you to the source.
 
+<Infobox title="Take the free interactive course">
+
+[![Advanced NLP with spaCy](../images/course.jpg)](https://course.spacy.io)
+
+In this course you'll learn how to use spaCy to build advanced natural language
+understanding systems, using both rule-based and machine learning approaches. It
+includes 55 exercises featuring interactive coding practice, multiple-choice
+questions and slide decks.
+
+<p><Button to="https://course.spacy.io" variant="primary">Start the course</Button></p>
+
+</Infobox>
+
 ## What's spaCy? {#whats-spacy}
 
 <Grid cols={2}>
@@ -89,27 +102,12 @@ systems, or to pre-process text for **deep learning**.
 integrated and opinionated. spaCy tries to avoid asking the user to choose
 between multiple algorithms that deliver equivalent functionality. Keeping the
 menu small lets spaCy deliver generally better performance and developer
-experience.M
+experience.
 
 - **spaCy is not a company**. It's an open-source library. Our company
 publishing spaCy and other software is called
 [Explosion AI](https://explosion.ai).
 
-<Infobox title="Download the spaCy Cheat Sheet!">
-
-[![spaCy Cheatsheet](../images/cheatsheet.jpg)](http://datacamp-community-prod.s3.amazonaws.com/29aa28bf-570a-4965-8f54-d6a541ae4e06)
-
-For the launch of our
-["Advanced NLP with spaCy"](https://www.datacamp.com/courses/advanced-nlp-with-spacy)
-course on DataCamp we created the first official spaCy cheat sheet! A handy
-two-page reference to the most important concepts and features, from loading
-models and accessing linguistic annotations, to custom pipeline components and
-rule-based matching.
-
-<p><Button to="http://datacamp-community-prod.s3.amazonaws.com/29aa28bf-570a-4965-8f54-d6a541ae4e06" variant="primary">Download</Button></p>
-
-</Infobox>
-
 ## Features {#features}
 
 In the documentation, you'll come across mentions of spaCy's features and
@@ -136,7 +136,7 @@ The entity visualizer lets you customize the following `options`:
 | Argument | Type | Description | Default |
 | -------- | ---- | ------------------------------------------------------------------------------------- | ------- |
 | `ents` | list | Entity types to highlight (`None` for all types). | `None` |
-| `colors` | dict | Color overrides. Entity types in lowercase should be mapped to color names or values. | `{}` |
+| `colors` | dict | Color overrides. Entity types in uppercase should be mapped to color names or values. | `{}` |
 
 If you specify a list of `ents`, only those entity types will be rendered – for
 example, you can choose to display `PERSON` entities. Internally, the visualizer
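The corrected table row above says that keys in `colors` are uppercase entity labels, matching the labels the models actually produce (`ORG`, `PERSON`, ...). A minimal sketch of passing both options to displaCy (the example text and the hex color are arbitrary):

```python
import spacy
from spacy import displacy

nlp = spacy.load("en_core_web_sm")
doc = nlp(u"Google was founded by Larry Page and Sergey Brin.")

options = {"ents": ["ORG", "PERSON"],        # only render these entity types
           "colors": {"ORG": "#ffd700"}}     # uppercase label mapped to a color value
html = displacy.render(doc, style="ent", options=options)
```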
@@ -90,7 +90,8 @@
 { "text": "StringStore", "url": "/api/stringstore" },
 { "text": "Vectors", "url": "/api/vectors" },
 { "text": "GoldParse", "url": "/api/goldparse" },
-{ "text": "GoldCorpus", "url": "/api/goldcorpus" }
+{ "text": "GoldCorpus", "url": "/api/goldcorpus" },
+{ "text": "Scorer", "url": "/api/scorer" }
 ]
 },
 {
@@ -1,5 +1,107 @@
 {
 "resources": [
+{
+"id": "nlp-architect",
+"title": "NLP Architect",
+"slogan": "Python lib for exploring Deep NLP & NLU by Intel AI",
+"github": "NervanaSystems/nlp-architect",
+"pip": "nlp-architect",
+"thumb": "https://i.imgur.com/vMideRx.png",
+"category": ["standalone", "research"],
+"tags": ["pytorch"]
+},
+{
+"id": "NeuroNER",
+"title": "NeuroNER",
+"slogan": "Named-entity recognition using neural networks",
+"github": "Franck-Dernoncourt/NeuroNER",
+"pip": "pyneuroner[cpu]",
+"code_example": [
+"from neuroner import neuromodel",
+"nn = neuromodel.NeuroNER(train_model=False, use_pretrained_model=True)"
+],
+"category": ["ner"],
+"tags": ["standalone"]
+},
+{
+"id": "NLPre",
+"title": "NLPre",
+"slogan": "Natural Language Preprocessing Library for health data and more",
+"github": "NIHOPA/NLPre",
+"pip": "nlpre",
+"code_example": [
+"from nlpre import titlecaps, dedash, identify_parenthetical_phrases",
+"from nlpre import replace_acronyms, replace_from_dictionary",
+"ABBR = identify_parenthetical_phrases()(text)",
+"parsers = [dedash(), titlecaps(), replace_acronyms(ABBR),",
+" replace_from_dictionary(prefix='MeSH_')]",
+"for f in parsers:",
+" text = f(text)",
+"print(text)"
+],
+"category": ["scientific"]
+},
+{
+"id": "Chatterbot",
+"title": "Chatterbot",
+"slogan": "A machine-learning based conversational dialog engine for creating chat bots",
+"github": "gunthercox/ChatterBot",
+"pip": "chatterbot",
+"thumb": "https://i.imgur.com/eyAhwXk.jpg",
+"code_example": [
+"from chatterbot import ChatBot",
+"from chatterbot.trainers import ListTrainer",
+"# Create a new chat bot named Charlie",
+"chatbot = ChatBot('Charlie')",
+"trainer = ListTrainer(chatbot)",
+"trainer.train([",
+"'Hi, can I help you?',",
+"'Sure, I would like to book a flight to Iceland.",
+"'Your flight has been booked.'",
+"])",
+"",
+"response = chatbot.get_response('I would like to book a flight.')"
+],
+"author": "Gunther Cox",
+"author_links": {
+"github": "gunthercox"
+},
+"category": ["conversational", "standalone"],
+"tags": ["chatbots"]
+},
+{
+"id": "saber",
+"title": "saber",
+"slogan": "Deep-learning based tool for information extraction in the biomedical domain",
+"github": "BaderLab/saber",
+"pip": "saber",
+"thumb": "https://raw.githubusercontent.com/BaderLab/saber/master/docs/img/saber_logo.png",
+"code_example": [
+"from saber.saber import Saber",
+"saber = Saber()",
+"saber.load('PRGE')",
+"saber.annotate('The phosphorylation of Hdm2 by MK2 promotes the ubiquitination of p53.')"
+],
+"author": "Bader Lab, University of Toronto",
+"category": ["scientific"],
+"tags": ["keras", "biomedical"]
+},
+{
+"id": "alibi",
+"title": "alibi",
+"slogan": "Algorithms for monitoring and explaining machine learning models ",
+"github": "SeldonIO/alibi",
+"pip": "alibi",
+"thumb": "https://i.imgur.com/YkzQHRp.png",
+"code_example": [
+"from alibi.explainers import AnchorTabular",
+"explainer = AnchorTabular(predict_fn, feature_names)",
+"explainer.fit(X_train)",
+"explainer.explain(x)"
+],
+"author": "Seldon",
+"category": ["standalone", "research"]
+},
 {
 "id": "spacymoji",
 "slogan": "Emoji handling and meta data as a spaCy pipeline component",
@@ -143,7 +245,7 @@
 "doc = nlp(my_doc_text)"
 ],
 "author": "tc64",
-"author_link": {
+"author_links": {
 "github": "tc64"
 },
 "category": ["pipeline"]
@@ -346,7 +448,7 @@
 "author_links": {
 "github": "huggingface"
 },
-"category": ["standalone", "conversational"],
+"category": ["standalone", "conversational", "models"],
 "tags": ["coref"]
 },
 {
@@ -538,7 +640,7 @@
 "twitter": "allenai_org",
 "website": "http://allenai.org"
 },
-"category": ["models", "research"]
+"category": ["scientific", "models", "research"]
 },
 {
 "id": "textacy",
@@ -601,7 +703,7 @@
 "github": "ahalterman",
 "twitter": "ahalterman"
 },
-"category": ["standalone"]
+"category": ["standalone", "scientific"]
 },
 {
 "id": "kindred",
@@ -626,7 +728,7 @@
 "author_links": {
 "github": "jakelever"
 },
-"category": ["standalone"]
+"category": ["standalone", "scientific"]
 },
 {
 "id": "sense2vec",
@@ -837,6 +939,42 @@
 },
 "category": ["standalone"]
 },
+{
+"id": "prefect",
+"title": "Prefect",
+"slogan": "Workflow management system designed for modern infrastructure",
+"github": "PrefectHQ/prefect",
+"pip": "prefect",
+"thumb": "https://i.imgur.com/oLTwr0e.png",
+"code_example": [
+"from prefect import Flow",
+"from prefect.tasks.spacy.spacy_tasks import SpacyNLP",
+"import spacy",
+"",
+"nlp = spacy.load(\"en_core_web_sm\")",
+"",
+"with Flow(\"Natural Language Processing\") as flow:",
+" doc = SpacyNLP(text=\"This is some text\", nlp=nlp)",
+"",
+"flow.run()"
+],
+"author": "Prefect",
+"author_links": {
+"website": "https://prefect.io"
+},
+"category": ["standalone"]
+},
+{
+"id": "graphbrain",
+"title": "Graphbrain",
+"slogan": "Automated meaning extraction and text understanding",
+"description": "Graphbrain is an Artificial Intelligence open-source software library and scientific research tool. Its aim is to facilitate automated meaning extraction and text understanding, as well as the exploration and inference of knowledge.",
+"github": "graphbrain/graphbrain",
+"pip": "graphbrain",
+"thumb": "https://i.imgur.com/cct9W1E.png",
+"author": "Graphbrain",
+"category": ["standalone"]
+},
 {
 "type": "education",
 "id": "oreilly-python-ds",
@@ -883,36 +1021,6 @@
 "author": "Bhargav Srinivasa-Desikan",
 "category": ["books"]
 },
-{
-"type": "education",
-"id": "datacamp-nlp-fundamentals",
-"title": "Natural Language Processing Fundamentals in Python",
-"slogan": "Datacamp, 2017",
-"description": "In this course, you'll learn Natural Language Processing (NLP) basics, such as how to identify and separate words, how to extract topics in a text, and how to build your own fake news classifier. You'll also learn how to use basic libraries such as NLTK, alongside libraries which utilize deep learning to solve common NLP problems. This course will give you the foundation to process and parse text as you move forward in your Python learning.",
-"url": "https://www.datacamp.com/courses/natural-language-processing-fundamentals-in-python",
-"thumb": "https://i.imgur.com/0Zks7c0.jpg",
-"author": "Katharine Jarmul",
-"author_links": {
-"twitter": "kjam"
-},
-"category": ["courses"]
-},
-{
-"type": "education",
-"id": "datacamp-advanced-nlp",
-"title": "Advanced Natural Language Processing with spaCy",
-"slogan": "Datacamp, 2019",
-"description": "If you're working with a lot of text, you'll eventually want to know more about it. For example, what's it about? What do the words mean in context? Who is doing what to whom? What companies and products are mentioned? Which texts are similar to each other? In this course, you'll learn how to use spaCy, a fast-growing industry standard library for NLP in Python, to build advanced natural language understanding systems, using both rule-based and machine learning approaches.",
-"url": "https://www.datacamp.com/courses/advanced-nlp-with-spacy",
-"thumb": "https://i.imgur.com/0Zks7c0.jpg",
-"author": "Ines Montani",
-"author_links": {
-"twitter": "_inesmontani",
-"github": "ines",
-"website": "https://ines.io"
-},
-"category": ["courses"]
-},
 {
 "type": "education",
 "id": "learning-path-spacy",
@@ -924,6 +1032,23 @@
 "author": "Aaron Kramer",
 "category": ["courses"]
 },
+{
+"type": "education",
+"id": "spacy-course",
+"title": "Advanced NLP with spaCy",
+"slogan": "spaCy, 2019",
+"description": "In this free interactive course, you'll learn how to use spaCy to build advanced natural language understanding systems, using both rule-based and machine learning approaches.",
+"url": "https://course.spacy.io",
+"image": "https://i.imgur.com/JC00pHW.jpg",
+"thumb": "https://i.imgur.com/5RXLtrr.jpg",
+"author": "Ines Montani",
+"author_links": {
+"twitter": "_inesmontani",
+"github": "ines",
+"website": "https://ines.io"
+},
+"category": ["courses"]
+},
 {
 "type": "education",
 "id": "video-spacys-ner-model",
@@ -1010,6 +1135,22 @@
 },
 "category": ["podcasts"]
 },
+{
+"type": "education",
+"id": "twimlai-podcast",
+"title": "TWiML & AI: Practical NLP with spaCy and Prodigy",
+"slogan": "May 2019",
+"description": "\"Ines and I caught up to discuss her various projects, including the aforementioned SpaCy, an open-source NLP library built with a focus on industry and production use cases. In our conversation, Ines gives us an overview of the SpaCy Library, a look at some of the use cases that excite her, and the Spacy community and contributors. We also discuss her work with Prodigy, an annotation service tool that uses continuous active learning to train models, and finally, what other exciting projects she is working on.\"",
+"thumb": "https://i.imgur.com/ng2F5gK.png",
+"url": "https://twimlai.com/twiml-talk-262-practical-natural-language-processing-with-spacy-and-prodigy-w-ines-montani",
+"iframe": "https://html5-player.libsyn.com/embed/episode/id/9691514/height/90/theme/custom/thumbnail/no/preload/no/direction/backward/render-playlist/no/custom-color/3e85b1/",
+"iframe_height": 90,
+"author": "Sam Charrington",
+"author_links": {
+"website": "https://twimlai.com"
+},
+"category": ["podcasts"]
+},
 {
 "id": "adam_qas",
 "title": "ADAM: Question Answering System",
@@ -1068,7 +1209,7 @@
 "github": "ecohealthalliance",
 "website": " https://ecohealthalliance.org/"
 },
-"category": ["research", "standalone"]
+"category": ["scientific", "standalone"]
 },
 {
 "id": "self-attentive-parser",
@@ -1311,8 +1452,100 @@
 "website": "http://w4nderlu.st"
 },
 "category": ["standalone", "research"]
+},
+{
+"id": "gracyql",
+"title": "gracyql",
+"slogan": "A thin GraphQL wrapper around spacy",
+"github": "oterrier/gracyql",
+"description": "An example of a basic [Starlette](https://github.com/encode/starlette) app using [Spacy](https://github.com/explosion/spaCy) and [Graphene](https://github.com/graphql-python/graphene). The main goal is to be able to use the amazing power of spaCy from other languages and retrieving only the information you need thanks to the GraphQL query definition. The GraphQL schema tries to mimic as much as possible the original Spacy API with classes Doc, Span and Token.",
+"thumb": "https://i.imgur.com/xC7zpTO.png",
+"category": ["apis"],
+"tags": ["graphql"],
+"code_example": [
+"query ParserDisabledQuery {",
+" nlp(model: \"en\", disable: [\"parser\", \"ner\"]) {",
+" doc(text: \"I live in Grenoble, France\") {",
+" text",
+" tokens {",
+" id",
+" pos",
+" lemma",
+" dep",
+" }",
+" ents {",
+" start",
+" end",
+" label",
+" }",
+" }",
+" }",
+"}"
+],
+"code_language": "json",
+"author": "Olivier Terrier",
+"author_links": {
+"github": "oterrier"
+}
+},
+{
+"id": "pyInflect",
+"slogan": "A python module for word inflections",
+"description": "This package uses the [spaCy 2.0 extensions](https://spacy.io/usage/processing-pipelines#extensions) to add word inflections to the system.",
+"github": "bjascob/pyInflect",
+"pip": "pyinflect",
+"code_example": [
+"import spacy",
+"import pyinflect",
+"",
+"nlp = spacy.load('en_core_web_sm')",
+"doc = nlp('This is an example.')",
+"doc[3].tag_ # NN",
+"doc[3]._.inflect('NNS') # examples"
+],
+"author": "Brad Jascob",
+"author_links": {
+"github": "bjascob"
+},
+"category": ["pipeline"],
+"tags": ["inflection"]
+},
+{
+"id": "NGym",
+"title": "NeuralGym",
+"slogan": "A little Windows GUI for training models with spaCy",
+"description": "NeuralGym is a Python application for Windows with a graphical user interface to train models with spaCy. Run the application, select an output folder, a training data file in spaCy's data format, a spaCy model or blank model and press 'Start'.",
+"github": "d5555/NeuralGym",
+"url": "https://github.com/d5555/NeuralGym",
+"image": "https://github.com/d5555/NeuralGym/raw/master/NGym.png",
+"thumb": "https://github.com/d5555/NeuralGym/raw/master/NGym/web.png",
+"author": "d5555",
+"category": ["training"],
+"tags": ["windows"]
+},
+{
+"id": "holmes",
+"title": "Holmes",
+"slogan": "Information extraction from English and German texts based on predicate logic",
+"github": "msg-systems/holmes-extractor",
+"url": "https://github.com/msg-systems/holmes-extractor",
+"description": "Holmes is a Python 3 library that supports a number of use cases involving information extraction from English and German texts, including chatbot, structural search, topic matching and supervised document classification.",
+"pip": "holmes-extractor",
+"category": ["conversational", "standalone"],
+"tags": ["chatbots", "text-processing"],
+"code_example": [
+"import holmes_extractor as holmes",
+"holmes_manager = holmes.Manager(model='en_coref_lg')",
+"holmes_manager.register_search_phrase('A big dog chases a cat')",
+"holmes_manager.start_chatbot_mode_console()"
+],
+"author": "Richard Paul Hudson",
+"author_links": {
+"github": "richardpaulhudson"
+}
 }
 ],
 
 "categories": [
 {
 "label": "Projects",
|
||||||
"title": "Research",
|
"title": "Research",
|
||||||
"description": "Frameworks and utilities for developing better NLP models, especially using neural networks"
|
"description": "Frameworks and utilities for developing better NLP models, especially using neural networks"
|
||||||
},
|
},
|
||||||
|
{
|
||||||
|
"id": "scientific",
|
||||||
|
"title": "Scientific",
|
||||||
|
"description": "Frameworks and utilities for scientific text processing"
|
||||||
|
},
|
||||||
{
|
{
|
||||||
"id": "visualizers",
|
"id": "visualizers",
|
||||||
"title": "Visualizers",
|
"title": "Visualizers",
|
||||||
|
@ -1356,6 +1594,11 @@
|
||||||
"id": "standalone",
|
"id": "standalone",
|
||||||
"title": "Standalone",
|
"title": "Standalone",
|
||||||
"description": "Self-contained libraries or tools that use spaCy under the hood"
|
"description": "Self-contained libraries or tools that use spaCy under the hood"
|
||||||
|
},
|
||||||
|
{
|
||||||
|
"id": "models",
|
||||||
|
"title": "Models",
|
||||||
|
"description": "Third-party pre-trained models for different languages and domains"
|
||||||
}
|
}
|
||||||
]
|
]
|
||||||
},
|
},
|
||||||
|
|
|
@ -93,6 +93,7 @@ const SEO = ({ description, lang, title, section, sectionTitle, bodyClass }) =>
|
||||||
|
|
||||||
return (
|
return (
|
||||||
<Helmet
|
<Helmet
|
||||||
|
defer={false}
|
||||||
htmlAttributes={{ lang }}
|
htmlAttributes={{ lang }}
|
||||||
bodyAttributes={{ class: bodyClass }}
|
bodyAttributes={{ class: bodyClass }}
|
||||||
title={pageTitle}
|
title={pageTitle}
|
||||||
|
|
|
@ -125,7 +125,7 @@ const UniverseContent = ({ content = [], categories, pageContext, location, mdxC
|
||||||
</p>
|
</p>
|
||||||
|
|
||||||
<InlineList>
|
<InlineList>
|
||||||
<Button variant="primary" to={github('website/universe/README.md')}>
|
<Button variant="primary" to={github('website/UNIVERSE.md')}>
|
||||||
Read the docs
|
Read the docs
|
||||||
</Button>
|
</Button>
|
||||||
<Button icon="code" to={github('website/meta/universe.json')}>
|
<Button icon="code" to={github('website/meta/universe.json')}>
|
||||||
|
|
|
@ -75,16 +75,6 @@ const Landing = ({ data }) => {
|
||||||
<LandingSubtitle>in Python</LandingSubtitle>
|
<LandingSubtitle>in Python</LandingSubtitle>
|
||||||
</LandingHeader>
|
</LandingHeader>
|
||||||
<LandingGrid blocks>
|
<LandingGrid blocks>
|
||||||
<LandingCard title="Fastest in the world">
|
|
||||||
<p>
|
|
||||||
spaCy excels at large-scale information extraction tasks. It's written from
|
|
||||||
the ground up in carefully memory-managed Cython. Independent research has
|
|
||||||
confirmed that spaCy is the fastest in the world. If your application needs
|
|
||||||
to process entire web dumps, spaCy is the library you want to be using.
|
|
||||||
</p>
|
|
||||||
<LandingButton to="/usage/facts-figures">Facts & Figures</LandingButton>
|
|
||||||
</LandingCard>
|
|
||||||
|
|
||||||
<LandingCard title="Get things done">
|
<LandingCard title="Get things done">
|
||||||
<p>
|
<p>
|
||||||
spaCy is designed to help you do real work — to build real products, or
|
spaCy is designed to help you do real work — to build real products, or
|
||||||
|
@ -92,7 +82,16 @@ const Landing = ({ data }) => {
|
||||||
wasting it. It's easy to install, and its API is simple and productive. We
|
wasting it. It's easy to install, and its API is simple and productive. We
|
||||||
like to think of spaCy as the Ruby on Rails of Natural Language Processing.
|
like to think of spaCy as the Ruby on Rails of Natural Language Processing.
|
||||||
</p>
|
</p>
|
||||||
<LandingButton to="/usage">Get started</LandingButton>
|
<LandingButton to="/usage/spacy-101">Get started</LandingButton>
|
||||||
|
</LandingCard>
|
||||||
|
<LandingCard title="Blazing fast">
|
||||||
|
<p>
|
||||||
|
spaCy excels at large-scale information extraction tasks. It's written from
|
||||||
|
the ground up in carefully memory-managed Cython. Independent research in
|
||||||
|
2015 found spaCy to be the fastest in the world. If your application needs
|
||||||
|
to process entire web dumps, spaCy is the library you want to be using.
|
||||||
|
</p>
|
||||||
|
<LandingButton to="/usage/facts-figures">Facts & Figures</LandingButton>
|
||||||
</LandingCard>
|
</LandingCard>
|
||||||
|
|
||||||
<LandingCard title="Deep learning">
|
<LandingCard title="Deep learning">
|
||||||
|
@ -129,6 +128,7 @@ const Landing = ({ data }) => {
|
||||||
<Li>
|
<Li>
|
||||||
Pre-trained <strong>word vectors</strong>
|
Pre-trained <strong>word vectors</strong>
|
||||||
</Li>
|
</Li>
|
||||||
|
<Li>State-of-the-art speed</Li>
|
||||||
<Li>
|
<Li>
|
||||||
Easy <strong>deep learning</strong> integration
|
Easy <strong>deep learning</strong> integration
|
||||||
</Li>
|
</Li>
|
||||||
|
@ -144,7 +144,6 @@ const Landing = ({ data }) => {
|
||||||
<Li>
|
<Li>
|
||||||
Easy <strong>model packaging</strong> and deployment
|
Easy <strong>model packaging</strong> and deployment
|
||||||
</Li>
|
</Li>
|
||||||
<Li>State-of-the-art speed</Li>
|
|
||||||
<Li>Robust, rigorously evaluated accuracy</Li>
|
<Li>Robust, rigorously evaluated accuracy</Li>
|
||||||
</Ul>
|
</Ul>
|
||||||
</LandingCol>
|
</LandingCol>
|
||||||
|
|