mirror of https://github.com/explosion/spaCy.git
synced 2025-01-03 22:06:37 +03:00
Merge branch 'master' into spacy.io
This commit is contained in:
commit db81604d54
106
.github/contributors/AlJohri.md
vendored
Normal file
@@ -0,0 +1,106 @@
# spaCy contributor agreement

This spaCy Contributor Agreement (**"SCA"**) is based on the
[Oracle Contributor Agreement](http://www.oracle.com/technetwork/oca-405177.pdf).
The SCA applies to any contribution that you make to any product or project
managed by us (the **"project"**), and sets out the intellectual property rights
you grant to us in the contributed materials. The term **"us"** shall mean
[ExplosionAI GmbH](https://explosion.ai/legal). The term
**"you"** shall mean the person or entity identified below.

If you agree to be bound by these terms, fill in the information requested
below and include the filled-in version with your first pull request, under the
folder [`.github/contributors/`](/.github/contributors/). The name of the file
should be your GitHub username, with the extension `.md`. For example, the user
example_user would create the file `.github/contributors/example_user.md`.

Read this agreement carefully before signing. These terms and conditions
constitute a binding legal agreement.

## Contributor Agreement

1. The term "contribution" or "contributed materials" means any source code,
object code, patch, tool, sample, graphic, specification, manual,
documentation, or any other material posted or submitted by you to the project.

2. With respect to any worldwide copyrights, or copyright applications and
registrations, in your contribution:

* you hereby assign to us joint ownership, and to the extent that such
assignment is or becomes invalid, ineffective or unenforceable, you hereby
grant to us a perpetual, irrevocable, non-exclusive, worldwide, no-charge,
royalty-free, unrestricted license to exercise all rights under those
copyrights. This includes, at our option, the right to sublicense these same
rights to third parties through multiple levels of sublicensees or other
licensing arrangements;

* you agree that each of us can do all things in relation to your
contribution as if each of us were the sole owners, and if one of us makes
a derivative work of your contribution, the one who makes the derivative
work (or has it made) will be the sole owner of that derivative work;

* you agree that you will not assert any moral rights in your contribution
against us, our licensees or transferees;

* you agree that we may register a copyright in your contribution and
exercise all ownership rights associated with it; and

* you agree that neither of us has any duty to consult with, obtain the
consent of, pay or render an accounting to the other for any use or
distribution of your contribution.

3. With respect to any patents you own, or that you can license without payment
to any third party, you hereby grant to us a perpetual, irrevocable,
non-exclusive, worldwide, no-charge, royalty-free license to:

* make, have made, use, sell, offer to sell, import, and otherwise transfer
your contribution in whole or in part, alone or in combination with or
included in any product, work or materials arising out of the project to
which your contribution was submitted, and

* at our option, to sublicense these same rights to third parties through
multiple levels of sublicensees or other licensing arrangements.

4. Except as set out above, you keep all right, title, and interest in your
contribution. The rights that you grant to us under these terms are effective
on the date you first submitted a contribution to us, even if your submission
took place before the date you sign these terms.

5. You covenant, represent, warrant and agree that:

* each contribution that you submit is and shall be an original work of
authorship and you can legally grant the rights set out in this SCA;

* to the best of your knowledge, each contribution will not violate any
third party's copyrights, trademarks, patents, or other intellectual
property rights; and

* each contribution shall be in compliance with U.S. export control laws and
other applicable export and import laws. You agree to notify us if you
become aware of any circumstance which would make any of the foregoing
representations inaccurate in any respect. We may publicly disclose your
participation in the project, including the fact that you have signed the SCA.

6. This SCA is governed by the laws of the State of California and applicable
U.S. Federal law. Any choice of law rules will not apply.

7. Please place an “x” next to one of the applicable statements below. Please
do NOT mark both statements:

* [x] I am signing on behalf of myself as an individual and no other person
or entity, including my employer, has or will have rights with respect to my
contributions.

* [ ] I am signing on behalf of my employer or a legal entity and I have the
actual authority to contractually bind that entity.

## Contributor Details

| Field                          | Entry                |
|------------------------------- | -------------------- |
| Name                           | Al Johri             |
| Company name (if applicable)   | N/A                  |
| Title or role (if applicable)  | N/A                  |
| Date                           | December 27th, 2019  |
| GitHub username                | AlJohri              |
| Website (optional)             | http://aljohri.com/  |
106
.github/contributors/Olamyy.md
vendored
Normal file
@@ -0,0 +1,106 @@
# spaCy contributor agreement

This spaCy Contributor Agreement (**"SCA"**) is based on the
[Oracle Contributor Agreement](http://www.oracle.com/technetwork/oca-405177.pdf).
The SCA applies to any contribution that you make to any product or project
managed by us (the **"project"**), and sets out the intellectual property rights
you grant to us in the contributed materials. The term **"us"** shall mean
[ExplosionAI GmbH](https://explosion.ai/legal). The term
**"you"** shall mean the person or entity identified below.

If you agree to be bound by these terms, fill in the information requested
below and include the filled-in version with your first pull request, under the
folder [`.github/contributors/`](/.github/contributors/). The name of the file
should be your GitHub username, with the extension `.md`. For example, the user
example_user would create the file `.github/contributors/example_user.md`.

Read this agreement carefully before signing. These terms and conditions
constitute a binding legal agreement.

## Contributor Agreement

1. The term "contribution" or "contributed materials" means any source code,
object code, patch, tool, sample, graphic, specification, manual,
documentation, or any other material posted or submitted by you to the project.

2. With respect to any worldwide copyrights, or copyright applications and
registrations, in your contribution:

* you hereby assign to us joint ownership, and to the extent that such
assignment is or becomes invalid, ineffective or unenforceable, you hereby
grant to us a perpetual, irrevocable, non-exclusive, worldwide, no-charge,
royalty-free, unrestricted license to exercise all rights under those
copyrights. This includes, at our option, the right to sublicense these same
rights to third parties through multiple levels of sublicensees or other
licensing arrangements;

* you agree that each of us can do all things in relation to your
contribution as if each of us were the sole owners, and if one of us makes
a derivative work of your contribution, the one who makes the derivative
work (or has it made) will be the sole owner of that derivative work;

* you agree that you will not assert any moral rights in your contribution
against us, our licensees or transferees;

* you agree that we may register a copyright in your contribution and
exercise all ownership rights associated with it; and

* you agree that neither of us has any duty to consult with, obtain the
consent of, pay or render an accounting to the other for any use or
distribution of your contribution.

3. With respect to any patents you own, or that you can license without payment
to any third party, you hereby grant to us a perpetual, irrevocable,
non-exclusive, worldwide, no-charge, royalty-free license to:

* make, have made, use, sell, offer to sell, import, and otherwise transfer
your contribution in whole or in part, alone or in combination with or
included in any product, work or materials arising out of the project to
which your contribution was submitted, and

* at our option, to sublicense these same rights to third parties through
multiple levels of sublicensees or other licensing arrangements.

4. Except as set out above, you keep all right, title, and interest in your
contribution. The rights that you grant to us under these terms are effective
on the date you first submitted a contribution to us, even if your submission
took place before the date you sign these terms.

5. You covenant, represent, warrant and agree that:

* each contribution that you submit is and shall be an original work of
authorship and you can legally grant the rights set out in this SCA;

* to the best of your knowledge, each contribution will not violate any
third party's copyrights, trademarks, patents, or other intellectual
property rights; and

* each contribution shall be in compliance with U.S. export control laws and
other applicable export and import laws. You agree to notify us if you
become aware of any circumstance which would make any of the foregoing
representations inaccurate in any respect. We may publicly disclose your
participation in the project, including the fact that you have signed the SCA.

6. This SCA is governed by the laws of the State of California and applicable
U.S. Federal law. Any choice of law rules will not apply.

7. Please place an “x” next to one of the applicable statements below. Please
do NOT mark both statements:

* [x] I am signing on behalf of myself as an individual and no other person
or entity, including my employer, has or will have rights with respect to my
contributions.

* [ ] I am signing on behalf of my employer or a legal entity and I have the
actual authority to contractually bind that entity.

## Contributor Details

| Field                          | Entry                |
|------------------------------- | -------------------- |
| Name                           | Olamilekan Wahab     |
| Company name (if applicable)   |                      |
| Title or role (if applicable)  |                      |
| Date                           | 8/11/2019            |
| GitHub username                | Olamyy               |
| Website (optional)             |                      |
106
.github/contributors/iechevarria.md
vendored
Normal file
@@ -0,0 +1,106 @@
# spaCy contributor agreement

This spaCy Contributor Agreement (**"SCA"**) is based on the
[Oracle Contributor Agreement](http://www.oracle.com/technetwork/oca-405177.pdf).
The SCA applies to any contribution that you make to any product or project
managed by us (the **"project"**), and sets out the intellectual property rights
you grant to us in the contributed materials. The term **"us"** shall mean
[ExplosionAI GmbH](https://explosion.ai/legal). The term
**"you"** shall mean the person or entity identified below.

If you agree to be bound by these terms, fill in the information requested
below and include the filled-in version with your first pull request, under the
folder [`.github/contributors/`](/.github/contributors/). The name of the file
should be your GitHub username, with the extension `.md`. For example, the user
example_user would create the file `.github/contributors/example_user.md`.

Read this agreement carefully before signing. These terms and conditions
constitute a binding legal agreement.

## Contributor Agreement

1. The term "contribution" or "contributed materials" means any source code,
object code, patch, tool, sample, graphic, specification, manual,
documentation, or any other material posted or submitted by you to the project.

2. With respect to any worldwide copyrights, or copyright applications and
registrations, in your contribution:

* you hereby assign to us joint ownership, and to the extent that such
assignment is or becomes invalid, ineffective or unenforceable, you hereby
grant to us a perpetual, irrevocable, non-exclusive, worldwide, no-charge,
royalty-free, unrestricted license to exercise all rights under those
copyrights. This includes, at our option, the right to sublicense these same
rights to third parties through multiple levels of sublicensees or other
licensing arrangements;

* you agree that each of us can do all things in relation to your
contribution as if each of us were the sole owners, and if one of us makes
a derivative work of your contribution, the one who makes the derivative
work (or has it made) will be the sole owner of that derivative work;

* you agree that you will not assert any moral rights in your contribution
against us, our licensees or transferees;

* you agree that we may register a copyright in your contribution and
exercise all ownership rights associated with it; and

* you agree that neither of us has any duty to consult with, obtain the
consent of, pay or render an accounting to the other for any use or
distribution of your contribution.

3. With respect to any patents you own, or that you can license without payment
to any third party, you hereby grant to us a perpetual, irrevocable,
non-exclusive, worldwide, no-charge, royalty-free license to:

* make, have made, use, sell, offer to sell, import, and otherwise transfer
your contribution in whole or in part, alone or in combination with or
included in any product, work or materials arising out of the project to
which your contribution was submitted, and

* at our option, to sublicense these same rights to third parties through
multiple levels of sublicensees or other licensing arrangements.

4. Except as set out above, you keep all right, title, and interest in your
contribution. The rights that you grant to us under these terms are effective
on the date you first submitted a contribution to us, even if your submission
took place before the date you sign these terms.

5. You covenant, represent, warrant and agree that:

* each contribution that you submit is and shall be an original work of
authorship and you can legally grant the rights set out in this SCA;

* to the best of your knowledge, each contribution will not violate any
third party's copyrights, trademarks, patents, or other intellectual
property rights; and

* each contribution shall be in compliance with U.S. export control laws and
other applicable export and import laws. You agree to notify us if you
become aware of any circumstance which would make any of the foregoing
representations inaccurate in any respect. We may publicly disclose your
participation in the project, including the fact that you have signed the SCA.

6. This SCA is governed by the laws of the State of California and applicable
U.S. Federal law. Any choice of law rules will not apply.

7. Please place an “x” next to one of the applicable statements below. Please
do NOT mark both statements:

* [x] I am signing on behalf of myself as an individual and no other person
or entity, including my employer, has or will have rights with respect to my
contributions.

* [ ] I am signing on behalf of my employer or a legal entity and I have the
actual authority to contractually bind that entity.

## Contributor Details

| Field                          | Entry                 |
|------------------------------- | --------------------- |
| Name                           | Ivan Echevarria       |
| Company name (if applicable)   |                       |
| Title or role (if applicable)  |                       |
| Date                           | 2019-12-24            |
| GitHub username                | iechevarria           |
| Website (optional)             | https://echevarria.io |
106
.github/contributors/iurshina.md
vendored
Normal file
@@ -0,0 +1,106 @@
# spaCy contributor agreement

This spaCy Contributor Agreement (**"SCA"**) is based on the
[Oracle Contributor Agreement](http://www.oracle.com/technetwork/oca-405177.pdf).
The SCA applies to any contribution that you make to any product or project
managed by us (the **"project"**), and sets out the intellectual property rights
you grant to us in the contributed materials. The term **"us"** shall mean
[ExplosionAI GmbH](https://explosion.ai/legal). The term
**"you"** shall mean the person or entity identified below.

If you agree to be bound by these terms, fill in the information requested
below and include the filled-in version with your first pull request, under the
folder [`.github/contributors/`](/.github/contributors/). The name of the file
should be your GitHub username, with the extension `.md`. For example, the user
example_user would create the file `.github/contributors/example_user.md`.

Read this agreement carefully before signing. These terms and conditions
constitute a binding legal agreement.

## Contributor Agreement

1. The term "contribution" or "contributed materials" means any source code,
object code, patch, tool, sample, graphic, specification, manual,
documentation, or any other material posted or submitted by you to the project.

2. With respect to any worldwide copyrights, or copyright applications and
registrations, in your contribution:

* you hereby assign to us joint ownership, and to the extent that such
assignment is or becomes invalid, ineffective or unenforceable, you hereby
grant to us a perpetual, irrevocable, non-exclusive, worldwide, no-charge,
royalty-free, unrestricted license to exercise all rights under those
copyrights. This includes, at our option, the right to sublicense these same
rights to third parties through multiple levels of sublicensees or other
licensing arrangements;

* you agree that each of us can do all things in relation to your
contribution as if each of us were the sole owners, and if one of us makes
a derivative work of your contribution, the one who makes the derivative
work (or has it made) will be the sole owner of that derivative work;

* you agree that you will not assert any moral rights in your contribution
against us, our licensees or transferees;

* you agree that we may register a copyright in your contribution and
exercise all ownership rights associated with it; and

* you agree that neither of us has any duty to consult with, obtain the
consent of, pay or render an accounting to the other for any use or
distribution of your contribution.

3. With respect to any patents you own, or that you can license without payment
to any third party, you hereby grant to us a perpetual, irrevocable,
non-exclusive, worldwide, no-charge, royalty-free license to:

* make, have made, use, sell, offer to sell, import, and otherwise transfer
your contribution in whole or in part, alone or in combination with or
included in any product, work or materials arising out of the project to
which your contribution was submitted, and

* at our option, to sublicense these same rights to third parties through
multiple levels of sublicensees or other licensing arrangements.

4. Except as set out above, you keep all right, title, and interest in your
contribution. The rights that you grant to us under these terms are effective
on the date you first submitted a contribution to us, even if your submission
took place before the date you sign these terms.

5. You covenant, represent, warrant and agree that:

* each contribution that you submit is and shall be an original work of
authorship and you can legally grant the rights set out in this SCA;

* to the best of your knowledge, each contribution will not violate any
third party's copyrights, trademarks, patents, or other intellectual
property rights; and

* each contribution shall be in compliance with U.S. export control laws and
other applicable export and import laws. You agree to notify us if you
become aware of any circumstance which would make any of the foregoing
representations inaccurate in any respect. We may publicly disclose your
participation in the project, including the fact that you have signed the SCA.

6. This SCA is governed by the laws of the State of California and applicable
U.S. Federal law. Any choice of law rules will not apply.

7. Please place an “x” next to one of the applicable statements below. Please
do NOT mark both statements:

* [ ] I am signing on behalf of myself as an individual and no other person
or entity, including my employer, has or will have rights with respect to my
contributions.

* [ ] I am signing on behalf of my employer or a legal entity and I have the
actual authority to contractually bind that entity.

## Contributor Details

| Field                          | Entry                |
|------------------------------- | -------------------- |
| Name                           | Anastasiia Iurshina  |
| Company name (if applicable)   |                      |
| Title or role (if applicable)  |                      |
| Date                           | 28.12.2019           |
| GitHub username                | iurshina             |
| Website (optional)             |                      |
@@ -30,7 +30,7 @@ S[:i] -> T[:j] (at D[i,j])
 S[:i+1] -> T[:j] (at D[i+1,j])
 S[:i] -> T[:j+1] (at D[i,j+1])

-Further, we now we can tranform:
+Further, now we can transform:
 S[:i+1] -> S[:i] (DEL) for 1,
 T[:j+1] -> T[:j] (INS) for 1.
 S[i+1] -> T[j+1] (SUB) for 0 or 1
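The recurrence documented in this hunk (delete for 1, insert for 1, substitute for 0 or 1) is the standard Levenshtein dynamic program. As an illustrative pure-Python sketch, not the project's actual implementation:

```python
def edit_distance(S, T):
    # D[i][j] = minimum cost of transforming S[:i] into T[:j]
    D = [[0] * (len(T) + 1) for _ in range(len(S) + 1)]
    for i in range(len(S) + 1):
        D[i][0] = i  # delete all of S[:i]
    for j in range(len(T) + 1):
        D[0][j] = j  # insert all of T[:j]
    for i in range(len(S)):
        for j in range(len(T)):
            sub = 0 if S[i] == T[j] else 1  # S[i+1] -> T[j+1] (SUB) for 0 or 1
            D[i + 1][j + 1] = min(
                D[i][j + 1] + 1,  # S[:i+1] -> S[:i] (DEL) for 1
                D[i + 1][j] + 1,  # T[:j+1] -> T[:j] (INS) for 1
                D[i][j] + sub,
            )
    return D[len(S)][len(T)]
```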
@@ -55,9 +55,10 @@ def render(
     html = RENDER_WRAPPER(html)
     if jupyter or (jupyter is None and is_in_jupyter()):
         # return HTML rendered by IPython display()
+        # See #4840 for details on span wrapper to disable mathjax
         from IPython.core.display import display, HTML

-        return display(HTML(html))
+        return display(HTML('<span class="tex2jax_ignore">{}</span>'.format(html)))
     return html
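The change wraps the rendered markup in a span with the `tex2jax_ignore` class before handing it to IPython's `display()`, so MathJax skips the visualization (issue #4840). A minimal stand-alone sketch of the wrapping itself; the `html` value here is a stand-in, not real displaCy output:

```python
# Stand-in for markup produced by the renderer
html = "<div class='entities'>Barack Obama</div>"

# MathJax is configured to skip elements with the tex2jax_ignore class,
# so wrapping the output prevents it from mangling entity markup.
wrapped = '<span class="tex2jax_ignore">{}</span>'.format(html)
```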
@@ -78,10 +78,9 @@ class Warnings(object):
     W015 = ("As of v2.1.0, the use of keyword arguments to exclude fields from "
             "being serialized or deserialized is deprecated. Please use the "
             "`exclude` argument instead. For example: exclude=['{arg}'].")
-    W016 = ("The keyword argument `n_threads` on the is now deprecated, as "
-            "the v2.x models cannot release the global interpreter lock. "
-            "Future versions may introduce a `n_process` argument for "
-            "parallel inference via multiprocessing.")
+    W016 = ("The keyword argument `n_threads` is now deprecated. As of v2.2.2, "
+            "the argument `n_process` controls parallel inference via "
+            "multiprocessing.")
     W017 = ("Alias '{alias}' already exists in the Knowledge Base.")
     W018 = ("Entity '{entity}' already exists in the Knowledge Base - "
             "ignoring the duplicate entry.")
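The reworded W016 steers callers from the old `n_threads` keyword to `n_process`. The usual shape of such a keyword deprecation is to accept the old argument, warn, and ignore it; a hedged sketch of that pattern (this is illustrative, not spaCy's code):

```python
import warnings

def pipe(texts, n_threads=None, n_process=1):
    # Accept the legacy `n_threads` keyword for compatibility, but warn
    # and ignore it in favour of `n_process`.
    if n_threads is not None:
        warnings.warn(
            "The keyword argument `n_threads` is now deprecated. As of v2.2.2, "
            "the argument `n_process` controls parallel inference via "
            "multiprocessing.",
            DeprecationWarning,
        )
    for text in texts:
        yield text.lower()  # stand-in for real processing

with warnings.catch_warnings(record=True) as caught:
    warnings.simplefilter("always")
    docs = list(pipe(["A", "B"], n_threads=4))
```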
@@ -105,6 +104,10 @@ class Warnings(object):
     W025 = ("'{name}' requires '{attr}' to be assigned, but none of the "
             "previous components in the pipeline declare that they assign it.")
     W026 = ("Unable to set all sentence boundaries from dependency parses.")
+    W027 = ("Found a large training file of {size} bytes. Note that it may "
+            "be more efficient to split your training data into multiple "
+            "smaller JSON files instead.")


 @add_codes
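The new W027 warning fires when a training file exceeds 1 GiB (`2 ** 30` bytes). A minimal stand-alone sketch of the same threshold check, assuming a plain `warnings.warn` in place of spaCy's `user_warning` helper:

```python
import warnings

ONE_GIB = 2 ** 30

def check_training_file_size(num_bytes):
    # Warn when a training file crosses 1 GiB; splitting the data into
    # several smaller JSON files is usually more efficient to process.
    if num_bytes > ONE_GIB:
        warnings.warn(
            "Found a large training file of {} bytes. Note that it may "
            "be more efficient to split your training data into multiple "
            "smaller JSON files instead.".format(num_bytes)
        )
        return True
    return False
```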
@@ -13,7 +13,7 @@ import srsly

 from .syntax import nonproj
 from .tokens import Doc, Span
-from .errors import Errors, AlignmentError
+from .errors import Errors, AlignmentError, user_warning, Warnings
 from .compat import path2str
 from . import util
 from .util import minibatch, itershuffle
@@ -557,12 +557,16 @@ def _json_iterate(loc):
     loc = util.ensure_path(loc)
     with loc.open("rb") as file_:
         py_raw = file_.read()
+    cdef long file_length = len(py_raw)
+    if file_length > 2 ** 30:
+        user_warning(Warnings.W027.format(size=file_length))
+
     raw = <char*>py_raw
     cdef int square_depth = 0
     cdef int curly_depth = 0
     cdef int inside_string = 0
     cdef int escape = 0
-    cdef int start = -1
+    cdef long start = -1
     cdef char c
     cdef char quote = ord('"')
     cdef char backslash = ord("\\")
@@ -570,7 +574,7 @@ def _json_iterate(loc):
     cdef char close_square = ord("]")
     cdef char open_curly = ord("{")
     cdef char close_curly = ord("}")
-    for i in range(len(py_raw)):
+    for i in range(file_length):
         c = raw[i]
         if escape:
             escape = False
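The two hunks above widen `start` and the loop bound to `long` so `_json_iterate` can scan files past the 2 GiB `int` range. The scan itself balances brackets while skipping string literals and escapes to find each top-level JSON object. A pure-Python sketch of that depth-tracking scan (the Cython original works on a `char*` for speed; this version is illustrative only):

```python
def json_iterate(py_raw):
    # Yield each top-level JSON object in a byte string by balancing curly
    # braces, ignoring braces and quotes that occur inside string literals
    # or after a backslash escape.
    text = py_raw.decode("utf8")
    curly_depth = 0
    inside_string = False
    escape = False
    start = -1
    for i, c in enumerate(text):
        if escape:
            escape = False
            continue
        if c == "\\":
            escape = True
            continue
        if c == '"':
            inside_string = not inside_string
            continue
        if inside_string:
            continue
        if c == "{":
            if curly_depth == 0:
                start = i
            curly_depth += 1
        elif c == "}":
            curly_depth -= 1
            if curly_depth == 0:
                yield text[start : i + 1]
```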
@@ -4249,20 +4249,20 @@ TAG_MAP = {
         "Voice": "Act",
         "Case": "Nom|Gen|Dat|Acc|Voc",
     },
-    'ADJ': {POS: ADJ},
-    'ADP': {POS: ADP},
-    'ADV': {POS: ADV},
-    'AtDf': {POS: DET},
-    'AUX': {POS: AUX},
-    'CCONJ': {POS: CCONJ},
-    'DET': {POS: DET},
-    'NOUN': {POS: NOUN},
-    'NUM': {POS: NUM},
-    'PART': {POS: PART},
-    'PRON': {POS: PRON},
-    'PROPN': {POS: PROPN},
-    'SCONJ': {POS: SCONJ},
-    'SYM': {POS: SYM},
-    'VERB': {POS: VERB},
-    'X': {POS: X},
+    "ADJ": {POS: ADJ},
+    "ADP": {POS: ADP},
+    "ADV": {POS: ADV},
+    "AtDf": {POS: DET},
+    "AUX": {POS: AUX},
+    "CCONJ": {POS: CCONJ},
+    "DET": {POS: DET},
+    "NOUN": {POS: NOUN},
+    "NUM": {POS: NUM},
+    "PART": {POS: PART},
+    "PRON": {POS: PRON},
+    "PROPN": {POS: PROPN},
+    "SCONJ": {POS: SCONJ},
+    "SYM": {POS: SYM},
+    "VERB": {POS: VERB},
+    "X": {POS: X},
 }
@@ -16,7 +16,8 @@ from ...util import DummyTokenizer
 # the flow by creating a dummy with the same interface.
 DummyNode = namedtuple("DummyNode", ["surface", "pos", "feature"])
 DummyNodeFeatures = namedtuple("DummyNodeFeatures", ["lemma"])
-DummySpace = DummyNode(' ', ' ', DummyNodeFeatures(' '))
+DummySpace = DummyNode(" ", " ", DummyNodeFeatures(" "))


 def try_fugashi_import():
     """Fugashi is required for Japanese support, so check for it.
@ -27,8 +28,7 @@ def try_fugashi_import():
|
||||||
return fugashi
|
return fugashi
|
||||||
except ImportError:
|
except ImportError:
|
||||||
raise ImportError(
|
raise ImportError(
|
||||||
"Japanese support requires Fugashi: "
|
"Japanese support requires Fugashi: " "https://github.com/polm/fugashi"
|
||||||
"https://github.com/polm/fugashi"
|
|
||||||
)
|
)
|
||||||
|
|
||||||
|
|
||||||
|
@ -55,13 +55,14 @@ def resolve_pos(token):
|
||||||
return token.pos + ",ADJ"
|
return token.pos + ",ADJ"
|
||||||
return token.pos
|
return token.pos
|
||||||
|
|
||||||
|
|
||||||
def get_words_and_spaces(tokenizer, text):
|
def get_words_and_spaces(tokenizer, text):
|
||||||
"""Get the individual tokens that make up the sentence and handle white space.
|
"""Get the individual tokens that make up the sentence and handle white space.
|
||||||
|
|
||||||
Japanese doesn't usually use white space, and MeCab's handling of it for
|
Japanese doesn't usually use white space, and MeCab's handling of it for
|
||||||
multiple spaces in a row is somewhat awkward.
|
multiple spaces in a row is somewhat awkward.
|
||||||
"""
|
"""
|
||||||
|
|
||||||
tokens = tokenizer.parseToNodeList(text)
|
tokens = tokenizer.parseToNodeList(text)
|
||||||
|
|
||||||
words = []
|
words = []
|
||||||
|
@ -76,6 +77,7 @@ def get_words_and_spaces(tokenizer, text):
|
||||||
spaces.append(bool(token.white_space))
|
spaces.append(bool(token.white_space))
|
||||||
return words, spaces
|
return words, spaces
|
||||||
|
|
||||||
|
|
||||||
class JapaneseTokenizer(DummyTokenizer):
|
class JapaneseTokenizer(DummyTokenizer):
|
||||||
def __init__(self, cls, nlp=None):
|
def __init__(self, cls, nlp=None):
|
||||||
self.vocab = nlp.vocab if nlp is not None else cls.create_vocab(nlp)
|
self.vocab = nlp.vocab if nlp is not None else cls.create_vocab(nlp)
|
||||||
|
|
|
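The Japanese hunks above revolve around `DummyNode`/`DummySpace`: namedtuples that mimic the attribute interface of a fugashi MeCab node, so whitespace can be handled like any other token. A minimal self-contained sketch of how that dummy behaves (same namedtuple shapes as in the diff; no fugashi required):

```python
from collections import namedtuple

# Same shapes as in the diff: a stand-in exposing the attributes a fugashi
# node would (surface, pos, feature.lemma), so downstream code can append a
# "space token" without special-casing it.
DummyNode = namedtuple("DummyNode", ["surface", "pos", "feature"])
DummyNodeFeatures = namedtuple("DummyNodeFeatures", ["lemma"])
DummySpace = DummyNode(" ", " ", DummyNodeFeatures(" "))

surfaces = [node.surface for node in [DummyNode("猫", "名詞", DummyNodeFeatures("猫")), DummySpace]]
```

Because the dummy quacks like a real node, `get_words_and_spaces` can mix it freely into the token list.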
@@ -1,8 +1,7 @@
 # coding: utf8
 from __future__ import unicode_literals

-from ..char_classes import LIST_ELLIPSES, LIST_ICONS
-from ..char_classes import CONCAT_QUOTES, ALPHA, ALPHA_LOWER, ALPHA_UPPER
+from ..char_classes import LIST_ELLIPSES, LIST_ICONS, ALPHA, ALPHA_LOWER, ALPHA_UPPER

 ELISION = " ' ’ ".strip().replace(" ", "")
@@ -20,7 +20,7 @@ for exc_data in [
     {ORTH: "asw.", LEMMA: "an sou weider", NORM: "an sou weider"},
     {ORTH: "etc.", LEMMA: "et cetera", NORM: "et cetera"},
     {ORTH: "bzw.", LEMMA: "bezéiungsweis", NORM: "bezéiungsweis"},
-    {ORTH: "Jan.", LEMMA: "Januar", NORM: "Januar"}
+    {ORTH: "Jan.", LEMMA: "Januar", NORM: "Januar"},
 ]:
     _exc[exc_data[ORTH]] = [exc_data]
@@ -467,38 +467,110 @@ TAG_MAP = {
     "VERB__VerbForm=Part": {"morph": "VerbForm=Part", POS: VERB},
     "VERB___": {"morph": "_", POS: VERB},
     "X___": {"morph": "_", POS: X},
-    'CCONJ___': {"morph": "_", POS: CCONJ},
+    "CCONJ___": {"morph": "_", POS: CCONJ},
     "ADJ__Abbr=Yes": {"morph": "Abbr=Yes", POS: ADJ},
     "ADJ__Abbr=Yes|Degree=Pos": {"morph": "Abbr=Yes|Degree=Pos", POS: ADJ},
-    "ADJ__Case=Gen|Definite=Def|Number=Sing|VerbForm=Part": {"morph": "Case=Gen|Definite=Def|Number=Sing|VerbForm=Part", POS: ADJ},
-    "ADJ__Definite=Def|Number=Sing|VerbForm=Part": {"morph": "Definite=Def|Number=Sing|VerbForm=Part", POS: ADJ},
-    "ADJ__Definite=Ind|Gender=Masc|Number=Sing|VerbForm=Part": {"morph": "Definite=Ind|Gender=Masc|Number=Sing|VerbForm=Part", POS: ADJ},
-    "ADJ__Definite=Ind|Gender=Neut|Number=Sing|VerbForm=Part": {"morph": "Definite=Ind|Gender=Neut|Number=Sing|VerbForm=Part", POS: ADJ},
-    "ADJ__Definite=Ind|Number=Sing|VerbForm=Part": {"morph": "Definite=Ind|Number=Sing|VerbForm=Part", POS: ADJ},
+    "ADJ__Case=Gen|Definite=Def|Number=Sing|VerbForm=Part": {
+        "morph": "Case=Gen|Definite=Def|Number=Sing|VerbForm=Part",
+        POS: ADJ,
+    },
+    "ADJ__Definite=Def|Number=Sing|VerbForm=Part": {
+        "morph": "Definite=Def|Number=Sing|VerbForm=Part",
+        POS: ADJ,
+    },
+    "ADJ__Definite=Ind|Gender=Masc|Number=Sing|VerbForm=Part": {
+        "morph": "Definite=Ind|Gender=Masc|Number=Sing|VerbForm=Part",
+        POS: ADJ,
+    },
+    "ADJ__Definite=Ind|Gender=Neut|Number=Sing|VerbForm=Part": {
+        "morph": "Definite=Ind|Gender=Neut|Number=Sing|VerbForm=Part",
+        POS: ADJ,
+    },
+    "ADJ__Definite=Ind|Number=Sing|VerbForm=Part": {
+        "morph": "Definite=Ind|Number=Sing|VerbForm=Part",
+        POS: ADJ,
+    },
     "ADJ__Number=Sing|VerbForm=Part": {"morph": "Number=Sing|VerbForm=Part", POS: ADJ},
     "ADJ__VerbForm=Part": {"morph": "VerbForm=Part", POS: ADJ},
     "ADP__Abbr=Yes": {"morph": "Abbr=Yes", POS: ADP},
     "ADV__Abbr=Yes": {"morph": "Abbr=Yes", POS: ADV},
-    "DET__Case=Gen|Gender=Masc|Number=Sing|PronType=Art": {"morph": "Case=Gen|Gender=Masc|Number=Sing|PronType=Art", POS: DET},
-    "DET__Case=Gen|Number=Plur|PronType=Tot": {"morph": "Case=Gen|Number=Plur|PronType=Tot", POS: DET},
+    "DET__Case=Gen|Gender=Masc|Number=Sing|PronType=Art": {
+        "morph": "Case=Gen|Gender=Masc|Number=Sing|PronType=Art",
+        POS: DET,
+    },
+    "DET__Case=Gen|Number=Plur|PronType=Tot": {
+        "morph": "Case=Gen|Number=Plur|PronType=Tot",
+        POS: DET,
+    },
     "DET__Definite=Def|PronType=Prs": {"morph": "Definite=Def|PronType=Prs", POS: DET},
-    "DET__Definite=Ind|Gender=Fem|Number=Sing|PronType=Prs": {"morph": "Definite=Ind|Gender=Fem|Number=Sing|PronType=Prs", POS: DET},
-    "DET__Definite=Ind|Gender=Masc|Number=Sing|PronType=Prs": {"morph": "Definite=Ind|Gender=Masc|Number=Sing|PronType=Prs", POS: DET},
-    "DET__Definite=Ind|Gender=Neut|Number=Sing|PronType=Prs": {"morph": "Definite=Ind|Gender=Neut|Number=Sing|PronType=Prs", POS: DET},
-    "DET__Gender=Fem|Number=Sing|PronType=Art": {"morph": "Gender=Fem|Number=Sing|PronType=Art", POS: DET},
-    "DET__Gender=Fem|Number=Sing|PronType=Ind": {"morph": "Gender=Fem|Number=Sing|PronType=Ind", POS: DET},
-    "DET__Gender=Fem|Number=Sing|PronType=Prs": {"morph": "Gender=Fem|Number=Sing|PronType=Prs", POS: DET},
-    "DET__Gender=Fem|Number=Sing|PronType=Tot": {"morph": "Gender=Fem|Number=Sing|PronType=Tot", POS: DET},
-    "DET__Gender=Masc|Number=Sing|Polarity=Neg|PronType=Neg": {"morph": "Gender=Masc|Number=Sing|Polarity=Neg|PronType=Neg", POS: DET},
-    "DET__Gender=Masc|Number=Sing|PronType=Art": {"morph": "Gender=Masc|Number=Sing|PronType=Art", POS: DET},
-    "DET__Gender=Masc|Number=Sing|PronType=Ind": {"morph": "Gender=Masc|Number=Sing|PronType=Ind", POS: DET},
-    "DET__Gender=Masc|Number=Sing|PronType=Tot": {"morph": "Gender=Masc|Number=Sing|PronType=Tot", POS: DET},
-    "DET__Gender=Neut|Number=Sing|Polarity=Neg|PronType=Neg": {"morph": "Gender=Neut|Number=Sing|Polarity=Neg|PronType=Neg", POS: DET},
-    "DET__Gender=Neut|Number=Sing|PronType=Art": {"morph": "Gender=Neut|Number=Sing|PronType=Art", POS: DET},
-    "DET__Gender=Neut|Number=Sing|PronType=Dem,Ind": {"morph": "Gender=Neut|Number=Sing|PronType=Dem,Ind", POS: DET},
-    "DET__Gender=Neut|Number=Sing|PronType=Ind": {"morph": "Gender=Neut|Number=Sing|PronType=Ind", POS: DET},
-    "DET__Gender=Neut|Number=Sing|PronType=Tot": {"morph": "Gender=Neut|Number=Sing|PronType=Tot", POS: DET},
-    "DET__Number=Plur|Polarity=Neg|PronType=Neg": {"morph": "Number=Plur|Polarity=Neg|PronType=Neg", POS: DET},
+    "DET__Definite=Ind|Gender=Fem|Number=Sing|PronType=Prs": {
+        "morph": "Definite=Ind|Gender=Fem|Number=Sing|PronType=Prs",
+        POS: DET,
+    },
+    "DET__Definite=Ind|Gender=Masc|Number=Sing|PronType=Prs": {
+        "morph": "Definite=Ind|Gender=Masc|Number=Sing|PronType=Prs",
+        POS: DET,
+    },
+    "DET__Definite=Ind|Gender=Neut|Number=Sing|PronType=Prs": {
+        "morph": "Definite=Ind|Gender=Neut|Number=Sing|PronType=Prs",
+        POS: DET,
+    },
+    "DET__Gender=Fem|Number=Sing|PronType=Art": {
+        "morph": "Gender=Fem|Number=Sing|PronType=Art",
+        POS: DET,
+    },
+    "DET__Gender=Fem|Number=Sing|PronType=Ind": {
+        "morph": "Gender=Fem|Number=Sing|PronType=Ind",
+        POS: DET,
+    },
+    "DET__Gender=Fem|Number=Sing|PronType=Prs": {
+        "morph": "Gender=Fem|Number=Sing|PronType=Prs",
+        POS: DET,
+    },
+    "DET__Gender=Fem|Number=Sing|PronType=Tot": {
+        "morph": "Gender=Fem|Number=Sing|PronType=Tot",
+        POS: DET,
+    },
+    "DET__Gender=Masc|Number=Sing|Polarity=Neg|PronType=Neg": {
+        "morph": "Gender=Masc|Number=Sing|Polarity=Neg|PronType=Neg",
+        POS: DET,
+    },
+    "DET__Gender=Masc|Number=Sing|PronType=Art": {
+        "morph": "Gender=Masc|Number=Sing|PronType=Art",
+        POS: DET,
+    },
+    "DET__Gender=Masc|Number=Sing|PronType=Ind": {
+        "morph": "Gender=Masc|Number=Sing|PronType=Ind",
+        POS: DET,
+    },
+    "DET__Gender=Masc|Number=Sing|PronType=Tot": {
+        "morph": "Gender=Masc|Number=Sing|PronType=Tot",
+        POS: DET,
+    },
+    "DET__Gender=Neut|Number=Sing|Polarity=Neg|PronType=Neg": {
+        "morph": "Gender=Neut|Number=Sing|Polarity=Neg|PronType=Neg",
+        POS: DET,
+    },
+    "DET__Gender=Neut|Number=Sing|PronType=Art": {
+        "morph": "Gender=Neut|Number=Sing|PronType=Art",
+        POS: DET,
+    },
+    "DET__Gender=Neut|Number=Sing|PronType=Dem,Ind": {
+        "morph": "Gender=Neut|Number=Sing|PronType=Dem,Ind",
+        POS: DET,
+    },
+    "DET__Gender=Neut|Number=Sing|PronType=Ind": {
+        "morph": "Gender=Neut|Number=Sing|PronType=Ind",
+        POS: DET,
+    },
+    "DET__Gender=Neut|Number=Sing|PronType=Tot": {
+        "morph": "Gender=Neut|Number=Sing|PronType=Tot",
+        POS: DET,
+    },
+    "DET__Number=Plur|Polarity=Neg|PronType=Neg": {
+        "morph": "Number=Plur|Polarity=Neg|PronType=Neg",
+        POS: DET,
+    },
     "DET__Number=Plur|PronType=Art": {"morph": "Number=Plur|PronType=Art", POS: DET},
     "DET__Number=Plur|PronType=Ind": {"morph": "Number=Plur|PronType=Ind", POS: DET},
     "DET__Number=Plur|PronType=Prs": {"morph": "Number=Plur|PronType=Prs", POS: DET},
@@ -507,57 +579,183 @@ TAG_MAP = {
     "DET__PronType=Prs": {"morph": "PronType=Prs", POS: DET},
     "NOUN__Abbr=Yes": {"morph": "Abbr=Yes", POS: NOUN},
     "NOUN__Abbr=Yes|Case=Gen": {"morph": "Abbr=Yes|Case=Gen", POS: NOUN},
-    "NOUN__Abbr=Yes|Definite=Def,Ind|Gender=Masc|Number=Plur,Sing": {"morph": "Abbr=Yes|Definite=Def,Ind|Gender=Masc|Number=Plur,Sing", POS: NOUN},
-    "NOUN__Abbr=Yes|Definite=Def,Ind|Gender=Masc|Number=Sing": {"morph": "Abbr=Yes|Definite=Def,Ind|Gender=Masc|Number=Sing", POS: NOUN},
-    "NOUN__Abbr=Yes|Definite=Def,Ind|Gender=Neut|Number=Plur,Sing": {"morph": "Abbr=Yes|Definite=Def,Ind|Gender=Neut|Number=Plur,Sing", POS: NOUN},
+    "NOUN__Abbr=Yes|Definite=Def,Ind|Gender=Masc|Number=Plur,Sing": {
+        "morph": "Abbr=Yes|Definite=Def,Ind|Gender=Masc|Number=Plur,Sing",
+        POS: NOUN,
+    },
+    "NOUN__Abbr=Yes|Definite=Def,Ind|Gender=Masc|Number=Sing": {
+        "morph": "Abbr=Yes|Definite=Def,Ind|Gender=Masc|Number=Sing",
+        POS: NOUN,
+    },
+    "NOUN__Abbr=Yes|Definite=Def,Ind|Gender=Neut|Number=Plur,Sing": {
+        "morph": "Abbr=Yes|Definite=Def,Ind|Gender=Neut|Number=Plur,Sing",
+        POS: NOUN,
+    },
     "NOUN__Abbr=Yes|Gender=Masc": {"morph": "Abbr=Yes|Gender=Masc", POS: NOUN},
-    "NUM__Case=Gen|Number=Plur|NumType=Card": {"morph": "Case=Gen|Number=Plur|NumType=Card", POS: NUM},
-    "NUM__Definite=Def|Number=Sing|NumType=Card": {"morph": "Definite=Def|Number=Sing|NumType=Card", POS: NUM},
+    "NUM__Case=Gen|Number=Plur|NumType=Card": {
+        "morph": "Case=Gen|Number=Plur|NumType=Card",
+        POS: NUM,
+    },
+    "NUM__Definite=Def|Number=Sing|NumType=Card": {
+        "morph": "Definite=Def|Number=Sing|NumType=Card",
+        POS: NUM,
+    },
     "NUM__Definite=Def|NumType=Card": {"morph": "Definite=Def|NumType=Card", POS: NUM},
-    "NUM__Gender=Fem|Number=Sing|NumType=Card": {"morph": "Gender=Fem|Number=Sing|NumType=Card", POS: NUM},
-    "NUM__Gender=Masc|Number=Sing|NumType=Card": {"morph": "Gender=Masc|Number=Sing|NumType=Card", POS: NUM},
-    "NUM__Gender=Neut|Number=Sing|NumType=Card": {"morph": "Gender=Neut|Number=Sing|NumType=Card", POS: NUM},
+    "NUM__Gender=Fem|Number=Sing|NumType=Card": {
+        "morph": "Gender=Fem|Number=Sing|NumType=Card",
+        POS: NUM,
+    },
+    "NUM__Gender=Masc|Number=Sing|NumType=Card": {
+        "morph": "Gender=Masc|Number=Sing|NumType=Card",
+        POS: NUM,
+    },
+    "NUM__Gender=Neut|Number=Sing|NumType=Card": {
+        "morph": "Gender=Neut|Number=Sing|NumType=Card",
+        POS: NUM,
+    },
     "NUM__Number=Plur|NumType=Card": {"morph": "Number=Plur|NumType=Card", POS: NUM},
     "NUM__Number=Sing|NumType=Card": {"morph": "Number=Sing|NumType=Card", POS: NUM},
     "NUM__NumType=Card": {"morph": "NumType=Card", POS: NUM},
     "PART__Polarity=Neg": {"morph": "Polarity=Neg", POS: PART},
-    "PRON__Animacy=Hum|Case=Acc|Gender=Fem|Number=Sing|Person=3|PronType=Prs": { "morph": "Animacy=Hum|Case=Acc|Gender=Fem|Number=Sing|Person=3|PronType=Prs", POS: PRON},
-    "PRON__Animacy=Hum|Case=Acc|Gender=Masc|Number=Sing|Person=3|PronType=Prs": { "morph": "Animacy=Hum|Case=Acc|Gender=Masc|Number=Sing|Person=3|PronType=Prs", POS: PRON},
-    "PRON__Animacy=Hum|Case=Acc|Number=Plur|Person=1|PronType=Prs": {"morph": "Animacy=Hum|Case=Acc|Number=Plur|Person=1|PronType=Prs", POS: PRON},
-    "PRON__Animacy=Hum|Case=Acc|Number=Plur|Person=2|PronType=Prs": {"morph": "Animacy=Hum|Case=Acc|Number=Plur|Person=2|PronType=Prs", POS: PRON},
-    "PRON__Animacy=Hum|Case=Acc|Number=Sing|Person=1|PronType=Prs": {"morph": "Animacy=Hum|Case=Acc|Number=Sing|Person=1|PronType=Prs", POS: PRON},
-    "PRON__Animacy=Hum|Case=Acc|Number=Sing|Person=2|PronType=Prs": {"morph": "Animacy=Hum|Case=Acc|Number=Sing|Person=2|PronType=Prs", POS: PRON},
-    "PRON__Animacy=Hum|Case=Gen,Nom|Number=Sing|PronType=Art,Prs": {"morph": "Animacy=Hum|Case=Gen,Nom|Number=Sing|PronType=Art,Prs", POS: PRON},
-    "PRON__Animacy=Hum|Case=Gen|Number=Sing|PronType=Art,Prs": {"morph": "Animacy=Hum|Case=Gen|Number=Sing|PronType=Art,Prs", POS: PRON},
-    "PRON__Animacy=Hum|Case=Nom|Gender=Fem|Number=Sing|Person=3|PronType=Prs": { "morph": "Animacy=Hum|Case=Nom|Gender=Fem|Number=Sing|Person=3|PronType=Prs", POS: PRON},
-    "PRON__Animacy=Hum|Case=Nom|Gender=Masc|Number=Sing|Person=3|PronType=Prs": { "morph": "Animacy=Hum|Case=Nom|Gender=Masc|Number=Sing|Person=3|PronType=Prs", POS: PRON},
-    "PRON__Animacy=Hum|Case=Nom|Number=Plur|Person=1|PronType=Prs": {"morph": "Animacy=Hum|Case=Nom|Number=Plur|Person=1|PronType=Prs", POS: PRON},
-    "PRON__Animacy=Hum|Case=Nom|Number=Plur|Person=2|PronType=Prs": {"morph": "Animacy=Hum|Case=Nom|Number=Plur|Person=2|PronType=Prs", POS: PRON},
-    "PRON__Animacy=Hum|Case=Nom|Number=Sing|Person=1|PronType=Prs": {"morph": "Animacy=Hum|Case=Nom|Number=Sing|Person=1|PronType=Prs", POS: PRON},
-    "PRON__Animacy=Hum|Case=Nom|Number=Sing|Person=2|PronType=Prs": {"morph": "Animacy=Hum|Case=Nom|Number=Sing|Person=2|PronType=Prs", POS: PRON},
-    "PRON__Animacy=Hum|Case=Nom|Number=Sing|PronType=Prs": {"morph": "Animacy=Hum|Case=Nom|Number=Sing|PronType=Prs", POS: PRON},
-    "PRON__Animacy=Hum|Number=Plur|PronType=Rcp": {"morph": "Animacy=Hum|Number=Plur|PronType=Rcp", POS: PRON},
-    "PRON__Animacy=Hum|Number=Sing|PronType=Art,Prs": {"morph": "Animacy=Hum|Number=Sing|PronType=Art,Prs", POS: PRON},
-    "PRON__Animacy=Hum|Poss=Yes|PronType=Int": {"morph": "Animacy=Hum|Poss=Yes|PronType=Int", POS: PRON},
+    "PRON__Animacy=Hum|Case=Acc|Gender=Fem|Number=Sing|Person=3|PronType=Prs": {
+        "morph": "Animacy=Hum|Case=Acc|Gender=Fem|Number=Sing|Person=3|PronType=Prs",
+        POS: PRON,
+    },
+    "PRON__Animacy=Hum|Case=Acc|Gender=Masc|Number=Sing|Person=3|PronType=Prs": {
+        "morph": "Animacy=Hum|Case=Acc|Gender=Masc|Number=Sing|Person=3|PronType=Prs",
+        POS: PRON,
+    },
+    "PRON__Animacy=Hum|Case=Acc|Number=Plur|Person=1|PronType=Prs": {
+        "morph": "Animacy=Hum|Case=Acc|Number=Plur|Person=1|PronType=Prs",
+        POS: PRON,
+    },
+    "PRON__Animacy=Hum|Case=Acc|Number=Plur|Person=2|PronType=Prs": {
+        "morph": "Animacy=Hum|Case=Acc|Number=Plur|Person=2|PronType=Prs",
+        POS: PRON,
+    },
+    "PRON__Animacy=Hum|Case=Acc|Number=Sing|Person=1|PronType=Prs": {
+        "morph": "Animacy=Hum|Case=Acc|Number=Sing|Person=1|PronType=Prs",
+        POS: PRON,
+    },
+    "PRON__Animacy=Hum|Case=Acc|Number=Sing|Person=2|PronType=Prs": {
+        "morph": "Animacy=Hum|Case=Acc|Number=Sing|Person=2|PronType=Prs",
+        POS: PRON,
+    },
+    "PRON__Animacy=Hum|Case=Gen,Nom|Number=Sing|PronType=Art,Prs": {
+        "morph": "Animacy=Hum|Case=Gen,Nom|Number=Sing|PronType=Art,Prs",
+        POS: PRON,
+    },
+    "PRON__Animacy=Hum|Case=Gen|Number=Sing|PronType=Art,Prs": {
+        "morph": "Animacy=Hum|Case=Gen|Number=Sing|PronType=Art,Prs",
+        POS: PRON,
+    },
+    "PRON__Animacy=Hum|Case=Nom|Gender=Fem|Number=Sing|Person=3|PronType=Prs": {
+        "morph": "Animacy=Hum|Case=Nom|Gender=Fem|Number=Sing|Person=3|PronType=Prs",
+        POS: PRON,
+    },
+    "PRON__Animacy=Hum|Case=Nom|Gender=Masc|Number=Sing|Person=3|PronType=Prs": {
+        "morph": "Animacy=Hum|Case=Nom|Gender=Masc|Number=Sing|Person=3|PronType=Prs",
+        POS: PRON,
+    },
+    "PRON__Animacy=Hum|Case=Nom|Number=Plur|Person=1|PronType=Prs": {
+        "morph": "Animacy=Hum|Case=Nom|Number=Plur|Person=1|PronType=Prs",
+        POS: PRON,
+    },
+    "PRON__Animacy=Hum|Case=Nom|Number=Plur|Person=2|PronType=Prs": {
+        "morph": "Animacy=Hum|Case=Nom|Number=Plur|Person=2|PronType=Prs",
+        POS: PRON,
+    },
+    "PRON__Animacy=Hum|Case=Nom|Number=Sing|Person=1|PronType=Prs": {
+        "morph": "Animacy=Hum|Case=Nom|Number=Sing|Person=1|PronType=Prs",
+        POS: PRON,
+    },
+    "PRON__Animacy=Hum|Case=Nom|Number=Sing|Person=2|PronType=Prs": {
+        "morph": "Animacy=Hum|Case=Nom|Number=Sing|Person=2|PronType=Prs",
+        POS: PRON,
+    },
+    "PRON__Animacy=Hum|Case=Nom|Number=Sing|PronType=Prs": {
+        "morph": "Animacy=Hum|Case=Nom|Number=Sing|PronType=Prs",
+        POS: PRON,
+    },
+    "PRON__Animacy=Hum|Number=Plur|PronType=Rcp": {
+        "morph": "Animacy=Hum|Number=Plur|PronType=Rcp",
+        POS: PRON,
+    },
+    "PRON__Animacy=Hum|Number=Sing|PronType=Art,Prs": {
+        "morph": "Animacy=Hum|Number=Sing|PronType=Art,Prs",
+        POS: PRON,
+    },
+    "PRON__Animacy=Hum|Poss=Yes|PronType=Int": {
+        "morph": "Animacy=Hum|Poss=Yes|PronType=Int",
+        POS: PRON,
+    },
     "PRON__Animacy=Hum|PronType=Int": {"morph": "Animacy=Hum|PronType=Int", POS: PRON},
-    "PRON__Case=Acc|PronType=Prs|Reflex=Yes": {"morph": "Case=Acc|PronType=Prs|Reflex=Yes", POS: PRON},
-    "PRON__Gender=Fem,Masc|Number=Sing|Person=3|Polarity=Neg|PronType=Neg,Prs": { "morph": "Gender=Fem,Masc|Number=Sing|Person=3|Polarity=Neg|PronType=Neg,Prs", POS: PRON},
-    "PRON__Gender=Fem,Masc|Number=Sing|Person=3|PronType=Ind,Prs": {"morph": "Gender=Fem,Masc|Number=Sing|Person=3|PronType=Ind,Prs", POS: PRON},
-    "PRON__Gender=Fem,Masc|Number=Sing|Person=3|PronType=Prs,Tot": {"morph": "Gender=Fem,Masc|Number=Sing|Person=3|PronType=Prs,Tot", POS: PRON},
-    "PRON__Gender=Fem|Number=Sing|Poss=Yes|PronType=Prs": {"morph": "Gender=Fem|Number=Sing|Poss=Yes|PronType=Prs", POS: PRON},
-    "PRON__Gender=Masc|Number=Sing|Poss=Yes|PronType=Prs": {"morph": "Gender=Masc|Number=Sing|Poss=Yes|PronType=Prs", POS: PRON},
-    "PRON__Gender=Neut|Number=Sing|Person=3|PronType=Ind,Prs": {"morph": "Gender=Neut|Number=Sing|Person=3|PronType=Ind,Prs", POS: PRON},
-    "PRON__Gender=Neut|Number=Sing|Poss=Yes|PronType=Prs": {"morph": "Gender=Neut|Number=Sing|Poss=Yes|PronType=Prs", POS: PRON},
-    "PRON__Number=Plur|Person=3|Polarity=Neg|PronType=Neg,Prs": {"morph": "Number=Plur|Person=3|Polarity=Neg|PronType=Neg,Prs", POS: PRON},
-    "PRON__Number=Plur|Person=3|PronType=Ind,Prs": {"morph": "Number=Plur|Person=3|PronType=Ind,Prs", POS: PRON},
-    "PRON__Number=Plur|Person=3|PronType=Prs,Tot": {"morph": "Number=Plur|Person=3|PronType=Prs,Tot", POS: PRON},
-    "PRON__Number=Plur|Poss=Yes|PronType=Prs": {"morph": "Number=Plur|Poss=Yes|PronType=Prs", POS: PRON},
-    "PRON__Number=Plur|Poss=Yes|PronType=Rcp": {"morph": "Number=Plur|Poss=Yes|PronType=Rcp", POS: PRON},
-    "PRON__Number=Sing|Polarity=Neg|PronType=Neg": {"morph": "Number=Sing|Polarity=Neg|PronType=Neg", POS: PRON},
+    "PRON__Case=Acc|PronType=Prs|Reflex=Yes": {
+        "morph": "Case=Acc|PronType=Prs|Reflex=Yes",
+        POS: PRON,
+    },
+    "PRON__Gender=Fem,Masc|Number=Sing|Person=3|Polarity=Neg|PronType=Neg,Prs": {
+        "morph": "Gender=Fem,Masc|Number=Sing|Person=3|Polarity=Neg|PronType=Neg,Prs",
+        POS: PRON,
+    },
+    "PRON__Gender=Fem,Masc|Number=Sing|Person=3|PronType=Ind,Prs": {
+        "morph": "Gender=Fem,Masc|Number=Sing|Person=3|PronType=Ind,Prs",
+        POS: PRON,
+    },
+    "PRON__Gender=Fem,Masc|Number=Sing|Person=3|PronType=Prs,Tot": {
+        "morph": "Gender=Fem,Masc|Number=Sing|Person=3|PronType=Prs,Tot",
+        POS: PRON,
+    },
+    "PRON__Gender=Fem|Number=Sing|Poss=Yes|PronType=Prs": {
+        "morph": "Gender=Fem|Number=Sing|Poss=Yes|PronType=Prs",
+        POS: PRON,
+    },
+    "PRON__Gender=Masc|Number=Sing|Poss=Yes|PronType=Prs": {
+        "morph": "Gender=Masc|Number=Sing|Poss=Yes|PronType=Prs",
+        POS: PRON,
+    },
+    "PRON__Gender=Neut|Number=Sing|Person=3|PronType=Ind,Prs": {
+        "morph": "Gender=Neut|Number=Sing|Person=3|PronType=Ind,Prs",
+        POS: PRON,
+    },
+    "PRON__Gender=Neut|Number=Sing|Poss=Yes|PronType=Prs": {
+        "morph": "Gender=Neut|Number=Sing|Poss=Yes|PronType=Prs",
+        POS: PRON,
+    },
+    "PRON__Number=Plur|Person=3|Polarity=Neg|PronType=Neg,Prs": {
+        "morph": "Number=Plur|Person=3|Polarity=Neg|PronType=Neg,Prs",
+        POS: PRON,
+    },
+    "PRON__Number=Plur|Person=3|PronType=Ind,Prs": {
+        "morph": "Number=Plur|Person=3|PronType=Ind,Prs",
+        POS: PRON,
+    },
+    "PRON__Number=Plur|Person=3|PronType=Prs,Tot": {
+        "morph": "Number=Plur|Person=3|PronType=Prs,Tot",
+        POS: PRON,
+    },
+    "PRON__Number=Plur|Poss=Yes|PronType=Prs": {
+        "morph": "Number=Plur|Poss=Yes|PronType=Prs",
+        POS: PRON,
+    },
+    "PRON__Number=Plur|Poss=Yes|PronType=Rcp": {
+        "morph": "Number=Plur|Poss=Yes|PronType=Rcp",
+        POS: PRON,
+    },
+    "PRON__Number=Sing|Polarity=Neg|PronType=Neg": {
+        "morph": "Number=Sing|Polarity=Neg|PronType=Neg",
+        POS: PRON,
+    },
     "PRON__PronType=Prs": {"morph": "PronType=Prs", POS: PRON},
     "PRON__PronType=Rel": {"morph": "PronType=Rel", POS: PRON},
     "PROPN__Abbr=Yes": {"morph": "Abbr=Yes", POS: PROPN},
     "PROPN__Abbr=Yes|Case=Gen": {"morph": "Abbr=Yes|Case=Gen", POS: PROPN},
-    "VERB__Abbr=Yes|Mood=Ind|Tense=Pres|VerbForm=Fin": {"morph": "Abbr=Yes|Mood=Ind|Tense=Pres|VerbForm=Fin", POS: VERB},
-    "VERB__Definite=Ind|Number=Sing|VerbForm=Part": {"morph": "Definite=Ind|Number=Sing|VerbForm=Part", POS: VERB},
+    "VERB__Abbr=Yes|Mood=Ind|Tense=Pres|VerbForm=Fin": {
+        "morph": "Abbr=Yes|Mood=Ind|Tense=Pres|VerbForm=Fin",
+        POS: VERB,
+    },
+    "VERB__Definite=Ind|Number=Sing|VerbForm=Part": {
+        "morph": "Definite=Ind|Number=Sing|VerbForm=Part",
+        POS: VERB,
+    },
 }
spacy/lang/yo/__init__.py (new file, 24 lines)
@@ -0,0 +1,24 @@
+# coding: utf8
+from __future__ import unicode_literals
+
+from .stop_words import STOP_WORDS
+from .lex_attrs import LEX_ATTRS
+from ..tokenizer_exceptions import BASE_EXCEPTIONS
+from ...language import Language
+from ...attrs import LANG
+
+
+class YorubaDefaults(Language.Defaults):
+    lex_attr_getters = dict(Language.Defaults.lex_attr_getters)
+    lex_attr_getters.update(LEX_ATTRS)
+    lex_attr_getters[LANG] = lambda text: "yo"
+    stop_words = STOP_WORDS
+    tokenizer_exceptions = BASE_EXCEPTIONS
+
+
+class Yoruba(Language):
+    lang = "yo"
+    Defaults = YorubaDefaults
+
+
+__all__ = ["Yoruba"]
spacy/lang/yo/examples.py (new file, 26 lines)
@@ -0,0 +1,26 @@
+# coding: utf8
+from __future__ import unicode_literals
+
+
+"""
+Example sentences to test spaCy and its language models.
+
+>>> from spacy.lang.yo.examples import sentences
+>>> docs = nlp.pipe(sentences)
+"""
+
+# 1. https://yo.wikipedia.org/wiki/Wikipedia:%C3%80y%E1%BB%8Dk%C3%A0_p%C3%A0t%C3%A0k%C3%AC
+# 2. https://yo.wikipedia.org/wiki/Oj%C3%BAew%C3%A9_%C3%80k%E1%BB%8D%CC%81k%E1%BB%8D%CC%81
+# 3. https://www.bbc.com/yoruba
+
+sentences = [
+    "Ìjọba Tanzania fi Ajìjàgbara Ọmọ Orílẹ̀-èdèe Uganda sí àtìmọ́lé",
+    "Olúṣẹ́gun Ọbásanjọ́, tí ó jẹ́ Ààrẹ ìjọba ológun àná (láti ọdún 1976 sí 1979), tí ó sì tún ṣe Ààrẹ ìjọba alágbádá tí ìbò gbé wọlé (ní ọdún 1999 sí 2007), kúndùn láti máa bu ẹnu àtẹ́ lu àwọn "
+    "ètò ìjọba Ààrẹ orílẹ̀-èdè Nàìjíríà tí ó jẹ tẹ̀lé e.",
+    "Akin Alabi rán ẹnu mọ́ agbárá Adárí Òsìsẹ̀, àwọn ọmọ Nàìjíríà dẹnu bò ó",
+    "Ta ló leè dúró s'ẹ́gbẹ̀ẹ́ Okunnu láì rẹ́rìín?",
+    "Dídarapọ̀ mọ́n ìpolongo",
+    "Bi a se n so, omobinrin ni oruko ni ojo kejo bee naa ni omokunrin ni oruko ni ojo kesan.",
+    "Oríṣìíríṣìí nǹkan ló le yọrí sí orúkọ tí a sọ ọmọ",
+    "Gbogbo won ni won ni oriki ti won",
+]
115
spacy/lang/yo/lex_attrs.py
Normal file
115
spacy/lang/yo/lex_attrs.py
Normal file
|
@ -0,0 +1,115 @@
|
||||||
|
# coding: utf8
|
||||||
|
from __future__ import unicode_literals
|
||||||
|
|
||||||
|
import unicodedata
|
||||||
|
|
||||||
|
from ...attrs import LIKE_NUM
|
||||||
|
|
||||||
|
|
||||||
|
_num_words = [
|
||||||
|
"ení",
|
||||||
|
"oókàn",
|
||||||
|
"ọ̀kanlá",
|
||||||
|
"ẹ́ẹdọ́gbọ̀n",
|
||||||
|
"àádọ́fà",
|
||||||
|
"ẹ̀walélúɡba",
|
||||||
|
"egbèje",
|
||||||
|
"ẹgbàárin",
|
||||||
|
"èjì",
|
||||||
|
"eéjì",
|
||||||
|
"èjìlá",
|
||||||
|
"ọgbọ̀n,",
|
||||||
|
"ọgọ́fà",
|
    "ọ̀ọ́dúrún",
    "ẹgbẹ̀jọ",
    "ẹ̀ẹ́dẹ́ɡbàárùn",
    "ẹ̀ta",
    "ẹẹ́ta",
    "ẹ̀talá",
    "aárùndílogójì",
    "àádóje",
    "irinwó",
    "ẹgbẹ̀sàn",
    "ẹgbàárùn",
    "ẹ̀rin",
    "ẹẹ́rin",
    "ẹ̀rinlá",
    "ogójì",
    "ogóje",
    "ẹ̀ẹ́dẹ́gbẹ̀ta",
    "ẹgbàá",
    "ẹgbàájọ",
    "àrún",
    "aárùn",
    "ẹ́ẹdógún",
    "àádọ́ta",
    "àádọ́jọ",
    "ẹgbẹ̀ta",
    "ẹgboókànlá",
    "ẹgbàawǎ",
    "ẹ̀fà",
    "ẹẹ́fà",
    "ẹẹ́rìndílógún",
    "ọgọ́ta",
    "ọgọ́jọ",
    "ọ̀ọ́dẹ́gbẹ̀rin",
    "ẹgbẹ́ẹdógún",
    "ọkẹ́marun",
    "èje",
    "etàdílógún",
    "àádọ́rin",
    "àádọ́sán",
    "ẹgbẹ̀rin",
    "ẹgbàajì",
    "ẹgbẹ̀ẹgbẹ̀rún",
    "ẹ̀jọ",
    "ẹẹ́jọ",
    "eéjìdílógún",
    "ọgọ́rin",
    "ọgọsàn",
    "ẹ̀ẹ́dẹ́gbẹ̀rún",
    "ẹgbẹ́ẹdọ́gbọ̀n",
    "ọgọ́rùn ọkẹ́",
    "ẹ̀sán",
    "ẹẹ́sàn",
    "oókàndílógún",
    "àádọ́rùn",
    "ẹ̀wadilúɡba",
    "ẹgbẹ̀rún",
    "ẹgbàáta",
    "ẹ̀wá",
    "ẹẹ́wàá",
    "ogún",
    "ọgọ́rùn",
    "igba",
    "ẹgbẹ̀fà",
    "ẹ̀ẹ́dẹ́ɡbarin",
]


def strip_accents_text(text):
    """
    Converts the string to NFD, separates & returns only the base characters
    :param text:
    :return: input string without diacritic adornments on base characters
    """
    return "".join(
        c for c in unicodedata.normalize("NFD", text) if unicodedata.category(c) != "Mn"
    )


def like_num(text):
    text = text.replace(",", "").replace(".", "")
    num_markers = ["dí", "dọ", "lé", "dín", "di", "din", "le", "do"]
    if any(mark in text for mark in num_markers):
        return True
    text = strip_accents_text(text)
    _num_words_stripped = [strip_accents_text(num) for num in _num_words]
    if text.isdigit():
        return True
    if text in _num_words_stripped or text.lower() in _num_words_stripped:
        return True
    return False


LEX_ATTRS = {LIKE_NUM: like_num}
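The NFD trick in `strip_accents_text` above can be tried on its own; a minimal standalone sketch of the same approach (outside spaCy, so the `LIKE_NUM` wiring is omitted):

```python
import unicodedata


def strip_accents_text(text):
    # NFD decomposition splits each accented letter into its base
    # character plus combining marks (Unicode category "Mn");
    # dropping the marks keeps only the base characters.
    return "".join(
        c for c in unicodedata.normalize("NFD", text)
        if unicodedata.category(c) != "Mn"
    )


print(strip_accents_text("àrún"))  # -> arun
print(strip_accents_text("ogún"))  # -> ogun
```

This is why `like_num` compares an accent-stripped input against accent-stripped `_num_words` entries: unaccented spellings like `mewadinlogun` still match.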
12
spacy/lang/yo/stop_words.py
Normal file
@@ -0,0 +1,12 @@
# coding: utf8
from __future__ import unicode_literals

# stop words as whitespace-separated list.
# Source: https://raw.githubusercontent.com/dohliam/more-stoplists/master/yo/yo.txt

STOP_WORDS = set(
    "a an b bá bí bẹ̀rẹ̀ d e f fún fẹ́ g gbogbo i inú j jù jẹ jẹ́ k kan kì kí kò "
    "l láti lè lọ m mi mo máa mọ̀ n ni náà ní nígbà nítorí nǹkan o p padà pé "
    "púpọ̀ pẹ̀lú r rẹ̀ s sì sí sínú t ti tí u w wà wá wọn wọ́n y yìí à àti àwọn á "
    "è é ì í ò òun ó ù ú ń ńlá ǹ ̀ ́ ̣ ṣ ṣe ṣé ṣùgbọ́n ẹ ẹmọ́ ọ ọjọ́ ọ̀pọ̀lọpọ̀".split()
)
@@ -295,10 +295,9 @@ class EntityRuler(object):
         deserializers_patterns = {
             "patterns": lambda p: self.add_patterns(
                 srsly.read_jsonl(p.with_suffix(".jsonl"))
-            )}
-        deserializers_cfg = {
-            "cfg": lambda p: cfg.update(srsly.read_json(p))
-        }
+            )
+        }
+        deserializers_cfg = {"cfg": lambda p: cfg.update(srsly.read_json(p))}
         from_disk(path, deserializers_cfg, {})
         self.overwrite = cfg.get("overwrite", False)
         self.phrase_matcher_attr = cfg.get("phrase_matcher_attr")
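The dict-of-callbacks shape used here is spaCy's generic serialization pattern: the cfg pass runs first so later passes can depend on values it loads. A simplified standalone sketch of that two-pass idea (plain Python and JSON with a hypothetical `cfg` file, not spaCy's actual serializer):

```python
import json
import tempfile
from pathlib import Path


def from_disk(path, readers):
    # Apply each named reader callback to its file under `path`,
    # mirroring the util.from_disk contract used above.
    path = Path(path)
    for name, reader in readers.items():
        reader(path / name)


cfg = {}
with tempfile.TemporaryDirectory() as d:
    (Path(d) / "cfg").write_text('{"overwrite": true}')
    # First pass: read the config alone, so a later pass (e.g. adding
    # patterns back) can depend on values like `overwrite`.
    from_disk(d, {"cfg": lambda p: cfg.update(json.loads(p.read_text()))})

overwrite = cfg.get("overwrite", False)
print(overwrite)  # -> True
```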
@@ -220,6 +220,11 @@ def ur_tokenizer():
     return get_lang_class("ur").Defaults.create_tokenizer()


+@pytest.fixture(scope="session")
+def yo_tokenizer():
+    return get_lang_class("yo").Defaults.create_tokenizer()
+
+
 @pytest.fixture(scope="session")
 def zh_tokenizer():
     pytest.importorskip("jieba")
@@ -15,7 +15,7 @@ ABBREVIATION_TESTS = [
 HYPHENATED_TESTS = [
     (
         "1700-luvulle sijoittuva taide-elokuva",
-        ["1700-luvulle", "sijoittuva", "taide-elokuva"]
+        ["1700-luvulle", "sijoittuva", "taide-elokuva"],
     )
 ]

@@ -3,16 +3,19 @@ from __future__ import unicode_literals

 import pytest


 @pytest.mark.parametrize("text", ["z.B.", "Jan."])
 def test_lb_tokenizer_handles_abbr(lb_tokenizer, text):
     tokens = lb_tokenizer(text)
     assert len(tokens) == 1


 @pytest.mark.parametrize("text", ["d'Saach", "d'Kanner", "d’Welt", "d’Suen"])
 def test_lb_tokenizer_splits_contractions(lb_tokenizer, text):
     tokens = lb_tokenizer(text)
     assert len(tokens) == 2


 def test_lb_tokenizer_handles_exc_in_text(lb_tokenizer):
     text = "Mee 't ass net evident, d'Liewen."
     tokens = lb_tokenizer(text)

@@ -20,6 +23,7 @@ def test_lb_tokenizer_handles_exc_in_text(lb_tokenizer):
     assert tokens[1].text == "'t"
     assert tokens[1].lemma_ == "et"


 @pytest.mark.parametrize("text,norm", [("dass", "datt"), ("viläicht", "vläicht")])
 def test_lb_norm_exceptions(lb_tokenizer, text, norm):
     tokens = lb_tokenizer(text)
@@ -16,7 +16,7 @@ def test_lb_tokenizer_handles_long_text(lb_tokenizer):
     [
         ("»Wat ass mat mir geschitt?«, huet hie geduecht.", 13),
         ("“Dëst fréi Opstoen”, denkt hien, “mécht ee ganz duercherneen. ", 15),
-        ("Am Grand-Duché ass d'Liewen schéin, mee 't gëtt ze vill Autoen.", 14)
+        ("Am Grand-Duché ass d'Liewen schéin, mee 't gëtt ze vill Autoen.", 14),
     ],
 )
 def test_lb_tokenizer_handles_examples(lb_tokenizer, text, length):
@@ -11,7 +11,7 @@ from spacy.util import get_lang_class
 LANGUAGES = ["af", "ar", "bg", "bn", "ca", "cs", "da", "de", "el", "en", "es",
              "et", "fa", "fi", "fr", "ga", "he", "hi", "hr", "hu", "id", "is",
              "it", "kn", "lt", "lv", "nb", "nl", "pl", "pt", "ro", "si", "sk",
-             "sl", "sq", "sr", "sv", "ta", "te", "tl", "tr", "tt", "ur"]
+             "sl", "sq", "sr", "sv", "ta", "te", "tl", "tr", "tt", "ur", 'yo']
 # fmt: on

0
spacy/tests/lang/yo/__init__.py
Normal file
32
spacy/tests/lang/yo/test_text.py
Normal file
@@ -0,0 +1,32 @@
# coding: utf8
from __future__ import unicode_literals

import pytest
from spacy.lang.yo.lex_attrs import like_num


def test_yo_tokenizer_handles_long_text(yo_tokenizer):
    text = """Àwọn ọmọ ìlú tí wọ́n ń ṣàmúlò ayélujára ti bẹ̀rẹ̀ ìkọkúkọ sórí àwòrán ààrẹ Nkurunziza nínú ìfẹ̀hónúhàn pẹ̀lú àmì ìdámọ̀: Nkurunziza àti Burundi:
Ọmọ ilé ẹ̀kọ́ gíga ní ẹ̀wọ̀n fún kíkọ ìkọkúkọ sí orí àwòrán Ààrẹ .
Bí mo bá ṣe èyí ní Burundi , ó ṣe é ṣe kí a fi mí sí àtìmọ́lé
Ìjọba Burundi fi akẹ́kọ̀ọ́bìnrin sí àtìmọ́lé látàrí ẹ̀sùn ìkọkúkọ sí orí àwòrán ààrẹ. A túwíìtì àwòrán ìkọkúkọ wa ní ìbánikẹ́dùn ìṣẹ̀lẹ̀ náà.
Wọ́n ní kí a dán an wò, kí a kọ nǹkan sí orí àwòrán ààrẹ mo sì ṣe bẹ́ẹ̀. Mo ní ìgbóyà wípé ẹnikẹ́ni kò ní mú mi níbí.
Ìfòfinlíle mú àtakò"""
    tokens = yo_tokenizer(text)
    assert len(tokens) == 121


@pytest.mark.parametrize(
    "text,match",
    [("ení", True), ("ogun", True), ("mewadinlogun", True), ("ten", False)],
)
def test_lex_attrs_like_number(yo_tokenizer, text, match):
    tokens = yo_tokenizer(text)
    assert len(tokens) == 1
    assert tokens[0].like_num == match


@pytest.mark.parametrize("word", ["eji", "ejila", "ogun", "aárùn"])
def test_yo_lex_attrs_capitals(word):
    assert like_num(word)
    assert like_num(word.upper())
@@ -151,17 +151,17 @@ def test_parser_arc_eager_finalize_state(en_tokenizer, en_parser):


 def test_parser_set_sent_starts(en_vocab):
+    # fmt: off
     words = ['Ein', 'Satz', '.', 'Außerdem', 'ist', 'Zimmer', 'davon', 'überzeugt', ',', 'dass', 'auch', 'epige-', '\n', 'netische', 'Mechanismen', 'eine', 'Rolle', 'spielen', ',', 'also', 'Vorgänge', ',', 'die', '\n', 'sich', 'darauf', 'auswirken', ',', 'welche', 'Gene', 'abgelesen', 'werden', 'und', '\n', 'welche', 'nicht', '.', '\n']
     heads = [1, 0, -1, 27, 0, -1, 1, -3, -1, 8, 4, 3, -1, 1, 3, 1, 1, -11, -1, 1, -9, -1, 4, -1, 2, 1, -6, -1, 1, 2, 1, -6, -1, -1, -17, -31, -32, -1]
     deps = ['nk', 'ROOT', 'punct', 'mo', 'ROOT', 'sb', 'op', 'pd', 'punct', 'cp', 'mo', 'nk', '', 'nk', 'sb', 'nk', 'oa', 're', 'punct', 'mo', 'app', 'punct', 'sb', '', 'oa', 'op', 'rc', 'punct', 'nk', 'sb', 'oc', 're', 'cd', '', 'oa', 'ng', 'punct', '']
-    doc = get_doc(
-        en_vocab, words=words, deps=deps, heads=heads
-    )
+    # fmt: on
+    doc = get_doc(en_vocab, words=words, deps=deps, heads=heads)
     for i in range(len(words)):
         if i == 0 or i == 3:
-            assert doc[i].is_sent_start == True
+            assert doc[i].is_sent_start is True
         else:
-            assert doc[i].is_sent_start == None
+            assert doc[i].is_sent_start is None
     for sent in doc.sents:
         for token in sent:
             assert token.head in sent
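The switch from `==` to `is` matters here because `Token.is_sent_start` is ternary: `True` (sentence start), `False` (not a start), or `None` (no prediction set). Identity checks keep the `None` case distinct and avoid flake8's E711/E712 complaints. A small standalone illustration of the same ternary-flag handling (plain Python, not spaCy objects):

```python
# Simulated ternary flags mirroring Token.is_sent_start:
# True = sentence start, False = not a start, None = unset.
flags = [True, None, None, True, False]

# `is` distinguishes None from False, which `==`-style checks blur.
starts = [i for i, f in enumerate(flags) if f is True]
unset = [i for i, f in enumerate(flags) if f is None]

print(starts)  # -> [0, 3]
print(unset)   # -> [1, 2]
```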
@@ -3,7 +3,6 @@ from __future__ import unicode_literals

 import pytest
 from spacy.language import Language
-from spacy.pipeline import Tagger


 def test_label_types():
@@ -1,11 +1,12 @@
 # coding: utf-8
 from __future__ import unicode_literals

+import pytest
 from spacy.kb import KnowledgeBase
 from spacy.util import ensure_path

 from spacy.lang.en import English
-from spacy.tests.util import make_tempdir
+from ..util import make_tempdir


 def test_issue4674():

@@ -15,7 +16,12 @@ def test_issue4674():

     vector1 = [0.9, 1.1, 1.01]
     vector2 = [1.8, 2.25, 2.01]
-    kb.set_entities(entity_list=["Q1", "Q1"], freq_list=[32, 111], vector_list=[vector1, vector2])
+    with pytest.warns(UserWarning):
+        kb.set_entities(
+            entity_list=["Q1", "Q1"],
+            freq_list=[32, 111],
+            vector_list=[vector1, vector2],
+        )

     assert kb.get_size_entities() == 1

@@ -31,4 +37,3 @@ def test_issue4674():
     kb2.load_bulk(str(file_path))

     assert kb2.get_size_entities() == 1
-
@@ -994,9 +994,9 @@ cdef class Doc:
         order, and no span intersection is allowed.

         spans (Span[]): Spans to merge, in document order, with all span
-            intersections empty. Cannot be emty.
+            intersections empty. Cannot be empty.
         attributes (Dictionary[]): Attributes to assign to the merged tokens. By default,
-            must be the same lenghth as spans, emty dictionaries are allowed.
+            must be the same length as spans, empty dictionaries are allowed.
             attributes are inherited from the syntactic root of the span.
         RETURNS (Token): The first newly merged token.
         """
@@ -77,9 +77,9 @@ more efficient than processing texts one-by-one.
 Early versions of spaCy used simple statistical models that could be efficiently
 multi-threaded, as we were able to entirely release Python's global interpreter
 lock. The multi-threading was controlled using the `n_threads` keyword argument
-to the `.pipe` method. This keyword argument is now deprecated as of v2.1.0.
-Future versions may introduce a `n_process` argument for parallel inference via
-multiprocessing.
+to the `.pipe` method. This keyword argument is now deprecated as of v2.1.0. A
+new keyword argument, `n_process`, was introduced to control parallel inference
+via multiprocessing in v2.2.2.

 </Infobox>

@@ -98,6 +98,7 @@ multiprocessing.
 | `batch_size` | int | The number of texts to buffer. |
 | `disable` | list | Names of pipeline components to [disable](/usage/processing-pipelines#disabling). |
 | `component_cfg` <Tag variant="new">2.1</Tag> | dict | Config parameters for specific pipeline components, keyed by component name. |
+| `n_process` <Tag variant="new">2.2.2</Tag> | int | Number of processors to use, only supported in Python 3. Defaults to `1`. |
 | **YIELDS** | `Doc` | Documents in the order of the original text. |

 ## Language.update {#update tag="method"}
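The `n_process` option fans texts out to worker processes while still yielding documents in input order. A rough stdlib analogy of that contract (not spaCy's actual implementation; `fake_pipe` is a stand-in for per-document processing, and a thread-backed pool is used here to keep the sketch self-contained):

```python
from multiprocessing.dummy import Pool  # thread-backed Pool; spaCy uses real processes


def fake_pipe(text):
    # Stand-in for per-document work (tokenizing, tagging, parsing).
    return text.upper()


texts = ["hello world", "spacy is fast", "batching helps"]
# Like Language.pipe with n_process > 1, imap distributes the work
# across workers but yields results in the original input order.
with Pool(2) as pool:
    docs = list(pool.imap(fake_pipe, texts))

print(docs)  # -> ['HELLO WORLD', 'SPACY IS FAST', 'BATCHING HELPS']
```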
@@ -124,9 +124,8 @@ interface for GPU arrays.
 spaCy can be installed on GPU by specifying `spacy[cuda]`, `spacy[cuda90]`,
 `spacy[cuda91]`, `spacy[cuda92]` or `spacy[cuda100]`. If you know your cuda
 version, using the more explicit specifier allows cupy to be installed via
-wheel, saving some compilation time. The specifiers should install two
-libraries: [`cupy`](https://cupy.chainer.org) and
-[`thinc_gpu_ops`](https://github.com/explosion/thinc_gpu_ops).
+wheel, saving some compilation time. The specifiers should install
+[`cupy`](https://cupy.chainer.org).

 ```bash
 $ pip install -U spacy[cuda92]