mirror of https://github.com/explosion/spaCy.git (synced 2025-01-12 02:06:31 +03:00)

Merge branch 'master' into develop

commit 5d0b60999d

106
.github/contributors/DeNeutoy.md
vendored
Normal file
@@ -0,0 +1,106 @@

# spaCy contributor agreement

This spaCy Contributor Agreement (**"SCA"**) is based on the
[Oracle Contributor Agreement](http://www.oracle.com/technetwork/oca-405177.pdf).
The SCA applies to any contribution that you make to any product or project
managed by us (the **"project"**), and sets out the intellectual property rights
you grant to us in the contributed materials. The term **"us"** shall mean
[ExplosionAI UG (haftungsbeschränkt)](https://explosion.ai/legal). The term
**"you"** shall mean the person or entity identified below.

If you agree to be bound by these terms, fill in the information requested
below and include the filled-in version with your first pull request, under the
folder [`.github/contributors/`](/.github/contributors/). The name of the file
should be your GitHub username, with the extension `.md`. For example, the user
example_user would create the file `.github/contributors/example_user.md`.

Read this agreement carefully before signing. These terms and conditions
constitute a binding legal agreement.

## Contributor Agreement

1. The term "contribution" or "contributed materials" means any source code,
object code, patch, tool, sample, graphic, specification, manual,
documentation, or any other material posted or submitted by you to the project.

2. With respect to any worldwide copyrights, or copyright applications and
registrations, in your contribution:

* you hereby assign to us joint ownership, and to the extent that such
assignment is or becomes invalid, ineffective or unenforceable, you hereby
grant to us a perpetual, irrevocable, non-exclusive, worldwide, no-charge,
royalty-free, unrestricted license to exercise all rights under those
copyrights. This includes, at our option, the right to sublicense these same
rights to third parties through multiple levels of sublicensees or other
licensing arrangements;

* you agree that each of us can do all things in relation to your
contribution as if each of us were the sole owners, and if one of us makes
a derivative work of your contribution, the one who makes the derivative
work (or has it made) will be the sole owner of that derivative work;

* you agree that you will not assert any moral rights in your contribution
against us, our licensees or transferees;

* you agree that we may register a copyright in your contribution and
exercise all ownership rights associated with it; and

* you agree that neither of us has any duty to consult with, obtain the
consent of, pay or render an accounting to the other for any use or
distribution of your contribution.

3. With respect to any patents you own, or that you can license without payment
to any third party, you hereby grant to us a perpetual, irrevocable,
non-exclusive, worldwide, no-charge, royalty-free license to:

* make, have made, use, sell, offer to sell, import, and otherwise transfer
your contribution in whole or in part, alone or in combination with or
included in any product, work or materials arising out of the project to
which your contribution was submitted, and

* at our option, to sublicense these same rights to third parties through
multiple levels of sublicensees or other licensing arrangements.

4. Except as set out above, you keep all right, title, and interest in your
contribution. The rights that you grant to us under these terms are effective
on the date you first submitted a contribution to us, even if your submission
took place before the date you sign these terms.

5. You covenant, represent, warrant and agree that:

* Each contribution that you submit is and shall be an original work of
authorship and you can legally grant the rights set out in this SCA;

* to the best of your knowledge, each contribution will not violate any
third party's copyrights, trademarks, patents, or other intellectual
property rights; and

* each contribution shall be in compliance with U.S. export control laws and
other applicable export and import laws. You agree to notify us if you
become aware of any circumstance which would make any of the foregoing
representations inaccurate in any respect. We may publicly disclose your
participation in the project, including the fact that you have signed the SCA.

6. This SCA is governed by the laws of the State of California and applicable
U.S. Federal law. Any choice of law rules will not apply.

7. Please place an “x” on one of the applicable statements below. Please do NOT
mark both statements:

* [x] I am signing on behalf of myself as an individual and no other person
or entity, including my employer, has or will have rights with respect to my
contributions.

* [ ] I am signing on behalf of my employer or a legal entity and I have the
actual authority to contractually bind that entity.

## Contributor Details

| Field | Entry |
|------------------------------- | -------------------- |
| Name | Mark Neumann |
| Company name (if applicable) | Allen Institute for AI |
| Title or role (if applicable) | Research Engineer |
| Date | 13/01/2019 |
| GitHub username | @Deneutoy |
| Website (optional) | markneumann.xyz |

106
.github/contributors/Loghijiaha.md
vendored
Normal file
@@ -0,0 +1,106 @@

# spaCy contributor agreement

This spaCy Contributor Agreement (**"SCA"**) is based on the
[Oracle Contributor Agreement](http://www.oracle.com/technetwork/oca-405177.pdf).
The SCA applies to any contribution that you make to any product or project
managed by us (the **"project"**), and sets out the intellectual property rights
you grant to us in the contributed materials. The term **"us"** shall mean
[ExplosionAI UG (haftungsbeschränkt)](https://explosion.ai/legal). The term
**"you"** shall mean the person or entity identified below.

If you agree to be bound by these terms, fill in the information requested
below and include the filled-in version with your first pull request, under the
folder [`.github/contributors/`](/.github/contributors/). The name of the file
should be your GitHub username, with the extension `.md`. For example, the user
example_user would create the file `.github/contributors/example_user.md`.

Read this agreement carefully before signing. These terms and conditions
constitute a binding legal agreement.

## Contributor Agreement

1. The term "contribution" or "contributed materials" means any source code,
object code, patch, tool, sample, graphic, specification, manual,
documentation, or any other material posted or submitted by you to the project.

2. With respect to any worldwide copyrights, or copyright applications and
registrations, in your contribution:

* you hereby assign to us joint ownership, and to the extent that such
assignment is or becomes invalid, ineffective or unenforceable, you hereby
grant to us a perpetual, irrevocable, non-exclusive, worldwide, no-charge,
royalty-free, unrestricted license to exercise all rights under those
copyrights. This includes, at our option, the right to sublicense these same
rights to third parties through multiple levels of sublicensees or other
licensing arrangements;

* you agree that each of us can do all things in relation to your
contribution as if each of us were the sole owners, and if one of us makes
a derivative work of your contribution, the one who makes the derivative
work (or has it made) will be the sole owner of that derivative work;

* you agree that you will not assert any moral rights in your contribution
against us, our licensees or transferees;

* you agree that we may register a copyright in your contribution and
exercise all ownership rights associated with it; and

* you agree that neither of us has any duty to consult with, obtain the
consent of, pay or render an accounting to the other for any use or
distribution of your contribution.

3. With respect to any patents you own, or that you can license without payment
to any third party, you hereby grant to us a perpetual, irrevocable,
non-exclusive, worldwide, no-charge, royalty-free license to:

* make, have made, use, sell, offer to sell, import, and otherwise transfer
your contribution in whole or in part, alone or in combination with or
included in any product, work or materials arising out of the project to
which your contribution was submitted, and

* at our option, to sublicense these same rights to third parties through
multiple levels of sublicensees or other licensing arrangements.

4. Except as set out above, you keep all right, title, and interest in your
contribution. The rights that you grant to us under these terms are effective
on the date you first submitted a contribution to us, even if your submission
took place before the date you sign these terms.

5. You covenant, represent, warrant and agree that:

* Each contribution that you submit is and shall be an original work of
authorship and you can legally grant the rights set out in this SCA;

* to the best of your knowledge, each contribution will not violate any
third party's copyrights, trademarks, patents, or other intellectual
property rights; and

* each contribution shall be in compliance with U.S. export control laws and
other applicable export and import laws. You agree to notify us if you
become aware of any circumstance which would make any of the foregoing
representations inaccurate in any respect. We may publicly disclose your
participation in the project, including the fact that you have signed the SCA.

6. This SCA is governed by the laws of the State of California and applicable
U.S. Federal law. Any choice of law rules will not apply.

7. Please place an “x” on one of the applicable statements below. Please do NOT
mark both statements:

* [ x] I am signing on behalf of myself as an individual and no other person
or entity, including my employer, has or will have rights with respect to my
contributions.

* [ x] I am signing on behalf of my employer or a legal entity and I have the
actual authority to contractually bind that entity.

## Contributor Details

| Field | Entry |
|------------------------------- | -------------------- |
| Name | Loghi Perinpanayagam |
| Company name (if applicable) | |
| Title or role (if applicable) | Student |
| Date | 13 Jan, 2019 |
| GitHub username | loghijiaha |
| Website (optional) | |

106
.github/contributors/PolyglotOpenstreetmap.md
vendored
Normal file
@@ -0,0 +1,106 @@

# spaCy contributor agreement

This spaCy Contributor Agreement (**"SCA"**) is based on the
[Oracle Contributor Agreement](http://www.oracle.com/technetwork/oca-405177.pdf).
The SCA applies to any contribution that you make to any product or project
managed by us (the **"project"**), and sets out the intellectual property rights
you grant to us in the contributed materials. The term **"us"** shall mean
[ExplosionAI UG (haftungsbeschränkt)](https://explosion.ai/legal). The term
**"you"** shall mean the person or entity identified below.

If you agree to be bound by these terms, fill in the information requested
below and include the filled-in version with your first pull request, under the
folder [`.github/contributors/`](/.github/contributors/). The name of the file
should be your GitHub username, with the extension `.md`. For example, the user
example_user would create the file `.github/contributors/example_user.md`.

Read this agreement carefully before signing. These terms and conditions
constitute a binding legal agreement.

## Contributor Agreement

1. The term "contribution" or "contributed materials" means any source code,
object code, patch, tool, sample, graphic, specification, manual,
documentation, or any other material posted or submitted by you to the project.

2. With respect to any worldwide copyrights, or copyright applications and
registrations, in your contribution:

* you hereby assign to us joint ownership, and to the extent that such
assignment is or becomes invalid, ineffective or unenforceable, you hereby
grant to us a perpetual, irrevocable, non-exclusive, worldwide, no-charge,
royalty-free, unrestricted license to exercise all rights under those
copyrights. This includes, at our option, the right to sublicense these same
rights to third parties through multiple levels of sublicensees or other
licensing arrangements;

* you agree that each of us can do all things in relation to your
contribution as if each of us were the sole owners, and if one of us makes
a derivative work of your contribution, the one who makes the derivative
work (or has it made) will be the sole owner of that derivative work;

* you agree that you will not assert any moral rights in your contribution
against us, our licensees or transferees;

* you agree that we may register a copyright in your contribution and
exercise all ownership rights associated with it; and

* you agree that neither of us has any duty to consult with, obtain the
consent of, pay or render an accounting to the other for any use or
distribution of your contribution.

3. With respect to any patents you own, or that you can license without payment
to any third party, you hereby grant to us a perpetual, irrevocable,
non-exclusive, worldwide, no-charge, royalty-free license to:

* make, have made, use, sell, offer to sell, import, and otherwise transfer
your contribution in whole or in part, alone or in combination with or
included in any product, work or materials arising out of the project to
which your contribution was submitted, and

* at our option, to sublicense these same rights to third parties through
multiple levels of sublicensees or other licensing arrangements.

4. Except as set out above, you keep all right, title, and interest in your
contribution. The rights that you grant to us under these terms are effective
on the date you first submitted a contribution to us, even if your submission
took place before the date you sign these terms.

5. You covenant, represent, warrant and agree that:

* Each contribution that you submit is and shall be an original work of
authorship and you can legally grant the rights set out in this SCA;

* to the best of your knowledge, each contribution will not violate any
third party's copyrights, trademarks, patents, or other intellectual
property rights; and

* each contribution shall be in compliance with U.S. export control laws and
other applicable export and import laws. You agree to notify us if you
become aware of any circumstance which would make any of the foregoing
representations inaccurate in any respect. We may publicly disclose your
participation in the project, including the fact that you have signed the SCA.

6. This SCA is governed by the laws of the State of California and applicable
U.S. Federal law. Any choice of law rules will not apply.

7. Please place an “x” on one of the applicable statements below. Please do NOT
mark both statements:

* [x] I am signing on behalf of myself as an individual and no other person
or entity, including my employer, has or will have rights with respect to my
contributions.

* [ ] I am signing on behalf of my employer or a legal entity and I have the
actual authority to contractually bind that entity.

## Contributor Details

| Field | Entry |
|------------------------------- | -------------------- |
| Name | Jo |
| Company name (if applicable) | |
| Title or role (if applicable) | |
| Date | 2018-01-26 |
| GitHub username | PolyglotOpenstreetmap |
| Website (optional) | |

106
.github/contributors/adrianeboyd.md
vendored
Normal file
@@ -0,0 +1,106 @@

# spaCy contributor agreement

This spaCy Contributor Agreement (**"SCA"**) is based on the
[Oracle Contributor Agreement](http://www.oracle.com/technetwork/oca-405177.pdf).
The SCA applies to any contribution that you make to any product or project
managed by us (the **"project"**), and sets out the intellectual property rights
you grant to us in the contributed materials. The term **"us"** shall mean
[ExplosionAI UG (haftungsbeschränkt)](https://explosion.ai/legal). The term
**"you"** shall mean the person or entity identified below.

If you agree to be bound by these terms, fill in the information requested
below and include the filled-in version with your first pull request, under the
folder [`.github/contributors/`](/.github/contributors/). The name of the file
should be your GitHub username, with the extension `.md`. For example, the user
example_user would create the file `.github/contributors/example_user.md`.

Read this agreement carefully before signing. These terms and conditions
constitute a binding legal agreement.

## Contributor Agreement

1. The term "contribution" or "contributed materials" means any source code,
object code, patch, tool, sample, graphic, specification, manual,
documentation, or any other material posted or submitted by you to the project.

2. With respect to any worldwide copyrights, or copyright applications and
registrations, in your contribution:

* you hereby assign to us joint ownership, and to the extent that such
assignment is or becomes invalid, ineffective or unenforceable, you hereby
grant to us a perpetual, irrevocable, non-exclusive, worldwide, no-charge,
royalty-free, unrestricted license to exercise all rights under those
copyrights. This includes, at our option, the right to sublicense these same
rights to third parties through multiple levels of sublicensees or other
licensing arrangements;

* you agree that each of us can do all things in relation to your
contribution as if each of us were the sole owners, and if one of us makes
a derivative work of your contribution, the one who makes the derivative
work (or has it made) will be the sole owner of that derivative work;

* you agree that you will not assert any moral rights in your contribution
against us, our licensees or transferees;

* you agree that we may register a copyright in your contribution and
exercise all ownership rights associated with it; and

* you agree that neither of us has any duty to consult with, obtain the
consent of, pay or render an accounting to the other for any use or
distribution of your contribution.

3. With respect to any patents you own, or that you can license without payment
to any third party, you hereby grant to us a perpetual, irrevocable,
non-exclusive, worldwide, no-charge, royalty-free license to:

* make, have made, use, sell, offer to sell, import, and otherwise transfer
your contribution in whole or in part, alone or in combination with or
included in any product, work or materials arising out of the project to
which your contribution was submitted, and

* at our option, to sublicense these same rights to third parties through
multiple levels of sublicensees or other licensing arrangements.

4. Except as set out above, you keep all right, title, and interest in your
contribution. The rights that you grant to us under these terms are effective
on the date you first submitted a contribution to us, even if your submission
took place before the date you sign these terms.

5. You covenant, represent, warrant and agree that:

* Each contribution that you submit is and shall be an original work of
authorship and you can legally grant the rights set out in this SCA;

* to the best of your knowledge, each contribution will not violate any
third party's copyrights, trademarks, patents, or other intellectual
property rights; and

* each contribution shall be in compliance with U.S. export control laws and
other applicable export and import laws. You agree to notify us if you
become aware of any circumstance which would make any of the foregoing
representations inaccurate in any respect. We may publicly disclose your
participation in the project, including the fact that you have signed the SCA.

6. This SCA is governed by the laws of the State of California and applicable
U.S. Federal law. Any choice of law rules will not apply.

7. Please place an “x” on one of the applicable statements below. Please do NOT
mark both statements:

* [x] I am signing on behalf of myself as an individual and no other person
or entity, including my employer, has or will have rights with respect to my
contributions.

* [ ] I am signing on behalf of my employer or a legal entity and I have the
actual authority to contractually bind that entity.

## Contributor Details

| Field | Entry |
|------------------------------- | -------------------- |
| Name | Adriane Boyd |
| Company name (if applicable) | |
| Title or role (if applicable) | |
| Date | 28 January 2019 |
| GitHub username | adrianeboyd |
| Website (optional) | |

106
.github/contributors/alvations.md
vendored
Normal file
@@ -0,0 +1,106 @@

# spaCy contributor agreement

This spaCy Contributor Agreement (**"SCA"**) is based on the
[Oracle Contributor Agreement](http://www.oracle.com/technetwork/oca-405177.pdf).
The SCA applies to any contribution that you make to any product or project
managed by us (the **"project"**), and sets out the intellectual property rights
you grant to us in the contributed materials. The term **"us"** shall mean
[ExplosionAI UG (haftungsbeschränkt)](https://explosion.ai/legal). The term
**"you"** shall mean the person or entity identified below.

If you agree to be bound by these terms, fill in the information requested
below and include the filled-in version with your first pull request, under the
folder [`.github/contributors/`](/.github/contributors/). The name of the file
should be your GitHub username, with the extension `.md`. For example, the user
example_user would create the file `.github/contributors/example_user.md`.

Read this agreement carefully before signing. These terms and conditions
constitute a binding legal agreement.

## Contributor Agreement

1. The term "contribution" or "contributed materials" means any source code,
object code, patch, tool, sample, graphic, specification, manual,
documentation, or any other material posted or submitted by you to the project.

2. With respect to any worldwide copyrights, or copyright applications and
registrations, in your contribution:

* you hereby assign to us joint ownership, and to the extent that such
assignment is or becomes invalid, ineffective or unenforceable, you hereby
grant to us a perpetual, irrevocable, non-exclusive, worldwide, no-charge,
royalty-free, unrestricted license to exercise all rights under those
copyrights. This includes, at our option, the right to sublicense these same
rights to third parties through multiple levels of sublicensees or other
licensing arrangements;

* you agree that each of us can do all things in relation to your
contribution as if each of us were the sole owners, and if one of us makes
a derivative work of your contribution, the one who makes the derivative
work (or has it made) will be the sole owner of that derivative work;

* you agree that you will not assert any moral rights in your contribution
against us, our licensees or transferees;

* you agree that we may register a copyright in your contribution and
exercise all ownership rights associated with it; and

* you agree that neither of us has any duty to consult with, obtain the
consent of, pay or render an accounting to the other for any use or
distribution of your contribution.

3. With respect to any patents you own, or that you can license without payment
to any third party, you hereby grant to us a perpetual, irrevocable,
non-exclusive, worldwide, no-charge, royalty-free license to:

* make, have made, use, sell, offer to sell, import, and otherwise transfer
your contribution in whole or in part, alone or in combination with or
included in any product, work or materials arising out of the project to
which your contribution was submitted, and

* at our option, to sublicense these same rights to third parties through
multiple levels of sublicensees or other licensing arrangements.

4. Except as set out above, you keep all right, title, and interest in your
contribution. The rights that you grant to us under these terms are effective
on the date you first submitted a contribution to us, even if your submission
took place before the date you sign these terms.

5. You covenant, represent, warrant and agree that:

* Each contribution that you submit is and shall be an original work of
authorship and you can legally grant the rights set out in this SCA;

* to the best of your knowledge, each contribution will not violate any
third party's copyrights, trademarks, patents, or other intellectual
property rights; and

* each contribution shall be in compliance with U.S. export control laws and
other applicable export and import laws. You agree to notify us if you
become aware of any circumstance which would make any of the foregoing
representations inaccurate in any respect. We may publicly disclose your
participation in the project, including the fact that you have signed the SCA.

6. This SCA is governed by the laws of the State of California and applicable
U.S. Federal law. Any choice of law rules will not apply.

7. Please place an “x” on one of the applicable statements below. Please do NOT
mark both statements:

* [ ] I am signing on behalf of myself as an individual and no other person
or entity, including my employer, has or will have rights with respect to my
contributions.

* [ ] I am signing on behalf of my employer or a legal entity and I have the
actual authority to contractually bind that entity.

## Contributor Details

| Field | Entry |
|------------------------------- | -------------------- |
| Name | Liling |
| Company name (if applicable) | |
| Title or role (if applicable) | |
| Date | 04 Jan 2019 |
| GitHub username | alvations |
| Website (optional) | |

2
.github/contributors/amperinet.md
vendored
@@ -101,6 +101,6 @@ mark both statements:

| Name | Amandine Périnet |
| Company name (if applicable) | 365Talents |
| Title or role (if applicable) | Data Science Researcher |
| Date | 12/12/2018 |
| Date | 28/01/2019 |
| GitHub username | amperinet |
| Website (optional) | |

106
.github/contributors/boena.md
vendored
Normal file
106
.github/contributors/boena.md
vendored
Normal file
@@ -0,0 +1,106 @@
# spaCy contributor agreement

This spaCy Contributor Agreement (**"SCA"**) is based on the
[Oracle Contributor Agreement](http://www.oracle.com/technetwork/oca-405177.pdf).
The SCA applies to any contribution that you make to any product or project
managed by us (the **"project"**), and sets out the intellectual property rights
you grant to us in the contributed materials. The term **"us"** shall mean
[ExplosionAI UG (haftungsbeschränkt)](https://explosion.ai/legal). The term
**"you"** shall mean the person or entity identified below.

If you agree to be bound by these terms, fill in the information requested
below and include the filled-in version with your first pull request, under the
folder [`.github/contributors/`](/.github/contributors/). The name of the file
should be your GitHub username, with the extension `.md`. For example, the user
example_user would create the file `.github/contributors/example_user.md`.

Read this agreement carefully before signing. These terms and conditions
constitute a binding legal agreement.

## Contributor Agreement

1. The term "contribution" or "contributed materials" means any source code,
object code, patch, tool, sample, graphic, specification, manual,
documentation, or any other material posted or submitted by you to the project.

2. With respect to any worldwide copyrights, or copyright applications and
registrations, in your contribution:

    * you hereby assign to us joint ownership, and to the extent that such
    assignment is or becomes invalid, ineffective or unenforceable, you hereby
    grant to us a perpetual, irrevocable, non-exclusive, worldwide, no-charge,
    royalty-free, unrestricted license to exercise all rights under those
    copyrights. This includes, at our option, the right to sublicense these same
    rights to third parties through multiple levels of sublicensees or other
    licensing arrangements;

    * you agree that each of us can do all things in relation to your
    contribution as if each of us were the sole owners, and if one of us makes
    a derivative work of your contribution, the one who makes the derivative
    work (or has it made will be the sole owner of that derivative work;

    * you agree that you will not assert any moral rights in your contribution
    against us, our licensees or transferees;

    * you agree that we may register a copyright in your contribution and
    exercise all ownership rights associated with it; and

    * you agree that neither of us has any duty to consult with, obtain the
    consent of, pay or render an accounting to the other for any use or
    distribution of your contribution.

3. With respect to any patents you own, or that you can license without payment
to any third party, you hereby grant to us a perpetual, irrevocable,
non-exclusive, worldwide, no-charge, royalty-free license to:

    * make, have made, use, sell, offer to sell, import, and otherwise transfer
    your contribution in whole or in part, alone or in combination with or
    included in any product, work or materials arising out of the project to
    which your contribution was submitted, and

    * at our option, to sublicense these same rights to third parties through
    multiple levels of sublicensees or other licensing arrangements.

4. Except as set out above, you keep all right, title, and interest in your
contribution. The rights that you grant to us under these terms are effective
on the date you first submitted a contribution to us, even if your submission
took place before the date you sign these terms.

5. You covenant, represent, warrant and agree that:

    * Each contribution that you submit is and shall be an original work of
    authorship and you can legally grant the rights set out in this SCA;

    * to the best of your knowledge, each contribution will not violate any
    third party's copyrights, trademarks, patents, or other intellectual
    property rights; and

    * each contribution shall be in compliance with U.S. export control laws and
    other applicable export and import laws. You agree to notify us if you
    become aware of any circumstance which would make any of the foregoing
    representations inaccurate in any respect. We may publicly disclose your
    participation in the project, including the fact that you have signed the SCA.

6. This SCA is governed by the laws of the State of California and applicable
U.S. Federal law. Any choice of law rules will not apply.

7. Please place an “x” on one of the applicable statement below. Please do NOT
mark both statements:

    * [x] I am signing on behalf of myself as an individual and no other person
    or entity, including my employer, has or will have rights with respect to my
    contributions.

    * [ ] I am signing on behalf of my employer or a legal entity and I have the
    actual authority to contractually bind that entity.

## Contributor Details

| Field | Entry |
|------------------------------- | -------------------- |
| Name | Björn Lennartsson |
| Company name (if applicable) | Uptrail AB |
| Title or role (if applicable) | CTO |
| Date | 2019-01-15 |
| GitHub username | boena |
| Website (optional) | www.uptrail.com |

106
.github/contributors/foufaster.md
vendored
Normal file
@@ -0,0 +1,106 @@
# spaCy contributor agreement

This spaCy Contributor Agreement (**"SCA"**) is based on the
[Oracle Contributor Agreement](http://www.oracle.com/technetwork/oca-405177.pdf).
The SCA applies to any contribution that you make to any product or project
managed by us (the **"project"**), and sets out the intellectual property rights
you grant to us in the contributed materials. The term **"us"** shall mean
[ExplosionAI UG (haftungsbeschränkt)](https://explosion.ai/legal). The term
**"you"** shall mean the person or entity identified below.

If you agree to be bound by these terms, fill in the information requested
below and include the filled-in version with your first pull request, under the
folder [`.github/contributors/`](/.github/contributors/). The name of the file
should be your GitHub username, with the extension `.md`. For example, the user
example_user would create the file `.github/contributors/example_user.md`.

Read this agreement carefully before signing. These terms and conditions
constitute a binding legal agreement.

## Contributor Agreement

1. The term "contribution" or "contributed materials" means any source code,
object code, patch, tool, sample, graphic, specification, manual,
documentation, or any other material posted or submitted by you to the project.

2. With respect to any worldwide copyrights, or copyright applications and
registrations, in your contribution:

    * you hereby assign to us joint ownership, and to the extent that such
    assignment is or becomes invalid, ineffective or unenforceable, you hereby
    grant to us a perpetual, irrevocable, non-exclusive, worldwide, no-charge,
    royalty-free, unrestricted license to exercise all rights under those
    copyrights. This includes, at our option, the right to sublicense these same
    rights to third parties through multiple levels of sublicensees or other
    licensing arrangements;

    * you agree that each of us can do all things in relation to your
    contribution as if each of us were the sole owners, and if one of us makes
    a derivative work of your contribution, the one who makes the derivative
    work (or has it made will be the sole owner of that derivative work;

    * you agree that you will not assert any moral rights in your contribution
    against us, our licensees or transferees;

    * you agree that we may register a copyright in your contribution and
    exercise all ownership rights associated with it; and

    * you agree that neither of us has any duty to consult with, obtain the
    consent of, pay or render an accounting to the other for any use or
    distribution of your contribution.

3. With respect to any patents you own, or that you can license without payment
to any third party, you hereby grant to us a perpetual, irrevocable,
non-exclusive, worldwide, no-charge, royalty-free license to:

    * make, have made, use, sell, offer to sell, import, and otherwise transfer
    your contribution in whole or in part, alone or in combination with or
    included in any product, work or materials arising out of the project to
    which your contribution was submitted, and

    * at our option, to sublicense these same rights to third parties through
    multiple levels of sublicensees or other licensing arrangements.

4. Except as set out above, you keep all right, title, and interest in your
contribution. The rights that you grant to us under these terms are effective
on the date you first submitted a contribution to us, even if your submission
took place before the date you sign these terms.

5. You covenant, represent, warrant and agree that:

    * Each contribution that you submit is and shall be an original work of
    authorship and you can legally grant the rights set out in this SCA;

    * to the best of your knowledge, each contribution will not violate any
    third party's copyrights, trademarks, patents, or other intellectual
    property rights; and

    * each contribution shall be in compliance with U.S. export control laws and
    other applicable export and import laws. You agree to notify us if you
    become aware of any circumstance which would make any of the foregoing
    representations inaccurate in any respect. We may publicly disclose your
    participation in the project, including the fact that you have signed the SCA.

6. This SCA is governed by the laws of the State of California and applicable
U.S. Federal law. Any choice of law rules will not apply.

7. Please place an “x” on one of the applicable statement below. Please do NOT
mark both statements:

    * [x] I am signing on behalf of myself as an individual and no other person
    or entity, including my employer, has or will have rights with respect to my
    contributions.

    * [ ] I am signing on behalf of my employer or a legal entity and I have the
    actual authority to contractually bind that entity.

## Contributor Details

| Field | Entry |
|------------------------------- | -------------------- |
| Name |Anès Foufa |
| Company name (if applicable) | |
| Title or role (if applicable) |NLP developer |
| Date |21/01/2019 |
| GitHub username |foufaster |
| Website (optional) | |

106
.github/contributors/ozcankasal.md
vendored
Normal file
@@ -0,0 +1,106 @@
# spaCy contributor agreement

This spaCy Contributor Agreement (**"SCA"**) is based on the
[Oracle Contributor Agreement](http://www.oracle.com/technetwork/oca-405177.pdf).
The SCA applies to any contribution that you make to any product or project
managed by us (the **"project"**), and sets out the intellectual property rights
you grant to us in the contributed materials. The term **"us"** shall mean
[ExplosionAI UG (haftungsbeschränkt)](https://explosion.ai/legal). The term
**"you"** shall mean the person or entity identified below.

If you agree to be bound by these terms, fill in the information requested
below and include the filled-in version with your first pull request, under the
folder [`.github/contributors/`](/.github/contributors/). The name of the file
should be your GitHub username, with the extension `.md`. For example, the user
example_user would create the file `.github/contributors/example_user.md`.

Read this agreement carefully before signing. These terms and conditions
constitute a binding legal agreement.

## Contributor Agreement

1. The term "contribution" or "contributed materials" means any source code,
object code, patch, tool, sample, graphic, specification, manual,
documentation, or any other material posted or submitted by you to the project.

2. With respect to any worldwide copyrights, or copyright applications and
registrations, in your contribution:

    * you hereby assign to us joint ownership, and to the extent that such
    assignment is or becomes invalid, ineffective or unenforceable, you hereby
    grant to us a perpetual, irrevocable, non-exclusive, worldwide, no-charge,
    royalty-free, unrestricted license to exercise all rights under those
    copyrights. This includes, at our option, the right to sublicense these same
    rights to third parties through multiple levels of sublicensees or other
    licensing arrangements;

    * you agree that each of us can do all things in relation to your
    contribution as if each of us were the sole owners, and if one of us makes
    a derivative work of your contribution, the one who makes the derivative
    work (or has it made will be the sole owner of that derivative work;

    * you agree that you will not assert any moral rights in your contribution
    against us, our licensees or transferees;

    * you agree that we may register a copyright in your contribution and
    exercise all ownership rights associated with it; and

    * you agree that neither of us has any duty to consult with, obtain the
    consent of, pay or render an accounting to the other for any use or
    distribution of your contribution.

3. With respect to any patents you own, or that you can license without payment
to any third party, you hereby grant to us a perpetual, irrevocable,
non-exclusive, worldwide, no-charge, royalty-free license to:

    * make, have made, use, sell, offer to sell, import, and otherwise transfer
    your contribution in whole or in part, alone or in combination with or
    included in any product, work or materials arising out of the project to
    which your contribution was submitted, and

    * at our option, to sublicense these same rights to third parties through
    multiple levels of sublicensees or other licensing arrangements.

4. Except as set out above, you keep all right, title, and interest in your
contribution. The rights that you grant to us under these terms are effective
on the date you first submitted a contribution to us, even if your submission
took place before the date you sign these terms.

5. You covenant, represent, warrant and agree that:

    * Each contribution that you submit is and shall be an original work of
    authorship and you can legally grant the rights set out in this SCA;

    * to the best of your knowledge, each contribution will not violate any
    third party's copyrights, trademarks, patents, or other intellectual
    property rights; and

    * each contribution shall be in compliance with U.S. export control laws and
    other applicable export and import laws. You agree to notify us if you
    become aware of any circumstance which would make any of the foregoing
    representations inaccurate in any respect. We may publicly disclose your
    participation in the project, including the fact that you have signed the SCA.

6. This SCA is governed by the laws of the State of California and applicable
U.S. Federal law. Any choice of law rules will not apply.

7. Please place an “x” on one of the applicable statement below. Please do NOT
mark both statements:

    * [x] I am signing on behalf of myself as an individual and no other person
    or entity, including my employer, has or will have rights with respect to my
    contributions.

    * [ ] I am signing on behalf of my employer or a legal entity and I have the
    actual authority to contractually bind that entity.

## Contributor Details

| Field | Entry |
|------------------------------- | -------------------- |
| Name | Ozcan Kasal |
| Company name (if applicable) | |
| Title or role (if applicable) | |
| Date | December 21, 2018 |
| GitHub username | ozcankasal |
| Website (optional) | |

106
.github/contributors/retnuh.md
vendored
Normal file
@@ -0,0 +1,106 @@
# spaCy contributor agreement

This spaCy Contributor Agreement (**"SCA"**) is based on the
[Oracle Contributor Agreement](http://www.oracle.com/technetwork/oca-405177.pdf).
The SCA applies to any contribution that you make to any product or project
managed by us (the **"project"**), and sets out the intellectual property rights
you grant to us in the contributed materials. The term **"us"** shall mean
[ExplosionAI UG (haftungsbeschränkt)](https://explosion.ai/legal). The term
**"you"** shall mean the person or entity identified below.

If you agree to be bound by these terms, fill in the information requested
below and include the filled-in version with your first pull request, under the
folder [`.github/contributors/`](/.github/contributors/). The name of the file
should be your GitHub username, with the extension `.md`. For example, the user
example_user would create the file `.github/contributors/example_user.md`.

Read this agreement carefully before signing. These terms and conditions
constitute a binding legal agreement.

## Contributor Agreement

1. The term "contribution" or "contributed materials" means any source code,
object code, patch, tool, sample, graphic, specification, manual,
documentation, or any other material posted or submitted by you to the project.

2. With respect to any worldwide copyrights, or copyright applications and
registrations, in your contribution:

    * you hereby assign to us joint ownership, and to the extent that such
    assignment is or becomes invalid, ineffective or unenforceable, you hereby
    grant to us a perpetual, irrevocable, non-exclusive, worldwide, no-charge,
    royalty-free, unrestricted license to exercise all rights under those
    copyrights. This includes, at our option, the right to sublicense these same
    rights to third parties through multiple levels of sublicensees or other
    licensing arrangements;

    * you agree that each of us can do all things in relation to your
    contribution as if each of us were the sole owners, and if one of us makes
    a derivative work of your contribution, the one who makes the derivative
    work (or has it made will be the sole owner of that derivative work;

    * you agree that you will not assert any moral rights in your contribution
    against us, our licensees or transferees;

    * you agree that we may register a copyright in your contribution and
    exercise all ownership rights associated with it; and

    * you agree that neither of us has any duty to consult with, obtain the
    consent of, pay or render an accounting to the other for any use or
    distribution of your contribution.

3. With respect to any patents you own, or that you can license without payment
to any third party, you hereby grant to us a perpetual, irrevocable,
non-exclusive, worldwide, no-charge, royalty-free license to:

    * make, have made, use, sell, offer to sell, import, and otherwise transfer
    your contribution in whole or in part, alone or in combination with or
    included in any product, work or materials arising out of the project to
    which your contribution was submitted, and

    * at our option, to sublicense these same rights to third parties through
    multiple levels of sublicensees or other licensing arrangements.

4. Except as set out above, you keep all right, title, and interest in your
contribution. The rights that you grant to us under these terms are effective
on the date you first submitted a contribution to us, even if your submission
took place before the date you sign these terms.

5. You covenant, represent, warrant and agree that:

    - Each contribution that you submit is and shall be an original work of
    authorship and you can legally grant the rights set out in this SCA;

    - to the best of your knowledge, each contribution will not violate any
    third party's copyrights, trademarks, patents, or other intellectual
    property rights; and

    - each contribution shall be in compliance with U.S. export control laws and
    other applicable export and import laws. You agree to notify us if you
    become aware of any circumstance which would make any of the foregoing
    representations inaccurate in any respect. We may publicly disclose your
    participation in the project, including the fact that you have signed the SCA.

6. This SCA is governed by the laws of the State of California and applicable
U.S. Federal law. Any choice of law rules will not apply.

7. Please place an “x” on one of the applicable statement below. Please do NOT
mark both statements:

    * [x] I am signing on behalf of myself as an individual and no other person
    or entity, including my employer, has or will have rights with respect to my
    contributions.

    * [ ] I am signing on behalf of my employer or a legal entity and I have the
    actual authority to contractually bind that entity.

## Contributor Details

| Field | Entry |
| ----------------------------- | ------------ |
| Name | Hunter Kelly |
| Company name (if applicable) | |
| Title or role (if applicable) | |
| Date | 2019-01-10 |
| GitHub username | retnuh |
| Website (optional) | |

106
.github/contributors/willprice.md
vendored
Normal file
@@ -0,0 +1,106 @@
# spaCy contributor agreement

This spaCy Contributor Agreement (**"SCA"**) is based on the
[Oracle Contributor Agreement](http://www.oracle.com/technetwork/oca-405177.pdf).
The SCA applies to any contribution that you make to any product or project
managed by us (the **"project"**), and sets out the intellectual property rights
you grant to us in the contributed materials. The term **"us"** shall mean
[ExplosionAI UG (haftungsbeschränkt)](https://explosion.ai/legal). The term
**"you"** shall mean the person or entity identified below.

If you agree to be bound by these terms, fill in the information requested
below and include the filled-in version with your first pull request, under the
folder [`.github/contributors/`](/.github/contributors/). The name of the file
should be your GitHub username, with the extension `.md`. For example, the user
example_user would create the file `.github/contributors/example_user.md`.

Read this agreement carefully before signing. These terms and conditions
constitute a binding legal agreement.

## Contributor Agreement

1. The term "contribution" or "contributed materials" means any source code,
object code, patch, tool, sample, graphic, specification, manual,
documentation, or any other material posted or submitted by you to the project.

2. With respect to any worldwide copyrights, or copyright applications and
registrations, in your contribution:

    * you hereby assign to us joint ownership, and to the extent that such
    assignment is or becomes invalid, ineffective or unenforceable, you hereby
    grant to us a perpetual, irrevocable, non-exclusive, worldwide, no-charge,
    royalty-free, unrestricted license to exercise all rights under those
    copyrights. This includes, at our option, the right to sublicense these same
    rights to third parties through multiple levels of sublicensees or other
    licensing arrangements;

    * you agree that each of us can do all things in relation to your
    contribution as if each of us were the sole owners, and if one of us makes
    a derivative work of your contribution, the one who makes the derivative
    work (or has it made will be the sole owner of that derivative work;

    * you agree that you will not assert any moral rights in your contribution
    against us, our licensees or transferees;

    * you agree that we may register a copyright in your contribution and
    exercise all ownership rights associated with it; and

    * you agree that neither of us has any duty to consult with, obtain the
    consent of, pay or render an accounting to the other for any use or
    distribution of your contribution.

3. With respect to any patents you own, or that you can license without payment
to any third party, you hereby grant to us a perpetual, irrevocable,
non-exclusive, worldwide, no-charge, royalty-free license to:

    * make, have made, use, sell, offer to sell, import, and otherwise transfer
    your contribution in whole or in part, alone or in combination with or
    included in any product, work or materials arising out of the project to
    which your contribution was submitted, and

    * at our option, to sublicense these same rights to third parties through
    multiple levels of sublicensees or other licensing arrangements.

4. Except as set out above, you keep all right, title, and interest in your
contribution. The rights that you grant to us under these terms are effective
on the date you first submitted a contribution to us, even if your submission
took place before the date you sign these terms.

5. You covenant, represent, warrant and agree that:

    * Each contribution that you submit is and shall be an original work of
    authorship and you can legally grant the rights set out in this SCA;

    * to the best of your knowledge, each contribution will not violate any
    third party's copyrights, trademarks, patents, or other intellectual
    property rights; and

    * each contribution shall be in compliance with U.S. export control laws and
    other applicable export and import laws. You agree to notify us if you
    become aware of any circumstance which would make any of the foregoing
    representations inaccurate in any respect. We may publicly disclose your
    participation in the project, including the fact that you have signed the SCA.

6. This SCA is governed by the laws of the State of California and applicable
U.S. Federal law. Any choice of law rules will not apply.

7. Please place an “x” on one of the applicable statement below. Please do NOT
mark both statements:

    * [x] I am signing on behalf of myself as an individual and no other person
    or entity, including my employer, has or will have rights with respect to my
    contributions.

    * [ ] I am signing on behalf of my employer or a legal entity and I have the
    actual authority to contractually bind that entity.

## Contributor Details

| Field | Entry |
|------------------------------- | --------------------- |
| Name | Will Price |
| Company name (if applicable) | N/A |
| Title or role (if applicable) | N/A |
| Date | 26/12/2018 |
| GitHub username | willprice |
| Website (optional) | https://willprice.org |

@@ -1,4 +1,5 @@
recursive-include include *.h
include LICENSE
include README.md
include pyproject.toml
include bin/spacy

106
contributer_agreement.md
Normal file
@@ -0,0 +1,106 @@
# spaCy contributor agreement

This spaCy Contributor Agreement (**"SCA"**) is based on the
[Oracle Contributor Agreement](http://www.oracle.com/technetwork/oca-405177.pdf).
The SCA applies to any contribution that you make to any product or project
managed by us (the **"project"**), and sets out the intellectual property rights
you grant to us in the contributed materials. The term **"us"** shall mean
[ExplosionAI UG (haftungsbeschränkt)](https://explosion.ai/legal). The term
**"you"** shall mean the person or entity identified below.

If you agree to be bound by these terms, fill in the information requested
below and include the filled-in version with your first pull request, under the
folder [`.github/contributors/`](/.github/contributors/). The name of the file
should be your GitHub username, with the extension `.md`. For example, the user
example_user would create the file `.github/contributors/example_user.md`.

Read this agreement carefully before signing. These terms and conditions
constitute a binding legal agreement.

## Contributor Agreement

1. The term "contribution" or "contributed materials" means any source code,
object code, patch, tool, sample, graphic, specification, manual,
documentation, or any other material posted or submitted by you to the project.

2. With respect to any worldwide copyrights, or copyright applications and
registrations, in your contribution:

    * you hereby assign to us joint ownership, and to the extent that such
    assignment is or becomes invalid, ineffective or unenforceable, you hereby
    grant to us a perpetual, irrevocable, non-exclusive, worldwide, no-charge,
    royalty-free, unrestricted license to exercise all rights under those
    copyrights. This includes, at our option, the right to sublicense these same
    rights to third parties through multiple levels of sublicensees or other
    licensing arrangements;

    * you agree that each of us can do all things in relation to your
    contribution as if each of us were the sole owners, and if one of us makes
    a derivative work of your contribution, the one who makes the derivative
    work (or has it made will be the sole owner of that derivative work;

    * you agree that you will not assert any moral rights in your contribution
    against us, our licensees or transferees;

    * you agree that we may register a copyright in your contribution and
    exercise all ownership rights associated with it; and

    * you agree that neither of us has any duty to consult with, obtain the
    consent of, pay or render an accounting to the other for any use or
    distribution of your contribution.

3. With respect to any patents you own, or that you can license without payment
to any third party, you hereby grant to us a perpetual, irrevocable,
non-exclusive, worldwide, no-charge, royalty-free license to:

    * make, have made, use, sell, offer to sell, import, and otherwise transfer
    your contribution in whole or in part, alone or in combination with or
    included in any product, work or materials arising out of the project to
    which your contribution was submitted, and

    * at our option, to sublicense these same rights to third parties through
    multiple levels of sublicensees or other licensing arrangements.

4. Except as set out above, you keep all right, title, and interest in your
contribution. The rights that you grant to us under these terms are effective
on the date you first submitted a contribution to us, even if your submission
took place before the date you sign these terms.

5. You covenant, represent, warrant and agree that:

    * Each contribution that you submit is and shall be an original work of
    authorship and you can legally grant the rights set out in this SCA;

    * to the best of your knowledge, each contribution will not violate any
    third party's copyrights, trademarks, patents, or other intellectual
    property rights; and

    * each contribution shall be in compliance with U.S. export control laws and
    other applicable export and import laws. You agree to notify us if you
    become aware of any circumstance which would make any of the foregoing
    representations inaccurate in any respect. We may publicly disclose your
    participation in the project, including the fact that you have signed the SCA.

6. This SCA is governed by the laws of the State of California and applicable
U.S. Federal law. Any choice of law rules will not apply.

7. Please place an “x” on one of the applicable statement below. Please do NOT
mark both statements:

    * [x] I am signing on behalf of myself as an individual and no other person
    or entity, including my employer, has or will have rights with respect to my
    contributions.

    * [ ] I am signing on behalf of my employer or a legal entity and I have the
    actual authority to contractually bind that entity.

## Contributor Details

| Field | Entry |
|------------------------------- | -------------------- |
| Name | Laura Baakman |
| Company name (if applicable) | |
| Title or role (if applicable) | |
| Date | February 7, 2019 |
| GitHub username | lauraBaakman |
| Website (optional) | |

|
@@ -58,7 +58,7 @@ import spacy
    lang=("Language class to initialise", "option", "l", str),
)
def main(patterns_loc, text_loc, n=10000, lang="en"):
    nlp = spacy.blank("en")
    nlp = spacy.blank(lang)
    nlp.vocab.lex_attr_getters = {}
    phrases = read_gazetteer(nlp.tokenizer, patterns_loc)
    count = 0
@@ -26,6 +26,11 @@ from spacy.util import minibatch, compounding
    n_iter=("Number of training iterations", "option", "n", int),
)
def main(model=None, output_dir=None, n_iter=20, n_texts=2000):
    if output_dir is not None:
        output_dir = Path(output_dir)
        if not output_dir.exists():
            output_dir.mkdir()

    if model is not None:
        nlp = spacy.load(model)  # load existing spaCy model
        print("Loaded model '%s'" % model)
@@ -87,9 +92,6 @@ def main(model=None, output_dir=None, n_iter=20, n_texts=2000):
    print(test_text, doc.cats)

    if output_dir is not None:
        output_dir = Path(output_dir)
        if not output_dir.exists():
            output_dir.mkdir()
        with nlp.use_params(optimizer.averages):
            nlp.to_disk(output_dir)
        print("Saved model to", output_dir)
@@ -1,6 +1,6 @@
[
  {
    "id": "wsj_0200",
    "id": 42,
    "paragraphs": [
      {
        "raw": "In an Oct. 19 review of \"The Misanthrope\" at Chicago's Goodman Theatre (\"Revitalized Classics Take the Stage in Windy City,\" Leisure & Arts), the role of Celimene, played by Kim Cattrall, was mistakenly attributed to Christina Haag. Ms. Haag plays Elianti.",
10
pyproject.toml
Normal file
@@ -0,0 +1,10 @@
[build-system]
requires = ["setuptools",
            "wheel>0.32.0.<0.33.0",
            "Cython",
            "cymem>=2.0.2,<2.1.0",
            "preshed>=2.0.1,<2.1.0",
            "murmurhash>=0.28.0,<1.1.0",
            "thinc>=6.12.1,<6.13.0",
            ]
build-backend = "setuptools.build_meta"
@@ -14,7 +14,7 @@ plac<1.0.0,>=0.9.6
pathlib==1.0.1; python_version < "3.4"
# Development dependencies
cython>=0.25
pytest>=4.0.0,<5.0.0
pytest>=4.0.0,<4.1.0
pytest-timeout>=1.3.0,<2.0.0
mock>=2.0.0,<3.0.0
flake8>=3.5.0,<3.6.0
1
setup.py
@@ -246,6 +246,7 @@ def setup_package():
                "cuda92": ["cupy-cuda92>=4.0"],
                "cuda100": ["cupy-cuda100>=4.0"],
            },
            python_requires=">=2.7,!=3.0.*,!=3.1.*,!=3.2.*,!=3.3.*",
            classifiers=[
                "Development Status :: 5 - Production/Stable",
                "Environment :: Console",
@@ -31,9 +31,13 @@ def read_iob(raw_sents):
        tokens = [re.split("[^\w\-]", line.strip())]
        if len(tokens[0]) == 3:
            words, pos, iob = zip(*tokens)
        else:
        elif len(tokens[0]) == 2:
            words, iob = zip(*tokens)
            pos = ["-"] * len(words)
        else:
            raise ValueError(
                "The iob/iob2 file is not formatted correctly. Try checking whitespace and delimiters."
            )
        biluo = iob_to_biluo(iob)
        sentences.append(
            [
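The `read_iob` change above makes the converter accept either three columns (word, POS tag, IOB tag) or two (word, IOB tag) and raise a `ValueError` for anything else. The sketch below isolates that validation. It assumes whitespace-separated `word|POS|IOB` tokens and checks each token's column count rather than reproducing the regex split used in the diff; `parse_iob_line` is a hypothetical helper, not part of spaCy.

```python
def parse_iob_line(line):
    """Hypothetical helper: parse one sentence of 'word|POS|IOB' or 'word|IOB' tokens."""
    # Split on whitespace into tokens, then split each token into its columns.
    tokens = [token.split("|") for token in line.split()]
    column_counts = {len(token) for token in tokens}
    if column_counts == {3}:
        words, pos, iob = zip(*tokens)
    elif column_counts == {2}:
        words, iob = zip(*tokens)
        # No POS column: fill with a placeholder, as the diff does.
        pos = ["-"] * len(words)
    else:
        # Mixed, empty, or otherwise unexpected column counts.
        raise ValueError(
            "The iob/iob2 file is not formatted correctly. "
            "Try checking whitespace and delimiters."
        )
    return list(words), list(pos), list(iob)


words, pos, iob = parse_iob_line("I|PRP|O like|VBP|O London|NNP|B-GPE")
print(words, pos, iob)
```

Rejecting mixed column counts up front gives a clear error message instead of a confusing unpacking failure further down the pipeline.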
@@ -208,7 +208,11 @@ def read_freqs(freqs_loc, max_length=100, min_doc_freq=5, min_freq=50):
            doc_freq = int(doc_freq)
            freq = int(freq)
            if doc_freq >= min_doc_freq and freq >= min_freq and len(key) < max_length:
                word = literal_eval(key)
                try:
                    word = literal_eval(key)
                except SyntaxError:
                    # Take odd strings literally.
                    word = literal_eval("'%s'" % key)
                smooth_count = counts.smoother(int(freq))
                probs[word] = math.log(smooth_count) - log_total
    oov_prob = math.log(counts.smoother(0)) - log_total
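The `read_freqs` change wraps `literal_eval` in a `try`/`except SyntaxError`, so a frequency key that is not a valid Python literal is taken as a raw string instead of aborting the run. The sketch below isolates that fallback in a hypothetical `decode_key` function. One caveat: a bare identifier such as `hello` makes `literal_eval` raise `ValueError`, not `SyntaxError`, so the fallback as written does not cover that case.

```python
from ast import literal_eval


def decode_key(key):
    """Hypothetical helper mirroring the fallback added in read_freqs."""
    try:
        # Keys are normally quoted Python string literals, e.g. "'the'".
        return literal_eval(key)
    except SyntaxError:
        # Take odd strings literally by wrapping them in quotes.
        return literal_eval("'%s'" % key)


print(decode_key("'the'"))     # the
print(decode_key("New York"))  # New York (invalid syntax, so the fallback is used)
```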
@@ -9,7 +9,6 @@ from ..util import is_in_jupyter


_html = {}
IS_JUPYTER = is_in_jupyter()
RENDER_WRAPPER = None
@@ -18,7 +17,7 @@ def render(
    style="dep",
    page=False,
    minify=False,
    jupyter=IS_JUPYTER,
    jupyter=False,
    options={},
    manual=False,
):
@@ -51,7 +50,7 @@ def render(
    html = _html["parsed"]
    if RENDER_WRAPPER is not None:
        html = RENDER_WRAPPER(html)
    if jupyter:  # return HTML rendered by IPython display()
    if jupyter or is_in_jupyter():  # return HTML rendered by IPython display()
        from IPython.core.display import display, HTML

        return display(HTML(html))
@@ -1,7 +1,7 @@
# coding: utf8
from __future__ import unicode_literals

import random
import uuid

from .templates import TPL_DEP_SVG, TPL_DEP_WORDS, TPL_DEP_ARCS
from .templates import TPL_ENT, TPL_ENTS, TPL_FIGURE, TPL_TITLE, TPL_PAGE
@@ -41,7 +41,7 @@ class DependencyRenderer(object):
        """
        # Create a random ID prefix to make sure parses don't receive the
        # same ID, even if they're identical
        id_prefix = random.randint(0, 999)
        id_prefix = uuid.uuid4().hex
        rendered = [
            self.render_svg("{}-{}".format(id_prefix, i), p["words"], p["arcs"])
            for i, p in enumerate(parsed)
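The `DependencyRenderer` change replaces `random.randint(0, 999)` with `uuid.uuid4().hex` as the per-call ID prefix. With only 1,000 possible prefixes, output from a few hundred separate render calls on one page is very likely to repeat a prefix (the birthday problem), which yields duplicate IDs in the SVG markup. The sketch below compares the two schemes; `make_ids` is a hypothetical helper that only imitates the `"{}-{}".format(id_prefix, i)` pattern from the diff.

```python
import random
import uuid


def make_ids(new_prefix, n_calls, n_parses=2):
    """Hypothetical helper: build '<prefix>-<i>' IDs for several render calls."""
    ids = []
    for _ in range(n_calls):
        prefix = new_prefix()  # one prefix per call, as in the diff
        ids.extend("{}-{}".format(prefix, i) for i in range(n_parses))
    return ids


rng = random.Random(0)  # seeded so the run is reproducible
old_ids = make_ids(lambda: rng.randint(0, 999), n_calls=300)
new_ids = make_ids(lambda: uuid.uuid4().hex, n_calls=300)

print("duplicates with randint:", len(old_ids) - len(set(old_ids)))
print("duplicates with uuid4:  ", len(new_ids) - len(set(new_ids)))
```

A version-4 UUID carries 122 random bits, so collisions between calls become negligible at any realistic page size.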
@@ -4,20 +4,24 @@ from __future__ import unicode_literals
from .lookup import LOOKUP
from ._adjectives import ADJECTIVES
from ._adjectives_irreg import ADJECTIVES_IRREG
from ._adp_irreg import ADP_IRREG
from ._adverbs import ADVERBS
from ._auxiliary_verbs_irreg import AUXILIARY_VERBS_IRREG
from ._cconj_irreg import CCONJ_IRREG
from ._dets_irreg import DETS_IRREG
from ._lemma_rules import ADJECTIVE_RULES, NOUN_RULES, VERB_RULES
from ._nouns import NOUNS
from ._nouns_irreg import NOUNS_IRREG
from ._pronouns_irreg import PRONOUNS_IRREG
from ._sconj_irreg import SCONJ_IRREG
from ._verbs import VERBS
from ._verbs_irreg import VERBS_IRREG
from ._dets_irreg import DETS_IRREG
from ._pronouns_irreg import PRONOUNS_IRREG
from ._auxiliary_verbs_irreg import AUXILIARY_VERBS_IRREG
from ._lemma_rules import ADJECTIVE_RULES, NOUN_RULES, VERB_RULES


LEMMA_INDEX = {'adj': ADJECTIVES, 'adv': ADVERBS, 'noun': NOUNS, 'verb': VERBS}

LEMMA_EXC = {'adj': ADJECTIVES_IRREG, 'noun': NOUNS_IRREG, 'verb': VERBS_IRREG,
             'det': DETS_IRREG, 'pron': PRONOUNS_IRREG, 'aux': AUXILIARY_VERBS_IRREG}
LEMMA_EXC = {'adj': ADJECTIVES_IRREG, 'adp': ADP_IRREG, 'aux': AUXILIARY_VERBS_IRREG,
             'cconj': CCONJ_IRREG, 'det': DETS_IRREG, 'noun': NOUNS_IRREG, 'verb': VERBS_IRREG,
             'pron': PRONOUNS_IRREG, 'sconj': SCONJ_IRREG}

LEMMA_RULES = {'adj': ADJECTIVE_RULES, 'noun': NOUN_RULES, 'verb': VERB_RULES}
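`LEMMA_EXC` keys the French irregular-form tables by part of speech. This matters because the same surface form can have different lemmas depending on its tag: in the tables added in this commit, `"des"` maps to `"de"` in `ADP_IRREG` but to `"un"` in `DETS_IRREG`. The sketch below shows one way such a POS-keyed lookup could work. `lookup_exception` is hypothetical, the tables are trimmed copies of entries from this diff, and spaCy's own lemmatizer may consult them differently.

```python
# Trimmed copies of entries from the tables in this commit.
ADP_IRREG = {"des": ("de",), "du": ("de",), "aux": ("à",)}
DETS_IRREG = {"des": ("un",), "du": ("de",), "les": ("le",)}
SCONJ_IRREG = {"qu'": ("que",), "qd": ("quand",)}

LEMMA_EXC = {"adp": ADP_IRREG, "det": DETS_IRREG, "sconj": SCONJ_IRREG}


def lookup_exception(word, pos, exc=LEMMA_EXC):
    """Hypothetical helper: return irregular lemmas for word under a POS tag."""
    table = exc.get(pos.lower(), {})  # e.g. "ADP" -> ADP_IRREG
    return list(table.get(word.lower(), ()))


print(lookup_exception("des", "ADP"))   # ['de']
print(lookup_exception("des", "DET"))   # ['un']
print(lookup_exception("des", "NOUN"))  # [] (no table for this tag here)
```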
File diff suppressed because it is too large
File diff suppressed because it is too large
24
spacy/lang/fr/lemmatizer/_adp_irreg.py
Normal file
@@ -0,0 +1,24 @@
# coding: utf8
from __future__ import unicode_literals


ADP_IRREG = {
    "a": ("à",),
    "apr.": ("après",),
    "aux": ("à",),
    "av.": ("avant",),
    "avt": ("avant",),
    "cf.": ("cf",),
    "conf.": ("cf",),
    "confer": ("cf",),
    "d'": ("de",),
    "des": ("de",),
    "du": ("de",),
    "jusqu'": ("jusque",),
    "pdt": ("pendant",),
    "+": ("plus",),
    "pr": ("pour",),
    "/": ("sur",),
    "versus": ("vs",),
    "vs.": ("vs",)
}
File diff suppressed because it is too large
17
spacy/lang/fr/lemmatizer/_cconj_irreg.py
Normal file
@@ -0,0 +1,17 @@
# coding: utf8
from __future__ import unicode_literals


CCONJ_IRREG = {
    "&": ("et",),
    "c-à-d": ("c'est-à-dire",),
    "c.-à.-d.": ("c'est-à-dire",),
    "càd": ("c'est-à-dire",),
    "&": ("et",),
    "et|ou": ("et-ou",),
    "et/ou": ("et-ou",),
    "i.e.": ("c'est-à-dire",),
    "ie": ("c'est-à-dire",),
    "ou/et": ("et-ou",),
    "+": ("plus",)
}
@@ -4,20 +4,27 @@ from __future__ import unicode_literals

DETS_IRREG = {
    "aucune": ("aucun",),
    "cents": ("cent",),
    "certaine": ("certain",),
    "certaines": ("certain",),
    "certains": ("certain",),
    "ces": ("ce",),
    "cet": ("ce",),
    "cette": ("ce",),
    "cents": ("cent",),
    "certaines": ("certains",),
    "des": ("un",),
    "différentes": ("différents",),
    "diverse": ("divers",),
    "diverses": ("divers",),
    "du": ("de",),
    "la": ("le",),
    "les": ("le",),
    "l'": ("le",),
    "laquelle": ("lequel",),
    "les": ("le",),
    "lesdites": ("ledit",),
    "lesdits": ("ledit",),
    "leurs": ("leur",),
    "lesquelles": ("lequel",),
    "lesquels": ("lequel",),
    "leurs": ("leur",),
    "l'": ("le",),
    "mainte": ("maint",),
    "maintes": ("maint",),
    "maints": ("maint",),
@@ -27,23 +34,29 @@ DETS_IRREG = {
    "nulle": ("nul",),
    "nulles": ("nul",),
    "nuls": ("nul",),
    "pareille": ("pareil",),
    "pareilles": ("pareil",),
    "pareils": ("pareil",),
    "quelle": ("quel",),
    "quelles": ("quel",),
    "quels": ("quel",),
    "quelqu'": ("quelque",),
    "qq": ("quelque",),
    "qqes": ("quelque",),
    "qqs": ("quelque",),
    "quelques": ("quelque",),
    "quelqu'": ("quelque",),
    "quels": ("quel",),
    "sa": ("son",),
    "ses": ("son",),
    "telle": ("tel",),
    "telles": ("tel",),
    "tels": ("tel",),
    "ta": ("ton",),
    "telles": ("tel",),
    "telle": ("tel",),
    "tels": ("tel",),
    "tes": ("ton",),
    "tous": ("tout",),
    "toute": ("tout",),
    "toutes": ("tout",),
    "des": ("un",),
    "toute": ("tout",),
    "une": ("un",),
    "vingts": ("vingt",),
    "vot'": ("votre",),
    "vos": ("votre",),
}
@@ -63,36 +63,8 @@ NOUN_RULES = [
    ["w", "w"],
    ["y", "y"],
    ["z", "z"],
    ["as", "a"],
    ["aux", "au"],
    ["cs", "c"],
    ["chs", "ch"],
    ["ds", "d"],
    ["és", "é"],
    ["es", "e"],
    ["eux", "eu"],
    ["fs", "f"],
    ["gs", "g"],
    ["hs", "h"],
    ["is", "i"],
    ["ïs", "ï"],
    ["js", "j"],
    ["ks", "k"],
    ["ls", "l"],
    ["ms", "m"],
    ["ns", "n"],
    ["oux", "ou"],
    ["os", "o"],
    ["ps", "p"],
    ["qs", "q"],
    ["rs", "r"],
    ["ses", "se"],
    ["se", "se"],
    ["ts", "t"],
    ["us", "u"],
    ["vs", "v"],
    ["ws", "w"],
    ["ys", "y"],
    ["s", ""],
    ["x", ""],
    ["nt(e", "nt"],
    ["nt(e)", "nt"],
    ["al(e", "ale"],
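Entries in `NOUN_RULES` are `[suffix, replacement]` pairs. A common way to use such rules, sketched here as an assumption rather than a description of spaCy's French lemmatizer, is to generate a candidate for every matching suffix and keep only candidates found in a word index such as `NOUNS`. The rules below are a subset of those shown in this hunk; the three-word index and the `lemmatize_noun` helper are made up for the example.

```python
# A subset of the [suffix, replacement] pairs shown in this hunk.
NOUN_RULES = [["aux", "au"], ["eux", "eu"], ["s", ""], ["x", ""]]

# Made-up stand-in for the NOUNS index.
NOUNS = {"tuyau", "jeu", "chat"}


def lemmatize_noun(word, index=NOUNS, rules=NOUN_RULES):
    """Hypothetical helper: apply suffix rules, keep forms found in the index."""
    forms = []
    for old, new in rules:
        if word.endswith(old):
            form = word[: len(word) - len(old)] + new
            if form in index and form not in forms:
                forms.append(form)
    # Fall back to the word itself when no rule yields a known form.
    return forms or [word]


print(lemmatize_noun("tuyaux"))  # ['tuyau']
print(lemmatize_noun("jeux"))    # ['jeu']
print(lemmatize_noun("chats"))   # ['chat']
```

Filtering against an index keeps over-general rules such as `["s", ""]` from producing non-words like `"bu"` from `"bus"`.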
File diff suppressed because it is too large
File diff suppressed because it is too large
|
@ -4,37 +4,89 @@ from __future__ import unicode_literals
|
|||
|
||||
PRONOUNS_IRREG = {
|
||||
"aucune": ("aucun",),
|
||||
"celle-ci": ("celui-ci",),
|
||||
"celles-ci": ("celui-ci",),
|
||||
"ceux-ci": ("celui-ci",),
|
||||
"celle-là": ("celui-là",),
|
||||
"celles-là": ("celui-là",),
|
||||
"ceux-là": ("celui-là",),
|
||||
"autres": ("autre",),
|
||||
"ça": ("cela",),
|
||||
"c'": ("ce",),
|
||||
"celle": ("celui",),
|
||||
"celle-ci": ("celui-ci",),
|
||||
"celle-là": ("celui-là",),
|
||||
"celles": ("celui",),
|
||||
"ceux": ("celui",),
|
||||
"celles-ci": ("celui-ci",),
|
||||
"celles-là": ("celui-là",),
|
||||
"certaines": ("certains",),
|
||||
"ceux": ("celui",),
|
||||
"ceux-ci": ("celui-ci",),
|
||||
"ceux-là": ("celui-là",),
|
||||
"chacune": ("chacun",),
|
||||
"-elle": ("lui",),
|
||||
"elle": ("lui",),
|
||||
"elle-même": ("lui-même",),
|
||||
"-elles": ("lui",),
|
||||
"elles": ("lui",),
|
||||
"elles-mêmes": ("lui-même",),
|
||||
"eux": ("lui",),
|
||||
"eux-mêmes": ("lui-même",),
|
||||
"icelle": ("icelui",),
|
||||
"icelles": ("icelui",),
|
||||
"iceux": ("icelui",),
|
||||
"-il": ("il",),
|
||||
"-ils": ("il",),
|
||||
"ils": ("il",),
|
||||
"-je": ("je",),
|
||||
"j'": ("je",),
|
||||
"la": ("le",),
|
||||
"les": ("le",),
|
||||
"laquelle": ("lequel",),
|
||||
"l'autre": ("l'autre",),
|
||||
"les": ("le",),
|
||||
"lesquelles": ("lequel",),
|
||||
"lesquels": ("lequel",),
|
||||
"elle-même": ("lui-même",),
|
||||
"elles-mêmes": ("lui-même",),
|
||||
"eux-mêmes": ("lui-même",),
|
||||
"-leur": ("leur",),
|
||||
"l'on": ("on",),
|
||||
"-lui": ("lui",),
|
||||
"l'une": ("l'un",),
|
||||
"mêmes": ("même",),
|
||||
"-m'": ("me",),
|
||||
"m'": ("me",),
|
||||
"-moi": ("moi",),
|
||||
"nous-mêmes": ("nous-même",),
|
||||
"-nous": ("nous",),
|
||||
"-on": ("on",),
|
||||
"qqchose": ("quelque chose",),
|
||||
"qqch": ("quelque chose",),
|
||||
"qqc": ("quelque chose",),
|
||||
"qqn": ("quelqu'un",),
|
||||
"quelle": ("quel",),
|
||||
"quelles": ("quel",),
|
||||
"quels": ("quel",),
|
||||
"quelques-unes": ("quelqu'un",),
|
||||
"quelques-uns": ("quelqu'un",),
|
||||
"quelques-unes": ("quelques-uns",),
|
||||
"quelque-une": ("quelqu'un",),
|
||||
"quelqu'une": ("quelqu'un",),
|
||||
"quels": ("quel",),
|
||||
"qu": ("que",),
|
||||
"telle": ("tel",),
|
||||
"s'": ("se",),
|
||||
"-t-elle": ("elle",),
|
||||
"-t-elles": ("elle",),
|
||||
"telles": ("tel",),
|
||||
"telle": ("tel",),
|
||||
"tels": ("tel",),
|
||||
"toutes": ("tous",),
|
||||
"-t-en": ("en",),
|
||||
"-t-il": ("il",),
|
||||
"-t-ils": ("il",),
|
||||
"-toi": ("toi",),
|
||||
"-t-on": ("on",),
|
||||
"tous": ("tout",),
|
||||
"toutes": ("tout",),
|
||||
"toute": ("tout",),
|
||||
"-t'": ("te",),
|
||||
"t'": ("te",),
|
||||
"-tu": ("tu",),
|
||||
"-t-y": ("y",),
|
||||
"unes": ("un",),
|
||||
"une": ("un",),
|
||||
"uns": ("un",),
|
||||
"vous-mêmes": ("vous-même",),
|
||||
"vous-même": ("vous-même",),
|
||||
"-vous": ("vous",),
|
||||
"-vs": ("vous",),
|
||||
"vs": ("vous",),
|
||||
"-y": ("y",),
|
||||
}
|
||||
|
|
19
spacy/lang/fr/lemmatizer/_sconj_irreg.py
Normal file
@@ -0,0 +1,19 @@
# coding: utf8
from __future__ import unicode_literals


SCONJ_IRREG = {
    "lorsqu'": ("lorsque",),
    "pac'que": ("parce que",),
    "pac'qu'": ("parce que",),
    "parc'que": ("parce que",),
    "parc'qu'": ("parce que",),
    "paske": ("parce que",),
    "pask'": ("parce que",),
    "pcq": ("parce que",),
    "+": ("plus",),
    "puisqu'": ("puisque",),
    "qd": ("quand",),
    "quoiqu'": ("quoique",),
    "qu'": ("que",)
}
@@ -6,63 +6,64 @@ VERBS = set(
"""
abaisser abandonner abdiquer abecquer abéliser aberrer abhorrer abîmer abjurer
ablater abluer ablutionner abominer abonder abonner aborder aborner aboucher
abouler abouter abraquer abraser abreuver abricoter abriter absenter absinther
absolutiser absorber abuser académifier académiser acagnarder accabler
accagner accaparer accastiller accentuer accepter accessoiriser accidenter
acclamer acclimater accointer accolader accoler accommoder accompagner
accorder accorer accoster accoter accoucher accouder accouer accoupler
accoutrer accoutumer accouver accrassiner accréditer accrocher acculer
acculturer accumuler accuser acenser acétaliser acétyler achalander acharner
acheminer achopper achromatiser aciduler aciériser acliquer acoquiner acquêter
acquitter acter actiniser actionner activer actoriser actualiser acupuncturer
acyler adapter additionner adenter adieuser adirer adjectiver adjectiviser
adjurer adjuver administrer admirer admonester adoniser adonner adopter adorer
adorner adosser adouber adresser adsorber aduler adverbialiser aéroporter
aérosoliser aérosonder aérotransporter affabuler affacturer affairer affaisser
affaiter affaler affamer affecter affectionner affermer afficher affider
affiler affiner affirmer affistoler affixer affleurer afflouer affluer affoler
afforester affouiller affourcher affriander affricher affrioler affriquer
affriter affronter affruiter affubler affurer affûter afghaniser afistoler
africaniser agatiser agenouiller agglutiner aggraver agioter agiter agoniser
agourmander agrafer agrainer agrémenter agresser agriffer agripper
agroalimentariser agrouper aguetter aguicher ahaner aheurter aicher aider
aigretter aiguer aiguiller aiguillonner aiguiser ailer ailler ailloliser
aimanter aimer airer ajointer ajourer ajourner ajouter ajuster ajuter
alambiquer alarmer albaniser albitiser alcaliniser alcaliser alcooliser
alcoolyser alcoyler aldoliser alerter aleviner algébriser algérianiser
algorithmiser aligner alimenter alinéater alinéatiser aliter alkyler allaiter
allectomiser allégoriser allitiser allivrer allocutionner alloter allouer
alluder allumer allusionner alluvionner allyler aloter alpaguer alphabétiser
alterner aluminer aluminiser aluner alvéoler alvéoliser amabiliser amadouer
amalgamer amariner amarrer amateloter ambitionner ambler ambrer ambuler
améliorer amender amenuiser américaniser ameulonner ameuter amhariser amiauler
amicoter amidonner amignarder amignoter amignotter aminer ammoniaquer
ammoniser ammoxyder amocher amouiller amouracher amourer amphotériser ampouler
amputer amunitionner amurer amuser anagrammatiser anagrammer analyser
anamorphoser anaphylactiser anarchiser anastomoser anathématiser anatomiser
ancher anchoiter ancrer anecdoter anecdotiser angéliser anglaiser angler
angliciser angoisser anguler animaliser animer aniser ankyloser annexer
annihiler annoter annualiser annuler anodiser ânonner anser antagoniser
antéposer antérioriser anthropomorphiser anticiper anticoaguler antidater
antiparasiter antiquer antiseptiser anuiter aoûter apaiser apériter apetisser
apeurer apicaliser apiquer aplaner apologiser aponévrotomiser aponter aposter
apostiller apostoliser apostropher apostumer apothéoser appareiller apparenter
appeauter appertiser appliquer appointer appoltronner apponter apporter
apposer appréhender apprêter apprivoiser approcher approuver approvisionner
approximer apurer aquareller arabiser araméiser aramer araser arbitrer arborer
arboriser arcbouter arc-bouter archaïser architecturer archiver arçonner
ardoiser aréniser arer argenter argentiniser argoter argotiser argumenter
arianiser arimer ariser aristocratiser aristotéliser arithmétiser armaturer
armer arnaquer aromatiser arpenter arquebuser arquer arracher arraisonner
arrenter arrêter arrher arrimer arriser arriver arroser arsouiller
artérialiser articler articuler artificialiser artistiquer aryaniser aryler
ascensionner ascétiser aseptiser asexuer asianiser asiatiser aspecter
asphalter aspirer assabler assaisonner assassiner assembler assener asséner
assermenter asserter assibiler assigner assimiler assister assoiffer assoler
assommer assoner assoter assumer assurer asticoter astiquer athéiser
atlantiser atomiser atourner atropiniser attabler attacher attaquer attarder
attenter attentionner atténuer atterrer attester attifer attirer attiser
attitrer attraper attremper attribuer attrister attrouper aubiner
abouler abouter aboutonner abracadabrer abraquer abraser abreuver abricoter
abriter absenter absinther absolutiser absorber abuser académifier académiser
acagnarder accabler accagner accaparer accastiller accentuer accepter
accessoiriser accidenter acclamer acclimater accointer accolader accoler
accommoder accompagner accorder accorer accoster accoter accoucher accouder
accouer accoupler accoutrer accoutumer accouver accrassiner accréditer
accrocher acculer acculturer accumuler accuser acenser acétaliser acétyler
achalander acharner acheminer achopper achromatiser aciduler aciériser
acliquer acoquiner acquêter acquitter acter actiniser actionner activer
actoriser actualiser acupuncturer acyler adapter additionner adenter adieuser
adirer adjectiver adjectiviser adjurer adjuver administrer admirer admonester
adoniser adonner adopter adorer adorner adosser adouber adresser adsorber
aduler adverbialiser aéroporter aérosoliser aérosonder aérotransporter
affabuler affacturer affairer affaisser affaiter affaler affamer affecter
affectionner affermer afficher affider affiler affiner affirmer affistoler
affixer affleurer afflouer affluer affoler afforester affouiller affourcher
affriander affricher affrioler affriquer affriter affronter affruiter affubler
affurer affûter afghaniser afistoler africaniser agatiser agenouiller
agglutiner aggraver agioter agiter agoniser agourmander agrafer agrainer
agrémenter agresser agricher agriffer agripper agroalimentariser agrouper
aguetter aguicher aguiller ahaner aheurter aicher aider aigretter aiguer
aiguiller aiguillonner aiguiser ailer ailler ailloliser aimanter aimer airer
ajointer ajourer ajourner ajouter ajuster ajuter alambiquer alarmer albaniser
albitiser alcaliniser alcaliser alcooliser alcoolyser alcoyler aldoliser
alerter aleviner algébriser algérianiser algorithmiser aligner alimenter
alinéater alinéatiser aliter alkyler allaiter allectomiser allégoriser
allitiser allivrer allocutionner alloter allouer alluder allumer allusionner
alluvionner allyler aloter alpaguer alphabétiser alterner aluminer aluminiser
aluner alvéoler alvéoliser amabiliser amadouer amalgamer amariner amarrer
amateloter ambitionner ambler ambrer ambuler améliorer amender amenuiser
américaniser ameulonner ameuter amhariser amiauler amicoter amidonner
amignarder amignoter amignotter aminer ammoniaquer ammoniser ammoxyder amocher
amouiller amouracher amourer amphotériser ampouler amputer amunitionner amurer
amuser anagrammatiser anagrammer analyser anamorphoser anaphylactiser
anarchiser anastomoser anathématiser anatomiser ancher anchoiter ancrer
anecdoter anecdotiser angéliser anglaiser angler angliciser angoisser anguler
animaliser animer aniser ankyloser annexer annihiler annoter annualiser
annuler anodiser ânonner anser antagoniser antéposer antérioriser
anthropomorphiser anticiper anticoaguler antidater antiparasiter antiquer
antiseptiser anuiter aoûter apaiser apériter apetisser apeurer apicaliser
apiquer aplaner apologiser aponévrotomiser aponter aposter apostiller
apostoliser apostropher apostumer apothéoser appareiller apparenter appeauter
appertiser appliquer appointer appoltronner apponter apporter apposer
appréhender apprêter apprivoiser approcher approuver approvisionner approximer
apurer aquareller arabiser araméiser aramer araser arbitrer arborer arboriser
arcbouter arc-bouter archaïser architecturer archiver arçonner ardoiser
aréniser arer argenter argentiniser argoter argotiser argumenter arianiser
arimer ariser aristocratiser aristotéliser arithmétiser armaturer armer
arnaquer aromatiser arpenter arquebuser arquer arracher arraisonner arrenter
arrêter arrher arrimer arriser arriver arroser arsouiller artérialiser
articler articuler artificialiser artistiquer aryaniser aryler ascensionner
ascétiser aseptiser asexuer asianiser asiatiser aspecter asphalter aspirer
assabler assaisonner assassiner assembler assener asséner assermenter asserter
assibiler assigner assimiler assister assoiffer assoler assommer assoner
assoter assumer assurer asticoter astiquer athéiser atlantiser atomiser
atourner atropiniser attabler attacher attaquer attarder attenter attentionner
atténuer atterrer attester attifer attirer attiser attitrer attoucher attraper
attremper attribuer attriquer attrister attrouper aubader aubiner
audiovisualiser auditer auditionner augmenter augurer aulofer auloffer aumôner
auner auréoler ausculter authentiquer autoaccuser autoadapter autoadministrer
autoagglutiner autoalimenter autoallumer autoamputer autoanalyser autoancrer
@@ -73,10 +74,10 @@ VERBS = set(
autodéterminer autodévelopper autodévorer autodicter autodiscipliner
autodupliquer autoéduquer autoenchâsser autoenseigner autoépurer autoéquiper
autoévaporiser autoévoluer autoféconder autofertiliser autoflageller
autofonder autoformer autofretter autogouverner autogreffer autoguider auto-
immuniser auto-ioniser autolégitimer autolimiter autoliquider autolyser
automatiser automédiquer automitrailler automutiler autonomiser auto-
optimaliser auto-optimiser autoorganiser autoperpétuer autopersuader
autofonder autoformer autofretter autogouverner autogreffer autoguider
auto-immuniser auto-ioniser autolégitimer autolimiter autoliquider autolyser
automatiser automédiquer automitrailler automutiler autonomiser
auto-optimaliser auto-optimiser autoorganiser autoperpétuer autopersuader
autopiloter autopolliniser autoporter autopositionner autoproclamer
autopropulser autoréaliser autorecruter autoréglementer autoréguler
autorelaxer autoréparer autoriser autosélectionner autosevrer autostabiliser
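The hunk above rejoins compounds such as `auto-immuniser` and `auto-optimaliser` that had been wrapped across lines. Assuming, as the `VERBS = set(` header and the triple-quoted string suggest, that the list is built by splitting the string on whitespace, a line break inside a word turns it into two bogus entries. A minimal reproduction using two-line excerpts from this hunk:

```python
# Two-line excerpts of the old and new VERBS text from the hunk above.
OLD_TEXT = """
autofonder autoformer autofretter autogouverner autogreffer autoguider auto-
immuniser auto-ioniser autolégitimer autolimiter autoliquider autolyser
"""
NEW_TEXT = """
autofonder autoformer autofretter autogouverner autogreffer autoguider
auto-immuniser auto-ioniser autolégitimer autolimiter autoliquider autolyser
"""

# str.split() with no arguments splits on any whitespace, including newlines.
old_verbs = set(OLD_TEXT.split())
new_verbs = set(NEW_TEXT.split())

print(sorted(old_verbs - new_verbs))  # ['auto-', 'immuniser']
print(sorted(new_verbs - old_verbs))  # ['auto-immuniser']
```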
@@ -84,7 +85,7 @@ VERBS = set(
autotracter autotransformer autovacciner autoventiler avaler avaliser
aventurer aveugler avillonner aviner avironner aviser avitailler aviver
avoiner avoisiner avorter avouer axéniser axer axiomatiser azimuter azoter
azurer babiller babouiner bâcher bachonner bachoter bâcler badauder
azurer babiller babouiner bâcher bachonner bachoter bâcler badauder bader
badigeonner badiner baffer bafouer bafouiller bâfrer bagarrer bagoter bagouler
baguenauder baguer baguetter bahuter baigner bailler bâiller baîller
bâillonner baîllonner baiser baisoter baisouiller baisser bakéliser balader
@@ -135,9 +136,9 @@ VERBS = set(
brouillonner broussailler brousser brouter bruiner bruisser bruiter brûler
brumer brumiser bruncher brusquer brutaliser bruter bûcher bucoliser
budgétiser buer buffériser buffler bugler bugner buiser buissonner bulgariser
buquer bureaucratiser buriner buser busquer buter butiner butonner butter
buvoter byzantiner byzantiniser cabaler cabaliser cabaner câbler cabosser
caboter cabotiner cabrer cabrioler cacaber cacaoter cacarder cacher
buller buquer bureaucratiser buriner buser busquer buter butiner butonner
butter buvoter byzantiner byzantiniser cabaler cabaliser cabaner câbler
cabosser caboter cabotiner cabrer cabrioler cacaber cacaoter cacarder cacher
cachetonner cachotter cadastrer cadavériser cadeauter cadetter cadoter cadrer
cafarder cafeter cafouiller cafter cageoler cagnarder cagner caguer cahoter
caillebotter cailler caillouter cajoler calaminer calamistrer calamiter
@@ -185,65 +186,66 @@ VERBS = set(
claveliser claver clavetter clayonner cléricaliser clicher cligner clignoter
climatiser clinquanter clinquer cliper cliquer clisser cliver clochardiser
clocher clocter cloisonner cloîtrer cloner cloper clopiner cloquer clôturer
clouer clouter coaccuser coacerver coacher coadapter coagglutiner coaguler
coaliser coaltarer coaltariser coanimer coarticuler cobelligérer cocaïniser
cocarder cocheniller cocher côcher cochonner coconiser coconner cocooner
cocoter coder codéterminer codiller coéditer coéduquer coexister coexploiter
coexprimer coffiner coffrer cofonder cogiter cogner cogouverner cohabiter
cohériter cohober coiffer coincher coincider coïncider coïter colchiciner
collaber collaborer collationner collecter collectionner collectiviser coller
collisionner colloquer colluvionner colmater colombianiser colombiner
coloniser colorer coloriser colostomiser colporter colpotomiser coltiner
columniser combiner combler commander commanditer commémorer commenter
commercialiser comminer commissionner commotionner commuer communaliser
communautariser communiquer communiser commuter compacifier compacter comparer
compartimenter compenser compiler compisser complanter complémenter
complétiviser complexer complimenter compliquer comploter comporter composer
composter compoter compounder compresser comprimer comptabiliser compter
compulser computer computériser concentrer conceptualiser concerner concerter
concher conciliabuler concocter concomiter concorder concrétionner concrétiser
concubiner condamner condenser condimenter conditionner confabuler
confectionner confédéraliser confesser confessionnaliser configurer confiner
confirmer confisquer confiter confluer conformer conforter confronter
confusionner congestionner conglober conglutiner congoliser congratuler
coniser conjecturer conjointer conjuger conjuguer conjurer connecter conniver
connoter conquêter consacrer conscientiser conseiller conserver consigner
consister consoler consolider consommariser consommer consonantiser consoner
conspirer conspuer constater consteller conster consterner constiper
constituer constitutionnaliser consulter consumer contacter contagionner
containeriser containériser contaminer contemner contempler conteneuriser
contenter conter contester contextualiser continentaliser contingenter
continuer contorsionner contourner contracter contractualiser contracturer
contraposer contraster contre-attaquer contrebouter contrebuter contrecalquer
contrecarrer contre-expertiser contreficher contrefraser contre-indiquer
contremander contremanifester contremarcher contremarquer contreminer
contremurer contrenquêter contreplaquer contrepointer contrer contresigner
contrespionner contretyper contreventer contribuer contrister contrôler
controuver controverser contusionner conventionnaliser conventionner
conventualiser converser convoiter convoler convoquer convulser convulsionner
cooccuper coopératiser coopter coordonner coorganiser coparrainer coparticiper
copermuter copiner copolycondenser copolymériser coprésenter coprésider copser
copter copuler copyrighter coqueliner coquer coqueriquer coquiller corailler
corder cordonner coréaliser coréaniser coréguler coresponsabiliser cornaquer
cornemuser corner coroniser corporiser correctionaliser correctionnaliser
correler corréler corroborer corroder corser corticaliser cosigner cosmétiquer
cosser costumer coter cotillonner cotiser cotonner cotransfecter couaquer
couarder couchailler coucher couchoter couchotter coucouer coucouler couder
coudrer couillonner couiner couler coulisser coupailler coupeller couper
couperoser coupler couponner courailler courbaturer courber courbetter
courcailler couronner courrieler courser courtauder court-circuiter courtiser
cousiner coussiner coûter couturer couver cracher crachiner crachoter
crachouiller crailler cramer craminer cramper cramponner crampser cramser
craner crâner crânoter cranter crapahuter crapaüter crapser crapuler craquer
crasher cratériser craticuler cratoniser cravacher cravater crawler crayonner
crédibiliser créditer crématiser créoliser créosoter crêper crépiner crépiter
crésyler crêter crétiniser creuser criailler cribler criminaliser criquer
crisper crisser cristalliser criticailler critiquer crocher croiser crôler
croquer croskiller crosser crotoniser crotter crouler croupionner crouponner
clotûrer clouer clouter coaccuser coacerver coacher coadapter coagglutiner
coaguler coaliser coaltarer coaltariser coanimer coarticuler cobelligérer
cocaïniser cocarder cocheniller cocher côcher cochonner coconiser coconner
cocooner cocoter coder codéterminer codiller coéditer coéduquer coexister
coexploiter coexprimer coffiner coffrer cofonder cogiter cogner cogouverner
cohabiter cohériter cohober coiffer coincher coincider coïncider coïter
|
||||
colchiciner collaber collaborer collationner collecter collectionner
|
||||
collectiviser coller collisionner colloquer colluvionner colmater
|
||||
colombianiser colombiner coloniser colorer coloriser colostomiser colporter
|
||||
colpotomiser coltiner columniser combiner combler commander commanditer
|
||||
commémorer commenter commercialiser comminer commissionner commotionner
|
||||
commuer communaliser communautariser communiquer communiser commuter
|
||||
compacifier compacter comparer compartimenter compenser compiler compisser
|
||||
complanter complémenter complétiviser complexer complimenter compliquer
|
||||
comploter comporter composer composter compoter compounder compresser
|
||||
comprimer comptabiliser compter compulser computer computériser concentrer
|
||||
conceptualiser concerner concerter concher conciliabuler concocter concomiter
|
||||
concorder concrétionner concrétiser concubiner condamner condenser condimenter
|
||||
conditionner confabuler confectionner confédéraliser confesser
|
||||
confessionnaliser configurer confiner confirmer confisquer confiter confluer
|
||||
conformer conforter confronter confusionner congestionner conglober
|
||||
conglutiner congoliser congratuler coniser conjecturer conjointer conjuger
|
||||
conjuguer conjurer connecter conniver connoter conquêter consacrer
|
||||
conscientiser conseiller conserver consigner consister consoler consolider
|
||||
consommariser consommer consonantiser consoner conspirer conspuer constater
|
||||
consteller conster consterner constiper constituer constitutionnaliser
|
||||
consulter consumer contacter contagionner containeriser containériser
|
||||
contaminer contemner contempler conteneuriser contenter conter contester
|
||||
contextualiser continentaliser contingenter continuer contorsionner contourner
|
||||
contracter contractualiser contracturer contraposer contraster contre-attaquer
|
||||
contrebouter contrebuter contrecalquer contrecarrer contre-expertiser
|
||||
contreficher contrefraser contre-indiquer contremander contremanifester
|
||||
contremarcher contremarquer contreminer contremurer contrenquêter
|
||||
contreplaquer contrepointer contrer contresigner contrespionner contretyper
|
||||
contreventer contribuer contrister contrôler controuver controverser
|
||||
contusionner conventionnaliser conventionner conventualiser converser
|
||||
convoiter convoler convoquer convulser convulsionner cooccuper coopératiser
|
||||
coopter coordonner coorganiser coparrainer coparticiper copermuter copiner
|
||||
copolycondenser copolymériser coprésenter coprésider copser copter copuler
|
||||
copyrighter coqueliner coquer coqueriquer coquiller corailler corder cordonner
|
||||
coréaliser coréaniser coréguler coresponsabiliser cornaquer cornemuser corner
|
||||
coroniser corporiser correctionaliser correctionnaliser correler corréler
|
||||
corroborer corroder corser corticaliser cosigner cosmétiquer cosser costumer
|
||||
coter cotillonner cotiser cotonner cotransfecter couaquer couarder couchailler
|
||||
coucher couchoter couchotter coucouer coucouler couder coudrer couillonner
|
||||
couiner couler coulisser coupailler coupeller couper couperoser coupler
|
||||
couponner courailler courbaturer courber courbetter courcailler couronner
|
||||
courrieler courser courtauder court-circuiter courtiser cousiner coussiner
|
||||
coûter couturer couver cracher crachiner crachoter crachouiller crailler
|
||||
cramer craminer cramper cramponner crampser cramser craner crâner crânoter
|
||||
cranter crapahuter crapaüter crapser crapuler craquer crasher cratériser
|
||||
craticuler cratoniser cravacher cravater crawler crayonner crédibiliser
|
||||
créditer crématiser créoliser créosoter crêper crépiner crépiter crésyler
|
||||
crêter crétiniser creuser criailler cribler criminaliser criquer crisper
|
||||
crisser cristalliser criticailler critiquer crocher croiser crôler croquer
|
||||
croskiller crosser crotoniser crotter crouler croupionner crouponner
|
||||
croustiller croûter croûtonner cryoappliquer cryocautériser cryocoaguler
|
||||
cryoconcentrer cryodécaper cryoébarber cryofixer cryogéniser cryomarquer
|
||||
cryosorber crypter cuber cueiller cuider cuisiner cuiter cuivrer culbuter
|
||||
culer culminer culotter culpabiliser cultiver culturaliser cumuler curariser
|
||||
cryosorber crypter cuber cueiller cuider cuisiner cuivrer culbuter culer
|
||||
culminer culotter culpabiliser cultiver culturaliser cumuler curariser
|
||||
curedenter curer curetter customiser cuter cutiniser cuver cyaniser cyanoser
|
||||
cyanurer cybernétiser cycler cycliser cycloner cylindrer dactylocoder daguer
|
||||
daguerréotyper daïer daigner dailler daller damasquiner damer damner
|
||||
|
@@ -748,8 +750,8 @@ VERBS = set(
mithridatiser mitonner mitrailler mixer mixter mixtionner mobiliser modaliser
modéliser modérantiser moderniser moduler moellonner mofler moirer moiser
moissonner molarder molariser moléculariser molester moletter mollarder
molletter monarchiser mondaniser monder mondialiser monétariser monétiser
moniliser monologuer monomériser monophtonguer monopoler monopoliser
molletonner molletter monarchiser mondaniser monder mondialiser monétariser
monétiser moniliser monologuer monomériser monophtonguer monopoler monopoliser
monoprogrammer monosiallitiser monotoniser monseigneuriser monter montrer
monumentaliser moquer moquetter morailler moraliser mordailler mordiller
mordillonner mordorer mordoriser morfailler morfaler morfiler morfler morganer
@@ -792,63 +794,64 @@ VERBS = set(
palpiter palucher panacher panader pancarter paner paniquer panneauter panner
pannetonner panoramiquer panser pantiner pantomimer pantoufler paoner paonner
papelarder papillonner papilloter papoter papouiller paquer paraboliser
parachuter parader parafer paraffiner paralléliser paralyser paramétriser
parangonner parapher paraphraser parasiter parcellariser parceller parcelliser
parcheminer parcoriser pardonner parementer parenthétiser parer paresser
parfiler parfumer parisianiser parjurer parkériser parlementer parler parloter
parlotter parquer parrainer participer particulariser partitionner partouzer
pasquiner pasquiniser passefiler passementer passepoiler passeriller
passionnaliser passionner pasteller pasteuriser pasticher pastiller pastoriser
patafioler pateliner patenter paternaliser paterner pathétiser patienter
patiner pâtisser patoiser pâtonner patouiller patrimonialiser patrociner
patronner patrouiller patter pâturer paumer paupériser pauser pavaner paver
pavoiser peaufiner pébriner pécher pêcher pécloter pectiser pédaler pédanter
pédantiser pédiculiser pédicurer pédimenter peigner peiner peinturer
peinturlurer péjorer pelaner pelauder péleriner pèleriner pelletiser
pelleverser pelliculer peloter pelotonner pelucher pelurer pénaliser pencher
pendeloquer pendiller pendouiller penduler pénéplaner penser pensionner
peptiser peptoniser percaliner percher percoler percuter perdurer pérégriner
pérenniser perfectionner perforer performer perfuser péricliter périmer
périodiser périphériser périphraser péritoniser perler permanenter permaner
perméabiliser permuter pérorer pérouaniser peroxyder perpétuer perquisitionner
perreyer perruquer persécuter persifler persiller persister personnaliser
persuader perturber pervibrer pester pétarader pétarder pétiller pétitionner
pétocher pétouiller pétrarquiser pétroliser pétuner peupler pexer
phacoémulsifier phagocyter phalangiser pharyngaliser phéniquer phénoler
phényler philosophailler philosopher phlébotomiser phlegmatiser phlogistiquer
phonétiser phonologiser phosphater phosphorer phosphoriser phosphoryler
photoactiver photocomposer photograver photo-ioniser photoïoniser photomonter
photophosphoryler photopolymériser photosensibiliser phraser piaffer piailler
pianomiser pianoter piauler pickler picocher picoler picorer picoter picouser
picouzer picrater pictonner picturaliser pidginiser piédestaliser pierrer
piétiner piétonnifier piétonniser pieuter pifer piffer piffrer pigeonner
pigmenter pigner pignocher pignoler piler piller pilloter pilonner piloter
pimenter pinailler pinceauter pinçoter pindariser pinter piocher pionner
piotter piper piqueniquer pique-niquer piquer piquetonner piquouser piquouzer
pirater pirouetter piser pisser pissoter pissouiller pistacher pister pistoler
pistonner pitancher pitcher piter pitonner pituiter pivoter placarder
placardiser plafonner plaider plainer plaisanter plamer plancher planer
planétariser planétiser planquer planter plaquer plasmolyser plastiquer
plastronner platiner platiniser platoniser plâtrer plébisciter pleurailler
pleuraliser pleurer pleurnicher pleuroter pleuviner pleuvioter pleuvoter
plisser plissoter plomber ploquer plotiniser plouter ploutrer plucher
plumarder plumer pluraliser plussoyer pluviner pluvioter pocharder pocher
pochetronner pochtronner poculer podzoliser poêler poétiser poignarder poigner
poiler poinçonner pointer pointiller poireauter poirer poiroter poisser
poitriner poivrer poivroter polariser poldériser polémiquer polissonner
politicailler politiquer politiser polker polliciser polliniser polluer
poloniser polychromer polycontaminer polygoner polygoniser polymériser
polyploïdiser polytransfuser polyviser pommader pommer pomper pomponner
ponctionner ponctuer ponter pontiller populariser poquer porer porphyriser
porter porteuser portionner portoricaniser portraicturer portraiturer poser
positionner positiver possibiliser postdater poster postérioriser posticher
postillonner postposer postsonoriser postsynchroniser postuler potabiliser
potentialiser poter poteyer potiner poudrer pouffer pouiller pouliner pouloper
poulotter pouponner pourpenser pourprer poussailler pousser poutser praliner
pratiquer préaccentuer préadapter préallouer préassembler préassimiler
préaviser précariser précautionner prêchailler préchauffer préchauler prêcher
précipiter préciser préciter précompter préconditionner préconfigurer
préconiser préconstituer précoter prédater prédécouper prédésigner prédestiner
parachuter parader parafer paraffiner paraisonner paralléliser paralyser
paramétriser parangonner parapher paraphraser parasiter parcellariser
parceller parcelliser parcheminer parcoriser pardonner parementer
parenthétiser parer paresser parfiler parfumer parisianiser parjurer
parkériser parlementer parler parloter parlotter parquer parrainer participer
particulariser partitionner partouzer pasquiner pasquiniser passefiler
passementer passepoiler passeriller passionnaliser passionner pasteller
pasteuriser pasticher pastiller pastoriser patafioler pateliner patenter
paternaliser paterner pathétiser patienter patiner pâtisser patoiser pâtonner
patouiller patrimonialiser patrociner patronner patrouiller patter pâturer
paumer paupériser pauser pavaner paver pavoiser peaufiner pébriner pécher
pêcher pécloter pectiser pédaler pédanter pédantiser pédiculiser pédicurer
pédimenter peigner peiner peinturer peinturlurer péjorer pelaner pelauder
péleriner pèleriner pelletiser pelleverser pelliculer peloter pelotonner
pelucher pelurer pénaliser pencher pendeloquer pendiller pendouiller penduler
pénéplaner penser pensionner peptiser peptoniser percaliner percher percoler
percuter perdurer pérégriner pérenniser perfectionner perforer performer
perfuser péricliter périmer périodiser périphériser périphraser péritoniser
perler permanenter permaner perméabiliser permuter pérorer pérouaniser
peroxyder perpétuer perquisitionner perreyer perruquer persécuter persifler
persiller persister personnaliser persuader perturber pervibrer pester
pétarader pétarder pétiller pétitionner pétocher pétouiller pétrarquiser
pétroliser pétuner peupler pexer phacoémulsifier phagocyter phalangiser
pharyngaliser phéniquer phénoler phényler philosophailler philosopher
phlébotomiser phlegmatiser phlogistiquer phonétiser phonologiser phosphater
phosphorer phosphoriser phosphoryler photoactiver photocomposer photograver
photo-ioniser photoïoniser photomonter photophosphoryler photopolymériser
photosensibiliser phraser piaffer piailler pianomiser pianoter piauler pickler
picocher picoler picorer picoter picouser picouzer picrater pictonner
picturaliser pidginiser piédestaliser pierrer piétiner piétonnifier
piétonniser pieuter pifer piffer piffrer pigeonner pigmenter pigner pignocher
pignoler piler piller pilloter pilonner piloter pimenter pinailler pinceauter
pinçoter pindariser pinter piocher pionner piotter piper piqueniquer
pique-niquer piquer piquetonner piquouser piquouzer pirater pirouetter piser
pisser pissoter pissouiller pistacher pister pistoler pistonner pitancher
pitcher piter pitonner pituiter pivoter placarder placardiser plafonner
plaider plainer plaisanter plamer plancher planer planétariser planétiser
planquer planter plaquer plasmolyser plastiquer plastronner platiner
platiniser platoniser plâtrer plébisciter pleurailler pleuraliser pleurer
pleurnicher pleuroter pleuviner pleuvioter pleuvoter plisser plissoter plomber
ploquer plotiniser plouter ploutrer plucher plumarder plumer pluraliser
plussoyer pluviner pluvioter pocharder pocher pochetronner pochtronner poculer
podzoliser poêler poétiser poignarder poigner poiler poinçonner pointer
pointiller poireauter poirer poiroter poisser poitriner poivrer poivroter
polariser poldériser polémiquer polissonner politicailler politiquer politiser
polker polliciser polliniser polluer poloniser polychromer polycontaminer
polygoner polygoniser polymériser polyploïdiser polytransfuser polyviser
pommader pommer pomper pomponner ponctionner ponctuer ponter pontiller
populariser poquer porer porphyriser porter porteuser portionner
portoricaniser portraicturer portraiturer poser positionner positiver
possibiliser postdater poster postérioriser posticher postillonner postposer
postsonoriser postsynchroniser postuler potabiliser potentialiser poter
poteyer potiner poudrer pouffer pouiller pouliner pouloper poulotter pouponner
pourpenser pourprer poussailler pousser poutser praliner pratiquer
préaccentuer préadapter préallouer préassembler préassimiler préaviser
précariser précautionner prêchailler préchauffer préchauler prêcher précipiter
préciser préciter précompter préconditionner préconfigurer préconiser
préconstituer précoter prédater prédécouper prédésigner prédestiner
prédéterminer prédiffuser prédilectionner prédiquer prédisposer prédominer
préemballer préempter préencoller préenregistrer préenrober préexaminer
préexister préfabriquer préfaner préfigurer préfixer préformater préformer
@@ -879,8 +882,8 @@ VERBS = set(
raccommoder raccompagner raccorder raccoutrer raccoutumer raccrocher racémiser
rachalander racher raciner racketter racler râcler racoler raconter racoquiner
radariser rader radicaliser radiner radioactiver radiobaliser radiocommander
radioconserver radiodétecter radiodiffuser radioexposer radioguider radio-
immuniser radiolocaliser radiopasteuriser radiosonder radiostériliser
radioconserver radiodétecter radiodiffuser radioexposer radioguider
radio-immuniser radiolocaliser radiopasteuriser radiosonder radiostériliser
radiotéléphoner radiotéléviser radoter radouber rafaler raffermer raffiler
raffiner raffluer raffoler raffûter rafistoler rafler ragoter ragoûter
ragrafer raguer raguser raiguiser railler rainer rainurer raisonner rajouter
@@ -1123,19 +1126,21 @@ VERBS = set(
sommer somnambuler somniloquer somnoler sonder sonnailler sonner sonoriser
sophistiquer sorguer soubresauter souder souffler souffroter soufrer souhaiter
souiller souillonner soûler souligner soûlotter soumissionner soupailler
soupçonner souper soupirer souquer sourciller sourdiner sous-capitaliser sous-
catégoriser sousestimer sous-estimer sous-industrialiser sous-médicaliser
sousperformer sous-qualifier soussigner sous-titrer sous-utiliser soutacher
souter soutirer soviétiser spammer spasmer spatialiser spatuler spécialiser
spéculer sphéroïdiser spilitiser spiraler spiraliser spirantiser spiritualiser
spitter splénectomiser spléniser sponsoriser sporter sporuler sprinter
squatériser squatter squatteriser squattériser squeezer stabiliser stabuler
staffer stagner staliniser standardiser standoliser stanioler stariser
stationner statistiquer statuer stelliter stenciler stendhaliser sténoser
sténotyper stepper stéréotyper stériliser stigmatiser stimuler stipuler
stocker stoloniser stopper stranguler stratégiser stresser strider striduler
striper stripper striquer stronker strouiller structurer strychniser stuquer
styler styliser subalterniser subdiviser subdivisionner subériser subjectiver
soupçonner souper soupirer souquer sourciller sourdiner sous-alimenter
sous-capitaliser sous-catégoriser sous-équiper sousestimer sous-estimer
sous-évaluer sous-exploiter sous-exposer sous-industrialiser sous-louer
sous-médicaliser sousperformer sous-qualifier soussigner sous-titrer
sous-traiter sous-utiliser sous-virer soutacher souter soutirer soviétiser
spammer spasmer spatialiser spatuler spécialiser spéculer sphéroïdiser
spilitiser spiraler spiraliser spirantiser spiritualiser spitter
splénectomiser spléniser sponsoriser sporter sporuler sprinter squatériser
squatter squatteriser squattériser squeezer stabiliser stabuler staffer
stagner staliniser standardiser standoliser stanioler stariser stationner
statistiquer statuer stelliter stenciler stendhaliser sténoser sténotyper
stepper stéréotyper stériliser stigmatiser stimuler stipuler stocker
stoloniser stopper stranguler stratégiser stresser strider striduler striper
stripper striquer stronker strouiller structurer strychniser stuquer styler
styliser subalterniser subdiviser subdivisionner subériser subjectiver
subjectiviser subjuguer sublimer sublimiser subluxer subminiaturiser subodorer
subordonner suborner subsister substanter substantialiser substantiver
substituer subsumer subtiliser suburbaniser subventionner succomber suçoter
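The VERBS hunks above mostly rewrap the French verb list; the visible hunk headers only show that it is built with `VERBS = set(`. Assuming the list is a whitespace-separated string passed through `.split()` (an assumption, since the full definition is outside this view), rewrapping alone cannot change the resulting set, but a word broken across two lines with a trailing hyphen can. The old lines `radio-` / `immuniser` and `sous-` / `catégoriser` are such cases. A minimal sketch using the `radio-immuniser` lines from the diff:

```python
# Assumption: VERBS is built as set("""...""".split()); this diff only shows
# the hunk header "VERBS = set(".
OLD_TEXT = """
radioconserver radiodétecter radiodiffuser radioexposer radioguider radio-
immuniser radiolocaliser radiopasteuriser radiosonder radiostériliser
"""

NEW_TEXT = """
radioconserver radiodétecter radiodiffuser radioexposer radioguider
radio-immuniser radiolocaliser radiopasteuriser radiosonder radiostériliser
"""


def build_verbs(text):
    """Split on any whitespace, so line breaks behave exactly like spaces."""
    return set(text.split())


def broken_entries(verbs):
    """Entries starting or ending with "-" are leftovers of a split compound."""
    return sorted(v for v in verbs if v.startswith("-") or v.endswith("-"))


old_verbs = build_verbs(OLD_TEXT)
new_verbs = build_verbs(NEW_TEXT)

print(broken_entries(old_verbs))        # ['radio-']
print("radio-immuniser" in old_verbs)   # False
print("radio-immuniser" in new_verbs)   # True
print(sorted(old_verbs ^ new_verbs))    # ['immuniser', 'radio-', 'radio-immuniser']
```

In this two-line excerpt, the bare `immuniser` entry in the old set exists only because of the broken line; the rewrapped version yields the single compound instead.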
File diff suppressed because it is too large
@@ -1,7 +1,7 @@
# coding: utf8
from __future__ import unicode_literals

from ....symbols import POS, NOUN, VERB, ADJ, ADV, PRON, DET, AUX, PUNCT
from ....symbols import POS, NOUN, VERB, ADJ, ADV, PRON, DET, AUX, PUNCT, ADP, SCONJ, CCONJ
from ....symbols import VerbForm_inf, VerbForm_none, Number_sing, Degree_pos
from .lookup import LOOKUP
@@ -9,7 +9,7 @@ from .lookup import LOOKUP
French language lemmatizer applies the default rule based lemmatization
procedure with some modifications for better French language support.

The parts of speech 'ADV', 'PRON', 'DET' and 'AUX' are added to use the
The parts of speech 'ADV', 'PRON', 'DET', 'ADP' and 'AUX' are added to use the
rule-based lemmatization. As a last resort, the lemmatizer checks in
the lookup table.
'''
@@ -34,16 +34,22 @@ class FrenchLemmatizer(object):
univ_pos = 'verb'
elif univ_pos in (ADJ, 'ADJ', 'adj'):
univ_pos = 'adj'
elif univ_pos in (ADP, 'ADP', 'adp'):
univ_pos = 'adp'
elif univ_pos in (ADV, 'ADV', 'adv'):
univ_pos = 'adv'
elif univ_pos in (PRON, 'PRON', 'pron'):
univ_pos = 'pron'
elif univ_pos in (DET, 'DET', 'det'):
univ_pos = 'det'
elif univ_pos in (AUX, 'AUX', 'aux'):
univ_pos = 'aux'
elif univ_pos in (CCONJ, 'CCONJ', 'cconj'):
univ_pos = 'cconj'
elif univ_pos in (DET, 'DET', 'det'):
univ_pos = 'det'
elif univ_pos in (PRON, 'PRON', 'pron'):
univ_pos = 'pron'
elif univ_pos in (PUNCT, 'PUNCT', 'punct'):
univ_pos = 'punct'
elif univ_pos in (SCONJ, 'SCONJ', 'sconj'):
univ_pos = 'sconj'
else:
return [self.lookup(string)]
# See Issue #435 for example of where this logic is requied.
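The hunk extends the `elif` chain in `FrenchLemmatizer` so that `ADP`, `CCONJ`, `PUNCT` and `SCONJ` are mapped to lowercase rule keys, while any other tag falls through to `self.lookup(string)`. The same normalization can be sketched as a lookup table. This sketch only handles string tags (spaCy's integer symbol IDs are left out), the `noun` entry is an assumption about the branch above the hunk, and `lemmatize` with its `apply_rules` callback is a hypothetical driver, not spaCy's API.

```python
# Plain-string stand-ins for the tags handled by the elif chain; spaCy's
# integer symbol IDs (ADJ, ADP, ...) are not modeled here.
RULE_POS = ("noun", "verb", "adj", "adp", "adv", "aux",
            "cconj", "det", "pron", "punct", "sconj")

# Accept both spellings each branch checks, e.g. "ADP" and "adp".
POS_KEYS = {tag: tag for tag in RULE_POS}
POS_KEYS.update({tag.upper(): tag for tag in RULE_POS})


def normalize_pos(univ_pos):
    """Return the lowercase rule key, or None when the lookup table should be used."""
    return POS_KEYS.get(univ_pos)


def lemmatize(string, univ_pos, lookup_table, apply_rules):
    """Hypothetical driver: rules for supported tags, lookup otherwise."""
    key = normalize_pos(univ_pos)
    if key is None:
        # Mirrors `else: return [self.lookup(string)]`.
        return [lookup_table.get(string, string)]
    return apply_rules(string, key)


print(normalize_pos("ADP"))    # adp
print(normalize_pos("sconj"))  # sconj
print(normalize_pos("X"))      # None
```

Because each tag appears in only one branch, the order of the branches does not affect the result, which is presumably why the diff can move the `DET` and `PRON` branches below `CCONJ`.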
@@ -100,7 +106,7 @@ class FrenchLemmatizer(object):

def lookup(self, string):
if string in self.lookup_table:
return self.lookup_table[string]
return self.lookup_table[string][0]
return string
@@ -125,7 +131,7 @@ def lemmatize(string, index, exceptions, rules):
if not forms:
forms.extend(oov_forms)
if not forms and string in LOOKUP.keys():
forms.append(LOOKUP[string])
forms.append(LOOKUP[string][0])
if not forms:
forms.append(string)
return list(set(forms))
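Both `FrenchLemmatizer.lookup` and `lemmatize()` now index `[0]` into the lookup table. This suggests that the values in the suppressed `lookup` file have become sequences with the preferred lemma first, although that reading is an inference from these lines only. Below is a sketch under that assumption, with hypothetical table entries (`suis` is ambiguous between *être* and *suivre*). `lemmatize_tail` is an illustrative name for the tail of `lemmatize()` shown in the hunk.

```python
# Hypothetical entries; the real table lives in the suppressed lookup file.
# Assumption: each value is a sequence whose first item is the preferred lemma.
LOOKUP = {
    "suis": ["être", "suivre"],
    "chantons": ["chanter"],
}


def lookup(string, table=LOOKUP):
    """Return the first lemma for a known form, otherwise the form itself."""
    if string in table:
        return table[string][0]
    return string


def lemmatize_tail(string, forms, table=LOOKUP):
    """Mirror the end of lemmatize(): lookup, then identity, then de-duplicate."""
    forms = list(forms)
    if not forms and string in table:
        forms.append(table[string][0])
    if not forms:
        forms.append(string)
    return list(set(forms))


print(lookup("suis"))                 # être
print(lemmatize_tail("inconnu", []))  # ['inconnu']
```

Under the same assumption, the old `self.lookup_table[string]` line would have returned the whole list `['être', 'suivre']` instead of a string.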
File diff suppressed because it is too large
@@ -1,16 +1,15 @@
# encoding: utf8
from __future__ import unicode_literals, print_function

from ...language import Language
from ...attrs import LANG
from ...tokens import Doc, Token
from ...tokenizer import Tokenizer
from ... import util
from .tag_map import TAG_MAP

import re
from collections import namedtuple

from .tag_map import TAG_MAP

from ...attrs import LANG
from ...language import Language
from ...tokens import Doc, Token
from ...util import DummyTokenizer

ShortUnitWord = namedtuple("ShortUnitWord", ["surface", "lemma", "pos"])
@@ -46,12 +45,12 @@ def resolve_pos(token):
# PoS mappings.

if token.pos == "連体詞,*,*,*":
if re.match("^[こそあど此其彼]の", token.surface):
if re.match(r"[こそあど此其彼]の", token.surface):
return token.pos + ",DET"
if re.match("^[こそあど此其彼]", token.surface):
if re.match(r"[こそあど此其彼]", token.surface):
return token.pos + ",PRON"
else:
return token.pos + ",ADJ"
return token.pos + ",ADJ"
|
return token.pos
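The `resolve_pos` hunk drops the `^` anchor and switches to raw strings. Since `re.match` only matches at the beginning of the string, the anchor was redundant, so behavior should be unchanged. The sketch below reproduces only the adnominal branch shown in the hunk. The example words are illustrative, and `ShortUnitWord` is redefined here so the block runs on its own.

```python
import re
from collections import namedtuple

# Same shape as the ShortUnitWord tuple defined in the Japanese module.
ShortUnitWord = namedtuple("ShortUnitWord", ["surface", "lemma", "pos"])

ADNOMINAL = "連体詞,*,*,*"  # MeCab tag for adnominals (rentaishi)


def resolve_pos(token):
    """Refine adnominals into DET, PRON or ADJ based on their leading characters."""
    if token.pos == ADNOMINAL:
        # re.match is implicitly anchored at position 0, so "^" adds nothing.
        if re.match(r"[こそあど此其彼]の", token.surface):
            return token.pos + ",DET"
        if re.match(r"[こそあど此其彼]", token.surface):
            return token.pos + ",PRON"
        return token.pos + ",ADJ"
    return token.pos


for surface in ["この", "こんな", "大きな"]:
    print(surface, resolve_pos(ShortUnitWord(surface, surface, ADNOMINAL)))
```

With the patterns unanchored, `re.search` would behave differently (it scans the whole string), which is why the change is only safe because the code uses `re.match`.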
@@ -68,7 +67,8 @@ def detailed_tokens(tokenizer, text):
pos = ",".join(parts[0:4])

if len(parts) > 7:
# this information is only available for words in the tokenizer dictionary
# this information is only available for words in the tokenizer
# dictionary
base = parts[7]

words.append(ShortUnitWord(surface, base, pos))
@@ -76,38 +76,27 @@ def detailed_tokens(tokenizer, text):
return words


class JapaneseTokenizer(object):
class JapaneseTokenizer(DummyTokenizer):
def __init__(self, cls, nlp=None):
self.vocab = nlp.vocab if nlp is not None else cls.create_vocab(nlp)

MeCab = try_mecab_import()
self.tokenizer = MeCab.Tagger()
self.tokenizer = try_mecab_import().Tagger()
self.tokenizer.parseToNode("") # see #2901

def __call__(self, text):
dtokens = detailed_tokens(self.tokenizer, text)

words = [x.surface for x in dtokens]
doc = Doc(self.vocab, words=words, spaces=[False] * len(words))
spaces = [False] * len(words)
doc = Doc(self.vocab, words=words, spaces=spaces)

for token, dtoken in zip(doc, dtokens):
token._.mecab_tag = dtoken.pos
token.tag_ = resolve_pos(dtoken)
token.lemma_ = dtoken.lemma

return doc

# add dummy methods for to_bytes, from_bytes, to_disk and from_disk to
# allow serialization (see #1557)
def to_bytes(self, **exclude):
return b""

def from_bytes(self, bytes_data, **exclude):
return self

def to_disk(self, path, **exclude):
return None

def from_disk(self, path, **exclude):
return self


class JapaneseCharacterSegmenter(object):
def __init__(self, vocab):
@@ -154,7 +143,8 @@ class JapaneseCharacterSegmenter(object):

class JapaneseDefaults(Language.Defaults):
lex_attr_getters = dict(Language.Defaults.lex_attr_getters)
lex_attr_getters[LANG] = lambda text: "ja"
lex_attr_getters[LANG] = lambda _text: "ja"

tag_map = TAG_MAP
use_janome = True
@@ -169,7 +159,6 @@ class JapaneseDefaults(Language.Defaults):
class Japanese(Language):
lang = "ja"
Defaults = JapaneseDefaults
Tokenizer = JapaneseTokenizer

def make_doc(self, text):
return self.tokenizer(text)
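`JapaneseTokenizer` now subclasses `DummyTokenizer` from `spacy.util` instead of defining the no-op `to_bytes` / `from_bytes` / `to_disk` / `from_disk` methods itself. It also builds an explicit `spaces` list of `False` values, because MeCab output carries no inter-word whitespace. The body of `DummyTokenizer` is not part of this view, so `NoOpSerializerBase` below is a hypothetical stand-in with the same four methods as the removed code. `WhitespaceFreeTokenizer` and `doc_text` are also hypothetical; `doc_text` mimics rebuilding text from words plus trailing-space flags.

```python
class NoOpSerializerBase(object):
    """Hypothetical base class with the four no-op methods removed in the diff."""

    def to_bytes(self, **exclude):
        return b""

    def from_bytes(self, bytes_data, **exclude):
        return self

    def to_disk(self, path, **exclude):
        return None

    def from_disk(self, path, **exclude):
        return self


class WhitespaceFreeTokenizer(NoOpSerializerBase):
    """Hypothetical tokenizer: words come from a segmenter, never from spaces."""

    def __init__(self, segment):
        self.segment = segment  # callable: text -> list of surface strings

    def __call__(self, text):
        words = self.segment(text)
        spaces = [False] * len(words)  # no whitespace follows any token
        return words, spaces


def doc_text(words, spaces):
    """Rebuild text from each word plus one space where its flag is True."""
    return "".join(w + (" " if s else "") for w, s in zip(words, spaces))


tok = WhitespaceFreeTokenizer(list)  # character segmentation as a toy segmenter
words, spaces = tok("猫が好き")
print(doc_text(words, spaces))  # 猫が好き
```

Keeping the serialization stubs in one base class means every tokenizer that wraps an external library can inherit them instead of repeating the same four methods.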
@@ -5,6 +5,7 @@ from .tokenizer_exceptions import TOKENIZER_EXCEPTIONS
from .stop_words import STOP_WORDS
from .morph_rules import MORPH_RULES
from .lemmatizer import LEMMA_RULES, LOOKUP
from .punctuation import TOKENIZER_INFIXES, TOKENIZER_SUFFIXES

from ..tokenizer_exceptions import BASE_EXCEPTIONS
from ..norm_exceptions import BASE_NORMS
@@ -20,12 +21,14 @@ class SwedishDefaults(Language.Defaults):
Language.Defaults.lex_attr_getters[NORM], BASE_NORMS
)
tokenizer_exceptions = update_exc(BASE_EXCEPTIONS, TOKENIZER_EXCEPTIONS)
morph_rules = MORPH_RULES
infixes = TOKENIZER_INFIXES
suffixes = TOKENIZER_SUFFIXES
stop_words = STOP_WORDS
lemma_rules = LEMMA_RULES
lemma_lookup = LOOKUP
morph_rules = MORPH_RULES


class Swedish(Language):
lang = "sv"
Defaults = SwedishDefaults
@@ -233167,7 +233167,6 @@ LOOKUP = {
"jades": "jade",
"jaet": "ja",
"jaets": "ja",
"jag": "jaga",
"jagad": "jaga",
"jagade": "jaga",
"jagades": "jaga",

25
spacy/lang/sv/punctuation.py
Normal file
@@ -0,0 +1,25 @@
# coding: utf8
"""Punctuation stolen from Danish"""
from __future__ import unicode_literals

from ..char_classes import LIST_ELLIPSES, LIST_ICONS
from ..char_classes import QUOTES, ALPHA, ALPHA_LOWER, ALPHA_UPPER
from ..punctuation import TOKENIZER_SUFFIXES


_quotes = QUOTES.replace("'", '')

_infixes = (LIST_ELLIPSES + LIST_ICONS +
[r'(?<=[{}])\.(?=[{}])'.format(ALPHA_LOWER, ALPHA_UPPER),
r'(?<=[{a}])[,!?](?=[{a}])'.format(a=ALPHA),
r'(?<=[{a}"])[:<>=](?=[{a}])'.format(a=ALPHA),
r'(?<=[{a}]),(?=[{a}])'.format(a=ALPHA),
r'(?<=[{a}])([{q}\)\]\(\[])(?=[\{a}])'.format(a=ALPHA, q=_quotes),
r'(?<=[{a}])--(?=[{a}])'.format(a=ALPHA)])

_suffixes = [suffix for suffix in TOKENIZER_SUFFIXES if suffix not in ["'s", "'S", "’s", "’S", r"\'"]]
_suffixes += [r"(?<=[^sSxXzZ])\'"]


TOKENIZER_INFIXES = _infixes
TOKENIZER_SUFFIXES = _suffixes
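The new Swedish punctuation module keeps the shared suffixes except the English possessive patterns and adds `(?<=[^sSxXzZ])\'`. As a result, a trailing apostrophe is split off unless it follows s, x or z, so Swedish genitives such as *Lars'* keep it. Below is a standalone sketch using the standard `re` module. The `$` anchor imitates matching a suffix at the end of a token, and `ALPHA` is a much smaller stand-in for spaCy's character class (an assumption). `split_apostrophe` and `split_comma_infix` are illustrative helpers, not spaCy functions.

```python
import re

# Much smaller stand-in for spaCy's ALPHA character class (assumption).
ALPHA = "a-zA-ZåäöÅÄÖ"

# Suffix rule from the diff, anchored at the end of the token with "$".
APOSTROPHE_SUFFIX = re.compile(r"(?<=[^sSxXzZ])\'$")

# Infix rule from the diff: a comma directly between two letters.
COMMA_INFIX = re.compile(r"(?<=[{a}]),(?=[{a}])".format(a=ALPHA))


def split_apostrophe(word):
    """Return (stem, suffix) when the apostrophe rule fires, else (word, "")."""
    match = APOSTROPHE_SUFFIX.search(word)
    if match:
        return word[:match.start()], match.group()
    return word, ""


def split_comma_infix(text):
    """Split text on commas that sit between two letters, keeping the commas."""
    pieces, start = [], 0
    for match in COMMA_INFIX.finditer(text):
        pieces.extend([text[start:match.start()], match.group()])
        start = match.end()
    pieces.append(text[start:])
    return pieces


print(split_apostrophe("Anna'"))    # ('Anna', "'")
print(split_apostrophe("Lars'"))    # ("Lars'", '')
print(split_comma_infix("hej,då"))  # ['hej', ',', 'då']
```

Digits are outside the letter class, so a decimal comma such as `1,5` is left intact by the infix rule.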
@@ -26,14 +26,15 @@ for verb_data in [
{ORTH: "u", LEMMA: PRON_LEMMA, NORM: "du"},
]


# Abbreviations for weekdays "sön." (for "söndag" / "söner")
# are left out because they are ambiguous. The same is the case
# for abbreviations "jul." and "Jul." ("juli" / "jul").
for exc_data in [
{ORTH: "jan.", LEMMA: "januari"},
{ORTH: "febr.", LEMMA: "februari"},
{ORTH: "feb.", LEMMA: "februari"},
{ORTH: "apr.", LEMMA: "april"},
{ORTH: "jun.", LEMMA: "juni"},
{ORTH: "jul.", LEMMA: "juli"},
{ORTH: "aug.", LEMMA: "augusti"},
{ORTH: "sept.", LEMMA: "september"},
{ORTH: "sep.", LEMMA: "september"},
@@ -46,13 +47,11 @@ for exc_data in [
{ORTH: "tors.", LEMMA: "torsdag"},
{ORTH: "fre.", LEMMA: "fredag"},
{ORTH: "lör.", LEMMA: "lördag"},
{ORTH: "sön.", LEMMA: "söndag"},
{ORTH: "Jan.", LEMMA: "Januari"},
{ORTH: "Febr.", LEMMA: "Februari"},
{ORTH: "Feb.", LEMMA: "Februari"},
{ORTH: "Apr.", LEMMA: "April"},
{ORTH: "Jun.", LEMMA: "Juni"},
{ORTH: "Jul.", LEMMA: "Juli"},
{ORTH: "Aug.", LEMMA: "Augusti"},
{ORTH: "Sept.", LEMMA: "September"},
{ORTH: "Sep.", LEMMA: "September"},
@@ -65,28 +64,32 @@ for exc_data in [
{ORTH: "Tors.", LEMMA: "Torsdag"},
{ORTH: "Fre.", LEMMA: "Fredag"},
{ORTH: "Lör.", LEMMA: "Lördag"},
{ORTH: "Sön.", LEMMA: "Söndag"},
{ORTH: "sthlm", LEMMA: "Stockholm"},
{ORTH: "gbg", LEMMA: "Göteborg"},
]:
_exc[exc_data[ORTH]] = [exc_data]


# Specific case abbreviations only
for orth in ["AB", "Dr.", "H.M.", "H.K.H.", "m/s", "M/S", "Ph.d.", "S:t", "s:t"]:
_exc[orth] = [{ORTH: orth}]


ABBREVIATIONS = [
"ang",
"anm",
"bil",
"bl.a",
"d.v.s",
"doc",
"dvs",
"e.d",
"e.kr",
"el",
"el.",
"eng",
"etc",
"exkl",
"f",
"ev",
"f.",
"f.d",
"f.kr",
"f.n",
@@ -97,10 +100,11 @@ ABBREVIATIONS = [
"fr.o.m",
"förf",
"inkl",
"jur",
"iofs",
"jur.",
"kap",
"kl",
"kor",
"kor.",
"kr",
"kungl",
"lat",
@ -109,9 +113,10 @@ ABBREVIATIONS = [
|
|||
"m.m",
|
||||
"max",
|
||||
"milj",
|
||||
"min",
|
||||
"min.",
|
||||
"mos",
|
||||
"mt",
|
||||
"mvh",
|
||||
"o.d",
|
||||
"o.s.v",
|
||||
"obs",
|
||||
|
@ -125,21 +130,27 @@ ABBREVIATIONS = [
|
|||
"s.k",
|
||||
"s.t",
|
||||
"sid",
|
||||
"s:t",
|
||||
"t.ex",
|
||||
"t.h",
|
||||
"t.o.m",
|
||||
"t.v",
|
||||
"tel",
|
||||
"ung",
|
||||
"ung.",
|
||||
"vol",
|
||||
"v.",
|
||||
"äv",
|
||||
"övers",
|
||||
]
|
||||
ABBREVIATIONS = [abbr + "." for abbr in ABBREVIATIONS] + ABBREVIATIONS
|
||||
|
||||
# Also add each abbreviation with a trailing period, skipping abbreviations that already end in one.
for abbr in ABBREVIATIONS:
if not abbr.endswith("."):
ABBREVIATIONS.append(abbr + ".")
|
||||
|
||||
for orth in ABBREVIATIONS:
|
||||
_exc[orth] = [{ORTH: orth}]
|
||||
capitalized = orth.capitalize()
|
||||
_exc[capitalized] = [{ORTH: capitalized}]
|
||||
|
||||
# Sentences ending in "i." (as in "... peka i."), "m." (as in "...än 2000 m."),
|
||||
# should be tokenized as two separate tokens.
|
||||
|
|
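The Swedish exception code above does two things. It adds a dotted variant of each abbreviation, and it then registers every form plus its capitalized version as a single-token exception. The following standalone sketch shows that expansion. `ORTH` is a hypothetical string stand-in for `spacy.symbols.ORTH`, and the helper names are illustrative. It skips forms that already end in a period, so it avoids producing entries such as "el.." (which the comprehension `[abbr + "." for abbr in ABBREVIATIONS]` would produce).

```python
ORTH = "ORTH"  # hypothetical stand-in for spacy.symbols.ORTH


def with_dotted_variants(abbreviations):
    """Return the abbreviations plus a dotted variant for each one lacking a final period."""
    expanded = list(abbreviations)
    for abbr in abbreviations:
        if not abbr.endswith("."):
            expanded.append(abbr + ".")
    return expanded


def build_exceptions(abbreviations):
    """Map every form and its capitalized version to a single-token exception."""
    exc = {}
    for orth in with_dotted_variants(abbreviations):
        exc[orth] = [{ORTH: orth}]
        capitalized = orth.capitalize()
        exc[capitalized] = [{ORTH: capitalized}]
    return exc


exc = build_exceptions(["kl", "bl.a", "el."])
```

`str.capitalize` lowercases everything after the first character. For example, it would turn "H.M." into "H.m.". This is presumably one reason mixed-case forms are kept in the separate case-specific list.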
24
spacy/lang/ta/__init__.py
Normal file
|
@ -0,0 +1,24 @@
|
|||
# import language-specific data
|
||||
from .stop_words import STOP_WORDS
|
||||
from .lex_attrs import LEX_ATTRS
|
||||
|
||||
from ..tokenizer_exceptions import BASE_EXCEPTIONS
|
||||
from ...language import Language
|
||||
from ...attrs import LANG
|
||||
from ...util import update_exc
|
||||
|
||||
# create Defaults class in the module scope (necessary for pickling!)
|
||||
class TamilDefaults(Language.Defaults):
|
||||
lex_attr_getters = dict(Language.Defaults.lex_attr_getters)
|
||||
lex_attr_getters[LANG] = lambda text: 'ta' # language ISO code
|
||||
|
||||
# optional: replace flags with custom functions, e.g. like_num()
|
||||
lex_attr_getters.update(LEX_ATTRS)
|
||||
|
||||
# create actual Language class
|
||||
class Tamil(Language):
|
||||
lang = 'ta' # language ISO code
|
||||
Defaults = TamilDefaults # override defaults
|
||||
|
||||
# set default export – this allows the language class to be lazy-loaded
|
||||
__all__ = ['Tamil']
|
21
spacy/lang/ta/examples.py
Normal file
|
@ -0,0 +1,21 @@
|
|||
# coding: utf8
|
||||
from __future__ import unicode_literals
|
||||
|
||||
|
||||
"""
|
||||
Example sentences to test spaCy and its language models.
|
||||
|
||||
>>> from spacy.lang.ta.examples import sentences
|
||||
>>> docs = nlp.pipe(sentences)
|
||||
"""
|
||||
|
||||
|
||||
sentences = [
|
||||
"கிறிஸ்துமஸ் மற்றும் இனிய புத்தாண்டு வாழ்த்துக்கள்",
|
||||
"எனக்கு என் குழந்தைப் பருவம் நினைவிருக்கிறது",
|
||||
"உங்கள் பெயர் என்ன?",
|
||||
"ஏறத்தாழ இலங்கைத் தமிழரில் மூன்றிலொரு பங்கினர் இலங்கையை விட்டு வெளியேறிப் பிற நாடுகளில் வாழ்கின்றனர்",
|
||||
"இந்த ஃபோனுடன் சுமார் ரூ.2,990 மதிப்புள்ள போட் ராக்கர்ஸ் நிறுவனத்தின் ஸ்போர்ட் புளூடூத் ஹெட்போன்ஸ் இலவசமாக வழங்கப்படவுள்ளது.",
|
||||
"மட்டக்களப்பில் பல இடங்களில் வீட்டுத் திட்டங்களுக்கு இன்று அடிக்கல் நாட்டல்",
|
||||
"ஐ போன்க்கு முகத்தை வைத்து அன்லாக் செய்யும் முறை மற்றும் விரலால் தொட்டு அன்லாக் செய்யும் முறையை வாட்ஸ் ஆப் நிறுவனம் இதற்கு முன் கண்டுபிடித்தது"
|
||||
]
|
44
spacy/lang/ta/lex_attrs.py
Normal file
|
@ -0,0 +1,44 @@
|
|||
# coding: utf8
|
||||
from __future__ import unicode_literals
|
||||
from ...attrs import LIKE_NUM
|
||||
|
||||
|
||||
_numeral_suffixes = {'பத்து': 'பது', 'ற்று': 'று', 'ரத்து': 'ரம்', 'சத்து': 'சம்'}
|
||||
_num_words = ['பூச்சியம்', 'ஒரு', 'ஒன்று', 'இரண்டு', 'மூன்று', 'நான்கு', 'ஐந்து', 'ஆறு', 'ஏழு',
|
||||
'எட்டு', 'ஒன்பது', 'பத்து', 'பதினொன்று', 'பன்னிரண்டு', 'பதின்மூன்று', 'பதினான்கு',
|
||||
'பதினைந்து', 'பதினாறு', 'பதினேழு', 'பதினெட்டு', 'பத்தொன்பது', 'இருபது',
|
||||
'முப்பது', 'நாற்பது', 'ஐம்பது', 'அறுபது', 'எழுபது', 'எண்பது', 'தொண்ணூறு',
|
||||
'நூறு', 'இருநூறு', 'முன்னூறு', 'நாநூறு', 'ஐநூறு', 'அறுநூறு', 'எழுநூறு', 'எண்ணூறு', 'தொள்ளாயிரம்',
|
||||
'ஆயிரம்', 'ஒராயிரம்', 'லட்சம்', 'மில்லியன்', 'கோடி', 'பில்லியன்', 'டிரில்லியன்']
|
||||
|
||||
|
||||
# 20-89, 90-899, 900-99999 and above take different suffixes
|
||||
def suffix_filter(text):
|
||||
# text without numeral suffixes
|
||||
for num_suffix in _numeral_suffixes.keys():
|
||||
length = len(num_suffix)
|
||||
if (len(text) < length):
|
||||
continue
|
||||
elif text.endswith(num_suffix):
|
||||
return text[:-length] + _numeral_suffixes[num_suffix]
|
||||
return text
|
||||
|
||||
|
||||
def like_num(text):
|
||||
text = text.replace(',', '').replace('.', '')
|
||||
if text.isdigit():
|
||||
return True
|
||||
if text.count('/') == 1:
|
||||
num, denom = text.split('/')
|
||||
if num.isdigit() and denom.isdigit():
|
||||
return True
|
||||
|
||||
if text.lower() in _num_words:
|
||||
return True
|
||||
elif suffix_filter(text) in _num_words:
|
||||
return True
|
||||
|
||||
return False
|
||||
LEX_ATTRS = {
|
||||
LIKE_NUM: like_num
|
||||
}
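This standalone sketch shows how `suffix_filter` and `like_num` combine. A combining form such as இருபத்து (the form of இருபது, "twenty", used in இருபத்து ஒன்று) is rewritten to its dictionary form before the lookup. The `_num_words` list here is a hypothetical subset of the one in `lex_attrs.py`.

```python
# Suffix map as in lex_attrs.py; _num_words is a reduced, hypothetical subset
_numeral_suffixes = {"பத்து": "பது", "ற்று": "று", "ரத்து": "ரம்", "சத்து": "சம்"}
_num_words = ["ஒன்று", "இரண்டு", "பத்து", "இருபது", "முப்பது", "நூறு", "ஆயிரம்"]


def suffix_filter(text):
    """Replace a combining numeral suffix with its dictionary-form ending."""
    for suffix, replacement in _numeral_suffixes.items():
        # str.endswith is False when text is shorter than suffix, so no length guard
        if text.endswith(suffix):
            return text[: -len(suffix)] + replacement
    return text


def like_num(text):
    """Digits, simple fractions, or (possibly suffixed) Tamil number words."""
    text = text.replace(",", "").replace(".", "")
    if text.isdigit():
        return True
    if text.count("/") == 1:
        num, denom = text.split("/")
        if num.isdigit() and denom.isdigit():
            return True
    return text.lower() in _num_words or suffix_filter(text) in _num_words
```

The sketch checks every suffix in turn instead of stopping early. A text that is too short for one suffix may still end in a shorter one.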
|
148
spacy/lang/ta/norm_exceptions.py
Normal file
|
@ -0,0 +1,148 @@
|
|||
# coding: utf8
|
||||
from __future__ import unicode_literals
|
||||
|
||||
_exc = {
|
||||
|
||||
# Regional words mapped to their standard forms
# Sri Lanka (source: Wikipedia)
|
||||
"இங்க": "இங்கே",
|
||||
"வாங்க": "வாருங்கள்",
|
||||
'ஒண்டு':'ஒன்று',
|
||||
'கண்டு': 'கன்று',
|
||||
'கொண்டு': 'கொன்று',
|
||||
'பண்டி': 'பன்றி',
|
||||
'பச்ச': 'பச்சை',
|
||||
'அம்பது': 'ஐம்பது',
|
||||
'வெச்ச': 'வைத்து',
|
||||
'வச்ச': 'வைத்து',
|
||||
'வச்சி': 'வைத்து',
|
||||
'வாளைப்பழம்':'வாழைப்பழம்',
|
||||
'மண்ணு': 'மண்',
|
||||
'பொன்னு': 'பொன்',
|
||||
'சாவல்': 'சேவல்',
|
||||
'அங்கால': 'அங்கு',
|
||||
'அசுப்பு': 'நடமாட்டம்',
|
||||
'எழுவான் கரை': 'எழுவான்கரை',
|
||||
'ஓய்யாரம்': 'எழில்',
|
||||
'ஒளும்பு': 'எழும்பு',
|
||||
'ஓர்மை': 'துணிவு',
|
||||
'கச்சை': 'கோவணம்',
|
||||
'கடப்பு': 'தெருவாசல்',
|
||||
'சுள்ளி': 'காய்ந்த குச்சி',
|
||||
'திறாவுதல்': 'தடவுதல்',
|
||||
'நாசமறுப்பு': 'தொல்லை',
|
||||
'பரிசாரி': 'வைத்தியன்',
|
||||
'பறவாதி': 'பேராசைக்காரன்',
|
||||
'பிசினி': 'உலோபி',
|
||||
'விசர்': 'பைத்தியம்',
|
||||
'ஏனம்': 'பாத்திரம்',
|
||||
'ஏலா': 'இயலாது',
|
||||
'ஒசில்': 'அழகு',
|
||||
'ஒள்ளுப்பம்': 'கொஞ்சம்',
|
||||
|
||||
# Sri Lankan and Indian
|
||||
'குத்துமதிப்பு': '',
|
||||
'நூனாயம்': 'நூல்நயம்',
|
||||
'பைய': 'மெதுவாக',
|
||||
'மண்டை': 'தலை',
|
||||
'வெள்ளனே': 'சீக்கிரம்',
|
||||
'உசுப்பு': 'எழுப்பு',
|
||||
'ஆணம்': 'குழம்பு',
|
||||
'உறக்கம்': 'தூக்கம்',
|
||||
'பஸ்': 'பேருந்து',
|
||||
'களவு': 'திருட்டு',
|
||||
|
||||
# Relationship terms
|
||||
'புருசன்': 'கணவன்',
|
||||
'பொஞ்சாதி': 'மனைவி',
|
||||
'புள்ள': 'பிள்ளை',
|
||||
'பிள்ள': 'பிள்ளை',
|
||||
'ஆம்பிளப்புள்ள': 'ஆண் பிள்ளை',
|
||||
'பொம்பிளப்புள்ள': 'பெண் பிள்ளை',
|
||||
'அண்ணாச்சி': 'அண்ணா',
|
||||
'அக்காச்சி': 'அக்கா',
|
||||
'தங்கச்சி': 'தங்கை',
|
||||
|
||||
# Words that differ between dialects
|
||||
'பொடியன்': 'சிறுவன்',
|
||||
'பொட்டை': 'சிறுமி',
|
||||
'பிறகு': 'பின்பு',
|
||||
'டக்கென்டு': 'விரைவாக',
|
||||
'கெதியா': 'விரைவாக',
|
||||
'கிறுகி': 'திரும்பி',
|
||||
'போயித்து வாறன்': 'போய் வருகிறேன்',
|
||||
'வருவாங்களா': 'வருவார்களா',
|
||||
|
||||
# Common spoken forms
|
||||
'சொல்லு': 'சொல்',
|
||||
'கேளு': 'கேள்',
|
||||
'சொல்லுங்க': 'சொல்லுங்கள்',
|
||||
'கேளுங்க': 'கேளுங்கள்',
|
||||
'நீங்கள்': 'நீ',
|
||||
'உன்': 'உன்னுடைய',
|
||||
|
||||
# Portuguese formal words
|
||||
'அலவாங்கு': 'கடப்பாரை',
|
||||
'ஆசுப்பத்திரி': 'மருத்துவமனை',
|
||||
'உரோதை': 'சில்லு',
|
||||
'கடுதாசி': 'கடிதம்',
|
||||
'கதிரை': 'நாற்காலி',
|
||||
'குசினி': 'அடுக்களை',
|
||||
'கோப்பை': 'கிண்ணம்',
|
||||
'சப்பாத்து': 'காலணி',
|
||||
'தாச்சி': 'இரும்புச் சட்டி',
|
||||
'துவாய்': 'துவாலை',
|
||||
'தவறணை': 'மதுக்கடை',
|
||||
'பீப்பா': 'மரத்தாழி',
|
||||
'யன்னல்': 'சாளரம்',
|
||||
'வாங்கு': 'மரஇருக்கை',
|
||||
|
||||
# Dutch formal words
|
||||
'இறாக்கை': 'பற்சட்டம்',
|
||||
'இலாட்சி': 'இழுப்பறை',
|
||||
'கந்தோர்': 'பணிமனை',
|
||||
'நொத்தாரிசு': 'ஆவண எழுத்துபதிவாளர்',
|
||||
|
||||
# English formal words
|
||||
'இஞ்சினியர்': 'பொறியியலாளர்',
|
||||
'சூப்பு': 'ரசம்',
|
||||
'செக்': 'காசோலை',
|
||||
'சேட்டு': 'மேற்ச்சட்டை',
|
||||
'மார்க்கட்டு': 'சந்தை',
|
||||
'விண்ணன்': 'கெட்டிக்காரன்',
|
||||
|
||||
# Arabic formal words
|
||||
'ஈமான்': 'நம்பிக்கை',
|
||||
'சுன்னத்து': 'விருத்தசேதனம்',
|
||||
'செய்த்தான்': 'பிசாசு',
|
||||
'மவுத்து': 'இறப்பு',
|
||||
'ஹலால்': 'அங்கீகரிக்கப்பட்டது',
|
||||
'கறாம்': 'நிராகரிக்கப்பட்டது',
|
||||
# Persian, Hindustani and Hindi formal words
|
||||
'சுமார்': 'கிட்டத்தட்ட',
|
||||
'சிப்பாய்': 'போர்வீரன்',
|
||||
'சிபார்சு': 'சிபாரிசு',
|
||||
'ஜமீன்': 'பணக்காரர்',
|
||||
'அசல்': 'மெய்யான',
|
||||
'அந்தஸ்து': 'கௌரவம்',
|
||||
'ஆஜர்': 'சமர்ப்பித்தல்',
|
||||
'உசார்': 'எச்சரிக்கை',
|
||||
'அச்சா':'நல்ல',
|
||||
# English words used in text conversations
|
||||
"bcoz": "ஏனெனில்",
|
||||
"bcuz": "ஏனெனில்",
|
||||
"fav": "விருப்பமான",
|
||||
"morning": "காலை வணக்கம்",
|
||||
"gdeveng": "மாலை வணக்கம்",
|
||||
"gdnyt": "இரவு வணக்கம்",
|
||||
"gdnit": "இரவு வணக்கம்",
|
||||
"plz": "தயவு செய்து",
|
||||
"pls": "தயவு செய்து",
|
||||
"thx": "நன்றி",
|
||||
"thanx": "நன்றி",
|
||||
}
|
||||
|
||||
NORM_EXCEPTIONS = {}
|
||||
|
||||
for string, norm in _exc.items():
|
||||
NORM_EXCEPTIONS[string] = norm
|
133
spacy/lang/ta/stop_words.py
Normal file
|
@ -0,0 +1,133 @@
|
|||
# coding: utf8
|
||||
from __future__ import unicode_literals
|
||||
|
||||
|
||||
# Stop words
|
||||
|
||||
STOP_WORDS = set("""
|
||||
ஒரு
|
||||
என்று
|
||||
மற்றும்
|
||||
இந்த
|
||||
இது
|
||||
என்ற
|
||||
கொண்டு
|
||||
என்பது
|
||||
பல
|
||||
ஆகும்
|
||||
அல்லது
|
||||
அவர்
|
||||
நான்
|
||||
உள்ள
|
||||
அந்த
|
||||
இவர்
|
||||
என
|
||||
முதல்
|
||||
என்ன
|
||||
இருந்து
|
||||
சில
|
||||
என்
|
||||
போன்ற
|
||||
வேண்டும்
|
||||
வந்து
|
||||
இதன்
|
||||
அது
|
||||
அவன்
|
||||
தான்
|
||||
பலரும்
|
||||
என்னும்
|
||||
மேலும்
|
||||
பின்னர்
|
||||
கொண்ட
|
||||
இருக்கும்
|
||||
தனது
|
||||
உள்ளது
|
||||
போது
|
||||
என்றும்
|
||||
அதன்
|
||||
தன்
|
||||
பிறகு
|
||||
அவர்கள்
|
||||
வரை
|
||||
அவள்
|
||||
நீ
|
||||
ஆகிய
|
||||
இருந்தது
|
||||
உள்ளன
|
||||
வந்த
|
||||
இருந்த
|
||||
மிகவும்
|
||||
இங்கு
|
||||
மீது
|
||||
ஓர்
|
||||
இவை
|
||||
இந்தக்
|
||||
பற்றி
|
||||
வரும்
|
||||
வேறு
|
||||
இரு
|
||||
இதில்
|
||||
போல்
|
||||
இப்போது
|
||||
அவரது
|
||||
மட்டும்
|
||||
இந்தப்
|
||||
எனும்
|
||||
மேல்
|
||||
பின்
|
||||
சேர்ந்த
|
||||
ஆகியோர்
|
||||
எனக்கு
|
||||
இன்னும்
|
||||
அந்தப்
|
||||
அன்று
|
||||
ஒரே
|
||||
மிக
|
||||
அங்கு
|
||||
பல்வேறு
|
||||
விட்டு
|
||||
பெரும்
|
||||
அதை
|
||||
பற்றிய
|
||||
உன்
|
||||
அதிக
|
||||
அந்தக்
|
||||
பேர்
|
||||
இதனால்
|
||||
அவை
|
||||
அதே
|
||||
ஏன்
|
||||
முறை
|
||||
யார்
|
||||
என்பதை
|
||||
எல்லாம்
|
||||
மட்டுமே
|
||||
இங்கே
|
||||
அங்கே
|
||||
இடம்
|
||||
இடத்தில்
|
||||
அதில்
|
||||
நாம்
|
||||
அதற்கு
|
||||
எனவே
|
||||
பிற
|
||||
சிறு
|
||||
மற்ற
|
||||
விட
|
||||
எந்த
|
||||
எனவும்
|
||||
எனப்படும்
|
||||
எனினும்
|
||||
அடுத்த
|
||||
இதனை
|
||||
இதை
|
||||
கொள்ள
|
||||
இந்தத்
|
||||
இதற்கு
|
||||
அதனால்
|
||||
தவிர
|
||||
போல
|
||||
வரையில்
|
||||
சற்று
|
||||
எனக்
|
||||
""".split())
|
|
@ -5,24 +5,14 @@ from .tokenizer_exceptions import TOKENIZER_EXCEPTIONS
|
|||
from .tag_map import TAG_MAP
|
||||
from .stop_words import STOP_WORDS
|
||||
|
||||
from ...tokens import Doc
|
||||
from ...language import Language
|
||||
from ...attrs import LANG
|
||||
from ...language import Language
|
||||
from ...tokens import Doc
|
||||
from ...util import DummyTokenizer
|
||||
|
||||
|
||||
class ThaiDefaults(Language.Defaults):
|
||||
lex_attr_getters = dict(Language.Defaults.lex_attr_getters)
|
||||
lex_attr_getters[LANG] = lambda text: "th"
|
||||
tokenizer_exceptions = dict(TOKENIZER_EXCEPTIONS)
|
||||
tag_map = TAG_MAP
|
||||
stop_words = STOP_WORDS
|
||||
|
||||
|
||||
class Thai(Language):
|
||||
lang = "th"
|
||||
Defaults = ThaiDefaults
|
||||
|
||||
def make_doc(self, text):
|
||||
class ThaiTokenizer(DummyTokenizer):
|
||||
def __init__(self, cls, nlp=None):
|
||||
try:
|
||||
from pythainlp.tokenize import word_tokenize
|
||||
except ImportError:
|
||||
|
@ -30,8 +20,35 @@ class Thai(Language):
|
|||
"The Thai tokenizer requires the PyThaiNLP library: "
|
||||
"https://github.com/PyThaiNLP/pythainlp"
|
||||
)
|
||||
words = [x for x in list(word_tokenize(text, "newmm"))]
|
||||
return Doc(self.vocab, words=words, spaces=[False] * len(words))
|
||||
|
||||
self.word_tokenize = word_tokenize
|
||||
self.vocab = nlp.vocab if nlp is not None else cls.create_vocab(nlp)
|
||||
|
||||
def __call__(self, text):
|
||||
words = list(self.word_tokenize(text, "newmm"))
|
||||
spaces = [False] * len(words)
|
||||
return Doc(self.vocab, words=words, spaces=spaces)
|
||||
|
||||
|
||||
class ThaiDefaults(Language.Defaults):
|
||||
lex_attr_getters = dict(Language.Defaults.lex_attr_getters)
|
||||
lex_attr_getters[LANG] = lambda _text: "th"
|
||||
|
||||
tokenizer_exceptions = dict(TOKENIZER_EXCEPTIONS)
|
||||
tag_map = TAG_MAP
|
||||
stop_words = STOP_WORDS
|
||||
|
||||
@classmethod
|
||||
def create_tokenizer(cls, nlp=None):
|
||||
return ThaiTokenizer(cls, nlp)
|
||||
|
||||
|
||||
class Thai(Language):
|
||||
lang = "th"
|
||||
Defaults = ThaiDefaults
|
||||
|
||||
def make_doc(self, text):
|
||||
return self.tokenizer(text)
|
||||
|
||||
|
||||
__all__ = ["Thai"]
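The new `ThaiTokenizer` follows a pattern that can be shown without spaCy installed. An object wraps an injected word-segmentation function, marks every token as having no trailing space (Thai does not separate words with spaces), and inherits no-op serialization from `DummyTokenizer`. This sketch makes three substitutions:

- PyThaiNLP's `word_tokenize(text, "newmm")` is replaced by a hypothetical greedy longest-match segmenter over a toy vocabulary.
- `__call__` returns the `words`/`spaces` lists instead of passing them to `Doc(self.vocab, words=words, spaces=spaces)`.
- `DummyTokenizerSketch` stands in for the `DummyTokenizer` class added in this diff.

```python
from typing import Callable, List, Tuple


class DummyTokenizerSketch:
    """Local stand-in for DummyTokenizer: serialization methods are no-ops."""

    def to_bytes(self, **exclude):
        return b""

    def from_bytes(self, _bytes_data, **exclude):
        return self

    def to_disk(self, _path, **exclude):
        return None

    def from_disk(self, _path, **exclude):
        return self


def greedy_segment(text: str, vocab: set) -> List[str]:
    """Hypothetical longest-match segmenter standing in for PyThaiNLP's "newmm"."""
    max_len = max((len(w) for w in vocab), default=1)
    words, i = [], 0
    while i < len(text):
        for size in range(min(max_len, len(text) - i), 0, -1):
            piece = text[i:i + size]
            # Fall back to a single character when no vocabulary word matches
            if piece in vocab or size == 1:
                words.append(piece)
                i += size
                break
    return words


class ThaiLikeTokenizer(DummyTokenizerSketch):
    def __init__(self, word_tokenize: Callable[[str], List[str]]):
        self.word_tokenize = word_tokenize

    def __call__(self, text: str) -> Tuple[List[str], List[bool]]:
        words = list(self.word_tokenize(text))
        spaces = [False] * len(words)  # no token is followed by whitespace
        return words, spaces


vocab = {"สวัสดี", "ครับ"}
tokenizer = ThaiLikeTokenizer(lambda t: greedy_segment(t, vocab))
words, spaces = tokenizer("สวัสดีครับ")
```

With every entry in `spaces` set to `False`, joining the words reproduces the input exactly. The injected function keeps the optional PyThaiNLP import out of the class body.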
|
||||
|
|
|
@ -5,6 +5,7 @@ from ...attrs import LIKE_NUM
|
|||
|
||||
|
||||
# Thirteen, fifteen etc. are written separately: on üç
|
||||
|
||||
_num_words = [
|
||||
"bir",
|
||||
"iki",
|
||||
|
@ -28,6 +29,7 @@ _num_words = [
|
|||
"bin",
|
||||
"milyon",
|
||||
"milyar",
|
||||
"trilyon",
|
||||
"katrilyon",
|
||||
"kentilyon",
|
||||
]
|
||||
|
|
|
@ -353,10 +353,38 @@ def test_doc_api_similarity_match():
|
|||
assert doc.similarity(doc2) == 0.0
|
||||
|
||||
|
||||
def test_lowest_common_ancestor(en_tokenizer):
|
||||
tokens = en_tokenizer("the lazy dog slept")
|
||||
doc = get_doc(tokens.vocab, words=[t.text for t in tokens], heads=[2, 1, 1, 0])
|
||||
@pytest.mark.parametrize(
|
||||
"sentence,heads,lca_matrix",
|
||||
[
|
||||
(
|
||||
"the lazy dog slept",
|
||||
[2, 1, 1, 0],
|
||||
numpy.array([[0, 2, 2, 3], [2, 1, 2, 3], [2, 2, 2, 3], [3, 3, 3, 3]]),
|
||||
),
|
||||
(
|
||||
"The lazy dog slept. The quick fox jumped",
|
||||
[2, 1, 1, 0, -1, 2, 1, 1, 0],
|
||||
numpy.array(
|
||||
[
|
||||
[0, 2, 2, 3, 3, -1, -1, -1, -1],
|
||||
[2, 1, 2, 3, 3, -1, -1, -1, -1],
|
||||
[2, 2, 2, 3, 3, -1, -1, -1, -1],
|
||||
[3, 3, 3, 3, 3, -1, -1, -1, -1],
|
||||
[3, 3, 3, 3, 4, -1, -1, -1, -1],
|
||||
[-1, -1, -1, -1, -1, 5, 7, 7, 8],
|
||||
[-1, -1, -1, -1, -1, 7, 6, 7, 8],
|
||||
[-1, -1, -1, -1, -1, 7, 7, 7, 8],
|
||||
[-1, -1, -1, -1, -1, 8, 8, 8, 8],
|
||||
]
|
||||
),
|
||||
),
|
||||
],
|
||||
)
|
||||
def test_lowest_common_ancestor(en_tokenizer, sentence, heads, lca_matrix):
|
||||
tokens = en_tokenizer(sentence)
|
||||
doc = get_doc(tokens.vocab, [t.text for t in tokens], heads=heads)
|
||||
lca = doc.get_lca_matrix()
|
||||
assert (lca == lca_matrix).all()
|
||||
assert lca[1, 1] == 1
|
||||
assert lca[0, 1] == 2
|
||||
assert lca[1, 2] == 2
|
||||
|
|
|
@ -80,10 +80,24 @@ def test_spans_lca_matrix(en_tokenizer):
|
|||
tokens = en_tokenizer("the lazy dog slept")
|
||||
doc = get_doc(tokens.vocab, words=[t.text for t in tokens], heads=[2, 1, 1, 0])
|
||||
lca = doc[:2].get_lca_matrix()
|
||||
assert lca[0, 0] == 0
|
||||
assert lca[0, 1] == -1
|
||||
assert lca[1, 0] == -1
|
||||
assert lca[1, 1] == 1
|
||||
assert lca.shape == (2, 2)
|
||||
assert lca[0, 0] == 0 # the & the -> the
|
||||
assert lca[0, 1] == -1 # the & lazy -> dog (out of span)
|
||||
assert lca[1, 0] == -1 # lazy & the -> dog (out of span)
|
||||
assert lca[1, 1] == 1 # lazy & lazy -> lazy
|
||||
|
||||
lca = doc[1:].get_lca_matrix()
|
||||
assert lca.shape == (3, 3)
|
||||
assert lca[0, 0] == 0 # lazy & lazy -> lazy
|
||||
assert lca[0, 1] == 1 # lazy & dog -> dog
|
||||
assert lca[0, 2] == 2 # lazy & slept -> slept
|
||||
|
||||
lca = doc[2:].get_lca_matrix()
|
||||
assert lca.shape == (2, 2)
|
||||
assert lca[0, 0] == 0 # dog & dog -> dog
|
||||
assert lca[0, 1] == 1 # dog & slept -> slept
|
||||
assert lca[1, 0] == 1 # slept & dog -> slept
|
||||
assert lca[1, 1] == 1 # slept & slept -> slept
|
||||
|
||||
|
||||
def test_span_similarity_match():
|
||||
|
@ -158,15 +172,17 @@ def test_span_as_doc(doc):
|
|||
|
||||
|
||||
def test_span_string_label(doc):
|
||||
span = Span(doc, 0, 1, label='hello')
|
||||
assert span.label_ == 'hello'
|
||||
assert span.label == doc.vocab.strings['hello']
|
||||
span = Span(doc, 0, 1, label="hello")
|
||||
assert span.label_ == "hello"
|
||||
assert span.label == doc.vocab.strings["hello"]
|
||||
|
||||
|
||||
def test_span_string_set_label(doc):
|
||||
span = Span(doc, 0, 1)
|
||||
span.label_ = 'hello'
|
||||
assert span.label_ == 'hello'
|
||||
assert span.label == doc.vocab.strings['hello']
|
||||
span.label_ = "hello"
|
||||
assert span.label_ == "hello"
|
||||
assert span.label == doc.vocab.strings["hello"]
|
||||
|
||||
|
||||
def test_span_ents_property(doc):
|
||||
"""Test span.ents for the """
|
||||
|
|
53
spacy/tests/lang/sv/test_exceptions.py
Normal file
|
@ -0,0 +1,53 @@
|
|||
# coding: utf8
|
||||
from __future__ import unicode_literals
|
||||
|
||||
import pytest
|
||||
|
||||
|
||||
SV_TOKEN_EXCEPTION_TESTS = [
|
||||
('Smörsåsen används bl.a. till fisk', ['Smörsåsen', 'används', 'bl.a.', 'till', 'fisk']),
|
||||
('Jag kommer först kl. 13 p.g.a. diverse förseningar', ['Jag', 'kommer', 'först', 'kl.', '13', 'p.g.a.', 'diverse', 'förseningar']),
|
||||
('Anders I. tycker om ord med i i.', ["Anders", "I.", "tycker", "om", "ord", "med", "i", "i", "."])
|
||||
]
|
||||
|
||||
|
||||
@pytest.mark.parametrize('text,expected_tokens', SV_TOKEN_EXCEPTION_TESTS)
|
||||
def test_sv_tokenizer_handles_exception_cases(sv_tokenizer, text, expected_tokens):
|
||||
tokens = sv_tokenizer(text)
|
||||
token_list = [token.text for token in tokens if not token.is_space]
|
||||
assert expected_tokens == token_list
|
||||
|
||||
|
||||
@pytest.mark.parametrize('text', ["driveru", "hajaru", "Serru", "Fixaru"])
|
||||
def test_sv_tokenizer_handles_verb_exceptions(sv_tokenizer, text):
|
||||
tokens = sv_tokenizer(text)
|
||||
assert len(tokens) == 2
|
||||
assert tokens[1].text == "u"
|
||||
|
||||
|
||||
@pytest.mark.parametrize('text',
|
||||
["bl.a", "m.a.o.", "Jan.", "Dec.", "kr.", "osv."])
|
||||
def test_sv_tokenizer_handles_abbr(sv_tokenizer, text):
|
||||
tokens = sv_tokenizer(text)
|
||||
assert len(tokens) == 1
|
||||
|
||||
|
||||
@pytest.mark.parametrize('text', ["Jul.", "jul.", "sön.", "Sön."])
|
||||
def test_sv_tokenizer_handles_ambiguous_abbr(sv_tokenizer, text):
|
||||
tokens = sv_tokenizer(text)
|
||||
assert len(tokens) == 2
|
||||
|
||||
|
||||
def test_sv_tokenizer_handles_exc_in_text(sv_tokenizer):
|
||||
text = "Det er bl.a. ikke meningen"
|
||||
tokens = sv_tokenizer(text)
|
||||
assert len(tokens) == 5
|
||||
assert tokens[2].text == "bl.a."
|
||||
|
||||
|
||||
def test_sv_tokenizer_handles_custom_base_exc(sv_tokenizer):
|
||||
text = "Her er noget du kan kigge i."
|
||||
tokens = sv_tokenizer(text)
|
||||
assert len(tokens) == 8
|
||||
assert tokens[6].text == "i"
|
||||
assert tokens[7].text == "."
|
15
spacy/tests/lang/sv/test_lemmatizer.py
Normal file
|
@ -0,0 +1,15 @@
|
|||
# coding: utf-8
|
||||
from __future__ import unicode_literals
|
||||
|
||||
import pytest
|
||||
|
||||
|
||||
@pytest.mark.parametrize('string,lemma', [('DNA-profilernas', 'DNA-profil'),
|
||||
('Elfenbenskustens', 'Elfenbenskusten'),
|
||||
('abortmotståndarens', 'abortmotståndare'),
|
||||
('kolesterols', 'kolesterol'),
|
||||
('portionssnusernas', 'portionssnus'),
|
||||
('åsyns', 'åsyn')])
|
||||
def test_lemmatizer_lookup_assigns(sv_tokenizer, string, lemma):
|
||||
tokens = sv_tokenizer(string)
|
||||
assert tokens[0].lemma_ == lemma
|
37
spacy/tests/lang/sv/test_prefix_suffix_infix.py
Normal file
|
@ -0,0 +1,37 @@
|
|||
# coding: utf-8
|
||||
"""Test that tokenizer prefixes, suffixes and infixes are handled correctly."""
|
||||
from __future__ import unicode_literals
|
||||
|
||||
import pytest
|
||||
|
||||
@pytest.mark.parametrize('text', ["(under)"])
|
||||
def test_tokenizer_splits_no_special(sv_tokenizer, text):
|
||||
tokens = sv_tokenizer(text)
|
||||
assert len(tokens) == 3
|
||||
|
||||
|
||||
@pytest.mark.parametrize('text', ["gitta'r", "Björn's", "Lars'"])
|
||||
def test_tokenizer_handles_no_punct(sv_tokenizer, text):
|
||||
tokens = sv_tokenizer(text)
|
||||
assert len(tokens) == 1
|
||||
|
||||
|
||||
@pytest.mark.parametrize('text', ["svart.Gul", "Hej.Världen"])
|
||||
def test_tokenizer_splits_period_infix(sv_tokenizer, text):
|
||||
tokens = sv_tokenizer(text)
|
||||
assert len(tokens) == 3
|
||||
|
||||
|
||||
@pytest.mark.parametrize('text', ["Hej,Världen", "en,två"])
|
||||
def test_tokenizer_splits_comma_infix(sv_tokenizer, text):
|
||||
tokens = sv_tokenizer(text)
|
||||
assert len(tokens) == 3
|
||||
assert tokens[0].text == text.split(",")[0]
|
||||
assert tokens[1].text == ","
|
||||
assert tokens[2].text == text.split(",")[1]
|
||||
|
||||
|
||||
@pytest.mark.parametrize('text', ["svart...Gul", "svart...gul"])
|
||||
def test_tokenizer_splits_ellipsis_infix(sv_tokenizer, text):
|
||||
tokens = sv_tokenizer(text)
|
||||
assert len(tokens) == 3
|
21
spacy/tests/lang/sv/test_text.py
Normal file
|
@ -0,0 +1,21 @@
|
|||
# coding: utf-8
|
||||
"""Test that longer and mixed texts are tokenized correctly."""
|
||||
|
||||
from __future__ import unicode_literals
|
||||
|
||||
import pytest
|
||||
|
||||
def test_sv_tokenizer_handles_long_text(sv_tokenizer):
|
||||
text = """Det var så härligt ute på landet. Det var sommar, majsen var gul, havren grön,
|
||||
höet var uppställt i stackar nere vid den gröna ängen, och där gick storken på sina långa,
|
||||
röda ben och snackade engelska, för det språket hade han lärt sig av sin mor.
|
||||
|
||||
Runt om åkrar och äng låg den stora skogen, och mitt i skogen fanns djupa sjöar; jo, det var verkligen trevligt ute på landet!"""
|
||||
tokens = sv_tokenizer(text)
|
||||
assert len(tokens) == 86
|
||||
|
||||
|
||||
def test_sv_tokenizer_handles_trailing_dot_for_i_in_sentence(sv_tokenizer):
|
||||
text = "Provar att tokenisera en mening med ord i."
|
||||
tokens = sv_tokenizer(text)
|
||||
assert len(tokens) == 9
|
|
@ -5,27 +5,31 @@ from ..util import get_doc
|
|||
|
||||
import pytest
|
||||
import numpy
|
||||
from numpy.testing import assert_array_equal
|
||||
|
||||
|
||||
@pytest.mark.parametrize('words,heads,matrix', [
|
||||
(
|
||||
'She created a test for spacy'.split(),
|
||||
[1, 0, 1, -2, -1, -1],
|
||||
numpy.array([
|
||||
[0, 1, 1, 1, 1, 1],
|
||||
[1, 1, 1, 1, 1, 1],
|
||||
[1, 1, 2, 3, 3, 3],
|
||||
[1, 1, 3, 3, 3, 3],
|
||||
[1, 1, 3, 3, 4, 4],
|
||||
[1, 1, 3, 3, 4, 5]], dtype=numpy.int32)
|
||||
)
|
||||
])
|
||||
def test_issue2396(en_vocab, words, heads, matrix):
|
||||
doc = get_doc(en_vocab, words=words, heads=heads)
|
||||
|
||||
@pytest.mark.parametrize(
|
||||
"sentence,heads,matrix",
|
||||
[
|
||||
(
|
||||
"She created a test for spacy",
|
||||
[1, 0, 1, -2, -1, -1],
|
||||
numpy.array(
|
||||
[
|
||||
[0, 1, 1, 1, 1, 1],
|
||||
[1, 1, 1, 1, 1, 1],
|
||||
[1, 1, 2, 3, 3, 3],
|
||||
[1, 1, 3, 3, 3, 3],
|
||||
[1, 1, 3, 3, 4, 4],
|
||||
[1, 1, 3, 3, 4, 5],
|
||||
],
|
||||
dtype=numpy.int32,
|
||||
),
|
||||
)
|
||||
],
|
||||
)
|
||||
def test_issue2396(en_tokenizer, sentence, heads, matrix):
|
||||
tokens = en_tokenizer(sentence)
|
||||
doc = get_doc(tokens.vocab, [t.text for t in tokens], heads=heads)
|
||||
span = doc[:]
|
||||
assert_array_equal(doc.get_lca_matrix(), matrix)
|
||||
assert_array_equal(span.get_lca_matrix(), matrix)
|
||||
|
||||
|
||||
assert (doc.get_lca_matrix() == matrix).all()
|
||||
assert (span.get_lca_matrix() == matrix).all()
|
||||
|
|
|
@ -10,7 +10,7 @@ def test_issue2901():
|
|||
"""Test that `nlp` doesn't fail."""
|
||||
try:
|
||||
nlp = Japanese()
|
||||
except:
|
||||
except ImportError:
|
||||
pytest.skip()
|
||||
|
||||
doc = nlp("pythonが大好きです")
|
||||
|
|
10
spacy/tests/regression/test_issue3178.py
Normal file
|
@ -0,0 +1,10 @@
|
|||
from __future__ import unicode_literals
|
||||
import pytest
|
||||
import spacy
|
||||
|
||||
|
||||
@pytest.mark.models('fr')
|
||||
def test_issue1959(FR):
|
||||
texts = ['Je suis la mauvaise herbe', "Me, myself and moi"]
|
||||
for text in texts:
|
||||
FR(text)
|
|
@ -1075,21 +1075,30 @@ cdef int [:,:] _get_lca_matrix(Doc doc, int start, int end):
|
|||
cdef int [:,:] lca_matrix
|
||||
|
||||
n_tokens = end - start
|
||||
lca_matrix = numpy.empty((n_tokens, n_tokens), dtype=numpy.int32)
|
||||
lca_mat = numpy.empty((n_tokens, n_tokens), dtype=numpy.int32)
|
||||
lca_mat.fill(-1)
|
||||
lca_matrix = lca_mat
|
||||
|
||||
for j in range(start, end):
|
||||
token_j = doc[j]
|
||||
for j in range(n_tokens):
|
||||
token_j = doc[start + j]
|
||||
# the common ancestor of token and itself is itself:
|
||||
lca_matrix[j, j] = j
|
||||
for k in range(j + 1, end):
|
||||
lca = _get_tokens_lca(token_j, doc[k])
|
||||
# we will only iterate through tokens in the same sentence
|
||||
sent = token_j.sent
|
||||
sent_start = sent.start
|
||||
j_idx_in_sent = start + j - sent_start
|
||||
n_missing_tokens_in_sent = len(sent) - j_idx_in_sent
|
||||
# make sure we do not go past `end`, in cases where `end` < sent.end
|
||||
max_range = min(j + n_missing_tokens_in_sent, end)
|
||||
for k in range(j + 1, max_range):
|
||||
lca = _get_tokens_lca(token_j, doc[start + k])
|
||||
# if lca is outside of span, we set it to -1
|
||||
if not start <= lca < end:
|
||||
lca_matrix[j, k] = -1
|
||||
lca_matrix[k, j] = -1
|
||||
else:
|
||||
lca_matrix[j, k] = lca
|
||||
lca_matrix[k, j] = lca
|
||||
lca_matrix[j, k] = lca - start
|
||||
lca_matrix[k, j] = lca - start
|
||||
|
||||
return lca_matrix
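The reworked `_get_lca_matrix` does three things:

- It fills the matrix with -1.
- It compares only tokens in the same sentence.
- It stores span-relative indices (`lca - start`), using -1 when the ancestor falls outside the span.

The following minimal pure-Python sketch reproduces those semantics. It assumes heads are relative offsets as in the tests (a root has offset 0) and that they form a well-formed tree. The function names are hypothetical, and this is not spaCy's Cython code. Instead of checking sentence boundaries, it relies on tokens in different sentences sharing no ancestor, which yields the same -1 entries.

```python
from typing import List, Optional


def ancestor_chain(i: int, heads: List[int]) -> List[int]:
    """Token i followed by each of its heads up to the root (root offset is 0)."""
    chain = [i]
    while heads[i] != 0:
        i += heads[i]
        chain.append(i)
    return chain


def lca_matrix(heads: List[int], start: int = 0, end: Optional[int] = None) -> List[List[int]]:
    """Span-relative lowest common ancestors; -1 if outside [start, end) or absent."""
    if end is None:
        end = len(heads)
    n = end - start
    matrix = [[-1] * n for _ in range(n)]
    for j in range(n):
        above_j = set(ancestor_chain(start + j, heads))
        for k in range(j, n):
            # First ancestor of k (starting with k itself) that also dominates j
            lca = next((a for a in ancestor_chain(start + k, heads) if a in above_j), -1)
            value = lca - start if start <= lca < end else -1
            matrix[j][k] = matrix[k][j] = value
    return matrix
```

On "the lazy dog slept" (heads `[2, 1, 1, 0]`), `lca_matrix(heads, 0, 2)` gives `[[0, -1], [-1, 1]]`. This matches the `doc[:2]` assertions in the span test.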
|
||||
|
||||
|
|
|
@ -524,9 +524,9 @@ cdef class Span:
|
|||
return len(list(self.rights))
|
||||
|
||||
property subtree:
|
||||
"""Tokens that descend from tokens in the span, but fall outside it.
|
||||
"""Tokens within the span and tokens which descend from them.
|
||||
|
||||
YIELDS (Token): A descendant of a token within the span.
|
||||
YIELDS (Token): A token within the span, or a descendant of it.
|
||||
"""
|
||||
def __get__(self):
|
||||
for word in self.lefts:
|
||||
|
|
|
@ -457,10 +457,11 @@ cdef class Token:
|
|||
yield from self.rights
|
||||
|
||||
property subtree:
|
||||
"""A sequence of all the token's syntactic descendents.
|
||||
"""A sequence containing the token and all the token's syntactic
|
||||
descendants.
|
||||
|
||||
YIELDS (Token): A descendant token such that
|
||||
`self.is_ancestor(descendent)`.
|
||||
`self.is_ancestor(token) or token == self`.
|
||||
"""
|
||||
def __get__(self):
|
||||
for word in self.lefts:
|
||||
|
|
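Both docstring changes state the same definition. `Token.subtree` now includes the token itself, and `Span.subtree` yields the span's own tokens plus everything that descends from them. The spaCy-free sketch below applies that definition to relative head offsets; the helper names are hypothetical. It returns indices in document order. For projective trees this is assumed to match spaCy's left-to-right traversal; this has not been checked here for non-projective parses.

```python
from typing import List, Set


def ancestors_or_self(i: int, heads: List[int]) -> Set[int]:
    """Token i plus every token on its path to the root (root offset is 0)."""
    chain = {i}
    while heads[i] != 0:
        i += heads[i]
        chain.add(i)
    return chain


def token_subtree(i: int, heads: List[int]) -> List[int]:
    """Indices j such that i is an ancestor of j or j == i, in document order."""
    return [j for j in range(len(heads)) if i in ancestors_or_self(j, heads)]


def span_subtree(start: int, end: int, heads: List[int]) -> List[int]:
    """Tokens within [start, end) plus all tokens that descend from them."""
    members = set()
    for i in range(start, end):
        members.update(token_subtree(i, heads))
    return sorted(members)


heads = [2, 1, 1, 0]  # "the lazy dog slept", the example used in the LCA tests
```

Under this definition, a leaf token's subtree is just itself: `token_subtree(0, heads)` yields only "the". Previously a leaf's subtree was empty.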
|
@ -253,7 +253,6 @@ def get_entry_point(key, value):
|
|||
def is_in_jupyter():
|
||||
"""Check if user is running spaCy from a Jupyter notebook by detecting the
|
||||
IPython kernel. Mainly used for the displaCy visualizer.
|
||||
|
||||
RETURNS (bool): True if in Jupyter, False if not.
|
||||
"""
|
||||
# https://stackoverflow.com/a/39662359/6400719
|
||||
|
@ -667,3 +666,19 @@ class SimpleFrozenDict(dict):
|
|||
|
||||
def update(self, other):
|
||||
raise NotImplementedError(Errors.E095)
|
||||
|
||||
|
||||
class DummyTokenizer(object):
|
||||
# add dummy methods for to_bytes, from_bytes, to_disk and from_disk to
|
||||
# allow serialization (see #1557)
|
||||
def to_bytes(self, **exclude):
|
||||
return b''
|
||||
|
||||
def from_bytes(self, _bytes_data, **exclude):
|
||||
return self
|
||||
|
||||
def to_disk(self, _path, **exclude):
|
||||
return None
|
||||
|
||||
def from_disk(self, _path, **exclude):
|
||||
return self
|
||||
|
|
|
@ -150,3 +150,9 @@ p
|
|||
+dep-row("re", "repeated element")
|
||||
+dep-row("rs", "reported speech")
|
||||
+dep-row("sb", "subject")
|
||||
+dep-row("sbp", "passivised subject")
|
||||
+dep-row("sp", "subject or predicate")
|
||||
+dep-row("svp", "separable verb prefix")
|
||||
+dep-row("uc", "unit component")
|
||||
+dep-row("vo", "vocative")
|
||||
+dep-row("ROOT", "root")
|
||||
|
|
|
@ -5,7 +5,7 @@ include ../_includes/_mixins
|
|||
p
|
||||
| The #[code PhraseMatcher] lets you efficiently match large terminology
|
||||
| lists. While the #[+api("matcher") #[code Matcher]] lets you match
|
||||
| squences based on lists of token descriptions, the #[code PhraseMatcher]
|
||||
| sequences based on lists of token descriptions, the #[code PhraseMatcher]
|
||||
| accepts match patterns in the form of #[code Doc] objects.
|
||||
|
||||
+h(2, "init") PhraseMatcher.__init__
|
||||
|
|
|
@ -489,7 +489,7 @@ p
|
|||
+tag property
|
||||
+tag-model("parse")
|
||||
|
||||
p Tokens that descend from tokens in the span, but fall outside it.
|
||||
p Tokens within the span and tokens which descend from them.
|
||||
|
||||
+aside-code("Example").
|
||||
doc = nlp(u'Give it back! He pleaded.')
|
||||
|
@ -500,7 +500,7 @@ p Tokens that descend from tokens in the span, but fall outside it.
|
|||
+row("foot")
|
||||
+cell yields
|
||||
+cell #[code Token]
|
||||
+cell A descendant of a token within the span.
|
||||
+cell A token within the span, or a descendant of it.
|
||||
|
||||
+h(2, "has_vector") Span.has_vector
|
||||
+tag property
|
||||
|
|
|
@ -1,3 +1,4 @@
|
|||
|
||||
//- 💫 DOCS > API > TOKEN
|
||||
|
||||
include ../_includes/_mixins
|
||||
|
@ -405,7 +406,7 @@ p
|
|||
+tag property
|
||||
+tag-model("parse")
|
||||
|
||||
p A sequence of all the token's syntactic descendants.
|
||||
p A sequence containing the token and all the token's syntactic descendants.
|
||||
|
||||
+aside-code("Example").
|
||||
doc = nlp(u'Give it back! He pleaded.')
|
||||
|
@ -416,7 +417,7 @@ p A sequence of all the token's syntactic descendants.
|
|||
+row("foot")
|
||||
+cell yields
|
||||
+cell #[code Token]
|
||||
+cell A descendant token such that #[code self.is_ancestor(descendant)].
|
||||
+cell A descendant token such that #[code self.is_ancestor(token) or token == self].
|
||||
|
||||
+h(2, "is_sent_start") Token.is_sent_start
|
||||
+tag property
|
||||
|
|
|
@ -1083,20 +1083,31 @@
|
|||
"category": ["pipeline"]
|
||||
},
|
||||
{
|
||||
"id": "spacy2conllu",
|
||||
"title": "spaCy2CoNLLU",
|
||||
"id": "spacy-conll",
|
||||
"title": "spacy_conll",
|
||||
"slogan": "Parse text with spaCy and print the output in CoNLL-U format",
|
||||
"description": "Simple script to parse text with spaCy and print the output in CoNLL-U format",
|
||||
"description": "This module allows you to parse a text to CoNLL-U format. You can use it as a command line tool, or embed it in your own scripts.",
|
||||
"code_example": [
|
||||
"python parse_as_conllu.py [-h] --input_file INPUT_FILE [--output_file OUTPUT_FILE] --model MODEL"
|
||||
"from spacy_conll import Spacy2ConllParser",
|
||||
"spacyconll = Spacy2ConllParser()",
|
||||
"",
|
||||
"# `parse` returns a generator of the parsed sentences",
|
||||
"for parsed_sent in spacyconll.parse(input_str=\"I like cookies.\\nWhat about you?\\nI don't like 'em!\"):",
|
||||
"    do_something(parsed_sent)",
|
||||
"",
|
||||
"# `parseprint` prints output to stdout (default) or a file (use `output_file` parameter)",
|
||||
"# This method is called when using the command line",
|
||||
"spacyconll.parseprint(input_str='I like cookies.')"
|
||||
],
|
||||
"code_language": "bash",
|
||||
"author": "Raquel G. Alhama",
|
||||
"code_language": "python",
|
||||
"author": "Bram Vanroy",
|
||||
"author_links": {
|
||||
"github": "rgalhama"
|
||||
"github": "BramVanroy",
|
||||
"website": "https://bramvanroy.be"
|
||||
|
||||
},
|
||||
"github": "rgalhama/spaCy2CoNLLU",
|
||||
"category": ["training"]
|
||||
"github": "BramVanroy/spacy_conll",
|
||||
"category": ["standalone"]
|
||||
}
|
||||
],
|
||||
"projectCats": {
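The `spacy_conll` entry describes printing spaCy parses in CoNLL-U format. As a rough illustration of what that output looks like, and without using `spacy_conll`'s own API, here is a minimal formatter for the ten tab-separated CoNLL-U columns (ID, FORM, LEMMA, UPOS, XPOS, FEATS, HEAD, DEPREL, DEPS, MISC). The `Tok` record is a hypothetical stand-in for a parsed token; IDs are 1-based, the root gets HEAD 0, and each sentence block ends with a blank line, per the CoNLL-U format. Dependency labels other than the root are passed through unchanged, so spaCy-style labels such as `dobj` are not mapped to Universal Dependencies labels here.

```python
from typing import List, NamedTuple


class Tok(NamedTuple):
    """Hypothetical parsed-token record (not a spaCy or spacy_conll type)."""
    text: str
    lemma: str
    upos: str   # universal POS tag
    xpos: str   # language-specific tag
    head: int   # 0-based index of the head in the sentence; the root points to itself
    dep: str


def to_conllu(sent: List[Tok]) -> str:
    """Render one sentence as CoNLL-U: 10 tab-separated columns per token,
    followed by the blank line that ends a sentence block."""
    rows = []
    for i, t in enumerate(sent):
        # CoNLL-U IDs are 1-based and the root gets HEAD 0 / DEPREL "root".
        head = 0 if t.head == i else t.head + 1
        deprel = "root" if head == 0 else t.dep
        cols = [str(i + 1), t.text, t.lemma, t.upos, t.xpos,
                "_", str(head), deprel, "_", "_"]  # FEATS, DEPS, MISC left empty
        rows.append("\t".join(cols))
    return "\n".join(rows) + "\n\n"


sent = [
    Tok("I", "I", "PRON", "PRP", 1, "nsubj"),
    Tok("like", "like", "VERB", "VBP", 1, "ROOT"),
    Tok("cookies", "cookie", "NOUN", "NNS", 1, "dobj"),
    Tok(".", ".", "PUNCT", ".", 1, "punct"),
]
print(to_conllu(sent), end="")
```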
@ -159,7 +159,7 @@ p
| To provide training examples to the entity recogniser, you'll first need
| to create an instance of the #[+api("goldparse") #[code GoldParse]] class.
| You can specify your annotations in a stand-off format or as token tags.
| If a character offset in your entity annotations don't fall on a token
| If a character offset in your entity annotations doesn't fall on a token
| boundary, the #[code GoldParse] class will treat that annotation as a
| missing value. This allows for more realistic training, because the
| entity recogniser is allowed to learn from examples that may feature
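The hunk above says that an entity annotation whose character offsets do not fall on token boundaries is treated as a missing value rather than as a wrong label. The sketch below illustrates that idea in plain Python; it is not `GoldParse`'s implementation. It assumes tokens are given as `(start, end)` character spans and entities as `(start, end, label)` triples, and it marks the tokens of a misaligned entity with `"-"`, mirroring the convention of spaCy v2's `spacy.gold.biluo_tags_from_offsets`. The function name `biluo_with_missing` is hypothetical.

```python
from typing import List, Tuple

Span = Tuple[int, int]


def biluo_with_missing(tokens: List[Span], ents: List[Tuple[int, int, str]]) -> List[str]:
    """Tag tokens with BILUO labels; '-' marks tokens of misaligned entities."""
    tags = ["O"] * len(tokens)
    starts = {s: i for i, (s, _) in enumerate(tokens)}
    ends = {e: i for i, (_, e) in enumerate(tokens)}
    for start, end, label in ents:
        first, last = starts.get(start), ends.get(end)
        if first is None or last is None:
            # An offset falls inside a token: mark every overlapped token as missing.
            for i, (s, e) in enumerate(tokens):
                if s < end and e > start:
                    tags[i] = "-"
            continue
        if first == last:
            tags[first] = "U-" + label
        else:
            tags[first] = "B-" + label
            for i in range(first + 1, last):
                tags[i] = "I-" + label
            tags[last] = "L-" + label
    return tags


# "I like London and Berlin." : the second entity (18, 23) stops inside "Berlin"
tokens = [(0, 1), (2, 6), (7, 13), (14, 17), (18, 24), (24, 25)]
print(biluo_with_missing(tokens, [(7, 13, "GPE"), (18, 23, "GPE")]))
```

Marking the misaligned token as unknown, instead of forcing it to `"O"`, means a training loop built on these tags would neither reward nor penalise the model's prediction for that token.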
@ -444,7 +444,7 @@ p
| Let's say you're analysing user comments and you want to find out what
| people are saying about Facebook. You want to start off by finding
| adjectives following "Facebook is" or "Facebook was". This is obviously
| a very rudimentary solution, but it'll be fast, and a great way get an
| a very rudimentary solution, but it'll be fast, and a great way to get an
| idea for what's in your data. Your pattern could look like this:
+code.
@ -40,7 +40,7 @@ p
| constrained to predict parses consistent with the sentence boundaries.
+infobox("Important note", "⚠️")
| To prevent inconsitent state, you can only set boundaries #[em before] a
| To prevent inconsistent state, you can only set boundaries #[em before] a
| document is parsed (and #[code Doc.is_parsed] is #[code False]). To
| ensure that your component is added in the right place, you can set
| #[code before='parser'] or #[code first=True] when adding it to the
@ -21,7 +21,7 @@ p
| which needs to be split into two tokens: #[code {ORTH: "do"}] and
| #[code {ORTH: "n't", LEMMA: "not"}]. The prefixes, suffixes and infixes
| mosty define punctuation rules – for example, when to split off periods
| (at the end of a sentence), and when to leave token containing periods
| (at the end of a sentence), and when to leave tokens containing periods
| intact (abbreviations like "U.S.").
+graphic("/assets/img/language_data.svg")
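The tokenizer text in this hunk combines special-case exceptions (such as splitting "don't" into "do" and "n't") with punctuation rules for prefixes and suffixes. The deliberately simplified pure-Python sketch below shows how the two interact. The rule tables (`SPECIAL_CASES`, `PREFIXES`, `SUFFIXES`) and the `tokenize` function are hypothetical and far smaller than spaCy's language data; the sketch splits only on whitespace, handles no infixes, and ignores `LEMMA` values, so it approximates rather than reproduces spaCy's tokenizer.

```python
from typing import Dict, List

# Hypothetical, tiny rule tables for illustration only.
SPECIAL_CASES: Dict[str, List[Dict[str, str]]] = {
    "don't": [{"ORTH": "do"}, {"ORTH": "n't", "LEMMA": "not"}],
    "U.S.": [{"ORTH": "U.S."}],
}
PREFIXES = ("(", '"')
SUFFIXES = (")", '"', ".", "!", "?", ",")


def tokenize(text: str) -> List[str]:
    tokens: List[str] = []
    for chunk in text.split():
        prefix_toks: List[str] = []
        suffix_toks: List[str] = []
        while chunk:
            if chunk in SPECIAL_CASES:
                # An exception matched: emit its ORTH pieces (LEMMA is ignored here).
                prefix_toks.extend(t["ORTH"] for t in SPECIAL_CASES[chunk])
                chunk = ""
            elif chunk[0] in PREFIXES and len(chunk) > 1:
                prefix_toks.append(chunk[0])
                chunk = chunk[1:]
            elif chunk[-1] in SUFFIXES and len(chunk) > 1:
                # Suffixes are peeled from the right, so prepend to keep order.
                suffix_toks.insert(0, chunk[-1])
                chunk = chunk[:-1]
            else:
                prefix_toks.append(chunk)
                chunk = ""
        tokens.extend(prefix_toks + suffix_toks)
    return tokens


print(tokenize("I don't live in the U.S."))
print(tokenize("(don't) stop there."))
```

Because the exception check runs before the suffix rule, "U.S." survives intact, while the period in "there." is split off.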
@ -43,7 +43,7 @@ p
p
| This example shows how to use multiple cores to process text using
| spaCy and #[+a("https://pythonhosted.org/joblib/") Joblib]. We're
| spaCy and #[+a("https://joblib.readthedocs.io/en/latest/parallel.html") Joblib]. We're
| exporting part-of-speech-tagged, true-cased, (very roughly)
| sentence-separated text, with each "sentence" on a newline, and
| spaces between tokens. Data is loaded from the IMDB movie reviews
@ -74,7 +74,7 @@ p
displacy.serve(doc, style='ent')
p
| This feature is espeically handy if you're using displaCy to compare
| This feature is especially handy if you're using displaCy to compare
| performance at different stages of a process, e.g. during training. Here
| you could use the title for a brief description of the text example and
| the number of iterations.
@ -61,7 +61,7 @@ p
output_path.open('w', encoding='utf-8').write(svg)
p
| The above code will generate the dependency visualizations as to
| The above code will generate the dependency visualizations as
| two files, #[code This-is-an-example.svg] and #[code This-is-another-one.svg].
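The hunk above says the dependency visualizations end up in `This-is-an-example.svg` and `This-is-another-one.svg`, i.e. one file per sentence, named after its words. The code that builds those names is not shown in this hunk. A plausible pure-Python sketch follows, assuming the name is the sentence's words with punctuation stripped, joined by hyphens, and that the SVG markup already exists; the `svg_markup` dict and the `svg_filename` and `save_svgs` helpers are hypothetical placeholders for displaCy's output and the surrounding script.

```python
import string
import tempfile
from pathlib import Path
from typing import Dict, List


def svg_filename(sentence: str) -> str:
    """Join the sentence's words (punctuation stripped) with hyphens, add '.svg'."""
    words = [w.strip(string.punctuation) for w in sentence.split()]
    return "-".join(w for w in words if w) + ".svg"


def save_svgs(svg_markup: Dict[str, str], out_dir: Path) -> List[Path]:
    paths = []
    for sentence, svg in svg_markup.items():
        output_path = out_dir / svg_filename(sentence)
        # Explicit encoding; the context manager closes the file deterministically.
        with output_path.open("w", encoding="utf-8") as f:
            f.write(svg)
        paths.append(output_path)
    return paths


if __name__ == "__main__":
    markup = {"This is an example.": "<svg></svg>",
              "This is another one.": "<svg></svg>"}
    with tempfile.TemporaryDirectory() as tmp:
        for p in save_svgs(markup, Path(tmp)):
            print(p.name)
```

Compared with a bare `output_path.open(...).write(svg)`, the `with` block guarantees the file is flushed and closed before the next one is written.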
@ -24,7 +24,7 @@ include ../_includes/_mixins
| standards.
p
| The quickest way visualize #[code Doc] is to use
| The quickest way to visualize #[code Doc] is to use
| #[+api("displacy#serve") #[code displacy.serve]]. This will spin up a
| simple web server and let you view the result straight from your browser.
| displaCy can either take a single #[code Doc] or a list of #[code Doc]