Mirror of https://github.com/explosion/spaCy.git (synced 2024-12-25 01:16:28 +03:00)

Merge branch 'master' into develop
Commit 5d0b60999d

.github/contributors/DeNeutoy.md (vendored, new file, 106 lines added)

@@ -0,0 +1,106 @@

# spaCy contributor agreement

This spaCy Contributor Agreement (**"SCA"**) is based on the
[Oracle Contributor Agreement](http://www.oracle.com/technetwork/oca-405177.pdf).
The SCA applies to any contribution that you make to any product or project
managed by us (the **"project"**), and sets out the intellectual property rights
you grant to us in the contributed materials. The term **"us"** shall mean
[ExplosionAI UG (haftungsbeschränkt)](https://explosion.ai/legal). The term
**"you"** shall mean the person or entity identified below.

If you agree to be bound by these terms, fill in the information requested
below and include the filled-in version with your first pull request, under the
folder [`.github/contributors/`](/.github/contributors/). The name of the file
should be your GitHub username, with the extension `.md`. For example, the user
example_user would create the file `.github/contributors/example_user.md`.

Read this agreement carefully before signing. These terms and conditions
constitute a binding legal agreement.

## Contributor Agreement

1. The term "contribution" or "contributed materials" means any source code,
object code, patch, tool, sample, graphic, specification, manual,
documentation, or any other material posted or submitted by you to the project.

2. With respect to any worldwide copyrights, or copyright applications and
registrations, in your contribution:

    * you hereby assign to us joint ownership, and to the extent that such
      assignment is or becomes invalid, ineffective or unenforceable, you hereby
      grant to us a perpetual, irrevocable, non-exclusive, worldwide, no-charge,
      royalty-free, unrestricted license to exercise all rights under those
      copyrights. This includes, at our option, the right to sublicense these same
      rights to third parties through multiple levels of sublicensees or other
      licensing arrangements;

    * you agree that each of us can do all things in relation to your
      contribution as if each of us were the sole owners, and if one of us makes
      a derivative work of your contribution, the one who makes the derivative
      work (or has it made) will be the sole owner of that derivative work;

    * you agree that you will not assert any moral rights in your contribution
      against us, our licensees or transferees;

    * you agree that we may register a copyright in your contribution and
      exercise all ownership rights associated with it; and

    * you agree that neither of us has any duty to consult with, obtain the
      consent of, pay or render an accounting to the other for any use or
      distribution of your contribution.

3. With respect to any patents you own, or that you can license without payment
to any third party, you hereby grant to us a perpetual, irrevocable,
non-exclusive, worldwide, no-charge, royalty-free license to:

    * make, have made, use, sell, offer to sell, import, and otherwise transfer
      your contribution in whole or in part, alone or in combination with or
      included in any product, work or materials arising out of the project to
      which your contribution was submitted, and

    * at our option, to sublicense these same rights to third parties through
      multiple levels of sublicensees or other licensing arrangements.

4. Except as set out above, you keep all right, title, and interest in your
contribution. The rights that you grant to us under these terms are effective
on the date you first submitted a contribution to us, even if your submission
took place before the date you sign these terms.

5. You covenant, represent, warrant and agree that:

    * each contribution that you submit is and shall be an original work of
      authorship and you can legally grant the rights set out in this SCA;

    * to the best of your knowledge, each contribution will not violate any
      third party's copyrights, trademarks, patents, or other intellectual
      property rights; and

    * each contribution shall be in compliance with U.S. export control laws and
      other applicable export and import laws. You agree to notify us if you
      become aware of any circumstance which would make any of the foregoing
      representations inaccurate in any respect. We may publicly disclose your
      participation in the project, including the fact that you have signed the SCA.

6. This SCA is governed by the laws of the State of California and applicable
U.S. Federal law. Any choice of law rules will not apply.

7. Please place an “x” in one of the applicable statements below. Please do NOT
mark both statements:

    * [x] I am signing on behalf of myself as an individual and no other person
      or entity, including my employer, has or will have rights with respect to my
      contributions.

    * [ ] I am signing on behalf of my employer or a legal entity and I have the
      actual authority to contractually bind that entity.

## Contributor Details

| Field                          | Entry                  |
|------------------------------- | -------------------- |
| Name                           | Mark Neumann           |
| Company name (if applicable)   | Allen Institute for AI |
| Title or role (if applicable)  | Research Engineer      |
| Date                           | 13/01/2019             |
| GitHub username                | @Deneutoy              |
| Website (optional)             | markneumann.xyz        |

.github/contributors/Loghijiaha.md (vendored, new file, 106 lines added)

@@ -0,0 +1,106 @@

# spaCy contributor agreement

This spaCy Contributor Agreement (**"SCA"**) is based on the
[Oracle Contributor Agreement](http://www.oracle.com/technetwork/oca-405177.pdf).
The SCA applies to any contribution that you make to any product or project
managed by us (the **"project"**), and sets out the intellectual property rights
you grant to us in the contributed materials. The term **"us"** shall mean
[ExplosionAI UG (haftungsbeschränkt)](https://explosion.ai/legal). The term
**"you"** shall mean the person or entity identified below.

If you agree to be bound by these terms, fill in the information requested
below and include the filled-in version with your first pull request, under the
folder [`.github/contributors/`](/.github/contributors/). The name of the file
should be your GitHub username, with the extension `.md`. For example, the user
example_user would create the file `.github/contributors/example_user.md`.

Read this agreement carefully before signing. These terms and conditions
constitute a binding legal agreement.

## Contributor Agreement

1. The term "contribution" or "contributed materials" means any source code,
object code, patch, tool, sample, graphic, specification, manual,
documentation, or any other material posted or submitted by you to the project.

2. With respect to any worldwide copyrights, or copyright applications and
registrations, in your contribution:

    * you hereby assign to us joint ownership, and to the extent that such
      assignment is or becomes invalid, ineffective or unenforceable, you hereby
      grant to us a perpetual, irrevocable, non-exclusive, worldwide, no-charge,
      royalty-free, unrestricted license to exercise all rights under those
      copyrights. This includes, at our option, the right to sublicense these same
      rights to third parties through multiple levels of sublicensees or other
      licensing arrangements;

    * you agree that each of us can do all things in relation to your
      contribution as if each of us were the sole owners, and if one of us makes
      a derivative work of your contribution, the one who makes the derivative
      work (or has it made) will be the sole owner of that derivative work;

    * you agree that you will not assert any moral rights in your contribution
      against us, our licensees or transferees;

    * you agree that we may register a copyright in your contribution and
      exercise all ownership rights associated with it; and

    * you agree that neither of us has any duty to consult with, obtain the
      consent of, pay or render an accounting to the other for any use or
      distribution of your contribution.

3. With respect to any patents you own, or that you can license without payment
to any third party, you hereby grant to us a perpetual, irrevocable,
non-exclusive, worldwide, no-charge, royalty-free license to:

    * make, have made, use, sell, offer to sell, import, and otherwise transfer
      your contribution in whole or in part, alone or in combination with or
      included in any product, work or materials arising out of the project to
      which your contribution was submitted, and

    * at our option, to sublicense these same rights to third parties through
      multiple levels of sublicensees or other licensing arrangements.

4. Except as set out above, you keep all right, title, and interest in your
contribution. The rights that you grant to us under these terms are effective
on the date you first submitted a contribution to us, even if your submission
took place before the date you sign these terms.

5. You covenant, represent, warrant and agree that:

    * each contribution that you submit is and shall be an original work of
      authorship and you can legally grant the rights set out in this SCA;

    * to the best of your knowledge, each contribution will not violate any
      third party's copyrights, trademarks, patents, or other intellectual
      property rights; and

    * each contribution shall be in compliance with U.S. export control laws and
      other applicable export and import laws. You agree to notify us if you
      become aware of any circumstance which would make any of the foregoing
      representations inaccurate in any respect. We may publicly disclose your
      participation in the project, including the fact that you have signed the SCA.

6. This SCA is governed by the laws of the State of California and applicable
U.S. Federal law. Any choice of law rules will not apply.

7. Please place an “x” in one of the applicable statements below. Please do NOT
mark both statements:

    * [x] I am signing on behalf of myself as an individual and no other person
      or entity, including my employer, has or will have rights with respect to my
      contributions.

    * [x] I am signing on behalf of my employer or a legal entity and I have the
      actual authority to contractually bind that entity.

## Contributor Details

| Field                          | Entry                  |
|------------------------------- | -------------------- |
| Name                           | Loghi Perinpanayagam   |
| Company name (if applicable)   |                        |
| Title or role (if applicable)  | Student                |
| Date                           | 13 Jan, 2019           |
| GitHub username                | loghijiaha             |
| Website (optional)             |                        |

.github/contributors/PolyglotOpenstreetmap.md (vendored, new file, 106 lines added)

@@ -0,0 +1,106 @@

# spaCy contributor agreement

This spaCy Contributor Agreement (**"SCA"**) is based on the
[Oracle Contributor Agreement](http://www.oracle.com/technetwork/oca-405177.pdf).
The SCA applies to any contribution that you make to any product or project
managed by us (the **"project"**), and sets out the intellectual property rights
you grant to us in the contributed materials. The term **"us"** shall mean
[ExplosionAI UG (haftungsbeschränkt)](https://explosion.ai/legal). The term
**"you"** shall mean the person or entity identified below.

If you agree to be bound by these terms, fill in the information requested
below and include the filled-in version with your first pull request, under the
folder [`.github/contributors/`](/.github/contributors/). The name of the file
should be your GitHub username, with the extension `.md`. For example, the user
example_user would create the file `.github/contributors/example_user.md`.

Read this agreement carefully before signing. These terms and conditions
constitute a binding legal agreement.

## Contributor Agreement

1. The term "contribution" or "contributed materials" means any source code,
object code, patch, tool, sample, graphic, specification, manual,
documentation, or any other material posted or submitted by you to the project.

2. With respect to any worldwide copyrights, or copyright applications and
registrations, in your contribution:

    * you hereby assign to us joint ownership, and to the extent that such
      assignment is or becomes invalid, ineffective or unenforceable, you hereby
      grant to us a perpetual, irrevocable, non-exclusive, worldwide, no-charge,
      royalty-free, unrestricted license to exercise all rights under those
      copyrights. This includes, at our option, the right to sublicense these same
      rights to third parties through multiple levels of sublicensees or other
      licensing arrangements;

    * you agree that each of us can do all things in relation to your
      contribution as if each of us were the sole owners, and if one of us makes
      a derivative work of your contribution, the one who makes the derivative
      work (or has it made) will be the sole owner of that derivative work;

    * you agree that you will not assert any moral rights in your contribution
      against us, our licensees or transferees;

    * you agree that we may register a copyright in your contribution and
      exercise all ownership rights associated with it; and

    * you agree that neither of us has any duty to consult with, obtain the
      consent of, pay or render an accounting to the other for any use or
      distribution of your contribution.

3. With respect to any patents you own, or that you can license without payment
to any third party, you hereby grant to us a perpetual, irrevocable,
non-exclusive, worldwide, no-charge, royalty-free license to:

    * make, have made, use, sell, offer to sell, import, and otherwise transfer
      your contribution in whole or in part, alone or in combination with or
      included in any product, work or materials arising out of the project to
      which your contribution was submitted, and

    * at our option, to sublicense these same rights to third parties through
      multiple levels of sublicensees or other licensing arrangements.

4. Except as set out above, you keep all right, title, and interest in your
contribution. The rights that you grant to us under these terms are effective
on the date you first submitted a contribution to us, even if your submission
took place before the date you sign these terms.

5. You covenant, represent, warrant and agree that:

    * each contribution that you submit is and shall be an original work of
      authorship and you can legally grant the rights set out in this SCA;

    * to the best of your knowledge, each contribution will not violate any
      third party's copyrights, trademarks, patents, or other intellectual
      property rights; and

    * each contribution shall be in compliance with U.S. export control laws and
      other applicable export and import laws. You agree to notify us if you
      become aware of any circumstance which would make any of the foregoing
      representations inaccurate in any respect. We may publicly disclose your
      participation in the project, including the fact that you have signed the SCA.

6. This SCA is governed by the laws of the State of California and applicable
U.S. Federal law. Any choice of law rules will not apply.

7. Please place an “x” in one of the applicable statements below. Please do NOT
mark both statements:

    * [x] I am signing on behalf of myself as an individual and no other person
      or entity, including my employer, has or will have rights with respect to my
      contributions.

    * [ ] I am signing on behalf of my employer or a legal entity and I have the
      actual authority to contractually bind that entity.

## Contributor Details

| Field                          | Entry                  |
|------------------------------- | -------------------- |
| Name                           | Jo                     |
| Company name (if applicable)   |                        |
| Title or role (if applicable)  |                        |
| Date                           | 2018-01-26             |
| GitHub username                | PolyglotOpenstreetmap  |
| Website (optional)             |                        |

.github/contributors/adrianeboyd.md (vendored, new file, 106 lines added)

@@ -0,0 +1,106 @@

# spaCy contributor agreement

This spaCy Contributor Agreement (**"SCA"**) is based on the
[Oracle Contributor Agreement](http://www.oracle.com/technetwork/oca-405177.pdf).
The SCA applies to any contribution that you make to any product or project
managed by us (the **"project"**), and sets out the intellectual property rights
you grant to us in the contributed materials. The term **"us"** shall mean
[ExplosionAI UG (haftungsbeschränkt)](https://explosion.ai/legal). The term
**"you"** shall mean the person or entity identified below.

If you agree to be bound by these terms, fill in the information requested
below and include the filled-in version with your first pull request, under the
folder [`.github/contributors/`](/.github/contributors/). The name of the file
should be your GitHub username, with the extension `.md`. For example, the user
example_user would create the file `.github/contributors/example_user.md`.

Read this agreement carefully before signing. These terms and conditions
constitute a binding legal agreement.

## Contributor Agreement

1. The term "contribution" or "contributed materials" means any source code,
object code, patch, tool, sample, graphic, specification, manual,
documentation, or any other material posted or submitted by you to the project.

2. With respect to any worldwide copyrights, or copyright applications and
registrations, in your contribution:

    * you hereby assign to us joint ownership, and to the extent that such
      assignment is or becomes invalid, ineffective or unenforceable, you hereby
      grant to us a perpetual, irrevocable, non-exclusive, worldwide, no-charge,
      royalty-free, unrestricted license to exercise all rights under those
      copyrights. This includes, at our option, the right to sublicense these same
      rights to third parties through multiple levels of sublicensees or other
      licensing arrangements;

    * you agree that each of us can do all things in relation to your
      contribution as if each of us were the sole owners, and if one of us makes
      a derivative work of your contribution, the one who makes the derivative
      work (or has it made) will be the sole owner of that derivative work;

    * you agree that you will not assert any moral rights in your contribution
      against us, our licensees or transferees;

    * you agree that we may register a copyright in your contribution and
      exercise all ownership rights associated with it; and

    * you agree that neither of us has any duty to consult with, obtain the
      consent of, pay or render an accounting to the other for any use or
      distribution of your contribution.

3. With respect to any patents you own, or that you can license without payment
to any third party, you hereby grant to us a perpetual, irrevocable,
non-exclusive, worldwide, no-charge, royalty-free license to:

    * make, have made, use, sell, offer to sell, import, and otherwise transfer
      your contribution in whole or in part, alone or in combination with or
      included in any product, work or materials arising out of the project to
      which your contribution was submitted, and

    * at our option, to sublicense these same rights to third parties through
      multiple levels of sublicensees or other licensing arrangements.

4. Except as set out above, you keep all right, title, and interest in your
contribution. The rights that you grant to us under these terms are effective
on the date you first submitted a contribution to us, even if your submission
took place before the date you sign these terms.

5. You covenant, represent, warrant and agree that:

    * each contribution that you submit is and shall be an original work of
      authorship and you can legally grant the rights set out in this SCA;

    * to the best of your knowledge, each contribution will not violate any
      third party's copyrights, trademarks, patents, or other intellectual
      property rights; and

    * each contribution shall be in compliance with U.S. export control laws and
      other applicable export and import laws. You agree to notify us if you
      become aware of any circumstance which would make any of the foregoing
      representations inaccurate in any respect. We may publicly disclose your
      participation in the project, including the fact that you have signed the SCA.

6. This SCA is governed by the laws of the State of California and applicable
U.S. Federal law. Any choice of law rules will not apply.

7. Please place an “x” in one of the applicable statements below. Please do NOT
mark both statements:

    * [x] I am signing on behalf of myself as an individual and no other person
      or entity, including my employer, has or will have rights with respect to my
      contributions.

    * [ ] I am signing on behalf of my employer or a legal entity and I have the
      actual authority to contractually bind that entity.

## Contributor Details

| Field                          | Entry                  |
|------------------------------- | -------------------- |
| Name                           | Adriane Boyd           |
| Company name (if applicable)   |                        |
| Title or role (if applicable)  |                        |
| Date                           | 28 January 2019        |
| GitHub username                | adrianeboyd            |
| Website (optional)             |                        |

.github/contributors/alvations.md (vendored, new file, 106 lines added)

@@ -0,0 +1,106 @@

# spaCy contributor agreement

This spaCy Contributor Agreement (**"SCA"**) is based on the
[Oracle Contributor Agreement](http://www.oracle.com/technetwork/oca-405177.pdf).
The SCA applies to any contribution that you make to any product or project
managed by us (the **"project"**), and sets out the intellectual property rights
you grant to us in the contributed materials. The term **"us"** shall mean
[ExplosionAI UG (haftungsbeschränkt)](https://explosion.ai/legal). The term
**"you"** shall mean the person or entity identified below.

If you agree to be bound by these terms, fill in the information requested
below and include the filled-in version with your first pull request, under the
folder [`.github/contributors/`](/.github/contributors/). The name of the file
should be your GitHub username, with the extension `.md`. For example, the user
example_user would create the file `.github/contributors/example_user.md`.

Read this agreement carefully before signing. These terms and conditions
constitute a binding legal agreement.

## Contributor Agreement

1. The term "contribution" or "contributed materials" means any source code,
object code, patch, tool, sample, graphic, specification, manual,
documentation, or any other material posted or submitted by you to the project.

2. With respect to any worldwide copyrights, or copyright applications and
registrations, in your contribution:

    * you hereby assign to us joint ownership, and to the extent that such
      assignment is or becomes invalid, ineffective or unenforceable, you hereby
      grant to us a perpetual, irrevocable, non-exclusive, worldwide, no-charge,
      royalty-free, unrestricted license to exercise all rights under those
      copyrights. This includes, at our option, the right to sublicense these same
      rights to third parties through multiple levels of sublicensees or other
      licensing arrangements;

    * you agree that each of us can do all things in relation to your
      contribution as if each of us were the sole owners, and if one of us makes
      a derivative work of your contribution, the one who makes the derivative
      work (or has it made) will be the sole owner of that derivative work;

    * you agree that you will not assert any moral rights in your contribution
      against us, our licensees or transferees;

    * you agree that we may register a copyright in your contribution and
      exercise all ownership rights associated with it; and

    * you agree that neither of us has any duty to consult with, obtain the
      consent of, pay or render an accounting to the other for any use or
      distribution of your contribution.

3. With respect to any patents you own, or that you can license without payment
to any third party, you hereby grant to us a perpetual, irrevocable,
non-exclusive, worldwide, no-charge, royalty-free license to:

    * make, have made, use, sell, offer to sell, import, and otherwise transfer
      your contribution in whole or in part, alone or in combination with or
      included in any product, work or materials arising out of the project to
      which your contribution was submitted, and

    * at our option, to sublicense these same rights to third parties through
      multiple levels of sublicensees or other licensing arrangements.

4. Except as set out above, you keep all right, title, and interest in your
contribution. The rights that you grant to us under these terms are effective
on the date you first submitted a contribution to us, even if your submission
took place before the date you sign these terms.

5. You covenant, represent, warrant and agree that:

    * each contribution that you submit is and shall be an original work of
      authorship and you can legally grant the rights set out in this SCA;

    * to the best of your knowledge, each contribution will not violate any
      third party's copyrights, trademarks, patents, or other intellectual
      property rights; and

    * each contribution shall be in compliance with U.S. export control laws and
      other applicable export and import laws. You agree to notify us if you
      become aware of any circumstance which would make any of the foregoing
      representations inaccurate in any respect. We may publicly disclose your
      participation in the project, including the fact that you have signed the SCA.

6. This SCA is governed by the laws of the State of California and applicable
U.S. Federal law. Any choice of law rules will not apply.

7. Please place an “x” in one of the applicable statements below. Please do NOT
mark both statements:

    * [ ] I am signing on behalf of myself as an individual and no other person
      or entity, including my employer, has or will have rights with respect to my
      contributions.

    * [ ] I am signing on behalf of my employer or a legal entity and I have the
      actual authority to contractually bind that entity.

## Contributor Details

| Field                          | Entry                  |
|------------------------------- | -------------------- |
| Name                           | Liling                 |
| Company name (if applicable)   |                        |
| Title or role (if applicable)  |                        |
| Date                           | 04 Jan 2019            |
| GitHub username                | alvations              |
| Website (optional)             |                        |

.github/contributors/amperinet.md (vendored, 2 lines changed)

@@ -101,6 +101,6 @@ mark both statements:
 | Name                          | Amandine Périnet         |
 | Company name (if applicable)  | 365Talents               |
 | Title or role (if applicable) | Data Science Researcher  |
-| Date                          | 12/12/2018               |
+| Date                          | 28/01/2019               |
 | GitHub username               | amperinet                |
 | Website (optional)            |                          |

.github/contributors/boena.md (vendored, new file, 106 lines added)

@@ -0,0 +1,106 @@

# spaCy contributor agreement

This spaCy Contributor Agreement (**"SCA"**) is based on the
[Oracle Contributor Agreement](http://www.oracle.com/technetwork/oca-405177.pdf).
The SCA applies to any contribution that you make to any product or project
managed by us (the **"project"**), and sets out the intellectual property rights
you grant to us in the contributed materials. The term **"us"** shall mean
[ExplosionAI UG (haftungsbeschränkt)](https://explosion.ai/legal). The term
**"you"** shall mean the person or entity identified below.

If you agree to be bound by these terms, fill in the information requested
below and include the filled-in version with your first pull request, under the
folder [`.github/contributors/`](/.github/contributors/). The name of the file
should be your GitHub username, with the extension `.md`. For example, the user
example_user would create the file `.github/contributors/example_user.md`.

Read this agreement carefully before signing. These terms and conditions
constitute a binding legal agreement.

## Contributor Agreement

1. The term "contribution" or "contributed materials" means any source code,
object code, patch, tool, sample, graphic, specification, manual,
documentation, or any other material posted or submitted by you to the project.

2. With respect to any worldwide copyrights, or copyright applications and
registrations, in your contribution:

    * you hereby assign to us joint ownership, and to the extent that such
      assignment is or becomes invalid, ineffective or unenforceable, you hereby
      grant to us a perpetual, irrevocable, non-exclusive, worldwide, no-charge,
      royalty-free, unrestricted license to exercise all rights under those
      copyrights. This includes, at our option, the right to sublicense these same
      rights to third parties through multiple levels of sublicensees or other
      licensing arrangements;

    * you agree that each of us can do all things in relation to your
      contribution as if each of us were the sole owners, and if one of us makes
      a derivative work of your contribution, the one who makes the derivative
      work (or has it made) will be the sole owner of that derivative work;

    * you agree that you will not assert any moral rights in your contribution
      against us, our licensees or transferees;

    * you agree that we may register a copyright in your contribution and
      exercise all ownership rights associated with it; and

    * you agree that neither of us has any duty to consult with, obtain the
      consent of, pay or render an accounting to the other for any use or
      distribution of your contribution.

3. With respect to any patents you own, or that you can license without payment
to any third party, you hereby grant to us a perpetual, irrevocable,
non-exclusive, worldwide, no-charge, royalty-free license to:

    * make, have made, use, sell, offer to sell, import, and otherwise transfer
      your contribution in whole or in part, alone or in combination with or
      included in any product, work or materials arising out of the project to
      which your contribution was submitted, and

    * at our option, to sublicense these same rights to third parties through
      multiple levels of sublicensees or other licensing arrangements.

4. Except as set out above, you keep all right, title, and interest in your
contribution. The rights that you grant to us under these terms are effective
on the date you first submitted a contribution to us, even if your submission
took place before the date you sign these terms.

5. You covenant, represent, warrant and agree that:

    * each contribution that you submit is and shall be an original work of
      authorship and you can legally grant the rights set out in this SCA;

    * to the best of your knowledge, each contribution will not violate any
      third party's copyrights, trademarks, patents, or other intellectual
      property rights; and

    * each contribution shall be in compliance with U.S. export control laws and
      other applicable export and import laws. You agree to notify us if you
      become aware of any circumstance which would make any of the foregoing
      representations inaccurate in any respect. We may publicly disclose your
      participation in the project, including the fact that you have signed the SCA.

6. This SCA is governed by the laws of the State of California and applicable
U.S. Federal law. Any choice of law rules will not apply.

7. Please place an “x” in one of the applicable statements below. Please do NOT
mark both statements:

    * [x] I am signing on behalf of myself as an individual and no other person
      or entity, including my employer, has or will have rights with respect to my
      contributions.

    * [ ] I am signing on behalf of my employer or a legal entity and I have the
      actual authority to contractually bind that entity.

## Contributor Details

| Field                          | Entry                  |
|------------------------------- | -------------------- |
| Name                           | Björn Lennartsson      |
| Company name (if applicable)   | Uptrail AB             |
| Title or role (if applicable)  | CTO                    |
| Date                           | 2019-01-15             |
| GitHub username                | boena                  |
| Website (optional)             | www.uptrail.com        |

.github/contributors/foufaster.md (vendored, new file, 106 lines added)

@@ -0,0 +1,106 @@

# spaCy contributor agreement

This spaCy Contributor Agreement (**"SCA"**) is based on the
[Oracle Contributor Agreement](http://www.oracle.com/technetwork/oca-405177.pdf).
The SCA applies to any contribution that you make to any product or project
managed by us (the **"project"**), and sets out the intellectual property rights
you grant to us in the contributed materials. The term **"us"** shall mean
[ExplosionAI UG (haftungsbeschränkt)](https://explosion.ai/legal). The term
**"you"** shall mean the person or entity identified below.

If you agree to be bound by these terms, fill in the information requested
below and include the filled-in version with your first pull request, under the
folder [`.github/contributors/`](/.github/contributors/). The name of the file
should be your GitHub username, with the extension `.md`. For example, the user
example_user would create the file `.github/contributors/example_user.md`.

Read this agreement carefully before signing. These terms and conditions
constitute a binding legal agreement.

## Contributor Agreement

1. The term "contribution" or "contributed materials" means any source code,
object code, patch, tool, sample, graphic, specification, manual,
documentation, or any other material posted or submitted by you to the project.

2. With respect to any worldwide copyrights, or copyright applications and
registrations, in your contribution:

    * you hereby assign to us joint ownership, and to the extent that such
      assignment is or becomes invalid, ineffective or unenforceable, you hereby
      grant to us a perpetual, irrevocable, non-exclusive, worldwide, no-charge,
      royalty-free, unrestricted license to exercise all rights under those
      copyrights. This includes, at our option, the right to sublicense these same
      rights to third parties through multiple levels of sublicensees or other
      licensing arrangements;

    * you agree that each of us can do all things in relation to your
      contribution as if each of us were the sole owners, and if one of us makes
      a derivative work of your contribution, the one who makes the derivative
      work (or has it made) will be the sole owner of that derivative work;

    * you agree that you will not assert any moral rights in your contribution
      against us, our licensees or transferees;

    * you agree that we may register a copyright in your contribution and
      exercise all ownership rights associated with it; and

    * you agree that neither of us has any duty to consult with, obtain the
      consent of, pay or render an accounting to the other for any use or
      distribution of your contribution.

3. With respect to any patents you own, or that you can license without payment
to any third party, you hereby grant to us a perpetual, irrevocable,
non-exclusive, worldwide, no-charge, royalty-free license to:

    * make, have made, use, sell, offer to sell, import, and otherwise transfer
      your contribution in whole or in part, alone or in combination with or
      included in any product, work or materials arising out of the project to
      which your contribution was submitted, and

    * at our option, to sublicense these same rights to third parties through
      multiple levels of sublicensees or other licensing arrangements.

4. Except as set out above, you keep all right, title, and interest in your
contribution. The rights that you grant to us under these terms are effective
on the date you first submitted a contribution to us, even if your submission
took place before the date you sign these terms.

5. You covenant, represent, warrant and agree that:

    * each contribution that you submit is and shall be an original work of
      authorship and you can legally grant the rights set out in this SCA;

    * to the best of your knowledge, each contribution will not violate any
      third party's copyrights, trademarks, patents, or other intellectual
      property rights; and

    * each contribution shall be in compliance with U.S. export control laws and
      other applicable export and import laws. You agree to notify us if you
      become aware of any circumstance which would make any of the foregoing
      representations inaccurate in any respect. We may publicly disclose your
      participation in the project, including the fact that you have signed the SCA.

6. This SCA is governed by the laws of the State of California and applicable
U.S. Federal law. Any choice of law rules will not apply.

7. Please place an “x” in one of the applicable statements below. Please do NOT
mark both statements:

    * [x] I am signing on behalf of myself as an individual and no other person
      or entity, including my employer, has or will have rights with respect to my
      contributions.

    * [ ] I am signing on behalf of my employer or a legal entity and I have the
      actual authority to contractually bind that entity.

## Contributor Details

| Field                          | Entry                  |
|------------------------------- | -------------------- |
| Name                           | Anès Foufa             |
| Company name (if applicable)   |                        |
| Title or role (if applicable)  | NLP developer          |
| Date                           | 21/01/2019             |
| GitHub username                | foufaster              |
| Website (optional)             |                        |

.github/contributors/ozcankasal.md (vendored, new file, 106 lines added)

@@ -0,0 +1,106 @@

# spaCy contributor agreement

This spaCy Contributor Agreement (**"SCA"**) is based on the
[Oracle Contributor Agreement](http://www.oracle.com/technetwork/oca-405177.pdf).
The SCA applies to any contribution that you make to any product or project
managed by us (the **"project"**), and sets out the intellectual property rights
you grant to us in the contributed materials. The term **"us"** shall mean
[ExplosionAI UG (haftungsbeschränkt)](https://explosion.ai/legal). The term
**"you"** shall mean the person or entity identified below.

If you agree to be bound by these terms, fill in the information requested
below and include the filled-in version with your first pull request, under the
folder [`.github/contributors/`](/.github/contributors/). The name of the file
should be your GitHub username, with the extension `.md`. For example, the user
example_user would create the file `.github/contributors/example_user.md`.

Read this agreement carefully before signing. These terms and conditions
constitute a binding legal agreement.

## Contributor Agreement

1. The term "contribution" or "contributed materials" means any source code,
object code, patch, tool, sample, graphic, specification, manual,
documentation, or any other material posted or submitted by you to the project.

2. With respect to any worldwide copyrights, or copyright applications and
registrations, in your contribution:

    * you hereby assign to us joint ownership, and to the extent that such
      assignment is or becomes invalid, ineffective or unenforceable, you hereby
      grant to us a perpetual, irrevocable, non-exclusive, worldwide, no-charge,
      royalty-free, unrestricted license to exercise all rights under those
      copyrights. This includes, at our option, the right to sublicense these same
      rights to third parties through multiple levels of sublicensees or other
      licensing arrangements;

    * you agree that each of us can do all things in relation to your
      contribution as if each of us were the sole owners, and if one of us makes
      a derivative work of your contribution, the one who makes the derivative
      work (or has it made) will be the sole owner of that derivative work;

    * you agree that you will not assert any moral rights in your contribution
      against us, our licensees or transferees;

    * you agree that we may register a copyright in your contribution and
      exercise all ownership rights associated with it; and

    * you agree that neither of us has any duty to consult with, obtain the
      consent of, pay or render an accounting to the other for any use or
      distribution of your contribution.

3. With respect to any patents you own, or that you can license without payment
to any third party, you hereby grant to us a perpetual, irrevocable,
non-exclusive, worldwide, no-charge, royalty-free license to:

    * make, have made, use, sell, offer to sell, import, and otherwise transfer
      your contribution in whole or in part, alone or in combination with or
      included in any product, work or materials arising out of the project to
      which your contribution was submitted, and

    * at our option, to sublicense these same rights to third parties through
      multiple levels of sublicensees or other licensing arrangements.

4. Except as set out above, you keep all right, title, and interest in your
contribution. The rights that you grant to us under these terms are effective
on the date you first submitted a contribution to us, even if your submission
took place before the date you sign these terms.

5. You covenant, represent, warrant and agree that:

    * each contribution that you submit is and shall be an original work of
      authorship and you can legally grant the rights set out in this SCA;

    * to the best of your knowledge, each contribution will not violate any
      third party's copyrights, trademarks, patents, or other intellectual
      property rights; and

    * each contribution shall be in compliance with U.S. export control laws and
      other applicable export and import laws. You agree to notify us if you
      become aware of any circumstance which would make any of the foregoing
      representations inaccurate in any respect. We may publicly disclose your
      participation in the project, including the fact that you have signed the SCA.

6. This SCA is governed by the laws of the State of California and applicable
U.S. Federal law. Any choice of law rules will not apply.

7. Please place an “x” in one of the applicable statements below. Please do NOT
mark both statements:

    * [x] I am signing on behalf of myself as an individual and no other person
      or entity, including my employer, has or will have rights with respect to my
      contributions.

    * [ ] I am signing on behalf of my employer or a legal entity and I have the
      actual authority to contractually bind that entity.

## Contributor Details

| Field                          | Entry                  |
|------------------------------- | -------------------- |
| Name                           | Ozcan Kasal            |
| Company name (if applicable)   |                        |
| Title or role (if applicable)  |                        |
| Date                           | December 21, 2018      |
| GitHub username                | ozcankasal             |
| Website (optional)             |                        |

.github/contributors/retnuh.md (vendored, new file, 106 lines added)

@@ -0,0 +1,106 @@

# spaCy contributor agreement

This spaCy Contributor Agreement (**"SCA"**) is based on the
[Oracle Contributor Agreement](http://www.oracle.com/technetwork/oca-405177.pdf).
The SCA applies to any contribution that you make to any product or project
managed by us (the **"project"**), and sets out the intellectual property rights
you grant to us in the contributed materials. The term **"us"** shall mean
[ExplosionAI UG (haftungsbeschränkt)](https://explosion.ai/legal). The term
**"you"** shall mean the person or entity identified below.

If you agree to be bound by these terms, fill in the information requested
below and include the filled-in version with your first pull request, under the
folder [`.github/contributors/`](/.github/contributors/). The name of the file
should be your GitHub username, with the extension `.md`. For example, the user
example_user would create the file `.github/contributors/example_user.md`.

Read this agreement carefully before signing. These terms and conditions
constitute a binding legal agreement.

## Contributor Agreement

1. The term "contribution" or "contributed materials" means any source code,
object code, patch, tool, sample, graphic, specification, manual,
documentation, or any other material posted or submitted by you to the project.

2. With respect to any worldwide copyrights, or copyright applications and
registrations, in your contribution:

    * you hereby assign to us joint ownership, and to the extent that such
      assignment is or becomes invalid, ineffective or unenforceable, you hereby
      grant to us a perpetual, irrevocable, non-exclusive, worldwide, no-charge,
      royalty-free, unrestricted license to exercise all rights under those
      copyrights. This includes, at our option, the right to sublicense these same
      rights to third parties through multiple levels of sublicensees or other
      licensing arrangements;

    * you agree that each of us can do all things in relation to your
      contribution as if each of us were the sole owners, and if one of us makes
      a derivative work of your contribution, the one who makes the derivative
      work (or has it made) will be the sole owner of that derivative work;

    * you agree that you will not assert any moral rights in your contribution
      against us, our licensees or transferees;

    * you agree that we may register a copyright in your contribution and
      exercise all ownership rights associated with it; and

    * you agree that neither of us has any duty to consult with, obtain the
      consent of, pay or render an accounting to the other for any use or
      distribution of your contribution.

3. With respect to any patents you own, or that you can license without payment
to any third party, you hereby grant to us a perpetual, irrevocable,
non-exclusive, worldwide, no-charge, royalty-free license to:

    * make, have made, use, sell, offer to sell, import, and otherwise transfer
      your contribution in whole or in part, alone or in combination with or
      included in any product, work or materials arising out of the project to
      which your contribution was submitted, and

    * at our option, to sublicense these same rights to third parties through
      multiple levels of sublicensees or other licensing arrangements.

4. Except as set out above, you keep all right, title, and interest in your
contribution. The rights that you grant to us under these terms are effective
on the date you first submitted a contribution to us, even if your submission
took place before the date you sign these terms.

5. You covenant, represent, warrant and agree that:

    - each contribution that you submit is and shall be an original work of
      authorship and you can legally grant the rights set out in this SCA;

    - to the best of your knowledge, each contribution will not violate any
      third party's copyrights, trademarks, patents, or other intellectual
      property rights; and

    - each contribution shall be in compliance with U.S. export control laws and
      other applicable export and import laws. You agree to notify us if you
      become aware of any circumstance which would make any of the foregoing
      representations inaccurate in any respect. We may publicly disclose your
      participation in the project, including the fact that you have signed the SCA.

6. This SCA is governed by the laws of the State of California and applicable
U.S. Federal law. Any choice of law rules will not apply.

7. Please place an “x” in one of the applicable statements below. Please do NOT
mark both statements:

    * [x] I am signing on behalf of myself as an individual and no other person
      or entity, including my employer, has or will have rights with respect to my
      contributions.

    * [ ] I am signing on behalf of my employer or a legal entity and I have the
      actual authority to contractually bind that entity.

## Contributor Details

| Field                         | Entry        |
| ----------------------------- | ------------ |
| Name                          | Hunter Kelly |
| Company name (if applicable)  |              |
| Title or role (if applicable) |              |
| Date                          | 2019-01-10   |
| GitHub username               | retnuh       |
| Website (optional)            |              |

106  .github/contributors/willprice.md  vendored  Normal file

@@ -0,0 +1,106 @@
# spaCy contributor agreement

[Full SCA text, identical to the agreement above.]
## Contributor Details

| Field                         | Entry                 |
| ----------------------------- | --------------------- |
| Name                          | Will Price            |
| Company name (if applicable)  | N/A                   |
| Title or role (if applicable) | N/A                   |
| Date                          | 26/12/2018            |
| GitHub username               | willprice             |
| Website (optional)            | https://willprice.org |
@@ -1,4 +1,5 @@
 recursive-include include *.h
 include LICENSE
 include README.md
+include pyproject.toml
 include bin/spacy
106  contributer_agreement.md  Normal file

@@ -0,0 +1,106 @@
# spaCy contributor agreement

[Full SCA text, identical to the agreement above.]
## Contributor Details

| Field                         | Entry            |
| ----------------------------- | ---------------- |
| Name                          | Laura Baakman    |
| Company name (if applicable)  |                  |
| Title or role (if applicable) |                  |
| Date                          | February 7, 2019 |
| GitHub username               | lauraBaakman     |
| Website (optional)            |                  |
@@ -58,7 +58,7 @@ import spacy
     lang=("Language class to initialise", "option", "l", str),
 )
 def main(patterns_loc, text_loc, n=10000, lang="en"):
-    nlp = spacy.blank("en")
+    nlp = spacy.blank(lang)
     nlp.vocab.lex_attr_getters = {}
     phrases = read_gazetteer(nlp.tokenizer, patterns_loc)
     count = 0
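The one-line change above makes the gazetteer example respect its `lang` command-line option instead of always building an English pipeline. A minimal sketch of the corrected pattern (the helper name below is illustrative, not part of the example script):

```python
import spacy

def make_blank_pipeline(lang="en"):
    # Use the language the caller asked for; hard-coding "en" would silently
    # ignore the -l/--lang option.
    nlp = spacy.blank(lang)
    nlp.vocab.lex_attr_getters = {}
    return nlp

nlp_de = make_blank_pipeline("de")  # a blank German pipeline, not English
```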
@@ -26,6 +26,11 @@ from spacy.util import minibatch, compounding
     n_iter=("Number of training iterations", "option", "n", int),
 )
 def main(model=None, output_dir=None, n_iter=20, n_texts=2000):
+    if output_dir is not None:
+        output_dir = Path(output_dir)
+        if not output_dir.exists():
+            output_dir.mkdir()
+
     if model is not None:
         nlp = spacy.load(model)  # load existing spaCy model
         print("Loaded model '%s'" % model)
@@ -87,9 +92,6 @@ def main(model=None, output_dir=None, n_iter=20, n_texts=2000):
     print(test_text, doc.cats)

     if output_dir is not None:
-        output_dir = Path(output_dir)
-        if not output_dir.exists():
-            output_dir.mkdir()
         with nlp.use_params(optimizer.averages):
             nlp.to_disk(output_dir)
         print("Saved model to", output_dir)
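This hunk moves the output-directory handling from the end of `main()` to the top, so an invalid `--output-dir` fails before the training loop starts rather than after `n_iter` epochs have finished. A minimal sketch of the pattern under that assumption (the training and save steps are placeholders for the example's actual loop and `nlp.to_disk()` call):

```python
from pathlib import Path

def run(output_dir=None, n_iter=20):
    # Resolve and create the output directory up front: a typo in the path
    # should fail in seconds, not after a long training run.
    if output_dir is not None:
        output_dir = Path(output_dir)
        if not output_dir.exists():
            output_dir.mkdir()

    for _ in range(n_iter):
        pass  # placeholder for one training epoch

    if output_dir is not None:
        # placeholder for nlp.to_disk(output_dir)
        (output_dir / "model.bin").write_bytes(b"")

run("textcat_output")
```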
@@ -1,6 +1,6 @@
 [
     {
-        "id": "wsj_0200",
+        "id": 42,
         "paragraphs": [
             {
                 "raw": "In an Oct. 19 review of \"The Misanthrope\" at Chicago's Goodman Theatre (\"Revitalized Classics Take the Stage in Windy City,\" Leisure & Arts), the role of Celimene, played by Kim Cattrall, was mistakenly attributed to Christina Haag. Ms. Haag plays Elianti.",
10  pyproject.toml  Normal file

@@ -0,0 +1,10 @@
+[build-system]
+requires = ["setuptools",
+            "wheel>0.32.0,<0.33.0",
+            "Cython",
+            "cymem>=2.0.2,<2.1.0",
+            "preshed>=2.0.1,<2.1.0",
+            "murmurhash>=0.28.0,<1.1.0",
+            "thinc>=6.12.1,<6.13.0",
+]
+build-backend = "setuptools.build_meta"
@@ -14,7 +14,7 @@ plac<1.0.0,>=0.9.6
 pathlib==1.0.1; python_version < "3.4"
 # Development dependencies
 cython>=0.25
-pytest>=4.0.0,<5.0.0
+pytest>=4.0.0,<4.1.0
 pytest-timeout>=1.3.0,<2.0.0
 mock>=2.0.0,<3.0.0
 flake8>=3.5.0,<3.6.0
1  setup.py

@@ -246,6 +246,7 @@ def setup_package():
             "cuda92": ["cupy-cuda92>=4.0"],
             "cuda100": ["cupy-cuda100>=4.0"],
         },
+        python_requires=">=2.7,!=3.0.*,!=3.1.*,!=3.2.*,!=3.3.*",
         classifiers=[
             "Development Status :: 5 - Production/Stable",
             "Environment :: Console",
@@ -31,9 +31,13 @@ def read_iob(raw_sents):
         tokens = [re.split("[^\w\-]", line.strip())]
         if len(tokens[0]) == 3:
             words, pos, iob = zip(*tokens)
-        else:
+        elif len(tokens[0]) == 2:
             words, iob = zip(*tokens)
             pos = ["-"] * len(words)
+        else:
+            raise ValueError(
+                "The iob/iob2 file is not formatted correctly. Try checking whitespace and delimiters."
+            )
         biluo = iob_to_biluo(iob)
         sentences.append(
             [
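Before this change, any line without three columns fell into the two-column branch and failed later with an opaque unpacking error; the explicit `elif`/`else` now reports malformed IOB input directly. A standalone sketch of the same column check (the splitting regex mirrors the converter, but `parse_iob_line` itself is illustrative):

```python
import re

def parse_iob_line(line):
    # Split on anything that is not a word character or hyphen, then branch
    # on how many columns the line actually contains.
    cols = re.split(r"[^\w\-]", line.strip())
    if len(cols) == 3:
        word, pos, iob = cols
    elif len(cols) == 2:
        word, iob = cols
        pos = "-"  # plain IOB files have no POS column
    else:
        raise ValueError(
            "The iob/iob2 file is not formatted correctly. "
            "Try checking whitespace and delimiters."
        )
    return word, pos, iob

print(parse_iob_line("Paris|NNP|B-LOC"))  # ('Paris', 'NNP', 'B-LOC')
print(parse_iob_line("Paris|B-LOC"))      # ('Paris', '-', 'B-LOC')
```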
@@ -208,7 +208,11 @@ def read_freqs(freqs_loc, max_length=100, min_doc_freq=5, min_freq=50):
            doc_freq = int(doc_freq)
            freq = int(freq)
            if doc_freq >= min_doc_freq and freq >= min_freq and len(key) < max_length:
-               word = literal_eval(key)
+               try:
+                   word = literal_eval(key)
+               except SyntaxError:
+                   # Take odd strings literally.
+                   word = literal_eval("'%s'" % key)
                smooth_count = counts.smoother(int(freq))
                probs[word] = math.log(smooth_count) - log_total
    oov_prob = math.log(counts.smoother(0)) - log_total
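The frequency file stores each key as a Python string literal, and `literal_eval` raised `SyntaxError` on the occasional raw token that is not a valid literal, aborting the whole vocab build. The `try`/`except` wraps such keys in quotes and evaluates them again. A minimal sketch of that fallback (the helper name is illustrative):

```python
from ast import literal_eval

def parse_key(key):
    # Keys normally look like repr()-style literals such as "'the'"; odd raw
    # tokens that fail to parse are quoted and taken literally instead.
    try:
        return literal_eval(key)
    except SyntaxError:
        # Take odd strings literally.
        return literal_eval("'%s'" % key)

print(parse_key("'the'"))  # -> the
print(parse_key("--"))     # -> -- (would have raised SyntaxError before)
```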
@@ -9,7 +9,6 @@ from ..util import is_in_jupyter


 _html = {}
-IS_JUPYTER = is_in_jupyter()
 RENDER_WRAPPER = None


@@ -18,7 +17,7 @@ def render(
     style="dep",
     page=False,
     minify=False,
-    jupyter=IS_JUPYTER,
+    jupyter=False,
     options={},
     manual=False,
 ):
@@ -51,7 +50,7 @@ def render(
     html = _html["parsed"]
     if RENDER_WRAPPER is not None:
         html = RENDER_WRAPPER(html)
-    if jupyter:  # return HTML rendered by IPython display()
+    if jupyter or is_in_jupyter():  # return HTML rendered by IPython display()
         from IPython.core.display import display, HTML

         return display(HTML(html))
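The module-level `IS_JUPYTER = is_in_jupyter()` constant froze the environment check at import time, so importing spaCy outside a notebook and rendering inside one (or vice versa) gave the wrong behaviour; the check now runs inside `render()` on every call, with `jupyter=True` kept as an explicit override. A sketch of the idea, assuming a hypothetical `in_notebook()` detector (spaCy's real helper is `spacy.util.is_in_jupyter`):

```python
def in_notebook():
    # Hypothetical detector: report whether we are running under an IPython
    # kernel right now, at call time.
    try:
        from IPython import get_ipython
    except ImportError:
        return False
    shell = get_ipython()
    return shell is not None and "IPKernelApp" in getattr(shell, "config", {})

def render(html, jupyter=False):
    # Decide how to return the markup when render() is called, not when the
    # module was imported; jupyter=True still forces notebook output.
    if jupyter or in_notebook():
        from IPython.display import HTML, display
        return display(HTML(html))
    return html
```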
@@ -1,7 +1,7 @@
 # coding: utf8
 from __future__ import unicode_literals

-import random
+import uuid

 from .templates import TPL_DEP_SVG, TPL_DEP_WORDS, TPL_DEP_ARCS
 from .templates import TPL_ENT, TPL_ENTS, TPL_FIGURE, TPL_TITLE, TPL_PAGE
@@ -41,7 +41,7 @@ class DependencyRenderer(object):
         """
         # Create a random ID prefix to make sure parses don't receive the
         # same ID, even if they're identical
-        id_prefix = random.randint(0, 999)
+        id_prefix = uuid.uuid4().hex
         rendered = [
             self.render_svg("{}-{}".format(id_prefix, i), p["words"], p["arcs"])
             for i, p in enumerate(parsed)
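`random.randint(0, 999)` allows only a thousand possible ID prefixes, so two visualisations embedded in the same page could occasionally collide and their SVG markup would reference each other's elements; `uuid.uuid4().hex` makes such collisions practically impossible. A small sketch of the difference:

```python
import uuid

def make_svg_ids(n_parses):
    # One fresh 32-hex-character prefix per render; each parse then gets a
    # unique "prefix-index" id for its SVG element and arc markers.
    prefix = uuid.uuid4().hex
    return ["{}-{}".format(prefix, i) for i in range(n_parses)]

print(make_svg_ids(2))  # e.g. ['3fa8...-0', '3fa8...-1']
```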
@@ -4,20 +4,24 @@ from __future__ import unicode_literals
 from .lookup import LOOKUP
 from ._adjectives import ADJECTIVES
 from ._adjectives_irreg import ADJECTIVES_IRREG
+from ._adp_irreg import ADP_IRREG
 from ._adverbs import ADVERBS
+from ._auxiliary_verbs_irreg import AUXILIARY_VERBS_IRREG
+from ._cconj_irreg import CCONJ_IRREG
+from ._dets_irreg import DETS_IRREG
+from ._lemma_rules import ADJECTIVE_RULES, NOUN_RULES, VERB_RULES
 from ._nouns import NOUNS
 from ._nouns_irreg import NOUNS_IRREG
+from ._pronouns_irreg import PRONOUNS_IRREG
+from ._sconj_irreg import SCONJ_IRREG
 from ._verbs import VERBS
 from ._verbs_irreg import VERBS_IRREG
-from ._dets_irreg import DETS_IRREG
-from ._pronouns_irreg import PRONOUNS_IRREG
-from ._auxiliary_verbs_irreg import AUXILIARY_VERBS_IRREG
-from ._lemma_rules import ADJECTIVE_RULES, NOUN_RULES, VERB_RULES


 LEMMA_INDEX = {'adj': ADJECTIVES, 'adv': ADVERBS, 'noun': NOUNS, 'verb': VERBS}

-LEMMA_EXC = {'adj': ADJECTIVES_IRREG, 'noun': NOUNS_IRREG, 'verb': VERBS_IRREG,
-             'det': DETS_IRREG, 'pron': PRONOUNS_IRREG, 'aux': AUXILIARY_VERBS_IRREG}
+LEMMA_EXC = {'adj': ADJECTIVES_IRREG, 'adp': ADP_IRREG, 'aux': AUXILIARY_VERBS_IRREG,
+             'cconj': CCONJ_IRREG, 'det': DETS_IRREG, 'noun': NOUNS_IRREG, 'verb': VERBS_IRREG,
+             'pron': PRONOUNS_IRREG, 'sconj': SCONJ_IRREG}

 LEMMA_RULES = {'adj': ADJECTIVE_RULES, 'noun': NOUN_RULES, 'verb': VERB_RULES}
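`LEMMA_EXC` maps a coarse part-of-speech tag to a table of irregular forms, and this change registers the new adposition, coordinating-conjunction and subordinating-conjunction tables alongside the existing ones. A minimal sketch of how such an exception table is typically consulted before any suffix rules apply (the sample entries come from the new `_adp_irreg.py`; the `lemmatize` helper itself is illustrative):

```python
ADP_IRREG = {
    "aux": ("à",),
    "des": ("de",),
    "du": ("de",),
}

LEMMA_EXC = {"adp": ADP_IRREG}

def lemmatize(word, pos):
    # Irregular forms win over rule-based suffix stripping: look the word up
    # in the exception table for its POS tag first.
    exceptions = LEMMA_EXC.get(pos, {})
    if word.lower() in exceptions:
        return exceptions[word.lower()][0]
    return word  # placeholder for the rule/lookup fallback

print(lemmatize("du", "adp"))    # -> de
print(lemmatize("chez", "adp"))  # -> chez (no exception, falls through)
```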
File diff suppressed because it is too large

24  spacy/lang/fr/lemmatizer/_adp_irreg.py  Normal file

@@ -0,0 +1,24 @@
+# coding: utf8
+from __future__ import unicode_literals
+
+
+ADP_IRREG = {
+    "a": ("à",),
+    "apr.": ("après",),
+    "aux": ("à",),
+    "av.": ("avant",),
+    "avt": ("avant",),
+    "cf.": ("cf",),
+    "conf.": ("cf",),
+    "confer": ("cf",),
+    "d'": ("de",),
+    "des": ("de",),
+    "du": ("de",),
+    "jusqu'": ("jusque",),
+    "pdt": ("pendant",),
+    "+": ("plus",),
+    "pr": ("pour",),
+    "/": ("sur",),
+    "versus": ("vs",),
+    "vs.": ("vs",)
+}
File diff suppressed because it is too large

17  spacy/lang/fr/lemmatizer/_cconj_irreg.py  Normal file

@@ -0,0 +1,17 @@
+# coding: utf8
+from __future__ import unicode_literals
+
+
+CCONJ_IRREG = {
+    "&": ("et",),
+    "c-à-d": ("c'est-à-dire",),
+    "c.-à.-d.": ("c'est-à-dire",),
+    "càd": ("c'est-à-dire",),
+    "&": ("et",),
+    "et|ou": ("et-ou",),
+    "et/ou": ("et-ou",),
+    "i.e.": ("c'est-à-dire",),
+    "ie": ("c'est-à-dire",),
+    "ou/et": ("et-ou",),
+    "+": ("plus",)
+}
@@ -4,20 +4,27 @@ from __future__ import unicode_literals

 DETS_IRREG = {
     "aucune": ("aucun",),
+    "cents": ("cent",),
+    "certaine": ("certain",),
+    "certaines": ("certain",),
+    "certains": ("certain",),
     "ces": ("ce",),
     "cet": ("ce",),
     "cette": ("ce",),
-    "cents": ("cent",),
-    "certaines": ("certains",),
+    "des": ("un",),
     "différentes": ("différents",),
+    "diverse": ("divers",),
     "diverses": ("divers",),
+    "du": ("de",),
     "la": ("le",),
-    "les": ("le",),
-    "l'": ("le",),
     "laquelle": ("lequel",),
+    "les": ("le",),
+    "lesdites": ("ledit",),
+    "lesdits": ("ledit",),
+    "leurs": ("leur",),
     "lesquelles": ("lequel",),
     "lesquels": ("lequel",),
-    "leurs": ("leur",),
+    "l'": ("le",),
     "mainte": ("maint",),
     "maintes": ("maint",),
     "maints": ("maint",),
@@ -27,23 +34,29 @@ DETS_IRREG = {
     "nulle": ("nul",),
     "nulles": ("nul",),
     "nuls": ("nul",),
+    "pareille": ("pareil",),
+    "pareilles": ("pareil",),
+    "pareils": ("pareil",),
     "quelle": ("quel",),
     "quelles": ("quel",),
-    "quels": ("quel",),
-    "quelqu'": ("quelque",),
+    "qq": ("quelque",),
+    "qqes": ("quelque",),
+    "qqs": ("quelque",),
     "quelques": ("quelque",),
+    "quelqu'": ("quelque",),
+    "quels": ("quel",),
     "sa": ("son",),
     "ses": ("son",),
-    "telle": ("tel",),
-    "telles": ("tel",),
-    "tels": ("tel",),
     "ta": ("ton",),
+    "telles": ("tel",),
+    "telle": ("tel",),
+    "tels": ("tel",),
     "tes": ("ton",),
     "tous": ("tout",),
-    "toute": ("tout",),
     "toutes": ("tout",),
-    "des": ("un",),
+    "toute": ("tout",),
     "une": ("un",),
     "vingts": ("vingt",),
+    "vot'": ("votre",),
     "vos": ("votre",),
 }
@@ -63,36 +63,8 @@ NOUN_RULES = [
     ["w", "w"],
     ["y", "y"],
     ["z", "z"],
-    ["as", "a"],
-    ["aux", "au"],
-    ["cs", "c"],
-    ["chs", "ch"],
-    ["ds", "d"],
-    ["és", "é"],
-    ["es", "e"],
-    ["eux", "eu"],
-    ["fs", "f"],
-    ["gs", "g"],
-    ["hs", "h"],
-    ["is", "i"],
-    ["ïs", "ï"],
-    ["js", "j"],
-    ["ks", "k"],
-    ["ls", "l"],
-    ["ms", "m"],
-    ["ns", "n"],
-    ["oux", "ou"],
-    ["os", "o"],
-    ["ps", "p"],
-    ["qs", "q"],
-    ["rs", "r"],
-    ["ses", "se"],
-    ["se", "se"],
-    ["ts", "t"],
-    ["us", "u"],
-    ["vs", "v"],
-    ["ws", "w"],
-    ["ys", "y"],
+    ["s", ""],
+    ["x", ""],
     ["nt(e", "nt"],
     ["nt(e)", "nt"],
     ["al(e", "ale"],
File diff suppressed because it is too large

@@ -4,37 +4,89 @@ from __future__ import unicode_literals

|
||||||
PRONOUNS_IRREG = {
|
PRONOUNS_IRREG = {
|
||||||
"aucune": ("aucun",),
|
"aucune": ("aucun",),
|
||||||
"celle-ci": ("celui-ci",),
|
"autres": ("autre",),
|
||||||
"celles-ci": ("celui-ci",),
|
"ça": ("cela",),
|
||||||
"ceux-ci": ("celui-ci",),
|
"c'": ("ce",),
|
||||||
"celle-là": ("celui-là",),
|
|
||||||
"celles-là": ("celui-là",),
|
|
||||||
"ceux-là": ("celui-là",),
|
|
||||||
"celle": ("celui",),
|
"celle": ("celui",),
|
||||||
|
"celle-ci": ("celui-ci",),
|
||||||
|
"celle-là": ("celui-là",),
|
||||||
"celles": ("celui",),
|
"celles": ("celui",),
|
||||||
"ceux": ("celui",),
|
"celles-ci": ("celui-ci",),
|
||||||
|
"celles-là": ("celui-là",),
|
||||||
"certaines": ("certains",),
|
"certaines": ("certains",),
|
||||||
|
"ceux": ("celui",),
|
||||||
|
"ceux-ci": ("celui-ci",),
|
||||||
|
"ceux-là": ("celui-là",),
|
||||||
"chacune": ("chacun",),
|
"chacune": ("chacun",),
|
||||||
|
"-elle": ("lui",),
|
||||||
|
"elle": ("lui",),
|
||||||
|
"elle-même": ("lui-même",),
|
||||||
|
"-elles": ("lui",),
|
||||||
|
"elles": ("lui",),
|
||||||
|
"elles-mêmes": ("lui-même",),
|
||||||
|
"eux": ("lui",),
|
||||||
|
"eux-mêmes": ("lui-même",),
|
||||||
"icelle": ("icelui",),
|
"icelle": ("icelui",),
|
||||||
"icelles": ("icelui",),
|
"icelles": ("icelui",),
|
||||||
"iceux": ("icelui",),
|
"iceux": ("icelui",),
|
||||||
|
"-il": ("il",),
|
||||||
|
"-ils": ("il",),
|
||||||
|
"ils": ("il",),
|
||||||
|
"-je": ("je",),
|
||||||
|
"j'": ("je",),
|
||||||
"la": ("le",),
|
"la": ("le",),
|
||||||
"les": ("le",),
|
|
||||||
"laquelle": ("lequel",),
|
"laquelle": ("lequel",),
|
||||||
|
"l'autre": ("l'autre",),
|
||||||
|
"les": ("le",),
|
||||||
"lesquelles": ("lequel",),
|
"lesquelles": ("lequel",),
|
||||||
"lesquels": ("lequel",),
|
"lesquels": ("lequel",),
|
||||||
"elle-même": ("lui-même",),
|
"-leur": ("leur",),
|
||||||
"elles-mêmes": ("lui-même",),
|
"l'on": ("on",),
|
||||||
"eux-mêmes": ("lui-même",),
|
"-lui": ("lui",),
|
||||||
|
"l'une": ("l'un",),
|
||||||
|
"mêmes": ("même",),
|
||||||
|
"-m'": ("me",),
|
||||||
|
"m'": ("me",),
|
||||||
|
"-moi": ("moi",),
|
||||||
|
"nous-mêmes": ("nous-même",),
|
||||||
|
"-nous": ("nous",),
|
||||||
|
"-on": ("on",),
|
||||||
|
"qqchose": ("quelque chose",),
|
||||||
|
"qqch": ("quelque chose",),
|
||||||
|
"qqc": ("quelque chose",),
|
||||||
|
"qqn": ("quelqu'un",),
|
||||||
"quelle": ("quel",),
|
"quelle": ("quel",),
|
||||||
"quelles": ("quel",),
|
"quelles": ("quel",),
|
||||||
"quels": ("quel",),
|
"quelques-unes": ("quelques-uns",),
|
||||||
"quelques-unes": ("quelqu'un",),
|
|
||||||
"quelques-uns": ("quelqu'un",),
|
|
||||||
"quelque-une": ("quelqu'un",),
|
"quelque-une": ("quelqu'un",),
|
||||||
|
"quelqu'une": ("quelqu'un",),
|
||||||
|
"quels": ("quel",),
|
||||||
"qu": ("que",),
|
"qu": ("que",),
|
||||||
"telle": ("tel",),
|
"s'": ("se",),
|
||||||
|
"-t-elle": ("elle",),
|
||||||
|
"-t-elles": ("elle",),
|
||||||
"telles": ("tel",),
|
"telles": ("tel",),
|
||||||
|
"telle": ("tel",),
|
||||||
"tels": ("tel",),
|
"tels": ("tel",),
|
||||||
"toutes": ("tous",),
|
"-t-en": ("en",),
|
||||||
|
"-t-il": ("il",),
|
||||||
|
"-t-ils": ("il",),
|
||||||
|
"-toi": ("toi",),
|
||||||
|
"-t-on": ("on",),
|
||||||
|
"tous": ("tout",),
|
||||||
|
"toutes": ("tout",),
|
||||||
|
"toute": ("tout",),
|
||||||
|
"-t'": ("te",),
|
||||||
|
"t'": ("te",),
|
||||||
|
"-tu": ("tu",),
|
||||||
|
"-t-y": ("y",),
|
||||||
|
"unes": ("un",),
|
||||||
|
"une": ("un",),
|
||||||
|
"uns": ("un",),
|
||||||
|
"vous-mêmes": ("vous-même",),
|
||||||
|
"vous-même": ("vous-même",),
|
||||||
|
"-vous": ("vous",),
|
||||||
|
"-vs": ("vous",),
|
||||||
|
"vs": ("vous",),
|
||||||
|
"-y": ("y",),
|
||||||
}
|
}
|
||||||
|
|
19  spacy/lang/fr/lemmatizer/_sconj_irreg.py  Normal file

@@ -0,0 +1,19 @@
+# coding: utf8
+from __future__ import unicode_literals
+
+
+SCONJ_IRREG = {
+    "lorsqu'": ("lorsque",),
+    "pac'que": ("parce que",),
+    "pac'qu'": ("parce que",),
+    "parc'que": ("parce que",),
+    "parc'qu'": ("parce que",),
+    "paske": ("parce que",),
+    "pask'": ("parce que",),
+    "pcq": ("parce que",),
+    "+": ("plus",),
+    "puisqu'": ("puisque",),
+    "qd": ("quand",),
+    "quoiqu'": ("quoique",),
+    "qu'": ("que",)
+}
@ -6,63 +6,64 @@ VERBS = set(
|
||||||
"""
|
"""
|
||||||
abaisser abandonner abdiquer abecquer abéliser aberrer abhorrer abîmer abjurer
|
abaisser abandonner abdiquer abecquer abéliser aberrer abhorrer abîmer abjurer
|
||||||
ablater abluer ablutionner abominer abonder abonner aborder aborner aboucher
|
ablater abluer ablutionner abominer abonder abonner aborder aborner aboucher
|
||||||
abouler abouter abraquer abraser abreuver abricoter abriter absenter absinther
|
abouler abouter aboutonner abracadabrer abraquer abraser abreuver abricoter
|
||||||
absolutiser absorber abuser académifier académiser acagnarder accabler
|
abriter absenter absinther absolutiser absorber abuser académifier académiser
|
||||||
accagner accaparer accastiller accentuer accepter accessoiriser accidenter
|
acagnarder accabler accagner accaparer accastiller accentuer accepter
|
||||||
acclamer acclimater accointer accolader accoler accommoder accompagner
|
accessoiriser accidenter acclamer acclimater accointer accolader accoler
|
||||||
accorder accorer accoster accoter accoucher accouder accouer accoupler
|
accommoder accompagner accorder accorer accoster accoter accoucher accouder
|
||||||
accoutrer accoutumer accouver accrassiner accréditer accrocher acculer
|
accouer accoupler accoutrer accoutumer accouver accrassiner accréditer
|
||||||
acculturer accumuler accuser acenser acétaliser acétyler achalander acharner
|
accrocher acculer acculturer accumuler accuser acenser acétaliser acétyler
|
||||||
acheminer achopper achromatiser aciduler aciériser acliquer acoquiner acquêter
|
achalander acharner acheminer achopper achromatiser aciduler aciériser
|
||||||
acquitter acter actiniser actionner activer actoriser actualiser acupuncturer
|
acliquer acoquiner acquêter acquitter acter actiniser actionner activer
|
||||||
acyler adapter additionner adenter adieuser adirer adjectiver adjectiviser
|
actoriser actualiser acupuncturer acyler adapter additionner adenter adieuser
|
||||||
adjurer adjuver administrer admirer admonester adoniser adonner adopter adorer
|
adirer adjectiver adjectiviser adjurer adjuver administrer admirer admonester
|
||||||
adorner adosser adouber adresser adsorber aduler adverbialiser aéroporter
|
adoniser adonner adopter adorer adorner adosser adouber adresser adsorber
|
||||||
aérosoliser aérosonder aérotransporter affabuler affacturer affairer affaisser
|
aduler adverbialiser aéroporter aérosoliser aérosonder aérotransporter
|
||||||
affaiter affaler affamer affecter affectionner affermer afficher affider
|
affabuler affacturer affairer affaisser affaiter affaler affamer affecter
|
||||||
affiler affiner affirmer affistoler affixer affleurer afflouer affluer affoler
|
affectionner affermer afficher affider affiler affiner affirmer affistoler
|
||||||
afforester affouiller affourcher affriander affricher affrioler affriquer
|
affixer affleurer afflouer affluer affoler afforester affouiller affourcher
|
||||||
affriter affronter affruiter affubler affurer affûter afghaniser afistoler
|
affriander affricher affrioler affriquer affriter affronter affruiter affubler
|
||||||
africaniser agatiser agenouiller agglutiner aggraver agioter agiter agoniser
|
affurer affûter afghaniser afistoler africaniser agatiser agenouiller
|
||||||
agourmander agrafer agrainer agrémenter agresser agriffer agripper
|
agglutiner aggraver agioter agiter agoniser agourmander agrafer agrainer
|
||||||
agroalimentariser agrouper aguetter aguicher ahaner aheurter aicher aider
|
agrémenter agresser agricher agriffer agripper agroalimentariser agrouper
|
||||||
aigretter aiguer aiguiller aiguillonner aiguiser ailer ailler ailloliser
|
aguetter aguicher aguiller ahaner aheurter aicher aider aigretter aiguer
|
||||||
aimanter aimer airer ajointer ajourer ajourner ajouter ajuster ajuter
|
aiguiller aiguillonner aiguiser ailer ailler ailloliser aimanter aimer airer
|
||||||
alambiquer alarmer albaniser albitiser alcaliniser alcaliser alcooliser
|
ajointer ajourer ajourner ajouter ajuster ajuter alambiquer alarmer albaniser
|
||||||
alcoolyser alcoyler aldoliser alerter aleviner algébriser algérianiser
|
albitiser alcaliniser alcaliser alcooliser alcoolyser alcoyler aldoliser
|
||||||
algorithmiser aligner alimenter alinéater alinéatiser aliter alkyler allaiter
|
alerter aleviner algébriser algérianiser algorithmiser aligner alimenter
|
||||||
allectomiser allégoriser allitiser allivrer allocutionner alloter allouer
|
alinéater alinéatiser aliter alkyler allaiter allectomiser allégoriser
|
||||||
alluder allumer allusionner alluvionner allyler aloter alpaguer alphabétiser
|
allitiser allivrer allocutionner alloter allouer alluder allumer allusionner
|
||||||
alterner aluminer aluminiser aluner alvéoler alvéoliser amabiliser amadouer
|
alluvionner allyler aloter alpaguer alphabétiser alterner aluminer aluminiser
|
||||||
amalgamer amariner amarrer amateloter ambitionner ambler ambrer ambuler
|
aluner alvéoler alvéoliser amabiliser amadouer amalgamer amariner amarrer
|
||||||
améliorer amender amenuiser américaniser ameulonner ameuter amhariser amiauler
|
amateloter ambitionner ambler ambrer ambuler améliorer amender amenuiser
|
||||||
amicoter amidonner amignarder amignoter amignotter aminer ammoniaquer
|
américaniser ameulonner ameuter amhariser amiauler amicoter amidonner
|
||||||
ammoniser ammoxyder amocher amouiller amouracher amourer amphotériser ampouler
|
amignarder amignoter amignotter aminer ammoniaquer ammoniser ammoxyder amocher
|
||||||
amputer amunitionner amurer amuser anagrammatiser anagrammer analyser
|
amouiller amouracher amourer amphotériser ampouler amputer amunitionner amurer
|
||||||
anamorphoser anaphylactiser anarchiser anastomoser anathématiser anatomiser
|
amuser anagrammatiser anagrammer analyser anamorphoser anaphylactiser
|
||||||
ancher anchoiter ancrer anecdoter anecdotiser angéliser anglaiser angler
|
anarchiser anastomoser anathématiser anatomiser ancher anchoiter ancrer
|
||||||
angliciser angoisser anguler animaliser animer aniser ankyloser annexer
|
anecdoter anecdotiser angéliser anglaiser angler angliciser angoisser anguler
|
||||||
annihiler annoter annualiser annuler anodiser ânonner anser antagoniser
|
animaliser animer aniser ankyloser annexer annihiler annoter annualiser
|
||||||
antéposer antérioriser anthropomorphiser anticiper anticoaguler antidater
|
annuler anodiser ânonner anser antagoniser antéposer antérioriser
|
||||||
antiparasiter antiquer antiseptiser anuiter aoûter apaiser apériter apetisser
|
anthropomorphiser anticiper anticoaguler antidater antiparasiter antiquer
|
||||||
apeurer apicaliser apiquer aplaner apologiser aponévrotomiser aponter aposter
|
antiseptiser anuiter aoûter apaiser apériter apetisser apeurer apicaliser
|
||||||
apostiller apostoliser apostropher apostumer apothéoser appareiller apparenter
|
apiquer aplaner apologiser aponévrotomiser aponter aposter apostiller
|
||||||
appeauter appertiser appliquer appointer appoltronner apponter apporter
|
apostoliser apostropher apostumer apothéoser appareiller apparenter appeauter
|
||||||
apposer appréhender apprêter apprivoiser approcher approuver approvisionner
|
appertiser appliquer appointer appoltronner apponter apporter apposer
|
||||||
approximer apurer aquareller arabiser araméiser aramer araser arbitrer arborer
|
appréhender apprêter apprivoiser approcher approuver approvisionner approximer
|
||||||
arboriser arcbouter arc-bouter archaïser architecturer archiver arçonner
|
apurer aquareller arabiser araméiser aramer araser arbitrer arborer arboriser
|
||||||
ardoiser aréniser arer argenter argentiniser argoter argotiser argumenter
|
arcbouter arc-bouter archaïser architecturer archiver arçonner ardoiser
|
||||||
arianiser arimer ariser aristocratiser aristotéliser arithmétiser armaturer
|
aréniser arer argenter argentiniser argoter argotiser argumenter arianiser
|
||||||
armer arnaquer aromatiser arpenter arquebuser arquer arracher arraisonner
|
arimer ariser aristocratiser aristotéliser arithmétiser armaturer armer
|
||||||
arrenter arrêter arrher arrimer arriser arriver arroser arsouiller
|
arnaquer aromatiser arpenter arquebuser arquer arracher arraisonner arrenter
|
||||||
artérialiser articler articuler artificialiser artistiquer aryaniser aryler
|
arrêter arrher arrimer arriser arriver arroser arsouiller artérialiser
|
||||||
ascensionner ascétiser aseptiser asexuer asianiser asiatiser aspecter
|
articler articuler artificialiser artistiquer aryaniser aryler ascensionner
|
||||||
asphalter aspirer assabler assaisonner assassiner assembler assener asséner
|
ascétiser aseptiser asexuer asianiser asiatiser aspecter asphalter aspirer
|
||||||
assermenter asserter assibiler assigner assimiler assister assoiffer assoler
|
assabler assaisonner assassiner assembler assener asséner assermenter asserter
|
||||||
assommer assoner assoter assumer assurer asticoter astiquer athéiser
|
assibiler assigner assimiler assister assoiffer assoler assommer assoner
|
||||||
atlantiser atomiser atourner atropiniser attabler attacher attaquer attarder
|
assoter assumer assurer asticoter astiquer athéiser atlantiser atomiser
|
||||||
attenter attentionner atténuer atterrer attester attifer attirer attiser
|
atourner atropiniser attabler attacher attaquer attarder attenter attentionner
|
||||||
attitrer attraper attremper attribuer attrister attrouper aubiner
|
atténuer atterrer attester attifer attirer attiser attitrer attoucher attraper
|
||||||
|
attremper attribuer attriquer attrister attrouper aubader aubiner
|
||||||
audiovisualiser auditer auditionner augmenter augurer aulofer auloffer aumôner
|
audiovisualiser auditer auditionner augmenter augurer aulofer auloffer aumôner
|
||||||
auner auréoler ausculter authentiquer autoaccuser autoadapter autoadministrer
|
auner auréoler ausculter authentiquer autoaccuser autoadapter autoadministrer
|
||||||
autoagglutiner autoalimenter autoallumer autoamputer autoanalyser autoancrer
|
autoagglutiner autoalimenter autoallumer autoamputer autoanalyser autoancrer
|
||||||
|
@ -73,10 +74,10 @@ VERBS = set(
|
||||||
autodéterminer autodévelopper autodévorer autodicter autodiscipliner
|
autodéterminer autodévelopper autodévorer autodicter autodiscipliner
|
||||||
autodupliquer autoéduquer autoenchâsser autoenseigner autoépurer autoéquiper
|
autodupliquer autoéduquer autoenchâsser autoenseigner autoépurer autoéquiper
|
||||||
autoévaporiser autoévoluer autoféconder autofertiliser autoflageller
|
autoévaporiser autoévoluer autoféconder autofertiliser autoflageller
|
||||||
autofonder autoformer autofretter autogouverner autogreffer autoguider auto-
|
autofonder autoformer autofretter autogouverner autogreffer autoguider
|
||||||
immuniser auto-ioniser autolégitimer autolimiter autoliquider autolyser
|
auto-immuniser auto-ioniser autolégitimer autolimiter autoliquider autolyser
|
||||||
automatiser automédiquer automitrailler automutiler autonomiser auto-
|
automatiser automédiquer automitrailler automutiler autonomiser
|
||||||
optimaliser auto-optimiser autoorganiser autoperpétuer autopersuader
|
auto-optimaliser auto-optimiser autoorganiser autoperpétuer autopersuader
|
||||||
autopiloter autopolliniser autoporter autopositionner autoproclamer
|
autopiloter autopolliniser autoporter autopositionner autoproclamer
|
||||||
autopropulser autoréaliser autorecruter autoréglementer autoréguler
|
autopropulser autoréaliser autorecruter autoréglementer autoréguler
|
||||||
autorelaxer autoréparer autoriser autosélectionner autosevrer autostabiliser
|
autorelaxer autoréparer autoriser autosélectionner autosevrer autostabiliser
|
||||||
|
@ -84,7 +85,7 @@ VERBS = set(
|
||||||
autotracter autotransformer autovacciner autoventiler avaler avaliser
|
autotracter autotransformer autovacciner autoventiler avaler avaliser
|
||||||
aventurer aveugler avillonner aviner avironner aviser avitailler aviver
|
aventurer aveugler avillonner aviner avironner aviser avitailler aviver
|
||||||
avoiner avoisiner avorter avouer axéniser axer axiomatiser azimuter azoter
|
avoiner avoisiner avorter avouer axéniser axer axiomatiser azimuter azoter
|
||||||
azurer babiller babouiner bâcher bachonner bachoter bâcler badauder
|
azurer babiller babouiner bâcher bachonner bachoter bâcler badauder bader
|
||||||
badigeonner badiner baffer bafouer bafouiller bâfrer bagarrer bagoter bagouler
|
badigeonner badiner baffer bafouer bafouiller bâfrer bagarrer bagoter bagouler
|
||||||
baguenauder baguer baguetter bahuter baigner bailler bâiller baîller
|
baguenauder baguer baguetter bahuter baigner bailler bâiller baîller
|
||||||
bâillonner baîllonner baiser baisoter baisouiller baisser bakéliser balader
|
bâillonner baîllonner baiser baisoter baisouiller baisser bakéliser balader
|
||||||
|
@ -135,9 +136,9 @@ VERBS = set(
|
||||||
brouillonner broussailler brousser brouter bruiner bruisser bruiter brûler
|
brouillonner broussailler brousser brouter bruiner bruisser bruiter brûler
|
||||||
brumer brumiser bruncher brusquer brutaliser bruter bûcher bucoliser
|
brumer brumiser bruncher brusquer brutaliser bruter bûcher bucoliser
|
||||||
budgétiser buer buffériser buffler bugler bugner buiser buissonner bulgariser
|
budgétiser buer buffériser buffler bugler bugner buiser buissonner bulgariser
|
||||||
buquer bureaucratiser buriner buser busquer buter butiner butonner butter
|
buller buquer bureaucratiser buriner buser busquer buter butiner butonner
|
||||||
buvoter byzantiner byzantiniser cabaler cabaliser cabaner câbler cabosser
|
butter buvoter byzantiner byzantiniser cabaler cabaliser cabaner câbler
|
||||||
caboter cabotiner cabrer cabrioler cacaber cacaoter cacarder cacher
|
cabosser caboter cabotiner cabrer cabrioler cacaber cacaoter cacarder cacher
|
||||||
cachetonner cachotter cadastrer cadavériser cadeauter cadetter cadoter cadrer
|
cachetonner cachotter cadastrer cadavériser cadeauter cadetter cadoter cadrer
|
||||||
cafarder cafeter cafouiller cafter cageoler cagnarder cagner caguer cahoter
|
cafarder cafeter cafouiller cafter cageoler cagnarder cagner caguer cahoter
|
||||||
caillebotter cailler caillouter cajoler calaminer calamistrer calamiter
|
caillebotter cailler caillouter cajoler calaminer calamistrer calamiter
|
||||||
|
@ -185,65 +186,66 @@ VERBS = set(
|
||||||
claveliser claver clavetter clayonner cléricaliser clicher cligner clignoter
|
claveliser claver clavetter clayonner cléricaliser clicher cligner clignoter
|
||||||
climatiser clinquanter clinquer cliper cliquer clisser cliver clochardiser
|
climatiser clinquanter clinquer cliper cliquer clisser cliver clochardiser
|
||||||
clocher clocter cloisonner cloîtrer cloner cloper clopiner cloquer clôturer
|
clocher clocter cloisonner cloîtrer cloner cloper clopiner cloquer clôturer
|
||||||
clouer clouter coaccuser coacerver coacher coadapter coagglutiner coaguler
|
clotûrer clouer clouter coaccuser coacerver coacher coadapter coagglutiner
|
||||||
coaliser coaltarer coaltariser coanimer coarticuler cobelligérer cocaïniser
|
coaguler coaliser coaltarer coaltariser coanimer coarticuler cobelligérer
|
||||||
cocarder cocheniller cocher côcher cochonner coconiser coconner cocooner
|
cocaïniser cocarder cocheniller cocher côcher cochonner coconiser coconner
|
||||||
cocoter coder codéterminer codiller coéditer coéduquer coexister coexploiter
|
cocooner cocoter coder codéterminer codiller coéditer coéduquer coexister
|
||||||
coexprimer coffiner coffrer cofonder cogiter cogner cogouverner cohabiter
|
coexploiter coexprimer coffiner coffrer cofonder cogiter cogner cogouverner
|
||||||
cohériter cohober coiffer coincher coincider coïncider coïter colchiciner
|
cohabiter cohériter cohober coiffer coincher coincider coïncider coïter
|
||||||
collaber collaborer collationner collecter collectionner collectiviser coller
|
colchiciner collaber collaborer collationner collecter collectionner
|
||||||
collisionner colloquer colluvionner colmater colombianiser colombiner
|
collectiviser coller collisionner colloquer colluvionner colmater
|
||||||
coloniser colorer coloriser colostomiser colporter colpotomiser coltiner
|
colombianiser colombiner coloniser colorer coloriser colostomiser colporter
|
||||||
columniser combiner combler commander commanditer commémorer commenter
|
colpotomiser coltiner columniser combiner combler commander commanditer
|
||||||
commercialiser comminer commissionner commotionner commuer communaliser
|
commémorer commenter commercialiser comminer commissionner commotionner
|
||||||
communautariser communiquer communiser commuter compacifier compacter comparer
|
commuer communaliser communautariser communiquer communiser commuter
|
||||||
compartimenter compenser compiler compisser complanter complémenter
|
compacifier compacter comparer compartimenter compenser compiler compisser
|
||||||
complétiviser complexer complimenter compliquer comploter comporter composer
|
complanter complémenter complétiviser complexer complimenter compliquer
|
||||||
composter compoter compounder compresser comprimer comptabiliser compter
|
comploter comporter composer composter compoter compounder compresser
|
||||||
compulser computer computériser concentrer conceptualiser concerner concerter
|
comprimer comptabiliser compter compulser computer computériser concentrer
|
||||||
concher conciliabuler concocter concomiter concorder concrétionner concrétiser
|
conceptualiser concerner concerter concher conciliabuler concocter concomiter
|
||||||
concubiner condamner condenser condimenter conditionner confabuler
|
concorder concrétionner concrétiser concubiner condamner condenser condimenter
|
||||||
confectionner confédéraliser confesser confessionnaliser configurer confiner
|
conditionner confabuler confectionner confédéraliser confesser
|
||||||
confirmer confisquer confiter confluer conformer conforter confronter
|
confessionnaliser configurer confiner confirmer confisquer confiter confluer
|
||||||
confusionner congestionner conglober conglutiner congoliser congratuler
|
conformer conforter confronter confusionner congestionner conglober
|
||||||
coniser conjecturer conjointer conjuger conjuguer conjurer connecter conniver
|
conglutiner congoliser congratuler coniser conjecturer conjointer conjuger
|
||||||
connoter conquêter consacrer conscientiser conseiller conserver consigner
|
conjuguer conjurer connecter conniver connoter conquêter consacrer
|
||||||
consister consoler consolider consommariser consommer consonantiser consoner
|
conscientiser conseiller conserver consigner consister consoler consolider
|
||||||
conspirer conspuer constater consteller conster consterner constiper
|
consommariser consommer consonantiser consoner conspirer conspuer constater
|
||||||
constituer constitutionnaliser consulter consumer contacter contagionner
|
consteller conster consterner constiper constituer constitutionnaliser
|
||||||
containeriser containériser contaminer contemner contempler conteneuriser
|
consulter consumer contacter contagionner containeriser containériser
|
||||||
contenter conter contester contextualiser continentaliser contingenter
|
contaminer contemner contempler conteneuriser contenter conter contester
|
||||||
continuer contorsionner contourner contracter contractualiser contracturer
|
contextualiser continentaliser contingenter continuer contorsionner contourner
|
||||||
contraposer contraster contre-attaquer contrebouter contrebuter contrecalquer
|
contracter contractualiser contracturer contraposer contraster contre-attaquer
|
||||||
contrecarrer contre-expertiser contreficher contrefraser contre-indiquer
|
contrebouter contrebuter contrecalquer contrecarrer contre-expertiser
|
||||||
contremander contremanifester contremarcher contremarquer contreminer
|
contreficher contrefraser contre-indiquer contremander contremanifester
|
||||||
contremurer contrenquêter contreplaquer contrepointer contrer contresigner
|
contremarcher contremarquer contreminer contremurer contrenquêter
|
||||||
contrespionner contretyper contreventer contribuer contrister contrôler
|
contreplaquer contrepointer contrer contresigner contrespionner contretyper
|
||||||
controuver controverser contusionner conventionnaliser conventionner
|
contreventer contribuer contrister contrôler controuver controverser
|
||||||
conventualiser converser convoiter convoler convoquer convulser convulsionner
|
contusionner conventionnaliser conventionner conventualiser converser
|
||||||
cooccuper coopératiser coopter coordonner coorganiser coparrainer coparticiper
|
convoiter convoler convoquer convulser convulsionner cooccuper coopératiser
|
||||||
copermuter copiner copolycondenser copolymériser coprésenter coprésider copser
|
coopter coordonner coorganiser coparrainer coparticiper copermuter copiner
|
||||||
copter copuler copyrighter coqueliner coquer coqueriquer coquiller corailler
|
copolycondenser copolymériser coprésenter coprésider copser copter copuler
|
||||||
corder cordonner coréaliser coréaniser coréguler coresponsabiliser cornaquer
|
copyrighter coqueliner coquer coqueriquer coquiller corailler corder cordonner
|
||||||
cornemuser corner coroniser corporiser correctionaliser correctionnaliser
|
coréaliser coréaniser coréguler coresponsabiliser cornaquer cornemuser corner
|
||||||
correler corréler corroborer corroder corser corticaliser cosigner cosmétiquer
|
coroniser corporiser correctionaliser correctionnaliser correler corréler
|
||||||
cosser costumer coter cotillonner cotiser cotonner cotransfecter couaquer
|
corroborer corroder corser corticaliser cosigner cosmétiquer cosser costumer
|
||||||
couarder couchailler coucher couchoter couchotter coucouer coucouler couder
|
coter cotillonner cotiser cotonner cotransfecter couaquer couarder couchailler
|
||||||
coudrer couillonner couiner couler coulisser coupailler coupeller couper
|
coucher couchoter couchotter coucouer coucouler couder coudrer couillonner
|
||||||
couperoser coupler couponner courailler courbaturer courber courbetter
|
couiner couler coulisser coupailler coupeller couper couperoser coupler
|
||||||
courcailler couronner courrieler courser courtauder court-circuiter courtiser
|
couponner courailler courbaturer courber courbetter courcailler couronner
|
||||||
cousiner coussiner coûter couturer couver cracher crachiner crachoter
|
courrieler courser courtauder court-circuiter courtiser cousiner coussiner
|
||||||
crachouiller crailler cramer craminer cramper cramponner crampser cramser
|
coûter couturer couver cracher crachiner crachoter crachouiller crailler
|
||||||
craner crâner crânoter cranter crapahuter crapaüter crapser crapuler craquer
|
cramer craminer cramper cramponner crampser cramser craner crâner crânoter
|
||||||
crasher cratériser craticuler cratoniser cravacher cravater crawler crayonner
|
cranter crapahuter crapaüter crapser crapuler craquer crasher cratériser
|
||||||
crédibiliser créditer crématiser créoliser créosoter crêper crépiner crépiter
|
craticuler cratoniser cravacher cravater crawler crayonner crédibiliser
|
||||||
crésyler crêter crétiniser creuser criailler cribler criminaliser criquer
|
créditer crématiser créoliser créosoter crêper crépiner crépiter crésyler
|
||||||
crisper crisser cristalliser criticailler critiquer crocher croiser crôler
|
crêter crétiniser creuser criailler cribler criminaliser criquer crisper
|
||||||
croquer croskiller crosser crotoniser crotter crouler croupionner crouponner
|
crisser cristalliser criticailler critiquer crocher croiser crôler croquer
|
||||||
|
croskiller crosser crotoniser crotter crouler croupionner crouponner
|
||||||
croustiller croûter croûtonner cryoappliquer cryocautériser cryocoaguler
|
croustiller croûter croûtonner cryoappliquer cryocautériser cryocoaguler
|
||||||
cryoconcentrer cryodécaper cryoébarber cryofixer cryogéniser cryomarquer
|
cryoconcentrer cryodécaper cryoébarber cryofixer cryogéniser cryomarquer
|
||||||
cryosorber crypter cuber cueiller cuider cuisiner cuiter cuivrer culbuter
|
cryosorber crypter cuber cueiller cuider cuisiner cuivrer culbuter culer
|
||||||
culer culminer culotter culpabiliser cultiver culturaliser cumuler curariser
|
culminer culotter culpabiliser cultiver culturaliser cumuler curariser
|
||||||
curedenter curer curetter customiser cuter cutiniser cuver cyaniser cyanoser
|
curedenter curer curetter customiser cuter cutiniser cuver cyaniser cyanoser
|
||||||
cyanurer cybernétiser cycler cycliser cycloner cylindrer dactylocoder daguer
|
cyanurer cybernétiser cycler cycliser cycloner cylindrer dactylocoder daguer
|
||||||
daguerréotyper daïer daigner dailler daller damasquiner damer damner
|
daguerréotyper daïer daigner dailler daller damasquiner damer damner
|
||||||
|
@ -748,8 +750,8 @@ VERBS = set(
|
||||||
mithridatiser mitonner mitrailler mixer mixter mixtionner mobiliser modaliser
|
mithridatiser mitonner mitrailler mixer mixter mixtionner mobiliser modaliser
|
||||||
modéliser modérantiser moderniser moduler moellonner mofler moirer moiser
|
modéliser modérantiser moderniser moduler moellonner mofler moirer moiser
|
||||||
moissonner molarder molariser moléculariser molester moletter mollarder
|
moissonner molarder molariser moléculariser molester moletter mollarder
|
||||||
molletter monarchiser mondaniser monder mondialiser monétariser monétiser
|
molletonner molletter monarchiser mondaniser monder mondialiser monétariser
|
||||||
moniliser monologuer monomériser monophtonguer monopoler monopoliser
|
monétiser moniliser monologuer monomériser monophtonguer monopoler monopoliser
|
||||||
monoprogrammer monosiallitiser monotoniser monseigneuriser monter montrer
|
monoprogrammer monosiallitiser monotoniser monseigneuriser monter montrer
|
||||||
monumentaliser moquer moquetter morailler moraliser mordailler mordiller
|
monumentaliser moquer moquetter morailler moraliser mordailler mordiller
|
||||||
mordillonner mordorer mordoriser morfailler morfaler morfiler morfler morganer
|
mordillonner mordorer mordoriser morfailler morfaler morfiler morfler morganer
|
||||||
|
@ -792,63 +794,64 @@ VERBS = set(
|
||||||
palpiter palucher panacher panader pancarter paner paniquer panneauter panner
|
palpiter palucher panacher panader pancarter paner paniquer panneauter panner
|
||||||
pannetonner panoramiquer panser pantiner pantomimer pantoufler paoner paonner
|
pannetonner panoramiquer panser pantiner pantomimer pantoufler paoner paonner
|
||||||
papelarder papillonner papilloter papoter papouiller paquer paraboliser
|
papelarder papillonner papilloter papoter papouiller paquer paraboliser
|
||||||
parachuter parader parafer paraffiner paralléliser paralyser paramétriser
|
parachuter parader parafer paraffiner paraisonner paralléliser paralyser
|
||||||
parangonner parapher paraphraser parasiter parcellariser parceller parcelliser
|
paramétriser parangonner parapher paraphraser parasiter parcellariser
|
||||||
parcheminer parcoriser pardonner parementer parenthétiser parer paresser
|
parceller parcelliser parcheminer parcoriser pardonner parementer
|
||||||
parfiler parfumer parisianiser parjurer parkériser parlementer parler parloter
|
parenthétiser parer paresser parfiler parfumer parisianiser parjurer
|
||||||
parlotter parquer parrainer participer particulariser partitionner partouzer
|
parkériser parlementer parler parloter parlotter parquer parrainer participer
|
||||||
pasquiner pasquiniser passefiler passementer passepoiler passeriller
|
particulariser partitionner partouzer pasquiner pasquiniser passefiler
|
||||||
passionnaliser passionner pasteller pasteuriser pasticher pastiller pastoriser
|
passementer passepoiler passeriller passionnaliser passionner pasteller
|
||||||
patafioler pateliner patenter paternaliser paterner pathétiser patienter
|
pasteuriser pasticher pastiller pastoriser patafioler pateliner patenter
|
||||||
patiner pâtisser patoiser pâtonner patouiller patrimonialiser patrociner
|
paternaliser paterner pathétiser patienter patiner pâtisser patoiser pâtonner
|
||||||
patronner patrouiller patter pâturer paumer paupériser pauser pavaner paver
|
patouiller patrimonialiser patrociner patronner patrouiller patter pâturer
|
||||||
pavoiser peaufiner pébriner pécher pêcher pécloter pectiser pédaler pédanter
|
paumer paupériser pauser pavaner paver pavoiser peaufiner pébriner pécher
|
||||||
pédantiser pédiculiser pédicurer pédimenter peigner peiner peinturer
|
pêcher pécloter pectiser pédaler pédanter pédantiser pédiculiser pédicurer
|
||||||
peinturlurer péjorer pelaner pelauder péleriner pèleriner pelletiser
|
pédimenter peigner peiner peinturer peinturlurer péjorer pelaner pelauder
|
||||||
pelleverser pelliculer peloter pelotonner pelucher pelurer pénaliser pencher
|
péleriner pèleriner pelletiser pelleverser pelliculer peloter pelotonner
|
||||||
pendeloquer pendiller pendouiller penduler pénéplaner penser pensionner
|
pelucher pelurer pénaliser pencher pendeloquer pendiller pendouiller penduler
|
||||||
peptiser peptoniser percaliner percher percoler percuter perdurer pérégriner
|
pénéplaner penser pensionner peptiser peptoniser percaliner percher percoler
|
||||||
pérenniser perfectionner perforer performer perfuser péricliter périmer
|
percuter perdurer pérégriner pérenniser perfectionner perforer performer
|
||||||
périodiser périphériser périphraser péritoniser perler permanenter permaner
|
perfuser péricliter périmer périodiser périphériser périphraser péritoniser
|
||||||
perméabiliser permuter pérorer pérouaniser peroxyder perpétuer perquisitionner
|
perler permanenter permaner perméabiliser permuter pérorer pérouaniser
|
||||||
perreyer perruquer persécuter persifler persiller persister personnaliser
|
peroxyder perpétuer perquisitionner perreyer perruquer persécuter persifler
|
||||||
persuader perturber pervibrer pester pétarader pétarder pétiller pétitionner
|
persiller persister personnaliser persuader perturber pervibrer pester
|
||||||
pétocher pétouiller pétrarquiser pétroliser pétuner peupler pexer
|
pétarader pétarder pétiller pétitionner pétocher pétouiller pétrarquiser
|
||||||
phacoémulsifier phagocyter phalangiser pharyngaliser phéniquer phénoler
|
pétroliser pétuner peupler pexer phacoémulsifier phagocyter phalangiser
|
||||||
phényler philosophailler philosopher phlébotomiser phlegmatiser phlogistiquer
|
pharyngaliser phéniquer phénoler phényler philosophailler philosopher
|
||||||
phonétiser phonologiser phosphater phosphorer phosphoriser phosphoryler
|
phlébotomiser phlegmatiser phlogistiquer phonétiser phonologiser phosphater
|
||||||
photoactiver photocomposer photograver photo-ioniser photoïoniser photomonter
|
phosphorer phosphoriser phosphoryler photoactiver photocomposer photograver
|
||||||
photophosphoryler photopolymériser photosensibiliser phraser piaffer piailler
|
photo-ioniser photoïoniser photomonter photophosphoryler photopolymériser
|
||||||
pianomiser pianoter piauler pickler picocher picoler picorer picoter picouser
|
photosensibiliser phraser piaffer piailler pianomiser pianoter piauler pickler
|
||||||
picouzer picrater pictonner picturaliser pidginiser piédestaliser pierrer
|
picocher picoler picorer picoter picouser picouzer picrater pictonner
|
||||||
piétiner piétonnifier piétonniser pieuter pifer piffer piffrer pigeonner
|
picturaliser pidginiser piédestaliser pierrer piétiner piétonnifier
|
||||||
pigmenter pigner pignocher pignoler piler piller pilloter pilonner piloter
|
piétonniser pieuter pifer piffer piffrer pigeonner pigmenter pigner pignocher
|
||||||
pimenter pinailler pinceauter pinçoter pindariser pinter piocher pionner
|
pignoler piler piller pilloter pilonner piloter pimenter pinailler pinceauter
|
||||||
piotter piper piqueniquer pique-niquer piquer piquetonner piquouser piquouzer
|
pinçoter pindariser pinter piocher pionner piotter piper piqueniquer
|
||||||
pirater pirouetter piser pisser pissoter pissouiller pistacher pister pistoler
|
pique-niquer piquer piquetonner piquouser piquouzer pirater pirouetter piser
|
||||||
pistonner pitancher pitcher piter pitonner pituiter pivoter placarder
|
pisser pissoter pissouiller pistacher pister pistoler pistonner pitancher
|
||||||
placardiser plafonner plaider plainer plaisanter plamer plancher planer
|
pitcher piter pitonner pituiter pivoter placarder placardiser plafonner
|
||||||
planétariser planétiser planquer planter plaquer plasmolyser plastiquer
|
plaider plainer plaisanter plamer plancher planer planétariser planétiser
|
||||||
plastronner platiner platiniser platoniser plâtrer plébisciter pleurailler
|
planquer planter plaquer plasmolyser plastiquer plastronner platiner
|
||||||
pleuraliser pleurer pleurnicher pleuroter pleuviner pleuvioter pleuvoter
|
platiniser platoniser plâtrer plébisciter pleurailler pleuraliser pleurer
|
||||||
plisser plissoter plomber ploquer plotiniser plouter ploutrer plucher
|
pleurnicher pleuroter pleuviner pleuvioter pleuvoter plisser plissoter plomber
|
||||||
plumarder plumer pluraliser plussoyer pluviner pluvioter pocharder pocher
|
ploquer plotiniser plouter ploutrer plucher plumarder plumer pluraliser
|
||||||
pochetronner pochtronner poculer podzoliser poêler poétiser poignarder poigner
|
plussoyer pluviner pluvioter pocharder pocher pochetronner pochtronner poculer
|
||||||
poiler poinçonner pointer pointiller poireauter poirer poiroter poisser
|
podzoliser poêler poétiser poignarder poigner poiler poinçonner pointer
|
||||||
poitriner poivrer poivroter polariser poldériser polémiquer polissonner
|
pointiller poireauter poirer poiroter poisser poitriner poivrer poivroter
|
||||||
politicailler politiquer politiser polker polliciser polliniser polluer
|
polariser poldériser polémiquer polissonner politicailler politiquer politiser
|
||||||
poloniser polychromer polycontaminer polygoner polygoniser polymériser
|
polker polliciser polliniser polluer poloniser polychromer polycontaminer
|
||||||
polyploïdiser polytransfuser polyviser pommader pommer pomper pomponner
|
polygoner polygoniser polymériser polyploïdiser polytransfuser polyviser
|
||||||
ponctionner ponctuer ponter pontiller populariser poquer porer porphyriser
|
pommader pommer pomper pomponner ponctionner ponctuer ponter pontiller
|
||||||
porter porteuser portionner portoricaniser portraicturer portraiturer poser
|
populariser poquer porer porphyriser porter porteuser portionner
|
||||||
positionner positiver possibiliser postdater poster postérioriser posticher
|
portoricaniser portraicturer portraiturer poser positionner positiver
|
||||||
postillonner postposer postsonoriser postsynchroniser postuler potabiliser
|
possibiliser postdater poster postérioriser posticher postillonner postposer
|
||||||
potentialiser poter poteyer potiner poudrer pouffer pouiller pouliner pouloper
|
postsonoriser postsynchroniser postuler potabiliser potentialiser poter
|
||||||
poulotter pouponner pourpenser pourprer poussailler pousser poutser praliner
|
poteyer potiner poudrer pouffer pouiller pouliner pouloper poulotter pouponner
|
||||||
pratiquer préaccentuer préadapter préallouer préassembler préassimiler
|
pourpenser pourprer poussailler pousser poutser praliner pratiquer
|
||||||
préaviser précariser précautionner prêchailler préchauffer préchauler prêcher
|
préaccentuer préadapter préallouer préassembler préassimiler préaviser
|
||||||
précipiter préciser préciter précompter préconditionner préconfigurer
|
précariser précautionner prêchailler préchauffer préchauler prêcher précipiter
|
||||||
préconiser préconstituer précoter prédater prédécouper prédésigner prédestiner
|
préciser préciter précompter préconditionner préconfigurer préconiser
|
||||||
|
préconstituer précoter prédater prédécouper prédésigner prédestiner
|
||||||
prédéterminer prédiffuser prédilectionner prédiquer prédisposer prédominer
|
prédéterminer prédiffuser prédilectionner prédiquer prédisposer prédominer
|
||||||
préemballer préempter préencoller préenregistrer préenrober préexaminer
|
préemballer préempter préencoller préenregistrer préenrober préexaminer
|
||||||
préexister préfabriquer préfaner préfigurer préfixer préformater préformer
|
préexister préfabriquer préfaner préfigurer préfixer préformater préformer
|
||||||
|
@ -879,8 +882,8 @@ VERBS = set(
|
||||||
raccommoder raccompagner raccorder raccoutrer raccoutumer raccrocher racémiser
|
raccommoder raccompagner raccorder raccoutrer raccoutumer raccrocher racémiser
|
||||||
rachalander racher raciner racketter racler râcler racoler raconter racoquiner
|
rachalander racher raciner racketter racler râcler racoler raconter racoquiner
|
||||||
radariser rader radicaliser radiner radioactiver radiobaliser radiocommander
|
radariser rader radicaliser radiner radioactiver radiobaliser radiocommander
|
||||||
radioconserver radiodétecter radiodiffuser radioexposer radioguider radio-
|
radioconserver radiodétecter radiodiffuser radioexposer radioguider
|
||||||
immuniser radiolocaliser radiopasteuriser radiosonder radiostériliser
|
radio-immuniser radiolocaliser radiopasteuriser radiosonder radiostériliser
|
||||||
radiotéléphoner radiotéléviser radoter radouber rafaler raffermer raffiler
|
radiotéléphoner radiotéléviser radoter radouber rafaler raffermer raffiler
|
||||||
raffiner raffluer raffoler raffûter rafistoler rafler ragoter ragoûter
|
raffiner raffluer raffoler raffûter rafistoler rafler ragoter ragoûter
|
||||||
ragrafer raguer raguser raiguiser railler rainer rainurer raisonner rajouter
|
ragrafer raguer raguser raiguiser railler rainer rainurer raisonner rajouter
|
||||||
|
@ -1123,19 +1126,21 @@ VERBS = set(
|
||||||
sommer somnambuler somniloquer somnoler sonder sonnailler sonner sonoriser
|
sommer somnambuler somniloquer somnoler sonder sonnailler sonner sonoriser
|
||||||
sophistiquer sorguer soubresauter souder souffler souffroter soufrer souhaiter
|
sophistiquer sorguer soubresauter souder souffler souffroter soufrer souhaiter
|
||||||
souiller souillonner soûler souligner soûlotter soumissionner soupailler
|
souiller souillonner soûler souligner soûlotter soumissionner soupailler
|
||||||
soupçonner souper soupirer souquer sourciller sourdiner sous-capitaliser sous-
|
soupçonner souper soupirer souquer sourciller sourdiner sous-alimenter
|
||||||
catégoriser sousestimer sous-estimer sous-industrialiser sous-médicaliser
|
sous-capitaliser sous-catégoriser sous-équiper sousestimer sous-estimer
|
||||||
sousperformer sous-qualifier soussigner sous-titrer sous-utiliser soutacher
|
sous-évaluer sous-exploiter sous-exposer sous-industrialiser sous-louer
|
||||||
souter soutirer soviétiser spammer spasmer spatialiser spatuler spécialiser
|
sous-médicaliser sousperformer sous-qualifier soussigner sous-titrer
|
||||||
spéculer sphéroïdiser spilitiser spiraler spiraliser spirantiser spiritualiser
|
sous-traiter sous-utiliser sous-virer soutacher souter soutirer soviétiser
|
||||||
spitter splénectomiser spléniser sponsoriser sporter sporuler sprinter
|
spammer spasmer spatialiser spatuler spécialiser spéculer sphéroïdiser
|
||||||
squatériser squatter squatteriser squattériser squeezer stabiliser stabuler
|
spilitiser spiraler spiraliser spirantiser spiritualiser spitter
|
||||||
staffer stagner staliniser standardiser standoliser stanioler stariser
|
splénectomiser spléniser sponsoriser sporter sporuler sprinter squatériser
|
||||||
stationner statistiquer statuer stelliter stenciler stendhaliser sténoser
|
squatter squatteriser squattériser squeezer stabiliser stabuler staffer
|
||||||
sténotyper stepper stéréotyper stériliser stigmatiser stimuler stipuler
|
stagner staliniser standardiser standoliser stanioler stariser stationner
|
||||||
stocker stoloniser stopper stranguler stratégiser stresser strider striduler
|
statistiquer statuer stelliter stenciler stendhaliser sténoser sténotyper
|
||||||
striper stripper striquer stronker strouiller structurer strychniser stuquer
|
stepper stéréotyper stériliser stigmatiser stimuler stipuler stocker
|
||||||
styler styliser subalterniser subdiviser subdivisionner subériser subjectiver
|
stoloniser stopper stranguler stratégiser stresser strider striduler striper
|
||||||
|
stripper striquer stronker strouiller structurer strychniser stuquer styler
|
||||||
|
styliser subalterniser subdiviser subdivisionner subériser subjectiver
|
||||||
subjectiviser subjuguer sublimer sublimiser subluxer subminiaturiser subodorer
|
subjectiviser subjuguer sublimer sublimiser subluxer subminiaturiser subodorer
|
||||||
subordonner suborner subsister substanter substantialiser substantiver
|
subordonner suborner subsister substanter substantialiser substantiver
|
||||||
substituer subsumer subtiliser suburbaniser subventionner succomber suçoter
|
substituer subsumer subtiliser suburbaniser subventionner succomber suçoter
|
||||||
|
|
File diff suppressed because it is too large
|
@ -1,7 +1,7 @@
|
||||||
# coding: utf8
|
# coding: utf8
|
||||||
from __future__ import unicode_literals
|
from __future__ import unicode_literals
|
||||||
|
|
||||||
from ....symbols import POS, NOUN, VERB, ADJ, ADV, PRON, DET, AUX, PUNCT
|
from ....symbols import POS, NOUN, VERB, ADJ, ADV, PRON, DET, AUX, PUNCT, ADP, SCONJ, CCONJ
|
||||||
from ....symbols import VerbForm_inf, VerbForm_none, Number_sing, Degree_pos
|
from ....symbols import VerbForm_inf, VerbForm_none, Number_sing, Degree_pos
|
||||||
from .lookup import LOOKUP
|
from .lookup import LOOKUP
|
||||||
|
|
||||||
|
@ -9,7 +9,7 @@ from .lookup import LOOKUP
|
||||||
The French language lemmatizer applies the default rule-based lemmatization
|
The French language lemmatizer applies the default rule-based lemmatization
|
||||||
procedure with some modifications for better French language support.
|
procedure with some modifications for better French language support.
|
||||||
|
|
||||||
The parts of speech 'ADV', 'PRON', 'DET' and 'AUX' are added to use the
|
The parts of speech 'ADV', 'PRON', 'DET', 'ADP' and 'AUX' are added to use the
|
||||||
rule-based lemmatization. As a last resort, the lemmatizer checks in
|
rule-based lemmatization. As a last resort, the lemmatizer checks in
|
||||||
the lookup table.
|
the lookup table.
|
||||||
'''
|
'''
|
||||||
|
@ -34,16 +34,22 @@ class FrenchLemmatizer(object):
|
||||||
univ_pos = 'verb'
|
univ_pos = 'verb'
|
||||||
elif univ_pos in (ADJ, 'ADJ', 'adj'):
|
elif univ_pos in (ADJ, 'ADJ', 'adj'):
|
||||||
univ_pos = 'adj'
|
univ_pos = 'adj'
|
||||||
|
elif univ_pos in (ADP, 'ADP', 'adp'):
|
||||||
|
univ_pos = 'adp'
|
||||||
elif univ_pos in (ADV, 'ADV', 'adv'):
|
elif univ_pos in (ADV, 'ADV', 'adv'):
|
||||||
univ_pos = 'adv'
|
univ_pos = 'adv'
|
||||||
elif univ_pos in (PRON, 'PRON', 'pron'):
|
|
||||||
univ_pos = 'pron'
|
|
||||||
elif univ_pos in (DET, 'DET', 'det'):
|
|
||||||
univ_pos = 'det'
|
|
||||||
elif univ_pos in (AUX, 'AUX', 'aux'):
|
elif univ_pos in (AUX, 'AUX', 'aux'):
|
||||||
univ_pos = 'aux'
|
univ_pos = 'aux'
|
||||||
|
elif univ_pos in (CCONJ, 'CCONJ', 'cconj'):
|
||||||
|
univ_pos = 'cconj'
|
||||||
|
elif univ_pos in (DET, 'DET', 'det'):
|
||||||
|
univ_pos = 'det'
|
||||||
|
elif univ_pos in (PRON, 'PRON', 'pron'):
|
||||||
|
univ_pos = 'pron'
|
||||||
elif univ_pos in (PUNCT, 'PUNCT', 'punct'):
|
elif univ_pos in (PUNCT, 'PUNCT', 'punct'):
|
||||||
univ_pos = 'punct'
|
univ_pos = 'punct'
|
||||||
|
elif univ_pos in (SCONJ, 'SCONJ', 'sconj'):
|
||||||
|
univ_pos = 'sconj'
|
||||||
else:
|
else:
|
||||||
return [self.lookup(string)]
|
return [self.lookup(string)]
|
||||||
# See Issue #435 for an example of where this logic is required.
|
# See Issue #435 for an example of where this logic is required.
|
||||||
|
@ -100,7 +106,7 @@ class FrenchLemmatizer(object):
|
||||||
|
|
||||||
def lookup(self, string):
|
def lookup(self, string):
|
||||||
if string in self.lookup_table:
|
if string in self.lookup_table:
|
||||||
return self.lookup_table[string]
|
return self.lookup_table[string][0]
|
||||||
return string
|
return string
|
||||||
|
|
||||||
|
|
||||||
|
@ -125,7 +131,7 @@ def lemmatize(string, index, exceptions, rules):
|
||||||
if not forms:
|
if not forms:
|
||||||
forms.extend(oov_forms)
|
forms.extend(oov_forms)
|
||||||
if not forms and string in LOOKUP.keys():
|
if not forms and string in LOOKUP.keys():
|
||||||
forms.append(LOOKUP[string])
|
forms.append(LOOKUP[string][0])
|
||||||
if not forms:
|
if not forms:
|
||||||
forms.append(string)
|
forms.append(string)
|
||||||
return list(set(forms))
|
return list(set(forms))
|
||||||
|
|
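The change to FrenchLemmatizer.lookup() and lemmatize() above reflects a new lookup-table layout: each entry now maps a surface form to a sequence of candidate lemmas, and the first candidate is used. A minimal standalone sketch of that fallback behaviour, with a hypothetical two-entry table (not the real LOOKUP data), looks like this:

# Hypothetical lookup table: values are sequences of candidate lemmas.
LOOKUP = {
    "chevaux": ("cheval",),
    "yeux": ("oeil",),
}

def lookup(string, table=LOOKUP):
    # Mirrors the behaviour shown above: take the first candidate lemma,
    # fall back to the surface form for unknown words.
    if string in table:
        return table[string][0]
    return string

print(lookup("chevaux"))   # cheval
print(lookup("bonjour"))   # bonjour (not in the table, returned unchanged)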
File diff suppressed because it is too large
|
@ -1,16 +1,15 @@
|
||||||
# encoding: utf8
|
# encoding: utf8
|
||||||
from __future__ import unicode_literals, print_function
|
from __future__ import unicode_literals, print_function
|
||||||
|
|
||||||
from ...language import Language
|
|
||||||
from ...attrs import LANG
|
|
||||||
from ...tokens import Doc, Token
|
|
||||||
from ...tokenizer import Tokenizer
|
|
||||||
from ... import util
|
|
||||||
from .tag_map import TAG_MAP
|
|
||||||
|
|
||||||
import re
|
import re
|
||||||
from collections import namedtuple
|
from collections import namedtuple
|
||||||
|
|
||||||
|
from .tag_map import TAG_MAP
|
||||||
|
|
||||||
|
from ...attrs import LANG
|
||||||
|
from ...language import Language
|
||||||
|
from ...tokens import Doc, Token
|
||||||
|
from ...util import DummyTokenizer
|
||||||
|
|
||||||
ShortUnitWord = namedtuple("ShortUnitWord", ["surface", "lemma", "pos"])
|
ShortUnitWord = namedtuple("ShortUnitWord", ["surface", "lemma", "pos"])
|
||||||
|
|
||||||
|
@ -46,12 +45,12 @@ def resolve_pos(token):
|
||||||
# PoS mappings.
|
# PoS mappings.
|
||||||
|
|
||||||
if token.pos == "連体詞,*,*,*":
|
if token.pos == "連体詞,*,*,*":
|
||||||
if re.match("^[こそあど此其彼]の", token.surface):
|
if re.match(r"[こそあど此其彼]の", token.surface):
|
||||||
return token.pos + ",DET"
|
return token.pos + ",DET"
|
||||||
if re.match("^[こそあど此其彼]", token.surface):
|
if re.match(r"[こそあど此其彼]", token.surface):
|
||||||
return token.pos + ",PRON"
|
return token.pos + ",PRON"
|
||||||
else:
|
return token.pos + ",ADJ"
|
||||||
return token.pos + ",ADJ"
|
|
||||||
return token.pos
|
return token.pos
|
||||||
|
|
||||||
|
|
||||||
|
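The resolve_pos() hunk above disambiguates MeCab's underspecified 連体詞 (pre-noun adjectival) tag: demonstratives built on こ/そ/あ/ど (or 此/其/彼) plus の become DET, bare demonstratives become PRON, and everything else becomes ADJ. A self-contained sketch of just that branch, using a made-up token type rather than real MeCab nodes:

import re
from collections import namedtuple

# Stand-in for the tokenizer's node objects; not the real MeCab interface.
MecabLikeToken = namedtuple("MecabLikeToken", ["surface", "pos"])

def resolve_adnominal(token):
    if token.pos == "連体詞,*,*,*":
        if re.match(r"[こそあど此其彼]の", token.surface):
            return token.pos + ",DET"
        if re.match(r"[こそあど此其彼]", token.surface):
            return token.pos + ",PRON"
        return token.pos + ",ADJ"
    return token.pos

print(resolve_adnominal(MecabLikeToken("この", "連体詞,*,*,*")))    # ...,DET
print(resolve_adnominal(MecabLikeToken("大きな", "連体詞,*,*,*")))  # ...,ADJ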
@ -68,7 +67,8 @@ def detailed_tokens(tokenizer, text):
|
||||||
pos = ",".join(parts[0:4])
|
pos = ",".join(parts[0:4])
|
||||||
|
|
||||||
if len(parts) > 7:
|
if len(parts) > 7:
|
||||||
# this information is only available for words in the tokenizer dictionary
|
# this information is only available for words in the tokenizer
|
||||||
|
# dictionary
|
||||||
base = parts[7]
|
base = parts[7]
|
||||||
|
|
||||||
words.append(ShortUnitWord(surface, base, pos))
|
words.append(ShortUnitWord(surface, base, pos))
|
||||||
|
@ -76,38 +76,27 @@ def detailed_tokens(tokenizer, text):
|
||||||
return words
|
return words
|
||||||
|
|
||||||
|
|
||||||
class JapaneseTokenizer(object):
|
class JapaneseTokenizer(DummyTokenizer):
|
||||||
def __init__(self, cls, nlp=None):
|
def __init__(self, cls, nlp=None):
|
||||||
self.vocab = nlp.vocab if nlp is not None else cls.create_vocab(nlp)
|
self.vocab = nlp.vocab if nlp is not None else cls.create_vocab(nlp)
|
||||||
|
|
||||||
MeCab = try_mecab_import()
|
self.tokenizer = try_mecab_import().Tagger()
|
||||||
self.tokenizer = MeCab.Tagger()
|
|
||||||
self.tokenizer.parseToNode("") # see #2901
|
self.tokenizer.parseToNode("") # see #2901
|
||||||
|
|
||||||
def __call__(self, text):
|
def __call__(self, text):
|
||||||
dtokens = detailed_tokens(self.tokenizer, text)
|
dtokens = detailed_tokens(self.tokenizer, text)
|
||||||
|
|
||||||
words = [x.surface for x in dtokens]
|
words = [x.surface for x in dtokens]
|
||||||
doc = Doc(self.vocab, words=words, spaces=[False] * len(words))
|
spaces = [False] * len(words)
|
||||||
|
doc = Doc(self.vocab, words=words, spaces=spaces)
|
||||||
|
|
||||||
for token, dtoken in zip(doc, dtokens):
|
for token, dtoken in zip(doc, dtokens):
|
||||||
token._.mecab_tag = dtoken.pos
|
token._.mecab_tag = dtoken.pos
|
||||||
token.tag_ = resolve_pos(dtoken)
|
token.tag_ = resolve_pos(dtoken)
|
||||||
token.lemma_ = dtoken.lemma
|
token.lemma_ = dtoken.lemma
|
||||||
|
|
||||||
return doc
|
return doc
|
||||||
|
|
||||||
# add dummy methods for to_bytes, from_bytes, to_disk and from_disk to
|
|
||||||
# allow serialization (see #1557)
|
|
||||||
def to_bytes(self, **exclude):
|
|
||||||
return b""
|
|
||||||
|
|
||||||
def from_bytes(self, bytes_data, **exclude):
|
|
||||||
return self
|
|
||||||
|
|
||||||
def to_disk(self, path, **exclude):
|
|
||||||
return None
|
|
||||||
|
|
||||||
def from_disk(self, path, **exclude):
|
|
||||||
return self
|
|
||||||
|
|
||||||
|
|
||||||
class JapaneseCharacterSegmenter(object):
|
class JapaneseCharacterSegmenter(object):
|
||||||
def __init__(self, vocab):
|
def __init__(self, vocab):
|
||||||
|
@ -154,7 +143,8 @@ class JapaneseCharacterSegmenter(object):
|
||||||
|
|
||||||
class JapaneseDefaults(Language.Defaults):
|
class JapaneseDefaults(Language.Defaults):
|
||||||
lex_attr_getters = dict(Language.Defaults.lex_attr_getters)
|
lex_attr_getters = dict(Language.Defaults.lex_attr_getters)
|
||||||
lex_attr_getters[LANG] = lambda text: "ja"
|
lex_attr_getters[LANG] = lambda _text: "ja"
|
||||||
|
|
||||||
tag_map = TAG_MAP
|
tag_map = TAG_MAP
|
||||||
use_janome = True
|
use_janome = True
|
||||||
|
|
||||||
|
@ -169,7 +159,6 @@ class JapaneseDefaults(Language.Defaults):
|
||||||
class Japanese(Language):
|
class Japanese(Language):
|
||||||
lang = "ja"
|
lang = "ja"
|
||||||
Defaults = JapaneseDefaults
|
Defaults = JapaneseDefaults
|
||||||
Tokenizer = JapaneseTokenizer
|
|
||||||
|
|
||||||
def make_doc(self, text):
|
def make_doc(self, text):
|
||||||
return self.tokenizer(text)
|
return self.tokenizer(text)
|
||||||
|
|
|
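detailed_tokens() above splits each MeCab feature string on commas, keeps the first four fields as the PoS string and, when the word comes from the tokenizer dictionary (more than seven fields), takes field 7 as the base form; the JapaneseTokenizer itself now inherits from the shared DummyTokenizer helper instead of defining the to_bytes/from_bytes/to_disk/from_disk stubs inline. A rough, self-contained sketch of the feature parsing, fed an invented feature string rather than a live MeCab node:

from collections import namedtuple

ShortUnitWord = namedtuple("ShortUnitWord", ["surface", "lemma", "pos"])

def parse_feature(surface, feature):
    # First four comma-separated fields form the PoS string; field 7 holds
    # the base (lemma) form for dictionary words. Here we simply fall back
    # to the surface form when that field is missing.
    parts = feature.split(",")
    pos = ",".join(parts[0:4])
    base = parts[7] if len(parts) > 7 else surface
    return ShortUnitWord(surface, base, pos)

# Invented feature string, only to illustrate the indexing:
print(parse_feature("食べた", "VERB,*,*,*,conj,past,*,食べる,タベタ"))
# ShortUnitWord(surface='食べた', lemma='食べる', pos='VERB,*,*,*')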
@ -5,6 +5,7 @@ from .tokenizer_exceptions import TOKENIZER_EXCEPTIONS
|
||||||
from .stop_words import STOP_WORDS
|
from .stop_words import STOP_WORDS
|
||||||
from .morph_rules import MORPH_RULES
|
from .morph_rules import MORPH_RULES
|
||||||
from .lemmatizer import LEMMA_RULES, LOOKUP
|
from .lemmatizer import LEMMA_RULES, LOOKUP
|
||||||
|
from .punctuation import TOKENIZER_INFIXES, TOKENIZER_SUFFIXES
|
||||||
|
|
||||||
from ..tokenizer_exceptions import BASE_EXCEPTIONS
|
from ..tokenizer_exceptions import BASE_EXCEPTIONS
|
||||||
from ..norm_exceptions import BASE_NORMS
|
from ..norm_exceptions import BASE_NORMS
|
||||||
|
@ -20,12 +21,14 @@ class SwedishDefaults(Language.Defaults):
|
||||||
Language.Defaults.lex_attr_getters[NORM], BASE_NORMS
|
Language.Defaults.lex_attr_getters[NORM], BASE_NORMS
|
||||||
)
|
)
|
||||||
tokenizer_exceptions = update_exc(BASE_EXCEPTIONS, TOKENIZER_EXCEPTIONS)
|
tokenizer_exceptions = update_exc(BASE_EXCEPTIONS, TOKENIZER_EXCEPTIONS)
|
||||||
|
morph_rules = MORPH_RULES
|
||||||
|
infixes = TOKENIZER_INFIXES
|
||||||
|
suffixes = TOKENIZER_SUFFIXES
|
||||||
stop_words = STOP_WORDS
|
stop_words = STOP_WORDS
|
||||||
lemma_rules = LEMMA_RULES
|
lemma_rules = LEMMA_RULES
|
||||||
lemma_lookup = LOOKUP
|
lemma_lookup = LOOKUP
|
||||||
morph_rules = MORPH_RULES
|
morph_rules = MORPH_RULES
|
||||||
|
|
||||||
|
|
||||||
class Swedish(Language):
|
class Swedish(Language):
|
||||||
lang = "sv"
|
lang = "sv"
|
||||||
Defaults = SwedishDefaults
|
Defaults = SwedishDefaults
|
||||||
|
|
|
@ -233167,7 +233167,6 @@ LOOKUP = {
|
||||||
"jades": "jade",
|
"jades": "jade",
|
||||||
"jaet": "ja",
|
"jaet": "ja",
|
||||||
"jaets": "ja",
|
"jaets": "ja",
|
||||||
"jag": "jaga",
|
|
||||||
"jagad": "jaga",
|
"jagad": "jaga",
|
||||||
"jagade": "jaga",
|
"jagade": "jaga",
|
||||||
"jagades": "jaga",
|
"jagades": "jaga",
|
||||||
|
|
25
spacy/lang/sv/punctuation.py
Normal file
25
spacy/lang/sv/punctuation.py
Normal file
|
@ -0,0 +1,25 @@
|
||||||
|
# coding: utf8
|
||||||
|
"""Punctuation stolen from Danish"""
|
||||||
|
from __future__ import unicode_literals
|
||||||
|
|
||||||
|
from ..char_classes import LIST_ELLIPSES, LIST_ICONS
|
||||||
|
from ..char_classes import QUOTES, ALPHA, ALPHA_LOWER, ALPHA_UPPER
|
||||||
|
from ..punctuation import TOKENIZER_SUFFIXES
|
||||||
|
|
||||||
|
|
||||||
|
_quotes = QUOTES.replace("'", '')
|
||||||
|
|
||||||
|
_infixes = (LIST_ELLIPSES + LIST_ICONS +
|
||||||
|
[r'(?<=[{}])\.(?=[{}])'.format(ALPHA_LOWER, ALPHA_UPPER),
|
||||||
|
r'(?<=[{a}])[,!?](?=[{a}])'.format(a=ALPHA),
|
||||||
|
r'(?<=[{a}"])[:<>=](?=[{a}])'.format(a=ALPHA),
|
||||||
|
r'(?<=[{a}]),(?=[{a}])'.format(a=ALPHA),
|
||||||
|
r'(?<=[{a}])([{q}\)\]\(\[])(?=[\{a}])'.format(a=ALPHA, q=_quotes),
|
||||||
|
r'(?<=[{a}])--(?=[{a}])'.format(a=ALPHA)])
|
||||||
|
|
||||||
|
_suffixes = [suffix for suffix in TOKENIZER_SUFFIXES if suffix not in ["'s", "'S", "’s", "’S", r"\'"]]
|
||||||
|
_suffixes += [r"(?<=[^sSxXzZ])\'"]
|
||||||
|
|
||||||
|
|
||||||
|
TOKENIZER_INFIXES = _infixes
|
||||||
|
TOKENIZER_SUFFIXES = _suffixes
|
|
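The suffix rule added at the end of the file, r"(?<=[^sSxXzZ])\'", splits off a trailing apostrophe unless it follows s/S/x/X/z/Z, which keeps Swedish genitives such as "Lars'" as one token (compare the sv test_prefix_suffix_infix.py tests further down). A standalone regex check of just that rule, outside the spaCy tokenizer:

import re

# The rule as an end-anchored pattern; spaCy applies its own anchoring when
# compiling suffixes, so the trailing "$" here is only for this demo.
apostrophe_suffix = re.compile(r"(?<=[^sSxXzZ])\'$")

for word in ["Lars'", "hunden'"]:
    print(word, "->", "split off" if apostrophe_suffix.search(word) else "kept")
# Lars'    -> kept       (apostrophe follows "s")
# hunden'  -> split off  (apostrophe follows "n")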
@ -26,14 +26,15 @@ for verb_data in [
|
||||||
{ORTH: "u", LEMMA: PRON_LEMMA, NORM: "du"},
|
{ORTH: "u", LEMMA: PRON_LEMMA, NORM: "du"},
|
||||||
]
|
]
|
||||||
|
|
||||||
|
# Abbreviations for weekdays "sön." (for "söndag" / "söner")
|
||||||
|
# are left out because they are ambiguous. The same is the case
|
||||||
|
# for abbreviations "jul." and "Jul." ("juli" / "jul").
|
||||||
for exc_data in [
|
for exc_data in [
|
||||||
{ORTH: "jan.", LEMMA: "januari"},
|
{ORTH: "jan.", LEMMA: "januari"},
|
||||||
{ORTH: "febr.", LEMMA: "februari"},
|
{ORTH: "febr.", LEMMA: "februari"},
|
||||||
{ORTH: "feb.", LEMMA: "februari"},
|
{ORTH: "feb.", LEMMA: "februari"},
|
||||||
{ORTH: "apr.", LEMMA: "april"},
|
{ORTH: "apr.", LEMMA: "april"},
|
||||||
{ORTH: "jun.", LEMMA: "juni"},
|
{ORTH: "jun.", LEMMA: "juni"},
|
||||||
{ORTH: "jul.", LEMMA: "juli"},
|
|
||||||
{ORTH: "aug.", LEMMA: "augusti"},
|
{ORTH: "aug.", LEMMA: "augusti"},
|
||||||
{ORTH: "sept.", LEMMA: "september"},
|
{ORTH: "sept.", LEMMA: "september"},
|
||||||
{ORTH: "sep.", LEMMA: "september"},
|
{ORTH: "sep.", LEMMA: "september"},
|
||||||
|
@ -46,13 +47,11 @@ for exc_data in [
|
||||||
{ORTH: "tors.", LEMMA: "torsdag"},
|
{ORTH: "tors.", LEMMA: "torsdag"},
|
||||||
{ORTH: "fre.", LEMMA: "fredag"},
|
{ORTH: "fre.", LEMMA: "fredag"},
|
||||||
{ORTH: "lör.", LEMMA: "lördag"},
|
{ORTH: "lör.", LEMMA: "lördag"},
|
||||||
{ORTH: "sön.", LEMMA: "söndag"},
|
|
||||||
{ORTH: "Jan.", LEMMA: "Januari"},
|
{ORTH: "Jan.", LEMMA: "Januari"},
|
||||||
{ORTH: "Febr.", LEMMA: "Februari"},
|
{ORTH: "Febr.", LEMMA: "Februari"},
|
||||||
{ORTH: "Feb.", LEMMA: "Februari"},
|
{ORTH: "Feb.", LEMMA: "Februari"},
|
||||||
{ORTH: "Apr.", LEMMA: "April"},
|
{ORTH: "Apr.", LEMMA: "April"},
|
||||||
{ORTH: "Jun.", LEMMA: "Juni"},
|
{ORTH: "Jun.", LEMMA: "Juni"},
|
||||||
{ORTH: "Jul.", LEMMA: "Juli"},
|
|
||||||
{ORTH: "Aug.", LEMMA: "Augusti"},
|
{ORTH: "Aug.", LEMMA: "Augusti"},
|
||||||
{ORTH: "Sept.", LEMMA: "September"},
|
{ORTH: "Sept.", LEMMA: "September"},
|
||||||
{ORTH: "Sep.", LEMMA: "September"},
|
{ORTH: "Sep.", LEMMA: "September"},
|
||||||
|
@ -65,28 +64,32 @@ for exc_data in [
|
||||||
{ORTH: "Tors.", LEMMA: "Torsdag"},
|
{ORTH: "Tors.", LEMMA: "Torsdag"},
|
||||||
{ORTH: "Fre.", LEMMA: "Fredag"},
|
{ORTH: "Fre.", LEMMA: "Fredag"},
|
||||||
{ORTH: "Lör.", LEMMA: "Lördag"},
|
{ORTH: "Lör.", LEMMA: "Lördag"},
|
||||||
{ORTH: "Sön.", LEMMA: "Söndag"},
|
|
||||||
{ORTH: "sthlm", LEMMA: "Stockholm"},
|
{ORTH: "sthlm", LEMMA: "Stockholm"},
|
||||||
{ORTH: "gbg", LEMMA: "Göteborg"},
|
{ORTH: "gbg", LEMMA: "Göteborg"},
|
||||||
]:
|
]:
|
||||||
_exc[exc_data[ORTH]] = [exc_data]
|
_exc[exc_data[ORTH]] = [exc_data]
|
||||||
|
|
||||||
|
|
||||||
|
# Specific case abbreviations only
|
||||||
|
for orth in ["AB", "Dr.", "H.M.", "H.K.H.", "m/s", "M/S", "Ph.d.", "S:t", "s:t"]:
|
||||||
|
_exc[orth] = [{ORTH: orth}]
|
||||||
|
|
||||||
|
|
||||||
ABBREVIATIONS = [
|
ABBREVIATIONS = [
|
||||||
"ang",
|
"ang",
|
||||||
"anm",
|
"anm",
|
||||||
"bil",
|
|
||||||
"bl.a",
|
"bl.a",
|
||||||
"d.v.s",
|
"d.v.s",
|
||||||
"doc",
|
"doc",
|
||||||
"dvs",
|
"dvs",
|
||||||
"e.d",
|
"e.d",
|
||||||
"e.kr",
|
"e.kr",
|
||||||
"el",
|
"el.",
|
||||||
"eng",
|
"eng",
|
||||||
"etc",
|
"etc",
|
||||||
"exkl",
|
"exkl",
|
||||||
"f",
|
"ev",
|
||||||
|
"f.",
|
||||||
"f.d",
|
"f.d",
|
||||||
"f.kr",
|
"f.kr",
|
||||||
"f.n",
|
"f.n",
|
||||||
|
@ -97,10 +100,11 @@ ABBREVIATIONS = [
|
||||||
"fr.o.m",
|
"fr.o.m",
|
||||||
"förf",
|
"förf",
|
||||||
"inkl",
|
"inkl",
|
||||||
"jur",
|
"iofs",
|
||||||
|
"jur.",
|
||||||
"kap",
|
"kap",
|
||||||
"kl",
|
"kl",
|
||||||
"kor",
|
"kor.",
|
||||||
"kr",
|
"kr",
|
||||||
"kungl",
|
"kungl",
|
||||||
"lat",
|
"lat",
|
||||||
|
@ -109,9 +113,10 @@ ABBREVIATIONS = [
|
||||||
"m.m",
|
"m.m",
|
||||||
"max",
|
"max",
|
||||||
"milj",
|
"milj",
|
||||||
"min",
|
"min.",
|
||||||
"mos",
|
"mos",
|
||||||
"mt",
|
"mt",
|
||||||
|
"mvh",
|
||||||
"o.d",
|
"o.d",
|
||||||
"o.s.v",
|
"o.s.v",
|
||||||
"obs",
|
"obs",
|
||||||
|
@ -125,21 +130,27 @@ ABBREVIATIONS = [
|
||||||
"s.k",
|
"s.k",
|
||||||
"s.t",
|
"s.t",
|
||||||
"sid",
|
"sid",
|
||||||
"s:t",
|
|
||||||
"t.ex",
|
"t.ex",
|
||||||
"t.h",
|
"t.h",
|
||||||
"t.o.m",
|
"t.o.m",
|
||||||
"t.v",
|
"t.v",
|
||||||
"tel",
|
"tel",
|
||||||
"ung",
|
"ung.",
|
||||||
"vol",
|
"vol",
|
||||||
|
"v.",
|
||||||
"äv",
|
"äv",
|
||||||
"övers",
|
"övers",
|
||||||
]
|
]
|
||||||
ABBREVIATIONS = [abbr + "." for abbr in ABBREVIATIONS] + ABBREVIATIONS
|
|
||||||
|
# For each abbreviation, also add a variant with a trailing period. If the abbreviation already ends with a period, skip it.
|
||||||
|
for abbr in ABBREVIATIONS:
|
||||||
|
if not abbr.endswith("."):
|
||||||
|
ABBREVIATIONS.append(abbr + ".")
|
||||||
|
|
||||||
for orth in ABBREVIATIONS:
|
for orth in ABBREVIATIONS:
|
||||||
_exc[orth] = [{ORTH: orth}]
|
_exc[orth] = [{ORTH: orth}]
|
||||||
|
capitalized = orth.capitalize()
|
||||||
|
_exc[capitalized] = [{ORTH: capitalized}]
|
||||||
|
|
||||||
# Sentences ending in "i." (as in "... peka i."), "m." (as in "...än 2000 m."),
|
# Sentences ending in "i." (as in "... peka i."), "m." (as in "...än 2000 m."),
|
||||||
# should be tokenized as two separate tokens.
|
# should be tokenized as two separate tokens.
|
||||||
|
|
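The loops added at the end of the hunk above generate, for every abbreviation, a trailing-period variant (unless one is already there) and a capitalized variant, so that e.g. "t.ex", "t.ex.", "T.ex" and "T.ex." all stay single tokens. The effect, sketched with plain string keys instead of spaCy's ORTH symbol and a three-item sample of the list:

abbreviations = ["t.ex", "dvs", "ung."]

# Add the trailing-period variant where it is missing (iterate over a copy
# so the freshly appended items are not re-processed).
for abbr in list(abbreviations):
    if not abbr.endswith("."):
        abbreviations.append(abbr + ".")

exceptions = {}
for orth in abbreviations:
    exceptions[orth] = [{"ORTH": orth}]
    capitalized = orth.capitalize()
    exceptions[capitalized] = [{"ORTH": capitalized}]

print(sorted(exceptions))
# ['Dvs', 'Dvs.', 'T.ex', 'T.ex.', 'Ung.', 'dvs', 'dvs.', 't.ex', 't.ex.', 'ung.']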
24
spacy/lang/ta/__init__.py
Normal file
24
spacy/lang/ta/__init__.py
Normal file
|
@ -0,0 +1,24 @@
|
||||||
|
# import language-specific data
|
||||||
|
from .stop_words import STOP_WORDS
|
||||||
|
from .lex_attrs import LEX_ATTRS
|
||||||
|
|
||||||
|
from ..tokenizer_exceptions import BASE_EXCEPTIONS
|
||||||
|
from ...language import Language
|
||||||
|
from ...attrs import LANG
|
||||||
|
from ...util import update_exc
|
||||||
|
|
||||||
|
# create Defaults class in the module scope (necessary for pickling!)
|
||||||
|
class TamilDefaults(Language.Defaults):
|
||||||
|
lex_attr_getters = dict(Language.Defaults.lex_attr_getters)
|
||||||
|
lex_attr_getters[LANG] = lambda text: 'ta' # language ISO code
|
||||||
|
|
||||||
|
# optional: replace flags with custom functions, e.g. like_num()
|
||||||
|
lex_attr_getters.update(LEX_ATTRS)
|
||||||
|
|
||||||
|
# create actual Language class
|
||||||
|
class Tamil(Language):
|
||||||
|
lang = 'ta' # language ISO code
|
||||||
|
Defaults = TamilDefaults # override defaults
|
||||||
|
|
||||||
|
# set default export – this allows the language class to be lazy-loaded
|
||||||
|
__all__ = ['Tamil']
|
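With the Defaults class defined at module scope (so pickling works) and the Tamil class exported via __all__, the language can be instantiated directly as a blank pipeline. A hedged usage sketch; the output simply reflects the default rule-based tokenization, since no Tamil-specific tokenizer is defined here:

from spacy.lang.ta import Tamil

nlp = Tamil()                      # blank Tamil pipeline with the defaults above
doc = nlp("உங்கள் பெயர் என்ன?")
print([token.text for token in doc])   # e.g. the trailing "?" is split off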
21
spacy/lang/ta/examples.py
Normal file
21
spacy/lang/ta/examples.py
Normal file
|
@ -0,0 +1,21 @@
|
||||||
|
# coding: utf8
|
||||||
|
from __future__ import unicode_literals
|
||||||
|
|
||||||
|
|
||||||
|
"""
|
||||||
|
Example sentences to test spaCy and its language models.
|
||||||
|
|
||||||
|
>>> from spacy.lang.ta.examples import sentences
|
||||||
|
>>> docs = nlp.pipe(sentences)
|
||||||
|
"""
|
||||||
|
|
||||||
|
|
||||||
|
sentences = [
|
||||||
|
"கிறிஸ்துமஸ் மற்றும் இனிய புத்தாண்டு வாழ்த்துக்கள்",
|
||||||
|
"எனக்கு என் குழந்தைப் பருவம் நினைவிருக்கிறது",
|
||||||
|
"உங்கள் பெயர் என்ன?",
|
||||||
|
"ஏறத்தாழ இலங்கைத் தமிழரில் மூன்றிலொரு பங்கினர் இலங்கையை விட்டு வெளியேறிப் பிற நாடுகளில் வாழ்கின்றனர்",
|
||||||
|
"இந்த ஃபோனுடன் சுமார் ரூ.2,990 மதிப்புள்ள போட் ராக்கர்ஸ் நிறுவனத்தின் ஸ்போர்ட் புளூடூத் ஹெட்போன்ஸ் இலவசமாக வழங்கப்படவுள்ளது.",
|
||||||
|
"மட்டக்களப்பில் பல இடங்களில் வீட்டுத் திட்டங்களுக்கு இன்று அடிக்கல் நாட்டல்",
|
||||||
|
"ஐ போன்க்கு முகத்தை வைத்து அன்லாக் செய்யும் முறை மற்றும் விரலால் தொட்டு அன்லாக் செய்யும் முறையை வாட்ஸ் ஆப் நிறுவனம் இதற்கு முன் கண்டுபிடித்தது"
|
||||||
|
]
|
44
spacy/lang/ta/lex_attrs.py
Normal file
44
spacy/lang/ta/lex_attrs.py
Normal file
|
@ -0,0 +1,44 @@
|
||||||
|
# coding: utf8
|
||||||
|
from __future__ import unicode_literals
|
||||||
|
from ...attrs import LIKE_NUM
|
||||||
|
|
||||||
|
|
||||||
|
_numeral_suffixes = {'பத்து': 'பது', 'ற்று': 'று', 'ரத்து': 'ரம்', 'சத்து': 'சம்'}
|
||||||
|
_num_words = ['பூச்சியம்', 'ஒரு', 'ஒன்று', 'இரண்டு', 'மூன்று', 'நான்கு', 'ஐந்து', 'ஆறு', 'ஏழு',
|
||||||
|
'எட்டு', 'ஒன்பது', 'பத்து', 'பதினொன்று', 'பன்னிரண்டு', 'பதின்மூன்று', 'பதினான்கு',
|
||||||
|
'பதினைந்து', 'பதினாறு', 'பதினேழு', 'பதினெட்டு', 'பத்தொன்பது', 'இருபது',
|
||||||
|
'முப்பது', 'நாற்பது', 'ஐம்பது', 'அறுபது', 'எழுபது', 'எண்பது', 'தொண்ணூறு',
|
||||||
|
'நூறு', 'இருநூறு', 'முன்னூறு', 'நாநூறு', 'ஐநூறு', 'அறுநூறு', 'எழுநூறு', 'எண்ணூறு', 'தொள்ளாயிரம்',
|
||||||
|
'ஆயிரம்', 'ஒராயிரம்', 'லட்சம்', 'மில்லியன்', 'கோடி', 'பில்லியன்', 'டிரில்லியன்']
|
||||||
|
|
||||||
|
|
||||||
|
# 20-89, 90-899, 900-99999 and above have different suffixes
|
||||||
|
def suffix_filter(text):
|
||||||
|
# text without numeral suffixes
|
||||||
|
for num_suffix in _numeral_suffixes.keys():
|
||||||
|
length = len(num_suffix)
|
||||||
|
if (len(text) < length):
|
||||||
|
break
|
||||||
|
elif text.endswith(num_suffix):
|
||||||
|
return text[:-length] + _numeral_suffixes[num_suffix]
|
||||||
|
return text
|
||||||
|
|
||||||
|
|
||||||
|
def like_num(text):
|
||||||
|
text = text.replace(',', '').replace('.', '')
|
||||||
|
if text.isdigit():
|
||||||
|
return True
|
||||||
|
if text.count('/') == 1:
|
||||||
|
num, denom = text.split('/')
|
||||||
|
if num.isdigit() and denom.isdigit():
|
||||||
|
return True
|
||||||
|
|
||||||
|
if text.lower() in _num_words:
|
||||||
|
return True
|
||||||
|
elif suffix_filter(text) in _num_words:
|
||||||
|
return True
|
||||||
|
|
||||||
|
return False
|
||||||
|
LEX_ATTRS = {
|
||||||
|
LIKE_NUM: like_num
|
||||||
|
}
|
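suffix_filter() above strips a trailing combining numeral suffix (e.g. ...ரத்து) and restores the citation form listed in _num_words before like_num() does its membership check. A self-contained rerun of that logic on one example, using the same suffix mapping and a two-word sample of the numeral list:

_numeral_suffixes = {'பத்து': 'பது', 'ற்று': 'று', 'ரத்து': 'ரம்', 'சத்து': 'சம்'}
_num_words_sample = ['ஆயிரம்', 'நூறு']   # small sample of the list above

def suffix_filter(text):
    # Same logic as the file above: rewrite a trailing combining suffix back
    # to the citation form, otherwise return the text unchanged.
    for num_suffix, citation in _numeral_suffixes.items():
        length = len(num_suffix)
        if len(text) < length:
            break
        elif text.endswith(num_suffix):
            return text[:-length] + citation
    return text

print(suffix_filter('ஆயிரத்து'))                        # ஆயிரம்
print(suffix_filter('ஆயிரத்து') in _num_words_sample)    # True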
148
spacy/lang/ta/norm_exceptions.py
Normal file
148
spacy/lang/ta/norm_exceptions.py
Normal file
|
@ -0,0 +1,148 @@
|
||||||
|
# coding: utf8
|
||||||
|
from __future__ import unicode_literals
|
||||||
|
|
||||||
|
_exc = {
|
||||||
|
|
||||||
|
# Regional word normalizations
|
||||||
|
# Sri Lanka - Wikipedia
|
||||||
|
"இங்க": "இங்கே",
|
||||||
|
"வாங்க": "வாருங்கள்",
|
||||||
|
'ஒண்டு':'ஒன்று',
|
||||||
|
'கண்டு': 'கன்று',
|
||||||
|
'கொண்டு': 'கொன்று',
|
||||||
|
'பண்டி': 'பன்றி',
|
||||||
|
'பச்ச': 'பச்சை',
|
||||||
|
'அம்பது': 'ஐம்பது',
|
||||||
|
'வெச்ச': 'வைத்து',
|
||||||
|
'வச்ச': 'வைத்து',
|
||||||
|
'வச்சி': 'வைத்து',
|
||||||
|
'வாளைப்பழம்':'வாழைப்பழம்',
|
||||||
|
'மண்ணு': 'மண்',
|
||||||
|
'பொன்னு': 'பொன்',
|
||||||
|
'சாவல்': 'சேவல்',
|
||||||
|
'அங்கால': 'அங்கு ',
|
||||||
|
'அசுப்பு': 'நடமாட்டம்',
|
||||||
|
'எழுவான் கரை': 'எழுவான்கரை',
|
||||||
|
'ஓய்யாரம்': 'எழில் ',
|
||||||
|
'ஒளும்பு': 'எழும்பு',
|
||||||
|
'ஓர்மை': 'துணிவு',
|
||||||
|
'கச்சை': 'கோவணம்',
|
||||||
|
'கடப்பு': 'தெருவாசல்',
|
||||||
|
'சுள்ளி': 'காய்ந்த குச்சி',
|
||||||
|
'திறாவுதல்': 'தடவுதல்',
|
||||||
|
'நாசமறுப்பு': 'தொல்லை',
|
||||||
|
'பரிசாரி': 'வைத்தியன்',
|
||||||
|
'பறவாதி': 'பேராசைக்காரன்',
|
||||||
|
'பிசினி': 'உலோபி ',
|
||||||
|
'விசர்': 'பைத்தியம்',
|
||||||
|
'ஏனம்': 'பாத்திரம்',
|
||||||
|
'ஏலா': 'இயலாது',
|
||||||
|
'ஒசில்': 'அழகு',
|
||||||
|
'ஒள்ளுப்பம்': 'கொஞ்சம்',
|
||||||
|
|
||||||
|
# Sri Lankan and Indian
|
||||||
|
'குத்துமதிப்பு': '',
|
||||||
|
'நூனாயம்': 'நூல்நயம்',
|
||||||
|
'பைய': 'மெதுவாக',
|
||||||
|
'மண்டை': 'தலை',
|
||||||
|
'வெள்ளனே': 'சீக்கிரம்',
|
||||||
|
'உசுப்பு': 'எழுப்பு',
|
||||||
|
'ஆணம்': 'குழம்பு',
|
||||||
|
'உறக்கம்': 'தூக்கம்',
|
||||||
|
'பஸ்': 'பேருந்து',
|
||||||
|
'களவு': 'திருட்டு ',
|
||||||
|
|
||||||
|
# relationship words
|
||||||
|
'புருசன்': 'கணவன்',
|
||||||
|
'பொஞ்சாதி': 'மனைவி',
|
||||||
|
'புள்ள': 'பிள்ளை',
|
||||||
|
'பிள்ள': 'பிள்ளை',
|
||||||
|
'ஆம்பிளப்புள்ள': 'ஆண் பிள்ளை',
|
||||||
|
'பொம்பிளப்புள்ள': 'பெண் பிள்ளை',
|
||||||
|
'அண்ணாச்சி': 'அண்ணா',
|
||||||
|
'அக்காச்சி': 'அக்கா',
|
||||||
|
'தங்கச்சி': 'தங்கை',
|
||||||
|
|
||||||
|
# differing words
|
||||||
|
'பொடியன்': 'சிறுவன்',
|
||||||
|
'பொட்டை': 'சிறுமி',
|
||||||
|
'பிறகு': 'பின்பு',
|
||||||
|
'டக்கென்டு': 'விரைவாக',
|
||||||
|
'கெதியா': 'விரைவாக',
|
||||||
|
'கிறுகி': 'திரும்பி',
|
||||||
|
'போயித்து வாறன்': 'போய் வருகிறேன்',
|
||||||
|
'வருவாங்களா': 'வருவார்களா',
|
||||||
|
|
||||||
|
# common spoken forms
|
||||||
|
'சொல்லு': 'சொல்',
|
||||||
|
'கேளு': 'கேள்',
|
||||||
|
'சொல்லுங்க': 'சொல்லுங்கள்',
|
||||||
|
'கேளுங்க': 'கேளுங்கள்',
|
||||||
|
'நீங்கள்': 'நீ',
|
||||||
|
'உன்': 'உன்னுடைய',
|
||||||
|
|
||||||
|
# Portuguese formal words
|
||||||
|
'அலவாங்கு': 'கடப்பாரை',
|
||||||
|
'ஆசுப்பத்திரி': 'மருத்துவமனை',
|
||||||
|
'உரோதை': 'சில்லு',
|
||||||
|
'கடுதாசி': 'கடிதம்',
|
||||||
|
'கதிரை': 'நாற்காலி',
|
||||||
|
'குசினி': 'அடுக்களை',
|
||||||
|
'கோப்பை': 'கிண்ணம்',
|
||||||
|
'சப்பாத்து': 'காலணி',
|
||||||
|
'தாச்சி': 'இரும்புச் சட்டி',
|
||||||
|
'துவாய்': 'துவாலை',
|
||||||
|
'தவறணை': 'மதுக்கடை',
|
||||||
|
'பீப்பா': 'மரத்தாழி',
|
||||||
|
'யன்னல்': 'சாளரம்',
|
||||||
|
'வாங்கு': 'மரஇருக்கை',
|
||||||
|
|
||||||
|
# Dutch formal words
|
||||||
|
'இறாக்கை': 'பற்சட்டம்',
|
||||||
|
'இலாட்சி': 'இழுப்பறை',
|
||||||
|
'கந்தோர்': 'பணிமனை',
|
||||||
|
'நொத்தாரிசு': 'ஆவண எழுத்துபதிவாளர்',
|
||||||
|
|
||||||
|
# English formal words
|
||||||
|
'இஞ்சினியர்': 'பொறியியலாளர்',
|
||||||
|
'சூப்பு': 'ரசம்',
|
||||||
|
'செக்': 'காசோலை',
|
||||||
|
'சேட்டு': 'மேற்ச்சட்டை',
|
||||||
|
'மார்க்கட்டு': 'சந்தை',
|
||||||
|
'விண்ணன்': 'கெட்டிக்காரன்',
|
||||||
|
|
||||||
|
# Arabic formal words
|
||||||
|
'ஈமான்': 'நம்பிக்கை',
|
||||||
|
'சுன்னத்து': 'விருத்தசேதனம்',
|
||||||
|
'செய்த்தான்': 'பிசாசு',
|
||||||
|
'மவுத்து': 'இறப்பு',
|
||||||
|
'ஹலால்': 'அங்கீகரிக்கப்பட்டது',
|
||||||
|
'கறாம்': 'நிராகரிக்கப்பட்டது',
|
||||||
|
# Persian, Hindustani and Hindi formal words
|
||||||
|
'சுமார்': 'கிட்டத்தட்ட',
|
||||||
|
'சிப்பாய்': 'போர்வீரன்',
|
||||||
|
'சிபார்சு': 'சிபாரிசு',
|
||||||
|
'ஜமீன்': 'பணக்காரா்',
|
||||||
|
'அசல்': 'மெய்யான',
|
||||||
|
'அந்தஸ்து': 'கௌரவம்',
|
||||||
|
'ஆஜர்': 'சமா்ப்பித்தல்',
|
||||||
|
'உசார்': 'எச்சரிக்கை',
|
||||||
|
'அச்சா':'நல்ல',
|
||||||
|
# English words used in text conversations
|
||||||
|
"bcoz": "ஏனெனில்",
|
||||||
|
"bcuz": "ஏனெனில்",
|
||||||
|
"fav": "விருப்பமான",
|
||||||
|
"morning": "காலை வணக்கம்",
|
||||||
|
"gdeveng": "மாலை வணக்கம்",
|
||||||
|
"gdnyt": "இரவு வணக்கம்",
|
||||||
|
"gdnit": "இரவு வணக்கம்",
|
||||||
|
"plz": "தயவு செய்து",
|
||||||
|
"pls": "தயவு செய்து",
|
||||||
|
"thx": "நன்றி",
|
||||||
|
"thanx": "நன்றி",
|
||||||
|
}
|
||||||
|
|
||||||
|
NORM_EXCEPTIONS = {}
|
||||||
|
|
||||||
|
for string, norm in _exc.items():
|
||||||
|
NORM_EXCEPTIONS[string] = norm
|
133
spacy/lang/ta/stop_words.py
Normal file
133
spacy/lang/ta/stop_words.py
Normal file
|
@ -0,0 +1,133 @@
|
||||||
|
# coding: utf8
|
||||||
|
from __future__ import unicode_literals
|
||||||
|
|
||||||
|
|
||||||
|
# Stop words
|
||||||
|
|
||||||
|
STOP_WORDS = set("""
|
||||||
|
ஒரு
|
||||||
|
என்று
|
||||||
|
மற்றும்
|
||||||
|
இந்த
|
||||||
|
இது
|
||||||
|
என்ற
|
||||||
|
கொண்டு
|
||||||
|
என்பது
|
||||||
|
பல
|
||||||
|
ஆகும்
|
||||||
|
அல்லது
|
||||||
|
அவர்
|
||||||
|
நான்
|
||||||
|
உள்ள
|
||||||
|
அந்த
|
||||||
|
இவர்
|
||||||
|
என
|
||||||
|
முதல்
|
||||||
|
என்ன
|
||||||
|
இருந்து
|
||||||
|
சில
|
||||||
|
என்
|
||||||
|
போன்ற
|
||||||
|
வேண்டும்
|
||||||
|
வந்து
|
||||||
|
இதன்
|
||||||
|
அது
|
||||||
|
அவன்
|
||||||
|
தான்
|
||||||
|
பலரும்
|
||||||
|
என்னும்
|
||||||
|
மேலும்
|
||||||
|
பின்னர்
|
||||||
|
கொண்ட
|
||||||
|
இருக்கும்
|
||||||
|
தனது
|
||||||
|
உள்ளது
|
||||||
|
போது
|
||||||
|
என்றும்
|
||||||
|
அதன்
|
||||||
|
தன்
|
||||||
|
பிறகு
|
||||||
|
அவர்கள்
|
||||||
|
வரை
|
||||||
|
அவள்
|
||||||
|
நீ
|
||||||
|
ஆகிய
|
||||||
|
இருந்தது
|
||||||
|
உள்ளன
|
||||||
|
வந்த
|
||||||
|
இருந்த
|
||||||
|
மிகவும்
|
||||||
|
இங்கு
|
||||||
|
மீது
|
||||||
|
ஓர்
|
||||||
|
இவை
|
||||||
|
இந்தக்
|
||||||
|
பற்றி
|
||||||
|
வரும்
|
||||||
|
வேறு
|
||||||
|
இரு
|
||||||
|
இதில்
|
||||||
|
போல்
|
||||||
|
இப்போது
|
||||||
|
அவரது
|
||||||
|
மட்டும்
|
||||||
|
இந்தப்
|
||||||
|
எனும்
|
||||||
|
மேல்
|
||||||
|
பின்
|
||||||
|
சேர்ந்த
|
||||||
|
ஆகியோர்
|
||||||
|
எனக்கு
|
||||||
|
இன்னும்
|
||||||
|
அந்தப்
|
||||||
|
அன்று
|
||||||
|
ஒரே
|
||||||
|
மிக
|
||||||
|
அங்கு
|
||||||
|
பல்வேறு
|
||||||
|
விட்டு
|
||||||
|
பெரும்
|
||||||
|
அதை
|
||||||
|
பற்றிய
|
||||||
|
உன்
|
||||||
|
அதிக
|
||||||
|
அந்தக்
|
||||||
|
பேர்
|
||||||
|
இதனால்
|
||||||
|
அவை
|
||||||
|
அதே
|
||||||
|
ஏன்
|
||||||
|
முறை
|
||||||
|
யார்
|
||||||
|
என்பதை
|
||||||
|
எல்லாம்
|
||||||
|
மட்டுமே
|
||||||
|
இங்கே
|
||||||
|
அங்கே
|
||||||
|
இடம்
|
||||||
|
இடத்தில்
|
||||||
|
அதில்
|
||||||
|
நாம்
|
||||||
|
அதற்கு
|
||||||
|
எனவே
|
||||||
|
பிற
|
||||||
|
சிறு
|
||||||
|
மற்ற
|
||||||
|
விட
|
||||||
|
எந்த
|
||||||
|
எனவும்
|
||||||
|
எனப்படும்
|
||||||
|
எனினும்
|
||||||
|
அடுத்த
|
||||||
|
இதனை
|
||||||
|
இதை
|
||||||
|
கொள்ள
|
||||||
|
இந்தத்
|
||||||
|
இதற்கு
|
||||||
|
அதனால்
|
||||||
|
தவிர
|
||||||
|
போல
|
||||||
|
வரையில்
|
||||||
|
சற்று
|
||||||
|
எனக்
|
||||||
|
""".split())
|
|
@ -5,24 +5,14 @@ from .tokenizer_exceptions import TOKENIZER_EXCEPTIONS
|
||||||
from .tag_map import TAG_MAP
|
from .tag_map import TAG_MAP
|
||||||
from .stop_words import STOP_WORDS
|
from .stop_words import STOP_WORDS
|
||||||
|
|
||||||
from ...tokens import Doc
|
|
||||||
from ...language import Language
|
|
||||||
from ...attrs import LANG
|
from ...attrs import LANG
|
||||||
|
from ...language import Language
|
||||||
|
from ...tokens import Doc
|
||||||
|
from ...util import DummyTokenizer
|
||||||
|
|
||||||
|
|
||||||
class ThaiDefaults(Language.Defaults):
|
class ThaiTokenizer(DummyTokenizer):
|
||||||
lex_attr_getters = dict(Language.Defaults.lex_attr_getters)
|
def __init__(self, cls, nlp=None):
|
||||||
lex_attr_getters[LANG] = lambda text: "th"
|
|
||||||
tokenizer_exceptions = dict(TOKENIZER_EXCEPTIONS)
|
|
||||||
tag_map = TAG_MAP
|
|
||||||
stop_words = STOP_WORDS
|
|
||||||
|
|
||||||
|
|
||||||
class Thai(Language):
|
|
||||||
lang = "th"
|
|
||||||
Defaults = ThaiDefaults
|
|
||||||
|
|
||||||
def make_doc(self, text):
|
|
||||||
try:
|
try:
|
||||||
from pythainlp.tokenize import word_tokenize
|
from pythainlp.tokenize import word_tokenize
|
||||||
except ImportError:
|
except ImportError:
|
||||||
|
@ -30,8 +20,35 @@ class Thai(Language):
|
||||||
"The Thai tokenizer requires the PyThaiNLP library: "
|
"The Thai tokenizer requires the PyThaiNLP library: "
|
||||||
"https://github.com/PyThaiNLP/pythainlp"
|
"https://github.com/PyThaiNLP/pythainlp"
|
||||||
)
|
)
|
||||||
words = [x for x in list(word_tokenize(text, "newmm"))]
|
|
||||||
return Doc(self.vocab, words=words, spaces=[False] * len(words))
|
self.word_tokenize = word_tokenize
|
||||||
|
self.vocab = nlp.vocab if nlp is not None else cls.create_vocab(nlp)
|
||||||
|
|
||||||
|
def __call__(self, text):
|
||||||
|
words = list(self.word_tokenize(text, "newmm"))
|
||||||
|
spaces = [False] * len(words)
|
||||||
|
return Doc(self.vocab, words=words, spaces=spaces)
|
||||||
|
|
||||||
|
|
||||||
|
class ThaiDefaults(Language.Defaults):
|
||||||
|
lex_attr_getters = dict(Language.Defaults.lex_attr_getters)
|
||||||
|
lex_attr_getters[LANG] = lambda _text: "th"
|
||||||
|
|
||||||
|
tokenizer_exceptions = dict(TOKENIZER_EXCEPTIONS)
|
||||||
|
tag_map = TAG_MAP
|
||||||
|
stop_words = STOP_WORDS
|
||||||
|
|
||||||
|
@classmethod
|
||||||
|
def create_tokenizer(cls, nlp=None):
|
||||||
|
return ThaiTokenizer(cls, nlp)
|
||||||
|
|
||||||
|
|
||||||
|
class Thai(Language):
|
||||||
|
lang = "th"
|
||||||
|
Defaults = ThaiDefaults
|
||||||
|
|
||||||
|
def make_doc(self, text):
|
||||||
|
return self.tokenizer(text)
|
||||||
|
|
||||||
|
|
||||||
__all__ = ["Thai"]
|
__all__ = ["Thai"]
|
||||||
|
|
|
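After this refactor the Thai class no longer overrides make_doc with inline pythainlp calls; instead ThaiDefaults.create_tokenizer() returns a ThaiTokenizer that wraps pythainlp's "newmm" word_tokenize and builds a Doc with whitespace set to False for every token. A hedged usage sketch (it assumes pythainlp is installed; the segmentation of the sample sentence is whatever the "newmm" engine produces, not an asserted output):

from spacy.lang.th import Thai

nlp = Thai()                     # ThaiTokenizer is created via ThaiDefaults
doc = nlp("ผมชอบกินข้าว")         # illustrative sample sentence
print([t.text for t in doc])     # word segments from the "newmm" engine
print([t.whitespace_ for t in doc])   # all empty: spaces=[False] * len(words)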
@ -5,6 +5,7 @@ from ...attrs import LIKE_NUM
|
||||||
|
|
||||||
|
|
||||||
# Thirteen, fifteen etc. are written separately: on üç
|
# Thirteen, fifteen etc. are written separately: on üç
|
||||||
|
|
||||||
_num_words = [
|
_num_words = [
|
||||||
"bir",
|
"bir",
|
||||||
"iki",
|
"iki",
|
||||||
|
@ -28,6 +29,7 @@ _num_words = [
|
||||||
"bin",
|
"bin",
|
||||||
"milyon",
|
"milyon",
|
||||||
"milyar",
|
"milyar",
|
||||||
|
"trilyon",
|
||||||
"katrilyon",
|
"katrilyon",
|
||||||
"kentilyon",
|
"kentilyon",
|
||||||
]
|
]
|
||||||
|
|
|
@ -353,10 +353,38 @@ def test_doc_api_similarity_match():
|
||||||
assert doc.similarity(doc2) == 0.0
|
assert doc.similarity(doc2) == 0.0
|
||||||
|
|
||||||
|
|
||||||
def test_lowest_common_ancestor(en_tokenizer):
|
@pytest.mark.parametrize(
|
||||||
tokens = en_tokenizer("the lazy dog slept")
|
"sentence,heads,lca_matrix",
|
||||||
doc = get_doc(tokens.vocab, words=[t.text for t in tokens], heads=[2, 1, 1, 0])
|
[
|
||||||
|
(
|
||||||
|
"the lazy dog slept",
|
||||||
|
[2, 1, 1, 0],
|
||||||
|
numpy.array([[0, 2, 2, 3], [2, 1, 2, 3], [2, 2, 2, 3], [3, 3, 3, 3]]),
|
||||||
|
),
|
||||||
|
(
|
||||||
|
"The lazy dog slept. The quick fox jumped",
|
||||||
|
[2, 1, 1, 0, -1, 2, 1, 1, 0],
|
||||||
|
numpy.array(
|
||||||
|
[
|
||||||
|
[0, 2, 2, 3, 3, -1, -1, -1, -1],
|
||||||
|
[2, 1, 2, 3, 3, -1, -1, -1, -1],
|
||||||
|
[2, 2, 2, 3, 3, -1, -1, -1, -1],
|
||||||
|
[3, 3, 3, 3, 3, -1, -1, -1, -1],
|
||||||
|
[3, 3, 3, 3, 4, -1, -1, -1, -1],
|
||||||
|
[-1, -1, -1, -1, -1, 5, 7, 7, 8],
|
||||||
|
[-1, -1, -1, -1, -1, 7, 6, 7, 8],
|
||||||
|
[-1, -1, -1, -1, -1, 7, 7, 7, 8],
|
||||||
|
[-1, -1, -1, -1, -1, 8, 8, 8, 8],
|
||||||
|
]
|
||||||
|
),
|
||||||
|
),
|
||||||
|
],
|
||||||
|
)
|
||||||
|
def test_lowest_common_ancestor(en_tokenizer, sentence, heads, lca_matrix):
|
||||||
|
tokens = en_tokenizer(sentence)
|
||||||
|
doc = get_doc(tokens.vocab, [t.text for t in tokens], heads=heads)
|
||||||
lca = doc.get_lca_matrix()
|
lca = doc.get_lca_matrix()
|
||||||
|
assert (lca == lca_matrix).all()
|
||||||
assert lca[1, 1] == 1
|
assert lca[1, 1] == 1
|
||||||
assert lca[0, 1] == 2
|
assert lca[0, 1] == 2
|
||||||
assert lca[1, 2] == 2
|
assert lca[1, 2] == 2
|
||||||
|
|
|
@ -80,10 +80,24 @@ def test_spans_lca_matrix(en_tokenizer):
|
||||||
tokens = en_tokenizer("the lazy dog slept")
|
tokens = en_tokenizer("the lazy dog slept")
|
||||||
doc = get_doc(tokens.vocab, words=[t.text for t in tokens], heads=[2, 1, 1, 0])
|
doc = get_doc(tokens.vocab, words=[t.text for t in tokens], heads=[2, 1, 1, 0])
|
||||||
lca = doc[:2].get_lca_matrix()
|
lca = doc[:2].get_lca_matrix()
|
||||||
assert lca[0, 0] == 0
|
assert lca.shape == (2, 2)
|
||||||
assert lca[0, 1] == -1
|
assert lca[0, 0] == 0 # the & the -> the
|
||||||
assert lca[1, 0] == -1
|
assert lca[0, 1] == -1 # the & lazy -> dog (out of span)
|
||||||
assert lca[1, 1] == 1
|
assert lca[1, 0] == -1 # lazy & the -> dog (out of span)
|
||||||
|
assert lca[1, 1] == 1 # lazy & lazy -> lazy
|
||||||
|
|
||||||
|
lca = doc[1:].get_lca_matrix()
|
||||||
|
assert lca.shape == (3, 3)
|
||||||
|
assert lca[0, 0] == 0 # lazy & lazy -> lazy
|
||||||
|
assert lca[0, 1] == 1 # lazy & dog -> dog
|
||||||
|
assert lca[0, 2] == 2 # lazy & slept -> slept
|
||||||
|
|
||||||
|
lca = doc[2:].get_lca_matrix()
|
||||||
|
assert lca.shape == (2, 2)
|
||||||
|
assert lca[0, 0] == 0 # dog & dog -> dog
|
||||||
|
assert lca[0, 1] == 1 # dog & slept -> slept
|
||||||
|
assert lca[1, 0] == 1 # slept & dog -> slept
|
||||||
|
assert lca[1, 1] == 1 # slept & slept -> slept
|
||||||
|
|
||||||
|
|
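The matrices asserted above encode, for every token pair (i, j), the index of their lowest common ancestor in the dependency tree, with -1 when the tokens belong to different trees or the ancestor falls outside the span. A pure-Python rerun of the "the lazy dog slept" case (relative heads [2, 1, 1, 0] correspond to absolute heads [2, 2, 3, 3]) reproduces the expected doc-level matrix:

import numpy

def ancestors(i, heads):
    # Token i, then its head, its head's head, ... up to the root
    # (the root is its own head).
    chain = [i]
    while heads[chain[-1]] != chain[-1]:
        chain.append(heads[chain[-1]])
    return chain

def lca_matrix(heads):
    n = len(heads)
    lca = -numpy.ones((n, n), dtype=int)
    for i in range(n):
        for j in range(n):
            anc_i = ancestors(i, heads)
            common = [a for a in ancestors(j, heads) if a in anc_i]
            if common:
                lca[i, j] = common[0]
    return lca

print(lca_matrix([2, 2, 3, 3]))
# [[0 2 2 3]
#  [2 1 2 3]
#  [2 2 2 3]
#  [3 3 3 3]]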
||||||
def test_span_similarity_match():
|
def test_span_similarity_match():
|
||||||
|
@ -158,15 +172,17 @@ def test_span_as_doc(doc):
|
||||||
|
|
||||||
|
|
||||||
def test_span_string_label(doc):
|
def test_span_string_label(doc):
|
||||||
span = Span(doc, 0, 1, label='hello')
|
span = Span(doc, 0, 1, label="hello")
|
||||||
assert span.label_ == 'hello'
|
assert span.label_ == "hello"
|
||||||
assert span.label == doc.vocab.strings['hello']
|
assert span.label == doc.vocab.strings["hello"]
|
||||||
|
|
||||||
|
|
||||||
def test_span_string_set_label(doc):
|
def test_span_string_set_label(doc):
|
||||||
span = Span(doc, 0, 1)
|
span = Span(doc, 0, 1)
|
||||||
span.label_ = 'hello'
|
span.label_ = "hello"
|
||||||
assert span.label_ == 'hello'
|
assert span.label_ == "hello"
|
||||||
assert span.label == doc.vocab.strings['hello']
|
assert span.label == doc.vocab.strings["hello"]
|
||||||
|
|
||||||
|
|
||||||
def test_span_ents_property(doc):
|
def test_span_ents_property(doc):
|
||||||
"""Test span.ents for the """
|
"""Test span.ents for the """
|
||||||
|
|
53
spacy/tests/lang/sv/test_exceptions.py
Normal file
53
spacy/tests/lang/sv/test_exceptions.py
Normal file
|
@ -0,0 +1,53 @@
|
||||||
|
# coding: utf8
|
||||||
|
from __future__ import unicode_literals
|
||||||
|
|
||||||
|
import pytest
|
||||||
|
|
||||||
|
|
||||||
|
SV_TOKEN_EXCEPTION_TESTS = [
|
||||||
|
('Smörsåsen används bl.a. till fisk', ['Smörsåsen', 'används', 'bl.a.', 'till', 'fisk']),
|
||||||
|
('Jag kommer först kl. 13 p.g.a. diverse förseningar', ['Jag', 'kommer', 'först', 'kl.', '13', 'p.g.a.', 'diverse', 'förseningar']),
|
||||||
|
('Anders I. tycker om ord med i i.', ["Anders", "I.", "tycker", "om", "ord", "med", "i", "i", "."])
|
||||||
|
]
|
||||||
|
|
||||||
|
|
||||||
|
@pytest.mark.parametrize('text,expected_tokens', SV_TOKEN_EXCEPTION_TESTS)
|
||||||
|
def test_sv_tokenizer_handles_exception_cases(sv_tokenizer, text, expected_tokens):
|
||||||
|
tokens = sv_tokenizer(text)
|
||||||
|
token_list = [token.text for token in tokens if not token.is_space]
|
||||||
|
assert expected_tokens == token_list
|
||||||
|
|
||||||
|
|
||||||
|
@pytest.mark.parametrize('text', ["driveru", "hajaru", "Serru", "Fixaru"])
|
||||||
|
def test_sv_tokenizer_handles_verb_exceptions(sv_tokenizer, text):
|
||||||
|
tokens = sv_tokenizer(text)
|
||||||
|
assert len(tokens) == 2
|
||||||
|
assert tokens[1].text == "u"
|
||||||
|
|
||||||
|
|
||||||
|
@pytest.mark.parametrize('text',
|
||||||
|
["bl.a", "m.a.o.", "Jan.", "Dec.", "kr.", "osv."])
|
||||||
|
def test_sv_tokenizer_handles_abbr(sv_tokenizer, text):
|
||||||
|
tokens = sv_tokenizer(text)
|
||||||
|
assert len(tokens) == 1
|
||||||
|
|
||||||
|
|
||||||
|
@pytest.mark.parametrize('text', ["Jul.", "jul.", "sön.", "Sön."])
|
||||||
|
def test_sv_tokenizer_handles_ambiguous_abbr(sv_tokenizer, text):
|
||||||
|
tokens = sv_tokenizer(text)
|
||||||
|
assert len(tokens) == 2
|
||||||
|
|
||||||
|
|
||||||
|
def test_sv_tokenizer_handles_exc_in_text(sv_tokenizer):
|
||||||
|
text = "Det er bl.a. ikke meningen"
|
||||||
|
tokens = sv_tokenizer(text)
|
||||||
|
assert len(tokens) == 5
|
||||||
|
assert tokens[2].text == "bl.a."
|
||||||
|
|
||||||
|
|
||||||
|
def test_sv_tokenizer_handles_custom_base_exc(sv_tokenizer):
|
||||||
|
text = "Her er noget du kan kigge i."
|
||||||
|
tokens = sv_tokenizer(text)
|
||||||
|
assert len(tokens) == 8
|
||||||
|
assert tokens[6].text == "i"
|
||||||
|
assert tokens[7].text == "."
|
15
spacy/tests/lang/sv/test_lemmatizer.py
Normal file
15
spacy/tests/lang/sv/test_lemmatizer.py
Normal file
|
@ -0,0 +1,15 @@
|
||||||
+# coding: utf-8
+from __future__ import unicode_literals
+
+import pytest
+
+
+@pytest.mark.parametrize('string,lemma', [('DNA-profilernas', 'DNA-profil'),
+                                          ('Elfenbenskustens', 'Elfenbenskusten'),
+                                          ('abortmotståndarens', 'abortmotståndare'),
+                                          ('kolesterols', 'kolesterol'),
+                                          ('portionssnusernas', 'portionssnus'),
+                                          ('åsyns', 'åsyn')])
+def test_lemmatizer_lookup_assigns(sv_tokenizer, string, lemma):
+    tokens = sv_tokenizer(string)
+    assert tokens[0].lemma_ == lemma
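As an aside, the lookup lemmatization exercised here can also be reproduced on a blank Swedish pipeline; a sketch, assuming the installed spaCy version ships the Swedish lookup table:

from spacy.lang.sv import Swedish

nlp = Swedish()           # rule-based tokenizer plus lookup lemmatizer, no model
doc = nlp(u"kolesterols")
print(doc[0].lemma_)      # expected to print 'kolesterol', per the test above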
spacy/tests/lang/sv/test_prefix_suffix_infix.py (new file, 37 lines)
@@ -0,0 +1,37 @@
+# coding: utf-8
+"""Test that tokenizer prefixes, suffixes and infixes are handled correctly."""
+from __future__ import unicode_literals
+
+import pytest
+
+@pytest.mark.parametrize('text', ["(under)"])
+def test_tokenizer_splits_no_special(sv_tokenizer, text):
+    tokens = sv_tokenizer(text)
+    assert len(tokens) == 3
+
+
+@pytest.mark.parametrize('text', ["gitta'r", "Björn's", "Lars'"])
+def test_tokenizer_handles_no_punct(sv_tokenizer, text):
+    tokens = sv_tokenizer(text)
+    assert len(tokens) == 1
+
+
+@pytest.mark.parametrize('text', ["svart.Gul", "Hej.Världen"])
+def test_tokenizer_splits_period_infix(sv_tokenizer, text):
+    tokens = sv_tokenizer(text)
+    assert len(tokens) == 3
+
+
+@pytest.mark.parametrize('text', ["Hej,Världen", "en,två"])
+def test_tokenizer_splits_comma_infix(sv_tokenizer, text):
+    tokens = sv_tokenizer(text)
+    assert len(tokens) == 3
+    assert tokens[0].text == text.split(",")[0]
+    assert tokens[1].text == ","
+    assert tokens[2].text == text.split(",")[1]
+
+
+@pytest.mark.parametrize('text', ["svart...Gul", "svart...gul"])
+def test_tokenizer_splits_ellipsis_infix(sv_tokenizer, text):
+    tokens = sv_tokenizer(text)
+    assert len(tokens) == 3
spacy/tests/lang/sv/test_text.py (new file, 21 lines)
@@ -0,0 +1,21 @@
+# coding: utf-8
+"""Test that longer and mixed texts are tokenized correctly."""
+
+from __future__ import unicode_literals
+
+import pytest
+
+def test_sv_tokenizer_handles_long_text(sv_tokenizer):
+    text = """Det var så härligt ute på landet. Det var sommar, majsen var gul, havren grön,
+höet var uppställt i stackar nere vid den gröna ängen, och där gick storken på sina långa,
+röda ben och snackade engelska, för det språket hade han lärt sig av sin mor.
+
+Runt om åkrar och äng låg den stora skogen, och mitt i skogen fanns djupa sjöar; jo, det var verkligen trevligt ute på landet!"""
+    tokens = sv_tokenizer(text)
+    assert len(tokens) == 86
+
+
+def test_sv_tokenizer_handles_trailing_dot_for_i_in_sentence(sv_tokenizer):
+    text = "Provar att tokenisera en mening med ord i."
+    tokens = sv_tokenizer(text)
+    assert len(tokens) == 9
@@ -5,27 +5,31 @@ from ..util import get_doc

 import pytest
 import numpy
-from numpy.testing import assert_array_equal


-@pytest.mark.parametrize('words,heads,matrix', [
-    (
-        'She created a test for spacy'.split(),
-        [1, 0, 1, -2, -1, -1],
-        numpy.array([
-            [0, 1, 1, 1, 1, 1],
-            [1, 1, 1, 1, 1, 1],
-            [1, 1, 2, 3, 3, 3],
-            [1, 1, 3, 3, 3, 3],
-            [1, 1, 3, 3, 4, 4],
-            [1, 1, 3, 3, 4, 5]], dtype=numpy.int32)
-    )
-])
-def test_issue2396(en_vocab, words, heads, matrix):
-    doc = get_doc(en_vocab, words=words, heads=heads)
+@pytest.mark.parametrize(
+    "sentence,heads,matrix",
+    [
+        (
+            "She created a test for spacy",
+            [1, 0, 1, -2, -1, -1],
+            numpy.array(
+                [
+                    [0, 1, 1, 1, 1, 1],
+                    [1, 1, 1, 1, 1, 1],
+                    [1, 1, 2, 3, 3, 3],
+                    [1, 1, 3, 3, 3, 3],
+                    [1, 1, 3, 3, 4, 4],
+                    [1, 1, 3, 3, 4, 5],
+                ],
+                dtype=numpy.int32,
+            ),
+        )
+    ],
+)
+def test_issue2396(en_tokenizer, sentence, heads, matrix):
+    tokens = en_tokenizer(sentence)
+    doc = get_doc(tokens.vocab, [t.text for t in tokens], heads=heads)
     span = doc[:]
-    assert_array_equal(doc.get_lca_matrix(), matrix)
-    assert_array_equal(span.get_lca_matrix(), matrix)
+    assert (doc.get_lca_matrix() == matrix).all()
+    assert (span.get_lca_matrix() == matrix).all()
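For context, outside the diff: a rough usage sketch of the `get_lca_matrix()` API this regression test covers, assuming an English model such as `en_core_web_sm` is installed:

import spacy

nlp = spacy.load("en_core_web_sm")
doc = nlp(u"She created a test for spacy")
lca = doc.get_lca_matrix()   # square array; lca[i, j] is the index of the
                             # lowest common ancestor of tokens i and j
print(lca[0, 5])             # e.g. 1 ("created") if the parse matches the test's heads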
@@ -10,7 +10,7 @@ def test_issue2901():
     """Test that `nlp` doesn't fail."""
     try:
         nlp = Japanese()
-    except:
+    except ImportError:
         pytest.skip()

     doc = nlp("pythonが大好きです")
spacy/tests/regression/test_issue3178.py (new file, 10 lines)
@@ -0,0 +1,10 @@
+from __future__ import unicode_literals
+import pytest
+import spacy
+
+
+@pytest.mark.models('fr')
+def test_issue1959(FR):
+    texts = ['Je suis la mauvaise herbe', "Me, myself and moi"]
+    for text in texts:
+        FR(text)
@@ -1075,21 +1075,30 @@ cdef int [:,:] _get_lca_matrix(Doc doc, int start, int end):
     cdef int [:,:] lca_matrix

     n_tokens= end - start
-    lca_matrix = numpy.empty((n_tokens, n_tokens), dtype=numpy.int32)
+    lca_mat = numpy.empty((n_tokens, n_tokens), dtype=numpy.int32)
+    lca_mat.fill(-1)
+    lca_matrix = lca_mat

-    for j in range(start, end):
-        token_j = doc[j]
+    for j in range(n_tokens):
+        token_j = doc[start + j]
         # the common ancestor of token and itself is itself:
         lca_matrix[j, j] = j
-        for k in range(j + 1, end):
-            lca = _get_tokens_lca(token_j, doc[k])
+        # we will only iterate through tokens in the same sentence
+        sent = token_j.sent
+        sent_start = sent.start
+        j_idx_in_sent = start + j - sent_start
+        n_missing_tokens_in_sent = len(sent) - j_idx_in_sent
+        # make sure we do not go past `end`, in cases where `end` < sent.end
+        max_range = min(j + n_missing_tokens_in_sent, end)
+        for k in range(j + 1, max_range):
+            lca = _get_tokens_lca(token_j, doc[start + k])
             # if lca is outside of span, we set it to -1
             if not start <= lca < end:
                 lca_matrix[j, k] = -1
                 lca_matrix[k, j] = -1
             else:
-                lca_matrix[j, k] = lca
-                lca_matrix[k, j] = lca
+                lca_matrix[j, k] = lca - start
+                lca_matrix[k, j] = lca - start

     return lca_matrix
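Outside the diff: the `lca - start` offsetting above makes span-level results relative to the span itself. A sketch of the observable behaviour, assuming `en_core_web_sm`:

import spacy

nlp = spacy.load("en_core_web_sm")
doc = nlp(u"She created a test for spacy")
span = doc[2:6]               # "a test for spacy"
lca = span.get_lca_matrix()   # 4x4 array
# Entries are indices relative to the span, and -1 where the lowest common
# ancestor (e.g. the root "created") lies outside the span.
print(lca)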
@@ -524,9 +524,9 @@ cdef class Span:
         return len(list(self.rights))

     property subtree:
-        """Tokens that descend from tokens in the span, but fall outside it.
+        """Tokens within the span and tokens which descend from them.

-        YIELDS (Token): A descendant of a token within the span.
+        YIELDS (Token): A token within the span, or a descendant from it.
         """
         def __get__(self):
             for word in self.lefts:
@@ -457,10 +457,11 @@ cdef class Token:
             yield from self.rights

     property subtree:
-        """A sequence of all the token's syntactic descendents.
+        """A sequence containing the token and all the token's syntactic
+        descendants.

         YIELDS (Token): A descendent token such that
-            `self.is_ancestor(descendent)`.
+            `self.is_ancestor(descendent) or token == self`.
         """
         def __get__(self):
             for word in self.lefts:
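Outside the diff: a short sketch of the behaviour the two updated docstrings describe, assuming `en_core_web_sm` and a typical parse:

import spacy

nlp = spacy.load("en_core_web_sm")
doc = nlp(u"Give it back! He pleaded.")
span = doc[0:3]                           # "Give it back"
print([t.text for t in span.subtree])     # the span's tokens plus their descendants
print([t.text for t in doc[5].subtree])   # "pleaded" together with its own subtree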
@@ -253,7 +253,6 @@ def get_entry_point(key, value):
 def is_in_jupyter():
     """Check if user is running spaCy from a Jupyter notebook by detecting the
     IPython kernel. Mainly used for the displaCy visualizer.

     RETURNS (bool): True if in Jupyter, False if not.
     """
     # https://stackoverflow.com/a/39662359/6400719
@@ -667,3 +666,19 @@ class SimpleFrozenDict(dict):

     def update(self, other):
         raise NotImplementedError(Errors.E095)
+
+
+class DummyTokenizer(object):
+    # add dummy methods for to_bytes, from_bytes, to_disk and from_disk to
+    # allow serialization (see #1557)
+    def to_bytes(self, **exclude):
+        return b''
+
+    def from_bytes(self, _bytes_data, **exclude):
+        return self
+
+    def to_disk(self, _path, **exclude):
+        return None
+
+    def from_disk(self, _path, **exclude):
+        return self
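Outside the diff: `DummyTokenizer` gives custom tokenizers no-op serialization hooks (see #1557). A hypothetical whitespace tokenizer built on it, assuming the class is importable from `spacy.util` as added here:

from spacy.tokens import Doc
from spacy.util import DummyTokenizer


class WhitespaceTokenizer(DummyTokenizer):
    # Inherits no-op to_bytes/from_bytes/to_disk/from_disk, so nlp.to_disk() works
    def __init__(self, vocab):
        self.vocab = vocab

    def __call__(self, text):
        words = text.split(" ")
        return Doc(self.vocab, words=words)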
@@ -150,3 +150,9 @@ p
     +dep-row("re", "repeated element")
     +dep-row("rs", "reported speech")
     +dep-row("sb", "subject")
+    +dep-row("sbp", "passivised subject")
+    +dep-row("sp", "subject or predicate")
+    +dep-row("svp", "separable verb prefix")
+    +dep-row("uc", "unit component")
+    +dep-row("vo", "vocative")
+    +dep-row("ROOT", "root")
@@ -5,7 +5,7 @@ include ../_includes/_mixins
 p
     | The #[code PhraseMatcher] lets you efficiently match large terminology
     | lists. While the #[+api("matcher") #[code Matcher]] lets you match
-    | squences based on lists of token descriptions, the #[code PhraseMatcher]
+    | sequences based on lists of token descriptions, the #[code PhraseMatcher]
     | accepts match patterns in the form of #[code Doc] objects.

 +h(2, "init") PhraseMatcher.__init__
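Outside the diff: a minimal `PhraseMatcher` sketch matching the corrected description, using the spaCy 2.x `add()` signature and assuming `en_core_web_sm`:

import spacy
from spacy.matcher import PhraseMatcher

nlp = spacy.load("en_core_web_sm")
matcher = PhraseMatcher(nlp.vocab)
patterns = [nlp(term) for term in ("machine learning", "natural language processing")]
matcher.add("TECH_TERMS", None, *patterns)       # patterns are Doc objects
doc = nlp(u"I work on natural language processing.")
matches = matcher(doc)                           # list of (match_id, start, end) tuples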
@@ -489,7 +489,7 @@ p
 +tag property
 +tag-model("parse")

-p Tokens that descend from tokens in the span, but fall outside it.
+p Tokens within the span and tokens which descend from them.

 +aside-code("Example").
     doc = nlp(u'Give it back! He pleaded.')

@@ -500,7 +500,7 @@ p Tokens that descend from tokens in the span, but fall outside it.
     +row("foot")
         +cell yields
         +cell #[code Token]
-        +cell A descendant of a token within the span.
+        +cell A token within the span, or a descendant from it.

 +h(2, "has_vector") Span.has_vector
 +tag property
@@ -1,3 +1,4 @@
+
 //- 💫 DOCS > API > TOKEN

 include ../_includes/_mixins

@@ -405,7 +406,7 @@ p
 +tag property
 +tag-model("parse")

-p A sequence of all the token's syntactic descendants.
+p A sequence containing the token and all the token's syntactic descendants.

 +aside-code("Example").
     doc = nlp(u'Give it back! He pleaded.')

@@ -416,7 +417,7 @@ p A sequence of all the token's syntactic descendants.
     +row("foot")
         +cell yields
         +cell #[code Token]
-        +cell A descendant token such that #[code self.is_ancestor(descendant)].
+        +cell A descendant token such that #[code self.is_ancestor(token) or token == self].

 +h(2, "is_sent_start") Token.is_sent_start
 +tag property
@@ -1083,20 +1083,31 @@
         "category": ["pipeline"]
     },
     {
-        "id": "spacy2conllu",
-        "title": "spaCy2CoNLLU",
+        "id": "spacy-conll",
+        "title": "spacy_conll",
         "slogan": "Parse text with spaCy and print the output in CoNLL-U format",
-        "description": "Simple script to parse text with spaCy and print the output in CoNLL-U format",
+        "description": "This module allows you to parse a text to CoNLL-U format. You can use it as a command line tool, or embed it in your own scripts.",
         "code_example": [
-            "python parse_as_conllu.py [-h] --input_file INPUT_FILE [--output_file OUTPUT_FILE] --model MODEL"
+            "from spacy_conll import Spacy2ConllParser",
+            "spacyconll = Spacy2ConllParser()",
+            "",
+            "# `parse` returns a generator of the parsed sentences",
+            "for parsed_sent in spacyconll.parse(input_str='I like cookies.\nWhat about you?\nI don't like 'em!'):",
+            "    do_something_(parsed_sent)",
+            "",
+            "# `parseprint` prints output to stdout (default) or a file (use `output_file` parameter)",
+            "# This method is called when using the command line",
+            "spacyconll.parseprint(input_str='I like cookies.')"
         ],
-        "code_language": "bash",
-        "author": "Raquel G. Alhama",
+        "code_language": "python",
+        "author": "Bram Vanroy",
         "author_links": {
-            "github": "rgalhama"
+            "github": "BramVanroy",
+            "website": "https://bramvanroy.be"
         },
-        "github": "rgalhama/spaCy2CoNLLU",
-        "category": ["training"]
+        "github": "BramVanroy/spacy_conll",
+        "category": ["standalone"]
     }
 ],
 "projectCats": {
@@ -159,7 +159,7 @@ p
     | To provide training examples to the entity recogniser, you'll first need
     | to create an instance of the #[+api("goldparse") #[code GoldParse]] class.
     | You can specify your annotations in a stand-off format or as token tags.
-    | If a character offset in your entity annotations don't fall on a token
+    | If a character offset in your entity annotations doesn't fall on a token
     | boundary, the #[code GoldParse] class will treat that annotation as a
     | missing value. This allows for more realistic training, because the
     | entity recogniser is allowed to learn from examples that may feature
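Outside the diff: a sketch of the stand-off entity format described above, using the spaCy 2.x `GoldParse` API:

import spacy
from spacy.gold import GoldParse

nlp = spacy.blank("en")
doc = nlp.make_doc(u"Facebook released a new feature")
# Stand-off format: (start_char, end_char, label) tuples
gold = GoldParse(doc, entities=[(0, 8, "ORG")])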
@@ -444,7 +444,7 @@ p
     | Let's say you're analysing user comments and you want to find out what
     | people are saying about Facebook. You want to start off by finding
     | adjectives following "Facebook is" or "Facebook was". This is obviously
-    | a very rudimentary solution, but it'll be fast, and a great way get an
+    | a very rudimentary solution, but it'll be fast, and a great way to get an
     | idea for what's in your data. Your pattern could look like this:

 +code.
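Outside the diff: a pattern along these lines with the spaCy 2.x `Matcher`; assumes `en_core_web_sm`:

import spacy
from spacy.matcher import Matcher

nlp = spacy.load("en_core_web_sm")
matcher = Matcher(nlp.vocab)
pattern = [{"LOWER": "facebook"}, {"LEMMA": "be"}, {"POS": "ADJ"}]
matcher.add("FB_IS_ADJ", None, pattern)
doc = nlp(u"Facebook is awesome")
for match_id, start, end in matcher(doc):
    print(doc[start:end].text)        # "Facebook is awesome"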
@@ -40,7 +40,7 @@ p
     | constrained to predict parses consistent with the sentence boundaries.

 +infobox("Important note", "⚠️")
-    | To prevent inconsitent state, you can only set boundaries #[em before] a
+    | To prevent inconsistent state, you can only set boundaries #[em before] a
     | document is parsed (and #[code Doc.is_parsed] is #[code False]). To
     | ensure that your component is added in the right place, you can set
     | #[code before='parser'] or #[code first=True] when adding it to the
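Outside the diff: a sketch of a custom boundary component added before the parser, as the note advises; assumes `en_core_web_sm`:

import spacy

def set_custom_boundaries(doc):
    # Must run before the parser, while doc.is_parsed is still False
    for token in doc[:-1]:
        if token.text == "...":
            doc[token.i + 1].is_sent_start = True
    return doc

nlp = spacy.load("en_core_web_sm")
nlp.add_pipe(set_custom_boundaries, before="parser")
doc = nlp(u"This is a sentence ... this is another one.")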
@@ -21,7 +21,7 @@ p
     | which needs to be split into two tokens: #[code {ORTH: "do"}] and
     | #[code {ORTH: "n't", LEMMA: "not"}]. The prefixes, suffixes and infixes
    | mosty define punctuation rules – for example, when to split off periods
-    | (at the end of a sentence), and when to leave token containing periods
+    | (at the end of a sentence), and when to leave tokens containing periods
     | intact (abbreviations like "U.S.").

 +graphic("/assets/img/language_data.svg")
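Outside the diff: the shape of a tokenizer exception entry like the "do" / "n't" example above (a sketch):

from spacy.symbols import ORTH, LEMMA

# Exceptions are keyed by the exact string to match, mapping to the subtokens
TOKENIZER_EXCEPTIONS = {
    "don't": [
        {ORTH: "do"},
        {ORTH: "n't", LEMMA: "not"},
    ]
}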
@@ -43,7 +43,7 @@ p

 p
     | This example shows how to use multiple cores to process text using
-    | spaCy and #[+a("https://pythonhosted.org/joblib/") Joblib]. We're
+    | spaCy and #[+a("https://joblib.readthedocs.io/en/latest/parallel.html") Joblib]. We're
     | exporting part-of-speech-tagged, true-cased, (very roughly)
     | sentence-separated text, with each "sentence" on a newline, and
     | spaces between tokens. Data is loaded from the IMDB movie reviews
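Outside the diff: a rough sketch of the multi-core pattern this example describes, assuming joblib and `en_core_web_sm` are installed:

import spacy
from joblib import Parallel, delayed

def transform_texts(batch):
    # Each worker loads its own pipeline; NER is not needed for this output
    nlp = spacy.load("en_core_web_sm", disable=["ner"])
    return ["\n".join(" ".join(tok.text for tok in sent) for sent in doc.sents)
            for doc in nlp.pipe(batch)]

texts = ["One movie review ...", "Another movie review ..."] * 500
batches = [texts[i:i + 250] for i in range(0, len(texts), 250)]
results = Parallel(n_jobs=4)(delayed(transform_texts)(b) for b in batches)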
@@ -74,7 +74,7 @@ p
     displacy.serve(doc, style='ent')

 p
-    | This feature is espeically handy if you're using displaCy to compare
+    | This feature is especially handy if you're using displaCy to compare
     | performance at different stages of a process, e.g. during training. Here
     | you could use the title for a brief description of the text example and
     | the number of iterations.
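Outside the diff: a sketch of the title feature discussed above; the headline is assumed to be passed via `doc.user_data['title']`, per the displaCy docs of this era:

import spacy
from spacy import displacy

nlp = spacy.load("en_core_web_sm")
doc = nlp(u"Apple is looking at buying a U.K. startup.")
doc.user_data["title"] = "Iteration 5"   # rendered as a headline above the entities
displacy.serve(doc, style="ent")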
@@ -61,7 +61,7 @@ p
     output_path.open('w', encoding='utf-8').write(svg)

 p
-    | The above code will generate the dependency visualizations as to
+    | The above code will generate the dependency visualizations as
     | two files, #[code This-is-an-example.svg] and #[code This-is-another-one.svg].
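Outside the diff: a sketch of writing per-sentence dependency visualizations to SVG files, mirroring the snippet referenced above; the `images` directory is hypothetical:

from pathlib import Path

import spacy
from spacy import displacy

nlp = spacy.load("en_core_web_sm")
sentences = [u"This is an example.", u"This is another one."]
for sent in sentences:
    doc = nlp(sent)
    svg = displacy.render(doc, style="dep")
    file_name = "-".join(w.text for w in doc if not w.is_punct) + ".svg"
    output_path = Path("images") / file_name
    output_path.open("w", encoding="utf-8").write(svg)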
@@ -24,7 +24,7 @@ include ../_includes/_mixins
     | standards.

 p
-    | The quickest way visualize #[code Doc] is to use
+    | The quickest way to visualize #[code Doc] is to use
     | #[+api("displacy#serve") #[code displacy.serve]]. This will spin up a
     | simple web server and let you view the result straight from your browser.
     | displaCy can either take a single #[code Doc] or a list of #[code Doc]