mirror of
https://github.com/explosion/spaCy.git
synced 2024-11-11 12:18:04 +03:00
Merge branch 'master' of https://github.com/explosion/spaCy into feature/better-faster-matcher
commit
f7dc64d2a3
106
.github/contributors/emulbreh.md
vendored
Normal file
@@ -0,0 +1,106 @@
+# spaCy contributor agreement
+
+This spaCy Contributor Agreement (**"SCA"**) is based on the
+[Oracle Contributor Agreement](http://www.oracle.com/technetwork/oca-405177.pdf).
+The SCA applies to any contribution that you make to any product or project
+managed by us (the **"project"**), and sets out the intellectual property rights
+you grant to us in the contributed materials. The term **"us"** shall mean
+[ExplosionAI UG (haftungsbeschränkt)](https://explosion.ai/legal). The term
+**"you"** shall mean the person or entity identified below.
+
+If you agree to be bound by these terms, fill in the information requested
+below and include the filled-in version with your first pull request, under the
+folder [`.github/contributors/`](/.github/contributors/). The name of the file
+should be your GitHub username, with the extension `.md`. For example, the user
+example_user would create the file `.github/contributors/example_user.md`.
+
+Read this agreement carefully before signing. These terms and conditions
+constitute a binding legal agreement.
+
+## Contributor Agreement
+
+1. The term "contribution" or "contributed materials" means any source code,
+object code, patch, tool, sample, graphic, specification, manual,
+documentation, or any other material posted or submitted by you to the project.
+
+2. With respect to any worldwide copyrights, or copyright applications and
+registrations, in your contribution:
+
+    * you hereby assign to us joint ownership, and to the extent that such
+    assignment is or becomes invalid, ineffective or unenforceable, you hereby
+    grant to us a perpetual, irrevocable, non-exclusive, worldwide, no-charge,
+    royalty-free, unrestricted license to exercise all rights under those
+    copyrights. This includes, at our option, the right to sublicense these same
+    rights to third parties through multiple levels of sublicensees or other
+    licensing arrangements;
+
+    * you agree that each of us can do all things in relation to your
+    contribution as if each of us were the sole owners, and if one of us makes
+    a derivative work of your contribution, the one who makes the derivative
+    work (or has it made will be the sole owner of that derivative work;
+
+    * you agree that you will not assert any moral rights in your contribution
+    against us, our licensees or transferees;
+
+    * you agree that we may register a copyright in your contribution and
+    exercise all ownership rights associated with it; and
+
+    * you agree that neither of us has any duty to consult with, obtain the
+    consent of, pay or render an accounting to the other for any use or
+    distribution of your contribution.
+
+3. With respect to any patents you own, or that you can license without payment
+to any third party, you hereby grant to us a perpetual, irrevocable,
+non-exclusive, worldwide, no-charge, royalty-free license to:
+
+    * make, have made, use, sell, offer to sell, import, and otherwise transfer
+    your contribution in whole or in part, alone or in combination with or
+    included in any product, work or materials arising out of the project to
+    which your contribution was submitted, and
+
+    * at our option, to sublicense these same rights to third parties through
+    multiple levels of sublicensees or other licensing arrangements.
+
+4. Except as set out above, you keep all right, title, and interest in your
+contribution. The rights that you grant to us under these terms are effective
+on the date you first submitted a contribution to us, even if your submission
+took place before the date you sign these terms.
+
+5. You covenant, represent, warrant and agree that:
+
+    * Each contribution that you submit is and shall be an original work of
+    authorship and you can legally grant the rights set out in this SCA;
+
+    * to the best of your knowledge, each contribution will not violate any
+    third party's copyrights, trademarks, patents, or other intellectual
+    property rights; and
+
+    * each contribution shall be in compliance with U.S. export control laws and
+    other applicable export and import laws. You agree to notify us if you
+    become aware of any circumstance which would make any of the foregoing
+    representations inaccurate in any respect. We may publicly disclose your
+    participation in the project, including the fact that you have signed the SCA.
+
+6. This SCA is governed by the laws of the State of California and applicable
+U.S. Federal law. Any choice of law rules will not apply.
+
+7. Please place an “x” on one of the applicable statement below. Please do NOT
+mark both statements:
+
+    * [x] I am signing on behalf of myself as an individual and no other person
+    or entity, including my employer, has or will have rights with respect to my
+    contributions.
+
+    * [ ] I am signing on behalf of my employer or a legal entity and I have the
+    actual authority to contractually bind that entity.
+
+## Contributor Details
+
+| Field | Entry |
+|------------------------------- | -------------------- |
+| Name | Johannes Dollinger |
+| Company name (if applicable) | |
+| Title or role (if applicable) | |
+| Date | 2018-02-13 |
+| GitHub username | emulbreh |
+| Website (optional) | |
106
.github/contributors/enerrio.md
vendored
Normal file
@@ -0,0 +1,106 @@
+# spaCy contributor agreement
+
+This spaCy Contributor Agreement (**"SCA"**) is based on the
+[Oracle Contributor Agreement](http://www.oracle.com/technetwork/oca-405177.pdf).
+The SCA applies to any contribution that you make to any product or project
+managed by us (the **"project"**), and sets out the intellectual property rights
+you grant to us in the contributed materials. The term **"us"** shall mean
+[ExplosionAI UG (haftungsbeschränkt)](https://explosion.ai/legal). The term
+**"you"** shall mean the person or entity identified below.
+
+If you agree to be bound by these terms, fill in the information requested
+below and include the filled-in version with your first pull request, under the
+folder [`.github/contributors/`](/.github/contributors/). The name of the file
+should be your GitHub username, with the extension `.md`. For example, the user
+example_user would create the file `.github/contributors/example_user.md`.
+
+Read this agreement carefully before signing. These terms and conditions
+constitute a binding legal agreement.
+
+## Contributor Agreement
+
+1. The term "contribution" or "contributed materials" means any source code,
+object code, patch, tool, sample, graphic, specification, manual,
+documentation, or any other material posted or submitted by you to the project.
+
+2. With respect to any worldwide copyrights, or copyright applications and
+registrations, in your contribution:
+
+    * you hereby assign to us joint ownership, and to the extent that such
+    assignment is or becomes invalid, ineffective or unenforceable, you hereby
+    grant to us a perpetual, irrevocable, non-exclusive, worldwide, no-charge,
+    royalty-free, unrestricted license to exercise all rights under those
+    copyrights. This includes, at our option, the right to sublicense these same
+    rights to third parties through multiple levels of sublicensees or other
+    licensing arrangements;
+
+    * you agree that each of us can do all things in relation to your
+    contribution as if each of us were the sole owners, and if one of us makes
+    a derivative work of your contribution, the one who makes the derivative
+    work (or has it made will be the sole owner of that derivative work;
+
+    * you agree that you will not assert any moral rights in your contribution
+    against us, our licensees or transferees;
+
+    * you agree that we may register a copyright in your contribution and
+    exercise all ownership rights associated with it; and
+
+    * you agree that neither of us has any duty to consult with, obtain the
+    consent of, pay or render an accounting to the other for any use or
+    distribution of your contribution.
+
+3. With respect to any patents you own, or that you can license without payment
+to any third party, you hereby grant to us a perpetual, irrevocable,
+non-exclusive, worldwide, no-charge, royalty-free license to:
+
+    * make, have made, use, sell, offer to sell, import, and otherwise transfer
+    your contribution in whole or in part, alone or in combination with or
+    included in any product, work or materials arising out of the project to
+    which your contribution was submitted, and
+
+    * at our option, to sublicense these same rights to third parties through
+    multiple levels of sublicensees or other licensing arrangements.
+
+4. Except as set out above, you keep all right, title, and interest in your
+contribution. The rights that you grant to us under these terms are effective
+on the date you first submitted a contribution to us, even if your submission
+took place before the date you sign these terms.
+
+5. You covenant, represent, warrant and agree that:
+
+    * Each contribution that you submit is and shall be an original work of
+    authorship and you can legally grant the rights set out in this SCA;
+
+    * to the best of your knowledge, each contribution will not violate any
+    third party's copyrights, trademarks, patents, or other intellectual
+    property rights; and
+
+    * each contribution shall be in compliance with U.S. export control laws and
+    other applicable export and import laws. You agree to notify us if you
+    become aware of any circumstance which would make any of the foregoing
+    representations inaccurate in any respect. We may publicly disclose your
+    participation in the project, including the fact that you have signed the SCA.
+
+6. This SCA is governed by the laws of the State of California and applicable
+U.S. Federal law. Any choice of law rules will not apply.
+
+7. Please place an “x” on one of the applicable statement below. Please do NOT
+mark both statements:
+
+    * [x] I am signing on behalf of myself as an individual and no other person
+    or entity, including my employer, has or will have rights with respect to my
+    contributions.
+
+    * [ ] I am signing on behalf of my employer or a legal entity and I have the
+    actual authority to contractually bind that entity.
+
+## Contributor Details
+
+| Field | Entry |
+|------------------------------- | -------------------- |
+| Name | Aaron Marquez |
+| Company name (if applicable) | |
+| Title or role (if applicable) | |
+| Date | 2/15/2018 |
+| GitHub username | enerrio |
+| Website (optional) | |
106
.github/contributors/oxinabox.md
vendored
Normal file
@@ -0,0 +1,106 @@
+# spaCy contributor agreement
+
+This spaCy Contributor Agreement (**"SCA"**) is based on the
+[Oracle Contributor Agreement](http://www.oracle.com/technetwork/oca-405177.pdf).
+The SCA applies to any contribution that you make to any product or project
+managed by us (the **"project"**), and sets out the intellectual property rights
+you grant to us in the contributed materials. The term **"us"** shall mean
+[ExplosionAI UG (haftungsbeschränkt)](https://explosion.ai/legal). The term
+**"you"** shall mean the person or entity identified below.
+
+If you agree to be bound by these terms, fill in the information requested
+below and include the filled-in version with your first pull request, under the
+folder [`.github/contributors/`](/.github/contributors/). The name of the file
+should be your GitHub username, with the extension `.md`. For example, the user
+example_user would create the file `.github/contributors/example_user.md`.
+
+Read this agreement carefully before signing. These terms and conditions
+constitute a binding legal agreement.
+
+## Contributor Agreement
+
+1. The term "contribution" or "contributed materials" means any source code,
+object code, patch, tool, sample, graphic, specification, manual,
+documentation, or any other material posted or submitted by you to the project.
+
+2. With respect to any worldwide copyrights, or copyright applications and
+registrations, in your contribution:
+
+    * you hereby assign to us joint ownership, and to the extent that such
+    assignment is or becomes invalid, ineffective or unenforceable, you hereby
+    grant to us a perpetual, irrevocable, non-exclusive, worldwide, no-charge,
+    royalty-free, unrestricted license to exercise all rights under those
+    copyrights. This includes, at our option, the right to sublicense these same
+    rights to third parties through multiple levels of sublicensees or other
+    licensing arrangements;
+
+    * you agree that each of us can do all things in relation to your
+    contribution as if each of us were the sole owners, and if one of us makes
+    a derivative work of your contribution, the one who makes the derivative
+    work (or has it made will be the sole owner of that derivative work;
+
+    * you agree that you will not assert any moral rights in your contribution
+    against us, our licensees or transferees;
+
+    * you agree that we may register a copyright in your contribution and
+    exercise all ownership rights associated with it; and
+
+    * you agree that neither of us has any duty to consult with, obtain the
+    consent of, pay or render an accounting to the other for any use or
+    distribution of your contribution.
+
+3. With respect to any patents you own, or that you can license without payment
+to any third party, you hereby grant to us a perpetual, irrevocable,
+non-exclusive, worldwide, no-charge, royalty-free license to:
+
+    * make, have made, use, sell, offer to sell, import, and otherwise transfer
+    your contribution in whole or in part, alone or in combination with or
+    included in any product, work or materials arising out of the project to
+    which your contribution was submitted, and
+
+    * at our option, to sublicense these same rights to third parties through
+    multiple levels of sublicensees or other licensing arrangements.
+
+4. Except as set out above, you keep all right, title, and interest in your
+contribution. The rights that you grant to us under these terms are effective
+on the date you first submitted a contribution to us, even if your submission
+took place before the date you sign these terms.
+
+5. You covenant, represent, warrant and agree that:
+
+    * Each contribution that you submit is and shall be an original work of
+    authorship and you can legally grant the rights set out in this SCA;
+
+    * to the best of your knowledge, each contribution will not violate any
+    third party's copyrights, trademarks, patents, or other intellectual
+    property rights; and
+
+    * each contribution shall be in compliance with U.S. export control laws and
+    other applicable export and import laws. You agree to notify us if you
+    become aware of any circumstance which would make any of the foregoing
+    representations inaccurate in any respect. We may publicly disclose your
+    participation in the project, including the fact that you have signed the SCA.
+
+6. This SCA is governed by the laws of the State of California and applicable
+U.S. Federal law. Any choice of law rules will not apply.
+
+7. Please place an “x” on one of the applicable statement below. Please do NOT
+mark both statements:
+
+    * [x] I am signing on behalf of myself as an individual and no other person
+    or entity, including my employer, has or will have rights with respect to my
+    contributions.
+
+    * [ ] I am signing on behalf of my employer or a legal entity and I have the
+    actual authority to contractually bind that entity.
+
+## Contributor Details
+
+| Field | Entry |
+|------------------------------- | -------------------- |
+| Name | Lyndon White |
+| Company name (if applicable) | |
+| Title or role (if applicable) | |
+| Date | 9/2/2018 |
+| GitHub username | oxinabox |
+| Website (optional) | white.ucc.asn.au |
106
.github/contributors/ursachec.md
vendored
Normal file
@@ -0,0 +1,106 @@
+# spaCy contributor agreement
+
+This spaCy Contributor Agreement (**"SCA"**) is based on the
+[Oracle Contributor Agreement](http://www.oracle.com/technetwork/oca-405177.pdf).
+The SCA applies to any contribution that you make to any product or project
+managed by us (the **"project"**), and sets out the intellectual property rights
+you grant to us in the contributed materials. The term **"us"** shall mean
+[ExplosionAI UG (haftungsbeschränkt)](https://explosion.ai/legal). The term
+**"you"** shall mean the person or entity identified below.
+
+If you agree to be bound by these terms, fill in the information requested
+below and include the filled-in version with your first pull request, under the
+folder [`.github/contributors/`](/.github/contributors/). The name of the file
+should be your GitHub username, with the extension `.md`. For example, the user
+example_user would create the file `.github/contributors/example_user.md`.
+
+Read this agreement carefully before signing. These terms and conditions
+constitute a binding legal agreement.
+
+## Contributor Agreement
+
+1. The term "contribution" or "contributed materials" means any source code,
+object code, patch, tool, sample, graphic, specification, manual,
+documentation, or any other material posted or submitted by you to the project.
+
+2. With respect to any worldwide copyrights, or copyright applications and
+registrations, in your contribution:
+
+    * you hereby assign to us joint ownership, and to the extent that such
+    assignment is or becomes invalid, ineffective or unenforceable, you hereby
+    grant to us a perpetual, irrevocable, non-exclusive, worldwide, no-charge,
+    royalty-free, unrestricted license to exercise all rights under those
+    copyrights. This includes, at our option, the right to sublicense these same
+    rights to third parties through multiple levels of sublicensees or other
+    licensing arrangements;
+
+    * you agree that each of us can do all things in relation to your
+    contribution as if each of us were the sole owners, and if one of us makes
+    a derivative work of your contribution, the one who makes the derivative
+    work (or has it made will be the sole owner of that derivative work;
+
+    * you agree that you will not assert any moral rights in your contribution
+    against us, our licensees or transferees;
+
+    * you agree that we may register a copyright in your contribution and
+    exercise all ownership rights associated with it; and
+
+    * you agree that neither of us has any duty to consult with, obtain the
+    consent of, pay or render an accounting to the other for any use or
+    distribution of your contribution.
+
+3. With respect to any patents you own, or that you can license without payment
+to any third party, you hereby grant to us a perpetual, irrevocable,
+non-exclusive, worldwide, no-charge, royalty-free license to:
+
+    * make, have made, use, sell, offer to sell, import, and otherwise transfer
+    your contribution in whole or in part, alone or in combination with or
+    included in any product, work or materials arising out of the project to
+    which your contribution was submitted, and
+
+    * at our option, to sublicense these same rights to third parties through
+    multiple levels of sublicensees or other licensing arrangements.
+
+4. Except as set out above, you keep all right, title, and interest in your
+contribution. The rights that you grant to us under these terms are effective
+on the date you first submitted a contribution to us, even if your submission
+took place before the date you sign these terms.
+
+5. You covenant, represent, warrant and agree that:
+
+    * Each contribution that you submit is and shall be an original work of
+    authorship and you can legally grant the rights set out in this SCA;
+
+    * to the best of your knowledge, each contribution will not violate any
+    third party's copyrights, trademarks, patents, or other intellectual
+    property rights; and
+
+    * each contribution shall be in compliance with U.S. export control laws and
+    other applicable export and import laws. You agree to notify us if you
+    become aware of any circumstance which would make any of the foregoing
+    representations inaccurate in any respect. We may publicly disclose your
+    participation in the project, including the fact that you have signed the SCA.
+
+6. This SCA is governed by the laws of the State of California and applicable
+U.S. Federal law. Any choice of law rules will not apply.
+
+7. Please place an “x” on one of the applicable statement below. Please do NOT
+mark both statements:
+
+    * [x] I am signing on behalf of myself as an individual and no other person
+    or entity, including my employer, has or will have rights with respect to my
+    contributions.
+
+    * [ ] I am signing on behalf of my employer or a legal entity and I have the
+    actual authority to contractually bind that entity.
+
+## Contributor Details
+
+| Field | Entry |
+|------------------------------- | ------------------------- |
+| Name | Claudiu-Vlad Ursache |
+| Company name (if applicable) | |
+| Title or role (if applicable) | |
+| Date | 2018-02-04 |
+| GitHub username | ursachec |
+| Website (optional) | https://www.cvursache.com |
@@ -18,9 +18,9 @@ cdef enum attr_id_t:
     IS_QUOTE
     IS_LEFT_PUNCT
     IS_RIGHT_PUNCT
+    IS_CURRENCY
 
-    FLAG18 = 18
-    FLAG19
+    FLAG19 = 19
     FLAG20
     FLAG21
     FLAG22
@@ -21,7 +21,7 @@ IDS = {
     "IS_QUOTE": IS_QUOTE,
     "IS_LEFT_PUNCT": IS_LEFT_PUNCT,
     "IS_RIGHT_PUNCT": IS_RIGHT_PUNCT,
-    "FLAG18": FLAG18,
+    "IS_CURRENCY": IS_CURRENCY,
     "FLAG19": FLAG19,
     "FLAG20": FLAG20,
     "FLAG21": FLAG21,
@@ -3,8 +3,6 @@ from __future__ import unicode_literals, division, print_function
 
 import plac
 from timeit import default_timer as timer
-import random
-import numpy.random
 
 from ..gold import GoldCorpus
 from ..util import prints
@@ -12,10 +10,6 @@ from .. import util
 from .. import displacy
 
 
-random.seed(0)
-numpy.random.seed(0)
-
-
 @plac.annotations(
     model=("model name or path", "positional", None, str),
     data_path=("location of JSON-formatted evaluation data", "positional",
@@ -31,6 +25,8 @@ def evaluate(model, data_path, gpu_id=-1, gold_preproc=False, displacy_path=None
     Evaluate a model. To render a sample of parses in a HTML file, set an
     output directory as the displacy_path argument.
     """
+
+    util.fix_random_seed()
     if gpu_id >= 0:
         util.use_gpu(gpu_id)
     util.set_env_log(False)
@@ -6,8 +6,6 @@ from pathlib import Path
 import tqdm
 from thinc.neural._classes.model import Model
 from timeit import default_timer as timer
-import random
-import numpy.random
 
 from ..gold import GoldCorpus, minibatch
 from ..util import prints
@@ -16,9 +14,6 @@ from .. import about
 from .. import displacy
 from ..compat import json_dumps
 
-random.seed(0)
-numpy.random.seed(0)
-
 
 @plac.annotations(
     lang=("model language", "positional", None, str),
@@ -45,6 +40,7 @@ def train(lang, output_dir, train_data, dev_data, n_iter=30, n_sents=0,
     """
     Train a model. Expects data in spaCy's JSON format.
     """
+    util.fix_random_seed()
     util.set_env_log(True)
     n_sents = n_sents or None
     output_path = util.ensure_path(output_dir)
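The two CLI hunks above replace module-level `random.seed(0)` and `numpy.random.seed(0)` calls with a single `util.fix_random_seed()` call inside the command function. The body of that helper is not part of this diff, so the following is only a minimal sketch of the idea: the `fix_random_seed` defined here is a hypothetical stand-in that seeds just the standard-library generator, and `sample_after_seeding` is an illustrative name, not spaCy code.

```python
import random


def fix_random_seed(seed=0):
    """Hypothetical stand-in for util.fix_random_seed (assumption, not spaCy's code).

    The removed lines seeded both `random` and `numpy.random`, so a real helper
    would presumably seed NumPy too. NumPy is omitted here only to keep the
    sketch dependency-free.
    """
    random.seed(seed)


def sample_after_seeding():
    # Seeding inside the function, not at import time, means importing the
    # module has no side effect on the global random state.
    fix_random_seed()
    return [random.random() for _ in range(3)]


first = sample_after_seeding()
second = sample_after_seeding()
print(first == second)
```

Seeding at call time also means that each invocation of the command starts from the same state, regardless of what other imported modules did to the generator beforehand.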
@@ -43,15 +43,15 @@ fix_text = ftfy.fix_text
 copy_array = copy_array
 izip = getattr(itertools, 'izip', zip)
 
-is_python2 = six.PY2
-is_python3 = six.PY3
 is_windows = sys.platform.startswith('win')
 is_linux = sys.platform.startswith('linux')
 is_osx = sys.platform == 'darwin'
 
+is_python2 = six.PY2
+is_python3 = six.PY3
+is_python_pre_3_5 = is_python2 or (is_python3 and sys.version_info[1]<5)
 
 if is_python2:
-    import imp
     bytes_ = str
     unicode_ = unicode  # noqa: F821
     basestring_ = basestring  # noqa: F821
@@ -60,7 +60,6 @@ if is_python2:
     path2str = lambda path: str(path).decode('utf8')
 
 elif is_python3:
-    import importlib.util
     bytes_ = bytes
     unicode_ = str
     basestring_ = str
@@ -111,9 +110,11 @@ def normalize_string_keys(old):
 
 def import_file(name, loc):
     loc = str(loc)
-    if is_python2:
+    if is_python_pre_3_5:
+        import imp
         return imp.load_source(name, loc)
     else:
+        import importlib.util
         spec = importlib.util.spec_from_file_location(name, str(loc))
         module = importlib.util.module_from_spec(spec)
         spec.loader.exec_module(module)
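The `import_file` change above moves the version-specific imports inside the function and widens the legacy branch to every interpreter older than 3.5, which matches the fact that `importlib.util.module_from_spec` only exists from Python 3.5 onward. As a rough illustration of the modern branch only, here is a Python 3 sketch; the final `return module` is an assumption because the hunk ends before the function does, and `demo` and `example_module` are hypothetical names used for the demonstration.

```python
import importlib.util
import os
import tempfile


def import_file(name, loc):
    """Load the Python source file at `loc` as a module called `name`."""
    spec = importlib.util.spec_from_file_location(name, str(loc))
    module = importlib.util.module_from_spec(spec)
    spec.loader.exec_module(module)
    return module  # assumed: the diff does not show the end of the function


def demo():
    # Write a throwaway module to disk and load it by path.
    with tempfile.TemporaryDirectory() as tmp:
        path = os.path.join(tmp, "example_module.py")
        with open(path, "w") as f:
            f.write("ANSWER = 6 * 7\n")
        module = import_file("example_module", path)
        return module.ANSWER


print(demo())
```

Importing lazily inside each branch means the module itself can be imported on any supported interpreter without touching a library that may be missing there.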
@@ -115,7 +115,7 @@ GLOSSARY = {
     'ADJA':     'adjective, attributive',
     'ADJD':     'adjective, adverbial or predicative',
     'APPO':     'postposition',
-    'APRP':     'preposition; circumposition left',
+    'APPR':     'preposition; circumposition left',
     'APPRART':  'preposition with article',
     'APZR':     'circumposition right',
     'ART':      'definite or indefinite article',
|
@ -69,6 +69,14 @@ def is_right_punct(text):
|
||||||
return text in right_punct
|
return text in right_punct
|
||||||
|
|
||||||
|
|
||||||
|
def is_currency(text):
|
||||||
|
# can be overwritten by lang with list of currency words, e.g. dollar, euro
|
||||||
|
for char in text:
|
||||||
|
if unicodedata.category(char) != 'Sc':
|
||||||
|
return False
|
||||||
|
return True
|
||||||
|
|
||||||
|
|
||||||
def like_email(text):
|
def like_email(text):
|
||||||
return bool(_like_email(text))
|
return bool(_like_email(text))
|
||||||
|
|
||||||
|
@ -164,5 +172,6 @@ LEX_ATTRS = {
|
||||||
attrs.IS_QUOTE: is_quote,
|
attrs.IS_QUOTE: is_quote,
|
||||||
attrs.IS_LEFT_PUNCT: is_left_punct,
|
attrs.IS_LEFT_PUNCT: is_left_punct,
|
||||||
attrs.IS_RIGHT_PUNCT: is_right_punct,
|
attrs.IS_RIGHT_PUNCT: is_right_punct,
|
||||||
|
attrs.IS_CURRENCY: is_currency,
|
||||||
attrs.LIKE_URL: like_url
|
attrs.LIKE_URL: like_url
|
||||||
}
|
}
|
||||||
|
|
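The `is_currency` check added above can be run on its own. The sketch below copies the logic from the hunk, relying only on the standard-library `unicodedata` module; the sample strings are taken from the parametrized test later in this commit.

```python
import unicodedata


def is_currency(text):
    # True only if every character is in Unicode category "Sc"
    # (Symbol, currency). Note that an empty string also yields True,
    # because the loop body never runs.
    for char in text:
        if unicodedata.category(char) != "Sc":
            return False
    return True


samples = ["$", "£", "€", "¥", "¢", "♥", "a", "dog"]
results = {text: is_currency(text) for text in samples}
```

As the comment in the hunk notes, this only covers currency symbols. Currency words such as "dollar" or "euro" would need a language-specific override.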
|
@ -624,7 +624,7 @@ class Language(object):
|
||||||
deserializers = OrderedDict((
|
deserializers = OrderedDict((
|
||||||
('vocab', lambda p: self.vocab.from_disk(p)),
|
('vocab', lambda p: self.vocab.from_disk(p)),
|
||||||
('tokenizer', lambda p: self.tokenizer.from_disk(p, vocab=False)),
|
('tokenizer', lambda p: self.tokenizer.from_disk(p, vocab=False)),
|
||||||
('meta.json', lambda p: self.meta.update(ujson.load(p.open('r'))))
|
('meta.json', lambda p: self.meta.update(util.read_json(p)))
|
||||||
))
|
))
|
||||||
for name, proc in self.pipeline:
|
for name, proc in self.pipeline:
|
||||||
if name in disable:
|
if name in disable:
|
||||||
|
@ -720,5 +720,5 @@ class DisabledPipes(list):
|
||||||
|
|
||||||
def _pipe(func, docs):
|
def _pipe(func, docs):
|
||||||
for doc in docs:
|
for doc in docs:
|
||||||
func(doc)
|
doc = func(doc)
|
||||||
yield doc
|
yield doc
|
||||||
|
|
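The one-line change to `_pipe` matters when a component returns a new object instead of modifying its input in place. The following is a minimal sketch with plain strings standing in for `Doc` objects; `pipe_discarding`, `pipe_keeping`, and `clean` are hypothetical names used only to contrast the two versions.

```python
def pipe_discarding(func, docs):
    # Old behaviour: the return value of func is thrown away.
    for doc in docs:
        func(doc)
        yield doc


def pipe_keeping(func, docs):
    # New behaviour: the object returned by func is what gets yielded.
    for doc in docs:
        doc = func(doc)
        yield doc


def clean(text):
    # Hypothetical component that returns a new object.
    return text.lower()


texts = ["Apple Is Looking"]
old_result = list(pipe_discarding(clean, texts))
new_result = list(pipe_keeping(clean, texts))
```

With the old version the cleaned value never reaches the caller. That appears to be the mismatch that the regression test for issue 1959, added in this commit, checks by comparing `EN(texts[0])` with the output of `EN.pipe(texts)`.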
|
@ -12,7 +12,7 @@ import numpy
|
||||||
from .typedefs cimport attr_t, flags_t
|
from .typedefs cimport attr_t, flags_t
|
||||||
from .attrs cimport IS_ALPHA, IS_ASCII, IS_DIGIT, IS_LOWER, IS_PUNCT, IS_SPACE
|
from .attrs cimport IS_ALPHA, IS_ASCII, IS_DIGIT, IS_LOWER, IS_PUNCT, IS_SPACE
|
||||||
from .attrs cimport IS_TITLE, IS_UPPER, LIKE_URL, LIKE_NUM, LIKE_EMAIL, IS_STOP
|
from .attrs cimport IS_TITLE, IS_UPPER, LIKE_URL, LIKE_NUM, LIKE_EMAIL, IS_STOP
|
||||||
from .attrs cimport IS_BRACKET, IS_QUOTE, IS_LEFT_PUNCT, IS_RIGHT_PUNCT, IS_OOV
|
from .attrs cimport IS_BRACKET, IS_QUOTE, IS_LEFT_PUNCT, IS_RIGHT_PUNCT, IS_CURRENCY, IS_OOV
|
||||||
from .attrs cimport PROB
|
from .attrs cimport PROB
|
||||||
from .attrs import intify_attrs
|
from .attrs import intify_attrs
|
||||||
from . import about
|
from . import about
|
||||||
|
@ -474,6 +474,14 @@ cdef class Lexeme:
|
||||||
def __set__(self, bint x):
|
def __set__(self, bint x):
|
||||||
Lexeme.c_set_flag(self.c, IS_RIGHT_PUNCT, x)
|
Lexeme.c_set_flag(self.c, IS_RIGHT_PUNCT, x)
|
||||||
|
|
||||||
|
property is_currency:
|
||||||
|
"""RETURNS (bool): Whether the lexeme is a currency symbol, e.g. $, €."""
|
||||||
|
def __get__(self):
|
||||||
|
return Lexeme.c_check_flag(self.c, IS_CURRENCY)
|
||||||
|
|
||||||
|
def __set__(self, bint x):
|
||||||
|
Lexeme.c_set_flag(self.c, IS_CURRENCY, x)
|
||||||
|
|
||||||
property like_url:
|
property like_url:
|
||||||
"""RETURNS (bool): Whether the lexeme resembles a URL."""
|
"""RETURNS (bool): Whether the lexeme resembles a URL."""
|
||||||
def __get__(self):
|
def __get__(self):
|
||||||
|
|
|
@ -144,7 +144,8 @@ class Pipe(object):
|
||||||
return create_default_optimizer(self.model.ops,
|
return create_default_optimizer(self.model.ops,
|
||||||
**self.cfg.get('optimizer', {}))
|
**self.cfg.get('optimizer', {}))
|
||||||
|
|
||||||
def begin_training(self, gold_tuples=tuple(), pipeline=None, sgd=None):
|
def begin_training(self, gold_tuples=tuple(), pipeline=None, sgd=None,
|
||||||
|
**kwargs):
|
||||||
"""Initialize the pipe for training, using data examples if available.
|
"""Initialize the pipe for training, using data examples if available.
|
||||||
If no model has been initialized yet, the model is added."""
|
If no model has been initialized yet, the model is added."""
|
||||||
if self.model is True:
|
if self.model is True:
|
||||||
|
@ -214,7 +215,8 @@ class Pipe(object):
|
||||||
|
|
||||||
def _load_cfg(path):
|
def _load_cfg(path):
|
||||||
if path.exists():
|
if path.exists():
|
||||||
return ujson.load(path.open())
|
with path.open() as file_:
|
||||||
|
return ujson.load(file_)
|
||||||
else:
|
else:
|
||||||
return {}
|
return {}
|
||||||
|
|
||||||
|
@ -344,7 +346,8 @@ class Tensorizer(Pipe):
|
||||||
loss = (d_scores**2).sum()
|
loss = (d_scores**2).sum()
|
||||||
return loss, d_scores
|
return loss, d_scores
|
||||||
|
|
||||||
def begin_training(self, gold_tuples=tuple(), pipeline=None, sgd=None):
|
def begin_training(self, gold_tuples=tuple(), pipeline=None, sgd=None,
|
||||||
|
**kwargs):
|
||||||
"""Allocate models, pre-process training data and acquire an
|
"""Allocate models, pre-process training data and acquire an
|
||||||
optimizer.
|
optimizer.
|
||||||
|
|
||||||
|
@ -467,7 +470,8 @@ class Tagger(Pipe):
|
||||||
d_scores = self.model.ops.unflatten(d_scores, [len(d) for d in docs])
|
d_scores = self.model.ops.unflatten(d_scores, [len(d) for d in docs])
|
||||||
return float(loss), d_scores
|
return float(loss), d_scores
|
||||||
|
|
||||||
def begin_training(self, gold_tuples=tuple(), pipeline=None, sgd=None):
|
def begin_training(self, gold_tuples=tuple(), pipeline=None, sgd=None,
|
||||||
|
**kwargs):
|
||||||
orig_tag_map = dict(self.vocab.morphology.tag_map)
|
orig_tag_map = dict(self.vocab.morphology.tag_map)
|
||||||
new_tag_map = OrderedDict()
|
new_tag_map = OrderedDict()
|
||||||
for raw_text, annots_brackets in gold_tuples:
|
for raw_text, annots_brackets in gold_tuples:
|
||||||
|
@ -580,7 +584,8 @@ class Tagger(Pipe):
|
||||||
def load_model(p):
|
def load_model(p):
|
||||||
if self.model is True:
|
if self.model is True:
|
||||||
self.model = self.Model(self.vocab.morphology.n_tags, **self.cfg)
|
self.model = self.Model(self.vocab.morphology.n_tags, **self.cfg)
|
||||||
self.model.from_bytes(p.open('rb').read())
|
with p.open('rb') as file_:
|
||||||
|
self.model.from_bytes(file_.read())
|
||||||
|
|
||||||
def load_tag_map(p):
|
def load_tag_map(p):
|
||||||
with p.open('rb') as file_:
|
with p.open('rb') as file_:
|
||||||
|
@ -641,7 +646,7 @@ class MultitaskObjective(Tagger):
|
||||||
pass
|
pass
|
||||||
|
|
||||||
def begin_training(self, gold_tuples=tuple(), pipeline=None, tok2vec=None,
|
def begin_training(self, gold_tuples=tuple(), pipeline=None, tok2vec=None,
|
||||||
sgd=None):
|
sgd=None, **kwargs):
|
||||||
gold_tuples = nonproj.preprocess_training_data(gold_tuples)
|
gold_tuples = nonproj.preprocess_training_data(gold_tuples)
|
||||||
for raw_text, annots_brackets in gold_tuples:
|
for raw_text, annots_brackets in gold_tuples:
|
||||||
for annots, brackets in annots_brackets:
|
for annots, brackets in annots_brackets:
|
||||||
|
@ -766,7 +771,7 @@ class SimilarityHook(Pipe):
|
||||||
def update(self, doc1_doc2, golds, sgd=None, drop=0.):
|
def update(self, doc1_doc2, golds, sgd=None, drop=0.):
|
||||||
sims, bp_sims = self.model.begin_update(doc1_doc2, drop=drop)
|
sims, bp_sims = self.model.begin_update(doc1_doc2, drop=drop)
|
||||||
|
|
||||||
def begin_training(self, _=tuple(), pipeline=None, sgd=None):
|
def begin_training(self, _=tuple(), pipeline=None, sgd=None, **kwargs):
|
||||||
"""Allocate model, using width from tensorizer in pipeline.
|
"""Allocate model, using width from tensorizer in pipeline.
|
||||||
|
|
||||||
gold_tuples (iterable): Gold-standard training data.
|
gold_tuples (iterable): Gold-standard training data.
|
||||||
|
@ -887,6 +892,7 @@ cdef class DependencyParser(Parser):
|
||||||
self._multitasks.append(labeller)
|
self._multitasks.append(labeller)
|
||||||
|
|
||||||
def init_multitask_objectives(self, gold_tuples, pipeline, sgd=None, **cfg):
|
def init_multitask_objectives(self, gold_tuples, pipeline, sgd=None, **cfg):
|
||||||
|
self.add_multitask_objective('tag')
|
||||||
for labeller in self._multitasks:
|
for labeller in self._multitasks:
|
||||||
tok2vec = self.model[0]
|
tok2vec = self.model[0]
|
||||||
labeller.begin_training(gold_tuples, pipeline=pipeline,
|
labeller.begin_training(gold_tuples, pipeline=pipeline,
|
||||||
|
|
|
@ -17,9 +17,9 @@ cdef enum symbol_t:
|
||||||
IS_QUOTE
|
IS_QUOTE
|
||||||
IS_LEFT_PUNCT
|
IS_LEFT_PUNCT
|
||||||
IS_RIGHT_PUNCT
|
IS_RIGHT_PUNCT
|
||||||
|
IS_CURRENCY
|
||||||
|
|
||||||
FLAG18 = 18
|
FLAG19 = 19
|
||||||
FLAG19
|
|
||||||
FLAG20
|
FLAG20
|
||||||
FLAG21
|
FLAG21
|
||||||
FLAG22
|
FLAG22
|
||||||
|
|
|
@ -22,8 +22,8 @@ IDS = {
|
||||||
"IS_QUOTE": IS_QUOTE,
|
"IS_QUOTE": IS_QUOTE,
|
||||||
"IS_LEFT_PUNCT": IS_LEFT_PUNCT,
|
"IS_LEFT_PUNCT": IS_LEFT_PUNCT,
|
||||||
"IS_RIGHT_PUNCT": IS_RIGHT_PUNCT,
|
"IS_RIGHT_PUNCT": IS_RIGHT_PUNCT,
|
||||||
|
"IS_CURRENCY": IS_CURRENCY,
|
||||||
|
|
||||||
"FLAG18": FLAG18,
|
|
||||||
"FLAG19": FLAG19,
|
"FLAG19": FLAG19,
|
||||||
"FLAG20": FLAG20,
|
"FLAG20": FLAG20,
|
||||||
"FLAG21": FLAG21,
|
"FLAG21": FLAG21,
|
||||||
|
|
|
@ -390,6 +390,22 @@ cdef class ArcEager(TransitionSystem):
|
||||||
gold.c.labels[i] = self.strings.add(label)
|
gold.c.labels[i] = self.strings.add(label)
|
||||||
return gold
|
return gold
|
||||||
|
|
||||||
|
def get_beam_parses(self, Beam beam):
|
||||||
|
parses = []
|
||||||
|
probs = beam.probs
|
||||||
|
for i in range(beam.size):
|
||||||
|
state = <StateC*>beam.at(i)
|
||||||
|
if state.is_final():
|
||||||
|
self.finalize_state(state)
|
||||||
|
prob = probs[i]
|
||||||
|
parse = []
|
||||||
|
for j in range(state.length):
|
||||||
|
head = state.H(j)
|
||||||
|
label = self.strings[state._sent[j].dep]
|
||||||
|
parse.append((head, j, label))
|
||||||
|
parses.append((prob, parse))
|
||||||
|
return parses
|
||||||
|
|
||||||
cdef Transition lookup_transition(self, object name) except *:
|
cdef Transition lookup_transition(self, object name) except *:
|
||||||
if '-' in name:
|
if '-' in name:
|
||||||
move_str, label_str = name.split('-', 1)
|
move_str, label_str = name.split('-', 1)
|
||||||
|
|
|
@ -835,6 +835,7 @@ cdef class Parser:
|
||||||
sgd = self.create_optimizer()
|
sgd = self.create_optimizer()
|
||||||
self.model[1].begin_training(
|
self.model[1].begin_training(
|
||||||
self.model[1].ops.allocate((5, cfg['token_vector_width'])))
|
self.model[1].ops.allocate((5, cfg['token_vector_width'])))
|
||||||
|
if pipeline is not None:
|
||||||
self.init_multitask_objectives(gold_tuples, pipeline, sgd=sgd, **cfg)
|
self.init_multitask_objectives(gold_tuples, pipeline, sgd=sgd, **cfg)
|
||||||
link_vectors_to_models(self.vocab)
|
link_vectors_to_models(self.vocab)
|
||||||
else:
|
else:
|
||||||
|
@ -887,7 +888,7 @@ cdef class Parser:
|
||||||
deserializers = {
|
deserializers = {
|
||||||
'vocab': lambda p: self.vocab.from_disk(p),
|
'vocab': lambda p: self.vocab.from_disk(p),
|
||||||
'moves': lambda p: self.moves.from_disk(p, strings=False),
|
'moves': lambda p: self.moves.from_disk(p, strings=False),
|
||||||
'cfg': lambda p: self.cfg.update(ujson.load(p.open())),
|
'cfg': lambda p: self.cfg.update(util.read_json(p)),
|
||||||
'model': lambda p: None
|
'model': lambda p: None
|
||||||
}
|
}
|
||||||
util.from_disk(path, deserializers, exclude)
|
util.from_disk(path, deserializers, exclude)
|
||||||
|
|
|
@ -2,7 +2,7 @@
|
||||||
from __future__ import unicode_literals
|
from __future__ import unicode_literals
|
||||||
|
|
||||||
from ...attrs import intify_attrs, ORTH, NORM, LEMMA, IS_ALPHA
|
from ...attrs import intify_attrs, ORTH, NORM, LEMMA, IS_ALPHA
|
||||||
from ...lang.lex_attrs import is_punct, is_ascii, like_url, word_shape
|
from ...lang.lex_attrs import is_punct, is_ascii, is_currency, like_url, word_shape
|
||||||
|
|
||||||
import pytest
|
import pytest
|
||||||
|
|
||||||
|
@ -37,6 +37,13 @@ def test_lex_attrs_is_ascii(text, match):
|
||||||
assert is_ascii(text) == match
|
assert is_ascii(text) == match
|
||||||
|
|
||||||
|
|
||||||
|
@pytest.mark.parametrize('text,match', [('$', True), ('£', True), ('♥', False),
|
||||||
|
('€', True), ('¥', True), ('¢', True),
|
||||||
|
('a', False), ('www.google.com', False), ('dog', False)])
|
||||||
|
def test_lex_attrs_is_currency(text, match):
|
||||||
|
assert is_currency(text) == match
|
||||||
|
|
||||||
|
|
||||||
@pytest.mark.parametrize('text,match', [
|
@pytest.mark.parametrize('text,match', [
|
||||||
('www.google.com', True), ('google.com', True), ('sydney.com', True),
|
('www.google.com', True), ('google.com', True), ('sydney.com', True),
|
||||||
('2girls1cup.org', True), ('http://stupid', True), ('www.hi', True),
|
('2girls1cup.org', True), ('http://stupid', True), ('www.hi', True),
|
||||||
|
|
23
spacy/tests/regression/test_issue1959.py
Normal file
|
@ -0,0 +1,23 @@
|
||||||
|
# coding: utf8
|
||||||
|
from __future__ import unicode_literals
|
||||||
|
import pytest
|
||||||
|
|
||||||
|
|
||||||
|
@pytest.mark.models('en')
|
||||||
|
def test_issue1959(EN):
|
||||||
|
texts = ['Apple is looking at buying U.K. startup for $1 billion.']
|
||||||
|
# nlp = load_test_model('en_core_web_sm')
|
||||||
|
EN.add_pipe(clean_component, name='cleaner', after='ner')
|
||||||
|
doc = EN(texts[0])
|
||||||
|
doc_pipe = [doc_pipe for doc_pipe in EN.pipe(texts)]
|
||||||
|
assert doc == doc_pipe[0]
|
||||||
|
|
||||||
|
|
||||||
|
def clean_component(doc):
|
||||||
|
""" Clean up text. Make lowercase and remove punctuation and stopwords """
|
||||||
|
# Remove punctuation, symbols (#) and stopwords
|
||||||
|
doc = [tok.text.lower() for tok in doc if (not tok.is_stop
|
||||||
|
and tok.pos_ != 'PUNCT' and
|
||||||
|
tok.pos_ != 'SYM')]
|
||||||
|
doc = ' '.join(doc)
|
||||||
|
return doc
|
28
spacy/tests/serialize/test_serialize_language.py
Normal file
|
@ -0,0 +1,28 @@
|
||||||
|
# coding: utf-8
|
||||||
|
from __future__ import unicode_literals
|
||||||
|
|
||||||
|
from ..util import make_tempdir
|
||||||
|
from ...language import Language
|
||||||
|
|
||||||
|
import pytest
|
||||||
|
|
||||||
|
|
||||||
|
@pytest.fixture
|
||||||
|
def meta_data():
|
||||||
|
return {
|
||||||
|
'name': 'name-in-fixture',
|
||||||
|
'version': 'version-in-fixture',
|
||||||
|
'description': 'description-in-fixture',
|
||||||
|
'author': 'author-in-fixture',
|
||||||
|
'email': 'email-in-fixture',
|
||||||
|
'url': 'url-in-fixture',
|
||||||
|
'license': 'license-in-fixture',
|
||||||
|
}
|
||||||
|
|
||||||
|
|
||||||
|
def test_serialize_language_meta_disk(meta_data):
|
||||||
|
language = Language(meta=meta_data)
|
||||||
|
with make_tempdir() as d:
|
||||||
|
language.to_disk(d)
|
||||||
|
new_language = Language().from_disk(d)
|
||||||
|
assert new_language.meta == language.meta
|
|
@ -15,7 +15,7 @@ from ..lexeme cimport Lexeme
|
||||||
from .. import parts_of_speech
|
from .. import parts_of_speech
|
||||||
from ..attrs cimport IS_ALPHA, IS_ASCII, IS_DIGIT, IS_LOWER, IS_PUNCT, IS_SPACE
|
from ..attrs cimport IS_ALPHA, IS_ASCII, IS_DIGIT, IS_LOWER, IS_PUNCT, IS_SPACE
|
||||||
from ..attrs cimport IS_BRACKET, IS_QUOTE, IS_LEFT_PUNCT, IS_RIGHT_PUNCT
|
from ..attrs cimport IS_BRACKET, IS_QUOTE, IS_LEFT_PUNCT, IS_RIGHT_PUNCT
|
||||||
from ..attrs cimport IS_OOV, IS_TITLE, IS_UPPER, LIKE_URL, LIKE_NUM, LIKE_EMAIL
|
from ..attrs cimport IS_OOV, IS_TITLE, IS_UPPER, IS_CURRENCY, LIKE_URL, LIKE_NUM, LIKE_EMAIL
|
||||||
from ..attrs cimport IS_STOP, ID, ORTH, NORM, LOWER, SHAPE, PREFIX, SUFFIX
|
from ..attrs cimport IS_STOP, ID, ORTH, NORM, LOWER, SHAPE, PREFIX, SUFFIX
|
||||||
from ..attrs cimport LENGTH, CLUSTER, LEMMA, POS, TAG, DEP
|
from ..attrs cimport LENGTH, CLUSTER, LEMMA, POS, TAG, DEP
|
||||||
from ..compat import is_config
|
from ..compat import is_config
|
||||||
|
@ -855,6 +855,11 @@ cdef class Token:
|
||||||
def __get__(self):
|
def __get__(self):
|
||||||
return Lexeme.c_check_flag(self.c.lex, IS_RIGHT_PUNCT)
|
return Lexeme.c_check_flag(self.c.lex, IS_RIGHT_PUNCT)
|
||||||
|
|
||||||
|
property is_currency:
|
||||||
|
"""RETURNS (bool): Whether the token is a currency symbol."""
|
||||||
|
def __get__(self):
|
||||||
|
return Lexeme.c_check_flag(self.c.lex, IS_CURRENCY)
|
||||||
|
|
||||||
property like_url:
|
property like_url:
|
||||||
"""RETURNS (bool): Whether the token resembles a URL."""
|
"""RETURNS (bool): Whether the token resembles a URL."""
|
||||||
def __get__(self):
|
def __get__(self):
|
||||||
|
|
|
@ -17,6 +17,7 @@ from thinc.neural._classes.model import Model
|
||||||
import functools
|
import functools
|
||||||
import cytoolz
|
import cytoolz
|
||||||
import itertools
|
import itertools
|
||||||
|
import numpy.random
|
||||||
|
|
||||||
from .symbols import ORTH
|
from .symbols import ORTH
|
||||||
from .compat import cupy, CudaStream, path2str, basestring_, input_, unicode_
|
from .compat import cupy, CudaStream, path2str, basestring_, input_, unicode_
|
||||||
|
@ -623,3 +624,8 @@ def use_gpu(gpu_id):
|
||||||
Model.ops = CupyOps()
|
Model.ops = CupyOps()
|
||||||
Model.Ops = CupyOps
|
Model.Ops = CupyOps
|
||||||
return device
|
return device
|
||||||
|
|
||||||
|
|
||||||
|
def fix_random_seed(seed=0):
|
||||||
|
random.seed(seed)
|
||||||
|
numpy.random.seed(seed)
|
||||||
|
|
|
@ -347,7 +347,8 @@ cdef class Vectors:
|
||||||
"""
|
"""
|
||||||
def load_key2row(path):
|
def load_key2row(path):
|
||||||
if path.exists():
|
if path.exists():
|
||||||
self.key2row = msgpack.load(path.open('rb'))
|
with path.open('rb') as file_:
|
||||||
|
self.key2row = msgpack.load(file_)
|
||||||
for key, row in self.key2row.items():
|
for key, row in self.key2row.items():
|
||||||
if row in self._unset:
|
if row in self._unset:
|
||||||
self._unset.remove(row)
|
self._unset.remove(row)
|
||||||
|
|
|
@ -10,6 +10,9 @@ nav.c-nav.u-text.js-nav(class=landing ? "c-nav--theme" : null)
|
||||||
li.c-nav__menu__item(class=is_active ? "is-active" : null)
|
li.c-nav__menu__item(class=is_active ? "is-active" : null)
|
||||||
+a(url)(tabindex=is_active ? "-1" : null)=item
|
+a(url)(tabindex=is_active ? "-1" : null)=item
|
||||||
|
|
||||||
|
li.c-nav__menu__item.u-hidden-xs
|
||||||
|
+a("https://survey.spacy.io", true) User Survey 2018
|
||||||
|
|
||||||
li.c-nav__menu__item.u-hidden-xs
|
li.c-nav__menu__item.u-hidden-xs
|
||||||
+a(gh("spaCy"))(aria-label="GitHub") #[+icon("github", 20)]
|
+a(gh("spaCy"))(aria-label="GitHub") #[+icon("github", 20)]
|
||||||
|
|
||||||
|
|
|
@ -13,7 +13,7 @@ p
|
||||||
| Their results and subsequent discussions helped us develop a novel
|
| Their results and subsequent discussions helped us develop a novel
|
||||||
| psychologically-motivated technique to improve spaCy's accuracy, which
|
| psychologically-motivated technique to improve spaCy's accuracy, which
|
||||||
| we published in joint work with Macquarie University
|
| we published in joint work with Macquarie University
|
||||||
| #[+a("https://aclweb.org/anthology/D/D15/D15-1162.pdf") (Honnibal and Johnson, 2015)].
|
| #[+a("https://www.aclweb.org/anthology/D/D15/D15-1162.pdf") (Honnibal and Johnson, 2015)].
|
||||||
|
|
||||||
include _benchmarks-choi-2015
|
include _benchmarks-choi-2015
|
||||||
|
|
||||||
|
|
|
@ -38,9 +38,10 @@ p
|
||||||
| #[code spacy/data] directory. This means your user needs permission to do
|
| #[code spacy/data] directory. This means your user needs permission to do
|
||||||
| this. The above error mostly occurs when doing a system-wide installation,
|
| this. The above error mostly occurs when doing a system-wide installation,
|
||||||
| which will create the symlinks in a system directory. Run the
|
| which will create the symlinks in a system directory. Run the
|
||||||
| #[code download] or #[code link] command as administrator, or use a
|
| #[code download] or #[code link] command as administrator (on Windows,
|
||||||
| #[code virtualenv] to install spaCy in a user directory, instead
|
| simply right-click on your terminal or shell and select "Run as
|
||||||
| of doing a system-wide installation.
|
| Administrator"), or use a #[code virtualenv] to install spaCy in a user
|
||||||
|
| directory, instead of doing a system-wide installation.
|
||||||
|
|
||||||
+h(3, "no-cache-dir") No such option: --no-cache-dir
|
+h(3, "no-cache-dir") No such option: --no-cache-dir
|
||||||
|
|
||||||
|
|
|
@ -65,9 +65,9 @@ p
|
||||||
- var style = [0, 1, 0, 1, 0]
|
- var style = [0, 1, 0, 1, 0]
|
||||||
+annotation-row(["Autonomous", "amod", "cars", "NOUN", ""], style)
|
+annotation-row(["Autonomous", "amod", "cars", "NOUN", ""], style)
|
||||||
+annotation-row(["cars", "nsubj", "shift", "VERB", "Autonomous"], style)
|
+annotation-row(["cars", "nsubj", "shift", "VERB", "Autonomous"], style)
|
||||||
+annotation-row(["shift", "ROOT", "shift", "VERB", "cars, liability"], style)
|
+annotation-row(["shift", "ROOT", "shift", "VERB", "cars, liability, toward"], style)
|
||||||
+annotation-row(["insurance", "compound", "liability", "NOUN", ""], style)
|
+annotation-row(["insurance", "compound", "liability", "NOUN", ""], style)
|
||||||
+annotation-row(["liability", "dobj", "shift", "VERB", "insurance, toward"], style)
|
+annotation-row(["liability", "dobj", "shift", "VERB", "insurance"], style)
|
||||||
+annotation-row(["toward", "prep", "liability", "NOUN", "manufacturers"], style)
|
+annotation-row(["toward", "prep", "liability", "NOUN", "manufacturers"], style)
|
||||||
+annotation-row(["manufacturers", "pobj", "toward", "ADP", ""], style)
|
+annotation-row(["manufacturers", "pobj", "toward", "ADP", ""], style)
|
||||||
|
|
||||||
|
|
|
@ -80,7 +80,7 @@ p
|
||||||
doc.ents = [netflix_ent]
|
doc.ents = [netflix_ent]
|
||||||
|
|
||||||
ents = [(e.text, e.start_char, e.end_char, e.label_) for e in doc.ents]
|
ents = [(e.text, e.start_char, e.end_char, e.label_) for e in doc.ents]
|
||||||
assert ents = [(u'Netflix', 0, 7, u'ORG')]
|
assert ents == [(u'Netflix', 0, 7, u'ORG')]
|
||||||
|
|
||||||
p
|
p
|
||||||
| Keep in mind that you need to create a #[code Span] with the start and
|
| Keep in mind that you need to create a #[code Span] with the start and
|
||||||
|
|
|
@ -54,10 +54,21 @@ p
|
||||||
|
|
||||||
p
|
p
|
||||||
| The matcher returns a list of #[code (match_id, start, end)] tuples – in
|
| The matcher returns a list of #[code (match_id, start, end)] tuples – in
|
||||||
| this case, #[code [('HelloWorld', 0, 2)]], which maps to the span
|
| this case, #[code [('15578876784678163569', 0, 2)]], which maps to the
|
||||||
| #[code doc[0:2]] of our original document. Optionally, we could also
|
| span #[code doc[0:2]] of our original document. The #[code match_id]
|
||||||
| choose to add more than one pattern, for example to also match sequences
|
| is the #[+a("/usage/spacy-101#vocab") hash value] of the string ID
|
||||||
| without punctuation between "hello" and "world":
|
| "HelloWorld". To get the string value, you can look up the ID
|
||||||
|
| in the #[+api("stringstore") #[code StringStore]].
|
||||||
|
|
||||||
|
+code.
|
||||||
|
for match_id, start, end in matches:
|
||||||
|
string_id = nlp.vocab.strings[match_id] # 'HelloWorld'
|
||||||
|
span = doc[start:end] # the matched span
|
||||||
|
|
||||||
|
p
|
||||||
|
| Optionally, we could also choose to add more than one pattern, for
|
||||||
|
| example to also match sequences without punctuation between "hello" and
|
||||||
|
| "world":
|
||||||
|
|
||||||
+code.
|
+code.
|
||||||
matcher.add('HelloWorld', None,
|
matcher.add('HelloWorld', None,
|
||||||
|
@ -91,6 +102,10 @@ p
|
||||||
+cell.u-nowrap #[code LOWER]
|
+cell.u-nowrap #[code LOWER]
|
||||||
+cell The lowercase form of the token text.
|
+cell The lowercase form of the token text.
|
||||||
|
|
||||||
|
+row
|
||||||
|
+cell #[code LENGTH]
|
||||||
|
+cell The length of the token text.
|
||||||
|
|
||||||
+row
|
+row
|
||||||
+cell.u-nowrap #[code IS_ALPHA], #[code IS_ASCII], #[code IS_DIGIT]
|
+cell.u-nowrap #[code IS_ALPHA], #[code IS_ASCII], #[code IS_DIGIT]
|
||||||
+cell
|
+cell
|
||||||
|
@ -117,6 +132,10 @@ p
|
||||||
| The token's simple and extended part-of-speech tag, dependency
|
| The token's simple and extended part-of-speech tag, dependency
|
||||||
| label, lemma, shape.
|
| label, lemma, shape.
|
||||||
|
|
||||||
|
+row
|
||||||
|
+cell.u-nowrap #[code ENT_TYPE]
|
||||||
|
+cell The token's entity label.
|
||||||
|
|
||||||
+h(4, "adding-patterns-wildcard") Using wildcard token patterns
|
+h(4, "adding-patterns-wildcard") Using wildcard token patterns
|
||||||
+tag-new(2)
|
+tag-new(2)
|
||||||
|
|
||||||
|
@ -335,7 +354,8 @@ p
|
||||||
| flag.
|
| flag.
|
||||||
|
|
||||||
+code.
|
+code.
|
||||||
IS_DEFINITELY = nlp.vocab.add_flag(re.compile(r'deff?in[ia]tely').match)
|
definitely_flag = lambda text: bool(re.compile(r'deff?in[ia]tely').match(text))
|
||||||
|
IS_DEFINITELY = nlp.vocab.add_flag(definitely_flag)
|
||||||
|
|
||||||
matcher = Matcher(nlp.vocab)
|
matcher = Matcher(nlp.vocab)
|
||||||
matcher.add('DEFINITELY', None, [{IS_DEFINITELY: True}])
|
matcher.add('DEFINITELY', None, [{IS_DEFINITELY: True}])
|
||||||
|
|
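The flag getter in the revised docs example can be exercised without spaCy. This is a sketch under one assumption: the `bool(...)` wrapper is presumably there so the getter returns a plain boolean rather than a regex match object or `None`. Unlike the lambda in the diff, this version compiles the pattern once, which is a stylistic variation and not part of the commit.

```python
import re

# Compiled once and reused for every call.
_definitely_re = re.compile(r"deff?in[ia]tely")


def definitely_flag(text):
    # re.match anchors at the start of the string and returns a match
    # object or None; bool() turns that into True or False.
    return bool(_definitely_re.match(text))


checks = {word: definitely_flag(word)
          for word in ["definitely", "deffinately", "defiantly"]}
```

A function like this could then be passed to `nlp.vocab.add_flag`, as the docs example above does.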
|
@ -54,7 +54,7 @@ p
|
||||||
|
|
||||||
+code.
|
+code.
|
||||||
import spacy
|
import spacy
|
||||||
from spacy.symbols import ORTH, LEMMA, POS
|
from spacy.symbols import ORTH, LEMMA, POS, TAG
|
||||||
|
|
||||||
nlp = spacy.load('en')
|
nlp = spacy.load('en')
|
||||||
doc = nlp(u'gimme that') # phrase to tokenize
|
doc = nlp(u'gimme that') # phrase to tokenize
|
||||||
|
|
|
@ -31,3 +31,13 @@ p
|
||||||
import spacy
|
import spacy
|
||||||
nlp = spacy.load('en')
|
nlp = spacy.load('en')
|
||||||
doc = nlp(u'This is a sentence.')
|
doc = nlp(u'This is a sentence.')
|
||||||
|
|
||||||
|
+infobox("Important note", "⚠️")
|
||||||
|
| To allow loading models via convenient shortcuts like #[code 'en'], spaCy
|
||||||
|
| will create a symlink within the #[code spacy/data] directory. This means
|
||||||
|
| that your user needs the #[strong required permissions].
|
||||||
|
| If you've installed spaCy to a system directory and don't have admin
|
||||||
|
| privileges, the model linking may fail. The easiest solution
|
||||||
|
| is to re-run the command as admin, or use a #[code virtualenv]. For more
|
||||||
|
| info on this, see the
|
||||||
|
| #[+a("/usage/#symlink-privilege") troubleshooting guide].
|
||||||
|
|
|
@ -132,7 +132,7 @@ p
|
||||||
# set up shortcut link to load local model as "my_amazing_model"
|
# set up shortcut link to load local model as "my_amazing_model"
|
||||||
python -m spacy link /Users/you/model my_amazing_model
|
python -m spacy link /Users/you/model my_amazing_model
|
||||||
|
|
||||||
+infobox("Important note")
|
+infobox("Important note", "⚠️")
|
||||||
| In order to create a symlink, your user needs the #[strong required permissions].
|
| In order to create a symlink, your user needs the #[strong required permissions].
|
||||||
| If you've installed spaCy to a system directory and don't have admin
|
| If you've installed spaCy to a system directory and don't have admin
|
||||||
| privileges, the #[code spacy link] command may fail. The easiest solution
|
| privileges, the #[code spacy link] command may fail. The easiest solution
|
||||||
|
|