Matthew Honnibal 2018-03-24 17:31:49 +01:00
commit 0d3bf0d4eb
27 changed files with 1317 additions and 141 deletions

106
.github/contributors/alldefector.md vendored Normal file

@ -0,0 +1,106 @@
# spaCy contributor agreement
This spaCy Contributor Agreement (**"SCA"**) is based on the
[Oracle Contributor Agreement](http://www.oracle.com/technetwork/oca-405177.pdf).
The SCA applies to any contribution that you make to any product or project
managed by us (the **"project"**), and sets out the intellectual property rights
you grant to us in the contributed materials. The term **"us"** shall mean
[ExplosionAI UG (haftungsbeschränkt)](https://explosion.ai/legal). The term
**"you"** shall mean the person or entity identified below.
If you agree to be bound by these terms, fill in the information requested
below and include the filled-in version with your first pull request, under the
folder [`.github/contributors/`](/.github/contributors/). The name of the file
should be your GitHub username, with the extension `.md`. For example, the user
example_user would create the file `.github/contributors/example_user.md`.
Read this agreement carefully before signing. These terms and conditions
constitute a binding legal agreement.
## Contributor Agreement
1. The term "contribution" or "contributed materials" means any source code,
object code, patch, tool, sample, graphic, specification, manual,
documentation, or any other material posted or submitted by you to the project.
2. With respect to any worldwide copyrights, or copyright applications and
registrations, in your contribution:
* you hereby assign to us joint ownership, and to the extent that such
assignment is or becomes invalid, ineffective or unenforceable, you hereby
grant to us a perpetual, irrevocable, non-exclusive, worldwide, no-charge,
royalty-free, unrestricted license to exercise all rights under those
copyrights. This includes, at our option, the right to sublicense these same
rights to third parties through multiple levels of sublicensees or other
licensing arrangements;
* you agree that each of us can do all things in relation to your
contribution as if each of us were the sole owners, and if one of us makes
a derivative work of your contribution, the one who makes the derivative
work (or has it made) will be the sole owner of that derivative work;
* you agree that you will not assert any moral rights in your contribution
against us, our licensees or transferees;
* you agree that we may register a copyright in your contribution and
exercise all ownership rights associated with it; and
* you agree that neither of us has any duty to consult with, obtain the
consent of, pay or render an accounting to the other for any use or
distribution of your contribution.
3. With respect to any patents you own, or that you can license without payment
to any third party, you hereby grant to us a perpetual, irrevocable,
non-exclusive, worldwide, no-charge, royalty-free license to:
* make, have made, use, sell, offer to sell, import, and otherwise transfer
your contribution in whole or in part, alone or in combination with or
included in any product, work or materials arising out of the project to
which your contribution was submitted, and
* at our option, to sublicense these same rights to third parties through
multiple levels of sublicensees or other licensing arrangements.
4. Except as set out above, you keep all right, title, and interest in your
contribution. The rights that you grant to us under these terms are effective
on the date you first submitted a contribution to us, even if your submission
took place before the date you sign these terms.
5. You covenant, represent, warrant and agree that:
* Each contribution that you submit is and shall be an original work of
authorship and you can legally grant the rights set out in this SCA;
* to the best of your knowledge, each contribution will not violate any
third party's copyrights, trademarks, patents, or other intellectual
property rights; and
* each contribution shall be in compliance with U.S. export control laws and
other applicable export and import laws. You agree to notify us if you
become aware of any circumstance which would make any of the foregoing
representations inaccurate in any respect. We may publicly disclose your
participation in the project, including the fact that you have signed the SCA.
6. This SCA is governed by the laws of the State of California and applicable
U.S. Federal law. Any choice of law rules will not apply.
7. Please place an “x” on one of the applicable statements below. Please do NOT
mark both statements:
* [x] I am signing on behalf of myself as an individual and no other person
or entity, including my employer, has or will have rights with respect to my
contributions.
* [x] I am signing on behalf of my employer or a legal entity and I have the
actual authority to contractually bind that entity.
## Contributor Details
| Field | Entry |
|------------------------------- | -------------------- |
| Name | Feng Niu |
| Company name (if applicable) | |
| Title or role (if applicable) | |
| Date | Feb 21, 2018 |
| GitHub username | alldefector |
| Website (optional) | |

106
.github/contributors/calumcalder.md vendored Normal file

@ -0,0 +1,106 @@
# spaCy contributor agreement
This spaCy Contributor Agreement (**"SCA"**) is based on the
[Oracle Contributor Agreement](http://www.oracle.com/technetwork/oca-405177.pdf).
The SCA applies to any contribution that you make to any product or project
managed by us (the **"project"**), and sets out the intellectual property rights
you grant to us in the contributed materials. The term **"us"** shall mean
[ExplosionAI UG (haftungsbeschränkt)](https://explosion.ai/legal). The term
**"you"** shall mean the person or entity identified below.
If you agree to be bound by these terms, fill in the information requested
below and include the filled-in version with your first pull request, under the
folder [`.github/contributors/`](/.github/contributors/). The name of the file
should be your GitHub username, with the extension `.md`. For example, the user
example_user would create the file `.github/contributors/example_user.md`.
Read this agreement carefully before signing. These terms and conditions
constitute a binding legal agreement.
## Contributor Agreement
1. The term "contribution" or "contributed materials" means any source code,
object code, patch, tool, sample, graphic, specification, manual,
documentation, or any other material posted or submitted by you to the project.
2. With respect to any worldwide copyrights, or copyright applications and
registrations, in your contribution:
* you hereby assign to us joint ownership, and to the extent that such
assignment is or becomes invalid, ineffective or unenforceable, you hereby
grant to us a perpetual, irrevocable, non-exclusive, worldwide, no-charge,
royalty-free, unrestricted license to exercise all rights under those
copyrights. This includes, at our option, the right to sublicense these same
rights to third parties through multiple levels of sublicensees or other
licensing arrangements;
* you agree that each of us can do all things in relation to your
contribution as if each of us were the sole owners, and if one of us makes
a derivative work of your contribution, the one who makes the derivative
work (or has it made) will be the sole owner of that derivative work;
* you agree that you will not assert any moral rights in your contribution
against us, our licensees or transferees;
* you agree that we may register a copyright in your contribution and
exercise all ownership rights associated with it; and
* you agree that neither of us has any duty to consult with, obtain the
consent of, pay or render an accounting to the other for any use or
distribution of your contribution.
3. With respect to any patents you own, or that you can license without payment
to any third party, you hereby grant to us a perpetual, irrevocable,
non-exclusive, worldwide, no-charge, royalty-free license to:
* make, have made, use, sell, offer to sell, import, and otherwise transfer
your contribution in whole or in part, alone or in combination with or
included in any product, work or materials arising out of the project to
which your contribution was submitted, and
* at our option, to sublicense these same rights to third parties through
multiple levels of sublicensees or other licensing arrangements.
4. Except as set out above, you keep all right, title, and interest in your
contribution. The rights that you grant to us under these terms are effective
on the date you first submitted a contribution to us, even if your submission
took place before the date you sign these terms.
5. You covenant, represent, warrant and agree that:
* Each contribution that you submit is and shall be an original work of
authorship and you can legally grant the rights set out in this SCA;
* to the best of your knowledge, each contribution will not violate any
third party's copyrights, trademarks, patents, or other intellectual
property rights; and
* each contribution shall be in compliance with U.S. export control laws and
other applicable export and import laws. You agree to notify us if you
become aware of any circumstance which would make any of the foregoing
representations inaccurate in any respect. We may publicly disclose your
participation in the project, including the fact that you have signed the SCA.
6. This SCA is governed by the laws of the State of California and applicable
U.S. Federal law. Any choice of law rules will not apply.
7. Please place an “x” on one of the applicable statements below. Please do NOT
mark both statements:
* [x] I am signing on behalf of myself as an individual and no other person
or entity, including my employer, has or will have rights with respect to my
contributions.
* [] I am signing on behalf of my employer or a legal entity and I have the
actual authority to contractually bind that entity.
## Contributor Details
| Field | Entry |
|------------------------------- | -------------------- |
| Name | Calum Calder |
| Company name (if applicable) | |
| Title or role (if applicable) | |
| Date | 22 March 2018 |
| GitHub username | calumcalder |
| Website (optional) | |

106
.github/contributors/doug-descombaz.md vendored Normal file

@ -0,0 +1,106 @@
# spaCy contributor agreement
This spaCy Contributor Agreement (**"SCA"**) is based on the
[Oracle Contributor Agreement](http://www.oracle.com/technetwork/oca-405177.pdf).
The SCA applies to any contribution that you make to any product or project
managed by us (the **"project"**), and sets out the intellectual property rights
you grant to us in the contributed materials. The term **"us"** shall mean
[ExplosionAI UG (haftungsbeschränkt)](https://explosion.ai/legal). The term
**"you"** shall mean the person or entity identified below.
If you agree to be bound by these terms, fill in the information requested
below and include the filled-in version with your first pull request, under the
folder [`.github/contributors/`](/.github/contributors/). The name of the file
should be your GitHub username, with the extension `.md`. For example, the user
example_user would create the file `.github/contributors/example_user.md`.
Read this agreement carefully before signing. These terms and conditions
constitute a binding legal agreement.
## Contributor Agreement
1. The term "contribution" or "contributed materials" means any source code,
object code, patch, tool, sample, graphic, specification, manual,
documentation, or any other material posted or submitted by you to the project.
2. With respect to any worldwide copyrights, or copyright applications and
registrations, in your contribution:
* you hereby assign to us joint ownership, and to the extent that such
assignment is or becomes invalid, ineffective or unenforceable, you hereby
grant to us a perpetual, irrevocable, non-exclusive, worldwide, no-charge,
royalty-free, unrestricted license to exercise all rights under those
copyrights. This includes, at our option, the right to sublicense these same
rights to third parties through multiple levels of sublicensees or other
licensing arrangements;
* you agree that each of us can do all things in relation to your
contribution as if each of us were the sole owners, and if one of us makes
a derivative work of your contribution, the one who makes the derivative
work (or has it made) will be the sole owner of that derivative work;
* you agree that you will not assert any moral rights in your contribution
against us, our licensees or transferees;
* you agree that we may register a copyright in your contribution and
exercise all ownership rights associated with it; and
* you agree that neither of us has any duty to consult with, obtain the
consent of, pay or render an accounting to the other for any use or
distribution of your contribution.
3. With respect to any patents you own, or that you can license without payment
to any third party, you hereby grant to us a perpetual, irrevocable,
non-exclusive, worldwide, no-charge, royalty-free license to:
* make, have made, use, sell, offer to sell, import, and otherwise transfer
your contribution in whole or in part, alone or in combination with or
included in any product, work or materials arising out of the project to
which your contribution was submitted, and
* at our option, to sublicense these same rights to third parties through
multiple levels of sublicensees or other licensing arrangements.
4. Except as set out above, you keep all right, title, and interest in your
contribution. The rights that you grant to us under these terms are effective
on the date you first submitted a contribution to us, even if your submission
took place before the date you sign these terms.
5. You covenant, represent, warrant and agree that:
* Each contribution that you submit is and shall be an original work of
authorship and you can legally grant the rights set out in this SCA;
* to the best of your knowledge, each contribution will not violate any
third party's copyrights, trademarks, patents, or other intellectual
property rights; and
* each contribution shall be in compliance with U.S. export control laws and
other applicable export and import laws. You agree to notify us if you
become aware of any circumstance which would make any of the foregoing
representations inaccurate in any respect. We may publicly disclose your
participation in the project, including the fact that you have signed the SCA.
6. This SCA is governed by the laws of the State of California and applicable
U.S. Federal law. Any choice of law rules will not apply.
7. Please place an “x” on one of the applicable statements below. Please do NOT
mark both statements:
* [x] I am signing on behalf of myself as an individual and no other person
or entity, including my employer, has or will have rights with respect to my
contributions.
* [ ] I am signing on behalf of my employer or a legal entity and I have the
actual authority to contractually bind that entity.
## Contributor Details
| Field | Entry |
|------------------------------- | -------------------- |
| Name | Doug DesCombaz |
| Company name (if applicable) | |
| Title or role (if applicable) | |
| Date | 2018-03-15 |
| GitHub username | doug-descombaz |
| Website (optional) | https://medium.com/@doug.descombaz |

106
.github/contributors/howl-anderson.md vendored Normal file

@ -0,0 +1,106 @@
# spaCy contributor agreement
This spaCy Contributor Agreement (**"SCA"**) is based on the
[Oracle Contributor Agreement](http://www.oracle.com/technetwork/oca-405177.pdf).
The SCA applies to any contribution that you make to any product or project
managed by us (the **"project"**), and sets out the intellectual property rights
you grant to us in the contributed materials. The term **"us"** shall mean
[ExplosionAI UG (haftungsbeschränkt)](https://explosion.ai/legal). The term
**"you"** shall mean the person or entity identified below.
If you agree to be bound by these terms, fill in the information requested
below and include the filled-in version with your first pull request, under the
folder [`.github/contributors/`](/.github/contributors/). The name of the file
should be your GitHub username, with the extension `.md`. For example, the user
example_user would create the file `.github/contributors/example_user.md`.
Read this agreement carefully before signing. These terms and conditions
constitute a binding legal agreement.
## Contributor Agreement
1. The term "contribution" or "contributed materials" means any source code,
object code, patch, tool, sample, graphic, specification, manual,
documentation, or any other material posted or submitted by you to the project.
2. With respect to any worldwide copyrights, or copyright applications and
registrations, in your contribution:
* you hereby assign to us joint ownership, and to the extent that such
assignment is or becomes invalid, ineffective or unenforceable, you hereby
grant to us a perpetual, irrevocable, non-exclusive, worldwide, no-charge,
royalty-free, unrestricted license to exercise all rights under those
copyrights. This includes, at our option, the right to sublicense these same
rights to third parties through multiple levels of sublicensees or other
licensing arrangements;
* you agree that each of us can do all things in relation to your
contribution as if each of us were the sole owners, and if one of us makes
a derivative work of your contribution, the one who makes the derivative
work (or has it made) will be the sole owner of that derivative work;
* you agree that you will not assert any moral rights in your contribution
against us, our licensees or transferees;
* you agree that we may register a copyright in your contribution and
exercise all ownership rights associated with it; and
* you agree that neither of us has any duty to consult with, obtain the
consent of, pay or render an accounting to the other for any use or
distribution of your contribution.
3. With respect to any patents you own, or that you can license without payment
to any third party, you hereby grant to us a perpetual, irrevocable,
non-exclusive, worldwide, no-charge, royalty-free license to:
* make, have made, use, sell, offer to sell, import, and otherwise transfer
your contribution in whole or in part, alone or in combination with or
included in any product, work or materials arising out of the project to
which your contribution was submitted, and
* at our option, to sublicense these same rights to third parties through
multiple levels of sublicensees or other licensing arrangements.
4. Except as set out above, you keep all right, title, and interest in your
contribution. The rights that you grant to us under these terms are effective
on the date you first submitted a contribution to us, even if your submission
took place before the date you sign these terms.
5. You covenant, represent, warrant and agree that:
* Each contribution that you submit is and shall be an original work of
authorship and you can legally grant the rights set out in this SCA;
* to the best of your knowledge, each contribution will not violate any
third party's copyrights, trademarks, patents, or other intellectual
property rights; and
* each contribution shall be in compliance with U.S. export control laws and
other applicable export and import laws. You agree to notify us if you
become aware of any circumstance which would make any of the foregoing
representations inaccurate in any respect. We may publicly disclose your
participation in the project, including the fact that you have signed the SCA.
6. This SCA is governed by the laws of the State of California and applicable
U.S. Federal law. Any choice of law rules will not apply.
7. Please place an “x” on one of the applicable statements below. Please do NOT
mark both statements:
* [x] I am signing on behalf of myself as an individual and no other person
or entity, including my employer, has or will have rights with respect to my
contributions.
* [ ] I am signing on behalf of my employer or a legal entity and I have the
actual authority to contractually bind that entity.
## Contributor Details
| Field | Entry |
|------------------------------- | -------------------- |
| Name | Xiaoquan Kong |
| Company name (if applicable) | |
| Title or role (if applicable) | |
| Date | 2018-03-23 |
| GitHub username | howl-anderson |
| Website (optional) | |

106
.github/contributors/iann0036.md vendored Normal file

@ -0,0 +1,106 @@
# spaCy contributor agreement
This spaCy Contributor Agreement (**"SCA"**) is based on the
[Oracle Contributor Agreement](http://www.oracle.com/technetwork/oca-405177.pdf).
The SCA applies to any contribution that you make to any product or project
managed by us (the **"project"**), and sets out the intellectual property rights
you grant to us in the contributed materials. The term **"us"** shall mean
[ExplosionAI UG (haftungsbeschränkt)](https://explosion.ai/legal). The term
**"you"** shall mean the person or entity identified below.
If you agree to be bound by these terms, fill in the information requested
below and include the filled-in version with your first pull request, under the
folder [`.github/contributors/`](/.github/contributors/). The name of the file
should be your GitHub username, with the extension `.md`. For example, the user
example_user would create the file `.github/contributors/example_user.md`.
Read this agreement carefully before signing. These terms and conditions
constitute a binding legal agreement.
## Contributor Agreement
1. The term "contribution" or "contributed materials" means any source code,
object code, patch, tool, sample, graphic, specification, manual,
documentation, or any other material posted or submitted by you to the project.
2. With respect to any worldwide copyrights, or copyright applications and
registrations, in your contribution:
* you hereby assign to us joint ownership, and to the extent that such
assignment is or becomes invalid, ineffective or unenforceable, you hereby
grant to us a perpetual, irrevocable, non-exclusive, worldwide, no-charge,
royalty-free, unrestricted license to exercise all rights under those
copyrights. This includes, at our option, the right to sublicense these same
rights to third parties through multiple levels of sublicensees or other
licensing arrangements;
* you agree that each of us can do all things in relation to your
contribution as if each of us were the sole owners, and if one of us makes
a derivative work of your contribution, the one who makes the derivative
work (or has it made) will be the sole owner of that derivative work;
* you agree that you will not assert any moral rights in your contribution
against us, our licensees or transferees;
* you agree that we may register a copyright in your contribution and
exercise all ownership rights associated with it; and
* you agree that neither of us has any duty to consult with, obtain the
consent of, pay or render an accounting to the other for any use or
distribution of your contribution.
3. With respect to any patents you own, or that you can license without payment
to any third party, you hereby grant to us a perpetual, irrevocable,
non-exclusive, worldwide, no-charge, royalty-free license to:
* make, have made, use, sell, offer to sell, import, and otherwise transfer
your contribution in whole or in part, alone or in combination with or
included in any product, work or materials arising out of the project to
which your contribution was submitted, and
* at our option, to sublicense these same rights to third parties through
multiple levels of sublicensees or other licensing arrangements.
4. Except as set out above, you keep all right, title, and interest in your
contribution. The rights that you grant to us under these terms are effective
on the date you first submitted a contribution to us, even if your submission
took place before the date you sign these terms.
5. You covenant, represent, warrant and agree that:
* Each contribution that you submit is and shall be an original work of
authorship and you can legally grant the rights set out in this SCA;
* to the best of your knowledge, each contribution will not violate any
third party's copyrights, trademarks, patents, or other intellectual
property rights; and
* each contribution shall be in compliance with U.S. export control laws and
other applicable export and import laws. You agree to notify us if you
become aware of any circumstance which would make any of the foregoing
representations inaccurate in any respect. We may publicly disclose your
participation in the project, including the fact that you have signed the SCA.
6. This SCA is governed by the laws of the State of California and applicable
U.S. Federal law. Any choice of law rules will not apply.
7. Please place an “x” on one of the applicable statements below. Please do NOT
mark both statements:
* [x] I am signing on behalf of myself as an individual and no other person
or entity, including my employer, has or will have rights with respect to my
contributions.
* [x] I am signing on behalf of my employer or a legal entity and I have the
actual authority to contractually bind that entity.
## Contributor Details
| Field | Entry |
|------------------------------- | -------------------- |
| Name | Ian Mckay |
| Company name (if applicable) | |
| Title or role (if applicable) | |
| Date | 22/03/2018 |
| GitHub username | iann0036 |
| Website (optional) | |

106
.github/contributors/justindujardin.md vendored Normal file

@ -0,0 +1,106 @@
# spaCy contributor agreement
This spaCy Contributor Agreement (**"SCA"**) is based on the
[Oracle Contributor Agreement](http://www.oracle.com/technetwork/oca-405177.pdf).
The SCA applies to any contribution that you make to any product or project
managed by us (the **"project"**), and sets out the intellectual property rights
you grant to us in the contributed materials. The term **"us"** shall mean
[ExplosionAI UG (haftungsbeschränkt)](https://explosion.ai/legal). The term
**"you"** shall mean the person or entity identified below.
If you agree to be bound by these terms, fill in the information requested
below and include the filled-in version with your first pull request, under the
folder [`.github/contributors/`](/.github/contributors/). The name of the file
should be your GitHub username, with the extension `.md`. For example, the user
example_user would create the file `.github/contributors/example_user.md`.
Read this agreement carefully before signing. These terms and conditions
constitute a binding legal agreement.
## Contributor Agreement
1. The term "contribution" or "contributed materials" means any source code,
object code, patch, tool, sample, graphic, specification, manual,
documentation, or any other material posted or submitted by you to the project.
2. With respect to any worldwide copyrights, or copyright applications and
registrations, in your contribution:
* you hereby assign to us joint ownership, and to the extent that such
assignment is or becomes invalid, ineffective or unenforceable, you hereby
grant to us a perpetual, irrevocable, non-exclusive, worldwide, no-charge,
royalty-free, unrestricted license to exercise all rights under those
copyrights. This includes, at our option, the right to sublicense these same
rights to third parties through multiple levels of sublicensees or other
licensing arrangements;
* you agree that each of us can do all things in relation to your
contribution as if each of us were the sole owners, and if one of us makes
a derivative work of your contribution, the one who makes the derivative
work (or has it made) will be the sole owner of that derivative work;
* you agree that you will not assert any moral rights in your contribution
against us, our licensees or transferees;
* you agree that we may register a copyright in your contribution and
exercise all ownership rights associated with it; and
* you agree that neither of us has any duty to consult with, obtain the
consent of, pay or render an accounting to the other for any use or
distribution of your contribution.
3. With respect to any patents you own, or that you can license without payment
to any third party, you hereby grant to us a perpetual, irrevocable,
non-exclusive, worldwide, no-charge, royalty-free license to:
* make, have made, use, sell, offer to sell, import, and otherwise transfer
your contribution in whole or in part, alone or in combination with or
included in any product, work or materials arising out of the project to
which your contribution was submitted, and
* at our option, to sublicense these same rights to third parties through
multiple levels of sublicensees or other licensing arrangements.
4. Except as set out above, you keep all right, title, and interest in your
contribution. The rights that you grant to us under these terms are effective
on the date you first submitted a contribution to us, even if your submission
took place before the date you sign these terms.
5. You covenant, represent, warrant and agree that:
* Each contribution that you submit is and shall be an original work of
authorship and you can legally grant the rights set out in this SCA;
* to the best of your knowledge, each contribution will not violate any
third party's copyrights, trademarks, patents, or other intellectual
property rights; and
* each contribution shall be in compliance with U.S. export control laws and
other applicable export and import laws. You agree to notify us if you
become aware of any circumstance which would make any of the foregoing
representations inaccurate in any respect. We may publicly disclose your
participation in the project, including the fact that you have signed the SCA.
6. This SCA is governed by the laws of the State of California and applicable
U.S. Federal law. Any choice of law rules will not apply.
7. Please place an “x” on one of the applicable statements below. Please do NOT
mark both statements:
* [x] I am signing on behalf of myself as an individual and no other person
or entity, including my employer, has or will have rights with respect to my
contributions.
* [ ] I am signing on behalf of my employer or a legal entity and I have the
actual authority to contractually bind that entity.
## Contributor Details
| Field | Entry |
|------------------------------- | -------------------- |
| Name | Justin DuJardin |
| Company name (if applicable) | DuJardin Consulting, LLC |
| Title or role (if applicable) | |
| Date | 2018-03-23 |
| GitHub username | justindujardin |
| Website (optional) | https://justindujardin.com |

106
.github/contributors/ottosulin.md vendored Normal file

@ -0,0 +1,106 @@
# spaCy contributor agreement
This spaCy Contributor Agreement (**"SCA"**) is based on the
[Oracle Contributor Agreement](http://www.oracle.com/technetwork/oca-405177.pdf).
The SCA applies to any contribution that you make to any product or project
managed by us (the **"project"**), and sets out the intellectual property rights
you grant to us in the contributed materials. The term **"us"** shall mean
[ExplosionAI UG (haftungsbeschränkt)](https://explosion.ai/legal). The term
**"you"** shall mean the person or entity identified below.
If you agree to be bound by these terms, fill in the information requested
below and include the filled-in version with your first pull request, under the
folder [`.github/contributors/`](/.github/contributors/). The name of the file
should be your GitHub username, with the extension `.md`. For example, the user
example_user would create the file `.github/contributors/example_user.md`.
Read this agreement carefully before signing. These terms and conditions
constitute a binding legal agreement.
## Contributor Agreement
1. The term "contribution" or "contributed materials" means any source code,
object code, patch, tool, sample, graphic, specification, manual,
documentation, or any other material posted or submitted by you to the project.
2. With respect to any worldwide copyrights, or copyright applications and
registrations, in your contribution:
* you hereby assign to us joint ownership, and to the extent that such
assignment is or becomes invalid, ineffective or unenforceable, you hereby
grant to us a perpetual, irrevocable, non-exclusive, worldwide, no-charge,
royalty-free, unrestricted license to exercise all rights under those
copyrights. This includes, at our option, the right to sublicense these same
rights to third parties through multiple levels of sublicensees or other
licensing arrangements;
* you agree that each of us can do all things in relation to your
contribution as if each of us were the sole owners, and if one of us makes
a derivative work of your contribution, the one who makes the derivative
work (or has it made) will be the sole owner of that derivative work;
* you agree that you will not assert any moral rights in your contribution
against us, our licensees or transferees;
* you agree that we may register a copyright in your contribution and
exercise all ownership rights associated with it; and
* you agree that neither of us has any duty to consult with, obtain the
consent of, pay or render an accounting to the other for any use or
distribution of your contribution.
3. With respect to any patents you own, or that you can license without payment
to any third party, you hereby grant to us a perpetual, irrevocable,
non-exclusive, worldwide, no-charge, royalty-free license to:
* make, have made, use, sell, offer to sell, import, and otherwise transfer
your contribution in whole or in part, alone or in combination with or
included in any product, work or materials arising out of the project to
which your contribution was submitted, and
* at our option, to sublicense these same rights to third parties through
multiple levels of sublicensees or other licensing arrangements.
4. Except as set out above, you keep all right, title, and interest in your
contribution. The rights that you grant to us under these terms are effective
on the date you first submitted a contribution to us, even if your submission
took place before the date you sign these terms.
5. You covenant, represent, warrant and agree that:
* Each contribution that you submit is and shall be an original work of
authorship and you can legally grant the rights set out in this SCA;
* to the best of your knowledge, each contribution will not violate any
third party's copyrights, trademarks, patents, or other intellectual
property rights; and
* each contribution shall be in compliance with U.S. export control laws and
other applicable export and import laws. You agree to notify us if you
become aware of any circumstance which would make any of the foregoing
representations inaccurate in any respect. We may publicly disclose your
participation in the project, including the fact that you have signed the SCA.
6. This SCA is governed by the laws of the State of California and applicable
U.S. Federal law. Any choice of law rules will not apply.
7. Please place an “x” on one of the applicable statement below. Please do NOT
mark both statements:
* [ X ] I am signing on behalf of myself as an individual and no other person
or entity, including my employer, has or will have rights with respect to my
contributions.
* [ ] I am signing on behalf of my employer or a legal entity and I have the
actual authority to contractually bind that entity.
## Contributor Details
| Field | Entry |
|------------------------------- | -------------------- |
| Name | Otto Sulin |
| Company name (if applicable) | |
| Title or role (if applicable) | |
| Date | 23/03/2018 |
| GitHub username | ottosulin |
| Website (optional) | |

106
.github/contributors/willismonroe.md vendored Normal file
View File

@ -0,0 +1,106 @@
# spaCy contributor agreement
This spaCy Contributor Agreement (**"SCA"**) is based on the
[Oracle Contributor Agreement](http://www.oracle.com/technetwork/oca-405177.pdf).
The SCA applies to any contribution that you make to any product or project
managed by us (the **"project"**), and sets out the intellectual property rights
you grant to us in the contributed materials. The term **"us"** shall mean
[ExplosionAI UG (haftungsbeschränkt)](https://explosion.ai/legal). The term
**"you"** shall mean the person or entity identified below.
If you agree to be bound by these terms, fill in the information requested
below and include the filled-in version with your first pull request, under the
folder [`.github/contributors/`](/.github/contributors/). The name of the file
should be your GitHub username, with the extension `.md`. For example, the user
example_user would create the file `.github/contributors/example_user.md`.
Read this agreement carefully before signing. These terms and conditions
constitute a binding legal agreement.
## Contributor Agreement
1. The term "contribution" or "contributed materials" means any source code,
object code, patch, tool, sample, graphic, specification, manual,
documentation, or any other material posted or submitted by you to the project.
2. With respect to any worldwide copyrights, or copyright applications and
registrations, in your contribution:
* you hereby assign to us joint ownership, and to the extent that such
assignment is or becomes invalid, ineffective or unenforceable, you hereby
grant to us a perpetual, irrevocable, non-exclusive, worldwide, no-charge,
royalty-free, unrestricted license to exercise all rights under those
copyrights. This includes, at our option, the right to sublicense these same
rights to third parties through multiple levels of sublicensees or other
licensing arrangements;
* you agree that each of us can do all things in relation to your
contribution as if each of us were the sole owners, and if one of us makes
a derivative work of your contribution, the one who makes the derivative
work (or has it made) will be the sole owner of that derivative work;
* you agree that you will not assert any moral rights in your contribution
against us, our licensees or transferees;
* you agree that we may register a copyright in your contribution and
exercise all ownership rights associated with it; and
* you agree that neither of us has any duty to consult with, obtain the
consent of, pay or render an accounting to the other for any use or
distribution of your contribution.
3. With respect to any patents you own, or that you can license without payment
to any third party, you hereby grant to us a perpetual, irrevocable,
non-exclusive, worldwide, no-charge, royalty-free license to:
* make, have made, use, sell, offer to sell, import, and otherwise transfer
your contribution in whole or in part, alone or in combination with or
included in any product, work or materials arising out of the project to
which your contribution was submitted, and
* at our option, to sublicense these same rights to third parties through
multiple levels of sublicensees or other licensing arrangements.
4. Except as set out above, you keep all right, title, and interest in your
contribution. The rights that you grant to us under these terms are effective
on the date you first submitted a contribution to us, even if your submission
took place before the date you sign these terms.
5. You covenant, represent, warrant and agree that:
* Each contribution that you submit is and shall be an original work of
authorship and you can legally grant the rights set out in this SCA;
* to the best of your knowledge, each contribution will not violate any
third party's copyrights, trademarks, patents, or other intellectual
property rights; and
* each contribution shall be in compliance with U.S. export control laws and
other applicable export and import laws. You agree to notify us if you
become aware of any circumstance which would make any of the foregoing
representations inaccurate in any respect. We may publicly disclose your
participation in the project, including the fact that you have signed the SCA.
6. This SCA is governed by the laws of the State of California and applicable
U.S. Federal law. Any choice of law rules will not apply.
7. Please place an “x” on one of the applicable statement below. Please do NOT
mark both statements:
* [x] I am signing on behalf of myself as an individual and no other person
or entity, including my employer, has or will have rights with respect to my
contributions.
* [x] I am signing on behalf of my employer or a legal entity and I have the
actual authority to contractually bind that entity.
## Contributor Details
| Field | Entry |
|------------------------------- | -------------------- |
| Name | Willis Monroe |
| Company name (if applicable) | |
| Title or role (if applicable) | |
| Date | 2018-3-5 |
| GitHub username | willismonroe |
| Website (optional) | |

View File

@ -0,0 +1,82 @@
#!/usr/bin/env python
# coding: utf8
"""Visualize spaCy word vectors in Tensorboard.
Adapted from: https://gist.github.com/BrikerMan/7bd4e4bd0a00ac9076986148afc06507
"""
from __future__ import unicode_literals
from os import path
import math
import numpy
import plac
import spacy
import tensorflow as tf
import tqdm
from tensorflow.contrib.tensorboard.plugins.projector import visualize_embeddings, ProjectorConfig
@plac.annotations(
    vectors_loc=("Path to spaCy model that contains vectors", "positional", None, str),
    out_loc=("Path to output folder for tensorboard session data", "positional", None, str),
    name=("Human readable name for tsv file and vectors tensor", "positional", None, str),
)
def main(vectors_loc, out_loc, name="spaCy_vectors"):
    meta_file = "{}.tsv".format(name)
    out_meta_file = path.join(out_loc, meta_file)
    print('Loading spaCy vectors model: {}'.format(vectors_loc))
    model = spacy.load(vectors_loc)
    print('Finding lexemes with vectors attached: {}'.format(vectors_loc))
    strings_stream = tqdm.tqdm(model.vocab.strings, total=len(model.vocab.strings), leave=False)
    queries = [w for w in strings_stream if model.vocab.has_vector(w)]
    vector_count = len(queries)
    print('Building Tensorboard Projector metadata for ({}) vectors: {}'.format(vector_count, out_meta_file))
    # Store vector data in a tensorflow variable
    tf_vectors_variable = numpy.zeros((vector_count, model.vocab.vectors.shape[1]))
    # Write a tab-separated file that contains information about the vectors for visualization
    #
    # Reference: https://www.tensorflow.org/programmers_guide/embedding#metadata
    with open(out_meta_file, 'wb') as file_metadata:
        # Define columns in the first row
        file_metadata.write("Text\tFrequency\n".encode('utf-8'))
        # Write out a row for each vector that we add to the tensorflow variable we created
        vec_index = 0
        for text in tqdm.tqdm(queries, total=len(queries), leave=False):
            # https://github.com/tensorflow/tensorflow/issues/9094
            text = '<Space>' if text.lstrip() == '' else text
            lex = model.vocab[text]
            # Store vector data and metadata
            tf_vectors_variable[vec_index] = model.vocab.get_vector(text)
            file_metadata.write("{}\t{}\n".format(text, math.exp(lex.prob) * vector_count).encode('utf-8'))
            vec_index += 1
    print('Running Tensorflow Session...')
    sess = tf.InteractiveSession()
    tf.Variable(tf_vectors_variable, trainable=False, name=name)
    tf.global_variables_initializer().run()
    saver = tf.train.Saver()
    writer = tf.summary.FileWriter(out_loc, sess.graph)
    # Link the embeddings into the config
    config = ProjectorConfig()
    embed = config.embeddings.add()
    embed.tensor_name = name
    embed.metadata_path = meta_file
    # Tell the projector about the configured embeddings and metadata file
    visualize_embeddings(writer, config)
    # Save session and print run command to the output
    print('Saving Tensorboard Session...')
    saver.save(sess, path.join(out_loc, '{}.ckpt'.format(name)))
    print('Done. Run `tensorboard --logdir={0}` to view in Tensorboard'.format(out_loc))


if __name__ == '__main__':
    plac.call(main)
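
The metadata step of the script above can be tried in isolation. The following is a minimal sketch, not part of the commit: `write_metadata` is a hypothetical helper, and the `(text, log_prob)` pairs stand in for the lexeme strings and `lex.prob` values that the real script reads from a loaded spaCy model. It writes the same two-column TSV layout, including the `<Space>` substitution for whitespace-only labels.

```python
import io
import math


def write_metadata(rows, out):
    """Write Projector-style metadata (hypothetical helper, for illustration).

    rows: iterable of (text, log_prob) pairs standing in for lexemes.
    out:  binary file-like object.
    """
    rows = list(rows)
    # Header row naming the two columns
    out.write("Text\tFrequency\n".encode('utf-8'))
    for text, log_prob in rows:
        # Whitespace-only labels are replaced, mirroring the script above
        text = '<Space>' if text.lstrip() == '' else text
        # exp() turns the log probability back into a probability,
        # which is then scaled by the number of rows
        freq = math.exp(log_prob) * len(rows)
        out.write("{}\t{}\n".format(text, freq).encode('utf-8'))


buf = io.BytesIO()
write_metadata([("apple", math.log(0.5)), (" ", math.log(0.25))], buf)
lines = buf.getvalue().decode('utf-8').splitlines()
print(lines)
```

Writing to an in-memory buffer keeps the sketch free of TensorFlow and spaCy, so the TSV layout can be checked before running the full script.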

22
spacy/lang/tr/examples.py Normal file
View File

@ -0,0 +1,22 @@
# coding: utf8
from __future__ import unicode_literals
"""
Example sentences to test spaCy and its language models.
>>> from spacy.lang.tr.examples import sentences
>>> docs = nlp.pipe(sentences)
"""
sentences = [
"Neredesin?",
"Neredesiniz?",
"Bu bir cümledir.",
"Sürücüsüz araçlar sigorta yükümlülüğünü üreticilere kaydırıyor.",
"San Francisco kaldırımda kurye robotları yasaklayabilir.",
"Londra İngiltere'nin başkentidir.",
"Türkiye'nin başkenti neresi?",
"Bakanlar Kurulu 180 günlük eylem planını açıkladı.",
"Merkez Bankası, beklentiler doğrultusunda faizlerde değişikliğe gitmedi."
]

View File

@ -0,0 +1,31 @@
# coding: utf8
from __future__ import unicode_literals
from ...attrs import LIKE_NUM
# Thirteen, fifteen etc. are written as separate words: on üç
_num_words = ['bir', 'iki', 'üç', 'dört', 'beş', 'altı', 'yedi', 'sekiz',
              'dokuz', 'on', 'yirmi', 'otuz', 'kırk', 'elli', 'altmış',
              'yetmiş', 'seksen', 'doksan', 'yüz', 'bin', 'milyon',
              'milyar', 'trilyon', 'katrilyon', 'kentilyon']


def like_num(text):
    text = text.replace(',', '').replace('.', '')
    if text.isdigit():
        return True
    if text.count('/') == 1:
        num, denom = text.split('/')
        if num.isdigit() and denom.isdigit():
            return True
    if text.lower() in _num_words:
        return True
    return False


LEX_ATTRS = {
    LIKE_NUM: like_num
}
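
To see how the three branches of `like_num` behave, here is a small standalone sketch. It is an illustration rather than the module itself: the word list is abbreviated and the function is copied so the snippet runs without spaCy installed. One thing it suggests is that Python's `str.lower()` maps the Turkish capital `İ` to `i` plus a combining dot, so a capitalised `İki` would not match the lowercase entry `iki`.

```python
# Standalone copy of the like_num logic, for illustration only.
# The word list is abbreviated; the real module defines the full list.
_num_words = ['bir', 'iki', 'üç', 'on', 'yüz', 'bin', 'milyon']


def like_num(text):
    # Drop thousands/decimal separators before testing for digits
    text = text.replace(',', '').replace('.', '')
    if text.isdigit():
        return True
    # Accept simple fractions such as 3/4 (exactly one slash)
    if text.count('/') == 1:
        num, denom = text.split('/')
        if num.isdigit() and denom.isdigit():
            return True
    # Accept spelled-out numerals, compared in lowercase
    return text.lower() in _num_words


samples = ['1.000', '3/4', 'Üç', 'elma', '1/2/3', 'İki']
results = {text: like_num(text) for text in samples}
print(results)
```

If matching capitalised forms such as `İki` matters, a Turkish-aware lowercasing step would be needed before the lookup; that is outside what the code above does.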

View File

@ -10,16 +10,12 @@ acep
adamakıllı adamakıllı
adeta adeta
ait ait
altmýþ
altmış
altý
altı
ama ama
amma amma
anca anca
ancak ancak
arada arada
artýk artık
aslında aslında
aynen aynen
ayrıca ayrıca
@ -29,46 +25,82 @@ açıkçası
bana bana
bari bari
bazen bazen
bazý
bazı bazı
bazısı
bazısına
bazısında
bazısından
bazısını
bazısının
başkası başkası
baţka başkasına
başkasında
başkasından
başkasını
başkasının
başka
belki belki
ben ben
bende
benden benden
beni beni
benim benim
beri beri
beriki beriki
beþ berikinin
beş berikiyi
beţ berisi
bilcümle bilcümle
bile bile
bin
binaen binaen
binaenaleyh binaenaleyh
bir
biraz biraz
birazdan birazdan
birbiri birbiri
birbirine
birbirini
birbirinin
birbirinde
birbirinden
birden birden
birdenbire birdenbire
biri biri
birine
birini
birinin
birinde
birinden
birice birice
birileri birileri
birilerinde
birilerinden
birilerine
birilerini
birilerinin
birisi birisi
birisine
birisini
birisinin
birisinde
birisinden
birkaç birkaç
birkaçı birkaçı
birkaçına
birkaçını
birkaçının
birkaçında
birkaçından
birkez birkez
birlikte birlikte
birçok birçok
birçoğu birçoğu
birþey birçoğuna
birþeyi birçoğunda
birçoğundan
birçoğunu
birçoğunun
birşey birşey
birşeyi birşeyi
birţey
bitevi bitevi
biteviye biteviye
bittabi bittabi
@ -96,6 +128,11 @@ buracıkta
burada burada
buradan buradan
burası burası
burasına
burasını
burasının
burasında
burasından
böyle böyle
böylece böylece
böylecene böylecene
@ -106,8 +143,34 @@ büsbütün
bütün bütün
cuk cuk
cümlesi cümlesi
cümlesine
cümlesini
cümlesinin
cümlesinden
cümlemize
cümlemizi
cümlemizden
çabuk
çabukça
çeşitli
çok
çokları
çoklarınca
çokluk
çoklukla
çokça
çoğu
çoğun
çoğunca
çoğunda
çoğundan
çoğunlukla
çoğunu
çoğunun
çünkü
da da
daha daha
dahası
dahi dahi
dahil dahil
dahilen dahilen
@ -124,19 +187,17 @@ denli
derakap derakap
derhal derhal
derken derken
deđil
değil değil
değin değin
diye diye
diđer
diğer diğer
diğeri diğeri
doksan diğerine
dokuz diğerini
diğerinden
dolayı dolayı
dolayısıyla dolayısıyla
doğru doğru
dört
edecek edecek
eden eden
ederek ederek
@ -146,7 +207,6 @@ edilmesi
ediyor ediyor
elbet elbet
elbette elbette
elli
emme emme
en en
enikonu enikonu
@ -168,10 +228,10 @@ evvelce
evvelden evvelden
evvelemirde evvelemirde
evveli evveli
eđer
eğer eğer
fakat fakat
filanca filanca
filancanın
gah gah
gayet gayet
gayetle gayetle
@ -197,6 +257,10 @@ haliyle
handiyse handiyse
hangi hangi
hangisi hangisi
hangisine
hangisine
hangisinde
hangisinden
hani hani
hariç hariç
hasebiyle hasebiyle
@ -207,17 +271,27 @@ hem
henüz henüz
hep hep
hepsi hepsi
hepsini
hepsinin
hepsinde
hepsinden
her her
herhangi herhangi
herkes herkes
herkesi
herkesin herkesin
herkesten
hiç hiç
hiçbir hiçbir
hiçbiri hiçbiri
hiçbirine
hiçbirini
hiçbirinin
hiçbirinde
hiçbirinden
hoş hoş
hulasaten hulasaten
iken iken
iki
ila ila
ile ile
ilen ilen
@ -240,43 +314,55 @@ iyicene
için için
işte işte
iţte
kadar kadar
kaffesi kaffesi
kah kah
kala kala
kanýmca kanımca
karşın karşın
katrilyon
kaynak kaynak
kaçı kaçı
kaçına
kaçında
kaçından
kaçını
kaçının
kelli kelli
kendi kendi
kendilerinde
kendilerinden
kendilerine kendilerine
kendilerini
kendilerinin
kendini kendini
kendisi kendisi
kendisinde
kendisinden
kendisine kendisine
kendisini kendisini
kendisinin
kere kere
kez kez
keza keza
kezalik kezalik
keşke keşke
keţke
ki ki
kim kim
kimden kimden
kime kime
kimi kimi
kiminin
kimisi kimisi
kimisinde
kimisinden
kimisine
kimisinin
kimse kimse
kimsecik kimsecik
kimsecikler kimsecikler
külliyen külliyen
kýrk
kýsaca
kırk
kısaca kısaca
kısacası
lakin lakin
leh leh
lütfen lütfen
@ -289,13 +375,10 @@ međer
meğer meğer
meğerki meğerki
meğerse meğerse
milyar
milyon
mu mu
mı mı
nasýl mi
nasıl nasıl
nasılsa nasılsa
nazaran nazaran
@ -304,6 +387,8 @@ ne
neden neden
nedeniyle nedeniyle
nedenle nedenle
nedenler
nedenlerden
nedense nedense
nerde nerde
nerden nerden
@ -332,32 +417,27 @@ olduklarını
oldukça oldukça
olduğu olduğu
olduğunu olduğunu
olmadı
olmadığı
olmak olmak
olması olması
olmayan
olmaz
olsa olsa
olsun olsun
olup olup
olur olur
olursa olursa
oluyor oluyor
on
ona ona
onca onca
onculayın onculayın
onda onda
ondan ondan
onlar onlar
onlara
onlardan onlardan
onlari
onlarýn
onları onları
onların onların
onu onu
onun onun
ora
oracık oracık
oracıkta oracıkta
orada orada
@ -365,9 +445,26 @@ oradan
oranca oranca
oranla oranla
oraya oraya
otuz
oysa oysa
oysaki oysaki
öbür
öbürkü
öbürü
öbüründe
öbüründen
öbürüne
öbürünü
önce
önceden
önceleri
öncelikle
öteki
ötekisi
öyle
öylece
öylelikle
öylemesine
öz
pek pek
pekala pekala
peki peki
@ -379,8 +476,6 @@ sahi
sahiden sahiden
sana sana
sanki sanki
sekiz
seksen
sen sen
senden senden
seni seni
@ -393,6 +488,27 @@ sonra
sonradan sonradan
sonraları sonraları
sonunda sonunda
şayet
şey
şeyden
şeyi
şeyler
şu
şuna
şuncacık
şunda
şundan
şunlar
şunları
şunların
şunu
şunun
şura
şuracık
şuracıkta
şurası
şöyle
şimdi
tabii tabii
tam tam
tamam tamam
@ -400,8 +516,8 @@ tamamen
tamamıyla tamamıyla
tarafından tarafından
tek tek
trilyon
tüm tüm
üzere
var var
vardı vardı
vasıtasıyla vasıtasıyla
@ -429,84 +545,16 @@ yaptığını
yapılan yapılan
yapılması yapılması
yapıyor yapıyor
yedi
yeniden yeniden
yenilerde yenilerde
yerine yerine
yetmiþ
yetmiş
yetmiţ
yine yine
yirmi
yok yok
yoksa yoksa
yoluyla yoluyla
yüz
yüzünden yüzünden
zarfında zarfında
zaten zaten
zati zati
zira zira
çabuk
çabukça
çeşitli
çok
çokları
çoklarınca
çokluk
çoklukla
çokça
çoğu
çoğun
çoğunca
çoğunlukla
çünkü
öbür
öbürkü
öbürü
önce
önceden
önceleri
öncelikle
öteki
ötekisi
öyle
öylece
öylelikle
öylemesine
öz
üzere
üç
þey
þeyden
þeyi
þeyler
þu
þuna
þunda
þundan
þunu
şayet
şey
şeyden
şeyi
şeyler
şu
şuna
şuncacık
şunda
şundan
şunlar
şunları
şunu
şunun
şura
şuracık
şuracıkta
şurası
şöyle
ţayet
ţimdi
ţu
ţöyle
""".split()) """.split())

View File

@ -3,11 +3,6 @@ from __future__ import unicode_literals
from ...symbols import ORTH, NORM from ...symbols import ORTH, NORM
# These exceptions are mostly for example purposes hoping that Turkish
# speakers can contribute in the future! Source of copy-pasted examples:
# https://en.wiktionary.org/wiki/Category:Turkish_language
_exc = { _exc = {
"sağol": [ "sağol": [
{ORTH: "sağ"}, {ORTH: "sağ"},
@ -16,11 +11,112 @@ _exc = {
for exc_data in [ for exc_data in [
{ORTH: "A.B.D.", NORM: "Amerika Birleşik Devletleri"}]: {ORTH: "A.B.D.", NORM: "Amerika Birleşik Devletleri"},
{ORTH: "Alb.", NORM: "Albay"},
{ORTH: "Ar.Gör.", NORM: "Araştırma Görevlisi"},
{ORTH: "Arş.Gör.", NORM: "Araştırma Görevlisi"},
{ORTH: "Asb.", NORM: "Astsubay"},
{ORTH: "Astsb.", NORM: "Astsubay"},
{ORTH: "As.İz.", NORM: "Askeri İnzibat"},
{ORTH: "Atğm", NORM: "Asteğmen"},
{ORTH: "Av.", NORM: "Avukat"},
{ORTH: "Apt.", NORM: "Apartmanı"},
{ORTH: "Bçvş.", NORM: "Başçavuş"},
{ORTH: "bk.", NORM: "bakınız"},
{ORTH: "bknz.", NORM: "bakınız"},
{ORTH: "Bnb.", NORM: "Binbaşı"},
{ORTH: "bnb.", NORM: "binbaşı"},
{ORTH: "Böl.", NORM: "Bölümü"},
{ORTH: "Bşk.", NORM: "Başkanlığı"},
{ORTH: "Bştbp.", NORM: "Baştabip"},
{ORTH: "Bul.", NORM: "Bulvarı"},
{ORTH: "Cad.", NORM: "Caddesi"},
{ORTH: "çev.", NORM: "çeviren"},
{ORTH: "Çvş.", NORM: "Çavuş"},
{ORTH: "dak.", NORM: "dakika"},
{ORTH: "dk.", NORM: "dakika"},
{ORTH: "Doç.", NORM: "Doçent"},
{ORTH: "doğ.", NORM: "doğum tarihi"},
{ORTH: "drl.", NORM: "derleyen"},
{ORTH: "Dz.", NORM: "Deniz"},
{ORTH: "Dz.K.K.lığı", NORM: "Deniz Kuvvetleri Komutanlığı"},
{ORTH: "Dz.Kuv.", NORM: "Deniz Kuvvetleri"},
{ORTH: "Dz.Kuv.K.", NORM: "Deniz Kuvvetleri Komutanlığı"},
{ORTH: "dzl.", NORM: "düzenleyen"},
{ORTH: "Ecz.", NORM: "Eczanesi"},
{ORTH: "ekon.", NORM: "ekonomi"},
{ORTH: "Fak.", NORM: "Fakültesi"},
{ORTH: "Gn.", NORM: "Genel"},
{ORTH: "Gnkur.", NORM: "Genelkurmay"},
{ORTH: "Gn.Kur.", NORM: "Genelkurmay"},
{ORTH: "gr.", NORM: "gram"},
{ORTH: "Hst.", NORM: "Hastanesi"},
{ORTH: "Hs.Uzm.", NORM: "Hesap Uzmanı"},
{ORTH: "huk.", NORM: "hukuk"},
{ORTH: "Hv.", NORM: "Hava"},
{ORTH: "Hv.K.K.lığı", NORM: "Hava Kuvvetleri Komutanlığı"},
{ORTH: "Hv.Kuv.", NORM: "Hava Kuvvetleri"},
{ORTH: "Hv.Kuv.K.", NORM: "Hava Kuvvetleri Komutanlığı"},
{ORTH: "Hz.", NORM: "Hazreti"},
{ORTH: "Hz.Öz.", NORM: "Hizmete Özel"},
{ORTH: "İng.", NORM: "İngilizce"},
{ORTH: "Jeol.", NORM: "Jeoloji"},
{ORTH: "jeol.", NORM: "jeoloji"},
{ORTH: "Korg.", NORM: "Korgeneral"},
{ORTH: "Kur.", NORM: "Kurmay"},
{ORTH: "Kur.Bşk.", NORM: "Kurmay Başkanı"},
{ORTH: "Kuv.", NORM: "Kuvvetleri"},
{ORTH: "Ltd.", NORM: "Limited"},
{ORTH: "Mah.", NORM: "Mahallesi"},
{ORTH: "mah.", NORM: "mahallesi"},
{ORTH: "max.", NORM: "maksimum"},
{ORTH: "min.", NORM: "minimum"},
{ORTH: "Müh.", NORM: "Mühendisliği"},
{ORTH: "müh.", NORM: "mühendisliği"},
{ORTH: "MÖ.", NORM: "Milattan Önce"},
{ORTH: "Onb.", NORM: "Onbaşı"},
{ORTH: "Ord.", NORM: "Ordinaryüs"},
{ORTH: "Org.", NORM: "Orgeneral"},
{ORTH: "Ped.", NORM: "Pedagoji"},
{ORTH: "Prof.", NORM: "Profesör"},
{ORTH: "Sb.", NORM: "Subay"},
{ORTH: "Sn.", NORM: "Sayın"},
{ORTH: "sn.", NORM: "saniye"},
{ORTH: "Sok.", NORM: "Sokak"},
{ORTH: "Şb.", NORM: "Şube"},
{ORTH: "Şti.", NORM: "Şirketi"},
{ORTH: "Tbp.", NORM: "Tabip"},
{ORTH: "T.C.", NORM: "Türkiye Cumhuriyeti"},
{ORTH: "Tel.", NORM: "Telefon"},
{ORTH: "tel.", NORM: "telefon"},
{ORTH: "telg.", NORM: "telgraf"},
{ORTH: "Tğm.", NORM: "Teğmen"},
{ORTH: "tğm.", NORM: "teğmen"},
{ORTH: "tic.", NORM: "ticaret"},
{ORTH: "Tug.", NORM: "Tugay"},
{ORTH: "Tuğg.", NORM: "Tuğgeneral"},
{ORTH: "Tümg.", NORM: "Tümgeneral"},
{ORTH: "Uzm.", NORM: "Uzman"},
{ORTH: "Üçvş.", NORM: "Üstçavuş"},
{ORTH: "Üni.", NORM: "Üniversitesi"},
{ORTH: "Ütğm.", NORM: "Üsteğmen"},
{ORTH: "vb.", NORM: "ve benzeri"},
{ORTH: "vs.", NORM: "vesaire"},
{ORTH: "Yard.", NORM: "Yardımcı"},
{ORTH: "Yar.", NORM: "Yardımcı"},
{ORTH: "Yd.Sb.", NORM: "Yedek Subay"},
{ORTH: "Yard.Doç.", NORM: "Yardımcı Doçent"},
{ORTH: "Yar.Doç.", NORM: "Yardımcı Doçent"},
{ORTH: "Yb.", NORM: "Yarbay"},
{ORTH: "Yrd.", NORM: "Yardımcı"},
{ORTH: "Yrd.Doç.", NORM: "Yardımcı Doçent"},
{ORTH: "Y.Müh.", NORM: "Yüksek mühendis"},
{ORTH: "Y.Mim.", NORM: "Yüksek mimar"}]:
_exc[exc_data[ORTH]] = [exc_data] _exc[exc_data[ORTH]] = [exc_data]
for orth in ["Dr."]: for orth in [
"Dr.", "yy."]:
_exc[orth] = [{ORTH: orth}] _exc[orth] = [{ORTH: orth}]

View File

@ -208,7 +208,7 @@ p
+row +row
+cell #[code word_spacing] +cell #[code word_spacing]
+cell int +cell int
+cell Horizontal spacing between words and arcs in px. +cell Vertical spacing between words and arcs in px.
+cell #[code 45] +cell #[code 45]
+row +row

View File

@ -674,7 +674,7 @@ p
| token vectors. | token vectors.
+aside-code("Example"). +aside-code("Example").
apples = nlp(u'I like apples') doc = nlp(u'I like apples')
assert doc.vector.dtype == 'float32' assert doc.vector.dtype == 'float32'
assert doc.vector.shape == (300,) assert doc.vector.shape == (300,)

View File

@ -12,11 +12,24 @@ p Create a #[code GoldCorpus].
+table(["Name", "Type", "Description"]) +table(["Name", "Type", "Description"])
+row +row
+cell #[code train_path] +cell #[code train]
+cell unicode or #[code Path] +cell unicode or #[code Path] or iterable
+cell File or directory of training data. +cell
| Training data, as a path (file or directory) or iterable. If an
| iterable, each item should be a #[code (text, paragraphs)]
| tuple, where each paragraph is a tuple
| #[code.u-break (sentences, brackets)], and each sentence is a
| tuple #[code.u-break (ids, words, tags, heads, ner)]. See the
| implementation of
| #[+src(gh("spacy", "spacy/gold.pyx")) #[code gold.read_json_file]]
| for further details.
+row +row
+cell #[code dev_path] +cell #[code dev]
+cell unicode or #[code Path] +cell unicode or #[code Path] or iterable
+cell File or directory of development data. +cell Development data, as a path (file or directory) or iterable.
+row("foot")
+cell returns
+cell #[code GoldCorpus]
+cell The newly constructed object.

View File

@ -325,6 +325,12 @@ p The L2 norm of the lexeme's vector representation.
+cell bool +cell bool
+cell Is the lexeme a quotation mark? +cell Is the lexeme a quotation mark?
+row
+cell #[code is_currency]
+tag-new("2.0.8")
+cell bool
+cell Is the lexeme a currency symbol?
+row +row
+cell #[code like_url] +cell #[code like_url]
+cell bool +cell bool

View File

@ -111,6 +111,25 @@ p Match a stream of documents, yielding them in turn.
| parallel, if the #[code Matcher] implementation supports | parallel, if the #[code Matcher] implementation supports
| multi-threading. | multi-threading.
+row
+cell #[code return_matches]
+tag-new(2.1)
+cell bool
+cell
| Yield the match lists along with the docs, making results
| #[code (doc, matches)] tuples.
+row
+cell #[code as_tuples]
+tag-new(2.1)
+cell bool
+cell
| Interpret the input stream as #[code (doc, context)] tuples, and
| yield #[code (result, context)] tuples out. If both
| #[code return_matches] and #[code as_tuples] are #[code True],
| the output will be a sequence of
| #[code ((doc, matches), context)] tuples.
+row("foot") +row("foot")
+cell yields +cell yields
+cell #[code Doc] +cell #[code Doc]

View File

@ -209,7 +209,7 @@ p
+row +row
+cell #[code drop] +cell #[code drop]
+cell int +cell float
+cell The dropout rate. +cell The dropout rate.
+row +row

View File

@ -740,6 +740,12 @@ p The L2 norm of the token's vector representation.
+cell bool +cell bool
+cell Is the token a quotation mark? +cell Is the token a quotation mark?
+row
+cell #[code is_currency]
+tag-new("2.0.8")
+cell bool
+cell Is the token a currency symbol?
+row +row
+cell #[code like_url] +cell #[code like_url]
+cell bool +cell bool

Binary file not shown (size before: 378 KiB).

View File

@ -76,13 +76,15 @@
}, },
"MODEL_LICENSES": { "MODEL_LICENSES": {
"CC BY-SA": "https://creativecommons.org/licenses/by-sa/3.0/", "CC BY 4.0": "https://creativecommons.org/licenses/by/4.0/",
"CC BY-SA 3.0": "https://creativecommons.org/licenses/by-sa/3.0/", "CC BY-SA": "https://creativecommons.org/licenses/by-sa/3.0/",
"CC BY-SA 4.0": "https://creativecommons.org/licenses/by-sa/4.0/", "CC BY-SA 3.0": "https://creativecommons.org/licenses/by-sa/3.0/",
"CC BY-NC": "https://creativecommons.org/licenses/by-nc/3.0/", "CC BY-SA 4.0": "https://creativecommons.org/licenses/by-sa/4.0/",
"CC BY-NC 3.0": "https://creativecommons.org/licenses/by-nc/3.0/", "CC BY-NC": "https://creativecommons.org/licenses/by-nc/3.0/",
"GPL": "https://www.gnu.org/licenses/gpl.html", "CC BY-NC 3.0": "https://creativecommons.org/licenses/by-nc/3.0/",
"LGPL": "https://www.gnu.org/licenses/lgpl.html" "CC-BY-NC-SA 3.0": "https://creativecommons.org/licenses/by-nc-sa/3.0/",
"GPL": "https://www.gnu.org/licenses/gpl.html",
"LGPL": "https://www.gnu.org/licenses/lgpl.html"
}, },
"MODEL_BENCHMARKS": { "MODEL_BENCHMARKS": {

View File

@ -40,7 +40,7 @@ p
+item +item
| Make the #[strong model data] available to the #[code Language] class | Make the #[strong model data] available to the #[code Language] class
| by calling #[+api("language#from_disk") #[code from_disk]] with the | by calling #[+api("language#from_disk") #[code from_disk]] with the
| path to the model data ditectory. | path to the model data directory.
p p
| So when you call this... | So when you call this...
@ -53,7 +53,7 @@ p
| pipeline #[code.u-break ["tagger", "parser", "ner"]]. spaCy will then | pipeline #[code.u-break ["tagger", "parser", "ner"]]. spaCy will then
| initialise #[code spacy.lang.en.English], and create each pipeline | initialise #[code spacy.lang.en.English], and create each pipeline
| component and add it to the processing pipeline. It'll then load in the | component and add it to the processing pipeline. It'll then load in the
| model's data from its data ditectory and return the modified | model's data from its data directory and return the modified
| #[code Language] class for you to use as the #[code nlp] object. | #[code Language] class for you to use as the #[code nlp] object.
p p

View File

@ -37,7 +37,7 @@ p
+cell.u-text-label.u-color-theme=label +cell.u-text-label.u-color-theme=label
for cell in cells for cell in cells
+cell.u-text-center +cell.u-text-center
- var result = cell > 0.5 ? ["yes", "similar"] : cell != 1 ? ["no", "dissimilar"] : ["neutral", "identical"] - var result = cell < 0.5 ? ["no", "dissimilar"] : cell != 1 ? ["yes", "similar"] : ["neutral", "identical"]
| #[code=cell.toFixed(2)] #[+procon(...result)] | #[code=cell.toFixed(2)] #[+procon(...result)]
p p

View File

@ -163,7 +163,7 @@ p
nlp = English().from_disk('/path/to/nlp') nlp = English().from_disk('/path/to/nlp')
p p
| spay's serialization API has been made consistent across classes and | spaCy's serialization API has been made consistent across classes and
| objects. All container classes, i.e. #[code Language], #[code Doc], | objects. All container classes, i.e. #[code Language], #[code Doc],
| #[code Vocab] and #[code StringStore] now have a #[code to_bytes()], | #[code Vocab] and #[code StringStore] now have a #[code to_bytes()],
| #[code from_bytes()], #[code to_disk()] and #[code from_disk()] method | #[code from_bytes()], #[code to_disk()] and #[code from_disk()] method

View File

@ -120,6 +120,9 @@ include ../_includes/_mixins
| A Practical Real-World Approach to Gaining Actionable Insights | A Practical Real-World Approach to Gaining Actionable Insights
| from your Data | from your Data
+card("Practical Machine Learning with Python", "", "Dipanjan Sarkar et al. (Apress, 2017)", "book")
| A Problem-Solver's Guide to Building Real-World Intelligent Systems
+section("notebooks") +section("notebooks")
+h(2, "notebooks") Jupyter notebooks +h(2, "notebooks") Jupyter notebooks

View File

@ -68,7 +68,7 @@ p
+item #[strong spaCy is not research software]. +item #[strong spaCy is not research software].
| It's built on the latest research, but it's designed to get | It's built on the latest research, but it's designed to get
| things done. This leads to fairly different design decisions than | things done. This leads to fairly different design decisions than
| #[+a("https://github./nltk/nltk") NLTK] | #[+a("https://github.com/nltk/nltk") NLTK]
| or #[+a("https://stanfordnlp.github.io/CoreNLP/") CoreNLP], which were | or #[+a("https://stanfordnlp.github.io/CoreNLP/") CoreNLP], which were
| created as platforms for teaching and research. The main difference | created as platforms for teaching and research. The main difference
| is that spaCy is integrated and opinionated. spaCy tries to avoid asking | is that spaCy is integrated and opinionated. spaCy tries to avoid asking