This commit is contained in:
Matthew Honnibal 2018-03-24 17:31:49 +01:00
commit 0d3bf0d4eb
27 changed files with 1317 additions and 141 deletions

106
.github/contributors/alldefector.md vendored Normal file
View File

@ -0,0 +1,106 @@
# spaCy contributor agreement
This spaCy Contributor Agreement (**"SCA"**) is based on the
[Oracle Contributor Agreement](http://www.oracle.com/technetwork/oca-405177.pdf).
The SCA applies to any contribution that you make to any product or project
managed by us (the **"project"**), and sets out the intellectual property rights
you grant to us in the contributed materials. The term **"us"** shall mean
[ExplosionAI UG (haftungsbeschränkt)](https://explosion.ai/legal). The term
**"you"** shall mean the person or entity identified below.
If you agree to be bound by these terms, fill in the information requested
below and include the filled-in version with your first pull request, under the
folder [`.github/contributors/`](/.github/contributors/). The name of the file
should be your GitHub username, with the extension `.md`. For example, the user
example_user would create the file `.github/contributors/example_user.md`.
Read this agreement carefully before signing. These terms and conditions
constitute a binding legal agreement.
## Contributor Agreement
1. The term "contribution" or "contributed materials" means any source code,
object code, patch, tool, sample, graphic, specification, manual,
documentation, or any other material posted or submitted by you to the project.
2. With respect to any worldwide copyrights, or copyright applications and
registrations, in your contribution:
* you hereby assign to us joint ownership, and to the extent that such
assignment is or becomes invalid, ineffective or unenforceable, you hereby
grant to us a perpetual, irrevocable, non-exclusive, worldwide, no-charge,
royalty-free, unrestricted license to exercise all rights under those
copyrights. This includes, at our option, the right to sublicense these same
rights to third parties through multiple levels of sublicensees or other
licensing arrangements;
* you agree that each of us can do all things in relation to your
contribution as if each of us were the sole owners, and if one of us makes
a derivative work of your contribution, the one who makes the derivative
work (or has it made will be the sole owner of that derivative work;
* you agree that you will not assert any moral rights in your contribution
against us, our licensees or transferees;
* you agree that we may register a copyright in your contribution and
exercise all ownership rights associated with it; and
* you agree that neither of us has any duty to consult with, obtain the
consent of, pay or render an accounting to the other for any use or
distribution of your contribution.
3. With respect to any patents you own, or that you can license without payment
to any third party, you hereby grant to us a perpetual, irrevocable,
non-exclusive, worldwide, no-charge, royalty-free license to:
* make, have made, use, sell, offer to sell, import, and otherwise transfer
your contribution in whole or in part, alone or in combination with or
included in any product, work or materials arising out of the project to
which your contribution was submitted, and
* at our option, to sublicense these same rights to third parties through
multiple levels of sublicensees or other licensing arrangements.
4. Except as set out above, you keep all right, title, and interest in your
contribution. The rights that you grant to us under these terms are effective
on the date you first submitted a contribution to us, even if your submission
took place before the date you sign these terms.
5. You covenant, represent, warrant and agree that:
* Each contribution that you submit is and shall be an original work of
authorship and you can legally grant the rights set out in this SCA;
* to the best of your knowledge, each contribution will not violate any
third party's copyrights, trademarks, patents, or other intellectual
property rights; and
* each contribution shall be in compliance with U.S. export control laws and
other applicable export and import laws. You agree to notify us if you
become aware of any circumstance which would make any of the foregoing
representations inaccurate in any respect. We may publicly disclose your
participation in the project, including the fact that you have signed the SCA.
6. This SCA is governed by the laws of the State of California and applicable
U.S. Federal law. Any choice of law rules will not apply.
7. Please place an “x” on one of the applicable statement below. Please do NOT
mark both statements:
* [x] I am signing on behalf of myself as an individual and no other person
or entity, including my employer, has or will have rights with respect to my
contributions.
* [x] I am signing on behalf of my employer or a legal entity and I have the
actual authority to contractually bind that entity.
## Contributor Details
| Field | Entry |
|------------------------------- | -------------------- |
| Name | Feng Niu |
| Company name (if applicable) | |
| Title or role (if applicable) | |
| Date | Feb 21, 2018 |
| GitHub username | alldefector |
| Website (optional) | |

106
.github/contributors/calumcalder.md vendored Normal file
View File

@ -0,0 +1,106 @@
# spaCy contributor agreement
This spaCy Contributor Agreement (**"SCA"**) is based on the
[Oracle Contributor Agreement](http://www.oracle.com/technetwork/oca-405177.pdf).
The SCA applies to any contribution that you make to any product or project
managed by us (the **"project"**), and sets out the intellectual property rights
you grant to us in the contributed materials. The term **"us"** shall mean
[ExplosionAI UG (haftungsbeschränkt)](https://explosion.ai/legal). The term
**"you"** shall mean the person or entity identified below.
If you agree to be bound by these terms, fill in the information requested
below and include the filled-in version with your first pull request, under the
folder [`.github/contributors/`](/.github/contributors/). The name of the file
should be your GitHub username, with the extension `.md`. For example, the user
example_user would create the file `.github/contributors/example_user.md`.
Read this agreement carefully before signing. These terms and conditions
constitute a binding legal agreement.
## Contributor Agreement
1. The term "contribution" or "contributed materials" means any source code,
object code, patch, tool, sample, graphic, specification, manual,
documentation, or any other material posted or submitted by you to the project.
2. With respect to any worldwide copyrights, or copyright applications and
registrations, in your contribution:
* you hereby assign to us joint ownership, and to the extent that such
assignment is or becomes invalid, ineffective or unenforceable, you hereby
grant to us a perpetual, irrevocable, non-exclusive, worldwide, no-charge,
royalty-free, unrestricted license to exercise all rights under those
copyrights. This includes, at our option, the right to sublicense these same
rights to third parties through multiple levels of sublicensees or other
licensing arrangements;
* you agree that each of us can do all things in relation to your
contribution as if each of us were the sole owners, and if one of us makes
a derivative work of your contribution, the one who makes the derivative
work (or has it made will be the sole owner of that derivative work;
* you agree that you will not assert any moral rights in your contribution
against us, our licensees or transferees;
* you agree that we may register a copyright in your contribution and
exercise all ownership rights associated with it; and
* you agree that neither of us has any duty to consult with, obtain the
consent of, pay or render an accounting to the other for any use or
distribution of your contribution.
3. With respect to any patents you own, or that you can license without payment
to any third party, you hereby grant to us a perpetual, irrevocable,
non-exclusive, worldwide, no-charge, royalty-free license to:
* make, have made, use, sell, offer to sell, import, and otherwise transfer
your contribution in whole or in part, alone or in combination with or
included in any product, work or materials arising out of the project to
which your contribution was submitted, and
* at our option, to sublicense these same rights to third parties through
multiple levels of sublicensees or other licensing arrangements.
4. Except as set out above, you keep all right, title, and interest in your
contribution. The rights that you grant to us under these terms are effective
on the date you first submitted a contribution to us, even if your submission
took place before the date you sign these terms.
5. You covenant, represent, warrant and agree that:
* Each contribution that you submit is and shall be an original work of
authorship and you can legally grant the rights set out in this SCA;
* to the best of your knowledge, each contribution will not violate any
third party's copyrights, trademarks, patents, or other intellectual
property rights; and
* each contribution shall be in compliance with U.S. export control laws and
other applicable export and import laws. You agree to notify us if you
become aware of any circumstance which would make any of the foregoing
representations inaccurate in any respect. We may publicly disclose your
participation in the project, including the fact that you have signed the SCA.
6. This SCA is governed by the laws of the State of California and applicable
U.S. Federal law. Any choice of law rules will not apply.
7. Please place an “x” on one of the applicable statement below. Please do NOT
mark both statements:
* [x] I am signing on behalf of myself as an individual and no other person
or entity, including my employer, has or will have rights with respect to my
contributions.
* [] I am signing on behalf of my employer or a legal entity and I have the
actual authority to contractually bind that entity.
## Contributor Details
| Field | Entry |
|------------------------------- | -------------------- |
| Name | Calum Calder |
| Company name (if applicable) | |
| Title or role (if applicable) | |
| Date | 22 March 2018 |
| GitHub username | calumcalder |
| Website (optional) | |

106
.github/contributors/doug-descombaz.md vendored Normal file
View File

@ -0,0 +1,106 @@
# spaCy contributor agreement
This spaCy Contributor Agreement (**"SCA"**) is based on the
[Oracle Contributor Agreement](http://www.oracle.com/technetwork/oca-405177.pdf).
The SCA applies to any contribution that you make to any product or project
managed by us (the **"project"**), and sets out the intellectual property rights
you grant to us in the contributed materials. The term **"us"** shall mean
[ExplosionAI UG (haftungsbeschränkt)](https://explosion.ai/legal). The term
**"you"** shall mean the person or entity identified below.
If you agree to be bound by these terms, fill in the information requested
below and include the filled-in version with your first pull request, under the
folder [`.github/contributors/`](/.github/contributors/). The name of the file
should be your GitHub username, with the extension `.md`. For example, the user
example_user would create the file `.github/contributors/example_user.md`.
Read this agreement carefully before signing. These terms and conditions
constitute a binding legal agreement.
## Contributor Agreement
1. The term "contribution" or "contributed materials" means any source code,
object code, patch, tool, sample, graphic, specification, manual,
documentation, or any other material posted or submitted by you to the project.
2. With respect to any worldwide copyrights, or copyright applications and
registrations, in your contribution:
* you hereby assign to us joint ownership, and to the extent that such
assignment is or becomes invalid, ineffective or unenforceable, you hereby
grant to us a perpetual, irrevocable, non-exclusive, worldwide, no-charge,
royalty-free, unrestricted license to exercise all rights under those
copyrights. This includes, at our option, the right to sublicense these same
rights to third parties through multiple levels of sublicensees or other
licensing arrangements;
* you agree that each of us can do all things in relation to your
contribution as if each of us were the sole owners, and if one of us makes
a derivative work of your contribution, the one who makes the derivative
work (or has it made will be the sole owner of that derivative work;
* you agree that you will not assert any moral rights in your contribution
against us, our licensees or transferees;
* you agree that we may register a copyright in your contribution and
exercise all ownership rights associated with it; and
* you agree that neither of us has any duty to consult with, obtain the
consent of, pay or render an accounting to the other for any use or
distribution of your contribution.
3. With respect to any patents you own, or that you can license without payment
to any third party, you hereby grant to us a perpetual, irrevocable,
non-exclusive, worldwide, no-charge, royalty-free license to:
* make, have made, use, sell, offer to sell, import, and otherwise transfer
your contribution in whole or in part, alone or in combination with or
included in any product, work or materials arising out of the project to
which your contribution was submitted, and
* at our option, to sublicense these same rights to third parties through
multiple levels of sublicensees or other licensing arrangements.
4. Except as set out above, you keep all right, title, and interest in your
contribution. The rights that you grant to us under these terms are effective
on the date you first submitted a contribution to us, even if your submission
took place before the date you sign these terms.
5. You covenant, represent, warrant and agree that:
* Each contribution that you submit is and shall be an original work of
authorship and you can legally grant the rights set out in this SCA;
* to the best of your knowledge, each contribution will not violate any
third party's copyrights, trademarks, patents, or other intellectual
property rights; and
* each contribution shall be in compliance with U.S. export control laws and
other applicable export and import laws. You agree to notify us if you
become aware of any circumstance which would make any of the foregoing
representations inaccurate in any respect. We may publicly disclose your
participation in the project, including the fact that you have signed the SCA.
6. This SCA is governed by the laws of the State of California and applicable
U.S. Federal law. Any choice of law rules will not apply.
7. Please place an “x” on one of the applicable statement below. Please do NOT
mark both statements:
* [x] I am signing on behalf of myself as an individual and no other person
or entity, including my employer, has or will have rights with respect my
contributions.
* [ ] I am signing on behalf of my employer or a legal entity and I have the
actual authority to contractually bind that entity.
## Contributor Details
| Field | Entry |
|------------------------------- | -------------------- |
| Name | Doug DesCombaz |
| Company name (if applicable) | |
| Title or role (if applicable) | |
| Date | 2018-03-15 |
| GitHub username | doug-descombaz |
| Website (optional) | https://medium.com/@doug.descombaz |

106
.github/contributors/howl-anderson.md vendored Normal file
View File

@ -0,0 +1,106 @@
# spaCy contributor agreement
This spaCy Contributor Agreement (**"SCA"**) is based on the
[Oracle Contributor Agreement](http://www.oracle.com/technetwork/oca-405177.pdf).
The SCA applies to any contribution that you make to any product or project
managed by us (the **"project"**), and sets out the intellectual property rights
you grant to us in the contributed materials. The term **"us"** shall mean
[ExplosionAI UG (haftungsbeschränkt)](https://explosion.ai/legal). The term
**"you"** shall mean the person or entity identified below.
If you agree to be bound by these terms, fill in the information requested
below and include the filled-in version with your first pull request, under the
folder [`.github/contributors/`](/.github/contributors/). The name of the file
should be your GitHub username, with the extension `.md`. For example, the user
example_user would create the file `.github/contributors/example_user.md`.
Read this agreement carefully before signing. These terms and conditions
constitute a binding legal agreement.
## Contributor Agreement
1. The term "contribution" or "contributed materials" means any source code,
object code, patch, tool, sample, graphic, specification, manual,
documentation, or any other material posted or submitted by you to the project.
2. With respect to any worldwide copyrights, or copyright applications and
registrations, in your contribution:
* you hereby assign to us joint ownership, and to the extent that such
assignment is or becomes invalid, ineffective or unenforceable, you hereby
grant to us a perpetual, irrevocable, non-exclusive, worldwide, no-charge,
royalty-free, unrestricted license to exercise all rights under those
copyrights. This includes, at our option, the right to sublicense these same
rights to third parties through multiple levels of sublicensees or other
licensing arrangements;
* you agree that each of us can do all things in relation to your
contribution as if each of us were the sole owners, and if one of us makes
a derivative work of your contribution, the one who makes the derivative
work (or has it made will be the sole owner of that derivative work;
* you agree that you will not assert any moral rights in your contribution
against us, our licensees or transferees;
* you agree that we may register a copyright in your contribution and
exercise all ownership rights associated with it; and
* you agree that neither of us has any duty to consult with, obtain the
consent of, pay or render an accounting to the other for any use or
distribution of your contribution.
3. With respect to any patents you own, or that you can license without payment
to any third party, you hereby grant to us a perpetual, irrevocable,
non-exclusive, worldwide, no-charge, royalty-free license to:
* make, have made, use, sell, offer to sell, import, and otherwise transfer
your contribution in whole or in part, alone or in combination with or
included in any product, work or materials arising out of the project to
which your contribution was submitted, and
* at our option, to sublicense these same rights to third parties through
multiple levels of sublicensees or other licensing arrangements.
4. Except as set out above, you keep all right, title, and interest in your
contribution. The rights that you grant to us under these terms are effective
on the date you first submitted a contribution to us, even if your submission
took place before the date you sign these terms.
5. You covenant, represent, warrant and agree that:
* Each contribution that you submit is and shall be an original work of
authorship and you can legally grant the rights set out in this SCA;
* to the best of your knowledge, each contribution will not violate any
third party's copyrights, trademarks, patents, or other intellectual
property rights; and
* each contribution shall be in compliance with U.S. export control laws and
other applicable export and import laws. You agree to notify us if you
become aware of any circumstance which would make any of the foregoing
representations inaccurate in any respect. We may publicly disclose your
participation in the project, including the fact that you have signed the SCA.
6. This SCA is governed by the laws of the State of California and applicable
U.S. Federal law. Any choice of law rules will not apply.
7. Please place an “x” on one of the applicable statement below. Please do NOT
mark both statements:
* [x] I am signing on behalf of myself as an individual and no other person
or entity, including my employer, has or will have rights with respect to my
contributions.
* [ ] I am signing on behalf of my employer or a legal entity and I have the
actual authority to contractually bind that entity.
## Contributor Details
| Field | Entry |
|------------------------------- | -------------------- |
| Name | Xiaoquan Kong |
| Company name (if applicable) | |
| Title or role (if applicable) | |
| Date | 2018-03-23 |
| GitHub username | howl-anderson |
| Website (optional) | |

106
.github/contributors/iann0036.md vendored Normal file
View File

@ -0,0 +1,106 @@
# spaCy contributor agreement
This spaCy Contributor Agreement (**"SCA"**) is based on the
[Oracle Contributor Agreement](http://www.oracle.com/technetwork/oca-405177.pdf).
The SCA applies to any contribution that you make to any product or project
managed by us (the **"project"**), and sets out the intellectual property rights
you grant to us in the contributed materials. The term **"us"** shall mean
[ExplosionAI UG (haftungsbeschränkt)](https://explosion.ai/legal). The term
**"you"** shall mean the person or entity identified below.
If you agree to be bound by these terms, fill in the information requested
below and include the filled-in version with your first pull request, under the
folder [`.github/contributors/`](/.github/contributors/). The name of the file
should be your GitHub username, with the extension `.md`. For example, the user
example_user would create the file `.github/contributors/example_user.md`.
Read this agreement carefully before signing. These terms and conditions
constitute a binding legal agreement.
## Contributor Agreement
1. The term "contribution" or "contributed materials" means any source code,
object code, patch, tool, sample, graphic, specification, manual,
documentation, or any other material posted or submitted by you to the project.
2. With respect to any worldwide copyrights, or copyright applications and
registrations, in your contribution:
* you hereby assign to us joint ownership, and to the extent that such
assignment is or becomes invalid, ineffective or unenforceable, you hereby
grant to us a perpetual, irrevocable, non-exclusive, worldwide, no-charge,
royalty-free, unrestricted license to exercise all rights under those
copyrights. This includes, at our option, the right to sublicense these same
rights to third parties through multiple levels of sublicensees or other
licensing arrangements;
* you agree that each of us can do all things in relation to your
contribution as if each of us were the sole owners, and if one of us makes
a derivative work of your contribution, the one who makes the derivative
work (or has it made will be the sole owner of that derivative work;
* you agree that you will not assert any moral rights in your contribution
against us, our licensees or transferees;
* you agree that we may register a copyright in your contribution and
exercise all ownership rights associated with it; and
* you agree that neither of us has any duty to consult with, obtain the
consent of, pay or render an accounting to the other for any use or
distribution of your contribution.
3. With respect to any patents you own, or that you can license without payment
to any third party, you hereby grant to us a perpetual, irrevocable,
non-exclusive, worldwide, no-charge, royalty-free license to:
* make, have made, use, sell, offer to sell, import, and otherwise transfer
your contribution in whole or in part, alone or in combination with or
included in any product, work or materials arising out of the project to
which your contribution was submitted, and
* at our option, to sublicense these same rights to third parties through
multiple levels of sublicensees or other licensing arrangements.
4. Except as set out above, you keep all right, title, and interest in your
contribution. The rights that you grant to us under these terms are effective
on the date you first submitted a contribution to us, even if your submission
took place before the date you sign these terms.
5. You covenant, represent, warrant and agree that:
* Each contribution that you submit is and shall be an original work of
authorship and you can legally grant the rights set out in this SCA;
* to the best of your knowledge, each contribution will not violate any
third party's copyrights, trademarks, patents, or other intellectual
property rights; and
* each contribution shall be in compliance with U.S. export control laws and
other applicable export and import laws. You agree to notify us if you
become aware of any circumstance which would make any of the foregoing
representations inaccurate in any respect. We may publicly disclose your
participation in the project, including the fact that you have signed the SCA.
6. This SCA is governed by the laws of the State of California and applicable
U.S. Federal law. Any choice of law rules will not apply.
7. Please place an “x” on one of the applicable statement below. Please do NOT
mark both statements:
* [x] I am signing on behalf of myself as an individual and no other person
or entity, including my employer, has or will have rights with respect to my
contributions.
* [x] I am signing on behalf of my employer or a legal entity and I have the
actual authority to contractually bind that entity.
## Contributor Details
| Field | Entry |
|------------------------------- | -------------------- |
| Name | Ian Mckay |
| Company name (if applicable) | |
| Title or role (if applicable) | |
| Date | 22/03/2018 |
| GitHub username | iann0036 |
| Website (optional) | |

106
.github/contributors/justindujardin.md vendored Normal file
View File

@ -0,0 +1,106 @@
# spaCy contributor agreement
This spaCy Contributor Agreement (**"SCA"**) is based on the
[Oracle Contributor Agreement](http://www.oracle.com/technetwork/oca-405177.pdf).
The SCA applies to any contribution that you make to any product or project
managed by us (the **"project"**), and sets out the intellectual property rights
you grant to us in the contributed materials. The term **"us"** shall mean
[ExplosionAI UG (haftungsbeschränkt)](https://explosion.ai/legal). The term
**"you"** shall mean the person or entity identified below.
If you agree to be bound by these terms, fill in the information requested
below and include the filled-in version with your first pull request, under the
folder [`.github/contributors/`](/.github/contributors/). The name of the file
should be your GitHub username, with the extension `.md`. For example, the user
example_user would create the file `.github/contributors/example_user.md`.
Read this agreement carefully before signing. These terms and conditions
constitute a binding legal agreement.
## Contributor Agreement
1. The term "contribution" or "contributed materials" means any source code,
object code, patch, tool, sample, graphic, specification, manual,
documentation, or any other material posted or submitted by you to the project.
2. With respect to any worldwide copyrights, or copyright applications and
registrations, in your contribution:
* you hereby assign to us joint ownership, and to the extent that such
assignment is or becomes invalid, ineffective or unenforceable, you hereby
grant to us a perpetual, irrevocable, non-exclusive, worldwide, no-charge,
royalty-free, unrestricted license to exercise all rights under those
copyrights. This includes, at our option, the right to sublicense these same
rights to third parties through multiple levels of sublicensees or other
licensing arrangements;
* you agree that each of us can do all things in relation to your
contribution as if each of us were the sole owners, and if one of us makes
a derivative work of your contribution, the one who makes the derivative
work (or has it made will be the sole owner of that derivative work;
* you agree that you will not assert any moral rights in your contribution
against us, our licensees or transferees;
* you agree that we may register a copyright in your contribution and
exercise all ownership rights associated with it; and
* you agree that neither of us has any duty to consult with, obtain the
consent of, pay or render an accounting to the other for any use or
distribution of your contribution.
3. With respect to any patents you own, or that you can license without payment
to any third party, you hereby grant to us a perpetual, irrevocable,
non-exclusive, worldwide, no-charge, royalty-free license to:
* make, have made, use, sell, offer to sell, import, and otherwise transfer
your contribution in whole or in part, alone or in combination with or
included in any product, work or materials arising out of the project to
which your contribution was submitted, and
* at our option, to sublicense these same rights to third parties through
multiple levels of sublicensees or other licensing arrangements.
4. Except as set out above, you keep all right, title, and interest in your
contribution. The rights that you grant to us under these terms are effective
on the date you first submitted a contribution to us, even if your submission
took place before the date you sign these terms.
5. You covenant, represent, warrant and agree that:
* Each contribution that you submit is and shall be an original work of
authorship and you can legally grant the rights set out in this SCA;
* to the best of your knowledge, each contribution will not violate any
third party's copyrights, trademarks, patents, or other intellectual
property rights; and
* each contribution shall be in compliance with U.S. export control laws and
other applicable export and import laws. You agree to notify us if you
become aware of any circumstance which would make any of the foregoing
representations inaccurate in any respect. We may publicly disclose your
participation in the project, including the fact that you have signed the SCA.
6. This SCA is governed by the laws of the State of California and applicable
U.S. Federal law. Any choice of law rules will not apply.
7. Please place an “x” on one of the applicable statement below. Please do NOT
mark both statements:
* [x] I am signing on behalf of myself as an individual and no other person
or entity, including my employer, has or will have rights with respect to my
contributions.
* [ ] I am signing on behalf of my employer or a legal entity and I have the
actual authority to contractually bind that entity.
## Contributor Details
| Field | Entry |
|------------------------------- | -------------------- |
| Name | Justin DuJardin |
| Company name (if applicable) | DuJardin Consulting, LLC |
| Title or role (if applicable) | |
| Date | 2018-03-23 |
| GitHub username | justindujardin |
| Website (optional) | https://justindujardin.com |

106
.github/contributors/ottosulin.md vendored Normal file
View File

@ -0,0 +1,106 @@
# spaCy contributor agreement
This spaCy Contributor Agreement (**"SCA"**) is based on the
[Oracle Contributor Agreement](http://www.oracle.com/technetwork/oca-405177.pdf).
The SCA applies to any contribution that you make to any product or project
managed by us (the **"project"**), and sets out the intellectual property rights
you grant to us in the contributed materials. The term **"us"** shall mean
[ExplosionAI UG (haftungsbeschr<68>nkt)](https://explosion.ai/legal). The term
**"you"** shall mean the person or entity identified below.
If you agree to be bound by these terms, fill in the information requested
below and include the filled-in version with your first pull request, under the
folder [`.github/contributors/`](/.github/contributors/). The name of the file
should be your GitHub username, with the extension `.md`. For example, the user
example_user would create the file `.github/contributors/example_user.md`.
Read this agreement carefully before signing. These terms and conditions
constitute a binding legal agreement.
## Contributor Agreement
1. The term "contribution" or "contributed materials" means any source code,
object code, patch, tool, sample, graphic, specification, manual,
documentation, or any other material posted or submitted by you to the project.
2. With respect to any worldwide copyrights, or copyright applications and
registrations, in your contribution:
* you hereby assign to us joint ownership, and to the extent that such
assignment is or becomes invalid, ineffective or unenforceable, you hereby
grant to us a perpetual, irrevocable, non-exclusive, worldwide, no-charge,
royalty-free, unrestricted license to exercise all rights under those
copyrights. This includes, at our option, the right to sublicense these same
rights to third parties through multiple levels of sublicensees or other
licensing arrangements;
* you agree that each of us can do all things in relation to your
contribution as if each of us were the sole owners, and if one of us makes
a derivative work of your contribution, the one who makes the derivative
work (or has it made will be the sole owner of that derivative work;
* you agree that you will not assert any moral rights in your contribution
against us, our licensees or transferees;
* you agree that we may register a copyright in your contribution and
exercise all ownership rights associated with it; and
* you agree that neither of us has any duty to consult with, obtain the
consent of, pay or render an accounting to the other for any use or
distribution of your contribution.
3. With respect to any patents you own, or that you can license without payment
to any third party, you hereby grant to us a perpetual, irrevocable,
non-exclusive, worldwide, no-charge, royalty-free license to:
* make, have made, use, sell, offer to sell, import, and otherwise transfer
your contribution in whole or in part, alone or in combination with or
included in any product, work or materials arising out of the project to
which your contribution was submitted, and
* at our option, to sublicense these same rights to third parties through
multiple levels of sublicensees or other licensing arrangements.
4. Except as set out above, you keep all right, title, and interest in your
contribution. The rights that you grant to us under these terms are effective
on the date you first submitted a contribution to us, even if your submission
took place before the date you sign these terms.
5. You covenant, represent, warrant and agree that:
* Each contribution that you submit is and shall be an original work of
authorship and you can legally grant the rights set out in this SCA;
* to the best of your knowledge, each contribution will not violate any
third party's copyrights, trademarks, patents, or other intellectual
property rights; and
* each contribution shall be in compliance with U.S. export control laws and
other applicable export and import laws. You agree to notify us if you
become aware of any circumstance which would make any of the foregoing
representations inaccurate in any respect. We may publicly disclose your
participation in the project, including the fact that you have signed the SCA.
6. This SCA is governed by the laws of the State of California and applicable
U.S. Federal law. Any choice of law rules will not apply.
7. Please place an <20>x<EFBFBD> on one of the applicable statement below. Please do NOT
mark both statements:
* [ X ] I am signing on behalf of myself as an individual and no other person
or entity, including my employer, has or will have rights with respect to my
contributions.
* [ ] I am signing on behalf of my employer or a legal entity and I have the
actual authority to contractually bind that entity.
## Contributor Details
| Field | Entry |
|------------------------------- | -------------------- |
| Name | Otto Sulin |
| Company name (if applicable) | |
| Title or role (if applicable) | |
| Date | 23/03/2018 |
| GitHub username | ottosulin |
| Website (optional) | |

106
.github/contributors/willismonroe.md vendored Normal file
View File

@ -0,0 +1,106 @@
# spaCy contributor agreement
This spaCy Contributor Agreement (**"SCA"**) is based on the
[Oracle Contributor Agreement](http://www.oracle.com/technetwork/oca-405177.pdf).
The SCA applies to any contribution that you make to any product or project
managed by us (the **"project"**), and sets out the intellectual property rights
you grant to us in the contributed materials. The term **"us"** shall mean
[ExplosionAI UG (haftungsbeschränkt)](https://explosion.ai/legal). The term
**"you"** shall mean the person or entity identified below.
If you agree to be bound by these terms, fill in the information requested
below and include the filled-in version with your first pull request, under the
folder [`.github/contributors/`](/.github/contributors/). The name of the file
should be your GitHub username, with the extension `.md`. For example, the user
example_user would create the file `.github/contributors/example_user.md`.
Read this agreement carefully before signing. These terms and conditions
constitute a binding legal agreement.
## Contributor Agreement
1. The term "contribution" or "contributed materials" means any source code,
object code, patch, tool, sample, graphic, specification, manual,
documentation, or any other material posted or submitted by you to the project.
2. With respect to any worldwide copyrights, or copyright applications and
registrations, in your contribution:
* you hereby assign to us joint ownership, and to the extent that such
assignment is or becomes invalid, ineffective or unenforceable, you hereby
grant to us a perpetual, irrevocable, non-exclusive, worldwide, no-charge,
royalty-free, unrestricted license to exercise all rights under those
copyrights. This includes, at our option, the right to sublicense these same
rights to third parties through multiple levels of sublicensees or other
licensing arrangements;
* you agree that each of us can do all things in relation to your
contribution as if each of us were the sole owners, and if one of us makes
a derivative work of your contribution, the one who makes the derivative
work (or has it made will be the sole owner of that derivative work;
* you agree that you will not assert any moral rights in your contribution
against us, our licensees or transferees;
* you agree that we may register a copyright in your contribution and
exercise all ownership rights associated with it; and
* you agree that neither of us has any duty to consult with, obtain the
consent of, pay or render an accounting to the other for any use or
distribution of your contribution.
3. With respect to any patents you own, or that you can license without payment
to any third party, you hereby grant to us a perpetual, irrevocable,
non-exclusive, worldwide, no-charge, royalty-free license to:
* make, have made, use, sell, offer to sell, import, and otherwise transfer
your contribution in whole or in part, alone or in combination with or
included in any product, work or materials arising out of the project to
which your contribution was submitted, and
* at our option, to sublicense these same rights to third parties through
multiple levels of sublicensees or other licensing arrangements.
4. Except as set out above, you keep all right, title, and interest in your
contribution. The rights that you grant to us under these terms are effective
on the date you first submitted a contribution to us, even if your submission
took place before the date you sign these terms.
5. You covenant, represent, warrant and agree that:
* Each contribution that you submit is and shall be an original work of
authorship and you can legally grant the rights set out in this SCA;
* to the best of your knowledge, each contribution will not violate any
third party's copyrights, trademarks, patents, or other intellectual
property rights; and
* each contribution shall be in compliance with U.S. export control laws and
other applicable export and import laws. You agree to notify us if you
become aware of any circumstance which would make any of the foregoing
representations inaccurate in any respect. We may publicly disclose your
participation in the project, including the fact that you have signed the SCA.
6. This SCA is governed by the laws of the State of California and applicable
U.S. Federal law. Any choice of law rules will not apply.
7. Please place an “x” on one of the applicable statement below. Please do NOT
mark both statements:
* [x] I am signing on behalf of myself as an individual and no other person
or entity, including my employer, has or will have rights with respect to my
contributions.
* [x] I am signing on behalf of my employer or a legal entity and I have the
actual authority to contractually bind that entity.
## Contributor Details
| Field | Entry |
|------------------------------- | -------------------- |
| Name | Willis Monroe |
| Company name (if applicable) | |
| Title or role (if applicable) | |
| Date | 2018-3-5 |
| GitHub username | willismonroe |
| Website (optional) | |

View File

@ -0,0 +1,82 @@
#!/usr/bin/env python
# coding: utf8
"""Visualize spaCy word vectors in Tensorboard.
Adapted from: https://gist.github.com/BrikerMan/7bd4e4bd0a00ac9076986148afc06507
"""
from __future__ import unicode_literals
from os import path
import math
import numpy
import plac
import spacy
import tensorflow as tf
import tqdm
from tensorflow.contrib.tensorboard.plugins.projector import visualize_embeddings, ProjectorConfig
@plac.annotations(
vectors_loc=("Path to spaCy model that contains vectors", "positional", None, str),
out_loc=("Path to output folder for tensorboard session data", "positional", None, str),
name=("Human readable name for tsv file and vectors tensor", "positional", None, str),
)
def main(vectors_loc, out_loc, name="spaCy_vectors"):
meta_file = "{}.tsv".format(name)
out_meta_file = path.join(out_loc, meta_file)
print('Loading spaCy vectors model: {}'.format(vectors_loc))
model = spacy.load(vectors_loc)
print('Finding lexemes with vectors attached: {}'.format(vectors_loc))
strings_stream = tqdm.tqdm(model.vocab.strings, total=len(model.vocab.strings), leave=False)
queries = [w for w in strings_stream if model.vocab.has_vector(w)]
vector_count = len(queries)
print('Building Tensorboard Projector metadata for ({}) vectors: {}'.format(vector_count, out_meta_file))
# Store vector data in a tensorflow variable
tf_vectors_variable = numpy.zeros((vector_count, model.vocab.vectors.shape[1]))
# Write a tab-separated file that contains information about the vectors for visualization
#
# Reference: https://www.tensorflow.org/programmers_guide/embedding#metadata
with open(out_meta_file, 'wb') as file_metadata:
# Define columns in the first row
file_metadata.write("Text\tFrequency\n".encode('utf-8'))
# Write out a row for each vector that we add to the tensorflow variable we created
vec_index = 0
for text in tqdm.tqdm(queries, total=len(queries), leave=False):
# https://github.com/tensorflow/tensorflow/issues/9094
text = '<Space>' if text.lstrip() == '' else text
lex = model.vocab[text]
# Store vector data and metadata
tf_vectors_variable[vec_index] = model.vocab.get_vector(text)
file_metadata.write("{}\t{}\n".format(text, math.exp(lex.prob) * vector_count).encode('utf-8'))
vec_index += 1
print('Running Tensorflow Session...')
sess = tf.InteractiveSession()
tf.Variable(tf_vectors_variable, trainable=False, name=name)
tf.global_variables_initializer().run()
saver = tf.train.Saver()
writer = tf.summary.FileWriter(out_loc, sess.graph)
# Link the embeddings into the config
config = ProjectorConfig()
embed = config.embeddings.add()
embed.tensor_name = name
embed.metadata_path = meta_file
# Tell the projector about the configured embeddings and metadata file
visualize_embeddings(writer, config)
# Save session and print run command to the output
print('Saving Tensorboard Session...')
saver.save(sess, path.join(out_loc, '{}.ckpt'.format(name)))
print('Done. Run `tensorboard --logdir={0}` to view in Tensorboard'.format(out_loc))
if __name__ == '__main__':
plac.call(main)

22
spacy/lang/tr/examples.py Normal file
View File

@ -0,0 +1,22 @@
# coding: utf8
from __future__ import unicode_literals
"""
Example sentences to test spaCy and its language models.
>>> from spacy.lang.tr.examples import sentences
>>> docs = nlp.pipe(sentences)
"""
sentences = [
"Neredesin?",
"Neredesiniz?",
"Bu bir cümledir.",
"Sürücüsüz araçlar sigorta yükümlülüğünü üreticilere kaydırıyor.",
"San Francisco kaldırımda kurye robotları yasaklayabilir."
"Londra İngiltere'nin başkentidir.",
"Türkiye'nin başkenti neresi?",
"Bakanlar Kurulu 180 günlük eylem planınııkladı.",
"Merkez Bankası, beklentiler doğrultusunda faizlerde değişikliğe gitmedi."
]

View File

@ -0,0 +1,31 @@
# coding: utf8
from __future__ import unicode_literals
from ...attrs import LIKE_NUM
#Thirteen, fifteen etc. are written separate: on üç
_num_words = ['bir', 'iki', 'üç', 'dört', 'beş', 'altı', 'yedi', 'sekiz',
'dokuz', 'on', 'yirmi', 'otuz', 'kırk', 'elli', 'altmış',
'yetmiş', 'seksen', 'doksan', 'yüz', 'bin', 'milyon',
'milyar', 'katrilyon', 'kentilyon']
def like_num(text):
text = text.replace(',', '').replace('.', '')
if text.isdigit():
return True
if text.count('/') == 1:
num, denom = text.split('/')
if num.isdigit() and denom.isdigit():
return True
if text.lower() in _num_words:
return True
return False
LEX_ATTRS = {
LIKE_NUM: like_num
}

View File

@ -10,16 +10,12 @@ acep
adamakıllı
adeta
ait
altmýþ
altmış
altý
altı
ama
amma
anca
ancak
arada
artýk
artık
aslında
aynen
ayrıca
@ -29,46 +25,82 @@ açıkçası
bana
bari
bazen
bazý
bazı
bazısı
bazısına
bazısında
bazısından
bazısını
bazısının
başkası
baţka
başkasına
başkasında
başkasından
başkasını
başkasının
başka
belki
ben
bende
benden
beni
benim
beri
beriki
beþ
beş
beţ
berikinin
berikiyi
berisi
bilcümle
bile
bin
binaen
binaenaleyh
bir
biraz
birazdan
birbiri
birbirine
birbirini
birbirinin
birbirinde
birbirinden
birden
birdenbire
biri
birine
birini
birinin
birinde
birinden
birice
birileri
birilerinde
birilerinden
birilerine
birilerini
birilerinin
birisi
birisine
birisini
birisinin
birisinde
birisinden
birkaç
birkaçı
birkaçına
birkaçını
birkaçının
birkaçında
birkaçından
birkez
birlikte
birçok
birçoğu
birþey
birþeyi
birçoğuna
birçoğunda
birçoğundan
birçoğunu
birçoğunun
birşey
birşeyi
birţey
bitevi
biteviye
bittabi
@ -96,6 +128,11 @@ buracıkta
burada
buradan
burası
burasına
burasını
burasının
burasında
burasından
böyle
böylece
böylecene
@ -106,8 +143,34 @@ büsbütün
bütün
cuk
cümlesi
cümlesine
cümlesini
cümlesinin
cümlesinden
cümlemize
cümlemizi
cümlemizden
çabuk
çabukça
çeşitli
çok
çokları
çoklarınca
çokluk
çoklukla
çokça
çoğu
çoğun
çoğunca
çoğunda
çoğundan
çoğunlukla
çoğunu
çoğunun
çünkü
da
daha
dahası
dahi
dahil
dahilen
@ -124,19 +187,17 @@ denli
derakap
derhal
derken
deđil
değil
değin
diye
diđer
diğer
diğeri
doksan
dokuz
diğerine
diğerini
diğerinden
dolayı
dolayısıyla
doğru
dört
edecek
eden
ederek
@ -146,7 +207,6 @@ edilmesi
ediyor
elbet
elbette
elli
emme
en
enikonu
@ -168,10 +228,10 @@ evvelce
evvelden
evvelemirde
evveli
eđer
eğer
fakat
filanca
filancanın
gah
gayet
gayetle
@ -197,6 +257,10 @@ haliyle
handiyse
hangi
hangisi
hangisine
hangisine
hangisinde
hangisinden
hani
hariç
hasebiyle
@ -207,17 +271,27 @@ hem
henüz
hep
hepsi
hepsini
hepsinin
hepsinde
hepsinden
her
herhangi
herkes
herkesi
herkesin
herkesten
hiç
hiçbir
hiçbiri
hiçbirine
hiçbirini
hiçbirinin
hiçbirinde
hiçbirinden
hoş
hulasaten
iken
iki
ila
ile
ilen
@ -240,43 +314,55 @@ iyicene
için
işte
iţte
kadar
kaffesi
kah
kala
kanýmca
kanımca
karşın
katrilyon
kaynak
kaçı
kaçına
kaçında
kaçından
kaçını
kaçının
kelli
kendi
kendilerinde
kendilerinden
kendilerine
kendilerini
kendilerinin
kendini
kendisi
kendisinde
kendisinden
kendisine
kendisini
kendisinin
kere
kez
keza
kezalik
keşke
keţke
ki
kim
kimden
kime
kimi
kiminin
kimisi
kimisinde
kimisinden
kimisine
kimisinin
kimse
kimsecik
kimsecikler
külliyen
kýrk
kýsaca
kırk
kısaca
kısacası
lakin
leh
lütfen
@ -289,13 +375,10 @@ međer
meğer
meğerki
meğerse
milyar
milyon
mu
mı
nasýl
mi
nasıl
nasılsa
nazaran
@ -304,6 +387,8 @@ ne
neden
nedeniyle
nedenle
nedenler
nedenlerden
nedense
nerde
nerden
@ -332,32 +417,27 @@ olduklarını
oldukça
olduğu
olduğunu
olmadı
olmadığı
olmak
olması
olmayan
olmaz
olsa
olsun
olup
olur
olursa
oluyor
on
ona
onca
onculayın
onda
ondan
onlar
onlara
onlardan
onlari
onlarýn
onları
onların
onu
onun
ora
oracık
oracıkta
orada
@ -365,9 +445,26 @@ oradan
oranca
oranla
oraya
otuz
oysa
oysaki
öbür
öbürkü
öbürü
öbüründe
öbüründen
öbürüne
öbürünü
önce
önceden
önceleri
öncelikle
öteki
ötekisi
öyle
öylece
öylelikle
öylemesine
öz
pek
pekala
peki
@ -379,8 +476,6 @@ sahi
sahiden
sana
sanki
sekiz
seksen
sen
senden
seni
@ -393,6 +488,27 @@ sonra
sonradan
sonraları
sonunda
şayet
şey
şeyden
şeyi
şeyler
şu
şuna
şuncacık
şunda
şundan
şunlar
şunları
şunların
şunu
şunun
şura
şuracık
şuracıkta
şurası
şöyle
şimdi
tabii
tam
tamam
@ -400,8 +516,8 @@ tamamen
tamamıyla
tarafından
tek
trilyon
tüm
üzere
var
vardı
vasıtasıyla
@ -429,84 +545,16 @@ yaptığını
yapılan
yapılması
yapıyor
yedi
yeniden
yenilerde
yerine
yetmiþ
yetmiş
yetmiţ
yine
yirmi
yok
yoksa
yoluyla
yüz
yüzünden
zarfında
zaten
zati
zira
çabuk
çabukça
çeşitli
çok
çokları
çoklarınca
çokluk
çoklukla
çokça
çoğu
çoğun
çoğunca
çoğunlukla
çünkü
öbür
öbürkü
öbürü
önce
önceden
önceleri
öncelikle
öteki
ötekisi
öyle
öylece
öylelikle
öylemesine
öz
üzere
üç
þey
þeyden
þeyi
þeyler
þu
þuna
þunda
þundan
þunu
şayet
şey
şeyden
şeyi
şeyler
şu
şuna
şuncacık
şunda
şundan
şunlar
şunları
şunu
şunun
şura
şuracık
şuracıkta
şurası
şöyle
ţayet
ţimdi
ţu
ţöyle
""".split())

View File

@ -3,11 +3,6 @@ from __future__ import unicode_literals
from ...symbols import ORTH, NORM
# These exceptions are mostly for example purposes hoping that Turkish
# speakers can contribute in the future! Source of copy-pasted examples:
# https://en.wiktionary.org/wiki/Category:Turkish_language
_exc = {
"sağol": [
{ORTH: "sağ"},
@ -16,11 +11,112 @@ _exc = {
for exc_data in [
{ORTH: "A.B.D.", NORM: "Amerika Birleşik Devletleri"}]:
{ORTH: "A.B.D.", NORM: "Amerika Birleşik Devletleri"},
{ORTH: "Alb.", NORM: "Albay"},
{ORTH: "Ar.Gör.", NORM: "Araştırma Görevlisi"},
{ORTH: "Arş.Gör.", NORM: "Araştırma Görevlisi"},
{ORTH: "Asb.", NORM: "Astsubay"},
{ORTH: "Astsb.", NORM: "Astsubay"},
{ORTH: "As.İz.", NORM: "Askeri İnzibat"},
{ORTH: "Atğm", NORM: "Asteğmen"},
{ORTH: "Av.", NORM: "Avukat"},
{ORTH: "Apt.", NORM: "Apartmanı"},
{ORTH: "Bçvş.", NORM: "Başçavuş"},
{ORTH: "bk.", NORM: "bakınız"},
{ORTH: "bknz.", NORM: "bakınız"},
{ORTH: "Bnb.", NORM: "Binbaşı"},
{ORTH: "bnb.", NORM: "binbaşı"},
{ORTH: "Böl.", NORM: "Bölümü"},
{ORTH: "Bşk.", NORM: "Başkanlığı"},
{ORTH: "Bştbp.", NORM: "Baştabip"},
{ORTH: "Bul.", NORM: "Bulvarı"},
{ORTH: "Cad.", NORM: "Caddesi"},
{ORTH: "çev.", NORM: "çeviren"},
{ORTH: "Çvş.", NORM: "Çavuş"},
{ORTH: "dak.", NORM: "dakika"},
{ORTH: "dk.", NORM: "dakika"},
{ORTH: "Doç.", NORM: "Doçent"},
{ORTH: "doğ.", NORM: "doğum tarihi"},
{ORTH: "drl.", NORM: "derleyen"},
{ORTH: "Dz.", NORM: "Deniz"},
{ORTH: "Dz.K.K.lığı", NORM: "Deniz Kuvvetleri Komutanlığı"},
{ORTH: "Dz.Kuv.", NORM: "Deniz Kuvvetleri"},
{ORTH: "Dz.Kuv.K.", NORM: "Deniz Kuvvetleri Komutanlığı"},
{ORTH: "dzl.", NORM: "düzenleyen"},
{ORTH: "Ecz.", NORM: "Eczanesi"},
{ORTH: "ekon.", NORM: "ekonomi"},
{ORTH: "Fak.", NORM: "Fakültesi"},
{ORTH: "Gn.", NORM: "Genel"},
{ORTH: "Gnkur.", NORM: "Genelkurmay"},
{ORTH: "Gn.Kur.", NORM: "Genelkurmay"},
{ORTH: "gr.", NORM: "gram"},
{ORTH: "Hst.", NORM: "Hastanesi"},
{ORTH: "Hs.Uzm.", NORM: "Hesap Uzmanı"},
{ORTH: "huk.", NORM: "hukuk"},
{ORTH: "Hv.", NORM: "Hava"},
{ORTH: "Hv.K.K.lığı", NORM: "Hava Kuvvetleri Komutanlığı"},
{ORTH: "Hv.Kuv.", NORM: "Hava Kuvvetleri"},
{ORTH: "Hv.Kuv.K.", NORM: "Hava Kuvvetleri Komutanlığı"},
{ORTH: "Hz.", NORM: "Hazreti"},
{ORTH: "Hz.Öz.", NORM: "Hizmete Özel"},
{ORTH: "İng.", NORM: "İngilizce"},
{ORTH: "Jeol.", NORM: "Jeoloji"},
{ORTH: "jeol.", NORM: "jeoloji"},
{ORTH: "Korg.", NORM: "Korgeneral"},
{ORTH: "Kur.", NORM: "Kurmay"},
{ORTH: "Kur.Bşk.", NORM: "Kurmay Başkanı"},
{ORTH: "Kuv.", NORM: "Kuvvetleri"},
{ORTH: "Ltd.", NORM: "Limited"},
{ORTH: "Mah.", NORM: "Mahallesi"},
{ORTH: "mah.", NORM: "mahallesi"},
{ORTH: "max.", NORM: "maksimum"},
{ORTH: "min.", NORM: "minimum"},
{ORTH: "Müh.", NORM: "Mühendisliği"},
{ORTH: "müh.", NORM: "mühendisliği"},
{ORTH: "MÖ.", NORM: "Milattan Önce"},
{ORTH: "Onb.", NORM: "Onbaşı"},
{ORTH: "Ord.", NORM: "Ordinaryüs"},
{ORTH: "Org.", NORM: "Orgeneral"},
{ORTH: "Ped.", NORM: "Pedagoji"},
{ORTH: "Prof.", NORM: "Profesör"},
{ORTH: "Sb.", NORM: "Subay"},
{ORTH: "Sn.", NORM: "Sayın"},
{ORTH: "sn.", NORM: "saniye"},
{ORTH: "Sok.", NORM: "Sokak"},
{ORTH: "Şb.", NORM: "Şube"},
{ORTH: "Şti.", NORM: "Şirketi"},
{ORTH: "Tbp.", NORM: "Tabip"},
{ORTH: "T.C.", NORM: "Türkiye Cumhuriyeti"},
{ORTH: "Tel.", NORM: "Telefon"},
{ORTH: "tel.", NORM: "telefon"},
{ORTH: "telg.", NORM: "telgraf"},
{ORTH: "Tğm.", NORM: "Teğmen"},
{ORTH: "tğm.", NORM: "teğmen"},
{ORTH: "tic.", NORM: "ticaret"},
{ORTH: "Tug.", NORM: "Tugay"},
{ORTH: "Tuğg.", NORM: "Tuğgeneral"},
{ORTH: "Tümg.", NORM: "Tümgeneral"},
{ORTH: "Uzm.", NORM: "Uzman"},
{ORTH: "Üçvş.", NORM: "Üstçavuş"},
{ORTH: "Üni.", NORM: "Üniversitesi"},
{ORTH: "Ütğm.", NORM: "Üsteğmen"},
{ORTH: "vb.", NORM: "ve benzeri"},
{ORTH: "vs.", NORM: "vesaire"},
{ORTH: "Yard.", NORM: "Yardımcı"},
{ORTH: "Yar.", NORM: "Yardımcı"},
{ORTH: "Yd.Sb.", NORM: "Yedek Subay"},
{ORTH: "Yard.Doç.", NORM: "Yardımcı Doçent"},
{ORTH: "Yar.Doç.", NORM: "Yardımcı Doçent"},
{ORTH: "Yb.", NORM: "Yarbay"},
{ORTH: "Yrd.", NORM: "Yardımcı"},
{ORTH: "Yrd.Doç.", NORM: "Yardımcı Doçent"},
{ORTH: "Y.Müh.", NORM: "Yüksek mühendis"},
{ORTH: "Y.Mim.", NORM: "Yüksek mimar"}]:
_exc[exc_data[ORTH]] = [exc_data]
for orth in ["Dr."]:
for orth in [
"Dr.", "yy."]:
_exc[orth] = [{ORTH: orth}]

View File

@ -208,7 +208,7 @@ p
+row
+cell #[code word_spacing]
+cell int
+cell Horizontal spacing between words and arcs in px.
+cell Vertical spacing between words and arcs in px.
+cell #[code 45]
+row

View File

@ -674,7 +674,7 @@ p
| token vectors.
+aside-code("Example").
apples = nlp(u'I like apples')
doc = nlp(u'I like apples')
assert doc.vector.dtype == 'float32'
assert doc.vector.shape == (300,)

View File

@ -12,11 +12,24 @@ p Create a #[code GoldCorpus].
+table(["Name", "Type", "Description"])
+row
+cell #[code train_path]
+cell unicode or #[code Path]
+cell File or directory of training data.
+cell #[code train]
+cell unicode or #[code Path] or iterable
+cell
| Training data, as a path (file or directory) or iterable. If an
| iterable, each item should be a #[code (text, paragraphs)]
| tuple, where each paragraph is a tuple
| #[code.u-break (sentences, brackets)],and each sentence is a
| tuple #[code.u-break (ids, words, tags, heads, ner)]. See the
| implementation of
| #[+src(gh("spacy", "spacy/gold.pyx")) #[code gold.read_json_file]]
| for further details.
+row
+cell #[code dev_path]
+cell unicode or #[code Path]
+cell File or directory of development data.
+cell #[code dev]
+cell unicode or #[code Path] or iterable
+cell Development data, as a path (file or directory) or iterable.
+row("foot")
+cell returns
+cell #[code GoldCorpus]
+cell The newly constructed object.

View File

@ -325,6 +325,12 @@ p The L2 norm of the lexeme's vector representation.
+cell bool
+cell Is the lexeme a quotation mark?
+row
+cell #[code is_currency]
+tag-new("2.0.8")
+cell bool
+cell Is the lexeme a currency symbol?
+row
+cell #[code like_url]
+cell bool

View File

@ -111,6 +111,25 @@ p Match a stream of documents, yielding them in turn.
| parallel, if the #[code Matcher] implementation supports
| multi-threading.
+row
+cell #[code return_matches]
+tag-new(2.1)
+cell bool
+cell
| Yield the match lists along with the docs, making results
| #[code (doc, matches)] tuples.
+row
+cell #[code as_tuples]
+tag-new(2.1)
+cell bool
+cell
| Interpret the input stream as #[code (doc, context)] tuples, and
| yield #[code (result, context)] tuples out. If both
| #[code return_matches] and #[code as_tuples] are #[code True],
| the output will be a sequence of
| #[code ((doc, matches), context)] tuples.
+row("foot")
+cell yields
+cell #[code Doc]

View File

@ -209,7 +209,7 @@ p
+row
+cell #[code drop]
+cell int
+cell float
+cell The dropout rate.
+row

View File

@ -740,6 +740,12 @@ p The L2 norm of the token's vector representation.
+cell bool
+cell Is the token a quotation mark?
+row
+cell #[code is_currency]
+tag-new("2.0.8")
+cell bool
+cell Is the token a currency symbol?
+row
+cell #[code like_url]
+cell bool

Binary file not shown.

Before

Width:  |  Height:  |  Size: 378 KiB

View File

@ -76,13 +76,15 @@
},
"MODEL_LICENSES": {
"CC BY-SA": "https://creativecommons.org/licenses/by-sa/3.0/",
"CC BY-SA 3.0": "https://creativecommons.org/licenses/by-sa/3.0/",
"CC BY-SA 4.0": "https://creativecommons.org/licenses/by-sa/4.0/",
"CC BY-NC": "https://creativecommons.org/licenses/by-nc/3.0/",
"CC BY-NC 3.0": "https://creativecommons.org/licenses/by-nc/3.0/",
"GPL": "https://www.gnu.org/licenses/gpl.html",
"LGPL": "https://www.gnu.org/licenses/lgpl.html"
"CC BY 4.0": "https://creativecommons.org/licenses/by/4.0/",
"CC BY-SA": "https://creativecommons.org/licenses/by-sa/3.0/",
"CC BY-SA 3.0": "https://creativecommons.org/licenses/by-sa/3.0/",
"CC BY-SA 4.0": "https://creativecommons.org/licenses/by-sa/4.0/",
"CC BY-NC": "https://creativecommons.org/licenses/by-nc/3.0/",
"CC BY-NC 3.0": "https://creativecommons.org/licenses/by-nc/3.0/",
"CC-BY-NC-SA 3.0": "https://creativecommons.org/licenses/by-nc-sa/3.0/",
"GPL": "https://www.gnu.org/licenses/gpl.html",
"LGPL": "https://www.gnu.org/licenses/lgpl.html"
},
"MODEL_BENCHMARKS": {

View File

@ -40,7 +40,7 @@ p
+item
| Make the #[strong model data] available to the #[code Language] class
| by calling #[+api("language#from_disk") #[code from_disk]] with the
| path to the model data ditectory.
| path to the model data directory.
p
| So when you call this...
@ -53,7 +53,7 @@ p
| pipeline #[code.u-break ["tagger", "parser", "ner"]]. spaCy will then
| initialise #[code spacy.lang.en.English], and create each pipeline
| component and add it to the processing pipeline. It'll then load in the
| model's data from its data ditectory and return the modified
| model's data from its data directory and return the modified
| #[code Language] class for you to use as the #[code nlp] object.
p

View File

@ -37,7 +37,7 @@ p
+cell.u-text-label.u-color-theme=label
for cell in cells
+cell.u-text-center
- var result = cell > 0.5 ? ["yes", "similar"] : cell != 1 ? ["no", "dissimilar"] : ["neutral", "identical"]
- var result = cell < 0.5 ? ["no", "dissimilar"] : cell != 1 ? ["yes", "similar"] : ["neutral", "identical"]
| #[code=cell.toFixed(2)] #[+procon(...result)]
p

View File

@ -163,7 +163,7 @@ p
nlp = English().from_disk('/path/to/nlp')
p
| spay's serialization API has been made consistent across classes and
| spaCy's serialization API has been made consistent across classes and
| objects. All container classes, i.e. #[code Language], #[code Doc],
| #[code Vocab] and #[code StringStore] now have a #[code to_bytes()],
| #[code from_bytes()], #[code to_disk()] and #[code from_disk()] method

View File

@ -120,6 +120,9 @@ include ../_includes/_mixins
| A Practical Real-World Approach to Gaining Actionable Insights
| from your Data
+card("Practical Machine Learning with Python", "", "Dipanjan Sarkar et al. (Apress, 2017)", "book")
| A Problem-Solver's Guide to Building Real-World Intelligent Systems
+section("notebooks")
+h(2, "notebooks") Jupyter notebooks

View File

@ -68,7 +68,7 @@ p
+item #[strong spaCy is not research software].
| It's built on the latest research, but it's designed to get
| things done. This leads to fairly different design decisions than
| #[+a("https://github./nltk/nltk") NLTK]
| #[+a("https://github.com/nltk/nltk") NLTK]
| or #[+a("https://stanfordnlp.github.io/CoreNLP/") CoreNLP], which were
| created as platforms for teaching and research. The main difference
| is that spaCy is integrated and opinionated. spaCy tries to avoid asking