This commit is contained in:
Matthew Honnibal 2017-12-04 14:42:52 +01:00
commit 07acb43a85
73 changed files with 3423 additions and 129 deletions

View File

@ -4,15 +4,11 @@ environment:
# For Python versions available on Appveyor, see # For Python versions available on Appveyor, see
# http://www.appveyor.com/docs/installed-software#python # http://www.appveyor.com/docs/installed-software#python
# The list here is complete (excluding Python 2.6, which
# isn't covered by this document) at the time of writing.
- PYTHON: "C:\\Python27" - PYTHON: "C:\\Python27"
#- PYTHON: "C:\\Python33"
#- PYTHON: "C:\\Python34" #- PYTHON: "C:\\Python34"
#- PYTHON: "C:\\Python35" #- PYTHON: "C:\\Python35"
#- PYTHON: "C:\\Python27-x64" #- PYTHON: "C:\\Python27-x64"
#- PYTHON: "C:\\Python33-x64"
#- DISTUTILS_USE_SDK: "1" #- DISTUTILS_USE_SDK: "1"
#- PYTHON: "C:\\Python34-x64" #- PYTHON: "C:\\Python34-x64"
#- DISTUTILS_USE_SDK: "1" #- DISTUTILS_USE_SDK: "1"
@ -30,7 +26,7 @@ build: off
test_script: test_script:
# Put your test command here. # Put your test command here.
# If you don't need to build C extensions on 64-bit Python 3.3 or 3.4, # If you don't need to build C extensions on 64-bit Python 3.4,
# you can remove "build.cmd" from the front of the command, as it's # you can remove "build.cmd" from the front of the command, as it's
# only needed to support those cases. # only needed to support those cases.
# Note that you must use the environment variable %PYTHON% to refer to # Note that you must use the environment variable %PYTHON% to refer to
@ -41,7 +37,7 @@ test_script:
after_test: after_test:
# This step builds your wheels. # This step builds your wheels.
# Again, you only need build.cmd if you're building C extensions for # Again, you only need build.cmd if you're building C extensions for
# 64-bit Python 3.3/3.4. And you need to use %PYTHON% to get the correct # 64-bit Python 3.4. And you need to use %PYTHON% to get the correct
# interpreter # interpreter
- "%PYTHON%\\python.exe setup.py bdist_wheel" - "%PYTHON%\\python.exe setup.py bdist_wheel"

106
.github/contributors/MartinoMensio.md vendored Normal file
View File

@ -0,0 +1,106 @@
# spaCy contributor agreement
This spaCy Contributor Agreement (**"SCA"**) is based on the
[Oracle Contributor Agreement](http://www.oracle.com/technetwork/oca-405177.pdf).
The SCA applies to any contribution that you make to any product or project
managed by us (the **"project"**), and sets out the intellectual property rights
you grant to us in the contributed materials. The term **"us"** shall mean
[ExplosionAI UG (haftungsbeschränkt)](https://explosion.ai/legal). The term
**"you"** shall mean the person or entity identified below.
If you agree to be bound by these terms, fill in the information requested
below and include the filled-in version with your first pull request, under the
folder [`.github/contributors/`](/.github/contributors/). The name of the file
should be your GitHub username, with the extension `.md`. For example, the user
example_user would create the file `.github/contributors/example_user.md`.
Read this agreement carefully before signing. These terms and conditions
constitute a binding legal agreement.
## Contributor Agreement
1. The term "contribution" or "contributed materials" means any source code,
object code, patch, tool, sample, graphic, specification, manual,
documentation, or any other material posted or submitted by you to the project.
2. With respect to any worldwide copyrights, or copyright applications and
registrations, in your contribution:
* you hereby assign to us joint ownership, and to the extent that such
assignment is or becomes invalid, ineffective or unenforceable, you hereby
grant to us a perpetual, irrevocable, non-exclusive, worldwide, no-charge,
royalty-free, unrestricted license to exercise all rights under those
copyrights. This includes, at our option, the right to sublicense these same
rights to third parties through multiple levels of sublicensees or other
licensing arrangements;
* you agree that each of us can do all things in relation to your
contribution as if each of us were the sole owners, and if one of us makes
a derivative work of your contribution, the one who makes the derivative
work (or has it made will be the sole owner of that derivative work;
* you agree that you will not assert any moral rights in your contribution
against us, our licensees or transferees;
* you agree that we may register a copyright in your contribution and
exercise all ownership rights associated with it; and
* you agree that neither of us has any duty to consult with, obtain the
consent of, pay or render an accounting to the other for any use or
distribution of your contribution.
3. With respect to any patents you own, or that you can license without payment
to any third party, you hereby grant to us a perpetual, irrevocable,
non-exclusive, worldwide, no-charge, royalty-free license to:
* make, have made, use, sell, offer to sell, import, and otherwise transfer
your contribution in whole or in part, alone or in combination with or
included in any product, work or materials arising out of the project to
which your contribution was submitted, and
* at our option, to sublicense these same rights to third parties through
multiple levels of sublicensees or other licensing arrangements.
4. Except as set out above, you keep all right, title, and interest in your
contribution. The rights that you grant to us under these terms are effective
on the date you first submitted a contribution to us, even if your submission
took place before the date you sign these terms.
5. You covenant, represent, warrant and agree that:
* Each contribution that you submit is and shall be an original work of
authorship and you can legally grant the rights set out in this SCA;
* to the best of your knowledge, each contribution will not violate any
third party's copyrights, trademarks, patents, or other intellectual
property rights; and
* each contribution shall be in compliance with U.S. export control laws and
other applicable export and import laws. You agree to notify us if you
become aware of any circumstance which would make any of the foregoing
representations inaccurate in any respect. We may publicly disclose your
participation in the project, including the fact that you have signed the SCA.
6. This SCA is governed by the laws of the State of California and applicable
U.S. Federal law. Any choice of law rules will not apply.
7. Please place an “x” on one of the applicable statement below. Please do NOT
mark both statements:
* [x] I am signing on behalf of myself as an individual and no other person
or entity, including my employer, has or will have rights with respect to my
contributions.
* [ ] I am signing on behalf of my employer or a legal entity and I have the
actual authority to contractually bind that entity.
## Contributor Details
| Field | Entry |
|------------------------------- | -------------------- |
| Name | Martino Mensio |
| Company name (if applicable) | Polytechnic University of Turin |
| Title or role (if applicable) | Student |
| Date | 17 November 2017 |
| GitHub username | MartinoMensio |
| Website (optional) | https://martinomensio.github.io/ |

106
.github/contributors/bdewilde.md vendored Normal file
View File

@ -0,0 +1,106 @@
# spaCy contributor agreement
This spaCy Contributor Agreement (**"SCA"**) is based on the
[Oracle Contributor Agreement](http://www.oracle.com/technetwork/oca-405177.pdf).
The SCA applies to any contribution that you make to any product or project
managed by us (the **"project"**), and sets out the intellectual property rights
you grant to us in the contributed materials. The term **"us"** shall mean
[ExplosionAI UG (haftungsbeschränkt)](https://explosion.ai/legal). The term
**"you"** shall mean the person or entity identified below.
If you agree to be bound by these terms, fill in the information requested
below and include the filled-in version with your first pull request, under the
folder [`.github/contributors/`](/.github/contributors/). The name of the file
should be your GitHub username, with the extension `.md`. For example, the user
example_user would create the file `.github/contributors/example_user.md`.
Read this agreement carefully before signing. These terms and conditions
constitute a binding legal agreement.
## Contributor Agreement
1. The term "contribution" or "contributed materials" means any source code,
object code, patch, tool, sample, graphic, specification, manual,
documentation, or any other material posted or submitted by you to the project.
2. With respect to any worldwide copyrights, or copyright applications and
registrations, in your contribution:
* you hereby assign to us joint ownership, and to the extent that such
assignment is or becomes invalid, ineffective or unenforceable, you hereby
grant to us a perpetual, irrevocable, non-exclusive, worldwide, no-charge,
royalty-free, unrestricted license to exercise all rights under those
copyrights. This includes, at our option, the right to sublicense these same
rights to third parties through multiple levels of sublicensees or other
licensing arrangements;
* you agree that each of us can do all things in relation to your
contribution as if each of us were the sole owners, and if one of us makes
a derivative work of your contribution, the one who makes the derivative
work (or has it made will be the sole owner of that derivative work;
* you agree that you will not assert any moral rights in your contribution
against us, our licensees or transferees;
* you agree that we may register a copyright in your contribution and
exercise all ownership rights associated with it; and
* you agree that neither of us has any duty to consult with, obtain the
consent of, pay or render an accounting to the other for any use or
distribution of your contribution.
3. With respect to any patents you own, or that you can license without payment
to any third party, you hereby grant to us a perpetual, irrevocable,
non-exclusive, worldwide, no-charge, royalty-free license to:
* make, have made, use, sell, offer to sell, import, and otherwise transfer
your contribution in whole or in part, alone or in combination with or
included in any product, work or materials arising out of the project to
which your contribution was submitted, and
* at our option, to sublicense these same rights to third parties through
multiple levels of sublicensees or other licensing arrangements.
4. Except as set out above, you keep all right, title, and interest in your
contribution. The rights that you grant to us under these terms are effective
on the date you first submitted a contribution to us, even if your submission
took place before the date you sign these terms.
5. You covenant, represent, warrant and agree that:
* Each contribution that you submit is and shall be an original work of
authorship and you can legally grant the rights set out in this SCA;
* to the best of your knowledge, each contribution will not violate any
third party's copyrights, trademarks, patents, or other intellectual
property rights; and
* each contribution shall be in compliance with U.S. export control laws and
other applicable export and import laws. You agree to notify us if you
become aware of any circumstance which would make any of the foregoing
representations inaccurate in any respect. We may publicly disclose your
participation in the project, including the fact that you have signed the SCA.
6. This SCA is governed by the laws of the State of California and applicable
U.S. Federal law. Any choice of law rules will not apply.
7. Please place an “x” on one of the applicable statement below. Please do NOT
mark both statements:
* [x] I am signing on behalf of myself as an individual and no other person
or entity, including my employer, has or will have rights with respect to my
contributions.
* [ ] I am signing on behalf of my employer or a legal entity and I have the
actual authority to contractually bind that entity.
## Contributor Details
| Field | Entry |
|------------------------------- | ---------------------------- |
| Name | Burton DeWilde |
| Company name (if applicable) | - |
| Title or role (if applicable) | data scientist |
| Date | 20 November 2017 |
| GitHub username | bdewilde |
| Website (optional) | https://bdewilde.github.io/ |

106
.github/contributors/cclauss.md vendored Normal file
View File

@ -0,0 +1,106 @@
# spaCy contributor agreement
This spaCy Contributor Agreement (**"SCA"**) is based on the
[Oracle Contributor Agreement](http://www.oracle.com/technetwork/oca-405177.pdf).
The SCA applies to any contribution that you make to any product or project
managed by us (the **"project"**), and sets out the intellectual property rights
you grant to us in the contributed materials. The term **"us"** shall mean
[ExplosionAI UG (haftungsbeschränkt)](https://explosion.ai/legal). The term
**"you"** shall mean the person or entity identified below.
If you agree to be bound by these terms, fill in the information requested
below and include the filled-in version with your first pull request, under the
folder [`.github/contributors/`](/.github/contributors/). The name of the file
should be your GitHub username, with the extension `.md`. For example, the user
example_user would create the file `.github/contributors/example_user.md`.
Read this agreement carefully before signing. These terms and conditions
constitute a binding legal agreement.
## Contributor Agreement
1. The term "contribution" or "contributed materials" means any source code,
object code, patch, tool, sample, graphic, specification, manual,
documentation, or any other material posted or submitted by you to the project.
2. With respect to any worldwide copyrights, or copyright applications and
registrations, in your contribution:
* you hereby assign to us joint ownership, and to the extent that such
assignment is or becomes invalid, ineffective or unenforceable, you hereby
grant to us a perpetual, irrevocable, non-exclusive, worldwide, no-charge,
royalty-free, unrestricted license to exercise all rights under those
copyrights. This includes, at our option, the right to sublicense these same
rights to third parties through multiple levels of sublicensees or other
licensing arrangements;
* you agree that each of us can do all things in relation to your
contribution as if each of us were the sole owners, and if one of us makes
a derivative work of your contribution, the one who makes the derivative
work (or has it made will be the sole owner of that derivative work;
* you agree that you will not assert any moral rights in your contribution
against us, our licensees or transferees;
* you agree that we may register a copyright in your contribution and
exercise all ownership rights associated with it; and
* you agree that neither of us has any duty to consult with, obtain the
consent of, pay or render an accounting to the other for any use or
distribution of your contribution.
3. With respect to any patents you own, or that you can license without payment
to any third party, you hereby grant to us a perpetual, irrevocable,
non-exclusive, worldwide, no-charge, royalty-free license to:
* make, have made, use, sell, offer to sell, import, and otherwise transfer
your contribution in whole or in part, alone or in combination with or
included in any product, work or materials arising out of the project to
which your contribution was submitted, and
* at our option, to sublicense these same rights to third parties through
multiple levels of sublicensees or other licensing arrangements.
4. Except as set out above, you keep all right, title, and interest in your
contribution. The rights that you grant to us under these terms are effective
on the date you first submitted a contribution to us, even if your submission
took place before the date you sign these terms.
5. You covenant, represent, warrant and agree that:
* Each contribution that you submit is and shall be an original work of
authorship and you can legally grant the rights set out in this SCA;
* to the best of your knowledge, each contribution will not violate any
third party's copyrights, trademarks, patents, or other intellectual
property rights; and
* each contribution shall be in compliance with U.S. export control laws and
other applicable export and import laws. You agree to notify us if you
become aware of any circumstance which would make any of the foregoing
representations inaccurate in any respect. We may publicly disclose your
participation in the project, including the fact that you have signed the SCA.
6. This SCA is governed by the laws of the State of California and applicable
U.S. Federal law. Any choice of law rules will not apply.
7. Please place an “x” on one of the applicable statement below. Please do NOT
mark both statements:
* [x] I am signing on behalf of myself as an individual and no other person
or entity, including my employer, has or will have rights with respect to my
contributions.
* [ ] I am signing on behalf of my employer or a legal entity and I have the
actual authority to contractually bind that entity.
## Contributor Details
| Field | Entry |
|------------------------------- | -------------------- |
| Name | Chris Clauss |
| Company name (if applicable) | |
| Title or role (if applicable) | |
| Date | 20 November 2017 |
| GitHub username | cclauss |
| Website (optional) | |

106
.github/contributors/fsonntag.md vendored Normal file
View File

@ -0,0 +1,106 @@
# spaCy contributor agreement
This spaCy Contributor Agreement (**"SCA"**) is based on the
[Oracle Contributor Agreement](http://www.oracle.com/technetwork/oca-405177.pdf).
The SCA applies to any contribution that you make to any product or project
managed by us (the **"project"**), and sets out the intellectual property rights
you grant to us in the contributed materials. The term **"us"** shall mean
[ExplosionAI UG (haftungsbeschränkt)](https://explosion.ai/legal). The term
**"you"** shall mean the person or entity identified below.
If you agree to be bound by these terms, fill in the information requested
below and include the filled-in version with your first pull request, under the
folder [`.github/contributors/`](/.github/contributors/). The name of the file
should be your GitHub username, with the extension `.md`. For example, the user
example_user would create the file `.github/contributors/example_user.md`.
Read this agreement carefully before signing. These terms and conditions
constitute a binding legal agreement.
## Contributor Agreement
1. The term "contribution" or "contributed materials" means any source code,
object code, patch, tool, sample, graphic, specification, manual,
documentation, or any other material posted or submitted by you to the project.
2. With respect to any worldwide copyrights, or copyright applications and
registrations, in your contribution:
* you hereby assign to us joint ownership, and to the extent that such
assignment is or becomes invalid, ineffective or unenforceable, you hereby
grant to us a perpetual, irrevocable, non-exclusive, worldwide, no-charge,
royalty-free, unrestricted license to exercise all rights under those
copyrights. This includes, at our option, the right to sublicense these same
rights to third parties through multiple levels of sublicensees or other
licensing arrangements;
* you agree that each of us can do all things in relation to your
contribution as if each of us were the sole owners, and if one of us makes
a derivative work of your contribution, the one who makes the derivative
work (or has it made will be the sole owner of that derivative work;
* you agree that you will not assert any moral rights in your contribution
against us, our licensees or transferees;
* you agree that we may register a copyright in your contribution and
exercise all ownership rights associated with it; and
* you agree that neither of us has any duty to consult with, obtain the
consent of, pay or render an accounting to the other for any use or
distribution of your contribution.
3. With respect to any patents you own, or that you can license without payment
to any third party, you hereby grant to us a perpetual, irrevocable,
non-exclusive, worldwide, no-charge, royalty-free license to:
* make, have made, use, sell, offer to sell, import, and otherwise transfer
your contribution in whole or in part, alone or in combination with or
included in any product, work or materials arising out of the project to
which your contribution was submitted, and
* at our option, to sublicense these same rights to third parties through
multiple levels of sublicensees or other licensing arrangements.
4. Except as set out above, you keep all right, title, and interest in your
contribution. The rights that you grant to us under these terms are effective
on the date you first submitted a contribution to us, even if your submission
took place before the date you sign these terms.
5. You covenant, represent, warrant and agree that:
* Each contribution that you submit is and shall be an original work of
authorship and you can legally grant the rights set out in this SCA;
* to the best of your knowledge, each contribution will not violate any
third party's copyrights, trademarks, patents, or other intellectual
property rights; and
* each contribution shall be in compliance with U.S. export control laws and
other applicable export and import laws. You agree to notify us if you
become aware of any circumstance which would make any of the foregoing
representations inaccurate in any respect. We may publicly disclose your
participation in the project, including the fact that you have signed the SCA.
6. This SCA is governed by the laws of the State of California and applicable
U.S. Federal law. Any choice of law rules will not apply.
7. Please place an “x” on one of the applicable statement below. Please do NOT
mark both statements:
* [ ] I am signing on behalf of myself as an individual and no other person
or entity, including my employer, has or will have rights with respect to my
contributions.
* [x] I am signing on behalf of my employer or a legal entity and I have the
actual authority to contractually bind that entity.
## Contributor Details
| Field | Entry |
|------------------------------- | -------------------- |
| Name | Felix Sonntag |
| Company name (if applicable) | - |
| Title or role (if applicable) | Student |
| Date | 2017-11-19 |
| GitHub username | fsonntag |
| Website (optional) | http://github.com/fsonntag/ |

106
.github/contributors/greenriverrus.md vendored Normal file
View File

@ -0,0 +1,106 @@
# spaCy contributor agreement
This spaCy Contributor Agreement (**"SCA"**) is based on the
[Oracle Contributor Agreement](http://www.oracle.com/technetwork/oca-405177.pdf).
The SCA applies to any contribution that you make to any product or project
managed by us (the **"project"**), and sets out the intellectual property rights
you grant to us in the contributed materials. The term **"us"** shall mean
[ExplosionAI UG (haftungsbeschränkt)](https://explosion.ai/legal). The term
**"you"** shall mean the person or entity identified below.
If you agree to be bound by these terms, fill in the information requested
below and include the filled-in version with your first pull request, under the
folder [`.github/contributors/`](/.github/contributors/). The name of the file
should be your GitHub username, with the extension `.md`. For example, the user
example_user would create the file `.github/contributors/example_user.md`.
Read this agreement carefully before signing. These terms and conditions
constitute a binding legal agreement.
## Contributor Agreement
1. The term "contribution" or "contributed materials" means any source code,
object code, patch, tool, sample, graphic, specification, manual,
documentation, or any other material posted or submitted by you to the project.
2. With respect to any worldwide copyrights, or copyright applications and
registrations, in your contribution:
* you hereby assign to us joint ownership, and to the extent that such
assignment is or becomes invalid, ineffective or unenforceable, you hereby
grant to us a perpetual, irrevocable, non-exclusive, worldwide, no-charge,
royalty-free, unrestricted license to exercise all rights under those
copyrights. This includes, at our option, the right to sublicense these same
rights to third parties through multiple levels of sublicensees or other
licensing arrangements;
* you agree that each of us can do all things in relation to your
contribution as if each of us were the sole owners, and if one of us makes
a derivative work of your contribution, the one who makes the derivative
work (or has it made will be the sole owner of that derivative work;
* you agree that you will not assert any moral rights in your contribution
against us, our licensees or transferees;
* you agree that we may register a copyright in your contribution and
exercise all ownership rights associated with it; and
* you agree that neither of us has any duty to consult with, obtain the
consent of, pay or render an accounting to the other for any use or
distribution of your contribution.
3. With respect to any patents you own, or that you can license without payment
to any third party, you hereby grant to us a perpetual, irrevocable,
non-exclusive, worldwide, no-charge, royalty-free license to:
* make, have made, use, sell, offer to sell, import, and otherwise transfer
your contribution in whole or in part, alone or in combination with or
included in any product, work or materials arising out of the project to
which your contribution was submitted, and
* at our option, to sublicense these same rights to third parties through
multiple levels of sublicensees or other licensing arrangements.
4. Except as set out above, you keep all right, title, and interest in your
contribution. The rights that you grant to us under these terms are effective
on the date you first submitted a contribution to us, even if your submission
took place before the date you sign these terms.
5. You covenant, represent, warrant and agree that:
* Each contribution that you submit is and shall be an original work of
authorship and you can legally grant the rights set out in this SCA;
* to the best of your knowledge, each contribution will not violate any
third party's copyrights, trademarks, patents, or other intellectual
property rights; and
* each contribution shall be in compliance with U.S. export control laws and
other applicable export and import laws. You agree to notify us if you
become aware of any circumstance which would make any of the foregoing
representations inaccurate in any respect. We may publicly disclose your
participation in the project, including the fact that you have signed the SCA.
6. This SCA is governed by the laws of the State of California and applicable
U.S. Federal law. Any choice of law rules will not apply.
7. Please place an “x” on one of the applicable statement below. Please do NOT
mark both statements:
* [x] I am signing on behalf of myself as an individual and no other person
or entity, including my employer, has or will have rights with respect to my
contributions.
* [ ] I am signing on behalf of my employer or a legal entity and I have the
actual authority to contractually bind that entity.
## Contributor Details
| Field | Entry |
|------------------------------- | --------------------|
| Name | Vadim Mazaev |
| Company name (if applicable) | |
| Title or role (if applicable) | |
| Date | 26 November 2017 |
| GitHub username | GreenRiverRUS |
| Website (optional) | |

106
.github/contributors/hugovk.md vendored Normal file
View File

@ -0,0 +1,106 @@
# spaCy contributor agreement
This spaCy Contributor Agreement (**"SCA"**) is based on the
[Oracle Contributor Agreement](http://www.oracle.com/technetwork/oca-405177.pdf).
The SCA applies to any contribution that you make to any product or project
managed by us (the **"project"**), and sets out the intellectual property rights
you grant to us in the contributed materials. The term **"us"** shall mean
[ExplosionAI UG (haftungsbeschränkt)](https://explosion.ai/legal). The term
**"you"** shall mean the person or entity identified below.
If you agree to be bound by these terms, fill in the information requested
below and include the filled-in version with your first pull request, under the
folder [`.github/contributors/`](/.github/contributors/). The name of the file
should be your GitHub username, with the extension `.md`. For example, the user
example_user would create the file `.github/contributors/example_user.md`.
Read this agreement carefully before signing. These terms and conditions
constitute a binding legal agreement.
## Contributor Agreement
1. The term "contribution" or "contributed materials" means any source code,
object code, patch, tool, sample, graphic, specification, manual,
documentation, or any other material posted or submitted by you to the project.
2. With respect to any worldwide copyrights, or copyright applications and
registrations, in your contribution:
* you hereby assign to us joint ownership, and to the extent that such
assignment is or becomes invalid, ineffective or unenforceable, you hereby
grant to us a perpetual, irrevocable, non-exclusive, worldwide, no-charge,
royalty-free, unrestricted license to exercise all rights under those
copyrights. This includes, at our option, the right to sublicense these same
rights to third parties through multiple levels of sublicensees or other
licensing arrangements;
* you agree that each of us can do all things in relation to your
contribution as if each of us were the sole owners, and if one of us makes
a derivative work of your contribution, the one who makes the derivative
work (or has it made will be the sole owner of that derivative work;
* you agree that you will not assert any moral rights in your contribution
against us, our licensees or transferees;
* you agree that we may register a copyright in your contribution and
exercise all ownership rights associated with it; and
* you agree that neither of us has any duty to consult with, obtain the
consent of, pay or render an accounting to the other for any use or
distribution of your contribution.
3. With respect to any patents you own, or that you can license without payment
to any third party, you hereby grant to us a perpetual, irrevocable,
non-exclusive, worldwide, no-charge, royalty-free license to:
* make, have made, use, sell, offer to sell, import, and otherwise transfer
your contribution in whole or in part, alone or in combination with or
included in any product, work or materials arising out of the project to
which your contribution was submitted, and
* at our option, to sublicense these same rights to third parties through
multiple levels of sublicensees or other licensing arrangements.
4. Except as set out above, you keep all right, title, and interest in your
contribution. The rights that you grant to us under these terms are effective
on the date you first submitted a contribution to us, even if your submission
took place before the date you sign these terms.
5. You covenant, represent, warrant and agree that:
* Each contribution that you submit is and shall be an original work of
authorship and you can legally grant the rights set out in this SCA;
* to the best of your knowledge, each contribution will not violate any
third party's copyrights, trademarks, patents, or other intellectual
property rights; and
* each contribution shall be in compliance with U.S. export control laws and
other applicable export and import laws. You agree to notify us if you
become aware of any circumstance which would make any of the foregoing
representations inaccurate in any respect. We may publicly disclose your
participation in the project, including the fact that you have signed the SCA.
6. This SCA is governed by the laws of the State of California and applicable
U.S. Federal law. Any choice of law rules will not apply.
7. Please place an “x” on one of the applicable statement below. Please do NOT
mark both statements:
* [x] I am signing on behalf of myself as an individual and no other person
or entity, including my employer, has or will have rights with respect to my
contributions.
* [ ] I am signing on behalf of my employer or a legal entity and I have the
actual authority to contractually bind that entity.
## Contributor Details
| Field | Entry |
|------------------------------- | -------------------- |
| Name | Hugo van Kemenade |
| Company name (if applicable) | |
| Title or role (if applicable) | |
| Date | 26 November 2017 |
| GitHub username | hugovk |
| Website (optional) | |

106
.github/contributors/markulrich.md vendored Normal file
View File

@ -0,0 +1,106 @@
# spaCy contributor agreement
This spaCy Contributor Agreement (**"SCA"**) is based on the
[Oracle Contributor Agreement](http://www.oracle.com/technetwork/oca-405177.pdf).
The SCA applies to any contribution that you make to any product or project
managed by us (the **"project"**), and sets out the intellectual property rights
you grant to us in the contributed materials. The term **"us"** shall mean
[ExplosionAI UG (haftungsbeschränkt)](https://explosion.ai/legal). The term
**"you"** shall mean the person or entity identified below.
If you agree to be bound by these terms, fill in the information requested
below and include the filled-in version with your first pull request, under the
folder [`.github/contributors/`](/.github/contributors/). The name of the file
should be your GitHub username, with the extension `.md`. For example, the user
example_user would create the file `.github/contributors/example_user.md`.
Read this agreement carefully before signing. These terms and conditions
constitute a binding legal agreement.
## Contributor Agreement
1. The term "contribution" or "contributed materials" means any source code,
object code, patch, tool, sample, graphic, specification, manual,
documentation, or any other material posted or submitted by you to the project.
2. With respect to any worldwide copyrights, or copyright applications and
registrations, in your contribution:
* you hereby assign to us joint ownership, and to the extent that such
assignment is or becomes invalid, ineffective or unenforceable, you hereby
grant to us a perpetual, irrevocable, non-exclusive, worldwide, no-charge,
royalty-free, unrestricted license to exercise all rights under those
copyrights. This includes, at our option, the right to sublicense these same
rights to third parties through multiple levels of sublicensees or other
licensing arrangements;
* you agree that each of us can do all things in relation to your
contribution as if each of us were the sole owners, and if one of us makes
a derivative work of your contribution, the one who makes the derivative
work (or has it made will be the sole owner of that derivative work;
* you agree that you will not assert any moral rights in your contribution
against us, our licensees or transferees;
* you agree that we may register a copyright in your contribution and
exercise all ownership rights associated with it; and
* you agree that neither of us has any duty to consult with, obtain the
consent of, pay or render an accounting to the other for any use or
distribution of your contribution.
3. With respect to any patents you own, or that you can license without payment
to any third party, you hereby grant to us a perpetual, irrevocable,
non-exclusive, worldwide, no-charge, royalty-free license to:
* make, have made, use, sell, offer to sell, import, and otherwise transfer
your contribution in whole or in part, alone or in combination with or
included in any product, work or materials arising out of the project to
which your contribution was submitted, and
* at our option, to sublicense these same rights to third parties through
multiple levels of sublicensees or other licensing arrangements.
4. Except as set out above, you keep all right, title, and interest in your
contribution. The rights that you grant to us under these terms are effective
on the date you first submitted a contribution to us, even if your submission
took place before the date you sign these terms.
5. You covenant, represent, warrant and agree that:
* Each contribution that you submit is and shall be an original work of
authorship and you can legally grant the rights set out in this SCA;
* to the best of your knowledge, each contribution will not violate any
third party's copyrights, trademarks, patents, or other intellectual
property rights; and
* each contribution shall be in compliance with U.S. export control laws and
other applicable export and import laws. You agree to notify us if you
become aware of any circumstance which would make any of the foregoing
representations inaccurate in any respect. We may publicly disclose your
participation in the project, including the fact that you have signed the SCA.
6. This SCA is governed by the laws of the State of California and applicable
U.S. Federal law. Any choice of law rules will not apply.
7. Please place an “x” on one of the applicable statement below. Please do NOT
mark both statements:
* [x] I am signing on behalf of myself as an individual and no other person
or entity, including my employer, has or will have rights with respect to my
contributions.
* [ ] I am signing on behalf of my employer or a legal entity and I have the
actual authority to contractually bind that entity.
## Contributor Details
| Field | Entry |
|------------------------------- | -------------------- |
| Name | Mark Ulrich |
| Company name (if applicable) | |
| Title or role (if applicable) | Machine Learning Engineer |
| Date | 22 November 2017 |
| GitHub username | markulrich |
| Website (optional) | |

106
.github/contributors/sorenlind.md vendored Normal file
View File

@ -0,0 +1,106 @@
# spaCy contributor agreement
This spaCy Contributor Agreement (**"SCA"**) is based on the
[Oracle Contributor Agreement](http://www.oracle.com/technetwork/oca-405177.pdf).
The SCA applies to any contribution that you make to any product or project
managed by us (the **"project"**), and sets out the intellectual property rights
you grant to us in the contributed materials. The term **"us"** shall mean
[ExplosionAI UG (haftungsbeschränkt)](https://explosion.ai/legal). The term
**"you"** shall mean the person or entity identified below.
If you agree to be bound by these terms, fill in the information requested
below and include the filled-in version with your first pull request, under the
folder [`.github/contributors/`](/.github/contributors/). The name of the file
should be your GitHub username, with the extension `.md`. For example, the user
example_user would create the file `.github/contributors/example_user.md`.
Read this agreement carefully before signing. These terms and conditions
constitute a binding legal agreement.
## Contributor Agreement
1. The term "contribution" or "contributed materials" means any source code,
object code, patch, tool, sample, graphic, specification, manual,
documentation, or any other material posted or submitted by you to the project.
2. With respect to any worldwide copyrights, or copyright applications and
registrations, in your contribution:
* you hereby assign to us joint ownership, and to the extent that such
assignment is or becomes invalid, ineffective or unenforceable, you hereby
grant to us a perpetual, irrevocable, non-exclusive, worldwide, no-charge,
royalty-free, unrestricted license to exercise all rights under those
copyrights. This includes, at our option, the right to sublicense these same
rights to third parties through multiple levels of sublicensees or other
licensing arrangements;
* you agree that each of us can do all things in relation to your
contribution as if each of us were the sole owners, and if one of us makes
a derivative work of your contribution, the one who makes the derivative
work (or has it made will be the sole owner of that derivative work;
* you agree that you will not assert any moral rights in your contribution
against us, our licensees or transferees;
* you agree that we may register a copyright in your contribution and
exercise all ownership rights associated with it; and
* you agree that neither of us has any duty to consult with, obtain the
consent of, pay or render an accounting to the other for any use or
distribution of your contribution.
3. With respect to any patents you own, or that you can license without payment
to any third party, you hereby grant to us a perpetual, irrevocable,
non-exclusive, worldwide, no-charge, royalty-free license to:
* make, have made, use, sell, offer to sell, import, and otherwise transfer
your contribution in whole or in part, alone or in combination with or
included in any product, work or materials arising out of the project to
which your contribution was submitted, and
* at our option, to sublicense these same rights to third parties through
multiple levels of sublicensees or other licensing arrangements.
4. Except as set out above, you keep all right, title, and interest in your
contribution. The rights that you grant to us under these terms are effective
on the date you first submitted a contribution to us, even if your submission
took place before the date you sign these terms.
5. You covenant, represent, warrant and agree that:
* Each contribution that you submit is and shall be an original work of
authorship and you can legally grant the rights set out in this SCA;
* to the best of your knowledge, each contribution will not violate any
third party's copyrights, trademarks, patents, or other intellectual
property rights; and
* each contribution shall be in compliance with U.S. export control laws and
other applicable export and import laws. You agree to notify us if you
become aware of any circumstance which would make any of the foregoing
representations inaccurate in any respect. We may publicly disclose your
participation in the project, including the fact that you have signed the SCA.
6. This SCA is governed by the laws of the State of California and applicable
U.S. Federal law. Any choice of law rules will not apply.
7. Please place an “x” on one of the applicable statement below. Please do NOT
mark both statements:
* [x] I am signing on behalf of myself as an individual and no other person
or entity, including my employer, has or will have rights with respect to my
contributions.
* [ ] I am signing on behalf of my employer or a legal entity and I have the
actual authority to contractually bind that entity.
## Contributor Details
| Field | Entry |
|------------------------------- | -------------------- |
| Name | Søren Lind Kristiansen |
| Company name (if applicable) | |
| Title or role (if applicable) | |
| Date | 24 November 2017 |
| GitHub username | sorenlind |
| Website (optional) | |

106
.github/contributors/tokestermw.md vendored Normal file
View File

@ -0,0 +1,106 @@
# spaCy contributor agreement
This spaCy Contributor Agreement (**"SCA"**) is based on the
[Oracle Contributor Agreement](http://www.oracle.com/technetwork/oca-405177.pdf).
The SCA applies to any contribution that you make to any product or project
managed by us (the **"project"**), and sets out the intellectual property rights
you grant to us in the contributed materials. The term **"us"** shall mean
[ExplosionAI UG (haftungsbeschränkt)](https://explosion.ai/legal). The term
**"you"** shall mean the person or entity identified below.
If you agree to be bound by these terms, fill in the information requested
below and include the filled-in version with your first pull request, under the
folder [`.github/contributors/`](/.github/contributors/). The name of the file
should be your GitHub username, with the extension `.md`. For example, the user
example_user would create the file `.github/contributors/example_user.md`.
Read this agreement carefully before signing. These terms and conditions
constitute a binding legal agreement.
## Contributor Agreement
1. The term "contribution" or "contributed materials" means any source code,
object code, patch, tool, sample, graphic, specification, manual,
documentation, or any other material posted or submitted by you to the project.
2. With respect to any worldwide copyrights, or copyright applications and
registrations, in your contribution:
* you hereby assign to us joint ownership, and to the extent that such
assignment is or becomes invalid, ineffective or unenforceable, you hereby
grant to us a perpetual, irrevocable, non-exclusive, worldwide, no-charge,
royalty-free, unrestricted license to exercise all rights under those
copyrights. This includes, at our option, the right to sublicense these same
rights to third parties through multiple levels of sublicensees or other
licensing arrangements;
* you agree that each of us can do all things in relation to your
contribution as if each of us were the sole owners, and if one of us makes
a derivative work of your contribution, the one who makes the derivative
work (or has it made will be the sole owner of that derivative work;
* you agree that you will not assert any moral rights in your contribution
against us, our licensees or transferees;
* you agree that we may register a copyright in your contribution and
exercise all ownership rights associated with it; and
* you agree that neither of us has any duty to consult with, obtain the
consent of, pay or render an accounting to the other for any use or
distribution of your contribution.
3. With respect to any patents you own, or that you can license without payment
to any third party, you hereby grant to us a perpetual, irrevocable,
non-exclusive, worldwide, no-charge, royalty-free license to:
* make, have made, use, sell, offer to sell, import, and otherwise transfer
your contribution in whole or in part, alone or in combination with or
included in any product, work or materials arising out of the project to
which your contribution was submitted, and
* at our option, to sublicense these same rights to third parties through
multiple levels of sublicensees or other licensing arrangements.
4. Except as set out above, you keep all right, title, and interest in your
contribution. The rights that you grant to us under these terms are effective
on the date you first submitted a contribution to us, even if your submission
took place before the date you sign these terms.
5. You covenant, represent, warrant and agree that:
* Each contribution that you submit is and shall be an original work of
authorship and you can legally grant the rights set out in this SCA;
* to the best of your knowledge, each contribution will not violate any
third party's copyrights, trademarks, patents, or other intellectual
property rights; and
* each contribution shall be in compliance with U.S. export control laws and
other applicable export and import laws. You agree to notify us if you
become aware of any circumstance which would make any of the foregoing
representations inaccurate in any respect. We may publicly disclose your
participation in the project, including the fact that you have signed the SCA.
6. This SCA is governed by the laws of the State of California and applicable
U.S. Federal law. Any choice of law rules will not apply.
7. Please place an “x” on one of the applicable statement below. Please do NOT
mark both statements:
* [x] I am signing on behalf of myself as an individual and no other person
or entity, including my employer, has or will have rights with respect to my
contributions.
* [ ] I am signing on behalf of my employer or a legal entity and I have the
actual authority to contractually bind that entity.
## Contributor Details
| Field | Entry |
|------------------------------- | -------------------- |
| Name | Motoki Wu |
| Company name (if applicable) | WriteLab |
| Title or role (if applicable) | NLP / Deep Learning Engineer |
| Date | 17 November 2017 |
| GitHub username | tokestermw |
| Website (optional) | https://twitter.com/plusepsilon |

View File

@ -15,14 +15,17 @@ os:
env: env:
- VIA=compile LC_ALL=en_US.ascii - VIA=compile LC_ALL=en_US.ascii
- VIA=compile - VIA=compile
- VIA=flake8
#- VIA=pypi_nightly #- VIA=pypi_nightly
install: install:
- "./travis.sh" - "./travis.sh"
- pip install flake8
script: script:
- "pip install pytest pytest-timeout" - "pip install pytest pytest-timeout"
- if [[ "${VIA}" == "compile" ]]; then python -m pytest --tb=native spacy; fi - if [[ "${VIA}" == "compile" ]]; then python -m pytest --tb=native spacy; fi
- if [[ "${VIA}" == "flake8" ]]; then flake8 . --count --exclude=spacy/compat.py,spacy/lang --select=E901,E999,F821,F822,F823 --show-source --statistics; fi
- if [[ "${VIA}" == "pypi_nightly" ]]; then python -m pytest --tb=native --models --en `python -c "import os.path; import spacy; print(os.path.abspath(os.path.dirname(spacy.__file__)))"`; fi - if [[ "${VIA}" == "pypi_nightly" ]]; then python -m pytest --tb=native --models --en `python -c "import os.path; import spacy; print(os.path.abspath(os.path.dirname(spacy.__file__)))"`; fi
- if [[ "${VIA}" == "sdist" ]]; then python -m pytest --tb=native `python -c "import os.path; import spacy; print(os.path.abspath(os.path.dirname(spacy.__file__)))"`; fi - if [[ "${VIA}" == "sdist" ]]; then python -m pytest --tb=native `python -c "import os.path; import spacy; print(os.path.abspath(os.path.dirname(spacy.__file__)))"`; fi

View File

@ -57,7 +57,7 @@ even format them as Markdown to copy-paste into GitHub issues:
* **Checking the model compatibility:** If you're having problems with a * **Checking the model compatibility:** If you're having problems with a
[statistical model](https://spacy.io/models), it may be because to the [statistical model](https://spacy.io/models), it may be because to the
model is incompatible with your spaCy installation. In spaCy v2.0+, you can check model is incompatible with your spaCy installation. In spaCy v2.0+, you can check
this on the command line by running `spacy validate`. this on the command line by running `python -m spacy validate`.
* **Sharing a model's output, like dependencies and entities:** spaCy v2.0+ * **Sharing a model's output, like dependencies and entities:** spaCy v2.0+
comes with [built-in visualizers](https://spacy.io/usage/visualizers) that comes with [built-in visualizers](https://spacy.io/usage/visualizers) that

View File

@ -16,6 +16,10 @@ integration. It's commercial open-source software, released under the MIT licens
:target: https://travis-ci.org/explosion/spaCy :target: https://travis-ci.org/explosion/spaCy
:alt: Build Status :alt: Build Status
.. image:: https://img.shields.io/appveyor/ci/explosion/spaCy/master.svg?style=flat-square
:target: https://ci.appveyor.com/project/explosion/spaCy
:alt: Appveyor Build Status
.. image:: https://img.shields.io/github/release/explosion/spacy.svg?style=flat-square .. image:: https://img.shields.io/github/release/explosion/spacy.svg?style=flat-square
:target: https://github.com/explosion/spaCy/releases :target: https://github.com/explosion/spaCy/releases
:alt: Current Release Version :alt: Current Release Version
@ -108,7 +112,7 @@ the `documentation <https://spacy.io/usage>`_.
==================== === ==================== ===
**Operating system** macOS / OS X, Linux, Windows (Cygwin, MinGW, Visual Studio) **Operating system** macOS / OS X, Linux, Windows (Cygwin, MinGW, Visual Studio)
**Python version** CPython 2.6, 2.7, 3.3+. Only 64 bit. **Python version** CPython 2.7, 3.4+. Only 64 bit.
**Package managers** `pip`_ (source packages only), `conda`_ (via ``conda-forge``) **Package managers** `pip`_ (source packages only), `conda`_ (via ``conda-forge``)
==================== === ==================== ===

View File

@ -36,7 +36,8 @@ def main(model='en_core_web_sm'):
def extract_currency_relations(doc): def extract_currency_relations(doc):
# merge entities and noun chunks into one token # merge entities and noun chunks into one token
for span in [*list(doc.ents), *list(doc.noun_chunks)]: spans = list(doc.ents) + list(doc.noun_chunks)
for span in spans:
span.merge() span.merge()
relations = [] relations = []

View File

@ -73,7 +73,7 @@ TRAIN_DATA = [
new_model_name=("New model name for model meta.", "option", "nm", str), new_model_name=("New model name for model meta.", "option", "nm", str),
output_dir=("Optional output directory", "option", "o", Path), output_dir=("Optional output directory", "option", "o", Path),
n_iter=("Number of training iterations", "option", "n", int)) n_iter=("Number of training iterations", "option", "n", int))
def main(model=None, new_model_name='animal', output_dir=None, n_iter=50): def main(model=None, new_model_name='animal', output_dir=None, n_iter=20):
"""Set up the pipeline and entity recognizer, and train the new entity.""" """Set up the pipeline and entity recognizer, and train the new entity."""
if model is not None: if model is not None:
nlp = spacy.load(model) # load existing spaCy model nlp = spacy.load(model) # load existing spaCy model

View File

@ -30,8 +30,11 @@ TAG_MAP = {
'J': {'pos': 'ADJ'} 'J': {'pos': 'ADJ'}
} }
# Usually you'll read this in, of course. Data formats vary. # Usually you'll read this in, of course. Data formats vary. Ensure your
# Ensure your strings are unicode. # strings are unicode and that the number of tags assigned matches spaCy's
# tokenization. If not, you can always add a 'words' key to the annotations
# that specifies the gold-standard tokenization, e.g.:
# ("Eatblueham", {'words': ['Eat', 'blue', 'ham'] 'tags': ['V', 'J', 'N']})
TRAIN_DATA = [ TRAIN_DATA = [
("I like green eggs", {'tags': ['N', 'V', 'J', 'N']}), ("I like green eggs", {'tags': ['N', 'V', 'J', 'N']}),
("Eat blue ham", {'tags': ['V', 'J', 'N']}) ("Eat blue ham", {'tags': ['V', 'J', 'N']})

View File

@ -13,7 +13,7 @@ from spacy.language import Language
@plac.annotations( @plac.annotations(
vectors_loc=("Path to vectors", "positional", None, str), vectors_loc=("Path to .vec file", "positional", None, str),
lang=("Optional language ID. If not set, blank Language() will be used.", lang=("Optional language ID. If not set, blank Language() will be used.",
"positional", None, str)) "positional", None, str))
def main(vectors_loc, lang=None): def main(vectors_loc, lang=None):
@ -30,7 +30,7 @@ def main(vectors_loc, lang=None):
nlp.vocab.reset_vectors(width=int(nr_dim)) nlp.vocab.reset_vectors(width=int(nr_dim))
for line in file_: for line in file_:
line = line.rstrip().decode('utf8') line = line.rstrip().decode('utf8')
pieces = line.rsplit(' ', nr_dim) pieces = line.rsplit(' ', int(nr_dim))
word = pieces[0] word = pieces[0]
vector = numpy.asarray([float(v) for v in pieces[1:]], dtype='f') vector = numpy.asarray([float(v) for v in pieces[1:]], dtype='f')
nlp.vocab.set_vector(word, vector) # add the vectors to the vocab nlp.vocab.set_vector(word, vector) # add the vectors to the vocab

View File

@ -211,9 +211,9 @@ def setup_package():
'Operating System :: MacOS :: MacOS X', 'Operating System :: MacOS :: MacOS X',
'Operating System :: Microsoft :: Windows', 'Operating System :: Microsoft :: Windows',
'Programming Language :: Cython', 'Programming Language :: Cython',
'Programming Language :: Python :: 2.6', 'Programming Language :: Python :: 2',
'Programming Language :: Python :: 2.7', 'Programming Language :: Python :: 2.7',
'Programming Language :: Python :: 3.3', 'Programming Language :: Python :: 3',
'Programming Language :: Python :: 3.4', 'Programming Language :: Python :: 3.4',
'Programming Language :: Python :: 3.5', 'Programming Language :: Python :: 3.5',
'Programming Language :: Python :: 3.6', 'Programming Language :: Python :: 3.6',

View File

@ -131,7 +131,7 @@ def intify_attrs(stringy_attrs, strings_map=None, _do_deprecated=False):
'NumValue', 'PartType', 'Polite', 'StyleVariant', 'NumValue', 'PartType', 'Polite', 'StyleVariant',
'PronType', 'AdjType', 'Person', 'Variant', 'AdpType', 'PronType', 'AdjType', 'Person', 'Variant', 'AdpType',
'Reflex', 'Negative', 'Mood', 'Aspect', 'Case', 'Reflex', 'Negative', 'Mood', 'Aspect', 'Case',
'Polarity', # U20 'Polarity', 'Animacy' # U20
] ]
for key in morph_keys: for key in morph_keys:
if key in stringy_attrs: if key in stringy_attrs:

View File

@ -146,7 +146,7 @@ def list_files(data_dir):
def list_requirements(meta): def list_requirements(meta):
parent_package = meta.get('parent_package', 'spacy') parent_package = meta.get('parent_package', 'spacy')
requirements = [parent_package + meta['spacy_version']] requirements = [parent_package + ">=" + meta['spacy_version']]
if 'setup_requires' in meta: if 'setup_requires' in meta:
requirements += meta['setup_requires'] requirements += meta['setup_requires']
return requirements return requirements

View File

@ -36,13 +36,13 @@ def profile(cmd, lang, inputs=None):
if inputs is None: if inputs is None:
imdb_train, _ = thinc.extra.datasets.imdb() imdb_train, _ = thinc.extra.datasets.imdb()
inputs, _ = zip(*imdb_train) inputs, _ = zip(*imdb_train)
inputs = inputs[:2000] inputs = inputs[:25000]
nlp = spacy.load(lang) nlp = spacy.load(lang)
texts = list(cytoolz.take(10000, inputs)) texts = list(cytoolz.take(10000, inputs))
cProfile.runctx("parse_texts(nlp, texts)", globals(), locals(), cProfile.runctx("parse_texts(nlp, texts)", globals(), locals(),
"Profile.prof") "Profile.prof")
s = pstats.Stats("Profile.prof") s = pstats.Stats("Profile.prof")
s.strip_dirs().sort_stats("cumtime").print_stats() s.strip_dirs().sort_stats("time").print_stats()
def parse_texts(nlp, texts): def parse_texts(nlp, texts):

View File

@ -53,9 +53,9 @@ is_osx = sys.platform == 'darwin'
if is_python2: if is_python2:
import imp import imp
bytes_ = str bytes_ = str
unicode_ = unicode unicode_ = unicode # noqa: F821
basestring_ = basestring basestring_ = basestring # noqa: F821
input_ = raw_input input_ = raw_input # noqa: F821
json_dumps = lambda data: ujson.dumps(data, indent=2, escape_forward_slashes=False).decode('utf8') json_dumps = lambda data: ujson.dumps(data, indent=2, escape_forward_slashes=False).decode('utf8')
path2str = lambda path: str(path).decode('utf8') path2str = lambda path: str(path).decode('utf8')

View File

@ -97,7 +97,7 @@ def parse_deps(orig_doc, options={}):
word.lemma_, word.ent_type_)) word.lemma_, word.ent_type_))
for span_props in spans: for span_props in spans:
doc.merge(*span_props) doc.merge(*span_props)
words = [{'text': w.text, 'tag': w.tag_} for w in doc] words = [{'text': w.text, 'tag': w.pos_} for w in doc]
arcs = [] arcs = []
for word in doc: for word in doc:
if word.i < word.head.i: if word.i < word.head.i:

View File

@ -541,5 +541,24 @@ def biluo_tags_from_offsets(doc, entities, missing='O'):
return biluo return biluo
def offsets_from_biluo_tags(doc, tags):
"""Encode per-token tags following the BILUO scheme into entity offsets.
doc (Doc): The document that the BILUO tags refer to.
entities (iterable): A sequence of BILUO tags with each tag describing one
token. Each tags string will be of the form of either "", "O" or
"{action}-{label}", where action is one of "B", "I", "L", "U".
RETURNS (list): A sequence of `(start, end, label)` triples. `start` and
`end` will be character-offset integers denoting the slice into the
original string.
"""
token_offsets = tags_to_entities(tags)
offsets = []
for label, start_idx, end_idx in token_offsets:
span = doc[start_idx : end_idx + 1]
offsets.append((span.start_char, span.end_char, label))
return offsets
def is_punct_label(label): def is_punct_label(label):
return label == 'P' or label.lower() == 'punct' return label == 'P' or label.lower() == 'punct'

View File

@ -15,9 +15,11 @@ _hebrew = r'[\p{L}&&\p{Hebrew}]'
_latin_lower = r'[\p{Ll}&&\p{Latin}]' _latin_lower = r'[\p{Ll}&&\p{Latin}]'
_latin_upper = r'[\p{Lu}&&\p{Latin}]' _latin_upper = r'[\p{Lu}&&\p{Latin}]'
_latin = r'[[\p{Ll}||\p{Lu}]&&\p{Latin}]' _latin = r'[[\p{Ll}||\p{Lu}]&&\p{Latin}]'
_russian_lower = r'[ёа-я]'
_russian_upper = r'[ЁА-Я]'
_upper = [_latin_upper] _upper = [_latin_upper, _russian_upper]
_lower = [_latin_lower] _lower = [_latin_lower, _russian_lower]
_uncased = [_bengali, _hebrew] _uncased = [_bengali, _hebrew]
ALPHA = merge_char_classes(_upper + _lower + _uncased) ALPHA = merge_char_classes(_upper + _lower + _uncased)
@ -27,8 +29,9 @@ ALPHA_UPPER = merge_char_classes(_upper + _uncased)
_units = ('km km² km³ m m² m³ dm dm² dm³ cm cm² cm³ mm mm² mm³ ha µm nm yd in ft ' _units = ('km km² km³ m m² m³ dm dm² dm³ cm cm² cm³ mm mm² mm³ ha µm nm yd in ft '
'kg g mg µg t lb oz m/s km/h kmh mph hPa Pa mbar mb MB kb KB gb GB tb ' 'kg g mg µg t lb oz m/s km/h kmh mph hPa Pa mbar mb MB kb KB gb GB tb '
'TB T G M K %') 'TB T G M K % км км² км³ м м² м³ дм дм² дм³ см см² см³ мм мм² мм³ нм '
_currency = r'\$ £ € ¥ ฿ US\$ C\$ A\$' 'кг г мг м/с км/ч кПа Па мбар Кб КБ кб Мб МБ мб Гб ГБ гб Тб ТБ тб')
_currency = r'\$ £ € ¥ ฿ US\$ C\$ A\$ ₽'
# These expressions contain various unicode variations, including characters # These expressions contain various unicode variations, including characters
# used in Chinese (see #1333, #1340, #1351) unless there are cross-language # used in Chinese (see #1333, #1340, #1351) unless there are cross-language

View File

@ -2,6 +2,7 @@
from __future__ import unicode_literals from __future__ import unicode_literals
from .tokenizer_exceptions import TOKENIZER_EXCEPTIONS from .tokenizer_exceptions import TOKENIZER_EXCEPTIONS
from .norm_exceptions import NORM_EXCEPTIONS
from .stop_words import STOP_WORDS from .stop_words import STOP_WORDS
from .lex_attrs import LEX_ATTRS from .lex_attrs import LEX_ATTRS
from .morph_rules import MORPH_RULES from .morph_rules import MORPH_RULES
@ -18,7 +19,8 @@ class DanishDefaults(Language.Defaults):
lex_attr_getters = dict(Language.Defaults.lex_attr_getters) lex_attr_getters = dict(Language.Defaults.lex_attr_getters)
lex_attr_getters.update(LEX_ATTRS) lex_attr_getters.update(LEX_ATTRS)
lex_attr_getters[LANG] = lambda text: 'da' lex_attr_getters[LANG] = lambda text: 'da'
lex_attr_getters[NORM] = add_lookups(Language.Defaults.lex_attr_getters[NORM], BASE_NORMS) lex_attr_getters[NORM] = add_lookups(Language.Defaults.lex_attr_getters[NORM],
BASE_NORMS, NORM_EXCEPTIONS)
tokenizer_exceptions = update_exc(BASE_EXCEPTIONS, TOKENIZER_EXCEPTIONS) tokenizer_exceptions = update_exc(BASE_EXCEPTIONS, TOKENIZER_EXCEPTIONS)
# morph_rules = MORPH_RULES # morph_rules = MORPH_RULES
tag_map = TAG_MAP tag_map = TAG_MAP

View File

@ -11,7 +11,7 @@ Example sentences to test spaCy and its language models.
sentences = [ sentences = [
"Apple overvejer at købe et britisk statup for 1 milliard dollar", "Apple overvejer at købe et britisk startup for 1 milliard dollar",
"Selvkørende biler flytter forsikringsansvaret over på producenterne", "Selvkørende biler flytter forsikringsansvaret over på producenterne",
"San Francisco overvejer at forbyde leverandørrobotter på fortov", "San Francisco overvejer at forbyde leverandørrobotter på fortov",
"London er en stor by i Storbritannien" "London er en stor by i Storbritannien"

View File

@ -0,0 +1,527 @@
# coding: utf8
"""
Special-case rules for normalizing tokens to improve the model's predictions.
For example 'mysterium' vs 'mysterie' and similar.
"""
from __future__ import unicode_literals
# Sources:
# 1: https://dsn.dk/retskrivning/om-retskrivningsordbogen/mere-om-retskrivningsordbogen-2012/endrede-stave-og-ordformer/
# 2: http://www.tjerry-korrektur.dk/ord-med-flere-stavemaader/
_exc = {
# Alternative spelling
"a-kraft-værk": "a-kraftværk", # 1
"ålborg": "aalborg", # 2
"århus": "aarhus",
"accessoirer": "accessoires", # 1
"affektert": "affekteret", # 1
"afrikander": "afrikaaner", # 1
"aftabuere": "aftabuisere", # 1
"aftabuering": "aftabuisering", # 1
"akvarium": "akvarie", # 1
"alenefader": "alenefar", # 1
"alenemoder": "alenemor", # 1
"alkoholambulatorium": "alkoholambulatorie", # 1
"ambulatorium": "ambulatorie", # 1
"ananassene": "ananasserne", # 2
"anførelsestegn": "anførselstegn", # 1
"anseelig": "anselig", # 2
"antioxydant": "antioxidant", # 1
"artrig": "artsrig", # 1
"auditorium": "auditorie", # 1
"avocado": "avokado", # 2
"bagerst": "bagest", # 2
"bagstræv": "bagstræb", # 1
"bagstræver": "bagstræber", # 1
"bagstræverisk": "bagstræberisk", # 1
"balde": "balle", # 2
"barselorlov": "barselsorlov", # 1
"barselvikar": "barselsvikar", # 1
"baskien": "baskerlandet", # 1
"bayrisk": "bayersk", # 1
"bedstefader": "bedstefar", # 1
"bedstemoder": "bedstemor", # 1
"behefte": "behæfte", # 1
"beheftelse": "behæftelse", # 1
"bidragydende": "bidragsydende", # 1
"bidragyder": "bidragsyder", # 1
"billiondel": "billiontedel", # 1
"blaseret": "blasert", # 1
"bleskifte": "bleskift", # 1
"blodbroder": "blodsbroder", # 2
"blyantspidser": "blyantsspidser", # 2
"boligministerium": "boligministerie", # 1
"borhul": "borehul", # 1
"broder": "bror", # 2
"buldog": "bulldog", # 2
"bådhus": "bådehus", # 1
"børnepleje": "barnepleje", # 1
"børneseng": "barneseng", # 1
"børnestol": "barnestol", # 1
"cairo": "kairo", # 1
"cambodia": "cambodja", # 1
"cambodianer": "cambodjaner", # 1
"cambodiansk": "cambodjansk", # 1
"camouflage": "kamuflage", # 2
"campylobacter": "kampylobakter", # 1
"centeret": "centret", # 2
"chefskahyt": "chefkahyt", # 1
"chefspost": "chefpost", # 1
"chefssekretær": "chefsekretær", # 1
"chefsstol": "chefstol", # 1
"cirkulærskrivelse": "cirkulæreskrivelse", # 1
"cognacsglas": "cognacglas", # 1
"columnist": "kolumnist", # 1
"cricket": "kricket", # 2
"dagplejemoder": "dagplejemor", # 1
"damaskesdug": "damaskdug", # 1
"damp-barn": "dampbarn", # 1
"delfinarium": "delfinarie", # 1
"dentallaboratorium": "dentallaboratorie", # 1
"diaramme": "diasramme", # 1
"diaré": "diarré", # 1
"dioxyd": "dioxid", # 1
"dommedagsprædiken": "dommedagspræken", # 1
"donut": "doughnut", # 2
"driftmæssig": "driftsmæssig", # 1
"driftsikker": "driftssikker", # 1
"driftsikring": "driftssikring", # 1
"drikkejogurt": "drikkeyoghurt", # 1
"drivein": "drive-in", # 1
"driveinbiograf": "drive-in-biograf", # 1
"drøvel": "drøbel", # 1
"dødskriterium": "dødskriterie", # 1
"e-mail-adresse": "e-mailadresse", # 1
"e-post-adresse": "e-postadresse", # 1
"egypten": "ægypten", # 2
"ekskommunicere": "ekskommunikere", # 1
"eksperimentarium": "eksperimentarie", # 1
"elsass": "Alsace", # 1
"elsasser": "alsacer", # 1
"elsassisk": "alsacisk", # 1
"elvetal": "ellevetal", # 1
"elvetiden": "ellevetiden", # 1
"elveårig": "elleveårig", # 1
"elveårs": "elleveårs", # 1
"elveårsbarn": "elleveårsbarn", # 1
"elvte": "ellevte", # 1
"elvtedel": "ellevtedel", # 1
"energiministerium": "energiministerie", # 1
"erhvervsministerium": "erhvervsministerie", # 1
"espaliere": "spaliere", # 2
"evangelium": "evangelie", # 1
"fagministerium": "fagministerie", # 1
"fakse": "faxe", # 1
"fangstkvota": "fangstkvote", # 1
"fader": "far", # 2
"farbroder": "farbror", # 1
"farfader": "farfar", # 1
"farmoder": "farmor", # 1
"federal": "føderal", # 1
"federalisering": "føderalisering", # 1
"federalisme": "føderalisme", # 1
"federalist": "føderalist", # 1
"federalistisk": "føderalistisk", # 1
"federation": "føderation", # 1
"federativ": "føderativ", # 1
"fejlbeheftet": "fejlbehæftet", # 1
"femetagers": "femetages", # 2
"femhundredekroneseddel": "femhundredkroneseddel", # 2
"filmpremiere": "filmpræmiere", # 2
"finansimperium": "finansimperie", # 1
"finansministerium": "finansministerie", # 1
"firehjulstræk": "firhjulstræk", # 2
"fjernstudium": "fjernstudie", # 1
"formalier": "formalia", # 1
"formandsskift": "formandsskifte", # 1
"fornemst": "fornemmest", # 2
"fornuftparti": "fornuftsparti", # 1
"fornuftstridig": "fornuftsstridig", # 1
"fornuftvæsen": "fornuftsvæsen", # 1
"fornuftægteskab": "fornuftsægteskab", # 1
"forretningsministerium": "forretningsministerie", # 1
"forskningsministerium": "forskningsministerie", # 1
"forstudium": "forstudie", # 1
"forsvarsministerium": "forsvarsministerie", # 1
"frilægge": "fritlægge", # 1
"frilæggelse": "fritlæggelse", # 1
"frilægning": "fritlægning", # 1
"fristille": "fritstille", # 1
"fristilling": "fritstilling", # 1
"fuldttegnet": "fuldtegnet", # 1
"fødestedskriterium": "fødestedskriterie", # 1
"fødevareministerium": "fødevareministerie", # 1
"følesløs": "følelsesløs", # 1
"følgeligt": "følgelig", # 1
"førne": "førn", # 1
"gearskift": "gearskifte", # 2
"gladeligt": "gladelig", # 1
"glosehefte": "glosehæfte", # 1
"glædeløs": "glædesløs", # 1
"gonoré": "gonorré", # 1
"grangiveligt": "grangivelig", # 1
"grundliggende": "grundlæggende", # 2
"grønsag": "grøntsag", # 2
"gudbenådet": "gudsbenådet", # 1
"gudfader": "gudfar", # 1
"gudmoder": "gudmor", # 1
"gulvmop": "gulvmoppe", # 1
"gymnasium": "gymnasie", # 1
"hackning": "hacking", # 1
"halvbroder": "halvbror", # 1
"halvelvetiden": "halvellevetiden", # 1
"handelsgymnasium": "handelsgymnasie", # 1
"hefte": "hæfte", # 1
"hefteklamme": "hæfteklamme", # 1
"heftelse": "hæftelse", # 1
"heftemaskine": "hæftemaskine", # 1
"heftepistol": "hæftepistol", # 1
"hefteplaster": "hæfteplaster", # 1
"heftestraf": "hæftestraf", # 1
"heftning": "hæftning", # 1
"helbroder": "helbror", # 1
"hjemmeklasse": "hjemklasse", # 1
"hjulspin": "hjulspind", # 1
"huggevåben": "hugvåben", # 1
"hulmurisolering": "hulmursisolering", # 1
"hurtiggående": "hurtigtgående", # 2
"hurtigttørrende": "hurtigtørrende", # 2
"husmoder": "husmor", # 1
"hydroxyd": "hydroxid", # 1
"håndmikser": "håndmixer", # 1
"højtaler": "højttaler", # 2
"hønemoder": "hønemor", # 1
"ide": "idé", # 2
"imperium": "imperie", # 1
"imponerthed": "imponerethed", # 1
"inbox": "indboks", # 2
"indenrigsministerium": "indenrigsministerie", # 1
"indhefte": "indhæfte", # 1
"indheftning": "indhæftning", # 1
"indicium": "indicie", # 1
"indkassere": "inkassere", # 2
"iota": "jota", # 1
"jobskift": "jobskifte", # 1
"jogurt": "yoghurt", # 1
"jukeboks": "jukebox", # 1
"justitsministerium": "justitsministerie", # 1
"kalorifere": "kalorifer", # 1
"kandidatstipendium": "kandidatstipendie", # 1
"kannevas": "kanvas", # 1
"kaperssauce": "kaperssovs", # 1
"kigge": "kikke", # 2
"kirkeministerium": "kirkeministerie", # 1
"klapmydse": "klapmyds", # 1
"klimakterium": "klimakterie", # 1
"klogeligt": "klogelig", # 1
"knivblad": "knivsblad", # 1
"kollegaer": "kolleger", # 2
"kollegium": "kollegie", # 1
"kollegiehefte": "kollegiehæfte", # 1
"kollokviumx": "kollokvium", # 1
"kommissorium": "kommissorie", # 1
"kompendium": "kompendie", # 1
"komplicerthed": "komplicerethed", # 1
"konfederation": "konføderation", # 1
"konfedereret": "konfødereret", # 1
"konferensstudium": "konferensstudie", # 1
"konservatorium": "konservatorie", # 1
"konsulere": "konsultere", # 1
"kradsbørstig": "krasbørstig", # 2
"kravsspecifikation": "kravspecifikation", # 1
"krematorium": "krematorie", # 1
"krep": "crepe", # 1
"krepnylon": "crepenylon", # 1
"kreppapir": "crepepapir", # 1
"kricket": "cricket", # 2
"kriterium": "kriterie", # 1
"kroat": "kroater", # 2
"kroki": "croquis", # 1
"kronprinsepar": "kronprinspar", # 2
"kropdoven": "kropsdoven", # 1
"kroplus": "kropslus", # 1
"krøllefedt": "krølfedt", # 1
"kulturministerium": "kulturministerie", # 1
"kuponhefte": "kuponhæfte", # 1
"kvota": "kvote", # 1
"kvotaordning": "kvoteordning", # 1
"laboratorium": "laboratorie", # 1
"laksfarve": "laksefarve", # 1
"laksfarvet": "laksefarvet", # 1
"laksrød": "lakserød", # 1
"laksyngel": "lakseyngel", # 1
"laksørred": "lakseørred", # 1
"landbrugsministerium": "landbrugsministerie", # 1
"landskampstemning": "landskampsstemning", # 1
"langust": "languster", # 1
"lappegrejer": "lappegrej", # 1
"lavløn": "lavtløn", # 1
"lillebroder": "lillebror", # 1
"linear": "lineær", # 1
"loftlampe": "loftslampe", # 2
"log-in": "login", # 1
"login": "log-in", # 2
"lovmedholdig": "lovmedholdelig", # 1
"ludder": "luder", # 2
"lysholder": "lyseholder", # 1
"lægeskifte": "lægeskift", # 1
"lærvillig": "lærevillig", # 1
"løgsauce": "løgsovs", # 1
"madmoder": "madmor", # 1
"majonæse": "mayonnaise", # 1
"mareridtagtig": "mareridtsagtig", # 1
"margen": "margin", # 2
"martyrium": "martyrie", # 1
"mellemstatlig": "mellemstatslig", # 1
"menneskene": "menneskerne", # 2
"metropolis": "metropol", # 1
"miks": "mix", # 1
"mikse": "mixe", # 1
"miksepult": "mixerpult", # 1
"mikser": "mixer", # 1
"mikserpult": "mixerpult", # 1
"mikslån": "mixlån", # 1
"miksning": "mixning", # 1
"miljøministerium": "miljøministerie", # 1
"milliarddel": "milliardtedel", # 1
"milliondel": "milliontedel", # 1
"ministerium": "ministerie", # 1
"mop": "moppe", # 1
"moder": "mor", # 2
"moratorium": "moratorie", # 1
"morbroder": "morbror", # 1
"morfader": "morfar", # 1
"mormoder": "mormor", # 1
"musikkonservatorium": "musikkonservatorie", # 1
"muslingskal": "muslingeskal", # 1
"mysterium": "mysterie", # 1
"naturalieydelse": "naturalydelse", # 1
"naturalieøkonomi": "naturaløkonomi", # 1
"navnebroder": "navnebror", # 1
"nerium": "nerie", # 1
"nådeløs": "nådesløs", # 1
"nærforestående": "nærtforestående", # 1
"nærstående": "nærtstående", # 1
"observatorium": "observatorie", # 1
"oldefader": "oldefar", # 1
"oldemoder": "oldemor", # 1
"opgraduere": "opgradere", # 1
"opgraduering": "opgradering", # 1
"oratorium": "oratorie", # 1
"overbookning": "overbooking", # 1
"overpræsidium": "overpræsidie", # 1
"overstatlig": "overstatslig", # 1
"oxyd": "oxid", # 1
"oxydere": "oxidere", # 1
"oxydering": "oxidering", # 1
"pakkenellike": "pakkenelliker", # 1
"papirtynd": "papirstynd", # 1
"pastoralseminarium": "pastoralseminarie", # 1
"peanutsene": "peanuttene", # 2
"penalhus": "pennalhus", # 2
"pensakrav": "pensumkrav", # 1
"pepperoni": "peperoni", # 1
"peruaner": "peruvianer", # 1
"petrole": "petrol", # 1
"piltast": "piletast", # 1
"piltaste": "piletast", # 1
"planetarium": "planetarie", # 1
"plasteret": "plastret", # 2
"plastic": "plastik", # 2
"play-off-kamp": "playoffkamp", # 1
"plejefader": "plejefar", # 1
"plejemoder": "plejemor", # 1
"podium": "podie", # 2
"praha": "prag", # 2
"preciøs": "pretiøs", # 2
"privilegium": "privilegie", # 1
"progredere": "progrediere", # 1
"præsidium": "præsidie", # 1
"psykodelisk": "psykedelisk", # 1
"pudsegrejer": "pudsegrej", # 1
"referensgruppe": "referencegruppe", # 1
"referensramme": "referenceramme", # 1
"refugium": "refugie", # 1
"registeret": "registret", # 2
"remedium": "remedie", # 1
"remiks": "remix", # 1
"reservert": "reserveret", # 1
"ressortministerium": "ressortministerie", # 1
"ressource": "resurse", # 2
"resætte": "resette", # 1
"rettelig": "retteligt", # 1
"rettetaste": "rettetast", # 1
"returtaste": "returtast", # 1
"risici": "risikoer", # 2
"roll-on": "rollon", # 1
"rollehefte": "rollehæfte", # 1
"rostbøf": "roastbeef", # 1
"rygsæksturist": "rygsækturist", # 1
"rødstjært": "rødstjert", # 1
"saddel": "sadel", # 2
"samaritan": "samaritaner", # 2
"sanatorium": "sanatorie", # 1
"sauce": "sovs", # 1
"scanning": "skanning", # 2
"sceneskifte": "sceneskift", # 1
"scilla": "skilla", # 1
"sejflydende": "sejtflydende", # 1
"selvstudium": "selvstudie", # 1
"seminarium": "seminarie", # 1
"sennepssauce": "sennepssovs ", # 1
"servitutbeheftet": "servitutbehæftet", # 1
"sit-in": "sitin", # 1
"skatteministerium": "skatteministerie", # 1
"skifer": "skiffer", # 2
"skyldsfølelse": "skyldfølelse", # 1
"skysauce": "skysovs", # 1
"sladdertaske": "sladretaske", # 2
"sladdervorn": "sladrevorn", # 2
"slagsbroder": "slagsbror", # 1
"slettetaste": "slettetast", # 1
"smørsauce": "smørsovs", # 1
"snitsel": "schnitzel", # 1
"snobbeeffekt": "snobeffekt", # 2
"socialministerium": "socialministerie", # 1
"solarium": "solarie", # 1
"soldebroder": "soldebror", # 1
"spagetti": "spaghetti", # 1
"spagettistrop": "spaghettistrop", # 1
"spagettiwestern": "spaghettiwestern", # 1
"spin-off": "spinoff", # 1
"spinnefiskeri": "spindefiskeri", # 1
"spolorm": "spoleorm", # 1
"sproglaboratorium": "sproglaboratorie", # 1
"spækbræt": "spækkebræt", # 2
"stand-in": "standin", # 1
"stand-up-comedy": "standupcomedy", # 1
"stand-up-komiker": "standupkomiker", # 1
"statsministerium": "statsministerie", # 1
"stedbroder": "stedbror", # 1
"stedfader": "stedfar", # 1
"stedmoder": "stedmor", # 1
"stilehefte": "stilehæfte", # 1
"stipendium": "stipendie", # 1
"stjært": "stjert", # 1
"stjærthage": "stjerthage", # 1
"storebroder": "storebror", # 1
"stortå": "storetå", # 1
"strabads": "strabadser", # 1
"strømlinjet": "strømlinet", # 1
"studium": "studie", # 1
"stænkelap": "stænklap", # 1
"sundhedsministerium": "sundhedsministerie", # 1
"suppositorium": "suppositorie", # 1
"svejts": "schweiz", # 1
"svejtser": "schweizer", # 1
"svejtserfranc": "schweizerfranc", # 1
"svejtserost": "schweizerost", # 1
"svejtsisk": "schweizisk", # 1
"svigerfader": "svigerfar", # 1
"svigermoder": "svigermor", # 1
"svirebroder": "svirebror", # 1
"symposium": "symposie", # 1
"sælarium": "sælarie", # 1
"søreme": "sørme", # 2
"søterritorium": "søterritorie", # 1
"t-bone-steak": "t-bonesteak", # 1
"tabgivende": "tabsgivende", # 1
"tabuere": "tabuisere", # 1
"tabuering": "tabuisering", # 1
"tackle": "takle", # 2
"tackling": "takling", # 2
"taifun": "tyfon", # 1
"take-off": "takeoff", # 1
"taknemlig": "taknemmelig", # 2
"talehørelærer": "tale-høre-lærer", # 1
"talehøreundervisning": "tale-høre-undervisning", # 1
"tandstik": "tandstikker", # 1
"tao": "dao", # 1
"taoisme": "daoisme", # 1
"taoist": "daoist", # 1
"taoistisk": "daoistisk", # 1
"taverne": "taverna", # 1
"teateret": "teatret", # 2
"tekno": "techno", # 1
"temposkifte": "temposkift", # 1
"terrarium": "terrarie", # 1
"territorium": "territorie", # 1
"tesis": "tese", # 1
"tidsstudium": "tidsstudie", # 1
"tipoldefader": "tipoldefar", # 1
"tipoldemoder": "tipoldemor", # 1
"tomatsauce": "tomatsovs", # 1
"tonart": "toneart", # 1
"trafikministerium": "trafikministerie", # 1
"tredve": "tredive", # 1
"tredver": "trediver", # 1
"tredveårig": "trediveårig", # 1
"tredveårs": "trediveårs", # 1
"tredveårsfødselsdag": "trediveårsfødselsdag", # 1
"tredvte": "tredivte", # 1
"tredvtedel": "tredivtedel", # 1
"troldunge": "troldeunge", # 1
"trommestikke": "trommestik", # 1
"trubadur": "troubadour", # 2
"trøstepræmie": "trøstpræmie", # 2
"tummerum": "trummerum", # 1
"tumultuarisk": "tumultarisk", # 1
"tunghørighed": "tunghørhed", # 1
"tus": "tusch", # 2
"tusind": "tusinde", # 2
"tvillingbroder": "tvillingebror", # 1
"tvillingbror": "tvillingebror", # 1
"tvillingebroder": "tvillingebror", # 1
"ubeheftet": "ubehæftet", # 1
"udenrigsministerium": "udenrigsministerie", # 1
"udhulning": "udhuling", # 1
"udslaggivende": "udslagsgivende", # 1
"udspekulert": "udspekuleret", # 1
"udviklingsministerium": "udviklingsministerie", # 1
"uforpligtigende": "uforpligtende", # 1
"uheldvarslende": "uheldsvarslende", # 1
"uimponerthed": "uimponerethed", # 1
"undervisningsministerium": "undervisningsministerie", # 1
"unægtelig": "unægteligt", # 1
"urinale": "urinal", # 1
"uvederheftig": "uvederhæftig", # 1
"vabel": "vable", # 2
"vadi": "wadi", # 1
"vaklevorn": "vakkelvorn", # 1
"vanadin": "vanadium", # 1
"vaselin": "vaseline", # 1
"vederheftig": "vederhæftig", # 1
"vedhefte": "vedhæfte", # 1
"velar": "velær", # 1
"videndeling": "vidensdeling", # 2
"vinkelanførelsestegn": "vinkelanførselstegn", # 1
"vipstjært": "vipstjert", # 1
"vismut": "bismut", # 1
"visvas": "vissevasse", # 1
"voksværk": "vokseværk", # 1
"værtdyr": "værtsdyr", # 1
"værtplante": "værtsplante", # 1
"wienersnitsel": "wienerschnitzel", # 1
"yderliggående": "yderligtgående", # 2
"zombi": "zombie", # 1
"ægbakke": "æggebakke", # 1
"ægformet": "æggeformet", # 1
"ægleder": "æggeleder", # 1
"ækvilibrist": "ekvilibrist", # 2
"æselsøre": "æseløre", # 1
"øjehule": "øjenhule", # 1
"øjelåg": "øjenlåg", # 1
"øjeåbner": "øjenåbner", # 1
"økonomiministerium": "økonomiministerie", # 1
"ørenring": "ørering", # 2
"øvehefte": "øvehæfte" # 1
}
NORM_EXCEPTIONS = {}
for string, norm in _exc.items():
NORM_EXCEPTIONS[string] = norm
NORM_EXCEPTIONS[string.title()] = norm

View File

@ -1,32 +1,134 @@
# encoding: utf8 # encoding: utf8
"""
Tokenizer Exceptions.
Source: https://forkortelse.dk/ and various others.
"""
from __future__ import unicode_literals from __future__ import unicode_literals
from ...symbols import ORTH, LEMMA, NORM from ...symbols import ORTH, LEMMA, NORM, TAG, PUNCT
_exc = {} _exc = {}
# Abbreviations for weekdays "søn." (for "søndag") as well as "Tor." and "Tors."
# (for "torsdag") are left out because they are ambiguous. The same is the case
# for abbreviations "jul." and "Jul." ("juli").
for exc_data in [ for exc_data in [
{ORTH: "Kbh.", LEMMA: "København", NORM: "København"}, {ORTH: "Kbh.", LEMMA: "København", NORM: "København"},
{ORTH: "Jan.", LEMMA: "januar", NORM: "januar"}, {ORTH: "jan.", LEMMA: "januar"},
{ORTH: "Feb.", LEMMA: "februar", NORM: "februar"}, {ORTH: "febr.", LEMMA: "februar"},
{ORTH: "Mar.", LEMMA: "marts", NORM: "marts"}, {ORTH: "feb.", LEMMA: "februar"},
{ORTH: "Apr.", LEMMA: "april", NORM: "april"}, {ORTH: "mar.", LEMMA: "marts"},
{ORTH: "Maj.", LEMMA: "maj", NORM: "maj"}, {ORTH: "apr.", LEMMA: "april"},
{ORTH: "Jun.", LEMMA: "juni", NORM: "juni"}, {ORTH: "jun.", LEMMA: "juni"},
{ORTH: "Jul.", LEMMA: "juli", NORM: "juli"}, {ORTH: "aug.", LEMMA: "august"},
{ORTH: "Aug.", LEMMA: "august", NORM: "august"}, {ORTH: "sept.", LEMMA: "september"},
{ORTH: "Sep.", LEMMA: "september", NORM: "september"}, {ORTH: "sep.", LEMMA: "september"},
{ORTH: "Okt.", LEMMA: "oktober", NORM: "oktober"}, {ORTH: "okt.", LEMMA: "oktober"},
{ORTH: "Nov.", LEMMA: "november", NORM: "november"}, {ORTH: "nov.", LEMMA: "november"},
{ORTH: "Dec.", LEMMA: "december", NORM: "december"}]: {ORTH: "dec.", LEMMA: "december"},
{ORTH: "man.", LEMMA: "mandag"},
{ORTH: "tirs.", LEMMA: "tirsdag"},
{ORTH: "ons.", LEMMA: "onsdag"},
{ORTH: "tor.", LEMMA: "torsdag"},
{ORTH: "tors.", LEMMA: "torsdag"},
{ORTH: "fre.", LEMMA: "fredag"},
{ORTH: "lør.", LEMMA: "lørdag"},
{ORTH: "Jan.", LEMMA: "januar"},
{ORTH: "Febr.", LEMMA: "februar"},
{ORTH: "Feb.", LEMMA: "februar"},
{ORTH: "Mar.", LEMMA: "marts"},
{ORTH: "Apr.", LEMMA: "april"},
{ORTH: "Jun.", LEMMA: "juni"},
{ORTH: "Aug.", LEMMA: "august"},
{ORTH: "Sept.", LEMMA: "september"},
{ORTH: "Sep.", LEMMA: "september"},
{ORTH: "Okt.", LEMMA: "oktober"},
{ORTH: "Nov.", LEMMA: "november"},
{ORTH: "Dec.", LEMMA: "december"},
{ORTH: "Man.", LEMMA: "mandag"},
{ORTH: "Tirs.", LEMMA: "tirsdag"},
{ORTH: "Ons.", LEMMA: "onsdag"},
{ORTH: "Fre.", LEMMA: "fredag"},
{ORTH: "Lør.", LEMMA: "lørdag"}]:
_exc[exc_data[ORTH]] = [exc_data] _exc[exc_data[ORTH]] = [exc_data]
for orth in [ for orth in [
"A/S", "beg.", "bl.a.", "ca.", "d.s.s.", "dvs.", "f.eks.", "fr.", "hhv.", "A.D.", "A/S", "aarh.", "ac.", "adj.", "adr.", "adsk.", "adv.", "afb.",
"if.", "iflg.", "m.a.o.", "mht.", "min.", "osv.", "pga.", "resp.", "self.", "afd.", "afg.", "afk.", "afs.", "aht.", "alg.", "alk.", "alm.", "amer.",
"t.o.m.", "vha.", ""]: "ang.", "ank.", "anl.", "anv.", "arb.", "arr.", "att.", "B.C.", "bd.",
"bdt.", "beg.", "begr.", "beh.", "bet.", "bev.", "bhk.", "bib.",
"bibl.", "bidr.", "bildl.", "bill.", "bio.", "biol.", "bk.", "BK.",
"bl.", "bl.a.", "borgm.", "bot.", "Boul.", "br.", "brolægn.", "bto.",
"bygn.", "ca.", "cand.", "Chr.", "d.", "d.d.", "d.m.", "d.s.", "d.s.s.",
"d.y.", "d.å.", "d.æ.", "da.", "dagl.", "dat.", "dav.", "def.", "dek.",
"dep.", "desl.", "diam.", "dir.", "disp.", "distr.", "div.", "dkr.",
"dl.", "do.", "dobb.", "Dr.", "dr.h.c", "Dronn.", "ds.", "dvs.", "e.b.",
"e.l.", "e.o.", "e.v.t.", "eftf.", "eftm.", "eg.", "egl.", "eks.",
"eksam.", "ekskl.", "eksp.", "ekspl.", "el.", "el.lign.", "emer.",
"endv.", "eng.", "enk.", "etc.", "etym.", "eur.", "evt.", "exam.", "f.",
"f.eks.", "f.m.", "f.n.", "f.o.", "f.o.m.", "f.s.v.", "f.t.", "f.v.t.",
"f.å.", "fa.", "fakt.", "fam.", "fem.", "ff.", "fg.", "fhv.", "fig.",
"filol.", "filos.", "fl.", "flg.", "fm.", "fmd.", "fol.", "forb.",
"foreg.", "foren.", "forf.", "fork.", "form.", "forr.", "fors.",
"forsk.", "forts.", "fr.", "fr.u.", "frk.", "fsva.", "fuldm.", "fung.",
"fx.", "fys.", "fær.", "g.d.", "g.m.", "gd.", "gdr.", "genuds.", "gl.",
"gn.", "gns.", "gr.", "grdl.", "gross.", "h.a.", "h.c.", "H.K.H.",
"H.M.", "hdl.", "henv.", "Hf.", "hhv.", "hj.hj.", "hj.spl.", "hort.",
"hosp.", "hpl.", "Hr.", "hr.", "hrs.", "hum.", "hvp.", "i/s", "I/S",
"i.e.", "ib.", "id.", "if.", "iflg.", "ifm.", "ift.", "iht.", "ill.",
"indb.", "indreg.", "inf.", "ing.", "inh.", "inj.", "inkl.", "insp.",
"instr.", "isl.", "istf.", "it.", "ital.", "iv.", "jap.", "jf.", "jfr.",
"jnr.", "j.nr.", "jr.", "jur.", "jvf.", "K.", "kap.", "kat.", "kbh.",
"kem.", "kgl.", "kl.", "kld.", "knsp.", "komm.", "kons.", "korr.",
"kp.", "Kprs.", "kr.", "kst.", "kt.", "ktr.", "kv.", "kvt.", "l.",
"L.A.", "l.c.", "lab.", "lat.", "lb.m.", "lb.nr.", "lejl.", "lgd.",
"lic.", "lign.", "lin.", "ling.merc.", "litt.", "Ll.", "loc.cit.",
"lok.", "lrs.", "ltr.", "m/s", "M/S", "m.a.o.", "m.fl.", "m.m.", "m.v.",
"m.v.h.", "Mag.", "maks.", "md.", "mdr.", "mdtl.", "mezz.", "mfl.",
"m.h.p.", "m.h.t", "mht.", "mik.", "min.", "mio.", "modt.", "Mr.",
"mrk.", "mul.", "mv.", "n.br.", "n.f.", "nat.", "nb.", "Ndr.",
"nedenst.", "nl.", "nr.", "Nr.", "nto.", "nuv.", "o/m", "o.a.", "o.fl.",
"o.h.", "o.l.", "o.lign.", "o.m.a.", "o.s.fr.", "obl.", "obs.",
"odont.", "oecon.", "off.", "ofl.", "omg.", "omkr.", "omr.", "omtr.",
"opg.", "opl.", "opr.", "org.", "orig.", "osv.", "ovenst.", "overs.",
"ovf.", "p.", "p.a.", "p.b.a", "p.b.v", "p.c.", "p.m.", "p.m.v.",
"p.n.", "p.p.", "p.p.s.", "p.s.", "p.t.", "p.v.a.", "p.v.c.", "pag.",
"par.", "Pas.", "pass.", "pcs.", "pct.", "pd.", "pens.", "pers.",
"pft.", "pg.", "pga.", "pgl.", "Ph.d.", "pinx.", "pk.", "pkt.",
"polit.", "polyt.", "pos.", "pp.", "ppm.", "pr.", "prc.", "priv.",
"prod.", "prof.", "pron.", "Prs.", "præd.", "præf.", "præt.", "psych.",
"pt.", "pæd.", "q.e.d.", "rad.", "Rcp.", "red.", "ref.", "reg.",
"regn.", "rel.", "rep.", "repr.", "resp.", "rest.", "rm.", "rtg.",
"russ.", "s.", "s.br.", "s.d.", "s.f.", "s.m.b.a.", "s.u.", "s.å.",
"sa.", "sb.", "sc.", "scient.", "scil.", "Sdr.", "sek.", "sekr.",
"self.", "sem.", "sen.", "shj.", "sign.", "sing.", "sj.", "skr.",
"Skt.", "slutn.", "sml.", "smp.", "sms.", "snr.", "soc.", "soc.dem.",
"sort.", "sp.", "spec.", "Spl.", "spm.", "spr.", "spsk.", "statsaut.",
"st.", "stk.", "str.", "stud.", "subj.", "subst.", "suff.", "sup.",
"suppl.", "sv.", "såk.", "sædv.", "sø.", "t/r", "t.", "t.h.", "t.o.",
"t.o.m.", "t.v.", "tab.", "tbl.", "tcp/ip", "td.", "tdl.", "tdr.",
"techn.", "tekn.", "temp.", "th.", "theol.", "ti.", "tidl.", "tilf.",
"tilh.", "till.", "tilsv.", "tjg.", "tkr.", "tlf.", "tlgr.", "to.",
"tr.", "trp.", "tsk.", "tv.", "ty.", "u/b", "udb.", "udbet.", "ugtl.",
"undt.", "v.", "v.f.", "var.", "vb.", "vedk.", "vedl.", "vedr.",
"vejl.", "Vg.", "vh.", "vha.", "vs.", "vsa.", "vær.", "zool.", "ø.lgd.",
"øv.", "øvr.", "årg.", "årh.", ""]:
_exc[orth] = [{ORTH: orth}] _exc[orth] = [{ORTH: orth}]
# Dates
for h in range(1, 31 + 1):
for period in ["."]:
_exc["%d%s" % (h, period)] = [
{ORTH: "%d." % h}]
_custom_base_exc = {
"i.": [
{ORTH: "i", LEMMA: "i", NORM: "i"},
{ORTH: ".", TAG: PUNCT}]
}
_exc.update(_custom_base_exc)
TOKENIZER_EXCEPTIONS = _exc TOKENIZER_EXCEPTIONS = _exc

18
spacy/lang/nl/examples.py Normal file
View File

@ -0,0 +1,18 @@
# coding: utf8
from __future__ import unicode_literals
"""
Example sentences to test spaCy and its language models.
>>> from spacy.lang.nl.examples import sentences
>>> docs = nlp.pipe(sentences)
"""
examples = [
"Apple overweegt om voor 1 miljard een U.K. startup te kopen",
"Autonome auto's verschuiven de verzekeringverantwoordelijkheid naar producenten",
"San Francisco overweegt robots op voetpaden te verbieden",
"Londen is een grote stad in het Verenigd Koninkrijk"
]

38
spacy/lang/ru/__init__.py Normal file
View File

@ -0,0 +1,38 @@
# encoding: utf8
from __future__ import unicode_literals, print_function
from .stop_words import STOP_WORDS
from .tokenizer_exceptions import TOKENIZER_EXCEPTIONS
from .norm_exceptions import NORM_EXCEPTIONS
from .lex_attrs import LEX_ATTRS
from .tag_map import TAG_MAP
from .lemmatizer import RussianLemmatizer
from ..tokenizer_exceptions import BASE_EXCEPTIONS
from ..norm_exceptions import BASE_NORMS
from ...util import update_exc, add_lookups
from ...language import Language
from ...attrs import LANG, LIKE_NUM, NORM
class RussianDefaults(Language.Defaults):
lex_attr_getters = dict(Language.Defaults.lex_attr_getters)
lex_attr_getters.update(LEX_ATTRS)
lex_attr_getters[LANG] = lambda text: 'ru'
lex_attr_getters[NORM] = add_lookups(Language.Defaults.lex_attr_getters[NORM],
BASE_NORMS, NORM_EXCEPTIONS)
tokenizer_exceptions = update_exc(BASE_EXCEPTIONS, TOKENIZER_EXCEPTIONS)
stop_words = STOP_WORDS
tag_map = TAG_MAP
@classmethod
def create_lemmatizer(cls, nlp=None):
return RussianLemmatizer()
class Russian(Language):
lang = 'ru'
Defaults = RussianDefaults
__all__ = ['Russian']

237
spacy/lang/ru/lemmatizer.py Normal file
View File

@ -0,0 +1,237 @@
# coding: utf8
from ...symbols import (
ADJ, DET, NOUN, NUM, PRON, PROPN, PUNCT, VERB, POS
)
from ...lemmatizer import Lemmatizer
class RussianLemmatizer(Lemmatizer):
_morph = None
def __init__(self):
super(RussianLemmatizer, self).__init__()
try:
from pymorphy2 import MorphAnalyzer
except ImportError:
raise ImportError(
'The Russian lemmatizer requires the pymorphy2 library: '
'try to fix it with "pip install pymorphy2==0.8"')
if RussianLemmatizer._morph is None:
RussianLemmatizer._morph = MorphAnalyzer()
def __call__(self, string, univ_pos, morphology=None):
univ_pos = self.normalize_univ_pos(univ_pos)
if univ_pos == 'PUNCT':
return [PUNCT_RULES.get(string, string)]
if univ_pos not in ('ADJ', 'DET', 'NOUN', 'NUM', 'PRON', 'PROPN', 'VERB'):
# Skip unchangeable pos
return [string.lower()]
analyses = self._morph.parse(string)
filtered_analyses = []
for analysis in analyses:
if not analysis.is_known:
# Skip suggested parse variant for unknown word for pymorphy
continue
analysis_pos, _ = oc2ud(str(analysis.tag))
if analysis_pos == univ_pos \
or (analysis_pos in ('NOUN', 'PROPN') and univ_pos in ('NOUN', 'PROPN')):
filtered_analyses.append(analysis)
if not len(filtered_analyses):
return [string.lower()]
if morphology is None or (len(morphology) == 1 and POS in morphology):
return list(set([analysis.normal_form for analysis in filtered_analyses]))
if univ_pos in ('ADJ', 'DET', 'NOUN', 'PROPN'):
features_to_compare = ['Case', 'Number', 'Gender']
elif univ_pos == 'NUM':
features_to_compare = ['Case', 'Gender']
elif univ_pos == 'PRON':
features_to_compare = ['Case', 'Number', 'Gender', 'Person']
else: # VERB
features_to_compare = ['Aspect', 'Gender', 'Mood', 'Number', 'Tense', 'VerbForm', 'Voice']
analyses, filtered_analyses = filtered_analyses, []
for analysis in analyses:
_, analysis_morph = oc2ud(str(analysis.tag))
for feature in features_to_compare:
if (feature in morphology and feature in analysis_morph
and morphology[feature] != analysis_morph[feature]):
break
else:
filtered_analyses.append(analysis)
if not len(filtered_analyses):
return [string.lower()]
return list(set([analysis.normal_form for analysis in filtered_analyses]))
@staticmethod
def normalize_univ_pos(univ_pos):
if isinstance(univ_pos, str):
return univ_pos.upper()
symbols_to_str = {
ADJ: 'ADJ',
DET: 'DET',
NOUN: 'NOUN',
NUM: 'NUM',
PRON: 'PRON',
PROPN: 'PROPN',
PUNCT: 'PUNCT',
VERB: 'VERB'
}
if univ_pos in symbols_to_str:
return symbols_to_str[univ_pos]
return None
def is_base_form(self, univ_pos, morphology=None):
# TODO
raise NotImplementedError
def det(self, string, morphology=None):
return self(string, 'det', morphology)
def num(self, string, morphology=None):
return self(string, 'num', morphology)
def pron(self, string, morphology=None):
return self(string, 'pron', morphology)
def lookup(self, string):
analyses = self._morph.parse(string)
if len(analyses) == 1:
return analyses[0].normal_form
return string
def oc2ud(oc_tag):
gram_map = {
'_POS': {
'ADJF': 'ADJ',
'ADJS': 'ADJ',
'ADVB': 'ADV',
'Apro': 'DET',
'COMP': 'ADJ', # Can also be an ADV - unchangeable
'CONJ': 'CCONJ', # Can also be a SCONJ - both unchangeable ones
'GRND': 'VERB',
'INFN': 'VERB',
'INTJ': 'INTJ',
'NOUN': 'NOUN',
'NPRO': 'PRON',
'NUMR': 'NUM',
'NUMB': 'NUM',
'PNCT': 'PUNCT',
'PRCL': 'PART',
'PREP': 'ADP',
'PRTF': 'VERB',
'PRTS': 'VERB',
'VERB': 'VERB',
},
'Animacy': {
'anim': 'Anim',
'inan': 'Inan',
},
'Aspect': {
'impf': 'Imp',
'perf': 'Perf',
},
'Case': {
'ablt': 'Ins',
'accs': 'Acc',
'datv': 'Dat',
'gen1': 'Gen',
'gen2': 'Gen',
'gent': 'Gen',
'loc2': 'Loc',
'loct': 'Loc',
'nomn': 'Nom',
'voct': 'Voc',
},
'Degree': {
'COMP': 'Cmp',
'Supr': 'Sup',
},
'Gender': {
'femn': 'Fem',
'masc': 'Masc',
'neut': 'Neut',
},
'Mood': {
'impr': 'Imp',
'indc': 'Ind',
},
'Number': {
'plur': 'Plur',
'sing': 'Sing',
},
'NumForm': {
'NUMB': 'Digit',
},
'Person': {
'1per': '1',
'2per': '2',
'3per': '3',
'excl': '2',
'incl': '1',
},
'Tense': {
'futr': 'Fut',
'past': 'Past',
'pres': 'Pres',
},
'Variant': {
'ADJS': 'Brev',
'PRTS': 'Brev',
},
'VerbForm': {
'GRND': 'Conv',
'INFN': 'Inf',
'PRTF': 'Part',
'PRTS': 'Part',
'VERB': 'Fin',
},
'Voice': {
'actv': 'Act',
'pssv': 'Pass',
},
'Abbr': {
'Abbr': 'Yes'
}
}
pos = 'X'
morphology = dict()
unmatched = set()
grams = oc_tag.replace(' ', ',').split(',')
for gram in grams:
match = False
for categ, gmap in sorted(gram_map.items()):
if gram in gmap:
match = True
if categ == '_POS':
pos = gmap[gram]
else:
morphology[categ] = gmap[gram]
if not match:
unmatched.add(gram)
while len(unmatched) > 0:
gram = unmatched.pop()
if gram in ('Name', 'Patr', 'Surn', 'Geox', 'Orgn'):
pos = 'PROPN'
elif gram == 'Auxt':
pos = 'AUX'
elif gram == 'Pltm':
morphology['Number'] = 'Ptan'
return pos, morphology
PUNCT_RULES = {
"«": "\"",
"»": "\""
}

View File

@ -0,0 +1,35 @@
# coding: utf8
from __future__ import unicode_literals
from ...attrs import LIKE_NUM
_num_words = [
'ноль', 'один', 'два', 'три', 'четыре', 'пять', 'шесть', 'семь', 'восемь', 'девять',
'десять', 'одиннадцать', 'двенадцать', 'тринадцать', 'четырнадцать',
'пятнадцать', 'шестнадцать', 'семнадцать', 'восемнадцать', 'девятнадцать',
'двадцать', 'тридцать', 'сорок', 'пятьдесят', 'шестьдесят', 'семьдесят', 'восемьдесят', 'девяносто',
'сто', 'двести', 'триста', 'четыреста', 'пятьсот', 'шестьсот', 'семьсот', 'восемьсот', 'девятьсот',
'тысяча', 'миллион', 'миллиард', 'триллион', 'квадриллион', 'квинтиллион']
def like_num(text):
text = text.replace(',', '').replace('.', '')
if text.isdigit():
return True
if text.count('/') == 1:
num, denom = text.split('/')
if num.isdigit() and denom.isdigit():
return True
if text in _num_words:
return True
return False
LEX_ATTRS = {
LIKE_NUM: like_num
}

View File

@ -0,0 +1,24 @@
# coding: utf8
from __future__ import unicode_literals
_exc = {
# Slang
'прив': 'привет',
'ща': 'сейчас',
'спс': 'спасибо',
'пжлст': 'пожалуйста',
'плиз': 'пожалуйста',
'лан': 'ладно',
'ясн': 'ясно',
'всм': 'всмысле',
'хош': 'хочешь',
'оч': 'очень'
}
NORM_EXCEPTIONS = {}
for string, norm in _exc.items():
NORM_EXCEPTIONS[string] = norm
NORM_EXCEPTIONS[string.title()] = norm

View File

@ -0,0 +1,54 @@
# encoding: utf8
from __future__ import unicode_literals
STOP_WORDS = set("""
а
будем будет будете будешь буду будут будучи будь будьте бы был была были было
быть
в вам вами вас весь во вот все всё всего всей всем всём всеми всему всех всею
всея всю вся вы
да для до
его едим едят ее её ей ел ела ем ему емъ если ест есть ешь еще ещё ею
же
за
и из или им ими имъ их
к как кем ко когда кого ком кому комья которая которого которое которой котором
которому которою которую которые который которым которыми которых кто
меня мне мной мною мог моги могите могла могли могло могу могут мое моё моего
моей моем моём моему моею можем может можете можешь мои мой моим моими моих
мочь мою моя мы
на нам нами нас наса наш наша наше нашего нашей нашем нашему нашею наши нашим
нашими наших нашу не него нее неё ней нем нём нему нет нею ним ними них но
о об один одна одни одним одними одних одно одного одной одном одному одною
одну он она оне они оно от
по при
с сам сама сами самим самими самих само самого самом самому саму свое своё
своего своей своем своём своему своею свои свой своим своими своих свою своя
себе себя собой собою
та так такая такие таким такими таких такого такое такой таком такому такою
такую те тебе тебя тем теми тех то тобой тобою того той только том томах тому
тот тою ту ты
у уже
чего чем чём чему что чтобы
эта эти этим этими этих это этого этой этом этому этот этою эту
я
""".split())

731
spacy/lang/ru/tag_map.py Normal file
View File

@ -0,0 +1,731 @@
# coding: utf8
from __future__ import unicode_literals
from ...symbols import (
POS, PUNCT, SYM, ADJ, NUM, DET, ADV, ADP, X, VERB, NOUN, PROPN, PART, INTJ, SPACE, PRON, SCONJ, AUX, CONJ, CCONJ
)
TAG_MAP = {
'ADJ__Animacy=Anim|Case=Acc|Degree=Pos|Gender=Masc|Number=Sing': {POS: ADJ, 'Animacy': 'Anim', 'Case': 'Acc', 'Degree': 'Pos', 'Gender': 'Masc', 'Number': 'Sing'},
'ADJ__Animacy=Anim|Case=Acc|Degree=Pos|Number=Plur': {POS: ADJ, 'Animacy': 'Anim', 'Case': 'Acc', 'Degree': 'Pos', 'Number': 'Plur'},
'ADJ__Animacy=Anim|Case=Acc|Degree=Sup|Gender=Masc|Number=Sing': {POS: ADJ, 'Animacy': 'Anim', 'Case': 'Acc', 'Degree': 'Sup', 'Gender': 'Masc', 'Number': 'Sing'},
'ADJ__Animacy=Anim|Case=Nom|Degree=Pos|Number=Plur': {POS: ADJ, 'Animacy': 'Anim', 'Case': 'Nom', 'Degree': 'Pos', 'Number': 'Plur'},
'ADJ__Animacy=Inan|Case=Acc|Degree=Pos|Gender=Masc|Number=Sing': {POS: ADJ, 'Animacy': 'Inan', 'Case': 'Acc', 'Degree': 'Pos', 'Gender': 'Masc', 'Number': 'Sing'},
'ADJ__Animacy=Inan|Case=Acc|Degree=Pos|Gender=Neut|Number=Sing': {POS: ADJ, 'Animacy': 'Inan', 'Case': 'Acc', 'Degree': 'Pos', 'Gender': 'Neut', 'Number': 'Sing'},
'ADJ__Animacy=Inan|Case=Acc|Degree=Pos|Number=Plur': {POS: ADJ, 'Animacy': 'Inan', 'Case': 'Acc', 'Degree': 'Pos', 'Number': 'Plur'},
'ADJ__Animacy=Inan|Case=Acc|Degree=Sup|Gender=Masc|Number=Sing': {POS: ADJ, 'Animacy': 'Inan', 'Case': 'Acc', 'Degree': 'Sup', 'Gender': 'Masc', 'Number': 'Sing'},
'ADJ__Animacy=Inan|Case=Acc|Degree=Sup|Number=Plur': {POS: ADJ, 'Animacy': 'Inan', 'Case': 'Acc', 'Degree': 'Sup', 'Number': 'Plur'},
'ADJ__Animacy=Inan|Case=Acc|Gender=Fem|Number=Sing': {POS: ADJ, 'Animacy': 'Inan', 'Case': 'Acc', 'Gender': 'Fem', 'Number': 'Sing'},
'ADJ__Animacy=Inan|Case=Nom|Degree=Pos|Gender=Fem|Number=Sing': {POS: ADJ, 'Animacy': 'Inan', 'Case': 'Nom', 'Degree': 'Pos', 'Gender': 'Fem', 'Number': 'Sing'},
'ADJ__Case=Acc|Degree=Pos|Gender=Fem|Number=Sing': {POS: ADJ, 'Case': 'Acc', 'Degree': 'Pos', 'Gender': 'Fem', 'Number': 'Sing'},
'ADJ__Case=Acc|Degree=Pos|Gender=Neut|Number=Sing': {POS: ADJ, 'Case': 'Acc', 'Degree': 'Pos', 'Gender': 'Neut', 'Number': 'Sing'},
'ADJ__Case=Acc|Degree=Sup|Gender=Fem|Number=Sing': {POS: ADJ, 'Case': 'Acc', 'Degree': 'Sup', 'Gender': 'Fem', 'Number': 'Sing'},
'ADJ__Case=Acc|Degree=Sup|Gender=Neut|Number=Sing': {POS: ADJ, 'Case': 'Acc', 'Degree': 'Sup', 'Gender': 'Neut', 'Number': 'Sing'},
'ADJ__Case=Dat|Degree=Pos|Gender=Fem|Number=Sing': {POS: ADJ, 'Case': 'Dat', 'Degree': 'Pos', 'Gender': 'Fem', 'Number': 'Sing'},
'ADJ__Case=Dat|Degree=Pos|Gender=Masc|Number=Sing': {POS: ADJ, 'Case': 'Dat', 'Degree': 'Pos', 'Gender': 'Masc', 'Number': 'Sing'},
'ADJ__Case=Dat|Degree=Pos|Gender=Neut|Number=Sing': {POS: ADJ, 'Case': 'Dat', 'Degree': 'Pos', 'Gender': 'Neut', 'Number': 'Sing'},
'ADJ__Case=Dat|Degree=Pos|Number=Plur': {POS: ADJ, 'Case': 'Dat', 'Degree': 'Pos', 'Number': 'Plur'},
'ADJ__Case=Dat|Degree=Sup|Gender=Masc|Number=Sing': {POS: ADJ, 'Case': 'Dat', 'Degree': 'Sup', 'Gender': 'Masc', 'Number': 'Sing'},
'ADJ__Case=Dat|Degree=Sup|Gender=Neut|Number=Sing': {POS: ADJ, 'Case': 'Dat', 'Degree': 'Sup', 'Gender': 'Neut', 'Number': 'Sing'},
'ADJ__Case=Dat|Degree=Sup|Number=Plur': {POS: ADJ, 'Case': 'Dat', 'Degree': 'Sup', 'Number': 'Plur'},
'ADJ__Case=Gen|Degree=Pos|Gender=Fem|Number=Sing': {POS: ADJ, 'Case': 'Gen', 'Degree': 'Pos', 'Gender': 'Fem', 'Number': 'Sing'},
'ADJ__Case=Gen|Degree=Pos|Gender=Fem|Number=Sing|Variant=Short': {POS: ADJ, 'Case': 'Gen', 'Degree': 'Pos', 'Gender': 'Fem', 'Number': 'Sing', 'Variant': 'Short'},
'ADJ__Case=Gen|Degree=Pos|Gender=Masc|Number=Sing': {POS: ADJ, 'Case': 'Gen', 'Degree': 'Pos', 'Gender': 'Masc', 'Number': 'Sing'},
'ADJ__Case=Gen|Degree=Pos|Gender=Neut|Number=Sing': {POS: ADJ, 'Case': 'Gen', 'Degree': 'Pos', 'Gender': 'Neut', 'Number': 'Sing'},
'ADJ__Case=Gen|Degree=Pos|Number=Plur': {POS: ADJ, 'Case': 'Gen', 'Degree': 'Pos', 'Number': 'Plur'},
'ADJ__Case=Gen|Degree=Sup|Gender=Fem|Number=Sing': {POS: ADJ, 'Case': 'Gen', 'Degree': 'Sup', 'Gender': 'Fem', 'Number': 'Sing'},
'ADJ__Case=Gen|Degree=Sup|Gender=Masc|Number=Sing': {POS: ADJ, 'Case': 'Gen', 'Degree': 'Sup', 'Gender': 'Masc', 'Number': 'Sing'},
'ADJ__Case=Gen|Degree=Sup|Gender=Neut|Number=Sing': {POS: ADJ, 'Case': 'Gen', 'Degree': 'Sup', 'Gender': 'Neut', 'Number': 'Sing'},
'ADJ__Case=Gen|Degree=Sup|Number=Plur': {POS: ADJ, 'Case': 'Gen', 'Degree': 'Sup', 'Number': 'Plur'},
'ADJ__Case=Ins|Degree=Pos|Gender=Fem|Number=Sing': {POS: ADJ, 'Case': 'Ins', 'Degree': 'Pos', 'Gender': 'Fem', 'Number': 'Sing'},
'ADJ__Case=Ins|Degree=Pos|Gender=Masc|Number=Sing': {POS: ADJ, 'Case': 'Ins', 'Degree': 'Pos', 'Gender': 'Masc', 'Number': 'Sing'},
'ADJ__Case=Ins|Degree=Pos|Gender=Neut|Number=Sing': {POS: ADJ, 'Case': 'Ins', 'Degree': 'Pos', 'Gender': 'Neut', 'Number': 'Sing'},
'ADJ__Case=Ins|Degree=Pos|Number=Plur': {POS: ADJ, 'Case': 'Ins', 'Degree': 'Pos', 'Number': 'Plur'},
'ADJ__Case=Ins|Degree=Sup|Gender=Fem|Number=Sing': {POS: ADJ, 'Case': 'Ins', 'Degree': 'Sup', 'Gender': 'Fem', 'Number': 'Sing'},
'ADJ__Case=Ins|Degree=Sup|Gender=Masc|Number=Sing': {POS: ADJ, 'Case': 'Ins', 'Degree': 'Sup', 'Gender': 'Masc', 'Number': 'Sing'},
'ADJ__Case=Ins|Degree=Sup|Gender=Neut|Number=Sing': {POS: ADJ, 'Case': 'Ins', 'Degree': 'Sup', 'Gender': 'Neut', 'Number': 'Sing'},
'ADJ__Case=Ins|Degree=Sup|Number=Plur': {POS: ADJ, 'Case': 'Ins', 'Degree': 'Sup', 'Number': 'Plur'},
'ADJ__Case=Loc|Degree=Pos|Gender=Fem|Number=Sing': {POS: ADJ, 'Case': 'Loc', 'Degree': 'Pos', 'Gender': 'Fem', 'Number': 'Sing'},
'ADJ__Case=Loc|Degree=Pos|Gender=Masc|Number=Sing': {POS: ADJ, 'Case': 'Loc', 'Degree': 'Pos', 'Gender': 'Masc', 'Number': 'Sing'},
'ADJ__Case=Loc|Degree=Pos|Gender=Neut|Number=Sing': {POS: ADJ, 'Case': 'Loc', 'Degree': 'Pos', 'Gender': 'Neut', 'Number': 'Sing'},
'ADJ__Case=Loc|Degree=Pos|Number=Plur': {POS: ADJ, 'Case': 'Loc', 'Degree': 'Pos', 'Number': 'Plur'},
'ADJ__Case=Loc|Degree=Sup|Gender=Fem|Number=Sing': {POS: ADJ, 'Case': 'Loc', 'Degree': 'Sup', 'Gender': 'Fem', 'Number': 'Sing'},
'ADJ__Case=Loc|Degree=Sup|Gender=Masc|Number=Sing': {POS: ADJ, 'Case': 'Loc', 'Degree': 'Sup', 'Gender': 'Masc', 'Number': 'Sing'},
'ADJ__Case=Loc|Degree=Sup|Gender=Neut|Number=Sing': {POS: ADJ, 'Case': 'Loc', 'Degree': 'Sup', 'Gender': 'Neut', 'Number': 'Sing'},
'ADJ__Case=Loc|Degree=Sup|Number=Plur': {POS: ADJ, 'Case': 'Loc', 'Degree': 'Sup', 'Number': 'Plur'},
'ADJ__Case=Nom|Degree=Pos|Gender=Fem|Number=Sing': {POS: ADJ, 'Case': 'Nom', 'Degree': 'Pos', 'Gender': 'Fem', 'Number': 'Sing'},
'ADJ__Case=Nom|Degree=Pos|Gender=Masc|Number=Sing': {POS: ADJ, 'Case': 'Nom', 'Degree': 'Pos', 'Gender': 'Masc', 'Number': 'Sing'},
'ADJ__Case=Nom|Degree=Pos|Gender=Neut|Number=Sing': {POS: ADJ, 'Case': 'Nom', 'Degree': 'Pos', 'Gender': 'Neut', 'Number': 'Sing'},
'ADJ__Case=Nom|Degree=Pos|Number=Plur': {POS: ADJ, 'Case': 'Nom', 'Degree': 'Pos', 'Number': 'Plur'},
'ADJ__Case=Nom|Degree=Sup|Gender=Fem|Number=Sing': {POS: ADJ, 'Case': 'Nom', 'Degree': 'Sup', 'Gender': 'Fem', 'Number': 'Sing'},
'ADJ__Case=Nom|Degree=Sup|Gender=Masc|Number=Sing': {POS: ADJ, 'Case': 'Nom', 'Degree': 'Sup', 'Gender': 'Masc', 'Number': 'Sing'},
'ADJ__Case=Nom|Degree=Sup|Gender=Neut|Number=Sing': {POS: ADJ, 'Case': 'Nom', 'Degree': 'Sup', 'Gender': 'Neut', 'Number': 'Sing'},
'ADJ__Case=Nom|Degree=Sup|Number=Plur': {POS: ADJ, 'Case': 'Nom', 'Degree': 'Sup', 'Number': 'Plur'},
'ADJ__Degree=Cmp': {POS: ADJ, 'Degree': 'Cmp'},
'ADJ__Degree=Pos': {POS: ADJ, 'Degree': 'Pos'},
'ADJ__Degree=Pos|Gender=Fem|Number=Sing|Variant=Short': {POS: ADJ, 'Degree': 'Pos', 'Gender': 'Fem', 'Number': 'Sing', 'Variant': 'Short'},
'ADJ__Degree=Pos|Gender=Masc|Number=Sing|Variant=Short': {POS: ADJ, 'Degree': 'Pos', 'Gender': 'Masc', 'Number': 'Sing', 'Variant': 'Short'},
'ADJ__Degree=Pos|Gender=Neut|Number=Sing|Variant=Short': {POS: ADJ, 'Degree': 'Pos', 'Gender': 'Neut', 'Number': 'Sing', 'Variant': 'Short'},
'ADJ__Degree=Pos|Number=Plur|Variant=Short': {POS: ADJ, 'Degree': 'Pos', 'Number': 'Plur', 'Variant': 'Short'},
'ADJ__Foreign=Yes': {POS: ADJ, 'Foreign': 'Yes'},
'ADJ___': {POS: ADJ},
'ADP___': {POS: ADP},
'ADV__Degree=Cmp': {POS: ADV, 'Degree': 'Cmp'},
'ADV__Degree=Pos': {POS: ADV, 'Degree': 'Pos'},
'ADV__Polarity=Neg': {POS: ADV, 'Polarity': 'Neg'},
'AUX__Aspect=Imp|Case=Loc|Gender=Masc|Number=Sing|Tense=Past|VerbForm=Part|Voice=Act': {POS: AUX, 'Aspect': 'Imp', 'Case': 'Loc', 'Gender': 'Masc', 'Number': 'Sing', 'Tense': 'Past', 'VerbForm': 'Part', 'Voice': 'Act'},
'AUX__Aspect=Imp|Case=Nom|Gender=Masc|Number=Sing|Tense=Past|VerbForm=Part|Voice=Act': {POS: AUX, 'Aspect': 'Imp', 'Case': 'Nom', 'Gender': 'Masc', 'Number': 'Sing', 'Tense': 'Past', 'VerbForm': 'Part', 'Voice': 'Act'},
'AUX__Aspect=Imp|Case=Nom|Number=Plur|Tense=Past|VerbForm=Part|Voice=Act': {POS: AUX, 'Aspect': 'Imp', 'Case': 'Nom', 'Number': 'Plur', 'Tense': 'Past', 'VerbForm': 'Part', 'Voice': 'Act'},
'AUX__Aspect=Imp|Gender=Fem|Mood=Ind|Number=Sing|Tense=Past|VerbForm=Fin|Voice=Act': {POS: AUX, 'Aspect': 'Imp', 'Gender': 'Fem', 'Mood': 'Ind', 'Number': 'Sing', 'Tense': 'Past', 'VerbForm': 'Fin', 'Voice': 'Act'},
'AUX__Aspect=Imp|Gender=Masc|Mood=Ind|Number=Sing|Tense=Past|VerbForm=Fin|Voice=Act': {POS: AUX, 'Aspect': 'Imp', 'Gender': 'Masc', 'Mood': 'Ind', 'Number': 'Sing', 'Tense': 'Past', 'VerbForm': 'Fin', 'Voice': 'Act'},
'AUX__Aspect=Imp|Gender=Neut|Mood=Ind|Number=Sing|Tense=Past|VerbForm=Fin|Voice=Act': {POS: AUX, 'Aspect': 'Imp', 'Gender': 'Neut', 'Mood': 'Ind', 'Number': 'Sing', 'Tense': 'Past', 'VerbForm': 'Fin', 'Voice': 'Act'},
'AUX__Aspect=Imp|Mood=Imp|Number=Plur|Person=2|VerbForm=Fin|Voice=Act': {POS: AUX, 'Aspect': 'Imp', 'Mood': 'Imp', 'Number': 'Plur', 'Person': '2', 'VerbForm': 'Fin', 'Voice': 'Act'},
'AUX__Aspect=Imp|Mood=Imp|Number=Sing|Person=2|VerbForm=Fin|Voice=Act': {POS: AUX, 'Aspect': 'Imp', 'Mood': 'Imp', 'Number': 'Sing', 'Person': '2', 'VerbForm': 'Fin', 'Voice': 'Act'},
'AUX__Aspect=Imp|Mood=Ind|Number=Plur|Person=1|Tense=Pres|VerbForm=Fin|Voice=Act': {POS: AUX, 'Aspect': 'Imp', 'Mood': 'Ind', 'Number': 'Plur', 'Person': '1', 'Tense': 'Pres', 'VerbForm': 'Fin', 'Voice': 'Act'},
'AUX__Aspect=Imp|Mood=Ind|Number=Plur|Person=2|Tense=Pres|VerbForm=Fin|Voice=Act': {POS: AUX, 'Aspect': 'Imp', 'Mood': 'Ind', 'Number': 'Plur', 'Person': '2', 'Tense': 'Pres', 'VerbForm': 'Fin', 'Voice': 'Act'},
'AUX__Aspect=Imp|Mood=Ind|Number=Plur|Person=3|Tense=Pres|VerbForm=Fin|Voice=Act': {POS: AUX, 'Aspect': 'Imp', 'Mood': 'Ind', 'Number': 'Plur', 'Person': '3', 'Tense': 'Pres', 'VerbForm': 'Fin', 'Voice': 'Act'},
'AUX__Aspect=Imp|Mood=Ind|Number=Plur|Tense=Past|VerbForm=Fin|Voice=Act': {POS: AUX, 'Aspect': 'Imp', 'Mood': 'Ind', 'Number': 'Plur', 'Tense': 'Past', 'VerbForm': 'Fin', 'Voice': 'Act'},
'AUX__Aspect=Imp|Mood=Ind|Number=Sing|Person=1|Tense=Pres|VerbForm=Fin|Voice=Act': {POS: AUX, 'Aspect': 'Imp', 'Mood': 'Ind', 'Number': 'Sing', 'Person': '1', 'Tense': 'Pres', 'VerbForm': 'Fin', 'Voice': 'Act'},
'AUX__Aspect=Imp|Mood=Ind|Number=Sing|Person=2|Tense=Pres|VerbForm=Fin|Voice=Act': {POS: AUX, 'Aspect': 'Imp', 'Mood': 'Ind', 'Number': 'Sing', 'Person': '2', 'Tense': 'Pres', 'VerbForm': 'Fin', 'Voice': 'Act'},
'AUX__Aspect=Imp|Mood=Ind|Number=Sing|Person=3|Tense=Pres|VerbForm=Fin|Voice=Act': {POS: AUX, 'Aspect': 'Imp', 'Mood': 'Ind', 'Number': 'Sing', 'Person': '3', 'Tense': 'Pres', 'VerbForm': 'Fin', 'Voice': 'Act'},
'AUX__Aspect=Imp|Tense=Pres|VerbForm=Conv|Voice=Act': {POS: AUX, 'Aspect': 'Imp', 'Tense': 'Pres', 'VerbForm': 'Conv', 'Voice': 'Act'},
'AUX__Aspect=Imp|VerbForm=Inf|Voice=Act': {POS: AUX, 'Aspect': 'Imp', 'VerbForm': 'Inf', 'Voice': 'Act'},
'CCONJ___': {POS: CCONJ},
'DET__Animacy=Inan|Case=Acc|Gender=Masc|Number=Sing': {POS: DET, 'Animacy': 'Inan', 'Case': 'Acc', 'Gender': 'Masc', 'Number': 'Sing'},
'DET__Animacy=Inan|Case=Acc|Gender=Neut|Number=Sing': {POS: DET, 'Animacy': 'Inan', 'Case': 'Acc', 'Gender': 'Neut', 'Number': 'Sing'},
'DET__Animacy=Inan|Case=Gen|Gender=Fem|Number=Sing': {POS: DET, 'Animacy': 'Inan', 'Case': 'Gen', 'Gender': 'Fem', 'Number': 'Sing'},
'DET__Animacy=Inan|Case=Gen|Number=Plur': {POS: DET, 'Animacy': 'Inan', 'Case': 'Gen', 'Number': 'Plur'},
'DET__Case=Acc|Degree=Pos|Number=Plur': {POS: DET, 'Case': 'Acc', 'Degree': 'Pos', 'Number': 'Plur'},
'DET__Case=Acc|Gender=Fem|Number=Sing': {POS: DET, 'Case': 'Acc', 'Gender': 'Fem', 'Number': 'Sing'},
'DET__Case=Acc|Gender=Masc|Number=Sing': {POS: DET, 'Case': 'Acc', 'Gender': 'Masc', 'Number': 'Sing'},
'DET__Case=Acc|Gender=Neut|Number=Sing': {POS: DET, 'Case': 'Acc', 'Gender': 'Neut', 'Number': 'Sing'},
'DET__Case=Acc|Number=Plur': {POS: DET, 'Case': 'Acc', 'Number': 'Plur'},
'DET__Case=Dat|Gender=Fem|Number=Sing': {POS: DET, 'Case': 'Dat', 'Gender': 'Fem', 'Number': 'Sing'},
'DET__Case=Dat|Gender=Masc|Number=Plur': {POS: DET, 'Case': 'Dat', 'Gender': 'Masc', 'Number': 'Plur'},
'DET__Case=Dat|Gender=Masc|Number=Sing': {POS: DET, 'Case': 'Dat', 'Gender': 'Masc', 'Number': 'Sing'},
'DET__Case=Dat|Gender=Neut|Number=Sing': {POS: DET, 'Case': 'Dat', 'Gender': 'Neut', 'Number': 'Sing'},
'DET__Case=Dat|Number=Plur': {POS: DET, 'Case': 'Dat', 'Number': 'Plur'},
'DET__Case=Gen|Gender=Fem|Number=Sing': {POS: DET, 'Case': 'Gen', 'Gender': 'Fem', 'Number': 'Sing'},
'DET__Case=Gen|Gender=Masc|Number=Sing': {POS: DET, 'Case': 'Gen', 'Gender': 'Masc', 'Number': 'Sing'},
'DET__Case=Gen|Gender=Neut|Number=Sing': {POS: DET, 'Case': 'Gen', 'Gender': 'Neut', 'Number': 'Sing'},
'DET__Case=Gen|Number=Plur': {POS: DET, 'Case': 'Gen', 'Number': 'Plur'},
'DET__Case=Ins|Gender=Fem|Number=Sing': {POS: DET, 'Case': 'Ins', 'Gender': 'Fem', 'Number': 'Sing'},
'DET__Case=Ins|Gender=Masc|Number=Sing': {POS: DET, 'Case': 'Ins', 'Gender': 'Masc', 'Number': 'Sing'},
'DET__Case=Ins|Gender=Neut|Number=Sing': {POS: DET, 'Case': 'Ins', 'Gender': 'Neut', 'Number': 'Sing'},
'DET__Case=Ins|Number=Plur': {POS: DET, 'Case': 'Ins', 'Number': 'Plur'},
'DET__Case=Loc|Gender=Fem|Number=Sing': {POS: DET, 'Case': 'Loc', 'Gender': 'Fem', 'Number': 'Sing'},
'DET__Case=Loc|Gender=Masc|Number=Sing': {POS: DET, 'Case': 'Loc', 'Gender': 'Masc', 'Number': 'Sing'},
'DET__Case=Loc|Gender=Neut|Number=Sing': {POS: DET, 'Case': 'Loc', 'Gender': 'Neut', 'Number': 'Sing'},
'DET__Case=Loc|Number=Plur': {POS: DET, 'Case': 'Loc', 'Number': 'Plur'},
'DET__Case=Nom|Gender=Fem|Number=Sing': {POS: DET, 'Case': 'Nom', 'Gender': 'Fem', 'Number': 'Sing'},
'DET__Case=Nom|Gender=Masc|Number=Plur': {POS: DET, 'Case': 'Nom', 'Gender': 'Masc', 'Number': 'Plur'},
'DET__Case=Nom|Gender=Masc|Number=Sing': {POS: DET, 'Case': 'Nom', 'Gender': 'Masc', 'Number': 'Sing'},
'DET__Case=Nom|Gender=Neut|Number=Sing': {POS: DET, 'Case': 'Nom', 'Gender': 'Neut', 'Number': 'Sing'},
'DET__Case=Nom|Number=Plur': {POS: DET, 'Case': 'Nom', 'Number': 'Plur'},
'DET__Gender=Masc|Number=Sing': {POS: DET, 'Gender': 'Masc', 'Number': 'Sing'},
'INTJ___': {POS: INTJ},
'NOUN__Animacy=Anim|Case=Acc|Gender=Fem|Number=Plur': {POS: NOUN, 'Animacy': 'Anim', 'Case': 'Acc', 'Gender': 'Fem', 'Number': 'Plur'},
'NOUN__Animacy=Anim|Case=Acc|Gender=Fem|Number=Sing': {POS: NOUN, 'Animacy': 'Anim', 'Case': 'Acc', 'Gender': 'Fem', 'Number': 'Sing'},
'NOUN__Animacy=Anim|Case=Acc|Gender=Masc|Number=Plur': {POS: NOUN, 'Animacy': 'Anim', 'Case': 'Acc', 'Gender': 'Masc', 'Number': 'Plur'},
'NOUN__Animacy=Anim|Case=Acc|Gender=Masc|Number=Sing': {POS: NOUN, 'Animacy': 'Anim', 'Case': 'Acc', 'Gender': 'Masc', 'Number': 'Sing'},
'NOUN__Animacy=Anim|Case=Acc|Gender=Neut|Number=Plur': {POS: NOUN, 'Animacy': 'Anim', 'Case': 'Acc', 'Gender': 'Neut', 'Number': 'Plur'},
'NOUN__Animacy=Anim|Case=Acc|Gender=Neut|Number=Sing': {POS: NOUN, 'Animacy': 'Anim', 'Case': 'Acc', 'Gender': 'Neut', 'Number': 'Sing'},
'NOUN__Animacy=Anim|Case=Acc|Number=Plur': {POS: NOUN, 'Animacy': 'Anim', 'Case': 'Acc', 'Number': 'Plur'},
'NOUN__Animacy=Anim|Case=Dat|Gender=Fem|Number=Plur': {POS: NOUN, 'Animacy': 'Anim', 'Case': 'Dat', 'Gender': 'Fem', 'Number': 'Plur'},
'NOUN__Animacy=Anim|Case=Dat|Gender=Fem|Number=Sing': {POS: NOUN, 'Animacy': 'Anim', 'Case': 'Dat', 'Gender': 'Fem', 'Number': 'Sing'},
'NOUN__Animacy=Anim|Case=Dat|Gender=Masc|Number=Plur': {POS: NOUN, 'Animacy': 'Anim', 'Case': 'Dat', 'Gender': 'Masc', 'Number': 'Plur'},
'NOUN__Animacy=Anim|Case=Dat|Gender=Masc|Number=Sing': {POS: NOUN, 'Animacy': 'Anim', 'Case': 'Dat', 'Gender': 'Masc', 'Number': 'Sing'},
'NOUN__Animacy=Anim|Case=Dat|Gender=Neut|Number=Plur': {POS: NOUN, 'Animacy': 'Anim', 'Case': 'Dat', 'Gender': 'Neut', 'Number': 'Plur'},
'NOUN__Animacy=Anim|Case=Dat|Gender=Neut|Number=Sing': {POS: NOUN, 'Animacy': 'Anim', 'Case': 'Dat', 'Gender': 'Neut', 'Number': 'Sing'},
'NOUN__Animacy=Anim|Case=Dat|Number=Plur': {POS: NOUN, 'Animacy': 'Anim', 'Case': 'Dat', 'Number': 'Plur'},
'NOUN__Animacy=Anim|Case=Gen|Gender=Fem|Number=Plur': {POS: NOUN, 'Animacy': 'Anim', 'Case': 'Gen', 'Gender': 'Fem', 'Number': 'Plur'},
'NOUN__Animacy=Anim|Case=Gen|Gender=Fem|Number=Sing': {POS: NOUN, 'Animacy': 'Anim', 'Case': 'Gen', 'Gender': 'Fem', 'Number': 'Sing'},
'NOUN__Animacy=Anim|Case=Gen|Gender=Masc|Number=Plur': {POS: NOUN, 'Animacy': 'Anim', 'Case': 'Gen', 'Gender': 'Masc', 'Number': 'Plur'},
'NOUN__Animacy=Anim|Case=Gen|Gender=Masc|Number=Sing': {POS: NOUN, 'Animacy': 'Anim', 'Case': 'Gen', 'Gender': 'Masc', 'Number': 'Sing'},
'NOUN__Animacy=Anim|Case=Gen|Gender=Neut|Number=Plur': {POS: NOUN, 'Animacy': 'Anim', 'Case': 'Gen', 'Gender': 'Neut', 'Number': 'Plur'},
'NOUN__Animacy=Anim|Case=Gen|Gender=Neut|Number=Sing': {POS: NOUN, 'Animacy': 'Anim', 'Case': 'Gen', 'Gender': 'Neut', 'Number': 'Sing'},
'NOUN__Animacy=Anim|Case=Gen|Number=Plur': {POS: NOUN, 'Animacy': 'Anim', 'Case': 'Gen', 'Number': 'Plur'},
'NOUN__Animacy=Anim|Case=Ins|Gender=Fem|Number=Plur': {POS: NOUN, 'Animacy': 'Anim', 'Case': 'Ins', 'Gender': 'Fem', 'Number': 'Plur'},
'NOUN__Animacy=Anim|Case=Ins|Gender=Fem|Number=Sing': {POS: NOUN, 'Animacy': 'Anim', 'Case': 'Ins', 'Gender': 'Fem', 'Number': 'Sing'},
'NOUN__Animacy=Anim|Case=Ins|Gender=Masc|Number=Plur': {POS: NOUN, 'Animacy': 'Anim', 'Case': 'Ins', 'Gender': 'Masc', 'Number': 'Plur'},
'NOUN__Animacy=Anim|Case=Ins|Gender=Masc|Number=Sing': {POS: NOUN, 'Animacy': 'Anim', 'Case': 'Ins', 'Gender': 'Masc', 'Number': 'Sing'},
'NOUN__Animacy=Anim|Case=Ins|Gender=Neut|Number=Plur': {POS: NOUN, 'Animacy': 'Anim', 'Case': 'Ins', 'Gender': 'Neut', 'Number': 'Plur'},
'NOUN__Animacy=Anim|Case=Ins|Gender=Neut|Number=Sing': {POS: NOUN, 'Animacy': 'Anim', 'Case': 'Ins', 'Gender': 'Neut', 'Number': 'Sing'},
'NOUN__Animacy=Anim|Case=Ins|Number=Plur': {POS: NOUN, 'Animacy': 'Anim', 'Case': 'Ins', 'Number': 'Plur'},
'NOUN__Animacy=Anim|Case=Loc|Gender=Fem|Number=Plur': {POS: NOUN, 'Animacy': 'Anim', 'Case': 'Loc', 'Gender': 'Fem', 'Number': 'Plur'},
'NOUN__Animacy=Anim|Case=Loc|Gender=Fem|Number=Sing': {POS: NOUN, 'Animacy': 'Anim', 'Case': 'Loc', 'Gender': 'Fem', 'Number': 'Sing'},
'NOUN__Animacy=Anim|Case=Loc|Gender=Masc|Number=Plur': {POS: NOUN, 'Animacy': 'Anim', 'Case': 'Loc', 'Gender': 'Masc', 'Number': 'Plur'},
'NOUN__Animacy=Anim|Case=Loc|Gender=Masc|Number=Sing': {POS: NOUN, 'Animacy': 'Anim', 'Case': 'Loc', 'Gender': 'Masc', 'Number': 'Sing'},
'NOUN__Animacy=Anim|Case=Loc|Gender=Neut|Number=Plur': {POS: NOUN, 'Animacy': 'Anim', 'Case': 'Loc', 'Gender': 'Neut', 'Number': 'Plur'},
'NOUN__Animacy=Anim|Case=Loc|Gender=Neut|Number=Sing': {POS: NOUN, 'Animacy': 'Anim', 'Case': 'Loc', 'Gender': 'Neut', 'Number': 'Sing'},
'NOUN__Animacy=Anim|Case=Loc|Number=Plur': {POS: NOUN, 'Animacy': 'Anim', 'Case': 'Loc', 'Number': 'Plur'},
'NOUN__Animacy=Anim|Case=Nom|Gender=Fem|Number=Plur': {POS: NOUN, 'Animacy': 'Anim', 'Case': 'Nom', 'Gender': 'Fem', 'Number': 'Plur'},
'NOUN__Animacy=Anim|Case=Nom|Gender=Fem|Number=Sing': {POS: NOUN, 'Animacy': 'Anim', 'Case': 'Nom', 'Gender': 'Fem', 'Number': 'Sing'},
'NOUN__Animacy=Anim|Case=Nom|Gender=Masc|Number=Plur': {POS: NOUN, 'Animacy': 'Anim', 'Case': 'Nom', 'Gender': 'Masc', 'Number': 'Plur'},
'NOUN__Animacy=Anim|Case=Nom|Gender=Masc|Number=Sing': {POS: NOUN, 'Animacy': 'Anim', 'Case': 'Nom', 'Gender': 'Masc', 'Number': 'Sing'},
'NOUN__Animacy=Anim|Case=Nom|Gender=Neut|Number=Plur': {POS: NOUN, 'Animacy': 'Anim', 'Case': 'Nom', 'Gender': 'Neut', 'Number': 'Plur'},
'NOUN__Animacy=Anim|Case=Nom|Gender=Neut|Number=Sing': {POS: NOUN, 'Animacy': 'Anim', 'Case': 'Nom', 'Gender': 'Neut', 'Number': 'Sing'},
'NOUN__Animacy=Anim|Case=Nom|Number=Plur': {POS: NOUN, 'Animacy': 'Anim', 'Case': 'Nom', 'Number': 'Plur'},
'NOUN__Animacy=Anim|Case=Voc|Gender=Masc|Number=Sing': {POS: NOUN, 'Animacy': 'Anim', 'Case': 'Voc', 'Gender': 'Masc', 'Number': 'Sing'},
'NOUN__Animacy=Inan|Case=Acc|Gender=Fem|Number=Plur': {POS: NOUN, 'Animacy': 'Inan', 'Case': 'Acc', 'Gender': 'Fem', 'Number': 'Plur'},
'NOUN__Animacy=Inan|Case=Acc|Gender=Fem|Number=Sing': {POS: NOUN, 'Animacy': 'Inan', 'Case': 'Acc', 'Gender': 'Fem', 'Number': 'Sing'},
'NOUN__Animacy=Inan|Case=Acc|Gender=Masc|Number=Plur': {POS: NOUN, 'Animacy': 'Inan', 'Case': 'Acc', 'Gender': 'Masc', 'Number': 'Plur'},
'NOUN__Animacy=Inan|Case=Acc|Gender=Masc|Number=Sing': {POS: NOUN, 'Animacy': 'Inan', 'Case': 'Acc', 'Gender': 'Masc', 'Number': 'Sing'},
'NOUN__Animacy=Inan|Case=Acc|Gender=Neut|Number=Plur': {POS: NOUN, 'Animacy': 'Inan', 'Case': 'Acc', 'Gender': 'Neut', 'Number': 'Plur'},
'NOUN__Animacy=Inan|Case=Acc|Gender=Neut|Number=Sing': {POS: NOUN, 'Animacy': 'Inan', 'Case': 'Acc', 'Gender': 'Neut', 'Number': 'Sing'},
'NOUN__Animacy=Inan|Case=Acc|Number=Plur': {POS: NOUN, 'Animacy': 'Inan', 'Case': 'Acc', 'Number': 'Plur'},
'NOUN__Animacy=Inan|Case=Dat|Gender=Fem|Number=Plur': {POS: NOUN, 'Animacy': 'Inan', 'Case': 'Dat', 'Gender': 'Fem', 'Number': 'Plur'},
'NOUN__Animacy=Inan|Case=Dat|Gender=Fem|Number=Sing': {POS: NOUN, 'Animacy': 'Inan', 'Case': 'Dat', 'Gender': 'Fem', 'Number': 'Sing'},
'NOUN__Animacy=Inan|Case=Dat|Gender=Masc|Number=Plur': {POS: NOUN, 'Animacy': 'Inan', 'Case': 'Dat', 'Gender': 'Masc', 'Number': 'Plur'},
'NOUN__Animacy=Inan|Case=Dat|Gender=Masc|Number=Sing': {POS: NOUN, 'Animacy': 'Inan', 'Case': 'Dat', 'Gender': 'Masc', 'Number': 'Sing'},
'NOUN__Animacy=Inan|Case=Dat|Gender=Neut|Number=Plur': {POS: NOUN, 'Animacy': 'Inan', 'Case': 'Dat', 'Gender': 'Neut', 'Number': 'Plur'},
'NOUN__Animacy=Inan|Case=Dat|Gender=Neut|Number=Sing': {POS: NOUN, 'Animacy': 'Inan', 'Case': 'Dat', 'Gender': 'Neut', 'Number': 'Sing'},
'NOUN__Animacy=Inan|Case=Dat|Number=Plur': {POS: NOUN, 'Animacy': 'Inan', 'Case': 'Dat', 'Number': 'Plur'},
'NOUN__Animacy=Inan|Case=Gen|Gender=Fem|Number=Plur': {POS: NOUN, 'Animacy': 'Inan', 'Case': 'Gen', 'Gender': 'Fem', 'Number': 'Plur'},
'NOUN__Animacy=Inan|Case=Gen|Gender=Fem|Number=Sing': {POS: NOUN, 'Animacy': 'Inan', 'Case': 'Gen', 'Gender': 'Fem', 'Number': 'Sing'},
'NOUN__Animacy=Inan|Case=Gen|Gender=Masc|Number=Plur': {POS: NOUN, 'Animacy': 'Inan', 'Case': 'Gen', 'Gender': 'Masc', 'Number': 'Plur'},
'NOUN__Animacy=Inan|Case=Gen|Gender=Masc|Number=Sing': {POS: NOUN, 'Animacy': 'Inan', 'Case': 'Gen', 'Gender': 'Masc', 'Number': 'Sing'},
'NOUN__Animacy=Inan|Case=Gen|Gender=Neut|Number=Plur': {POS: NOUN, 'Animacy': 'Inan', 'Case': 'Gen', 'Gender': 'Neut', 'Number': 'Plur'},
'NOUN__Animacy=Inan|Case=Gen|Gender=Neut|Number=Sing': {POS: NOUN, 'Animacy': 'Inan', 'Case': 'Gen', 'Gender': 'Neut', 'Number': 'Sing'},
'NOUN__Animacy=Inan|Case=Gen|Number=Plur': {POS: NOUN, 'Animacy': 'Inan', 'Case': 'Gen', 'Number': 'Plur'},
'NOUN__Animacy=Inan|Case=Ins|Gender=Fem|Number=Plur': {POS: NOUN, 'Animacy': 'Inan', 'Case': 'Ins', 'Gender': 'Fem', 'Number': 'Plur'},
'NOUN__Animacy=Inan|Case=Ins|Gender=Fem|Number=Sing': {POS: NOUN, 'Animacy': 'Inan', 'Case': 'Ins', 'Gender': 'Fem', 'Number': 'Sing'},
'NOUN__Animacy=Inan|Case=Ins|Gender=Masc|Number=Plur': {POS: NOUN, 'Animacy': 'Inan', 'Case': 'Ins', 'Gender': 'Masc', 'Number': 'Plur'},
'NOUN__Animacy=Inan|Case=Ins|Gender=Masc|Number=Sing': {POS: NOUN, 'Animacy': 'Inan', 'Case': 'Ins', 'Gender': 'Masc', 'Number': 'Sing'},
'NOUN__Animacy=Inan|Case=Ins|Gender=Neut|Number=Plur': {POS: NOUN, 'Animacy': 'Inan', 'Case': 'Ins', 'Gender': 'Neut', 'Number': 'Plur'},
'NOUN__Animacy=Inan|Case=Ins|Gender=Neut|Number=Sing': {POS: NOUN, 'Animacy': 'Inan', 'Case': 'Ins', 'Gender': 'Neut', 'Number': 'Sing'},
'NOUN__Animacy=Inan|Case=Ins|Number=Plur': {POS: NOUN, 'Animacy': 'Inan', 'Case': 'Ins', 'Number': 'Plur'},
'NOUN__Animacy=Inan|Case=Loc|Gender=Fem|Number=Plur': {POS: NOUN, 'Animacy': 'Inan', 'Case': 'Loc', 'Gender': 'Fem', 'Number': 'Plur'},
'NOUN__Animacy=Inan|Case=Loc|Gender=Fem|Number=Sing': {POS: NOUN, 'Animacy': 'Inan', 'Case': 'Loc', 'Gender': 'Fem', 'Number': 'Sing'},
'NOUN__Animacy=Inan|Case=Loc|Gender=Masc|Number=Plur': {POS: NOUN, 'Animacy': 'Inan', 'Case': 'Loc', 'Gender': 'Masc', 'Number': 'Plur'},
'NOUN__Animacy=Inan|Case=Loc|Gender=Masc|Number=Sing': {POS: NOUN, 'Animacy': 'Inan', 'Case': 'Loc', 'Gender': 'Masc', 'Number': 'Sing'},
'NOUN__Animacy=Inan|Case=Loc|Gender=Neut|Number=Plur': {POS: NOUN, 'Animacy': 'Inan', 'Case': 'Loc', 'Gender': 'Neut', 'Number': 'Plur'},
'NOUN__Animacy=Inan|Case=Loc|Gender=Neut|Number=Sing': {POS: NOUN, 'Animacy': 'Inan', 'Case': 'Loc', 'Gender': 'Neut', 'Number': 'Sing'},
'NOUN__Animacy=Inan|Case=Loc|Number=Plur': {POS: NOUN, 'Animacy': 'Inan', 'Case': 'Loc', 'Number': 'Plur'},
'NOUN__Animacy=Inan|Case=Nom|Gender=Fem|Number=Plur': {POS: NOUN, 'Animacy': 'Inan', 'Case': 'Nom', 'Gender': 'Fem', 'Number': 'Plur'},
'NOUN__Animacy=Inan|Case=Nom|Gender=Fem|Number=Sing': {POS: NOUN, 'Animacy': 'Inan', 'Case': 'Nom', 'Gender': 'Fem', 'Number': 'Sing'},
'NOUN__Animacy=Inan|Case=Nom|Gender=Masc|Number=Plur': {POS: NOUN, 'Animacy': 'Inan', 'Case': 'Nom', 'Gender': 'Masc', 'Number': 'Plur'},
'NOUN__Animacy=Inan|Case=Nom|Gender=Masc|Number=Sing': {POS: NOUN, 'Animacy': 'Inan', 'Case': 'Nom', 'Gender': 'Masc', 'Number': 'Sing'},
'NOUN__Animacy=Inan|Case=Nom|Gender=Neut|Number=Plur': {POS: NOUN, 'Animacy': 'Inan', 'Case': 'Nom', 'Gender': 'Neut', 'Number': 'Plur'},
'NOUN__Animacy=Inan|Case=Nom|Gender=Neut|Number=Sing': {POS: NOUN, 'Animacy': 'Inan', 'Case': 'Nom', 'Gender': 'Neut', 'Number': 'Sing'},
'NOUN__Animacy=Inan|Case=Nom|Number=Plur': {POS: NOUN, 'Animacy': 'Inan', 'Case': 'Nom', 'Number': 'Plur'},
'NOUN__Animacy=Inan|Case=Par|Gender=Masc|Number=Sing': {POS: NOUN, 'Animacy': 'Inan', 'Case': 'Par', 'Gender': 'Masc', 'Number': 'Sing'},
'NOUN__Animacy=Inan|Gender=Fem': {POS: NOUN, 'Animacy': 'Inan', 'Gender': 'Fem'},
'NOUN__Animacy=Inan|Gender=Masc': {POS: NOUN, 'Animacy': 'Inan', 'Gender': 'Masc'},
'NOUN__Animacy=Inan|Gender=Neut': {POS: NOUN, 'Animacy': 'Inan', 'Gender': 'Neut'},
'NOUN__Case=Gen|Degree=Pos|Gender=Fem|Number=Sing': {POS: NOUN, 'Case': 'Gen', 'Degree': 'Pos', 'Gender': 'Fem', 'Number': 'Sing'},
'NOUN__Foreign=Yes': {POS: NOUN, 'Foreign': 'Yes'},
'NOUN___': {POS: NOUN},
'NUM__Animacy=Anim|Case=Acc': {POS: NUM, 'Animacy': 'Anim', 'Case': 'Acc'},
'NUM__Animacy=Anim|Case=Acc|Gender=Fem': {POS: NUM, 'Animacy': 'Anim', 'Case': 'Acc', 'Gender': 'Fem'},
'NUM__Animacy=Anim|Case=Acc|Gender=Masc': {POS: NUM, 'Animacy': 'Anim', 'Case': 'Acc', 'Gender': 'Masc'},
'NUM__Animacy=Inan|Case=Acc': {POS: NUM, 'Animacy': 'Inan', 'Case': 'Acc'},
'NUM__Animacy=Inan|Case=Acc|Gender=Fem': {POS: NUM, 'Animacy': 'Inan', 'Case': 'Acc', 'Gender': 'Fem'},
'NUM__Animacy=Inan|Case=Acc|Gender=Masc': {POS: NUM, 'Animacy': 'Inan', 'Case': 'Acc', 'Gender': 'Masc'},
'NUM__Case=Acc': {POS: NUM, 'Case': 'Acc'},
'NUM__Case=Acc|Gender=Fem': {POS: NUM, 'Case': 'Acc', 'Gender': 'Fem'},
'NUM__Case=Acc|Gender=Masc': {POS: NUM, 'Case': 'Acc', 'Gender': 'Masc'},
'NUM__Case=Acc|Gender=Neut': {POS: NUM, 'Case': 'Acc', 'Gender': 'Neut'},
'NUM__Case=Dat': {POS: NUM, 'Case': 'Dat'},
'NUM__Case=Dat|Gender=Fem': {POS: NUM, 'Case': 'Dat', 'Gender': 'Fem'},
'NUM__Case=Dat|Gender=Masc': {POS: NUM, 'Case': 'Dat', 'Gender': 'Masc'},
'NUM__Case=Dat|Gender=Neut': {POS: NUM, 'Case': 'Dat', 'Gender': 'Neut'},
'NUM__Case=Gen': {POS: NUM, 'Case': 'Gen'},
'NUM__Case=Gen|Gender=Fem': {POS: NUM, 'Case': 'Gen', 'Gender': 'Fem'},
'NUM__Case=Gen|Gender=Masc': {POS: NUM, 'Case': 'Gen', 'Gender': 'Masc'},
'NUM__Case=Gen|Gender=Neut': {POS: NUM, 'Case': 'Gen', 'Gender': 'Neut'},
'NUM__Case=Ins': {POS: NUM, 'Case': 'Ins'},
'NUM__Case=Ins|Gender=Fem': {POS: NUM, 'Case': 'Ins', 'Gender': 'Fem'},
'NUM__Case=Ins|Gender=Masc': {POS: NUM, 'Case': 'Ins', 'Gender': 'Masc'},
'NUM__Case=Ins|Gender=Neut': {POS: NUM, 'Case': 'Ins', 'Gender': 'Neut'},
'NUM__Case=Loc': {POS: NUM, 'Case': 'Loc'},
'NUM__Case=Loc|Gender=Fem': {POS: NUM, 'Case': 'Loc', 'Gender': 'Fem'},
'NUM__Case=Loc|Gender=Masc': {POS: NUM, 'Case': 'Loc', 'Gender': 'Masc'},
'NUM__Case=Loc|Gender=Neut': {POS: NUM, 'Case': 'Loc', 'Gender': 'Neut'},
'NUM__Case=Nom': {POS: NUM, 'Case': 'Nom'},
'NUM__Case=Nom|Gender=Fem': {POS: NUM, 'Case': 'Nom', 'Gender': 'Fem'},
'NUM__Case=Nom|Gender=Masc': {POS: NUM, 'Case': 'Nom', 'Gender': 'Masc'},
'NUM__Case=Nom|Gender=Neut': {POS: NUM, 'Case': 'Nom', 'Gender': 'Neut'},
'NUM___': {POS: NUM},
'PART__Mood=Cnd': {POS: PART, 'Mood': 'Cnd'},
'PART__Polarity=Neg': {POS: PART, 'Polarity': 'Neg'},
'PART___': {POS: PART},
'PRON__Animacy=Anim|Case=Acc|Gender=Masc|Number=Plur': {POS: PRON, 'Animacy': 'Anim', 'Case': 'Acc', 'Gender': 'Masc', 'Number': 'Plur'},
'PRON__Animacy=Anim|Case=Acc|Number=Plur': {POS: PRON, 'Animacy': 'Anim', 'Case': 'Acc', 'Number': 'Plur'},
'PRON__Animacy=Anim|Case=Dat|Gender=Masc|Number=Sing': {POS: PRON, 'Animacy': 'Anim', 'Case': 'Dat', 'Gender': 'Masc', 'Number': 'Sing'},
'PRON__Animacy=Anim|Case=Dat|Number=Plur': {POS: PRON, 'Animacy': 'Anim', 'Case': 'Dat', 'Number': 'Plur'},
'PRON__Animacy=Anim|Case=Gen|Number=Plur': {POS: PRON, 'Animacy': 'Anim', 'Case': 'Gen', 'Number': 'Plur'},
'PRON__Animacy=Anim|Case=Ins|Gender=Masc|Number=Sing': {POS: PRON, 'Animacy': 'Anim', 'Case': 'Ins', 'Gender': 'Masc', 'Number': 'Sing'},
'PRON__Animacy=Anim|Case=Ins|Number=Plur': {POS: PRON, 'Animacy': 'Anim', 'Case': 'Ins', 'Number': 'Plur'},
'PRON__Animacy=Anim|Case=Loc|Number=Plur': {POS: PRON, 'Animacy': 'Anim', 'Case': 'Loc', 'Number': 'Plur'},
'PRON__Animacy=Anim|Case=Nom|Gender=Masc|Number=Plur': {POS: PRON, 'Animacy': 'Anim', 'Case': 'Nom', 'Gender': 'Masc', 'Number': 'Plur'},
'PRON__Animacy=Anim|Case=Nom|Number=Plur': {POS: PRON, 'Animacy': 'Anim', 'Case': 'Nom', 'Number': 'Plur'},
'PRON__Animacy=Anim|Gender=Masc|Number=Plur': {POS: PRON, 'Animacy': 'Anim', 'Gender': 'Masc', 'Number': 'Plur'},
'PRON__Animacy=Inan|Case=Acc|Gender=Masc|Number=Sing': {POS: PRON, 'Animacy': 'Inan', 'Case': 'Acc', 'Gender': 'Masc', 'Number': 'Sing'},
'PRON__Animacy=Inan|Case=Acc|Gender=Neut|Number=Sing': {POS: PRON, 'Animacy': 'Inan', 'Case': 'Acc', 'Gender': 'Neut', 'Number': 'Sing'},
'PRON__Animacy=Inan|Case=Dat|Gender=Neut|Number=Sing': {POS: PRON, 'Animacy': 'Inan', 'Case': 'Dat', 'Gender': 'Neut', 'Number': 'Sing'},
'PRON__Animacy=Inan|Case=Gen|Gender=Masc|Number=Sing': {POS: PRON, 'Animacy': 'Inan', 'Case': 'Gen', 'Gender': 'Masc', 'Number': 'Sing'},
'PRON__Animacy=Inan|Case=Gen|Gender=Neut|Number=Sing': {POS: PRON, 'Animacy': 'Inan', 'Case': 'Gen', 'Gender': 'Neut', 'Number': 'Sing'},
'PRON__Animacy=Inan|Case=Ins|Gender=Fem|Number=Sing': {POS: PRON, 'Animacy': 'Inan', 'Case': 'Ins', 'Gender': 'Fem', 'Number': 'Sing'},
'PRON__Animacy=Inan|Case=Ins|Gender=Neut|Number=Sing': {POS: PRON, 'Animacy': 'Inan', 'Case': 'Ins', 'Gender': 'Neut', 'Number': 'Sing'},
'PRON__Animacy=Inan|Case=Loc|Gender=Neut|Number=Sing': {POS: PRON, 'Animacy': 'Inan', 'Case': 'Loc', 'Gender': 'Neut', 'Number': 'Sing'},
'PRON__Animacy=Inan|Case=Nom|Gender=Neut|Number=Sing': {POS: PRON, 'Animacy': 'Inan', 'Case': 'Nom', 'Gender': 'Neut', 'Number': 'Sing'},
'PRON__Animacy=Inan|Gender=Neut|Number=Sing': {POS: PRON, 'Animacy': 'Inan', 'Gender': 'Neut', 'Number': 'Sing'},
'PRON__Case=Acc': {POS: PRON, 'Case': 'Acc'},
'PRON__Case=Acc|Gender=Fem|Number=Sing|Person=3': {POS: PRON, 'Case': 'Acc', 'Gender': 'Fem', 'Number': 'Sing', 'Person': '3'},
'PRON__Case=Acc|Gender=Masc|Number=Sing|Person=3': {POS: PRON, 'Case': 'Acc', 'Gender': 'Masc', 'Number': 'Sing', 'Person': '3'},
'PRON__Case=Acc|Gender=Neut|Number=Sing|Person=3': {POS: PRON, 'Case': 'Acc', 'Gender': 'Neut', 'Number': 'Sing', 'Person': '3'},
'PRON__Case=Acc|Number=Plur|Person=1': {POS: PRON, 'Case': 'Acc', 'Number': 'Plur', 'Person': '1'},
'PRON__Case=Acc|Number=Plur|Person=2': {POS: PRON, 'Case': 'Acc', 'Number': 'Plur', 'Person': '2'},
'PRON__Case=Acc|Number=Plur|Person=3': {POS: PRON, 'Case': 'Acc', 'Number': 'Plur', 'Person': '3'},
'PRON__Case=Acc|Number=Sing|Person=1': {POS: PRON, 'Case': 'Acc', 'Number': 'Sing', 'Person': '1'},
'PRON__Case=Acc|Number=Sing|Person=2': {POS: PRON, 'Case': 'Acc', 'Number': 'Sing', 'Person': '2'},
'PRON__Case=Dat': {POS: PRON, 'Case': 'Dat'},
'PRON__Case=Dat|Gender=Fem|Number=Sing|Person=3': {POS: PRON, 'Case': 'Dat', 'Gender': 'Fem', 'Number': 'Sing', 'Person': '3'},
'PRON__Case=Dat|Gender=Masc|Number=Sing|Person=3': {POS: PRON, 'Case': 'Dat', 'Gender': 'Masc', 'Number': 'Sing', 'Person': '3'},
'PRON__Case=Dat|Gender=Neut|Number=Sing|Person=3': {POS: PRON, 'Case': 'Dat', 'Gender': 'Neut', 'Number': 'Sing', 'Person': '3'},
'PRON__Case=Dat|Number=Plur|Person=1': {POS: PRON, 'Case': 'Dat', 'Number': 'Plur', 'Person': '1'},
'PRON__Case=Dat|Number=Plur|Person=2': {POS: PRON, 'Case': 'Dat', 'Number': 'Plur', 'Person': '2'},
'PRON__Case=Dat|Number=Plur|Person=3': {POS: PRON, 'Case': 'Dat', 'Number': 'Plur', 'Person': '3'},
'PRON__Case=Dat|Number=Sing|Person=1': {POS: PRON, 'Case': 'Dat', 'Number': 'Sing', 'Person': '1'},
'PRON__Case=Dat|Number=Sing|Person=2': {POS: PRON, 'Case': 'Dat', 'Number': 'Sing', 'Person': '2'},
'PRON__Case=Gen': {POS: PRON, 'Case': 'Gen'},
'PRON__Case=Gen|Gender=Fem|Number=Sing|Person=3': {POS: PRON, 'Case': 'Gen', 'Gender': 'Fem', 'Number': 'Sing', 'Person': '3'},
'PRON__Case=Gen|Gender=Masc|Number=Sing|Person=3': {POS: PRON, 'Case': 'Gen', 'Gender': 'Masc', 'Number': 'Sing', 'Person': '3'},
'PRON__Case=Gen|Gender=Neut|Number=Sing|Person=3': {POS: PRON, 'Case': 'Gen', 'Gender': 'Neut', 'Number': 'Sing', 'Person': '3'},
'PRON__Case=Gen|Number=Plur|Person=1': {POS: PRON, 'Case': 'Gen', 'Number': 'Plur', 'Person': '1'},
'PRON__Case=Gen|Number=Plur|Person=2': {POS: PRON, 'Case': 'Gen', 'Number': 'Plur', 'Person': '2'},
'PRON__Case=Gen|Number=Plur|Person=3': {POS: PRON, 'Case': 'Gen', 'Number': 'Plur', 'Person': '3'},
'PRON__Case=Gen|Number=Sing|Person=1': {POS: PRON, 'Case': 'Gen', 'Number': 'Sing', 'Person': '1'},
'PRON__Case=Gen|Number=Sing|Person=2': {POS: PRON, 'Case': 'Gen', 'Number': 'Sing', 'Person': '2'},
'PRON__Case=Ins': {POS: PRON, 'Case': 'Ins'},
'PRON__Case=Ins|Gender=Fem|Number=Sing|Person=3': {POS: PRON, 'Case': 'Ins', 'Gender': 'Fem', 'Number': 'Sing', 'Person': '3'},
'PRON__Case=Ins|Gender=Masc|Number=Sing|Person=3': {POS: PRON, 'Case': 'Ins', 'Gender': 'Masc', 'Number': 'Sing', 'Person': '3'},
'PRON__Case=Ins|Gender=Neut|Number=Sing|Person=3': {POS: PRON, 'Case': 'Ins', 'Gender': 'Neut', 'Number': 'Sing', 'Person': '3'},
'PRON__Case=Ins|Number=Plur|Person=1': {POS: PRON, 'Case': 'Ins', 'Number': 'Plur', 'Person': '1'},
'PRON__Case=Ins|Number=Plur|Person=2': {POS: PRON, 'Case': 'Ins', 'Number': 'Plur', 'Person': '2'},
'PRON__Case=Ins|Number=Plur|Person=3': {POS: PRON, 'Case': 'Ins', 'Number': 'Plur', 'Person': '3'},
'PRON__Case=Ins|Number=Sing|Person=1': {POS: PRON, 'Case': 'Ins', 'Number': 'Sing', 'Person': '1'},
'PRON__Case=Ins|Number=Sing|Person=2': {POS: PRON, 'Case': 'Ins', 'Number': 'Sing', 'Person': '2'},
'PRON__Case=Loc': {POS: PRON, 'Case': 'Loc'},
'PRON__Case=Loc|Gender=Fem|Number=Sing|Person=3': {POS: PRON, 'Case': 'Loc', 'Gender': 'Fem', 'Number': 'Sing', 'Person': '3'},
'PRON__Case=Loc|Gender=Masc|Number=Sing|Person=3': {POS: PRON, 'Case': 'Loc', 'Gender': 'Masc', 'Number': 'Sing', 'Person': '3'},
'PRON__Case=Loc|Gender=Neut|Number=Sing|Person=3': {POS: PRON, 'Case': 'Loc', 'Gender': 'Neut', 'Number': 'Sing', 'Person': '3'},
'PRON__Case=Loc|Number=Plur|Person=1': {POS: PRON, 'Case': 'Loc', 'Number': 'Plur', 'Person': '1'},
'PRON__Case=Loc|Number=Plur|Person=2': {POS: PRON, 'Case': 'Loc', 'Number': 'Plur', 'Person': '2'},
'PRON__Case=Loc|Number=Plur|Person=3': {POS: PRON, 'Case': 'Loc', 'Number': 'Plur', 'Person': '3'},
'PRON__Case=Loc|Number=Sing|Person=1': {POS: PRON, 'Case': 'Loc', 'Number': 'Sing', 'Person': '1'},
'PRON__Case=Loc|Number=Sing|Person=2': {POS: PRON, 'Case': 'Loc', 'Number': 'Sing', 'Person': '2'},
'PRON__Case=Nom': {POS: PRON, 'Case': 'Nom'},
'PRON__Case=Nom|Gender=Fem|Number=Sing|Person=3': {POS: PRON, 'Case': 'Nom', 'Gender': 'Fem', 'Number': 'Sing', 'Person': '3'},
'PRON__Case=Nom|Gender=Masc|Number=Sing|Person=3': {POS: PRON, 'Case': 'Nom', 'Gender': 'Masc', 'Number': 'Sing', 'Person': '3'},
'PRON__Case=Nom|Gender=Neut|Number=Sing|Person=3': {POS: PRON, 'Case': 'Nom', 'Gender': 'Neut', 'Number': 'Sing', 'Person': '3'},
'PRON__Case=Nom|Number=Plur|Person=1': {POS: PRON, 'Case': 'Nom', 'Number': 'Plur', 'Person': '1'},
'PRON__Case=Nom|Number=Plur|Person=2': {POS: PRON, 'Case': 'Nom', 'Number': 'Plur', 'Person': '2'},
'PRON__Case=Nom|Number=Plur|Person=3': {POS: PRON, 'Case': 'Nom', 'Number': 'Plur', 'Person': '3'},
'PRON__Case=Nom|Number=Sing|Person=1': {POS: PRON, 'Case': 'Nom', 'Number': 'Sing', 'Person': '1'},
'PRON__Case=Nom|Number=Sing|Person=2': {POS: PRON, 'Case': 'Nom', 'Number': 'Sing', 'Person': '2'},
'PRON__Number=Sing|Person=1': {POS: PRON, 'Number': 'Sing', 'Person': '1'},
'PRON___': {POS: PRON},
'PROPN__Animacy=Anim|Case=Acc|Gender=Fem|Number=Plur': {POS: PROPN, 'Animacy': 'Anim', 'Case': 'Acc', 'Gender': 'Fem', 'Number': 'Plur'},
'PROPN__Animacy=Anim|Case=Acc|Gender=Fem|Number=Sing': {POS: PROPN, 'Animacy': 'Anim', 'Case': 'Acc', 'Gender': 'Fem', 'Number': 'Sing'},
'PROPN__Animacy=Anim|Case=Acc|Gender=Masc|Number=Plur': {POS: PROPN, 'Animacy': 'Anim', 'Case': 'Acc', 'Gender': 'Masc', 'Number': 'Plur'},
'PROPN__Animacy=Anim|Case=Acc|Gender=Masc|Number=Sing': {POS: PROPN, 'Animacy': 'Anim', 'Case': 'Acc', 'Gender': 'Masc', 'Number': 'Sing'},
'PROPN__Animacy=Anim|Case=Acc|Gender=Neut|Number=Plur': {POS: PROPN, 'Animacy': 'Anim', 'Case': 'Acc', 'Gender': 'Neut', 'Number': 'Plur'},
'PROPN__Animacy=Anim|Case=Dat|Gender=Fem|Number=Plur': {POS: PROPN, 'Animacy': 'Anim', 'Case': 'Dat', 'Gender': 'Fem', 'Number': 'Plur'},
'PROPN__Animacy=Anim|Case=Dat|Gender=Fem|Number=Sing': {POS: PROPN, 'Animacy': 'Anim', 'Case': 'Dat', 'Gender': 'Fem', 'Number': 'Sing'},
'PROPN__Animacy=Anim|Case=Dat|Gender=Masc|Number=Plur': {POS: PROPN, 'Animacy': 'Anim', 'Case': 'Dat', 'Gender': 'Masc', 'Number': 'Plur'},
'PROPN__Animacy=Anim|Case=Dat|Gender=Masc|Number=Sing': {POS: PROPN, 'Animacy': 'Anim', 'Case': 'Dat', 'Gender': 'Masc', 'Number': 'Sing'},
'PROPN__Animacy=Anim|Case=Dat|Gender=Neut|Number=Plur': {POS: PROPN, 'Animacy': 'Anim', 'Case': 'Dat', 'Gender': 'Neut', 'Number': 'Plur'},
'PROPN__Animacy=Anim|Case=Gen|Foreign=Yes|Gender=Masc|Number=Sing': {POS: PROPN, 'Animacy': 'Anim', 'Case': 'Gen', 'Foreign': 'Yes', 'Gender': 'Masc', 'Number': 'Sing'},
'PROPN__Animacy=Anim|Case=Gen|Gender=Fem|Number=Plur': {POS: PROPN, 'Animacy': 'Anim', 'Case': 'Gen', 'Gender': 'Fem', 'Number': 'Plur'},
'PROPN__Animacy=Anim|Case=Gen|Gender=Fem|Number=Sing': {POS: PROPN, 'Animacy': 'Anim', 'Case': 'Gen', 'Gender': 'Fem', 'Number': 'Sing'},
'PROPN__Animacy=Anim|Case=Gen|Gender=Masc|Number=Plur': {POS: PROPN, 'Animacy': 'Anim', 'Case': 'Gen', 'Gender': 'Masc', 'Number': 'Plur'},
'PROPN__Animacy=Anim|Case=Gen|Gender=Masc|Number=Sing': {POS: PROPN, 'Animacy': 'Anim', 'Case': 'Gen', 'Gender': 'Masc', 'Number': 'Sing'},
'PROPN__Animacy=Anim|Case=Ins|Gender=Fem|Number=Sing': {POS: PROPN, 'Animacy': 'Anim', 'Case': 'Ins', 'Gender': 'Fem', 'Number': 'Sing'},
'PROPN__Animacy=Anim|Case=Ins|Gender=Masc|Number=Plur': {POS: PROPN, 'Animacy': 'Anim', 'Case': 'Ins', 'Gender': 'Masc', 'Number': 'Plur'},
'PROPN__Animacy=Anim|Case=Ins|Gender=Masc|Number=Sing': {POS: PROPN, 'Animacy': 'Anim', 'Case': 'Ins', 'Gender': 'Masc', 'Number': 'Sing'},
'PROPN__Animacy=Anim|Case=Ins|Gender=Neut|Number=Sing': {POS: PROPN, 'Animacy': 'Anim', 'Case': 'Ins', 'Gender': 'Neut', 'Number': 'Sing'},
'PROPN__Animacy=Anim|Case=Loc|Gender=Fem|Number=Sing': {POS: PROPN, 'Animacy': 'Anim', 'Case': 'Loc', 'Gender': 'Fem', 'Number': 'Sing'},
'PROPN__Animacy=Anim|Case=Loc|Gender=Masc|Number=Plur': {POS: PROPN, 'Animacy': 'Anim', 'Case': 'Loc', 'Gender': 'Masc', 'Number': 'Plur'},
'PROPN__Animacy=Anim|Case=Loc|Gender=Masc|Number=Sing': {POS: PROPN, 'Animacy': 'Anim', 'Case': 'Loc', 'Gender': 'Masc', 'Number': 'Sing'},
'PROPN__Animacy=Anim|Case=Nom|Foreign=Yes|Gender=Masc|Number=Sing': {POS: PROPN, 'Animacy': 'Anim', 'Case': 'Nom', 'Foreign': 'Yes', 'Gender': 'Masc', 'Number': 'Sing'},
'PROPN__Animacy=Anim|Case=Nom|Gender=Fem|Number=Plur': {POS: PROPN, 'Animacy': 'Anim', 'Case': 'Nom', 'Gender': 'Fem', 'Number': 'Plur'},
'PROPN__Animacy=Anim|Case=Nom|Gender=Fem|Number=Sing': {POS: PROPN, 'Animacy': 'Anim', 'Case': 'Nom', 'Gender': 'Fem', 'Number': 'Sing'},
'PROPN__Animacy=Anim|Case=Nom|Gender=Masc|Number=Plur': {POS: PROPN, 'Animacy': 'Anim', 'Case': 'Nom', 'Gender': 'Masc', 'Number': 'Plur'},
'PROPN__Animacy=Anim|Case=Nom|Gender=Masc|Number=Sing': {POS: PROPN, 'Animacy': 'Anim', 'Case': 'Nom', 'Gender': 'Masc', 'Number': 'Sing'},
'PROPN__Animacy=Anim|Case=Nom|Gender=Neut|Number=Plur': {POS: PROPN, 'Animacy': 'Anim', 'Case': 'Nom', 'Gender': 'Neut', 'Number': 'Plur'},
'PROPN__Animacy=Anim|Case=Voc|Gender=Masc|Number=Sing': {POS: PROPN, 'Animacy': 'Anim', 'Case': 'Voc', 'Gender': 'Masc', 'Number': 'Sing'},
'PROPN__Animacy=Anim|Gender=Masc|Number=Sing': {POS: PROPN, 'Animacy': 'Anim', 'Gender': 'Masc', 'Number': 'Sing'},
'PROPN__Animacy=Inan|Case=Acc|Gender=Fem|Number=Plur': {POS: PROPN, 'Animacy': 'Inan', 'Case': 'Acc', 'Gender': 'Fem', 'Number': 'Plur'},
'PROPN__Animacy=Inan|Case=Acc|Gender=Fem|Number=Sing': {POS: PROPN, 'Animacy': 'Inan', 'Case': 'Acc', 'Gender': 'Fem', 'Number': 'Sing'},
'PROPN__Animacy=Inan|Case=Acc|Gender=Masc|Number=Plur': {POS: PROPN, 'Animacy': 'Inan', 'Case': 'Acc', 'Gender': 'Masc', 'Number': 'Plur'},
'PROPN__Animacy=Inan|Case=Acc|Gender=Masc|Number=Sing': {POS: PROPN, 'Animacy': 'Inan', 'Case': 'Acc', 'Gender': 'Masc', 'Number': 'Sing'},
'PROPN__Animacy=Inan|Case=Acc|Gender=Neut|Number=Plur': {POS: PROPN, 'Animacy': 'Inan', 'Case': 'Acc', 'Gender': 'Neut', 'Number': 'Plur'},
'PROPN__Animacy=Inan|Case=Acc|Gender=Neut|Number=Sing': {POS: PROPN, 'Animacy': 'Inan', 'Case': 'Acc', 'Gender': 'Neut', 'Number': 'Sing'},
'PROPN__Animacy=Inan|Case=Acc|Number=Plur': {POS: PROPN, 'Animacy': 'Inan', 'Case': 'Acc', 'Number': 'Plur'},
'PROPN__Animacy=Inan|Case=Dat|Gender=Fem|Number=Plur': {POS: PROPN, 'Animacy': 'Inan', 'Case': 'Dat', 'Gender': 'Fem', 'Number': 'Plur'},
'PROPN__Animacy=Inan|Case=Dat|Gender=Fem|Number=Sing': {POS: PROPN, 'Animacy': 'Inan', 'Case': 'Dat', 'Gender': 'Fem', 'Number': 'Sing'},
'PROPN__Animacy=Inan|Case=Dat|Gender=Masc|Number=Plur': {POS: PROPN, 'Animacy': 'Inan', 'Case': 'Dat', 'Gender': 'Masc', 'Number': 'Plur'},
'PROPN__Animacy=Inan|Case=Dat|Gender=Masc|Number=Sing': {POS: PROPN, 'Animacy': 'Inan', 'Case': 'Dat', 'Gender': 'Masc', 'Number': 'Sing'},
'PROPN__Animacy=Inan|Case=Dat|Gender=Neut|Number=Plur': {POS: PROPN, 'Animacy': 'Inan', 'Case': 'Dat', 'Gender': 'Neut', 'Number': 'Plur'},
'PROPN__Animacy=Inan|Case=Dat|Gender=Neut|Number=Sing': {POS: PROPN, 'Animacy': 'Inan', 'Case': 'Dat', 'Gender': 'Neut', 'Number': 'Sing'},
'PROPN__Animacy=Inan|Case=Dat|Number=Plur': {POS: PROPN, 'Animacy': 'Inan', 'Case': 'Dat', 'Number': 'Plur'},
'PROPN__Animacy=Inan|Case=Gen|Foreign=Yes|Gender=Fem|Number=Sing': {POS: PROPN, 'Animacy': 'Inan', 'Case': 'Gen', 'Foreign': 'Yes', 'Gender': 'Fem', 'Number': 'Sing'},
'PROPN__Animacy=Inan|Case=Gen|Gender=Fem|Number=Plur': {POS: PROPN, 'Animacy': 'Inan', 'Case': 'Gen', 'Gender': 'Fem', 'Number': 'Plur'},
'PROPN__Animacy=Inan|Case=Gen|Gender=Fem|Number=Sing': {POS: PROPN, 'Animacy': 'Inan', 'Case': 'Gen', 'Gender': 'Fem', 'Number': 'Sing'},
'PROPN__Animacy=Inan|Case=Gen|Gender=Masc|Number=Plur': {POS: PROPN, 'Animacy': 'Inan', 'Case': 'Gen', 'Gender': 'Masc', 'Number': 'Plur'},
'PROPN__Animacy=Inan|Case=Gen|Gender=Masc|Number=Sing': {POS: PROPN, 'Animacy': 'Inan', 'Case': 'Gen', 'Gender': 'Masc', 'Number': 'Sing'},
'PROPN__Animacy=Inan|Case=Gen|Gender=Neut|Number=Plur': {POS: PROPN, 'Animacy': 'Inan', 'Case': 'Gen', 'Gender': 'Neut', 'Number': 'Plur'},
'PROPN__Animacy=Inan|Case=Gen|Gender=Neut|Number=Sing': {POS: PROPN, 'Animacy': 'Inan', 'Case': 'Gen', 'Gender': 'Neut', 'Number': 'Sing'},
'PROPN__Animacy=Inan|Case=Gen|Number=Plur': {POS: PROPN, 'Animacy': 'Inan', 'Case': 'Gen', 'Number': 'Plur'},
'PROPN__Animacy=Inan|Case=Ins|Gender=Fem|Number=Plur': {POS: PROPN, 'Animacy': 'Inan', 'Case': 'Ins', 'Gender': 'Fem', 'Number': 'Plur'},
'PROPN__Animacy=Inan|Case=Ins|Gender=Fem|Number=Sing': {POS: PROPN, 'Animacy': 'Inan', 'Case': 'Ins', 'Gender': 'Fem', 'Number': 'Sing'},
'PROPN__Animacy=Inan|Case=Ins|Gender=Masc|Number=Plur': {POS: PROPN, 'Animacy': 'Inan', 'Case': 'Ins', 'Gender': 'Masc', 'Number': 'Plur'},
'PROPN__Animacy=Inan|Case=Ins|Gender=Masc|Number=Sing': {POS: PROPN, 'Animacy': 'Inan', 'Case': 'Ins', 'Gender': 'Masc', 'Number': 'Sing'},
'PROPN__Animacy=Inan|Case=Ins|Gender=Neut|Number=Plur': {POS: PROPN, 'Animacy': 'Inan', 'Case': 'Ins', 'Gender': 'Neut', 'Number': 'Plur'},
'PROPN__Animacy=Inan|Case=Ins|Gender=Neut|Number=Sing': {POS: PROPN, 'Animacy': 'Inan', 'Case': 'Ins', 'Gender': 'Neut', 'Number': 'Sing'},
'PROPN__Animacy=Inan|Case=Ins|Number=Plur': {POS: PROPN, 'Animacy': 'Inan', 'Case': 'Ins', 'Number': 'Plur'},
'PROPN__Animacy=Inan|Case=Loc|Gender=Fem|Number=Plur': {POS: PROPN, 'Animacy': 'Inan', 'Case': 'Loc', 'Gender': 'Fem', 'Number': 'Plur'},
'PROPN__Animacy=Inan|Case=Loc|Gender=Fem|Number=Sing': {POS: PROPN, 'Animacy': 'Inan', 'Case': 'Loc', 'Gender': 'Fem', 'Number': 'Sing'},
'PROPN__Animacy=Inan|Case=Loc|Gender=Masc|Number=Plur': {POS: PROPN, 'Animacy': 'Inan', 'Case': 'Loc', 'Gender': 'Masc', 'Number': 'Plur'},
'PROPN__Animacy=Inan|Case=Loc|Gender=Masc|Number=Sing': {POS: PROPN, 'Animacy': 'Inan', 'Case': 'Loc', 'Gender': 'Masc', 'Number': 'Sing'},
'PROPN__Animacy=Inan|Case=Loc|Gender=Neut|Number=Plur': {POS: PROPN, 'Animacy': 'Inan', 'Case': 'Loc', 'Gender': 'Neut', 'Number': 'Plur'},
'PROPN__Animacy=Inan|Case=Loc|Gender=Neut|Number=Sing': {POS: PROPN, 'Animacy': 'Inan', 'Case': 'Loc', 'Gender': 'Neut', 'Number': 'Sing'},
'PROPN__Animacy=Inan|Case=Loc|Number=Plur': {POS: PROPN, 'Animacy': 'Inan', 'Case': 'Loc', 'Number': 'Plur'},
'PROPN__Animacy=Inan|Case=Nom|Foreign=Yes|Gender=Fem|Number=Sing': {POS: PROPN, 'Animacy': 'Inan', 'Case': 'Nom', 'Foreign': 'Yes', 'Gender': 'Fem', 'Number': 'Sing'},
'PROPN__Animacy=Inan|Case=Nom|Foreign=Yes|Gender=Masc|Number=Sing': {POS: PROPN, 'Animacy': 'Inan', 'Case': 'Nom', 'Foreign': 'Yes', 'Gender': 'Masc', 'Number': 'Sing'},
'PROPN__Animacy=Inan|Case=Nom|Foreign=Yes|Gender=Neut|Number=Sing': {POS: PROPN, 'Animacy': 'Inan', 'Case': 'Nom', 'Foreign': 'Yes', 'Gender': 'Neut', 'Number': 'Sing'},
'PROPN__Animacy=Inan|Case=Nom|Gender=Fem|Number=Plur': {POS: PROPN, 'Animacy': 'Inan', 'Case': 'Nom', 'Gender': 'Fem', 'Number': 'Plur'},
'PROPN__Animacy=Inan|Case=Nom|Gender=Fem|Number=Sing': {POS: PROPN, 'Animacy': 'Inan', 'Case': 'Nom', 'Gender': 'Fem', 'Number': 'Sing'},
'PROPN__Animacy=Inan|Case=Nom|Gender=Masc|Number=Plur': {POS: PROPN, 'Animacy': 'Inan', 'Case': 'Nom', 'Gender': 'Masc', 'Number': 'Plur'},
'PROPN__Animacy=Inan|Case=Nom|Gender=Masc|Number=Sing': {POS: PROPN, 'Animacy': 'Inan', 'Case': 'Nom', 'Gender': 'Masc', 'Number': 'Sing'},
'PROPN__Animacy=Inan|Case=Nom|Gender=Neut|Number=Plur': {POS: PROPN, 'Animacy': 'Inan', 'Case': 'Nom', 'Gender': 'Neut', 'Number': 'Plur'},
'PROPN__Animacy=Inan|Case=Nom|Gender=Neut|Number=Sing': {POS: PROPN, 'Animacy': 'Inan', 'Case': 'Nom', 'Gender': 'Neut', 'Number': 'Sing'},
'PROPN__Animacy=Inan|Case=Nom|Number=Plur': {POS: PROPN, 'Animacy': 'Inan', 'Case': 'Nom', 'Number': 'Plur'},
'PROPN__Animacy=Inan|Case=Par|Gender=Masc|Number=Sing': {POS: PROPN, 'Animacy': 'Inan', 'Case': 'Par', 'Gender': 'Masc', 'Number': 'Sing'},
'PROPN__Animacy=Inan|Gender=Fem': {POS: PROPN, 'Animacy': 'Inan', 'Gender': 'Fem'},
'PROPN__Animacy=Inan|Gender=Masc': {POS: PROPN, 'Animacy': 'Inan', 'Gender': 'Masc'},
'PROPN__Animacy=Inan|Gender=Masc|Number=Plur': {POS: PROPN, 'Animacy': 'Inan', 'Gender': 'Masc', 'Number': 'Plur'},
'PROPN__Animacy=Inan|Gender=Masc|Number=Sing': {POS: PROPN, 'Animacy': 'Inan', 'Gender': 'Masc', 'Number': 'Sing'},
'PROPN__Animacy=Inan|Gender=Neut|Number=Sing': {POS: PROPN, 'Animacy': 'Inan', 'Gender': 'Neut', 'Number': 'Sing'},
'PROPN__Case=Acc|Degree=Pos|Gender=Fem|Number=Sing': {POS: PROPN, 'Case': 'Acc', 'Degree': 'Pos', 'Gender': 'Fem', 'Number': 'Sing'},
'PROPN__Case=Dat|Degree=Pos|Gender=Masc|Number=Sing': {POS: PROPN, 'Case': 'Dat', 'Degree': 'Pos', 'Gender': 'Masc', 'Number': 'Sing'},
'PROPN__Case=Ins|Degree=Pos|Gender=Fem|Number=Sing': {POS: PROPN, 'Case': 'Ins', 'Degree': 'Pos', 'Gender': 'Fem', 'Number': 'Sing'},
'PROPN__Case=Ins|Degree=Pos|Number=Plur': {POS: PROPN, 'Case': 'Ins', 'Degree': 'Pos', 'Number': 'Plur'},
'PROPN__Case=Nom|Degree=Pos|Gender=Fem|Number=Sing': {POS: PROPN, 'Case': 'Nom', 'Degree': 'Pos', 'Gender': 'Fem', 'Number': 'Sing'},
'PROPN__Case=Nom|Degree=Pos|Gender=Masc|Number=Sing': {POS: PROPN, 'Case': 'Nom', 'Degree': 'Pos', 'Gender': 'Masc', 'Number': 'Sing'},
'PROPN__Case=Nom|Degree=Pos|Gender=Neut|Number=Sing': {POS: PROPN, 'Case': 'Nom', 'Degree': 'Pos', 'Gender': 'Neut', 'Number': 'Sing'},
'PROPN__Case=Nom|Degree=Pos|Number=Plur': {POS: PROPN, 'Case': 'Nom', 'Degree': 'Pos', 'Number': 'Plur'},
'PROPN__Degree=Pos|Gender=Neut|Number=Sing|Variant=Short': {POS: PROPN, 'Degree': 'Pos', 'Gender': 'Neut', 'Number': 'Sing', 'Variant': 'Short'},
'PROPN__Degree=Pos|Number=Plur|Variant=Short': {POS: PROPN, 'Degree': 'Pos', 'Number': 'Plur', 'Variant': 'Short'},
'PROPN__Foreign=Yes': {POS: PROPN, 'Foreign': 'Yes'},
'PROPN__Number=Sing': {POS: PROPN, 'Number': 'Sing'},
'PROPN___': {POS: PROPN},
'PUNCT___': {POS: PUNCT},
'SCONJ__Mood=Cnd': {POS: SCONJ, 'Mood': 'Cnd'},
'SCONJ___': {POS: SCONJ},
'SYM___': {POS: SYM},
'VERB__Animacy=Anim|Aspect=Imp|Case=Acc|Gender=Masc|Number=Sing|Tense=Past|VerbForm=Part|Voice=Act': {POS: VERB, 'Animacy': 'Anim', 'Aspect': 'Imp', 'Case': 'Acc', 'Gender': 'Masc', 'Number': 'Sing', 'Tense': 'Past', 'VerbForm': 'Part', 'Voice': 'Act'},
'VERB__Animacy=Anim|Aspect=Imp|Case=Acc|Gender=Masc|Number=Sing|Tense=Past|VerbForm=Part|Voice=Mid': {POS: VERB, 'Animacy': 'Anim', 'Aspect': 'Imp', 'Case': 'Acc', 'Gender': 'Masc', 'Number': 'Sing', 'Tense': 'Past', 'VerbForm': 'Part', 'Voice': 'Mid'},
'VERB__Animacy=Anim|Aspect=Imp|Case=Acc|Gender=Masc|Number=Sing|Tense=Pres|VerbForm=Part|Voice=Act': {POS: VERB, 'Animacy': 'Anim', 'Aspect': 'Imp', 'Case': 'Acc', 'Gender': 'Masc', 'Number': 'Sing', 'Tense': 'Pres', 'VerbForm': 'Part', 'Voice': 'Act'},
'VERB__Animacy=Anim|Aspect=Imp|Case=Acc|Gender=Masc|Number=Sing|Tense=Pres|VerbForm=Part|Voice=Mid': {POS: VERB, 'Animacy': 'Anim', 'Aspect': 'Imp', 'Case': 'Acc', 'Gender': 'Masc', 'Number': 'Sing', 'Tense': 'Pres', 'VerbForm': 'Part', 'Voice': 'Mid'},
'VERB__Animacy=Anim|Aspect=Imp|Case=Acc|Gender=Masc|Number=Sing|Tense=Pres|VerbForm=Part|Voice=Pass': {POS: VERB, 'Animacy': 'Anim', 'Aspect': 'Imp', 'Case': 'Acc', 'Gender': 'Masc', 'Number': 'Sing', 'Tense': 'Pres', 'VerbForm': 'Part', 'Voice': 'Pass'},
'VERB__Animacy=Anim|Aspect=Imp|Case=Acc|Number=Plur|Tense=Past|VerbForm=Part|Voice=Act': {POS: VERB, 'Animacy': 'Anim', 'Aspect': 'Imp', 'Case': 'Acc', 'Number': 'Plur', 'Tense': 'Past', 'VerbForm': 'Part', 'Voice': 'Act'},
'VERB__Animacy=Anim|Aspect=Imp|Case=Acc|Number=Plur|Tense=Past|VerbForm=Part|Voice=Mid': {POS: VERB, 'Animacy': 'Anim', 'Aspect': 'Imp', 'Case': 'Acc', 'Number': 'Plur', 'Tense': 'Past', 'VerbForm': 'Part', 'Voice': 'Mid'},
'VERB__Animacy=Anim|Aspect=Imp|Case=Acc|Number=Plur|Tense=Pres|VerbForm=Part|Voice=Act': {POS: VERB, 'Animacy': 'Anim', 'Aspect': 'Imp', 'Case': 'Acc', 'Number': 'Plur', 'Tense': 'Pres', 'VerbForm': 'Part', 'Voice': 'Act'},
'VERB__Animacy=Anim|Aspect=Imp|Case=Acc|Number=Plur|Tense=Pres|VerbForm=Part|Voice=Mid': {POS: VERB, 'Animacy': 'Anim', 'Aspect': 'Imp', 'Case': 'Acc', 'Number': 'Plur', 'Tense': 'Pres', 'VerbForm': 'Part', 'Voice': 'Mid'},
'VERB__Animacy=Anim|Aspect=Imp|Case=Acc|Number=Plur|Tense=Pres|VerbForm=Part|Voice=Pass': {POS: VERB, 'Animacy': 'Anim', 'Aspect': 'Imp', 'Case': 'Acc', 'Number': 'Plur', 'Tense': 'Pres', 'VerbForm': 'Part', 'Voice': 'Pass'},
'VERB__Animacy=Anim|Aspect=Perf|Case=Acc|Gender=Masc|Number=Sing|Tense=Past|VerbForm=Part|Voice=Act': {POS: VERB, 'Animacy': 'Anim', 'Aspect': 'Perf', 'Case': 'Acc', 'Gender': 'Masc', 'Number': 'Sing', 'Tense': 'Past', 'VerbForm': 'Part', 'Voice': 'Act'},
'VERB__Animacy=Anim|Aspect=Perf|Case=Acc|Gender=Masc|Number=Sing|Tense=Past|VerbForm=Part|Voice=Mid': {POS: VERB, 'Animacy': 'Anim', 'Aspect': 'Perf', 'Case': 'Acc', 'Gender': 'Masc', 'Number': 'Sing', 'Tense': 'Past', 'VerbForm': 'Part', 'Voice': 'Mid'},
'VERB__Animacy=Anim|Aspect=Perf|Case=Acc|Gender=Masc|Number=Sing|Tense=Past|VerbForm=Part|Voice=Pass': {POS: VERB, 'Animacy': 'Anim', 'Aspect': 'Perf', 'Case': 'Acc', 'Gender': 'Masc', 'Number': 'Sing', 'Tense': 'Past', 'VerbForm': 'Part', 'Voice': 'Pass'},
'VERB__Animacy=Anim|Aspect=Perf|Case=Acc|Number=Plur|Tense=Past|VerbForm=Part|Voice=Act': {POS: VERB, 'Animacy': 'Anim', 'Aspect': 'Perf', 'Case': 'Acc', 'Number': 'Plur', 'Tense': 'Past', 'VerbForm': 'Part', 'Voice': 'Act'},
'VERB__Animacy=Anim|Aspect=Perf|Case=Acc|Number=Plur|Tense=Past|VerbForm=Part|Voice=Mid': {POS: VERB, 'Animacy': 'Anim', 'Aspect': 'Perf', 'Case': 'Acc', 'Number': 'Plur', 'Tense': 'Past', 'VerbForm': 'Part', 'Voice': 'Mid'},
'VERB__Animacy=Anim|Aspect=Perf|Case=Acc|Number=Plur|Tense=Past|VerbForm=Part|Voice=Pass': {POS: VERB, 'Animacy': 'Anim', 'Aspect': 'Perf', 'Case': 'Acc', 'Number': 'Plur', 'Tense': 'Past', 'VerbForm': 'Part', 'Voice': 'Pass'},
'VERB__Animacy=Inan|Aspect=Imp|Case=Acc|Gender=Masc|Number=Sing|Tense=Past|VerbForm=Part|Voice=Act': {POS: VERB, 'Animacy': 'Inan', 'Aspect': 'Imp', 'Case': 'Acc', 'Gender': 'Masc', 'Number': 'Sing', 'Tense': 'Past', 'VerbForm': 'Part', 'Voice': 'Act'},
'VERB__Animacy=Inan|Aspect=Imp|Case=Acc|Gender=Masc|Number=Sing|Tense=Past|VerbForm=Part|Voice=Mid': {POS: VERB, 'Animacy': 'Inan', 'Aspect': 'Imp', 'Case': 'Acc', 'Gender': 'Masc', 'Number': 'Sing', 'Tense': 'Past', 'VerbForm': 'Part', 'Voice': 'Mid'},
'VERB__Animacy=Inan|Aspect=Imp|Case=Acc|Gender=Masc|Number=Sing|Tense=Past|VerbForm=Part|Voice=Pass': {POS: VERB, 'Animacy': 'Inan', 'Aspect': 'Imp', 'Case': 'Acc', 'Gender': 'Masc', 'Number': 'Sing', 'Tense': 'Past', 'VerbForm': 'Part', 'Voice': 'Pass'},
'VERB__Animacy=Inan|Aspect=Imp|Case=Acc|Gender=Masc|Number=Sing|Tense=Pres|VerbForm=Part|Voice=Act': {POS: VERB, 'Animacy': 'Inan', 'Aspect': 'Imp', 'Case': 'Acc', 'Gender': 'Masc', 'Number': 'Sing', 'Tense': 'Pres', 'VerbForm': 'Part', 'Voice': 'Act'},
'VERB__Animacy=Inan|Aspect=Imp|Case=Acc|Gender=Masc|Number=Sing|Tense=Pres|VerbForm=Part|Voice=Mid': {POS: VERB, 'Animacy': 'Inan', 'Aspect': 'Imp', 'Case': 'Acc', 'Gender': 'Masc', 'Number': 'Sing', 'Tense': 'Pres', 'VerbForm': 'Part', 'Voice': 'Mid'},
'VERB__Animacy=Inan|Aspect=Imp|Case=Acc|Gender=Masc|Number=Sing|Tense=Pres|VerbForm=Part|Voice=Pass': {POS: VERB, 'Animacy': 'Inan', 'Aspect': 'Imp', 'Case': 'Acc', 'Gender': 'Masc', 'Number': 'Sing', 'Tense': 'Pres', 'VerbForm': 'Part', 'Voice': 'Pass'},
'VERB__Animacy=Inan|Aspect=Imp|Case=Acc|Number=Plur|Tense=Past|VerbForm=Part|Voice=Act': {POS: VERB, 'Animacy': 'Inan', 'Aspect': 'Imp', 'Case': 'Acc', 'Number': 'Plur', 'Tense': 'Past', 'VerbForm': 'Part', 'Voice': 'Act'},
'VERB__Animacy=Inan|Aspect=Imp|Case=Acc|Number=Plur|Tense=Past|VerbForm=Part|Voice=Mid': {POS: VERB, 'Animacy': 'Inan', 'Aspect': 'Imp', 'Case': 'Acc', 'Number': 'Plur', 'Tense': 'Past', 'VerbForm': 'Part', 'Voice': 'Mid'},
'VERB__Animacy=Inan|Aspect=Imp|Case=Acc|Number=Plur|Tense=Past|VerbForm=Part|Voice=Pass': {POS: VERB, 'Animacy': 'Inan', 'Aspect': 'Imp', 'Case': 'Acc', 'Number': 'Plur', 'Tense': 'Past', 'VerbForm': 'Part', 'Voice': 'Pass'},
'VERB__Animacy=Inan|Aspect=Imp|Case=Acc|Number=Plur|Tense=Pres|VerbForm=Part|Voice=Act': {POS: VERB, 'Animacy': 'Inan', 'Aspect': 'Imp', 'Case': 'Acc', 'Number': 'Plur', 'Tense': 'Pres', 'VerbForm': 'Part', 'Voice': 'Act'},
'VERB__Animacy=Inan|Aspect=Imp|Case=Acc|Number=Plur|Tense=Pres|VerbForm=Part|Voice=Mid': {POS: VERB, 'Animacy': 'Inan', 'Aspect': 'Imp', 'Case': 'Acc', 'Number': 'Plur', 'Tense': 'Pres', 'VerbForm': 'Part', 'Voice': 'Mid'},
'VERB__Animacy=Inan|Aspect=Imp|Case=Acc|Number=Plur|Tense=Pres|VerbForm=Part|Voice=Pass': {POS: VERB, 'Animacy': 'Inan', 'Aspect': 'Imp', 'Case': 'Acc', 'Number': 'Plur', 'Tense': 'Pres', 'VerbForm': 'Part', 'Voice': 'Pass'},
'VERB__Animacy=Inan|Aspect=Perf|Case=Acc|Gender=Masc|Number=Sing|Tense=Past|VerbForm=Part|Voice=Act': {POS: VERB, 'Animacy': 'Inan', 'Aspect': 'Perf', 'Case': 'Acc', 'Gender': 'Masc', 'Number': 'Sing', 'Tense': 'Past', 'VerbForm': 'Part', 'Voice': 'Act'},
'VERB__Animacy=Inan|Aspect=Perf|Case=Acc|Gender=Masc|Number=Sing|Tense=Past|VerbForm=Part|Voice=Mid': {POS: VERB, 'Animacy': 'Inan', 'Aspect': 'Perf', 'Case': 'Acc', 'Gender': 'Masc', 'Number': 'Sing', 'Tense': 'Past', 'VerbForm': 'Part', 'Voice': 'Mid'},
'VERB__Animacy=Inan|Aspect=Perf|Case=Acc|Gender=Masc|Number=Sing|Tense=Past|VerbForm=Part|Voice=Pass': {POS: VERB, 'Animacy': 'Inan', 'Aspect': 'Perf', 'Case': 'Acc', 'Gender': 'Masc', 'Number': 'Sing', 'Tense': 'Past', 'VerbForm': 'Part', 'Voice': 'Pass'},
'VERB__Animacy=Inan|Aspect=Perf|Case=Acc|Number=Plur|Tense=Past|VerbForm=Part|Voice=Act': {POS: VERB, 'Animacy': 'Inan', 'Aspect': 'Perf', 'Case': 'Acc', 'Number': 'Plur', 'Tense': 'Past', 'VerbForm': 'Part', 'Voice': 'Act'},
'VERB__Animacy=Inan|Aspect=Perf|Case=Acc|Number=Plur|Tense=Past|VerbForm=Part|Voice=Mid': {POS: VERB, 'Animacy': 'Inan', 'Aspect': 'Perf', 'Case': 'Acc', 'Number': 'Plur', 'Tense': 'Past', 'VerbForm': 'Part', 'Voice': 'Mid'},
'VERB__Animacy=Inan|Aspect=Perf|Case=Acc|Number=Plur|Tense=Past|VerbForm=Part|Voice=Pass': {POS: VERB, 'Animacy': 'Inan', 'Aspect': 'Perf', 'Case': 'Acc', 'Number': 'Plur', 'Tense': 'Past', 'VerbForm': 'Part', 'Voice': 'Pass'},
'VERB__Aspect=Imp|Case=Acc|Gender=Fem|Number=Sing|Tense=Past|VerbForm=Part|Voice=Act': {POS: VERB, 'Aspect': 'Imp', 'Case': 'Acc', 'Gender': 'Fem', 'Number': 'Sing', 'Tense': 'Past', 'VerbForm': 'Part', 'Voice': 'Act'},
'VERB__Aspect=Imp|Case=Acc|Gender=Fem|Number=Sing|Tense=Past|VerbForm=Part|Voice=Mid': {POS: VERB, 'Aspect': 'Imp', 'Case': 'Acc', 'Gender': 'Fem', 'Number': 'Sing', 'Tense': 'Past', 'VerbForm': 'Part', 'Voice': 'Mid'},
'VERB__Aspect=Imp|Case=Acc|Gender=Fem|Number=Sing|Tense=Past|VerbForm=Part|Voice=Pass': {POS: VERB, 'Aspect': 'Imp', 'Case': 'Acc', 'Gender': 'Fem', 'Number': 'Sing', 'Tense': 'Past', 'VerbForm': 'Part', 'Voice': 'Pass'},
'VERB__Aspect=Imp|Case=Acc|Gender=Fem|Number=Sing|Tense=Pres|VerbForm=Part|Voice=Act': {POS: VERB, 'Aspect': 'Imp', 'Case': 'Acc', 'Gender': 'Fem', 'Number': 'Sing', 'Tense': 'Pres', 'VerbForm': 'Part', 'Voice': 'Act'},
'VERB__Aspect=Imp|Case=Acc|Gender=Fem|Number=Sing|Tense=Pres|VerbForm=Part|Voice=Mid': {POS: VERB, 'Aspect': 'Imp', 'Case': 'Acc', 'Gender': 'Fem', 'Number': 'Sing', 'Tense': 'Pres', 'VerbForm': 'Part', 'Voice': 'Mid'},
'VERB__Aspect=Imp|Case=Acc|Gender=Fem|Number=Sing|Tense=Pres|VerbForm=Part|Voice=Pass': {POS: VERB, 'Aspect': 'Imp', 'Case': 'Acc', 'Gender': 'Fem', 'Number': 'Sing', 'Tense': 'Pres', 'VerbForm': 'Part', 'Voice': 'Pass'},
'VERB__Aspect=Imp|Case=Acc|Gender=Neut|Number=Sing|Tense=Past|VerbForm=Part|Voice=Act': {POS: VERB, 'Aspect': 'Imp', 'Case': 'Acc', 'Gender': 'Neut', 'Number': 'Sing', 'Tense': 'Past', 'VerbForm': 'Part', 'Voice': 'Act'},
'VERB__Aspect=Imp|Case=Acc|Gender=Neut|Number=Sing|Tense=Past|VerbForm=Part|Voice=Mid': {POS: VERB, 'Aspect': 'Imp', 'Case': 'Acc', 'Gender': 'Neut', 'Number': 'Sing', 'Tense': 'Past', 'VerbForm': 'Part', 'Voice': 'Mid'},
'VERB__Aspect=Imp|Case=Acc|Gender=Neut|Number=Sing|Tense=Past|VerbForm=Part|Voice=Pass': {POS: VERB, 'Aspect': 'Imp', 'Case': 'Acc', 'Gender': 'Neut', 'Number': 'Sing', 'Tense': 'Past', 'VerbForm': 'Part', 'Voice': 'Pass'},
'VERB__Aspect=Imp|Case=Acc|Gender=Neut|Number=Sing|Tense=Pres|VerbForm=Part|Voice=Act': {POS: VERB, 'Aspect': 'Imp', 'Case': 'Acc', 'Gender': 'Neut', 'Number': 'Sing', 'Tense': 'Pres', 'VerbForm': 'Part', 'Voice': 'Act'},
'VERB__Aspect=Imp|Case=Acc|Gender=Neut|Number=Sing|Tense=Pres|VerbForm=Part|Voice=Mid': {POS: VERB, 'Aspect': 'Imp', 'Case': 'Acc', 'Gender': 'Neut', 'Number': 'Sing', 'Tense': 'Pres', 'VerbForm': 'Part', 'Voice': 'Mid'},
'VERB__Aspect=Imp|Case=Acc|Gender=Neut|Number=Sing|Tense=Pres|VerbForm=Part|Voice=Pass': {POS: VERB, 'Aspect': 'Imp', 'Case': 'Acc', 'Gender': 'Neut', 'Number': 'Sing', 'Tense': 'Pres', 'VerbForm': 'Part', 'Voice': 'Pass'},
'VERB__Aspect=Imp|Case=Dat|Gender=Fem|Number=Sing|Tense=Past|VerbForm=Part|Voice=Act': {POS: VERB, 'Aspect': 'Imp', 'Case': 'Dat', 'Gender': 'Fem', 'Number': 'Sing', 'Tense': 'Past', 'VerbForm': 'Part', 'Voice': 'Act'},
'VERB__Aspect=Imp|Case=Dat|Gender=Fem|Number=Sing|Tense=Past|VerbForm=Part|Voice=Mid': {POS: VERB, 'Aspect': 'Imp', 'Case': 'Dat', 'Gender': 'Fem', 'Number': 'Sing', 'Tense': 'Past', 'VerbForm': 'Part', 'Voice': 'Mid'},
'VERB__Aspect=Imp|Case=Dat|Gender=Fem|Number=Sing|Tense=Pres|VerbForm=Part|Voice=Act': {POS: VERB, 'Aspect': 'Imp', 'Case': 'Dat', 'Gender': 'Fem', 'Number': 'Sing', 'Tense': 'Pres', 'VerbForm': 'Part', 'Voice': 'Act'},
'VERB__Aspect=Imp|Case=Dat|Gender=Fem|Number=Sing|Tense=Pres|VerbForm=Part|Voice=Mid': {POS: VERB, 'Aspect': 'Imp', 'Case': 'Dat', 'Gender': 'Fem', 'Number': 'Sing', 'Tense': 'Pres', 'VerbForm': 'Part', 'Voice': 'Mid'},
'VERB__Aspect=Imp|Case=Dat|Gender=Fem|Number=Sing|Tense=Pres|VerbForm=Part|Voice=Pass': {POS: VERB, 'Aspect': 'Imp', 'Case': 'Dat', 'Gender': 'Fem', 'Number': 'Sing', 'Tense': 'Pres', 'VerbForm': 'Part', 'Voice': 'Pass'},
'VERB__Aspect=Imp|Case=Dat|Gender=Masc|Number=Sing|Tense=Past|VerbForm=Part|Voice=Act': {POS: VERB, 'Aspect': 'Imp', 'Case': 'Dat', 'Gender': 'Masc', 'Number': 'Sing', 'Tense': 'Past', 'VerbForm': 'Part', 'Voice': 'Act'},
'VERB__Aspect=Imp|Case=Dat|Gender=Masc|Number=Sing|Tense=Past|VerbForm=Part|Voice=Mid': {POS: VERB, 'Aspect': 'Imp', 'Case': 'Dat', 'Gender': 'Masc', 'Number': 'Sing', 'Tense': 'Past', 'VerbForm': 'Part', 'Voice': 'Mid'},
'VERB__Aspect=Imp|Case=Dat|Gender=Masc|Number=Sing|Tense=Past|VerbForm=Part|Voice=Pass': {POS: VERB, 'Aspect': 'Imp', 'Case': 'Dat', 'Gender': 'Masc', 'Number': 'Sing', 'Tense': 'Past', 'VerbForm': 'Part', 'Voice': 'Pass'},
'VERB__Aspect=Imp|Case=Dat|Gender=Masc|Number=Sing|Tense=Pres|VerbForm=Part|Voice=Act': {POS: VERB, 'Aspect': 'Imp', 'Case': 'Dat', 'Gender': 'Masc', 'Number': 'Sing', 'Tense': 'Pres', 'VerbForm': 'Part', 'Voice': 'Act'},
'VERB__Aspect=Imp|Case=Dat|Gender=Masc|Number=Sing|Tense=Pres|VerbForm=Part|Voice=Mid': {POS: VERB, 'Aspect': 'Imp', 'Case': 'Dat', 'Gender': 'Masc', 'Number': 'Sing', 'Tense': 'Pres', 'VerbForm': 'Part', 'Voice': 'Mid'},
'VERB__Aspect=Imp|Case=Dat|Gender=Masc|Number=Sing|Tense=Pres|VerbForm=Part|Voice=Pass': {POS: VERB, 'Aspect': 'Imp', 'Case': 'Dat', 'Gender': 'Masc', 'Number': 'Sing', 'Tense': 'Pres', 'VerbForm': 'Part', 'Voice': 'Pass'},
'VERB__Aspect=Imp|Case=Dat|Gender=Neut|Number=Sing|Tense=Past|VerbForm=Part|Voice=Pass': {POS: VERB, 'Aspect': 'Imp', 'Case': 'Dat', 'Gender': 'Neut', 'Number': 'Sing', 'Tense': 'Past', 'VerbForm': 'Part', 'Voice': 'Pass'},
'VERB__Aspect=Imp|Case=Dat|Gender=Neut|Number=Sing|Tense=Pres|VerbForm=Part|Voice=Act': {POS: VERB, 'Aspect': 'Imp', 'Case': 'Dat', 'Gender': 'Neut', 'Number': 'Sing', 'Tense': 'Pres', 'VerbForm': 'Part', 'Voice': 'Act'},
'VERB__Aspect=Imp|Case=Dat|Gender=Neut|Number=Sing|Tense=Pres|VerbForm=Part|Voice=Pass': {POS: VERB, 'Aspect': 'Imp', 'Case': 'Dat', 'Gender': 'Neut', 'Number': 'Sing', 'Tense': 'Pres', 'VerbForm': 'Part', 'Voice': 'Pass'},
'VERB__Aspect=Imp|Case=Dat|Number=Plur|Tense=Past|VerbForm=Part|Voice=Act': {POS: VERB, 'Aspect': 'Imp', 'Case': 'Dat', 'Number': 'Plur', 'Tense': 'Past', 'VerbForm': 'Part', 'Voice': 'Act'},
'VERB__Aspect=Imp|Case=Dat|Number=Plur|Tense=Past|VerbForm=Part|Voice=Mid': {POS: VERB, 'Aspect': 'Imp', 'Case': 'Dat', 'Number': 'Plur', 'Tense': 'Past', 'VerbForm': 'Part', 'Voice': 'Mid'},
'VERB__Aspect=Imp|Case=Dat|Number=Plur|Tense=Past|VerbForm=Part|Voice=Pass': {POS: VERB, 'Aspect': 'Imp', 'Case': 'Dat', 'Number': 'Plur', 'Tense': 'Past', 'VerbForm': 'Part', 'Voice': 'Pass'},
'VERB__Aspect=Imp|Case=Dat|Number=Plur|Tense=Pres|VerbForm=Part|Voice=Act': {POS: VERB, 'Aspect': 'Imp', 'Case': 'Dat', 'Number': 'Plur', 'Tense': 'Pres', 'VerbForm': 'Part', 'Voice': 'Act'},
'VERB__Aspect=Imp|Case=Dat|Number=Plur|Tense=Pres|VerbForm=Part|Voice=Mid': {POS: VERB, 'Aspect': 'Imp', 'Case': 'Dat', 'Number': 'Plur', 'Tense': 'Pres', 'VerbForm': 'Part', 'Voice': 'Mid'},
'VERB__Aspect=Imp|Case=Dat|Number=Plur|Tense=Pres|VerbForm=Part|Voice=Pass': {POS: VERB, 'Aspect': 'Imp', 'Case': 'Dat', 'Number': 'Plur', 'Tense': 'Pres', 'VerbForm': 'Part', 'Voice': 'Pass'},
'VERB__Aspect=Imp|Case=Gen|Gender=Fem|Number=Sing|Tense=Past|VerbForm=Part|Voice=Act': {POS: VERB, 'Aspect': 'Imp', 'Case': 'Gen', 'Gender': 'Fem', 'Number': 'Sing', 'Tense': 'Past', 'VerbForm': 'Part', 'Voice': 'Act'},
'VERB__Aspect=Imp|Case=Gen|Gender=Fem|Number=Sing|Tense=Past|VerbForm=Part|Voice=Mid': {POS: VERB, 'Aspect': 'Imp', 'Case': 'Gen', 'Gender': 'Fem', 'Number': 'Sing', 'Tense': 'Past', 'VerbForm': 'Part', 'Voice': 'Mid'},
'VERB__Aspect=Imp|Case=Gen|Gender=Fem|Number=Sing|Tense=Past|VerbForm=Part|Voice=Pass': {POS: VERB, 'Aspect': 'Imp', 'Case': 'Gen', 'Gender': 'Fem', 'Number': 'Sing', 'Tense': 'Past', 'VerbForm': 'Part', 'Voice': 'Pass'},
'VERB__Aspect=Imp|Case=Gen|Gender=Fem|Number=Sing|Tense=Pres|VerbForm=Part|Voice=Act': {POS: VERB, 'Aspect': 'Imp', 'Case': 'Gen', 'Gender': 'Fem', 'Number': 'Sing', 'Tense': 'Pres', 'VerbForm': 'Part', 'Voice': 'Act'},
'VERB__Aspect=Imp|Case=Gen|Gender=Fem|Number=Sing|Tense=Pres|VerbForm=Part|Voice=Mid': {POS: VERB, 'Aspect': 'Imp', 'Case': 'Gen', 'Gender': 'Fem', 'Number': 'Sing', 'Tense': 'Pres', 'VerbForm': 'Part', 'Voice': 'Mid'},
'VERB__Aspect=Imp|Case=Gen|Gender=Fem|Number=Sing|Tense=Pres|VerbForm=Part|Voice=Pass': {POS: VERB, 'Aspect': 'Imp', 'Case': 'Gen', 'Gender': 'Fem', 'Number': 'Sing', 'Tense': 'Pres', 'VerbForm': 'Part', 'Voice': 'Pass'},
'VERB__Aspect=Imp|Case=Gen|Gender=Masc|Number=Sing|Tense=Past|VerbForm=Part|Voice=Act': {POS: VERB, 'Aspect': 'Imp', 'Case': 'Gen', 'Gender': 'Masc', 'Number': 'Sing', 'Tense': 'Past', 'VerbForm': 'Part', 'Voice': 'Act'},
'VERB__Aspect=Imp|Case=Gen|Gender=Masc|Number=Sing|Tense=Past|VerbForm=Part|Voice=Mid': {POS: VERB, 'Aspect': 'Imp', 'Case': 'Gen', 'Gender': 'Masc', 'Number': 'Sing', 'Tense': 'Past', 'VerbForm': 'Part', 'Voice': 'Mid'},
'VERB__Aspect=Imp|Case=Gen|Gender=Masc|Number=Sing|Tense=Past|VerbForm=Part|Voice=Pass': {POS: VERB, 'Aspect': 'Imp', 'Case': 'Gen', 'Gender': 'Masc', 'Number': 'Sing', 'Tense': 'Past', 'VerbForm': 'Part', 'Voice': 'Pass'},
'VERB__Aspect=Imp|Case=Gen|Gender=Masc|Number=Sing|Tense=Pres|VerbForm=Part|Voice=Act': {POS: VERB, 'Aspect': 'Imp', 'Case': 'Gen', 'Gender': 'Masc', 'Number': 'Sing', 'Tense': 'Pres', 'VerbForm': 'Part', 'Voice': 'Act'},
'VERB__Aspect=Imp|Case=Gen|Gender=Masc|Number=Sing|Tense=Pres|VerbForm=Part|Voice=Mid': {POS: VERB, 'Aspect': 'Imp', 'Case': 'Gen', 'Gender': 'Masc', 'Number': 'Sing', 'Tense': 'Pres', 'VerbForm': 'Part', 'Voice': 'Mid'},
'VERB__Aspect=Imp|Case=Gen|Gender=Masc|Number=Sing|Tense=Pres|VerbForm=Part|Voice=Pass': {POS: VERB, 'Aspect': 'Imp', 'Case': 'Gen', 'Gender': 'Masc', 'Number': 'Sing', 'Tense': 'Pres', 'VerbForm': 'Part', 'Voice': 'Pass'},
'VERB__Aspect=Imp|Case=Gen|Gender=Neut|Number=Sing|Tense=Past|VerbForm=Part|Voice=Act': {POS: VERB, 'Aspect': 'Imp', 'Case': 'Gen', 'Gender': 'Neut', 'Number': 'Sing', 'Tense': 'Past', 'VerbForm': 'Part', 'Voice': 'Act'},
'VERB__Aspect=Imp|Case=Gen|Gender=Neut|Number=Sing|Tense=Past|VerbForm=Part|Voice=Mid': {POS: VERB, 'Aspect': 'Imp', 'Case': 'Gen', 'Gender': 'Neut', 'Number': 'Sing', 'Tense': 'Past', 'VerbForm': 'Part', 'Voice': 'Mid'},
'VERB__Aspect=Imp|Case=Gen|Gender=Neut|Number=Sing|Tense=Pres|VerbForm=Part|Voice=Act': {POS: VERB, 'Aspect': 'Imp', 'Case': 'Gen', 'Gender': 'Neut', 'Number': 'Sing', 'Tense': 'Pres', 'VerbForm': 'Part', 'Voice': 'Act'},
'VERB__Aspect=Imp|Case=Gen|Gender=Neut|Number=Sing|Tense=Pres|VerbForm=Part|Voice=Mid': {POS: VERB, 'Aspect': 'Imp', 'Case': 'Gen', 'Gender': 'Neut', 'Number': 'Sing', 'Tense': 'Pres', 'VerbForm': 'Part', 'Voice': 'Mid'},
'VERB__Aspect=Imp|Case=Gen|Gender=Neut|Number=Sing|Tense=Pres|VerbForm=Part|Voice=Pass': {POS: VERB, 'Aspect': 'Imp', 'Case': 'Gen', 'Gender': 'Neut', 'Number': 'Sing', 'Tense': 'Pres', 'VerbForm': 'Part', 'Voice': 'Pass'},
'VERB__Aspect=Imp|Case=Gen|Number=Plur|Tense=Past|VerbForm=Part|Voice=Act': {POS: VERB, 'Aspect': 'Imp', 'Case': 'Gen', 'Number': 'Plur', 'Tense': 'Past', 'VerbForm': 'Part', 'Voice': 'Act'},
'VERB__Aspect=Imp|Case=Gen|Number=Plur|Tense=Past|VerbForm=Part|Voice=Mid': {POS: VERB, 'Aspect': 'Imp', 'Case': 'Gen', 'Number': 'Plur', 'Tense': 'Past', 'VerbForm': 'Part', 'Voice': 'Mid'},
'VERB__Aspect=Imp|Case=Gen|Number=Plur|Tense=Past|VerbForm=Part|Voice=Pass': {POS: VERB, 'Aspect': 'Imp', 'Case': 'Gen', 'Number': 'Plur', 'Tense': 'Past', 'VerbForm': 'Part', 'Voice': 'Pass'},
'VERB__Aspect=Imp|Case=Gen|Number=Plur|Tense=Pres|VerbForm=Part|Voice=Act': {POS: VERB, 'Aspect': 'Imp', 'Case': 'Gen', 'Number': 'Plur', 'Tense': 'Pres', 'VerbForm': 'Part', 'Voice': 'Act'},
'VERB__Aspect=Imp|Case=Gen|Number=Plur|Tense=Pres|VerbForm=Part|Voice=Mid': {POS: VERB, 'Aspect': 'Imp', 'Case': 'Gen', 'Number': 'Plur', 'Tense': 'Pres', 'VerbForm': 'Part', 'Voice': 'Mid'},
'VERB__Aspect=Imp|Case=Gen|Number=Plur|Tense=Pres|VerbForm=Part|Voice=Pass': {POS: VERB, 'Aspect': 'Imp', 'Case': 'Gen', 'Number': 'Plur', 'Tense': 'Pres', 'VerbForm': 'Part', 'Voice': 'Pass'},
'VERB__Aspect=Imp|Case=Ins|Gender=Fem|Number=Sing|Tense=Past|VerbForm=Part|Voice=Act': {POS: VERB, 'Aspect': 'Imp', 'Case': 'Ins', 'Gender': 'Fem', 'Number': 'Sing', 'Tense': 'Past', 'VerbForm': 'Part', 'Voice': 'Act'},
'VERB__Aspect=Imp|Case=Ins|Gender=Fem|Number=Sing|Tense=Past|VerbForm=Part|Voice=Mid': {POS: VERB, 'Aspect': 'Imp', 'Case': 'Ins', 'Gender': 'Fem', 'Number': 'Sing', 'Tense': 'Past', 'VerbForm': 'Part', 'Voice': 'Mid'},
'VERB__Aspect=Imp|Case=Ins|Gender=Fem|Number=Sing|Tense=Pres|VerbForm=Part|Voice=Act': {POS: VERB, 'Aspect': 'Imp', 'Case': 'Ins', 'Gender': 'Fem', 'Number': 'Sing', 'Tense': 'Pres', 'VerbForm': 'Part', 'Voice': 'Act'},
'VERB__Aspect=Imp|Case=Ins|Gender=Fem|Number=Sing|Tense=Pres|VerbForm=Part|Voice=Mid': {POS: VERB, 'Aspect': 'Imp', 'Case': 'Ins', 'Gender': 'Fem', 'Number': 'Sing', 'Tense': 'Pres', 'VerbForm': 'Part', 'Voice': 'Mid'},
'VERB__Aspect=Imp|Case=Ins|Gender=Fem|Number=Sing|Tense=Pres|VerbForm=Part|Voice=Pass': {POS: VERB, 'Aspect': 'Imp', 'Case': 'Ins', 'Gender': 'Fem', 'Number': 'Sing', 'Tense': 'Pres', 'VerbForm': 'Part', 'Voice': 'Pass'},
'VERB__Aspect=Imp|Case=Ins|Gender=Masc|Number=Sing|Tense=Past|VerbForm=Part|Voice=Act': {POS: VERB, 'Aspect': 'Imp', 'Case': 'Ins', 'Gender': 'Masc', 'Number': 'Sing', 'Tense': 'Past', 'VerbForm': 'Part', 'Voice': 'Act'},
'VERB__Aspect=Imp|Case=Ins|Gender=Masc|Number=Sing|Tense=Past|VerbForm=Part|Voice=Mid': {POS: VERB, 'Aspect': 'Imp', 'Case': 'Ins', 'Gender': 'Masc', 'Number': 'Sing', 'Tense': 'Past', 'VerbForm': 'Part', 'Voice': 'Mid'},
'VERB__Aspect=Imp|Case=Ins|Gender=Masc|Number=Sing|Tense=Pres|VerbForm=Part|Voice=Act': {POS: VERB, 'Aspect': 'Imp', 'Case': 'Ins', 'Gender': 'Masc', 'Number': 'Sing', 'Tense': 'Pres', 'VerbForm': 'Part', 'Voice': 'Act'},
'VERB__Aspect=Imp|Case=Ins|Gender=Masc|Number=Sing|Tense=Pres|VerbForm=Part|Voice=Mid': {POS: VERB, 'Aspect': 'Imp', 'Case': 'Ins', 'Gender': 'Masc', 'Number': 'Sing', 'Tense': 'Pres', 'VerbForm': 'Part', 'Voice': 'Mid'},
'VERB__Aspect=Imp|Case=Ins|Gender=Masc|Number=Sing|Tense=Pres|VerbForm=Part|Voice=Pass': {POS: VERB, 'Aspect': 'Imp', 'Case': 'Ins', 'Gender': 'Masc', 'Number': 'Sing', 'Tense': 'Pres', 'VerbForm': 'Part', 'Voice': 'Pass'},
'VERB__Aspect=Imp|Case=Ins|Gender=Neut|Number=Sing|Tense=Past|VerbForm=Part|Voice=Act': {POS: VERB, 'Aspect': 'Imp', 'Case': 'Ins', 'Gender': 'Neut', 'Number': 'Sing', 'Tense': 'Past', 'VerbForm': 'Part', 'Voice': 'Act'},
'VERB__Aspect=Imp|Case=Ins|Gender=Neut|Number=Sing|Tense=Pres|VerbForm=Part|Voice=Act': {POS: VERB, 'Aspect': 'Imp', 'Case': 'Ins', 'Gender': 'Neut', 'Number': 'Sing', 'Tense': 'Pres', 'VerbForm': 'Part', 'Voice': 'Act'},
'VERB__Aspect=Imp|Case=Ins|Gender=Neut|Number=Sing|Tense=Pres|VerbForm=Part|Voice=Mid': {POS: VERB, 'Aspect': 'Imp', 'Case': 'Ins', 'Gender': 'Neut', 'Number': 'Sing', 'Tense': 'Pres', 'VerbForm': 'Part', 'Voice': 'Mid'},
'VERB__Aspect=Imp|Case=Ins|Gender=Neut|Number=Sing|Tense=Pres|VerbForm=Part|Voice=Pass': {POS: VERB, 'Aspect': 'Imp', 'Case': 'Ins', 'Gender': 'Neut', 'Number': 'Sing', 'Tense': 'Pres', 'VerbForm': 'Part', 'Voice': 'Pass'},
'VERB__Aspect=Imp|Case=Ins|Number=Plur|Tense=Past|VerbForm=Part|Voice=Act': {POS: VERB, 'Aspect': 'Imp', 'Case': 'Ins', 'Number': 'Plur', 'Tense': 'Past', 'VerbForm': 'Part', 'Voice': 'Act'},
'VERB__Aspect=Imp|Case=Ins|Number=Plur|Tense=Past|VerbForm=Part|Voice=Pass': {POS: VERB, 'Aspect': 'Imp', 'Case': 'Ins', 'Number': 'Plur', 'Tense': 'Past', 'VerbForm': 'Part', 'Voice': 'Pass'},
'VERB__Aspect=Imp|Case=Ins|Number=Plur|Tense=Pres|VerbForm=Part|Voice=Act': {POS: VERB, 'Aspect': 'Imp', 'Case': 'Ins', 'Number': 'Plur', 'Tense': 'Pres', 'VerbForm': 'Part', 'Voice': 'Act'},
'VERB__Aspect=Imp|Case=Ins|Number=Plur|Tense=Pres|VerbForm=Part|Voice=Mid': {POS: VERB, 'Aspect': 'Imp', 'Case': 'Ins', 'Number': 'Plur', 'Tense': 'Pres', 'VerbForm': 'Part', 'Voice': 'Mid'},
'VERB__Aspect=Imp|Case=Ins|Number=Plur|Tense=Pres|VerbForm=Part|Voice=Pass': {POS: VERB, 'Aspect': 'Imp', 'Case': 'Ins', 'Number': 'Plur', 'Tense': 'Pres', 'VerbForm': 'Part', 'Voice': 'Pass'},
'VERB__Aspect=Imp|Case=Loc|Gender=Fem|Number=Sing|Tense=Past|VerbForm=Part|Voice=Act': {POS: VERB, 'Aspect': 'Imp', 'Case': 'Loc', 'Gender': 'Fem', 'Number': 'Sing', 'Tense': 'Past', 'VerbForm': 'Part', 'Voice': 'Act'},
'VERB__Aspect=Imp|Case=Loc|Gender=Fem|Number=Sing|Tense=Past|VerbForm=Part|Voice=Pass': {POS: VERB, 'Aspect': 'Imp', 'Case': 'Loc', 'Gender': 'Fem', 'Number': 'Sing', 'Tense': 'Past', 'VerbForm': 'Part', 'Voice': 'Pass'},
'VERB__Aspect=Imp|Case=Loc|Gender=Fem|Number=Sing|Tense=Pres|VerbForm=Part|Voice=Act': {POS: VERB, 'Aspect': 'Imp', 'Case': 'Loc', 'Gender': 'Fem', 'Number': 'Sing', 'Tense': 'Pres', 'VerbForm': 'Part', 'Voice': 'Act'},
'VERB__Aspect=Imp|Case=Loc|Gender=Fem|Number=Sing|Tense=Pres|VerbForm=Part|Voice=Mid': {POS: VERB, 'Aspect': 'Imp', 'Case': 'Loc', 'Gender': 'Fem', 'Number': 'Sing', 'Tense': 'Pres', 'VerbForm': 'Part', 'Voice': 'Mid'},
'VERB__Aspect=Imp|Case=Loc|Gender=Fem|Number=Sing|Tense=Pres|VerbForm=Part|Voice=Pass': {POS: VERB, 'Aspect': 'Imp', 'Case': 'Loc', 'Gender': 'Fem', 'Number': 'Sing', 'Tense': 'Pres', 'VerbForm': 'Part', 'Voice': 'Pass'},
'VERB__Aspect=Imp|Case=Loc|Gender=Masc|Number=Sing|Tense=Past|VerbForm=Part|Voice=Act': {POS: VERB, 'Aspect': 'Imp', 'Case': 'Loc', 'Gender': 'Masc', 'Number': 'Sing', 'Tense': 'Past', 'VerbForm': 'Part', 'Voice': 'Act'},
'VERB__Aspect=Imp|Case=Loc|Gender=Masc|Number=Sing|Tense=Past|VerbForm=Part|Voice=Mid': {POS: VERB, 'Aspect': 'Imp', 'Case': 'Loc', 'Gender': 'Masc', 'Number': 'Sing', 'Tense': 'Past', 'VerbForm': 'Part', 'Voice': 'Mid'},
'VERB__Aspect=Imp|Case=Loc|Gender=Masc|Number=Sing|Tense=Pres|VerbForm=Part|Voice=Act': {POS: VERB, 'Aspect': 'Imp', 'Case': 'Loc', 'Gender': 'Masc', 'Number': 'Sing', 'Tense': 'Pres', 'VerbForm': 'Part', 'Voice': 'Act'},
'VERB__Aspect=Imp|Case=Loc|Gender=Masc|Number=Sing|Tense=Pres|VerbForm=Part|Voice=Mid': {POS: VERB, 'Aspect': 'Imp', 'Case': 'Loc', 'Gender': 'Masc', 'Number': 'Sing', 'Tense': 'Pres', 'VerbForm': 'Part', 'Voice': 'Mid'},
'VERB__Aspect=Imp|Case=Loc|Gender=Masc|Number=Sing|Tense=Pres|VerbForm=Part|Voice=Pass': {POS: VERB, 'Aspect': 'Imp', 'Case': 'Loc', 'Gender': 'Masc', 'Number': 'Sing', 'Tense': 'Pres', 'VerbForm': 'Part', 'Voice': 'Pass'},
'VERB__Aspect=Imp|Case=Loc|Gender=Neut|Number=Sing|Tense=Past|VerbForm=Part|Voice=Act': {POS: VERB, 'Aspect': 'Imp', 'Case': 'Loc', 'Gender': 'Neut', 'Number': 'Sing', 'Tense': 'Past', 'VerbForm': 'Part', 'Voice': 'Act'},
'VERB__Aspect=Imp|Case=Loc|Gender=Neut|Number=Sing|Tense=Past|VerbForm=Part|Voice=Mid': {POS: VERB, 'Aspect': 'Imp', 'Case': 'Loc', 'Gender': 'Neut', 'Number': 'Sing', 'Tense': 'Past', 'VerbForm': 'Part', 'Voice': 'Mid'},
'VERB__Aspect=Imp|Case=Loc|Gender=Neut|Number=Sing|Tense=Pres|VerbForm=Part|Voice=Act': {POS: VERB, 'Aspect': 'Imp', 'Case': 'Loc', 'Gender': 'Neut', 'Number': 'Sing', 'Tense': 'Pres', 'VerbForm': 'Part', 'Voice': 'Act'},
'VERB__Aspect=Imp|Case=Loc|Gender=Neut|Number=Sing|Tense=Pres|VerbForm=Part|Voice=Mid': {POS: VERB, 'Aspect': 'Imp', 'Case': 'Loc', 'Gender': 'Neut', 'Number': 'Sing', 'Tense': 'Pres', 'VerbForm': 'Part', 'Voice': 'Mid'},
'VERB__Aspect=Imp|Case=Loc|Gender=Neut|Number=Sing|Tense=Pres|VerbForm=Part|Voice=Pass': {POS: VERB, 'Aspect': 'Imp', 'Case': 'Loc', 'Gender': 'Neut', 'Number': 'Sing', 'Tense': 'Pres', 'VerbForm': 'Part', 'Voice': 'Pass'},
'VERB__Aspect=Imp|Case=Loc|Number=Plur|Tense=Past|VerbForm=Part|Voice=Act': {POS: VERB, 'Aspect': 'Imp', 'Case': 'Loc', 'Number': 'Plur', 'Tense': 'Past', 'VerbForm': 'Part', 'Voice': 'Act'},
'VERB__Aspect=Imp|Case=Loc|Number=Plur|Tense=Past|VerbForm=Part|Voice=Mid': {POS: VERB, 'Aspect': 'Imp', 'Case': 'Loc', 'Number': 'Plur', 'Tense': 'Past', 'VerbForm': 'Part', 'Voice': 'Mid'},
'VERB__Aspect=Imp|Case=Loc|Number=Plur|Tense=Past|VerbForm=Part|Voice=Pass': {POS: VERB, 'Aspect': 'Imp', 'Case': 'Loc', 'Number': 'Plur', 'Tense': 'Past', 'VerbForm': 'Part', 'Voice': 'Pass'},
'VERB__Aspect=Imp|Case=Loc|Number=Plur|Tense=Pres|VerbForm=Part|Voice=Act': {POS: VERB, 'Aspect': 'Imp', 'Case': 'Loc', 'Number': 'Plur', 'Tense': 'Pres', 'VerbForm': 'Part', 'Voice': 'Act'},
'VERB__Aspect=Imp|Case=Loc|Number=Plur|Tense=Pres|VerbForm=Part|Voice=Mid': {POS: VERB, 'Aspect': 'Imp', 'Case': 'Loc', 'Number': 'Plur', 'Tense': 'Pres', 'VerbForm': 'Part', 'Voice': 'Mid'},
'VERB__Aspect=Imp|Case=Loc|Number=Plur|Tense=Pres|VerbForm=Part|Voice=Pass': {POS: VERB, 'Aspect': 'Imp', 'Case': 'Loc', 'Number': 'Plur', 'Tense': 'Pres', 'VerbForm': 'Part', 'Voice': 'Pass'},
'VERB__Aspect=Imp|Case=Nom|Gender=Fem|Number=Sing|Tense=Past|VerbForm=Part|Voice=Act': {POS: VERB, 'Aspect': 'Imp', 'Case': 'Nom', 'Gender': 'Fem', 'Number': 'Sing', 'Tense': 'Past', 'VerbForm': 'Part', 'Voice': 'Act'},
'VERB__Aspect=Imp|Case=Nom|Gender=Fem|Number=Sing|Tense=Past|VerbForm=Part|Voice=Mid': {POS: VERB, 'Aspect': 'Imp', 'Case': 'Nom', 'Gender': 'Fem', 'Number': 'Sing', 'Tense': 'Past', 'VerbForm': 'Part', 'Voice': 'Mid'},
'VERB__Aspect=Imp|Case=Nom|Gender=Fem|Number=Sing|Tense=Past|VerbForm=Part|Voice=Pass': {POS: VERB, 'Aspect': 'Imp', 'Case': 'Nom', 'Gender': 'Fem', 'Number': 'Sing', 'Tense': 'Past', 'VerbForm': 'Part', 'Voice': 'Pass'},
'VERB__Aspect=Imp|Case=Nom|Gender=Fem|Number=Sing|Tense=Pres|VerbForm=Part|Voice=Act': {POS: VERB, 'Aspect': 'Imp', 'Case': 'Nom', 'Gender': 'Fem', 'Number': 'Sing', 'Tense': 'Pres', 'VerbForm': 'Part', 'Voice': 'Act'},
'VERB__Aspect=Imp|Case=Nom|Gender=Fem|Number=Sing|Tense=Pres|VerbForm=Part|Voice=Mid': {POS: VERB, 'Aspect': 'Imp', 'Case': 'Nom', 'Gender': 'Fem', 'Number': 'Sing', 'Tense': 'Pres', 'VerbForm': 'Part', 'Voice': 'Mid'},
'VERB__Aspect=Imp|Case=Nom|Gender=Fem|Number=Sing|Tense=Pres|VerbForm=Part|Voice=Pass': {POS: VERB, 'Aspect': 'Imp', 'Case': 'Nom', 'Gender': 'Fem', 'Number': 'Sing', 'Tense': 'Pres', 'VerbForm': 'Part', 'Voice': 'Pass'},
'VERB__Aspect=Imp|Case=Nom|Gender=Masc|Number=Sing|Tense=Past|VerbForm=Part|Voice=Act': {POS: VERB, 'Aspect': 'Imp', 'Case': 'Nom', 'Gender': 'Masc', 'Number': 'Sing', 'Tense': 'Past', 'VerbForm': 'Part', 'Voice': 'Act'},
'VERB__Aspect=Imp|Case=Nom|Gender=Masc|Number=Sing|Tense=Past|VerbForm=Part|Voice=Mid': {POS: VERB, 'Aspect': 'Imp', 'Case': 'Nom', 'Gender': 'Masc', 'Number': 'Sing', 'Tense': 'Past', 'VerbForm': 'Part', 'Voice': 'Mid'},
'VERB__Aspect=Imp|Case=Nom|Gender=Masc|Number=Sing|Tense=Past|VerbForm=Part|Voice=Pass': {POS: VERB, 'Aspect': 'Imp', 'Case': 'Nom', 'Gender': 'Masc', 'Number': 'Sing', 'Tense': 'Past', 'VerbForm': 'Part', 'Voice': 'Pass'},
'VERB__Aspect=Imp|Case=Nom|Gender=Masc|Number=Sing|Tense=Pres|VerbForm=Part|Voice=Act': {POS: VERB, 'Aspect': 'Imp', 'Case': 'Nom', 'Gender': 'Masc', 'Number': 'Sing', 'Tense': 'Pres', 'VerbForm': 'Part', 'Voice': 'Act'},
'VERB__Aspect=Imp|Case=Nom|Gender=Masc|Number=Sing|Tense=Pres|VerbForm=Part|Voice=Mid': {POS: VERB, 'Aspect': 'Imp', 'Case': 'Nom', 'Gender': 'Masc', 'Number': 'Sing', 'Tense': 'Pres', 'VerbForm': 'Part', 'Voice': 'Mid'},
'VERB__Aspect=Imp|Case=Nom|Gender=Masc|Number=Sing|Tense=Pres|VerbForm=Part|Voice=Pass': {POS: VERB, 'Aspect': 'Imp', 'Case': 'Nom', 'Gender': 'Masc', 'Number': 'Sing', 'Tense': 'Pres', 'VerbForm': 'Part', 'Voice': 'Pass'},
'VERB__Aspect=Imp|Case=Nom|Gender=Neut|Number=Sing|Tense=Past|VerbForm=Part|Voice=Act': {POS: VERB, 'Aspect': 'Imp', 'Case': 'Nom', 'Gender': 'Neut', 'Number': 'Sing', 'Tense': 'Past', 'VerbForm': 'Part', 'Voice': 'Act'},
'VERB__Aspect=Imp|Case=Nom|Gender=Neut|Number=Sing|Tense=Past|VerbForm=Part|Voice=Pass': {POS: VERB, 'Aspect': 'Imp', 'Case': 'Nom', 'Gender': 'Neut', 'Number': 'Sing', 'Tense': 'Past', 'VerbForm': 'Part', 'Voice': 'Pass'},
'VERB__Aspect=Imp|Case=Nom|Gender=Neut|Number=Sing|Tense=Pres|VerbForm=Part|Voice=Act': {POS: VERB, 'Aspect': 'Imp', 'Case': 'Nom', 'Gender': 'Neut', 'Number': 'Sing', 'Tense': 'Pres', 'VerbForm': 'Part', 'Voice': 'Act'},
'VERB__Aspect=Imp|Case=Nom|Gender=Neut|Number=Sing|Tense=Pres|VerbForm=Part|Voice=Mid': {POS: VERB, 'Aspect': 'Imp', 'Case': 'Nom', 'Gender': 'Neut', 'Number': 'Sing', 'Tense': 'Pres', 'VerbForm': 'Part', 'Voice': 'Mid'},
'VERB__Aspect=Imp|Case=Nom|Gender=Neut|Number=Sing|Tense=Pres|VerbForm=Part|Voice=Pass': {POS: VERB, 'Aspect': 'Imp', 'Case': 'Nom', 'Gender': 'Neut', 'Number': 'Sing', 'Tense': 'Pres', 'VerbForm': 'Part', 'Voice': 'Pass'},
'VERB__Aspect=Imp|Case=Nom|Number=Plur|Tense=Past|VerbForm=Part|Voice=Act': {POS: VERB, 'Aspect': 'Imp', 'Case': 'Nom', 'Number': 'Plur', 'Tense': 'Past', 'VerbForm': 'Part', 'Voice': 'Act'},
'VERB__Aspect=Imp|Case=Nom|Number=Plur|Tense=Past|VerbForm=Part|Voice=Mid': {POS: VERB, 'Aspect': 'Imp', 'Case': 'Nom', 'Number': 'Plur', 'Tense': 'Past', 'VerbForm': 'Part', 'Voice': 'Mid'},
'VERB__Aspect=Imp|Case=Nom|Number=Plur|Tense=Past|VerbForm=Part|Voice=Pass': {POS: VERB, 'Aspect': 'Imp', 'Case': 'Nom', 'Number': 'Plur', 'Tense': 'Past', 'VerbForm': 'Part', 'Voice': 'Pass'},
'VERB__Aspect=Imp|Case=Nom|Number=Plur|Tense=Pres|VerbForm=Part|Voice=Act': {POS: VERB, 'Aspect': 'Imp', 'Case': 'Nom', 'Number': 'Plur', 'Tense': 'Pres', 'VerbForm': 'Part', 'Voice': 'Act'},
'VERB__Aspect=Imp|Case=Nom|Number=Plur|Tense=Pres|VerbForm=Part|Voice=Mid': {POS: VERB, 'Aspect': 'Imp', 'Case': 'Nom', 'Number': 'Plur', 'Tense': 'Pres', 'VerbForm': 'Part', 'Voice': 'Mid'},
'VERB__Aspect=Imp|Case=Nom|Number=Plur|Tense=Pres|VerbForm=Part|Voice=Pass': {POS: VERB, 'Aspect': 'Imp', 'Case': 'Nom', 'Number': 'Plur', 'Tense': 'Pres', 'VerbForm': 'Part', 'Voice': 'Pass'},
'VERB__Aspect=Imp|Gender=Fem|Mood=Ind|Number=Sing|Tense=Past|VerbForm=Fin|Voice=Act': {POS: VERB, 'Aspect': 'Imp', 'Gender': 'Fem', 'Mood': 'Ind', 'Number': 'Sing', 'Tense': 'Past', 'VerbForm': 'Fin', 'Voice': 'Act'},
'VERB__Aspect=Imp|Gender=Fem|Mood=Ind|Number=Sing|Tense=Past|VerbForm=Fin|Voice=Mid': {POS: VERB, 'Aspect': 'Imp', 'Gender': 'Fem', 'Mood': 'Ind', 'Number': 'Sing', 'Tense': 'Past', 'VerbForm': 'Fin', 'Voice': 'Mid'},
'VERB__Aspect=Imp|Gender=Fem|Mood=Ind|Number=Sing|Tense=Past|VerbForm=Fin|Voice=Pass': {POS: VERB, 'Aspect': 'Imp', 'Gender': 'Fem', 'Mood': 'Ind', 'Number': 'Sing', 'Tense': 'Past', 'VerbForm': 'Fin', 'Voice': 'Pass'},
'VERB__Aspect=Imp|Gender=Fem|Number=Sing|Tense=Past|Variant=Short|VerbForm=Part|Voice=Pass': {POS: VERB, 'Aspect': 'Imp', 'Gender': 'Fem', 'Number': 'Sing', 'Tense': 'Past', 'Variant': 'Short', 'VerbForm': 'Part', 'Voice': 'Pass'},
'VERB__Aspect=Imp|Gender=Fem|Number=Sing|Tense=Pres|Variant=Short|VerbForm=Part|Voice=Pass': {POS: VERB, 'Aspect': 'Imp', 'Gender': 'Fem', 'Number': 'Sing', 'Tense': 'Pres', 'Variant': 'Short', 'VerbForm': 'Part', 'Voice': 'Pass'},
'VERB__Aspect=Imp|Gender=Masc|Mood=Ind|Number=Sing|Tense=Past|VerbForm=Fin|Voice=Act': {POS: VERB, 'Aspect': 'Imp', 'Gender': 'Masc', 'Mood': 'Ind', 'Number': 'Sing', 'Tense': 'Past', 'VerbForm': 'Fin', 'Voice': 'Act'},
'VERB__Aspect=Imp|Gender=Masc|Mood=Ind|Number=Sing|Tense=Past|VerbForm=Fin|Voice=Mid': {POS: VERB, 'Aspect': 'Imp', 'Gender': 'Masc', 'Mood': 'Ind', 'Number': 'Sing', 'Tense': 'Past', 'VerbForm': 'Fin', 'Voice': 'Mid'},
'VERB__Aspect=Imp|Gender=Masc|Mood=Ind|Number=Sing|Tense=Past|VerbForm=Fin|Voice=Pass': {POS: VERB, 'Aspect': 'Imp', 'Gender': 'Masc', 'Mood': 'Ind', 'Number': 'Sing', 'Tense': 'Past', 'VerbForm': 'Fin', 'Voice': 'Pass'},
'VERB__Aspect=Imp|Gender=Masc|Number=Sing|Tense=Past|Variant=Short|VerbForm=Part|Voice=Pass': {POS: VERB, 'Aspect': 'Imp', 'Gender': 'Masc', 'Number': 'Sing', 'Tense': 'Past', 'Variant': 'Short', 'VerbForm': 'Part', 'Voice': 'Pass'},
'VERB__Aspect=Imp|Gender=Masc|Number=Sing|Tense=Pres|Variant=Short|VerbForm=Part|Voice=Pass': {POS: VERB, 'Aspect': 'Imp', 'Gender': 'Masc', 'Number': 'Sing', 'Tense': 'Pres', 'Variant': 'Short', 'VerbForm': 'Part', 'Voice': 'Pass'},
'VERB__Aspect=Imp|Gender=Neut|Mood=Ind|Number=Sing|Tense=Past|VerbForm=Fin|Voice=Act': {POS: VERB, 'Aspect': 'Imp', 'Gender': 'Neut', 'Mood': 'Ind', 'Number': 'Sing', 'Tense': 'Past', 'VerbForm': 'Fin', 'Voice': 'Act'},
'VERB__Aspect=Imp|Gender=Neut|Mood=Ind|Number=Sing|Tense=Past|VerbForm=Fin|Voice=Mid': {POS: VERB, 'Aspect': 'Imp', 'Gender': 'Neut', 'Mood': 'Ind', 'Number': 'Sing', 'Tense': 'Past', 'VerbForm': 'Fin', 'Voice': 'Mid'},
'VERB__Aspect=Imp|Gender=Neut|Mood=Ind|Number=Sing|Tense=Past|VerbForm=Fin|Voice=Pass': {POS: VERB, 'Aspect': 'Imp', 'Gender': 'Neut', 'Mood': 'Ind', 'Number': 'Sing', 'Tense': 'Past', 'VerbForm': 'Fin', 'Voice': 'Pass'},
'VERB__Aspect=Imp|Gender=Neut|Number=Sing|Tense=Past|Variant=Short|VerbForm=Part|Voice=Pass': {POS: VERB, 'Aspect': 'Imp', 'Gender': 'Neut', 'Number': 'Sing', 'Tense': 'Past', 'Variant': 'Short', 'VerbForm': 'Part', 'Voice': 'Pass'},
'VERB__Aspect=Imp|Gender=Neut|Number=Sing|Tense=Pres|Variant=Short|VerbForm=Part|Voice=Pass': {POS: VERB, 'Aspect': 'Imp', 'Gender': 'Neut', 'Number': 'Sing', 'Tense': 'Pres', 'Variant': 'Short', 'VerbForm': 'Part', 'Voice': 'Pass'},
'VERB__Aspect=Imp|Mood=Imp|Number=Plur|Person=2|VerbForm=Fin|Voice=Act': {POS: VERB, 'Aspect': 'Imp', 'Mood': 'Imp', 'Number': 'Plur', 'Person': '2', 'VerbForm': 'Fin', 'Voice': 'Act'},
'VERB__Aspect=Imp|Mood=Imp|Number=Plur|Person=2|VerbForm=Fin|Voice=Mid': {POS: VERB, 'Aspect': 'Imp', 'Mood': 'Imp', 'Number': 'Plur', 'Person': '2', 'VerbForm': 'Fin', 'Voice': 'Mid'},
'VERB__Aspect=Imp|Mood=Imp|Number=Sing|Person=2|VerbForm=Fin|Voice=Act': {POS: VERB, 'Aspect': 'Imp', 'Mood': 'Imp', 'Number': 'Sing', 'Person': '2', 'VerbForm': 'Fin', 'Voice': 'Act'},
'VERB__Aspect=Imp|Mood=Imp|Number=Sing|Person=2|VerbForm=Fin|Voice=Mid': {POS: VERB, 'Aspect': 'Imp', 'Mood': 'Imp', 'Number': 'Sing', 'Person': '2', 'VerbForm': 'Fin', 'Voice': 'Mid'},
'VERB__Aspect=Imp|Mood=Ind|Number=Plur|Person=1|Tense=Pres|VerbForm=Fin|Voice=Act': {POS: VERB, 'Aspect': 'Imp', 'Mood': 'Ind', 'Number': 'Plur', 'Person': '1', 'Tense': 'Pres', 'VerbForm': 'Fin', 'Voice': 'Act'},
'VERB__Aspect=Imp|Mood=Ind|Number=Plur|Person=1|Tense=Pres|VerbForm=Fin|Voice=Mid': {POS: VERB, 'Aspect': 'Imp', 'Mood': 'Ind', 'Number': 'Plur', 'Person': '1', 'Tense': 'Pres', 'VerbForm': 'Fin', 'Voice': 'Mid'},
'VERB__Aspect=Imp|Mood=Ind|Number=Plur|Person=2|Tense=Pres|VerbForm=Fin|Voice=Act': {POS: VERB, 'Aspect': 'Imp', 'Mood': 'Ind', 'Number': 'Plur', 'Person': '2', 'Tense': 'Pres', 'VerbForm': 'Fin', 'Voice': 'Act'},
'VERB__Aspect=Imp|Mood=Ind|Number=Plur|Person=2|Tense=Pres|VerbForm=Fin|Voice=Mid': {POS: VERB, 'Aspect': 'Imp', 'Mood': 'Ind', 'Number': 'Plur', 'Person': '2', 'Tense': 'Pres', 'VerbForm': 'Fin', 'Voice': 'Mid'},
'VERB__Aspect=Imp|Mood=Ind|Number=Plur|Person=3|Tense=Pres|VerbForm=Fin|Voice=Act': {POS: VERB, 'Aspect': 'Imp', 'Mood': 'Ind', 'Number': 'Plur', 'Person': '3', 'Tense': 'Pres', 'VerbForm': 'Fin', 'Voice': 'Act'},
'VERB__Aspect=Imp|Mood=Ind|Number=Plur|Person=3|Tense=Pres|VerbForm=Fin|Voice=Mid': {POS: VERB, 'Aspect': 'Imp', 'Mood': 'Ind', 'Number': 'Plur', 'Person': '3', 'Tense': 'Pres', 'VerbForm': 'Fin', 'Voice': 'Mid'},
'VERB__Aspect=Imp|Mood=Ind|Number=Plur|Person=3|Tense=Pres|VerbForm=Fin|Voice=Pass': {POS: VERB, 'Aspect': 'Imp', 'Mood': 'Ind', 'Number': 'Plur', 'Person': '3', 'Tense': 'Pres', 'VerbForm': 'Fin', 'Voice': 'Pass'},
'VERB__Aspect=Imp|Mood=Ind|Number=Plur|Tense=Past|VerbForm=Fin|Voice=Act': {POS: VERB, 'Aspect': 'Imp', 'Mood': 'Ind', 'Number': 'Plur', 'Tense': 'Past', 'VerbForm': 'Fin', 'Voice': 'Act'},
'VERB__Aspect=Imp|Mood=Ind|Number=Plur|Tense=Past|VerbForm=Fin|Voice=Mid': {POS: VERB, 'Aspect': 'Imp', 'Mood': 'Ind', 'Number': 'Plur', 'Tense': 'Past', 'VerbForm': 'Fin', 'Voice': 'Mid'},
'VERB__Aspect=Imp|Mood=Ind|Number=Plur|Tense=Past|VerbForm=Fin|Voice=Pass': {POS: VERB, 'Aspect': 'Imp', 'Mood': 'Ind', 'Number': 'Plur', 'Tense': 'Past', 'VerbForm': 'Fin', 'Voice': 'Pass'},
'VERB__Aspect=Imp|Mood=Ind|Number=Sing|Person=1|Tense=Pres|VerbForm=Fin|Voice=Act': {POS: VERB, 'Aspect': 'Imp', 'Mood': 'Ind', 'Number': 'Sing', 'Person': '1', 'Tense': 'Pres', 'VerbForm': 'Fin', 'Voice': 'Act'},
'VERB__Aspect=Imp|Mood=Ind|Number=Sing|Person=1|Tense=Pres|VerbForm=Fin|Voice=Mid': {POS: VERB, 'Aspect': 'Imp', 'Mood': 'Ind', 'Number': 'Sing', 'Person': '1', 'Tense': 'Pres', 'VerbForm': 'Fin', 'Voice': 'Mid'},
'VERB__Aspect=Imp|Mood=Ind|Number=Sing|Person=2|Tense=Pres|VerbForm=Fin|Voice=Act': {POS: VERB, 'Aspect': 'Imp', 'Mood': 'Ind', 'Number': 'Sing', 'Person': '2', 'Tense': 'Pres', 'VerbForm': 'Fin', 'Voice': 'Act'},
'VERB__Aspect=Imp|Mood=Ind|Number=Sing|Person=2|Tense=Pres|VerbForm=Fin|Voice=Mid': {POS: VERB, 'Aspect': 'Imp', 'Mood': 'Ind', 'Number': 'Sing', 'Person': '2', 'Tense': 'Pres', 'VerbForm': 'Fin', 'Voice': 'Mid'},
'VERB__Aspect=Imp|Mood=Ind|Number=Sing|Person=3|Tense=Pres|VerbForm=Fin|Voice=Act': {POS: VERB, 'Aspect': 'Imp', 'Mood': 'Ind', 'Number': 'Sing', 'Person': '3', 'Tense': 'Pres', 'VerbForm': 'Fin', 'Voice': 'Act'},
'VERB__Aspect=Imp|Mood=Ind|Number=Sing|Person=3|Tense=Pres|VerbForm=Fin|Voice=Mid': {POS: VERB, 'Aspect': 'Imp', 'Mood': 'Ind', 'Number': 'Sing', 'Person': '3', 'Tense': 'Pres', 'VerbForm': 'Fin', 'Voice': 'Mid'},
'VERB__Aspect=Imp|Mood=Ind|Number=Sing|Person=3|Tense=Pres|VerbForm=Fin|Voice=Pass': {POS: VERB, 'Aspect': 'Imp', 'Mood': 'Ind', 'Number': 'Sing', 'Person': '3', 'Tense': 'Pres', 'VerbForm': 'Fin', 'Voice': 'Pass'},
'VERB__Aspect=Imp|Number=Plur|Tense=Past|Variant=Short|VerbForm=Part|Voice=Pass': {POS: VERB, 'Aspect': 'Imp', 'Number': 'Plur', 'Tense': 'Past', 'Variant': 'Short', 'VerbForm': 'Part', 'Voice': 'Pass'},
'VERB__Aspect=Imp|Number=Plur|Tense=Pres|Variant=Short|VerbForm=Part|Voice=Pass': {POS: VERB, 'Aspect': 'Imp', 'Number': 'Plur', 'Tense': 'Pres', 'Variant': 'Short', 'VerbForm': 'Part', 'Voice': 'Pass'},
'VERB__Aspect=Imp|Tense=Past|VerbForm=Conv|Voice=Act': {POS: VERB, 'Aspect': 'Imp', 'Tense': 'Past', 'VerbForm': 'Conv', 'Voice': 'Act'},
'VERB__Aspect=Imp|Tense=Pres|VerbForm=Conv|Voice=Act': {POS: VERB, 'Aspect': 'Imp', 'Tense': 'Pres', 'VerbForm': 'Conv', 'Voice': 'Act'},
'VERB__Aspect=Imp|Tense=Pres|VerbForm=Conv|Voice=Mid': {POS: VERB, 'Aspect': 'Imp', 'Tense': 'Pres', 'VerbForm': 'Conv', 'Voice': 'Mid'},
'VERB__Aspect=Imp|Tense=Pres|VerbForm=Conv|Voice=Pass': {POS: VERB, 'Aspect': 'Imp', 'Tense': 'Pres', 'VerbForm': 'Conv', 'Voice': 'Pass'},
'VERB__Aspect=Imp|VerbForm=Inf|Voice=Act': {POS: VERB, 'Aspect': 'Imp', 'VerbForm': 'Inf', 'Voice': 'Act'},
'VERB__Aspect=Imp|VerbForm=Inf|Voice=Mid': {POS: VERB, 'Aspect': 'Imp', 'VerbForm': 'Inf', 'Voice': 'Mid'},
'VERB__Aspect=Imp|VerbForm=Inf|Voice=Pass': {POS: VERB, 'Aspect': 'Imp', 'VerbForm': 'Inf', 'Voice': 'Pass'},
'VERB__Aspect=Perf|Case=Acc|Gender=Fem|Number=Sing|Tense=Past|VerbForm=Part|Voice=Act': {POS: VERB, 'Aspect': 'Perf', 'Case': 'Acc', 'Gender': 'Fem', 'Number': 'Sing', 'Tense': 'Past', 'VerbForm': 'Part', 'Voice': 'Act'},
'VERB__Aspect=Perf|Case=Acc|Gender=Fem|Number=Sing|Tense=Past|VerbForm=Part|Voice=Mid': {POS: VERB, 'Aspect': 'Perf', 'Case': 'Acc', 'Gender': 'Fem', 'Number': 'Sing', 'Tense': 'Past', 'VerbForm': 'Part', 'Voice': 'Mid'},
'VERB__Aspect=Perf|Case=Acc|Gender=Fem|Number=Sing|Tense=Past|VerbForm=Part|Voice=Pass': {POS: VERB, 'Aspect': 'Perf', 'Case': 'Acc', 'Gender': 'Fem', 'Number': 'Sing', 'Tense': 'Past', 'VerbForm': 'Part', 'Voice': 'Pass'},
'VERB__Aspect=Perf|Case=Acc|Gender=Neut|Number=Sing|Tense=Past|VerbForm=Part|Voice=Act': {POS: VERB, 'Aspect': 'Perf', 'Case': 'Acc', 'Gender': 'Neut', 'Number': 'Sing', 'Tense': 'Past', 'VerbForm': 'Part', 'Voice': 'Act'},
'VERB__Aspect=Perf|Case=Acc|Gender=Neut|Number=Sing|Tense=Past|VerbForm=Part|Voice=Mid': {POS: VERB, 'Aspect': 'Perf', 'Case': 'Acc', 'Gender': 'Neut', 'Number': 'Sing', 'Tense': 'Past', 'VerbForm': 'Part', 'Voice': 'Mid'},
'VERB__Aspect=Perf|Case=Acc|Gender=Neut|Number=Sing|Tense=Past|VerbForm=Part|Voice=Pass': {POS: VERB, 'Aspect': 'Perf', 'Case': 'Acc', 'Gender': 'Neut', 'Number': 'Sing', 'Tense': 'Past', 'VerbForm': 'Part', 'Voice': 'Pass'},
'VERB__Aspect=Perf|Case=Dat|Gender=Fem|Number=Sing|Tense=Past|VerbForm=Part|Voice=Act': {POS: VERB, 'Aspect': 'Perf', 'Case': 'Dat', 'Gender': 'Fem', 'Number': 'Sing', 'Tense': 'Past', 'VerbForm': 'Part', 'Voice': 'Act'},
'VERB__Aspect=Perf|Case=Dat|Gender=Fem|Number=Sing|Tense=Past|VerbForm=Part|Voice=Mid': {POS: VERB, 'Aspect': 'Perf', 'Case': 'Dat', 'Gender': 'Fem', 'Number': 'Sing', 'Tense': 'Past', 'VerbForm': 'Part', 'Voice': 'Mid'},
'VERB__Aspect=Perf|Case=Dat|Gender=Fem|Number=Sing|Tense=Past|VerbForm=Part|Voice=Pass': {POS: VERB, 'Aspect': 'Perf', 'Case': 'Dat', 'Gender': 'Fem', 'Number': 'Sing', 'Tense': 'Past', 'VerbForm': 'Part', 'Voice': 'Pass'},
'VERB__Aspect=Perf|Case=Dat|Gender=Masc|Number=Sing|Tense=Past|VerbForm=Part|Voice=Act': {POS: VERB, 'Aspect': 'Perf', 'Case': 'Dat', 'Gender': 'Masc', 'Number': 'Sing', 'Tense': 'Past', 'VerbForm': 'Part', 'Voice': 'Act'},
'VERB__Aspect=Perf|Case=Dat|Gender=Masc|Number=Sing|Tense=Past|VerbForm=Part|Voice=Mid': {POS: VERB, 'Aspect': 'Perf', 'Case': 'Dat', 'Gender': 'Masc', 'Number': 'Sing', 'Tense': 'Past', 'VerbForm': 'Part', 'Voice': 'Mid'},
'VERB__Aspect=Perf|Case=Dat|Gender=Masc|Number=Sing|Tense=Past|VerbForm=Part|Voice=Pass': {POS: VERB, 'Aspect': 'Perf', 'Case': 'Dat', 'Gender': 'Masc', 'Number': 'Sing', 'Tense': 'Past', 'VerbForm': 'Part', 'Voice': 'Pass'},
'VERB__Aspect=Perf|Case=Dat|Gender=Neut|Number=Sing|Tense=Past|VerbForm=Part|Voice=Act': {POS: VERB, 'Aspect': 'Perf', 'Case': 'Dat', 'Gender': 'Neut', 'Number': 'Sing', 'Tense': 'Past', 'VerbForm': 'Part', 'Voice': 'Act'},
'VERB__Aspect=Perf|Case=Dat|Gender=Neut|Number=Sing|Tense=Past|VerbForm=Part|Voice=Mid': {POS: VERB, 'Aspect': 'Perf', 'Case': 'Dat', 'Gender': 'Neut', 'Number': 'Sing', 'Tense': 'Past', 'VerbForm': 'Part', 'Voice': 'Mid'},
'VERB__Aspect=Perf|Case=Dat|Gender=Neut|Number=Sing|Tense=Past|VerbForm=Part|Voice=Pass': {POS: VERB, 'Aspect': 'Perf', 'Case': 'Dat', 'Gender': 'Neut', 'Number': 'Sing', 'Tense': 'Past', 'VerbForm': 'Part', 'Voice': 'Pass'},
'VERB__Aspect=Perf|Case=Dat|Number=Plur|Tense=Past|VerbForm=Part|Voice=Act': {POS: VERB, 'Aspect': 'Perf', 'Case': 'Dat', 'Number': 'Plur', 'Tense': 'Past', 'VerbForm': 'Part', 'Voice': 'Act'},
'VERB__Aspect=Perf|Case=Dat|Number=Plur|Tense=Past|VerbForm=Part|Voice=Mid': {POS: VERB, 'Aspect': 'Perf', 'Case': 'Dat', 'Number': 'Plur', 'Tense': 'Past', 'VerbForm': 'Part', 'Voice': 'Mid'},
'VERB__Aspect=Perf|Case=Dat|Number=Plur|Tense=Past|VerbForm=Part|Voice=Pass': {POS: VERB, 'Aspect': 'Perf', 'Case': 'Dat', 'Number': 'Plur', 'Tense': 'Past', 'VerbForm': 'Part', 'Voice': 'Pass'},
'VERB__Aspect=Perf|Case=Gen|Gender=Fem|Number=Sing|Tense=Past|VerbForm=Part|Voice=Act': {POS: VERB, 'Aspect': 'Perf', 'Case': 'Gen', 'Gender': 'Fem', 'Number': 'Sing', 'Tense': 'Past', 'VerbForm': 'Part', 'Voice': 'Act'},
'VERB__Aspect=Perf|Case=Gen|Gender=Fem|Number=Sing|Tense=Past|VerbForm=Part|Voice=Mid': {POS: VERB, 'Aspect': 'Perf', 'Case': 'Gen', 'Gender': 'Fem', 'Number': 'Sing', 'Tense': 'Past', 'VerbForm': 'Part', 'Voice': 'Mid'},
'VERB__Aspect=Perf|Case=Gen|Gender=Fem|Number=Sing|Tense=Past|VerbForm=Part|Voice=Pass': {POS: VERB, 'Aspect': 'Perf', 'Case': 'Gen', 'Gender': 'Fem', 'Number': 'Sing', 'Tense': 'Past', 'VerbForm': 'Part', 'Voice': 'Pass'},
'VERB__Aspect=Perf|Case=Gen|Gender=Masc|Number=Sing|Tense=Past|VerbForm=Part|Voice=Act': {POS: VERB, 'Aspect': 'Perf', 'Case': 'Gen', 'Gender': 'Masc', 'Number': 'Sing', 'Tense': 'Past', 'VerbForm': 'Part', 'Voice': 'Act'},
'VERB__Aspect=Perf|Case=Gen|Gender=Masc|Number=Sing|Tense=Past|VerbForm=Part|Voice=Mid': {POS: VERB, 'Aspect': 'Perf', 'Case': 'Gen', 'Gender': 'Masc', 'Number': 'Sing', 'Tense': 'Past', 'VerbForm': 'Part', 'Voice': 'Mid'},
'VERB__Aspect=Perf|Case=Gen|Gender=Masc|Number=Sing|Tense=Past|VerbForm=Part|Voice=Pass': {POS: VERB, 'Aspect': 'Perf', 'Case': 'Gen', 'Gender': 'Masc', 'Number': 'Sing', 'Tense': 'Past', 'VerbForm': 'Part', 'Voice': 'Pass'},
'VERB__Aspect=Perf|Case=Gen|Gender=Neut|Number=Sing|Tense=Past|VerbForm=Part|Voice=Act': {POS: VERB, 'Aspect': 'Perf', 'Case': 'Gen', 'Gender': 'Neut', 'Number': 'Sing', 'Tense': 'Past', 'VerbForm': 'Part', 'Voice': 'Act'},
'VERB__Aspect=Perf|Case=Gen|Gender=Neut|Number=Sing|Tense=Past|VerbForm=Part|Voice=Mid': {POS: VERB, 'Aspect': 'Perf', 'Case': 'Gen', 'Gender': 'Neut', 'Number': 'Sing', 'Tense': 'Past', 'VerbForm': 'Part', 'Voice': 'Mid'},
'VERB__Aspect=Perf|Case=Gen|Gender=Neut|Number=Sing|Tense=Past|VerbForm=Part|Voice=Pass': {POS: VERB, 'Aspect': 'Perf', 'Case': 'Gen', 'Gender': 'Neut', 'Number': 'Sing', 'Tense': 'Past', 'VerbForm': 'Part', 'Voice': 'Pass'},
'VERB__Aspect=Perf|Case=Gen|Number=Plur|Tense=Fut|VerbForm=Part|Voice=Pass': {POS: VERB, 'Aspect': 'Perf', 'Case': 'Gen', 'Number': 'Plur', 'Tense': 'Fut', 'VerbForm': 'Part', 'Voice': 'Pass'},
'VERB__Aspect=Perf|Case=Gen|Number=Plur|Tense=Past|VerbForm=Part|Voice=Act': {POS: VERB, 'Aspect': 'Perf', 'Case': 'Gen', 'Number': 'Plur', 'Tense': 'Past', 'VerbForm': 'Part', 'Voice': 'Act'},
'VERB__Aspect=Perf|Case=Gen|Number=Plur|Tense=Past|VerbForm=Part|Voice=Mid': {POS: VERB, 'Aspect': 'Perf', 'Case': 'Gen', 'Number': 'Plur', 'Tense': 'Past', 'VerbForm': 'Part', 'Voice': 'Mid'},
'VERB__Aspect=Perf|Case=Gen|Number=Plur|Tense=Past|VerbForm=Part|Voice=Pass': {POS: VERB, 'Aspect': 'Perf', 'Case': 'Gen', 'Number': 'Plur', 'Tense': 'Past', 'VerbForm': 'Part', 'Voice': 'Pass'},
'VERB__Aspect=Perf|Case=Ins|Gender=Fem|Number=Sing|Tense=Past|VerbForm=Part|Voice=Act': {POS: VERB, 'Aspect': 'Perf', 'Case': 'Ins', 'Gender': 'Fem', 'Number': 'Sing', 'Tense': 'Past', 'VerbForm': 'Part', 'Voice': 'Act'},
'VERB__Aspect=Perf|Case=Ins|Gender=Fem|Number=Sing|Tense=Past|VerbForm=Part|Voice=Mid': {POS: VERB, 'Aspect': 'Perf', 'Case': 'Ins', 'Gender': 'Fem', 'Number': 'Sing', 'Tense': 'Past', 'VerbForm': 'Part', 'Voice': 'Mid'},
'VERB__Aspect=Perf|Case=Ins|Gender=Fem|Number=Sing|Tense=Past|VerbForm=Part|Voice=Pass': {POS: VERB, 'Aspect': 'Perf', 'Case': 'Ins', 'Gender': 'Fem', 'Number': 'Sing', 'Tense': 'Past', 'VerbForm': 'Part', 'Voice': 'Pass'},
'VERB__Aspect=Perf|Case=Ins|Gender=Masc|Number=Sing|Tense=Past|VerbForm=Part|Voice=Act': {POS: VERB, 'Aspect': 'Perf', 'Case': 'Ins', 'Gender': 'Masc', 'Number': 'Sing', 'Tense': 'Past', 'VerbForm': 'Part', 'Voice': 'Act'},
'VERB__Aspect=Perf|Case=Ins|Gender=Masc|Number=Sing|Tense=Past|VerbForm=Part|Voice=Mid': {POS: VERB, 'Aspect': 'Perf', 'Case': 'Ins', 'Gender': 'Masc', 'Number': 'Sing', 'Tense': 'Past', 'VerbForm': 'Part', 'Voice': 'Mid'},
'VERB__Aspect=Perf|Case=Ins|Gender=Masc|Number=Sing|Tense=Past|VerbForm=Part|Voice=Pass': {POS: VERB, 'Aspect': 'Perf', 'Case': 'Ins', 'Gender': 'Masc', 'Number': 'Sing', 'Tense': 'Past', 'VerbForm': 'Part', 'Voice': 'Pass'},
'VERB__Aspect=Perf|Case=Ins|Gender=Neut|Number=Sing|Tense=Past|VerbForm=Part|Voice=Act': {POS: VERB, 'Aspect': 'Perf', 'Case': 'Ins', 'Gender': 'Neut', 'Number': 'Sing', 'Tense': 'Past', 'VerbForm': 'Part', 'Voice': 'Act'},
'VERB__Aspect=Perf|Case=Ins|Gender=Neut|Number=Sing|Tense=Past|VerbForm=Part|Voice=Mid': {POS: VERB, 'Aspect': 'Perf', 'Case': 'Ins', 'Gender': 'Neut', 'Number': 'Sing', 'Tense': 'Past', 'VerbForm': 'Part', 'Voice': 'Mid'},
'VERB__Aspect=Perf|Case=Ins|Gender=Neut|Number=Sing|Tense=Past|VerbForm=Part|Voice=Pass': {POS: VERB, 'Aspect': 'Perf', 'Case': 'Ins', 'Gender': 'Neut', 'Number': 'Sing', 'Tense': 'Past', 'VerbForm': 'Part', 'Voice': 'Pass'},
'VERB__Aspect=Perf|Case=Ins|Number=Plur|Tense=Past|VerbForm=Part|Voice=Act': {POS: VERB, 'Aspect': 'Perf', 'Case': 'Ins', 'Number': 'Plur', 'Tense': 'Past', 'VerbForm': 'Part', 'Voice': 'Act'},
'VERB__Aspect=Perf|Case=Ins|Number=Plur|Tense=Past|VerbForm=Part|Voice=Mid': {POS: VERB, 'Aspect': 'Perf', 'Case': 'Ins', 'Number': 'Plur', 'Tense': 'Past', 'VerbForm': 'Part', 'Voice': 'Mid'},
'VERB__Aspect=Perf|Case=Ins|Number=Plur|Tense=Past|VerbForm=Part|Voice=Pass': {POS: VERB, 'Aspect': 'Perf', 'Case': 'Ins', 'Number': 'Plur', 'Tense': 'Past', 'VerbForm': 'Part', 'Voice': 'Pass'},
'VERB__Aspect=Perf|Case=Loc|Gender=Fem|Number=Sing|Tense=Past|VerbForm=Part|Voice=Act': {POS: VERB, 'Aspect': 'Perf', 'Case': 'Loc', 'Gender': 'Fem', 'Number': 'Sing', 'Tense': 'Past', 'VerbForm': 'Part', 'Voice': 'Act'},
'VERB__Aspect=Perf|Case=Loc|Gender=Fem|Number=Sing|Tense=Past|VerbForm=Part|Voice=Mid': {POS: VERB, 'Aspect': 'Perf', 'Case': 'Loc', 'Gender': 'Fem', 'Number': 'Sing', 'Tense': 'Past', 'VerbForm': 'Part', 'Voice': 'Mid'},
'VERB__Aspect=Perf|Case=Loc|Gender=Fem|Number=Sing|Tense=Past|VerbForm=Part|Voice=Pass': {POS: VERB, 'Aspect': 'Perf', 'Case': 'Loc', 'Gender': 'Fem', 'Number': 'Sing', 'Tense': 'Past', 'VerbForm': 'Part', 'Voice': 'Pass'},
'VERB__Aspect=Perf|Case=Loc|Gender=Masc|Number=Sing|Tense=Past|VerbForm=Part|Voice=Act': {POS: VERB, 'Aspect': 'Perf', 'Case': 'Loc', 'Gender': 'Masc', 'Number': 'Sing', 'Tense': 'Past', 'VerbForm': 'Part', 'Voice': 'Act'},
'VERB__Aspect=Perf|Case=Loc|Gender=Masc|Number=Sing|Tense=Past|VerbForm=Part|Voice=Mid': {POS: VERB, 'Aspect': 'Perf', 'Case': 'Loc', 'Gender': 'Masc', 'Number': 'Sing', 'Tense': 'Past', 'VerbForm': 'Part', 'Voice': 'Mid'},
'VERB__Aspect=Perf|Case=Loc|Gender=Masc|Number=Sing|Tense=Past|VerbForm=Part|Voice=Pass': {POS: VERB, 'Aspect': 'Perf', 'Case': 'Loc', 'Gender': 'Masc', 'Number': 'Sing', 'Tense': 'Past', 'VerbForm': 'Part', 'Voice': 'Pass'},
'VERB__Aspect=Perf|Case=Loc|Gender=Neut|Number=Sing|Tense=Past|VerbForm=Part|Voice=Act': {POS: VERB, 'Aspect': 'Perf', 'Case': 'Loc', 'Gender': 'Neut', 'Number': 'Sing', 'Tense': 'Past', 'VerbForm': 'Part', 'Voice': 'Act'},
'VERB__Aspect=Perf|Case=Loc|Gender=Neut|Number=Sing|Tense=Past|VerbForm=Part|Voice=Mid': {POS: VERB, 'Aspect': 'Perf', 'Case': 'Loc', 'Gender': 'Neut', 'Number': 'Sing', 'Tense': 'Past', 'VerbForm': 'Part', 'Voice': 'Mid'},
'VERB__Aspect=Perf|Case=Loc|Gender=Neut|Number=Sing|Tense=Past|VerbForm=Part|Voice=Pass': {POS: VERB, 'Aspect': 'Perf', 'Case': 'Loc', 'Gender': 'Neut', 'Number': 'Sing', 'Tense': 'Past', 'VerbForm': 'Part', 'Voice': 'Pass'},
'VERB__Aspect=Perf|Case=Loc|Number=Plur|Tense=Past|VerbForm=Part|Voice=Act': {POS: VERB, 'Aspect': 'Perf', 'Case': 'Loc', 'Number': 'Plur', 'Tense': 'Past', 'VerbForm': 'Part', 'Voice': 'Act'},
'VERB__Aspect=Perf|Case=Loc|Number=Plur|Tense=Past|VerbForm=Part|Voice=Mid': {POS: VERB, 'Aspect': 'Perf', 'Case': 'Loc', 'Number': 'Plur', 'Tense': 'Past', 'VerbForm': 'Part', 'Voice': 'Mid'},
'VERB__Aspect=Perf|Case=Loc|Number=Plur|Tense=Past|VerbForm=Part|Voice=Pass': {POS: VERB, 'Aspect': 'Perf', 'Case': 'Loc', 'Number': 'Plur', 'Tense': 'Past', 'VerbForm': 'Part', 'Voice': 'Pass'},
'VERB__Aspect=Perf|Case=Nom|Gender=Fem|Number=Sing|Tense=Past|VerbForm=Part|Voice=Act': {POS: VERB, 'Aspect': 'Perf', 'Case': 'Nom', 'Gender': 'Fem', 'Number': 'Sing', 'Tense': 'Past', 'VerbForm': 'Part', 'Voice': 'Act'},
'VERB__Aspect=Perf|Case=Nom|Gender=Fem|Number=Sing|Tense=Past|VerbForm=Part|Voice=Mid': {POS: VERB, 'Aspect': 'Perf', 'Case': 'Nom', 'Gender': 'Fem', 'Number': 'Sing', 'Tense': 'Past', 'VerbForm': 'Part', 'Voice': 'Mid'},
'VERB__Aspect=Perf|Case=Nom|Gender=Fem|Number=Sing|Tense=Past|VerbForm=Part|Voice=Pass': {POS: VERB, 'Aspect': 'Perf', 'Case': 'Nom', 'Gender': 'Fem', 'Number': 'Sing', 'Tense': 'Past', 'VerbForm': 'Part', 'Voice': 'Pass'},
'VERB__Aspect=Perf|Case=Nom|Gender=Masc|Number=Sing|Tense=Past|VerbForm=Part|Voice=Act': {POS: VERB, 'Aspect': 'Perf', 'Case': 'Nom', 'Gender': 'Masc', 'Number': 'Sing', 'Tense': 'Past', 'VerbForm': 'Part', 'Voice': 'Act'},
'VERB__Aspect=Perf|Case=Nom|Gender=Masc|Number=Sing|Tense=Past|VerbForm=Part|Voice=Mid': {POS: VERB, 'Aspect': 'Perf', 'Case': 'Nom', 'Gender': 'Masc', 'Number': 'Sing', 'Tense': 'Past', 'VerbForm': 'Part', 'Voice': 'Mid'},
'VERB__Aspect=Perf|Case=Nom|Gender=Masc|Number=Sing|Tense=Past|VerbForm=Part|Voice=Pass': {POS: VERB, 'Aspect': 'Perf', 'Case': 'Nom', 'Gender': 'Masc', 'Number': 'Sing', 'Tense': 'Past', 'VerbForm': 'Part', 'Voice': 'Pass'},
'VERB__Aspect=Perf|Case=Nom|Gender=Neut|Number=Sing|Tense=Past|VerbForm=Part|Voice=Act': {POS: VERB, 'Aspect': 'Perf', 'Case': 'Nom', 'Gender': 'Neut', 'Number': 'Sing', 'Tense': 'Past', 'VerbForm': 'Part', 'Voice': 'Act'},
'VERB__Aspect=Perf|Case=Nom|Gender=Neut|Number=Sing|Tense=Past|VerbForm=Part|Voice=Mid': {POS: VERB, 'Aspect': 'Perf', 'Case': 'Nom', 'Gender': 'Neut', 'Number': 'Sing', 'Tense': 'Past', 'VerbForm': 'Part', 'Voice': 'Mid'},
'VERB__Aspect=Perf|Case=Nom|Gender=Neut|Number=Sing|Tense=Past|VerbForm=Part|Voice=Pass': {POS: VERB, 'Aspect': 'Perf', 'Case': 'Nom', 'Gender': 'Neut', 'Number': 'Sing', 'Tense': 'Past', 'VerbForm': 'Part', 'Voice': 'Pass'},
'VERB__Aspect=Perf|Case=Nom|Number=Plur|Tense=Past|VerbForm=Part|Voice=Act': {POS: VERB, 'Aspect': 'Perf', 'Case': 'Nom', 'Number': 'Plur', 'Tense': 'Past', 'VerbForm': 'Part', 'Voice': 'Act'},
'VERB__Aspect=Perf|Case=Nom|Number=Plur|Tense=Past|VerbForm=Part|Voice=Mid': {POS: VERB, 'Aspect': 'Perf', 'Case': 'Nom', 'Number': 'Plur', 'Tense': 'Past', 'VerbForm': 'Part', 'Voice': 'Mid'},
'VERB__Aspect=Perf|Case=Nom|Number=Plur|Tense=Past|VerbForm=Part|Voice=Pass': {POS: VERB, 'Aspect': 'Perf', 'Case': 'Nom', 'Number': 'Plur', 'Tense': 'Past', 'VerbForm': 'Part', 'Voice': 'Pass'},
'VERB__Aspect=Perf|Gender=Fem|Mood=Ind|Number=Sing|Tense=Past|VerbForm=Fin|Voice=Act': {POS: VERB, 'Aspect': 'Perf', 'Gender': 'Fem', 'Mood': 'Ind', 'Number': 'Sing', 'Tense': 'Past', 'VerbForm': 'Fin', 'Voice': 'Act'},
'VERB__Aspect=Perf|Gender=Fem|Mood=Ind|Number=Sing|Tense=Past|VerbForm=Fin|Voice=Mid': {POS: VERB, 'Aspect': 'Perf', 'Gender': 'Fem', 'Mood': 'Ind', 'Number': 'Sing', 'Tense': 'Past', 'VerbForm': 'Fin', 'Voice': 'Mid'},
'VERB__Aspect=Perf|Gender=Fem|Number=Sing|Tense=Past|Variant=Short|VerbForm=Part|Voice=Pass': {POS: VERB, 'Aspect': 'Perf', 'Gender': 'Fem', 'Number': 'Sing', 'Tense': 'Past', 'Variant': 'Short', 'VerbForm': 'Part', 'Voice': 'Pass'},
'VERB__Aspect=Perf|Gender=Masc|Mood=Ind|Number=Sing|Tense=Past|VerbForm=Fin|Voice=Act': {POS: VERB, 'Aspect': 'Perf', 'Gender': 'Masc', 'Mood': 'Ind', 'Number': 'Sing', 'Tense': 'Past', 'VerbForm': 'Fin', 'Voice': 'Act'},
'VERB__Aspect=Perf|Gender=Masc|Mood=Ind|Number=Sing|Tense=Past|VerbForm=Fin|Voice=Mid': {POS: VERB, 'Aspect': 'Perf', 'Gender': 'Masc', 'Mood': 'Ind', 'Number': 'Sing', 'Tense': 'Past', 'VerbForm': 'Fin', 'Voice': 'Mid'},
'VERB__Aspect=Perf|Gender=Masc|Number=Sing|Tense=Past|Variant=Short|VerbForm=Part|Voice=Pass': {POS: VERB, 'Aspect': 'Perf', 'Gender': 'Masc', 'Number': 'Sing', 'Tense': 'Past', 'Variant': 'Short', 'VerbForm': 'Part', 'Voice': 'Pass'},
'VERB__Aspect=Perf|Gender=Neut|Mood=Ind|Number=Sing|Tense=Past|VerbForm=Fin|Voice=Act': {POS: VERB, 'Aspect': 'Perf', 'Gender': 'Neut', 'Mood': 'Ind', 'Number': 'Sing', 'Tense': 'Past', 'VerbForm': 'Fin', 'Voice': 'Act'},
'VERB__Aspect=Perf|Gender=Neut|Mood=Ind|Number=Sing|Tense=Past|VerbForm=Fin|Voice=Mid': {POS: VERB, 'Aspect': 'Perf', 'Gender': 'Neut', 'Mood': 'Ind', 'Number': 'Sing', 'Tense': 'Past', 'VerbForm': 'Fin', 'Voice': 'Mid'},
'VERB__Aspect=Perf|Gender=Neut|Number=Sing|Tense=Past|Variant=Short|VerbForm=Part|Voice=Pass': {POS: VERB, 'Aspect': 'Perf', 'Gender': 'Neut', 'Number': 'Sing', 'Tense': 'Past', 'Variant': 'Short', 'VerbForm': 'Part', 'Voice': 'Pass'},
'VERB__Aspect=Perf|Mood=Imp|Number=Plur|Person=1|VerbForm=Fin|Voice=Act': {POS: VERB, 'Aspect': 'Perf', 'Mood': 'Imp', 'Number': 'Plur', 'Person': '1', 'VerbForm': 'Fin', 'Voice': 'Act'},
'VERB__Aspect=Perf|Mood=Imp|Number=Plur|Person=2|VerbForm=Fin|Voice=Act': {POS: VERB, 'Aspect': 'Perf', 'Mood': 'Imp', 'Number': 'Plur', 'Person': '2', 'VerbForm': 'Fin', 'Voice': 'Act'},
'VERB__Aspect=Perf|Mood=Imp|Number=Plur|Person=2|VerbForm=Fin|Voice=Mid': {POS: VERB, 'Aspect': 'Perf', 'Mood': 'Imp', 'Number': 'Plur', 'Person': '2', 'VerbForm': 'Fin', 'Voice': 'Mid'},
'VERB__Aspect=Perf|Mood=Imp|Number=Sing|Person=2|VerbForm=Fin|Voice=Act': {POS: VERB, 'Aspect': 'Perf', 'Mood': 'Imp', 'Number': 'Sing', 'Person': '2', 'VerbForm': 'Fin', 'Voice': 'Act'},
'VERB__Aspect=Perf|Mood=Imp|Number=Sing|Person=2|VerbForm=Fin|Voice=Mid': {POS: VERB, 'Aspect': 'Perf', 'Mood': 'Imp', 'Number': 'Sing', 'Person': '2', 'VerbForm': 'Fin', 'Voice': 'Mid'},
'VERB__Aspect=Perf|Mood=Ind|Number=Plur|Person=1|Tense=Fut|VerbForm=Fin|Voice=Act': {POS: VERB, 'Aspect': 'Perf', 'Mood': 'Ind', 'Number': 'Plur', 'Person': '1', 'Tense': 'Fut', 'VerbForm': 'Fin', 'Voice': 'Act'},
'VERB__Aspect=Perf|Mood=Ind|Number=Plur|Person=1|Tense=Fut|VerbForm=Fin|Voice=Mid': {POS: VERB, 'Aspect': 'Perf', 'Mood': 'Ind', 'Number': 'Plur', 'Person': '1', 'Tense': 'Fut', 'VerbForm': 'Fin', 'Voice': 'Mid'},
'VERB__Aspect=Perf|Mood=Ind|Number=Plur|Person=2|Tense=Fut|VerbForm=Fin|Voice=Act': {POS: VERB, 'Aspect': 'Perf', 'Mood': 'Ind', 'Number': 'Plur', 'Person': '2', 'Tense': 'Fut', 'VerbForm': 'Fin', 'Voice': 'Act'},
'VERB__Aspect=Perf|Mood=Ind|Number=Plur|Person=2|Tense=Fut|VerbForm=Fin|Voice=Mid': {POS: VERB, 'Aspect': 'Perf', 'Mood': 'Ind', 'Number': 'Plur', 'Person': '2', 'Tense': 'Fut', 'VerbForm': 'Fin', 'Voice': 'Mid'},
'VERB__Aspect=Perf|Mood=Ind|Number=Plur|Person=3|Tense=Fut|VerbForm=Fin|Voice=Act': {POS: VERB, 'Aspect': 'Perf', 'Mood': 'Ind', 'Number': 'Plur', 'Person': '3', 'Tense': 'Fut', 'VerbForm': 'Fin', 'Voice': 'Act'},
'VERB__Aspect=Perf|Mood=Ind|Number=Plur|Person=3|Tense=Fut|VerbForm=Fin|Voice=Mid': {POS: VERB, 'Aspect': 'Perf', 'Mood': 'Ind', 'Number': 'Plur', 'Person': '3', 'Tense': 'Fut', 'VerbForm': 'Fin', 'Voice': 'Mid'},
'VERB__Aspect=Perf|Mood=Ind|Number=Plur|Person=3|Tense=Fut|VerbForm=Fin|Voice=Pass': {POS: VERB, 'Aspect': 'Perf', 'Mood': 'Ind', 'Number': 'Plur', 'Person': '3', 'Tense': 'Fut', 'VerbForm': 'Fin', 'Voice': 'Pass'},
'VERB__Aspect=Perf|Mood=Ind|Number=Plur|Tense=Past|VerbForm=Fin|Voice=Act': {POS: VERB, 'Aspect': 'Perf', 'Mood': 'Ind', 'Number': 'Plur', 'Tense': 'Past', 'VerbForm': 'Fin', 'Voice': 'Act'},
'VERB__Aspect=Perf|Mood=Ind|Number=Plur|Tense=Past|VerbForm=Fin|Voice=Mid': {POS: VERB, 'Aspect': 'Perf', 'Mood': 'Ind', 'Number': 'Plur', 'Tense': 'Past', 'VerbForm': 'Fin', 'Voice': 'Mid'},
'VERB__Aspect=Perf|Mood=Ind|Number=Sing|Person=1|Tense=Fut|VerbForm=Fin|Voice=Act': {POS: VERB, 'Aspect': 'Perf', 'Mood': 'Ind', 'Number': 'Sing', 'Person': '1', 'Tense': 'Fut', 'VerbForm': 'Fin', 'Voice': 'Act'},
'VERB__Aspect=Perf|Mood=Ind|Number=Sing|Person=1|Tense=Fut|VerbForm=Fin|Voice=Mid': {POS: VERB, 'Aspect': 'Perf', 'Mood': 'Ind', 'Number': 'Sing', 'Person': '1', 'Tense': 'Fut', 'VerbForm': 'Fin', 'Voice': 'Mid'},
'VERB__Aspect=Perf|Mood=Ind|Number=Sing|Person=2|Tense=Fut|VerbForm=Fin|Voice=Act': {POS: VERB, 'Aspect': 'Perf', 'Mood': 'Ind', 'Number': 'Sing', 'Person': '2', 'Tense': 'Fut', 'VerbForm': 'Fin', 'Voice': 'Act'},
'VERB__Aspect=Perf|Mood=Ind|Number=Sing|Person=2|Tense=Fut|VerbForm=Fin|Voice=Mid': {POS: VERB, 'Aspect': 'Perf', 'Mood': 'Ind', 'Number': 'Sing', 'Person': '2', 'Tense': 'Fut', 'VerbForm': 'Fin', 'Voice': 'Mid'},
'VERB__Aspect=Perf|Mood=Ind|Number=Sing|Person=3|Tense=Fut|VerbForm=Fin|Voice=Act': {POS: VERB, 'Aspect': 'Perf', 'Mood': 'Ind', 'Number': 'Sing', 'Person': '3', 'Tense': 'Fut', 'VerbForm': 'Fin', 'Voice': 'Act'},
'VERB__Aspect=Perf|Mood=Ind|Number=Sing|Person=3|Tense=Fut|VerbForm=Fin|Voice=Mid': {POS: VERB, 'Aspect': 'Perf', 'Mood': 'Ind', 'Number': 'Sing', 'Person': '3', 'Tense': 'Fut', 'VerbForm': 'Fin', 'Voice': 'Mid'},
'VERB__Aspect=Perf|Number=Plur|Tense=Past|Variant=Short|VerbForm=Part|Voice=Pass': {POS: VERB, 'Aspect': 'Perf', 'Number': 'Plur', 'Tense': 'Past', 'Variant': 'Short', 'VerbForm': 'Part', 'Voice': 'Pass'},
'VERB__Aspect=Perf|Tense=Past|VerbForm=Conv|Voice=Act': {POS: VERB, 'Aspect': 'Perf', 'Tense': 'Past', 'VerbForm': 'Conv', 'Voice': 'Act'},
'VERB__Aspect=Perf|Tense=Past|VerbForm=Conv|Voice=Mid': {POS: VERB, 'Aspect': 'Perf', 'Tense': 'Past', 'VerbForm': 'Conv', 'Voice': 'Mid'},
'VERB__Aspect=Perf|VerbForm=Inf|Voice=Act': {POS: VERB, 'Aspect': 'Perf', 'VerbForm': 'Inf', 'Voice': 'Act'},
'VERB__Aspect=Perf|VerbForm=Inf|Voice=Mid': {POS: VERB, 'Aspect': 'Perf', 'VerbForm': 'Inf', 'Voice': 'Mid'},
'VERB__Voice=Act': {POS: VERB, 'Voice': 'Act'},
'VERB___': {POS: VERB},
'X__Foreign=Yes': {POS: X, 'Foreign': 'Yes'},
'X___': {POS: X},
}

View File

@ -0,0 +1,68 @@
# encoding: utf8
from __future__ import unicode_literals
from ...symbols import ORTH, LEMMA, NORM
_exc = {}
_abbrev_exc = [
# Weekdays abbreviations
{ORTH: "пн", LEMMA: "понедельник", NORM: "понедельник"},
{ORTH: "вт", LEMMA: "вторник", NORM: "вторник"},
{ORTH: "ср", LEMMA: "среда", NORM: "среда"},
{ORTH: "чт", LEMMA: "четверг", NORM: "четверг"},
{ORTH: "чтв", LEMMA: "четверг", NORM: "четверг"},
{ORTH: "пт", LEMMA: "пятница", NORM: "пятница"},
{ORTH: "сб", LEMMA: "суббота", NORM: "суббота"},
{ORTH: "сбт", LEMMA: "суббота", NORM: "суббота"},
{ORTH: "вс", LEMMA: "воскресенье", NORM: "воскресенье"},
{ORTH: "вскр", LEMMA: "воскресенье", NORM: "воскресенье"},
{ORTH: "воскр", LEMMA: "воскресенье", NORM: "воскресенье"},
# Months abbreviations
{ORTH: "янв", LEMMA: "январь", NORM: "январь"},
{ORTH: "фев", LEMMA: "февраль", NORM: "февраль"},
{ORTH: "февр", LEMMA: "февраль", NORM: "февраль"},
{ORTH: "мар", LEMMA: "март", NORM: "март"},
# {ORTH: "март", LEMMA: "март", NORM: "март"},
{ORTH: "мрт", LEMMA: "март", NORM: "март"},
{ORTH: "апр", LEMMA: "апрель", NORM: "апрель"},
# {ORTH: "май", LEMMA: "май", NORM: "май"},
{ORTH: "июн", LEMMA: "июнь", NORM: "июнь"},
# {ORTH: "июнь", LEMMA: "июнь", NORM: "июнь"},
{ORTH: "июл", LEMMA: "июль", NORM: "июль"},
# {ORTH: "июль", LEMMA: "июль", NORM: "июль"},
{ORTH: "авг", LEMMA: "август", NORM: "август"},
{ORTH: "сен", LEMMA: "сентябрь", NORM: "сентябрь"},
{ORTH: "сент", LEMMA: "сентябрь", NORM: "сентябрь"},
{ORTH: "окт", LEMMA: "октябрь", NORM: "октябрь"},
{ORTH: "октб", LEMMA: "октябрь", NORM: "октябрь"},
{ORTH: "ноя", LEMMA: "ноябрь", NORM: "ноябрь"},
{ORTH: "нояб", LEMMA: "ноябрь", NORM: "ноябрь"},
{ORTH: "нбр", LEMMA: "ноябрь", NORM: "ноябрь"},
{ORTH: "дек", LEMMA: "декабрь", NORM: "декабрь"},
]
for abbrev_desc in _abbrev_exc:
abbrev = abbrev_desc[ORTH]
for orth in (abbrev, abbrev.capitalize(), abbrev.upper()):
_exc[orth] = [{ORTH: orth, LEMMA: abbrev_desc[LEMMA], NORM: abbrev_desc[NORM]}]
_exc[orth + '.'] = [{ORTH: orth + '.', LEMMA: abbrev_desc[LEMMA], NORM: abbrev_desc[NORM]}]
_slang_exc = [
{ORTH: '2к15', LEMMA: '2015', NORM: '2015'},
{ORTH: '2к16', LEMMA: '2016', NORM: '2016'},
{ORTH: '2к17', LEMMA: '2017', NORM: '2017'},
{ORTH: '2к18', LEMMA: '2018', NORM: '2018'},
{ORTH: '2к19', LEMMA: '2019', NORM: '2019'},
{ORTH: '2к20', LEMMA: '2020', NORM: '2020'},
]
for slang_desc in _slang_exc:
_exc[slang_desc[ORTH]] = [slang_desc]
TOKENIZER_EXCEPTIONS = _exc

View File

@ -147,7 +147,7 @@ class Language(object):
self._meta.setdefault('lang', self.vocab.lang) self._meta.setdefault('lang', self.vocab.lang)
self._meta.setdefault('name', 'model') self._meta.setdefault('name', 'model')
self._meta.setdefault('version', '0.0.0') self._meta.setdefault('version', '0.0.0')
self._meta.setdefault('spacy_version', about.__version__) self._meta.setdefault('spacy_version', '>={}'.format(about.__version__))
self._meta.setdefault('description', '') self._meta.setdefault('description', '')
self._meta.setdefault('author', '') self._meta.setdefault('author', '')
self._meta.setdefault('email', '') self._meta.setdefault('email', '')
@ -260,7 +260,7 @@ class Language(object):
elif before and before in self.pipe_names: elif before and before in self.pipe_names:
self.pipeline.insert(self.pipe_names.index(before), pipe) self.pipeline.insert(self.pipe_names.index(before), pipe)
elif after and after in self.pipe_names: elif after and after in self.pipe_names:
self.pipeline.insert(self.pipe_names.index(after), pipe) self.pipeline.insert(self.pipe_names.index(after) + 1, pipe)
else: else:
msg = "Can't find '{}' in pipeline. Available names: {}" msg = "Can't find '{}' in pipeline. Available names: {}"
unfound = before or after unfound = before or after
@ -502,19 +502,19 @@ class Language(object):
pass pass
def pipe(self, texts, as_tuples=False, n_threads=2, batch_size=1000, def pipe(self, texts, as_tuples=False, n_threads=2, batch_size=1000,
disable=[]): disable=[], cleanup=False):
"""Process texts as a stream, and yield `Doc` objects in order. """Process texts as a stream, and yield `Doc` objects in order.
Supports GIL-free multi-threading.
texts (iterator): A sequence of texts to process. texts (iterator): A sequence of texts to process.
as_tuples (bool): as_tuples (bool):
If set to True, inputs should be a sequence of If set to True, inputs should be a sequence of
(text, context) tuples. Output will then be a sequence of (text, context) tuples. Output will then be a sequence of
(doc, context) tuples. Defaults to False. (doc, context) tuples. Defaults to False.
n_threads (int): The number of worker threads to use. If -1, OpenMP n_threads (int): Currently inactive.
will decide how many to use at run time. Default is 2.
batch_size (int): The number of texts to buffer. batch_size (int): The number of texts to buffer.
disable (list): Names of the pipeline components to disable. disable (list): Names of the pipeline components to disable.
cleanup (bool): If True, unneeded strings are freed,
to control memory use. Experimental.
YIELDS (Doc): Documents in the order of the original text. YIELDS (Doc): Documents in the order of the original text.
EXAMPLE: EXAMPLE:
@ -547,24 +547,27 @@ class Language(object):
# in the string store. # in the string store.
recent_refs = weakref.WeakSet() recent_refs = weakref.WeakSet()
old_refs = weakref.WeakSet() old_refs = weakref.WeakSet()
# If there is anything that we have inside — after iterations we should # Keep track of the original string data, so that if we flush old strings,
# carefully get it back. # we can recover the original ones. However, we only want to do this if we're
original_strings_data = list(self.vocab.strings) # really adding strings, to save up-front costs.
original_strings_data = None
nr_seen = 0 nr_seen = 0
for doc in docs: for doc in docs:
yield doc yield doc
recent_refs.add(doc) if cleanup:
if nr_seen < 10000: recent_refs.add(doc)
old_refs.add(doc) if nr_seen < 10000:
nr_seen += 1 old_refs.add(doc)
elif len(old_refs) == 0: nr_seen += 1
self.vocab.strings._cleanup_stale_strings() elif len(old_refs) == 0:
nr_seen = 0 old_refs, recent_refs = recent_refs, old_refs
# We can't know which strings from the last batch have really expired. if original_strings_data is None:
# So we don't erase the strings — we just extend with the original original_strings_data = list(self.vocab.strings)
# content. else:
for string in original_strings_data: keys, strings = self.vocab.strings._cleanup_stale_strings(original_strings_data)
self.vocab.strings.add(string) self.vocab._reset_cache(keys, strings)
self.tokenizer._reset_cache(keys)
nr_seen = 0
def to_disk(self, path, disable=tuple()): def to_disk(self, path, disable=tuple()):
"""Save the current state to a directory. If a model is loaded, this """Save the current state to a directory. If a model is loaded, this

View File

@ -249,20 +249,37 @@ cdef class StringStore:
for string in strings: for string in strings:
self.add(string) self.add(string)
def _cleanup_stale_strings(self): def _cleanup_stale_strings(self, excepted):
"""
excepted (list): Strings that should not be removed.
RETURNS (keys, strings): Dropped strings and keys that can be dropped from other places
"""
if self.hits.size() == 0: if self.hits.size() == 0:
# If we don't have any hits, just skip cleanup # If we don't have any hits, just skip cleanup
return return
cdef vector[hash_t] tmp cdef vector[hash_t] tmp
dropped_strings = []
dropped_keys = []
for i in range(self.keys.size()): for i in range(self.keys.size()):
key = self.keys[i] key = self.keys[i]
if self.hits.count(key) != 0: # Here we cannot use __getitem__ because it also set hit.
utf8str = <Utf8Str*>self._map.get(key)
value = decode_Utf8Str(utf8str)
if self.hits.count(key) != 0 or value in excepted:
tmp.push_back(key) tmp.push_back(key)
else:
dropped_keys.append(key)
dropped_strings.append(value)
self.keys.swap(tmp) self.keys.swap(tmp)
strings = list(self)
self._reset_and_load(strings)
# Here we have strings but hits to it should be reseted
self.hits.clear() self.hits.clear()
return dropped_keys, dropped_strings
cdef const Utf8Str* intern_unicode(self, unicode py_string): cdef const Utf8Str* intern_unicode(self, unicode py_string):
# 0 means missing, but we don't bother offsetting the index. # 0 means missing, but we don't bother offsetting the index.
cdef bytes byte_string = py_string.encode('utf8') cdef bytes byte_string = py_string.encode('utf8')

View File

@ -1,4 +1,6 @@
# coding: utf-8 # coding: utf-8
# cython: profile=True
# cython: infer_types=True
"""Implements the projectivize/deprojectivize mechanism in Nivre & Nilsson 2005 """Implements the projectivize/deprojectivize mechanism in Nivre & Nilsson 2005
for doing pseudo-projective parsing implementation uses the HEAD decoration for doing pseudo-projective parsing implementation uses the HEAD decoration
scheme. scheme.
@ -7,6 +9,8 @@ from __future__ import unicode_literals
from copy import copy from copy import copy
from ..tokens.doc cimport Doc
DELIMITER = '||' DELIMITER = '||'
@ -111,17 +115,18 @@ def projectivize(heads, labels):
return proj_heads, deco_labels return proj_heads, deco_labels
def deprojectivize(tokens): cpdef deprojectivize(Doc doc):
# Reattach arcs with decorated labels (following HEAD scheme). For each # Reattach arcs with decorated labels (following HEAD scheme). For each
# decorated arc X||Y, search top-down, left-to-right, breadth-first until # decorated arc X||Y, search top-down, left-to-right, breadth-first until
# hitting a Y then make this the new head. # hitting a Y then make this the new head.
for token in tokens: for i in range(doc.length):
if is_decorated(token.dep_): label = doc.vocab.strings[doc.c[i].dep]
newlabel, headlabel = decompose(token.dep_) if DELIMITER in label:
newhead = _find_new_head(token, headlabel) new_label, head_label = label.split(DELIMITER)
token.head = newhead new_head = _find_new_head(doc[i], head_label)
token.dep_ = newlabel doc[i].head = new_head
return tokens doc.c[i].dep = doc.vocab.strings.add(new_label)
return doc
def _decorate(heads, proj_heads, labels): def _decorate(heads, proj_heads, labels):

View File

@ -15,7 +15,7 @@ from .. import util
# here if it's using spaCy's tokenizer (not a different library) # here if it's using spaCy's tokenizer (not a different library)
# TODO: re-implement generic tokenizer tests # TODO: re-implement generic tokenizer tests
_languages = ['bn', 'da', 'de', 'en', 'es', 'fi', 'fr', 'ga', 'he', 'hu', 'id', _languages = ['bn', 'da', 'de', 'en', 'es', 'fi', 'fr', 'ga', 'he', 'hu', 'id',
'it', 'nb', 'nl', 'pl', 'pt', 'sv', 'xx'] 'it', 'nb', 'nl', 'pl', 'pt', 'ru', 'sv', 'xx']
_models = {'en': ['en_core_web_sm'], _models = {'en': ['en_core_web_sm'],
'de': ['de_core_news_md'], 'de': ['de_core_news_md'],
'fr': ['fr_core_news_sm'], 'fr': ['fr_core_news_sm'],
@ -40,6 +40,12 @@ def FR(request):
return load_test_model(request.param) return load_test_model(request.param)
@pytest.fixture()
def RU(request):
pymorphy = pytest.importorskip('pymorphy2')
return util.get_lang_class('ru')()
#@pytest.fixture(params=_languages) #@pytest.fixture(params=_languages)
#def tokenizer(request): #def tokenizer(request):
#lang = util.get_lang_class(request.param) #lang = util.get_lang_class(request.param)
@ -137,6 +143,12 @@ def th_tokenizer():
return util.get_lang_class('th').Defaults.create_tokenizer() return util.get_lang_class('th').Defaults.create_tokenizer()
@pytest.fixture
def ru_tokenizer():
pymorphy = pytest.importorskip('pymorphy2')
return util.get_lang_class('ru').Defaults.create_tokenizer()
@pytest.fixture @pytest.fixture
def stringstore(): def stringstore():
return StringStore() return StringStore()

View File

@ -1,7 +1,7 @@
# coding: utf-8 # coding: utf-8
from __future__ import unicode_literals from __future__ import unicode_literals
from ...gold import biluo_tags_from_offsets from ...gold import biluo_tags_from_offsets, offsets_from_biluo_tags
from ...tokens.doc import Doc from ...tokens.doc import Doc
import pytest import pytest
@ -41,3 +41,14 @@ def test_gold_biluo_misalign(en_vocab):
entities = [(len("I flew to "), len("I flew to San Francisco Valley"), 'LOC')] entities = [(len("I flew to "), len("I flew to San Francisco Valley"), 'LOC')]
tags = biluo_tags_from_offsets(doc, entities) tags = biluo_tags_from_offsets(doc, entities)
assert tags == ['O', 'O', 'O', '-', '-', '-'] assert tags == ['O', 'O', 'O', '-', '-', '-']
def test_roundtrip_offsets_biluo_conversion(en_tokenizer):
text = "I flew to Silicon Valley via London."
biluo_tags = ['O', 'O', 'O', 'B-LOC', 'L-LOC', 'O', 'U-GPE', 'O']
offsets = [(10, 24, 'LOC'), (29, 35, 'GPE')]
doc = en_tokenizer(text)
biluo_tags_converted = biluo_tags_from_offsets(doc, offsets)
assert biluo_tags_converted == biluo_tags
offsets_converted = offsets_from_biluo_tags(doc, biluo_tags)
assert offsets_converted == offsets

View File

@ -3,13 +3,37 @@ from __future__ import unicode_literals
import pytest import pytest
@pytest.mark.parametrize('text', ["ca.", "m.a.o.", "Jan.", "Dec."]) @pytest.mark.parametrize('text',
["ca.", "m.a.o.", "Jan.", "Dec.", "kr.", "jf."])
def test_da_tokenizer_handles_abbr(da_tokenizer, text): def test_da_tokenizer_handles_abbr(da_tokenizer, text):
tokens = da_tokenizer(text) tokens = da_tokenizer(text)
assert len(tokens) == 1 assert len(tokens) == 1
@pytest.mark.parametrize('text', ["Jul.", "jul.", "Tor.", "Tors."])
def test_da_tokenizer_handles_ambiguous_abbr(da_tokenizer, text):
tokens = da_tokenizer(text)
assert len(tokens) == 2
@pytest.mark.parametrize('text', ["1.", "10.", "31."])
def test_da_tokenizer_handles_dates(da_tokenizer, text):
tokens = da_tokenizer(text)
assert len(tokens) == 1
def test_da_tokenizer_handles_exc_in_text(da_tokenizer): def test_da_tokenizer_handles_exc_in_text(da_tokenizer):
text = "Det er bl.a. ikke meningen" text = "Det er bl.a. ikke meningen"
tokens = da_tokenizer(text) tokens = da_tokenizer(text)
assert len(tokens) == 5 assert len(tokens) == 5
assert tokens[2].text == "bl.a." assert tokens[2].text == "bl.a."
def test_da_tokenizer_handles_custom_base_exc(da_tokenizer):
text = "Her er noget du kan kigge i."
tokens = da_tokenizer(text)
assert len(tokens) == 8
assert tokens[6].text == "i"
assert tokens[7].text == "."
@pytest.mark.parametrize('text,norm',
[("akvarium", "akvarie"), ("bedstemoder", "bedstemor")])
def test_da_tokenizer_norm_exceptions(da_tokenizer, text, norm):
tokens = da_tokenizer(text)
assert tokens[0].norm_ == norm

View File

@ -21,7 +21,7 @@ def test_lemmatizer_noun_verb_2(FR):
@pytest.mark.models('fr') @pytest.mark.models('fr')
@pytest.mark.xfail(reason="Costaricienne TAG is PROPN instead of NOUN and spacy don't lemmatize PROPN") @pytest.mark.xfail(reason="Costaricienne TAG is PROPN instead of NOUN and spacy don't lemmatize PROPN")
def test_lemmatizer_noun(model): def test_lemmatizer_noun(FR):
tokens = FR("il y a des Costaricienne.") tokens = FR("il y a des Costaricienne.")
assert tokens[4].lemma_ == "Costaricain" assert tokens[4].lemma_ == "Costaricain"

View File

View File

@ -0,0 +1,71 @@
# coding: utf-8
from __future__ import unicode_literals
import pytest
from ....tokens.doc import Doc
@pytest.fixture
def ru_lemmatizer(RU):
return RU.Defaults.create_lemmatizer()
@pytest.mark.models('ru')
def test_doc_lemmatization(RU):
doc = Doc(RU.vocab, words=['мама', 'мыла', 'раму'])
doc[0].tag_ = 'NOUN__Animacy=Anim|Case=Nom|Gender=Fem|Number=Sing'
doc[1].tag_ = 'VERB__Aspect=Imp|Gender=Fem|Mood=Ind|Number=Sing|Tense=Past|VerbForm=Fin|Voice=Act'
doc[2].tag_ = 'NOUN__Animacy=Anim|Case=Acc|Gender=Fem|Number=Sing'
lemmas = [token.lemma_ for token in doc]
assert lemmas == ['мама', 'мыть', 'рама']
@pytest.mark.models('ru')
@pytest.mark.parametrize('text,lemmas', [('гвоздики', ['гвоздик', 'гвоздика']),
('люди', ['человек']),
('реки', ['река']),
('кольцо', ['кольцо']),
('пепперони', ['пепперони'])])
def test_ru_lemmatizer_noun_lemmas(ru_lemmatizer, text, lemmas):
assert sorted(ru_lemmatizer.noun(text)) == lemmas
@pytest.mark.models('ru')
@pytest.mark.parametrize('text,pos,morphology,lemma', [('рой', 'NOUN', None, 'рой'),
('рой', 'VERB', None, 'рыть'),
('клей', 'NOUN', None, 'клей'),
('клей', 'VERB', None, 'клеить'),
('три', 'NUM', None, 'три'),
('кос', 'NOUN', {'Number': 'Sing'}, 'кос'),
('кос', 'NOUN', {'Number': 'Plur'}, 'коса'),
('кос', 'ADJ', None, 'косой'),
('потом', 'NOUN', None, 'пот'),
('потом', 'ADV', None, 'потом')
])
def test_ru_lemmatizer_works_with_different_pos_homonyms(ru_lemmatizer, text, pos, morphology, lemma):
assert ru_lemmatizer(text, pos, morphology) == [lemma]
@pytest.mark.models('ru')
@pytest.mark.parametrize('text,morphology,lemma', [('гвоздики', {'Gender': 'Fem'}, 'гвоздика'),
('гвоздики', {'Gender': 'Masc'}, 'гвоздик'),
('вина', {'Gender': 'Fem'}, 'вина'),
('вина', {'Gender': 'Neut'}, 'вино')
])
def test_ru_lemmatizer_works_with_noun_homonyms(ru_lemmatizer, text, morphology, lemma):
assert ru_lemmatizer.noun(text, morphology) == [lemma]
@pytest.mark.models('ru')
def test_ru_lemmatizer_punct(ru_lemmatizer):
assert ru_lemmatizer.punct('«') == ['"']
assert ru_lemmatizer.punct('»') == ['"']
# @pytest.mark.models('ru')
# def test_ru_lemmatizer_lemma_assignment(RU):
# text = "А роза упала на лапу Азора."
# doc = RU.make_doc(text)
# RU.tagger(doc)
# assert all(t.lemma_ != '' for t in doc)

View File

@ -0,0 +1,128 @@
# coding: utf-8
"""Test that open, closed and paired punctuation is split off correctly."""
from __future__ import unicode_literals
import pytest
PUNCT_OPEN = ['(', '[', '{', '*']
PUNCT_CLOSE = [')', ']', '}', '*']
PUNCT_PAIRED = [('(', ')'), ('[', ']'), ('{', '}'), ('*', '*')]
@pytest.mark.parametrize('text', ["(", "((", "<"])
def test_ru_tokenizer_handles_only_punct(ru_tokenizer, text):
tokens = ru_tokenizer(text)
assert len(tokens) == len(text)
@pytest.mark.parametrize('punct', PUNCT_OPEN)
@pytest.mark.parametrize('text', ["Привет"])
def test_ru_tokenizer_splits_open_punct(ru_tokenizer, punct, text):
tokens = ru_tokenizer(punct + text)
assert len(tokens) == 2
assert tokens[0].text == punct
assert tokens[1].text == text
@pytest.mark.parametrize('punct', PUNCT_CLOSE)
@pytest.mark.parametrize('text', ["Привет"])
def test_ru_tokenizer_splits_close_punct(ru_tokenizer, punct, text):
tokens = ru_tokenizer(text + punct)
assert len(tokens) == 2
assert tokens[0].text == text
assert tokens[1].text == punct
@pytest.mark.parametrize('punct', PUNCT_OPEN)
@pytest.mark.parametrize('punct_add', ["`"])
@pytest.mark.parametrize('text', ["Привет"])
def test_ru_tokenizer_splits_two_diff_open_punct(ru_tokenizer, punct, punct_add, text):
tokens = ru_tokenizer(punct + punct_add + text)
assert len(tokens) == 3
assert tokens[0].text == punct
assert tokens[1].text == punct_add
assert tokens[2].text == text
@pytest.mark.parametrize('punct', PUNCT_CLOSE)
@pytest.mark.parametrize('punct_add', ["'"])
@pytest.mark.parametrize('text', ["Привет"])
def test_ru_tokenizer_splits_two_diff_close_punct(ru_tokenizer, punct, punct_add, text):
tokens = ru_tokenizer(text + punct + punct_add)
assert len(tokens) == 3
assert tokens[0].text == text
assert tokens[1].text == punct
assert tokens[2].text == punct_add
@pytest.mark.parametrize('punct', PUNCT_OPEN)
@pytest.mark.parametrize('text', ["Привет"])
def test_ru_tokenizer_splits_same_open_punct(ru_tokenizer, punct, text):
tokens = ru_tokenizer(punct + punct + punct + text)
assert len(tokens) == 4
assert tokens[0].text == punct
assert tokens[3].text == text
@pytest.mark.parametrize('punct', PUNCT_CLOSE)
@pytest.mark.parametrize('text', ["Привет"])
def test_ru_tokenizer_splits_same_close_punct(ru_tokenizer, punct, text):
tokens = ru_tokenizer(text + punct + punct + punct)
assert len(tokens) == 4
assert tokens[0].text == text
assert tokens[1].text == punct
@pytest.mark.parametrize('text', ["'Тест"])
def test_ru_tokenizer_splits_open_appostrophe(ru_tokenizer, text):
tokens = ru_tokenizer(text)
assert len(tokens) == 2
assert tokens[0].text == "'"
@pytest.mark.parametrize('text', ["Тест''"])
def test_ru_tokenizer_splits_double_end_quote(ru_tokenizer, text):
tokens = ru_tokenizer(text)
assert len(tokens) == 2
tokens_punct = ru_tokenizer("''")
assert len(tokens_punct) == 1
@pytest.mark.parametrize('punct_open,punct_close', PUNCT_PAIRED)
@pytest.mark.parametrize('text', ["Тест"])
def test_ru_tokenizer_splits_open_close_punct(ru_tokenizer, punct_open,
punct_close, text):
tokens = ru_tokenizer(punct_open + text + punct_close)
assert len(tokens) == 3
assert tokens[0].text == punct_open
assert tokens[1].text == text
assert tokens[2].text == punct_close
@pytest.mark.parametrize('punct_open,punct_close', PUNCT_PAIRED)
@pytest.mark.parametrize('punct_open2,punct_close2', [("`", "'")])
@pytest.mark.parametrize('text', ["Тест"])
def test_ru_tokenizer_two_diff_punct(ru_tokenizer, punct_open, punct_close,
punct_open2, punct_close2, text):
tokens = ru_tokenizer(punct_open2 + punct_open + text + punct_close + punct_close2)
assert len(tokens) == 5
assert tokens[0].text == punct_open2
assert tokens[1].text == punct_open
assert tokens[2].text == text
assert tokens[3].text == punct_close
assert tokens[4].text == punct_close2
@pytest.mark.parametrize('text', ["Тест."])
def test_ru_tokenizer_splits_trailing_dot(ru_tokenizer, text):
tokens = ru_tokenizer(text)
assert tokens[1].text == "."
def test_ru_tokenizer_splits_bracket_period(ru_tokenizer):
text = "(Раз, два, три, проверка)."
tokens = ru_tokenizer(text)
assert tokens[len(tokens) - 1].text == "."

View File

@ -0,0 +1,16 @@
# coding: utf-8
"""Test that tokenizer exceptions are parsed correctly."""
from __future__ import unicode_literals
import pytest
@pytest.mark.parametrize('text,norms', [("пн.", ["понедельник"]),
("пт.", ["пятница"]),
("дек.", ["декабрь"])])
def test_ru_tokenizer_abbrev_exceptions(ru_tokenizer, text, norms):
tokens = ru_tokenizer(text)
assert len(tokens) == 1
assert [token.norm_ for token in tokens] == norms

View File

@ -100,3 +100,10 @@ def test_disable_pipes_context(nlp, name):
with nlp.disable_pipes(name): with nlp.disable_pipes(name):
assert not nlp.has_pipe(name) assert not nlp.has_pipe(name)
assert nlp.has_pipe(name) assert nlp.has_pipe(name)
@pytest.mark.parametrize('n_pipes', [100])
def test_add_lots_of_pipes(nlp, n_pipes):
for i in range(n_pipes):
nlp.add_pipe(lambda doc: doc, name='pipe_%d' % i)
assert len(nlp.pipe_names) == n_pipes

View File

@ -0,0 +1,13 @@
from __future__ import unicode_literals
import pytest
@pytest.mark.models('en')
def test_issue1207(EN):
text = 'Employees are recruiting talented staffers from overseas.'
doc = EN(text)
assert [i.text for i in doc.noun_chunks] == ['Employees', 'talented staffers']
sent = list(doc.sents)[0]
assert [i.text for i in sent.noun_chunks] == ['Employees', 'talented staffers']

View File

@ -0,0 +1,39 @@
# coding: utf8
from __future__ import unicode_literals
import pytest
import re
from ...lang.en import English
from ...tokenizer import Tokenizer
def test_issue1494():
infix_re = re.compile(r'''[^a-z]''')
text_to_tokenize1 = 'token 123test'
expected_tokens1 = ['token', '1', '2', '3', 'test']
text_to_tokenize2 = 'token 1test'
expected_tokens2 = ['token', '1test']
text_to_tokenize3 = 'hello...test'
expected_tokens3 = ['hello', '.', '.', '.', 'test']
def my_tokenizer(nlp):
return Tokenizer(nlp.vocab,
{},
infix_finditer=infix_re.finditer
)
nlp = English()
nlp.tokenizer = my_tokenizer(nlp)
tokenized_words1 = [token.text for token in nlp(text_to_tokenize1)]
assert tokenized_words1 == expected_tokens1
tokenized_words2 = [token.text for token in nlp(text_to_tokenize2)]
assert tokenized_words2 == expected_tokens2
tokenized_words3 = [token.text for token in nlp(text_to_tokenize3)]
assert tokenized_words3 == expected_tokens3

View File

@ -1,6 +1,8 @@
# coding: utf8 # coding: utf8
from __future__ import unicode_literals from __future__ import unicode_literals
import gc
from ...lang.en import English from ...lang.en import English
@ -9,14 +11,25 @@ def test_issue1506():
def string_generator(): def string_generator():
for _ in range(10001): for _ in range(10001):
yield "It's sentence produced by that bug." yield u"It's sentence produced by that bug."
for _ in range(10001): for _ in range(10001):
yield "I erase lemmas." yield u"I erase some hbdsaj lemmas."
for _ in range(10001): for _ in range(10001):
yield "It's sentence produced by that bug." yield u"I erase lemmas."
for _ in range(10001):
yield u"It's sentence produced by that bug."
for _ in range(10001):
yield u"It's sentence produced by that bug."
for i, d in enumerate(nlp.pipe(string_generator())):
# We should run cleanup more than one time to actually cleanup data.
# In first run — clean up only mark strings as «not hitted».
if i == 10000 or i == 20000 or i == 30000:
gc.collect()
for d in nlp.pipe(string_generator()):
for t in d: for t in d:
str(t.lemma_) str(t.lemma_)

View File

@ -0,0 +1,8 @@
# coding: utf8
from __future__ import unicode_literals
def test_issue1612(en_tokenizer):
doc = en_tokenizer('The black cat purrs.')
span = doc[1: 3]
assert span.orth_ == span.text

View File

@ -0,0 +1,23 @@
# coding: utf8
from __future__ import unicode_literals
import pytest
from ...language import Language
from ...vocab import Vocab
def test_issue1654():
nlp = Language(Vocab())
assert not nlp.pipeline
nlp.add_pipe(lambda doc: doc, name='1')
nlp.add_pipe(lambda doc: doc, name='2', after='1')
nlp.add_pipe(lambda doc: doc, name='3', after='2')
assert nlp.pipe_names == ['1', '2', '3']
nlp2 = Language(Vocab())
assert not nlp2.pipeline
nlp2.add_pipe(lambda doc: doc, name='3')
nlp2.add_pipe(lambda doc: doc, name='2', before='3')
nlp2.add_pipe(lambda doc: doc, name='1', before='2')
assert nlp2.pipe_names == ['1', '2', '3']

View File

@ -1,4 +1,6 @@
# coding: utf8
from __future__ import unicode_literals from __future__ import unicode_literals
import json import json
import random import random
import contextlib import contextlib
@ -6,7 +8,7 @@ import shutil
import pytest import pytest
import tempfile import tempfile
from pathlib import Path from pathlib import Path
from thinc.neural.optimizers import Adam
from ...gold import GoldParse from ...gold import GoldParse
from ...pipeline import EntityRecognizer from ...pipeline import EntityRecognizer

View File

@ -48,15 +48,17 @@ def test_displacy_parse_deps(en_vocab):
"""Test that deps and tags on a Doc are converted into displaCy's format.""" """Test that deps and tags on a Doc are converted into displaCy's format."""
words = ["This", "is", "a", "sentence"] words = ["This", "is", "a", "sentence"]
heads = [1, 0, 1, -2] heads = [1, 0, 1, -2]
pos = ['DET', 'VERB', 'DET', 'NOUN']
tags = ['DT', 'VBZ', 'DT', 'NN'] tags = ['DT', 'VBZ', 'DT', 'NN']
deps = ['nsubj', 'ROOT', 'det', 'attr'] deps = ['nsubj', 'ROOT', 'det', 'attr']
doc = get_doc(en_vocab, words=words, heads=heads, tags=tags, deps=deps) doc = get_doc(en_vocab, words=words, heads=heads, pos=pos, tags=tags,
deps=deps)
deps = parse_deps(doc) deps = parse_deps(doc)
assert isinstance(deps, dict) assert isinstance(deps, dict)
assert deps['words'] == [{'text': 'This', 'tag': 'DT'}, assert deps['words'] == [{'text': 'This', 'tag': 'DET'},
{'text': 'is', 'tag': 'VBZ'}, {'text': 'is', 'tag': 'VERB'},
{'text': 'a', 'tag': 'DT'}, {'text': 'a', 'tag': 'DET'},
{'text': 'sentence', 'tag': 'NN'}] {'text': 'sentence', 'tag': 'NOUN'}]
assert deps['arcs'] == [{'start': 0, 'end': 1, 'label': 'nsubj', 'dir': 'left'}, assert deps['arcs'] == [{'start': 0, 'end': 1, 'label': 'nsubj', 'dir': 'left'},
{'start': 2, 'end': 3, 'label': 'det', 'dir': 'left'}, {'start': 2, 'end': 3, 'label': 'det', 'dir': 'left'},
{'start': 1, 'end': 3, 'label': 'attr', 'dir': 'right'}] {'start': 1, 'end': 3, 'label': 'attr', 'dir': 'right'}]

View File

@ -133,6 +133,10 @@ cdef class Tokenizer:
for text in texts: for text in texts:
yield self(text) yield self(text)
def _reset_cache(self, keys):
for k in keys:
del self._cache[k]
cdef int _try_cache(self, hash_t key, Doc tokens) except -1: cdef int _try_cache(self, hash_t key, Doc tokens) except -1:
cached = <_Cached*>self._cache.get(key) cached = <_Cached*>self._cache.get(key)
if cached == NULL: if cached == NULL:
@ -238,14 +242,17 @@ cdef class Tokenizer:
# let's say we have dyn-o-mite-dave - the regex finds the # let's say we have dyn-o-mite-dave - the regex finds the
# start and end positions of the hyphens # start and end positions of the hyphens
start = 0 start = 0
start_before_infixes = start
for match in matches: for match in matches:
infix_start = match.start() infix_start = match.start()
infix_end = match.end() infix_end = match.end()
if infix_start == start:
if infix_start == start_before_infixes:
continue continue
span = string[start:infix_start] if infix_start != start:
tokens.push_back(self.vocab.get(tokens.mem, span), False) span = string[start:infix_start]
tokens.push_back(self.vocab.get(tokens.mem, span), False)
if infix_start != infix_end: if infix_start != infix_end:
# If infix_start != infix_end, it means the infix # If infix_start != infix_end, it means the infix

View File

@ -1,6 +1,7 @@
# coding: utf8 # coding: utf8
# cython: infer_types=True # cython: infer_types=True
# cython: bounds_check=False # cython: bounds_check=False
# cython: profile=True
from __future__ import unicode_literals from __future__ import unicode_literals
cimport cython cimport cython
@ -543,8 +544,6 @@ cdef class Doc:
assert t.lex.orth != 0 assert t.lex.orth != 0
t.spacy = has_space t.spacy = has_space
self.length += 1 self.length += 1
# Set morphological attributes, e.g. by lemma, if possible
self.vocab.morphology.assign_untagged(t)
return t.idx + t.lex.length + t.spacy return t.idx + t.lex.length + t.spacy
@cython.boundscheck(False) @cython.boundscheck(False)
@ -569,7 +568,6 @@ cdef class Doc:
""" """
cdef int i, j cdef int i, j
cdef attr_id_t feature cdef attr_id_t feature
cdef np.ndarray[attr_t, ndim=1] attr_ids
cdef np.ndarray[attr_t, ndim=2] output cdef np.ndarray[attr_t, ndim=2] output
# Handle scalar/list inputs of strings/ints for py_attr_ids # Handle scalar/list inputs of strings/ints for py_attr_ids
if not hasattr(py_attr_ids, '__iter__') \ if not hasattr(py_attr_ids, '__iter__') \
@ -581,12 +579,17 @@ cdef class Doc:
for id_ in py_attr_ids] for id_ in py_attr_ids]
# Make an array from the attributes --- otherwise our inner loop is # Make an array from the attributes --- otherwise our inner loop is
# Python dict iteration. # Python dict iteration.
attr_ids = numpy.asarray(py_attr_ids, dtype=numpy.uint64) cdef np.ndarray attr_ids = numpy.asarray(py_attr_ids, dtype='i')
output = numpy.ndarray(shape=(self.length, len(attr_ids)), output = numpy.ndarray(shape=(self.length, len(attr_ids)),
dtype=numpy.uint64) dtype=numpy.uint64)
c_output = <attr_t*>output.data
c_attr_ids = <attr_id_t*>attr_ids.data
cdef TokenC* token
cdef int nr_attr = attr_ids.shape[0]
for i in range(self.length): for i in range(self.length):
for j, feature in enumerate(attr_ids): token = &self.c[i]
output[i, j] = get_token_attr(&self.c[i], feature) for j in range(nr_attr):
c_output[i*nr_attr + j] = get_token_attr(token, c_attr_ids[j])
# Handle 1d case # Handle 1d case
return output if len(attr_ids) >= 2 else output.reshape((self.length,)) return output if len(attr_ids) >= 2 else output.reshape((self.length,))

View File

@ -370,7 +370,7 @@ cdef class Span:
spans = [] spans = []
cdef attr_t label cdef attr_t label
for start, end, label in self.doc.noun_chunks_iterator(self): for start, end, label in self.doc.noun_chunks_iterator(self):
spans.append(Span(self, start, end, label=label)) spans.append(Span(self.doc, start, end, label=label))
for span in spans: for span in spans:
yield span yield span
@ -527,7 +527,7 @@ cdef class Span:
RETURNS (unicode): The span's text.""" RETURNS (unicode): The span's text."""
def __get__(self): def __get__(self):
return ''.join([t.orth_ for t in self]).strip() return self.text
property lemma_: property lemma_:
"""RETURNS (unicode): The span's lemma.""" """RETURNS (unicode): The span's lemma."""

View File

@ -257,7 +257,11 @@ cdef class Token:
inflectional suffixes. inflectional suffixes.
""" """
def __get__(self): def __get__(self):
return self.c.lemma if self.c.lemma == 0:
lemma = self.vocab.morphology.lemmatizer.lookup(self.orth_)
return lemma
else:
return self.c.lemma
def __set__(self, attr_t lemma): def __set__(self, attr_t lemma):
self.c.lemma = lemma self.c.lemma = lemma
@ -724,7 +728,10 @@ cdef class Token:
with no inflectional suffixes. with no inflectional suffixes.
""" """
def __get__(self): def __get__(self):
return self.vocab.strings[self.c.lemma] if self.c.lemma == 0:
return self.vocab.morphology.lemmatizer.lookup(self.orth_)
else:
return self.vocab.strings[self.c.lemma]
def __set__(self, unicode lemma_): def __set__(self, unicode lemma_):
self.c.lemma = self.vocab.strings.add(lemma_) self.c.lemma = self.vocab.strings.add(lemma_)

View File

@ -467,6 +467,13 @@ cdef class Vocab:
self._by_orth.set(lexeme.orth, lexeme) self._by_orth.set(lexeme.orth, lexeme)
self.length += 1 self.length += 1
def _reset_cache(self, keys, strings):
for k in keys:
del self._by_hash[k]
if len(strings) != 0:
self._by_orth = PreshMap()
def pickle_vocab(vocab): def pickle_vocab(vocab):
sstore = vocab.strings sstore = vocab.strings

View File

@ -110,7 +110,7 @@ p
| information about your installation, models and local setup from within | information about your installation, models and local setup from within
| spaCy. To get the model meta data as a dictionary instead, you can | spaCy. To get the model meta data as a dictionary instead, you can
| use the #[code meta] attribute on your #[code nlp] object with a | use the #[code meta] attribute on your #[code nlp] object with a
| loaded model, e.g. #[code nlp['meta']]. | loaded model, e.g. #[code nlp.meta].
+aside-code("Example"). +aside-code("Example").
spacy.info() spacy.info()

View File

@ -123,7 +123,7 @@ p
p p
| Returns a list of unicode strings, describing the tags. Each tag string | Returns a list of unicode strings, describing the tags. Each tag string
| will be of the form either #[code ""], #[code "O"] or | will be of the form of either #[code ""], #[code "O"] or
| #[code "{action}-{label}"], where action is one of #[code "B"], | #[code "{action}-{label}"], where action is one of #[code "B"],
| #[code "I"], #[code "L"], #[code "U"]. The string #[code &quot;-&quot;] | #[code "I"], #[code "L"], #[code "U"]. The string #[code &quot;-&quot;]
| is used where the entity offsets don't align with the tokenization in the | is used where the entity offsets don't align with the tokenization in the
@ -135,9 +135,9 @@ p
+aside-code("Example"). +aside-code("Example").
from spacy.gold import biluo_tags_from_offsets from spacy.gold import biluo_tags_from_offsets
text = 'I like London.'
entities = [(len('I like '), len('I like London'), 'LOC')] doc = nlp('I like London.')
doc = tokenizer(text) entities = [(7, 13, 'LOC')]
tags = biluo_tags_from_offsets(doc, entities) tags = biluo_tags_from_offsets(doc, entities)
assert tags == ['O', 'O', 'U-LOC', 'O'] assert tags == ['O', 'O', 'U-LOC', 'O']
@ -163,5 +163,3 @@ p
+cell +cell
| Unicode strings, describing the | Unicode strings, describing the
| #[+a("/api/annotation#biluo") BILUO] tags. | #[+a("/api/annotation#biluo") BILUO] tags.

View File

@ -16,7 +16,7 @@ Prism.languages.json={property:/".*?"(?=\s*:)/gi,string:/"(?!:)(\\?[^"])*?"(?!:)
!function(a){var e=/\\([^a-z()[\]]|[a-z\*]+)/i,n={"equation-command":{pattern:e,alias:"regex"}};a.languages.latex={comment:/%.*/m,cdata:{pattern:/(\\begin\{((?:verbatim|lstlisting)\*?)\})([\w\W]*?)(?=\\end\{\2\})/,lookbehind:!0},equation:[{pattern:/\$(?:\\?[\w\W])*?\$|\\\((?:\\?[\w\W])*?\\\)|\\\[(?:\\?[\w\W])*?\\\]/,inside:n,alias:"string"},{pattern:/(\\begin\{((?:equation|math|eqnarray|align|multline|gather)\*?)\})([\w\W]*?)(?=\\end\{\2\})/,lookbehind:!0,inside:n,alias:"string"}],keyword:{pattern:/(\\(?:begin|end|ref|cite|label|usepackage|documentclass)(?:\[[^\]]+\])?\{)[^}]+(?=\})/,lookbehind:!0},url:{pattern:/(\\url\{)[^}]+(?=\})/,lookbehind:!0},headline:{pattern:/(\\(?:part|chapter|section|subsection|frametitle|subsubsection|paragraph|subparagraph|subsubparagraph|subsubsubparagraph)\*?(?:\[[^\]]+\])?\{)[^}]+(?=\}(?:\[[^\]]+\])?)/,lookbehind:!0,alias:"class-name"},"function":{pattern:e,alias:"selector"},punctuation:/[[\]{}&]/}}(Prism); !function(a){var e=/\\([^a-z()[\]]|[a-z\*]+)/i,n={"equation-command":{pattern:e,alias:"regex"}};a.languages.latex={comment:/%.*/m,cdata:{pattern:/(\\begin\{((?:verbatim|lstlisting)\*?)\})([\w\W]*?)(?=\\end\{\2\})/,lookbehind:!0},equation:[{pattern:/\$(?:\\?[\w\W])*?\$|\\\((?:\\?[\w\W])*?\\\)|\\\[(?:\\?[\w\W])*?\\\]/,inside:n,alias:"string"},{pattern:/(\\begin\{((?:equation|math|eqnarray|align|multline|gather)\*?)\})([\w\W]*?)(?=\\end\{\2\})/,lookbehind:!0,inside:n,alias:"string"}],keyword:{pattern:/(\\(?:begin|end|ref|cite|label|usepackage|documentclass)(?:\[[^\]]+\])?\{)[^}]+(?=\})/,lookbehind:!0},url:{pattern:/(\\url\{)[^}]+(?=\})/,lookbehind:!0},headline:{pattern:/(\\(?:part|chapter|section|subsection|frametitle|subsubsection|paragraph|subparagraph|subsubparagraph|subsubsubparagraph)\*?(?:\[[^\]]+\])?\{)[^}]+(?=\}(?:\[[^\]]+\])?)/,lookbehind:!0,alias:"class-name"},"function":{pattern:e,alias:"selector"},punctuation:/[[\]{}&]/}}(Prism);
Prism.languages.makefile={comment:{pattern:/(^|[^\\])#(?:\\(?:\r\n|[\s\S])|.)*/,lookbehind:!0},string:/(["'])(?:\\(?:\r\n|[\s\S])|(?!\1)[^\\\r\n])*\1/,builtin:/\.[A-Z][^:#=\s]+(?=\s*:(?!=))/,symbol:{pattern:/^[^:=\r\n]+(?=\s*:(?!=))/m,inside:{variable:/\$+(?:[^(){}:#=\s]+|(?=[({]))/}},variable:/\$+(?:[^(){}:#=\s]+|\([@*%<^+?][DF]\)|(?=[({]))/,keyword:[/-include\b|\b(?:define|else|endef|endif|export|ifn?def|ifn?eq|include|override|private|sinclude|undefine|unexport|vpath)\b/,{pattern:/(\()(?:addsuffix|abspath|and|basename|call|dir|error|eval|file|filter(?:-out)?|findstring|firstword|flavor|foreach|guile|if|info|join|lastword|load|notdir|or|origin|patsubst|realpath|shell|sort|strip|subst|suffix|value|warning|wildcard|word(?:s|list)?)(?=[ \t])/,lookbehind:!0}],operator:/(?:::|[?:+!])?=|[|@]/,punctuation:/[:;(){}]/}; Prism.languages.makefile={comment:{pattern:/(^|[^\\])#(?:\\(?:\r\n|[\s\S])|.)*/,lookbehind:!0},string:/(["'])(?:\\(?:\r\n|[\s\S])|(?!\1)[^\\\r\n])*\1/,builtin:/\.[A-Z][^:#=\s]+(?=\s*:(?!=))/,symbol:{pattern:/^[^:=\r\n]+(?=\s*:(?!=))/m,inside:{variable:/\$+(?:[^(){}:#=\s]+|(?=[({]))/}},variable:/\$+(?:[^(){}:#=\s]+|\([@*%<^+?][DF]\)|(?=[({]))/,keyword:[/-include\b|\b(?:define|else|endef|endif|export|ifn?def|ifn?eq|include|override|private|sinclude|undefine|unexport|vpath)\b/,{pattern:/(\()(?:addsuffix|abspath|and|basename|call|dir|error|eval|file|filter(?:-out)?|findstring|firstword|flavor|foreach|guile|if|info|join|lastword|load|notdir|or|origin|patsubst|realpath|shell|sort|strip|subst|suffix|value|warning|wildcard|word(?:s|list)?)(?=[ \t])/,lookbehind:!0}],operator:/(?:::|[?:+!])?=|[|@]/,punctuation:/[:;(){}]/};
Prism.languages.markdown=Prism.languages.extend("markup",{}),Prism.languages.insertBefore("markdown","prolog",{blockquote:{pattern:/^>(?:[\t ]*>)*/m,alias:"punctuation"},code:[{pattern:/^(?: {4}|\t).+/m,alias:"keyword"},{pattern:/``.+?``|`[^`\n]+`/,alias:"keyword"}],title:[{pattern:/\w+.*(?:\r?\n|\r)(?:==+|--+)/,alias:"important",inside:{punctuation:/==+$|--+$/}},{pattern:/(^\s*)#+.+/m,lookbehind:!0,alias:"important",inside:{punctuation:/^#+|#+$/}}],hr:{pattern:/(^\s*)([*-])([\t ]*\2){2,}(?=\s*$)/m,lookbehind:!0,alias:"punctuation"},list:{pattern:/(^\s*)(?:[*+-]|\d+\.)(?=[\t ].)/m,lookbehind:!0,alias:"punctuation"},"url-reference":{pattern:/!?\[[^\]]+\]:[\t ]+(?:\S+|<(?:\\.|[^>\\])+>)(?:[\t ]+(?:"(?:\\.|[^"\\])*"|'(?:\\.|[^'\\])*'|\((?:\\.|[^)\\])*\)))?/,inside:{variable:{pattern:/^(!?\[)[^\]]+/,lookbehind:!0},string:/(?:"(?:\\.|[^"\\])*"|'(?:\\.|[^'\\])*'|\((?:\\.|[^)\\])*\))$/,punctuation:/^[\[\]!:]|[<>]/},alias:"url"},bold:{pattern:/(^|[^\\])(\*\*|__)(?:(?:\r?\n|\r)(?!\r?\n|\r)|.)+?\2/,lookbehind:!0,inside:{punctuation:/^\*\*|^__|\*\*$|__$/}},italic:{pattern:/(^|[^\\])([*_])(?:(?:\r?\n|\r)(?!\r?\n|\r)|.)+?\2/,lookbehind:!0,inside:{punctuation:/^[*_]|[*_]$/}},url:{pattern:/!?\[[^\]]+\](?:\([^\s)]+(?:[\t ]+"(?:\\.|[^"\\])*")?\)| ?\[[^\]\n]*\])/,inside:{variable:{pattern:/(!?\[)[^\]]+(?=\]$)/,lookbehind:!0},string:{pattern:/"(?:\\.|[^"\\])*"(?=\)$)/}}}}),Prism.languages.markdown.bold.inside.url=Prism.util.clone(Prism.languages.markdown.url),Prism.languages.markdown.italic.inside.url=Prism.util.clone(Prism.languages.markdown.url),Prism.languages.markdown.bold.inside.italic=Prism.util.clone(Prism.languages.markdown.italic),Prism.languages.markdown.italic.inside.bold=Prism.util.clone(Prism.languages.markdown.bold); Prism.languages.markdown=Prism.languages.extend("markup",{}),Prism.languages.insertBefore("markdown","prolog",{blockquote:{pattern:/^>(?:[\t ]*>)*/m,alias:"punctuation"},code:[{pattern:/^(?: {4}|\t).+/m,alias:"keyword"},{pattern:/``.+?``|`[^`\n]+`/,alias:"keyword"}],title:[{pattern:/\w+.*(?:\r?\n|\r)(?:==+|--+)/,alias:"important",inside:{punctuation:/==+$|--+$/}},{pattern:/(^\s*)#+.+/m,lookbehind:!0,alias:"important",inside:{punctuation:/^#+|#+$/}}],hr:{pattern:/(^\s*)([*-])([\t ]*\2){2,}(?=\s*$)/m,lookbehind:!0,alias:"punctuation"},list:{pattern:/(^\s*)(?:[*+-]|\d+\.)(?=[\t ].)/m,lookbehind:!0,alias:"punctuation"},"url-reference":{pattern:/!?\[[^\]]+\]:[\t ]+(?:\S+|<(?:\\.|[^>\\])+>)(?:[\t ]+(?:"(?:\\.|[^"\\])*"|'(?:\\.|[^'\\])*'|\((?:\\.|[^)\\])*\)))?/,inside:{variable:{pattern:/^(!?\[)[^\]]+/,lookbehind:!0},string:/(?:"(?:\\.|[^"\\])*"|'(?:\\.|[^'\\])*'|\((?:\\.|[^)\\])*\))$/,punctuation:/^[\[\]!:]|[<>]/},alias:"url"},bold:{pattern:/(^|[^\\])(\*\*|__)(?:(?:\r?\n|\r)(?!\r?\n|\r)|.)+?\2/,lookbehind:!0,inside:{punctuation:/^\*\*|^__|\*\*$|__$/}},italic:{pattern:/(^|[^\\])([*_])(?:(?:\r?\n|\r)(?!\r?\n|\r)|.)+?\2/,lookbehind:!0,inside:{punctuation:/^[*_]|[*_]$/}},url:{pattern:/!?\[[^\]]+\](?:\([^\s)]+(?:[\t ]+"(?:\\.|[^"\\])*")?\)| ?\[[^\]\n]*\])/,inside:{variable:{pattern:/(!?\[)[^\]]+(?=\]$)/,lookbehind:!0},string:{pattern:/"(?:\\.|[^"\\])*"(?=\)$)/}}}}),Prism.languages.markdown.bold.inside.url=Prism.util.clone(Prism.languages.markdown.url),Prism.languages.markdown.italic.inside.url=Prism.util.clone(Prism.languages.markdown.url),Prism.languages.markdown.bold.inside.italic=Prism.util.clone(Prism.languages.markdown.italic),Prism.languages.markdown.italic.inside.bold=Prism.util.clone(Prism.languages.markdown.bold);
Prism.languages.python={"triple-quoted-string":{pattern:/"""[\s\S]+?"""|'''[\s\S]+?'''/,alias:"string"},comment:{pattern:/(^|[^\\])#.*/,lookbehind:!0},string:/("|')(?:\\?.)*?\1/,"function":{pattern:/((?:^|\s)def[ \t]+)[a-zA-Z_][a-zA-Z0-9_]*(?=\()/g,lookbehind:!0},"class-name":{pattern:/(\bclass\s+)[a-z0-9_]+/i,lookbehind:!0},keyword:/\b(?:as|assert|async|await|break|class|continue|def|del|elif|else|except|exec|finally|for|from|global|if|import|in|is|lambda|pass|print|raise|return|try|while|with|yield)\b/,"boolean":/\b(?:True|False)\b/,number:/\b-?(?:0[bo])?(?:(?:\d|0x[\da-f])[\da-f]*\.?\d*|\.\d+)(?:e[+-]?\d+)?j?\b/i,operator:/[-+%=]=?|!=|\*\*?=?|\/\/?=?|<[<=>]?|>[=>]?|[&|^~]|\b(?:or|and|not)\b/,punctuation:/[{}[\];(),.:]/,"constant":/\b[A-Z_]{2,}\b/}; Prism.languages.python={"triple-quoted-string":{pattern:/"""[\s\S]+?"""|'''[\s\S]+?'''/,alias:"string"},comment:{pattern:/(^|[^\\])#.*/,lookbehind:!0},string:/("|')(?:\\?.)*?\1/,"function":{pattern:/((?:^|\s)def[ \t]+)[a-zA-Z_][a-zA-Z0-9_]*(?=\()/g,lookbehind:!0},"class-name":{pattern:/(\bclass\s+)[a-z0-9_]+/i,lookbehind:!0},keyword:/\b(?:as|assert|async|await|break|class|continue|def|del|elif|else|except|exec|finally|for|from|global|if|import|in|is|lambda|pass|print|raise|return|try|while|with|yield)\b/,"boolean":/\b(?:True|False|None)\b/,number:/\b-?(?:0[bo])?(?:(?:\d|0x[\da-f])[\da-f]*\.?\d*|\.\d+)(?:e[+-]?\d+)?j?\b/i,operator:/[-+%=]=?|!=|\*\*?=?|\/\/?=?|<[<=>]?|>[=>]?|[&|^~]|\b(?:or|and|not)\b/,punctuation:/[{}[\];(),.:]/,"constant":/\b[A-Z_]{2,}\b/};
Prism.languages.rest={table:[{pattern:/(\s*)(?:\+[=-]+)+\+(?:\r?\n|\r)(?:\1(?:[+|].+)+[+|](?:\r?\n|\r))+\1(?:\+[=-]+)+\+/,lookbehind:!0,inside:{punctuation:/\||(?:\+[=-]+)+\+/}},{pattern:/(\s*)(?:=+ +)+=+((?:\r?\n|\r)\1.+)+(?:\r?\n|\r)\1(?:=+ +)+=+(?=(?:\r?\n|\r){2}|\s*$)/,lookbehind:!0,inside:{punctuation:/[=-]+/}}],"substitution-def":{pattern:/(^\s*\.\. )\|(?:[^|\s](?:[^|]*[^|\s])?)\| [^:]+::/m,lookbehind:!0,inside:{substitution:{pattern:/^\|(?:[^|\s]|[^|\s][^|]*[^|\s])\|/,alias:"attr-value",inside:{punctuation:/^\||\|$/}},directive:{pattern:/( +)[^:]+::/,lookbehind:!0,alias:"function",inside:{punctuation:/::$/}}}},"link-target":[{pattern:/(^\s*\.\. )\[[^\]]+\]/m,lookbehind:!0,alias:"string",inside:{punctuation:/^\[|\]$/}},{pattern:/(^\s*\.\. )_(?:`[^`]+`|(?:[^:\\]|\\.)+):/m,lookbehind:!0,alias:"string",inside:{punctuation:/^_|:$/}}],directive:{pattern:/(^\s*\.\. )[^:]+::/m,lookbehind:!0,alias:"function",inside:{punctuation:/::$/}},comment:{pattern:/(^\s*\.\.)(?:(?: .+)?(?:(?:\r?\n|\r).+)+| .+)(?=(?:\r?\n|\r){2}|$)/m,lookbehind:!0},title:[{pattern:/^(([!"#$%&'()*+,\-.\/:;<=>?@\[\\\]^_`{|}~])\2+)(?:\r?\n|\r).+(?:\r?\n|\r)\1$/m,inside:{punctuation:/^[!"#$%&'()*+,\-.\/:;<=>?@\[\\\]^_`{|}~]+|[!"#$%&'()*+,\-.\/:;<=>?@\[\\\]^_`{|}~]+$/,important:/.+/}},{pattern:/(^|(?:\r?\n|\r){2}).+(?:\r?\n|\r)([!"#$%&'()*+,\-.\/:;<=>?@\[\\\]^_`{|}~])\2+(?=\r?\n|\r|$)/,lookbehind:!0,inside:{punctuation:/[!"#$%&'()*+,\-.\/:;<=>?@\[\\\]^_`{|}~]+$/,important:/.+/}}],hr:{pattern:/((?:\r?\n|\r){2})([!"#$%&'()*+,\-.\/:;<=>?@\[\\\]^_`{|}~])\2{3,}(?=(?:\r?\n|\r){2})/,lookbehind:!0,alias:"punctuation"},field:{pattern:/(^\s*):[^:\r\n]+:(?= )/m,lookbehind:!0,alias:"attr-name"},"command-line-option":{pattern:/(^\s*)(?:[+-][a-z\d]|(?:\-\-|\/)[a-z\d-]+)(?:[ =](?:[a-z][a-z\d_-]*|<[^<>]+>))?(?:, (?:[+-][a-z\d]|(?:\-\-|\/)[a-z\d-]+)(?:[ =](?:[a-z][a-z\d_-]*|<[^<>]+>))?)*(?=(?:\r?\n|\r)? {2,}\S)/im,lookbehind:!0,alias:"symbol"},"literal-block":{pattern:/::(?:\r?\n|\r){2}([ \t]+).+(?:(?:\r?\n|\r)\1.+)*/,inside:{"literal-block-punctuation":{pattern:/^::/,alias:"punctuation"}}},"quoted-literal-block":{pattern:/::(?:\r?\n|\r){2}([!"#$%&'()*+,\-.\/:;<=>?@\[\\\]^_`{|}~]).*(?:(?:\r?\n|\r)\1.*)*/,inside:{"literal-block-punctuation":{pattern:/^(?:::|([!"#$%&'()*+,\-.\/:;<=>?@\[\\\]^_`{|}~])\1*)/m,alias:"punctuation"}}},"list-bullet":{pattern:/(^\s*)(?:[*+\-•‣⁃]|\(?(?:\d+|[a-z]|[ivxdclm]+)\)|(?:\d+|[a-z]|[ivxdclm]+)\.)(?= )/im,lookbehind:!0,alias:"punctuation"},"doctest-block":{pattern:/(^\s*)>>> .+(?:(?:\r?\n|\r).+)*/m,lookbehind:!0,inside:{punctuation:/^>>>/}},inline:[{pattern:/(^|[\s\-:\/'"<(\[{])(?::[^:]+:`.*?`|`.*?`:[^:]+:|(\*\*?|``?|\|)(?!\s).*?[^\s]\2(?=[\s\-.,:;!?\\\/'")\]}]|$))/m,lookbehind:!0,inside:{bold:{pattern:/(^\*\*).+(?=\*\*$)/,lookbehind:!0},italic:{pattern:/(^\*).+(?=\*$)/,lookbehind:!0},"inline-literal":{pattern:/(^``).+(?=``$)/,lookbehind:!0,alias:"symbol"},role:{pattern:/^:[^:]+:|:[^:]+:$/,alias:"function",inside:{punctuation:/^:|:$/}},"interpreted-text":{pattern:/(^`).+(?=`$)/,lookbehind:!0,alias:"attr-value"},substitution:{pattern:/(^\|).+(?=\|$)/,lookbehind:!0,alias:"attr-value"},punctuation:/\*\*?|``?|\|/}}],link:[{pattern:/\[[^\]]+\]_(?=[\s\-.,:;!?\\\/'")\]}]|$)/,alias:"string",inside:{punctuation:/^\[|\]_$/}},{pattern:/(?:\b[a-z\d](?:[_.:+]?[a-z\d]+)*_?_|`[^`]+`_?_|_`[^`]+`)(?=[\s\-.,:;!?\\\/'")\]}]|$)/i,alias:"string",inside:{punctuation:/^_?`|`$|`?_?_$/}}],punctuation:{pattern:/(^\s*)(?:\|(?= |$)|(?:---?|—|\.\.|__)(?= )|\.\.$)/m,lookbehind:!0}}; Prism.languages.rest={table:[{pattern:/(\s*)(?:\+[=-]+)+\+(?:\r?\n|\r)(?:\1(?:[+|].+)+[+|](?:\r?\n|\r))+\1(?:\+[=-]+)+\+/,lookbehind:!0,inside:{punctuation:/\||(?:\+[=-]+)+\+/}},{pattern:/(\s*)(?:=+ +)+=+((?:\r?\n|\r)\1.+)+(?:\r?\n|\r)\1(?:=+ +)+=+(?=(?:\r?\n|\r){2}|\s*$)/,lookbehind:!0,inside:{punctuation:/[=-]+/}}],"substitution-def":{pattern:/(^\s*\.\. )\|(?:[^|\s](?:[^|]*[^|\s])?)\| [^:]+::/m,lookbehind:!0,inside:{substitution:{pattern:/^\|(?:[^|\s]|[^|\s][^|]*[^|\s])\|/,alias:"attr-value",inside:{punctuation:/^\||\|$/}},directive:{pattern:/( +)[^:]+::/,lookbehind:!0,alias:"function",inside:{punctuation:/::$/}}}},"link-target":[{pattern:/(^\s*\.\. )\[[^\]]+\]/m,lookbehind:!0,alias:"string",inside:{punctuation:/^\[|\]$/}},{pattern:/(^\s*\.\. )_(?:`[^`]+`|(?:[^:\\]|\\.)+):/m,lookbehind:!0,alias:"string",inside:{punctuation:/^_|:$/}}],directive:{pattern:/(^\s*\.\. )[^:]+::/m,lookbehind:!0,alias:"function",inside:{punctuation:/::$/}},comment:{pattern:/(^\s*\.\.)(?:(?: .+)?(?:(?:\r?\n|\r).+)+| .+)(?=(?:\r?\n|\r){2}|$)/m,lookbehind:!0},title:[{pattern:/^(([!"#$%&'()*+,\-.\/:;<=>?@\[\\\]^_`{|}~])\2+)(?:\r?\n|\r).+(?:\r?\n|\r)\1$/m,inside:{punctuation:/^[!"#$%&'()*+,\-.\/:;<=>?@\[\\\]^_`{|}~]+|[!"#$%&'()*+,\-.\/:;<=>?@\[\\\]^_`{|}~]+$/,important:/.+/}},{pattern:/(^|(?:\r?\n|\r){2}).+(?:\r?\n|\r)([!"#$%&'()*+,\-.\/:;<=>?@\[\\\]^_`{|}~])\2+(?=\r?\n|\r|$)/,lookbehind:!0,inside:{punctuation:/[!"#$%&'()*+,\-.\/:;<=>?@\[\\\]^_`{|}~]+$/,important:/.+/}}],hr:{pattern:/((?:\r?\n|\r){2})([!"#$%&'()*+,\-.\/:;<=>?@\[\\\]^_`{|}~])\2{3,}(?=(?:\r?\n|\r){2})/,lookbehind:!0,alias:"punctuation"},field:{pattern:/(^\s*):[^:\r\n]+:(?= )/m,lookbehind:!0,alias:"attr-name"},"command-line-option":{pattern:/(^\s*)(?:[+-][a-z\d]|(?:\-\-|\/)[a-z\d-]+)(?:[ =](?:[a-z][a-z\d_-]*|<[^<>]+>))?(?:, (?:[+-][a-z\d]|(?:\-\-|\/)[a-z\d-]+)(?:[ =](?:[a-z][a-z\d_-]*|<[^<>]+>))?)*(?=(?:\r?\n|\r)? {2,}\S)/im,lookbehind:!0,alias:"symbol"},"literal-block":{pattern:/::(?:\r?\n|\r){2}([ \t]+).+(?:(?:\r?\n|\r)\1.+)*/,inside:{"literal-block-punctuation":{pattern:/^::/,alias:"punctuation"}}},"quoted-literal-block":{pattern:/::(?:\r?\n|\r){2}([!"#$%&'()*+,\-.\/:;<=>?@\[\\\]^_`{|}~]).*(?:(?:\r?\n|\r)\1.*)*/,inside:{"literal-block-punctuation":{pattern:/^(?:::|([!"#$%&'()*+,\-.\/:;<=>?@\[\\\]^_`{|}~])\1*)/m,alias:"punctuation"}}},"list-bullet":{pattern:/(^\s*)(?:[*+\-•‣⁃]|\(?(?:\d+|[a-z]|[ivxdclm]+)\)|(?:\d+|[a-z]|[ivxdclm]+)\.)(?= )/im,lookbehind:!0,alias:"punctuation"},"doctest-block":{pattern:/(^\s*)>>> .+(?:(?:\r?\n|\r).+)*/m,lookbehind:!0,inside:{punctuation:/^>>>/}},inline:[{pattern:/(^|[\s\-:\/'"<(\[{])(?::[^:]+:`.*?`|`.*?`:[^:]+:|(\*\*?|``?|\|)(?!\s).*?[^\s]\2(?=[\s\-.,:;!?\\\/'")\]}]|$))/m,lookbehind:!0,inside:{bold:{pattern:/(^\*\*).+(?=\*\*$)/,lookbehind:!0},italic:{pattern:/(^\*).+(?=\*$)/,lookbehind:!0},"inline-literal":{pattern:/(^``).+(?=``$)/,lookbehind:!0,alias:"symbol"},role:{pattern:/^:[^:]+:|:[^:]+:$/,alias:"function",inside:{punctuation:/^:|:$/}},"interpreted-text":{pattern:/(^`).+(?=`$)/,lookbehind:!0,alias:"attr-value"},substitution:{pattern:/(^\|).+(?=\|$)/,lookbehind:!0,alias:"attr-value"},punctuation:/\*\*?|``?|\|/}}],link:[{pattern:/\[[^\]]+\]_(?=[\s\-.,:;!?\\\/'")\]}]|$)/,alias:"string",inside:{punctuation:/^\[|\]_$/}},{pattern:/(?:\b[a-z\d](?:[_.:+]?[a-z\d]+)*_?_|`[^`]+`_?_|_`[^`]+`)(?=[\s\-.,:;!?\\\/'")\]}]|$)/i,alias:"string",inside:{punctuation:/^_?`|`$|`?_?_$/}}],punctuation:{pattern:/(^\s*)(?:\|(?= |$)|(?:---?|—|\.\.|__)(?= )|\.\.$)/m,lookbehind:!0}};
!function(e){e.languages.sass=e.languages.extend("css",{comment:{pattern:/^([ \t]*)\/[\/*].*(?:(?:\r?\n|\r)\1[ \t]+.+)*/m,lookbehind:!0}}),e.languages.insertBefore("sass","atrule",{"atrule-line":{pattern:/^(?:[ \t]*)[@+=].+/m,inside:{atrule:/(?:@[\w-]+|[+=])/m}}}),delete e.languages.sass.atrule;var a=/((\$[-_\w]+)|(#\{\$[-_\w]+\}))/i,t=[/[+*\/%]|[=!]=|<=?|>=?|\b(?:and|or|not)\b/,{pattern:/(\s+)-(?=\s)/,lookbehind:!0}];e.languages.insertBefore("sass","property",{"variable-line":{pattern:/^[ \t]*\$.+/m,inside:{punctuation:/:/,variable:a,operator:t}},"property-line":{pattern:/^[ \t]*(?:[^:\s]+ *:.*|:[^:\s]+.*)/m,inside:{property:[/[^:\s]+(?=\s*:)/,{pattern:/(:)[^:\s]+/,lookbehind:!0}],punctuation:/:/,variable:a,operator:t,important:e.languages.sass.important}}}),delete e.languages.sass.property,delete e.languages.sass.important,delete e.languages.sass.selector,e.languages.insertBefore("sass","punctuation",{selector:{pattern:/([ \t]*)\S(?:,?[^,\r\n]+)*(?:,(?:\r?\n|\r)\1[ \t]+\S(?:,?[^,\r\n]+)*)*/,lookbehind:!0}})}(Prism); !function(e){e.languages.sass=e.languages.extend("css",{comment:{pattern:/^([ \t]*)\/[\/*].*(?:(?:\r?\n|\r)\1[ \t]+.+)*/m,lookbehind:!0}}),e.languages.insertBefore("sass","atrule",{"atrule-line":{pattern:/^(?:[ \t]*)[@+=].+/m,inside:{atrule:/(?:@[\w-]+|[+=])/m}}}),delete e.languages.sass.atrule;var a=/((\$[-_\w]+)|(#\{\$[-_\w]+\}))/i,t=[/[+*\/%]|[=!]=|<=?|>=?|\b(?:and|or|not)\b/,{pattern:/(\s+)-(?=\s)/,lookbehind:!0}];e.languages.insertBefore("sass","property",{"variable-line":{pattern:/^[ \t]*\$.+/m,inside:{punctuation:/:/,variable:a,operator:t}},"property-line":{pattern:/^[ \t]*(?:[^:\s]+ *:.*|:[^:\s]+.*)/m,inside:{property:[/[^:\s]+(?=\s*:)/,{pattern:/(:)[^:\s]+/,lookbehind:!0}],punctuation:/:/,variable:a,operator:t,important:e.languages.sass.important}}}),delete e.languages.sass.property,delete e.languages.sass.important,delete e.languages.sass.selector,e.languages.insertBefore("sass","punctuation",{selector:{pattern:/([ \t]*)\S(?:,?[^,\r\n]+)*(?:,(?:\r?\n|\r)\1[ \t]+\S(?:,?[^,\r\n]+)*)*/,lookbehind:!0}})}(Prism);
Prism.languages.scss=Prism.languages.extend("css",{comment:{pattern:/(^|[^\\])(?:\/\*[\w\W]*?\*\/|\/\/.*)/,lookbehind:!0},atrule:{pattern:/@[\w-]+(?:\([^()]+\)|[^(])*?(?=\s+[{;])/,inside:{rule:/@[\w-]+/}},url:/(?:[-a-z]+-)*url(?=\()/i,selector:{pattern:/(?=\S)[^@;\{\}\(\)]?([^@;\{\}\(\)]|&|#\{\$[-_\w]+\})+(?=\s*\{(\}|\s|[^\}]+(:|\{)[^\}]+))/m,inside:{placeholder:/%[-_\w]+/}}}),Prism.languages.insertBefore("scss","atrule",{keyword:[/@(?:if|else(?: if)?|for|each|while|import|extend|debug|warn|mixin|include|function|return|content)/i,{pattern:/( +)(?:from|through)(?= )/,lookbehind:!0}]}),Prism.languages.insertBefore("scss","property",{variable:/\$[-_\w]+|#\{\$[-_\w]+\}/}),Prism.languages.insertBefore("scss","function",{placeholder:{pattern:/%[-_\w]+/,alias:"selector"},statement:/\B!(?:default|optional)\b/i,"boolean":/\b(?:true|false)\b/,"null":/\bnull\b/,operator:{pattern:/(\s)(?:[-+*\/%]|[=!]=|<=?|>=?|and|or|not)(?=\s)/,lookbehind:!0}}),Prism.languages.scss.atrule.inside.rest=Prism.util.clone(Prism.languages.scss); Prism.languages.scss=Prism.languages.extend("css",{comment:{pattern:/(^|[^\\])(?:\/\*[\w\W]*?\*\/|\/\/.*)/,lookbehind:!0},atrule:{pattern:/@[\w-]+(?:\([^()]+\)|[^(])*?(?=\s+[{;])/,inside:{rule:/@[\w-]+/}},url:/(?:[-a-z]+-)*url(?=\()/i,selector:{pattern:/(?=\S)[^@;\{\}\(\)]?([^@;\{\}\(\)]|&|#\{\$[-_\w]+\})+(?=\s*\{(\}|\s|[^\}]+(:|\{)[^\}]+))/m,inside:{placeholder:/%[-_\w]+/}}}),Prism.languages.insertBefore("scss","atrule",{keyword:[/@(?:if|else(?: if)?|for|each|while|import|extend|debug|warn|mixin|include|function|return|content)/i,{pattern:/( +)(?:from|through)(?= )/,lookbehind:!0}]}),Prism.languages.insertBefore("scss","property",{variable:/\$[-_\w]+|#\{\$[-_\w]+\}/}),Prism.languages.insertBefore("scss","function",{placeholder:{pattern:/%[-_\w]+/,alias:"selector"},statement:/\B!(?:default|optional)\b/i,"boolean":/\b(?:true|false)\b/,"null":/\bnull\b/,operator:{pattern:/(\s)(?:[-+*\/%]|[=!]=|<=?|>=?|and|or|not)(?=\s)/,lookbehind:!0}}),Prism.languages.scss.atrule.inside.rest=Prism.util.clone(Prism.languages.scss);

View File

@ -60,8 +60,8 @@ include _includes/_mixins
# Load English tokenizer, tagger, parser, NER and word vectors # Load English tokenizer, tagger, parser, NER and word vectors
nlp = spacy.load('en') nlp = spacy.load('en')
# Process a document, of any size # Process whole documents
text = open('war_and_peace.txt').read() text = open('customer_feedback_627.txt').read()
doc = nlp(text) doc = nlp(text)
# Find named entities, phrases and concepts # Find named entities, phrases and concepts

View File

@ -29,7 +29,7 @@ p
| #[strong Text:] The original noun chunk text.#[br] | #[strong Text:] The original noun chunk text.#[br]
| #[strong Root text:] The original text of the word connecting the noun | #[strong Root text:] The original text of the word connecting the noun
| chunk to the rest of the parse.#[br] | chunk to the rest of the parse.#[br]
| #[strong Root dep:] Dependcy relation connecting the root to its head.#[br] | #[strong Root dep:] Dependency relation connecting the root to its head.#[br]
| #[strong Root head text:] The text of the root token's head.#[br] | #[strong Root head text:] The text of the root token's head.#[br]
+table(["Text", "root.text", "root.dep_", "root.head.text"]) +table(["Text", "root.text", "root.dep_", "root.head.text"])

View File

@ -354,7 +354,8 @@ p
# append mock entity for match in displaCy style to matched_sents # append mock entity for match in displaCy style to matched_sents
# get the match span by ofsetting the start and end of the span with the # get the match span by ofsetting the start and end of the span with the
# start and end of the sentence in the doc # start and end of the sentence in the doc
match_ents = [{'start': span.start-sent.start, 'end': span.end-sent.start, match_ents = [{'start': span.start_char - sent.start_char,
'end': span.end_char - sent.start_char,
'label': 'MATCH'}] 'label': 'MATCH'}]
matched_sents.append({'text': sent.text, 'ents': match_ents }) matched_sents.append({'text': sent.text, 'ents': match_ents })

View File

@ -33,7 +33,7 @@ p
+code("requirements.txt", "text"). +code("requirements.txt", "text").
spacy&gt;=2.0.0,&lt;3.0.0 spacy&gt;=2.0.0,&lt;3.0.0
-e #{gh("spacy-models")}/releases/download/en_core_web_sm-2.0.0/en_core_web_sm-2.0.0.tar.gz#en_core_web_sm #{gh("spacy-models")}/releases/download/en_core_web_sm-2.0.0/en_core_web_sm-2.0.0.tar.gz#en_core_web_sm
p p
| Specifying #[code #egg=] with the package name tells pip | Specifying #[code #egg=] with the package name tells pip

View File

@ -53,11 +53,11 @@ p
class MyComponent(object): class MyComponent(object):
name = 'print_info' name = 'print_info'
def __init__(vocab, short_limit=10): def __init__(self, vocab, short_limit=10):
self.vocab = nlp.vocab self.vocab = vocab
self.short_limit = short_limit self.short_limit = short_limit
def __call__(doc): def __call__(self, doc):
if len(doc) &lt; self.short_limit: if len(doc) &lt; self.short_limit:
print("This is a pretty short document.") print("This is a pretty short document.")
return doc return doc
@ -100,7 +100,10 @@ p
| Set a default value for an attribute, which can be overwritten | Set a default value for an attribute, which can be overwritten
| manually at any time. Attribute extensions work like "normal" | manually at any time. Attribute extensions work like "normal"
| variables and are the quickest way to store arbitrary information | variables and are the quickest way to store arbitrary information
| on a #[code Doc], #[code Span] or #[code Token]. | on a #[code Doc], #[code Span] or #[code Token]. Attribute defaults
| behaves just like argument defaults
| #[+a("http://docs.python-guide.org/en/latest/writing/gotchas/#mutable-default-arguments") in Python functions],
| and should not be used for mutable values like dictionaries or lists.
+code-wrapper +code-wrapper
+code. +code.

View File

@ -36,6 +36,24 @@ p
if token.text in ('apple', 'orange'): if token.text in ('apple', 'orange'):
token._.set('is_fruit', True) token._.set('is_fruit', True)
+item
| When using #[strong mutable values] like dictionaries or lists as
| the #[code default] argument, keep in mind that they behave just like
| mutable default arguments
| #[+a("http://docs.python-guide.org/en/latest/writing/gotchas/#mutable-default-arguments") in Python functions].
| This can easily cause unintended results, like the same value being
| set on #[em all] objects instead of only one particular instance.
| In most cases, it's better to use #[strong getters and setters], and
| only set the #[code default] for boolean or string values.
+code-wrapper
+code-new.
Doc.set_extension('fruits', getter=get_fruits, setter=set_fruits)
+code-old.
Doc.set_extension('fruits', default={})
doc._.fruits['apple'] = u'🍎' # all docs now have {'apple': u'🍎'}
+item +item
| Always add your custom attributes to the #[strong global] #[code Doc] | Always add your custom attributes to the #[strong global] #[code Doc]
| #[code Token] or #[code Span] objects, not a particular instance of | #[code Token] or #[code Span] objects, not a particular instance of

View File

@ -82,7 +82,7 @@ p
unicorn_text = doc.vocab.strings[unicorn_hash] # '🦄 ' unicorn_text = doc.vocab.strings[unicorn_hash] # '🦄 '
+infobox +infobox
| #[+label-inline API:] #[+api("stringstore") #[code stringstore]] | #[+label-inline API:] #[+api("stringstore") #[code StringStore]]
| #[+label-inline Usage:] #[+a("/usage/spacy-101#vocab") Vocab, hashes and lexemes 101] | #[+label-inline Usage:] #[+a("/usage/spacy-101#vocab") Vocab, hashes and lexemes 101]
+h(3, "lightning-tour-entities") Recognise and update named entities +h(3, "lightning-tour-entities") Recognise and update named entities
@ -102,6 +102,28 @@ p
+infobox +infobox
| #[+label-inline Usage:] #[+a("/usage/linguistic-features#named-entities") Named entity recognition] | #[+label-inline Usage:] #[+a("/usage/linguistic-features#named-entities") Named entity recognition]
+h(3, "lightning-tour-training") Train and update neural network models
+tag-model
+code.
import spacy
import random
nlp = spacy.load('en')
train_data = [("Uber blew through $1 million", {'entities': [(0, 4, 'ORG')]})]
with nlp.disable_pipes([pipe for pipe in nlp.pipe_names if pipe != 'ner']):
optimizer = nlp.begin_training()
for i in range(10):
random.shuffle(train_data)
for text, annotations in train_data:
nlp.update([text], [annotations] sgd=optimizer)
nlp.to_disk('/model')
+infobox
| #[+label-inline API:] #[+api("language#update") #[code Language.update]]
| #[+label-inline Usage:] #[+a("/usage/training") Training spaCy&apos;s statistical models]
+h(3, "lightning-tour-displacy") Visualize a dependency parse and named entities in your browser +h(3, "lightning-tour-displacy") Visualize a dependency parse and named entities in your browser
+tag-model("dependency parse", "NER") +tag-model("dependency parse", "NER")
+tag-new(2) +tag-new(2)
@ -183,11 +205,11 @@ p
from spacy.vocab import Vocab from spacy.vocab import Vocab
nlp = spacy.load('en') nlp = spacy.load('en')
moby_dick = open('moby_dick.txt', 'r').read() customer_feedback = open('customer_feedback_627.txt').read()
doc = nlp(moby_dick) doc = nlp(customer_feedback)
doc.to_disk('/moby_dick.bin') doc.to_disk('/tmp/customer_feedback_627.bin')
new_doc = Doc(Vocab()).from_disk('/moby_dick.bin') new_doc = Doc(Vocab()).from_disk('/tmp/customer_feedback_627.bin')
+infobox +infobox
| #[+label-inline API:] #[+api("language") #[code Language]], | #[+label-inline API:] #[+api("language") #[code Language]],
@ -210,7 +232,8 @@ p
pattern2 = [[{'ORTH': emoji, 'OP': '+'}] for emoji in ['😀', '😂', '🤣', '😍']] pattern2 = [[{'ORTH': emoji, 'OP': '+'}] for emoji in ['😀', '😂', '🤣', '😍']]
matcher.add('GoogleIO', None, pattern1) # match "Google I/O" or "Google i/o" matcher.add('GoogleIO', None, pattern1) # match "Google I/O" or "Google i/o"
matcher.add('HAPPY', set_sentiment, *pattern2) # match one or more happy emoji matcher.add('HAPPY', set_sentiment, *pattern2) # match one or more happy emoji
matches = nlp(LOTS_OF TEXT) text = open('customer_feedback_627.txt').read()
matches = nlp(text)
+infobox +infobox
| #[+label-inline API:] #[+api("matcher") #[code Matcher]] | #[+label-inline API:] #[+api("matcher") #[code Matcher]]

View File

@ -113,7 +113,7 @@ p
p p
| Interestingly, "man bites dog" and "man dog bites" are seen as slightly | Interestingly, "man bites dog" and "man dog bites" are seen as slightly
| more similar than "man bites dog" and "dog bites man". This may be a | more similar than "man bites dog" and "dog bites man". This may be a
| conincidence or the result of "man" being interpreted as both sentence's | coincidence or the result of "man" being interpreted as both sentence's
| subject. | subject.
+table +table

View File

@ -22,6 +22,12 @@ include ../_includes/_mixins
+card("textacy", "https://github.com/chartbeat-labs/textacy", "Burton DeWilde", "github") +card("textacy", "https://github.com/chartbeat-labs/textacy", "Burton DeWilde", "github")
| Higher-level NLP built on spaCy. | Higher-level NLP built on spaCy.
+card("mordecai", "https://github.com/openeventdata/mordecai", "Andy Halterman", "github")
| Full text geoparsing using spaCy, Geonames and Keras.
+card("kindred", "https://github.com/jakelever/kindred", "Jake Lever", "github")
| Biomedical relation extraction using spaCy.
+card("spacyr", "https://github.com/kbenoit/spacyr", "Kenneth Benoit", "github") +card("spacyr", "https://github.com/kbenoit/spacyr", "Kenneth Benoit", "github")
| An R wrapper for spaCy. | An R wrapper for spaCy.
@ -55,6 +61,14 @@ include ../_includes/_mixins
| Pipeline component for emoji handling and adding emoji meta data | Pipeline component for emoji handling and adding emoji meta data
| to #[code Doc], #[code Token] and #[code Span] attributes. | to #[code Doc], #[code Token] and #[code Span] attributes.
+card("spacy_hunspell", "https://github.com/tokestermw/spacy_hunspell", "Motoki Wu", "github")
| Add spellchecking and spelling suggestions to your spaCy pipeline
| using Hunspell.
+card("spacy_cld", "https://github.com/nickdavidhaynes/spacy-cld", "Nicholas D Haynes", "github")
| Add language detection to your spaCy pipeline using Compact
| Language Detector 2 via PYCLD2.
.u-text-right .u-text-right
+button("https://github.com/topics/spacy-extension?o=desc&s=stars", false, "primary", "small") See more extensions on GitHub +button("https://github.com/topics/spacy-extension?o=desc&s=stars", false, "primary", "small") See more extensions on GitHub