Merge branch 'develop' of https://github.com/explosion/spaCy into develop

This commit is contained in:
Matthew Honnibal 2017-11-05 15:33:56 +01:00
commit 6e5181bbaa
25 changed files with 6871 additions and 150 deletions

106
.github/contributors/uwol.md vendored Normal file
@ -0,0 +1,106 @@
# spaCy contributor agreement
This spaCy Contributor Agreement (**"SCA"**) is based on the
[Oracle Contributor Agreement](http://www.oracle.com/technetwork/oca-405177.pdf).
The SCA applies to any contribution that you make to any product or project
managed by us (the **"project"**), and sets out the intellectual property rights
you grant to us in the contributed materials. The term **"us"** shall mean
[ExplosionAI UG (haftungsbeschränkt)](https://explosion.ai/legal). The term
**"you"** shall mean the person or entity identified below.
If you agree to be bound by these terms, fill in the information requested
below and include the filled-in version with your first pull request, under the
folder [`.github/contributors/`](/.github/contributors/). The name of the file
should be your GitHub username, with the extension `.md`. For example, the user
example_user would create the file `.github/contributors/example_user.md`.
Read this agreement carefully before signing. These terms and conditions
constitute a binding legal agreement.
## Contributor Agreement
1. The term "contribution" or "contributed materials" means any source code,
object code, patch, tool, sample, graphic, specification, manual,
documentation, or any other material posted or submitted by you to the project.
2. With respect to any worldwide copyrights, or copyright applications and
registrations, in your contribution:
* you hereby assign to us joint ownership, and to the extent that such
assignment is or becomes invalid, ineffective or unenforceable, you hereby
grant to us a perpetual, irrevocable, non-exclusive, worldwide, no-charge,
royalty-free, unrestricted license to exercise all rights under those
copyrights. This includes, at our option, the right to sublicense these same
rights to third parties through multiple levels of sublicensees or other
licensing arrangements;
* you agree that each of us can do all things in relation to your
contribution as if each of us were the sole owners, and if one of us makes
a derivative work of your contribution, the one who makes the derivative
work (or has it made) will be the sole owner of that derivative work;
* you agree that you will not assert any moral rights in your contribution
against us, our licensees or transferees;
* you agree that we may register a copyright in your contribution and
exercise all ownership rights associated with it; and
* you agree that neither of us has any duty to consult with, obtain the
consent of, pay or render an accounting to the other for any use or
distribution of your contribution.
3. With respect to any patents you own, or that you can license without payment
to any third party, you hereby grant to us a perpetual, irrevocable,
non-exclusive, worldwide, no-charge, royalty-free license to:
* make, have made, use, sell, offer to sell, import, and otherwise transfer
your contribution in whole or in part, alone or in combination with or
included in any product, work or materials arising out of the project to
which your contribution was submitted, and
* at our option, to sublicense these same rights to third parties through
multiple levels of sublicensees or other licensing arrangements.
4. Except as set out above, you keep all right, title, and interest in your
contribution. The rights that you grant to us under these terms are effective
on the date you first submitted a contribution to us, even if your submission
took place before the date you sign these terms.
5. You covenant, represent, warrant and agree that:
* Each contribution that you submit is and shall be an original work of
authorship and you can legally grant the rights set out in this SCA;
* to the best of your knowledge, each contribution will not violate any
third party's copyrights, trademarks, patents, or other intellectual
property rights; and
* each contribution shall be in compliance with U.S. export control laws and
other applicable export and import laws. You agree to notify us if you
become aware of any circumstance which would make any of the foregoing
representations inaccurate in any respect. We may publicly disclose your
participation in the project, including the fact that you have signed the SCA.
6. This SCA is governed by the laws of the State of California and applicable
U.S. Federal law. Any choice of law rules will not apply.
7. Please place an “x” on one of the applicable statements below. Please do NOT
mark both statements:
* [x] I am signing on behalf of myself as an individual and no other person
or entity, including my employer, has or will have rights with respect to my
contributions.
* [ ] I am signing on behalf of my employer or a legal entity and I have the
actual authority to contractually bind that entity.
## Contributor Details
| Field | Entry |
|------------------------------- | -------------------- |
| Name | Ulrich Wolffgang |
| Company name (if applicable) | |
| Title or role (if applicable) | |
| Date | 2017-11-05 |
| GitHub username | uwol |
| Website (optional) | https://uwol.github.io/ |

@ -2,7 +2,10 @@
# Contribute to spaCy # Contribute to spaCy
Following the v1.0 release, it's time to welcome more contributors into the spaCy project and code base 🎉 This page will give you a quick overview of how things are organised and most importantly, how to get involved. Thanks for your interest in contributing to spaCy 🎉 The project is maintained
by [@honnibal](https://github.com/honnibal) and [@ines](https://github.com/ines),
and we'll do our best to help you get started. This page will give you a quick
overview of how things are organised and most importantly, how to get involved.
## Table of contents ## Table of contents
1. [Issues and bug reports](#issues-and-bug-reports) 1. [Issues and bug reports](#issues-and-bug-reports)
@ -10,27 +13,68 @@ Following the v1.0 release, it's time to welcome more contributors into the spaC
3. [Code conventions](#code-conventions) 3. [Code conventions](#code-conventions)
4. [Adding tests](#adding-tests) 4. [Adding tests](#adding-tests)
5. [Updating the website](#updating-the-website) 5. [Updating the website](#updating-the-website)
6. [Submitting a tutorial](#submitting-a-tutorial) 6. [Publishing extensions and plugins](#publishing-spacy-extensions-and-plugins)
7. [Submitting a project to the showcase](#submitting-a-project-to-the-showcase) 7. [Code of conduct](#code-of-conduct)
8. [Code of conduct](#code-of-conduct)
## Issues and bug reports ## Issues and bug reports
First, [do a quick search](https://github.com/issues?q=+is%3Aissue+user%3Aexplosion) to see if the issue has already been reported. If so, it's often better to just leave a comment on an existing issue, rather than creating a new one. First, [do a quick search](https://github.com/issues?q=+is%3Aissue+user%3Aexplosion)
to see if the issue has already been reported. If so, it's often better to just
leave a comment on an existing issue, rather than creating a new one. Old issues
also often include helpful tips and solutions to common problems. You should
also check the [troubleshooting guide](https://alpha.spacy.io/usage/#troubleshooting)
to see if your problem is already listed there.
If you're looking for help with your code, consider posting a question on [StackOverflow](http://stackoverflow.com/questions/tagged/spacy) instead. If you tag it `spacy` and `python`, more people will see it and hopefully be able to help. If you're looking for help with your code, consider posting a question on
[StackOverflow](http://stackoverflow.com/questions/tagged/spacy) instead. If you
tag it `spacy` and `python`, more people will see it and hopefully be able to
help. Please understand that we won't be able to provide individual support via
email. We also believe that help is much more valuable if it's **shared publicly**,
so that more people can benefit from it.
When opening an issue, use a descriptive title and include your environment (operating system, Python version, spaCy version). Our [issue template](https://github.com/explosion/spaCy/issues/new) helps you remember the most important details to include. If you've discovered a bug, you can also submit a [regression test](#fixing-bugs) straight away. When you're opening an issue to report the bug, simply refer to your pull request in the issue body. ### Submitting issues
### Tips When opening an issue, use a **descriptive title** and include your
**environment** (operating system, Python version, spaCy version). Our
[issue template](https://github.com/explosion/spaCy/issues/new) helps you
remember the most important details to include. If you've discovered a bug, you
can also submit a [regression test](#fixing-bugs) straight away. When you're
opening an issue to report the bug, simply refer to your pull request in the
issue body. A few more tips:
* **Getting info about your spaCy installation and environment**: If you're using spaCy v1.7+, you can use the command line interface to print details and even format them as Markdown to copy-paste into GitHub issues: `python -m spacy info --markdown`. * **Describing your issue:** Try to provide as many details as possible. What
exactly goes wrong? *How* is it failing? Is there an error?
"XY doesn't work" usually isn't that helpful for tracking down problems. Always
remember to include the code you ran and if possible, extract only the relevant
parts and don't just dump your entire script. This will make it easier for us to
reproduce the error.
* **Sharing long blocks of code or logs**: If you need to include long code, logs or tracebacks, you can wrap them in `<details>` and `</details>`. This [collapses the content](https://developer.mozilla.org/en/docs/Web/HTML/Element/details) so it only becomes visible on click, making the issue easier to read and follow. * **Getting info about your spaCy installation and environment:** If you're
using spaCy v1.7+, you can use the command line interface to print details and
even format them as Markdown to copy-paste into GitHub issues:
`python -m spacy info --markdown`.
* **Checking the model compatibility:** If you're having problems with a
[statistical model](https://alpha.spacy.io/models), it may be because the
model is incompatible with your spaCy installation. In spaCy v2.0+, you can check
this on the command line by running `spacy validate`.
* **Sharing a model's output, like dependencies and entities:** spaCy v2.0+
comes with [built-in visualizers](https://alpha.spacy.io/usage/visualizers) that
you can run from within your script or a Jupyter notebook. For some issues, it's
helpful to **include a screenshot** of the visualization. You can simply drag and
drop the image into GitHub's editor and it will be uploaded and included.
* **Sharing long blocks of code or logs:** If you need to include long code,
logs or tracebacks, you can wrap them in `<details>` and `</details>`. This
[collapses the content](https://developer.mozilla.org/en/docs/Web/HTML/Element/details)
so it only becomes visible on click, making the issue easier to read and follow.
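
When a report is assembled by a script, the `<details>` wrapping described in the last tip can be generated rather than typed by hand. The following is only a minimal sketch: `wrap_in_details` is a hypothetical helper name, not part of spaCy, and it assumes GitHub renders a fenced block nested inside `<details>` when it is surrounded by blank lines.

```python
import traceback


def wrap_in_details(text, summary="Full traceback"):
    """Wrap long text in a collapsible <details> block for a GitHub issue."""
    # Build the code fence programmatically so this example can itself be
    # pasted into Markdown without closing the surrounding fence.
    fence = "`" * 3
    return "<details>\n<summary>{}</summary>\n\n{}\n{}\n{}\n\n</details>".format(
        summary, fence, text.rstrip(), fence)


try:
    int("not a number")
except ValueError:
    # Capture the traceback as a string and make it collapsible.
    report = wrap_in_details(traceback.format_exc())
    print(report)
```

The printed text can be pasted into the issue body as-is; the summary line is what readers see before expanding the block.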
### Issue labels ### Issue labels
To distinguish issues that are opened by us, the maintainers, we usually add a 💫 to the title. We also use the following system to tag our issues: To distinguish issues that are opened by us, the maintainers, we usually add a
💫 to the title. We also use the following system to tag our issues and pull
requests:
| Issue label | Description | | Issue label | Description |
| --- | --- | | --- | --- |
@ -40,55 +84,143 @@ To distinguish issues that are opened by us, the maintainers, we usually add a
| [`performance`](https://github.com/explosion/spaCy/labels/performance) | Accuracy, speed and memory use problems | | [`performance`](https://github.com/explosion/spaCy/labels/performance) | Accuracy, speed and memory use problems |
| [`tests`](https://github.com/explosion/spaCy/labels/tests) | Missing or incorrect [tests](spacy/tests) | | [`tests`](https://github.com/explosion/spaCy/labels/tests) | Missing or incorrect [tests](spacy/tests) |
| [`docs`](https://github.com/explosion/spaCy/labels/docs), [`examples`](https://github.com/explosion/spaCy/labels/examples) | Issues related to the [documentation](https://spacy.io/docs) and [examples](spacy/examples) | | [`docs`](https://github.com/explosion/spaCy/labels/docs), [`examples`](https://github.com/explosion/spaCy/labels/examples) | Issues related to the [documentation](https://spacy.io/docs) and [examples](spacy/examples) |
| [`training`](https://github.com/explosion/spaCy/labels/training) | Issues related to training and updating models |
| [`models`](https://github.com/explosion/spaCy/labels/models), `language / [name]` | Issues related to the specific [models](https://github.com/explosion/spacy-models), languages and data | | [`models`](https://github.com/explosion/spaCy/labels/models), `language / [name]` | Issues related to the specific [models](https://github.com/explosion/spacy-models), languages and data |
| [`linux`](https://github.com/explosion/spaCy/labels/linux), [`osx`](https://github.com/explosion/spaCy/labels/osx), [`windows`](https://github.com/explosion/spaCy/labels/windows) | Issues related to the specific operating systems | | [`linux`](https://github.com/explosion/spaCy/labels/linux), [`osx`](https://github.com/explosion/spaCy/labels/osx), [`windows`](https://github.com/explosion/spaCy/labels/windows) | Issues related to the specific operating systems |
| [`pip`](https://github.com/explosion/spaCy/labels/pip), [`conda`](https://github.com/explosion/spaCy/labels/conda) | Issues related to the specific package managers | | [`pip`](https://github.com/explosion/spaCy/labels/pip), [`conda`](https://github.com/explosion/spaCy/labels/conda) | Issues related to the specific package managers |
| [`wip`](https://github.com/explosion/spaCy/labels/wip) | Work in progress | | [`wip`](https://github.com/explosion/spaCy/labels/wip) | Work in progress, mostly used for pull requests. |
| [`duplicate`](https://github.com/explosion/spaCy/labels/duplicate) | Duplicates, i.e. issues that have been reported before | | [`duplicate`](https://github.com/explosion/spaCy/labels/duplicate) | Duplicates, i.e. issues that have been reported before |
| [`meta`](https://github.com/explosion/spaCy/labels/meta) | Meta topics, e.g. repo organisation and issue management | | [`meta`](https://github.com/explosion/spaCy/labels/meta) | Meta topics, e.g. repo organisation and issue management |
| [`help wanted`](https://github.com/explosion/spaCy/labels/help%20wanted), [`help wanted (easy)`](https://github.com/explosion/spaCy/labels/help%20wanted%20%28easy%29) | Requests for contributions | | [`help wanted`](https://github.com/explosion/spaCy/labels/help%20wanted), [`help wanted (easy)`](https://github.com/explosion/spaCy/labels/help%20wanted%20%28easy%29) | Requests for contributions |
## Contributing to the code base ## Contributing to the code base
You don't have to be an NLP expert or Python pro to contribute, and we're happy to help you get started. If you're new to spaCy, a good place to start is the [`help wanted (easy)`](https://github.com/explosion/spaCy/issues?q=is%3Aissue+is%3Aopen+label%3A%22help+wanted+%28easy%29%22) label, which we use to tag bugs and feature requests that are easy and self-contained. If you've decided to take on one of these problems and you're making good progress, don't forget to add a quick comment to the issue. You can also use the issue to ask questions, or share your work in progress. You don't have to be an NLP expert or Python pro to contribute, and we're happy
to help you get started. If you're new to spaCy, a good place to start is the
[spaCy 101 guide](https://alpha.spacy.io/usage/spacy-101) and the
[`help wanted (easy)`](https://github.com/explosion/spaCy/issues?q=is%3Aissue+is%3Aopen+label%3A%22help+wanted+%28easy%29%22)
label, which we use to tag bugs and feature requests that are easy and
self-contained. If you've decided to take on one of these problems and you're
making good progress, don't forget to add a quick comment to the issue. You can
also use the issue to ask questions, or share your work in progress.
### What belongs in spaCy? ### What belongs in spaCy?
Every library has a different inclusion philosophy — a policy of what should be shipped in the core library, and what could be provided in other packages. Our philosophy is to prefer a smaller core library. We generally ask the following questions: Every library has a different inclusion philosophy — a policy of what should be
shipped in the core library, and what could be provided in other packages. Our
philosophy is to prefer a smaller core library. We generally ask the following
questions:
* **What would this feature look like if implemented in a separate package?** Some features would be very difficult to implement externally. For instance, anything that requires a change to the `Token` class really needs to be implemented within spaCy, because there's no convenient way to make spaCy return custom `Token` objects. In contrast, a library of word alignment functions could easily live as a separate package that depended on spaCy — there's little difference between writing `import word_aligner` and `import spacy.word_aligner`. * **What would this feature look like if implemented in a separate package?**
Some features would be very difficult to implement externally, for example
changes to spaCy's built-in methods. In contrast, a library of word
alignment functions could easily live as a separate package that depended on
spaCy — there's little difference between writing `import word_aligner` and
`import spacy.word_aligner`. spaCy v2.0+ makes it easy to implement
[custom pipeline components](https://alpha.spacy.io/usage/processing-pipelines#custom-components),
and add your own attributes, properties and methods to the `Doc`, `Token` and
`Span`. If you're looking to implement a new spaCy feature, starting with a
custom component package is usually the best strategy. You won't have to worry
about spaCy's internals and you can test your module in an isolated
environment. And if it works well, we can always integrate it into the core
library later.
* **Would the feature be easier to implement if it relied on "heavy" dependencies spaCy doesn't currently require?** Python has a very rich ecosystem. Libraries like Sci-Kit Learn, Scipy, Gensim, Keras etc. do lots of useful things — but we don't want to have them as dependencies. If the feature requires functionality in one of these libraries, it's probably better to break it out into a different package. * **Would the feature be easier to implement if it relied on "heavy" dependencies spaCy doesn't currently require?**
Python has a very rich ecosystem. Libraries like scikit-learn, SciPy, Gensim or
TensorFlow/Keras do lots of useful things — but we don't want to have them as
dependencies. If the feature requires functionality in one of these libraries,
it's probably better to break it out into a different package.
* **Is the feature orthogonal to the current spaCy functionality, or overlapping?** spaCy strongly prefers to avoid having 6 different ways of doing the same thing. As better techniques are developed, we prefer to drop support for "the old way". However, it's rare that one approach *entirely* dominates another. It's very common that there's still a use-case for the "obsolete" approach. For instance, [WordNet](https://wordnet.princeton.edu/) is still very useful — but word vectors are better for most use-cases, and the two approaches to lexical semantics do a lot of the same things. spaCy therefore only supports word vectors, and support for WordNet is currently left for other packages. * **Is the feature orthogonal to the current spaCy functionality, or overlapping?**
spaCy strongly prefers to avoid having 6 different ways of doing the same thing.
As better techniques are developed, we prefer to drop support for "the old way".
However, it's rare that one approach *entirely* dominates another. It's very
common that there's still a use-case for the "obsolete" approach. For instance,
[WordNet](https://wordnet.princeton.edu/) is still very useful — but word
vectors are better for most use-cases, and the two approaches to lexical
semantics do a lot of the same things. spaCy therefore only supports word
vectors, and support for WordNet is currently left for other packages.
* **Do you need the feature to get basic things done?** We do want spaCy to be at least somewhat self-contained. If we keep needing some feature in our recipes, that does provide some argument for bringing it "in house". * **Do you need the feature to get basic things done?** We do want spaCy to be
at least somewhat self-contained. If we keep needing some feature in our
recipes, that does provide some argument for bringing it "in house".
### Developer resources ### Getting started
To make changes to spaCy's code base, you need to clone the GitHub repository
and build spaCy from source. You'll need to make sure that you have a
development environment consisting of a Python distribution including header
files, a compiler, [pip](https://pip.pypa.io/en/latest/installing/),
[virtualenv](https://virtualenv.pypa.io/en/stable/) and
[git](https://git-scm.com) installed. The compiler is usually the trickiest part.
```
python -m pip install -U pip virtualenv
git clone https://github.com/explosion/spaCy
cd spaCy
virtualenv .env
source .env/bin/activate
export PYTHONPATH=`pwd`
pip install -r requirements.txt
python setup.py build_ext --inplace
```
If you've made changes to `.pyx` files, you need to recompile spaCy before you
can test your changes by re-running `python setup.py build_ext --inplace`.
Changes to `.py` files will be effective immediately.
📖 **For more details and instructions, see the documentation on [compiling spaCy from source](https://spacy.io/usage/#source) and the [quickstart widget](https://alpha.spacy.io/usage/#section-quickstart) to get the right commands for your platform and Python version.**
The [spaCy developer resources](https://github.com/explosion/spacy-dev-resources) repo contains useful scripts, tools and templates for developing spaCy, adding new languages and training new models. If you've written a script that might help others, feel free to contribute it to that repository.
### Contributor agreement ### Contributor agreement
If you've made a substantial contribution to spaCy, you should fill in the [spaCy contributor agreement](.github/CONTRIBUTOR_AGREEMENT.md) to ensure that your contribution can be used across the project. If you agree to be bound by the terms of the agreement, fill in the [template]((.github/CONTRIBUTOR_AGREEMENT.md)) and include it with your pull request, or sumit it separately to [`.github/contributors/`](/.github/contributors). The name of the file should be your GitHub username, with the extension `.md`. For example, the user If you've made a contribution to spaCy, you should fill in the
[spaCy contributor agreement](.github/CONTRIBUTOR_AGREEMENT.md) to ensure that
your contribution can be used across the project. If you agree to be bound by
the terms of the agreement, fill in the [template](.github/CONTRIBUTOR_AGREEMENT.md)
and include it with your pull request, or submit it separately to
[`.github/contributors/`](/.github/contributors). The name of the file should be
your GitHub username, with the extension `.md`. For example, the user
example_user would create the file `.github/contributors/example_user.md`. example_user would create the file `.github/contributors/example_user.md`.
### Fixing bugs ### Fixing bugs
When fixing a bug, first create an [issue](https://github.com/explosion/spaCy/issues) if one does not already exist. The description text can be very short we don't want to make this too bureaucratic. When fixing a bug, first create an
[issue](https://github.com/explosion/spaCy/issues) if one does not already exist.
The description text can be very short; we don't want to make this too
bureaucratic.
Next, create a test file named `test_issue[ISSUE NUMBER].py` in the [`spacy/tests/regression`](spacy/tests/regression) folder. Test for the bug you're fixing, and make sure the test fails. Next, add and commit your test file referencing the issue number in the commit message. Finally, fix the bug, make sure your test passes and reference the issue in your commit message. Next, create a test file named `test_issue[ISSUE NUMBER].py` in the
[`spacy/tests/regression`](spacy/tests/regression) folder. Test for the bug
you're fixing, and make sure the test fails. Next, add and commit your test file
referencing the issue number in the commit message. Finally, fix the bug, make
sure your test passes and reference the issue in your commit message.
📖 **For more information on how to add tests, check out the [tests README](spacy/tests/README.md).** 📖 **For more information on how to add tests, check out the [tests README](spacy/tests/README.md).**
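
As a rough illustration of that workflow, a regression test file might look like the sketch below. The issue number `0000`, the file name `test_issue0000.py` and the `strip_trailing_period` function are hypothetical placeholders standing in for the real issue and the spaCy code under test; the tests README remains the authoritative guide.

```python
# Hypothetical contents of spacy/tests/regression/test_issue0000.py


def strip_trailing_period(text):
    """Stand-in for the code being fixed: drop one trailing period."""
    # The imagined bug: the original code indexed text[-1] without checking
    # the length, so the empty string raised an IndexError.
    if text.endswith("."):
        return text[:-1]
    return text


def test_issue0000():
    """Test that the empty string no longer raises (see issue #0000)."""
    assert strip_trailing_period("") == ""
    assert strip_trailing_period("Hello.") == "Hello"
    assert strip_trailing_period("Hello") == "Hello"
```

Committing the test before the fix, as described above, shows in the history that the test really did fail against the buggy code.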
## Code conventions ## Code conventions
Code should loosely follow [pep8](https://www.python.org/dev/peps/pep-0008/). Regular line length is **80 characters**, with some tolerance for lines up to 90 characters if the alternative would be worse — for instance, if your list comprehension comes to 82 characters, it's better not to split it over two lines. Code should loosely follow [pep8](https://www.python.org/dev/peps/pep-0008/).
Regular line length is **80 characters**, with some tolerance for lines up to
90 characters if the alternative would be worse — for instance, if your list
comprehension comes to 82 characters, it's better not to split it over two lines.
You can also use a linter like [`flake8`](https://pypi.python.org/pypi/flake8)
or [`frosted`](https://pypi.python.org/pypi/frosted), but keep in mind that
it won't work very well for `.pyx` files and will complain about Cython syntax
like `<int*>` or `cimport`.
### Python conventions ### Python conventions
All Python code must be written in an **intersection of Python 2 and Python 3**. This is easy in Cython, but somewhat ugly in Python. Logic that deals with Python or platform compatibility should only live in [`spacy.compat`](spacy/compat.py). To distinguish them from the builtin functions, replacement functions are suffixed with an undersocre, for example `unicode_`. If you need to access the user's version or platform information, for example to show more specific error messages, you can use the `is_config()` helper function. All Python code must be written in an **intersection of Python 2 and Python 3**.
This is easy in Cython, but somewhat ugly in Python. Logic that deals with
Python or platform compatibility should only live in
[`spacy.compat`](spacy/compat.py). To distinguish them from the builtin
functions, replacement functions are suffixed with an underscore, for example
`unicode_`. If you need to access the user's version or platform information,
for example to show more specific error messages, you can use the `is_config()`
helper function.
```python ```python
from .compat import unicode_, json_dumps, is_config from .compat import unicode_, json_dumps, is_config
@ -99,21 +231,56 @@ if is_config(windows=True, python2=True):
print("You are using Python 2 on Windows.") print("You are using Python 2 on Windows.")
``` ```
Code that interacts with the file-system should accept objects that follow the `pathlib.Path` API, without assuming that the object inherits from `pathlib.Path`. If the function is user-facing and takes a path as an argument, it should check whether the path is provided as a string. Strings should be converted to `pathlib.Path` objects. Code that interacts with the file-system should accept objects that follow the
`pathlib.Path` API, without assuming that the object inherits from `pathlib.Path`.
If the function is user-facing and takes a path as an argument, it should check
whether the path is provided as a string. Strings should be converted to
`pathlib.Path` objects. Serialization and deserialization functions should always
accept **file-like objects**, as it makes the library io-agnostic. Working on
buffers makes the code more general, easier to test, and compatible with Python
3's asynchronous IO.
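
The two conventions in this paragraph can be sketched as follows. The function names `ensure_path`, `write_json` and `read_json` are illustrative assumptions and not necessarily spaCy's own API. The sketch also targets Python 3 only; under the Python 2/3 rule above, the string check would presumably go through `spacy.compat`.

```python
import io
import json
from pathlib import Path


def ensure_path(path):
    """Convert a string to a pathlib.Path; pass other objects through."""
    # Only strings are converted, so any object that follows the
    # pathlib.Path API is accepted without an isinstance(path, Path) check.
    if isinstance(path, str):
        return Path(path)
    return path


def write_json(file_, data):
    """Serialize data as UTF-8 JSON to a binary file-like object."""
    file_.write(json.dumps(data).encode("utf8"))


def read_json(file_):
    """Deserialize JSON from a binary file-like object."""
    return json.loads(file_.read().decode("utf8"))


# An in-memory buffer works the same way as a file opened in binary mode,
# which is what makes functions like these easy to test.
buffer = io.BytesIO()
write_json(buffer, {"lang": "en"})
buffer.seek(0)
print(read_json(buffer))
print(ensure_path("data/model").name)
```

Because the functions only call `write()` and `read()`, callers can pass an open file, a socket wrapper or a `BytesIO` buffer without the library needing to know which.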
At the time of writing (v1.7), spaCy's serialization and deserialization functions are inconsistent about accepting paths vs accepting file-like objects. The correct answer is "file-like objects" — that's what we want going forward, as it makes the library io-agnostic. Working on buffers makes the code more general, easier to test, and compatible with Python 3's asynchronous IO. Although spaCy uses a lot of classes, **inheritance is viewed with some suspicion**
— it's seen as a mechanism of last resort. You should discuss plans to extend
the class hierarchy before implementing.
Although spaCy uses a lot of classes, inheritance is viewed with some suspicion — it's seen as a mechanism of last resort. You should discuss plans to extend the class hierarchy before implementing. We have a number of conventions around variable naming that are still being
documented, and aren't 100% strict. A general policy is that instances of the
We have a number of conventions around variable naming that are still being documented, and aren't 100% strict. A general policy is that instances of the class `Doc` should by default be called `doc`, `Token` `token`, `Lexeme` `lex`, `Vocab` `vocab` and `Language` `nlp`. You should avoid naming variables that are of other types these names. For instance, don't name a text string `doc` — you should usually call this `text`. Two general code style preferences further help with naming. First, lean away from introducing temporary variables, as these clutter your namespace. This is one reason why comprehension expressions are often preferred. Second, keep your functions shortish, so that can work in a smaller scope. Of course, this is a question of trade-offs. class `Doc` should by default be called `doc`, `Token` `token`, `Lexeme` `lex`,
`Vocab` `vocab` and `Language` `nlp`. You should avoid giving these names to
variables of other types. For instance, don't name a text string `doc`; you
should usually call this `text`. Two general code style preferences further help
with naming. First, **lean away from introducing temporary variables**, as these
clutter your namespace. This is one reason why comprehension expressions are
often preferred. Second, **keep your functions shortish**, so that you can work in a
smaller scope. Of course, this is a question of trade-offs.
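
To make the first preference concrete, here is a small plain-Python sketch with no spaCy objects involved. The example sentence and variable names are arbitrary; both versions compute the same total, but the second leaves fewer names in scope.

```python
text = "Give it back! He pleaded."

# With temporary variables: `words`, `lengths` and `word` all stay in scope.
words = text.split()
lengths = []
for word in words:
    lengths.append(len(word))
total = sum(lengths)

# With a generator expression: no intermediate names leak into the namespace.
total_chars = sum(len(word) for word in text.split())

print(total == total_chars)
```

As the paragraph above notes, this is a trade-off: a temporary variable with a good name can still be the clearer choice when an expression grows long.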
### Cython conventions

spaCy's core data structures are implemented as [Cython](http://cython.org/) `cdef`
classes. Memory is managed through the `cymem.cymem.Pool` class, which allows
you to allocate memory which will be freed when the `Pool` object is garbage
collected. This means you usually don't have to worry about freeing memory. You
just have to decide which Python object owns the memory, and make it own the
`Pool`. When that object goes out of scope, the memory will be freed. You do
have to take care that no pointers outlive the object that owns them, but this
is generally quite easy.

All Cython modules should have the `# cython: infer_types=True` compiler
directive at the top of the file. This makes the code much cleaner, as it avoids
the need for many type declarations. If possible, you should prefer to declare
your functions `nogil`, even if you don't especially care about multi-threading.
The reason is that `nogil` functions help the Cython compiler reason about your
code quite a lot: you're telling the compiler that no Python dynamics are
possible. This lets the compiler raise many errors for you, and ensures your
function will run at C speed.

Cython gives you many choices of sequences: you could have a Python list, a
numpy array, a memory view, a C++ vector, or a pointer. Pointers are preferred,
because they are fastest, have the most explicit semantics, and let the compiler
check your code more strictly. C++ vectors are also great, but you should only
use them internally in functions. It's less friendly to accept a vector as an
argument, because that asks the user to do much more work.

Here's how to get a pointer from a numpy array, memory view or vector:
@ -124,9 +291,14 @@ cdef void get_pointers(np.ndarray[int, mode='c'] numpy_array, vector[int] cpp_ve
    pointer3 = &memory_view[0]
```

Both C arrays and C++ vectors reassure the compiler that no Python operations
are possible on your variable. This is a big advantage: it lets the Cython
compiler raise many more errors for you.

When getting a pointer from a numpy array or memoryview, take care that the data
is actually stored in C-contiguous order, otherwise you'll get a pointer to
nonsense. The type-declarations in the code above should generate runtime errors
if buffers with incorrect memory layouts are passed in.

To iterate over the array, the following style is preferred:
@ -138,13 +310,40 @@ cdef int c_total(const int* int_array, int length) nogil:
    return total
```

If this is confusing, consider that the compiler couldn't deal with
`for item in int_array:`, since there's no length attached to a raw pointer, so
how could we figure out where to stop? The length is provided in the slice
notation as a solution to this. Note that we don't have to declare the type of
`item` in the code above, because the compiler can easily infer it. This gives
us tidy code that looks quite like Python, but is exactly as fast as C, because
we've made sure the compilation to C is trivial.

Your functions cannot be declared `nogil` if they need to create Python objects
or call Python functions. This is perfectly okay; you shouldn't torture your
code just to get `nogil` functions. However, if your function isn't `nogil`, you
should compile your module with `cython -a --cplus my_module.pyx` and open the
resulting `my_module.html` file in a browser. This will let you see how Cython
is compiling your code. Calls into the Python run-time will be in bright yellow.
This lets you easily see whether Cython is able to correctly type your code, or
whether there are unexpected problems.

Finally, if you're new to Cython, you should expect to find the first steps a
bit frustrating. It's a very large language, since it's essentially a superset
of Python and C++, with additional complexity and syntax from numpy. The
[documentation](http://docs.cython.org/en/latest/) isn't great, and there are
many "traps for new players". Help is available on
[Gitter](https://gitter.im/explosion/spaCy). Working in Cython is very rewarding
once you're over the initial learning curve. As with C and C++, the first way
you write something in Cython will often be the performance-optimal approach. In
contrast, Python optimisation generally requires a lot of experimentation. Is it
faster to have an `if item in my_dict` check, or to use `.get()`? What about
`try`/`except`? Does this numpy operation create a copy? There's no way to guess
the answers to these questions, and you'll usually be dissatisfied with your
results, so there's no way to know when to stop this process. In the worst case,
you'll make a mess that invites the next reader to try their luck too. This is
like one of those
[volcanic gas-traps](http://www.wemjournal.org/article/S1080-6032%2809%2970088-2/abstract),
where the rescuers keep passing out from low oxygen, causing another rescuer to
follow, only to succumb themselves. In short, just say no to optimizing your
Python. If it's not fast enough the first time, just switch to Cython.
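
To make the point about experimentation concrete, here is a small sketch using only the standard library. The three lookup variants named in the paragraph behave identically, so the only way to rank them is to time them. The dictionary contents, the key list and the function names are invented for this illustration, and the timings will differ between machines and Python versions.

```python
import timeit

my_dict = {i: i * 2 for i in range(1000)}
keys = list(range(0, 2000, 2))  # half of these keys are missing from my_dict


def with_in_check():
    # Variant 1: explicit membership test before indexing.
    return [my_dict[k] if k in my_dict else -1 for k in keys]


def with_get():
    # Variant 2: dict.get() with a default value.
    return [my_dict.get(k, -1) for k in keys]


def with_try_except():
    # Variant 3: index directly and handle the KeyError.
    out = []
    for k in keys:
        try:
            out.append(my_dict[k])
        except KeyError:
            out.append(-1)
    return out


for func in (with_in_check, with_get, with_try_except):
    seconds = timeit.timeit(func, number=200)
    print(func.__name__, round(seconds, 4))
```

Changing the share of missing keys can change the ranking, which is exactly the kind of open-ended tuning the paragraph above warns about.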

### Resources to get you started
@ -156,18 +355,34 @@ Working in Cython is very rewarding once you're over the initial learning curve.
## Adding tests

spaCy uses the [pytest](http://doc.pytest.org/) framework for testing. For more
info on this, see the [pytest documentation](http://docs.pytest.org/en/latest/contents.html).
Tests for spaCy modules and classes live in their own directories of the same
name. For example, tests for the `Tokenizer` can be found in
[`/spacy/tests/tokenizer`](spacy/tests/tokenizer). To be interpreted and run,
all test files and test functions need to be prefixed with `test_`.

When adding tests, make sure to use descriptive names, keep the code short and
concise and only test for one behaviour at a time. Try to `parametrize` test
cases wherever possible, use our pre-defined fixtures for spaCy components and
avoid unnecessary imports.

Extensive tests that take a long time should be marked with `@pytest.mark.slow`.
Tests that require the model to be loaded should be marked with
`@pytest.mark.models`. Loading the models is expensive and not necessary if
you're not actually testing the model performance. If all you need is a `Doc`
object with annotations like heads, POS tags or the dependency parse, you can
use the `get_doc()` utility function to construct it manually.
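
A test file following these guidelines might look roughly like the sketch below. The helper `split_on_whitespace` is a hypothetical stand-in so the example runs without spaCy or a model; a real test would use one of the pre-defined fixtures instead.

```python
import pytest


def split_on_whitespace(text):
    # Hypothetical stand-in for a spaCy component, not part of the library.
    return text.split()


@pytest.mark.parametrize("text,expected_length", [
    ("hello world", 2),
    ("a b c", 3),
    ("", 0),
])
def test_split_on_whitespace_length(text, expected_length):
    # One behaviour per test: only the number of pieces is checked here.
    assert len(split_on_whitespace(text)) == expected_length


@pytest.mark.slow
def test_split_on_whitespace_long_text():
    # Marked as slow because it processes a large input.
    text = "word " * 100000
    assert len(split_on_whitespace(text)) == 100000
```

The parametrized test runs once per tuple, so adding a case means adding one line rather than a new function.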

📖 **For more guidelines and information on how to add tests, check out the [tests README](spacy/tests/README.md).**

## Updating the website

Our [website and docs](https://spacy.io) are implemented in
[Jade/Pug](https://www.jade-lang.org), and built or served by
[Harp](https://harpjs.com). Jade/Pug is an extensible templating language with a
readable syntax, that compiles to HTML. Here's how to view the site locally:

```bash
sudo npm install --global harp
@ -176,9 +391,25 @@ cd spaCy/website
harp server
```

The docs can always use another example or more detail, and they should always
be up to date and not misleading. To quickly find the correct file to edit,
simply click on the "Suggest edits" button at the bottom of a page. To keep
long pages maintainable, and allow including content in several places without
duplicating it, sections often consist of partials. Partials and partial
directories are prefixed by an underscore `_` so they're not compiled with the
site. For example:

```pug
+section("tokenization")
    +h(2, "tokenization") Tokenization
    include _spacy-101/_tokenization
```

So if you're looking to edit the content of the tokenization section, you can
find it in `_spacy-101/_tokenization.jade`. To make it easy to add content
components, we use a [collection of custom mixins](_includes/_mixins.jade),
like `+table`, `+list` or `+code`. For an overview of the available mixins and
components, see the [styleguide](https://alpha.spacy.io/styleguide).

📖 **For more info and troubleshooting guides, check out the [website README](website).**
@ -186,62 +417,40 @@ To make it easy to add content components, we use a [collection of custom mixins
* [Guide to static websites with Harp and Jade](https://ines.io/blog/the-ultimate-guide-static-websites-harp-jade) (ines.io)
* [Building a website with modular markup components (mixins)](https://explosion.ai/blog/modular-markup) (explosion.ai)
* [spacy.io Styleguide](https://alpha.spacy.io/styleguide) (spacy.io)
* [Jade/Pug documentation](https://pugjs.org) (pugjs.org)
* [Harp documentation](https://harpjs.com/) (harpjs.com)

## Submitting a tutorial

Did you write a [tutorial](https://spacy.io/docs/usage/tutorials) to help others use spaCy, or did you come across one that should be added to our directory? You can submit it by making a pull request to [`website/docs/usage/_data.json`](website/docs/usage/_data.json):

```json
{
    "tutorials": {
        "deep_dives": {
            "Deep Learning with custom pipelines and Keras": {
                "url": "https://explosion.ai/blog/spacy-deep-learning-keras",
                "author": "Matthew Honnibal",
                "tags": [ "keras", "sentiment" ]
            }
        }
    }
}
```

### A few tips

* A suitable tutorial should provide additional content and practical examples that are not covered as such in the docs.
* Make sure to choose the right category: `first_steps`, `deep_dives` (tutorials that take a deeper look at specific features) or `code` (programs and scripts on GitHub etc.).
* Don't go overboard with the tags. Take inspiration from the existing ones and only add tags for features (`"sentiment"`, `"pos"`) or integrations (`"jupyter"`, `"keras"`).
* Double-check the JSON markup and/or use a linter. A wrong or missing comma will (unfortunately) break the site rendering.

## Submitting a project to the showcase

Have you built a library, visualizer, demo or product with spaCy, or did you come across one that should be featured in our [showcase](https://spacy.io/docs/usage/showcase)? You can submit it by making a pull request to [`website/docs/usage/_data.json`](website/docs/usage/_data.json):

```json
{
    "showcase": {
        "visualizations": {
            "displaCy": {
                "url": "https://demos.explosion.ai/displacy",
                "author": "Ines Montani",
                "description": "An open-source NLP visualiser for the modern web",
                "image": "displacy.jpg"
            }
        }
    }
}
```

### A few tips

* A suitable third-party library should add substantial functionality, be well-documented and open-source. If it's just a code snippet or script, consider submitting it to the `code` category of the tutorials section instead.
* A suitable demo should be hosted and accessible online. Open-source code is always a plus.
* For visualizations and products, add an image that clearly shows how it looks; screenshots are ideal.
* The image should be resized to 300x188px, optimised using a tool like [ImageOptim](https://imageoptim.com/mac) and added to [`website/assets/img/showcase`](website/assets/img/showcase).
* Double-check the JSON markup and/or use a linter. A wrong or missing comma will (unfortunately) break the site rendering.

## Publishing spaCy extensions and plugins

We're very excited about all the new possibilities for **community extensions**
and plugins in spaCy v2.0, and we can't wait to see what you build with it!

* An extension or plugin should add substantial functionality, be
**well-documented** and **open-source**. It should be available for users to download
and install as a Python package, for example via [PyPI](http://pypi.python.org).

* Extensions that write to `Doc`, `Token` or `Span` attributes should be wrapped
as [pipeline components](https://alpha.spacy.io/usage/processing-pipelines#custom-components)
that users can **add to their processing pipeline** using `nlp.add_pipe()`.

* When publishing your extension on GitHub, **tag it** with the topics
[`spacy`](https://github.com/topics/spacy?o=desc&s=stars) and
[`spacy-extensions`](https://github.com/topics/spacy-extension?o=desc&s=stars)
to make it easier to find. Those are also the topics we're linking to from the
spaCy website. If you're sharing your project on Twitter, feel free to tag
[@spacy_io](https://twitter.com/spacy_io) so we can check it out.

* Once your extension is published, you can open an issue on the
[issue tracker](https://github.com/explosion/spacy/issues) to suggest it for the
[resources directory](https://alpha.spacy.io/usage/resources#extensions) on the
website.

📖 **For more tips and best practices, see the [checklist for developing spaCy extensions](https://alpha.spacy.io/usage/processing-pipelines#extensions).**

## Code of conduct

spaCy adheres to the
[Contributor Covenant Code of Conduct](http://contributor-covenant.org/version/1/4/).
By participating, you are expected to uphold this code.


@ -124,12 +124,12 @@ Using pip, spaCy releases are currently only available as source packages.
     pip install spacy
-When using pip it is generally recommended to install packages in a ``virtualenv``
-to avoid modifying system state:
+When using pip it is generally recommended to install packages in a virtual
+environment to avoid modifying system state:
 .. code:: bash
-    virtualenv .env
+    venv .env
     source .env/bin/activate
     pip install spacy
@ -247,25 +247,31 @@ details.
 .. code:: bash
     # make sure you are using recent pip/virtualenv versions
-    python -m pip install -U pip virtualenv
+    python -m pip install -U pip venv
     git clone https://github.com/explosion/spaCy
     cd spaCy
-    virtualenv .env
+    venv .env
     source .env/bin/activate
+    export PYTHONPATH=`pwd`
     pip install -r requirements.txt
-    pip install -e .
+    python setup.py build_ext --inplace
 Compared to regular install via pip, `requirements.txt <requirements.txt>`_
-additionally installs developer dependencies such as Cython.
+additionally installs developer dependencies such as Cython. For more details
+and instructions, see the documentation on
+`compiling spaCy from source <https://spacy.io/usage/#source>`_ and the
+`quickstart widget <https://alpha.spacy.io/usage/#section-quickstart>`_ to get
+the right commands for your platform and Python version.
 Instead of the above verbose commands, you can also use the following
 `Fabric <http://www.fabfile.org/>`_ commands. All commands assume that your
-``virtualenv`` is located in a directory ``.env``. If you're using a different
-directory, you can change it via the environment variable ``VENV_DIR``, for
-example ``VENV_DIR=".custom-env" fab clean make``.
+virtual environment is located in a directory ``.env``. If you're using a
+different directory, you can change it via the environment variable ``VENV_DIR``,
+for example ``VENV_DIR=".custom-env" fab clean make``.
 ============= ===
-``fab env``   Create ``virtualenv`` and delete previous one, if it exists.
+``fab env``   Create virtual environment and delete previous one, if it exists.
 ``fab make``  Compile the source.
 ``fab clean`` Remove compiled objects, including the generated C++.
 ``fab test``  Run basic tests, aborting after first failure.


@ -33,22 +33,26 @@ def main(output_dir, model='en_core_web_sm', n_jobs=4, batch_size=1000,
     print("Loading IMDB data...")
     data, _ = thinc.extra.datasets.imdb()
     texts, _ = zip(*data[-limit:])
+    print("Processing texts...")
     partitions = partition_all(batch_size, texts)
-    items = ((i, [nlp(text) for text in texts], output_dir) for i, texts
-             in enumerate(partitions))
-    Parallel(n_jobs=n_jobs)(delayed(transform_texts)(*item) for item in items)
+    executor = Parallel(n_jobs=n_jobs)
+    do = delayed(transform_texts)
+    tasks = (do(nlp, i, batch, output_dir)
+             for i, batch in enumerate(partitions))
+    executor(tasks)
-def transform_texts(batch_id, docs, output_dir):
+def transform_texts(nlp, batch_id, texts, output_dir):
+    print(nlp.pipe_names)
     out_path = Path(output_dir) / ('%d.txt' % batch_id)
     if out_path.exists():  # return None in case same batch is called again
         return None
     print('Processing batch', batch_id)
     with out_path.open('w', encoding='utf8') as f:
-        for doc in docs:
+        for doc in nlp.pipe(texts):
             f.write(' '.join(represent_word(w) for w in doc if not w.is_space))
             f.write('\n')
-    print('Saved {} texts to {}.txt'.format(len(docs), batch_id))
+    print('Saved {} texts to {}.txt'.format(len(texts), batch_id))
 def represent_word(word):


@ -3,7 +3,7 @@
 # https://github.com/pypa/warehouse/blob/master/warehouse/__about__.py
 __title__ = 'spacy-nightly'
-__version__ = '2.0.0a18'
+__version__ = '2.0.0a19'
 __summary__ = 'Industrial-strength Natural Language Processing (NLP) with Python and Cython'
 __uri__ = 'https://spacy.io'
 __author__ = 'Explosion AI'


@ -2,7 +2,6 @@
 from __future__ import unicode_literals
 TAG_MAP = {
     "ADJ___": {"morph": "_", "pos": "ADJ"},
     "ADJ__AdpType=Prep": {"morph": "AdpType=Prep", "pos": "ADJ"},


@ -3,6 +3,7 @@ from __future__ import unicode_literals
 from .tokenizer_exceptions import TOKENIZER_EXCEPTIONS, TOKEN_MATCH
 from .punctuation import TOKENIZER_SUFFIXES, TOKENIZER_INFIXES
+from .tag_map import TAG_MAP
 from .stop_words import STOP_WORDS
 from .lex_attrs import LEX_ATTRS
 from .lemmatizer import LOOKUP
@ -21,6 +22,7 @@ class FrenchDefaults(Language.Defaults):
     lex_attr_getters[LANG] = lambda text: 'fr'
     lex_attr_getters[NORM] = add_lookups(Language.Defaults.lex_attr_getters[NORM], BASE_NORMS)
     tokenizer_exceptions = update_exc(BASE_EXCEPTIONS, TOKENIZER_EXCEPTIONS)
+    tag_map = TAG_MAP
     stop_words = STOP_WORDS
     infixes = TOKENIZER_INFIXES
     suffixes = TOKENIZER_SUFFIXES

216
spacy/lang/fr/tag_map.py Normal file

@ -0,0 +1,216 @@
# coding: utf8
from __future__ import unicode_literals
TAG_MAP = {
"ADJ__Gender=Fem|Number=Plur": {"pos": "PRON"},
"ADJ__Gender=Fem|Number=Plur|NumType=Ord": {"pos": "PRON"},
"ADJ__Gender=Fem|Number=Sing": {"pos": "PRON"},
"ADJ__Gender=Fem|Number=Sing|NumType=Ord": {"pos": "PRON"},
"ADJ__Gender=Masc": {"pos": "PRON"},
"ADJ__Gender=Masc|Number=Plur": {"pos": "PRON"},
"ADJ__Gender=Masc|Number=Plur|NumType=Ord": {"pos": "PRON"},
"ADJ__Gender=Masc|Number=Sing": {"pos": "PRON"},
"ADJ__Gender=Masc|Number=Sing|NumType=Card": {"pos": "PRON"},
"ADJ__Gender=Masc|Number=Sing|NumType=Ord": {"pos": "PRON"},
"ADJ__NumType=Card": {"pos": "PRON"},
"ADJ__NumType=Ord": {"pos": "PRON"},
"ADJ__Number=Plur": {"pos": "PRON"},
"ADJ__Number=Sing": {"pos": "PRON"},
"ADJ__Number=Sing|NumType=Ord": {"pos": "PRON"},
"ADJ___": {"pos": "PRON"},
"ADP__Gender=Fem|Number=Plur|Person=3": {"pos": "PRON"},
"ADP__Gender=Masc|Number=Plur|Person=3": {"pos": "PRON"},
"ADP__Gender=Masc|Number=Sing|Person=3": {"pos": "PRON"},
"ADP___": {"pos": "PRON"},
"ADV__Polarity=Neg": {"pos": "PRON"},
"ADV__PronType=Int": {"pos": "PRON"},
"ADV___": {"pos": "PRON"},
"AUX__Gender=Fem|Number=Plur|Tense=Past|VerbForm=Part": {"pos": "PRON"},
"AUX__Gender=Fem|Number=Plur|Tense=Past|VerbForm=Part|Voice=Pass": {"pos": "PRON"},
"AUX__Gender=Fem|Number=Sing|Tense=Past|VerbForm=Part": {"pos": "PRON"},
"AUX__Gender=Fem|Number=Sing|Tense=Past|VerbForm=Part|Voice=Pass": {"pos": "PRON"},
"AUX__Gender=Masc|Number=Plur|Tense=Past|VerbForm=Part": {"pos": "PRON"},
"AUX__Gender=Masc|Number=Plur|Tense=Past|VerbForm=Part|Voice=Pass": {"pos": "PRON"},
"AUX__Gender=Masc|Number=Sing|Tense=Past|VerbForm=Part": {"pos": "PRON"},
"AUX__Gender=Masc|Number=Sing|Tense=Past|VerbForm=Part|Voice=Pass": {"pos": "PRON"},
"AUX__Mood=Cnd|Number=Plur|Person=1|Tense=Pres|VerbForm=Fin": {"pos": "PRON"},
"AUX__Mood=Cnd|Number=Plur|Person=2|Tense=Pres|VerbForm=Fin": {"pos": "PRON"},
"AUX__Mood=Cnd|Number=Plur|Person=3|Tense=Pres|VerbForm=Fin": {"pos": "PRON"},
"AUX__Mood=Cnd|Number=Sing|Person=1|Tense=Pres|VerbForm=Fin": {"pos": "PRON"},
"AUX__Mood=Cnd|Number=Sing|Person=3|Tense=Pres|VerbForm=Fin": {"pos": "PRON"},
"AUX__Mood=Imp|Number=Plur|Person=2|Tense=Pres|VerbForm=Fin": {"pos": "PRON"},
"AUX__Mood=Ind|Number=Plur|Person=1|Tense=Fut|VerbForm=Fin": {"pos": "PRON"},
"AUX__Mood=Ind|Number=Plur|Person=1|Tense=Imp|VerbForm=Fin": {"pos": "PRON"},
"AUX__Mood=Ind|Number=Plur|Person=1|Tense=Pres|VerbForm=Fin": {"pos": "PRON"},
"AUX__Mood=Ind|Number=Plur|Person=2|Tense=Pres|VerbForm=Fin": {"pos": "PRON"},
"AUX__Mood=Ind|Number=Plur|Person=3|Tense=Fut|VerbForm=Fin": {"pos": "PRON"},
"AUX__Mood=Ind|Number=Plur|Person=3|Tense=Imp|VerbForm=Fin": {"pos": "PRON"},
"AUX__Mood=Ind|Number=Plur|Person=3|Tense=Past|VerbForm=Fin": {"pos": "PRON"},
"AUX__Mood=Ind|Number=Plur|Person=3|Tense=Pres|VerbForm=Fin": {"pos": "PRON"},
"AUX__Mood=Ind|Number=Sing|Person=1|Tense=Imp|VerbForm=Fin": {"pos": "PRON"},
"AUX__Mood=Ind|Number=Sing|Person=1|Tense=Pres|VerbForm=Fin": {"pos": "PRON"},
"AUX__Mood=Ind|Number=Sing|Person=2|Tense=Imp|VerbForm=Fin": {"pos": "PRON"},
"AUX__Mood=Ind|Number=Sing|Person=3|Tense=Fut|VerbForm=Fin": {"pos": "PRON"},
"AUX__Mood=Ind|Number=Sing|Person=3|Tense=Imp|VerbForm=Fin": {"pos": "PRON"},
"AUX__Mood=Ind|Number=Sing|Person=3|Tense=Past|VerbForm=Fin": {"pos": "PRON"},
"AUX__Mood=Ind|Number=Sing|Person=3|Tense=Pres|VerbForm=Fin": {"pos": "PRON"},
"AUX__Mood=Sub|Number=Plur|Person=1|Tense=Pres|VerbForm=Fin": {"pos": "PRON"},
"AUX__Mood=Sub|Number=Plur|Person=2|Tense=Pres|VerbForm=Fin": {"pos": "PRON"},
"AUX__Mood=Sub|Number=Plur|Person=3|Tense=Pres|VerbForm=Fin": {"pos": "PRON"},
"AUX__Mood=Sub|Number=Sing|Person=1|Tense=Pres|VerbForm=Fin": {"pos": "PRON"},
"AUX__Mood=Sub|Number=Sing|Person=3|Tense=Pres|VerbForm=Fin": {"pos": "PRON"},
"AUX__Number=Sing|Tense=Past|VerbForm=Part|Voice=Pass": {"pos": "PRON"},
"AUX__Tense=Past|VerbForm=Part": {"pos": "PRON"},
"AUX__Tense=Pres|VerbForm=Part": {"pos": "PRON"},
"AUX__VerbForm=Inf": {"pos": "PRON"},
"CCONJ___": {"pos": "PRON"},
"DET__Definite=Def|Gender=Fem|Number=Sing|PronType=Art": {"pos": "PRON"},
"DET__Definite=Def|Gender=Masc|Number=Sing|PronType=Art": {"pos": "PRON"},
"DET__Definite=Def|Number=Plur|PronType=Art": {"pos": "PRON"},
"DET__Definite=Def|Number=Sing|PronType=Art": {"pos": "PRON"},
"DET__Definite=Ind|Gender=Fem|Number=Plur|PronType=Art": {"pos": "PRON"},
"DET__Definite=Ind|Gender=Fem|Number=Sing|PronType=Art": {"pos": "PRON"},
"DET__Definite=Ind|Gender=Masc|Number=Plur|PronType=Art": {"pos": "PRON"},
"DET__Definite=Ind|Gender=Masc|Number=Sing|PronType=Art": {"pos": "PRON"},
"DET__Definite=Ind|Number=Plur|PronType=Art": {"pos": "PRON"},
"DET__Definite=Ind|Number=Sing|PronType=Art": {"pos": "PRON"},
"DET__Gender=Fem|Number=Plur": {"pos": "PRON"},
"DET__Gender=Fem|Number=Plur|PronType=Int": {"pos": "PRON"},
"DET__Gender=Fem|Number=Sing": {"pos": "PRON"},
"DET__Gender=Fem|Number=Sing|Poss=Yes": {"pos": "PRON"},
"DET__Gender=Fem|Number=Sing|PronType=Dem": {"pos": "PRON"},
"DET__Gender=Fem|Number=Sing|PronType=Int": {"pos": "PRON"},
"DET__Gender=Masc|Number=Plur": {"pos": "PRON"},
"DET__Gender=Masc|Number=Sing": {"pos": "PRON"},
"DET__Gender=Masc|Number=Sing|PronType=Dem": {"pos": "PRON"},
"DET__Gender=Masc|Number=Sing|PronType=Int": {"pos": "PRON"},
"DET__Number=Plur": {"pos": "PRON"},
"DET__Number=Plur|Poss=Yes": {"pos": "PRON"},
"DET__Number=Plur|PronType=Dem": {"pos": "PRON"},
"DET__Number=Sing": {"pos": "PRON"},
"DET__Number=Sing|Poss=Yes": {"pos": "PRON"},
"DET___": {"pos": "PRON"},
"INTJ___": {"pos": "PRON"},
"NOUN__Gender=Fem": {"pos": "PRON"},
"NOUN__Gender=Fem|Number=Plur": {"pos": "PRON"},
"NOUN__Gender=Fem|Number=Sing": {"pos": "PRON"},
"NOUN__Gender=Masc": {"pos": "PRON"},
"NOUN__Gender=Masc|Number=Plur": {"pos": "PRON"},
"NOUN__Gender=Masc|Number=Plur|NumType=Card": {"pos": "PRON"},
"NOUN__Gender=Masc|Number=Sing": {"pos": "PRON"},
"NOUN__Gender=Masc|Number=Sing|NumType=Card": {"pos": "PRON"},
"NOUN__NumType=Card": {"pos": "PRON"},
"NOUN__Number=Plur": {"pos": "PRON"},
"NOUN__Number=Sing": {"pos": "PRON"},
"NOUN___": {"pos": "PRON"},
"NUM__Gender=Masc|Number=Plur|NumType=Card": {"pos": "PRON"},
"NUM__NumType=Card": {"pos": "PRON"},
"PART___": {"pos": "PRON"},
"PRON__Gender=Fem|Number=Plur": {"pos": "PRON"},
"PRON__Gender=Fem|Number=Plur|Person=3": {"pos": "PRON"},
"PRON__Gender=Fem|Number=Plur|Person=3|PronType=Prs": {"pos": "PRON"},
"PRON__Gender=Fem|Number=Plur|Person=3|PronType=Rel": {"pos": "PRON"},
"PRON__Gender=Fem|Number=Plur|PronType=Dem": {"pos": "PRON"},
"PRON__Gender=Fem|Number=Plur|PronType=Rel": {"pos": "PRON"},
"PRON__Gender=Fem|Number=Sing|Person=3": {"pos": "PRON"},
"PRON__Gender=Fem|Number=Sing|Person=3|PronType=Prs": {"pos": "PRON"},
"PRON__Gender=Fem|Number=Sing|PronType=Dem": {"pos": "PRON"},
"PRON__Gender=Fem|Number=Sing|PronType=Rel": {"pos": "PRON"},
"PRON__Gender=Fem|PronType=Rel": {"pos": "PRON"},
"PRON__Gender=Masc|Number=Plur": {"pos": "PRON"},
"PRON__Gender=Masc|Number=Plur|Person=3": {"pos": "PRON"},
"PRON__Gender=Masc|Number=Plur|Person=3|PronType=Prs": {"pos": "PRON"},
"PRON__Gender=Masc|Number=Plur|Person=3|PronType=Rel": {"pos": "PRON"},
"PRON__Gender=Masc|Number=Plur|PronType=Dem": {"pos": "PRON"},
"PRON__Gender=Masc|Number=Plur|PronType=Rel": {"pos": "PRON"},
"PRON__Gender=Masc|Number=Sing": {"pos": "PRON"},
"PRON__Gender=Masc|Number=Sing|Person=3": {"pos": "PRON"},
"PRON__Gender=Masc|Number=Sing|Person=3|PronType=Dem": {"pos": "PRON"},
"PRON__Gender=Masc|Number=Sing|Person=3|PronType=Prs": {"pos": "PRON"},
"PRON__Gender=Masc|Number=Sing|PronType=Dem": {"pos": "PRON"},
"PRON__Gender=Masc|Number=Sing|PronType=Rel": {"pos": "PRON"},
"PRON__Gender=Masc|PronType=Rel": {"pos": "PRON"},
"PRON__NumType=Card|PronType=Rel": {"pos": "PRON"},
"PRON__Number=Plur|Person=1": {"pos": "PRON"},
"PRON__Number=Plur|Person=1|PronType=Prs": {"pos": "PRON"},
"PRON__Number=Plur|Person=1|Reflex=Yes": {"pos": "PRON"},
"PRON__Number=Plur|Person=2": {"pos": "PRON"},
"PRON__Number=Plur|Person=2|PronType=Prs": {"pos": "PRON"},
"PRON__Number=Plur|Person=2|Reflex=Yes": {"pos": "PRON"},
"PRON__Number=Plur|Person=3": {"pos": "PRON"},
"PRON__Number=Plur|PronType=Rel": {"pos": "PRON"},
"PRON__Number=Sing|Person=1": {"pos": "PRON"},
"PRON__Number=Sing|Person=1|PronType=Prs": {"pos": "PRON"},
"PRON__Number=Sing|Person=1|Reflex=Yes": {"pos": "PRON"},
"PRON__Number=Sing|Person=2|PronType=Prs": {"pos": "PRON"},
"PRON__Number=Sing|Person=3": {"pos": "PRON"},
"PRON__Number=Sing|PronType=Dem": {"pos": "PRON"},
"PRON__Number=Sing|PronType=Rel": {"pos": "PRON"},
"PRON__Person=3": {"pos": "PRON"},
"PRON__Person=3|Reflex=Yes": {"pos": "PRON"},
"PRON__PronType=Int": {"pos": "PRON"},
"PRON__PronType=Rel": {"pos": "PRON"},
"PRON___": {"pos": "PRON"},
"PROPN__Gender=Fem|Number=Plur": {"pos": "PROPN"},
"PROPN__Gender=Fem|Number=Sing": {"pos": "PROPN"},
"PROPN__Gender=Masc": {"pos": "PROPN"},
"PROPN__Gender=Masc|Number=Plur": {"pos": "PROPN"},
"PROPN__Gender=Masc|Number=Sing": {"pos": "PROPN"},
"PROPN__Number=Plur": {"pos": "PROPN"},
"PROPN__Number=Sing": {"pos": "PROPN"},
"PROPN___": {"pos": "PROPN"},
"PUNCT___": {"pos": "PUNCT"},
"SCONJ___": {"pos": "SCONJ"},
"VERB__Gender=Fem|Number=Plur|Tense=Past|VerbForm=Part": {"pos": "VERB"},
"VERB__Gender=Fem|Number=Plur|Tense=Past|VerbForm=Part|Voice=Pass": {"pos": "VERB"},
"VERB__Gender=Fem|Number=Sing|Tense=Past|VerbForm=Part": {"pos": "VERB"},
"VERB__Gender=Fem|Number=Sing|Tense=Past|VerbForm=Part|Voice=Pass": {"pos": "VERB"},
"VERB__Gender=Masc|Number=Plur|Tense=Past|VerbForm=Part": {"pos": "VERB"},
"VERB__Gender=Masc|Number=Plur|Tense=Past|VerbForm=Part|Voice=Pass": {"pos": "VERB"},
"VERB__Gender=Masc|Number=Sing|Tense=Past|VerbForm=Part": {"pos": "VERB"},
"VERB__Gender=Masc|Number=Sing|Tense=Past|VerbForm=Part|Voice=Pass": {"pos": "VERB"},
"VERB__Gender=Masc|Tense=Past|VerbForm=Part": {"pos": "VERB"},
"VERB__Gender=Masc|Tense=Past|VerbForm=Part|Voice=Pass": {"pos": "VERB"},
"VERB__Mood=Cnd|Number=Plur|Person=3|Tense=Pres|VerbForm=Fin": {"pos": "VERB"},
"VERB__Mood=Cnd|Number=Sing|Person=1|Tense=Pres|VerbForm=Fin": {"pos": "VERB"},
"VERB__Mood=Cnd|Number=Sing|Person=3|Tense=Pres|VerbForm=Fin": {"pos": "VERB"},
"VERB__Mood=Imp|Number=Plur|Person=1|Tense=Pres|VerbForm=Fin": {"pos": "VERB"},
"VERB__Mood=Imp|Number=Plur|Person=2|Tense=Pres|VerbForm=Fin": {"pos": "VERB"},
"VERB__Mood=Imp|VerbForm=Fin": {"pos": "VERB"},
"VERB__Mood=Ind|Number=Plur|Person=1|Tense=Fut|VerbForm=Fin": {"pos": "VERB"},
"VERB__Mood=Ind|Number=Plur|Person=1|Tense=Imp|VerbForm=Fin": {"pos": "VERB"},
"VERB__Mood=Ind|Number=Plur|Person=1|Tense=Pres|VerbForm=Fin": {"pos": "VERB"},
"VERB__Mood=Ind|Number=Plur|Person=2|Tense=Fut|VerbForm=Fin": {"pos": "VERB"},
"VERB__Mood=Ind|Number=Plur|Person=2|Tense=Imp|VerbForm=Fin": {"pos": "VERB"},
"VERB__Mood=Ind|Number=Plur|Person=2|Tense=Pres|VerbForm=Fin": {"pos": "VERB"},
"VERB__Mood=Ind|Number=Plur|Person=3|Tense=Fut|VerbForm=Fin": {"pos": "VERB"},
"VERB__Mood=Ind|Number=Plur|Person=3|Tense=Imp|VerbForm=Fin": {"pos": "VERB"},
"VERB__Mood=Ind|Number=Plur|Person=3|Tense=Past|VerbForm=Fin": {"pos": "VERB"},
"VERB__Mood=Ind|Number=Plur|Person=3|Tense=Pres|VerbForm=Fin": {"pos": "VERB"},
"VERB__Mood=Ind|Number=Sing|Person=1|Tense=Fut|VerbForm=Fin": {"pos": "VERB"},
"VERB__Mood=Ind|Number=Sing|Person=1|Tense=Imp|VerbForm=Fin": {"pos": "VERB"},
"VERB__Mood=Ind|Number=Sing|Person=1|Tense=Pres|VerbForm=Fin": {"pos": "VERB"},
"VERB__Mood=Ind|Number=Sing|Person=3|Tense=Fut|VerbForm=Fin": {"pos": "VERB"},
"VERB__Mood=Ind|Number=Sing|Person=3|Tense=Imp|VerbForm=Fin": {"pos": "VERB"},
"VERB__Mood=Ind|Number=Sing|Person=3|Tense=Past|VerbForm=Fin": {"pos": "VERB"},
"VERB__Mood=Ind|Number=Sing|Person=3|Tense=Pres|VerbForm=Fin": {"pos": "VERB"},
"VERB__Mood=Ind|Person=3|Tense=Pres|VerbForm=Fin": {"pos": "VERB"},
"VERB__Mood=Ind|Person=3|VerbForm=Fin": {"pos": "VERB"},
"VERB__Mood=Ind|VerbForm=Fin": {"pos": "VERB"},
"VERB__Mood=Sub|Number=Plur|Person=3|Tense=Pres|VerbForm=Fin": {"pos": "VERB"},
"VERB__Mood=Sub|Number=Sing|Person=1|Tense=Pres|VerbForm=Fin": {"pos": "VERB"},
"VERB__Mood=Sub|Number=Sing|Person=3|Tense=Past|VerbForm=Fin": {"pos": "VERB"},
"VERB__Mood=Sub|Number=Sing|Person=3|Tense=Pres|VerbForm=Fin": {"pos": "VERB"},
"VERB__Number=Plur|Tense=Past|VerbForm=Part": {"pos": "VERB"},
"VERB__Number=Plur|Tense=Past|VerbForm=Part|Voice=Pass": {"pos": "VERB"},
"VERB__Number=Sing|Tense=Past|VerbForm=Part": {"pos": "VERB"},
"VERB__Number=Sing|Tense=Past|VerbForm=Part|Voice=Pass": {"pos": "VERB"},
"VERB__Tense=Past|VerbForm=Part": {"pos": "VERB"},
"VERB__Tense=Past|VerbForm=Part|Voice=Pass": {"pos": "VERB"},
"VERB__Tense=Pres|VerbForm=Part": {"pos": "VERB"},
"VERB__VerbForm=Inf": {"pos": "VERB"},
"VERB__VerbForm=Part": {"pos": "VERB"},
"X___": {"pos": "X"},
"_SP": {"pos": "SPACE"}
}
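
Each key in the map above has the form `<tag>__<features>`, and in this map the tag part is itself a coarse UD label. One way to sanity-check such a map is to compare the prefix of every key with its `"pos"` value. The sketch below is not part of the commit: `prefix_mismatches` and `SAMPLE_MAP` are hypothetical names, and treating `_SP` as `SPACE` is an assumption taken from the Italian map added in this same commit.

```python
def prefix_mismatches(tag_map, special=None):
    """Return (key, expected, actual) for entries whose "pos" differs from the key prefix."""
    # Assumption: keys without a "__" separator need an explicit expected value.
    special = special or {"_SP": "SPACE"}
    problems = []
    for key, attrs in sorted(tag_map.items()):
        # Everything before the first "__" is treated as the coarse tag.
        expected = special.get(key, key.split("__", 1)[0])
        actual = attrs.get("pos")
        if actual != expected:
            problems.append((key, expected, actual))
    return problems


# Illustrative sample only; one entry is deliberately wrong.
SAMPLE_MAP = {
    "DET__Number=Sing": {"pos": "DET"},
    "PRON__PronType=Rel": {"pos": "PRON"},
    "VERB__VerbForm=Inf": {"pos": "PRON"},
    "X___": {"pos": "X"},
    "_SP": {"pos": "SPACE"},
}

if __name__ == "__main__":
    for key, expected, actual in prefix_mismatches(SAMPLE_MAP):
        print("%s: expected %s, found %s" % (key, expected, actual))
```

A check like this only applies to maps whose keys start with a coarse tag; maps keyed by treebank-specific tags (such as `V__...` or `Adj|attr|...`) need a separate prefix-to-coarse-tag table.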

@@ -3,6 +3,7 @@ from __future__ import unicode_literals
 from .stop_words import STOP_WORDS
 from .lemmatizer import LOOKUP
+from .tag_map import TAG_MAP
 from ..tokenizer_exceptions import BASE_EXCEPTIONS
 from ..norm_exceptions import BASE_NORMS
@@ -18,6 +19,7 @@ class ItalianDefaults(Language.Defaults):
     tokenizer_exceptions = update_exc(BASE_EXCEPTIONS)
     stop_words = STOP_WORDS
     lemma_lookup = LOOKUP
+    tag_map = TAG_MAP
 class Italian(Language):
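
The hunk above wires the new map into the Italian defaults as a class attribute. As a rough illustration of that pattern, the sketch below uses plain Python stand-ins rather than spaCy's actual `Language.Defaults`; every class and function name in it (`BaseDefaults`, `ToyItalianDefaults`, `coarse_pos`) is hypothetical, and the fallback to `"X"` is an assumption made for the example.

```python
class BaseDefaults(object):
    # Stand-in for a shared defaults class; an empty map means "no mapping known".
    tag_map = {}


# A two-entry excerpt in the same shape as the Italian map.
TOY_TAG_MAP = {
    "S__Gender=Fem|Number=Sing": {"pos": "NOUN"},
    "V__VerbForm=Inf": {"pos": "VERB"},
}


class ToyItalianDefaults(BaseDefaults):
    # Overriding the class attribute is enough to swap in a language-specific map.
    tag_map = TOY_TAG_MAP


def coarse_pos(defaults, tag, fallback="X"):
    """Look up the coarse tag for a fine-grained tag, or return the fallback."""
    return defaults.tag_map.get(tag, {}).get("pos", fallback)


if __name__ == "__main__":
    print(coarse_pos(ToyItalianDefaults, "V__VerbForm=Inf"))
    print(coarse_pos(BaseDefaults, "V__VerbForm=Inf"))
```

Because the map lives on the class, languages without a `tag_map` of their own simply inherit the base value.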

320
spacy/lang/it/tag_map.py Normal file
@@ -0,0 +1,320 @@
# coding: utf8
from __future__ import unicode_literals
TAG_MAP = {
"AP__Gender=Fem|Number=Plur|Poss=Yes|PronType=Prs": {"pos": "DET"},
"AP__Gender=Fem|Number=Sing|Poss=Yes|PronType=Prs": {"pos": "DET"},
"AP__Gender=Masc|Number=Plur|Poss=Yes|PronType=Prs": {"pos": "DET"},
"AP__Gender=Masc|Number=Sing|Poss=Yes|PronType=Prs": {"pos": "DET"},
"AP__Gender=Masc|Poss=Yes|PronType=Prs": {"pos": "DET"},
"AP__Number=Sing|Poss=Yes|PronType=Prs": {"pos": "DET"},
"AP__Poss=Yes|PronType=Prs": {"pos": "DET"},
"A__Degree=Abs|Gender=Fem|Number=Plur": {"pos": "ADJ"},
"A__Degree=Abs|Gender=Fem|Number=Sing": {"pos": "ADJ"},
"A__Degree=Abs|Gender=Masc|Number=Plur": {"pos": "ADJ"},
"A__Degree=Abs|Gender=Masc|Number=Sing": {"pos": "ADJ"},
"A__Degree=Cmp": {"pos": "ADJ"},
"A__Degree=Cmp|Number=Plur": {"pos": "ADJ"},
"A__Degree=Cmp|Number=Sing": {"pos": "ADJ"},
"A__Gender=Fem|Number=Plur": {"pos": "ADJ"},
"A__Gender=Fem|Number=Sing": {"pos": "ADJ"},
"A__Gender=Fem|Number=Sing|Poss=Yes|PronType=Prs": {"pos": "ADJ"},
"A__Gender=Masc": {"pos": "ADJ"},
"A__Gender=Masc|Number=Plur": {"pos": "ADJ"},
"A__Gender=Masc|Number=Sing": {"pos": "ADJ"},
"A__Number=Plur": {"pos": "ADJ"},
"A__Number=Sing": {"pos": "ADJ"},
"A___": {"pos": "ADJ"},
"BN__PronType=Neg": {"pos": "ADV"},
"B__Degree=Abs": {"pos": "ADV"},
"B__Degree=Abs|Gender=Masc|Number=Sing": {"pos": "ADV"},
"B___": {"pos": "ADV"},
"CC___": {"pos": "CONJ"},
"CS___": {"pos": "SCONJ"},
"DD__Gender=Fem|Number=Plur|PronType=Dem": {"pos": "DET"},
"DD__Gender=Fem|Number=Sing|PronType=Dem": {"pos": "DET"},
"DD__Gender=Masc|Number=Plur|PronType=Dem": {"pos": "DET"},
"DD__Gender=Masc|Number=Sing|PronType=Dem": {"pos": "DET"},
"DD__Gender=Masc|PronType=Dem": {"pos": "DET"},
"DD__Number=Plur|PronType=Dem": {"pos": "DET"},
"DD__Number=Sing|PronType=Dem": {"pos": "DET"},
"DE__PronType=Exc": {"pos": "DET"},
"DI__Definite=Def|Gender=Fem|Number=Plur|PronType=Art": {"pos": "DET"},
"DI__Gender=Fem|Number=Plur": {"pos": "DET"},
"DI__Gender=Fem|Number=Plur|PronType=Ind": {"pos": "DET"},
"DI__Gender=Fem|Number=Sing|PronType=Ind": {"pos": "DET"},
"DI__Gender=Masc|Number=Plur": {"pos": "DET"},
"DI__Gender=Masc|Number=Plur|PronType=Ind": {"pos": "DET"},
"DI__Gender=Masc|Number=Sing|PronType=Ind": {"pos": "DET"},
"DI__Number=Sing|PronType=Art": {"pos": "DET"},
"DI__Number=Sing|PronType=Ind": {"pos": "DET"},
"DI__PronType=Ind": {"pos": "DET"},
"DQ__Gender=Fem|Number=Plur|PronType=Int": {"pos": "DET"},
"DQ__Gender=Fem|Number=Sing|PronType=Int": {"pos": "DET"},
"DQ__Gender=Masc|Number=Plur|PronType=Int": {"pos": "DET"},
"DQ__Gender=Masc|Number=Sing|PronType=Int": {"pos": "DET"},
"DQ__Number=Plur|PronType=Int": {"pos": "DET"},
"DQ__Number=Sing|PronType=Int": {"pos": "DET"},
"DQ__PronType=Int": {"pos": "DET"},
"DQ___": {"pos": "DET"},
"DR__Number=Plur|PronType=Rel": {"pos": "DET"},
"DR__PronType=Rel": {"pos": "DET"},
"E__Gender=Masc|Number=Sing": {"pos": "ADP"},
"E___": {"pos": "ADP"},
"FB___": {"pos": "PUNCT"},
"FC___": {"pos": "PUNCT"},
"FF___": {"pos": "PUNCT"},
"FS___": {"pos": "PUNCT"},
"I__Polarity=Neg": {"pos": "INTJ"},
"I__Polarity=Pos": {"pos": "INTJ"},
"I___": {"pos": "INTJ"},
"NO__Gender=Fem|Number=Plur|NumType=Ord": {"pos": "ADJ"},
"NO__Gender=Fem|Number=Sing|NumType=Ord": {"pos": "ADJ"},
"NO__Gender=Masc|Number=Plur": {"pos": "ADJ"},
"NO__Gender=Masc|Number=Plur|NumType=Ord": {"pos": "ADJ"},
"NO__Gender=Masc|Number=Sing|NumType=Ord": {"pos": "ADJ"},
"NO__NumType=Ord": {"pos": "ADJ"},
"NO__Number=Sing|NumType=Ord": {"pos": "ADJ"},
"NO___": {"pos": "ADJ"},
"N__Gender=Masc|Number=Sing": {"pos": "NUM"},
"N__NumType=Card": {"pos": "NUM"},
"N__NumType=Range": {"pos": "NUM"},
"N___": {"pos": "NUM"},
"PART___": {"pos": "PART"},
"PC__Clitic=Yes|Definite=Def|Gender=Fem|Number=Plur|PronType=Art": {"pos": "PRON"},
"PC__Clitic=Yes|Gender=Fem|Number=Plur|Person=3|PronType=Prs": {"pos": "PRON"},
"PC__Clitic=Yes|Gender=Fem|Number=Plur|PronType=Prs": {"pos": "PRON"},
"PC__Clitic=Yes|Gender=Fem|Number=Sing|Person=3|PronType=Prs": {"pos": "PRON"},
"PC__Clitic=Yes|Gender=Fem|Person=3|PronType=Prs": {"pos": "PRON"},
"PC__Clitic=Yes|Gender=Masc|Number=Plur|Person=3|PronType=Prs": {"pos": "PRON"},
"PC__Clitic=Yes|Gender=Masc|Number=Sing|Person=3|PronType=Prs": {"pos": "PRON"},
"PC__Clitic=Yes|Gender=Masc|Number=Sing|PronType=Prs": {"pos": "PRON"},
"PC__Clitic=Yes|Number=Plur|Person=1|PronType=Prs": {"pos": "PRON"},
"PC__Clitic=Yes|Number=Plur|Person=2|PronType=Prs": {"pos": "PRON"},
"PC__Clitic=Yes|Number=Plur|Person=3|PronType=Prs": {"pos": "PRON"},
"PC__Clitic=Yes|Number=Plur|PronType=Prs": {"pos": "PRON"},
"PC__Clitic=Yes|Number=Sing|Person=1|PronType=Prs": {"pos": "PRON"},
"PC__Clitic=Yes|Number=Sing|Person=2|PronType=Prs": {"pos": "PRON"},
"PC__Clitic=Yes|Number=Sing|Person=3|PronType=Prs": {"pos": "PRON"},
"PC__Clitic=Yes|Person=3|PronType=Prs": {"pos": "PRON"},
"PC__Clitic=Yes|PronType=Prs": {"pos": "PRON"},
"PD__Gender=Fem|Number=Plur|PronType=Dem": {"pos": "PRON"},
"PD__Gender=Fem|Number=Sing|PronType=Dem": {"pos": "PRON"},
"PD__Gender=Masc|Number=Plur|PronType=Dem": {"pos": "PRON"},
"PD__Gender=Masc|Number=Sing|PronType=Dem": {"pos": "PRON"},
"PD__Number=Plur|PronType=Dem": {"pos": "PRON"},
"PD__Number=Sing|PronType=Dem": {"pos": "PRON"},
"PD__PronType=Dem": {"pos": "PRON"},
"PE__Gender=Fem|Number=Plur|Person=3|PronType=Prs": {"pos": "PRON"},
"PE__Gender=Fem|Number=Sing|Person=3|PronType=Prs": {"pos": "PRON"},
"PE__Gender=Masc|Number=Plur|Person=3|PronType=Prs": {"pos": "PRON"},
"PE__Gender=Masc|Number=Sing|Person=3|PronType=Prs": {"pos": "PRON"},
"PE__Number=Plur|Person=1|PronType=Prs": {"pos": "PRON"},
"PE__Number=Plur|Person=2|PronType=Prs": {"pos": "PRON"},
"PE__Number=Plur|Person=3|PronType=Prs": {"pos": "PRON"},
"PE__Number=Sing|Person=1|PronType=Prs": {"pos": "PRON"},
"PE__Number=Sing|Person=2|PronType=Prs": {"pos": "PRON"},
"PE__Number=Sing|Person=3|PronType=Prs": {"pos": "PRON"},
"PE__Person=3|PronType=Prs": {"pos": "PRON"},
"PE__PronType=Prs": {"pos": "PRON"},
"PI__Gender=Fem|Number=Plur|PronType=Ind": {"pos": "PRON"},
"PI__Gender=Fem|Number=Sing|PronType=Ind": {"pos": "PRON"},
"PI__Gender=Masc|Number=Plur|PronType=Ind": {"pos": "PRON"},
"PI__Gender=Masc|Number=Sing": {"pos": "PRON"},
"PI__Gender=Masc|Number=Sing|PronType=Ind": {"pos": "PRON"},
"PI__Number=Plur|PronType=Ind": {"pos": "PRON"},
"PI__Number=Sing|PronType=Ind": {"pos": "PRON"},
"PI__PronType=Ind": {"pos": "PRON"},
"PP__Gender=Fem|Number=Sing|Poss=Yes|PronType=Prs": {"pos": "PRON"},
"PP__Gender=Masc|Number=Plur|Poss=Yes|PronType=Prs": {"pos": "PRON"},
"PP__Gender=Masc|Number=Sing|Poss=Yes|PronType=Prs": {"pos": "PRON"},
"PP__Number=Plur|Poss=Yes|PronType=Prs": {"pos": "PRON"},
"PP__Number=Sing|Poss=Yes|PronType=Prs": {"pos": "PRON"},
"PQ__Gender=Fem|Number=Plur|PronType=Int": {"pos": "PRON"},
"PQ__Gender=Fem|Number=Sing|PronType=Int": {"pos": "PRON"},
"PQ__Gender=Masc|Number=Plur|PronType=Int": {"pos": "PRON"},
"PQ__Gender=Masc|Number=Sing|PronType=Int": {"pos": "PRON"},
"PQ__Number=Plur|PronType=Int": {"pos": "PRON"},
"PQ__Number=Sing|PronType=Int": {"pos": "PRON"},
"PQ__PronType=Int": {"pos": "PRON"},
"PR__Gender=Masc|Number=Plur|PronType=Rel": {"pos": "PRON"},
"PR__Gender=Masc|Number=Sing|PronType=Rel": {"pos": "PRON"},
"PR__Gender=Masc|PronType=Rel": {"pos": "PRON"},
"PR__Number=Plur|PronType=Rel": {"pos": "PRON"},
"PR__Number=Sing|PronType=Rel": {"pos": "PRON"},
"PR__Person=3|PronType=Rel": {"pos": "PRON"},
"PR__PronType=Rel": {"pos": "PRON"},
"RD__Definite=Def": {"pos": "DET"},
"RD__Definite=Def|Gender=Fem": {"pos": "DET"},
"RD__Definite=Def|Gender=Fem|Number=Plur|PronType=Art": {"pos": "DET"},
"RD__Definite=Def|Gender=Fem|Number=Sing|PronType=Art": {"pos": "DET"},
"RD__Definite=Def|Gender=Masc|Number=Plur|PronType=Art": {"pos": "DET"},
"RD__Definite=Def|Gender=Masc|Number=Sing|PronType=Art": {"pos": "DET"},
"RD__Definite=Def|Number=Plur|PronType=Art": {"pos": "DET"},
"RD__Definite=Def|Number=Sing|PronType=Art": {"pos": "DET"},
"RD__Definite=Def|PronType=Art": {"pos": "DET"},
"RD__Gender=Fem|Number=Sing": {"pos": "DET"},
"RD__Gender=Masc|Number=Sing": {"pos": "DET"},
"RD__Number=Sing": {"pos": "DET"},
"RD__Number=Sing|PronType=Art": {"pos": "DET"},
"RI__Definite=Ind|Gender=Fem|Number=Plur|PronType=Art": {"pos": "DET"},
"RI__Definite=Ind|Gender=Fem|Number=Sing|PronType=Art": {"pos": "DET"},
"RI__Definite=Ind|Gender=Masc|Number=Plur|PronType=Art": {"pos": "DET"},
"RI__Definite=Ind|Gender=Masc|Number=Sing|PronType=Art": {"pos": "DET"},
"RI__Definite=Ind|Number=Sing|PronType=Art": {"pos": "DET"},
"RI__Definite=Ind|PronType=Art": {"pos": "DET"},
"SP__Gender=Fem|Number=Plur": {"pos": "PROPN"},
"SP__NumType=Card": {"pos": "PROPN"},
"SP___": {"pos": "PROPN"},
"SW__Foreign=Yes": {"pos": "X"},
"SW__Foreign=Yes|Gender=Masc": {"pos": "X"},
"SW__Foreign=Yes|Number=Sing": {"pos": "X"},
"SYM___": {"pos": "SYM"},
"S__Gender=Fem": {"pos": "NOUN"},
"S__Gender=Fem|Number=Plur": {"pos": "NOUN"},
"S__Gender=Fem|Number=Sing": {"pos": "NOUN"},
"S__Gender=Masc": {"pos": "NOUN"},
"S__Gender=Masc|Number=Plur": {"pos": "NOUN"},
"S__Gender=Masc|Number=Sing": {"pos": "NOUN"},
"S__Gender=Masc|Number=Sing|Tense=Past|VerbForm=Part": {"pos": "NOUN"},
"S__Number=Plur": {"pos": "NOUN"},
"S__Number=Sing": {"pos": "NOUN"},
"S___": {"pos": "NOUN"},
"Sw___": {"pos": "X"},
"T__Gender=Fem|Number=Plur|PronType=Tot": {"pos": "DET"},
"T__Gender=Fem|Number=Sing": {"pos": "DET"},
"T__Gender=Fem|Number=Sing|PronType=Tot": {"pos": "DET"},
"T__Gender=Masc|Number=Plur|PronType=Tot": {"pos": "DET"},
"T__Gender=Masc|Number=Sing|PronType=Tot": {"pos": "DET"},
"T__Number=Plur|PronType=Tot": {"pos": "DET"},
"T__PronType=Tot": {"pos": "DET"},
"VA__Gender=Fem|Number=Plur|Tense=Past|VerbForm=Part": {"pos": "AUX"},
"VA__Gender=Fem|Number=Sing|Tense=Past|VerbForm=Part": {"pos": "AUX"},
"VA__Gender=Masc|Number=Plur|Tense=Past|VerbForm=Part": {"pos": "AUX"},
"VA__Gender=Masc|Number=Sing|Tense=Past|VerbForm=Part": {"pos": "AUX"},
"VA__Mood=Cnd|Number=Plur|Person=1|Tense=Pres|VerbForm=Fin": {"pos": "AUX"},
"VA__Mood=Cnd|Number=Plur|Person=3|Tense=Pres|VerbForm=Fin": {"pos": "AUX"},
"VA__Mood=Cnd|Number=Sing|Person=1|Tense=Pres|VerbForm=Fin": {"pos": "AUX"},
"VA__Mood=Cnd|Number=Sing|Person=3|Tense=Pres|VerbForm=Fin": {"pos": "AUX"},
"VA__Mood=Ind|Number=Plur|Person=1|Tense=Fut|VerbForm=Fin": {"pos": "AUX"},
"VA__Mood=Ind|Number=Plur|Person=1|Tense=Imp|VerbForm=Fin": {"pos": "AUX"},
"VA__Mood=Ind|Number=Plur|Person=1|Tense=Pres|VerbForm=Fin": {"pos": "AUX"},
"VA__Mood=Ind|Number=Plur|Person=2|Tense=Fut|VerbForm=Fin": {"pos": "AUX"},
"VA__Mood=Ind|Number=Plur|Person=2|Tense=Imp|VerbForm=Fin": {"pos": "AUX"},
"VA__Mood=Ind|Number=Plur|Person=2|Tense=Pres|VerbForm=Fin": {"pos": "AUX"},
"VA__Mood=Ind|Number=Plur|Person=3|Tense=Fut|VerbForm=Fin": {"pos": "AUX"},
"VA__Mood=Ind|Number=Plur|Person=3|Tense=Imp|VerbForm=Fin": {"pos": "AUX"},
"VA__Mood=Ind|Number=Plur|Person=3|Tense=Past|VerbForm=Fin": {"pos": "AUX"},
"VA__Mood=Ind|Number=Plur|Person=3|Tense=Pres|VerbForm=Fin": {"pos": "AUX"},
"VA__Mood=Ind|Number=Sing|Person=1|Tense=Fut|VerbForm=Fin": {"pos": "AUX"},
"VA__Mood=Ind|Number=Sing|Person=1|Tense=Imp|VerbForm=Fin": {"pos": "AUX"},
"VA__Mood=Ind|Number=Sing|Person=1|Tense=Past|VerbForm=Fin": {"pos": "AUX"},
"VA__Mood=Ind|Number=Sing|Person=1|Tense=Pres|VerbForm=Fin": {"pos": "AUX"},
"VA__Mood=Ind|Number=Sing|Person=2|Tense=Fut|VerbForm=Fin": {"pos": "AUX"},
"VA__Mood=Ind|Number=Sing|Person=2|Tense=Pres|VerbForm=Fin": {"pos": "AUX"},
"VA__Mood=Ind|Number=Sing|Person=3|Tense=Fut|VerbForm=Fin": {"pos": "AUX"},
"VA__Mood=Ind|Number=Sing|Person=3|Tense=Imp|VerbForm=Fin": {"pos": "AUX"},
"VA__Mood=Ind|Number=Sing|Person=3|Tense=Past|VerbForm=Fin": {"pos": "AUX"},
"VA__Mood=Ind|Number=Sing|Person=3|Tense=Pres|VerbForm=Fin": {"pos": "AUX"},
"VA__Mood=Sub|Number=Plur|Person=2|Tense=Pres|VerbForm=Fin": {"pos": "AUX"},
"VA__Mood=Sub|Number=Plur|Person=3|Tense=Imp|VerbForm=Fin": {"pos": "AUX"},
"VA__Mood=Sub|Number=Plur|Person=3|Tense=Pres|VerbForm=Fin": {"pos": "AUX"},
"VA__Mood=Sub|Number=Sing|Person=1|Tense=Imp|VerbForm=Fin": {"pos": "AUX"},
"VA__Mood=Sub|Number=Sing|Person=3|Tense=Imp|VerbForm=Fin": {"pos": "AUX"},
"VA__Mood=Sub|Number=Sing|Person=3|Tense=Pres|VerbForm=Fin": {"pos": "AUX"},
"VA__VerbForm=Ger": {"pos": "AUX"},
"VA__VerbForm=Inf": {"pos": "AUX"},
"VM__Gender=Fem|Number=Sing|Tense=Past|VerbForm=Part": {"pos": "AUX"},
"VM__Gender=Masc|Number=Sing|Tense=Past|VerbForm=Part": {"pos": "AUX"},
"VM__Mood=Cnd|Number=Plur|Person=1|Tense=Pres|VerbForm=Fin": {"pos": "AUX"},
"VM__Mood=Cnd|Number=Plur|Person=2|Tense=Pres|VerbForm=Fin": {"pos": "AUX"},
"VM__Mood=Cnd|Number=Plur|Person=3|Tense=Pres|VerbForm=Fin": {"pos": "AUX"},
"VM__Mood=Cnd|Number=Sing|Person=1|Tense=Pres|VerbForm=Fin": {"pos": "AUX"},
"VM__Mood=Cnd|Number=Sing|Person=2|Tense=Pres|VerbForm=Fin": {"pos": "AUX"},
"VM__Mood=Cnd|Number=Sing|Person=3|Tense=Pres|VerbForm=Fin": {"pos": "AUX"},
"VM__Mood=Imp|Number=Plur|Person=2|Tense=Pres|VerbForm=Fin": {"pos": "AUX"},
"VM__Mood=Imp|Number=Sing|Person=2|Tense=Pres|VerbForm=Fin": {"pos": "AUX"},
"VM__Mood=Ind|Number=Plur|Person=1|Tense=Fut|VerbForm=Fin": {"pos": "AUX"},
"VM__Mood=Ind|Number=Plur|Person=1|Tense=Imp|VerbForm=Fin": {"pos": "AUX"},
"VM__Mood=Ind|Number=Plur|Person=1|Tense=Pres|VerbForm=Fin": {"pos": "AUX"},
"VM__Mood=Ind|Number=Plur|Person=2|Tense=Fut|VerbForm=Fin": {"pos": "AUX"},
"VM__Mood=Ind|Number=Plur|Person=2|Tense=Pres|VerbForm=Fin": {"pos": "AUX"},
"VM__Mood=Ind|Number=Plur|Person=3|Tense=Fut|VerbForm=Fin": {"pos": "AUX"},
"VM__Mood=Ind|Number=Plur|Person=3|Tense=Imp|VerbForm=Fin": {"pos": "AUX"},
"VM__Mood=Ind|Number=Plur|Person=3|Tense=Past|VerbForm=Fin": {"pos": "AUX"},
"VM__Mood=Ind|Number=Plur|Person=3|Tense=Pres|VerbForm=Fin": {"pos": "AUX"},
"VM__Mood=Ind|Number=Sing|Person=1|Tense=Imp|VerbForm=Fin": {"pos": "AUX"},
"VM__Mood=Ind|Number=Sing|Person=1|Tense=Pres|VerbForm=Fin": {"pos": "AUX"},
"VM__Mood=Ind|Number=Sing|Person=2|Tense=Fut|VerbForm=Fin": {"pos": "AUX"},
"VM__Mood=Ind|Number=Sing|Person=2|Tense=Imp|VerbForm=Fin": {"pos": "AUX"},
"VM__Mood=Ind|Number=Sing|Person=2|Tense=Pres|VerbForm=Fin": {"pos": "AUX"},
"VM__Mood=Ind|Number=Sing|Person=3|Tense=Fut|VerbForm=Fin": {"pos": "AUX"},
"VM__Mood=Ind|Number=Sing|Person=3|Tense=Imp|VerbForm=Fin": {"pos": "AUX"},
"VM__Mood=Ind|Number=Sing|Person=3|Tense=Past|VerbForm=Fin": {"pos": "AUX"},
"VM__Mood=Ind|Number=Sing|Person=3|Tense=Pres|VerbForm=Fin": {"pos": "AUX"},
"VM__Mood=Sub|Number=Plur|Person=1|Tense=Imp|VerbForm=Fin": {"pos": "AUX"},
"VM__Mood=Sub|Number=Plur|Person=1|Tense=Pres|VerbForm=Fin": {"pos": "AUX"},
"VM__Mood=Sub|Number=Plur|Person=3|Tense=Imp|VerbForm=Fin": {"pos": "AUX"},
"VM__Mood=Sub|Number=Plur|Person=3|Tense=Pres|VerbForm=Fin": {"pos": "AUX"},
"VM__Mood=Sub|Number=Sing|Person=3|Tense=Imp|VerbForm=Fin": {"pos": "AUX"},
"VM__Mood=Sub|Number=Sing|Person=3|Tense=Pres|VerbForm=Fin": {"pos": "AUX"},
"VM__VerbForm=Ger": {"pos": "AUX"},
"VM__VerbForm=Inf": {"pos": "AUX"},
"V__Gender=Fem|Number=Plur|Tense=Past|VerbForm=Part": {"pos": "VERB"},
"V__Gender=Fem|Number=Sing|Tense=Past|VerbForm=Part": {"pos": "VERB"},
"V__Gender=Masc|Number=Plur|Tense=Past|VerbForm=Fin": {"pos": "VERB"},
"V__Gender=Masc|Number=Plur|Tense=Past|VerbForm=Part": {"pos": "VERB"},
"V__Gender=Masc|Number=Sing|Tense=Past|VerbForm=Part": {"pos": "VERB"},
"V__Mood=Cnd|Number=Plur|Person=1|Tense=Pres|VerbForm=Fin": {"pos": "VERB"},
"V__Mood=Cnd|Number=Plur|Person=2|Tense=Pres|VerbForm=Fin": {"pos": "VERB"},
"V__Mood=Cnd|Number=Plur|Person=3|Tense=Pres|VerbForm=Fin": {"pos": "VERB"},
"V__Mood=Cnd|Number=Sing|Person=1|Tense=Pres|VerbForm=Fin": {"pos": "VERB"},
"V__Mood=Cnd|Number=Sing|Person=2|Tense=Pres|VerbForm=Fin": {"pos": "VERB"},
"V__Mood=Cnd|Number=Sing|Person=3|Tense=Pres|VerbForm=Fin": {"pos": "VERB"},
"V__Mood=Imp|Number=Plur|Person=1|Tense=Pres|VerbForm=Fin": {"pos": "VERB"},
"V__Mood=Imp|Number=Plur|Person=2|Tense=Pres|VerbForm=Fin": {"pos": "VERB"},
"V__Mood=Imp|Number=Sing|Person=2|Tense=Pres|VerbForm=Fin": {"pos": "VERB"},
"V__Mood=Imp|Number=Sing|Person=3|Tense=Pres|VerbForm=Fin": {"pos": "VERB"},
"V__Mood=Ind|Number=Plur|Person=1|Tense=Fut|VerbForm=Fin": {"pos": "VERB"},
"V__Mood=Ind|Number=Plur|Person=1|Tense=Imp|VerbForm=Fin": {"pos": "VERB"},
"V__Mood=Ind|Number=Plur|Person=1|Tense=Past|VerbForm=Fin": {"pos": "VERB"},
"V__Mood=Ind|Number=Plur|Person=1|Tense=Pres|VerbForm=Fin": {"pos": "VERB"},
"V__Mood=Ind|Number=Plur|Person=2|Tense=Fut|VerbForm=Fin": {"pos": "VERB"},
"V__Mood=Ind|Number=Plur|Person=2|Tense=Pres|VerbForm=Fin": {"pos": "VERB"},
"V__Mood=Ind|Number=Plur|Person=3|Tense=Fut|VerbForm=Fin": {"pos": "VERB"},
"V__Mood=Ind|Number=Plur|Person=3|Tense=Imp|VerbForm=Fin": {"pos": "VERB"},
"V__Mood=Ind|Number=Plur|Person=3|Tense=Past|VerbForm=Fin": {"pos": "VERB"},
"V__Mood=Ind|Number=Plur|Person=3|Tense=Pres|VerbForm=Fin": {"pos": "VERB"},
"V__Mood=Ind|Number=Sing|Person=1|Tense=Fut|VerbForm=Fin": {"pos": "VERB"},
"V__Mood=Ind|Number=Sing|Person=1|Tense=Imp|VerbForm=Fin": {"pos": "VERB"},
"V__Mood=Ind|Number=Sing|Person=1|Tense=Past|VerbForm=Fin": {"pos": "VERB"},
"V__Mood=Ind|Number=Sing|Person=1|Tense=Pres|VerbForm=Fin": {"pos": "VERB"},
"V__Mood=Ind|Number=Sing|Person=2|Tense=Fut|VerbForm=Fin": {"pos": "VERB"},
"V__Mood=Ind|Number=Sing|Person=2|Tense=Pres|VerbForm=Fin": {"pos": "VERB"},
"V__Mood=Ind|Number=Sing|Person=3|Tense=Fut|VerbForm=Fin": {"pos": "VERB"},
"V__Mood=Ind|Number=Sing|Person=3|Tense=Imp|VerbForm=Fin": {"pos": "VERB"},
"V__Mood=Ind|Number=Sing|Person=3|Tense=Past|VerbForm=Fin": {"pos": "VERB"},
"V__Mood=Ind|Number=Sing|Person=3|Tense=Pres|VerbForm=Fin": {"pos": "VERB"},
"V__Mood=Ind|Person=3|Tense=Pres|VerbForm=Fin": {"pos": "VERB"},
"V__Mood=Ind|Tense=Pres|VerbForm=Fin": {"pos": "VERB"},
"V__Mood=Sub|Number=Plur|Person=1|Tense=Imp|VerbForm=Fin": {"pos": "VERB"},
"V__Mood=Sub|Number=Plur|Person=1|Tense=Pres|VerbForm=Fin": {"pos": "VERB"},
"V__Mood=Sub|Number=Plur|Person=2|Tense=Imp|VerbForm=Fin": {"pos": "VERB"},
"V__Mood=Sub|Number=Plur|Person=2|Tense=Pres|VerbForm=Fin": {"pos": "VERB"},
"V__Mood=Sub|Number=Plur|Person=3|Tense=Imp|VerbForm=Fin": {"pos": "VERB"},
"V__Mood=Sub|Number=Plur|Person=3|Tense=Pres|VerbForm=Fin": {"pos": "VERB"},
"V__Mood=Sub|Number=Sing|Person=1|Tense=Imp|VerbForm=Fin": {"pos": "VERB"},
"V__Mood=Sub|Number=Sing|Person=1|Tense=Pres|VerbForm=Fin": {"pos": "VERB"},
"V__Mood=Sub|Number=Sing|Person=2|Tense=Pres|VerbForm=Fin": {"pos": "VERB"},
"V__Mood=Sub|Number=Sing|Person=3|Tense=Imp|VerbForm=Fin": {"pos": "VERB"},
"V__Mood=Sub|Number=Sing|Person=3|Tense=Pres|VerbForm=Fin": {"pos": "VERB"},
"V__Mood=Sub|Number=Sing|Person=3|VerbForm=Fin": {"pos": "VERB"},
"V__Number=Plur|Tense=Pres|VerbForm=Part": {"pos": "VERB"},
"V__Number=Sing|Tense=Pres|VerbForm=Part": {"pos": "VERB"},
"V__Tense=Past|VerbForm=Part": {"pos": "VERB"},
"V__VerbForm=Ger": {"pos": "VERB"},
"V__VerbForm=Inf": {"pos": "VERB"},
"X___": {"pos": "X"},
"_SP": {"pos": "SPACE"}
}
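
The keys of this map appear to join a treebank tag and a `|`-separated feature string with a double underscore, with `_` standing in for "no features". Assuming that layout, a key can be taken apart as follows; `split_tag` is a hypothetical helper, not something defined in this commit.

```python
def split_tag(key):
    """Split "TAG__Feat=Val|Feat=Val" into (tag, {feature: value})."""
    # partition() splits on the first "__" only, so "A___" gives ("A", "__", "_").
    tag, sep, feats = key.partition("__")
    features = {}
    if sep and feats != "_":
        for pair in feats.split("|"):
            name, _, value = pair.partition("=")
            features[name] = value
    return tag, features


if __name__ == "__main__":
    print(split_tag("V__Mood=Ind|Number=Sing|Person=3|Tense=Pres|VerbForm=Fin"))
    print(split_tag("A___"))
    print(split_tag("_SP"))
```

Keys without a double underscore, such as `_SP`, come back unchanged with an empty feature dict.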

@@ -3,6 +3,7 @@ from __future__ import unicode_literals
 from .stop_words import STOP_WORDS
 from .lex_attrs import LEX_ATTRS
+from .tag_map import TAG_MAP
 from ..tokenizer_exceptions import BASE_EXCEPTIONS
 from ..norm_exceptions import BASE_NORMS
@@ -18,6 +19,7 @@ class DutchDefaults(Language.Defaults):
     lex_attr_getters[NORM] = add_lookups(Language.Defaults.lex_attr_getters[NORM], BASE_NORMS)
     tokenizer_exceptions = update_exc(BASE_EXCEPTIONS)
     stop_words = STOP_WORDS
+    tag_map = TAG_MAP
 class Dutch(Language):

809
spacy/lang/nl/tag_map.py Normal file
@@ -0,0 +1,809 @@
# coding: utf8
from __future__ import unicode_literals
TAG_MAP = {
"ADJ__Number=Sing": {"pos": "ADJ"},
"ADJ___": {"pos": "ADJ"},
"ADP__AdpType=Prep": {"pos": "ADP"},
"ADP__AdpType=Preppron|Gender=Fem|Number=Sing": {"pos": "ADP"},
"ADP__AdpType=Preppron|Gender=Masc|Number=Plur": {"pos": "ADP"},
"ADP__AdpType=Preppron|Gender=Masc|Number=Sing": {"pos": "ADP"},
"ADV__Number=Sing": {"pos": "ADV"},
"ADV__PunctType=Comm": {"pos": "ADV"},
"ADV___": {"pos": "ADV"},
"Adj_Adj_N_N__Degree=Pos|Number=Sing": {"pos": "ADJ"},
"Adj_Adj_N__Degree=Pos|Number=Plur|Variant=Short": {"pos": "ADJ"},
"Adj_Adj_N__Degree=Pos|Number=Sing": {"pos": "ADJ"},
"Adj_Adj__Case=Nom|Degree=Pos": {"pos": "ADJ"},
"Adj_Adj__Degree=Pos": {"pos": "ADJ"},
"Adj_Adj__Degree=Pos|Variant=Short": {"pos": "ADJ"},
"Adj_Adv__Degree=Pos|Variant=Short": {"pos": "ADJ"},
"Adj_Adv|adv|stell|onverv_deelv__Degree=Pos|Variant=Short": {"pos": "ADJ"},
"Adj_Art__Degree=Pos|Number=Sing": {"pos": "ADJ"},
"Adj_Art__Degree=Pos|Number=Sing|Variant=Short": {"pos": "ADJ"},
"Adj_Conj_V__Degree=Pos|Mood=Sub|VerbForm=Fin": {"pos": "ADJ"},
"Adj_Int|attr|stell|vervneut__Case=Nom|Degree=Pos": {"pos": "ADJ"},
"Adj_Misc_Misc__Degree=Pos": {"pos": "ADJ"},
"Adj_N_Conj_N__Degree=Pos|Number=Sing": {"pos": "ADJ"},
"Adj_N_N_N_N__Degree=Pos|Number=Sing": {"pos": "ADJ"},
"Adj_N_N_N__Degree=Pos|Number=Sing": {"pos": "ADJ"},
"Adj_N_N__Degree=Pos|Number=Sing": {"pos": "ADJ"},
"Adj_N_Num__Definite=Def|Degree=Pos|Number=Sing": {"pos": "ADJ"},
"Adj_N_Prep_Art_Adj_N__Degree=Pos|Gender=Neut|Number=Sing": {"pos": "ADJ"},
"Adj_N_Prep_N_Conj_N__Degree=Pos|Number=Sing": {"pos": "ADJ"},
"Adj_N_Prep_N_N__Degree=Pos|Number=Sing": {"pos": "ADJ"},
"Adj_N_Prep_N__Degree=Pos|Number=Sing": {"pos": "ADJ"},
"Adj_N_Punc__Degree=Pos|Number=Sing": {"pos": "ADJ"},
"Adj_N__Degree=Pos|Number=Plur": {"pos": "ADJ"},
"Adj_N__Degree=Pos|Number=Sing": {"pos": "ADJ"},
"Adj_N__Degree=Pos|Number=Sing|Variant=Short": {"pos": "ADJ"},
"Adj_Num__Definite=Def|Degree=Pos": {"pos": "ADJ"},
"Adj_Num__Definite=Def|Degree=Pos|Variant=Short": {"pos": "ADJ"},
"Adj_Prep|adv|stell|vervneut_voor__Degree=Pos|Variant=Short": {"pos": "ADJ"},
"Adj_Prep|adv|vergr|onverv_voor__Degree=Cmp|Variant=Short": {"pos": "ADJ"},
"Adj_V_Conj_V__Degree=Pos|VerbForm=Inf": {"pos": "ADJ"},
"Adj_V_N__Degree=Pos|Number=Sing|Tense=Past|VerbForm=Part": {"pos": "ADJ"},
"Adj_V|adv|stell|onverv_intrans|inf__Degree=Pos|Variant=Short|VerbForm=Inf": {"pos": "ADJ"},
"Adj_V|adv|stell|onverv_trans|imp__Degree=Pos|Mood=Imp|Variant=Short|VerbForm=Fin": {"pos": "ADJ"},
"Adj|adv|stell|onverv__Degree=Pos|Variant=Short": {"pos": "ADJ"},
"Adj|adv|stell|vervneut__Case=Nom|Degree=Pos|Variant=Short": {"pos": "ADJ"},
"Adj|adv|vergr|onverv__Degree=Cmp|Variant=Short": {"pos": "ADJ"},
"Adj|adv|vergr|vervneut__Case=Nom|Degree=Cmp|Variant=Short": {"pos": "ADJ"},
"Adj|attr|overtr|onverv__Degree=Sup": {"pos": "ADJ"},
"Adj|attr|overtr|vervneut__Case=Nom|Degree=Sup": {"pos": "ADJ"},
"Adj|attr|stell|onverv__Degree=Pos": {"pos": "ADJ"},
"Adj|attr|stell|vervgen__Case=Gen|Degree=Pos": {"pos": "ADJ"},
"Adj|attr|stell|vervneut__Case=Nom|Degree=Pos": {"pos": "ADJ"},
"Adj|attr|vergr|onverv__Degree=Cmp": {"pos": "ADJ"},
"Adj|attr|vergr|vervgen__Case=Gen|Degree=Cmp": {"pos": "ADJ"},
"Adj|attr|vergr|vervneut__Case=Nom|Degree=Cmp": {"pos": "ADJ"},
"Adj|zelfst|overtr|vervneut__Case=Nom|Degree=Sup": {"pos": "ADJ"},
"Adj|zelfst|stell|onverv__Degree=Pos": {"pos": "ADJ"},
"Adj|zelfst|stell|vervmv__Degree=Pos|Number=Plur": {"pos": "ADJ"},
"Adj|zelfst|stell|vervneut__Case=Nom|Degree=Pos": {"pos": "ADJ"},
"Adj|zelfst|vergr|vervneut__Case=Nom|Degree=Cmp": {"pos": "ADJ"},
"Adv_Adj_Conj__Degree=Pos": {"pos": "ADV"},
"Adv_Adj__Degree=Cmp": {"pos": "ADV"},
"Adv_Adj__Degree=Pos": {"pos": "ADV"},
"Adv_Adv_Conj_Adv__PronType=Dem": {"pos": "ADV"},
"Adv_Adv__AdpType=Prep": {"pos": "ADV"},
"Adv_Adv__Degree=Pos": {"pos": "ADV"},
"Adv_Adv__Degree=Pos|PronType=Dem": {"pos": "ADV"},
"Adv_Adv|pron|vrag_deeladv___": {"pos": "ADV"},
"Adv_Art__Degree=Pos|Number=Sing": {"pos": "ADV"},
"Adv_Art__Number=Sing": {"pos": "ADV"},
"Adv_Conj_Adv__AdpType=Preppron|Gender=Masc|Number=Sing": {"pos": "ADV"},
"Adv_Conj_Adv__Degree=Pos": {"pos": "ADV"},
"Adv_Conj_Adv|gew|aanw_neven_gew|aanw__PronType=Dem": {"pos": "ADV"},
"Adv_Conj_Adv|gew|onbep_neven_gew|onbep__PronType=Ind": {"pos": "ADV"},
"Adv_Conj_N__Degree=Pos|Number=Sing": {"pos": "ADV"},
"Adv_Conj__Degree=Pos": {"pos": "ADV"},
"Adv_N__Degree=Pos|Number=Sing": {"pos": "ADV"},
"Adv_Num__Degree=Cmp|PronType=Ind": {"pos": "ADV"},
"Adv_N|gew|aanw_soort|ev|neut__Number=Sing": {"pos": "ADV"},
"Adv_Prep_N__Case=Dat|Degree=Pos|Number=Sing": {"pos": "ADV"},
"Adv_Prep_Pron__AdpType=Preppron|Gender=Masc|Number=Sing": {"pos": "ADV"},
"Adv_Prep__Degree=Pos": {"pos": "ADV"},
"Adv_Prep|gew|aanw_voor__AdpType=Prep": {"pos": "ADV"},
"Adv_Prep|gew|aanw_voor___": {"pos": "ADV"},
"Adv_Pron__Degree=Pos": {"pos": "ADV"},
"Adv|deeladv__PartType=Vbp": {"pos": "ADV"},
"Adv|deelv__PartType=Vbp": {"pos": "ADV"},
"Adv|gew|aanw__PronType=Dem": {"pos": "ADV"},
"Adv|gew|betr__PronType=Rel": {"pos": "ADV"},
"Adv|gew|er__AdvType=Ex": {"pos": "ADV"},
"Adv|gew|geenfunc|overtr|onverv__Degree=Sup": {"pos": "ADV"},
"Adv|gew|geenfunc|stell|onverv__Degree=Pos": {"pos": "ADV"},
"Adv|gew|geenfunc|vergr|onverv__Degree=Cmp": {"pos": "ADV"},
"Adv|gew|onbep__PronType=Ind": {"pos": "ADV"},
"Adv|gew|vrag__PronType=Int": {"pos": "ADV"},
"Adv|pron|aanw__PronType=Dem": {"pos": "ADV"},
"Adv|pron|betr__PronType=Rel": {"pos": "ADV"},
"Adv|pron|er__AdvType=Ex": {"pos": "ADV"},
"Adv|pron|onbep__PronType=Ind": {"pos": "ADV"},
"Adv|pron|vrag__PronType=Int": {"pos": "ADV"},
"Art_Adj_N__AdpType=Prep": {"pos": "DET"},
"Art_Adj_N__Definite=Def|Degree=Sup|Gender=Neut|Number=Sing": {"pos": "DET"},
"Art_Adj__Case=Nom|Definite=Def|Degree=Cmp|Gender=Neut": {"pos": "DET"},
"Art_Adj__Case=Nom|Definite=Def|Degree=Sup|Gender=Neut": {"pos": "DET"},
"Art_Adj__Definite=Def|Degree=Cmp|Gender=Neut": {"pos": "DET"},
"Art_Adj__Definite=Def|Degree=Sup|Gender=Neut": {"pos": "DET"},
"Art_Adv__Definite=Def|Degree=Sup|Gender=Neut": {"pos": "DET"},
"Art_Conj_Pron__Number=Sing|PronType=Ind": {"pos": "DET"},
"Art_N_Conj_Art_N__Definite=Def|Gender=Neut|Number=Sing": {"pos": "DET"},
"Art_N_Conj_Art_V__AdpType=Prep": {"pos": "DET"},
"Art_N_Conj_Pron_N__Definite=Def|Gender=Neut|Number=Plur|Person=3": {"pos": "DET"},
"Art_N_Conj__Number=Sing|PronType=Ind": {"pos": "DET"},
"Art_N_N__AdpType=Prep": {"pos": "DET"},
"Art_N_Prep_Adj__Degree=Pos|Number=Sing|PronType=Ind": {"pos": "DET"},
"Art_N_Prep_Art_N__Number=Sing|PronType=Ind": {"pos": "DET"},
"Art_N_Prep_N__AdpType=Prep": {"pos": "DET"},
"Art_N_Prep_N__Definite=Def|Gender=Neut|Number=Sing": {"pos": "DET"},
"Art_N_Prep_N__Number=Sing|PronType=Ind": {"pos": "DET"},
"Art_N_Prep_Pron_N__AdpType=Prep": {"pos": "DET"},
"Art_N__AdpType=Prep": {"pos": "DET"},
"Art_N__Case=Gen|Definite=Def|Number=Sing": {"pos": "DET"},
"Art_N__Number=Sing|PronType=Ind": {"pos": "DET"},
"Art_Num_Art_Adj__AdpType=Prep": {"pos": "DET"},
"Art_Num_N__AdpType=Prep": {"pos": "DET"},
"Art_Num__Definite=Def|Degree=Sup|Gender=Neut|PronType=Ind": {"pos": "DET"},
"Art_Num__Definite=Def|Gender=Neut": {"pos": "DET"},
"Art_Num__Degree=Pos|Number=Sing|PronType=Ind": {"pos": "DET"},
"Art_N|bep|onzijd|neut_eigen|ev|neut__Definite=Def|Gender=Neut|Number=Sing": {"pos": "DET"},
"Art_N|bep|onzijd|neut_soort|ev|neut__Definite=Def|Gender=Neut|Number=Sing": {"pos": "DET"},
"Art_Pron_N__Case=Gen|Number=Plur|PronType=Ind": {"pos": "DET"},
"Art_Pron__Number=Sing|PronType=Ind": {"pos": "DET"},
"Art_V_N__AdpType=Prep": {"pos": "DET"},
"Art|bep|onzijd|neut__Definite=Def|Gender=Neut|PronType=Art": {"pos": "DET"},
"Art|bep|zijdofmv|gen__Case=Gen|Definite=Def|PronType=Art": {"pos": "DET"},
"Art|bep|zijdofmv|neut__Definite=Def|PronType=Art": {"pos": "DET"},
"Art|bep|zijdofonzijd|gen__Case=Gen|Definite=Def|Number=Sing|PronType=Art": {"pos": "DET"},
"Art|bep|zijd|dat__Case=Dat|Definite=Def|Gender=Com|PronType=Art": {"pos": "DET"},
"Art|onbep|zijdofonzijd|neut__Definite=Ind|Number=Sing|PronType=Art": {"pos": "DET"},
"CCONJ___": {"pos": "CONJ"},
"Conj_Adj|neven_adv|vergr|onverv__Degree=Cmp": {"pos": "CONJ"},
"Conj_Adj|neven_attr|stell|onverv__Degree=Pos": {"pos": "CONJ"},
"Conj_Adv_Adv__Degree=Pos": {"pos": "CONJ"},
"Conj_Adv__AdpType=Prep": {"pos": "CONJ"},
"Conj_Adv__AdpType=Preppron|Gender=Masc|Number=Plur": {"pos": "CONJ"},
"Conj_Adv__Degree=Pos": {"pos": "CONJ"},
"Conj_Adv|neven_gew|aanw__PronType=Dem": {"pos": "CONJ"},
"Conj_Art_N__AdpType=Preppron|Gender=Masc|Number=Plur": {"pos": "CONJ"},
"Conj_Art_N__Gender=Neut|Number=Sing": {"pos": "CONJ"},
"Conj_Conj|neven_onder|metfin___": {"pos": "CONJ"},
"Conj_Int|neven___": {"pos": "CONJ"},
"Conj_Int|onder|metfin___": {"pos": "CONJ"},
"Conj_N_Adv__AdpType=Preppron|Gender=Masc|Number=Plur": {"pos": "CONJ"},
"Conj_N_Prep__AdpType=Preppron|Gender=Masc|Number=Plur": {"pos": "CONJ"},
"Conj_N|onder|metfin_soort|ev|neut__AdpType=Preppron|Gender=Masc|Number=Plur": {"pos": "CONJ"},
"Conj_Pron_Adv__Degree=Pos|Number=Sing|Person=3": {"pos": "CONJ"},
"Conj_Pron_V__AdpType=Preppron|Gender=Masc|Number=Plur": {"pos": "CONJ"},
"Conj_Pron|neven_aanw|neut|zelfst__AdpType=Prep": {"pos": "CONJ"},
"Conj_Punc_Conj|neven_schuinstreep_neven__AdpType=Prep": {"pos": "CONJ"},
"Conj_V|onder|metfin_intrans|ott|3|ev__AdpType=Preppron|Gender=Masc|Number=Plur": {"pos": "CONJ"},
"Conj|neven___": {"pos": "CONJ"},
"Conj|onder|metfin___": {"pos": "CONJ"},
"Conj|onder|metinf___": {"pos": "CONJ"},
"DET__Degree=Cmp|NumType=Card|PronType=Ind": {"pos": "DET"},
"DET__Gender=Fem|Number=Sing|Number[psor]=Plur|Person=1|Poss=Yes|PronType=Prs": {"pos": "DET"},
"DET__Gender=Fem|Number=Sing|PronType=Art": {"pos": "DET"},
"DET__Gender=Masc|Number=Plur|PronType=Art": {"pos": "DET"},
"DET__Gender=Masc|Number=Sing|PronType=Tot": {"pos": "DET"},
"Int_Adv|gew|aanw___": {"pos": "X"},
"Int_Int__NumType=Card": {"pos": "X"},
"Int_Int___": {"pos": "X"},
"Int_N_N_Misc_N___": {"pos": "X"},
"Int_N_Punc_Int_N__Number=Sing": {"pos": "X"},
"Int_Punc_Int|komma__PunctType=Comm": {"pos": "X"},
"Int___": {"pos": "X"},
"Misc_Misc_Misc_Misc_Misc_Misc_Misc_Misc_Misc___": {"pos": "MISC"},
"Misc_Misc_Misc_Misc_Misc_Misc_Misc___": {"pos": "MISC"},
"Misc_Misc_Misc_Misc_Misc_Misc_Punc_Misc_Misc_Misc___": {"pos": "MISC"},
"Misc_Misc_Misc_Misc_Misc_Misc___": {"pos": "MISC"},
"Misc_Misc_Misc_Misc_Misc_N_Misc_Misc_Misc_Misc_Misc_Misc___": {"pos": "MISC"},
"Misc_Misc_Misc_Misc|vreemd_vreemd_vreemd_vreemd__AdpType=Preppron|Gender=Masc|Number=Sing": {"pos": "MISC"},
"Misc_Misc_Misc_Misc|vreemd_vreemd_vreemd_vreemd___": {"pos": "MISC"},
"Misc_Misc_Misc_N__Number=Sing": {"pos": "MISC"},
"Misc_Misc_Misc|vreemd_vreemd_vreemd___": {"pos": "MISC"},
"Misc_Misc_N_N__Number=Sing": {"pos": "MISC"},
"Misc_Misc_N|vreemd_vreemd_soort|mv|neut__Number=Plur": {"pos": "MISC"},
"Misc_Misc_Punc_N_N__Number=Sing": {"pos": "MISC"},
"Misc_Misc|vreemd_vreemd__AdpType=Prep": {"pos": "MISC"},
"Misc_Misc|vreemd_vreemd__NumType=Card": {"pos": "MISC"},
"Misc_Misc|vreemd_vreemd___": {"pos": "MISC"},
"Misc_N_Misc_Misc__Number=Sing": {"pos": "MISC"},
"Misc_N_N__Number=Sing": {"pos": "MISC"},
"Misc_N|vreemd_eigen|ev|neut__Number=Sing": {"pos": "MISC"},
"Misc_N|vreemd_soort|ev|neut__Number=Sing": {"pos": "MISC"},
"Misc|vreemd__Foreign=Yes": {"pos": "MISC"},
"NUM__Case=Nom|Definite=Def|Degree=Pos|NumType=Card": {"pos": "NUM"},
"NUM__Definite=Def|Degree=Pos|NumType=Card": {"pos": "NUM"},
"NUM__Definite=Def|Degree=Pos|Number=Sing|NumType=Card": {"pos": "NUM"},
"NUM__Definite=Def|NumType=Card": {"pos": "NUM"},
"NUM__Definite=Def|Number=Plur|NumType=Card": {"pos": "NUM"},
"NUM__Definite=Def|Number=Sing|NumType=Card": {"pos": "NUM"},
"NUM__NumForm=Digit|NumType=Card": {"pos": "NUM"},
"NUM__NumType=Card": {"pos": "NUM"},
"N_Adj_N_Num__Definite=Def|Degree=Pos|Number=Sing": {"pos": "NOUN"},
"N_Adj_N__Degree=Pos|Number=Plur": {"pos": "NOUN"},
"N_Adj_N___": {"pos": "NOUN"},
"N_Adj__AdpType=Prep": {"pos": "NOUN"},
"N_Adj__Case=Nom|Degree=Pos|Number=Plur": {"pos": "NOUN"},
"N_Adj__Case=Nom|Degree=Pos|Number=Sing": {"pos": "NOUN"},
"N_Adj__Degree=Pos|Number=Plur": {"pos": "NOUN"},
"N_Adj__Degree=Pos|Number=Sing": {"pos": "NOUN"},
"N_Adj___": {"pos": "NOUN"},
"N_Adv_Punc_V_Pron_V__Aspect=Imp|Degree=Pos|Mood=Ind|Number=Sing|Person=2|Tense=Pres|VerbForm=Inf": {"pos": "NOUN"},
"N_Adv__Degree=Pos|Number=Sing": {"pos": "NOUN"},
"N_Adv___": {"pos": "NOUN"},
"N_Adv|soort|ev|neut_deelv__Number=Sing": {"pos": "NOUN"},
"N_Art_Adj_Prep_N___": {"pos": "NOUN"},
"N_Art_N__Case=Gen|Number=Sing": {"pos": "NOUN"},
"N_Art_N__Number=Plur": {"pos": "NOUN"},
"N_Art_N__Number=Sing": {"pos": "NOUN"},
"N_Art_N___": {"pos": "NOUN"},
"N_Conj_Adv__Degree=Pos|Number=Sing": {"pos": "NOUN"},
"N_Conj_Art_N___": {"pos": "NOUN"},
"N_Conj_N_N__Number=Sing": {"pos": "NOUN"},
"N_Conj_N_N___": {"pos": "NOUN"},
"N_Conj_N__Number=Plur": {"pos": "NOUN"},
"N_Conj_N__Number=Sing": {"pos": "NOUN"},
"N_Conj_N___": {"pos": "NOUN"},
"N_Conj|soort|ev|neut_neven__Number=Sing": {"pos": "NOUN"},
"N_Int_N|eigen|ev|neut_eigen|ev|neut___": {"pos": "NOUN"},
"N_Misc_Misc_Misc_Misc___": {"pos": "NOUN"},
"N_Misc_Misc_N___": {"pos": "NOUN"},
"N_Misc_Misc|eigen|ev|neut_vreemd_vreemd___": {"pos": "NOUN"},
"N_Misc_Misc|soort|mv|neut_vreemd_vreemd__Number=Plur": {"pos": "NOUN"},
"N_Misc_N_N_N_N___": {"pos": "NOUN"},
"N_Misc_N_N___": {"pos": "NOUN"},
"N_Misc_N___": {"pos": "NOUN"},
"N_Misc_Num___": {"pos": "NOUN"},
"N_Misc|eigen|ev|neut_vreemd___": {"pos": "NOUN"},
"N_Misc|soort|ev|neut_vreemd__Number=Sing": {"pos": "NOUN"},
"N_N_Adj_Art_N_N__Gender=Masc|Number=Plur|PronType=Art": {"pos": "NOUN"},
"N_N_Adj_N___": {"pos": "NOUN"},
"N_N_Adj__Degree=Pos|Number=Sing": {"pos": "NOUN"},
"N_N_Adj___": {"pos": "NOUN"},
"N_N_Art_Adv___": {"pos": "NOUN"},
"N_N_Art_N___": {"pos": "NOUN"},
"N_N_Conj_N_N_N_N_N___": {"pos": "NOUN"},
"N_N_Conj_N_N___": {"pos": "NOUN"},
"N_N_Conj_N__Number=Sing": {"pos": "NOUN"},
"N_N_Conj_N___": {"pos": "NOUN"},
"N_N_Conj___": {"pos": "NOUN"},
"N_N_Int_N_N___": {"pos": "NOUN"},
"N_N_Misc___": {"pos": "NOUN"},
"N_N_N_Adj_N___": {"pos": "NOUN"},
"N_N_N_Adv___": {"pos": "NOUN"},
"N_N_N_Int__AdpType=Prep": {"pos": "NOUN"},
"N_N_N_Misc___": {"pos": "NOUN"},
"N_N_N_N_Conj_N___": {"pos": "NOUN"},
"N_N_N_N_Misc___": {"pos": "NOUN"},
"N_N_N_N_N_N_Int__AdpType=Prep": {"pos": "NOUN"},
"N_N_N_N_N_N_N__AdpType=Prep": {"pos": "NOUN"},
"N_N_N_N_N_N_N__Gender=Fem|Number=Sing|PronType=Art": {"pos": "NOUN"},
"N_N_N_N_N_N_N___": {"pos": "NOUN"},
"N_N_N_N_N_N_Prep_N___": {"pos": "NOUN"},
"N_N_N_N_N_N__AdpType=Prep": {"pos": "NOUN"},
"N_N_N_N_N_N___": {"pos": "NOUN"},
"N_N_N_N_N_Prep_N___": {"pos": "NOUN"},
"N_N_N_N_N__AdpType=Prep": {"pos": "NOUN"},
"N_N_N_N_N__Number=Sing": {"pos": "NOUN"},
"N_N_N_N_N___": {"pos": "NOUN"},
"N_N_N_N_Prep_N___": {"pos": "NOUN"},
"N_N_N_N_Punc_N_Punc___": {"pos": "NOUN"},
"N_N_N_N_V___": {"pos": "NOUN"},
"N_N_N_N__Gender=Fem|Number=Plur|PronType=Art": {"pos": "NOUN"},
"N_N_N_N__Gender=Fem|Number=Sing|PronType=Art": {"pos": "NOUN"},
"N_N_N_N__NumType=Card": {"pos": "NOUN"},
"N_N_N_N__Number=Plur": {"pos": "NOUN"},
"N_N_N_N__Number=Sing": {"pos": "NOUN"},
"N_N_N_N___": {"pos": "NOUN"},
"N_N_N_Prep_Art_Adj_N___": {"pos": "NOUN"},
"N_N_N_Prep_N_N___": {"pos": "NOUN"},
"N_N_N_Prep_N___": {"pos": "NOUN"},
"N_N_N_Punc_N___": {"pos": "NOUN"},
"N_N_N_Punc___": {"pos": "NOUN"},
"N_N_N__AdpType=Prep": {"pos": "NOUN"},
"N_N_N__Gender=Fem|Number=Sing|PronType=Art": {"pos": "NOUN"},
"N_N_N__Gender=Masc|Number=Plur|PronType=Art": {"pos": "NOUN"},
"N_N_N__Number=Plur": {"pos": "NOUN"},
"N_N_N__Number=Sing": {"pos": "NOUN"},
"N_N_N___": {"pos": "NOUN"},
"N_N_Num_N___": {"pos": "NOUN"},
"N_N_Num__Definite=Def|Number=Sing": {"pos": "NOUN"},
"N_N_Num___": {"pos": "NOUN"},
"N_N_Prep_Art_Adj_N__Degree=Pos|Gender=Neut|Number=Sing": {"pos": "NOUN"},
"N_N_Prep_Art_N_Prep_Art_N___": {"pos": "NOUN"},
"N_N_Prep_Art_N___": {"pos": "NOUN"},
"N_N_Prep_N_N__AdpType=Prep": {"pos": "NOUN"},
"N_N_Prep_N_Prep_Adj_N___": {"pos": "NOUN"},
"N_N_Prep_N__AdpType=Prep": {"pos": "NOUN"},
"N_N_Prep_N__Number=Sing": {"pos": "NOUN"},
"N_N_Prep_N___": {"pos": "NOUN"},
"N_N_Punc_N_Punc___": {"pos": "NOUN"},
"N_Num_N_N__Definite=Def|Number=Sing": {"pos": "NOUN"},
"N_Num_N_Num___": {"pos": "NOUN"},
"N_Num_N___": {"pos": "NOUN"},
"N_Num_Num__Definite=Def|Number=Sing": {"pos": "NOUN"},
"N_Num__Definite=Def|Number=Plur": {"pos": "NOUN"},
"N_Num__Definite=Def|Number=Sing": {"pos": "NOUN"},
"N_Num___": {"pos": "NOUN"},
"N_N|eigen|ev|gen_eigen|ev|gen___": {"pos": "NOUN"},
"N_N|eigen|ev|gen_eigen|ev|neut___": {"pos": "NOUN"},
"N_N|eigen|ev|gen_soort|ev|neut___": {"pos": "NOUN"},
"N_N|eigen|ev|gen_soort|mv|neut___": {"pos": "NOUN"},
"N_N|eigen|ev|neut_eigen|ev|gen___": {"pos": "NOUN"},
"N_N|eigen|ev|neut_eigen|ev|neut__AdpType=Prep": {"pos": "NOUN"},
"N_N|eigen|ev|neut_eigen|ev|neut__AdpType=Preppron|Gender=Fem|Number=Plur": {"pos": "NOUN"},
"N_N|eigen|ev|neut_eigen|ev|neut__AdpType=Preppron|Gender=Masc|Number=Sing": {"pos": "NOUN"},
"N_N|eigen|ev|neut_eigen|ev|neut__Gender=Fem|Number=Plur|PronType=Art": {"pos": "NOUN"},
"N_N|eigen|ev|neut_eigen|ev|neut__Gender=Fem|Number=Sing|PronType=Art": {"pos": "NOUN"},
"N_N|eigen|ev|neut_eigen|ev|neut__Gender=Masc|Number=Plur|PronType=Art": {"pos": "NOUN"},
"N_N|eigen|ev|neut_eigen|ev|neut__Gender=Masc|Number=Sing|PronType=Art": {"pos": "NOUN"},
"N_N|eigen|ev|neut_eigen|ev|neut__NumType=Card": {"pos": "NOUN"},
"N_N|eigen|ev|neut_eigen|ev|neut__Number=Sing": {"pos": "NOUN"},
"N_N|eigen|ev|neut_eigen|ev|neut___": {"pos": "NOUN"},
"N_N|eigen|ev|neut_eigen|mv|neut___": {"pos": "NOUN"},
"N_N|eigen|ev|neut_soort|ev|neut__AdpType=Prep": {"pos": "NOUN"},
"N_N|eigen|ev|neut_soort|ev|neut___": {"pos": "NOUN"},
"N_N|eigen|ev|neut_soort|mv|neut___": {"pos": "NOUN"},
"N_N|eigen|mv|neut_eigen|mv|neut___": {"pos": "NOUN"},
"N_N|soort|ev|neut_eigen|ev|neut__Number=Sing": {"pos": "NOUN"},
"N_N|soort|ev|neut_soort|ev|neut__Gender=Masc|Number=Plur|PronType=Art": {"pos": "NOUN"},
"N_N|soort|ev|neut_soort|ev|neut__NumForm=Digit|NumType=Card": {"pos": "NOUN"},
"N_N|soort|ev|neut_soort|ev|neut__Number=Sing": {"pos": "NOUN"},
"N_N|soort|ev|neut_soort|mv|neut__Number=Plur": {"pos": "NOUN"},
"N_N|soort|mv|neut_eigen|ev|neut__Number=Sing": {"pos": "NOUN"},
"N_N|soort|mv|neut_soort|ev|neut__Number=Sing": {"pos": "NOUN"},
"N_N|soort|mv|neut_soort|mv|neut__Number=Plur": {"pos": "NOUN"},
"N_Prep_Adj_Adj_N__Degree=Pos|Number=Plur": {"pos": "NOUN"},
"N_Prep_Adj_N___": {"pos": "NOUN"},
"N_Prep_Art_N_Art_N__Number=Plur": {"pos": "NOUN"},
"N_Prep_Art_N_N__Number=Sing": {"pos": "NOUN"},
"N_Prep_Art_N_Prep_Art_N__Gender=Neut|Number=Sing": {"pos": "NOUN"},
"N_Prep_Art_N__Number=Plur": {"pos": "NOUN"},
"N_Prep_Art_N__Number=Sing": {"pos": "NOUN"},
"N_Prep_Art_N___": {"pos": "NOUN"},
"N_Prep_N_Art_Adj___": {"pos": "NOUN"},
"N_Prep_N_N__Number=Sing": {"pos": "NOUN"},
"N_Prep_N_N___": {"pos": "NOUN"},
"N_Prep_N_Prep_Art_N___": {"pos": "NOUN"},
"N_Prep_N_Prep_N_Conj_N_Prep_Art_N_N__Number=Sing": {"pos": "NOUN"},
"N_Prep_N_Punc_N_Conj_N__Number=Sing": {"pos": "NOUN"},
"N_Prep_N__Number=Plur": {"pos": "NOUN"},
"N_Prep_N__Number=Sing": {"pos": "NOUN"},
"N_Prep_N___": {"pos": "NOUN"},
"N_Prep_Num__Definite=Def|Number=Sing": {"pos": "NOUN"},
"N_Prep_Pron_N___": {"pos": "NOUN"},
"N_Prep|soort|ev|neut_voor__Number=Sing": {"pos": "NOUN"},
"N_Pron___": {"pos": "NOUN"},
"N_Punc_Adj_N___": {"pos": "NOUN"},
"N_Punc_Adj_Pron_Punc__Degree=Pos|Number=Sing|Person=2": {"pos": "NOUN"},
"N_Punc_Adv_V_Pron_N__Aspect=Imp|Mood=Ind|Number=Sing|Person=1|Tense=Pres|VerbForm=Fin": {"pos": "NOUN"},
"N_Punc_Misc_Punc_N___": {"pos": "NOUN"},
"N_Punc_N_N_N_N__Number=Sing": {"pos": "NOUN"},
"N_Punc_N_Punc_N__Number=Sing": {"pos": "NOUN"},
"N_Punc_N_Punc__Number=Sing": {"pos": "NOUN"},
"N_Punc_N__Number=Sing": {"pos": "NOUN"},
"N_Punc_Punc_N_N_Punc_Punc_N___": {"pos": "NOUN"},
"N_V_N_N___": {"pos": "NOUN"},
"N_V_N___": {"pos": "NOUN"},
"N_V__Aspect=Imp|Mood=Ind|Number=Sing|Tense=Past|VerbForm=Fin": {"pos": "NOUN"},
"N_V__Number=Sing|Tense=Past|VerbForm=Part": {"pos": "NOUN"},
"N_V___": {"pos": "NOUN"},
"N_V|eigen|ev|neut_trans|imp___": {"pos": "NOUN"},
"N_V|soort|ev|neut_hulpofkopp|conj__Mood=Sub|Number=Sing|VerbForm=Fin": {"pos": "NOUN"},
"N_V|soort|ev|neut_intrans|conj__Mood=Sub|Number=Sing|VerbForm=Fin": {"pos": "NOUN"},
"Num_Adj_Adj_N___": {"pos": "NUM"},
"Num_Adj_N___": {"pos": "NUM"},
"Num_Adj__Definite=Def|Degree=Pos|NumType=Card": {"pos": "NUM"},
"Num_Adj__NumForm=Digit|NumType=Card": {"pos": "NUM"},
"Num_Adj___": {"pos": "NUM"},
"Num_Conj_Adj__Case=Nom|Definite=Def|Degree=Pos|NumType=Card": {"pos": "NUM"},
"Num_Conj_Art_Adj__Definite=Def|Degree=Pos|Number=Sing|NumType=Card": {"pos": "NUM"},
"Num_Conj_Num_N__NumForm=Digit|NumType=Card": {"pos": "NUM"},
"Num_Conj_Num__Degree=Cmp|NumType=Card|PronType=Ind": {"pos": "NUM"},
"Num_N_N__Definite=Def|Number=Sing|NumType=Card": {"pos": "NUM"},
"Num_N_Num_Num_N__NumForm=Digit|NumType=Card": {"pos": "NUM"},
"Num_N_Num__Definite=Def|Number=Sing|NumType=Card": {"pos": "NUM"},
"Num_N_Num__NumForm=Digit|NumType=Card": {"pos": "NUM"},
"Num_N__Definite=Def|Number=Plur|NumType=Card": {"pos": "NUM"},
"Num_N__Definite=Def|Number=Sing|NumType=Card": {"pos": "NUM"},
"Num_N__NumForm=Digit|NumType=Card": {"pos": "NUM"},
"Num_N___": {"pos": "NUM"},
"Num_Num_N__NumForm=Digit|NumType=Card": {"pos": "NUM"},
"Num_Num__Definite=Def|NumType=Card": {"pos": "NUM"},
"Num_Num__NumForm=Digit|NumType=Card": {"pos": "NUM"},
"Num_Prep_Num__Definite=Def|NumType=Card": {"pos": "NUM"},
"Num_Punc_Num_N_N__NumForm=Digit|NumType=Card": {"pos": "NUM"},
"Num_Punc_Num__NumForm=Digit|NumType=Card": {"pos": "NUM"},
"Num_Punc__NumForm=Digit|NumType=Card": {"pos": "NUM"},
"Num__Case=Nom|Degree=Cmp|NumType=Card|PronType=Ind": {"pos": "NUM"},
"Num__Case=Nom|Degree=Pos|NumType=Card|PronType=Ind": {"pos": "NUM"},
"Num__Case=Nom|Degree=Sup|NumType=Card|PronType=Ind": {"pos": "NUM"},
"Num__Degree=Cmp|NumType=Card|PronType=Ind": {"pos": "NUM"},
"Num__Degree=Pos|NumType=Card|PronType=Ind": {"pos": "NUM"},
"Num__Degree=Pos|Number=Plur|NumType=Card|PronType=Ind": {"pos": "NUM"},
"Num__Degree=Sup|NumType=Card|PronType=Ind": {"pos": "NUM"},
"Num__Degree=Sup|Number=Plur|NumType=Card|PronType=Ind": {"pos": "NUM"},
"Num|hoofd|bep|attr|onverv__Definite=Def|NumType=Card": {"pos": "NUM"},
"Num|hoofd|bep|zelfst|onverv__Definite=Def|NumType=Card": {"pos": "NUM"},
"Num|hoofd|bep|zelfst|vervmv__Definite=Def|Number=Plur|NumType=Card": {"pos": "NUM"},
"Num|hoofd|onbep|attr|stell|onverv__Degree=Pos|NumType=Card|PronType=Ind": {"pos": "NUM"},
"Num|hoofd|onbep|attr|vergr|onverv__Degree=Cmp|NumType=Card|PronType=Ind": {"pos": "NUM"},
"Num|rang|bep|attr|onverv__Definite=Def|NumType=Ord": {"pos": "NUM"},
"Num|rang|bep|zelfst|onverv__Definite=Def|NumType=Ord": {"pos": "NUM"},
"N|eigen|ev|gen__Case=Gen|Number=Sing": {"pos": "NOUN"},
"N|eigen|ev|neut__Number=Sing": {"pos": "NOUN"},
"N|eigen|mv|neut__Number=Plur": {"pos": "NOUN"},
"N|soort|ev|dat__Case=Dat|Number=Sing": {"pos": "NOUN"},
"N|soort|ev|gen__Case=Gen|Number=Sing": {"pos": "NOUN"},
"N|soort|ev|neut__Number=Sing": {"pos": "NOUN"},
"N|soort|mv|neut__Number=Plur": {"pos": "NOUN"},
"PROPN___": {"pos": "PROPN"},
"PUNCT___": {"pos": "PUNCT"},
"Prep_Adj_Conj_Prep_N__Degree=Pos|Number=Sing": {"pos": "PREP"},
"Prep_Adj_N__Degree=Pos|Number=Plur": {"pos": "PREP"},
"Prep_Adj|voor_adv|vergr|vervneut__Case=Nom|Degree=Cmp": {"pos": "PREP"},
"Prep_Adj|voor_attr|stell|onverv__Degree=Pos": {"pos": "PREP"},
"Prep_Adj|voor_attr|stell|vervneut__Case=Nom|Degree=Pos": {"pos": "PREP"},
"Prep_Adv__AdpType=Prep": {"pos": "PREP"},
"Prep_Adv__Case=Nom|Degree=Pos": {"pos": "PREP"},
"Prep_Adv__Case=Nom|Degree=Sup": {"pos": "PREP"},
"Prep_Adv__Degree=Pos": {"pos": "PREP"},
"Prep_Adv|voor_gew|aanw__AdpType=Prep": {"pos": "PREP"},
"Prep_Adv|voor_gew|aanw__Gender=Masc|Number=Sing|PronType=Tot": {"pos": "PREP"},
"Prep_Adv|voor_gew|aanw__PronType=Dem": {"pos": "PREP"},
"Prep_Adv|voor_pron|vrag__PronType=Int": {"pos": "PREP"},
"Prep_Art_Adj_N__Degree=Pos|Number=Sing": {"pos": "PREP"},
"Prep_Art_Adj__AdpType=Prep": {"pos": "PREP"},
"Prep_Art_Adj__Case=Nom|Degree=Pos": {"pos": "PREP"},
"Prep_Art_Adj__Degree=Cmp|Gender=Neut": {"pos": "PREP"},
"Prep_Art_Misc_Misc___": {"pos": "PREP"},
"Prep_Art_N_Adv__Number=Sing": {"pos": "PREP"},
"Prep_Art_N_Adv__Number=Sing|PronType=Int": {"pos": "PREP"},
"Prep_Art_N_Art_N__AdpType=Prep": {"pos": "PREP"},
"Prep_Art_N_Prep_Art_N__AdpType=Prep": {"pos": "PREP"},
"Prep_Art_N_Prep__AdpType=Prep": {"pos": "PREP"},
"Prep_Art_N_Prep__Gender=Neut|Number=Sing": {"pos": "PREP"},
"Prep_Art_N_Prep__Number=Sing": {"pos": "PREP"},
"Prep_Art_N_V__Number=Plur|Tense=Past|VerbForm=Part": {"pos": "PREP"},
"Prep_Art_N__AdpType=Prep": {"pos": "PREP"},
"Prep_Art_N__Gender=Com|Number=Sing": {"pos": "PREP"},
"Prep_Art_N__Gender=Neut|Number=Sing": {"pos": "PREP"},
"Prep_Art_N__Number=Plur": {"pos": "PREP"},
"Prep_Art_N__Number=Sing": {"pos": "PREP"},
"Prep_Art_V__AdpType=Prep": {"pos": "PREP"},
"Prep_Art_V__Gender=Neut|VerbForm=Inf": {"pos": "PREP"},
"Prep_Art|voor_bep|onzijd|neut__Gender=Neut": {"pos": "PREP"},
"Prep_Art|voor_onbep|zijdofonzijd|neut__Number=Sing": {"pos": "PREP"},
"Prep_Conj_Prep|voor_neven_voor__Gender=Masc|Number=Sing|PronType=Tot": {"pos": "PREP"},
"Prep_Misc|voor_vreemd___": {"pos": "PREP"},
"Prep_N_Adv|voor_soort|ev|neut_deeladv__Number=Sing": {"pos": "PREP"},
"Prep_N_Adv|voor_soort|ev|neut_pron|aanw__AdpType=Prep": {"pos": "PREP"},
"Prep_N_Adv|voor_soort|ev|neut_pron|aanw__Number=Sing|PronType=Dem": {"pos": "PREP"},
"Prep_N_Adv|voor_soort|ev|neut_pron|vrag__Number=Sing|PronType=Int": {"pos": "PREP"},
"Prep_N_Adv|voor_soort|mv|neut_deelv__Gender=Masc|Number=Sing|PronType=Tot": {"pos": "PREP"},
"Prep_N_Conj_N__Number=Sing": {"pos": "PREP"},
"Prep_N_Conj__AdpType=Prep": {"pos": "PREP"},
"Prep_N_Prep_N__Number=Sing": {"pos": "PREP"},
"Prep_N_Prep|voor_soort|ev|dat_voor__Number=Sing": {"pos": "PREP"},
"Prep_N_Prep|voor_soort|ev|neut_voor__AdpType=Prep": {"pos": "PREP"},
"Prep_N_Prep|voor_soort|ev|neut_voor__Number=Sing": {"pos": "PREP"},
"Prep_N_Prep|voor_soort|mv|neut_voor__Number=Plur": {"pos": "PREP"},
"Prep_N_V__Case=Nom|Number=Sing|Tense=Past|VerbForm=Part": {"pos": "PREP"},
"Prep_Num_N__Definite=Def|Number=Sing": {"pos": "PREP"},
"Prep_Num__Case=Nom|Degree=Sup|PronType=Ind": {"pos": "PREP"},
"Prep_Num__Degree=Cmp|PronType=Ind": {"pos": "PREP"},
"Prep_N|voor_eigen|ev|neut__Number=Sing": {"pos": "PREP"},
"Prep_N|voor_soort|ev|dat__AdpType=Prep": {"pos": "PREP"},
"Prep_N|voor_soort|ev|dat__Case=Dat|Number=Sing": {"pos": "PREP"},
"Prep_N|voor_soort|ev|neut__AdpType=Prep": {"pos": "PREP"},
"Prep_N|voor_soort|ev|neut__Gender=Masc|Number=Sing|PronType=Tot": {"pos": "PREP"},
"Prep_N|voor_soort|ev|neut__Number=Sing": {"pos": "PREP"},
"Prep_N|voor_soort|mv|neut__AdpType=Prep": {"pos": "PREP"},
"Prep_N|voor_soort|mv|neut__Number=Plur": {"pos": "PREP"},
"Prep_Prep_Adj|voor_voor_adv|stell|onverv__Gender=Masc|Number=Sing|PronType=Tot": {"pos": "PREP"},
"Prep_Prep_Adv__Degree=Pos": {"pos": "PREP"},
"Prep_Pron_Adj__Degree=Cmp|Number=Sing|Person=3": {"pos": "PREP"},
"Prep_Pron_N_Adv__Number=Plur": {"pos": "PREP"},
"Prep_Pron_N__AdpType=Prep": {"pos": "PREP"},
"Prep_Pron_N__Case=Dat|Number=Sing": {"pos": "PREP"},
"Prep_Pron|voor_aanw|neut|zelfst___": {"pos": "PREP"},
"Prep_Pron|voor_onbep|neut|attr___": {"pos": "PREP"},
"Prep_Pron|voor_onbep|neut|zelfst___": {"pos": "PREP"},
"Prep_Pron|voor_rec|neut__AdpType=Prep": {"pos": "PREP"},
"Prep_Pron|voor_rec|neut___": {"pos": "PREP"},
"Prep_Pron|voor_ref|3|evofmv__Number=Plur,Sing|Person=3": {"pos": "PREP"},
"Prep_Punc_N_Conj_N__AdpType=Prep": {"pos": "PREP"},
"Prep_V_N__Number=Sing|Tense=Pres|VerbForm=Part": {"pos": "PREP"},
"Prep_V_Pron_Pron_Adv__Aspect=Imp|Mood=Ind|Number=Sing|Person=2|PronType=Dem|Tense=Pres|VerbForm=Fin": {"pos": "PREP"},
"Prep_V|voor_intrans|inf__VerbForm=Inf": {"pos": "PREP"},
"Prep_V|voorinf_trans|inf__VerbForm=Inf": {"pos": "PREP"},
"Prep|achter__AdpType=Post": {"pos": "PREP"},
"Prep|comb__AdpType=Circ": {"pos": "PREP"},
"Prep|voor__AdpType=Prep": {"pos": "PREP"},
"Prep|voorinf__AdpType=Prep|PartType=Inf": {"pos": "PREP"},
"Pron_Adj_N_Punc_Art_Adj_N_Prep_Art_Adj_N__NumType=Card": {"pos": "PRON"},
"Pron_Adj__Case=Nom|Degree=Sup|Number=Sing|Person=2|Poss=Yes|PronType=Prs": {"pos": "PRON"},
"Pron_Adj__Degree=Cmp|PronType=Ind": {"pos": "PRON"},
"Pron_Adv|vrag|neut|attr_deelv__PronType=Int": {"pos": "PRON"},
"Pron_Art_N_N__Number=Plur|PronType=Ind": {"pos": "PRON"},
"Pron_Art__Number=Sing|PronType=Int": {"pos": "PRON"},
"Pron_N_Adv__Number=Sing|PronType=Ind": {"pos": "PRON"},
"Pron_N_V_Adv_Num_Punc__Aspect=Imp|Definite=Def|Mood=Ind|Number=Sing|Person=3|PronType=Ind|Tense=Pres|VerbForm=Fin": {"pos": "PRON"},
"Pron_N_V_Conj_N__Aspect=Imp|Mood=Ind|Number=Sing|Person=3|PronType=Ind|Tense=Pres|VerbForm=Fin": {"pos": "PRON"},
"Pron_N__Case=Gen|Number=Sing|PronType=Ind": {"pos": "PRON"},
"Pron_N__Number=Sing|PronType=Ind": {"pos": "PRON"},
"Pron_N|aanw|gen|attr_soort|mv|neut__Case=Gen|Number=Plur|PronType=Dem": {"pos": "PRON"},
"Pron_N|onbep|neut|attr_soort|ev|neut__Number=Sing|PronType=Ind": {"pos": "PRON"},
"Pron_Prep_Art__Number=Sing|PronType=Int": {"pos": "PRON"},
"Pron_Prep_Art__Number=Sing|PronType=Rel": {"pos": "PRON"},
"Pron_Prep_N__Number=Plur|PronType=Int": {"pos": "PRON"},
"Pron_Prep|betr|neut|zelfst_voor__PronType=Rel": {"pos": "PRON"},
"Pron_Prep|onbep|neut|zelfst_voor__PronType=Ind": {"pos": "PRON"},
"Pron_Prep|vrag|neut|attr_voor__PronType=Int": {"pos": "PRON"},
"Pron_Pron_V__Aspect=Imp|Mood=Ind|Number=Sing|Person=2|PronType=Rel|Tense=Pres|VerbForm=Fin": {"pos": "PRON"},
"Pron_Pron__Person=3|PronType=Prs|Reflex=Yes": {"pos": "PRON"},
"Pron_V_V__Aspect=Imp|Mood=Ind|Person=3|PronType=Dem|Tense=Pres|VerbForm=Inf": {"pos": "PRON"},
"Pron_V__Case=Gen|Number=Sing|Person=3|Poss=Yes|PronType=Prs|VerbForm=Inf": {"pos": "PRON"},
"Pron_V__Number=Plur|Person=1|Poss=Yes|PronType=Prs|VerbForm=Inf": {"pos": "PRON"},
"Pron|aanw|dat|attr__Case=Dat|PronType=Dem": {"pos": "PRON"},
"Pron|aanw|gen|attr__Case=Gen|PronType=Dem": {"pos": "PRON"},
"Pron|aanw|neut|attr__PronType=Dem": {"pos": "PRON"},
"Pron|aanw|neut|attr|weigen__PronType=Dem": {"pos": "PRON"},
"Pron|aanw|neut|attr|wzelf__PronType=Dem": {"pos": "PRON"},
"Pron|aanw|neut|zelfst__PronType=Dem": {"pos": "PRON"},
"Pron|betr|gen|zelfst__Case=Gen|PronType=Rel": {"pos": "PRON"},
"Pron|betr|neut|attr__PronType=Rel": {"pos": "PRON"},
"Pron|betr|neut|zelfst__PronType=Rel": {"pos": "PRON"},
"Pron|bez|1|ev|neut|attr__Number=Sing|Person=1|Poss=Yes|PronType=Prs": {"pos": "PRON"},
"Pron|bez|1|mv|neut|attr__Number=Plur|Person=1|Poss=Yes|PronType=Prs": {"pos": "PRON"},
"Pron|bez|2|ev|neut|attr__Number=Sing|Person=2|Poss=Yes|PronType=Prs": {"pos": "PRON"},
"Pron|bez|2|mv|neut|attr__Number=Plur|Person=2|Poss=Yes|PronType=Prs": {"pos": "PRON"},
"Pron|bez|3|ev|gen|attr__Case=Gen|Number=Sing|Person=3|Poss=Yes|PronType=Prs": {"pos": "PRON"},
"Pron|bez|3|ev|neut|attr__Number=Sing|Person=3|Poss=Yes|PronType=Prs": {"pos": "PRON"},
"Pron|bez|3|ev|neut|zelfst__Number=Sing|Person=3|Poss=Yes|PronType=Prs": {"pos": "PRON"},
"Pron|bez|3|mv|neut|attr__Number=Plur|Person=3|Poss=Yes|PronType=Prs": {"pos": "PRON"},
"Pron|onbep|gen|attr__Case=Gen|PronType=Ind": {"pos": "PRON"},
"Pron|onbep|gen|zelfst__Case=Gen|PronType=Ind": {"pos": "PRON"},
"Pron|onbep|neut|attr__PronType=Ind": {"pos": "PRON"},
"Pron|onbep|neut|zelfst__PronType=Ind": {"pos": "PRON"},
"Pron|per|1|ev|datofacc__Case=Acc,Dat|Number=Sing|Person=1|PronType=Prs": {"pos": "PRON"},
"Pron|per|1|ev|nom__Case=Nom|Number=Sing|Person=1|PronType=Prs": {"pos": "PRON"},
"Pron|per|1|mv|datofacc__Case=Acc,Dat|Number=Plur|Person=1|PronType=Prs": {"pos": "PRON"},
"Pron|per|1|mv|nom__Case=Nom|Number=Plur|Person=1|PronType=Prs": {"pos": "PRON"},
"Pron|per|2|ev|datofacc__Case=Acc,Dat|Number=Sing|Person=2|PronType=Prs": {"pos": "PRON"},
"Pron|per|2|ev|nom__Case=Nom|Number=Sing|Person=2|PronType=Prs": {"pos": "PRON"},
"Pron|per|2|mv|datofacc__Case=Acc,Dat|Number=Plur|Person=2|PronType=Prs": {"pos": "PRON"},
"Pron|per|2|mv|nom__Case=Nom|Number=Plur|Person=2|PronType=Prs": {"pos": "PRON"},
"Pron|per|3|evofmv|datofacc__Case=Acc,Dat|Number=Plur,Sing|Person=3|PronType=Prs": {"pos": "PRON"},
"Pron|per|3|evofmv|nom__Case=Nom|Number=Plur,Sing|Person=3|PronType=Prs": {"pos": "PRON"},
"Pron|per|3|ev|datofacc__Case=Acc,Dat|Number=Sing|Person=3|PronType=Prs": {"pos": "PRON"},
"Pron|per|3|ev|nom__Case=Nom|Number=Sing|Person=3|PronType=Prs": {"pos": "PRON"},
"Pron|per|3|mv|datofacc__Case=Acc,Dat|Number=Plur|Person=3|PronType=Prs": {"pos": "PRON"},
"Pron|rec|gen__Case=Gen|PronType=Rcp": {"pos": "PRON"},
"Pron|rec|neut__PronType=Rcp": {"pos": "PRON"},
"Pron|ref|1|ev__Number=Sing|Person=1|PronType=Prs|Reflex=Yes": {"pos": "PRON"},
"Pron|ref|1|mv__Number=Plur|Person=1|PronType=Prs|Reflex=Yes": {"pos": "PRON"},
"Pron|ref|2|ev__Number=Sing|Person=2|PronType=Prs|Reflex=Yes": {"pos": "PRON"},
"Pron|ref|3|evofmv__Number=Plur,Sing|Person=3|PronType=Prs|Reflex=Yes": {"pos": "PRON"},
"Pron|vrag|neut|attr__PronType=Int": {"pos": "PRON"},
"Pron|vrag|neut|zelfst__PronType=Int": {"pos": "PRON"},
"Punc_Int_Punc_N_N_N_Punc_Pron_V_Pron_Adj_V_Punc___": {"pos": "PUNCT"},
"Punc_N_Punc_N___": {"pos": "PUNCT"},
"Punc_Num_Num___": {"pos": "PUNCT"},
"Punc_Num___": {"pos": "PUNCT"},
"Punc|aanhaaldubb__PunctType=Quot": {"pos": "PUNCT"},
"Punc|aanhaalenk__PunctType=Quot": {"pos": "PUNCT"},
"Punc|dubbpunt__PunctType=Colo": {"pos": "PUNCT"},
"Punc|haakopen__PunctSide=Ini|PunctType=Brck": {"pos": "PUNCT"},
"Punc|haaksluit__PunctSide=Fin|PunctType=Brck": {"pos": "PUNCT"},
"Punc|hellip__PunctType=Peri": {"pos": "PUNCT"},
"Punc|isgelijk___": {"pos": "PUNCT"},
"Punc|komma__PunctType=Comm": {"pos": "PUNCT"},
"Punc|liggstreep___": {"pos": "PUNCT"},
"Punc|maal___": {"pos": "PUNCT"},
"Punc|punt__PunctType=Peri": {"pos": "PUNCT"},
"Punc|puntkomma__PunctType=Semi": {"pos": "PUNCT"},
"Punc|schuinstreep___": {"pos": "PUNCT"},
"Punc|uitroep__PunctType=Excl": {"pos": "PUNCT"},
"Punc|vraag__PunctType=Qest": {"pos": "PUNCT"},
"V_Adv_Art_N_Prep_Pron_N__Degree=Pos|Number=Plur|Person=2|Subcat=Tran": {"pos": "VERB"},
"V_Adv__Degree=Pos|Subcat=Tran": {"pos": "VERB"},
"V_Art_N_Num_N__Aspect=Imp|Definite=Def|Mood=Ind|Number=Plur|Person=3|Tense=Pres|VerbForm=Fin|VerbType=Mod": {"pos": "VERB"},
"V_Art_N__Number=Sing|Subcat=Tran": {"pos": "VERB"},
"V_Conj_N_N__Number=Sing|Subcat=Tran|Tense=Past|VerbForm=Part": {"pos": "VERB"},
"V_Conj_Pron__Subcat=Tran|Tense=Past|VerbForm=Part": {"pos": "VERB"},
"V_N_Conj_Adj_N_Prep_Art_N__Degree=Pos|Number=Sing|Subcat=Tran|Tense=Past|VerbForm=Part": {"pos": "VERB"},
"V_N_N__Number=Sing|Subcat=Intr|Tense=Pres|VerbForm=Part": {"pos": "VERB"},
"V_N_N__Number=Sing|Subcat=Tran|Tense=Past|VerbForm=Part": {"pos": "VERB"},
"V_N_V__Aspect=Imp|Mood=Ind|Number=Sing|Subcat=Intr|Tense=Pres|VerbForm=Inf": {"pos": "VERB"},
"V_N__Number=Plur|Subcat=Tran|Tense=Past|VerbForm=Part": {"pos": "VERB"},
"V_N|trans|imp_eigen|ev|neut__Number=Sing|Subcat=Tran": {"pos": "VERB"},
"V_Prep|intrans|verldw|onverv_voor__Subcat=Intr|Tense=Past|VerbForm=Part": {"pos": "VERB"},
"V_Pron_Adv_Adv_Pron_V__Aspect=Imp|Mood=Ind|Number=Sing|Person=2|Subcat=Tran|Tense=Pres|VerbForm=Fin": {"pos": "VERB"},
"V_Pron_Adv__Aspect=Imp|Degree=Pos|Mood=Ind|Number=Sing|Person=2|Subcat=Tran|Tense=Pres|VerbForm=Fin": {"pos": "VERB"},
"V_Pron_V__Aspect=Imp|Mood=Ind|Number=Sing|Person=3|Subcat=Tran|Tense=Pres|VerbForm=Fin": {"pos": "VERB"},
"V_Pron__VerbType=Aux,Cop": {"pos": "VERB"},
"V_V|hulp|imp_intrans|inf__VerbForm=Inf|VerbType=Mod": {"pos": "VERB"},
"V|hulpofkopp|conj__Mood=Sub|VerbForm=Fin": {"pos": "VERB"},
"V|hulpofkopp|conj__Mood=Sub|VerbForm=Fin|VerbType=Aux,Cop": {"pos": "VERB"},
"V|hulpofkopp|imp__Mood=Imp|VerbForm=Fin": {"pos": "VERB"},
"V|hulpofkopp|imp__Mood=Imp|VerbForm=Fin|VerbType=Aux,Cop": {"pos": "VERB"},
"V|hulpofkopp|inf__VerbForm=Inf": {"pos": "VERB"},
"V|hulpofkopp|inf__VerbForm=Inf|VerbType=Aux,Cop": {"pos": "VERB"},
"V|hulpofkopp|inf|subst__VerbForm=Inf": {"pos": "VERB"},
"V|hulpofkopp|ott|1of2of3|mv__Aspect=Imp|Mood=Ind|Number=Plur|Tense=Pres|VerbForm=Fin": {"pos": "VERB"},
"V|hulpofkopp|ott|1of2of3|mv__Aspect=Imp|Mood=Ind|Number=Plur|Tense=Pres|VerbForm=Fin|VerbType=Aux,Cop": {"pos": "VERB"},
"V|hulpofkopp|ott|1|ev__Aspect=Imp|Mood=Ind|Number=Sing|Person=1|Tense=Pres|VerbForm=Fin": {"pos": "VERB"},
"V|hulpofkopp|ott|1|ev__Aspect=Imp|Mood=Ind|Number=Sing|Person=1|Tense=Pres|VerbForm=Fin|VerbType=Aux,Cop": {"pos": "VERB"},
"V|hulpofkopp|ott|2|ev__Aspect=Imp|Mood=Ind|Number=Sing|Person=2|Tense=Pres|VerbForm=Fin": {"pos": "VERB"},
"V|hulpofkopp|ott|2|ev__Aspect=Imp|Mood=Ind|Number=Sing|Person=2|Tense=Pres|VerbForm=Fin|VerbType=Aux,Cop": {"pos": "VERB"},
"V|hulpofkopp|ott|3|ev__Aspect=Imp|Mood=Ind|Number=Sing|Person=3|Tense=Pres|VerbForm=Fin": {"pos": "VERB"},
"V|hulpofkopp|ott|3|ev__Aspect=Imp|Mood=Ind|Number=Sing|Person=3|Tense=Pres|VerbForm=Fin|VerbType=Aux,Cop": {"pos": "VERB"},
"V|hulpofkopp|ovt|1of2of3|ev__Aspect=Imp|Mood=Ind|Number=Sing|Tense=Past|VerbForm=Fin": {"pos": "VERB"},
"V|hulpofkopp|ovt|1of2of3|ev__Aspect=Imp|Mood=Ind|Number=Sing|Tense=Past|VerbForm=Fin|VerbType=Aux,Cop": {"pos": "VERB"},
"V|hulpofkopp|ovt|1of2of3|mv__Aspect=Imp|Mood=Ind|Number=Plur|Tense=Past|VerbForm=Fin": {"pos": "VERB"},
"V|hulpofkopp|ovt|1of2of3|mv__Aspect=Imp|Mood=Ind|Number=Plur|Tense=Past|VerbForm=Fin|VerbType=Aux,Cop": {"pos": "VERB"},
"V|hulpofkopp|tegdw|vervneut__Case=Nom|Tense=Pres|VerbForm=Part": {"pos": "VERB"},
"V|hulpofkopp|tegdw|vervneut__Case=Nom|Tense=Pres|VerbForm=Part|VerbType=Aux,Cop": {"pos": "VERB"},
"V|hulpofkopp|verldw|onverv__Tense=Past|VerbForm=Part": {"pos": "VERB"},
"V|hulpofkopp|verldw|onverv__Tense=Past|VerbForm=Part|VerbType=Aux,Cop": {"pos": "VERB"},
"V|hulp|conj__Mood=Sub|VerbForm=Fin|VerbType=Mod": {"pos": "VERB"},
"V|hulp|inf__VerbForm=Inf": {"pos": "VERB"},
"V|hulp|inf__VerbForm=Inf|VerbType=Mod": {"pos": "VERB"},
"V|hulp|ott|1of2of3|mv__Aspect=Imp|Mood=Ind|Number=Plur|Tense=Pres|VerbForm=Fin": {"pos": "VERB"},
"V|hulp|ott|1of2of3|mv__Aspect=Imp|Mood=Ind|Number=Plur|Tense=Pres|VerbForm=Fin|VerbType=Mod": {"pos": "VERB"},
"V|hulp|ott|1|ev__Aspect=Imp|Mood=Ind|Number=Sing|Person=1|Tense=Pres|VerbForm=Fin": {"pos": "VERB"},
"V|hulp|ott|1|ev__Aspect=Imp|Mood=Ind|Number=Sing|Person=1|Tense=Pres|VerbForm=Fin|VerbType=Mod": {"pos": "VERB"},
"V|hulp|ott|2|ev__Aspect=Imp|Mood=Ind|Number=Sing|Person=2|Tense=Pres|VerbForm=Fin": {"pos": "VERB"},
"V|hulp|ott|2|ev__Aspect=Imp|Mood=Ind|Number=Sing|Person=2|Tense=Pres|VerbForm=Fin|VerbType=Mod": {"pos": "VERB"},
"V|hulp|ott|3|ev__Aspect=Imp|Mood=Ind|Number=Sing|Person=3|Tense=Pres|VerbForm=Fin": {"pos": "VERB"},
"V|hulp|ott|3|ev__Aspect=Imp|Mood=Ind|Number=Sing|Person=3|Tense=Pres|VerbForm=Fin|VerbType=Mod": {"pos": "VERB"},
"V|hulp|ovt|1of2of3|ev__Aspect=Imp|Mood=Ind|Number=Sing|Tense=Past|VerbForm=Fin": {"pos": "VERB"},
"V|hulp|ovt|1of2of3|ev__Aspect=Imp|Mood=Ind|Number=Sing|Tense=Past|VerbForm=Fin|VerbType=Mod": {"pos": "VERB"},
"V|hulp|ovt|1of2of3|mv__Aspect=Imp|Mood=Ind|Number=Plur|Tense=Past|VerbForm=Fin": {"pos": "VERB"},
"V|hulp|ovt|1of2of3|mv__Aspect=Imp|Mood=Ind|Number=Plur|Tense=Past|VerbForm=Fin|VerbType=Mod": {"pos": "VERB"},
"V|hulp|verldw|onverv__Tense=Past|VerbForm=Part": {"pos": "VERB"},
"V|hulp|verldw|onverv__Tense=Past|VerbForm=Part|VerbType=Mod": {"pos": "VERB"},
"V|intrans|conj__Mood=Sub|Subcat=Intr|VerbForm=Fin": {"pos": "VERB"},
"V|intrans|imp__Mood=Imp|Subcat=Intr|VerbForm=Fin": {"pos": "VERB"},
"V|intrans|inf__Subcat=Intr|VerbForm=Inf": {"pos": "VERB"},
"V|intrans|inf|subst__Subcat=Intr|VerbForm=Inf": {"pos": "VERB"},
"V|intrans|ott|1of2of3|mv__Aspect=Imp|Mood=Ind|Number=Plur|Subcat=Intr|Tense=Pres|VerbForm=Fin": {"pos": "VERB"},
"V|intrans|ott|1|ev__Aspect=Imp|Mood=Ind|Number=Sing|Person=1|Subcat=Intr|Tense=Pres|VerbForm=Fin": {"pos": "VERB"},
"V|intrans|ott|2|ev__Aspect=Imp|Mood=Ind|Number=Sing|Person=2|Subcat=Intr|Tense=Pres|VerbForm=Fin": {"pos": "VERB"},
"V|intrans|ott|3|ev__Aspect=Imp|Mood=Ind|Number=Sing|Person=3|Subcat=Intr|Tense=Pres|VerbForm=Fin": {"pos": "VERB"},
"V|intrans|ovt|1of2of3|ev__Aspect=Imp|Mood=Ind|Number=Sing|Subcat=Intr|Tense=Past|VerbForm=Fin": {"pos": "VERB"},
"V|intrans|ovt|1of2of3|mv__Aspect=Imp|Mood=Ind|Number=Plur|Subcat=Intr|Tense=Past|VerbForm=Fin": {"pos": "VERB"},
"V|intrans|tegdw|onverv__Subcat=Intr|Tense=Pres|VerbForm=Part": {"pos": "VERB"},
"V|intrans|tegdw|vervmv__Number=Plur|Subcat=Intr|Tense=Pres|VerbForm=Part": {"pos": "VERB"},
"V|intrans|tegdw|vervneut__Case=Nom|Subcat=Intr|Tense=Pres|VerbForm=Part": {"pos": "VERB"},
"V|intrans|tegdw|vervvergr__Degree=Cmp|Subcat=Intr|Tense=Pres|VerbForm=Part": {"pos": "VERB"},
"V|intrans|verldw|onverv__Subcat=Intr|Tense=Past|VerbForm=Part": {"pos": "VERB"},
"V|intrans|verldw|vervmv__Number=Plur|Subcat=Intr|Tense=Past|VerbForm=Part": {"pos": "VERB"},
"V|intrans|verldw|vervneut__Case=Nom|Subcat=Intr|Tense=Past|VerbForm=Part": {"pos": "VERB"},
"V|refl|imp__Mood=Imp|Reflex=Yes|VerbForm=Fin": {"pos": "VERB"},
"V|refl|inf__Reflex=Yes|VerbForm=Inf": {"pos": "VERB"},
"V|refl|inf|subst__Reflex=Yes|VerbForm=Inf": {"pos": "VERB"},
"V|refl|ott|1of2of3|mv__Aspect=Imp|Mood=Ind|Number=Plur|Reflex=Yes|Tense=Pres|VerbForm=Fin": {"pos": "VERB"},
"V|refl|ott|1|ev__Aspect=Imp|Mood=Ind|Number=Sing|Person=1|Reflex=Yes|Tense=Pres|VerbForm=Fin": {"pos": "VERB"},
"V|refl|ott|2|ev__Aspect=Imp|Mood=Ind|Number=Sing|Person=2|Reflex=Yes|Tense=Pres|VerbForm=Fin": {"pos": "VERB"},
"V|refl|ott|3|ev__Aspect=Imp|Mood=Ind|Number=Sing|Person=3|Reflex=Yes|Tense=Pres|VerbForm=Fin": {"pos": "VERB"},
"V|refl|ovt|1of2of3|ev__Aspect=Imp|Mood=Ind|Number=Sing|Reflex=Yes|Tense=Past|VerbForm=Fin": {"pos": "VERB"},
"V|refl|ovt|1of2of3|mv__Aspect=Imp|Mood=Ind|Number=Plur|Reflex=Yes|Tense=Past|VerbForm=Fin": {"pos": "VERB"},
"V|refl|tegdw|vervneut__Case=Nom|Reflex=Yes|Tense=Pres|VerbForm=Part": {"pos": "VERB"},
"V|refl|verldw|onverv__Reflex=Yes|Tense=Past|VerbForm=Part": {"pos": "VERB"},
"V|trans|conj__Mood=Sub|Subcat=Tran|VerbForm=Fin": {"pos": "VERB"},
"V|trans|imp__Mood=Imp|Subcat=Tran|VerbForm=Fin": {"pos": "VERB"},
"V|trans|inf__Subcat=Tran|VerbForm=Inf": {"pos": "VERB"},
"V|trans|inf|subst__Subcat=Tran|VerbForm=Inf": {"pos": "VERB"},
"V|trans|ott|1of2of3|mv__Aspect=Imp|Mood=Ind|Number=Plur|Subcat=Tran|Tense=Pres|VerbForm=Fin": {"pos": "VERB"},
"V|trans|ott|1|ev__Aspect=Imp|Mood=Ind|Number=Sing|Person=1|Subcat=Tran|Tense=Pres|VerbForm=Fin": {"pos": "VERB"},
"V|trans|ott|2|ev__Aspect=Imp|Mood=Ind|Number=Sing|Person=2|Subcat=Tran|Tense=Pres|VerbForm=Fin": {"pos": "VERB"},
"V|trans|ott|3|ev__Aspect=Imp|Mood=Ind|Number=Sing|Person=3|Subcat=Tran|Tense=Pres|VerbForm=Fin": {"pos": "VERB"},
"V|trans|ovt|1of2of3|ev__Aspect=Imp|Mood=Ind|Number=Sing|Subcat=Tran|Tense=Past|VerbForm=Fin": {"pos": "VERB"},
"V|trans|ovt|1of2of3|mv__Aspect=Imp|Mood=Ind|Number=Plur|Subcat=Tran|Tense=Past|VerbForm=Fin": {"pos": "VERB"},
"V|trans|tegdw|onverv__Subcat=Tran|Tense=Pres|VerbForm=Part": {"pos": "VERB"},
"V|trans|tegdw|vervneut__Case=Nom|Subcat=Tran|Tense=Pres|VerbForm=Part": {"pos": "VERB"},
"V|trans|verldw|onverv__Subcat=Tran|Tense=Past|VerbForm=Part": {"pos": "VERB"},
"V|trans|verldw|vervmv__Number=Plur|Subcat=Tran|Tense=Past|VerbForm=Part": {"pos": "VERB"},
"V|trans|verldw|vervneut__Case=Nom|Subcat=Tran|Tense=Past|VerbForm=Part": {"pos": "VERB"},
"V|trans|verldw|vervvergr__Degree=Cmp|Subcat=Tran|Tense=Past|VerbForm=Part": {"pos": "VERB"},
"X__Aspect=Imp|Definite=Def|Mood=Ind|Number=Plur|Person=3|Tense=Pres|VerbForm=Fin|VerbType=Mod": {"pos": "X"},
"X__Aspect=Imp|Definite=Def|Mood=Ind|Number=Sing|Person=3|PronType=Ind|Tense=Pres|VerbForm=Fin": {"pos": "X"},
"X__Aspect=Imp|Degree=Pos|Mood=Ind|Number=Sing|Person=2|Subcat=Tran|Tense=Pres|VerbForm=Fin": {"pos": "X"},
"X__Aspect=Imp|Degree=Pos|Mood=Ind|Number=Sing|Person=2|Tense=Past|VerbForm=Part": {"pos": "X"},
"X__Aspect=Imp|Degree=Pos|Mood=Ind|Number=Sing|Person=2|Tense=Pres|VerbForm=Inf": {"pos": "X"},
"X__Aspect=Imp|Mood=Ind|Number=Sing|Person=1|Tense=Pres|VerbForm=Fin": {"pos": "X"},
"X__Aspect=Imp|Mood=Ind|Number=Sing|Person=2|PronType=Dem|Tense=Pres|VerbForm=Fin": {"pos": "X"},
"X__Aspect=Imp|Mood=Ind|Number=Sing|Person=2|PronType=Rel|Tense=Pres|VerbForm=Fin": {"pos": "X"},
"X__Aspect=Imp|Mood=Ind|Number=Sing|Person=2|Subcat=Tran|Tense=Pres|VerbForm=Fin": {"pos": "X"},
"X__Aspect=Imp|Mood=Ind|Number=Sing|Person=3|PronType=Ind|Tense=Pres|VerbForm=Fin": {"pos": "X"},
"X__Aspect=Imp|Mood=Ind|Number=Sing|Person=3|Subcat=Tran|Tense=Pres|VerbForm=Fin": {"pos": "X"},
"X__Aspect=Imp|Mood=Ind|Number=Sing|Person=3|Tense=Pres|VerbForm=Fin": {"pos": "X"},
"X__Aspect=Imp|Mood=Ind|Number=Sing|Subcat=Intr|Tense=Pres|VerbForm=Inf": {"pos": "X"},
"X__Aspect=Imp|Mood=Ind|Number=Sing|Tense=Past|VerbForm=Fin": {"pos": "X"},
"X__Aspect=Imp|Mood=Ind|Person=3|PronType=Dem|Tense=Pres|VerbForm=Inf": {"pos": "X"},
"X__Case=Dat|Degree=Pos|Number=Sing": {"pos": "X"},
"X__Case=Dat|Number=Sing": {"pos": "X"},
"X__Case=Gen|Definite=Def|Number=Sing": {"pos": "X"},
"X__Case=Gen|Number=Plur|PronType=Dem": {"pos": "X"},
"X__Case=Gen|Number=Plur|PronType=Ind": {"pos": "X"},
"X__Case=Gen|Number=Sing": {"pos": "X"},
"X__Case=Gen|Number=Sing|Person=3|Poss=Yes|PronType=Prs|VerbForm=Inf": {"pos": "X"},
"X__Case=Gen|Number=Sing|PronType=Ind": {"pos": "X"},
"X__Case=Nom|Definite=Def|Degree=Cmp|Gender=Neut": {"pos": "X"},
"X__Case=Nom|Definite=Def|Degree=Sup": {"pos": "X"},
"X__Case=Nom|Definite=Def|Degree=Sup|Gender=Neut": {"pos": "X"},
"X__Case=Nom|Degree=Cmp": {"pos": "X"},
"X__Case=Nom|Degree=Pos": {"pos": "X"},
"X__Case=Nom|Degree=Pos|Gender=Neut": {"pos": "X"},
"X__Case=Nom|Degree=Pos|Number=Plur": {"pos": "X"},
"X__Case=Nom|Degree=Pos|Number=Sing": {"pos": "X"},
"X__Case=Nom|Degree=Sup": {"pos": "X"},
"X__Case=Nom|Degree=Sup|Number=Sing|Person=2|Poss=Yes|PronType=Prs": {"pos": "X"},
"X__Case=Nom|Degree=Sup|PronType=Ind": {"pos": "X"},
"X__Case=Nom|Number=Sing|Tense=Past|VerbForm=Part": {"pos": "X"},
"X__Definite=Def": {"pos": "X"},
"X__Definite=Def|Degree=Cmp|Gender=Neut": {"pos": "X"},
"X__Definite=Def|Degree=Pos": {"pos": "X"},
"X__Definite=Def|Degree=Pos|Number=Sing": {"pos": "X"},
"X__Definite=Def|Degree=Pos|Variant=Short": {"pos": "X"},
"X__Definite=Def|Degree=Sup|Gender=Neut": {"pos": "X"},
"X__Definite=Def|Degree=Sup|Gender=Neut|Number=Sing": {"pos": "X"},
"X__Definite=Def|Degree=Sup|Gender=Neut|PronType=Ind": {"pos": "X"},
"X__Definite=Def|Gender=Neut": {"pos": "X"},
"X__Definite=Def|Gender=Neut|Number=Plur|Person=3": {"pos": "X"},
"X__Definite=Def|Gender=Neut|Number=Sing": {"pos": "X"},
"X__Definite=Def|Number=Plur": {"pos": "X"},
"X__Definite=Def|Number=Sing": {"pos": "X"},
"X__Definite=Def|Number=Sing|Person=1": {"pos": "X"},
"X__Definite=Def|Number=Sing|Tense=Past|VerbForm=Part": {"pos": "X"},
"X__Definite=Def|Number=Sing|Tense=Pres|VerbForm=Part": {"pos": "X"},
"X__Degree=Cmp": {"pos": "X"},
"X__Degree=Cmp|Gender=Neut": {"pos": "X"},
"X__Degree=Cmp|Number=Sing|Person=3": {"pos": "X"},
"X__Degree=Cmp|PronType=Ind": {"pos": "X"},
"X__Degree=Cmp|Variant=Short": {"pos": "X"},
"X__Degree=Pos": {"pos": "X"},
"X__Degree=Pos|Gender=Neut|Number=Sing": {"pos": "X"},
"X__Degree=Pos|Mood=Imp|Variant=Short|VerbForm=Fin": {"pos": "X"},
"X__Degree=Pos|Mood=Sub|VerbForm=Fin": {"pos": "X"},
"X__Degree=Pos|Number=Plur": {"pos": "X"},
"X__Degree=Pos|Number=Plur|Person=2|Subcat=Tran": {"pos": "X"},
"X__Degree=Pos|Number=Plur|Variant=Short": {"pos": "X"},
"X__Degree=Pos|Number=Sing": {"pos": "X"},
"X__Degree=Pos|Number=Sing|Person=1|Poss=Yes|PronType=Prs": {"pos": "X"},
"X__Degree=Pos|Number=Sing|Person=2": {"pos": "X"},
"X__Degree=Pos|Number=Sing|Person=3": {"pos": "X"},
"X__Degree=Pos|Number=Sing|PronType=Ind": {"pos": "X"},
"X__Degree=Pos|Number=Sing|Subcat=Tran|Tense=Past|VerbForm=Part": {"pos": "X"},
"X__Degree=Pos|Number=Sing|Tense=Past|VerbForm=Part": {"pos": "X"},
"X__Degree=Pos|Number=Sing|Variant=Short": {"pos": "X"},
"X__Degree=Pos|PronType=Dem": {"pos": "X"},
"X__Degree=Pos|Subcat=Tran": {"pos": "X"},
"X__Degree=Pos|Variant=Short": {"pos": "X"},
"X__Degree=Pos|Variant=Short|VerbForm=Inf": {"pos": "X"},
"X__Degree=Pos|VerbForm=Inf": {"pos": "X"},
"X__Gender=Com|Number=Sing": {"pos": "X"},
"X__Gender=Neut": {"pos": "X"},
"X__Gender=Neut|Number=Sing": {"pos": "X"},
"X__Gender=Neut|VerbForm=Inf": {"pos": "X"},
"X__Mood=Sub|Number=Sing|VerbForm=Fin": {"pos": "X"},
"X__Mood=Sub|VerbForm=Fin": {"pos": "X"},
"X__Number=Plur": {"pos": "X"},
"X__Number=Plur,Sing|Person=3": {"pos": "X"},
"X__Number=Plur|Person=1|Poss=Yes|PronType=Prs|VerbForm=Inf": {"pos": "X"},
"X__Number=Plur|PronType=Ind": {"pos": "X"},
"X__Number=Plur|PronType=Int": {"pos": "X"},
"X__Number=Plur|Subcat=Tran|Tense=Past|VerbForm=Part": {"pos": "X"},
"X__Number=Plur|Tense=Past|VerbForm=Part": {"pos": "X"},
"X__Number=Sing": {"pos": "X"},
"X__Number=Sing|Person=3": {"pos": "X"},
"X__Number=Sing|PronType=Dem": {"pos": "X"},
"X__Number=Sing|PronType=Ind": {"pos": "X"},
"X__Number=Sing|PronType=Int": {"pos": "X"},
"X__Number=Sing|PronType=Rel": {"pos": "X"},
"X__Number=Sing|Subcat=Intr|Tense=Pres|VerbForm=Part": {"pos": "X"},
"X__Number=Sing|Subcat=Tran": {"pos": "X"},
"X__Number=Sing|Subcat=Tran|Tense=Past|VerbForm=Part": {"pos": "X"},
"X__Number=Sing|Tense=Past|VerbForm=Part": {"pos": "X"},
"X__Number=Sing|Tense=Pres|VerbForm=Part": {"pos": "X"},
"X__Person=3|PronType=Prs|Reflex=Yes": {"pos": "X"},
"X__PronType=Dem": {"pos": "X"},
"X__PronType=Ind": {"pos": "X"},
"X__PronType=Int": {"pos": "X"},
"X__PronType=Rel": {"pos": "X"},
"X__Subcat=Intr|Tense=Past|VerbForm=Part": {"pos": "X"},
"X__Subcat=Tran|Tense=Past|VerbForm=Part": {"pos": "X"},
"X__VerbForm=Inf": {"pos": "X"},
"X__VerbForm=Inf|VerbType=Mod": {"pos": "X"},
"X__VerbType=Aux,Cop": {"pos": "X"},
"X___": {"pos": "X"},
"_SP": {"pos": "SPACE"}
}
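
Reading the entries above, each key appears to join a treebank tag and a list of morphological features with a double underscore, with `_` standing in for an empty feature list. A minimal sketch of how such a key could be taken apart, assuming that layout holds for every entry; `split_tag_key` is a hypothetical helper for illustration and is not part of spaCy:

```python
def split_tag_key(key):
    """Split a tag-map key into (tag, features).

    Assumes the layout "<tag>__<Feat=Val|Feat=Val>" seen in the entries above.
    """
    # Split at the first double underscore only; the tag itself may
    # contain single underscores (for example "V_Pron").
    tag, sep, feats = key.partition("__")
    features = {}
    if sep and feats != "_":
        for pair in feats.split("|"):
            name, _, value = pair.partition("=")
            # Multi-valued features such as "Aux,Cop" are kept as one string.
            features[name] = value
    return tag, features
```

Under these assumptions, `split_tag_key("V|trans|inf__Subcat=Tran|VerbForm=Inf")` yields the tag `V|trans|inf` with two features, while `X___` and `_SP` yield an empty feature dictionary.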


@@ -5,6 +5,7 @@ from .tokenizer_exceptions import TOKENIZER_EXCEPTIONS
 from .stop_words import STOP_WORDS
 from .lex_attrs import LEX_ATTRS
 from .lemmatizer import LOOKUP
+from .tag_map import TAG_MAP
 from ..tokenizer_exceptions import BASE_EXCEPTIONS
 from ..norm_exceptions import BASE_NORMS
@@ -21,6 +22,7 @@ class PortugueseDefaults(Language.Defaults):
     tokenizer_exceptions = update_exc(BASE_EXCEPTIONS, TOKENIZER_EXCEPTIONS)
     stop_words = STOP_WORDS
     lemma_lookup = LOOKUP
+    tag_map = TAG_MAP
 class Portuguese(Language):

5039
spacy/lang/pt/tag_map.py Normal file

File diff suppressed because it is too large


@@ -135,10 +135,6 @@ class Language(object):
         self.pipeline = []
         self._optimizer = None
-    def __reduce__(self):
-        bytes_data = self.to_bytes(vocab=False)
-        return (unpickle_language, (self.vocab, self.meta, bytes_data))
     @property
     def path(self):
         return self._path
@@ -724,12 +720,6 @@ class DisabledPipes(list):
         self[:] = []
-def unpickle_language(vocab, meta, bytes_data):
-    lang = Language(vocab=vocab)
-    lang.from_bytes(bytes_data)
-    return lang
 def _pipe(func, docs):
     for doc in docs:
         func(doc)
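
The removed `__reduce__` method and `unpickle_language` helper followed the standard pickle protocol, in which `__reduce__` returns a callable plus the arguments needed to rebuild the object. A toy sketch of that pattern, using a hypothetical `Toy` class and `unpickle_toy` function rather than spaCy's `Language`:

```python
import json


class Toy(object):
    """Hypothetical stand-in for an object with to_bytes/from_bytes."""

    def __init__(self, vocab, state=None):
        self.vocab = vocab
        self.state = state or {}

    def to_bytes(self):
        # Serialize everything except the vocab, which is passed separately.
        return json.dumps(self.state).encode("utf8")

    def from_bytes(self, bytes_data):
        self.state = json.loads(bytes_data.decode("utf8"))
        return self

    def __reduce__(self):
        # pickle stores this (callable, args) pair and calls it on load.
        return (unpickle_toy, (self.vocab, self.to_bytes()))


def unpickle_toy(vocab, bytes_data):
    # Rebuild an empty object around the vocab, then restore its state.
    toy = Toy(vocab)
    toy.from_bytes(bytes_data)
    return toy
```

With this in place, `pickle.loads(pickle.dumps(toy))` would rebuild the object by calling `unpickle_toy` with the reduced arguments, provided `unpickle_toy` is importable at module level.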


@@ -318,7 +318,7 @@ class Tensorizer(Pipe):
         loss, d_scores = self.get_loss(docs, golds, scores)
         d_inputs = bp_scores(d_scores, sgd=sgd)
         d_inputs = self.model.ops.xp.split(d_inputs, len(self.input_models), axis=1)
         for d_input, bp_input in zip(d_inputs, bp_inputs):
             bp_input(d_input, sgd=sgd)
         if losses is not None:
             losses.setdefault(self.name, 0.)
@@ -777,7 +777,8 @@ class TextCategorizer(Pipe):
     def predict(self, docs):
         scores = self.model(docs)
         scores = self.model.ops.asarray(scores)
-        return scores
+        tensors = [doc.tensor for doc in docs]
+        return scores, tensors
     def set_annotations(self, docs, scores, tensors=None):
         for i, doc in enumerate(docs):
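
The `TextCategorizer` hunk makes `predict` return a `(scores, tensors)` pair, which lines up with the `tensors=None` parameter of `set_annotations`. A caller that unpacks two values would not work as intended with a bare `scores` return. A minimal sketch of that contract with hypothetical stand-in classes (`ToyDoc`, `ToyCategorizer`); the `__call__` body is an assumption about how such a pipe might be driven, not spaCy's actual code:

```python
class ToyDoc(object):
    """Hypothetical document with a tensor and a category dict."""

    def __init__(self, text):
        self.text = text
        self.tensor = [0.0] * 4  # placeholder for a per-document tensor
        self.cats = {}


class ToyCategorizer(object):
    labels = ["POSITIVE"]

    def predict(self, docs):
        # Fixed dummy score per label; a real model would compute these.
        scores = [[0.5 for _ in self.labels] for _ in docs]
        tensors = [doc.tensor for doc in docs]
        return scores, tensors

    def set_annotations(self, docs, scores, tensors=None):
        for i, doc in enumerate(docs):
            for j, label in enumerate(self.labels):
                doc.cats[label] = float(scores[i][j])

    def __call__(self, doc):
        # Unpacking relies on predict returning exactly two values.
        scores, tensors = self.predict([doc])
        self.set_annotations([doc], scores, tensors=tensors)
        return doc
```

Keeping the return shape of `predict` in step with the signature of `set_annotations` lets a shared driver like the `__call__` above treat every pipe the same way.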


@@ -2,10 +2,22 @@
 from __future__ import unicode_literals
 import pytest
+from ...language import Language
+from ...pipeline import DependencyParser
 @pytest.mark.models('en')
-def test_beam_parse(EN):
+def test_beam_parse_en(EN):
     doc = EN(u'Australia is a country', disable=['ner'])
     ents = EN.entity(doc, beam_width=2)
     print(ents)
+def test_beam_parse():
+    nlp = Language()
+    nlp.add_pipe(DependencyParser(nlp.vocab), name='parser')
+    nlp.parser.add_label('nsubj')
+    nlp.begin_training()
+    doc = nlp.make_doc(u'Australia is a country')
+    nlp.parser(doc, beam_width=2)


@@ -358,7 +358,7 @@ cdef class Vectors:
         def load_vectors(path):
             xp = Model.ops.xp
             if path.exists():
-                self.data = xp.load(path)
+                self.data = xp.load(str(path))
         serializers = OrderedDict((
             ('key2row', load_key2row),


@@ -52,7 +52,7 @@
 vertical-align: middle
 margin-right: 1rem
 cursor: pointer
-border-radius: 50%
+border-radius: 2px
 .c-quickstart__input--check:checked + &:before
 background: $color-theme url()


@@ -43,6 +43,8 @@
 "en": ["en_core_web_sm", "en_core_web_lg", "en_vectors_web_lg"],
 "de": ["de_core_news_sm"],
 "es": ["es_core_news_sm", "es_core_news_md"],
+"pt": ["pt_core_news_sm"],
+"fr": ["fr_core_news_sm"],
 "it": ["it_core_news_sm"],
 "xx": ["xx_ent_wiki_sm"]
 },


@@ -139,7 +139,7 @@ p
 # merge base exceptions and custom tokenizer exceptions
 tokenizer_exceptions = update_exc(BASE_EXCEPTIONS, TOKENIZER_EXCEPTIONS)
-stop_words = set(STOP_WORDS)
+stop_words = STOP_WORDS
 # create actual Language class
 class Xxxxx(Language):
@@ -248,7 +248,7 @@ p
 {ORTH: period, LEMMA: "p.m."}]
 # only declare this at the bottom
-TOKENIZER_EXCEPTIONS = dict(_exc)
+TOKENIZER_EXCEPTIONS = _exc
 +aside("Generating tokenizer exceptions")
 | Keep in mind that generating exceptions only makes sense if there's a


@@ -20,10 +20,10 @@ p Using pip, spaCy releases are currently only available as source packages.
 p
 | When using pip it is generally recommended to install packages in a
-| #[code virtualenv] to avoid modifying system state:
+| virtual environment to avoid modifying system state:
 +code(false, "bash").
-virtualenv .env
+venv .env
 source .env/bin/activate
 pip install spacy
@@ -115,29 +115,29 @@ p
 | #[a(href="#source-windows") Windows] for details.
 +code(false, "bash").
-# make sure you are using recent pip/virtualenv versions
-python -m pip install -U pip virtualenv
-git clone #{gh("spaCy")}
-cd spaCy
-virtualenv .env
-source .env/bin/activate
-pip install -r requirements.txt
-pip install -e .
+python -m pip install -U pip venv # update pip & virtualenv
+git clone #{gh("spaCy")} # clone spaCy
+cd spaCy # navigate into directory
+venv .env # create environment in .env
+source .env/bin/activate # activate virtual environment
+export PYTHONPATH=`pwd` # set Python path to spaCy directory
+pip install -r requirements.txt # install all requirements
+python setup.py build_ext --inplace # compile spaCy
 p
-| Compared to regular install via pip,
-| #[+a(gh("spaCy", "requirements.txt")) requirements.txt]
-| additionally installs developer dependencies such as Cython.
-p
-| Instead of the above verbose commands, you can also use the following
+| Compared to regular install via pip, the
+| #[+src(gh("spaCy", "requirements.txt")) #[code requirements.txt]]
+| additionally installs developer dependencies such as Cython. See the
+| the #[+a("#section-quickstart") quickstart widget] to get the right
+| commands for your platform and Python version. Instead of the above
+| verbose commands, you can also use the following
 | #[+a("http://www.fabfile.org/") Fabric] commands:
 +table(["Command", "Description"])
 +row
 +cell #[code fab env]
-+cell Create #[code virtualenv] and delete previous one, if it exists.
++cell Create a virtual environment and delete previous one, if it exists.
 +row
 +cell #[code fab make]
@@ -152,7 +152,7 @@ p
 +cell Run basic tests, aborting after first failure.
 p
-| All commands assume that your #[code virtualenv] is located in a
+| All commands assume that your virtual environment is located in a
 | directory #[code .env]. If you're using a different directory, you can
 | change it via the environment variable #[code VENV_DIR], for example:


@@ -21,7 +21,7 @@
 +qs({package: 'source'}) cd spaCy
 +qs({package: 'source'}) export PYTHONPATH=`pwd`
 +qs({package: 'source'}) pip install -r requirements.txt
-+qs({package: 'source'}) pip install -e .
++qs({package: 'source'}) python setup.py build_ext --inplace
 for _, model in MODELS
 +qs({model: model}) spacy download #{model}


@@ -104,8 +104,8 @@ p
 | install and use your extension, for example by uploading it to
 | #[+a("https://pypi.python.org") PyPi]. If you're sharing your code on
 | GitHub, don't forget to tag it
-| with #[+a("https://github.com/search?q=topic%3Aspacy") #[code spacy]]
-| and #[+a("https://github.com/search?q=topic%3Aspacy-extensions") #[code spacy-extensions]]
+| with #[+a("https://github.com/topics/spacy?o=desc&s=stars") #[code spacy]]
+| and #[+a("https://github.com/topics/spacy-extension?o=desc&s=stars") #[code spacy-extension]]
 | to help people find it. If you post it on Twitter, feel free to tag
 | #[+a("https://twitter.com/" + SOCIAL.twitter) @#{SOCIAL.twitter}]
 | so we can check it out.


@@ -56,7 +56,7 @@ include ../_includes/_mixins
 | to #[code Doc], #[code Token] and #[code Span] attributes.
 .u-text-right
-+button("https://github.com/search?o=desc&q=spacy-extensions&s=stars&type=Repositories&utf8=%E2%9C%93", false, "primary", "small") See more extensions on GitHub
++button("https://github.com/topics/spacy-extension?o=desc&s=stars", false, "primary", "small") See more extensions on GitHub
 +section("demos")
 +h(2, "demos") Demos & Visualizations