mirror of
https://github.com/explosion/spaCy.git
synced 2024-12-24 17:06:29 +03:00
💫 Port master changes over to develop (#2979)
* Create aryaprabhudesai.md (#2681)
* Update _install.jade (#2688): typo fix, "models" -> "model"
* Add FAC to spacy.explain (resolves #2706)
* Remove docstrings for deprecated arguments (see #2703)
* When calling getoption() in conftest.py, pass a default option (#2709). This is necessary to allow testing an installed spaCy by running: pytest --pyargs spacy
* Add contributor agreement
* Update Bengali token rules for hyphen and digits (#2731)
* Less norm computations in token similarity (#2730)
* Remove ')' for clarity (#2737)
* Added contributor agreement for mbkupfer (#2738)
* Basic support for Telugu language (#2751)
* Lex _attrs for Polish language (#2750)
* Introduce a bulk merge function to increase performance on many span merges, in order to solve issue #653 (#2696)
* Describe converters more explicitly (see #2643)
* Add multi-threading note to Language.pipe (resolves #2582) [ci skip]
* Fix formatting
* Fix dependency scheme docs (closes #2705) [ci skip]
* Don't set stop word in example (closes #2657) [ci skip]
* Add words to Portuguese language _num_words (#2759)
* Update Indonesian model (#2752): add e-KTP and other tokens to the tokenizer exceptions list, combine base_norms with norm_exceptions, fix a double key in the lemmatizer, remove an unused import in punctuation.py, reformat stop_words to reduce the number of lines and improve readability, implement is_currency for lang/id, add orth_first_upper in tokenizer_exceptions, update the norm_exception list, and remove a bunch of abbreviations
* Fixed spaCy+Keras example (#2763): bug fixes in the Keras example
* Adding French hyphenated first names (#2786)
* Fix typo (closes #2784)
* Fix typo (#2795) [ci skip]: fixed typo on line 6, "regcognizer" -> "recognizer"
* Adding basic support for Sinhala language (#2788): language package, stop words, examples and lex_attrs
* Also include lowercase norm exceptions
* Fix error (#2802): "ValueError: cannot resize an array that references or is referenced by another array in this way. Use the resize function"
* Add charlax's contributor agreement (#2805)
* Contributor agreement, plus a tiny Polish language contribution (#2799)
* Add jupyter=True to displacy.render in documentation (#2806)
* Revert "Also include lowercase norm exceptions" (reverts commit 70f4e8adf3)
* Remove deprecated encoding argument to msgpack
* Set up dependency tree pattern matching skeleton (#2732)
* Fix bug when too many entity types (fixes #2800)
* Fix Python 2 test failure
* Require older msgpack-numpy
* Restore encoding arg on msgpack-numpy
* Try to fix version pin for msgpack-numpy
* Update Portuguese language (#2790): add words to _num_words, add/remove stopwords, fix tokenizer, add currency symbols, extend punctuation and norm_exceptions
* Correct error in spaCy universe docs concerning spacy-lookup (#2814)
* Update Keras example for the (Parikh et al., 2016) implementation (#2803): implement and document the Parikh 2016 decomposable attention model, test asymmetric models, fix a grievous error in normalization, use the standard SNLI test file, fix calls to similarity, simplify indexing when mapping words to IDs, and work around a TensorFlow bug using code from https://github.com/tensorflow/tensorflow/issues/3388
* Fix typo (closes #2815) [ci skip]
* Update regex version dependency
* Set version to 2.0.13.dev3
* Skip seemingly problematic test
* Remove problematic test
* Try previous version of regex
* Revert "Remove problematic test" (reverts commit bdebbef455)
* Unskip test
* Try older version of regex
* 💫 Update training examples and use minibatching (#2830): update the training examples in /examples/training to show usage of spaCy's minibatch and compounding helpers (see https://spacy.io/usage/training#tips-batch-size for details). The lack of batching in the examples has caused some confusion in the past, especially for beginners who would copy-paste the examples, update them with large training sets and experience slow and unsatisfying results. Types of change: enhancements.
  Checklist:
  - [x] I have submitted the spaCy Contributor Agreement.
  - [x] I ran the tests, and all new and existing tests passed.
  - [x] My changes don't require a change to the documentation, or if they do, I've added all required information.
* Visual C++ link updated (#2842) (closes #2841) [ci skip]
* New landing page
* Add contribution agreement
* Correcting lang/ru/examples.py (#2845): correct some grammatical inaccuracies; move contributor agreement to separate file
* Set version to 2.0.13.dev4
* Add Persian (Farsi) language support (#2797)
* Also include lowercase norm exceptions
* Remove in favour of https://github.com/explosion/spaCy/graphs/contributors
* Rule-based French Lemmatizer (#2818): add a rule-based French lemmatizer following the English one and the excellent PR for Greek language optimizations (https://github.com/explosion/spaCy/pull/2558) to adapt the Lemmatizer class. The lemma dictionary used can be found at http://infolingu.univ-mlv.fr/DonneesLinguistiques/Dictionnaires/telechargement.html (XML version). Adds several files containing exhaustive lists of words for each part of speech, some lemma rules, and POS that are not checked in the standard Lemmatizer (i.e. PRON, DET, ADV and AUX); modifies the Lemmatizer class and the lemmatize function to check the lookup table as a last resort if the POS is not mentioned; updates init files so the model can support all the functionality mentioned above; and adds words to tokenizer_exceptions_list.py in line with the regex used in tokenizer_exceptions.py.
* Set version to 2.0.13
* Fix formatting and consistency
* Update docs for new version [ci skip]
* Increment version [ci skip]
* Add info on wheels [ci skip]
* Adding "This is a sentence" example to Sinhala (#2846)
* Add wheels badge
* Update badge [ci skip]
* Update README.rst [ci skip]
* Update murmurhash pin
* Increment version to 2.0.14.dev0
* Update GPU docs for v2.0.14
* Add wheel to setup_requires
* Import prefer_gpu and require_gpu functions from Thinc
* Add tests for prefer_gpu() and require_gpu()
* Update requirements and setup.py
* Workaround bug in thinc require_gpu
* Set version to v2.0.14
* Update push-tag script
* Unhack prefer_gpu
* Require thinc 6.10.6
* Update prefer_gpu and require_gpu docs [ci skip]
* Fix specifiers for GPU
* Set version to 2.0.14.dev1
* Set version to 2.0.14
* Update Thinc version pin
* Increment version
* Fix msgpack-numpy version pin
* Increment version
* Update version to 2.0.16
* Update version [ci skip]
* Redundant ')' in the stop words example (#2856)
* Documentation improvement regarding joblib and SO (#2867): fixed the dead URL to joblib and fixed the Stack Overflow brand name (with space)
* Raise error when setting overlapping entities as doc.ents (#2880)
* Fix out-of-bounds access in NER training. The helper method state.B(1) gets the index of the first token of the buffer, or -1 if no such token exists. Normally this is safe because we pass this to functions like state.safe_get(), which returns an empty token. Here we used it directly as an array index, which is not okay! This error may have been the cause of out-of-bounds access errors during training. Similar errors may still be around, so they must be hunted down. Hunting this one down took a long time... I printed out values across training runs and diffed, looking for points of divergence between runs, when no randomness should be allowed.
* Change PyThaiNLP URL (#2876)
* Fix missing comma
* Add example showing a fix-up rule for space entities
* Set version to 2.0.17.dev0
* Update regex version
* Revert "Update regex version" (reverts commit 62358dd867)
* Try setting older regex version, to align with conda
* Set version to 2.0.17
* Add spacy-js to universe [ci skip]
* Add spacy-raspberry to universe (closes #2889)
* Add script to validate universe JSON [ci skip]
* Removed unneeded space in docs and added contributor info (#2909)
* Allow input text of length up to max_length, inclusive (#2922)
* Include universe spec for spacy-wordnet component (#2919)
* Minor formatting changes [ci skip]
* Fix image [ci skip]: Twitter URL doesn't work on live site
* Check if the word is in one of the regular lists specific to each POS (#2886)
* 💫 Create random IDs for SVGs to prevent ID clashes (#2927): resolves #2924. Fixes a problem where multiple visualizations in Jupyter notebooks would have clashing arc IDs, resulting in weirdly positioned arc labels. A random ID prefix is generated so even identical parses won't receive the same IDs, for consistency (even if the effect of an ID clash isn't noticeable here).
* Fix typo [ci skip]
* Fix symbolic link on py3 and Windows (#2949): fixes the symbolic link during setup of spaCy using the command python -m spacy link en_core_web_sm en (closes #2948). Co-authored-by: cicorias <cicorias@users.noreply.github.com>
* Fix formatting
* Update universe [ci skip]
* Catalan language support (#2940): add Catalan language support and add Catalan to the documentation
* Sort languages alphabetically [ci skip]
* Update tests for pytest 4.x (#2965): replace marks in params for pytest 4.0 compat (see https://docs.pytest.org/en/latest/deprecations.html#marks-in-pytest-mark-parametrize) and un-xfail passing tests (some fixes in a recent update resolved a bunch of issues, but tests were apparently never updated here)
* Fix regex pin to harmonize with conda (#2964)
* Update README.rst
* Fix bug where Vocab.prune_vector did not use 'batch_size' (#2977, fixes #2976)
* Fix typo
* Fix typo
* Remove duplicate file
* Require thinc 7.0.0.dev2: fixes bug in gpu_ops that would use cupy instead of numpy on CPU
* Add missing import
* Fix error IDs
* Fix tests
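The minibatching change (#2830) above refers to spaCy's `minibatch` and `compounding` helpers from `spacy.util`. As a rough illustration of what those helpers do, here is a simplified, self-contained sketch (an approximation for explanation only, not spaCy's actual implementation):

```python
import itertools

def compounding(start, stop, compound):
    """Yield a batch size that starts at `start` and grows by a factor
    of `compound` each step, capped at `stop`."""
    size = start
    while True:
        yield min(size, stop)
        size *= compound

def minibatch(items, size):
    """Split `items` into batches whose sizes are drawn from the `size`
    generator (or repeated, if `size` is a plain number)."""
    sizes = iter(size) if hasattr(size, "__next__") else itertools.repeat(size)
    items = iter(items)
    while True:
        batch = list(itertools.islice(items, int(next(sizes))))
        if not batch:
            return
        yield batch

# The updated training examples iterate over gradually growing batches,
# so early updates are small and later ones are larger:
batches = list(minibatch(range(10), size=compounding(2.0, 4.0, 2.0)))
print([len(b) for b in batches])  # -> [2, 4, 4]
```

Growing the batch size this way is the "tips-batch-size" trick the PR description links to: small batches early in training give noisier, more exploratory updates, while larger batches later stabilize convergence.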
This commit is contained in:
parent
681258e29b
commit
d33953037e
.github/ISSUE_TEMPLATE.md (vendored): 2 lines changed

@@ -1,7 +1,7 @@
 <!--- Please provide a summary in the title and describe your issue here.
 Is this a bug or feature request? If a bug, include all the steps that led to the issue.
 
-If you're looking for help with your code, consider posting a question on StackOverflow instead:
+If you're looking for help with your code, consider posting a question on Stack Overflow instead:
 http://stackoverflow.com/questions/tagged/spacy -->
.github/ISSUE_TEMPLATE/05_other.md (vendored): 4 lines changed

@@ -1,11 +1,11 @@
 ---
 name: "\U0001F4AC Anything else?"
 about: For general usage questions or help with your code, please consider
-  posting on StackOverflow instead.
+  posting on Stack Overflow instead.
 ---
 
-<!-- Describe your issue here. Please keep in mind that the GitHub issue tracker is mostly intended for reports related to the spaCy code base and source, and for bugs and feature requests. If you're looking for help with your code, consider posting a question on StackOverflow instead: http://stackoverflow.com/questions/tagged/spacy -->
+<!-- Describe your issue here. Please keep in mind that the GitHub issue tracker is mostly intended for reports related to the spaCy code base and source, and for bugs and feature requests. If you're looking for help with your code, consider posting a question on Stack Overflow instead: http://stackoverflow.com/questions/tagged/spacy -->
 
 ## Your Environment
 <!-- Include details of your environment. If you're using spaCy 1.7+, you can also type `python -m spacy info --markdown` and copy-paste the result here.-->
.github/contributors/ALSchwalm.md (vendored, new file): 106 lines added

# spaCy contributor agreement

This spaCy Contributor Agreement (**"SCA"**) is based on the
[Oracle Contributor Agreement](http://www.oracle.com/technetwork/oca-405177.pdf).
The SCA applies to any contribution that you make to any product or project
managed by us (the **"project"**), and sets out the intellectual property rights
you grant to us in the contributed materials. The term **"us"** shall mean
[ExplosionAI UG (haftungsbeschränkt)](https://explosion.ai/legal). The term
**"you"** shall mean the person or entity identified below.

If you agree to be bound by these terms, fill in the information requested
below and include the filled-in version with your first pull request, under the
folder [`.github/contributors/`](/.github/contributors/). The name of the file
should be your GitHub username, with the extension `.md`. For example, the user
example_user would create the file `.github/contributors/example_user.md`.

Read this agreement carefully before signing. These terms and conditions
constitute a binding legal agreement.

## Contributor Agreement

1. The term "contribution" or "contributed materials" means any source code,
object code, patch, tool, sample, graphic, specification, manual,
documentation, or any other material posted or submitted by you to the project.

2. With respect to any worldwide copyrights, or copyright applications and
registrations, in your contribution:

    * you hereby assign to us joint ownership, and to the extent that such
    assignment is or becomes invalid, ineffective or unenforceable, you hereby
    grant to us a perpetual, irrevocable, non-exclusive, worldwide, no-charge,
    royalty-free, unrestricted license to exercise all rights under those
    copyrights. This includes, at our option, the right to sublicense these same
    rights to third parties through multiple levels of sublicensees or other
    licensing arrangements;

    * you agree that each of us can do all things in relation to your
    contribution as if each of us were the sole owners, and if one of us makes
    a derivative work of your contribution, the one who makes the derivative
    work (or has it made) will be the sole owner of that derivative work;

    * you agree that you will not assert any moral rights in your contribution
    against us, our licensees or transferees;

    * you agree that we may register a copyright in your contribution and
    exercise all ownership rights associated with it; and

    * you agree that neither of us has any duty to consult with, obtain the
    consent of, pay or render an accounting to the other for any use or
    distribution of your contribution.

3. With respect to any patents you own, or that you can license without payment
to any third party, you hereby grant to us a perpetual, irrevocable,
non-exclusive, worldwide, no-charge, royalty-free license to:

    * make, have made, use, sell, offer to sell, import, and otherwise transfer
    your contribution in whole or in part, alone or in combination with or
    included in any product, work or materials arising out of the project to
    which your contribution was submitted, and

    * at our option, to sublicense these same rights to third parties through
    multiple levels of sublicensees or other licensing arrangements.

4. Except as set out above, you keep all right, title, and interest in your
contribution. The rights that you grant to us under these terms are effective
on the date you first submitted a contribution to us, even if your submission
took place before the date you sign these terms.

5. You covenant, represent, warrant and agree that:

    * each contribution that you submit is and shall be an original work of
    authorship and you can legally grant the rights set out in this SCA;

    * to the best of your knowledge, each contribution will not violate any
    third party's copyrights, trademarks, patents, or other intellectual
    property rights; and

    * each contribution shall be in compliance with U.S. export control laws and
    other applicable export and import laws. You agree to notify us if you
    become aware of any circumstance which would make any of the foregoing
    representations inaccurate in any respect. We may publicly disclose your
    participation in the project, including the fact that you have signed the SCA.

6. This SCA is governed by the laws of the State of California and applicable
U.S. Federal law. Any choice of law rules will not apply.

7. Please place an “x” on one of the applicable statements below. Please do NOT
mark both statements:

    * [x] I am signing on behalf of myself as an individual and no other person
    or entity, including my employer, has or will have rights with respect to my
    contributions.

    * [ ] I am signing on behalf of my employer or a legal entity and I have the
    actual authority to contractually bind that entity.

## Contributor Details

| Field                          | Entry                    |
|------------------------------- | ------------------------ |
| Name                           | Adam Schwalm             |
| Company name (if applicable)   | Star Lab                 |
| Title or role (if applicable)  | Software Engineer        |
| Date                           | 2018-11-28               |
| GitHub username                | ALSchwalm                |
| Website (optional)             | https://alschwalm.com    |
.github/contributors/BramVanroy.md (vendored, new file): 106 lines added

# spaCy contributor agreement

(Agreement text identical to the ALSchwalm.md file above, signed as follows.)

* [x] I am signing on behalf of myself as an individual and no other person
  or entity, including my employer, has or will have rights with respect to my
  contributions.

* [x] I am signing on behalf of my employer or a legal entity and I have the
  actual authority to contractually bind that entity.

## Contributor Details

| Field                          | Entry                  |
|------------------------------- | ---------------------- |
| Name                           | Bram Vanroy            |
| Company name (if applicable)   |                        |
| Title or role (if applicable)  |                        |
| Date                           | October 19, 2018       |
| GitHub username                | BramVanroy             |
| Website (optional)             | https://bramvanroy.be  |
.github/contributors/Cinnamy.md (vendored, new file): 106 lines added

# spaCy contributor agreement

(Agreement text identical to the ALSchwalm.md file above.)
|
||||
mark both statements:
|
||||
|
||||
* [x] I am signing on behalf of myself as an individual and no other person
|
||||
or entity, including my employer, has or will have rights with respect to my
|
||||
contributions.
|
||||
|
||||
* [ ] I am signing on behalf of my employer or a legal entity and I have the
|
||||
actual authority to contractually bind that entity.
|
||||
|
||||
## Contributor Details
|
||||
|
||||
| Field | Entry |
|
||||
|------------------------------- | -------------------- |
|
||||
| Name | Marina Lysyuk |
|
||||
| Company name (if applicable) | |
|
||||
| Title or role (if applicable) | |
|
||||
| Date | 13.10.2018 |
|
||||
| GitHub username | Cinnamy |
|
||||
| Website (optional) | |
|
---

`.github/contributors/JKhakpour.md` (new file, 106 lines)

# spaCy contributor agreement

This spaCy Contributor Agreement (**"SCA"**) is based on the
[Oracle Contributor Agreement](http://www.oracle.com/technetwork/oca-405177.pdf).
The SCA applies to any contribution that you make to any product or project
managed by us (the **"project"**), and sets out the intellectual property rights
you grant to us in the contributed materials. The term **"us"** shall mean
[ExplosionAI UG (haftungsbeschränkt)](https://explosion.ai/legal). The term
**"you"** shall mean the person or entity identified below.

If you agree to be bound by these terms, fill in the information requested
below and include the filled-in version with your first pull request, under the
folder [`.github/contributors/`](/.github/contributors/). The name of the file
should be your GitHub username, with the extension `.md`. For example, the user
example_user would create the file `.github/contributors/example_user.md`.

Read this agreement carefully before signing. These terms and conditions
constitute a binding legal agreement.

## Contributor Agreement

1. The term "contribution" or "contributed materials" means any source code,
object code, patch, tool, sample, graphic, specification, manual,
documentation, or any other material posted or submitted by you to the project.

2. With respect to any worldwide copyrights, or copyright applications and
registrations, in your contribution:

* you hereby assign to us joint ownership, and to the extent that such
assignment is or becomes invalid, ineffective or unenforceable, you hereby
grant to us a perpetual, irrevocable, non-exclusive, worldwide, no-charge,
royalty-free, unrestricted license to exercise all rights under those
copyrights. This includes, at our option, the right to sublicense these same
rights to third parties through multiple levels of sublicensees or other
licensing arrangements;

* you agree that each of us can do all things in relation to your
contribution as if each of us were the sole owners, and if one of us makes
a derivative work of your contribution, the one who makes the derivative
work (or has it made) will be the sole owner of that derivative work;

* you agree that you will not assert any moral rights in your contribution
against us, our licensees or transferees;

* you agree that we may register a copyright in your contribution and
exercise all ownership rights associated with it; and

* you agree that neither of us has any duty to consult with, obtain the
consent of, pay or render an accounting to the other for any use or
distribution of your contribution.

3. With respect to any patents you own, or that you can license without payment
to any third party, you hereby grant to us a perpetual, irrevocable,
non-exclusive, worldwide, no-charge, royalty-free license to:

* make, have made, use, sell, offer to sell, import, and otherwise transfer
your contribution in whole or in part, alone or in combination with or
included in any product, work or materials arising out of the project to
which your contribution was submitted, and

* at our option, to sublicense these same rights to third parties through
multiple levels of sublicensees or other licensing arrangements.

4. Except as set out above, you keep all right, title, and interest in your
contribution. The rights that you grant to us under these terms are effective
on the date you first submitted a contribution to us, even if your submission
took place before the date you sign these terms.

5. You covenant, represent, warrant and agree that:

* each contribution that you submit is and shall be an original work of
authorship and you can legally grant the rights set out in this SCA;

* to the best of your knowledge, each contribution will not violate any
third party's copyrights, trademarks, patents, or other intellectual
property rights; and

* each contribution shall be in compliance with U.S. export control laws and
other applicable export and import laws. You agree to notify us if you
become aware of any circumstance which would make any of the foregoing
representations inaccurate in any respect. We may publicly disclose your
participation in the project, including the fact that you have signed the SCA.

6. This SCA is governed by the laws of the State of California and applicable
U.S. Federal law. Any choice of law rules will not apply.

7. Please place an "x" on one of the applicable statements below. Please do NOT
mark both statements:

* [ ] I am signing on behalf of myself as an individual and no other person
or entity, including my employer, has or will have rights with respect to my
contributions.

* [ ] I am signing on behalf of my employer or a legal entity and I have the
actual authority to contractually bind that entity.

## Contributor Details

| Field                          | Entry                |
|------------------------------- | -------------------- |
| Name                           | Ja'far Khakpour      |
| Company name (if applicable)   |                      |
| Title or role (if applicable)  |                      |
| Date                           | 2018-09-24           |
| GitHub username                | JKhakpour            |
| Website (optional)             |                      |
---

`.github/contributors/aniruddha-adhikary.md` (new file, 106 lines)

# spaCy contributor agreement

This spaCy Contributor Agreement (**"SCA"**) is based on the
[Oracle Contributor Agreement](http://www.oracle.com/technetwork/oca-405177.pdf).
The SCA applies to any contribution that you make to any product or project
managed by us (the **"project"**), and sets out the intellectual property rights
you grant to us in the contributed materials. The term **"us"** shall mean
[ExplosionAI UG (haftungsbeschränkt)](https://explosion.ai/legal). The term
**"you"** shall mean the person or entity identified below.

If you agree to be bound by these terms, fill in the information requested
below and include the filled-in version with your first pull request, under the
folder [`.github/contributors/`](/.github/contributors/). The name of the file
should be your GitHub username, with the extension `.md`. For example, the user
example_user would create the file `.github/contributors/example_user.md`.

Read this agreement carefully before signing. These terms and conditions
constitute a binding legal agreement.

## Contributor Agreement

1. The term "contribution" or "contributed materials" means any source code,
object code, patch, tool, sample, graphic, specification, manual,
documentation, or any other material posted or submitted by you to the project.

2. With respect to any worldwide copyrights, or copyright applications and
registrations, in your contribution:

* you hereby assign to us joint ownership, and to the extent that such
assignment is or becomes invalid, ineffective or unenforceable, you hereby
grant to us a perpetual, irrevocable, non-exclusive, worldwide, no-charge,
royalty-free, unrestricted license to exercise all rights under those
copyrights. This includes, at our option, the right to sublicense these same
rights to third parties through multiple levels of sublicensees or other
licensing arrangements;

* you agree that each of us can do all things in relation to your
contribution as if each of us were the sole owners, and if one of us makes
a derivative work of your contribution, the one who makes the derivative
work (or has it made) will be the sole owner of that derivative work;

* you agree that you will not assert any moral rights in your contribution
against us, our licensees or transferees;

* you agree that we may register a copyright in your contribution and
exercise all ownership rights associated with it; and

* you agree that neither of us has any duty to consult with, obtain the
consent of, pay or render an accounting to the other for any use or
distribution of your contribution.

3. With respect to any patents you own, or that you can license without payment
to any third party, you hereby grant to us a perpetual, irrevocable,
non-exclusive, worldwide, no-charge, royalty-free license to:

* make, have made, use, sell, offer to sell, import, and otherwise transfer
your contribution in whole or in part, alone or in combination with or
included in any product, work or materials arising out of the project to
which your contribution was submitted, and

* at our option, to sublicense these same rights to third parties through
multiple levels of sublicensees or other licensing arrangements.

4. Except as set out above, you keep all right, title, and interest in your
contribution. The rights that you grant to us under these terms are effective
on the date you first submitted a contribution to us, even if your submission
took place before the date you sign these terms.

5. You covenant, represent, warrant and agree that:

* each contribution that you submit is and shall be an original work of
authorship and you can legally grant the rights set out in this SCA;

* to the best of your knowledge, each contribution will not violate any
third party's copyrights, trademarks, patents, or other intellectual
property rights; and

* each contribution shall be in compliance with U.S. export control laws and
other applicable export and import laws. You agree to notify us if you
become aware of any circumstance which would make any of the foregoing
representations inaccurate in any respect. We may publicly disclose your
participation in the project, including the fact that you have signed the SCA.

6. This SCA is governed by the laws of the State of California and applicable
U.S. Federal law. Any choice of law rules will not apply.

7. Please place an "x" on one of the applicable statements below. Please do NOT
mark both statements:

* [x] I am signing on behalf of myself as an individual and no other person
or entity, including my employer, has or will have rights with respect to my
contributions.

* [ ] I am signing on behalf of my employer or a legal entity and I have the
actual authority to contractually bind that entity.

## Contributor Details

| Field                          | Entry                |
|------------------------------- | -------------------- |
| Name                           | Aniruddha Adhikary   |
| Company name (if applicable)   |                      |
| Title or role (if applicable)  |                      |
| Date                           | 2018-09-05           |
| GitHub username                | aniruddha-adhikary   |
| Website (optional)             | https://adhikary.net |
---

`.github/contributors/aongko.md` (new file, 106 lines)

# spaCy contributor agreement

This spaCy Contributor Agreement (**"SCA"**) is based on the
[Oracle Contributor Agreement](http://www.oracle.com/technetwork/oca-405177.pdf).
The SCA applies to any contribution that you make to any product or project
managed by us (the **"project"**), and sets out the intellectual property rights
you grant to us in the contributed materials. The term **"us"** shall mean
[ExplosionAI UG (haftungsbeschränkt)](https://explosion.ai/legal). The term
**"you"** shall mean the person or entity identified below.

If you agree to be bound by these terms, fill in the information requested
below and include the filled-in version with your first pull request, under the
folder [`.github/contributors/`](/.github/contributors/). The name of the file
should be your GitHub username, with the extension `.md`. For example, the user
example_user would create the file `.github/contributors/example_user.md`.

Read this agreement carefully before signing. These terms and conditions
constitute a binding legal agreement.

## Contributor Agreement

1. The term "contribution" or "contributed materials" means any source code,
object code, patch, tool, sample, graphic, specification, manual,
documentation, or any other material posted or submitted by you to the project.

2. With respect to any worldwide copyrights, or copyright applications and
registrations, in your contribution:

* you hereby assign to us joint ownership, and to the extent that such
assignment is or becomes invalid, ineffective or unenforceable, you hereby
grant to us a perpetual, irrevocable, non-exclusive, worldwide, no-charge,
royalty-free, unrestricted license to exercise all rights under those
copyrights. This includes, at our option, the right to sublicense these same
rights to third parties through multiple levels of sublicensees or other
licensing arrangements;

* you agree that each of us can do all things in relation to your
contribution as if each of us were the sole owners, and if one of us makes
a derivative work of your contribution, the one who makes the derivative
work (or has it made) will be the sole owner of that derivative work;

* you agree that you will not assert any moral rights in your contribution
against us, our licensees or transferees;

* you agree that we may register a copyright in your contribution and
exercise all ownership rights associated with it; and

* you agree that neither of us has any duty to consult with, obtain the
consent of, pay or render an accounting to the other for any use or
distribution of your contribution.

3. With respect to any patents you own, or that you can license without payment
to any third party, you hereby grant to us a perpetual, irrevocable,
non-exclusive, worldwide, no-charge, royalty-free license to:

* make, have made, use, sell, offer to sell, import, and otherwise transfer
your contribution in whole or in part, alone or in combination with or
included in any product, work or materials arising out of the project to
which your contribution was submitted, and

* at our option, to sublicense these same rights to third parties through
multiple levels of sublicensees or other licensing arrangements.

4. Except as set out above, you keep all right, title, and interest in your
contribution. The rights that you grant to us under these terms are effective
on the date you first submitted a contribution to us, even if your submission
took place before the date you sign these terms.

5. You covenant, represent, warrant and agree that:

* each contribution that you submit is and shall be an original work of
authorship and you can legally grant the rights set out in this SCA;

* to the best of your knowledge, each contribution will not violate any
third party's copyrights, trademarks, patents, or other intellectual
property rights; and

* each contribution shall be in compliance with U.S. export control laws and
other applicable export and import laws. You agree to notify us if you
become aware of any circumstance which would make any of the foregoing
representations inaccurate in any respect. We may publicly disclose your
participation in the project, including the fact that you have signed the SCA.

6. This SCA is governed by the laws of the State of California and applicable
U.S. Federal law. Any choice of law rules will not apply.

7. Please place an "x" on one of the applicable statements below. Please do NOT
mark both statements:

* [ ] I am signing on behalf of myself as an individual and no other person
or entity, including my employer, has or will have rights with respect to my
contributions.

* [x] I am signing on behalf of my employer or a legal entity and I have the
actual authority to contractually bind that entity.

## Contributor Details

| Field                          | Entry                |
|------------------------------- | -------------------- |
| Name                           | Andrew Ongko         |
| Company name (if applicable)   | Kurio                |
| Title or role (if applicable)  | Senior Data Science  |
| Date                           | Sep 10, 2018         |
| GitHub username                | aongko               |
| Website (optional)             |                      |
---

`.github/contributors/aryaprabhudesai.md` (new file, 54 lines, plain text)

spaCy contributor agreement

This spaCy Contributor Agreement ("SCA") is based on the Oracle Contributor Agreement. The SCA applies to any contribution that you make to any product or project managed by us (the "project"), and sets out the intellectual property rights you grant to us in the contributed materials. The term "us" shall mean ExplosionAI UG (haftungsbeschränkt). The term "you" shall mean the person or entity identified below.

If you agree to be bound by these terms, fill in the information requested below and include the filled-in version with your first pull request, under the folder .github/contributors/. The name of the file should be your GitHub username, with the extension .md. For example, the user example_user would create the file .github/contributors/example_user.md.

Read this agreement carefully before signing. These terms and conditions constitute a binding legal agreement.

Contributor Agreement

The term "contribution" or "contributed materials" means any source code, object code, patch, tool, sample, graphic, specification, manual, documentation, or any other material posted or submitted by you to the project.

With respect to any worldwide copyrights, or copyright applications and registrations, in your contribution:

you hereby assign to us joint ownership, and to the extent that such assignment is or becomes invalid, ineffective or unenforceable, you hereby grant to us a perpetual, irrevocable, non-exclusive, worldwide, no-charge, royalty-free, unrestricted license to exercise all rights under those copyrights. This includes, at our option, the right to sublicense these same rights to third parties through multiple levels of sublicensees or other licensing arrangements;

you agree that each of us can do all things in relation to your contribution as if each of us were the sole owners, and if one of us makes a derivative work of your contribution, the one who makes the derivative work (or has it made) will be the sole owner of that derivative work;

you agree that you will not assert any moral rights in your contribution against us, our licensees or transferees;

you agree that we may register a copyright in your contribution and exercise all ownership rights associated with it; and

you agree that neither of us has any duty to consult with, obtain the consent of, pay or render an accounting to the other for any use or distribution of your contribution.

With respect to any patents you own, or that you can license without payment to any third party, you hereby grant to us a perpetual, irrevocable, non-exclusive, worldwide, no-charge, royalty-free license to:

make, have made, use, sell, offer to sell, import, and otherwise transfer your contribution in whole or in part, alone or in combination with or included in any product, work or materials arising out of the project to which your contribution was submitted, and

at our option, to sublicense these same rights to third parties through multiple levels of sublicensees or other licensing arrangements.

Except as set out above, you keep all right, title, and interest in your contribution. The rights that you grant to us under these terms are effective on the date you first submitted a contribution to us, even if your submission took place before the date you sign these terms.

You covenant, represent, warrant and agree that:

Each contribution that you submit is and shall be an original work of authorship and you can legally grant the rights set out in this SCA;

to the best of your knowledge, each contribution will not violate any third party's copyrights, trademarks, patents, or other intellectual property rights; and

each contribution shall be in compliance with U.S. export control laws and other applicable export and import laws. You agree to notify us if you become aware of any circumstance which would make any of the foregoing representations inaccurate in any respect. We may publicly disclose your participation in the project, including the fact that you have signed the SCA.

This SCA is governed by the laws of the State of California and applicable U.S. Federal law. Any choice of law rules will not apply.

Please place an "x" on one of the applicable statements below. Please do NOT mark both statements:

[X] I am signing on behalf of myself as an individual and no other person or entity, including my employer, has or will have rights with respect to my contributions.

[ ] I am signing on behalf of my employer or a legal entity and I have the actual authority to contractually bind that entity.

Contributor Details

| Field                          | Entry                |
|------------------------------- | -------------------- |
| Name                           | Arya Prabhudesai     |
| Company name (if applicable)   | -                    |
| Title or role (if applicable)  | -                    |
| Date                           | 2018-08-17           |
| GitHub username                | aryaprabhudesai      |
| Website (optional)             | -                    |
---

`.github/contributors/charlax.md` (new file, 106 lines)

# spaCy contributor agreement

This spaCy Contributor Agreement (**"SCA"**) is based on the
[Oracle Contributor Agreement](http://www.oracle.com/technetwork/oca-405177.pdf).
The SCA applies to any contribution that you make to any product or project
managed by us (the **"project"**), and sets out the intellectual property rights
you grant to us in the contributed materials. The term **"us"** shall mean
[ExplosionAI UG (haftungsbeschränkt)](https://explosion.ai/legal). The term
**"you"** shall mean the person or entity identified below.

If you agree to be bound by these terms, fill in the information requested
below and include the filled-in version with your first pull request, under the
folder [`.github/contributors/`](/.github/contributors/). The name of the file
should be your GitHub username, with the extension `.md`. For example, the user
example_user would create the file `.github/contributors/example_user.md`.

Read this agreement carefully before signing. These terms and conditions
constitute a binding legal agreement.

## Contributor Agreement

1. The term "contribution" or "contributed materials" means any source code,
object code, patch, tool, sample, graphic, specification, manual,
documentation, or any other material posted or submitted by you to the project.

2. With respect to any worldwide copyrights, or copyright applications and
registrations, in your contribution:

* you hereby assign to us joint ownership, and to the extent that such
assignment is or becomes invalid, ineffective or unenforceable, you hereby
grant to us a perpetual, irrevocable, non-exclusive, worldwide, no-charge,
royalty-free, unrestricted license to exercise all rights under those
copyrights. This includes, at our option, the right to sublicense these same
rights to third parties through multiple levels of sublicensees or other
licensing arrangements;

* you agree that each of us can do all things in relation to your
contribution as if each of us were the sole owners, and if one of us makes
a derivative work of your contribution, the one who makes the derivative
work (or has it made) will be the sole owner of that derivative work;

* you agree that you will not assert any moral rights in your contribution
against us, our licensees or transferees;

* you agree that we may register a copyright in your contribution and
exercise all ownership rights associated with it; and

* you agree that neither of us has any duty to consult with, obtain the
consent of, pay or render an accounting to the other for any use or
distribution of your contribution.

3. With respect to any patents you own, or that you can license without payment
to any third party, you hereby grant to us a perpetual, irrevocable,
non-exclusive, worldwide, no-charge, royalty-free license to:

* make, have made, use, sell, offer to sell, import, and otherwise transfer
your contribution in whole or in part, alone or in combination with or
included in any product, work or materials arising out of the project to
which your contribution was submitted, and

* at our option, to sublicense these same rights to third parties through
multiple levels of sublicensees or other licensing arrangements.

4. Except as set out above, you keep all right, title, and interest in your
contribution. The rights that you grant to us under these terms are effective
on the date you first submitted a contribution to us, even if your submission
took place before the date you sign these terms.

5. You covenant, represent, warrant and agree that:

* each contribution that you submit is and shall be an original work of
authorship and you can legally grant the rights set out in this SCA;

* to the best of your knowledge, each contribution will not violate any
third party's copyrights, trademarks, patents, or other intellectual
property rights; and

* each contribution shall be in compliance with U.S. export control laws and
other applicable export and import laws. You agree to notify us if you
become aware of any circumstance which would make any of the foregoing
representations inaccurate in any respect. We may publicly disclose your
participation in the project, including the fact that you have signed the SCA.

6. This SCA is governed by the laws of the State of California and applicable
U.S. Federal law. Any choice of law rules will not apply.

7. Please place an "x" on one of the applicable statements below. Please do NOT
mark both statements:

* [ ] I am signing on behalf of myself as an individual and no other person
or entity, including my employer, has or will have rights with respect to my
contributions.

* [x] I am signing on behalf of my employer or a legal entity and I have the
actual authority to contractually bind that entity.

## Contributor Details

| Field                          | Entry                |
|------------------------------- | -------------------- |
| Name                           | Charles-Axel Dein    |
| Company name (if applicable)   | Skrib                |
| Title or role (if applicable)  | CEO                  |
| Date                           | 27/09/2018           |
| GitHub username                | charlax              |
| Website (optional)             | www.dein.fr          |
106
.github/contributors/cicorias.md
vendored
Normal file
106
.github/contributors/cicorias.md
vendored
Normal file
|
@ -0,0 +1,106 @@
|
|||
# spaCy contributor agreement

This spaCy Contributor Agreement (**"SCA"**) is based on the
[Oracle Contributor Agreement](http://www.oracle.com/technetwork/oca-405177.pdf).
The SCA applies to any contribution that you make to any product or project
managed by us (the **"project"**), and sets out the intellectual property rights
you grant to us in the contributed materials. The term **"us"** shall mean
[ExplosionAI UG (haftungsbeschränkt)](https://explosion.ai/legal). The term
**"you"** shall mean the person or entity identified below.

If you agree to be bound by these terms, fill in the information requested
below and include the filled-in version with your first pull request, under the
folder [`.github/contributors/`](/.github/contributors/). The name of the file
should be your GitHub username, with the extension `.md`. For example, the user
example_user would create the file `.github/contributors/example_user.md`.

Read this agreement carefully before signing. These terms and conditions
constitute a binding legal agreement.

## Contributor Agreement

1. The term "contribution" or "contributed materials" means any source code,
object code, patch, tool, sample, graphic, specification, manual,
documentation, or any other material posted or submitted by you to the project.

2. With respect to any worldwide copyrights, or copyright applications and
registrations, in your contribution:

* you hereby assign to us joint ownership, and to the extent that such
assignment is or becomes invalid, ineffective or unenforceable, you hereby
grant to us a perpetual, irrevocable, non-exclusive, worldwide, no-charge,
royalty-free, unrestricted license to exercise all rights under those
copyrights. This includes, at our option, the right to sublicense these same
rights to third parties through multiple levels of sublicensees or other
licensing arrangements;

* you agree that each of us can do all things in relation to your
contribution as if each of us were the sole owners, and if one of us makes
a derivative work of your contribution, the one who makes the derivative
work (or has it made) will be the sole owner of that derivative work;

* you agree that you will not assert any moral rights in your contribution
against us, our licensees or transferees;

* you agree that we may register a copyright in your contribution and
exercise all ownership rights associated with it; and

* you agree that neither of us has any duty to consult with, obtain the
consent of, pay or render an accounting to the other for any use or
distribution of your contribution.

3. With respect to any patents you own, or that you can license without payment
to any third party, you hereby grant to us a perpetual, irrevocable,
non-exclusive, worldwide, no-charge, royalty-free license to:

* make, have made, use, sell, offer to sell, import, and otherwise transfer
your contribution in whole or in part, alone or in combination with or
included in any product, work or materials arising out of the project to
which your contribution was submitted, and

* at our option, to sublicense these same rights to third parties through
multiple levels of sublicensees or other licensing arrangements.

4. Except as set out above, you keep all right, title, and interest in your
contribution. The rights that you grant to us under these terms are effective
on the date you first submitted a contribution to us, even if your submission
took place before the date you sign these terms.

5. You covenant, represent, warrant and agree that:

* each contribution that you submit is and shall be an original work of
authorship and you can legally grant the rights set out in this SCA;

* to the best of your knowledge, each contribution will not violate any
third party's copyrights, trademarks, patents, or other intellectual
property rights; and

* each contribution shall be in compliance with U.S. export control laws and
other applicable export and import laws. You agree to notify us if you
become aware of any circumstance which would make any of the foregoing
representations inaccurate in any respect. We may publicly disclose your
participation in the project, including the fact that you have signed the SCA.

6. This SCA is governed by the laws of the State of California and applicable
U.S. Federal law. Any choice of law rules will not apply.

7. Please place an “x” in one of the applicable statements below. Please do NOT
mark both statements:

* [X] I am signing on behalf of myself as an individual and no other person
or entity, including my employer, has or will have rights with respect to my
contributions.

* [ ] I am signing on behalf of my employer or a legal entity and I have the
actual authority to contractually bind that entity.

## Contributor Details

| Field                          | Entry                       |
|------------------------------- | --------------------------- |
| Name                           | Shawn Cicoria               |
| Company name (if applicable)   | Microsoft                   |
| Title or role (if applicable)  | Principal Software Engineer |
| Date                           | November 20, 2018           |
| GitHub username                | cicorias                    |
| Website (optional)             | www.cicoria.com             |
.github/contributors/darindf.md (vendored, new file)
@@ -0,0 +1,106 @@
# spaCy contributor agreement

This spaCy Contributor Agreement (**"SCA"**) is based on the
[Oracle Contributor Agreement](http://www.oracle.com/technetwork/oca-405177.pdf).
The SCA applies to any contribution that you make to any product or project
managed by us (the **"project"**), and sets out the intellectual property rights
you grant to us in the contributed materials. The term **"us"** shall mean
[ExplosionAI UG (haftungsbeschränkt)](https://explosion.ai/legal). The term
**"you"** shall mean the person or entity identified below.

If you agree to be bound by these terms, fill in the information requested
below and include the filled-in version with your first pull request, under the
folder [`.github/contributors/`](/.github/contributors/). The name of the file
should be your GitHub username, with the extension `.md`. For example, the user
example_user would create the file `.github/contributors/example_user.md`.

Read this agreement carefully before signing. These terms and conditions
constitute a binding legal agreement.

## Contributor Agreement

1. The term "contribution" or "contributed materials" means any source code,
object code, patch, tool, sample, graphic, specification, manual,
documentation, or any other material posted or submitted by you to the project.

2. With respect to any worldwide copyrights, or copyright applications and
registrations, in your contribution:

* you hereby assign to us joint ownership, and to the extent that such
assignment is or becomes invalid, ineffective or unenforceable, you hereby
grant to us a perpetual, irrevocable, non-exclusive, worldwide, no-charge,
royalty-free, unrestricted license to exercise all rights under those
copyrights. This includes, at our option, the right to sublicense these same
rights to third parties through multiple levels of sublicensees or other
licensing arrangements;

* you agree that each of us can do all things in relation to your
contribution as if each of us were the sole owners, and if one of us makes
a derivative work of your contribution, the one who makes the derivative
work (or has it made) will be the sole owner of that derivative work;

* you agree that you will not assert any moral rights in your contribution
against us, our licensees or transferees;

* you agree that we may register a copyright in your contribution and
exercise all ownership rights associated with it; and

* you agree that neither of us has any duty to consult with, obtain the
consent of, pay or render an accounting to the other for any use or
distribution of your contribution.

3. With respect to any patents you own, or that you can license without payment
to any third party, you hereby grant to us a perpetual, irrevocable,
non-exclusive, worldwide, no-charge, royalty-free license to:

* make, have made, use, sell, offer to sell, import, and otherwise transfer
your contribution in whole or in part, alone or in combination with or
included in any product, work or materials arising out of the project to
which your contribution was submitted, and

* at our option, to sublicense these same rights to third parties through
multiple levels of sublicensees or other licensing arrangements.

4. Except as set out above, you keep all right, title, and interest in your
contribution. The rights that you grant to us under these terms are effective
on the date you first submitted a contribution to us, even if your submission
took place before the date you sign these terms.

5. You covenant, represent, warrant and agree that:

* each contribution that you submit is and shall be an original work of
authorship and you can legally grant the rights set out in this SCA;

* to the best of your knowledge, each contribution will not violate any
third party's copyrights, trademarks, patents, or other intellectual
property rights; and

* each contribution shall be in compliance with U.S. export control laws and
other applicable export and import laws. You agree to notify us if you
become aware of any circumstance which would make any of the foregoing
representations inaccurate in any respect. We may publicly disclose your
participation in the project, including the fact that you have signed the SCA.

6. This SCA is governed by the laws of the State of California and applicable
U.S. Federal law. Any choice of law rules will not apply.

7. Please place an “x” in one of the applicable statements below. Please do NOT
mark both statements:

* [x] I am signing on behalf of myself as an individual and no other person
or entity, including my employer, has or will have rights with respect to my
contributions.

* [ ] I am signing on behalf of my employer or a legal entity and I have the
actual authority to contractually bind that entity.

## Contributor Details

| Field                          | Entry                    |
|------------------------------- | ------------------------ |
| Name                           | Darin DeForest           |
| Company name (if applicable)   | Ipro Tech                |
| Title or role (if applicable)  | Senior Software Engineer |
| Date                           | 2018-09-26               |
| GitHub username                | darindf                  |
| Website (optional)             |                          |
.github/contributors/filipecaixeta.md (vendored, new file)
@@ -0,0 +1,106 @@
# spaCy contributor agreement

This spaCy Contributor Agreement (**"SCA"**) is based on the
[Oracle Contributor Agreement](http://www.oracle.com/technetwork/oca-405177.pdf).
The SCA applies to any contribution that you make to any product or project
managed by us (the **"project"**), and sets out the intellectual property rights
you grant to us in the contributed materials. The term **"us"** shall mean
[ExplosionAI UG (haftungsbeschränkt)](https://explosion.ai/legal). The term
**"you"** shall mean the person or entity identified below.

If you agree to be bound by these terms, fill in the information requested
below and include the filled-in version with your first pull request, under the
folder [`.github/contributors/`](/.github/contributors/). The name of the file
should be your GitHub username, with the extension `.md`. For example, the user
example_user would create the file `.github/contributors/example_user.md`.

Read this agreement carefully before signing. These terms and conditions
constitute a binding legal agreement.

## Contributor Agreement

1. The term "contribution" or "contributed materials" means any source code,
object code, patch, tool, sample, graphic, specification, manual,
documentation, or any other material posted or submitted by you to the project.

2. With respect to any worldwide copyrights, or copyright applications and
registrations, in your contribution:

* you hereby assign to us joint ownership, and to the extent that such
assignment is or becomes invalid, ineffective or unenforceable, you hereby
grant to us a perpetual, irrevocable, non-exclusive, worldwide, no-charge,
royalty-free, unrestricted license to exercise all rights under those
copyrights. This includes, at our option, the right to sublicense these same
rights to third parties through multiple levels of sublicensees or other
licensing arrangements;

* you agree that each of us can do all things in relation to your
contribution as if each of us were the sole owners, and if one of us makes
a derivative work of your contribution, the one who makes the derivative
work (or has it made) will be the sole owner of that derivative work;

* you agree that you will not assert any moral rights in your contribution
against us, our licensees or transferees;

* you agree that we may register a copyright in your contribution and
exercise all ownership rights associated with it; and

* you agree that neither of us has any duty to consult with, obtain the
consent of, pay or render an accounting to the other for any use or
distribution of your contribution.

3. With respect to any patents you own, or that you can license without payment
to any third party, you hereby grant to us a perpetual, irrevocable,
non-exclusive, worldwide, no-charge, royalty-free license to:

* make, have made, use, sell, offer to sell, import, and otherwise transfer
your contribution in whole or in part, alone or in combination with or
included in any product, work or materials arising out of the project to
which your contribution was submitted, and

* at our option, to sublicense these same rights to third parties through
multiple levels of sublicensees or other licensing arrangements.

4. Except as set out above, you keep all right, title, and interest in your
contribution. The rights that you grant to us under these terms are effective
on the date you first submitted a contribution to us, even if your submission
took place before the date you sign these terms.

5. You covenant, represent, warrant and agree that:

* each contribution that you submit is and shall be an original work of
authorship and you can legally grant the rights set out in this SCA;

* to the best of your knowledge, each contribution will not violate any
third party's copyrights, trademarks, patents, or other intellectual
property rights; and

* each contribution shall be in compliance with U.S. export control laws and
other applicable export and import laws. You agree to notify us if you
become aware of any circumstance which would make any of the foregoing
representations inaccurate in any respect. We may publicly disclose your
participation in the project, including the fact that you have signed the SCA.

6. This SCA is governed by the laws of the State of California and applicable
U.S. Federal law. Any choice of law rules will not apply.

7. Please place an “x” in one of the applicable statements below. Please do NOT
mark both statements:

* [x] I am signing on behalf of myself as an individual and no other person
or entity, including my employer, has or will have rights with respect to my
contributions.

* [ ] I am signing on behalf of my employer or a legal entity and I have the
actual authority to contractually bind that entity.

## Contributor Details

| Field                          | Entry                |
|------------------------------- | -------------------- |
| Name                           | Filipe Caixeta       |
| Company name (if applicable)   |                      |
| Title or role (if applicable)  |                      |
| Date                           | 09.12.2018           |
| GitHub username                | filipecaixeta        |
| Website (optional)             | filipecaixeta.com.br |
.github/contributors/frascuchon.md (vendored, new file)
@@ -0,0 +1,106 @@
# spaCy contributor agreement

This spaCy Contributor Agreement (**"SCA"**) is based on the
[Oracle Contributor Agreement](http://www.oracle.com/technetwork/oca-405177.pdf).
The SCA applies to any contribution that you make to any product or project
managed by us (the **"project"**), and sets out the intellectual property rights
you grant to us in the contributed materials. The term **"us"** shall mean
[ExplosionAI UG (haftungsbeschränkt)](https://explosion.ai/legal). The term
**"you"** shall mean the person or entity identified below.

If you agree to be bound by these terms, fill in the information requested
below and include the filled-in version with your first pull request, under the
folder [`.github/contributors/`](/.github/contributors/). The name of the file
should be your GitHub username, with the extension `.md`. For example, the user
example_user would create the file `.github/contributors/example_user.md`.

Read this agreement carefully before signing. These terms and conditions
constitute a binding legal agreement.

## Contributor Agreement

1. The term "contribution" or "contributed materials" means any source code,
object code, patch, tool, sample, graphic, specification, manual,
documentation, or any other material posted or submitted by you to the project.

2. With respect to any worldwide copyrights, or copyright applications and
registrations, in your contribution:

* you hereby assign to us joint ownership, and to the extent that such
assignment is or becomes invalid, ineffective or unenforceable, you hereby
grant to us a perpetual, irrevocable, non-exclusive, worldwide, no-charge,
royalty-free, unrestricted license to exercise all rights under those
copyrights. This includes, at our option, the right to sublicense these same
rights to third parties through multiple levels of sublicensees or other
licensing arrangements;

* you agree that each of us can do all things in relation to your
contribution as if each of us were the sole owners, and if one of us makes
a derivative work of your contribution, the one who makes the derivative
work (or has it made) will be the sole owner of that derivative work;

* you agree that you will not assert any moral rights in your contribution
against us, our licensees or transferees;

* you agree that we may register a copyright in your contribution and
exercise all ownership rights associated with it; and

* you agree that neither of us has any duty to consult with, obtain the
consent of, pay or render an accounting to the other for any use or
distribution of your contribution.

3. With respect to any patents you own, or that you can license without payment
to any third party, you hereby grant to us a perpetual, irrevocable,
non-exclusive, worldwide, no-charge, royalty-free license to:

* make, have made, use, sell, offer to sell, import, and otherwise transfer
your contribution in whole or in part, alone or in combination with or
included in any product, work or materials arising out of the project to
which your contribution was submitted, and

* at our option, to sublicense these same rights to third parties through
multiple levels of sublicensees or other licensing arrangements.

4. Except as set out above, you keep all right, title, and interest in your
contribution. The rights that you grant to us under these terms are effective
on the date you first submitted a contribution to us, even if your submission
took place before the date you sign these terms.

5. You covenant, represent, warrant and agree that:

* each contribution that you submit is and shall be an original work of
authorship and you can legally grant the rights set out in this SCA;

* to the best of your knowledge, each contribution will not violate any
third party's copyrights, trademarks, patents, or other intellectual
property rights; and

* each contribution shall be in compliance with U.S. export control laws and
other applicable export and import laws. You agree to notify us if you
become aware of any circumstance which would make any of the foregoing
representations inaccurate in any respect. We may publicly disclose your
participation in the project, including the fact that you have signed the SCA.

6. This SCA is governed by the laws of the State of California and applicable
U.S. Federal law. Any choice of law rules will not apply.

7. Please place an “x” in one of the applicable statements below. Please do NOT
mark both statements:

* [x] I am signing on behalf of myself as an individual and no other person
or entity, including my employer, has or will have rights with respect to my
contributions.

* [ ] I am signing on behalf of my employer or a legal entity and I have the
actual authority to contractually bind that entity.

## Contributor Details

| Field                          | Entry                |
|------------------------------- | -------------------- |
| Name                           | Francisco Aranda     |
| Company name (if applicable)   | recognai             |
| Title or role (if applicable)  |                      |
| Date                           |                      |
| GitHub username                | frascuchon           |
| Website (optional)             | https://recogn.ai    |
.github/contributors/free-variation.md (vendored, new file)
@@ -0,0 +1,106 @@
# spaCy contributor agreement

This spaCy Contributor Agreement (**"SCA"**) is based on the
[Oracle Contributor Agreement](http://www.oracle.com/technetwork/oca-405177.pdf).
The SCA applies to any contribution that you make to any product or project
managed by us (the **"project"**), and sets out the intellectual property rights
you grant to us in the contributed materials. The term **"us"** shall mean
[ExplosionAI UG (haftungsbeschränkt)](https://explosion.ai/legal). The term
**"you"** shall mean the person or entity identified below.

If you agree to be bound by these terms, fill in the information requested
below and include the filled-in version with your first pull request, under the
folder [`.github/contributors/`](/.github/contributors/). The name of the file
should be your GitHub username, with the extension `.md`. For example, the user
example_user would create the file `.github/contributors/example_user.md`.

Read this agreement carefully before signing. These terms and conditions
constitute a binding legal agreement.

## Contributor Agreement

1. The term "contribution" or "contributed materials" means any source code,
object code, patch, tool, sample, graphic, specification, manual,
documentation, or any other material posted or submitted by you to the project.

2. With respect to any worldwide copyrights, or copyright applications and
registrations, in your contribution:

* you hereby assign to us joint ownership, and to the extent that such
assignment is or becomes invalid, ineffective or unenforceable, you hereby
grant to us a perpetual, irrevocable, non-exclusive, worldwide, no-charge,
royalty-free, unrestricted license to exercise all rights under those
copyrights. This includes, at our option, the right to sublicense these same
rights to third parties through multiple levels of sublicensees or other
licensing arrangements;

* you agree that each of us can do all things in relation to your
contribution as if each of us were the sole owners, and if one of us makes
a derivative work of your contribution, the one who makes the derivative
work (or has it made) will be the sole owner of that derivative work;

* you agree that you will not assert any moral rights in your contribution
against us, our licensees or transferees;

* you agree that we may register a copyright in your contribution and
exercise all ownership rights associated with it; and

* you agree that neither of us has any duty to consult with, obtain the
consent of, pay or render an accounting to the other for any use or
distribution of your contribution.

3. With respect to any patents you own, or that you can license without payment
to any third party, you hereby grant to us a perpetual, irrevocable,
non-exclusive, worldwide, no-charge, royalty-free license to:

* make, have made, use, sell, offer to sell, import, and otherwise transfer
your contribution in whole or in part, alone or in combination with or
included in any product, work or materials arising out of the project to
which your contribution was submitted, and

* at our option, to sublicense these same rights to third parties through
multiple levels of sublicensees or other licensing arrangements.

4. Except as set out above, you keep all right, title, and interest in your
contribution. The rights that you grant to us under these terms are effective
on the date you first submitted a contribution to us, even if your submission
took place before the date you sign these terms.

5. You covenant, represent, warrant and agree that:

* each contribution that you submit is and shall be an original work of
authorship and you can legally grant the rights set out in this SCA;

* to the best of your knowledge, each contribution will not violate any
third party's copyrights, trademarks, patents, or other intellectual
property rights; and

* each contribution shall be in compliance with U.S. export control laws and
other applicable export and import laws. You agree to notify us if you
become aware of any circumstance which would make any of the foregoing
representations inaccurate in any respect. We may publicly disclose your
participation in the project, including the fact that you have signed the SCA.

6. This SCA is governed by the laws of the State of California and applicable
U.S. Federal law. Any choice of law rules will not apply.

7. Please place an “x” in one of the applicable statements below. Please do NOT
mark both statements:

* [ ] I am signing on behalf of myself as an individual and no other person
or entity, including my employer, has or will have rights with respect to my
contributions.

* [ ] I am signing on behalf of my employer or a legal entity and I have the
actual authority to contractually bind that entity.

## Contributor Details

| Field                          | Entry                |
|------------------------------- | -------------------- |
| Name                           | John Stewart         |
| Company name (if applicable)   | Amplify              |
| Title or role (if applicable)  | SVP Research         |
| Date                           | 14/09/2018           |
| GitHub username                | free-variation       |
| Website (optional)             |                      |
.github/contributors/grivaz.md (vendored, new file)
@@ -0,0 +1,106 @@
# spaCy contributor agreement

This spaCy Contributor Agreement (**"SCA"**) is based on the
[Oracle Contributor Agreement](http://www.oracle.com/technetwork/oca-405177.pdf).
The SCA applies to any contribution that you make to any product or project
managed by us (the **"project"**), and sets out the intellectual property rights
you grant to us in the contributed materials. The term **"us"** shall mean
[ExplosionAI UG (haftungsbeschränkt)](https://explosion.ai/legal). The term
**"you"** shall mean the person or entity identified below.

If you agree to be bound by these terms, fill in the information requested
below and include the filled-in version with your first pull request, under the
folder [`.github/contributors/`](/.github/contributors/). The name of the file
should be your GitHub username, with the extension `.md`. For example, the user
example_user would create the file `.github/contributors/example_user.md`.

Read this agreement carefully before signing. These terms and conditions
constitute a binding legal agreement.

## Contributor Agreement

1. The term "contribution" or "contributed materials" means any source code,
object code, patch, tool, sample, graphic, specification, manual,
documentation, or any other material posted or submitted by you to the project.

2. With respect to any worldwide copyrights, or copyright applications and
registrations, in your contribution:

* you hereby assign to us joint ownership, and to the extent that such
assignment is or becomes invalid, ineffective or unenforceable, you hereby
grant to us a perpetual, irrevocable, non-exclusive, worldwide, no-charge,
royalty-free, unrestricted license to exercise all rights under those
copyrights. This includes, at our option, the right to sublicense these same
rights to third parties through multiple levels of sublicensees or other
licensing arrangements;

* you agree that each of us can do all things in relation to your
contribution as if each of us were the sole owners, and if one of us makes
a derivative work of your contribution, the one who makes the derivative
work (or has it made) will be the sole owner of that derivative work;

* you agree that you will not assert any moral rights in your contribution
against us, our licensees or transferees;

* you agree that we may register a copyright in your contribution and
exercise all ownership rights associated with it; and

* you agree that neither of us has any duty to consult with, obtain the
consent of, pay or render an accounting to the other for any use or
distribution of your contribution.

3. With respect to any patents you own, or that you can license without payment
to any third party, you hereby grant to us a perpetual, irrevocable,
non-exclusive, worldwide, no-charge, royalty-free license to:

* make, have made, use, sell, offer to sell, import, and otherwise transfer
your contribution in whole or in part, alone or in combination with or
included in any product, work or materials arising out of the project to
which your contribution was submitted, and

* at our option, to sublicense these same rights to third parties through
multiple levels of sublicensees or other licensing arrangements.

4. Except as set out above, you keep all right, title, and interest in your
contribution. The rights that you grant to us under these terms are effective
on the date you first submitted a contribution to us, even if your submission
took place before the date you sign these terms.

5. You covenant, represent, warrant and agree that:

* each contribution that you submit is and shall be an original work of
authorship and you can legally grant the rights set out in this SCA;

* to the best of your knowledge, each contribution will not violate any
third party's copyrights, trademarks, patents, or other intellectual
property rights; and

* each contribution shall be in compliance with U.S. export control laws and
other applicable export and import laws. You agree to notify us if you
become aware of any circumstance which would make any of the foregoing
representations inaccurate in any respect. We may publicly disclose your
participation in the project, including the fact that you have signed the SCA.

6. This SCA is governed by the laws of the State of California and applicable
U.S. Federal law. Any choice of law rules will not apply.
|
||||
|
||||
7. Please place an “x” on one of the applicable statement below. Please do NOT
|
||||
mark both statements:
|
||||
|
||||
* [x] I am signing on behalf of myself as an individual and no other person
|
||||
or entity, including my employer, has or will have rights with respect to my
|
||||
contributions.
|
||||
|
||||
* [ ] I am signing on behalf of my employer or a legal entity and I have the
|
||||
actual authority to contractually bind that entity.
|
||||
|
||||
## Contributor Details
|
||||
|
||||
| Field | Entry |
|
||||
|------------------------------- | -------------------- |
|
||||
| Name |C. Grivaz |
|
||||
| Company name (if applicable) | |
|
||||
| Title or role (if applicable) | |
|
||||
| Date |08.22.2018 |
|
||||
| GitHub username |grivaz |
|
||||
| Website (optional) | |
|
.github/contributors/jacopofar.md (vendored, new file, 106 lines)
@@ -0,0 +1,106 @@
# spaCy contributor agreement

This spaCy Contributor Agreement (**"SCA"**) is based on the
[Oracle Contributor Agreement](http://www.oracle.com/technetwork/oca-405177.pdf).
The SCA applies to any contribution that you make to any product or project
managed by us (the **"project"**), and sets out the intellectual property rights
you grant to us in the contributed materials. The term **"us"** shall mean
[ExplosionAI UG (haftungsbeschränkt)](https://explosion.ai/legal). The term
**"you"** shall mean the person or entity identified below.

If you agree to be bound by these terms, fill in the information requested
below and include the filled-in version with your first pull request, under the
folder [`.github/contributors/`](/.github/contributors/). The name of the file
should be your GitHub username, with the extension `.md`. For example, the user
example_user would create the file `.github/contributors/example_user.md`.

Read this agreement carefully before signing. These terms and conditions
constitute a binding legal agreement.

## Contributor Agreement

1. The term "contribution" or "contributed materials" means any source code,
object code, patch, tool, sample, graphic, specification, manual,
documentation, or any other material posted or submitted by you to the project.

2. With respect to any worldwide copyrights, or copyright applications and
registrations, in your contribution:

    * you hereby assign to us joint ownership, and to the extent that such
      assignment is or becomes invalid, ineffective or unenforceable, you hereby
      grant to us a perpetual, irrevocable, non-exclusive, worldwide, no-charge,
      royalty-free, unrestricted license to exercise all rights under those
      copyrights. This includes, at our option, the right to sublicense these same
      rights to third parties through multiple levels of sublicensees or other
      licensing arrangements;

    * you agree that each of us can do all things in relation to your
      contribution as if each of us were the sole owners, and if one of us makes
      a derivative work of your contribution, the one who makes the derivative
      work (or has it made) will be the sole owner of that derivative work;

    * you agree that you will not assert any moral rights in your contribution
      against us, our licensees or transferees;

    * you agree that we may register a copyright in your contribution and
      exercise all ownership rights associated with it; and

    * you agree that neither of us has any duty to consult with, obtain the
      consent of, pay or render an accounting to the other for any use or
      distribution of your contribution.

3. With respect to any patents you own, or that you can license without payment
to any third party, you hereby grant to us a perpetual, irrevocable,
non-exclusive, worldwide, no-charge, royalty-free license to:

    * make, have made, use, sell, offer to sell, import, and otherwise transfer
      your contribution in whole or in part, alone or in combination with or
      included in any product, work or materials arising out of the project to
      which your contribution was submitted, and

    * at our option, to sublicense these same rights to third parties through
      multiple levels of sublicensees or other licensing arrangements.

4. Except as set out above, you keep all right, title, and interest in your
contribution. The rights that you grant to us under these terms are effective
on the date you first submitted a contribution to us, even if your submission
took place before the date you sign these terms.

5. You covenant, represent, warrant and agree that:

    * each contribution that you submit is and shall be an original work of
      authorship and you can legally grant the rights set out in this SCA;

    * to the best of your knowledge, each contribution will not violate any
      third party's copyrights, trademarks, patents, or other intellectual
      property rights; and

    * each contribution shall be in compliance with U.S. export control laws and
      other applicable export and import laws. You agree to notify us if you
      become aware of any circumstance which would make any of the foregoing
      representations inaccurate in any respect. We may publicly disclose your
      participation in the project, including the fact that you have signed the SCA.

6. This SCA is governed by the laws of the State of California and applicable
U.S. Federal law. Any choice of law rules will not apply.

7. Please place an “x” next to one of the applicable statements below. Please do
NOT mark both statements:

    * [X] I am signing on behalf of myself as an individual and no other person
      or entity, including my employer, has or will have rights with respect to my
      contributions.

    * [ ] I am signing on behalf of my employer or a legal entity and I have the
      actual authority to contractually bind that entity.

## Contributor Details

| Field                          | Entry                |
|------------------------------- | -------------------- |
| Name                           | Jacopo Farina        |
| Company name (if applicable)   |                      |
| Title or role (if applicable)  |                      |
| Date                           | 2018-10-12           |
| GitHub username                | jacopofar            |
| Website (optional)             | jacopofarina.eu      |
.github/contributors/keshan.md (vendored, new file, 106 lines)
@@ -0,0 +1,106 @@
# spaCy contributor agreement

This spaCy Contributor Agreement (**"SCA"**) is based on the
[Oracle Contributor Agreement](http://www.oracle.com/technetwork/oca-405177.pdf).
The SCA applies to any contribution that you make to any product or project
managed by us (the **"project"**), and sets out the intellectual property rights
you grant to us in the contributed materials. The term **"us"** shall mean
[ExplosionAI UG (haftungsbeschränkt)](https://explosion.ai/legal). The term
**"you"** shall mean the person or entity identified below.

If you agree to be bound by these terms, fill in the information requested
below and include the filled-in version with your first pull request, under the
folder [`.github/contributors/`](/.github/contributors/). The name of the file
should be your GitHub username, with the extension `.md`. For example, the user
example_user would create the file `.github/contributors/example_user.md`.

Read this agreement carefully before signing. These terms and conditions
constitute a binding legal agreement.

## Contributor Agreement

1. The term "contribution" or "contributed materials" means any source code,
object code, patch, tool, sample, graphic, specification, manual,
documentation, or any other material posted or submitted by you to the project.

2. With respect to any worldwide copyrights, or copyright applications and
registrations, in your contribution:

    * you hereby assign to us joint ownership, and to the extent that such
      assignment is or becomes invalid, ineffective or unenforceable, you hereby
      grant to us a perpetual, irrevocable, non-exclusive, worldwide, no-charge,
      royalty-free, unrestricted license to exercise all rights under those
      copyrights. This includes, at our option, the right to sublicense these same
      rights to third parties through multiple levels of sublicensees or other
      licensing arrangements;

    * you agree that each of us can do all things in relation to your
      contribution as if each of us were the sole owners, and if one of us makes
      a derivative work of your contribution, the one who makes the derivative
      work (or has it made) will be the sole owner of that derivative work;

    * you agree that you will not assert any moral rights in your contribution
      against us, our licensees or transferees;

    * you agree that we may register a copyright in your contribution and
      exercise all ownership rights associated with it; and

    * you agree that neither of us has any duty to consult with, obtain the
      consent of, pay or render an accounting to the other for any use or
      distribution of your contribution.

3. With respect to any patents you own, or that you can license without payment
to any third party, you hereby grant to us a perpetual, irrevocable,
non-exclusive, worldwide, no-charge, royalty-free license to:

    * make, have made, use, sell, offer to sell, import, and otherwise transfer
      your contribution in whole or in part, alone or in combination with or
      included in any product, work or materials arising out of the project to
      which your contribution was submitted, and

    * at our option, to sublicense these same rights to third parties through
      multiple levels of sublicensees or other licensing arrangements.

4. Except as set out above, you keep all right, title, and interest in your
contribution. The rights that you grant to us under these terms are effective
on the date you first submitted a contribution to us, even if your submission
took place before the date you sign these terms.

5. You covenant, represent, warrant and agree that:

    * each contribution that you submit is and shall be an original work of
      authorship and you can legally grant the rights set out in this SCA;

    * to the best of your knowledge, each contribution will not violate any
      third party's copyrights, trademarks, patents, or other intellectual
      property rights; and

    * each contribution shall be in compliance with U.S. export control laws and
      other applicable export and import laws. You agree to notify us if you
      become aware of any circumstance which would make any of the foregoing
      representations inaccurate in any respect. We may publicly disclose your
      participation in the project, including the fact that you have signed the SCA.

6. This SCA is governed by the laws of the State of California and applicable
U.S. Federal law. Any choice of law rules will not apply.

7. Please place an “x” next to one of the applicable statements below. Please do
NOT mark both statements:

    * [x] I am signing on behalf of myself as an individual and no other person
      or entity, including my employer, has or will have rights with respect to my
      contributions.

    * [ ] I am signing on behalf of my employer or a legal entity and I have the
      actual authority to contractually bind that entity.

## Contributor Details

| Field                          | Entry                |
|------------------------------- | -------------------- |
| Name                           | Keshan Sodimana      |
| Company name (if applicable)   |                      |
| Title or role (if applicable)  |                      |
| Date                           | Sep 21, 2018         |
| GitHub username                | keshan               |
| Website (optional)             |                      |
.github/contributors/mbkupfer.md (vendored, new file, 106 lines)
@@ -0,0 +1,106 @@
# spaCy contributor agreement

This spaCy Contributor Agreement (**"SCA"**) is based on the
[Oracle Contributor Agreement](http://www.oracle.com/technetwork/oca-405177.pdf).
The SCA applies to any contribution that you make to any product or project
managed by us (the **"project"**), and sets out the intellectual property rights
you grant to us in the contributed materials. The term **"us"** shall mean
[ExplosionAI UG (haftungsbeschränkt)](https://explosion.ai/legal). The term
**"you"** shall mean the person or entity identified below.

If you agree to be bound by these terms, fill in the information requested
below and include the filled-in version with your first pull request, under the
folder [`.github/contributors/`](/.github/contributors/). The name of the file
should be your GitHub username, with the extension `.md`. For example, the user
example_user would create the file `.github/contributors/example_user.md`.

Read this agreement carefully before signing. These terms and conditions
constitute a binding legal agreement.

## Contributor Agreement

1. The term "contribution" or "contributed materials" means any source code,
object code, patch, tool, sample, graphic, specification, manual,
documentation, or any other material posted or submitted by you to the project.

2. With respect to any worldwide copyrights, or copyright applications and
registrations, in your contribution:

    * you hereby assign to us joint ownership, and to the extent that such
      assignment is or becomes invalid, ineffective or unenforceable, you hereby
      grant to us a perpetual, irrevocable, non-exclusive, worldwide, no-charge,
      royalty-free, unrestricted license to exercise all rights under those
      copyrights. This includes, at our option, the right to sublicense these same
      rights to third parties through multiple levels of sublicensees or other
      licensing arrangements;

    * you agree that each of us can do all things in relation to your
      contribution as if each of us were the sole owners, and if one of us makes
      a derivative work of your contribution, the one who makes the derivative
      work (or has it made) will be the sole owner of that derivative work;

    * you agree that you will not assert any moral rights in your contribution
      against us, our licensees or transferees;

    * you agree that we may register a copyright in your contribution and
      exercise all ownership rights associated with it; and

    * you agree that neither of us has any duty to consult with, obtain the
      consent of, pay or render an accounting to the other for any use or
      distribution of your contribution.

3. With respect to any patents you own, or that you can license without payment
to any third party, you hereby grant to us a perpetual, irrevocable,
non-exclusive, worldwide, no-charge, royalty-free license to:

    * make, have made, use, sell, offer to sell, import, and otherwise transfer
      your contribution in whole or in part, alone or in combination with or
      included in any product, work or materials arising out of the project to
      which your contribution was submitted, and

    * at our option, to sublicense these same rights to third parties through
      multiple levels of sublicensees or other licensing arrangements.

4. Except as set out above, you keep all right, title, and interest in your
contribution. The rights that you grant to us under these terms are effective
on the date you first submitted a contribution to us, even if your submission
took place before the date you sign these terms.

5. You covenant, represent, warrant and agree that:

    * each contribution that you submit is and shall be an original work of
      authorship and you can legally grant the rights set out in this SCA;

    * to the best of your knowledge, each contribution will not violate any
      third party's copyrights, trademarks, patents, or other intellectual
      property rights; and

    * each contribution shall be in compliance with U.S. export control laws and
      other applicable export and import laws. You agree to notify us if you
      become aware of any circumstance which would make any of the foregoing
      representations inaccurate in any respect. We may publicly disclose your
      participation in the project, including the fact that you have signed the SCA.

6. This SCA is governed by the laws of the State of California and applicable
U.S. Federal law. Any choice of law rules will not apply.

7. Please place an “x” next to one of the applicable statements below. Please do
NOT mark both statements:

    * [x] I am signing on behalf of myself as an individual and no other person
      or entity, including my employer, has or will have rights with respect to my
      contributions.

    * [ ] I am signing on behalf of my employer or a legal entity and I have the
      actual authority to contractually bind that entity.

## Contributor Details

| Field                          | Entry                |
|------------------------------- | -------------------- |
| Name                           | Maxim Kupfer         |
| Company name (if applicable)   |                      |
| Title or role (if applicable)  |                      |
| Date                           | Sep 6, 2018          |
| GitHub username                | mbkupfer             |
| Website (optional)             |                      |
.github/contributors/mikelibg.md (vendored, new file, 106 lines)
@@ -0,0 +1,106 @@
# spaCy contributor agreement

This spaCy Contributor Agreement (**"SCA"**) is based on the
[Oracle Contributor Agreement](http://www.oracle.com/technetwork/oca-405177.pdf).
The SCA applies to any contribution that you make to any product or project
managed by us (the **"project"**), and sets out the intellectual property rights
you grant to us in the contributed materials. The term **"us"** shall mean
[ExplosionAI UG (haftungsbeschränkt)](https://explosion.ai/legal). The term
**"you"** shall mean the person or entity identified below.

If you agree to be bound by these terms, fill in the information requested
below and include the filled-in version with your first pull request, under the
folder [`.github/contributors/`](/.github/contributors/). The name of the file
should be your GitHub username, with the extension `.md`. For example, the user
example_user would create the file `.github/contributors/example_user.md`.

Read this agreement carefully before signing. These terms and conditions
constitute a binding legal agreement.

## Contributor Agreement

1. The term "contribution" or "contributed materials" means any source code,
object code, patch, tool, sample, graphic, specification, manual,
documentation, or any other material posted or submitted by you to the project.

2. With respect to any worldwide copyrights, or copyright applications and
registrations, in your contribution:

    * you hereby assign to us joint ownership, and to the extent that such
      assignment is or becomes invalid, ineffective or unenforceable, you hereby
      grant to us a perpetual, irrevocable, non-exclusive, worldwide, no-charge,
      royalty-free, unrestricted license to exercise all rights under those
      copyrights. This includes, at our option, the right to sublicense these same
      rights to third parties through multiple levels of sublicensees or other
      licensing arrangements;

    * you agree that each of us can do all things in relation to your
      contribution as if each of us were the sole owners, and if one of us makes
      a derivative work of your contribution, the one who makes the derivative
      work (or has it made) will be the sole owner of that derivative work;

    * you agree that you will not assert any moral rights in your contribution
      against us, our licensees or transferees;

    * you agree that we may register a copyright in your contribution and
      exercise all ownership rights associated with it; and

    * you agree that neither of us has any duty to consult with, obtain the
      consent of, pay or render an accounting to the other for any use or
      distribution of your contribution.

3. With respect to any patents you own, or that you can license without payment
to any third party, you hereby grant to us a perpetual, irrevocable,
non-exclusive, worldwide, no-charge, royalty-free license to:

    * make, have made, use, sell, offer to sell, import, and otherwise transfer
      your contribution in whole or in part, alone or in combination with or
      included in any product, work or materials arising out of the project to
      which your contribution was submitted, and

    * at our option, to sublicense these same rights to third parties through
      multiple levels of sublicensees or other licensing arrangements.

4. Except as set out above, you keep all right, title, and interest in your
contribution. The rights that you grant to us under these terms are effective
on the date you first submitted a contribution to us, even if your submission
took place before the date you sign these terms.

5. You covenant, represent, warrant and agree that:

    * each contribution that you submit is and shall be an original work of
      authorship and you can legally grant the rights set out in this SCA;

    * to the best of your knowledge, each contribution will not violate any
      third party's copyrights, trademarks, patents, or other intellectual
      property rights; and

    * each contribution shall be in compliance with U.S. export control laws and
      other applicable export and import laws. You agree to notify us if you
      become aware of any circumstance which would make any of the foregoing
      representations inaccurate in any respect. We may publicly disclose your
      participation in the project, including the fact that you have signed the SCA.

6. This SCA is governed by the laws of the State of California and applicable
U.S. Federal law. Any choice of law rules will not apply.

7. Please place an “x” next to one of the applicable statements below. Please do
NOT mark both statements:

    * [x] I am signing on behalf of myself as an individual and no other person
      or entity, including my employer, has or will have rights with respect to my
      contributions.

    * [ ] I am signing on behalf of my employer or a legal entity and I have the
      actual authority to contractually bind that entity.

## Contributor Details

| Field                          | Entry                    |
|------------------------------- | ------------------------ |
| Name                           | Michael Liberman         |
| Company name (if applicable)   |                          |
| Title or role (if applicable)  |                          |
| Date                           | 2018-11-08               |
| GitHub username                | mikelibg                 |
| Website (optional)             |                          |
.github/contributors/mpuig.md (vendored, new file, 106 lines)
@@ -0,0 +1,106 @@
# spaCy contributor agreement

This spaCy Contributor Agreement (**"SCA"**) is based on the
[Oracle Contributor Agreement](http://www.oracle.com/technetwork/oca-405177.pdf).
The SCA applies to any contribution that you make to any product or project
managed by us (the **"project"**), and sets out the intellectual property rights
you grant to us in the contributed materials. The term **"us"** shall mean
[ExplosionAI UG (haftungsbeschränkt)](https://explosion.ai/legal). The term
**"you"** shall mean the person or entity identified below.

If you agree to be bound by these terms, fill in the information requested
below and include the filled-in version with your first pull request, under the
folder [`.github/contributors/`](/.github/contributors/). The name of the file
should be your GitHub username, with the extension `.md`. For example, the user
example_user would create the file `.github/contributors/example_user.md`.

Read this agreement carefully before signing. These terms and conditions
constitute a binding legal agreement.

## Contributor Agreement

1. The term "contribution" or "contributed materials" means any source code,
object code, patch, tool, sample, graphic, specification, manual,
documentation, or any other material posted or submitted by you to the project.

2. With respect to any worldwide copyrights, or copyright applications and
registrations, in your contribution:

    * you hereby assign to us joint ownership, and to the extent that such
      assignment is or becomes invalid, ineffective or unenforceable, you hereby
      grant to us a perpetual, irrevocable, non-exclusive, worldwide, no-charge,
      royalty-free, unrestricted license to exercise all rights under those
      copyrights. This includes, at our option, the right to sublicense these same
      rights to third parties through multiple levels of sublicensees or other
      licensing arrangements;

    * you agree that each of us can do all things in relation to your
      contribution as if each of us were the sole owners, and if one of us makes
      a derivative work of your contribution, the one who makes the derivative
      work (or has it made) will be the sole owner of that derivative work;

    * you agree that you will not assert any moral rights in your contribution
      against us, our licensees or transferees;

    * you agree that we may register a copyright in your contribution and
      exercise all ownership rights associated with it; and

    * you agree that neither of us has any duty to consult with, obtain the
      consent of, pay or render an accounting to the other for any use or
      distribution of your contribution.

3. With respect to any patents you own, or that you can license without payment
to any third party, you hereby grant to us a perpetual, irrevocable,
non-exclusive, worldwide, no-charge, royalty-free license to:

    * make, have made, use, sell, offer to sell, import, and otherwise transfer
      your contribution in whole or in part, alone or in combination with or
      included in any product, work or materials arising out of the project to
      which your contribution was submitted, and

    * at our option, to sublicense these same rights to third parties through
      multiple levels of sublicensees or other licensing arrangements.

4. Except as set out above, you keep all right, title, and interest in your
contribution. The rights that you grant to us under these terms are effective
on the date you first submitted a contribution to us, even if your submission
took place before the date you sign these terms.

5. You covenant, represent, warrant and agree that:

    * each contribution that you submit is and shall be an original work of
      authorship and you can legally grant the rights set out in this SCA;

    * to the best of your knowledge, each contribution will not violate any
      third party's copyrights, trademarks, patents, or other intellectual
      property rights; and

    * each contribution shall be in compliance with U.S. export control laws and
      other applicable export and import laws. You agree to notify us if you
      become aware of any circumstance which would make any of the foregoing
      representations inaccurate in any respect. We may publicly disclose your
      participation in the project, including the fact that you have signed the SCA.

6. This SCA is governed by the laws of the State of California and applicable
U.S. Federal law. Any choice of law rules will not apply.

7. Please place an “x” next to one of the applicable statements below. Please do
NOT mark both statements:

    * [x] I am signing on behalf of myself as an individual and no other person
      or entity, including my employer, has or will have rights with respect to my
      contributions.

    * [ ] I am signing on behalf of my employer or a legal entity and I have the
      actual authority to contractually bind that entity.

## Contributor Details

| Field                          | Entry                |
|------------------------------- | -------------------- |
| Name                           | Marc Puig            |
| Company name (if applicable)   |                      |
| Title or role (if applicable)  |                      |
| Date                           | 2018-11-17           |
| GitHub username                | mpuig                |
| Website (optional)             |                      |
106 .github/contributors/phojnacki.md vendored Normal file
@@ -0,0 +1,106 @@
# spaCy contributor agreement

This spaCy Contributor Agreement (**"SCA"**) is based on the
[Oracle Contributor Agreement](http://www.oracle.com/technetwork/oca-405177.pdf).
The SCA applies to any contribution that you make to any product or project
managed by us (the **"project"**), and sets out the intellectual property rights
you grant to us in the contributed materials. The term **"us"** shall mean
[ExplosionAI UG (haftungsbeschränkt)](https://explosion.ai/legal). The term
**"you"** shall mean the person or entity identified below.

If you agree to be bound by these terms, fill in the information requested
below and include the filled-in version with your first pull request, under the
folder [`.github/contributors/`](/.github/contributors/). The name of the file
should be your GitHub username, with the extension `.md`. For example, the user
example_user would create the file `.github/contributors/example_user.md`.

Read this agreement carefully before signing. These terms and conditions
constitute a binding legal agreement.

## Contributor Agreement

1. The term "contribution" or "contributed materials" means any source code,
object code, patch, tool, sample, graphic, specification, manual,
documentation, or any other material posted or submitted by you to the project.

2. With respect to any worldwide copyrights, or copyright applications and
registrations, in your contribution:

* you hereby assign to us joint ownership, and to the extent that such
assignment is or becomes invalid, ineffective or unenforceable, you hereby
grant to us a perpetual, irrevocable, non-exclusive, worldwide, no-charge,
royalty-free, unrestricted license to exercise all rights under those
copyrights. This includes, at our option, the right to sublicense these same
rights to third parties through multiple levels of sublicensees or other
licensing arrangements;

* you agree that each of us can do all things in relation to your
contribution as if each of us were the sole owners, and if one of us makes
a derivative work of your contribution, the one who makes the derivative
work (or has it made) will be the sole owner of that derivative work;

* you agree that you will not assert any moral rights in your contribution
against us, our licensees or transferees;

* you agree that we may register a copyright in your contribution and
exercise all ownership rights associated with it; and

* you agree that neither of us has any duty to consult with, obtain the
consent of, pay or render an accounting to the other for any use or
distribution of your contribution.

3. With respect to any patents you own, or that you can license without payment
to any third party, you hereby grant to us a perpetual, irrevocable,
non-exclusive, worldwide, no-charge, royalty-free license to:

* make, have made, use, sell, offer to sell, import, and otherwise transfer
your contribution in whole or in part, alone or in combination with or
included in any product, work or materials arising out of the project to
which your contribution was submitted, and

* at our option, to sublicense these same rights to third parties through
multiple levels of sublicensees or other licensing arrangements.

4. Except as set out above, you keep all right, title, and interest in your
contribution. The rights that you grant to us under these terms are effective
on the date you first submitted a contribution to us, even if your submission
took place before the date you sign these terms.

5. You covenant, represent, warrant and agree that:

* Each contribution that you submit is and shall be an original work of
authorship and you can legally grant the rights set out in this SCA;

* to the best of your knowledge, each contribution will not violate any
third party's copyrights, trademarks, patents, or other intellectual
property rights; and

* each contribution shall be in compliance with U.S. export control laws and
other applicable export and import laws. You agree to notify us if you
become aware of any circumstance which would make any of the foregoing
representations inaccurate in any respect. We may publicly disclose your
participation in the project, including the fact that you have signed the SCA.

6. This SCA is governed by the laws of the State of California and applicable
U.S. Federal law. Any choice of law rules will not apply.

7. Please place an “x” on one of the applicable statements below. Please do NOT
mark both statements:

* [x] I am signing on behalf of myself as an individual and no other person
or entity, including my employer, has or will have rights with respect to my
contributions.

* [ ] I am signing on behalf of my employer or a legal entity and I have the
actual authority to contractually bind that entity.

## Contributor Details

| Field                          | Entry                                 |
|------------------------------- | ------------------------------------- |
| Name                           | Przemysław Hojnacki                   |
| Company name (if applicable)   |                                       |
| Title or role (if applicable)  |                                       |
| Date                           | 12/09/2018                            |
| GitHub username                | phojnacki                             |
| Website (optional)             | https://about.me/przemyslaw.hojnacki  |
106 .github/contributors/pzelasko.md vendored Normal file
@@ -0,0 +1,106 @@
# spaCy contributor agreement

This spaCy Contributor Agreement (**"SCA"**) is based on the
[Oracle Contributor Agreement](http://www.oracle.com/technetwork/oca-405177.pdf).
The SCA applies to any contribution that you make to any product or project
managed by us (the **"project"**), and sets out the intellectual property rights
you grant to us in the contributed materials. The term **"us"** shall mean
[ExplosionAI UG (haftungsbeschränkt)](https://explosion.ai/legal). The term
**"you"** shall mean the person or entity identified below.

If you agree to be bound by these terms, fill in the information requested
below and include the filled-in version with your first pull request, under the
folder [`.github/contributors/`](/.github/contributors/). The name of the file
should be your GitHub username, with the extension `.md`. For example, the user
example_user would create the file `.github/contributors/example_user.md`.

Read this agreement carefully before signing. These terms and conditions
constitute a binding legal agreement.

## Contributor Agreement

1. The term "contribution" or "contributed materials" means any source code,
object code, patch, tool, sample, graphic, specification, manual,
documentation, or any other material posted or submitted by you to the project.

2. With respect to any worldwide copyrights, or copyright applications and
registrations, in your contribution:

* you hereby assign to us joint ownership, and to the extent that such
assignment is or becomes invalid, ineffective or unenforceable, you hereby
grant to us a perpetual, irrevocable, non-exclusive, worldwide, no-charge,
royalty-free, unrestricted license to exercise all rights under those
copyrights. This includes, at our option, the right to sublicense these same
rights to third parties through multiple levels of sublicensees or other
licensing arrangements;

* you agree that each of us can do all things in relation to your
contribution as if each of us were the sole owners, and if one of us makes
a derivative work of your contribution, the one who makes the derivative
work (or has it made) will be the sole owner of that derivative work;

* you agree that you will not assert any moral rights in your contribution
against us, our licensees or transferees;

* you agree that we may register a copyright in your contribution and
exercise all ownership rights associated with it; and

* you agree that neither of us has any duty to consult with, obtain the
consent of, pay or render an accounting to the other for any use or
distribution of your contribution.

3. With respect to any patents you own, or that you can license without payment
to any third party, you hereby grant to us a perpetual, irrevocable,
non-exclusive, worldwide, no-charge, royalty-free license to:

* make, have made, use, sell, offer to sell, import, and otherwise transfer
your contribution in whole or in part, alone or in combination with or
included in any product, work or materials arising out of the project to
which your contribution was submitted, and

* at our option, to sublicense these same rights to third parties through
multiple levels of sublicensees or other licensing arrangements.

4. Except as set out above, you keep all right, title, and interest in your
contribution. The rights that you grant to us under these terms are effective
on the date you first submitted a contribution to us, even if your submission
took place before the date you sign these terms.

5. You covenant, represent, warrant and agree that:

* Each contribution that you submit is and shall be an original work of
authorship and you can legally grant the rights set out in this SCA;

* to the best of your knowledge, each contribution will not violate any
third party's copyrights, trademarks, patents, or other intellectual
property rights; and

* each contribution shall be in compliance with U.S. export control laws and
other applicable export and import laws. You agree to notify us if you
become aware of any circumstance which would make any of the foregoing
representations inaccurate in any respect. We may publicly disclose your
participation in the project, including the fact that you have signed the SCA.

6. This SCA is governed by the laws of the State of California and applicable
U.S. Federal law. Any choice of law rules will not apply.

7. Please place an “x” on one of the applicable statements below. Please do NOT
mark both statements:

* [x] I am signing on behalf of myself as an individual and no other person
or entity, including my employer, has or will have rights with respect to my
contributions.

* [ ] I am signing on behalf of my employer or a legal entity and I have the
actual authority to contractually bind that entity.

## Contributor Details

| Field                          | Entry                |
|------------------------------- | -------------------- |
| Name                           | Piotr Żelasko        |
| Company name (if applicable)   |                      |
| Title or role (if applicable)  |                      |
| Date                           | 04-09-2018           |
| GitHub username                | pzelasko             |
| Website (optional)             |                      |
106 .github/contributors/sainathadapa.md vendored Normal file
@@ -0,0 +1,106 @@
# spaCy contributor agreement

This spaCy Contributor Agreement (**"SCA"**) is based on the
[Oracle Contributor Agreement](http://www.oracle.com/technetwork/oca-405177.pdf).
The SCA applies to any contribution that you make to any product or project
managed by us (the **"project"**), and sets out the intellectual property rights
you grant to us in the contributed materials. The term **"us"** shall mean
[ExplosionAI UG (haftungsbeschränkt)](https://explosion.ai/legal). The term
**"you"** shall mean the person or entity identified below.

If you agree to be bound by these terms, fill in the information requested
below and include the filled-in version with your first pull request, under the
folder [`.github/contributors/`](/.github/contributors/). The name of the file
should be your GitHub username, with the extension `.md`. For example, the user
example_user would create the file `.github/contributors/example_user.md`.

Read this agreement carefully before signing. These terms and conditions
constitute a binding legal agreement.

## Contributor Agreement

1. The term "contribution" or "contributed materials" means any source code,
object code, patch, tool, sample, graphic, specification, manual,
documentation, or any other material posted or submitted by you to the project.

2. With respect to any worldwide copyrights, or copyright applications and
registrations, in your contribution:

* you hereby assign to us joint ownership, and to the extent that such
assignment is or becomes invalid, ineffective or unenforceable, you hereby
grant to us a perpetual, irrevocable, non-exclusive, worldwide, no-charge,
royalty-free, unrestricted license to exercise all rights under those
copyrights. This includes, at our option, the right to sublicense these same
rights to third parties through multiple levels of sublicensees or other
licensing arrangements;

* you agree that each of us can do all things in relation to your
contribution as if each of us were the sole owners, and if one of us makes
a derivative work of your contribution, the one who makes the derivative
work (or has it made) will be the sole owner of that derivative work;

* you agree that you will not assert any moral rights in your contribution
against us, our licensees or transferees;

* you agree that we may register a copyright in your contribution and
exercise all ownership rights associated with it; and

* you agree that neither of us has any duty to consult with, obtain the
consent of, pay or render an accounting to the other for any use or
distribution of your contribution.

3. With respect to any patents you own, or that you can license without payment
to any third party, you hereby grant to us a perpetual, irrevocable,
non-exclusive, worldwide, no-charge, royalty-free license to:

* make, have made, use, sell, offer to sell, import, and otherwise transfer
your contribution in whole or in part, alone or in combination with or
included in any product, work or materials arising out of the project to
which your contribution was submitted, and

* at our option, to sublicense these same rights to third parties through
multiple levels of sublicensees or other licensing arrangements.

4. Except as set out above, you keep all right, title, and interest in your
contribution. The rights that you grant to us under these terms are effective
on the date you first submitted a contribution to us, even if your submission
took place before the date you sign these terms.

5. You covenant, represent, warrant and agree that:

* Each contribution that you submit is and shall be an original work of
authorship and you can legally grant the rights set out in this SCA;

* to the best of your knowledge, each contribution will not violate any
third party's copyrights, trademarks, patents, or other intellectual
property rights; and

* each contribution shall be in compliance with U.S. export control laws and
other applicable export and import laws. You agree to notify us if you
become aware of any circumstance which would make any of the foregoing
representations inaccurate in any respect. We may publicly disclose your
participation in the project, including the fact that you have signed the SCA.

6. This SCA is governed by the laws of the State of California and applicable
U.S. Federal law. Any choice of law rules will not apply.

7. Please place an “x” on one of the applicable statements below. Please do NOT
mark both statements:

* [x] I am signing on behalf of myself as an individual and no other person
or entity, including my employer, has or will have rights with respect to my
contributions.

* [ ] I am signing on behalf of my employer or a legal entity and I have the
actual authority to contractually bind that entity.

## Contributor Details

| Field                          | Entry                |
|------------------------------- | -------------------- |
| Name                           | Sainath Adapa        |
| Company name (if applicable)   |                      |
| Title or role (if applicable)  |                      |
| Date                           | 2018-09-06           |
| GitHub username                | sainathadapa         |
| Website (optional)             |                      |
106 .github/contributors/tyburam.md vendored Normal file
@@ -0,0 +1,106 @@
# spaCy contributor agreement

This spaCy Contributor Agreement (**"SCA"**) is based on the
[Oracle Contributor Agreement](http://www.oracle.com/technetwork/oca-405177.pdf).
The SCA applies to any contribution that you make to any product or project
managed by us (the **"project"**), and sets out the intellectual property rights
you grant to us in the contributed materials. The term **"us"** shall mean
[ExplosionAI UG (haftungsbeschränkt)](https://explosion.ai/legal). The term
**"you"** shall mean the person or entity identified below.

If you agree to be bound by these terms, fill in the information requested
below and include the filled-in version with your first pull request, under the
folder [`.github/contributors/`](/.github/contributors/). The name of the file
should be your GitHub username, with the extension `.md`. For example, the user
example_user would create the file `.github/contributors/example_user.md`.

Read this agreement carefully before signing. These terms and conditions
constitute a binding legal agreement.

## Contributor Agreement

1. The term "contribution" or "contributed materials" means any source code,
object code, patch, tool, sample, graphic, specification, manual,
documentation, or any other material posted or submitted by you to the project.

2. With respect to any worldwide copyrights, or copyright applications and
registrations, in your contribution:

* you hereby assign to us joint ownership, and to the extent that such
assignment is or becomes invalid, ineffective or unenforceable, you hereby
grant to us a perpetual, irrevocable, non-exclusive, worldwide, no-charge,
royalty-free, unrestricted license to exercise all rights under those
copyrights. This includes, at our option, the right to sublicense these same
rights to third parties through multiple levels of sublicensees or other
licensing arrangements;

* you agree that each of us can do all things in relation to your
contribution as if each of us were the sole owners, and if one of us makes
a derivative work of your contribution, the one who makes the derivative
work (or has it made) will be the sole owner of that derivative work;

* you agree that you will not assert any moral rights in your contribution
against us, our licensees or transferees;

* you agree that we may register a copyright in your contribution and
exercise all ownership rights associated with it; and

* you agree that neither of us has any duty to consult with, obtain the
consent of, pay or render an accounting to the other for any use or
distribution of your contribution.

3. With respect to any patents you own, or that you can license without payment
to any third party, you hereby grant to us a perpetual, irrevocable,
non-exclusive, worldwide, no-charge, royalty-free license to:

* make, have made, use, sell, offer to sell, import, and otherwise transfer
your contribution in whole or in part, alone or in combination with or
included in any product, work or materials arising out of the project to
which your contribution was submitted, and

* at our option, to sublicense these same rights to third parties through
multiple levels of sublicensees or other licensing arrangements.

4. Except as set out above, you keep all right, title, and interest in your
contribution. The rights that you grant to us under these terms are effective
on the date you first submitted a contribution to us, even if your submission
took place before the date you sign these terms.

5. You covenant, represent, warrant and agree that:

* Each contribution that you submit is and shall be an original work of
authorship and you can legally grant the rights set out in this SCA;

* to the best of your knowledge, each contribution will not violate any
third party's copyrights, trademarks, patents, or other intellectual
property rights; and

* each contribution shall be in compliance with U.S. export control laws and
other applicable export and import laws. You agree to notify us if you
become aware of any circumstance which would make any of the foregoing
representations inaccurate in any respect. We may publicly disclose your
participation in the project, including the fact that you have signed the SCA.

6. This SCA is governed by the laws of the State of California and applicable
U.S. Federal law. Any choice of law rules will not apply.

7. Please place an “x” on one of the applicable statements below. Please do NOT
mark both statements:

* [ ] I am signing on behalf of myself as an individual and no other person
or entity, including my employer, has or will have rights with respect to my
contributions.

* [ ] I am signing on behalf of my employer or a legal entity and I have the
actual authority to contractually bind that entity.

## Contributor Details

| Field                          | Entry                |
|------------------------------- | -------------------- |
| Name                           | Mateusz Tybura       |
| Company name (if applicable)   |                      |
| Title or role (if applicable)  |                      |
| Date                           | 08.09.2018           |
| GitHub username                | tyburam              |
| Website (optional)             |                      |
CONTRIBUTING.md

@@ -26,7 +26,7 @@ also check the [troubleshooting guide](https://spacy.io/usage/#troubleshooting)
 to see if your problem is already listed there.

 If you're looking for help with your code, consider posting a question on
-[StackOverflow](http://stackoverflow.com/questions/tagged/spacy) instead. If you
+[Stack Overflow](http://stackoverflow.com/questions/tagged/spacy) instead. If you
 tag it `spacy` and `python`, more people will see it and hopefully be able to
 help. Please understand that we won't be able to provide individual support via
 email. We also believe that help is much more valuable if it's **shared publicly**,
@@ -1,83 +0,0 @@
# 👥 Contributors

This is a list of everyone who has made significant contributions to spaCy, in alphabetical order. Thanks a lot for the great work!

* Adam Bittlingmayer, [@bittlingmayer](https://github.com/bittlingmayer)
* Alexey Kim, [@yuukos](https://github.com/yuukos)
* Alexis Eidelman, [@AlexisEidelman](https://github.com/AlexisEidelman)
* Ali Zarezade, [@azarezade](https://github.com/azarezade)
* Andreas Grivas, [@andreasgrv](https://github.com/andreasgrv)
* Andrew Poliakov, [@pavlin99th](https://github.com/pavlin99th)
* Aniruddha Adhikary, [@aniruddha-adhikary](https://github.com/aniruddha-adhikary)
* Anto Binish Kaspar, [@binishkaspar](https://github.com/binishkaspar)
* Avadh Patel, [@avadhpatel](https://github.com/avadhpatel)
* Ben Eyal, [@beneyal](https://github.com/beneyal)
* Bhargav Srinivasa, [@bhargavvader](https://github.com/bhargavvader)
* Bruno P. Kinoshita, [@kinow](https://github.com/kinow)
* Canbey Bilgili, [@cbilgili](https://github.com/cbilgili)
* Chris DuBois, [@chrisdubois](https://github.com/chrisdubois)
* Christoph Schwienheer, [@chssch](https://github.com/chssch)
* Dafne van Kuppevelt, [@dafnevk](https://github.com/dafnevk)
* Daniel Rapp, [@rappdw](https://github.com/rappdw)
* Daniel Vila Suero, [@dvsrepo](https://github.com/dvsrepo)
* Dmytro Sadovnychyi, [@sadovnychyi](https://github.com/sadovnychyi)
* Eric Zhao, [@ericzhao28](https://github.com/ericzhao28)
* Francisco Aranda, [@frascuchon](https://github.com/frascuchon)
* Greg Baker, [@solresol](https://github.com/solresol)
* Greg Dubbin, [@GregDubbin](https://github.com/GregDubbin)
* Grégory Howard, [@Gregory-Howard](https://github.com/Gregory-Howard)
* György Orosz, [@oroszgy](https://github.com/oroszgy)
* Henning Peters, [@henningpeters](https://github.com/henningpeters)
* Iddo Berger, [@iddoberger](https://github.com/iddoberger)
* Ines Montani, [@ines](https://github.com/ines)
* J Nicolas Schrading, [@NSchrading](https://github.com/NSchrading)
* Janneke van der Zwaan, [@jvdzwaan](https://github.com/jvdzwaan)
* Jim Geovedi, [@geovedi](https://github.com/geovedi)
* Jim Regan, [@jimregan](https://github.com/jimregan)
* Jeffrey Gerard, [@IamJeffG](https://github.com/IamJeffG)
* Jordan Suchow, [@suchow](https://github.com/suchow)
* Josh Reeter, [@jreeter](https://github.com/jreeter)
* Juan Miguel Cejuela, [@juanmirocks](https://github.com/juanmirocks)
* Kendrick Tan, [@kendricktan](https://github.com/kendricktan)
* Kyle P. Johnson, [@kylepjohnson](https://github.com/kylepjohnson)
* Leif Uwe Vogelsang, [@luvogels](https://github.com/luvogels)
* Liling Tan, [@alvations](https://github.com/alvations)
* Magnus Burton, [@magnusburton](https://github.com/magnusburton)
* Mark Amery, [@ExplodingCabbage](https://github.com/ExplodingCabbage)
* Matthew Honnibal, [@honnibal](https://github.com/honnibal)
* Maxim Samsonov, [@maxirmx](https://github.com/maxirmx)
* Michael Wallin, [@wallinm1](https://github.com/wallinm1)
* Miguel Almeida, [@mamoit](https://github.com/mamoit)
* Motoki Wu, [@tokestermw](https://github.com/tokestermw)
* Ole Henrik Skogstrøm, [@ohenrik](https://github.com/ohenrik)
* Oleg Zd, [@olegzd](https://github.com/olegzd)
* Orhan Bilgin, [@melanuria](https://github.com/melanuria)
* Orion Montoya, [@mdcclv](https://github.com/mdcclv)
* Paul O'Leary McCann, [@polm](https://github.com/polm)
* Pokey Rule, [@pokey](https://github.com/pokey)
* Ramanan Balakrishnan, [@ramananbalakrishnan](https://github.com/ramananbalakrishnan)
* Raphaël Bournhonesque, [@raphael0202](https://github.com/raphael0202)
* Rob van Nieuwpoort, [@RvanNieuwpoort](https://github.com/RvanNieuwpoort)
* Roman Domrachev, [@ligser](https://github.com/ligser)
* Roman Inflianskas, [@rominf](https://github.com/rominf)
* Sam Bozek, [@sambozek](https://github.com/sambozek)
* Sasho Savkov, [@savkov](https://github.com/savkov)
* Shuvanon Razik, [@shuvanon](https://github.com/shuvanon)
* Søren Lind Kristiansen, [@sorenlind](https://github.com/sorenlind)
* Swier, [@swierh](https://github.com/swierh)
* Thomas Tanon, [@Tpt](https://github.com/Tpt)
* Thomas Opsomer, [@thomasopsomer](https://github.com/thomasopsomer)
* Tiago Rodrigues, [@TiagoMRodrigues](https://github.com/TiagoMRodrigues)
* Vadim Mazaev, [@GreenRiverRUS](https://github.com/GreenRiverRUS)
* Vimos Tan, [@Vimos](https://github.com/Vimos)
* Vsevolod Solovyov, [@vsolovyov](https://github.com/vsolovyov)
* Wah Loon Keng, [@kengz](https://github.com/kengz)
* Wannaphong Phatthiyaphaibun, [@wannaphongcom](https://github.com/wannaphongcom)
* Willem van Hage, [@wrvhage](https://github.com/wrvhage)
* Wolfgang Seeker, [@wbwseeker](https://github.com/wbwseeker)
* Yam, [@hscspring](https://github.com/hscspring)
* Yanhao Yang, [@YanhaoYang](https://github.com/YanhaoYang)
* Yasuaki Uechi, [@uetchy](https://github.com/uetchy)
* Yu-chun Huang, [@galaxyh](https://github.com/galaxyh)
* Yubing Dong, [@tomtung](https://github.com/tomtung)
* Yuval Pinter, [@yuvalpinter](https://github.com/yuvalpinter)
328 README.rst Normal file
@@ -0,0 +1,328 @@
spaCy: Industrial-strength NLP
******************************

spaCy is a library for advanced Natural Language Processing in Python and Cython.
It's built on the very latest research, and was designed from day one to be
used in real products. spaCy comes with
`pre-trained statistical models <https://spacy.io/models>`_ and word
vectors, and currently supports tokenization for **30+ languages**. It features
the **fastest syntactic parser** in the world, convolutional **neural network models**
for tagging, parsing and **named entity recognition** and easy **deep learning**
integration. It's commercial open-source software, released under the MIT license.

💫 **Version 2.0 out now!** `Check out the release notes here. <https://github.com/explosion/spaCy/releases>`_

.. image:: https://img.shields.io/travis/explosion/spaCy/master.svg?style=flat-square&logo=travis
    :target: https://travis-ci.org/explosion/spaCy
    :alt: Build Status

.. image:: https://img.shields.io/appveyor/ci/explosion/spaCy/master.svg?style=flat-square&logo=appveyor
    :target: https://ci.appveyor.com/project/explosion/spaCy
    :alt: Appveyor Build Status

.. image:: https://img.shields.io/github/release/explosion/spacy.svg?style=flat-square
    :target: https://github.com/explosion/spaCy/releases
    :alt: Current Release Version

.. image:: https://img.shields.io/pypi/v/spacy.svg?style=flat-square
    :target: https://pypi.python.org/pypi/spacy
    :alt: pypi Version

.. image:: https://img.shields.io/conda/vn/conda-forge/spacy.svg?style=flat-square
    :target: https://anaconda.org/conda-forge/spacy
    :alt: conda Version

.. image:: https://img.shields.io/badge/wheels-%E2%9C%93-4c1.svg?longCache=true&style=flat-square&logo=python&logoColor=white
    :target: https://github.com/explosion/wheelwright/releases
    :alt: Python wheels

.. image:: https://img.shields.io/twitter/follow/spacy_io.svg?style=social&label=Follow
    :target: https://twitter.com/spacy_io
    :alt: spaCy on Twitter

📖 Documentation
================

=================== ===
`spaCy 101`_        New to spaCy? Here's everything you need to know!
`Usage Guides`_     How to use spaCy and its features.
`New in v2.0`_      New features, backwards incompatibilities and migration guide.
`API Reference`_    The detailed reference for spaCy's API.
`Models`_           Download statistical language models for spaCy.
`Universe`_         Libraries, extensions, demos, books and courses.
`Changelog`_        Changes and version history.
`Contribute`_       How to contribute to the spaCy project and code base.
=================== ===

.. _spaCy 101: https://spacy.io/usage/spacy-101
.. _New in v2.0: https://spacy.io/usage/v2#migrating
.. _Usage Guides: https://spacy.io/usage/
.. _API Reference: https://spacy.io/api/
.. _Models: https://spacy.io/models
.. _Universe: https://spacy.io/universe
.. _Changelog: https://spacy.io/usage/#changelog
.. _Contribute: https://github.com/explosion/spaCy/blob/master/CONTRIBUTING.md

💬 Where to ask questions
==========================

The spaCy project is maintained by `@honnibal <https://github.com/honnibal>`_
and `@ines <https://github.com/ines>`_. Please understand that we won't be able
to provide individual support via email. We also believe that help is much more
valuable if it's shared publicly, so that more people can benefit from it.

====================== ===
**Bug Reports**        `GitHub Issue Tracker`_
**Usage Questions**    `Stack Overflow`_, `Gitter Chat`_, `Reddit User Group`_
**General Discussion** `Gitter Chat`_, `Reddit User Group`_
|
||||
====================== ===
|
||||
|
||||
.. _GitHub Issue Tracker: https://github.com/explosion/spaCy/issues
|
||||
.. _Stack Overflow: http://stackoverflow.com/questions/tagged/spacy
|
||||
.. _Gitter Chat: https://gitter.im/explosion/spaCy
|
||||
.. _Reddit User Group: https://www.reddit.com/r/spacynlp
|
||||
|
||||
Features
|
||||
========
|
||||
|
||||
* **Fastest syntactic parser** in the world
|
||||
* **Named entity** recognition
|
||||
* Non-destructive **tokenization**
|
||||
* Support for **30+ languages**
|
||||
* Pre-trained `statistical models <https://spacy.io/models>`_ and word vectors
|
||||
* Easy **deep learning** integration
|
||||
* Part-of-speech tagging
|
||||
* Labelled dependency parsing
|
||||
* Syntax-driven sentence segmentation
|
||||
* Built in **visualizers** for syntax and NER
|
||||
* Convenient string-to-hash mapping
|
||||
* Export to numpy data arrays
|
||||
* Efficient binary serialization
|
||||
* Easy **model packaging** and deployment
|
||||
* State-of-the-art speed
|
||||
* Robust, rigorously evaluated accuracy
|
||||
|
||||
📖 **For more details, see the** `facts, figures and benchmarks <https://spacy.io/usage/facts-figures>`_.
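The "convenient string-to-hash mapping" feature refers to spaCy's interned string store, which maps every string to a 64-bit hash and back so that strings are compared and stored as integers. A toy pure-Python sketch of the idea (the FNV-1a hash and the ``intern`` helper are illustrative only, not spaCy's actual implementation):

```python
def hash_string(s):
    # toy 64-bit FNV-1a hash, standing in for spaCy's real hash function
    h = 0xcbf29ce484222325
    for byte in s.encode("utf8"):
        h = ((h ^ byte) * 0x100000001b3) & 0xFFFFFFFFFFFFFFFF
    return h

store = {}

def intern(s):
    # keep the reverse mapping so a hash can be resolved back to its string
    key = hash_string(s)
    store[key] = s
    return key
```

The same string always hashes to the same key, so pipeline components can pass integers around and only resolve them to text when needed.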

Install spaCy
=============

For detailed installation instructions, see
the `documentation <https://spacy.io/usage>`_.

==================== ===
**Operating system** macOS / OS X, Linux, Windows (Cygwin, MinGW, Visual Studio)
**Python version**   CPython 2.7, 3.4+. Only 64 bit.
**Package managers** `pip`_, `conda`_ (via ``conda-forge``)
==================== ===

.. _pip: https://pypi.python.org/pypi/spacy
.. _conda: https://anaconda.org/conda-forge/spacy

pip
---

Using pip, spaCy releases are available as source packages and binary wheels
(as of ``v2.0.13``).

.. code:: bash

    pip install spacy

When using pip it is generally recommended to install packages in a virtual
environment to avoid modifying system state:

.. code:: bash

    python -m venv .env
    source .env/bin/activate
    pip install spacy

conda
-----

Thanks to our great community, we've finally re-added conda support. You can now
install spaCy via ``conda-forge``:

.. code:: bash

    conda config --add channels conda-forge
    conda install spacy

For the feedstock, including the build recipe and configuration,
check out `this repository <https://github.com/conda-forge/spacy-feedstock>`_.
Improvements and pull requests to the recipe and setup are always appreciated.

Updating spaCy
--------------

Some updates to spaCy may require downloading new statistical models. If you're
running spaCy v2.0 or higher, you can use the ``validate`` command to check if
your installed models are compatible and, if not, print details on how to update
them:

.. code:: bash

    pip install -U spacy
    python -m spacy validate

If you've trained your own models, keep in mind that your training and runtime
inputs must match. After updating spaCy, we recommend **retraining your models**
with the new version.

📖 **For details on upgrading from spaCy 1.x to spaCy 2.x, see the**
`migration guide <https://spacy.io/usage/v2#migrating>`_.
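The ``validate`` command works by checking the installed models' version metadata against a compatibility table. A toy sketch of the version check itself (naive ``major.minor`` matching, not spaCy's actual logic):

```python
def is_compatible(model_version, spacy_version):
    # naive sketch: treat a model as compatible when major.minor match;
    # the real command consults a published compatibility table instead
    def parse(version):
        return tuple(int(part) for part in version.split(".")[:2])
    return parse(model_version) == parse(spacy_version)

print(is_compatible("2.0.13", "2.0.18"))  # patch releases considered compatible
print(is_compatible("1.6.0", "2.0.13"))   # major mismatch, needs an update
```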

Download models
===============

As of v1.7.0, models for spaCy can be installed as **Python packages**.
This means that they're a component of your application, just like any
other module. Models can be installed using spaCy's ``download`` command,
or manually by pointing pip to a path or URL.

======================= ===
`Available Models`_     Detailed model descriptions, accuracy figures and benchmarks.
`Models Documentation`_ Detailed usage instructions.
======================= ===

.. _Available Models: https://spacy.io/models
.. _Models Documentation: https://spacy.io/docs/usage/models

.. code:: bash

    # out-of-the-box: download best-matching default model
    python -m spacy download en

    # download best-matching version of specific model for your spaCy installation
    python -m spacy download en_core_web_lg

    # pip install .tar.gz archive from path or URL
    pip install /Users/you/en_core_web_sm-2.0.0.tar.gz

Loading and using models
------------------------

To load a model, use ``spacy.load()`` with the model's shortcut link:

.. code:: python

    import spacy
    nlp = spacy.load('en')
    doc = nlp(u'This is a sentence.')

If you've installed a model via pip, you can also ``import`` it directly and
then call its ``load()`` method:

.. code:: python

    import spacy
    import en_core_web_sm

    nlp = en_core_web_sm.load()
    doc = nlp(u'This is a sentence.')

📖 **For more info and examples, check out the**
`models documentation <https://spacy.io/docs/usage/models>`_.

Support for older versions
--------------------------

If you're using an older version (``v1.6.0`` or below), you can still download
and install the old models from within spaCy using ``python -m spacy.en.download all``
or ``python -m spacy.de.download all``. The ``.tar.gz`` archives are also
`attached to the v1.6.0 release <https://github.com/explosion/spaCy/tree/v1.6.0>`_.
To download and install the models manually, unpack the archive, drop the
contained directory into ``spacy/data`` and load the model via ``spacy.load('en')``
or ``spacy.load('de')``.

Compile from source
===================

The other way to install spaCy is to clone its
`GitHub repository <https://github.com/explosion/spaCy>`_ and build it from
source. That is the common way if you want to make changes to the code base.
You'll need to make sure that you have a development environment consisting of a
Python distribution including header files, a compiler,
`pip <https://pip.pypa.io/en/latest/installing/>`__, `virtualenv <https://virtualenv.pypa.io/>`_
and `git <https://git-scm.com>`_ installed. The compiler part is the trickiest.
How to do that depends on your system. See notes on Ubuntu, OS X and Windows for
details.

.. code:: bash

    # make sure you are using the latest pip
    python -m pip install -U pip
    git clone https://github.com/explosion/spaCy
    cd spaCy

    python -m venv .env
    source .env/bin/activate
    export PYTHONPATH=`pwd`
    pip install -r requirements.txt
    python setup.py build_ext --inplace

Compared to a regular install via pip, `requirements.txt <requirements.txt>`_
additionally installs developer dependencies such as Cython. For more details
and instructions, see the documentation on
`compiling spaCy from source <https://spacy.io/usage/#source>`_ and the
`quickstart widget <https://spacy.io/usage/#section-quickstart>`_ to get
the right commands for your platform and Python version.

Instead of the above verbose commands, you can also use the following
`Fabric <http://www.fabfile.org/>`_ commands. All commands assume that your
virtual environment is located in a directory ``.env``. If you're using a
different directory, you can change it via the environment variable ``VENV_DIR``,
for example ``VENV_DIR=".custom-env" fab clean make``.

============= ===
``fab env``   Create virtual environment and delete previous one, if it exists.
``fab make``  Compile the source.
``fab clean`` Remove compiled objects, including the generated C++.
``fab test``  Run basic tests, aborting after first failure.
============= ===

Ubuntu
------

Install system-level dependencies via ``apt-get``:

.. code:: bash

    sudo apt-get install build-essential python-dev git

macOS / OS X
------------

Install a recent version of `Xcode <https://developer.apple.com/xcode/>`_,
including the so-called "Command Line Tools". macOS and OS X ship with Python
and git preinstalled.

Windows
-------

Install a version of `Visual Studio Express <https://www.visualstudio.com/vs/visual-studio-express/>`_
or higher that matches the version that was used to compile your Python
interpreter. For official distributions these are VS 2008 (Python 2.7),
VS 2010 (Python 3.4) and VS 2015 (Python 3.5).

Run tests
=========

spaCy comes with an `extensive test suite <spacy/tests>`_. In order to run the
tests, you'll usually want to clone the repository and build spaCy from source.
This will also install the required development dependencies and test utilities
defined in the ``requirements.txt``.

Alternatively, you can find out where spaCy is installed and run ``pytest`` on
that directory. Don't forget to also install the test utilities via spaCy's
``requirements.txt``:

.. code:: bash

    python -c "import os; import spacy; print(os.path.dirname(spacy.__file__))"
    pip install -r path/to/requirements.txt
    python -m pytest <spacy-directory>

See `the documentation <https://spacy.io/usage/#tests>`_ for more details and
examples.
@@ -7,6 +7,7 @@ git diff-index --quiet HEAD

 git checkout $1
 git pull origin $1

 version=$(grep "__version__ = " spacy/about.py)
 version=${version/__version__ = }
 version=${version/\'/}
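The parameter expansions above pull the version number out of the ``__version__`` line of ``spacy/about.py``. A standalone sketch of the same idea on a hard-coded line (using ``//`` so both quotes are stripped in one step):

```shell
line="__version__ = '2.0.13'"
version=${line#"__version__ = "}   # drop the assignment prefix
version=${version//\'/}            # drop both single quotes
echo "$version"
```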
@@ -92,11 +92,13 @@ def get_features(docs, max_length):
 def train(train_texts, train_labels, dev_texts, dev_labels,
           lstm_shape, lstm_settings, lstm_optimizer, batch_size=100,
           nb_epoch=5, by_sentence=True):
+
     print("Loading spaCy")
     nlp = spacy.load('en_vectors_web_lg')
+    nlp.add_pipe(nlp.create_pipe('sentencizer'))
     embeddings = get_embeddings(nlp.vocab)
     model = compile_lstm(embeddings, lstm_shape, lstm_settings)
+
     print("Parsing texts...")
     train_docs = list(nlp.pipe(train_texts))
     dev_docs = list(nlp.pipe(dev_texts))

@@ -107,7 +109,7 @@ def train(train_texts, train_labels, dev_texts, dev_labels,
     train_X = get_features(train_docs, lstm_shape['max_length'])
     dev_X = get_features(dev_docs, lstm_shape['max_length'])
     model.fit(train_X, train_labels, validation_data=(dev_X, dev_labels),
-              nb_epoch=nb_epoch, batch_size=batch_size)
+              epochs=nb_epoch, batch_size=batch_size)
     return model

@@ -138,15 +140,9 @@ def get_embeddings(vocab):

 def evaluate(model_dir, texts, labels, max_length=100):
-    def create_pipeline(nlp):
-        '''
-        This could be a lambda, but named functions are easier to read in Python.
-        '''
-        return [nlp.tagger, nlp.parser, SentimentAnalyser.load(model_dir, nlp,
-                                                               max_length=max_length)]
-
-    nlp = spacy.load('en')
-    nlp.pipeline = create_pipeline(nlp)
+    nlp = spacy.load('en_vectors_web_lg')
+    nlp.add_pipe(nlp.create_pipe('sentencizer'))
+    nlp.add_pipe(SentimentAnalyser.load(model_dir, nlp, max_length=max_length))

     correct = 0
     i = 0

@@ -186,7 +182,7 @@ def main(model_dir=None, train_dir=None, dev_dir=None,
          is_runtime=False,
          nr_hidden=64, max_length=100,  # Shape
          dropout=0.5, learn_rate=0.001,  # General NN config
-         nb_epoch=5, batch_size=100, nr_examples=-1):  # Training params
+         nb_epoch=5, batch_size=256, nr_examples=-1):  # Training params
     if model_dir is not None:
         model_dir = pathlib.Path(model_dir)
     if train_dir is None or dev_dir is None:

@@ -219,7 +215,7 @@ def main(model_dir=None, train_dir=None, dev_dir=None,
     if model_dir is not None:
         with (model_dir / 'model').open('wb') as file_:
             pickle.dump(weights[1:], file_)
-        with (model_dir / 'config.json').open('wb') as file_:
+        with (model_dir / 'config.json').open('w') as file_:
            file_.write(lstm.to_json())
@@ -2,11 +2,7 @@

 # A decomposable attention model for Natural Language Inference
 **by Matthew Honnibal, [@honnibal](https://github.com/honnibal)**

-> ⚠️ **IMPORTANT NOTE:** This example is currently only compatible with spaCy
-> v1.x. We're working on porting the example over to Keras v2.x and spaCy v2.x.
-> See [#1445](https://github.com/explosion/spaCy/issues/1445) for details –
-> contributions welcome!
+**Updated for spaCy 2.0+ and Keras 2.2.2+ by John Stewart, [@free-variation](https://github.com/free-variation)**

 This directory contains an implementation of the entailment prediction model described
 by [Parikh et al. (2016)](https://arxiv.org/pdf/1606.01933.pdf). The model is notable

@@ -21,19 +17,25 @@ hook is installed to customise the `.similarity()` method of spaCy's `Doc`
 and `Span` objects:

 ```python
-def demo(model_dir):
-    nlp = spacy.load('en', path=model_dir,
-                     create_pipeline=create_similarity_pipeline)
-    doc1 = nlp(u'Worst fries ever! Greasy and horrible...')
-    doc2 = nlp(u'The milkshakes are good. The fries are bad.')
-    print(doc1.similarity(doc2))
-    sent1a, sent1b = doc1.sents
-    print(sent1a.similarity(sent1b))
-    print(sent1a.similarity(doc2))
-    print(sent1b.similarity(doc2))
+def demo(shape):
+    nlp = spacy.load('en_vectors_web_lg')
+    nlp.add_pipe(KerasSimilarityShim.load(nlp.path / 'similarity', nlp, shape[0]))
+
+    doc1 = nlp(u'The king of France is bald.')
+    doc2 = nlp(u'France has no king.')
+
+    print("Sentence 1:", doc1)
+    print("Sentence 2:", doc2)
+
+    entailment_type, confidence = doc1.similarity(doc2)
+    print("Entailment type:", entailment_type, "(Confidence:", confidence, ")")
 ```

+Which gives the output `Entailment type: contradiction (Confidence: 0.60604566)`, showing that
+the system has definite opinions about Bertrand Russell's [famous conundrum](https://users.drew.edu/jlenz/br-on-denoting.html)!

-I'm working on a blog post to explain Parikh et al.'s model in more detail.
+A [notebook](https://github.com/free-variation/spaCy/blob/master/examples/notebooks/Decompositional%20Attention.ipynb) is available that briefly explains this implementation.
 I think it is a very interesting example of the attention mechanism, which
 I didn't understand very well before working through this paper. There are
 lots of ways to extend the model.

@@ -43,7 +45,7 @@ lots of ways to extend the model.
 | File | Description |
 | --- | --- |
 | `__main__.py` | The script that will be executed. Defines the CLI, the data reading, etc — all the boring stuff. |
-| `spacy_hook.py` | Provides a class `SimilarityShim` that lets you use an arbitrary function to customize spaCy's `doc.similarity()` method. Instead of the default average-of-vectors algorithm, when you call `doc1.similarity(doc2)`, you'll get the result of `your_model(doc1, doc2)`. |
+| `spacy_hook.py` | Provides a class `KerasSimilarityShim` that lets you use an arbitrary function to customize spaCy's `doc.similarity()` method. Instead of the default average-of-vectors algorithm, when you call `doc1.similarity(doc2)`, you'll get the result of `your_model(doc1, doc2)`. |
 | `keras_decomposable_attention.py` | Defines the neural network model. |

 ## Setting up

@@ -52,17 +54,13 @@ First, install [Keras](https://keras.io/), [spaCy](https://spacy.io) and the spaCy
 English models (about 1GB of data):

 ```bash
-pip install https://github.com/fchollet/keras/archive/1.2.2.zip
+pip install keras
 pip install spacy
-python -m spacy.en.download
+python -m spacy download en_vectors_web_lg
 ```

-⚠️ **Important:** In order for the example to run, you'll need to install Keras from
-the 1.2.2 release (and not via `pip install keras`). For more info on this, see
-[#727](https://github.com/explosion/spaCy/issues/727).
-
-You'll also want to get Keras working on your GPU. This will depend on your
-set up, so you're mostly on your own for this step. If you're using AWS, try the
+You'll also want to get Keras working on your GPU, and you will need a backend, such as TensorFlow or Theano.
+This will depend on your set up, so you're mostly on your own for this step. If you're using AWS, try the
 [NVidia AMI](https://aws.amazon.com/marketplace/pp/B00FYCDDTE). It made things pretty easy.

 Once you've installed the dependencies, you can run a small preliminary test of

@@ -80,22 +78,35 @@ Finally, download the [Stanford Natural Language Inference corpus](http://nlp.stanford.edu/projects/snli/)

 ## Running the example

 You can run the `keras_parikh_entailment/` directory as a script, which executes the file
-[`keras_parikh_entailment/__main__.py`](__main__.py). The first thing you'll want to do is train the model:
+[`keras_parikh_entailment/__main__.py`](__main__.py). If you run the script without arguments
+the usage is shown. Running it with `-h` explains the command line arguments.
+
+The first thing you'll want to do is train the model:

 ```bash
-python keras_parikh_entailment/ train <train_directory> <dev_directory>
+python keras_parikh_entailment/ train -t <path to SNLI train JSON> -s <path to SNLI dev JSON>
 ```

 Training takes about 300 epochs for full accuracy, and I haven't rerun the full
 experiment since refactoring things to publish this example — please let me
-know if I've broken something. You should get to at least 85% on the development data.
+know if I've broken something. You should get to at least 85% on the development data even after 10-15 epochs.

 The other two modes demonstrate run-time usage. I never like relying on the accuracy printed
 by `.fit()` methods. I never really feel confident until I've run a new process that loads
 the model and starts making predictions, without access to the gold labels. I've therefore
-included an `evaluate` mode. Finally, there's also a little demo, which mostly exists to show
+included an `evaluate` mode.
+
+```bash
+python keras_parikh_entailment/ evaluate -s <path to SNLI train JSON>
+```
+
+Finally, there's also a little demo, which mostly exists to show
 you how run-time usage will eventually look.

+```bash
+python keras_parikh_entailment/ demo
+```
+
 ## Getting updates

 We should have the blog post explaining the model ready before the end of the week. To get
@@ -1,82 +1,104 @@
 from __future__ import division, unicode_literals, print_function

-import plac
 from pathlib import Path
+import numpy as np
 import ujson as json
-import numpy
-from keras.utils.np_utils import to_categorical
-
-from spacy_hook import get_embeddings, get_word_ids
-from spacy_hook import create_similarity_pipeline
+from keras.utils import to_categorical
+import plac
+import sys

 from keras_decomposable_attention import build_model
+from spacy_hook import get_embeddings, KerasSimilarityShim

 try:
     import cPickle as pickle
 except ImportError:
     import pickle

 import spacy

+# workaround for keras/tensorflow bug
+# see https://github.com/tensorflow/tensorflow/issues/3388
+import os
+import importlib
+from keras import backend as K
+
+def set_keras_backend(backend):
+    if K.backend() != backend:
+        os.environ['KERAS_BACKEND'] = backend
+        importlib.reload(K)
+        assert K.backend() == backend
+    if backend == "tensorflow":
+        K.get_session().close()
+        cfg = K.tf.ConfigProto()
+        cfg.gpu_options.allow_growth = True
+        K.set_session(K.tf.Session(config=cfg))
+        K.clear_session()
+
+set_keras_backend("tensorflow")
+

 def train(train_loc, dev_loc, shape, settings):
     train_texts1, train_texts2, train_labels = read_snli(train_loc)
     dev_texts1, dev_texts2, dev_labels = read_snli(dev_loc)

     print("Loading spaCy")
-    nlp = spacy.load('en')
+    nlp = spacy.load('en_vectors_web_lg')
     assert nlp.path is not None
+
+    print("Processing texts...")
+    train_X = create_dataset(nlp, train_texts1, train_texts2, 100, shape[0])
+    dev_X = create_dataset(nlp, dev_texts1, dev_texts2, 100, shape[0])

     print("Compiling network")
     model = build_model(get_embeddings(nlp.vocab), shape, settings)
-    print("Processing texts...")
-    Xs = []
-    for texts in (train_texts1, train_texts2, dev_texts1, dev_texts2):
-        Xs.append(get_word_ids(list(nlp.pipe(texts, n_threads=20, batch_size=20000)),
-                  max_length=shape[0],
-                  rnn_encode=settings['gru_encode'],
-                  tree_truncate=settings['tree_truncate']))
-    train_X1, train_X2, dev_X1, dev_X2 = Xs

     print(settings)
     model.fit(
-        [train_X1, train_X2],
+        train_X,
         train_labels,
-        validation_data=([dev_X1, dev_X2], dev_labels),
-        nb_epoch=settings['nr_epoch'],
+        validation_data=(dev_X, dev_labels),
+        epochs=settings['nr_epoch'],
         batch_size=settings['batch_size'])

     if not (nlp.path / 'similarity').exists():
         (nlp.path / 'similarity').mkdir()
     print("Saving to", nlp.path / 'similarity')
     weights = model.get_weights()
+    # remove the embedding matrix. We can reconstruct it.
+    del weights[1]
     with (nlp.path / 'similarity' / 'model').open('wb') as file_:
-        pickle.dump(weights[1:], file_)
-    with (nlp.path / 'similarity' / 'config.json').open('wb') as file_:
+        pickle.dump(weights, file_)
+    with (nlp.path / 'similarity' / 'config.json').open('w') as file_:
         file_.write(model.to_json())


-def evaluate(dev_loc):
+def evaluate(dev_loc, shape):
     dev_texts1, dev_texts2, dev_labels = read_snli(dev_loc)
-    nlp = spacy.load('en',
-                     create_pipeline=create_similarity_pipeline)
+    nlp = spacy.load('en_vectors_web_lg')
+    nlp.add_pipe(KerasSimilarityShim.load(nlp.path / 'similarity', nlp, shape[0]))

     total = 0.
     correct = 0.
     for text1, text2, label in zip(dev_texts1, dev_texts2, dev_labels):
         doc1 = nlp(text1)
         doc2 = nlp(text2)
-        sim = doc1.similarity(doc2)
-        if sim.argmax() == label.argmax():
+        sim, _ = doc1.similarity(doc2)
+        if sim == KerasSimilarityShim.entailment_types[label.argmax()]:
             correct += 1
         total += 1
     return correct, total


-def demo():
-    nlp = spacy.load('en',
-                     create_pipeline=create_similarity_pipeline)
-    doc1 = nlp(u'What were the best crime fiction books in 2016?')
-    doc2 = nlp(
-        u'What should I read that was published last year? I like crime stories.')
-    print(doc1)
-    print(doc2)
-    print("Similarity", doc1.similarity(doc2))
+def demo(shape):
+    nlp = spacy.load('en_vectors_web_lg')
+    nlp.add_pipe(KerasSimilarityShim.load(nlp.path / 'similarity', nlp, shape[0]))
+
+    doc1 = nlp(u'The king of France is bald.')
+    doc2 = nlp(u'France has no king.')
+
+    print("Sentence 1:", doc1)
+    print("Sentence 2:", doc2)
+
+    entailment_type, confidence = doc1.similarity(doc2)
+    print("Entailment type:", entailment_type, "(Confidence:", confidence, ")")


 LABELS = {'entailment': 0, 'contradiction': 1, 'neutral': 2}

@@ -84,56 +106,92 @@ def read_snli(path):
     texts1 = []
     texts2 = []
     labels = []
-    with path.open() as file_:
+    with open(path, 'r') as file_:
         for line in file_:
             eg = json.loads(line)
             label = eg['gold_label']
-            if label == '-':
+            if label == '-':  # per Parikh, ignore - SNLI entries
                 continue
             texts1.append(eg['sentence1'])
             texts2.append(eg['sentence2'])
             labels.append(LABELS[label])
-    return texts1, texts2, to_categorical(numpy.asarray(labels, dtype='int32'))
+    return texts1, texts2, to_categorical(np.asarray(labels, dtype='int32'))
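`to_categorical` turns the integer labels from `read_snli` into one-hot vectors for the softmax output layer. A minimal pure-Python equivalent, for illustration:

```python
def to_one_hot(labels, nr_class):
    # one row per label, with a 1.0 in that label's column
    return [[1.0 if col == label else 0.0 for col in range(nr_class)]
            for label in labels]

# entailment=0, contradiction=1, neutral=2
print(to_one_hot([0, 2, 1], 3))
```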
+
+def create_dataset(nlp, texts, hypotheses, num_unk, max_length):
+    sents = texts + hypotheses
+
+    sents_as_ids = []
+    for sent in sents:
+        doc = nlp(sent)
+        word_ids = []
+
+        for i, token in enumerate(doc):
+            # skip odd spaces from tokenizer
+            if token.has_vector and token.vector_norm == 0:
+                continue
+
+            if i > max_length:
+                break
+
+            if token.has_vector:
+                word_ids.append(token.rank + num_unk + 1)
+            else:
+                # if we don't have a vector, pick an OOV entry
+                word_ids.append(token.rank % num_unk + 1)
+
+        # there must be a simpler way of generating padded arrays from lists...
+        word_id_vec = np.zeros((max_length), dtype='int')
+        clipped_len = min(max_length, len(word_ids))
+        word_id_vec[:clipped_len] = word_ids[:clipped_len]
+        sents_as_ids.append(word_id_vec)
+
+    return [np.array(sents_as_ids[:len(texts)]), np.array(sents_as_ids[len(texts):])]
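The clip-and-pad step at the end of `create_dataset` truncates each ID sequence to `max_length` and right-pads it with zeros. The same logic can be expressed without NumPy; a minimal sketch:

```python
def pad_ids(word_ids, max_length):
    # clip to max_length, then right-pad with zeros (0 is the padding ID)
    clipped = word_ids[:max_length]
    return clipped + [0] * (max_length - len(clipped))

print(pad_ids([5, 7, 9], 5))
```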


 @plac.annotations(
     mode=("Mode to execute", "positional", None, str, ["train", "evaluate", "demo"]),
-    train_loc=("Path to training data", "positional", None, Path),
-    dev_loc=("Path to development data", "positional", None, Path),
+    train_loc=("Path to training data", "option", "t", str),
+    dev_loc=("Path to development or test data", "option", "s", str),
     max_length=("Length to truncate sentences", "option", "L", int),
     nr_hidden=("Number of hidden units", "option", "H", int),
     dropout=("Dropout level", "option", "d", float),
-    learn_rate=("Learning rate", "option", "e", float),
+    learn_rate=("Learning rate", "option", "r", float),
     batch_size=("Batch size for neural network training", "option", "b", int),
-    nr_epoch=("Number of training epochs", "option", "i", int),
-    tree_truncate=("Truncate sentences by tree distance", "flag", "T", bool),
-    gru_encode=("Encode sentences with bidirectional GRU", "flag", "E", bool),
+    nr_epoch=("Number of training epochs", "option", "e", int),
+    entail_dir=("Direction of entailment", "option", "D", str, ["both", "left", "right"])
 )
-def main(mode, train_loc, dev_loc,
-         tree_truncate=False,
-         gru_encode=False,
-         max_length=100,
-         nr_hidden=100,
-         dropout=0.2,
-         learn_rate=0.001,
-         batch_size=100,
-         nr_epoch=5):
+def main(mode, train_loc, dev_loc,
+         max_length=50,
+         nr_hidden=200,
+         dropout=0.2,
+         learn_rate=0.001,
+         batch_size=1024,
+         nr_epoch=10,
+         entail_dir="both"):

     shape = (max_length, nr_hidden, 3)
     settings = {
         'lr': learn_rate,
         'dropout': dropout,
         'batch_size': batch_size,
         'nr_epoch': nr_epoch,
-        'tree_truncate': tree_truncate,
-        'gru_encode': gru_encode
+        'entail_dir': entail_dir
     }

     if mode == 'train':
+        if train_loc is None or dev_loc is None:
+            print("Train mode requires paths to training and development data sets.")
+            sys.exit(1)
         train(train_loc, dev_loc, shape, settings)
     elif mode == 'evaluate':
-        correct, total = evaluate(dev_loc)
+        if dev_loc is None:
+            print("Evaluate mode requires paths to test data set.")
+            sys.exit(1)
+        correct, total = evaluate(dev_loc, shape)
         print(correct, '/', total, correct / total)
     else:
-        demo()
+        demo(shape)

 if __name__ == '__main__':
     plac.call(main)
@ -1,259 +1,137 @@
|
|||
-# Semantic similarity with decomposable attention (using spaCy and Keras)
-# Practical state-of-the-art text similarity with spaCy and Keras
-import numpy
-
-from keras.layers import InputSpec, Layer, Input, Dense, merge
-from keras.layers import Lambda, Activation, Dropout, Embedding, TimeDistributed
-from keras.layers import Bidirectional, GRU, LSTM
-from keras.layers.noise import GaussianNoise
-from keras.layers.advanced_activations import ELU
-import keras.backend as K
-from keras.models import Sequential, Model, model_from_json
-from keras.regularizers import l2
-from keras.optimizers import Adam
-from keras.layers.normalization import BatchNormalization
-from keras.layers.pooling import GlobalAveragePooling1D, GlobalMaxPooling1D
-from keras.layers import Merge
+# Semantic entailment/similarity with decomposable attention (using spaCy and Keras)
+# Practical state-of-the-art textual entailment with spaCy and Keras
+
+import numpy as np
+from keras import layers, Model, models, optimizers
+from keras import backend as K
 def build_model(vectors, shape, settings):
-    '''Compile the model.'''
     max_length, nr_hidden, nr_class = shape
-    # Declare inputs.
-    ids1 = Input(shape=(max_length,), dtype='int32', name='words1')
-    ids2 = Input(shape=(max_length,), dtype='int32', name='words2')
-
-    # Construct operations, which we'll chain together.
-    embed = _StaticEmbedding(vectors, max_length, nr_hidden, dropout=0.2, nr_tune=5000)
-    if settings['gru_encode']:
-        encode = _BiRNNEncoding(max_length, nr_hidden, dropout=settings['dropout'])
-    attend = _Attention(max_length, nr_hidden, dropout=settings['dropout'])
-    align = _SoftAlignment(max_length, nr_hidden)
-    compare = _Comparison(max_length, nr_hidden, dropout=settings['dropout'])
-    entail = _Entailment(nr_hidden, nr_class, dropout=settings['dropout'])
+    input1 = layers.Input(shape=(max_length,), dtype='int32', name='words1')
+    input2 = layers.Input(shape=(max_length,), dtype='int32', name='words2')
+
+    # embeddings (projected)
+    embed = create_embedding(vectors, max_length, nr_hidden)
+
+    a = embed(input1)
+    b = embed(input2)
+
+    # step 1: attend
+    F = create_feedforward(nr_hidden)
+    att_weights = layers.dot([F(a), F(b)], axes=-1)
+
+    G = create_feedforward(nr_hidden)
+
+    if settings['entail_dir'] == 'both':
+        norm_weights_a = layers.Lambda(normalizer(1))(att_weights)
+        norm_weights_b = layers.Lambda(normalizer(2))(att_weights)
+        alpha = layers.dot([norm_weights_a, a], axes=1)
+        beta = layers.dot([norm_weights_b, b], axes=1)
-
-    # Declare the model as a computational graph.
-    sent1 = embed(ids1)  # Shape: (i, n)
-    sent2 = embed(ids2)  # Shape: (j, n)
+
+        # step 2: compare
+        comp1 = layers.concatenate([a, beta])
+        comp2 = layers.concatenate([b, alpha])
+        v1 = layers.TimeDistributed(G)(comp1)
+        v2 = layers.TimeDistributed(G)(comp2)
-
-    if settings['gru_encode']:
-        sent1 = encode(sent1)
-        sent2 = encode(sent2)
+
+        # step 3: aggregate
+        v1_sum = layers.Lambda(sum_word)(v1)
+        v2_sum = layers.Lambda(sum_word)(v2)
+        concat = layers.concatenate([v1_sum, v2_sum])
-
-    attention = attend(sent1, sent2)  # Shape: (i, j)
+    elif settings['entail_dir'] == 'left':
+        norm_weights_a = layers.Lambda(normalizer(1))(att_weights)
+        alpha = layers.dot([norm_weights_a, a], axes=1)
+        comp2 = layers.concatenate([b, alpha])
+        v2 = layers.TimeDistributed(G)(comp2)
+        v2_sum = layers.Lambda(sum_word)(v2)
+        concat = v2_sum
-
-    align1 = align(sent2, attention)
-    align2 = align(sent1, attention, transpose=True)
-
-    feats1 = compare(sent1, align1)
-    feats2 = compare(sent2, align2)
-
-    scores = entail(feats1, feats2)
-
-    # Now that we have the input/output, we can construct the Model object...
-    model = Model(input=[ids1, ids2], output=[scores])
-
-    # ...Compile it...
+    else:
+        norm_weights_b = layers.Lambda(normalizer(2))(att_weights)
+        beta = layers.dot([norm_weights_b, b], axes=1)
+        comp1 = layers.concatenate([a, beta])
+        v1 = layers.TimeDistributed(G)(comp1)
+        v1_sum = layers.Lambda(sum_word)(v1)
+        concat = v1_sum
+
+    H = create_feedforward(nr_hidden)
+    out = H(concat)
+    out = layers.Dense(nr_class, activation='softmax')(out)
+
+    model = Model([input1, input2], out)
+
     model.compile(
-        optimizer=Adam(lr=settings['lr']),
+        optimizer=optimizers.Adam(lr=settings['lr']),
         loss='categorical_crossentropy',
         metrics=['accuracy'])
-    # ...And return it for training.
+
     return model
-class _StaticEmbedding(object):
-    def __init__(self, vectors, max_length, nr_out, nr_tune=1000, dropout=0.0):
-        self.nr_out = nr_out
-        self.max_length = max_length
-        self.embed = Embedding(
-            vectors.shape[0],
-            vectors.shape[1],
-            input_length=max_length,
-            weights=[vectors],
-            name='embed',
-            trainable=False)
-        self.tune = Embedding(
-            nr_tune,
-            nr_out,
-            input_length=max_length,
-            weights=None,
-            name='tune',
-            trainable=True,
-            dropout=dropout)
-        self.mod_ids = Lambda(lambda sent: sent % (nr_tune-1)+1,
-                              output_shape=(self.max_length,))
+def create_embedding(vectors, max_length, projected_dim):
+    return models.Sequential([
+        layers.Embedding(
+            vectors.shape[0],
+            vectors.shape[1],
+            input_length=max_length,
+            weights=[vectors],
+            trainable=False),
+
+        layers.TimeDistributed(
+            layers.Dense(projected_dim,
+                         activation=None,
+                         use_bias=False))
+    ])
-
-        self.project = TimeDistributed(
-            Dense(
-                nr_out,
-                activation=None,
-                bias=False,
-                name='project'))
-
-    def __call__(self, sentence):
-        def get_output_shape(shapes):
-            print(shapes)
-            return shapes[0]
-        mod_sent = self.mod_ids(sentence)
-        tuning = self.tune(mod_sent)
-        #tuning = merge([tuning, mod_sent],
-        #    mode=lambda AB: AB[0] * (K.clip(K.cast(AB[1], 'float32'), 0, 1)),
-        #    output_shape=(self.max_length, self.nr_out))
-        pretrained = self.project(self.embed(sentence))
-        vectors = merge([pretrained, tuning], mode='sum')
-        return vectors
+def create_feedforward(num_units=200, activation='relu', dropout_rate=0.2):
+    return models.Sequential([
+        layers.Dense(num_units, activation=activation),
+        layers.Dropout(dropout_rate),
+        layers.Dense(num_units, activation=activation),
+        layers.Dropout(dropout_rate)
+    ])
-class _BiRNNEncoding(object):
-    def __init__(self, max_length, nr_out, dropout=0.0):
-        self.model = Sequential()
-        self.model.add(Bidirectional(LSTM(nr_out, return_sequences=True,
-                                          dropout_W=dropout, dropout_U=dropout),
-                                     input_shape=(max_length, nr_out)))
-        self.model.add(TimeDistributed(Dense(nr_out, activation='relu', init='he_normal')))
-        self.model.add(TimeDistributed(Dropout(0.2)))
+def normalizer(axis):
+    def _normalize(att_weights):
+        exp_weights = K.exp(att_weights)
+        sum_weights = K.sum(exp_weights, axis=axis, keepdims=True)
+        return exp_weights/sum_weights
+    return _normalize
-
-    def __call__(self, sentence):
-        return self.model(sentence)
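The new `normalizer` helper is a softmax over one axis of the (batch, len1, len2) attention-score tensor: axis 1 normalizes each column, axis 2 each row. A minimal numpy sketch of the same computation (numpy stands in for the Keras backend; the names here are illustrative):

```python
import numpy as np

def normalize(att_weights, axis):
    # numpy equivalent of normalizer(axis): exponentiate, then divide by the
    # sum along the chosen axis so that slices along it form distributions
    exp_weights = np.exp(att_weights)
    return exp_weights / exp_weights.sum(axis=axis, keepdims=True)

att = np.zeros((1, 2, 3))        # uniform scores for a 2x3 alignment matrix
cols = normalize(att, axis=1)    # every column sums to 1 -> entries of 0.5
rows = normalize(att, axis=2)    # every row sums to 1 -> entries of 1/3
print(cols[0], rows[0])
```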
-class _Attention(object):
-    def __init__(self, max_length, nr_hidden, dropout=0.0, L2=0.0, activation='relu'):
-        self.max_length = max_length
-        self.model = Sequential()
-        self.model.add(Dropout(dropout, input_shape=(nr_hidden,)))
-        self.model.add(
-            Dense(nr_hidden, name='attend1',
-                  init='he_normal', W_regularizer=l2(L2),
-                  input_shape=(nr_hidden,), activation='relu'))
-        self.model.add(Dropout(dropout))
-        self.model.add(Dense(nr_hidden, name='attend2',
-                             init='he_normal', W_regularizer=l2(L2), activation='relu'))
-        self.model = TimeDistributed(self.model)
-
-    def __call__(self, sent1, sent2):
-        def _outer(AB):
-            att_ji = K.batch_dot(AB[1], K.permute_dimensions(AB[0], (0, 2, 1)))
-            return K.permute_dimensions(att_ji, (0, 2, 1))
-        return merge(
-            [self.model(sent1), self.model(sent2)],
-            mode=_outer,
-            output_shape=(self.max_length, self.max_length))
-
-
-class _SoftAlignment(object):
-    def __init__(self, max_length, nr_hidden):
-        self.max_length = max_length
-        self.nr_hidden = nr_hidden
-
-    def __call__(self, sentence, attention, transpose=False):
-        def _normalize_attention(attmat):
-            att = attmat[0]
-            mat = attmat[1]
-            if transpose:
-                att = K.permute_dimensions(att, (0, 2, 1))
-            # 3d softmax
-            e = K.exp(att - K.max(att, axis=-1, keepdims=True))
-            s = K.sum(e, axis=-1, keepdims=True)
-            sm_att = e / s
-            return K.batch_dot(sm_att, mat)
-        return merge([attention, sentence], mode=_normalize_attention,
-                     output_shape=(self.max_length, self.nr_hidden))  # Shape: (i, n)
-
-
-class _Comparison(object):
-    def __init__(self, words, nr_hidden, L2=0.0, dropout=0.0):
-        self.words = words
-        self.model = Sequential()
-        self.model.add(Dropout(dropout, input_shape=(nr_hidden*2,)))
-        self.model.add(Dense(nr_hidden, name='compare1',
-                             init='he_normal', W_regularizer=l2(L2)))
-        self.model.add(Activation('relu'))
-        self.model.add(Dropout(dropout))
-        self.model.add(Dense(nr_hidden, name='compare2',
-                             W_regularizer=l2(L2), init='he_normal'))
-        self.model.add(Activation('relu'))
-        self.model = TimeDistributed(self.model)
-
-    def __call__(self, sent, align, **kwargs):
-        result = self.model(merge([sent, align], mode='concat'))  # Shape: (i, n)
-        avged = GlobalAveragePooling1D()(result, mask=self.words)
-        maxed = GlobalMaxPooling1D()(result, mask=self.words)
-        merged = merge([avged, maxed])
-        result = BatchNormalization()(merged)
-        return result
-
-
-class _Entailment(object):
-    def __init__(self, nr_hidden, nr_out, dropout=0.0, L2=0.0):
-        self.model = Sequential()
-        self.model.add(Dropout(dropout, input_shape=(nr_hidden*2,)))
-        self.model.add(Dense(nr_hidden, name='entail1',
-                             init='he_normal', W_regularizer=l2(L2)))
-        self.model.add(Activation('relu'))
-        self.model.add(Dropout(dropout))
-        self.model.add(Dense(nr_hidden, name='entail2',
-                             init='he_normal', W_regularizer=l2(L2)))
-        self.model.add(Activation('relu'))
-        self.model.add(Dense(nr_out, name='entail_out', activation='softmax',
-                             W_regularizer=l2(L2), init='zero'))
-
-    def __call__(self, feats1, feats2):
-        features = merge([feats1, feats2], mode='concat')
-        return self.model(features)
-
-
-class _GlobalSumPooling1D(Layer):
-    '''Global sum pooling operation for temporal data.
-
-    # Input shape
-        3D tensor with shape: `(samples, steps, features)`.
-
-    # Output shape
-        2D tensor with shape: `(samples, features)`.
-    '''
-    def __init__(self, **kwargs):
-        super(_GlobalSumPooling1D, self).__init__(**kwargs)
-        self.input_spec = [InputSpec(ndim=3)]
-
-    def get_output_shape_for(self, input_shape):
-        return (input_shape[0], input_shape[2])
-
-    def call(self, x, mask=None):
-        if mask is not None:
-            return K.sum(x * K.clip(mask, 0, 1), axis=1)
-        else:
-            return K.sum(x, axis=1)
+def sum_word(x):
+    return K.sum(x, axis=1)


 def test_build_model():
-    vectors = numpy.ndarray((100, 8), dtype='float32')
+    vectors = np.ndarray((100, 8), dtype='float32')
     shape = (10, 16, 3)
-    settings = {'lr': 0.001, 'dropout': 0.2, 'gru_encode':True}
+    settings = {'lr': 0.001, 'dropout': 0.2, 'gru_encode':True, 'entail_dir':'both'}
     model = build_model(vectors, shape, settings)


 def test_fit_model():

     def _generate_X(nr_example, length, nr_vector):
-        X1 = numpy.ndarray((nr_example, length), dtype='int32')
+        X1 = np.ndarray((nr_example, length), dtype='int32')
         X1 *= X1 < nr_vector
         X1 *= 0 <= X1
-        X2 = numpy.ndarray((nr_example, length), dtype='int32')
+        X2 = np.ndarray((nr_example, length), dtype='int32')
         X2 *= X2 < nr_vector
         X2 *= 0 <= X2
         return [X1, X2]

     def _generate_Y(nr_example, nr_class):
-        ys = numpy.zeros((nr_example, nr_class), dtype='int32')
+        ys = np.zeros((nr_example, nr_class), dtype='int32')
         for i in range(nr_example):
             ys[i, i % nr_class] = 1
         return ys

-    vectors = numpy.ndarray((100, 8), dtype='float32')
+    vectors = np.ndarray((100, 8), dtype='float32')
     shape = (10, 16, 3)
-    settings = {'lr': 0.001, 'dropout': 0.2, 'gru_encode':True}
+    settings = {'lr': 0.001, 'dropout': 0.2, 'gru_encode':True, 'entail_dir':'both'}
     model = build_model(vectors, shape, settings)

     train_X = _generate_X(20, shape[0], vectors.shape[0])

@@ -261,8 +139,7 @@ def test_fit_model():
     dev_X = _generate_X(15, shape[0], vectors.shape[0])
     dev_Y = _generate_Y(15, shape[2])

-    model.fit(train_X, train_Y, validation_data=(dev_X, dev_Y), nb_epoch=5,
-              batch_size=4)
+    model.fit(train_X, train_Y, validation_data=(dev_X, dev_Y), epochs=5, batch_size=4)


 __all__ = [build_model]
@@ -1,8 +1,5 @@
+import numpy as np
 from keras.models import model_from_json
-import numpy
-import numpy.random
-import json
-from spacy.tokens.span import Span

 try:
     import cPickle as pickle
@@ -11,16 +8,23 @@ except ImportError:
 class KerasSimilarityShim(object):
+    entailment_types = ["entailment", "contradiction", "neutral"]

     @classmethod
-    def load(cls, path, nlp, get_features=None, max_length=100):
+    def load(cls, path, nlp, max_length=100, get_features=None):
+
         if get_features is None:
             get_features = get_word_ids
+
         with (path / 'config.json').open() as file_:
             model = model_from_json(file_.read())
         with (path / 'model').open('rb') as file_:
             weights = pickle.load(file_)
+
         embeddings = get_embeddings(nlp.vocab)
-        model.set_weights([embeddings] + weights)
+        weights.insert(1, embeddings)
+        model.set_weights(weights)

         return cls(model, get_features=get_features, max_length=max_length)

     def __init__(self, model, get_features=None, max_length=100):

@@ -32,58 +36,42 @@ class KerasSimilarityShim(object):
         doc.user_hooks['similarity'] = self.predict
         doc.user_span_hooks['similarity'] = self.predict

         return doc

     def predict(self, doc1, doc2):
-        x1 = self.get_features([doc1], max_length=self.max_length, tree_truncate=True)
-        x2 = self.get_features([doc2], max_length=self.max_length, tree_truncate=True)
+        x1 = self.get_features([doc1], max_length=self.max_length)
+        x2 = self.get_features([doc2], max_length=self.max_length)
         scores = self.model.predict([x1, x2])
-        return scores[0]
+
+        return self.entailment_types[scores.argmax()], scores.max()
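With this change, `predict` returns a (label, score) pair instead of the raw score vector. The final step can be sketched in plain numpy (the `scores` array here is made up for illustration):

```python
import numpy as np

entailment_types = ["entailment", "contradiction", "neutral"]

scores = np.asarray([[0.1, 0.7, 0.2]])  # shape (1, 3), like one row of model.predict output
label = entailment_types[scores.argmax()]
confidence = scores.max()
print(label, confidence)  # contradiction 0.7
```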
 def get_embeddings(vocab, nr_unk=100):
-    nr_vector = max(lex.rank for lex in vocab) + 1
-    vectors = numpy.zeros((nr_vector+nr_unk+2, vocab.vectors_length), dtype='float32')
+    # the extra +1 is for a zero vector representing sentence-final padding
+    num_vectors = max(lex.rank for lex in vocab) + 2
+
+    # create random vectors for OOV tokens
+    oov = np.random.normal(size=(nr_unk, vocab.vectors_length))
+    oov = oov / oov.sum(axis=1, keepdims=True)
+
+    vectors = np.zeros((num_vectors + nr_unk, vocab.vectors_length), dtype='float32')
+    vectors[1:(nr_unk + 1), ] = oov
     for lex in vocab:
-        if lex.has_vector:
-            vectors[lex.rank+1] = lex.vector / lex.vector_norm
+        if lex.has_vector and lex.vector_norm > 0:
+            vectors[nr_unk + lex.rank + 1] = lex.vector / lex.vector_norm
+
     return vectors
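The updated `get_embeddings` lays the table out as: row 0 reserved for padding, rows 1..nr_unk holding random OOV vectors, and real lexeme vectors offset by nr_unk. A self-contained numpy sketch of the same layout (the function name and toy sizes are illustrative, not from the source):

```python
import numpy as np

def build_vector_table(word_vectors, nr_unk=3, seed=0):
    # word_vectors: (n_words, dim) pretrained vectors.
    # Row 0 stays zero for padding; rows 1..nr_unk are random OOV vectors
    # normalized to sum to 1; real vectors start at row nr_unk + 1.
    rng = np.random.RandomState(seed)
    n_words, dim = word_vectors.shape
    table = np.zeros((1 + nr_unk + n_words, dim), dtype='float32')
    oov = rng.normal(size=(nr_unk, dim))
    table[1:nr_unk + 1] = oov / oov.sum(axis=1, keepdims=True)
    table[nr_unk + 1:] = word_vectors
    return table

vectors = np.eye(4, dtype='float32')       # four toy word vectors
table = build_vector_table(vectors, nr_unk=3)
print(table.shape)  # (8, 4): 1 padding row + 3 OOV rows + 4 word rows
```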
-def get_word_ids(docs, rnn_encode=False, tree_truncate=False, max_length=100, nr_unk=100):
-    Xs = numpy.zeros((len(docs), max_length), dtype='int32')
+def get_word_ids(docs, max_length=100, nr_unk=100):
+    Xs = np.zeros((len(docs), max_length), dtype='int32')
+
     for i, doc in enumerate(docs):
-        if tree_truncate:
-            if isinstance(doc, Span):
-                queue = [doc.root]
-            else:
-                queue = [sent.root for sent in doc.sents]
-        else:
-            queue = list(doc)
-        words = []
-        while len(words) <= max_length and queue:
-            word = queue.pop(0)
-            if rnn_encode or (not word.is_punct and not word.is_space):
-                words.append(word)
-            if tree_truncate:
-                queue.extend(list(word.lefts))
-                queue.extend(list(word.rights))
-        words.sort()
-        for j, token in enumerate(words):
-            if token.has_vector:
-                Xs[i, j] = token.rank+1
-            else:
-                Xs[i, j] = (token.shape % (nr_unk-1))+2
-            j += 1
-            if j >= max_length:
+        for j, token in enumerate(doc):
+            if j == max_length:
                 break
-        else:
-            Xs[i, len(words)] = 1
+            if token.has_vector:
+                Xs[i, j] = token.rank + nr_unk + 1
+            else:
+                Xs[i, j] = token.rank % nr_unk + 1
     return Xs
-
-
-def create_similarity_pipeline(nlp, max_length=100):
-    return [
-        nlp.tagger,
-        nlp.entity,
-        nlp.parser,
-        KerasSimilarityShim.load(nlp.path / 'similarity', nlp, max_length)
-    ]
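The simplified `get_word_ids` maps each token to a row in that vector table: 0 means padding, in-vocabulary tokens become rank + nr_unk + 1, and OOV tokens are hashed into one of the nr_unk random rows. A runnable sketch with a stand-in `Token` type instead of spaCy tokens (names and sizes here are illustrative):

```python
import numpy as np
from collections import namedtuple

Token = namedtuple('Token', ['rank', 'has_vector'])

def word_ids(docs, max_length=5, nr_unk=3):
    # Same id scheme as get_word_ids() above, minus the spaCy dependency.
    Xs = np.zeros((len(docs), max_length), dtype='int32')
    for i, doc in enumerate(docs):
        for j, token in enumerate(doc):
            if j == max_length:
                break
            if token.has_vector:
                Xs[i, j] = token.rank + nr_unk + 1   # offset past OOV rows
            else:
                Xs[i, j] = token.rank % nr_unk + 1   # hash into OOV rows
    return Xs

doc = [Token(0, True), Token(7, False), Token(2, True)]
print(word_ids([doc]))  # [[4 2 6 0 0]]
```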
examples/notebooks/Decompositional Attention.ipynb (new file, 955 lines)
@@ -0,0 +1,955 @@
# Natural language inference using spaCy and Keras

## Introduction

This notebook details an implementation of the natural language inference model presented in [(Parikh et al, 2016)](https://arxiv.org/abs/1606.01933). The model is notable for the small number of parameters *and hyperparameters* it specifies, while still yielding good performance.

## Constructing the dataset

```python
import spacy
import numpy as np
```

We only need the GloVe vectors from spaCy, not a full NLP pipeline.

```python
nlp = spacy.load('en_vectors_web_lg')
```
Function to load the SNLI dataset. The categories are converted to one-hot representation. The function comes from an example in spaCy.

```python
import ujson as json
from keras.utils import to_categorical

LABELS = {'entailment': 0, 'contradiction': 1, 'neutral': 2}
def read_snli(path):
    texts1 = []
    texts2 = []
    labels = []
    with open(path, 'r') as file_:
        for line in file_:
            eg = json.loads(line)
            label = eg['gold_label']
            if label == '-':  # per Parikh, ignore - SNLI entries
                continue
            texts1.append(eg['sentence1'])
            texts2.append(eg['sentence2'])
            labels.append(LABELS[label])
    return texts1, texts2, to_categorical(np.asarray(labels, dtype='int32'))
```

    Using TensorFlow backend.

Because Keras can do the train/test split for us, we'll load *all* SNLI triples from one file.

```python
texts, hypotheses, labels = read_snli('snli/snli_1.0_train.jsonl')
```
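The label conversion above relies on `keras.utils.to_categorical`; its effect can be reproduced in a few lines of numpy (a stand-in sketch, not the Keras implementation):

```python
import numpy as np

LABELS = {'entailment': 0, 'contradiction': 1, 'neutral': 2}

def to_one_hot(labels, num_classes=3):
    # each integer label becomes a row with a single 1 at that index
    out = np.zeros((len(labels), num_classes), dtype='int32')
    out[np.arange(len(labels)), labels] = 1
    return out

ys = to_one_hot([LABELS['neutral'], LABELS['entailment']])
print(ys)  # [[0 0 1], [1 0 0]]
```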
```python
def create_dataset(nlp, texts, hypotheses, num_oov, max_length, norm_vectors=True):
    sents = texts + hypotheses

    # the extra +1 is for a zero vector representing NULL for padding
    num_vectors = max(lex.rank for lex in nlp.vocab) + 2

    # create random vectors for OOV tokens
    oov = np.random.normal(size=(num_oov, nlp.vocab.vectors_length))
    oov = oov / oov.sum(axis=1, keepdims=True)

    vectors = np.zeros((num_vectors + num_oov, nlp.vocab.vectors_length), dtype='float32')
    vectors[num_vectors:, ] = oov
    for lex in nlp.vocab:
        if lex.has_vector and lex.vector_norm > 0:
            vectors[lex.rank + 1] = lex.vector / lex.vector_norm if norm_vectors == True else lex.vector

    sents_as_ids = []
    for sent in sents:
        doc = nlp(sent)
        word_ids = []

        for i, token in enumerate(doc):
            # skip odd spaces from tokenizer
            if token.has_vector and token.vector_norm == 0:
                continue

            if i > max_length:
                break

            if token.has_vector:
                word_ids.append(token.rank + 1)
            else:
                # if we don't have a vector, pick an OOV entry
                word_ids.append(token.rank % num_oov + num_vectors)

        # there must be a simpler way of generating padded arrays from lists...
        word_id_vec = np.zeros((max_length), dtype='int')
        clipped_len = min(max_length, len(word_ids))
        word_id_vec[:clipped_len] = word_ids[:clipped_len]
        sents_as_ids.append(word_id_vec)

    return vectors, np.array(sents_as_ids[:len(texts)]), np.array(sents_as_ids[len(texts):])
```

```python
sem_vectors, text_vectors, hypothesis_vectors = create_dataset(nlp, texts, hypotheses, 100, 50, True)
```

```python
texts_test, hypotheses_test, labels_test = read_snli('snli/snli_1.0_test.jsonl')
```

```python
_, text_vectors_test, hypothesis_vectors_test = create_dataset(nlp, texts_test, hypotheses_test, 100, 50, True)
```

We use spaCy to tokenize the sentences and return, when available, a semantic vector for each token.

OOV terms (tokens for which no semantic vector is available) are assigned to one of a set of randomly-generated OOV vectors, per (Parikh et al, 2016).

Note that we will clip sentences to 50 words maximum.

```python
from keras import layers, Model, models
from keras import backend as K
```
## Building the model

The embedding layer copies the 300-dimensional GloVe vectors into GPU memory. Per (Parikh et al, 2016), the vectors, which are not adapted during training, are projected down to lower-dimensional vectors using a trained projection matrix.

```python
def create_embedding(vectors, max_length, projected_dim):
    return models.Sequential([
        layers.Embedding(
            vectors.shape[0],
            vectors.shape[1],
            input_length=max_length,
            weights=[vectors],
            trainable=False),

        layers.TimeDistributed(
            layers.Dense(projected_dim,
                         activation=None,
                         use_bias=False))
    ])
```
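The projection described above amounts to a frozen embedding lookup followed by a learned linear map. In plain numpy terms (toy sizes and names chosen for illustration):

```python
import numpy as np

rng = np.random.RandomState(0)
glove = rng.rand(1000, 300).astype('float32')  # frozen pretrained vectors
proj = rng.rand(300, 200).astype('float32')    # trained projection matrix (no bias)

word_ids = np.array([3, 17, 42])               # one tokenized sentence as row indices
projected = glove[word_ids] @ proj             # lookup, then project 300d -> 200d
print(projected.shape)  # (3, 200)
```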
The Parikh model makes use of three feedforward blocks that construct nonlinear combinations of their input. Each block contains two ReLU layers and two dropout layers.

```python
def create_feedforward(num_units=200, activation='relu', dropout_rate=0.2):
    return models.Sequential([
        layers.Dense(num_units, activation=activation),
        layers.Dropout(dropout_rate),
        layers.Dense(num_units, activation=activation),
        layers.Dropout(dropout_rate)
    ])
```
The basic idea of the (Parikh et al, 2016) model is to:

1. *Align*: Construct an alignment of subphrases in the text and hypothesis using an attention-like mechanism, called "decompositional" because the layer is applied to each of the two sentences individually rather than to their product. The dot product of the nonlinear transformations of the inputs is then normalized vertically and horizontally to yield a pair of "soft" alignment structures, from text->hypothesis and hypothesis->text. Concretely, for each word in one sentence, a multinomial distribution is computed over the words of the other sentence, by learning a multinomial logistic with softmax target.
2. *Compare*: Each word is now compared to its aligned phrase using a function modeled as a two-layer feedforward ReLU network. The output is a high-dimensional representation of the strength of association between word and aligned phrase.
3. *Aggregate*: The comparison vectors are summed, separately, for the text and the hypothesis. The result is two vectors: one that describes the degree of association of the text to the hypothesis, and the second, of the hypothesis to the text.
4. Finally, these two vectors are processed by a dense layer followed by a softmax classifier, as usual.

Note that because in entailment the truth conditions of the consequent must be a subset of those of the antecedent, it is not obvious that we need both vectors in step (3). Entailment is not symmetric. It may be enough to just use the hypothesis->text vector. We will explore this possibility later.
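The align/compare/aggregate steps above can be sketched in numpy for a single sentence pair. This toy version omits the learned feedforward nets F and G (they are replaced by the identity), so it shows only the tensor shapes and the two soft alignments:

```python
import numpy as np

rng = np.random.RandomState(0)
a = rng.rand(4, 8)   # projected embeddings of the text (4 words, dim 8)
b = rng.rand(3, 8)   # projected embeddings of the hypothesis (3 words)

# step 1: attend -- raw scores, then normalize along each direction
scores = a @ b.T                                               # (4, 3)
attn_ab = np.exp(scores) / np.exp(scores).sum(axis=1, keepdims=True)  # rows sum to 1
attn_ba = np.exp(scores) / np.exp(scores).sum(axis=0, keepdims=True)  # cols sum to 1
beta = attn_ab @ b     # (4, 8): hypothesis subphrase aligned to each text word
alpha = attn_ba.T @ a  # (3, 8): text subphrase aligned to each hypothesis word

# step 2: compare -- concatenate each word with its aligned phrase
v1 = np.concatenate([a, beta], axis=1)   # (4, 16); a feedforward net would follow
v2 = np.concatenate([b, alpha], axis=1)  # (3, 16)

# step 3: aggregate -- sum over words, one vector per direction
print(v1.sum(axis=0).shape, v2.sum(axis=0).shape)  # (16,) (16,)
```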
We need a couple of little functions for Lambda layers to normalize and aggregate weights:

```python
def normalizer(axis):
    def _normalize(att_weights):
        exp_weights = K.exp(att_weights)
        sum_weights = K.sum(exp_weights, axis=axis, keepdims=True)
        return exp_weights/sum_weights
    return _normalize

def sum_word(x):
    return K.sum(x, axis=1)
```
```python
def build_model(vectors, max_length, num_hidden, num_classes, projected_dim, entail_dir='both'):
    input1 = layers.Input(shape=(max_length,), dtype='int32', name='words1')
    input2 = layers.Input(shape=(max_length,), dtype='int32', name='words2')

    # embeddings (projected)
    embed = create_embedding(vectors, max_length, projected_dim)

    a = embed(input1)
    b = embed(input2)

    # step 1: attend
    F = create_feedforward(num_hidden)
    att_weights = layers.dot([F(a), F(b)], axes=-1)

    G = create_feedforward(num_hidden)

    if entail_dir == 'both':
        norm_weights_a = layers.Lambda(normalizer(1))(att_weights)
        norm_weights_b = layers.Lambda(normalizer(2))(att_weights)
        alpha = layers.dot([norm_weights_a, a], axes=1)
        beta = layers.dot([norm_weights_b, b], axes=1)

        # step 2: compare
        comp1 = layers.concatenate([a, beta])
        comp2 = layers.concatenate([b, alpha])
        v1 = layers.TimeDistributed(G)(comp1)
        v2 = layers.TimeDistributed(G)(comp2)

        # step 3: aggregate
        v1_sum = layers.Lambda(sum_word)(v1)
        v2_sum = layers.Lambda(sum_word)(v2)
        concat = layers.concatenate([v1_sum, v2_sum])
    elif entail_dir == 'left':
        norm_weights_a = layers.Lambda(normalizer(1))(att_weights)
        alpha = layers.dot([norm_weights_a, a], axes=1)
        comp2 = layers.concatenate([b, alpha])
        v2 = layers.TimeDistributed(G)(comp2)
        v2_sum = layers.Lambda(sum_word)(v2)
        concat = v2_sum
    else:
        norm_weights_b = layers.Lambda(normalizer(2))(att_weights)
        beta = layers.dot([norm_weights_b, b], axes=1)
        comp1 = layers.concatenate([a, beta])
        v1 = layers.TimeDistributed(G)(comp1)
        v1_sum = layers.Lambda(sum_word)(v1)
        concat = v1_sum

    H = create_feedforward(num_hidden)
    out = H(concat)
    out = layers.Dense(num_classes, activation='softmax')(out)

    model = Model([input1, input2], out)

    model.compile(optimizer='adam',
                  loss='categorical_crossentropy',
                  metrics=['accuracy'])
    return model
```
Output (truncated here):

    __________________________________________________________________________________________________
    Layer (type)                    Output Shape         Param #     Connected to
    ==================================================================================================
    words1 (InputLayer)             (None, 50)           0
    __________________________________________________________________________________________________
    words2 (InputLayer)             (None, 50)           0
    __________________________________________________________________________________________________
    sequential_1 (Sequential)       (None, 50, 200)      321381600   words1[0][0]
                                                                     words2[0][0]
    __________________________________________________________________________________________________
    sequential_2 (Sequential)       (None, 50, 200)      80400       sequential_1[1][0]
                                                                     sequential_1[2][0]
    __________________________________________________________________________________________________
    dot_1 (Dot)                     (None, 50, 50)       0           sequential_2[1][0]
                                                                     sequential_2[2][0]
    __________________________________________________________________________________________________
    lambda_2 (Lambda)               (None, 50, 50)       0           dot_1[0][0]
    __________________________________________________________________________________________________
    lambda_1 (Lambda)               (None, 50, 50)       0           dot_1[0][0]
    __________________________________________________________________________________________________
    dot_3 (Dot)                     (None, 50, 200)      0           lambda_2[0][0]
                                                                     sequential_1[2][0]
    __________________________________________________________________________________________________
    dot_2 (Dot)                     (None, 50, 200)      0           lambda_1[0][0]
                                                                     sequential_1[1][0]
    __________________________________________________________________________________________________
    concatenate_1 (Concatenate)     (None, 50, 400)      0           sequential_1[1][0]
                                                                     dot_3[0][0]
    __________________________________________________________________________________________________
    concatenate_2 (Concatenate)     (None, 50, 400)      0           sequential_1[2][0]
||||
" dot_2[0][0] \n",
|
||||
"__________________________________________________________________________________________________\n",
|
||||
"time_distributed_2 (TimeDistrib (None, 50, 200) 120400 concatenate_1[0][0] \n",
|
||||
"__________________________________________________________________________________________________\n",
|
||||
"time_distributed_3 (TimeDistrib (None, 50, 200) 120400 concatenate_2[0][0] \n",
|
||||
"__________________________________________________________________________________________________\n",
|
||||
"lambda_3 (Lambda) (None, 200) 0 time_distributed_2[0][0] \n",
|
||||
"__________________________________________________________________________________________________\n",
|
||||
"lambda_4 (Lambda) (None, 200) 0 time_distributed_3[0][0] \n",
|
||||
"__________________________________________________________________________________________________\n",
|
||||
"concatenate_3 (Concatenate) (None, 400) 0 lambda_3[0][0] \n",
|
||||
" lambda_4[0][0] \n",
|
||||
"__________________________________________________________________________________________________\n",
|
||||
"sequential_4 (Sequential) (None, 200) 120400 concatenate_3[0][0] \n",
|
||||
"__________________________________________________________________________________________________\n",
|
||||
"dense_8 (Dense) (None, 3) 603 sequential_4[1][0] \n",
|
||||
"==================================================================================================\n",
|
||||
"Total params: 321,703,403\n",
|
||||
"Trainable params: 381,803\n",
|
||||
"Non-trainable params: 321,321,600\n",
|
||||
"__________________________________________________________________________________________________\n"
|
||||
]
|
||||
}
|
||||
],
|
||||
"source": [
|
||||
"K.clear_session()\n",
|
||||
"m = build_model(sem_vectors, 50, 200, 3, 200)\n",
|
||||
"m.summary()"
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "markdown",
|
||||
"metadata": {},
|
||||
"source": [
|
||||
"The number of trainable parameters, ~381k, matches the figure reported by Parikh et al., so we're on the right track."
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## Training the model"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"Parikh et al. use tiny batches of 4, training for 50 million batches, which amounts to around 500 epochs. Here we'll use large batches to make better use of the GPU, and train for fewer epochs for the purposes of this experiment."
]
},
{
"cell_type": "code",
"execution_count": 19,
"metadata": {
"scrolled": true
},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"Train on 549367 samples, validate on 9824 samples\n",
"Epoch 1/50\n",
"549367/549367 [==============================] - 34s 62us/step - loss: 0.7599 - acc: 0.6617 - val_loss: 0.5396 - val_acc: 0.7861\n",
"Epoch 2/50\n",
"549367/549367 [==============================] - 33s 60us/step - loss: 0.5611 - acc: 0.7763 - val_loss: 0.4892 - val_acc: 0.8085\n",
"Epoch 3/50\n",
"549367/549367 [==============================] - 33s 60us/step - loss: 0.5212 - acc: 0.7948 - val_loss: 0.4574 - val_acc: 0.8261\n",
"Epoch 4/50\n",
"549367/549367 [==============================] - 33s 60us/step - loss: 0.4986 - acc: 0.8045 - val_loss: 0.4410 - val_acc: 0.8274\n",
"Epoch 5/50\n",
"549367/549367 [==============================] - 33s 60us/step - loss: 0.4819 - acc: 0.8114 - val_loss: 0.4224 - val_acc: 0.8383\n",
"Epoch 6/50\n",
"549367/549367 [==============================] - 33s 60us/step - loss: 0.4714 - acc: 0.8166 - val_loss: 0.4200 - val_acc: 0.8379\n",
"Epoch 7/50\n",
"549367/549367 [==============================] - 33s 60us/step - loss: 0.4633 - acc: 0.8203 - val_loss: 0.4098 - val_acc: 0.8457\n",
"Epoch 8/50\n",
"549367/549367 [==============================] - 33s 60us/step - loss: 0.4558 - acc: 0.8232 - val_loss: 0.4114 - val_acc: 0.8415\n",
"Epoch 9/50\n",
"549367/549367 [==============================] - 33s 60us/step - loss: 0.4508 - acc: 0.8250 - val_loss: 0.4062 - val_acc: 0.8477\n",
"Epoch 10/50\n",
"549367/549367 [==============================] - 33s 60us/step - loss: 0.4433 - acc: 0.8286 - val_loss: 0.3982 - val_acc: 0.8486\n",
"Epoch 11/50\n",
"549367/549367 [==============================] - 33s 60us/step - loss: 0.4388 - acc: 0.8307 - val_loss: 0.3953 - val_acc: 0.8497\n",
"Epoch 12/50\n",
"549367/549367 [==============================] - 33s 60us/step - loss: 0.4351 - acc: 0.8321 - val_loss: 0.3973 - val_acc: 0.8522\n",
"Epoch 13/50\n",
"549367/549367 [==============================] - 33s 60us/step - loss: 0.4309 - acc: 0.8342 - val_loss: 0.3939 - val_acc: 0.8539\n",
"Epoch 14/50\n",
"549367/549367 [==============================] - 33s 60us/step - loss: 0.4269 - acc: 0.8355 - val_loss: 0.3932 - val_acc: 0.8517\n",
"Epoch 15/50\n",
"549367/549367 [==============================] - 33s 60us/step - loss: 0.4247 - acc: 0.8369 - val_loss: 0.3938 - val_acc: 0.8515\n",
"Epoch 16/50\n",
"549367/549367 [==============================] - 33s 60us/step - loss: 0.4208 - acc: 0.8379 - val_loss: 0.3936 - val_acc: 0.8504\n",
"Epoch 17/50\n",
"549367/549367 [==============================] - 33s 60us/step - loss: 0.4194 - acc: 0.8390 - val_loss: 0.3885 - val_acc: 0.8560\n",
"Epoch 18/50\n",
"549367/549367 [==============================] - 33s 60us/step - loss: 0.4162 - acc: 0.8402 - val_loss: 0.3874 - val_acc: 0.8561\n",
"Epoch 19/50\n",
"549367/549367 [==============================] - 33s 60us/step - loss: 0.4140 - acc: 0.8409 - val_loss: 0.3889 - val_acc: 0.8545\n",
"Epoch 20/50\n",
"549367/549367 [==============================] - 33s 60us/step - loss: 0.4114 - acc: 0.8426 - val_loss: 0.3864 - val_acc: 0.8583\n",
"Epoch 21/50\n",
"549367/549367 [==============================] - 33s 60us/step - loss: 0.4092 - acc: 0.8430 - val_loss: 0.3870 - val_acc: 0.8561\n",
"Epoch 22/50\n",
"549367/549367 [==============================] - 33s 60us/step - loss: 0.4062 - acc: 0.8442 - val_loss: 0.3852 - val_acc: 0.8577\n",
"Epoch 23/50\n",
"549367/549367 [==============================] - 33s 60us/step - loss: 0.4050 - acc: 0.8450 - val_loss: 0.3850 - val_acc: 0.8578\n",
"Epoch 24/50\n",
"549367/549367 [==============================] - 33s 60us/step - loss: 0.4035 - acc: 0.8455 - val_loss: 0.3825 - val_acc: 0.8555\n",
"Epoch 25/50\n",
"549367/549367 [==============================] - 33s 60us/step - loss: 0.4018 - acc: 0.8460 - val_loss: 0.3837 - val_acc: 0.8573\n",
"Epoch 26/50\n",
"549367/549367 [==============================] - 33s 60us/step - loss: 0.3989 - acc: 0.8476 - val_loss: 0.3843 - val_acc: 0.8599\n",
"Epoch 27/50\n",
"549367/549367 [==============================] - 33s 60us/step - loss: 0.3979 - acc: 0.8481 - val_loss: 0.3841 - val_acc: 0.8589\n",
"Epoch 28/50\n",
"549367/549367 [==============================] - 33s 60us/step - loss: 0.3967 - acc: 0.8484 - val_loss: 0.3811 - val_acc: 0.8575\n",
"Epoch 29/50\n",
"549367/549367 [==============================] - 33s 60us/step - loss: 0.3956 - acc: 0.8492 - val_loss: 0.3829 - val_acc: 0.8589\n",
"Epoch 30/50\n",
"549367/549367 [==============================] - 33s 60us/step - loss: 0.3938 - acc: 0.8499 - val_loss: 0.3859 - val_acc: 0.8562\n",
"Epoch 31/50\n",
"549367/549367 [==============================] - 33s 60us/step - loss: 0.3925 - acc: 0.8500 - val_loss: 0.3798 - val_acc: 0.8587\n",
"Epoch 32/50\n",
"549367/549367 [==============================] - 33s 60us/step - loss: 0.3906 - acc: 0.8509 - val_loss: 0.3834 - val_acc: 0.8569\n",
"Epoch 33/50\n",
"549367/549367 [==============================] - 33s 60us/step - loss: 0.3893 - acc: 0.8511 - val_loss: 0.3806 - val_acc: 0.8588\n",
"Epoch 34/50\n",
"549367/549367 [==============================] - 33s 60us/step - loss: 0.3885 - acc: 0.8515 - val_loss: 0.3828 - val_acc: 0.8603\n",
"Epoch 35/50\n",
"549367/549367 [==============================] - 33s 60us/step - loss: 0.3879 - acc: 0.8520 - val_loss: 0.3800 - val_acc: 0.8594\n",
"Epoch 36/50\n",
"549367/549367 [==============================] - 33s 60us/step - loss: 0.3860 - acc: 0.8530 - val_loss: 0.3796 - val_acc: 0.8577\n",
"Epoch 37/50\n",
"549367/549367 [==============================] - 33s 60us/step - loss: 0.3856 - acc: 0.8532 - val_loss: 0.3857 - val_acc: 0.8591\n",
"Epoch 38/50\n",
"549367/549367 [==============================] - 33s 60us/step - loss: 0.3838 - acc: 0.8535 - val_loss: 0.3835 - val_acc: 0.8603\n",
"Epoch 39/50\n",
"549367/549367 [==============================] - 33s 60us/step - loss: 0.3830 - acc: 0.8543 - val_loss: 0.3830 - val_acc: 0.8599\n",
"Epoch 40/50\n",
"549367/549367 [==============================] - 33s 60us/step - loss: 0.3818 - acc: 0.8548 - val_loss: 0.3832 - val_acc: 0.8559\n",
"Epoch 41/50\n",
"549367/549367 [==============================] - 33s 60us/step - loss: 0.3806 - acc: 0.8551 - val_loss: 0.3845 - val_acc: 0.8553\n",
"Epoch 42/50\n",
"549367/549367 [==============================] - 33s 60us/step - loss: 0.3803 - acc: 0.8550 - val_loss: 0.3789 - val_acc: 0.8617\n",
"Epoch 43/50\n",
"549367/549367 [==============================] - 33s 60us/step - loss: 0.3791 - acc: 0.8556 - val_loss: 0.3835 - val_acc: 0.8580\n",
"Epoch 44/50\n",
"549367/549367 [==============================] - 33s 60us/step - loss: 0.3778 - acc: 0.8565 - val_loss: 0.3799 - val_acc: 0.8580\n",
"Epoch 45/50\n",
"549367/549367 [==============================] - 33s 60us/step - loss: 0.3766 - acc: 0.8571 - val_loss: 0.3790 - val_acc: 0.8625\n",
"Epoch 46/50\n",
"549367/549367 [==============================] - 33s 60us/step - loss: 0.3770 - acc: 0.8569 - val_loss: 0.3820 - val_acc: 0.8590\n",
"Epoch 47/50\n",
"549367/549367 [==============================] - 33s 60us/step - loss: 0.3761 - acc: 0.8573 - val_loss: 0.3831 - val_acc: 0.8581\n",
"Epoch 48/50\n",
"549367/549367 [==============================] - 33s 60us/step - loss: 0.3739 - acc: 0.8579 - val_loss: 0.3828 - val_acc: 0.8599\n",
"Epoch 49/50\n",
"549367/549367 [==============================] - 33s 60us/step - loss: 0.3738 - acc: 0.8577 - val_loss: 0.3785 - val_acc: 0.8590\n",
"Epoch 50/50\n",
"549367/549367 [==============================] - 33s 60us/step - loss: 0.3726 - acc: 0.8580 - val_loss: 0.3820 - val_acc: 0.8585\n"
]
},
{
"data": {
"text/plain": [
"<keras.callbacks.History at 0x7f5c9f49c438>"
]
},
"execution_count": 19,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"m.fit([text_vectors, hypothesis_vectors], labels, batch_size=1024, epochs=50, validation_data=([text_vectors_test, hypothesis_vectors_test], labels_test))"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"The result is broadly in the region reported by Parikh et al.: ~86% vs. 86.3%. The small difference might be accounted for by differences in `max_length` (here set to 50), in the training regime, and by the fact that here we use Keras' built-in validation splitting rather than the SNLI test set."
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## Experiment: the asymmetric model"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"It was suggested earlier that, based on the semantics of entailment, the vector representing the strength of association between the hypothesis and the text is all that is needed for classifying the entailment.\n",
"\n",
"The following model removes consideration of the complementary vector (text to hypothesis) from the computation. This will decrease the parameter count slightly, because the final dense layers will be smaller, and speed up the forward pass when predicting, because fewer calculations will be needed."
]
},
{
"cell_type": "code",
"execution_count": 20,
"metadata": {},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"__________________________________________________________________________________________________\n",
"Layer (type) Output Shape Param # Connected to \n",
"==================================================================================================\n",
"words2 (InputLayer) (None, 50) 0 \n",
"__________________________________________________________________________________________________\n",
"words1 (InputLayer) (None, 50) 0 \n",
"__________________________________________________________________________________________________\n",
"sequential_5 (Sequential) (None, 50, 200) 321381600 words1[0][0] \n",
" words2[0][0] \n",
"__________________________________________________________________________________________________\n",
"sequential_6 (Sequential) (None, 50, 200) 80400 sequential_5[1][0] \n",
" sequential_5[2][0] \n",
"__________________________________________________________________________________________________\n",
"dot_4 (Dot) (None, 50, 50) 0 sequential_6[1][0] \n",
" sequential_6[2][0] \n",
"__________________________________________________________________________________________________\n",
"lambda_5 (Lambda) (None, 50, 50) 0 dot_4[0][0] \n",
"__________________________________________________________________________________________________\n",
"dot_5 (Dot) (None, 50, 200) 0 lambda_5[0][0] \n",
" sequential_5[1][0] \n",
"__________________________________________________________________________________________________\n",
"concatenate_4 (Concatenate) (None, 50, 400) 0 sequential_5[2][0] \n",
" dot_5[0][0] \n",
"__________________________________________________________________________________________________\n",
"time_distributed_5 (TimeDistrib (None, 50, 200) 120400 concatenate_4[0][0] \n",
"__________________________________________________________________________________________________\n",
"lambda_6 (Lambda) (None, 200) 0 time_distributed_5[0][0] \n",
"__________________________________________________________________________________________________\n",
"sequential_8 (Sequential) (None, 200) 80400 lambda_6[0][0] \n",
"__________________________________________________________________________________________________\n",
"dense_16 (Dense) (None, 3) 603 sequential_8[1][0] \n",
"==================================================================================================\n",
"Total params: 321,663,403\n",
"Trainable params: 341,803\n",
"Non-trainable params: 321,321,600\n",
"__________________________________________________________________________________________________\n"
]
}
],
"source": [
"m1 = build_model(sem_vectors, 50, 200, 3, 200, 'left')\n",
"m1.summary()"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"The parameter count has indeed decreased by 40,000, corresponding to the 200x200 smaller H function."
]
},
{
"cell_type": "code",
"execution_count": 21,
"metadata": {},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"Train on 549367 samples, validate on 9824 samples\n",
"Epoch 1/50\n",
"549367/549367 [==============================] - 25s 46us/step - loss: 0.7331 - acc: 0.6770 - val_loss: 0.5257 - val_acc: 0.7936\n",
"Epoch 2/50\n",
"549367/549367 [==============================] - 25s 45us/step - loss: 0.5518 - acc: 0.7799 - val_loss: 0.4717 - val_acc: 0.8159\n",
"Epoch 3/50\n",
"549367/549367 [==============================] - 25s 45us/step - loss: 0.5147 - acc: 0.7967 - val_loss: 0.4449 - val_acc: 0.8278\n",
"Epoch 4/50\n",
"549367/549367 [==============================] - 25s 45us/step - loss: 0.4948 - acc: 0.8060 - val_loss: 0.4326 - val_acc: 0.8344\n",
"Epoch 5/50\n",
"549367/549367 [==============================] - 25s 45us/step - loss: 0.4814 - acc: 0.8122 - val_loss: 0.4247 - val_acc: 0.8359\n",
"Epoch 6/50\n",
"549367/549367 [==============================] - 25s 45us/step - loss: 0.4712 - acc: 0.8162 - val_loss: 0.4143 - val_acc: 0.8430\n",
"Epoch 7/50\n",
"549367/549367 [==============================] - 25s 45us/step - loss: 0.4635 - acc: 0.8205 - val_loss: 0.4172 - val_acc: 0.8401\n",
"Epoch 8/50\n",
"549367/549367 [==============================] - 25s 45us/step - loss: 0.4570 - acc: 0.8223 - val_loss: 0.4106 - val_acc: 0.8422\n",
"Epoch 9/50\n",
"549367/549367 [==============================] - 25s 45us/step - loss: 0.4505 - acc: 0.8259 - val_loss: 0.4043 - val_acc: 0.8451\n",
"Epoch 10/50\n",
"549367/549367 [==============================] - 25s 45us/step - loss: 0.4459 - acc: 0.8280 - val_loss: 0.4050 - val_acc: 0.8467\n",
"Epoch 11/50\n",
"549367/549367 [==============================] - 25s 45us/step - loss: 0.4405 - acc: 0.8300 - val_loss: 0.3975 - val_acc: 0.8481\n",
"Epoch 12/50\n",
"549367/549367 [==============================] - 25s 45us/step - loss: 0.4360 - acc: 0.8324 - val_loss: 0.4026 - val_acc: 0.8496\n",
"Epoch 13/50\n",
"549367/549367 [==============================] - 25s 45us/step - loss: 0.4327 - acc: 0.8334 - val_loss: 0.4024 - val_acc: 0.8471\n",
"Epoch 14/50\n",
"549367/549367 [==============================] - 24s 45us/step - loss: 0.4293 - acc: 0.8350 - val_loss: 0.3955 - val_acc: 0.8496\n",
"Epoch 15/50\n",
"549367/549367 [==============================] - 25s 45us/step - loss: 0.4263 - acc: 0.8369 - val_loss: 0.3980 - val_acc: 0.8490\n",
"Epoch 16/50\n",
"549367/549367 [==============================] - 25s 45us/step - loss: 0.4236 - acc: 0.8377 - val_loss: 0.3958 - val_acc: 0.8496\n",
"Epoch 17/50\n",
"549367/549367 [==============================] - 25s 45us/step - loss: 0.4213 - acc: 0.8384 - val_loss: 0.3954 - val_acc: 0.8496\n",
"Epoch 18/50\n",
"549367/549367 [==============================] - 24s 45us/step - loss: 0.4187 - acc: 0.8394 - val_loss: 0.3929 - val_acc: 0.8514\n",
"Epoch 19/50\n",
"549367/549367 [==============================] - 25s 45us/step - loss: 0.4157 - acc: 0.8409 - val_loss: 0.3939 - val_acc: 0.8507\n",
"Epoch 20/50\n",
"549367/549367 [==============================] - 25s 45us/step - loss: 0.4135 - acc: 0.8417 - val_loss: 0.3953 - val_acc: 0.8522\n",
"Epoch 21/50\n",
"549367/549367 [==============================] - 25s 45us/step - loss: 0.4122 - acc: 0.8424 - val_loss: 0.3974 - val_acc: 0.8506\n",
"Epoch 22/50\n",
"549367/549367 [==============================] - 25s 45us/step - loss: 0.4099 - acc: 0.8435 - val_loss: 0.3918 - val_acc: 0.8522\n",
"Epoch 23/50\n",
"549367/549367 [==============================] - 25s 45us/step - loss: 0.4075 - acc: 0.8443 - val_loss: 0.3901 - val_acc: 0.8513\n",
"Epoch 24/50\n",
"549367/549367 [==============================] - 24s 44us/step - loss: 0.4067 - acc: 0.8447 - val_loss: 0.3885 - val_acc: 0.8543\n",
"Epoch 25/50\n",
"549367/549367 [==============================] - 25s 45us/step - loss: 0.4047 - acc: 0.8454 - val_loss: 0.3846 - val_acc: 0.8531\n",
"Epoch 26/50\n",
"549367/549367 [==============================] - 25s 45us/step - loss: 0.4031 - acc: 0.8461 - val_loss: 0.3864 - val_acc: 0.8562\n",
"Epoch 27/50\n",
"549367/549367 [==============================] - 24s 45us/step - loss: 0.4020 - acc: 0.8467 - val_loss: 0.3874 - val_acc: 0.8546\n",
"Epoch 28/50\n",
"549367/549367 [==============================] - 24s 45us/step - loss: 0.4001 - acc: 0.8473 - val_loss: 0.3848 - val_acc: 0.8534\n",
"Epoch 29/50\n",
"549367/549367 [==============================] - 25s 45us/step - loss: 0.3991 - acc: 0.8479 - val_loss: 0.3865 - val_acc: 0.8562\n",
"Epoch 30/50\n",
"549367/549367 [==============================] - 25s 45us/step - loss: 0.3976 - acc: 0.8484 - val_loss: 0.3833 - val_acc: 0.8574\n",
"Epoch 31/50\n",
"549367/549367 [==============================] - 25s 45us/step - loss: 0.3961 - acc: 0.8487 - val_loss: 0.3846 - val_acc: 0.8585\n",
"Epoch 32/50\n",
"549367/549367 [==============================] - 25s 45us/step - loss: 0.3942 - acc: 0.8498 - val_loss: 0.3805 - val_acc: 0.8573\n",
"Epoch 33/50\n",
"549367/549367 [==============================] - 24s 44us/step - loss: 0.3935 - acc: 0.8503 - val_loss: 0.3856 - val_acc: 0.8579\n",
"Epoch 34/50\n",
"549367/549367 [==============================] - 25s 45us/step - loss: 0.3923 - acc: 0.8507 - val_loss: 0.3829 - val_acc: 0.8560\n",
"Epoch 35/50\n",
"549367/549367 [==============================] - 25s 45us/step - loss: 0.3920 - acc: 0.8508 - val_loss: 0.3864 - val_acc: 0.8575\n",
"Epoch 36/50\n",
"549367/549367 [==============================] - 25s 45us/step - loss: 0.3907 - acc: 0.8516 - val_loss: 0.3873 - val_acc: 0.8563\n",
"Epoch 37/50\n",
"549367/549367 [==============================] - 25s 45us/step - loss: 0.3891 - acc: 0.8519 - val_loss: 0.3850 - val_acc: 0.8570\n",
"Epoch 38/50\n",
"549367/549367 [==============================] - 25s 45us/step - loss: 0.3872 - acc: 0.8522 - val_loss: 0.3815 - val_acc: 0.8591\n",
"Epoch 39/50\n",
"549367/549367 [==============================] - 25s 45us/step - loss: 0.3887 - acc: 0.8520 - val_loss: 0.3829 - val_acc: 0.8590\n",
"Epoch 40/50\n",
"549367/549367 [==============================] - 25s 45us/step - loss: 0.3868 - acc: 0.8531 - val_loss: 0.3807 - val_acc: 0.8600\n",
"Epoch 41/50\n",
"549367/549367 [==============================] - 25s 45us/step - loss: 0.3859 - acc: 0.8537 - val_loss: 0.3832 - val_acc: 0.8574\n",
"Epoch 42/50\n",
"549367/549367 [==============================] - 25s 45us/step - loss: 0.3849 - acc: 0.8537 - val_loss: 0.3850 - val_acc: 0.8576\n",
"Epoch 43/50\n",
"549367/549367 [==============================] - 25s 45us/step - loss: 0.3834 - acc: 0.8541 - val_loss: 0.3825 - val_acc: 0.8563\n",
"Epoch 44/50\n",
"549367/549367 [==============================] - 25s 45us/step - loss: 0.3829 - acc: 0.8548 - val_loss: 0.3844 - val_acc: 0.8540\n",
"Epoch 45/50\n",
"549367/549367 [==============================] - 25s 45us/step - loss: 0.3816 - acc: 0.8552 - val_loss: 0.3841 - val_acc: 0.8559\n",
"Epoch 46/50\n",
"549367/549367 [==============================] - 25s 45us/step - loss: 0.3816 - acc: 0.8549 - val_loss: 0.3880 - val_acc: 0.8567\n",
"Epoch 47/50\n",
"549367/549367 [==============================] - 24s 45us/step - loss: 0.3799 - acc: 0.8559 - val_loss: 0.3767 - val_acc: 0.8635\n",
"Epoch 48/50\n",
"549367/549367 [==============================] - 25s 45us/step - loss: 0.3800 - acc: 0.8560 - val_loss: 0.3786 - val_acc: 0.8563\n",
"Epoch 49/50\n",
"549367/549367 [==============================] - 25s 45us/step - loss: 0.3781 - acc: 0.8563 - val_loss: 0.3812 - val_acc: 0.8596\n",
"Epoch 50/50\n",
"549367/549367 [==============================] - 25s 45us/step - loss: 0.3788 - acc: 0.8560 - val_loss: 0.3782 - val_acc: 0.8601\n"
]
},
{
"data": {
"text/plain": [
"<keras.callbacks.History at 0x7f5ca1bf3e48>"
]
},
"execution_count": 21,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"m1.fit([text_vectors, hypothesis_vectors], labels, batch_size=1024, epochs=50, validation_data=([text_vectors_test, hypothesis_vectors_test], labels_test))"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"This model performs as well as the slightly more complex model that evaluates alignments in both directions. Note also that processing time improves, from roughly 60 down to 45 microseconds per step.\n",
"\n",
"Let's now look at an asymmetric model that evaluates text-to-hypothesis comparisons. The prediction is that such a model will correctly classify a decent proportion of the exemplars, but not as accurately as the previous two.\n",
"\n",
"We'll use just 10 epochs for expediency."
]
},
{
"cell_type": "code",
"execution_count": 96,
"metadata": {},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"__________________________________________________________________________________________________\n",
"Layer (type) Output Shape Param # Connected to \n",
"==================================================================================================\n",
"words1 (InputLayer) (None, 50) 0 \n",
"__________________________________________________________________________________________________\n",
"words2 (InputLayer) (None, 50) 0 \n",
"__________________________________________________________________________________________________\n",
"sequential_13 (Sequential) (None, 50, 200) 321381600 words1[0][0] \n",
" words2[0][0] \n",
"__________________________________________________________________________________________________\n",
"sequential_14 (Sequential) (None, 50, 200) 80400 sequential_13[1][0] \n",
" sequential_13[2][0] \n",
"__________________________________________________________________________________________________\n",
"dot_8 (Dot) (None, 50, 50) 0 sequential_14[1][0] \n",
" sequential_14[2][0] \n",
"__________________________________________________________________________________________________\n",
"lambda_9 (Lambda) (None, 50, 50) 0 dot_8[0][0] \n",
"__________________________________________________________________________________________________\n",
"dot_9 (Dot) (None, 50, 200) 0 lambda_9[0][0] \n",
" sequential_13[2][0] \n",
"__________________________________________________________________________________________________\n",
"concatenate_6 (Concatenate) (None, 50, 400) 0 sequential_13[1][0] \n",
" dot_9[0][0] \n",
"__________________________________________________________________________________________________\n",
"time_distributed_9 (TimeDistrib (None, 50, 200) 120400 concatenate_6[0][0] \n",
"__________________________________________________________________________________________________\n",
"lambda_10 (Lambda) (None, 200) 0 time_distributed_9[0][0] \n",
"__________________________________________________________________________________________________\n",
"sequential_16 (Sequential) (None, 200) 80400 lambda_10[0][0] \n",
"__________________________________________________________________________________________________\n",
"dense_32 (Dense) (None, 3) 603 sequential_16[1][0] \n",
"==================================================================================================\n",
"Total params: 321,663,403\n",
"Trainable params: 341,803\n",
"Non-trainable params: 321,321,600\n",
"__________________________________________________________________________________________________\n"
]
}
],
"source": [
"m2 = build_model(sem_vectors, 50, 200, 3, 200, 'right')\n",
"m2.summary()"
]
},
{
"cell_type": "code",
"execution_count": 97,
"metadata": {},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"Train on 455226 samples, validate on 113807 samples\n",
"Epoch 1/10\n",
"455226/455226 [==============================] - 22s 49us/step - loss: 0.8920 - acc: 0.5771 - val_loss: 0.8001 - val_acc: 0.6435\n",
"Epoch 2/10\n",
"455226/455226 [==============================] - 22s 47us/step - loss: 0.7808 - acc: 0.6553 - val_loss: 0.7267 - val_acc: 0.6855\n",
"Epoch 3/10\n",
"455226/455226 [==============================] - 22s 47us/step - loss: 0.7329 - acc: 0.6825 - val_loss: 0.6966 - val_acc: 0.7006\n",
"Epoch 4/10\n",
"455226/455226 [==============================] - 22s 47us/step - loss: 0.7055 - acc: 0.6978 - val_loss: 0.6713 - val_acc: 0.7150\n",
"Epoch 5/10\n",
"455226/455226 [==============================] - 22s 47us/step - loss: 0.6862 - acc: 0.7081 - val_loss: 0.6533 - val_acc: 0.7253\n",
"Epoch 6/10\n",
"455226/455226 [==============================] - 21s 47us/step - loss: 0.6694 - acc: 0.7179 - val_loss: 0.6472 - val_acc: 0.7277\n",
"Epoch 7/10\n",
"455226/455226 [==============================] - 22s 47us/step - loss: 0.6555 - acc: 0.7252 - val_loss: 0.6338 - val_acc: 0.7347\n",
"Epoch 8/10\n",
"455226/455226 [==============================] - 22s 48us/step - loss: 0.6434 - acc: 0.7310 - val_loss: 0.6246 - val_acc: 0.7385\n",
"Epoch 9/10\n",
"455226/455226 [==============================] - 22s 47us/step - loss: 0.6325 - acc: 0.7367 - val_loss: 0.6164 - val_acc: 0.7424\n",
"Epoch 10/10\n",
"455226/455226 [==============================] - 22s 47us/step - loss: 0.6216 - acc: 0.7426 - val_loss: 0.6082 - val_acc: 0.7478\n"
]
},
{
"data": {
"text/plain": [
"<keras.callbacks.History at 0x7fa6850cf080>"
]
},
"execution_count": 97,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
|
||||
"m2.fit([text_vectors, hypothesis_vectors], labels, batch_size=1024, epochs=10,validation_split=.2)"
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "markdown",
|
||||
"metadata": {},
|
||||
"source": [
|
||||
"Comparing this fit to the validation accuracy of the previous two models after 10 epochs, we observe that its accuracy is roughly 10% lower.\n",
|
||||
"\n",
|
||||
"It is reassuring that the neural modeling here reproduces what we know from the semantics of natural language!"
|
||||
]
|
||||
}
|
||||
],
|
||||
"metadata": {
|
||||
"kernelspec": {
|
||||
"display_name": "Python 3",
|
||||
"language": "python",
|
||||
"name": "python3"
|
||||
},
|
||||
"language_info": {
|
||||
"codemirror_mode": {
|
||||
"name": "ipython",
|
||||
"version": 3
|
||||
},
|
||||
"file_extension": ".py",
|
||||
"mimetype": "text/x-python",
|
||||
"name": "python",
|
||||
"nbconvert_exporter": "python",
|
||||
"pygments_lexer": "ipython3",
|
||||
"version": "3.5.2"
|
||||
}
|
||||
},
|
||||
"nbformat": 4,
|
||||
"nbformat_minor": 2
|
||||
}
|
27 examples/pipeline/fix_space_entities.py Normal file
@@ -0,0 +1,27 @@
'''Demonstrate adding a rule-based component that forces some tokens to not
be entities, before the NER tagger is applied. This is used to hotfix the issue
in https://github.com/explosion/spaCy/issues/2870 , present as of spaCy v2.0.16.
'''
import spacy
from spacy.attrs import ENT_IOB

def fix_space_tags(doc):
    ent_iobs = doc.to_array([ENT_IOB])
    for i, token in enumerate(doc):
        if token.is_space:
            # Sets 'O' tag (0 is None, so I is 1, O is 2)
            ent_iobs[i] = 2
    doc.from_array([ENT_IOB], ent_iobs.reshape((len(doc), 1)))
    return doc

def main():
    nlp = spacy.load('en_core_web_sm')
    text = u'''This is some crazy test where I dont need an Apple Watch to make things bug'''
    doc = nlp(text)
    print('Before', doc.ents)
    nlp.add_pipe(fix_space_tags, name='fix-ner', before='ner')
    doc = nlp(text)
    print('After', doc.ents)

if __name__ == '__main__':
    main()
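The component above flips the ENT_IOB code of any whitespace token to "O" before the data is written back to the Doc. A minimal standalone sketch of that array edit, using a plain list and hypothetical token strings in place of a spaCy Doc:

```python
# ENT_IOB codes as spaCy stores them internally: 0 = unset, 1 = "I", 2 = "O", 3 = "B"
tokens = ["Apple", "\n", "Watch"]   # hypothetical tokens; "\n" is a whitespace token
ent_iobs = [3, 1, 1]                # the NER wrongly pulled the newline into the entity

for i, tok in enumerate(tokens):
    if tok.isspace():               # stands in for token.is_space
        ent_iobs[i] = 2             # force the "O" (outside) tag

print(ent_iobs)                     # prints [3, 2, 1]
```

The entity boundary detection itself is left to the statistical NER; only the whitespace tokens are overruled.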
@@ -21,8 +21,9 @@ from __future__ import unicode_literals, print_function

 import plac
 import random
-import spacy
 from pathlib import Path
+import spacy
+from spacy.util import minibatch, compounding


 # training data: texts, heads and dependency labels
@@ -63,7 +64,7 @@ TRAIN_DATA = [
     model=("Model name. Defaults to blank 'en' model.", "option", "m", str),
     output_dir=("Optional output directory", "option", "o", Path),
     n_iter=("Number of training iterations", "option", "n", int))
-def main(model=None, output_dir=None, n_iter=5):
+def main(model=None, output_dir=None, n_iter=15):
     """Load the model, set up the pipeline and train the parser."""
     if model is not None:
         nlp = spacy.load(model)  # load existing spaCy model
@@ -89,9 +90,12 @@ def main(model=None, output_dir=None, n_iter=5):
     for itn in range(n_iter):
         random.shuffle(TRAIN_DATA)
         losses = {}
-        for text, annotations in TRAIN_DATA:
-            nlp.update([text], [annotations], sgd=optimizer, losses=losses)
-        print(losses)
+        # batch up the examples using spaCy's minibatch
+        batches = minibatch(TRAIN_DATA, size=compounding(4., 32., 1.001))
+        for batch in batches:
+            texts, annotations = zip(*batch)
+            nlp.update(texts, annotations, sgd=optimizer, losses=losses)
+        print('Losses', losses)

     # test the trained model
     test_model(nlp)
@@ -135,7 +139,8 @@ if __name__ == '__main__':
 #     ('find', 'ROOT', 'find'),
 #     ('cheapest', 'QUALITY', 'gym'),
-#     ('gym', 'PLACE', 'find')
+#     ('gym', 'PLACE', 'find'),
+#     ('near', 'ATTRIBUTE', 'gym'),
+#     ('work', 'LOCATION', 'near')
 # ]
 # show me the best hotel in berlin
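Each of the training examples in this PR moves from per-sentence updates to minibatches whose size grows according to `compounding(4., 32., 1.001)`. A minimal re-implementation sketch of that schedule (the real generator lives in `spacy.util`; this is just the idea, not the exact library code):

```python
from itertools import islice

def compounding(start, stop, compound):
    """Yield start, start*compound, start*compound**2, ... capped at stop
    (a sketch of the schedule used by spacy.util.compounding)."""
    curr = start
    while True:
        yield min(curr, stop)
        curr *= compound

sizes = list(islice(compounding(4., 32., 1.001), 4))
print(sizes)  # batch sizes creep up from 4.0 toward the cap of 32.0
```

`minibatch` consumes one value from this generator per batch, so early updates use small batches for fast initial learning and later updates use larger, more stable batches.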
@@ -15,6 +15,7 @@ import plac
 import random
 from pathlib import Path
 import spacy
+from spacy.util import minibatch, compounding


 # training data
@@ -62,14 +63,17 @@ def main(model=None, output_dir=None, n_iter=100):
     for itn in range(n_iter):
         random.shuffle(TRAIN_DATA)
         losses = {}
-        for text, annotations in TRAIN_DATA:
+        # batch up the examples using spaCy's minibatch
+        batches = minibatch(TRAIN_DATA, size=compounding(4., 32., 1.001))
+        for batch in batches:
+            texts, annotations = zip(*batch)
             nlp.update(
-                [text],  # batch of texts
-                [annotations],  # batch of annotations
+                texts,  # batch of texts
+                annotations,  # batch of annotations
                 drop=0.5,  # dropout - make it harder to memorise data
                 sgd=optimizer,  # callable to update weights
                 losses=losses)
-        print(losses)
+        print('Losses', losses)

     # test the trained model
     for text, _ in TRAIN_DATA:
@@ -31,6 +31,7 @@ import plac
 import random
 from pathlib import Path
 import spacy
+from spacy.util import minibatch, compounding


 # new entity label
@@ -73,7 +74,7 @@ TRAIN_DATA = [
     new_model_name=("New model name for model meta.", "option", "nm", str),
     output_dir=("Optional output directory", "option", "o", Path),
     n_iter=("Number of training iterations", "option", "n", int))
-def main(model=None, new_model_name='animal', output_dir=None, n_iter=20):
+def main(model=None, new_model_name='animal', output_dir=None, n_iter=10):
     """Set up the pipeline and entity recognizer, and train the new entity."""
     if model is not None:
         nlp = spacy.load(model)  # load existing spaCy model
@@ -104,10 +105,13 @@ def main(model=None, new_model_name='animal', output_dir=None, n_iter=20):
     for itn in range(n_iter):
         random.shuffle(TRAIN_DATA)
         losses = {}
-        for text, annotations in TRAIN_DATA:
-            nlp.update([text], [annotations], sgd=optimizer, drop=0.35,
+        # batch up the examples using spaCy's minibatch
+        batches = minibatch(TRAIN_DATA, size=compounding(4., 32., 1.001))
+        for batch in batches:
+            texts, annotations = zip(*batch)
+            nlp.update(texts, annotations, sgd=optimizer, drop=0.35,
                        losses=losses)
-        print(losses)
+        print('Losses', losses)

     # test the trained model
     test_text = 'Do you like horses?'
@@ -13,6 +13,7 @@ import plac
 import random
 from pathlib import Path
 import spacy
+from spacy.util import minibatch, compounding


 # training data
@@ -62,9 +63,12 @@ def main(model=None, output_dir=None, n_iter=10):
     for itn in range(n_iter):
         random.shuffle(TRAIN_DATA)
         losses = {}
-        for text, annotations in TRAIN_DATA:
-            nlp.update([text], [annotations], sgd=optimizer, losses=losses)
-        print(losses)
+        # batch up the examples using spaCy's minibatch
+        batches = minibatch(TRAIN_DATA, size=compounding(4., 32., 1.001))
+        for batch in batches:
+            texts, annotations = zip(*batch)
+            nlp.update(texts, annotations, sgd=optimizer, losses=losses)
+        print('Losses', losses)

     # test the trained model
     test_text = "I like securities."
@@ -16,6 +16,7 @@ import plac
 import random
 from pathlib import Path
 import spacy
+from spacy.util import minibatch, compounding


 # You need to define a mapping from your data's part-of-speech tag names to the
@@ -63,9 +64,12 @@ def main(lang='en', output_dir=None, n_iter=25):
     for i in range(n_iter):
         random.shuffle(TRAIN_DATA)
         losses = {}
-        for text, annotations in TRAIN_DATA:
-            nlp.update([text], [annotations], sgd=optimizer, losses=losses)
-        print(losses)
+        # batch up the examples using spaCy's minibatch
+        batches = minibatch(TRAIN_DATA, size=compounding(4., 32., 1.001))
+        for batch in batches:
+            texts, annotations = zip(*batch)
+            nlp.update(texts, annotations, sgd=optimizer, losses=losses)
+        print('Losses', losses)

     # test the trained model
     test_text = "I like blue eggs"
@@ -2,7 +2,7 @@ cython>=0.25
 numpy>=1.15.0
 cymem>=2.0.2,<2.1.0
 preshed>=2.0.1,<2.1.0
-thinc==7.0.0.dev1
+thinc==7.0.0.dev2
 blis>=0.2.2,<0.3.0
 murmurhash>=0.28.0,<1.1.0
 cytoolz>=0.9.0,<0.10.0
2 setup.py
@@ -200,7 +200,7 @@ def setup_package():
         "murmurhash>=0.28.0,<1.1.0",
         "cymem>=2.0.2,<2.1.0",
         "preshed>=2.0.1,<2.1.0",
-        "thinc==7.0.0.dev1",
+        "thinc==7.0.0.dev2",
         "blis>=0.2.2,<0.3.0",
         "plac<1.0.0,>=0.9.6",
         "ujson>=1.35",
@@ -4,6 +4,9 @@ import warnings
 warnings.filterwarnings("ignore", message="numpy.dtype size changed")
 warnings.filterwarnings("ignore", message="numpy.ufunc size changed")

+# These are imported as part of the API
+from thinc.neural.util import prefer_gpu, require_gpu
+
 from .cli.info import info as cli_info
 from .glossary import explain
 from .about import __version__
@@ -14,7 +14,7 @@ from .. import about


 @plac.annotations(
-    model=("model to download, shortcut or name)", "positional", None, str),
+    model=("model to download, shortcut or name", "positional", None, str),
     direct=("force direct download. Needs model name with version and won't "
             "perform compatibility check", "flag", "d", bool),
     pip_args=("additional arguments to be passed to `pip install` when "
@@ -1,6 +1,7 @@
 # coding: utf8
 from __future__ import unicode_literals

 import os
 import sys
 import ujson
+import itertools
@@ -1,6 +1,8 @@
 # coding: utf8
 from __future__ import unicode_literals

+import random
+
 from .templates import TPL_DEP_SVG, TPL_DEP_WORDS, TPL_DEP_ARCS
 from .templates import TPL_ENT, TPL_ENTS, TPL_FIGURE, TPL_TITLE, TPL_PAGE
 from ..util import minify_html, escape_html
@@ -38,7 +40,10 @@ class DependencyRenderer(object):
         minify (bool): Minify HTML markup.
         RETURNS (unicode): Rendered SVG or HTML markup.
         """
-        rendered = [self.render_svg(i, p['words'], p['arcs'])
+        # Create a random ID prefix to make sure parses don't receive the
+        # same ID, even if they're identical
+        id_prefix = random.randint(0, 999)
+        rendered = [self.render_svg('{}-{}'.format(id_prefix, i), p['words'], p['arcs'])
                     for i, p in enumerate(parsed)]
         if page:
             content = ''.join([TPL_FIGURE.format(content=svg)
@@ -270,7 +270,10 @@ class Errors(object):
             "NBOR_RELOP.")
     E101 = ("NODE_NAME should be a new node and NBOR_NAME should already have "
             "have been declared in previous edges.")
+    E102 = ("Can't merge non-disjoint spans. '{token}' is already part of tokens to merge")
+    E103 = ("Trying to set conflicting doc.ents: '{span1}' and '{span2}'. A token"
+            " can only be part of one entity, so make sure the entities you're "
+            "setting don't overlap.")


 @add_codes
 class TempErrors(object):
@@ -286,6 +286,7 @@ GLOSSARY = {
     'PERSON': 'People, including fictional',
     'NORP': 'Nationalities or religious or political groups',
     'FACILITY': 'Buildings, airports, highways, bridges, etc.',
+    'FAC': 'Buildings, airports, highways, bridges, etc.',
     'ORG': 'Companies, agencies, institutions, etc.',
     'GPE': 'Countries, cities, states',
     'LOC': 'Non-GPE locations, mountain ranges, bodies of water',
@@ -20,12 +20,11 @@ _suffixes = (_list_punct + LIST_ELLIPSES + LIST_QUOTES + LIST_ICONS +
              r'(?<=[{}(?:{})])\.'.format('|'.join([ALPHA_LOWER, r'%²\-\)\]\+', QUOTES]), _currency)])

 _infixes = (LIST_ELLIPSES + LIST_ICONS +
-            [r'(?<=[{}])\.(?=[{}])'.format(ALPHA_LOWER, ALPHA_UPPER),
-             r'(?<=[{a}]),(?=[{a}])'.format(a=ALPHA),
-             r'(?<=[{a}"])[:<>=](?=[{a}])'.format(a=ALPHA),
-             r'(?<=[{a}])([{q}\)\]\(\[])(?=[\-{a}])'.format(a=ALPHA, q=_quotes)])
+            [r'(?<=[0-9{zero}-{nine}])[+\-\*^=](?=[0-9{zero}-{nine}-])'.format(zero=u'০', nine=u'৯'),
+             r'(?<=[{a}])--(?=[{a}])'.format(a=ALPHA),
+             r'(?<=[{a}]),(?=[{a}])'.format(a=ALPHA),
+             r'(?<=[{a}])[{h}](?={ae})'.format(a=ALPHA, h=HYPHENS, ae=u'এ'),
+             r'(?<=[{a}])[?";:=,.]*(?:{h})(?=[{a}])'.format(a=ALPHA, h=HYPHENS),
+             r'(?<=[{a}"])[:<>=/](?=[{a}])'.format(a=ALPHA)])


 TOKENIZER_PREFIXES = _prefixes
64 spacy/lang/ca/__init__.py Normal file
@@ -0,0 +1,64 @@
# coding: utf8
from __future__ import unicode_literals

from .tokenizer_exceptions import TOKENIZER_EXCEPTIONS
from .stop_words import STOP_WORDS
from .lex_attrs import LEX_ATTRS

# uncomment if files are available
# from .norm_exceptions import NORM_EXCEPTIONS
# from .tag_map import TAG_MAP
# from .morph_rules import MORPH_RULES

# uncomment if lookup-based lemmatizer is available
from .lemmatizer import LOOKUP

from ..tokenizer_exceptions import BASE_EXCEPTIONS
from ..norm_exceptions import BASE_NORMS
from ...language import Language
from ...attrs import LANG, NORM
from ...util import update_exc, add_lookups

# Create a Language subclass
# Documentation: https://spacy.io/docs/usage/adding-languages

# This file should be placed in spacy/lang/ca (ISO code of language).
# Before submitting a pull request, make sure to remove all comments from the
# language data files, and run at least the basic tokenizer tests. Simply add the
# language ID to the list of languages in spacy/tests/conftest.py to include it
# in the basic tokenizer sanity tests. You can optionally add a fixture for the
# language's tokenizer and add more specific tests. For more info, see the
# tests documentation: https://github.com/explosion/spaCy/tree/master/spacy/tests


class CatalanDefaults(Language.Defaults):
    lex_attr_getters = dict(Language.Defaults.lex_attr_getters)
    lex_attr_getters[LANG] = lambda text: 'ca'  # ISO code
    # add more norm exception dictionaries here
    lex_attr_getters[NORM] = add_lookups(Language.Defaults.lex_attr_getters[NORM], BASE_NORMS)

    # overwrite functions for lexical attributes
    lex_attr_getters.update(LEX_ATTRS)

    # add custom tokenizer exceptions to base exceptions
    tokenizer_exceptions = update_exc(BASE_EXCEPTIONS, TOKENIZER_EXCEPTIONS)

    # add stop words
    stop_words = STOP_WORDS

    # if available: add tag map
    # tag_map = dict(TAG_MAP)

    # if available: add morph rules
    # morph_rules = dict(MORPH_RULES)

    lemma_lookup = LOOKUP


class Catalan(Language):
    lang = 'ca'  # ISO code
    Defaults = CatalanDefaults  # set Defaults to custom language defaults


# set default export – this allows the language class to be lazy-loaded
__all__ = ['Catalan']
22 spacy/lang/ca/examples.py Normal file
@@ -0,0 +1,22 @@
# coding: utf8
from __future__ import unicode_literals


"""
Example sentences to test spaCy and its language models.

>>> from spacy.lang.ca.examples import sentences
>>> docs = nlp.pipe(sentences)
"""


sentences = [
    "Apple està buscant comprar una startup del Regne Unit per mil milions de dòlars",
    "Els cotxes autònoms deleguen la responsabilitat de l'assegurança als seus fabricants",
    "San Francisco analitza prohibir els robots de repartiment",
    "Londres és una gran ciutat del Regne Unit",
    "El gat menja peix",
    "Veig a l'home amb el telescopi",
    "L'Aranya menja mosques",
    "El pingüí incuba en el seu niu",
]
591540 spacy/lang/ca/lemmatizer.py Normal file
File diff suppressed because it is too large
43 spacy/lang/ca/lex_attrs.py Normal file
@@ -0,0 +1,43 @@
# coding: utf8
from __future__ import unicode_literals

# import the symbols for the attrs you want to overwrite
from ...attrs import LIKE_NUM


# Overwriting functions for lexical attributes
# Documentation: https://spacy.io/docs/usage/adding-languages#lex-attrs
# Most of these functions, like is_lower or like_url, should be language-
# independent. Others, like like_num (which includes both digits and number
# words), require customisation.


# Example: check if token resembles a number

_num_words = ['zero', 'un', 'dos', 'tres', 'quatre', 'cinc', 'sis', 'set',
              'vuit', 'nou', 'deu', 'onze', 'dotze', 'tretze', 'catorze',
              'quinze', 'setze', 'disset', 'divuit', 'dinou', 'vint',
              'trenta', 'quaranta', 'cinquanta', 'seixanta', 'setanta', 'vuitanta', 'noranta',
              'cent', 'mil', 'milió', 'bilió', 'trilió', 'quatrilió',
              'gazilió', 'bazilió']


def like_num(text):
    text = text.replace(',', '').replace('.', '')
    if text.isdigit():
        return True
    if text.count('/') == 1:
        num, denom = text.split('/')
        if num.isdigit() and denom.isdigit():
            return True
    if text in _num_words:
        return True
    return False


# Create dictionary of functions to overwrite. The default lex_attr_getters are
# updated with this one, so only the functions defined here are overwritten.

LEX_ATTRS = {
    LIKE_NUM: like_num
}
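A quick standalone check of the `like_num` logic above (function body copied from the diff, with a shortened `_num_words` list): it accepts plain digits, digits with thousands/decimal separators, simple fractions, and Catalan number words.

```python
_num_words = ['zero', 'un', 'dos', 'tres', 'quatre', 'cinc']  # subset of the full list above

def like_num(text):
    # strip separators, then test digits, fractions and number words in turn
    text = text.replace(',', '').replace('.', '')
    if text.isdigit():
        return True
    if text.count('/') == 1:
        num, denom = text.split('/')
        if num.isdigit() and denom.isdigit():
            return True
    if text in _num_words:
        return True
    return False

print(like_num('1.000,5'), like_num('3/4'), like_num('dos'), like_num('gat'))
# True True True False
```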
56 spacy/lang/ca/stop_words.py Normal file
@@ -0,0 +1,56 @@
# encoding: utf8
from __future__ import unicode_literals


# Stop words

STOP_WORDS = set("""
a abans ací ah així això al aleshores algun alguna algunes alguns alhora allà allí allò
als altra altre altres amb ambdues ambdós anar ans apa aquell aquella aquelles aquells
aquest aquesta aquestes aquests aquí

baix bastant bé

cada cadascuna cadascunes cadascuns cadascú com consegueixo conseguim conseguir
consigueix consigueixen consigueixes contra

d'un d'una d'unes d'uns dalt de del dels des des de després dins dintre donat doncs durant

e eh el elles ells els em en encara ens entre era erem eren eres es esta estan estat
estava estaven estem esteu estic està estàvem estàveu et etc ets érem éreu és éssent

fa faig fan fas fem fer feu fi fins fora

gairebé

ha han has haver havia he hem heu hi ho

i igual iguals inclòs

ja jo

l'hi la les li li'n llarg llavors

m'he ma mal malgrat mateix mateixa mateixes mateixos me mentre meu meus meva
meves mode molt molta moltes molts mon mons més

n'he n'hi ne ni no nogensmenys només nosaltres nostra nostre nostres

o oh oi on

pas pel pels per per que perquè però poc poca pocs podem poden poder
podeu poques potser primer propi puc

qual quals quan quant que quelcom qui quin quina quines quins què

s'ha s'han sa sabem saben saber sabeu sap saps semblant semblants sense ser ses
seu seus seva seves si sobre sobretot soc solament sols som son sons sota sou sóc són

t'ha t'han t'he ta tal també tampoc tan tant tanta tantes te tene tenim tenir teniu
teu teus teva teves tinc ton tons tot tota totes tots

un una unes uns us últim ús

va vaig vam van vas veu vosaltres vostra vostre vostres

""".split())
36 spacy/lang/ca/tag_map.py Normal file
@@ -0,0 +1,36 @@
# coding: utf8
from __future__ import unicode_literals

from ...symbols import POS, ADV, NOUN, ADP, PRON, SCONJ, PROPN, DET, SYM, INTJ
from ...symbols import PUNCT, NUM, AUX, X, CONJ, ADJ, VERB, PART, SPACE, CCONJ


# Add a tag map
# Documentation: https://spacy.io/docs/usage/adding-languages#tag-map
# Universal Dependencies: http://universaldependencies.org/u/pos/all.html
# The keys of the tag map should be strings in your tag set. The dictionary must
# have an entry POS whose value is one of the Universal Dependencies tags.
# Optionally, you can also include morphological features or other attributes.


TAG_MAP = {
    "ADV": {POS: ADV},
    "NOUN": {POS: NOUN},
    "ADP": {POS: ADP},
    "PRON": {POS: PRON},
    "SCONJ": {POS: SCONJ},
    "PROPN": {POS: PROPN},
    "DET": {POS: DET},
    "SYM": {POS: SYM},
    "INTJ": {POS: INTJ},
    "PUNCT": {POS: PUNCT},
    "NUM": {POS: NUM},
    "AUX": {POS: AUX},
    "X": {POS: X},
    "CONJ": {POS: CONJ},
    "CCONJ": {POS: CCONJ},
    "ADJ": {POS: ADJ},
    "VERB": {POS: VERB},
    "PART": {POS: PART},
    "SP": {POS: SPACE}
}
51 spacy/lang/ca/tokenizer_exceptions.py Normal file
@@ -0,0 +1,51 @@
# coding: utf8
from __future__ import unicode_literals

# import symbols – if you need to use more, add them here
from ...symbols import ORTH, LEMMA, TAG, NORM, ADP, DET


_exc = {}

for exc_data in [
    {ORTH: "aprox.", LEMMA: "aproximadament"},
    {ORTH: "pàg.", LEMMA: "pàgina"},
    {ORTH: "p.ex.", LEMMA: "per exemple"},
    {ORTH: "gen.", LEMMA: "gener"},
    {ORTH: "feb.", LEMMA: "febrer"},
    {ORTH: "abr.", LEMMA: "abril"},
    {ORTH: "jul.", LEMMA: "juliol"},
    {ORTH: "set.", LEMMA: "setembre"},
    {ORTH: "oct.", LEMMA: "octubre"},
    {ORTH: "nov.", LEMMA: "novembre"},
    {ORTH: "dec.", LEMMA: "desembre"},
    {ORTH: "Dr.", LEMMA: "doctor"},
    {ORTH: "Sr.", LEMMA: "senyor"},
    {ORTH: "Sra.", LEMMA: "senyora"},
    {ORTH: "Srta.", LEMMA: "senyoreta"},
    {ORTH: "núm", LEMMA: "número"},
    {ORTH: "St.", LEMMA: "sant"},
    {ORTH: "Sta.", LEMMA: "santa"}]:
    _exc[exc_data[ORTH]] = [exc_data]

# Times

_exc["12m."] = [
    {ORTH: "12"},
    {ORTH: "m.", LEMMA: "p.m."}]


for h in range(1, 12 + 1):
    for period in ["a.m.", "am"]:
        _exc["%d%s" % (h, period)] = [
            {ORTH: "%d" % h},
            {ORTH: period, LEMMA: "a.m."}]
    for period in ["p.m.", "pm"]:
        _exc["%d%s" % (h, period)] = [
            {ORTH: "%d" % h},
            {ORTH: period, LEMMA: "p.m."}]

# To keep things clean and readable, it's recommended to only declare the
# TOKENIZER_EXCEPTIONS at the bottom:

TOKENIZER_EXCEPTIONS = _exc
@@ -16,6 +16,7 @@ _latin = r'[[\p{Ll}||\p{Lu}]&&\p{Latin}]'
 _persian = r'[\p{L}&&\p{Arabic}]'
 _russian_lower = r'[ёа-я]'
 _russian_upper = r'[ЁА-Я]'
+_sinhala = r'[\p{L}&&\p{Sinhala}]'
 _tatar_lower = r'[әөүҗңһ]'
 _tatar_upper = r'[ӘӨҮҖҢҺ]'
 _greek_lower = r'[α-ωάέίόώήύ]'
@@ -23,7 +24,7 @@ _greek_upper = r'[Α-ΩΆΈΊΌΏΉΎ]'

 _upper = [_latin_upper, _russian_upper, _tatar_upper, _greek_upper]
 _lower = [_latin_lower, _russian_lower, _tatar_lower, _greek_lower]
-_uncased = [_bengali, _hebrew, _persian]
+_uncased = [_bengali, _hebrew, _persian, _sinhala]

 ALPHA = merge_char_classes(_upper + _lower + _uncased)
 ALPHA_LOWER = merge_char_classes(_lower + _uncased)
@@ -14,4 +14,5 @@ _exc = {
 NORM_EXCEPTIONS = {}

 for string, norm in _exc.items():
     NORM_EXCEPTIONS[string] = norm
+    NORM_EXCEPTIONS[string.title()] = norm
@@ -1,21 +1,29 @@
 # coding: utf8
 from __future__ import unicode_literals

-from .stop_words import STOP_WORDS
-
 from ..tokenizer_exceptions import BASE_EXCEPTIONS
-from ..norm_exceptions import BASE_NORMS
 from ...language import Language
 from ...attrs import LANG, NORM
 from ...util import update_exc, add_lookups

+from ..norm_exceptions import BASE_NORMS
+from .stop_words import STOP_WORDS
+from .lex_attrs import LEX_ATTRS
+from .tokenizer_exceptions import TOKENIZER_EXCEPTIONS
+from .tag_map import TAG_MAP
+from .punctuation import TOKENIZER_SUFFIXES
+from .lemmatizer import LEMMA_RULES, LEMMA_INDEX, LEMMA_EXC
+

 class PersianDefaults(Language.Defaults):
     lex_attr_getters = dict(Language.Defaults.lex_attr_getters)
-    lex_attr_getters[LANG] = lambda text: 'fa'
+    lex_attr_getters.update(LEX_ATTRS)
     lex_attr_getters[NORM] = add_lookups(Language.Defaults.lex_attr_getters[NORM], BASE_NORMS)
-    tokenizer_exceptions = update_exc(BASE_EXCEPTIONS)
+    lex_attr_getters[LANG] = lambda text: 'fa'
+    tokenizer_exceptions = update_exc(TOKENIZER_EXCEPTIONS)
+    lemma_rules = LEMMA_RULES
+    lemma_index = LEMMA_INDEX
+    lemma_exc = LEMMA_EXC
     stop_words = STOP_WORDS
+    tag_map = TAG_MAP
+    suffixes = TOKENIZER_SUFFIXES


 class Persian(Language):
32 spacy/lang/fa/lemmatizer/__init__.py Normal file
@@ -0,0 +1,32 @@
# coding: utf8
from __future__ import unicode_literals

from ._adjectives import ADJECTIVES
from ._adjectives_exc import ADJECTIVES_EXC
from ._nouns import NOUNS
from ._nouns_exc import NOUNS_EXC
from ._verbs import VERBS
from ._verbs_exc import VERBS_EXC
from ._lemma_rules import ADJECTIVE_RULES, NOUN_RULES, VERB_RULES, PUNCT_RULES


LEMMA_INDEX = {
    'adj': ADJECTIVES,
    'noun': NOUNS,
    'verb': VERBS
}

LEMMA_RULES = {
    'adj': ADJECTIVE_RULES,
    'noun': NOUN_RULES,
    'verb': VERB_RULES,
    'punct': PUNCT_RULES
}

LEMMA_EXC = {
    'adj': ADJECTIVES_EXC,
    'noun': NOUNS_EXC,
    'verb': VERBS_EXC
}
2980 spacy/lang/fa/lemmatizer/_adjectives.py Normal file
File diff suppressed because it is too large
53 spacy/lang/fa/lemmatizer/_adjectives_exc.py Normal file
@@ -0,0 +1,53 @@
# coding: utf8
from __future__ import unicode_literals

# Adjectives extracted from Mojgan Seraji's Persian Universal Dependencies Corpus
# Below adjectives are exceptions for current adjective lemmatization rules
ADJECTIVES_EXC = {
    "بهترین": ("بهتر",),
    "بهتر": ("بهتر",),
    "سنگین": ("سنگین",),
    "بیشترین": ("بیشتر",),
    "برتر": ("برتر",),
    "بدبین": ("بدبین",),
    "متین": ("متین",),
    "شیرین": ("شیرین",),
    "معین": ("معین",),
    "دلنشین": ("دلنشین",),
    "امین": ("امین",),
    "متدین": ("متدین",),
    "تیزبین": ("تیزبین",),
    "بنیادین": ("بنیادین",),
    "دروغین": ("دروغین",),
    "واپسین": ("واپسین",),
    "خونین": ("خونین",),
    "مزین": ("مزین",),
    "خوشبین": ("خوشبین",),
    "عطرآگین": ("عطرآگین",),
    "زرین": ("زرین",),
    "فرجامین": ("فرجامین",),
    "فقیرنشین": ("فقیرنشین",),
    "مستتر": ("مستتر",),
    "چوبین": ("چوبین",),
    "آغازین": ("آغازین",),
    "سخنچین": ("سخنچین",),
    "مرمرین": ("مرمرین",),
    "زندهتر": ("زندهتر",),
    "صفرکیلومتر": ("صفرکیلومتر",),
    "غمگین": ("غمگین",),
    "نازنین": ("نازنین",),
    "مثبت": ("مثبت",),
    "شرمگین": ("شرمگین",),
    "قرین": ("قرین",),
    "سوتر": ("سوتر",),
    "بیزین": ("بیزین",),
    "سیمین": ("سیمین",),
    "رنگین": ("رنگین",),
    "روشنبین": ("روشنبین",),
    "اندوهگین": ("اندوهگین",),
    "فیمابین": ("فیمابین",),
    "لاجوردین": ("لاجوردین",),
    "برنجین": ("برنجین",),
    "مشکلآفرین": ("مشکلآفرین",),
    "خبرچین": ("خبرچین",),
}
64 spacy/lang/fa/lemmatizer/_lemma_rules.py Normal file
@@ -0,0 +1,64 @@
# coding: utf8
from __future__ import unicode_literals


ADJECTIVE_RULES = [
    ["ین", ""],
    ["\u200cترین", ""],
    ["ترین", ""],
    ["\u200cتر", ""],
    ["تر", ""],
    ["\u200cای", ""],
    # ["ایی", "ا"],
    # ["ویی", "و"],
    # ["ی", ""],
    # ["مند", ""],
    # ["گین", ""],
    # ["مین", ""],
    # ["ناک", ""],
    # ["سار", ""],
    # ["\u200cوار", ""],
    # ["وار", ""]
]


NOUN_RULES = [
    ['ایان', 'ا'],
    ['ویان', 'و'],
    ['ایانی', 'ا'],
    ['ویانی', 'و'],
    ['گان', 'ه'],
    ['گانی', 'ه'],
    ['گان', ''],
    ['گانی', ''],
    ['ان', ''],
    ['انی', ''],
    ['ات', ''],
    ['ات', 'ه'],
    ['ات', 'ت'],
    ['اتی', ''],
    ['اتی', 'ه'],
    ['اتی', 'ت'],
    # ['ین', ''],
    # ['ینی', ''],
    # ['ون', ''],
    # ['ونی', ''],
    ['\u200cها', ''],
    ['ها', ''],
    ['\u200cهای', ''],
    ['های', ''],
    ['\u200cهایی', ''],
    ['هایی', ''],
]


VERB_RULES = [
]


PUNCT_RULES = [
    ["“", "\""],
    ["”", "\""],
    ["\u2018", "'"],
    ["\u2019", "'"]
]
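A standalone sketch of how suffix-rule tables and exception tables like these combine: the exception lookup wins, otherwise the first matching suffix rule is applied. This mirrors the general flow of a rule-based lemmatizer, not spaCy's exact implementation, and uses tiny subsets of the tables above:

```python
def lemmatize(string, exceptions, rules):
    # exception table wins over suffix rules
    if string in exceptions:
        return exceptions[string][0]
    # apply the first suffix rule [old, new] that matches
    for old, new in rules:
        if old and string.endswith(old):
            return string[:len(string) - len(old)] + new
    return string

# tiny subsets of the tables above, for illustration
NOUNS_EXC = {"آثار": ("اثر",)}          # broken plural, handled as an exception
NOUN_RULES = [["ها", ""], ["های", ""]]  # regular plural suffixes

print(lemmatize("کتابها", NOUNS_EXC, NOUN_RULES))  # strips the plural suffix
print(lemmatize("آثار", NOUNS_EXC, NOUN_RULES))    # exception table lookup
```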
8086 spacy/lang/fa/lemmatizer/_nouns.py Normal file
File diff suppressed because it is too large
781
spacy/lang/fa/lemmatizer/_nouns_exc.py
Normal file
781
spacy/lang/fa/lemmatizer/_nouns_exc.py
Normal file
|
@ -0,0 +1,781 @@
|
|||
# coding: utf8
from __future__ import unicode_literals


NOUNS_EXC = {
    "آثار": ("اثر",),
    "آرا": ("رأی",),
    "آراء": ("رأی",),
    "آفات": ("آفت",),
    "اباطیل": ("باطل",),
    "ائمه": ("امام",),
    "ابرار": ("بر",),
    "ابعاد": ("بعد",),
    "ابنیه": ("بنا",),
    "ابواب": ("باب",),
    "ابیات": ("بیت",),
    "اجداد": ("جد",),
    "اجساد": ("جسد",),
    "اجناس": ("جنس",),
    "اثمار": ("ثمر",),
    "اجرام": ("جرم",),
    "اجسام": ("جسم",),
    "اجنه": ("جن",),
    "احادیث": ("حدیث",),
    "احجام": ("حجم",),
    "احرار": ("حر",),
    "احزاب": ("حزب",),
    "احکام": ("حکم",),
    "اخبار": ("خبر",),
    "اخیار": ("خیر",),
    "ادبا": ("ادیب",),
    "ادعیه": ("دعا",),
    "ادله": ("دلیل",),
    "ادوار": ("دوره",),
    "ادیان": ("دین",),
    "اذهان": ("ذهن",),
    "اذکار": ("ذکر",),
    "اراضی": ("ارض",),
    "ارزاق": ("رزق",),
    "ارقام": ("رقم",),
    "ارواح": ("روح",),
    "ارکان": ("رکن",),
    "ازمنه": ("زمان",),
    "اساتید": ("استاد",),
    "اساطیر": ("اسطوره",),
    "اسامی": ("اسم",),
    "اسرار": ("سر",),
    "اسما": ("اسم",),
    "اسناد": ("سند",),
    "اسیله": ("سوال",),
    "اشجار": ("شجره",),
    "اشخاص": ("شخص",),
    "اشرار": ("شر",),
    "اشربه": ("شراب",),
    "اشعار": ("شعر",),
    "اشقیا": ("شقی",),
    "اشیا": ("شی",),
    "اشباح": ("شبح",),
    "اصدقا": ("صدیق",),
    "اصناف": ("صنف",),
    "اصنام": ("صنم",),
    "اصوات": ("صوت",),
    "اصول": ("اصل",),
    "اضداد": ("ضد",),
    "اطبا": ("طبیب",),
    "اطعمه": ("طعام",),
    "اطفال": ("طفل",),
    "الطاف": ("لطف",),
    "اعدا": ("عدو",),
    "اعزا": ("عزیز",),
    "اعضا": ("عضو",),
    "اعماق": ("عمق",),
    "الفاظ": ("لفظ",),
    "اعناب": ("عنب",),
    "اغذیه": ("غذا",),
    "اغراض": ("غرض",),
    "افراد": ("فرد",),
    "افعال": ("فعل",),
    "افلاک": ("فلک",),
    "افکار": ("فکر",),
    "اقالیم": ("اقلیم",),
    "اقربا": ("قریب",),
    "اقسام": ("قسم",),
    "اقشار": ("قشر",),
    "اقفال": ("قفل",),
    "اقلام": ("قلم",),
    "اقوال": ("قول",),
    "اقوام": ("قوم",),
    "البسه": ("لباس",),
    "الحام": ("لحم",),
    "الحکام": ("الحاکم",),
    "القاب": ("لقب",),
    "الواح": ("لوح",),
    "الکبار": ("الکبیر",),
    "اماکن": ("مکان",),
    "امثال": ("مثل",),
    "امراض": ("مرض",),
    "امم": ("امت",),
    "امواج": ("موج",),
    "اموال": ("مال",),
    "امور": ("امر",),
    "امیال": ("میل",),
    "انبیا": ("نبی",),
    "انجم": ("نجم",),
    "انظار": ("نظر",),
    "انفس": ("نفس",),
    "انهار": ("نهر",),
    "انواع": ("نوع",),
    "اهالی": ("اهل",),
    "اهداف": ("هدف",),
    "اواخر": ("آخر",),
    "اواسط": ("وسط",),
    "اوایل": ("اول",),
    "اوراد": ("ورد",),
    "اوراق": ("ورق",),
    "اوزان": ("وزن",),
    "اوصاف": ("وصف",),
    "اوضاع": ("وضع",),
    "اوقات": ("وقت",),
    "اولاد": ("ولد",),
    "اولیا": ("ولی",),
    "اولیاء": ("ولی",),
    "اوهام": ("وهم",),
    "اکاذیب": ("اکذوبه",),
    "اکفان": ("کفن",),
    "ایالات": ("ایالت",),
    "ایام": ("یوم",),
    "ایتام": ("یتیم",),
    "بشایر": ("بشارت",),
    "بصایر": ("بصیرت",),
    "بطون": ("بطن",),
    "بنادر": ("بندر",),
    "بیوت": ("بیت",),
    "تجار": ("تاجر",),
    "تجارب": ("تجربه",),
    "تدابیر": ("تدبیر",),
    "تعاریف": ("تعریف",),
    "تلامیذ": ("تلمیذ",),
    "تهم": ("تهمت",),
    "توابیت": ("تابوت",),
    "تواریخ": ("تاریخ",),
    "جبال": ("جبل",),
    "جداول": ("جدول",),
    "جدود": ("جد",),
    "جراثیم": ("جرثوم",),
    "جرایم": ("جرم",),
    "جرائم": ("جرم",),
    "جزئیات": ("جزء",),
    "جزایر": ("جزیره",),
    "جزییات": ("جزء",),
    "جنایات": ("جنایت",),
    "جهات": ("جهت",),
    "جوامع": ("جامعه",),
    "حدود": ("حد",),
    "حروف": ("حرف",),
    "حقایق": ("حقیقت",),
    "حقوق": ("حق",),
    "حوادث": ("حادثه",),
    "حواشی": ("حاشیه",),
    "حوایج": ("حاجت",),
    "حوائج": ("حاجت",),
    "حکما": ("حکیم",),
    "خدمات": ("خدمت",),
    "خدمه": ("خادم",),
    "خدم": ("خادم",),
    "خزاین": ("خزینه",),
    "خصایص": ("خصیصه",),
    "خطوط": ("خط",),
    "دراهم": ("درهم",),
    "دروس": ("درس",),
    "دفاتر": ("دفتر",),
    "دلایل": ("دلیل",),
    "دلائل": ("دلیل",),
    "ذخایر": ("ذخیره",),
    "ذنوب": ("ذنب",),
    "ربوع": ("ربع",),
    "رجال": ("رجل",),
    "رسایل": ("رسال",),
    "رسوم": ("رسم",),
    "روابط": ("رابطه",),
    "روسا": ("رئیس",),
    "رئوس": ("راس",),
    "ریوس": ("راس",),
    "زوار": ("زائر",),
    "ساعات": ("ساعت",),
    "سبل": ("سبیل",),
    "سطوح": ("سطح",),
    "سطور": ("سطر",),
    "سعدا": ("سعید",),
    "سفن": ("سفینه",),
    "سقاط": ("ساقی",),
    "سلاطین": ("سلطان",),
    "سلایق": ("سلیقه",),
    "سموم": ("سم",),
    "سنن": ("سنت",),
    "سنین": ("سن",),
    "سهام": ("سهم",),
    "سوابق": ("سابقه",),
    "سواحل": ("ساحل",),
    "سوانح": ("سانحه",),
    "شباب": ("شاب",),
    "شرایط": ("شرط",),
    "شروط": ("شرط",),
    "شرکا": ("شریک",),
    "شعب": ("شعبه",),
    "شعوب": ("شعب",),
    "شموس": ("شمس",),
    "شهدا": ("شهید",),
    "شهور": ("شهر",),
    "شواهد": ("شاهد",),
    "شوون": ("شان",),
    "شکات": ("شاکی",),
    "شیاطین": ("شیطان",),
    "صبیان": ("صبی",),
    "صحف": ("صحیفه",),
    "صغار": ("صغیر",),
    "صفوف": ("صف",),
    "صنادیق": ("صندوق",),
    "ضعفا": ("ضعیف",),
    "ضمایر": ("ضمیر",),
    "ضوابط": ("ضابطه",),
    "طرق": ("طریق",),
    "طلاب": ("طلبه",),
    "طواغیت": ("طاغوت",),
    "طیور": ("طیر",),
    "عادات": ("عادت",),
    "عباد": ("عبد",),
    "عبارات": ("عبارت",),
    "عجایب": ("عجیب",),
    "عزایم": ("عزیمت",),
    "عشایر": ("عشیره",),
    "عطور": ("عطر",),
    "عظما": ("عظیم",),
    "عقاید": ("عقیده",),
    "عقائد": ("عقیده",),
    "علائم": ("علامت",),
    "علایم": ("علامت",),
    "علما": ("عالم",),
    "علوم": ("علم",),
    "عمال": ("عمله",),
    "عناصر": ("عنصر",),
    "عناوین": ("عنوان",),
    "عواطف": ("عاطفه",),
    "عواقب": ("عاقبت",),
    "عوالم": ("عالم",),
    "عوامل": ("عامل",),
    "عیوب": ("عیب",),
    "عیون": ("عین",),
    "غدد": ("غده",),
    "غرف": ("غرفه",),
    "غیوب": ("غیب",),
    "غیوم": ("غیم",),
    "فرایض": ("فریضه",),
    "فضایل": ("فضیلت",),
    "فضلا": ("فاضل",),
    "فواصل": ("فاصله",),
    "فواید": ("فایده",),
    "قبایل": ("قبیله",),
    "قرون": ("قرن",),
    "قصص": ("قصه",),
    "قضات": ("قاضی",),
    "قضایا": ("قضیه",),
    "قلل": ("قله",),
    "قلوب": ("قلب",),
    "قواعد": ("قاعده",),
    "قوانین": ("قانون",),
    "قیود": ("قید",),
    "لطایف": ("لطیفه",),
    "لیالی": ("لیل",),
    "مباحث": ("مبحث",),
    "مبالغ": ("مبلغ",),
    "متون": ("متن",),
    "مجالس": ("مجلس",),
    "محاصیل": ("محصول",),
    "محافل": ("محفل",),
    "محاکم": ("محکمه",),
    "مخارج": ("خرج",),
    "مدارس": ("مدرسه",),
    "مدارک": ("مدرک",),
    "مداین": ("مدینه",),
    "مدن": ("مدینه",),
    "مراتب": ("مرتبه",),
    "مراتع": ("مرتع",),
    "مراجع": ("مرجع",),
    "مراحل": ("مرحله",),
    "مسائل": ("مسئله",),
    "مساجد": ("مسجد",),
    "مساعی": ("سعی",),
    "مسالک": ("مسلک",),
    "مساکین": ("مسکین",),
    "مسایل": ("مسئله",),
    "مشاعر": ("مشعر",),
    "مشاغل": ("شغل",),
    "مشایخ": ("شیخ",),
    "مصادر": ("مصدر",),
    "مصادق": ("مصداق",),
    "مصادیق": ("مصداق",),
    "مصاعب": ("مصعب",),
    "مضار": ("ضرر",),
    "مضامین": ("مضمون",),
    "مطالب": ("مطلب",),
    "مظالم": ("مظلمه",),
    "مظاهر": ("مظهر",),
    "اهرام": ("هرم",),
    "معابد": ("معبد",),
    "معابر": ("معبر",),
    "معاجم": ("معجم",),
    "معادن": ("معدن",),
    "معاذیر": ("عذر",),
    "معارج": ("معراج",),
    "معاصی": ("معصیت",),
    "معالم": ("معلم",),
    "معایب": ("عیب",),
    "مفاسد": ("مفسده",),
    "مفاصل": ("مفصل",),
    "مفاهیم": ("مفهوم",),
    "مقابر": ("مقبره",),
    "مقاتل": ("مقتل",),
    "مقادیر": ("مقدار",),
    "مقاصد": ("مقصد",),
    "مقاطع": ("مقطع",),
    "ملابس": ("ملبس",),
    "ملوک": ("ملک",),
    "ممالک": ("مملکت",),
    "منابع": ("منبع",),
    "منازل": ("منزل",),
    "مناسبات": ("مناسبت",),
    "مناسک": ("منسک",),
    "مناطق": ("منطقه",),
    "مناظر": ("منظره",),
    "منافع": ("منفعت",),
    "موارد": ("مورد",),
    "مواضع": ("موضع",),
    "مواضیع": ("موضوع",),
    "مواطن": ("موطن",),
    "مواقع": ("موقع",),
    "موانع": ("مانع",),
    "مکاتب": ("مکتب",),
    "مکاتیب": ("مکتوب",),
    "مکارم": ("مکرمه",),
    "میادین": ("میدان",),
    "نتایج": ("نتیجه",),
    "نعم": ("نعمت",),
    "نفوس": ("نفس",),
    "نقاط": ("نقطه",),
    "نواحی": ("ناحیه",),
    "نوافذ": ("نافذه",),
    "نواقص": ("نقص",),
    "نوامیس": ("ناموس",),
    "نکات": ("نکته",),
    "نیات": ("نیت",),
    "هدایا": ("هدیه",),
    "واقعیات": ("واقعیت",),
    "وجوه": ("وجه",),
    "وحوش": ("وحش",),
    "وزرا": ("وزیر",),
    "وسایل": ("وسیله",),
    "وصایا": ("وصیت",),
    "وظایف": ("وظیفه",),
    "وعاظ": ("واعظ",),
    "وقایع": ("واقعه",),
    "کتب": ("کتاب",),
    "کسبه": ("کاسب",),
    "کفار": ("کافر",),
    "کواکب": ("کوکب",),
    "تصاویر": ("تصویر",),
    "صنوف": ("صنف",),
    "اجزا": ("جزء",),
    "اجزاء": ("جزء",),
    "ذخائر": ("ذخیره",),
    "خسارات": ("خسارت",),
    "عشاق": ("عاشق",),
    "تصانیف": ("تصنیف",),
    "دﻻیل": ("دلیل",),
    "قوا": ("قوه",),
    "ملل": ("ملت",),
    "جوایز": ("جایزه",),
    "جوائز": ("جایزه",),
    "ابعاض": ("بعض",),
    "اتباع": ("تبعه",),
    "اجلاس": ("جلسه",),
    "احشام": ("حشم",),
    "اخلاف": ("خلف",),
    "ارامنه": ("ارمنی",),
    "ازواج": ("زوج",),
    "اسباط": ("سبط",),
    "اعداد": ("عدد",),
    "اعصار": ("عصر",),
    "اعقاب": ("عقبه",),
    "اعیاد": ("عید",),
    "اعیان": ("عین",),
    "اغیار": ("غیر",),
    "اقارب": ("اقرب",),
    "اقران": ("قرن",),
    "اقساط": ("قسط",),
    "امنای": ("امین",),
    "امنا": ("امین",),
    "اموات": ("میت",),
    "اناجیل": ("انجیل",),
    "انحا": ("نحو",),
    "انساب": ("نسب",),
    "انوار": ("نور",),
    "اوامر": ("امر",),
    "اوائل": ("اول",),
    "اوصیا": ("وصی",),
    "آحاد": ("احد",),
    "براهین": ("برهان",),
    "تعابیر": ("تعبیر",),
    "تعالیم": ("تعلیم",),
    "تفاسیر": ("تفسیر",),
    "تکالیف": ("تکلیف",),
    "تماثیل": ("تمثال",),
    "جنود": ("جند",),
    "جوانب": ("جانب",),
    "حاجات": ("حاجت",),
    "حرکات": ("حرکت",),
    "حضرات": ("حضرت",),
    "حکایات": ("حکایت",),
    "حوالی": ("حول",),
    "خصایل": ("خصلت",),
    "خلایق": ("خلق",),
    "خلفا": ("خلیفه",),
    "دعاوی": ("دعوا",),
    "دیون": ("دین",),
    "ذراع": ("ذرع",),
    "رعایا": ("رعیت",),
    "روایات": ("روایت",),
    "شعرا": ("شاعر",),
    "شکایات": ("شکایت",),
    "شهوات": ("شهوت",),
    "شیوخ": ("شیخ",),
    "شئون": ("شأن",),
    "طبایع": ("طبع",),
    "ظروف": ("ظرف",),
    "ظواهر": ("ظاهر",),
    "عبادات": ("عبادت",),
    "عرایض": ("عریضه",),
    "عرفا": ("عارف",),
    "عروق": ("عرق",),
    "عساکر": ("عسکر",),
    "علماء": ("عالم",),
    "فتاوا": ("فتوا",),
    "فراعنه": ("فرعون",),
    "فرامین": ("فرمان",),
    "فروض": ("فرض",),
    "فروع": ("فرع",),
    "فصول": ("فصل",),
    "فقها": ("فقیه",),
    "قبور": ("قبر",),
    "قبوض": ("قبض",),
    "قدوم": ("قدم",),
    "قرائات": ("قرائت",),
    "قرائن": ("قرینه",),
    "لغات": ("لغت",),
    "مجامع": ("مجمع",),
    "مخازن": ("مخزن",),
    "مدارج": ("درجه",),
    "مذاهب": ("مذهب",),
    "مراکز": ("مرکز",),
    "مصارف": ("مصرف",),
    "مطامع": ("طمع",),
    "معانی": ("معنی",),
    "مناصب": ("منصب",),
    "منافذ": ("منفذ",),
    "مواریث": ("میراث",),
    "موازین": ("میزان",),
    "موالی": ("مولی",),
    "مواهب": ("موهبت",),
    "نسوان": ("نسا",),
    "نصوص": ("نص",),
    "نظایر": ("نظیر",),
    "نقایص": ("نقص",),
    "نقوش": ("نقش",),
    "ولایات": ("ولایت",),
    "هیئات": ("هیأت",),
    "جماهیر": ("جمهوری",),
    "خصائص": ("خصیصه",),
    "دقایق": ("دقیقه",),
    "رذایل": ("رذیلت",),
    "طوایف": ("طایفه",),
    "علامات": ("علامت",),
    "علایق": ("علاقه",),
    "علل": ("علت",),
    "غرایز": ("غریزه",),
    "غرائز": ("غریزه",),
    "غنایم": ("غنیمت",),
    "فرائض": ("فریضه",),
    "فضائل": ("فضیلت",),
    "فقرا": ("فقیر",),
    "فلاسفه": ("فیلسوف",),
    "فواحش": ("فاحشه",),
    "قصائد": ("قصیده",),
    "قصاید": ("قصیده",),
    "قوائد": ("قائده",),
    "مزارع": ("مزرعه",),
    "مصائب": ("مصیبت",),
    "معارف": ("معرفت",),
    "نصایح": ("نصیحت",),
    "وثایق": ("وثیقه",),
    "وظائف": ("وظیفه",),
    "توابین": ("تواب",),
    "رفقا": ("رفیق",),
    "رقبا": ("رقیب",),
    "زحمات": ("زحمت",),
    "زعما": ("زعیم",),
    "زوایا": ("زاویه",),
    "سماوات": ("سما",),
    "علوفه": ("علف",),
    "غایات": ("غایت",),
    "فنون": ("فن",),
    "لذات": ("لذت",),
    "نعمات": ("نعمت",),
    "امراء": ("امیر",),
    "امرا": ("امیر",),
    "دهاقین": ("دهقان",),
    "سنوات": ("سنه",),
    "عمارات": ("عمارت",),
    "فتوح": ("فتح",),
    "لذائذ": ("لذیذ",),
    "لذایذ": ("لذیذ", "لذت",),
    "تکایا": ("تکیه",),
    "صفات": ("صفت",),
    "خصوصیات": ("خصوصیت",),
    "کیفیات": ("کیفیت",),
    "حملات": ("حمله",),
    "شایعات": ("شایعه",),
    "صدمات": ("صدمه",),
    "غلات": ("غله",),
    "کلمات": ("کلمه",),
    "مبارزات": ("مبارزه",),
    "مراجعات": ("مراجعه",),
    "مطالبات": ("مطالبه",),
    "مکاتبات": ("مکاتبه",),
    "نشریات": ("نشریه",),
    "بحور": ("بحر",),
    "تحقیقات": ("تحقیق",),
    "مکالمات": ("مکالمه",),
    "ریزمکالمات": ("ریزمکالمه",),
    "تجربیات": ("تجربه",),
    "جملات": ("جمله",),
    "حالات": ("حالت",),
    "حجاج": ("حاجی",),
    "حسنات": ("حسنه",),
    "حشرات": ("حشره",),
    "خاطرات": ("خاطره",),
    "درجات": ("درجه",),
    "دفعات": ("دفعه",),
    "سیارات": ("سیاره",),
    "شبهات": ("شبهه",),
    "ضایعات": ("ضایعه",),
    "ضربات": ("ضربه",),
    "طبقات": ("طبقه",),
    "فرضیات": ("فرضیه",),
    "قطرات": ("قطره",),
    "قطعات": ("قطعه",),
    "قلاع": ("قلعه",),
    "کشیشان": ("کشیش",),
    "مادیات": ("مادی",),
    "مباحثات": ("مباحثه",),
    "مجاهدات": ("مجاهدت",),
    "محلات": ("محله",),
    "مداخلات": ("مداخله",),
    "مشقات": ("مشقت",),
    "معادلات": ("معادله",),
    "معوقات": ("معوقه",),
    "منویات": ("منویه",),
    "موقوفات": ("موقوفه",),
    "موسسات": ("موسسه",),
    "حلقات": ("حلقه",),
    "ایات": ("ایه",),
    "اصلح": ("صالح",),
    "اظهر": ("ظاهر",),
    "آیات": ("آیه",),
    "برکات": ("برکت",),
    "جزوات": ("جزوه",),
    "خطابات": ("خطابه",),
    "دوایر": ("دایره",),
    "روحیات": ("روحیه",),
    "متهمان": ("متهم",),
    "مجاری": ("مجرا",),
    "مشترکات": ("مشترک",),
    "ورثه": ("وارث",),
    "وکلا": ("وکیل",),
    "نقبا": ("نقیب",),
    "سفرا": ("سفیر",),
    "مآخذ": ("مأخذ",),
    "احوال": ("حال",),
    "آلام": ("الم",),
    "مزایا": ("مزیت",),
    "عقلا": ("عاقل",),
    "مشاهد": ("مشهد",),
    "ظلمات": ("ظلمت",),
    "خفایا": ("خفیه",),
    "مشاهدات": ("مشاهده",),
    "امامان": ("امام",),
    "سگان": ("سگ",),
    "نظریات": ("نظریه",),
    "آفاق": ("افق",),
    "آمال": ("امل",),
    "دکاکین": ("دکان",),
    "قصبات": ("قصبه",),
    "مضرات": ("مضرت",),
    "قبائل": ("قبیله",),
    "مجانین": ("مجنون",),
    "سيئات": ("سیئه",),
    "صدقات": ("صدقه",),
    "کثافات": ("کثافت",),
    "کسورات": ("کسر",),
    "معالجات": ("معالجه",),
    "مقابلات": ("مقابله",),
    "مناظرات": ("مناظره",),
    "ناملايمات": ("ناملایمت",),
    "وجوهات": ("وجه",),
    "مصادرات": ("مصادره",),
    "ملمعات": ("ملمع",),
    "اولویات": ("اولویت",),
    "جمرات": ("جمره",),
    "زیارات": ("زیارت",),
    "عقبات": ("عقبه",),
    "کرامات": ("کرامت",),
    "مراقبات": ("مراقبه",),
    "نجاسات": ("نجاست",),
    "هجویات": ("هجو",),
    "تبدلات": ("تبدل",),
    "روات": ("راوی",),
    "فیوضات": ("فیض",),
    "کفارات": ("کفاره",),
    "نذورات": ("نذر",),
    "حفریات": ("حفر",),
    "عنایات": ("عنایت",),
    "جراحات": ("جراحت",),
    "ثمرات": ("ثمره",),
    "حکام": ("حاکم",),
    "مرسولات": ("مرسوله",),
    "درایات": ("درایت",),
    "سیئات": ("سیئه",),
    "عدوات": ("عداوت",),
    "عشرات": ("عشره",),
    "عقوبات": ("عقوبه",),
    "عقودات": ("عقود",),
    "کثرات": ("کثرت",),
    "مواجهات": ("مواجهه",),
    "مواصلات": ("مواصله",),
    "اجوبه": ("جواب",),
    "اضلاع": ("ضلع",),
    "السنه": ("لسان",),
    "اشتات": ("شت",),
    "دعوات": ("دعوت",),
    "صعوبات": ("صعوبت",),
    "عفونات": ("عفونت",),
    "علوفات": ("علوفه",),
    "غرامات": ("غرامت",),
    "فارقات": ("فارقت",),
    "لزوجات": ("لزوجت",),
    "محللات": ("محلله",),
    "مسافات": ("مسافت",),
    "مسافحات": ("مسافحه",),
    "مسامرات": ("مسامره",),
    "مستلذات": ("مستلذ",),
    "مسرات": ("مسرت",),
    "مشافهات": ("مشافهه",),
    "مشاهرات": ("مشاهره",),
    "معروشات": ("معروشه",),
    "مجادلات": ("مجادله",),
    "ابغاض": ("بغض",),
    "اجداث": ("جدث",),
    "اجواز": ("جوز",),
    "اجواد": ("جواد",),
    "ازاهیر": ("ازهار",),
    "عوائد": ("عائده",),
    "احافیر": ("احفار",),
    "احزان": ("حزن",),
    "آنام": ("انام",),
    "احباب": ("حبیب",),
    "نوابغ": ("نابغه",),
    "بینات": ("بینه",),
    "حوالات": ("حواله",),
    "حوالجات": ("حواله",),
    "دستجات": ("دسته",),
    "شمومات": ("شموم",),
    "طاقات": ("طاقه",),
    "علاقات": ("علاقه",),
    "مراسلات": ("مراسله",),
    "موجهات": ("موجه",),
    "اقویا": ("قوی",),
    "اغنیا": ("غنی",),
    "بلایا": ("بلا",),
    "خطایا": ("خطا",),
    "ثنایا": ("ثنا",),
    "لوایح": ("لایحه",),
    "غزلیات": ("غزل",),
    "اشارات": ("اشاره",),
    "رکعات": ("رکعت",),
    "امثالهم": ("مثل",),
    "تشنجات": ("تشنج",),
    "امانات": ("امانت",),
    "بریات": ("بریت",),
    "توست": ("تو",),
    "حبست": ("حبس",),
    "حیثیات": ("حیثیت",),
    "شامات": ("شامه",),
    "قبالات": ("قباله",),
    "قرابات": ("قرابت",),
    "مطلقات": ("مطلقه",),
    "نزلات": ("نزله",),
    "بکمان": ("بکیم",),
    "روشان": ("روشن",),
    "مسانید": ("مسند",),
    "ناحیت": ("ناحیه",),
    "رسوله": ("رسول",),
    "دانشجویان": ("دانشجو",),
    "روحانیون": ("روحانی",),
    "قرون": ("قرن",),
    "انقلابیون": ("انقلابی",),
    "قوانین": ("قانون",),
    "مجاهدین": ("مجاهد",),
    "محققین": ("محقق",),
    "متهمین": ("متهم",),
    "مهندسین": ("مهندس",),
    "مؤمنین": ("مؤمن",),
    "مسئولین": ("مسئول",),
    "مشرکین": ("مشرک",),
    "مخاطبین": ("مخاطب",),
    "مأمورین": ("مأمور",),
    "سلاطین": ("سلطان",),
    "مضامین": ("مضمون",),
    "منتخبین": ("منتخب",),
    "متحدین": ("متحد",),
    "متخصصین": ("متخصص",),
    "مسوولین": ("مسوول",),
    "شیاطین": ("شیطان",),
    "مباشرین": ("مباشر",),
    "منتقدین": ("منتقد",),
    "موسسین": ("موسس",),
    "مسؤلین": ("مسؤل",),
    "متحجرین": ("متحجر",),
    "مهاجرین": ("مهاجر",),
    "مترجمین": ("مترجم",),
    "مدعوین": ("مدعو",),
    "مشترکین": ("مشترک",),
    "معصومین": ("معصوم",),
    "مسابقات": ("مسابقه",),
    "معانی": ("معنی",),
    "مطالعات": ("مطالعه",),
    "نکات": ("نکته",),
    "خصوصیات": ("خصوصیت",),
    "خدمات": ("خدمت",),
    "نشریات": ("نشریه",),
    "ساعات": ("ساعت",),
    "بزرگان": ("بزرگ",),
    "خسارات": ("خسارت",),
    "شیعیان": ("شیعه",),
    "واقعیات": ("واقعیت",),
    "مذاکرات": ("مذاکره",),
    "حشرات": ("حشره",),
    "طبقات": ("طبقه",),
    "شکایات": ("شکایت",),
    "ابیات": ("بیت",),
    "شایعات": ("شایعه",),
    "ضربات": ("ضربه",),
    "مقالات": ("مقاله",),
    "اوقات": ("وقت",),
    "عباراتی": ("عبارت",),
    "سالیان": ("سال",),
    "زحمات": ("زحمت",),
    "عبارات": ("عبارت",),
    "لغات": ("لغت",),
    "نیات": ("نیت",),
    "مطالبات": ("مطالبه",),
    "مطالب": ("مطلب",),
    "خلقیات": ("خلق",),
    "نکات": ("نکته",),
    "بزرگان": ("بزرگ",),
    "ابیاتی": ("بیت",),
    "محرمات": ("حرام",),
    "اوزان": ("وزن",),
    "اخلاقیات": ("اخلاق",),
    "سبزیجات": ("سبزی",),
    "اضافات": ("اضافه",),
    "قضات": ("قاضی",),
}
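`NOUNS_EXC` maps irregular (mostly Arabic broken) plurals to one or more lemma candidates, so exceptions can be checked before any suffix rules fire. A minimal lookup sketch, for illustration only (the real lemmatizer combines such a table with rules and an index; the mini-table below is an assumed two-entry copy):

```python
# Assumed mini-copy of two entries from the table above, for illustration.
NOUNS_EXC = {
    "کتب": ("کتاب",),
    "لذایذ": ("لذیذ", "لذت"),
}


def lemmatize_noun(word):
    # Exceptions win over suffix rules; values are tuples because one
    # plural can map to more than one lemma candidate.
    if word in NOUNS_EXC:
        return list(NOUNS_EXC[word])
    return [word]


print(lemmatize_noun("کتب"))    # ['کتاب']
print(lemmatize_noun("لذایذ"))  # ['لذیذ', 'لذت']
```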
6 spacy/lang/fa/lemmatizer/_verbs.py (new file)
@@ -0,0 +1,6 @@
# coding: utf8
from __future__ import unicode_literals


VERBS = set("""
""".split())
647 spacy/lang/fa/lemmatizer/_verbs_exc.py (new file)
@@ -0,0 +1,647 @@
# coding: utf8
|
||||
from __future__ import unicode_literals
|
||||
|
||||
|
||||
verb_roots = """
|
||||
#هست
|
||||
آخت#آهنج
|
||||
آراست#آرا
|
||||
آراماند#آرامان
|
||||
آرامید#آرام
|
||||
آرمید#آرام
|
||||
آزرد#آزار
|
||||
آزمود#آزما
|
||||
آسود#آسا
|
||||
آشامید#آشام
|
||||
آشفت#آشوب
|
||||
آشوبید#آشوب
|
||||
آغازید#آغاز
|
||||
آغشت#آمیز
|
||||
آفرید#آفرین
|
||||
آلود#آلا
|
||||
آمد#آ
|
||||
آمرزید#آمرز
|
||||
آموخت#آموز
|
||||
آموزاند#آموزان
|
||||
آمیخت#آمیز
|
||||
آورد#آر
|
||||
آورد#آور
|
||||
آویخت#آویز
|
||||
آکند#آکن
|
||||
آگاهانید#آگاهان
|
||||
ارزید#ارز
|
||||
افتاد#افت
|
||||
افراخت#افراز
|
||||
افراشت#افراز
|
||||
افروخت#افروز
|
||||
افروزید#افروز
|
||||
افزود#افزا
|
||||
افسرد#افسر
|
||||
افشاند#افشان
|
||||
افکند#افکن
|
||||
افگند#افگن
|
||||
انباشت#انبار
|
||||
انجامید#انجام
|
||||
انداخت#انداز
|
||||
اندوخت#اندوز
|
||||
اندود#اندا
|
||||
اندیشید#اندیش
|
||||
انگاشت#انگار
|
||||
انگیخت#انگیز
|
||||
انگیزاند#انگیزان
|
||||
ایستاد#ایست
|
||||
ایستاند#ایستان
|
||||
باخت#باز
|
||||
باراند#باران
|
||||
بارگذاشت#بارگذار
|
||||
بارید#بار
|
||||
باز#بازخواه
|
||||
بازآفرید#بازآفرین
|
||||
بازآمد#بازآ
|
||||
بازآموخت#بازآموز
|
||||
بازآورد#بازآور
|
||||
بازایستاد#بازایست
|
||||
بازتابید#بازتاب
|
||||
بازجست#بازجو
|
||||
بازخواند#بازخوان
|
||||
بازخوراند#بازخوران
|
||||
بازداد#بازده
|
||||
بازداشت#بازدار
|
||||
بازرساند#بازرسان
|
||||
بازرسانید#بازرسان
|
||||
باززد#باززن
|
||||
بازستاند#بازستان
|
||||
بازشمارد#بازشمار
|
||||
بازشمرد#بازشمار
|
||||
بازشمرد#بازشمر
|
||||
بازشناخت#بازشناس
|
||||
بازشناساند#بازشناسان
|
||||
بازفرستاد#بازفرست
|
||||
بازماند#بازمان
|
||||
بازنشست#بازنشین
|
||||
بازنمایاند#بازنمایان
|
||||
بازنهاد#بازنه
|
||||
بازنگریست#بازنگر
|
||||
بازپرسید#بازپرس
|
||||
بازگذارد#بازگذار
|
||||
بازگذاشت#بازگذار
|
||||
بازگرداند#بازگردان
|
||||
بازگردانید#بازگردان
|
||||
بازگردید#بازگرد
|
||||
بازگرفت#بازگیر
|
||||
بازگشت#بازگرد
|
||||
بازگشود#بازگشا
|
||||
بازگفت#بازگو
|
||||
بازیافت#بازیاب
|
||||
بافت#باف
|
||||
بالید#بال
|
||||
باوراند#باوران
|
||||
بایست#باید
|
||||
بخشود#بخش
|
||||
بخشود#بخشا
|
||||
بخشید#بخش
|
||||
بر#برخواه
|
||||
برآشفت#برآشوب
|
||||
برآمد#برآ
|
||||
برآورد#برآور
|
||||
برازید#براز
|
||||
برافتاد#برافت
|
||||
برافراخت#برافراز
|
||||
برافراشت#برافراز
|
||||
برافروخت#برافروز
|
||||
برافشاند#برافشان
|
||||
برافکند#برافکن
|
||||
براند#بران
|
||||
برانداخت#برانداز
|
||||
برانگیخت#برانگیز
|
||||
بربست#بربند
|
||||
برتاباند#برتابان
|
||||
برتابید#برتاب
|
||||
برتافت#برتاب
|
||||
برتنید#برتن
|
||||
برجهید#برجه
|
||||
برخاست#برخیز
|
||||
برخورد#برخور
|
||||
برد#بر
|
||||
برداشت#بردار
|
||||
بردمید#بردم
|
||||
برزد#برزن
|
||||
برشد#برشو
|
||||
برشمارد#برشمار
|
||||
برشمرد#برشمار
|
||||
برشمرد#برشمر
|
||||
برنشاند#برنشان
|
||||
برنشانید#برنشان
|
||||
برنشست#برنشین
|
||||
برنهاد#برنه
|
||||
برچید#برچین
|
||||
برکرد#برکن
|
||||
برکشید#برکش
|
||||
برکند#برکن
|
||||
برگذشت#برگذر
|
||||
برگرداند#برگردان
|
||||
برگردانید#برگردان
|
||||
برگردید#برگرد
|
||||
برگرفت#برگیر
|
||||
برگزید#برگزین
|
||||
برگشت#برگرد
|
||||
برگشود#برگشا
|
||||
برگمارد#برگمار
|
||||
برگمارید#برگمار
|
||||
برگماشت#برگمار
|
||||
برید#بر
|
||||
بست#بند
|
||||
بلعید#بلع
|
||||
بود#باش
|
||||
بوسید#بوس
|
||||
بویید#بو
|
||||
بیخت#بیز
|
||||
بیخت#بوز
|
||||
تاباند#تابان
|
||||
تابید#تاب
|
||||
تاخت#تاز
|
||||
تاراند#تاران
|
||||
تازاند#تازان
|
||||
تازید#تاز
|
||||
تافت#تاب
|
||||
ترادیسید#ترادیس
|
||||
تراشاند#تراشان
|
||||
تراشید#تراش
|
||||
تراوید#تراو
|
||||
ترساند#ترسان
|
||||
ترسید#ترس
|
||||
ترشاند#ترشان
|
||||
ترشید#ترش
|
||||
ترکاند#ترکان
|
||||
ترکید#ترک
|
||||
تفتید#تفت
|
||||
تمرگید#تمرگ
|
||||
تنید#تن
|
||||
توانست#توان
|
||||
توفید#توف
|
||||
تپاند#تپان
|
||||
تپید#تپ
|
||||
تکاند#تکان
|
||||
تکانید#تکان
|
||||
جست#جه
|
||||
جست#جو
|
||||
جنباند#جنبان
|
||||
جنبید#جنب
|
||||
جنگید#جنگ
|
||||
جهاند#جهان
|
||||
جهید#جه
|
||||
جوشاند#جوشان
|
||||
جوشانید#جوشان
|
||||
جوشید#جوش
|
||||
جويد#جو
|
||||
جوید#جو
|
||||
خاراند#خاران
|
||||
خارید#خار
|
||||
خاست#خیز
|
||||
خایید#خا
|
||||
خراشاند#خراشان
|
||||
خراشید#خراش
|
||||
خرامید#خرام
|
||||
خروشید#خروش
|
||||
خرید#خر
|
||||
خزید#خز
|
||||
خسبید#خسب
|
||||
خشکاند#خشکان
|
||||
خشکید#خشک
|
||||
خفت#خواب
|
||||
خلید#خل
|
||||
خماند#خمان
|
||||
خمید#خم
|
||||
خنداند#خندان
|
||||
خندانید#خندان
|
||||
خندید#خند
|
||||
خواباند#خوابان
|
||||
خوابانید#خوابان
|
||||
خوابید#خواب
|
||||
خواست#خواه
|
||||
خواست#خیز
|
||||
خواند#خوان
|
||||
خوراند#خوران
|
||||
خورد#خور
|
||||
خیزاند#خیزان
|
||||
خیساند#خیسان
|
||||
داد#ده
|
||||
داشت#دار
|
||||
دانست#دان
|
||||
در#درخواه
|
||||
درآمد#درآ
|
||||
درآمیخت#درآمیز
|
||||
درآورد#درآور
|
||||
درآویخت#درآویز
|
||||
درافتاد#درافت
|
||||
درافکند#درافکن
|
||||
درانداخت#درانداز
|
||||
درانید#دران
|
||||
دربرد#دربر
|
||||
دربرگرفت#دربرگیر
|
||||
درخشاند#درخشان
|
||||
درخشانید#درخشان
|
||||
درخشید#درخش
|
||||
درداد#درده
|
||||
دررفت#دررو
|
||||
درماند#درمان
|
||||
درنمود#درنما
|
||||
درنوردید#درنورد
|
||||
درود#درو
|
||||
دروید#درو
|
||||
درکرد#درکن
|
||||
درکشید#درکش
|
||||
درگذشت#درگذر
|
||||
درگرفت#درگیر
|
||||
دریافت#دریاب
|
||||
درید#در
|
||||
دزدید#دزد
|
||||
دمید#دم
|
||||
دواند#دوان
|
||||
دوخت#دوز
|
||||
دوشید#دوش
|
||||
دوید#دو
|
||||
دید#بین
|
||||
راند#ران
|
||||
ربود#ربا
|
||||
ربود#روب
|
||||
رخشید#رخش
|
||||
رساند#رسان
|
||||
رسانید#رسان
|
||||
رست#ره
|
||||
رست#رو
|
||||
رسید#رس
|
||||
رشت#ریس
|
||||
رفت#رو
|
||||
رفت#روب
|
||||
رقصاند#رقصان
|
||||
رقصید#رقص
|
||||
رماند#رمان
|
||||
رمانید#رمان
|
||||
رمید#رم
|
||||
رنجاند#رنجان
|
||||
رنجانید#رنجان
|
||||
رنجید#رنج
|
||||
رندید#رند
|
||||
رهاند#رهان
|
||||
رهانید#رهان
|
||||
رهید#ره
|
||||
روبید#روب
|
||||
روفت#روب
|
||||
رویاند#رویان
|
||||
رویانید#رویان
|
||||
رویید#رو
|
||||
رویید#روی
|
||||
ریخت#ریز
|
||||
رید#رین
|
||||
ریدن#رین
|
||||
ریسید#ریس
|
||||
زاد#زا
|
||||
زارید#زار
|
||||
زایاند#زایان
|
||||
زایید#زا
|
||||
زد#زن
|
||||
زدود#زدا
|
||||
زیست#زی
|
||||
ساباند#سابان
|
||||
سابید#ساب
|
||||
ساخت#ساز
|
||||
سایید#سا
|
||||
ستاد#ستان
|
||||
ستاند#ستان
|
||||
سترد#ستر
|
||||
ستود#ستا
|
||||
ستیزید#ستیز
|
||||
سراند#سران
|
||||
سرایید#سرا
|
||||
سرشت#سرش
|
||||
سرود#سرا
|
||||
سرکشید#سرکش
|
||||
سرگرفت#سرگیر
|
||||
سرید#سر
|
||||
سزید#سز
|
||||
سفت#سنب
|
||||
سنجید#سنج
|
||||
سوخت#سوز
|
||||
سود#سا
|
||||
سوزاند#سوزان
|
||||
سپارد#سپار
|
||||
سپرد#سپار
|
||||
سپرد#سپر
|
||||
سپوخت#سپوز
|
||||
سگالید#سگال
|
||||
شاشید#شاش
|
||||
شایست#
|
||||
شایست#شاید
|
||||
شتاباند#شتابان
|
||||
شتابید#شتاب
|
||||
شتافت#شتاب
|
||||
شد#شو
|
||||
شست#شو
|
||||
شست#شوی
|
||||
شلید#شل
|
||||
شمار#شمر
|
||||
شمارد#شمار
|
||||
شمرد#شمار
|
||||
شمرد#شمر
|
||||
شناخت#شناس
|
||||
شناساند#شناسان
|
||||
شنفت#شنو
|
||||
شنید#شنو
|
||||
شوتید#شوت
|
||||
شوراند#شوران
|
||||
شورید#شور
|
||||
شکافت#شکاف
|
||||
شکاند#شکان
|
||||
شکاند#شکن
|
||||
شکست#شکن
|
||||
شکفت#شکف
|
||||
طلبید#طلب
|
||||
طپید#طپ
|
||||
غراند#غران
|
||||
غرید#غر
|
||||
غلتاند#غلتان
|
||||
غلتانید#غلتان
|
||||
غلتید#غلت
|
||||
غلطاند#غلطان
|
||||
غلطانید#غلطان
|
||||
غلطید#غلط
|
||||
فرا#فراخواه
|
||||
فراخواند#فراخوان
|
||||
فراداشت#فرادار
|
||||
فرارسید#فرارس
|
||||
فرانمود#فرانما
|
||||
فراگرفت#فراگیر
|
||||
فرستاد#فرست
|
||||
فرسود#فرسا
|
||||
فرمود#فرما
|
||||
فرهیخت#فرهیز
|
||||
فرو#فروخواه
|
||||
فروآمد#فروآ
|
||||
فروآورد#فروآور
|
||||
فروافتاد#فروافت
|
||||
فروافکند#فروافکن
|
||||
فروبرد#فروبر
|
||||
فروبست#فروبند
|
||||
فروخت#فروش
|
||||
فروخفت#فروخواب
|
||||
فروخورد#فروخور
|
||||
فروداد#فروده
|
||||
فرودوخت#فرودوز
|
||||
فرورفت#فرورو
|
||||
فروریخت#فروریز
|
||||
فروشکست#فروشکن
|
||||
فروفرستاد#فروفرست
|
||||
فروماند#فرومان
|
||||
فرونشاند#فرونشان
|
||||
فرونشانید#فرونشان
|
||||
فرونشست#فرونشین
|
||||
فرونمود#فرونما
|
||||
فرونهاد#فرونه
|
||||
فروپاشاند#فروپاشان
|
||||
فروپاشید#فروپاش
|
||||
فروچکید#فروچک
|
||||
فروکرد#فروکن
|
||||
فروکشید#فروکش
|
||||
فروکوبید#فروکوب
|
||||
فروکوفت#فروکوب
|
||||
فروگذارد#فروگذار
|
||||
فروگذاشت#فروگذار
|
||||
فروگرفت#فروگیر
|
||||
فریفت#فریب
|
||||
فشاند#فشان
|
||||
فشرد#فشار
|
||||
فشرد#فشر
|
||||
فلسفید#فلسف
|
||||
فهماند#فهمان
|
||||
فهمید#فهم
|
||||
قاپید#قاپ
|
||||
قبولاند#قبول
|
||||
قبولاند#قبولان
|
||||
لاسید#لاس
|
||||
لرزاند#لرزان
|
||||
لرزید#لرز
|
||||
لغزاند#لغزان
|
||||
لغزید#لغز
|
||||
لمباند#لمبان
|
||||
لمید#لم
|
||||
لنگید#لنگ
|
||||
لولید#لول
|
||||
لیسید#لیس
|
||||
ماسید#ماس
|
||||
مالاند#مالان
|
||||
مالید#مال
|
||||
ماند#مان
|
||||
مانست#مان
|
||||
مرد#میر
|
||||
مویید#مو
|
||||
مکید#مک
|
||||
نازید#ناز
|
||||
نالاند#نالان
|
||||
نالید#نال
|
||||
نامید#نام
|
||||
نشاند#نشان
|
||||
نشست#نشین
|
||||
نمایاند#نما
|
||||
نمایاند#نمایان
|
||||
نمود#نما
|
||||
نهاد#نه
|
||||
نهفت#نهنب
|
||||
نواخت#نواز
|
||||
نوازید#نواز
|
||||
نوردید#نورد
|
||||
نوشاند#نوشان
|
||||
نوشانید#نوشان
|
||||
نوشت#نویس
|
||||
نوشید#نوش
|
||||
نکوهید#نکوه
|
||||
نگاشت#نگار
|
||||
نگرید#
|
||||
نگریست#نگر
|
||||
هراساند#هراسان
|
||||
هراسانید#هراسان
|
||||
هراسید#هراس
|
||||
هشت#هل
|
||||
وا#واخواه
|
||||
واداشت#وادار
|
||||
وارفت#وارو
|
||||
وارهاند#وارهان
|
||||
واماند#وامان
|
||||
وانهاد#وانه
|
||||
واکرد#واکن واگذارد#واگذار واگذاشت#واگذار ور#ورخواه ورآمد#ورآ ورافتاد#ورافت وررفت#وررو ورزید#ورز
وزاند#وزان وزید#وز ویراست#ویرا پاشاند#پاشان پاشید#پاش پالود#پالا پایید#پا پخت#پز
پذیراند#پذیران پذیرفت#پذیر پراند#پران پراکند#پراکن پرداخت#پرداز پرستید#پرست پرسید#پرس پرهیخت#پرهیز
پرهیزید#پرهیز پروراند#پروران پرورد#پرور پرید#پر پسندید#پسند پلاساند#پلاسان پلاسید#پلاس پلکید#پلک
پناهاند#پناهان پناهید#پناه پنداشت#پندار پوساند#پوسان پوسید#پوس پوشاند#پوشان پوشید#پوش پویید#پو
پژمرد#پژمر پژوهید#پژوه پکید#پک پیراست#پیرا پیمود#پیما پیوست#پیوند پیچاند#پیچان پیچانید#پیچان
پیچید#پیچ چاپید#چاپ چایید#چا چراند#چران چرانید#چران چرباند#چربان چربید#چرب چرخاند#چرخان
چرخانید#چرخان چرخید#چرخ چروکید#چروک چرید#چر چزاند#چزان چسباند#چسبان چسبید#چسب چسید#چس
چشاند#چشان چشید#چش چلاند#چلان چلانید#چلان چپاند#چپان چپید#چپ چکاند#چکان چکید#چک
چید#چین کاست#کاه کاشت#کار کاوید#کاو کرد#کن کشاند#کشان کشانید#کشان کشت#کار
کشت#کش کشید#کش کند#کن کوباند#کوبان کوبید#کوب کوشید#کوش کوفت#کوب کوچانید#کوچان
کوچید#کوچ گایید#گا گداخت#گداز گذارد#گذار گذاشت#گذار گذراند#گذران گذشت#گذر گرازید#گراز
گرانید#گران گرایید#گرا گرداند#گردان گردانید#گردان گردید#گرد گرفت#گیر گروید#گرو گریاند#گریان
گریخت#گریز گریزاند#گریزان گریست#گر گریست#گری گزارد#گزار گزاشت#گزار گزید#گزین گسارد#گسار
گستراند#گستران گسترانید#گستران گسترد#گستر گسست#گسل گسلاند#گسل گسیخت#گسل گشاد#گشا گشت#گرد
گشود#گشا گفت#گو گمارد#گمار گماشت#گمار گنجاند#گنجان گنجانید#گنجان گنجید#گنج گنداند#گندان
گندید#گند گوارید#گوار گوزید#گوز گیراند#گیران یازید#یاز یافت#یاب یونید#یون
""".strip().split()

# The code below is a modified version of the HAZM package's verb conjugator,
# with some extra verbs. (Anything in hazm and not in here? A comparison is needed!)

VERBS_EXC = {}

with_nots = lambda items: items + ['ن' + item for item in items]
simple_ends = ['م', 'ی', '', 'یم', 'ید', 'ند']
narrative_ends = ['هام', 'های', 'ه', 'هایم', 'هاید', 'هاند']
present_ends = ['م', 'ی', 'د', 'یم', 'ید', 'ند']

# special case of '#هست':
VERBS_EXC.update({conj: 'هست' for conj in ['هست' + end for end in simple_ends]})
VERBS_EXC.update({conj: 'هست' for conj in ['نیست' + end for end in simple_ends]})

for verb_root in verb_roots:
    conjugations = []
    if '#' not in verb_root:
        continue
    past, present = verb_root.split('#')

    if past:
        past_simples = [past + end for end in simple_ends]
        past_imperfects = ['می' + item for item in past_simples]
        past_narratives = [past + end for end in narrative_ends]
        conjugations = with_nots(past_simples + past_imperfects + past_narratives)
    if present:
        imperatives = ['ب' + present, 'ن' + present]
        if present.endswith('ا') or present in ('آ', 'گو'):
            present = present + 'ی'
        present_simples = [present + end for end in present_ends]
        present_imperfects = ['می' + present + end for end in present_ends]
        present_subjunctives = ['ب' + present + end for end in present_ends]
        conjugations += with_nots(present_simples + present_imperfects) + \
            present_subjunctives + imperatives

    if past.startswith('آ'):
        conjugations = set(map(lambda item: item.replace('بآ', 'بیا').replace('نآ', 'نیا'),
                               conjugations))

    VERBS_EXC.update({conj: (past,) if past else present for conj in conjugations})
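To make the expansion above concrete, here is a minimal, self-contained sketch of the same logic applied to a single `past#present` entry (the helper `conjugate` is hypothetical, not part of the module):

```python
def conjugate(verb_root):
    # Minimal re-statement of the loop above for one 'past#present' root.
    with_nots = lambda items: items + ['ن' + item for item in items]
    simple_ends = ['م', 'ی', '', 'یم', 'ید', 'ند']
    narrative_ends = ['هام', 'های', 'ه', 'هایم', 'هاید', 'هاند']
    present_ends = ['م', 'ی', 'د', 'یم', 'ید', 'ند']

    past, present = verb_root.split('#')
    conjugations = []
    if past:
        past_simples = [past + end for end in simple_ends]
        past_imperfects = ['می' + item for item in past_simples]
        past_narratives = [past + end for end in narrative_ends]
        conjugations = with_nots(past_simples + past_imperfects + past_narratives)
    if present:
        # imperatives are built before the euphonic 'ی' is appended
        imperatives = ['ب' + present, 'ن' + present]
        if present.endswith('ا') or present in ('آ', 'گو'):
            present = present + 'ی'
        present_simples = [present + end for end in present_ends]
        present_imperfects = ['می' + present + end for end in present_ends]
        present_subjunctives = ['ب' + present + end for end in present_ends]
        conjugations += with_nots(present_simples + present_imperfects) + \
            present_subjunctives + imperatives
    return set(conjugations)

# 'گفت#گو' ("to say"): past stem 'گفت', present stem 'گو'
forms = conjugate('گفت#گو')
```

Every generated surface form is then mapped back to its past stem in `VERBS_EXC`, which is how conjugated verbs get resolved later.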

spacy/lang/fa/lex_attrs.py (new file, 92 lines)
@@ -0,0 +1,92 @@
# coding: utf8
from __future__ import unicode_literals

from ...attrs import LIKE_NUM

MIM = 'م'
ZWNJ_O_MIM = 'ام'
YE_NUN = 'ین'

_num_words = set("""
صفر یک دو سه چهار پنج شش شیش هفت هشت نه ده
یازده دوازده سیزده چهارده پانزده پونزده شانزده شونزده هفده هجده هیجده نوزده
بیست سی چهل پنجاه شصت هفتاد هشتاد نود
صد یکصد دویست سیصد چهارصد پانصد پونصد ششصد شیشصد هفتصد هفصد هشتصد نهصد
هزار میلیون میلیارد بیلیون بیلیارد تریلیون تریلیارد کوادریلیون کادریلیارد کوینتیلیون
""".split())


_ordinal_words = set("""
اول سوم سیام
""".split())

_ordinal_words.update({num + MIM for num in _num_words})
_ordinal_words.update({num + ZWNJ_O_MIM for num in _num_words})
_ordinal_words.update({num + YE_NUN for num in _ordinal_words})


def like_num(text):
    """Check whether the text resembles a number."""
    text = text.replace(',', '').replace('.', '') \
               .replace('،', '').replace('٫', '').replace('/', '')
    if text.isdigit():
        return True
    if text in _num_words:
        return True
    if text in _ordinal_words:
        return True
    return False


LEX_ATTRS = {
    LIKE_NUM: like_num
}
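A quick sanity check of the behaviour above, as a standalone sketch (the word sets here are tiny stand-in subsets, not the full lists):

```python
_num_words = {'صد', 'هزار', 'یک'}     # stand-in subset of the number words
_ordinal_words = {'اول', 'سوم'}       # stand-in subset of the ordinals

def like_num(text):
    """Check whether text resembles a number (digits or Persian number words)."""
    # strip Western and Persian thousands/decimal separators first
    for sep in (',', '.', '،', '٫', '/'):
        text = text.replace(sep, '')
    return text.isdigit() or text in _num_words or text in _ordinal_words

print(like_num('12,500'))  # True: separators removed, remainder is digits
print(like_num('۴۵'))      # True: str.isdigit() accepts Eastern Arabic digits
print(like_num('صد'))      # True: listed number word
print(like_num('سگ'))      # False
```

Note that `str.isdigit()` already handles Persian digits, so no explicit digit translation is needed.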

spacy/lang/fa/punctuation.py (new file, 16 lines)
@@ -0,0 +1,16 @@
# coding: utf8
from __future__ import unicode_literals

from ..punctuation import TOKENIZER_INFIXES
from ..char_classes import LIST_PUNCT, LIST_ELLIPSES, LIST_QUOTES, CURRENCY
from ..char_classes import QUOTES, UNITS, ALPHA, ALPHA_LOWER, ALPHA_UPPER

_suffixes = (LIST_PUNCT + LIST_ELLIPSES + LIST_QUOTES +
             [r'(?<=[0-9])\+',
              r'(?<=[0-9])%',  # 4% -> ["4", "%"]
              # Persian is written right-to-left
              r'(?<=[0-9])(?:{})'.format(CURRENCY),
              r'(?<=[0-9])(?:{})'.format(UNITS),
              r'(?<=[{au}][{au}])\.'.format(au=ALPHA_UPPER)])

TOKENIZER_SUFFIXES = _suffixes
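For instance, the `(?<=[0-9])%` lookbehind above is what lets the tokenizer split a trailing percent sign off a number; a quick standalone check with plain `re` (outside spaCy):

```python
import re

# one of the suffix patterns from _suffixes above
suffix_re = re.compile(r'(?<=[0-9])%')

m = suffix_re.search('4%')
print(m.start(), m.group())        # matches '%' at position 1, so "4%" splits
print(suffix_re.search('%4'))      # None: no digit precedes '%'
```

The lookbehind consumes nothing, so the digit stays part of the preceding token.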
@@ -1,105 +1,395 @@
# coding: utf8
from __future__ import unicode_literals

# stop words from HAZM package

STOP_WORDS = set("""
آباد آره آری آسانی آمد آمده آن آنان آنجا آنها آنها آنچه آنکه آورد آورده آیا آید
ات اثر از است استفاده اش اطلاعند الاسف البته الظاهر ام اما امروز امسال اند انکه او اول اکنون
اگر الواقع ای ایشان ایم این اینک اینکه

ب با بااین بار بارة باره بارها باز بازهم بازی باش باشد باشم باشند باشی باشید باشیم بالا بالاخره
بالاخص بالاست بالای بالطبع بالعکس باوجودی باورند باید بتدریج بتوان بتواند بتوانی بتوانیم بجز بخش بخشه بخشی بخصوص بخواه
بخواهد بخواهم بخواهند بخواهی بخواهید بخواهیم بخوبی بد بدان بدانجا بدانها بدون بدین بدینجا بر برآن برآنند برا برابر
براحتی براساس براستی برای برایت برایش برایشان برایم برایمان برخوردار برخوردارند برخی برداری برداشتن بردن برعکس برنامه
بروز بروشنی بزرگ بزودی بس بسا بسادگی بسته بسختی بسوی بسی بسیار بسیاری بشدت بطور بطوری بعد بعدا بعدازظهر بعدها بعری
بعضا بعضی بعضیهایشان بعضیها بعلاوه بعید بفهمی بلافاصله بله بلکه بلی بنابراین بندی به بهت بهتر بهترین بهش بود بودم بودن
بودند بوده بودی بودید بودیم بویژه بپا بکار بکن بکند بکنم بکنند بکنی بکنید بکنیم بگو بگوید بگویم بگویند بگویی بگویید
بگوییم بگیر بگیرد بگیرم بگیرند بگیری بگیرید بگیریم بی بیا بیاب بیابد بیابم بیابند بیابی بیابید بیابیم بیاور بیاورد
بیاورم بیاورند بیاوری بیاورید بیاوریم بیاید بیایم بیایند بیایی بیایید بیاییم بیرون بیست بیش بیشتر بیشتری بین بیگمان

پ پا پارسال پارسایانه پارهای پاعین پایین پدرانه پدیده پرسان پروردگارا پریروز پس پشت پشتوانه پشیمونی پنج پهن پی پیدا
پیداست پیرامون پیش پیشاپیش پیشتر پیوسته

ت تا تازه تازگی تان تاکنون تحت تحریم تدریج تر ترتیب تردید ترند ترین تصریحا تعدادی تعمدا تفاوتند تقریبا تک تلویحا تمام
تماما تمامشان تمامی تند تنها تو توؤما تواند توانست توانستم توانستن توانستند توانسته توانستی توانستیم توانم توانند توانی
توانید توانیم توسط تول توی

ث ثالثا ثانی ثانیا

ج جا جای جایی جدا جداگانه جدید جدیدا جریان جز جلو جلوگیری جلوی جمع جمعا جمعی جنابعالی جناح جنس جهت جور جوری

چ چاله چاپلوسانه چت چته چرا چشم چطور چقدر چنان چنانچه چنانکه چند چندان چنده چندین چنین چه چهار چو چون چکار چگونه چی چیز
چیزهاست چیزی چیزیست چیست چیه

ح حاشیه حاشیهای حاضر حاضرم حال حالا حاکیست حتما حتی حداقل حداکثر حدود حدودا حسابگرانه حسابی حضرتعالی حق حقیرانه حول
حکما

خ خارج خالصانه خب خداحافظ خداست خدمات خستهای خصوصا خلاصه خواست خواستم خواستن خواستند خواسته خواستی خواستید خواستیم
خواهد خواهم خواهند خواهی خواهید خواهیم خوب خوبی خود خودبه خودت خودتان خودتو خودش خودشان خودم خودمان خودمو خودی خوش
خوشبختانه خویش خویشتن خویشتنم خیاه خیر خیره خیلی

د دا داام دااما داخل داد دادم دادن دادند داده دادی دادید دادیم دار داراست دارد دارم دارند داری دارید داریم داشت داشتم
داشتن داشتند داشته داشتی داشتید داشتیم دامم دانست دانند دایم دایما در دراین درباره درحالی درحالیکه درست درسته درشتی
درصورتی درعین درمجموع درواقع درون درپی دریغ دریغا دسته دشمنیم دقیقا دلخواه دم دنبال ده دهد دهم دهند دهی دهید دهیم دو
دوباره دوم دیده دیر دیرت دیرم دیروز دیشب دیوی دیگر دیگران دیگری دیگه

ذ ذاتا ذلک ذیل

ر را راجع راحت راسا راست راستی راه رسما رسید رشته رغم رفت رفتارهاست رفته رنجند رهگشاست رو رواست روب روبروست روز روزانه
روزه روزهای روزهای روش روشنی روی رویش ریزی

ز زدن زده زشتکارانند زمان زمانی زمینه زنند زهی زود زودتر زودی زیاد زیاده زیر زیرا

س سابق ساختن ساخته ساده سادگی سازی سالانه سالته سالمتر ساله سالهاست سالها سالیانه سایر ست سخت سخته سر سراسر سرانجام
سراپا سرعت سری سریع سریعا سعی سمت سه سهوا سوم سوی سپس سیاه

ش شان شاهدند شاهدیم شاید شبهاست شخصا شد شدت شدم شدن شدند شده شدی شدید شدیدا شدیم شش شما شماری شماست شمایند شناسی شود
شوراست شوم شوند شونده شوی شوید شویم شیرین شیرینه

ص صددرصد صرفا صریحا صندوق صورت صورتی

ض ضد ضمن ضمنا

ط طبعا طبق طبیعتا طرف طریق طلبکارانه طور طوری طی

ظ ظاهرا

ع عاجزانه عاقبت عبارتند عجب عجولانه عدم عرفانی عقب علاوه علت علنا علی علیه عمدا عمدتا عمده عمل عملا عملی عموم عموما
عنقریب عنوان عینا

غ غالبا غیر غیرقانونی

ف فاقد فبها فر فردا فعلا فقط فلان فلذا فوق فکر فی فیالواقع

ق قاالند قابل قاطبه قاطعانه قاعدتا قانونا قبل قبلا قبلند قد قدر قدری قراردادن قصد قطعا

ک کارند کاش کاشکی کامل کاملا کتبا کجا کجاست کدام کرات کرد کردم کردن کردند کرده کردی کردید کردیم کس کسانی کسی کشیدن کل
کلا کلی کلیشه کلیه کم کمااینکه کماکان کمتر کمتره کمتری کمی کن کنار کنارش کنان کنایهای کند کنم کنند کننده کنون کنونی
کنی کنید کنیم که کو کی كي

گ گاه گاهی گذاری گذاشتن گذاشته گذشته گردد گرفت گرفتارند گرفتم گرفتن گرفتند گرفته گرفتی گرفتید گرفتیم گرمی گروهی گرچه
گفت گفتم گفتن گفتند گفته گفتی گفتید گفتیم گه گهگاه گو گونه گویا گویان گوید گویم گویند گویی گویید گوییم گیرد گیرم گیرند
گیری گیرید گیریم

ل لا لااقل لاجرم لب لذا لزوما لطفا لیکن لکن

م ما مادامی ماست مامان مان مانند مبادا متاسفانه متعاقبا متفاوتند مثل مثلا مجبورند مجددا مجموع مجموعا محتاجند محکم
محکمتر مخالفند مختلف مخصوصا مدام مدت مدتهاست مدتی مذهبی مرا مراتب مرتب مردانه مردم مرسی مستحضرید مستقیما مستند مسلما
مشت مشترکا مشغولند مطمانا مطمانم مطمینا مع معتقدم معتقدند معتقدیم معدود معذوریم معلومه معمولا معمولی مغرضانه مفیدند
مقابل مقدار مقصرند مقصری ممکن من منتهی منطقی مواجهند موارد موجودند مورد موقتا مکرر مکررا مگر می مي میان میزان میلیارد
میلیون میرسد میرود میشود میکنیم

ن ناامید ناخواسته ناراضی ناشی نام ناچار ناگاه ناگزیر ناگهان ناگهانی نباید نبش نبود نخست نخستین نخواهد نخواهم نخواهند
نخواهی نخواهید نخواهیم نخودی ندارد ندارم ندارند نداری ندارید نداریم نداشت نداشتم نداشتند نداشته نداشتی نداشتید نداشتیم
نزد نزدیک نسبتا نشان نشده نظیر نفرند نفهمی نماید نمی نمیشود نه نهایت نهایتا نوع نوعا نوعی نکرده نگاه نیازمندانه
نیازمندند نیز نیست نیمی

و وابسته واقع واقعا واقعی واقفند وای وجه وجود وحشت وسط وضع وضوح وقتی وقتیکه ولی وگرنه وگو وی ویا ویژه

ه ها هاست های هایی هبچ هدف هر هرحال هرچند هرچه هرکس هرگاه هرگز هزار هست هستم هستند هستی هستید هستیم هفت هق هم همان
همانند همانها همدیگر همزمان همه همهاش همواره همچنان همچنین همچون همچین همگان همگی همیشه همین هنوز هنگام هنگامی هوی هی
هیچ هیچکدام هیچکس هیچگاه هیچگونه هیچی

ی یا یابد یابم یابند یابی یابید یابیم یارب یافت یافتم یافتن یافته یافتی یافتید یافتیم یعنی یقینا یواش یک یکدیگر یکریز
یکسال یکی یکي
""".split())
و در به از که این را با است برای آن یک خود تا کرد
بر هم نیز گفت میشود وی شد دارد ما اما یا شده باید هر آنها
بود او دیگر دو مورد میکند شود کند وجود بین پیش شدهاست پس نظر اگر
همه یکی حال هستند من کنند نیست باشد چه بی می بخش میکنند همین افزود
هایی دارند راه همچنین روی داد بیشتر بسیار سه داشت چند سوی تنها هیچ میان
اینکه شدن بعد جدید ولی حتی کردن برخی کردند میدهد اول نه کردهاست نسبت بیش
شما چنین طور افراد تمام درباره بار بسیاری میتواند کرده چون ندارد دوم بزرگ طی
حدود همان بدون البته آنان میگوید دیگری خواهدشد کنیم قابل یعنی رشد میتوان وارد کل
ویژه قبل براساس نیاز گذاری هنوز لازم سازی بودهاست چرا میشوند وقتی گرفت کم جای
حالی تغییر پیدا اکنون تحت باعث مدت فقط زیادی تعداد آیا بیان رو شدند عدم
کردهاند بودن نوع بلکه جاری دهد برابر مهم بوده اخیر مربوط امر زیر گیری شاید
خصوص آقای اثر کننده بودند فکر کنار اولین سوم سایر کنید ضمن مانند باز میگیرد
ممکن حل دارای پی مثل میرسد اجرا دور منظور کسی موجب طول امکان آنچه تعیین
گفته شوند جمع خیلی علاوه گونه تاکنون رسید ساله گرفته شدهاند علت چهار داشتهباشد خواهدبود
طرف تهیه تبدیل مناسب زیرا مشخص میتوانند نزدیک جریان روند بنابراین میدهند یافت نخستین بالا
پنج ریزی عالی چیزی نخست بیشتری ترتیب شدهبود خاص خوبی خوب شروع فرد کامل غیر
میرود دهند آخرین دادن جدی بهترین شامل گیرد بخشی باشند تمامی بهتر دادهاست حد نبود
کسانی میکرد داریم علیه میباشد دانست ناشی داشتند دهه میشد ایشان آنجا گرفتهاست دچار میآید
لحاظ آنکه داده بعضی هستیم اند برداری نباید میکنیم نشست سهم همیشه آمد اش وگو
میکنم حداقل طبق جا خواهدکرد نوعی چگونه رفت هنگام فوق روش ندارند سعی بندی شمار
کلی کافی مواجه همچنان زیاد سمت کوچک داشتهاست چیز پشت آورد حالا روبه سالهای دادند
میکردند عهده نیمه جایی دیگران سی بروز یکدیگر آمدهاست جز کنم سپس کنندگان خودش همواره
یافته شان صرف نمیشود رسیدن چهارم یابد متر ساز داشته کردهبود باره نحوه کردم تو
شخصی داشتهباشند محسوب پخش کمی متفاوت سراسر کاملا داشتن نظیر آمده گروهی فردی ع همچون
خطر خویش کدام دسته سبب عین آوری متاسفانه بیرون دار ابتدا شش افرادی میگویند سالهای
درون نیستند یافتهاست پر خاطرنشان گاه جمعی اغلب دوباره مییابد لذا زاده گردد اینجا""".split())

spacy/lang/fa/syntax_iterators.py (new file, 43 lines)
@@ -0,0 +1,43 @@
# coding: utf8
from __future__ import unicode_literals

from ...symbols import NOUN, PROPN, PRON


def noun_chunks(obj):
    """
    Detect base noun phrases from a dependency parse. Works on both Doc and Span.
    """
    labels = ['nsubj', 'dobj', 'nsubjpass', 'pcomp', 'pobj', 'dative', 'appos',
              'attr', 'ROOT']
    doc = obj.doc  # Ensure works on both Doc and Span.
    np_deps = [doc.vocab.strings.add(label) for label in labels]
    conj = doc.vocab.strings.add('conj')
    np_label = doc.vocab.strings.add('NP')
    seen = set()
    for i, word in enumerate(obj):
        if word.pos not in (NOUN, PROPN, PRON):
            continue
        # Prevent nested chunks from being produced
        if word.i in seen:
            continue
        if word.dep in np_deps:
            if any(w.i in seen for w in word.subtree):
                continue
            seen.update(j for j in range(word.left_edge.i, word.i + 1))
            yield word.left_edge.i, word.i + 1, np_label
        elif word.dep == conj:
            head = word.head
            while head.dep == conj and head.head.i < head.i:
                head = head.head
            # If the head is an NP, and we're coordinated to it, we're an NP
            if head.dep in np_deps:
                if any(w.i in seen for w in word.subtree):
                    continue
                seen.update(j for j in range(word.left_edge.i, word.i + 1))
                yield word.left_edge.i, word.i + 1, np_label


SYNTAX_ITERATORS = {
    'noun_chunks': noun_chunks
}
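The chunker yields `(left_edge, end, label)` spans for nominal heads and uses `seen` to suppress nested chunks. A minimal sketch of that span-emitting logic on a hand-built toy parse, with stand-in token objects so no spaCy model is needed (everything here is hypothetical scaffolding, not spaCy's API):

```python
from collections import namedtuple

# Stand-in token: i=index, pos/dep as strings, left=index of leftmost
# subtree token, subtree=indices it dominates (toy parse, built by hand)
Tok = namedtuple('Tok', 'i pos dep left subtree')

NP_DEPS = {'nsubj', 'dobj', 'pobj'}  # simplified subset of the labels above

def noun_chunks(tokens):
    seen = set()
    for w in tokens:
        if w.pos not in ('NOUN', 'PROPN', 'PRON') or w.i in seen:
            continue
        if w.dep in NP_DEPS:
            if any(j in seen for j in w.subtree):
                continue  # would overlap an already-emitted chunk
            seen.update(range(w.left, w.i + 1))
            yield w.left, w.i + 1, 'NP'

# Toy parse of "the cat sees a mouse" -> chunks "the cat", "a mouse"
toks = [
    Tok(0, 'DET',  'det',   0, (0,)),
    Tok(1, 'NOUN', 'nsubj', 0, (0, 1)),
    Tok(2, 'VERB', 'ROOT',  2, (0, 1, 2, 3, 4)),
    Tok(3, 'DET',  'det',   3, (3,)),
    Tok(4, 'NOUN', 'dobj',  3, (3, 4)),
]
print(list(noun_chunks(toks)))  # [(0, 2, 'NP'), (3, 5, 'NP')]
```

The real iterator additionally walks `conj` chains so coordinated nouns inherit NP status from their head.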

spacy/lang/fa/tag_map.py (new file, 39 lines)
@@ -0,0 +1,39 @@
# coding: utf8
from __future__ import unicode_literals

from ...symbols import POS, PUNCT, SYM, ADJ, CONJ, NUM, DET, ADV, ADP, X, VERB
from ...symbols import NOUN, PROPN, PART, INTJ, SPACE, PRON, AUX


TAG_MAP = {
    "ADJ": {POS: ADJ},
    "ADJ_CMPR": {POS: ADJ},
    "ADJ_INO": {POS: ADJ},
    "ADJ_SUP": {POS: ADJ},
    "ADV": {POS: ADV},
    "ADV_COMP": {POS: ADV},
    "ADV_I": {POS: ADV},
    "ADV_LOC": {POS: ADV},
    "ADV_NEG": {POS: ADV},
    "ADV_TIME": {POS: ADV},
    "CLITIC": {POS: PART},
    "CON": {POS: CONJ},
    "CONJ": {POS: CONJ},
    "DELM": {POS: PUNCT},
    "DET": {POS: DET},
    "FW": {POS: X},
    "INT": {POS: INTJ},
    "N_PL": {POS: NOUN},
    "N_SING": {POS: NOUN},
    "N_VOC": {POS: NOUN},
    "NUM": {POS: NUM},
    "P": {POS: ADP},
    "PREV": {POS: ADP},
    "PRO": {POS: PRON},
    "V_AUX": {POS: AUX},
    "V_IMP": {POS: VERB},
    "V_PA": {POS: VERB},
    "V_PP": {POS: VERB},
    "V_PRS": {POS: VERB},
    "V_SUB": {POS: VERB},
}
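At runtime the tagger's fine-grained Persian tags are collapsed to coarse universal POS values through this table. A minimal sketch of that mapping with plain strings (the real map uses spaCy's symbol constants, and the `'X'` fallback here is an illustrative assumption):

```python
# stand-in subset of TAG_MAP with string values instead of spaCy symbols
TAG_MAP = {
    'N_SING': 'NOUN', 'N_PL': 'NOUN', 'V_PRS': 'VERB',
    'ADJ_CMPR': 'ADJ', 'DELM': 'PUNCT',
}

tags = ['N_SING', 'ADJ_CMPR', 'V_PRS', 'DELM']
coarse = [TAG_MAP.get(t, 'X') for t in tags]  # unknown tags fall back to X
print(coarse)  # ['NOUN', 'ADJ', 'VERB', 'PUNCT']
```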

spacy/lang/fa/tokenizer_exceptions.py (new file, 2052 lines)
File diff suppressed because it is too large
@@ -6,7 +6,8 @@ from .punctuation import TOKENIZER_SUFFIXES, TOKENIZER_INFIXES
 from .tag_map import TAG_MAP
 from .stop_words import STOP_WORDS
 from .lex_attrs import LEX_ATTRS
-from .lemmatizer import LOOKUP
+from .lemmatizer import LEMMA_RULES, LEMMA_INDEX, LEMMA_EXC, LOOKUP
+from .lemmatizer.lemmatizer import FrenchLemmatizer
 from .syntax_iterators import SYNTAX_ITERATORS

 from ..tokenizer_exceptions import BASE_EXCEPTIONS
@@ -28,7 +29,16 @@ class FrenchDefaults(Language.Defaults):
     suffixes = TOKENIZER_SUFFIXES
     token_match = TOKEN_MATCH
     syntax_iterators = SYNTAX_ITERATORS
-    lemma_lookup = LOOKUP
+
+    @classmethod
+    def create_lemmatizer(cls, nlp=None):
+        lemma_rules = LEMMA_RULES
+        lemma_index = LEMMA_INDEX
+        lemma_exc = LEMMA_EXC
+        lemma_lookup = LOOKUP
+        return FrenchLemmatizer(index=lemma_index, exceptions=lemma_exc,
+                                rules=lemma_rules, lookup=lemma_lookup)


 class French(Language):

File diff suppressed because it is too large

spacy/lang/fr/lemmatizer/__init__.py (new file, 23 lines)
@@ -0,0 +1,23 @@
# coding: utf8
from __future__ import unicode_literals

from .lookup import LOOKUP
from ._adjectives import ADJECTIVES
from ._adjectives_irreg import ADJECTIVES_IRREG
from ._adverbs import ADVERBS
from ._nouns import NOUNS
from ._nouns_irreg import NOUNS_IRREG
from ._verbs import VERBS
from ._verbs_irreg import VERBS_IRREG
from ._dets_irreg import DETS_IRREG
from ._pronouns_irreg import PRONOUNS_IRREG
from ._auxiliary_verbs_irreg import AUXILIARY_VERBS_IRREG
from ._lemma_rules import ADJECTIVE_RULES, NOUN_RULES, VERB_RULES


LEMMA_INDEX = {'adj': ADJECTIVES, 'adv': ADVERBS, 'noun': NOUNS, 'verb': VERBS}

LEMMA_EXC = {'adj': ADJECTIVES_IRREG, 'noun': NOUNS_IRREG, 'verb': VERBS_IRREG,
             'det': DETS_IRREG, 'pron': PRONOUNS_IRREG, 'aux': AUXILIARY_VERBS_IRREG}

LEMMA_RULES = {'adj': ADJECTIVE_RULES, 'noun': NOUN_RULES, 'verb': VERB_RULES}
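These dictionaries feed the FrenchLemmatizer, which, in the usual pattern for index/exception/rule lemmatizers, tries the irregular-exceptions table first and only then applies suffix rules, keeping candidates that appear in the index. A generic sketch of that lookup order (illustrative only, not the actual FrenchLemmatizer code; the toy data is made up):

```python
def lemmatize(string, index, exceptions, rules):
    # 1) irregular forms win outright
    if string in exceptions:
        return exceptions[string][0]
    # 2) otherwise rewrite suffixes and keep results found in the index
    forms = set()
    for old, new in rules:
        if string.endswith(old):
            candidate = string[:len(string) - len(old)] + new
            if candidate in index:
                forms.add(candidate)
    # 3) fall back to the surface form if nothing matched
    return min(forms) if forms else string

index = {'grand', 'petit'}          # toy index of known lemmas
exceptions = {'yeux': ['œil']}      # toy irregular entry
rules = [('s', ''), ('es', '')]     # toy suffix-stripping rules

print(lemmatize('yeux', index, exceptions, rules))    # œil   (exception)
print(lemmatize('grands', index, exceptions, rules))  # grand (rule + index)
print(lemmatize('xyz', index, exceptions, rules))     # xyz   (fallback)
```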

spacy/lang/fr/lemmatizer/_adjectives.py (new file, 601 lines)
@@ -0,0 +1,601 @@
# coding: utf8
|
||||
from __future__ import unicode_literals
|
||||
|
||||
|
||||
ADJECTIVES = set("""
|
||||
abaissant abaissé abandonné abasourdi abasourdissant abattu abcédant aberrant
|
||||
abject abjurant aboli abondant abonné abordé abouti aboutissant abouté
|
||||
abricoté abrité abrouti abrupt abruti abrutissant abruzzain absent absolu
|
||||
absorbé abstinent abstrait abyssin abâtardi abêtissant abîmant abîmé acarpellé
|
||||
accablé accalminé accaparant accastillant accentué acceptant accepté accidenté
|
||||
accolé accombant accommodant accommodé accompagné accompli accordé accorné
|
||||
accoudé accouplé accoutumé accrescent accroché accru accréditant accrédité
|
||||
accueillant accumulé accusé accéléré acescent achalandé acharné achevé
|
||||
acidulé aciéré acotylé acquitté activé acuminé acutangulé acutifolié acutilobé
|
||||
adapté additionné additivé adextré adhérent adimensionné adiré adjacent
|
||||
adjoint adjugé adjuvant administré admirant adné adolescent adoptant adopté
|
||||
adossé adouci adoucissant adressé adroit adscrit adsorbant adultérin adéquat
|
||||
affaibli affaiblissant affairé affamé affectionné affecté affermi affidé
|
||||
affilé affin affligeant affluent affolant affolé affranchi affriolant affronté
|
||||
affété affûté afghan africain agaillardi agatin agatisé agaçant agglomérant
|
||||
agglutinant agglutiné aggravant agissant agitant agité agminé agnat agonisant
|
||||
agrafé agrandi agressé agrippant agrégé agréé aguichant ahanant ahuri
|
||||
aigretté aigri aigrissant aiguilleté aiguisé ailé aimant aimanté aimé ajourné
|
||||
ajusté alabastrin alambiqué alangui alanguissant alarmant alarmé albuginé
|
||||
alcalescent alcalifiant alcalin alcalinisant alcoolisé aldin alexandrin alezan
|
||||
aligoté alizé aliénant aliéné alkylant allaitant allant allemand allergisant
|
||||
alliciant allié allongé allumant allumé alluré alléchant allégeant allégé
|
||||
alphabloquant alphastimulant alphonsin alpin alternant alternifolié
|
||||
altérant altéré alucité alvin alvéolé alésé amaigri amaigrissant amalgamant
|
||||
amaril ambiant ambisexué ambivalent ambulant ami amiantacé amiantin amidé
|
||||
aminé ammoniacé ammoniaqué ammonié amnistiant amnistié amnésiant amoindrissant
|
||||
amorti amplifiant amplifié amplié ampoulé amputé amusant amusé amylacé
|
||||
américain amérisant anabolisant analgésiant anamorphosé anarchisant anastigmat
|
||||
anavirulent ancorné andin andorran anergisant anesthésiant angevin anglican
|
||||
angoissant angoissé angustifolié angustipenné animé anisé ankylosant ankylosé
|
||||
anobli anoblissant anodin anovulant ansé ansérin antenné anthropisé
|
||||
antialcalin antiallemand antiamaril antiautoadjoint antibrouillé
|
||||
anticipant anticipé anticoagulant anticontaminant anticonvulsivant
|
||||
antidécapant antidéflagrant antidérapant antidétonant antifeutrant
|
||||
antigivrant antiglissant antiliant antimonié antiméthémoglobinisant antinatal
|
||||
antiodorant antioxydant antiperspirant antiquisant antirassissant
|
||||
antiréfléchissant antirépublicain antirésonant antirésonnant antisymétrisé
|
||||
antivieillissant antiémétisant antécédent anténatal antéposé antérieur
|
||||
antérosupérieur anémiant anémié aoûté apaisant apeuré apicalisé aplati apocopé
|
||||
apparent apparenté apparié appartenant appaumé appelant appelé appendiculé
|
||||
appointé apposé apprivoisé approchant approché approfondi approprié approuvé
|
||||
apprêté appuyé appétissant apérianthé aquarellé aquitain arabisant araucan
|
||||
arborisé arboré arcelé archaïsant archiconnu archidiocésain architecturé
|
||||
ardent ardoisé ardu argentin argenté argilacé arillé armoricain armé
|
||||
arpégé arqué arrangeant arrivant arrivé arrogant arrondi arrosé arrêté arsénié
|
||||
articulé arénacé aréolé arétin ascendant ascosporé asexué asin asphyxiant
|
||||
aspirant aspiré assaillant assainissant assaisonné assassin assassinant
|
||||
asservissant assidu assimilé assistant assisté assiégeant assiégé associé
|
||||
assommant assonancé assonant assorti assoupi assoupissant assouplissant
|
||||
assujetti assujettissant assuré asséchant astreignant astringent atloïdé
|
||||
atonal atrophiant atrophié attachant attaquant attardé atteint attenant
|
||||
attendu attentionné atterrant attesté attirant attitré attrayant attristant
|
||||
atélectasié auriculé auscitain austral authentifiant autoadjoint autoagrippant
|
||||
autoancré autobronzant autocentré autocohérent autocollant autocommandé
|
||||
autocontraint autoconvergent autocopiant autoflagellant autofondant autoguidé
|
||||
autolubrifiant autolustrant autolégitimant autolégitimé automodifiant
|
||||
autonettoyant autoportant autoproduit autopropulsé autorepassant autorisé
|
||||
autosuffisant autotrempant auvergnat avachi avalant avalé avancé avarié
|
||||
aventuriné aventuré avenu averti aveuglant avianisé avili avilissant aviné
|
||||
avivé avoisinant avoué avéré azimuté azoté azuré azéri aéronaval aéroporté
|
||||
aéré aîné babillard badaud badgé badin bahaï bahreïni bai baillonné baissant
|
||||
balafré balancé balbutiant baleiné ballant ballonisé ballonné ballottant
|
||||
balzan bambochard banal banalisé bancal bandant bandé bangladeshi banlieusard
|
||||
bantou baraqué barbant barbarisant barbelé barbichu barbifiant barbu bardé
|
||||
baroquisant barré baryté basané basculant basculé basedowifiant basedowifié
|
||||
bastillé bastionné bataillé batifolant battant battu bavard becqué bedonnant
|
||||
bellifontain belligérant benoît benzolé benzoïné berçant beurré biacuminé
|
||||
bicarbonaté bicarré bicomponent bicomposé biconstitué bicontinu bicornu
|
||||
bidonnant bienfaisant bienséant bienveillant bigarré bigot bigourdan bigéminé
|
||||
bilié billeté bilobé bimaculé binoclard biodégradant bioluminescent biorienté
|
||||
biparti bipectiné bipinné bipolarisé bipédiculé biramé birman biréfringent
|
||||
biscuité bisexué bismuthé bisontin bispiralé bissexué bisublimé bisérié
|
||||
biterné bivalent bivitellin bivoltin blafard blanchissant blanchoyant blasé
|
||||
blessé bleu bleuissant bleuté blindé blond blondin blondissant blondoyant
|
||||
blousant blâmant blêmissant bodybuildé boisé boitillant bombé bonard
|
||||
bondé bonifié bonnard borain bordant borin borné boré bossagé bossu bot
|
||||
bouclé boudiné bouffant bouffi bouillant bouilli bouillonnant boulant bouleté
|
||||
bouqueté bourdonnant bourdonné bourgeonnant bourrant bourrelé bourru bourré
|
||||
boutonné bovin bracelé bradycardisant braillard branchu branché branlant
|
||||
bressan bretessé bretonnant breveté briard bridgé bridé brillant brillanté
|
||||
bringueballant brinquebalant brinqueballant briochin brioché brisant brisé
|
||||
broché bromé bronzant bronzé brouillé broutant bruissant brun brunissant brut
|
||||
brévistylé brûlant brûlé budgeté burelé buriné bursodépendant busqué busé
|
||||
butyracé buté byzantin bâtard bâti bâté béant béat bédouin bégayant bénard
|
||||
bénédictin béquetant béquillard bétonné bêlant bêtabloquant bêtifiant bômé
|
||||
cabochard cabotin cabriolant cabré cacaoté cachectisant cachemiri caché
|
||||
cadjin cadmié caducifolié cafard cagnard cagot cagoulé cahotant caillouté
|
||||
calcicordé calcifié calculé calmant calotin calé camard cambrousard cambré
|
||||
camisard campagnard camphré campé camé canaliculé canin cannelé canné cantalou
|
||||
canulant cané caoutchouté capitolin capitulant capitulard capité capricant
|
||||
capsulé captivant capuchonné caquetant carabiné caracolant caractérisé
|
||||
carbonaté carboné carburant carburé cardiocutané cardé carencé caressant
|
||||
carillonnant carillonné carié carminé carné carolin caronculé carpé carré
|
||||
caréné casqué cassant cassé castelroussin castillan catalan catastrophé
|
||||
catégorisé caudé caulescent causal causant cavalcadant celtisant cendré censé
|
||||
centraméricain centré cerclé cerdagnol cerdan cerné certain certifié cervelé
|
||||
chafouin chagrin chagrinant chagriné chaloupé chamoisé chamoniard chancelant
|
||||
chantant chançard chapeauté chapé charançonné chargé charmant charnu charpenté
|
||||
charrié chartrain chassant chasé chatoyant chaud chauffant chaussant chauvin
|
||||
chenillé chenu chevalin chevauchant chevelu chevelé chevillé chevronné
|
||||
chiant chicard chiffonné chiffré chimioluminescent chimiorésistant chiné
|
||||
chié chlamydé chleuh chlorurant chloruré chloré chocolaté choisi choké
|
||||
choral chronodépendant chryséléphantin chuintant chypré châtain chélatant
|
||||
chômé ciblé cicatrisant cilié cinglant cinglé cintré circiné circonspect
|
||||
circonvoisin circulant circumtempéré ciré cisalpin cisjuran cispadan citadin
|
||||
citronné citérieur civil civilisé clabotant clair claironnant clairsemé
|
||||
clandestin clapotant claquant clarifiant clariné classicisant claudicant
|
||||
clavelé clignotant climatisé clinquant cliquetant clissé clivant cloisonné
|
||||
cloqué clouté cloîtré clément clémentin coagulant coalescent coalisé coassocié
|
||||
cocciné cocu codant codirigeant codominant codé codélirant codétenu coexistant
|
||||
cogné cohérent coiffant coiffé coinché cokéfiant colicitant colitigant
|
||||
collant collodionné collé colmatant colombin colonisé colorant coloré
|
||||
combattant combinant combinard combiné comburant comité commandant commençant
|
||||
commun communard communiant communicant communiqué communisant compact
|
||||
comparé compassé compatissant compensé complaisant complexant compliqué
|
||||
composant composé comprimé compromettant computérisé compétent comtadin conard
|
||||
concertant concerté conciliant concluant concomitant concordant concourant
|
||||
concupiscent concurrent concédant condamné condensant condensé condescendant
|
||||
conditionné condupliqué confiant confident confiné confit confondant confédéré
|
||||
congru congruent conjoint conjugant conjugué connaissant connard connivent
|
||||
conné conquassant conquérant consacrant consacré consanguin conscient conscrit
|
||||
conservé consistant consolant consolidé consommé consonant constant constellé
|
||||
constipant constipé constituant constitué constringent consultant conséquent
|
||||
containeurisé contaminant contemporain content contenu contestant continent
|
||||
continu contondant contourné contractant contraignant contraint contraposé
|
||||
contrarié contrastant contrasté contravariant contrecollé contredisant
|
||||
contrefait contrevariant contrevenant contrit controuvé controversé contrôlé
convaincu convalescent conventionné convenu convergent converti convoluté
convulsivant convulsé conçu cooccupant cooccurrent coopérant coordiné
coordonné copartageant coparticipant coquillé coquin coraillé corallin
cordé cornard corniculé cornu corné corpulent correct correspondant corrigé
corrodant corrompu corrélé corticodépendant corticorésistant cortisoné
coréférent cossard cossu costaud costulé costumé cotisant couard couchant
coulant coulissant coulissé coupant couperosé couplé coupé courant courbatu
couronnant couronné court courtaud courtisan couru cousu couturé couvert
covalent covariant coïncident coûtant crachotant craché cramoisi cramponnant
craquelé cravachant crawlé crevant crevard crevé criant criard criblant criblé
crispant cristallin cristallisant cristallisé crochu croisetté croiseté
croissanté croisé crollé croquant crossé crotté croulant croupi croupissant
croyant cru crucifié cruenté crustacé cryodesséché cryoprécipité crémant
crépi crépitant crépu crétacé crétin crétinisant créé crêpelé crêté cubain
cuirassé cuisant cuisiné cuit cuivré culminant culotté culpabilisant cultivé
cuscuté cutané cyanosé câblé câlin cédant célébrant cérulé cérusé cévenol
damassé damné dandinant dansant demeuré demi dentelé denticulé dentu denté
dessalé dessiccant dessillant dessiné dessoudé desséchant deutéré diadémé
diamanté diapré diastasé diazoté dicarbonylé dichloré diffamant diffamé
diffractant diffringent diffusant différencié différent différé difluoré
diiodé dilatant dilaté diligent dilobé diluant dimensionné dimidié dimidé
diminué diocésain diphasé diplômant diplômé direct dirigeant dirigé dirimant
discipliné discontinu discord discordant discriminant discuté disert disgracié
disloqué disodé disparu dispersant dispersé disposé disproportionné disputé
dissimulé dissipé dissociant dissocié dissolu dissolvant dissonant disséminé
distant distinct distingué distrait distrayant distribué disubstitué disulfoné
divagant divaguant divalent divergent divertissant divin divorcé djaïn
dodu dogmatisant dolent domicilié dominant dominicain donjonné donnant donné
dormant dorsalisant doré douci doué drageonnant dragéifié drainant dramatisant
drapé dreyfusard drogué droit dru drupacé dual ductodépendant dulcifiant dur
duveté dynamisant dynamité dyspnéisant dystrophiant déaminé débarqué débauché
débilitant débloquant débordant débordé débouchant débourgeoisé déboussolé
débridé débrouillard débroussaillant débroussé débutant décadent décaféiné
décalant décalcifiant décalvant décapant décapité décarburant décati décavé
décevant déchagriné décharné déchaînant déchaîné déchevelé déchiqueté
déchiré déchloruré déchu décidu décidué décidé déclaré déclassé déclenchant
décoiffant décolleté décolorant décoloré décompensé décomplémenté décomplété
déconcertant déconditionné déconfit décongestionnant déconnant déconsidéré
décontractant décontracturant décontracté décortiqué décoré découplé découpé
décousu découvert décrispant décrochant décroissant décrépi décrépit décuman
décussé décérébré dédoré défaillant défait défanant défatigant défavorisé
déferlant déferlé défiant déficient défigé défilant défini déflagrant défleuri
défléchi défoliant défoncé déformant défranchi défraîchi défrisant défroqué
défâché défécant déférent dégagé dégingandé dégivrant déglutiné dégonflé
dégourdi dégouttant dégoûtant dégoûté dégradant dégradé dégraissant dégriffé
déguisé dégénérescent dégénéré déhanché déhiscent déjeté délabrant délabré
délassant délavé délayé délibérant délibéré délicat délinquant déliquescent
délitescent délié déloqué déluré délégué démagnétisant démaquillant démaqué
dément démerdard démesuré démixé démodé démontant démonté démoralisant
démotivant démotivé démystifiant démyélinisant démyélisant démêlant dénaturant
dénigrant dénitrant dénitrifiant dénommé dénudé dénutri dénué déodorant
dépapillé dépareillé dépassé dépaysant dépaysé dépeigné dépenaillé dépendant
dépeuplé déphasé dépité déplacé déplaisant déplaquetté déplasmatisé dépliant
déplumant déplumé déplété dépoitraillé dépolarisant dépoli dépolitisant
déponent déporté déposant déposé dépouillé dépourvu dépoussiérant dépravant
déprimant déprimé déprédé dépérissant dépétainisé déracinant déraciné
dérangé dérapant dérestauré dérivant dérivé dérobé dérogeant déroulant
déréalisant déréglé désabusé désaccordé désadapté désaffectivé désaffecté
désaisonnalisé désaligné désaliénant désaltérant désaluminisé désambiguïsé
désargenté désarmant désarçonnant désassorti désatomisé désaturant désaxé
désemparé désenchanté désensibilisant désert désespérant désespéré
désherbant déshonorant déshumanisant déshydratant déshydraté déshydrogénant
désiconisé désillusionnant désincarné désincrustant désinfectant
désintéressé désirant désobligeant désoblitérant désobéi désobéissant
désodorisant désodé désoeuvré désolant désolé désopilant désordonné
désorienté désossé désoxydant désoxygénant déstabilisant déstressant
désuni déséquilibrant déséquilibré détachant détaché détartrant détendu détenu
déterminant déterminé déterré détonant détonnant détourné détraqué détérioré
développé déverbalisant dévergondé déversé dévertébré déviant dévissé
dévoisé dévolu dévorant dévot dévoué dévoyé déwatté déçu effacé effarant
effarouché effaré effervescent efficient effiloché effilé efflanqué
effluent effondré effrangé effrayant effrayé effronté effréné efféminé
emballant embarrassant embarrassé embellissant embiellé embouché embouti
embrassant embrassé embrouillant embrouillé embroussaillé embruiné embryonné
embusqué embêtant emmerdant emmiellant emmiélant emmotté empaillé empanaché
empenné emperlé empesé empiétant emplumé empoignant empoisonnant emporté
empressé emprunté empâté empêché empêtré encaissant encaissé encalminé
encapsulant encapsulé encartouché encastré encerclant enchanté enchifrené
encloisonné encloqué encombrant encombré encorné encourageant encroué
encroûté enculé endenté endiablé endiamanté endimanché endogé endolori
endormi endurant endurci enfantin enfariné enflammé enflé enfoiré enfoncé
engageant engagé engainant englanté englobant engoulé engourdi engourdissant
engraissant engravé engrenant engrené engrêlé enguiché enhardé enivrant
enjambé enjoué enkikinant enkysté enlaidissant enlaçant enlevé enneigé ennemi
ennuyant ennuyé enquiquinant enracinant enrageant enragé enregistrant enrhumé
enrichissant enrobé enseignant enseigné ensellé ensoleillé ensommeillé
ensoutané ensuqué entartré entendu enterré enthousiasmant entouré entrant
entraînant entrecoupé entrecroisé entrelacé entrelardé entreprenant entresolé
entrouvert enturbanné enté entêtant entêté envahissant envapé enveloppant
envenimé enviné environnant envié envoyé envoûtant ergoté errant erroné
escarpé espacé espagnol espagnolisant esquintant esquinté esseulé essorant
estomaqué estompé estropié estudiantin euphorisant euphémisé eurafricain
exacerbé exact exagéré exalbuminé exaltant exalté exaspérant excellent
excepté excitant excité exclu excluant excommunié excru excédant exempt
exercé exerçant exfoliant exhalant exhilarant exigeant exilé exinscrit
exondé exorbitant exorbité exosporé exostosant expansé expatrié expectant
expert expirant exploitant exploité exposé expropriant exproprié expulsé
expérimenté extasié extemporané extradossé extrafort extraplat extrapériosté
extraverti extroverti exténuant extérieur exubérant exultant facilitant
faiblissant faignant failli faillé fainéant faisandé faisant fait falot falqué
fané faraud farci fardé farfelu farinacé fasciculé fascinant fascisant fascié
fassi fastigié fat fatal fatigant fatigué fauché favorisant façonné faïencé
feint fendant fendillé fendu fenestré fenian fenêtré fermant fermentant
ferritisant ferruginisé ferré fertilisant fervent fescennin fessu festal
festival feuillagé feuilleté feuillu feuillé feutrant feutré fiancé fibrillé
ficelé fichant fichu fieffé figulin figuré figé filant fileté filoguidé
filé fimbrié fin final finalisé finaud fini finissant fiérot flabellé flagellé
flagrant flamand flambant flamboyant flambé flamingant flammé flanchard
flanquant flapi flatulent flavescent flemmard fleurdelisé fleuri fleurissant
flippant florentin florissant flottant flottard flotté flou fluctuant fluent
fluidifié fluocompact fluorescent fluoré flushé fléchissant fléché flémard
flétrissant flûté foisonnant foliacé folié folliculé folâtrant foncé fondant
fondé forain foraminé forcené forcé forfait forgé formalisé formaté formicant
formé fort fortifiant fortrait fortuit fortuné fossilisé foudroyant fouettard
fouillé fouinard foulant fourbu fourcheté fourchu fourché fourmillant fourni
foutral foutu foxé fracassant fractal fractionné fragilisant fragrant
franchouillard francisant franciscain franciscanisant frangeant frappant
fratrisé frelaté fretté friand frigorifié fringant fringué friqué frisant
frisotté frissonnant frisé frit froid froissant froncé frondescent frottant
froussard fructifiant fruité frumentacé frustrant frustré frutescent
fréquent fréquenté frétillant fugué fulgurant fulminant fumant fumé furfuracé
furibond fusant fuselé futur futé fuyant fuyard fâché fébricitant fécond
féculent fédéré félin féminin féminisant férin férié féru fêlé gabalitain
gagé gai gaillard galant galbé gallican gallinacé galloisant galonné galopant
ganglionné gangrené gangué gantelé garant garanti gardé garni garnissant
gauchisant gazonnant gazonné gazouillant gazé gaël geignard gelé genouillé
germanisant germé gestant gesticulant gibelin gigotant gigotté gigoté girond
gironné gisant gitan givrant givré glabrescent glacial glacé glandouillant
glapissant glaçant glissant glissé globalisant glomérulé glottalisé
gloussant gloutonnant gluant glucosé glycosylé glycuroconjugué godillé
goguenard gommant gommé goménolé gondolant gonflant gonflé gouleyant goulu
gourmand gourmé goussaut gouvernant gouverné goûtu goûté gradué gradé graffité
grand grandiloquent grandissant granité granoclassé granulé graphitisant
grasseyant gratifiant gratiné gratuit gravant gravitant greffant grelottant
grenelé grenu grené griffu grignard grilleté grillé grimaçant grimpant
grinçant grippé grisant grisonnant grivelé grondant grossissant grouillant
grésillant gueulard guignard guilloché guillotiné guindé guivré guéri gâté
gélatinisant gélatiné gélifiant gélifié géminé gémissant géniculé généralisant
géométrisant gérant gênant gêné gîté habilitant habilité habillé habitué
hachuré haché hagard halbrené haletant halin hallucinant halluciné hanché
hanté harassant harassé harcelant harcelé hardi harpé hasté haut hautain
hennissant heptaperforé herbacé herborisé herbu herminé hernié hersé heurté
hibernant hilarant hindou hircin hispanisant historicisant historisant
hivernant hiérosolymitain holocristallin hominisé homogénéisé homoprothallé
homoxylé honorant honoré hordéacé hormonodéprivé horodaté horrifiant
hottentot hoyé huguenot huitard humain humectant humiliant humilié huppé
hutu hyalin hydratant hydrocarboné hydrochloré hydrocuté hydrogénant hydrogéné
hydrosalin hydrosodé hydroxylé hyperalcalin hypercalcifiant hypercalcémiant
hypercoagulant hypercommunicant hypercorrect hyperfin hyperfractionné
hyperisé hyperlordosé hypermotivé hyperphosphatémiant hyperplan hypersomnolent
hypertrophiant hypertrophié hypervascularisé hypnotisant hypoalgésiant
hypocalcémiant hypocarpogé hypocholestérolémiant hypocotylé hypoglycémiant
hypolipidémiant hypophosphatémiant hyposodé hypotendu hypotonisant
hypovirulent hypoxémiant hâlé hébraïsant hébété hélicosporé héliomarin
hélitransporté hémicordé hémicristallin hémiplégié hémodialysé hémopigmenté
hépatostrié hérissant hérissé hésitant hétéroprothallé hétérosporé hétérostylé
identifié idiot idiotifiant idéal ignifugeant ignorant ignorantin ignoré igné
illimité illuminé imaginant imaginé imagé imbriqué imbrûlé imbu imité immaculé
immergé immigrant immigré imminent immodéré immortalisant immotivé immun
immunocompétent immunodéprimant immunodéprimé immunostimulant immunosupprimé
immédiat immérité impair impaludé imparfait imparidigité imparipenné impatient
impayé impensé imperforé impermanent imperméabilisant impertinent implorant
important importun importé imposant imposé impotent impressionnant imprimant
impromptu impromulgué improuvé imprudent imprévoyant imprévu impudent
impuni impur impénitent impétiginisé inabordé inabouti inabrité inabrogé
inaccepté inaccompli inaccoutumé inachevé inactivé inadapté inadéquat
inaguerri inaliéné inaltéré inanalysé inanimé inanitié inapaisé inaperçu
inapparenté inappliqué inapprivoisé inapproprié inapprécié inapprêté
inarticulé inassimilé inassorti inassouvi inassujetti inattaqué inattendu
inavoué incandescent incapacitant incarnadin incarnat incarné incendié
incessant inchangé inchâtié incident incidenté incitant incivil inclassé
incliné inclément incohérent incombant incomitant incommodant incommuniqué
incompétent inconditionné inconfessé incongru incongruent inconnu inconquis
inconsidéré inconsistant inconsolé inconsommé inconstant inconséquent
incontesté incontinent incontrôlé inconvenant incoordonné incorporant
incorrect incorrigé incriminant incriminé incritiqué incroyant incrustant
incréé incubant inculpé incultivé incurvé indeviné indifférencié indifférent
indirect indirigé indiscipliné indiscriminé indiscuté indisposé indistinct
indolent indompté indou indu induit indulgent indupliqué induré
indébrouillé indécent indéchiffré indécidué indéfini indéfinisé indéfriché
indélibéré indélicat indémontré indémêlé indépassé indépendant indépensé
indéterminé ineffectué inefficient inemployé inentamé inentendu inespéré
inexaucé inexercé inexistant inexpert inexpié inexpliqué inexploité inexploré
inexprimé inexpérimenté inexécuté infamant infantilisant infarci infatué
infectant infecté infestant infesté infichu infiltrant infini inflammé
infléchi infondé informant informulé infortuné infoutu infréquenté infusé
inféodé inférieur inférovarié ingrat ingénu inhabité inhalant inhibant inhibé
inhérent inimité inintelligent ininterrompu inintéressant initié injecté
innervant innocent innominé innommé innomé innovant inné inobservé inoccupé
inondé inopiné inopportun inopérant inorganisé inoublié inouï inquiétant
insatisfait insaturé inscrit insensé insermenté insignifiant insinuant
insolent insondé insonorisant insonorisé insouciant insoupçonné inspirant
inspécifié installé instant instantané instructuré instruit insubordonné
insulinodépendant insulinorésistant insultant insulté insurgé insécurisant
intelligent intempérant intentionné interallié interaméricain intercepté
intercristallin intercurrent interdigité interdiocésain interdit
interfacé interfécond interférent interloqué intermittent intermédié interpolé
interprétant intersecté intersexué interstratifié interurbain intervenant
intestin intimidant intolérant intoxicant intoxiqué intramontagnard
intrigant introduit introjecté introverti intumescent intégrant intégrifolié
intéressé intérieur inusité inutilisé invaincu invalidant invariant invendu
inverti invertébré inviolé invitant involucré involuté invérifié invétéré
inéclairci inécouté inédit inégalé inélégant inéprouvé inépuisé inéquivalent
iodoformé ioduré iodylé iodé ionisant iridescent iridié irisé ironisant
irraisonné irrassasié irritant irrité irréalisé irréfléchi irréfuté irrémunéré
irrésolu irrévélé islamisant isohalin isolant isolé isosporé issant issu
itinérant ivoirin jacent jacobin jaillissant jamaïcain jamaïquain jambé
japonné jardiné jarreté jarré jaspé jauni jaunissant javelé jaïn jobard joint
joli joufflu jouissant jovial jubilant juché judaïsant jumelé juponné juré
juxtaposant juxtaposé kalmouk kanak kazakh kenyan kosovar kératinisé labié
lacinié lactant lactescent lactosé lacté lai laid lainé laité lambin lambrissé
lamifié laminé lampant lampassé lamé lancinant lancé lancéolé languissant
lapon laqué lardacé larmoyant larvé laryngé lassant latent latifolié latin
latté latéralisé lauré lauréat lavant lavé laïcisant lent lenticulé letton
lettré leucopéniant leucosporé leucostimulant levant levantin levretté levé
liant libertin libéré licencié lichénifié liftant lifté ligaturé lignifié
ligulé lilacé limacé limitant limougeaud limousin lionné lippu liquéfiant
lithiné lithié lité lié liégé lobulé lobé localisé loculé lointain lombard
lorrain loré losangé loti louchant loupé lourd lourdaud lubrifiant luisant
lunetté lunulé luné lusitain lustré luthé lutin lutéinisant lutéostimulant
lyophilisé lyré léché lénifiant léonard léonin léopardé lézardé maboul maclé
madré mafflu maghrébin magnésié magnétisant magrébin magyar mahométan maillant
majeur majorant majorquin maladroit malaisé malavisé malbâti malentendant
malformé malintentionné malnutri malodorant malotru malouin malpoli malsain
malséant maltraitant malté malveillant malvoyant maléficié mamelonné mamelu
manchot mandarin mandchou maniéré mannité manoeuvrant manquant manqué mansardé
mantouan manuscrit manuélin maori maraîchin marbré marcescent marchand
marial marin mariol marié marmottant marocain maronnant marquant marqueté
marquésan marrant marri martelé martyr marxisant masculin masculinisant
masqué massacrant massant massé massétérin mat matelassé mati matérialisé
maugrabin maugrebin meilleur melonné membrané membru menacé menant menaçant
mentholé menu merdoyant mesquin messin mesuré meublant mexicain micacé
microencapsulé microgrenu microplissé microéclaté miellé mignard migrant
militant millerandé millimétré millésimé mineur minidosé minorant minorquin
miraculé miraillé miraud mirobolant miroitant miroité miré mitigé mitré mité
mobiliérisé mochard modelant modifiant modulant modulé modélisant modéré
mogol moiré moisi moleté molletonné mollissant momentané momifié mondain mondé
monilié monobromé monochlamydé monochloré monocomposé monocontinu
monofluoré monogrammé monohalogéné monohydraté mononucléé monophasé
monopérianthé monoréfringent monosporé monotriphasé monovalent montagnard
montpelliérain monté monténégrin monumenté moralisant mordant mordicant
mordu morfal morfondu moribond moricaud mormon mort mortifiant morvandiot
mosellan motivant motivé mouchard moucheté mouflé mouillant mouillé moulant
moulé mourant moussant moussu moustachu moutonnant moutonné mouvant mouvementé
moyé mozambicain mucroné mugissant mulard multiarticulé multidigité
multilobé multinucléé multiperforé multiprogrammé multirésistant multisérié
multivalent multivarié multivitaminé multivoltin munificent murin muriqué
murrhin musard musclé musqué mussipontain musulman mutant mutilant mutin
myorelaxant myrrhé mystifiant mythifiant myélinisant myélinisé mâtiné méchant
méconnu mécontent mécréant médaillé médian médiat médicalisé médisant
méfiant mélangé mélanostimulant méningé méplat méprisant méritant mérulé
métallescent métallisé métamérisé métastasé méthoxylé méthyluré
métropolitain météorisant mêlé mûr mûrissant nabot nacré nageant nain naissant
nanti napolitain narcissisant nasard nasillard natal natté naturalisé naufragé
naval navigant navrant nazi nervin nervuré nervé nettoyant neumé neuralisant
neuroméningé neutralisant nickelé nictitant nidifiant nigaud nigérian
nippon nitescent nitrant nitrifiant nitrosé nitrurant nitré noir noiraud
nombrant nombré nominalisé nommé nonchalant normalisé normand normodosé
normotendu normé notarié nourri nourrissant noué noyé nu nuagé nuancé nucléolé
nullard numéroté nutant nué né nébulé nécessitant nécrosant négligent négligé
néoformé néolatin néonatal névrosant névrosé obligeant obligé oblitérant
obscur observant obsolescent obstiné obstrué obsédant obsédé obséquent
obturé obéi obéissant obéré occitan occupant occupé occurrent ocellé ochracé
oculé odorant odoriférant oeillé oeuvé offensant offensé officiant offrant
olivacé oléacé oléfiant oléifiant oman ombellé ombiliqué ombragé ombré
omniprésent omniscient ondoyant ondulant ondulé ondé onglé onguiculé ongulé
opalescent opalin operculé opiacé opportun opposant oppositifolié opposé
oppressé opprimant opprimé opsonisant optimalisant optimisant opulent opérant
orant ordonné ordré oreillard oreillé orfévré organisé organochloré
organosilicié orientalisant orienté oropharyngé orphelin orthonormé ortié
osmié ossifiant ossifluent ossu ostial ostracé ostrogot ostrogoth ostréacé osé
ouaté ourlé oursin outillé outrageant outragé outrecuidant outrepassé outré
ouvragé ouvrant ouvré ovalisé ovillé ovin ovulant ové oxycarboné oxydant
oxygéné ozoné oïdié pacifiant padan padouan pahlavi paillard pailleté pair
palatin palermitain palissé pallotin palmatilobé palmatinervé palmatiséqué
palmiséqué palmé palpitant panaché panafricain panard paniculé paniquant panné
pantelant pantouflard pané papalin papelard papilionacé papillonnant
papou papyracé paraffiné paralysant paralysé paramédian parcheminé parent
parfumé paridigitidé paridigité parigot paripenné parlant parlé parmesan
parsi partagé partant parti participant partisan partousard partouzard
parvenu paré passant passepoilé passerillé passionnant passionné passé pataud
patelin patelinant patent patenté patient patoisant patriotard pattu patté
paumé pavé payant pectiné pehlevi peigné peinard peint pelliculant pelliculé
peluché pelé penaud penchant penché pendant pendu pennatilobé pennatinervé
penninervé penné pensant pensionné pentavalent pentu peptoné perchloraté
percutant percutané perdant perdu perfectionné perfolié perforant performant
perfusé perlant perluré perlé permanent permutant perphosporé perruqué persan
persistant personnalisé personnifié personé persuadé persulfuré persécuté
pertinent perturbant perverti perçant pesant pestiféré petiot petit peul
pharmocodépendant pharyngé phasé philippin philistin phophorylé phosphaté
phosphoré photoinduit photoluminescent photorésistant photosensibilisant
phénolé phénotypé piaffant piaillant piaillard picard picoté pigeonnant
pignonné pillard pilonnant pilosébacé pimpant pinaillé pinchard pincé pinné
pinçard pionçant piquant piqué pisan pistillé pitchoun pivotant piégé
placé plafonnant plaidant plaignant plain plaisant plan planant planté plané
plasmolysé plastifiant plat plein pleurant pleurard pleurnichard pliant
plissé plié plombé plongeant plumeté pluriarticulé plurihandicapé plurinucléé
plurivalent pochard poché poignant poilant poilu pointillé pointu pointé
poitevin poivré polarisant polarisé poli policé politicard polluant
polycarburant polychloré polycontaminé polycopié polycristallin polydésaturé
polyhandicapé polyinsaturé polylobé polynitré polynucléé polyparasité
polysubstitué polysyphilisé polytransfusé polytraumatisé polyvalent
polyvoltin pommelé pommeté pompant pompé ponctué pondéré pontifiant pontin
poplité poqué porcelainé porcin porracé portant portoricain possédant possédé
postillonné postnatal postnéonatal posté postérieur posé potelé potencé
poupin pourprin pourri pourrissant poursuivant pourtournant poussant poussé
pratiquant prenant prescient prescrit pressant pressionné pressé prieur primal
privilégié probant prochain procombant procubain profilé profitant profond
programmé prohibé projetant prolabé proliférant prolongé prompt promu
prononcé propané proportionné proratisé proscrit prostré protestant protonant
protubérant protéiné provenant provocant provoqué proéminent prudent pruiné
préalpin prébendé précipitant précipité précité précompact préconscient
précontraint préconçu précuit précédent prédesséché prédestiné prédiffusé
prédisposant prédominant prédécoupé préemballé préencollé préenregistré
préfabriqué préfixé préformant préfragmenté préférant préféré prégnant
prélatin prématuré prémuni prémédité prénasalisé prénatal prénommé préoblitéré
préoccupé préparant prépayé prépondérant prépositionné préprogrammé préroman
présalé présanctifié présent présignifié présumé présupposé prétendu
prétraité prévalant prévalent prévenant prévenu prévoyant prévu préémargé
préétabli prêt prêtant prêté psychiatrisé psychostimulant psychoénergisant
puant pubescent pudibond puissant pulsant pulsé pultacé pulvérulent puni pur
puritain purpuracé purpurin purulent pustulé putrescent putréfié puéril puîné
pyramidant pyramidé pyrazolé pyroxylé pâli pâlissant pédant pédantisant
pédiculosé pédiculé pédonculé pékiné pélorié pénalisant pénard pénicillé
pénétrant pénétré péquenaud pérennant périanthé périgourdin périmé périnatal
pérégrin pérégrinant péréqué pétant pétaradant pétillant pétiolé pétochard
pétrifiant pétrifié pétré pétulant pêchant qatari quadrifolié quadrigéminé
quadriparti quadrivalent quadruplété qualifiant qualifié quantifié quart
questionné quiescent quinaud quint quintessencié quintilobé quiné quérulent
rabattable rabattant rabattu rabougri raccourci racorni racé radiant radicant
radiodiffusé radiolipiodolé radiorésistant radiotransparent radiotélévisé
raffermissant raffiné rafraîchi rafraîchissant rageant ragot ragoûtant
raisonné rajeunissant rallié ramassé ramenard ramifié ramolli ramollissant
ramé ranci rangé rapatrié rapiat raplati rappelé rapporté rapproché rarescent
rasant rassasiant rassasié rassemblé rassurant rassuré rassérénant rasé
ratiocinant rationalisé raté ravageant ravagé ravalé ravi ravigotant ravissant
rayé rebattu rebondi rebondissant rebutant recalé recarburant recercelé
rechigné recombinant recommandé reconnaissant reconnu reconstituant recoqueté
recroiseté recroquevillé recru recrudescent recrutant rectifiant recueilli
redenté redondant redoublant redoublé refait refoulant refoulé refroidissant
regardant regrossi reinté relaxant relevé reluisant relâché relégué remarqué
rempli remuant renaissant renchéri rendu renfermé renflé renfoncé renforçant
rengagé renommé rentrant rentré renté renversant renversé repentant repenti
reporté reposant reposé repoussant repoussé repressé représentant repu
resarcelé rescapé rescindant rescié respirant resplendissant ressemblant
ressortissant ressurgi ressuscité restant restreint restringent resurgi
retardé retentissant retenu retiré retombant retrait retraité retrayant
retroussé revanchard revigorant revitalisant reviviscent reçu rhinopharyngé
rhodié rhumatisant rhumé rhénan rhônalpin riant ribaud riboulant ricain
riciné ridé rifain rigolard ringard risqué riverain roidi romagnol romain
romand romanisant rompu rond rondouillard ronflant rongeant rosacé rossard
rotacé roublard roucoulant rouergat rougeaud rougeoyant rougi rougissant
rouleauté roulotté roulé roumain rouquin rousseauisant routinisé roué rubané
rubicond rubéfiant rudenté rugissant ruiné ruisselant ruminant rupin rurbain
rusé rutilant rythmé râblé râlant râpé réadapté réalisant récalcitrant récent
réchauffant réchauffé récidivant récitant réclamant réclinant récliné
réconfortant récurant récurrent récurvé récusant réduit réentrant réflectorisé
réfléchissant réformé réfrigérant réfrigéré réfringent réfugié référencé
régissant réglant réglé régnant régressé régénérant régénéré réhabilité
réitéré réjoui réjouissant rémanent rémittent rémunéré rénitent répandu
réprouvé républicain répugnant réputé réservé résidant résident résigné
résiné résistant résolu résolvant résonant résonnant résorbant résorciné
résumé résupiné résurgent rétabli rétamé réticent réticulé rétrofléchi
rétroréfléchissant rétréci réuni réussi réverbérant révoltant révolté révolu
révulsant révulsé révélé révérend rééquilibrant rêvé rôti sabin saccadé
sacchariné sacrifié sacré safrané sagitté sahraoui saignant saignotant
sain saint saisi saisissant saladin salant salarié salicylé salin salissant
samaritain samoan sanctifiant sanglant sanglotant sanguin sanguinolent
sanskrit santalin saoul saoulard saponacé sarrasin satané satiné satisfaisant
saturant saturnin saturé saucissonné saucé saugrenu saumoné saumuré sautant
sautillé sauté sauvagin savant savoyard scalant scarifié scellé sciant
sclérosant sclérosé scolié scoriacé scorifiant scout script scrobiculé
second secrétant semelé semi-fini sempervirent semé sensibilisant sensé senti
serein serpentin serré servant servi seul sexdigité sexué sexvalent seyant
sibyllin sidérant sifflant sigillé siglé signalé signifiant silicié silicosé
simplifié simultané simulé sinapisé sinisant siphonné situé slavisant
snobinard socialisant sociologisant sodé soiffard soignant soigné solognot
somali sommeillant sommé somnolant somnolent sonnant sonné sorbonnard sortant
souahéli soudain soudant soudé soufflant soufflé souffrant soufi soulevé
sourd souriant soussigné soutenu souterrain souverain soûlant soûlard
spatulé spermagglutinant spermimmobilisant sphacélé spiralé spirant spiritain
splénectomisé spontané sporulé spumescent spécialisé stabilisant stagnant
staphylin stationné stibié stigmatisant stigmatisé stimulant stipendié stipité
stipulé stratifié stressant strict strident stridulant strié structurant
stupéfait stupéfiant stylé sténohalin sténosant stérilisant stérilisé
su suant subalpin subclaquant subconscient subintrant subit subjacent
sublimant subneutralisant subordonnant subordonné subrogé subsident subséquent
subulé suburbain subventionné subérifié succenturié succinct succulent
sucrant sucré sucé suffisant suffocant suffragant suicidé suintant suivant
sulfamidorésistant sulfamidé sulfaté sulfhydrylé sulfoné sulfurant sulfurisé
superfin superfini superflu supergéant superhydratant superordonné superovarié
suppliant supplicié suppléant supportant supposé suppurant suppuré
supradivergent suprahumain supérieur surabondant suractivé surajouté suranné
surbrillant surchargé surchauffé surclassé surcomposé surcomprimé surcouplé
|
||||
surdéterminant surdéterminé surdéveloppé surencombré surexcitant surexcité
|
||||
surfin surfondu surfrappé surgelé surgi surglacé surhaussé surhumain suri
|
||||
surmenant surmené surmultiplié surmusclé surneigé suroxygéné surperformé
|
||||
surplombant surplué surprenant surpressé surpuissant surréalisant sursalé
|
||||
sursaturé sursilicé surveillé survitaminé survivant survolté surémancipé
|
||||
susdit susdénommé susmentionné susnommé suspect suspendu susrelaté susurrant
|
||||
suzerain suédé swahili swahéli swazi swingant swingué sylvain sympathisant
|
||||
synanthéré synchronisé syncopé syndiqué synthétisant systématisé séant sébacé
|
||||
séchant sécurisant sécurisé séduisant ségrégué ségrégé sélectionné sélénié
|
||||
sémitisant sénescent séparé séquencé séquestrant sérigraphié séroconverti
|
||||
sérotonicodépendant sétacé sévillan tabou tabouisé tacheté taché tadjik taillé
|
||||
taloté taluté talé tamil tamisant tamisé tamoul tangent tannant tanné tapant
|
||||
tapissant taponné tapé taqueté taquin tarabiscoté taraudant tarentin tari
|
||||
tartré taré tassé tatar taupé taurin tavelé teint teintant teinté telluré
|
||||
tempérant tempéré tenaillant tenant tendu tentant ternifolié terraqué
|
||||
terrifiant terrorisant tessellé testacé texan texturant texturé thallosporé
|
||||
thermisé thermocollant thermodurci thermofixé thermoformé thermohalin
|
||||
thermoluminescent thermopropulsé thermorémanent thermorésistant thrombopéniant
|
||||
thrombosé thymodépendant thébain théocentré théorbé tibétain tiercé tigré tigé
|
||||
timbré timoré tintinnabulant tiqueté tirant tiré tisonné tissu titané titré
|
||||
tocard toisonné tolérant tombal tombant tombé tonal tondant tondu tonifiant
|
||||
tonnant tonsuré tonturé tophacé toquard toqué torché tordant tordu torsadé
|
||||
tortu torturant toscan totalisant totipotent touchant touffu toulousain
|
||||
tourelé tourmentant tourmenté tournant tournoyant tourné tracassant tracté
|
||||
traitant tramaillé tranchant tranché tranquillisant transafricain transalpin
transandin transcendant transcutané transfini transfixiant transformant
transi transloqué transmutant transpadan transparent transperçant transpirant
transposé transtévérin transylvain trapu traumatisant traumatisé travaillant
traversant travesti traçant traînant traînard treillissé tremblant tremblotant
trempant trempé tressaillant triboluminescent tributant trichiné tricoté
tridenté trifoliolé trifolié trifurqué trigéminé trilobé trin trinervé
triparti triphasé triphosphaté trisubstitué tritié trituberculé triturant
trivialisé trompettant tronqué troublant trouillard trouvé troué truand
truffé truité trypsiné trébuchant tréflé trémulant trépassé trépidant
tuant tubard tubectomisé tuberculé tubulé tubéracé tubérifié tubérisé tufacé
tuilé tumescent tuméfié tuniqué turbiné turbocompressé turbulent turgescent
tutsi tué twisté typé tâtonnant téflonisé téléphoné télévisé ténorisant
térébrant tétraphasé tétrasubstitué tétravalent têtu tôlé ulcéré ultraciblé
ultracourt ultrafin ultramontain ultérieur uncinulé unciné uni unifiant
uniformisant unilobé uninucléé uniovulé unipotent uniramé uniréfringent
unistratifié unisérié unitegminé univalent univitellin univoltin urbain
urgent urticant usagé usant usité usé utriculé utérin utérosacré vacant
vacciné vachard vacillant vadrouillant vagabond vagabondant vaginé vagissant
vain vaincu vairé valdôtain valgisant validant vallonné valorisant valué
valvé vanadié vanilliné vanillé vanisé vanné vantard variolé varisant varié
varvé vasard vascularisé vasostimulant vasouillard vaudou veinard veiné
velu venaissin venant vendu ventripotent ventromédian ventru venté verdissant
vergeté verglacé verglaçant vergé verjuté vermicellé vermiculé vermoulant
verni vernissé verré versant versé vert verticillé vertébré vespertin vexant
vibrionnant vicariant vicelard vicié vieilli vieillissant vigil vigilant
vigorisant vil vilain violacé violent violoné vipérin virevoltant viril
virulent visigoth vitaminé vitellin vitré vivant viverrin vivifiant vivotant
vogoul voilé voisin voisé volant volanté volatil voletant voltigeant volvulé
vorticellé voulu voussé voyant voûté vrai vrillé vrombissant vu vulnérant
vulturin vécu végétant véhément vélin vélomotorisé vérolé vésicant vésiculé
vêtu wallingant watté wisigoth youpin zazou zend zigzagant zinzolin zoné
zoulou zélé zézayant âgé ânonnant ébahi ébaubi éberlué éblouissant ébouriffant
éburnin éburné écaillé écartelé écarté écervelé échancré échantillonné échappé
échauffant échauffé échevelé échiqueté échoguidé échu éclairant éclaircissant
éclatant éclaté éclipsant éclopé écoeurant écorché écoté écoutant écranté
écrasé écrit écru écrémé éculé écumant édenté édifiant édulcorant égaillé
égaré égayant égrillard égrisé égrotant égueulé éhanché éhonté élaboré élancé
électrisant électroconvulsivant électrofondu électroluminescent
élevé élingué élisabéthain élizabéthain éloigné éloquent élu élégant
émacié émanché émancipé émarginé émergent émergé émerillonné émerveillant
émigré éminent émollient émotionnant émoulu émoustillant émouvant ému
émulsionnant éméché émétisant énergisant énervant énervé épaississant épanoui
épargnant épatant épaté épeigné éperdu épeuré épicotylé épicutané épicé
épigé épinglé éploré éployé épointé époustouflant épouvanté éprouvant éprouvé
épuisé épuré équicontinu équidistant équilibrant équilibré équin équipollent
équipolé équipotent équipé équitant équivalent éraillé éreintant éreinté
érubescent érudit érythématopultacé établi étagé éteint étendu éthéré
étiolé étoffé étoilé étonnant étonné étouffant étouffé étourdi étourdissant
étriquant étriqué étroit étudiant étudié étymologisant évacuant évacué évadé
""".split())
48472 spacy/lang/fr/lemmatizer/_adjectives_irreg.py Normal file
File diff suppressed because it is too large
553 spacy/lang/fr/lemmatizer/_adverbs.py Normal file
@@ -0,0 +1,553 @@
# coding: utf8
from __future__ import unicode_literals


ADVERBS = set("""
abandonnément abjectement abominablement abondamment aboralement abouliquement
abruptement abrégément abréviativement absconsement absconsément absolument
abstraitement abstrusément absurdement abusivement académiquement
accelerando acceptablement accessoirement accidentellement accortement
acidement acoustiquement acrimonieusement acrobatiquement actiniquement
actuellement adagio additionnellement additivement adiabatiquement
adjectivement administrativement admirablement admirativement adorablement
adultérieurement adverbialement adversativement adéquatement affablement
affectionnément affectivement affectueusement affinement affirmativement
agilement agitato agnostiquement agogiquement agressivement agrestement
agrologiquement agronomiquement agréablement aguicheusement aidant aigrement
ailleurs aimablement ainsi aisément alchimiquement alcooliquement alentour
algorithmiquement algébriquement alias alimentairement allegretto allegro
allopathiquement allusivement allègrement allégoriquement allégrement allégro
alphabétiquement alternativement altimétriquement altièrement altruistement
amabile ambigument ambitieusement amiablement amicalement amiteusement
amoroso amoureusement amphibologiquement amphigouriquement amplement
amusément amènement amèrement anachroniquement anagogiquement
analogiquement analoguement analytiquement anaphoriquement anarchiquement
ancestralement anciennement andante andantino anecdotiquement angulairement
angéliquement anharmoniquement animalement animato annuellement anodinement
anormalement anthropocentriquement anthropologiquement anthropomorphiquement
anticipativement anticonstitutionnellement antidémocratiquement
antinomiquement antipathiquement antipatriotiquement antiquement
antisocialement antisportivement antisymétriquement antithétiquement
antiétatiquement antécédemment antérieurement anxieusement apathiquement
apicalement apocalyptiquement apodictiquement apologétiquement apostoliquement
appassionato approbativement approchant approximativement appréciablement
après-demain aquatiquement arbitrairement arbitralement archaïquement
architecturalement archéologiquement ardemment argotiquement aridement
arithmétiquement aromatiquement arrière arrogamment articulairement
artificiellement artificieusement artisanalement artistement artistiquement
aseptiquement asiatiquement assai assertivement assertoriquement assez
associativement assurément astrologiquement astrométriquement astronomiquement
astucieusement asymptotiquement asymétriquement ataviquement ataxiquement
atomiquement atrabilairement atrocement attenant attentionnément attentivement
attractivement atypiquement aucunement audacieusement audiblement
auditivement auguralement augustement aujourd'hui auparavant auprès
aussi aussitôt austèrement autant autarciquement authentiquement
autographiquement automatiquement autonomement autoritairement autrefois
auxiliairement avant avant-hier avantageusement avarement avaricieusement
aventurément aveuglément avidement avunculairement axialement axiologiquement
aériennement aérodynamiquement aérostatiquement babéliquement bachiquement
badaudement badinement balistiquement balourdement balsamiquement banalement
barbarement barométriquement baroquement bas bassement batailleusement
baveusement beau beaucoup bellement belliqueusement ben bene benoîtement
bestialement bibliographiquement bibliquement bien bienheureusement bientôt
bigotement bigrement bihebdomadairement bijectivement bijournellement
bileusement bilieusement bilinéairement bimensuellement bimestriellement
bioacoustiquement biochimiquement bioclimatiquement biodynamiquement
biogénétiquement biologiquement biomédicalement bioniquement biophysiquement
bioélectroniquement bioénergétiquement bipolairement biquotidiennement bis
biunivoquement bizarrement bizarroïdement blafardement blagueusement
blondement blâmablement bon bonassement bonnement bordéliquement botaniquement
boueusement bouffonnement bougonnement bougrement boulimiquement
bravachement bravement bredouilleusement bref brillamment brièvement
brumeusement brusquement brut brutalement bruyamment bucoliquement
bureaucratiquement burlesquement byzantinement béatement bégueulement
bénéfiquement bénévolement béotiennement bésef bézef bêtement cabalistiquement
cabotinement cachottièrement cacophoniquement caf cafardeusement
cajoleusement calamiteusement calligraphiquement calmement calmos
calorimétriquement caloriquement canaillement cancérologiquement candidement
cantabile capablement capillairement capitalement capitulairement
captieusement caractériellement caractérologiquement cardiographiquement
caricaturalement carrément cartographiquement cartésiennement casanièrement
casuellement catalytiquement catastrophiquement catholiquement
catégoriquement causalement causativement caustiquement cauteleusement
caverneusement cellulairement censitairement censément centièmement
cependant certainement certes cf chafouinement chagrinement chaleureusement
chaotiquement charismatiquement charitablement charnellement chastement
chattement chaud chaudement chauvinement chenuement cher chevaleresquement
chichement chichiteusement chimiquement chimériquement chinoisement
chiquement chirographairement chirographiquement chirurgicalement chiément
chouettement chromatiquement chroniquement chronologiquement
chrétiennement chèrement chétivement ci cinquantièmement cinquièmement
cinématographiquement cinétiquement circonspectement circonstanciellement
citadinement civilement civiquement clairement clandestinement classiquement
climatologiquement cliniquement cléricalement cocassement cochonnement
coextensivement collatéralement collectivement collusoirement collégialement
coléreusement colériquement combien combinatoirement comiquement comme
comment commercialement comminatoirement commodément communalement
communément comparablement comparativement compatiblement compendieusement
compensatoirement complaisamment complexement complètement complémentairement
compréhensivement comptablement comptant compulsivement conardement
conceptuellement concernant concevablement concisément concomitamment
concurremment concussionnairement condamnablement conditionnellement confer
confidemment confidentiellement conflictuellement conformationnellement
confortablement confraternellement confusément confédéralement congrument
congénitalement coniquement conjecturalement conjointement conjonctivement
conjugalement connardement connement connotativement consciemment
consensuellement conservatoirement considérablement considérément
constamment constitutionnellement constitutivement consubstantiellement
consécutivement conséquemment contagieusement contemplativement
contestablement contextuellement continuellement continûment contractuellement
contrairement contrapuntiquement contrastivement contre contributoirement
convenablement conventionnellement conventuellement convivialement
coopérativement copieusement coquettement coquinement cordialement coriacement
coronairement corporativement corporellement corpusculairement correct
correctionnellement corrosivement corrélativement cosmiquement
cosmographiquement cosmologiquement cossardement cotonneusement couardement
courageusement couramment court courtement courtoisement coutumièrement
craintivement crapuleusement crescendo criardement criminellement
critiquablement critiquement croyez-en crucialement cruellement
crânement crédiblement crédulement crépusculairement crétinement crûment
cuistrement culinairement cultuellement culturellement cumulativement
curativement curieusement cursivement curvilignement cybernétiquement
cylindriquement cyniquement cynégétiquement cytogénétiquement cytologiquement
célestement célibatairement cérébralement cérémoniellement cérémonieusement
d'abondance d'abord d'ailleurs d'après d'arrache-pied d'avance d'emblée d'ici
d'office d'urgence d'évidence dactylographiquement damnablement dangereusement
debout decimo decrescendo dedans dehors demain densément depuis derechef
dernièrement derrière descriptivement despotiquement deusio deuxièmement
devant dextrement dextrorse dextrorsum diablement diaboliquement
diacoustiquement diagonalement dialectalement dialectiquement
dialogiquement diamétralement diantrement diatoniquement dichotomiquement
didactiquement difficilement difficultueusement diffusément différemment
digitalement dignement dilatoirement diligemment dimanche dimensionnellement
dinguement diplomatiquement directement directo disciplinairement
discontinûment discourtoisement discriminatoirement discrètement
discursivement disertement disgracieusement disjonctivement disons-le
dispendieusement disproportionnellement disproportionnément dissemblablement
dissuasivement dissymétriquement distinctement distinctivement distraitement
distributivement dithyrambiquement dito diurnement diversement divinement
dixièmement diététiquement docilement docimologiquement doctement
doctrinairement doctrinalement documentairement dodécaphoniquement
dogmatiquement dolce dolcissimo dolemment dolentement dolosivement
dommageablement donc dont doriquement dorsalement dorénavant doublement
doucereusement doucettement douceâtrement douillettement douloureusement
doux douzièmement draconiennement dramatiquement drastiquement
droit droitement drolatiquement dru drument drôlement dubitativement dur
durement dynamiquement dynamogéniquement dysharmoniquement débilement
décadairement décemment décidément décimalement décisivement déclamatoirement
dédaigneusement déductivement défavorablement défectueusement défensivement
définitivement dégoûtamment dégressivement dégueulassement déictiquement déjà
délibérément délicatement délicieusement déloyalement délétèrement
démentiellement démesurément démocratiquement démographiquement démoniaquement
démonstrativement démotiquement déontologiquement départementalement
déplaisamment déplorablement dépressivement dépréciativement déraisonnablement
dérivationnellement dérogativement dérogatoirement désagréablement
désastreusement désavantageusement désespéramment désespérément déshonnêtement
désobligeamment désolamment désordonnément désormais déterminément
dévotement dévotieusement dûment ecclésiastiquement ecclésiologiquement
efficacement effrayamment effrontément effroyablement effrénément
elliptiquement emblématiquement embryologiquement embryonnairement
emphatiquement empiriquement encor encore encyclopédiquement
endémiquement enfantinement enfin enjôleusement ennuyeusement ensemble
ensuite enthousiastement entièrement entomologiquement enviablement
environ ergonomiquement erratiquement erronément eschatologiquement
espressivo essentiellement esthétiquement estimablement etc ethniquement
ethnolinguistiquement ethnologiquement eucharistiquement euphoniquement
euphémiquement euristiquement européennement eurythmiquement euréka exactement
exaspérément excellemment excentriquement exceptionnellement excepté
exclamativement exclusivement excédentairement exemplairement exhaustivement
existentiellement exorbitamment exotiquement expansivement expertement
explicativement explicitement explosivement explétivement exponentiellement
expressément exprès expéditivement expérimentalement exquisément extatiquement
extensionnellement extensivement extra-muros extrajudiciairement
extravagamment extrinsèquement extrêmement extérieurement exécrablement
exégétiquement fabuleusement facheusement facile facilement facticement
factitivement factuellement facultativement facétieusement fadassement
faiblardement faiblement fallacieusement falotement fameusement familialement
faméliquement fanatiquement fanfaronnement fangeusement fantaisistement
fantasmatiquement fantasquement fantastiquement fantomatiquement
faramineusement faraudement farouchement fascistement fashionablement
fastueusement fatalement fatalistement fatidiquement faussement fautivement
favorablement ferme fermement ferroviairement fertilement fervemment
fichtrement fichument fichûment fictivement fiduciairement fidèlement
figurativement figurément filandreusement filialement filmiquement fin
financièrement finaudement finement finiment fiscalement fissa fixement
fiévreusement flagorneusement flasquement flatteusement flegmatiquement
flexionnellement flexueusement flou fluidement flâneusement flémardement
foireusement folkloriquement follement folâtrement foncièrement
fondamentalement forcément forestièrement forfaitairement formellement
fort forte fortement fortissimo fortuitement fougueusement fourbement
foutument foutûment fragilement fragmentairement frais franc franchement
fraternellement frauduleusement fraîchement frigidement frigo frileusement
frisquet frivolement froidement frontalement froussardement fructueusement
frustement frénétiquement fréquemment frêlement fugacement fugitivement
fumeusement funestement funèbrement funérairement furibardement furibondement
furioso furtivement futilement futurement fâcheusement fébrilement fécondement
félinement félonnement fémininement féodalement férocement fétidement
gaffeusement gaiement gaillardement galamment gallicanement galvaniquement
gammathérapiquement ganglionnairement gargantualement gastronomiquement
gauloisement gaîment geignardement gentement gentiment gestuellement
giratoirement glacialement glaciologiquement glaireusement glandulairement
globalement glorieusement gloutonnement gnostiquement gnoséologiquement
goguenardement goinfrement goniométriquement gothiquement gouailleusement
goulûment gourdement gourmandement goutteusement gouvernementalement
gracilement gracioso graduellement grammaticalement grand grandement
graphiquement graphologiquement gras grassement gratis gratuitement grave
gravement grazioso grincheusement grivoisement grièvement grossement
grotesquement grégairement guillerettement gutturalement guère guères
gyroscopiquement gâteusement gélatineusement génialement génitalement
généralement généreusement génériquement génétiquement géodynamiquement
géographiquement géologiquement géométralement géométriquement géophysiquement
habilement habituellement hagardement hagiographiquement haineusement
hargneusement harmonieusement harmoniquement hasardeusement haut hautainement
haïssablement hebdomadairement heptagonalement herméneutiquement
heureusement heuristiquement hexagonalement hexaédriquement hideusement hier
hippiatriquement hippiquement hippologiquement histologiquement historiquement
hiératiquement hiéroglyphiquement homocentriquement homographiquement
homologiquement homothétiquement homéopathiquement homériquement honnêtement
honorifiquement honteusement horizontalement hormis hormonalement horriblement
hospitalièrement hostilement houleusement huileusement huitièmement
humanitairement humblement humidement humoristiquement humoureusement
hydrauliquement hydrodynamiquement hydrographiquement hydrologiquement
hydropneumatiquement hydrostatiquement hydrothérapiquement hygiéniquement
hypercorrectement hypnotiquement hypocondriaquement hypocoristiquement
hypodermiquement hypostatiquement hypothécairement hypothétiquement
hâtivement hébraïquement héliaquement hélicoïdalement héliographiquement
hémiédriquement hémodynamiquement hémostatiquement héraldiquement héroïquement
hérétiquement hétéroclitement hétérodoxement hétérogènement ibidem
ici ici-bas iconiquement iconographiquement id idem identiquement
idiosyncrasiquement idiosyncratiquement idiotement idoinement idolatriquement
idylliquement idéalement idéalistement idéellement idéographiquement
ignarement ignoblement ignominieusement ignoramment illicitement illico
illogiquement illusoirement illustrement illégalement illégitimement
imaginativement imbattablement imbécilement immaculément immanquablement
immensément imminemment immobilement immodestement immodérément immondement
immortellement immuablement immunitairement immunologiquement immédiatement
immémorialement impairement impalpablement imparablement impardonnablement
impartialement impassiblement impatiemment impavidement impayablement
impensablement imperceptiblement impersonnellement impertinemment
impitoyablement implacablement implicitement impoliment impolitiquement
importunément impossiblement imprescriptiblement impressivement improbablement
impromptu improprement imprudemment imprécisément imprévisiblement impudemment
impulsivement impunément impurement impénétrablement impérativement
impérieusement impérissablement impétueusement inacceptablement
inactivement inadmissiblement inadéquatement inaliénablement inaltérablement
inamoviblement inappréciablement inassouvissablement inattaquablement
inaudiblement inauguralement inauthentiquement inavouablement incalculablement
incertainement incessamment incestueusement incidemment incisivement
inciviquement inclusivement incoerciblement incognito incommensurablement
incommutablement incomparablement incomplètement incompréhensiblement
inconcevablement inconciliablement inconditionnellement inconfortablement
inconsciemment inconsidérément inconsolablement inconstamment
inconséquemment incontestablement incontinent incontournablement
inconvenablement incorporellement incorrectement incorrigiblement
increvablement incroyablement incrédulement incurablement indescriptiblement
indicativement indiciairement indiciblement indifféremment indigemment
indignement indirectement indiscernablement indiscontinûment indiscrètement
indispensablement indissociablement indissolublement indistinctement
indivisiblement indivisément indocilement indolemment indomptablement
inductivement indulgemment industriellement industrieusement indécelablement
indéchiffrablement indécrottablement indéfectiblement indéfendablement
indéfinissablement indélicatement indélébilement indémontablement
indépassablement indépendamment indéracinablement indésirablement
indéterminément indévotement indûment ineffablement ineffaçablement
ineptement inertement inespérément inesthétiquement inestimablement
inexcusablement inexorablement inexpertement inexpiablement inexplicablement
inexprimablement inexpugnablement inextinguiblement inextirpablement
infailliblement infantilement infatigablement infectement infernalement
infimement infiniment infinitésimalement inflexiblement informatiquement
infra infructueusement infâmement inférieurement inglorieusement ingratement
ingénieusement ingénument inhabilement inhabituellement inharmonieusement
inhospitalièrement inhumainement inhéremment inimaginablement inimitablement
inintelligiblement inintentionnellement iniquement initialement
injurieusement injustement injustifiablement inlassablement innocemment
inoffensivement inopinément inopportunément inoubliablement inoxydablement
inquisitorialement inquiètement insaisissablement insalubrement insanement
insciemment insensiblement insensément insidieusement insignement
insipidement insolemment insolitement insolublement insondablement
insoucieusement insoupçonnablement insoutenablement instablement instamment
instinctivement institutionnellement instructivement instrumentalement
insulairement insupportablement insurmontablement insurpassablement
inséparablement intangiblement intarissablement intellectuellement
intelligiblement intempestivement intemporellement intenablement
intensivement intensément intentionnellement intercalairement
interlinéairement interlopement interminablement intermusculairement
interplanétairement interprofessionnellement interprétativement
intersyndicalement intervocaliquement intimement intolérablement
intraitablement intramusculairement intransitivement intraveineusement
introspectivement intrépidement intuitivement intègrement intégralement
intérimairement inutilement invalidement invariablement inventivement
invinciblement inviolablement invisiblement involontairement
invulnérablement inébranlablement inégalablement inégalement inégalitairement
inélégamment inénarrablement inépuisablement inéquitablement inévitablement
ironiquement irraisonnablement irrationnellement irrattrapablement
irrespectueusement irrespirablement irresponsablement irréconciliablement
irrécusablement irréductiblement irréellement irréfragablement irréfutablement
irréligieusement irrémissiblement irrémédiablement irréparablement
irréprochablement irrépréhensiblement irrésistiblement irrésolument
irrévocablement irrévéremment irrévérencieusement isolément isothermiquement
isoédriquement item itou itérativement jacobinement jadis jalousement jamais
jaune jeudi jeunement jobardement jointivement joliment journalistiquement
jovialement joyeusement judaïquement judiciairement judicieusement
juste justement justifiablement juvénilement juxtalinéairement jésuitement
kaléidoscopiquement kilométriquement l'année l'après-midi l'avant-veille
labialement laborieusement labyrinthiquement laconiquement lactiquement
ladrement laidement laiteusement lamentablement langagièrement langoureusement
languissamment lapidairement large largement larghetto largo lascivement
latinement latéralement laxistement laïquement legato lentement lento lerch
lestement lexicalement lexicographiquement lexicologiquement libertinement
librement libéralement licencieusement licitement ligamentairement
limitativement limpidement linguistiquement linéairement linéalement
lisiblement lithographiquement lithologiquement litigieusement littérairement
liturgiquement lividement livresquement localement logarithmiquement
logistiquement logographiquement loin lointainement loisiblement long
longitudinalement longtemps longuement loquacement lors louablement
louchement loufoquement lourd lourdaudement lourdement loyalement lubriquement
lucrativement ludiquement lugubrement lumineusement lunairement lunatiquement
lustralement luxueusement luxurieusement lymphatiquement lyriquement là là-bas
là-dessous là-dessus là-haut lâchement légalement légendairement léger
légitimement légèrement léthargiquement macabrement macache macaroniquement
machinalement macrobiotiquement macroscopiquement maestoso magiquement
magnanimement magnifiquement magnétiquement magnétohydrodynamiquement
maigrement maintenant majestueusement majoritairement mal maladivement
malaisément malcommodément malencontreusement malgracieusement malgré
malheureusement malhonnêtement malicieusement malignement malproprement
malveillamment maléfiquement maniaquement manifestement manuellement mardi
maritalement maritimement marmiteusement marotiquement marre martialement
masochistement massivement maternellement mathématiquement matin matinalement
matriarcalement matrilinéairement matrimonialement maturément matérialistement
maupiteusement mauresquement maussadement mauvais mauvaisement maxi
meilleur mensongèrement mensuellement mentalement menteusement menu
mercredi merdeusement merveilleusement mesquinement mesurément mezzo
micrographiquement micrométriquement microphysiquement microscopiquement
miette mieux mignardement mignonnement militairement millimétriquement
minablement mincement minimement ministériellement minoritairement
minutieusement minéralogiquement miraculeusement mirifiquement mirobolamment
misanthropiquement misogynement misérablement miséreusement
miteusement mièvrement mnémoniquement mnémotechniquement mobilièrement
modalement moderato modernement modestement modiquement modulairement
moelleusement moindrement moins mollassement mollement mollo molto
momentanément monacalement monarchiquement monastiquement mondainement
monocordement monographiquement monolithiquement monophoniquement monotonement
monumentalement monétairement moqueusement moralement moralistement
mordicus morganatiquement mornement morosement morphologiquement mortellement
morveusement moult moutonnièrement moyennant moyennement moyenâgeusement
multidisciplinairement multilatéralement multinationalement multiplement
multipolairement municipalement musculairement musculeusement musicalement
mutuellement mystiquement mystérieusement mythiquement mythologiquement
mécanographiquement méchamment médialement médiatement médiatiquement
médicinalement médiocrement méditativement mélancoliquement mélodieusement
mélodramatiquement mémorablement méphistophéliquement méprisablement
méritoirement métaboliquement métalinguistiquement métalliquement
métallurgiquement métalogiquement métamathématiquement métaphoriquement
méthodiquement méthodologiquement méticuleusement métonymiquement métriquement
météorologiquement même mêmement mûrement n'étant naguère narcissiquement
narrativement nasalement nasillardement natalement nationalement nativement
naturellement naïvement ne nenni nerveusement net nettement
neurolinguistiquement neurologiquement neurophysiologiquement
neutrement neuvièmement niaisement nib nigaudement noblement nocivement
noirement nomadement nombreusement nominalement nominativement nommément non
nonobstant noologiquement normalement normativement nostalgiquement
notamment notarialement notoirement nouménalement nouvellement noétiquement
nucléairement nuisiblement nuitamment nullement numismatiquement numériquement
nuptialement néanmoins nébuleusement nécessairement néfastement négatif
négligemment néologiquement névrotiquement nûment objectivement oblativement
oblige obligeamment obliquement obrepticement obscurément obscènement
|
||||
obsessivement obstinément obséquieusement obtusément obèsement
|
||||
occultement octogonalement oculairement océanographiquement odieusement
|
||||
oenologiquement offensivement officiellement officieusement oiseusement
|
||||
olfactivement oligarchiquement ombrageusement onc oncques onctueusement
|
||||
oniriquement onomatopéiquement onques ontologiquement onzièmement onéreusement
|
||||
ophtalmologiquement opiniâtrement opportunistement opportunément opposément
|
||||
optimalement optimistement optionnellement optiquement opulemment
|
||||
opératoirement orageusement oralement oratoirement orbiculairement
|
||||
ordinairement ordurièrement ores organiquement organoleptiquement orgiaquement
|
||||
orgueilleusement orientalement originairement originalement originellement
|
||||
orographiquement orthodoxement orthogonalement orthographiquement
|
||||
orthopédiquement osmotiquement ostensiblement ostentatoirement oublieusement
|
||||
out outrageusement outrancièrement outre outre-atlantique outre-mer
|
||||
outrecuidamment ouvertement ovalement oviparement ovoviviparement
|
||||
pacifiquement paillardement pairement paisiblement palingénésiquement
|
||||
paléobotaniquement paléographiquement paléontologiquement panoptiquement
|
||||
pantagruéliquement papelardement paraboliquement paradigmatiquement
|
||||
paralittérairement parallactiquement parallèlement paramilitairement
|
||||
parasitairement parcellairement parcellement parcimonieusement pardon
|
||||
paresseusement parfaitement parfois parisiennement paritairement
|
||||
parodiquement paroxistiquement paroxystiquement partant parthénogénétiquement
|
||||
particulièrement partiellement partout pas passablement passagèrement passim
|
||||
passionnément passivement passé pastoralement pataudement patelinement
|
||||
paternellement paternement pathologiquement pathétiquement patibulairement
|
||||
patriarcalement patrilinéairement patrimonialement patriotiquement pauvrement
|
||||
païennement peinardement peineusement penaudement pendablement pendant
|
||||
pensivement pentatoniquement perceptiblement perceptivement perdurablement
|
||||
permissivement pernicieusement perpendiculairement perplexement
|
||||
persifleusement perso personnellement perspicacement persuasivement
|
||||
pertinemment perversement pesamment pessimistement petit petitement peu
|
||||
peut-être pharisaïquement pharmacologiquement philanthropement
|
||||
philistinement philologiquement philosophiquement phobiquement phoniquement
|
||||
phonologiquement phonématiquement phonémiquement phonétiquement
|
||||
photographiquement photométriquement phrénologiquement phylogénétiquement
|
||||
physiologiquement physionomiquement physiquement phénoménalement
|
||||
phénoménologiquement pianissimo pianistiquement piano pictographiquement
|
||||
pieusement pile pinailleusement pingrement piteusement pitoyablement
|
||||
piètrement più placidement plaignardement plaintivement plaisamment
|
||||
plantureusement planétairement plastiquement plat platement platoniquement
|
||||
plein pleinement pleutrement pluriannuellement pluridisciplinairement
|
||||
plurinationalement plus plutoniquement plutôt plébéiennement plénièrement
|
||||
pléthoriquement pneumatiquement point pointilleusement pointu poisseusement
|
||||
poliment polissonnement politiquement poltronnement polygonalement
|
||||
polyédriquement polémiquement pomologiquement pompeusement ponctuellement
|
||||
pontificalement populairement pornographiquement positionnellement
|
||||
possessivement possessoirement possiblement posthumement posthumément
|
||||
postérieurement posément potablement potentiellement pourquoi pourtant
|
||||
poétiquement pragmatiquement pratiquement premièrement presque prestement
|
||||
prestissimo presto primairement primesautièrement primitivement primo
|
||||
principalement princièrement printanièrement prioritairement privativement
|
||||
probablement probement problématiquement processionnellement prochainement
|
||||
proconsulairement prodigalement prodigieusement prodiguement productivement
|
||||
professionnellement professoralement profitablement profond profondément
|
||||
progressivement projectivement proleptiquement prolifiquement prolixement
|
||||
promptement pronominalement prophylactiquement prophétiquement propicement
|
||||
proportionnellement proportionnément proprement prosaïquement prosodiquement
|
||||
prospèrement protocolairement protohistoriquement prou proverbialement
|
||||
provincialement provisionnellement provisoirement prudement prudemment
|
||||
préalablement précairement précautionneusement précieusement précipitamment
|
||||
précisément précocement précédemment préférablement préférentiellement
|
||||
préjudiciablement préliminairement prélogiquement prématurément
|
||||
prépositivement préscolairement présentement présomptivement présomptueusement
|
||||
présumément prétendument prétentieusement préventivement prévisionnellement
|
||||
psychanalytiquement psychiatriquement psychiquement psycholinguistiquement
|
||||
psychométriquement psychopathologiquement psychophysiologiquement
|
||||
psychosomatiquement psychothérapiquement puamment publicitairement
|
||||
pudibondement pudiquement pugnacement puissamment pulmonairement
|
||||
purement puritainement pusillanimement putainement putassièrement putativement
|
||||
pyramidalement pyrométriquement pâlement pâteusement pécuniairement
|
||||
pédamment pédantement pédantesquement pédestrement péjorativement pénalement
|
||||
péniblement pénitentiairement pépèrement péremptoirement périlleusement
|
||||
périphériquement périscolairement pétrochimiquement pétrographiquement
|
||||
pêle-mêle quadrangulairement quadrimestriellement quadruplement
|
||||
quand quantitativement quarantièmement quarto quasi quasiment quater
|
||||
quatrièmement quellement quelque quelquefois question quinto quinzièmement
|
||||
quiètement quoique quotidiennement racialement racoleusement radiairement
|
||||
radicalement radieusement radinement radiographiquement radiologiquement
|
||||
radiotélégraphiquement radioélectriquement rageusement raidement railleusement
|
||||
rapacement rapidement rapido rapidos rarement rarissimement ras rasibus
|
||||
recevablement reconventionnellement recta rectangulairement rectilignement
|
||||
redoutablement regrettablement relativement religieusement remarquablement
|
||||
reproductivement représentativement respectablement respectivement
|
||||
restrictivement revêchement rhomboédriquement rhéologiquement rhétoriquement
|
||||
richissimement ridiculement rieusement rigidement rigoureusement rinforzando
|
||||
risiblement ritardando rituellement robustement rocailleusement
|
||||
rogatoirement roguement roidement romainement romancièrement romanesquement
|
||||
rond rondement rondouillardement rosement rossement rotativement roturièrement
|
||||
rougement routinièrement royalement rubato rudement rudimentairement
|
||||
ruralement rustaudement rustiquement rustrement rythmiquement
|
||||
réactionnairement réalistement rébarbativement récemment réciproquement
|
||||
rédhibitoirement réellement réflexivement réfractairement référendairement
|
||||
régionalement réglementairement réglo régressivement régulièrement
|
||||
répréhensiblement répulsivement répétitivement résidentiellement
|
||||
résineusement résolument rétivement rétroactivement rétrospectivement
|
||||
révocablement révolutionnairement révéremment révérencieusement rêveusement
|
||||
sacerdotalement sacramentellement sacrilègement sacrément sadiquement
|
||||
sagement sagittalement sainement saintement saisonnièrement salacement
|
||||
salaudement salement salubrement salutairement samedi sanguinairement
|
||||
saphiquement sarcastiquement sardoniquement sataniquement satiriquement
|
||||
satyriquement sauf saumâtrement sauvagement savamment savoureusement
|
||||
scalairement scandaleusement scatologiquement sceptiquement scherzando scherzo
|
||||
schématiquement sciemment scientifiquement scolairement scolastiquement
|
||||
sculpturalement scélératement scéniquement sec secondairement secondement
|
||||
secrètement sectairement sectoriellement secundo seigneurialement seizièmement
|
||||
semestriellement sempiternellement senestrorsum sensationnellement
|
||||
sensuellement sensément sentencieusement sentimentalement septentrionalement
|
||||
septièmement sereinement serré serviablement servilement seul seulement sexto
|
||||
sforzando si sibyllinement sic sidéralement sidérurgiquement significativement
|
||||
similairement simplement simultanément sincèrement singulièrement sinistrement
|
||||
sinon sinueusement siouxement sirupeusement sitôt sixièmement smorzando
|
||||
sobrement sociablement socialement sociolinguistiquement sociologiquement
|
||||
socratiquement soigneusement soit soixantièmement solairement soldatesquement
|
||||
solidairement solidement solitairement somatiquement sombrement sommairement
|
||||
somptuairement somptueusement songeusement sonorement sophistiquement
|
||||
sordidement sororalement sostenuto sottement soucieusement soudain
|
||||
souhaitablement souplement soupçonneusement sourcilleusement sourdement
|
||||
souterrainement souvent souventefois souverainement soyeusement spacieusement
|
||||
spasmodiquement spatialement spectaculairement spectralement sphériquement
|
||||
splendidement spontanément sporadiquement sportivement spécialement
|
||||
spécifiquement spéculairement spéculativement spéléologiquement stablement
|
||||
staliniennement stationnairement statiquement statistiquement statutairement
|
||||
stoechiométriquement stoïquement stratigraphiquement stratégiquement
|
||||
stridemment strophiquement structuralement structurellement studieusement
|
||||
stylistiquement sténographiquement stérilement stéréographiquement
|
||||
stéréophoniquement suavement subalternement subconsciemment subitement subito
|
||||
sublimement subordinément subordonnément subrepticement subrogatoirement
|
||||
substantiellement substantivement subséquemment subtilement subversivement
|
||||
succinctement succulemment suffisamment suggestivement suicidairement
|
||||
superbement superficiellement superfinement superfétatoirement superlativement
|
||||
supplémentairement supplétivement supportablement supposément supra
|
||||
suprêmement supérieurement surabondamment surhumainement surnaturellement
|
||||
surréellement surtout surérogatoirement sus suspectement suspicieusement
|
||||
syllabiquement syllogistiquement symbiotiquement symboliquement
|
||||
symphoniquement symptomatiquement symétriquement synchroniquement
|
||||
syndicalement synergiquement synonymiquement synoptiquement syntactiquement
|
||||
syntaxiquement synthétiquement systématiquement systémiquement sèchement
|
||||
séculièrement sédentairement séditieusement sélectivement sélénographiquement
|
||||
sémiologiquement sémiotiquement sémiquement séméiotiquement sénilement
|
||||
séquentiellement séraphiquement sériellement sérieusement sérologiquement
|
||||
sûr sûrement tabulairement tacitement taciturnement tactilement tactiquement
|
||||
talmudiquement tangentiellement tangiblement tant tantôt tapageusement
|
||||
tard tardivement tatillonnement tauromachiquement tautologiquement
|
||||
taxinomiquement taxonomiquement techniquement technocratiquement
|
||||
tectoniquement teigneusement tellement temporairement temporellement
|
||||
tenacement tendanciellement tendancieusement tendrement tennistiquement
|
||||
tenuto ter terminologiquement ternement terrible terriblement territorialement
|
||||
testimonialement texto textuellement thermiquement thermodynamiquement
|
||||
thermonucléairement thermoélectriquement thématiquement théocratiquement
|
||||
théologiquement théoriquement théosophiquement thérapeutiquement thétiquement
|
||||
timidement titulairement tièdement tiédassement tocardement tolérablement
|
||||
toniquement topiquement topographiquement topologiquement toponymiquement
|
||||
torrentueusement torridement tortueusement torvement totalement
|
||||
toujours touristiquement tout toute toutefois toxicologiquement
|
||||
traditionnellement tragediante tragiquement tranquillement
|
||||
transcendantalement transformationnellement transgressivement transitivement
|
||||
transversalement traumatologiquement traînardement traîtreusement
|
||||
trentièmement triangulairement tribalement tridimensionnellement
|
||||
trihebdomadairement trimestriellement triomphalement triplement
|
||||
tristement trivialement troisio troisièmement trompeusement trop
|
||||
très ttc tumultuairement tumultueusement turpidement tutélairement typiquement
|
||||
typologiquement tyranniquement télescopiquement téléautographiquement
|
||||
téléinformatiquement télématiquement téléologiquement télépathiquement
|
||||
télévisuellement témérairement ténébreusement tératologiquement tétaniquement
|
||||
tétraédriquement tôt ultimement ultimo ultérieurement unanimement uniformément
|
||||
unilinéairement uniment uninominalement unipolairement uniquement unitairement
|
||||
universitairement univoquement unièmement urbainement urgemment urologiquement
|
||||
usurairement utilement utilitairement utopiquement uvulairement vachardement
|
||||
vaginalement vaguement vaillamment vainement valablement valeureusement
|
||||
vaniteusement vantardement vaporeusement variablement vasculairement
|
||||
vasouillardement vastement velléitairement vendredi venimeusement ventralement
|
||||
verbeusement vernaculairement versatilement vertement verticalement
|
||||
vertueusement verveusement vestimentairement veulement vicieusement
|
||||
vieillottement vigesimo vigilamment vigoureusement vilain vilainement vilement
|
||||
vingtièmement violemment virginalement virilement virtuellement virulemment
|
||||
viscéralement visiblement visqueusement visuellement vitalement vite vitement
|
||||
vivacement vivement viviparement vocalement vocaliquement voir voire
|
||||
volcaniquement volcanologiquement volontairement volontiers volubilement
|
||||
voluptueusement voracement voyez-vous vrai vraiment vraisemblablement
|
||||
vulgo végétalement végétativement véhémentement vélairement vélocement
|
||||
véniellement vénéneusement vénérablement véracement véridiquement
|
||||
vésaniquement vétilleusement vétustement xylographiquement xérographiquement
|
||||
zootechniquement âcrement âprement çà échocardiographiquement
|
||||
échométriquement éclatamment éclectiquement écologiquement économement
|
||||
économétriquement édéniquement également égalitairement égocentriquement
|
||||
égrillardement éhontément élastiquement électivement électoralement
|
||||
électrocardiographiquement électrochimiquement électrodynamiquement
|
||||
électromagnétiquement électromécaniquement électroniquement
|
||||
électropneumatiquement électrostatiquement électrotechniquement
|
||||
éliminatoirement élitistement élogieusement éloquemment élégamment
|
||||
élémentairement éminemment émotionnellement émotivement énergiquement
|
||||
énigmatiquement énièmement énormément épais épaissement éparsement épatamment
|
||||
éphémèrement épicuriennement épidermiquement épidémiologiquement
|
||||
épigrammatiquement épigraphiquement épileptiquement épiquement épiscopalement
|
||||
épistolairement épistémologiquement épouvantablement équitablement
|
||||
équivoquement érotiquement éruditement éruptivement érémitiquement
|
||||
étatiquement éternellement éthiquement éthologiquement étonnamment étourdiment
|
||||
étroitement étymologiquement évangéliquement évasivement éventuellement
|
||||
""".split())
86  spacy/lang/fr/lemmatizer/_auxiliary_verbs_irreg.py  Normal file
@@ -0,0 +1,86 @@
# coding: utf8
from __future__ import unicode_literals


AUXILIARY_VERBS_IRREG = {
    "suis": ("être",),
    "es": ("être",),
    "est": ("être",),
    "sommes": ("être",),
    "êtes": ("être",),
    "sont": ("être",),
    "étais": ("être",),
    "était": ("être",),
    "étions": ("être",),
    "étiez": ("être",),
    "étaient": ("être",),
    "fus": ("être",),
    "fut": ("être",),
    "fûmes": ("être",),
    "fûtes": ("être",),
    "furent": ("être",),
    "serai": ("être",),
    "seras": ("être",),
    "sera": ("être",),
    "serons": ("être",),
    "serez": ("être",),
    "seront": ("être",),
    "serais": ("être",),
    "serait": ("être",),
    "serions": ("être",),
    "seriez": ("être",),
    "seraient": ("être",),
    "sois": ("être",),
    "soit": ("être",),
    "soyons": ("être",),
    "soyez": ("être",),
    "soient": ("être",),
    "fusse": ("être",),
    "fusses": ("être",),
    "fût": ("être",),
    "fussions": ("être",),
    "fussiez": ("être",),
    "fussent": ("être",),
    "étant": ("être",),
    "ai": ("avoir",),
    "as": ("avoir",),
    "a": ("avoir",),
    "avons": ("avoir",),
    "avez": ("avoir",),
    "ont": ("avoir",),
    "avais": ("avoir",),
    "avait": ("avoir",),
    "avions": ("avoir",),
    "aviez": ("avoir",),
    "avaient": ("avoir",),
    "eus": ("avoir",),
    "eut": ("avoir",),
    "eûmes": ("avoir",),
    "eûtes": ("avoir",),
    "eurent": ("avoir",),
    "aurai": ("avoir",),
    "auras": ("avoir",),
    "aura": ("avoir",),
    "aurons": ("avoir",),
    "aurez": ("avoir",),
    "auront": ("avoir",),
    "aurais": ("avoir",),
    "aurait": ("avoir",),
    "aurions": ("avoir",),
    "auriez": ("avoir",),
    "auraient": ("avoir",),
    "aie": ("avoir",),
    "aies": ("avoir",),
    "ait": ("avoir",),
    "ayons": ("avoir",),
    "ayez": ("avoir",),
    "aient": ("avoir",),
    "eusse": ("avoir",),
    "eusses": ("avoir",),
    "eût": ("avoir",),
    "eussions": ("avoir",),
    "eussiez": ("avoir",),
    "eussent": ("avoir",),
    "ayant": ("avoir",)
}
49  spacy/lang/fr/lemmatizer/_dets_irreg.py  Normal file
@@ -0,0 +1,49 @@
# coding: utf8
from __future__ import unicode_literals


DETS_IRREG = {
    "aucune": ("aucun",),
    "ces": ("ce",),
    "cet": ("ce",),
    "cette": ("ce",),
    "cents": ("cent",),
    "certaines": ("certains",),
    "différentes": ("différents",),
    "diverses": ("divers",),
    "la": ("le",),
    "les": ("le",),
    "l'": ("le",),
    "laquelle": ("lequel",),
    "lesquelles": ("lequel",),
    "lesquels": ("lequel",),
    "leurs": ("leur",),
    "mainte": ("maint",),
    "maintes": ("maint",),
    "maints": ("maint",),
    "ma": ("mon",),
    "mes": ("mon",),
    "nos": ("notre",),
    "nulle": ("nul",),
    "nulles": ("nul",),
    "nuls": ("nul",),
    "quelle": ("quel",),
    "quelles": ("quel",),
    "quels": ("quel",),
    "quelqu'": ("quelque",),
    "quelques": ("quelque",),
    "sa": ("son",),
    "ses": ("son",),
    "telle": ("tel",),
    "telles": ("tel",),
    "tels": ("tel",),
    "ta": ("ton",),
    "tes": ("ton",),
    "tous": ("tout",),
    "toute": ("tout",),
    "toutes": ("tout",),
    "des": ("un",),
    "une": ("un",),
    "vingts": ("vingt",),
    "vos": ("votre",)
}
56  spacy/lang/fr/lemmatizer/_lemma_rules.py  Normal file
@@ -0,0 +1,56 @@
# coding: utf8
from __future__ import unicode_literals


ADJECTIVE_RULES = [
    ["s", ""],
    ["e", ""],
    ["es", ""]
]


NOUN_RULES = [
    ["s", ""]
]


VERB_RULES = [
    ["é", "er"],
    ["és", "er"],
    ["ée", "er"],
    ["ées", "er"],
    ["es", "er"],
    ["ons", "er"],
    ["ez", "er"],
    ["ent", "er"],
    ["ais", "er"],
    ["ait", "er"],
    ["ions", "er"],
    ["iez", "er"],
    ["aient", "er"],
    ["ai", "er"],
    ["as", "er"],
    ["a", "er"],
    ["âmes", "er"],
    ["âtes", "er"],
    ["èrent", "er"],
    ["erai", "er"],
    ["eras", "er"],
    ["era", "er"],
    ["erons", "er"],
    ["erez", "er"],
    ["eront", "er"],
    ["erais", "er"],
    ["erait", "er"],
    ["erions", "er"],
    ["eriez", "er"],
    ["eraient", "er"],
    ["asse", "er"],
    ["asses", "er"],
    ["ât", "er"],
    ["assions", "er"],
    ["assiez", "er"],
    ["assent", "er"],
    ["ant", "er"]
]
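The rule tables above are plain `[old_suffix, new_suffix]` pairs. A minimal sketch of how such a table can be applied to a surface form (the `apply_rules` helper and the three-rule excerpt are illustrative, not part of the PR):

```python
# Illustrative excerpt of first-group verb rules, as in the table above.
VERB_RULES = [
    ["ons", "er"],
    ["ez", "er"],
    ["ent", "er"],
]


def apply_rules(string, rules):
    """Return every candidate lemma produced by the suffix rules."""
    candidates = []
    for old, new in rules:
        if string.endswith(old):
            # Strip the inflected suffix and append the lemma suffix.
            candidates.append(string[:len(string) - len(old)] + new)
    return candidates


print(apply_rules("parlons", VERB_RULES))  # ['parler']
```

In the actual lemmatizer, every matching rule contributes a candidate; candidates are then filtered against the index of known lemmas.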
7954  spacy/lang/fr/lemmatizer/_nouns.py  Normal file
File diff suppressed because it is too large. Load Diff
6894  spacy/lang/fr/lemmatizer/_nouns_irreg.py  Normal file
File diff suppressed because it is too large. Load Diff
40  spacy/lang/fr/lemmatizer/_pronouns_irreg.py  Normal file
@@ -0,0 +1,40 @@
# coding: utf8
from __future__ import unicode_literals


PRONOUNS_IRREG = {
    "aucune": ("aucun",),
    "celle-ci": ("celui-ci",),
    "celles-ci": ("celui-ci",),
    "ceux-ci": ("celui-ci",),
    "celle-là": ("celui-là",),
    "celles-là": ("celui-là",),
    "ceux-là": ("celui-là",),
    "celle": ("celui",),
    "celles": ("celui",),
    "ceux": ("celui",),
    "certaines": ("certains",),
    "chacune": ("chacun",),
    "icelle": ("icelui",),
    "icelles": ("icelui",),
    "iceux": ("icelui",),
    "la": ("le",),
    "les": ("le",),
    "laquelle": ("lequel",),
    "lesquelles": ("lequel",),
    "lesquels": ("lequel",),
    "elle-même": ("lui-même",),
    "elles-mêmes": ("lui-même",),
    "eux-mêmes": ("lui-même",),
    "quelle": ("quel",),
    "quelles": ("quel",),
    "quels": ("quel",),
    "quelques-unes": ("quelqu'un",),
    "quelques-uns": ("quelqu'un",),
    "quelque-une": ("quelqu'un",),
    "qu": ("que",),
    "telle": ("tel",),
    "telles": ("tel",),
    "tels": ("tel",),
    "toutes": ("tous",),
}
1077  spacy/lang/fr/lemmatizer/_verbs.py  Normal file
File diff suppressed because it is too large. Load Diff
112834  spacy/lang/fr/lemmatizer/_verbs_irreg.py  Normal file
File diff suppressed because it is too large. Load Diff
131  spacy/lang/fr/lemmatizer/lemmatizer.py  Normal file
@@ -0,0 +1,131 @@
# coding: utf8
from __future__ import unicode_literals

from ....symbols import POS, NOUN, VERB, ADJ, ADV, PRON, DET, AUX, PUNCT
from ....symbols import VerbForm_inf, VerbForm_none, Number_sing, Degree_pos
from .lookup import LOOKUP

'''
The French language lemmatizer applies the default rule-based lemmatization
procedure with some modifications for better French language support.

The parts of speech 'ADV', 'PRON', 'DET' and 'AUX' are added to use the
rule-based lemmatization. As a last resort, the lemmatizer checks the
lookup table.
'''


class FrenchLemmatizer(object):
    @classmethod
    def load(cls, path, index=None, exc=None, rules=None, lookup=None):
        return cls(index, exc, rules, lookup)

    def __init__(self, index=None, exceptions=None, rules=None, lookup=None):
        self.index = index
        self.exc = exceptions
        self.rules = rules
        self.lookup_table = lookup if lookup is not None else {}

    def __call__(self, string, univ_pos, morphology=None):
        if not self.rules:
            return [self.lookup_table.get(string, string)]
        if univ_pos in (NOUN, 'NOUN', 'noun'):
            univ_pos = 'noun'
        elif univ_pos in (VERB, 'VERB', 'verb'):
            univ_pos = 'verb'
        elif univ_pos in (ADJ, 'ADJ', 'adj'):
            univ_pos = 'adj'
        elif univ_pos in (ADV, 'ADV', 'adv'):
            univ_pos = 'adv'
        elif univ_pos in (PRON, 'PRON', 'pron'):
            univ_pos = 'pron'
        elif univ_pos in (DET, 'DET', 'det'):
            univ_pos = 'det'
        elif univ_pos in (AUX, 'AUX', 'aux'):
            univ_pos = 'aux'
        elif univ_pos in (PUNCT, 'PUNCT', 'punct'):
            univ_pos = 'punct'
        else:
            return [self.lookup(string)]
        # See Issue #435 for example of where this logic is required.
        if self.is_base_form(univ_pos, morphology):
            return list(set([string.lower()]))
        lemmas = lemmatize(string, self.index.get(univ_pos, {}),
                           self.exc.get(univ_pos, {}),
                           self.rules.get(univ_pos, []))
        return lemmas

    def is_base_form(self, univ_pos, morphology=None):
        """
        Check whether we're dealing with an uninflected paradigm, so we can
        avoid lemmatization entirely.
        """
        morphology = {} if morphology is None else morphology
        others = [key for key in morphology
                  if key not in (POS, 'Number', 'POS', 'VerbForm', 'Tense')]
        if univ_pos == 'noun' and morphology.get('Number') == 'sing':
            return True
        elif univ_pos == 'verb' and morphology.get('VerbForm') == 'inf':
            return True
        # This maps 'VBP' to base form -- probably just need 'IS_BASE'
        # morphology
        elif univ_pos == 'verb' and (morphology.get('VerbForm') == 'fin' and
                                     morphology.get('Tense') == 'pres' and
                                     morphology.get('Number') is None and
                                     not others):
            return True
        elif univ_pos == 'adj' and morphology.get('Degree') == 'pos':
            return True
        elif VerbForm_inf in morphology:
            return True
        elif VerbForm_none in morphology:
            return True
        elif Number_sing in morphology:
            return True
        elif Degree_pos in morphology:
            return True
        else:
            return False

    def noun(self, string, morphology=None):
        return self(string, 'noun', morphology)

    def verb(self, string, morphology=None):
        return self(string, 'verb', morphology)

    def adj(self, string, morphology=None):
        return self(string, 'adj', morphology)

    def punct(self, string, morphology=None):
        return self(string, 'punct', morphology)

    def lookup(self, string):
        if string in self.lookup_table:
            return self.lookup_table[string]
        return string


def lemmatize(string, index, exceptions, rules):
    string = string.lower()
    forms = []
    if string in index:
        forms.append(string)
        return forms
    forms.extend(exceptions.get(string, []))
    oov_forms = []
    if not forms:
        for old, new in rules:
            if string.endswith(old):
                form = string[:len(string) - len(old)] + new
                if not form:
                    pass
                elif form in index or not form.isalpha():
                    forms.append(form)
                else:
                    oov_forms.append(form)
    if not forms:
        forms.extend(oov_forms)
    if not forms and string in LOOKUP:
        forms.append(LOOKUP[string])
    if not forms:
        forms.append(string)
    return list(set(forms))
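The `lemmatize()` helper above cascades through the index, the exception table, the suffix rules, and finally the lookup table. A condensed, self-contained sketch of that cascade with toy tables (the table contents here are illustrative, not the shipped French data):

```python
def lemmatize(string, index, exceptions, rules):
    """Condensed version of the cascade: index -> exceptions -> rules -> input."""
    string = string.lower()
    if string in index:                       # already a known lemma
        return [string]
    forms = list(exceptions.get(string, []))  # irregular forms first
    oov_forms = []
    if not forms:
        for old, new in rules:                # then suffix rules
            if string.endswith(old):
                form = string[:len(string) - len(old)] + new
                # Keep rule output only if it is a known lemma;
                # otherwise remember it as an out-of-vocabulary guess.
                (forms if form in index else oov_forms).append(form)
    if not forms:
        forms = oov_forms or [string]         # fall back to the input itself
    return sorted(set(forms))


index = {"parler", "être"}
exceptions = {"suis": ["être"]}
rules = [["ons", "er"], ["ez", "er"]]
print(lemmatize("suis", index, exceptions, rules))     # ['être']
print(lemmatize("parlons", index, exceptions, rules))  # ['parler']
```

This sketch drops the `isalpha()` special case and the final `LOOKUP` fallback for brevity; the shipped code above is authoritative.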
File diff suppressed because it is too large. Load Diff

@@ -55,6 +55,6 @@ def like_num(text):

 LEX_ATTRS = {
-    NORM: norm
+    NORM: norm,
+    LIKE_NUM: like_num
 }
@@ -10,15 +10,18 @@ from .lex_attrs import LEX_ATTRS
 from .syntax_iterators import SYNTAX_ITERATORS

 from ..tokenizer_exceptions import BASE_EXCEPTIONS
+from ..norm_exceptions import BASE_NORMS
 from ...language import Language
-from ...attrs import LANG
-from ...util import update_exc
+from ...attrs import LANG, NORM
+from ...util import update_exc, add_lookups


 class IndonesianDefaults(Language.Defaults):
     lex_attr_getters = dict(Language.Defaults.lex_attr_getters)
     lex_attr_getters[LANG] = lambda text: 'id'
     lex_attr_getters.update(LEX_ATTRS)
+    lex_attr_getters[NORM] = add_lookups(Language.Defaults.lex_attr_getters[NORM],
+                                         BASE_NORMS, NORM_EXCEPTIONS)
     tokenizer_exceptions = update_exc(BASE_EXCEPTIONS, TOKENIZER_EXCEPTIONS)
     stop_words = STOP_WORDS
     prefixes = TOKENIZER_PREFIXES
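The Indonesian change above chains lookup tables into the NORM getter via `add_lookups`. A rough sketch of how such a chained getter behaves (an illustrative reimplementation with toy tables, not spaCy's actual code):

```python
# Illustrative reimplementation: try each lookup table in order, then fall
# back to the default getter. Table contents below are made up for the demo.
def add_lookups(default_func, *lookups):
    def get_attr(string):
        for lookup in lookups:
            if string in lookup:
                return lookup[string]
        return default_func(string)
    return get_attr


base_norms = {"$": "$"}
norm_exceptions = {"nggak": "tidak"}  # colloquial -> standard form
get_norm = add_lookups(lambda s: s, base_norms, norm_exceptions)

print(get_norm("nggak"))  # tidak
print(get_norm("kota"))   # kota (falls through to the default)
```

Chaining this way lets language-specific norm exceptions override the shared base norms without copying either table.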
@ -24,7 +24,7 @@ aci-acinya
|
|||
aco-acoan
|
||||
ad-blocker
|
||||
ad-interim
|
||||
ada-ada saja
|
||||
ada-ada
|
||||
ada-adanya
|
||||
ada-adanyakah
|
||||
adang-adang
|
||||
|
@ -243,7 +243,6 @@ bari-bari
|
|||
barik-barik
|
||||
baris-berbaris
|
||||
baru-baru
|
||||
baru-baru ini
|
||||
baru-batu
|
||||
barung-barung
|
||||
basa-basi
|
||||
|
@ -1059,7 +1058,6 @@ box-to-box
|
|||
boyo-boyo
|
||||
buah-buahan
|
||||
buang-buang
|
||||
buang-buang air
|
||||
buat-buatan
|
||||
buaya-buaya
|
||||
bubun-bubun
|
||||
|
@ -1226,7 +1224,6 @@ deg-degan
|
|||
degap-degap
|
||||
dekak-dekak
|
||||
dekat-dekat
|
||||
dengan -
|
||||
dengar-dengaran
|
||||
dengking-mendengking
|
||||
departemen-departemen
|
||||
|
@ -1246,6 +1243,7 @@ dibayang-bayangi
|
|||
dibuat-buat
|
||||
diiming-imingi
|
||||
dilebih-lebihkan
|
||||
dimana-mana
|
||||
dimata-matai
|
||||
dinas-dinas
|
||||
dinul-Islam
|
||||
|
@ -1278,6 +1276,57 @@ dulang-dulang
|
|||
duri-duri
|
||||
duta-duta
|
||||
dwi-kewarganegaraan
|
||||
e-arena
|
||||
e-billing
|
||||
e-budgeting
|
||||
e-cctv
|
||||
e-class
|
||||
e-commerce
|
||||
e-counting
|
||||
e-elektronik
|
||||
e-entertainment
|
||||
e-evolution
|
||||
e-faktur
|
||||
e-filing
|
||||
e-fin
|
||||
e-form
|
||||
e-government
|
||||
e-govt
|
||||
e-hakcipta
|
||||
e-id
|
||||
e-info
|
||||
e-katalog
|
||||
e-ktp
|
||||
e-leadership
|
||||
e-lhkpn
|
||||
e-library
|
||||
e-loket
|
||||
e-m1
|
||||
e-money
|
||||
e-news
|
||||
e-nisn
|
||||
e-npwp
|
||||
e-paspor
|
||||
e-paten
|
||||
e-pay
|
||||
e-perda
|
||||
e-perizinan
|
||||
e-planning
|
||||
e-polisi
|
||||
e-power
|
||||
e-punten
|
||||
e-retribusi
|
||||
e-samsat
|
||||
e-sport
|
||||
e-store
|
||||
e-tax
|
||||
e-ticketing
|
||||
e-tilang
|
||||
e-toll
|
||||
e-visa
|
||||
e-voting
|
||||
e-wallet
|
||||
e-warong
|
||||
ecek-ecek
|
||||
eco-friendly
|
||||
eco-park
|
||||
|
@@ -1440,7 +1489,25 @@ ginang-ginang
girap-girap
|
||||
girik-girik
|
||||
giring-giring
|
||||
go-auto
|
||||
go-bills
|
||||
go-bluebird
|
||||
go-box
|
||||
go-car
|
||||
go-clean
|
||||
go-food
|
||||
go-glam
|
||||
go-jek
|
||||
go-kart
|
||||
go-mart
|
||||
go-massage
|
||||
go-med
|
||||
go-points
|
||||
go-pulsa
|
||||
go-ride
|
||||
go-send
|
||||
go-shop
|
||||
go-tix
|
||||
go-to-market
|
||||
goak-goak
|
||||
goal-line
|
||||
|
@@ -1488,7 +1555,6 @@ hang-out
hantu-hantu
|
||||
happy-happy
|
||||
harap-harap
|
||||
harap-harap cemas
|
||||
harap-harapan
|
||||
hard-disk
|
||||
harga-harga
|
||||
|
@@ -1633,7 +1699,7 @@ jor-joran
jotos-jotosan
|
||||
juak-juak
|
||||
jual-beli
|
||||
juang-juang
|
||||
julo-julo
|
||||
julung-julung
|
||||
julur-julur
|
||||
|
@@ -1787,6 +1853,7 @@ kemarah-marahan
kemasam-masaman
|
||||
kemati-matian
|
||||
kembang-kembang
|
||||
kemenpan-rb
|
||||
kementerian-kementerian
|
||||
kemerah-merahan
|
||||
kempang-kempis
|
||||
|
@@ -1827,7 +1894,6 @@ keras-mengerasi
kercap-kercip
|
||||
kercap-kercup
|
||||
keriang-keriut
|
||||
kering-kering air
|
||||
kerja-kerja
|
||||
kernyat-kernyut
|
||||
kerobak-kerabit
|
||||
|
@@ -1952,7 +2018,7 @@ kuda-kudaan
kudap-kudap
|
||||
kue-kue
|
||||
kulah-kulah
|
||||
kulak-kulak tangan
|
||||
kulak-kulak
|
||||
kulik-kulik
|
||||
kulum-kulum
|
||||
kumat-kamit
|
||||
|
@@ -2086,7 +2152,6 @@ lumba-lumba
lumi-lumi
|
||||
luntang-lantung
|
||||
lupa-lupa
|
||||
lupa-lupa ingat
|
||||
lupa-lupaan
|
||||
lurah-camat
|
||||
maaf-memaafkan
|
||||
|
@@ -2097,6 +2162,7 @@ macan-macanan
machine-to-machine
|
||||
mafia-mafia
|
||||
mahasiswa-mahasiswi
|
||||
mahasiswa/i
|
||||
mahi-mahi
|
||||
main-main
|
||||
main-mainan
|
||||
|
@@ -2185,14 +2251,14 @@ memandai-mandai
memanggil-manggil
|
||||
memanis-manis
|
||||
memanjut-manjut
|
||||
memantas-mantas diri
|
||||
memantas-mantas
|
||||
memasak-masak
|
||||
memata-matai
|
||||
mematah-matah
|
||||
mematuk-matuk
|
||||
mematut-matut
|
||||
memau-mau
|
||||
memayah-mayahkan (diri)
|
||||
memayah-mayahkan
|
||||
membaca-baca
|
||||
membacah-bacah
|
||||
membagi-bagikan
|
||||
|
@@ -2576,6 +2642,7 @@ meraung-raungkan
merayau-rayau
|
||||
merayu-rayu
|
||||
mercak-mercik
|
||||
mercedes-benz
|
||||
merek-merek
|
||||
mereka-mereka
|
||||
mereka-reka
|
||||
|
@@ -2627,9 +2694,9 @@ morat-marit
move-on
|
||||
muda-muda
|
||||
muda-mudi
|
||||
muda/i
|
||||
mudah-mudahan
|
||||
muka-muka
|
||||
muka-muka (dengan -)
|
||||
mula-mula
|
||||
multiple-output
|
||||
muluk-muluk
|
||||
|
@@ -2791,6 +2858,7 @@ paus-paus
paut-memaut
|
||||
pay-per-click
|
||||
paya-paya
|
||||
pdi-p
|
||||
pecah-pecah
|
||||
pecat-pecatan
|
||||
peer-to-peer
|
||||
|
@@ -2951,6 +3019,7 @@ putih-hitam
putih-putih
|
||||
putra-putra
|
||||
putra-putri
|
||||
putra/i
|
||||
putri-putri
|
||||
putus-putus
|
||||
putusan-putusan
|
||||
|
@@ -3069,6 +3138,7 @@ sambung-bersambung
sambung-menyambung
|
||||
sambut-menyambut
|
||||
samo-samo
|
||||
sampah-sampah
|
||||
sampai-sampai
|
||||
samping-menyamping
|
||||
sana-sini
|
||||
|
@@ -3204,7 +3274,7 @@ seolah-olah
sepala-pala
|
||||
sepandai-pandai
|
||||
sepetang-petangan
|
||||
sepoi-sepoi (basa)
|
||||
sepoi-sepoi
|
||||
sepraktis-praktisnya
|
||||
sepuas-puasnya
|
||||
serak-serak
|
||||
|
@@ -3278,6 +3348,7 @@ sisa-sisa
sisi-sisi
|
||||
siswa-siswa
|
||||
siswa-siswi
|
||||
siswa/i
|
||||
siswi-siswi
|
||||
situ-situ
|
||||
situs-situs
|
||||
|
@@ -3380,6 +3451,7 @@ tanggul-tanggul
tanggung-menanggung
|
||||
tanggung-tanggung
|
||||
tank-tank
|
||||
tante-tante
|
||||
tanya-jawab
|
||||
tapa-tapa
|
||||
tapak-tapak
|
||||
|
@@ -3424,7 +3496,6 @@ teralang-alang
terambang-ambang
|
||||
terambung-ambung
|
||||
terang-terang
|
||||
terang-terang laras
|
||||
terang-terangan
|
||||
teranggar-anggar
|
||||
terangguk-angguk
|
||||
|
@@ -3438,7 +3509,6 @@ terayap-rayap
terbada-bada
|
||||
terbahak-bahak
|
||||
terbang-terbang
|
||||
terbang-terbang hinggap
|
||||
terbata-bata
|
||||
terbatuk-batuk
|
||||
terbayang-bayang
|
||||
|
|
|
@@ -18199,7 +18199,6 @@ LOOKUP = {
    'kelap-kelip': 'terkelap',
    'mengelapkan': 'lap',
    'sekelap': 'terkelap',
    'berlapar': 'lapar',
    'kelaparan': 'lapar',
    'kelaparannya': 'lapar',

@@ -30179,7 +30178,6 @@ LOOKUP = {
    'terperonyok': 'peronyok',
    'terperosok': 'perosok',
    'terperosoknya': 'perosok',
    'merosot': 'perosot',
    'memerosot': 'perosot',
    'memerosotkan': 'perosot',
    'kepustakaan': 'pustaka',

|
|
@@ -1,7 +1,10 @@
# coding: utf8
from __future__ import unicode_literals

import unicodedata

from .punctuation import LIST_CURRENCY
from ...attrs import IS_CURRENCY, LIKE_NUM


_num_words = ['nol', 'satu', 'dua', 'tiga', 'empat', 'lima', 'enam', 'tujuh',

@@ -29,6 +32,17 @@ def like_num(text):
    return False


def is_currency(text):
    if text in LIST_CURRENCY:
        return True
    for char in text:
        if unicodedata.category(char) != 'Sc':
            return False
    return True


LEX_ATTRS = {
    IS_CURRENCY: is_currency,
    LIKE_NUM: like_num
}

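The `is_currency` check above falls back to the Unicode "Symbol, currency" (`Sc`) category when a token is not in the known currency list. A self-contained sketch, with a stand-in list since the real `LIST_CURRENCY` lives in the language's punctuation module:

```python
# Sketch of the is_currency logic: a token counts as currency if it matches a
# known currency string, or if every character has Unicode category 'Sc'.
import unicodedata

LIST_CURRENCY = ['$', '\u00a3', '\u20ac', '\u00a5', 'Rp', 'US$']  # stand-in list

def is_currency(text):
    if text in LIST_CURRENCY:
        return True
    for char in text:
        if unicodedata.category(char) != 'Sc':  # 'Sc' = Symbol, currency
            return False
    return True

print(is_currency('Rp'))   # exact match in the list
print(is_currency('\u20ac'))  # euro sign: Unicode category Sc
print(is_currency('100'))  # digits are category Nd, so not currency
```

The category fallback is what lets unseen currency symbols through without having to enumerate them all in the list.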
@@ -1,7 +1,535 @@
# coding: utf8
from __future__ import unicode_literals

"""
Slang and abbreviations

Daftar kosakata yang sering salah dieja (list of Indonesian words that are
often misspelled):
https://id.wikipedia.org/wiki/Wikipedia:Daftar_kosakata_bahasa_Indonesia_yang_sering_salah_dieja
"""

_exc = {
# Slang and abbreviations
"silahkan": "silakan",
|
||||
"yg": "yang",
|
||||
"kalo": "kalau",
|
||||
"cawu": "caturwulan",
|
||||
"ok": "oke",
|
||||
"gak": "tidak",
|
||||
"enggak": "tidak",
|
||||
"nggak": "tidak",
|
||||
"ndak": "tidak",
|
||||
"ngga": "tidak",
|
||||
"dgn": "dengan",
|
||||
"tdk": "tidak",
|
||||
"jg": "juga",
|
||||
"klo": "kalau",
|
||||
"denger": "dengar",
|
||||
"pinter": "pintar",
|
||||
"krn": "karena",
|
||||
"nemuin": "menemukan",
|
||||
"jgn": "jangan",
|
||||
"udah": "sudah",
|
||||
"sy": "saya",
|
||||
"udh": "sudah",
|
||||
"dapetin": "mendapatkan",
|
||||
"ngelakuin": "melakukan",
|
||||
"ngebuat": "membuat",
|
||||
"membikin": "membuat",
|
||||
"bikin": "buat",
|
||||
|
||||
# Daftar kosakata yang sering salah dieja
|
||||
"malpraktik": "malapraktik",
|
||||
"malfungsi": "malafungsi",
|
||||
"malserap": "malaserap",
|
||||
"maladaptasi": "malaadaptasi",
|
||||
"malsuai": "malasuai",
|
||||
"maldistribusi": "maladistribusi",
|
||||
"malgizi": "malagizi",
|
||||
"malsikap": "malasikap",
|
||||
"memperhatikan": "memerhatikan",
|
||||
"akte": "akta",
|
||||
"cemilan": "camilan",
|
||||
"esei": "esai",
|
||||
"frase": "frasa",
|
||||
"kafeteria": "kafetaria",
|
||||
"ketapel": "katapel",
|
||||
"kenderaan": "kendaraan",
|
||||
"menejemen": "manajemen",
|
||||
"menejer": "manajer",
|
||||
"mesjid": "masjid",
|
||||
"rebo": "rabu",
|
||||
"seksama": "saksama",
|
||||
"senggama": "sanggama",
|
||||
"sekedar": "sekadar",
|
||||
"seprei": "seprai",
|
||||
"semedi": "semadi",
|
||||
"samadi": "semadi",
|
||||
"amandemen": "amendemen",
|
||||
"algoritma": "algoritme",
|
||||
"aritmatika": "aritmetika",
|
||||
"metoda": "metode",
|
||||
"materai": "meterai",
|
||||
"meterei": "meterai",
|
||||
"kalendar": "kalender",
|
||||
"kadaluwarsa": "kedaluwarsa",
|
||||
"katagori": "kategori",
|
||||
"parlamen": "parlemen",
|
||||
"sekular": "sekuler",
|
||||
"selular": "seluler",
|
||||
"sirkular": "sirkuler",
|
||||
"survai": "survei",
|
||||
"survey": "survei",
|
||||
"aktuil": "aktual",
|
||||
"formil": "formal",
|
||||
"trotoir": "trotoar",
|
||||
"komersiil": "komersial",
|
||||
"komersil": "komersial",
|
||||
"tradisionil": "tradisional",
|
||||
"orisinil": "orisinal",
|
||||
"orijinil": "orisinal",
|
||||
"afdol": "afdal",
|
||||
"antri": "antre",
|
||||
"apotik": "apotek",
|
||||
"atlit": "atlet",
|
||||
"atmosfir": "atmosfer",
|
||||
"cidera": "cedera",
|
||||
"cendikiawan": "cendekiawan",
|
||||
"cepet": "cepat",
|
||||
"cinderamata": "cenderamata",
|
||||
"debet": "debit",
|
||||
"difinisi": "definisi",
|
||||
"dekrit": "dekret",
|
||||
"disain": "desain",
|
||||
"diskripsi": "deskripsi",
|
||||
"diskotik": "diskotek",
|
||||
"eksim": "eksem",
|
||||
"exim": "eksem",
|
||||
"faidah": "faedah",
|
||||
"ekstrim": "ekstrem",
|
||||
"ekstrimis": "ekstremis",
|
||||
"komplit": "komplet",
|
||||
"konkrit": "konkret",
|
||||
"kongkrit": "konkret",
|
||||
"kongkret": "konkret",
|
||||
"kridit": "kredit",
|
||||
"musium": "museum",
|
||||
"pinalti": "penalti",
|
||||
"piranti": "peranti",
|
||||
"pinsil": "pensil",
|
||||
"personil": "personel",
|
||||
"sistim": "sistem",
|
||||
"teoritis": "teoretis",
|
||||
"vidio": "video",
|
||||
"cengkeh": "cengkih",
|
||||
"desertasi": "disertasi",
|
||||
"hakekat": "hakikat",
|
||||
"intelejen": "intelijen",
|
||||
"kaedah": "kaidah",
|
||||
"kempes": "kempis",
|
||||
"kementrian": "kementerian",
|
||||
"ledeng": "leding",
|
||||
"nasehat": "nasihat",
|
||||
"penasehat": "penasihat",
|
||||
"praktek": "praktik",
|
||||
"praktekum": "praktikum",
|
||||
"resiko": "risiko",
|
||||
"retsleting": "ritsleting",
|
||||
"senen": "senin",
|
||||
"amuba": "ameba",
|
||||
"punggawa": "penggawa",
|
||||
"surban": "serban",
|
||||
"nomer": "nomor",
|
||||
"sorban": "serban",
|
||||
"bis": "bus",
|
||||
"agribisnis": "agrobisnis",
|
||||
"kantung": "kantong",
|
||||
"khutbah": "khotbah",
|
||||
"mandur": "mandor",
|
||||
"rubuh": "roboh",
|
||||
"pastur": "pastor",
|
||||
"supir": "sopir",
|
||||
"goncang": "guncang",
|
||||
"goa": "gua",
|
||||
"kaos": "kaus",
|
||||
"kokoh": "kukuh",
|
||||
"komulatif": "kumulatif",
|
||||
"kolomnis": "kolumnis",
|
||||
"korma": "kurma",
|
||||
"lobang": "lubang",
|
||||
"limo": "limusin",
|
||||
"limosin": "limusin",
|
||||
"mangkok": "mangkuk",
|
||||
"saos": "saus",
|
||||
"sop": "sup",
|
||||
"sorga": "surga",
|
||||
"tegor": "tegur",
|
||||
"telor": "telur",
|
||||
"obrak-abrik": "ubrak-abrik",
|
||||
"ekwivalen": "ekuivalen",
|
||||
"frekwensi": "frekuensi",
|
||||
"konsekwensi": "konsekuensi",
|
||||
"kwadran": "kuadran",
|
||||
"kwadrat": "kuadrat",
|
||||
"kwalifikasi": "kualifikasi",
|
||||
"kwalitas": "kualitas",
|
||||
"kwalitet": "kualitas",
|
||||
"kwalitatif": "kualitatif",
|
||||
"kwantitas": "kuantitas",
|
||||
"kwantitatif": "kuantitatif",
|
||||
"kwantum": "kuantum",
|
||||
"kwartal": "kuartal",
|
||||
"kwintal": "kuintal",
|
||||
"kwitansi": "kuitansi",
|
||||
"kwatir": "khawatir",
|
||||
"kuatir": "khawatir",
|
||||
"jadual": "jadwal",
|
||||
"hirarki": "hierarki",
|
||||
"karir": "karier",
|
||||
"aktip": "aktif",
|
||||
"daptar": "daftar",
|
||||
"efektip": "efektif",
|
||||
"epektif": "efektif",
|
||||
"epektip": "efektif",
|
||||
"Pebruari": "Februari",
|
||||
"pisik": "fisik",
|
||||
"pondasi": "fondasi",
|
||||
"photo": "foto",
|
||||
"photokopi": "fotokopi",
|
||||
"hapal": "hafal",
|
||||
"insap": "insaf",
|
||||
"insyaf": "insaf",
|
||||
"konperensi": "konferensi",
|
||||
"kreatip": "kreatif",
|
||||
"kreativ": "kreatif",
|
||||
"maap": "maaf",
|
||||
"napsu": "nafsu",
|
||||
"negatip": "negatif",
|
||||
"negativ": "negatif",
|
||||
"objektip": "objektif",
|
||||
"obyektip": "objektif",
|
||||
"obyektif": "objektif",
|
||||
"pasip": "pasif",
|
||||
"pasiv": "pasif",
|
||||
"positip": "positif",
|
||||
"positiv": "positif",
|
||||
"produktip": "produktif",
|
||||
"produktiv": "produktif",
|
||||
"sarap": "saraf",
|
||||
"sertipikat": "sertifikat",
|
||||
"subjektip": "subjektif",
|
||||
"subyektip": "subjektif",
|
||||
"subyektif": "subjektif",
|
||||
"tarip": "tarif",
|
||||
"transitip": "transitif",
|
||||
"transitiv": "transitif",
|
||||
"faham": "paham",
|
||||
"fikir": "pikir",
|
||||
"berfikir": "berpikir",
|
||||
"telefon": "telepon",
|
||||
"telfon": "telepon",
|
||||
"telpon": "telepon",
|
||||
"tilpon": "telepon",
|
||||
"nafas": "napas",
|
||||
"bernafas": "bernapas",
|
||||
"pernafasan": "pernapasan",
|
||||
"vermak": "permak",
|
||||
"vulpen": "pulpen",
|
||||
"aktifis": "aktivis",
|
||||
"konfeksi": "konveksi",
|
||||
"motifasi": "motivasi",
|
||||
"Nopember": "November",
|
||||
"propinsi": "provinsi",
|
||||
"babtis": "baptis",
|
||||
"jerembab": "jerembap",
|
||||
"lembab": "lembap",
|
||||
"sembab": "sembap",
|
||||
"saptu": "sabtu",
|
||||
"tekat": "tekad",
|
||||
"bejad": "bejat",
|
||||
"nekad": "nekat",
|
||||
"otoped": "otopet",
|
||||
"skuad": "skuat",
|
||||
"jenius": "genius",
|
||||
"marjin": "margin",
|
||||
"marjinal": "marginal",
|
||||
"obyek": "objek",
|
||||
"subyek": "subjek",
|
||||
"projek": "proyek",
|
||||
"azas": "asas",
|
||||
"ijasah": "ijazah",
|
||||
"jenasah": "jenazah",
|
||||
"plasa": "plaza",
|
||||
"bathin": "batin",
|
||||
"Katholik": "Katolik",
|
||||
"orthografi": "ortografi",
|
||||
"pathogen": "patogen",
|
||||
"theologi": "teologi",
|
||||
"ijin": "izin",
|
||||
"rejeki": "rezeki",
|
||||
"rejim": "rezim",
|
||||
"jaman": "zaman",
|
||||
"jamrud": "zamrud",
|
||||
"jinah": "zina",
|
||||
"perjinahan": "perzinaan",
|
||||
"anugrah": "anugerah",
|
||||
"cendrawasih": "cenderawasih",
|
||||
"jendral": "jenderal",
|
||||
"kripik": "keripik",
|
||||
"krupuk": "kerupuk",
|
||||
"ksatria": "kesatria",
|
||||
"mentri": "menteri",
|
||||
"negri": "negeri",
|
||||
"Prancis": "Perancis",
|
||||
"sebrang": "seberang",
|
||||
"menyebrang": "menyeberang",
|
||||
"Sumatra": "Sumatera",
|
||||
"trampil": "terampil",
|
||||
"isteri": "istri",
|
||||
"justeru": "justru",
|
||||
"perajurit": "prajurit",
|
||||
"putera": "putra",
|
||||
"puteri": "putri",
|
||||
"samudera": "samudra",
|
||||
"sastera": "sastra",
|
||||
"sutera": "sutra",
|
||||
"terompet": "trompet",
|
||||
"iklas": "ikhlas",
|
||||
"iktisar": "ikhtisar",
|
||||
"kafilah": "khafilah",
|
||||
"kawatir": "khawatir",
|
||||
"kotbah": "khotbah",
|
||||
"kusyuk": "khusyuk",
|
||||
"makluk": "makhluk",
|
||||
"mahluk": "makhluk",
|
||||
"mahkluk": "makhluk",
|
||||
"nahkoda": "nakhoda",
|
||||
"nakoda": "nakhoda",
|
||||
"tahta": "takhta",
|
||||
"takhyul": "takhayul",
|
||||
"tahyul": "takhayul",
|
||||
"tahayul": "takhayul",
|
||||
"akhli": "ahli",
|
||||
"anarkhi": "anarki",
|
||||
"kharisma": "karisma",
|
||||
"kharismatik": "karismatik",
|
||||
"mahsud": "maksud",
|
||||
"makhsud": "maksud",
|
||||
"rakhmat": "rahmat",
|
||||
"tekhnik": "teknik",
|
||||
"tehnik": "teknik",
|
||||
"tehnologi": "teknologi",
|
||||
"ikhwal": "ihwal",
|
||||
"expor": "ekspor",
|
||||
"extra": "ekstra",
|
||||
"komplex": "komplek",
|
||||
"sex": "seks",
|
||||
"taxi": "taksi",
|
||||
"extasi": "ekstasi",
|
||||
"syaraf": "saraf",
|
||||
"syurga": "surga",
|
||||
"mashur": "masyhur",
|
||||
"masyur": "masyhur",
|
||||
"mahsyur": "masyhur",
|
||||
"mashyur": "masyhur",
|
||||
"muadzin": "muazin",
|
||||
"adzan": "azan",
|
||||
"ustadz": "ustaz",
|
||||
"ustad": "ustaz",
|
||||
"ustadzah": "ustaz",
|
||||
"dzikir": "zikir",
|
||||
"dzuhur": "zuhur",
|
||||
"dhuhur": "zuhur",
|
||||
"zhuhur": "zuhur",
|
||||
"analisa": "analisis",
|
||||
"diagnosa": "diagnosis",
|
||||
"hipotesa": "hipotesis",
|
||||
"sintesa": "sintesis",
|
||||
"aktiviti": "aktivitas",
|
||||
"aktifitas": "aktivitas",
|
||||
"efektifitas": "efektivitas",
|
||||
"komuniti": "komunitas",
|
||||
"kreatifitas": "kreativitas",
|
||||
"produktifitas": "produktivitas",
|
||||
"realiti": "realitas",
|
||||
"realita": "realitas",
|
||||
"selebriti": "selebritas",
|
||||
"spotifitas": "sportivitas",
|
||||
"universiti": "universitas",
|
||||
"utiliti": "utilitas",
|
||||
"validiti": "validitas",
|
||||
"dilokalisir": "dilokalisasi",
|
||||
"didramatisir": "didramatisasi",
|
||||
"dipolitisir": "dipolitisasi",
|
||||
"dinetralisir": "dinetralisasi",
|
||||
"dikonfrontir": "dikonfrontasi",
|
||||
"mendominir": "mendominasi",
|
||||
"koordinir": "koordinasi",
|
||||
"proklamir": "proklamasi",
|
||||
"terorganisir": "terorganisasi",
|
||||
"terealisir": "terealisasi",
|
||||
"robah": "ubah",
|
||||
"dirubah": "diubah",
|
||||
"merubah": "mengubah",
|
||||
"terlanjur": "telanjur",
|
||||
"terlantar": "telantar",
|
||||
"penglepasan": "pelepasan",
|
||||
"pelihatan": "penglihatan",
|
||||
"pemukiman": "permukiman",
|
||||
"pengrumahan": "perumahan",
|
||||
"penyewaan": "persewaan",
|
||||
"menyintai": "mencintai",
|
||||
"menyolok": "mencolok",
|
||||
"contek": "sontek",
|
||||
"mencontek": "menyontek",
|
||||
"pungkir": "mungkir",
|
||||
"dipungkiri": "dimungkiri",
|
||||
"kupungkiri": "kumungkiri",
|
||||
"kaupungkiri": "kaumungkiri",
|
||||
"nampak": "tampak",
|
||||
"nampaknya": "tampaknya",
|
||||
"nongkrong": "tongkrong",
|
||||
"berternak": "beternak",
|
||||
"berterbangan": "beterbangan",
|
||||
"berserta": "beserta",
|
||||
"berperkara": "beperkara",
|
||||
"berpergian": "bepergian",
|
||||
"berkerja": "bekerja",
|
||||
"berberapa": "beberapa",
|
||||
"terbersit": "tebersit",
|
||||
"terpercaya": "tepercaya",
|
||||
"terperdaya": "teperdaya",
|
||||
"terpercik": "tepercik",
|
||||
"terpergok": "tepergok",
|
||||
"aksesoris": "aksesori",
|
||||
"handal": "andal",
|
||||
"hantar": "antar",
|
||||
"panutan": "anutan",
|
||||
"atsiri": "asiri",
|
||||
"bhakti": "bakti",
|
||||
"china": "cina",
|
||||
"dharma": "darma",
|
||||
"diktaktor": "diktator",
|
||||
"eksport": "ekspor",
|
||||
"hembus": "embus",
|
||||
"hadits": "hadis",
|
||||
"hadist": "hadis",
|
||||
"harafiah": "harfiah",
|
||||
"himbau": "imbau",
|
||||
"import": "impor",
|
||||
"inget": "ingat",
|
||||
"hisap": "isap",
|
||||
"interprestasi": "interpretasi",
|
||||
"kangker": "kanker",
|
||||
"konggres": "kongres",
|
||||
"lansekap": "lanskap",
|
||||
"maghrib": "magrib",
|
||||
"emak": "mak",
|
||||
"moderen": "modern",
|
||||
"pasport": "paspor",
|
||||
"perduli": "peduli",
|
||||
"ramadhan": "ramadan",
|
||||
"rapih": "rapi",
|
||||
"Sansekerta": "Sanskerta",
|
||||
"shalat": "salat",
|
||||
"sholat": "salat",
|
||||
"silahkan": "silakan",
|
||||
"standard": "standar",
|
||||
"hutang": "utang",
|
||||
"zinah": "zina",
|
||||
"ambulan": "ambulans",
|
||||
"antarktika": "antartika",
|
||||
"arteri": "arteria",
|
||||
"asik": "asyik",
|
||||
"australi": "australia",
|
||||
"denga": "dengan",
|
||||
"depo": "depot",
|
||||
"detil": "detail",
|
||||
"ensiklopedi": "ensiklopedia",
|
||||
"elit": "elite",
|
||||
"frustasi": "frustrasi",
|
||||
"gladi": "geladi",
|
||||
"greget": "gereget",
|
||||
"itali": "italia",
|
||||
"karna": "karena",
|
||||
"klenteng": "kelenteng",
|
||||
"erling": "kerling",
|
||||
"kontruksi": "konstruksi",
|
||||
"masal": "massal",
|
||||
"merk": "merek",
|
||||
"respon": "respons",
|
||||
"diresponi": "direspons",
|
||||
"skak": "sekak",
|
||||
"stir": "setir",
|
||||
"singapur": "singapura",
|
||||
"standarisasi": "standardisasi",
|
||||
"varitas": "varietas",
|
||||
"amphibi": "amfibi",
|
||||
"anjlog": "anjlok",
|
||||
"alpukat": "avokad",
|
||||
"alpokat": "avokad",
|
||||
"bolpen": "pulpen",
|
||||
"cabe": "cabai",
|
||||
"cabay": "cabai",
|
||||
"ceret": "cerek",
|
||||
"differensial": "diferensial",
|
||||
"duren": "durian",
|
||||
"faksimili": "faksimile",
|
||||
"faksimil": "faksimile",
|
||||
"graha": "gerha",
|
||||
"goblog": "goblok",
|
||||
"gombrong": "gombroh",
|
||||
"horden": "gorden",
|
||||
"korden": "gorden",
|
||||
"gubug": "gubuk",
|
||||
"imaginasi": "imajinasi",
|
||||
"jerigen": "jeriken",
|
||||
"jirigen": "jeriken",
|
||||
"carut-marut": "karut-marut",
|
||||
"kwota": "kuota",
|
||||
"mahzab": "mazhab",
|
||||
"mempesona": "memesona",
|
||||
"milyar": "miliar",
|
||||
"missi": "misi",
|
||||
"nenas": "nanas",
|
||||
"negoisasi": "negosiasi",
|
||||
"automotif": "otomotif",
|
||||
"pararel": "paralel",
|
||||
"paska": "pasca",
|
||||
"prosen": "persen",
|
||||
"pete": "petai",
|
||||
"petay": "petai",
|
||||
"proffesor": "profesor",
|
||||
"rame": "ramai",
|
||||
"rapot": "rapor",
|
||||
"rileks": "relaks",
|
||||
"rileksasi": "relaksasi",
|
||||
"renumerasi": "remunerasi",
|
||||
"seketaris": "sekretaris",
|
||||
"sekertaris": "sekretaris",
|
||||
"sensorik": "sensoris",
|
||||
"sentausa": "sentosa",
|
||||
"strawberi": "stroberi",
|
||||
"strawbery": "stroberi",
|
||||
"taqwa": "takwa",
|
||||
"tauco": "taoco",
|
||||
"tauge": "taoge",
|
||||
"toge": "taoge",
|
||||
"tauladan": "teladan",
|
||||
"taubat": "tobat",
|
||||
"trilyun": "triliun",
|
||||
"vissi": "visi",
|
||||
"coklat": "cokelat",
|
||||
"narkotika": "narkotik",
|
||||
"oase": "oasis",
|
||||
"politisi": "politikus",
|
||||
"terong": "terung",
|
||||
"wool": "wol",
|
||||
"himpit": "impit",
|
||||
"mujizat": "mukjizat",
|
||||
"mujijat": "mukjizat",
|
||||
"yag": "yang",
|
||||
}


NORM_EXCEPTIONS = {}

# Build the exported lookup table from the raw exception dict; without this,
# the bare "NORM_EXCEPTIONS = {}" would leave the entries above unused.
for string, norm in _exc.items():
    NORM_EXCEPTIONS[string] = norm

||||
|
|
|
@@ -4,7 +4,7 @@ from __future__ import unicode_literals
from ..punctuation import TOKENIZER_PREFIXES, TOKENIZER_SUFFIXES, TOKENIZER_INFIXES
from ..char_classes import merge_chars, split_chars, _currency, _units
from ..char_classes import LIST_PUNCT, LIST_ELLIPSES, LIST_QUOTES
from ..char_classes import QUOTES, ALPHA, ALPHA_LOWER, ALPHA_UPPER, HYPHENS

_units = (_units + 's bit Gbps Mbps mbps Kbps kbps ƒ ppi px '
          'Hz kHz MHz GHz mAh '

@@ -25,7 +25,7 @@ HTML_SUFFIX = r'</(b|strong|i|em|p|span|div|a)>'
MONTHS = merge_chars(_months)
LIST_CURRENCY = split_chars(_currency)

TOKENIZER_PREFIXES.remove('#')  # hashtag
_prefixes = TOKENIZER_PREFIXES + LIST_CURRENCY + [HTML_PREFIX] + ['/', '—']

_suffixes = TOKENIZER_SUFFIXES + [r'\-[Nn]ya', '-[KkMm]u', '[—-]'] + [

|
|
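The stop-word list below is built by splitting a triple-quoted block on whitespace, which is why the PR notes that lines containing spaces were removed: a multi-word entry cannot survive `.split()`. A minimal demonstration with a hypothetical three-entry list:

```python
# Why multi-word stop words can't live in this file: set("""...""".split())
# splits on all whitespace, so "ada-ada saja" becomes two separate entries.
STOP_WORDS = set("""
ada
adalah
ada-ada saja
""".split())

print(sorted(STOP_WORDS))  # 'ada-ada saja' has been split into two tokens
```

Multi-word expressions therefore have to be handled elsewhere (e.g. as tokenizer exceptions), not in this set.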
@ -1,763 +1,122 @@
|
|||
"""
|
||||
List of stop words in Bahasa Indonesia.
|
||||
"""
|
||||
# coding: utf8
|
||||
from __future__ import unicode_literals
|
||||
|
||||
STOP_WORDS = set("""
|
||||
ada
|
||||
adalah
|
||||
adanya
|
||||
adapun
|
||||
agak
|
||||
agaknya
|
||||
agar
|
||||
akan
|
||||
akankah
|
||||
akhir
|
||||
akhiri
|
||||
akhirnya
|
||||
aku
|
||||
akulah
|
||||
amat
|
||||
amatlah
|
||||
anda
|
||||
andalah
|
||||
antar
|
||||
antara
|
||||
antaranya
|
||||
apa
|
||||
apaan
|
||||
apabila
|
||||
apakah
|
||||
apalagi
|
||||
apatah
|
||||
artinya
|
||||
asal
|
||||
asalkan
|
||||
atas
|
||||
atau
|
||||
ataukah
|
||||
ataupun
|
||||
awal
|
||||
ada adalah adanya adapun agak agaknya agar akan akankah akhir akhiri akhirnya
|
||||
aku akulah amat amatlah anda andalah antar antara antaranya apa apaan apabila
|
||||
apakah apalagi apatah artinya asal asalkan atas atau ataukah ataupun awal
|
||||
awalnya
|
||||
bagai
|
||||
bagaikan
|
||||
bagaimana
|
||||
bagaimanakah
|
||||
bagaimanapun
|
||||
bagi
|
||||
bagian
|
||||
bahkan
|
||||
bahwa
|
||||
bahwasanya
|
||||
baik
|
||||
bakal
|
||||
bakalan
|
||||
balik
|
||||
banyak
|
||||
bapak
|
||||
baru
|
||||
bawah
|
||||
beberapa
|
||||
begini
|
||||
beginian
|
||||
beginikah
|
||||
beginilah
|
||||
begitu
|
||||
begitukah
|
||||
begitulah
|
||||
begitupun
|
||||
bekerja
|
||||
belakang
|
||||
belakangan
|
||||
belum
|
||||
belumlah
|
||||
benar
|
||||
benarkah
|
||||
benarlah
|
||||
berada
|
||||
berakhir
|
||||
berakhirlah
|
||||
berakhirnya
|
||||
berapa
|
||||
berapakah
|
||||
berapalah
|
||||
berapapun
|
||||
berarti
|
||||
berawal
|
||||
berbagai
|
||||
berdatangan
|
||||
beri
|
||||
berikan
|
||||
berikut
|
||||
berikutnya
|
||||
berjumlah
|
||||
berkali-kali
|
||||
berkata
|
||||
berkehendak
|
||||
berkeinginan
|
||||
berkenaan
|
||||
berlainan
|
||||
berlalu
|
||||
berlangsung
|
||||
berlebihan
|
||||
bermacam
|
||||
bermacam-macam
|
||||
bermaksud
|
||||
bermula
|
||||
bersama
|
||||
bersama-sama
|
||||
bersiap
|
||||
bersiap-siap
|
||||
bertanya
|
||||
bertanya-tanya
|
||||
berturut
|
||||
berturut-turut
|
||||
bertutur
|
||||
berujar
|
||||
berupa
|
||||
besar
|
||||
betul
|
||||
betulkah
|
||||
biasa
|
||||
biasanya
|
||||
bila
|
||||
bilakah
|
||||
bisa
|
||||
bisakah
|
||||
boleh
|
||||
bolehkah
|
||||
bolehlah
|
||||
buat
|
||||
bukan
|
||||
bukankah
|
||||
bukanlah
|
||||
bukannya
|
||||
bulan
|
||||
bung
|
||||
cara
|
||||
caranya
|
||||
cukup
|
||||
cukupkah
|
||||
cukuplah
|
||||
cuma
|
||||
dahulu
|
||||
dalam
|
||||
dan
|
||||
dapat
|
||||
dari
|
||||
daripada
|
||||
datang
|
||||
dekat
|
||||
demi
|
||||
demikian
|
||||
demikianlah
|
||||
dengan
|
||||
depan
|
||||
di
|
||||
dia
|
||||
diakhiri
|
||||
diakhirinya
|
||||
dialah
|
||||
diantara
|
||||
diantaranya
|
||||
diberi
|
||||
diberikan
|
||||
diberikannya
|
||||
dibuat
|
||||
dibuatnya
|
||||
didapat
|
||||
didatangkan
|
||||
digunakan
|
||||
diibaratkan
|
||||
diibaratkannya
|
||||
diingat
|
||||
diingatkan
|
||||
diinginkan
|
||||
dijawab
|
||||
dijelaskan
|
||||
dijelaskannya
|
||||
dikarenakan
|
||||
dikatakan
|
||||
dikatakannya
|
||||
dikerjakan
|
||||
diketahui
|
||||
diketahuinya
|
||||
dikira
|
||||
dilakukan
|
||||
dilalui
|
||||
dilihat
|
||||
dimaksud
|
||||
dimaksudkan
|
||||
dimaksudkannya
|
||||
dimaksudnya
|
||||
diminta
|
||||
dimintai
|
||||
dimisalkan
|
||||
dimulai
|
||||
dimulailah
|
||||
dimulainya
|
||||
dimungkinkan
|
||||
dini
|
||||
dipastikan
|
||||
diperbuat
|
||||
diperbuatnya
|
||||
dipergunakan
|
||||
diperkirakan
|
||||
diperlihatkan
|
||||
diperlukan
|
||||
diperlukannya
|
||||
dipersoalkan
|
||||
dipertanyakan
|
||||
dipunyai
|
||||
diri
|
||||
dirinya
|
||||
disampaikan
|
||||
disebut
|
||||
disebutkan
|
||||
disebutkannya
|
||||
disini
|
||||
disinilah
|
||||
ditambahkan
|
||||
ditandaskan
|
||||
ditanya
|
||||
ditanyai
|
||||
ditanyakan
|
||||
ditegaskan
|
||||
ditujukan
|
||||
ditunjuk
|
||||
ditunjuki
|
||||
ditunjukkan
|
||||
ditunjukkannya
|
||||
ditunjuknya
|
||||
dituturkan
|
||||
dituturkannya
|
||||
diucapkan
|
||||
diucapkannya
|
||||
diungkapkan
|
||||
dong
|
||||
dua
|
||||
dulu
|
||||
empat
|
||||
enggak
|
||||
enggaknya
|
||||
entah
|
||||
entahlah
|
||||
guna
|
||||
gunakan
|
||||
hal
|
||||
hampir
|
||||
hanya
|
||||
hanyalah
|
||||
hari
|
||||
harus
|
||||
haruslah
|
||||
harusnya
|
||||
hendak
|
||||
hendaklah
|
||||
hendaknya
|
||||
hingga
|
||||
ia
|
||||
ialah
|
||||
ibarat
|
||||
ibaratkan
|
||||
ibaratnya
|
||||
ibu
|
||||
ikut
|
||||
ingat
|
||||
ingat-ingat
|
||||
ingin
|
||||
inginkah
|
||||
inginkan
|
||||
ini
|
||||
inikah
|
||||
inilah
|
||||
itu
|
||||
itukah
|
||||
itulah
|
||||
jadi
|
||||
jadilah
|
||||
jadinya
|
||||
jangan
|
||||
jangankan
|
||||
janganlah
|
||||
jauh
|
||||
jawab
|
||||
jawaban
|
||||
jawabnya
|
||||
jelas
|
||||
jelaskan
|
||||
jelaslah
|
||||
jelasnya
|
||||
jika
|
||||
jikalau
|
||||
juga
|
||||
jumlah
|
||||
jumlahnya
|
||||
justru
|
||||
kala
|
||||
kalau
|
||||
kalaulah
|
||||
kalaupun
|
||||
kalian
|
||||
kami
|
||||
kamilah
|
||||
kamu
|
||||
kamulah
|
||||
kan
|
||||
kapan
|
||||
kapankah
|
||||
kapanpun
|
||||
karena
|
||||
karenanya
|
||||
kasus
|
||||
kata
|
||||
katakan
|
||||
katakanlah
|
||||
katanya
|
||||
ke
|
||||
keadaan
|
||||
kebetulan
|
||||
kecil
|
||||
kedua
|
||||
keduanya
|
||||
keinginan
|
||||
kelamaan
|
||||
kelihatan
|
||||
kelihatannya
|
||||
kelima
|
||||
keluar
|
||||
kembali
|
||||
kemudian
|
||||
kemungkinan
|
||||
kemungkinannya
|
||||
kenapa
|
||||
kepada
|
||||
kepadanya
|
||||
kesampaian
|
||||
keseluruhan
|
||||
keseluruhannya
|
||||
keterlaluan
|
||||
ketika
|
||||
khususnya
|
||||
kini
|
||||
kinilah
|
||||
kira
|
||||
kira-kira
|
||||
kiranya
|
||||
kita
|
||||
kitalah
|
||||
kok
|
||||
kurang
|
||||
lagi
|
||||
lagian
|
||||
lah
|
||||
lain
|
||||
lainnya
|
||||
lalu
|
||||
lama
|
||||
lamanya
|
||||
lanjut
|
||||
lanjutnya
|
||||
lebih
|
||||
lewat
|
||||
lima
|
||||
luar
|
||||
macam
|
||||
maka
|
||||
makanya
|
||||
makin
|
||||
malah
|
||||
malahan
|
||||
mampu
|
||||
mampukah
|
||||
mana
|
||||
manakala
|
||||
manalagi
|
||||
masa
|
||||
masalah
|
||||
masalahnya
|
||||
masih
|
||||
masihkah
|
||||
masing
|
||||
masing-masing
|
||||
mau
|
||||
maupun
|
||||
melainkan
|
||||
melakukan
|
||||
melalui
|
||||
melihat
|
||||
melihatnya
|
||||
memang
|
||||
memastikan
|
||||
memberi
|
||||
memberikan
|
||||
membuat
|
||||
memerlukan
|
||||
memihak
|
||||
meminta
|
||||
memintakan
|
||||
memisalkan
|
||||
memperbuat
|
||||
mempergunakan
|
||||
memperkirakan
|
||||
memperlihatkan
|
||||
mempersiapkan
|
||||
mempersoalkan
|
||||
mempertanyakan
|
||||
mempunyai
|
||||
memulai
|
||||
memungkinkan
|
||||
menaiki
|
||||
menambahkan
|
||||
menandaskan
|
||||
menanti
|
||||
menanti-nanti
|
||||
menantikan
|
||||
menanya
|
||||
menanyai
|
||||
menanyakan
|
||||
mendapat
|
||||
mendapatkan
|
||||
mendatang
|
||||
mendatangi
|
||||
mendatangkan
|
||||
menegaskan
|
||||
mengakhiri
|
||||
mengapa
|
||||
mengatakan
|
||||
mengatakannya
|
||||
mengenai
|
||||
mengerjakan
|
||||
mengetahui
|
||||
menggunakan
|
||||
menghendaki
|
||||
mengibaratkan
|
||||
mengibaratkannya
|
||||
mengingat
|
||||
mengingatkan
|
||||
menginginkan
|
||||
mengira
|
||||
mengucapkan
|
||||
mengucapkannya
|
||||
mengungkapkan
|
||||
menjadi
|
||||
menjawab
|
||||
menjelaskan
|
||||
menuju
|
||||
menunjuk
|
||||
menunjuki
|
||||
menunjukkan
|
||||
menunjuknya
|
||||
menurut
|
||||
menuturkan
|
||||
menyampaikan
|
menyangkut
menyatakan
menyebutkan
menyeluruh
menyiapkan
merasa
mereka
merekalah
merupakan
meski
meskipun
meyakini
meyakinkan
minta
mirip
misal
misalkan
misalnya
mula
mulai
mulailah
mulanya
mungkin
mungkinkah
nah
naik
namun
nanti
nantinya
nyaris
nyatanya
oleh
olehnya
pada
padahal
padanya
pak
paling
panjang
pantas
para
pasti
pastilah
penting
pentingnya
per
percuma
perlu
perlukah
perlunya
pernah
persoalan
pertama
pertama-tama
pertanyaan
pertanyakan
pihak
pihaknya
pukul
pula
pun
punya
rasa
rasanya
rata
rupanya
saat
saatnya
saja
sajalah
saling
sama
sama-sama
sambil
sampai
sampai-sampai
sampaikan
sana
sangat
sangatlah
satu
saya
sayalah
se
sebab
sebabnya
sebagai
sebagaimana
sebagainya
sebagian
sebaik
sebaik-baiknya
sebaiknya
sebaliknya
sebanyak
sebegini
sebegitu
sebelum
sebelumnya
sebenarnya
seberapa
sebesar
sebetulnya
sebisanya
sebuah
sebut
sebutlah
sebutnya
secara
secukupnya
sedang
sedangkan
sedemikian
sedikit
sedikitnya
seenaknya
segala
segalanya
segera
seharusnya
sehingga
seingat
sejak
sejauh
sejenak
sejumlah
sekadar
sekadarnya
sekali
sekali-kali
sekalian
sekaligus
sekalipun
sekarang
sekarang
sekecil
seketika
sekiranya
sekitar
sekitarnya
sekurang-kurangnya
sekurangnya
sela
selain
selaku
selalu
selama
selama-lamanya
selamanya
selanjutnya
seluruh
seluruhnya
semacam
semakin
semampu
semampunya
semasa
semasih
semata
semata-mata
semaunya
sementara
semisal
semisalnya
sempat
semua
semuanya
semula
sendiri
sendirian
sendirinya
seolah
seolah-olah
seorang
sepanjang
sepantasnya
sepantasnyalah
seperlunya
seperti
sepertinya
sepihak
sering
seringnya
serta
serupa
sesaat
sesama
sesampai
sesegera
sesekali
seseorang
sesuatu
sesuatunya
sesudah
sesudahnya
setelah
setempat
setengah
seterusnya
setiap
setiba
setibanya
setidak-tidaknya
setidaknya
setinggi
seusai
sewaktu
siap
siapa
siapakah
siapapun
sini
sinilah
soal
soalnya
suatu
sudah
sudahkah
sudahlah
supaya
tadi
tadinya
tahu
tahun
tak
tambah
tambahnya
tampak
tampaknya
tandas
tandasnya
tanpa
tanya
tanyakan
tanyanya
tapi
tegas
tegasnya
telah
tempat
tengah
tentang
tentu
tentulah
tentunya
tepat
terakhir
terasa
terbanyak
terdahulu
terdapat
terdiri
terhadap
terhadapnya
teringat
teringat-ingat
terjadi
terjadilah
terjadinya
terkira
terlalu
terlebih
terlihat
termasuk
ternyata
tersampaikan
tersebut
tersebutlah
tertentu
tertuju
terus
terutama
tetap
tetapi
tiap
tiba
tiba-tiba
tidak
tidakkah
tidaklah
tiga
tinggi
toh
tunjuk
turut
tutur
tuturnya
ucap
ucapnya
ujar
ujarnya
umum
umumnya
ungkap
ungkapnya
untuk
usah
usai
waduh
wah
wahai
waktu
waktunya
walau
walaupun
wong
yaitu
yakin
yakni
yang
""".split())

bagai bagaikan bagaimana bagaimanakah bagaimanapun bagi bagian bahkan bahwa
bahwasanya baik bakal bakalan balik banyak bapak baru bawah beberapa begini
beginian beginikah beginilah begitu begitukah begitulah begitupun bekerja
belakang belakangan belum belumlah benar benarkah benarlah berada berakhir
berakhirlah berakhirnya berapa berapakah berapalah berapapun berarti berawal
berbagai berdatangan beri berikan berikut berikutnya berjumlah berkali-kali
berkata berkehendak berkeinginan berkenaan berlainan berlalu berlangsung
berlebihan bermacam bermacam-macam bermaksud bermula bersama bersama-sama
bersiap bersiap-siap bertanya bertanya-tanya berturut berturut-turut bertutur
berujar berupa besar betul betulkah biasa biasanya bila bilakah bisa bisakah
boleh bolehkah bolehlah buat bukan bukankah bukanlah bukannya bulan bung

cara caranya cukup cukupkah cukuplah cuma

dahulu dalam dan dapat dari daripada datang dekat demi demikian demikianlah
dengan depan di dia diakhiri diakhirinya dialah diantara diantaranya diberi
diberikan diberikannya dibuat dibuatnya didapat didatangkan digunakan
diibaratkan diibaratkannya diingat diingatkan diinginkan dijawab dijelaskan
dijelaskannya dikarenakan dikatakan dikatakannya dikerjakan diketahui
diketahuinya dikira dilakukan dilalui dilihat dimaksud dimaksudkan
dimaksudkannya dimaksudnya diminta dimintai dimisalkan dimulai dimulailah
dimulainya dimungkinkan dini dipastikan diperbuat diperbuatnya dipergunakan
diperkirakan diperlihatkan diperlukan diperlukannya dipersoalkan dipertanyakan
dipunyai diri dirinya disampaikan disebut disebutkan disebutkannya disini
disinilah ditambahkan ditandaskan ditanya ditanyai ditanyakan ditegaskan
ditujukan ditunjuk ditunjuki ditunjukkan ditunjukkannya ditunjuknya dituturkan
dituturkannya diucapkan diucapkannya diungkapkan dong dua dulu

empat enggak enggaknya entah entahlah

guna gunakan

hal hampir hanya hanyalah hari harus haruslah harusnya hendak hendaklah
hendaknya hingga

ia ialah ibarat ibaratkan ibaratnya ibu ikut ingat ingat-ingat ingin inginkah
inginkan ini inikah inilah itu itukah itulah

jadi jadilah jadinya jangan jangankan janganlah jauh jawab jawaban jawabnya
jelas jelaskan jelaslah jelasnya jika jikalau juga jumlah jumlahnya justru

kala kalau kalaulah kalaupun kalian kami kamilah kamu kamulah kan kapan
kapankah kapanpun karena karenanya kasus kata katakan katakanlah katanya ke
keadaan kebetulan kecil kedua keduanya keinginan kelamaan kelihatan
kelihatannya kelima keluar kembali kemudian kemungkinan kemungkinannya kenapa
kepada kepadanya kesampaian keseluruhan keseluruhannya keterlaluan ketika
khususnya kini kinilah kira kira-kira kiranya kita kitalah kok kurang

lagi lagian lah lain lainnya lalu lama lamanya lanjut lanjutnya lebih lewat
lima luar

macam maka makanya makin malah malahan mampu mampukah mana manakala manalagi
masa masalah masalahnya masih masihkah masing masing-masing mau maupun
melainkan melakukan melalui melihat melihatnya memang memastikan memberi
memberikan membuat memerlukan memihak meminta memintakan memisalkan memperbuat
mempergunakan memperkirakan memperlihatkan mempersiapkan mempersoalkan
mempertanyakan mempunyai memulai memungkinkan menaiki menambahkan menandaskan
menanti menanti-nanti menantikan menanya menanyai menanyakan mendapat
mendapatkan mendatang mendatangi mendatangkan menegaskan mengakhiri mengapa
mengatakan mengatakannya mengenai mengerjakan mengetahui menggunakan
menghendaki mengibaratkan mengibaratkannya mengingat mengingatkan menginginkan
mengira mengucapkan mengucapkannya mengungkapkan menjadi menjawab menjelaskan
menuju menunjuk menunjuki menunjukkan menunjuknya menurut menuturkan
menyampaikan menyangkut menyatakan menyebutkan menyeluruh menyiapkan merasa
mereka merekalah merupakan meski meskipun meyakini meyakinkan minta mirip
misal misalkan misalnya mula mulai mulailah mulanya mungkin mungkinkah

nah naik namun nanti nantinya nyaris nyatanya

oleh olehnya

pada padahal padanya pak paling panjang pantas para pasti pastilah penting
pentingnya per percuma perlu perlukah perlunya pernah persoalan pertama
pertama-tama pertanyaan pertanyakan pihak pihaknya pukul pula pun punya

rasa rasanya rata rupanya

saat saatnya saja sajalah saling sama sama-sama sambil sampai sampai-sampai
sampaikan sana sangat sangatlah satu saya sayalah se sebab sebabnya sebagai
sebagaimana sebagainya sebagian sebaik sebaik-baiknya sebaiknya sebaliknya
sebanyak sebegini sebegitu sebelum sebelumnya sebenarnya seberapa sebesar
sebetulnya sebisanya sebuah sebut sebutlah sebutnya secara secukupnya sedang
sedangkan sedemikian sedikit sedikitnya seenaknya segala segalanya segera
seharusnya sehingga seingat sejak sejauh sejenak sejumlah sekadar sekadarnya
sekali sekali-kali sekalian sekaligus sekalipun sekarang sekarang sekecil
seketika sekiranya sekitar sekitarnya sekurang-kurangnya sekurangnya sela
selain selaku selalu selama selama-lamanya selamanya selanjutnya seluruh
seluruhnya semacam semakin semampu semampunya semasa semasih semata semata-mata
semaunya sementara semisal semisalnya sempat semua semuanya semula sendiri
sendirian sendirinya seolah seolah-olah seorang sepanjang sepantasnya
sepantasnyalah seperlunya seperti sepertinya sepihak sering seringnya serta
serupa sesaat sesama sesampai sesegera sesekali seseorang sesuatu sesuatunya
sesudah sesudahnya setelah setempat setengah seterusnya setiap setiba setibanya
setidak-tidaknya setidaknya setinggi seusai sewaktu siap siapa siapakah
siapapun sini sinilah soal soalnya suatu sudah sudahkah sudahlah supaya

tadi tadinya tahu tahun tak tambah tambahnya tampak tampaknya tandas tandasnya
tanpa tanya tanyakan tanyanya tapi tegas tegasnya telah tempat tengah tentang
tentu tentulah tentunya tepat terakhir terasa terbanyak terdahulu terdapat
terdiri terhadap terhadapnya teringat teringat-ingat terjadi terjadilah
terjadinya terkira terlalu terlebih terlihat termasuk ternyata tersampaikan
tersebut tersebutlah tertentu tertuju terus terutama tetap tetapi tiap tiba
tiba-tiba tidak tidakkah tidaklah tiga tinggi toh tunjuk turut tutur tuturnya

ucap ucapnya ujar ujarnya umum umumnya ungkap ungkapnya untuk usah usai

waduh wah wahai waktu waktunya walau walaupun wong

yaitu yakin yakni yang
""".split())
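The stop-word block above follows spaCy's usual pattern: a triple-quoted string split on whitespace and wrapped in a `set`, so layout (one word per line or grouped lines) doesn't matter and duplicates such as the repeated `sekarang` collapse automatically. A minimal sketch with a trimmed word list:

```python
# Minimal sketch of the STOP_WORDS pattern used above, with only a few
# of the Indonesian words kept for brevity.
STOP_WORDS = set("""
yaitu yakin yakni yang
sekarang sekarang
""".split())

print(len(STOP_WORDS))       # the duplicate "sekarang" is collapsed
print('yang' in STOP_WORDS)
```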
@@ -1,10 +1,11 @@
"""
Daftar singkatan dan Akronim dari:
https://id.wiktionary.org/wiki/Wiktionary:Daftar_singkatan_dan_akronim_bahasa_Indonesia#A
"""
# coding: utf8
from __future__ import unicode_literals

import regex as re

from ._tokenizer_exceptions_list import ID_BASE_EXCEPTIONS
from ..tokenizer_exceptions import URL_PATTERN
from ...symbols import ORTH, LEMMA, NORM
@@ -22,6 +23,9 @@ for orth in ID_BASE_EXCEPTIONS:
    orth_lower = orth.lower()
    _exc[orth_lower] = [{ORTH: orth_lower}]

    orth_first_upper = orth[0].upper() + orth[1:]
    _exc[orth_first_upper] = [{ORTH: orth_first_upper}]

    if '-' in orth:
        orth_title = '-'.join([part.title() for part in orth.split('-')])
        _exc[orth_title] = [{ORTH: orth_title}]
@@ -30,28 +34,6 @@ for orth in ID_BASE_EXCEPTIONS:
        _exc[orth_caps] = [{ORTH: orth_caps}]

for exc_data in [
    {ORTH: "CKG", LEMMA: "Cakung", NORM: "Cakung"},
    {ORTH: "CGP", LEMMA: "Grogol Petamburan", NORM: "Grogol Petamburan"},
    {ORTH: "KSU", LEMMA: "Kepulauan Seribu Utara", NORM: "Kepulauan Seribu Utara"},
    {ORTH: "KYB", LEMMA: "Kebayoran Baru", NORM: "Kebayoran Baru"},
    {ORTH: "TJP", LEMMA: "Tanjungpriok", NORM: "Tanjungpriok"},
    {ORTH: "TNA", LEMMA: "Tanah Abang", NORM: "Tanah Abang"},

    {ORTH: "BEK", LEMMA: "Bengkayang", NORM: "Bengkayang"},
    {ORTH: "KTP", LEMMA: "Ketapang", NORM: "Ketapang"},
    {ORTH: "MPW", LEMMA: "Mempawah", NORM: "Mempawah"},
    {ORTH: "NGP", LEMMA: "Nanga Pinoh", NORM: "Nanga Pinoh"},
    {ORTH: "NBA", LEMMA: "Ngabang", NORM: "Ngabang"},
    {ORTH: "PTK", LEMMA: "Pontianak", NORM: "Pontianak"},
    {ORTH: "PTS", LEMMA: "Putussibau", NORM: "Putussibau"},
    {ORTH: "SBS", LEMMA: "Sambas", NORM: "Sambas"},
    {ORTH: "SAG", LEMMA: "Sanggau", NORM: "Sanggau"},
    {ORTH: "SED", LEMMA: "Sekadau", NORM: "Sekadau"},
    {ORTH: "SKW", LEMMA: "Singkawang", NORM: "Singkawang"},
    {ORTH: "STG", LEMMA: "Sintang", NORM: "Sintang"},
    {ORTH: "SKD", LEMMA: "Sukadane", NORM: "Sukadane"},
    {ORTH: "SRY", LEMMA: "Sungai Raya", NORM: "Sungai Raya"},

    {ORTH: "Jan.", LEMMA: "Januari", NORM: "Januari"},
    {ORTH: "Feb.", LEMMA: "Februari", NORM: "Februari"},
    {ORTH: "Mar.", LEMMA: "Maret", NORM: "Maret"},
@@ -66,25 +48,43 @@ for exc_data in [
    {ORTH: "Des.", LEMMA: "Desember", NORM: "Desember"}]:
    _exc[exc_data[ORTH]] = [exc_data]

_other_exc = {
    "do'a": [{ORTH: "do'a", LEMMA: "doa", NORM: "doa"}],
    "jum'at": [{ORTH: "jum'at", LEMMA: "Jumat", NORM: "Jumat"}],
    "Jum'at": [{ORTH: "Jum'at", LEMMA: "Jumat", NORM: "Jumat"}],
    "la'nat": [{ORTH: "la'nat", LEMMA: "laknat", NORM: "laknat"}],
    "ma'af": [{ORTH: "ma'af", LEMMA: "maaf", NORM: "maaf"}],
    "mu'jizat": [{ORTH: "mu'jizat", LEMMA: "mukjizat", NORM: "mukjizat"}],
    "Mu'jizat": [{ORTH: "Mu'jizat", LEMMA: "mukjizat", NORM: "mukjizat"}],
    "ni'mat": [{ORTH: "ni'mat", LEMMA: "nikmat", NORM: "nikmat"}],
    "raka'at": [{ORTH: "raka'at", LEMMA: "rakaat", NORM: "rakaat"}],
    "ta'at": [{ORTH: "ta'at", LEMMA: "taat", NORM: "taat"}],
}

_exc.update(_other_exc)

for orth in [
    "A.AB.", "A.Ma.", "A.Md.", "A.Md.Keb.", "A.Md.Kep.", "A.P.",
    "B.A.", "B.Ch.E.", "B.Sc.", "Dr.", "Dra.", "Drs.", "Hj.", "Ka.", "Kp.",
    "M.AB", "M.Ag.", "M.AP", "M.Arl", "M.A.R.S", "M.Hum.", "M.I.Kom.", "M.Kes,",
    "M.Kom.", "M.M.", "M.P.", "M.Pd.", "M.Psi.", "M.Psi.T.", "M.Sc.", "M.SArl",
    "M.Si.", "M.Sn.", "M.T.", "M.Th.", "No.", "Pjs.", "Plt.", "R.A.",
    "M.AB", "M.Ag.", "M.AP", "M.Arl", "M.A.R.S", "M.Hum.", "M.I.Kom.",
    "M.Kes,", "M.Kom.", "M.M.", "M.P.", "M.Pd.", "M.Psi.", "M.Psi.T.", "M.Sc.",
    "M.SArl", "M.Si.", "M.Sn.", "M.T.", "M.Th.", "No.", "Pjs.", "Plt.", "R.A.",
    "S.AB", "S.AP", "S.Adm", "S.Ag.", "S.Agr", "S.Ant", "S.Arl", "S.Ars",
    "S.A.R.S", "S.Ds", "S.E.", "S.E.I.", "S.Farm", "S.Gz.", "S.H.", "S.Han",
    "S.H.Int", "S.Hum", "S.Hut.", "S.In.", "S.IK.", "S.I.Kom.", "S.I.P", "S.IP",
    "S.P.", "S.Pt", "S.Psi", "S.Ptk", "S.Keb", "S.Ked", "S.Kep", "S.KG", "S.KH",
    "S.Kel", "S.K.M.", "S.Kedg.", "S.Kedh.", "S.Kom.", "S.KPM", "S.Mb", "S.Mat",
    "S.Par", "S.Pd.", "S.Pd.I.", "S.Pd.SD", "S.Pol.", "S.Psi.", "S.S.", "S.SArl.",
    "S.Sn", "S.Si.", "S.Si.Teol.", "S.SI.", "S.ST.", "S.ST.Han", "S.STP", "S.Sos.",
    "S.Sy.", "S.T.", "S.T.Han", "S.Th.", "S.Th.I" "S.TI.", "S.T.P.", "S.TrK",
    "S.Tekp.", "S.Th.",
    "a.l.", "a.n.", "a.s.", "b.d.", "d.a.", "d.l.", "d/h", "dkk.", "dll.",
    "dr.", "drh.", "ds.", "dsb.", "dst.", "faks.", "fax.", "hlm.", "i/o",
    "n.b.", "p.p." "pjs.", "s.d.", "tel.", "u.p.",
]:
    "S.H.Int", "S.Hum", "S.Hut.", "S.In.", "S.IK.", "S.I.Kom.", "S.I.P",
    "S.IP", "S.P.", "S.Pt", "S.Psi", "S.Ptk", "S.Keb", "S.Ked", "S.Kep",
    "S.KG", "S.KH", "S.Kel", "S.K.M.", "S.Kedg.", "S.Kedh.", "S.Kom.", "S.KPM",
    "S.Mb", "S.Mat", "S.Par", "S.Pd.", "S.Pd.I.", "S.Pd.SD", "S.Pol.",
    "S.Psi.", "S.S.", "S.SArl.", "S.Sn", "S.Si.", "S.Si.Teol.", "S.SI.",
    "S.ST.", "S.ST.Han", "S.STP", "S.Sos.", "S.Sy.", "S.T.", "S.T.Han",
    "S.Th.", "S.Th.I" "S.TI.", "S.T.P.", "S.TrK", "S.Tekp.", "S.Th.",
    "Prof.", "drg.", "KH.", "Ust.", "Lc", "Pdt.", "S.H.H.", "Rm.", "Ps.",
    "St.", "M.A.", "M.B.A", "M.Eng.", "M.Eng.Sc.", "M.Pharm.", "Dr. med",
    "Dr.-Ing", "Dr. rer. nat.", "Dr. phil.", "Dr. iur.", "Dr. rer. oec",
    "Dr. rer. pol.", "R.Ng.", "R.", "R.M.", "R.B.", "R.P.", "R.Ay.", "Rr.",
    "R.Ngt.", "a.l.", "a.n.", "a.s.", "b.d.", "d.a.", "d.l.", "d/h", "dkk.",
    "dll.", "dr.", "drh.", "ds.", "dsb.", "dst.", "faks.", "fax.", "hlm.",
    "i/o", "n.b.", "p.p." "pjs.", "s.d.", "tel.", "u.p."]:
    _exc[orth] = [{ORTH: orth}]

TOKENIZER_EXCEPTIONS = _exc
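The loop over `ID_BASE_EXCEPTIONS` in the hunks above expands each base exception into lowercase, first-letter-uppercase, and (for hyphenated tokens such as `e-KTP`) hyphen-title-case variants. A standalone sketch of that expansion, using a plain string key in place of spaCy's `ORTH` symbol so it runs without spaCy installed:

```python
# Sketch of the case-variant expansion applied to ID_BASE_EXCEPTIONS above.
# ORTH is a plain string here (an assumption for portability); in spaCy it
# is the symbol imported from spacy.symbols.
ORTH = 'orth'

def expand_case_variants(base_exceptions):
    exc = {}
    for orth in base_exceptions:
        # lowercase variant
        orth_lower = orth.lower()
        exc[orth_lower] = [{ORTH: orth_lower}]
        # first-letter-uppercase variant
        orth_first_upper = orth[0].upper() + orth[1:]
        exc[orth_first_upper] = [{ORTH: orth_first_upper}]
        # title-case each hyphen-separated part
        if '-' in orth:
            orth_title = '-'.join(part.title() for part in orth.split('-'))
            exc[orth_title] = [{ORTH: orth_title}]
    return exc

print(sorted(expand_case_variants(['e-KTP'])))  # ['E-KTP', 'E-Ktp', 'e-ktp']
```

Each dict value is a list of token attribute dicts, matching the shape of the `{ORTH: ..., LEMMA: ..., NORM: ...}` entries added for the Jakarta-district and month abbreviations.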

spacy/lang/pl/lex_attrs.py (new file, 32 lines)

@@ -0,0 +1,32 @@
# coding: utf8
from __future__ import unicode_literals

from ...attrs import LIKE_NUM


_num_words = ['zero', 'jeden', 'dwa', 'trzy', 'cztery', 'pięć', 'sześć',
              'siedem', 'osiem', 'dziewięć', 'dziesięć', 'jedenaście',
              'dwanaście', 'trzynaście', 'czternaście',
              'pietnaście', 'szesnaście', 'siedemnaście', 'osiemnaście',
              'dziewiętnaście', 'dwadzieścia', 'trzydzieści', 'czterdzieści',
              'pięćdziesiąt', 'szcześćdziesiąt', 'siedemdziesiąt',
              'osiemdziesiąt', 'dziewięćdziesiąt', 'sto', 'tysiąc', 'milion',
              'miliard', 'bilion', 'trylion']


def like_num(text):
    text = text.replace(',', '').replace('.', '')
    if text.isdigit():
        return True
    if text.count('/') == 1:
        num, denom = text.split('/')
        if num.isdigit() and denom.isdigit():
            return True
    if text.lower() in _num_words:
        return True
    return False


LEX_ATTRS = {
    LIKE_NUM: like_num
}
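The new Polish `like_num` treats a token as number-like if it is a digit string after stripping separators, a simple `a/b` fraction, or a spelled-out number word matched case-insensitively. A standalone check of that logic, with the word list trimmed to a few entries for brevity:

```python
# Standalone sketch of the like_num logic from spacy/lang/pl/lex_attrs.py
# above; _num_words is deliberately shortened here.
_num_words = ['zero', 'jeden', 'dwa', 'sto', 'tysiąc']

def like_num(text):
    # drop thousands/decimal separators before the digit check
    text = text.replace(',', '').replace('.', '')
    if text.isdigit():
        return True
    # simple fractions like "3/4"
    if text.count('/') == 1:
        num, denom = text.split('/')
        if num.isdigit() and denom.isdigit():
            return True
    # spelled-out number words, case-insensitive
    return text.lower() in _num_words

print(like_num('10,000'))  # True: separators are stripped first
print(like_num('3/4'))     # True: simple fractions count
print(like_num('Tysiąc'))  # True: matched case-insensitively
print(like_num('pies'))    # False: not a number word
```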